using instrumental variables in education research
SREE workshop, march 2010
sean f. reardon

outline
- a little background on the potential outcomes framework
- what is an instrumental variable? and what’s it good for?
- assumptions needed to use instrumental variables
- practical methods of estimating IV models
- sources of bias in IV models
- additional topics

© 2010 by sean f. reardon. all rights reserved.

potential outcomes framework

a stylized example
- what is the effect of receiving tutoring in math on student math achievement?
- some made-up data for illustration:

Observed Student Treatment and Achievement Data

ID   Treatment Condition   Test Score
1    no tutoring           55
2    no tutoring           60
3    no tutoring           65
4    tutoring              60
5    tutoring              72
6    tutoring              63

Observed and Unobserved Potential Achievement Data (? = not observed)

Student ID          Treatment     Score if      Score if   Observed   Tutoring
                    Condition     not Tutored   Tutored    Score      Effect
1                   no tutoring   55            ?          55         ?
2                   no tutoring   60            ?          60         ?
3                   no tutoring   65            ?          65         ?
Untutored Average                 60            ?          60         ?
4                   tutoring      ?             60         60         ?
5                   tutoring      ?             72         72         ?
6                   tutoring      ?             63         63         ?
Tutored Average                   ?             65         65         ?
Overall Average                   ?             ?          62.5       ?

Definition of an “effect”
- The effect, $\delta_i$, [on some outcome Y] [for some unit i] [of some treatment condition t relative to some other condition c] is defined as the difference between the value of Y that would be observed if unit i were exposed to treatment t and the value of Y that would be observed if unit i were exposed to treatment c.
- More formally, we define the effect of t relative to c on Y for unit i as:
  $$\delta_i = Y_i^t - Y_i^c$$
- We define the average effect of t relative to c in a population P as:
  $$\delta = E[\delta_i] = E[Y_i^t - Y_i^c]$$
  with the expectation taken over the units i in P.
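The definitions above can be made concrete with a minimal Python sketch. It uses the six observed scores from the made-up data, plus two hypothetical completions of the unobserved potential outcomes. Those completed values are assumptions chosen only for illustration, since they can never be observed. Both completions reproduce the observed scores exactly, yet they imply different average effects.

```python
import numpy as np

# Treatment indicator for the six students (0 = no tutoring, 1 = tutoring).
t = np.array([0, 0, 0, 1, 1, 1])

# Two HYPOTHETICAL completions of the potential outcomes (assumed values).
# Each pair is (y_c, y_t): score if not tutored, score if tutored.
completions = {
    "A": (np.array([55, 60, 65, 55, 60, 65]), np.array([60, 72, 63, 60, 72, 63])),
    "B": (np.array([55, 60, 65, 55, 70, 70]), np.array([60, 55, 65, 60, 72, 63])),
}

def observed_scores(y_c, y_t, t):
    """Only the potential outcome matching the received condition is observed."""
    return np.where(t == 1, y_t, y_c)

def average_effect(y_c, y_t):
    """Average of the unit-level effects delta_i = Y_i^t - Y_i^c."""
    return float(np.mean(y_t - y_c))

def naive_difference(y_obs, t):
    """Observed tutored mean minus observed untutored mean."""
    return float(y_obs[t == 1].mean() - y_obs[t == 0].mean())

results = {}
for name, (y_c, y_t) in completions.items():
    y_obs = observed_scores(y_c, y_t, t)
    results[name] = (y_obs.tolist(), average_effect(y_c, y_t), naive_difference(y_obs, t))
    print(name, results[name])
```

The observed data and the naive difference in means are identical under both completions, so the observed data alone cannot tell us which average effect is the true one.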
The “Fundamental Problem of Causal Inference” (Holland, 1986)
- Although both $Y_i^t$ (the outcome under treatment t) and $Y_i^c$ (the outcome under condition c) are defined in principle, it is impossible to observe both of them for the same unit (because any given unit can be exposed to only one of t or c).
- Thus, the causal effect $\delta_i = Y_i^t - Y_i^c$ cannot be observed.
- The problem of causal inference is thus a problem of missing data. The outcome $Y_i$ under its “counterfactual” condition is never observed.
- How can we construct unbiased estimates of the average potential outcomes $E[Y^t]$ and $E[Y^c]$ under the counterfactual conditions?

Observed and Possible Unobserved Potential Achievement Data

Student ID          Treatment     Score if      Score if   Observed   Tutoring
                    Condition     not Tutored   Tutored    Score      Effect
1                   no tutoring   55            60         55         +5
2                   no tutoring   60            72         60         +12
3                   no tutoring   65            63         65         -2
Untutored Average                 60            65         60         +5
4                   tutoring      55            60         60         +5
5                   tutoring      60            72         72         +12
6                   tutoring      65            63         63         -2
Tutored Average                   60            65         65         +5
Overall Average                   60            65         62.5       +5

Observed and Possible Unobserved Potential Achievement Data

Student ID          Treatment     Score if      Score if   Observed   Tutoring
                    Condition     not Tutored   Tutored    Score      Effect
1                   no tutoring   55            60         55         +5
2                   no tutoring   60            55         60         -5
3                   no tutoring   65            65         65         0
Untutored Average                 60            60         60         0
4                   tutoring      55            60         60         +5
5                   tutoring      70            72         72         +2
6                   tutoring      70            63         63         -7
Tutored Average                   65            65         65         0
Overall Average                   62.5          62.5       62.5       0

Both tables agree with the observed scores; in the first the average tutoring effect is +5, in the second it is 0.

What if we can’t conduct an RCT?
- If we can randomize students to receive either tutoring or no tutoring, and ensure that every student complies with his or her assigned treatment status, the randomization will allow us to estimate the effect of tutoring very easily.
- but what if students don’t comply with their treatment assignment?
  - some assigned to tutoring don’t go to tutoring
  - some assigned to no tutoring get tutored anyway
- this means tutoring is no longer randomly assigned; at least some of the variation in treatment status is potentially endogenous
- so a comparison of those assigned to tutoring and no tutoring won’t give us an estimate of the effect of tutoring (but only the effect of being assigned to tutoring)
- this is one case where instrumental variables are useful

instrumental variables models

What is an instrumental variable?
- an instrumental variable is an exogenous factor that causes some (though not necessarily all) of the variation in treatment status
- we use it to identify the portion of variation in treatment that is exogenous, and then rely only on that exogenous variation to estimate the effect of treatment

A general structural model
[diagram: Z and $\varepsilon_T$ -> T; X and U -> T and Y; W and $\varepsilon_Y$ -> Y; T -> Y]
- T: treatment status
- Y: outcome measure
- X: observed confounders
- U: unobserved confounders
- W: observed ignorable causes of Y
- $\varepsilon_Y$: unobserved ignorable causes of Y
- $\varepsilon_T$: unobserved ignorable causes of T
- Z: instrument (observed ignorable cause of T)

Relating treatments and outcomes
[diagram: T -> Y]
- we would like to estimate the effect of T on Y
- this involves seeing how T and Y are related
- but to infer a causal relationship from the covariance of T and Y, we need to understand the source of variation in T
  - why do some people get different types/degrees of the treatment?

Relating treatments and outcomes
[diagram: Z and $\varepsilon_T$ -> T -> Y]
- variation in T may be caused by factors unrelated to the outcome Y
  - these may be observed (Z) or unobserved ($\varepsilon_T$, the unobserved ignorable causes of T)
- if the only variation in T comes from factors unrelated to Y, then T is as good as randomly assigned, so getting a causal estimate is easy

Relating treatments and outcomes
[diagram: X and $\varepsilon_T$ -> T -> Y; X -> Y]
- variation in T may be caused, in part, by observed factors that are related to the outcome Y
  - observed confounders (X)
- as long as there is some variation in T that is caused by some (not necessarily observable) ignorable cause (Z or $\varepsilon_T$), we can still easily get an estimate of the effect of T
  - statistically control for X (compute relationship between T and Y, conditional on X)

Relating treatments and outcomes
[diagram: X, U and $\varepsilon_T$ -> T -> Y; X and U -> Y]
- variation in T may be caused, in part, by observed and unobserved factors that are related to the outcome Y
  - observed confounders (X)
  - unobserved confounders (U)
  - reverse causality (Y affects T)
- here, we cannot get an unbiased estimate of the effect of T
  - statistical control can’t adjust for U
  - the ignorable cause ($\varepsilon_T$) is not observed

Relating treatments and outcomes
[diagram: Z, X, U and $\varepsilon_T$ -> T -> Y; X and U -> Y]
- if we cannot observe all the confounders (or if Y affects T), then we need some observed factor that affects T but does not otherwise affect Y
- this (Z) is called an instrument (or instrumental variable).
- because the part of the variation in T that is induced by Z is ignorable (as good as random), we can use this part of the variation in T to identify the effect of T on Y

Tutoring example, revisited
- the observed data are not sufficient to estimate the average effect of tutoring
- what if we can’t do an experiment, or if we do an experiment and not everyone complies?

tutoring voucher as an instrument
- randomly assign eligible students to receive either a voucher allowing them to receive free tutoring (Z=1) or no voucher (Z=0).
- observe whether students attend tutoring (T=1) or not (T=0).
  - note: this choice is not random; students may choose tutoring or not, regardless of voucher status (Ti may differ from Zi).
- observe later achievement (Y)
- we want to estimate the effect of T (tutoring vs no tutoring) on Y (achievement).

Four subpopulations (angrist, imbens, & rubin, 1996)
- compliers: those who would comply with treatment assignment (those for whom Ti=Zi)
- non-compliers
  - always-takers: those who would always receive the treatment, regardless of assignment (those for whom Ti=1)
  - never-takers: those who would never receive the treatment, regardless of assignment (those for whom Ti=0)
  - defiers: those who would always do the opposite of treatment assignment (those for whom Ti=1-Zi)

Observed Outcomes
N=100, 50% receive vouchers, but not all comply with assignment (only 60% comply):

Offered     Tutored   Tutored   Proportion
Voucher     No        Yes       Tutored
No          45        5         .10
Yes         15        35        .70
Total       60        40        .40

- no voucher, not tutored (45): might be compliers or never-takers
- voucher, not tutored (15): might be defiers or never-takers
- no voucher, tutored (5): might be defiers or always-takers
- voucher, tutored (35): might be compliers or always-takers

estimating the proportion of compliers
- assume there are no defiers
- then everyone with Z=1, T=0 is a never-taker (15 of 50 (30%) with Z=1 in our example)
- there should be the same proportion (30%) of never-takers among those with Z=0, because Z is random
- the same logic implies there are 10% of the population who are always-takers
- thus, 60% (100% - 30% - 10%) are compliers

Estimating the proportion of compliers
- we can also estimate this by regressing the treatment variable on the instrument
  tutor = G0 + G1*voucher + e
  tutor = .10 + 0.60*voucher
- Thus, the average effect of being assigned a voucher on tutoring status is +0.60, meaning that the average student’s probability of receiving tutoring increases by 0.60 if assigned a voucher (which means that 60% of the students comply with the voucher assignment).

Observed Outcomes
Estimated effect of the voucher offer on test scores = 56.6 - 50.5 = +6.1

Offered     Tutored   Tutored
Voucher     No        Yes       Total
No          48.3      70.0      50.5
Yes         44.9      61.6      56.6
Total       47.5      62.6      53.5

- here we’re assuming no defiers (later we will see why this is necessary)
- 48.3 (no voucher, not tutored): average outcome among untutored compliers and never-takers
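The arithmetic on these slides can be checked with a short Python sketch. It takes the cell counts and cell means from the two observed-outcomes tables as given and assumes no defiers, as the slides do. The helper names are hypothetical, and because the cell means in the tables are rounded, the results are approximate.

```python
import numpy as np

# Cell counts from the observed-outcomes table, indexed as counts[z][t].
counts = {0: {0: 45, 1: 5}, 1: {0: 15, 1: 35}}
# Cell mean test scores (rounded, as printed in the table), indexed as means[z][t].
means = {0: {0: 48.3, 1: 70.0}, 1: {0: 44.9, 1: 61.6}}

def share_tutored(z):
    """Proportion tutored among students with voucher status z."""
    n = counts[z][0] + counts[z][1]
    return counts[z][1] / n

def mean_score(z):
    """Mean test score among students with voucher status z."""
    n = counts[z][0] + counts[z][1]
    return (counts[z][0] * means[z][0] + counts[z][1] * means[z][1]) / n

# Assuming no defiers: untutored voucher holders are never-takers,
# tutored non-holders are always-takers, and the rest are compliers.
never_takers = 1 - share_tutored(1)
always_takers = share_tutored(0)
compliers = 1 - never_takers - always_takers

# The same complier share from the first-stage regression tutor = G0 + G1*voucher + e,
# fitted on individual-level data expanded from the cell counts.
z = np.repeat([0, 0, 1, 1], [45, 5, 15, 35])
t = np.repeat([0, 1, 0, 1], [45, 5, 15, 35])
g1, g0 = np.polyfit(z, t, 1)  # slope, intercept

# Effect of the voucher offer on test scores (difference in group means).
itt = mean_score(1) - mean_score(0)

print(round(compliers, 2), round(g0, 2), round(g1, 2), round(itt, 2))
```

The regression slope and the subtraction of never-taker and always-taker shares are two routes to the same number, because with a binary instrument the OLS slope is just the difference in the proportion tutored between the two voucher groups.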
Observed Outcomes (continued)
- 61.6 (voucher, tutored): average outcome among tutored compliers and always-takers (again assuming no defiers)

OLS estimates
- OLS yields: test = 47.5 + 15.1*(tutored)
- the estimated effect of tutoring is +15.1 points
- but we should worry about whether this is biased, because some students chose whether to get tutoring or not.
- the tutored group includes compliers and always-takers; the control group includes compliers and never-takers; so they are not equivalent groups

The Wald IV estimator
- if we are willing to assume that the voucher offer had no effect on the outcome of the non-compliers (because it did not alter their treatment status and does not affect their outcome through any other way), then we can estimate the effect of tutoring like this:
  - The average effect of the voucher in the population is estimated to be +6.1
  - but only 60% of students’ decisions about whether to get tutoring were affected by the voucher offer (only 60% of sample are compliers)

Wald estimator
- average effect in population ($\beta$) = average effect on compliers ($\beta_C$) x proportion who are compliers ($\pi_C$) + average effect on non-compliers ($\beta_N$) x proportion who are non-compliers ($\pi_N$):
  $$\beta = \beta_C \pi_C + \beta_N \pi_N$$

Wald estimator
- with no effect of the voucher offer on non-compliers ($\beta_N = 0$), this says that the average effect of the treatment among the compliers equals the average effect in the population divided by the proportion of the population who are compliers (for compliers, the effect of the voucher offer is the effect of tutoring):
  $$\beta_C = \beta / \pi_C$$
- thus, the average effect among the compliers is +6.1/.60 ≈ +10.2

What have we learned?
- An instrumental variable allows us to estimate the average effect of the treatment among those whose treatment status is affected by the instrument (“compliers”)
  - called the “local average treatment effect” (LATE)
  - note that we can’t identify who the compliers are
- We can’t estimate the average treatment effect in the population, because we can’t estimate the effect among non-compliers
  - because the instrument doesn’t affect their treatment status, there is no exogenous variation in their treatment status that we can use.

What assumptions have we made?
- the instrument only affects the outcome through its impact on the treatment (this is called the exclusion restriction)
- the instrument is ignorably (randomly) assigned
  - this allows us to estimate the effect of the instrument on the outcome and on the treatment
- the instrument affects the treatment for at least some people
  - otherwise there are no compliers
- there are no defiers

more general IV models

what if treatment is not binary?
- above we assumed the treatment (tutoring) was binary
- but not all treatments are binary
  - we could offer vouchers of different amounts
  - students could receive different amounts of tutoring
- as a result, compliance may take on many values
  - for some students, the amount of tutoring received may be strongly affected by the instrument; for others, it may be weakly affected or not at all affected.

a more general model of the IV estimator
[diagram: Z_i -($\beta_i$)-> Y_i]
- for a given individual i, $\beta_i$ is the effect of Z on Y
- this effect may vary across individuals
- we would like to estimate the average effect, $\beta = E[\beta_i]$

1. exclusion restriction
- if the only way that Z affects Y is through its effect on T, then we have $\beta_i = \gamma_i \delta_i$, where $\gamma_i$ is the effect of Z on T and $\delta_i$ is the effect of T on Y for individual i.
- or, put differently,
  [diagram: Z_i -($\gamma_i$)-> T_i -($\delta_i$)-> Y_i]
  where $\gamma_i$ is the person-specific effect of Z on T, $\delta_i$ is the person-specific effect of T on Y, and $\beta_i = \gamma_i \delta_i$ is the person-specific effect of Z on Y.
- the assumption that the only way that Z affects Y is through its effect on T is called the exclusion restriction.

2. zero compliance-effect covariance
- we can write the average effect of Z on Y as
  $$E[\beta_i] = E[\gamma_i \delta_i] = E[\gamma_i] E[\delta_i] + \mathrm{Cov}(\gamma_i, \delta_i)$$
- if we assume $\mathrm{Cov}(\gamma_i, \delta_i) = 0$, then we have
  $$E[\beta_i] = E[\gamma_i] E[\delta_i]$$
- the assumption that $\mathrm{Cov}(\gamma_i, \delta_i) = 0$ is called the zero compliance-effect covariance assumption.

3. instrument relevance
- as long as $E[\gamma_i] \neq 0$, we can rewrite the above as
  $$E[\delta_i] = \frac{E[\beta_i]}{E[\gamma_i]}$$
- the assumption that $E[\gamma_i] \neq 0$ is sometimes called the instrument relevance assumption; or sometimes just referred to as the assumption that the instrument affects the treatment.
- if $E[\gamma_i]$ is small (close to zero), we say that the instrument is a weak instrument.

4. the instrument is ignorably assigned
- if the above three assumptions are met, we have
  $$E[\delta_i] = \frac{E[\beta_i]}{E[\gamma_i]}$$
- if Z is ignorably assigned, then we can easily estimate both $E[\beta_i]$ (the average effect of Z on Y) and $E[\gamma_i]$ (the average effect of Z on T).
- the assumption of ignorable assignment thus makes estimation of the effect of T on Y possible.

what do these assumptions mean?
- exclusion restriction: the offer of a tutoring voucher does not affect students’ achievement except by affecting the amount of tutoring they receive
- zero compliance-effect covariance: there is no correlation between how strongly a voucher offer affects the amount of tutoring a student gets and how effective tutoring is for that student
- instrument relevance: the offer of a voucher has some effect, on average, on the amount of tutoring students receive (at least one student is affected by the offer).
- ignorable assignment of the instrument: the voucher offer is randomly assigned (this would be violated, for example, if the principal gave vouchers to students she deemed most in need of tutoring).
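A small simulation can illustrate how the ratio of the two average effects behaves under these assumptions. This is a sketch only: the data-generating process, the parameter values, and the function name `ratio_estimate` are all assumptions made for illustration. Here `gamma` is the person-specific effect of Z on T, `delta` is the person-specific effect of T on Y, Z is randomly assigned, and Z affects Y only through T.

```python
import numpy as np

def ratio_estimate(slope, n=400_000, seed=0):
    """Simulate heterogeneous compliance and effects; return the IV ratio and its targets."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)            # ignorably (randomly) assigned instrument
    u = rng.normal(size=n)                    # unobserved confounder of T and Y
    gamma = rng.uniform(0.2, 1.0, size=n)     # person-specific effect of Z on T
    # Person-specific effect of T on Y; it covaries with gamma only when slope != 0.
    delta = 2.0 + slope * (gamma - 0.6) + rng.normal(scale=0.5, size=n)
    t = 0.5 * u + gamma * z                   # instrument relevance: E[gamma] = 0.6
    y = u + delta * t                         # exclusion restriction: Z enters only via T
    beta_hat = y[z == 1].mean() - y[z == 0].mean()    # average effect of Z on Y
    gamma_hat = t[z == 1].mean() - t[z == 0].mean()   # average effect of Z on T
    cov = np.cov(gamma, delta)[0, 1]
    return {
        "iv_ratio": beta_hat / gamma_hat,
        "mean_delta": delta.mean(),
        "mean_delta_plus_cov_term": delta.mean() + cov / gamma.mean(),
    }

no_cov = ratio_estimate(slope=0.0)    # zero compliance-effect covariance holds
with_cov = ratio_estimate(slope=5.0)  # compliance and effect are positively related
print(no_cov)
print(with_cov)
```

When the covariance is zero, the ratio should land close to the average of `delta`. When it is not, the ratio should instead track the average of `delta` plus the covariance divided by the average of `gamma`, which is what the decomposition of the average effect of Z on Y implies.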
some examples
- NYC voucher experiment (howell et al, 2002; krueger & zhu, 2004)
- Effect of schooling on wages, using quarter of birth as instrument (angrist & krueger, 1991).
- Effect of teacher absence on student achievement, using snowfall as instrument (miller, murnane & willett, 2007)
- Effects of segregation on educational attainment and wages, using railroads as an instrument (ananat 2007)

estimating IV models

estimating IV models in practice
- in practice, we don’t usually compute the effect of Z on Y and Z on T and divide them
  - because we may need more complex models (if we want to include other covariates in the model, for example)
  - because we need to compute standard errors
- the most common method of estimating IV models is two-stage least squares (TSLS or 2SLS).

Three relevant equations
- 1: the “reduced form” equation
  $$Y_i = \beta_0 + \beta_i Z_i + e_i$$
  $\beta_i$ is the person-specific effect of Z on Y.
- 2: the “first stage” equation
  $$T_i = \gamma_0 + \gamma_i Z_i + v_i$$
  $\gamma_i$ is the person-specific effect of Z on T.
- but the equation we really are interested in is
- 3: the “second stage” equation
  $$Y_i = \delta_0 + \delta_i T_i + w_i$$
  $\delta_i$ is the person-specific effect of T on Y.

two-stage least squares
- fit the first-stage equation (estimate the effect of Z on T); compute fitted values:
  $$\hat{T}_i = \hat{\gamma}_0 + \hat{\gamma} Z_i$$
- fit the second-stage equation, using predicted values of T in place of observed values of T:
  $$Y_i = \delta_0 + \delta \hat{T}_i + w_i$$

two-stage least squares
- because the predicted values of T from the first-stage equation include only the variation in T that is caused by the instrument, the estimated coefficient from the second-stage equation will be consistent, i.e., unbiased in large samples (as long as the 4 IV assumptions are met).
- if you do this by hand, you’ll get the wrong standard errors; statistical software usually has built-in routines (e.g., -ivregress- command in Stata) to compute correct standard errors.

Effects of attending charter school
- we can’t randomize students to charter or traditional public schools
- Abdulkadiroglu, et al (2009) examine students who apply to oversubscribed charter schools, whose admission is determined by lottery (randomization)
  - instrument is winning the lottery
  - treatment is # of years in a charter school

example: effect of charter schooling
[results table not recovered; columns: first stage (compliance); reduced form (effect of winning lottery on ach.); 2sls (effect of a year in charter)]
are the IV assumptions valid in this study?
- exclusion restriction?
- zero compliance-effect covariance?
- instrument relevance?
- ignorable assignment?

sources of bias in IV models

sources of bias in IV
- failure of exclusion restriction assumption
- failure of ignorability assumption
- failure of zero compliance-effect covariance assumption
- finite sample bias
- weak instruments cause 3 problems:
  - exacerbate bias due to failure of assumptions (exclusion restriction, ignorability, zero covariance)
  - exacerbate finite sample bias
  - lead to incorrect estimation of standard errors when using two-stage least squares

failure of the exclusion restriction
[diagram: Z_i -($\gamma_i$)-> T_i -($\delta_i$)-> Y_i]
- recall that the exclusion restriction says that the only way that Z affects Y is through its effect on T.
- as a result, we can write
  $$\beta_i = \gamma_i \delta_i$$
  where $\beta_i$, $\gamma_i$ and $\delta_i$ are the person-specific effects of Z on Y, of Z on T, and of T on Y.

failure of the exclusion restriction
[diagram: Z_i -($\gamma_i$)-> T_i -($\delta_i$)-> Y_i, plus a direct path Z_i -($\lambda_i$)-> Y_i]
- if the exclusion restriction is violated, then there is some other path through which Z affects Y
- as a result, we can write
  $$\beta_i = \gamma_i \delta_i + \lambda_i$$
  where $\lambda_i$ is the effect of Z on Y that does not operate through T.

failure of the zero covariance assumption
- averaging the above in the population
  $$E[\beta_i] = E[\gamma_i] E[\delta_i] + \mathrm{Cov}(\gamma_i, \delta_i) + E[\lambda_i]$$
- now, dividing through by $E[\gamma_i]$, we get
  $$\frac{E[\beta_i]}{E[\gamma_i]} = E[\delta_i] + \frac{\mathrm{Cov}(\gamma_i, \delta_i)}{E[\gamma_i]} + \frac{E[\lambda_i]}{E[\gamma_i]}$$
  - $E[\lambda_i] / E[\gamma_i]$: bias due to failure of the exclusion restriction
  - $\mathrm{Cov}(\gamma_i, \delta_i) / E[\gamma_i]$: bias due to failure of the zero compliance-effect covariance assumption
- so the IV estimator (the ratio of the average effect of Z on Y to the average effect of Z on T) will be biased
- if $E[\gamma_i]$ is small, the biases will be larger

failure of the zero covariance assumption
- if all the assumptions except the zero compliance-effect covariance assumption are met, we have
  $$\frac{E[\beta_i]}{E[\gamma_i]} = \frac{E[\gamma_i \delta_i]}{E[\gamma_i]}$$
- so the IV model will estimate the compliance-weighted average treatment effect (CWATE).
- if T is binary and there are no defiers, this will be the same as the average effect among the compliers (LATE), because non-compliers will get 0 weight.

failure of the ignorability assumption
- if the instrument is not ignorably assigned, then we cannot obtain unbiased estimates of the effect of Z on Y or of the effect of Z on T.
- Thus, the ratio of the two may be biased.

weak instruments
- weak instruments do not, strictly speaking, violate any of the IV assumptions, but they do exacerbate the bias from failures of the other assumptions
- rule of thumb: an instrument is weak if the F-statistic on the instrument(s) from the first stage equation is less than 10.

weak instruments and bias in the IV estimator
- weak instruments cause 3 problems with the IV estimator:
  - exacerbate bias due to failure of the exclusion restriction, ignorability, and monotonicity (no defiers)
  - exacerbate finite sample bias
  - lead to incorrect estimation of standard errors when using two-stage least squares

finite sample bias
- even if the 4 IV assumptions are met, IV estimation is biased unless using an infinite sample
- most pronounced with weak instruments and small samples
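The two-stage procedure and the finite sample bias described above can be sketched together in Python. This is an illustration under assumed conditions, not the workshop's own analysis. The data-generating process, the true effect of 1.0, the sample size of 100, and the use of 10 instruments are all assumptions, chosen because the finite sample bias of two-stage least squares is easiest to see with several weak instruments. The function `tsls` does the two stages by hand and returns only a point estimate; standard errors taken from a by-hand second stage would be wrong.

```python
import numpy as np

def tsls(y, t, Z):
    """Two-stage least squares by hand: point estimate of the effect of t on y."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), Z])            # first stage: intercept + instruments
    coef1, *_ = np.linalg.lstsq(X1, t, rcond=None)
    t_hat = X1 @ coef1                               # fitted values of the treatment
    X2 = np.column_stack([np.ones(n), t_hat])        # second stage uses t_hat, not t
    coef2, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return coef2[1]

def first_stage_f(t, Z):
    """F-statistic for the joint significance of the instruments in the first stage."""
    n, k = Z.shape
    X1 = np.column_stack([np.ones(n), Z])
    coef1, *_ = np.linalg.lstsq(X1, t, rcond=None)
    resid = t - X1 @ coef1
    rss1 = resid @ resid
    rss0 = ((t - t.mean()) ** 2).sum()
    return ((rss0 - rss1) / k) / (rss1 / (n - k - 1))

def simulate(pi, n=100, k=10, reps=500, seed=0):
    """Median 2SLS estimate and median first-stage F over repeated small samples."""
    rng = np.random.default_rng(seed)
    estimates, f_stats = [], []
    for _ in range(reps):
        Z = rng.normal(size=(n, k))                  # randomly assigned instruments
        u = rng.normal(size=n)                       # unobserved confounder
        t = Z @ np.full(k, pi) + u + rng.normal(size=n)
        y = 1.0 * t + 2.0 * u + rng.normal(size=n)   # true effect of t on y is 1.0
        estimates.append(tsls(y, t, Z))
        f_stats.append(first_stage_f(t, Z))
    return float(np.median(estimates)), float(np.median(f_stats))

weak_est, weak_f = simulate(pi=0.05)    # each instrument barely moves the treatment
strong_est, strong_f = simulate(pi=0.5)
print("weak:  ", round(weak_est, 2), "first-stage F", round(weak_f, 1))
print("strong:", round(strong_est, 2), "first-stage F", round(strong_f, 1))
```

In this setup all four IV assumptions hold by construction, so any gap between the median estimate and 1.0 is finite sample bias. The weak-instrument run should show a first-stage F well under 10 and an estimate pulled toward the confounded OLS answer, while the strong-instrument run should sit much closer to 1.0.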
additional uses

mediation models
- suppose we randomly assign a treatment (e.g., teacher professional development) that we think will affect student learning by affecting instructional practice
- we can treat the PD as an instrument, and the mediator (instructional practice) as the ‘treatment’, and use IV to estimate the effect of instructional practice (which can’t be randomized) on learning
- but worry about exclusion restriction (are there other ways that the PD could affect learning?)

multiple mediator models
- suppose we randomize students to 3 treatment conditions, so that two treatment indicators ($Z_1$, $Z_2$) can serve as instruments for two mediators ($M_1$, $M_2$).
- two first stage equations:
  $$M_{1i} = a_1 + \gamma_{11} Z_{1i} + \gamma_{12} Z_{2i} + v_{1i}$$
  $$M_{2i} = a_2 + \gamma_{21} Z_{1i} + \gamma_{22} Z_{2i} + v_{2i}$$
- second stage equation:
  $$Y_i = \delta_0 + \delta_1 \hat{M}_{1i} + \delta_2 \hat{M}_{2i} + w_i$$

IV to correct for measurement error
- suppose we want to estimate the effect of cognitive skill on wages:
  $$\text{wage}_i = b_0 + b_1 \, \text{skill}_i + e_i$$
- if cognitive skill is measured with error by ACH, OLS will give a biased estimate of $b_1$.
- if we have a second test of skills, we can use one test as an instrument for the second test, and then use the predicted value of the second test in the wage equation.
- called “errors-in-variables” (EIV) model.
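The measurement-error case can be illustrated with a small Python simulation. This is a sketch under stated assumptions: the true skill effect of 0.5, the variable names `ach1` and `ach2`, and the classical measurement error (independent across the two tests and independent of the wage error) are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

skill = rng.normal(size=n)                         # true cognitive skill (unobserved)
ach1 = skill + rng.normal(size=n)                  # first test: skill plus independent error
ach2 = skill + rng.normal(size=n)                  # second test: skill plus independent error
wage = 1.0 + 0.5 * skill + rng.normal(size=n)      # assumed true effect of skill is 0.5

def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# OLS of wages on one noisy test score is attenuated toward zero.
ols = ols_slope(ach2, wage)

# IV: use the first test as an instrument for the second test.
iv = np.cov(ach1, wage)[0, 1] / np.cov(ach1, ach2)[0, 1]

print("OLS slope:", round(ols, 3))
print("IV slope: ", round(iv, 3))
```

With equal variances for skill and test error, each test has a reliability of one half, so OLS should recover only about half of the true coefficient. The IV ratio uses only the variation the two tests share, which is the true skill component, so it should land near 0.5.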