Instrumental variables
Anant Nyshadham

Instrumental Variables
• What is a natural experiment?
  – “situations where the forces of nature or government policy have conspired to produce an environment somewhat akin to a randomized experiment”
    • Angrist and Krueger (2001, p. 73)
• Natural experiments can provide a useful source of exogenous variation in problematic regressors
  – But they require detailed institutional knowledge

Instrumental Variables and Natural Experiments
• Some natural experiments in economics
  – Existing policy differences, or changes that affect some jurisdictions (or groups) but not others
    • Minimum wage rate
    • Excise taxes on consumer goods
    • Unemployment insurance, workers’ compensation
  – Unexpected “shocks” to the local economy
    • Coal prices and the Middle East oil embargo (1973)
    • Agricultural production and adverse weather events

Instrumental Variables and Natural Experiments
• Some potential pitfalls
  – Not all policy differences/changes are exogenous
    • Political factors and past realizations of the response variable can affect existing policies or policy changes
  – Generalizability of causal effect estimates
    • Results may not generalize beyond the units under study
  – Heterogeneity in causal effects
    • Results may be sensitive to the natural experiment chosen in a specific study (local average treatment effect, L.A.T.E.)

Instrumental Variables and Natural Experiments
• Some natural experiments used as IV’s that are of interest to development economists
  – Acemoglu, Johnson & Robinson (2001): settler mortality
  – Paxson (1992): rainfall
  – Schultz & Tansel (1997): healthcare prices

True Model
• Suppose the true model is:
  – Y = a + bX + cV + e
    • a, b, and c are parameters to be estimated; e is the error term
• We do not observe V
• We can only estimate:
  – Y = a + bX + e
• What do we do to recover the true b rather than a biased estimate of b?
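
To make the problem concrete, here is a standard textbook sketch of what least squares converges to when V is left out of the regression. It assumes that e is uncorrelated with X; the symbol \hat{b}_{OLS} (the slope from the regression of Y on X alone) is notation introduced here for illustration and does not appear in the slides.

```latex
\begin{aligned}
\hat{b}_{OLS} &= \frac{\widehat{\operatorname{Cov}}(X, Y)}{\widehat{\operatorname{Var}}(X)} \\[4pt]
\operatorname{plim}\, \hat{b}_{OLS}
  &= \frac{\operatorname{Cov}(X,\; a + bX + cV + e)}{\operatorname{Var}(X)} \\[4pt]
  &= b + c\,\frac{\operatorname{Cov}(X, V)}{\operatorname{Var}(X)}
     \qquad \text{(using } \operatorname{Cov}(X, e) = 0\text{)}
\end{aligned}
```

Under these assumptions the extra term disappears only if c = 0 (V does not matter for Y) or Cov(X, V) = 0 (V is unrelated to X). Otherwise the estimate of b is off by that term no matter how large the sample is.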
Methods
• Y = a + bX + η; η = cV + e
• Differencing/FE
  – Find groups with common V (assumption), but variation in X
  – Subtract off V to remove it from the error term
• Instrumental Variable
  – Find instrument Z; X = j + kZ + i
  – Predict the portion of X which does not correlate with V
  – Use this portion in the original estimating equation

IV Criteria and Assumptions
• Step/Stage 1: X = j + kZ + i
  – X’ = j’ + k’Z, the fitted values from the first-stage estimates j’ and k’
• Step/Stage 2: Y = a + bX’ + η; recover true b
• Criteria for Z
  – Z must sufficiently predict X: k>>0 or k<<0
    • Testable using the estimate of k from the first stage
  – Z must only impact Y through X
    • Cov(Z,η)=0; Cov(Z,V)=0 & Cov(Z,e)=0
    • Z does not belong in the original estimating equation
    • Assumption, untestable

An IV example: Angrist and Krueger (1991), Q.J.E.
• Returns to education (Y = wages)
  – Problem of omitted “ability bias”
• Years of schooling vary by quarter of birth
  – Compulsory schooling laws, age-at-entry rules
  – Someone born in Q1 is a little older and will be able to drop out sooner than someone born in Q4
• Q.O.B. can be treated as a useful source of exogeneity in schooling

Angrist and Krueger (1991), Q.J.E.
• People born in Q1 do obtain less schooling
  – But pay close attention to the scale of the y-axis
  – Mean difference between Q1 and Q4 is only 0.124 years, or 1.5 months
• So... a large N is needed, since the R² between X and Z will be very small
  – A&K had over 300k observations for the 1930-39 cohort
Source: Angrist and Krueger (1991), Figure I

Angrist and Krueger (1991), Q.J.E.
• Final 2SLS model interacted QOB with year of birth (30), state of birth (150)
  – OLS: b = .0628 (s.e. = .0003)
  – 2SLS: b = .0811 (s.e. = .0109)
• Least squares estimate does not appear to be badly biased by omitted variables
  – But... a replication effort identified some instructive pitfalls in this analysis

Bound, Jaeger, and Baker (1995), J.A.S.A.
• Potential problems with QOB as an IV
  – Correlation between QOB and schooling is weak
    • Small Cov(X,Z) introduces finite-sample bias, which will be exacerbated with the inclusion of many IV’s
  – QOB may not be exogenous (correlated with unobservable determinants of wages, e.g. family income)
  – QOB may not satisfy the exclusion restriction (e.g. age relative to peers changes social dynamics, competition, leadership skill, etc.)

Bound, Jaeger, and Baker (1995), J.A.S.A.
• Even if the instrument is “good,” matters can be made far worse with IV as opposed to LS
  – Weak correlation between IV and endogenous regressor can pose severe finite-sample bias
    • And… really large samples won’t help, especially if there is even weak endogeneity between IV and error
• First-stage diagnostics provide a sense of how good an IV is in a given setting
  – F-test and partial R² on IV’s

Useful Diagnostic Tools for IV Models
• Tests of instrument relevance
  – Weak IV’s → Large variance of b_IV as well as potentially severe finite-sample bias
• Tests of instrument exogeneity
  – Endogenous IV’s → Inconsistency of b_IV that makes it no better (and probably worse) than b_LS
• Durbin-Wu-Hausman test
  – Endogeneity of the problem regressor(s)

Tests of Instrument Relevance
• Diagnostics based on the F-test for the joint significance of the IV’s
  – Nelson and Startz (1990); Staiger and Stock (1997)
  – Bound, Jaeger, and Baker (1995)
• Partial R-square for the IV’s
  – Shea (1997)
• There is a growing econometric literature on the “weak instrument” problem

Tests of Instrument Exogeneity
• Model must be overidentified, i.e., more IV’s than endogenous X’s
  – H0: All IV’s uncorrelated with structural error
• Overidentification test:
  1. Estimate structural model
  2. Regress IV residuals on all exogenous variables
  3. Compute N·R² and compare to chi-square
    • df = # IV’s - # endogenous X’s

Application: Adolescent Work and Delinquent Behavior
• Prior research shows a positive correlation between teenage work and delinquency
  – Reasons to suspect serious endogeneity bias
• 2nd wave of the NLSY97 (N = 8,368)
  – Y = 1 if committed delinquent act (31.9%)
  – X = 1 if worked in a formal job (52.6%)
  – Z1 = 1 if child labor law allows 40+ hours (14.2%)
  – Z2 = 1 if no child labor restriction in place (39.6%)

Regression Model Ignoring Endogeneity

. reg pcrime work if nomiss==1 & wave==2

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  1,  8366) =    6.33
       Model |  1.37395379     1  1.37395379           Prob > F      =  0.0119
    Residual |  1815.97786  8366  .217066443           R-squared     =  0.0008
-------------+------------------------------           Adj R-squared =  0.0006
       Total |  1817.35182  8367  .217204711           Root MSE      =   .4659

------------------------------------------------------------------------------
      pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        work |   .0256633   .0102005     2.52   0.012     .0056677    .0456588
       _cons |   .3053242   .0074009    41.26   0.000     .2908167    .3198318
------------------------------------------------------------------------------

• Teenage workers significantly more delinquent
  – Modest effect but consistent with prior research

First-Stage Model

. reg work law40 nolaw if nomiss==1 & wave==2

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  2,  8365) =  626.64
       Model |  271.829722     2  135.914861           Prob > F      =  0.0000
    Residual |  1814.33364  8365  .216895832           R-squared     =  0.1303
-------------+------------------------------           Adj R-squared =  0.1301
       Total |  2086.16336  8367  .249332301           Root MSE      =  .46572

------------------------------------------------------------------------------
        work |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       law40 |   .0688902   .0154383     4.46   0.000     .0386274     .099153
       nolaw |   .3818684   .0110273    34.63   0.000     .3602521    .4034847
       _cons |   .3655636   .0074883    48.82   0.000     .3508847    .3802425
------------------------------------------------------------------------------

• State child labor laws affect probability of work
  – This is a really strong first stage (F, R²)

Two-Stage Least Squares Model

. ivreg pcrime (work = law40 nolaw) if nomiss==1 & wave==2

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  1,  8366) =    6.86
       Model | -19.5287923     1 -19.5287923           Prob > F      =  0.0088
    Residual |  1836.88061  8366  .219564978           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  1817.35182  8367  .217204711           Root MSE      =  .46858

------------------------------------------------------------------------------
      pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        work |  -.0744352   .0284206    -2.62   0.009    -.1301466   -.0187238
       _cons |   .3580171   .0158135    22.64   0.000     .3270187    .3890155
------------------------------------------------------------------------------
Instrumented:  work
Instruments:   law40 nolaw
------------------------------------------------------------------------------

What Do the Models Suggest Thus Far?
• Completely different conclusions!
  – OLS = Teenage work is criminogenic (b = +.026)
    • Delinquency risk increases by 8.5 percent (base = .305)
  – 2SLS = Teenage work is prophylactic (b = -.074)
    • Delinquency risk decreases by 20.7 percent (base = .358)
• Which model should we believe?
  – We still have some additional diagnostic work to do to evaluate the 2SLS model
    • Overidentification test

Overidentification Test from the Software

. overid

Tests of overidentifying restrictions:
Sargan N*R-sq test     0.509  Chi-sq(1)    P-value = 0.4757
Basmann test           0.508  Chi-sq(1)    P-value = 0.4758

• IV’s jointly pass the exogeneity requirement
  – Notice that -overid- provides a global test, whereas the regression-based approach allows you to test the IV’s jointly as well as individually

So Where Do We Stand with the Work-Delinquency Question?
• Are child labor laws correlated with work?
  – YES = first-stage F is large
• Are child labor laws good IV’s?
  – YES = overidentification test is not rejected
• Is teenage work endogenous?
  – YES = Hausman test is rejected
• Prior research findings that teenage work is criminogenic are selection artifacts

Now… What Happens if I Throw in a Potentially Bogus Instrument?
• Now there are three instrumental variables
  – Z1 = 1 if child labor law allows 40+ hours (14.2%)
  – Z2 = 1 if no child labor restriction in place (39.6%)
  – Z3 = 1 if high unemployment rate in county (20.1%)
• A little more difficult to tell a convincing story that the unemployment rate is only related to delinquency through work experience
  – But let’s see what happens

First-Stage Model

. reg work law40 nolaw highun if nomiss==1 & wave==2

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  3,  8364) =  427.28
       Model |  277.229696     3  92.4098987           Prob > F      =  0.0000
    Residual |  1808.93366  8364  .216276144           R-squared     =  0.1329
-------------+------------------------------           Adj R-squared =  0.1326
       Total |  2086.16336  8367  .249332301           Root MSE      =  .46505

------------------------------------------------------------------------------
        work |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       law40 |   .0636421   .0154519     4.12   0.000     .0333525    .0939317
       nolaw |   .3775975   .0110447    34.19   0.000     .3559472    .3992479
      highun |  -.0636009   .0127283    -5.00   0.000    -.0885517   -.0386502
       _cons |   .3808061   .0080759    47.15   0.000     .3649754    .3966368
------------------------------------------------------------------------------

• So far so good and consistent with expectation

Two-Stage Least Squares Model

. ivreg pcrime (work = law40 nolaw highun) if nomiss==1 & wave==2

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    8368
-------------+------------------------------           F(  1,  8366) =    5.47
       Model | -16.0635514     1 -16.0635514           Prob > F      =  0.0194
    Residual |  1833.41537  8366  .219150773           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  1817.35182  8367  .217204711           Root MSE      =  .46814

------------------------------------------------------------------------------
      pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        work |  -.0657624   .0281159    -2.34   0.019    -.1208765   -.0106483
       _cons |   .3534516   .0156602    22.57   0.000     .3227537    .3841496
------------------------------------------------------------------------------
Instrumented:  work
Instruments:   law40 nolaw highun
------------------------------------------------------------------------------

Post-Hoc Diagnostics

. overid

Tests of overidentifying restrictions:
Sargan N*R-sq test     5.301  Chi-sq(2)    P-value = 0.0706
Basmann test           5.301  Chi-sq(2)    P-value = 0.0706

. ivendog

Tests of endogeneity of: work
H0: Regressor is exogenous
    Wu-Hausman F test:                12.32811  F(1,8365)    P-value = 0.00045
    Durbin-Wu-Hausman chi-sq test:    12.31438  Chi-sq(1)    P-value = 0.00045

• Overidentification gives cause for concern
  – The p-value shouldn’t be anywhere near 0.05

Conclusion from Diagnostic Tests
• 2SLS “work effect” is similar
  – Without unemployment, b = -.074 (s.e. = .028)
  – With unemployment, b = -.066 (s.e. = .028)
• But… the second model is invalidated because the unemployment rate is not exogenous
  – It affects criminality through other channels
    • We need to control for all other indirect pathways, or…
    • It should not be used as an IV at all

Closing Comments about Instrumental Variables Studies
• In general, a lagged value of the endogenous regressor is not a good instrument
  – Traditional structural equation model uses lagged values of X and Y as instruments to break the simultaneity between the current values of X and Y
[Path diagram: X1, X2, Y1, Y2]
  – These models impose the awfully strong assumption that lagged values of X and Y only affect the outcomes through current values

Rules for Good Practice with Instrumental Variables Models
• IV models can be very informative, but it’s your job to convince your audience
  – Show the first-stage model diagnostics
    • Even the most clever IV might not be sufficiently strongly related to X to be a useful source of identification
  – Report test(s) of overidentifying restrictions
    • An invalid IV is often worse than no IV at all
  – Report LS endogeneity (DWH) test

Rules for Good Practice with Instrumental Variables Models
• Most importantly, TELL A STORY about why a particular IV is a “good instrument”
• Something to consider when thinking about whether a particular IV is “good”
  – Does the IV, for all intents and purposes, randomize the endogenous regressor?
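
For readers who want to experiment with the ideas in these slides, the following is a minimal simulated sketch, not the analysis shown above. It assumes Stata 10 or later and uses the built-in ivregress command with its estat postestimation commands, rather than the ivreg, overid, and ivendog commands that appear in the slides. All variable names (v, z, x, y), the seed, the sample size, and the parameter values are hypothetical choices made for illustration.

```stata
* Simulated data: every name and number below is a hypothetical choice.
clear
set seed 20240101
set obs 20000

* Unobserved confounder V and an instrument Z drawn independently of it
generate v = rnormal()
generate z = rnormal()

* First stage: X depends on both the instrument and the confounder
generate x = 1 + 0.5*z + 0.8*v + rnormal()

* True model: Y = a + bX + cV + e with b = 1 and c = 1.5
generate y = 2 + 1.0*x + 1.5*v + rnormal()

* Least squares with V omitted from the regression
regress y x

* Two-stage least squares, instrumenting x with z
ivregress 2sls y (x = z)

* First-stage relevance diagnostics (F statistic, partial R-squared)
estat firststage

* Durbin and Wu-Hausman tests of the exogeneity of x
estat endogenous
```

Because v is left out of the regress call and is positively related to both x and y in this setup, the least squares coefficient on x should sit above the true value of 1, while the ivregress coefficient should be close to 1. With a single instrument the model is just identified, so no overidentification test is available in this sketch; adding a second valid instrument to the simulation would allow estat overid to be run as well.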