IV/2SLS models

$z_i = 0$: $\bar{x}_0 = 0.80$
$z_i = 1$: $\bar{x}_1 = 0.57$

$\bar{y}_0 = 3186$
$\bar{y}_1 = 3278$

$$\hat{\beta}_1 = \frac{\bar{y}_1 - \bar{y}_0}{\bar{x}_1 - \bar{x}_0} = \frac{3278 - 3186}{0.57 - 0.80} = -400$$

Vietnam era service
• Defined as 1964-1975
• Estimated 8.7 million served during era
• 3.4 million were in SE Asia
• 2.6 million served in Vietnam
• 1.6 million saw combat
• 203K wounded in action, 153K hospitalized
• 58,000 deaths
• http://www.history.navy.mil/library/online/american%20war%20casualty.htm#t7

Vietnam Era Draft
• In the 1st part of the war, operated like the WWII and Korean War drafts
• At age 18, men report to local draft boards
• Could receive deferment for a variety of reasons (kids, attending school)
• If available for service, pre-induction physical and tests
• Military needs determined the number drafted

• Everyone drafted went to the Army
• Local draft boards filled the Army
• Priorities
  – Delinquents, volunteers, non-vol. 19-25
  – For non-vol., determined by age
• College enrollment a powerful way to avoid service
  – Men w. college degree 1/3 less likely to serve

Draft Lottery
• Proposed by Nixon
• Passed in Nov 1969, 1st lottery Dec 1, 1969
• 1st lottery for men age 19-26 on 1/1/70
  – Men born 1944-1950
• Randomly assigned number 1-365, Draft Lottery Number (DLN)
• Military estimates needs, sets threshold T
• If DLN<=T, drafted

Questions?
• What are the research questions?
• Why can we NOT obtain estimates from observational data?

• If volunteer, could get better assignment
• Thresholds for service

  Draft    Year of Birth    Threshold
  1970     1946-50          195
  1971     1951             125
  1972     1952             95

• Draft suspended in 1973

Angrist/Evans

[Figure: Female Labor Force Participation Rate, 1948-2002. y-axis: percent in labor force (0-70); x-axis: year]

. * get correlation coefficient between;
. * instrument and endogenous RHS variable;
. corr morekids samesex;
(obs=254654)

             | morekids  samesex
-------------+------------------
    morekids |   1.0000
     samesex |   0.0695   1.0000

Correlation coefficient: 0.0695

Ratio of variances = (0.0020246/0.0291242)^2 = 0.004832484

R2 = 290.247937/60030.836855 = 0.00483
βiv = -0.0092924/0.0675253 = -0.1376

Reduced form, just identified model

First stage, just identified model

2SLS, just identified model
βiv = -0.0083481/0.0693854 = -0.120315

1st stage over identified model

ivreg2
• Download from the web
• Within Stata, type ssc install ivreg2, replace
• and hit return
• Does all the tests seamlessly

* the syntax is ivreg2 y w (x=z), first endog(x);
* the first option asks stata to report the 1st stage, and;
* endog(x) asks stata to do the hausman-wu test of endogeneity;
ivreg2 workedm boy1st boy2nd agem1 agefstm black hispan othrace
    (morekids=samesex), first endog(morekids);

• workedm: outcome of interest
• boy1st boy2nd agem1 agefstm black hispan othrace: W's (exogenous covariates)
• (morekids=samesex): endogenous variable and instruments
• first: ask for 1st stage
• endog(morekids): test for endogeneity of morekids in model

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =   254654
                                                      F(  8,254645) =   865.24
                                                      Prob > F      =   0.0000
Total (centered) SS     =  63460.72056                Centered R2   =   0.0482
Total (uncentered) SS   =       134513                Uncentered R2 =   0.5510
Residual SS             =  60402.67924                Root MSE      =     .487

------------------------------------------------------------------------------
     workedm |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    morekids |  -.1203151   .0278407    -4.32   0.000    -.1748818   -.0657483
      boy1st |   .0009211   .0019489     0.47   0.636    -.0028986    .0047409
      boy2nd |  -.0048314   .0019425    -2.49   0.013    -.0086386   -.0010241
       agem1 |   .0219352   .0009013    24.34   0.000     .0201687    .0237018
     agefstm |  -.0264911   .0012647   -20.95   0.000    -.0289698   -.0240124
       black |   .1899764   .0047674    39.85   0.000     .1806325    .1993203
      hispan |  -.0139081   .0053812    -2.58   0.010    -.0244551   -.0033611
     othrace |   .0443545   .0048137     9.21   0.000     .0349198    .0537891
       _cons |   .4498966   .0138562    32.47   0.000     .4227389    .4770543
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic):        1405.578
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):             1413.330
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):           0.000

OLS estimation
--------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =   254654
                                                      F(  8,254645) =  2825.70
                                                      Prob > F      =   0.0000
Total (centered) SS     =  60030.83676                Centered R2   =   0.0815
Total (uncentered) SS   =        96912                Uncentered R2 =   0.4311
Residual SS             =   55136.2215                Root MSE      =    .4653

------------------------------------------------------------------------------
    morekids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      boy1st |  -.0015753   .0026228    -0.60   0.548    -.0067158    .0035653
       agem1 |   .0304246    .000298   102.09   0.000     .0298405    .0310087
     agefstm |  -.0435676   .0003462  -125.85   0.000    -.0442461   -.0428891
       black |   .0679715   .0041853    16.24   0.000     .0597684    .0761747
      hispan |    .125998   .0038974    32.33   0.000     .1183591    .1336369
     othrace |   .0479479   .0044209    10.85   0.000      .039283    .0566127
     twoboys |   .0598382   .0025731    23.26   0.000     .0547951    .0648813
    twogirls |   .0789326   .0026467    29.82   0.000     .0737452      .08412
       _cons |   .3138696   .0092684    33.86   0.000     .2957038    .3320353
------------------------------------------------------------------------------
Included instruments: boy1st agem1 agefstm black hispan othrace twoboys twogirls
------------------------------------------------------------------------------
F test of excluded instruments:
  F(  2,254645) =   715.13          (1st stage F)
  Prob > F      =   0.0000
Angrist-Pischke multivariate F test of excluded instruments:
  F(  2,254645) =   715.13
  Prob > F      =   0.0000

Summary results for first-stage regressions
-------------------------------------------

                                           (Underid)            (Weak id)
Variable     | F(  2,254645)  P-val | AP Chi-sq(  2) P-val | AP F(  2,254645)
morekids     |     715.13    0.0000 |     1430.31   0.0000 |      715.13

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =   254654
                                                      F(  7,254646) =   987.26
                                                      Prob > F      =   0.0000
Total (centered) SS     =  63460.72056                Centered R2   =   0.0475
Total (uncentered) SS   =       134513                Uncentered R2 =   0.5506
Residual SS             =  60445.97117                Root MSE      =    .4872

------------------------------------------------------------------------------
     workedm |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    morekids |  -.1127816   .0276854    -4.07   0.000     -.167044   -.0585193
      boy1st |   .0009424   .0019496     0.48   0.629    -.0028786    .0047635
       agem1 |   .0217057   .0008969    24.20   0.000     .0199478    .0234635
     agefstm |  -.0261649   .0012583   -20.79   0.000    -.0286312   -.0236987
       black |   .1895035   .0047653    39.77   0.000     .1801637    .1988433
      hispan |   -.014818   .0053707    -2.76   0.006    -.0253444   -.0042916
     othrace |   .0439784    .004813     9.14   0.000      .034545    .0534118
       _cons |   .4448388   .0137111    32.44   0.000     .4179656    .4717121
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic):        1422.320
                                                   Chi-sq(2) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):              715.129
Stock-Yogo weak ID test critical values: 10% maximal IV size             19.93
                                         15% maximal IV size             11.59
                                         20% maximal IV size              8.75
                                         25% maximal IV size              7.25
Source: Stock-Yogo (2005).  Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):           6.182
                                                   Chi-sq(1) P-val =    0.0129
-endog- option:
Endogeneity test of endogenous regressors:                               3.809
                                                   Chi-sq(1) P-val =    0.0510
Regressors tested:    morekids
------------------------------------------------------------------------------
Instrumented:         morekids
Included instruments: boy1st agem1 agefstm black hispan othrace
Excluded instruments: twoboys twogirls
------------------------------------------------------------------------------

• Sargan statistic: test of over id.
• Endogeneity test: Hausman endo test
. * output residuals and do the tests of overid;
. * and hausman test by brute force;
. predict res_2sls_worked, res;
. * test of overid;
. reg res_2sls_worked twoboys twogirls boy1st agem1 agefstm black hispan othrace;

      Source |       SS       df       MS              Number of obs =  254654
-------------+------------------------------           F(  8,254645) =    0.77
       Model |  1.46731447      8  .183414308          Prob > F      =  0.6269
    Residual |  60444.5039 254645  .237367723          R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0000
       Total |  60445.9712 254653  .237366028          Root MSE      =   .4872

------------------------------------------------------------------------------
res_2sls_w~d |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     twoboys |  -.0052822   .0026941    -1.96   0.050    -.0105625   -1.83e-06
    twogirls |   .0042367   .0027711     1.53   0.126    -.0011946    .0096681
      boy1st |    .004822   .0027461     1.76   0.079    -.0005603    .0102043
       agem1 |   3.72e-07    .000312     0.00   0.999    -.0006112     .000612
     agefstm |   2.07e-06   .0003625     0.01   0.995    -.0007084    .0007125
       black |  -.0000392   .0043822    -0.01   0.993    -.0086282    .0085498
      hispan |  -.0000393   .0040807    -0.01   0.992    -.0080375    .0079588
     othrace |   .0000149   .0046288     0.00   0.997    -.0090575    .0090872
       _cons |  -.0021381   .0097043    -0.22   0.826    -.0211583     .016882
------------------------------------------------------------------------------

• SSM = 1.467
• SST = 60445.97
• R2 = SSM/SST = 2.43E-5
• N = 254654
• NR2 = 6.18
• Dist as χ2(1)
• P-value of 6.18 is 0.0129

Do Hausman test brute force

. * Run Hausmans test of endogeneity, two instrument case;
. * add residual from 1st stage regression to OLS of structural model;
. reg workedm morekids boy1st agem1 agefstm black hispan othrace res_1st_2zs;

      Source |       SS       df       MS              Number of obs =  254654
-------------+------------------------------           F(  8,254645) = 1677.06
       Model |  3176.20362      8  397.025453          Prob > F      =  0.0000
    Residual |  60284.5169 254645  .236739449          R-squared     =  0.0500
-------------+------------------------------           Adj R-squared =  0.0500
       Total |  63460.7206 254653  .249204685          Root MSE      =  .48656

------------------------------------------------------------------------------
     workedm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    morekids |  -.1127816   .0276489    -4.08   0.000    -.1669726   -.0585906
      boy1st |   .0009424    .001947     0.48   0.628    -.0028736    .0047585
       agem1 |   .0217057   .0008957    24.23   0.000     .0199501    .0234612
     agefstm |  -.0261649   .0012566   -20.82   0.000    -.0286279   -.0237019
       black |   .1895035    .004759    39.82   0.000      .180176    .1988311
      hispan |   -.014818   .0053636    -2.76   0.006    -.0253305   -.0043054
     othrace |   .0439784   .0048067     9.15   0.000     .0345574    .0533994
 res_1st_2zs |  -.0541136   .0277264    -1.95   0.051    -.1084566    .0002294
       _cons |   .4448388    .013693    32.49   0.000     .4180009    .4716768
------------------------------------------------------------------------------

. * notice that OLS of this model generates 2SLS estimates of the other;
. * variables in the model (morekids, boy1st, etc.);
. test res_1st_2zs;

 ( 1)  res_1st_2zs = 0

       F(  1,254645) =    3.81
            Prob > F =    0.0510

. * Run Hausmans test of endogeneity, one instrument case;
. * add residual from 1st stage regression to OLS of structural model;
. reg workedm morekids boy1st agem1 agefstm black hispan othrace res_1st_2zs;

(Same command and output as the two instrument case above.)

Can reject at the 5.1 percent level the null that the coefficients are the same

Angrist/Krueger

Example
• Suppose a school district requires that a child turn 6 by October 31 in the 1st grade
• Has compulsory education until age 18
• Consider two kids
• One born Oct 1, 1960
• Another born Nov 1, 1960

• Oct 1, 1960
  – Starts school in 1966 (age 5)
  – Turns 6 a few months into school
  – Starts senior year in 1977 (age 16)
  – Does not turn 18 until after high school is over
• Nov 1, 1960
  – Starts school in 1967 (age 6)
  – Turns 7 a few months into school
  – Starts senior year in 1978 (age 17)
  – Turns 18 midway through senior year

. * get reduced-forms for wald estimate;
. * compare to table III, panel B;
. reg educ qob1;

      Source |       SS       df       MS              Number of obs =  329509
-------------+------------------------------           F(  1,329507) =   67.57
       Model |  727.393312      1  727.393312          Prob > F      =  0.0000
    Residual |  3546940.27 329507  10.7643852          R-squared     =  0.0002
-------------+------------------------------           Adj R-squared =  0.0002
       Total |  3547667.66 329508    10.76656          Root MSE      =  3.2809

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        qob1 |  -.1088179   .0132376    -8.22   0.000    -.1347633   -.0828725
       _cons |   12.79688   .0065904  1941.75   0.000     12.78397     12.8098
------------------------------------------------------------------------------

1st stage

. reg earnwkl qob1;

      Source |       SS       df       MS              Number of obs =  329509
-------------+------------------------------           F(  1,329507) =   16.42
       Model |  7.56705582      1  7.56705582          Prob > F      =  0.0001
    Residual |    151830.3 329507  .460780197          R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |  151837.867 329508  .460801763          Root MSE      =  .67881

------------------------------------------------------------------------------
     earnwkl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        qob1 |  -.0110989   .0027388    -4.05   0.000    -.0164669   -.0057309
       _cons |   5.902694   .0013635  4329.00   0.000     5.900022    5.905367
------------------------------------------------------------------------------

Reduced-form

βiv = reduced form / 1st stage = -0.0110989/-0.1088179 = 0.10199

. * get correlation coefficient for;
. * educ and qob1;
. corr educ qob1;
(obs=329509)

             |     educ     qob1
-------------+------------------
        educ |   1.0000
        qob1 |  -0.0143   1.0000

Correlation coefficient: z and x

[Figure: % of Mothers that Smoked During Pregnancy by Birth Month of their Child. y-axis: % smoked (11.0%-14.0%); x-axis: month, JAN-DEC]

[Figure: Average Birth weight by Birth Month. y-axis: birth weight in grams (3280-3340); x-axis: month, JAN-DEC]

Overidentified model
• 10 years of birth
• 3 quarters of birth
• 30 instruments

. * get dummies needed for the models;
. xi i.yob*i.qob;
i.yob             _Iyob_30-39         (naturally coded; _Iyob_30 omitted)
i.qob             _Iqob_1-4           (naturally coded; _Iqob_1 omitted)
i.yob*i.qob       _IyobXqob_#_#       (coded as above)

The xi command takes i.m*i.n and generates dummies for i.m and i.n, then all the unique interactions of m and n

. * run 2sls, qob times yob interactions as instruments;
. * compare to column (2), table V;
. ivregress 2sls earnwkl _Iyob_* (educ=_Iqob* _IyobX*);

• _Iyob_*: YOB effects
• (educ=_Iqob* _IyobX*): QOB main effects and qob x yob interactions as instruments

Instrumental variables (2SLS) regression               Number of obs =  329509
                                                       Wald chi2(10) =   41.67
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.1102
                                                       Root MSE      =  .64034

------------------------------------------------------------------------------
     earnwkl |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0891154   .0161098     5.53   0.000     .0575408    .1206901
    _Iyob_31 |  -.0088813   .0055293    -1.61   0.108    -.0197185    .0019558
         (some results omitted)
    _Iyob_39 |  -.0585271   .0104573    -5.60   0.000    -.0790231   -.0380311
       _cons |   4.792727   .2006807    23.88   0.000       4.3994    5.186054
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   _Iyob_31 _Iyob_32 _Iyob_33 _Iyob_34 _Iyob_35 _Iyob_36 _Iyob_37
               _Iyob_38 _Iyob_39 _Iqob_2 _Iqob_3 _Iqob_4 _IyobXqob_31_2
               _IyobXqob_31_3 _IyobXqob_31_4 _IyobXqob_32_2 _IyobXqob_32_3
               _IyobXqob_32_4 _IyobXqob_33_2 _IyobXqob_33_3 _IyobXqob_33_4
               _IyobXqob_34_2 _IyobXqob_34_3 _IyobXqob_34_4 _IyobXqob_35_2
               _IyobXqob_35_3 _IyobXqob_35_4 _IyobXqob_36_2 _IyobXqob_36_3
               _IyobXqob_36_4 _IyobXqob_37_2 _IyobXqob_37_3 _IyobXqob_37_4
               _IyobXqob_38_2 _IyobXqob_38_3 _IyobXqob_38_4 _IyobXqob_39_2
               _IyobXqob_39_3 _IyobXqob_39_4

. estat overid;

  Tests of overidentifying restrictions:

  Sargan (score) chi2(29) =  25.4394  (p = 0.6553)
  Basmann chi2(29)        =  25.4383  (p = 0.6553)

. estat firststage;

  First-stage regression summary statistics
  --------------------------------------------------------------------------
               |            Adjusted      Partial
      Variable |   R-sq.       R-sq.        R-sq.   F(30,329469)   Prob > F
  -------------+------------------------------------------------------------
          educ |  0.0033      0.0032       0.0004      4.90707      0.0000
  --------------------------------------------------------------------------

1st stage F: lots of concerns about finite-sample bias

In columns (4) and (8), age and agesq reduce the information contained in the instrument. The 1st stage F falls to 1.6. Compare 2SLS to OLS in these cases.

In this instance, with a low F (poor 1st stage fit), results collapse to OLS

Notice how close the 2SLS and OLS are

Generate instruments by interacting
• 3 QOB x 10 YOB dummies (30)
• 3 QOB x 50 YOB dummies (147)
• 177 instruments, 176 DOF in NR2 test
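The degrees of freedom quoted for the overidentification tests in these slides all appear to follow one counting rule, sketched here as a worked calculation. The symbols $L$ (number of excluded instruments) and $K$ (number of endogenous regressors) are notation introduced only for this illustration; they do not appear in the slides.

```latex
\[
  \text{df}_{\text{overid}} = L - K
\]
\begin{align*}
  L = 2,\ K = 1 &: \quad 2 - 1 = 1
    && \text{(twoboys, twogirls; Sargan } \chi^2(1)\text{)} \\
  L = 30,\ K = 1 &: \quad 30 - 1 = 29
    && \text{(QOB and QOB} \times \text{YOB dummies; } \chi^2(29)\text{)} \\
  L = 30 + 147 = 177,\ K = 1 &: \quad 177 - 1 = 176
    && \text{(}NR^2\text{ test)}
\end{align*}
```

With a single instrument (samesex) the count is $1 - 1 = 0$, which is consistent with the Sargan statistic being reported as 0.000 in the just identified model: there are no overidentifying restrictions left to test.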