Instrumental Variables I

Objective
We are trying to learn the effect of education on income.
• We have Card (1993)’s data on years of schooling, wages, proximity to a four-year college and various other controls.
• We will obtain OLS and IV estimates of the returns to education and discuss any problems, both in this particular context and in general.

OLS Results
. reg lwage educ exper expersq black smsa smsa66 south reg66*, robust

Linear regression                                      Number of obs =    3010
                                                       F( 15,  2994) =   91.31
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2998
                                                       Root MSE      =  .37228

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0746933   .0036462    20.48   0.000     .0675439    .0818427
       exper |    .084832   .0067548    12.56   0.000     .0715875    .0980765
     expersq |   -.002287   .0003194    -7.16   0.000    -.0029133   -.0016608
       black |  -.1990123   .0181644   -10.96   0.000    -.2346282   -.1633964
        smsa |   .1363845   .0192172     7.10   0.000     .0987042    .1740648
      smsa66 |   .0262417   .0185908     1.41   0.158    -.0102102    .0626937
       south |   -.147955   .0280346    -5.28   0.000     -.202924    -.092986
      reg661 |  -.1405174   .0451252    -3.11   0.002     -.228997   -.0520378
      reg662 |  -.0441502   .0372945    -1.18   0.237    -.1172756    .0289751
……
------------------------------------------------------------------------------

Are you surprised?
What is the OLS identification assumption?
What sources of bias are likely to be present?
In which direction are these sources likely to bias our estimates?

What do we require for an instrument to be valid?
1. Relevance: cov(z, x) ≠ 0
   – Important because if the instrument isn’t correlated with the endogenous variable, then knowing the value of the instrument tells us nothing about the endogenous variable.
   – Do we care about the unconditional correlation or the correlation conditional on the other controls? Why?
   – Can we test this? How?
2. Exogeneity: cov(z, e) = 0
   – Important because we want the instrument to affect the outcome (y) only through x.
   – Can we test this? If not, what do we do instead?
   – How does this assumption relate to the key OLS identification assumption?

Testing Relevance
How can we test the relevance of an instrument?
1. Calculate corr(x, z)
   – Better than nothing but not ideal. Why?
2. Run the ‘first stage’ regression
   – What should we include?
   – What do we look at?
   – What if we have more than one instrument?
   – What if we have more than one endogenous variable?
3. Use the post-estimation commands after estimating our main regression.
We’ll do (2) today.

1st Stage Results
. reg educ nearc4 exper expersq black smsa smsa66 south reg66*, robust
note: reg666 omitted because of collinearity

Linear regression                                      Number of obs =    3010
                                                       F( 15,  2994) =  244.92
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4771
                                                       Root MSE      =  1.9405

------------------------------------------------------------------------------
             |               Robust
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3198989   .0850763     3.76   0.000      .153085    .4867128
       exper |  -.4125334   .0320751   -12.86   0.000    -.4754249   -.3496418
     expersq |   .0008686   .0017076     0.51   0.611    -.0024795    .0042167
...

Where do we look to test the Relevance condition? Is it satisfied?

First-Stage F
A ‘first-stage F-statistic’ in excess of 10 is often used as the threshold for satisfying the Relevance condition.
• What do we mean by a first-stage F-statistic?
• Can we see it on the previous slide?
  – We can, but not directly. In general you can use Stata’s ‘test’ command.

How plausible is it that nearc4 is exogenous?
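A minimal sketch of how the first-stage F-statistic discussed above could be obtained in Stata, assuming the Card data are already loaded in memory with the variable names used in these slides. With a single instrument, the F-statistic on the excluded instrument is the square of its robust t-statistic, so the first-stage table implies roughly 3.76² ≈ 14.1.

```stata
* Assumes the Card data are already in memory (variable names as in the slides).

* First stage: regress the endogenous variable on the instrument and all controls.
regress educ nearc4 exper expersq black smsa smsa66 south reg66*, robust

* F-test on the excluded instrument only (not the overall regression F in the header).
test nearc4

* Alternative: first-stage diagnostics reported after the 2SLS regression itself.
ivregress 2sls lwage (educ = nearc4) exper expersq black smsa smsa66 south reg66*, robust
estat firststage
```

With several instruments, the same `test` command would list all of them, giving a joint F-test of the excluded instruments.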
IV Results
. ivregress 2sls lwage (educ=nearc4) exper expersq black smsa smsa66 south reg66*, robust
note: reg669 omitted because of collinearity

Instrumental variables (2SLS) regression               Number of obs =    3010
                                                       Wald chi2(15) =  840.83
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.2382
                                                       Root MSE      =   .3873

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1315038   .0539995     2.44   0.015     .0256667     .237341
       exper |   .1082711   .0233466     4.64   0.000     .0625127    .1540295
     expersq |  -.0023349   .0003478    -6.71   0.000    -.0030167   -.0016532
       black |  -.1467757   .0523622    -2.80   0.005    -.2494038   -.0441477
        smsa |   .1118083   .0310619     3.60   0.000      .050928    .1726886
      smsa66 |   .0185311   .0205103     0.90   0.366    -.0216684    .0587306
       south |  -.1446715   .0290653    -4.98   0.000    -.2016385   -.0877045
      reg661 |  -.1078142   .0409668    -2.63   0.008    -.1881077   -.0275208
...

How have the results changed? Are they what you expect? What explanations could there be for the differences?

Does the exclusion of IQ break the exogeneity condition?
. reg IQ nearc4

      Source |       SS       df       MS              Number of obs =    2061
-------------+------------------------------           F(  1,  2059) =   12.13
       Model |  2869.62905     1  2869.62905           Prob > F      =  0.0005
    Residual |  487188.423  2059  236.614096           R-squared     =  0.0059
-------------+------------------------------           Adj R-squared =  0.0054
       Total |  490058.052  2060  237.892258           Root MSE      =  15.382

------------------------------------------------------------------------------
          IQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |     2.5962   .7454966     3.48   0.001     1.134195    4.058206
       _cons |   100.6106   .6274557   160.35   0.000     99.38014    101.8412
------------------------------------------------------------------------------

How about now?
. reg IQ nearc4 smsa66 reg662-reg669

      Source |       SS       df       MS              Number of obs =    2061
-------------+------------------------------           F( 10,  2050) =   13.70
       Model |  30699.1017    10  3069.91017           Prob > F      =  0.0000
    Residual |  459358.951  2050  224.077537           R-squared     =  0.0626
-------------+------------------------------           Adj R-squared =  0.0581
       Total |  490058.052  2060  237.892258           Root MSE      =  14.969

------------------------------------------------------------------------------
          IQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3478974   .8144087     0.43   0.669    -1.249257    1.945052
      smsa66 |   1.089165   .8086998     1.35   0.178    -.4967934    2.675124
      reg662 |   1.099282   1.649748     0.67   0.505    -2.136074    4.334639
      reg663 |  -1.559295   1.622997    -0.96   0.337    -4.742191      1.6236
      reg664 |  -.5425011   1.916258    -0.28   0.777    -4.300517    3.215515
      reg665 |   -8.47546   1.665513    -5.09   0.000    -11.74173   -5.209185
      reg666 |  -7.421172   1.973869    -3.76   0.000    -11.29217   -3.550175
      reg667 |   -8.39441   1.829768    -4.59   0.000    -11.98281   -4.806013
      reg668 |  -2.924975    2.34463    -1.25   0.212     -7.52308     1.67313
      reg669 |  -2.891917   1.797382    -1.61   0.108    -6.416801    .6329674
       _cons |   104.7735   1.624972    64.48   0.000     101.5867    107.9602
------------------------------------------------------------------------------

Do we believe the IV results?
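One way to frame this last question is through the simplest version of the IV estimator, with one instrument and no controls. The sketch below uses the slides' notation (x for education, z for the instrument, e for the error) and introduces two symbols that are assumptions for illustration: y for log wages and β for the return to education.

```latex
\hat{\beta}_{IV}
  = \frac{\widehat{\operatorname{cov}}(z, y)}{\widehat{\operatorname{cov}}(z, x)}
  \;\xrightarrow{\;p\;}\;
  \beta + \frac{\operatorname{cov}(z, e)}{\operatorname{cov}(z, x)}
```

With controls, the same expression holds after partialling the controls out of z, x and y. The second term shows that any failure of exogeneity is divided by the first-stage covariance, so a weak first stage magnifies even a small correlation between z and e.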