Endogeneity in Econometrics: Instrumental Variable Estimation
Ming LU

Endogeneity
• Omitted variable bias
• Simultaneity
• Measurement error

• Can we ignore the omitted variable bias? It can be satisfactory if the estimates are coupled with the direction of the biases for the key parameters.
• Can we use a proxy to eliminate omitted variable bias? Sometimes.
• Can FE estimation solve the omitted variable problem? First differencing or fixed effects estimation eliminates time-constant omitted variables. However, panel data methods do not solve the problem of time-varying omitted variables.

Idea of IV Estimation
• Exogenous variable: the instrument is uncorrelated with the error term.
• Indirect effect of the IV: the instrument affects y only through the endogenous explanatory variable.

Example: What can serve as an IV for edu?
• Mother's education?
• Number of siblings?
• The report of others?
• A dummy variable that is equal to 1 if a man is born in the first quarter of the year. Angrist and Krueger (1991). (Problematic.)
• In China, the years of primary edu?

IV for skipped class?
• The distance from home to school.

Other examples of IV
• IV for institutions: Language? History?
• Mauro (1995) uses the ethnic and linguistic composition of the population as an instrument for corruption. Hall and Jones (1999) use distance from the equator and the extent to which a Western European language is the first language as instruments for institutional quality. La Porta et al. (1997, 1998, 1999) use legal origin as an instrument for various legal structures. Acemoglu, Johnson, and Robinson (2001, 2002) use mortality rates and population density in the colonial era (around 1500) as instruments for institutions.
• IV for school choice: Number of streams?

Identification
• Refer to (15.9) and (15.10).
The (asymptotic) standard error of the IV estimator is the square root of $\hat{\sigma}^2 / (SST_x \cdot R^2_{x,z})$, where $SST_x$ is the total sum of squares of the $x_i$ and $R^2_{x,z}$ is the R-squared from regressing $x_i$ on $z_i$.

Self-selection
• Angrist (1990) studied the effect that being a veteran of the Vietnam War had on lifetime earnings.
• The draft lottery number is a good IV candidate for veteran status.
• Some additional words about natural experiments and difference-in-differences (DID).

Properties of IV with a Poor Instrumental Variable
A poor IV can cause serious bias.

R-squared
• Most regression packages compute an R-squared after IV estimation, using the standard formula $R^2 = 1 - SSR/SST$, where SSR is the sum of squared IV residuals and SST is the total sum of squares of y.
• $R^2$ can be negative in this case, because the SSR from the IV residuals can exceed SST.

IV ESTIMATION OF THE MULTIPLE REGRESSION MODEL
• Structural equation

Estimation

Efficient IV
Equation (15.26) is an example of a reduced form equation, which means that we have written an endogenous variable in terms of exogenous variables.

TWO STAGE LEAST SQUARES

2SLS in words
• The first stage is to run the regression in (15.36), where we obtain the fitted values $\hat{y}_2$.
• The second stage is the OLS regression (15.38). Because we use $\hat{y}_2$ in place of $y_2$, the 2SLS estimates can differ substantially from the OLS estimates.
• Another interpretation:

Multiple Endogenous Explanatory Variables
• ORDER CONDITION FOR IDENTIFICATION OF AN EQUATION:
• We need at least as many excluded exogenous variables as there are included endogenous explanatory variables in the structural equation.

IV SOLUTIONS TO ERRORS-IN-VARIABLES PROBLEMS
One possibility is to obtain a second measurement on $x_1^*$, say $z_1$, and use it as an IV.
An alternative is to use other exogenous variables as IVs for a potentially mismeasured variable.

TESTING FOR ENDOGENEITY AND TESTING OVERIDENTIFYING RESTRICTIONS
• The 2SLS estimator is less efficient than OLS when the explanatory variables are exogenous; as we have seen, the 2SLS estimates can have very large standard errors.

How to test endogeneity?
• 1. Compare the OLS and 2SLS estimates and determine whether the differences are statistically significant (Hausman, 1978).
• 2. A regression test:

Another interpretation of 2SLS
• Including $\hat{v}_2$ in the OLS regression (15.51) clears up the endogeneity of $y_2$.
• We can also test for endogeneity of multiple explanatory variables. For each suspected endogenous variable, we obtain the reduced form residuals. Then we test for joint significance of these residuals in the structural equation, using an F test.

Testing Overidentification Restrictions
• If we have more than one instrumental variable, we can effectively test whether some of them are uncorrelated with the structural error.
• Estimate the equation using one IV and obtain the residuals, then test whether the other IVs are correlated with these residuals.

TESTING OVERIDENTIFYING RESTRICTIONS:
• (i) Estimate the structural equation by 2SLS and obtain the 2SLS residuals, $\hat{u}_1$.
• (ii) Regress $\hat{u}_1$ on all exogenous variables. Obtain the R-squared, say $R_1^2$.
• (iii) Under the null hypothesis that all IVs are uncorrelated with $u_1$, $nR_1^2 \overset{a}{\sim} \chi^2_q$, where q is the number of instrumental variables from outside the model minus the total number of endogenous explanatory variables. If $nR_1^2$ exceeds (say) the 5% critical value in the $\chi^2_q$ distribution, we reject H0 and conclude that at least some of the IVs are not exogenous.

Is it better to have more IVs?
• Adding instruments to the list improves the asymptotic efficiency of 2SLS. But this requires that any new instruments are in fact exogenous.
• With the typical sample sizes available, adding too many instruments (that is, increasing the number of overidentifying restrictions) can cause severe biases in 2SLS.

Omitted Topics
• 2SLS WITH HETEROSKEDASTICITY
• APPLYING 2SLS TO TIME SERIES EQUATIONS

APPLYING 2SLS TO POOLED CROSS SECTIONS AND PANEL DATA
• For pooled cross-section data: add time dummies.
• For panel data: in the first stage, use the differenced IV to get an estimate of the endogenous variable.
• Question: if the panel model is an FE model, how can we check the efficiency of the IV if the IV is time-invariant?

STATA commands
• To compare OLS and 2SLS:
    ivreg y (x=iv) x2
    est store f2
    reg y x x2
    hausman f2
• The sequence is important: hausman takes the consistent estimates first (here the stored 2SLS results, f2) and compares them with the efficient estimates, which default to the most recent results (here OLS).

STATA commands
• To compare FE and IV-FE:
    xtivreg y (x=iv) x2, fe
    est store f2
    xtreg y x x2, fe
    hausman f2
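
The OLS versus 2SLS comparison above can also be illustrated on simulated data. The following is a minimal sketch in Python (standard library only), not part of the original slides. The data-generating process, the true slope of 2.0, the sample size, and all variable names are assumptions chosen for illustration. It covers only the simplest case of one endogenous regressor x and one instrument z with no other controls, where the IV estimator reduces to Cov(z, y) / Cov(z, x).

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)

def simulate(n=5000, beta=2.0, seed=0):
    """Hypothetical data: x is endogenous because its error v also enters u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)             # instrument, independent of u
        vi = rng.gauss(0, 1)             # first-stage error
        ui = 0.8 * vi + rng.gauss(0, 1)  # structural error, correlated with v
        xi = zi + vi                     # z is relevant for x
        z.append(zi)
        x.append(xi)
        y.append(1.0 + beta * xi + ui)
    return z, x, y

z, x, y = simulate()

# OLS ignores the correlation between x and u, so it is biased here.
b_ols = ols_slope(x, y)

# Simple IV estimator: Cov(z, y) / Cov(z, x).
b_iv = cov(z, y) / cov(z, x)

# 2SLS in two explicit steps.
pi1 = ols_slope(z, x)                    # first stage: regress x on z
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]     # fitted values of x
b_2sls = ols_slope(x_hat, y)             # second stage: regress y on x_hat

print("OLS slope: ", round(b_ols, 3))
print("IV slope:  ", round(b_iv, 3))
print("2SLS slope:", round(b_2sls, 3))
```

With a single instrument, the two-step 2SLS slope coincides with the simple IV slope up to rounding, while the OLS slope is pulled away from the true value by the correlation between x and u. Note that the standard errors from a manually run second stage are not valid, which is one reason to rely on a dedicated IV routine in practice.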