Endogeneity in Econometrics: Instrumental Variable Estimation

Ming LU
Endogeneity
• Omitted variable bias
• Simultaneity
• Measurement error
• Can we ignore the omitted variable bias?
It can be acceptable if the estimates are
reported together with the direction of the
bias in the key parameters.
• Can we use a proxy variable to eliminate
omitted variable bias? Sometimes.
• Can FE estimation solve the omitted variable
problem? First differencing or fixed effects
estimation eliminates time-constant omitted
variables. However, panel data methods do
not solve the problem of time-varying
omitted variables.
Idea of IV Estimation
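• The IV must be an exogenous variable: it is
uncorrelated with the error term.
• The IV affects y only indirectly, through the
endogenous explanatory variable.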
Example
What can serve as an IV for edu?
• Mother’s education?
• Number of siblings?
• The report of others?
• A dummy variable that is equal to 1 if a
man is born in the first quarter of the year:
Angrist and Krueger (1991). (Problematic.)
• In China, the years of primary edu?
IV for skipped class?
• The distance from home to school.
Other examples of IV
• IV for institution: Language? History?
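• Mauro (1995) uses the ethnic and linguistic
composition of the population as an IV for
corruption; Hall and Jones (1999) use distance
from the equator and the extent to which
Western European languages are spoken as a
first language as IVs for institutional quality;
La Porta et al. (1997, 1998, 1999) use legal
origin as an IV for various legal structures.
Acemoglu, Johnson, and Robinson (2001,
2002) use colonial-era (around 1500) mortality
and population density as IVs for institutions.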
• IV for school choice: Number of streams?
Identification
• Refer to (15.9) and (15.10).
• The (asymptotic) standard error of the IV
estimator depends on $SST_x$, the total sum of
squares of the $x_i$.
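The simple IV estimator and its asymptotic standard error can be sketched numerically. The sketch below assumes the usual textbook forms, slope = sample Cov(z, y) / Cov(z, x) and se = sqrt(sigma2_hat / (SST_x * R2_xz)), where R2_xz is the R-squared from regressing x on z. The simulated data, the true slope of 2, and all function names are hypothetical and not taken from the slides.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def iv_simple(y, x, z):
    """Intercept, slope and asymptotic slope standard error for simple IV."""
    n = len(y)
    b1 = cross(z, y) / cross(z, x)          # sample Cov(z, y) / Cov(z, x)
    b0 = mean(y) - b1 * mean(x)
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    sst_x = cross(x, x)                     # total sum of squares of x
    r2_xz = cross(x, z) ** 2 / (sst_x * cross(z, z))
    se = (sigma2 / (sst_x * r2_xz)) ** 0.5
    return b0, b1, se

def ols_slope(y, x):
    return cross(x, y) / cross(x, x)

# Hypothetical data: x is endogenous because it contains part of the error u.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + 0.8 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

b0_iv, b1_iv, se_iv = iv_simple(y, x, z)
b1_ols = ols_slope(y, x)
print("IV slope:", round(b1_iv, 3), "se:", round(se_iv, 4))
print("OLS slope:", round(b1_ols, 3))
```

In this setup OLS is pushed away from the true slope because x and u are positively correlated, while the IV estimate stays close to it.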
Self-selection
• Angrist (1990) studied the effect that being
a veteran in the Vietnam war had on
lifetime earnings.
• Draft lottery number is a good IV
candidate for veteran status.
• Some additional words about natural
experiments and DID
Properties of IV with a Poor
Instrumental Variable
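A poor IV can cause serious bias: if z is only
weakly correlated with x, even a small correlation
between z and u leads to a large inconsistency.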
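The source of this bias can be made explicit with the standard large-sample comparison of IV and OLS in the simple regression model. The notation (z for the instrument, x for the regressor, u for the error) is assumed here rather than taken from the slides:

```latex
\operatorname{plim}\hat\beta_{1,IV}
  = \beta_1 + \frac{\operatorname{Corr}(z,u)}{\operatorname{Corr}(z,x)}\cdot\frac{\sigma_u}{\sigma_x},
\qquad
\operatorname{plim}\hat\beta_{1,OLS}
  = \beta_1 + \operatorname{Corr}(x,u)\cdot\frac{\sigma_u}{\sigma_x}
```

On asymptotic bias grounds, IV is therefore preferable to OLS only when Corr(z,u)/Corr(z,x) is smaller in absolute value than Corr(x,u).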
R-squared
• Most regression packages compute an R-squared
after IV estimation, using the standard formula
$R^2 = 1 - SSR/SST$, where SSR is the sum of
squared IV residuals and SST is the total sum
of squares of y.
• $R^2$ can be negative in this case, because the
IV SSR can exceed SST.
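A small simulation can illustrate how the IV R-squared turns negative. This is only a sketch on hypothetical data: x and the error u are built to be strongly negatively correlated, so the IV residuals vary more than y itself. The variable names and parameter values are assumptions made for the illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]      # instrument
v = [rng.gauss(0, 2) for _ in range(n)]      # endogenous part of x
e = [rng.gauss(0, 0.5) for _ in range(n)]
x = [zi + vi for zi, vi in zip(z, v)]
u = [-vi + ei for vi, ei in zip(v, e)]       # error, negatively related to x
y = [xi + ui for xi, ui in zip(x, u)]        # true slope 1, intercept 0

# Simple IV estimates
b1 = cross(z, y) / cross(z, x)
b0 = mean(y) - b1 * mean(x)

# R-squared from the standard formula, using the IV residuals
ssr = sum((yi - b0 - b1 * xi) ** 2 for yi, xi in zip(y, x))
sst = cross(y, y)
r2 = 1 - ssr / sst
print("IV slope:", round(b1, 3), "R-squared:", round(r2, 3))
```

The IV slope is close to the true value even though the reported R-squared is below zero, which is why the R-squared is not a useful measure of fit after IV estimation.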
IV ESTIMATION OF THE
MULTIPLE REGRESSION
MODEL
• structural equation
Estimation
Efficient IV
Equation (15.26) is an example of a reduced form
equation, which means that we have written an
endogenous variable in terms of exogenous
variables.
TWO STAGE LEAST SQUARES
2SLS in words
• The first stage is to run the regression in (15.36),
where we obtain the fitted values $\hat{y}_2$.
• The second stage is the OLS regression (15.38).
Because we use $\hat{y}_2$ in place of $y_2$, the 2SLS
estimates can differ substantially from the OLS
estimates.
• Another interpretation:
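The two stages described in words above can be sketched as follows. This is a minimal illustration on simulated data, assuming one endogenous regressor y2, one included exogenous variable z1, and one excluded instrument z2. The helper names (`solve`, `ols`, `predict`) and the parameter values are hypothetical. Note that the standard errors from a manual second stage are not valid, so only coefficients are computed here.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def ols(cols, y):
    """OLS of y on a constant plus the regressors in cols (constant first)."""
    X = [(1.0,) + row for row in zip(*cols)]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(cols, beta):
    """Fitted values for coefficients beta (constant first)."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            for row in zip(*cols)]

# Hypothetical data: true coefficients are 2 on y2 and -1 on z1.
rng = random.Random(42)
n = 3000
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
y2 = [0.5 * a + 1.0 * b + 0.8 * e + rng.gauss(0, 1)
      for a, b, e in zip(z1, z2, u)]
y1 = [1.0 + 2.0 * h - 1.0 * a + e for h, a, e in zip(y2, z1, u)]

# Stage 1: regress y2 on all exogenous variables and keep the fitted values.
y2_hat = predict([z1, z2], ols([z1, z2], y2))
# Stage 2: regress y1 on the fitted values and the included exogenous variable.
beta_2sls = ols([y2_hat, z1], y1)
# OLS on the original regressors, for comparison.
beta_ols = ols([y2, z1], y1)
print("2SLS:", [round(b, 3) for b in beta_2sls])
print("OLS: ", [round(b, 3) for b in beta_ols])
```

In practice a dedicated 2SLS routine should be used, since it also computes the correct standard errors.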
Multiple Endogenous
Explanatory Variables
• ORDER CONDITION FOR
IDENTIFICATION OF AN EQUATION:
• We need at least as many excluded
exogenous variables as there are included
endogenous explanatory variables in the
structural equation.
IV SOLUTIONS TO ERRORS-IN-VARIABLES PROBLEMS
One possibility is to obtain a second measurement
on $x_1^*$, say $z_1$, and use it as an IV.
An alternative is to use other exogenous variables
as IVs for a potentially mismeasured variable.
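The second-measurement idea can be illustrated with a short simulation. This is a sketch under classical errors-in-variables assumptions: both measurements equal the true variable plus independent noise, and the measurement errors are unrelated to the structural error. All names and parameter values are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

rng = random.Random(3)
n = 5000
x_star = [rng.gauss(0, 1) for _ in range(n)]     # true, unobserved regressor
x1 = [s + rng.gauss(0, 1) for s in x_star]       # first noisy measurement
z1 = [s + rng.gauss(0, 1) for s in x_star]       # second noisy measurement
y = [1.0 + 2.0 * s + rng.gauss(0, 1) for s in x_star]   # true slope is 2

# OLS on the mismeasured regressor is attenuated toward zero.
b_ols = cross(x1, y) / cross(x1, x1)
# Using the second measurement as an IV for the first removes the attenuation.
b_iv = cross(z1, y) / cross(z1, x1)
print("OLS slope:", round(b_ols, 3), "IV slope:", round(b_iv, 3))
```

The approach works because the two measurement errors are independent, so z1 is correlated with x1 only through the true variable.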
TESTING FOR ENDOGENEITY AND
TESTING OVERIDENTIFYING
RESTRICTIONS
• The 2SLS estimator is less efficient than
OLS when the explanatory variables are
exogenous; as we have seen, the 2SLS
estimates can have very large standard
errors.
How to test endogeneity?
• 1. Comparing the OLS and 2SLS estimates and
determining whether the differences are
statistically significant. (Hausman, 1978)
• 2. A regression test:
Another interpretation of 2SLS
• Including $\hat{v}_2$ in the OLS regression (15.51)
clears up the endogeneity of $y_2$.
• We can also test for endogeneity of multiple
explanatory variables. For each suspected
endogenous variable, we obtain the reduced
form residuals. Then, we test for joint
significance of these residuals in the structural
equation, using an F test.
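For a single suspected endogenous variable, the regression test can be written out as below. The notation is assumed for illustration (one included exogenous variable $z_1$, exogenous variables $z_1, \dots, z_k$ in the reduced form) and may differ from the numbered equations the slides refer to:

```latex
\begin{aligned}
y_2 &= \pi_0 + \pi_1 z_1 + \dots + \pi_k z_k + v_2
  && \text{(reduced form; save the residuals } \hat v_2\text{)}\\
y_1 &= \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \delta_1 \hat v_2 + \text{error}
  && \text{(estimate by OLS)}\\
H_0 &: \delta_1 = 0
  && \text{($y_2$ is exogenous; use a $t$ test)}
\end{aligned}
```

The OLS estimates of the $\beta_j$ in the augmented regression coincide with the 2SLS estimates, which is the sense in which adding $\hat v_2$ gives another interpretation of 2SLS.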
Testing Overidentification
Restrictions
• If we have more than one instrumental
variable, we can effectively test whether
some of them are uncorrelated with the
structural error.
• Estimate the equation using one IV and obtain
the residuals, then test the correlation between
the other IVs and those residuals.
TESTING OVERIDENTIFYING
RESTRICTIONS:
• (i) Estimate the structural equation by 2SLS and
obtain the 2SLS residuals, $\hat{u}_1$.
• (ii) Regress $\hat{u}_1$ on all exogenous variables.
Obtain the R-squared, say $R_1^2$.
• (iii) Under the null hypothesis that all IVs are
uncorrelated with $u_1$, $nR_1^2 \overset{a}{\sim} \chi^2_q$, where q is
the number of instrumental variables from
outside the model minus the total number of
endogenous explanatory variables. If $nR_1^2$
exceeds (say) the 5% critical value in the $\chi^2_q$
distribution, we reject H0 and conclude that at
least some of the IVs are not exogenous.
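Steps (i) to (iii) can be sketched on simulated data. The example assumes one endogenous regressor and two outside instruments, so q = 1, with no included exogenous variables other than the constant. The helper names, the `leak` parameter (which makes the second instrument invalid), and the hard-coded 5% critical value of 3.84 for one degree of freedom are assumptions made for this illustration.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def ols(cols, y):
    """OLS of y on a constant plus the regressors in cols (constant first)."""
    X = [(1.0,) + row for row in zip(*cols)]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(cols, beta):
    """Fitted values for coefficients beta (constant first)."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            for row in zip(*cols)]

def overid_stat(y1, y2, instruments):
    """n * R-squared from regressing the 2SLS residuals on the instruments."""
    n = len(y1)
    y2_hat = predict(instruments, ols(instruments, y2))       # first stage
    b0, b1 = ols([y2_hat], y1)                                # second stage
    u_hat = [a - b0 - b1 * b for a, b in zip(y1, y2)]         # uses actual y2
    fit = predict(instruments, ols(instruments, u_hat))       # step (ii)
    m = sum(u_hat) / n
    ssr = sum((r - f) ** 2 for r, f in zip(u_hat, fit))
    sst = sum((r - m) ** 2 for r in u_hat)
    return n * (1 - ssr / sst)                                # step (iii)

def make_data(rng, n, leak):
    """Hypothetical data; leak != 0 makes the second instrument invalid."""
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    z3 = [rng.gauss(0, 1) for _ in range(n)]
    e = [rng.gauss(0, 1) for _ in range(n)]
    u = [leak * c + d for c, d in zip(z3, e)]
    y2 = [a + c + 0.8 * d + rng.gauss(0, 1) for a, c, d in zip(z2, z3, e)]
    y1 = [1.0 + 2.0 * h + w for h, w in zip(y2, u)]
    return y1, y2, [z2, z3]

CRIT_5PCT_DF1 = 3.84   # approximate 5% critical value of chi-square with 1 df
rng = random.Random(7)
stat_ok = overid_stat(*make_data(rng, 2000, leak=0.0))
stat_bad = overid_stat(*make_data(rng, 2000, leak=0.5))
print("valid IVs:  ", round(stat_ok, 2))
print("invalid IV: ", round(stat_bad, 2), "reject:", stat_bad > CRIT_5PCT_DF1)
```

With both instruments valid the statistic behaves like a chi-square draw with one degree of freedom, so it can occasionally exceed the critical value by chance. With an invalid instrument it is far larger.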
Is it better to have more IVs?
• Adding instruments to the list improves the
asymptotic efficiency of the 2SLS. But this
requires that any new instruments are in
fact exogenous.
• With the typical sample sizes available,
adding too many instruments—that is,
increasing the number of overidentifying
restrictions—can cause severe biases in
2SLS.
2 Omitted Topics
• 2SLS WITH HETEROSKEDASTICITY
• APPLYING 2SLS TO TIME SERIES
EQUATIONS
APPLYING 2SLS TO POOLED
CROSS SECTIONS
AND PANEL DATA
• For pooled cross-sectional data: add time
dummies.
• For panel data: in the first stage, use the
differenced IV to get an estimate of the
endogenous variable.
• Question: if the panel model is an FE one,
how can we check the efficiency of the IV if
the IV is time-invariant?
STATA commands
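To compare OLS and 2SLS:
ivreg y (x=iv) x2    // 2SLS: iv is the instrument for x
est store f2         // store the consistent (2SLS) estimates
reg y x x2           // OLS, efficient if x is exogenous
hausman f2           // compare stored 2SLS with the last (OLS) results
• The sequence is important: hausman takes the
consistent estimator first and, by default,
compares it with the most recent estimation results.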
STATA commands
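To compare FE and IV-FE:
xtivreg y (x=iv) x2, fe   // IV fixed effects: iv is the instrument for x
est store f2              // store the consistent (IV-FE) estimates
xtreg y x x2, fe          // ordinary fixed effects
hausman f2                // compare stored IV-FE with the last (FE) results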
The end.