INSTRUMENTAL VARIABLES REGRESSION MODEL
INTRODUCTION
We have seen that if the error term is correlated with an explanatory variable, then the OLS
estimator is biased in small samples and inconsistent in large samples. This means that even in
large samples the OLS estimator may not produce an estimate that is close to the true value of the
population parameter being estimated. As a result, an empirical study that estimates a linear
regression model using the OLS estimator when the error term is correlated with an explanatory
variable is not internally valid.
Three Major Sources of Correlation Between the Error Term and an Explanatory Variable
The three most important sources that produce a correlation between the error term and an
explanatory variable are the following: 1) a confounding variable, 2) reverse causation, and 3)
measurement error in an explanatory variable. The bias caused by a confounding variable can be
corrected by including it as an explanatory variable in the model, if it is observable, or specifying
and estimating a fixed effects regression model if it is unobservable and differs across units but is
constant over time. However, these methods do not work if the confounding variable(s) is
unobservable and/or differs across units and over time. Also, these methods do not work for
reverse causation and measurement error in an explanatory variable.
INSTRUMENTAL VARIABLE (IV) REGRESSION MODEL
When the error term is correlated with an explanatory variable, it is not possible to find an
estimator that is unbiased in small samples. However, it is possible to find an estimator that is
consistent in large samples. To obtain consistent estimates in large samples, we can specify an
instrumental variable (IV) regression model, and use an instrumental variable (IV) estimator.
IV REGRESSION MODEL WITH ONE EXPLANATORY VARIABLE AND ONE
INSTRUMENTAL VARIABLE
The IV regression model with one explanatory variable is
Yt = α + βXt + μt
The IV regression model allows the error term to be correlated with the explanatory variable, and
therefore allows the error term to have a non-constant, non-zero conditional mean. That is,
Corr(μt, Xt) ≠ 0 and E(μt|Xt) ≠ 0. The remaining assumptions are the same as the MCLRM. Any variable correlated with
the error term is called an endogenous variable. Any variable uncorrelated with the error term is
called an exogenous variable.
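To see concretely why endogeneity matters, here is a minimal simulation sketch (the data-generating process, the coefficient values, and the use of the numpy package are assumptions made up for this illustration, not part of the notes): X and the error term share a common shock, so Corr(μt, Xt) ≠ 0, and the OLS slope estimate settles away from the true β even in a very large sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
alpha, beta = 1.0, 2.0          # assumed true parameter values

v = rng.normal(size=n)          # common shock shared by X and the error term
X = rng.normal(size=n) + v      # explanatory variable, correlated with the error
u = rng.normal(size=n) + v      # error term, correlated with X
Y = alpha + beta * X + u

# OLS slope estimate: sample Cov(X, Y) / sample Var(X)
beta_ols = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
print(beta_ols)                 # about 2.5 rather than 2.0, even with n = 100,000
```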
Estimation
To use the sample of data to obtain estimates of the parameters of the IV regression model, we
use an IV estimator. To use an IV estimator, you must have one or more valid instrumental
variables. An instrumental variable is also called an instrument. We will designate an
instrumental variable as I.
Instrumental Variable
A valid instrumental variable, I, has two properties.
1. Instrument Relevance - An instrumental variable, I, is relevant if it is correlated with the
endogenous variable X. That is, Corr(It, Xt) ≠ 0.
2. Instrument Exogeneity – An instrumental variable, I, is exogenous if it is uncorrelated with the
error term μ. That is, Corr(μt, It) = 0.
Two-Stage Least Squares (2SLS) Estimator
The most often used IV estimator is the two-stage least squares (2SLS) estimator. It involves two
stages.
Stage #1: Regress X on I using the OLS estimator. Save the predicted values X^.
Stage #2: Regress Y on the predicted values X^ using the OLS estimator.
The 2SLS estimator for the slope parameter β is also given by the following formula.
β^2SLS = Cov(I, Y) / Cov(I, X)
where Cov(I, Y) is the sample covariance between I and Y, and Cov(I, X) is the sample
covariance between I and X.
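Below is a minimal sketch of both routes to the same 2SLS slope estimate on simulated data (the data-generating process and coefficient values are assumptions made up for this illustration): the two OLS stages run by hand, and the covariance-ratio formula.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
alpha, beta = 1.0, 2.0

I = rng.normal(size=n)                    # instrument: relevant and exogenous
v = rng.normal(size=n)                    # shock shared by X and the error term
X = 0.8 * I + v + rng.normal(size=n)      # endogenous explanatory variable
u = v + rng.normal(size=n)                # error term, correlated with X but not with I
Y = alpha + beta * X + u

# Stage 1: regress X on I (with a constant) and save the predicted values X^
A1 = np.column_stack([np.ones(n), I])
X_hat = A1 @ np.linalg.lstsq(A1, X, rcond=None)[0]

# Stage 2: regress Y on X^ (with a constant)
A2 = np.column_stack([np.ones(n), X_hat])
a_hat, b_2sls = np.linalg.lstsq(A2, Y, rcond=None)[0]

# The covariance-ratio formula gives the same slope estimate
b_ratio = np.cov(I, Y)[0, 1] / np.cov(I, X)[0, 1]
print(b_2sls, b_ratio)                    # both close to the true beta = 2.0
```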
Sampling Distribution of the 2SLS Estimator
The large sample (asymptotic) sampling distribution of the 2SLS estimator β^2SLS is given by
β^2SLS ~ N(β, Variance), where Variance = (1/n){Var[(I – E(I))μ] / [Cov(I, X)]^2}
This indicates that the 2SLS estimator has an approximate normal distribution in large samples.
Also, as the sample size increases, the sampling distribution of β^2SLS collapses to the true value β.
Therefore, in large samples the 2SLS estimator should produce an estimate that is close to the
true value of the population parameter.
2SLS Estimator and Estimated Standard Errors
To obtain a correct estimate of the standard error of the 2SLS estimate, we must use the residuals
μt^ = Yt – α^ – β^Xt, not the residuals from the second-stage regression, εt^ = Yt – α^ – β^Xt^.
Statistical programs with a 2SLS command will calculate the correct standard errors for you.
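Continuing the simulated sketch above, the standard error can also be computed by hand by plugging the structural residuals μt^ into the asymptotic variance formula given earlier; plugging in the second-stage residuals instead produces a different, incorrect number.

```python
# Correct 2SLS residuals: use the original X, not the predicted values X^
u_hat = Y - a_hat - b_2sls * X
# Second-stage residuals (the wrong ones for standard errors)
e_hat = Y - a_hat - b_2sls * X_hat

# Asymptotic variance: (1/n) * Var[(I - E(I)) * u] / [Cov(I, X)]^2
den = np.cov(I, X)[0, 1] ** 2
se_correct = np.sqrt(np.var((I - I.mean()) * u_hat, ddof=1) / den / n)
se_wrong = np.sqrt(np.var((I - I.mean()) * e_hat, ddof=1) / den / n)
print(se_correct, se_wrong)   # the two differ; only the first uses the correct residuals
```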
IV REGRESSION MODEL WITH ONE EXPLANATORY VARIABLE AND TWO OR MORE
INSTRUMENTAL VARIABLES
Suppose that we have m instrumental variables, I1, I2, …, Im. If m = 0, then the regression
coefficients α and β are said to be underidentified. If m = 1, then the regression coefficients α and
β are said to be exactly identified. If m > 1, then the regression coefficients α and β are said to be
overidentified. The IV estimator can be used to obtain estimates if α and β are exactly identified
or overidentified. It cannot be used if α and β are underidentified.
2SLS Estimator
The 2SLS estimator now involves the following two stages.
Stage #1: Regress X on I1, I2, …, Im using the OLS estimator. Save the predicted values X^.
Stage #2: Regress Y on the predicted value variable X^ using the OLS estimator.
If all instrumental variables are relevant and exogenous, then the 2SLS estimator has a normal
distribution and is consistent in large samples.
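A compact sketch of these two stages with m = 2 instruments on simulated data, this time using OLS from the statsmodels package (the data-generating process and the package choice are assumptions made for this illustration).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 50_000
I1, I2 = rng.normal(size=n), rng.normal(size=n)   # two instruments
v = rng.normal(size=n)                            # shock shared by X and the error term
X = 0.5 * I1 + 0.5 * I2 + v + rng.normal(size=n)
Y = 1.0 + 2.0 * X + v + rng.normal(size=n)        # true beta is 2.0

# Stage 1: regress X on both instruments, save the predicted values X^
X_hat = sm.OLS(X, sm.add_constant(np.column_stack([I1, I2]))).fit().fittedvalues
# Stage 2: regress Y on X^
stage2 = sm.OLS(Y, sm.add_constant(X_hat)).fit()
print(stage2.params)   # slope near 2.0 (the reported standard errors are not the correct 2SLS ones)
```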
IV REGRESSION MODEL WITH TWO OR MORE EXPLANATORY VARIABLES AND TWO
OR MORE INSTRUMENTAL VARIABLES
The IV regression model with two or more explanatory variables and m instrumental variables is
Yt = β1 + β2Xt2 + β3Zt1 + … + β2+rZtr + μt
where Xt2 is the endogenous explanatory variable, Zt1, …, Ztr are the r exogenous explanatory
variables, and I1, I2, …, Im are the m instrumental variables.
2SLS Estimator
The 2SLS estimator now involves the following two stages.
Stage #1: Regress X on I1, I2, …, Im and Z1, Z2, …, Zr using the OLS estimator. Save the
predicted values X^.
Stage #2: Regress Y on the predicted value variable X^ and the exogenous explanatory variables
Zt1, …, Zrt using the OLS estimator.
If I1, I2, …, Im are relevant and exogenous, and Z1, Z2, …, Zr are exogenous, then the 2SLS
estimator has a normal distribution and is consistent in large samples.
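A sketch of this model with one endogenous variable X, one exogenous variable Z, and two instruments I1 and I2 on simulated data, fitted with the IV2SLS class from the linearmodels package (the package and the data-generating process are assumptions made for this illustration); a 2SLS command of this kind also reports the correct standard errors discussed earlier.

```python
import numpy as np
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(2)
n = 50_000
Z = rng.normal(size=n)                        # exogenous explanatory variable
I1 = rng.normal(size=n)                       # instrument 1
I2 = rng.normal(size=n)                       # instrument 2
v = rng.normal(size=n)                        # shock shared by X and the error term
X = 0.6 * I1 + 0.4 * I2 + 0.5 * Z + v + rng.normal(size=n)
u = v + rng.normal(size=n)
Y = 1.0 + 2.0 * X + 1.5 * Z + u               # true coefficient on X is 2.0

exog = np.column_stack([np.ones(n), Z])       # constant and exogenous regressors
res = IV2SLS(Y, exog, X, np.column_stack([I1, I2])).fit()
print(res.params)                             # coefficient on X should be near 2.0
```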
CHECKING THE VALIDITY OF INSTRUMENTAL VARIABLES
If all instrumental variables I1, I2, …, Im are uncorrelated with the endogenous explanatory
variable X, then they are not relevant. If the instrumental variables have a relatively low correlation
with X, then they are said to be weak instruments. If the instruments are not relevant or are weak,
then 2SLS will not have a normal distribution and will be inconsistent in large samples. If any
instrumental variable is correlated with the error term, then it is not exogenous. If any
instrumental variable is not exogenous, then 2SLS will be inconsistent in large samples. If 2SLS
is inconsistent, then it will not produce an estimate that is close to the true value of the population
parameter, even if the sample size is large. Therefore, we should check the validity of our
instrumental variable(s).
Checking Instrument Relevance
To check for instrument relevance, you calculate the F-statistic for the null hypothesis that the
coefficients of the instrumental variables are all zero in the first-stage regression. An often-used
rule of thumb is that an F-statistic of less than 10 indicates possibly weak instruments.
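Continuing the simulated example above, this check amounts to one F-test on the first-stage regression (the statsmodels package is assumed installed):

```python
import numpy as np
import statsmodels.api as sm

# First-stage regression of the endogenous X on the constant, Z, I1, and I2
first = sm.OLS(X, sm.add_constant(np.column_stack([Z, I1, I2]))).fit()

# Joint null hypothesis: both instrument coefficients are zero
# (columns of the restriction matrix: const, Z, I1, I2)
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
print(first.f_test(R).fvalue)   # values below roughly 10 suggest weak instruments
```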
Checking Instrument Exogeneity
If you have one instrumental variable, and therefore the regression coefficients are exactly
identified, then you cannot check for instrument exogeneity. If you have two or more
instrumental variables, and therefore the regression coefficients are overidentified, then you can
do a test of the overidentifying restrictions. This allows you to check if all instrumental variables
are exogenous. The null hypothesis is the hypothesis that all instrumental variables are
exogenous. The alternative hypothesis is that at least one of the instrumental variables is
endogenous (i.e., correlated with the error term). The test is a Lagrange multiplier test and
involves two steps. Step #1: Estimate the IV regression model using the 2SLS estimator. Save the
2SLS residuals μt^. Step #2: Regress the residuals μt^ on the instrumental variables I1, I2, …, Im
and exogenous explanatory variables Z1, Z2, …, Zr using OLS. Use the R2 statistic from this
regression to calculate the LM test statistic: LM = nR2, where n is sample size. The LM statistic
has an approximate chi-square distribution, with m – 1 degrees of freedom.
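Continuing the same simulated example, the LM test takes only a few lines: with m = 2 instruments and one endogenous variable, the statistic is compared with a chi-square distribution with m – 1 = 1 degree of freedom (the scipy package is assumed for the p-value).

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Step 1: 2SLS residuals from the fitted model above
u_hat = np.asarray(res.resids)

# Step 2: regress the residuals on all instruments and exogenous regressors
aux = sm.OLS(u_hat, sm.add_constant(np.column_stack([I1, I2, Z]))).fit()
LM = n * aux.rsquared                      # LM = n * R^2
p_value = stats.chi2.sf(LM, df=1)          # df = m - 1 = 1 here
print(LM, p_value)   # a small LM / large p-value is consistent with exogenous instruments
```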
IV REGRESSION MODEL WITH TWO OR MORE ENDOGENOUS EXPLANATORY
VARIABLES, TWO OR MORE EXOGENOUS EXPLANATORY VARIABLES, AND TWO
OR MORE INSTRUMENTAL VARIABLES
The IV regression model with k – 1 endogenous explanatory variables, r exogenous explanatory
variables, and m instrumental variables is
Yt = β1 + β2Xt2 + … + βkXtk + βk+1Zt1 + … + βk+rZtr + μt
where Xt2, …, Xtk are the k – 1 endogenous explanatory variables, Zt1, …, Ztr are the r exogenous
explanatory variables, and I1, I2, …, Im are the m instrumental variables.
2SLS Estimator
The 2SLS estimator now involves the following two stages.
Stage #1: Regress each Xi on I1, I2, …, Im and Z1, Z2, …, Zr using the OLS estimator. Save the k –
1 predicted value variables X2^, …, Xk^.
Stage #2: Regress Y on the predicted value variables X2^, …, Xk^ and the exogenous explanatory
variables Z1, Z2, …, Zr using the OLS estimator.
If I1, I2, …, Im are relevant and exogenous, and Z1, Z2, …, Zr are exogenous, then the 2SLS
estimator has a normal distribution and is consistent in large samples.
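A minimal sketch of these two stages with k – 1 = 2 endogenous variables, one exogenous variable, and m = 2 instruments (exact identification) on simulated data; the data-generating process is an assumption made for this illustration. Each endogenous variable gets its own first-stage regression on all instruments and all exogenous explanatory variables.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
Z = rng.normal(size=n)
I1, I2 = rng.normal(size=n), rng.normal(size=n)
v = rng.normal(size=n)                                  # shock shared by X2, X3, and the error term
X2 = 0.7 * I1 + 0.3 * Z + v + rng.normal(size=n)
X3 = 0.7 * I2 + 0.3 * Z + v + rng.normal(size=n)
u = v + rng.normal(size=n)
Y = 1.0 + 2.0 * X2 - 1.0 * X3 + 0.5 * Z + u             # true coefficients 1, 2, -1, 0.5

# Stage 1: regress each endogenous variable on all instruments and exogenous regressors
W = np.column_stack([np.ones(n), I1, I2, Z])
X2_hat = W @ np.linalg.lstsq(W, X2, rcond=None)[0]
X3_hat = W @ np.linalg.lstsq(W, X3, rcond=None)[0]

# Stage 2: regress Y on the predicted values and the exogenous regressors
A = np.column_stack([np.ones(n), X2_hat, X3_hat, Z])
print(np.linalg.lstsq(A, Y, rcond=None)[0])             # estimates near 1, 2, -1, 0.5
```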
Underidentified, Exactly Identified, and Overidentified Regression Coefficients
The regression coefficients are underidentified if m < k – 1; exactly identified if m = k – 1; and
overidentified if m > k – 1.
Checking the Validity of Instrumental Variables
Instrument Relevance
If k = 2, so that there is one endogenous explanatory variable, then you can use the first-stage
F-statistic to check for instrument relevance. If k > 2, then this F-test cannot be used to check for
instrument relevance.
Instrument Exogeneity
The Lagrange multiplier test can be used to test for instrument exogeneity (test the
overidentifying restrictions) if m > k – 1. The LM test statistic has an approximate chi-square
distribution with (m – (k – 1)) degrees of freedom.