ch10InstrumentalVariables

Endogenous Regressors and Instrumental
Variables Estimation
Adapted from Vera Tabakova, East Carolina University

10.1 Linear Regression with Random x’s

10.2 Cases in which x and e are Correlated

10.3 Estimators Based on the Method of Moments

10.4 Specification Tests
Principles of Econometrics, 3rd Edition
The assumptions of the simple linear regression are:

SR1. $y_i = \beta_1 + \beta_2 x_i + e_i$, $i = 1, \ldots, N$

SR2. $E(e_i) = 0$

SR3. $\mathrm{var}(e_i) = \sigma^2$

SR4. $\mathrm{cov}(e_i, e_j) = 0$

SR5. The variable $x_i$ is not random, and it must take at least two
different values.

SR6. (optional) $e_i \sim N(0, \sigma^2)$
The purpose of this chapter is to discuss regression models in which
xi is random and correlated with the error term ei. We will:

Discuss the conditions under which having a random x is not a
problem, and how to test whether our data satisfies these conditions.

Present cases in which the randomness of x causes the least squares
estimator to fail.

Provide estimators that have good properties even when xi is random
and correlated with the error ei.

A10.1
$y_i = \beta_1 + \beta_2 x_i + e_i$ correctly describes the relationship
between $y_i$ and $x_i$ in the population, where $\beta_1$ and $\beta_2$ are
unknown (fixed) parameters and $e_i$ is an unobservable
random error term.

A10.2
The data pairs $(x_i, y_i)$, $i = 1, \ldots, N$, are obtained by random
sampling. That is, the data pairs are collected from the
same population, by a process in which each pair is
independent of every other pair. Such data are said to be
independent and identically distributed.

A10.3
$E(e_i \mid x_i) = 0$. The expected value of the error term $e_i$,
conditional on the value of $x_i$, is zero.
This assumption implies that we have (i) omitted no important variables,
(ii) used the correct functional form, and (iii) there exist no factors that
cause the error term $e_i$ to be correlated with $x_i$.
If $E(e_i \mid x_i) = 0$, then we can show that it is also true that $x_i$ and $e_i$ are
uncorrelated, and that $\mathrm{cov}(x_i, e_i) = 0$.
Conversely, if $x_i$ and $e_i$ are correlated, then $\mathrm{cov}(x_i, e_i) \neq 0$ and we can
show that $E(e_i \mid x_i) \neq 0$.

A10.4
In the sample, xi must take at least two different values.

A10.5
$\mathrm{var}(e_i \mid x_i) = \sigma^2$. The variance of the error term, conditional
on $x_i$, is a constant $\sigma^2$.

A10.6
$e_i \mid x_i \sim N(0, \sigma^2)$. The distribution of the error term,
conditional on $x_i$, is normal.

Under assumptions A10.1-A10.4 the least squares estimator is
unbiased.

Under assumptions A10.1-A10.5 the least squares estimator is the
best linear unbiased estimator of the regression parameters,
conditional on the x’s, and the usual estimator of σ2 is unbiased.

Under assumptions A10.1-A10.6 the distributions of the least squares
estimators, conditional upon the x’s, are normal, and their variances
are estimated in the usual way. Consequently the usual interval
estimation and hypothesis testing procedures are valid.
Figure 10.1 An illustration of consistency
Remark: Consistency is a “large sample” or “asymptotic” property. We have
stated another large sample property of the least squares estimators in Chapter
2.6. We found that even when the random errors in a regression model are not
normally distributed, the least squares estimators still have approximate
normal distributions if the sample size N is large enough. How large must the
sample size be for these large sample properties to be valid approximations of
reality? In a simple regression 50 observations might be enough. In multiple
regression models the number might be much higher, depending on the quality
of the data.

A10.3* $E(e_i) = 0$ and $\mathrm{cov}(x_i, e_i) = 0$
$E(e_i \mid x_i) = 0 \Rightarrow \mathrm{cov}(x_i, e_i) = 0$
$E(e_i \mid x_i) = 0 \Rightarrow E(e_i) = 0$

Under assumption A10.3* the least squares estimators are consistent.
That is, they converge to the true parameter values as $N \to \infty$.

Under assumptions A10.1, A10.2, A10.3*, A10.4 and A10.5, the least
squares estimators have approximate normal distributions in large
samples, whether the errors are normally distributed or not.
Furthermore our usual interval estimators and test statistics are valid,
if the sample is large.
If assumption A10.3* is not true, and in particular if $\mathrm{cov}(x_i, e_i) \neq 0$
so that $x_i$ and $e_i$ are correlated, then the least squares estimators are
inconsistent. They do not converge to the true parameter values
even in very large samples. Furthermore, none of our usual
hypothesis testing or interval estimation procedures are valid.
Figure 10.2 Plot of correlated x and e
True model: $y = E(y) + e = \beta_1 + \beta_2 x + e = 1 + 1 \times x + e$
but it gets estimated as:
$\hat{y} = b_1 + b_2 x = .9789 + 1.7034x$
…so there is a substantial bias…
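To see why least squares fails here, the following is a minimal simulation sketch in Python (standard library only). The design is an assumption made for illustration: the true model is y = 1 + 1·x + e as on the slide, x is standard normal, and the error is built so that corr(x, e) = rho. The function names `ols_simple` and `simulate` are hypothetical. With rho = 0.6 the least squares slope should settle near β2 + cov(x, e)/var(x) = 1.6 rather than 1, however large N becomes.

```python
import random

def ols_simple(x, y):
    """Return the least squares intercept b1 and slope b2."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2

def simulate(n, rho, seed=0):
    """Draw (x, y) from y = 1 + 1*x + e with corr(x, e) = rho (assumed design)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        # e has unit variance and covariance rho with x
        ei = rho * xi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        x.append(xi)
        y.append(1 + 1 * xi + ei)
    return x, y

if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        b1, b2 = ols_simple(*simulate(n, rho=0.6))
        print(f"N={n:>7}: b1={b1:.3f}, b2={b2:.3f}")
```

Setting `rho=0.0` in the same sketch gives a slope close to 1, which is the consistency result under A10.3*.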
Example: Wages = f(intelligence)
Figure 10.3 Plot of data, true and fitted regressions (in the case of a positive correlation between x and the error)
When an explanatory variable and the error term are correlated, the
explanatory variable is said to be endogenous, which means
“determined within the system.”
$y_i = \beta_1 + \beta_2 x_i^* + v_i$ (10.1)
$x_i = x_i^* + u_i$ (10.2)

$y_i = \beta_1 + \beta_2 x_i^* + v_i$
$\quad = \beta_1 + \beta_2 (x_i - u_i) + v_i$
$\quad = \beta_1 + \beta_2 x_i + (v_i - \beta_2 u_i)$
$\quad = \beta_1 + \beta_2 x_i + e_i$ (10.3)

$\mathrm{cov}(x_i, e_i) = E(x_i e_i) = E[(x_i^* + u_i)(v_i - \beta_2 u_i)]$
$\quad = E(-\beta_2 u_i^2) = -\beta_2 \sigma_u^2 \neq 0$ (10.4)
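The covariance result in (10.4) can be checked numerically. The sketch below is an illustration under assumed values, not part of the original example: β1 = β2 = 1, and x*, u and v are independent standard normal draws, so theory gives cov(x, e) = −β2σu² = −1. Under the same assumptions the least squares slope should be pulled toward β2 + cov(x, e)/var(x) = 1 − 1/2 = 0.5. The names `cov` and `simulate_measurement_error` are hypothetical.

```python
import random

def cov(a, b):
    """Sample covariance of two equally long lists."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / (n - 1)

def simulate_measurement_error(n, beta1=1.0, beta2=1.0, sigma_u=1.0, seed=0):
    """Generate observed x, outcome y and composite error e from (10.1)-(10.3)."""
    rng = random.Random(seed)
    x, y, e = [], [], []
    for _ in range(n):
        x_star = rng.gauss(0, 1)        # true, unobserved regressor
        u = rng.gauss(0, sigma_u)       # measurement error
        v = rng.gauss(0, 1)             # regression error
        x.append(x_star + u)                    # (10.2)
        y.append(beta1 + beta2 * x_star + v)    # (10.1)
        e.append(v - beta2 * u)                 # composite error in (10.3)
    return x, y, e

if __name__ == "__main__":
    x, y, e = simulate_measurement_error(50_000)
    print("cov(x, e) =", round(cov(x, e), 3))            # theory: -1
    print("OLS slope =", round(cov(x, y) / cov(x, x), 3))  # theory: 0.5
```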
We will focus on this case
$WAGE_i = \beta_1 + \beta_2 EDUC_i + e_i$ (10.5)
Omitted factors: experience, ability and motivation.
Therefore, we expect that $\mathrm{cov}(EDUC_i, e_i) \neq 0$.
(Diagram: an omitted variable W affects both X and Y.)
Other examples:
Y = crime, X = marriage, W = “marriageability”
Y = divorce, X = “shacking up”, W = “good match”
Y = crime, X = watching a lot of TV, W = “parental involvement”
Y = sexual violence, X = watching porn, W = any unobserved factor that would affect both X and Y
The list is endless!
$Q_i = \beta_1 + \beta_2 P_i + e_i$ (10.6)
There is a feedback relationship between $P_i$ and $Q_i$. Because of this
feedback, which results because price and quantity are jointly, or
simultaneously, determined, we can show that $\mathrm{cov}(P_i, e_i) \neq 0$.
The resulting bias (and inconsistency) is called the simultaneous
equations bias.
$y_t = \beta_1 + \beta_2 y_{t-1} + \beta_3 x_t + e_t$
AR(1) process: $e_t = \rho e_{t-1} + v_t$
If $\rho \neq 0$ there will be correlation between $y_{t-1}$ and $e_t$.
In this case the least squares estimator applied to the lagged
dependent variable model will be biased and inconsistent.
When all the usual assumptions of the linear model hold, the method
of moments leads us to the least squares estimator. If x is random and
correlated with the error term, the method of moments leads us to an
alternative, called instrumental variables estimation, or two-stage
least squares estimation, that will work in large samples.
Suppose that there is another variable, z, such that

z does not have a direct effect on y, and thus it does not belong on the
right-hand side of the model as an explanatory variable ONCE X IS
IN THE MODEL!


zi should not be simultaneously affected by y either, of course
zi is not correlated with the regression error term ei. Variables with
this property are said to be exogenous

z is strongly [or at least not weakly] correlated with x, the endogenous
explanatory variable

A variable z with these properties is called an instrumental variable.

An instrument is a variable z correlated with x but not with the error e

In addition, the instrument does not directly affect y and thus does not
belong in the actual model as a separate regressor (of course it should affect
it through the instrumented regressor x)

It is common to have more than one instrument for x (just not good ones!)

These instruments, $z_1, z_2, \ldots, z_s$, must be correlated with x, but not with e

Consistent estimation is obtained through the instrumental variables or
two-stage least squares (2SLS) estimator, rather than the usual OLS
estimator

Using Z, an “instrumental variable” for X, is one
solution to the problem of omitted variables bias.
Z, to be a valid instrument for X, must be:
– Relevant = Correlated with X
– Exogenous = Not correlated with Y
except through its correlation with X
Based on the method of moments…
$E(z_i e_i) = 0 \Rightarrow E[z_i (y_i - \beta_1 - \beta_2 x_i)] = 0$ (10.16)

$\frac{1}{N} \sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$
$\frac{1}{N} \sum z_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$ (10.17)
Solving the previous system, we obtain a new estimator that
cleanses the endogeneity of X and exploits only the component
of the variation of X that is not correlated with e instead:
$\hat{\beta}_2 = \frac{N \sum z_i y_i - \sum z_i \sum y_i}{N \sum z_i x_i - \sum z_i \sum x_i} = \frac{\sum (z_i - \bar{z})(y_i - \bar{y})}{\sum (z_i - \bar{z})(x_i - \bar{x})}$ (10.18)
$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}$
We do that by ensuring that we only use the predicted value of X from its
regression on Z in the main regression.
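The estimator in (10.18) is short enough to write out directly. The sketch below, in Python with the standard library only, computes both forms of the slope and compares IV with least squares on simulated data. The data design is an assumption made for illustration (y = 1 + 1·x + e, x = 0.5·z + w, with e correlated with w but not with z), and the function names are hypothetical.

```python
import random

def iv_simple(z, x, y):
    """IV estimates (beta1_hat, beta2_hat), deviation form of (10.18)."""
    n = len(y)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    b2 = szy / szx
    return ybar - b2 * xbar, b2

def iv_slope_raw_sums(z, x, y):
    """The same slope written with raw sums, the first form in (10.18)."""
    n = len(y)
    sz, sx, sy = sum(z), sum(x), sum(y)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    szx = sum(zi * xi for zi, xi in zip(z, x))
    return (n * szy - sz * sy) / (n * szx - sz * sx)

def simulate_iv_data(n, seed=0):
    """Assumed design: z is exogenous and relevant, x is endogenous."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        w = rng.gauss(0, 1)
        e = 0.6 * w + 0.8 * rng.gauss(0, 1)   # correlated with x through w
        xi = 0.5 * zi + w
        z.append(zi)
        x.append(xi)
        y.append(1 + 1 * xi + e)
    return z, x, y

if __name__ == "__main__":
    z, x, y = simulate_iv_data(50_000)
    print("IV slope :", round(iv_simple(z, x, y)[1], 3))
    # Using x as its own instrument reproduces least squares.
    print("OLS slope:", round(iv_simple(x, x, y)[1], 3))
```

In this design the IV slope should be close to the true value 1, while the least squares slope drifts toward 1 + 0.6/1.25 = 1.48.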
These new estimators have the following properties:

They are consistent, if $E(z_i e_i) = 0$.

In large samples the instrumental variable estimators have
approximate normal distributions. In the simple regression model
$\hat{\beta}_2 \sim N\left(\beta_2, \frac{\sigma^2}{r_{zx}^2 \sum (x_i - \bar{x})^2}\right)$ (10.19)

The error variance is estimated using the estimator
$\hat{\sigma}^2_{IV} = \frac{\sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2}{N - 2}$
In (10.19), the stronger the correlation between the instrument and X, the better!
$\mathrm{var}(\hat{\beta}_2) = \frac{\sigma^2}{r_{zx}^2 \sum (x_i - \bar{x})^2} = \frac{\mathrm{var}(b_2)}{r_{zx}^2}$
Using the instrumental variables is less efficient than OLS (because
you must throw away some information on the variation of X) so it
leads to wider confidence intervals and less precise inference.
The bottom line is that when instruments are weak, instrumental
variables estimation is not reliable: you throw away too much
information when trying to avoid the endogeneity bias
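As a quick worked illustration of the relationship var(β̂2) = var(b2)/r²zx in the simple regression model, the sketch below tabulates how much the variance and the standard error are inflated for a few illustrative values of the correlation between z and x. The function name `iv_variance_inflation` and the chosen correlations are hypothetical.

```python
def iv_variance_inflation(r_zx):
    """Ratio var(beta2_hat_IV) / var(b2_OLS) = 1 / r_zx^2 (simple regression)."""
    if not 0 < abs(r_zx) <= 1:
        raise ValueError("r_zx must be a nonzero correlation in [-1, 1]")
    return 1.0 / r_zx ** 2

if __name__ == "__main__":
    for r in (0.9, 0.5, 0.1):
        ratio = iv_variance_inflation(r)
        # The standard error scales with the square root of the variance ratio.
        print(f"r_zx={r}: variance x{ratio:.1f}, standard error x{ratio ** 0.5:.1f}")
```

A correlation of 0.5 already doubles the standard error, and a correlation of 0.1 multiplies it by about ten, which is the sense in which weak instruments throw away too much information.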

Not all of the available variation in X is used.
Only that component of X that is “explained” by Z is used to explain Y.
(Venn diagrams of X, Y and Z.) X = Endogenous variable, Y = Response variable, Z = Instrumental variable
Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for.
Realistic scenario: Very little of X is explained by Z, and/or what is explained does not overlap much with Y.

The IV estimator is BIASED
E(bIV) ≠ β (finite-sample bias), but it is consistent: plim(bIV) = β as N → ∞
So IV studies must often have very large samples
But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β anyway…

Asymptotic behavior of IV:
plim(bIV) = β + Cov(Z,e) / Cov(Z,X)
If Z is truly exogenous, then Cov(Z,e) = 0

Three different models to be familiar with
First stage (“Reduced form”): EDUC = α0 + α1Z + ω
Structural model: WAGES = β0 + β1EDUC + ε
Reduced form for the outcome: WAGES = δ0 + δ1Z + ξ

An interesting equality:
δ1 = α1 × β1
so…
β1 = δ1 / α1
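The equality β1 = δ1 / α1 also holds for the estimates: with one instrument, dividing the least squares slope of Y on Z by the least squares slope of X on Z reproduces the IV slope exactly. The sketch below checks this on simulated data. The data design and the function names are assumptions made for illustration (true β1 = 1 and α1 = 0.5, so δ1 = 0.5).

```python
import random

def slope(a, b):
    """Least squares slope from regressing b on a (with an intercept)."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    sab = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    saa = sum((ai - abar) ** 2 for ai in a)
    return sab / saa

def iv_slope(z, x, y):
    """IV slope: sample cov(z, y) divided by sample cov(z, x)."""
    n = len(y)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return szy / szx

def make_data(n, seed=0):
    """Assumed design: x = 0.5*z + w, y = 1 + 1*x + e, e correlated with w."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        w = rng.gauss(0, 1)
        e = 0.6 * w + 0.8 * rng.gauss(0, 1)
        xi = 0.5 * zi + w
        z.append(zi)
        x.append(xi)
        y.append(1 + xi + e)
    return z, x, y

if __name__ == "__main__":
    z, x, y = make_data(50_000)
    alpha1 = slope(z, x)   # first stage
    delta1 = slope(z, y)   # reduced form for the outcome
    print("alpha1 =", round(alpha1, 3), " delta1 =", round(delta1, 3))
    print("delta1 / alpha1 =", round(delta1 / alpha1, 3))
    print("IV slope        =", round(iv_slope(z, x, y), 3))
```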
$\hat{x} = .1947 + .5700 z_1 + .2068 z_2$ (10.26)
(se) (.079) (.089) (.077)

$\hat{y}_{IV\_z_1,z_2} = 1.1376 + 1.0399x$ (10.27)
(se) (.116) (.194)
$\hat{y}_{OLS} = .9789 + 1.7034x$
(se) (.088) (.090)

$\hat{y}_{IV\_z_1} = 1.1011 + 1.1924x$
(se) (.109) (.195)

$\hat{y}_{IV\_z_2} = 1.3451 + .1724x$
(se) (.256) (.797)

$\hat{y}_{IV\_z_3} = .9640 + 1.7657x$
(se) (.095) (.172)
$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + e$
$\widehat{\ln(WAGE)} = -.5220 + .1075 \times EDUC + .0416 \times EXPER - .0008 \times EXPER^2$
(se) (.1986) (.0141) (.0132) (.0004)
$\widehat{EDUC} = 9.7751 + .0489 \times EXPER - .0013 \times EXPER^2 + .2677 \times MOTHEREDUC$
(se) (.4249) (.0417) (.0012) (.0311)
We hope for a high t ratio on MOTHEREDUC here
$\widehat{\ln(WAGE)} = .1982 + .0493 \times EDUC + .0449 \times EXPER - .0009 \times EXPER^2$
(se) (.4729) (.0374) (.0136) (.0004)
Check it out: as expected, the return to education is much lower than from OLS!!!
use mroz.dta
summarize
drop if lfp==0
gen lwage = log(wage)
gen exper2 = exper^2
* Basic OLS estimation
reg lwage educ exper exper2
estimates store ls
* IV estimation by hand. First stage: include in z all G exogenous
* variables (exper, exper2) and the available instruments (mothereduc)
reg educ exper exper2 mothereduc
predict educ_hat
* Second stage. This manual step gives wrong standard errors,
* so running 2SLS manually is not recommended
reg lwage educ_hat exper exper2
In econometrics, two-stage least squares (TSLS or 2SLS) and
instrumental variables (IV) estimation are often used interchangeably
The `two-stage' terminology comes from the time when the easiest way
to estimate the model was to actually use two separate least squares
regressions
With better software, the computation is done in a single step to ensure
the other model statistics are computed correctly
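. reg lwage educ exper exper2

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =   26.29
       Model |  35.0222967     3  11.6740989           Prob > F      =  0.0000
    Residual |  188.305145   424  .444115908           R-squared     =  0.1568
-------------+------------------------------           Adj R-squared =  0.1509
       Total |  223.327442   427  .523015086           Root MSE      =  .66642

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1074896   .0141465     7.60   0.000     .0796837    .1352956
       exper |   .0415665   .0131752     3.15   0.002     .0156697    .0674633
      exper2 |  -.0008112   .0003932    -2.06   0.040    -.0015841   -.0000382
       _cons |  -.5220406   .1986321    -2.63   0.009    -.9124667   -.1316144
------------------------------------------------------------------------------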
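First stage or “reduced form” equation:
. reg educ exper exper2 mothereduc

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =   25.47
       Model |  340.537834     3  113.512611           Prob > F      =  0.0000
    Residual |  1889.65843   424  4.45674158           R-squared     =  0.1527
-------------+------------------------------           Adj R-squared =  0.1467
       Total |  2230.19626   427  5.22294206           Root MSE      =  2.1111

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0488615   .0416693     1.17   0.242    -.0330425    .1307655
      exper2 |  -.0012811   .0012449    -1.03   0.304     -.003728    .0011659
  mothereduc |   .2676908   .0311298     8.60   0.000     .2065029    .3288787
       _cons |   9.775103   .4238886    23.06   0.000     8.941918    10.60829
------------------------------------------------------------------------------

Good news!! (The t ratio on mothereduc is high: 8.60.)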
Second stage equation: wrong standard errors!!!
. reg lwage educ_hat exper exper2

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =    6.75
       Model |   10.181204     3  3.39373467           Prob > F      =  0.0002
    Residual |  213.146238   424  .502703391           R-squared     =  0.0456
-------------+------------------------------           Adj R-squared =  0.0388
       Total |  223.327442   427  .523015086           Root MSE      =  .70902

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    educ_hat |   .0492629   .0390562     1.26   0.208    -.0275049    .1260308
       exper |   .0448558   .0141644     3.17   0.002     .0170147     .072697
      exper2 |  -.0009221    .000424    -2.17   0.030    -.0017554   -.0000887
       _cons |   .1981861   .4933427     0.40   0.688    -.7715157    1.167888
------------------------------------------------------------------------------
Second stage equation: correct standard errors!!!
. ivregress 2sls lwage (educ=mothereduc) exper exper2

Instrumental variables (2SLS) regression               Number of obs =     428
                                                       Wald chi2(3)  =   22.25
                                                       Prob > chi2   =  0.0001
                                                       R-squared     =  0.1231
                                                       Root MSE      =  .67642

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .049263   .0372607     1.32   0.186    -.0237666    .1222925
       exper |   .0448558   .0135132     3.32   0.001     .0183704    .0713413
      exper2 |  -.0009221   .0004045    -2.28   0.023    -.0017148   -.0001293
       _cons |   .1981861   .4706623     0.42   0.674    -.7242952    1.120667
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper exper2 mothereduc
Second stage equation: even better standard errors, with a small-sample adjustment and robust to heteroskedasticity!!!
. ivregress 2sls lwage (educ=mothereduc) exper exper2, vce(robust) small

Instrumental variables (2SLS) regression               Number of obs =     428
                                                       F(  3,   424) =    5.50
                                                       Prob > F      =  0.0010
                                                       R-squared     =  0.1231
                                                       Adj R-squared =  0.1169
                                                       Root MSE      =   .6796

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .049263   .0380396     1.30   0.196    -.0255067    .1240326
       exper |   .0448558   .0156038     2.87   0.004     .0141853    .0755264
      exper2 |  -.0009221   .0004319    -2.14   0.033     -.001771   -.0000732
       _cons |   .1981861   .4891462     0.41   0.686    -.7632673    1.159639
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper exper2 mothereduc
A 2-step process.

Regress x on a constant term, z and all other exogenous variables G,
and obtain the predicted values $\hat{x}$.

Use $\hat{x}$ as an instrumental variable for x.
Two-stage least squares (2SLS) estimator:

Stage 1 is the regression of x on a constant term, z and all other
exogenous variables G, to obtain the predicted values $\hat{x}$. This first
stage is called the reduced form model estimation.

Stage 2 is ordinary least squares estimation of the simple linear
regression
$y_i = \beta_1 + \beta_2 \hat{x}_i + error_i$ (10.23)
$\mathrm{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum (\hat{x}_i - \bar{x})^2}$ (10.24)

$\hat{\sigma}^2_{IV} = \frac{\sum (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2}{N - 2}$

$\widehat{\mathrm{var}}(\hat{\beta}_2) = \frac{\hat{\sigma}^2_{IV}}{\sum (\hat{x}_i - \bar{x})^2}$ (10.25)
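The two stages and the variance formulas (10.24) and (10.25) can be sketched for the simple regression case as follows. This is an illustration on simulated data under an assumed design (the same hypothetical one-instrument setup: x = 0.5·z + w, y = 1 + 1·x + e, error variance 1), not the wage example. It shows why the second stage run by hand reports the wrong standard error: its residuals use x̂, whereas σ̂²IV uses the original x.

```python
import random

def ols_simple(x, y):
    """Least squares intercept and slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b2 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b2 * xbar, b2

def two_stage(z, x, y):
    """2SLS by hand, with the correct and the naive error variance."""
    n = len(y)
    g1, g2 = ols_simple(z, x)                    # stage 1: x on constant and z
    xhat = [g1 + g2 * zi for zi in z]
    b1, b2 = ols_simple(xhat, y)                 # stage 2: y on constant and xhat
    xbar = sum(x) / n
    ss_xhat = sum((xh - xbar) ** 2 for xh in xhat)
    # Correct: residuals are computed with the ORIGINAL x, as in sigma^2_IV.
    sigma2_iv = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # Naive: what a hand-run second stage reports, residuals computed with xhat.
    sigma2_naive = sum((yi - b1 - b2 * xh) ** 2 for xh, yi in zip(xhat, y)) / (n - 2)
    return {"b1": b1, "b2": b2,
            "sigma2_iv": sigma2_iv, "sigma2_naive": sigma2_naive,
            "se_iv": (sigma2_iv / ss_xhat) ** 0.5,        # from (10.25)
            "se_naive": (sigma2_naive / ss_xhat) ** 0.5}

def simulate_iv_data(n, seed=0):
    """Assumed design with one instrument z and an endogenous x."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi, w = rng.gauss(0, 1), rng.gauss(0, 1)
        e = 0.6 * w + 0.8 * rng.gauss(0, 1)
        xi = 0.5 * zi + w
        z.append(zi)
        x.append(xi)
        y.append(1 + xi + e)
    return z, x, y

if __name__ == "__main__":
    res = two_stage(*simulate_iv_data(50_000))
    for key, value in res.items():
        print(f"{key:>12}: {value:.4f}")
```

The slope from the manual second stage is the IV slope, but only `se_iv` corresponds to (10.25). This mirrors the Stata outputs above, where the coefficients from the hand-run second stage and from `ivregress` agree while the standard errors do not.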
If we regress education on all the exogenous variables and the TWO instruments:
These look promising.
Rule of thumb for strong instruments: F > 10
In fact, the F test says:
Null hypothesis: the regression parameters are zero for the variables
mothereduc, fathereduc
Test statistic: F(2, 423) = 55.4003, p-value 4.26891e-022
$\widehat{\ln(WAGE)} = .0481 + .0614 \, EDUC + .0442 \, EXPER - .0009 \, EXPER^2$
(se) (.4003) (.0314) (.0134) (.0004)
ivregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, small
estimates store iv
With the additional instrument, we achieve a significant result in this case
The case of more than one endogenous variable is a bit more complex, but
the idea is that you need at least as many instruments as you have
endogenous variables.
With more than one endogenous variable you cannot use the F test we just
saw to test for the weakness of the instruments.
When testing the null hypothesis $H_0: \beta_k = c$, use of the test statistic
$t = (\hat{\beta}_k - c) / \mathrm{se}(\hat{\beta}_k)$ is valid in large samples. It is common, but not
universal, practice to use critical values, and p-values, based on the
Student-t distribution rather than the more strictly appropriate N(0,1)
distribution. The reason is that tests based on the t-distribution tend to
work better in samples of data that are not large.
When testing a joint hypothesis, such as $H_0: \beta_2 = c_2, \beta_3 = c_3$, the test
may be based on the chi-square distribution with the number of
degrees of freedom equal to the number of hypotheses (J) being
tested. The test itself may be called a “Wald” test, or a likelihood ratio
(LR) test, or a Lagrange multiplier (LM) test. These testing procedures
are all asymptotically equivalent.
Another complication: you might need to use tests that are robust to
heteroskedasticity and/or autocorrelation…
$y = \beta_1 + \beta_2 x + e$
$\hat{e} = y - \hat{\beta}_1 - \hat{\beta}_2 x$
$R^2 = 1 - \sum \hat{e}_i^2 / \sum (y_i - \bar{y})^2$
Unfortunately $R^2$ can be negative when based on IV estimates.
Therefore the use of measures like $R^2$ outside the context of least
squares estimation should be avoided (even if GRETL and STATA
produce one!)

Can we test for whether x is correlated with the error term? This might give
us a guide of when to use least squares and when to use IV estimators. TEST
OF EXOGENEITY: (Durbin-Wu) HAUSMAN TEST

Can we test whether our instrument is sufficiently strong to avoid the
problems associated with “weak” instruments? TESTS FOR WEAK
INSTRUMENTS

Can we test if our instrument is valid, and uncorrelated with the regression
error, as required? SOMETIMES ONLY: WITH A SARGAN TEST
$H_0: \mathrm{cov}(x_i, e_i) = 0$
$H_1: \mathrm{cov}(x_i, e_i) \neq 0$
If the null hypothesis is true, both OLS and the IV estimator are
consistent. If the null hypothesis holds, use the more efficient
estimator, OLS.

If the null hypothesis is false, OLS is not consistent, and the IV
estimator is consistent, so use the IV estimator.
$y_i = \beta_1 + \beta_2 x_i + e_i$
Let $z_1$ and $z_2$ be instrumental variables for x.
1.
Estimate the model $x_i = \gamma_1 + \theta_1 z_{i1} + \theta_2 z_{i2} + v_i$ by least squares, and
obtain the residuals $\hat{v}_i = x_i - \hat{\gamma}_1 - \hat{\theta}_1 z_{i1} - \hat{\theta}_2 z_{i2}$. If more than
one explanatory variable is being tested for endogeneity,
repeat this estimation for each one, using all available instrumental
variables in each regression.
2.
Include the residuals computed in step 1 as an explanatory variable
in the original regression, $y_i = \beta_1 + \beta_2 x_i + \delta \hat{v}_i + e_i$. Estimate this
"artificial regression" by least squares, and employ the usual t-test
for the hypothesis of significance
$H_0: \delta = 0$ (no correlation between $x_i$ and $e_i$)
$H_1: \delta \neq 0$ (correlation between $x_i$ and $e_i$)
3.
If more than one variable is being tested for endogeneity, the test
will be an F-test of joint significance of the coefficients on the
included residuals.

Note: This is a test for the exogeneity of the regressors $x_i$ and not for
the exogeneity of the instruments $z_i$. If the instruments are not valid,
the Hausman test is not valid either.
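The steps above can be sketched in Python (standard library only) for the simplest case of one endogenous regressor and one instrument. Everything here is an illustration under assumptions: the data are simulated, the helper names (`invert`, `ols`, `hausman_regression`, `simulate_data`) are hypothetical, and the conventional (non-robust) t statistic is used. A by-product worth noticing is that the coefficient on x in the artificial regression coincides numerically with the IV slope.

```python
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]

def ols(columns, y):
    """OLS of y on the regressor columns; returns (coefficients, standard errors)."""
    n, k = len(y), len(columns)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * c[t] for b, c in zip(beta, columns)) for t, yi in enumerate(y)]
    s2 = sum(r * r for r in resid) / (n - k)
    return beta, [(s2 * inv[i][i]) ** 0.5 for i in range(k)]

def hausman_regression(z, x, y):
    """Return (coefficient on x, t statistic on vhat) from the artificial regression."""
    ones = [1.0] * len(y)
    (g1, t1), _ = ols([ones, z], x)                       # step 1: first stage
    vhat = [xi - g1 - t1 * zi for xi, zi in zip(x, z)]    # first-stage residuals
    beta, se = ols([ones, x, vhat], y)                    # step 2: add vhat
    return beta[1], beta[2] / se[2]

def simulate_data(n, rho, seed=0):
    """Assumed design: x = 0.5*z + w, y = 1 + x + e, corr(w, e) = rho."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi, w = rng.gauss(0, 1), rng.gauss(0, 1)
        e = rho * w + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        z.append(zi)
        x.append(0.5 * zi + w)
        y.append(1 + x[-1] + e)
    return z, x, y

if __name__ == "__main__":
    for rho in (0.0, 0.6):
        b_x, t_delta = hausman_regression(*simulate_data(5_000, rho))
        print(f"rho={rho}: coefficient on x = {b_x:.3f}, t on vhat = {t_delta:.2f}")
```

A large absolute t statistic on the residual term is evidence against the exogeneity of x, in which case the IV estimator is the one to use.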
* Regression-based Hausman test
reg educ exper exper2 mothereduc fathereduc
predict vhat, residuals
reg lwage educ exper exper2 vhat
reg lwage educ exper exper2 vhat, vce(robust)

There are several different ways of computing this test, so don't worry if your result from other
software packages differs from the one you compute manually using the above script
. reg lwage educ exper exper2 vhat, vce(robust)

Linear regression                                      Number of obs =     428
                                                       F(  4,   423) =   21.52
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1624
                                                       Root MSE      =  .66502

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0326667     1.88   0.061    -.0028127     .125606
       exper |   .0441704   .0151219     2.92   0.004     .0144469    .0738939
      exper2 |   -.000899   .0004152    -2.16   0.031    -.0017152   -.0000828
        vhat |   .0581666   .0364135     1.60   0.111    -.0134073    .1297405
       _cons |   .0481003   .4221019     0.11   0.909    -.7815781    .8777787
------------------------------------------------------------------------------

So the evidence against the exogeneity of education is only borderline: the p-value on vhat is about 10%
. hausman iv ls, constant sigmamore
(sigmamore forces Stata to use the OLS residuals when calculating the error variance in both estimators)

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       iv           ls         Difference          S.E.
-------------+----------------------------------------------------------------
        educ |   .0613966     .1074896        -.046093        .0276406
       exper |   .0441704     .0415665        .0026039        .0015615
      exper2 |   -.000899    -.0008112       -.0000878        .0000526
       _cons |   .0481003    -.5220406        .5701409        .3418964
------------------------------------------------------------------------------
           b = consistent under Ho and Ha; obtained from ivregress
           B = inconsistent under Ha, efficient under Ho; obtained from regress

    Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        2.78
                Prob>chi2 =      0.0954
                (V_b-V_B is not positive definite)

So the evidence against the exogeneity of education is only borderline: the p-value is about 10%
$y = \beta_1 + \beta_2 x_2 + \cdots + \beta_G x_G + \beta_{G+1} x_{G+1} + e$
$x_{G+1} = \gamma_1 + \gamma_2 x_2 + \cdots + \gamma_G x_G + \theta_1 z_1 + v$
If we have L > 1 instruments available then the reduced form equation is
$x_{G+1} = \gamma_1 + \gamma_2 x_2 + \cdots + \gamma_G x_G + \theta_1 z_1 + \cdots + \theta_L z_L + v$
Then test with an F test whether the instruments help to determine the
value of the endogenous variable. Rule of thumb: F > 10 for one
endogenous regressor.
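The first-stage F statistic for the joint significance of the instruments can also be written in terms of the partial R-squared of the instruments. The sketch below uses that standard relationship and, as a check, plugs in the figures reported for the education example in the Stata output of this deck (partial R-sq. 0.2076, two instruments, 423 residual degrees of freedom). The function name `first_stage_f` is hypothetical, and the small gap to the reported 55.4003 comes from the rounding of the partial R-squared.

```python
def first_stage_f(partial_r2, n_instruments, resid_df):
    """F statistic for H0: all instrument coefficients in the first stage are zero.

    partial_r2    : partial R-squared of the instruments in the first stage
    n_instruments : number of excluded instruments, L
    resid_df      : residual degrees of freedom of the first-stage regression
    """
    return (partial_r2 / n_instruments) / ((1.0 - partial_r2) / resid_df)

if __name__ == "__main__":
    # Figures as reported in the deck's estat firststage output.
    f = first_stage_f(0.2076, 2, 423)
    print(f"F = {f:.2f}; passes the F > 10 rule of thumb: {f > 10}")
```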

As we saw before:




* Testing for weak instruments
reg educ exper exper2 mothereduc
reg educ exper exper2 fathereduc
reg educ exper exper2 mothereduc fathereduc
test mothereduc fathereduc

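* Testing for weak instruments using estat
ivregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, small
estat firststage

. estat firststage

  First-stage regression summary statistics
  --------------------------------------------------------------------------
               |            Adjusted      Partial
      Variable |   R-sq.       R-sq.        R-sq.     F(2,423)   Prob > F
  -------------+------------------------------------------------------------
          educ |  0.2115      0.2040       0.2076      55.4003    0.0000
  --------------------------------------------------------------------------

  Minimum eigenvalue statistic = 55.4003

  Critical Values                      # of endogenous regressors:    1
  Ho: Instruments are weak             # of excluded instruments:     2
  ---------------------------------------------------------------------
                                     |    5%     10%     20%     30%
  2SLS relative bias                 |         (not available)
  -----------------------------------+---------------------------------
                                     |   10%     15%     20%     25%
  2SLS Size of nominal 5% Wald test  |  19.93   11.59    8.75    7.25
  LIML Size of nominal 5% Wald test  |   8.68    5.33    4.42    3.92
  ---------------------------------------------------------------------

We are quite alright here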

* Robust tests
reg educ exper exper2 mothereduc fathereduc, vce(robust)
test mothereduc fathereduc

. reg educ exper exper2 mothereduc fathereduc, vce(robust)

Linear regression                                      Number of obs =     428
                                                       F(  4,   423) =   25.76
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2115
                                                       Root MSE      =   2.039

------------------------------------------------------------------------------
             |               Robust
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0452254   .0419107     1.08   0.281    -.0371538    .1276046
      exper2 |  -.0010091   .0013233    -0.76   0.446    -.0036101    .0015919
  mothereduc |    .157597   .0354502     4.45   0.000     .0879165    .2272776
  fathereduc |   .1895484   .0324419     5.84   0.000      .125781    .2533159
       _cons |    9.10264   .4241444    21.46   0.000     8.268947    9.936333
------------------------------------------------------------------------------

. test mothereduc fathereduc

 ( 1)  mothereduc = 0
 ( 2)  fathereduc = 0

       F(  2,   423) =   49.53
            Prob > F =    0.0000

Still alright

* Robust tests using ivregress
ivregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, small vce(robust)
estat firststage

. estat firststage

  First-stage regression summary statistics
  --------------------------------------------------------------------------
               |            Adjusted      Partial       Robust
      Variable |   R-sq.       R-sq.        R-sq.     F(2,423)   Prob > F
  -------------+------------------------------------------------------------
          educ |  0.2115      0.2040       0.2076      49.5266    0.0000
  --------------------------------------------------------------------------

Still alright
We want to check that the instrument is itself exogenous…but you can only do
it for surplus instruments if you have them (overidentified equation)…
1.
Compute the IV estimates $\hat{\beta}_k$ using all available instruments, including the
G variables $x_1 = 1, x_2, \ldots, x_G$ that are presumed to be exogenous, and the L
instruments $z_1, \ldots, z_L$.
2.
Obtain the residuals
$\hat{e} = y - \hat{\beta}_1 - \hat{\beta}_2 x_2 - \cdots - \hat{\beta}_K x_K$.
3.
Regress $\hat{e}$ on all the available instruments described in step 1.
4.
Compute $NR^2$ from this regression, where N is the sample size and
$R^2$ is the usual goodness-of-fit measure.
5.
If all of the surplus moment conditions are valid, then $NR^2 \sim \chi^2_{(L-B)}$,
where B is the number of endogenous right-hand-side variables.
If the value of the test statistic exceeds the 100(1−α)-percentile from
the $\chi^2_{(L-B)}$ distribution, then we conclude that at least one of the
surplus moment conditions is not valid.
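The last step of the procedure is a chi-square tail probability. The sketch below covers only the case of one surplus instrument (L − B = 1, as in the education example with two instruments and one endogenous regressor), for which the chi-square(1) upper tail has the closed form erfc(√(x/2)). The function names are hypothetical. The value plugged in, NR² = .37807151, is the one reported in the deck's Stata output for this example.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability P(X > stat) for a chi-square variable with 1 df."""
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))

def sargan_1df(nr2, alpha=0.05):
    """Return (p-value, reject?) for the NR^2 test with one surplus instrument."""
    p_value = chi2_sf_1df(nr2)
    return p_value, p_value < alpha

if __name__ == "__main__":
    p_value, reject = sargan_1df(0.37807151)
    print(f"p-value = {p_value:.4f}, reject validity of surplus instrument: {reject}")
```

With more than one surplus instrument the degrees of freedom change and this closed form no longer applies, so a general chi-square routine would be needed.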
* Sargan test of overidentification
ivregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, small
predict ehat, residuals
quietly reg ehat mothereduc fathereduc exper exper2
scalar nr2 = e(N)*e(r2)
scalar chic = invchi2tail(1,.05)
scalar pvalue = chi2tail(1,nr2)
di "NR^2 test of overidentifying restriction = " nr2
di "Chi-square critical value 1 df, .05 level = " chic
di "p value for overidentifying test 1 df, .05 level = " pvalue
You should learn about the Cragg-Donald test if you end up having
more than one endogenous variable
Sargan test of overidentification: output

. di "NR^2 test of overidentifying restriction = " nr2
NR^2 test of overidentifying restriction = .37807151

. di "Chi-square critical value 1 df, .05 level = " chic
Chi-square critical value 1 df, .05 level = 3.8414588

. di "p value for overidentifying test 1 df, .05 level = " pvalue
p value for overidentifying test 1 df, .05 level = .53863714

So we are quite safe here…
Sargan test of overidentification with estat overid

. quietly ivregress 2sls lwage (educ=mothereduc fathereduc) exper exper2, small

. estat overid

  Tests of overidentifying restrictions:

  Sargan (score) chi2(1) =  .378071  (p = 0.5386)
  Basmann chi2(1)        =  .373985  (p = 0.5408)

So we are quite safe here…















asymptotic properties
conditional expectation
endogenous variables
errors-in-variables
exogenous variables
finite sample properties
Hausman test
instrumental variable
instrumental variable estimator
just identified equations
large sample properties
over identified equations
population moments
random sampling
reduced form equation
sample moments
simultaneous equations bias
test of surplus moment conditions
two-stage least squares estimation
weak instruments