ECON 837 - Econometrics
Answer Keys Midterm exam
Exercise 1:
1. The estimator of β1 obtained by regressing y on X1 only is
\[
\begin{aligned}
\hat\beta_1 &= (X_1'X_1)^{-1}X_1'y \\
            &= (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + u) \\
            &= \beta_1 + (X_1'X_1)^{-1}(X_1'X_2)\beta_2 + (X_1'X_1)^{-1}X_1'u .
\end{aligned}
\]
Taking expectations conditional on X, we can deduce the bias:
\[
E(\hat\beta_1 \mid X) - \beta_1 = P_{12}\beta_2 , \quad \text{where } P_{12} = (X_1'X_1)^{-1}X_1'X_2 ,
\]
since E(u | X) = 0 under the classical assumptions.
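2. The matrix P12 can be written as
\[
P_{12} = \left(\frac{X_1'X_1}{n}\right)^{-1}\left(\frac{X_1'X_2}{n}\right),
\]
where
\[
\frac{X_1'X_1}{n} = \frac{1}{n}\sum_{i=1}^{n} x_{1i}x_{1i}' \quad \text{(sample variance of } X_1\text{)},
\qquad
\frac{X_1'X_2}{n} = \frac{1}{n}\sum_{i=1}^{n} x_{1i}x_{2i}' \quad \text{(sample covariance between } X_1 \text{ and } X_2\text{)}.
\]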
β̂1 will be unbiased if (X1′ X2 )/n = 0, that is, if the sample covariance between
X1 and X2 is exactly zero. It will also be unbiased if β2 = 0, that is if X2 are
irrelevant explanatory variables.
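The decomposition from question 1 can be checked numerically. The sketch below is only an illustration under assumed values: it uses a single regressor in X1 and in X2 with no intercept, and the coefficients 2.0 and 1.5, the correlation of 0.6 between the regressors and the helper name `ols_slope` are all hypothetical choices, not part of the exam.

```python
import random

def ols_slope(x, y):
    """Single-regressor OLS without intercept: (x'x)^{-1} x'y."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

random.seed(0)
n = 500
beta1, beta2 = 2.0, 1.5  # illustrative values, not from the exam

x1 = [random.gauss(0, 1) for _ in range(n)]
# x2 is built to be correlated with x1, so omitting it matters
x2 = [0.6 * a + random.gauss(0, 1) for a in x1]
u = [random.gauss(0, 1) for _ in range(n)]
y = [beta1 * a + beta2 * b + e for a, b, e in zip(x1, x2, u)]

b1_short = ols_slope(x1, y)   # regression of y on x1 only
p12 = ols_slope(x1, x2)       # P12 = (x1'x1)^{-1} x1'x2
noise = ols_slope(x1, u)      # (x1'x1)^{-1} x1'u, mean zero given X

# Exact finite-sample identity: b1_short = beta1 + P12*beta2 + noise
decomposition = beta1 + p12 * beta2 + noise
print("short-regression estimate:", b1_short)
print("beta1 + P12*beta2 + noise:", decomposition)
print("conditional bias P12*beta2:", p12 * beta2)
```

Setting `beta2 = 0.0`, or generating `x2` independently of `x1` in a large sample, makes the bias term vanish or shrink towards zero, in line with the two unbiasedness conditions stated above.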
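3. (a) Using the notation of the previous questions, we have X1 = [1 P] and X2 = [I], with β2 = γ. We can then use the formula from question 1 above to get
\[
E\left[\begin{pmatrix}\hat\alpha\\ \hat\beta\end{pmatrix}\,\Big|\,X\right] - \begin{pmatrix}\alpha\\ \beta\end{pmatrix} = P_{12}\beta_2
\]
with
\[
P_{12} = \frac{1}{n\sum_i P_i^2 - \left(\sum_i P_i\right)^2}
\begin{pmatrix}
\sum_i P_i^2 \sum_i I_i - \sum_i P_i \sum_i (P_i I_i)\\[4pt]
-\sum_i P_i \sum_i I_i + n\sum_i (P_i I_i)
\end{pmatrix}.
\]
Finally, we get:
\[
E(\hat\beta \mid X) - \beta
= \frac{n\sum_i (P_i I_i) - \sum_i P_i \sum_i I_i}{n\sum_i P_i^2 - \left(\sum_i P_i\right)^2}\times\gamma
= \frac{\text{sample cov. between } P \text{ and } I}{\text{sample var. of } P}\times\gamma .
\]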
We expect the sign of the bias to be positive, due to a positive covariance
between P and I and a positive sign of γ .
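As a sanity check on the algebra in (a), the following sketch computes the two elements of P12 from the closed-form 2x2 inverse and confirms that the slope element equals the sample covariance of P and I divided by the sample variance of P. The data are simulated under hypothetical assumptions (P uniform on [1, 10], I positively related to P); nothing here comes from the exam data.

```python
import random

random.seed(1)
n = 200
# Hypothetical data: P and I are positively correlated by construction
P = [random.uniform(1, 10) for _ in range(n)]
I = [0.5 * p + random.gauss(0, 1) for p in P]

sP, sI = sum(P), sum(I)
sPP = sum(p * p for p in P)
sPI = sum(p * i for p, i in zip(P, I))

# P12 = (X1'X1)^{-1} X1'X2 with X1 = [1 P] and X2 = [I]
det = n * sPP - sP ** 2
p12_intercept = (sPP * sI - sP * sPI) / det
p12_slope = (n * sPI - sP * sI) / det

# Sample moments (divisor n, matching the /n convention in the text)
mean_P, mean_I = sP / n, sI / n
cov_PI = sum((p - mean_P) * (i - mean_I) for p, i in zip(P, I)) / n
var_P = sum((p - mean_P) ** 2 for p in P) / n

print("slope element of P12:", p12_slope)
print("cov(P, I) / var(P):  ", cov_PI / var_P)
```

With a positive sample covariance the slope element is positive, so the sign of the bias of the price coefficient is the sign of γ.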
(b) Examples of other variables that should have been considered: a variable related to the quality of the furniture (e.g. brand name); variables related to socio-economic characteristics (e.g. gender, number of family members living at home). Their omission may bias the estimated price effect if they are correlated with the included covariates as well as with the dependent variable.
Exercise 2:
1. Not necessarily. Identification and computational issues only arise from exact
multicollinearity. Here, we may have "near-multicollinearity", which does not
invalidate or prevent OLS, but may create numerical and stability issues.
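One common way to quantify near-multicollinearity is the variance inflation factor (VIF). The answer key does not mention it, so the sketch below is an added diagnostic under assumed data: with two regressors, VIF = 1/(1 − r²), where r is their sample correlation. The regressors and the noise scale 0.1 are hypothetical.

```python
import random

def corr(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

random.seed(2)
n = 1000
x1 = [random.gauss(0, 1) for _ in range(n)]
# x2 is almost, but not exactly, a copy of x1: near-multicollinearity
x2 = [a + 0.1 * random.gauss(0, 1) for a in x1]

r = corr(x1, x2)
vif = 1.0 / (1.0 - r * r)
print("correlation:", r)
print("VIF:", vif)
```

Because r is strictly below 1, X'X remains invertible and OLS can still be computed; the large VIF only signals that the slope estimates are imprecise.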
2. The main idea is to remove some variables that are highly correlated with other
(included) variables. There are two ways to decide which variables to remove: based
on statistical theory, remove those whose coefficient is not significantly different
from zero; based on economic theory, remove those that should not be included
according to the theory.
3. One needs to make sure the model contains all relevant variables (in other
words, that no relevant variable has been omitted, which could cause endogeneity
and bias the estimators). The selection of the variables that remain in the
model depends on the goal of the model: are we testing some economic theory,
or are we looking for the best fit in a statistical sense? If we are testing an economic
theory, one can compare estimation results based on the unconstrained model
and on the constrained model, where the associated restrictions have been imposed
on the parameters. If we are looking for the best-fitting model, one can remove the
variables associated with parameters that are not significantly different from 0;
however, one has to be careful not to create endogeneity bias.
Exercise 3:
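1. Since we are testing a scalar parameter (i.e. one restriction only), there are two
test statistics that can be used to design an asymptotic test of H0 : µ = µ0 vs
H1 : µ ≠ µ0 at level α = 0.05 (confidence level 0.95): the t-test and the Wald test.
\[
t = \frac{\hat\mu - \mu_0}{\widehat{\mathrm{ase}}(\hat\mu)}
\quad\text{with}\quad
\hat\mu = \frac{1}{n}\sum_i x_i , \quad
\widehat{\mathrm{ase}}(\hat\mu) = \frac{\hat\sigma}{\sqrt{n}} , \quad
\hat\sigma^2 = \frac{1}{n}\sum_i (x_i - \hat\mu)^2 ,
\]
\[
W = \frac{(\hat\mu - \mu_0)^2}{\widehat{\mathrm{avar}}(\hat\mu)}
\quad\text{with}\quad
\widehat{\mathrm{avar}}(\hat\mu) = \hat\sigma^2 / n .
\]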
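Since the xi are iid, we can apply a CLT to the xi to get:
\[
\sqrt{n}(\hat\mu - \mu) \xrightarrow{d} N(0, \sigma^2)
\]
(note that \(\sum_i x_i / n = \hat\mu\)).
Under H0 , the limit distribution is:
\[
\begin{aligned}
& \sqrt{n}(\hat\mu - \mu_0) \xrightarrow{d} N(0, \sigma^2) \\
\Rightarrow\ & \frac{\sqrt{n}}{\sigma}(\hat\mu - \mu_0) \xrightarrow{d} N(0, 1) \\
\Rightarrow\ & \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_0) \xrightarrow{d} N(0, 1)
\quad\text{and}\quad
\frac{n}{\hat\sigma^2}(\hat\mu - \mu_0)^2 \xrightarrow{d} \chi^2(1)
\end{aligned}
\]
where σ̂ is the consistent estimator of σ introduced above. Then, we get:
\[
t = \frac{\hat\mu - \mu_0}{\hat\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1),
\qquad
W = \frac{n(\hat\mu - \mu_0)^2}{\hat\sigma^2} \xrightarrow{d} \chi^2(1).
\]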
The associated decision rules are:
- [t-test] Reject H0 if \(|t| > t_{1-\alpha/2}\), with \(t_{1-\alpha/2}\) the (1 − α/2)-quantile of the standard
normal distribution;
- [Wald test] Reject H0 if \(W > \chi^2_{1-\alpha}(1)\), with \(\chi^2_{1-\alpha}(1)\) the (1 − α)-quantile of the
chi-square distribution with 1 degree of freedom.
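The two statistics and their decision rules can be sketched in a few lines. The function name `mean_tests` and the simulated sample are hypothetical. The chi-square critical value is obtained as the square of the normal critical value, which is valid here because the square of a standard normal variable is χ²(1); this also makes it clear that W = t² and that the two tests always agree when there is a single restriction.

```python
import math
import random
from statistics import NormalDist

def mean_tests(x, mu0, alpha=0.05):
    """Asymptotic t and Wald tests of H0: mu = mu0 for an iid sample x."""
    n = len(x)
    mu_hat = sum(x) / n
    sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n  # divisor n, as in the text
    avar_hat = sigma2_hat / n
    t = (mu_hat - mu0) / math.sqrt(avar_hat)
    W = (mu_hat - mu0) ** 2 / avar_hat
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    chi2_crit = z_crit ** 2  # (1 - alpha)-quantile of chi2(1)
    return {
        "t": t,
        "W": W,
        "z_crit": z_crit,
        "chi2_crit": chi2_crit,
        "reject_t": abs(t) > z_crit,
        "reject_W": W > chi2_crit,
    }

random.seed(3)
sample = [random.gauss(0.3, 1.0) for _ in range(400)]  # hypothetical data
result = mean_tests(sample, mu0=0.0)
print(result)
```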
2. To show that our test is consistent against the fixed alternative H1 : µ = µ1
with µ1 ≠ µ0 , we need to show that, under H1 , the probability of rejecting H0
converges to 1 as the sample size increases. To do so, we derive the
asymptotic behaviour of the test statistic under H1 :
\[
\begin{aligned}
t &= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_0) \\
  &= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_1 + \mu_1 - \mu_0) \\
  &= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_1) + \frac{\sqrt{n}}{\hat\sigma}(\mu_1 - \mu_0).
\end{aligned}
\]
Using a CLT on the xi , we can show that the first term converges in distribution
to a normal with mean 0 and variance 1; since µ1 ≠ µ0 , the
second term diverges towards +∞ or −∞ (depending on the sign of
µ1 − µ0 ). We can then compute the asymptotic power:
\[
\text{Power} = P(|t| > 1.96)
= P\left(\left|Z_n + \frac{\sqrt{n}}{\hat\sigma}(\mu_1 - \mu_0)\right| > 1.96\right)
\to 1 \quad\text{as } n \to \infty ,
\]
where \(Z_n = \sqrt{n}(\hat\mu - \mu_1)/\hat\sigma \xrightarrow{d} N(0,1)\).
The limit follows since \(|\sqrt{n}(\mu_1 - \mu_0)| \to \infty\) while \(Z_n\) remains bounded in probability, so that as the sample size
increases |t| eventually exceeds 1.96 with probability approaching 1.
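To see the power approach 1 in numbers, the sketch below evaluates the normal approximation P(|Z + √n(µ1 − µ0)/σ| > z) for growing n. It is an approximation under stated assumptions: σ̂ is replaced by σ and Z is treated as exactly standard normal; the gap µ1 − µ0 = 0.1, σ = 1 and the function name `approx_power` are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(mu0, mu1, sigma, n, alpha=0.05):
    """Normal approximation to P(|t| > z) when the true mean is mu1."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(n) * (mu1 - mu0) / sigma  # diverges with n when mu1 != mu0
    # P(Z + shift > z) + P(Z + shift < -z)
    return (1 - std_normal.cdf(z - shift)) + std_normal.cdf(-z - shift)

sizes = (10, 100, 1000, 10000)
powers = [approx_power(0.0, 0.1, 1.0, n) for n in sizes]
for n, p in zip(sizes, powers):
    print(f"n = {n:>5}: approximate power = {p:.4f}")
```

When `mu1 == mu0` the shift is zero and the function returns α, the size of the test.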
3. To derive the asymptotic power of our test under the sequence of local
alternatives \(H_{1,n} : \mu = \mu_{1,n} = \mu_0 + \delta/\sqrt{n}\), we first derive the asymptotic distribution of
our test statistic under \(H_{1,n}\):
\[
\begin{aligned}
t &= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_0) \\
  &= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_{1,n} + \mu_{1,n} - \mu_0) \\
  &= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_{1,n}) + \frac{\sqrt{n}}{\hat\sigma}(\mu_{1,n} - \mu_0) \\
  &= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_{1,n}) + \frac{\delta}{\hat\sigma}.
\end{aligned}
\]
Using a CLT on the xi , we can show that the first term converges in distribution
to a normal with mean 0 and variance 1; the second term converges to δ/σ since σ̂
is a consistent estimator of σ . Overall, the test statistic converges in distribution
towards a normal with mean δ/σ and variance 1. We can then compute the
asymptotic power:
\[
\text{Power} = P(|t| > 1.96) \to P(|Z_\delta| > 1.96) \ \ge\ \alpha ,
\quad\text{where } Z_\delta \sim N(\delta/\sigma, 1).
\]
The power is some probability between α and 1 that depends on δ : the larger |δ| ,
the larger the power.
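A small Monte Carlo experiment can illustrate that, under local alternatives, the rejection frequency stays roughly constant in n and close to P(|Z_δ| > 1.96). This is a sketch under assumptions not in the exam: normally distributed xi, δ = 2, σ = 1, and the hypothetical helper names `local_power` and `rejection_rate`.

```python
import math
import random
from statistics import NormalDist

def local_power(delta, sigma=1.0, alpha=0.05):
    """Asymptotic power P(|Z_delta| > z) with Z_delta ~ N(delta/sigma, 1)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    m = delta / sigma
    return (1 - std_normal.cdf(z - m)) + std_normal.cdf(-z - m)

def rejection_rate(n, delta, mu0=0.0, sigma=1.0, reps=1000, alpha=0.05, seed=0):
    """Monte Carlo frequency of |t| > z when the true mean is mu0 + delta/sqrt(n)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mu = mu0 + delta / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        mu_hat = sum(x) / n
        sigma_hat = math.sqrt(sum((xi - mu_hat) ** 2 for xi in x) / n)
        t = math.sqrt(n) * (mu_hat - mu0) / sigma_hat
        rejections += abs(t) > z
    return rejections / reps

freq_small = rejection_rate(50, 2.0)
freq_large = rejection_rate(400, 2.0)
print("asymptotic power:", local_power(2.0))
print("rejection rate, n = 50: ", freq_small)
print("rejection rate, n = 400:", freq_large)
```

Unlike the fixed-alternative case, increasing n does not push the rejection rate towards 1 here, because the alternative moves towards µ0 at rate 1/√n.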
Exercise 4 [13 points]:
1. The OLS estimator of β is
\[
\hat\beta = (Y_2'Y_2)^{-1}Y_2'Y_1 ,
\quad\text{where } Y_2 = (y_{2,1}, \dots, y_{2,n})' \text{ and } Y_1 = (y_{1,1}, \dots, y_{1,n})' .
\]
Substituting \(Y_1 = \beta Y_2 + u_1\), we get
\[
\hat\beta = (Y_2'Y_2)^{-1}Y_2'(\beta Y_2 + u_1)
= \beta + \left(\frac{Y_2'Y_2}{n}\right)^{-1}\left(\frac{Y_2'u_1}{n}\right).
\]
Since the \(y_{2,i}\) are iid, we can apply a LLN to deduce \(Y_2'Y_2/n \xrightarrow{P} E(y_{2,i}^2)\). Also, we
have:
\[
\frac{Y_2'u_1}{n} = \frac{(\alpha X' + u_2')u_1}{n}
= \alpha\frac{X'u_1}{n} + \frac{u_2'u_1}{n}
\xrightarrow{P} \alpha E(x_i u_{1,i}) + E(u_{2,i}u_{1,i}).
\]
By assumption xi is an exogenous variable, which means \(E(x_i u_{1,i}) = 0\). By
assumption, \(E(u_{2,i}u_{1,i}) = \sigma_{12}\). Hence \(\hat\beta \xrightarrow{P} \beta + \sigma_{12}/E(y_{2,i}^2)\), so, in general, this estimator is not consistent.
In order to recover consistency, we need to restrict σ12 = 0, that is, no correlation
between the error terms of the two equations. Such a restriction also guarantees
that there is no endogeneity issue, which makes OLS consistent and the preferred estimator.
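The probability limit of the OLS estimator can be illustrated by simulation. The sketch below assumes a specific design that is not given in the exam: α = β = 1, standard normal xi, unit-variance errors and σ12 = 0.5, so that E(y2²) = α²E(x²) + Var(u2) = 2 and the limit β + σ12/E(y2²) equals 1.25. The helper `simulate_system` is hypothetical.

```python
import math
import random

def simulate_system(n, alpha=1.0, beta=1.0, rho=0.5, seed=0):
    """Draw (x, y1, y2) from y2 = alpha*x + u2, y1 = beta*y2 + u1, Cov(u1, u2) = rho."""
    rng = random.Random(seed)
    c = math.sqrt(1 - rho ** 2)
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        u2 = rng.gauss(0, 1)
        u1 = rho * u2 + c * rng.gauss(0, 1)  # Var(u1) = 1, Cov(u1, u2) = rho
        y2i = alpha * xi + u2
        x.append(xi)
        y2.append(y2i)
        y1.append(beta * y2i + u1)
    return x, y1, y2

x, y1, y2 = simulate_system(20000)
beta_ols = sum(a * b for a, b in zip(y2, y1)) / sum(a * a for a in y2)

# plim = beta + sigma12 / E(y2^2) under the assumed design
plim_ols = 1.0 + 0.5 / (1.0 ** 2 * 1.0 + 1.0)
print("OLS estimate of beta:", beta_ols)
print("probability limit:   ", plim_ols)
```

Calling `simulate_system` with `rho=0.0` removes the correlation between the errors, and the OLS estimate then settles near the true β = 1.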
2. The OLS estimator of α is
\[
\begin{aligned}
\hat\alpha &= (X'X)^{-1}X'Y_2 \quad\text{with } X = (x_1, \dots, x_n)' \\
           &= (X'X)^{-1}X'(\alpha X + U_2) \\
           &= \alpha + \left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'U_2}{n}\right).
\end{aligned}
\]
Since xi is iid, we can apply a LLN to show \(X'X/n \xrightarrow{P} E(x_i^2)\); in addition, we
also have \(X'U_2/n \xrightarrow{P} E(x_i u_{2,i}) = 0\) since xi is exogenous. We conclude that this
estimator is consistent.
3. The IV estimator of β using x as an instrument is:
\[
\begin{aligned}
\hat\beta_{iv} &= (\hat Y_2'\hat Y_2)^{-1}\hat Y_2'Y_1
\quad\text{where } \hat Y_2 = \hat\alpha X = P_x Y_2 \text{ and } P_x = X(X'X)^{-1}X' \\
&= (Y_2'P_xY_2)^{-1}Y_2'P_xY_1 \\
&= (Y_2'P_xY_2)^{-1}Y_2'P_x(\beta Y_2 + u_1) \\
&= \beta + (Y_2'P_xY_2)^{-1}Y_2'P_xu_1 .
\end{aligned}
\]
We can show that the second term converges in probability toward 0:
\[
(Y_2'P_xY_2)^{-1}Y_2'P_xu_1
= \left[\frac{Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'Y_2}{n}\right]^{-1}
\frac{Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'u_1}{n}
\]
where, as previously shown and discussed,
- \(X'Y_2/n \xrightarrow{P} E(y_{2,i}x_i) = \alpha E(x_i^2) \neq 0\) (provided α ≠ 0)
- \(X'X/n \xrightarrow{P} E(x_i^2) \neq 0\)
- \(X'u_1/n \xrightarrow{P} E(x_i u_{1,i}) = 0\)
Hence, the IV estimator is consistent.
When comparing the OLS and the IV estimators of β : the IV estimator should be
used when there is endogeneity, since it delivers a consistent estimator; otherwise, the
OLS estimator should be used.
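With a single regressor and a single instrument, the two-stage form (regress Y1 on the fitted values Ŷ2 = α̂X) and the ratio X'Y1 / X'Y2 give the same number. The sketch below checks this and the consistency claim on simulated data; the design (α = β = 1, σ12 = 0.5, normal draws) and the helper `simulate_system` are assumptions for illustration only.

```python
import math
import random

def simulate_system(n, alpha=1.0, beta=1.0, rho=0.5, seed=0):
    """Draw (x, y1, y2) from y2 = alpha*x + u2, y1 = beta*y2 + u1, Cov(u1, u2) = rho."""
    rng = random.Random(seed)
    c = math.sqrt(1 - rho ** 2)
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        u2 = rng.gauss(0, 1)
        u1 = rho * u2 + c * rng.gauss(0, 1)
        y2i = alpha * xi + u2
        x.append(xi)
        y2.append(y2i)
        y1.append(beta * y2i + u1)
    return x, y1, y2

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x, y1, y2 = simulate_system(20000)

# First stage: alpha_hat = (X'X)^{-1} X'Y2, fitted values Y2_hat = alpha_hat * X
alpha_hat = dot(x, y2) / dot(x, x)
y2_hat = [alpha_hat * xi for xi in x]

# Second stage: regress Y1 on the fitted values
beta_2sls = dot(y2_hat, y1) / dot(y2_hat, y2_hat)
# Equivalent just-identified form: X'Y1 / X'Y2
beta_ratio = dot(x, y1) / dot(x, y2)

print("two-stage estimate:", beta_2sls)
print("ratio estimate:    ", beta_ratio)
```

Both numbers should sit close to the true β = 1 even though the errors are correlated, which is the case where OLS on the same data is inconsistent.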
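4. (a) The OLS estimator of α in the updated model is
\[
\begin{aligned}
\hat\alpha &= (X'X)^{-1}X'Y_2 \\
           &= (X'X)^{-1}X'(\alpha_n X + U_2) \\
           &= \alpha_n + \left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'U_2}{n}\right)
           \xrightarrow{P} 0 ,
\end{aligned}
\]
where the first term \(\alpha_n = \alpha/\sqrt{n} \to 0\) as \(n \to \infty\) and the second term converges in
probability to 0 as shown previously.
In addition, we have:
\[
\sqrt{n}\,\hat\alpha = \alpha + \left(\frac{X'X}{n}\right)^{-1}\left(\sqrt{n}\,\frac{X'U_2}{n}\right)
\xrightarrow{d} \alpha + N\left(0,\ E(x_i^2)^{-2}E(x_i^2u_{2,i}^2)\right).
\]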
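(b) In the updated model, the IV estimator of β using x as an instrument is
such that:
\[
\hat\beta_{iv} = (\hat Y_2'\hat Y_2)^{-1}\hat Y_2'Y_1 = \beta + (Y_2'P_xY_2)^{-1}Y_2'P_xu_1 ,
\]
where
\[
\frac{X'Y_2}{n} = \frac{X'(\alpha_n X + u_2)}{n}
= \frac{\alpha}{\sqrt{n}}\frac{X'X}{n} + \frac{X'u_2}{n} \xrightarrow{P} 0 .
\]
We need to rescale this expression in order to find its limit:
\[
\sqrt{n}\,\frac{X'Y_2}{n} = \sqrt{n}\,\frac{X'(\alpha_n X + u_2)}{n}
= \alpha\frac{X'X}{n} + \sqrt{n}\,\frac{X'u_2}{n}
\xrightarrow{d} Z_2 \sim N\left(\alpha E(x_i^2),\ E(x_i^2u_{2,i}^2)\right).
\]
So overall, we have:
\[
\begin{aligned}
\hat\beta_{iv} &= (\hat Y_2'\hat Y_2)^{-1}\hat Y_2'Y_1 \\
&= \beta + (Y_2'P_xY_2)^{-1}Y_2'P_xu_1 \\
&= \beta + \left[\frac{\sqrt{n}\,Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{\sqrt{n}\,X'Y_2}{n}\right]^{-1}
\left[\frac{\sqrt{n}\,Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{\sqrt{n}\,X'u_1}{n}\right] \\
&\xrightarrow{d} \beta + \left[Z_2'E(x_i^2)^{-1}Z_2\right]^{-1}\left[Z_2'E(x_i^2)^{-1}Z_1\right],
\end{aligned}
\]
where a CLT yields \(\sqrt{n}\,X'u_1/n \xrightarrow{d} Z_1 \sim N(0, E(x_i^2u_{1,i}^2))\). The limit of the IV
estimator is stochastic, so the estimator is no longer consistent.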
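(c)
i. The OLS estimator of α is now
\[
\begin{aligned}
\hat\alpha &= (X'X)^{-1}X'Y_2 \\
           &= (X'X)^{-1}X'(\alpha_n X + U_2) \\
           &= \alpha_n + \left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'U_2}{n}\right)
           \xrightarrow{P} 0 ,
\end{aligned}
\]
where the first term \(\alpha_n = \alpha/n^k \to 0\) as \(n \to \infty\) and the second term converges in
probability to 0 as shown previously.
In addition, when k < 1/2, we have:
\[
n^k\hat\alpha = \alpha + \left(\frac{X'X}{n}\right)^{-1}\left(n^k\,\frac{X'U_2}{n}\right) \xrightarrow{P} \alpha
\]
since
\[
n^k\,\frac{X'U_2}{n} = \frac{n^k}{\sqrt{n}}\left(\sqrt{n}\,\frac{X'U_2}{n}\right) \xrightarrow{P} 0 ,
\]
because \(n^k/\sqrt{n} \to 0\) while \(\sqrt{n}\,X'U_2/n\) converges in distribution.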
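ii. In the updated model, the IV estimator of β using x as an instrument
is such that:
\[
\begin{aligned}
\hat\beta_{iv} &= (\hat Y_2'\hat Y_2)^{-1}\hat Y_2'Y_1 \\
&= \beta + (Y_2'P_xY_2)^{-1}Y_2'P_xu_1 \\
&= \beta + \left[\frac{n^k\,Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{n^k\,X'Y_2}{n}\right]^{-1}
\left[\frac{n^k\,Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{n^k\,X'u_1}{n}\right] \\
&\xrightarrow{P} \beta
\end{aligned}
\]
since, when k < 1/2,
\[
\frac{n^k\,X'Y_2}{n} = \alpha\frac{X'X}{n} + \frac{n^k}{\sqrt{n}}\left(\sqrt{n}\,\frac{X'u_2}{n}\right) \xrightarrow{P} \alpha E(x_i^2)
\quad\text{and}\quad
\frac{n^k\,X'u_1}{n} = \frac{n^k}{\sqrt{n}}\left(\sqrt{n}\,\frac{X'u_1}{n}\right) \xrightarrow{P} 0 .
\]
Hence, when k < 1/2, the IV estimator is consistent.
When k > 1/2, there is no well-defined limit distribution.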
(d) This is the typical framework used to model "weak instruments": these are
instruments with limited explanatory power, in the sense that they only have a
"small" correlation with the endogenous variable. To model such a "small
correlation" and still be able to derive an asymptotic distribution, econometricians
rely on a device where the strength of that correlation is tied to the
sample size. When the rate is exactly √n, the information disappears as fast
as it accumulates when the sample size increases (CLT): the estimator
is not consistent and has a non-standard limit distribution. When the rate is slower,
n^k with k < 1/2 (the instrument is near-weak: a little better than weak, but not
quite strong), one still gets a consistent estimator with a standard limit distribution,
but with a slower rate of convergence. Finally, with n^k
and k > 1/2, everything is lost: the instrument is "too weak".
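The three regimes can be compared in a small simulation. The sketch below is illustrative only and rests on assumptions not in the exam: α = β = 1, σ12 = 0.5, normal draws, 300 replications and sample sizes 100 and 900. It reports the interquartile range (IQR) of the IV estimator across replications for α_n = α/n^k with k = 0 (strong), k = 1/4 (near-weak) and k = 1/2 (weak). The IQR is used rather than the standard deviation because the weak-instrument estimator has very heavy tails.

```python
import math
import random
from statistics import quantiles

def iv_estimate(rng, n, k, alpha=1.0, beta=1.0, rho=0.5):
    """One draw of the just-identified IV estimator x'y1 / x'y2 with alpha_n = alpha / n**k."""
    alpha_n = alpha / n ** k
    c = math.sqrt(1 - rho ** 2)
    sxy1 = sxy2 = 0.0
    for _ in range(n):
        x = rng.gauss(0, 1)
        u2 = rng.gauss(0, 1)
        u1 = rho * u2 + c * rng.gauss(0, 1)  # Cov(u1, u2) = rho
        y2 = alpha_n * x + u2
        y1 = beta * y2 + u1
        sxy1 += x * y1
        sxy2 += x * y2
    return sxy1 / sxy2

def iv_iqr(n, k, reps=300, seed=0):
    """Interquartile range of the IV estimator over `reps` simulated samples."""
    rng = random.Random(seed)
    draws = [iv_estimate(rng, n, k) for _ in range(reps)]
    q1, _, q3 = quantiles(draws, n=4)
    return q3 - q1

spread = {}
for k, label in ((0.0, "strong"), (0.25, "near-weak"), (0.5, "weak")):
    spread[k] = (iv_iqr(100, k), iv_iqr(900, k))
    small, large = spread[k]
    print(f"{label:>9} (k = {k}): IQR at n=100 is {small:.3f}, at n=900 is {large:.3f}")
```

Under these assumptions one would expect the IQR to shrink by roughly a factor of 3 for the strong instrument when n grows ninefold, to shrink more slowly in the near-weak case, and to stay of the same order in the weak case, which is the finite-sample counterpart of the inconsistency result in (b).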