Solutions to Problem Set 1
Econometrics (30413)
Spring 2022
Theory Questions
Question 1
Given the model:
Y = Xβ + e
e ∼ i.i.d. N(0, σ²I)
consider a general linear estimator:
β̃ = [(X′X)⁻¹X′ + C] Y

(where β̃ is K×1, (X′X)⁻¹ is K×K, X′ and C are K×T, and Y is T×1)
where C is a deterministic matrix.
a) Write down the condition to be imposed on C so that E(β̃) = β.
b) Supposing that the condition in a) is satisfied, write down the variance-covariance matrix of
β̃.
c) Show whether the estimator β̃ is consistent.
d) Show which condition has to be imposed on C in order for the model fit X β̃ and the residuals
Y − X β̃ to be orthogonal.
Solution
a) β̃ can be written as:

β̃ = [(X′X)⁻¹X′ + C][Xβ + e] = β + (X′X)⁻¹X′e + CXβ + Ce

so that E(β̃) = β if and only if CX = 0. (Notice that CX = 0, which is K×K, is not the same condition as XC = 0, which is T×T; however, if CX = 0, then also X′C′ = 0.)
b)

Var(β̃) = σ²[(X′X)⁻¹X′ + C][(X′X)⁻¹X′ + C]′
= σ²[(X′X)⁻¹ + (X′X)⁻¹X′C′ + CX(X′X)⁻¹ + CC′] = σ²[(X′X)⁻¹ + CC′]

since X′C′ = 0 and CX = 0.
c) We can write:

Var(β̃) = (σ²/T)·(X′X/T)⁻¹ + (σ²/T)·(T·CC′) = (σ²/T)·Σ_XX⁻¹ + (σ²/T)·(T·CC′)

where Σ_XX = X′X/T.
The first term tends to 0 as T grows, being the product of a factor that goes to 0 (σ²/T) and a matrix that converges to a constant.
The second term, however, does not tend to 0 as T increases, implying that Var(β̃) does not tend to 0 for T → ∞.
The β̃ estimator is thus not consistent, unless we impose a condition on the limit of T·CC′ (for instance, that it tends to a constant matrix, in which case the second term vanishes as well).
d) In order for the model fit to be orthogonal to the residuals, the following condition has to
be satisfied:
β̃′X′ẽ = 0
We can rewrite the residuals as:
ẽ = Y − Xβ̃ = Y − X[(X′X)⁻¹X′ + C]Y
= [I − X(X′X)⁻¹X′ − XC]Xβ + [I − X(X′X)⁻¹X′ − XC]e
= [I − X(X′X)⁻¹X′ − XC]e

since [I − X(X′X)⁻¹X′ − XC]Xβ = 0 (using CX = 0).
The estimator β̃ can be written as:

β̃ = [(X′X)⁻¹X′ + C][Xβ + e] = β + (X′X)⁻¹X′e + CXβ + Ce = β + (X′X)⁻¹X′e + Ce

since CXβ = 0.
And therefore:

β̃′X′ẽ = [e′C′ + β′ + e′X(X′X)⁻¹] X′ [I − X(X′X)⁻¹X′ − XC] e
= [e′C′ + β′ + e′X(X′X)⁻¹][X′e − X′e − X′XCe]
= −e′C′X′XCe − β′X′XCe − e′XCe = −(e′C′X′ + β′X′ + e′)XCe
Consequently, it must be that C = 0 for the orthogonality condition to be satisfied (remember that CX = 0 does not imply XC = 0, so the last term does not vanish unless XC = 0, which for a full-column-rank X requires C = 0).
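The algebra above can also be checked numerically. The sketch below is not part of the original solution: it uses illustrative dimensions and parameter values, and constructs C as D·MX (with D an arbitrary K×T matrix and MX = I − X(X′X)⁻¹X′), which is just one convenient way to guarantee CX = 0.

```python
# Monte Carlo sketch (illustrative values): with CX = 0 the general linear estimator
# beta_tilde = [(X'X)^-1 X' + C] Y stays unbiased but has a larger variance than OLS.
import numpy as np

rng = np.random.default_rng(0)
T, K, sigma = 200, 3, 1.0
beta = np.array([1.0, 2.0, -0.5])

X = rng.normal(size=(T, K))
XtX_inv = np.linalg.inv(X.T @ X)
M_X = np.eye(T) - X @ XtX_inv @ X.T      # annihilator matrix, M_X X = 0
D = 0.05 * rng.normal(size=(K, T))       # arbitrary deterministic K x T matrix
C = D @ M_X                              # hence C X = D M_X X = 0

ols_draws, gen_draws = [], []
for _ in range(5000):
    e = sigma * rng.normal(size=T)
    y = X @ beta + e
    b_ols = XtX_inv @ X.T @ y
    ols_draws.append(b_ols)
    gen_draws.append(b_ols + C @ y)      # beta_tilde = [(X'X)^-1 X' + C] y

ols_draws, gen_draws = np.array(ols_draws), np.array(gen_draws)
print("mean of beta_hat:  ", ols_draws.mean(axis=0))   # both close to beta (unbiased)
print("mean of beta_tilde:", gen_draws.mean(axis=0))
print("variance ratio (tilde / OLS):", gen_draws.var(axis=0) / ols_draws.var(axis=0))  # >= 1
```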
Question 2
Consider the linear model with k regressors:
Y = Xβ + ε
ε ∼ i.i.d.(0, σ²I)
Under the strong OLS assumptions:
a) Write down the OLS estimator of the β parameters, β̂.
b) Write down the relationship between the residuals ε̂ and the errors ε.
c) Show that β̂ and ε̂ are independent.
Solution
a) Let us consider the loss function we want to minimize:

S = Σₜ eₜ² = e′e = (y − Xβ)′(y − Xβ) = y′y − y′Xβ − β′X′y + β′X′Xβ

Since all the terms are scalars (so that y′Xβ = β′X′y):

S = y′y − 2β′X′y + β′X′Xβ
The first-order conditions are then:

∂S/∂β = −2X′y + 2X′Xβ = 0
And the OLS estimator is:

β̂ = (X′X)⁻¹X′y
As a last step, we have to verify that β̂ is indeed the minimizer of the sum of squared residuals. To do this, we can check the second-order derivative:

∂²S/∂β∂β′ = 2X′X

As X′X is positive definite, S is indeed minimized.
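As a quick numerical illustration of the closed form just derived (a sketch with simulated data, not part of the original solution), the OLS estimator can be obtained by solving the normal equations X′Xβ̂ = X′y:

```python
# Sketch with simulated data: OLS from the normal equations X'X beta_hat = X'y.
import numpy as np

rng = np.random.default_rng(1)
T = 500
beta = np.array([0.5, 2.0, -1.0])
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])   # constant + two regressors
y = X @ beta + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # (X'X)^{-1} X'y, without forming the inverse
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))          # True: same estimator
```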
b) The residuals of the regression ε̂ can be rewritten as a function of the errors ε:
ε̂ = y − Xβ̂ = y − X(X′X)⁻¹X′y = [I − X(X′X)⁻¹X′][Xβ + ε]
= Xβ + ε − X(X′X)⁻¹X′Xβ − X(X′X)⁻¹X′ε = [I − X(X′X)⁻¹X′]ε

since X(X′X)⁻¹X′Xβ = Xβ.
c) As the errors are normally distributed, β̂ and ε̂ (both linear functions of ε) are also jointly normally distributed. Therefore, in order to show that β̂ and ε̂ are independent, it is enough to show that their covariance is zero:

Cov(β̂, ε̂) = E[(β̂ − β)(ε̂ − 0)′] = E[(X′X)⁻¹X′ε ε′(I − X(X′X)⁻¹X′)]
= σε²(X′X)⁻¹X′[I − X(X′X)⁻¹X′] = 0
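A small simulation sketch (illustrative setup, not part of the original solution) confirms both results: within each sample the residuals equal MXε, and across replications the covariance between β̂ and the residuals is approximately zero.

```python
# Sketch: eps_hat = M_X eps in every sample, and Cov(beta_hat, eps_hat) is close to zero.
import numpy as np

rng = np.random.default_rng(2)
T = 100
beta = np.array([1.0, -2.0])
X = np.column_stack([np.ones(T), rng.normal(size=T)])
M_X = np.eye(T) - X @ np.linalg.inv(X.T @ X) @ X.T

slope_draws, resid_draws = [], []
for _ in range(10000):
    eps = rng.normal(size=T)
    y = X @ beta + eps
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    eps_hat = y - X @ beta_hat
    assert np.allclose(eps_hat, M_X @ eps)   # residuals are a linear function of the errors
    slope_draws.append(beta_hat[1])
    resid_draws.append(eps_hat[0])           # track one residual as an example

print(np.cov(slope_draws, resid_draws)[0, 1])    # approximately 0
```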
Question 3
Consider the matrix:
MX = I − X(X′X)⁻¹X′
Show that:
a) MX · MX = MX , i.e. MX is idempotent.
b) MX · X = 0.
Solution
a) We can rewrite the expression as:
MX · MX = [I − X(X′X)⁻¹X′][I − X(X′X)⁻¹X′]
= I − X(X′X)⁻¹X′ − X(X′X)⁻¹X′ + X(X′X)⁻¹X′X(X′X)⁻¹X′
= I − X(X′X)⁻¹X′ − X(X′X)⁻¹X′ + X(X′X)⁻¹X′      (since (X′X)⁻¹X′X = I)
= I − X(X′X)⁻¹X′ = MX
and we have shown that MX is idempotent.
b)
MX · X = [I − X(X′X)⁻¹X′]X = X − X(X′X)⁻¹X′X = X − X = 0

since (X′X)⁻¹X′X = I.
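Both properties are easy to confirm numerically; the sketch below uses an arbitrary full-column-rank X (purely illustrative, not part of the original solution).

```python
# Sketch: numerical check that M_X is idempotent and annihilates X.
import numpy as np

rng = np.random.default_rng(3)
T, K = 50, 4
X = rng.normal(size=(T, K))                              # any full-column-rank matrix
M_X = np.eye(T) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(M_X @ M_X, M_X))   # True: M_X is idempotent
print(np.allclose(M_X @ X, 0))       # True: M_X X = 0
```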
Question 4
Consider a linear regression of y on a vector X of K variables (with a constant).
Then let us consider an alternative set of regressors Z = XP, where P is a diagonal matrix.
a) Show that the vector of residuals in the regression of y on X is identical to the one of y on
Z.
b) What does this result imply about changing the regression fit by changing the unit of measure
of the independent variable?
Solution
a) The vector of residuals in the first regression (y = Xβ + ε) is equal to:
ε̂ = y − Xβ̂ = y − X(X′X)⁻¹X′y = [I − X(X′X)⁻¹X′] y = MX y
The vector of residuals in the second regression (y = Zγ + u) is equal to:
û = y − Zγ̂ = [I − Z(Z′Z)⁻¹Z′]y = [I − XP((XP)′XP)⁻¹(XP)′]y
= [I − XP P⁻¹(X′X)⁻¹(P′)⁻¹P′X′]y = [I − X(X′X)⁻¹X′]y = MX y

The fourth equality is justified by the following properties:

(AB)′ = B′A′ ⇒ (XP)′ = P′X′
(AB)⁻¹ = B⁻¹A⁻¹, where A and B are invertible square matrices

Therefore:

[(XP)′XP]⁻¹ = [(XP)′X · P]⁻¹ = P⁻¹[(XP)′X]⁻¹ = P⁻¹[P′X′X]⁻¹ = P⁻¹(X′X)⁻¹(P′)⁻¹
b) As the residuals are identical, the fit will also be the same. Changing the units of measurement of the regressors amounts to multiplying X by a diagonal matrix P whose k-th diagonal element is the scaling factor applied to the k-th variable, so it leaves the regression fit and residuals unchanged (only the coefficients are rescaled, by P⁻¹).
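A short simulation sketch (illustrative data and scaling factors, not part of the original solution) makes the point concrete: rescaling each regressor leaves the residuals untouched and only rescales the coefficients by P⁻¹.

```python
# Sketch: regressing y on Z = X P (P diagonal, nonsingular) gives the same residuals as on X.
import numpy as np

rng = np.random.default_rng(4)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 3.0, -2.0]) + rng.normal(size=T)

P = np.diag([1.0, 100.0, 0.01])                     # e.g. a change of units for each regressor
Z = X @ P

res_X = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
res_Z = y - Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)

print(np.allclose(res_X, res_Z))                    # True: identical residuals and fit
print(np.linalg.solve(Z.T @ Z, Z.T @ y))            # coefficients equal P^{-1} beta_hat
```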
Question 5
Consider the model:
Y = X1 β1 + X2 β2 + ε
ε ∼ i.i.d. N(0, σ²I)
where X1 and X2 are univariate deterministic variables.
a) Propose and provide an explanation for an estimator δ̂ for the parameter δ = β1 + β2.
b) Write down the expected value and the variance of δ̂.
c) Propose and provide an explanation for an estimator of the (1 − α)% confidence interval for δ = β1 + β2.
d) Propose and discuss the properties of a test for the null hypothesis δ = 0 against the alternative hypothesis δ ≠ 0.
e) Propose and provide an explanation for an estimator for the parameters β1 and β2 under
the restriction that δ = β1 + β2 = 0.
Solution
a) Starting from:

β̂ = [β̂1, β̂2]′ = (X′X)⁻¹X′Y
we obtain δ̂ = β̂1 + β̂2 as the OLS estimator for δ.
b) As we know that the expected value of β̂ is:

E(β̂) = β = [β1, β2]′
we obtain that
E(δ̂) = E(β̂1 ) + E(β̂2 ) = β1 + β2 = δ
Similarly, we can use the variance-covariance matrix of β̂ to derive the one for δ̂, as:
Var(δ̂) = Var(β̂1) + Var(β̂2) + 2 Cov(β̂1, β̂2)
c) The interval estimator is δ̂ ± tc·√(var̂(δ̂)), where tc is the critical value at the α/2 level for a Student's t distribution with T − 2 degrees of freedom and var̂(δ̂) is the estimated variance of δ̂ (obtained by substituting σ² with its estimator in the formula for Var(δ̂)).
d) We can test the null hypothesis δ = 0 with the following test statistic:

t = (δ̂ − 0) / √(var̂(δ̂))

Under the null, t ∼ t(T−2). We can then derive the critical value tc such that Pr(x > tc) = α/2, and we reject the null hypothesis if t > tc or t < −tc.
e) If δ = β1 + β2 = 0, β1 = −β2 , and the model can then be rewritten as:
Y = −X1 β2 + X2 β2 + ε
⇒ Y = (X2 − X1 )β2 + ε
We can then obtain β̂2 by performing a regression of Y on (X2 − X1 ), and then estimate the
parameter β1 as β̂1 = −β̂2 .
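The steps above can be put together in a short sketch (simulated data and illustrative parameter values, not part of the original solution): compute δ̂ and its estimated variance from the covariance matrix of β̂, form the t statistic for H0: δ = 0, and obtain the restricted estimates by regressing Y on (X2 − X1).

```python
# Sketch with simulated data: delta_hat, its variance, the t statistic, and the
# restricted estimator under beta1 + beta2 = 0.
import numpy as np

rng = np.random.default_rng(5)
T = 150
X1, X2 = rng.normal(size=T), rng.normal(size=T)
Y = 0.8 * X1 - 0.3 * X2 + rng.normal(size=T)

X = np.column_stack([X1, X2])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
s2 = resid @ resid / (T - 2)                 # estimator of sigma^2
V = s2 * XtX_inv                             # estimated Var(beta_hat)

delta_hat = beta_hat.sum()                   # delta_hat = beta1_hat + beta2_hat
var_delta = V[0, 0] + V[1, 1] + 2 * V[0, 1]  # Var(b1) + Var(b2) + 2 Cov(b1, b2)
t_stat = delta_hat / np.sqrt(var_delta)      # compare with the t(T-2) critical value
print(delta_hat, np.sqrt(var_delta), t_stat)

# Restricted estimation under delta = 0: regress Y on W = X2 - X1, then beta1 = -beta2
W = X2 - X1
beta2_r = (W @ Y) / (W @ W)
print(-beta2_r, beta2_r)
```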
Applied Questions
Question 6
Consider the following model
y = β0 + β1 x + β2 z + ε
The R output is reported below.
a) Comment on the coefficients of x and z. What is their effect on y? Are they significant?
b) Get the 95% normal confidence interval for β1 . Do you get the same result as in the R
output? (HINT: the critical value you need to use is 1.96 )
c) Discuss the result from the F -test.
Solution
a) Ceteris paribus, a unit increase in x is associated with an increase in y of 7.315 units, whereas
a unit increase in z is associated with an increase in y of 0.009 units.
Both coefficients are statistically significant: the p-values associated with their respective t-tests are smaller than 0.01, so the null hypothesis that each coefficient, taken separately, is equal to zero is rejected at the 1% significance level.
b) A 95% confidence interval for β1 , based on the critical values of the normal distribution is
defined as
[β̂1 − 1.96 SE(β̂1 ) , β̂1 + 1.96 SE(β̂1 )]
Given the estimated parameter and standard error, the normal confidence interval we get
for β1 is
[5.5251 , 9.1052]
The resulting confidence interval is very similar to the one reported in the R output, which is (correctly) based on the Student's t distribution. This happens because we have 932 degrees of freedom; as the degrees of freedom become large, the Student's t distribution approaches the standard normal, so the two critical values end up being very similar.
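A small check supports the last point. The standard error of β̂1 is not reproduced in the text, so the sketch below backs out an implied value of roughly 0.913 from the reported interval; it then compares the t critical value with 932 degrees of freedom to the normal one.

```python
# Sketch: the t(932) and normal critical values are nearly identical, so the two
# confidence intervals almost coincide. The standard error is implied by the reported
# interval, since the R output is not reproduced here.
from scipy import stats

beta1_hat = 7.315
se_beta1 = (9.1052 - beta1_hat) / 1.96          # approx. 0.913 (implied value)

t_crit = stats.t.ppf(0.975, df=932)             # approx. 1.9625
z_crit = stats.norm.ppf(0.975)                  # approx. 1.9600
print(t_crit, z_crit)
print(beta1_hat - z_crit * se_beta1, beta1_hat + z_crit * se_beta1)   # close to the reported interval
```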
c) The F-test reported in the R output tests the null hypothesis that all slope coefficients (here, those on x and z) are jointly equal to zero, against the alternative that at least one of them is not equal to zero. The p-value associated with the test is smaller than 0.01, hence we reject the null at the 1% significance level.
Question 7
The following table shows the regression of salary, wage, on the educational level, educ, experience in levels and squared (exper and exper²), and a dummy variable, female, taking value 1 if the person is female and 0 otherwise:

wage = β0 + β1 educ + β2 exper + β3 exper² + β4 female + ε
a) How do you interpret the coefficient associated with educ?
b) What is the goal of using a dummy variable like female?
c) How do you interpret the estimated coefficient on the dummy variable? Write the predicted value of wage when the dummy variable is equal to 1 and when it is equal to 0.
d) If the data set also contained a male dummy variable, taking value 1 if the subject is male, would any problem arise from adding this second dummy variable to the previous model?
e) How do you interpret the coefficients on exper and expersq?
Solution
a) A unit increase in educ is associated with an increase in wage of 0.556.
b) The female dummy is aimed at capturing differences in earnings due to gender, for given levels of schooling and experience.
c) Ceteris paribus, i.e. for given levels of education and experience, being female entails earnings
which are 2.114 units lower.
If female = 0, the fitted value is:

wage = −2.319 + 0.556 educ + 0.255 exper − 0.004 exper²

If female = 1, the fitted value is:

wage = −2.319 − 2.114 + 0.556 educ + 0.255 exper − 0.004 exper² = −4.433 + 0.556 educ + 0.255 exper − 0.004 exper²
d) With both a male and a female dummy (together with the constant), there would be a problem of perfect collinearity, as

male + female = 1

i.e. the two dummies sum to the constant regressor. This issue is known as the dummy variable trap.
e) The marginal effect of additional experience on earnings is

∂wage/∂exper = β̂2 + 2 β̂3 exper
Since β̂2 > 0 and β̂3 < 0, there are estimated diminishing returns from additional years of
experience.
Additional experience has no marginal return when

0.255 − 0.0088 exper = 0  ⟺  exper = 0.255/0.0088 ≈ 29

i.e. after approximately 29 years of working experience, additional experience is estimated to have, ceteris paribus, no further beneficial effect on earnings.
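The turning-point calculation is simple arithmetic; here is a minimal sketch using the coefficients reported above (0.255 on exper and 0.0088 as twice the absolute coefficient on exper²).

```python
# Sketch: experience level at which the estimated marginal return to experience is zero.
beta2_hat = 0.255        # coefficient on exper
two_abs_beta3 = 0.0088   # 2 * |coefficient on exper^2|, as used in the text

turning_point = beta2_hat / two_abs_beta3
print(turning_point)     # approximately 29 years
```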
Extra Question
The following table shows the regression of log salary (wage) on two dummy variables (female and
nonwhite) and an interaction term of the two, plus education (educ) and experience (exper ) as
controls:
wage = α0 + α1 female + α2 nonwhite + α3 female·nonwhite + α4 educ + α5 exper + ε
a) What is the interpretation of female?
b) What is the role of the interaction term? How much would you expect wages to differ between a white male and a non-white female?
c) Look at the significance of the coefficients. What does your model tell you about wage
differences across the sample?
Now, consider the following slightly modified specification:
wage = β0 + β1 female + β2 educ + β3 female·educ + β4 exper + u
d) What is the difference between the interaction term in this model compared to the previous
regression? How do you interpret β3 ?
Solution
a) All else equal, the expected wage of a female compared to a male is lower by roughly 33%. (Bear in mind, however, that the log approximation to the percentage change works better for small changes; at this magnitude we are probably already overestimating the percentage change.)
b) The interaction term gives us the joint, or simultaneous, effect of the dummies. In fact, the impact of being female and non-white, according to our model, can be decomposed as follows:
– α1 : differential effect of being female
– α2 : differential effect of being non-white
– α3 : additional differential effect of being both female and non-white
As such, since our reference group (the group identified by all dummies being zero) is a white male, we can then quantify the expected wage differential as:

E[wage | female = 1, nonwhite = 1, educ, exper] − E[wage | female = 0, nonwhite = 0, educ, exper]
= (α0 + α1 + α2 + α3 + α4 educ + α5 exper) − (α0 + α4 educ + α5 exper)
= α1 + α2 + α3 = −0.395

i.e. a non-white female is expected to earn approximately 39.5% less than a white male (using the log approximation).
c) Looking at the p-values, we see that the coefficients on both nonwhite and the interaction are not significant. This implies that, all else equal, our model detects no significant difference in salary between white and non-white workers. Moreover, being female reduces the expected salary, regardless of ethnicity.
d) Now the interaction term influences the slope of the relationship between education and wage. Indeed, it is now the case that:

E[wage | female = 1, educ, exper] = (β0 + β1) + (β2 + β3) educ + β4 exper

We now interpret β3 as the differential effect of gaining one additional year of education for females relative to males. Indeed, it may appear from our model that gains in education, while having a positive effect on wage in the overall sample, are less remunerative for females, as we obtain a negative coefficient for the interaction term. Yet the coefficient is rather small in magnitude, and not significant, so we should conclude that this effect is not present in the sample at hand.
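For completeness, this is how such an interaction specification could be estimated with the statsmodels formula interface; the data frame df and its column names (wage, female, educ, exper) are hypothetical stand-ins for the data set behind the reported table.

```python
# Hypothetical sketch: fitting the second specification, wage ~ female + educ +
# female:educ + exper. The data set is not available, so 'df' is a placeholder.
import pandas as pd
import statsmodels.formula.api as smf

def fit_interaction_model(df: pd.DataFrame):
    # 'female * educ' expands to female + educ + female:educ, so the fitted model is
    # wage = b0 + b1 female + b2 educ + b3 female*educ + b4 exper + u
    results = smf.ols("wage ~ female * educ + exper", data=df).fit()
    return results

# Example usage (with a hypothetical data file):
# df = pd.read_csv("wages.csv")            # expected columns: wage, female, educ, exper
# print(fit_interaction_model(df).summary())
```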