Solutions to Problem Set 1
Econometrics (30413), Spring 2022

Theory Questions

Question 1

Given the model
$$Y = X\beta + e, \qquad e \sim \text{i.i.d. } N(0, \sigma^2 I),$$
consider a general linear estimator
$$\underset{K\times 1}{\tilde\beta} = \big[\underset{K\times K}{(X'X)^{-1}}\,\underset{K\times T}{X'} + \underset{K\times T}{C}\big]\,\underset{T\times 1}{Y},$$
where $C$ is a deterministic matrix.

a) Write down the condition to be imposed on $C$ so that $E(\tilde\beta) = \beta$.

b) Supposing that the condition in a) is satisfied, write down the variance-covariance matrix of $\tilde\beta$.

c) Show whether the estimator $\tilde\beta$ is consistent.

d) Show which condition has to be imposed on $C$ in order for the model fit $X\tilde\beta$ and the residuals $Y - X\tilde\beta$ to be orthogonal.

Solution

a) $\tilde\beta$ can be written as
$$\tilde\beta = [(X'X)^{-1}X' + C][X\beta + e] = \beta + CX\beta + (X'X)^{-1}X'e + Ce,$$
so that, since $E(e) = 0$, $E(\tilde\beta) = \beta + CX\beta$. The estimator is unbiased for every $\beta$ if and only if $CX = 0$. (Notice that $CX = 0$, a $K\times K$ condition, is not the same as $XC = 0$, a $T\times T$ condition. However, if $CX = 0$, then also $X'C' = (CX)' = 0$.)

b) Using $CX = 0$ and $X'C' = 0$,
$$\begin{aligned}
Var(\tilde\beta) &= \sigma^2[(X'X)^{-1}X' + C][(X'X)^{-1}X' + C]' \\
&= \sigma^2\big[(X'X)^{-1} + \underbrace{(X'X)^{-1}X'C'}_{=0} + \underbrace{CX(X'X)^{-1}}_{=0} + CC'\big] = \sigma^2\big[(X'X)^{-1} + CC'\big].
\end{aligned}$$

c) We can write
$$Var(\tilde\beta) = \frac{\sigma^2}{T}\left(\frac{X'X}{T}\right)^{-1} + \frac{\sigma^2}{T}\,T\,CC' = \frac{\sigma^2}{T}\,\Sigma_{XX}^{-1} + \frac{\sigma^2}{T}\,T\,CC'.$$
The first term certainly tends to 0 as $T \to \infty$: it is the product of $\sigma^2/T$, which vanishes, and a matrix that converges to the constant $\Sigma_{XX}^{-1}$. The second term, however, need not vanish as $T$ increases, so $Var(\tilde\beta)$ does not in general tend to 0 for $T \to \infty$. The estimator $\tilde\beta$ is thus not consistent, unless we impose a condition on the limit of $T\,CC'$ (for instance that it converges to a constant matrix, in which case the second term also vanishes).

d) For the model fit to be orthogonal to the residuals, the following condition has to be satisfied:
$$\tilde\beta' X' \tilde e = 0.$$
We can rewrite the residuals as
$$\tilde e = Y - X\tilde\beta = Y - X[(X'X)^{-1}X' + C]Y = \underbrace{[I - X(X'X)^{-1}X' - XC]X\beta}_{=0 \text{ since } CX = 0} + [I - X(X'X)^{-1}X' - XC]e,$$
and the estimator $\tilde\beta$ as
$$\tilde\beta = [(X'X)^{-1}X' + C][X\beta + e] = \beta + (X'X)^{-1}X'e + \underbrace{CX\beta}_{=0} + Ce.$$
Therefore
$$\begin{aligned}
\tilde\beta' X' \tilde e &= [\beta' + e'X(X'X)^{-1} + e'C']\,X'[I - X(X'X)^{-1}X' - XC]\,e \\
&= [\beta' + e'X(X'X)^{-1} + e'C'][\underbrace{X'e - X'e}_{=0} - X'XCe] \\
&= -\beta'X'XCe - e'XCe - e'C'X'XCe = -(\beta'X' + e' + e'C'X')\,XCe.
\end{aligned}$$
Consequently, it must be $C = 0$ for the orthogonality condition to be satisfied (remember that $CX = 0$ does not imply $XC = 0$; and since $X$ has full column rank, $XC = 0$ requires $C = 0$).

Question 2

Consider the linear model with $k$ regressors:
$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim \text{i.i.d.}(0, \sigma^2 I).$$
Under the strong OLS assumptions:

a) Write down the OLS estimator of the $\beta$ parameters, $\hat\beta$.

b) Write down the relationship between the residuals $\hat\varepsilon$ and the errors $\varepsilon$.

c) Show that $\hat\beta$ and $\hat\varepsilon$ are independent.

Solution

a) Let us consider the loss function we want to minimize:
$$S = \sum_{t=1}^{T} e_t^2 = e'e = (y - X\beta)'(y - X\beta) = y'y - y'X\beta - \beta'X'y + \beta'X'X\beta.$$
Since all the terms are scalars, $y'X\beta = \beta'X'y$ and
$$S = y'y - 2\beta'X'y + \beta'X'X\beta.$$
The first-order conditions are then
$$\frac{\partial S}{\partial \beta} = -2X'y + 2X'X\beta = 0,$$
and the OLS estimator is
$$\hat\beta = (X'X)^{-1}X'y.$$
As a last step, we have to verify that $\hat\beta$ is indeed the minimizer of the sum of squared regression residuals. To do this, we can check the second-order derivative:
$$\frac{\partial^2 S}{\partial\beta\,\partial\beta'} = 2X'X.$$
As $X'X$ is positive definite, $S$ is indeed minimized.

b) The residuals of the regression, $\hat\varepsilon$, can be rewritten as a function of the errors $\varepsilon$:
$$\hat\varepsilon = y - X\hat\beta = y - X(X'X)^{-1}X'y = [I - X(X'X)^{-1}X'][X\beta + \varepsilon] = \underbrace{X\beta - X(X'X)^{-1}X'X\beta}_{=0} + [I - X(X'X)^{-1}X']\varepsilon = [I - X(X'X)^{-1}X']\varepsilon.$$

c) As the errors are normally distributed, $\hat\beta$ and $\hat\varepsilon$ are also (jointly) normally distributed, being linear functions of $\varepsilon$. Therefore, in order to show that $\hat\beta$ and $\hat\varepsilon$ are independent, it is enough to show that their covariance is null:
$$Cov(\hat\beta, \hat\varepsilon) = E[(\hat\beta - \beta)(\hat\varepsilon - 0)'] = E\big[(X'X)^{-1}X'\varepsilon\,\varepsilon'(I - X(X'X)^{-1}X')\big] = \sigma_\varepsilon^2 (X'X)^{-1}X'\big[I - X(X'X)^{-1}X'\big] = 0.$$
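The algebra in parts a) and b) of Question 2 can be checked numerically. The R sketch below is only an illustration: the simulated design, sample size and parameter values are assumptions, not part of the exercise. It computes $\hat\beta$ from the normal equations and verifies that $\hat\varepsilon = M_X\varepsilon$ and that the residuals are orthogonal to the regressors, up to rounding error.

```r
# Numerical check of Question 2 a)-b) on simulated data (illustrative assumptions).
set.seed(123)
T_obs <- 200; K <- 3
X    <- cbind(1, matrix(rnorm(T_obs * (K - 1)), T_obs, K - 1))  # constant + 2 regressors
beta <- c(1, 2, -0.5)
eps  <- rnorm(T_obs, sd = 2)
y    <- X %*% beta + eps

beta_hat <- solve(t(X) %*% X, t(X) %*% y)                   # (X'X)^{-1} X'y
M_X      <- diag(T_obs) - X %*% solve(t(X) %*% X) %*% t(X)  # annihilator matrix
res      <- y - X %*% beta_hat

max(abs(res - M_X %*% eps))   # residuals coincide with M_X * errors (up to rounding)
max(abs(t(X) %*% res))        # X'e_hat = 0: residuals are orthogonal to the regressors
```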
Question 3

Consider the matrix
$$M_X = I - X(X'X)^{-1}X'.$$
Show that:

a) $M_X \cdot M_X = M_X$, i.e. $M_X$ is idempotent.

b) $M_X \cdot X = 0$.

Solution

a) We can rewrite the expression as
$$\begin{aligned}
M_X \cdot M_X &= [I - X(X'X)^{-1}X'][I - X(X'X)^{-1}X'] \\
&= I - X(X'X)^{-1}X' - X(X'X)^{-1}X' + X\underbrace{(X'X)^{-1}X'X}_{=I}(X'X)^{-1}X' \\
&= I - X(X'X)^{-1}X' \underbrace{- X(X'X)^{-1}X' + X(X'X)^{-1}X'}_{=0} = I - X(X'X)^{-1}X' = M_X,
\end{aligned}$$
and we have shown that $M_X$ is idempotent.

b)
$$M_X \cdot X = [I - X(X'X)^{-1}X']X = X - X\underbrace{(X'X)^{-1}X'X}_{=I} = X - X = 0.$$

Question 4

Consider a linear regression of $y$ on a vector $X$ of $K$ variables (with a constant). Then consider an alternative set of regressors $Z = XP$, where $P$ is a diagonal matrix with nonzero diagonal elements (so that $P$ is invertible).

a) Show that the vector of residuals in the regression of $y$ on $X$ is identical to the one of $y$ on $Z$.

b) What does this result imply about changing the regression fit by changing the unit of measure of the independent variables?

Solution

a) The vector of residuals in the first regression ($y = X\beta + \varepsilon$) is equal to
$$\hat\varepsilon = y - X\hat\beta = y - X(X'X)^{-1}X'y = \underbrace{[I - X(X'X)^{-1}X']}_{M_X}\,y = M_X y.$$
The vector of residuals in the second regression ($y = Z\gamma + u$) is equal to
$$\hat u = y - Z\hat\gamma = [I - Z(Z'Z)^{-1}Z']y = [I - XP((XP)'XP)^{-1}(XP)']y = [I - XP\,P^{-1}(X'X)^{-1}(P')^{-1}P'X']y = [I - X(X'X)^{-1}X']y = M_X y.$$
The fourth passage is justified by the following properties:
$$(AB)' = B'A' \;\Rightarrow\; (XP)' = P'X', \qquad (AB)^{-1} = B^{-1}A^{-1} \text{ where both are invertible square matrices.}$$
Therefore
$$\underbrace{[(XP)'XP]^{-1}}_{K\times K} = P^{-1}[(XP)'X]^{-1} = P^{-1}[P'X'X]^{-1} = P^{-1}(X'X)^{-1}(P')^{-1}.$$

b) As the residuals are identical, the fit $\hat y = y - \hat\varepsilon$ is also the same. Changing the unit of measurement of the regressors amounts to post-multiplying $X$ by a diagonal matrix $P$ whose $k$-th diagonal element is the scale factor applied to the $k$-th variable, and therefore it leaves the regression fit unchanged.

Question 5

Consider the model:
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \qquad \varepsilon \sim \text{i.i.d. } N(0, \sigma^2 I),$$
where $X_1$ and $X_2$ are univariate deterministic variables.

a) Propose and provide an explanation for an estimator $\hat\delta$ of the parameter $\delta = \beta_1 + \beta_2$.

b) Write down the expected value and the variance of $\hat\delta$.

c) Propose and provide an explanation for a $(1-\alpha)$ confidence interval for $\delta = \beta_1 + \beta_2$.

d) Propose and discuss the properties of a test for the null hypothesis $\delta = 0$ against the alternative hypothesis $\delta \neq 0$.

e) Propose and provide an explanation for an estimator of the parameters $\beta_1$ and $\beta_2$ under the restriction that $\delta = \beta_1 + \beta_2 = 0$.

Solution

a) Starting from
$$\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X'X)^{-1}X'Y, \qquad X = [X_1 \;\; X_2],$$
we obtain $\hat\delta = \hat\beta_1 + \hat\beta_2$ as the OLS estimator of $\delta$.

b) As we know that the expected value of $\hat\beta$ is
$$E(\hat\beta) = \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},$$
we obtain
$$E(\hat\delta) = E(\hat\beta_1) + E(\hat\beta_2) = \beta_1 + \beta_2 = \delta.$$
Similarly, we can use the variance-covariance matrix of $\hat\beta$ to derive the variance of $\hat\delta$:
$$Var(\hat\delta) = Var(\hat\beta_1) + Var(\hat\beta_2) + 2\,Cov(\hat\beta_1, \hat\beta_2).$$

c) The interval estimator is $\hat\delta \pm t_c \sqrt{\widehat{var}(\hat\delta)}$, where $t_c$ is the critical value at the $\frac{\alpha}{2}$ level of a Student t distribution with $T-2$ degrees of freedom and $\widehat{var}(\hat\delta)$ is the estimated variance of $\hat\delta$ (obtained by substituting $\sigma^2$ with its estimator in the formula for $var(\hat\delta)$).

d) We can test the null hypothesis $\delta = 0$ with the following test statistic:
$$t = \frac{\hat\delta - 0}{\sqrt{\widehat{var}(\hat\delta)}}.$$
Under the null, $t \sim t_{T-2}$. We can then derive the critical value $t_c$ such that $Pr(t_{T-2} > t_c) = \frac{\alpha}{2}$, and we reject the null hypothesis if $t > t_c$ or $t < -t_c$.

e) If $\delta = \beta_1 + \beta_2 = 0$, then $\beta_1 = -\beta_2$, and the model can be rewritten as
$$Y = -X_1\beta_2 + X_2\beta_2 + \varepsilon \;\Rightarrow\; Y = (X_2 - X_1)\beta_2 + \varepsilon.$$
We can then obtain $\hat\beta_2$ by regressing $Y$ on $(X_2 - X_1)$, and then estimate the parameter $\beta_1$ as $\hat\beta_1 = -\hat\beta_2$.
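As an illustration of parts a) to e), the following R sketch applies the procedure to simulated data; the data-generating values, sample size and variable names are assumptions chosen only for this example. It recovers $\hat\delta$, its standard error from the estimated variance-covariance matrix, the confidence interval, the t statistic and the restricted fit of part e).

```r
# Illustration of Question 5 on simulated data; all numbers below are assumptions.
set.seed(42)
T_obs <- 100
x1 <- rnorm(T_obs); x2 <- rnorm(T_obs)
y  <- 1.0 * x1 + 0.5 * x2 + rnorm(T_obs)

fit   <- lm(y ~ 0 + x1 + x2)               # no intercept, as in the stated model
b     <- coef(fit)
V     <- vcov(fit)                         # estimated Var(beta_hat)
delta <- b["x1"] + b["x2"]                 # delta_hat = beta1_hat + beta2_hat
se_d  <- sqrt(V["x1", "x1"] + V["x2", "x2"] + 2 * V["x1", "x2"])

alpha <- 0.05
tc    <- qt(1 - alpha / 2, df = T_obs - 2) # critical value of t with T - 2 df
c(delta - tc * se_d, delta + tc * se_d)    # (1 - alpha) confidence interval
delta / se_d                               # t statistic for H0: delta = 0

fit_r <- lm(y ~ 0 + I(x2 - x1))            # restricted fit imposing beta1 = -beta2 (part e)
```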
Applied Questions

Question 6

Consider the following model:
$$y = \beta_0 + \beta_1 x + \beta_2 z + \varepsilon.$$
The R output is:

[R regression output]

a) Comment on the coefficients of $x$ and $z$. What is their effect on $y$? Are they significant?

b) Compute the 95% normal confidence interval for $\beta_1$. Do you get the same result as in the R output? (HINT: the critical value you need to use is 1.96.)

c) Discuss the result of the F-test.

Solution

a) Ceteris paribus, a unit increase in $x$ is associated with an increase in $y$ of 7.315 units, whereas a unit increase in $z$ is associated with an increase in $y$ of 0.009 units. Both coefficients are statistically significant: the p-values associated with their respective t-tests are smaller than 0.01, hence the null hypothesis that each coefficient, taken separately, is equal to zero is rejected at the 1% significance level.

b) A 95% confidence interval for $\beta_1$, based on the critical values of the normal distribution, is defined as
$$[\hat\beta_1 - 1.96\,SE(\hat\beta_1),\; \hat\beta_1 + 1.96\,SE(\hat\beta_1)].$$
Given the estimated parameter and standard error, the normal confidence interval we get for $\beta_1$ is
$$[5.5251,\; 9.1052].$$
The resulting confidence interval is very similar to the confidence interval reported in the R output, which is (correctly) based on the Student t distribution. This happens because we have 932 degrees of freedom: as the degrees of freedom become large, the Student t distribution approaches the standard normal, so the two critical values are very close.

c) The F-test reported in the R output tests the null that all slope coefficients are jointly equal to zero, against the alternative that at least one of them is not equal to zero. The p-value associated with the test is smaller than 0.01, hence we reject the null at the 1% significance level.
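The interval in part b) can be reproduced with a few lines of R. Since the full R output is not reproduced here, the standard error below is backed out from the reported point estimate and interval, so treat it as an approximation.

```r
# Question 6 b): normal vs. Student-t 95% confidence interval for beta_1.
b1  <- 7.315
se1 <- (9.1052 - 5.5251) / (2 * 1.96)     # approx. 0.913, implied by the reported interval

c(b1 - 1.96 * se1, b1 + 1.96 * se1)       # normal interval: [5.5251, 9.1052]
tc  <- qt(0.975, df = 932)                # t critical value with 932 df, just above 1.96
c(b1 - tc * se1, b1 + tc * se1)           # t-based interval: nearly identical
```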
Question 7

The following table shows the regression of the salary, wage, on the educational level, educ, experience in levels and squared, and a dummy variable, female, taking value 1 if the person is female, and 0 otherwise:
$$wage = \beta_0 + \beta_1\,educ + \beta_2\,exper + \beta_3\,exper^2 + \beta_4\,female + \varepsilon.$$

[regression output table]

a) How do you interpret the coefficient associated with educ?

b) What is the goal of using a dummy variable like female?

c) How do you interpret the estimated coefficient on the dummy variable? Write the predicted value of wage when the dummy variable is equal to 1 and when it is equal to 0.

d) If the data set also contained a male dummy variable, taking value 1 if the subject is male, would any problem arise from adding this second dummy variable to the previous model?

e) How do you interpret the coefficients on exper and exper²?

Solution

a) Ceteris paribus, a unit increase in educ is associated with an increase in wage of 0.556.

b) The female dummy is aimed at capturing differences in earnings due to gender, for given levels of schooling and experience.

c) Ceteris paribus, i.e. for given levels of education and experience, being female entails earnings which are 2.114 units lower.
If female = 0: $\widehat{wage} = -2.319 + 0.556\,educ + 0.255\,exper - 0.004\,exper^2$.
If female = 1: $\widehat{wage} = -2.319 - 2.114 + 0.556\,educ + 0.255\,exper - 0.004\,exper^2$.

d) With both a male and a female dummy, there would be a problem of perfect collinearity, since
$$male + female = 1,$$
i.e. the two dummies sum to the constant. This issue is known as the dummy variable trap.

e) The marginal effect of additional experience on earnings is
$$\frac{\partial\,wage}{\partial\,exper} = \hat\beta_2 + 2\hat\beta_3\,exper.$$
Since $\hat\beta_2 > 0$ and $\hat\beta_3 < 0$, the estimates imply diminishing returns to additional years of experience. Additional experience has no marginal return when
$$0.255 - 0.0088\,exper = 0 \iff exper = \frac{0.255}{0.0088} \approx 29,$$
i.e. after approximately 29 years of working experience, additional experience is estimated to have, ceteris paribus, no further beneficial effect on earnings.
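The predictions in part c) and the turning point in part e) can be computed directly from the reported coefficients, as in the R sketch below. The worker profile (12 years of education, 10 of experience) is made up purely for illustration, and the exper² coefficient is taken as -0.0044, which the predictions above show rounded to -0.004.

```r
# Question 7 c) and e): predicted wages and the experience turning point,
# using the coefficient estimates reported in the table.
b <- c(const = -2.319, educ = 0.556, exper = 0.255, exper2 = -0.0044, female = -2.114)

pred_wage <- function(educ, exper, female) {
  b["const"] + b["educ"] * educ + b["exper"] * exper +
    b["exper2"] * exper^2 + b["female"] * female
}
pred_wage(educ = 12, exper = 10, female = 0)   # predicted wage for a male with this profile
pred_wage(educ = 12, exper = 10, female = 1)   # same profile, female: lower by 2.114

-b["exper"] / (2 * b["exper2"])   # experience level at which the marginal return is zero, about 29
```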
Extra Question

The following table shows the regression of the log salary (wage) on two dummy variables (female and nonwhite) and an interaction term of the two, plus education (educ) and experience (exper) as controls:
$$wage = \alpha_0 + \alpha_1\,female + \alpha_2\,nonwhite + \alpha_3\,female \cdot nonwhite + \alpha_4\,educ + \alpha_5\,exper + \varepsilon.$$

[regression output table]

a) What is the interpretation of the coefficient on female?

b) What is the role of the interaction term? By how much would you expect the wages of a white male and a non-white female to differ?

c) Look at the significance of the coefficients. What does your model tell you about wage differences across the sample?

Now, consider the following slightly modified specification:
$$wage = \beta_0 + \beta_1\,female + \beta_2\,educ + \beta_3\,female \cdot educ + \beta_4\,exper + u.$$

d) What is the difference between the interaction term in this model and the one in the previous regression? How do you interpret $\beta_3$?

Solution

a) All else equal, the expected wage of a female compared to a male is lower by about 33%. (Bear in mind, however, that the log approximation to the percentage change works better for small changes; here we are probably already in the range where it overestimates the percentage change.)

b) The interaction term gives us the joint, or simultaneous, effect of the two dummies. In fact, according to our model, the impact of being female and non-white can be decomposed as follows:
– $\alpha_1$: differential effect of being female;
– $\alpha_2$: differential effect of being non-white;
– $\alpha_3$: additional differential effect of being a non-white female.
As our reference group (the group identified by all dummies being zero) is a white male, we can then quantify the expected wage differential as
$$\begin{aligned}
&E[wage \mid female = 1, nonwhite = 1, educ, exper] - E[wage \mid female = 0, nonwhite = 0, educ, exper] \\
&= (\alpha_0 + \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4\,educ + \alpha_5\,exper) - (\alpha_0 + \alpha_4\,educ + \alpha_5\,exper) = \alpha_1 + \alpha_2 + \alpha_3 = -0.395,
\end{aligned}$$
i.e. a non-white female is expected to earn roughly 39.5% less than a comparable white male (with the same caveat on the log approximation as above).

c) Looking at the p-values, we see that the coefficients on nonwhite and on the interaction are not significant. This implies that, all else equal, our model predicts no expected difference in salary between white and non-white workers. Moreover, being female reduces the expected salary, regardless of ethnicity.

d) Now the interaction term influences the slope of the relationship between education and wage. Indeed, it is now the case that
$$E[wage \mid female = 1, educ, exper] = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\,educ + \beta_4\,exper.$$
We interpret $\beta_3$ as the differential effect of one additional year of education for females relative to males. It may appear from our model that gains in education, while having a positive effect on wage in the overall sample, are less remunerative for females, as we obtain a negative coefficient for the interaction term. Yet the coefficient is rather small in magnitude and not significant, so we should conclude that this effect is not present in the sample at hand.
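For reference, a sketch of how the two specifications could be estimated in R. The data frame name `wages`, its column names, and the assumption that the dummies are coded as 0/1 numeric variables are illustrative; `wage` is taken to be already in logs, as in the exercise.

```r
# Sketch of the two interaction specifications (illustrative data frame `wages`).
# First specification: interaction of the two dummies.
fit1 <- lm(wage ~ female * nonwhite + educ + exper, data = wages)
# female * nonwhite expands to female + nonwhite + female:nonwhite, so with 0/1 dummies
# coef(fit1)["female:nonwhite"] is the estimate of alpha_3.

# Second specification: interaction of the female dummy with education.
fit2 <- lm(wage ~ female * educ + exper, data = wages)
# coef(fit2)["female:educ"] estimates beta_3, the differential slope of educ for women.
summary(fit2)
```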