ECON 837 - Econometrics
Answer Keys
Midterm exam

Exercise 1:

1. The estimator of $\beta_1$ obtained by regressing $y$ on $X_1$ only is
\[
\hat\beta_1 = (X_1'X_1)^{-1}X_1'y = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + u) = \beta_1 + (X_1'X_1)^{-1}(X_1'X_2)\beta_2 + (X_1'X_1)^{-1}X_1'u .
\]
From this we deduce the bias:
\[
E(\hat\beta_1 \mid X) - \beta_1 = P_{12}\beta_2, \quad \text{where } P_{12} = (X_1'X_1)^{-1}X_1'X_2,
\]
since $E(u \mid X) = 0$ under the classical assumptions.

2. The matrix $P_{12}$ can be written as
\[
P_{12} = \left(\frac{X_1'X_1}{n}\right)^{-1}\left(\frac{X_1'X_2}{n}\right),
\]
where
\[
\frac{X_1'X_1}{n} = \frac{1}{n}\sum_{i=1}^{n} x_{1i}x_{1i}' \quad \text{(sample variance of } X_1\text{)}, \qquad
\frac{X_1'X_2}{n} = \frac{1}{n}\sum_{i=1}^{n} x_{1i}x_{2i}' \quad \text{(sample covariance between } X_1 \text{ and } X_2\text{)}.
\]
$\hat\beta_1$ will be unbiased if $(X_1'X_2)/n = 0$, that is, if the sample covariance between $X_1$ and $X_2$ is exactly zero. It will also be unbiased if $\beta_2 = 0$, that is, if the variables in $X_2$ are irrelevant explanatory variables.

3. (a) Using the notation of the previous questions, we have $X_1 = [1 \;\; P]$ and $X_2 = [I]$, with $\beta_2 = \gamma$ the coefficient on $I$. We can then use the formula from question 1 above to get
\[
E\left[\begin{pmatrix}\hat\alpha \\ \hat\beta\end{pmatrix} \Big|\, X\right] - \begin{pmatrix}\alpha \\ \beta\end{pmatrix} = P_{12}\beta_2
\quad \text{with} \quad
P_{12} = \frac{1}{n\sum_i P_i^2 - \left(\sum_i P_i\right)^2}
\begin{pmatrix}
\sum_i P_i^2 \sum_i I_i - \sum_i P_i \sum_i (P_i I_i) \\
-\sum_i P_i \sum_i I_i + n\sum_i (P_i I_i)
\end{pmatrix}.
\]
Finally, we get:
\[
E(\hat\beta \mid X) - \beta = \frac{-\sum_i P_i \sum_i I_i + n\sum_i (P_i I_i)}{n\sum_i P_i^2 - \left(\sum_i P_i\right)^2} \times \gamma
= \frac{\text{sample cov. between } P \text{ and } I}{\text{sample var. of } P} \times \gamma .
\]
We expect the sign of the bias to be positive, due to a positive covariance between $P$ and $I$ and a positive sign of $\gamma$.

(b) Examples of other variables that should have been considered: some variable related to the quality of the furniture (e.g. brand name); some variables related to socio-economic characteristics (e.g. gender, number of family members living at home). Their omission may bias the estimated price effect if they are correlated with other covariates as well as with the dependent variable.

Exercise 2:

1. Not necessarily. Identification and computational issues only arise from exact multicollinearity. Here, we may have "near-multicollinearity", which does not invalidate or prevent OLS, but may create some numerical and stability issues.

2.
The main idea is to remove some variables that are highly correlated with other (included) variables. There are 2 ways to decide which variables to remove:
- based on statistical theory, remove those whose coefficient is not significantly different from zero;
- based on economic theory, remove those that should not be included according to economic theory.

3. One needs to make sure the model contains all possible relevant variables (in other words, one has not forgotten a relevant variable whose omission could cause endogeneity issues and bias our estimators). The selection of variables that remain in the model depends on the goal of the model: are we testing some economic theory, or are we looking for the best fit in a statistical sense?
- If we are testing an economic theory, one can compare estimation results based on the unconstrained model and on the constrained model where the associated restrictions have been imposed on the parameters.
- If we are looking for the best-fit model, one can remove the variables associated with parameters that are not significantly different from 0; however, one has to be careful not to create endogeneity bias.

Exercise 3:

1. Since we are testing a univariate parameter (i.e. 1 restriction only), there are 2 test statistics that can be used to design an asymptotic test for $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$ at level $\alpha = 0.05$ (95% confidence): the t-test and the Wald test.
\[
t = \frac{\hat\mu - \mu_0}{\widehat{ase}(\hat\mu)}
\quad \text{with} \quad
\hat\mu = \frac{1}{n}\sum_i x_i, \quad \widehat{ase}(\hat\mu) = \frac{\hat\sigma}{\sqrt{n}}, \quad \hat\sigma^2 = \frac{1}{n}\sum_i (x_i - \hat\mu)^2,
\]
\[
W = \frac{(\hat\mu - \mu_0)^2}{\widehat{avar}(\hat\mu)}
\quad \text{with} \quad
\widehat{avar}(\hat\mu) = \hat\sigma^2/n .
\]
Since the $x_i$ are iid, we can apply a CLT on $x_i$ to get:
\[
\sqrt{n}(\hat\mu - \mu) \xrightarrow{d} N(0, \sigma^2)
\]
(note that $\sum_i x_i/n = \hat\mu$). Under $H_0$, the limit distribution is:
\[
\sqrt{n}(\hat\mu - \mu_0) \xrightarrow{d} N(0, \sigma^2)
\;\Rightarrow\;
\frac{\sqrt{n}}{\sigma}(\hat\mu - \mu_0) \xrightarrow{d} N(0, 1)
\;\Rightarrow\;
\frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_0) \xrightarrow{d} N(0, 1)
\quad \text{and} \quad
\frac{n}{\hat\sigma^2}(\hat\mu - \mu_0)^2 \xrightarrow{d} \chi^2(1),
\]
where $\hat\sigma$ is the consistent estimator of $\sigma$ introduced above.
Then, we get:
\[
t = \frac{\hat\mu - \mu_0}{\hat\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1),
\qquad
W = \frac{n(\hat\mu - \mu_0)^2}{\hat\sigma^2} \xrightarrow{d} \chi^2(1).
\]
The associated decision rules are:
- [t-test] Reject $H_0$ if $|t| > t_{1-\alpha/2}$, with $t_{1-\alpha/2}$ the $(1-\alpha/2)$-quantile of the standard normal distribution;
- [Wald test] Reject $H_0$ if $W > \chi^2_{1-\alpha}(1)$, with $\chi^2_{1-\alpha}(1)$ the $(1-\alpha)$-quantile of the chi-square distribution with 1 degree of freedom.

2. To show that our test is consistent against the fixed alternative $H_1: \mu = \mu_1$ where $\mu_1 \neq \mu_0$, we need to show that, under $H_1$, the probability of rejecting $H_0$ converges to 1 as the sample size increases. To do so, we derive the asymptotic distribution of the test statistic under $H_1$:
\[
t = \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_0)
= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_1 + \mu_1 - \mu_0)
= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_1) + \frac{\sqrt{n}}{\hat\sigma}(\mu_1 - \mu_0).
\]
Using a CLT on the $x_i$, we can show that the first term converges in distribution to a normal with mean 0 and variance 1; since $\mu_1 \neq \mu_0$, the second term diverges towards either $+\infty$ or $-\infty$ (depending on the sign of $\mu_1 - \mu_0$). We can then compute the asymptotic power:
\[
\text{Power} = P(|t| > 1.96) = P\left(\left|Z + \frac{\sqrt{n}}{\hat\sigma}(\mu_1 - \mu_0)\right| > 1.96\right) \xrightarrow[n \to \infty]{} 1,
\]
where $Z$ denotes the $N(0,1)$ limit of the first term. The limit follows since $|\sqrt{n}(\mu_1 - \mu_0)| \to \infty$, and therefore, as the sample size increases, the quantity eventually becomes larger than 1.96.

3. To derive the asymptotic power of our test under the sequence of local alternatives $H_{1,n}: \mu = \mu_{1,n}$ with $\mu_{1,n} = \mu_0 + \delta/\sqrt{n}$, we first derive the asymptotic distribution of our test statistic under $H_{1,n}$:
\[
t = \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_0)
= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_{1,n} + \mu_{1,n} - \mu_0)
= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_{1,n}) + \frac{\sqrt{n}}{\hat\sigma}(\mu_{1,n} - \mu_0)
= \frac{\sqrt{n}}{\hat\sigma}(\hat\mu - \mu_{1,n}) + \frac{\delta}{\hat\sigma}.
\]
Using a CLT on the $x_i$, we can show that the first term converges in distribution to a normal with mean 0 and variance 1; the second term converges to $\delta/\sigma$ since $\hat\sigma$ is a consistent estimator of $\sigma$. Overall, the test statistic converges in distribution towards a normal with mean $\delta/\sigma$ and variance 1.
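The limit under local alternatives can be checked numerically. The sketch below is only an illustration under assumptions that are not in the exam: normally distributed $x_i$, hypothetical parameter values, and hypothetical function names. It simulates the t statistic under $H_{1,n}$ and compares the rejection frequency of the rule $|t| > 1.96$ with the probability $P(|Z_\delta| > 1.96)$ for $Z_\delta \sim N(\delta/\sigma, 1)$.

```python
import math
import random
import statistics


def t_stat(sample, mu0):
    """t = sqrt(n) * (mu_hat - mu0) / sigma_hat, with sigma_hat^2 = (1/n) * sum (x_i - mu_hat)^2."""
    n = len(sample)
    mu_hat = sum(sample) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in sample) / n)
    return math.sqrt(n) * (mu_hat - mu0) / sigma_hat


def simulate_local_t(n, delta, mu0=0.0, sigma=1.0, reps=2000, seed=0):
    """Draw `reps` t statistics when the true mean is mu0 + delta / sqrt(n).

    Assumption for illustration only: the x_i are iid normal.
    """
    rng = random.Random(seed)
    mu_1n = mu0 + delta / math.sqrt(n)
    return [
        t_stat([rng.gauss(mu_1n, sigma) for _ in range(n)], mu0)
        for _ in range(reps)
    ]


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def asymptotic_local_power(delta, sigma=1.0, crit=1.96):
    """P(|Z| > crit) for Z ~ N(delta / sigma, 1)."""
    shift = delta / sigma
    return 1.0 - normal_cdf(crit - shift) + normal_cdf(-crit - shift)


if __name__ == "__main__":
    draws = simulate_local_t(n=200, delta=2.0)
    freq = sum(abs(t) > 1.96 for t in draws) / len(draws)
    print("mean of t:", statistics.mean(draws))
    print("variance of t:", statistics.pvariance(draws))
    print("simulated rejection frequency:", freq)
    print("asymptotic local power:", asymptotic_local_power(2.0))
```

With $\delta = 0$ the function `asymptotic_local_power` returns the nominal level of the test, and its value increases with $|\delta|$.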
We can then compute the asymptotic power:
\[
\text{Power} = P(|t| > 1.96) = P(|Z_\delta| > 1.96) \geq \alpha, \quad \text{where } Z_\delta \sim N(\delta/\sigma, 1).
\]
The power is some probability between $\alpha$ and 1 that depends on $\delta$: the larger $|\delta|$, the larger the power.

Exercise 4 [13 points]:

1. The OLS estimator of $\beta$ is
\[
\hat\beta = (Y_2'Y_2)^{-1}Y_2'Y_1
\quad \text{where} \quad
Y_2 = \begin{pmatrix} y_{2,1} \\ \vdots \\ y_{2,n} \end{pmatrix}, \quad
Y_1 = \begin{pmatrix} y_{1,1} \\ \vdots \\ y_{1,n} \end{pmatrix},
\]
so that
\[
\hat\beta = (Y_2'Y_2)^{-1}Y_2'(\beta Y_2 + u_1) = \beta + \left(\frac{Y_2'Y_2}{n}\right)^{-1}\left(\frac{Y_2'u_1}{n}\right).
\]
Since the $y_{2,i}$ are iid, we can apply a LLN to deduce: $Y_2'Y_2/n \xrightarrow{P} E(y_{2,i}^2)$. Also, we have:
\[
\frac{Y_2'u_1}{n} = \frac{(\alpha X' + u_2')u_1}{n} = \alpha\frac{X'u_1}{n} + \frac{u_2'u_1}{n} \xrightarrow{P} \alpha E(x_i u_{1,i}) + E(u_{2,i}u_{1,i}).
\]
By assumption, $x_i$ is an exogenous variable, which means $E(x_i u_{1,i}) = 0$. By assumption, $E(u_{2,i}u_{1,i}) = \sigma_{12}$. So, in general, this estimator is not consistent. In order to recover consistency, we need to restrict $\sigma_{12} = 0$, that is, no correlation between the error terms of the two equations: such a restriction would also guarantee that there is no endogeneity issue, which would make OLS the preferred (and consistent) estimator.

2. The OLS estimator of $\alpha$ is
\[
\hat\alpha = (X'X)^{-1}X'Y_2 = (X'X)^{-1}X'(\alpha X + U_2) = \alpha + \left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'U_2}{n}\right)
\quad \text{with} \quad
X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
\]
Since the $x_i$ are iid, we can apply a LLN to show: $X'X/n \xrightarrow{P} E(x_i^2)$; in addition, we also have $X'U_2/n \xrightarrow{P} E(x_i u_{2,i}) = 0$ since $x_i$ is exogenous. We conclude that this estimator is consistent.

3. The IV estimator of $\beta$ using $x$ as an instrument is:
\[
\hat\beta_{iv} = (\hat Y_2'\hat Y_2)^{-1}\hat Y_2'Y_1 \quad \text{where } \hat Y_2 = \hat\alpha X = P_x Y_2,
\]
\[
\hat\beta_{iv} = (Y_2'P_xY_2)^{-1}Y_2'P_xY_1 = (Y_2'P_xY_2)^{-1}Y_2'P_x(\beta Y_2 + u_1) = \beta + (Y_2'P_xY_2)^{-1}Y_2'P_xu_1 .
\]
We can show that the second term converges in probability towards 0:
\[
(Y_2'P_xY_2)^{-1}Y_2'P_xu_1 = \left[\frac{Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'Y_2}{n}\right]^{-1}\frac{Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'u_1}{n},
\]
where, as previously shown and discussed,
- $X'Y_2/n \xrightarrow{P} E(y_{2,i}x_i) = \alpha E(x_i^2)$, which is nonzero provided $\alpha \neq 0$;
- $X'X/n \xrightarrow{P} E(x_i^2) \neq 0$;
- $X'u_1/n \xrightarrow{P} E(x_i u_{1,i}) = 0$.
Hence, the IV estimator is consistent.
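The two probability limits derived for $\beta$ can be illustrated by simulation. The following is a minimal sketch under assumptions that are not part of the exam: standard normal $x_i$, unit-variance normal errors with $Cov(u_{1,i}, u_{2,i}) = \sigma_{12}$, and hypothetical parameter values and function names. With a single instrument, $(Y_2'P_xY_2)^{-1}Y_2'P_xY_1$ reduces to $X'Y_1 / X'Y_2$, which is what the code computes.

```python
import math
import random


def ols_and_iv(n, beta=1.0, alpha=1.0, sigma12=0.5, seed=0):
    """Simulate y2 = alpha*x + u2, y1 = beta*y2 + u1 and return (beta_ols, beta_iv).

    Assumptions for illustration only: x, u1, u2 are normal with unit
    variance, and Cov(u1, u2) = sigma12 (requires |sigma12| <= 1).
    """
    rng = random.Random(seed)
    s_y2y2 = s_y2y1 = s_xy2 = s_xy1 = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        # u2 has variance 1 and covariance sigma12 with u1
        u2 = sigma12 * u1 + math.sqrt(1.0 - sigma12 ** 2) * rng.gauss(0.0, 1.0)
        y2 = alpha * x + u2
        y1 = beta * y2 + u1
        s_y2y2 += y2 * y2
        s_y2y1 += y2 * y1
        s_xy2 += x * y2
        s_xy1 += x * y1
    beta_ols = s_y2y1 / s_y2y2  # (Y2'Y2)^{-1} Y2'Y1
    beta_iv = s_xy1 / s_xy2     # X'Y1 / X'Y2, the scalar form of the IV estimator
    return beta_ols, beta_iv


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        ols, iv = ols_and_iv(n)
        print(n, "OLS:", ols, "IV:", iv)
```

Under these assumed values, the derivation gives a probability limit of $\beta + \sigma_{12}/E(y_{2,i}^2) = 1 + 0.5/2 = 1.25$ for OLS and $\beta = 1$ for IV; setting `sigma12=0.0` removes the OLS inconsistency.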
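When comparing the OLS and the IV estimator of $\beta$: the IV estimator should be used when there is endogeneity, in order to deliver a consistent estimator; otherwise, the OLS estimator should be used.

4. (a) The OLS estimator of $\alpha$ in the updated model is
\[
\hat\alpha = (X'X)^{-1}X'Y_2 = (X'X)^{-1}X'(\alpha_n X + U_2) = \alpha_n + \left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'U_2}{n}\right) \xrightarrow{P} 0,
\]
where the first term $\alpha_n = \alpha/\sqrt{n} \to 0$ as $n \to \infty$ and the second term converges in probability to 0 as shown previously. In addition, we have:
\[
\sqrt{n}\,\hat\alpha = \alpha + \left(\frac{X'X}{n}\right)^{-1}\left(\sqrt{n}\,\frac{X'U_2}{n}\right) \xrightarrow{d} \alpha + N\left(0, E(x_i^2)^{-2}E(x_i^2u_{2,i}^2)\right).
\]

(b) In the updated model, the IV estimator of $\beta$ using $x$ as an instrument is such that:
\[
\hat\beta_{iv} = (\hat Y_2'\hat Y_2)^{-1}\hat Y_2'Y_1 = \beta + (Y_2'P_xY_2)^{-1}Y_2'P_xu_1,
\]
where
\[
\frac{X'Y_2}{n} = \frac{X'(\alpha_n X + u_2)}{n} = \frac{\alpha}{\sqrt{n}}\frac{X'X}{n} + \frac{X'u_2}{n} \xrightarrow{P} 0 .
\]
We need to rescale this expression in order to find its limit:
\[
\sqrt{n}\,\frac{X'Y_2}{n} = \sqrt{n}\,\frac{X'(\alpha_n X + u_2)}{n} = \alpha\frac{X'X}{n} + \sqrt{n}\,\frac{X'u_2}{n} \xrightarrow{d} Z_2 \sim N\left(\alpha E(x_i^2), E(x_i^2u_{2,i}^2)\right).
\]
So overall, we have:
\[
\hat\beta_{iv} = \beta + (Y_2'P_xY_2)^{-1}Y_2'P_xu_1
= \beta + \left[\frac{\sqrt{n}\,Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{\sqrt{n}\,X'Y_2}{n}\right]^{-1}\left[\frac{\sqrt{n}\,Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{\sqrt{n}\,X'u_1}{n}\right]
\]
\[
\xrightarrow{d} \beta + \left[Z_2'E(x_i^2)^{-1}Z_2\right]^{-1}\left[Z_2'E(x_i^2)^{-1}Z_1\right],
\]
where a CLT yields $\sqrt{n}\,X'u_1/n \xrightarrow{d} Z_1 \sim N(0, E(x_i^2u_{1,i}^2))$. The limit of the IV estimator is stochastic, so it is no longer consistent.

(c) i. The OLS estimator of $\alpha$ is now
\[
\hat\alpha = (X'X)^{-1}X'Y_2 = (X'X)^{-1}X'(\alpha_n X + U_2) = \alpha_n + \left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'U_2}{n}\right) \xrightarrow{P} 0,
\]
where the first term $\alpha_n = \alpha/n^k \to 0$ as $n \to \infty$ and the second term converges in probability to 0 as shown previously. In addition, when $k < 1/2$, we have:
\[
n^k\hat\alpha = \alpha + \left(\frac{X'X}{n}\right)^{-1}\left(n^k\frac{X'U_2}{n}\right) \xrightarrow{P} \alpha
\quad \text{since} \quad
n^k\frac{X'U_2}{n} = \frac{n^k}{\sqrt{n}}\cdot\sqrt{n}\,\frac{X'U_2}{n} \xrightarrow{P} 0,
\]
because $n^k/\sqrt{n} \to 0$ while $\sqrt{n}\,X'U_2/n$ converges in distribution.

ii.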
In the updated model, the IV estimator of $\beta$ using $x$ as an instrument is such that:
\[
\hat\beta_{iv} = (\hat Y_2'\hat Y_2)^{-1}\hat Y_2'Y_1 = \beta + (Y_2'P_xY_2)^{-1}Y_2'P_xu_1
= \beta + \left[\frac{n^k\,Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{n^k\,X'Y_2}{n}\right]^{-1}\left[\frac{n^k\,Y_2'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{n^k\,X'u_1}{n}\right]
\xrightarrow{P} \beta,
\]
since when $k < 1/2$
\[
n^k\frac{X'Y_2}{n} = \alpha\frac{X'X}{n} + \frac{n^k}{\sqrt{n}}\cdot\sqrt{n}\,\frac{X'u_2}{n} \xrightarrow{P} \alpha E(x_i^2),
\]
and, by the same argument, $n^k X'u_1/n \xrightarrow{P} 0$. Hence, when $k < 1/2$, the IV estimator is consistent. When $k > 1/2$, there is no well-defined limit distribution.

(d) This is the typical framework used to model "weak instruments": these are instruments with limited explanatory power, in the sense that they only have a "small" correlation with the endogenous variable. To model such a "small correlation" and still be able to derive some asymptotic distribution, econometricians rely on a device where the value of that correlation is tied to the sample size.
- When the rate is exactly $\sqrt{n}$, the information disappears as fast as it accumulates when increasing the sample size (CLT), and the estimator is not consistent and has a non-standard limit distribution.
- When the rate is slower, $n^k$ with $k < 1/2$ (the instrument is near-weak: a little better than weak, but not quite strong), one can still get a consistent estimator and the limit distribution is standard, but associated with a slower rate of convergence.
- Finally, with $n^k$ and $k > 1/2$, everything is lost: the instrument is "too weak".
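The contrast between the weak ($k = 1/2$) and near-weak ($k < 1/2$) cases can be illustrated with a small Monte Carlo experiment. This is a rough sketch under assumptions that are not in the exam: normal data with unit variances, $Cov(u_{1,i}, u_{2,i}) = 0.5$, $\alpha = \beta = 1$, and hypothetical function names. It reports the median absolute error of $\hat\beta_{iv}$, which is used here because the IV estimator can have very heavy tails when the instrument is weak.

```python
import random
import statistics


def iv_estimate(n, k, rng, beta=1.0, alpha=1.0, sigma12=0.5):
    """One IV estimate of beta when the first-stage coefficient is alpha / n**k.

    Assumptions for illustration only: x, u1, u2 are normal with unit
    variance and Cov(u1, u2) = sigma12.
    """
    alpha_n = alpha / n ** k
    s_xy1 = s_xy2 = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        u2 = sigma12 * u1 + (1.0 - sigma12 ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        y2 = alpha_n * x + u2
        y1 = beta * y2 + u1
        s_xy1 += x * y1
        s_xy2 += x * y2
    # With a single instrument, the IV estimator reduces to X'Y1 / X'Y2
    return s_xy1 / s_xy2


def median_abs_error(n, k, reps=200, seed=0, beta=1.0):
    """Median of |beta_iv - beta| over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    return statistics.median(
        abs(iv_estimate(n, k, rng, beta=beta) - beta) for _ in range(reps)
    )


if __name__ == "__main__":
    for k in (0.25, 0.5):
        for n in (100, 1600):
            print("k =", k, "n =", n, "median |error| =", median_abs_error(n, k))
```

Under these assumptions, the median error should shrink with $n$ when $k = 0.25$ (at the slower rate $n^{1/2-k}$) and stay roughly unchanged when $k = 0.5$, in line with the discussion in (d).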