Midterm exam for Econometrics 101, Warwick Econ Ph.D., 2015

1) (25 marks) Doing again a proof we saw together

Let $(X_i)_{1 \leq i \leq n}$ be an iid sample of $k \times 1$ random vectors following the same distribution as the $k \times 1$ random vector $X$. Let $(Y_i)_{1 \leq i \leq n}$ be an iid sample of random variables following the same distribution as the random variable $Y$. Assume that the $k \times k$ matrix $E(XX')$ is invertible. Let
$$\beta = E(XX')^{-1}E(XY) \quad (0.0.1)$$
be the population OLS regression coefficient of $Y$ on $X$. Let
$$\hat{\beta} = \left(\frac{1}{n}\sum_{i=1}^n X_iX_i'\right)^{-1}\frac{1}{n}\sum_{i=1}^n X_iY_i \quad (0.0.2)$$
be the estimator of $\beta$. Redo the proof of the following theorem, which we saw together.

Theorem 0.0.1 (Asymptotic normality of $\hat{\beta}$) $\hat{\beta}$ is an asymptotically normal sequence of estimators of $\beta$:
$$\sqrt{n}(\hat{\beta} - \beta) \hookrightarrow N\left(0, E(XX')^{-1}E(XX'e^2)E(XX')^{-1}\right).$$

In the proof, you can use the following result without proving it:

Theorem 0.0.2 (The product of a deterministic matrix and a normally distributed vector is a normally distributed vector) If $U \sim N(0, V(U))$ is a $k \times 1$ normal vector and $B$ is a $k \times k$ deterministic matrix, then $BU \sim N(0, BV(U)B')$.

Solution

Write $Y_i = X_i'\beta + e_i$ and substitute this into (0.0.2). This yields
$$\hat{\beta} = \left(\frac{1}{n}\sum_{i=1}^n X_iX_i'\right)^{-1}\frac{1}{n}\sum_{i=1}^n X_i(X_i'\beta + e_i) = \beta + \left(\frac{1}{n}\sum_{i=1}^n X_iX_i'\right)^{-1}\frac{1}{n}\sum_{i=1}^n X_ie_i.$$
Therefore,
$$\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{1}{n}\sum_{i=1}^n X_iX_i'\right)^{-1}\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n X_ie_i\right). \quad (0.0.3)$$
As $\left(\frac{1}{n}\sum_{i=1}^n X_iX_i'\right)^{-1} \to E(XX')^{-1}$ (why?), the Slutsky lemma implies that the right-hand side of (0.0.3) has the same limit in distribution as
$$E(XX')^{-1}\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n X_ie_i\right). \quad (0.0.4)$$
Since $E(Xe) = 0$ (why?), the central limit theorem implies that
$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n X_ie_i\right) \hookrightarrow N(0, V(Xe)). \quad (0.0.5)$$
Because $E(Xe) = 0$,
$$V(Xe) = E\left((Xe - E(Xe))(Xe - E(Xe))'\right) = E(XX'e^2). \quad (0.0.6)$$
Combining Equations (0.0.3), (0.0.4), (0.0.5), (0.0.6) and Theorem 0.0.2 yields the result (once noted that $\left(E(XX')^{-1}\right)' = E(XX')^{-1}$). QED.

2) Short questions

a) (5 marks) Assume a policy maker offers you the opportunity to run a randomized experiment to measure the effect of a training program for unemployed people.
400 unemployed people are motivated to participate in the experiment, and the policy maker leaves it to you to decide how many will be in the control group and how many in the treatment group. To maximize the statistical power of your experiment, how many unemployed people should be in the treatment group? No need to justify your answer, just give a number.

b) (5 marks) Assume a policy maker offers you the opportunity to run a randomized experiment to measure the effect of a class size reduction program in Portugal. Based on estimates from other countries, you anticipate that this policy should increase students' test scores by 20% of a standard deviation. Given the number of students that will be included in the experiment, and the share of them who will receive the treatment, you compute that for $\alpha = 0.05$ and $\lambda = 0.8$, the minimum detectable effect of the experiment is exactly equal to 20% of a standard deviation. If the true effect of that policy is indeed to increase test scores by 20% of a standard deviation, what is the probability that you will incorrectly conclude that the policy has no effect at the conclusion of the experiment? No need to justify your answer, just give a number.

c) (10 marks) Assume you regress wages on a constant, a dummy variable for gender (1 for males, 0 for females), and a dummy variable for black people (1 for black people, 0 for non-black people). The gender and black dummies are respectively denoted by $X_2$ and $X_3$. The following table presents the mean of wages in the four subgroups defined by those two variables.

            Females   Males
Non-Black      2000    2500
Black          1400    1600

The coefficient of $X_3$, the black dummy, is equal to $w\delta_1 + (1-w)\delta_2$, where $w$ is a positive weight, and $\delta_1$ and $\delta_2$ are two numbers which you can compute from the table. Give the values of $\delta_1$ and $\delta_2$. No need to justify your answer, just give two numbers.

d) (5 marks) What is the only circumstance in which you might prefer using non-robust instead of robust standard errors when you run a regression?
Reply in no more than one sentence.

Solution

a) 200.

b) 20%.

c) $\delta_1 = -600$, $\delta_2 = -900$.

d) When there are fewer than 200 units in your sample.

3) Measuring heterogeneous treatment effects

Assume you are interested in measuring the effect of reducing class size on students' performance. You run an experiment where 500 first graders are allocated to small classes with 10 students, while 500 other first graders are allocated to regular classes with 25 students. The allocation of students to classes is made randomly. Let $D_i$ denote a dummy equal to 1 if student $i$ is in a small class. Let $Y_i$ denote student $i$'s mark in a test at the end of first grade, standardized by the standard deviation of marks. Let $Y_i^1$ denote the standardized mark that student $i$ will obtain in that test if she is sent to a small class. Let $Y_i^0$ denote the standardized mark that student $i$ will obtain in that test if she is not sent to a small class. We have $Y_i = D_iY_i^1 + (1-D_i)Y_i^0$. Finally, assume that in their last year of kindergarten, students took a standardized test measuring their readiness for first grade, and let $X_i$ be a dummy equal to 1 for students who scored above the median in this test.

a) (3 marks) Explain in a few sentences why randomization ensures that $D_i \perp\!\!\!\perp (Y_i^0, Y_i^1, X_i)$. Your answer should not be longer than 5 lines.

b) (3 marks) Assume you regress $Y_i$ on a constant and $D_i$. Let $\beta$ denote the population coefficient of $D_i$ in that regression. Using one of the three examples of saturated regressions we saw together, give a formula for $\beta$. Your formula should only involve $Y_i$ and $D_i$, not $Y_i^0$ or $Y_i^1$, and to justify it you only need to say whether this regression corresponds to the first, second, or third example of a saturated regression we saw together.

c) (4 marks) Using the result from the previous question, show that $\beta = E(Y_i^1 - Y_i^0)$. Hint: you can use without proving them the two following facts:

1. $D_i \perp\!\!\!\perp (Y_i^0, Y_i^1, X_i)$ implies that $D_i \perp\!\!\!\perp Y_i^0$ and $D_i \perp\!\!\!\perp Y_i^1$.
2.
For any random variables $U$ and $V$, if $U \perp\!\!\!\perp V$, then $E(U|V) = E(U)$.

d) (5 marks) Now assume you regress $Y_i$ on a constant, $D_i$, $X_i$, and $D_i \times X_i$. Let $\beta_0$ and $\beta_1$ denote the population regression coefficients of $D_i$ and $D_i \times X_i$ in that regression. Using one of the three examples of saturated regressions we saw together, give formulas for $\beta_0$ and $\beta_1$. Your formulas should only involve $Y_i$, $D_i$, and $X_i$, not $Y_i^0$ or $Y_i^1$, and to justify them you only need to say whether this regression corresponds to the first, second, or third example of a saturated regression we saw together.

e) (5 marks) Using the result from the previous question, show that $\beta_0 = E(Y_i^1 - Y_i^0|X_i = 0)$ and $\beta_1 = E(Y_i^1 - Y_i^0|X_i = 1) - E(Y_i^1 - Y_i^0|X_i = 0)$. Hint: you can use the fact that $D_i \perp\!\!\!\perp (Y_i^0, Y_i^1, X_i)$ implies that $D_i \perp\!\!\!\perp (Y_i^0, X_i)$ and $D_i \perp\!\!\!\perp (Y_i^1, X_i)$. Then, remember that joint independence implies conditional independence. Finally, you can use the fact that for any random variables $(U, V, W)$, $E(V|U = u) - E(W|U = u) = E(V - W|U = u)$.

f) (5 marks) Now assume you estimate the regression in question d). You find $\hat{\beta}_1 = -0.3$, and you can also reject the null hypothesis that $\beta_1 = 0$ at the 5% level. Interpret these results. Based on the results of the experiment, which students should benefit from smaller classes?

Solution

a) Because treatment is randomly allocated, the treatment and the control group must be comparable on all dimensions.

b) $\beta = E(Y_i|D_i = 1) - E(Y_i|D_i = 0)$. This follows from the second example of a saturated regression we saw together.

c)
$$E(Y_i|D_i = 1) - E(Y_i|D_i = 0) = E(Y_i^1|D_i = 1) - E(Y_i^0|D_i = 0) = E(Y_i^1) - E(Y_i^0) = E(Y_i^1 - Y_i^0).$$

d) $\beta_0 = E(Y_i|D_i = 1, X_i = 0) - E(Y_i|D_i = 0, X_i = 0)$ and $\beta_1 = E(Y_i|D_i = 1, X_i = 1) - E(Y_i|D_i = 0, X_i = 1) - \left(E(Y_i|D_i = 1, X_i = 0) - E(Y_i|D_i = 0, X_i = 0)\right)$. This follows from the third example of a saturated regression we saw together.
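Though outside the scope of the exam, the formulas in d) are easy to check numerically: in a saturated regression, the OLS coefficients coincide exactly with the corresponding contrasts of subgroup sample means. Below is a minimal simulation sketch; the effect sizes (a hypothetical treatment effect of $+0.5$ when $X_i = 0$ and $+0.2$ when $X_i = 1$, so that $\beta_1 = -0.3$) and the sample size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40_000  # large sample so estimates sit close to population values

# Hypothetical potential outcomes: treatment effect +0.5 when X=0, +0.2 when X=1
X = rng.integers(0, 2, n)
Y0 = rng.normal(0.0, 1.0, n)
Y1 = Y0 + np.where(X == 1, 0.2, 0.5)
D = rng.integers(0, 2, n)          # random assignment, independent of (Y0, Y1, X)
Y = np.where(D == 1, Y1, Y0)

# Regress Y on a constant, D, X, and D*X
Z = np.column_stack([np.ones(n), D, X, D * X])
beta = np.linalg.lstsq(Z, Y, rcond=None)[0]
b0, b1 = beta[1], beta[3]

# The subgroup contrasts from the solution to d)
c0 = Y[(D == 1) & (X == 0)].mean() - Y[(D == 0) & (X == 0)].mean()
c1 = (Y[(D == 1) & (X == 1)].mean() - Y[(D == 0) & (X == 1)].mean()) - c0

print(b0, c0)  # identical up to floating point; both close to 0.5
print(b1, c1)  # identical up to floating point; both close to -0.3
```

Because the regression is saturated, the coefficients match the subgroup contrasts to floating-point precision even in a finite sample; only their distance to the population values $0.5$ and $-0.3$ depends on $n$.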
e)
$$E(Y_i|D_i = 1, X_i = 0) - E(Y_i|D_i = 0, X_i = 0) = E(Y_i^1|D_i = 1, X_i = 0) - E(Y_i^0|D_i = 0, X_i = 0)$$
$$= E(Y_i^1|X_i = 0) - E(Y_i^0|X_i = 0) = E(Y_i^1 - Y_i^0|X_i = 0).$$
The proof of the second statement is similar.

f) Within the subgroup of students above the median of readiness for first grade, the treatment/control comparison of test scores at the end of first grade is 30% of a standard deviation lower than among students below the median, and this difference is statistically significant. Therefore, the effect of studying in smaller classes is stronger for weak than for strong students. Students below the median of readiness for first grade should benefit from smaller classes.

4) Performing inference on the upper bound of the support of a random variable in a nonparametric model

Assume you observe an iid sample of $n$ random variables $(Y_i)$ which follow a continuous distribution on $[0, \theta]$. Let $F$ denote the cumulative distribution function of these random variables: $F(x) = P(Y_i \leq x)$. Let us also assume that $F$ is differentiable on $[0, \theta]$. Its derivative is the density of the $(Y_i)$, denoted $f$. $\theta$ is the unknown parameter we would like to estimate. Consider the following estimator of $\theta$:
$$\hat{\theta}_{ML} = \max_{1 \leq i \leq n} Y_i.$$

a) Show that for any $x \in [0, \theta]$, $P(\hat{\theta}_{ML} \leq x) = (F(x))^n$; that for $x < 0$, $P(\hat{\theta}_{ML} \leq x) = 0$; and that for $x > \theta$, $P(\hat{\theta}_{ML} \leq x) = 1$.

b) Assume that $f(\theta) > 0$. Use question a) to show that $n(\theta - \hat{\theta}_{ML}) \hookrightarrow U$, where $U$ follows an exponential distribution with parameter $f(\theta)$. Hint 1: the cdf of a random variable following an exponential distribution with parameter $f(\theta)$ is $1 - \exp(-xf(\theta))$ for $x \geq 0$ and 0 otherwise. Hint 2: to prove this, you can use the following result: if a function $g$ is differentiable at some real number $u$ with derivative $g'(u)$, then for any real number $k$, $g(u + \frac{k}{n}) = g(u) + g'(u)\frac{k}{n} + \frac{\varepsilon_n}{n}$, where $\varepsilon_n$ is a sequence converging towards 0 when $n$ goes to $+\infty$.

c) Assume that $f(\theta) = 0$ but $f$ is differentiable at $\theta$ and $f'(\theta) \neq 0$.
As $f(\theta) = 0$ and $f$ is a density function, which must be nonnegative everywhere, if $f'(\theta) \neq 0$ then we must have $f'(\theta) < 0$ (otherwise, $f$ would be strictly negative for values to the left of $\theta$). Use question a) to show that $\sqrt{n}(\theta - \hat{\theta}_{ML}) \hookrightarrow U$, where $U$ follows a Weibull distribution with parameters 2 and $\sqrt{\frac{2}{-f'(\theta)}}$. Hint 1: the cdf of a random variable following a Weibull distribution with parameters 2 and $\sqrt{\frac{2}{-f'(\theta)}}$ is $1 - \exp\left(-\left(\frac{x}{\sqrt{2/(-f'(\theta))}}\right)^2\right)$ for $x \geq 0$ and 0 otherwise. Hint 2: to prove this, you can use the following result: if a function $g$ is twice differentiable at some real number $u$ with first and second order derivatives $g'(u)$ and $g''(u)$, then for any real number $k$, $g(u + \frac{k}{\sqrt{n}}) = g(u) + g'(u)\frac{k}{\sqrt{n}} + g''(u)\frac{k^2}{2n} + \frac{\varepsilon_n}{n}$, where $\varepsilon_n$ is a sequence converging towards 0 when $n$ goes to $+\infty$.

d) Question b) shows that if $f(\theta) \neq 0$, $\hat{\theta}_{ML}$ is an $n$-consistent estimator of $\theta$. Question c) shows that if $f(\theta) = 0$ (but $f'(\theta) \neq 0$), $\hat{\theta}_{ML}$ is a $\sqrt{n}$-consistent estimator of $\theta$. Explain intuitively why $\hat{\theta}_{ML}$ converges more slowly towards $\theta$ when $f(\theta) = 0$.

Solution

a) For $x \in [0, \theta]$,
$$P(\hat{\theta}_{ML} \leq x) = P(Y_1 \leq x, ..., Y_n \leq x) = P(Y_1 \leq x) \cdots P(Y_n \leq x) = (F(x))^n.$$
The first step holds because the max of $n$ random variables is lower than $x$ if and only if each of those random variables is lower than $x$. The second step follows from independence. For $x < 0$, the result follows from the fact that the max of positive random variables cannot be lower than 0. For $x > \theta$, it follows from the fact that the max of random variables lower than $\theta$ must be lower than $\theta$ as well.

b) The cdf of an $\exp(f(\theta))$ variable is continuous everywhere. Using the definition of convergence in distribution, we therefore have to show that $P\left(n(\theta - \hat{\theta}_{ML}) \leq x\right)$ converges to the cdf of an $\exp(f(\theta))$ for every $x$.

For $x < 0$, $P\left(n(\theta - \hat{\theta}_{ML}) \leq x\right) = 0$ because the random variable $n(\theta - \hat{\theta}_{ML})$ is positive. This proves the result for $x < 0$.
For $x \geq 0$,
$$P\left(n(\theta - \hat{\theta}_{ML}) \leq x\right) = P\left(\hat{\theta}_{ML} > \theta - \frac{x}{n}\right) = 1 - P\left(\hat{\theta}_{ML} \leq \theta - \frac{x}{n}\right) = 1 - \left(F\left(\theta - \frac{x}{n}\right)\right)^n$$
$$= 1 - \exp\left(n \ln F\left(\theta - \frac{x}{n}\right)\right) = 1 - \exp\left(n \ln\left(F(\theta) - f(\theta)\frac{x}{n} + \frac{\varepsilon_n}{n}\right)\right)$$
$$= 1 - \exp\left(-xf(\theta) + \varepsilon_n\right) \to 1 - \exp(-xf(\theta)) \text{ when } n \to +\infty,$$
where the last equality uses $F(\theta) = 1$ and the differentiability of $\ln$ at 1 ($\varepsilon_n$ denotes a generic sequence converging to 0).

c) The cdf of a variable following a Weibull distribution is continuous everywhere. Using the definition of convergence in distribution, we therefore have to show that $P\left(\sqrt{n}(\theta - \hat{\theta}_{ML}) \leq x\right)$ converges to the cdf given in the question for every $x$.

For $x < 0$, $P\left(\sqrt{n}(\theta - \hat{\theta}_{ML}) \leq x\right) = 0$ because the random variable $\sqrt{n}(\theta - \hat{\theta}_{ML})$ is positive. This proves the result for $x < 0$.

For $x \geq 0$,
$$P\left(\sqrt{n}(\theta - \hat{\theta}_{ML}) \leq x\right) = P\left(\hat{\theta}_{ML} > \theta - \frac{x}{\sqrt{n}}\right) = 1 - P\left(\hat{\theta}_{ML} \leq \theta - \frac{x}{\sqrt{n}}\right) = 1 - \left(F\left(\theta - \frac{x}{\sqrt{n}}\right)\right)^n$$
$$= 1 - \exp\left(n \ln F\left(\theta - \frac{x}{\sqrt{n}}\right)\right) = 1 - \exp\left(n \ln\left(F(\theta) - f(\theta)\frac{x}{\sqrt{n}} + f'(\theta)\frac{x^2}{2n} + \frac{\varepsilon_n}{n}\right)\right)$$
$$= 1 - \exp\left(f'(\theta)\frac{x^2}{2} + \varepsilon_n\right) \to 1 - \exp\left(f'(\theta)\frac{x^2}{2}\right) \text{ when } n \to +\infty,$$
where the last equality uses $F(\theta) = 1$ and $f(\theta) = 0$.

d) When $f(\theta) = 0$, the probability that a realization of the $Y_i$ falls close to $\theta$ is lower than when $f(\theta) > 0$, so the sample maximum approaches $\theta$ more slowly.
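Though not asked in the exam, the two convergence rates in question d) can be illustrated by simulation. The sketch below uses two distributions chosen for convenience: a Uniform$[0, 1]$, for which $\theta = 1$ and $f(\theta) = 1 > 0$, so $n(\theta - \hat{\theta}_{ML})$ should be approximately $\exp(1)$ with mean 1; and the triangular density $f(x) = 2(1 - x)$ on $[0, 1]$, for which $f(\theta) = 0$ and $f'(\theta) = -2$, so $\sqrt{n}(\theta - \hat{\theta}_{ML})$ should be approximately Weibull with parameters 2 and 1, whose mean is $\Gamma(3/2) = \sqrt{\pi}/2 \approx 0.886$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2_000, 4_000  # sample size per replication, number of replications

# Case f(theta) > 0: Y ~ Uniform[0, 1], theta = 1, f(1) = 1.
# n * (theta - max) is approximately Exp(1), so its mean should be close to 1.
Y = rng.uniform(0.0, 1.0, size=(reps, n))
gap_pos = n * (1.0 - Y.max(axis=1))

# Case f(theta) = 0, f'(theta) = -2: density f(x) = 2(1 - x) on [0, 1],
# cdf F(x) = 1 - (1 - x)^2, sampled by inverse cdf: Y = 1 - sqrt(1 - U).
# sqrt(n) * (theta - max) is approximately Weibull(2, 1), mean sqrt(pi)/2.
U = rng.uniform(0.0, 1.0, size=(reps, n))
Y2 = 1.0 - np.sqrt(1.0 - U)
gap_zero = np.sqrt(n) * (1.0 - Y2.max(axis=1))

print(gap_pos.mean())   # close to 1 (exponential limit)
print(gap_zero.mean())  # close to sqrt(pi)/2 ~ 0.886 (Weibull limit)
```

Note that the gap must be scaled by $n$ in the first case but only by $\sqrt{n}$ in the second to obtain a non-degenerate limit, which is exactly the slower convergence discussed in d).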