PS3 for Econometrics 101, Warwick Econ Ph.D.

* denotes more mathematical exercises, on which you should spend some time if you are a type 2 student but not if you are type 1. If you are type 1, you should still read the result of Exercise 3, without trying to prove it, as it will help you interpret the results of the applied exercises.

Exercise 1: using the delta method to derive confidence intervals for $\sigma(Y)$

Let $(Y_i)_{1 \le i \le n}$ be an iid sample of $n$ random variables with a 4th moment and with a strictly positive variance. Let $\widehat{V}(Y) = \overline{Y^2} - \overline{Y}^2$ be an estimator of $V(Y)$, and let $\widehat{\sigma}(Y) = \sqrt{\widehat{V}(Y)}$ be an estimator of its standard deviation.

1. Show that
\[
\sqrt{n}\left((\overline{Y^2}, \overline{Y})' - (E(Y^2), E(Y))'\right) \hookrightarrow N(0, V_0), \quad \text{where } V_0 = \begin{pmatrix} V(Y^2) & \mathrm{cov}(Y^2, Y) \\ \mathrm{cov}(Y^2, Y) & V(Y) \end{pmatrix}.
\]

2. Give a consistent estimator of $V_0$, $\widehat{V}_0$. No need to prove that the estimator is indeed consistent, just mention which theorems you would use to prove consistency.

3. Use the previous question and the delta method to show that
\[
\sqrt{n}\left(\widehat{V}(Y) - V(Y)\right) \hookrightarrow N(0, V_1), \quad \text{where } V_1 = (1, -2E(Y))\, V_0\, (1, -2E(Y))'.
\]

4. Give a consistent estimator of $V_1$, $\widehat{V}_1$. You can write $\widehat{V}_1$ as a function of $\widehat{V}_0$. No need to prove that the estimator is indeed consistent, just mention which theorems you would use to prove consistency.

5. Give a confidence interval for $V(Y)$ with asymptotic coverage $1-\alpha$. You just need to write the upper and lower bounds of the confidence interval as functions of $\widehat{V}_1$, $q_{1-\alpha/2}$, $1-\alpha$, $\widehat{V}(Y)$, and $n$. No need to prove that the confidence interval indeed has asymptotic coverage $1-\alpha$, just mention which lemma you would use in the proof.

6. Use the delta method to show that
\[
\sqrt{n}\left(\widehat{\sigma}(Y) - \sigma(Y)\right) \hookrightarrow N(0, V_2), \quad \text{where } V_2 = \frac{1}{4V(Y)} V_1.
\]

7. Give a consistent estimator of $V_2$, $\widehat{V}_2$. You can write $\widehat{V}_2$ as a function of $\widehat{V}_1$. No need to prove that the estimator is indeed consistent, just mention which theorem you would use to prove consistency.

8. Give a confidence interval for $\sigma(Y)$ with asymptotic coverage $1-\alpha$.
You just need to write the upper and lower bounds of the confidence interval as functions of $\widehat{V}_2$, $q_{1-\alpha/2}$, $1-\alpha$, $\widehat{\sigma}(Y)$, and $n$. No need to prove that the confidence interval indeed has asymptotic coverage $1-\alpha$, just mention which lemma you would use in the proof.

Solution

1. Let $U_i = (Y_i^2, Y_i)'$. $(U_i)_{1 \le i \le n}$ is an iid sequence of $2 \times 1$ vectors of random variables with a second moment (as $Y_i$ has a fourth moment by assumption). It follows from the central limit theorem that
\[
\sqrt{n}(\overline{U} - E(U)) \hookrightarrow N(0, V(U)).
\]
This proves the result, once noted that $\sqrt{n}(\overline{U} - E(U)) = \sqrt{n}\left((\overline{Y^2}, \overline{Y})' - (E(Y^2), E(Y))'\right)$ and $V(U) = V((Y^2, Y)') = V_0$.

2.
\[
\widehat{V}_0 = \begin{pmatrix} \overline{Y^4} - \overline{Y^2}^2 & \overline{Y^3} - \overline{Y^2}\,\overline{Y} \\ \overline{Y^3} - \overline{Y^2}\,\overline{Y} & \overline{Y^2} - \overline{Y}^2 \end{pmatrix}
\]
is a consistent estimator of $V_0$. The proof would rely on the weak law of large numbers and on the continuous mapping theorem.

3. Let $\Phi(x, y) = x - y^2$. $\frac{\partial \Phi}{\partial x}(x, y) = 1$ and $\frac{\partial \Phi}{\partial y}(x, y) = -2y$. Therefore $\dot{\Phi}(x, y) = (1, -2y)'$. Now, $\Phi(E(Y^2), E(Y)) = V(Y)$, and $\Phi(\overline{Y^2}, \overline{Y}) = \widehat{V}(Y)$. Then, it follows from the delta method and from the first question that
\[
\sqrt{n}\left(\widehat{V}(Y) - V(Y)\right) \hookrightarrow N\left(0, \dot{\Phi}(E(Y^2), E(Y))'\, V_0\, \dot{\Phi}(E(Y^2), E(Y))\right).
\]
This proves the result.

4. $\widehat{V}_1 = (1, -2\overline{Y})\, \widehat{V}_0\, (1, -2\overline{Y})'$ is a consistent estimator of $V_1$. The proof follows from the fact that $\widehat{V}_0$ is a consistent estimator of $V_0$, from the weak law of large numbers, and from the continuous mapping theorem.

5.
\[
IC_1(\alpha) = \left[\widehat{V}(Y) - q_{1-\alpha/2}\sqrt{\frac{\widehat{V}_1}{n}},\ \widehat{V}(Y) + q_{1-\alpha/2}\sqrt{\frac{\widehat{V}_1}{n}}\right]
\]
is a confidence interval for $V(Y)$ with asymptotic coverage equal to $1-\alpha$. The proof relies on results from previous questions and the Slutsky lemma.

6. Let $\phi(x) = \sqrt{x}$, so that $\dot{\phi}(x) = \frac{1}{2\sqrt{x}}$. $\phi(V(Y)) = \sigma(Y)$ and $\phi(\widehat{V}(Y)) = \widehat{\sigma}(Y)$. Then, it follows from the delta method and from the result of the third question that
\[
\sqrt{n}\left(\widehat{\sigma}(Y) - \sigma(Y)\right) \hookrightarrow N\left(0, (\dot{\phi}(V(Y)))^2 V_1\right).
\]
This proves the result.

7. $\widehat{V}_2 = \frac{1}{4\left(\overline{Y^2} - \overline{Y}^2\right)} \widehat{V}_1$ is a consistent estimator of $V_2$.
This follows from the fact that $\widehat{V}_1$ is a consistent estimator of $V_1$, from the weak law of large numbers, and from the continuous mapping theorem.

8.
\[
IC_2(\alpha) = \left[\widehat{\sigma}(Y) - q_{1-\alpha/2}\sqrt{\frac{\widehat{V}_2}{n}},\ \widehat{\sigma}(Y) + q_{1-\alpha/2}\sqrt{\frac{\widehat{V}_2}{n}}\right]
\]
is a confidence interval for $\sigma(Y)$ with asymptotic coverage equal to $1-\alpha$. The proof relies on results from previous questions and the Slutsky lemma.

Exercise 2: quantiles as m-estimands

In the course, we showed that the median of a continuous random variable can be seen as an m-estimand. In this exercise, we want to show that the same applies to any quantile of $Y$. Let $Y$ be a continuous random variable with a continuous and strictly increasing cdf $F_Y$, admitting a first moment. For any real number $u$, let $\rho_\tau(u) = (\tau - 1\{u \le 0\})u$. You can check that $\rho_\tau(u) = (\tau - 1)u$ for $u \le 0$, $\rho_\tau(u) = \tau u$ for $u > 0$, and $\rho_{0.5}(u) = 0.5|u|$. Then, let $q_\tau(Y)$ be the quantile of order $\tau$ of $Y$: $q_\tau(Y) = F_Y^{-1}(\tau)$. $q_\tau(Y)$ is the solution in $y$ of $P(Y \le y) = \tau$. Show that
\[
q_\tau(Y) = \operatorname*{argmin}_{\theta \in \mathbb{R}} E(\rho_\tau(Y - \theta)).
\]
Hints: you should use the exact same steps as in the proof of Example 2 in the notes. You need to show that $E(\rho_\tau(Y - \theta)) - E(\rho_\tau(Y - q_\tau(Y)))$ is strictly positive for any $\theta \ne q_\tau(Y)$. For that purpose, you should consider first a $\theta > q_\tau(Y)$. You should then decompose both $E(\rho_\tau(Y - \theta))$ and $E(\rho_\tau(Y - q_\tau(Y)))$ into three pieces depending on whether $Y \ge \theta$, $q_\tau(Y) \le Y < \theta$, or $Y < q_\tau(Y)$. Then you should replace the functions $\rho_\tau(Y - \theta)$ and $\rho_\tau(Y - q_\tau(Y))$ by simpler expressions within each of the 6 conditional expectations. Finally, at some point you will have to use the fact that $P(Y \le q_\tau(Y)) = \tau$ and $P(Y > q_\tau(Y)) = 1 - \tau$.

Solution

Let $\theta$ be strictly greater than $q_\tau(Y)$.
Then,
\begin{align*}
& E(\rho_\tau(Y - \theta)) - E(\rho_\tau(Y - q_\tau(Y))) \\
={}& E(\rho_\tau(Y - \theta) \mid Y \ge \theta)P(Y \ge \theta) + E(\rho_\tau(Y - \theta) \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) \\
&+ E(\rho_\tau(Y - \theta) \mid Y < q_\tau(Y))P(Y < q_\tau(Y)) - E(\rho_\tau(Y - q_\tau(Y)) \mid Y \ge \theta)P(Y \ge \theta) \\
&- E(\rho_\tau(Y - q_\tau(Y)) \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) - E(\rho_\tau(Y - q_\tau(Y)) \mid Y < q_\tau(Y))P(Y < q_\tau(Y)) \\
={}& \tau E(Y - \theta \mid Y \ge \theta)P(Y \ge \theta) + (1-\tau)E(\theta - Y \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) \\
&+ (1-\tau)E(\theta - Y \mid Y < q_\tau(Y))P(Y < q_\tau(Y)) - \tau E(Y - q_\tau(Y) \mid Y \ge \theta)P(Y \ge \theta) \\
&- \tau E(Y - q_\tau(Y) \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) - (1-\tau)E(q_\tau(Y) - Y \mid Y < q_\tau(Y))P(Y < q_\tau(Y)) \\
={}& \tau(q_\tau(Y) - \theta)P(Y \ge \theta) + E((1-\tau)\theta + \tau q_\tau(Y) - Y \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) \\
&+ (1-\tau)(\theta - q_\tau(Y))P(Y < q_\tau(Y)) \\
={}& (\theta - q_\tau(Y))\big((1-\tau)P(Y < q_\tau(Y)) - \tau P(Y \ge \theta)\big) + E((1-\tau)\theta + \tau q_\tau(Y) - Y \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) \\
={}& (\theta - q_\tau(Y))\big((1-\tau)P(Y \le q_\tau(Y)) - \tau P(Y \ge \theta)\big) + E((1-\tau)\theta + \tau q_\tau(Y) - Y \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) \\
={}& (\theta - q_\tau(Y))\big((1-\tau)\tau - \tau P(Y \ge \theta)\big) + E((1-\tau)\theta + \tau q_\tau(Y) - Y \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) \\
={}& (\theta - q_\tau(Y))\tau\big(P(Y > q_\tau(Y)) - P(Y \ge \theta)\big) + E((1-\tau)\theta + \tau q_\tau(Y) - Y \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) \\
={}& (\theta - q_\tau(Y))\tau\big(P(Y \ge q_\tau(Y)) - P(Y \ge \theta)\big) + E((1-\tau)\theta + \tau q_\tau(Y) - Y \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) \\
={}& (\theta - q_\tau(Y))\tau P(q_\tau(Y) \le Y < \theta) + E((1-\tau)\theta + \tau q_\tau(Y) - Y \mid q_\tau(Y) \le Y < \theta)P(q_\tau(Y) \le Y < \theta) \\
={}& P(q_\tau(Y) \le Y < \theta)\, E((\theta - q_\tau(Y))\tau + (1-\tau)\theta + \tau q_\tau(Y) - Y \mid q_\tau(Y) \le Y < \theta) \\
={}& P(q_\tau(Y) \le Y < \theta)\, E(\theta - Y \mid q_\tau(Y) \le Y < \theta) > 0.
\end{align*}
The decomposition in the first equality follows from the law of iterated expectations. Various steps merely amount to replacing weak by strict inequalities (or the converse) in probabilities: these hold because $Y$ is assumed to have a continuous distribution. The last inequality holds because the cdf of $Y$ is strictly increasing and $\theta > q_\tau(Y)$, so $P(q_\tau(Y) \le Y < \theta) > 0$.
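The claim that $q_\tau(Y)$ minimizes $\theta \mapsto E(\rho_\tau(Y - \theta))$ can also be checked numerically. Below is a minimal Python sketch (the applied exercises use Stata, but Python is convenient for a quick simulation; the normal distribution, sample size, and grid bounds are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.75
y = rng.normal(size=100_000)  # Y ~ N(0, 1), so q_0.75(Y) is about 0.6745

def check_loss(u, tau):
    # rho_tau(u) = (tau - 1{u <= 0}) * u
    return (tau - (u <= 0)) * u

# Approximate theta -> E(rho_tau(Y - theta)) by a sample average on a grid
grid = np.linspace(-2, 2, 401)
objective = [check_loss(y - t, tau).mean() for t in grid]
theta_star = grid[np.argmin(objective)]

print(theta_star)           # close to 0.6745, the 0.75-quantile of N(0, 1)
print(np.quantile(y, tau))  # the sample quantile, also close to 0.6745
```

The grid minimizer of the empirical check-loss and the sample quantile coincide up to grid resolution and sampling error, as the m-estimand result predicts.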
Moreover, $E(\theta - Y \mid q_\tau(Y) \le Y < \theta)$ is the expected value of a random variable which is always strictly positive, therefore it is strictly positive. A similar reasoning applies for $\theta < q_\tau(Y)$.

* Exercise 3: the expected value of a random variable is the average of its quantiles

Let $Z$ be a random variable with a strictly increasing cdf $F_Z$ and with a strictly positive pdf $f_Z$. Let $\underline{z}$ and $\overline{z}$ respectively denote the lower and upper bounds of its support.

1. Show that $E(Z) = \int_0^1 F_Z^{-1}(\tau)\,d\tau$. Hint: you should use the substitution $z = F_Z^{-1}(\tau)$ in $\int_0^1 F_Z^{-1}(\tau)\,d\tau$ and the integration by substitution theorem (see the Wikipedia entry).

2. Infer from the previous question that $E(Y_1 - Y_0) = \int_0^1 F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)\,d\tau$: the average treatment effect is the average of the quantile treatment effects.

Solution

1. To prove the result, we are going to use the integration by substitution theorem (see the Wikipedia entry for more details). Let's do the substitution $z = F_Z^{-1}(\tau)$ in $\int_0^1 F_Z^{-1}(\tau)\,d\tau$, so that $dz = \dot{F}_Z^{-1}(\tau)\,d\tau$. Moreover,
\[
\dot{F}_Z^{-1}(\tau) = \frac{1}{\dot{F}_Z(F_Z^{-1}(\tau))} = \frac{1}{f_Z(F_Z^{-1}(\tau))}.
\]
This can be seen by differentiating $F_Z(F_Z^{-1}(\tau)) = \tau$ with respect to $\tau$. Therefore, $\dot{F}_Z^{-1}(\tau) = \frac{1}{f_Z(z)}$, which finally implies that $f_Z(z)\,dz = d\tau$. Moreover, $F_Z^{-1}(0) = \underline{z}$ and $F_Z^{-1}(1) = \overline{z}$. Therefore, it follows from the integration by substitution theorem that
\[
\int_0^1 F_Z^{-1}(\tau)\,d\tau = \int_{\underline{z}}^{\overline{z}} z f_Z(z)\,dz = E(Z).
\]

2. It follows from the previous question that
\[
E(Y_1 - Y_0) = E(Y_1) - E(Y_0) = \int_0^1 F_{Y_1}^{-1}(\tau)\,d\tau - \int_0^1 F_{Y_0}^{-1}(\tau)\,d\tau = \int_0^1 F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)\,d\tau.
\]

Exercise 4: using quantile regressions to measure changes in the US wage structure (based on Angrist et al. (2006))

Let $X$ be a $k \times 1$ vector of random variables. Let $q_\tau(Y|X=x)$ denote the quantile of order $\tau$ of the distribution of $Y|X=x$. $q_\tau(Y|X=x)$ is just a regular quantile, so it follows from the previous exercise that $q_\tau(Y|X=x) = \operatorname*{argmin}_{\theta \in \mathbb{R}} E(\rho_\tau(Y - \theta)|X=x)$. Now, let $q_\tau(Y|X)$ be a random variable equal to $q_\tau(Y|X=x)$ when $X=x$. $q_\tau(Y|X)$ is called the $\tau$th quantile of $Y$ conditional on $X$. We have
\[
q_\tau(Y|X) = \operatorname*{argmin}_{g(.)} E(\rho_\tau(Y - g(X))). \tag{0.0.1}
\]
Among all possible functions of $X$, $g(X)$, $q_\tau(Y|X)$ is the one which minimizes $E(\rho_\tau(Y - g(X)))$. Indeed, assume to simplify that $X$ is a discrete random variable taking $J$ values $x_1, ..., x_J$. Then, it follows from the law of iterated expectations that
\[
E(\rho_\tau(Y - g(X))) = E(E(\rho_\tau(Y - g(X))|X)) = \sum_{j=1}^{J} E(\rho_\tau(Y - g(x_j))|X = x_j) P(X = x_j).
\]
For each $j$, $E(\rho_\tau(Y - g(x_j))|X = x_j)$, viewed as a function of $g(x_j)$, is minimized at $g(x_j) = q_\tau(Y|X = x_j)$. Therefore $q_\tau(Y|X) = \sum_{j=1}^{J} 1\{X = x_j\} q_\tau(Y|X = x_j)$ is the function of $X$ which minimizes $E(\rho_\tau(Y - g(X)))$. Equation (0.0.1) motivates the definition of the quantile regression function $X'\beta_\tau$, with
\[
\beta_\tau = \operatorname*{argmin}_{b \in \mathbb{R}^k} E(\rho_\tau(Y - X'b)).
\]
Angrist et al. (2006) show that $X'\beta_\tau$ is a weighted MMSE approximation of $q_\tau(Y|X)$ within the set of all linear functions of $X$. When $q_\tau(Y|X)$ is indeed a linear function of $X$, $X'\beta_\tau$ is equal to it. A natural estimator of $\beta_\tau$ is $\widehat{\beta}_\tau = \operatorname*{argmin}_{b} \frac{1}{n}\sum_{i=1}^{n} \rho_\tau(Y_i - X_i'b)$.

The files census80.dta, census90.dta, and census00.dta contain representative samples of US-born black and white men aged 40-49 with at least five years of education, with positive annual earnings and hours worked in the year preceding the census, respectively from the 1980, 1990, and 2000 censuses. The goal of the exercise is to estimate, for each census wave, returns to years of education on different quantiles of the distribution of wages, controlling for age, experience, and race. Here, returns should be understood as a correlational concept, not as a causal concept. Coefficients of education in the regressions you will run should be interpreted as follows: an increase of one year of education is associated to an increase of the $\tau$th quantile of wages by XXXX, controlling for YYYY.
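The estimator $\widehat{\beta}_\tau$ defined above can be illustrated outside Stata by directly minimizing the empirical check-loss. A hedged Python sketch on simulated data (the data-generating process is invented, and the crude grid search stands in for the linear-programming algorithm that `qreg` actually uses):

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 20_000, 0.5

# Simulated data: Y = 1 + 2*X + error, with median(error) = 0,
# so the median-regression intercept and slope should be near (1, 2).
x = rng.uniform(0, 1, n)
y = 1 + 2 * x + rng.standard_t(df=3, size=n)

def avg_check_loss(a, b):
    u = y - (a + b * x)
    return ((tau - (u <= 0)) * u).mean()

# Crude grid search over (intercept, slope) for illustration only
intercepts = np.linspace(0, 2, 61)
slopes = np.linspace(1, 3, 61)
losses = np.array([[avg_check_loss(a, b) for b in slopes] for a in intercepts])
i, j = np.unravel_index(losses.argmin(), losses.shape)
print(intercepts[i], slopes[j])  # roughly (1, 2)
```

Re-running with `tau = 0.1` or `tau = 0.9` shifts the fitted intercept but, with this symmetric-in-slope design, leaves the slope near 2, mirroring how the quantile regressions in this exercise trace out different conditional quantiles.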
Even though these parameters are not causal ones, they are still very interesting: comparing the structure of these returns across census waves will teach us something on the evolution of the structure of wages in the US over these 20 years.

Each of these data sets contains the following variables: age (age in years of the individual), educ (number of years of education), logwk (log wage), exper (number of years of professional experience), exper2 (number of years of professional experience squared), and black (a dummy equal to 1 if someone is black).

1. Open census80.dta. Using the qreg command, run quantile regressions of order 10, 25, 50, 75, and 90 of logwk on educ, age, exper, exper2, and black.

2. Interpret the coefficient of educ in each regression, following the guideline I gave you in the introduction of the exercise. Remember that wage is in logarithm, and that you control for age, exper, exper2, and black in your regression.

3. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 1980, did years of education have very different returns across different quantiles of the distribution of wages?

4. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education, holding the other variables constant. Your graph should consist in 5 lines: the line mapping education into education$\,\times\widehat{\beta}_{0.1}^{educ}$, the line mapping education into education$\,\times\widehat{\beta}_{0.25}^{educ}$, etc., where $\widehat{\beta}_{0.1}^{educ}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}_{0.25}^{educ}$ denotes the coefficient of education in the quantile regression of order 25, etc.
The lines need not correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not, which ones are more increasing / decreasing than the others.

5. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?

6. Open census90.dta. Using the qreg command, run quantile regressions of order 10, 25, 50, 75, and 90 of logwk on educ, age, exper, exper2, and black.

7. Interpret the coefficient of educ in each regression.

8. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 1990, did years of education have very different returns across different quantiles of the distribution of wages? How do the coefficients of the 1990 quantile regressions compare to those of the 1980 regressions?

9. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education, holding the other variables constant. Your graph should consist in 5 lines: the line mapping education into education$\,\times\widehat{\beta}_{0.1}^{educ}$, the line mapping education into education$\,\times\widehat{\beta}_{0.25}^{educ}$, etc., where $\widehat{\beta}_{0.1}^{educ}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}_{0.25}^{educ}$ denotes the coefficient of education in the quantile regression of order 25, etc.
The lines need not correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not, which ones are more increasing / decreasing than the others.

10. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?

11. Open census00.dta. Using the qreg command, run quantile regressions of order 10, 25, 50, 75, and 90 of logwk on educ, age, exper, exper2, and black.

12. Interpret the coefficient of educ in each regression.

13. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 2000, did years of education have very different returns across different quantiles of the distribution of wages?

14. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education, holding the other variables constant. Your graph should consist in 5 lines: the line mapping education into education$\,\times\widehat{\beta}_{0.1}^{educ}$, the line mapping education into education$\,\times\widehat{\beta}_{0.25}^{educ}$, etc., where $\widehat{\beta}_{0.1}^{educ}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}_{0.25}^{educ}$ denotes the coefficient of education in the quantile regression of order 25, etc. The lines need not correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not, which ones are more increasing / decreasing than the others.

15. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?

16. Compare the three graphs you drew, to summarize the results from the quantile regressions in 1980, 1990, and 2000. How did the returns to education evolve over these 20 years? What have been the main changes in the structure of US wages over that period?

17. Find another potentially interesting set of quantile regressions you could run, run them, interpret their results, and send me a 10-line email explaining the regressions you ran and your results.

Solution

1. See code.

2. $\widehat{\beta}_{0.1}^{educ} = 0.073$. As wages are in logarithm, and the regression controls for age, experience, experience squared, and race, the regression coefficient should be interpreted as follows: in 1980, one more year of education was associated with a 7.3% increase of the 10th quantile of wages, holding age, experience, experience squared, and race constant. All the other coefficients can be interpreted similarly.

3. The confidence interval of $\beta_{0.1}^{educ}$ overlaps with those of $\beta_{0.25}^{educ}$, $\beta_{0.75}^{educ}$, and $\beta_{0.90}^{educ}$, implying that these coefficients might not be significantly different (checking that confidence intervals overlap is not, strictly speaking, a test of whether two parameters are significantly different, but this is a good indication). Similarly, the confidence intervals of $\beta_{0.25}^{educ}$, $\beta_{0.50}^{educ}$, $\beta_{0.75}^{educ}$, and $\beta_{0.90}^{educ}$ most often overlap. In any event these coefficients are not economically very different: they are all included between 0.068 and 0.079, implying at most a one percentage point difference in returns to education across those quantiles of wages.

4. Education has fairly similar positive returns across all quantiles. Therefore, the 5 lines should all have the same positive slope. You should therefore draw a graph with 5 parallel and increasing lines.

5. $\widehat{\beta}_{OLS}^{educ} = 0.072$: in 1980, one more year of education was associated with a 7.2% increase of the mean of wages, holding age, experience, experience squared, and race constant. This coefficient is not very surprising following the result of Exercise 3: education is associated with a 7% increase of all quantiles of the distribution of wages, and the effect of education on the average is the average of its effects on all quantiles, so its effect on the average of wages is also to increase it by around 7% per year of education.

6. See code.

7. The interpretation of the coefficients is the same as in question 2.

8. The estimates of $\beta_{0.1}^{educ}$, $\beta_{0.25}^{educ}$, $\beta_{0.50}^{educ}$, and $\beta_{0.75}^{educ}$ are not economically very different (they are all included between 0.106 and 0.111), and their confidence intervals most often overlap. On the other hand, the estimate of $\beta_{0.90}^{educ}$ is substantially larger, and its confidence interval does not overlap with that of any of the other coefficients. This implies that a year of education is associated with a larger increase of the 90th percentile of wages than of the other percentiles. Also, all the coefficients are larger than in 1980, implying that the returns to years of education on the labor market have increased over the period.

9. Education has fairly similar positive returns across the first four quantiles, but has higher returns on the last one. Therefore, your graph should bear 4 parallel and increasing lines, and a 5th, more steeply increasing, upper line.

10. $\widehat{\beta}_{OLS}^{educ} = 0.113$: in 1990, one more year of education was associated with a 11.3% increase of the mean of wages, holding age, experience, experience squared, and race constant. This coefficient is similar to the one we derived in the first four quantile regressions, even though it is slightly higher.
This coefficient is not very surprising following the result of Exercise 3: education is associated with a 11% increase of all quantiles of the distribution of wages except for the highest quantiles, where its effect is larger; the effect of education on the average is the average of its effects on all quantiles; so its effect on the average of wages is slightly greater than 11%.

11. See code.

12. The interpretation of the coefficients is the same as in question 2.

13. The estimates of $\beta_{0.1}^{educ}$, $\beta_{0.25}^{educ}$, $\beta_{0.50}^{educ}$, $\beta_{0.75}^{educ}$, and $\beta_{0.90}^{educ}$ are all substantially different, and their confidence intervals never overlap. Returns to education consistently increase for larger quantiles.

14. Education has positive returns on each quantile, but it has larger returns for larger quantiles. Therefore, your graph should bear 5 non-parallel increasing lines, with increasing slopes from the bottom line to the top line.

15. $\widehat{\beta}_{OLS}^{educ} = 0.115$: in 2000, one more year of education was associated with a 11.5% increase of the mean of wages, holding age, experience, experience squared, and race constant. This coefficient is similar to the one we derived in the third quantile regression (that of the median). It is substantially larger than $\widehat{\beta}_{0.1}^{educ}$ and $\widehat{\beta}_{0.25}^{educ}$, and substantially smaller than $\widehat{\beta}_{0.75}^{educ}$ and $\widehat{\beta}_{0.90}^{educ}$. Once again, this reflects the fact that the effect of education on the average wage is the average of its effects on all quantiles.

16. The effect of education on average wages was multiplied by more than 1.5 over the period. This mostly comes from the top of the distribution: the effect of education on the 90th percentile of wages was multiplied by more than 2. When you compare your three graphs, they should show you that the variance of wages within groups of people with the same level of education went from being constant across educational levels to being an increasing function of education. In 2000, education is much more rewarded on the labor market than it was in 1980.
On the other hand, within highly educated people, the variance of wages is much larger in 2000 than it was in 1980, implying that other factors now play a greater role in the determination of wages.

Exercise 5: using quantile regressions to measure heterogeneous returns to a boarding school for disadvantaged students (based on Behaghel et al. (2014))

Behaghel et al. (2014) ran an experiment in which applicants to a boarding school for disadvantaged students were randomly admitted to the school. The boardingschool.dta data set contains three variables. mathsscore is students' maths score two years after the lottery which determined who was or was not admitted to the school. This measure is standardized with respect to the mean and standard deviation of the raw score in the control group. This is the outcome variable in this exercise, so we denote it $Y$. baselinemathsscore is a measure of students' baseline maths ability, before the lottery took place. We will use it as a control variable, so we denote it $X$. boarding is a dummy equal to 1 if a student won the lottery and gained admission to the boarding school. This is the treatment variable, so it is denoted $D$.

1. Run a very simple regression to show that students who won the lottery had very similar baseline maths scores to those who lost the lottery. If you were to compare the treatment and the control groups not on one, but on 20 baseline characteristics, how many significant differences at the 5% level should you expect to find?

2. Run a very simple regression to measure the effect of the boarding school on students' average maths test scores two years after the lottery. Interpret the coefficient from this regression (do not forget that test scores are standardized with respect to the mean and standard deviation of the raw score in the control group).

3. Run a slightly more complicated regression to increase the statistical precision of your coefficient of interest.
Does this coefficient change a lot with respect to the previous regression? Was this expected? Does the standard error of your main coefficient of interest change a lot with respect to the previous regression? Was this expected?

4. We have
\[
q_\tau(Y|D) = 1\{D=0\} q_\tau(Y|D=0) + 1\{D=1\} q_\tau(Y|D=1) = q_\tau(Y|D=0) + D\left(q_\tau(Y|D=1) - q_\tau(Y|D=0)\right).
\]
This shows that $q_\tau(Y|D)$ is a linear function of $(1, D)'$. Therefore, a quantile regression of $Y$ on a constant and $D$ will estimate this conditional quantile function: the coefficient of the constant will be an estimate of $q_\tau(Y|D=0)$, while the coefficient of $D$ will be an estimate of $q_\tau(Y|D=1) - q_\tau(Y|D=0)$.

(a) Show that $\widehat{\beta}_\tau$ is an estimate of $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$, the quantile treatment effect of order $\tau$.

(b) Plot a graph with $\tau$ on the x-axis, and $\widehat{\beta}_\tau$ and the upper and lower bounds of its 90% confidence interval on the y-axis, for $\tau$ ranging from 0.03 to 0.97. For that purpose, you need to: 1. run a quantile regression of order $\tau$ of $Y$ on $D$ using the qreg command in Stata, for every $\tau$ ranging from 0.03 to 0.97; 2. store $\tau$, $\widehat{\beta}_\tau$, $\widehat{\beta}_\tau - 1.64 \times se(\widehat{\beta}_\tau)$, and $\widehat{\beta}_\tau + 1.64 \times se(\widehat{\beta}_\tau)$ in a matrix; 3. clear the data; 4. form a data set containing the values you stored in the matrix, using the svmat command; 5. plot the requested graph.

(c) Does the boarding school increase more the lowest quantiles of the distribution of scores, those close to the median, or the highest ones?

(d) Does the graph confirm the result in Exercise 3?

5. The rank invariance assumption states that the treatment does not change the rank of an individual in the distribution of the outcome: the individual at the 90th percentile of the distribution of $Y_0$ will also be at the 90th percentile of the distribution of $Y_1$. Under this assumption, quantile treatment effects can be interpreted as individual treatment effects.
$F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ compares the $\tau$th quantiles of the distributions of $Y_1$ and $Y_0$. Under the rank invariance assumption, it is the same individual who is at the $\tau$th quantile of these two distributions, so $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is a measure of the effect of the treatment on her. The rank invariance assumption is not a credible assumption, but it is useful to assess treatment effect heterogeneity. Indeed, any deviation from rank invariance implies more treatment effect heterogeneity than that obtained under rank invariance. One can formally show that $V(Y_1 - Y_0) \ge V^{RI}(Y_1 - Y_0)$: the true variance of the treatment effect, which we cannot measure because we do not observe $Y_1 - Y_0$, is at least as high as $V^{RI}(Y_1 - Y_0)$, the variance of the treatment effect under rank invariance. That's an interesting result, because $V^{RI}(Y_1 - Y_0)$ is something we can estimate. In the simple case where the sample bears as many treated as untreated observations ($n_0 = n_1$), under rank invariance the $Y_0$ of the observation with the $i$th largest value of $Y_1$, $Y_{(i)1}$, is merely $Y_{(i)0}$, the $i$th largest value of $Y_0$. Therefore, one can use
\[
\frac{1}{n_0}\sum_{i=1}^{n_0}\left(Y_{(i)1} - Y_{(i)0}\right)^2 - \left(\frac{1}{n_0}\sum_{i=1}^{n_0}\left(Y_{(i)1} - Y_{(i)0}\right)\right)^2
\]
to estimate $V^{RI}(Y_1 - Y_0)$.

We will not prove the result that any deviation from rank invariance implies more treatment effect heterogeneity than that obtained under rank invariance, but let me give you a little bit of intuition. Assume that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is increasing in $\tau$. Rank invariance already implies treatment effect heterogeneity: under rank invariance, $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ increasing in $\tau$ implies that people at the upper quantiles of the distribution of the outcome have larger effects than those at the lower quantiles. Now, any deviation from rank invariance implies even more treatment effect heterogeneity. Under rank invariance, $F_{Y_1}^{-1}(1) - F_{Y_0}^{-1}(1)$ is the value of the treatment effect for the person with the highest value of $Y_1$.
If rank invariance is violated, the counterfactual value of the outcome for that person under non-treatment is no longer $F_{Y_0}^{-1}(1)$, but $F_{Y_0}^{-1}(\tau)$, for some $\tau < 1$: if rank invariance is violated, this person, instead of having the highest $Y_0$, must have a lower rank in the distribution of $Y_0$. This implies that the true treatment effect for that person, $F_{Y_1}^{-1}(1) - F_{Y_0}^{-1}(\tau)$, is even larger than what we had concluded under rank invariance. Similarly, if the person with the lowest value of $Y_1$ actually did not have the lowest value of $Y_0$, then the true treatment effect for that person is even lower than what we had concluded under rank invariance. In this example, deviations from rank invariance mean that people at upper quantiles have even larger effects than what we had concluded under rank invariance, while people at lower quantiles have even lower effects, implying that overall, treatment effects are even more heterogeneous than what we had concluded under rank invariance.

After this long introduction, consider again the graph you obtained in question 4.b).

(a) Look at whether confidence intervals of $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ for different values of $\tau$ cross or not. At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is constant for every $\tau$? At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ takes only two different values? At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ takes only three different values?

(b) Under the rank invariance assumption, what is the minimum number of different values that $Y_1 - Y_0$ must take? Are these values very different from each other? Is the effect of the boarding school on students' test scores heterogeneous?

6. Given the shape of the quantile treatment effect function $\tau \mapsto F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$, is $V(Y_1)$ likely to be greater or smaller than $V(Y_0)$? How could you estimate $V(Y_1)$ and $V(Y_0)$? Which results could you use to derive confidence intervals for these two quantities?

7. This policy is expensive: it multiplies by two the expenditure per year per student. However, the gain on average maths scores it produces is roughly the same as the one produced by dividing class size by two, which also multiplies by two the expenditure per year per student. The literature has found that class size reductions have larger effects for the weakest students. Based on the results of the previous questions, find one reason why you might prefer class size reduction over the boarding school policy if you were a social planner. Send me a 5-line email with your answer.

Solution

1. You should regress $X$ on $D$. The coefficient is very small and insignificant. The lottery created almost perfectly comparable treatment and control groups from the perspective of their baseline maths score. If you were to run 20 regressions of baseline characteristics (gender, parents' education...) on $D$, you should get (on average) one significant coefficient. The coefficient of these regressions is an estimate of $E(X|D=1) - E(X|D=0)$. By virtue of the randomization, and because baseline characteristics were fixed before the lottery, $E(X|D=1) - E(X|D=0) = 0$, so $H_0: \beta = 0$ is satisfied. Still, we know that when $H_0$ is satisfied, a test at the 5% level of $H_0$, based on whether $\widehat{\beta}$ is significantly different from 0, has a 5% probability of wrongly rejecting $H_0$. So out of 20 tests, you will on average wrongly reject one.

2. You should regress $Y$ on $D$. The coefficient is 0.295. As the data comes from a randomized experiment, this coefficient can receive a causal interpretation: spending two years in the boarding school increases students' test scores by 29.5% of a standard deviation.

3. You can regress $Y$ on $D$ and $X$. The coefficient of $D$ is almost the same, 0.289. This is what we expected, following the second omitted variable formula: as $D$ was randomly allocated, it must be independent of $X$, a baseline characteristic which was fixed before the lottery, therefore $\mu = 0$ in this formula. The standard error of this coefficient is 15% lower. This is what we expected, following Theorem 5.2.2 in the notes: as $R^2_{D,X} = 0$,
\[
\frac{1 - R^2_{Y,(D,X)}}{1 - R^2_{Y,D}} = \frac{1 - 0.28}{1 - 0.01} = 0.73, \quad \text{and } \sqrt{0.73} = 0.85.
\]
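The logic of question 3, that adding a pre-determined covariate independent of a randomized treatment leaves the coefficient of $D$ roughly unchanged while shrinking its standard error, can be checked by simulation. A hedged Python sketch with invented numbers (not the boardingschool.dta data; the effect sizes and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
d = rng.integers(0, 2, n)   # randomized treatment, independent of x
x = rng.normal(size=n)      # baseline covariate fixed before the "lottery"
y = 0.3 * d + 0.8 * x + rng.normal(size=n)

def ols(y, regressors):
    # OLS coefficients and homoskedastic standard errors
    X = np.column_stack([np.ones(len(y))] + regressors)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

beta_short, se_short = ols(y, [d])      # Y on D only
beta_long, se_long = ols(y, [d, x])     # Y on D and X

print(beta_short[1], beta_long[1])  # both close to the true effect 0.3
print(se_short[1], se_long[1])      # second SE noticeably smaller
```

Because $X$ explains part of the residual variance of $Y$ but is uncorrelated with $D$, the long regression mechanically delivers the same coefficient in expectation with a smaller standard error, exactly the pattern found in the data.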
The standard error of this coefficient is 15% lower. This is what we expected, following Theorem 5.2.2 in the notes: $R^2_{D,X} = 0$, and
$$\frac{1 - R^2_{Y,(D,X)}}{1 - R^2_{Y,D}} = \frac{1 - 0.28}{1 - 0.01} = 0.73,$$
and $\sqrt{0.73} = 0.85$.

4. (a) $\widehat{\beta}_\tau$ is an estimate of $q_\tau(Y|D=1) - q_\tau(Y|D=0)$.
$$q_\tau(Y|D=1) - q_\tau(Y|D=0) = F_{Y|D=1}^{-1}(\tau) - F_{Y|D=0}^{-1}(\tau) = F_{Y_1|D=1}^{-1}(\tau) - F_{Y_0|D=0}^{-1}(\tau) = F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau).$$

(b) See code. The resulting graph is shown below.

Figure 1: QTE of the boarding school

(c) The boarding school has a negative effect on the first quintile of the distribution of test scores, even though few of these estimates are statistically significant. It has a moderately positive effect on the second, third, and fourth quintiles of test scores (around 25-30% of a standard deviation); a fairly large number of these QTEs are significantly different from 0 at the 10% level. Finally, the boarding school has a very large and significant effect on the top quintile of the distribution of test scores (around 0.9 standard deviations).

(d) Yes, it does. We saw in question 1 that the boarding school increases average test scores by 0.29 sd. Therefore, $\widehat{E}(Y_1 - Y_0) = 0.29$. Estimated QTEs are roughly equal to $-0.2$ over the first quintile of the distribution, to $0.25$ over the next three quintiles, and to $0.8$ over the last quintile. $-0.2 \times 0.2 + 0.25 \times 0.6 + 0.8 \times 0.2 \approx 0.29$.

5. (a) Not all confidence intervals overlap. For instance, the confidence intervals of the QTEs above $\tau = 0.8$ do not overlap with those of the QTEs below $\tau = 0.15$ (up to some exceptions). Similarly, the confidence intervals of $F_{Y_1}^{-1}(0.42) - F_{Y_0}^{-1}(0.42)$, $F_{Y_1}^{-1}(0.62) - F_{Y_0}^{-1}(0.62)$, and $F_{Y_1}^{-1}(0.75) - F_{Y_0}^{-1}(0.75)$ up to $F_{Y_1}^{-1}(0.79) - F_{Y_0}^{-1}(0.79)$ do not overlap with that of $F_{Y_1}^{-1}(0.05) - F_{Y_0}^{-1}(0.05)$. Finally, the confidence interval of $F_{Y_1}^{-1}(0.95) - F_{Y_0}^{-1}(0.95)$ does not overlap with that of any QTE below 0.6. We can therefore reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is a constant function of $\tau$.
It must take at least three different values (for instance, $F_{Y_1}^{-1}(0.95) - F_{Y_0}^{-1}(0.95)$ is significantly different from $F_{Y_1}^{-1}(0.42) - F_{Y_0}^{-1}(0.42)$, which is itself significantly different from $F_{Y_1}^{-1}(0.05) - F_{Y_0}^{-1}(0.05)$).

(b) Under rank invariance, the $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ are individual-level values of $Y_1 - Y_0$. Following the results in the previous question, under rank invariance $Y_1 - Y_0$ must take at least three values: a negative value for people in the lowest quintile of the distribution, a value around 25-30% of a standard deviation for people in the 2nd, 3rd, and 4th quintiles, and a value around 90% of a standard deviation for people in the highest quintile. Even under rank invariance, the effect of the treatment already seems to be highly heterogeneous across students. As any deviation from rank invariance will increase this heterogeneity even more, we can assert that the effect of the school is highly heterogeneous across students.

6. The treatment increases the largest values of the outcome, and diminishes the smallest ones. Therefore, it probably increased the variance of test scores. To check this, you could use $\overline{Y^2}|_{D=1} - \left(\overline{Y}|_{D=1}\right)^2$ to estimate $V(Y_1)$, and $\overline{Y^2}|_{D=0} - \left(\overline{Y}|_{D=0}\right)^2$ to estimate $V(Y_0)$. You could rely on results from the first exercise to derive confidence intervals for these two quantities, or you could use the bootstrap.

7. It follows from the quantile treatment effects analysis that the boarding school increases the highest quantiles and decreases the lowest ones. On the contrary, class size reductions benefit weak students more, so they probably increase the lowest quantiles of the distribution more than the highest ones. A purely utilitarian planner would be indifferent between the two policies, as they have the same cost and the same average treatment effect. However, a planner with a welfare function putting more weight on the lower quantiles of the distribution than on the upper quantiles will prefer class size reduction.
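The variance comparison in point 6 can be illustrated with a short simulation. The boarding-school data is not reproduced with these solutions, so the sketch below uses hypothetical simulated data in which treatment both shifts and spreads the outcome (the means, standard deviations, and variable names are mine, not the paper's); the plug-in variance estimator is the one from Exercise 1, and the confidence intervals use the bootstrap mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the boarding-school data: the treated outcome
# distribution is both shifted up and more spread out, as the QTE analysis
# suggests.
n = 5000
d = rng.integers(0, 2, size=n)  # lottery dummy
y = np.where(d == 1, rng.normal(0.3, 1.3, n), rng.normal(0.0, 1.0, n))

def plug_in_variance(y):
    """V_hat(Y) = mean(Y^2) - mean(Y)^2, the estimator from Exercise 1."""
    return np.mean(y**2) - np.mean(y)**2

v1_hat = plug_in_variance(y[d == 1])  # estimate of V(Y1)
v0_hat = plug_in_variance(y[d == 0])  # estimate of V(Y0)

def bootstrap_ci(y, n_boot=999, alpha=0.05):
    """Percentile-bootstrap CI for the plug-in variance, resampling within arm."""
    stats = [plug_in_variance(rng.choice(y, size=len(y), replace=True))
             for _ in range(n_boot)]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

print("V(Y1) estimate:", v1_hat, "95% CI:", bootstrap_ci(y[d == 1]))
print("V(Y0) estimate:", v0_hat, "95% CI:", bootstrap_ci(y[d == 0]))
```

With the simulated spread chosen here, the estimate of $V(Y_1)$ comes out clearly above that of $V(Y_0)$; on the real data one would instead compare the two bootstrap intervals, or use the delta-method intervals from Exercise 1.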
* Exercise 6: Proving that Theorem 6.4.4 applies to the sample median

Let $Y$ be a continuous random variable with strictly increasing and continuous cdf $F_Y$ and strictly positive and continuous pdf $f_Y$. Let $\underline{y}$ and $\overline{y}$ be the lower and upper bounds of the support of $Y$. The goal of this exercise is to show that the median of $Y$ is an m-estimand satisfying Assumption 9 in the notes, and to infer from this its asymptotic behavior using Theorem 6.4.4. It follows from Example 2 in the notes that $me(Y) = \operatorname{argmin}_{\theta \in \mathbb{R}} E(|Y - \theta|)$. Therefore, $me(Y) = \operatorname{argmin}_{\theta \in \mathbb{R}} E(|Y - \theta| - |Y|)$. So $me(Y)$ can be seen as an m-estimand with $m(y,\theta) = |y - \theta| - |y|$.

1. Show that $\theta \mapsto m(y,\theta)$ is differentiable in $\theta$ for every $\theta$ different from $y$. Give an expression of its derivative. Hint: you should start by considering a value of $\theta \ge 0$; then you should give explicit expressions of $m(y,\theta)$ depending on the value of $y$ (there are three cases you must distinguish); then you should differentiate these expressions with respect to $\theta$. Then you should do the same thing for $\theta < 0$.

2. Infer from the previous question that $m(y,\theta)$ satisfies the first requirement of Assumption 9: $\theta \mapsto m(y,\theta)$ is differentiable at $\theta_0 = me(Y)$ for almost every $y \in \mathbb{R}$.

3. Show that $m(y,\theta)$ satisfies the second requirement of Assumption 9: for every $(\theta_1, \theta_2)$, $|m(y,\theta_1) - m(y,\theta_2)| \le |\theta_1 - \theta_2|$.

4. Let $M(\theta) = E(m(Y,\theta)) = E(|Y - \theta| - |Y|)$.

(a) Show that for any $\theta \ge 0$,
$$E(|Y - \theta| - |Y|) = \int_{\underline{y}}^{0} \theta f_Y(y)dy + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \int_{\theta}^{\overline{y}} \theta f_Y(y)dy.$$

(b) Infer from the previous question that
$$E(|Y - \theta| - |Y|) = \theta F_Y(0) + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \theta(1 - F_Y(\theta)).$$

(c) Infer from the previous question that
$$E(|Y - \theta| - |Y|) = \int_{0}^{\theta} -2y f_Y(y)dy - \theta(1 - 2F_Y(\theta)).$$

(d) Infer from the previous question and from an integration by parts (see the Wikipedia entry if needed) that
$$E(|Y - \theta| - |Y|) = 2\int_{0}^{\theta} F_Y(y)dy - \theta.$$

(e) Conclude that for any $\theta \ge 0$, $M(\theta)$ is twice continuously differentiable.
Give $\dot{M}(\theta)$ and $\ddot{M}(\theta)$.

(f) Follow the same steps as in the 5 preceding subquestions to show that for any $\theta < 0$,
$$E(|Y - \theta| - |Y|) = -2\int_{\theta}^{0} F_Y(y)dy - \theta.$$

(g) Conclude that for any $\theta < 0$, $M(\theta)$ is twice continuously differentiable. Give $\dot{M}(\theta)$ and $\ddot{M}(\theta)$.

(h) Check that $\dot{M}(\theta)$ and $\ddot{M}(\theta)$ are continuous at 0. Conclude that $M(\theta)$ is twice continuously differentiable for any $\theta \in \mathbb{R}$.

(i) Check that $\ddot{M}(\theta)$ is invertible for any $\theta$. Give $\ddot{M}(\theta_0)^{-1}$.

5. Conclude from the preceding questions that $me(Y)$ is an m-estimand satisfying Assumption 9. Apply Theorem 6.4.4 to deduce from this that $\widehat{me}(Y)$ is an asymptotically normal estimator of $me(Y)$. Give an expression of its asymptotic variance.

Solution

1. Assume $\theta \ge 0$. If $y \ge \theta$, $m(y,\theta) = y - \theta - y = -\theta$. If $0 \le y < \theta$, $m(y,\theta) = \theta - 2y$. If $y < 0$, $m(y,\theta) = \theta$. Therefore, for $\theta \ge 0$,
$$m(y,\theta) = \begin{cases} -\theta & \text{if } y \ge \theta \\ \theta - 2y & \text{if } 0 \le y < \theta \\ \theta & \text{if } y < 0. \end{cases}$$
Therefore $\theta \mapsto m(y,\theta)$ is differentiable for every $\theta \ne y$, with
$$\dot{m}(y,\theta) = \begin{cases} -1 & \text{if } \theta < y \\ 1 & \text{if } \theta > y. \end{cases}$$
Similarly, if $\theta < 0$,
$$m(y,\theta) = \begin{cases} -\theta & \text{if } y \ge 0 \\ 2y - \theta & \text{if } \theta \le y < 0 \\ \theta & \text{if } y < \theta, \end{cases}$$
so $\theta \mapsto m(y,\theta)$ is differentiable for every $\theta \ne y$, with
$$\dot{m}(y,\theta) = \begin{cases} -1 & \text{if } \theta < y \\ 1 & \text{if } \theta > y. \end{cases}$$
Therefore, irrespective of whether $\theta \ge 0$ or $\theta < 0$, we always have that $\theta \mapsto m(y,\theta)$ is differentiable for every $\theta \ne y$, with
$$\dot{m}(y,\theta) = \begin{cases} -1 & \text{if } \theta < y \\ 1 & \text{if } \theta > y. \end{cases}$$

2. A consequence of the previous question is that $\theta \mapsto m(y,\theta)$ is differentiable at $\theta_0 = me(Y)$ for almost every $y \in \mathbb{R}$ (the only $y$ for which $m(y,\theta)$ is not differentiable at $\theta_0$ is $y = \theta_0$). $m(y,\theta)$ therefore satisfies the first requirement of Assumption 9.

3. $|m(y,\theta_1) - m(y,\theta_2)| = \big||y - \theta_1| - |y - \theta_2|\big| \le |y - \theta_1 - (y - \theta_2)| = |\theta_1 - \theta_2|$. The inequality follows from the triangle inequality.

4. (a)
$$E(|Y - \theta| - |Y|) = \int_{\underline{y}}^{\overline{y}} (|y - \theta| - |y|) f_Y(y)dy = \int_{\underline{y}}^{0} (|y - \theta| - |y|) f_Y(y)dy + \int_{0}^{\theta} (|y - \theta| - |y|) f_Y(y)dy + \int_{\theta}^{\overline{y}} (|y - \theta| - |y|) f_Y(y)dy$$
$$= \int_{\underline{y}}^{0} \theta f_Y(y)dy + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \int_{\theta}^{\overline{y}} \theta f_Y(y)dy.$$
(b) Therefore,
$$E(|Y - \theta| - |Y|) = \theta \int_{\underline{y}}^{0} f_Y(y)dy + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \theta \int_{\theta}^{\overline{y}} f_Y(y)dy = \theta F_Y(0) + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \theta(1 - F_Y(\theta)).$$

(c) Therefore,
$$E(|Y - \theta| - |Y|) = \theta F_Y(0) + \theta \int_{0}^{\theta} f_Y(y)dy + \int_{0}^{\theta} -2y f_Y(y)dy - \theta(1 - F_Y(\theta)) = \theta F_Y(0) + \theta(F_Y(\theta) - F_Y(0)) + \int_{0}^{\theta} -2y f_Y(y)dy - \theta(1 - F_Y(\theta)) = \int_{0}^{\theta} -2y f_Y(y)dy - \theta(1 - 2F_Y(\theta)).$$

(d) Therefore, it follows from an integration by parts, differentiating $-2y$ and integrating $f_Y(y)$, that
$$E(|Y - \theta| - |Y|) = [-2yF_Y(y)]_{0}^{\theta} - \int_{0}^{\theta} -2F_Y(y)dy - \theta(1 - 2F_Y(\theta)) = -2\theta F_Y(\theta) + 2\int_{0}^{\theta} F_Y(y)dy - \theta(1 - 2F_Y(\theta)) = 2\int_{0}^{\theta} F_Y(y)dy - \theta.$$

(e) This shows that for any $\theta \ge 0$, $M(\theta)$ is twice continuously differentiable, with $\dot{M}(\theta) = 2F_Y(\theta) - 1$ and $\ddot{M}(\theta) = 2f_Y(\theta)$.

(f) For any $\theta < 0$,
$$E(|Y - \theta| - |Y|) = \int_{\underline{y}}^{\overline{y}} (|y - \theta| - |y|) f_Y(y)dy = \int_{\underline{y}}^{\theta} (|y - \theta| - |y|) f_Y(y)dy + \int_{\theta}^{0} (|y - \theta| - |y|) f_Y(y)dy + \int_{0}^{\overline{y}} (|y - \theta| - |y|) f_Y(y)dy$$
$$= \int_{\underline{y}}^{\theta} \theta f_Y(y)dy + \int_{\theta}^{0} (2y - \theta) f_Y(y)dy + \int_{0}^{\overline{y}} -\theta f_Y(y)dy = \theta F_Y(\theta) - \theta \int_{\theta}^{0} f_Y(y)dy + \int_{\theta}^{0} 2y f_Y(y)dy - \theta(1 - F_Y(0))$$
$$= \theta F_Y(\theta) - \theta(F_Y(0) - F_Y(\theta)) + \int_{\theta}^{0} 2y f_Y(y)dy - \theta(1 - F_Y(0)) = \int_{\theta}^{0} 2y f_Y(y)dy - \theta(1 - 2F_Y(\theta))$$
$$= [2yF_Y(y)]_{\theta}^{0} - \int_{\theta}^{0} 2F_Y(y)dy - \theta(1 - 2F_Y(\theta)) = -2\theta F_Y(\theta) - 2\int_{\theta}^{0} F_Y(y)dy - \theta(1 - 2F_Y(\theta)) = -2\int_{\theta}^{0} F_Y(y)dy - \theta.$$

(g) This shows that for any $\theta < 0$, $M(\theta)$ is twice continuously differentiable, with $\dot{M}(\theta) = 2F_Y(\theta) - 1$ and $\ddot{M}(\theta) = 2f_Y(\theta)$.

(h) It follows from the previous questions and from the fact that $F_Y$ is continuous that the limit of $\dot{M}(\theta)$ when $\theta$ goes to 0 from the left is $2F_Y(0) - 1 = \dot{M}(0)$. Similarly, as $f_Y$ is continuous, the limit of $\ddot{M}(\theta)$ when $\theta$ goes to 0 from the left is $2f_Y(0) = \ddot{M}(0)$. Therefore, $M(\theta)$ is twice continuously differentiable for any $\theta \in \mathbb{R}$.

(i) $\ddot{M}(\theta) = 2f_Y(\theta)$ for any $\theta$. This number is strictly positive by assumption, so it is invertible, and $\ddot{M}(\theta_0)^{-1} = \frac{1}{2f_Y(\theta_0)}$, with $\theta_0 = me(Y)$.
5. It follows from the preceding questions that $me(Y)$ is an m-estimand satisfying Assumption 9. Then it follows from Theorem 6.4.4 that $\widehat{me}(Y)$ is an asymptotically normal estimator of $me(Y)$. Its asymptotic variance is
$$\ddot{M}(\theta_0)^{-1} E\left(\dot{m}(Y,\theta_0)\dot{m}(Y,\theta_0)'\right) \ddot{M}(\theta_0)^{-1} = \frac{1}{2f_Y(me(Y))} E\left(\left(1\{Y < me(Y)\} - 1\{Y > me(Y)\}\right)^2\right) \frac{1}{2f_Y(me(Y))} = \frac{1}{4f_Y(me(Y))^2}.$$

* Exercise 7: Using Theorem 6.4.3 to derive the asymptotic behavior of the maximum likelihood estimator

Let $X$ be a random variable with support $[\underline{x}, \overline{x}]$, whose probability density function $f$ satisfies $f = p_{\theta_0}$ for some $\theta_0 \in \Theta$, with $f \in \{p_\theta : \theta \in \Theta \subseteq \mathbb{R}\}$ (to simplify, I assume that the dimension of $\theta$ is one). It follows from Example 5 in the notes that $\theta_0$ is an m-estimand, with $m(x,\theta) = -\ln(p_\theta(x))$. The corresponding m-estimator is the maximum likelihood estimator: $\widehat{\theta} = \operatorname{argmin}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^{n} -\ln(p_\theta(X_i))$. We have seen that if:

1. for any $\theta \ne \theta_0$, $p_\theta \ne p_{\theta_0}$ on at least a non-empty open subset of the support of $X$,
2. $\Theta$ is compact,
3. $\theta \mapsto \ln(p_\theta(x))$ is twice continuously differentiable with respect to $\theta$ for every $x$,
4. $E(\ddot{\ln}(p_{\theta_0}(X)))$ is invertible,

then $\theta_0$ satisfies Assumptions 5, 6, 7, and 8 in the notes, so we can apply Theorem 6.4.3 to assert that the maximum likelihood estimator is asymptotically normal, with asymptotic variance
$$E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1} E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right) E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1}.$$
There should be $-$ signs appearing in each term, but we can forget about them, as one is transformed into a $+$ by the square, and multiplying the two others yields a $+$ as well. The goal of this exercise is to derive a simpler expression of this asymptotic variance. In what follows, all derivatives are taken with respect to $\theta$.

1. Show that for any $\theta$, $\int_{\underline{x}}^{\overline{x}} \dot{p}_\theta(x)dx = 0$. Hint: you can use the fact that $\int_{\underline{x}}^{\overline{x}} p_\theta(x)dx = 1$ and assume that you can invert the derivative and integral signs because the dominated convergence theorem applies.

2. Infer from this that
$$E(\dot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_{\theta_0}(x)) p_{\theta_0}(x)dx = 0.$$
3. Infer from this that
$$E(\ddot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \ddot{\ln}(p_{\theta_0}(x)) p_{\theta_0}(x)dx = -\int_{\underline{x}}^{\overline{x}} \left(\dot{\ln}(p_{\theta_0}(x))\right)^2 p_{\theta_0}(x)dx = -E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right).$$
Hint: the first and last equalities just follow from the definition of these expectations. The one you need to prove is the middle one. For that, you should differentiate $\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x)) p_\theta(x)dx = 0$ with respect to $\theta$.

4. Infer from this that the asymptotic variance of the maximum likelihood estimator is
$$E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)^{-1}.$$

5. Assume $X$ follows an $\exp(\theta_0)$ distribution. Compute $\widehat{\theta}$. Use the previous questions to compute the asymptotic variance of $\widehat{\theta}$.

Solution

1. As $p_\theta(x)$ is a density for every $\theta$, we have $\int_{\underline{x}}^{\overline{x}} p_\theta(x)dx = 1$ for every $\theta$. Therefore, $\frac{\partial}{\partial\theta}\int_{\underline{x}}^{\overline{x}} p_\theta(x)dx = 0$. As we have assumed that the dominated convergence theorem applies, one can invert the derivative and integral signs. This yields $\int_{\underline{x}}^{\overline{x}} \dot{p}_\theta(x)dx = 0$.

2. It follows from the previous question that for every $\theta$, $\int_{\underline{x}}^{\overline{x}} \frac{\dot{p}_\theta(x)}{p_\theta(x)} p_\theta(x)dx = 0$. As $\dot{\ln}(p_\theta(x)) = \frac{\dot{p}_\theta(x)}{p_\theta(x)}$, this proves that for every $\theta$, $\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x)) p_\theta(x)dx = 0$. Finally,
$$E(\dot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_{\theta_0}(x)) p_{\theta_0}(x)dx,$$
because $p_{\theta_0}(x)$ is the density of $X$. Putting together the last and last-but-one equalities, evaluated at $\theta_0$, proves the result.

3. We have shown in the previous question that for every $\theta$, $\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x)) p_\theta(x)dx = 0$. Differentiating each side with respect to $\theta$ and using the dominated convergence theorem, we get
$$\int_{\underline{x}}^{\overline{x}} \left(\ddot{\ln}(p_\theta(x)) p_\theta(x) + \dot{\ln}(p_\theta(x)) \dot{p}_\theta(x)\right)dx = 0.$$
This rewrites
$$\int_{\underline{x}}^{\overline{x}} \ddot{\ln}(p_\theta(x)) p_\theta(x)dx = -\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x)) \dot{p}_\theta(x)dx.$$
This proves the middle equality, once noted that $\dot{\ln}(p_\theta(x)) \dot{p}_\theta(x) = \left(\dot{\ln}(p_\theta(x))\right)^2 p_\theta(x)$. The first and last equalities merely follow from the definitions of $E(\ddot{\ln}(p_{\theta_0}(X)))$ and $E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)$.

4. The asymptotic variance of the maximum likelihood estimator is therefore equal to
$$E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1} E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right) E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1} = \left(-E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)\right)^{-1} E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right) \left(-E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)\right)^{-1} = E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)^{-1}.$$

5. If $X$
follows an exponential distribution with parameter $\theta_0$, then $p_\theta(x) = \theta\exp(-\theta x)$, so $\ln(p_\theta(x)) = \ln(\theta) - \theta x$, and $\dot{\ln}(p_\theta(x)) = \frac{1}{\theta} - x$. Moreover, $E(X) = \frac{1}{\theta_0}$ and $V(X) = \frac{1}{\theta_0^2}$.
$$\widehat{\theta} = \operatorname{argmin}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^{n} -\ln(p_\theta(X_i)) = \operatorname{argmin}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^{n} \left(-\ln(\theta) + \theta X_i\right).$$
The first-order condition of this problem is $\frac{1}{n}\sum_{i=1}^{n} \left(-\frac{1}{\widehat{\theta}} + X_i\right) = 0$, so $\widehat{\theta} = \frac{1}{\overline{X}}$. It follows from the results of the previous questions that its asymptotic variance is
$$E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)^{-1} = E\left(\left(X - \frac{1}{\theta_0}\right)^2\right)^{-1} = \frac{1}{V(X)} = \theta_0^2.$$

Exercise 8: a very small Monte-Carlo study of m-estimators

Let $U$ be a random variable following the uniform distribution on $[0,1]$. Let $m(u,\theta) = (u - \theta)^2$. Let $\theta_0 = \operatorname{argmin}_{\theta \in [0,1]} E(m(U,\theta))$. The goal of the exercise is to compute $\theta_0$ and to estimate its m-estimator $\widehat{\theta}$, to compare the two.

1. Show that $\theta_0 = \frac{1}{2}$.
2. Generate a data set with a variable $U$ containing 10,000 observations $U_i$ of random variables following the uniform distribution on $[0,1]$.
3. Generate 101 more variables in this data set, respectively equal to $m(U,0)$, $m(U,0.01)$, $m(U,0.02)$, ..., $m(U,1)$.
4. Estimate the sample mean of these 101 variables.
5. Which of these variables has the lowest sample mean?
6. What is the value of $\widehat{\theta}$?
7. Do the exercise again with 100 observations.

Solution

1. $E(m(U,\theta)) = \theta^2 - \theta + \frac{1}{3}$. This quantity is minimized at $\theta_0 = \frac{1}{2}$.
2. See code.
3. See code.
4. See code.
5. In my simulations, $m(U,0.5)$ is the one with the lowest sample mean.
6. $\widehat{\theta} \approx 0.5$. The definition of the m-estimator is that it is the value of $\theta$ which minimizes $\overline{m(U,\theta)}$. Here, I have evaluated this function only over a grid of 101 points of the interval $[0,1]$. If I had gone on to pick 100 equally spaced points over $[0.49, 0.51]$ and evaluated this function over these 100 points, I would probably have found a point where $\overline{m(U,\theta)}$ is lower than $\overline{m(U,0.5)}$. That is why here $m(U,0.5)$ is only an approximation of the m-estimator.
7. With 100 observations, I get $\widehat{\theta} = 0.49$.
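The code for steps 2-6 is not reproduced with these solutions, so here is a minimal Python sketch of the same grid search (the function and variable names are mine; the random seed is arbitrary, so the small-sample result need not be exactly the 0.49 reported above):

```python
import numpy as np

def grid_m_estimator(n_obs, rng):
    """Approximate theta0 = argmin_theta E((U - theta)^2) by evaluating the
    sample mean of m(U, theta) on a grid of 101 points over [0, 1]."""
    u = rng.uniform(0.0, 1.0, size=n_obs)      # step 2: the uniform sample
    grid = np.linspace(0.0, 1.0, 101)          # theta = 0, 0.01, ..., 1
    sample_means = [np.mean((u - theta) ** 2) for theta in grid]  # steps 3-4
    return grid[int(np.argmin(sample_means))]  # steps 5-6: grid minimizer

rng = np.random.default_rng(42)
theta_large = grid_m_estimator(10_000, rng)  # with 10,000 observations
theta_small = grid_m_estimator(100, rng)     # with only 100 observations
print(theta_large, theta_small)
```

Since $\overline{m(U,\theta)}$ is a quadratic in $\theta$ minimized at $\overline{U}$, the grid search simply returns the grid point closest to the sample mean, which is why refining the grid around 0.5, as discussed in point 6, would sharpen the approximation.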