PS3 for Econometrics 101, Warwick Econ Ph.D.

* denotes more mathematical exercises, on which you should spend some time if you are a type 2 student but not if you are type 1. If you are type 1, you should still read the result of Exercise 3, without trying to prove it, as it will help you interpret the results of the applied exercises.

Exercise 1: using the delta method to derive confidence intervals for $\sigma(Y)$

Let $(Y_i)_{1 \leq i \leq n}$ be an iid sample of $n$ random variables with a finite 4th moment and a strictly positive variance. Let $\widehat{V}(Y) = \overline{Y^2} - \overline{Y}^2$ be an estimator of $V(Y)$, and let $\widehat{\sigma}(Y) = \sqrt{\widehat{V}(Y)}$ be an estimator of its standard deviation.

1. Show that
$$\sqrt{n}\left((\overline{Y^2}, \overline{Y})' - (E(Y^2), E(Y))'\right) \stackrel{d}{\longrightarrow} N(0, V_0), \quad \text{where } V_0 = \begin{pmatrix} V(Y^2) & cov(Y^2, Y) \\ cov(Y^2, Y) & V(Y) \end{pmatrix}.$$

2. Give a consistent estimator of $V_0$, $\widehat{V}_0$. No need to prove that the estimator is indeed consistent, just mention which theorems you would use to prove consistency.

3. Use the previous question and the delta method to show that
$$\sqrt{n}\left(\widehat{V}(Y) - V(Y)\right) \stackrel{d}{\longrightarrow} N(0, V_1), \quad \text{where } V_1 = (1, -2E(Y))\, V_0\, (1, -2E(Y))'.$$

4. Give a consistent estimator of $V_1$, $\widehat{V}_1$. You can write $\widehat{V}_1$ as a function of $\widehat{V}_0$. No need to prove that the estimator is indeed consistent, just mention which theorems you would use to prove consistency.

5. Give a confidence interval for $V(Y)$ with asymptotic coverage $1 - \alpha$. You just need to write the upper and lower bounds of the confidence interval as functions of $\widehat{V}_1$, $q_{1-\alpha/2}$, $1 - \alpha$, $\widehat{V}(Y)$, and $n$. No need to prove that the confidence interval indeed has asymptotic coverage $1 - \alpha$, just mention which lemma you would use in the proof.

6. Use the delta method to show that
$$\sqrt{n}\left(\widehat{\sigma}(Y) - \sigma(Y)\right) \stackrel{d}{\longrightarrow} N(0, V_2), \quad \text{where } V_2 = \frac{1}{4V(Y)} V_1.$$

7. Give a consistent estimator of $V_2$, $\widehat{V}_2$. You can write $\widehat{V}_2$ as a function of $\widehat{V}_1$. No need to prove that the estimator is indeed consistent, just mention which theorem you would use to prove consistency.

8. Give a confidence interval for $\sigma(Y)$ with asymptotic coverage $1 - \alpha$. You just need to write the upper and lower bounds of the confidence interval as functions of $\widehat{V}_2$, $q_{1-\alpha/2}$, $1 - \alpha$, $\widehat{\sigma}(Y)$, and $n$. No need to prove that the confidence interval indeed has asymptotic coverage $1 - \alpha$, just mention which lemma you would use in the proof.

Exercise 2: quantiles as m-estimands

In the course, we showed that the median of a continuous random variable can be seen as an m-estimand. In this exercise, we want to show that the same applies to any quantile of $Y$. Let $Y$ be a continuous random variable with a continuous and strictly increasing cdf $F_Y$ and admitting a first moment. For any real number $u$, let $\rho_\tau(u) = (\tau - 1\{u \leq 0\})u$. You can check that $\rho_\tau(u) = \tau u$ for $u > 0$, $\rho_\tau(u) = (\tau - 1)u$ for $u \leq 0$, and $\rho_{0.5}(u) = 0.5|u|$. Then, let $q_\tau(Y)$ be the quantile of order $\tau$ of $Y$: $q_\tau(Y) = F_Y^{-1}(\tau)$. $q_\tau(Y)$ is the solution in $y$ of $P(Y \leq y) = \tau$. Show that
$$q_\tau(Y) = \mathop{\mathrm{argmin}}_{\theta \in \mathbb{R}} E(\rho_\tau(Y - \theta)).$$
Hints: you should use the exact same steps as in the proof of Example 2 in the notes. You need to show that $E(\rho_\tau(Y - \theta)) - E(\rho_\tau(Y - q_\tau(Y)))$ is strictly positive for any $\theta \neq q_\tau(Y)$. For that purpose, you should first consider a $\theta > q_\tau(Y)$. You should then decompose both $E(\rho_\tau(Y - \theta))$ and $E(\rho_\tau(Y - q_\tau(Y)))$ into three pieces depending on whether $Y \geq \theta$, $q_\tau(Y) \leq Y < \theta$, or $Y < q_\tau(Y)$. Then you should replace the functions $\rho_\tau(Y - \theta)$ and $\rho_\tau(Y - q_\tau(Y))$ by simpler expressions within each of the 6 conditional expectations. Finally, at some point you will have to use the fact that $P(Y \leq q_\tau(Y)) = \tau$ and $P(Y > q_\tau(Y)) = 1 - \tau$.
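To build intuition for the check function $\rho_\tau$, here is a minimal Stata sketch (not part of the problem set) that verifies numerically that the sample analogue of $E(\rho_\tau(Y - \theta))$ is minimized near the empirical quantile of order $\tau$. The distribution, seed, and grid of candidate values for $\theta$ are illustrative choices made here.

```stata
* Minimal numerical check: the sample mean of rho_tau(Y - theta) is smallest
* when theta is close to the empirical tau-th quantile of Y.
clear
set seed 12345
set obs 10000
gen y = rnormal(0, 1)                 // any continuous distribution works here
local tau = 0.25
local best_theta = .
local best_val = .
forvalues t = -300(5)300 {
    local theta = `t'/100
    gen rho = (`tau' - ((y - `theta') <= 0)) * (y - `theta')
    quietly summarize rho, meanonly
    if missing(`best_val') | r(mean) < `best_val' {
        local best_val = r(mean)
        local best_theta = `theta'
    }
    drop rho
}
display "grid minimizer of the sample criterion: " `best_theta'
_pctile y, percentiles(25)
display "empirical quantile of order 0.25:       " r(r1)
```

Exercise 8 at the end of this problem set uses the same grid-search idea with a squared-error criterion.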
* Exercise 3: the expected value of a random variable is the average of its quantiles

Let $Z$ be a random variable with a strictly increasing cdf $F_Z$ and with a strictly positive pdf $f_Z$. Let $\underline{z}$ and $\overline{z}$ respectively denote the lower and upper bounds of its support.

1. Show that $E(Z) = \int_0^1 F_Z^{-1}(\tau)d\tau$. Hint: you should use the substitution $z = F_Z^{-1}(\tau)$ in $\int_0^1 F_Z^{-1}(\tau)d\tau$ and the integration by substitution theorem (see Wikipedia entry).

2. Infer from the previous question that $E(Y_1 - Y_0) = \int_0^1 \left(F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)\right)d\tau$: the average treatment effect is the average of the quantile treatment effects.

Exercise 4: using quantile regressions to measure changes in the US wage structure (based on Angrist et al. (2006))

Let $X$ be a $k \times 1$ vector of random variables. Let $q_\tau(Y|X = x)$ denote the quantile of order $\tau$ of the distribution of $Y|X = x$. $q_\tau(Y|X = x)$ is just a regular quantile, so it follows from Exercise 2 that
$$q_\tau(Y|X = x) = \mathop{\mathrm{argmin}}_{\theta \in \mathbb{R}} E(\rho_\tau(Y - \theta)|X = x).$$
Now, let $q_\tau(Y|X)$ be a random variable equal to $q_\tau(Y|X = x)$ when $X = x$. $q_\tau(Y|X)$ is called the $\tau$th quantile of $Y$ conditional on $X$. We have
$$q_\tau(Y|X) = \mathop{\mathrm{argmin}}_{g(.)} E(\rho_\tau(Y - g(X))). \quad (0.0.1)$$
Among all possible functions of $X$, $g(X)$, $q_\tau(Y|X)$ is the one which minimizes $E(\rho_\tau(Y - g(X)))$. Indeed, assume to simplify that $X$ is a discrete random variable taking $J$ values $x_1, ..., x_J$. Then, it follows from the law of iterated expectations that
$$E(\rho_\tau(Y - g(X))) = E(E(\rho_\tau(Y - g(X))|X)) = \sum_{j=1}^J E(\rho_\tau(Y - g(x_j))|X = x_j) P(X = x_j).$$
For each $j$, $E(\rho_\tau(Y - g(x_j))|X = x_j)$, viewed as a function of $g(x_j)$, is minimized at $g(x_j) = q_\tau(Y|X = x_j)$. Therefore $q_\tau(Y|X) = \sum_{j=1}^J 1\{X = x_j\} q_\tau(Y|X = x_j)$ is the function of $X$ which minimizes $E(\rho_\tau(Y - g(X)))$.

Equation (0.0.1) motivates the definition of the quantile regression function $X'\beta_\tau$, with
$$\beta_\tau = \mathop{\mathrm{argmin}}_{b \in \mathbb{R}^k} E(\rho_\tau(Y - X'b)).$$
Angrist et al. (2006) show that $X'\beta_\tau$ is a weighted MMSE approximation of $q_\tau(Y|X)$ within the set of all linear functions of $X$. When $q_\tau(Y|X)$ is indeed a linear function of $X$, $X'\beta_\tau$ is equal to it. A natural estimator of $\beta_\tau$ is
$$\widehat{\beta}_\tau = \mathop{\mathrm{argmin}}_{b} \frac{1}{n}\sum_{i=1}^n \rho_\tau(Y_i - X_i'b).$$

The census80.dta, census90.dta, and census00.dta data sets contain representative samples of US-born black and white men aged 40-49 with at least five years of education, with positive annual earnings and hours worked in the year preceding the census, respectively from the 1980, 1990, and 2000 censuses. The goal of the exercise is to estimate, for each census wave, returns to years of education on different quantiles of the distribution of wages, controlling for age, experience, and race. Here, returns should be understood as a correlational concept, not as a causal concept. Coefficients of education in the regressions you will run should be interpreted as follows: an increase of one year of education is associated with an increase of the $\tau$th quantile of wages by XXXX, controlling for YYYY. Even though these parameters are not causal ones, they are still very interesting: comparing the structure of these returns across census waves will teach us something about the evolution of the structure of wages in the US over these 20 years. Each of these data sets contains the following variables: age (age in years of the individual), educ (number of years of education), logwk (log wage), exper (number of years of professional experience), exper2 (number of years of professional experience squared), black (a dummy equal to 1 if someone is black).

1. Open census80.dta. Using the qreg command, run quantile regressions of order 10, 25, 50, 75, and 90 of logwk on educ, age, exper, exper2, and black (a sketch of one possible set of Stata commands follows below).
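A minimal sketch of the commands for question 1, assuming the data file is in Stata's working directory; this is one possible way to loop over the five quantile orders, not the required implementation.

```stata
* One possible way to run the five quantile regressions of question 1,
* with the orders expressed as fractions.
use census80.dta, clear
foreach q in 0.10 0.25 0.50 0.75 0.90 {
    display "Quantile regression of order `q'"
    qreg logwk educ age exper exper2 black, quantile(`q')
}
```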
2. Interpret the coefficient of educ in each regression, following the guideline I gave you in the introduction of the exercise. Remember that wage is in logarithm, and that you control for age, exper, exper2, and black in your regression.

3. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 1980, did years of education have very different returns across different quantiles of the distribution of wages?

4. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education, holding the other variables constant. Your graph should consist of 5 lines: the line mapping education into education$\times\widehat{\beta}_{0.1}^{educ}$, the line mapping education into education$\times\widehat{\beta}_{0.25}^{educ}$, etc., where $\widehat{\beta}_{0.1}^{educ}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}_{0.25}^{educ}$ denotes the coefficient of education in the quantile regression of order 25, etc. The lines do not need to correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not which ones increase or decrease more steeply than the others. (A Stata sketch of one way to draw such a graph is given after question 11.)

5. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?

6. Open census90.dta. Using the qreg command, run quantile regressions of order 10, 25, 50, 75, and 90 of logwk on educ, age, exper, exper2, and black.

7. Interpret the coefficient of educ in each regression.

8. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 1990, did years of education have very different returns across different quantiles of the distribution of wages? How do the coefficients of the 1990 quantile regressions compare to those of the 1980 regressions?

9. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education, holding the other variables constant. Your graph should consist of 5 lines, constructed as in question 4. The lines do not need to correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not which ones increase or decrease more steeply than the others.

10. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?

11. Open census00.dta. Using the qreg command, run quantile regressions of order 10, 25, 50, 75, and 90 of logwk on educ, age, exper, exper2, and black.
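For the graphs requested in questions 4, 9, and 14, one possibility is Stata's twoway function plots. The sketch below is illustrative only: the coefficient values are placeholders to be replaced by your estimated educ coefficients, and the education range is an assumption made here.

```stata
* Sketch of the graph in questions 4, 9, and 14. The b* locals are
* placeholders: replace them with your estimated educ coefficients.
local b10 = 0.05
local b25 = 0.06
local b50 = 0.07
local b75 = 0.08
local b90 = 0.09
twoway (function y = `b10'*x, range(5 20)) ///
       (function y = `b25'*x, range(5 20)) ///
       (function y = `b50'*x, range(5 20)) ///
       (function y = `b75'*x, range(5 20)) ///
       (function y = `b90'*x, range(5 20)), ///
       xtitle("Years of education") ytitle("Predicted quantile of log wage") ///
       legend(order(1 "order 10" 2 "order 25" 3 "order 50" 4 "order 75" 5 "order 90"))
```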
12. Interpret the coefficient of educ in each regression.

13. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 2000, did years of education have very different returns across different quantiles of the distribution of wages?

14. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education, holding the other variables constant. Your graph should consist of 5 lines, constructed as in question 4. The lines do not need to correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not which ones increase or decrease more steeply than the others.

15. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?

16. Compare the three graphs you drew, to summarize the results from the quantile regressions in 1980, 1990, and 2000. How did the returns to education evolve over these 20 years? What have been the main changes in the structure of US wages over that period?

17. Find another potentially interesting set of quantile regressions you could run, run them, interpret their results, and send me a 10 lines email explaining the regressions you ran and your results.

Exercise 5: using quantile regressions to measure heterogeneous returns to a boarding school for disadvantaged students (based on Behaghel et al. (2014))

Behaghel et al. (2014) ran an experiment in which applicants to a boarding school for disadvantaged students were randomly admitted to the school. The boardingschool.dta data set contains three variables. mathsscore is students' maths score, measured two years after the lottery that determined who was admitted to the school took place. This measure is standardized with respect to the mean and standard deviation of the raw score in the control group. This is the outcome variable in this exercise, so we denote it $Y$. baselinemathsscore is a measure of students' baseline maths ability, before the lottery took place. We will use it as a control variable, so we denote it $X$. boarding is a dummy equal to 1 if a student won the lottery and gained admission to the boarding school. This is the treatment variable, so it is denoted $D$. (A short Stata sketch showing one way to load the data and map these variables to this notation appears after question 2.)

1. Run a very simple regression to show that students who won the lottery had very similar baseline maths scores to those who lost the lottery. If you were to compare the treatment and the control groups not on one, but on 20 baseline characteristics, how many significant differences at the 5% level should you expect to find?

2. Run a very simple regression to measure the effect of the boarding school on students' average maths test scores two years after the lottery. Interpret the coefficient from this regression (do not forget that test scores are standardized with respect to the mean and standard deviation of the raw score in the control group).
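A minimal Stata sketch of one way to load the data and map its variables to the notation $Y$, $X$, and $D$ used in this exercise; the renaming is a convenience chosen here, not a requirement.

```stata
* Load the data and rename the variables to match the exercise's notation.
use boardingschool.dta, clear
rename mathsscore Y                   // outcome: maths score two years later
rename baselinemathsscore X           // control: baseline maths score
rename boarding D                     // treatment: admitted to boarding school
summarize Y X D
* Question 1's balance check can then be run as a regression of X on D.
```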
3. Run a slightly more complicated regression to increase the statistical precision of your coefficient of interest. Does this coefficient change a lot with respect to the previous regression? Was this expected? Does the standard error of your main coefficient of interest change a lot with respect to the previous regression? Was this expected?

4. We have
$$q_\tau(Y|D) = 1\{D = 0\}q_\tau(Y|D = 0) + 1\{D = 1\}q_\tau(Y|D = 1) = q_\tau(Y|D = 0) + D(q_\tau(Y|D = 1) - q_\tau(Y|D = 0)).$$
This shows that $q_\tau(Y|D)$ is a linear function of $(1, D)'$. Therefore, a quantile regression of order $\tau$ of $Y$ on a constant and $D$ will estimate this conditional quantile function: the coefficient of the constant will be an estimate of $q_\tau(Y|D = 0)$, while the coefficient of $D$ will be an estimate of $q_\tau(Y|D = 1) - q_\tau(Y|D = 0)$.

(a) Show that $\widehat{\beta}_\tau$ is an estimate of $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$, the quantile treatment effect of order $\tau$.

(b) Plot a graph with $\tau$ on the x-axis and $\widehat{\beta}_\tau$ and the upper and lower bounds of its 90% confidence interval on the y-axis, for $\tau$ ranging from 0.03 to 0.97. For that purpose, you need to: 1. run a quantile regression of order $\tau$ of $Y$ on $D$ using the qreg command in Stata, for every $\tau$ ranging from 0.03 to 0.97; 2. store $\tau$, $\widehat{\beta}_\tau$, $\widehat{\beta}_\tau - 1.64 * se(\widehat{\beta}_\tau)$, and $\widehat{\beta}_\tau + 1.64 * se(\widehat{\beta}_\tau)$ in a matrix; 3. clear the data; 4. form a data set containing the values you stored in the matrix, using the svmat command; 5. plot the requested graph. (A Stata sketch of these steps is given at the end of this exercise.)

(c) Does the boarding school increase more the lowest quantiles of the distribution of scores, those close to the median, or the highest ones?

(d) Does the graph confirm the result in Exercise 3?

5. The rank invariance assumption states that the treatment does not change the rank of an individual in the distribution of the outcome: the individual at the 90th percentile of the distribution of $Y_0$ will also be at the 90th percentile of the distribution of $Y_1$. Under this assumption, quantile treatment effects can be interpreted as individual treatment effects. $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ compares the $\tau$th quantile of the distributions of $Y_1$ and $Y_0$. Under the rank invariance assumption, it is the same individual who is at the $\tau$th quantile of these two distributions, so $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is a measure of the effect of the treatment on her.

The rank invariance assumption is not a credible assumption, but it is useful to assess treatment effect heterogeneity. Indeed, any deviation from rank invariance implies more treatment effect heterogeneity than that obtained under rank invariance. One can formally show that $V(Y_1 - Y_0) \geq V^{RI}(Y_1 - Y_0)$: the true variance of the treatment effect, which we cannot measure because we do not observe $Y_1 - Y_0$, is at least as high as $V^{RI}(Y_1 - Y_0)$, the variance of the treatment effect under rank invariance. That's an interesting result, because $V^{RI}(Y_1 - Y_0)$ is something we can estimate. In the simple case where the sample has as many treated as untreated observations ($n_0 = n_1$), under rank invariance the $Y_0$ of $Y_{(i)1}$, the observation with the $i$th largest value of $Y_1$, is merely $Y_{(i)0}$, the $i$th largest value of $Y_0$. Therefore, one can use
$$\frac{1}{n_0}\sum_{i=1}^{n_0}\left(Y_{(i)1} - Y_{(i)0}\right)^2 - \left(\frac{1}{n_0}\sum_{i=1}^{n_0}\left(Y_{(i)1} - Y_{(i)0}\right)\right)^2$$
to estimate $V^{RI}(Y_1 - Y_0)$.
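A minimal Stata sketch of this estimator (not required by the exercise): it matches treated and control observations by rank and computes the sample analogue of the formula above, assuming equal group sizes; the renaming follows the convention used earlier in this exercise.

```stata
* Rank-match treated and control outcomes and estimate V_RI(Y1 - Y0).
use boardingschool.dta, clear
rename mathsscore Y
rename boarding D
keep Y D
sort D Y
by D: gen rank = _n                   // rank of Y within each treatment group
reshape wide Y, i(rank) j(D)          // creates Y0 and Y1, matched by rank
gen diff = Y1 - Y0                    // Y_(i)1 - Y_(i)0
quietly summarize diff
display "estimate of V_RI(Y1 - Y0): " r(Var)*(r(N)-1)/r(N)
```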
We will not prove the result that any deviation from rank invariance implies more treatment effect heterogeneity than that obtained under rank invariance, but let me give you a little bit of intuition. Assume that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is increasing in $\tau$. Rank invariance already implies treatment effect heterogeneity: under rank invariance, $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ increasing in $\tau$ implies that people at the upper quantiles of the distribution of the outcome have larger effects than those at the lower quantiles. Now, any deviation from rank invariance implies even more treatment effect heterogeneity. Under rank invariance, $F_{Y_1}^{-1}(1) - F_{Y_0}^{-1}(1)$ is the value of the treatment effect for the person with the highest value of $Y_1$. If rank invariance is violated, the counterfactual value of the outcome for that person under non-treatment is no longer $F_{Y_0}^{-1}(1)$, but $F_{Y_0}^{-1}(\tau)$, for some $\tau < 1$: this person, instead of having the highest $Y_0$, must have a lower rank in the distribution of $Y_0$. This implies that the true treatment effect for that person, $F_{Y_1}^{-1}(1) - F_{Y_0}^{-1}(\tau)$, is even larger than what we had concluded under rank invariance. Similarly, if the person with the lowest value of $Y_1$ actually did not have the lowest value of $Y_0$, then the true treatment effect for that person is even lower than what we had concluded under rank invariance. In this example, deviations from rank invariance mean that people at upper quantiles have even larger effects than what we had concluded under rank invariance, while people at lower quantiles have even lower effects, implying that overall, treatment effects are even more heterogeneous than what we had concluded under rank invariance.

After this long introduction, consider again the graph you obtained in question 4(b).

(a) Look at whether the confidence intervals of $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ for different values of $\tau$ cross or not. At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is constant for every $\tau$? At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ takes only two different values? At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ takes only three different values?

(b) Under the rank invariance assumption, what is the minimum number of different values that $Y_1 - Y_0$ must take? Are these values very different from each other? Is the effect of the boarding school on students' test scores heterogeneous?

6. Given the shape of the quantile treatment effect function $\tau \mapsto F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$, is $V(Y_1)$ likely to be greater or smaller than $V(Y_0)$? How could you estimate $V(Y_1)$ and $V(Y_0)$? Which results could you use to derive confidence intervals for these two quantities?

7. This policy is expensive: it multiplies by two the expenditure per year per student. However, the gain in average maths scores it produces is roughly the same as the one produced by dividing class size by two, which also multiplies by two the expenditure per year per student. The literature has found that class size reductions have larger effects for the weakest students. Based on the results of the previous questions, find one reason why you might prefer class size reduction over the boarding school policy if you were a social planner. Send me a 5 lines email with your answer.
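Before leaving this exercise, here is a minimal Stata sketch of the five steps listed in question 4(b); the matrix name, the renaming, and the graph options are choices made here, not the required implementation.

```stata
* Sketch of question 4(b): quantile regressions of Y on D, tau = 0.03 to 0.97.
use boardingschool.dta, clear
rename mathsscore Y
rename boarding D
matrix qte = J(95, 4, .)
local row = 1
forvalues t = 3/97 {
    local tau = `t'/100
    quietly qreg Y D, quantile(`tau')
    matrix qte[`row', 1] = `tau'
    matrix qte[`row', 2] = _b[D]
    matrix qte[`row', 3] = _b[D] - 1.64*_se[D]
    matrix qte[`row', 4] = _b[D] + 1.64*_se[D]
    local row = `row' + 1
}
clear
svmat qte
rename (qte1 qte2 qte3 qte4) (tau beta lower upper)
twoway (line beta tau) (line lower tau) (line upper tau), ///
    xtitle("tau") ytitle("Quantile treatment effect")
```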
* Exercise 6: Proving that Theorem 6.4.4 applies to the sample median

Let $Y$ be a continuous random variable with a strictly increasing and continuous cdf $F_Y$ and a strictly positive and continuous pdf $f_Y$. Let $\underline{y}$ and $\overline{y}$ be the lower and upper bounds of the support of $Y$. The goal of this exercise is to show that the median of $Y$ is an m-estimand satisfying Assumption 9 in the notes, and to infer from this its asymptotic behavior using Theorem 6.4.4. It follows from Example 2 in the notes that
$$me(Y) = \mathop{\mathrm{argmin}}_{\theta \in \mathbb{R}} E(|Y - \theta|).$$
Therefore, $me(Y) = \mathop{\mathrm{argmin}}_{\theta \in \mathbb{R}} E(|Y - \theta| - |Y|)$. So $me(Y)$ can be seen as an m-estimand with $m(y, \theta) = |y - \theta| - |y|$.

1. Show that $\theta \mapsto m(y, \theta)$ is differentiable in $\theta$ for every $\theta$ different from $y$. Give an expression of its derivative. Hint: you should start by considering a value of $\theta \geq 0$, then you should give explicit expressions of $m(y, \theta)$ depending on the value of $y$ (there are three cases you must distinguish), then you should differentiate these expressions with respect to $\theta$. Then you should do the same thing for $\theta < 0$.

2. Infer from the previous question that $m(y, \theta)$ satisfies the first requirement of Assumption 9: $\theta \mapsto m(y, \theta)$ is differentiable at $\theta_0 = me(Y)$ for almost every $y \in \mathbb{R}$.

3. Show that $m(y, \theta)$ satisfies the second requirement of Assumption 9: for every $(\theta_1, \theta_2)$, $|m(y, \theta_1) - m(y, \theta_2)| \leq |\theta_1 - \theta_2|$.

4. Let $M(\theta) = E(m(Y, \theta)) = E(|Y - \theta| - |Y|)$.

(a) Show that for any $\theta \geq 0$,
$$E(|Y - \theta| - |Y|) = \int_{\underline{y}}^{0} \theta f_Y(y)dy + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \int_{\theta}^{\overline{y}} \theta f_Y(y)dy.$$

(b) Infer from the previous question that
$$E(|Y - \theta| - |Y|) = \theta F_Y(0) + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \theta(1 - F_Y(\theta)).$$

(c) Infer from the previous question that
$$E(|Y - \theta| - |Y|) = \int_{0}^{\theta} -2y f_Y(y)dy - \theta(1 - 2F_Y(\theta)).$$

(d) Infer from the previous question and from an integration by parts (see Wikipedia entry if needed) that
$$E(|Y - \theta| - |Y|) = 2\int_{0}^{\theta} F_Y(y)dy - \theta.$$

(e) Conclude that for any $\theta \geq 0$, $M(\theta)$ is twice continuously differentiable. Give $\dot{M}(\theta)$ and $\ddot{M}(\theta)$.

(f) Follow the same steps as in the 5 preceding subquestions to show that for any $\theta < 0$,
$$E(|Y - \theta| - |Y|) = -2\int_{\theta}^{0} F_Y(y)dy - \theta.$$

(g) Conclude that for any $\theta < 0$, $M(\theta)$ is twice continuously differentiable. Give $\dot{M}(\theta)$ and $\ddot{M}(\theta)$.

(h) Check that $\dot{M}(\theta)$ and $\ddot{M}(\theta)$ are continuous at 0. Conclude that $M(\theta)$ is twice continuously differentiable for any $\theta \in \mathbb{R}$.

(i) Check that $\ddot{M}(\theta)$ is invertible for any $\theta$. Give $\ddot{M}(\theta_0)^{-1}$.

5. Conclude from the preceding questions that $me(Y)$ is an m-estimand satisfying Assumption 9. Apply Theorem 6.4.4 to deduce from this that $\widehat{me}(Y)$ is an asymptotically normal estimator of $me(Y)$. Give an expression of its asymptotic variance.

* Exercise 7: Using Theorem 6.4.3 to derive the asymptotic behavior of the maximum likelihood estimator

Let $X$ be a random variable with support $[\underline{x}, \overline{x}]$, whose probability density function $f \in \{p_\theta : \theta \in \Theta \subseteq \mathbb{R}\}$ (to simplify, I assume that the dimension of $\theta$ is one). $f = p_{\theta_0}$ for some $\theta_0 \in \Theta$. It follows from Example 5 in the notes that $\theta_0$ is an m-estimand, with $m(x, \theta) = -\ln(p_\theta(x))$. The corresponding m-estimator is $\widehat{\theta}$, the maximum likelihood estimator: $\widehat{\theta} = \mathop{\mathrm{argmin}}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n -\ln(p_\theta(X_i))$. We have seen that if:

1. for any $\theta \neq \theta_0$, $p_\theta \neq p_{\theta_0}$ on at least a non-empty open subset of the support of $X$;
2. $\Theta$ is compact;
3. $\theta \mapsto \ln(p_\theta(x))$ is twice continuously differentiable with respect to $\theta$ for every $x$;
4. $E(\ddot{\ln}(p_{\theta_0}(X)))$ is invertible,

then $\theta_0$ satisfies Assumptions 5, 6, 7, and 8 in the notes, so we can apply Theorem 6.4.3 to assert that the maximum likelihood estimator is asymptotically normal, with asymptotic variance
$$E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1} E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right) E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1}.$$
There should be $-$ signs appearing in each term, but we can forget about them, as one is transformed into a $+$ by the square, and multiplying the two others yields a $+$ as well. The goal of this exercise is to derive a simpler expression of this asymptotic variance. In what follows, all derivatives are taken with respect to $\theta$.

1. Show that for any $\theta$, $\int_{\underline{x}}^{\overline{x}} \dot{p}_\theta(x)dx = 0$. Hint: you can use the fact that $\int_{\underline{x}}^{\overline{x}} p_\theta(x)dx = 1$ and assume that you can invert the derivative and integral signs because the dominated convergence theorem applies.

2. Infer from this that
$$E(\dot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_{\theta_0}(x)) p_{\theta_0}(x)dx = 0.$$

3. Infer from this that
$$E(\ddot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \ddot{\ln}(p_{\theta_0}(x)) p_{\theta_0}(x)dx = -\int_{\underline{x}}^{\overline{x}} \left(\dot{\ln}(p_{\theta_0}(x))\right)^2 p_{\theta_0}(x)dx = -E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right).$$
Hint: the first and last equalities just follow from the definition of these expectations. The one you need to prove is the middle one. For that, you should differentiate $\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x)) p_\theta(x)dx = 0$ with respect to $\theta$.

4. Infer from this that the asymptotic variance of the maximum likelihood estimator is
$$E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)^{-1}.$$

5. Assume $X$ follows an $\exp(\theta_0)$ distribution. Compute $\widehat{\theta}$. Use the previous questions to compute the asymptotic variance of $\widehat{\theta}$.

Exercise 8: a very small Monte-Carlo study of M-estimators

Let $U$ be a random variable following the uniform distribution on $[0, 1]$. Let $m(u, \theta) = (u - \theta)^2$. Let $\theta_0 = \mathop{\mathrm{argmin}}_{\theta \in [0,1]} E(m(U, \theta))$. The goal of the exercise is to compute $\theta_0$ and to estimate its m-estimator $\widehat{\theta}$ to compare the two.

1. Show that $\theta_0 = \frac{1}{2}$.

2. Generate a data set with a variable $U$ containing 10,000 observations $U_i$ of random variables following the uniform distribution on $[0, 1]$.

3. Generate 101 more variables in this data set, respectively equal to $m(U, 0)$, $m(U, 0.01)$, $m(U, 0.02)$, ..., $m(U, 1)$.

4. Estimate the sample mean of these 101 variables.

5. Which of these variables has the lowest sample mean?

6. What is the value of $\widehat{\theta}$?

7. Do the exercise again with 100 observations.
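A minimal Stata sketch of one way to carry out questions 2 to 6; the seed and the m0-m100 naming scheme are choices made here, and it is meant as a starting point rather than as the required code.

```stata
* Questions 2-6: generate U, build m(U, theta) on a grid, find the minimizer.
clear
set seed 2024
set obs 10000
gen U = runiform()
forvalues j = 0/100 {
    gen m`j' = (U - `j'/100)^2        // m(U, theta) for theta = j/100
}
local best_j = 0
quietly summarize m0, meanonly
local best_mean = r(mean)
forvalues j = 1/100 {
    quietly summarize m`j', meanonly
    if r(mean) < `best_mean' {
        local best_mean = r(mean)
        local best_j = `j'
    }
}
display "theta_hat over the grid: " `best_j'/100
* For question 7, repeat the same steps with: set obs 100
```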