PS3 for Econometrics 101, Warwick Econ Ph.D

* denotes more mathematical exercises, on which you should spend some time if you are a
type 2 student but not if you are type 1. If you are type 1, you should still read the result
of exercise 3, without trying to prove it, as it will help you interpret results of the applied
exercises.
Exercise 1: using the delta method to derive confidence intervals for $\sigma(Y)$.
Let $(Y_i)_{1\le i\le n}$ be an iid sample of $n$ random variables with a fourth moment and with a strictly positive variance. Let $\widehat{V}(Y)=\overline{Y^2}-\bar{Y}^2$ be an estimator of $V(Y)$, and let $\widehat{\sigma}(Y)=\sqrt{\widehat{V}(Y)}$ be an estimator of its standard deviation.

1. Show that
$$\sqrt{n}\left((\overline{Y^2},\bar{Y})'-(E(Y^2),E(Y))'\right)\hookrightarrow N(0,V_0),$$
where
$$V_0=\begin{pmatrix}V(Y^2)&\mathrm{cov}(Y^2,Y)\\ \mathrm{cov}(Y^2,Y)&V(Y)\end{pmatrix}.$$

2. Give a consistent estimator of $V_0$, $\widehat{V}_0$. No need to prove that the estimator is indeed consistent, just mention which theorems you would use to prove consistency.
3. Use the previous question and the delta method to show that
$$\sqrt{n}\left(\widehat{V}(Y)-V(Y)\right)\hookrightarrow N(0,V_1),$$
where $V_1=(1,-2E(Y))V_0(1,-2E(Y))'$.
4. Give a consistent estimator of $V_1$, $\widehat{V}_1$. You can write $\widehat{V}_1$ as a function of $\widehat{V}_0$. No need to prove that the estimator is indeed consistent, just mention which theorems you would use to prove consistency.
5. Give a confidence interval for $V(Y)$ with asymptotic coverage $1-\alpha$. You just need to write the upper and lower bounds of the confidence interval as functions of $\widehat{V}(Y)$, $\widehat{V}_1$, $q_{1-\alpha/2}$, and $n$. No need to prove that the confidence interval indeed has asymptotic coverage $1-\alpha$, just mention which lemma you would use in the proof.
6. Use the delta method to show that
$$\sqrt{n}\left(\widehat{\sigma}(Y)-\sigma(Y)\right)\hookrightarrow N(0,V_2),$$
where $V_2=\frac{1}{4V(Y)}V_1$.
7. Give a consistent estimator of $V_2$, $\widehat{V}_2$. You can write $\widehat{V}_2$ as a function of $\widehat{V}_1$. No need to prove that the estimator is indeed consistent, just mention which theorem you would use to prove consistency.
8. Give a confidence interval for $\sigma(Y)$ with asymptotic coverage $1-\alpha$. You just need to write the upper and lower bounds of the confidence interval as functions of $\widehat{\sigma}(Y)$, $\widehat{V}_2$, $q_{1-\alpha/2}$, and $n$. No need to prove that the confidence interval indeed has asymptotic coverage $1-\alpha$, just mention which lemma you would use in the proof.
Solution
1. Let $U_i=(Y_i^2,Y_i)'$. $(U_i)_{1\le i\le n}$ is an iid sequence of $2\times 1$ vectors of random variables with a second moment (as $Y_i$ has a fourth moment by assumption). It follows from the central limit theorem that
$$\sqrt{n}(\bar{U}-E(U))\hookrightarrow N(0,V(U)).$$
This proves the result, once noted that $\sqrt{n}(\bar{U}-E(U))=\sqrt{n}\left((\overline{Y^2},\bar{Y})'-(E(Y^2),E(Y))'\right)$ and $V(U)=V((Y^2,Y)')=V_0$.
2. $$\widehat{V}_0=\begin{pmatrix}\overline{Y^4}-\overline{Y^2}^2&\overline{Y^3}-\overline{Y^2}\,\bar{Y}\\ \overline{Y^3}-\overline{Y^2}\,\bar{Y}&\overline{Y^2}-\bar{Y}^2\end{pmatrix}$$
is a consistent estimator of $V_0$. The proof would rely on the weak law of large numbers and on the continuous mapping theorem.
3. Let $\Phi(x,y)=x-y^2$. Now, $\frac{\partial\Phi}{\partial x}(x,y)=1$ and $\frac{\partial\Phi}{\partial y}(x,y)=-2y$. Therefore $\dot{\Phi}(x,y)=(1,-2y)'$. $\Phi(E(Y^2),E(Y))=V(Y)$, and $\Phi(\overline{Y^2},\bar{Y})=\widehat{V}(Y)$. Then, it follows from the delta method and from the first question that
$$\sqrt{n}\left(\widehat{V}(Y)-V(Y)\right)\hookrightarrow N\left(0,\dot{\Phi}(E(Y^2),E(Y))'\,V_0\,\dot{\Phi}(E(Y^2),E(Y))\right).$$
This proves the result.
4. $\widehat{V}_1=(1,-2\bar{Y})\widehat{V}_0(1,-2\bar{Y})'$ is a consistent estimator of $V_1$. The proof follows from the fact that $\widehat{V}_0$ is a consistent estimator of $V_0$, from the weak law of large numbers, and from the continuous mapping theorem.
5. $$IC_1(\alpha)=\left[\widehat{V}(Y)-q_{1-\alpha/2}\sqrt{\frac{\widehat{V}_1}{n}},\ \widehat{V}(Y)+q_{1-\alpha/2}\sqrt{\frac{\widehat{V}_1}{n}}\right]$$
is a confidence interval for $V(Y)$ with asymptotic coverage equal to $1-\alpha$. The proof relies on results from previous questions and the Slutsky lemma.
6. Let $\varphi(x)=\sqrt{x}$. $\dot{\varphi}(x)=\frac{1}{2\sqrt{x}}$. $\varphi(V(Y))=\sigma(Y)$ and $\varphi(\widehat{V}(Y))=\widehat{\sigma}(Y)$. Then, it follows from the delta method and from the result of the third question that
$$\sqrt{n}\left(\widehat{\sigma}(Y)-\sigma(Y)\right)\hookrightarrow N\left(0,(\dot{\varphi}(V(Y)))^2\,V_1\right).$$
This proves the result.
7. $\widehat{V}_2=\frac{1}{4\left(\overline{Y^2}-\bar{Y}^2\right)}\widehat{V}_1$ is a consistent estimator of $V_2$. This follows from the fact that $\widehat{V}_1$ is a consistent estimator of $V_1$, from the weak law of large numbers, and from the continuous mapping theorem.
8. $$IC_2(\alpha)=\left[\widehat{\sigma}(Y)-q_{1-\alpha/2}\sqrt{\frac{\widehat{V}_2}{n}},\ \widehat{\sigma}(Y)+q_{1-\alpha/2}\sqrt{\frac{\widehat{V}_2}{n}}\right]$$
is a confidence interval for $\sigma(Y)$ with asymptotic coverage equal to $1-\alpha$. The proof relies on results from previous questions and the Slutsky lemma.
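The whole chain of questions 1 through 8 can be checked by simulation. Below is a minimal sketch in Python (simulated Gaussian data and my own function names, not part of the problem set) that builds the plug-in estimators $\widehat{V}_1$ and $\widehat{V}_2$, forms the confidence interval for $\sigma(Y)$ with $\alpha=5\%$, and verifies that its Monte Carlo coverage is close to 95%.

```python
import math
import random

def sigma_ci(sample):
    """Delta-method 95% CI for sigma(Y), following questions 1-8."""
    n = len(sample)
    m1 = sum(sample) / n                      # Ybar
    m2 = sum(y ** 2 for y in sample) / n      # bar(Y^2)
    m3 = sum(y ** 3 for y in sample) / n
    m4 = sum(y ** 4 for y in sample) / n
    v_hat = m2 - m1 ** 2                      # Vhat(Y)
    v_y2 = m4 - m2 ** 2                       # top-left entry of Vhat0
    cov_y2_y = m3 - m2 * m1                   # off-diagonal entry of Vhat0
    # Vhat1 = (1, -2*Ybar) Vhat0 (1, -2*Ybar)', written out scalar by scalar
    v1_hat = v_y2 - 4 * m1 * cov_y2_y + 4 * m1 ** 2 * v_hat
    v2_hat = v1_hat / (4 * v_hat)             # Vhat2
    q = 1.96                                   # q_{1 - alpha/2} for alpha = 5%
    s_hat = math.sqrt(v_hat)
    half = q * math.sqrt(v2_hat / n)
    return s_hat - half, s_hat + half

random.seed(0)
true_sigma = 2.0
reps, covered = 500, 0
for _ in range(reps):
    sample = [random.gauss(0.0, true_sigma) for _ in range(500)]
    lo, hi = sigma_ci(sample)
    covered += lo <= true_sigma <= hi
coverage = covered / reps
print(coverage)  # should be close to 0.95
```

The line computing `v1_hat` is just the quadratic form $(1,-2\bar{Y})\widehat{V}_0(1,-2\bar{Y})'$ of question 4 expanded by hand.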
Exercise 2: quantiles as m-estimands
In the course, we showed that the median of a continuous random variable can be seen as an m-estimand. In this exercise, we want to show that the same applies to any quantile of $Y$.

Let $Y$ be a continuous random variable admitting a first moment, with a continuous and strictly increasing cdf $F_Y$. Then, let $q_\tau(Y)$ be the quantile of order $\tau$ of $Y$: $q_\tau(Y)=F_Y^{-1}(\tau)$. $q_\tau(Y)$ is the solution of $P(Y\le y)=\tau$. For any real number $u$, let $\rho_\tau(u)=(\tau-1\{u\le 0\})u$. $\rho_\tau(u)=\tau u$ for $u>0$, and $\rho_\tau(u)=(\tau-1)u$ for $u\le 0$. You can check that $\rho_{0.5}(u)=0.5|u|$. Show that
$$q_\tau(Y)=\underset{\theta\in\mathbb{R}}{\text{argmin}}\ E(\rho_\tau(Y-\theta)).$$
Hints: you should use the exact same steps as in the proof of Example 2 in the notes. You need to show that $E(\rho_\tau(Y-\theta))-E(\rho_\tau(Y-q_\tau(Y)))$ is strictly positive for any $\theta\ne q_\tau(Y)$. For that purpose, you should first consider a $\theta>q_\tau(Y)$. You should then decompose both $E(\rho_\tau(Y-\theta))$ and $E(\rho_\tau(Y-q_\tau(Y)))$ into three pieces depending on whether $Y\ge\theta$, $q_\tau(Y)\le Y<\theta$, or $Y<q_\tau(Y)$. Then you should replace the functions $\rho_\tau(Y-\theta)$ and $\rho_\tau(Y-q_\tau(Y))$ by simpler expressions within each of the 6 conditional expectations. Finally, at some point you will have to use the fact that $P(Y\le q_\tau(Y))=\tau$ and $P(Y>q_\tau(Y))=1-\tau$.
Solution
Let $\theta$ be strictly greater than $q_\tau(Y)$. Then,
writing, for brevity, $A=E((1-\tau)\theta+\tau q_\tau(Y)-Y\,|\,q_\tau(Y)\le Y<\theta)\,P(q_\tau(Y)\le Y<\theta)$,
$$\begin{aligned}
&E(\rho_\tau(Y-\theta))-E(\rho_\tau(Y-q_\tau(Y)))\\
={}&E(\rho_\tau(Y-\theta)|Y\ge\theta)P(Y\ge\theta)+E(\rho_\tau(Y-\theta)|q_\tau(Y)\le Y<\theta)P(q_\tau(Y)\le Y<\theta)\\
&+E(\rho_\tau(Y-\theta)|Y<q_\tau(Y))P(Y<q_\tau(Y))\\
&-E(\rho_\tau(Y-q_\tau(Y))|Y\ge\theta)P(Y\ge\theta)-E(\rho_\tau(Y-q_\tau(Y))|q_\tau(Y)\le Y<\theta)P(q_\tau(Y)\le Y<\theta)\\
&-E(\rho_\tau(Y-q_\tau(Y))|Y<q_\tau(Y))P(Y<q_\tau(Y))\\
={}&\tau E(Y-\theta|Y\ge\theta)P(Y\ge\theta)+(1-\tau)E(\theta-Y|q_\tau(Y)\le Y<\theta)P(q_\tau(Y)\le Y<\theta)\\
&+(1-\tau)E(\theta-Y|Y<q_\tau(Y))P(Y<q_\tau(Y))\\
&-\tau E(Y-q_\tau(Y)|Y\ge\theta)P(Y\ge\theta)-\tau E(Y-q_\tau(Y)|q_\tau(Y)\le Y<\theta)P(q_\tau(Y)\le Y<\theta)\\
&-(1-\tau)E(q_\tau(Y)-Y|Y<q_\tau(Y))P(Y<q_\tau(Y))\\
={}&\tau(q_\tau(Y)-\theta)P(Y\ge\theta)+(1-\tau)(\theta-q_\tau(Y))P(Y<q_\tau(Y))+A\\
={}&(\theta-q_\tau(Y))\left((1-\tau)P(Y<q_\tau(Y))-\tau P(Y\ge\theta)\right)+A\\
={}&(\theta-q_\tau(Y))\left((1-\tau)P(Y\le q_\tau(Y))-\tau P(Y\ge\theta)\right)+A\\
={}&(\theta-q_\tau(Y))\left((1-\tau)\tau-\tau P(Y\ge\theta)\right)+A\\
={}&(\theta-q_\tau(Y))\tau\left(P(Y>q_\tau(Y))-P(Y\ge\theta)\right)+A\\
={}&(\theta-q_\tau(Y))\tau\left(P(Y\ge q_\tau(Y))-P(Y\ge\theta)\right)+A\\
={}&(\theta-q_\tau(Y))\tau P(q_\tau(Y)\le Y<\theta)+A\\
={}&P(q_\tau(Y)\le Y<\theta)\,E((\theta-q_\tau(Y))\tau+(1-\tau)\theta+\tau q_\tau(Y)-Y\,|\,q_\tau(Y)\le Y<\theta)\\
={}&P(q_\tau(Y)\le Y<\theta)\,E(\theta-Y\,|\,q_\tau(Y)\le Y<\theta)\\
>{}&0.
\end{aligned}$$
The second equality follows from the law of iterated expectations. Various steps merely amount to replacing weak by strict inequalities (or the converse) in probabilities: these hold because $Y$ is assumed to have a continuous distribution. The last inequality holds because the cdf of $Y$ is strictly increasing and $\theta>q_\tau(Y)$, so $P(q_\tau(Y)\le Y<\theta)>0$. Moreover, when $q_\tau(Y)\le Y<\theta$, $\theta-Y$ is strictly positive, so $E(\theta-Y|q_\tau(Y)\le Y<\theta)$ is the expected value of a random variable which is always strictly positive, and is therefore strictly positive itself. A similar reasoning applies for $\theta<q_\tau(Y)$.
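The result is easy to check numerically. The sketch below (Python, simulated exponential draws; the grid search and all names are mine, not part of the problem set) minimizes the empirical analogue of $E(\rho_\tau(Y-\theta))$ over a grid of $\theta$ and confirms that the minimizer is the empirical $\tau$-quantile.

```python
import random

def rho(tau, u):
    """Check function rho_tau(u) = (tau - 1{u <= 0}) * u."""
    return (tau - (1 if u <= 0 else 0)) * u

random.seed(1)
y = [random.expovariate(1.0) for _ in range(500)]
tau = 0.75

# Grid search for the empirical analogue of argmin_theta E(rho_tau(Y - theta))
grid = [i / 200 for i in range(0, 801)]   # theta in [0, 4], step 0.005
theta_star = min(grid, key=lambda t: sum(rho(tau, yi - t) for yi in y) / len(y))

# The empirical tau-quantile (order statistic at rank tau*n)
emp_q = sorted(y)[int(tau * len(y))]
print(theta_star, emp_q)  # the two values should nearly coincide
```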
* Exercise 3: the expected value of a random variable is the average of its quantiles.
Let $Z$ be a random variable with a strictly increasing cdf $F_Z$ and with a strictly positive pdf $f_Z$. Let $\underline{z}$ and $\bar{z}$ respectively denote the lower and upper bounds of its support.

1. Show that $E(Z)=\int_0^1 F_Z^{-1}(\tau)d\tau$. Hint: you should use the substitution $z=F_Z^{-1}(\tau)$ in $\int_0^1 F_Z^{-1}(\tau)d\tau$ and the integration by substitution theorem (see the Wikipedia entry).
2. Infer from the previous question that $E(Y_1-Y_0)=\int_0^1 F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)d\tau$: the average treatment effect is the average of the quantile treatment effects.
Solution
1. To prove the result, we are going to use the integration by substitution theorem (see the Wikipedia entry for more details). Let's do the substitution $z=F_Z^{-1}(\tau)$ in $\int_0^1 F_Z^{-1}(\tau)d\tau$:
$$dz=\dot{F}_Z^{-1}(\tau)d\tau.$$
Moreover,
$$\dot{F}_Z^{-1}(\tau)=\frac{1}{\dot{F}_Z(F_Z^{-1}(\tau))}.$$
This can be seen by differentiating $F_Z(F_Z^{-1}(\tau))=\tau$ with respect to $\tau$. Therefore,
$$\dot{F}_Z^{-1}(\tau)=\frac{1}{f_Z(F_Z^{-1}(\tau))}=\frac{1}{f_Z(z)},$$
which finally implies that $f_Z(z)dz=d\tau$. Moreover, $F_Z^{-1}(0)=\underline{z}$ and $F_Z^{-1}(1)=\bar{z}$. Therefore, it follows from the integration by substitution theorem that
$$\int_0^1 F_Z^{-1}(\tau)d\tau=\int_{\underline{z}}^{\bar{z}} z f_Z(z)dz=E(Z).$$
2. It follows from the previous question that
$$E(Y_1-Y_0)=E(Y_1)-E(Y_0)=\int_0^1 F_{Y_1}^{-1}(\tau)d\tau-\int_0^1 F_{Y_0}^{-1}(\tau)d\tau=\int_0^1 F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)d\tau.$$
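The identity of question 1 can be illustrated with a distribution whose quantile function is known in closed form. For $Z\sim\text{Exponential}(1)$, $F_Z^{-1}(\tau)=-\ln(1-\tau)$ and $E(Z)=1$, so averaging the quantile function over a fine grid on $(0,1)$ should give a number close to 1 (a Python sketch with my own variable names):

```python
import math

# Z ~ Exponential(1): F_Z^{-1}(tau) = -ln(1 - tau) and E(Z) = 1.
# Averaging the quantile function over a midpoint grid on (0, 1)
# approximates the integral of question 1.
m = 100_000
avg_of_quantiles = sum(-math.log(1 - (k + 0.5) / m) for k in range(m)) / m
print(avg_of_quantiles)  # close to E(Z) = 1
```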
Exercise 4: using quantile regressions to measure changes in the US wage structure
(based on Angrist et al. (2006))
Let $X$ be a $k\times 1$ vector of random variables. Let $q_\tau(Y|X=x)$ denote the quantile of order $\tau$ of the distribution of $Y|X=x$. $q_\tau(Y|X=x)$ is just a regular quantile, so it follows from the previous exercise that
$$q_\tau(Y|X=x)=\underset{\theta\in\mathbb{R}}{\text{argmin}}\ E(\rho_\tau(Y-\theta)|X=x).$$
Now, let $q_\tau(Y|X)$ be a random variable equal to $q_\tau(Y|X=x)$ when $X=x$. $q_\tau(Y|X)$ is called the $\tau$th quantile of $Y$ conditional on $X$.
$$q_\tau(Y|X)=\underset{g(.)}{\text{argmin}}\ E(\rho_\tau(Y-g(X))).\qquad(0.0.1)$$
Among all possible functions of $X$, $g(X)$, $q_\tau(Y|X)$ is the one which minimizes $E(\rho_\tau(Y-g(X)))$. Indeed, assume to simplify that $X$ is a discrete random variable taking $J$ values $x_1,...,x_J$. Then, it follows from the law of iterated expectations that
$$E(\rho_\tau(Y-g(X)))=E(E(\rho_\tau(Y-g(X))|X))=\sum_{j=1}^J E(\rho_\tau(Y-g(x_j))|X=x_j)P(X=x_j).$$
For each $j$, $E(\rho_\tau(Y-g(x_j))|X=x_j)$, viewed as a function of $g(x_j)$, is minimized at $g(x_j)=q_\tau(Y|X=x_j)$. Therefore $q_\tau(Y|X)=\sum_{j=1}^J 1\{X=x_j\}q_\tau(Y|X=x_j)$ is the function of $X$ which minimizes $E(\rho_\tau(Y-g(X)))$. Equation (0.0.1) motivates the definition of the quantile regression function $X'\beta_\tau$, with
$$\beta_\tau=\underset{b\in\mathbb{R}^k}{\text{argmin}}\ E(\rho_\tau(Y-X'b)).$$
Angrist et al. (2006) show that $X'\beta_\tau$ is a weighted MMSE approximation of $q_\tau(Y|X)$ within the set of all linear functions of $X$. When $q_\tau(Y|X)$ is indeed a linear function of $X$, $X'\beta_\tau$ is equal to it. A natural estimator of $\beta_\tau$ is
$$\widehat{\beta}_\tau=\underset{b}{\text{argmin}}\ \frac{1}{n}\sum_{i=1}^n \rho_\tau(Y_i-X_i'b).$$
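To make this objective concrete, the sketch below (Python, simulated data rather than the census files; all names are mine) evaluates the empirical objective $\frac{1}{n}\sum_i\rho_\tau(Y_i-X_i'b)$ for a model with an intercept and one regressor, and minimizes it by a crude grid search. Stata's qreg solves the same minimization, but with a proper linear-programming algorithm rather than a grid.

```python
import random

def rho(tau, u):
    """Check function rho_tau(u)."""
    return (tau - (1 if u <= 0 else 0)) * u

def qreg_loss(tau, a, b, xs, ys):
    """Empirical analogue of E(rho_tau(Y - a - b*X))."""
    return sum(rho(tau, y - a - b * x) for x, y in zip(xs, ys)) / len(ys)

# Simulated data with true median regression line y = 1 + 2x
random.seed(2)
xs = [random.uniform(0, 1) for _ in range(200)]
ys = [1 + 2 * x + random.gauss(0, 0.2) for x in xs]

# Crude grid search over (intercept, slope), step 0.05
tau = 0.5
best = min(((a / 20, b / 20) for a in range(0, 41) for b in range(20, 61)),
           key=lambda ab: qreg_loss(tau, ab[0], ab[1], xs, ys))
print(best)  # should be near the true (1.0, 2.0)
```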
The files census80.dta, census90.dta, and census00.dta contain representative samples of US born black and white men aged 40-49 with at least five years of education, with positive annual earnings and hours worked in the year preceding the census, respectively from the 1980, 1990, and 2000 censuses. The goal of the exercise is to estimate, for each census wave, returns of years of education on wages, on different quantiles of the distribution of wages, controlling for age, experience, and race. Here, returns should be understood as a correlational concept, not as a causal concept. Coefficients of education in the regressions you will run should be interpreted as follows: an increase of one year of education is associated with an increase of the $\tau$th quantile of wages by XXXX, controlling for YYYY. Even though these parameters are not causal ones, they are still very interesting: comparing the structure of these returns across census waves will teach us something on the evolution of the structure of wages in the US over these 20 years.

Each of these data sets contains the following variables: age (age in years of the individual), educ (number of years of education), logwk (log wage), exper (number of years of professional experience), exper2 (number of years of professional experience squared), black (a dummy for whether someone is black).
1. Open census80.dta. Using the qreg command, run quantile regressions of order 10, 25, 50, 75, and 90 of logwk on educ, age, exper, exper2, and black.

2. Interpret the coefficient of educ in each regression, following the guideline I gave you in the introduction of the exercise. Remember that wage is in logarithm, and that you control for age, exper, exper2, and black in your regression.

3. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 1980, did years of education have very different returns across different quantiles of the distribution of wages?
4. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education, holding the other variables constant. Your graph should consist in 5 lines: the line mapping education into education $\times\widehat{\beta}_{0.1}^{educ}$, the line mapping education into education $\times\widehat{\beta}_{0.25}^{educ}$, etc., where $\widehat{\beta}_{0.1}^{educ}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}_{0.25}^{educ}$ denotes the coefficient of education in the quantile regression of order 25, etc. The lines need not correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not which ones are more increasing or decreasing than the others.
5. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?
6. Open census90.dta. Using the qreg command, run quantile regressions of order 10, 25, 50, 75, and 90 of logwk on educ, age, exper, exper2, and black.

7. Interpret the coefficient of educ in each regression.

8. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 1990, did years of education have very different returns across different quantiles of the distribution of wages? How do the coefficients of the 1990 quantile regressions compare to those of the 1980 regressions?
9. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education, holding the other variables constant. Your graph should consist in 5 lines: the line mapping education into education $\times\widehat{\beta}_{0.1}^{educ}$, the line mapping education into education $\times\widehat{\beta}_{0.25}^{educ}$, etc., where $\widehat{\beta}_{0.1}^{educ}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}_{0.25}^{educ}$ denotes the coefficient of education in the quantile regression of order 25, etc. The lines need not correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not which ones are more increasing or decreasing than the others.
10. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?
11. Open census00.dta. Using the qreg command, run quantile regressions of order 10, 25, 50, 75, and 90 of logwk on educ, age, exper, exper2, and black.

12. Interpret the coefficient of educ in each regression.

13. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 2000, did years of education have very different returns across different quantiles of the distribution of wages?
14. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education, holding the other variables constant. Your graph should consist in 5 lines: the line mapping education into education $\times\widehat{\beta}_{0.1}^{educ}$, the line mapping education into education $\times\widehat{\beta}_{0.25}^{educ}$, etc., where $\widehat{\beta}_{0.1}^{educ}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}_{0.25}^{educ}$ denotes the coefficient of education in the quantile regression of order 25, etc. The lines need not correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not which ones are more increasing or decreasing than the others.
15. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?

16. Compare the three graphs you drew, to summarize the results from the quantile regressions in 1980, 1990, and 2000. How did the returns to education evolve over these 20 years? What have been the main changes in the structure of US wages over that period?

17. Find another potentially interesting set of quantile regressions you could run, run them, interpret their results, and send me a 10-line email explaining the regressions you ran, and your results.
Solution
1. See code.
2. $\widehat{\beta}_{0.1}^{educ}=0.073$. As wages are in logarithm, and the regression controls for age, experience, experience squared, and race, the regression coefficient should be interpreted as follows: in 1980, one more year of education was associated with a 7.3% increase of the 10th quantile of wages, holding age, experience, experience squared, and race constant. All the other coefficients can be interpreted similarly.
3. The confidence interval of $\beta_{0.1}^{educ}$ overlaps with those of $\beta_{0.25}^{educ}$, $\beta_{0.75}^{educ}$, and $\beta_{0.90}^{educ}$, implying that these coefficients might not be significantly different (checking that confidence intervals overlap is not, strictly speaking, a test of whether two parameters are significantly different, but it is a good indication). Similarly, the confidence intervals of $\beta_{0.25}^{educ}$, $\beta_{0.50}^{educ}$, $\beta_{0.75}^{educ}$, and $\beta_{0.90}^{educ}$ most often overlap. In any event these coefficients are not economically very different: they are all included between 0.068 and 0.079, implying at most a one percentage point difference in returns to education across those quantiles of wages.

4. Education has fairly similar positive returns across all quantiles. Therefore, the 5 lines should all have the same positive slope. You should therefore draw a graph with 5 parallel and increasing lines.
5. $\widehat{\beta}_{OLS}^{educ}=0.072$: in 1980, one more year of education was associated with a 7.2% increase of the mean of wages, holding age, experience, experience squared, and race constant. This coefficient is not very surprising given the result of Exercise 3: education is associated with a roughly 7% increase of all quantiles of the distribution of wages, and the effect of education on the average is the average of its effects on all quantiles, so its effect on the average of wages is also to increase it by roughly 7% per year of education.
6. See code.
7. The interpretation of the coefficients is the same as in question 2.
8. The estimates of $\beta_{0.1}^{educ}$, $\beta_{0.25}^{educ}$, $\beta_{0.50}^{educ}$, and $\beta_{0.75}^{educ}$ are not economically very different (they are all included between 0.106 and 0.111), and their confidence intervals most often overlap. On the other hand, the estimate of $\beta_{0.90}^{educ}$ is substantially larger, and its confidence interval does not overlap with that of any of the other coefficients. This implies that years of education are associated with a larger increase of the 90th percentile of wages than of the other percentiles. Also, all the coefficients are larger than in 1980, implying that the returns to years of education on the labor market increased over the period.

9. Education has fairly similar positive returns across the first four quantiles, but higher returns on the last one. Therefore, your graph should bear 4 parallel and increasing lines, and a 5th, more steeply increasing upper line.
10. $\widehat{\beta}_{OLS}^{educ}=0.113$: in 1990, one more year of education was associated with an 11.3% increase of the mean of wages, holding age, experience, experience squared, and race constant. This coefficient is similar to the one we derived in the first four quantile regressions, even though it is slightly higher. This coefficient is not very surprising given the result of Exercise 3: education is associated with an 11% increase of all quantiles of the distribution of wages except for the higher quantiles, where its effect is larger; the effect of education on the average is the average of its effects on all quantiles; so its effect on the average of wages is slightly greater than 11%.
11. See code.
12. The interpretation of the coefficients is the same as in question 2.
13. The estimates of $\beta_{0.1}^{educ}$, $\beta_{0.25}^{educ}$, $\beta_{0.50}^{educ}$, $\beta_{0.75}^{educ}$, and $\beta_{0.90}^{educ}$ are all substantially different, and their confidence intervals never overlap. Returns to education consistently increase for larger quantiles.

14. Education has positive returns on each quantile, but larger returns for larger quantiles. Therefore, your graph should bear 5 non-parallel increasing lines, with slopes increasing from the bottom line to the top line.
15. $\widehat{\beta}_{OLS}^{educ}=0.115$: in 2000, one more year of education was associated with an 11.5% increase of the mean of wages, holding age, experience, experience squared, and race constant. This coefficient is similar to the one we derived in the third quantile regression (that of the median). It is substantially larger than $\widehat{\beta}_{0.1}^{educ}$ and $\widehat{\beta}_{0.25}^{educ}$, and substantially smaller than $\widehat{\beta}_{0.75}^{educ}$ and $\widehat{\beta}_{0.90}^{educ}$. Once again, this reflects the fact that the effect of education on the average wage is the average of its effects on all quantiles.
16. The effect of education on average wages was multiplied by more than 1.5 over the period. This mostly comes from the top of the distribution: the effect of education on the 90th percentile of wages was multiplied by more than 2. When you compare your three graphs, they should show you that the variance of wages within groups of people with the same level of education went from being constant across educational levels to being an increasing function of education. In 2000, education was much more rewarded on the labor market than it was in 1980. On the other hand, within highly educated people, the variance of wages is much larger in 2000 than it was in 1980, implying that other factors now play a greater role in the determination of wages.
Exercise 5: using quantile regressions to measure heterogeneous returns to a boarding school for disadvantaged students (based on Behaghel et al. (2014))

Behaghel et al. (2014) ran an experiment in which applicants to a boarding school for disadvantaged students were randomly admitted to the school. The boardingschool.dta data set contains three variables. mathsscore is students' maths score two years after the lottery which determined who was admitted to the school took place. This measure is standardized with respect to the mean and standard deviation of the raw score in the control group. This is the outcome variable in this exercise, so we denote it $Y$. baselinemathsscore is a measure of students' baseline maths ability, before the lottery took place. We will use it as a control variable, so we denote it $X$. boarding is a dummy equal to 1 if a student won the lottery and gained admission to the boarding school. This is the treatment variable, so it is denoted $D$.
1. Run a very simple regression to show that students who won the lottery had very similar baseline maths scores to those who lost the lottery. If you were to compare the treatment and the control groups not on one, but on 20 baseline characteristics, how many significant differences at the 5% level should you expect to find?

2. Run a very simple regression to measure the effect of the boarding school on students' average maths test scores two years after the lottery. Interpret the coefficient from this regression (do not forget that test scores are standardized with respect to the mean and standard deviation of the raw score in the control group).

3. Run a slightly more complicated regression to increase the statistical precision of your coefficient of interest. Does this coefficient change a lot with respect to the previous regression? Was this expected? Does the standard error of your main coefficient of interest change a lot with respect to the previous regression? Was this expected?
4. We have
$$q_\tau(Y|D)=1\{D=0\}q_\tau(Y|D=0)+1\{D=1\}q_\tau(Y|D=1)=q_\tau(Y|D=0)+D\left(q_\tau(Y|D=1)-q_\tau(Y|D=0)\right).$$
This shows that $q_\tau(Y|D)$ is a linear function of $(1,D)'$. Therefore, a quantile regression of $Y$ on a constant and $D$ will estimate this conditional quantile function: the coefficient of the constant will be an estimate of $q_\tau(Y|D=0)$, while the coefficient of $D$ will be an estimate of $q_\tau(Y|D=1)-q_\tau(Y|D=0)$.

(a) Show that $\widehat{\beta}_\tau$ is an estimate of $F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$, the quantile treatment effect of order $\tau$.
(b) Plot a graph with $\tau$ on the x-axis and $\widehat{\beta}_\tau$ and the upper and lower bounds of its 90% confidence interval on the y-axis, for $\tau$ ranging from 0.03 to 0.97. For that purpose, you need to:
1. run a quantile regression of order $\tau$ of $Y$ on $D$ using the qreg command in Stata, for every $\tau$ ranging from 0.03 to 0.97
2. store $\tau$, $\widehat{\beta}_\tau$, $\widehat{\beta}_\tau-1.64*se(\widehat{\beta}_\tau)$, and $\widehat{\beta}_\tau+1.64*se(\widehat{\beta}_\tau)$ in a matrix
3. clear the data
4. form a data set containing the values you stored in the matrix, using the svmat command
5. plot the requested graph.
(c) Does the boarding school increase the lowest quantiles of the distribution of scores more, those close to the median, or the highest ones?

(d) Does the graph confirm the result in exercise 3?
5. The rank invariance assumption states that the treatment does not change the rank of an individual in the distribution of the outcome. The individual at the 90th percentile of the distribution of $Y_0$ will also be at the 90th percentile of the distribution of $Y_1$. Under this assumption, quantile treatment effects can be interpreted as individual treatment effects. $F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$ compares the $\tau$th quantile of the distributions of $Y_1$ and $Y_0$. Under the rank invariance assumption, it is the same individual who is at the $\tau$th quantile of these two distributions, so $F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$ is a measure of the effect of the treatment on her.
The rank invariance assumption is not a credible assumption, but it is useful to assess treatment effect heterogeneity. Indeed, any deviation from rank invariance implies more treatment effect heterogeneity than that obtained under rank invariance. One can formally show that $V(Y_1-Y_0)\ge V^{RI}(Y_1-Y_0)$: the true variance of the treatment effect, which we cannot measure because we do not observe $Y_1-Y_0$, is at least as high as $V^{RI}(Y_1-Y_0)$, the variance of the treatment effect under rank invariance. That's an interesting result, because $V^{RI}(Y_1-Y_0)$ is something we can estimate. In the simple case where the sample bears as many treated as untreated observations ($n_0=n_1=n$), under rank invariance the $Y_0$ of the observation with the $i$th largest value of $Y_1$, $Y_{(i)1}$, is merely $Y_{(i)0}$, the $i$th largest value of $Y_0$. Therefore, one can use
$$\frac{1}{n}\sum_{i=1}^n\left(Y_{(i)1}-Y_{(i)0}\right)^2-\left(\frac{1}{n}\sum_{i=1}^n\left(Y_{(i)1}-Y_{(i)0}\right)\right)^2$$
to estimate $V^{RI}(Y_1-Y_0)$.
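This estimator is easy to code: sort the treated and control outcomes, pair them by rank, and take the variance of the paired differences. A minimal Python sketch (toy numbers, not the boarding-school data; the function name is mine):

```python
def v_ri(treated, control):
    """Estimate V^RI(Y1 - Y0) when n0 = n1: pair the i-th largest treated
    outcome with the i-th largest control outcome, then take the variance
    of the paired differences."""
    assert len(treated) == len(control)
    n = len(treated)
    diffs = [y1 - y0 for y1, y0 in zip(sorted(treated), sorted(control))]
    mean = sum(diffs) / n
    return sum(d ** 2 for d in diffs) / n - mean ** 2

# Toy example: if Y1 = 2 * Y0 under rank invariance, the rank-paired
# differences equal the sorted Y0 values, so V^RI equals Var(Y0).
y0 = [1.0, 2.0, 3.0, 4.0]
y1 = [2.0, 4.0, 6.0, 8.0]
print(v_ri(y1, y0))  # 1.25, the variance of [1, 2, 3, 4]
```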
We will not prove this result that any deviation from rank invariance implies more treatment effect heterogeneity than that obtained under rank invariance, but let me give you a little bit of intuition. Assume that $F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$ is increasing in $\tau$. Rank invariance already implies treatment effect heterogeneity: under rank invariance, $F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$ increasing in $\tau$ implies that people at the upper quantiles of the distribution of the outcome have larger effects than those at the lower quantiles. Now, any deviation from rank invariance implies even more treatment effect heterogeneity. Under rank invariance, $F_{Y_1}^{-1}(1)-F_{Y_0}^{-1}(1)$ is the value of the treatment effect for the person with the highest value of $Y_1$. If rank invariance is violated, this person, instead of having the highest $Y_0$, must have a lower rank in the distribution of $Y_0$: the counterfactual value of the outcome for that person under non-treatment is no longer $F_{Y_0}^{-1}(1)$, but $F_{Y_0}^{-1}(\tau)$, for some $\tau<1$. This implies that the true treatment effect for that person, $F_{Y_1}^{-1}(1)-F_{Y_0}^{-1}(\tau)$, is even larger than what we had concluded under rank invariance. Similarly, if the person with the lowest value of $Y_1$ actually did not have the lowest value of $Y_0$, then the true treatment effect for that person is even lower than what we had concluded under rank invariance. In this example, deviations from rank invariance mean that people at upper quantiles have even larger effects than what we had concluded under rank invariance, while people at lower quantiles have even lower effects, implying that overall, treatment effects are even more heterogeneous than what we had concluded under rank invariance.

After this long introduction, consider again the graph you obtained in question 4.b).
(a) Look at whether the confidence intervals of $F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$ for different values of $\tau$ cross or not. At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$ is constant for every $\tau$? At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$ takes only two different values? At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$ takes only three different values?

(b) Under the rank invariance assumption, what is the minimum number of different values that $Y_1-Y_0$ must take? Are these values very different from each other? Is the effect of the boarding school on students' test scores heterogeneous?
6. Given the shape of the quantile treatment effect function $\tau\mapsto F_{Y_1}^{-1}(\tau)-F_{Y_0}^{-1}(\tau)$, is $V(Y_1)$ likely to be greater or smaller than $V(Y_0)$? How could you estimate $V(Y_1)$ and $V(Y_0)$? Which results could you use to derive confidence intervals for these two quantities?
7. This policy is expensive: it multiplies by two the expenditure per year per student. However, the gain on average maths scores it produces is roughly the same as the one produced by dividing class size by two, which also multiplies by two the expenditure per year per student. The literature has found that class size reductions have larger effects for the weakest students. Based on the results of the previous questions, find one reason why you might prefer class size reduction over the boarding school policy if you were a social planner. Send me a 5-line email with your answer.
Solution
1. You should regress $X$ on $D$. The coefficient is very small, and insignificant. The lottery created almost perfectly comparable treatment and control groups from the perspective of their baseline maths score. If you were to run 20 regressions of baseline characteristics (gender, parents' education...) on $D$, you should get (on average) one significant coefficient. The coefficient of these regressions is an estimate of $E(X|D=1)-E(X|D=0)$. By virtue of the randomization, and because baseline characteristics were fixed before the lottery, $E(X|D=1)-E(X|D=0)=0$. Therefore, $H_0:\beta=0$ is satisfied. Still, we know that when $H_0$ is satisfied, a test of $H_0$ based on whether $\widehat{\beta}$ is significantly different from 0 at the 5% level has a 5% probability of wrongly rejecting $H_0$. So out of 20 tests, you will on average wrongly reject one.
2. You should regress $Y$ on $D$. The coefficient is 0.295. As the data comes from a randomized experiment, this coefficient can receive a causal interpretation: spending two years in the boarding school increases students' test scores by 29.5% of a standard deviation.

3. You can regress $Y$ on $D$ and $X$. The coefficient of $D$ is almost the same, 0.289. This is what we expected, following the second omitted variable formula: as $D$ was randomly allocated, it must be independent of $X$, a baseline characteristic which was fixed before the lottery, therefore $\mu=0$ in this formula. The standard error of this coefficient is 15% lower. This is what we expected, following Theorem 5.2.2 in the notes: $R^2_{D,X}=0$, $\frac{1-R^2_{Y,(D,X)}}{1-R^2_{Y,D}}=\frac{1-0.28}{1-0.01}=0.73$, and $\sqrt{0.73}=0.85$.
4.

(a) $\widehat{\beta}_\tau$ is an estimate of $q_\tau(Y|D=1) - q_\tau(Y|D=0)$.
$$q_\tau(Y|D=1) - q_\tau(Y|D=0) = F_{Y|D=1}^{-1}(\tau) - F_{Y|D=0}^{-1}(\tau) = F_{Y_1|D=1}^{-1}(\tau) - F_{Y_0|D=0}^{-1}(\tau) = F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau).$$
(b) See code. The resulting graph is shown below.
Figure 1: QTE of the boarding school
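Since the code itself is not reproduced here, the computation behind the graph can be sketched as follows on simulated data (the sample size, distributions, and variable names are my assumptions, not the boarding-school data). For a binary treatment, the quantile-regression coefficient of question 4(a) coincides with the difference of empirical quantiles between treated and control groups, which is what the sketch computes.

```python
import numpy as np

# Minimal QTE sketch on simulated data (not the boarding-school data):
# for a binary treatment D, the QTE at tau is the difference between the
# empirical tau-quantiles of Y in the treated and control groups.
rng = np.random.default_rng(0)
n = 4000
d = rng.integers(0, 2, size=n)                  # lottery assignment
y = np.where(d == 1,
             rng.normal(0.3, 1.2, n),           # treated: higher mean and spread
             rng.normal(0.0, 1.0, n))           # control
taus = np.linspace(0.05, 0.95, 19)
qte = np.quantile(y[d == 1], taus) - np.quantile(y[d == 0], taus)
# Plotting qte against taus reproduces a graph of the same kind as Figure 1.
```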
(c) The boarding school has a negative effect on the first quintile of the distribution of test scores, even though few estimates are statistically significant. It has a moderately positive effect on the second, third, and fourth quintiles of test scores (around 25-30% of a standard deviation). A fairly large number of QTE are significantly different from 0 at the 10% level. Finally, the boarding school has a very large and significant effect on the top quintile of the distribution of test scores (around 0.9 standard deviations).
(d) Yes, it does. We saw in question 2 that the boarding school increases average test scores by 0.29 sd. Therefore, $\widehat{E}(Y_1 - Y_0) = 0.29$. Estimated QTE are roughly speaking equal to $-0.2$ over the first quintile of the distribution, to 0.25 over the next three quintiles, and to 0.8 over the last quintile. $-0.2 \times 0.2 + 0.25 \times 0.6 + 0.8 \times 0.2 \approx 0.29$.
5.

(a) Not all confidence intervals overlap. For instance, the confidence intervals of the QTE above 0.8 do not overlap with those of the QTE below 0.15 (up to some exceptions). Similarly, the confidence intervals of $F_{Y_1}^{-1}(0.42) - F_{Y_0}^{-1}(0.42)$, $F_{Y_1}^{-1}(0.62) - F_{Y_0}^{-1}(0.62)$, and $F_{Y_1}^{-1}(0.75) - F_{Y_0}^{-1}(0.75)$ up to $F_{Y_1}^{-1}(0.79) - F_{Y_0}^{-1}(0.79)$ do not overlap with that of $F_{Y_1}^{-1}(0.05) - F_{Y_0}^{-1}(0.05)$. Finally, the confidence interval of $F_{Y_1}^{-1}(0.95) - F_{Y_0}^{-1}(0.95)$ does not overlap with that of any QTE below 0.6. We can therefore reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is a constant function of $\tau$. It must at least take three different values (for instance, $F_{Y_1}^{-1}(0.95) - F_{Y_0}^{-1}(0.95)$ is significantly different from $F_{Y_1}^{-1}(0.42) - F_{Y_0}^{-1}(0.42)$, which is itself significantly different from $F_{Y_1}^{-1}(0.05) - F_{Y_0}^{-1}(0.05)$).
(b) Under rank invariance, the $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ are individual-level values of $Y_1 - Y_0$. Following the results of the previous question, under rank invariance $Y_1 - Y_0$ must take at least three values: a negative value for people in the lowest quintile of the distribution, a value around 25-30% of a standard deviation for people in the 2nd, 3rd, and 4th quintiles, and a value around 90% of a standard deviation for people in the highest quintile. Even under rank invariance, the effect of the treatment already seems to be highly heterogeneous across students. As any deviation from rank invariance will increase this heterogeneity even more, we can assert that the effect of the school is highly heterogeneous across students.
6. The treatment increases the largest values of the outcome and diminishes the smallest ones. Therefore, it probably increased the variance of test scores. To check this, you could use $\overline{Y^2|D=1} - \left(\overline{Y|D=1}\right)^2$ to estimate $V(Y_1)$, and $\overline{Y^2|D=0} - \left(\overline{Y|D=0}\right)^2$ to estimate $V(Y_0)$. You could rely on results from the first exercise to derive confidence intervals for these two quantities, or you could use the bootstrap.
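These plug-in estimates and the delta-method confidence interval from the first exercise could be computed as follows; this is a sketch on simulated data, and the function and variable names are my own choices.

```python
import numpy as np

def var_ci(y, z=1.96):
    """Plug-in variance estimate with the delta-method CI of Exercise 1:
    Vhat(Y) = mean(Y^2) - mean(Y)^2, with asymptotic variance
    V1 = (1, -2*mean(Y)) V0 (1, -2*mean(Y))', where V0 is the
    covariance matrix of (Y^2, Y); z = q_{1 - alpha/2} for alpha = 5%."""
    n = len(y)
    v_hat = np.mean(y**2) - np.mean(y)**2
    grad = np.array([1.0, -2.0 * np.mean(y)])
    v0_hat = np.cov(np.vstack([y**2, y]))      # consistent estimator of V0
    v1_hat = grad @ v0_hat @ grad
    half = z * np.sqrt(v1_hat / n)
    return v_hat, (v_hat - half, v_hat + half)

# Hypothetical usage on simulated treated/control score vectors:
rng = np.random.default_rng(0)
y1, y0 = rng.normal(0.3, 1.2, 5000), rng.normal(0.0, 1.0, 5000)
v1_est, ci1 = var_ci(y1)
v0_est, ci0 = var_ci(y0)
```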
7. It follows from the quantile treatment effects analysis that the boarding school increases the highest quantiles and decreases the lowest ones. On the contrary, class size reductions benefit weak students more, so they probably increase the lowest quantiles of the distribution more than the highest ones. A purely utilitarian planner would be indifferent between the two policies, as they have the same cost and the same average treatment effect. However, a planner with a welfare function putting more weight on the lower quantiles of the distribution than on the upper quantiles will prefer class size reduction.
* Exercise 6: Proving that Theorem 6.4.4 applies to the sample median

Let Y be a continuous random variable with strictly increasing and continuous cdf $F_Y$ and strictly positive and continuous pdf $f_Y$. Let $\underline{y}$ and $\overline{y}$ be the lower and upper bounds of the support of Y. The goal of this exercise is to show that the median of Y is an m-estimand satisfying Assumption 9 in the notes, and to infer from this its asymptotic behavior using Theorem 6.4.4. It follows from Example 2 in the notes that
$$me(Y) = \operatorname{argmin}_{\theta \in \mathbb{R}} E(|Y - \theta|).$$
Therefore,
$$me(Y) = \operatorname{argmin}_{\theta \in \mathbb{R}} E(|Y - \theta| - |Y|).$$
So $me(Y)$ can be seen as an m-estimand with $m(y, \theta) = |y - \theta| - |y|$.
1. Show that $\theta \mapsto m(y, \theta)$ is differentiable in $\theta$ for every $\theta$ different from $y$. Give an expression of its derivative. Hint: you should start by considering a value of $\theta \geq 0$, then you should give explicit expressions of $m(y, \theta)$ depending on the value of $y$ (there are three cases you must distinguish), then you should differentiate these expressions with respect to $\theta$. Then you should do the same thing for $\theta < 0$.

2. Infer from the previous question that $m(y, \theta)$ satisfies the first requirement of Assumption 9: $\theta \mapsto m(y, \theta)$ is differentiable at $\theta_0 = me(Y)$ for almost every $y \in \mathbb{R}$.

3. Show that $m(y, \theta)$ satisfies the second requirement of Assumption 9: for every $(\theta_1, \theta_2)$, $|m(y, \theta_1) - m(y, \theta_2)| \leq |\theta_1 - \theta_2|$.
4. Let $M(\theta) = E(m(Y, \theta)) = E(|Y - \theta| - |Y|)$.

(a) Show that for any $\theta \geq 0$,
$$E(|Y - \theta| - |Y|) = \int_{\underline{y}}^{0} \theta f_Y(y)dy + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \int_{\theta}^{\overline{y}} \theta f_Y(y)dy.$$

(b) Infer from the previous question that
$$E(|Y - \theta| - |Y|) = \theta F_Y(0) + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \theta(1 - F_Y(\theta)).$$

(c) Infer from the previous question that
$$E(|Y - \theta| - |Y|) = \int_{0}^{\theta} -2y f_Y(y)dy - \theta(1 - 2F_Y(\theta)).$$

(d) Infer from the previous question and from an integration by parts (see the Wikipedia entry if needed) that
$$E(|Y - \theta| - |Y|) = 2\int_{0}^{\theta} F_Y(y)dy - \theta.$$

(e) Conclude that for any $\theta \geq 0$, $M(\theta)$ is twice continuously differentiable. Give $\dot{M}(\theta)$ and $\ddot{M}(\theta)$.

(f) Follow the same steps as in the five preceding subquestions to show that for any $\theta < 0$,
$$E(|Y - \theta| - |Y|) = -2\int_{\theta}^{0} F_Y(y)dy - \theta.$$

(g) Conclude that for any $\theta < 0$, $M(\theta)$ is twice continuously differentiable. Give $\dot{M}(\theta)$ and $\ddot{M}(\theta)$.

(h) Check that $\dot{M}(\theta)$ and $\ddot{M}(\theta)$ are continuous at 0. Conclude that $M(\theta)$ is twice continuously differentiable for any $\theta \in \mathbb{R}$.

(i) Check that $\ddot{M}(\theta)$ is invertible for any $\theta$. Give $\ddot{M}(\theta_0)^{-1}$.
5. Conclude from the preceding questions that $me(Y)$ is an m-estimand satisfying Assumption 9. Apply Theorem 6.4.4 to deduce from this that $\widehat{me}(Y)$ is an asymptotically normal estimator of $me(Y)$. Give an expression of its asymptotic variance.
Solution
1. Assume $\theta \geq 0$. If $y \geq \theta$, $m(y, \theta) = y - \theta - y = -\theta$. If $0 \leq y < \theta$, $m(y, \theta) = \theta - 2y$. If $y < 0$, $m(y, \theta) = \theta$. Therefore, for $\theta \geq 0$,
$$m(y, \theta) = \begin{cases} -\theta & \text{if } y \geq \theta \\ \theta - 2y & \text{if } 0 \leq y < \theta \\ \theta & \text{if } y < 0. \end{cases}$$
Therefore $\theta \mapsto m(y, \theta)$ is differentiable for every $\theta \neq y$, with
$$\dot{m}(y, \theta) = \begin{cases} -1 & \text{if } \theta < y \\ 1 & \text{if } \theta > y. \end{cases}$$
Similarly, if $\theta < 0$,
$$m(y, \theta) = \begin{cases} -\theta & \text{if } y \geq 0 \\ 2y - \theta & \text{if } \theta \leq y < 0 \\ \theta & \text{if } y < \theta, \end{cases}$$
so $\theta \mapsto m(y, \theta)$ is differentiable for every $\theta \neq y$, with
$$\dot{m}(y, \theta) = \begin{cases} -1 & \text{if } \theta < y \\ 1 & \text{if } \theta > y. \end{cases}$$
Therefore, irrespective of whether $\theta \geq 0$ or $\theta < 0$, $\theta \mapsto m(y, \theta)$ is differentiable for every $\theta \neq y$, with
$$\dot{m}(y, \theta) = \begin{cases} -1 & \text{if } \theta < y \\ 1 & \text{if } \theta > y. \end{cases}$$

2. A consequence of the previous question is that the only $y$ for which $\theta \mapsto m(y, \theta)$ is not differentiable at $\theta_0 = me(Y)$ is $y = \theta_0$. Therefore, $m(y, \theta)$ satisfies the first requirement of Assumption 9: $\theta \mapsto m(y, \theta)$ is differentiable at $\theta_0$ for almost every $y \in \mathbb{R}$.

3.
$$|m(y, \theta_1) - m(y, \theta_2)| = \big||y - \theta_1| - |y - \theta_2|\big| \leq |y - \theta_1 - (y - \theta_2)| = |\theta_1 - \theta_2|.$$
The inequality follows from the triangle inequality.
4.

(a)
$$\begin{aligned}
E(|Y - \theta| - |Y|) &= \int_{\underline{y}}^{\overline{y}} (|y - \theta| - |y|) f_Y(y)dy \\
&= \int_{\underline{y}}^{0} (|y - \theta| - |y|) f_Y(y)dy + \int_{0}^{\theta} (|y - \theta| - |y|) f_Y(y)dy + \int_{\theta}^{\overline{y}} (|y - \theta| - |y|) f_Y(y)dy \\
&= \int_{\underline{y}}^{0} \theta f_Y(y)dy + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \int_{\theta}^{\overline{y}} \theta f_Y(y)dy.
\end{aligned}$$

(b) Therefore,
$$\begin{aligned}
E(|Y - \theta| - |Y|) &= \theta \int_{\underline{y}}^{0} f_Y(y)dy + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \theta \int_{\theta}^{\overline{y}} f_Y(y)dy \\
&= \theta F_Y(0) + \int_{0}^{\theta} (\theta - 2y) f_Y(y)dy - \theta(1 - F_Y(\theta)).
\end{aligned}$$

(c) Therefore,
$$\begin{aligned}
E(|Y - \theta| - |Y|) &= \theta F_Y(0) + \theta \int_{0}^{\theta} f_Y(y)dy + \int_{0}^{\theta} -2y f_Y(y)dy - \theta(1 - F_Y(\theta)) \\
&= \theta F_Y(0) + \theta(F_Y(\theta) - F_Y(0)) + \int_{0}^{\theta} -2y f_Y(y)dy - \theta(1 - F_Y(\theta)) \\
&= \int_{0}^{\theta} -2y f_Y(y)dy - \theta(1 - 2F_Y(\theta)).
\end{aligned}$$

(d) Therefore, it follows from an integration by parts, differentiating $-2y$ and integrating $f_Y(y)$, that
$$\begin{aligned}
E(|Y - \theta| - |Y|) &= [-2y F_Y(y)]_0^\theta - \int_{0}^{\theta} -2F_Y(y)dy - \theta(1 - 2F_Y(\theta)) \\
&= -2\theta F_Y(\theta) + 2\int_{0}^{\theta} F_Y(y)dy - \theta(1 - 2F_Y(\theta)) \\
&= 2\int_{0}^{\theta} F_Y(y)dy - \theta.
\end{aligned}$$

(e) This shows that for any $\theta \geq 0$, $M(\theta)$ is twice continuously differentiable, with $\dot{M}(\theta) = 2F_Y(\theta) - 1$ and $\ddot{M}(\theta) = 2f_Y(\theta)$.
(f) For any $\theta < 0$,
$$\begin{aligned}
E(|Y - \theta| - |Y|) &= \int_{\underline{y}}^{\overline{y}} (|y - \theta| - |y|) f_Y(y)dy \\
&= \int_{\underline{y}}^{\theta} (|y - \theta| - |y|) f_Y(y)dy + \int_{\theta}^{0} (|y - \theta| - |y|) f_Y(y)dy + \int_{0}^{\overline{y}} (|y - \theta| - |y|) f_Y(y)dy \\
&= \int_{\underline{y}}^{\theta} \theta f_Y(y)dy + \int_{\theta}^{0} (2y - \theta) f_Y(y)dy + \int_{0}^{\overline{y}} -\theta f_Y(y)dy \\
&= \theta F_Y(\theta) + \int_{\theta}^{0} 2y f_Y(y)dy - \theta \int_{\theta}^{0} f_Y(y)dy - \theta(1 - F_Y(0)) \\
&= \theta F_Y(\theta) - \theta(F_Y(0) - F_Y(\theta)) + \int_{\theta}^{0} 2y f_Y(y)dy - \theta(1 - F_Y(0)) \\
&= \int_{\theta}^{0} 2y f_Y(y)dy - \theta(1 - 2F_Y(\theta)) \\
&= [2y F_Y(y)]_\theta^0 - \int_{\theta}^{0} 2F_Y(y)dy - \theta(1 - 2F_Y(\theta)) \\
&= -2\theta F_Y(\theta) - 2\int_{\theta}^{0} F_Y(y)dy - \theta(1 - 2F_Y(\theta)) \\
&= -2\int_{\theta}^{0} F_Y(y)dy - \theta.
\end{aligned}$$

(g) This shows that for any $\theta < 0$, $M(\theta)$ is twice continuously differentiable, with $\dot{M}(\theta) = 2F_Y(\theta) - 1$ and $\ddot{M}(\theta) = 2f_Y(\theta)$.
(h) It follows from the previous question and from the fact that $F_Y$ is continuous that the limit of $\dot{M}(\theta)$ when $\theta$ goes to 0 from the left is $2F_Y(0) - 1 = \dot{M}(0)$. Similarly, as $f_Y$ is continuous, the limit of $\ddot{M}(\theta)$ when $\theta$ goes to 0 from the left is $2f_Y(0) = \ddot{M}(0)$. Therefore, $M(\theta)$ is twice continuously differentiable for any $\theta \in \mathbb{R}$.

(i) $\ddot{M}(\theta) = 2f_Y(\theta)$ for any $\theta$. This number is strictly positive by assumption, so it is invertible. $\ddot{M}(\theta_0)^{-1} = \frac{1}{2f_Y(\theta_0)}$.
5. It follows from the preceding questions that $me(Y)$ is an m-estimand satisfying Assumption 9. Then it follows from Theorem 6.4.4 that $\widehat{me}(Y)$ is an asymptotically normal estimator of $me(Y)$. Its asymptotic variance is
$$\begin{aligned}
\ddot{M}(\theta_0)^{-1} E(\dot{m}(Y, \theta_0)\dot{m}(Y, \theta_0)')\ddot{M}(\theta_0)^{-1} &= \frac{1}{2f_Y(me(Y))} E\left((1\{Y < me(Y)\} - 1\{Y > me(Y)\})^2\right) \frac{1}{2f_Y(me(Y))} \\
&= \frac{1}{4f_Y(me(Y))^2}.
\end{aligned}$$
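This asymptotic variance can be checked numerically. The sketch below is my own simulation, not part of the exercise: for $Y \sim N(0,1)$, $me(Y) = 0$ and $f_Y(0) = 1/\sqrt{2\pi}$, so the variance of $\sqrt{n}\,(\widehat{me}(Y) - me(Y))$ should be close to $1/(4f_Y(0)^2) = \pi/2$.

```python
import numpy as np

# Monte-Carlo check of the asymptotic variance 1 / (4 f_Y(me(Y))^2):
# for Y ~ N(0, 1), me(Y) = 0 and f_Y(0) = 1/sqrt(2*pi), so the variance of
# sqrt(n) * (sample median) should be close to pi/2.
rng = np.random.default_rng(0)
n, reps = 1000, 5000
medians = np.array([np.median(rng.standard_normal(n)) for _ in range(reps)])
emp_var = n * medians.var()        # empirical variance of sqrt(n) * median
theory = np.pi / 2
```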
* Exercise 7: Using Theorem 6.4.3 to derive the asymptotic behavior of the maximum likelihood estimator

Let X be a random variable with support $[\underline{x}, \overline{x}]$, whose probability density function $f \in \{p_\theta : \theta \in \Theta \subseteq \mathbb{R}\}$ (to simplify, I assume that the dimension of $\theta$ is one): $f = p_{\theta_0}$ for some $\theta_0 \in \Theta$. It follows from Example 5 in the notes that $\theta_0$ is an m-estimand, with $m(x, \theta) = -\ln(p_\theta(x))$. The corresponding m-estimator is $\widehat{\theta}$, the maximum likelihood estimator: $\widehat{\theta} = \operatorname{argmin}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n -\ln(p_\theta(X_i))$. We have seen that if:
1. for any $\theta \neq \theta_0$, $p_\theta \neq p_{\theta_0}$ on at least a non-empty open subset of the support of X,

2. $\Theta$ is compact,

3. $\theta \mapsto \ln(p_\theta(x))$ is twice continuously differentiable with respect to $\theta$ for every $x$,

4. $E(\ddot{\ln}(p_{\theta_0}(X)))$ is invertible,

then $\theta_0$ satisfies Assumptions 5, 6, 7, and 8 in the notes, so we can apply Theorem 6.4.3 to assert that the maximum likelihood estimator is asymptotically normal, with asymptotic variance
$$E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1} E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right) E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1}.$$
There should be $-$ signs appearing in each term, but we can forget about them, as one is transformed into a $+$ by the square, and multiplying the two others yields a $+$ as well. The goal of this exercise is to derive a simpler expression of this asymptotic variance. In what follows, all derivatives are taken with respect to $\theta$.

1. Show that for any $\theta$, $\int_{\underline{x}}^{\overline{x}} \dot{p}_\theta(x)dx = 0$. Hint: you can use the fact that $\int_{\underline{x}}^{\overline{x}} p_\theta(x)dx = 1$, and assume that you can invert the derivative and integral signs because the dominated convergence theorem applies.

2. Infer from this that
$$E(\dot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_{\theta_0}(x)) p_{\theta_0}(x)dx = 0.$$

3. Infer from this that
$$E(\ddot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \ddot{\ln}(p_{\theta_0}(x)) p_{\theta_0}(x)dx = -\int_{\underline{x}}^{\overline{x}} \left(\dot{\ln}(p_{\theta_0}(x))\right)^2 p_{\theta_0}(x)dx = -E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right).$$
Hint: the first and last equalities just follow from the definition of these expectations. The one you need to prove is the middle one. For that, you should differentiate $\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x)) p_\theta(x)dx = 0$ with respect to $\theta$.

4. Infer from this that the asymptotic variance of the maximum likelihood estimator is
$$E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)^{-1}.$$
5. Assume X follows an $\exp(\theta_0)$ distribution. Compute $\widehat{\theta}$. Use the previous questions to compute the asymptotic variance of $\widehat{\theta}$.
Solution
1. As $p_\theta$ is a density for every $\theta$, $\int_{\underline{x}}^{\overline{x}} p_\theta(x)dx = 1$ for every $\theta$. Therefore, $\frac{\partial}{\partial\theta}\int_{\underline{x}}^{\overline{x}} p_\theta(x)dx = 0$. As we have assumed that the dominated convergence theorem applies, one can invert the derivative and integral signs. This yields $\int_{\underline{x}}^{\overline{x}} \dot{p}_\theta(x)dx = 0$.

2. It follows from the previous question that for every $\theta$, $\int_{\underline{x}}^{\overline{x}} \frac{\dot{p}_\theta(x)}{p_\theta(x)} p_\theta(x)dx = 0$. As $\dot{\ln}(p_\theta(x)) = \frac{\dot{p}_\theta(x)}{p_\theta(x)}$, this proves that for every $\theta$, $\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x)) p_\theta(x)dx = 0$. Finally,
$$E(\dot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_{\theta_0}(x)) p_{\theta_0}(x)dx,$$
because $p_{\theta_0}$ is the density of X. Putting the last and the last-but-one equalities, evaluated at $\theta_0$, together proves the result.
3. We have shown in the previous question that for every $\theta$, $\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x)) p_\theta(x)dx = 0$. Differentiating each side with respect to $\theta$ and using the dominated convergence theorem, we get
$$\int_{\underline{x}}^{\overline{x}} \left(\ddot{\ln}(p_\theta(x)) p_\theta(x) + \dot{\ln}(p_\theta(x)) \dot{p}_\theta(x)\right) dx = 0.$$
This rewrites as
$$\int_{\underline{x}}^{\overline{x}} \ddot{\ln}(p_\theta(x)) p_\theta(x)dx = -\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x)) \dot{p}_\theta(x)dx.$$
This proves the middle equality, once noted that $\dot{\ln}(p_\theta(x)) \dot{p}_\theta(x) = \left(\dot{\ln}(p_\theta(x))\right)^2 p_\theta(x)$. The first and last equalities merely follow from the definitions of $E(\ddot{\ln}(p_{\theta_0}(X)))$ and $E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)$.
4. The asymptotic variance of the maximum likelihood estimator is therefore equal to
$$\begin{aligned}
&E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1} E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right) E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1} \\
&= \left(-E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)\right)^{-1} E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right) \left(-E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)\right)^{-1} \\
&= E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)^{-1}.
\end{aligned}$$

5. If X follows an exponential distribution with parameter $\theta_0$, $p_\theta(x) = \theta\exp(-\theta x)$, so $\ln(p_\theta(x)) = \ln(\theta) - \theta x$, and $\dot{\ln}(p_\theta(x)) = \frac{1}{\theta} - x$. Moreover, $E(X) = \frac{1}{\theta_0}$ and $V(X) = \frac{1}{\theta_0^2}$.
$$\widehat{\theta} = \operatorname{argmin}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n -\ln(p_\theta(X_i)) = \operatorname{argmin}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n \left(-\ln(\theta) + \theta X_i\right).$$
The first-order condition of this problem is $\frac{1}{n}\sum_{i=1}^n \left(-\frac{1}{\widehat{\theta}} + X_i\right) = 0$, so $\widehat{\theta} = 1/\overline{X}$. It follows from the results of the previous questions that its asymptotic variance is
$$E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)^{-1} = E\left(\left(\frac{1}{\theta_0} - X\right)^2\right)^{-1} = \frac{1}{V(X)} = \theta_0^2.$$
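As a numerical check (my own simulation, with $\theta_0 = 2$ as an arbitrary choice, not part of the exercise), one can verify that $\widehat{\theta} = 1/\overline{X}$ is approximately normal with variance $\theta_0^2 / n$:

```python
import numpy as np

# Monte-Carlo check that the exponential MLE theta_hat = 1 / mean(X) has
# asymptotic variance theta0^2. Here theta0 = 2 is an arbitrary illustration.
rng = np.random.default_rng(1)
theta0, n, reps = 2.0, 2000, 4000
x = rng.exponential(scale=1.0 / theta0, size=(reps, n))  # numpy uses scale = 1/theta
theta_hat = 1.0 / x.mean(axis=1)
emp_var = n * theta_hat.var()      # should be close to theta0**2 = 4
```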
Exercise 8: a very small Monte-Carlo study of M-estimators

Let U be a random variable following the uniform distribution on $[0, 1]$. Let $m(u, \theta) = (u - \theta)^2$. Let $\theta_0 = \operatorname{argmin}_{\theta \in [0,1]} E(m(U, \theta))$. The goal of the exercise is to compute $\theta_0$, to estimate its m-estimator $\widehat{\theta}$, and to compare the two.

1. Show that $\theta_0 = \frac{1}{2}$.

2. Generate a data set with a variable U containing 10,000 observations of random variables $U_i$ following the uniform distribution on $[0, 1]$.

3. Generate 101 more variables in this data set, respectively equal to $m(U, 0)$, $m(U, 0.01)$, $m(U, 0.02)$, ..., $m(U, 1)$.

4. Estimate the sample mean of these 101 variables.

5. Which of these variables has the lowest sample mean?

6. What is the value of $\widehat{\theta}$?

7. Do the exercise again with 100 observations.
Solution
1. $E(m(U, \theta)) = \theta^2 - \theta + \frac{1}{3}$. This quantity is minimized at $\theta_0 = \frac{1}{2}$.

2. See code.

3. See code.

4. See code.

5. In my simulations, $m(U, 0.5)$ is the one with the lowest sample mean.

6. $\widehat{\theta} \approx 0.5$. The definition of the m-estimator is that it is the value of $\theta$ which minimizes the sample mean of $m(U, \theta)$. Here, I have evaluated this function only over a grid of 101 points of the interval $[0, 1]$. If I had gone on to pick 100 equally spaced points over $[0.49, 0.51]$ and evaluated this function over these 100 points, I would probably have found a point where the sample mean of $m(U, \theta)$ is lower than that of $m(U, 0.5)$. That's why here 0.5 is only an approximation of the m-estimator.

7. With 100 observations, I get $\widehat{\theta} = 0.49$.
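Since the course's own code is not reproduced here, steps 2-6 can be sketched as follows (the library and variable names are my choices):

```python
import numpy as np

# Steps 2-6 of the exercise: draw 10,000 uniforms, compute the sample mean of
# m(U, theta) = (U - theta)^2 over the grid theta = 0, 0.01, ..., 1, and take
# the grid point with the lowest sample mean as the (approximate) m-estimator.
rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=10_000)
grid = np.linspace(0.0, 1.0, 101)
sample_means = ((u[:, None] - grid[None, :]) ** 2).mean(axis=0)
theta_hat = grid[sample_means.argmin()]   # should be close to theta0 = 1/2
```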