PS3 for Econometrics 101, Warwick Econ Ph.D

* denotes more mathematical exercises, on which you should spend some time if you are a
type 2 student but not if you are type 1. If you are type 1, you should still read the result
of exercise 3, without trying to prove it, as it will help you interpret results of the applied
exercises.
Exercise 1: using the delta method to derive confidence intervals for $\sigma(Y)$.
Let $(Y_i)_{1\le i\le n}$ be an iid sample of $n$ random variables with a finite 4th moment and with a strictly positive variance. Let $\widehat{V}(Y) = \overline{Y^2} - \overline{Y}^2$ be an estimator of $V(Y)$, and let $\widehat{\sigma}(Y) = \sqrt{\widehat{V}(Y)}$ be an estimator of its standard deviation.
1. Show that
$$\sqrt{n}\left((\overline{Y^2}, \overline{Y})' - (E(Y^2), E(Y))'\right) \hookrightarrow N(0, V_0),$$
where
$$V_0 = \begin{pmatrix} V(Y^2) & cov(Y^2, Y) \\ cov(Y^2, Y) & V(Y) \end{pmatrix}.$$
2. Give a consistent estimator of $V_0$, $\widehat{V}_0$. No need to prove that the estimator is indeed consistent, just mention which theorems you would use to prove consistency.
3. Use question 1 and the delta method to show that
$$\sqrt{n}\left(\widehat{V}(Y) - V(Y)\right) \hookrightarrow N(0, V_1),$$
where $V_1 = (1, -2E(Y))V_0(1, -2E(Y))'$.
4. Give a consistent estimator of $V_1$, $\widehat{V}_1$. You can write $\widehat{V}_1$ as a function of $\widehat{V}_0$. No need to prove that the estimator is indeed consistent, just mention which theorems you would use to prove consistency.
5. Give a confidence interval for $V(Y)$ with asymptotic coverage $1-\alpha$. You just need to write the upper and lower bounds of the confidence interval as functions of $\widehat{V}(Y)$, $\widehat{V}_1$, $q_{1-\frac{\alpha}{2}}$, and $n$. No need to prove that the confidence interval indeed has asymptotic coverage $1-\alpha$, just mention which lemma you would use in the proof.
6. Use the delta method to show that
$$\sqrt{n}\left(\widehat{\sigma}(Y) - \sigma(Y)\right) \hookrightarrow N(0, V_2),$$
where $V_2 = \frac{1}{4V(Y)}V_1$.
7. Give a consistent estimator of $V_2$, $\widehat{V}_2$. You can write $\widehat{V}_2$ as a function of $\widehat{V}_1$. No need to prove that the estimator is indeed consistent, just mention which theorem you would use to prove consistency.
8. Give a confidence interval for $\sigma(Y)$ with asymptotic coverage $1-\alpha$. You just need to write the upper and lower bounds of the confidence interval as functions of $\widehat{\sigma}(Y)$, $\widehat{V}_2$, $q_{1-\frac{\alpha}{2}}$, and $n$. No need to prove that the confidence interval indeed has asymptotic coverage $1-\alpha$, just mention which lemma you would use in the proof.
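A minimal Python sketch of the plug-in quantities in questions 2 to 8, assuming that $\widehat{V}_0$ is built from empirical moments and that $q_{1-\alpha/2}$ is the corresponding quantile of the standard normal distribution. The function name `variance_and_sd_cis` is hypothetical, and this is one possible construction rather than the expected answer.

```python
import math
from statistics import NormalDist


def variance_and_sd_cis(y, alpha=0.05):
    """Plug-in estimators and asymptotic CIs for V(Y) and sigma(Y) (hypothetical helper)."""
    n = len(y)
    m1 = sum(y) / n                        # sample mean of Y
    m2 = sum(v ** 2 for v in y) / n        # sample mean of Y^2
    m3 = sum(v ** 3 for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    v_hat = m2 - m1 ** 2                   # \hat V(Y)
    sigma_hat = math.sqrt(v_hat)           # \hat sigma(Y)

    # Entries of \hat V_0, the sample covariance matrix of (Y^2, Y)
    var_y2 = m4 - m2 ** 2                  # estimates V(Y^2)
    cov_y2_y = m3 - m2 * m1                # estimates cov(Y^2, Y)
    var_y = v_hat                          # estimates V(Y)

    # \hat V_1 = (1, -2 m1) \hat V_0 (1, -2 m1)'
    g = -2 * m1
    v1_hat = var_y2 + 2 * g * cov_y2_y + g ** 2 * var_y
    # \hat V_2 = \hat V_1 / (4 \hat V(Y)): delta method with the square-root function
    v2_hat = v1_hat / (4 * v_hat)

    q = NormalDist().inv_cdf(1 - alpha / 2)   # q_{1 - alpha/2}
    half_v = q * math.sqrt(v1_hat / n)
    half_s = q * math.sqrt(v2_hat / n)
    ci_var = (v_hat - half_v, v_hat + half_v)
    ci_sd = (sigma_hat - half_s, sigma_hat + half_s)
    return v_hat, sigma_hat, v1_hat, v2_hat, ci_var, ci_sd
```

A convenient check: $\widehat{V}_1$ computed this way coincides with the sample variance of $(Y_i - \overline{Y})^2$.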
Exercise 2: quantiles as m-estimands
In the course, we showed that the median of a continuous random variable can be seen as an m-estimand. In this exercise, we want to show that the same applies to any quantile of a continuous random variable. Let $Y$ be a continuous random variable with a continuous and strictly increasing cdf $F_Y$, admitting a first moment. For any real number $u$, let $\rho_\tau(u) = (\tau - 1\{u \le 0\})u$. You can check that $\rho_\tau(u) = \tau u$ for $u > 0$ and $\rho_\tau(u) = (\tau - 1)u$ for $u \le 0$, and that $\rho_{0.5}(u) = 0.5|u|$. Then, let $q_\tau(Y)$ be the quantile of order $\tau$ of $Y$: $q_\tau(Y) = F_Y^{-1}(\tau)$, and $q_\tau(Y)$ is the unique solution in $y$ of $P(Y \le y) = \tau$. Show that
$$q_\tau(Y) = \underset{\theta\in\mathbb{R}}{\arg\min}\, E(\rho_\tau(Y - \theta)).$$
Hints: you should use the exact same steps as in the proof of Example 2 in the notes. You need to show that $E(\rho_\tau(Y - \theta)) - E(\rho_\tau(Y - q_\tau(Y)))$ is strictly positive for any $\theta \neq q_\tau(Y)$. To that end, you should first consider a $\theta > q_\tau(Y)$. You should then decompose both $E(\rho_\tau(Y - \theta))$ and $E(\rho_\tau(Y - q_\tau(Y)))$ into three pieces depending on whether $Y \ge \theta$, $q_\tau(Y) \le Y < \theta$, or $Y < q_\tau(Y)$. Then you should replace the functions $\rho_\tau(Y - \theta)$ and $\rho_\tau(Y - q_\tau(Y))$ by simpler expressions within each of the 6 conditional expectations. Finally, at some point you will have to use the fact that $P(Y \le q_\tau(Y)) = \tau$ and $P(Y > q_\tau(Y)) = 1 - \tau$.
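A small numerical illustration (not a proof) of the result: replace $E(\rho_\tau(Y - \theta))$ by its sample analogue and minimize it over a grid of $\theta$ values. The helper names `check_loss` and `argmin_check`, the grid, and the choice $Y \sim \exp(1)$, for which $q_\tau(Y) = -\ln(1-\tau)$, are assumptions made for this sketch.

```python
import math
import random


def check_loss(u, tau):
    """rho_tau(u) = (tau - 1{u <= 0}) * u."""
    return (tau - (1.0 if u <= 0 else 0.0)) * u


def argmin_check(y, tau, grid):
    """Grid value of theta minimising the sample mean of rho_tau(Y - theta)."""
    best_theta, best_val = None, float("inf")
    for theta in grid:
        val = sum(check_loss(v - theta, tau) for v in y) / len(y)
        if val < best_val:
            best_theta, best_val = theta, val
    return best_theta


random.seed(0)
tau = 0.25
y = [random.expovariate(1.0) for _ in range(2000)]   # Y ~ exp(1)
grid = [k / 100 for k in range(301)]                 # theta in {0, 0.01, ..., 3}
theta_hat = argmin_check(y, tau, grid)
true_q = -math.log(1 - tau)                          # F_Y^{-1}(tau) for exp(1)
```

Because the sample objective is convex and piecewise linear in $\theta$, the grid minimizer lies within one grid step of the sample quantile of order $\tau$, which in turn should be close to `true_q`.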
* Exercise 3: the expected value of a random variable is the average of its quantiles.
Let $Z$ be a random variable with a strictly increasing cdf $F_Z$ and with a strictly positive pdf $f_Z$. Let $\underline{z}$ and $\overline{z}$ respectively denote the lower and upper bounds of its support.
1. Show that $E(Z) = \int_0^1 F_Z^{-1}(\tau)d\tau$. Hint: you should use the substitution $z = F_Z^{-1}(\tau)$ in $\int_0^1 F_Z^{-1}(\tau)d\tau$ and the integration by substitution theorem (see Wikipedia entry).
2. Infer from the previous question that $E(Y_1 - Y_0) = \int_0^1 \left(F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)\right)d\tau$: the average treatment effect is the average of quantile treatment effects.
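A quick numerical check of $E(Z) = \int_0^1 F_Z^{-1}(\tau)d\tau$, assuming a midpoint Riemann sum is accurate enough (midpoints avoid $\tau = 0$ and $\tau = 1$, where $F_Z^{-1}$ may be infinite). The two distributions, an exponential with mean 1 and a normal with mean 3, are arbitrary examples.

```python
import math
from statistics import NormalDist


def average_of_quantiles(inv_cdf, n_grid=100_000):
    """Midpoint approximation of the integral of F^{-1}(tau) over (0, 1)."""
    h = 1.0 / n_grid
    return sum(inv_cdf((k + 0.5) * h) for k in range(n_grid)) * h


# exp(1): F^{-1}(tau) = -ln(1 - tau), and E(Z) = 1
exp_mean = average_of_quantiles(lambda t: -math.log(1.0 - t))
# N(3, 2^2): E(Z) = 3
normal_mean = average_of_quantiles(NormalDist(mu=3.0, sigma=2.0).inv_cdf)
```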
Exercise 4: using quantile regressions to measure changes in the US wage structure
(based on Angrist et al. (2006))
Let $X$ be a $k\times 1$ vector of random variables. Let $q_\tau(Y|X = x)$ denote the quantile of order $\tau$ of the distribution of $Y|X = x$. $q_\tau(Y|X = x)$ is just a regular quantile, so it follows from Exercise 2 that
$$q_\tau(Y|X = x) = \underset{\theta\in\mathbb{R}}{\arg\min}\, E(\rho_\tau(Y - \theta)|X = x).$$
Now, let $q_\tau(Y|X)$ be a random variable equal to $q_\tau(Y|X = x)$ when $X = x$. $q_\tau(Y|X)$ is called the $\tau$th quantile of $Y$ conditional on $X$. It satisfies
$$q_\tau(Y|X) = \underset{g(.)}{\arg\min}\, E(\rho_\tau(Y - g(X))). \qquad (0.0.1)$$
Among all possible functions of $X$, $g(X)$, $q_\tau(Y|X)$ is the one which minimizes $E(\rho_\tau(Y - g(X)))$. Indeed, assume to simplify that $X$ is a discrete random variable taking $J$ values $x_1, ..., x_J$. Then, it follows from the law of iterated expectations that
$$E(\rho_\tau(Y - g(X))) = E(E(\rho_\tau(Y - g(X))|X)) = \sum_{j=1}^{J} E(\rho_\tau(Y - g(x_j))|X = x_j)P(X = x_j).$$
For each $j$, $E(\rho_\tau(Y - g(x_j))|X = x_j)$, viewed as a function of $g(x_j)$, is minimized at $g(x_j) = q_\tau(Y|X = x_j)$. Therefore $\sum_{j=1}^{J} 1\{X = x_j\}q_\tau(Y|X = x_j) = q_\tau(Y|X)$ is the function of $X$ which minimizes $E(\rho_\tau(Y - g(X)))$.
Equation (0.0.1) motivates the definition of the quantile regression function $X'\beta_\tau$, with
$$\beta_\tau = \underset{b\in\mathbb{R}^k}{\arg\min}\, E(\rho_\tau(Y - X'b)).$$
Angrist et al. (2006) show that $X'\beta_\tau$ is a weighted MMSE approximation of $q_\tau(Y|X)$ within the set of all linear functions of $X$. When $q_\tau(Y|X)$ is indeed a linear function of $X$, $X'\beta_\tau$ is equal to it. A natural estimator of $\beta_\tau$ is
$$\widehat{\beta}_\tau = \underset{b}{\arg\min}\, \frac{1}{n}\sum_{i=1}^{n}\rho_\tau(Y_i - X_i'b).$$
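To see what $\widehat{\beta}_\tau$ computes, here is a brute-force sketch for a constant and a single regressor. It relies on the standard linear-programming fact that some minimizer of $\sum_i \rho_\tau(Y_i - X_i'b)$ fits $k$ observations exactly (here $k = 2$), so searching over all lines through two data points finds an exact minimizer. The cost grows like $n^3$, so this is only practical for small samples; software such as Stata's qreg uses far more efficient algorithms. The function name `qreg_bruteforce` and the simulated design, in which $q_\tau(Y|X) = 1 + 0.2\tau X$ has a slope increasing in $\tau$, are made up for illustration.

```python
import itertools
import random


def check_loss(u, tau):
    """rho_tau(u) = (tau - 1{u <= 0}) * u."""
    return (tau - (1.0 if u <= 0 else 0.0)) * u


def qreg_bruteforce(x, y, tau):
    """Exact quantile regression of y on (1, x), enumerating lines through pairs of points."""
    best, best_obj = None, float("inf")
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as a + b * x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        obj = sum(check_loss(yk - a - b * xk, tau) for xk, yk in zip(x, y))
        if obj < best_obj:
            best, best_obj = (a, b), obj
    return best


# Hypothetical design: Y = 1 + 0.1 X (1 + U) with U ~ Uniform(-1, 1),
# so q_tau(Y|X) = 1 + 0.2 tau X: the lines fan out instead of being parallel.
random.seed(0)
x = [random.uniform(1, 20) for _ in range(100)]
y = [1 + 0.1 * xi * (1 + random.uniform(-1, 1)) for xi in x]
a10, b10 = qreg_bruteforce(x, y, 0.1)   # true slope 0.02
a90, b90 = qreg_bruteforce(x, y, 0.9)   # true slope 0.18
```

In this design, plotting $a_\tau + b_\tau x$ for several $\tau$ gives non-parallel increasing lines, which is the kind of pattern the graphs in the questions below are meant to reveal.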
The census80.dta, census90.dta, and census00.dta files contain representative samples of US-born black and white men aged 40-49 with at least five years of education, with positive annual earnings and hours worked in the year preceding the census, respectively from the 1980, 1990, and 2000 censuses. The goal of the exercise is to estimate, for each census wave, the returns to years of education on different quantiles of the distribution of wages, controlling for age, experience, and race. Here, returns should be understood as a correlational concept, not as a causal concept. Coefficients of education in the regressions you will run should be interpreted as follows: an increase of one year of education is associated with an increase of the $\tau$th quantile of wages by XXXX, controlling for YYYY. Even though these parameters are not causal ones, they are still very interesting: comparing the structure of these returns across census waves will teach us something about the evolution of the structure of wages in the US over these 20 years.
Each of these data sets contains the following variables: age (age in years of the individual), educ (number of years of education), logwk (log wage), exper (number of years of professional experience), exper2 (number of years of professional experience squared), black (a dummy equal to 1 if the individual is black).
1. Open census80.dta. Using the qreg command, run quantile regressions of order 10, 25,
50, 75, and 90 of logwk on educ, age, exper, exper2, and black.
2. Interpret the coefficient of educ in each regression, following the guideline I gave you in the introduction of the exercise. Remember that wage is in logarithm, and that you control for age, exper, exper2, and black in your regression.
3. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 1980, did years of education have very different returns across different quantiles of the distribution of wages?
4. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education on the y-axis, holding the other variables constant. Your graph should consist of 5 lines: the line mapping education into education $\times\widehat{\beta}^{educ}_{0.1}$, the line mapping education into education $\times\widehat{\beta}^{educ}_{0.25}$, etc., where $\widehat{\beta}^{educ}_{0.1}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}^{educ}_{0.25}$ denotes the coefficient of education in the quantile regression of order 25, etc. The lines need not correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not, which ones are more increasing / decreasing than the others.
5. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?
6. Open census90.dta. Using the qreg command, run quantile regressions of order 10, 25,
50, 75, and 90 of logwk on educ, age, exper, exper2, and black.
7. Interpret the coefficient of educ in each regression.
8. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 1990, did years of education have very different returns across different quantiles of the distribution of wages? How do the coefficients of the 1990 quantile regressions compare to those of the 1980 regressions?
9. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education on the y-axis, holding the other variables constant. Your graph should consist of 5 lines: the line mapping education into education $\times\widehat{\beta}^{educ}_{0.1}$, the line mapping education into education $\times\widehat{\beta}^{educ}_{0.25}$, etc., where $\widehat{\beta}^{educ}_{0.1}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}^{educ}_{0.25}$ denotes the coefficient of education in the quantile regression of order 25, etc. The lines need not correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not, which ones are more increasing / decreasing than the others.
10. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions?
Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?
11. Open census00.dta. Using the qreg command, run quantile regressions of order 10, 25,
50, 75, and 90 of logwk on educ, age, exper, exper2, and black.
12. Interpret the coefficient of educ in each regression.
13. Do the confidence intervals of the coefficient of educ in the quantile regressions of order 10, 25, 50, 75, and 90 overlap? In 2000, did years of education have very different returns across different quantiles of the distribution of wages?
14. Based on your regression results, draw a graph with years of education on the x-axis, and predicted quantiles of order 10, 25, 50, 75, and 90 of wages for each value of education on the y-axis, holding the other variables constant. Your graph should consist of 5 lines: the line mapping education into education $\times\widehat{\beta}^{educ}_{0.1}$, the line mapping education into education $\times\widehat{\beta}^{educ}_{0.25}$, etc., where $\widehat{\beta}^{educ}_{0.1}$ denotes the coefficient of education in the quantile regression of order 10, $\widehat{\beta}^{educ}_{0.25}$ denotes the coefficient of education in the quantile regression of order 25, etc. The lines need not correspond perfectly to the true quantile regression functions, but based on your regression results, you should determine a) whether each of these 5 lines is increasing or decreasing and b) whether they are parallel to each other, and if not, which ones are more increasing / decreasing than the others.
15. Run the OLS regression of logwk on educ, age, exper, exper2, and black. Is the coefficient of educ in that regression very different from that of educ in the quantile regressions? Is the value of this coefficient very surprising in view of the result in Exercise 3 and the results of the quantile regressions?
16. Compare the three graphs you drew to summarize the results from the quantile regressions in 1980, 1990, and 2000. How did the returns to education evolve over these 20 years? What have been the main changes in the structure of US wages over that period?
17. Find another potentially interesting set of quantile regressions you could run, run them, interpret their results, and send me a 10-line email explaining the regressions you ran and your results.
Exercise 5: using quantile regressions to measure heterogeneous returns to a
boarding school for disadvantaged students (based on Behaghel et al. (2014))
Behaghel et al. (2014) ran an experiment in which applicants to a boarding school for disadvantaged students were randomly admitted to the school. The boardingschool.dta data set contains three variables. mathsscore is students' maths score two years after the lottery that determined who was admitted to the school. This measure is standardized with respect to the mean and standard deviation of the raw score in the control group. This is the outcome variable in this exercise, so we denote it $Y$. baselinemathsscore is a measure of students' baseline maths ability, before the lottery took place. We will use it as a control variable, so we denote it $X$. boarding is a dummy equal to 1 if a student won the lottery and gained admission to the boarding school. This is the treatment variable, so it is denoted $D$.
1. Run a very simple regression to show that students who won the lottery had very similar baseline maths scores to those who lost the lottery. If you were to compare the treatment and the control groups not on one, but on 20 baseline characteristics, how many significant differences at the 5% level should you expect to find?
2. Run a very simple regression to measure the effect of the boarding school on students' average maths test scores two years after the lottery. Interpret the coefficient from this regression (do not forget that test scores are standardized with respect to the mean and standard deviation of the raw score in the control group).
3. Run a slightly more complicated regression to increase the statistical precision of your coefficient of interest. Does this coefficient change a lot with respect to the previous regression? Was this expected? Does the standard error of your main coefficient of interest change a lot with respect to the previous regression? Was this expected?
4. We have
$$q_\tau(Y|D) = 1\{D = 0\}q_\tau(Y|D = 0) + 1\{D = 1\}q_\tau(Y|D = 1) = q_\tau(Y|D = 0) + D\left(q_\tau(Y|D = 1) - q_\tau(Y|D = 0)\right).$$
This shows that $q_\tau(Y|D)$ is a linear function of $(1, D)'$. Therefore, a quantile regression of order $\tau$ of $Y$ on a constant and $D$ will estimate this conditional quantile function: the coefficient of the constant will be an estimate of $q_\tau(Y|D = 0)$, while the coefficient of $D$ will be an estimate of $q_\tau(Y|D = 1) - q_\tau(Y|D = 0)$.
(a) Show that $\widehat{\beta}_\tau$, the coefficient of $D$, is an estimate of $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$, the quantile treatment effect of order $\tau$.
(b) Plot a graph with $\tau$ on the x-axis and $\widehat{\beta}_\tau$ and the upper and lower bounds of its 90% confidence interval on the y-axis, for $\tau$ ranging from 0.03 to 0.97. To that end, you need to:
1. run a quantile regression of order $\tau$ of $Y$ on $D$ using the qreg command in Stata for every $\tau$ ranging from 0.03 to 0.97
2. store $\tau$, $\widehat{\beta}_\tau$, $\widehat{\beta}_\tau - 1.64 * se(\widehat{\beta}_\tau)$, $\widehat{\beta}_\tau + 1.64 * se(\widehat{\beta}_\tau)$ in a matrix
3. clear the data
4. form a data set containing the values you stored in the matrix, using the svmat command
5. plot the requested graph.
(c) Which quantiles of the distribution of scores does the boarding school increase most: the lowest ones, those close to the median, or the highest ones?
(d) Does the graph confirm the result in Exercise 3?
5. The rank invariance assumption states that the treatment does not change the rank of an individual in the distribution of the outcome: the individual at the 90th percentile of the distribution of $Y_0$ will also be at the 90th percentile of the distribution of $Y_1$. Under this assumption, quantile treatment effects can be interpreted as individual treatment effects. $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ compares the $\tau$th quantile of the distributions of $Y_1$ and $Y_0$. Under the rank invariance assumption, it is the same individual who is at the $\tau$th quantile of these two distributions, so $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is a measure of the effect of the treatment on her.
The rank invariance assumption is not a credible assumption, but it is useful to assess treatment effect heterogeneity. Indeed, any deviation from rank invariance implies more treatment effect heterogeneity than that obtained under rank invariance. One can formally show that $V(Y_1 - Y_0) \ge V^{RI}(Y_1 - Y_0)$: the true variance of the treatment effect, which we cannot measure because we do not observe $Y_1 - Y_0$, is at least as high as $V^{RI}(Y_1 - Y_0)$, the variance of the treatment effect under rank invariance. That is an interesting result, because $V^{RI}(Y_1 - Y_0)$ is something we can estimate. In the simple case where the sample contains as many treated as untreated observations ($n_0 = n_1$), under rank invariance the $Y_0$ of $Y_{(i)1}$, the observation with the $i$th largest value of $Y_1$, is merely $Y_{(i)0}$, the $i$th largest value of $Y_0$. Therefore, one can use
$$\frac{1}{n_0}\sum_{i=1}^{n_0}\left(Y_{(i)1} - Y_{(i)0}\right)^2 - \left(\frac{1}{n_0}\sum_{i=1}^{n_0}\left(Y_{(i)1} - Y_{(i)0}\right)\right)^2$$
to estimate $V^{RI}(Y_1 - Y_0)$.
We will not prove this result that any deviation from rank invariance implies more treatment effect heterogeneity than that obtained under rank invariance, but let me give you a little bit of intuition. Assume that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is increasing in $\tau$. Rank invariance already implies treatment effect heterogeneity: under rank invariance, $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ increasing in $\tau$ implies that people at the upper quantiles of the distribution of the outcome have larger effects than those at the lower quantiles. Now, any deviation from rank invariance implies even more treatment effect heterogeneity. Under rank invariance, $F_{Y_1}^{-1}(1) - F_{Y_0}^{-1}(1)$ is the value of the treatment effect for the person with the highest value of $Y_1$. If rank invariance is violated, this person, instead of having the highest $Y_0$, must have a lower rank in the distribution of $Y_0$: the counterfactual value of the outcome for that person under non-treatment is no longer $F_{Y_0}^{-1}(1)$, but $F_{Y_0}^{-1}(\tau)$, for some $\tau < 1$. This implies that the true treatment effect for that person, $F_{Y_1}^{-1}(1) - F_{Y_0}^{-1}(\tau)$, is even larger than what we had concluded under rank invariance. Similarly, if the person with the lowest value of $Y_1$ actually did not have the lowest value of $Y_0$, then the true treatment effect for that person is even lower than what we had concluded under rank invariance. In this example, deviations from rank invariance mean that people at upper quantiles have even larger effects than what we had concluded under rank invariance, while people at lower quantiles have even lower effects, implying that overall, treatment effects are even more heterogeneous than what we had concluded under rank invariance.
After this long introduction, consider again the graph you obtained in question 4.b).
(a) Look at whether the confidence intervals of $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ for different values of $\tau$ cross or not. At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ is constant for every $\tau$? At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ takes only two different values? At the 10% level, can you reject that $F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$ takes only three different values?
(b) Under the rank invariance assumption, what is the minimum number of different values that $Y_1 - Y_0$ must take? Are these values very different from each other? Is the effect of the boarding school on students' test scores heterogeneous?
6. Given the shape of the quantile treatment effect function $\tau \mapsto F_{Y_1}^{-1}(\tau) - F_{Y_0}^{-1}(\tau)$, is $V(Y_1)$ likely to be greater or smaller than $V(Y_0)$? How could you estimate $V(Y_1)$ and $V(Y_0)$? Which results could you use to derive confidence intervals for these two quantities?
7. This policy is expensive: it multiplies by two the expenditure per year per student. However, the gain on average maths scores it produces is roughly the same as the one produced by dividing class size by two, which also multiplies by two the expenditure per year per student. The literature has found that class size reductions have larger effects for the weakest students. Based on the results of the previous questions, find one reason why you might prefer class size reduction over the boarding school policy if you were a social planner. Send me a 5-line email with your answer.
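The point estimates behind question 4(b) and the $V^{RI}$ estimator of question 5 can be sketched in Python, as an illustration alongside (not instead of) the Stata steps. With a single binary regressor, the sample objective of the quantile regression of $Y$ on $(1, D)'$ splits into a control-group part and a treated-group part, so the coefficient of $D$ can be computed as the difference between the $\tau$th empirical quantiles of treated and untreated outcomes. Here the empirical quantile is taken to be the smallest observation whose empirical cdf is at least $\tau$, one of several valid conventions. The function names are hypothetical, no standard errors are computed, and the sketch does not read boardingschool.dta: you would pass it the outcomes of the two groups.

```python
import math


def empirical_quantile(values, tau):
    """Smallest observed value v with empirical cdf F_n(v) >= tau."""
    s = sorted(values)
    k = min(len(s), max(1, math.ceil(len(s) * tau)))
    return s[k - 1]


def qte_curve(y0, y1, taus):
    """Estimated F_{Y1}^{-1}(tau) - F_{Y0}^{-1}(tau) for each tau in taus."""
    return [(tau, empirical_quantile(y1, tau) - empirical_quantile(y0, tau))
            for tau in taus]


def variance_under_rank_invariance(y0, y1):
    """Estimator of V^RI(Y1 - Y0) when n0 == n1."""
    if len(y0) != len(y1):
        raise ValueError("this simple estimator assumes n0 == n1")
    # Sorting both samples in the same order pairs the i-th largest Y1
    # with the i-th largest Y0, as rank invariance prescribes.
    diffs = [a - b for a, b in zip(sorted(y1), sorted(y0))]
    n0 = len(diffs)
    mean_diff = sum(diffs) / n0
    return sum(d * d for d in diffs) / n0 - mean_diff ** 2
```

By the rearrangement inequality, any other pairing of the same two samples yields a variance of differences at least as large as the sorted pairing, which is a finite-sample analogue of $V(Y_1 - Y_0) \ge V^{RI}(Y_1 - Y_0)$.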
* Exercise 6: Proving that Theorem 6.4.4 applies to the sample median
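Let $Y$ be a continuous random variable with strictly increasing and continuous cdf $F_Y$ and strictly positive and continuous pdf $f_Y$. Let $\underline{y}$ and $\overline{y}$ be the lower and upper bounds of the support of $Y$. The goal of this exercise is to show that the median of $Y$ is an m-estimand satisfying Assumption 9 in the notes, and to infer from this its asymptotic behavior using Theorem 6.4.4.
It follows from Example 2 in the notes that
$$me(Y) = \underset{\theta\in\mathbb{R}}{\arg\min}\, E(|Y - \theta|).$$
Therefore, $me(Y) = \underset{\theta\in\mathbb{R}}{\arg\min}\, E(|Y - \theta| - |Y|)$: subtracting $|Y|$, which does not depend on $\theta$, leaves the minimizer unchanged, and ensures that the expectation exists even if $Y$ has no first moment, since $\left||Y - \theta| - |Y|\right| \le |\theta|$. So $me(Y)$ can be seen as an m-estimand with $m(y, \theta) = |y - \theta| - |y|$.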
1. Show that $\theta \mapsto m(y, \theta)$ is differentiable in $\theta$ for every $\theta$ different from $y$. Give an expression of its derivative. Hint: you should start by considering a value of $\theta \ge 0$, then you should give explicit expressions of $m(y, \theta)$ depending on the value of $y$ (there are three cases you must distinguish), then you should differentiate these expressions with respect to $\theta$. Then you should do the same thing for $\theta < 0$.
2. Infer from the previous question that $m(y, \theta)$ satisfies the first requirement of Assumption 9: $\theta \mapsto m(y, \theta)$ is differentiable at $\theta_0 = me(Y)$ for almost every $y \in \mathbb{R}$.
3. Show that $m(y, \theta)$ satisfies the second requirement of Assumption 9: for every $(\theta_1, \theta_2)$, $|m(y, \theta_1) - m(y, \theta_2)| \le |\theta_1 - \theta_2|$.
4. Let $M(\theta) = E(m(Y, \theta)) = E(|Y - \theta| - |Y|)$.
(a) Show that for any $\theta \ge 0$,
$$E(|Y - \theta| - |Y|) = \int_{\underline{y}}^{0}\theta f_Y(y)dy + \int_{0}^{\theta}(\theta - 2y)f_Y(y)dy - \int_{\theta}^{\overline{y}}\theta f_Y(y)dy.$$
(b) Infer from the previous question that
$$E(|Y - \theta| - |Y|) = \theta F_Y(0) + \int_{0}^{\theta}(\theta - 2y)f_Y(y)dy - \theta(1 - F_Y(\theta)).$$
(c) Infer from the previous question that
$$E(|Y - \theta| - |Y|) = \int_{0}^{\theta} -2yf_Y(y)dy - \theta(1 - 2F_Y(\theta)).$$
(d) Infer from the previous question and from an integration by parts (see Wikipedia entry if needed) that
$$E(|Y - \theta| - |Y|) = 2\int_{0}^{\theta} F_Y(y)dy - \theta.$$
(e) Conclude that for any $\theta \ge 0$, $M(\theta)$ is twice continuously differentiable. Give $\dot{M}(\theta)$ and $\ddot{M}(\theta)$.
(f) Follow the same steps as in the 5 preceding subquestions to show that for any $\theta < 0$,
$$E(|Y - \theta| - |Y|) = -2\int_{\theta}^{0} F_Y(y)dy - \theta.$$
(g) Conclude that for any $\theta < 0$, $M(\theta)$ is twice continuously differentiable. Give $\dot{M}(\theta)$ and $\ddot{M}(\theta)$.
(h) Check that $\dot{M}(\theta)$ and $\ddot{M}(\theta)$ are continuous at 0. Conclude that $M(\theta)$ is twice continuously differentiable for any $\theta \in \mathbb{R}$.
(i) Check that $\ddot{M}(\theta)$ is invertible for any $\theta$. Give $\ddot{M}(\theta_0)^{-1}$.
5. Conclude from the preceding questions that $me(Y)$ is an m-estimand satisfying Assumption 9. Apply Theorem 6.4.4 to deduce from this that $\widehat{me}(Y)$ is an asymptotically normal estimator of $me(Y)$. Give an expression of its asymptotic variance.
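A numerical sanity check of questions 4 and 5, taking $Y \sim N(0, 1)$ as an arbitrary example. For this distribution $E|Y - \theta| = \theta(2\Phi(\theta) - 1) + 2\phi(\theta)$, so $M(\theta)$ can be compared with the formulas of questions 4(d) and 4(f); the simulation then compares $n\,V(\widehat{me}(Y))$ with $1/(4f_Y(me(Y))^2)$, the standard asymptotic variance of the sample median. Grid size, sample size, number of replications and function names are arbitrary choices for this sketch.

```python
import random
from statistics import NormalDist, variance

PHI = NormalDist()  # standard normal: provides cdf and pdf


def m_closed_form(theta):
    """M(theta) = E|Y - theta| - E|Y| for Y ~ N(0, 1)."""
    e_abs = theta * (2 * PHI.cdf(theta) - 1) + 2 * PHI.pdf(theta)
    return e_abs - 2 * PHI.pdf(0.0)


def m_from_cdf(theta, steps=10_000):
    """2 * int_0^theta F_Y(y) dy - theta, by the midpoint rule.

    For theta < 0 the signed integral equals -int_theta^0 F_Y(y) dy,
    so this also reproduces the formula of question 4(f)."""
    h = theta / steps
    integral = sum(PHI.cdf((k + 0.5) * h) for k in range(steps)) * h
    return 2 * integral - theta


def scaled_variance_of_median(n=401, reps=2000, seed=0):
    """Monte Carlo estimate of n * V(sample median) for N(0, 1) samples."""
    rng = random.Random(seed)
    medians = []
    for _ in range(reps):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        medians.append(sample[n // 2])  # n odd: middle order statistic
    return n * variance(medians)


target = 1 / (4 * PHI.pdf(0.0) ** 2)  # 1 / (4 f_Y(me(Y))^2), equal to pi / 2 here
```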
* Exercise 7: Using Theorem 6.4.3 to derive the asymptotic behavior of the maximum likelihood estimator.
Let $X$ be a random variable with support $[\underline{x}, \overline{x}]$, whose probability density function $f \in \{p_\theta: \theta \in \Theta \subseteq \mathbb{R}\}$ (to simplify, I assume that the dimension of $\theta$ is one): $f = p_{\theta_0}$ for some $\theta_0 \in \Theta$. It follows from Example 5 in the notes that $\theta_0$ is an m-estimand, with $m(x, \theta) = -\ln(p_\theta(x))$. The corresponding m-estimator is $\widehat{\theta}$, the maximum likelihood estimator:
$$\widehat{\theta} = \underset{\theta\in\Theta}{\arg\min}\, \frac{1}{n}\sum_{i=1}^{n} -\ln(p_\theta(X_i)).$$
We have seen that if:
1. for any $\theta \neq \theta_0$, $p_\theta \neq p_{\theta_0}$ on at least a non-empty open subset of the support of $X$,
2. $\Theta$ is compact,
3. $\theta \mapsto \ln(p_\theta(x))$ is twice continuously differentiable with respect to $\theta$ for every $x$,
4. $E(\ddot{\ln}(p_{\theta_0}(X)))$ is invertible,
then $\theta_0$ satisfies Assumptions 5, 6, 7, and 8 in the notes, so we can apply Theorem 6.4.3 to assert that the maximum likelihood estimator is asymptotically normal, with asymptotic variance
$$E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1} E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right) E\left(\ddot{\ln}(p_{\theta_0}(X))\right)^{-1}.$$
There should be $-$ signs appearing in each term, but we can forget about them, as one is transformed into a $+$ by the square, and multiplying the two others yields a $+$ as well. The goal of this exercise is to derive a simpler expression of this asymptotic variance. In what follows, all derivatives are taken with respect to $\theta$, and $\dot{\ln}(p_\theta(x))$ and $\ddot{\ln}(p_\theta(x))$ denote the first and second derivatives of $\theta \mapsto \ln(p_\theta(x))$.
1. Show that for any $\theta$, $\int_{\underline{x}}^{\overline{x}} \dot{p}_\theta(x)dx = 0$. Hint: you can use the fact that $\int_{\underline{x}}^{\overline{x}} p_\theta(x)dx = 1$ and assume that you can interchange the derivative and integral signs because the dominated convergence theorem applies.
2. Infer from this that
$$E(\dot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_{\theta_0}(x))p_{\theta_0}(x)dx = 0.$$
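3. Infer from this that
$$E(\ddot{\ln}(p_{\theta_0}(X))) = \int_{\underline{x}}^{\overline{x}} \ddot{\ln}(p_{\theta_0}(x))p_{\theta_0}(x)dx = -\int_{\underline{x}}^{\overline{x}} \left(\dot{\ln}(p_{\theta_0}(x))\right)^2 p_{\theta_0}(x)dx = -E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right).$$
Hint: the first and last equalities just follow from the definition of these expectations. The one you need to prove is the middle one. For that, you should differentiate $\int_{\underline{x}}^{\overline{x}} \dot{\ln}(p_\theta(x))p_\theta(x)dx = 0$ with respect to $\theta$.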
4. Infer from this that the asymptotic variance of the maximum likelihood estimator is
$$E\left(\left(\dot{\ln}(p_{\theta_0}(X))\right)^2\right)^{-1}.$$
5. Assume $X$ follows an $\exp(\theta_0)$ distribution. Compute $\widehat{\theta}$. Use the previous questions to compute the asymptotic variance of $\widehat{\theta}$.
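A Monte Carlo sketch for question 5, assuming the rate parametrization $p_\theta(x) = \theta e^{-\theta x}$ (the exercise does not say which parametrization of $\exp(\theta_0)$ is meant). Under this assumption, $\dot{\ln}(p_\theta(x)) = 1/\theta - x$, $\ddot{\ln}(p_\theta(x)) = -1/\theta^2$, and the first-order condition gives $\widehat{\theta} = 1/\overline{X}$; the code checks the information equality of question 3 and compares $n\,V(\widehat{\theta})$ with $\theta_0^2$. The values of $\theta_0$, $n$ and the number of replications are arbitrary.

```python
import random
from statistics import fmean, variance


def score(x, theta):
    """Derivative of ln(p_theta(x)) = ln(theta) - theta * x with respect to theta."""
    return 1.0 / theta - x


def mle(sample):
    """MLE of the rate: sum_i (1/theta - X_i) = 0 gives theta = 1 / mean."""
    return 1.0 / fmean(sample)


rng = random.Random(0)
theta0 = 2.0

# Information equality: E(score^2) should equal -E(second derivative) = 1 / theta0^2
draws = [rng.expovariate(theta0) for _ in range(100_000)]
mean_sq_score = fmean(score(x, theta0) ** 2 for x in draws)

# Sampling variance of the MLE across Monte Carlo replications, scaled by n
n, reps = 500, 2000
estimates = [mle([rng.expovariate(theta0) for _ in range(n)]) for _ in range(reps)]
scaled_var = n * variance(estimates)
```

Under the mean parametrization $p_\theta(x) = e^{-x/\theta}/\theta$ instead, the maximum likelihood estimator would be $\overline{X}$, and its asymptotic variance would also be $\theta_0^2$.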
Exercise 8: a very small Monte-Carlo study of M-estimators
Let $U$ be a random variable following the uniform distribution on $[0, 1]$. Let $m(u, \theta) = (u - \theta)^2$. Let
$$\theta_0 = \underset{\theta\in[0,1]}{\arg\min}\, E(m(U, \theta)).$$
The goal of the exercise is to compute $\theta_0$, to estimate it with its m-estimator $\widehat{\theta}$, and to compare the two.
1. Show that $\theta_0 = \frac{1}{2}$.
2. Generate a data set with a variable $U$ containing 10 000 observations of random variables $U_i$ following the uniform distribution on $[0, 1]$.
3. Generate 101 more variables in this data set, respectively equal to $m(U, 0)$, $m(U, 0.01)$, $m(U, 0.02)$, ..., $m(U, 1)$.
4. Estimate the sample mean of these 101 variables.
5. Which of these variables has the lowest sample mean?
6. What is the value of $\widehat{\theta}$?
7. Do the exercise again with 100 observations.
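Steps 2 to 7 can be sketched in Python instead of Stata, storing the 101 sample means rather than 101 new variables. The function name `grid_m_estimator` and the seed are arbitrary choices. Since $\frac{1}{n}\sum_i (U_i - \theta)^2 = \frac{1}{n}\sum_i (U_i - \overline{U})^2 + (\theta - \overline{U})^2$, the grid minimizer should be the grid point closest to $\overline{U}$.

```python
import random


def grid_m_estimator(n, seed=0):
    """Return (theta_hat, sample mean of U) for the grid {0, 0.01, ..., 1}."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]          # U_i ~ Uniform on [0, 1)
    grid = [k / 100 for k in range(101)]          # theta = 0, 0.01, ..., 1
    # Sample mean of m(U, theta) = (U - theta)^2 for each theta in the grid
    means = [sum((ui - theta) ** 2 for ui in u) / n for theta in grid]
    best = min(range(len(grid)), key=means.__getitem__)
    return grid[best], sum(u) / n


theta_hat_10000, ubar_10000 = grid_m_estimator(10_000)
theta_hat_100, ubar_100 = grid_m_estimator(100)
```

With 100 observations, $\overline{U}$ is more variable, so $\widehat{\theta}$ will typically land further from $\theta_0 = \frac{1}{2}$ than with 10 000 observations.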