PS1 for Econometrics 101, Warwick Econ Ph.D

Exercise 1: working with probabilities (1/3)
The probability density function for the random variable X is given by:

    f_X(x) = 6x(1 − x) if 0 ≤ x ≤ 1, and 0 otherwise.

1. Find P(X ≤ 1/2).
2. Find the cumulative distribution function F_X(x) = P(X ≤ x).
3. Calculate P(X ≤ 1/2) using F_X(x). What is the median of X?
Exercise 2: working with probabilities (2/3)
Let X be a random variable with probability density function

    f_X(x) = x if 0 < x < 1, 2 − x if 1 < x < 2, and 0 otherwise.

Find V(X).
Exercise 3: working with probabilities (3/3)
Let X be a continuous random variable with pdf

    f_X(x) = (1/√(2π)) e^(−x²/2), x ∈ R.

Show that E(X) = 0 and V(X) = 1, knowing that ∫_{−∞}^{+∞} f_X(x) dx = 1.
Exercise 4: transpose, inner products, and outer products.
The transpose of a matrix A is a matrix B such that the element on the ith row and jth column of B is the element on the jth row and ith column of A. Putting it in other words, the rows of B are the columns of A. If A is a column vector, its transpose is the corresponding row vector. The transpose of A is denoted A'. If A is a square matrix, A' = A if and only if A is symmetric with respect to its diagonal.

a) Let

    A = ( 3  2 )
        ( 1  0 )
        ( 5  2 ).

Find A'.

b) Let

    B = ( 4  8 )
        ( 8  3 ).

Find B'.

c) Let

    C = ( 4 )
        ( 8 ).

Find C'.
Let X and β be two k×1 vectors. The inner product of X and β is a real number equal to X'β (as X' is 1×k and β is k×1, X'β is indeed a real number). This number is the sum over i of the product of the ith coordinates of X and β. The outer product of X and β is the k×k matrix equal to Xβ' (as X is k×1 and β' is 1×k, Xβ' is indeed a k×k matrix). The element on the ith row and jth column of Xβ' is the product of the ith coordinate of X and of the jth coordinate of β.

d) Let X = (2, 3)' and β = (6, −4)'. Compute X'β. Check that it is equal to β'X.

e) Compute the matrices Xβ' and βX'. Are they equal? Is there another relationship between the two?

f) Compute the matrix XX'. Which property does it satisfy?
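As a quick numerical illustration of these definitions (with vectors that differ from those used in questions d) to f)): for u = (1, 2)' and v = (3, 4)', the inner product u'v = 1×3 + 2×4 = 11 is a real number, while the outer product uv' is the 2×2 matrix with rows (3, 4) and (6, 8).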
Exercise 5: expectation and variance matrices.
For any matrix A whose elements are random variables Aij, E(A) is the matrix whose elements are the expectations of the random variables in A: the element on the ith row and jth column of E(A) is E(Aij), the expectation of the element on the ith row and jth column of A.
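For instance, if A = (A1, A2)' is a 2×1 vector of random variables with E(A1) = 0 and E(A2) = 5, then E(A) is the 2×1 vector (0, 5)'.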
a) Let

    X = ( X11  X12 )
        ( X21  X22 )
        ( X31  X32 )

be a 3×2 matrix of random variables, with E(X11) = 1, E(X12) = 4, E(X21) = −1, E(X22) = 3, E(X31) = −6, E(X32) = 4. Give the 3×2 matrix E(X). Give the 2×3 matrix E(X'). Check that E(X') = E(X)'. This result holds more generally: for any matrix of random variables X, E(X') = E(X)'.

Let X be a k×1 vector of random variables. V(X) is a k×k matrix, such that the element on the ith row and jth column of V(X) is cov(Xi, Xj).

b) Let X = (X1, X2)' be a 2×1 vector of random variables. Give an explicit expression of the 2×2 matrix (X − E(X))(X − E(X))', and then show that E((X − E(X))(X − E(X))') = V(X). This result holds more generally: for any k×1 vector of random variables, E((X − E(X))(X − E(X))') = V(X).
c) Let

    B = ( 4  8 )
        ( 8  3 )

and X = (X1, X2)'. Check that E(BX) = BE(X). This result holds more generally: for any k×k deterministic matrix B and k×1 vector of random variables X, E(BX) = BE(X). To which property of the expectation operator is this equality due?
d) Let

    X = ( X11  X12 )
        ( X21  X22 )

be a 2×2 matrix of random variables, and let

    C = ( 4  8 )
        ( 2  3 ).

Compute the matrices CXC', E(CXC'), and CE(X)C', and check that E(CXC') = CE(X)C'. This result holds more generally: for any k×k matrix of random variables X and for any k×k deterministic matrix C, E(CXC') = CE(X)C'.
e) Use the results from questions b), c), and d) to show that for any k×1 vector of random variables X and any k×k deterministic matrix B, V(BX) = BV(X)B'.
Exercise 6: A super-consistent estimator
Assume you observe an iid sample of n random variables (Yi) following the uniform distribution on [0, θ]. θ is the unknown parameter we would like to estimate.[1]

a) Show that E(Yi) = θ/2. Use this to form an estimator θ̂_MM for θ. Show that θ̂_MM is √n-consistent.
Consider the following alternative estimator for θ: θ̂_ML = max_{1≤i≤n} Yi.

b) Why does using θ̂_ML to estimate θ sound like a natural idea?

c) Show that for any x ∈ [0, θ], P(θ̂_ML ≤ x) = (x/θ)^n; for x < 0, P(θ̂_ML ≤ x) = 0; and for x > θ, P(θ̂_ML ≤ x) = 1.

d) Use this to show that n(θ − θ̂_ML)/θ ↪ U, where U follows an exponential distribution with parameter 1. Hint: to prove this, you need to use the definition of convergence in distribution in your lecture notes.

e) Which estimator is the best: θ̂_MM or θ̂_ML?
f) Illustrate this through a Monte-Carlo study. Draw 1000 iid realizations of variables following a uniform distribution on [0, 1] in Stata (you need to use the uniform() command), and compute θ̂_MM and θ̂_ML. What is the value of θ in this example? Which estimator is the closest to θ?
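A minimal Stata sketch of the Monte-Carlo comparison in f) (this sketch uses the modern runiform() function; on older Stata versions, replace runiform() with uniform() as in the problem statement):

    clear
    set seed 12345
    set obs 1000
    * 1000 iid draws from the uniform distribution on [0, 1], so the true theta is 1
    gen y = runiform()
    quietly summarize y
    * Method-of-moments estimator: twice the sample mean
    display "theta_MM = " 2*r(mean)
    * Maximum-likelihood estimator: the sample maximum
    display "theta_ML = " r(max)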
[1] We call this a parametric model, because the distribution of the data is fully known up to something, θ, which is a parameter, a one-dimensional object. When you just say that the (Yi) are iid, the distribution of the data is known up to F, the cdf of the (Yi). F can be any element of an infinite-dimensional set, the set of all possible cdfs. We call such models non-parametric models.
g) Let t_x denote the xth quantile of the exp(1) distribution. Show that

    IC(α) = [θ̂_ML, θ̂_ML + θ̂_ML t_{1−α}/n]

is a confidence interval for θ with asymptotic coverage 1 − α.
Exercise 7: Roy selection model (1951), and randomized experiments.
a) Try to prove the following theorem:

Theorem 0.0.1 (Using uniforms to generate other continuous distributions)
Let F denote a strictly increasing cdf. If U follows the uniform [0, 1] distribution, then the cdf of F⁻¹(U) is F.

Hint: the cdf of F⁻¹(U) is the function G(x) = P(F⁻¹(U) ≤ x). You need to show that G(x) = F(x).
b) The inverse of the cdf of a random variable following the exp(λ) distribution is x ↦ −(1/λ) ln(1 − x). It follows from the previous theorem that if U follows a uniform distribution on (0, 1), then −(1/λ) ln(1 − U) follows an exp(λ) distribution. Use this to generate in Stata a database with a first variable containing 1000 draws of the exp(1) distribution (we will call this variable Y0), and a second variable containing 1000 draws of the exp(0.8) distribution (we will call this variable Y1). The command to generate draws from uniform distributions in Stata is uniform().
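A minimal Stata sketch of this inverse-cdf construction (again using runiform(); with uniform() the code is otherwise identical):

    clear
    set seed 12345
    set obs 1000
    * Inverse-cdf transform: -(1/lambda)*ln(1 - U) follows an exp(lambda) distribution
    gen y0 = -ln(1 - runiform())            // 1000 draws from the exp(1) distribution
    gen y1 = -(1/0.8)*ln(1 - runiform())    // 1000 draws from the exp(0.8) distribution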
Say that the 1000 observations in your data are 1000 unemployed people, Y0 is the monthly wage (in thousand pounds) they will have 6 months from now if they do not participate in a training program, and Y1 is the wage they will get if they do participate. We would like to measure the effectiveness of this training program, and to do this the indicator we use is E(Y1 − Y0).
c) In this exercise, we know the probability distribution of Y0 and Y1: Y0 follows an exp(1) distribution, and Y1 follows an exp(0.8) distribution. Use this, and standard properties of exponential distributions, to show that E(Y1 − Y0) = 0.25. What is the effect of this program on participants' monthly wages?
We are first going to assume that the unemployed self-select into the training program, and that the decision rule they use is the following one: Ds = 1{Y1 − Y0 > 0.1}, where Ds is equal to 1 if the unemployed chooses to participate.
d) Give an interpretation of this decision rule from the perspective of economic theory. What do Y1 − Y0 and 0.1 represent?
e) This simple decision rule is called the Roy selection model. On which assumption does it rely? Any idea how to come up with a more realistic decision rule?
f) Generate the Ds variable in the data (you just need to compute a dummy equal to 1 when Y1 − Y0 > 0.1, and to 0 otherwise). Generate also Y, the observed outcome for each observation: Y = Y1 for people with Ds = 1, and Y = Y0 for people with Ds = 0. Compute the sample mean of Y among people with Ds = 1 minus the sample mean of Y among people with Ds = 0. That's an estimator for E(Y|Ds = 1) − E(Y|Ds = 0), the naive measure of the effect of the treatment we mentioned in the lectures. Is this estimator close to the average treatment effect, 0.25? Can you explain why this is the case?
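A minimal Stata sketch of f), continuing from the y0 and y1 variables created in the sketch for question b):

    gen ds = (y1 - y0 > 0.1)            // self-selection dummy
    gen y  = ds*y1 + (1 - ds)*y0        // observed outcome
    * Naive comparison of means between self-selected participants and non-participants
    quietly summarize y if ds == 1
    scalar m1 = r(mean)
    quietly summarize y if ds == 0
    scalar m0 = r(mean)
    display "Naive estimator = " m1 - m0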
Now, assume that for each unemployed you toss a fair coin, compel her to follow the program if she gets heads, and compel her not to follow it if she gets tails. The variable Dr denotes the result of that lottery: it is equal to 1 if the unemployed gets heads.

g) Generate the Dr variable in the data (you can create a dummy equal to 1 if uniform() ≤ 1/2). Compute the sample mean of Y among people with Dr = 1 minus the sample mean of Y among people with Dr = 0. Is the value of the estimator close to E(Y1 − Y0)? Compare the sample mean of Y0 among people with Dr = 1 to the sample mean of Y0 among people with Dr = 0. Compare the sample mean of Y1 among people with Dr = 1 to the sample mean of Y1 among people with Dr = 0. Why do those two comparisons illustrate that randomized experiments cancel out selection bias?
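A minimal Stata sketch of g), still using the variables from the previous sketches; y_r is a hypothetical name for the outcome observed under random assignment, which is one natural reading of "Y" in this question:

    gen dr  = (runiform() <= 0.5)       // fair-coin assignment
    gen y_r = dr*y1 + (1 - dr)*y0       // outcome observed under random assignment
    ttest y_r, by(dr)                   // difference in sample means across the two groups
    * Under randomization, the potential outcomes should be balanced across groups:
    ttest y0, by(dr)
    ttest y1, by(dr)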
h) The randomized experiment allows us to measure E(Y1 − Y0), up to some statistical uncertainty. In this particular context, explain where the uncertainty comes from.
i) Generate 300 samples of 1000 realizations of random variables following the same distribution as Y0 and Y1, compute for each of them the 90% confidence interval for E(Y1 − Y0) using the formula in Theorem 2.6.1 in the notes, and compute the percentage of times when E(Y1 − Y0) does not lie in the confidence interval. According to the theory, which percentage of the time should E(Y1 − Y0) lie in its confidence interval? Does what you find confirm the theory? Run the same exercise again with only 100 observations, and with only 30 observations. Is asymptotia very far?
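A minimal Stata sketch of the coverage simulation in i). It assumes that Theorem 2.6.1 gives the usual normal-approximation confidence interval for a difference in means with a pooled standard error (adapt the interval if the theorem's formula differs), and cisim is just a placeholder program name:

    capture program drop cisim
    program define cisim, rclass
        drop _all
        quietly set obs 1000
        quietly gen y0 = -ln(1 - runiform())
        quietly gen y1 = -(1/0.8)*ln(1 - runiform())
        quietly gen dr = (runiform() <= 0.5)
        quietly gen y  = dr*y1 + (1 - dr)*y0
        quietly ttest y, by(dr)
        * 90% interval: estimated difference +/- 1.645 standard errors
        return scalar covered = (0.25 >= r(mu_2) - r(mu_1) - 1.645*r(se)) & ///
                                (0.25 <= r(mu_2) - r(mu_1) + 1.645*r(se))
    end
    simulate covered=r(covered), reps(300) nodots: cisim
    summarize covered    // the mean is the empirical coverage rate

The same program can be reused for the smaller samples in i) and for question l) by changing the number of observations, the critical value, and the event being checked.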
j) The confidence interval we derived in the previous question relies on Theorem 2.6.1, which assumes that V(Y0) = V(Y1). Is this assumption satisfied here? Does this seem to matter?
k) Use Theorem 2.6.2 in the notes to show that with n = 1000, α = 0.05, and β = 0.8, the MDE of this experiment is 0.18 standard deviations of Y. Use the ANOVA formula (V(Y) = E(V(Y|Dr)) + V(E(Y|Dr))) to compute the standard deviation of Y, and show that the true effect is equal to 0.22 standard deviations of Y. Is this a well-designed experiment?
l) Assume we can only have 653 participants in the experiment. Use Theorem 2.6.2 in the notes to show that with n = 653, α = 0.05, and β = 0.8, the MDE of this experiment is 0.22 standard deviations of Y, which is exactly equal to the true effect of the experiment. Generate 300 pairs of variables following the same distribution as Y0 and Y1, compute for each of them the 95% confidence interval for E(Y1 − Y0), and compute the percentage of times when 0 lies in the confidence interval. According to the theory, which percentage of times should 0 lie within the confidence interval of E(Y1 − Y0)? Does the normal approximation work fine for these exponentially distributed data?
m) Write me a 10-line email summarizing your results: How do your results illustrate the fact that randomized experiments are a good tool to measure the effect of a treatment? How do they illustrate that there is still statistical uncertainty in the results of a randomized experiment?