PS5 for Econometrics 101, Warwick Econ Ph.D

Exercise 1: Using the number of calls before response to recover the average effect of the treatment
within a subgroup (Behaghel et al., 2012)
This question continues exercise 1 of the previous problem set and the question on
non-response in last year's exam. In this exercise, we assume that households were not
sent a letter, but were instead called over the phone and asked
whether someone in the household runs a business. Some households did not answer the
first phone call, so they had to be called again. Households were called a maximum of 25 times.
If they could not be reached within 25 calls, they were regarded as non-respondents. In
this context, Ri = 0 if household i did not take any of the 25 calls, and Ri = 1 otherwise.
The graph below shows the response rate in the treatment and control groups as a function
of the number of calls. For instance, you can see that slightly less than 50% of treated
households answered the questionnaire after 5 calls, while in the control group this rate was
reached only after 25 calls.
[Figure 3: Response rates by assignment according to the number of phone calls. Horizontal axis: number of calls (0 to 25); vertical axis: response rate (0 to .6); one curve each for Treatment and Control.]
a) What is the value of P̂(Ri = 1|Di = 1)? Of P̂(Ri = 0|Di = 0)? Will Lee bounds (those
studied in last year's end-of-term exam) be wide or narrow in this application?
b) Explain intuitively how you could use the number of calls to recover the effect of the treatment
in a subpopulation. Hint: the red lines on the graph might help. On which assumption
does this method rely?
c) How could you test whether that assumption is likely to hold?
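A minimal Python sketch of one way the number of calls could be used, in the spirit of Behaghel et al. (2012): it looks for the number of calls after which the treated group's response rate first reaches the control group's response rate after 25 calls. The function name, the toy call counts, and the encoding of non-respondents as infinity are illustrative assumptions, not data from the exercise.

```python
import numpy as np

def matching_call_cutoff(calls_treated, calls_control, max_calls=25):
    """Smallest number of calls at which the treated response rate reaches
    the control group's response rate after max_calls calls.

    calls_*: number of calls needed before the household answered;
    np.inf marks a household that never answered (hypothetical encoding).
    """
    calls_treated = np.asarray(calls_treated, dtype=float)
    calls_control = np.asarray(calls_control, dtype=float)
    control_rate = np.mean(calls_control <= max_calls)
    for n in range(1, max_calls + 1):
        if np.mean(calls_treated <= n) >= control_rate:
            return n, control_rate
    return None, control_rate  # the treated rate never reaches the control rate

# Toy data (made up): 10 households per arm.
treated = [1, 2, 3, 5, 5, 7, 9, np.inf, np.inf, np.inf]
control = [3, 10, 20, 25, 25, np.inf, np.inf, np.inf, np.inf, np.inf]
cutoff, rate = matching_call_cutoff(treated, control)
print(cutoff, rate)
```

Treated respondents reached within `cutoff` calls could then be compared with control respondents. Whether that comparison identifies a treatment effect depends on the assumption asked for in question b).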
Exercise 2: Group-level OLS regressions with time and group fixed effects estimate a weighted
sum of Wald-DIDs (de Chaisemartin and D'Haultfoeuille, 2015).
Assume you observe repeated cross-sections of the same population at various dates and that
units can belong to various groups (e.g. counties or states). The date at which a unit is
observed is represented by a random variable $T \in \{0, 1, ..., \bar{t}\}$, and the group a unit belongs
to is represented by a random variable $G \in \{0, 1, ..., \bar{g}\}$. For instance, if there are 5 periods
(0,1,2,3,4) in the data and groups are American states, $\bar{t} = 4$ while $\bar{g} = 49$. Let $Y$ be
an outcome variable. Let $D$ be a treatment variable. Let $\beta$ denote the coefficient of $D$
in a 2SLS regression of $Y$ on a constant, group dummies $(1\{G = g\})_{1 \le g \le \bar{g}}$, time dummies
$(1\{T = t\})_{1 \le t \le \bar{t}}$, and $D$, with a first stage fully saturated in $(T, G)$. The goal of this exercise
is to show that if $T \perp\!\!\!\perp G$, then $\beta$ is equal to a weighted sum of Wald-DIDs.
To lighten the notation, for every random variable $X$ and for any $(g, t) \in \{0, 1, ..., \bar{g}\} \times \{0, 1, ..., \bar{t}\}$, let $X_{gt}$ denote a random variable with the same probability distribution as $X | G = g, T = t$. For instance, $D_{00}$ denotes a random variable with the same probability
distribution as $D | G = 0, T = 0$. Because these two random variables have the same probability distribution, $E(X_{gt}) = E(X | G = g, T = t)$. For every $(g, g', t) \in \{0, ..., \bar{g}\}^2 \times \{1, ..., \bar{t}\}$,
let
$$DID_D(g, g', t) = E(D_{gt}) - E(D_{g,t-1}) - \left(E(D_{g't}) - E(D_{g',t-1})\right).$$
$DID_D(g, g', t)$ is the diff-in-diff comparing the evolution of the mean of the treatment variable
between groups $g$ and $g'$ and between periods $t-1$ and $t$. Assume that for every $1 \le t \le \bar{t}$
the mean of treatment does not follow a parallel evolution in any pair of groups between $t-1$
and $t$ (see footnote 1). This is equivalent to assuming that $DID_D(g, g', t) \neq 0$ for every $(g, g', t)$ with $g \neq g'$. If this is
satisfied, then we can define
$$W_{DID}(g, g', t) = \frac{E(Y_{gt}) - E(Y_{g,t-1}) - \left(E(Y_{g't}) - E(Y_{g',t-1})\right)}{E(D_{gt}) - E(D_{g,t-1}) - \left(E(D_{g't}) - E(D_{g',t-1})\right)}.$$
$W_{DID}(g, g', t)$ is the diff-in-diff comparing the evolution of the mean of the outcome variable
between groups $g$ and $g'$ and between periods $t-1$ and $t$, divided by the same diff-in-diff but
for the treatment variable. Finally, for $(g, t) \in \{1, ..., \bar{g}\} \times \{1, ..., \bar{t}\}$, let
$$w^a_{gt} = \frac{DID_D(g, g-1, t) P(G \ge g) P(T \ge t) \left(E(D | G \ge g, T \ge t) - E(D | G \ge g) - E(D | T \ge t) + E(D)\right)}{\sum_{t'=1}^{\bar{t}} \sum_{g'=1}^{\bar{g}} DID_D(g', g'-1, t') P(G \ge g') P(T \ge t') \left(E(D | G \ge g', T \ge t') - E(D | G \ge g') - E(D | T \ge t') + E(D)\right)}.$$
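The definitions of the diff-in-diff and the Wald-DID can be sketched numerically as follows. The function names and the 2 × 2 arrays of cell means are made up for illustration; rows index groups and columns index periods.

```python
import numpy as np

def did(cell_means, g, g_prime, t):
    """DID(g, g', t) for a (groups x periods) array of cell means."""
    if t < 1:
        raise ValueError("t must be at least 1")
    m = np.asarray(cell_means, dtype=float)
    return (m[g, t] - m[g, t - 1]) - (m[g_prime, t] - m[g_prime, t - 1])

def wald_did(mean_y, mean_d, g, g_prime, t):
    """W_DID(g, g', t): DID of the outcome divided by DID of the treatment."""
    denom = did(mean_d, g, g_prime, t)
    if denom == 0:
        raise ValueError("DID_D is zero, so the Wald-DID is not defined")
    return did(mean_y, g, g_prime, t) / denom

# Made-up cell means E(D_gt) and E(Y_gt): rows = groups 0..1, columns = periods 0..1.
mean_d = [[0.1, 0.2], [0.1, 0.5]]
mean_y = [[1.0, 1.2], [1.0, 2.0]]
did_d = did(mean_d, 1, 0, 1)
w = wald_did(mean_y, mean_d, 1, 0, 1)
print(did_d, w)
```

In this toy example the treatment rate rises by 0.4 in group 1 and by 0.1 in group 0, so the Wald-DID rescales the outcome diff-in-diff by the difference between those two changes.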
a) First, let's try to interpret the assumption that $T \perp\!\!\!\perp G$ by considering an example. If the
data is a survey of a representative sample of 100 000 individuals in the American population
made in three different years (100 000 individuals surveyed in 1991, 100 000 different individuals surveyed in 1992, 100 000 different individuals surveyed in 1993), and $G$ represents
individuals' state of residence, what does the assumption that $T \perp\!\!\!\perp G$ require? Is it likely to
be satisfied in this example?
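As a rough numerical companion to question a), the sketch below measures how far a table of group × period counts is from independence, by comparing the joint shares with the product of the marginal shares. The function name and the counts are hypothetical and only two states are used to keep the example small.

```python
import numpy as np

def independence_gap(counts):
    """Largest absolute gap between P(G=g, T=t) and P(G=g) * P(T=t)."""
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()
    p_g = joint.sum(axis=1)   # marginal distribution of G
    p_t = joint.sum(axis=0)   # marginal distribution of T
    return np.abs(joint - np.outer(p_g, p_t)).max()

# Made-up counts: rows = 2 states, columns = 3 survey years.
stable = [[60_000, 60_000, 60_000], [40_000, 40_000, 40_000]]
shifting = [[60_000, 55_000, 50_000], [40_000, 45_000, 50_000]]
print(independence_gap(stable), independence_gap(shifting))
```

The first table has the same state shares in every year, so the gap is zero up to rounding. In the second, the shares drift across years and the gap is strictly positive.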
Footnote 1: If, for some $t$, there are groups which experience a parallel evolution of their mean treatment between $t-1$ and $t$, the formula that you will obtain in question f.8) remains valid after grouping together these groups.
b) What is the predicted value for $D$ from the first stage regression? (Remember that the first
stage is fully saturated in $(T, G)$, which means that it has all the time dummies, all the group
dummies, and all their interactions.)
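One way to explore question b) numerically is to run a fully saturated regression on simulated data and compare its fitted values with the group × period cell means. The design below (3 groups, 2 periods, 50 units per cell, binary treatment) and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Balanced toy design (made up): 3 groups x 2 periods, 50 units per cell.
G = np.tile(np.repeat(np.arange(3), 2), 50)
T = np.tile(np.tile(np.arange(2), 3), 50)
D = rng.binomial(1, 0.2 + 0.1 * G + 0.2 * T)

# Saturated design: constant, group dummies, time dummies, all interactions.
group_dummies = (G[:, None] == np.arange(1, 3)).astype(float)
time_dummies = (T[:, None] == np.arange(1, 2)).astype(float)
interactions = (group_dummies[:, :, None] * time_dummies[:, None, :]).reshape(len(D), -1)
X = np.column_stack([np.ones(len(D)), group_dummies, time_dummies, interactions])

coef, *_ = np.linalg.lstsq(X, D.astype(float), rcond=None)
fitted = X @ coef

# Sample mean of D in each unit's own (G, T) cell.
cell_mean = np.array([D[(G == g) & (T == t)].mean() for g, t in zip(G, T)])
print(np.abs(fitted - cell_mean).max())
```

The printed number is the largest discrepancy between the fitted value and the cell mean across units.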
c) Conclude from question b) that $\beta = \frac{cov(Y, \tilde{Z})}{V(\tilde{Z})}$, where $\tilde{Z}$ is the residual from an OLS regression
of $E(D | G, T)$ on a constant, group dummies $(1\{G = g\})_{1 \le g \le \bar{g}}$, and time dummies $(1\{T = t\})_{1 \le t \le \bar{t}}$.
d) Show that $V(\tilde{Z}) = cov(E(D | T, G), \tilde{Z}) = cov(D, \tilde{Z})$. Conclude that $\beta = \frac{cov(Y, \tilde{Z})}{cov(D, \tilde{Z})}$.
e) Let $\alpha$, $\alpha_g$, and $\alpha_t$ respectively denote the coefficients of the constant and of the group and
time dummies in the regression of $E(D | G, T)$ on a constant, group dummies $(1\{G = g\})_{1 \le g \le \bar{g}}$,
and time dummies $(1\{T = t\})_{1 \le t \le \bar{t}}$.
e.1) As $T \perp\!\!\!\perp G$, one can show that $\alpha_t$ is equal to the coefficient of the dummy $1\{T = t\}$ in a
regression of $E(D | G, T)$ on a constant and time dummies $(1\{T = t\})_{1 \le t \le \bar{t}}$ (without the group
dummies). Use this to give a formula for $\alpha_t$.
e.2) As $T \perp\!\!\!\perp G$, one can show that $\alpha_g$ is equal to the coefficient of the dummy $1\{G = g\}$ in
a regression of $E(D | G, T)$ on a constant and group dummies $(1\{G = g\})_{1 \le g \le \bar{g}}$ (without the
time dummies). Use this to give a formula for $\alpha_g$.
e.3) Conclude from the 2 preceding questions that
$$\tilde{Z} = E(D | G, T) - \alpha - \left(E(D | G) - E(D | G = 0)\right) - \left(E(D | T) - E(D | T = 0)\right).$$
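The expression in e.3) can be checked numerically at the population level. The sketch below draws arbitrary cell means $E(D | G = g, T = t)$ and marginal probabilities, imposes $T \perp\!\!\!\perp G$ by taking the product of the marginals as cell probabilities, and compares the residual of the probability-weighted regression with the formula. All numbers and variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n_g, n_t = 4, 3
p_g = rng.dirichlet(np.ones(n_g))    # P(G = g), made up
p_t = rng.dirichlet(np.ones(n_t))    # P(T = t), made up
m = rng.uniform(size=(n_g, n_t))     # E(D | G = g, T = t), made up
cell_p = np.outer(p_g, p_t)          # P(G = g, T = t) when T is independent of G

# Weighted least squares of E(D|G,T) on a constant, group dummies, time dummies.
g_idx, t_idx = np.indices((n_g, n_t))
g_idx, t_idx = g_idx.ravel(), t_idx.ravel()
X = np.column_stack([
    np.ones(n_g * n_t),
    (g_idx[:, None] == np.arange(1, n_g)).astype(float),
    (t_idx[:, None] == np.arange(1, n_t)).astype(float),
])
sw = np.sqrt(cell_p.ravel())
coef, *_ = np.linalg.lstsq(X * sw[:, None], m.ravel() * sw, rcond=None)
z_regression = (m.ravel() - X @ coef).reshape(n_g, n_t)

# Formula from e.3), using the estimated constant alpha.
m_g = m @ p_t                        # E(D | G = g)
m_t = p_g @ m                        # E(D | T = t)
e_d = (cell_p * m).sum()             # E(D)
alpha = coef[0]
z_formula = m - alpha - (m_g - m_g[0])[:, None] - (m_t - m_t[0])[None, :]
print(np.abs(z_regression - z_formula).max())
```

The same residual can also be written as $E(D | G, T) - E(D | G) - E(D | T) + E(D)$, which is the form used in question f.1).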
f) Now that we have an explicit expression for $\tilde{Z}$, we can try to rewrite $\beta$. Let's first consider
its numerator.
f.1) Show that $cov(Y, \tilde{Z}) = E\left(\left(E(D | G, T) - E(D | G) - E(D | T) + E(D)\right) E(Y | G, T)\right)$.
f.2) Deduce from this that as $T \perp\!\!\!\perp G$,
$$cov(Y, \tilde{Z}) = \sum_{t=0}^{\bar{t}} \sum_{g=0}^{\bar{g}} P(G = g) P(T = t) \left(E(D_{gt}) - E(D | G = g) - E(D | T = t) + E(D)\right) E(Y_{gt}).$$
f.3) Deduce from this that
$$cov(Y, \tilde{Z}) = \sum_{t=0}^{\bar{t}} \sum_{g=0}^{\bar{g}} P(G = g) P(T = t) \left(E(D_{gt}) - E(D | G = g) - E(D | T = t) + E(D)\right) \left(E(Y_{gt}) - E(Y_{g0}) - (E(Y_{0t}) - E(Y_{00}))\right).$$
You need to check that the sum of the three terms added relative to question f.2) is equal
to 0.
f.4) Deduce from this that
$$cov(Y, \tilde{Z}) = \sum_{t=1}^{\bar{t}} \sum_{g=1}^{\bar{g}} P(G = g) P(T = t) \left(E(D_{gt}) - E(D | G = g) - E(D | T = t) + E(D)\right) \left(E(Y_{gt}) - E(Y_{g0}) - (E(Y_{0t}) - E(Y_{00}))\right).$$
f.5) Deduce from this that
$$cov(Y, \tilde{Z}) = \sum_{t=1}^{\bar{t}} \sum_{g=1}^{\bar{g}} P(G = g) P(T = t) \left(E(D_{gt}) - E(D | G = g) - E(D | T = t) + E(D)\right) \sum_{t'=1}^{t} \sum_{g'=1}^{g} W_{DID}(g', g'-1, t') DID_D(g', g'-1, t').$$
You need to show that
$$E(Y_{gt}) - E(Y_{g0}) - (E(Y_{0t}) - E(Y_{00})) = \sum_{t'=1}^{t} \sum_{g'=1}^{g} W_{DID}(g', g'-1, t') DID_D(g', g'-1, t').$$
It's easier to do if you start from the right-hand side.
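The identity to be shown in f.5) is a double telescoping sum, because $W_{DID}(g', g'-1, t') DID_D(g', g'-1, t')$ is just the diff-in-diff of the outcome between adjacent groups and adjacent periods. A quick numerical check on made-up cell means $E(Y_{gt})$ (rows are groups, columns are periods):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=(5, 4))   # made-up cell means E(Y_gt)

# did_y[g-1, t-1] = DID_Y(g, g-1, t) = W_DID(g, g-1, t) * DID_D(g, g-1, t)
did_y = np.diff(np.diff(y, axis=0), axis=1)

# Right-hand side: sum of DID_Y(g', g'-1, t') over g' <= g and t' <= t.
rhs = did_y.cumsum(axis=0).cumsum(axis=1)

# Left-hand side for g >= 1 and t >= 1.
lhs = y[1:, 1:] - y[1:, [0]] - y[[0], 1:] + y[0, 0]
print(np.abs(lhs - rhs).max())
```

This only confirms the identity on one draw; the exercise asks for the algebraic argument.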
f.6) Deduce from this that
$$cov(Y, \tilde{Z}) = \sum_{t=1}^{\bar{t}} \sum_{g=1}^{\bar{g}} W_{DID}(g, g-1, t) DID_D(g, g-1, t) \sum_{t'=t}^{\bar{t}} \sum_{g'=g}^{\bar{g}} P(G = g') P(T = t') \left(E(D_{g't'}) - E(D | G = g') - E(D | T = t') + E(D)\right).$$
f.7) Deduce from this that
$$cov(Y, \tilde{Z}) = \sum_{t=1}^{\bar{t}} \sum_{g=1}^{\bar{g}} W_{DID}(g, g-1, t) DID_D(g, g-1, t) P(G \ge g) P(T \ge t) \left(E(D | G \ge g, T \ge t) - E(D | G \ge g) - E(D | T \ge t) + E(D)\right).$$
f.8) Similarly, one can show that
$$cov(D, \tilde{Z}) = \sum_{t=1}^{\bar{t}} \sum_{g=1}^{\bar{g}} DID_D(g, g-1, t) P(G \ge g) P(T \ge t) \left(E(D | G \ge g, T \ge t) - E(D | G \ge g) - E(D | T \ge t) + E(D)\right).$$
Conclude from this and question f.7) that
$$\beta = \sum_{t=1}^{\bar{t}} \sum_{g=1}^{\bar{g}} W_{DID}(g, g-1, t) w^a_{gt}.$$
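The final formula can be checked on made-up population moments. The sketch below computes $\beta = cov(Y, \tilde{Z}) / cov(D, \tilde{Z})$ directly from cell means, then computes the weights and Wald-DIDs and compares the two. It assumes $T \perp\!\!\!\perp G$ by construction (cell probabilities are products of marginals); the function name and all inputs are hypothetical.

```python
import numpy as np

def beta_and_decomposition(p_g, p_t, mean_d, mean_y):
    """Return (beta, weights, wald_dids) from cell-level population moments,
    assuming T is independent of G so that P(G=g, T=t) = P(G=g) * P(T=t)."""
    p_g, p_t = np.asarray(p_g, float), np.asarray(p_t, float)
    d, y = np.asarray(mean_d, float), np.asarray(mean_y, float)
    cell_p = np.outer(p_g, p_t)
    e_d = (cell_p * d).sum()

    # Residual Z-tilde: E(D|G,T) - E(D|G) - E(D|T) + E(D), as in question f.1).
    z = d - (d @ p_t)[:, None] - (p_g @ d)[None, :] + e_d
    beta = (cell_p * z * y).sum() / (cell_p * z * d).sum()

    # DID_D(g, g-1, t) and W_DID(g, g-1, t) for g, t >= 1.
    did_d = np.diff(np.diff(d, axis=0), axis=1)
    did_y = np.diff(np.diff(y, axis=0), axis=1)
    wald = did_y / did_d

    # P(G>=g) P(T>=t) (E(D|G>=g,T>=t) - E(D|G>=g) - E(D|T>=t) + E(D)).
    n_g, n_t = d.shape
    tail = np.empty((n_g - 1, n_t - 1))
    for g in range(1, n_g):
        for t in range(1, n_t):
            pg, pt = p_g[g:].sum(), p_t[t:].sum()
            e_gt = (cell_p[g:, t:] * d[g:, t:]).sum() / (pg * pt)
            e_g = (cell_p[g:, :] * d[g:, :]).sum() / pg
            e_t = (cell_p[:, t:] * d[:, t:]).sum() / pt
            tail[g - 1, t - 1] = pg * pt * (e_gt - e_g - e_t + e_d)

    weights = did_d * tail / (did_d * tail).sum()
    return beta, weights, wald

rng = np.random.default_rng(3)
p_g = rng.dirichlet(np.ones(4))        # P(G = g), made up
p_t = rng.dirichlet(np.ones(3))        # P(T = t), made up
mean_d = rng.uniform(size=(4, 3))      # E(D_gt), made up
mean_y = rng.normal(size=(4, 3))       # E(Y_gt), made up
beta, weights, wald = beta_and_decomposition(p_g, p_t, mean_d, mean_y)
print(beta, (wald * weights).sum(), weights.sum())
```

The weights sum to one by construction, but nothing in the code forces them to be positive, so printing `weights` for a given draw shows whether some Wald-DIDs enter with a negative sign.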
As you saw in question b), this 2SLS regression is equivalent to an OLS regression of $Y$ on time
dummies, group dummies, and the mean of the treatment variable in each group × period
cell. An interesting special case of this type of OLS regression is when the unit of observation
is a group × period cell (say county × year), and the dependent variable is the average value
of $Y$ in each group × period cell. This type of group-level regression is extremely common in
economics. In such instances, groups are stable over time (unless counties or states appear or
disappear over the period under consideration, which is not going to be the case with 20th
century data), which implies that the assumption that $T \perp\!\!\!\perp G$ is satisfied. For instance,
Gentzkow et al. (2011) estimate a regression of political participation in county c and year t
on county and year dummies, and on the number of newspapers available in county c and year
t. The result you just derived shows that the coefficient of the number of newspapers in their
regression is equal to a weighted sum of Wald-DIDs, which are comparisons of the evolution of
political participation across pairs of counties and consecutive elections, scaled by the same
comparison but for the evolution of the number of newspapers available.