PS5 for Econometrics 101, Warwick Econ Ph.D

advertisement
PS5 for Econometrics 101, Warwick Econ Ph.D
Exercise 1: Using number of calls before response to recover the average eect of the treatment
within a subgroup (Behaghel et al., 2012)
This question is a continuation of exercise 1 in the previous problem set + the question on
non-response in the exam of last year. In this exercise, we assume that it was actually not
a letter which was sent to households, but that they were called over the phone to ask them
whether someone in the household runs a business or not. Some households did not answer the
rst phone call so they had to be called again. Households were called a maximum of 25 times.
If they could not be reached in less than 25 calls, they were regarded as non respondents. In
this context, Ri = 0 if household i did not take any of the 25 calls, and Ri = 1 otherwise.
The graph below shows response rate in the treatment and in the control groups as a function
of the number of calls. For instance, you can see that a little bit less than 50% of treated
households answered the questionnaire after 5 calls, while in the control group, this rate was
reached after 25 calls only.
1
25
Control
0
5
10
15
Number of calls
Treatment
20
Figure 3: Response rates by assignment according to the number of phone calls
0
.1
.2
.3
.4
.5
.6
a) What is the value of Pb(Ri = 1|Di = 1)? Of Pb(Ri = 0|Di = 0)? Will Lee bounds (those
which were studied in the end term exam of last year) be wide or narrow in this application?
b) Explain intuitively how you could use number of calls to recover the eect of the treatment
in a subpopulation. Hint: the red lines on the25graph might help. Under which assumption
does this method rely?
c) How could you test whether that assumption is likely to hold?
Solution
2
a) Pb(Ri = 1|Di = 1) is the sample response rate in the treatment group after 25 calls, which
is slightly less than 65%, as you can see from the Figure. Pb(Ri = 1|Di = 0) is the sample
response rate in the control group after 25 calls, which is equal to 50%. Lee bounds will be
wide in this application, because the dierence in response rates across the treatment and the
control group is large.
b) A solution could be to compare the entire subpopulation of respondents in the control
group (all of those who answered the questionnaire, irrespective of how many calls one had to
make before reaching them), to households in the treatment group which replied in less than
6 calls.
The reasoning in Behaghel et al. goes as follows. The graph clearly shows that treatment has
an eect on response: the two groups were similar in the beginning of the experiment, but
in the end the response rate is 15 percentage points larger in the treatment group. Because
of this, respondents in the treatment and in the control group are probably not comparable:
control respondents include only forgiving subjects, while treated respondents include both
forgiving and unforgiving subjects. But number of calls also has an eect on response: the
more you call people, the more they are likely to answer. Because of this, the response rate
after 6 calls in the treatment group is the same as in the control group after 25 calls. Calling
the control group 19 times more than the treatment group compensates, in terms of response
rate, the holding grudge eect which leads some control group subjects not to reply in the rst
place. If those 19 extra phone calls allow you to catch up control group subjects who have
been induced not to reply in less than 6 calls because they have been denied the treatment,
those two groups will indeed be comparable, and comparing them will capture the average
eect of the treatment among subjects who respond in less than 6 calls when they are sent to
the control group and in less than 25 calls when they are sent to the control group.
But if those 19 extra phone calls do not allow you to catch up control group subjects who
have been induced not to respond because of the fact they were denied treatment, then you
cannot be sure that the two groups you are comparing are indeed comparable.
c) The assumption under which this method relies is that subjects responding in less than 6
calls in the treatment group are comparable, in terms of their potential outcomes, to those
who respond in less than 25 calls in the control group. It is not possible to directly test this
assumption, but an indirect test amounts to comparing the mean of a number of baseline
covariates in those two groups.
Exercise 2: Group-level OLS regressions with time and group xed eects estimate a weighted
sum of Wald-DIDs (de Chaisemartin and D'Haultfoeuille, 2015).
Assume you observe repeated cross-sections of the same population at various dates and that
units can belong to various groups (e.g. counties or states). The date at which a unit is
observed is represented by a random variable T ∈ {0, 1, ..., t}, and the group a unit belongs
to is represented by a random variable G ∈ {0, 1, ..., g}. For instance, if there are 5 periods
3
(0,1,2,3,4) in the data and groups are American states, t = 4 while g = 49. Let Y be
an outcome variable. Let D be a treatment variable. Let β denote the coecient of D
in a 2SLS regression of Y on a constant, group dummies (1{G = g})1≤g≤g , time dummies
(1{T = t})1≤t≤t , and D, with a rst stage fully saturated in (T, G). The goal of this exercise
is to show that if T ⊥⊥ G, then β is equal to a weighted sum of Wald-DIDs.
To alleviate a little bit the notations, for every random variable X and for any (g, t) ∈
{0, 1, ..., g} × {0, 1, ..., t}, let Xgt denote a random variable with the same probability distribution as X|G = g, T = t. For instance, D00 denotes a random variable with the same probability
distribution as D|G = 0, T = 0. Because these two random variables have the same probability distribution, E(Xgt ) = E(X|G = g, T = t). For every (g, g 0 , t) ∈ {0, ..., g}2 × {1, ..., t},
let
DIDD (g, g 0 , t) = E(Dgt ) − E(Dgt−1 ) − E(Dg0 t ) − E(Dg0 t−1 ) .
DIDD (g, g 0 , t) is the di-in-di comparing the evolution of the mean of the treatment variable
between groups g and g 0 and between periods t − 1 and t. Assume that for every 1 ≤ t ≤ t
the mean of treatment does not follow a parallel evolution in any pair of groups between t − 1
and t.1 This is equivalent to assuming that DIDD (g, g 0 , t) 6= 0 for every (g, g 0 , t). If this is
satised, then we can dene
0
WDID (g, g , t) =
E(Ygt ) − E(Ygt−1 ) − E(Yg0 t ) − E(Yg0 t−1 )
.
E(Dgt ) − E(Dgt−1 ) − E(Dg0 t ) − E(Dg0 t−1 )
WDID (g, g 0 , t) is the di-in-di comparing the evolution of the mean of the outcome variable
between groups g and g 0 and between periods t − 1 and t, divided by the same di-in-di but
for the treatment variable. Finally, for (g, t) ∈ {1, ..., g} × {1, ..., t}, let
DIDD (g, g − 1, t)P (G ≥ g)P (T ≥ t) (E (D|G ≥ g, T ≥ t) − E (D|G ≥ g) − E (D|T ≥ t) + E(D))
.
Pt
t=1 DIDD (g, g − 1, t)P (G ≥ g)P (T ≥ t) (E (D|G ≥ g, T ≥ t) − E (D|G ≥ g) − E (D|T ≥ t) + E(D))
g=1
a
wgt
= P
g
a) First, let's try to interpret the assumption that T ⊥⊥ G by considering an example. If the
data is a survey of a representative sample of 100 000 individuals in the American population
made in three dierent years (100 000 individuals surveyed in 1991, 100 000 dierent individuals surveyed in 1992, 100 000 dierent individuals surveyed in 1993), and G represents
individuals' state of residence, what does the assumption that T ⊥⊥ G require? Is it likely to
be satised in this example?
b) What is the predicted value for D from the rst stage regression? (remember that the rst
stage is fully saturated in (T, G), which means that it has all the time dummies, all the group
dummies, and all their interactions)
Z̃)
, where Z̃ is the residual from an OLS regression
c) Conclude from question b) that β = cov(Y,
V (Z̃)
of E(D|G, T ) on a constant, group dummies (1{G = g})1≤g≤g , and time dummies (1{T =
t})1≤t≤t .
1
If for some t, there are groups which experience a parallel evolution of their mean treatment between
t−1
and t, the formula that you will obtain in question f.8) remains valid after grouping together these groups.
4
e = cov(E(D|T, G), Z)
e = cov(D, Z)
e . Conclude that β =
d) Show that V (Z)
cov(Y,Z̃)
.
cov(D,Z̃)
e) Let α, αg , and αt respectively denote the coecients of the constant and of the group and
time dummies in the regression of E(D|G, T ) on a constant, group dummies (1{G = g})1≤g≤g ,
and time dummies (1{T = t})1≤t≤t .
e.1) As T ⊥⊥ G, one can show that αt is equal to the coecient of the dummy 1{T = t} in a
regression of E(D|G, T ) on a constant and time dummies (1{T = t})1≤t≤t (without the group
dummies). Use this to give a formula for αt .
e.2) As T ⊥⊥ G, one can show that αg is equal to the coecient of the dummy 1{G = g} in
a regression of E(D|G, T ) on a constant and group dummies (1{G = g})1≤g≤g (without the
time dummies). Use this to give a formula for αg .
e.3) Conclude from the 2 preceding questions that
e = E(D|G, T ) − α − (E(D|G) − E(D|G = 0)) − (E(D|T ) − E(D|T = 0)) .
Z
f) Now that we have an explicit expression for Ze, we can try to rewrite β . Let's consider rst
its numerator.
e = E((E(D|G, T ) − E(D|G) − E(D|T ) + E(D))E(Y |G, T )).
f.1) Show that cov(Y, Z)
f.2) Deduce from this that as T ⊥⊥ G,
e =
cov(Y, Z)
g
t X
X
P (G = g)P (T = t)(E(Dgt ) − E(D|G = g) − E(D|T = t) + E(D))E(Ygt ).
t=0 g=0
f.3) Deduce from this that
e =
cov(Y, Z)
g
t X
X
P (G = g)P (T = t)(E(Dgt )−E(D|G = g)−E(D|T = t)+E(D)) (E(Ygt ) − E(Yg0 ) − (E(Y0t ) − E(Y00 ))) .
t=0 g=0
You need to check that the summation of the three terms added wrt to question f.2) is equal
to 0.
f.4) Deduce from this that
e =
cov(Y, Z)
g
t X
X
P (G = g)P (T = t)(E(Dgt )−E(D|G = g)−E(D|T = t)+E(D)) (E(Ygt ) − E(Yg0 ) − (E(Y0t ) − E(Y00 ))) .
t=1 g=1
f.5) Deduce from this that
e =
cov(Y, Z)
g
t X
X
P (G = g)P (T = t)(E(Dgt )−E(D|G = g)−E(D|T = t)+E(D))
g
t
X
X
WDID (g 0 , g 0 −1, t0 )DIDD (g 0 , g 0 −1, t0 )
t0 =1 g 0 =1
t=1 g=1
You need to show that
E(Ygt ) − E(Yg0 ) − (E(Y0t ) − E(Y00 )) =
g
t X
X
t0 =1 g 0 =1
5
WDID (g 0 , g 0 − 1, t0 )DIDD (g 0 , g 0 − 1, t0 ).
It's easier to do if you start from the right hand side.
f.6) Deduce from this that
e =
cov(Y, Z)
g
t X
X
g
t X
X
WDID (g, g−1, t)DIDD (g, g−1, t)
P (G = g 0 )P (T = t0 )(E(Dg0 t0 )−E(D|G = g 0 )−E(D|T = t0 )+E(D)).
t0 =t g 0 =g
t=1 g=1
f.7) Deduce from this that
e =
cov(Y, Z)
g
t X
X
WDID (g, g−1, t)DIDD (g, g−1, t)P (G ≥ g)P (T ≥ t) (E (D|G ≥ g, T ≥ t) − E (D|G ≥ g) − E (D|T ≥ t) + E(D)) .
t=1 g=1
f.8) Similarly, one can show that
e =
cov(D, Z)
g
t X
X
DIDD (g, g − 1, t)P (G ≥ g)P (T ≥ t) (E (D|G ≥ g, T ≥ t) − E (D|G ≥ g) − E (D|T ≥ t) + E(D)) .
t=1 g=1
Conclude from this and question f.7) that
β =
g
t X
X
a
WDID (g, g − 1, t)wgt
.
t=1 g=1
As you saw in question b), this 2SLS regression is equivalent to an OLS regression of Y on time
dummies, group dummies, and the mean of the treatment variable in each group × period
cell. An interesting special case of this type of OLS regression is when the unit of observation
is a group × period cell (say county × year), and the dependent variable is the average value
of Y in each group × period cell. This type of group level regression is extremely frequent in
economics. In such instances, groups are stable over time (unless counties or states appear or
disappear over the period under consideration, which is not going to be the case with 20th
century data), thus implying that the assumption that T ⊥⊥ G will be satised. For instance,
Gentzkow et al. (2011) estimate a regression of political participation in county c and year t
on county and year dummies, and on the number of newspapers available in county c and year
t. The result you just derived shows that the coecient of the number of newspapers in their
regression is equal to a weighted sum of Wald-DIDs which are comparisons of the evolution of
political participation across pairs of counties and consecutive elections, scaled by the same
comparison but for the evolution of the number of newspapers available.
Solution
a) We will have T ⊥⊥ G if for each state, P (G = g|T = 1991) = P (G = g|T = 1992) = P (G =
g|T = 1993): the share that each state accounts for in the American population should not
vary over time. This is unlikely to be exactly satised, but that should not be too strongly
violated either.
b) If the rst stage regression is fully saturated in (T, G), the regression function and the CEF
are the same thing. Therefore, the predicted value for D from the rst stage is E(D|T, G).
6
c) Therefore, the second stage is an OLS regression of Y on a constant, the time dummies,
the group dummies, and E(D|T, G). Then, the result follows from Frisch-Waugh.
d) The rst equality follows from the fact that
E(D|T, G) = α +
g
X
αg 1{G = g} +
g=1
t
X
αt 1{T = t} + Ze
t=1
and Ze is by construction uncorrelated with the time and group dummies. The second equality
follows from the law of iterated expectations.
e.1) The regression of E(D|G, T ) on a constant and time dummies (1{T = t})1≤t≤t corresponds
to the second example of a saturated regression in the lecture notes. Therefore,
αt = E(E(D|G, T )|T = t) − E(E(D|G, T )|T = 0) = E(D|T = t) − E(D|T = 0).
The second equality follows from the second law of iterated expectations.
e.2) Following the same reasoning,
αg = E(E(D|G, T )|G = g) − E(E(D|G, T )|G = 0) = E(D|G = g) − E(D|G = 0).
e.3) Therefore,
e
Z
=
E(D|G, T ) − α −
g
X
(E(D|G = g) − E(D|G = 0)) 1{G = g} −
(E(D|T = t) − E(D|T = 0)) 1{T = t}
t=1
g=1
=
t
X
E(D|G, T ) − α − (E(D|G) − E(D|G = 0)) − (E(D|T ) − E(D|T = 0)) .
f.1)
e = cov(Y, E(D|G, T ) − E(D|G) − E(D|T ))
cov(Y, Z)
= E((E(D|G, T ) − E(D|G) − E(D|T ) + E(D))Y )
= E((E(D|G, T ) − E(D|G) − E(D|T ) + E(D))E(Y |G, T )).
The rst equality follows from the fact that α, E(D|G = 0) and E(D|T = 0) are constant,
and from the fact that the mean of E(D|G, T ) − E(D|G) − E(D|T ) + E(D) is equal to 0 (rst
law of iterated of expectations). The second equality follows from the rst law of iterated
expectations.
f.2) The result merely follows from the denition of an expectation, and from the fact that
G ⊥⊥ T .
7
f.3)
g
t X
X
P (G = g)P (T = t)(E(Dgt ) − E(D|G = g) − E(D|T = t) + E(D))E(Y0t )
t=0 g=0
=
t
X

P (T = t)E(Y0t ) E(D) −
=
P (G = g)E(D|G = g) +
g=0
t=0
t
X
g
X
g
X

P (G = g|T = t)E(Dgt ) − E(D|T = t)
g=0
P (T = t)E(Y0t ) (E(D) − E(D) + E(D|T = t) − E(D|T = t))
t=0
= 0.
One can show that the other two terms added in the summation when going from question
f.2) to f.3) also sum up to 0.
f.4) It's easy to see that each term in the summation in f.3) with g = 0 or t = 0 is equal to 0.
f.5)
g
t X
X
WDID (g 0 , g 0 − 1, t0 )DIDD (g 0 , g 0 − 1, t0 )
t0 =1 g 0 =1
=
g
t X
X
E(Yg0 t0 ) − E(Yg0 t0 −1 ) − E(Yg0 −1t0 ) − E(Yg0 −1t0 −1 )
t0 =1 g 0 =1
=
t
X
E(Ygt0 ) − E(Ygt0 −1 ) − (E(Y0t0 ) − E(Y0t0 −1 ))
t0 =1
= E(Ygt ) − E(Yg0 ) − (E(Y0t ) − E(Y00 )) .
f.6) This follows from permuting the summations.
f.7) For instance, we have
g
t X
X
P (G = g 0 )P (T = t0 )E(Dg0 t0 )
t0 =t g 0 =g
=
g
t X
X
P (G = g 0 , T = t0 )E(Dg0 t0 )
t0 =t g 0 =g
g
t X
X
P (G = g 0 , T = t0 )
= P (G ≥ g, T ≥ t)
E(Dg0 t0 )
P (G ≥ g, T ≥ t)
0
0
t =t g =g
= P (G ≥ g)P (T ≥ t)
g
t X
X
P (G = g 0 , T = t0 |G ≥ g, T ≥ t)E(Dg0 t0 )
t0 =t g 0 =g
= P (G ≥ g)P (T ≥ t)E(D|G ≥ g, T ≥ t).
The result can be derived similarly for the other terms.
f.8) The result follows combining the formula for β obtained in d), the formula obtained in
f.7), and that given to you in f.8).
8
Download