A note on the assumptions underlying instrumented difference in differences.

∗
Clément de Chaisemartin
December 20, 2013
Abstract
Many papers use differential changes in treatment rate across groups over time as a source of variation to capture treatment effects. The resulting estimand is the Wald difference in differences, i.e. the ratio of the difference in differences on the outcome and on the treatment. This note studies under which assumptions this estimand captures a treatment effect parameter in a model with heterogeneous effects. It shows that this requires stronger assumptions than those of the standard difference in differences model. Besides standard common trends assumptions, the average treatment effect among subjects whose treatment changes over time should be the same across groups.
Keywords: Difference in differences, instrumental variable regressions, heterogeneous treatment effect.
JEL Codes: C21, C23
∗ Department of Economics, University of Warwick, clement.de-chaisemartin@warwick.ac.uk
1 Introduction
Difference in differences (DID) is one of the most popular methods for evaluating the effect of a treatment in the absence of experimental data. In its basic version, a control group is untreated at two dates, whereas a treatment group becomes treated at the second date. Under the assumption that the mean of the outcome would have followed the same evolution over the two periods in the two groups if the treatment group had remained untreated, the so-called common trend assumption, one can measure the effect of the treatment by comparing the evolution of the mean outcome in both groups. DID only requires repeated cross section data, not necessarily panel data, which may explain why this method is so pervasive.
Many natural experiments cannot be analyzed within the standard DID framework, because they do not lead to a sharp change in treatment rate for any group defined by a set of observable characteristics, but only to a larger increase of the treatment rate in some groups than in others. With panel data at hand, the analyst could define the treatment group as units going from non treatment to treatment between the two periods, while the control group could be made up of units remaining untreated at the two periods. But this definition of groups would be endogenous, and might violate the common trend assumption. Units choosing to go from non treatment to treatment between the two periods might do so because they experience different trends in outcomes.
The many papers that have exploited this type of variation to capture treatment effects have used linear instrumental variable (IV) regressions. Lochner & Moretti (2004) use state compulsory schooling laws as an instrument for schooling. They estimate IV regressions with time and state fixed effects as included instruments, while their excluded instrument is compulsory schooling years. With only two states and two periods of time, their regression coefficient would merely be the ratio of the DID on wages and on schooling, which I will hereafter refer to as the Wald-DID. With many states and many periods of time, one can show that, as their instrument only varies at the time and state level, their coefficient is a weighted average of Wald-DIDs across all possible pairs of states and periods of time. Other papers which estimate similar types of IV-DID regressions include Akerman et al. (2013), Burgess & Pande (2004), Duflo (2001), and Field (2007).
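To make the two-group, two-period Wald-DID concrete, here is a minimal Python sketch that computes it from repeated cross section data. The function name `wald_did`, the argument layout (one entry per observation) and the toy numbers are illustrative assumptions, not taken from any of the papers cited above.

```python
from statistics import fmean


def wald_did(y, d, t, g):
    """Ratio of the DID on the outcome y to the DID on the treatment d.

    y, d, t, g are equal-length sequences: outcome, treatment dummy,
    period dummy (0 or 1) and group dummy (0 = control, 1 = treatment).
    """
    def did(v):
        def cell_mean(period, group):
            # Mean of v among observations of a given period and group.
            return fmean(vi for vi, ti, gi in zip(v, t, g)
                         if ti == period and gi == group)
        return (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))

    return did(y) / did(d)


# Toy data (hypothetical): two observations per period-by-group cell.
y = [1, 1, 2, 2, 1, 1, 2, 5]
d = [0, 0, 0, 0, 0, 0, 0, 1]
t = [0, 0, 1, 1, 0, 0, 1, 1]
g = [0, 0, 0, 0, 1, 1, 1, 1]
estimate = wald_did(y, d, t, g)
```

In this toy sample the DID on the outcome is 1.5 and the DID on the treatment is 0.5, so the ratio is 3. Whether such a ratio can be read as a treatment effect is exactly the question the note addresses.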
Under the assumption that the effects of time and treatment are constant across units in the population, IV-DID regressions capture the effect of the treatment under standard IV assumptions. Models assuming constant treatment effects used to be very common in econometrics, but there has recently been a shift towards models allowing for heterogeneous effects, as many researchers think heterogeneous effects are a more realistic assumption. It is now well known that in a world with heterogeneous effects, instrumental variable regressions usually capture the average effect of the treatment for a subgroup of units, the so-called local average treatment effect (LATE) (see Imbens & Angrist (1994)). Surprisingly, there has been no paper studying the assumptions under which the aforementioned IV-DID regressions capture a LATE in a model with heterogeneous treatment effects. The purpose of this note is to undertake this study.
I show that IV-DID captures a LATE under substantially stronger assumptions than the standard DID assumptions. Firstly, IV-DID requires common trend assumptions not only for the outcome but for the treatment variable as well. This ensures that the denominator of the Wald-DID captures the share of compliers induced to get treated because of the instrument. Then, following the same logic as in Imbens & Angrist (1994), the numerator of the Wald-DID should capture the intention to treat, i.e. the effect of the instrument on the outcome. For this to be true, any change in the mean of the outcome due to time and not to the instrument should be the same in the two groups. As time has an effect both on the outcome and on selection into treatment, rationalizing this requires making four different common trends assumptions on the outcome: one for observations going from non treatment to treatment between the two dates, one for those going from treatment to non treatment, one for those remaining untreated, and one for those remaining treated.
Common trend assumptions for the two subgroups who switch treatment status are a combination of common trends and common effects assumptions. Once combined with standard common trend assumptions, they actually imply that treatment effects should be the same for switchers in the treatment and in the control groups. This is a substantial restriction on treatment effect heterogeneity across groups.
Overall, this shows that while the regression discontinuity design carries over to the fuzzy case without requiring much stronger identifying assumptions, the same does not apply to DID.
2 Assumptions under which the Wald-DID captures a LATE
I assume that the analyst has two random samples drawn at two dates from the same population.¹ Let $T$ be an indicator for the second period. Let $G$ be an indicator for whether an observation belongs to the treatment or to the control group. In most DID papers, $G$ is typically a function of a time invariant observable characteristic, such as gender. For every $t \in \{0, 1\}$, let $Z_t$ be a dummy for whether a unit receives an encouragement to get treated at date $t$. Let $D_{zt} \in \{0; 1\}$ denote her potential treatment at date $t$ when $Z_t = z$. Finally, let $Y_{dt}$ denote her potential outcome at date $t$ when $D = d$.² $T$, $G$, $Z = Z_T$, $D = D_{ZT}$, and $Y = Y_{DT}$ are observed. In this IV-DID framework, I assume that $Z = T \times G$: treatment group subjects in period 1 are the only ones who receive an encouragement for treatment. This implies that $Z_0 = 0$, and $Z_1 = G$.
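The mapping from potential to observed variables can be sketched in a few lines of Python. The function name `observe`, the dictionary layout and the numbers for the example unit are hypothetical and only meant to illustrate the notation.

```python
def observe(t, g, d_pot, y_pot):
    """Return the observed (Z, D, Y) of one unit at date t in group g.

    d_pot[(z, t)] stands for the potential treatment D_zt and
    y_pot[(d, t)] for the potential outcome Y_dt.
    """
    z = t * g            # Z = T x G: only the treatment group in period 1 is encouraged
    d = d_pot[(z, t)]    # D = D_ZT
    y = y_pot[(d, t)]    # Y = Y_DT
    return z, d, y


# Hypothetical unit who gets treated only when encouraged in period 1.
d_pot = {(0, 0): 0, (0, 1): 0, (1, 1): 1}
y_pot = {(0, 0): 1.0, (1, 0): 3.0, (0, 1): 2.0, (1, 1): 5.0}

in_treatment_group = observe(t=1, g=1, d_pot=d_pot, y_pot=y_pot)
in_control_group = observe(t=1, g=0, d_pot=d_pot, y_pot=y_pot)
```

Sampled in period 1, this unit would be observed treated with outcome 5 in the treatment group, and untreated with outcome 2 in the control group. The potential treatment for $z = 1$ at date 0 is never needed because $Z_0 = 0$.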
A natural way of studying IV-DID identification in a model with heterogeneous treatment effects is to view it as a special case of the LATE theorem in Imbens & Angrist (1994). Indeed, the Wald-DID is just a standard Wald ratio on first differences of outcomes and treatments, with group as the instrument. The first difference of treatment can take three values: $-1$, $0$, and $1$. One must therefore use Angrist and Imbens's model with multivariate treatment (see Angrist & Imbens, 1995). In this framework, one has to introduce one potential outcome for each possible value of treatment, and assume that potential outcomes and treatments are jointly independent of the instrument. Here, this amounts to introducing three potential first differences of the outcome depending on whether treatment status diminishes, increases, or remains stable. Let $\Delta D_0$, $\Delta D_1$, $\Delta Y_{-1}$, $\Delta Y_0$, and $\Delta Y_1$ denote those potential first differences of treatment and outcome. When applied to the Wald-DID, the random instrument assumption in Angrist & Imbens (1995) requires that
$$(\Delta D_0, \Delta D_1, \Delta Y_{-1}, \Delta Y_0, \Delta Y_1) \perp\!\!\!\perp G. \tag{2.1}$$
However, Equation (2.1) is hard to interpret. Rewriting it in terms of the original potential outcomes and treatments and not of their first differences yields
$$\left(D_{01} - D_{00},\; D_{11} - D_{10},\; Y_{01} - Y_{10},\; 1\{D_{01} = 0, D_{00} = 0\}(Y_{01} - Y_{00}) + 1\{D_{01} = 1, D_{00} = 1\}(Y_{11} - Y_{10}),\; Y_{11} - Y_{00}\right) \perp\!\!\!\perp G.$$
¹ The following results would also hold with panel data.
² Implicit in this notation is the assumption that the instrument does not have a direct effect on the outcome, which is usually referred to as an exclusion restriction.
This proves difficult to interpret, because $\Delta Y_0$ is not well defined as it can correspond to either $Y_{01} - Y_{00}$ or $Y_{11} - Y_{10}$.
Equation (2.1) still reveals the key assumptions needed to interpret the Wald-DID as a LATE. Firstly, trends on potential treatments should be the same in the control and in the treatment groups. This will ensure that the denominator of the Wald-DID captures the share of compliers in the population. Secondly, any change in the mean of the outcome due to time must be the same in the two groups, so as to ensure that the numerator of the Wald-DID captures the intention to treat effect of the instrument on the outcome. This implies for instance that
$$E(Y_{01} - Y_{10} \mid G = 1, D_{01} - D_{00} = -1) = E(Y_{01} - Y_{10} \mid G = 0, D_{01} - D_{00} = -1),$$
which means that the change in mean outcome for subjects who go from treatment to non treatment because of time and not because of the instrument should be the same in the two groups.
I consider a mean-independence version of Equation (2.1), which only includes the (mean-independence) pieces of Equation (2.1) necessary for identification.
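Assumption 2.1 (Common trends)
$$P(D_{01} - D_{00} = -1 \mid G = 1) = P(D_{01} - D_{00} = -1 \mid G = 0) \tag{2.2}$$
$$P(D_{01} - D_{00} = 1 \mid G = 1) = P(D_{01} - D_{00} = 1 \mid G = 0) \tag{2.3}$$
$$E(Y_{01} - Y_{10} \mid G = 1, D_{01} - D_{00} = -1) = E(Y_{01} - Y_{10} \mid G = 0, D_{01} - D_{00} = -1) \tag{2.4}$$
$$E(Y_{11} - Y_{00} \mid G = 1, D_{01} - D_{00} = 1) = E(Y_{11} - Y_{00} \mid G = 0, D_{01} - D_{00} = 1), \tag{2.5}$$
and
$$\begin{aligned}
E(Y_{01} - Y_{00} \mid G = 1, D_{01} = 0, D_{00} = 0) &= E(Y_{01} - Y_{00} \mid G = 0, D_{01} = 0, D_{00} = 0) \\
&= E(Y_{11} - Y_{10} \mid G = 1, D_{01} = 1, D_{00} = 1) \\
&= E(Y_{11} - Y_{10} \mid G = 0, D_{01} = 1, D_{00} = 1).
\end{aligned} \tag{2.6}$$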
The last equation means that trends on mean outcome for never treated subjects should be the same in the two groups, and they should also be the same as trends for always treated subjects. Once combined with (2.2) and (2.3), this ensures that
$$\Delta Y_0 = 1\{D_{01} = 0, D_{00} = 0\}(Y_{01} - Y_{00}) + 1\{D_{01} = 1, D_{00} = 1\}(Y_{11} - Y_{10})$$
is mean independent of $G$.
Then, I also make the following assumption, which is just the standard monotonicity condition in IV models (see Imbens & Angrist (1994)).
Assumption 2.2 (Monotonicity)
For every $t \in \{0, 1\}$, $D_{1t} \geq D_{0t}$.
Under those two assumptions, one can show that the Wald-DID captures a LATE.
Theorem 2.1 (The LATE-DID Theorem)
Suppose Assumptions 2.1 and 2.2 hold. Then
$$E(Y_{11} - Y_{01} \mid G = 1, D_{01} = 0, D_{11} = 1) = \frac{E(Y \mid T = 1, G = 1) - E(Y \mid T = 0, G = 1) - \left(E(Y \mid T = 1, G = 0) - E(Y \mid T = 0, G = 0)\right)}{E(D \mid T = 1, G = 1) - E(D \mid T = 0, G = 1) - \left(E(D \mid T = 1, G = 0) - E(D \mid T = 0, G = 0)\right)}.$$
Proof
Consider first the numerator of the Wald-DID.
$$\begin{aligned}
& E(Y \mid T = 1, G = 1) - E(Y \mid T = 0, G = 1) - \left(E(Y \mid T = 1, G = 0) - E(Y \mid T = 0, G = 0)\right) \\
={}& E(Y_{D_{11}1} \mid T = 1, G = 1) - E(Y_{D_{00}0} \mid T = 0, G = 1) - \left(E(Y_{D_{01}1} \mid T = 1, G = 0) - E(Y_{D_{00}0} \mid T = 0, G = 0)\right) \\
={}& E(Y_{D_{11}1} - Y_{D_{00}0} \mid G = 1) - E(Y_{D_{01}1} - Y_{D_{00}0} \mid G = 0) \\
={}& E(Y_{D_{11}1} - Y_{D_{01}1} \mid G = 1) + E(Y_{D_{01}1} - Y_{D_{00}0} \mid G = 1) - E(Y_{D_{01}1} - Y_{D_{00}0} \mid G = 0).
\end{aligned} \tag{2.7}$$
The first equality follows from the fact that only the treatment group in period 1 receives the instrument, so for them $Y = Y_{D_{11}1}$, while in the three remaining groups $Y = Y_{D_{0T}T}$. The second equality follows because the $T = 0$ and $T = 1$ samples are two random samples of the same population.
I shall now show that the second piece in the last equality, i.e. the difference between its last two terms, is equal to 0.
$$\begin{aligned}
& E(Y_{D_{01}1} - Y_{D_{00}0} \mid G = 1) - E(Y_{D_{01}1} - Y_{D_{00}0} \mid G = 0) \\
={}& E(Y_{01} - Y_{10} \mid G = 1, D_{01} - D_{00} = -1) P(D_{01} - D_{00} = -1 \mid G = 1) \\
& - E(Y_{01} - Y_{10} \mid G = 0, D_{01} - D_{00} = -1) P(D_{01} - D_{00} = -1 \mid G = 0) \\
& + E(Y_{11} - Y_{00} \mid G = 1, D_{01} - D_{00} = 1) P(D_{01} - D_{00} = 1 \mid G = 1) \\
& - E(Y_{11} - Y_{00} \mid G = 0, D_{01} - D_{00} = 1) P(D_{01} - D_{00} = 1 \mid G = 0) \\
& + E(Y_{01} - Y_{00} \mid G = 1, D_{01} = 0, D_{00} = 0) P(D_{01} = 0, D_{00} = 0 \mid G = 1) \\
& + E(Y_{11} - Y_{10} \mid G = 1, D_{01} = 1, D_{00} = 1) P(D_{01} = 1, D_{00} = 1 \mid G = 1) \\
& - E(Y_{01} - Y_{00} \mid G = 0, D_{01} = 0, D_{00} = 0) P(D_{01} = 0, D_{00} = 0 \mid G = 0) \\
& - E(Y_{11} - Y_{10} \mid G = 0, D_{01} = 1, D_{00} = 1) P(D_{01} = 1, D_{00} = 1 \mid G = 0) \\
={}& 0.
\end{aligned} \tag{2.8}$$
The three differences on the right-hand side of the first equality are all equal to 0. For the first, this follows from Equations (2.2) and (2.4). For the second, this follows from Equations (2.3) and (2.5). For the third, this follows from Equations (2.2), (2.3), and (2.6).
Then, it follows from Assumption 2.2 that
$$E(Y_{D_{11}1} - Y_{D_{01}1} \mid G = 1) = E(Y_{11} - Y_{01} \mid G = 1, D_{01} = 0, D_{11} = 1) P(D_{01} = 0, D_{11} = 1 \mid G = 1). \tag{2.9}$$
Let us now turn to the denominator of the Wald-DID.
$$\begin{aligned}
& E(D \mid T = 1, G = 1) - E(D \mid T = 0, G = 1) - \left(E(D \mid T = 1, G = 0) - E(D \mid T = 0, G = 0)\right) \\
={}& E(D_{11} - D_{00} \mid G = 1) - E(D_{01} - D_{00} \mid G = 0) \\
={}& E(D_{11} - D_{01} \mid G = 1) + E(D_{01} - D_{00} \mid G = 1) - E(D_{01} - D_{00} \mid G = 0) \\
={}& E(D_{11} - D_{01} \mid G = 1) \\
={}& P(D_{01} = 0, D_{11} = 1 \mid G = 1).
\end{aligned} \tag{2.10}$$
The last but one equality follows from (2.2) and (2.3), while the last one follows from Assumption 2.2.
The result follows after combining Equations (2.7), (2.8), (2.9), and (2.10).
QED.
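The theorem can be checked by exact arithmetic on a small made-up population. The sketch below is only an illustration: the strata, their shares and their mean potential outcomes are hypothetical numbers chosen so that Equations (2.2) to (2.6) and Assumption 2.2 hold, and the helper names `cell_means` and `wald_did` are not from the note.

```python
from fractions import Fraction as F

# Each stratum: (share within group, (D00, D01, D11), {(d, t): mean of Y_dt}).
# All numbers are hypothetical and satisfy Assumptions 2.1 and 2.2.
GROUP1 = [
    (F(3, 10), (0, 0, 0), {(0, 0): 1, (0, 1): 2}),             # never treated
    (F(2, 10), (0, 0, 1), {(0, 0): 1, (0, 1): 2, (1, 1): 5}),  # compliers, effect 3
    (F(2, 10), (1, 1, 1), {(1, 0): 4, (1, 1): 5}),             # always treated
    (F(2, 10), (0, 1, 1), {(0, 0): 1, (1, 1): 4}),             # switch in over time
    (F(1, 10), (1, 0, 0), {(1, 0): 3, (0, 1): 2}),             # switch out over time
]
GROUP0 = [
    (F(4, 10), (0, 0, 0), {(0, 0): 0, (0, 1): 1}),             # never treated
    (F(3, 10), (1, 1, 1), {(1, 0): 2, (1, 1): 3}),             # always treated
    (F(2, 10), (0, 1, 1), {(0, 0): 0, (1, 1): 3}),             # switch in over time
    (F(1, 10), (1, 0, 0), {(1, 0): 5, (0, 1): 4}),             # switch out over time
]


def cell_means(strata, t, z):
    """E(D | T=t, G) and E(Y | T=t, G) when the group's instrument at date t equals z."""
    idx = 0 if t == 0 else 1 + z  # position of D_zt in the (D00, D01, D11) triple
    e_d = sum(share * d[idx] for share, d, _ in strata)
    e_y = sum(share * y[(d[idx], t)] for share, d, y in strata)
    return e_d, e_y


def wald_did(group1, group0):
    d_t1, y_t1 = cell_means(group1, t=1, z=1)  # treatment group, period 1: Z = 1
    d_t0, y_t0 = cell_means(group1, t=0, z=0)
    d_c1, y_c1 = cell_means(group0, t=1, z=0)
    d_c0, y_c0 = cell_means(group0, t=0, z=0)
    return ((y_t1 - y_t0) - (y_c1 - y_c0)) / ((d_t1 - d_t0) - (d_c1 - d_c0))


late = F(5 - 2)  # E(Y11 - Y01) among treatment group compliers
ratio = wald_did(GROUP1, GROUP0)
```

With these numbers the DID on the outcome is 3/5 and the DID on the treatment is 1/5, the share of compliers in the treatment group, so the ratio equals the compliers' average effect of 3. Levels of the outcome differ across groups; only the trends and the shares of time switchers are restricted.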
Theorem 2.1 should not come as a surprise: it is well known that the Wald ratio captures a LATE in models with heterogeneous effects under assumptions in the spirit of Imbens & Angrist (1994). What is more surprising is that when translated to the IV-DID context, those assumptions prove stronger than a mere combination of DID and IV type of assumptions. DID identification only requires one common trend assumption for the outcome. Here, common trend assumptions are needed both for the outcome and for the treatment variable. Imposing an additive model for the treatment variable might be more problematic than for the outcome, as treatment is binary. Moreover, several common trend assumptions are needed for the outcome. Finally, and most importantly, Equations (2.4) and (2.5) are a combination of common trend and common effect assumptions. Consider the following mean independence conditions:
$$E(Y_{11} - Y_{10} \mid G = 1, D_{01} - D_{00} = -1) = E(Y_{11} - Y_{10} \mid G = 0, D_{01} - D_{00} = -1) \tag{2.11}$$
$$E(Y_{01} - Y_{00} \mid G = 1, D_{01} - D_{00} = 1) = E(Y_{01} - Y_{00} \mid G = 0, D_{01} - D_{00} = 1) \tag{2.12}$$
They are also implied by Equation (2.1).
Strictly speaking, they are not necessary to
prove Theorem 2.1, but it is hard to think of a model rationalizing Assumption 2.1 but not
Equations (2.11) and (2.12). Combining Equations (2.4) and (2.11), and Equations (2.5)
and (2.12) yields
$$E(Y_{11} - Y_{01} \mid G = 1, D_{01} - D_{00} = -1) = E(Y_{11} - Y_{01} \mid G = 0, D_{01} - D_{00} = -1)$$
$$E(Y_{11} - Y_{01} \mid G = 1, D_{01} - D_{00} = 1) = E(Y_{11} - Y_{01} \mid G = 0, D_{01} - D_{00} = 1).$$
Therefore, the LATE interpretation of the Wald-DID does not only require common trend assumptions, it also requires that the effect of the treatment be the same in the two groups, at least for subjects who switch treatment status between the two dates.
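A back-of-the-envelope sketch of what happens when this common effect requirement fails. The formula coded below is not stated in the note: it is obtained by assuming that (2.2), (2.3), (2.6), (2.11), (2.12) and Assumption 2.2 hold while (2.4) and (2.5) may fail, and plugging this into the decomposition (2.8). The function name, argument names and numbers are hypothetical.

```python
from fractions import Fraction as F


def wald_did_limit(late, p_compliers, p_in, p_out, gap_in, gap_out):
    """Value of the Wald-DID when time switchers' effects may differ across groups.

    late        : E(Y11 - Y01) among treatment group compliers
    p_compliers : P(D01 = 0, D11 = 1 | G = 1)
    p_in, p_out : common shares of units with D01 - D00 = 1 and D01 - D00 = -1
    gap_in      : E(Y11 - Y01 | G = 1) minus E(Y11 - Y01 | G = 0), among D01 - D00 = 1
    gap_out     : the same between-group gap, among D01 - D00 = -1
    """
    # Second piece of (2.7), rewritten with (2.11) and (2.12): pure time trends
    # cancel across groups, differences in treatment effects do not.
    second_piece = p_in * gap_in - p_out * gap_out
    return late + second_piece / p_compliers


# Hypothetical numbers: LATE of 3, 20% compliers, 20% switch in, 10% switch out.
equal_effects = wald_did_limit(F(3), F(1, 5), F(1, 5), F(1, 10), gap_in=F(0), gap_out=F(0))
unequal_effects = wald_did_limit(F(3), F(1, 5), F(1, 5), F(1, 10), gap_in=F(-2), gap_out=F(0))
```

With equal effects the ratio returns the LATE of 3. When units switching into treatment over time gain 2 less in the treatment group than in the control group, the same ratio falls to 1, even though every pure time trend is common to both groups.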
I finally consider an alternative set of assumptions under which the Wald-DID captures the same LATE.
Assumption 2.3 (Common trends 2)
$$E(D_{01} - D_{00} \mid G = 1) = E(D_{01} - D_{00} \mid G = 0) \tag{2.13}$$
$$E(Y_{D_{01}1} - Y_{D_{00}0} \mid G = 1) = E(Y_{D_{01}1} - Y_{D_{00}0} \mid G = 0). \tag{2.14}$$
One can prove Theorem 2.1 under Assumptions 2.3 and 2.2 instead of Assumptions 2.1 and 2.2. Assumption 2.3 is not easy to interpret, as (2.14) is not an assumption on potential outcomes but on the outcome which would be observed in each group if the treatment group had not received an encouragement for treatment in period 1. It therefore combines common trend assumptions on both potential outcomes and selection into treatment. It is weaker than Assumption 2.1. To see this, notice that
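$$\begin{aligned}
E(Y_{D_{01}1} - Y_{D_{00}0} \mid G = g) ={}& P(D_{01} - D_{00} = -1 \mid G = g) E(Y_{01} - Y_{10} \mid G = g, D_{01} - D_{00} = -1) \\
& + P(D_{01} - D_{00} = 1 \mid G = g) E(Y_{11} - Y_{00} \mid G = g, D_{01} - D_{00} = 1) \\
& + P(D_{01} = 0, D_{00} = 0 \mid G = g) E(Y_{01} - Y_{00} \mid G = g, D_{01} = 0, D_{00} = 0) \\
& + P(D_{01} = 1, D_{00} = 1 \mid G = g) E(Y_{11} - Y_{10} \mid G = g, D_{01} = 1, D_{00} = 1).
\end{aligned}$$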
Assumption 2.1 implies that the right hand side does not depend on $g$, which in turn implies that Assumption 2.3 is satisfied. But for Assumption 2.3 to hold while Assumption 2.1 is violated, the previous display shows that deviations from Equations (2.2), (2.3), (2.4), (2.5), and (2.6) should exactly compensate one another. Once again, it is hard to think of a model rationalizing this.
3 Conclusion
This note investigates the conditions under which IV-DID regressions capture a local average treatment effect in a model with heterogeneous effects. It shows that this requires stronger assumptions than the standard common trend assumption needed for DID. In particular, this requires that the average effect of the treatment be the same in the treatment and in the control groups for observations who switch treatment status between the two dates.
References
Akerman, A., Gaarder, I. & Mogstad, M. (2013), The skill complementarity of broadband internet, Technical report, Stockholm University, Department of Economics.
Angrist, J. D. & Imbens, G. W. (1995), `Two-stage least squares estimation of average causal effects in models with variable treatment intensity', Journal of the American Statistical Association 90(430), 431-442.
Burgess, R. & Pande, R. (2004), `Can rural banks reduce poverty? Evidence from the Indian social banking experiment', American Economic Review.
Duflo, E. (2001), `Schooling and labor market consequences of school construction in Indonesia: Evidence from an unusual policy experiment', American Economic Review 91(4), 795-813.
Field, E. (2007), `Entitled to work: Urban property rights and labor supply in Peru', The Quarterly Journal of Economics 122(4), 1561-1602.
Imbens, G. W. & Angrist, J. D. (1994), `Identification and estimation of local average treatment effects', Econometrica 62(2), 467-75.
Lochner, L. & Moretti, E. (2004), `The effect of education on crime: Evidence from prison inmates, arrests, and self-reports', The American Economic Review 94(1), 155-189.