Fuzzy Differences-in-Differences
Clément de Chaisemartin
University of Warwick
Xavier D'Haultfœuille
CREST
McGill, 01/25/2016
Outline
Motivation: Wald-DIDs are everywhere
Two pitfalls of the Wald-DID
One new design restriction, two new estimands
Applications
Take-away
The Wald-DID
- Duflo (2001), returns to education.
  - Design. Primary school construction program in Indonesia. Some districts receive more new schools than others. Two groups: high/low program districts. Old cohorts not exposed, young cohorts exposed.
  - Estimation. Regress individuals' wages on a dummy for young cohorts, a dummy for high program districts, and years of schooling. Instrument for schooling: interaction of the two dummies.
- Let W = wages and S = schooling. The coefficient of schooling is the Wald-DID:
\[
\frac{E(W|\text{Young},\text{High prog.}) - E(W|\text{Old},\text{High prog.}) - \left(E(W|\text{Young},\text{Low prog.}) - E(W|\text{Old},\text{Low prog.})\right)}{E(S|\text{Young},\text{High prog.}) - E(S|\text{Old},\text{High prog.}) - \left(E(S|\text{Young},\text{Low prog.}) - E(S|\text{Old},\text{Low prog.})\right)}
\]
- I will now show that, beyond this example, all papers estimating OLS or 2SLS regressions with time and group fixed effects estimate Wald-DIDs.
2SLS regressions with time and group dummies also estimate a weighted average of Wald-DIDs
- Duflo (2001) then moves to richer specifications.
  - Estimation. Regress individuals' wages on cohort and district of birth dummies, and on years of schooling. Instrument for schooling: being born after the program × number of schools constructed in district of birth.
- Coefficient of years of schooling $= \sum_{d=1}^{\bar{d}} w_d W_{DID}(d, d-1)$:
  1. Districts ranked according to increase in schooling between old and young cohorts.
  2. $W_{DID}(d, d-1)$: DID comparing the evolution of wages between cohorts affected/not affected by the program in districts $d$ and $d-1$, divided by same DID for years of schooling.
  3. $w_d$: weights summing up to 1 (formula in supplementary materials).
Regressions with time and group dummies estimate a weighted average of Wald-DIDs
- Enikolopov et al. (2011) study the effect of independent information on votes.
  - Design. Introduction of independent TV channel between 1995 and 1999 elections in Russia. Signal quality better in some regions than in others.
  - Estimation. Regress % votes for opposition in region $r$ and year $t$ on region dummies, 1999 election dummy, and % of people with access to independent TV in region $r$ and year $t$.
- We show that coefficient of access to TV $= \sum_{r=1}^{\bar{r}} w_r W_{DID}(r, r-1)$.
  - $W_{DID}(r, r-1)$: DID comparing evolution of votes for opposition between 1995 and 1999 in regions $r$ and $r-1$, divided by same DID for % of people with access to TV.
  - $w_r$: weights summing up to 1.
First-difference regressions also estimate a weighted average of Wald-DIDs
- Gentzkow et al. (2011): effects of newspapers on electoral participation.
  - Design. Changes in number of newspapers available in US counties from 1872 to 1928.
  - Estimation. Regress change in electoral participation between elections $t-1$ and $t$ in county $c$ on year dummies and change in the number of newspapers between $t-1$ and $t$.
- Coefficient of newspapers $= \sum_{t=1872}^{1928} \sum_{c=1}^{\bar{c}} w_{ct} W_{DID}(c, c-1, t)$.
  - $W_{DID}(c, c-1, t)$: DID comparing evolution of participation between elections $t-1$ and $t$ in counties $c$ and $c-1$, divided by same DID for number of newspapers.
  - $w_{ct}$: weights summing up to 1.
Wald-DIDs are everywhere...
- 10.1% of the 337 papers published in the American Economic Review between 2010 and 2012 estimate either
  - the exact same 2SLS regression as in Duflo (2001),
  - or the exact same OLS regression as in Enikolopov et al. (2011),
  - or the exact same OLS regression as in Gentzkow et al. (2011),
  thus implying that they estimate weighted averages of Wald-DIDs.
- Excluding purely theoretical papers from the denominator, this share rises to 19.7%.
... but we do not know under which assumptions Wald-DID estimates a causal effect.
- Most treatment effect estimands were initially studied under a standard linear and constant treatment effect model.
- Over the last twenty years, there has been a move away from this model: unrealistic assumptions.
- Conditions under which DID estimates a causal effect in a model with heterogeneous treatment effects are well-known (see, e.g. Blundell et al., 2004, or Abadie, 2005).
- Surprisingly, there is no paper studying under which assumptions Wald-DID estimates a causal effect.
Contributions of this paper
- We show that Wald-DID heavily relies on two strong assumptions:
  1. Treatment effects should be homogeneous between groups.
  2. Treatment effects should not change over time.
- We argue that these assumptions are often not plausible.
- We propose:
  1. A design restriction to solve problem 1: find a control group where treatment is stable. Often easy to achieve.
  2. Two new estimators to solve problem 2: time-corrected Wald ratio and changes-in-changes Wald ratio. Easy to use: Stata package. Allows for binary and multivariate treatment, discrete and continuous control variables, multiple periods and groups, clustering.
- We use our results to revisit Duflo (2001) and Gentzkow et al. (2011). We obtain economically and significantly different results from the authors'.
Set-up and notations
- We have a repeated cross-sections or cohorts data set (results also apply to panel data under slight modifications of our assumptions, see paper).
- Data can be divided into:
  1. Two groups. G = 0: control, and G = 1: treatment. E.g.: districts with few/many schools constructed.
  2. Two periods. T = 0 and T = 1. E.g.: old/young cohorts.
- Interested in effect of binary treatment D on outcome Y. Y(0) and Y(1): potential outcomes of same individual. Y(1) − Y(0): treatment effect.
- Notations:
  1. For any random variable $X$,
\[
DID_X = E(X|G=1,T=1) - E(X|G=1,T=0) - \left(E(X|G=0,T=1) - E(X|G=0,T=0)\right).
\]
  2. $W_{DID} = \dfrac{DID_Y}{DID_D}$.
- I focus on the simple case with binary treatment, two groups, two periods: all results extend to more general cases.
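The two notations above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `did` and `wald_did` and the toy arrays are hypothetical and are not taken from the authors' Stata package.

```python
import numpy as np

def did(x, g, t):
    """DID_X: change in the mean of x in group 1 minus the change in group 0."""
    x, g, t = (np.asarray(a, dtype=float) for a in (x, g, t))
    cell_mean = lambda gg, tt: x[(g == gg) & (t == tt)].mean()
    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

def wald_did(y, d, g, t):
    """W_DID = DID_Y / DID_D."""
    return did(y, g, t) / did(d, g, t)

# Hypothetical toy data: two observations per (group, period) cell.
g = [0, 0, 0, 0, 1, 1, 1, 1]
t = [0, 0, 1, 1, 0, 0, 1, 1]
d = [0, 1, 0, 1, 0, 0, 1, 1]   # treatment rate: control 0.5 -> 0.5, treatment 0 -> 1
y = [1, 3, 2, 4, 1, 1, 4, 4]

w = wald_did(y, d, g, t)       # DID_Y = 2, DID_D = 1
```

In this toy sample the treatment rate is flat in the control group, so the denominator is just the change in the treatment rate of the treatment group.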
Reminder: in sharp designs, $DID_Y$ = ATE if common trends.
- Sharp design: D = T × G. Only treatment group receives the treatment in period 1. E.g. Card and Krueger (1994).
- Common trends assumption (CT):
\[
E(Y(0)|G=1,T=1) - E(Y(0)|G=1,T=0) = E(Y(0)|G=0,T=1) - E(Y(0)|G=0,T=0).
\]
  If the treatment group had remained untreated in period 1, the mean outcome would have followed the same evolution in the two groups.
Well-known result from Blundell et al. (2004) or Abadie (2005):
Theorem 1
In sharp designs, if CT holds then $DID_Y = E(Y(1) - Y(0)|G=1,T=1)$.
Fuzzy designs.
                   Period 0         Period 1
Control Group      30% treated      50% treated
                   70% untreated    50% untreated
Treatment Group    20% treated      70% treated
                   80% untreated    30% untreated
Only one requirement: treatment rate does not follow parallel evolution in the two groups.
Populations of interest in fuzzy designs.
                   Period 0                Period 1
Control Group      Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
Treatment Group    Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
- Assume data = repeated cross-sections of the same population.
- From T = 0 to T = 1, treatment increases from 30 to 50% in control.
  - ⇒ 30% of units treated at both dates (AT), 20% become treated in T = 1 (S), and 50% never treated (NT).
- In T = 0, we cannot distinguish NT from S. In T = 1, we cannot distinguish AT from S.
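The arithmetic behind these population shares can be sketched as follows, using the treatment rates from the fuzzy-design example. The helper `type_shares` is a hypothetical name, and it assumes, as the slide does implicitly, that no unit leaves treatment between the two periods.

```python
def type_shares(p0, p1):
    """Shares of always treated (AT), switchers (S) and never treated (NT)
    when the treatment rate rises from p0 to p1 and nobody leaves treatment."""
    if p1 < p0:
        raise ValueError("this sketch assumes a weakly increasing treatment rate")
    return {"AT": p0, "S": p1 - p0, "NT": 1.0 - p1}

control = type_shares(0.30, 0.50)     # AT 30%, S 20%, NT 50%
treatment = type_shares(0.20, 0.70)   # AT 20%, S 50%, NT 30%

# DID_D is the difference between the two switcher shares.
did_d = treatment["S"] - control["S"]
```

With these rates the denominator of the Wald-DID is about 0.3, so it is well defined even though treatment also increases in the control group.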
In fuzzy designs, two more assumptions needed for identification.
1. Stable treatment effects (STE). For $g \in \{0, 1\}$,
\[
E(Y(1) - Y(0)|AT, G=g, T=1) = E(Y(1) - Y(0)|AT, G=g, T=0).
\]
   In each group, ATE among always treated is the same in periods 0 and 1.
2. Homogeneous treatment effects (HTE).
\[
E(Y(1) - Y(0)|S, G=1, T=1) = E(Y(1) - Y(0)|S, G=0, T=1).
\]
   In period 1, ATE among switchers is equal in treatment and control groups.
Identification result (1/3)
Assume we are in a fuzzy design where $D \neq T \times G$.
1. If CT holds,
\[
\begin{aligned}
W_{DID} ={} & \frac{P(D=1|G=1,T=1)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=1,T=1) \\
& - \frac{P(D=1|G=1,T=0)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=1,T=0) \\
& - \frac{P(D=1|G=0,T=1)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=0,T=1) \\
& + \frac{P(D=1|G=0,T=0)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=0,T=0).
\end{aligned}
\]
Intuition
- $DID_Y$: difference between trends of mean outcome in the 2 groups.
- CT: if nobody treated, mean outcome follows same trend in two groups.
- In sharp designs only 1 departure from scenario where nobody treated: treatment group treated in period 1.
  - ⇒ under CT, difference between trends must come from treatment effect in treatment group in period 1: $DID_Y = E(Y(1) - Y(0)|G=1,T=1)$.
- In fuzzy designs, 4 departures from scenario where nobody treated: some units treated in each of the 4 cells.
  - ⇒ under CT, difference between trends comes from treatment effect in all cells:
\[
\begin{aligned}
DID_Y ={} & E(Y(1)-Y(0)|D=1,G=1,T=1)\,P(D=1|G=1,T=1) \\
& - E(Y(1)-Y(0)|D=1,G=1,T=0)\,P(D=1|G=1,T=0) \\
& - E(Y(1)-Y(0)|D=1,G=0,T=1)\,P(D=1|G=0,T=1) \\
& + E(Y(1)-Y(0)|D=1,G=0,T=0)\,P(D=1|G=0,T=0).
\end{aligned}
\]
Identification result (2/3)
Assume we are in a fuzzy design where $D \neq T \times G$.
1. If CT holds,
\[
\begin{aligned}
W_{DID} ={} & \frac{P(D=1|G=1,T=1)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=1,T=1) \\
& - \frac{P(D=1|G=1,T=0)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=1,T=0) \\
& - \frac{P(D=1|G=0,T=1)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=0,T=1) \\
& + \frac{P(D=1|G=0,T=0)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=0,T=0).
\end{aligned}
\]
2. If CT and STE hold,
\[
\begin{aligned}
W_{DID} ={} & \frac{P(D=1|G=1,T=1) - P(D=1|G=1,T=0)}{DID_D}\, E(Y(1)-Y(0)|S,G=1,T=1) \\
& - \frac{P(D=1|G=0,T=1) - P(D=1|G=0,T=0)}{DID_D}\, E(Y(1)-Y(0)|S,G=0,T=1).
\end{aligned}
\]
Intuition
\[
\begin{aligned}
DID_Y ={} & E(Y(1)-Y(0)|D=1,G=1,T=1)\,P(D=1|G=1,T=1) \\
& - E(Y(1)-Y(0)|D=1,G=1,T=0)\,P(D=1|G=1,T=0) \\
& - \big(E(Y(1)-Y(0)|D=1,G=0,T=1)\,P(D=1|G=0,T=1) \\
& \quad - E(Y(1)-Y(0)|D=1,G=0,T=0)\,P(D=1|G=0,T=0)\big).
\end{aligned}
\]
                   Period 0                Period 1
Control Group      Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
Treatment Group    Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
If ATE of always treated does not change over time,
\[
\begin{aligned}
DID_Y ={} & E(Y(1)-Y(0)|S,G=1,T=1)\,\big(P(D=1|G=1,T=1) - P(D=1|G=1,T=0)\big) \\
& - E(Y(1)-Y(0)|S,G=0,T=1)\,\big(P(D=1|G=0,T=1) - P(D=1|G=0,T=0)\big).
\end{aligned}
\]
Identification result (3/3)
Theorem 2
Assume we are in a fuzzy design where $D \neq T \times G$.
1. If CT holds,
\[
\begin{aligned}
W_{DID} ={} & \frac{P(D=1|G=1,T=1)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=1,T=1) \\
& - \frac{P(D=1|G=1,T=0)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=1,T=0) \\
& - \frac{P(D=1|G=0,T=1)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=0,T=1) \\
& + \frac{P(D=1|G=0,T=0)}{DID_D}\, E(Y(1)-Y(0)|D=1,G=0,T=0).
\end{aligned}
\]
2. If CT and STE hold,
\[
\begin{aligned}
W_{DID} ={} & \frac{P(D=1|G=1,T=1) - P(D=1|G=1,T=0)}{DID_D}\, E(Y(1)-Y(0)|S,G=1,T=1) \\
& - \frac{P(D=1|G=0,T=1) - P(D=1|G=0,T=0)}{DID_D}\, E(Y(1)-Y(0)|S,G=0,T=1).
\end{aligned}
\]
3. If CT, STE, and HTE hold,
\[
W_{DID} = E(Y(1)-Y(0)|S,G=1,T=1).
\]
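The step from part 2 to part 3 can be spelled out in one line. The shorthands $\Delta_S$ (the switchers' ATE, common to both groups under HTE) and $\Delta P_g = P(D=1|G=g,T=1) - P(D=1|G=g,T=0)$ are introduced here only for this sketch; they do not appear in the slides.

```latex
% For a binary treatment, E(D | G, T) = P(D = 1 | G, T), so DID_D = \Delta P_1 - \Delta P_0.
% Under HTE both switcher effects in part 2 equal \Delta_S, hence
\[
W_{DID}
  = \frac{\Delta P_1}{DID_D}\,\Delta_S - \frac{\Delta P_0}{DID_D}\,\Delta_S
  = \frac{\Delta P_1 - \Delta P_0}{DID_D}\,\Delta_S
  = \Delta_S .
\]
```

Without HTE the two weights still sum to one, but one of them is negative whenever treatment increases in both groups, which is why the Wald-DID can fall outside the range of the two switcher effects.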
Application 1: Duflo (2001) heavily relies on HTE (1/2).
- In Duflo (2001), schooling increased in high and low program districts.

Table 1: Years of education in Duflo's groups
                          Older cohort    Younger cohort    Diff.
Low program districts     9.40            9.76              0.36
High program districts    8.02            8.49              0.47

- Let $R_1$ and $R_0$ denote returns to schooling in high and low program districts.
- It follows from previous theorem that under CT and STE, Wald-DID is equal to $0.47/0.11 \times R_1 - 0.36/0.11 \times R_0$.
- If $R_1 = R_0 = R$, no problem: $0.47/0.11 \times R - 0.36/0.11 \times R = R$.
- In this context, $R_1 = R_0$ not warranted: control districts have more skilled labor, so they could have different returns.
- If $R_1 = 0.10$ and $R_0 = 0.12$, $W_{DID} = 0.035$!
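The arithmetic in the last bullets is easy to reproduce. The sketch below plugs the schooling differences from Table 1 into the CT + STE formula; the function name `wald_did_under_ct_ste` is a hypothetical label chosen for this illustration.

```python
# Changes in years of schooling between cohorts, from Table 1.
did_s_high = 0.47                    # high program districts
did_s_low = 0.36                     # low program districts
did_s = did_s_high - did_s_low       # denominator of the Wald-DID, about 0.11

def wald_did_under_ct_ste(r1, r0):
    """Value of the Wald-DID when returns are r1 (high) and r0 (low program districts)."""
    return did_s_high / did_s * r1 - did_s_low / did_s * r0

homogeneous = wald_did_under_ct_ste(0.10, 0.10)     # equal returns: recovers 0.10
heterogeneous = wald_did_under_ct_ste(0.10, 0.12)   # (0.0470 - 0.0432) / 0.11
```

The weights are roughly 4.27 and -3.27, so a gap of two percentage points between the two returns is enough to pull the Wald-DID down to about 0.035, below both of them.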
Application 1: Duflo (2001) heavily relies on HTE (2/2)
- After estimating simple Wald-DID, Duflo moves to richer specifications.
- Coefficient of schooling in her regression $= \sum_{d=1}^{\bar{d}} w_d W_{DID}(d, d-1)$.
- Using previous theorem, under CT and STE each of the $W_{DID}(d, d-1)$ = weighted difference of returns to schooling in districts $d$ and $d-1$.
- Rearranging, weighted sum of returns to schooling in each district.
- Weights can be estimated. In half of districts, returns to schooling receive negative weights, and negative weights sum to -3.28.
[Figure 1: Weights in Duflo (2001). x-axis: districts ordered by increase in schooling; y-axis: weight.]
Application 2: Gentzkow et al. (2011) heavily rely on STE
- Back to Gentzkow et al. (2011) (effect of newspapers on political participation).
- Coefficient of newspapers is $\sum_{t=1872}^{1928} \sum_{c=1}^{\bar{c}} w_{ct} W_{DID}(c, c-1, t)$.
- Using previous theorem, under CT and STE each of the $W_{DID}(c, c-1, t)$ = weighted difference of treatment effect in counties $c$ and $c-1$.
- Rearranging, weighted sum of treatment effects in each county.
- Weights positive for all counties ⇒ their coefficient does not rely on HTE.
- But using one new estimator we propose, which does not rely on STE, significantly affects results ⇒ their results heavily rely on STE.
- Not appealing: over time, new media develop and break monopoly of printed press ⇒ effects of newspapers might diminish.
To avoid relying on HTE, find control group where treatment stable
Remember: if CT and STE hold,
\[
\begin{aligned}
W_{DID} ={} & \frac{P(D=1|G=1,T=1) - P(D=1|G=1,T=0)}{DID_D}\, E(Y(1)-Y(0)|S,G=1,T=1) \\
& - \frac{P(D=1|G=0,T=1) - P(D=1|G=0,T=0)}{DID_D}\, E(Y(1)-Y(0)|S,G=0,T=1).
\end{aligned}
\]
Theorem 3
If CT and STE hold, and $P(D=1|G=0,T=1) = P(D=1|G=0,T=0)$,
\[
W_{DID} = E(Y(1)-Y(0)|S,G=1,T=1).
\]
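Theorem 3 follows from the CT + STE decomposition in two short steps, sketched here using only the quantities already defined on this slide.

```latex
% Step 1: with a stable treatment rate in the control group, the second weight vanishes:
\[
P(D=1|G=0,T=1) - P(D=1|G=0,T=0) = 0 .
\]
% Step 2: for a binary treatment E(D | G, T) = P(D = 1 | G, T), so the denominator reduces to
\[
DID_D = P(D=1|G=1,T=1) - P(D=1|G=1,T=0),
\]
% and the first weight equals 1. Therefore
\[
W_{DID} = 1 \times E(Y(1)-Y(0)|S,G=1,T=1) - 0 \times E(Y(1)-Y(0)|S,G=0,T=1).
\]
```

The control-group switcher effect drops out because the control group contains no switchers, which is why HTE is no longer needed.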
Intuition
                   Period 0                Period 1
Control Group      Always Treated: Y(1)    Always Treated: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
Treatment Group    Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
- In treatment group, mean of Y in period 1 − mean of Y in period 0 = Effect of time on the outcome + Treatment effect among switchers.
- In control group, mean of Y in period 1 − mean of Y in period 0 = Effect of time on the outcome.
- CT+STE guarantee that effect of time is the same in both groups.
- Therefore, $W_{DID}$ = ATE among treatment group switchers.
First set of alternative assumptions
- If treatment stable in control group, Wald-DID does not rely on HTE.
- But will still rely on CT + STE. Instead, we consider generalization of CT. Conditional common trends (CCT):
                   Period 0                Period 1
Control Group      Always Treated: Y(1)    Always Treated: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
Treatment Group    Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
\[
E(Y(1)|AT,G=1,T=1) - E(Y(1)|AT,G=1,T=0) = E(Y(1)|AT,G=0,T=1) - E(Y(1)|AT,G=0,T=0).
\]
\[
E(Y(0)|NT \cup S,G=1,T=1) - E(Y(0)|NT \cup S,G=1,T=0) = E(Y(0)|NT \cup S,G=0,T=1) - E(Y(0)|NT \cup S,G=0,T=0).
\]
First alternative estimand: the Time-Corrected Wald ratio
Let
\[
W_{TC} = \frac{E(Y|G=1,T=1) - E(Y + \delta_D|G=1,T=0)}{E(D|G=1,T=1) - E(D|G=1,T=0)},
\]
where
\[
\delta_d = E(Y|D=d,G=0,T=1) - E(Y|D=d,G=0,T=0).
\]
Theorem 4
If CCT holds and $P(D=1|G=0,T=1) = P(D=1|G=0,T=0)$,
\[
W_{TC} = E(Y(1)-Y(0)|S,G=1,T=1).
\]
Intuition
                   Period 0                Period 1
Control Group      Always Treated: Y(1)    Always Treated: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
Treatment Group    Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
\[
W_{TC} = \frac{E(Y|G=1,T=1) - E(Y + \delta_D|G=1,T=0)}{E(D|G=1,T=1) - E(D|G=1,T=0)}.
\]
Second term of numerator equal to mean of Y we would have observed in period 1 in treatment group if switchers had remained untreated.
Second set of alternative assumptions
- For continuous outcomes, we also consider a generalization of the changes-in-changes model (Athey and Imbens, 2006) to the fuzzy case. Common changes (CC) assumption:
  1. $Y(d) = h_d(U_d, T)$ with $h_d(\cdot, t)$ strictly increasing.
  2. $U_d \perp\!\!\!\perp T \mid AT, G$ and $U_d \perp\!\!\!\perp T \mid NT \cup S, G$.
- Under CC, we can recover $E(Y(1)-Y(0)|S,G=1,T=1)$. Intuition on next slide.
Intuition
- To be able to recover $E(Y(1)-Y(0)|S,G=1,T=1)$ we need to recover $E(Y(1)|AT,G=1,T=1)$ and $E(Y(0)|NT \cup S,G=1,T=1)$.
- Under CCT, we have $E(Y(1)|AT,G=1,T=1) = E(Y(1) + \delta_1|AT,G=1,T=0)$.
- Under CC, we have something different: an AT in treatment group whose Y(1) in period 0 places her at the $q$-th quantile of the distribution of AT in control group in period 0 will have in period 1 the same Y(1) as that of the AT at the $q$-th quantile of the distribution of AT in control group in period 1.
- $U_d \perp\!\!\!\perp T \mid AT, G$: AT's ranks stable over time.
- Under CC, $E(Y(1)|AT,G=1,T=1) = E(Q_1(Y(1))|AT,G=1,T=0)$, where $Q_1(\cdot)$ denotes the quantile-quantile transform among AT in the control group from period 0 to 1.
- Similar reasoning for $E(Y(0)|NT \cup S,G=1,T=1)$.
Second alternative estimand: the CIC Wald ratio
\[
W_{CIC} = \frac{E(Y|G=1,T=1) - E(Q_D(Y)|G=1,T=0)}{E(D|G=1,T=1) - E(D|G=1,T=0)}.
\]
$Q_1$ and $Q_0$: quantile-quantile transforms of the outcome between periods 0 and 1 in the control group among treated and untreated units.
Wald-CIC is similar to Wald-TC, except that instead of accounting for the effect of time through additive shifts, it applies quantile-quantile transforms.
Theorem 5
If CC holds and $P(D=1|G=0,T=1) = P(D=1|G=0,T=0)$,
\[
W_{CIC} = E(Y(1)-Y(0)|S,G=1,T=1).
\]
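Estimators (1/2)
Let $I_{gt} = \{i : G_i = g, T_i = t\}$ (resp. $I_{dgt} = \{i : D_i = d, G_i = g, T_i = t\}$) and $n_{gt}$ (resp. $n_{dgt}$) denote the size of $I_{gt}$ (resp. $I_{dgt}$) for all $(d, g, t) \in \{0, 1\}^3$.
The Wald-DID and Wald-TC estimators are simply defined by
\[
\widehat{W}_{DID} = \frac{\frac{1}{n_{11}}\sum_{i \in I_{11}} Y_i - \frac{1}{n_{10}}\sum_{i \in I_{10}} Y_i - \frac{1}{n_{01}}\sum_{i \in I_{01}} Y_i + \frac{1}{n_{00}}\sum_{i \in I_{00}} Y_i}{\frac{1}{n_{11}}\sum_{i \in I_{11}} D_i - \frac{1}{n_{10}}\sum_{i \in I_{10}} D_i - \frac{1}{n_{01}}\sum_{i \in I_{01}} D_i + \frac{1}{n_{00}}\sum_{i \in I_{00}} D_i},
\]
\[
\widehat{W}_{TC} = \frac{\frac{1}{n_{11}}\sum_{i \in I_{11}} Y_i - \frac{1}{n_{10}}\sum_{i \in I_{10}} \left[Y_i + \widehat{\delta}_{D_i}\right]}{\frac{1}{n_{11}}\sum_{i \in I_{11}} D_i - \frac{1}{n_{10}}\sum_{i \in I_{10}} D_i},
\]
where
\[
\widehat{\delta}_d = \frac{1}{n_{d01}}\sum_{i \in I_{d01}} Y_i - \frac{1}{n_{d00}}\sum_{i \in I_{d00}} Y_i.
\]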
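A minimal sketch of the Wald-TC plug-in estimator defined above, assuming a binary treatment and NumPy arrays. The function name `wald_tc` and the toy sample are hypothetical; the toy data are built so that the treatment rate is stable in the control group and the switchers' effect is 4 by construction.

```python
import numpy as np

def wald_tc(y, d, g, t):
    """Plug-in Wald-TC: shift period-0 outcomes of the treatment group by the
    control-group time trend of units with the same treatment status."""
    y, d, g, t = (np.asarray(a) for a in (y, d, g, t))
    cell = lambda gg, tt: (g == gg) & (t == tt)
    # delta_hat[d]: change in mean outcome among control units with treatment d.
    delta_hat = {
        dv: y[cell(0, 1) & (d == dv)].mean() - y[cell(0, 0) & (d == dv)].mean()
        for dv in (0, 1)
    }
    m10, m11 = cell(1, 0), cell(1, 1)
    shift = np.where(d[m10] == 1, delta_hat[1], delta_hat[0])
    numerator = y[m11].mean() - (y[m10] + shift).mean()
    denominator = d[m11].mean() - d[m10].mean()
    return numerator / denominator

# Hypothetical sample: control cells first, then treatment group in periods 0 and 1.
y = [1, 3,  2, 5,  1, 1, 1, 3,  2, 2, 6, 5]
d = [0, 1,  0, 1,  0, 0, 0, 1,  0, 0, 1, 1]
g = [0, 0,  0, 0,  1, 1, 1, 1,  1, 1, 1, 1]
t = [0, 0,  1, 1,  0, 0, 0, 0,  1, 1, 1, 1]

w_tc = wald_tc(y, d, g, t)
```

In this sample the untreated trend is 1 and the treated trend is 2, so the shifted period-0 mean is 2.75, the observed period-1 mean is 3.75, and the treatment rate rises by 0.25.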
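Estimators (2/2)
Let $\widehat{F}_{Y_{dgt}}$ denote the empirical cdf of $Y$ on the subsample $I_{dgt}$:
\[
\widehat{F}_{Y_{dgt}}(y) = \frac{1}{n_{dgt}} \sum_{i \in I_{dgt}} 1\{Y_i \leq y\}.
\]
Similarly, we estimate the quantile of order $q \in (0, 1)$ of $Y_{dgt}$ by $\widehat{F}^{-1}_{Y_{dgt}}(q) = \inf\{y : \widehat{F}_{Y_{dgt}}(y) \geq q\}$. The estimator of the quantile-quantile transform is $\widehat{Q}_d = \widehat{F}^{-1}_{Y_{d01}} \circ \widehat{F}_{Y_{d00}}$. Then, the Wald-CIC estimator is defined by
\[
\widehat{W}_{CIC} = \frac{\frac{1}{n_{11}}\sum_{i \in I_{11}} Y_i - \frac{1}{n_{10}}\sum_{i \in I_{10}} \widehat{Q}_{D_i}(Y_i)}{\frac{1}{n_{11}}\sum_{i \in I_{11}} D_i - \frac{1}{n_{10}}\sum_{i \in I_{10}} D_i}.
\]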
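The Wald-CIC estimator can be sketched along the same lines. The names `qq_transform` and `wald_cic` and the toy sample are hypothetical. One simplification is an assumption of this sketch, not of the slides: outcomes that fall below the control group's period-0 support are mapped to the smallest period-1 control outcome.

```python
import numpy as np

def qq_transform(y, sample00, sample01):
    """Empirical quantile-quantile transform F01^{-1}(F00(y)), using integer
    rank arithmetic to avoid floating-point issues in the quantile order."""
    s00, s01 = np.sort(sample00), np.sort(sample01)
    n00, n01 = len(s00), len(s01)
    counts = np.searchsorted(s00, y, side="right")   # n00 * F00_hat(y)
    k = (counts * n01 + n00 - 1) // n00 - 1          # ceil(q * n01) - 1, zero-based rank
    return s01[np.clip(k, 0, n01 - 1)]               # clip handles y below the support

def wald_cic(y, d, g, t):
    y, d, g, t = (np.asarray(a) for a in (y, d, g, t))
    m10, m11 = (g == 1) & (t == 0), (g == 1) & (t == 1)
    y10, d10 = y[m10], d[m10]
    transformed = np.empty(len(y10), dtype=float)
    for dv in (0, 1):
        sel = d10 == dv
        control0 = y[(d == dv) & (g == 0) & (t == 0)]
        control1 = y[(d == dv) & (g == 0) & (t == 1)]
        transformed[sel] = qq_transform(y10[sel], control0, control1)
    return (y[m11].mean() - transformed.mean()) / (d[m11].mean() - d10.mean())

# Hypothetical sample: control group (periods 0 then 1), then treatment group.
y = [1, 2, 3, 5, 6,  11, 12, 13, 25, 26,  1, 3, 5,  11, 19, 25]
d = [0, 0, 0, 1, 1,   0,  0,  0,  1,  1,  0, 0, 1,   0,  1,  1]
g = [0] * 10 + [1] * 6
t = [0] * 5 + [1] * 5 + [0] * 3 + [1] * 3

w_cic = wald_cic(y, d, g, t)
```

Here the period-0 outcomes 1, 3 and 5 of the treatment group map to 11, 13 and 25, the unit observed at 19 in period 1 plays the role of a switcher whose untreated outcome would have been 13, and the estimate is therefore about 6.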
Inference
Theorem 6
Suppose that we observe an iid sample $(Y_i, D_i, G_i, T_i)_{1 \leq i \leq n}$ and $P(D=1|G=0,T=1) = P(D=1|G=0,T=0)$. Then
1. If $E(Y^2) < \infty$ and CT and STE hold,
\[
\sqrt{n}\left(\widehat{W}_{DID} - E(Y(1)-Y(0)|S,G=1,T=1)\right) \xrightarrow{L} N(0, V_1).
\]
2. If $E(Y^2) < \infty$ and CCT holds,
\[
\sqrt{n}\left(\widehat{W}_{TC} - E(Y(1)-Y(0)|S,G=1,T=1)\right) \xrightarrow{L} N(0, V_2).
\]
3. If outcome has bounded support and CC holds,
\[
\sqrt{n}\left(\widehat{W}_{CIC} - E(Y(1)-Y(0)|S,G=1,T=1)\right) \xrightarrow{L} N(0, V_3).
\]
- Proof for $\widehat{W}_{CIC}$: estimator = Hadamard differentiable functional of standard empirical process ⇒ we use functional Delta method.
- Estimators still asymptotically normal if clustering.
Extensions
- When one cannot find a control group where treatment is stable, we show that the ATE of switchers is partially identified and we derive sharp bounds.
- We extend our analysis to non-binary treatments.
- We show how to use our estimators with multiple groups and time periods.
- We show how to incorporate discrete and continuous covariates into the estimation.
New control and treatment groups in Duflo (2001).
- Duflo's Wald-DID = weighted difference between treatment effects in treatment and control districts.
- Education higher in control districts ⇒ economic development and returns to education may be higher there.
- If that's the case, Wald-DID downward biased.
- We use a different procedure to define our groups. Goal = ensure that years of education stable between cohorts in control group.
- Control group = districts such that p-value of a $\chi^2$ test comparing education across cohorts > 0.5.
- Treatment group = districts such that this p-value < 0.5 and average number of years of education increased.

Table 2: Years of education in our control and treatment groups
                       Older cohort    Younger cohort    Difference (s.e.)
Control districts      9.60            9.55              -0.05 (0.097)
Treatment districts    8.65            9.64              0.99 (0.082)
Placebo tests
- We can use previous cohorts to check the validity of common trends assumptions with these new groups.

Table 3: Placebo tests with our control and treatment groups
Cohorts               -2 vs -1          -1 vs 0            0 vs 1
DID schooling         0.101 (0.191)     -0.006 (0.160)     1.030 (0.127)
DID wages             0.050 (0.035)     0.002 (0.026)      0.164 (0.028)
Numerator Wald-TC     0.029 (0.028)     -0.0083 (0.022)    0.103 (0.027)
Numerator Wald-CIC    0.034 (0.028)     -0.0047 (0.023)    0.107 (0.027)
N                     14,452            19,938             22,339
Returns to education
- We compare Duflo's estimate to our three new measures of returns to education.

                        Duflo's 2SLS    WDID       WTC        WCIC       OLS
Returns to education    0.073           0.159      0.104      0.100      0.077
                        (0.043)         (0.028)    (0.027)    (0.027)    (0.001)
N                       30,828          22,339     22,339     22,339     30,828

- Our Wald-DID is twice as large as Duflo's. Does not rely on homogeneity assumption...
- ... But requires that returns to education are equal among old and young workers. Not realistic: e.g. education delays labour market entry but returns to experience concave ⇒ returns presumably larger among old workers.
- If so, $W_{DID}$ biased upwards.
- Wald-TC and Wald-CIC do not rely on STE. Lower than Wald-DID but higher than Duflo's estimate and OLS.
Effects of newspapers on electoral participation
- Gentzkow et al. (2011): regress change in electoral participation between elections $t-1$ and $t$ in county $c$ on year dummies and change in the number of newspapers between $t-1$ and $t$.
- Instead, first alternative estimator:
  1. For each pair of consecutive elections, we form a control group of counties where newspapers stable between $t-1$ and $t$, and two treatment groups: counties where newspapers increased, and counties where newspapers decreased.
  2. We estimate Wald-DID with control group and treatment group 1, and Wald-DID with control group and treatment group 2.
  3. We take a weighted average of these Wald-DIDs over dates (weights? see paper).
- Second alternative estimator: Wald-TCs instead of Wald-DIDs.
- Treatment stable in all control groups ⇒ these estimators do not rely on HTE. The weighted average of Wald-TCs also does not rely on STE.
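Step 1 of the first alternative estimator amounts to a three-way classification of counties for each pair of consecutive elections. A minimal sketch, in which the function name `assign_group` and the county data are hypothetical:

```python
def assign_group(change_in_newspapers):
    """Classify a county by the change in its number of newspapers between t-1 and t."""
    if change_in_newspapers == 0:
        return "control"
    return "treatment_increase" if change_in_newspapers > 0 else "treatment_decrease"

# Hypothetical changes in the number of newspapers for three counties.
changes = {"county_a": 0, "county_b": 1, "county_c": -2}
groups = {name: assign_group(change) for name, change in changes.items()}
```

Each treatment group is then compared with the same control group, which by construction has a stable number of newspapers over that pair of elections.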
Results

Table 4: Effect of one additional newspaper on turnout
     Gentzkow et al. (2011)    WDID        WTC
     0.0026                    0.0031      0.0047
     (0.0009)                  (0.0012)    (0.0014)
N    15627                     15627       15627

- $W_{TC}$ significantly different from estimator in Gentzkow et al. (2011) and from Wald-DID.
- To reconstruct trends in turnout that a county which went from 2 to 3 newspapers between $t-1$ and $t$ would have experienced if number of newspapers had not changed, $W_{DID}$ uses all counties where number of newspapers stable between $t-1$ and $t$.
- On the other hand, $W_{TC}$ only uses counties where number of newspapers stable and with 2 newspapers in $t-1$.
Take-away
- In fuzzy DID designs, researchers should find a control group where treatment is stable. Otherwise, their results will rely on the assumption that treatment effects are homogeneous across groups.
- When they have found such a control group, they might want to use the Wald-TC or Wald-CIC estimator: these do not require that the treatment effect be stable over time. Using these estimators can make a big difference in practice relative to standard estimators.
- We are developing a Stata package. Computes Wald-DID, Wald-TC, Wald-CIC, accounts for clustering, includes covariates, etc. First version already available on my website.
My portfolio
- Econometrics, on top of this paper:
  1. Tolerating defiance? Treatment effects without monotonicity. Monotonicity in Imbens and Angrist (1994) can be weakened.
  2. Next please! A new definition of the treatment and control groups for randomizations with waiting lists, with Luc Behaghel. How to define the treatment and the control group when treatment is offered according to a random ordering until all seats are filled.
- Education/health:
  1. Ready for boarding? The effects of a boarding school for underprivileged students, with Luc Behaghel and Marc Gurgand. School increases students' test scores in Maths, but only after two years and among strong students. Long-term paper coming soon.
  2. Does a community website improve health outcomes among patients awaiting kidney transplant? with Luc Behaghel. Patients randomized into 3 arms: control, information website, information + community website (Facebook for patients awaiting transplant).
  3. Assessing the classroom-wide impact of a mental health program for disruptive students: effects on disruptive students, classmates, and teachers, with Nicolas Navarrete.
Enikolopov et al. (2011) also heavily rely on HTE
- Back to Enikolopov et al. (2011) (effect of independent TV in Russia).
- Coefficient of independent TV is $\sum_{r=1}^{\bar{r}} w_r W_{DID}(r, r-1)$.
- Using previous theorem, under CT and STE each of the $W_{DID}(r, r-1)$ = weighted difference of treatment effect in regions $r$ and $r-1$.
- Rearranging, weighted sum of treatment effects in each region.
- Treatment effects in more than half of the regions receive negative weights, and negative weights sum to -2.26.
[Figure 2: Weights in Enikolopov et al. (2011). x-axis: regions ordered by increase in TV; y-axis: weight.]
Robustness checks
- We change the p-value threshold from 0.5 to 0.6 (distribution of education not perfectly stable in our initial control group): leaves results unaffected.
- Our procedure to form groups is statistical. Should be accounted for in our standard errors.
- To account for this: double bootstrap procedure. First, we bootstrap individuals and form our groups again; then we bootstrap districts and compute our three estimators.
- Does not affect our main conclusions.
Wald-TC or Wald-CIC?
Common trends versus common changes:
1. Common trends not invariant to scaling of outcome ⇒ logs versus levels problem. But only restricts first moment of outcome.
2. Common changes invariant to scaling, but restricts whole distribution of outcome.
When groups are very different in levels in the pre-period, Wald-TC might be very sensitive to scaling, and Wald-CIC may be preferable.