Fuzzy Differences-in-Differences

Clément de Chaisemartin (University of Warwick)
Xavier D'Haultfœuille (CREST)

McGill, 01/25/2016

Outline
- Motivation: Wald-DIDs are everywhere
- Two pitfalls of the Wald-DID
- One new design restriction, two new estimands
- Applications
- Take-away

The Wald-DID
- Duflo (2001), returns to education.
- Design. Primary school construction program in Indonesia. Some districts receive more new schools than others. Two groups: high/low program districts. Old cohorts not exposed, young cohorts exposed.
- Estimation. Regress individuals' wages on a dummy for young cohorts, a dummy for high program districts, and years of schooling. Instrument for schooling: interaction of the two dummies.
- Let W = wages and S = schooling. The coefficient of schooling is the Wald-DID:
\[
W_{DID} = \frac{E(W \mid \text{Young}, \text{High prog.}) - E(W \mid \text{Old}, \text{High prog.}) - \big(E(W \mid \text{Young}, \text{Low prog.}) - E(W \mid \text{Old}, \text{Low prog.})\big)}{E(S \mid \text{Young}, \text{High prog.}) - E(S \mid \text{Old}, \text{High prog.}) - \big(E(S \mid \text{Young}, \text{Low prog.}) - E(S \mid \text{Old}, \text{Low prog.})\big)}
\]
- I will now show that, beyond this example, all papers estimating OLS or 2SLS regressions with time and group fixed effects estimate Wald-DIDs.

2SLS regressions with time and group dummies also estimate a weighted average of Wald-DIDs
- Duflo (2001) then moves to richer specifications.
- Estimation. Regress individuals' wages on cohort and district of birth dummies, and on years of schooling. Instrument for schooling: being born after the program × number of schools constructed in district of birth.
- Coefficient of years of schooling = $\sum_{d \geq 1} w_d \, W_{DID}(d, d-1)$:
  1. Districts ranked according to increase in schooling between old and young cohorts.
  2. $W_{DID}(d, d-1)$: DID comparing the evolution of wages between cohorts affected/not affected by the program in districts d and d − 1, divided by the same DID for years of schooling.
  3. $w_d$: weights summing up to 1 (formula in supplementary materials).

Regressions with time and group dummies estimate a weighted average of Wald-DIDs
- Enikolopov et al. (2011) study the effect of independent information on votes.
- Design. Introduction of an independent TV channel between the 1995 and 1999 elections in Russia. Signal quality better in some regions than in others.
- Estimation. Regress % of votes for opposition in region r and year t on region dummies, a 1999 election dummy, and the % of people with access to independent TV in region r and year t.
- We show that the coefficient of % access to TV = $\sum_{r \geq 1} w_r \, W_{DID}(r, r-1)$.
- $w_r$: weights summing up to 1.
- $W_{DID}(r, r-1)$: DID comparing the evolution of votes for opposition between 1995 and 1999 in regions r and r − 1, divided by the same DID for % of people with access to TV.

First-difference regressions also estimate a weighted average of Wald-DIDs
- Gentzkow et al. (2011): effects of newspapers on electoral participation.
- Design. Changes in the number of newspapers available in US counties from 1872 to 1928.
- Estimation. Regress the change in electoral participation between elections t − 1 and t in county c on year dummies and the change in the number of newspapers between t − 1 and t.
- Coefficient of newspapers = $\sum_{t=1872}^{1928} \sum_{c \geq 1} w_{ct} \, W_{DID}(c, c-1, t)$.
- $w_{ct}$: weights summing up to 1.
- $W_{DID}(c, c-1, t)$: DID comparing the evolution of participation between elections t − 1 and t in counties c and c − 1, divided by the same DID for the number of newspapers.

Wald-DIDs are everywhere...
- 10.1% of the 337 papers published in the American Economic Review between 2010 and 2012 estimate either
  - the exact same 2SLS regression as in Duflo (2001),
  - or the exact same OLS regression as in Enikolopov et al. (2011),
  - or the exact same OLS regression as in Gentzkow et al. (2011),
  thus implying that they estimate weighted averages of Wald-DIDs.
- Excluding purely theoretical papers from the denominator, this share rises to 19.7%.

... but we do not know under which assumptions Wald-DID estimates a causal effect.
- Most treatment effect estimands were initially studied under the standard linear and constant treatment effect model.
- Over the last twenty years, there has been a move away from this model: unrealistic assumptions.
- Conditions under which DID estimates a causal effect in a model with heterogeneous treatment effects are well-known (see, e.g., Blundell et al., 2004, or Abadie, 2005).
- Surprisingly, there is no paper studying under which assumptions Wald-DID estimates a causal effect.

Contributions of this paper
- We show that Wald-DID heavily relies on two strong assumptions:
  1. Treatment effects should be homogeneous between groups.
  2. Treatment effects should not change over time.
- We argue that these assumptions are often not plausible.
- We propose:
  1. A design restriction to solve problem 1: find a control group where treatment is stable. Often easy to achieve.
  2. Two new estimators to solve problem 2: time-corrected Wald ratio and changes-in-changes Wald ratio. Easy to use: Stata package. Allows for binary and multivariate treatment, discrete and continuous control variables, multiple periods and groups, clustering.
- We use our results to revisit Duflo (2001) and Gentzkow et al. (2011). We obtain results that differ economically and significantly from the authors'.

Two pitfalls of the Wald-DID

Set-up and notations
- We have a repeated cross-sections or cohorts data set (results also apply to panel data under slight modifications of our assumptions, see paper).
- Data can be divided into:
  1. Two groups. G = 0: control, and G = 1: treatment. E.g.: districts with few/many schools constructed.
  2. Two periods. T = 0 and T = 1. E.g.: old/young cohorts.
- Interested in the effect of a binary treatment D on an outcome Y. Y(0) and Y(1): potential outcomes of the same individual. Y(1) − Y(0): treatment effect.
- Notations:
  1. For any random variable X,
\[
DID_X = E(X \mid G=1, T=1) - E(X \mid G=1, T=0) - \big(E(X \mid G=0, T=1) - E(X \mid G=0, T=0)\big).
\]
  2. $W_{DID} = DID_Y / DID_D$.
- I focus on the simple case with binary treatment, two groups, two periods: all results extend to more general cases.

Reminder: in sharp designs, DID_Y = ATE if common trends.
- Sharp design: D = T × G. Only the treatment group receives the treatment in period 1. E.g. Card and Krueger (1994).
- Common trends assumption (CT):
\[
E(Y(0) \mid G=1, T=1) - E(Y(0) \mid G=1, T=0) = E(Y(0) \mid G=0, T=1) - E(Y(0) \mid G=0, T=0).
\]
  If the treatment group had remained untreated in period 1, the mean outcome would have followed the same evolution in the two groups.
- Well-known result from Blundell et al. (2004) or Abadie (2005):

Theorem 1
In sharp designs, if CT holds then $DID_Y = E(Y(1) - Y(0) \mid G=1, T=1)$.

Fuzzy designs.

                   Period 0          Period 1
Control group      30% treated       50% treated
                   70% untreated     50% untreated
Treatment group    20% treated       70% treated
                   80% untreated     30% untreated

Only one requirement: the treatment rate does not follow a parallel evolution in the two groups.

Populations of interest in fuzzy designs.

                   Period 0                Period 1
Control group      Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
Treatment group    Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)

- Assume data = repeated cross-sections of the same population.
- From T = 0 to T = 1, treatment increases from 30 to 50% in the control group. ⇒ 30% of units are treated at both dates (AT), 20% become treated in T = 1 (S), and 50% are never treated (NT).
- In T = 0, we cannot distinguish NT from S. In T = 1, we cannot distinguish AT from S.

In fuzzy designs, two more assumptions are needed for identification.
1. Stable treatment effects (STE). For g ∈ {0, 1},
\[
E(Y(1) - Y(0) \mid AT, G=g, T=1) = E(Y(1) - Y(0) \mid AT, G=g, T=0).
\]
   In each group, the ATE among always treated is the same in periods 0 and 1.
2. Homogeneous treatment effects (HTE).
\[
E(Y(1) - Y(0) \mid S, G=1, T=1) = E(Y(1) - Y(0) \mid S, G=0, T=1).
\]
   In period 1, the ATE among switchers is the same in the treatment and control groups.

Identification result (1/3)
Assume we are in a fuzzy design where D ≠ T × G.
1. If CT holds,
\[
\begin{aligned}
W_{DID} ={}& E(Y(1) - Y(0) \mid D=1, G=1, T=1)\,\frac{P(D=1 \mid G=1, T=1)}{DID_D} \\
&- E(Y(1) - Y(0) \mid D=1, G=1, T=0)\,\frac{P(D=1 \mid G=1, T=0)}{DID_D} \\
&- E(Y(1) - Y(0) \mid D=1, G=0, T=1)\,\frac{P(D=1 \mid G=0, T=1)}{DID_D} \\
&+ E(Y(1) - Y(0) \mid D=1, G=0, T=0)\,\frac{P(D=1 \mid G=0, T=0)}{DID_D}.
\end{aligned}
\]

Intuition
- DID_Y: difference between the trends of the mean outcome in the 2 groups.
- CT: if nobody is treated, the mean outcome follows the same trend in the two groups.
- In sharp designs, only 1 departure from the scenario where nobody is treated: the treatment group is treated in period 1.
- ⇒ under CT, the difference between trends must come from the treatment effect in the treatment group in period 1: $DID_Y = E(Y(1) - Y(0) \mid G=1, T=1)$.
- In fuzzy designs, 4 departures from the scenario where nobody is treated: some units are treated in each of the 4 cells.
- ⇒ under CT, the difference between trends comes from the treatment effect in all cells:
\[
\begin{aligned}
DID_Y ={}& E(Y(1) - Y(0) \mid D=1, G=1, T=1)\,P(D=1 \mid G=1, T=1) \\
&- E(Y(1) - Y(0) \mid D=1, G=1, T=0)\,P(D=1 \mid G=1, T=0) \\
&- E(Y(1) - Y(0) \mid D=1, G=0, T=1)\,P(D=1 \mid G=0, T=1) \\
&+ E(Y(1) - Y(0) \mid D=1, G=0, T=0)\,P(D=1 \mid G=0, T=0).
\end{aligned}
\]

Identification result (2/3)
Assume we are in a fuzzy design where D ≠ T × G.
1. If CT holds, W_DID is equal to the four-term expression in (1/3).
2. If CT and STE hold,
\[
\begin{aligned}
W_{DID} ={}& E(Y(1) - Y(0) \mid S, G=1, T=1)\,\frac{P(D=1 \mid G=1, T=1) - P(D=1 \mid G=1, T=0)}{DID_D} \\
&- E(Y(1) - Y(0) \mid S, G=0, T=1)\,\frac{P(D=1 \mid G=0, T=1) - P(D=1 \mid G=0, T=0)}{DID_D}.
\end{aligned}
\]

Intuition
- Start from the expression of DID_Y under CT given above, and from the populations of interest (always treated, switchers, never treated) in each group.
- Within each group, the treated units are the AT in period 0, and the AT and S in period 1. If the ATE of always treated does not change over time, the AT terms cancel out within each group, and
\[
\begin{aligned}
DID_Y ={}& E(Y(1) - Y(0) \mid S, G=1, T=1)\,\big(P(D=1 \mid G=1, T=1) - P(D=1 \mid G=1, T=0)\big) \\
&- E(Y(1) - Y(0) \mid S, G=0, T=1)\,\big(P(D=1 \mid G=0, T=1) - P(D=1 \mid G=0, T=0)\big).
\end{aligned}
\]

Identification result (3/3)
Theorem 2
Assume we are in a fuzzy design where D ≠ T × G.
1. If CT holds, W_DID is equal to the four-term expression in (1/3).
2. If CT and STE hold, W_DID is equal to the two-term expression in (2/3).
3. If CT, STE, and HTE hold, $W_{DID} = E(Y(1) - Y(0) \mid S, G=1, T=1)$.

Application 1: Duflo (2001) heavily relies on HTE (1/2).
- In Duflo (2001), schooling increased in high and low program districts.

                          Older cohort   Younger cohort   Difference
Low program districts     9.40           9.76             0.36
High program districts    8.02           8.49             0.47

Table 1: Years of education in Duflo's groups

- Let R1 and R0 denote returns to schooling in high and low program districts.
- It follows from the previous theorem that under CT and STE, Wald-DID is equal to 0.47/0.11 × R1 − 0.36/0.11 × R0, where 0.11 = 0.47 − 0.36 is the DID of schooling.
- If R1 = R0 = R, no problem: 0.47/0.11 × R − 0.36/0.11 × R = R.
- In this context, R1 = R0 is not warranted: control districts have more skilled labor, so they could have different returns.
- If R1 = 0.10 and R0 = 0.12, W_DID = 0.47/0.11 × 0.10 − 0.36/0.11 × 0.12 ≈ 0.035!

Application 1: Duflo (2001) heavily relies on HTE (2/2)
- After estimating the simple Wald-DID, Duflo moves to richer specifications.
- Coefficient of schooling in her regression = $\sum_{d \geq 1} w_d \, W_{DID}(d, d-1)$.
- Using the previous theorem, under CT and STE each of the $W_{DID}(d, d-1)$ = weighted difference of returns to schooling in districts d and d − 1.
- Rearranging, weighted sum of returns to schooling in each district.
- Weights can be estimated. In half of the districts, returns to schooling receive negative weights, and negative weights sum to -3.28.

[Figure 1: Weights in Duflo (2001). Weight (vertical axis) against districts ordered by increase in schooling (horizontal axis).]

Application 2: Gentzkow et al. (2011) heavily rely on STE
- Back to Gentzkow et al. (2011) (effect of newspapers on political participation).
- Coefficient of newspapers is $\sum_{t=1872}^{1928} \sum_{c \geq 1} w_{ct} \, W_{DID}(c, c-1, t)$.
- Using the previous theorem, under CT and STE each of the $W_{DID}(c, c-1, t)$ = weighted difference of treatment effects in counties c and c − 1.
- Rearranging, weighted sum of treatment effects in each county.
- Weights are positive for all counties ⇒ their coefficient does not rely on HTE.
- But using one of the new estimators we propose, which does not rely on STE, significantly affects results. ⇒ their results heavily rely on STE.
- Not appealing: over time, new media develop and break the monopoly of the printed press ⇒ effects of newspapers might diminish.

One new design restriction, two new estimands

To avoid relying on HTE, find a control group where treatment is stable
Remember: if CT and STE hold,
\[
\begin{aligned}
W_{DID} ={}& E(Y(1) - Y(0) \mid S, G=1, T=1)\,\frac{P(D=1 \mid G=1, T=1) - P(D=1 \mid G=1, T=0)}{DID_D} \\
&- E(Y(1) - Y(0) \mid S, G=0, T=1)\,\frac{P(D=1 \mid G=0, T=1) - P(D=1 \mid G=0, T=0)}{DID_D}.
\end{aligned}
\]

Theorem 3
If CT and STE hold, and P(D = 1 | G = 0, T = 1) = P(D = 1 | G = 0, T = 0), then $W_{DID} = E(Y(1) - Y(0) \mid S, G=1, T=1)$.

Intuition

                   Period 0                Period 1
Control group      Always Treated: Y(1)    Always Treated: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
Treatment group    Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)

- In the treatment group, mean of Y in period 1 − mean of Y in period 0 = effect of time on the outcome + treatment effect among switchers.
- In the control group, mean of Y in period 1 − mean of Y in period 0 = effect of time on the outcome.
- CT + STE guarantee that the effect of time is the same in both groups.
- Therefore, W_DID = ATE among treatment group switchers.

First set of alternative assumptions
- If treatment is stable in the control group, Wald-DID does not rely on HTE.
- But it will still rely on CT + STE. Instead, we consider a generalization of CT (same populations as in the table above).

Conditional common trends (CCT):
\[
E(Y(1) \mid AT, G=1, T=1) - E(Y(1) \mid AT, G=1, T=0) = E(Y(1) \mid AT, G=0, T=1) - E(Y(1) \mid AT, G=0, T=0),
\]
\[
E(Y(0) \mid NT \cup S, G=1, T=1) - E(Y(0) \mid NT \cup S, G=1, T=0) = E(Y(0) \mid NT \cup S, G=0, T=1) - E(Y(0) \mid NT \cup S, G=0, T=0).
\]

First alternative estimand: the Time-Corrected Wald ratio
Let
\[
W_{TC} = \frac{E(Y \mid G=1, T=1) - E(Y + \delta_D \mid G=1, T=0)}{E(D \mid G=1, T=1) - E(D \mid G=1, T=0)},
\]
where
\[
\delta_d = E(Y \mid D=d, G=0, T=1) - E(Y \mid D=d, G=0, T=0).
\]

Theorem 4
If CCT holds and P(D = 1 | G = 0, T = 1) = P(D = 1 | G = 0, T = 0), then $W_{TC} = E(Y(1) - Y(0) \mid S, G=1, T=1)$.
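Going back to Theorem 2 (point 2) and Theorem 3, the weighting of the two groups' effects can be checked numerically. The snippet below is only an illustrative sketch: the function name and its arguments are ours, and the inputs are the schooling changes of Table 1 together with the purely illustrative returns R1 = 0.10 and R0 = 0.12 used in the Duflo application.

```python
def wald_did_under_ct_ste(delta_d_treat, delta_d_control, effect_treat, effect_control):
    """Value taken by W_DID when CT and STE hold (Theorem 2, point 2).

    delta_d_treat / delta_d_control: change in mean treatment between periods
    in the treatment / control group.
    effect_treat / effect_control: average effect among switchers in each group.
    (Hypothetical helper written for this illustration.)
    """
    did_d = delta_d_treat - delta_d_control
    if did_d == 0:
        raise ValueError("DID_D is zero: treatment evolves in parallel in both groups")
    return (delta_d_treat * effect_treat - delta_d_control * effect_control) / did_d


# Heterogeneous returns: W_DID is far from both R1 and R0.
heterogeneous = wald_did_under_ct_ste(0.47, 0.36, 0.10, 0.12)

# Homogeneous returns (HTE): W_DID recovers the common return.
homogeneous = wald_did_under_ct_ste(0.47, 0.36, 0.10, 0.10)

# Stable treatment in the control group (Theorem 3): W_DID recovers R1
# whatever the value of R0.
stable_control = wald_did_under_ct_ste(0.47, 0.0, 0.10, 0.12)

print(round(heterogeneous, 3), round(homogeneous, 3), round(stable_control, 3))
```

The first value is about 0.035, as on the slide, while the other two are equal to 0.10 up to rounding: either HTE or a control group with stable treatment removes the problem.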
Intuition

                   Period 0                Period 1
Control group      Always Treated: Y(1)    Always Treated: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)
Treatment group    Always Treated: Y(1)    Always Treated: Y(1)
                   Switchers: Y(0)         Switchers: Y(1)
                   Never Treated: Y(0)     Never Treated: Y(0)

\[
W_{TC} = \frac{E(Y \mid G=1, T=1) - E(Y + \delta_D \mid G=1, T=0)}{E(D \mid G=1, T=1) - E(D \mid G=1, T=0)}.
\]
The second term of the numerator is equal to the mean of Y we would have observed in period 1 in the treatment group if switchers had remained untreated.

Second set of alternative assumptions
- For continuous outcomes, we also consider a generalization of the changes-in-changes model (Athey and Imbens, 2006) to the fuzzy case.

Common changes (CC) assumption:
1. $Y(d) = h_d(U_d, T)$ with $h_d(\cdot, t)$ strictly increasing.
2. $U_d \perp\!\!\!\perp T \mid AT, G$ and $U_d \perp\!\!\!\perp T \mid NT \cup S, G$.

- Under CC, we can recover $E(Y(1) - Y(0) \mid S, G=1, T=1)$. Intuition on next slide.

Intuition
- To be able to recover $E(Y(1) - Y(0) \mid S, G=1, T=1)$, we need to recover $E(Y(1) \mid AT, G=1, T=1)$ and $E(Y(0) \mid NT \cup S, G=1, T=1)$.
- Under CCT, we have $E(Y(1) \mid AT, G=1, T=1) = E(Y(1) + \delta_1 \mid AT, G=1, T=0)$.
- Under CC, we have something different: an AT in the treatment group whose Y(1) in period 0 places her at the q-th quantile of the distribution of AT in the control group in period 0 will have in period 1 the same Y(1) as that of the AT at the q-th quantile of the distribution of AT in the control group in period 1.
- $U_d \perp\!\!\!\perp T \mid AT, G$: AT's ranks are stable over time.
- Under CC, $E(Y(1) \mid AT, G=1, T=1) = E(Q_1(Y(1)) \mid AT, G=1, T=0)$, where $Q_1(\cdot)$ denotes the quantile-quantile transform among AT in the control group from period 0 to 1.
- Similar reasoning for $E(Y(0) \mid NT \cup S, G=1, T=1)$.

Second alternative estimand: the CIC Wald ratio
\[
W_{CIC} = \frac{E(Y \mid G=1, T=1) - E(Q_D(Y) \mid G=1, T=0)}{E(D \mid G=1, T=1) - E(D \mid G=1, T=0)}.
\]
$Q_1$ and $Q_0$: quantile-quantile transforms of the outcome between periods 0 and 1 in the control group, among treated and untreated units respectively.
Wald-CIC is similar to Wald-TC, except that instead of accounting for the effect of time through additive shifts, it applies quantile-quantile transforms.

Theorem 5
If CC holds and P(D = 1 | G = 0, T = 1) = P(D = 1 | G = 0, T = 0), then $W_{CIC} = E(Y(1) - Y(0) \mid S, G=1, T=1)$.

Estimators (1/2)
Let $I_{gt} = \{i : G_i = g, T_i = t\}$ (resp. $I_{dgt} = \{i : D_i = d, G_i = g, T_i = t\}$) and let $n_{gt}$ (resp. $n_{dgt}$) denote the size of $I_{gt}$ (resp. $I_{dgt}$) for all $(d, g, t) \in \{0, 1\}^3$.
The Wald-DID and Wald-TC estimators are simply defined by
\[
\widehat{W}_{DID} = \frac{\frac{1}{n_{11}} \sum_{i \in I_{11}} Y_i - \frac{1}{n_{10}} \sum_{i \in I_{10}} Y_i - \frac{1}{n_{01}} \sum_{i \in I_{01}} Y_i + \frac{1}{n_{00}} \sum_{i \in I_{00}} Y_i}{\frac{1}{n_{11}} \sum_{i \in I_{11}} D_i - \frac{1}{n_{10}} \sum_{i \in I_{10}} D_i - \frac{1}{n_{01}} \sum_{i \in I_{01}} D_i + \frac{1}{n_{00}} \sum_{i \in I_{00}} D_i},
\]
\[
\widehat{W}_{TC} = \frac{\frac{1}{n_{11}} \sum_{i \in I_{11}} Y_i - \frac{1}{n_{10}} \sum_{i \in I_{10}} \big[ Y_i + \widehat{\delta}_{D_i} \big]}{\frac{1}{n_{11}} \sum_{i \in I_{11}} D_i - \frac{1}{n_{10}} \sum_{i \in I_{10}} D_i},
\]
where
\[
\widehat{\delta}_d = \frac{1}{n_{d01}} \sum_{i \in I_{d01}} Y_i - \frac{1}{n_{d00}} \sum_{i \in I_{d00}} Y_i.
\]

Estimators (2/2)
Let $\widehat{F}_{Y_{dgt}}$ denote the empirical cdf of Y on the subsample $I_{dgt}$:
\[
\widehat{F}_{Y_{dgt}}(y) = \frac{1}{n_{dgt}} \sum_{i \in I_{dgt}} 1\{Y_i \leq y\}.
\]
Similarly, we estimate the quantile of order q ∈ (0, 1) of $Y_{dgt}$ by
\[
\widehat{F}^{-1}_{Y_{dgt}}(q) = \inf\{y : \widehat{F}_{Y_{dgt}}(y) \geq q\}.
\]
The estimator of the quantile-quantile transform is $\widehat{Q}_d = \widehat{F}^{-1}_{Y_{d01}} \circ \widehat{F}_{Y_{d00}}$. Then, the Wald-CIC estimator is defined by
\[
\widehat{W}_{CIC} = \frac{\frac{1}{n_{11}} \sum_{i \in I_{11}} Y_i - \frac{1}{n_{10}} \sum_{i \in I_{10}} \widehat{Q}_{D_i}(Y_i)}{\frac{1}{n_{11}} \sum_{i \in I_{11}} D_i - \frac{1}{n_{10}} \sum_{i \in I_{10}} D_i}.
\]

Inference
Theorem 6
Suppose that we observe an iid sample $(Y_i, D_i, G_i, T_i)_{1 \leq i \leq n}$ and that P(D = 1 | G = 0, T = 1) = P(D = 1 | G = 0, T = 0). Then
1. If $E(Y^2) < \infty$ and CT and STE hold,
\[
\sqrt{n}\,\big(\widehat{W}_{DID} - E(Y(1) - Y(0) \mid S, G=1, T=1)\big) \xrightarrow{L} N(0, V_1).
\]
2. If $E(Y^2) < \infty$ and CCT holds,
\[
\sqrt{n}\,\big(\widehat{W}_{TC} - E(Y(1) - Y(0) \mid S, G=1, T=1)\big) \xrightarrow{L} N(0, V_2).
\]
3. If the outcome has bounded support and CC holds,
\[
\sqrt{n}\,\big(\widehat{W}_{CIC} - E(Y(1) - Y(0) \mid S, G=1, T=1)\big) \xrightarrow{L} N(0, V_3).
\]
- Proof for $\widehat{W}_{CIC}$: the estimator is a Hadamard differentiable functional of a standard empirical process ⇒ we use the functional Delta method.
- Estimators are still asymptotically normal with clustering.
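The three estimators defined on the two "Estimators" slides can be sketched in a few lines of code. This is not the authors' Stata package: `wald_estimators` is a hypothetical helper written for this illustration, it assumes a binary treatment with groups and periods coded 0/1 and non-empty cells, and the clipping of the quantile-quantile transform outside the control group's support is our own choice.

```python
from bisect import bisect_right
from statistics import fmean


def wald_estimators(y, d, g, t):
    """Return (W_DID, W_TC, W_CIC) computed from four equal-length sequences."""
    data = list(zip(y, d, g, t))

    def cell(gv, tv, dv=None):
        # Observations (y_i, d_i) in I_gt, or in I_dgt when dv is given.
        return [(yi, di) for yi, di, gi, ti in data
                if gi == gv and ti == tv and (dv is None or di == dv)]

    def mean_y(obs):
        return fmean([yi for yi, _ in obs])

    def mean_d(obs):
        return fmean([di for _, di in obs])

    i11, i10, i01, i00 = cell(1, 1), cell(1, 0), cell(0, 1), cell(0, 0)

    # Wald-DID: DID of the outcome divided by DID of the treatment.
    did_y = mean_y(i11) - mean_y(i10) - (mean_y(i01) - mean_y(i00))
    did_d = mean_d(i11) - mean_d(i10) - (mean_d(i01) - mean_d(i00))
    w_did = did_y / did_d

    # Sorted control-group outcomes by treatment status, in periods 0 and 1.
    y_d00 = {dv: sorted(yi for yi, _ in cell(0, 0, dv)) for dv in (0, 1)}
    y_d01 = {dv: sorted(yi for yi, _ in cell(0, 1, dv)) for dv in (0, 1)}

    # delta_d: change in the mean outcome of control units with treatment d.
    delta = {dv: fmean(y_d01[dv]) - fmean(y_d00[dv]) for dv in (0, 1)}

    def qq(dv, value):
        # Q_d = F^{-1}_{Y_d01} o F_{Y_d00}, with F^{-1}(q) = inf{y : F(y) >= q}.
        n0, n1 = len(y_d00[dv]), len(y_d01[dv])
        count = bisect_right(y_d00[dv], value)   # n0 times the empirical cdf
        order = (count * n1 + n0 - 1) // n0      # ceil(cdf * n1), in integers
        order = min(max(order, 1), n1)           # assumption: clip outside support
        return y_d01[dv][order - 1]

    denom = mean_d(i11) - mean_d(i10)
    w_tc = (mean_y(i11) - fmean([yi + delta[di] for yi, di in i10])) / denom
    w_cic = (mean_y(i11) - fmean([qq(di, yi) for yi, di in i10])) / denom
    return w_did, w_tc, w_cic
```

The function is called as `wald_estimators(y, d, g, t)`. It returns point estimates only: standard errors, clustering, covariates and multiple groups or periods are not handled in this sketch.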
Extensions
- When one cannot find a control group where treatment is stable, we show that the ATE of switchers is partially identified and we derive sharp bounds.
- We extend our analysis to non-binary treatments.
- We show how to use our estimators with multiple groups and time periods.
- We show how to incorporate discrete and continuous covariates into the estimation.

Applications

New control and treatment groups in Duflo (2001).
- Duflo's Wald-DID = weighted difference between treatment effects in treatment and control districts.
- Education is higher in control districts ⇒ economic development and returns to education may be higher there.
- If that's the case, Wald-DID is downward biased.
- We use a different procedure to define our groups. Goal = ensure that years of education are stable between cohorts in the control group.
- Control group = districts such that the p-value of a χ² test comparing education across cohorts is > 0.5.
- Treatment group = districts such that this p-value is < 0.5 and the average number of years of education increased.

                       Older cohort   Younger cohort   Difference (s.e.)
Control districts      9.60           9.55             -0.05 (0.097)
Treatment districts    8.65           9.64              0.99 (0.082)

Table 2: Years of education in our control and treatment groups

Placebo tests
- We can use previous cohorts to check the validity of common trends assumptions with these new groups.

Cohorts               -2 vs -1          -1 vs 0            0 vs 1
DID schooling         0.101 (0.191)     -0.006 (0.160)     1.030 (0.127)
DID wages             0.050 (0.035)      0.002 (0.026)     0.164 (0.028)
Numerator Wald-TC     0.029 (0.028)     -0.0083 (0.022)    0.103 (0.027)
Numerator Wald-CIC    0.034 (0.028)     -0.0047 (0.023)    0.107 (0.027)
N                     14,452            19,938             22,339

Table 3: Placebo tests with our control and treatment groups (standard errors in parentheses)

Returns to education
- We compare Duflo's estimate to our three new measures of returns to education.
                        Duflo's 2SLS   WDID      WTC       WCIC      OLS
Returns to education    0.073          0.159     0.104     0.100     0.077
                        (0.043)        (0.028)   (0.027)   (0.027)   (0.001)
N                       30,828         22,339    22,339    22,339    30,828

- Our Wald-DID is more than twice as large as Duflo's. Does not rely on the homogeneity assumption...
- ... But requires that returns to education be the same among old and young workers. Not realistic: e.g. education delays labour market entry but returns to experience are concave ⇒ returns presumably larger among old workers.
- If so, W_DID is biased upwards.
- Wald-TC and Wald-CIC do not rely on STE. Lower than Wald-DID but higher than Duflo's estimate and OLS.

Effects of newspapers on electoral participation
- Gentzkow et al. (2011): regress the change in electoral participation between elections t − 1 and t in county c on year dummies and the change in the number of newspapers between t − 1 and t.
- Instead, first alternative estimator:
  1. For each pair of consecutive elections, we form a control group of counties where newspapers are stable between t − 1 and t, and two treatment groups: counties where newspapers increased, and counties where they decreased.
  2. We estimate a Wald-DID with the control group and treatment group 1, and a Wald-DID with the control group and treatment group 2.
  3. We take a weighted average of these Wald-DIDs over dates (for the weights, see paper).
- Second alternative estimator: Wald-TCs instead of Wald-DIDs.
- Treatment is stable in all control groups ⇒ these estimators do not rely on HTE. The weighted average of Wald-TCs also does not rely on STE.

Results

Table 4: Effect of one additional newspaper on turnout

       Gentzkow et al. (2011)   WDID       WTC
       0.0026                   0.0031     0.0047
       (0.0009)                 (0.0012)   (0.0014)
N      15,627                   15,627     15,627

- WTC is significantly different from the estimator in Gentzkow et al. (2011) and from Wald-DID.
- To reconstruct the trends in turnout that a county which went from 2 to 3 newspapers between t − 1 and t would have experienced if its number of newspapers had not changed, WDID uses all counties where the number of newspapers is stable between t − 1 and t.
- On the other hand, WTC only uses counties where the number of newspapers is stable and with 2 newspapers in t − 1.

Take-away
- In fuzzy DID designs, researchers should find a control group where treatment is stable. Otherwise, their results will rely on the assumption that treatment effects are homogeneous across groups.
- When they have found such a control group, they might want to use the Wald-TC or Wald-CIC estimator: these do not require that the treatment effect be stable over time. Using these estimators can make a big difference in practice with respect to standard estimators.
- We are developing a Stata package. Computes Wald-DID, Wald-TC, Wald-CIC, accounts for clustering, includes covariates, etc. First version already available on my website.

My portfolio
- Econometrics, on top of this paper:
  1. "Tolerating defiance? Treatment effects without monotonicity." Monotonicity in Imbens and Angrist (1994) can be weakened.
  2. "Next please! A new definition of the treatment and control groups for randomizations with waiting lists", with Luc Behaghel. How to define the treatment and the control group when treatment is offered according to a random ordering until all seats are filled.
- Education/health:
  1. "Ready for boarding? The effects of a boarding school for underprivileged students", with Luc Behaghel and Marc Gurgand. School increases students' test scores in Maths, but only after two years and among strong students. Long-term paper coming soon.
  2. "Does a community website improve health outcomes among patients awaiting kidney transplant?", with Luc Behaghel. Patients randomized into 3 arms: control, information website, information + community website (Facebook for patients awaiting transplant).
  3. "Assessing the classroom-wide impact of a mental health program for disruptive students: effects on disruptive students, classmates, and teachers", with Nicolas Navarrete.

Enikolopov et al. (2011) also heavily rely on HTE
- Back to Enikolopov et al. (2011) (effect of independent TV in Russia).
- Coefficient of independent TV is $\sum_{r \geq 1} w_r \, W_{DID}(r, r-1)$.
- Using the previous theorem, under CT and STE each of the $W_{DID}(r, r-1)$ = weighted difference of treatment effects in regions r and r − 1.
- Rearranging, weighted sum of treatment effects in each region.
- Treatment effects in more than half of the regions receive negative weights, and negative weights sum to -2.26.

[Figure 2: Weights in Enikolopov et al. (2011). Weight (vertical axis) against regions ordered by increase in TV (horizontal axis).]

Robustness checks
- We change the p-value threshold from 0.5 to 0.6 (distribution of education not perfectly stable in our initial control group): leaves results unaffected.
- Our procedure to form groups is statistical. This should be accounted for in our standard errors.
- To account for this: double bootstrap procedure. First, we bootstrap individuals and form our groups again; then we bootstrap districts and compute our three estimators.
- Does not affect our main conclusions.

Wald-TC or Wald-CIC?
Common trends versus common changes:
1. Common trends is not invariant to the scaling of the outcome ⇒ logs versus levels problem. But it only restricts the first moment of the outcome.
2. Common changes is invariant to scaling, but restricts the whole distribution of the outcome.
When groups are very different in levels in the pre-period, Wald-TC might be very sensitive to scaling, and Wald-CIC may be better.
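The logs versus levels problem can be illustrated with made-up numbers. In the sketch below, all values are hypothetical (not from the paper), and each cell's untreated outcome is assumed constant within the cell, so that means of logs coincide with logs of means.

```python
from math import log

# Hypothetical untreated outcomes by period (illustrative numbers only).
control = {0: 10.0, 1: 20.0}
treatment = {0: 20.0, 1: 30.0}   # twice the control level in the pre-period

# Difference between the two groups' trends, in levels and in logs.
trend_gap_levels = (treatment[1] - treatment[0]) - (control[1] - control[0])
trend_gap_logs = (log(treatment[1]) - log(treatment[0])) - (
    log(control[1]) - log(control[0])
)

print(trend_gap_levels, round(trend_gap_logs, 3))
```

Common trends holds exactly in levels (both groups gain 10) but fails in logs, where the gap between trends is log(1.5) − log(2), roughly −0.29. Ranks, which drive the quantile-quantile transforms of Wald-CIC, are unchanged by taking logs.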