Synthetic Control Estimation Beyond Case Studies WORKING PAPER

advertisement
WORKING PAPER
Synthetic Control Estimation Beyond Case Studies
Does the Minimum Wage Reduce Employment?
David Powell
RAND Labor & Population
WR-1142
February 2016
This paper series made possible by the NIA funded RAND Center for the Study of Aging (P30AG012815) and the NICHD funded RAND
Population Research Center (R24HD050906).
RAND working papers are intended to share researchers’ latest findings and to solicit informal peer review.
They have been approved for circulation by RAND Labor and Population but have not been formally edited or
peer reviewed. Unless otherwise indicated, working papers can be quoted and cited without permission of the
author, provided the source is clearly referred to as a working paper. RAND’s publications do not necessarily
reflect the opinions of its research clients and sponsors. RAND® is a registered trademark.
For more information on this publication, visit www.rand.org/pubs/working_papers/WR1142.html
Published by the RAND Corporation, Santa Monica, Calif.
© Copyright 2016 RAND Corporation
R ® is a registered trademark
Limited Print and Electronic Distribution Rights
This document and trademark(s) contained herein are protected by law. This representation of RAND
intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is
prohibited. Permission is given to duplicate this document for personal use only, as long as it is unaltered and
complete. Permission is required from RAND to reproduce, or reuse in another form, any of its research
documents for commercial use. For information on reprint and linking permissions, please visit
www.rand.org/pubs/permissions.html.
The RAND Corporation is a research organization that develops solutions to public policy challenges to help
make communities throughout the world safer and more secure, healthier and more prosperous. RAND is
nonprofit, nonpartisan, and committed to the public interest.
RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors.
Support RAND
Make a tax-deductible charitable contribution at
www.rand.org/giving/contribute
www.rand.org
Synthetic Control Estimation Beyond Case Studies:
Does the Minimum Wage Reduce Employment?∗
David Powell
RAND †
First Version: October 2015
This Version: February 2016
Abstract
Panel data are often used in empirical work to account for additive fixed time and unit
effects. More recently, the synthetic control estimator relaxes the assumption of additive fixed
effects for case studies, using pre-treatment outcomes to create a weighted average of other
units which best approximate the treated unit. The synthetic control estimator is currently
limited to case studies in which the treatment variable can be represented by a single indicator
variable. Applying this estimator more generally, such as applications with multiple treatment
variables or a continuous treatment variable, is problematic. This paper generalizes the case
study synthetic control estimator to permit estimation of the effect of multiple treatment variables, which can be discrete or continuous. The estimator jointly estimates the impact of the
treatment variables and creates a synthetic control for each unit. Additive fixed effect models
are a special case of this estimator. Because the number of units in panel data and synthetic
control applications is often small, I discuss an inference procedure for fixed N . The estimation
technique generates correlations across clusters so the inference procedure will also account for
this dependence. Simulations show that the estimator works well even when additive fixed
effect models do not. I estimate the impact of the minimum wage on the employment rate of
teenagers. I estimate an elasticity of -0.44, substantially larger than estimates generated using
additive fixed effect models, and reject the null hypothesis that there is no effect.
Keywords: synthetic control estimation, finite inference, minimum wage, teen employment,
panel data, interactive fixed effects, correlated clusters
JEL classification: C33, J23, J31
∗
I thank both the Center for Causal Inference and the Methods Lab for providing funding. I am also
appreciative of both Arindrajit Dube and David Neumark for making their respective data and code available.
I received helpful feedback from seminar participants at the Center for Causal Inference as well as Abby
Alpert, Tom Chang, John Donohue, Michael Dworsky, Beth Ann Griffin, Kathleen Mullen, David Neumark,
and Michael Robbins.
†
Contact information: dpowell@rand.org
1
1
Introduction
Empirical research often relies on panel data to account for fixed differences across units and
common time effects, using changes in the policy variables to identify the relationship between the policies and the outcome of interest. Additive fixed effect models include separate
additive unit (“state”) fixed effects and additive time fixed effects. These models assume
that an unweighted average of all states acts as a valid counterfactual for each state. In this
paper, I introduce a synthetic control estimator which creates an empirically-optimal control
group for each unit using a weighted average of the other units. This estimator builds on
and generalizes previous work by Abadie and Gardeazabal (2003), Abadie et al. (2010), and
Abadie et al. (2014).
This previous work introduced a technique to estimate causal effects for case studies
and I refer to the estimator as the “case study synthetic control” (CSSC) estimator. The
innovation of this approach was to use the period before the treatment (terrorism outbreak in
Abadie and Gardeazabal (2003); the adoption of a tobacco control program in California in
Abadie et al. (2010)) to empirically construct a comparison group which is appropriate for the
group exposed to the treatment. The synthetic control estimator uses a data-driven approach
to create this comparison group. A typical motivation for the use of the synthetic control
estimator is that the parallel trends assumption required by additive fixed effect models often
does not hold in practice. It has become common to cite the “parallel trends” assumption
implicit in additive fixed effect models and to test for pre-existing trends through event study
analysis. If the event study analysis suggests non-parallel trends, however, it is not generally
clear how to proceed. CSSC permits estimation of specifications with interactive fixed effects
which generate non-parallel trends.
The estimator introduced in this paper generalizes CSSC beyond case studies, per2
mitting the inclusion of multiple discrete or continuous treatment variables in the main
specification as well as interactive fixed effects. The estimator jointly estimates the parameters associated with the treatment variables and a synthetic control for each unit. The
estimator performs well in simulations. I apply this estimator to study the employment
effects of the minimum wage.
Understanding the employment ramifications of minimum wage increases is an essential part of the debate over the efficacy of federal, state, and local policies to change the
minimum wage. The minimum wage rose in 14 states at the beginning of 2016 (12 due to
legislative action and 2 due to automatic cost-of-living increases).1 President Barack Obama
promoted a bill to raise the federal minimum wage to $10.10 in his 2014 State of the Union
Address and reiterated his support for a minimum wage increase in his 2015 and 2016 State
of the Union Addresses. Presidential candidate Bernie Sanders supports a federal minimum
wage increase to $15 per hour while candidate Hillary Clinton supports an increase to $12
per hour.
The economic literature on the relationship between the minimum wage and employment has exploited state-level minimum wage increases to test whether states with increasing minimum wage experience different employment changes relative to states with stagnant
minimum wages over the same time period. This additive fixed effects approach assumes
that adopting states and non-adopting states would have experienced the same employment
trends, on average, in the absence of the policy adoption.
Recent work has questioned the usefulness of this approach with evidence that estimated disemployment effects in the traditional fixed effects approach are not robust to the
inclusion of state-specific trends (Allegretto et al. (2011)) or limiting comparisons to contiguous counties which straddle state borders (Dube et al. (2010)). In response, Neumark et
1
http://www.ncsl.org/research/labor-and-employment/state-minimum-wage-chart.aspx
3
al. (2014b) argues that these new approaches do not produce more appropriate comparison
groups and estimates a disemployment elasticity for teenagers of -0.15. In more recent work,
Neumark et al. (2014a) states, “A central issue in estimating the employment effects of minimum wages is the appropriate comparison group for states (or other regions) that adopt
or increase the minimum wage.” Synthetic control estimation offers a method to empirically
select appropriate control groups for treated states.
The application of CSSC has proven quite useful in applied work (see Nonnemaker
et al. (2011); Hinrichs (2012); Cavallo et al. (2013); Saunders et al. (2014); Ando (2015);
Bauhoff (2014); Fletcher et al. (2014); Cunningham and Shah (2014); Mideksa (2013);
Torsvik and Vaage (2014); Munasib and Rickman (2015); Munasib and Guettabi (2013);
Eren and Ozbeklik (2015) for a small subset of examples). While initially motivated by case
study applications, the CSSC estimator is often used in circumstances where multiple units
are treated (e.g., Billmeier and Nannicini (2013); Donohue et al. (2015); Powell et al. (2015))
by applying the CSSC estimator to each treated unit and then aggregating the estimates,
though there is little agreement on how to implement this aggregation. The treatment in
these circumstances can be represented by a single indicator variable.
In this paper, I extend the synthetic control approach beyond case studies and applications where the treatment variable must be represented by a single indicator variable. The
synthetic control estimator introduced in this paper allows for multiple treatment variables,
which can be discrete or continuous. I will refer to this estimator as the generalized synthetic
control (GSC) estimator. This estimator parallels fixed effects estimators which account for
fixed state differences and time fixed effects while allowing for multiple discrete or continuous
treatment variables. However, instead of assuming additive state fixed effects and additive
year fixed effects, the synthetic control estimator permits estimation of a specification with
interactive fixed effects while nesting more traditional additive fixed effect models. Additive
4
fixed effect models are equivalent to subtracting off the mean value of the outcome and the
explanatory variables for each year, giving equal weight to each state for this de-meaning.
The synthetic control estimator generates a data-driven control for each state which does
not constrain the weights on the units to be equal, assigning different weights to generate
an optimal control.
A primary motivation of CSSC in many applications is that it allows for the creation
of a synthetic control which has a similar outcome trajectory in the pre-treatment period,
typically implemented by using the outcome in each pre-treatment period to create the
synthetic control unit. Kaul et al. (2015) points out that if every pre-treatment outcome
is used for the creation of the synthetic control, then additional covariates are not used by
CSSC. GSC, by allowing for multiple explanatory variables, circumvents this problem and
uses the control variables in a way that is similar to linear panel data models – by estimating
the independent relationship between each control variable and the outcome. This is a further
advantage of using GSC, even with case studies. Similarly, in the circumstance of multiple
treated units of the same policy, GSC provides a natural means of aggregating the estimates
across adopting units.
For inference, Abadie and Gardeazabal (2003) suggests a placebo test. Recent work
has questioned the validity of this method for CSSC inference (Ando and Sävje (2013))
and it is difficult to apply placebo tests more generally when the treatment is continuous
or there are multiple treatment variables. I build on a method developed for panel data
in complementary work (Powell (2015)) which provides valid p-values in the presence of a
finite number of correlated clusters. Accounting for dependence across units and developing
appropriate inference for fixed N are crucial for use with a synthetic control estimator.
First, many panel data applications involve only a relatively small number of units. Second,
the estimator uses other states as controls for each unit, mechanically inducing correlations
5
across units.
In the next section, I further discuss the CSSC estimator to motivate a more general
synthetic control technique and include background on the minimum wage debate in the
economics literature. In Section 3, I introduce the generalized synthetic control (GSC)
estimator and discuss an appropriate inference procedure. Section 4 includes simulation
results. In Section 5, I apply GSC to estimate the relationship between state minimum
wages and the employment rate of teenagers. Section 6 concludes.
2
2.1
Background
Case Study Synthetic Control Estimation
This paper builds on the synthetic control estimation technique discussed in Abadie and
Gardeazabal (2003), Abadie et al. (2010), and Abadie et al. (2014). In the CSSC framework,
there are J + 1 units and T time periods. Unit 1 is exposed to the treatment in periods
T0 + 1 to T and unexposed in periods 1 to T0 . All other units are unexposed in all time
periods. Outcomes are defined by a factor model:
Yit = αit Dit + δt + λt μi + it
(1)
where Dit represents the treatment variable and is equal to 1 for state i = 1 and time periods
t > T0 , 0 otherwise. λt is a 1 × F vector of common unobserved factors and μi (F × 1)
is a vector of factor loadings. Gobillon and Magnac (forthcoming) compares the synthetic
control approach to estimation using interactive fixed effects (Bai (2009)) for estimation of
equation (1). This work illustrates the usefulness of relaxing the restriction that the fixed
effects are additive.
6
Unit 1 is the treated unit while all other units are part of the “donor pool,” the
units that may potentially compose part of the synthetic control. The motivation for the
approach is to find a weighted combination of units in the donor pool such that
J+1
wj∗ μj = μ1 .
(2)
j=2
If this condition holds, then
J+1
j=2
wj∗ Yjt provides an unbiased estimate of Y1t for D1t =
0.
The weights are constrained to be non-negative and to sum to one. They are generated to minimize the difference between the pre-treated outcomes of the treated unit and
the unit’s synthetic control. I assume that there are no other control variables, which are
denoted Zit in Abadie and Gardeazabal (2003) and included additively in equation (1), and
that all pre-treatment outcomes (and only pre-treatment outcomes) are used to generate the
synthetic control weights. This is common in applications using CSSC (see Cavallo et al.
(2013) for one example).2 The inclusion of control variables to generate the match between
the treated unit and the synthetic control has been criticized (Kaul et al. (2015)) as the
CSSC estimator will not use the covariates when each value of the pre-treatment outcome is
used. For this reason, I assume that no additional covariates are used with the CSSC estimator. The GSC estimator below will allow for the use of control variables in a manner that
parallels traditional regression analysis by letting them have a direct effect on the outcome
variable and estimating that effect jointly with the other parameters.
Let W = (w2 , . . . , wJ+1 ) such that wj ≥ 0 for all 2 ≤ j ≤ T + 1 and
J+1
j=2
wj = 1.
Let X1 = (Y11 , . . . , Y1T0 ), a (T0 × 1) vector of pre-intervention outcomes for the treated unit.
X0 is a (T0 × J) of the same variables for the units in the donor pool. The weights are
2
The use of each pre-treatment outcome to create the synthetic control is especially useful to account for
differential pre-existing (non-linear) trends across units.
7
estimated using
W∗ = argmin |X1 − X0 W|V
W
s.t. wj ≥ 0 for all 2 ≤ j ≤ T + 1 and
J+1
wj = 1
j=2
for some matrix V. In words, CSSC involves a constrained optimization to find the weighted
average of the donor states which is “closest” to the treated unit in terms of pre-treated
outcomes.
The treatment effect estimate for period t > T0 is
α̂it = Yit −
J+1
wj∗ Yjt .
(3)
j=2
CSSC permits estimation of a treatment effect for each post-treatment time period, though
it is common in applied work to report an aggregate post-treatment effect. For inference,
Abadie and Gardeazabal (2003) and Abadie et al. (2010) recommend a placebo test in which
a treatment effect is estimated for each unit in the donor pool as if it were treated (and
Unit 1 is included in the donor pool). Under the null hypothesis that there is no treatment
effect, these placebo tests generate a distribution of estimates which would be observed
randomly. The placement of α̂it in this distribution generates an appropriate p-value. Ando
and Sävje (2013) discusses possible limitations of the placebo test approach for synthetic
control estimation.
CSSC has proven valuable in applied work as a means of relaxing the parallel trends
assumption required by additive fixed effect models. The framework of equation (1) nests
8
the more typical additive fixed effect model which assumes
λt =
1 φt
⎡
⎤
⎢ γi ⎥
and μi = ⎣
⎦,
1
(4)
such that λt μi = γi +φt . When additive fixed effects are appropriate, CSSC will, in principle,
apply appropriate weights. The application of CSSC has proven quite useful in applied work.
A primary contribution of this paper is that the motivation behind CSSC should apply more
generally to panel data models, not just case studies.
2.2
Finite Inference
Panel data models are often applied in cases where the number of units is relatively small.
For example, the minimum wage analysis of this paper uses 50 states plus Washington D.C.
The donor pool in Abadie et al. (2010) includes 38 states; Abadie and Gardeazabal (2003)
uses 16 Spanish regions; Abadie et al. (2014) uses a sample of 16 OECD countries. One
exception in the CSSC context is Robbins et al. (2015) which studies a crime intervention
policy implemented by city block and includes a total of 3,601 blocks in the data. Similarly,
Ando (2015) has over 800 units in the donor pool.
In this paper, I use an inference procedure for fixed N , which is useful more generally for panel data with a fixed number of units. This method is introduced in Powell (2015)
and applied here for the synthetic control estimator as a special case. Additive fixed effects
create problems when N is small because they induce a mechanical correlation across clusters. Inference procedures developed for a small number of clusters are frequently not valid
for estimation with state and year fixed effects (e.g., Hansen (2007); Bester et al. (2011);
Ibragimov and Müller (2010)). The synthetic control estimator introduced in this paper also
9
generates a correlation across clusters as the synthetic control for each state is a weighted average of other states. After subtracting off each state’s synthetic control, the gradient of each
unit is a function of the error term for that unit and a combination of other units. I develop
an inference procedure which accounts for empirical correlations across units. These correlations can also arise for reasons other than the estimation procedure, such as similarities
across states due to geography, political institutions, industry composition, etc.
2.3
Minimum Wage and Employment
A vast literature has debated whether minimum wage increases affect employment rates (see
Neumark et al. (2014b) for a review). The empirical literature has recently focused on determining appropriate controls for states with increasing minimum wages. State minimum
wages increase due to state legislative changes, binding federal minimum wage changes, or
state cost-of-living adjustments. The literature typically uses all sources of variation, comparing states with increasing minimum wages to states without a change in their minimum
wage. The potential of the CSSC estimator to generate appropriate control units has been
recognized recently in the literature, though it it difficult to appropriately apply CSSC to
this application. I discuss uses of the CSSC estimator to study the possible disemployment
effects of the minimum wage in the literature.
Sabia et al. (2012) studies the employment effects of the 2004-2006 New York minimum wage increase, finding large employment reductions. They use CSSC for part of their
analysis, though they do not use any pre-intervention outcomes to construct the synthetic
control. Instead, the synthetic control is based only on other covariates which is unlikely to
account for pre-existing trends.
More recently, Dube and Zipperer (2015) (also discussed in Allegretto et al. (2015))
10
uses CSSC to study the minimum wage employment effects, discussing the application of
CSSC for a continuous treatment variable. To implement CSSC, the authors select on
states with no minimum wage changes for two years prior to the treatment (i.e., minimum
wage change) and with at least one year of post-treatment data. Further selection criteria
are also necessary and limit the sample to the study of 29 of the 215 state-level minimum
wage increases over their time period (their calculation). The donor pools for each of the
treated states is limited to states with no minimum wage changes over the same time period
and occasionally consist of relatively few states. Furthermore, the synthetic controls are
generated for a relatively short time period around the time of adoption of the treated state.
These selection criteria are unnecessary using the GSC estimator introduced below as the
estimator will include all states and time periods and use all available variation, paralleling
additive fixed effects estimators.
More importantly, Dube and Zipperer (2015) create the synthetic controls for each
adopting state based on pre-intervention outcomes. These outcomes, however, are themselves
treated given the potential that the minimum wage causally affects employment. Consequently, the synthetic control for each treated state using CSSC is, in fact, only valid if the
minimum wage does not affect employment and, otherwise, does not estimate the model
represented in equation (1). The GSC estimator introduced below accounts for the causal
impact of the treatment variables when creating the synthetic controls for each state.
Overall, it is difficult to use the traditional CSSC estimator to study the effects of
the minimum wage. The minimum wage is constantly changing across states, making it
difficult to isolate “treated” states and “donor” states. More generally, this problem arises
whenever there are multiple treatments, multiple treated units, treatment variables that are
continuous, etc. In a fixed effects specification, the designation of units as “treated” and
“control” is unnecessary. This paper introduces a synthetic control equivalent.
11
3
Synthetic Control Estimator
This section develops the GSC estimator which generalizes CSSC beyond case studies. It
permits the inclusion of multiple treatment variables, which can be discrete or continuous,
in the specification. I follow the notation developed for CSSC and used in Section 2.1 as
closely as possible. I assume that there are N units and T time periods.3 I model outcomes
as
Yit = Dit α0 + δt + λt μi + it ,
(5)
where Dit is a K × 1 vector of treatment variables for state i at time period t. As before,
λt is a 1 × F vector of common unobserved factors and μi (F × 1) is a vector of factor
loadings. The power of the λt μi term parallels the benefits discussed for CSSC. This term
permits flexible, non-linear state-level trends and shocks that may be correlated with the
policy variables. Similar to the CSSC estimator, the additive fixed effects equivalent to
equation (5) is actually a special case and available if that specification is correct. While a
state-time interaction term would be collinear with Dit , interactive fixed effects permit the
inclusion of a term which varies flexibly at the state and time level. The primary difference
between equation (5) and the specification represented in equation (1) is that Dit is a vector
and not limited to one indicator variable.4
The estimator will require simultaneous estimation of a synthetic control for each
state. Let
wi = (wi1 , . . . , wii−1 , wii+1 , . . . , wiN ),
where wi is the vector of weights on all other states to generate the synthetic control for
3
It is straightforward to apply the estimator to unbalanced panels.
The parameter α0 in equation (5) does not vary by state or time (unlike the parameter in equation (1)).
However, it is straightforward to allow this heterogeneity by interacting Dit with state or time indicators.
4
12
state i. The weights are constrained, as before, to be non-negative and to sum to one. I
define the set of possible weighting vectors to create the unit i synthetic control by
Wi =
⎧
⎨
⎩
wi
⎫
⎬
j
j
w
=
1,
w
≥
0
for
all
j
.
i
i
⎭
(6)
j=i
wij represents the weight given unit j for the creation of the synthetic control for unit i.
3.1
Main Assumptions
In this section, I discuss the assumptions necessary for identification of α0 . The weights
themselves are not necessarily identified as multiple sets of weights may be appropriate. This
will not affect identification of the parameters of interest. Let Di = (Di1 , . . . , DiT ).
A1 Outcomes: Yit = Dit α0 + δt + λt μi + it
A2 Conditional Independence: There exists wi ∈ Wi such that the following conditions
hold:
(a)
(b)
j
w i μj
E μi −
⎡
E ⎣it −
| Di −
j=i
j=i
wij jt
j=i
| Di −
j=i
13
wij Dj = 0
⎤
wij Dj ⎦ = 0
A3 Rank Condition:
⎡
⎢
⎢
E⎢
⎢
⎣
(Di1
−
(DiT −
j j=i wi Dj1 )
..
.
j=i
⎤
j
j=i wi Yj1
..
.
wij DjT )
j=i
wij YjT
⎥
⎥
⎥ is rank k + 1 for all wi ∈ Wi
⎥
⎦
A1 defines the outcome as a function of the treatment variables, a time fixed effect,
(an arbitrary number of) interactive fixed effects, and an observation-specific disturbance
term. The advantage of the synthetic control estimator is that it accounts for λt μi , which
may be arbitrarily correlated with the treatment variables. As CSSC does for the comparative case study context, this specification nests additive fixed effect models. The dimensions
of λt and μi are both unknown and neither is estimated.
A2 assumes the existence of synthetic controls but allows the synthetic control to be
imperfect (unlike equation (2)). This assumption states that there exists synthetic control
weights such that Di − j=i wij Dj is uncorrelated with deviations from μi . Of course, this
condition is more likely to hold when the deviations of μi from the synthetic control are
small. A2(a) does not assume uniqueness of the synthetic control for each unit and, in
fact, identification of α0 does not require unique synthetic controls. A2(b) states that the
observation-specific disturbance is conditionally independent of Di − j=i wij Dj .
A3 is a rank condition. A necessary condition implied by A4 is T ≥ k + 1. A4
requires independent variation in each component of Dit − j=i wij Djt . Furthermore, it
requires independent variation in λt μi , corresponding to the last column ( j=i wij Yjt ). Alternatively, I could write the rank condition as a function of λt μi terms. I use the above
14
condition since it is less restrictive. The alternative rank condition would use
⎤
⎡
j ⎢ (Di1 − j=i wi Dj1 )
⎢
..
E⎢
.
⎢
⎣
(DiT −
j j=i wi DjT )
λ 1 μ1
..
.
. . . λ1 μi−1
..
..
.
.
λ1 μi+1
..
.
. . . λ1 μ N
..
..
.
.
λT μ1 . . . λT μi−1 λT μi+1 . . . λT μN
However, it is not necessary to require independence of Di1 −
j=i
⎥
⎥
⎥
⎥
⎦
wij Dj1 to the full set of
interactive fixed effects for each unit. Some linear combinations of these interactive fixed
effects are not allowed because they are excluded from Wi . A4 limits the rank condition to
only the permitted weighted combinations.
3.2
Estimation
The estimator simultaneously estimates the parameters associated with each treatment
variable while constructing a synthetic control for each unit.
The synthetic control is
an estimate of Yit − Dit α̂ using a constrained weighted average of (Yjt − Djt
α̂) for all
j
acts as an estimate of λt μi , constructed empirically to
j = i.
j=i wi Yjt − Djt α̂
minimize the sum of the square of the residuals. Under the assumptions listed above,
j
−
D
α̂
does not have to be a perfect estimate of λt μi .
w
Y
jt
jt
i
j=i
This approach generates a synthetic control for each unit while accounting for the
causal effects of the policy variables. This joint estimation allows for the weighted average
of Yjt − Djt α̂ to account for λt μi .
3.2.1
Implementation
The estimator chooses an estimate for α0 and simultaneously constructs a synthetic control
for each state. This approach is similar in spirit to an additive fixed effect estimator which
jointly estimates the parameters of interest while using an unweighted average of all states
15
as a control for each unit. The synthetic controls are constructed by selecting weights in Wi .
Let |·| represent the Euclidian norm. The estimator is
(α̂, ŵ1 , . . . , ŵN ) =
⎫
⎧
⎬
⎨ j
Y
φ
Y
argmin
−
D
b
−
−
D
b
it
jt
it
jt
i
⎭
b,φ1 ∈W1 ,...,φN ∈WN ⎩
t
i
j=i
ŵij
Yjt −
Djt
α̂
(7)
j=i
is used as a “control” for unit i. It would be inappropriate to
construct a synthetic control based solely on Yit since Yit is causally impacted by Dit for all
units.
In practice, I implement the above minimization in steps. I create a grid of possible
values for α0 and search over that grid. For each “guess” b, I calculate Yit − Dit b for all
i and then estimate the synthetic control for each unit i. The estimation of each synthetic
control involves a constrained minimization. Then, I calculate the argument in equation (7).
The b that minimizes this objection function is α̂.
This procedure assumes that there are only one or two treatment variables such
that grid-searching is possible. For each b, one can simply use the synth command5 , with
Yit − Dit b as the outcome, developed for the CSSC estimator to obtain a synthetic control
for each state. This approach estimates the parameters jointly. Other estimation methods
of the synthetic control weights and the parameters of interest are also possible.
5
More information can be found at http://web.stanford.edu/ jhain/synthpage.html
16
3.2.2
Weighting
With CSSC, it is customary to visually check the fit of the synthetic control with the treated
unit in the pre-period. It may not be possible to generate an appropriate synthetic control
for the treated unit, casting doubts in those circumstance on the estimate generated by
CSSC. For the estimator of this paper, the parallel concern is that Assumption A2(a) may
not hold for all units. It should be noted that additive fixed effect models assume a stricter
version of Assumption A2(a). I employ a two-step procedure which places more weight on
units with “more appropriate” synthetic controls. The steps are as follows:
1. For each state i,
⎧
⎫
⎨ ⎬
i ≡ min
bi φji Yjt − Djt
Ω
Yit − Dit bi −
bi ,φi ∈Wi ⎩
⎭
t
j=i
In the first step, I allow the parameters (bi ) to vary by state. The purpose of the first
step is simply to estimate a measure of fit by state. If a unit or units do not have valid
synthetic controls, then these may affect the first-step estimate of α0 and the measure
of fit for all states.6 Instead, I estimate the sum of squared errors by state so that
the units without valid synthetic controls do not impact the variance estimate for the
other units.
−1 .
2. Let Ai ≡ Ω
i
3. Minimize the weighted sum of squared residuals:
(α̂, ŵ1 , . . . , ŵN ) =
6
It is possible that α0 is not identified when using only a single unit. However, this will not cause
problems in estimating the variance for each unit.
17
⎧
⎫
⎨ ⎬
argmin
b φji Yjt − Djt
Yit − Dit b −
Ai ⎭
b,φ1 ∈W1 ,...,wN ∈WN ⎩
i
t
(8)
j=i
This method will reduce the weight on units with poor synthetic controls. I employ
this two-step method in the simulations.7
3.2.3
Discussion
The primary motivation of GSC is to extend synthetic control estimation to applications with
multiple treatment variables, multiple adopters, non-binary treatments, etc. It is instructive
to discuss the usefulness of GSC in more basic cases, including those in which CSSC is
typically used, such as applications with multiple adopters of the same policy or even with
case studies when conditioning on additional covariates is important for identification.
It is common to apply the CSSC estimator to applications where multiple units adopt
the same treatment. For each treated unit, a separate treatment effect is estimated. However,
there is no accepted method for aggregating the multiple estimates. GSC aggregates these
estimates naturally by estimating a single parameter across all treated units.8
Furthermore, when multiple units are treated, it is unclear how to construct the
donor pools for each treated unit. For example, imagine a policy which all 50 states adopt
over the sample period but at different times. The donor pool is empty under the requirement
that the donor pool consists only of untreated states. With a fixed effects model, it is
unnecessary to separate the states into ever-treated states and never-treated states. With
CSSC, however, this distinction is required. Donohue et al. (2015) studies right-to-carry
7
The weighting procedure makes little difference in the final estimates of the minimum wage application
(see Table 3). Though not shown, the simulation results (Tables 1 and 2) were also similar with and without
the second step of the two-step approach.
8
If unit-specific estimates are desired, they are still available in this framework by interacting the treatment
variables with unit-specific indicators.
18
(RTC) laws. Only 9 states had never passed RTC legislation by 2012 and the authors
hypothesize that is too few to construct proper synthetic controls for each adopting state
and to conduct appropriate inference. To expand the donor pool, Donohue et al. (2015)
include some adopting states in the donor pool based on the time of adoption. Similarly,
Billmeier and Nannicini (2013) expands the donor pool for their application by restricting
the analysis to countries with a sufficient number of countries which were non-treated within
10 years of the treated country (meeting other restrictions as well).
The synthetic control estimator introduced above parallels models with additive
fixed effects. Treated units are (potentially) also used as controls when appropriate. Dividing
units into treatment units and control units is unnecessary. The GSC estimator combines
the benefits of CSSC (interactive fixed effects) with the benefits of traditional panel data
models (using all states as possible controls).
Moreover, a major appeal of CSSC is the ability to construct a control unit with the
same pre-treatment pattern of the outcome variable. Abadie and Gardeazabal (2003) and
Abadie et al. (2010) model the matching procedure to generate the synthetic control weights
as a function of pre-treated outcomes and covariates. However, Kaul et al. (2015) points
out that if each pre-treatment outcome is used for the matching, then the covariates will
have no effect on the synthetic control and effectively drop out. Kaul et al. (2015) proposes
that only one pre-treatment outcome or the average pre-treatment value of the outcome
variable should be used. This approach undermines a primary motivation for using CSSC.
The introduced framework allows for the outcome to be a function of multiple variables
and, consequently, it is possible to create a synthetic control for the treated unit while also
accounting for other time-varying covariates. These covariates are not used to create the
synthetic control but can have their own independent effects on the outcome. Again, this
generalizes traditional additive fixed effect models.
19
3.3
Identification
Identification of α0 holds under the assumptions listed above. These assumptions are similar
to those for many panel data estimators with the exception that the treatment variables must
have independent variation relative to the synthetic controls (i.e., relative to the weighted
average of the other units). In an additive fixed effect model, the corresponding assumption
is that the treatment variables must have independent variation relative to the unweighted
average of the other units.
j
Theorem 3.1 (Identification). If A1-A3 hold, then E Yit − Dit b − j=i wi Yjt − Djt b has a unique minimum and b = α0 at this minimum.
I include a discussion in Appendix Section A.1. Theorem 3.1 only states that α0 is
unique. The weights are not necessarily unique but unique weights do not impact identification of the treatment effects.
3.4
Consistency
Throughout this paper, I consider the weights which generate the synthetic controls as nuisance parameters which are necessary for consistent estimation of α0 . To discuss properties of
the GSC estimates, however, it is helpful to make assumptions about the synthetic untreated
outcomes, which I denote Γit (b). For a given b, it is possible to estimate the corresponding
synthetic control weights for each unit. Define
Γit (b) ≡
N
wji (b) Yjt − Dit b ,
j=1
T
N
i
φj Yjt − Djt b .
where wi (b) = argmin
Yit − Dit b −
φi ∈Wi t=1 j=1
20
(9)
The wji (b) are defined by equation (9). These are the synthetic control weights if the treatment effects were equal to b. I discuss consistency for T → ∞ and assume that N is fixed.
Consistency of α0 requires additional assumptions on the data:
A4: (Yit , Dit ) stationary and ergodic for each i.
A5: Define Θ = b | Yit − Dit b
≤ Yit for all i, t and assume Θ is compact with
α0 in the interior of Θ.
A6: Yit < ∞.
A7: Γit (b) continuous in b and twice differentiable.
A4 permits dependence within each unit. Other dependence structures would also
generate similar theoretical results. I impose no assumptions about independence or dependence across units. The inference procedure will allow for dependence across units and
arbitrary dependence within-units. A5 assumes that the parameter space is compact. It
makes a further restriction that Θ only include b such that Yit − Dit b
≤ Yit . This
parameter set is not empty since it holds for b = 0. This assumption makes the discussion
of consistency and asymptotic normality more straightforward. A6 states that the norm
of the outcome is finite. A7 disallows large jumps in the value of the synthetic control for
small changes in the treatment parameters. This assumption does not rule out large changes
in the weights for any synthetic control as b changes. Given that the weights are not necessarily unique, we might expect that small changes in the parameters (b) may generate
large changes in the synthetic control weights. However, this does not imply large changes
in Γit (b). Given that Yit − Dit b is continuous in b and the synthetic control is trying to fit
this term, assuming continuity of Γit (b) is likely unrestrictive.
Under these assumptions, uniform convergence of the objective function holds (Theorem 2 in Jennrich (1969)) and, consequently, the estimates are consistent by Theorem 2.1
21
of Newey and McFadden (1994):
p
−→ α0 .
Theorem 3.2 (Consistency). If A1-A7 hold, then α
See Appendix A.1 for a more detailed discussion.
3.5
Asymptotic Normality
The proposed inference procedure uses the gradient of the objective function and relies on
the gradient converging to a normally-distributed random variable. This requires asymptotic
normality of the α0 estimates. This section focuses primarily on showing that the estimate
√
converges at a T rate or faster. As before, I assume fixed N . Because I make no assumptions
about dependence across units, it is only possible to say that the estimate converges at least
√
as fast as T .
Define the gradient for unit i as
T
1
gi (b) ≡ −
E
T t=1
∂Γit (b) Yit − Dit b − Γit (b)
Dit −
∂b
Define
T
1
Ĥi (b) ≡
T t=1
(b)
(b)
∂Γ
∂Γ
∂ 2 Γit (b) it
it
Yit − Dit b − Γit (b) + Dit −
.
Dit −
∂b∂b
∂b
∂b
The Hessian (H), evaluated at α0 , is
!
H≡E
∂Γit (α0 )
Dit −
∂b
"!
I impose additional regularity conditions:
22
∂Γit (α0 )
Dit −
∂b
"
,
A8: H full rank.
A9: Σ ≡ limT →∞ Var
#1 N
i
$
gi (α0 ) exists.
1 ∂ 2 Γit (b) A10: There exist constants C1 and C2 such that T t ∂b∂b Yit − Dit b − Γit (b) ≤
it (b)
it (b)
≤ C2 < ∞, for all i and b ∈ Θ.
Dit − ∂Γ∂b
C1 < ∞ and T1 t Dit − ∂Γ∂b
A8 is related to A3 but includes independence of the gradient of Γit (b). A9 (with
A4) implies that a central limit theorem holds related to the gradient function. A10 is
necessary for uniform convergence of the Hessian matrix.
Theorem 3.3 (Asymptotic Normality). If A1-A10 hold, then
√
d
− α0 ) −→ N (0, H −1 ΣH −1 )
M (α
for some M ≥ T .
I include a discussion of this theorem in Appendix A.1. Again, I make no assumptions on independence across units so can only show that the estimates converge at a rate
√
√
T or faster, though they potentially converge at rate N T . Estimation of Σ is likely
complicated when N is fixed and correlations across units exists. Estimation of H is also
likely complicated. These concerns motivate the inference procedure discussed below. The
inference procedure will not require estimation of H and will account for dependence across
units.
3.6
Inference
I discuss an inference procedure that is valid for fixed N as T → ∞. Furthermore, the
procedure permits proper inference even if the clusters are correlated. This feature is important in this context and, more generally, whenever a small number of units are (implicitly
or explicitly) used as controls for other units because this approach generates a mechanical
correlation across clusters. For the synthetic control estimator of this paper, the estima-
23
tor constructs a synthetic control for each unit through a weighted (constrained) average of
using all other units. Consequently, each unit composes part of the synthetic conα
Yjt −Djt
trols for other units, generating correlations across units. This section follows the inference
procedure developed in Powell (2015) for panel data estimators more generally.
The method simulates the distribution of a test statistic using weights generated by
the Rademacher distribution which is equal to 1 with probability
1
.
2
1
2
and -1 with probability
The method uses the panel nature of the data to estimate the relationship between the
gradient functions for each cluster and isolates the independent part of the gradient for
each cluster. Once the independent functions are isolated, appropriate inference is possible
through simulation.
For inference, this paper considers a null hypothesis of the form
H0 : a(b) = 0.
(10)
This framework allows for nonlinear hypotheses and follows the notation used in Newey and
West (1987). I assume that null hypothesis only involves b and not the synthetic control
weights. The null hypothesis will be tested by imposing the restriction when minimizing the
objective function:
(α̃, w̃1 , . . . , w̃N ) =
⎫
⎧
⎨ ⎬
argmin
φji Yjt − Djt
b subject to a(b) = 0.
Yit − Dit b −
⎭
b,φ1 ∈W1 ,...,φN ∈WN ⎩
i
t
j=i
(11)
In words, the null hypothesis is imposed and the other parameters (including the synthetic
24
control weights) are estimated. The inference procedure will use the gradient with respect
to α for each unit i with the null hypothesis imposed, which I denote
gi (α̃, w̃i ) =
1
T
T
t=1
⎡
⎤⎡
⎤
⎣Dit −
α̃ ⎦ .
w̃ji Djt ⎦ ⎣Yit − Dit α̃ −
w̃ji Yjt − Djt
j=i
(12)
j=i
gi (α̃, w̃i ) is a K × 1 vector. The inference procedure introduced in Powell (2015), uses only
a subset of these elements (H ≤ K), defined by the conditions below. I denote the function
that will be used in the inference method by si (α̃, w̃i ) and introduce conditions that these
functions must meet.
I1 (Identification): There exists H × 1 vector si (·) such that
E si (α̃, w̃i ) = (q1 , . . . , qH )
where
E si (α̃, w̃i ) = 0
if
a(b) = 0
qh = 0
if
a(b) = 0
for all h
Assumption I1 is an identification assumption that is met by at least one element of gi (α̃, w̃i )
under Theorem 3.1. This condition states that only some elements of the gradient will be
used. Some elements of gi (α̃, w̃i ) will equal 0 even when the null hypothesis is not true and
these conditions are not useful for inference. Equation (12) also excludes the gradient of the
objective function with respect to the synthetic weights.
Under the null hypothesis and Theorem 3.3, we know that
√
d
T si (α̃, w̃) −→ N (0, Ṽi )
for some Ṽi .9 Given arbitrary within-unit dependence, estimation of Ṽi is difficult. Instead,
I perturb the si functions given weights meeting the following conditions.
I2 (Weights): Assume i.i.d weights {Wi }N
i=1 independent of si (α̃, w̃i ) such that for all i
a. E[Wi ] = 0.
9
This result follows from Theorem 3.3 and continuity of the gradient under A7.
25
b. E[Wi2 ] = 1.
c. E |Wi |r < Δ < ∞ for all i.
Given these weights, it is possible to simulate the distribution of si (α̃, w̃i ) for each i.
I use the Rademacher distribution which has the advantage that E[Wi3 ] = 0 and E[Wi4 ] = 1
such that using this distribution is able to asymptotically match symmetric distributions
(such as the normal distribution) and provide asymptotic refinement.10 Condition I2c is
necessary to ensure that a CLT holds for the weighted functions. Finally, I assume that the
gradient functions across clusters are asymptotically independent:
#
$ p
I3 (Asymptotic Independence): Cov si (α̃, w̃i ), sj (α̃, w̃j ) −−−−→ 0 for all i = j as T → ∞.
This assumption is likely not met by the gradient function for GSC. However, it is possible
to construct a function that meets this condition using the gradients. Before I discuss that
step, I include a result from Powell (2015):11
Lemma 3.1. Assume a(b) = 0 and that A1-A10, I1-I3 hold with T → ∞. Then,
⎛
⎜ s1 (α̃, w̃)
√ ⎜
..
T⎜
.
⎜
⎝
sN (α̃, w̃)
⎞
⎞
⎛
⎟
⎜ Ṽ1
⎟ d
⎜
..
⎟ −−−−→ N ⎜0,
.
⎟
⎜
⎠
⎝
⎛
⎜ W1 s1 (α̃, w̃)
√ ⎜
..
T⎜
.
⎜
⎝
WN sN (α̃, w̃)
0 ⎟
⎟
⎟
⎟
⎠
ṼN
0
⎛
⎞
⎞
⎜ Ṽ1
⎟
⎜
⎟ d
...
⎟ −−−−→ N ⎜0,
⎜
⎟
⎝
⎠
0
0 ⎟
⎟
⎟
⎟
⎠
ṼN
The importance of Lemma 3.1 is that both vectors converge to the same distribution.
Because E[Wi Wj ] = 0, the convergence to the same distribution can only occur if si (α̃, w̃i )
10
If the number of units is less than 10, Powell (2015) recommends using the weights introduced in Webb
(2013).
11
Lemma 2.2 in Powell (2015).
26
and sj (α̃, w̃j ) are asymptotically uncorrelated since any correlation would not be preserved
by the weights. For inference, the method uses a Wald Statistic defined by:
⎛
⎞
⎞
N
N
1
α̃, w̃)−1 ⎝ 1
si (α̃, w̃i )⎠ Σ(
si (α̃, w̃i )⎠ ,
S=⎝
N i=1
N i=1
⎛
α̃, w̃) =
where Σ(
1
N −1
N
i=1
!
si (α̃, w̃i ) −
1
N
N
j=1
sj (α̃, w̃j )
" !
(13)
si (α̃, w̃i ) −
1
N
N
j=1
sj (α̃, w̃j )
This approach does not require estimation of Ṽ , permitting arbitrary within-cluster correlations. Instead, it uses the variance of si (α̃, w̃i ) across clusters.
(d)
To simulate the distribution of the test statistic, consider weights Wi
which satisfy
condition I3 and construct a simulated test statistic, indexed by (d):
⎛
⎞
⎞
N
N
1
(d)
(d)
(d) (α̃, w̃)−1 ⎝ 1
=⎝
Wi si (α̃, w̃i )⎠ Σ
W si (α̃, w̃i )⎠ ,
N i=1
N i=1 i
⎛
S (d)
(14)
⎛
⎛
⎞⎞ ⎛
⎛
⎞⎞
N
N
N
1
1
1
⎜ (d)
⎟ ⎜ (d)
⎟
(d)
(d)
(d) (α̃, w̃) =
where Σ
W sj (α̃, w̃j )⎠⎠ ⎝Wi si (α̃, w̃i ) − ⎝
W sj (α̃, w̃j )⎠⎠ .
⎝Wi si (α̃, w̃i ) − ⎝
N − 1 i=1
N j=1 j
N j=1 j
The scores are perturbed such that the test statistic can be simulated without
re-estimating the model. Under the null hypothesis, si (α̃, w̃i ) should be centered around
zero such that large values (in magnitude) suggest that the null hypothesis is incorrect.
(d)
Wi si (α̃, w̃i ) is centered around zero whether the null hypothesis is true or not. Consequently, when the null hypothesis is not true, the large value of S should be rare in the
distribution of S (d) .
Theorem 3.4. Assume a(b) = 0 and that A1-A10, I1-I3 hold with T → ∞. Then, S and
S (d) converge to the same distribution.
Theorem 3.4 follows from Lemma 3.1 and the continuous mapping theorem. See
27
"
.
Powell (2015) for more details.
3.6.1
Adjusting for Correlations
The above results assume asymptotic independence across clusters. It is necessary to construct asymptotically independent functions for each unit using the gradient. I model each
element of the gradient (denoted by k) for unit i as correlated with the gradient for all other
units by
gitk (α, w̃i ) =
L
aik Mtk
,
with E[Mtk
] = 0 for all ,
˜
˜ (15)
E[Mt Mtk
] = 0 for = .
=1
Given that no restrictions are placed on aik , this assumption is not restrictive in terms of
constraining the variance of git . Furthermore, no restrictions are placed on the relationship
between Mt and Ms for all s and t, permitting arbitrary within-cluster correlations. Instead,
this assumption rules out that lags or leads of the M random variables independently affect
a unit without proportionately affecting other units. This condition is met if it and js are
uncorrelated for i = j or t = s. It is straightforward to extend this approach to allow for such
leads and lag, though this extension is not pursued here.12 This setup permits correlations
driven by the synthetic control approach. Furthermore, it allows for arbitrary within-time
correlations across units which are determined by outside (non-estimation) factors such as
common shocks. States may have correlated gradients because they experience similar shocks
and trends due to geographic proximity, similar political institutions, comparable industry
composition, etc. The inference approach will account for correlations regardless of the
underlying mechanisms.
I rewrite the above setup in a triangular framework such that gitk is only a function of
12
See Powell (2015) for more about this extension.
28
gi+1,tk , . . . , gN,tk . This is a simple rearrangement of equation (15) and imposes no additional
assumptions.
gitk (α̃, w̃i ) =
N
cijk gjtk (α̃, w̃i ) + μitk ,
(16)
j=i+1
where μitk is uncorrelated with μjtk for all i = j. The above setup separates the gradi
i
ent of each unit into a dependent component ( N
j=i+1 cjk gjtk (α̃, w̃i )) and an independent
component (μitk ). Using OLS, estimate ĉijk in equation (16) for all i, j, k. Then,
⎛
Cov ⎝gitk −
N
j=i+1
ĉijk gjtk ,
N
gmtk −
⎞
⎠ −−−−→ 0 for all i = m.
ĉm
jk gjtk
p
(17)
j=m+1
Estimation of equation (16) allows for the correlations across clusters to be estimated. It
is then possible to create independent functions for each cluster. Given asymptotically
uncorrelated functions, Theorem 3.4 holds. Alternatively, it is also possible to estimate
⎛
gitk (α̃, w̃i ) = cik ⎝
N
⎞
gjtk ⎠ + μitk .
(18)
j=i+1
The cijk terms are not parameters of interest so it is possible to aggregate them into one
estimate. The inference steps are as follows:
1. Estimate the constrained parameters using equation (11).
2. Create the gradient for each unit using equation (12) and meeting assumption I1.
3. Estimate equation (16) (or equation (18)) for each unit and form the residual. These
are the si functions.
4. Create the test statistic defined by equation (13).
5. Simulate the test statistic using Rademacher weights 999 times (equation (14)).
29
6. The p-value is equal to the number of times that S is greater than the simulated value
(d)
).
of S: p̂ = D1 D
d=1 1(S > S
Large values of S imply that we should rarely observe the distribution of gradients created
by the null hypothesis and that we should reject it. By Theorem 3.4, we can simulate how
often we should see a value as large as S if the null hypothesis were true.
4
Simulations
In this section, I report the results of simulations using the synthetic control estimator of this
paper and the more traditional additive fixed effects estimator. I generate data with two sets
of interactive fixed effects. First, I generate data with a continuous treatment effect which is
a function of the interactive fixed effects. Second, I create a standard difference-in-differences
situation where the treatment is represented by a dummy variable.
4.1
Continuous Treatment Effect
The data are generated by
(1) (1)
(2) (2)
Treatment Variable: dit = φit + μi λt + 2μi λt
(1) (1)
(2) (2)
Outcome: yit = 2μi λt + μi λt + it
(1)
(2)
where μi , μi
(1)
∼ U (0, 1); λt
(2)
∼ N (0, 400); λt
∼ N (0, 100); φit ∼ U (0, 50); it ∼ N (0, 1).
Both the outcome and the treatment variable are functions of the interactive fixed effects.
Not accounting for these interactive fixed effects should lead to biased estimates. I generate
30
the above data for N = 30 and T = 30. The treatment effect is zero, and the treatment
variable is continuous. In the top part of Table 1, I report Mean Bias, Median Absolute
Deviation, and Root-Mean-Square Error (RMSE). The first estimator is an additive fixed
effects estimator which includes both state and time fixed effects. The second estimator
is the synthetic control estimator of this paper. The fixed effects estimator, as expected,
performs poorly. The synthetic control estimator performs well.
In the bottom half of Table 1, I test the inference procedure discussed in Section
3.6. For the additive fixed effects estimator, I adjust the standard errors for clustering at the
state level and use a t-distribution with 29 degrees of freedom. The fixed effects estimator
always rejects the null hypothesis (α0 = 0). The inference procedure of Section 3.6 using the
synthetic control estimates works well at multiple significance levels.
Table 1: Simulation Results: Continuous Treatment Variable
Fixed Effects
Synthetic Control
Fixed Effects
Synthetic Control
Mean
Bias
Median
Absolute Deviation
Root Mean
Squared Error
0.2988
0.0011
Rejection Rate
Significance Level = 0.10
1.000
0.099
0.2966
0.0000
Rejection Rate
Significance Level = 0.05
1.000
0.055
0.3055
0.0061
Rejection Rate
Significance Level = 0.01
1.000
0.009
The “fixed effects” estimator refers to a model with additive state and time fixed effects. For
inference, standard errors are adjusted for clustering at the state level and a t-distribution
with 29 degrees of freedom is used. The “synthetic control” estimator is introduced in this
paper. The inference procedure is discussed in Section 3.6.
4.2
Difference-in-Differences
I generate similar data as above but use a discrete treatment variable. This treatment
variable is equal to 0 for all units initially. Once the treatment is adopted, the unit never
repeals the treatment. Finally, all units adopt treatment by period t = 28. It would be
31
difficult to use CSSC to estimate the relevant parameter with these data since it is not clear
which states to consider “treated” and which to include as part of the donor pool for those
treated units. The data are generated by
(1) (1)
(2) (2)
Define ait = 1 φit + μi λt + 2μi λt > 45
⎧
⎪
⎪
⎪
0 if t = 1
⎪
⎪
⎪
⎨
Treatment Variable: dit = 1 if di,t−1 = 1 or ait = 1 or t ≥ 28
⎪
⎪
⎪
⎪
⎪
⎪
⎩0 otherwise
(1) (1)
(2) (2)
Outcome: yit = 2μi λt + μi λt + it
(1)
(2)
where μi , μi
(1)
∼ U (0, 1); λt
(2)
∼ N (0, 400); λt
∼ N (0, 100); φit ∼ U (0, 50); it ∼ N (0, 1).
This data generating process produces a serially-correlated treatment dummy variable equal
to 1 about 50% of the time. The simulation results are presented in Table 2. The additive
fixed effects estimator performs especially poorly as one would expect given the correlation
between the (omitted) interactive fixed effects and the treatment variable. The synthetic
control estimator performs well. It also rejects at approximately the correct rates.
5
Employment Effects of Minimum Wage
Recent work has suggested that the traditional empirical strategy in the minimum wage
literature – using all states as controls for states with increasing minimum wages – is inappropriate. The synthetic control estimator of this paper will empirically generate an
appropriate control for each state. I first replicate findings using the additive fixed effect
32
Table 2: Simulation Results: Difference-in-Differences
Fixed Effects
Synthetic Control
Fixed Effects
Synthetic Control
Mean
Bias
Median
Absolute Deviation
Root Mean
Squared Error
2.0639
-0.0013
Rejection Rate
Significance Level = 0.10
0.628
0.089
2.6580
0.0200
Rejection Rate
Significance Level = 0.05
0.567
0.052
4.2160
0.0277
Rejection Rate
Significance Level = 0.01
0.386
0.005
The “fixed effects” estimator refers to a model with additive state and time fixed effects. For
inference, standard errors are adjusted for clustering at the state level and a t-distribution
with 29 degrees of freedom is used. The “synthetic control” estimator is introduced in this
paper. The inference procedure is discussed in Section 3.6.
specification typically estimated in the literature:
ln Est = αs + γt + β ln(M Wst ) + st
where Est is the employment rate in state s, quarter t for 16-19 year olds. The log of
the employment rate is modeled as a function of state fixed effects, time fixed effects, and
the log of the minimum wage (M W ). I do not control for the overall state unemployment
rate or the relative size of the youth population, though these controls are typical in the
literature. These controls are potentially problematic if they are also affected by changes in
the minimum wage and are often included because of concerns that minimum wage policy
reacts to state economic conditions. An advantage of the synthetic control estimator is that
it should make inclusion of these control variables unnecessary.
I use the data set constructed for Dube and Zipperer (2015) and Allegretto et al.
(2015).13 This allows for straightforward comparisons of estimates generated by the synthetic control approach and fixed effects estimation. The data are state-level minimum wage
13
Code and data can be found at Arindrajit Dube’s site: http://arindube.com/working-papers/, accessed
January 15, 2016.
33
information merged with employment rates aggregated from the Current Population Survey
(CPS) for 1979-2014. The data are quarterly and aggregated by state. The outcome of
interest is the employment rate of the 16-19 population. There are 51 units and 144 time
periods.
Table 3 replicates results found in Table 2 of Allegretto et al. (2015) without conditioning on the control variables mentioned above. Columns (1)-(6) show that the estimate
varies between -0.243 and 0.083 depending on the inclusion and flexibility of state-specific
trends. Columns (7)-(12) include division-time fixed effects such that identification originates from minimum wage changes within a Census division. The estimates vary between
-0.228 and 0.125. In general, the estimates appear sensitive to the inclusion of state-specific
trends and choice of control group. GSC should account for these considerations.
I report the main empirical results of this paper in Columns (13) and (14) of Table
3. In column (13), I report GSC estimates which weight all states equally, regardless of the
fit of the synthetic control generated for each state. I estimate an elasticity of -0.45, larger
in magnitude than estimates typically found in the literature. In Section 3.2.2, I discussed
the merits of a two-step procedure which places more weight on states with better synthetic
controls. Such a two-step procedure could also be employed for fixed effect estimators but
this is not typically done in applied work. Column (14) reports the estimates using the
two-step procedure. I estimate an elasticity of -0.44, implying that a 10% increase in the
minimum wage decreases the teen employment rate by 4.4%. This estimate is statistically
significant from zero at the 1% level. These estimates are larger in magnitude than those
typically generated by additive fixed effect models, suggesting that those models produce
estimates that are biased against finding a relationship between the minimum wage and
employment.
34
Table 3: Effect of Minimum Wage on Teen Employment
Outcome:
ln(Minimum Wage)
State Fixed Effects
Time Fixed Effects
State Trends
Division-specific Time Effects
N
ln(Minimum Wage)
State Fixed Effects
Time Fixed Effects
State Trends
Division-specific Time Effects
N
ln(Minimum Wage)
Two-Step Weighting?
N
ln(Employment / Population)
Additive State and Time Fixed Effects
(2)
(3)
(4)
(5)
0.044
0.053
0.083
0.017
[-0.120, 0.208]
[-0.139, 0.245]
[-0.097, 0.262]
[-0.174, 0.209]
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Linear
Quadratic
Cubic
Quartic
No
No
No
No
7344
7344
7344
7344
Additive State and Division-Time Fixed Effects
(7)
(8)
(9)
(10)
(11)
-0.228*
0.106
0.125
0.105
0.077
[-0.469, 0.013]
[-0.056, 0.267]
[-0.105, 0.356]
[-0.070, 0.281]
[-0.099, 0.252]
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
No
Linear
Quadratic
Cubic
Quartic
Yes
Yes
Yes
Yes
Yes
7344
7344
7344
7344
7344
Synthetic Control Estimates
(13)
(14)
-0.45**
-0.44***
[-1.19, -0.02]
[-0.77, -0.02]
No
Yes
7344
7344
(1)
-0.243***
[-0.406, -0.080]
Yes
Yes
No
No
7344
(6)
0.026
[-0.166, 0.218]
Yes
Yes
Quintic
No
7344
(12)
0.084
[-0.098, 0.266]
Yes
Yes
Quintic
Yes
7344
Notes: Significance Levels: *10%, **5%, ***1%. 95% Confidence Intervals in brackets are adjusted for clustering at statelevel. Hypothesis testing for synthetic control estimator use simulation technique discussed in text. Inference for the
synthetic control estimates accounts for within-state clustering and cross-sectional correlations. 2-step weighting uses the
estimation procedure discussed in Section 3.2.2.
35
6
Discussion and Conclusion
This paper introduces a synthetic control estimator which generalizes the case study synthetic control estimator of Abadie and Gardeazabal (2003) and Abadie et al. (2010). It
also generalizes traditional additive fixed effect models. The estimator jointly estimates
the parameters associated with the treatment variables while creating synthetic controls for
each unit. These synthetic controls permit the treatment variables to co-vary with flexible
state-specific trends.
The estimator works well in simulations. I provide evidence of the usefulness of the
estimator by estimating the effect of the minimum wage on teenage employment rates. The
recent literature on this topic has noted that choice of appropriate control groups for states
increasing their minimum wages is essential for consistent estimation. The estimator of this
paper creates these control groups empirically instead of assuming that all other states are
appropriate as additive fixed effect estimator assume. The estimator in this paper should be
useful more broadly for applications using panel data.
36
A
Appendix
A.1
Properties of Estimator
j
Theorem 3.1 (Identification). If A1-A3 hold, then E Yit − Dit b − j=i wi Yjt − Djt b has a unique minimum and b = α0 at this minimum.
Proof.
⎡
⎤
E ⎣Yit − Dit b −
b ⎦
wij Yjt − Djt
j=i
is minimized for b, wi ∈ Wi such that
Dit b
+
j=i
wij
conditional expectation of Yit . I first show that Dit α0 +
Djt
b
is equal to the
Yjt −
j
is the
j=i wi Yjt − Djt α0
expected value of Yit .
⎡
E ⎣Yit − Dit α0 −
⎡
⎛
⎞
wij
⎤
j
w i Dj ⎦
Yjt − Djt α0 Di −
j=i
j=i
⎤
j
j j ⎥
⎢
wi μj ⎠ + it −
wi jt Di −
w i Dj ⎦
= E ⎣λt ⎝μi −
j=i
j=i
by A1
j=i
= 0 by A2.
The above implies that
⎡
⎤
⎤
⎡
E ⎣Yit Di −
wij Dj ⎦ = E ⎣Dit α0 +
wij Yjt − Djt α0 Di −
wij Dj ⎦
j=i
⎡⎛
j=i
j=i
⎤
j
j j ⎥
⎢
= E ⎣⎝Dit −
wi Djt ⎠ α0 +
wi Yjt Di −
w i Dj ⎦
j=i
37
⎞
j=i
j=i
By assumption A3, we know that this is unique.
p
−→ α0 .
Theorem 3.2 (Consistency). If A1-A7 hold, then α
Proof. By the triangle inequality,
j wi Yjt − Djt
wij Yjt − Djt
b ≤ Yit − Dit b + b Yit − Dit b −
j=i
j=i
j
≤ Yit + wi Yjt By A5
j=i
≤ Yit + max Yjt By definition of Wi
j
< ∞ By A6
This shows that we can find a function d(Y ), with E[d(Y )] < ∞, which dominates
b for all b. Given that the parameter set is compact by
Yit − Dit b − j=i wij Yjt − Djt
b is continassumption, identification of α0 holds, and Yit − Dit b − j=i wij Yjt − Djt
uous (by A7), then uniform convergence of the objective function to its expectation holds
(Theorem 2 in Jennrich (1969)).
Consistency of α̂ follows immediately from Theorem 2.1 of Newey and McFadden
(1994).
Theorem 3.3 (Asymptotic Normality). If A1-A10 hold, then
for some M ≥ T .
1. Consistency holds by Theorem 3.2.
2. α0 is in the interior of Θ by A5.
38
√
d
− α0 ) −→ N (0, H −1 ΣH −1 )
M (α
3. The objective function is twice differentiable by A7.
, d
4. By condition A9, M
i gi (b) −→ N (0, Σ).
N
Finally, by A10, a dominance condition holds for Ĥit (b) for all b and, consequently,
Ĥi (b) uniformly converges to H(b) by Theorem 2 in Jennrich (1969). By A8, H(α0 ) is
nonsingular. Under these conditions, Theorem 3.1 of Newey and McFadden (1994) holds,
proving the theorem.
39
References
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller, “Synthetic control methods for comparative case studies: Estimating the effect of Californias tobacco control
program,” Journal of the American Statistical Association, 2010, 105 (490).
, , and , “Comparative politics and the synthetic control method,” American Journal
of Political Science, 2014.
and Javier Gardeazabal, “The economic costs of conflict: A case study of the Basque
Country,” American Economic Review, 2003, pp. 113–132.
Allegretto, Sylvia A, Arindrajit Dube, and Michael Reich, “Do minimum wages
really reduce teen employment? Accounting for heterogeneity and selectivity in state panel
data,” Industrial Relations: A Journal of Economy and Society, 2011, 50 (2), 205–240.
Allegretto, Sylvia, Arindrajit Dube, Michael Reich, and Ben Zipperer, “Credible Research Designs for Minimum Wage Studies: A Response to Neumark, Salas, and
Wascher,” Unpublished manuscript, 2015.
Ando, Michihito, “Dreams of urbanization: Quantitative case studies on the local impacts of nuclear power facilities using the synthetic control method,” Journal of Urban
Economics, 2015, 85, 68–85.
and Fredrik Sävje, “Hypothesis Testing with the Synthetic Control Method,” 2013.
Bai, Jushan, “Panel data models with interactive fixed effects,” Econometrica, 2009, 77
(4), 1229–1279.
Bauhoff, Sebastian, “The effect of school district nutrition policies on dietary intake and
overweight: A synthetic control approach,” Economics & Human Biology, 2014, 12, 45–55.
Bester, C Alan, Timothy G Conley, and Christian B Hansen, “Inference with
dependent data using cluster covariance estimators,” Journal of Econometrics, 2011, 165
(2), 137–151.
Billmeier, Andreas and Tommaso Nannicini, “Assessing economic liberalization
episodes: A synthetic control approach,” Review of Economics and Statistics, 2013, 95
(3), 983–1001.
Cavallo, Eduardo, Sebastian Galiani, Ilan Noy, and Juan Pantano, “Catastrophic
natural disasters and economic growth,” Review of Economics and Statistics, 2013, 95 (5),
1549–1561.
Cunningham, Scott and Manisha Shah, “Decriminalizing indoor prostitution: Implications for sexual violence and public health,” Technical Report, National Bureau of Economic Research 2014.
40
Donohue, John, Abhay Aneja, and Kyle D. Weber, “Do Handguns Make Us Safer?
A State-Level Synthetic Controls Analysis of Right-to-Carry Laws,” 2015.
Dube, Arindrajit and Ben Zipperer, “Pooling Multiple Case Studies Using Synthetic
Controls: An Application to Minimum Wage Policies,” 2015.
, T William Lester, and Michael Reich, “Minimum wage effects across state borders:
Estimates using contiguous counties,” Review of Economics and Statistics, 2010, 92 (4),
945–964.
Eren, Ozkan and Serkan Ozbeklik, “What Do Right-to-Work Laws Do? Evidence from
a Synthetic Control Method Analysis,” Journal of Policy Analysis and Management, 2015.
Fletcher, Jason M, David E Frisvold, and Nathan Tefft, “Non-linear effects of soda
taxes on consumption and weight outcomes,” Health Economics, 2014.
Gobillon, Laurent and Thierry Magnac, “Regional policy evaluation: Interactive fixed
effects and synthetic controls,” Review of Economics and Statistics, forthcoming.
Hansen, Christian B, “Asymptotic properties of a robust variance matrix estimator for
panel data when T is large,” Journal of Econometrics, 2007, 141 (2), 597–620.
Hinrichs, Peter, “The effects of affirmative action bans on college enrollment, educational
attainment, and the demographic composition of universities,” Review of Economics and
Statistics, 2012, 94 (3), 712–722.
Ibragimov, Rustam and Ulrich K Müller, “t-Statistic based correlation and heterogeneity robust inference,” Journal of Business & Economic Statistics, 2010, 28 (4), 453–468.
Jennrich, Robert I, “Asymptotic properties of non-linear least squares estimators,” The
Annals of Mathematical Statistics, 1969, pp. 633–643.
Kaul, Ashok, Stefan Klößner, Gregor Pfeifer, and Manuel Schieler, “Synthetic
Control Methods: Never Use All Pre-Intervention Outcomes as Economic Predictors,”
2015.
Mideksa, Torben K, “The economic impact of natural resources,” Journal of Environmental Economics and Management, 2013, 65 (2), 277–289.
Munasib, Abdul and Dan S Rickman, “Regional economic impacts of the shale gas and
tight oil boom: A synthetic control analysis,” Regional Science and Urban Economics,
2015, 50, 1–17.
and Mouhcine Guettabi, “Florida Stand Your Ground Law and Crime: Did It Make
Floridians More Trigger Happy?,” Available at SSRN 2315295, 2013.
Neumark, David, JM Ian Salas, and William Wascher, “More on recent evidence on
the effects of minimum wages in the United States,” IZA Journal of Labor Policy, 2014,
3 (1), 24.
41
, , and , “Revisiting the Minimum Wage–Employment Debate: Throwing Out the
Baby with the Bathwater?,” Industrial & Labor Relations Review, 2014, 67 (3 suppl),
608–648.
Newey, Whitney K. and Daniel McFadden, “Large sample estimation and hypothesis
testing,” Handbook of Econometrics, 1994, 4, 2111–2245.
Newey, Whitney K and Kenneth D West, “Hypothesis testing with efficient method
of moments estimation,” International Economic Review, 1987, pp. 777–787.
Nonnemaker, James, Mark Engelen, and Daniel Shive, “Are methamphetamine precursor control laws effective tools to fight the methamphetamine epidemic?,” Health Economics, 2011, 20 (5), 519–531.
Powell, David, “Inference with Correlated Clusters,” 2015.
, Rosalie Liccardo Pacula, and Mireille Jacobson, “Do Medical Marijuana Laws
Reduce Addictions and Deaths Related to Pain Killers?,” Working Paper 21345, National
Bureau of Economic Research July 2015.
Robbins, Michael, Jessica Saunders, and Beau Kilmer, “A Framework for Synthetic
Control Methods with High-Dimensional, Micro-Level Data,” 2015.
Sabia, Joseph J, Richard V Burkhauser, and Benjamin Hansen, “Are the effects
of minimum wage increases always small? New evidence from a case study of New York
state,” Industrial & Labor Relations Review, 2012, 65 (2), 350–376.
Saunders, Jessica, Russell Lundberg, Anthony A Braga, Greg Ridgeway, and
Jeremy Miles, “A Synthetic Control Approach to Evaluating Place-Based Crime Interventions,” Journal of Quantitative Criminology, 2014, pp. 1–22.
Torsvik, Gaute and Kjell Vaage, “Gatekeeping versus monitoring: Evidence from a case
with extended self-reporting of sickness absence,” 2014.
Webb, Matthew D, “Reworking wild bootstrap based inference for clustered errors,” Technical Report, Queen’s Economics Department Working Paper 2013.
42
Download