Massachusetts Institute of Technology
Department of Economics
Working Paper Series

HOW MUCH SHOULD WE TRUST DIFFERENCES-IN-DIFFERENCES ESTIMATES?

Marianne Bertrand
Esther Duflo
Sendhil Mullainathan

Working Paper 01-34
October 2001

Room E52-251
50 Memorial Drive
Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://papers.ssrn.com/abstract_id=xxxxx
How Much Should We Trust Differences-in-Differences Estimates?

Marianne Bertrand
Esther Duflo
Sendhil Mullainathan*

This Version: October 2001
Abstract

Most Difference-in-Difference (DD) papers rely on many years of data and focus on serially correlated outcomes. Yet almost all these papers ignore the bias in the estimated standard errors that serial correlation introduces. This is especially troubling because the independent variable of interest in DD estimation (e.g. the passage of a law) is itself very serially correlated, which will exacerbate the bias in standard errors. To illustrate the severity of this issue, we randomly generate placebo laws in state-level data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its "effect" as well as the standard error for this estimate. The standard errors are severely biased: with about 20 years of data, DD estimation finds an "effect" significant at the 5% level for up to 45% of the placebo laws. Two very simple techniques can solve this problem for large sample sizes. The first technique consists in collapsing the data and ignoring the time-series variation altogether; the second technique is to estimate standard errors while allowing for an arbitrary covariance structure between time periods. We also suggest a third technique, based on randomization inference testing methods, which works well irrespective of sample size. This technique uses the empirical distribution of estimated effects for placebo laws to form the test distribution.

* University of Chicago Graduate School of Business, NBER and CEPR; MIT, NBER and CEPR; MIT and NBER. We thank Alberto Abadie, Daron Acemoglu, Joshua Angrist, Abhijit Banerjee, Victor Chernozhukov, Kei Hirano, Guido Imbens, Larry Katz, Jeffrey Kling, Kevin Lang, Steve Levitt, Kevin Murphy, Emmanuel Saez, Doug Staiger, Bob Topel and seminar participants at Harvard, MIT, University of Chicago GSB, University of California at Los Angeles, University of California Santa Barbara, and University of Texas at Austin for many helpful comments. Tobias Adrian, Shawn Cole, and Francesco Franzoni provided excellent research assistance. We are especially grateful to Khaled for motivating us to write this paper. E-mail: marianne.bertrand@gsb.uchicago.edu; eduflo@mit.edu; mullain@mit.edu.
1 Introduction

Difference-in-Difference (DD) estimation has become an increasingly popular way to estimate causal relationships. DD estimation consists of identifying a specific intervention or treatment (often the passage of a law). One then compares the difference in outcomes after and before the intervention for groups affected by it to this difference for unaffected groups. For example, to identify the incentive effects of social insurance, one might first isolate states that have raised unemployment insurance benefits. One would then compare changes in unemployment duration for residents of states raising benefits to residents of states not raising benefits. The great appeal of DD estimation comes from its simplicity as well as its potential to circumvent many of the endogeneity problems that typically arise when making comparisons between heterogeneous individuals.1

Obviously, DD estimation also has its drawbacks. Most of the debate around the validity of a DD estimate revolves around the possible endogeneity of the laws or interventions themselves.2 Sensitive to this concern, researchers have developed a set of informal techniques to gauge the extent of the endogeneity problem.3 In this paper, we address an altogether different problem with DD estimation. We assume away biases in estimating the intervention's effect and instead focus on possible biases in estimating the standard error around this effect.

DD estimates and standard errors for these estimates most often derive from using Ordinary Least Squares (OLS) in repeated cross-sections (or a panel) of data on individuals in treatment and control groups for several years before and after a specific intervention.4 Formally, let Y_ist be the outcome of interest for individual i in group s (such as a state) at time t and T_st be a dummy for whether the intervention has affected group s at time t. One then typically estimates the following regression using OLS:

Y_ist = A_s + B_t + c X_ist + β T_st + ε_ist    (1)

where A_s and B_t are fixed effects for the states and years and X_ist represents the relevant individual controls. The estimated impact of the intervention is then the OLS estimate β̂. Standard errors around that estimate come from the OLS estimation of equation 1, after accounting for the correlation of shocks within each state-year (or s-t) cell.5

In this paper, we argue that the estimation of equation 1 is in practice subject to a possibly severe serial correlation problem. While serial correlation is well-understood, it has been largely ignored by researchers using DD estimation. Three factors make serial correlation an especially important issue in the DD context. First, DD estimation usually relies on fairly long time series. Our survey of DD papers, which we discuss below, finds an average of 16.5 periods. Second, the most commonly used dependent variables in DD estimation are typically highly positively serially correlated. Third, an intrinsic aspect of the DD model is that the treatment variable T_st changes very little within a state over time. These three factors reinforce each other to create potentially large mis-measurement in the standard errors coming from the OLS estimation of equation 1.

To assess the extent of this bias, we examine how DD performs on placebo laws, where state and year of passage are chosen at random. Since these laws are fictitious, a significant effect at the 5 percent level should be found only 5% of the time. In fact, we find dramatically higher rejection rates of the null hypothesis of no effect. For example, using female wages as a dependent variable (from the Current Population Survey) and covering 21 years of data, we find a significant "effect" at the 5% level in as much as 45% of the simulations.6

We propose three different techniques to solve the serial correlation problem.7 The first two techniques are very simple and work well for sufficiently large samples. First, one can remove the time-series dimension by aggregating the data into two periods: pre- and post-intervention. Second, one can allow for an arbitrary covariance structure over time within each state. Both of these solutions work well when the number of groups is large (e.g. 50 states) but fare poorly as the number of groups gets small. We propose a third (and preferred) solution which works well irrespective of sample size. This solution, based on the randomization inference tests used in the statistics literature, uses the distribution of estimated effects for placebo laws to form the test statistic.

The remainder of this paper proceeds as follows. In section 2, we assess the potential relevance of the auto-correlation problem: section 2.1 reviews how failing to take it into account will result in biased standard errors, and section 2.2 surveys existing DD papers to assess how it affects them. Section 3 examines how DD performs on placebo laws. Section 4 describes possible solutions. Section 5 discusses implications for the existing literature. We conclude in Section 6.

1 See Meyer (1994) for an overview.
2 See Besley and Case (1994). Another prominent concern has been whether DD estimation ever isolates a specific behavioral parameter. See Heckman (1996) and Blundell and MaCurdy (1999). Abadie (2000) discusses how well the control groups serve as a control.
3 Such techniques include the inclusion of pre-existing trends in states passing a law, testing for an "effect" of the law before it takes effect, or using information on political parties to instrument for passage of the law (Besley and Case 1994).
4 For simplicity of exposition, we will often refer to interventions as laws, groups as states and time periods as years in what follows. Of course this discussion generalizes to other types of DD estimates.
5 This correction accounts for the presence of a common random effect at the state-year cell level. For example, economic shocks may affect all individuals in a state on an annual basis (Moulton 1990; Donald and Lang 2001). Ignoring this grouped data problem can lead to an under-statement of the standard error. In most of what follows, we will assume that the researchers estimating equation 1 have already accounted for this problem, either by allowing for appropriate random group effects or, as we do, by collapsing the data to a higher level of aggregation, such as state-year cells.
6 Similar magnitudes arise in data manufactured to match the CPS distributions and where we can be absolutely sure that the placebo laws are not by chance picking up a real intervention.
7 Other techniques fare poorly. Simple parametric corrections which estimate specific processes (such as an AR(1)) fare poorly because even long time series (by DD standards) are too short to allow precise estimation of the auto-correlation parameters and to identify the right assumption about the auto-correlation process. Block bootstrap fails because the number of groups (e.g. 50 states) is not large enough.

2 Auto-correlation and Standard Errors

2.1 Review
It will be useful to quickly review exactly why serial correlation poses a problem for OLS estimation. Consider the OLS estimation of equation 1 and denote V the vector of independent variables and a the vector of parameters. Assume that the error term e has E[e] = 0 and E[ee'] = Ω. The true variance of the OLS estimate is:

var(â) = (V'V)^{-1} (V' Ω V) (V'V)^{-1}    (2)

while the OLS estimate of the variance is given by:

est. var(â) = σ̂_e^2 (V'V)^{-1}    (3)

To more easily compare these expressions, let's consider a simple uni-variate time-series case in which we regress y_t on v_t with T periods of data. Suppose that the error term u_t follows an AR(1) process with auto-correlation parameter ρ and that the independent variable v_t itself follows an AR(1) process with auto-correlation parameter λ. In this special case, equations 2 and 3 can be simplified to:

var(â) ≈ (σ_u^2 / Σ_{t=1}^{T} v_t^2) · (1 + 2ρλ/(1 − ρλ))

and

est. var(â) ≈ σ_u^2 / Σ_{t=1}^{T} v_t^2

As T → ∞, the ratio of estimated to true variance equals (1 − ρλ)/(1 + ρλ).

These formulas make transparent three well known facts about how serial correlation biases OLS estimates of standard errors. First, positive serial correlation in the error term (ρ > 0) will cause an under-statement of the standard error while negative serial correlation will cause an over-statement. This is intuitive: positive serial correlation means that there is less information in each new year of data than OLS assumes. Second, the magnitude of the bias also depends on how serially correlated the independent variable is. In fact, when the independent variable is not serially correlated (λ = 0), there is no bias in the estimated standard errors. This second point is important in the DD context, where the intervention variable T_st is in fact very serially correlated. Indeed, for affected states, the intervention variable typically equals 0 period after period until one year where it turns to 1 and then stays at 1. In other words, the variable "whether a law was passed by state s by time t" varies little over time within a state, thereby exacerbating any serial correlation in the dependent variable. Finally, the magnitude of the problem depends on the length of the time series (T). All else held constant, as T increases, the bias in the OLS estimates of the standard errors worsens.
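The following Python sketch (not part of the original paper) illustrates these facts by Monte Carlo: it simulates a univariate regression with an AR(1) error and a persistent regressor and compares the conventional OLS standard error to the true sampling variability of the estimate. All parameter values (rho, lam, T) are illustrative assumptions.

```python
import numpy as np

# Illustrative Monte Carlo: OLS standard errors with an AR(1) error and a
# persistent regressor (rho and lam are assumed values, not from the paper).
rng = np.random.default_rng(0)
T, rho, lam, n_sim = 21, 0.8, 0.9, 2000

def ar1(T, phi, rng):
    """Generate a length-T AR(1) series with unit innovation variance."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

betas, ols_ses = [], []
for _ in range(n_sim):
    v = ar1(T, lam, rng)          # serially correlated regressor
    u = ar1(T, rho, rng)          # serially correlated error
    y = 0.0 * v + u               # true coefficient is zero
    X = np.column_stack([np.ones(T), v])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    sigma2 = resid @ resid / (T - 2)
    cov_ols = sigma2 * np.linalg.inv(X.T @ X)   # conventional OLS formula (eq. 3)
    betas.append(b[1])
    ols_ses.append(np.sqrt(cov_ols[1, 1]))

print("true sd of beta-hat :", np.std(betas))
print("mean OLS std. error :", np.mean(ols_ses))  # typically much smaller
```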
2.2 A Survey of DD Papers

This quick review suggests that the relevance of a serial correlation problem for existing DD papers depends on three factors: (1) the typical length of the time series used; (2) the serial correlation of the most commonly used dependent variables; and (3) whether any procedures are used to correct for serial correlation. Since these factors are inherently empirical, we collected data on published DD papers. We identified all DD papers published in 6 journals between 1990 and 2000: the American Economic Review, the Industrial and Labor Relations Review, the Journal of Political Economy, the Journal of Public Economics, the Journal of Labor Economics, and the Quarterly Journal of Economics. We classified a paper as "DD" if it met the following two criteria. First, the paper must focus on specific interventions. For example, we would not classify a paper that regressed wages on unemployment as a DD paper (even though it might suffer from serial correlation issues as well). Second, the paper must use units unaffected by the law as a control. We found 92 such papers. For each of these papers, we determined the number of time periods in the study, the nature of the dependent variable, and the technique(s) used to estimate standard errors.

Table 1 summarizes the results of this exercise. We start with the lengths of the time series. Sixty-nine of the 92 DD papers used more than two periods of data. Four of these papers began with more than two periods but collapsed the data into two effective periods: before and after. Table 1 reports the distribution of time periods for the remaining 65 papers.8 The average number of periods used is 16.5 and the median is 11. More than 75% of the papers use more than 5 periods of data.9 As we will see in the simulations below, lengths such as these are more than enough to cause serious under-estimation of the standard errors.

The most commonly used dependent variables in DD estimation are employment and wages.10 Eighteen papers study employment and thirteen study wages. Other labor market variables, such as retirement and unemployment, also receive significant attention, as do health outcomes. Most of these variables are clearly highly auto-correlated. To cite an example, Blanchard and Katz (1992) in their survey of regional fluctuations find strong persistence in shocks to state employment, wages and unemployment. It is interesting to note that first-differenced variables, which might have a tendency to exhibit negative auto-correlation (and thereby over-state standard errors), are quite uncommon. In short, the bulk of DD papers focus on outcomes which are likely positively serially correlated.

How do these 65 papers correct for serial correlation? The vast majority of papers do not address the problem at all. Only five of the papers explicitly deal with it. Of these five, four use a parametric AR(k) GLS-based correction. As we will see later, this correction does very little in the way of adjusting standard errors. The fifth allows for an arbitrary variance-covariance matrix within each state, one of the solutions we suggest in Section 4.

Two additional points are worth noting. First, 80 of the original 92 DD papers have a potential problem with grouped error terms, as the unit of observation is more detailed than the level of the variation.11 Only 36 of these papers address this problem, either by clustering standard errors or by aggregating the data. Second, several informal techniques are used for dealing with the possible endogeneity of the intervention variable. For example, three papers include a lagged dependent variable in equation 1, seven include a trend specifically for treated states, 15 plot graphs of some form to examine the dynamics of the treatment effect, 3 examine whether there is an effect before the law, two see if the effect is persistent, and eleven formally attempt triple-differences (DDD) by finding another control group. We return to the issue of how these informal techniques interact with the serial correlation problem in Section 5.

In summary, our review suggests that serial correlation is likely an important problem for existing DD papers and that this problem has been poorly addressed to date.

8 When a paper used several data sets with different time spans, we only recorded the shortest span.
9 A period here is whatever unit of time is used. The very long time series in the data, such as the 51 or 83 at the 95 and 99 percentile, arise because several papers use monthly or quarterly data.
10 When a paper studies more than one variable, we record all the variables used.
11 For example, the effect of state level laws is studied using individual level data.

3 Over-Rejection in DD Estimation

While the survey above shows that most DD papers are likely to report under-estimated standard errors, it does not tell us how serious the problem is in practice. To assess magnitudes, we turn to a specific data set, a sample of women's wages from the Current Population Survey (CPS).12 We extract data on women in their fourth interview month in the Merged Outgoing Rotation Group of the CPS for the years 1979 to 1999. We focus on all women between 25 and 50 years old. We extract information on weekly earnings, employment status, education, age, and state of residence. The sample contains nearly 900,000 observations. We define wage as log(weekly earnings). Of the 900,000 women in the original sample, approximately 300,000 report strictly positive weekly earnings. This generates (50*21=1050) state-year cells, with each cell containing on average a little less than 300 women with positive weekly earnings.

The correlogram of the wage residuals is informative. We estimate the first, second and third auto-correlation coefficients of the residuals from a regression of the logarithm of wages on state and year dummies (the relevant residuals since DD includes these dummies). The auto-correlation coefficients are obtained by a simple regression of the residuals on the corresponding lagged residuals. We are therefore imposing common auto-correlation parameters for all states. The estimated first order auto-correlation coefficient is 0.51, and is strongly significant. The second and third order auto-correlation coefficients are high as well (.44 and .33 respectively), and decline much less rapidly than we would expect if the residual was following an AR(1) process.13

12 The CPS is one of the most commonly used data sets in the DD literature.
13 Solon (1984) points out that in panel data, when the number of time periods is fixed, the estimates of the auto-correlation coefficients obtained using a simple OLS regression are biased. Using Solon's generalization of Nickell's (1981) formula for the bias, the first order auto-correlation coefficient of 0.51 we estimate in the wage data, with 21 time periods, would correspond to a true auto-correlation coefficient of 0.6 if the data generating process were an AR(1). However, Solon's formulas also imply that the second and third order auto-correlation coefficients would be much smaller than the coefficients we observe if the true data generating process were an AR(1) process with an auto-correlation coefficient of 0.6. To match the estimated second and third order auto-correlation parameters, the data would have to follow an AR(1) process with an auto-correlation coefficient of 0.8.
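The residual correlogram described above can be computed as in the minimal Python sketch below (not from the paper). The state-year panel here is synthetic, standing in for the CPS aggregates, and the AR(1) parameter used to build it is an assumption for illustration only.

```python
import numpy as np

# Sketch of the residual correlogram computation, on synthetic state-year data.
rng = np.random.default_rng(1)
n_states, n_years = 50, 21

# Synthetic log-wage panel: state effects + year effects + AR(1) noise (assumed rho = 0.8).
state_fe = rng.normal(size=n_states)
year_fe = rng.normal(size=n_years)
eps = np.zeros((n_states, n_years))
for t in range(1, n_years):
    eps[:, t] = 0.8 * eps[:, t - 1] + rng.normal(scale=0.5, size=n_states)
y = state_fe[:, None] + year_fe[None, :] + eps

# Residuals from a regression on state and year dummies are, in this balanced
# panel, simply the two-way demeaned outcomes.
resid = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + y.mean()

# k-th order auto-correlation: regress residuals on their k-year lag, pooling
# all states, i.e. imposing a common coefficient across states as in the text.
for k in (1, 2, 3):
    r_now = resid[:, k:].ravel()
    r_lag = resid[:, :-k].ravel()
    rho_k = (r_lag @ r_now) / (r_lag @ r_lag)   # slope of a no-intercept regression
    print(f"order-{k} auto-correlation: {rho_k:.2f}")
```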
3.1 Placebo Interventions

To quantify the bias induced by serial correlation in the DD context, we first randomly generate placebo laws which affect some states and not others. First, we draw a year at random from a uniform distribution between 1985 and 1995.14 Second, we select exactly half the states (25) at random and designate them as "affected" by the law (even though the law does not actually have an effect). The intervention variable T_st is then defined as a dummy variable which equals 1 for all women that live in an affected state after the intervention date, and 0 otherwise. We can then estimate DD (equation 1) using OLS on these placebo laws. The estimation generates an estimate of the laws' "effect" and a corresponding standard error. To understand how well DD performs, we can repeat this exercise a large number of times, each time drawing new laws at random. If DD provides an appropriate estimate for the standard error, we would expect to reject the null hypothesis of no effect (β = 0) exactly 5% of the time when we use a threshold of 1.96 for the t-statistic.15

This exercise tells us about Type I error. Note that a small variant also allows us to assess Type II error, or power. After constructing the placebo intervention, T_st, we can replace the outcome by the outcome plus T_st times whichever effect we wish to simulate. For example, we can replace log(weekly earnings) by log(weekly earnings) plus T_st * .02 to generate a true .02 log point (approximately 2%) effect of the intervention.16 By repeatedly estimating DD in this data (with new laws randomly drawn each time), we can assess how often DD finds an effect when there actually is one.

14 We choose to limit the intervention date to the 1985-1995 period to ensure having enough observations prior and post intervention.
15 One might argue that the rejection rate could be higher than 5% as we might accidentally be capturing some real interventions with the randomization procedure. However, as we will show later, the same exercise on data manufactured to track the variance structure of the CPS also produces rejection rates similar to those in the CPS.
16 The 2% effect was chosen so that there is sufficient power in the data sets we study.
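The placebo-law exercise just described is sketched below in Python (not part of the original paper). It runs on a synthetic state-year panel rather than the CPS micro data, and the error process and all parameter values are assumptions chosen for illustration; the point is only to show the mechanics of drawing placebo laws and counting rejections under conventional OLS standard errors.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, years = 50, np.arange(1979, 2000)   # 21 years, as in the CPS sample
T = len(years)

def simulate_panel():
    """Synthetic state-year outcome with state/year effects and AR(1) errors (assumed rho = 0.8)."""
    eps = np.zeros((n_states, T))
    for t in range(1, T):
        eps[:, t] = 0.8 * eps[:, t - 1] + rng.normal(scale=0.2, size=n_states)
    return rng.normal(size=n_states)[:, None] + rng.normal(size=T)[None, :] + eps

def dd_t_stat(y, treat):
    """OLS of y on state FE, year FE and the placebo law dummy; conventional t-statistic."""
    s = np.repeat(np.arange(n_states), T)
    t = np.tile(np.arange(T), n_states)
    X = np.column_stack([np.eye(n_states)[s], np.eye(T)[t, 1:], treat.ravel()])
    yv = y.ravel()
    XtX_inv = np.linalg.pinv(X.T @ X)
    b = XtX_inv @ X.T @ yv
    resid = yv - X @ b
    sigma2 = resid @ resid / (len(yv) - np.linalg.matrix_rank(X))
    se = np.sqrt(sigma2 * XtX_inv[-1, -1])
    return b[-1] / se

rejections, n_sim = 0, 200
for _ in range(n_sim):
    y = simulate_panel()
    passage = rng.integers(1985, 1996)               # placebo passage year, 1985-1995
    treated = rng.choice(n_states, 25, replace=False)
    treat = np.zeros((n_states, T))
    treat[np.ix_(treated, np.where(years >= passage)[0])] = 1.0
    rejections += abs(dd_t_stat(y, treat)) > 1.96
print("rejection rate for placebo laws:", rejections / n_sim)
```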
3.2 Basic Rejection Rates

The first row of Table 2 presents the result of this exercise when it is performed in the micro CPS data, as described above, without any correction for grouped error terms.17 For each of 200 independent draws of placebo laws, we re-estimate equation 1. The first column reports the fraction of simulations in which the absolute value of the t-statistic was greater than 1.96, i.e. the fraction of simulations where the null hypothesis of no intervention effect was rejected at the 5% level.18 Here, even though there is no true effect, the null of no effect is rejected a stunning 67.5% of the time. Thus, in this setup, DD is over-rejecting by a factor of thirteen. The second column of row 1 performs a similar exercise but on CPS data altered to contain a 2% effect (we added .02 * T_st to the data). We find that, in this case, we reject the null hypothesis of no effect in 85.5% of the cases.

One important reason for this gross over-rejection has been described by Donald and Lang (2001), who apply Moulton's (1990) general arguments to DD inference. The OLS estimation above does not account for correlation within state-year cells; it assumes that the variance matrix for the error term is diagonal, while in practice it might be block diagonal, with a constant correlation coefficient within each state-year cell. In other words, it does not allow for aggregate year-to-year shocks that affect all the observations within a state. To properly account for such shocks, one can assume that there is a random effect for each state-year cell in equation 1:

Y_ist = A_s + B_t + c X_ist + β T_st + ν_st + ε_ist    (4)

where ν_st are group-year random effects. The standard error can then be corrected for the correlation at the group-year level introduced by the random effect. As noted earlier, while 80 papers suffer from this problem, only 36 correct for it.

17 The control variables X_ist include 4 education dummies (less than high school, high school, some college, and college and more) and a quartic in age as controls.
18 We count only correct rejections, i.e. the number of DD estimates that are significant and positive. The average of the estimated coefficients β̂ was 0. Thus, while OLS underestimates the standard errors, the estimated coefficients are unbiased.
Row 2 reports rejection rates when we correct the standard errors for clustering at the state-year level, using the White (1984) correction, which allows for an arbitrary intra-group correlation matrix.19 We continue to find a 44% rejection rate.

In row 3, we take a more drastic approach to solve this problem. We aggregate the data into state-year cells to construct a panel of states over time. If the correlation within state-year cells were the only reason for over-rejection, aggregation ought to fully solve the problem, since it would make the variance-covariance matrix for the error term diagonal. To aggregate, we first regress log weekly earnings on the controls (education and age) and form residuals. We then compute means of these residuals by state and year. This leaves us with 50 times 21 state-year cells in the aggregate data. Using this aggregate data, we estimate:

Ȳ_st = α_s + γ_t + β T_st + ε̄_st    (5)

where the bar above the variables refers to the aggregation. Row 3 displays the results of multiple estimations of equation 5 for placebo laws. The rejection rate of the null hypothesis of no effect is almost as high here as when we use the micro data and allow for state-year random effects: in about 44% of the simulations, we reject the null hypothesis of no effect.20 These magnitudes suggest that failing to account for serial correlation when computing standard errors generates a dramatic bias.

19 Practically, this is usually implemented by the "cluster" command in STATA. We also applied the correction procedure suggested in Moulton (1990). That procedure allows for a random effect for each group, which puts structure on the intra-cluster correlation matrices and therefore may perform better in finite samples. This is especially true when the number of clusters is small (if in fact the assumption of a constant correlation is a good approximation). The rate of rejection of the null hypothesis of no effect was not statistically different under the Moulton technique, possibly reflecting the fact that the number of clusters is large in this application.
20 One might worry that the aggregation process, while it deals with the clustering problem, introduces heteroskedasticity in the data. However, the results in Table 2 do not change if we use standard heteroskedasticity-correction techniques.
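The aggregation step behind equation (5) can be sketched as follows (Python, not from the paper). The micro data below are synthetic stand-ins for the CPS extract, and the wage equation is an assumption; the sketch only shows residualizing on individual controls and averaging the residuals by state-year cell.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_states, n_years = 100_000, 50, 21
state = rng.integers(0, n_states, n_obs)
year = rng.integers(0, n_years, n_obs)
age = rng.integers(25, 51, n_obs).astype(float)
log_wage = 0.01 * age + rng.normal(size=n_obs)      # assumed micro wage equation

# Step 1: regress log wages on the controls (here just age) and form residuals.
X = np.column_stack([np.ones(n_obs), age])
b = np.linalg.lstsq(X, log_wage, rcond=None)[0]
resid = log_wage - X @ b

# Step 2: average the residuals within each state-year cell -> 50 x 21 panel
# that can be used as the outcome in equation (5).
cell_sum = np.zeros((n_states, n_years))
cell_n = np.zeros((n_states, n_years))
np.add.at(cell_sum, (state, year), resid)
np.add.at(cell_n, (state, year), 1.0)
ybar = cell_sum / cell_n
print(ybar.shape)   # (50, 21) state-year cells
```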
As we saw in section 2.1, the serial correlation of the intervention variable T_st is itself one important factor when computing standard errors in the DD context. In fact, we would expect the un-corrected estimates of the standard errors for the intervention variable to be consistent in any variation of the DD model where the intervention variable is not serially correlated. To illustrate this point, we construct a different type of intervention variable. As before, we randomly select half of the states to form the treatment group. However, instead of randomly choosing one date after which all the states in the treatment group are affected by the law, we randomly select 10 dates between 1979 and 1999. The intervention variable is now defined as 1 if the observation relates to a state that belongs to the treatment group at one of the 10 intervention dates, and 0 otherwise. In other words, the intervention variable is now repeatedly turned on and off, with its value yesterday telling us nothing about its value today, thereby eliminating the serial correlation in T_st. In row 4, we see that the null of no effect is now rejected only about 6% of the time in this case. Removing the serial correlation in the law removes the over-rejection. This strongly suggests that the bias in the standard errors is due to serial correlation, rather than other properties of the error terms.
In rows 5 and 6, we examine how rejection rates vary as we modify the number of affected states. When 12 states or 36 states are affected, the rejection rates are 39% and 43% respectively.

The placebo laws so far have been constructed in such a way that the intervention affects all treated states in the same year. In row 7, we create placebo laws such that the date of passage can differ across treated states. We randomly choose half of the states to form the treatment group but now randomly choose a passage date separately for each state (uniformly drawn between 1985 and 1995). T_st is still defined to equal 1 if state s has passed the law by time t, and 0 otherwise. The rejection rates continue to be high in this case, with the null of no effect being rejected about 35% of the time.

One might still worry at this point that factors other than serial correlation give rise to these large rejection rates. Perhaps we are by chance detecting actual laws (or other relatively discrete changes). Or perhaps other features of the wage data, such as state specific trends or other characteristics of the distribution of the error term, give rise to the over-rejection. To directly address all of these problems, we replicate our analysis in manufactured data. Specifically, we generate data whose variance structure, in terms of the relative contribution of state and year fixed effects, matches the empirical variance decomposition in the CPS. The data is normally distributed and follows an AR(1) process with an auto-correlation parameter ρ. By construction, we can therefore be sure that there are no ambient trends and that the laws truly have no effect. Yet, in row 8, where we assume that ρ equals .8, we find a rejection rate roughly equal to what we found in the CPS, 37%.
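A minimal sketch of how such "manufactured" data can be generated is given below (Python, not from the paper). The variance shares assigned to state effects, year effects and the AR(1) error are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

# Manufactured state-year data: state effects, year effects and an AR(1) error,
# with assumed variance shares standing in for the CPS-based decomposition.
rng = np.random.default_rng(4)
n_states, n_years, rho = 50, 21, 0.8
share_state, share_year, share_eps = 0.5, 0.1, 0.4   # assumed variance shares

state_fe = rng.normal(scale=np.sqrt(share_state), size=n_states)
year_fe = rng.normal(scale=np.sqrt(share_year), size=n_years)

eps = np.zeros((n_states, n_years))
innov_sd = np.sqrt(share_eps * (1 - rho ** 2))        # keeps var(eps_t) = share_eps
eps[:, 0] = rng.normal(scale=np.sqrt(share_eps), size=n_states)
for t in range(1, n_years):
    eps[:, t] = rho * eps[:, t - 1] + rng.normal(scale=innov_sd, size=n_states)

y = state_fe[:, None] + year_fe[None, :] + eps        # manufactured outcome panel
```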
3.3 Magnitude of Effects

These excess rejection rates are stunning, but they might not be particularly problematic if the estimated "effects" of the placebo laws are economically insignificant. We examine the magnitudes of the statistically significant intervention effects in Table 3. Using the aggregate data, we perform 200 independent simulations of equation 5 for placebo laws defined as in row 3 of Table 2. Table 3 reports the empirical distribution of the estimated effects β̂ when these effects are significant. Not surprisingly, this distribution appears quite symmetric: half of the false rejections are negative and half are positive. On average, the absolute value of the effects is roughly .02, which corresponds roughly to a 2 percent effect. Nearly 10% are larger than 3%. About 30% fall in the 2 to 3% range, and the remaining 60% fall in the 1% to 2% range.

These magnitudes are especially large considering that DD estimates are often represented as elasticities.21 For example, suppose that the law under study corresponds to a 5% increase in the child-care subsidy. An increase in log earnings of .02 would correspond to an elasticity of .4. Similarly, in many DD estimates, the affected group is often only a fraction of the sample, meaning a measured 2% effect on the full sample would indicate a much larger effect on the treated subsample.22

To summarize, our findings suggest that the standard errors from the OLS estimation of equation 1 grossly understate the true standard errors, even if one accounts for grouped data effects in the micro data. In other words, reporting simple DD estimates and their standard errors without accounting for serial correlation will generate many spurious results.23

21 DD estimates are often normalized using the magnitude of the change in the policy variable of interest.
22 For example, when studying the effects of changes in unemployment insurance benefits on job search, one would examine the full sample of the unemployed, not only those who actually took up the program. This use of all people eligible for a program rather than all recipients is often motivated by an attempt to avoid the endogeneity caused by selective take-up.
23 Mapping these findings to the existing literature on DD requires more care. Researchers often use informal techniques along with the mechanical DD procedure we have described. For example, to check for endogeneity, they will test whether a law appears to have an "effect" before it was passed. We discuss whether these informal techniques interact with the over-rejection rates in Section 5.
3.4 Varying Serial Correlation

How does over-rejection vary with the serial correlation in the dependent variable? We address this question in two different ways. First, we use other outcome variables in the CPS as left-hand side variables. Second, we experiment with various auto-correlation parameters in the manufactured data.
The first part of Table 4 estimates equation 5 on the employment rate, average weekly working hours and the change in log wages, as well as the original log weekly earnings. As before, the table displays the rejection rate of the null hypothesis of no effect of the intervention variable, both in raw data and in data altered to create a 2% effect of the intervention, in 200 independent simulations with random draws of the intervention. We also report in Table 4 estimates of the first, second and third order auto-correlation coefficients for each of these variables. As we see, the false rejection problem diminishes with the serial correlation in the dependent variable. As expected, when the estimate of the first-order auto-correlation is negative (as it is the case for the change in log wages), we find that the conventional standard errors tend to underestimate the precision of the estimated treatment effect.

The second part of Table 4 uses manufactured data. The error structure is designed to follow an AR(1) process. As before, the manufactured data has the same number of states and years as the CPS data and is constructed so that the relative importance of state effects, year effects and the error term in the total variance matches the variance composition of the wage data in the CPS. Not surprisingly, the false rejection rates increase with the auto-correlation parameter in the AR(1) process. There is exactly the right rejection rate when there is no auto-correlation. But even at moderate levels, such as a ρ of .2 or .4, rejection rates are already two to four times as large as they should be. As noted earlier, with an AR(1) parameter of 0.8, the rejection rate using the standard OLS formula is close to what we observe in the CPS data. And again, when the auto-correlation is negative, there is under-rejection.
3.5 Varying the Number of States and Time Periods

The stylized exercise above focused on data with 51 states and 21 time periods. Many DD papers use fewer states (or treated and control units), either because of data limitations or a desire to focus only on comparable controls. For similar reasons, several DD papers use fewer time periods. In Table 5, we examine how the over-rejection rate varies with these two important parameters. As before, we use the CPS as well as manufactured data to analyze these effects. We also examine rejection rates when we have added a treatment effect to the data.24

Rows 1-4 and 10-13 show that varying the number of states does not change the extent of over-rejection. Rows 5-9 and 14-18 vary the number of years. As expected, the extent of over-rejection falls as the time span gets shorter, but it does so at a surprisingly slow rate. For example, even with only 7 years of data, the over-rejection rate is nearly 16% in the CPS, three times too large. Around 70% of the DD papers in our survey use at least that many periods. With 5 years of data, the rejection rate varies between 8% (CPS data) and 17% (manufactured data). When T=50, the rejection rate rises to nearly 50% in the manufactured data.

24 In the CPS data, we vary the number of states by randomly selecting some states and discarding others. In both types of data, we continue to treat exactly half the states.

4 Solutions

4.1 Parametric Methods
A first natural solution to the serial correlation problem would be to specify the auto-correlation structure for the error term, estimate its parameters, and use equation (2) to estimate the true standard errors. We implement several variations of this basic correction method in Table 6.

Row 2 performs the simplest of these parametric corrections, wherein an AR(1) process is estimated in the data.25 This technique does little to solve the serial correlation problem: the rejection rate is still 34.5%. The failure here is in part due to the under-estimation of the auto-correlation coefficient. As is well understood, with short time-series the OLS estimation of the auto-correlation parameter is biased downwards. In the CPS data, OLS estimates a first-order auto-correlation coefficient of only 0.4. Similarly, in the manufactured data where we know that the auto-correlation parameter is .8, a ρ of .62 is estimated (row 3). If we impose a first-order autocorrelation of .8 rather than the estimated coefficient in the CPS data (row 7), the rejection rate goes down to 12.5%, a clear but only partial improvement.

In row 8, we establish an upper-bound on the power of any potential correction by examining how well the true variance-covariance matrix does. In other words, we use the variance-covariance matrix implied by an AR(1) process with a ρ of .8 to estimate standard errors. As expected, the rejection rate is now indistinguishable from 5% when there is no effect. More interestingly, the rejection rate is now 32.3% when there is a 2% effect. This will be a useful benchmark for other corrections.

Another problem with this parametric correction may be that we have not correctly specified the auto-correlation process. As noted earlier, an AR(1) does not fit the correlogram of wages in the CPS. In rows 9 and 10, we use the manufactured data to see the effect of such mis-specification of the autocorrelation process. In row 9, we generate data according to an AR(2) process with ρ1 = .55 and ρ2 = .35. These parameters were chosen because they match the first, second and third auto-correlation parameters in the wage data when we apply the formulas to correct for the small sample bias given in Solon (1984). We then correct the standard error assuming that the error term follows an AR(1) process. The rejection rate rises significantly with this mis-specification of the auto-correlation structure (30.5%).

In row 10, we use a process that provides an even better match of the time-series properties of the CPS data: the sum of an AR(1) (with auto-correlation parameter 0.95) and a white noise (the variance of the white noise is 13% of the total variance of the residual).26 When we correct the auto-correlation in this data by fitting an AR(1), we reject the null in about 39% of the cases, close to what we found for the CPS data in row 2. However, attempting to correct for the auto-correlation by specifying different processes does not look like a plausible option: it is clearly difficult to find the right process. In rows 4 and 5, we correct the standard errors in the CPS data by imposing the specific AR(2) and AR(1) plus white noise processes that we have seen match the CPS data fairly well. The rejection rates remain high.27

25 Computationally, we first estimate the first order auto-correlation coefficient of the residual by regressing the residual on its lag, and then use this estimated coefficient to form an estimate of the variance-covariance matrix of the residual. The matrix Ω (in equation 2) is block diagonal, with a matrix of the form implied by the estimated AR(1) process in each block. The results are the same whether or not we assume each state has its own auto-correlation parameter.
26 Note that an AR(1) plus white noise seems a priori a very reasonable process for the CPS data. Indeed, even if the wage follows an AR(1) in the population, the repeated cross-section in the CPS implies that a non-persistent sampling error is added to the error term each year.
27 Similar results hold if we estimate the parameters of the processes, as in row 2, rather than impose them.
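The simplest parametric correction described above (footnote 25) can be sketched as follows in Python (not from the paper): estimate the first-order auto-correlation of the residuals, build a block-diagonal AR(1) covariance matrix with one block per state, and plug it into the sandwich formula of equation (2). The panel inputs and fixed-effects design are assumed, as in the earlier sketches.

```python
import numpy as np

def ar1_corrected_se(y_panel, treat_panel):
    """Parametric AR(1) correction (sketch): rho estimated from residuals,
    block-diagonal AR(1) covariance plugged into the sandwich formula (eq. 2)."""
    n_states, T = y_panel.shape
    s = np.repeat(np.arange(n_states), T)
    t = np.tile(np.arange(T), n_states)
    X = np.column_stack([np.eye(n_states)[s], np.eye(T)[t, 1:], treat_panel.ravel()])
    yv = y_panel.ravel()
    XtX_inv = np.linalg.pinv(X.T @ X)
    b = XtX_inv @ X.T @ yv
    e = (yv - X @ b).reshape(n_states, T)

    # First-order auto-correlation of the residuals (regression on the lag).
    rho = (e[:, :-1].ravel() @ e[:, 1:].ravel()) / (e[:, :-1].ravel() @ e[:, :-1].ravel())
    sigma2 = e.var()

    # AR(1) covariance block for one state, repeated for every state.
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    block = sigma2 * rho ** lags
    meat = np.zeros((X.shape[1], X.shape[1]))
    for j in range(n_states):
        Xj = X[s == j]
        meat += Xj.T @ block @ Xj
    V = XtX_inv @ meat @ XtX_inv
    return b[-1], np.sqrt(V[-1, -1])
```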
4.2 Block Bootstrap

An alternative correction method with which we experiment is the block bootstrap (Efron and Tibshirani, 1994). Block bootstrap is a variant of the bootstrap which maintains the auto-correlation structure by keeping all the residuals of a state together. We implement block bootstrap as follows. We first estimate equation 1 and compute the residuals. This gives a vector of residuals ε_i for each state. We then draw for each state a new residual vector from this distribution (with replacement). Adding this residual back to the original predicted value gives us a new outcome variable Y_b. We then estimate equation 1 for this different outcome. By repeatedly sampling from the residual distribution and re-estimating, we obtain a sequence of β_b. The distribution of these parameters then gives us a test statistic.

The results of the block bootstrap estimation are reported in Table 7. They are not encouraging. The rejection rates are still high: 35% in the CPS data (row 2) and 29% in the manufactured data as well (row 3). The problem appears to be the small number of blocks or states. In row 4, when we allow for 400 states (and 200 states passing the law), block bootstrap delivers a close to correct rejection rate. Since very few applications in practice can rely on that many groups, block bootstrap does not appear to be a realistic solution to the serial correlation problem.
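A minimal sketch of the block bootstrap just described is given below (Python, not from the paper); it keeps each state's residual vector intact and resamples whole states with replacement. The panel inputs are assumed, as in the earlier sketches.

```python
import numpy as np

def block_bootstrap_betas(y_panel, treat_panel, n_boot=200, seed=0):
    """Block bootstrap (sketch): resample entire state residual vectors with
    replacement and re-estimate the DD coefficient on each bootstrap outcome."""
    rng = np.random.default_rng(seed)
    n_states, T = y_panel.shape
    s = np.repeat(np.arange(n_states), T)
    t = np.tile(np.arange(T), n_states)
    X = np.column_stack([np.eye(n_states)[s], np.eye(T)[t, 1:], treat_panel.ravel()])
    yv = y_panel.ravel()
    XtX_inv = np.linalg.pinv(X.T @ X)
    b = XtX_inv @ X.T @ yv
    fitted = (X @ b).reshape(n_states, T)
    resid = y_panel - fitted

    betas = []
    for _ in range(n_boot):
        draw = rng.integers(0, n_states, n_states)    # states drawn with replacement
        y_b = (fitted + resid[draw]).ravel()          # whole residual vectors kept together
        betas.append((XtX_inv @ X.T @ y_b)[-1])
    return np.array(betas)   # empirical distribution of beta used as test statistic
```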
4.3 Empirical Variance-Covariance Matrix

Both the most parametric and the most non-parametric methods seem to fail because of lack of data: there are not enough time periods to estimate the time series process and perform a parametric correction, and not enough states for block bootstrap. However, the techniques we tried above did not make use of the fact that we have a large number of states that can be used to estimate the auto-correlation process in a flexible way. Specifically, suppose that the auto-correlation process is the same in all states. In this case, if the data is sorted by states and (by decreasing order of) years, the variance-covariance matrix of the error term is block diagonal, with 50 identical blocks of size T by T (where T is the number of time periods). Each of these blocks is symmetric, and the element (i, i+j) is the correlation between e_i and e_{i-j}. We can therefore use the variation across the 50 states to estimate each element of this matrix, and use this estimated matrix to compute standard errors from equation 2. This is equivalent to treating the problem as a system of T seemingly unrelated equations (one for each year) estimated jointly, with 50 data points (one for each state) in each year.

We implement this technique in Table 8. The rejection rate is 7.75% in the CPS (row 2) and 10% in the manufactured data (row 9). The technique is not fully satisfactory, however. First, as the number of states drops, the rejection rates increase (rows 4, 6 and 8). A second obvious issue is that this method does not deliver consistent estimates of the standard error if the data generating process is not the same across all states. Finally, the technique has low power. In column 2 of row 2, we see that the rejection rate is only 8.5% when a true 2% effect is associated with the laws.
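The following Python sketch (not from the paper) shows one way to implement this correction: average the outer products of the state residual vectors across states to estimate the common T x T block, then plug it into the sandwich formula of equation (2). The panel inputs are assumed, as before.

```python
import numpy as np

def empirical_vcov_se(y_panel, treat_panel):
    """Empirical variance-covariance correction (sketch): a common T x T error
    covariance estimated from the cross-section of states, used in eq. (2)."""
    n_states, T = y_panel.shape
    s = np.repeat(np.arange(n_states), T)
    t = np.tile(np.arange(T), n_states)
    X = np.column_stack([np.eye(n_states)[s], np.eye(T)[t, 1:], treat_panel.ravel()])
    yv = y_panel.ravel()
    XtX_inv = np.linalg.pinv(X.T @ X)
    b = XtX_inv @ X.T @ yv
    e = (yv - X @ b).reshape(n_states, T)

    # Same auto-covariance structure assumed in every state: average e e' across
    # the states to estimate the T x T block.
    block = e.T @ e / n_states

    meat = np.zeros((X.shape[1], X.shape[1]))
    for j in range(n_states):
        Xj = X[s == j]
        meat += Xj.T @ block @ Xj
    V = XtX_inv @ meat @ XtX_inv
    return b[-1], np.sqrt(V[-1, -1])
```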
4.4 Arbitrary Variance-Covariance Matrix

This procedure can be generalized to an estimator of the variance-covariance matrix which is consistent in the presence of any correlation pattern within states over time. Of course, we cannot consistently estimate each element of the variance-covariance matrix Ω in this case, but we can use a generalized White-like formula to compute the standard errors.28 This estimator for the variance-covariance matrix is given by:

V = (X'X)^{-1} (Σ_{j=1}^{n_c} u_j' u_j) (X'X)^{-1}

where n_c is the total number of states, X is the matrix of independent variables and u_j is defined for each state to be:

u_j = Σ_{t=1}^{T} e_{jt} x_{jt}

where the summation is over all elements in the state, e_{jt} is the residual at time t (in that particular state) and x_{jt} is a row vector of independent variables (including the constant).29 This estimator of the variance-covariance matrix is consistent as the number of states tends to infinity.

The results for this estimation procedure are shown in Table 9. Despite its generality, the arbitrary variance-covariance matrix does quite well. The rejection rate in the unaltered CPS data is 6% (row 2, column 1). Moreover, the procedure has relatively high power compared to what we found in Table 8 (row 2, column 2). We can also see how well it does relative to the upper-bound. In the manufactured data, we saw in Tables 4 and 6 that, with the correct covariance matrix, the rejection rate in the case of a 2% effect was 78% in manufactured data with no auto-correlation and 32% in AR(1) data with ρ = .8. The arbitrary variance-covariance matrix comes nearly to these upper-bounds, achieving rejection rates of 74% and 27.5% respectively. Again, however, rejection rates increase significantly above 5% when the number of treated units declines (15% with 6 states, 8.5% with 10 states). This method, therefore, seems to work well when the number of groups is large enough.

28 This is analogous to applying the Newey-West correction (Newey and West 1987) in the panel context where we allow all lags to potentially be important.
29 This is implemented in a straightforward way by using the cluster command in STATA and choosing entire states (and not only state-year cells) as clusters.
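The generalized White formula above corresponds to clustering on entire states; a Python sketch writing out the sandwich explicitly is given below (not from the paper; the panel inputs are assumed, as in the earlier sketches).

```python
import numpy as np

def state_clustered_se(y_panel, treat_panel):
    """Arbitrary within-state covariance (sketch): cluster-robust variance with
    u_j = sum_t e_jt * x_jt summed within each state j."""
    n_states, T = y_panel.shape
    s = np.repeat(np.arange(n_states), T)
    t = np.tile(np.arange(T), n_states)
    X = np.column_stack([np.eye(n_states)[s], np.eye(T)[t, 1:], treat_panel.ravel()])
    yv = y_panel.ravel()
    XtX_inv = np.linalg.pinv(X.T @ X)
    b = XtX_inv @ X.T @ yv
    e = yv - X @ b

    meat = np.zeros((X.shape[1], X.shape[1]))
    for j in range(n_states):
        u_j = X[s == j].T @ e[s == j]        # u_j = sum over t of e_jt * x_jt
        meat += np.outer(u_j, u_j)
    V = XtX_inv @ meat @ XtX_inv
    return b[-1], np.sqrt(V[-1, -1])
```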
4.5 Ignoring Time Series Information

Another possible solution to the serial correlation problem is to simply ignore the time series component in the estimation and when computing the standard errors. To do this, one could simply average the data before and after the law and run equation 1 on this averaged outcome variable as a panel of length 2.30 The results of this exercise are reported in Table 10. The rejection rates are now approximately correct, at 6%. Moreover, the power of .295 in row 2 is quite high relative to other techniques.

Taken literally, however, this solution will work only for laws that are passed at the same time for all the treated states. If states pass the law at different times, "before" and "after" are no longer the same and are not even defined for states that never pass the law. One can however modify the technique in the following way. First, one regresses Y_st on state fixed effects, year dummies, and any relevant covariates. One then divides the residuals of the treatment states into two groups: residuals from years before the law, and residuals from years after the law. The estimate of the law's effect and its standard error can then come from an OLS regression of this two-period panel on an after dummy. This procedure does as well as the simple aggregation (row 3 vs. row 2) for laws that are all passed at the same time. It also does well when the laws are staggered over time (row 4).

The downside of these procedures (both raw and residual aggregation) is that they do poorly when the number of states is small. With 20 states, residual aggregation has a rejection rate of 9.5%, nearly twice too large. With only 6 states, the rejection rate is as high as 31.5%.31 The extent of over-rejection in fact appears to increase faster in this case than in the clustering method presented above.

30 One could still use the time-series data for specification checks.
31 At these small samples (as small as 12), the normal approximation to the t-distribution will clearly not work. But this alone is not causing the over-rejection since, for small degrees of freedom, a threshold of 1.96 should not produce this high of a rejection rate. For example, for 6 degrees of freedom, a 1.96 threshold should only produce roughly a 10% rejection rate. Donald and Lang (2001) discuss inference in small-sample aggregated data sets.
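The residual-aggregation variant described above (for staggered laws) is sketched in Python below (not from the paper). The inputs, in particular the per-state passage year, are assumptions of the sketch; two-way demeaning of the balanced panel plays the role of the regression on state and year dummies.

```python
import numpy as np

def residual_aggregation_test(y_panel, treat_panel, passage_year):
    """Residual aggregation (sketch): residualize on state and year effects,
    average each treated state's residuals before and after its law, and run
    OLS of the stacked two-period panel on an 'after' dummy."""
    n_states, T = y_panel.shape
    resid = (y_panel - y_panel.mean(1, keepdims=True)
             - y_panel.mean(0, keepdims=True) + y_panel.mean())
    treated = np.where(treat_panel.any(axis=1))[0]

    pre, post = [], []
    for j in treated:
        cut = passage_year[j]                 # index of the law year in state j
        pre.append(resid[j, :cut].mean())
        post.append(resid[j, cut:].mean())
    pre, post = np.array(pre), np.array(post)

    yy = np.concatenate([pre, post])
    after = np.concatenate([np.zeros(len(pre)), np.ones(len(post))])
    Xd = np.column_stack([np.ones(len(yy)), after])
    bd = np.linalg.lstsq(Xd, yy, rcond=None)[0]
    res = yy - Xd @ bd
    sigma2 = res @ res / (len(yy) - 2)
    V = sigma2 * np.linalg.inv(Xd.T @ Xd)
    return bd[1], np.sqrt(V[1, 1])            # estimate and its standard error
```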
4.6 Randomization Inference

We have so far isolated two procedures that appear to do fairly well when the number of states is sufficiently large. In this section, we present an estimation technique that does well irrespective of the sample size. The principle behind the test we propose is simple, and motivated by the simulation exercises carried out throughout this paper. To compute the standard error for a specific experiment, we propose to compute DD estimates for a large number of randomly generated placebo laws and to use the empirical distribution of the estimated effects for these placebo laws to form a significance test for the true law.32

This test is closely related to the randomization inference test (or Fisher's exact test) discussed in the statistical literature (see Rosenbaum (1996) for an overview of this test and its applications). Before laying out the test in more detail, it will be useful to give a simple, motivating illustration of the way randomization inference works. Suppose that a flu shot was administered to a random group of individuals and that we have outcome data (such as sick days) on individuals, half of whom have received the flu shot. Define Y_i to be the outcome and T_i to be the indicator for vaccination, both for individual i. Suppose that we make the hypothesis that the treatment effect is constant: Y_i = a_i + θ T_i. To estimate the effect of the shot, we could take the difference in mean outcomes between those receiving the shot and those not receiving it. Call this estimator θ̂(T_i, Y_i). How can we test hypotheses about this parameter? One would use the OLS estimate of the standard error, but this rests on strong assumptions, such as homoskedasticity, normal distribution of the error and independence between observations. Instead, to test whether the coefficient is different from 0, one can use the fact that under the null of no effect, people in the treatment and control groups are statistically the same. One can, therefore, generate a set of placebo interventions T̃_i and estimate the "effect" of these placebos. Repeated estimation will give us a distribution of effects for such placebos. We can then observe where the original estimate θ̂ lies in this distribution to test the hypothesis that the coefficient is 0. For example, to form a two-tailed test for whether θ̂ is significantly different from 0 at the 5% level, we would use the 2.5% and 97.5% values in the distribution θ̂(T̃_i, Y_i) as our cut-off values.

The intuition behind this test is quite simple: if the flu shot is truly random, we can generate other random, zero-effect shots and see what estimates these produce in the data we have. Notice the advantages of this procedure. Under the assumption that the treatment effect is constant across units, we are making no assumptions about the error term (except for the randomness of the flu shot itself). It need not be homoskedastic, normal or even independent across individuals. Further, this test does not rest on any large sample approximation: it is therefore valid for any sample size. If we want to test other hypotheses in the same way (i.e. to test whether the estimate is different from θ_0), we first form Y_i0 = Y_i − θ_0 T_i to make the treatment and the control comparable under the null, and we apply the same technique to the transformed data.33

32 The code needed to perform this test is available from the authors upon request.
33 One can use the same technique to form entire confidence intervals.

The statistical inference test we propose for the DD model follows naturally from this example. To form the estimate of the law's effect, we estimate by OLS the usual aggregate equation:

Y_st = α_s + γ_t + β T_st + ε_st

We then generate a placebo law P_st that affects 25 randomly chosen states in a random year, and estimate using OLS:

Y_st = α_s + γ_t + γ̃ P_st + ε̃_st

Repeating this procedure many times for random placebo laws produces a distribution of γ̃. Define G(.) to be this distribution. To test the hypothesis that our estimate β̂ is statistically different from 0, we simply need to ask where β̂ lies in the G(.) distribution. For example, to form a two-tailed test at the 5% level, we would simply identify the lower and upper 2.5% tails of the distribution and use these values as cutoffs: if the estimated coefficient lies outside these two cutoff values, we reject the hypothesis that it is equal to 0; otherwise we accept it. As before, note that we have not used any information about the error term. Instead, we have relied solely on the random assignment of the laws.34

34 The theoretical justification for this test rests on the proof of the validity of using the randomization distribution. These tests use the strong assumption of randomization of the treatment to map out the distribution of the test statistic. The only difference in our case is that state laws are not random unconditionally, but instead that they are random conditional on the state fixed effects and year dummies.

In Table 11, we assess how well this procedure performs in the aggregated CPS data as well as in manufactured data with a known auto-correlation structure. Again, we randomly generate 200 independent draws of intervention variables T_st. For each of these draws, we perform 400 draws of placebo laws P_st to construct the distribution G(γ̃), and we construct the test statistic. We see that the basic randomization inference procedure leads us to reject the null hypothesis of no effect in 6% of the simulations (row 2, column 1). Notice that the procedure also seems fairly powerful, leading to rejection in 23% of the cases when there is an effect (row 2, column 2). As before, in the manufactured data, we can compare its performance to the upper-bound. We saw in Tables 4 and 6 that, with the correct covariance matrix, the rejection rate in the case of a 2% effect was 78% in manufactured data with no auto-correlation and 32% in AR(1) data with ρ = .8. The performance of randomization inference, 84% and 30% respectively, is comparable to these upper-bounds.

There is an additional assumption behind this test, namely the requirement that we know the exact statistical process determining which units get treated (e.g. in the flu shot example, the statistician knows that half of the sample was randomly selected to get the treatment). In practice, we will not know the true process by which laws are generated. If we know that a particular law took place in 1988, what should we assume when generating the placebo laws: that the law could only have been passed in 1988, that it could have been passed at any random time between 1971 and 2000, or something else? Rows 3 to 5 consider the effect of using different (and simpler) assumptions about the law generating process. Each row allows the laws to lie in a different window around the actual law date. In row 3, for example, we allow the placebo laws to lie in a 5 year window centered on the actual date. In row 4, we assume a 3 year window. In row 5, we force the placebo laws to occur in the same year as the actual laws; in other words, we permute only which states were affected by the laws.35 As can be seen, both in terms of type I and type II errors, all the procedures produce very similar results. The next few rows repeat the exercise using a progressively smaller number of states. Even with as few as 6 states, the procedure produces exactly the right rejection rates.

35 Similar problems arise in determining how many states were affected. If we see 25 of 50 states affected, was the process one in which each state had a 50% iid chance of being affected, or one in which exactly 25 states will be affected? Simulations which vary this additional element show that the results are not affected by which selection process is assumed (even if it differs from the actual process). It is worth noting that while these results tell us that, in simulations, using incorrect approximations does not make a difference, this may not be a general theoretical result.

To summarize, Table 11 makes it clear that, of all the techniques we have considered, randomization inference performs the best. It removes the over-rejection problem and does so independently of sample size. Moreover, it appears to have power comparable to that of the other tests. Although randomization inference testing is well known in statistics, it is largely ignored in the econometrics literature. Difference-in-differences estimation, which deals with small effective sample sizes and complicated error distributions, seems a particularly fertile ground for the application of this testing technique.
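A compact Python sketch of the randomization inference test is given below (not part of the original paper; the panel inputs and the placebo-generating assumptions mirror the earlier sketches). The actual DD estimate is compared with the empirical distribution of estimates for placebo laws hitting 25 random states in a random year.

```python
import numpy as np

def randomization_inference_pvalue(y_panel, treat_panel, n_placebo=400, seed=0):
    """Randomization inference (sketch): reject if the actual DD estimate falls
    outside the 2.5% / 97.5% quantiles of the placebo-law distribution G(.)."""
    rng = np.random.default_rng(seed)
    n_states, T = y_panel.shape
    s = np.repeat(np.arange(n_states), T)
    t = np.tile(np.arange(T), n_states)

    def dd_beta(treat):
        X = np.column_stack([np.eye(n_states)[s], np.eye(T)[t, 1:], treat.ravel()])
        return (np.linalg.pinv(X.T @ X) @ X.T @ y_panel.ravel())[-1]

    beta_hat = dd_beta(treat_panel)

    placebo_betas = []
    for _ in range(n_placebo):
        affected = rng.choice(n_states, 25, replace=False)
        year = rng.integers(1, T)                    # placebo passage year (assumed window)
        P = np.zeros((n_states, T))
        P[np.ix_(affected, np.arange(year, T))] = 1.0
        placebo_betas.append(dd_beta(P))
    G = np.array(placebo_betas)

    lo, hi = np.quantile(G, [0.025, 0.975])          # two-tailed 5% cutoffs
    reject = (beta_hat < lo) or (beta_hat > hi)
    return beta_hat, (lo, hi), reject
```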
5 Implications for Existing Papers

This paper has highlighted an important problem with DD estimation and proposed several solutions to deal with it. What are the implications for the existing stock of papers which use the DD estimation technique but do not explicitly correct for serial correlation? We have already seen that most of these papers use long time series and variables which are likely quite auto-correlated. One possibility, however, is that some of the informal techniques used in these papers might indirectly help alleviate the auto-correlation problem. As we noted earlier, researchers have developed a set of diagnostic tests that are often performed in conjunction with the estimation of equations 1 or 5. These tests are meant to assess the endogeneity of the interventions, something that is not a problem in our setup, as we construct interventions that are by definition exogenous. It is possible that these tests may incidentally lessen the auto-correlation problem. In Table 12, we report rejection rates when we perform OLS estimation of equations 1 and 5 in combination with several commonly used diagnostic techniques.36
We concentrate on 3 data sets: micro CPS wage data, aggregate CPS wage data (with clustering of the error term by state-year cell), and manufactured data where the error term follows an AR(1) process with an autocorrelation parameter of 0.8. The first diagnostic test looks for pre-existing "effects" of the law. We re-estimate the original DD specification with the law dummy variable but also include a dummy for "this state will pass a law next year". We then reject the null hypothesis of no effect only if the coefficient on the law dummy is significant and the coefficient on the pre-law dummy is either insignificant or opposite signed. This diagnostic test implies only a small reduction in the rejection rates.
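As an illustration, here is a minimal sketch of this first diagnostic under the same hypothetical state-year data frame used in the earlier sketch; the lead dummy below is one reading of "this state will pass a law next year".

    # Sketch of the first diagnostic: add a lead dummy to the DD regression and
    # reject only if the law coefficient is significant while the lead is
    # insignificant or opposite-signed.
    import numpy as np
    import statsmodels.formula.api as smf

    def lead_dummy_test(df, alpha=0.05):
        df = df.sort_values(["state", "year"]).copy()
        nxt = df.groupby("state")["law"].shift(-1).fillna(0)
        df["law_next_year"] = ((nxt == 1) & (df["law"] == 0)).astype(int)
        fit = smf.ols("outcome ~ law + law_next_year + C(state) + C(year)", data=df).fit()
        law_significant = fit.pvalues["law"] < alpha
        lead_benign = (fit.pvalues["law_next_year"] >= alpha or
                       np.sign(fit.params["law_next_year"]) != np.sign(fit.params["law"]))
        return bool(law_significant and lead_benign)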
The second diagnostic test examines persistence of the effect. We reject the null hypothesis of no effect if the estimated coefficient is statistically significant under OLS and the effect persists three years after the intervention date. Again, this does not lead to a significant reduction in the rejection rate in any of the data sets.

The third test looks for pre-existing trends in the treatment sample. We perform a regression in the pre-period and estimate whether there is a significant time trend in these years for the difference between control and treatment states. Under this test, we reject the null of no effect if the OLS coefficient is statistically significant and there is no statistically significant pre-existing trend or a treatment trend of opposite sign of the intervention coefficient. The rejection rates are lower under this test but remain above 20%.

Finally, in the last test we allow for a treatment specific trend in the data. Under this test, we reject the null of no effect if the OLS coefficient is significant and stays significant and of the same sign after controlling in the regression for a yearly trend interacted with the treatment dummy. This diagnostic test leads to a bigger reduction in the rejection rate, especially in the aggregate data (.105 and .150), though this is still at least twice as large as we would want.37
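A sketch of this last diagnostic, again under the hypothetical state-year data frame used above: the DD rejection is kept only if the law coefficient remains significant and of the same sign once a yearly trend interacted with an ever-treated indicator is added.

    # Sketch of the treatment-specific trend check.
    import numpy as np
    import statsmodels.formula.api as smf

    def trend_robust_test(df, alpha=0.05):
        df = df.copy()
        df["treated_state"] = df.groupby("state")["law"].transform("max")  # ever treated
        df["trend"] = df["year"] - df["year"].min()
        base = smf.ols("outcome ~ law + C(state) + C(year)", data=df).fit()
        with_trend = smf.ols("outcome ~ law + treated_state:trend + C(state) + C(year)",
                             data=df).fit()
        same_sign = np.sign(base.params["law"]) == np.sign(with_trend.params["law"])
        return bool(base.pvalues["law"] < alpha and
                    with_trend.pvalues["law"] < alpha and same_sign)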
In short, the results in Table 12 therefore confirm that the existing stock of DD papers is very likely affected by the estimation problem discussed in this paper.38

36 Of course, we can only study the formal procedures which people use. One can always argue that informal procedures (such as looking at the data) will lead one to avoid the serial correlation problem. Such a claim is by construction hard to test using simulations. The only way to defend it would be to go back to the original papers and submit them to the procedures discussed above.

37 Moreover, it is not clear that these tests are rejecting the "right" ones. Any stringent criterion will reduce the rejection rate, but how are we to assess whether the reduction is sensible? We investigated this by computing the marginal rejection rate of the randomization inference test, conditional on passing the treatment trend diagnostic test. The odds that, having passed the trend test, an intervention would pass the randomization inference test are only 26%, suggesting that the treatment trend reduces the rejection rate but not in a way that alleviates the serial correlation problem.
6 Conclusion

Our results suggest that, because of serial correlation, DD estimation as it is commonly performed grossly under-states the standard errors around the estimated intervention effect. While the bias induced by serial correlation is well understood in theory, the sheer magnitude of this problem in the DD context should come as a surprise to most readers. Since a large fraction of the published DD papers we surveyed report t-statistics around 2, our results suggest that the findings in many of these papers may not be as precise as originally thought and that far too many false rejections of the null hypothesis of no effect have taken place.

We propose three solutions to deal with the serial correlation problem in the DD context. Collapsing the data into pre- and post-periods or allowing for an arbitrary covariance matrix within state over time have been shown to be simple, viable solutions when sample sizes are sufficiently large. Alternatively, a simple adaptation of the randomization inference testing techniques developed in the statistics literature appears to fully correct standard errors irrespective of sample size.
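For readers who want to see what the first two solutions amount to in practice, here is a minimal sketch (hypothetical column names, statsmodels used for the regressions): clustering the standard errors at the state level allows an arbitrary covariance structure within state over time, and collapsing the panel to one pre- and one post-law cell per state removes the time-series dimension; the collapse assumes a single law date.

    # Sketch of the two large-sample fixes from the conclusion.
    import statsmodels.formula.api as smf

    def dd_clustered(df):
        # Arbitrary within-state covariance over time via clustered standard errors.
        return smf.ols("outcome ~ law + C(state) + C(year)", data=df).fit(
            cov_type="cluster", cov_kwds={"groups": df["state"]})

    def dd_collapsed(df, law_year):
        # Collapse to one pre- and one post-period observation per state.
        df = df.assign(post=(df["year"] >= law_year).astype(int),
                       treated=df.groupby("state")["law"].transform("max"))
        cell = df.groupby(["state", "post"], as_index=False).agg(
            outcome=("outcome", "mean"), treated=("treated", "max"))
        # State fixed effects absorb the treated main effect; the DD coefficient
        # is the one on treated:post.
        return smf.ols("outcome ~ post + treated:post + C(state)", data=cell).fit()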
38 We have attempted variants on these diagnostic tests, such as changing what constitutes an "effect" in the pre-period or what constitutes a trend. We have also attempted all the tests together. The results were qualitatively similar.
References

Abadie, Alberto, "Semiparametric Difference-in-Differences Estimators," Working Paper, Kennedy School of Government, Harvard University, 2000.

Blundell, Richard and Thomas MaCurdy, "Labor Supply," in Handbook of Labor Economics, Orley Ashenfelter and David Card (eds.), December 1999.

Efron, Bradley and Robert Tibshirani, An Introduction to the Bootstrap, Monograph in Applied Statistics and Probability, no. 57, Chapman and Hall, 1994.

Heckman, James, "Discussion," in Empirical Foundations of Household Taxation, Martin Feldstein and James Poterba (eds.), 1996.

Donald, Stephen and Kevin Lang, "Inferences with Difference in Differences and Other Panel Data," Boston University Working Paper, 2001.

MaCurdy, Thomas E., "The Use of Time Series Processes to Model the Error Structure of Earnings in a Longitudinal Data Analysis," Journal of Econometrics, v. 18 (1982): 83-114.

Meyer, Bruce, "Natural and Quasi-Natural Experiments in Economics," Journal of Business and Economic Statistics, v. 13 (1995): 151-162.

Moulton, Brent R., "An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units," Review of Economics and Statistics, v. 72 (1990): 334-338.

Newey, Whitney and K. D. West, "A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, v. 55 (1987): 703-708.

Nickell, Stephen, "Biases in Dynamic Models with Fixed Effects," Econometrica, v. 49 (1981): 1417-1426.

Rosenbaum, Paul, "Hodges-Lehmann Point Estimates of Treatment Effect in Observational Studies," Journal of the American Statistical Association, v. 88 (1993): 1250-1253.

Rosenbaum, Paul, "Observational Studies and Nonrandomized Experiments," in S. Ghosh and C.R. Rao (eds.), Handbook of Statistics, v. 13 (1996).

Solon, Gary, "Estimating Auto-correlations in Fixed-Effects Models," NBER Technical Working Paper no. 32, 1984.

White, Halbert, Asymptotic Theory for Econometricians, Academic Press, 1984.
Table 1: Survey of DD Papers

Number of DD papers                                          92
Number with more than 2 periods of data                      69
Number which collapse data into before-after                  4
Number with potential serial correlation problem             65
Number with some serial correlation correction                 5
  GLS                                                          4
  Arbitrary variance-covariance matrix                         1

Distribution of time-span for papers with more than 2 periods
  Average                                                  16.5
  Percentile   Value
  1%               3
  5%               3
  10%              4
  25%           5.75
  50%             11
  75%           21.5
  90%             36
  95%             51
  99%             83

Number which have clustering problem                         80
Number which deal with it                                    36

Informal manipulations of data
  Graph time series of effect                                15
  See if effect persists                                      2
  Examine lags of law to see timing of effect                 2
  DDD                                                        11
  Include trend specific to passing states                    7
  Explicitly include lead to look for effect prior to law     3
  Include lagged dependent variable                           3

Most commonly used dependent variables
  Employment                                                 18
  Wages                                                      13
  Health/Medical Expenditure                                  8
  Unemployment                                                6
  Fertility/Teen Motherhood                                   4
  Insurance                                                   4
  Poverty                                                     3
  Consumption/Savings                                         3

Notes: Data come from a survey of all articles in six journals between 1990 and 2000: American Economic Review; Industrial and Labor Relations Review; Journal of Labor Economics; Journal of Political Economy; Journal of Public Economics; and Quarterly Journal of Economics. We define an article as "Difference-in-Difference" if it (1) examines the effect of a specific intervention and (2) uses units unaffected by the intervention as a control group.
Table 2: DD Rejection Rates for Placebo Laws

                                                               Rejection Rate
Data            Technique   Law Type                      No Effect       2% Effect
A. REAL DATA
CPS micro       OLS         25 states, one date           .675 (.027)     .855 (.020)
CPS micro       Cluster     25 states, one date           .44 (.029)      .74 (.025)
CPS aggregate   OLS         25 states, one date           .435 (.029)     .72 (.026)
CPS aggregate   OLS         Serially uncorrelated laws    .06 (.014)      .895 (.018)
CPS aggregate   OLS         12 states, one date           .433 (.029)     .673 (.027)
CPS aggregate   OLS         36 states, one date           .398 (.028)     .668 (.027)
CPS aggregate   OLS         25 states, multiple dates     .48 (.029)      .71 (.026)
B. MANUFACTURED DATA
AR(1), p = .8   OLS         25 states, one date           .373 (.028)     .725 (.026)

Notes:
1. Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention (law) variable for randomly generated placebo interventions. The number of simulations for each cell is at least two hundred.
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. In rows 3 to 7, data are aggregated to state-year level cells after controlling for demographic variables (education and age). Manufactured data are data generated so that the variances match the CPS variances. The p refers to the auto-correlation parameter in manufactured data.
3. All regressions also include, in addition to the intervention variable, state and year fixed effects. In the individual level regression they include the demographic controls as well.
4. Standard errors are in parentheses and computed using the number of simulations.
5. "Effect" specifies whether an effect of the placebo law has been added to the data.
Table 3: Magnitude of DD Estimates (CPS Aggregate Data)

                                 No Effect                        2% Effect
                          Positive       Negative         Positive       Negative
Rejection Rate            .18 (.022)     .17 (.022)       .715 (.026)    .0075 (.005)
Average Coefficient       .02 (.000)     -.02 (.000)      .026 (.000)    -.017 (.000)
Fraction of effects:
  < .01                   .59 (.028)     .58 (.028)       .33 (.027)     1 (.000)
  in (.01, .02]           .31 (.027)     .3 (.026)        .39 (.028)
  in (.02, .03]           .084 (.016)    .12 (.019)       .16 (.021)
  in (.03, .04]           .014 (.000)                     .04 (.000)
  > .04                                                   .12 (.019)

Notes:
1. The positive (negative) columns report results for estimated effects of interventions which are positive (negative).
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable is log weekly earnings. Data are aggregated to state-year level cells after controlling for the demographic variables (education and age).
3. All regressions also include, in addition to the intervention variable, state and year fixed effects.
4. Standard errors are in parentheses and computed using the number of simulations.
Table 4: Varying Auto-correlation

                                                                       Rejection Rate
Data and Dependent Variable     p1, p2, p3          Technique    No Effect       2% Effect
A. REAL DATA
CPS agg, Log wage               .509, .440, .332    OLS          .435 (.029)     .72 (.026)
CPS agg, Employment             .470, .418, .367    OLS          .415 (.028)     .698 (.010)
CPS agg, Hours worked           .151, .114, .063    OLS          .263 (.025)     .265 (.025)
CPS agg, Changes in log wage    -.046, .032, .002   OLS          0 (.000)        .978 (.009)
B. MANUFACTURED DATA            p1
AR(1)                           -.4                 OLS          .008 (.005)     .7 (.026)
AR(1)                           0                   OLS          .053 (.013)     .783 (.024)
AR(1)                           .2                  OLS          .123 (.019)     .738 (.025)
AR(1)                           .4                  OLS          .19 (.023)      .713 (.026)
AR(1)                           .6                  OLS          .333 (.027)     .700 (.026)
AR(1)                           .8                  OLS          .373 (.028)     .725 (.026)

Notes:
1. Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention variable for randomly generated interventions. The number of simulations for each cell is at least two hundred.
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. Data are aggregated to state-year level cells, after controlling for the demographic variables (education and age). Manufactured data are data generated so that the variances match the CPS variances.
3. All CPS regressions also include, in addition to the intervention variable, state and year fixed effects.
4. Standard errors are in parentheses and computed using the number of simulations.
5. The variables p_i refer to the estimated auto-correlation parameters of lag i.
Table 5: Varying N and T

                                        Rejection rate
Data             N     T        No Effect        2% Effect
A. REAL DATA
CPS aggregate    50    21       .435 (.029)      .72 (.026)
CPS agg          20    21       .36 (.028)       .53 (.029)
CPS agg          10    21       .425 (.029)      .525 (.029)
CPS agg           6    21       .45 (.029)       .433 (.029)
CPS agg          50    11       .29 (.026)       .675 (.027)
CPS agg          50     7       .16 (.021)       .63 (.028)
CPS agg          50     5       .08 (.016)       .503 (.029)
CPS agg          50     3       .0775 (.015)     .39 (.028)
CPS agg          50     2       .073 (.015)      .315 (.027)
B. MANUFACTURED DATA
AR(1), p=.8      50    21       .35 (.028)       .638 (.028)
AR(1), p=.8      20    21       .35 (.028)       .538 (.029)
AR(1), p=.8      10    21       .3975 (.028)     .505 (.029)
AR(1), p=.8       6    21       .393 (.028)      .5 (.029)
AR(1), p=.8      50    11       .335 (.027)      .588 (.028)
AR(1), p=.8      50     5       .175 (.022)      .5525 (.029)
AR(1), p=.8      50     3       .09 (.017)       .435 (.029)
AR(1), p=.8      50    50       .4975 (.029)     .855 (.020)

Notes:
1. Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention variable for randomly generated interventions. The number of simulations for each cell is at least two hundred.
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. Data are aggregated to state-year level cells after controlling for the demographic variables (education and age). Manufactured data are data generated so that the variances match the CPS variances.
3. All CPS regressions also include, in addition to the intervention variable, state and year fixed effects.
4. Standard errors are in parentheses and computed using the number of simulations.
5. N refers to the number of states used in the simulation and T refers to the number of years. When the CPS data is used, we randomly drop states or years to fulfill the criterion. The parameter p measures the auto-correlation.
Table 6: Parametric Solutions

                                                                                    Rejection rate
Data                    Technique                          Estimated p        No Effect       2% Effect
A. REAL DATA
CPS agg                 OLS                                                   .435 (.029)     .72 (.026)
CPS agg                 Standard AR(1) correction                             .368            .705 (.026)
CPS agg                 AR(1) correction imposing p=.8                        .12 (.019)      .4435 (.029)
CPS agg                 AR(2) correction                   p1=.55, p2=.35     .335 (.027)     .638 (.028)
CPS agg                 AR(1) + White Noise correction     p=.95, n/s=.13     .228 (.024)     .5725 (.029)
B. MANUFACTURED DATA
AR(1), p=.8             OLS                                                   .373 (.028)     .765 (.024)
AR(1), p=.8             Standard AR(1) correction          .622               .205 (.023)     .715 (.026)
AR(1), p=.8             AR(1) correction imposing p=.8                        .06 (.023)      .323
AR(1) + white noise,
  noise/signal=.13      OLS                                                   .385 (.028)     .4 (.028)
AR(1) + white noise,
  noise/signal=.13      Standard AR(1) correction          .301               .305 (.027)     .625 (.028)

Notes:
1. Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention variable for randomly generated interventions. The number of simulations for each cell is at least two hundred.
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. Data are aggregated to state-year level cells, after controlling for the demographic variables (education and age). Manufactured data are data generated so that the variances match the CPS variances. An AR(1) + white noise process is the sum of an AR(1) plus an iid process, where the auto-correlation for the AR(1) component is given by p and the relative variance of the components is given by the noise to signal ratio (n/s).
3. All CPS regressions also include, in addition to the intervention variable, state and year fixed effects.
4. Standard errors are in parentheses and computed using the number of simulations.
5. The AR(1) correction is implemented in Stata using the xtgls command.
Table 7: Block Bootstrap

                                                       Rejection rate
Data             Technique           N        No Effect       2% Effect
A. CPS DATA
CPS aggregate    OLS                 50       .435 (.029)     .72 (.026)
CPS aggregate    Block Bootstrap     50       .35 (.028)      .60 (.027)
B. MANUFACTURED DATA
AR(1), p=.8      OLS                 50       .38 (.028)      .735 (.025)
AR(1), p=.8      Block Bootstrap     50       .285 (.026)     .645 (.028)
AR(1), p=.8      OLS                 400      .415 (.028)     1 (.000)
AR(1), p=.8      Block Bootstrap     400      .075 (.015)     .98 (.008)

Notes:
1. Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention variable for randomly generated interventions. The number of simulations for each cell is at least two hundred.
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. Data are aggregated to state-year level cells after controlling for the demographic variables (education and age). Manufactured data are data generated so that the variances match the CPS variances.
3. All CPS regressions also include, in addition to the intervention variable, state and year fixed effects.
4. Standard errors are in parentheses and computed using the number of simulations.
Table 8: Empirical Variance-Covariance Matrix

                                                         Rejection rate
Data             Technique             N        No Effect        2% Effect
A. REAL DATA
CPS agg          OLS                   50       .435 (.029)      .72 (.026)
CPS agg          Empirical variance    50       .0775 (.015)     .085 (.016)
CPS agg          OLS                   20       .36 (.028)       .53 (.029)
CPS agg          Empirical variance    20       .0825 (.016)     .08 (.016)
CPS agg          OLS                   10       .425 (.029)      .525 (.029)
CPS agg          Empirical variance    10       .0825 (.016)     .0975 (.017)
CPS agg          OLS                    6       .45 (.029)       .433 (.029)
CPS agg          Empirical variance     6       .165 (.021)      .1825 (.022)
B. MANUFACTURED DATA
AR(1), rho=.8    Empirical variance    50       .105 (.018)      .16 (.021)

Notes:
1. Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention variable for randomly generated interventions. The number of simulations for each cell is at least two hundred.
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. Data are aggregated to state-year level cells after controlling for demographic variables (education and age). Manufactured data are data generated so that the variances match the CPS variances.
3. All CPS regressions also include, in addition to the intervention variable, state and year fixed effects.
4. Standard errors are in parentheses and computed using the number of simulations.
Table 9: Arbitrary Variance-Covariance Matrix

                                                Rejection Rate
Data           Technique    # States    No Effect        2% Effect
A. REAL DATA
CPS agg        OLS          51          .435 (.029)      .72 (.026)
CPS agg        Cluster      51          .06 (.014)       .27 (.026)
CPS agg        OLS          20          .36 (.028)       .53 (.029)
CPS agg        Cluster      20          .0625 (.014)     .1575 (.021)
CPS agg        OLS          10          .425 (.029)      .525 (.029)
CPS agg        Cluster      10          .085 (.016)      .1025 (.018)
CPS agg        OLS           6          .450 (.029)      .433 (.029)
CPS agg        Cluster       6          .15 (.021)       .1875 (.023)
B. MANUFACTURED DATA
AR(1), p=.8    Cluster      50          .045 (.012)      .275 (.026)
AR(1), p=0     Cluster      50          .035 (.011)      .74 (.025)

Notes:
1. Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention variable for randomly generated interventions. The number of simulations for each cell is at least two hundred.
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. Data are aggregated to state-year level cells after controlling for the demographic variables (education and age). Manufactured data are data generated so that the variances match the CPS variances.
3. All CPS regressions also include, in addition to the intervention variable, state and year fixed effects.
4. Standard errors are in parentheses and computed using the number of simulations.
Table 10: Ignoring Time Series Data

                                                                            Rejection rate
Data          N     Law Type     Technique                 No Effect        2% Effect
A. REAL DATA
CPS agg       50    Federal      OLS                       .435 (.029)      .72 (.026)
CPS agg       50    Federal      Simple Aggregation        .060 (.014)      .295 (.026)
CPS agg       50    Federal      Residual Aggregation      .053 (.013)      .210 (.024)
CPS agg       50    Staggered    Residual Aggregation      .048 (.012)      .335 (.027)
CPS agg       20    Federal      OLS                       .36 (.028)       .53 (.029)
CPS agg       20    Federal      Simple Aggregation        .060 (.014)      .188 (.023)
CPS agg       20    Federal      Residual Aggregation      .095 (.017)      .193 (.023)
CPS agg       20    Staggered    Residual Aggregation      .073 (.015)      .210 (.024)
CPS agg       10    Federal      OLS                       .425 (.029)      .525 (.029)
CPS agg       10    Federal      Simple Aggregation        .078 (.015)      .095 (.017)
CPS agg       10    Federal      Residual Aggregation      .095 (.017)      .198 (.023)
CPS agg       10    Staggered    Residual Aggregation      .103 (.018)      .223 (.024)
CPS agg        6    Federal      OLS                       .450 (.029)      .433 (.029)
CPS agg        6    Federal      Simple Aggregation        .130 (.019)      .138 (.020)
CPS agg        6    Federal      Residual Aggregation      .315 (.027)      .388 (.028)
CPS agg        6    Staggered    Residual Aggregation      .275 (.026)      .335 (.027)
B. MANUFACTURED DATA
AR(1), p=.8   50    Federal      Simple Aggregation        .050 (.013)      .243 (.025)
AR(1), p=.8   50    Federal      Residual Aggregation      .045 (.012)      .235 (.024)
AR(1), p=.8   50    Staggered    Residual Aggregation      .075 (.015)      .355 (.028)
AR(1), p=0    50    Federal      Simple Aggregation        .053 (.013)      .713 (.026)
AR(1), p=0    50    Federal      Residual Aggregation      .045 (.012)      .773 (.024)
AR(1), p=0    50    Staggered    Residual Aggregation      .105 (.018)      .860 (.020)

Notes: Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention variable for randomly generated interventions. The number of simulations for each cell is at least two hundred. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. Data are aggregated to state-year level cells after controlling for demographic variables (education and age). Manufactured data are data generated so that the variances match the CPS variances. All CPS regressions also include, in addition to the intervention variable, state and year fixed effects. Standard errors are in parentheses and computed using the number of simulations.
Table 11: Randomization Inference

                                                             Rejection rate
Data            N     Technique                      No Effect       2% Effect
A. REAL DATA
CPS agg         51    OLS                            .435 (.029)     .72 (.026)
CPS agg         51    Randomization Inference        .065 (.014)     .235 (.024)
CPS agg         51    RI 5 year window               .06 (.014)      .23 (.024)
CPS agg         51    RI 3 year window               .06 (.014)      .23 (.024)
CPS agg         51    RI same year                   .04 (.011)      .24 (.025)
CPS agg         20    OLS                            .36 (.028)      .53 (.029)
CPS agg         20    Randomization Inference        .045 (.012)     .1 (.017)
CPS agg         20    RI 5 year window               .04 (.011)      .125 (.019)
CPS agg         20    RI 3 year window               .065 (.014)     .095 (.017)
CPS agg         20    RI same year                   .045 (.012)     .115 (.018)
CPS agg         10    OLS                            .425 (.029)     .525 (.029)
CPS agg         10    Randomization Inference        .055 (.013)     .115 (.018)
CPS agg         10    RI 5 year window               .055 (.013)     .095 (.017)
CPS agg         10    RI 3 year window               .05 (.013)      .125 (.019)
CPS agg         10    RI same year                   .07 (.015)      .115 (.018)
CPS agg          6    OLS                            .450 (.029)     .433 (.029)
CPS agg          6    Randomization Inference        .04 (.011)      .07 (.015)
CPS agg          6    RI 5 year window               .035 (.011)     .065 (.014)
CPS agg          6    RI 3 year window               .03 (.010)      .06 (.014)
CPS agg          6    RI same year                   .055 (.013)     .07 (.015)
B. MANUFACTURED DATA
AR(1), rho=0.8  50    Randomization Inference        .05 (.011)      .3 (.025)
iid data        50    Randomization Inference        .08 (.019)      .84 (.026)

Notes:
1. Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention variable for randomly generated interventions.
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. Data are aggregated to state-year level cells after controlling for education and age. Manufactured data are data generated so that the variances match the CPS variances. All CPS regressions also include, in addition to the intervention variable, state fixed effects and year fixed effects. Standard errors are in parentheses and computed using the number of simulations.
Table 12: Effect of Informal Tests

                                                       CPS Aggregate Data            AR(1), p = .8
H0: B = 0 rejected if:                              No Effect      2% Effect      No Effect      2% Effect
Coef significant (OLS)                              .360 (.034)    .725 (.031)    .345 (.034)    .625 (.031)
OLS + No effect before the law                      .400 (.033)    .634 (.032)    .325 (.033)    .565 (.035)
OLS + Persistence of effect                         .378 (.033)    .602 (.031)    .305 (.033)    .565 (.035)
OLS + No treatment specific trend in pre-period     .248 (.029)    .518 (.035)    .235 (.030)    .530 (.035)
OLS + Coef significant with treatment specific
  trend (TREND)                                     .106 (.018)    .418 (.018)    .150 (.025)    .375 (.034)

Notes:
1. Each cell represents the rejection rate of the null hypothesis of no effect (at the 5% significance level) on the intervention variable for randomly generated interventions. The number of simulations for each cell is at least two hundred.
2. CPS data are data for women between 25 and 50 in the fourth interview month of the Merged Outgoing Rotation Group for the years 1979 to 1999. The dependent variable unless otherwise specified is log weekly earnings. Data are aggregated to state-year level cells after controlling for demographic variables (age and education). Manufactured data are data generated so that the variances match the CPS variances.
3. All CPS regressions also include, in addition to the intervention variable, state and year fixed effects.
4. Standard errors are in parentheses and computed using the number of simulations.