Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries
http://www.archive.org/details/treatmenteffecthOOangr
Massachusetts Institute of Technology
Department of Economics
Working Paper Series

TREATMENT EFFECT HETEROGENEITY IN THEORY AND PRACTICE

Joshua Angrist

Working Paper 03-27
August 2003

Room E52-251
50 Memorial Drive
Cambridge, MA 02142
This paper can be downloaded without charge from the
Social Science Research Network Paper Collection at
http://ssrn.com/abstract id=X^XXX
Treatment Effect Heterogeneity in Theory and Practice*

By Joshua D. Angrist
MIT and NBER
August 2003
*Presented as the Sargan lecture, Royal Economic Society, University of Warwick, April 2003. Denis Sargan developed much of the estimation theory that instrumental variables practitioners rely on today. For recent surveys of Sargan's contributions see Arellano (2002) and Phillips (2003). Special thanks go to Patricia Cortes and Francisco Gallego for outstanding research assistance and to Alberto Abadie, Victor Chernozhukov, Jerry Hausman, and Guido Imbens for helpful discussions and comments. Thanks also go to the editors, Jon Temple and Ian Preston, for their detailed written comments. This is a revised version of NBER Working Paper 9708.
Instrumental Variables (IV) methods identify internally valid causal effects for individuals whose treatment status is manipulable by the instrument at hand. Inference for other populations requires homogeneity assumptions. This paper outlines a theoretical framework that nests causal homogeneity assumptions. These ideas are illustrated using sibling-sex composition to estimate the effect of child-bearing on economic and marital outcomes. The application is motivated by American welfare reform. The empirical results generally support the notion of reduced labor supply and increased poverty as a consequence of childbearing, but evidence on the impact of childbearing on marital stability and welfare use is more tenuous.

Joshua D. Angrist
MIT Economics Department
angrist@mit.edu

Keywords: Instrumental Variables, Marital Stability, Welfare, Causal Effects.
JEL Numbers: C31, J12, J13.
Empirical research often focuses on causal inference for the purpose of prediction, yet it seems fair to say that most prediction involves a fair amount of guesswork. The relevance or "external validity" of a particular set of empirical results is always an open question. As Karl Pearson (1911, p. 157) observed in an early discussion of the use of correlation for prediction, "Everything in the universe occurs but once, there is no absolute sameness of repetition." This practical difficulty notwithstanding, empirical research is almost always motivated by a belief that estimates for a particular context provide useful information about the likely effects of similar programs or events in the future. Our investment of time and energy in often-discouraging empirical work reveals that empiricists like me are willing to extrapolate.

The basis for extrapolation is a set of assumptions about the cross-sectional homogeneity or temporal stability of causal effects. As a graduate student, I learned about parameter stability as "the Lucas critique," while my own teaching and research focuses on the identification possibilities for average causal effects in models with heterogeneous potential outcomes. Applied micro-econometricians devote considerable attention to the question of whether homogeneity and stability assumptions can be justified, and to the implications of heterogeneity for alternative parameter estimates. Regrettably, this sort of analysis sometimes comes at the expense of a rigorous examination of the internal validity of estimates, i.e., whether the estimates have a causal interpretation for the population under study. Clearly, however, even internally valid estimates are less interesting if they are completely local, i.e., have no predictive value for populations other than the directly affected group.

In this paper, I discuss the nature and consequences of homogeneity assumptions that facilitate the use of instrumental variables (IV) estimates for extrapolation.¹ To be precise, I am interested in the assumptions that link a Local Average Treatment Effect (LATE) tied to a particular instrument with the population Average Treatment Effect (ATE), which is not instrument-dependent. Implicitly, I have in mind prediction for populations defined by covariates. I focus on ATE because it answers the question: "If we were to treat individuals with characteristics X, what would the likely change in outcomes be"? This allows me to sidestep variability due to changes in the process determining treatment status. For example, causal research often focuses on average treatment effects on the treated population. Overall average treatment effects are theoretically more stable than average effects on the treated, since the latter depend on who gets treated as well as on the distribution of potential outcomes.

¹Denis Sargan noted the difficulty of the core instrumental variables identification problem, i.e., identification in models with constant effects, in his seminal 1958 paper: "It is not easy to justify the basic assumptions concerning these errors, namely that they are independent of the instrumental variables." (p. 396; quoted in Arellano, 2002).

The external validity of IV estimates is of special interest both because of the growing importance of IV methods in empirical work (see, e.g., Moffitt, 1999), and because the ex ante generality of IV estimates is limited in a precise way by a number of well-known theoretical results. Except in special cases like constant treatment effects and certain types of randomized trials, the standard IV assumptions of exclusion and independence - analogous to the notion that the instrument induces a good experiment for the effect of interest - are not sufficient to capture the expected causal effect on a randomly selected individual or even in the population subject to treatment. Rather, basic IV assumptions identify causal effects on "compliers," defined as the subpopulation of treated individuals whose treatment status can be influenced by the instrument. Although this limitation is unsurprising, the nature and plausibility of assumptions under which IV estimates have broader predictive power are worth exploring.
The next two sections develop a theoretical framework linking alternative causal parameters to population subgroups defined by their response to an instrument. That is, I consider formal links between parameters like LATE and ATE. My agenda is to make this link using a range of assumptions, progressing from stronger (no selection bias) to weaker (a proportionality assumption). These theoretical ideas are then applied to the same sex instrument, used by Angrist and Evans (1998) to estimate the effects of childbearing on labor supply. This instrument arises from the fact that some parents prefer a mixed sibling sex composition. In particular, among parents who have at least two children, those with two boys or two girls are much more likely to go on to have a third child. Because child sex is virtually randomly assigned, a dummy for same sex sibling pairs provides an instrumental variable that can be used to identify the effect of childbearing on a range of economic and family outcomes.
My earlier work with Evans using the same sex instrument focused on the effects of childbearing on labor supply. While labor supply outcomes also appear in this paper, the empirical work features an investigation of the effects of childbearing on marital stability. An inquiry into the effects of family size on marital stability can be motivated by American welfare reform, which penalizes further childbearing by women receiving public assistance on the grounds that increases in family size make continued poverty and welfare receipt more likely. I therefore look at effects on marital status, poverty status, and welfare use, as well as labor supply. Estimates of ATE for the effects of childbearing are generally smaller than estimates of LATE. For example, while estimates of LATE for the effect of childbearing on welfare use and marital stability are mostly significantly different from zero, most (though not all) of the estimates of ATE for effects of childbearing on marital status and welfare use are small and insignificant. One tantalizing result is that for teen mothers, LATE appears to be virtually identical to the population average treatment effect when the latter is imputed under the assumptions considered below.

The empirical results suggest a pattern of modest effects, but the variability in parameter estimates across model specifications and samples, as well as the usual problem of more imprecise estimates under weaker identifying assumptions, reduces the predictive value of any findings. On balance it seems fair to say that the attempt to go from LATE to ATE weakens the evidence for an adverse effect of childbearing on marital stability and welfare use, but the estimates of ATE do not provide a sharp alternative to LATE. This is perhaps not surprising, given the difficulty of the underlying identification problem. As in the experimental sciences, the best evidence for predictive value is likely to come from new experiments, which in the case of applied econometrics usually means new data sets and new instruments.
1. Causality and Potential Outcomes in Research on Childbearing

The effects of children on marital stability have long been of interest to social scientists and are of course of more than academic interest to many married couples. Previous research (e.g., Becker, Landes, and Michael, 1977; Cherlin, 1977; Heaton, 1990 and Waite and Lillard, 1991) suggests the presence of young children increases marital stability, although many authors acknowledge serious selection problems that may bias results. A related issue is the connection between childbearing and women's standard of living. A large literature looks at the effect of teen childbearing on mothers' schooling, earnings, and welfare status, sometimes using instrumental variables (e.g., Bronars and Grogger, 1994). Interest in this question can be motivated by welfare reform, which includes "family caps" in many U.S. states. Family caps reduce or eliminate benefits paid for children born to welfare recipients, on the theory that further childbearing by welfare mothers increases the likelihood they will stay poor and therefore continue to receive benefits.²
Are children the glue that holds couples together or a burden that accelerates a fragile family's collapse? Does childbearing further impoverish poor women? Implicit in these questions is the notion of potential outcomes, i.e., a contrast in circumstances with and without childbearing, for a given family. To represent this idea formally, let $D_i$ be an indicator for women with more than two children in a sample of women with at least two children. Because $D_i$ is binary, I will refer to it as a "treatment," even though family size is not determined directly by a program or policy. Let $Y_{1i}$ be a woman's circumstances if $D_i=1$, and let $Y_{0i}$ be her circumstances otherwise. We imagine both of these potential outcomes are well-defined for everyone, though only one is ever observed for each woman. Formally, this can be expressed by writing the observed outcome, $Y_i$, as

$$Y_i = Y_{0i}(1-D_i) + Y_{1i}D_i.$$
For both practical and substantive reasons, I focus here on fertility consequences defined with reference to the transition from two to more than two children. On the practical side, instruments based on sibling sex composition are available for this fertility increment. Angrist and Evans (1998) used parents' preferences for a mixed sibling sex composition to estimate the labor supply consequences of childbearing. On the substantive side, post-war reductions in marital fertility have been concentrated in the 2-3 child range (see, e.g., Westoff, Potter, and Sagi, 1963). While almost all couples want at least one child, the decision to have a third child may be due in part to a sense of whether this is good for long-term marital stability or, more generally, the economic welfare of the family. Finally, interest in the 2-to-3 child increment is supported by the fact that the population of welfare mothers in 1990 had an average of about 2.3 children (and a median of 2).

²See Maynard, et al (1998) for more on the motivation for family caps. The possibility of a link between childbearing and poverty notwithstanding, there is little evidence that family caps actually affect fertility behavior. See, e.g., Grogger and Bronars (2001), Blank (2002) or Kearney (2002).

Since $Y_{1i}$ and $Y_{0i}$ are never both observed for the same woman, research on causal effects tries to capture the average difference in potential outcomes for different subpopulations. For example, we may be interested in $E[Y_{1i} - Y_{0i} | D_i=1]$, which is the effect on women who have a third child. Note that $E[Y_{1i} | D_i=1]$ is an observed quantity, so estimating $E[Y_{1i} - Y_{0i} | D_i=1]$ is equivalent to estimating the counter-factual average, $E[Y_{0i} | D_i=1]$. Alternately, we may be interested in the unconditional average treatment effect (ATE), $E[Y_{1i} - Y_{0i}]$, which can be used to make predictive statements about the impact of childbearing on a randomly chosen woman (or a woman with a particular set of characteristics if the analysis conditions on covariates). Estimation of ATE is equivalent to estimation of both counterfactual averages, $E[Y_{0i} | D_i=1]$ and $E[Y_{1i} | D_i=0]$.

Causal parameters are easy to describe but hard to measure. The observed difference in outcomes between those with $D_i=1$ and $D_i=0$ equals $E[Y_{1i} - Y_{0i} | D_i=1]$ plus a bias term:

$$E[Y_i | D_i=1] - E[Y_i | D_i=0] = E[Y_{1i} | D_i=1] - E[Y_{0i} | D_i=0]$$
$$= E[Y_{1i} - Y_{0i} | D_i=1] + \{E[Y_{0i} | D_i=1] - E[Y_{0i} | D_i=0]\}. \qquad (1)$$

The bias term disappears when childbearing is determined in a manner independent of a woman's potential outcomes. But this independence assumption seems unrealistic since childbearing decisions are made in light of information about family circumstances and earnings potential.
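Decomposition (1) can be checked numerically. The following is a minimal sketch, not part of the paper: the data-generating process (a constant gain of 0.5 and a selection rule under which low-$Y_{0i}$ units are more likely to be treated) and all names are hypothetical choices made only for illustration. It shows that the naive treated-control contrast equals the effect on the treated plus the selection-bias term.

```python
import random
from statistics import fmean


def simulate_selection_bias(n=200_000, seed=0):
    """Return (naive contrast, effect on the treated, bias term) from equation (1)."""
    rng = random.Random(seed)
    y1_treated, y0_treated, y0_control = [], [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)   # potential outcome without treatment
        y1 = y0 + 0.5              # hypothetical constant gain of 0.5
        # Hypothetical selection rule: units with low Y0 are more likely treated.
        treated = y0 + rng.gauss(0.0, 1.0) < 0
        if treated:
            y1_treated.append(y1)  # observed for the treated
            y0_treated.append(y0)  # counterfactual, visible only in a simulation
        else:
            y0_control.append(y0)  # observed for the untreated
    naive = fmean(y1_treated) - fmean(y0_control)           # E[Y|D=1] - E[Y|D=0]
    effect_on_treated = fmean(y1_treated) - fmean(y0_treated)
    bias = fmean(y0_treated) - fmean(y0_control)            # E[Y0|D=1] - E[Y0|D=0]
    return naive, effect_on_treated, bias


if __name__ == "__main__":
    naive, tot, bias = simulate_selection_bias()
    print(f"naive={naive:.3f}  effect on treated={tot:.3f}  bias={bias:.3f}")
```

In this made-up design the true effect is positive while the naive contrast is negative, because the bias term dominates.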
Two sorts of strategies are typically used to estimate causal effects in the presence of possible omitted variables bias. One assumes that conditional on covariates, $X_i$, the regressor of interest, $D_i$, is independent of potential outcomes. Then any causal effect of interest can be estimated from weighted conditional-on-X comparisons. This is a strong assumption that seems most plausible when researchers have considerable prior information about the process determining $D_i$. Alternately, we might try to find an instrumental variable which, perhaps after conditioning on covariates, is related to $D_i$ but independent of potential outcomes. The instrument used here is a dummy variable indicating same-sex sibling pairs.

2. IV in Context

IV estimates capture the effect of treatment on the treated for those whose treatment status can be changed by the instrument at hand. This idea is easiest to formalize using a notation for potential treatment assignments that parallels the notation for potential outcomes. In particular, let $D_{0i}$ and $D_{1i}$ denote potential treatment assignments indexed relative to a binary instrument. Suppose, for example, $D_i$ is determined by a latent-index assignment mechanism,

$$D_i = 1(\gamma_0 + \gamma_1 Z_i > \eta_i), \qquad (2)$$

where $Z_i$ is a binary instrument, and $\eta_i$ is a random error independent of the instrument. Then the potential treatment assignments are $D_{0i} = 1[\gamma_0 > \eta_i]$ and $D_{1i} = 1[\gamma_0 + \gamma_1 > \eta_i]$, both of which are independent of $Z_i$. The constant-effects latent-index assignment model is restrictive since it implies $D_{1i} \geq D_{0i}$ for all $i$, or vice versa. We can relax this restriction by allowing a random $\gamma_{1i}$ for each $i$, in which case the latent index model is just an alternative notation for $D_{0i}$ and $D_{1i}$. Whether linked to an index model or not, $D_{0i}$ tells us what treatment $i$ would receive if $Z_i=0$, and $D_{1i}$ tells us what treatment $i$ would receive if $Z_i=1$. The observed assignment variable, $D_i$, can therefore be written:

$$D_i = D_{0i}(1-Z_i) + D_{1i}Z_i.$$

This notation makes it clear that, paralleling potential outcomes, only one potential assignment is ever observed for a particular individual.
The key assumptions supporting IV estimation are given below (for a model without covariates):

INDEPENDENCE. $\{Y_{0i}, Y_{1i}, D_{0i}, D_{1i}\} \perp Z_i$.
FIRST STAGE. $P[D_i=1 | Z_i=1] \neq P[D_i=1 | Z_i=0]$.
MONOTONICITY. Either $D_{1i} \geq D_{0i}$ $\forall i$ or vice versa; without loss of generality, assume the former.

These assumptions capture the notion that the instrument is "as good as randomly assigned" (independence), affects the probability of treatment (first-stage), and affects everyone the same way if at all (monotonicity). Imbens and Angrist [1994] show that together they imply:

$$\frac{E[Y_i | Z_i=1] - E[Y_i | Z_i=0]}{E[D_i | Z_i=1] - E[D_i | Z_i=0]} = E[Y_{1i} - Y_{0i} | D_{1i} > D_{0i}].$$
The left-hand side of this expression is the population analog of Wald's (1940) estimator for regression models with measurement error. The Local Average Treatment Effect (LATE) on the right hand side, $E[Y_{1i} - Y_{0i} | D_{1i} > D_{0i}]$, is the effect of treatment on those whose treatment status is changed by the instrument, i.e., the population for which $D_{1i}=1$ and $D_{0i}=0$.³

³Proof of the LATE result: $E[Y_i | Z_i=1] = E[Y_{0i} + (Y_{1i}-Y_{0i})D_i | Z_i=1]$, which equals $E[Y_{0i} + (Y_{1i}-Y_{0i})D_{1i}]$ by independence. Likewise $E[Y_i | Z_i=0] = E[Y_{0i} + (Y_{1i}-Y_{0i})D_{0i}]$, so the Wald numerator is $E[(Y_{1i}-Y_{0i})(D_{1i}-D_{0i})]$. Monotonicity means $D_{1i}-D_{0i}$ equals one or zero, so $E[(Y_{1i}-Y_{0i})(D_{1i}-D_{0i})] = E[Y_{1i}-Y_{0i} | D_{1i}>D_{0i}]P[D_{1i}>D_{0i}]$. A similar argument shows $E[D_i | Z_i=1] - E[D_i | Z_i=0] = E[D_{1i}-D_{0i}] = P[D_{1i}>D_{0i}]$.

A standard assumption invoked in most empirical studies is constant causal effects, i.e., $Y_{1i} = Y_{0i} + \alpha$, for some constant $\alpha$. In the childbearing application, a constant effects assumption implies that IV consistently estimates the common effects of childbearing on all women, since, given constant effects, $E[Y_{1i} - Y_{0i} | D_{1i} > D_{0i}] = \alpha$. The LATE result above highlights the fact that in a more realistic world where this effect varies (and indeed it must vary if, for example, $Y_i$ is a binary outcome or other variable with limited support), then we can be sure only that IV captures the effect on individuals whose treatment status can be changed by manipulating $Z_i$. These are people with $D_{1i}=1$ and $D_{0i}=0$, or $D_{1i}-D_{0i}=1$. Note also that since $D_{1i}$ and $D_{0i}$ are defined with reference to a particular instrument, then - again, in the absence of additional assumptions - we should expect different instruments to uncover different average causal effects. We might, for example, expect an IV strategy based on the same sex instrument to identify a different average effect than an instrument based on twin births. In fact, Angrist and Evans (1998) report IV estimates using twin birth instruments that are much lower than those using same sex instruments.
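The LATE result can be illustrated with a small simulation. This is a sketch under assumed, purely illustrative values: the latent-index coefficients (0 and 0.5), the gain function `1 - 0.5 * eta`, and the function name are hypothetical and do not come from the paper. The sample Wald ratio should be close to the average gain among compliers and, because gains vary with the first-stage error, different from the population average gain.

```python
import random


def wald_and_complier_effect(n=400_000, seed=1):
    """Return (Wald estimate, mean gain among compliers, mean gain overall)."""
    rng = random.Random(seed)
    g0, g1 = 0.0, 0.5                       # hypothetical first-stage coefficients
    sum_y, sum_d, count = [0.0, 0.0], [0.0, 0.0], [0, 0]
    complier_gain, n_compliers, total_gain = 0.0, 0, 0.0
    for _ in range(n):
        z = rng.randint(0, 1)               # instrument, as good as randomly assigned
        eta = rng.gauss(0.0, 1.0)           # latent first-stage error
        d0 = int(g0 > eta)                  # potential assignment when Z=0
        d1 = int(g0 + g1 > eta)             # potential assignment when Z=1
        y0 = rng.gauss(0.0, 1.0)
        gain = 1.0 - 0.5 * eta              # heterogeneous gain (assumption)
        d = d0 * (1 - z) + d1 * z           # observed treatment
        y = y0 + gain * d                   # observed outcome
        sum_y[z] += y
        sum_d[z] += d
        count[z] += 1
        total_gain += gain
        if d1 > d0:                         # complier status, known only in a simulation
            complier_gain += gain
            n_compliers += 1
    reduced_form = sum_y[1] / count[1] - sum_y[0] / count[0]
    first_stage = sum_d[1] / count[1] - sum_d[0] / count[0]
    return reduced_form / first_stage, complier_gain / n_compliers, total_gain / n


if __name__ == "__main__":
    wald, late, ate = wald_and_complier_effect()
    print(f"Wald={wald:.3f}  complier mean gain={late:.3f}  overall mean gain={ate:.3f}")
```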
Angrist, Imbens, and Rubin (1996) refer to people with $D_{1i}-D_{0i}=1$ as the population of compliers. This terminology is motivated by an analogy to randomized trials where $Z_i$ is a randomized offer of treatment and $D_i$ is actual treatment status. Since $D_{1i}-D_{0i}=1$ implies $D_i=Z_i$, compliers are those who comply with an experimenter's intended treatment status (though not all those with $D_i=Z_i$ are compliers, as explained below). For compliers, the averages of $Y_{1i}$ and $Y_{0i}$ as well as the average difference are also identified. In particular, Abadie (2002) shows that

$$\frac{E[Y_i D_i | Z_i=1] - E[Y_i D_i | Z_i=0]}{E[D_i | Z_i=1] - E[D_i | Z_i=0]} = E[Y_{1i} | D_{1i} > D_{0i}] \qquad (3a)$$

$$\frac{E[Y_i(1-D_i) | Z_i=1] - E[Y_i(1-D_i) | Z_i=0]}{E[(1-D_i) | Z_i=1] - E[(1-D_i) | Z_i=0]} = E[Y_{0i} | D_{1i} > D_{0i}]. \qquad (3b)$$

The entire (marginal) distributions of $Y_{1i}$ and $Y_{0i}$ are similarly identified, a fact used by Abadie, Angrist, and Imbens (2002) to estimate the causal effect of treatment on the quantiles of potential outcomes for compliers.
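Sample analogs of (3a) and (3b) are simple ratios of differences in means. The sketch below is illustrative only: the simulated design (index coefficients -0.5 and 1.0, outcomes that depend on the first-stage error) and the function name are assumptions, not the paper's data or code. It compares the two ratios with the complier means of $Y_{1i}$ and $Y_{0i}$, which are observable here only because the data are simulated.

```python
import random


def complier_means(n=200_000, seed=2):
    """Return (estimate of E[Y1|complier], estimate of E[Y0|complier], true Y1 mean, true Y0 mean)."""
    rng = random.Random(seed)
    g0, g1 = -0.5, 1.0                        # hypothetical first-stage coefficients
    count = [0, 0]
    sum_d, sum_yd, sum_ynd = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    true_y1, true_y0, n_compliers = 0.0, 0.0, 0
    for _ in range(n):
        z = rng.randint(0, 1)
        eta = rng.gauss(0.0, 1.0)
        d0 = int(g0 > eta)
        d1 = int(g0 + g1 > eta)
        y0 = 0.3 * eta + rng.gauss(0.0, 1.0)  # Y0 correlated with eta (assumption)
        y1 = y0 + 1.0 - 0.5 * eta             # heterogeneous gain (assumption)
        d = d0 * (1 - z) + d1 * z
        y = y0 * (1 - d) + y1 * d
        count[z] += 1
        sum_d[z] += d
        sum_yd[z] += y * d                    # Y*D, numerator of (3a)
        sum_ynd[z] += y * (1 - d)             # Y*(1-D), numerator of (3b)
        if d1 > d0:
            n_compliers += 1
            true_y1 += y1
            true_y0 += y0

    def diff(sums):
        """Difference in means between Z=1 and Z=0."""
        return sums[1] / count[1] - sums[0] / count[0]

    first_stage = diff(sum_d)
    est_y1 = diff(sum_yd) / first_stage       # equation (3a)
    est_y0 = diff(sum_ynd) / (-first_stage)   # equation (3b): E[1-D|Z] differences flip sign
    return est_y1, est_y0, true_y1 / n_compliers, true_y0 / n_compliers


if __name__ == "__main__":
    print(complier_means())
```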
An important econometric result in the theory of causal effects is that when treatment is assigned by a mechanism like (2), population average treatment effects and the effect on the treated are not identified without assumptions such as constant effects or some other assumption beyond the 3 given above. This result or theorem appears in various forms; see, for example, Chamberlain (1986), Heckman (1990), and Angrist and Imbens (1991). The next section develops a framework that highlights the limits to identification and the role played by alternative homogeneity assumptions in efforts to go beyond LATE. The same sex instrument offers an especially challenging proving ground for these ideas since at most 7% of American women have an additional child as a result of sex preferences. Causal effects on same sex compliers can therefore be quite far from overall average effects if the impact of childbearing on these women is not typical. Before turning to a general discussion of treatment effect heterogeneity, I briefly explore the relationship between LATE, ATE, and effects on the treated in a parametric model that mimics the same sex setup.

2.1 A Parametric Example
Following Heckman, Tobias, and Vytlacil (2001), I calculated average causal effects using a trivariate Normal model for the joint distribution of potential outcomes and the error term in the latent-index assignment mechanism given by equation (2). Assuming the distribution of $[Y_{1i}, Y_{0i}, \eta_i]'$ is joint standard Normal, ATE is zero by construction. Assume also that $\gamma_1 > 0$ so monotonicity is satisfied with $D_{1i} \geq D_{0i}$, and let $\rho_{10}$ be the correlation between $Y_{1i} - Y_{0i}$ and $\eta_i$. In this parametric model, LATE can be written:

$$E[Y_{1i}-Y_{0i} | D_{1i} > D_{0i}] = E[Y_{1i}-Y_{0i} | \gamma_0+\gamma_1 > \eta_i > \gamma_0]$$
$$= \rho_{10}\{[\phi(\gamma_0) - \phi(\gamma_0+\gamma_1)][\Phi(\gamma_0+\gamma_1) - \Phi(\gamma_0)]^{-1}\}, \qquad (4)$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ are the Normal density and distribution functions. Similarly, we can use Normality to write the effect on the treated as:

$$E[Y_{1i}-Y_{0i} | D_i=1] = E\{E[Y_{1i}-Y_{0i} | \gamma_0+\gamma_1 Z_i > \eta_i, Z_i] | D_i=1\}$$
$$= -\rho_{10}\{\lambda(\gamma_0+\gamma_1)E[Z_i | D_i=1] + \lambda(\gamma_0)(1 - E[Z_i | D_i=1])\}, \qquad (5)$$

where $\lambda(\cdot)$ is the inverse Mill's ratio, $\phi(\cdot)/\Phi(\cdot)$. This formula is useful for calculation, but the following expression better clarifies the difference between LATE and the effect on the treated:

$$E[Y_{1i}-Y_{0i} | D_i=1] = E[Y_{1i}-Y_{0i} | \gamma_0+\gamma_1 > \eta_i > \gamma_0]\,\omega + E[Y_{1i}-Y_{0i} | \gamma_0 > \eta_i](1-\omega), \qquad (6)$$

where $\omega = \{\Phi(\gamma_0+\gamma_1) - \Phi(\gamma_0)\}[P(Z_i=1)/P(D_i=1)]$ and $1-\omega = \Phi(\gamma_0)/P(D_i=1)$. Equation (6) shows the effect on the treated to be a weighted average of LATE and the average effect on those with $\gamma_0 > \eta_i$, with weights that depend on the first stage and the distribution of $Z_i$.

2.1.1 The Effect on the Treated vs. LATE

LATE and the effect on the treated both depend on the correlation between potential outcomes and the latent first-stage error, and on the first-stage coefficients. The effect on the treated also depends on the distribution of the instrument. The relationship between alternative causal parameters in the parametric model is sketched in Fig. 1, which plots ATE (a constant equal to zero), LATE, and the effect on the treated against $\Phi(\gamma_0)$ for a fixed first stage of .07 and an instrument that is Bernoulli(.5). In other words, as with the same sex instrument in Angrist and Evans (1998), the simulated instrument is a dummy that equals one with probability 1/2 and increases the probability that $D_i$ equals 1 by 7 percentage points. The top panel of Fig. 1 sets $\rho_{10} = -.1$, so that the probability of treatment increases with the gains from treatment, as in a Roy (1951) model, while the bottom panel sets $\rho_{10} = -.5$ for stronger selection on gains. With positive $\rho_{10}$, the figure would be reflected through the horizontal axis.

The leftmost point in the figure shows that LATE equals the effect on the treated when $\Phi(\gamma_0) = E[D_i | Z_i=0] = 0$. This is incompatible with the Normal latent-index model since it requires $\gamma_0 = -\infty$, but $E[D_i | Z_i=0]=0$ is an important special case in practice, most commonly in randomized trials with partial compliance in the treated group only (see, e.g., Bloom, 1984 or Angrist and Imbens, 1991).⁴ At the other end of the figure, the effect on the treated approaches the overall average effect when almost everyone gets treated. Finally, Fig. 2 shows that increasing the size of the first stage effect from .07 to .30 pulls both LATE and the effect on the treated closer to the overall average effect.

⁴A leading example is the randomized trial used to evaluate subsidized training programs offered through the Job Training Partnership Act, one of America's largest Federally-sponsored training programs. Subsidized training was offered but not compulsory in the randomly selected treatment group. About 60% of those offered treatment took up the offer, so $E[D_i | Z_i=1]=.6$, where $Z_i$ is the randomized offer of treatment and $D_i$ is actual training status. On the other hand, (virtually) no one in the control group received treatment, so $E[D_i | Z_i=0]=0$. In this case, LATE is the effect on the treated because the set of always-takers is virtually empty. See Orr, et al (1996) for an IV analysis of the JTPA.

The effect of treatment on the treated is above LATE for all first-stage baseline values, a consequence of the fact that selection on gains makes $E[Y_{1i}-Y_{0i} | \gamma_0 > \eta_i]$ bigger than LATE. Moreover, LATE provides a better measure of the effect of treatment on a randomly chosen individual (ATE) than does the effect on the treated for most parameter values. A final important feature of the figure (also apparent from equation (4)) is that LATE=ATE when $\gamma_1 = -2\gamma_0$ since $\phi(\gamma_0) = \phi(-\gamma_0)$ by symmetry of the Normal density. Thus, as noted by Heckman and Vytlacil (2000), a "symmetric first stage" that changes the probability of treatment from $p$ to $1-p$ implies LATE equals ATE in the Normal model, or in any latent variable model with jointly symmetric errors.⁵
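Formulas (4), (5), and (6) are easy to evaluate directly. The sketch below is a hedged illustration rather than the code behind the paper's figures: the function names and the helper `index_coefficients`, which backs out $\gamma_0$ and $\gamma_1$ from a baseline treatment probability and a first-stage effect, are hypothetical. It uses Python's `statistics.NormalDist` for the Normal density, distribution function, and quantile function.

```python
from statistics import NormalDist

_N = NormalDist()  # standard Normal: pdf = phi, cdf = Phi


def index_coefficients(baseline, first_stage):
    """Back out (g0, g1) so that Phi(g0)=baseline and Phi(g0+g1)=baseline+first_stage."""
    g0 = _N.inv_cdf(baseline)
    return g0, _N.inv_cdf(baseline + first_stage) - g0


def late(g0, g1, rho):
    """Equation (4)."""
    return rho * (_N.pdf(g0) - _N.pdf(g0 + g1)) / (_N.cdf(g0 + g1) - _N.cdf(g0))


def effect_on_treated(g0, g1, rho, pz=0.5):
    """Equation (5), with Z ~ Bernoulli(pz)."""
    p_treated = pz * _N.cdf(g0 + g1) + (1 - pz) * _N.cdf(g0)   # P(D=1)
    ez_treated = pz * _N.cdf(g0 + g1) / p_treated              # E[Z | D=1]

    def inv_mills(c):
        return _N.pdf(c) / _N.cdf(c)

    return -rho * (inv_mills(g0 + g1) * ez_treated + inv_mills(g0) * (1 - ez_treated))


def effect_on_treated_weighted(g0, g1, rho, pz=0.5):
    """Equation (6): weighted average of LATE and the effect on those with g0 > eta."""
    p_treated = pz * _N.cdf(g0 + g1) + (1 - pz) * _N.cdf(g0)
    omega = (_N.cdf(g0 + g1) - _N.cdf(g0)) * pz / p_treated
    always_taker_effect = -rho * _N.pdf(g0) / _N.cdf(g0)       # E[Y1-Y0 | g0 > eta]
    return late(g0, g1, rho) * omega + always_taker_effect * (1 - omega)


if __name__ == "__main__":
    g0, g1 = index_coefficients(baseline=0.30, first_stage=0.07)
    print("LATE:", late(g0, g1, rho=-0.1))
    print("Effect on treated, eq. (5):", effect_on_treated(g0, g1, rho=-0.1))
    print("Effect on treated, eq. (6):", effect_on_treated_weighted(g0, g1, rho=-0.1))
    print("LATE with symmetric first stage:", late(-0.5, 1.0, rho=-0.1))
```

With a negative $\rho_{10}$ the effect on the treated exceeds LATE, which in turn exceeds the ATE of zero, and a symmetric first stage ($\gamma_1 = -2\gamma_0$) sets LATE to zero.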
3. Identification Problems and Prospects

Angrist, Imbens, and Rubin (1996) show that the potential-outcomes framework for IV divides a population into three groups, which I refer to below as "potential-assignment subpopulations." The first consists of compliers, i.e., those for whom $D_{1i}=1$ and $D_{0i}=0$. In the latent index model, compliers have $\gamma_0+\gamma_1 > \eta_i > \gamma_0$. The other two groups include individuals whose treatment status is unaffected by the instrument. One consists of never-takers, with $D_{1i}=D_{0i}=0$. Never-takers are never treated regardless of the value of $Z_i$ to which they might be exposed. In the latent index model, never-takers have $\eta_i > \gamma_0+\gamma_1$. The second unaffected group consists of always-takers, with $D_{1i}=D_{0i}=1$. Always-takers are always treated regardless of the value of $Z_i$ to which they might be exposed. In the latent-index model, always-takers have $\gamma_0 > \eta_i$. A possible fourth group with $D_{0i}=1$ and $D_{1i}=0$ is empty by virtue of the monotonicity assumption.

The set of the treated is the union of the disjoint sets of always-takers and compliers with $Z_i=1$. This provides an interpretation for the following identity:

$$D_i = D_{0i} + (D_{1i}-D_{0i})Z_i,$$

since $D_{0i}=1$ indicates always-takers and $(D_{1i}-D_{0i})Z_i$ indicates compliers with $Z_i=1$. Since $Z_i$ is independent of complier status, compliers with $Z_i=1$ are representative of all compliers. Causal effects on the treated can therefore be decomposed as:

$$E[Y_{1i} - Y_{0i} | D_i=1] = E[Y_{1i} - Y_{0i} | D_{1i}>D_{0i}]\{1 - P(D_{0i}=D_{1i}=1 | D_i=1)\} + E[Y_{1i} - Y_{0i} | D_{0i}=D_{1i}=1]P(D_{0i}=D_{1i}=1 | D_i=1). \qquad (7)$$

Equation (7) generalizes (6), which gives the same decomposition for the Normal model. Because an instrumental variable provides no information about average treatment effects in the set of always-takers, LATE is identified while $E[Y_{1i}-Y_{0i} | D_i=1]$ is not.

⁵Joint symmetry means that if $f(y_j, \eta_i)$ is the joint density of $y_{ji} = Y_{ji} - E[Y_{ji}]$ and $\eta_i$, then $f(-y_j, -\eta_i) = f(y_j, \eta_i)$. A weaker condition with the same result (a symmetric first stage ranging from $p$ to $1-p$ gives LATE=ATE) is that $E[y_{ji} | \eta_i]$ is an odd function (as for a linear model) and that $\eta_i$ has a symmetric distribution. Angrist (1991) somewhat more loosely noted that IV estimates should be close to ATE when the first stage changes the probability of treatment at values centered on one-half, as is required for the first stage to be symmetric.

To further pinpoint the identification challenge in this context, note that $E[Y_{1i} | D_{0i}=D_{1i}=1]$ and $E[Y_{0i} | D_{0i}=D_{1i}=0]$ can be estimated using the following relations:

$$E[Y_{1i} | D_{0i}=D_{1i}=1] = E[Y_{1i} | D_{0i}=1] = E[Y_i | D_i=1, Z_i=0] \qquad (8a)$$
$$E[Y_{0i} | D_{0i}=D_{1i}=0] = E[Y_{0i} | D_{1i}=0] = E[Y_i | D_i=0, Z_i=1]. \qquad (8b)$$

The missing pieces of the identification puzzle are therefore the fully counter-factual averages, $E[Y_{1i} | D_{0i}=D_{1i}=0]$ and $E[Y_{0i} | D_{0i}=D_{1i}=1]$.
3.1 Restricting Potential-Assignment Subpopulations

The conditional expectation functions (CEFs) of $Y_{1i}$ and $Y_{0i}$ given potential assignments provide a framework for the discussion of alternative identification strategies. These CEFs can be written:

$$E[Y_{1i} | D_{0i}, D_{1i}] = \alpha_1 + \beta_{10}D_{0i} + \beta_{11}D_{1i} \qquad (9a)$$
$$E[Y_{0i} | D_{0i}, D_{1i}] = \alpha_0 + \beta_{00}D_{0i} + \beta_{01}D_{1i} \qquad (9b)$$

Equations (9a) and (9b) impose no restrictions since there are three potential-assignment subpopulations and three parameters in each CEF. The 6 conditional means, $E[Y_{ji} | D_{0i}, D_{1i}]$, are uniquely determined by (9a,b) as follows:

Group | Definition | Indicator | CEF for $Y_{0i}$ | CEF for $Y_{1i}$
Compliers | $D_{1i}=1, D_{0i}=0$ | $D_{1i}-D_{0i}$ | $\alpha_0 + \beta_{01}$ | $\alpha_1 + \beta_{11}$
Always-takers | $D_{1i}=D_{0i}=1$ | $D_{0i}$ | $\alpha_0 + \beta_{00} + \beta_{01}$ | $\alpha_1 + \beta_{10} + \beta_{11}$
Never-takers | $D_{1i}=D_{0i}=0$ | $1-D_{1i}$ | $\alpha_0$ | $\alpha_1$

The CEF for observed outcomes, $E[Y_i | D_i, Z_i]$, has a distribution with 4 points of support, while the CEFs of $Y_{0i}$ and $Y_{1i}$ given $D_{0i}$ and $D_{1i}$ depend on 6 parameters. This suggests the latter are not identified from the former without additional restrictions, a result implied by the theorem below.
THEOREM: Suppose the Independence, First-Stage, and Monotonicity assumptions hold and that $Y_{0i}$ and $Y_{1i}$ have multinomial distributions. Let $f_0(y | D_{1i}, D_{0i})$ and $f_1(y | D_{1i}, D_{0i})$ denote the conditional distribution functions for potential outcomes given potential assignments and let $f_{YDZ}(y, d, z)$ denote the joint distribution of $Y_i$, $D_i$, and $Z_i$. Then $f_0(y | D_{1i}, D_{0i})$ and $f_1(y | D_{1i}, D_{0i})$ are not identified from $f_{YDZ}(y, d, z)$.

Proof: Factor the d.f. using $f_{YDZ}(y, d, z) = f_{Y|DZ}(y | d, z)g_{DZ}(d, z)$. The second term is unrestricted. Let $f_j(y | D_{1i}, D_{0i}) = \alpha_j(y) + \beta_{j0}(y)D_{0i} + \beta_{j1}(y)D_{1i}$, and substitute into $f_{Y|DZ}(y | d, z)$, iterate expectations to obtain the multinomial likelihood solely as a function of the parameters determining $f_0(y | D_{1i}, D_{0i})$ and $f_1(y | D_{1i}, D_{0i})$. Finally, substitute for $f_0(y | D_{1i}, D_{0i})$ and $f_1(y | D_{1i}, D_{0i})$ to show the likelihood is invariant to the choice of $\beta_{00}(y)$ and $\beta_{11}(y)$ as long as $\alpha_1(y)+\beta_{11}(y)$ is constant. Non-identification of $\beta_{00}(y)$ implies non-identification of the marginal distribution of $Y_{0i}$ while non-identification of $\beta_{11}(y)$ implies non-identification of the marginal distribution of $Y_{1i}$.
The multinomial distributional assumption raises the question of how general the theorem is. It seems general enough for practical purposes since, as noted by Chamberlain (1987), any distribution can be approximated arbitrarily well by a multinomial. Moreover, I'd like to rule out identification based on continuity or support conditions to avoid paradoxes such as "identification at infinity".⁶

⁶See Chamberlain (1986). The multinomial assumption has some content since it implies that potential outcomes have bounded support, so that ATE and effects on the treated are bounded. See Manski (1990) or Heckman and Vytlacil (2000).

3.2 A Menu of Restrictions

A variety of restrictions on (9a,b) are sufficient to identify ATE. I briefly discuss 4 cases that strike me as being of special interest. The simplest is ignorable treatment assignment or "no selection bias."
Restriction 1 (NO SELECTION BIAS). $\beta_{00} = \beta_{01} = \beta_{10} = \beta_{11} = 0$.

This implies LATE $= \alpha_1 - \alpha_0 =$ ATE. Under Restriction 1, ATE can be estimated from simple treatment-control comparisons.

Because the assumption of no selection bias involves four restrictions while two would be sufficient, ATE is over-identified in this case.^8 A standard Hausman (1978) test for endogeneity exploits over-identification by comparing IV and OLS estimates, equivalent here to a comparison of Wald estimates with treatment-control differences. A modified and potentially more powerful test can be based on the fact that under Restriction 1, $E[Y_{1i} \mid D_{0i}=1] = \alpha_1$ and $E[Y_{0i} \mid D_{1i}=0] = \alpha_0$. Using (8a,b), this suggests the following specification test:

Test for Selection Bias.
$$\frac{E[Y_i \mid Z_i=1] - E[Y_i \mid Z_i=0]}{E[D_i \mid Z_i=1] - E[D_i \mid Z_i=0]} = \{E[Y_i \mid D_i=1, Z_i=0] - E[Y_i \mid D_i=0, Z_i=1]\}. \tag{T1}$$
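The two sides of (T1) are simple sample moments, so the contrast can be sketched in a few lines. The following is only an illustration under stated assumptions: it is not the regression-based test statistic described in the appendix, the function name `selection_bias_contrast` is hypothetical, and the toy data are made up. It computes the Wald estimate (left-hand side) and the difference between treated units with $Z_i=0$ and untreated units with $Z_i=1$ (right-hand side).

```python
def mean(values):
    """Average of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def selection_bias_contrast(y, d, z):
    """Return both sides of (T1) for binary treatment d and binary instrument z."""
    rows = list(zip(y, d, z))
    # Left-hand side: the Wald (IV) estimate.
    reduced_form = (mean(yi for yi, _, zi in rows if zi == 1)
                    - mean(yi for yi, _, zi in rows if zi == 0))
    first_stage = (mean(di for _, di, zi in rows if zi == 1)
                   - mean(di for _, di, zi in rows if zi == 0))
    wald = reduced_form / first_stage
    # Right-hand side: treated with Z=0 (always-takers) minus
    # untreated with Z=1 (never-takers).
    no_selection = (mean(yi for yi, di, zi in rows if di == 1 and zi == 0)
                    - mean(yi for yi, di, zi in rows if di == 0 and zi == 1))
    return wald, no_selection


# Hypothetical toy data, for illustration only.
y = [3, 5, 1, 1, 4, 0, 2, 2]
d = [1, 1, 0, 0, 1, 0, 0, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
wald, no_selection = selection_bias_contrast(y, d, z)
print(wald, no_selection)  # 2.0 3.0
```

A formal test would also need a standard error for the difference between the two quantities, which this sketch does not attempt.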
In the appendix, I show how a test statistic based on T1 can be computed using regression software. The Hausman test for selection bias replaces $E[Y_i \mid D_i=1, Z_i=0] - E[Y_i \mid D_i=0, Z_i=1]$ on the right hand side of T1 with $E[Y_i \mid D_i=1] - E[Y_i \mid D_i=0]$. The Hausman test will also work in the causal framework outlined here since under Restriction 1 both OLS and IV estimate ATE. The difference between T1 and a Hausman test arises from the fact that the Hausman test implicitly compares $E[Y_{ji} \mid D_{1i} > D_{0i}]$ with $E[Y_{ji} \mid D_i = j]$ for $j=0,1$, while T1 implicitly compares $E[Y_{ji} \mid D_{1i} > D_{0i}]$ with $E[Y_{ji} \mid D_{1i} = D_{0i} = j]$ for $j=0,1$. These two pairs of comparisons are the same under monotonicity but not in general. The empirical results below suggest that T1, which uses monotonicity, indeed provides a more powerful specification test.^9

^8 A weaker version of Restriction 1 with $\beta_{11} = \beta_{00} = 0$ is also sufficient to identify ATE since this equates never-takers with compliers for the CEF of $Y_{1i}$ and always-takers with compliers for the CEF of $Y_{0i}$. This seems no easier to motivate than Restriction 1, so I limit the discussion to the over-identified case.

^9 Abadie (2002) develops a number of related bootstrap specification tests.

While pivotal for specification testing, the assumption of no selection bias is an unattractive basis for causal inference since the use of IV is motivated by the possibility of selection bias. An alternative assumption that allows for selection bias amounts to the claim that the difference between $Y_{1i}$ and $Y_{0i}$ is mean-independent of potential treatment assignments. I refer to this as "conditional constant effects." Formally, this means:

Restriction 2 (conditional constant effects). $\beta_{00} = \beta_{10}$; $\beta_{01} = \beta_{11}$.

This pair of restrictions is just sufficient to identify ATE. In particular, we again have LATE $= \alpha_1 - \alpha_0 =$ ATE, or, equivalently, $E[Y_{1i} - Y_{0i} \mid D_{0i}, D_{1i}] = E[Y_{1i} - Y_{0i}]$. While Restriction 2 allows for selection bias in the sense that $Y_{1i}$ and $Y_{0i}$ are correlated with potential treatment assignments, the correlation is restricted to be the same for both potential outcomes, so that the difference between $Y_{1i}$ and $Y_{0i}$ is orthogonal to potential treatment assignments.

In the same sex example, Restriction 2 amounts to saying that average treatment effects, while not constant, are nevertheless the same regardless of a woman's likelihood of having children. Restriction 2 rules out Roy (1951) type selection, where treatment status is determined at least in part by the gains from treatment. In the case of childbearing, for example, a woman's childbearing decision must (somewhat implausibly) be independent of individual-level variation in the labor-supply consequences of childbearing. On the plus side, Restriction 2 is weaker than the usual constant-effects assumption in that it does not require a deterministic link between $Y_{1i}$ and $Y_{0i}$.^10

^10 Note that the first part of Restriction 2 is sufficient to identify the effect of treatment on the treated, while the second part is sufficient to identify the effect of treatment on the non-treated. Although conditional constant effects is the basis of much empirical work and may be a reasonable approximation for practical purposes, as a theoretical matter this is typically implausible unless treatment is exogenous; see, e.g., Wooldridge (1997, 2003).
A third restriction, which I call "linearity," is appealing because it is not fundamentally inconsistent with a benchmark Roy-type selection model. The linearity condition is:

Restriction 3 (linearity). $\beta_{00} = \beta_{01}$; $\beta_{10} = \beta_{11}$.

In this case, the potential-outcomes CEFs can be written:
$$E[Y_{1i} \mid D_{0i}, D_{1i}] = \alpha_1 + \beta_{11}(D_{0i} + D_{1i}) \tag{10a}$$
$$E[Y_{0i} \mid D_{0i}, D_{1i}] = \alpha_0 + \beta_{01}(D_{0i} + D_{1i}). \tag{10b}$$

Restriction 3 requires the potential-outcomes CEF to be linear in $D_i^* = D_{0i} + D_{1i}$, where $D_i^*$ is a summary measure of the desire or suitability of an individual for treatment. If the restriction is false, we can nevertheless think of (10a) and (10b) as providing a minimum mean-squared error approximation to the unrestricted model, (9a) and (9b).

To see how average causal effects are identified under Restriction 3, write the probabilities of being an always-taker and never-taker as
$$P[D_{0i} = D_{1i} = 1] = E[D_{0i}] = p_a$$
$$P[D_{0i} = D_{1i} = 0] = E[1 - D_{1i}] = p_n,$$
and note that $E[D_{0i} + D_{1i}] = 1 + (p_a - p_n)$. Substitute into (10a) and (10b) and difference to obtain
$$E[Y_{1i} - Y_{0i}] = [(\alpha_1 + \beta_{11}) - (\alpha_0 + \beta_{01})] + (\beta_{11} - \beta_{01})(p_a - p_n) \tag{11}$$
$$= E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}] + \{(E[Y_{1i} \mid D_{0i}=1] - E[Y_{1i} \mid D_{1i} > D_{0i}]) - (E[Y_{0i} \mid D_{1i} > D_{0i}] - E[Y_{0i} \mid D_{1i}=0])\}(p_a - p_n).$$

The components on the right hand side of (11) are easily estimated; details are given in the appendix.
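To make the imputation in (11) concrete, here is a minimal sketch for the no-covariates case. It is not the appendix procedure, which is not reproduced here; the complier means are recovered with the standard moment formulas that hold under Independence and Monotonicity (an assumption of this sketch), and the function name `impute_ate_linearity` and the toy data are hypothetical.

```python
def mean(values):
    """Average of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def impute_ate_linearity(y, d, z):
    """Return (LATE, ATE) where ATE is imputed from equation (11)."""
    rows = list(zip(y, d, z))
    p_d_z1 = mean(di for _, di, zi in rows if zi == 1)
    p_d_z0 = mean(di for _, di, zi in rows if zi == 0)
    first_stage = p_d_z1 - p_d_z0      # share of compliers
    p_a = p_d_z0                       # always-takers: treated when Z=0
    p_n = 1.0 - p_d_z1                 # never-takers: untreated when Z=1

    # Complier means of Y1 and Y0 (standard moment formulas, assumed here).
    y1_c = (mean(yi * di for yi, di, zi in rows if zi == 1)
            - mean(yi * di for yi, di, zi in rows if zi == 0)) / first_stage
    y0_c = (mean(yi * (1 - di) for yi, di, zi in rows if zi == 0)
            - mean(yi * (1 - di) for yi, di, zi in rows if zi == 1)) / first_stage

    # Always-taker mean of Y1 and never-taker mean of Y0 are observed directly.
    y1_a = mean(yi for yi, di, zi in rows if di == 1 and zi == 0)
    y0_n = mean(yi for yi, di, zi in rows if di == 0 and zi == 1)

    late = y1_c - y0_c
    slope_gap = (y1_a - y1_c) - (y0_c - y0_n)   # estimates beta_11 - beta_01
    return late, late + slope_gap * (p_a - p_n)


# Hypothetical toy data: per instrument arm, 2 always-takers, 1 complier,
# 1 never-taker, with potential outcomes linear in D0 + D1.
y = [5, 5, 3, 0, 5, 5, 1, 0]
d = [1, 1, 1, 0, 1, 1, 0, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
late, ate = impute_ate_linearity(y, d, z)
print(late, ate)  # 2.0 2.25
```

In the toy data the average effects are 3, 2, and 1 for always-takers, compliers, and never-takers, with shares .5, .25, and .25, so the true ATE is 2.25, which the imputation recovers.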
A calculation similar to that used to derive (11) shows that the effect of treatment on the treated can be constructed using
$$E[Y_{1i} - Y_{0i} \mid D_i = 1] = [(\alpha_1 + \beta_{11}) - (\alpha_0 + \beta_{01})] + (\beta_{11} - \beta_{01})(p_a/p_d) \tag{12}$$
$$= E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}] + \{(E[Y_{1i} \mid D_{0i}=1] - E[Y_{1i} \mid D_{1i} > D_{0i}]) - (E[Y_{0i} \mid D_{1i} > D_{0i}] - E[Y_{0i} \mid D_{1i}=0])\}(p_a/p_d),$$
where $p_d$ is the probability of treatment. From (12), we can immediately derive the Bloom (1984) result that if there are no always-takers, LATE is the effect on the treated.^11

^11 With no always-takers, we have $D_{0i} \equiv 0$, so Restriction 3 is not binding.

3.1 Symmetry Revisited

Restriction 3 is closely related to the symmetry property discussed in the parametric example. To see this, note that as a consequence of linearity we can interpolate the CEF for compliers by averaging as follows:
$$E[Y_{ji} \mid D_{1i} > D_{0i}] = \{E[Y_{ji} \mid D_{0i}=1] + E[Y_{ji} \mid D_{1i}=0]\}/2. \tag{13}$$
This means that expected outcomes for compliers can be obtained as the average of expected outcomes for always- and never-takers.

What distributional assumptions support a relation like (13)? Suppose treatment is determined by a latent-index assignment mechanism, as in equation (2). Then,
$$E[Y_{ji} \mid D_{1i} = D_{0i} = 0] = E[Y_{ji} \mid \eta_i > \gamma_0 + \gamma_1],$$
$$E[Y_{ji} \mid D_{1i} = D_{0i} = 1] = E[Y_{ji} \mid \eta_i < \gamma_0],$$
and
$$E[Y_{ji} \mid D_{1i} > D_{0i}] = E[Y_{ji} \mid \gamma_0 + \gamma_1 > \eta_i > \gamma_0].$$
If in addition, $\gamma_1 = -2\gamma_0$, then equation (13) holds as long as $(Y_{ji}, \eta_i)$ is jointly symmetric, as in the Normal model. The restriction $\gamma_1 = -2\gamma_0$ implies
$$P[D_i = 1 \mid Z_i = 0] = P[\eta_i < \gamma_0] = 1 - p$$
$$P[D_i = 1 \mid Z_i = 1] = P[\eta_i < -\gamma_0] = p \tag{14}$$
for some $p \in [0,1]$ so the first stage is also symmetric (e.g., a first stage effect of .1 that shifts the probability of treatment from $1-p=.45$ to $p=.55$).
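The symmetry in (14) is easy to check numerically. The sketch below assumes, for illustration only, that the latent index takes the form $D_i = 1[\gamma_0 + \gamma_1 Z_i > \eta_i]$ with a standard Normal $\eta_i$ (equation (2) is not reproduced in this section, so the exact form is an assumption), and the value of $\gamma_0$ is hypothetical.

```python
import math


def normal_cdf(x):
    """Standard Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def first_stage_probabilities(gamma0, gamma1):
    """Return (P[D=1|Z=0], P[D=1|Z=1]) for D = 1[gamma0 + gamma1*Z > eta]."""
    return normal_cdf(gamma0), normal_cdf(gamma0 + gamma1)


gamma0 = -0.1257            # hypothetical value, chosen so P[D=1|Z=0] is near .45
gamma1 = -2.0 * gamma0      # the symmetry restriction
p_z0, p_z1 = first_stage_probabilities(gamma0, gamma1)

p_a = p_z0                  # always-takers
p_n = 1.0 - p_z1            # never-takers
print(round(p_z0, 3), round(p_z1, 3))
print(abs(p_a - p_n) < 1e-12)  # the two shares coincide under symmetry
```

With equal always-taker and never-taker shares, the multiplier $(p_a - p_n)$ in (11) is zero, which is why LATE and ATE coincide in this case.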
The upshot of the previous discussion is that a symmetric latent error distribution and a symmetric first stage imply the interpolating property, (13), or, equivalently, Restriction 3. Moreover, we again have LATE equals ATE since $p_a = p_n$ given the first stage described in (14).^12 Intuitively, a symmetric first-stage relationship with symmetrically distributed latent errors equates LATE with ATE because average treatment effects for individuals with characteristics that place them in the middle of the $\eta_i$ distribution (compliers) are representative of average treatment effects for individuals over the entire distribution of $\eta_i$.

A first-stage relationship may be fortuitously symmetric, as for the 1990 Census sample of teen mothers using the same sex instrument. In such cases, it seems reasonable to invoke Restriction 3 and proceed under the assumption that LATE equals ATE. But what if, as seems more typical, the first stage shifts the probability of treatment asymmetrically? In the empirical section, I describe a simple scheme for using covariates to construct a subsample with a symmetric first stage. IV should estimate average treatment effects in this specially constructed sample. This approach naturally raises the question of how to use average treatment effects for one sample to make inferences about average effects in another. For a recent attack on this question, see Hotz, Imbens and Klerman (2000), who outline a procedure designed to extrapolate the results from randomized trials across sites with different populations. Here I rely on the fact that if effects differ little between two samples with and without a symmetric first-stage, then given Restriction 3, the extrapolation problem is solved under the maintained assumption that average treatment effects would be similar in the symmetric sample and its complement.

^12 To see this, note that $P[D_i = 1 \mid Z_i = 0] = 1 - p$ implies $p_a = 1 - p$. Since $p_a + p_n + \{P[D_i=1 \mid Z_i=1] - P[D_i=1 \mid Z_i=0]\} = (1-p) + p_n + (2p-1) = 1$, this implies $p_n = 1 - p$. As noted in the discussion of the parametric model, LATE=ATE given (14) also results when $E[Y_{ji} \mid \eta_i]$ is an odd function and the marginal distribution of $\eta_i$ is symmetric. This makes it possible to have a relation like (13) with, say, a binary or otherwise limited dependent variable for which a symmetric distribution is implausible.
3.2 Weakening Restriction 3

Suppose again that treatment assignment can be modeled using equation (2), and that the potential-outcomes CEFs are linear in $\eta_i$ (as would be the case under joint Normality). Then we can write,
$$E[Y_{1i} \mid D_{0i}, D_{1i}] = \alpha_1 + \beta_1 E[\eta_i \mid D_{0i}, D_{1i}] \tag{15a}$$
$$E[Y_{0i} \mid D_{0i}, D_{1i}] = \alpha_0 + \beta_0 E[\eta_i \mid D_{0i}, D_{1i}] \tag{15b}$$
where
$$E[\eta_i \mid D_{0i}, D_{1i}] = E[\eta_i \mid D_{1i}=0] + \{E[\eta_i \mid D_{0i}=1] - E[\eta_i \mid D_{1i}=1, D_{0i}=0]\}D_{0i} + \{E[\eta_i \mid D_{1i}=1, D_{0i}=0] - E[\eta_i \mid D_{1i}=0]\}D_{1i}. \tag{16}$$

Substituting (16) into (15a) and (15b) generates an expression for the coefficients in (9a), (9b). This leads to the following generalization of Restriction 3:

Restriction 4 (proportionality). $\beta_{00} = \theta\beta_{01}$; $\beta_{10} = \theta\beta_{11}$, for $\theta > 0$.

The first part of the proportionality restriction comes from (15a,b) alone. Using (16), we have
$$\theta = \{E[\eta_i \mid D_{0i}=1] - E[\eta_i \mid D_{1i}=1, D_{0i}=0]\} / \{E[\eta_i \mid D_{1i}=1, D_{0i}=0] - E[\eta_i \mid D_{1i}=0]\}, \tag{17}$$
which shows why $\theta$ is positive.

Restriction 4 leads to a generalization of the interpolation formula for average potential outcomes. In particular, we now have
$$E[Y_{ji} \mid D_{1i} > D_{0i}] = (1/(1+\theta))E[Y_{ji} \mid D_{0i}=1] + (\theta/(1+\theta))E[Y_{ji} \mid D_{1i}=0], \tag{18}$$
so that if $\theta=0$, compliers have the same expected potential outcomes as always-takers, while as $\theta$ approaches infinity, compliers have the same expected potential outcomes as never-takers.

The linearity assumption used to motivate Restriction 4 seems most plausible in the context of a model for continuous outcomes. It may be more of a stretch, however, for binary outcomes such as marital status. On the other hand, without covariates the distribution of $\eta_i$ is arbitrary. We can therefore define $\eta_i$ as the latent error term in an assignment mechanism like (2), after transformation to a uniform distribution on the unit interval.^13 This guarantees that (15a,b) can generate fitted values for outcome CEFs that also fall in the unit interval. Alternately, the weighted average in (18) can be motivated directly as a natural generalization of equally-weighted interpolation using (13).
To develop an estimator using (18), substitute Restriction 4 into (9a) and (9b) to obtain:
$$E[Y_{1i} \mid D_{0i}, D_{1i}] = \alpha_1 + \beta_{11}(\theta D_{0i} + D_{1i})$$
$$E[Y_{0i} \mid D_{0i}, D_{1i}] = \alpha_0 + \beta_{01}(\theta D_{0i} + D_{1i}).$$
Differencing and averaging, we have
$$E[Y_{1i} - Y_{0i}] = [(\alpha_1 + \beta_{11}) - (\alpha_0 + \beta_{01})] + (\beta_{11} - \beta_{01})(\theta p_a - p_n) \tag{19}$$
$$= E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}] + \{\theta^{-1}(E[Y_{1i} \mid D_{0i}=1] - E[Y_{1i} \mid D_{1i} > D_{0i}]) - (E[Y_{0i} \mid D_{1i} > D_{0i}] - E[Y_{0i} \mid D_{1i}=0])\}(\theta p_a - p_n).$$

We can map out the values of ATE consistent with the data by evaluating (19) for alternative choices of $\theta$. This sensitivity analysis is subject to the caveat that at the extremes where $\theta$ equals zero or infinity, ATE is not identified, a fact apparent from (18).^14

^13 The transformation requires that the underlying error have a continuous distribution.

^14 To compute the effect of treatment on the treated under Restriction 4, replace $\theta p_a - p_n$ with $\theta p_a/p_d$ in (19).

An alternative to sensitivity analysis is to try to estimate $\theta$ using (17). Although $\theta$ is not identified without further assumptions, (17) clearly depends in large part on the first stage coefficients, $\gamma_0$ and $\gamma_1$. This suggests a strategy for estimating $\theta$ using information on these coefficients only. Suppose that (15a,b) holds for a latent error transformed to Uniform, as discussed above, or that the CDF of $\eta_i$ can be approximated by a uniform distribution on the unit interval. Then a straightforward calculation gives
$$\theta = [\gamma_0 + \gamma_1]/[1 - \gamma_0] = P(D_i=1 \mid Z_i=1)/[1 - P(D_i=1 \mid Z_i=0)]. \tag{20}$$
This has the property that $\theta=1$ when $P(D_i=1 \mid Z_i=1) = 1 - P(D_i=1 \mid Z_i=0)$, while capturing deviations from symmetry in a straightforward manner. The value of $\theta$ calculated using (20) in the 1990 Census sample analyzed here is .61, close to the value calculated using Normality (.58).
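Equations (19) and (20) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the group means passed to `ate_proportionality` are made-up inputs, and the first-stage probabilities .34 and .403 are rounded values implied by the always-taker and complier shares quoted in the text for the full sample, so the resulting $\theta$ is approximate.

```python
def theta_uniform(p_d_z0, p_d_z1):
    """Equation (20): theta implied by a uniform latent error."""
    return p_d_z1 / (1.0 - p_d_z0)


def ate_proportionality(late, y1_always, y1_compliers,
                        y0_compliers, y0_never, p_a, p_n, theta):
    """Equation (19): ATE imputed from LATE under Restriction 4."""
    beta11 = (y1_always - y1_compliers) / theta   # theta^{-1} scaled gap for Y1
    beta01 = y0_compliers - y0_never              # gap for Y0
    return late + (beta11 - beta01) * (theta * p_a - p_n)


# A symmetric first stage (.47 -> .53) gives theta equal to one.
theta_symmetric = theta_uniform(0.47, 0.53)

# Rounded full-sample probabilities give a value close to the reported .61.
theta_full = theta_uniform(0.34, 0.403)

# With theta = 1, (19) collapses to (11); hypothetical group means.
ate_example = ate_proportionality(late=2.0, y1_always=5.0, y1_compliers=3.0,
                                  y0_compliers=1.0, y0_never=0.0,
                                  p_a=0.5, p_n=0.25, theta=1.0)
print(round(theta_symmetric, 3), round(theta_full, 3), ate_example)
```

A sensitivity analysis of the kind described above would simply loop `ate_proportionality` over a grid of positive `theta` values.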
Specification Tests for Homogeneity Restrictions

Because ATE is, by definition, invariant to the particular instrument used to estimate it, Restrictions 2, 3, and 4 can be partly checked by comparing alternative estimates using different instruments. In the case of Restriction 2, this amounts to a Sargan (1958) over-identification test comparing alternative IV estimates of the same structural coefficient. Under Restrictions 3 and 4, the relevant comparison should use equation (19) to convert estimates of LATE into estimates of ATE.

A final set of specification tests is suggested by the fact that under Restrictions 3 or 4,
$$E[Y_{1i} - Y_{0i} \mid D_{0i}, D_{1i}] = (\alpha_1 - \alpha_0) + (\beta_{11} - \beta_{01})(\theta D_{0i} + D_{1i}).$$
A test of whether $\beta_{11} - \beta_{01}$ equals zero is therefore a test of conditional constant effects, while a test of whether $\beta_{11} - \beta_{01}$ is positive is a test for Roy-type selection on the gains from treatment.
4. Childbearing, Marital Status, and Economic Welfare

The same sex instrument is a dummy for having two boys or two girls at first and second birth. Angrist and Evans (1998) showed this instrument increases the likelihood mothers with at least two children go on to have a third child by about 6-7 percentage points, but is otherwise uncorrelated with mothers' demographic characteristics. The data set used here is the 1990 Census extract used in the Angrist and Evans paper. This sample includes mothers aged 21-35 with two or more children, the oldest of whom was less than 18 at the time of the Census.

Descriptive statistics are reported in Table 1 for the full sample, for a subsample of ever-married women, and for four subsamples defined by mothers' education and age at first birth. The division into subsamples was motivated by earlier results showing markedly different effects of childbearing by maternal education, and because of the policy interest in teen mothers.
The probability of having a third child ranges from a low of .33 in the sample of women with some college, to a high of .5 in the sample of teen mothers. The probability of having a same-sex sibling pair is more or less constant at .505. Some of the estimates control for the demographic covariates listed in Table 1 using linear models.^15 Means for the outcome variables of interest appear at the bottom of Table 1.

^15 See Abadie (2003) and Frolich (2002) for nonlinear causal models with covariates.

4.1 OLS, IV, and 2SLS Estimates

The effect of same sex on the probability of having a third child varies from a low of 5.9 percentage points in the some-college sample to a high of 6.5 percentage points in the no-college sample. This can be seen in the first row of Table 2, which reports first-stage estimates. The first-stage effect without covariates, $E[D_i \mid Z_i=1] - E[D_i \mid Z_i=0] = E[D_{1i} - D_{0i}]$, is also an estimate of the proportion of the population in the compliers group.^16 As a benchmark, the next two rows of Table 2 show estimates of the effect of childbearing on two of the labor supply variables studied by Angrist and Evans (1998). These are IV and OLS estimates from models without covariates, i.e., Wald estimates and simple treatment-control contrasts.

^16 Although we cannot identify individual compliers in any sample and tabulate their characteristics directly, it is nevertheless possible to use data to describe the distribution of complier characteristics, and to compare this to the unconditional distribution. In particular, the difference in first-stage estimates across samples defined by covariates characterizes the distribution of covariates among compliers. To see this, note that for a binary covariate, $x_i$, $E[x_i \mid D_{1i} > D_{0i}]/E[x_i] = E[D_{1i} - D_{0i} \mid x_i = 1]/E[D_{1i} - D_{0i}]$. Table 2 therefore also shows same sex compliers to be less educated and more likely to have been married than the overall average.

The Wald (IV) estimates of the effect of a third child on employment status and weeks worked suggest mothers reduced their labor supply as a consequence of childbearing, though not by as much as indicated by the OLS estimates. For example, women who had a third child were about 13 percentage points less likely to work, but the corresponding IV estimate suggests a causal effect of only 8 percentage points. The OLS and IV estimates for weeks worked are about -7 and -5.

The IV estimates of labor supply effects are larger for less-educated women than for those with some college; in fact, the labor supply estimates are not significant in the some-college sample. In contrast, the IV estimates are smaller for women who had their first birth as a teenager than for women who had their first birth as an adult.

The last two rows in Table 2 show first-stage, OLS and two-stage least squares (2SLS) estimates after adding controls for age, age at first birth, dummies to indicate first-born and second-born boys, race dummies, and dummies for three schooling groups. Since same sex is uncorrelated with these covariates, including them has little effect on the 2SLS estimates. Moreover, in spite of the fact that some of the covariates are good predictors of outcomes, estimates with covariates are only slightly more precise than those without. Perhaps more surprisingly, the OLS estimates of labor supply effects also change little in response to the addition of covariates.
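The complier-characteristics ratio described in the footnote to this section, $E[x_i \mid D_{1i} > D_{0i}]/E[x_i]$, is just the first stage among observations with $x_i = 1$ divided by the overall first stage. A minimal sketch follows; the function names and toy data are hypothetical and are not drawn from the Census extract.

```python
def first_stage(d, z):
    """Difference in treatment rates between Z=1 and Z=0."""
    d1 = [di for di, zi in zip(d, z) if zi == 1]
    d0 = [di for di, zi in zip(d, z) if zi == 0]
    return sum(d1) / len(d1) - sum(d0) / len(d0)


def complier_relative_share(d, z, x):
    """Ratio of E[x | complier] to E[x] for a binary covariate x."""
    d_sub = [di for di, xi in zip(d, x) if xi == 1]
    z_sub = [zi for zi, xi in zip(z, x) if xi == 1]
    return first_stage(d_sub, z_sub) / first_stage(d, z)


# Hypothetical toy data: the instrument moves treatment only when x = 1.
d = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
x = [1] * 8 + [0] * 8
ratio = complier_relative_share(d, z, x)
print(ratio)  # 2.0
```

A ratio above one means the characteristic is over-represented among compliers; here all compliers have $x_i = 1$ while half the sample does, so the ratio is two.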
Estimates of the effect of having a third child on marital status, poverty status, and welfare use are reported in Table 3 for models with and without covariates. In the sample of all women, those with more children are less likely to be married. But this is at least in part due to uncontrolled demographic factors such as age at first birth, since OLS estimates with controls show that additional childbearing is associated with an increase in the likelihood of being married. In contrast to the OLS estimates, IV estimates with or without covariates suggest that the causal effect of childbearing is a reduced probability of being married. Thus, an important finding is that when the effect of childbearing is estimated in models with demographic controls, IV and OLS estimates have opposite signs.

The most important change in marital status caused by childbearing appears to be an increase in the likelihood of being divorced or separated. The estimated effects of childbearing on the probability of being ever-married or divorced (but not separated) are not significantly different from zero. Consistent with an increase in marital breakup, the birth of a third child also appears to lead to a marked increase in the likelihood a woman lives in a family with total family income below the poverty line. Here we should expect at least a mechanical effect since the poverty threshold rises as family size increases. Although OLS estimates are larger than IV estimates in models without covariates, OLS and IV estimates in models with covariates both indicate that a third child increases the likelihood a woman is poor by 9-10 percentage points.
Given the elevated rates of marital breakup and the increase in poverty rates that appear to be caused by childbearing, it seems reasonable to expect that the birth of a third child also increases the likelihood a woman is on welfare. Both the IV and OLS estimates tend to support this, though the IV estimates are imprecise. The OLS estimates of the effect on welfare use range from 6.7 percentage points without covariates to 3.9 percentage points with covariates. The IV estimate is a marginally significant 3.3% with or without covariates. While small in levels, an effect of this magnitude represents a roughly one-third increase in the number of women on welfare.

The IV estimates show no relationship between childbearing and the probability a woman has ever been married, so estimates limited to the sample of ever-married women are unlikely to be affected by selection bias. Not surprisingly, therefore, the IV estimates in the sample of ever-married women are almost identical to those in the full sample. On the other hand, while the IV estimate of the reduction in marriage rates is a significant 8 percentage points (s.e.=.028) for women with no college, it is close to zero and insignificant for women with some college. The effects of childbearing on poverty are also larger in the no-college sample, though the difference in effects on welfare use by college status is reversed and much smaller than the difference in effects on poverty rates.

The differences in estimates by mothers' age at first birth also suggest a pattern of larger effects with decreasing socioeconomic status, though the contrast is not as clear cut as the differences by schooling group. While the increases in marital dissolution and welfare receipt are larger for teen mothers than for adult mothers, the estimates are significant only in the latter group. Estimates of effects on divorce/separation are similar in the two groups, though again much more precise for the sample of adult mothers. This difference in precision undoubtedly reflects the smaller sample of teen mothers. One clear contrast, however, is the higher likelihood that a third birth pushes a teen mother into poverty. The impact on poverty status is significant regardless of mothers' age at first birth, but it is roughly three times larger for teen mothers.
4.2 Heterogeneity across potential-assignment subpopulations

The first-stage estimates imply that 6-7% of each sample consists of compliers, i.e., mothers who had a child in response to a homogeneous sibling-sex mix. Because the overall probability of treatment ranges upwards from about .32, the overwhelming majority of treated individuals are always-takers. This can be seen in Table 4, which gives the distribution of potential-assignment subpopulations. In the sample of all women, for example, 6.3% are compliers, 34% are always-takers (i.e., have a third child without regard to sibling-sex composition), and 59% are never-takers (i.e., would never have a third child regardless of sibling-sex composition). The proportion of treated who are compliers is $1 - (p_a/p_d)$, or about 8%. Given the relatively small proportion of compliers, the scope for differences in average causal effects across potential-assignment subpopulations is substantial.

Table 4 also reports the estimate of $(p_a - p_n)$, the multiplier that determines how far LATE is from ATE when the latter is calculated using Restriction 3 and equation (11), or in models with covariates as described in the Appendix. The estimate of $(p_a - p_n)$ is -.25 in the full sample, and ranges from zero for teen mothers to -.352 in the sample of adult mothers.
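The subpopulation shares discussed above follow mechanically from the first stage under monotonicity. As a rough sketch, the function below (a hypothetical name) takes the two conditional treatment probabilities and the instrument mean; the inputs .34, .403, and .505 are rounded values implied by the figures quoted in the text, so the outputs only approximate the Table 4 entries.

```python
def assignment_shares(p_d_z0, p_d_z1, p_z1):
    """Shares of always-takers, never-takers, and compliers under monotonicity."""
    p_a = p_d_z0                      # treated even when Z=0
    p_n = 1.0 - p_d_z1                # untreated even when Z=1
    p_c = p_d_z1 - p_d_z0             # first stage
    p_d = p_z1 * p_d_z1 + (1.0 - p_z1) * p_d_z0   # overall treatment rate
    return {
        "always_takers": p_a,
        "never_takers": p_n,
        "compliers": p_c,
        "compliers_among_treated": 1.0 - p_a / p_d,
        "multiplier": p_a - p_n,      # (p_a - p_n) in equation (11)
    }


shares = assignment_shares(p_d_z0=0.34, p_d_z1=0.403, p_z1=0.505)
for name, value in shares.items():
    print(name, round(value, 3))
```

With these rounded inputs the complier share among the treated comes out near 8% and the multiplier near -.25, in line with the figures reported above.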
4.2.1 Symmetric subpopulations

The value of zero for $(p_a - p_n)$ in the teen mother sample is noteworthy because it means that LATE is the same as ATE under Restriction 3. This is a consequence of the fact that the first stage for teen mothers is almost perfectly symmetric: the same sex instrument shifts the probability of further childbearing from about .47 to .53. Moreover, because $\theta$ for teen mothers is about 1 when estimated using (20), estimates of ATE for teen mothers under Restriction 4 are also close to LATE.

The first two columns of Table 5 focus on the comparison between estimates for all women and teen mothers only, repeating earlier estimates for these samples from Table 3 without covariates, including the first-stage coefficient and intercept. For the most part, IV estimates for teen mothers are similar to those for the sample of all women. While the estimated effect on employment is considerably lower at -.026 (s.e.=.051) versus .084 (s.e.=.027) in the full sample, the effect of childbearing on weeks worked is -5.2 (s.e.=1.3) in the full sample and -4.8 (s.e.=2.4) for teen mothers. Similarly, the effect on marital status is estimated at -.062 (s.e.=.024) in the full sample and -.066 for teen mothers (.051). Note that we can view the parameters estimated in the full sample as estimates of $E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}, X = \text{all women}]$, while the estimates in column 2, for teen mothers, can be interpreted as measuring $E[Y_{1i} - Y_{0i} \mid X = \text{teen mothers}]$ under Restriction 3 or 4. A test of equality across columns 1 and 2 is therefore a joint test of the invariance of average treatment effects to conditioning both on X and on the compliers potential-outcomes subpopulation. The fact that these are similar is evidence against substantial treatment effect heterogeneity in both dimensions, though of course there are scenarios where this test has no power.^17

^17 As with an over-identification test, the power of the test turns on maintaining the validity of a benchmark. Here, we maintain $E[Y_{1i} - Y_{0i} \mid X = \text{teen mothers}] = E[Y_{1i} - Y_{0i}]$.

There is some evidence for a difference in effects on poverty status between the teen mother and all-women samples. For all women, the IV estimate of the effect of childbearing on poverty status is .095 (s.e.=.023), while the corresponding estimate is .143 (s.e.=.05) in the teen mother sample. The comparison across samples is weakened, however, by the fact that the estimates in the teen mother sample are much less precise than in the full sample. This raises the question of whether we can construct a larger sample with a symmetric first stage. I attempted to construct such a sample by estimating a Probit first-stage allowing interactions with covariates and then selecting the sample based on covariate-specific fitted values.^18

^18 A maintained assumption here is that the distribution of $Y_{ji}$ and $\eta_i$ is jointly symmetric conditional on the covariates used to select the sample with a symmetric first stage.

The details of the symmetric sample selection are as follows. The idea is to use a parametric model to capture the variation in the first-stage effect of same sex on childbearing with demographic covariates. The model allows for a large set of interaction terms with covariates. I then look for covariate values where the predicted first-stage effect is symmetric in the sense required by Restriction 3. Here, I began with a Probit first-stage equation:
$$P[D_i = 1 \mid Z_i, X_i] = \Phi[\kappa_0'X_i + (\kappa_1'X_i)Z_i], \tag{21}$$
where $X_i$ is a vector of covariates that includes age, age at first birth, Black and Hispanic dummies, and dummies indicating women with some college and college graduates. The main effects, $\kappa_0'X_i$, and interaction terms, $\kappa_1'X_i$, use the same parameterization of covariate effects (in particular, they both allow for linear terms in the age variables plus main effects for the dummies). In practice, $\kappa_0'X_i$ takes on about 1,700 distinct values. For each of these values, I calculated $\pi_0(X_i) = \Phi[\kappa_0'X_i]$, the distribution of which is plotted in Fig. 3. This gives the distribution of the probability of childbearing for women with different X-characteristics and $Z_i$ equal to zero. The distribution of $\pi_0(X_i)$ is concentrated around the overall average of about .34, though there is considerable spread.

By definition, a symmetric first stage shifts the probability of treatment across the value of one-half. To identify a sample where this is most likely, I initially selected women with $\pi_0(X_i)$ between .4 and .6. Column 3 of Table 5 reports estimates for this sample, which has about 104,000 observations. The estimated first-stage in this sample shifts the probability of treatment from .47 to .54, i.e., approximately from $p$ to $1-p$, as required by symmetry. For most outcomes, the IV estimates in this symmetric sample are smaller in absolute value than in the full sample, and smaller than in the sample complementary to the symmetric sample, for which results are reported in column 4. For example, the estimated effect on weeks worked in the symmetric sample is -3.7 (s.e.=2.2), while the corresponding estimate in the complementary sample is -6 (s.e.=1.6). Again, however, the comparison is handicapped by a lack of precision.

The long right tail of the distribution of first-stage base values plotted in Fig. 3 suggests that an even larger symmetric sample can be constructed simply by dropping values of $\pi_0(X_i)$ beginning from the left and working up. As it turns out, limiting the sample to individuals with values of $\pi_0(X_i)$ greater than or equal to .35 leads to a first stage that shifts the probability of treatment from .465 to .533, virtually perfectly symmetric. This can be seen in column 5 of Table 5, which reports first-stage and IV estimates for the resulting sample of 62,264 observations. Most of the estimates in this symmetric sample are close to those in the full sample. For example, the effect on weeks worked is -5.9 (s.e.=1.8) and the effect on divorce or separation is .068 (s.e.=.032). Perhaps surprisingly, the estimated effect on poverty status differs markedly between this sample and its complement (.136 versus .048), but the estimated effect is still significantly different from zero in the complementary sample.
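The selection step described above can be sketched as follows, taking fitted Probit indices as given rather than estimating (21). Everything in this block is an illustrative assumption: the function name `select_symmetric_cells`, the cell-level inputs (index values and cell sizes), and the numbers themselves are hypothetical and are not the fitted values from the Census extract.

```python
import math


def normal_cdf(x):
    """Standard Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def select_symmetric_cells(cells, lower, upper=1.0):
    """Keep covariate cells whose baseline probability pi_0(X) lies in [lower, upper].

    cells: list of (kappa0_x, kappa1_x, n_obs) tuples, one per covariate cell.
    Returns the kept cells and the implied P[D=1|Z=0], P[D=1|Z=1] in that sample.
    """
    kept = [c for c in cells if lower <= normal_cdf(c[0]) <= upper]
    n_total = sum(n for _, _, n in kept)
    p_z0 = sum(normal_cdf(k0) * n for k0, _, n in kept) / n_total
    p_z1 = sum(normal_cdf(k0 + k1) * n for k0, k1, n in kept) / n_total
    return kept, p_z0, p_z1


# Hypothetical fitted indices for three covariate cells.
cells = [(-1.0, 0.2, 100), (0.0, 0.2, 50), (-0.2, 0.4, 50)]
kept, p_z0, p_z1 = select_symmetric_cells(cells, lower=0.35)
print(len(kept), round(p_z0, 3), round(p_z1, 3))
```

One would then judge symmetry by checking whether the two probabilities in the selected sample straddle one-half at roughly equal distances, and adjust the cutoff accordingly.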
4.2.2 Imputation of ATE

The results in Table 5 reflect an attempt to identify or construct samples where LATE=ATE. Alternately, we can use equations (11) or (19) to impute a value of ATE for the various subsamples analyzed in Table 3. The results of this effort are presented in Table 6 for four outcomes; this table also reports the no-selection alternative used to construct the specification test discussed at the beginning of Section 3.2.

The estimates of the no-selection alternative are all slightly farther from the estimates of LATE than the corresponding OLS estimates. For example, the OLS estimate of the effect on weeks worked in the full sample is -7.34 (s.e.=.08), while the no-selection alternative is -7.56 (s.e.=.12). This suggests, as noted earlier, that the contrast between IV and the no-selection alternative provides a more powerful specification test than a conventional IV/OLS comparison.

Estimates of ATE constructed using equation (11) are similar to the estimates of LATE, even in samples where the first-stage is not symmetric. For example, the estimate of ATE for the sample of non-teen (i.e., adult) mothers is -4.1 (s.e.=1.5), in comparison with an estimate of LATE of -5.3 (s.e.=1.6). Using the estimates of $\theta$ shown in Table 4 and equation (19) generates somewhat smaller estimates for the effect of childbearing on weeks worked other than in the teen mother sample, though again mostly still significant.

Estimates of ATE for outcomes other than weeks worked are mostly insignificant. This contrasts with the mostly significant estimates of LATE. Again, this is partly a problem of precision. But the estimates of ATE outside the teen mother sample move substantially closer to zero than the estimates of LATE. For example, while LATE suggests the probability of divorce or separation increases by .053 (s.e.=.019), the corresponding estimates of ATE are .028 (s.e.=.019) when $\theta$ equals one and .009 (s.e.=.021) when $\theta$ is estimated. The evidence that further childbearing increases divorce or separation for the typical woman with two children is therefore weaker than the estimates of LATE would suggest. Except for the sample of teen mothers, the estimates of ATE for effects on poverty status are also smaller than the corresponding estimates of LATE.
5. Summary and Conclusions
The framework outlined here provides a strategy for modeling treatment effect heterogeneity across potential-assignment subpopulations. I focused initially on restrictions that make IV estimates of causal effects on compliers representative of the overall population average treatment effect. The framework also leads to procedures that can be used to impute average treatment effects from information on average outcomes for compliers, always-takers, and never-takers. An illustration of these ideas using same sex instruments suggests this approach may be useful in applied work.
On the empirical side, estimates of LATE for teen mothers are close to the corresponding average treatment effects for this population, when the latter are inferred using a number of linearity or proportionality assumptions. And while estimates of the overall average effect of childbearing are smaller than the corresponding IV estimates, most of the estimated effects on labor supply and poverty status remain substantial and significant. On the other hand, most (though not all) of the estimated average effects on marital status and welfare use are small and insignificant.
Estimates of the effects of childbearing on marital stability and welfare use using the same sex instrument suggest the outline of a coherent picture, but many features remain unresolved. In this application, the theory of parameter heterogeneity runs quickly into the sandpile of sampling variance and specification uncertainty. On balance, I think extrapolation efforts of the sort implemented here are more likely to weaken the case for the predictive value of a particular causal estimate than to provide a concrete and precise alternative to traditional IV. For example, the evidence for an adverse effect of childbearing on marital stability and welfare use is clearly weakened by the attempt to go from LATE to ATE. This sort of destructive evidence seems to me to be a prominent feature of life in the empirical world. The external validity of IV estimates is ultimately established less by new econometric methods than by replication in new data sets, and, of course, by new instruments.
APPENDIX: COMPUTATION

1. The test for selection bias.

Drop individual subscripts from the notation. Consider the following two-equation system:

$$Y = \lambda_0 + \lambda_1 D + \mu \qquad \text{(A1)}$$

$$Y = \delta_0 + \delta_1 \, 1[D=Z] + \delta_2 \, ((D-Z)/2) + \nu \qquad \text{(A2)}$$

The test for selection bias is a test of whether $\lambda_1 = \delta_2$, when A1 is estimated by IV using Z as an instrument and A2 is estimated by OLS. The two coefficients and the asymptotic standard error for their difference can be estimated by stacking A1 and A2 and allowing for heteroscedastic and correlated residuals. In practice, for sample sizes on the order of that used here, it seems reasonable to treat the estimate of $\delta_2$ as non-stochastic and use the standard error of the estimate of $\lambda_1$ to construct a t-test.

2. Estimates under Restriction 3.

Use (A2) to write:

$$E[Y_1 \mid D=1, Z=0] = E[Y_1 \mid D_1=D_0=1] = \delta_0 + \delta_2/2$$

$$E[Y_0 \mid D=0, Z=1] = E[Y_0 \mid D_1=D_0=0] = \delta_0 - \delta_2/2.$$

Estimates of $E[Y_1 \mid D_1 > D_0]$ and $E[Y_0 \mid D_1 > D_0]$ can be obtained as IV estimates of the coefficients $\lambda_{11}$ and $\lambda_{01}$ in (A3) and (A4), below:

$$DY = \lambda_{10} + \lambda_{11} D + \eta_1 \qquad \text{(A3)}$$

$$(1-D)Y = \lambda_{00} + \lambda_{01}(1-D) + \eta_0 \qquad \text{(A4)}$$

Estimates of ATE under Restriction 3 are a linear combination of $\delta_0$, $\delta_2$, $\lambda_{01}$, and $\lambda_{11}$. These coefficients and the standard error for any linear combination of them can be estimated by stacking A2, A3, and A4. To further simplify, rewrite equation (11) in terms of the parameters in A2-A4 as

$$E[Y_1 - Y_0] = \lambda_{11}[1-(p_a-p_n)] - \lambda_{01}[1+(p_a-p_n)] + 2\delta_0(p_a-p_n). \qquad \text{(A5)}$$

To accommodate models with covariates, it is convenient to use a regression set-up to estimate $p_a - p_n$. Define a dependent variable $d^* = D(2Z-1) - Z$. Regress $d^*$ on Z; the coefficient on Z is an estimate of $p_a - p_n$. Note that (without covariates) the standard error for the estimated $p_a - p_n$ is the same as the standard error for the first-stage coefficient since the latter can be written $1 - p_a - p_n$. To estimate $E[Y_1 - Y_0 \mid D=1]$, replace $p_a - p_n$ with $p_a/p_d$ in (A5).

Models with covariates were estimated by adding covariates to the relevant first-stage equations, and to equations A1-A4. As a shortcut for inference for estimates of ATE using A5, it seems reasonable to treat $(p_a - p_n)$ and $\delta_0$ as known since these are estimated much more precisely than $\lambda_{11}$ and $\lambda_{01}$, which are themselves instrumental variables estimates. Note also that IV estimates of $\lambda_{11}$ and $\lambda_{01}$ are independent.

3. Estimates under Restriction 4.

Substitute parameters from A1-A4 into (19) and simplify to obtain

$$E[Y_1 - Y_0] = \lambda_{11}[1-(p_a-\theta^{-1}p_n)] - \lambda_{01}[1+(\theta p_a-p_n)] + [\delta_0(1+\theta^{-1}) + (\delta_2/2)(\theta^{-1}-1)](\theta p_a-p_n). \qquad \text{(A6)}$$

Standard errors were calculated treating $p_a$, $p_n$, $\delta_0$, $\delta_2$, and $\theta$ as known.
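The two quantities compared by the selection-bias test, and the regression of d* = D(2Z-1) - Z on Z, can be illustrated on simulated data. The sketch below is only an illustration under stated assumptions: the data-generating process (type shares of 0.3, 0.5 and 0.2, a constant effect of -5, no selection bias), the sample size, and all function names are hypothetical and are not taken from the paper. Without covariates, the IV estimate of the coefficient on D in (A1) reduces to a Wald ratio, the coefficient on (D-Z)/2 in (A2) reduces to a difference of two cell means, and a regression on the binary Z reduces to a difference in means.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Simulate instrument Z, treatment D and outcome Y (hypothetical design).

    Types: 0 = always-taker, 1 = never-taker, 2 = complier (no defiers).
    """
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)
    types = rng.choice(3, size=n, p=[0.3, 0.5, 0.2])
    # Always-takers take treatment, never-takers do not, compliers follow Z.
    d = np.where(types == 0, 1, np.where(types == 1, 0, z))
    # Constant treatment effect of -5, outcomes unrelated to type.
    y = 30.0 + rng.normal(0.0, 10.0, size=n) - 5.0 * d
    return y, d, z


def wald_iv(y, d, z):
    """IV estimate of the coefficient on D in (A1), using Z as the instrument."""
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage


def no_selection_contrast(y, d, z):
    """Coefficient on (D-Z)/2 in (A2): E[Y|D=1,Z=0] - E[Y|D=0,Z=1]."""
    return y[(d == 1) & (z == 0)].mean() - y[(d == 0) & (z == 1)].mean()


def pa_minus_pn(d, z):
    """Slope from regressing d* = D(2Z-1) - Z on the binary instrument Z."""
    d_star = d * (2 * z - 1) - z
    return d_star[z == 1].mean() - d_star[z == 0].mean()


y, d, z = simulate()
iv_estimate = wald_iv(y, d, z)
contrast = no_selection_contrast(y, d, z)
share_gap = pa_minus_pn(d, z)
print(f"IV: {iv_estimate:.2f}  contrast: {contrast:.2f}  p_a - p_n: {share_gap:.3f}")
```

Because the simulated design has no selection bias, the IV estimate and the contrast should both be near -5, and the slope should be near 0.3 - 0.5 = -0.2. The contrast is much less noisy than the IV estimate, since it is a simple difference of cell means rather than a ratio with a small first stage in the denominator.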
REFERENCES
Abadie, A. (2002). "Bootstrap Tests for Distributional Treatment Effects
in
Instnimental Variables Models,"
Journal of the American Statistical Association, vol. 97, pp. 284-292.
Abadie, A. (2003). "Semi-parametric Instrumental Variables Estimation of Treatment Response Models,"
Journal of Econometrics, vol. 1 13, pp. 231-263.
Abadie, A., Angrist,
J.,
and Imbens, G. (2002). "Instrimiental Variables Estimates of the Effect
of Subsidized Training on the Quantiles of Trainee Earnings," Econometrica,
Angrist,
Epidemiology,
Angrist,
J.
NBER Technical
Regressors: Simple Strategies
Angrist,
November.
Models with Dummy Endogenous
foTEmpivicalpracUce" Journal ofBusiness ami economic Statistics,
1
15,
2-16.
and Evans, W.E. (1998). "Children and Their Parents' Labor Supply: Evidence from Exogenous
Variation in Family Size," American Economic Review, vol. 88, pp. 450-477.
J.
J.,
and Imbens.G. (1991). "Sources of Identifying Information
Technical Working Paper No.
Angrist,
Working Paper No.
(2001). "Estimation of Limited Dependent Variable
vol. 19, pp.
Angrist,
vol. 70. pp. 91-117.
(1991). "Instrumental Variables Estimation of Average Treatment Effects in Econometrics and
J.
1
17,
in
Evaluation Models,"
NBER
December 1991.
Imbens,G. and Rubin, D.B. (1996), "Identification of Causal Effects Using Instrumental
Variables," Journal of the American Statistical Association, vol. 91, pp. 444-55.
J.,
M. (2002). "Sargan's Instrumental Variables Estimation and the Generalized Method of
Moments," Journal aof Business and Economic Statistics., vol. 20, pp. 450^59.
Becker, G., Landes, E., and Michael, R.T. (1977). "An Economic Analysis of Marital InstahiVily," Journal
of Political Economy, vol. 85, pp. 141-87.
Blank, R. M. (2002). "Evaluating Welfare Reform in the United States," NBER Working Paper 8983. June.
Bloom, H. S. (1984). "Accounting for No-Shows in Experimental Evaluation Designs," Evaluation
Arellano,
1
Review, vol.
8, pp.
225-46.
Chamberlain, G. (1986). "Asymptotic Efficiency
Econometrics,
Chamberlain,
G.
vol. 32, pp.
(1987).
in
Semi-Parametric Models with Censoring," Journal of
189-218.
"Asymptotic
Efficiency
in
Estimation
with
Conditional
Moment
of Econometrics, vol. 34, pp. 305-334.
Cherlin, A. (1977). "The Effect of Children on Marital Dissolution," Demography, vol. 14, pp. 265-72.
Bronars, S. G. and Grogger, J. (1994). "The Economic Consequences of Unwed Motherhood: Using Twins
as a Natural Experiment," American Economic Review, vol. 84, pp. 1141-1 156.
Frolich, M. (2002). "Nonparametric FV estimation of Local Average Treatment Effects with Covariates,"
IZA Discussion Paper No. 588, September.
Grogger, J. and Bronars, S.B. (2001). "The Effect of Welfare Payments on the Marriage and fertility
Behavior of Unwed Mothers: Results from a Twins Experiment, "Joi/r«a/o/P«/;7;co/£'c'onowv, vol.
Restrictions, "Jo«r«a/
109, pp. 529-545.
Hausman,
J.
(1978). "Specification Tests in Econometrics," Econometrica, vol. 46, pp. 1251-1271.
Heaton, T. B. (1990). "Marital Stability throughout the Child-Rearing Years," Demography, vol. 27, pp. 5563.
Heckman,
Heckman,
J. J.,
Heckman,
J.J.,
J. J.
(1990). "Varieties of Selection B\as," American
Economic Review,
and Vytlacil, E. (2001). "Four Parameters of Interest
Programs." Southern Economic Journal, vol. 68, pp. 210-223.
Tobias,
J.,
and Vytlacil, E. (2000). "Local Instrumental Variables,"
Paper 252. April.
33
vol. 80, pp.
in the
313-318.
Evaluation of Social
NBER Technical
Working
Holland,
P.
W.
(1986). "Statistics and Causal Inference," Journal of the
American
Statistical Association,
945-970.
vol. 81, pp.
GAFN: A Re-Analysis
NBER Working Paper 8-7, November.
Hotz, V.J., Imbens, G. and Klerman, J.A. (2000). "The Long-Term Gains from
the Impacts of the California
Imbens, G.
W. and
Angrist,
Econometrica,
Kearney, M. (2002).
J.
Family Cap,"
Program,"
of
(1994). "Identification and Estimation of Local Average Treatment Effects,"
vol. 62, pp.
"Is there
GAfN
467-75.
an Effect of Incremental Welfare Benefits on
Fertility
Behavior?
A look at the
NBER Working Paper 9093, August.
Manski, C. (1990). "Nonparametric Bounds on Treatment Effects," American Economic Review
vol. 80,
pp. 319-322.
Maynard,
R.,
Boehnen,
E., Corbett, T.,
Sandefur, G., and Mosley,
J.
(1998). "Changing Family Formation
The Family, and Reproductive
DC: National Academy Press.
R. A (1999). "New Developments in Econometric Methods for Labor Market Analysis," Chapter
24 in O. Ashenfelter and D. Card, eds.. The Handbook of Labor Economics, Volum 3A,
Behavior Through Welfare Reform,"
in
Robert
Moffit, ed., Welfare,
Behavior: Research Perspectives, Washington,
Moffit,
Amsterdam: North-Holland.
Bloom, H.S., Bell, S.H., Doolittle, F., Lin, W., and Cave, G. (1996). Does Training for the
Disadvantaged Work?, Washington, DC: The Urban Institute.
Phillips, P. C.B. (2003). Vision and Influence in Econometrics: John Denis Sargan," Econometric Theory,
Orr, L. L.,
forthcoming.
Grammar of Science, London: Adam and Charles Black.
A.
"Some
Thoughts
on the Distribution of Earnings," Oxford Economic Papers
Roy,
(1951).
Pearson, K. (1911). The
Sargan, D. (1958), "The Estimation of
Econometrica,
Waite, L.
J.
vol. 26, pp.
3,
135-46.
Economic Relationships Using Instrumental Variables,"
393-415.
and Lillard, L.A. (1991). "Children and Marital Disruption," American Journal of Sociology,
vol. 96, pp. 930-53.
Wald, A. (1940). "The Fitting of Straight Lines if Both Variables are Subject to Error," Annals of
Mathematical Statistics, vol. 1 1 pp. 284-300.
Wooldridge, J. M. (1997). "On Two-Stage Least Squares Estimation of the Average Treatment Effect in a
Random Coefficient Model," Economics Letters, vol. 56, pp. 129-133.
Wooldridge, J. M. (2003). "Further Results on Instrumental Variables Estimation of Average Treatment
Effects in the Correlated Random Coefficient Model," Economics Letters, vol. 79, pp. 185-191.
,
34
[Figure 1. Panel A: Moderate selection on gains. Panel B: Strong selection on gains. Each panel plots LATE and TT (vertical axis: treatment effects) against P(D=1|Z=0) (horizontal axis).]
Fig. 1: The relationship between LATE, ATE and the effect on the treated (TT) for alternate first-stage baseline values. The first-stage effect is fixed at .07 and ATE=0. The top panel calculation sets the correlation between gains and the treatment index to -.1, while the bottom panel sets this correlation to -.5.
[Figure 2. Panel A: Moderate selection on gains. Panel B: Strong selection on gains. Each panel plots LATE and TT (vertical axis: treatment effects) against P(D=1|Z=0) (horizontal axis).]
Fig. 2: The relationship between LATE, ATE and the effect on the treated (TT) for alternate first-stage baseline values. The first-stage effect is fixed at .30 and ATE=0. The top panel calculation sets the correlation between gains and the treatment index to -.1, while the bottom panel sets this correlation to -.5.
[Figure 3. Histogram with Density on the vertical axis and P(D=1|Z=0,X) on the horizontal axis.]
Fig. 3: Distribution of first-stage base probability as a function of covariates. The covariates are age, age at first birth, Black and Hispanic dummies, and dummies for some college and college graduates. There are about 1,700 values in the histogram.
TABLE 1: DESCRIPTIVE STATISTICS, WOMEN AGED 21-35 WITH 2 OR MORE CHILDREN
Means and Standard Deviations

Variable
    All        Ever       No         Some       Teen       Adult
    Women      Married    College    College    Mothers    Mothers
                                     or +

Children ever born
    2.50       2.49       2.55       2.41       2.72       2.41
   (0.76)     (0.75)     (0.81)     (0.68)     (0.90)     (0.68)
More than 2 children (=1 if mother had more than 2 children)
    0.375      0.370      0.405      0.328      0.500      0.324
   (0.484)    (0.483)    (0.491)    (0.470)    (0.500)    (0.468)
Boy 1st (=1 if 1st child was a boy)
    0.512      0.512      0.510      0.514      0.509      0.513
   (0.500)    (0.500)    (0.500)    (0.500)    (0.500)    (0.500)
Boy 2nd (=1 if 2nd child was a boy)
    0.511      0.511      0.509      0.512      0.508      0.511
   (0.500)    (0.500)    (0.500)    (0.500)    (0.500)    (0.500)
Two boys (=1 if first two children were boys)
    0.264      0.264      0.262      0.266      0.262      0.264
   (0.441)    (0.441)    (0.440)    (0.442)    (0.440)    (0.441)
Two girls (=1 if first two children were girls)
    0.241      0.241      0.242      0.240      0.245      0.240
   (0.428)    (0.427)    (0.428)    (0.427)    (0.430)    (0.427)
Same sex (=1 if first two children were the same sex)
    0.505      0.505      0.504      0.506      0.507      0.504
   (0.500)    (0.500)    (0.500)    (0.500)    (0.500)    (0.500)
Age
    30.4       30.6       29.92      31.3       28.8       31.1
   (3.5)      (3.4)      (3.60)     (3.0)      (3.89)     (3.0)
Age at first birth (mother's age when first child was born)
    21.8       22.0       20.85      23.4       17.9       23.5
   (3.5)      (3.5)      (3.16)     (3.50)     (1.13)     (2.8)
Black Mother
    0.131      0.092      0.141      0.115      0.237      0.088
   (0.337)    (0.289)    (0.348)    (0.319)    (0.425)    (0.283)
Hispanic Mother
    0.113      0.112      0.144      0.065      0.156      0.096
   (0.317)    (0.315)    (0.351)    (0.246)    (0.363)    (0.294)
Never Married
    0.068      -          0.089      0.036      0.144      0.037
   (0.252)     -         (0.285)    (0.186)    (0.351)    (0.190)
Married Now
    0.798      0.857      0.768      0.846      0.649      0.859
   (0.401)    (0.350)    (0.422)    (0.361)    (0.477)    (0.348)
Divorced
    0.081      0.087      0.083      0.079      0.122      0.065
   (0.273)    (0.282)    (0.276)    (0.270)    (0.327)    (0.246)
Divorced or Separated
    0.127      0.137      0.136      0.114      0.197      0.099
   (0.333)    (0.344)    (0.343)    (0.317)    (0.398)    (0.299)
High School Graduate (=1 if high school diploma and no further education)
    0.420      0.423      0.685      -          0.426      0.417
   (0.494)    (0.494)    (0.465)     -         (0.495)    (0.493)
Some College (=1 if some college, but no degree)
    0.264      0.269      -          0.682      0.174      0.301
   (0.441)    (0.444)     -         (0.466)    (0.379)    (0.458)
College Graduate (=1 if bachelor's degree or higher)
    0.123      0.131      -          0.318      0.018      0.166
   (0.329)    (0.338)     -         (0.466)    (0.131)    (0.372)
In Poverty (=1 if family income below the poverty line)
    0.197      0.158      0.256      0.103      0.347      0.136
   (0.398)    (0.364)    (0.437)    (0.303)    (0.476)    (0.343)
Welfare Recipient (=1 if public assistance income>0)
    0.098      0.065      0.130      0.047      0.187      0.062
   (0.297)    (0.247)    (0.336)    (0.212)    (0.390)    (0.241)
Worked for pay (=1 if worked for pay in 1989)
    0.662      0.674      0.623      0.723      0.650      0.667
   (0.473)    (0.469)    (0.485)    (0.447)    (0.477)    (0.471)
Weeks worked (weeks worked in 1989)
    26.2       26.9       24.27      29.3       25.0       26.7
   (22.9)     (22.9)     (23.00)    (22.4)     (22.81)    (22.9)
Number of observations
    380007     357063     236418     143589     110156     269851

Notes: Data are from the 1990 PUMS. The sample includes women with 2 or more children whose 2nd child was at least age 1 and who had their first birth at age 15 or later. The no-college sample includes women with no college or with an associate occupational degree. The some-college sample includes women with an associate academic degree, some college but no degree, or a college degree. Teen mothers are those who had their first birth at age 19 or younger. Standard deviations are reported in parentheses. All calculations use sample weights.
TABLE 2: FIRST-STAGE AND LABOR SUPPLY ESTIMATES
Samples (OLS and IV columns for each): All Women, Ever Married, No College, Some College or +, Teen Mothers, Adult Mothers
Variable
A. No Covariates
First Stage
More than 2 children
00628
0.0663
(0.002)
(0.002)
-
0.0652
0.0594
0638
0.0616
(0 002)
(0.003)
(0.003)
(0024)
Outcomes
Worked for pav
If
0.132
0.084
0126
0.083
0.132
0.105
-0 112
0.057
-0.140
•0026
0,130
01 09
(0.002)
(0.027)
(0.002)
(0.026)
(0.002)
(0.034)
(0003)
(0.044)
(0.003)
(0 051)
(0.002)
(0.032)
7.34
-5.15
-7.12
5.09
7.22
6.52
6,59
3.21
7.47
-4.76
-7.19
-5.26
(0.08)
(130)
(0.09)
(1.27)
(0.11)
(1.60)
(0,14)
(2.18)
(0.15)
(2.40)
(0.10)
(1.57)
-
Vets Harked
B.
Sta^
More than 2
With Covariates
First
0.0523
children
(0
0017)
0.0658
0.0644
-
00592
00633
(0,0017)
(0.0022)
-
(00027)
(0.0033)
0.0623
(00019)
-
Outcomes
Worked for pav
0.148
-0.097
0.148
0.097
0.144
-0.120
0.151
-0.060
•0.132
-0071
0.154
0.108
(0002)
(0.027)
(0.002)
(0 026)
(0.002)
(0.034)
(0.003)
(0.(M3)
(0.003)
(0.049)
(0.002)
(0.031)
8.33
•5,93
8 43
5.92
7.92
7.43
884
3.45
•7.51
•7.55
•8.62
-5.30
(0.09)
(1.27)
(0.09)
(1.24)
(Oil)
(1.57)
(0.14)
(2.15)
(0.15)
(2.28)
(0,10)
(1.52)
Weeks xiorked
Notes: The table reports OLS and IV estimates of the coefficient on the More than 2 children variable. The IV estimates use Same sex as an instrument. The covariates included in panel B are Age, Age at first birth, and dummies for Boy 1st, Boy 2nd, Black, Hispanic, Other race, High school graduate, Some college and College graduate. The samples are the same as in Table 1. Standard errors are reported in parentheses. All calculations use sample weights.
TABLE 3: EFFECTS ON MARITAL STATUS, POVERTY STATUS AND WELFARE USE
Outcomes
All
W omen
Ever
Mlarried
No
OLS
Married Now
Divorced
Divorced or Separated
In Poverty
Weifare Recipient
Married Now
Divorced
Divorced or Separated
In Povertv
Welfare Recipient
OLS
or +
Teen
Mlothcrs
OLS
Adult Mothers
OLS
IV
-0.021
-0.010
-
-
-0.026
-0.023
-0.002
0.007
0.016
(0,001)
(0.153)
-
-
(0.001)
(0.021)
(0.001)
(0.020)
(0.002)
-0.025
-0.062
-0.0075
-0.055
-0.035
-0.082
0.008
-0.035
-0.015
-0.066
0.018
-0.048
(0.002)
(0 024)
(0.001)
(0.020)
(0.002)
(0.030)
(0.002)
(0.036)
(0.003)
(0.051)
(0.002)
(0.025)
IV
IV
IV
OLS
IV
No Covariates
-0
010
(0.0391)
0.0004
-0.003
(0.0009)
(0.014)
-0.013
0.011
-0012
0.012
-0.012
0.0044
-0.017
0.022
-0.024
-0010
-0.022
0.016
(0,001)
(0.016)
(0.001)
(0.016)
(0.001)
(0.0195)
(0.002)
(0.027)
(0.002)
(0.035)
(0.001)
(0.017)
0.0023
0.053
0.0056
0.055
0.0070
0.057
-0.011
0.046
-0.0030
0.048
0018
0.049
(0.0013)
(0.019)
(0.0014)
(0.020)
(0.0016)
(0.024)
(0.002)
(0.032)
(0.0027)
(0.043)
(0001)
(0.021)
0.143
0.095
0.124
0.082
0.167
0.107
0.070
0.088
0.178
0.143
0.083
0.062
(0.002)
(0.029)
(0.002)
(0.020)
(0.002)
(0.031)
(0.002)
(0030)
(0003)
(0050)
(0.002)
(0.024)
0.067
0.033
0.050
0.032
0.079
0.028
0.030
0.049
0.091
0018
0.030
0.032
(0.001)
(0.018)
(0.001)
(0.014)
(0.002)
(0.024)
(0.002)
(0.022)
(0.003)
(0.042)
(0.001)
(0.017)
0.0026
0.0051
-
-
0.0038
-0.025
0.0061
0.023
0.0087
-0.031
0.0043
0.0025
(0.0009)
(0.0136)
-
-
(0.0013)
(0.019)
(0.0012)
(0.018)
(0.0022)
-0.034
(0.0009)
(0.0126)
B.
Ever Married
Some Col lege
IV
A.
Ever Married
College
OLS
With Covariates
0.037
-0.052
0.037
-0.046
0.028
-0.080
0.057
-0.011
0.020
-0.078
0.047
0.041
(0.001)
(0.021)
(0.001)
(0.019)
(0.002)
(0.028)
(0.002)
(0.033)
(0.003)
(0.047)
(0.002)
(0.023)
-0.037
0.0071
-0.039
0.0067
-0.029
0.0010
-0.048
0.018
0 026
0.021
-0040
0.017
(0.001)
(0.0157)
(0.001)
(0.0159)
(0.001)
(0.0196)
(0.002)
(0.026)
(0002)
(0.035)
(0.001)
(0.017)
-0.033
0.048
-0.036
0.047
-0.023
0.054
-0.049
0.038
-0.013
0.040
-0.041
0.048
(0.001)
(0.019)
(0.001)
(0.019)
(0.002)
(0.025)
(0.002)
(0.031)
(0.003)
(0.043)
(0.001)
(0.020)
0.093
0.097
0.087
0.085
0.113
0.113
0.055
0.074
0.148
0.186
0.065
0.061
(O.OCl)
(0.021)
(0.001)
(0.019)
(0002)
(0.028)
(0.002)
(0.029)
(O003)
(0.047)
(0.002)
(0.022)
0039
0.033
0.031
0.032
0.048
0.032
0.020
0.041
0.071
0.043
0.022
0.030
(0.001)
(0.017)
(0.001)
(0.014)
(0.002)
(0.023)
(0.001)
(0.021)
(0.003)
(0.040)
(0 001)
(0016)
Notes: The samples and models are as in Table 2, with different dependent variables. Standard errors are reported in parentheses. All calculations use sample weights.
TABLE 4: POTENTIAL-ASSIGNMENT SUBPOPULATIONS

                   ------------------- No Covariates -------------------   -- With Covariates --
Sample             P[D=1]   p_c        p_a     p_n     p_a-p_n    θ        p_c        p_a-p_n
                   (1)      (2)        (3)     (4)     (5)        (6)      (7)        (8)
All Women          0.375    0.063      0.344   0.594   -0.250     0.619    0.062      -0.250
                            (0.0018)                   (0.0018)            (0.0017)   (0.0018)
Ever Married       0.370    0.066      0.337   0.597   -0.261     0.607    0.066      -0.260
                            (0.0018)                   (0.0018)            (0.0017)   (0.0018)
No College         0.405    0.065      0.372   0.563   -0.191     0.696    0.064      -0.191
                            (0.0023)                   (0.0023)            (0.0022)   (0.0023)
Some College       0.328    0.059      0.298   0.642   -0.344     0.510    0.059      -0.344
                            (0.0027)                   (0.0027)            (0.0027)   (0.0027)
Teen Mothers       0.500    0.064      0.468   0.468   -0.0006    0.999    0.063      -0.0005
                            (0.0034)                   (0.0034)            (0.0033)   (0.0034)
Adult Mothers      0.324    0.062      0.293   0.645   -0.352     0.502    0.062      -0.351
                            (0.0020)                   (0.0020)            (0.0019)   (0.0020)

Notes: The first column reports the proportion treated. The second column shows the proportion of compliers in the sample, which is given by the first-stage effect of Same sex. The estimates of the proportion of always-takers and never-takers and the parameter θ were calculated as described in the text. Estimates with covariates were calculated as described in the appendix. Standard errors are reported in parentheses. All calculations use sample weights.
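As a rough check on how the columns of Table 4 fit together, the sketch below recomputes the subpopulation shares and θ from two first-stage quantities. It assumes monotonicity (no defiers) and uses the expression θ = P[D=1|Z=1]/(1-P[D=1|Z=0]) that appears in the heading of Table 6. The function name and argument names are hypothetical, and the inputs are the rounded All Women values reported in the tables (constant 0.344, first-stage coefficient 0.0628).

```python
def assignment_shares(p_d_given_z0, first_stage):
    """Return (p_c, p_a, p_n, theta) from P[D=1|Z=0] and the first-stage effect.

    Assumes no defiers, so that:
      p_c = P[D=1|Z=1] - P[D=1|Z=0]   (compliers)
      p_a = P[D=1|Z=0]                (always-takers)
      p_n = 1 - P[D=1|Z=1]            (never-takers)
    """
    p_d_given_z1 = p_d_given_z0 + first_stage
    p_c = first_stage
    p_a = p_d_given_z0
    p_n = 1.0 - p_d_given_z1
    theta = p_d_given_z1 / (1.0 - p_d_given_z0)
    return p_c, p_a, p_n, theta


# All Women, no covariates (rounded inputs taken from the tables).
p_c, p_a, p_n, theta = assignment_shares(p_d_given_z0=0.344, first_stage=0.0628)
print(f"p_c={p_c:.3f}  p_a={p_a:.3f}  p_n={p_n:.3f}  theta={theta:.3f}")
```

Because the inputs are rounded, the results should match the All Women row of Table 4 (p_n of 0.594 and θ of 0.619) only approximately. With a symmetric first stage such as the teen mother sample, where p_a and p_n are nearly equal, the same formula gives θ close to one.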
TABLE 5: SYMMETRIC FIRST STAGE SAMPLES

Columns: (1) All Women; (2) Teen Mothers; (3) Symmetric Sample, π0(X)>=0.4 & π0(X)<=0.6; (4) π0(X)<0.4 or π0(X)>0.6; (5) Symmetric Sample, π0(X)>=0.35; (6) π0(X)<0.35

Variables                 (1)        (2)        (3)        (4)        (5)        (6)

First Stage (OLS estimates)
Coefficient               0.0628     0.0638     0.0713     0.0588     0.0684     0.0576
                          (0.002)    (0.003)    (0.003)    (0.002)    (0.003)    (0.002)
Constant                  0.344      0.468      0.471      0.296      0.465      0.253
                          (0.001)    (0.002)    (0.002)    (0.001)    (0.002)    (0.001)

Outcomes (IV Estimates)
Worked for pay            -0.084     -0.026     -0.038     -0.109     -0.080     -0.092
                          (0.027)    (0.051)    (0.045)    (0.034)    (0.038)    (0.039)
Weeks worked              -5.15      -4.76      -3.72      -6.03      -5.90      -4.71
                          (1.30)     (2.40)     (2.21)     (1.62)     (1.83)     (1.86)
Ever Married              -0.010     -0.0098    -0.016     -0.0031    -0.0051    -0.0092
                          (0.015)    (0.0391)   (0.032)    (0.0170)   (0.0267)   (0.0162)
Married Now               -0.062     -0.066     -0.033     -0.066     -0.075     -0.039
                          (0.024)    (0.051)    (0.045)    (0.027)    (0.038)    (0.028)
Divorced                  0.011      -0.010     -0.029     0.025      0.012      0.0057
                          (0.016)    (0.035)    (0.031)    (0.018)    (0.026)    (0.0189)
Divorced or Separated     0.053      0.048      0.020      0.063      0.068      0.033
                          (0.019)    (0.043)    (0.038)    (0.022)    (0.0032)   (0.023)
In Poverty                0.095      0.143      0.095      0.087      0.136      0.048
                          (0.023)    (0.050)    (0.044)    (0.027)    (0.036)    (0.028)
Welfare Recipient         0.033      0.018      0.021      0.034      0.027      0.032
                          (0.018)    (0.042)    (0.035)    (0.020)    (0.029)    (0.020)

Number of Observations    380007     110156     103803     276204     162264     217743

Notes: Columns 1 and 2 repeat estimates from Tables 2 and 3, for models without covariates. Columns 3-6 report estimates using samples selected as described in the text. Standard errors are shown in parentheses. All calculations use sample weights.
TABLE 6: IMPUTATION OF ATE

Columns: (1) OLS; (2) No Selection Alternative; (3) LATE; (4) ATE, θ = 1; (5) ATE, θ = P[D=1|Z=1]/(1-P[D=1|Z=0])

Outcome / Sample          (1)        (2)        (3)        (4)        (5)

Weeks Worked
  All Women               -7.34      -7.56      -5.15      -4.31      -3.19
                          (0.08)     (0.12)     (1.30)     (1.27)     (1.45)
  Ever Married            -7.12      -7.33      -5.09      -4.41      -3.45
                          (0.09)     (0.13)     (1.27)     (1.23)     (1.43)
  No College              -7.22      -7.33      -6.52      -5.73      -4.94
                          (0.11)     (0.16)     (1.60)     (1.57)     (1.70)
  Some College or +       -6.59      -6.90      -3.21      -2.38      -0.60
                          (0.14)     (0.21)     (2.18)     (2.08)     (2.69)
  Teen Mothers            -7.47      -7.66      -4.76      -4.76      -4.76
                          (0.15)     (0.22)     (2.40)     (2.40)     (2.40)
  Adult Mothers           -7.19      -7.43      -5.26      -4.06      -2.19
                          (0.10)     (0.15)     (1.57)     (1.48)     (1.91)

Divorced or Separated
  All Women               0.0023     0.0005     0.053      0.028      0.0092
                          (0.0013)   (0.0019)   (0.019)    (0.019)    (0.0216)
  Ever Married            0.0056     0.0043     0.055      0.024      -0.0002
                          (0.0014)   (0.0020)   (0.020)    (0.019)    (0.0221)
  No College              0.0070     0.0053     0.057      0.032      0.011
                          (0.0016)   (0.0024)   (0.024)    (0.024)    (0.026)
  Some College or +       -0.011     -0.014     0.046      0.034      0.034
                          (0.002)    (0.003)    (0.032)    (0.029)    (0.037)
  Teen Mothers            -0.0030    -0.0063    0.048      0.048      0.048
                          (0.0027)   (0.0040)   (0.043)    (0.043)    (0.043)
  Adult Mothers           -0.018     -0.021     0.049      0.017      -0.0024
                          (0.001)    (0.002)    (0.021)    (0.019)    (0.024)

In Poverty
  All Women               0.143      0.150      0.095      0.049      -0.0023
                          (0.002)    (0.002)    (0.023)    (0.024)    (0.029)
  Ever Married            0.124      0.129      0.082      0.054      0.020
                          (0.002)    (0.002)    (0.020)    (0.021)    (0.026)
  No College              0.167      0.175      0.107      0.061      0.014
                          (0.002)    (0.003)    (0.031)    (0.032)    (0.036)
  Some College or +       0.070      0.071      0.088      0.059      0.031
                          (0.002)    (0.003)    (0.030)    (0.031)    (0.042)
  Teen Mothers            0.178      0.181      0.143      0.143      0.143
                          (0.003)    (0.005)    (0.050)    (0.051)    (0.051)
  Adult Mothers           0.083      0.088      0.062      0.017      -0.039
                          (0.002)    (0.003)    (0.024)    (0.025)    (0.033)

Welfare Recipient
  All Women               0.067      0.072      0.033      0.0058     -0.026
                          (0.001)    (0.002)    (0.018)    (0.0185)   (0.022)
  Ever Married            0.050      0.052      0.032      0.015      -0.0051
                          (0.001)    (0.002)    (0.014)    (0.015)    (0.0182)
  No College              0.079      0.085      0.028      -0.001     -0.031
                          (0.002)    (0.003)    (0.024)    (0.025)    (0.029)
  Some College or +       0.030      0.030      0.049      0.037      0.029
                          (0.002)    (0.002)    (0.022)    (0.022)    (0.030)
  Teen Mothers            0.091      0.096      0.018      0.018      0.018
                          (0.003)    (0.004)    (0.042)    (0.043)    (0.043)
  Adult Mothers           0.030      0.032      0.032      0.006      -0.024
                          (0.001)    (0.002)    (0.017)    (0.017)    (0.023)

Notes: Columns 1 and 3 repeat estimates from Tables 2 and 3. Column 2 shows the no-selection alternative under Restriction 1 for the selection-bias test. Column 4 reports estimates of ATE under Restriction 3 and column 5 reports estimates of ATE under Restriction 4.
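The imputation formulas (A5) and (A6) from the appendix can be written as two small functions, which makes it easy to see how the two ATE columns of Table 6 relate to each other. This is a sketch under stated assumptions: the function and argument names are hypothetical (lam11 and lam01 stand for the complier means of Y1 and Y0 estimated from (A3) and (A4), delta0 and delta2 for the intercept and the (D-Z)/2 coefficient in (A2)), and the numerical inputs are made up for illustration, since the paper does not report these components separately.

```python
def ate_restriction3(lam11, lam01, delta0, p_a, p_n):
    """ATE imputed as in (A5), i.e. with theta fixed at one."""
    gap = p_a - p_n
    return lam11 * (1 - gap) - lam01 * (1 + gap) + 2 * delta0 * gap


def ate_restriction4(lam11, lam01, delta0, delta2, p_a, p_n, theta):
    """ATE imputed as in (A6), with proportionality parameter theta."""
    weighted_gap = theta * p_a - p_n
    return (
        lam11 * (1 - (p_a - p_n / theta))
        - lam01 * (1 + weighted_gap)
        + (delta0 * (1 + 1 / theta) + (delta2 / 2) * (1 / theta - 1)) * weighted_gap
    )


# Hypothetical complier means and (A2) coefficients, for illustration only.
lam11, lam01, delta0, delta2 = 20.0, 25.0, 24.0, -6.0

# Shares and theta similar in size to the All Women row of Table 4.
ate_r3 = ate_restriction3(lam11, lam01, delta0, p_a=0.344, p_n=0.594)
ate_r4 = ate_restriction4(lam11, lam01, delta0, delta2, p_a=0.344, p_n=0.594, theta=0.619)

# Setting theta to one in (A6) recovers (A5).
ate_r4_theta1 = ate_restriction4(lam11, lam01, delta0, delta2, p_a=0.344, p_n=0.594, theta=1.0)

# With a symmetric first stage (p_a equal to p_n), (A5) collapses to lam11 - lam01.
ate_symmetric = ate_restriction3(lam11, lam01, delta0, p_a=0.468, p_n=0.468)
print(ate_r3, ate_r4, ate_r4_theta1, ate_symmetric)
```

The last case mirrors the teen mother rows of Table 6, where the first stage is almost exactly symmetric and both ATE columns coincide with LATE.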