MIT LIBRARIES DUPL mmmmmimunuunmmmRSvmmtainniuiMKny^ ; ", 3 9080 02613 1349 Hrl.'WftlKllillHJfrDHfJiiiill.'limii fill ^11 mm iiijl' iBi ^lil Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries http://www.archive.org/details/treatmenteffecthOOangr [B31 n.^^ ^ M415 Technology Department ot Economics Working Paper Series Massachusetts Institute of TREATMENT EFFECT HETEROGENEITY THEORY AND PRACTICE IN Joshua Angrist Working Paper 03-27 August 2003 Room E52-251 50 Memorial Drive Cambridge, MA 02142 This paper can be downloaded without charge from the Social Science Research Network http://ssrn.com/abstract Paper iSollection id=X^XXX / at rxk I MASSACHUSETTS INSTITUTE OF TECHNOLOGY SEP 1 9 2003 LIBRARIES j C:\D_Drive\WPDOCS\PROJECTS\sargan\drafts\revs8_03\sargan8.wpd; August Treatment Effect Heterogeneity in 1, 2003 Theory and Practice* By, Joshua D. Angrist MITandNBER August 2003 'Presented as the Sargan lecture. Royal Sargan developed much Economic Society, University of Warwick, April 2003. Denis of the estimation theory that instrumental variables practitioners rely on today. For recent surveys of Sargan's contnbutions see Arellano (2002) and Phillips (2003). Special thanks go to Patricia Cortes and Francisco Gallego for outstanding research assistance and to Alberto Abadie, Victor Chemozhukov, Jerry Hausman, and Guido Imbens for helpful discussions and comments. Preston, for their detailed vvntten comments. This is a Thanks also go revised version of Temple and Working Paper 9708. to the editors, Jon NBER Ian Treatment Effect Heterogeneity in Theory and Practice Instrumental Variables (IV) methods identify internally valid causal effects for individuals status is manipulable by the instnmient at hand. whose treatment Inference for other populations requires homogeneity assumptions. This paper outlines a theoretical framework that nests causal homogeneity assumptions. These ideas are illustrated using sibling-sex composition to estimate the effect of child-bearing marital outcomes. on economic and The application is motivated by American welfare reform. The empirical results generally support the notion of reduced labor supply and increased poverty as a consequence of childbearing, but evidence on the impact of childbearing on marital stability and welfare use is Joshua D. Angrist MIT Economics Department angrist@mit.edu keywords: Instrumental Variables, Marital JELNumbers:C31,J12,J13. Stability, Welfare, Causal Effects. more tenuous. Empirical research often focuses on causal inference for the purpose of prediction, yet to say that most prediction involves particular set of empirical results is a fair amount of guesswork. The relevance or "external always an open question. As Karl Pearson (191 1, p. seems it fair validity" of a 157) observed in an early discussion of the use of correlation for prediction, "Everything in the universe occurs but once, there is no absolute sameness of repetition." This practical difficulty notwithstanding, empirical research always motivated by a belief likely effects that estimates for a particular context of similar programs or events in the future. discouraging empirical work reveals that empiricists like The basis for extrapolation stability while of causal effects. is a set As a graduate my own teaching and research are willing to extrapolate. student, I learned about parameter stability as "the Lucas critique," focuses on the identification possibilities for average causal effects in Applied micro-econometricians devote considerable whether homogeneity and stability assumptions can be justified, and implications of heterogeneity for alternative parameter estimates. sometimes comes at the in often- of assumptions about the cross-sectional homogeneity or temporal models with heterogeneous potential outcomes. attention to the question of almost provide useful information about the Our investment of time and energy me is Regrettably, this sort of analysis expense of a rigorous examination of the internal validity of estimates, the estimates have a causal interpretation for the population under study. Clearly, however, valid estimates are less interesting if they are completely local, i.e., to the i.e., whether even internally have no predictive value for populations other than the directly affected group. In this paper, I discuss the nature and consequences of homogeneity assumptions that facilitate the use of instrumental variables (IV) estimates for extrapolation." To be precise, I am interested in the assumptions that link a Local Average Treatment Effect (LATE) tied to a particular instrument with the population Average Treatment Effect (ATE), which is not instrument-dependent. Implicitly, ^Denis Sargan noted the difficulty of the core instrumental variables identification problem, identification in models with constant effects, in his assumptions concerning these errors, namely in Arellano, 2002). that seminal 1958 paper: "It is I have in mind i.e., not easy to justify the basic they are independent of the instrumental variables." (p. 396; quoted prediction for populations defined by covariates. were me to treat individuals with characteristics to sidestep variability due to changes focus on I ATE because it X, what would the hkely change in the answers the question: "If we outcomes be"? This allows in process determining treatment status. For example, causal research often focuses on average treatment effects on the treated population. Overall average treatment effects are theoretically treated as well as The stable than average effects in on empirical work (see, limited in a precise way by a e.g., is of special Moffitt, 1 interest and independence - analogous in the - depend on who gets both because of the growing importance 999), and because the ex ante generality of IV estimates number of well-known theoretical results. constant treatment effects and certain types of randomized interest the treated, since the latter on the distribution of potential outcomes. external validity of fV estimates of rv methods is more the standard trials, to the notion that the instrument induces a are not sufficient to capture the expected causal effect on a Except in special cases like IV assumptions of exclusion good experiment for the effect of randomly selected individual or even population subject to treatment. Rather, basic IV assumptions identify causal effects on "compliers," defined as the subpopulation of treated individuals instrument. Although this limitation is whose treatment status can be influenced by the unsurprising, the nature and plausibility of assumptions under which IV estimates have broader predictive power are worth exploring. The next two sections develop a theoretical framework linking population subgroups defined by their response to an instalment. That parameters like LATE and ATE. My agenda is to make this from stronger (no selecfion bias) applied to the to composition. In particular, are much more likely to is, I consider formal finks between link using a range of assumptions, progressing weaker (a proportionality assumption). These theoretical ideas same sex instrument, used by An grist and Evans on labor supply. This instrument alternative causal parameters to arises from the among parents who have go on to have a third child. fact that at least ( 1 are then 998) to estimate the effects of childbearing some parents prefer a mixed sibling sex two children, those with two boys or two Because child sex is virtually girls randomly assigned, a dummy for same sex sibling pairs provides an instrumental variable that can be used to identify the effect of childbearing on a range of economic and family outcomes. My earlier work with Evans using the same se.x instmment focused on the effects of childbearing on labor supply. While labor supply outcomes also appear investigation of the effects of childbearing marital stability can be motivated I An stability. the grounds features an inquiry into the effects of family size on by American welfare reform, which penalizes women receiving public assistance on welfare receipt more likely. on marital work in this paper, the empirical that increases in family size further childbearing by make continued poverty and therefore look at effects on marital status, poverty status, and welfare use, as well as labor supply. Estimates of ATE for the effects of childbearing are generally smaller than estimates of LATE. For example, while estimates of LATE for the effect of childbearing on welfare use and marital stability are mostly significantly different from zero, most (though not all) of the estimates of ATE for effects of childbearing on marital status and welfare use are small and insignificant. for teen mothers, latter is that is LATE appears to be virtually identical to the population average treatment effect when the empirical results suggest a pattern of modest effects, but the variability in parameter estimates model specifications and samples, as well as the usual problem of more imprecise estimates under weaker identifying assumptions, reduces the predictive value any findings. that the attempt to go from LATE to ATE weakens marital stability and welfare use, but the estimates of is tantalizing result imputed under the assumptions considered below. The across One On balance it seems fair to the evidence for an adverse effect of childbearing ATE do not provide a sharp alternative to LATE. perhaps not surprising, given the difficulty of the underlying identification problem. experimental sciences, the best evidence for predictive value is likely to experiments, which in the case of applied econometrics usually means come from new new instruments. As say on This in the data sets and new 1. Causality and Potential Outcomes in Research on Childbearing The effects of children on marital course of more than academic interest to stability have long been of interest to many married couples. social scientists Previous research (e.g., and are of Becker, Lanes, and Michael, 1977; Cherlin, 1977; Heaton, 1990 and Waite and Lillard, 1991) suggests the presence of young children increases marital stability, although many authors acknowledge serious selection problems that may A related issue is the connection between childbearing and women's standard of living. A large bias results. literature looks at the effect of teen childbearing on mothers' schooling, earnings, and welfare status, Bronars and Grogger, 1994). Interest can be sometimes using instrumental variables (e.g., motivated by welfare reform, which include "family caps" eliminate benefits paid for children bom in to welfare recipients, many U.S. states. on the theory in this question Family caps reduce or by that further childbearing welfare mothers increases the likelihood they will stay poor and therefore continue to receive benefits.' Are children the glue that holds couples together or a women? collapse? Does childbearing further impoverish poor potential outcomes, i.e., with family size and let Yg, is at least D^ be an indicator for two children. Because D, is women binary, I not determined directly by a program or policy. though only one the observed outcome, Y, let be her circumstances otherwise. for everyone, that accelerates a fragile family's Implicit in these questions is the notion of a contrast in circumstances with and without childbearing, for a given family. represent this idea formally, women burden Y„ is with more than two children will refer to it sample of as a "treatment," even though Let Y,, be a woman's circumstances We imagine both of these potential ever observed for each in a woman. Formally, To this if D,=l, outcomes are well-defined can be expressed by writing as = Y„,(l-D,) + Y„Di. For both practical and substantive reasons, 'See Maynard, et al I focus here on fertility consequences defined with (1998) for more on the motivation for family caps. The possibility of a link between is little evidence that family caps actually affect fertility behavior. childbearing and poverty notwithstanding, there See, e.g., Grogger and Bronars (2001), Blank (2002) or Kearney (2002). reference to the transition from two to more On than two children. the practical side, instalments based on Angrist and Evans (1998) used parents sibling sex composition are available for this fertility increment. preferences for a mixed sibling sex composition to estimate the labor supply consequences of childbearing. On the substantive side, post-war reductions in marital (see, e.g., Westoff, Potter, to have more a third child a and Sagi, 1963). While almost may be due in part to a fact that the median of 2 all have been concentrated couples want sense of whether this generally, the economic welfare of the family. supported by the and fertility is - Y„,| D,= l], which is outcomes average, E[ Y(,,| D,= 1 ] Alternately, . E[Y|; - Yg,], which can be used to (or a Estimation of ATE woman is j - Yn,| we may be interested women who have D= 1 is research on causal effects tries For example, a third child. Note we may that E[Y|,| equivalent to estimating the counter-factual unconditional average treatment effect (ATE), if the analysis conditions on covariates). equivalent to estimation of both counterfactual averages, E[Yo,| D,= l ]and £[¥,,! D,=0]. D= l bias term disappears this is with a particular set of characteristics and D,=0 equals E[Y,, - Yo,| to measure. The observed difference D = l] when in outcomes plus a bias term: = E[Y„|D,= 1]- E[Yo,|D,=0] = E[Y„-YoJ D=l] outcomes. But stability or, make predictive statements about the impact of childbearing on a randomly E[Y,|D = 1]- E[Y,|D,=0] The ] in the Causal parameters are easy to describe but hard between those with woman, for different subpopulations. the effect on an observed quantity, so estimating E[Y| woman good for long-term marital children. be interested in E[Y,, chosen child, the decision population of welfare mothers in 1990 had an average of about 2.3 children to capture the average difference in potential is one range Finally, interest in the 2-to-3 child increment Since both Y,, and Y„, are never both obser\'ed for the same D,= 1 ] at least in the 2-3 child childbearing is detennined (1) +{E[Yo,| D,= l]-E[Yo,| D-0]}. in a manner independent of a woman's potential independence assumption seems unrealistic since childbearing decisions are made of information about family circumstances and earnings potential. in light Two sorts of strategies are typically omitted variables bias. One assumes used that conditional Then any causal independent of potential outcomes. conditional-on-X comparisons. This is on covariates, a strong assumption that 2. IV in is a interest, Dj, is from weighted seems most plausible when researchers have Alternately, instrumental variable which, perhaps after conditioning on covariates, The instrument used here of Xj, the regressor effect of interest can be estimated considerable prior inforrnation about the process determining Dj. potential outcomes. presence of possible to estimate causal effects in the we is related to might try to find an Dj but independent of dummy variable indicating same-sex sibling pairs. on the treated whose treatment context TV estimates capture changed by the instrument at the effect of treatment hand. This idea is for those status can be easiest to formalize using a notation for potential treatment assignments that parallels the notation for potential outcomes. In particular, let Doi and D,, denote potential treatment assignments indexed relative to a binary instrument. Suppose, for example, Dj determined by is a latent-index assignment mechanism, D,= where l(Y„ + (2) Y,Zi>7li), binary instrument, and Z, is a = treatment assignments are Do, The is just 1[Yo > is a '^\\ random error independent of the instrument. and D,, = 1[Yo + Yi > constant-effects latent-index assignment or vice versa. model x], model is 'H,]) Dqi what treatment would receive if Zi=0, and i and D,;. restrictive since Whether linked D,j tells us potential both of which are independent of Z|. We can relax this restriction by allowing a random Yn for each an alternative notation for Then the to i, in it implies D,, > which case the for all if i, latent index an index model or note, Dq, what treatment would receive i D,,, tells us Z =1 The observed . assignment variable, Dj, can therefore be written: D, = Do,(1-Z,) This notation makes + D„Z, it clear that, paralleling potential outcomes, only one potential assignment is ever observed for a particular individual. The key assumptions supporting IV estimation Independence. {Yo^, Yi^, Dq,, D,,) FlRSTSTAGE. P[D = 1|Z = 1] ^P[D = 1| MoNOTONICITY. Either D,, > D,,, These assumptions capture the notion affects the probability of treatment Imbens and Angrist [1994] show V n i are given below (for a model without covariates): Z,. Z,=0]. or vice versa; without loss of generality, that the instrument (first-stage), and is assume the fonner. "as good as randomly assigned" (independence), affects everyone the same way if at all (monotonicity). that together they imply: E[Y,|Z,= 1]-E[Y,|Z,=0] = E[Y„-Yo,| D„ > DJ. E[D,iZ-l]-E[D,|Z=0]. The left-hand side of this expression models with measurement YqJ D|, > Dq,], is the effect the population for A Wald's (1940) estimator Effect (LATE) on the of treatment on those whose treatment status is right for regression hand side, E[Y|,- changed by the instrument, i.e., which U, =1 and Do=0.'' + a, some constant a. Y„, the population analog of The Local Average Treatment standard assumption invoked Y„ = for error. is most empirical studies is constant causal effects, In the childbearing application, a constant effects consistently estimates the YqJ D,, > Doi]=a. The in i.e., assumption implies that IV common effects of childbearing on all women, since, given constant effects, E[Y,i- LATE result above highlights the fact that in a more realistic world where this effect 'Proof of the LATE result: E[Y,| Z = 1]=E[Y„, + (Y,,-Y„JD,| Z = l], which equals £[¥„, + (Y, -Y„,)D|,] by independence. Likewise E[YJ Z=0]=E[Y„, + (Y|,-Y„,)D„, ], so the Wald numerator is E[(Y, -Yo,)(D| -D„,)]. Monotonicity means D, -D„, equals one or zero, so E[(Y|,-Y„J(D,,-D„,)]=E[Y| ,-Y„,|D, >D„,]P[D| >Do,]. A similar argument shows E[D,| Z,= 1]-E[D,| Z,=0] = E[D„-Do,]=P[D, >DJ. varies (and indeed then we can be it must vary if, for example, Y, sure only that by manipulating Zj. IV captures is a binary outcome or other variable with limited support), on individuals whose treatment status can be changed the effect These are people with 0,,= ! and Do,=0, or D, -Doi=l Note also . are defined with reference to a particular instrument, then - that since Dn and Dq, again, in the absence of additional assumptions -we should expect different instruments to uncover different average causal effects. We might, for example, expect an IV strategy based on the same sex instrument to identify a different average effect than an instrument based on twin births. instruments that are In fact, Angrist and much lower than those using Evans (1998) report IV estimates using twin same sex birth instruments. Angrist, Imbens, and Rubin (1996) refer to people with D||-Doi=l as the population oi compilers. This terminology is motivated by an analogy to randomized trials where and D| is Z, is a randomized offer of treatment actual treatment status. Since D,i-Doi=l implies D,=Zj, compliers are those experimenter's intended treatment status (though not For compliers, the averages of Yj, and Abadie (2002) shows Yf,| all who comply with an those with D,=Zj are compliers, as explained below). as well as the average difference are also identified. In particular, that E[Y,D,|Z,=l]-E[Y,Di|Z,=0] E[Y„|D„>Do,] (3a) E[Yo,|D„>Doi]. (3b) E[D,|Z,= 1]-E[D,|Z,=0] E[Y,(l-Di)|Z,= l]-E[Y,(l-D,)|Z,=0] E[(l-D,)|Z,= l]-E[(l-Di)|Zi=0] The entire (marginal) distributions of Y, and Yqi are j similarly identified, a fact used by Abadie, Angrist, and Imbens (2002) to estimate the causal effect of treatment on the quantiles of potential outcomes for compliers. An by a important econometric result in the theory of causal effects mechanism like (2), population is that when treatment is assigned average treatment effects and the effect on the treated are not identified without assumptions such as constant effects or some other assumption beyond the 3 given above. This result or theorem appears in various forms; see, for example. Chamberlain (1986), 8 Heckman (1990), and Angrist and Imbens and the role played ( 1 99 1 ). The next section develops a framework that highlights the by alternative homogeneity assumptions in efforts to limits to identification go beyond LATE. The Same sex instrument offers an especially challenging proving ground for these ideas since at most women of American have an additional child as a result of sex preferences. Causal effects on same sex compilers can therefore be quite far from overall average effects if the impact of childbearing on these Before turning to a general discussion of treatment effect heterogeneity, between LATE, ATE, and 2.1 7% effects on the treated in a parametric I women is not typical. briefly explore the relationship model that mimics the same sex setup. A Parametric Example Following Heckman, Tobias, and Vytlacil (2001), trivariate Normal model Normal, let p,„ the distribution of [Y,, Y„, ATE is zero by construction. Assume also that Yi>0 be the correlation between Y,,- Yq, and E[Y„-Y„,| D„ > Do,] ({)() so monotonicity In this parametric r|,. = E[Y„-Y„,| Yo+Y,> = where Assuming (2). calculated average causal effects using a of potential outcomes and the error term for the joint distribution assignment mechanism given by equation I ti, > model, is in the latent-index r|,]' is joint standard satisfied with D|,>Do, LATE and can be written: (4) Yo] P,o{[4)(Yo)-4>(Yo+Y,)][*(Yn+Yi)-^(Yo)]'}, and $() are the Normal density and distribution functions. Similarly, we can use Normality to write the effect on the treated as: E[Y„-Y„,| D,= l] = E{E[Y„-Yo,| Yo+Y,Z,>ti„ Z,]| = -p,o{A(Yo+Yi)E[Z,| where A() is the inverse Mill's ratio, (|)( )/3>(). D = l] = E[Y„-Yo,| Yo + Y, > D = l] ^, LATE (5) + A(Yo)(l-E[Z,| This formula expression better clarifies the difference between E[Y„- Y„,| D = I} is and the D = l])}. useful for calculation, but the following effect on the >Yo]" + E[Y„-Yo,| Yo > treated: ti,](1 -a>), where w={$(Yo+Yi)-'5(Yo)}[P(Z,= 1)/P(D,= 1)] and 1-u=$(yo)/P(D = 1). Equation (6) (6) shows the effect on the treated to be a weighted average of LATE depend on the 2.1.1 LATE first and the average stage and the distribution of effect and the effect on the treated both depend on the correlation between potential outcomes and distribution of the instrument. sketched in Fig. against <J(Yo) for a fixed 1 The The ATE which plots , first 1 sets pio = , would be is in the figure ( 1 treated group only (see, e.g.. 2 on the shows is also depends on the LATE, and the effect on the treated Bemoulli(.5). In other words, as with the 998), the simulated instrument 1 is a dummy that equals one with by 7 percentage points. The top panel of Fig. of treatment increases with the gains from treatment, as in a Roy p.o = -.5 for stronger selecfion on gains. With positive shows that p,o, the LATE equals the effect on the treated when <I>(Yo)=E[D|| incompatible with the Normal latent-index model since an important special case effect on the treated reflected through the horizontal axis. The leftmost point is a constant equal to zero), and increases the probability that D; equals -.1, so that the probability Z,=0]=0. This ( stage of .07 and an instrument that (1951) model, while the bottom panel sets figure effect relationship between alternative causal parameters in the parametric same sex instrument in Angrist and Evans probability V2 with weights that Zj. the latent first-stage error, and on the first-stage coefficients. is "H,, The Effect on the Treated vs. LATE model on those with Yo > in practice, Bloom, 1 most commonly in 'A leading example is the requires Yo~~ °°> but E[Dj| Zj=0]=0 trials with partial compliance in the 984 or Angrist and Imbens, 1991).^ At the other end of the treated approaches the overall average effect that increasing the size randomized it of the first randomized when almost everyone gets treated. Finally, Fig. stage effect from .07 to .30 pulls both trial figure, the LATE and the effect on used to evaluate subsidized training programs offered through the Job Training Partnership Act, one of America's largest Federally-sponsored training programs. Subsidized training was offered but not compulsory in the randomly selected treatment group. About 60 treatment took up the offer, so E[Dj| Z,= 1 ]=.6, where Zj status. LATE On is is the % of those offered randomized offer of treatment and D, is acUial training the other hand, (virtually) no one in the control group received treatment, so E[D|| Z|=0] = 0. In this case, the effect on the treated because the set of analysis of the al ways-takers JTPA. 10 is virtually empty. See Orr, et a/ (1996) for an IV the treated closer to the overall average effect. The effect consequence of the of treatment on the treated fact that selection is LATE above for on gains makes E[Y|,-YnJ Yo > first-stage baseline values, a all 'H,] bigger than LATE. Moreover, LATE provides a better measure of the effect of treatment on a randomly chosen individual (ATE) than does the effect on the treated for from equation density. (4)) is that most parameter values. LATE=ATE when Thus, as noted by Heckman and A important feature of the figure (also apparent final Yi=''2yo since 4>(Yo)=4'("Y[i) by symmetry of the Normal Vyilacil (2000), a "symmetric first stage" that changes the probability of treatment from/? to \-p implies LATE equals ATE in the Normal model, or in any latent variable model with jointly symmetric errors.* 3. Identification Problems and Prospects Angrist, Imbens, and Rubin (1996) show population into three groups, which compilers, other i.e., those for whom D|,= I l refer to that the potential-outcomes framework for IV divides a below as "potential-assignment subpopulations." The and Do,=0. In the latent two groups include individuals whose treatment index model, compilers have Yo+Yi>'n,>Yo- status is unaffected by the instrument. of never-takers, with D, =D|,=0. Never-takers are never treated regardless of the value of Z, might be exposed. In the latent with Doi=l and D, =0 The set of the is = l . is to The consists which they Always-takers are always treated regardless of the value of Z, to In the latent-index empty by treated One are index model, never-takers have r|,>Yo+Yi- The second unaffected group consists ofalways-takers, with D, =Df, which they might be exposed. first model, always-takers have Yo^'Hi- A possible fourth group virtue of the monotonicity assumption. the union of the disjoint sets of always-takers and compilers with Z= l . This "Joint symmetry means that if AVj,. fl,) is the joint density of yj=Yj,-E[Yj,] and r|,, then/f-y,, -T|^=/(y„ t\,). weaker condition with the same result (a symmetric first stage ranging from p to -p gives LATE=ATE) is that EfyJ Ti|] is an odd function (as for a linear model) and that r|, has a symmetric distribution. Angrist ( 1 991 ) somewhat more loosely noted that IV estimates should be close to ATE when the first stage changes the probability of treatment at values centered on one-half, as is required for the first stage to be symmetric. A 1 11 provides an interpretation for D, =Doi tlie following identity: + (D,-D„,)Z„ since Doi=l indicates always-takers and {Dy-D(i^)Z,, indicates compiiers with Z,=l. Since Z| of complier status, therefore be is independent compiiers with Zj=l are representative of all compiiers. Causal effects on the treated can decomposed E[Y,i - Yo;i as: D = l] = E[Y,, - Yo,| Do,>D„]{l-P(Do=D„=l| D,= l)) + E[Y„ Equation (7) generalizes - Yo,| Do,=D, =1]P(D„,=D„=1| D,= l)). which gives the same decomposition (6), (7) for the Normal model. Because an instrumental variable provides no information about average treatment effects in the set of always-takers, LATE is identified while E[Y,,-Y(,,| To further pinpoint the D,= ]] is not. identification challenge in this context, note that E[Y,,| Do,=D,j=] ] and E[YoJ Doi=D,j=0] can be estimated using the following relations: E[Y,J DorD„=l] = E[Y„| Do = l] = E[Yoi| Do,=D„=0] = E[Yo,| D„=0] = E[Y.| D,=0, Z,= l]. The missing pieces of E[Y,| D,= l, Z,=0] (8a) (8b) the identification puzzle are therefore the fully counter-factual averages, E[Y,,| DorD„=0]andE[Yo,|Do,=D„=l]. 3.1 Restricting Potential-Assignment Subpopidations The conditional expectation functions (CEFs) of Y,, and framework for the discussion E[Y,jDoi,D,i] = a, + of alternative identification p,oDo, Yqj given potential assignments provide a strategies. These CEFs can be written: + PnD„ (9a) E[Yo,|Do„D„] = ao+PooDo, + Po,D,, (9b) Equations (9a) and (9b) impose no restrictions since there are three potential-assignment subpopulations and three parameters in each CEF. The 6 conditional means, 12 E[Yj,| Dq,, D,,], are uniquely determined by (9a,b) as follows: Definition Indicator CEF Compliers D,,= l,Do,=0 D|,-Do, Co Always-takers D|=Du=l Dni Co Never-takers D,,=Do,=0 l-D,, tto Group The CEF for observed outcomes, E[Y,| CEFs of Yo, and Y,, given and D,,, D,, D„ Z,], + + CEF for Y„. K, Poi Poo + a, Poi for Y,. + Pn + P10 + P1. a, has a distribution with 4 points of support, while the depend on 6 parameters. This suggests the the former without additional restrictions, a result implied latter are not identified from by the theorem below. and Monotonicity assumptions hold and that YQ^and Y,, and f,(y/ D,,, DoJ denote the conditional distribution functions for potential outcomes given potential assignments and letfyp^fy, d, z) denote thejoint distribution ofY,. D,, and Z,. Then ffy/ Dj,, DJ and f(y/ D,„ DgJ are not identified from fy^^fy, d, z). THEOREM: Suppose the Independence, have multinomial distributions. Proof: Factor the d.f using D,„ Do,) f/y| '^' z), P^o(y)D„, and of the parameters determining show = fyiDzCyl d, + P„(y)D,„ fvozCY' d, z) = a/y) + substitute into fypzCY' First-Stage, Let f(,(yl D,,, DgJ iterate z)g|32;(d,z). The second term is unrestricted. Let expectations to obtain the multinomial likelihood solely as a function fo(y| D||, D,,,) and f|(y| D,„ Dq,). Finally, substitute for fo(y| D,,, D^) and f|(y| invariant to the choice of Pon(y) and P,|(y) as long as a,(y)+P|,(y) is constant. Non-identification of Po(,(y) implies non-identification of the marginal distribution of Yqi while D|,, D,,,) to the likelihood is non-identification of p,|(y) implies non-identification of the marginal distribution of Y,,. The multinomial distributional assumption raises the question of how general the theorem is. It seems general enough for practical purposes since, as noted by Chamberlain (1987), any distribution can be approximated arbitrarily well by a multinomial. Moreover, I'd like to rule out identification based on continuity or support conditions to avoid paradoxes such as "identification at infinity".' 3.2 A Menu of Restrictions A strike variety of restrictions on (9a,b) are sufficient to identify me as being of special interest. The simplest is ATE. I briefly discusses 4 cases that ignorable treatment assignment or "no selection bias." 'See Chamberlain (1986). The multinomial assumption has some content since it implies that potential outcomes have bounded support, so that ATE and effects on the treated are bounded. See Manski ( 990) or 1 Heckman and V>tlacil (2000). 13 Restriction This implies 1 (NO SELECTION LATE = BIAS). Poo=Poi=Pio=Pii=0; = ATE. Under a|-a(, Restriction 1, ATE can be estimated from simple treatment- control comparisons. Because the assumption of no selection bias involves four restrictions while two would be sufficient, ATE over-identified in this case.* is identification by comparing IV and 1, standard OLS estimates, Hausman (1978) test for endogeneity exploits over- equivalent here to a comparison of Wald estimates with A modified and potentially more powerful test can be based on the fact that treatment-control differences. under Restriction A E[Yii| Doj=l]=a| and E[Yo,| D|i=0]=ao. Using (8a,b), this suggests the following specification test; Test for Selection Bias. E[Y,|Z,= l]-E[Yi|Z-0] ={E[Y,|D,= 1,Z,=0]-E[Y,|D,=0,Z,= 1]}. (Tl) E[D,|Z,= 1]-E[D;1Z.=0]. In the appendix, I show how The Hausman a test statistic based test for selection bias replaces E[Y,| here since under Restriction while T are the 1 from the implicitly 1 both fact that the D,= l, Z=0]-E[YJ Di=0, Z = l] on the right hand The Hausman test will side of Tl with E[Yj| D|=1]-E[Y,| Di=0]. test arises on Tl can be computed using regression software. also work in the causal framework outlined OLS and FV estimate ATE. The difference between Tl Hausman test implicitly compares E[Y|,|D,,>Do,] withE[Yj|| Dj=j] forj=0,l, compares E[Yjj| D, i>Doi] with E[ Yj,| D, j=Doi=j] for j=0, The empirical results same under monotonicity but not in general. and a Hausman 1 . These two pairs of comparisons below suggest that Tl which uses , 'a weaker version of Restriction 1 with P||=Poi,=0 is also sufficient to identify ATE since this equates neverCEF of Y,, and always-takers with compliers for the CEF of Y^,. This seems no easier to takers with compliers for the motivate than Restriction 1 , so I limit the discussion to the over-identified case. 14 monotonicity, indeed provides a more powerful specification While pivotal assumption of no selection bias for specification testing, the for causal inference since the use of IV test.' is an unattractive basis An motivated by the possibility of selection bias. is assumption that allows for selection bias amounts to the claim mean-independent of potential treatment assigmnents. I that the difference between Y,, alternative and Y„, is refer to this as "conditional constant effects." Formally, this means: Restriction 2 (conditional constant effects). p„o=P,o; Poi=Pii- This pair of restrictions ATE. ATE., or, equivalently, sense that Y,, and the same Y,,, is just sufficient to identify E[Y| — Y(„| D,„ D^,] In particular, = E[Y, —Yd,]. While we again have = ai-a,, = restriction 2 allows for selection bias in the are correlated with potential treatment assignments, the correlation for both potential LATE outcomes, so that the difference between Y,, and Yg; is restricted to be is orthogonal to potential treatment assignments. Ill tlie same sex example. constant, are nevertheless the rules out Roy (195 treatment. In the ) 1 Restriction 2 amounts same regardless of a to woman's type selection, where treatment status case of childbearing, for example, a implausibly) be independent of individual-level variation On the plus side. Restriction 2 is is likelihood of having children. determined woman's childbearing in the '"Note that the second part effects is is first part the basis of much is by the gains from decision must (somewhat in that does not require it Y„,.'" of Restnction 2 sufficient to identify the effect theoretical matter this Restriction 2 labor-supply consequences of childbearing. 'Abadie (2002) develops a number of related bootstrap specification the at least in part weaker than the usual constant-effects assumption between Y,, and a deterministic link saying that average treatment effects, while not empirical work and is tests. sufficient to identify the effect of treatment on the treated, while of treatment on the non-treated. Although conditional constant may be a reasonable approximation for practical purposes, as a typically implausible unless treatment 15 is exogenous; see, e.g.. Wooldridge ( 1 997, 2003). A third restriction, which I call "linearity," is appealing because with a benchmark Roy-type selection model. The linearity condition Restriction 3 (linearity). E[YoJ Do,, is not fundamentally inconsistent is: Poo=Poi; Pio=Pii- In this case, the potential-outcomes E[Y„|Do„D„] = it CEFs can be written: + p,,(Doi + D„) (10a) D„] = ao + Po,(D„i + 0,0- (10b) a, Restriction 3 requires the potential-outcomes measure of the desire or suitability CEF to be linear in Di* = Doi of an individual for treatment. nevertheless think of (10a) and (10b) as providing a If + D,;, where the restriction minimum mean-squared Dj* is is a summary false, we can error approximation to the unrestricted model, (9a) and (9b). To see how average causal effects are identified under Restriction 3, write the probabilities of being an always-taker and never-taker as P[Do,=D,i=l] = E[Doi]=p3 P[Doi=D„=0] = E[l-D.,]=p„. and note that E[Doi + Dh] = Substitute into ( 1 1 + Oa) and E[Yh-Yo,] (p3 - ( 1 = pj. Ob) and difference to obtain [(a, + P„) - (ao + Po,)] + (Pn (H) - Po.XPa - P„) = E[Y„-Yo,|D„>D„,] + {(E[Y„| Do,= l]-E[Y„| D„>Do,]) The components on the right hand side of ( 1 - (E[Yo,| D,>Do,]-E[Yo,| D„=0])}(p3 1) are easily estimated; details are - p„). given in the appendix. A calculation similar to that used to derive (11) shows that the effect of treatment on the treated can 16 be constructing using E[Y„ - Yo,| D,= l] = + p„) [(a, - (ao + M] + (Pii - PoiXPa/Pa) (12) = E[Y„^Yo,|D„>Do,] + {(E[Y„| Do=l]-E[Y„| D, where p^ is if there are 3. 1 From the probability of treatment. no always-takers, LATE is (12), - (E[Yo,| >Do,]) D„>Do,]-E[Y„,| D, =0])}(p.yp,), we can immediately derive the Bloom (1984) result that the effect on the treated." Symmetry Revisited Restriction 3 is closely related to the symmetry property discussed see this, note that as a consequence of linearity we can interpolate the CEP in the parametric example. for compilers To by averaging as follows: E[Y„ D„ > D„,] I = {E[Y„ Do,= l] + E[Y„ D„=0]}/2. (13) | I This means that expected outcomes for compilers can be obtained as the average of expected outcomes for always- and never-takers. is What distributional assumptions support a relation like determined by a latent-index assignment mechanism, as E[YjD„ = Do,=0] = E[YJD„ = D„,=1] = E[YJti,<Yo], E[Y„h,>Yo + in equation (2). ( 1 3)? Suppose treatment Then, Y,] and E[Y^, I D„ > Do,] = E[YJ Yo + Y, > Tl, >Yo]- If in addition, Yi=-2yo, then equation (13) holds as long as (Yj„ t],) is jointly symmetric, as in the Normal model. The restriction Yi=-2yo implies P[D,= 1|Z,=0] = P[ti,<y„]=1^/^ P[D,= l|Z,= l] = "With no (14) Ph,<-Yo]=P always-takers, we have Dq.sO, so Restriction 3 17 is not binding. for some /? e [0,1] so the first stage is also symmetric (e.g, a first stage effect of .1 that shifts the probability of treatment from 1-/7=. 45 top=.55). The upshot of the previous discussion first is that a stage imply the interpolating property, (13), LATE equals ATE since p=p„ given or, symmetric latent error distribution and a symmetric equivalently, Restriction 3. Moreover, the first stage described in (14).''^ Intuitively, a with symmetrically distributed latent errors equates individuals with characteristics that place them first-stage relationship may shifts the probability LATE with ATE because average treatment effects in the middle of the In LATE such cases, equals it r|| if, as average treatment effects for one sample to this question, see rji. first stage. to invoke Restriction 3 I describe a simple scheme for IV should estimate average treatment This approach naturally raises the question of make Restriction effects 3, who would be similar in the is to use outline a procedure designed to between two samples with and without a symmetric problem how inferences about average effects in another. For a recent Hotz, Imbens and Klerman (2000), the extrapolation and typical, the first stage extrapolate the results from randomized trials across sites with different populations. Here that if effects differ little for distribution (compliers) are seems more of treatment asymmetrically? In the empirical section, effects in this specially constructed sample. on seems reasonable ATE. But what using covariates to construct a subsample with a symmetric attack first-stage be fortuitously symmetric, as for the 1990 Census sample of teen mothers using the same sex instrument. proceed under the assumption that again have symmetric representative of average treatment effects for individuals over the entire distribution of A we I rely first-stage, on the fact then given solved under the maintained assumption that average treatment symmetric sample and its complement. this, note that P[D = Z=0]=1 -p implies p^ = -p. Since p^ + p„ + {P[D,=1| Z = 1]-P[D,= 1| Z=0]} + (2;^-l)=l, this implies As noted in the discussion of the parametric model, LATE=ATE p„=\-p. Pn given (14) also results when E[yjj t|,] is an odd function and the marginal distribution of r|, is symmetric. This makes '^To see 1 1 1 = (\^p) + it possible to have a relation like (13) with, say, a binary or otherwise limited dependent variable for which a symmetnc distnbution is implausible. 18 3.2 Weakening Restriction 3 Suppose again that treatment assignment can be outcomes CEFs are linear in t], (as would be modeled using equation the case under joint Normality). (2), and that the potential- Then we can write, E[Y„| D„„ D„] = a, + p,E[ti,| Do,, D„] (15a) D„] = a„ + PoE[ti,| Do,, D„] (15b) E[Yo,| D„„ where Eh.l Do,, D„] = Eh.l D„=0] + {Eh,| D„,= l]-Eh,| D„=l, Do,=0]}Do, + (16) {E[ti,|D„=1,Do,=0]-E[ti,|D„=0]>D„. Substituting (16) into (1 5a) and (15b) generates an expression for the coefficients in (9a), (9b). This leads to the following generalization of Restriction 3; Restriction 4 (proportionality). Poo=ePoi; The first part of the proportionality restriction = {Ehil Doi=l]-E[TiJ D„=l, which shows why Restriction E[Y^, I if infinity, status. D,;=0]}/{E[ti,| (15a,b) alone. Using (16), we have D,-!, Do,=0]-Eh.| D.-O]}, (17) positive. 4 leads to a generalization of the interpolation formula for average potential outcomes. D„ > Do,] = (1/(1+6))E[Y^, | Do,= l] + (0/(l+0))E[Y^, | D„=0], (18) 6=0, compilers have the same expected potential outcomes as always-takers, while as 6 approaches compilers have the same expected potential outcomes as never-takers. The model comes from 9>0. we now have In particular, so that is p, 0=6(3,1, for linearity for continuous On assumption used outcomes. It to may motivate Restriction 4 seems most plausible in the context of a be more of stretch, however, for binary outcomes such as marital the other hand, without covariates the distribution of r|, 19 is arbitrary. We can therefore define r|, as the latent error tenn in an assignment on the unit interval." This guarantees in the unit interval. mechanism hi<;e that ( 1 5a, b) (2), after can generate transformation to a uniform distribution fitted values for outcome CEFs that also fall Alternately, the weighted average in (18) can be motivated directly as a natural generalization of equally-weighted interpolation using (13). To develop an estimator using (18), substitute Restriction 4 into (9a) and (9b) to obtain: E[Y„| D„„ D„] = a, + p„(eDo, + D„) E[Yo,|D„„D„] = a„ + Differencing and averaging, + D„). po,(eDo; we have =[(a, + P,,)-(ao + Poi)]+(Pn-Poi)(ep,-pJ E[Y,;-Yo,] = E[Y„ - Yo, D„ I >Doi ] + {6-'(E[Y„| D,„=1]-E[Y„| D, >Do,]) - We can map out the values of ATE consistent with the data This sensitivity analysis is An (E[YJ D, >DJ-E[YJ by evaluating (19) subject to the caveat that at the extremes not identified, a fact apparent from (19) it infinity, on the Although 6 is stage coefficients, Yq first suggests a strategy for estimating 6 using information on these coefficients only. Suppose that a uniform distribution Uniform = [Yo + on the unit Y.]/[l -Yo] symmetry in a straightforward "The '""To as discussed above, or that the interval. Then ATE is CDF of ti, not identified and ( 1 Yi. This 5a,b) holds can be approximated by a straightforward calculation gives = P(D,= 1| Z,= l)/[1 -P(D,= 1| This has the property that 6=1 p„). (18).''' clearly depends in large part for a latent error transformed to - for alternative choices of 6. where 6 equals zero or alternative to sensitivity analysis is to try to estimate 6 using (17). without further assumptions, D„=0])}(ep, Z,=0)]. (20) when P(D,= 1| Z,= 1)=1-P(D=1| Z=0), while capturing deviations from manner. The value of 6 calculated using (20) in the 1990 Census sample requires that the underlying error have a continuous distribution. compute the effect of treatment on the treated under Restriction 20 4, replace Op^-p^ with Op^/pj in (19). analyzed here is .61 close to the value calculated using Normality (.58). , Homogeneity Restrictions Specification Tests for Because ATE is - by definition - invariant Restrictions 2, 3, and 4 can be partly checked of Restriction In the case IV estimates of the same 2, this amounts to the particular instalment used to estimate by comparing alternative estimates using different instruments. to a Sargan (1958) over-identification structural coefficient. Under suggested by the fact that under Restrictions 3 or A test of whether whether 4. - Y„,| p, , - D,„ Do,]= (ar p,, Poi is - Po, positive aj + equals zero is comparing alternative A final set of specification tests is 4, (p,, " Po,)(9D„, is test Restrictions 3 and 4, the relevant comparison should use equation (19) to convert estimates of LATE into estimates of ATE. E[Y„ it, + D„). therefore a test of conditional constant effects, while a test of a test for Roy-type selection on the gains from treatment. Childbearing, Marital Status, and Economic Welfare The same sex instrument is a dummy for having two boys or two girls at first Angrist and Evans (1998) showed this instrument increases the likelihood mothers with go on to have a third child by about 6-7 percentage points, but demographic characteristics. The data set used here is the paper. This sample includes mothers aged 21-35 with 1 is and second at least birth. two children otherwise uncorrected with mothers' 990 Census extract used in the Angrist and Evans two or more children, the oldest of whom was less than 18 at the time of the Census. Descriptive statistics are reported in Table women, and for four 1 for the full sample, for a subsample of ever-married subsamples defined by mothers' education and age at first birth. The division into subsamples was motivated by earlier results showing markedly different effects of childbearing by maternal education, and because of the policy interest in teen mothers. 21 The probability of having a third child ranges from a low of .33 The in the sample of women with some college, probability of having a same-sex sibling pair is more control for the demographic covariates listed in Table variables of interest appear 4.1 OLS, at bottom of Table the 1 to a high of .5 in the or less constant at .505. using linear models.'^ sample of teen mothers. Some of the Means estimates for the outcome 1. and 2SLS Estimates IV, The effect oisame sex on the probability of having a third child varies from a low of 5.9 percentage points in the some-college sample to a high of 6.5 percentages points in the no-college sample. This can be seen in the first row of Table 2, which reports first-stage estimates. The E[D|| Z,= l]— E[D,| Z,=0] group. '^ As a = E[D, -Do,], is first-stage effect without covariates, also an estimate of the proportion of the population in the compilers benchmark, the next two rows of Table 2 show estimates of the effect of childbearing on two of the labor supply variables studied by Angrist and Evans (1998). These are IV and models without covariates, The Wald (IV) i.e., Wald OLS estimates from estimates and simple treatment-control contrasts. estimates of the effect of a third child on employment status and weeks worked suggest mothers reduced their labor supply as a consequence of childbearing, though not by as indicated by the less likely to OLS estimates. For example, for are larger for less-educated weeks worked are about women than for those "See Abadie (2003) and Frolich (2002) '^Although nevertheless 1 as 3 percentage points work, but the corresponding IV estimate suggests a causal effect of only 8 percentage points. The OLS and IV estimates it women who had a third child were about much is we cannot with - 7 and some - 5. estimates of labor supply effects college; in fact, the labor supply estimates are for nonlinear causal identify individual compilers in The IV models with covariates. any sample and tabulate their characteristics directly, possible to use data to describe the distribution of compiler characteristics, and to compare this to the unconditional distribution. In particular, the difference in first-stage estimates across samples defined by covariates characterizes the distribution of covariates x„ E[-x,| D,>D(,,]/E[x,] = E[D,,-D|,,| .t = l ]/E[D| -D(,,]. among compilers. To see this, note that for a binary covariate, Table 2 therefore also shows same sex compilers to be less educated and more likely to have been married than the overall average. 22 not significant in the some-college sample. In contrast, the IV estimates are smaller for first birth as The teenager than for last two rows in women who Table 2 show had their after adding controls for age, age at first birth, for three schooling groups. including them has effect little covariates are OLS and dummies dummies, and dummies birth as an adult. first first-stage, two-stage least squares (2SLS) estimates to indicate first-born same sex Since on the 2SLS estimates. Moreover, is in response to the addition OLS and second-bom boys, race uncorrelated with these covariates, in spite of the fact that some of the good predictors of outcomes, estimates with covariates than those without. Perhaps more surprisingly, the women who had their are only slightly more precise estimates of labor supply effects also change little of covariates. Estimates of the effect of having a third child on marital status, poverty status, and welfare use are reported in Table 3 for models with and without covariates. In the children are less likely to be married. But this due age at first birth, since OLS is at least in part estimates with controls show important finding rv and OLS is that when is the effect of childbearing to uncontrolled demographic factors such that additional childbearing is associated with an increase in the likelihood of being married. In contrast to the covariates suggest that the causal effect of childbearing sample of all women, those with more is OLS estimates, IV estimates with or without a reduced probability of being married. Thus, an estimated in models with demographic controls, estimates have opposite signs. The most important change in marital status caused likelihood of being divorced or separated. The estimated by childbearing appears effects to be an increase in the of childbearing on the probability of being ever-married or divorced (but not separated) are not significantly different from zero. Consistent with an increase in marital breakup, the birth of a third child also appears to lead to a marked increase in the likelihood a at least a woman lives in a family with total family income below the poverty line. Here we should expect mechanical effect since the poverty threshold falls as family size increases. estimates are larger than IV estimates in models without covariates, 23 OLS and IV estimates Although in OLS models with woman is poor by 9-10 percentage points. covariates both indicate that a third child increases the hicehhood a Given the elevated rates of marital breakup and the increase by childbearing, woman is imprecise. it seems reasonable on welfare. Both the to expect that the birth IV and The OLS estimate of OLS increase in the The IV estimates show no The IV estimate an effect of to the a marginally significant 3.3 % with magnitude represents a roughly one-third sample of ever-married On in the women women woman has ever by are unlikely to be affected sample of ever-married the other hand, while the rates is a significant 8 percentage points (s.e.=.028) for insignificant for 6.7 percentage points without between childbearing and the probability a Not surprisingly, therefore, the IV estimates identical to those in the full sample. this is though the IV estimates are welfare. relationship been married, so estimates limited selection bias. in levels, number of women on this, on welfare use range from the effect While small third child also increases the likelihood a estimates tend to support covariates to 3.9 percentage points with covariates. or without covariates. of a poverty rates that appear to be caused in women are almost IV estimate of the reduction with no college, it is in marriage close to zero and women with some college. The effects of childbearing on poverty are also larger in the no- college sample, though the difference in effects on welfare use by college status is reversed and much smaller than the difference in effects on poverty rates. The difference in estimates by mothers' age at first birth also suggest a pattern of larger effects with decreasing socioeconomic status, though the contrast is not as clear cut as the differences by schooling group. While the increases in marital dissolution and welfare receipt are larger for teen mothers than for adult mothers, the estimates are significant only in the latter group. Estimates of effects on divorce/separation are similar in the two groups, though again much more precise for the sample of adult mothers. This difference in precision undoubtedly reflects the smaller sample of teen mothers. One clear contrast, however, higher likelihood that a third birth pushes a teen mother into poverty. significant regardless of mothers' age at first birth, but 24 it is The impact on poverty is the status is roughly three times larger for teen mothers. 4.2 Heterogeneity across potential-assignment subpopiilations The had first-stage estimates % of each sample consists of compliers, imply that 6-7 i.e., mothers who response to a homogenous sibling-sex mix. Because the overall probability of treatment ranges a child in upwards from about .32, the overwhelming majority of treated individuals are al ways-takers. This can be seen in Table 4, which gives the distribution of potential-assignment subpopulations. In the sample of women, for example, 6.3 % are compliers, 34 % are always-takers sibling-sex composition), and 59 sibling-sex composition). % are never-takers (i.e., The proportion of treated who (i.e., would never have are compliers relatively small proportion of compliers, the scope for differences assignment subpopulation is in the latter is 4.2.1 to -.352 in the 1 - (Pj/pj), or of about 8 %. Given the average causal effects across potential- that determines calculated using Restriction 3 and equation (11), or described in the Appendix. mothers is a third child regardless to substantial. Table 4 also reports the estimate of (p^-p„), the multiplier ATE when have a third child without regard all The estimate of (p^-pj is in -.25 in the full sample, how far LATE is from models with covariates and ranges from as for teen sample of adult mothers. Symmetric subpopulations The value of zero for same as (p^^p,,) in the teen ATE under Restriction 3. This mother sample is noteworthy because consequence of the it means that fact that the first stage for teen LATE mothers is the is almost perfectly symmetric: the same sex instrument shifts the probability of further childbearing from about .47 to ATE .53. for teen The is a Moreover, because 6 for teen mothers mothers under Restriction 4 are also close first two columns of Table 5 focus is to about 1 when estimated using LATE. on the comparison between estimates mothers only, repeating earlier estimates for these samples from Table first-stage coefficient and intercept. For the most part, 25 (20), estimates of IV estimates 3 for all women and teen without covariates, including the for teen mothers are similar to those for the sample of all While the estimated women. on employment effect is considerably lower (s.e.=.051) versus .084 (s.e.=.027) in the full sample, the effect of childbearing on (s.e.=l .3) in the full sample and -.062 (s.e.=.024) in the estimated in the 2, for teen A test full full -4.8 (s.e.=2.4) for teen mothers. at -.026 weeks worked Similarly, the effect on marital is -5.2 status is sample and -.066 for teen mothers (.051). Note that we can view the parameters sample as estimates of E[Y,i- YoJ D|i>D(,j, X=all women], while the estimates in column mothers, can be interpreted as measuring E[Y,j- YqjI X=teen mothers] under Restriction 3 or of equality across columns 1 and 2 is 4. therefore a joint test of the invariance of average treatment on X and on the compilers potential-outcomes subpopulation. The fact that these effects to conditioning both are similar is evidence against substantial treatment effect heterogeneity in both dimensions, though of course there are scenarios where this test has There women is some evidence samples. For all no power. '^ for a difference in effects women, the IV estimate of (s.e.=.023), while the corresponding estimate across samples is precise than in the weakened, however, by the full a symmetric first stage. the effect of childbearing I details of the on poverty status is .095 143 (s.e.=.05) in the teen mother sample. The comparison fact the estimates in the teen sample. This raises the question of whether we can mother sample are much less construct a larger sample with attempted to construct such a sample by estimating a Probit first-stage allowing interactions with covariates The is. on poverty status between the teen mother and all- and then selecting the sample based on covariate-specific symmetric sample selection are as follows. The idea is to fitted values.'* use a parametric model capture the variation in the first-stage effect of same sex on childbearing with demographic covariates. The model allows for a large set predicted first-stage effect Here, of interaction terms with covariates. is I then look for covariate values where the symmetric in the sense required by Restriction "As with an over-identification test, the power of the we maintain £[¥,,- Y(,J X=teen mothers]=E[Y,,- Y„,]. '*A maintained assumption here is test turns that the distribution covariates used to select the sample with a symmetric 26 I began with a Probit first- on maintaining the validity of a benchmark. of yj, and first stage. 3. r|j is jointly symmetric conditional on the stage equation: P[D,= 1|Z,.X,] = <D[ko'X, + (k,'X,)Z,], where X, is (21) a vector of covariates that includes age, age at first birth, Black and Hispanic dummies, and dummies indicating women with some college and college graduates. The main effects, Ko'X,, and interaction terms, k, 'X,, use the same parameterization of covariate effects (in particular, they both allow for linear terms in the age variables plus main effects for the dummies). values. For each of these values, i^„(XO^ women is plotted in Fig. 3. This gives the distribution of the probability of childbearing with different X-characteristics and Z, equal .34, though there By definition, a symmetric first stage identify a sample where Column 3 of Table first-stage in this as required this is most likely, I shifts the probability absolute value than in the full the symmetric sample is tail is in the to .54, i.e., in this sample and .6. approximately from/? to \—p, symmetric sample are smaller in For example, the estimated effect on weeks worked in complementary sample is a lack of precision. of the distribution of first-stage base values plotted turns out, limiting the .4 sample complementary to the symmetric handicapped by larger symmetric sample can be constructed simply it with iioP^d between -3.7 (s.e.=2.2), while the corresponding estimate in the comparison concentrated which has about 104,000 obsen.'ations. The estimated sample, and smaller than -6 (s.e.=1.6). Again, however, the The long right women initially selected 4. is of treatment across the value of one-half of treatment from .47 sample, for which results are reported in column distribution of k„(X,) considerable spread. shifts the probability 5 reports estimates for this sample, sample is The to zero. by symmetry. For most outcomes, the IV estimates working up. As distinct calculated around the overall average of about To on about 1,700 <J.[k„'X,], the distribution of which for I In practice, K(,'X, takes in Fig. 3 suggests that by dropping values of 7:y(X,) beginning from the to individuals left and with values of 7to(X|) greater than or equal .35 leads to a first stage that shifts the probability of treatment 27 an even to from .465 to .533, virtually perfectly This can be seen symmetric. resulting sample of which reports 5, 62,264 observations. Most of the estimates sample and this different from zero is in this first-stage and IV estimates for the symmetric sample are close -5.9 (s.e.= 1.8) to those and the effect on divorce or The in the results in Alternately, we Table The its complement (.136 versus .048), but the estimated effect is still significantly complementary sample. ofATE 4.2.2 Imputation 3. of Table .068 (s.e.=.032). Perhaps surprisingly, the estimated effect on poverty status differs markedly is between in 5 For example, the effect on weeks worked in the full sample. separation 1 column in Table 5 reflect can use equati ons results (1 1 ) an attempt to identify or construct samples where or (1 9) to impute a value of LATE=ATE. ATE for the various subsamples analyzed of this effort are presented in Table 6 for four outcomes; this table also reports the no-selection alternative used to construct the specification test discussed at the beginning of Section 3.2. The estimates of the no-selection alternative are corresponding sample is OLS estimates. OLS than a conventional between IV and the no-selection are similar to the estimates of LATE, even in in the full This suggests, as noted -7.56 (s.e.=.12). alternative provides a estimate of LATE of -5.3 (s.e.=l somewhat smaller estimates .6). (1 1) more powerful specification for the effect of childbearing samples where the the estimate of ATE for the sample of non-teen still is weeks worked IV/OLS comparison. Estimates of ATE constructed using equation again mostly from the estimates of LATE than the farther estimate of the effect on -7.34 (s.e.=.08), while the no-selecdon alternative earlier, that the contrast test For example, the all slightly (i.e., adult) first-stage is not mothers is Using the estimates of 6 shown for the effect symmetric. For example, -4.1 (s.e.= 1.5), in in on weeks worked comparison with an Table 4 and equation (19) generates on weeks worked other than in the teen mother sample, though significant. Estimates of ATE for outcomes other than weeks worked are mostly insignificant. This contrasts 28 LATE. Again, with the mostly significant estimates of estimates LATE. of ATE when 8 woman outside the teen mother sample For example, while (s.e.=.01 9), the is move LATE this is partly a The evidence with two children is But the substantially closer to zero than the estimates of suggests the probability of divorce or separation increases by .053 corresponding estimates of ATE are .028 (s.e.=.01 9) estimated. problem of precision. when 6 equals one and .009 (.s.e.=.02 1) that further childbearing increases divorce or separation for the typical therefore weaker than the estimates of sample of teen mothers, the estimates of ATE for effects LATE would suggest. Except for the on poverty status are also smaller than the corresponding estimates of LATE. 5. Summary and Conclusions The framework outlined here provides a strategy potential-assignment subpopulations. effects 1 focused for initially modeling treatment on restrictions that make IV on compliers representative of the overall population average treatment leads to procedures that can be used to impute average treatment effects outcomes for compliers, always-takers, and never-takers. instruments suggests this approach On may be the empirical side, estimates of treatment effects for this population, proportionality assumptions. And An illustration effect heterogeneity across effect. estimates of causal The framework also from information on average of these ideas using same sex useful in applied work. LATE when for teen the latter mothers are close to the corresponding average are inferred using a number of linearity or while estimates of the overall average effect of childbearing are smaller than the corresponding IV estimates, most of the estimated effects on labor supply and poverty status remain substantial and significant. On the other hand, most (though not all) of the estimated average effects on marital status and welfare use are small and insignificant. Estimates of the effects of childbearing on marital stability and welfare use using the instrument suggest the outline of a coherent picture, but 29 many features remain unresolved. same sex In this application, the theory of parameter heterogeneity runs quickly into the sandpile of sampling variance specification uncertainty. likely to weaken the case and precise alternative marital stability On balance, to traditional IV. seems validity of IV estimates is data sets, and, of course, think extrapolation efforts of the sort implemented here are more for the predictive value of a particular causal estimate than to provide a concrete and welfare use destructive evidence I and to is me For example, the evidence for an adverse clearly to weakened by the attempt be a prominent feature of ultimately established less to go from life in effect of childbearing on LATE to ATE. the empirical world. This sort of The external by new econometric methods than by replication by new instalments. 30 in new APPENDIX: COMPUTATION 1. The Drop test for selection bias. individual subscripts from the notation. Consider the following two-equation system: Y = Ao + A,D + Y= + 6„ 6,(1 (Al) |i [D=Z]) + 6,((D-Z)/2) + v The test for selection bias is a test and A2 is sample sizes A = , when A 6, ( The two coefficients and estimated by OLS. A and A2 and allowing on the order of that used here, be estimated by stacking for of whether (A2) 1 1 ) is estimated by IV using for heteroscedastic and correlated residuals. seems reasonable it to treat the stochastic and use the standard eiTor of the estimate of A, to construct a 2. Estimates under Restriction Use (A2) In practice, estimate of 62 as non- t-test. to write: E[Yo| D,=D„=1]= D=0, Z=l] = E[Y|| D,=Do=0]= Estimates of E[Y,|| D, > D,,] 6„ + 6^ - 6,/2 hJl. and E[Y|| D, > D„] can be obtained as IV estimates of the coefficients Ag, and (A3) and (A4), below: DY = A,o + (l-D)Y = A,,D + Ao„ (A3) n, + A„,(l-D) + (A4) (io. Estimates of ATE under Restriction 3 area linear combination of the standard error for any linear combination of them To further simplify, rewrite equation E[Y, - Yo] = A„[l-(p,-pJ] dependent variable d'=D(2Z- 1 )- Z. ( 1 1 ) in is 60, 6,, Aoi,and A,,. These coefficients and can be estimated by stacking A2, A3, and A4. terms of the parameters - Ao,[l+(p,-p„)] To accommodate models with covariates, it a instrument 3. E[Y,| D=l, Z=0] = E[Y|| A,, in Z as an the asymptotic standard error for their difference can + 26„(p,-pJ. in A2-A4 as (A5) convenient to use a regression set-up to estimate p^- p„. Define Regress d" on Z; the coefficient on Z is an estimate of p^- p„. Note that (without covariates) the standard error for the estimated Pa-p„ is the same as the standard error for the firststage coefficient since the latter can be written 1 -p^-p^. To estimate E[Y, - Yq| D=1], replace p^-Pn with Pa/Pd in (A5). Models with covariates were estimated by adding covariates to the relevant first-stage equations, and A -A4. As a shortcut for inference for estimates of ATE using A5, it seems reasonable to treat (Pa^Pn) '^"d 6(1 as known since these are estimated much more precisely than A,, and A,,,, which are themselves instrumental variables estimates. Note also that IV estimates of A,, and Aq, are independent. to equations 1 31 3. Estimates under Restriction 4. Substitute parameters E[Y, - Yo] = from A1-A4 into (19) and simplify A,,[l-(p,-e-'p„)] - Ao,[l+(ep,-pJ] Standard errors were calculated treating P3, p„, 6q, 62, + and 9 32 to obtain [6„(l+0-') + (62/2X6-'- l)](ep3-pj. as known. (A6) REFERENCES Abadie, A. (2002). "Bootstrap Tests for Distributional Treatment Effects in Instnimental Variables Models," Journal of the American Statistical Association, vol. 97, pp. 284-292. Abadie, A. (2003). "Semi-parametric Instrumental Variables Estimation of Treatment Response Models," Journal of Econometrics, vol. 1 13, pp. 231-263. Abadie, A., Angrist, J., and Imbens, G. (2002). "Instrimiental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings," Econometrica, Angrist, Epidemiology, Angrist, J. NBER Technical Regressors: Simple Strategies Angrist, November. Models with Dummy Endogenous foTEmpivicalpracUce" Journal ofBusiness ami economic Statistics, 1 15, 2-16. and Evans, W.E. (1998). "Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size," American Economic Review, vol. 88, pp. 450-477. J. J., and Imbens.G. (1991). "Sources of Identifying Information Technical Working Paper No. Angrist, Working Paper No. (2001). "Estimation of Limited Dependent Variable vol. 19, pp. Angrist, vol. 70. pp. 91-117. (1991). "Instrumental Variables Estimation of Average Treatment Effects in Econometrics and J. 1 17, in Evaluation Models," NBER December 1991. Imbens,G. and Rubin, D.B. (1996), "Identification of Causal Effects Using Instrumental Variables," Journal of the American Statistical Association, vol. 91, pp. 444-55. J., M. (2002). "Sargan's Instrumental Variables Estimation and the Generalized Method of Moments," Journal aof Business and Economic Statistics., vol. 20, pp. 450^59. Becker, G., Landes, E., and Michael, R.T. (1977). "An Economic Analysis of Marital InstahiVily," Journal of Political Economy, vol. 85, pp. 141-87. Blank, R. M. (2002). "Evaluating Welfare Reform in the United States," NBER Working Paper 8983. June. Bloom, H. S. (1984). "Accounting for No-Shows in Experimental Evaluation Designs," Evaluation Arellano, 1 Review, vol. 8, pp. 225-46. Chamberlain, G. (1986). "Asymptotic Efficiency Econometrics, Chamberlain, G. vol. 32, pp. (1987). in Semi-Parametric Models with Censoring," Journal of 189-218. "Asymptotic Efficiency in Estimation with Conditional Moment of Econometrics, vol. 34, pp. 305-334. Cherlin, A. (1977). "The Effect of Children on Marital Dissolution," Demography, vol. 14, pp. 265-72. Bronars, S. G. and Grogger, J. (1994). "The Economic Consequences of Unwed Motherhood: Using Twins as a Natural Experiment," American Economic Review, vol. 84, pp. 1141-1 156. Frolich, M. (2002). "Nonparametric FV estimation of Local Average Treatment Effects with Covariates," IZA Discussion Paper No. 588, September. Grogger, J. and Bronars, S.B. (2001). "The Effect of Welfare Payments on the Marriage and fertility Behavior of Unwed Mothers: Results from a Twins Experiment, "Joi/r«a/o/P«/;7;co/£'c'onowv, vol. Restrictions, "Jo«r«a/ 109, pp. 529-545. Hausman, J. (1978). "Specification Tests in Econometrics," Econometrica, vol. 46, pp. 1251-1271. Heaton, T. B. (1990). "Marital Stability throughout the Child-Rearing Years," Demography, vol. 27, pp. 5563. Heckman, Heckman, J. J., Heckman, J.J., J. J. (1990). "Varieties of Selection B\as," American Economic Review, and Vytlacil, E. (2001). "Four Parameters of Interest Programs." Southern Economic Journal, vol. 68, pp. 210-223. Tobias, J., and Vytlacil, E. (2000). "Local Instrumental Variables," Paper 252. April. 33 vol. 80, pp. in the 313-318. Evaluation of Social NBER Technical Working Holland, P. W. (1986). "Statistics and Causal Inference," Journal of the American Statistical Association, 945-970. vol. 81, pp. GAFN: A Re-Analysis NBER Working Paper 8-7, November. Hotz, V.J., Imbens, G. and Klerman, J.A. (2000). "The Long-Term Gains from the Impacts of the California Imbens, G. W. and Angrist, Econometrica, Kearney, M. (2002). J. Family Cap," Program," of (1994). "Identification and Estimation of Local Average Treatment Effects," vol. 62, pp. "Is there GAfN 467-75. an Effect of Incremental Welfare Benefits on Fertility Behavior? A look at the NBER Working Paper 9093, August. Manski, C. (1990). "Nonparametric Bounds on Treatment Effects," American Economic Review vol. 80, pp. 319-322. Maynard, R., Boehnen, E., Corbett, T., Sandefur, G., and Mosley, J. (1998). "Changing Family Formation The Family, and Reproductive DC: National Academy Press. R. A (1999). "New Developments in Econometric Methods for Labor Market Analysis," Chapter 24 in O. Ashenfelter and D. Card, eds.. The Handbook of Labor Economics, Volum 3A, Behavior Through Welfare Reform," in Robert Moffit, ed., Welfare, Behavior: Research Perspectives, Washington, Moffit, Amsterdam: North-Holland. Bloom, H.S., Bell, S.H., Doolittle, F., Lin, W., and Cave, G. (1996). Does Training for the Disadvantaged Work?, Washington, DC: The Urban Institute. Phillips, P. C.B. (2003). Vision and Influence in Econometrics: John Denis Sargan," Econometric Theory, Orr, L. L., forthcoming. Grammar of Science, London: Adam and Charles Black. A. "Some Thoughts on the Distribution of Earnings," Oxford Economic Papers Roy, (1951). Pearson, K. (1911). The Sargan, D. (1958), "The Estimation of Econometrica, Waite, L. J. vol. 26, pp. 3, 135-46. Economic Relationships Using Instrumental Variables," 393-415. and Lillard, L.A. (1991). "Children and Marital Disruption," American Journal of Sociology, vol. 96, pp. 930-53. Wald, A. (1940). "The Fitting of Straight Lines if Both Variables are Subject to Error," Annals of Mathematical Statistics, vol. 1 1 pp. 284-300. Wooldridge, J. M. (1997). "On Two-Stage Least Squares Estimation of the Average Treatment Effect in a Random Coefficient Model," Economics Letters, vol. 56, pp. 129-133. Wooldridge, J. M. (2003). "Further Results on Instrumental Variables Estimation of Average Treatment Effects in the Correlated Random Coefficient Model," Economics Letters, vol. 79, pp. 185-191. , 34 Panel A: Moderate selection on gains 0.20 - 0.15 - ^ "---^ 0.10 in -^^^ *•* o 0.05 - 0.00 - -0.05 - -0.10 - ~-^^ ^_ 0) c a> E ~^^^ 03 -LATE -TT ^ -0.15 -0.20 "^ - 0.0 0.1 0.2 0.3 0.5 0.4 0.7 0.6 0.8 0.9 P{D=1|Z=0) Panel B: Strong selection on gains 1 c;n ' 1.00 5 L___ 050 ^~'"'''---^' o do en o TT — ^ reatment o ^^-^ "^ ' LATE "^ -1.00 0.0 0.1 0,2 0.4 0.3 0.5 0.6 0.7 0.8 0.9 P(D=1|Z=0) Fig. The sets between LATE, ATE and the effect on the treated (TT) for alternate firstfixed at .07 and ATE=0. The top panel calculation the correlation between gains and the treatment index to -.1, while the bottom panel sets this 1 : relationship stage baseline values. correlation to -.5. The first-stage effect is Panel A: Moderate selection on gains n 9=; 1 0.20 0.15 o ^ "^^^^^^^^"t— o -^ o d en effects *- 00 ^ -0.05 . ''^^...^^, ^^--^ - 0) H —^ -^ -TT -0.10 LATE -. -0.15 i 1 j ! -0.20 n 9^ 0.0 0.3 0.2 0.1 0.5 0.4 0.6 0.7 P(D=1|Z=0) Panel B: Stronj selection on gains 2, 1.00- w 0.50 ^^^^^^^^^ - 0) p ' 00 --.— _j.^ 0) E ^ ! £ TT -0.50 LATE 1- -1.00 1 ^n 0.0 0.1 0.2 0.3 0.5 0.4 0.6 0.7 P(D=1|Z=0) Fig. 2: The relationship between stage baseline values. sets the correlation correlation to -.5. The LATE, ATE and the effect first-stage effect is fixed at .30 between gains and the treatment index on the treated (TT) for alternate first- and ATE=0. The top panel calculation to -.1, while the bottom panel sets this sf- ,Ji r^" Density 2 -- >?3I« o- 1 1 .4 .2 .6 .8 P(D=liZ=0,X) Fig. 3: Distribution of first-stage base probability as a function of covariates. The Black and Hispanic dummies, and dummies for some college and college graduates. There are about 1,700 values in the histogram. covariates are age, age at first birth. TABLE 1 DESCRIPTIVE STATISTICS, : WOMEN AGED 21-35 WITH 2 OR MORE CHILDREN Means and Standard Deviations Variable All Ever No Some Teen Adult Women Married College College or + Mothers Mothers Children ever born More than 2 children (=1 if mother had more than 2 children) Boy 1st (=1 if 1st child Boy 2nd (=1 if Two boys was a 2nd child was (=1 if Two girls were (=1 if Same sex (=1 if first 0.405 0.328 0.500 0.324 (0.470) (0.500) (0.468) 0.512 0.512 0.510 0.514 0.509 0.513 (0.500) (0.500) (0.500) (0.500) (0.500) (0.500) 0.511 0.511 0.509 0.512 0.508 0.511 (0.500) (0.500) (0.500) (0.500) (0.500) (0.500) 0.264 0.264 0.262 0.266 0.262 0.264 (0.441) (0.441) (0.440) (0.442) (0.440) (0.441) 0.241 0.241 0.242 0.240 0.245 0.240 (0.428) (0.427) (0.428) (0.427) (0.430) (0.427) 0.505 0.505 0.504 0.506 0.507 0.504 (0.500) (0.500) (0.500) (0.500) (0.500) (0.500) Age Age at first birth (mother's age when first child 2.41 (0.68) (0.491) two children were the same sex) 2.72 (0.90) 0.370 two children first girls) 2.41 (0.68) (0.483) two children first 2.55 (0.81) 0.375 boy) were boys) 2.49 (0.75) (0.484) boy) a 2.50 (0.76) was bom) Black Mother 30.4 30.6 29.92 31.3 28.8 31.1 (3.5) (3.4) (3.60) (3.0) (3.89) (3.0) 21.8 22.0 20.85 23.4 17.9 23.5 (3.5) (3.5) (3.16) (3.50) (1.13) (2.8) 0.131 0.092 0.141 0.115 0.237 0.088 (0.337) (0.289) (0.348) (0.319) (0.425) (0.283) Hispanic Mother 0.113 0.112 0.144 0.065 0.156 0.096 (0.317) (0.315) (0.351) (0.246) (0.363) (0.294) 0.089 0.036 0.144 0.037 - (0.285) (0.186) (0.351) (0.190) Never Married 0.068 (0.252) Married Now 0.798 0.857 0.768 0.846 0.649 0.859 (0.401) (0.350) (0.422) (0.361) (0.477) (0.348) Divorced 0.081 0.087 0.083 0.079 0.122 0.065 (0.273) (0.282) (0.276) (0.270) (0.327) (0.246) Divorced or Separated High School Graduate 0.137 0.136 0.114 0.197 0.099 (0.344) (0.343) (0.317) (0.398) (0.299) 0.420 0.423 0.685 - 0.426 0.417 (0.494) (0.465) - (0.495) (0.493) (=1 if high school diploma and no further education) Some College 0.127 (0.333) (0.494) (=1 if some college, but no degree) College Graduate {-] if bachelor's 0.264 0.269 - 0.682 0.174 0.301 (0.441) (0.444) - (0.466) (0.379) (0.458) 0.123 0.131 - 0.318 0.018 0.166 (0.338) - (0.466) (0.131) (0.372) degree or higher) (0.329) In Poverty (=1 if family income below the poverty line) Welfare Recipient (=1 Worked for pay 0.256 0.103 0.347 0.136 (0.437) (0.303) (0.476) (0.343) 0.098 0.065 0.130 0.047 0.187 0.062 (0.297) (0.247) (0.336) (212) (0.390) (0.241) (=1 if worked for 1989) in 0.158 (0.364) if public assistance income>0) pay 0.197 (0.398) 0.662 0.674 0.623 0.723 0.650 0.667 (0.473) (0.469) (0.485) (0.447) (0.477) (0.471) Weeks worked (weeks worked in 1989) Number at age 1 women 5 or later. 26.9 24.27 29.3 25.0 26.7 (22.9) (23.00) (22.4) (22.81) (22.9) 357063 236418 143589 380007 of observations Notes: Data are from the 1990 26.2 (22.9) PUMS. The sample The no-college sample includes includes women women with 2 or more children whose 2nd child was at least in 1 and who had 269851 iheir first birth with no college or with an associate occupational degree. The some-college sample includes with an associate academic degree, some college but no degree, or a college degree. Teen mothers are those younger. Standard deviations are reported 110156 age parentheses. All calculations use sample weights. who had their first birth at age 19 or TABLt All Women OLS Variable IV 2 Ever OLS AND LABOR SUPPLY ESTIMATES FIRST-STAGE M amed No IV Some College OLS or + IV Teen Mothers OLS \\ Adult Mothers OLS IV No Covariates A. Firsl College OLS IV Sla^c More than 2 children 00628 0.0663 (0.002) (0.002) - 0.0652 0.0594 0638 0.0616 (0 002) (0.003) (0.003) (0024) Outcomes Worked for pav If 0.132 0.084 0126 0.083 0.132 0.105 -0 112 0.057 -0.140 •0026 0,130 01 09 (0.002) (0.027) (0.002) (0.026) (0.002) (0.034) (0003) (0.044) (0.003) (0 051) (0.002) (0.032) 7.34 -5.15 -7.12 5.09 7.22 6.52 6,59 3.21 7.47 -4.76 -7.19 -5.26 (0.08) (130) (0.09) (1.27) (0.11) (1.60) (0,14) (2.18) (0.15) (2.40) (0.10) (1.57) - Vets Harked B. Sta^ More than 2 With Covariates First 0.0523 children (0 0017) 0.0658 0.0644 - 00592 00633 (0,0017) (0.0022) - (00027) (0.0033) 0.0623 (00019) - Outcomes Worked for pav 0.148 -0.097 0.148 0.097 0.144 -0.120 0.151 -0.060 •0.132 -0071 0.154 0.108 (0002) (0.027) (0.002) (0 026) (0.002) (0.034) (0.003) (0.(M3) (0.003) (0.049) (0.002) (0.031) 8.33 •5,93 8 43 5.92 7.92 7.43 884 3.45 •7.51 •7.55 •8.62 -5.30 (0.09) (1.27) (0.09) (1.24) (Oil) (1.57) (0.14) (2.15) (0.15) (2.28) (0,10) (1.52) Weeks xiorked Notes The table reports OLS and IV csiimatcs Liflhc coefficient on the Afore than 2 children variable The IV cslimales use 5omc sex as an instrument The covariates included and dummies arc Age. Age Table Standard errors arc tcporled I. ulfirai birth, for Boy ht. Boy 2nd. Black. Hispanic, Other in parentheses, Al) calculations race. use sample weights. High school graduate. Some college and College graduate The samples arc the in same panel as in B TABLE All Outcomes 3: EFFECTS ON MARITAL STAUTUS, POVERTY STATUS AND WELFARE USE W omen Ever Mlarried No OLS Married Now Divorced Divorced or Separated In Poverty Weifare Recipient Married Now Divorced Divorced or Separated In Povertv Welfare Recipient OLS or + Teen Mlothcrs OLS Adult Mothers OLS IV -0.021 -0.010 - - -0.026 -0.023 -0.002 0.007 0.016 (0,001) (0.153) - - (0.001) (0.021) (0.001) (0.020) (0.002) -0.025 -0.062 -0.0075 -0.055 -0.035 -0.082 0.008 -0.035 -0.015 -0.066 0.018 -0.048 (0.002) (0 024) (0.001) (0.020) (0.002) (0.030) (0.002) (0.036) (0.003) (0.051) (0.002) (0.025) IV IV IV OLS IV No Covariates -0 010 (0.0391) 0.0004 -0.003 (0.0009) (0.014) -0.013 0.011 -0012 0.012 -0.012 0.0044 -0.017 0.022 -0.024 -0010 -0.022 0.016 (0,001) (0.016) (0.001) (0.016) (0.001) (0.0195) (0.002) (0.027) (0.002) (0.035) (0.001) (0.017) 0.0023 0.053 0.0056 0.055 0.0070 0.057 -0.011 0.046 -0.0030 0.048 0018 0.049 (0.0013) (0.019) (0.0014) (0.020) (0.0016) (0.024) (0.002) (0.032) (0.0027) (0.043) (0001) (0.021) 0.143 0.095 0.124 0.082 0.167 0.107 0.070 0.088 0.178 0.143 0.083 0.062 (0.002) (0.029) (0.002) (0.020) (0.002) (0.031) (0.002) (0030) (0003) (0050) (0.002) (0.024) 0.067 0.033 0.050 0.032 0.079 0.028 0.030 0.049 0.091 0018 0.030 0.032 (0.001) (0.018) (0.001) (0.014) (0.002) (0.024) (0.002) (0.022) (0.003) (0.042) (0.001) (0.017) 0.0026 0.0051 - - 0.0038 -0.025 0.0061 0.023 0.0087 -0.031 0.0043 0.0025 (0.0009) (0.0136) - - (0.0013) (0.019) (0.0012) (0.018) (0.0022) -0.034 (0.0009) (0.0126) B. Ever Married Some Col lege IV A. Ever Married College OLS With Covariates 0.037 -0.052 0.037 -0.046 0.028 -0.080 0.057 -0.011 0.020 -0.078 0.047 0.041 (0.001) (0.021) (0.001) (0.019) (0.002) (0.028) (0.002) (0.033) (0.003) (0.047) (0.002) (0.023) -0.037 0.0071 -0.039 0.0067 -0.029 0.0010 -0.048 0.018 0 026 0.021 -0040 0.017 (0.001) (0.0157) (0.001) (0.0159) (0.001) (0.0196) (0.002) (0.026) (0002) (0.035) (0.001) (0.017) -0.033 0.048 -0.036 0.047 -0.023 0.054 -0.049 0.038 -0.013 0.040 -0.041 0.048 (0.001) (0.019) (0.001) (0.019) (0.002) (0.025) (0.002) (0.031) (0.003) (0.043) (0.001) (0.020) 0.093 0.097 0.087 0.085 0.113 0.113 0.055 0.074 0.148 0.186 0.065 0.061 (O.OCl) (0.021) (0.001) (0.019) (0002) (0.028) (0.002) (0.029) (O003) (0.047) (0.002) (0.022) 0039 0.033 0.031 0.032 0.048 0.032 0.020 0.041 0.071 0.043 0.022 0.030 (0.001) (0.017) (0.001) (0.014) (0.002) (0.023) (0.001) (0.021) (0.003) (0.040) (0 001) (0016) Notes: The samples and models are as in Table 2, with difTerent dependent variables. Standard errors are reported m parentheses. All calculations use sample weights. TABLE 4: POTENTIAL-ASSTGNMENT SUBPOPULATIONS NoC(ivariates Sample All Women P. Pa P,. P..-Pn (1) (2) (3) (4) (5) (6) 0.063 0.344 0.594 -0.250 0.619 0.375 (0.0018) (0.0018) 0.370 Ever Married 0.066 0.337 0.597 0.405 College 0.065 0.372 0.563 0.059 0.328 College 0.298 0.642 0.500 0,064 0.468 0.468 0.324 0.062 0.293 (00020) Notes: The which is first column the parameter 9 ofSame were calculated as described appendix. Standard errors are reported in -0.344 0.510 -0.0006 0.999 0.645 -0.352 0.502 (0.0020) reports the proportion treated. given by the first-stage effect 0.696 (0.0034) (0.0034) Adult Mothers -0.191 (0.0027) (0.0027) Teen Mothers 0.607 (0.0023) (0.0023) Some -0.261 (0.0018) (0.0018) - No With Covanates P[D=I] sex. in the Pc Pa-P„ (7) (S) 0.062 -0.250 (0.0017) (0.0018) 0,066 -0.260 (0.0017) (0.0018) 0.064 -0.191 (0.0022) (0.0023) 0.059 -0.344 (0.0027) (0.0027) 0.063 -0.0005 (0.0033) (0.0034) 0.062 -0.351 (0.0019) (0.0020) The second column shows the proportion of compliers in the sample, The estimates of the proportion of always-takers and never-takers and text. Estimates with covariatcs were calculated as described in the parentheses. All calculations use sample weights. TABLE 5: SYMMETRIC FIRST STAGE SAMPLES Symmetric Sample All Variables First Stage Teen Mothers (1) (OLS Constant Outcomes (IV 11 & 7io(X)<=0.6 or7to(X)>0.6 7to(X)>=0.35 7ro(X)<0.35 (4) (5) (6) .(3) 0.0628 0.0638 0.0713 0.0588 0.0684 0.0576 (0.002) (0.003) (0.003) (0.002) (0.003) (0.002) 0.344 0.468 0.471 0.296 0.465 0.253 (0.001) (0.002) (0.002) (0.001) (0.002) (0.001) Estimates') Worked for pay Weeks worked Ever Married Married Now Divorced Divorced or Separated In Poverty Welfare Recipient Number of Observations Columns (2) Symmetric Sample ito(X)<0.4 estimates") Coefficient Notes: Women I 7t„(X)>=0.4 1 -0.084 -0.026 -0.038 -0.109 -0.080 -0.092 (0.027) (0.051) (0.045) (0.034) (0.038) (0.039) -5.15 -4.76 -3.72 -6.03 -5.90 -4.71 (1.30) (2.40) (2.21) (1.62) (1.83) (1.86) -0.010 -0.0098 -0.016 -0.0031 -0.0051 -0.0092 (0.015) (0.0391) (0.032) (0.0170) (0.0267) (0.0162) -0.062 -0.066 -0.033 -0.066 -0.075 -0.039 (0.024) (0.051) (0.045) (0.027) (0.038) (0.028) 0.011 -0.010 -0,029 0.025 0.012 0.0057 (0.016) (0.035) (0.031) (0.018) (0.026) (0.0189) 0.053 0.048 0.020 0.063 0.068 0.033 (0.019) (0.043) (0.038) (0.022) (0.0032) (0.023) 0.095 0.143 0.095 0.087 0.136 0.048 (0.023) (0.050) (0.044) (0.027) (0.036) (0.028) 0.033 0.018 0.021 0.034 0.027 0.032 (0.018) (0.042) (0.035) (0.020) (0.029) (0.020) 380007 110156 103803 276204 162264 217743 and 2 repeat estimates from Tables 2 and selected as described in the text. Standard errors are shown 3, for models without covariates. Columns 3-6 report estimates using samples in parentheses. All calculations use sample weights. TABLE 6: IMPUTATION OF ATE ATE No Outcome All (3) (4) (5) -7.34 -7.56 -5.15 -4.31 -3.19 (0.08) (0.12) (1.30) -7.12 -7.33 -5.09 -4.41 -3.45 (0.09) (0.13) (1.27) (123) (1.43) Women Ever Married No College Some College or + Teen Mothers Adult Mothers In Poverty- All -7.33 -6.52 -5.73 -4.94 (1.60) (1.57) (1,70) -6.59 -6.90 -3.21 -2.38 -0,60 (0.14) (0.21) (2.18) (2.08) (2,69) -7.47 -7.66 -4.76 -4.76 -4,76 (0.15) (0.22) (2.40) (2.40) (2.40) No -7 19 -7.43 -5.26 (0.10) (0.15) (1.57) 00005 0.053 0,028 0.0092 (0.0019) (0.019) (0,019) (0.0216) 0.0056 0.0043 0055 0024 -0.0002 (0.0014) (00020) (0.020) (0,019) (0.0221) 0.0070 0.0053 0.057 0,032 0.011 (0.0016) (0.0024) (0.024) (0,024) (0.026) -0.011 -0.014 0.046 0,034 0.034 (0.002) (0.003) (0.032) (0,029) (0.037) -0.0030 -0.0063 0.048 0,048 0.048 (0.0027) (0,0040) (0.043) (0,043) (0.043) -0.018 -0.021 0.049 0,017 -0.0024 (0.001) (0.002) (0.021) (0,019) (0.024) 0.143 0.150 0.095 0,049 -0,0023 (0.002) (0.002) (0.023) (0024) (0,029) 0.124 0.129 0.082 0,054 0,020 (0.002) (0.002) (0.020) (0.021) (0,026) 0.167 0.175 0.107 0.061 0,014 (0.002) (0.003) (0.031) (0.032) (0,036) College College or + 0.070 0.071 0.088 0.059 0,03! (0.002) (0.003) (0.030) (0.031) (0,042) Teen Mothers 0.178 0.181 0.143 0.143 0,143 (0.003) (0.005) (0.050) (0.051) (0.051) Adult Mothers Welfare All 0.083 0.088 0.062 0.017 -0.039 (0.002) (0.003) (0.024) (0.025) (0.033) Women Recipient 0.067 0.072 0.033 0.0058 -0.026 (0.001) (0.002) (0.018) (0.0185) (0.022) Ever Married No 0.050 0.052 0.032 0.015 -0.0051 (0.001) (0.002) (0.014) (0015) (0.0182) 0.079 0.085 0.028 -0,001 -0.031 (0.002) (0.003) (0.024) (0,025) (0.029) College Some College or + 0.030 0.030 0.049 0,037 0.029 (0.002) (0.002) (0.022) (0,022) (0.030) Teen Mothers 0.091 0.096 0.018 0,018 0.018 (0.003) (0,004) (0.042) (0,043) (0,043) Adult Mothers Notes: Columns 1 Resinction 4. 0.030 0.032 0.032 0,006 (0.001) (0 002) (0.017) (0,017) and 3 repeat estimates from Tables 2 and for the selection-bias test. -2.19 (1.91) -4 0.0023 Mamed Some 06 (1,48) (0.0013) Women Ever (1.45) (0.16) Adult Mothers All 27) -722 College or + Separated (I 1 (0.11) Teen Mothers Divorced or /(1-P[D=1|Z=0]) LATE (2) College Some 9= = P[D=l|Z=l] Alternative Ever Married No e (1) Women Worked ATE OLS Sample Weeks Selection Column 4 repones 3. Column 2 shows the no-seleciion estimates of ATE under Restriction 3 and column -0024 (0 023) ahemative under Resmciion 5 reports estimates of I and ATE under ti.89 MOV 1 I 2003 Date Due MIT LIBRARIES „ I nil! I lllll|llll[lll)llliili|ni 3 9080 02613 1349 ^'liiiBi