Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries http://www.archive.org/details/instrumentalvariOOabad ,31 [415 © |Oev* working paper department of economics Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings Alberto Abadie Joshua Angrist Guido Imbens No. 99-16 October 1999 massachusetts institute of technology 50 memorial drive Cambridge, mass. 02139 WORKING PAPER DEPARTMENT OF ECONOMICS Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings Alberto Abadie Joshua Angrist Guido Imbens No. 99-16 October 1999 MASSACHUSETTS INSTITUTE OF TECHNOLOGY 50 MEMORIAL DRIVE CAMBRIDGE, MASS. 02142 PliACHuSETTS INSTITUTE OF TECHNOLOGY LIBRARIES Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings* — MIT Alberto Abadie — MIT and NBER Guido Imbens — UCLA and NBER Joshua Angrist Revised: September 1999 Abstract The effect of government programs on the distribution of participants' earnings important for program evaluation and welfare comparisons. This paper reports timates of the effects of The estimation uses a gram impacts on the JTPA training programs new instrumental quantiles of (QTE) estimator accommodates exogenous sion when selection for treatment is variables. covariates linear where the empirical results at low quantiles. of earnings for first step is effects and reduces to quantile regres- We show that the JTPA program had the men QTE estimator can JTPA this develop distribution theory estimated nonparametrically. Perhaps surprisingly, however, pro- This quantile treatment programming problem, although requires first-step estimation of a nuisance function. for the case method that measures exogenously determined. The be computed as the solution to a convex es- on the distribution of earnings. variable (IV) outcome is For women, the largest proportional impact training raised the quantiles only in the upper half of the trainee earnings distribution. *We thank Moshe Buchinsky, Gary Chamberlain, Jinyong Hahn, Jerry Hausman, Whitney Newey, Shlomo Yitzhaki, and seminar participants at Berkeley, MIT-Harvard, Penn, and the Econometric Society Summer 1998 meetings for helpful comments and discussions. Thanks also go to Erik Beecroft at Abt Associates for providing us with the National JTPA Study data and for helpful discussions. Abadie acknowledges financial support from the Bank of Spain. Imbens acknowledges financial support from the Sloan Foundation. 1. Effects of many Introduction economic variables on distributions of outcomes are of fundamental A areas of empirical economic research. government programs leading example interest in the question of is how affect the distribution of participants' earnings, since the welfare analysis of public policies involves distributions of outcomes. Policy-makers often hope that subsidized training programs will reduce earnings inequality by raising the lower quantiles of the earnings distribution of and thereby reducing poverty (Lalonde (1995), US Department Labor (1995)). Another example from labor economics distribution of earnings. unionism Fortin, is Freeman One is the effect of union status on the of the earliest studies of the distributional consequences of (1980), while more recent analyses include Card and Lemieux (1996), who have asked whether changes in (1996), and DiNardo, union status can account a significant fraction of increasing wage inequality in the 1980s. for Although the importance of distribution effects is widely acknowledged, most evaluation research focuses on average outcomes, probably because the statistical techniques required to estimate effects on restrict the means are easier to use. Many econometric models also implicitly treatment effects to operate in the form of a simple "location shift" mean effect captures the impact of treatment at treatment on a distribution and there is is easy to assess all quantiles. when treatment perfect compliance with treatment assignment. Of status , in which case course, the impact of is randomly assigned Randomization guarantees that outcomes in the treatment group are directly comparable to outcomes in the control group, so valid causal inferences can be obtained by simply comparing the treatment and control distributions. in The problem of how to draw randomized studies with non-compliance or assignment is more In this paper, difficult, inferences about distributional effects in observational studies however, and has received we show how less attention. with non-random 1 to use a source of exogenous variation in treatment status Rosenbaum and Rubin (1983), and Manski Heckman, Smith and Clements (1985). (1994), (1997), Imbens and Rubin and Abadie discuss (1999a) effects on distributions. Manski (1994, 1997) develops estimators for (1997), bounds on quantiles. 'Discussions of average treatment effects include Rubin (1977), Heckman and Robb - an instrumental variable - to estimate the effect of treatment on the quantiles of the distribution of outcomes in non-randomized studies, or in situations where the offer of treatment is randomized but treatment (QTE) estimator is itself is used here to estimate the voluntary. This Quantile Treatment Effects effect of training on trainees served by the Job Training Partnership Act (JTPA) of 1982, a large publicly-funded training program designed to help economically disadvantaged individuals. JTPA The data come from the National Study, a social experiment begun in the late early 1980s at 16 locations across the JTPA to evaluate the effects of training. For this study, JTPA US applicants were randomly assigned to treatment and control groups. Individuals in the treatment group were offered JTPA training, while those in the control group were excluded Only 60 percent of the treatment group actually received treatment assignment as an instrument The treatment subpopulation we effects in the control of effects affected The like the This terminology JTPA, the by an instrumental Rubin is fact, in (1996). used because in randomized the case of the effects for JTPA, where who (almost) is variable. established by Imbens approach to instrumental variables and Angrist (1994) and Angrist, Imbens, and Imbens and Rubin (1997) extended these results to the identification of hypotheses about distribution impacts such as stochastic dominance. papers developed simple estimators or a scheme for estimating the 2 a trials whose treatment status how the effect of treatment on distributions, and Abadie (1999a) showed quantiles. for compilers are also representative cases, compilers are those identification results underlying the compilers first we can use the relevant subpopulation consists of people group received treatment, on the treated. 2 In other (IV) models were training, but for treatment. always comply with the treatment protocol. In no one a period of 18 months. estimated using the framework developed here are valid call compilers. with partial compliance, for to test global But neither effect of of these treatment on We focus here on conditional quantiles because quantiles provide useful summary Angrist and Imbens (1991) discuss the relationship between instrumental variables and effects on the treated. in the Orr, et al. (1996) and Heckman, Smith, and Taber (1994) report average JTPA. Heckman, Clements and Smith using a non-IV framework. (1997) estimate the distribution of effects JTPA on the treated treatment effects statistics for distributions, and because quantile comparisons have been recent discussions of changing wage inequality Murphy (1994)). and Buchinsky (1992) The paper organized as follows. is discusses the identification problem. (see, e.g., at the heart of Chamberlain (1991), Katz and Section 2 outlines the conceptual framework and Section 3 presents the estimator, which allows for a binary endogenous regressor (indicating exposure to treatment) and reduces to Koenker and Bassett (1978) quantile regression when selection for treatment is exogenous. Like quantile regression, the estimator developed here can be written as the solution to a convex linear programming (LP) problem, although implementation estimation of a nuisance function in a of the QTE estimator requires Finally, Section 4 discusses the estimates of first step. on the quantiles of trainee earnings. The estimates effects of training for women show larger proportional increases in earnings at lower quantiles of the trainee earnings distribution. But the estimates men for suggest the impact of training was largest in the upper half of the distribution and not at lower quantiles as policy-makers perhaps would have wished. Conceptual Framework 2. The setup outcome is as follows. variable, The data consist of Y, a binary treatment indicator D, and a binary instrument, Z. In the case of subsidized training, Y is earnings, D indicator of the randomized offer of training. not everyone n observations on a continuously distributed who was program participation, and indicates Z offered training received and it D are not equal in the status, say a D would indicate union status, dummy indicating individuals and who work Z As in Rubin of causal effects, Y an because might be a would be an instrument in firms that organizing campaigns (Lalonde, Marschke and Troske (1996)). vector of covariates, is and because a few people who were not offered training received services anyway. In a study of the effect of unions, measure of wages, JTPA Z We for union were subject to union also allow for an r x 1 X. (1974, 1977) we and our earlier work on instrumental variables estimation define the causal effects of interest using potential outcomes and potential treatment status. D, Yd, we In particular, outcomes indexed against define potential and potential treatment status indexed against Z, D z . Potential outcomes and potential treatment status describe possibly counterfactual states of the world. Thus, tells us what value take if had D = Z D would take Z were equal to if were equal to d. The 0. Similarly, Yd tells us while 1, Do tells us what value D\ D would what someone's outcome would be if they objects of causal inference are features of the distribution of potential outcomes, possibly restricted to particular subpopulations. The observed treatment status is: D = D + (A In other words, if Z= 1, then D\ the observed outcome variable is observed, while 1 terfactual causal inference is difficult is outcomes as being defined Z = 0, then D observed. Likewise, is -(1-D). for (1) that although for everyone, one potential outcome are ever observed we think of all possible coun- only one potential treatment status and any one person. 3 Principal Assumptions 2.1. The principal assumptions of the potential outcomes Assumption 2.1: For almost all values of Independence; (Yi,Y ,Di,D (i) (ii) ) is jointly independent of ^ E[D \X). D = First-Stage: £[Di|X] (iv) MONOTONICITY: P(D > The 1 \X) framework for IV are stated below: X, Non-Trivial Assignment.- P(Z = 1\X) e (hi) 3 if Z. is: Y = Y -D + Y The reason why Do) Z given X. (0, 1). 1. idea of potential outcomes appears in labor economics in discussions of the effects of union status. See, for example, Lewis' (1986) survey of research on union relative wage effects. Assumption ment status subsumes two related requirements. 2.1(i) First, identify the causal effect of the instrument. This is comparisons by instru- equivalent to instrument- error independence in traditional simultaneous equations models. comes are not directly affected by the instrument. This Angrist, Imbens and Rubin (1996) how they Assumption JTPA differ. is Second, potential out- an exclusion two requirements and for additional discussion of these 2.1(i) is plausible because of the randomly assigned See restriction. (though not guaranteed) in the case of the offer of treatment. Assumption 2.1(h) requires that the conditional distribution of the instrument not be degenerate. in The two other ways as some and treatment assignment relationship between instruments well. correlation between As D in simultaneous equations models, and Z; this is effect in and it is D in one direction. Monotonicity assumption for the a plausible in most applications JTPA, where D = It is for (almost) everyone. and coun- on observed covariates, X. For example, evaluation studies focus on estimating the difference between the average outcome (which of treatment (which A is inference problem in evaluation research involves comparisons of observed for the treated 4 2.1(iv) guarantees identification of This monotonicity assumption means that the terfactual outcomes, possibly after conditioning many Imbens 4 automatically satisfied by latent-index models for treatment assignment. also a reasonable The Also, any model with heterogeneous potential outcomes that satisfies assumptions 2.1(i)-2.1(iii). instrument can only affect require that there be stated in Assumption 2.1 (hi). and Angrist (1994) have shown that Assumption meaningful average treatment we restricted is latent-index is is model observed) and what this average would have been in the absence counter- factual). Outside of a randomized trial, the difference in for participation is D= 1{A where Ao and Ai are parameters and 77 D\ — l{Ao + Ax > 77}, and either D\ is t? an error term that > Do everyone, then monotonicity holds for Z' + Z-A! - or =1— Dq > D\ Z. > is 0} independent of Z. Then for everyone. If Ax < Do = l{Ao > Dq > D\ so that t]}, for average outcomes by observed treatment status E[Y \X,D = 1 1] - E[YQ \X,D = 0] is typically a biased estimate of this effect: = {E[Y \X,D = 1] - E[YQ \X,D= D= 1] - E\Y 1 + {E[Y The first also be written as E\Y\ is term in brackets the bias term. is \X, \X, D= 1]} 0}}. the average effect of the treatment on the treated, which can — Yq\X,D = 1] since expectation is a linear operator; the second For example, comparisons of earnings by training status are biased trainees are selected for training if on the basis of low earnings potential. This bias extends to comparisons other than the mean. For example, the relationship above holds if we replace conditional expectations with conditional quantiles. 2.2. An Identification Using Instrumental Variables instrumental variable solves the problem of identifying causal effects for a group of individuals whose treatment status is affected The by the instrument. following result (Imbens and Angrist (1994)) captures this idea formally: Lemma 2.1: Under Assumption (and assuming that the relevant expectations are 2.1 Z = \\- E[Y\X, Z = 0] = E[Y E[D\X, Z = 1] - E[D\X, Z = 0] E[Y\X, 1 E[YX -Y \X,Di to individuals for > D ] called a Local is whom D\ > Dq as compliers because in a whatever their assignment. In other words, the whose treatment status was changed cannot be identified (i.e., we never observe both D\ and Do where Do E[Y1 = in the randomized who comply with trial We with partial the treatment protocol experiment induced by Z . Note that individuals are compliers) because any one person. Also note that in the special case for everyone, -Y \X,D >D l Q] = E[Y -Y \X,D = = E[Y -Y \X,D = l l refer set of compliers is the set of individuals we cannot name the people who for }. 1 Average Treatment Effect (LATE). compliance, this group would consist of individuals in this set -Y \X,D >D finite) l l] l], = E[Y 1 -Y \X,D 1 = l,Z=l] D LATE so The equivalence between the effect of treatment on the treated. is compliers and effects on the treated in cases where distributional characteristic The compliers concept and not is any identically zero holds for just means. LATE at the heart of the is Do effects for explanation for how IV methods work. compliers are. For these people, Z — Suppose D, since framework and provides a simple initially it is that we could know who the D > D always true that x . This observation plus Assumption 2.1 leads to the following lemma: Lemma Given Assumption 2.1 and conditional on X, treatment 2.2: (independent of the potential outcomes) for compliers: (Yi, Yo) -L D\X, Proof: Assumptions 1, D = 0. When D — x A that (YU 2.1(i) says 1 D = and Lemma consequence of 2.2 is 0, D Y , uD ) ± assignment is X course, as compliers is it stands, Do] (i.e., 2.2 operational, = P(Z — This function Lemma 2.3: ± ) . Z\X,D = 1 D can be substituted for Z. is : D= 2.2 is of effect 0,D > X even though treatment E[Y, ] we it (Abadie, 1999b) Let - Y \X,D for the define the following function of D-(l-Z) (l-D)-Z i-ttoPO MX) when "identifies compliers" in h(Y,D,X) be D, same Z x Do] = > D }. (2) ' To {S} ' D = Z, otherwise k is negative. the following average sense: any integrable p7p^~^) individual). and X: real function of Then, given Assumption 2.1, E[h{Y,D,X)\D > 1 no practical use because the subpopulation of 1\X). Note that k equals one useful because D = we do not observe D\ and Dq _ where n (X) DQ ignorable that in the subpopulation of compliers, comparisons - E[Y\X Lemma not identified make Lemma > x is not ignorable in the population: E[Y\X,D = 1,D > Of D Z\X, so (Yi,Y means by treatment status estimate an average treatment of D, status, E[k h(Y, D,X)}. (Y,D,X). To why see this into three groups: is true, note that, who have D\ > compilers = Dq = and never-takers who have D\ E[h(y,D,X)\X,Di>D by monotonicity, the population can be partitioned 0. = p(D^D^\X~) E[h{Y D X)lX] { ' ] X, we have the Z= Z = individuals with all D= and 1 ' =D = £[/i(F,D,X)|X,A Likewise, those with 1, Thus, E[h{Y,D,X)\X~Di Monotonicity means that who have D\ = Dq = Dq, always-takers 1] = Z> = 1 P(A = D = • 0] • P(D = D = 1 D = and 1|X) 0\X)\. must be never-takers. must be always-takers. Since Z is ignorable given and never-takers as a function of following expressions for always-takers observed moments: E[h(Y,D X)\X,Di t = A) = E[h(Y,D,X)\X,D = 1 = E[h{Y,D,X)\X,D = 1,Z = 0] D-(l-Z) 1 h(Y,D,X) X E P{D = 1\X,Z = 0) 1 " TTo(X) l] D = 0] = E[h(Y, D,X)\X, D= 0, 1 P(£ = 0|X,Z = P{D\ = Dq = An and never-takers using P(D\ 0\X) = P(D = implication of condition involving Abadie (1999b). 5 In the next 5 For example, if we define fi (fi, — a) > Dq], and identified fi E\Yq\D\ is 1 ~ D) - TTo(X) 1) WAX) A' is Z— 1). = D = 1\X) Integrating over X = P(D = l\X,Z = identified for compliers. This point we show how Lemma and completes the argument. that any parameter defined as the solution to a section, 0) 2.3 is moment explored in detail in can be used to develop an and a as same intercept that is E[k (Y - m - aD) 2 }. then, 0\X, Lemma 2.3 (Y,D,X) P == 1] Z given X can similarly be used to identify the proportions Monotonicity and ignorability of of always-takers Z = argmin (ma) E[(Y - m - aD) a — E\YX - Yq\D\ > Do], so by conventional IV methods). 8 2 \Di > D ], a is LATE (although fi By Lemma 2.3, (/u,a) also that is not the minimizes estimator for the causal effect of treatment on the quantiles of an outcome variable. Quantile Treatment Effects 3. The QTE Model 3.1. The QTE linear estimator and additive analysis is based on a model where the is when the treatment model because the resulting estimator when regression is there 3.1: For 6 £ Qg(Y\X,D,Di > As a consequence (0, 1), there exist unique ag of D ) JTPA Lemma 2.2, we training changed the G R and (3 e G W such that = a e D + X%. and Yq ofY given (4) X for compliers. median earnings and D for compliers. This tells us, for example, of participants. Note, however, that where average differences equal differences in may also be of focus on the marginal distributions of potential outcomes because identification social welfare — Yq requires much stronger assumptions and because economists comparisons typically use differences in distributions and not the distribution of differences for this purpose (see, 6 IV and ordinary not the quantile of the difference (Yi— Yq). Although the latter of the distribution of Y\ making and quantile the parameter of primary interest in this model, ag, in contrast with average treatment effects, is ) denotes the 9-quantile gives the difference in the ^-quantiles of Y\ interest, QTE of interest are defined as follows: 1 averages, ag Koenker and Bassett (1978) quantile simplifies to Q e {Y\X, D, D > D whether The with X, but we use an additive no instrumenting. The relationship between is estimated. is (OLS). The parameters where effect varies is therefore analogous to the relationship between conventional least squares Assumption treatment and covariates at each quantile, so that a single treatment effect straightforward regression effect of e.g., Atkinson (1970)). 6 Heckman, Smith and Clements (1997) discuss models where features of the distribution of the difference — Yq) are identified. They note that this may be of interest for questions regarding the political economy of social programs. If the ranking of individuals in the distribution of the outcome is preserved (Y\ The model above differs in a number of ways from the model in the seminal papers by Amemiya (1982) and Powell (1983), who used least absolute deviations to estimate a simultaneous equations system. Their approach begins with a traditional simultaneous equations model, and Rather, the idea tailed. is not motivated by an attempt to characterize effects on distributions. to improve is Most importantly, of interest in the function. on 2SLS when the distributions of the error terms are longwith the parameters in equation in contrast Amemiya/Powell setup do (4), the parameters not, in general, define a conditional quantile 7 The parameters as (see Bassett of the conditional quantile function in equation (4) can be expressed and Koenker {a e ,Pe) where p e (X) is = ai*g (1982)): mm (a,/3)eM'-+ 1 i the check function, defined as p e (A) Therefore, using Lemma 2.3, {a e ,Pe) = ae and minimand fi e = (9 is — 1{A < A)], 0}) • A for - aD - (5). globally convex in (ag,P e ) since for compilers times However, since k is some negative objective function turns out to be non-convex. tion problems of this type (piecewise linear do not ensure a global optimum real A. X'0)]. positive constant (5) it is (see, e.g., when D A number is equal to the {P{D\ > Do))- lowing the analogy principle (Manski (1988)) a natural estimator of (ag,/3 g ) counterpart of any are identified as argmin (Q/3)eMr+1 E[k p e {Y This population objective function check-function E [Pe( Y ~ aD ~ x 'P)\ D > is Fol- the sample not equal to Z, the sample of algorithms exist for minimiza- and non-convex objective functions), but they Charnes and Cooper (1957) or Fitzenberger (1997a,b), for a discussion of a related censored quantile regression problem). Unlike the under the treatment, then the estimator impacts. in this paper is informative about the distribution of treatment King (1983) discusses horizontal equity concerns that require welfare analyses involving the joint distribution of outcomes. 7 Amemiya and Powell papers comes from conditional median restrictions on the However, a conditional median restriction on the reduced form does not imply that the Identification in the reduced form. is a conditional median. In fact, for a binary endogenous regressor, conditional median on the reduced form and structural equation are typically incompatible. structural equation restrictions 10 — conventional quantile regression minimand, the sample analog of equation (5) does not have a linear programming representation. Now, let U= (Y,D,X); applying the Law of Iterated Expectations to equation (5), we obtain {<*e,Po) = argmin (Q)/3)er+ i E[k v p g (Y - aD - X'j3)], (6) where kv u (U) for = E[Z\U] = E[k\U\ = P(Z = is and therefore non-negative. Lemma 1\Y, Under Assumption 3.1: Do = By E[D-(l-Z)\U] D = 1 —j we show below, k u = P{D > D ku (U) 2.1, First consider the product monotonicity, (i-D)-MU) D, X). Although simple to derive, Proof: 1. —--^t D-(1-MU)) 1 of signal importance because, as tation is = D — (1 implies = P(D(1 - X x = = P(A = D = n v {U) = E 1 D = Z= and 1\U) l\U) P{Z = 0\U) ^D Q\D 1 P(Z = 0|A = 1 1 if Hence: 1. = P(D = D = 1\U)-P{Z = - D) Z\U] = P(D = a conditional probability is \U). = P{D 1 = D Q = l\U)-P{Z = Similarly, E[{1 second represen- Z). This differs from zero only D = Z) this D = l,U) l,Ylt X) 0\X). l\X). Therefore, D(l-Z) (l-D)Z U P(Z = 0\X) P(Z = l\X) - P(A = D = l\U) - P{D = D = X = 0\U) == P{D > D X \U). a A consequence of this lemma LP is that it is possible to develop a representation based on a sample analog of equation 11 (6)). QTE estimator with an This can be thought of as a scheme to "convexify" the sample analog of (5). The resulting convex QTE estimator minimizes a positively- weighted check-function minimand, with a global minimum be obtained as the solution to a linear programming problem number iterations. This is in a finite that can of simplex and Hahirs (1998) LP-type estimator similar in spirit to Buchinsky for censored quantile regression. An interesting question is whether there based on k u instead of the sample analog of gies The distribution theory is non-parametrically in (X, D) (X). 7r cells. 9 If We the an estimator can be shown that both strate- it In light of this result, the and v Q (U) to construct an estimate of developed assuming that model consistently estimates ear In fact, (5). paper focuses on estimation by minimizing the sample analog of quires first-step estimation of iro(X) . efficiency cost to using produce estimators that are asymptotically equivalent. 8 rest of the ku any is X is k. u (6). This re- (U), denoted discrete, so a saturated lin- use series approximation to estimate v Q (U) number of terms in the series approximation increases at an appropriate rate with the sample size, this procedure ensures that the esti- mated conditional expectations converge however, it may make dimensionality of when the 3.2. X is to the true conditional expectations. In practice, sense to use something less than a saturated model for high. We the discuss practical aspects of the estimation strategy further results are presented in Section 4. that we have a random sample (ao,f3 e )' for ag and j3 e in equation (4). {Y;, If D z , Xi, Zi}™=1 . Let W = (1990). Since k u is unknown, we estimate See Newey (1994); the estimators same functional. (D,X')' and 6g — k u were known, the estimation problem would reduce to a weighted quantile regression problem of the type discussed by the when Estimation Assume 8 X this function Newey and nonparametrically in a Powell first step are asymptotically equivalent since they nonparametrically estimate 9 Buchinsky and Hahn (1998) similarly decompose the covariates in a censored quantile regression problem into a set of discrete variables and a set of continuous variables. They use non-parametric methods to estimate the conditional probability of censoring in cells 12 defined by the discrete variables. and use the fitted values k„([/i) in a second step to construct the estimator: n % = 1 - ]T 1{k„(^) > argmin, GRr+1 0} «„(^) p e (y, - W/5), • • (7) 77, First step estimation of kv is carried out using non-parametric series regression. increasing sequence of positive integers {A(fc)}^=1 (Y x ( lj , ...,Y X( K)). Assume that only takes on a finite = Then, any random sample {Vi}"=1 {wi,...,wj}). 3 be indexed X and a positive integer K, as {{Vi } i =1 }j =1 j (X, D). In the same , where {Vi j } fashion, the number {(Z{, let For an K p (Y) = W G of values (so that U )}f=1 V = from l (Z,U) can J i =1 are subsequences for distinct fixed values of sample can be indexed as {{Vi^L^iLi, where {l^,}£'= i are subsequences for distinct fixed values of X. Now, a nonparametric power series estimator v(U) of vq(U) is given by the Least Squares projection of {Z, }™ J=1 on {p K (Yi.)}™ 3=1 (this amounts to non-parametric Z series regression of Y on in each W-cell). Let u t be the fitted values of such estimator for the observations in our sample. Consider the simple estimator n(X) of ir Z (X) obtained by averaging within X. Our cells of first step estimator of k u is given by: ^)~ ~(TT\-1 K 1 3.3. ^-(1-%) (l-A)-Pi l-rr {Xl ) tt(^) • Distribution Theory This subsection summarizes asymptotic results for the QTE estimator. Proofs are given in the appendix. Theorem on W, Y 3.1: is Under assumptions and and 3.1 if (i) number 7r (X) is i.i.d.; (ii) bounded away from zero and one, and of values; (iv) conditional on W , eg is conditional differentiate at zero with density fee \w,Di>D {ty that W; (v) kv is is W and D\ > Do takes on a is continuously bounded and bounded away from zero bounded away from zero uniformly in 13 X continuously distributed with bounded density; the distribution function of eg conditional on uniformly in the data are continuously distributed with support equal to a compact interval and density bounded away from zero; (Hi) finite 2.1 U ; (vi) for s equal to the number of continuous ^ AA(0,O), 6e ) where Y derivatives in Q = J-'ZJ" 1 ofu K n 2s — and > K 5 /n — Then, n l l 2 (6 e P(D > D 1 ] \ andZ = E[W] with^ = K-m(U)+H(X)-{Z-7r -MX)) (I The asymptotic Assumption To produce an estimator loss. ... 6" ~1 v ( Yj Wj = <pUQ h {-^~ ... . a kernel function. Consider the following estimator of J: J 1=1 i robust to mis-specification 10 and K J=-Y n For is X of the asymptotic variance matrix, let (Y-W'8\ 1 MS) = -h v {—h—) is in the /-cell of X , v {Ui )-^(8e )-W Wi i let ni Hi = k(K) & = ~ |> ~ Hn = ) In such a case, quantile regression estimates 3.1). the best linear predictor under asymmetric </?(•) (1-D)-Z (MX)) 2 variance formula provided by this theorem of the functional form (in where 2 — m{U) = {e-l{Y-W6 9 < 0})-W, (X)}, D-(l-Z) H(X) = E m(U) An 0. > = E[feg w>Dl>Do (0) WW'\D, > D J , , i - Dt • (1 - - W& < Zi) l-7f(Xi) K{Vi) estimator of (6 E (1 - 0}) D • W k Z t) {1 ( ~ Dk) WW ' Zil Di ;i "Z - n(x )y " ' (1 *' } t r ' 7?(Xi) - 1{Y - w(6 e < 0}) • Wi + H t {Z x Srpfc)}. can then be constructed as n t—' i=\ The following theorem establishes the consistency of an asymptotic covariance matrix es- timator. 10 Most on quantile regression treats the linear model as a literal specification for condimodel can be viewed as an approximation. This interpretation is discussed by Buchinsky (1991), Chamberlain (1991), Fitzenberger (1997), and Portnoy (1991). of the literature tional quantiles. Alternately, the linear 14 Theorem Under the assumptions of Theorem 3.2: for some neighborhood of (Hi) ip(z) \(p{z) - > J (p(z) dz = 0, (p(zo)\ 0, <C- \z-zQ f€g \w,Di>D 1, J < Thenn = J~ EJ1 \. (t) h — > 1 oo; oo; > (ii) first derivative; C > there exists (iv) — nh A 0, has bounded and continuous (-) tp(z)\dz \z and 3.1 such that A Q. Effects of Subsidized Training 4. Background 4.1. The JTPA began funding training in October 1983, and continued to fund federally- sponsored training programs into the late 1990s. The program included a number of parts or "titles", the largest of which is Title II, which supports training JTPA Study economically disadvantaged. At the time of the National JTPA Title II programs were serving about of roughly 1.6 billion dollars. JTPA 1 judged to be for those in the early 1990s, million participants a year, at an annual cost services were delivered at 649 sites, also called Service Delivery Areas (SDAs), located throughout the country. Title II of the ized evaluation. JTPA 11 is unusual in that explicitly incorporated a The National JTPA study ever undertaken in the US. participants at 16 it The JTPA SDAs. These sites is mandate for random- the largest randomized training evaluation evaluation study collected data on about 20,000 were not a random sample of and all SDAs; rather, they implement the experimental design, were chosen for diversity, willingness and the and composition of the experimental sample they could provide. Although size ability to the non-random selection of sites raises issues of external validity (as in als), within sites, applicants were randomly selected for JTPA many treatment. The clinical tri- evaluation sample includes applicants who applied between November 1987 and September 1989. The original study of the labor-market persons for whom impact of Title II services continuous data on earnings (from either State unemployment insurance (UI) records or two follow-up surveys) were available for at least 30 11 Other parts of the JTPA, such as Title III programs for workers et al. (1996), Bloom, et al. (1997), who months lost their jobs as after 15 random a consequence of Background for this section and the US Department of Labor (1999) website. international competition, did not include a randomized evaluation. from Orr, was based on 15,981 is drawn assignment. Although data are available on a range of labor market outcomes sample, we focus on the sum of earnings in this 30-month period since best measure of the program's lasting economic impact at months following is probably the on participants. Individuals who JTPA services for period their application (though they could participate in other programs were not offered treatment were generally excluded from receiving of 18 this for this any time). The JTPA was a complicated program service providers included ganizations, community sistance colleges, State skills, strategies. basic education, or both; (OJT/JSA); (iii) other services that and/or a combination of the first were recommended as part of the two. employment The types and private-sector training agencies. grouped into three general service occupational that offered a wide range of services. may have SDA, the data in the analysis (i) classroom training in included probationary employment JTPA intake process, before though individuals were assigned to treatment with their or- on-the-job training and/or job search as- For the National JTPA community of services offered can be These strategies are (ii) services, JTPA sample were Study, service strategies random assignment. different probabilities artificially Al- depending on balanced to maintain a 2/1 treatment-control ratio at each location. The JTPA generally offered services to a deemed number eligible for training if of different groups. Title II applicants were they faced one of a number of "barriers to em- ployment". These included long-term use of welfare, being a high school dropout, 15 or more recent weeks ability, of unemployment, limited English reading proficiency below 7th grade barriers were unemployment spells level, or proficiency, physical or an arrest record. and high-school dropout gorized as being in one of five groups: adult men, adult status. Applicants were cate- women, female youth, male youth men and women because the samples are largest for these two groups. There are 6,102 adult and 5,102 adult men with 30-month earnings data. 16 dis- The most common non- arrestees, and male youth arrestees. In this study we focus on adult 30- month earnings data mental women with 4.2. Average Effects Using our earlier notation, having been enrolled for offer of Y is JTPA 30- month earnings, services, Z and D indicates those who were recorded as Although the indicates the offer of services. treatment was randomly assigned, only about 60 percent of those offered training actually received JTPA which randomized the This services. is JTPA a consequence of the offer of services early in evaluation design, the application process, but did not compel those offered services to participate in training. While all applicants indicated at least some interest in receiving offered treatment were not necessarily notified immediately, JTPA services, those and some time may have passed before training could begin. In the meantime, applicants selected for treatment found jobs, received services somewhere else, or simply lost interest. Providers may have may also have had an incentive to delay the enrollment of applicants that they thought were unlikely to benefit some from treatment, while some SDAs were unable to find service providers for applicants. Also, on the other side of the randomization offer, a small proportion of those selected for the control group (1.6 percent) received imenters' attempt to prevent this. 12 Treatment status is JTPA services despite the exper- therefore likely to be correlated with potential outcomes and cannot be treated as exogenous. Although treatment status framework appear to apply itself was not randomly assigned, the assumptions of our in this case: the randomized been independent of potential outcomes, the offer of treatment offer of treatment is is likely to have unlikely to have affected outcomes through any mechanism other than treatment itself, and denial of made treatment more likely. Moreover, because of the randomization is not likely to have very low probability of receiving JTPA services in the control compilers in this case can also be interpreted as effects on those Since training offers were randomized in the National not required to identify training 12 Adult men and women who were services than were received effects. Even JTPA (Z — by 0) group, effects for who were treated. Study, covariates (X) are in experiments like this, however, offered treatment ultimately received about 150 by the members of the control group. 17 services more hours it is of training customary to control for covariates to correct for chance associations between D and X Moreover, in our setup, covariates can be used to describe earnings in Orr, et al, 1996). quantiles for compilers in population subgroups, since we estimate Qg(Y\X, D, D\ > We and Hispanic applicants, a therefore include as covariates group dummies, and dummies worked at dummies a least 12 by differed holders), AFDC dummies receipt (for recommended women) and whether the is carried out separately for men and women for Most more The I. other) and There are more minority applicants than be recommended likely to Average 30-month earnings for for in the program rules, a relatively applicants also have low previous men were recommended of the OJT/JSA services, while the classroom training than sample are about $19,000 employment women were OJT/JSA for low men and or other $13,000 women. As a benchmark Table II reports for the OLS and impact of training. 14 The while the OLS first column reports unadjusted trainee/non-trainee differences, estimates in column (2) are from a regression of the dependent variable on training difference are precisely purposes of comparison with earlier analyses of the JTPA, conventional instrumental variables (2SLS) estimates of the the covariates and a training 13 OJT/JSA, sex. proportion of high school graduates. services. applicant since previously reported results in the general population and, not surprisingly given the slightly dummy married applicants, 5 age- service strategy (classroom, Descriptive statistics are reported in Table rates. for ). whether earnings data are from the second follow-up survey 13 In addition, for the analysis for GED for black D weeks in the 12 months preceding random assignment. Also included are for the original dummy dummies graduates (including for high-school (as is dummy $3,970 for measured for (D). Without the use of covariates, the training/non- men and $2,133 for women. Trainee/nontrainee differences both men and women. OLS estimates of training effects in comes from a background survey conducted as part of the JTPA intake is similar to that described in Appendix B of Orr et al. (1996), except that we collapsed some categories and omitted SDA dummies because they had low explanatory power. 14 As with the quantile regression and QTE estimates discussed later, the standard errors in Table II The process. covariate information The covariate list used here are robust in the sense that they provide consistent estimates of the asymptotic variance of the estimators under general misspecification. 18 models with covariates are similar to the differences without controls. The reduced-form Not effects of Z surprisingly, since the offer of treatment are reported in columns (3) and was randomly assigned without conditioning on estimates with and without covariates differ are not directly comparable with the OLS little. (4). covariates, the Note that the reduced form estimates estimates since many and (6) of of those offered training did not actually receive training. The instrumental variable estimates in columns (5) ized offer of treatment (Z) as an instrument for construct the estimates in columns (1) and those offered training. When (2). less OLS 2SLS estimate size of the for corresponding men OLS is among $1,593 estimate. This amounts to a 9 percent estimate. a 15 percent earnings increase for similar to those reported in previous studies. women. These results are 15 Estimates of Quantile Treatment Effects 4.3. Table III reports OLS and conventional quantile regression estimates of the effect of train- ing. The OLS estimate of the training coefficient tile regression as was used to $1,780 with a standard error of $532, not dra- is matically different from the corresponding men and use the random- II This corrects for non-participation than half the For women, however, the 2SLS estimate same in the covariates are included, the with a standard error of $895, earnings increase for D Table covariates are the regression estimates same as those used to construct the estimates in Table is $3,754 for show that the gap in quantiles proportionate terms) below the median than above earnings is it. $2,215 for women. by trainee status is 136 percent higher. For dramatic, but still marked. Like the women OLS is The The quan- much larger (in For men, the .85 quantile of trainee about 13 percent higher than the corresponding quantile the .15 quantile is less men and II. for non-trainees, while the difference in impact across quantiles estimates shown in the table, the quantile regression coefficients do not necessarily have a causal interpretation. Rather they provide 15 The 2SLS et al. (1996). estimates in Table II are very close to those in Table 4.6 of the National The JTPA study by Orr estimates are not identical because the covariates are not identical. Percentage effects were computed as the coefficient on training, divided by and other covariates set to means for the treated. 19 fitted values with the training dummy set to zero a descriptive comparison of earnings distributions Implementation of the QTE estimator requires oretical results in the previous section are and non-trainees. for trainees step estimation of k u first based on non-parametric The . the- series estimation of the conditional expectations in k„, but this leaves open a range of possibilities. Since the elements of X are discrete, non-parametric estimation of E[Z\X] is in principle straightfor- ward. In practice, however, a fully saturated model leads to problems with small or missing covariate dom is We cells. assignment, Z therefore estimated Z£[.Z|X] using the restriction that because of ran- and X should be uncorrelated in large samples. simply the empirical E[Z\. The expectation vq(U) separate models for A value. series D = 0, 1. = E[Z\Y,D,X] QTE but for A tiles the little explanatory approximation was used to estimate terms in Y. Selection of the order The order is for the same for both 16 magnitude though earlier, for men the women less precisely estimated than the 2SLS estimates in Table the 2SLS estimates are not 2SLS estimates much smaller than are considerably smaller than particularly interesting finding for men is that the OLS II. estimates, OLS. QTE estimates of effects on quan- exhibit a pattern very different from the quantile regression estimates. In particular, QTE estimates show no evidence of a change in the .15 or .25 quantile. at low quantiles are substantially smaller error) of the effect on the regression estimate effect on the is .15 quantile for men For example, the is men is estimates QTE esti- estimate (standard $121 (475), while the corresponding quan- $1,187 (205). Similarly, the .25 quantile for The than the corresponding quantile regression mates, and they are small in absolute terms. tile was estimated using estimates of the effect of training on median earnings, reported in Table IV, are similar in As noted resulting estimate Most X's were dropped because they had the series approximation was guided by cross-validation. values of D. The QTE estimate (standard error) of the $702 (670), while the corresponding quantile regression lc Hausman and Newey (1995) use a similar approach to dimension-reduction for non-parametric estimation of consumer demand equations. Given estimates of k„, we computed QTE coefficient estimates by weighted quantile regression using the Barrodale-Roberts (1973) linear programming algorithm for quantile regression (see, e.g., Koenker and D'Orey (1987)). A biweight kernel was used for the estimation of standard errors. 20 estimate is $2,510 (356). This suggests that quantile regression estimates of training effects at low quantiles are especially distorted by positive selection on earnings potential. It seems that training did not really change the lower deciles of the trainee earnings distribution for men. In contrast with the results at low quantiles, however, the male earnings above the median are large and QTE estimates of effects on statistically significant (though smaller still than the corresponding quantile regression estimates). The QTE women show estimates for with the largest proportional effects at low quantiles. For example, training women by to raise the .15 quantile of earnings for The significant effects of training at every quantile, is estimated $324 (175), an increase of 35 percent. estimates also suggest training raises the .85 quantile by $1,900 (997), but this increase of only 7 percent. Most of the QTE estimates for women proportional impact on the lower course, women's earnings are tail of an are reasonably close to the corresponding quantile regression estimates. Thus, whether or not training as endogenous, the estimates support the notion that for is women training treated is had a bigger the earnings distribution than the upper tail. especially low in this sample, so large proportional effects Of do not translate into large dollar amounts. 17 Orr, et al. concluded that (1996) reported effects by subgroups but found no clear patterns. (p. 160) "the benefits of of different types of JTPA. Our are broadly distributed across a wide variety men and women." Heckman, concluded that heterogeneity of impacts the JTPA results is Smith, and Clements (1997) similarly important but that most women benefited from do not contradict these general conclusions, but they nevertheless show more heterogeneity in program effects than is revealed by a simple analysis within subgroups. In particular, our results strongly suggest that training for adult much larger proportional effect upper tail 17 (though the absolute Interestingly, QTE They on the lower effect tail women had of the earnings distribution than on the lower on the tail is small). estimates of the proportional effects of training on tional quantile regression estimates not only because the training impact men is are smaller than conven- lower, but also because the is bigger. This reflects the fact that the constant and covariate-effects estimated by QTE are Qe{Yo\X, D\ > Do). This is bigger than the quantile regression intercept because of positive selection male compilers. constant 21 a for for Perhaps most striking among our findings is not seem to have raised the lower quantiles of their earnings. an by program operators to target effort higher earnings potential. in any assessment using a distribution more well off, it may be results in distributional changes that does because of men with would be undesirable social welfare function that weights the lower tail of the earnings seems One response purpose of the JTPA was to aid economically likely that the lower quantiles are of particular to this finding might be that few JTPA concern to applicants were very so that distributional effects within applicants are of less concern than the fact that the program helped many applicants overall. However, the upper quantiles of earnings were who reasonably high for adult males this This services at relatively easy-to-employ heavily. Since the ostensible disadvantaged workers, policy makers. The men the result that training for adult upper tail is participated in the National JTPA Study. Increasing therefore unlikely to have been a high priority. Summary and Conclusions 5. This paper reports estimates of the ings for participants. on quantiles. The We QTE use a effect of subsidized training new estimator on the quantiles of earn- for the effect of a non-ignorable estimator can be used to determine how an intervention affects the distribution of any variable for individuals whose treatment status binary instrument. The estimator accommodates exogenous conventional quantile regression when the treatment is covariates exogenous. It treatment is changed by a and collapses to minimizes a convex piecewise-linear objective function similar to that for conventional quantile regression, can be computed as the solution to a linear programming problem after tion of a nuisance function. this first step is The paper develops estimated nonparametrically. and first-step estima- distribution theory for the case where QTE estimates of the effect of training on the quantiles of the earnings distribution suggest interesting and important differences in program effects at different quantiles, women. These the JTPA differences are large and differences in distributional impact for men and enough to potentially change the welfare analysis of program. 22 Appendix Proof of Theorem 3.1: This proof largely follows that of Theorem 1 in Buchinski and Hahn (1998). Consider, ='^2 gi (j,K) Gn(T,K) i where gi (r, k) and egi \/n(Se - — = K{Ui) {9 — W-5g. The Y{ Now, 5e)- function define r„(r,K) MLLUA almost surely. By - n- \{e ei 1'2 W[t)+ - Gn (r, 1{kv > = E[G n (T, k)}. = _ n -V2 W . e+] 0} Kv{Ui) . kv ) • Note + (1 - 6) is - n~ 1 ' 2 W(r)- - [{e ei convex in r and it is e«]}, minimized at rn = that, {6 _ 1{£0i _ n -l/2 W - < < , Q}) and Weierstrass domination, (iv) dE[g(T,K l/ ) ] , dr d 2 E{g(r, . r=0 = «„)] 7^ [t=o= n- 1 OTOT _„-l/2 -n- 1 / 2 E[Wk v (U) (9 l{e e = 0})] 0, WW'\D > D Q P(D 1 > E[ftolW:Dl>Do (0) X ] £>„). Then, = r n (r, k v ) = E[f£ ^ WiDl>Do (0) WW|£>i > £> feo\w,Di_>Do(fy are bounded away from zero, J where J o(l), P{Pi > A))- Note • ] Cn (Ui) + -r'Jr that by (ii) and since both k v and non-singular. Define, is = n- x '2 {e - < l{eei 0}) • W it n Un(K)=Y,K(U )-UUi), i i=l and = k(^) ?„([/;, «, r) Note that E\u n (K v )\ G n (j,n) • = = = {9 0, [(e ei ~ n~ 1 ' 2 W{t) + - e+] + (1 - 6) {(e gi - n~ 1 '2 W[t)~ - e£] + r'U^)}. then r„(T,K )-l-(G7l (r,K)-rn (r,K^)) 1/ r„(T, k„) - T'u) n (K) + (G„(r, k) + r'w„( K ) {r n (r, k„) + /£[«„(«„)]}) n = r n (r, K„) - r'w n («0 + ^{PnC^i. K ' T) ~ E - 9 n( t/ '[/ - K ^> T )]l i=l Lemma A.l : w„(l{K y > 0} £„) =n _1/2 ^Vi + oP (l) with i=i 23 £ty = and 2 £|M| < oo. Lemma A. 2 Applying : Eti(ft,( t[..l^>0}-ic„,T)-%(C/,K„ Lemma a given r. Let rj =J n 2 Define A„(r) convex in r, ( 1 T~ 7 0} > cj n (l{K l/ ?n)' • J( T ~ 7 Ku) = 0} k„). • = ln) T is = Lemma |r n (r)| Therefore, by > E r+1 0} by Op(l)- So, . Op (l) that: jT _ T W "(1{^ > ' K„) 0} + - 7^ J??n = ^ r'Jr + . o p (l). Since A n (r) is K(t)-\t'Jt -0, k„) = - Lemma 3 (t in - ry n )'J(r - ? ? J - - rfn Ji ln + rn (r) Buchinsky and Hahn (1998), we have that r n = ?7 n + o p (l). A.l -6 e )^N(O,J- ZJl 1 ), E[ipip']. PROOF OF Lemma A.l zero. + Then, n 1 / 2 (6 £= ' T K„) 0} • G„(t, 1{k„ where Note > r'w n (l{^ = G n (i~, l{«t/ > 0} K„) + r'o; n (l{K„ > 0} K„), then X n (r) applying Pollard's convexity lemma (Pollard (1991)): any compact subset of with sup TgT *Vt " o 2 g sup where o p (l). A. 2: Gn (T, l{Ku > for = T)]} 1 This assumption is : To prove this lemma we use the assumption that k v probably stronger than necessary but it is bounded away from allows us to ignore the trimming using K 1 2 s making the asymptotics easier. Assumption (vi) implies that, ((K/rij) / + K~ ) —> — = almost surely for all j 6 {1, ..., J}. Therefore sup t/eW \v op (l) (see, e.g., Newey (1997), Theorem vq\ 4). Since txq is bounded away from zero and one (by (iii)), then sup;y eW \k u — k u = o p (l). Since /c„ is bounded away from zero, with probability approaching one the trimming is not binding and we can ignore \{k v > 0}, \ it for the asymptotics. uj n Let 7Tq {K u ) be the population ^n = = -= y.m(Ui) V n ~( mean -F=V m(/7i) V™^ of Z V - 7—- 1 for the /-cell of i • 1 V TTOi-TTt - 7r 0(A X 7T i ) and ? its (A i ) + Rn / sample counterpart. r -(TTi-TToi) (l-7Ti)-(l-7r0i)/ 24 (l-A,)-^, A, •(!-£*,) tt'-tt' (i_^).(i_4) o, note that D ir {\-VM l-D^-Vi, 111 fe -9 n' V 1 = l Lemma \V D f& ~ ,m V^ < sup Then, applying (1-Tf')- (1 -*{,),/ @ii ii) ~ v Oii) — Vq\ n 'fe A, (%, , -^Oi,) ^ Op(l). *W V Newey and McFadden 4.3 in \ ) (l-5?')-(l -^)J (1994), -l I l lX n -TV (i-A,)-^_ £ (!-£>) D-(l-i/ i/ m([/) (7T (X))2 A.-(i-hh.) \ (1-^.(1-4) J (1) (l-D)-Z ) m(C/) A" (l-Tro(X))' D-(l-Z) (i-MX)Y (x)y (7r X Therefore, i(K v ) To " lf. V^ n + ^Vi/(X )-{^-7r / A-(l-gj) (l-A)-gj V 1-T0(^i) TToCX) i (X )}+Op (l). I prove, J_V- an A A-(i-Pi) (l-A)-gj >™ Vn "7= ^i JT( 1 V - A • (1 - Zi) 7T7T l-TTo(Ai) (1 - A) ~ 7r Zi lv (Ajj + , Op(l) / notice that A V t=l • (l - vj) l-Tro(Xi) (l " - A) Pt no(Xi) J -. ni l - A,) tt 25 (A%) Pjj So, we show that just have to for each j € {1, J} (l-A)-zv Di. -(1-Vi.) ( ..., i,=i :^m(^). 1- +0,(1 ,(!) l-Tro(^) , ttoP^.; This will be done by checking assumptions 6.1 to 6.6 in Newey (1994). Assumptions 6.1 and 6.2 follow from the conditions of the theorem (see Newey~(1994), page 1373). Assumption 6.3 holds with d — and ad = s. Assumption 6.4 holds for b(z) — and derivative equal to directly m(U) 5 Assumptions 6.5 and 6.6 follow from: (i) rij K~ 2s — 0; (ii) /rij — check Assumption 6.5 note that (vi) implies that s > 5/2, therefore 6.5 is also valid with d — 0). To check assumption 6.6 note that since K > (almost surely). In particular, to > K K~ — m(U) i D 1-D -MX) MX) s < (note that Assumption > 00, then, there exists a sequence £ K such that E as K - oo (see Newey D m(U) l ~£)-tKP MX), l-no(X) (1994), page 1380 last paragraph.) Di (1 - Vi) (1 - Di) Tn^ mm \ V" Now, applying the (TT\ (-L l A-(l-^Oi) - I -MX,) (1 ~ ~ Di) Z% MXi) ) Di V Proof of Lemma lemma A. 2 : (1994), - VQi) (1 - ZO + o p (l) (1 7To(A'i N 2=1 VQ MXi) l-A result of the Newey Vi x and the results in MXi) ) 1 K (U) ^ + Op(l) TTo(A'i) - A) holds. Note that pn (Ui,l{K v > 0}-K v ,T)-pn (Ui,K 1/ ,T) = (1{k„ > 0}-K„-n v )-Sn (Ui,T), where Sn (Ui,r) so \Sn (Ui,r)\ E[n 9-[{e ei + (1-6). < n- x l 2 \{\e 8i < n- l ' 2 \W[ T \} \ \Sn (Ui,r\} =E < n ~Fu 1'2 \ E[l{\e e \ -n-" 2 W' T)+-el} = < l [(e w - n" /2 W!t)~ - e,"] + r'UU), \W[t\. Also, jT^W'tW w (n-W \W'r\) 1 \W'r\] - Feolw (-7i-V 2 \W'r -1/2 26 \W't\ 2-E[feolw (0)-\W'T\ 2 }<™ Then, n (Ui,l{K ^p = v > 0} -K V ,T) -pn (Ui,K v ,T) i=l 1 2 1 < sup - -k„\ \k v - ^n- ueu \Sn (Ui,T)\ = o p (l). i=l Also, by cancellation of cross-product terms, J2 E i(Pn(. U,^,r)f = ^2p n (Ui,Ku,r) -E\p n {U,K v ,T)\ 1=1 v»=l < £[l{|e fli and the result of the lemma Proof of Theorem Consistency of £ \W[t\ • 2 ] 0, 3.2: )\W,D 1 E[<Ph {6 < n-^l^/rl) holds. easy to prove and is | >D we will focus /* 1 = ] i<P = on J. By fv- W'6f>\ { h and (ii) < (hi), for < e* eg: dV /y|w,J3 1 >A,(v) J fi(z)feo\W,D i 1 >D (h-z)dz = fe e \W,D 1 >D o (0) + h = fe e \W, Dl >D o {0) + dz Z-<p(z)' dz O(h). (A.1) Therefore, limiE[<p h (6 e )\ a —>U By equation (A.l) and condition lim h —*0 E [k v <p h is kv , </?(•) and n h? — > oo, • >d o (0)- ] is eventually bounded (in X P(D > Do] X , Do] WW'\D > D WW'\D > [feelw Dl>Do (0) > 1 P(D > D l ] D ) ). = 0(l/h 2 ). then Notice that, since k„ X>„(0i) Di W are bounded, ^ 5>„(tfi) i , x var( Ku -^ h (6e)-WW') Since U ]W 3.1, E[<p h (6o)\W,Di bounded, we have that also h —>Q = E Also, since ] hm E [E [<p h {6 g )\W, D > D = WW'} (6e) W 1 Theorem (iv) in absolute value) by a constant. Since W,D > D = <p hi is <p hii 1 E [U\ w Dl>Do (0) WiW[ (6 e ) , bounded away from zero (uniformly 3e) WiWl - K v {Ui) <p hA (8e) WW'ID, > D ] P(D 1 > D (A.2) ). in U), W W[ { i=l n < C sup ueu \k v - kv 1 \ - V" ~ n \\k v fh i(Se) WiW- =C • sup ueu 27 |re„ - 1 k„\ - ~ " V"^ i=l • fh,i( Se Under the assumptions of Theorem 1 3.1, sup,y eW \k u — kv 0. > In addition, n ~ - — \ 1 - I 2=1 i=l C-^-h ±-h ¥e-8 * e \\.\m\ i=l C < • n 1 /2 ||?e -8 e \\- (n 1 /^ 2 )- = 1 o p (l). As shown above, ' i=i Therefore, = £ f>„(0i) ¥>m&) WiW! lY,K v (Ui) • 2 By (i) and (iv), for I Y,K v {Ui) ip =1 <p h>l (6 e ) WiWi + o„(l). (A.3) 2=1 some constant C Ki (6e) WiWi - K v {Ui) <p hii W W[ (6 e ) t < C n 1 /2 • \\6 e -Se\\- {n l'2 h 2 )- 1 = o p {\). 2=1 (A.4) Combining equations (A. 2), (A.3) and (A.4), we get J 28 A J. References Abadie, A. (1999a): "Bootstrap Tests Variable," Mimeo, MIT. for the Effect of a Treatment on the Distribution of an Outcome Abadie, A. (1999b): "Semiparametric Estimation of Instrumental Variable Models Mimeo, MIT. Amemiya, T. Angrist (1982): "Two Stage Least Absolute Deviations Causal Effects," Estimators," Econometrica 50, 689-711. D. AND G. W. Imbens (1991): "Sources of Identifying Information Working Paper No. 117. J. for in Evaluation Models," NBER Technical W. Imbens, and D. B. Rubin (1996): "Identification of Causal Effects Using Instrumental Variables," Journal of the American Statistical Association, 91, 444-472. Angrist, D., G. J. ATKINSON, A. B. Barrodale, AND I., imation," "On (1970): the Measurement of Inequality," Journal of Economic Theory, F. D. K. SIAM Journal Roberts (1973): "An Improved Algorithm on Numerical Analysis, for Discrete S., L. L. Orr, S. "The Benefits and Costs 244-263. linear approx- 10, 839-848. BASSETT, G., and R. Koenker (1982): "An Empirical Quantile Function Errors," Journal of the American Statistical Association, 77, 407-415. Bloom, H. /i 2, for Linear Models with iid H. Bell, G. Cave, F. Doolittle, W. Lin and J. M. Bos (1997): JTPA Title II-A Programs," The Journal of Human Resources, 32, of 549-576. BUCHINSKY, M. (1991): The Theory and Practice of Quantile Regression, Ph.D. Dissertation. Harvard (1994): "Changes University. BUCHINSKY, M. in the US Wage Structure 1963-87: Application of Quantile Regres- Econometrica, 62, 405-458. sion," BUCHINSKY, M. (1995): "Estimating the Asymptotic Covariance Matrix A Monte Carlo Study," Journal of Econometrics 68, 303-338. BUCHINSKY, M., AND Hahn J. "An Alternative Estimator (1998): for for the Quantile Regression Models: Censored Quantile Regression Model," Econometrica, 66, 653-671. Card, D. (1996): "The Effect, of Unions on the Structure of Wages: A Longitudinal Analysis," Econo- metrica 64, 957-980. Chamberlain, G. in (1991): "Quantile Regression, Censoring, and the Structure of Wages," in Advances Econometrics Sixth World Congress 1, ed. by C.A. Sims. Cambridge: Cambridge University Press. Charnes A., and W. W. Cooper (1957): "Nonlinear Power of Adjacent Extreme Point Methods in Linear Programming," Econometrica 25, 132-153. DiNardo, J., NICOLE M. Fortin, and T. Lemieux Distribution of Wages, 1973-1992: A (1996): "Labor Market Institutions and the Semiparametric Approach," Econometrica 64, 1001-1045. FlTZENBERGER, B. (1997a): "A Guide to Censored Quantile Regression," ed. by G.S. Maddala and C.R. Rao. New York: Elsevier. FlTZENBERGER, B. (1997b): "Computational Aspects in Handbook of Statistics 15, of Censored Quantile Regression," in Proceedings Data Analysis Based on the Li-Norm and Related Methods 31, ed. by Y. Dodge. Haywood California: Institute of Mathematical Statistics Lecture Notes-Monograph Series. of the 3rd International Conference on Statistical 29 FREEMAN, R. "Unionism and the Dispersion of Wages," Industrial and Labor Relations Review, (1980): 34, 3-23. J. A., AND W. K. Newey (1995): "Nonparametric Estimation and Deadweight Loss," Econometrica 63, 1445-1476. Hausman, HECKMAN, AND R. ROBB J. "Alternative (1985): Methods in Longitudinal Analysis of Labor Market Data, ed. Cambridge University Heckman, J. J., of Exact Consumers Surplus for Evaluating the Impact of Interventions," by J. Heckman and B. Singer. New York: Press. Smith, and C. Taber (1994): "Accounting NBER technical Working Paper No. 166. for Dropouts in Evaluations of Social Experiments," Heckman J., J. Accounting Smith and N. Clements (1997): "Making the Most Out of Social Experiments: Heterogeneity in Programme Impacts," Review of Economic Studies, 64, 487-535. for IMBENS, G. W., and D. ANGRIST (1994): "Identification and Estimation of Local Average Treatment J. Effects," Econometrica, 62, 467-476. IMBENS, G. W., and D. B. Rubin (1997): "Estimating Outcome Distributions mental Variables Models," Review of Economic Studies, 64, 555-574. KATZ, L., terly KING, AND K. MURPHY (1992): "Changes in Relative Wages: Supply and for Compilers in Instru- Demand Factors," Quar- Journal of Economics, 107, 35-78. Mervyn A. (1983): "An Index of Inequality: With Applications to Horizontal Equity and Social Mobility," Econometrica, 51, 99-115. Koenker, R., and G. Bassett (1978): "Regression Quantiles," Econometrica, 46, 33-50. KOENKER, R., AND V. D'OREY (1987): tistical Society, LALONDE, R.J. J., G. 9, 149-168. Marschke and K. Troske to Analize the Effects of et de Statistique, Data on Establishments United States," Annales dEconomie (1996): "Using Longitudinal Union Organizing Campaigns in the 41/42, 155-185. LEWIS, H. G. (1986): Union Relative Wage Manski, C. F. the Royal Sta- "The Promise of Public-Sector Sponsored Training Programs," Journal of Eco- (1995): nomic Perspectives Lalonde, R. "Computing Regression Quantiles," Journal of Applied Statistics, 36, 383-393. (1988): Analog Estimation Effects: A Survey. Chicago: University of Chicago Press. Methods in Econometrics. New York: Chapman and Hall. MANSKI, C. F. ed. (1994): "The Selection Problem," in Advances in Econometrics Sixth World Congress by C.A. Sims. Cambridge: Cambridge University Press. Manski, C. F. The Mixing Problem (1997): in Programme 2, Evaluation, Review of Economic Studies 64, 537-553. Newey, W. K, and D. McFadden Handbook of Econometrics 4, ed. (1994): "Large Sample Estimation and Hypothesis Testing," in by R. F. Engle and D. McFadden. Amsterdam: Elsevier Science Publishers. Newey, W. K. "The Asymptotic Variance of Semiparametric Estimators," Econometrica (1994): 62, 1349-1382. Newey, W. K. (1997): "Convergence Rates and Asymptotic Normality for Series Estimators," Journal of Econometrics 79, 147-168. Newey, W. K, and J. L. Powell (1990): "Efficient Estimation of Linear and Type Regression Models Under Conditional Quantile Restrictions," Econometric Theory 30 6, I Censored 295-317. Orr, L. L., H. S. Bloom, S. H. Bell, F. Doolittle, for the Disadvantaged Work?, Washington, POLLARD, D. "Asymptotics (1991): for W. Lin and G. Cave (1996): Does Training DC: The Urban Institute. Least Absolute Deviation Estimators," Econometric Theory, 7, 186-199. PORTNOY, S. L. (1991): "Asymptotic Behavior of Regression Quantiles in Non-stationary Dependent Cases," Journal of Multivariate Analysis, 38, 100-113. Powell, J. L. (1983): "The Asymptotic Normality of Two-Stage Least Absolute Deviations Estimators," Econometrica, 51, 1569-1575. Rosenbaum, P. R., AND D. B. Rubin (1983): "The Central Role of the Propensity Score in Observa- tional Studies for Causal Effects," Biometrika, 70, 41-55. Rubin, D. B. (1974): "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies," Journal of Educational Psychology, 66, 688-701. Rubin, D. B. (1977): "Assignment to Treatment Group on the Basis of a Covariate," Journal of Educa- tional Statistics, 2, 1-26. US Department OF Labor What's Working (and What's Not), A Summary of Research on Employment and Training Programs, Washington, DC: US Government (1995): the Economic Impacts of Printing Office, January. US Department of Labor (1999): Employment and Training Administration, Job Training Partner- ship Fact Sheet, http://www.doleta.gov/programs/factsht/jtpa.htm. 31 Table I Means and Standard Deviations Assign] nent Entire Treatment Control Sample A. Difference (t-stat.) Men Number of observations 5,102 3,399 1,703 Baseline Characteristics Age 32.91 [9.46] High school or GED Married Black Hispanic Worked less weeks than 13 in past year 32.85 [9.46] 33.04 [9.45] -.19 (-67) .69 .69 .69 -.00 [.45] [.45] [.45] (-.12) .35 .36 .34 [.47] [.47] [.46] .02 (1.64) .25 .25 .25 .00 [.44] [.44] [.44] (.04) .10 .10 .09 .01 [.30] [.30] [.29] (.70) .40 .40 .40 .00 [.47] [.47] [.47] (.56) Experimental Characteristics Second follow-up Training .29 .30 .28 [.46] [.46] [.45] .42 .62 .01 [.49] [.48] [.11] .02 (1.14) .61 (70.34) Service strategy: Classroom training OJT/JSA Other Outcome 30 .20 .21 .19 [.40] [.41] [.39] .02 (1.73) .50 .50 .50 .00 [.50] [.50] [.50] (.07) .29 .29 .31 [.46] [.45] [.46] -.02 (-1.57) variable: month earnings 19,147 19,520 18,404 1,116 [19,540] [19,912] [18,760] (1.96) 32 continued from previous page Entire Assigm nent Treatment Control Sample B. Difference (t-stat.) Women Number of observations 6,102 4,088 2,014 Baseline Characteristics Age 33.33 33.33 [9.78] High school or GED Married Black Hispanic Worked less weeks than 13 in past year AFDC [9.77] 33.35 [9.81] .72 .73 .70 [.43] [.43] [.44] .22 .22 .21 [.40] [.40] [.39] -.02 (-.09) .03 (2.01) .01 (1.55) .26 .27 .26 .01 [.44] [.44] [.44] (.95) .12 .12 .12 -.00 [.32] [.32] [.33] (-.89) .52 .52 .52 -.00 [.47] [.47] [.47] (-.08) .31 .30 .31 [.46] [.46] [.46] -.01 (-1.03) Experimental Characteristics Second follow-up Training .26 .26 .25 .01 [.44] [.44] [.43] (.45) .45 .66 .02 [.50] [.47] [.13] .64 (80.24) Service strategy: Classroom training OJT/JSA Other Outcome .38 .38 .39 -.01 [.49] [.49] [.49] (-37) .37 .37 .38 -.01 [.48] [.48] [.49] (-.85) .24 .25 .23 [.43] [.43] [.42] .02 (1.40) variable: 30 month earnings 13,029 13,439 12,197 [13,415] [13,614] [12,964] 1,242 (3.46) Note: The first three columns of the table report means and standard deviations (in brackets) for the National JTPA Study 30-month earnings sample. The last column shows the difference in means by assignment status and reports the t-statistic (in parenthesis) for the null hypothesis of equality in means. 33 a o a) 3 u a E o tot o> oo -—~- h- CO - in rf £ ft £ 3 nj t~ tS O w J3 > CD "o CD s o u CD t-i a "c6 CO 0) CO ^j tO "fl C 6 CD ~ CD +J <e g e& q W = w B& 9 -a - s 15 »? ^Ica 3~ «c T *j *—" O rt a ^ t. * C O *. c c « S c to a; tti Q o, r. 15 § 2.* § 3™ ,", "3 .2 o ^ °|qO o •= S-^S.2 *5S u O co to - io ,-H W C3 tp C C CD * V> .S 3 5 <„ ft +^ £ bE»| H en H (_ a " CO O O I O CN t--. o _ t— iO CO U3 m r CO CO „ b °> o a ? = *— W • 0> «rT =!O a t3 cL-e^ a) H OJ 1 8-S-o s 34 --5 ^ Table III Quantile Regression and OLS Estimates Dependent variable: 30-month earnings OLS A. % Impact of Training High school or GED Black Hispanic Married Worked less weeks than 13 in past year Constant 0.50 0.75 3,754 1,187 2,510 4,420 4,678 4,806 (536) (205) (356) (651) (937) (1,055) 21.20 135.56 75.20 34.50 17.24 13.43 4,015 339 1,280 3,665 6,045 6,224 (571) (186) (305) (618) (1,029) (1,170) 0.85 -2,354 -134 -500 -2,084 -3,576 -3,609 (626) (194) (324) (684) (1087) (1,331) 251 91 278 925 -877 -85 (883) (315) (512) (1,066) (1,769) (2,047) 6,546 587 1,964 7,113 10,073 11,062 (629) (222) (427) (839) (1,046) (1,093) -6,582 -1,090 -3,097 -7,610 -9,834 -9,951 (566) (190) (339) (665) (1,000) (1,099) 9,811 -216 365 6,110 14,874 21,527 (1,541) (468) (765) (1,403) (2,134) (3,896) Women Training % Impact of Training High school or GED Black 2,215 367 1,013 2,707 2,729 2,058 (334) (105) (170) (425) (578) (657) 18.46 60.76 44.42 32.25 14.47 8.09 6,373 3,442 166 681 2,514 5,778 (341) (99) (156) (396) (606) (762) -544 22 -60 -129 -866 -1,446 (397) (115) (188) (451) (679) (869) -1,151 -31 -222 -995 -1,620 -1,503 (488) (130) (194) (546) (911) (992) -667 -213 -392 -758 -1,048 -902 (436) (127) (209) (522) (785) (970) -5,313 -1,050 -3,240 -6,872 -7,670 -6,470 (370) (137) (289) (522) (672) (787) AFDC -3,009 -398 -1,047 -3,389 -4,334 -3,875 (378) (107) (174) (468) (737) (834) Constant 10,361 649 2,633 8,417 16,498 20,689 (815) (255) (490) (966) (1,554) (1,232) Hispanic Married Worked less weeks . 0.25 Men Training B. Quantile 0.15 Note: The than 13 in past year table reports OLS and quantile regression estimates of the effect of training on earnings. The specification used also includes indicators for service strategy recommended, age group and second follow-up survey. Robust standard errors in parenthesis. 35 Table IV Quantile Treatment Effects and 2SLS Estimates Dependent variable: 30-month earnings 2SLS Quantile 0.15 A. 0.50 0.75 0.85 Men Training % Impact of Training High school or GED Black Hispanic Married Worked less weeks than 13 in past year Constant B. 0.25 1,593 —vzt 702 1,544 3,131 3,378 (895) (475) (670) (1,073) (1,376) (1,811) 8.55 5.19 11.99 9.64 10.69 9.02 4,075 714 1,752 4,024 5,392 5,954 (573) (429) (644) (940) (1,441) (1,783) -2,349 -171 -377 -2,656 -4,182 -3,523 (625) (439) (626) (1,136) (1,587) (1,867) 335 328 1,476 1,499 379 1,023 (888) (757) (1,128) (1,390) (2,294) (2,427) 6,647 1,564 3,190 7,683 9,509 10,185 (627) (596) (865) (1,202) (1,430) (1,525) -6,575 -1,932 -4,195 -7,009 -9,289 -9,078 (567) (442) (664) (1,040) (1,420) (1,596) 10,641 -134 1,049 7,689 14,901 22,412 (1,569) (1,116) (1,655) (2,361) (3,292) (7,655) Women Training % Impact of Training High school or GED 1,780 324 680 1,742 1,984 1,900 (532) (175) (282) (645) (945) (997) 14.60 35.47 23.14 18.37 10.06 7.39 3,470 262 768 2,955 5,518 5,905 (342) (178) (274) (643) (930) (1026) -123 -401 -1,423 -2,119 (204) (318) (724) (949) (1,196) -554 Black (397) Hispanic -1,145 -73 -138 -1,256 -1,762 -1,707 (488) (217) (315) (854) (1,188) (1,172) -652 -233 -532 -796 38 -109 (437) (221) (352) (846) (1,069) (1,147) -5,329 -1,320 -3,516 -6,524 -6,608 -5,698 (370) (254) (430) (781) (931) (969) AFDC -2,997 -406 -1,240 -3,298 -3,790 -2,888 (378) (189) (301) (743) (1,014) (1,083) Constant 10,538 984 3,541 9,928 15,345 20,520 (828) (547) (837) (1,696) (2,387) (1,687) Married Worked less weeks than 13 in past year table reports 2SLS and QTE estimates of the effect of training on earnings. Assignment used as an instrument for training. The specification used also includes indicators for service strategy recommended, age group and second follow-up survey. Robust standard errors in parenthesis. Note: The status is 36 DEC 2000 Date Due Lib-26-67 MIT LIBRARIES llllllllllllf" 3 9080 0191 7 6004