Massachusetts Institute of Technology
Department of Economics
Working Paper Series

AN IV MODEL OF QUANTILE TREATMENT EFFECTS

Victor Chernozhukov, MIT
Christian Hansen, MIT

Working Paper 02-06
December 2001

Room E52-251
50 Memorial Drive
Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://papers.ssrn.com/paper.taf?abstract_id=298879

An IV Model of Quantile Treatment Effects¹

Victor Chernozhukov and Christian Hansen
Department of Economics, MIT
MIT Department of Economics Working Paper
Revised: December 2001

Abstract

This paper develops a model of quantile treatment effects with treatment endogeneity. The model primarily exploits a similarity assumption as the main restriction that handles endogeneity. From this model we derive a Wald IV estimating equation, and show that the model does not require functional form assumptions for identification. We then characterize the quantile treatment function as solving an "inverse" quantile regression problem and suggest its finite-sample analog as a practical estimator. This estimator, unlike generalized method-of-moments, can be easily computed by solving a series of conventional quantile regressions, and does not require grid searches over high-dimensional parameter sets. A properly weighted version of this estimator is also efficient. The model and estimator apply to either continuous or discrete variables. We apply this estimator to characterize the median and other quantile treatment effects in a market demand model and a job training program.

Key Words: Quantile Regression, Inverse Quantile Regression, Instrumental Quantile Regression, Treatment Effects, Empirical Likelihood, Training, Demand Models.

JEL Codes: C13, C14, C30, C51, D4, J24, J31.

¹ ©2000-2001. Victor Chernozhukov and Christian Hansen. We thank the seminar participants at Cornell, University of Pennsylvania, University of Illinois at Urbana-Champaign, Winter Econometric Society 2000, and the EC2 Conference on Causality and Exogeneity in Econometrics for useful discussions of the research reported in this paper. We would like to express our appreciation to Guido Imbens, Petra Todd, James Heckman, Joel Horowitz, Roger Koenker, Stephen Portnoy, Edward Vytlacil, and especially Joshua Angrist, Jerry Hausman and Whitney Newey for constructive discussions. Conversations with Alberto Abadie at the Konstanz conference, May 2000, Takeshi Amemiya, Han Hong, as well as debates with Tom MaCurdy, while the first author was a student at Stanford, inspired the model and the estimation developed in this paper. We are especially grateful to them.

[For convenience of refereeing only. To be removed.]
Contents

1 Introduction
2 A Model of Quantile Treatment Effects
2.1 Potential Outcomes and the QTE
2.2 The Instrumental Quantile Treatment Model
2.3 A Comparison with a LATE Model with Common Error
3 Economic Examples
4 Wald IV and Inverse Quantile Regression
4.1 Main Identification Restriction
4.2 The Inverse Quantile Regression
4.3 Conditions for (Global) Identification
5 Estimation
6 Empirical Applications
6.1 Demand for Fish
6.2 Evaluation of a JTPA Program
6.3 Numerical Performance
7 Conclusion and Future Research
A Definitions and Lemmas
B Two-Stage Quantile Regression: Inconsistency when QTE varies with $\tau$
C Comparison with Abadie et al's Model
D Proof of Theorem 1
E Proof of Theorem 2
F Proof of Theorem 3
G Assumptions R.1-R.6
H Proof of Theorem 4
I Identification Results: Generalizations
J Additional Results: Empirical Likelihood
K A Lemma

1 Introduction

The ability of quantile regression models, Koenker and Bassett (1978), to characterize the impact of variables on the distribution of outcomes makes them appealing for examining many economic applications, see e.g. Buchinsky (1998) and Koenker and Hallock (2001). The distributional impacts of social programs, such as welfare, unemployment insurance, and training programs, are of large interest to economists. Unfortunately, in all of these cases, treatment is self-selected or endogenous, making conventional quantile regression inappropriate.

This paper makes two contributions. First, this paper proposes a model of quantile treatment effects with endogeneity. At the heart of the model is an assumption of similarity (containing rank invariance as a special case) that allows us to address endogeneity. This differs from the monotonicity assumptions of Heckman's nonparametric selection model and Imbens and Angrist's LATE model.² We show that this model's main implication is a Wald IV estimating equation:

$$P(Y \le q(D,\tau) \mid Z) = \tau, \tag{1}$$

where $q(d,\tau)$ is the $\tau$-quantile of the potential or counterfactual outcome when the treatment is exogenously set to the value $d$, $Y$ is the actual outcome, $D$ is the actual endogenous treatment, and $Z$ is an instrument. Thus, the model provides a causal justification and interpretation of the Wald IV estimating equation (1).³ We also show that the model does not require functional form assumptions for identification.

² See Vytlacil (2001) on the distributional equivalence of these two models.

³ There is very important prior work that estimates functions $q$ under restrictions like (1). This work starts as early as Hogg (1975); Koenker (1998) characterizes Hogg's estimator as a Wald's IV approach to quantile regression. See Abadie (1995), Christoffersen, Hahn, and Inoue (1999), and MaCurdy and Timmins (1998) for GMM-like approaches to estimation and testing. Also see Hong and Tamer (2001) for a fundamental treatment of the censoring case. The problem is that a function $q$ satisfying the estimating equation has not had any known causal meaning within standard IV models with non-constant treatment effects, such as those of Heckman or Imbens and Angrist. Thus, our model provides a causal interpretation and support of (1) and of these previous important estimators.

Second, we characterize the function $q$ as solving an inverse quantile regression problem and suggest its finite-sample analog as a practical estimator. This estimator, unlike generalized method-of-moments and other similar estimators, can be easily computed by solving a series of conventional quantile regressions (convex optimization problems), and does not require grid searches over high-dimensional parameter sets. A properly weighted version of this estimator is also efficient. We apply this estimator to characterize the median and other quantile treatment effects in a market demand model and a job training program.

An important aspect of the proposed model is treatment effect heterogeneity, under which conventional linear IV inconsistently estimates the average treatment effect (Example 4.1). Thus, even if one is interested in ATE, one has to estimate the QTE's first and integrate them over the quantile index to obtain a consistent estimate of ATE. Alternatively, one may estimate only the median treatment effects to characterize the central effects, using the proposed approach.⁴

⁴ In the expected utility framework, some form of average is typically of interest, but since we (econometricians) typically do not know (are agnostic about) which particular average, the entire distributional impact needs to be evaluated. In addition, Manski's (1988) ingenious work provides ordinal utility models of decision making under uncertainty, where agents maximize a $\tau$-quantile of the utility distribution. In such a framework only quantiles of potential outcomes would be of interest to a policy-maker.

Further details of the model and the estimator are as follows. The model is developed in the standard potential outcomes framework, and the QTE is defined as the difference in the quantiles of potential outcomes under potential treatments. At the heart of the model is similarity, a generalization of the rank invariance assumption, which is reasonable in many applications and also facilitates interesting interpretations of QTE, as in Lehmann (1974), Doksum (1974), and Koenker and Geling (2001). This assumption is different from the monotonicity assumptions of the prevalent IV models (the selection-LATE models). Similarity also requires less stringent independence conditions (allowing, for example, measurement error in the instrument). As a result the model differs from the selection-LATE models. However, the two models do contain a large common subclass.

It should also be noted that the model and estimators looked at in this paper both substantively complement and differ from the fundamental model of Amemiya (1982) and the QTE model of Abadie, Angrist, and Imbens (2001) developed within the LATE framework. Amemiya's approach and its extension by Chen and Portnoy (1996), known as two-stage quantile regression (2SQR), allow continuous treatment variables. However, we show that 2SQR is not consistent when the quantile treatment effect differs across quantiles (Appendix B). The inconsistency is noted, since this estimator has often been used expressly to estimate heterogeneous quantile treatment effects. On the other hand, Abadie et al's (2001) approach applies only to binary treatments, and its extension to more general treatments is not known. Allowing general treatments is clearly important. The approach in this paper expressly allows for QTE that vary across quantiles and applies to arbitrary (continuous, discrete, or binary) treatment variables. Thus it can be used to study education effects, demand systems, and any other non-binary treatments.

On the estimation side, the inverse quantile regression is an easily computable and transparent estimator, unlike GMM, which requires grid searches over high-dimensional parameter sets. The estimator is obtained as a link between Koenker-Bassett quantile regression and the Wald IV restrictions. In addition to deriving theoretical properties of inverse quantile regression, we provide user-friendly computer programs that implement the estimator, compute standard errors, and produce graphical output.

The remainder of the paper is organized as follows. Section 2 presents the model. Section 3 provides two economic models as examples: aggregate demand analysis and the returns to education. Section 4 presents identification results and the inverse quantile regression. Estimation methods are described in Section 5, and Section 6 contains two empirical applications, corresponding to the models in Section 3.

A word on notation. Following Koenker, we use $F_Y(\cdot \mid x)$ and $Q_Y(\tau \mid x)$ to denote the conditional distribution function and the $\tau$-quantile of $Y$ given $X = x$; capitals such as $Y$ denote random variables and $y$ denote the values they take.

2 A Model of Quantile Treatment Effects

The section begins with an important preliminary discussion that naturally leads to the QTE model of this paper.

2.1 Potential Outcomes and the QTE

We develop our model within the conventional Neyman-Fisher-Rubin potential outcome framework.⁵ Potential real-valued outcomes are indexed against treatment $D$ ($D \in \mathcal{D}$, a subset of $\mathbb{R}^l$), and denoted $Y_d$, while potential treatment status is indexed against the instrument $Z$, and denoted $D_z$. For example, $Y_d$ is an individual's outcome when $D = d$ and $D_z$ is an individual's treatment status when $Z = z$.

⁵ See e.g. Heckman and Robb (1986) and Imbens and Angrist (1994).

The potential or counterfactual outcomes $\{Y_d, d \in \mathcal{D}\}$, such as wages or demand, vary across individuals or states of the world. Given the actual treatment $D$, the observed outcome is $Y = Y_D$. That is, only the $D$-th component of $\{Y_d, d \in \mathcal{D}\}$ is observed. Typically $D$ is selected in relation to potential outcomes, inducing endogeneity or sample selectivity.

The objective of causal analysis is to learn about the features of marginal distributions of potential outcomes $Y_d$. For example, $\mu_{d,d'} = E Y_d - E Y_{d'}$ is the average treatment effect (ATE). The quantile treatment effect (QTE) is the difference in quantiles of potential outcomes under different potential treatments:⁶

$$Q_{Y_d}(\tau) - Q_{Y_{d'}}(\tau).$$

⁶ Generally, QTE are more informative than ATE, since they summarize the distributional impact, whereas ATE summarize the impact on the first moment of the distribution. In fact, $\mu_{d,d'} = \int_0^1 \big(Q_{Y_d}(\tau) - Q_{Y_{d'}}(\tau)\big)\, d\tau$.

A main obstacle to learning about the QTE is the sample selectivity or endogeneity.

Early formulations of QTE by Lehmann (1974) and Doksum (1974) axiomatically interpret QTE as a measure of interaction of the latent ability $\tau$ ("prone to die at an early age," "prone to learn fast," etc.) and the treatment. The subjects differ in this latent characteristic and their response to the treatment is described by QTE. An assumption that allows such interpretation is rank invariance. Rank invariance was also used by Heckman and Smith (1997) and Koenker and Bilias (2001) in quantile models without endogeneity.⁷

⁷ Heckman and Smith (1997) use rank invariance to identify $Q_{Y_d - Y_{d'}}(\tau) = Q_{Y_d}(\tau) - Q_{Y_{d'}}(\tau)$.

Our model uses similarity as a main restriction that allows us to address endogeneity. Similarity facilitates analogous interpretation of the quantile treatment effects in our framework and incorporates rank invariance as a special case.

2.2 The Instrumental Quantile Treatment Model.

The first part of the model is a potential outcomes model. The other part relates the treatment choice to the potential outcomes, accounting for endogeneity.

Assumption 1 (IQT Model) For almost every value of $(X,Z) = (x,z)$,

A1 Potential Outcomes. Given $X = x$, for some $U_d \sim U(0,1)$, $Y_d = q(d, X, U_d)$, such that $q(d,x,\tau)$ is the $\tau$-th quantile of $Y_d$ for any $0 < \tau < 1$.

A2 Selection. For unknown function $\delta$ and random process $V$, given $X = x$, $Z = z$, $D_z = \delta(z,x,V)$.

A3 Independence. Given $X$, $\{U_d\}$ is independent of $Z$.

A4 Similarity. For each $d$ and $d'$, given $(V, X, Z)$, $U_d$ is equal in distribution to $U_{d'}$.

A5 Observed variables $W$ consist of $Y = q(D,X,U)$, $D = \delta(Z,X,V)$, $X$, $Z$, for $U = U_D$.

Of interest also is a much more restrictive special case of A3 and A4:

A3* Full Independence. $\{U_d, V\}$, or equivalently $\{Y_d, D_z\}$, are jointly independent of $Z$, given $X$.

A4* Rank Invariance. $U_d = U_{d'} = U$ for each $d$, $d'$.

Remark 2.1 In A1, the conditional $\tau$-quantile of $Y_d$ is $q(d,x,\tau)$, given $X = x$. Our main interest is the Conditional Quantile Treatment Effect

$$q(d,x,\tau) - q(d',x,\tau),$$

the difference in quantiles of potential outcomes distributions conditional on $x$.

In A2, the unobserved random vector $V$ is responsible for the difference in treatment choices $D_z$ across observationally identical individuals. $\delta(\cdot)$ is the (measurable) selection function. We do not impose any other assumptions on this function. This is important to accommodate realistic economic examples.

A3 states that potential outcomes are independent of $Z$, given $X$. A3 is more general than A3*, the assumption of selection-LATE models, which requires both $\{Y_d\}$ and the potential treatments $\{D_z\}$ to be independent of the instrument $Z$. A3* is a strong assumption that can be easily violated when $Z$ is measured with error or there are omitted variables related to the instrument (a part of the error $V$) in the selection equation. Imbens and Angrist (1994) provide additional examples violating A3*.

Given the same observed characteristic $x$ and treatment $d$, the subjects still differ in terms of potential outcomes. Their relative ranking is determined by the rank or ability vector $\{U_d\}$. This vector can be collapsed to a single variable under assumption A4*, the rank invariance or common error assumption. A4, similarity, states that given the information $(V,Z,X)$ the expectation of (any function of) $U_d$ does not vary across the treatment states $d$. In other words, ex-ante the ranks are "similar," while ex-post the ranks may differ. Thus similarity allows substantial ex-post slippage in the ranks, the importance of allowing which was shown by Heckman and Smith (1997). See Example 3.2 for an example. A4 also facilitates interpretation of the QTE as a measure of the interaction between the latent ex-ante ability $\tau$ and the treatment, following Doksum (1974) and Koenker and Bilias (2001). Additionally, A4 is a key identification device, leading to the Wald restrictions in Section 4.
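To make Assumption 1 concrete, the following is a minimal simulation sketch, not part of the original paper. It assumes a hypothetical binary-treatment design: the quantile function `q`, the selection rule, and all numeric constants are illustrative choices. The ranks $U_0$ and $U_1$ share an ex-ante component `eta` and have independent slippage (so A4 holds but A4* does not), and the instrument is correlated with a component `omega` of $V$ (so A3 holds but A3* does not). Under these assumptions the share of observations with $Y \le q(D,\tau)$ should be close to $\tau$ within each instrument group, as in equation (1), but not within each treatment group.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def q(d, tau):
    """Hypothetical quantile function; the QTE q(1,tau) - q(0,tau) equals 1 + tau."""
    return tau if d == 0 else 1.0 + 2.0 * tau


def simulate_iqt(n, seed=0):
    """Draw (Y, D, Z) from a toy design satisfying A1-A5 but neither A3* nor A4*."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)    # ex-ante ability, a component of V
        omega = rng.gauss(0.0, 1.0)  # omitted variable, a component of V
        z = 1 if omega + rng.gauss(0.0, 1.0) > 0 else 0  # Z is correlated with V
        # Ranks U_0, U_1: common eta plus independent slippage, each uniform on (0,1).
        u = [normal_cdf((eta + rng.gauss(0.0, 1.0)) / math.sqrt(2.0)) for _ in range(2)]
        d = 1 if 0.8 * z + eta + omega > 0.5 else 0      # selection D = delta(Z, V)
        sample.append((q(d, u[d]), d, z))
    return sample


def share_below(sample, tau, group_index, group_value):
    """Share of a subgroup with Y <= q(D, tau); group_index is 1 for D, 2 for Z."""
    hits = [y <= q(d, tau) for (y, d, z) in sample if (y, d, z)[group_index] == group_value]
    return sum(hits) / len(hits)


sample = simulate_iqt(40000)
tau = 0.5
for z in (0, 1):
    print("share with Y <= q(D,tau) given Z=%d: %.3f" % (z, share_below(sample, tau, 2, z)))
for d in (0, 1):
    print("share with Y <= q(D,tau) given D=%d: %.3f" % (d, share_below(sample, tau, 1, d)))
```

Grouping by $D$ instead of $Z$ gives shares that move away from $\tau$ in this design, which is the endogeneity that rules out conventional quantile regression.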
Similarity A4 is the main restriction of the IQT model. It is absent in the conventional LATE/selection models. However, A4 enables a more general selection function in A2 that requires neither the monotonicity assumption nor the stronger independence assumptions of the LATE models. Thus, LATE models mainly exploit monotonicity and a stronger independence assumption to address endogeneity, while the present approach uses the similarity assumption. The value of one versus the other has to be judged in each particular application.

2.3 A Comparison with a LATE Model with Common Error.

Although the IQT model differs from the selection-LATE model, the two do contain a large common subclass. Indeed, consider the following model:

V1 $Y_d = g(\mu_d(X), U)$, $d \in \{0,1\}$.

V2 $D_z = 1(\vartheta(z,X) > V)$ for some real-valued function $\vartheta$.

V3 $\{Y_d, V\}$ (or $\{Y_d, D_z\}$) are independent of $Z$, given $X$.

V4 $U$ does not vary across potential treatments $d$.

Assume that $g$ is monotone in $U$, so that the error $U$ can be normalized to be uniform. This model is a special case of the IQT model, with assumption V4 corresponding to the exact rank invariance or common error assumption A4* (see Doksum (1974), Robins and Tsiatis (1991), Heckman and Smith (1997), and Vytlacil (2000) for various justifications of rank invariance), and V3 being a stronger version of the independence assumption A3, in fact corresponding to A3*. Vytlacil (2000) shows such a model incorporates a wide variety of familiar nonlinear simultaneous equations models. In turn, the IQT model incorporates the model V1-V4 as an important special case.

3 Economic Examples

The following examples highlight the nature of the IQT model. The discussion is quite thorough because it underlies the empirical applications in Section 6.

Example 3.1 (Demand with Non-Separable Error) The following is a generalization of the classic supply-demand example. Consider the "random coefficient" model

$$\begin{aligned} &\text{i.} & Y_p &= q(p,U), \\ &\text{ii.} & \tilde{Y}_p &= \rho(p,z,\mathcal{U}), \\ &\text{iii.} & P &\in \{p : \rho(p,Z,\mathcal{U}) = q(p,U)\}. \end{aligned} \tag{2}$$

The map $p \mapsto Y_p$ is the random demand function, that is, it is the potential demand when the price is set (externally) to the value $p$. Likewise, $p \mapsto \tilde{Y}_p$ is the random supply function, that is, the potential supply when the price is set (externally) to $p$. Additionally, $Y_p$ and $\tilde{Y}_p$, $q(\cdot)$, and $\rho(\cdot)$ depend on the covariates $X$, but this dependence is suppressed.

Random variable $U$ is the level of the demand in the sense that $q(p,U) < q(p,U')$ when $U < U'$. Demand is maximal when $U = 1$ and minimal when $U = 0$, holding $p$ fixed. Likewise, $\mathcal{U}$ is the level of supply. The $\tau$-quantile of the demand curve $p \mapsto Y_p$ is given by

$$p \mapsto Q_{Y_p}(\tau) = q(p,\tau).$$

Thus with probability $\tau$, the curve $p \mapsto Y_p$ lies below the curve $p \mapsto Q_{Y_p}(\tau)$.

The quantile treatment effect is characterized by an elasticity $\partial \ln q(p,\tau)/\partial \ln p$. The elasticity depends on the state of the demand $\tau$ (low or high) and may vary with $\tau$. For example, this variation could arise when the number of buyers varies and aggregation induces non-constant elasticity across the demand levels as a process of summation of individual demand curves, holding the price fixed.

This model incorporates many traditional models with separable error

$$Y_p = q(p) + \varepsilon, \quad \text{where } \varepsilon = F^{-1}(U). \tag{3}$$

The model i. is much more general in that the price can affect the entire distribution of the demand curve, while in (3) it only affects the location of the distribution of the demand curve.

Condition iii. is the equilibrium condition that generates endogeneity: the selection by the market of the actual price depends on the potential demand and supply outcomes i. and ii. As a result $P = \delta(Z,V)$, where $V$ consists of $U$, $\mathcal{U}$, and other variables (including "sunspot" variables, if the equilibrium price is not unique). Thus what we observe can be written as simultaneous equations of a general form,⁸

$$Y = q(P,U), \quad P = \delta(Z,V).$$
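The market example can be sketched numerically. The block below is an illustration under assumed functional forms, not the paper's specification: `demand` plays the role of $q(p,u)$ in logs, with an elasticity $-(0.5 + \tau)$ that varies with the demand level, the supply curve `1 + p + 0.5*z + e` and all constants are hypothetical, and the supply shifter `z` is drawn independently of the demand level `u`. The equilibrium price depends on `u`, so price is endogenous, while `z` shifts supply only.

```python
import random


def demand(p, u):
    """Hypothetical log-demand q(p, u); strictly increasing in u for log-price p < 1."""
    return 2.0 + u - (0.5 + u) * p


def equilibrium_price(u, z, e):
    """Log-price solving demand(p, u) = supply, with supply = 1 + p + 0.5*z + e."""
    return (1.0 + u - 0.5 * z - e) / (1.5 + u)


def simulate_market(n, seed=0):
    """Draw equilibrium (quantity, price, instrument) triples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.random()         # demand level U ~ U(0,1)
        z = rng.randint(0, 1)    # supply shifter (e.g. weather), independent of U
        e = 0.5 * rng.random()   # supply shock in [0, 0.5), keeps the price in [0, 1)
        p = equilibrium_price(u, z, e)
        data.append((demand(p, u), p, z))
    return data


def share_below_curve(data, tau, keep):
    """Share of selected markets with Y <= q(P, tau)."""
    hits = [y <= demand(p, tau) for (y, p, z) in data if keep(y, p, z)]
    return sum(hits) / len(hits)


data = simulate_market(40000)
tau = 0.5
median_price = sorted(p for (_, p, _) in data)[len(data) // 2]
for z0 in (0, 1):
    share = share_below_curve(data, tau, lambda y, p, z, z0=z0: z == z0)
    print("instrument Z=%d: share below tau-demand curve = %.3f" % (z0, share))
print("high-price markets: %.3f" % share_below_curve(data, tau, lambda y, p, z: p > median_price))
print("low-price markets:  %.3f" % share_below_curve(data, tau, lambda y, p, z: p <= median_price))
```

In this design the share of markets lying below the $\tau$-demand curve is close to $\tau$ within each instrument group, but differs markedly between high-price and low-price markets, which is why conditioning on the observed price does not recover $q(p,\tau)$.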
⁸ To appreciate the generality, note that the model incorporates, for example, the simultaneous equations model of Imbens and Newey (2001), who assume that $V$ is univariate, $\delta$ is monotone in $V$, and both $V$ and $U$ are independent of $Z$, if in addition we assume $U$ is uniform. Imbens and Newey (2001) developed some ingenious identification results using these stronger assumptions.

Because of endogeneity, $Q_{Y|P}(\tau) \ne q(P,\tau)$; therefore the conventional quantile regression will be inappropriate to estimate the $\tau$-th quantile demand curve. Additionally, we show in Appendix B that 2SQR is generally not suitable for estimation purposes. We show that instrumental variables $Z$, like weather conditions, that shift the supply curve and do not affect the level of the demand curve $U$ allow identification of the $\tau$-quantile of the demand function, $p \mapsto q(p,\tau)$.

Furthermore, the IQT model allows arbitrary correlation between $Z$ and $V$. This allows, for example, measurement error in $Z$ (e.g. in weather conditions). The standard IV approaches (Heckman et al (2001), Imbens and Angrist (1994)) do not accommodate such a possibility.

Example 3.2 (Education/Training Returns) Let "earnings" in the "education" states $d \in \{0,1\}$ be determined by a "random coefficients" model

$$Y_1 = q_1(X,U_1), \quad Y_0 = q_0(X,U_0).$$

An individual's training or education decision is given by

$$D = 1(\varphi(Z,X,V) > 0),$$

where the unobserved vector $V$ potentially depends on (but is not necessarily determined by) the ability vector $(U_1,U_0)$ and functions $q_1$ and $q_0$, and arbitrarily on $X$ and $Z$. The first kind of dependence is endogeneity.

In the standard Roy model, no restrictions are placed on the individual specific variations in earnings, and the individual observes these before making the schooling choice. For identification, we impose similarity: conditional on $(Z,X,V)$, $U_1$ equals in distribution to $U_0$. This is more restrictive than the general case, but perhaps not as restrictive as it may appear. This restriction allows arbitrary correlation between $Y_0$ and $Y_1$ and allows general treatment impacts through the $q_1(\cdot)$ and $q_0(\cdot)$ functions.

A main difference between this model and the Roy model is the implicit ex ante nature of the decision process. Instead of knowing the exact outcomes in any state of the world, the subject anticipates the same distribution of ability across treatment states and makes the decision accordingly.⁹ Indeed, consider a simple example that satisfies similarity A4:

$$U_0 = \eta + \nu_0, \quad U_1 = \eta + \nu_1,$$

where $\eta$ is a function of the error vector $V$ in the selection equation, and $\nu_0$ and $\nu_1$ are the slippage terms such that $\nu_0 \sim \nu_1$ given $(X,Z,V)$. Rank invariance A4* is a degenerate case when $\nu_1 = \nu_0 = 0$.

⁹ More precisely, we assume that he has enough information only to anticipate the same distribution of ability across states. The assumption does not require the subject to have correct beliefs.

Finally note that the similarity only need hold conditional on $X$, $Z$, and $V$. This seems to be a reasonable framework. For example, people generally decide on whether to attend college or not before they observe their rank/ability among college educated and non-college educated individuals with observationally identical characteristics. Thus, it seems a plausible approximation that they would anticipate the same distribution of their rank/ability across the treatment states relative to similar individuals (with the same covariates $X$ and $Z$).

Another difference with the conventional IV model is that the IQT model expressly allows for dependence to exist between the instrument $Z$ and $V$, whereas the standard approaches expressly disallow this, as mentioned in the previous example. E.g., consider the following simple schooling decision rule

$$D = 1\{\varphi(Z) + V > 0\}.$$
In the schooling or training context, if $Z$ is a family background, it may be measured with a sizable error, so independence between $Z$ and $V$ need not hold. $V$ could also capture omitted variables which are correlated with $Z$ and impact the schooling decision but not the outcome. Note that measurement error or omitted variables also violate the monotonicity assumption often used in the IV literature. See Imbens and Angrist (1994) for other examples of violation.

To summarize, three aspects of the proposed model are highlighted by the above examples. First, the IQT model allows arbitrarily general quantile treatment effects. The similarity assumption in no way restricts their shape. Second, under similarity, we can interpret the QTE as measuring the interaction between the latent ex-ante ability and the treatment, following Doksum (1974) and Koenker and Bilias (2001). The similarity seems reasonable in many settings. Third, the similarity allows the selection in A2-A3 to be more general than that in the popular IV approaches, although this should be taken as a subsidiary point.

4 Wald IV and Inverse Quantile Regression

Here we establish a link between the IQT model and the Wald-type IV restrictions, relate those to Koenker and Bassett's (1978) quantile regression, and show that the model is identified without functional form assumptions.

4.1 Main Identification Restriction

The following theorem provides an important link of the parameters of the IQT model to the Wald-type IV estimating equations.

Theorem 1 Suppose A1-A5 hold, and given $X$, $Z$,

i. if $Y$ is continuously distributed ($q(D,X,\tau)$ is strictly increasing in $\tau$ a.s.) then

$$P[Y \le q(D,X,\tau) \mid X,Z] = \tau, \quad P[Y < q(D,X,\tau) \mid X,Z] = \tau, \quad \text{a.s.} \tag{5}$$

ii. otherwise ($q(D,X,\tau)$ is non-decreasing in $\tau$, a.s.),

$$P[Y \le q(D,X,\tau) \mid X,Z] \ge \tau, \quad P[Y < q(D,X,\tau) \mid X,Z] \le \tau, \quad \text{a.s.} \tag{6}$$

with the last inequality being strict if $q(D,X,\tau) = q(D,X,\tau')$ for some $\tau' > \tau$ with probability $P > 0$ given $X$ and $Z$.

By linking the IQT model to Wald's IV quantile restrictions, Theorem 1 provides an empirical and causal content to these restrictions. In this regard, the IQT model serves the same purpose as the LATE model developed by Imbens and Angrist (1994) to provide the link between the Wald's IV approach and the (local) average treatment effects. However, our results employ the similarity assumption in place of the monotonicity assumptions to obtain this link.

As noted, Theorem 1 allows for an arbitrary variable $Y$, for arbitrary treatment variable $D$, and arbitrary instrument $Z$. Thus equations (5) and (6) lead to natural ways to estimate any model with endogeneity as long as the corresponding quantiles of the potential outcome distribution $q(d,x,\tau)$ may be specified. We focus on the continuous $Y$, but the discrete case is clearly relevant; see e.g. Manski (1985), Horowitz (1992), Powell (1986), and Hong and Tamer (2001).

Before proceeding further, it is very important to note that although the IQT model allows the use of a "conditioning on $Z$" strategy to estimate the quantile treatment effects, it is not possible to use the same "conditioning on $Z$" strategy to estimate other treatment effects of interest. For example, in order to estimate the average treatment effect within the IQT model, we need to estimate the quantile treatment effects first and then integrate them over the quantile index $\tau$. Conventional linear IV will not work here. This feature is analogous to that in the selection-LATE models (Heckman 1990).

Example 4.1 (Average Treatment Effects: Failure of 2SLS) Within A1-A5, suppose $\mu(d)$ is finite in the equation

$$Y_d = \mu(d) + \epsilon_d, \quad E\epsilon_d = 0,$$

where $\mu(d)$ is the mean treatment function. It would be natural to expect that $E[Y - \mu(D) \mid Z] = 0$, but this is false since generally

$$E[q(D,U) - \mu(D) \mid Z] \ne 0,$$

because $U$ is not independent of $D$ conditional on $Z$ in general, so that

$$E[Y \mid Z] = E[q(D,U) \mid Z] = \int\!\!\int_{[0,1]} q(d,u)\, dP[D = d, U = u \mid Z] \ne \int\!\!\int_{[0,1]} q(d,u)\, dP[D = d \mid Z]\, dP[U = u \mid Z] = E[\mu(D) \mid Z].$$

The equality holds if there is no endogeneity or the treatment effect is constant.

4.2 The Inverse Quantile Regression

The main identification restriction of Theorem 1 can be posed as an optimization problem, which we call the inverse quantile regression for its "inverse" relation to the (conventional) quantile regression of Koenker and Bassett (1978). This links the Wald IV restrictions, the IQT model, and quantile regression together.

In order to obtain the link, we note that Theorem 1 states that $0$ is the $\tau$-th quantile of the random variable $Y - q(X,D,\tau)$ conditional on $(X,Z)$. Therefore, the problem of finding a function $q(x,d,\tau)$ satisfying equations (5) or (6) is the problem of the inverse quantile regression: Find a function $q(x,d,\tau)$ such that $0$ is the solution to the quantile regression problem, in which we regress $Y - q(X,D,\tau)$ on any function of $(Z,X)$. Theorem 2 formally states this result.

Theorem 2 For P-a.e. value $(x,z)$ of $(X,Z)$, the following are equivalent statements, for each measurable $q'$:

1. $q'$ satisfies equation (5) or (6) (in place of $q$).

2. $0 = Q_{\epsilon \mid x,z}(\tau)$, where $\epsilon = Y - q'(D,X,\tau)$.

3. assuming integrability, $q'$ satisfies $0 = \operatorname{argminl}_{v \in \mathbb{R}} E[\rho_\tau(Y - q'(x,d) - v) \mid x,z]$, where $\rho_\tau(u) = \tau u^+ + (1-\tau) u^-$.

4. $q'$ is an $\arg\min_{\varphi} |v_\varphi(x,z)|$, where the minimum is computed over all candidate (measurable) functions $\varphi$, and, assuming integrability, $v_\varphi(x,z) = \operatorname{argminl}_{v \in \mathbb{R}} E[\rho_\tau(Y - \varphi(x,d) - v) \mid x,z]$.

Remark 4.1 Integrability conditions can be removed by subtracting $\rho_\tau(Y - q'(x,d) - v')$, where $v'$ is a fixed number, inside the expectation. The "argminl" above means "$\lim_{\tau' \uparrow \tau} \arg\min$," and is a pure technicality, ensuring uniqueness of the solution.
It is needed only for non-continuous $Y$ and at most countably many values of $\tau \in (0,1)$.

Theorem 2 applies to continuous, discrete, or mixed outcomes, so estimation based on Theorem 2 can be applied to such data. Theorem 2 is both interpretive and constructive. First, any consistent estimator asymptotically solves the inverse quantile regression problem. Second, Theorem 2 (part 4) suggests a way to construct practical estimators (in addition to obvious method of moments or minimum distance methods based on equations (5) and (6)).

4.3 Conditions for (Global) Identification

Here we show that we do not need functional form assumptions to identify QTE as long as we have a reasonable instrument. We focus on the case of binary $D$, while the appendix contains generalizations. The following analysis is all conditional on $X = x$, but we suppress this for ease of notation.

Define $\mathcal{L}(x)$ as the convex hull of the set of functions $\varphi(\cdot,\tau)$ mapping $d$ from $\{0,1\}$ to $(y : f_Y(y \mid d) > 0)$ such that $P(Y \le \varphi(D,\tau) \mid Z)$ belongs to $[\tau - \delta, \tau + \delta]$ a.s. for $\delta > 0$. Define the following function

$$\Pi_z(\varphi,x) = \big(P[Y \le \varphi(D) \mid z_1],\; P[Y \le \varphi(D) \mid z_2]\big),$$

where $z = (z_j, j = 1,2)$. Assuming relevant smoothness, define

$$J_z(\varphi,x) = \frac{\partial}{\partial \varphi}\Pi_z(\varphi) = \begin{pmatrix} f_Y(\varphi(0) \mid D=0,z_1)P[D=0 \mid z_1] & f_Y(\varphi(1) \mid D=1,z_1)P[D=1 \mid z_1] \\ f_Y(\varphi(0) \mid D=0,z_2)P[D=0 \mid z_2] & f_Y(\varphi(1) \mid D=1,z_2)P[D=1 \mid z_2] \end{pmatrix} = \begin{pmatrix} f_{Y,D}(\varphi(0),0 \mid z_1) & f_{Y,D}(\varphi(1),1 \mid z_1) \\ f_{Y,D}(\varphi(0),0 \mid z_2) & f_{Y,D}(\varphi(1),1 \mid z_2) \end{pmatrix}.$$

We will say that rank $J_Z(\varphi,x)$ is full w. pr. $> 0$ if the rank of $J_z(\varphi,x)$ is full with positive probability for $z = (Z_1,Z_2)$, where $Z_1$ and $Z_2$ are independent replica of $Z$, given $X = x$.

Theorem 3 Suppose A1-A5 hold, and that $f_Y(y \mid d,z,x) > 0$ and is finite over the relevant range. Then $d \mapsto q(d,x,\tau)$ is a unique solution of $P[Y \le q(D,x,\tau) \mid x,z] = \tau$ for P-a.e. $z$, given $X = x$, among $\mathcal{L}(x)$, if for any $\varphi \in \mathcal{L}(x)$, $J_Z(\varphi,x)$ is finite and has full rank w. pr. $> 0$.

These conditions are akin to the identification of average treatment effects in Abadie (2001) or Das (2001). The difference is in weighting by a density. The condition is easy to verify in many applications. For example, suppose $Z \in \{0,1\}$, as in the JTPA example discussed in Section 6. Then $\det J_z \ne 0$ is equivalent to a nonconstant likelihood ratio property:

$$\frac{f_{Y,D}(\varphi(0),0 \mid Z=1)}{f_{Y,D}(\varphi(0),0 \mid Z=0)} \ne \frac{f_{Y,D}(\varphi(1),1 \mid Z=1)}{f_{Y,D}(\varphi(1),1 \mid Z=0)}, \quad \text{for any } \varphi \in \mathcal{L}(x). \tag{7}$$

The instrument $Z$ should impact the joint distribution of $Y$ and $D$ at all relevant points. In the JTPA data $P[D = 1 \mid Z = 0] = 0$, which means $f_{Y,D}(y,1 \mid Z = 0) = 0$ for any $y$, so the condition is always true as long as the left-hand-side is finite. In other cases, the condition is simply plausible.

5 Estimation

In this paper, it is natural to focus on estimating the basic linear model, which covers a wide area of applications. In this model a conditional $\tau$-quantile of the potential outcome is given by (or approximated by)

$$Q_{Y_d \mid x}(\tau) = d'\alpha_\tau + x'\beta_\tau, \tag{8}$$

where $d$ is an $l \times 1$ vector of treatment variables (possibly interacted with covariates) and $x$ is a $k \times 1$ vector of (transformations of) covariates. This model is a specialization of A1, and is a foundation of quantile regression research (see e.g. Koenker and Hallock (2000) and Buchinsky (1998) for reviews).

Using Theorem 2 we offer the inverse quantile regression estimator as a finite-sample analog of the inverse quantile regression in the population. In the appendix, for completeness and comparisons, we also provide the results for the generalized empirical likelihood estimators. The presented estimator is perhaps the only practical estimator that can be applied to reasonably general cases. Other strategies such as method of moments or empirical likelihood are typically infeasible,¹⁰ as explained below.

To state the idea clearly, first suppose we have no covariates or simply treat covariates as the part of vector $d$ above.
In this case, a simple analog of the population inverse quantile regression is as follows: Find $\hat\alpha$ by minimizing a norm of $\hat\gamma[\alpha]$ over $\alpha$ subject to $\hat\gamma[\alpha]$ solving the quantile regression of $Y - D'\alpha$ on $Z$:

$$\hat\gamma[\alpha] = \arg\min_{\gamma} \frac{1}{n}\sum_{i=1}^n \rho_\tau(Y_i - D_i'\alpha - Z_i'\gamma).$$

Now suppose we have covariates $X_i$. Then the procedure can be modified as follows: Find $\hat\alpha$ by minimizing a norm of $\hat\gamma[\alpha]$ over $\alpha$ subject to $(\hat\gamma[\alpha], \hat\beta[\alpha])$ solving the quantile regression of $Y - D'\alpha$ on $Z$ and $X$:

$$(\hat\gamma[\alpha], \hat\beta[\alpha]) = \arg\min_{\gamma, \beta} \frac{1}{n}\sum_{i=1}^n \rho_\tau(Y_i - D_i'\alpha - X_i'\beta - Z_i'\gamma).$$

The estimate of $\beta$ can be obtained as a usual quantile regression of $Y - D'\hat\alpha$ on $X$.

In order to improve efficiency, and allow for estimated instruments, we allow the observations to be weighted differently. Define the weighted quantile regression objective function:

$$Q_n(\alpha, \beta, \gamma) = \frac{1}{n}\sum_{i=1}^n \rho_\tau(Y_i - D_i'\alpha - X_i'\beta - \hat\Phi_i'\gamma)\,\hat V_i,$$

where
$\Phi_i = \Phi(X_i, Z_i)$, where $\Phi$ is a smooth $r \times 1$ vector function of instruments,
$\hat\Phi_i = \hat\Phi(X_i, Z_i)$, where $\hat\Phi$ is a smooth consistent estimate of $\Phi$, satisfying R5,
$V_i = V(X_i, Z_i) > 0$, where $V$ is a smooth weight function,
$\hat V_i = \hat V(X_i, Z_i) > 0$, where $\hat V$ is a smooth consistent estimate of $V$, satisfying R5.

Note that one may simply set $\hat\Phi_i = Z_i$ or $\hat V_i = 1$, which will give us the simpler versions above. Efficient estimation is described in Corollary 1. We can use a wide variety of nonparametric estimators and parametric approximations of $V$ and $\Phi$, satisfying a standard smoothness condition, stated as a technical assumption R5 in appendix G. We also assume $(\alpha_\tau, \beta_\tau)$ belongs to a compact set $\mathcal{A} \times \mathcal{B}$. Other technical conditions are stated as assumptions R1-R5 in the appendix.

10 Note, however, that EL has many good properties and purportedly performs well in finite samples. A possible feasible approach is as follows. In the first stage, IQR estimates of the parameters could be obtained. Then, in the second stage, the estimates could be recomputed using EL, limiting the domain to a neighborhood around the estimates obtained in the first stage.
Most of them are standard in the quantile regression literature.

Now let's formally define the estimation procedure as follows:

$$\hat\alpha = \arg\inf_{\alpha \in \mathcal{A}} \hat\gamma[\alpha]'\,\hat A\,\hat\gamma[\alpha], \qquad (9)$$

such that

$$(\hat\beta[\alpha], \hat\gamma[\alpha]) = \arg\inf_{(\beta,\gamma) \in \mathcal{B} \times \mathcal{G}} Q_n(\alpha, \beta, \gamma), \qquad (10)$$

where $\mathcal{G} = [-\delta, \delta]^r$ for $\delta > 0$ and $\hat A \to_p A$, where $A$ is a positive definite matrix. A final estimate of $\beta_\tau$ is obtained as

$$\hat\beta = \arg\inf_{\beta \in \mathcal{B}} Q_n(\hat\alpha, \beta, 0). \qquad (11)$$

Equations (9)-(10) are a finite sample inverse or instrumental quantile regression (IQR). (10) is the quantile regression step, and (9) is the "inverse" step.

This formulation allows one to effectively reduce the dimensionality of a potentially difficult optimization problem to the dimension of $\alpha$. In GMM, the objective function is highly multi-modal and has zero derivative almost everywhere, implying the need to perform a grid search over a subset of $\mathbb{R}^K$ where $K = \dim(x) + \dim(\alpha)$ (e.g. in Example 2 of the next section $\dim(x) + \dim(\alpha) = 16$). Such an estimator is infeasible, except perhaps when $\dim(x) + \dim(\alpha) = 2$ or 3. In contrast, a simple implementation of inverse quantile regression would require only a grid search over a subset of $\mathbb{R}^{\dim(\alpha)}$. The quantile regression steps are solved as fast as OLS by interior point methods combined with preprocessing, see Portnoy and Koenker (1997).

The computations may be improved further by employing parametric programming. In this approach the quantile regression in (10) is initially solved for some $\alpha_0$, then one solves for $\hat\beta[\alpha]$ and $\hat\gamma[\alpha]$ for nearby $\alpha$ using a standard sensitivity analysis.

We now turn to the theoretical properties of the estimator. In the appendix we also study the properties of the generalized empirical likelihood estimators.
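Before the asymptotic results, the two steps in (9)-(10) can be illustrated with a minimal sketch. It assumes a scalar treatment, a binary instrument, no covariates, and unit weights, so that the quantile regression of $Y - D\alpha$ on $(1, Z)$ is saturated and reduces to sample quantiles within each instrument cell. The function names, the grid, and the simulated design (rank variable `u`, true median effect 1.5) are hypothetical choices made for illustration only; this is not the authors' R or Matlab program.

```python
import math
import random


def sample_quantile(values, tau):
    """Return a sample tau-quantile, which minimizes the check-function loss."""
    ordered = sorted(values)
    k = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[k]


def gamma_hat(alpha, y, d, z, tau):
    """Quantile regression step for a binary instrument and no covariates.

    Regressing Y - D*alpha on (1, Z) is a saturated model, so the fitted
    coefficient on Z is the gap between the two within-cell tau-quantiles.
    """
    resid0 = [yi - di * alpha for yi, di, zi in zip(y, d, z) if zi == 0]
    resid1 = [yi - di * alpha for yi, di, zi in zip(y, d, z) if zi == 1]
    return sample_quantile(resid1, tau) - sample_quantile(resid0, tau)


def iqr_grid(y, d, z, tau, grid):
    """Inverse step: pick the alpha on the grid whose gamma is closest to zero."""
    return min(grid, key=lambda a: abs(gamma_hat(a, y, d, z, tau)))


def simulate(n, seed=0):
    """Hypothetical design: Y_d = d*(1 + U) + 2*U, so the tau-QTE is 1 + tau."""
    rng = random.Random(seed)
    y, d, z = [], [], []
    for _ in range(n):
        u = rng.random()                      # rank variable, U(0,1)
        zi = 1 if rng.random() < 0.5 else 0   # instrument, independent of u
        v = rng.random()                      # selection noise
        di = 1 if u + zi - v > 0.7 else 0     # treatment depends on u: endogenous
        y.append(di * (1.0 + u) + 2.0 * u)
        d.append(di)
        z.append(zi)
    return y, d, z


y, d, z = simulate(5000)
grid = [i * 0.02 for i in range(151)]  # candidate alpha values in [0, 3]
alpha_hat = iqr_grid(y, d, z, 0.5, grid)
print("IQR estimate of the median treatment effect:", round(alpha_hat, 2))
```

With covariates or a continuous instrument, the within-cell quantiles would be replaced by a genuine quantile regression solver, and the absolute value by the quadratic form in (9).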
Theorem 4 Under assumptions R1-R6 listed in the appendix

$$\sqrt{n}(\hat\alpha - \alpha_\tau) \to_d K\,N(0, S), \qquad \sqrt{n}(\hat\beta - \beta_\tau) \to_d L\,N(0, S),$$

where convergence is joint, and $N(0, S)$ is a normal vector with mean 0 and variance $S = \tau(1-\tau)E\Psi\Psi'$, where $K = (J_\alpha' H J_\alpha)^{-1} J_\alpha' H$, $H = \bar J_\gamma' A \bar J_\gamma$, $L = J_\beta^{-1}[I_k\ 0]M$, $M = I - J_\alpha K$, $\Psi = V \cdot [X'\ \Phi']'$, $J_\alpha = E[f_\epsilon(0|X,D,Z)\Psi D']$ and $J_\beta = E[f_\epsilon(0|X,Z)XX']$, where $\epsilon = Y - D'\alpha_\tau - X'\beta_\tau$. Finally, $J_\theta = E[(1/V) f_\epsilon(0|X,Z)\Psi\Psi']$, where $[\bar J_\beta'\ \bar J_\gamma']'$ is the partition of $J_\theta^{-1}$ such that $\bar J_\beta$ is a $k \times (k+r)$ matrix and $\bar J_\gamma$ is an $r \times (k+r)$ matrix.

Corollary 1 Generally, when the number of instruments $\Phi$ equals that of endogenous regressors $D$, the joint asymptotic variance of $\hat\alpha$ and $\hat\beta$ has a simple form

$$J^{-1} S (J')^{-1}, \qquad (12)$$

for $J = E[f_\epsilon(0|X,D,Z)\Psi[D'\ X']]$. Choice of $A$ is irrelevant. Further, if $\Phi = \Phi^* = E[D v|Z,X]/V^*$, where $v = f_\epsilon(0|D,Z,X)$, and $V = V^* = f_\epsilon(0|X,Z)$, the asymptotic variance of $\hat\alpha$ and $\hat\beta$ simplifies to

$$\tau(1-\tau)E[\Psi^*\Psi^{*\prime}]^{-1}, \qquad (13)$$

where $\Psi^* = V^* \cdot [X'\ \Phi^{*\prime}]'$.

Corollary 2 If the number of instruments $\Phi$ is larger than the number of endogenous regressors $D$, choice of weighting matrix $A$ matters. An optimal choice of $A$ is given by $A = [\bar J_\gamma S \bar J_\gamma']^{-1}$, an $r \times r$ matrix. In this case the joint asymptotic variance of $\hat\alpha$ and $\hat\beta$ equals $N S N'$, where $N = (J' S^{-1} J)^{-1} J' S^{-1}$. If in addition $\Phi = \Phi^*$ and $V = V^*$, the joint variance equals (13).

(13) is the efficiency bound for the GMM estimators under conditional moment restrictions as in Theorem 1. This is the efficiency bound in the sense of Amemiya (1977), Chamberlain (1987), or Newey (1990). See also Newey and Powell (1990).

Corollary 1 suggests a reasonable approach to estimation and inference. First of all, in section 6, we used the simplest and most transparent strategy, projecting $D$ on $Z$ with OLS to form the instrument $\hat\Phi$, and setting $\hat V_i = 1$. We used methods described in Koenker (1994) to obtain the estimates of standard errors based on the simple formula (12). Powell (1986)'s methods also apply without modification.
Generally, we can use many established methods to either approximate or implement exactly the optimal procedure. We can estimate $f_\epsilon(0|\cdot)$ by the kernel methods described in Andrews (1994) or quantile regression differencing as in Koenker (1994), and $E[Dv|Z,X]$ can be estimated using series estimation (e.g. OLS of $Dv$ on $Z$, $X$ and their powers), as in Newey (1997) and Andrews and Whang (1990). Assumption R5 allows for a wide variety of nonparametric and parametric estimation procedures; Andrews (1994) discusses a number of them.

In practice, it is often reasonable to use parametric approximations, cf. Amemiya (1975). For example, we may use conditional normality for $f_\epsilon(\cdot|\cdot)$ to get an approximation of the standard errors and optimal weights in the quantile regressions above. On the other hand, $E[Dv|Z,X]$ can be approximated by polynomial functions in $Z$, $X$ and estimated by OLS. As long as approximation of the optimal procedure is accurate, the standard errors, based on (13) or on a more robust formula (12), will also be accurate.

When there is a compelling reason to use instruments $\Phi$ of dimension larger than that of $D$, Corollary 2 describes the choice of the weighting matrix $A$ that simplifies the asymptotic variance.

The documented computer programs in programming languages R (free software available from www.r-project.org) and Matlab that implement the estimation and inference are available from the authors. The programs implement both the optimal and sub-optimal instrument cases.

6 Empirical Applications

This section presents the empirical illustration to the economic models presented in section 3. The first example is a market demand model, and the second example is an evaluation of a job training program.

6.1 Demand for Fish

In this section, we present estimates of demand elasticities which may potentially vary with the level of demand, $\tau$. The data contain observations on price and quantity of fresh whiting sold in the Fulton fish market in New York over the five month period from December 2, 1991 to May 8, 1992. These data were used previously in Graddy (1995) to test for imperfect competition in the market and later in Angrist, Graddy, and Imbens (2000) to illustrate use of the conventional IV estimator as a weighted average of heterogeneous demands. The price and quantity data are aggregated by day, with the price measured as the average daily price for the dealer and the quantity as the total amount of fish sold that day. The data also contain information on the day of the week of each observation and variables indicating weather conditions at sea, which are used as instruments to identify the demand equation. The total sample consists of 111 observations for the days in which the market was open.

The demand function we estimate takes a standard Cobb-Douglas form:

$$Q_{\ln(Y_p)|X}(\tau) = \alpha_\tau \ln p + X'\beta_\tau,$$

where $Y_p$ is demand when price is $p$. The elasticity $\alpha_\tau$ varies across the quantiles $\tau$ of the demand level. Following discussion in section 3, this is a demand model with non-separable error and random elasticity.

The top two panels of Figure 1 provide the estimates of elasticities obtained by IQR of $\ln(Y)$ on $\ln(P)$ using wind speed as the instrumental variable, while the lower panels depict standard quantile regression (QR) estimates. The shaded region around the point estimates represents the 80 percent confidence interval. While the reported estimates are for a model without covariates, the estimated elasticities are not sensitive to the inclusion of dummy variables for the days of the week or other covariates.

The price effect on quantities sold, as estimated by QR, appears to be approximately constant across the entire range of quantiles. The magnitudes of the effects are also quite small, in all cases much less than unity. IQR estimates, on the other hand, range from -2 to -.5, with the median elasticity of -1, indicating variation of elasticities with the level of demand. Except at high quantiles, the IQR elasticities are uniformly greater in magnitude than the price effects predicted by QR. This is clearly shown in the demand curves plotted in Figure 2. Note that the interpretation of IQR and QR estimates is very different. IQR estimates a (causal) demand model, while QR estimates the conditional quantiles of the equilibrium quantity as a function of the equilibrium price.

The IQR estimates of the demand elasticities $\alpha_\tau$ illustrate heterogeneity across the demand levels. The results indicate that demand elasticity is quite high in magnitude at low quantiles, but is decreasing in the quantile index. While there are many possible explanations for this behavior, it does cast doubt on the hypothesis that the aggregate demand in this market is a sum of the demand curves of numerous identical price-taking agents who randomly arrive at the market. The estimates may also suggest that a single statistic may be insufficient to truly capture the demand function variety.

6.2 Evaluation of a JTPA Program

The impact of job training programs on the earnings of participants, especially those with low income, is of great interest to economists, but evaluating the causal effect of training programs on earnings is difficult due to the self-selection of treatment status. However, data available from a randomized training experiment conducted under the Job Training Partnership Act (JTPA) provide a mechanism for addressing this issue. In the experiment, people were randomly assigned the offer of JTPA training services, but because people were able to refuse to participate, the actual treatment receipt was self-selected. Of those offered treatment, only 60 percent participated in the training. There was also a small number of individuals from the control group who received training. The random assignment of the training offer provides a plausible instrument for a person's actual training status. Abadie et al. (2000) and Heckman and Smith (1997) provide detailed information regarding data collection procedures and institutional details of the JTPA. We limit the analysis to the adult males.

To capture the effects of training on earnings, we estimate a linear model:

$$Q_{Y_d|X}(\tau) = d\alpha_\tau + X'\beta_\tau,$$

where $d$ indicates training status and is instrumented for by assignment to the control group, the potential outcomes $Y_d$ are earnings, and $X$ is a vector of covariates. The data consist of 5,102 observations with data on earnings, training and assignment status, and other individual characteristics. Earnings are measured as total earnings over the 30 month period following the assignment into the treatment or control group. We also include dummies for black and Hispanic persons, a dummy indicating high-school graduates and GED holders, five age-group dummies, a marital status dummy, a dummy indicating whether the applicant worked 12 or more weeks in the 12 months prior to the assignment, a dummy signifying that earnings data are from a second follow-up survey, and dummies for the recommended service strategy.^11

Results for standard quantile regression are illustrated in Figure 4 and for the IQR estimates in Figure 3. The shaded region represents the 90 percent confidence interval for the point estimates. The first panel in each figure shows the estimated impact of the participation in the training program across various quantiles. A quick comparison of the two sets of results shows that the standard quantile regression estimates of the statistical impacts of training are well above the treatment effect. The quantile regression estimates are uniformly larger than the IQR estimates, and in many cases the difference is quite substantial. This difference is perhaps most important in the low to middle quantiles where the conventional quantile regression estimates indicate a relatively large statistical impact of training on the earnings of participants.

The differences in the standard QR and the IQR estimates, as well as the distributional impacts of the program, are made even more apparent when one considers the impact of training in percentage terms.^12 Quantile regression estimates indicate large percentage impacts, especially in the lower quantiles. The IQR estimates, on the other hand, indicate that the percentage causal impact of the training program is low, between 5 and 10 percent, and relatively constant along the whole distribution. This is interesting since the supposed intent of job training programs is to raise the incomes of low income individuals. However, we observe that the impacts were actually the greatest for the upper quantiles.

Coefficient estimates for several of the covariates are also included in the figures. None of the results are particularly surprising. Being Hispanic has no significant impact on potential earnings at any point in the distribution, while at medium and high quantiles, blacks earn significantly less than whites. We also see that education, as measured by high school graduation or having a GED, has a positive impact along almost the entire distribution, with the impact growing monotonically in the quantile index. This pattern is also observed for the marriage effect, which tapers off in the highest quantiles. The effect of having worked little in the previous year runs in almost exactly the opposite direction, impacting earnings negatively at all quantiles and decreasing earnings substantially in the upper tail of the distribution.

11 The recommended service strategy was broken into three categories: classroom training, on-the-job training and/or job search assistance, and other forms of training.
We next compare our results with those in Abadie et al. (2001). Since identification in the two models comes through different assumptions and the estimated treatment effects are for different populations (the Abadie et al's model is for the sub-population of LATE-compliers), the estimation results need not agree. However, the JTPA is an example where both sets of assumptions appear to hold. Independence and monotonicity are almost certainly satisfied, and it seems reasonable that, relative to others with similar characteristics, the similarity assumption is also fulfilled.^13 Under these conditions, the models overlap and the results should indeed be comparable if the subpopulation of LATE-compliers is representative of the entire population. This appears to be the case in the present example.

Lastly, consider the results of Heckman and Smith (1997). The model of Heckman and Smith (1997) did not incorporate endogeneity (it had a different point). Thus their results correspond to our QR results (fig 4), and differ from the IQR results (fig 3).

6.3 Numerical Performance

The objective functions for selected quantiles from Examples 1 and 2 are graphed in Figure 5. The upper three panels in the figure illustrate the objective functions from the fish example, while the lower panels correspond to the JTPA example. The objective functions are very well-behaved, especially in the JTPA example. Each of the objective functions from the fish example does have many local minima, which is attributable to the small sample size. However, in all cases, the functions have an obvious unique global minimum.

12 The percentage impact of training is calculated for both whites, Percentage Impact I, and blacks, Percentage Impact II. Percentages are calculated for married high-school graduates aged 30 to 35.

13 Note that conditioning on covariates weakens the required similarity condition, requiring that similarity only hold for people with the same covariate values.
7 Conclusion and Future Research

This paper offered two main contributions. First, it proposed a model of quantile treatment effects which allows for treatment endogeneity. The model exploits the similarity as a main identification restriction. The resulting model differs from both Heckman's nonparametric selection model and Imbens and Angrist's LATE model. From this model we derive a Wald IV estimating equation. We show that the model does not require functional form assumptions for identification. Second, we characterized the quantile treatment function as solving an inverse quantile regression problem and suggested its finite-sample analog as a practical estimator. This estimator, unlike generalized method-of-moments, can be easily computed by solving a series of conventional quantile regressions, and does not require grid searches over high-dimensional parameter sets. A properly weighted version of it is also efficient. We applied this estimator to characterize quantile treatment effects in a market demand model and evaluation of a job training program.

An important feature of the proposed model is that even though one may not be interested in quantile treatment effects, one may still have to estimate them. Indeed, the average treatment effects can not be estimated by conventional IV methods, as shown in example 5.1.^14 Instead, quantile treatment effects have to be estimated first and then integrated over the quantile index. Alternatively, one may estimate only the median treatment effects, using the proposed model and estimator.

In companion works, we consider a number of directions. In a joint work with Whitney Newey and Guido Imbens, we explore fully non-parametric estimation, which poses an interesting problem. Other research directions are also considered.
For example, an important research question is how to estimate policy-relevant treatment effects in an expected utility framework, given particular social loss functions, known program costs, and effects on choice probabilities (cf. Heckman et al (2001)).

14 Note that the local average treatment effect may still be identified without estimating the QTE.

Figure 1: Inverse Quantile Regression and Quantile Regression Results for the demand for fish data. [Panels: A. IQR: Treatment Effect; A. IQR: Intercept; B. QR: Price Effect; B. QR: Intercept. Horizontal axis: Quantile Index.] The quantile treatment effect, estimated by IQR, is the elasticity of the $\tau$-th quantile demand curve. It tends to be much higher than the "price effect" on the $\tau$-quantiles of quantities sold, estimated by QR.

Figure 2: [Panels: Demand Functions by Deciles; Cond'l Quantile Functions by Deciles. Horizontal axes: price and log-price.] Left Column: The demand curves estimated by IQR, indexed by the quantile index (.1, .2, .3, .5, .7, .8, and .9). The top display is in the original price space. The bottom display is in log-price-log-quantity space. Right Column: The estimated conditional quantile curves of fish quantity sold as a function of price. The top display is in the original space. The bottom display is in log-price-log-quantity space.

Figure 3: Inverse Quantile Regression results on JTPA data. [Panels: IQR: Treatment Effect; Percentage Impact I; Percentage Impact II; Black; Hispanic; HS Grad; Married; Worked < 13 wks; Intercept. Horizontal axis: Quantile Index.]

Figure 4: Quantile Regression results on JTPA data. [Panels: QR: Training; Percentage Impact I; Percentage Impact II; Black; Hispanic; HS Grad; Married; Worked < 13 wks; Intercept. Horizontal axis: Quantile Index.]

Figure 5: A. IQR objective functions for the fish example. B. IQR objective functions for the JTPA example. [Panels are labeled by quantile index. Horizontal axis: alpha.]

A Definitions and Lemmas

We use the following empirical processes in the sequel, for $W = (Y, D, X, Z)$:

$$f \mapsto E_n f(W) = \frac{1}{n}\sum_{i=1}^n f(W_i), \qquad f \mapsto G_n f(W) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \big(f(W_i) - Ef(W_i)\big).$$

For example, if $\hat f$ is an estimated function, $G_n \hat f(W)$ means $\frac{1}{\sqrt{n}}\sum_{i=1}^n (f(W_i) - Ef(W_i))\big|_{f = \hat f}$. Outer and inner probabilities, $P^*$ and $P_*$, are defined as in van der Vaart (1998). In this paper $\to_p$ means convergence in (outer) probability, and $\to_d$ means convergence in distribution.

We will say that the process $\{l \mapsto v_n(l), l \in \mathcal{L}\}$ is stochastically equi-continuous (s.e.) in $\ell^\infty(\mathcal{L})$ if for each $\epsilon > 0$ and $\eta > 0$, there is $\delta > 0$ s.t.

$$\limsup_{n \to \infty} P^*\Big(\sup_{\rho(l,l') < \delta} |v_n(l) - v_n(l')| > \eta\Big) < \epsilon,$$

for some pseudo-metric $\rho$ on $\mathcal{L}$, such that $(\mathcal{L}, \rho)$ is a totally bounded pseudo-metric space.

The following results are from Knight (1999). They allow general discontinuities in $\bar{\mathbb{R}}$-valued objective functions. Related literature is Rockafellar and Wets (1998).

Lemma A.1 (Geyer's Lemma) Suppose $\{Q_n\}$ is a sequence of lower-semi-continuous convex $\bar{\mathbb{R}}$-valued random functions, defined on $\mathbb{R}^d$, and let $\mathcal{D}$ be a countable dense subset of $\mathbb{R}^d$. If $Q_n$ converges to $Q_\infty$ in $\bar{\mathbb{R}}$ on $\mathcal{D}$ in finite dimensional sense, where $Q_\infty$ is lsc convex and finite on an open non-empty set a.s., then

$$\arg\inf_{z \in \mathbb{R}^d} Q_n(z) \to_d \arg\inf_{z \in \mathbb{R}^d} Q_\infty(z),$$

provided the latter is uniquely defined a.s. in $\mathbb{R}^d$.

Lemma A.2 (Approximate Argmins) Suppose
i. $Z_n$ is s.t. $Q_n(Z_n) \le \inf_{z \in \mathbb{R}^d} Q_n(z) + \epsilon_n$, $\epsilon_n \searrow 0$; $Z_n = O_p(1)$.
ii. $Z_\infty = \arg\min_{z \in \mathbb{R}^d} Q_\infty(z)$ is uniquely defined in $\mathbb{R}^d$ a.s.
iii. $Q_n(\cdot) \Rightarrow Q_\infty(\cdot)$ in $\ell^\infty(K)$ over any compacts $K$, where $Q_\infty$ is continuous.
Then $Z_n \to_d Z_\infty$.

B Two-Stage Quantile Regression: Inconsistency when QTE varies with $\tau$

The model proposed by Amemiya consists of two equations

$$\text{(i)}\ \ Y = D'\delta + X'\beta + U, \qquad \text{(ii)}\ \ D = Z'\gamma + V, \qquad (14)$$

where $D$ is an endogenous vector, i.e. $D$ depends on the real-valued $U$, $X$ is a vector of exogenous or predetermined variables, $U$ and $V$ are independent of $X$ and $Z$, and $U$ and $V$ are jointly symmetric and absolutely continuous.
s a.s., uniquely defined Q n () Qn{Zn < ) if for > v„(l')\ < 77) € is , bounded pseudo-metric space. totally They allow general discontinuities Rockafellar and is and Wets and R - (1998). a sequence of lower-semi-continuous conlet V> be a countable dense subset of dimensional sense, where Qoo is Isc R . convex and then — in a.s. Qn(z) >nf zeR d argmin z£R d Qoa(z) R arginf Qoo(z), . + in, in \ 0; Zn uniquely defined inM. is => <3oo() in t°°(K) over any compacts * d = Op (l). a.s. K, where Qoo is continuous. Then Zoo. Two-Stage Quantile Regression: Inconsistency when QTE varies with r The model proposed by Amemiya consists of (t) (ii) where We in distribution. A.2 (Approximate Argmins) Suppose Zn B is R T>, in finite arginf Q„(z) ii. is Suppose {Qn} vex Ik-valued random functions, defined on If — \v n (l) following results are from Knight (1999). Lemma convergence p(Z,l')<<5 some pseudo-metric p on £, such that (C,p) The — means : limsup P*( for and stochastically equi- continuous (s.e.) in £°°(C) is D is an endogenous vector, Y = D'6 + X'(3 + U, D = Z'-y + V, U (14) depends on the real-valued U, X is a vector of and V are independent of X and Z, and U and V D i.e. exogenous or predetermined variables, two equations are jointly symmetric and absolutely continuous. 19 can be estimated by two-stage LAD, Amemiya (1982). by projecting and using the median regression of Y on (Z'^y,X). Another valid second stage is the quantile regression, cf. Chen and Portnoy (1996), known as two-stage quantile regression (2SQR). Parameters D on Z (/3,5) to get 7, Model imposes the constant (14) QTE. The treatment variable, if assigned externally, shifts the location of the outcome variable, but does not affect the scale or shape of This severely limits the treatment The assumptions 2SQR non-constant, of constant its distribution. variety. QTE is 2SQR. crucial for validity of If QTE effects are does not consistently estimate them. 
Unfortunately a rather extensive empirical literature has used 2SQR to estimate the non-constant QTE.

To explain the inconsistency, it suffices to consider an example with no endogeneity. Suppose for some increasing one-to-one map $\delta(\cdot)$:

$$Y = D\,\delta(U), \quad U = U(0,1), \quad D = Z'\gamma + V, \qquad (15)$$

and that $D > 0$. Assume that $Z$, $U$, $V$ are mutually independent and have densities. To pin down $\gamma$ we may assume $E[V] = 0$. It is sufficient to show that $\delta(\tau)$ is generally not the optimum in the population 2SQR problem. That is, there is no $\alpha$ such that

i. $E\big(1(Y < \alpha + \delta(\tau)Z'\gamma) - \tau\big) = 0$,
ii. $E\big(1(Y < \alpha + \delta(\tau)Z'\gamma) - \tau\big)Z'\gamma = 0$.

By definition

$$\{Y < \alpha + \delta(\tau)Z'\gamma\} = \{V\delta(U) + Z'\gamma(\delta(U) - \delta(\tau)) < \alpha\}.$$

Equation i. implies $\alpha = Q_M(\tau)$, where $M = V\delta(U) + Z'\gamma(\delta(U) - \delta(\tau))$; thus it remains to check whether

$$E\big(1(M < Q_M(\tau)) - \tau\big) \cdot Z'\gamma = 0. \qquad (17)$$

Generally (17) is false. Equation (17) holds when the $\tau$-quantile of $M$ is independent of $Z$: $Q_{M|Z}(\tau) = Q_M(\tau)$ P-a.e., i.e. $P[M < Q_M(\tau)|Z] = \tau$ P-a.e. This necessarily happens when $\delta(\tau) = \delta$, the constant treatment effect case, or, for example, when $\tau = 1/2$ and $M$ is symmetric given $Z$, as in Amemiya (1982).

Simple examples suffice to confirm that (17) indeed fails. The first example involves no endogeneity:

• $V = 5 + N(0,1)$, truncated to be positive,
• $Z'\gamma = 5 + N(0,1)$, truncated to be positive,
• $\delta(U) = N(0,1)/100$, $D = Z'\gamma + V$, $Y = D\,\delta(U)$.

The following computation uses monte-carlo integration using 500,000 simulations.

• $E\big(1(M < Q_M(\tau)) - \tau\big)Z'\gamma = 0.34$, for $\tau = .7$, with s.e. of .003.

The second example involves endogeneity:

• $V = 5 + N(0,1)$, truncated to be positive,
• $Z'\gamma = 5 + N(0,1)$, truncated to be positive,
• $\delta(U) = V/100$, $D = Z'\gamma + V$, $Y = D\,\delta(U)$.

The following computation uses monte-carlo integration using 500,000 simulations.

• $E\big(1(M < Q_M(\tau)) - \tau\big)Z'\gamma = 0.38$, for $\tau = .7$, with s.e. of .003.

C Comparison with Abadie et al's Model

In Abadie, Angrist, and Imbens (2001), henceforth AAI, the treatment variable $D$ and the instrument $Z$ are both binary. The binary nature of $D$ is critical, and extensions to the general case are not known. The general, non-binary case is clearly important. This approach, however, is well suited to many experimental studies.

The potential outcomes $Y_d$ are indexed by the treatment status $d \in \{0,1\}$, and the potential treatments $D_z$ are indexed by the instrument status $z \in \{0,1\}$. The realized outcome is $Y = Y_D$, while the realized treatment is $D = D_Z$. AAI impose the independence condition

$$(Y_0, Y_1, D_1, D_0) \text{ are independent of } Z, \qquad (18)$$

and the monotonicity assumption:

$$D_1 \ge D_0 \ \text{a.s.} \qquad (19)$$

For example, let

$$D_Z = 1(\varphi(Z) + V > 0), \qquad (20)$$

i.e. $D_0 = 1(\varphi(0) + V > 0)$ and $D_1 = 1(\varphi(1) + V > 0)$, where $Z$ is independent of $V$, and $V$ may depend on the potential outcomes $Y_0$ and $Y_1$. Model (20) along with (18) satisfies the independence and monotonicity assumption. (Vytlacil (2001) also shows the converse is true as well, in the sense of distribution equivalence.)

Exploiting that $D$ and $Z$ are binary, independence, and monotonicity, AAI show that in the subpopulation of compliers, where $D_1 > D_0$, the realized treatment $D$ is independent of potential outcomes:

$$(Y_1, Y_0) \text{ are independent of } D \mid X,\ D_1 > D_0.$$

The compliers are manipulated by the instrument and, therefore, randomly receive a treatment status. That is, the treatment status is given to them independently of their potential responses $Y_0$ and $Y_1$, conditional on observed covariates $X$. That is, endogeneity is removed in this subpopulation. Let $Q_{Y|X,C}(\tau)$ denote the $\tau$-quantile for the population of compliers conditional on $(X, D, D_1 > D_0)$. The quantile treatment effect $\delta(\tau)$ is a difference in the conditional $\tau$-quantiles of $Y_1$ and $Y_0$ for compliers:

$$Q_{Y|X,C}(\tau) = \delta(\tau)D + X'\beta(\tau).$$
AAI suggest an ingenious weighting scheme that "finds compliers" (compliers are unobserved) and interpret their estimator as a re-weighted Koenker and Bassett's quantile regression.

The main differences with our approach are the following. First, our model's QTE is defined relative to the population, while AAI's QTE is defined relative to compliers. The compliers may substantially differ from the entire population. For example, in Angrist and Krueger (1992), the compliers are those whose education level is affected by their birthdate. Thus, the 90% QTE in AAI's model may differ substantially from the 90% QTE in our model. The QTE's of the two models may coincide if compliers in the AAI's model are representative of the population and other assumptions overlap as well. Second, AAI's model applies to binary cases only, while the present approach applies to general cases. Third, the estimation procedures are fundamentally different. Fourth, we use similarity or rank invariance to identify QTE while AAI use the monotonicity and stronger independence conditions.

D Proof of Theorem 1

Part (a). Conditioning on $X = x$ is suppressed. For P-a.e. value $z$ of $Z$,

$$\begin{aligned}
P[Y \le q(D,\tau)|Z = z] &\overset{(1)}{=} P[q(D, U_D) \le q(D,\tau)|Z = z] \\
&\overset{(2)}{=} P[U_D \le \tau|Z = z] \\
&\overset{(3)}{=} \int P[U_D \le \tau|Z = z, V = v]\, dP[V = v|Z = z] \\
&\overset{(4)}{=} \int P[U_{\delta(z,v)} \le \tau|Z = z, V = v]\, dP[V = v|Z = z] \\
&\overset{(5)}{=} \int P[U_0 \le \tau|Z = z, V = v]\, dP[V = v|Z = z] \\
&\overset{(6)}{=} P[U_0 \le \tau|Z = z] \overset{(7)}{=} \tau.
\end{aligned} \qquad (21)$$

Equality (1) is by A1 and A5. Equality (3) is by definition. Equality (4) is by A3. Equality (5) is by the similarity assumption A4: for each $d$, conditional on $(V = v, X = x, Z = z)$, $U_{\delta(z,v)}$ equals in distribution to $U_0$. Equality (6) is by definition and equality (7) is by A2. Note that equality (2) is immediate when $\tau \mapsto q(d,\tau)$ is continuous, since we assumed that $\tau \mapsto q(d,\tau)$ is strictly increasing. To show (2) holds more generally, simply note that for each $d$ the event $\{U_D \le \tau\}$ implies the event $\{q(D, U_D) \le q(D,\tau)\}$ by $\tau \mapsto q(d,\tau)$ non-decreasing on $(0,1)$. On the other hand, the event $\{q(D, U_D) \le q(D,\tau)\}$ implies the event $\{U_D \le \tau\}$, since $\tau \mapsto q(d,\tau)$ is strictly increasing and left-continuous^15 in $(0,1)$ for each $d$. Finally, since $\tau \mapsto q(d,\tau)$ is strictly increasing and left-continuous, $P[q(D, U_D) = q(D,\tau)|Z = z] = 0$, so that P-a.e.

$$P[Y \le q(D,\tau)|Z] = P[Y < q(D,\tau)|Z].$$

15 $\tau \mapsto q(d,\tau)$ is said to be left-continuous if $\lim_{\tau' \uparrow \tau} q(d,\tau') = q(d,\tau)$.

Part (b). Conditioning on $X = x$ is suppressed. For P-a.e. value $z$ of $Z$,

$$P[Y \le q(D,\tau)|Z = z] \overset{(1)}{=} P[q(D, U_D) \le q(D,\tau)|Z = z] \overset{(2)}{\ge} P[U_D \le \tau|Z = z] \overset{(3)}{=} P[U_0 \le \tau|Z = z] = \tau, \qquad (22)$$

where the equalities (1) and (3) are by the same arguments as in the proof of part (a), and (2) follows because the event $\{U_D \le \tau\}$ is a subset of the event $\{q(D, U_D) \le q(D,\tau)\}$ by $\tau \mapsto q(d,\tau)$ non-decreasing for each $d$. On the other hand,

$$P[Y < q(D,\tau)|Z = z] \overset{(4)}{=} P[q(D, U_D) < q(D,\tau)|Z = z] \overset{(5)}{\le} P[U_D < \tau|Z = z] \overset{(6)}{=} P[U_0 < \tau|Z = z] = \tau, \qquad (23)$$

where the equalities (4) and (6) are by the same arguments as in the proof of part (a). (5) holds as an equality if the event $\{q(D, U_D) < q(D,\tau)\}$ equals the event $\{U_D < \tau\}$ P-a.e., conditional on $Z = z$, that is, if $q(D,t)$ is strictly increasing at $\tau$, P-a.e., conditional on $Z = z$. If, on the other hand, $q(D,t)$ is flat at $\tau$ with prob $> 0$, conditional on $Z = z$, then $\{q(D, U_D) < q(D,\tau)\}$ is a strict subset of the event $\{U_D < \tau\}$, since $\tau \mapsto q(d,\tau)$ is non-decreasing and left-continuous for each $d$.

E Proof of Theorem 2

First show that statement (1) $\Leftrightarrow$ statement (2). Let $\epsilon = Y - q(D,X,\tau)$. This follows immediately by definition $Q_{\epsilon|X,Z}(\tau) = \inf\{m : P[\epsilon \le m|X,Z] \ge \tau\}$.

We next show that statement (2) $\Leftrightarrow$ statement (3). If $0 = Q_\epsilon[\tau|x,z]$ is the conditional quantile, then it is a best predictor under asymmetric absolute loss, cf. Manski (1985), p. 55. We need to show a stronger fact, extending his argument.
Write, for any $v < 0$,

$$
\begin{aligned}
E[\rho_\tau(\epsilon - v) \mid x,z] - E[\rho_\tau(\epsilon) \mid x,z]
&= (1-\tau)\int_{(-\infty,v]} v \, dF_\epsilon[e \mid x,z] + \int_{(v,0)} [e - \tau v]\, dF_\epsilon[e \mid x,z] + \tau \int_{[0,\infty)} (-v)\, dF_\epsilon[e \mid x,z] \qquad (24) \\
&= (1-\tau)\, v\, P[\epsilon \le v \mid x,z] + \int_{(v,0)} (e - v)\, dF_\epsilon[e \mid x,z] + (1-\tau)\, v\, P[\epsilon \in (v,0) \mid x,z] - \tau v\, P[\epsilon \ge 0 \mid x,z] \\
&= v\big((1-\tau) P[\epsilon < 0 \mid x,z] - \tau P[\epsilon \ge 0 \mid x,z]\big) + \int_{(v,0)} (e - v)\, dF_\epsilon[e \mid x,z] \;\ge\; 0. \qquad (25)
\end{aligned}
$$

(25) $\ge 0$ since $v < 0$, (i) $P[\epsilon < 0 \mid x,z] \le \tau$ (so that $P[\epsilon \ge 0 \mid x,z] \ge 1-\tau$) and (ii) $\int_{(v,0)} (e - v)\, dF_\epsilon[e \mid x,z] \ge 0$, and one of these inequalities must be strict. Indeed, if $P[\epsilon = 0 \mid x,z] > 0$, the inequality (i) is strict. If, on the other hand, $P[\epsilon = 0 \mid x,z] = 0$, then the inequality (ii) must be strict. Indeed, in this case $\int_{(v,0)} (e - v)\, dF_\epsilon[e \mid x,z] = 0$ occurs only if $F_\epsilon[e \mid x,z]$ is flat (assigns no mass) on $(v,0)$, which contradicts $0 = Q_{\epsilon|x,z}(\tau)$, given that there is no mass at 0.

Next, for any $v > 0$,

$$
\begin{aligned}
E[\rho_\tau(\epsilon - v) \mid x,z] - E[\rho_\tau(\epsilon) \mid x,z]
&= (1-\tau)\int_{(-\infty,0]} v \, dF_\epsilon[e \mid x,z] + \int_{(0,v)} [(1-\tau)v - e]\, dF_\epsilon[e \mid x,z] + \tau \int_{[v,\infty)} (-v)\, dF_\epsilon[e \mid x,z] \\
&= (1-\tau)\, v\, P[\epsilon \le 0 \mid x,z] - \tau v\, P[\epsilon \in (0,v) \mid x,z] - \tau v\, P[\epsilon \ge v \mid x,z] + \int_{(0,v)} [v - e]\, dF_\epsilon[e \mid x,z] \\
&= v\big((1-\tau) P[\epsilon \le 0 \mid x,z] - \tau P[\epsilon > 0 \mid x,z]\big) + \int_{(0,v)} [v - e]\, dF_\epsilon[e \mid x,z] \;\ge\; 0, \qquad (26)
\end{aligned}
$$

since $v > 0$, $P[\epsilon \le 0 \mid x,z] \ge \tau$, thus $P[\epsilon > 0 \mid x,z] \le 1-\tau$, and $\int_{(0,v)} [v - e]\, dF_\epsilon[e \mid x,z] \ge 0$. (26) $= 0$ iff (i) $P[\epsilon \le 0 \mid x,z] = \tau$ and $P[\epsilon > 0 \mid x,z] = 1-\tau$, and (ii) the second term is zero; (ii) happens iff $t \mapsto F_\epsilon(t \mid x,z)$ is flat on $(0,v)$. When (26) $= 0$, 0 is not the unique predictor under $\rho_\tau$ loss.

Since $\tau \mapsto Q_{\epsilon|x,z}(\tau)$ is left-continuous, for any sequence $\tau_m \uparrow \tau$ we have $q_m = Q_{\epsilon|x,z}(\tau_m) \uparrow 0$, and for any $v > 0$, denoting $\epsilon_m = \epsilon - q_m$,

$$
E[\rho_{\tau_m}(\epsilon_m - v) \mid x,z] - E[\rho_{\tau_m}(\epsilon_m) \mid x,z]
= v\big((1-\tau_m) P[\epsilon_m \le 0 \mid x,z] - \tau_m P[\epsilon_m > 0 \mid x,z]\big) + \int_{(0,v)} [v - e]\, dF_{\epsilon_m}[e \mid x,z] > 0,
$$

since both of the terms are non-negative by the earlier arguments, and

$$
\int_{(0,v)} [v - e]\, dF_{\epsilon_m}[e \mid x,z] = \int_{(q_m,\, v + q_m)} [v - e + q_m]\, dF_\epsilon[e \mid x,z] > 0
$$

for sufficiently large $m$, since $F_\epsilon$ has to assign positive mass to $(q_m, v + q_m)$ for large $m$. In addition, $E[\rho_{\tau_m}(\epsilon_m - v) \mid x,z] - E[\rho_{\tau_m}(\epsilon_m) \mid x,z] > 0$ for any $v < 0$ by arguments like in (25). Thus, $Q_\epsilon[\tau_m \mid x,z]$ are unique best predictors for large $m$. In other words, the last statement implies that $0 = \lim_{\tau' \uparrow \tau} Q_\epsilon[\tau' \mid x,z]$ (modified by the limit operation) is the unique best predictor under asymmetric absolute loss. Note that the limit operation is only needed for at most countably many $\tau$ in $(0,1)$. Thus, we demonstrated the equivalence of statement (2) and statement (3).

Finally, equivalence (3) $\Leftrightarrow$ (4) is obvious.

F Proof of Theorem 3

The proof is a special case of Theorem 5 in section I.

G Assumptions R.1-R.6

The following assumptions are maintained for the inverse quantile regression.

R1 $W_i = (Y_i, D_i, X_i, Z_i)$ are iid and $(D_i, X_i, Z_i)$ take values in a compact set.

R2 $(\alpha_\tau, \beta_\tau)$ is interior to $\mathcal{V} = A \times B$, where $\mathcal{V}$ is compact, convex, and $(\alpha_\tau, \beta_\tau)$ is the unique $(\alpha, \beta) \in \mathcal{V}$ such that $E\varphi_\tau(Y_i - D_i'\alpha - X_i'\beta)\Psi_i = 0$, where $\Psi_i = V_i \cdot [X_i' : \Phi_i']'$ and $\varphi_\tau(u) = (1(u < 0) - \tau)$.

R3 $Y$ has bounded conditional density given $X, D, Z$, uniformly over the support of $(X, D, Z)$.

R4 $J(\pi) = \frac{\partial}{\partial(\alpha,\beta,\gamma)} E[\varphi_\tau(Y - D'\alpha - X'\beta - \Phi'\gamma)\Psi]$ has full column rank and is continuous at each $(\alpha, \beta, \gamma)$ in $A \times B \times \mathcal{G}$, where $\mathcal{G}$ is an open ball in $\mathbb{R}^{\dim(\gamma)}$ at zero.

R5 Functions $(z,x) \mapsto \hat\Phi(z,x)$ and $(z,x) \mapsto \hat V(z,x)$ belong to a set $\mathcal{F}$ wp $\to 1$; $\mathcal{F}$ is a set of boundedly differentiable functions $C^\eta_M$ with smoothness order $\eta > \dim(z,x)/2$ (footnote 16); $\hat\Phi(\cdot) \to_p \Phi(\cdot)$, $\hat V(\cdot) \to_p V(\cdot)$ in $\mathcal{F}$, uniformly over compact sets. $V(\cdot) > 0$.

Remark G.1 All assumptions, but R5, are analogous to the standard assumptions for quantile regression. They may be refined at a cost of more complicated notation and proof.

Remark G.2 Smoothness R5 needs
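to hold only for the non-discrete sub-component of $(x, z)$. As discussed in the text, condition R5 allows for a wide variety of nonparametric and parametric estimators, as shown by Andrews (1994). Ideally, we would like to approximate the optimal instruments and the optimal weight as closely as possible using non-parametric or parametric methods. There is a wide variety of estimators that satisfy assumption R5, such as smooth parametric approximation to $V(X,Z)$ and $\Phi(X,Z)$ and smooth kernel estimators or, alternatively, various smooth series estimators. See Andrews (1994), (1995), Newey (1990), Newey (1997), and Newey and Powell (1990) for a catalogue of estimators that satisfy condition R5.

(Footnote 16: See page 154 in van der Vaart and Wellner (1996).)

H Proof of Theorem 4

In the proof $W$ denotes $(Y, D, X, Z)$. Define, for $\theta = (\beta, \gamma) \in B \times \mathcal{G}$ and $\varphi_\tau(u) = (1(u < 0) - \tau)$,

$$
f(W, \alpha, \theta) = \varphi_\tau(Y - D'\alpha - X'\beta - \Phi'\gamma)\Psi, \qquad
\hat f(W, \alpha, \theta) = \varphi_\tau(Y - D'\alpha - X'\beta - \hat\Phi'\gamma)\hat\Psi,
$$

where $\Psi = V \cdot (X', \Phi')'$, $\hat\Psi = \hat V \cdot (X', \hat\Phi')'$, $\Phi = \Phi(X,Z)$, $\hat\Phi = \hat\Phi(X,Z)$, $V = V(X,Z)$, $\hat V = \hat V(X,Z)$;

$$
g(W, \alpha, \theta) = \rho_\tau(Y - D'\alpha - X'\beta - \Phi'\gamma)V, \qquad
\hat g(W, \alpha, \theta) = \rho_\tau(Y - D'\alpha - X'\beta - \hat\Phi'\gamma)\hat V,
$$

where $\rho_\tau(u) = (\tau - 1(u < 0))u$. Let $Q(\alpha,\theta) = E g(W,\alpha,\theta)$, $Q_n(\alpha,\theta) = E_n \hat g(W,\alpha,\theta)$, and

$$
\theta(\alpha) = (\beta(\alpha), \gamma(\alpha)) = \arg\inf_{\theta \in B \times \mathcal{G}} Q(\alpha,\theta), \qquad
\hat\theta(\alpha) = (\hat\beta(\alpha), \hat\gamma(\alpha)) = \arg\inf_{\theta \in B \times \mathcal{G}} Q_n(\alpha,\theta),
$$

$$
\alpha^* = \arg\inf_{\alpha \in A} \|\gamma(\alpha)\|, \qquad
\hat\alpha = \arg\inf_{\alpha \in A} \|\hat\gamma(\alpha)\|.
$$

1. By Theorem 1 (i), the true parameters $(\alpha_\tau, \beta_\tau)$ solve the equation

$$
E\varphi_\tau(Y_i - D_i'\alpha_\tau - X_i'\beta_\tau)\Psi_i = 0.
$$

On the other hand, by R3 $\theta(\alpha)$ satisfies the equation:

$$
E\varphi_\tau(Y_i - D_i'\alpha - X_i'\beta(\alpha) - \Phi_i'\gamma(\alpha))\Psi_i = 0.
$$

We need to find $\alpha^*$ such that this equation holds and the norm of $\gamma(\alpha)$ is as small as possible. $\alpha^* = \alpha_\tau$ makes the norm of $\gamma(\alpha^*)$ equal zero. Thus $\alpha^* = \alpha_\tau$ is a solution; by R2 it is unique. Additionally, by R2 $\beta(\alpha^*) = \beta_\tau$.

2. For each $\alpha$ and $\theta$, by a LLN (lemma K.1)

$$
E_n[\hat g(W, \alpha, \theta) - \hat g(W, \alpha_\tau, \tilde\theta)] \to_p E[g(W, \alpha, \theta) - g(W, \alpha_\tau, \tilde\theta)],
$$

for some fixed $\tilde\theta$ (the subtraction of the terms is to make the summands bounded functions of $W$).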
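To make the population problem in step 1 concrete (find the $\alpha$ at which the coefficient $\gamma(\alpha)$ on the instrument, in the quantile regression of $Y - D'\alpha$, has the smallest norm), here is a minimal numerical sketch. It is an illustration under strong simplifying assumptions, not the estimator as implemented by the authors: there are no covariates $X$, the instrument function is a binary $Z$ itself, the weight $V$ is 1, and $\alpha$ is scalar and searched over a grid. The simulated design and all function names are hypothetical.

```python
import numpy as np

def gamma_hat(y, d, z, alpha, tau):
    """Instrument coefficient from a tau-quantile regression of y - d*alpha on (1, z).

    With a binary z the regression is saturated, so the check-loss problem
    separates by group and the slope is a difference of group-wise sample
    quantiles (np.quantile's default interpolation is used for convenience).
    """
    r = y - d * alpha
    return np.quantile(r[z == 1], tau) - np.quantile(r[z == 0], tau)

def inverse_quantile_regression(y, d, z, tau, alpha_grid):
    """Grid value of alpha whose instrument coefficient is closest to zero."""
    gammas = np.array([gamma_hat(y, d, z, a, tau) for a in alpha_grid])
    return alpha_grid[np.argmin(np.abs(gammas))]

# Simulated design (an assumption): a single rank variable u drives both the
# outcome and the treatment, so the treatment is endogenous.
rng = np.random.default_rng(0)
n = 40_000
u = rng.uniform(size=n)
z = rng.integers(0, 2, size=n)                      # binary instrument, independent of u
d = 1.0 + z + 0.5 * u + 0.5 * rng.uniform(size=n)   # treatment depends on z and u
y = d * (1.0 + u) + u                               # q(d, tau) = d * (1 + tau) + tau

tau = 0.5
alpha_grid = np.linspace(0.5, 2.5, 201)
alpha_hat = inverse_quantile_regression(y, d, z, tau, alpha_grid)
print(alpha_hat)   # the structural slope in this design is 1 + tau = 1.5
```

In this design $Y - D(1+\tau) \le \tau$ exactly when $U \le \tau$, so the $\tau$-quantile of $Y - D\alpha_\tau$ does not depend on $Z$ and the instrument coefficient vanishes at $\alpha_\tau = 1 + \tau$. The grid search stands in for the minimization over $\alpha$ and is only practical for low-dimensional $\alpha$.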
The lhs is a finite convex function in 9 and a, at least wp — 1. Therefore the convergence for > is uniform over compact Hence by lemma A.l sets. 9(a T ) -£-> and 9(a) 9-r 3 -^-> a T ( shown below). By the computational properties small ball at a T 3. -^9 T provided , 5 -^- a T . . of quantile regression estimator 9(a„), for 4. any a„ 0(K/V^) = V^E„f(W,an ,e(an )). By lemma K.l, the following expansion of r.h.s. is valid for any in a (28) a„ -£-» aT lr : V^E n f(W,a n ,9(a n ))sG„f(W,a„,9(<Xn)) + V^Ef(W,a n ,9n (c<r,)) = G„f(W a T ,9T ) + 1 Expanding the o v (\) + V^Ef(W, a n ,9(a n )) last line further = G„f(W,a T ,9 T + op (l) ) + + In other words for any V^(9(a n ) - 9 T ) Vn(l(a n ) Over a shrinking a„ —a > + op (l))^(9(a n )-8T (Ja + o p (l))y/n(a n - a T ). (Je (30) ) T = -Jg G„f(W,a T ,9T - J^ Ja [l + o„(l)]v/r7(a„ - a T + op (l), 1 l ) = -Jy G n f(W,a T ,9 T - J7 J at q t denoted B n (a T ), wp — 0) ball ) ) + op (l))^fc(a„ - a T ) + op (l). 1, for 11x11^ = x' Ax [1 , S= arg inf n»EB„(or) ||7( Q »)IU- Observe that V^IRKJIU = 17 Note that by convention l|Op(l) in empirical - J-,Ja [l + op (1)]v^(q„ - a T )IU +0p( .,, process theory 26 Ef(W) means (Ef(W)),_j. i.e 2 Since J-, A Ja and have Vn(S - Q x = means where — aT rank, y/n(a full = ) — argm{\\ = Op (l). ) J-,Ja n\\ A . that the limit distributions of the lhs and rhs agree. Conclude that: V^(8 - L = -Jg'll - 8r) G n f(W,a r ,0r) -^ 1 J^J'y AJ1 G n f(W,a T) e T ) and Ja(J'a JyAJ.y Ja r i J^AJ1 }G n f(W,a T ,e T J'a ) N(Q,S) by CLT. Finally, the consistency /3(2), - J-,G n f(W,a T ,9 T ) L V^(5 - q t ) = -(J'a J^Aj-,Ja y with Hence by lemma A. and asymptotic representation except that in the definition of instead of 7 for we use follows analogously to that of j3 its plim 0. Therefore, analogously to (40)- (42): L = -Jg'lh yfiUft-flr) : - Ja (J'a J^AJ1 Ja y 0][7 1 J'a J!1 AJ^]G n f(W,a T ,0 T ). U Proof of step 3. The argument is just slightly more complicated than usual consistency cf. 
Amemiya (1985) or Newey and McFadden (1994). For some 9, 6(a) maximizes arguments, Q n (a,9) = -E n [g(W,a,9) -g(W,a T ,9)] -^ Q x (a,9) = where the convergence wp — uniformly in 1, Q„ (a, 6(a)) - infce.4 [Qoo(0(a)) each a. now It A follows Qoo(0(a)) > [i] in (9, ||7(a)||/i| e/2 [ii] > On (a, 0(a)) - a collection of balls with diameter — sup ese ^ B(o O°o(0)] > wp — 1, uniformly in a ) > e/2 1, — » e. each centered at 9(a). <5, by assumption R4 and concavity 0, Then in 9 for > Ooo (0(a)) sup Qe ^, 0, Ooo(a,0(a)) - -Qoo (0(a)) + Q~(0(a)) sup ||0(a) which by - 0(a)|| Lemma < 5, for > any 5 A. 2 implies 3 0. = sup Q=°(0). fl€e\B(a) This implies that sup ae ^ |||7(a)||^ - -£-» a*. Identification Results: Generalizations I The following statements sake, R'. — (31) sets, using step 2. For any e > 0, > Q n (a,8(a)) by definition, 0=c(a, 9(a)) > > Q x (a,9(a)) - e/2 by (31). Hence wp -> 1 ege\B(a) Thus wp - g(W, a T ,0)] a) over compact Q n (a,9(a)) [hi] a, 9) On(a,0(a)) > Q„(a,B(a)) - A) be Let {B(a),a £ = uniform 6 e/2 by (31), 0=o (a, 6(a)) e is a -E[g(W, we suppress We can label the points of the support as functions to [t — and functions are all conditional on the event X = x. For notation Suppose support of D is a finite set of discrete values in this conditioning. <p 5,t mapping d from + <5] a.s. for small 6 z where z = (zj, 1 {1, ...J} to (y <j< h-> > 0. {1, fy(y\d) ..., > Define C(x) as the convex hull of J}. 0) such that P(Y < <p(D, t)\Z) belongs Define the following function n 2 (v>,x) = J). Define, : [P[Y < <p(D)\zjl 1 < j < J], assuming relevant smoothness Jz (<£>,z) = = l\z 1 ] ... fy(<p(J)\D = fr(<p(l)\D=l,zj)P{D=l\zj} ... fY (<p(J)\D = J,zj)P[D = fy(<p(l)\D=l,Z 1 )P[D 27 -j-Tl^ip) J,Z 1 )P[D=J\zi} J\zj] J The statement that rank rank Jzifjx) Theorem = Jz(tp,x) where Z J, P(Y < if for any is full Then d i—» hold, q{d,x,r) q(d,x,'r)\x, z) £ C(x) Jz(f, x) <p > w. pr. means that with positive probability X= x. 
> and {Zj} are independent replica of Z, given D) Suppose A1-A5 5 (Discrete the range of d>—> q(d,x,r). among £(x) = = is and T for P-a.e. is finite that fy(y\d,z,x) finite over a unique solution of given z, and has X = x, (32) rank w. pr. full > 0. Proof. Condition on the event X = x. We know that q(d, x,t) solves (32) from Theorem Suppose there exits q" 6 £(x) that also solves (32) such that 1, hence it belongs to C(x). d »-» q*(d) and d i—» q(d,x, t) disagree on {1, ..., J}. Then = riz(<7*,x) for a conformable vector of A' where gj 6 'C( a; ), Finally, Then l's, 1. (U z (q",x) - which is It and IIz(<7,x) = P — a.s., It, RJ any vector A € for n z (q,x)) = impossible by the A' full = z (gJ, x) \ {0} Taylor expansion gives 0, rank assumption. we consider continuous D. 18 As Newey and Powell E(Y — fi(D)\Z) = 0, P - a.s. the condition for identification of /x is (2001) shown, in the model the Lehmann-Scheffe completeness condition: P LI E[A(D)\z] = where V is => a.e. a collection of A(D) = V Fd[-|-z] as z varies a.e., Z over support of given X = x. Lehmann (1954) provided a sufficient "happy family" condition: L2 P[D = The full d\z] is a full rank exponential family h(d) rank condition requires 77(2) L2 to satisfy a linear constraint. exp(r/(^)'T(d) to vary over an open rectangle in K + A(z)). dlm ( T ('i " and T(d) not allows for a broad variety of non-parametric distributions. X = x but we suppress this conditioning. m that map a d from the set £>(x), the support of D, t+ to for small S > given X = x. f(y\d) > 0) such that P[Y < m(D)\Z] e [t Solution q said to be unique any other solution m = q V — a.e., where V defined above. The statements are conditional on the event Define C(x) as a convex hull of functions (2/ is Theorem d uniformly in for any A(d) = Then d z. t— » q(d, x, r) is m(d) — q(d, x, t) such dent standard uniform ii. is if 6 Suppose A1-A5 hold, and that fy{y\d,z,x) 1—> q(d, x, t), i. 6] a.s., S, : sufficient condition for exp(r)(z)'T(d,t)) is C, i. 
E [f e that m£ is f(t, d\z) set and C(x) and (C,A(D)\D, z)A(D)\z] -1 > finite over the range of a unique solution of (32) [P€ [f.|<f, e among C(x) = Y — q(d, x, t) = P-a.e. => 2] — Pe [0|d, z]] if and indepen- A(D) = V-a.e. fo[d\z\ oc h(d, t) an exponential family of full rank. In the last expression, /(0, d\z) = lirat^o f(t,d\z). Condition ii. is a plausible non-parametric condition, with rhs motivated as a Taylor approximation of the log of lhs. 18 To be removed and is given here for completeness. The work with Whitney Newey and Guido Imbens 28 full treatment is to be given in the joint is m £ C(x) such that P[Y < m(D)\z] = r m{D)\z] — P[Y < q{D)\z] equals by Taylor expansion P Proof. By hypothesis there = P[Y < difference EA(D) ft (5A{D)\D,z)d8 = EMC&(D)\D,z)[A(D)] = Then a.e. the (33) 0, Jo jo where £ is a uniform variable on [0, 1], independent of Z, D and A = m—q. (33) is proportional to E*[A(D)\z] where E* is the expectation distorted by fe ((A(D)\D, z). For uniqueness we ii. any measurable function This proves -a.e. implies (also using f(t(d), d\z) E/( for V A(D) = need that (33)=0 implies condition t ( d),d|z)A(d) = <=> == = = = A(rf) t(d), since f(t(d), d\z) To prove i.. fo{d\z) is ii. sufficient, by L2 0) P - a.e. remains a (34) dim(d)-rank exponential fam- full Thus if t(d) = A(d), (34) still holds. Now note f* fY _ q{D) {^{D)\D,z)dC, equals {P[Y q(D) < A(D)\D, z] - P[Y - q(D) < 0\D, z])/A(D), so E f(Md)td z) A{d) = lhs of (33). ily. \ Additional Results: Empirical Likelihood J Here we treat the generalized empirical likelihood. Define 7(W,i?) J = ^{X, Z) \& = ^(X Z) , is is s <^ T for $ = (a, /3), ip T (u) = t — l(u < 0) (y - D'q - A"^)f where , a smooth function of an instrument, a smooth consistent estimate of such function, satisfying L5. Define also 7 = = Q„(t?,7) arginfQ„(i?,7), 5= " 1 ~f= vn 2J s[/(W,i?)'7 ], i=l argsup[inf Q„(i9,7)]. (35) finite, and four times differentiable function so on an and equals +00 outside it. 
Normalize [9 J s(t;)/9ii '](0) = 1 for Function s() equals a strictly convex, open interval of j = 1,2. K containing 0, : = — ln(l — Functions so(u) u),exp(i>), (1 + 2 v) /2 lead to the GMM estimator. and continuous up-dating and Kitamura and Stutzer (1997). likelihood, exponential tilting, Newey and Smith Just as (2001), GMM, GEL is infeasible in our settings with two or more useful in low-dimensional settings or as a refinement of the IQR used to bring the estimates to a right neighborhood, and the such a neighborhood. For this purpose, The pivotal objective function of Assumptions L.1-L.6 The LI Wi = (Yi,Di,Xi,Zi) = L2 t? L3 tfo is unique L4 5(i?) = (a T ,/3 T ) G GEL GEL known are covariates. estimator. GEL to have See Imbens (1997), good The It may be latter can be can be recomputed over finite sample properties. can be used for construction of confidence intervals. following assumptions are maintained is i.i.d. interior 1? s.t. Eip T (Y well-known empirical A E<p T (Y - D'a - and (Di,X Z,) take values % , x B, a compact convex - D'a - X'/?)* = X'/3) 2 **' is 0, in in a compact V=A x B. positive definite for each 29 set. set. i)eV. (S = S(i9 )) L5 wp -• 1, J is a set of boundedly differentiable functions CJJ,, > dim(z,x)/2. 19 <£(•) >£() e T, uniformly over compacts. - D'a - X'P)*] = E [/y--D' Q -x< 5 (0|X, D, Z)^\D' X')) is defined, [<p T (Y {z,x) >—> 'I'fjjX) 6 T, with smoothness order L6 J(-d) = finite, Remark £E has — r\ > : full column rank and continuous at each a,/? in A x B. J.l All assumptions, but L5, are fairly standard. L5 allows for a wide variety of nonparametric and parametric estimators. See Remark G.2. Theorem 7 (GEL) In the linear model (8) and assumptions L1-L6 listed above n/™7 where S= r(l - t)EW Corollary 3 Further, fe [0\D, Z,X], and efficiency V = and J o, (J's-tj)-' o, o 1 s-^s-JiJ's-tj)- 1 E[f,{0\X,D,Z)^[D' = V* ** : X']}, <f =Y- where ** if we = fe (0\X,Z), the asymptotic variance of set [X', **']', ^- 1 D'a{r) - X'j3{r). 
= E[D v\Z,X]/V, a and f3, v = simplifies to the bound t(1-t)E[****'] -:1 . Proof. The proof extends the arguments of Kitamura and Stutzer (1997) and Christoffersen, Hahn, and Inoue (1999). 1. Define f(W,-d) where by ^ and = * we <p T denote vectors Q„(i? )7 ) 7(1?) = 7(i9) = arginf-, <3($,7), (1997) or (Y - D'a - X'p)V, = arginf Q„(i?,7), i9* = argsup^ gv -0* = i9 -0 Q(-d, 7) = (Y - D'a - X'/3)$, = E s[/(W,0)'7], and argsupinf Qn(i?,7), Q($,7). By arguments of Kitamura and Stutzer inf., and <p T Z) and ty(X, Z). Define also E„s[/(W,i?)'7], Newey and Smith, By Lemma "if(X, = /(W,i?) 7(1?*) = 0. do E„s[f(W,ti n )'j] -^> Es [f(W,d )'i\ for each 7 dlm ' 7 a dense countable subset of R \ Hence by convexity lemma A.l, since Es[f(W, i?o)'7] 2. finite K.l, in R, for any i?„ -^-> in is over an open set by LI 7(i?o) -^ 0, 7(5) -^ 0, provided S -^ i9 . 3. By lemma K.l and consistency proof of Kitamura and Stutzer (1997) or e.g. Newey and Smith (2001) and references therein that do not require smoothness of /: -d ——* i?o- shows y/nE„f(W,ti) 4. Step 4, proved below, = Op (l). Lemma K.l and properties of s, the following expansion valid, wp— 1 for (y„,i9n) = (7, i5) or (7,1, i?n) = (7(^0), #0), In view of steps 2-3, by 5. first order conditions = is Vn~E n f(W,d n )s[f(W,d n ) ln ] = [v^E„/(W,i9 n )] + Enf(W,d n )f(W,0 n )''vW. + Op (v^||7n|| 2 ) = [JRE n f{W, 1J0) + J + Op (l))V£(tf n ~ tft))] + ( 9 of the > See page 154 in van der Vaart and Wellner (1996). 30 (S 2 + Op(l))' V^7n + Op (^|]7n|| (36) ). By step and L4, V"7n 4, (36), = Op (l), i.e. v^7 = Op (l) and and by v^7(tfo) Step 6, Op (l). , oP (l)- (39) _1 V™7 = -S >/JJEn/(VMo) - S J^n(d - # J'S- 1 VSE„/(W,i?o) + J'5- 1 J %/S(5-i? ) + op (l) (36) (39) gives - 07(tf = i?o) = -[5 n/^7 -^ and also = -1 Prom (38) proved below, shows v/^J 7 7. (37) and L6, (36), (37), v^(5-i?o) = 6. = Op (l). -(J'S _1 7V(0, -5 S -1 _1 J) -1 J(J'5 _1 [S J'S _1 -1 J) v^nEn/CW.tfo) -1 + J'5" ]VnE„/(W, l? 1 J'15" 4. 
By definition, for -n(E„s[7(W, £)'(?„] - any g„ s(0)) < into which yields -^ iV(0, 1 (J'S" J)" 1 ), + op (l) 1 ), with asymptotic covariance between \fn{-d jointly, Proof of step ) when put op (l), which 0, op(l) 1 - JiJ'S- 1 ^- + ) = — -do) and ^/n'f equal 0. = Op (l/y n), / -n(E„s[/(W,5)'7] - s(0)) (40) < -7i(E„s[7(VMo)'tW] By Lemma K.l, wp — » the following expansions are valid (by steps like in (36)) 1 =^E ,[/(W, rhsof(40) r + OP (n|| 7 (tf (since )]7(^o)V^+^7W57(tfo)v/^ 19 3 )|| ) = Op (l), ^7(^0) = Op (l) and VE"„[7(VMo)] = -^[/(W.tfo)] by Lemma lhsof(40) By s(0)). 6. For K.l and step and (42) = Op (l/y/n) is arbitrary, we have y/nE„f(W,S) = Op (l). any $„ = O p {\/s/n), by definition -n(E„s[f(W, 5)'y] By Lemma ), =^E4f(W,d)}gn V^+lV^9'r>Sgn V^ + Op (n\\g„\\ 3 ). (40)- (42), because gn Proof of step K.l 5, - s[0]) < -n(E„s[7(W, tf n ) the following expansions (by steps 7] - s[0]). (43) like in (36)) are valid lhs of (43) = -v^E n 7(W, d YiJn - i V^y'Sr^ + rhs of (43) = - VnE„f(W, -d„)'^y/n - °p(1), (44) and by Lemma \ y/nrj'S^y/n + op (l), K.l Vae„7(VM„) = VnEn /(W ,tf„) + J(tf„-#)v^ + °P (l)> r VnE„f(W,0) = ^E„/(W,i? Putting (43) - (45) together, (45) ) + J'(S for any o p (l). we have ^(tf„ - Syj'iy/Z < Because (46) holds - t?)v^ + tf„ = Op (\/y/n), o p (l). (46) implies J'^y/n 31 (46) = o p (l). K A Lemma This lemma uses empirical process arguments to obtain some of stochastic relationships. Lemma K.l (Expansions) Under assumptions L1-L6, function m i. ii. that is Liphitz over the range of as $n for any real-valued i?o, f(W) G„/(VM„)=G n /(W,iM + op (l), 'E„m[f(W,tf n )] E„s[f(W,i!f„y-y] Hi. Em[f(W, #o)] -^-» > Es[f(W, ( i?o)'7] in particular, for fOT ^ach -y m(x) = xx' etc.), in a countable dense subset o/R ""'"''. Under assumption R1-R6, For each (a,9), E„[g(W, a,0) - g(W,a T ,§)] iv. v. G n f{W,a n ,8(a n )) = G n f{W,a,,9T + n is = { = 7r (*, *, (a, /?, 7) and n=A of T by logJV N ( e ,^,£ 2 (P)) for some 6' bracketing < 0. 
Thus number of X= is B x Q where Q is -^a T an a ball at y T (Y - D'a - x'p - *(x, Z)'-r)*(Jf, Z), tt) >-> The bracketing number Donsker. x E\g(W,a T ,0) -g(W,a T ,0)]. op (l), for any ) Proof. Denote -?-> T is Cor = o( By Cor Donsker. =0 because it O (log ^[.[(e,^7 of these has the same smoothness properties. l(4>,n) <-><Pt(Y e f} , Z/2(P))) as well. Therefore V is Donsker. Class two uniformly bounded (by LI or Rl and L5 or R5) so the product H is R4 Exploiting the or L6, the bracketing is Lipshitz over (T x V), ri is formed as a product classes: = FV, and by Theorem 2.10.6 in van der Vaart and Wellner Donsker. Now we show i. using the established Donskerness. Define the process h = (*, 1?) h-> G„<p T (Y - D'a - X'/3)#(X, Z). — —— -^-> -do, we have p(h, h) » ^>o uniformly over compacts and i? n 0, where p denned by the L 2 (P) seminorm p[h) = E\\ip T (Y - D'a - X'/3)*(X, Z)||, so that Since \t is - D'a- X'/3 - HX^Y-y), jr€n,$ef] Ti (1996) * of V= is e F, iren,$ef} monotonicity and boundedness of indicator function and assumptions number * van der Vaart and Wellner (1996) the {(*,tt)i- (D'a-X'P-$(X,Z)'~f), O [logN\ |(e, T, L,2{P))), e n, class of functions (7 ) 2.7.4 in The van der Vaart and Wellner (1996) 2.7.4 in 7 tt 0. . G n <p T (Y - D'a n - > X'/9„)*(X, Z) - G n <p T {Y - D'a 32 X'/3)*(X, Z) = o p (l) is By the above analysis G„Tn[f(W,-& n )] is Donsker (asymptotically Gaussian) as Theorem 2.10.6 in van der Vaart and Wellner (1996) (m which f(W, $ n ) belongs wp — 1 by assumption.) From this > The proof of To show By 00. its 1. > and iv. follows exactly as note that 7 K d,m< in "' ) . i? n 7 in Fc )'7] -^-» F where , Es[f(W, ii., respectively, using that Es[f (W, d )' f] the latter s, Thus for set, = +°o say F, or is H is Donsker. Es[f(W, 3o)'i\ < convex, open, and 7 g F, Es[f(W,ti)'-y]\ v _ vf _f < 00, Es[f{W, tioYl] < 00. denotes the closure of F. 
The analogous argument delivers $E_n s[\hat f(W, \vartheta_n)'\gamma] \to_p \infty$. So iii. follows by taking all the rationals not in the boundary of $F$.

References

Abadie, A. (1995): "Changes in Spanish Labor Income Structure During the 1980s: A Quantile Regression Approach," CEMFI Working Paper No. 9521.
Abadie, A. (2001): "Semiparametric Instrumental Variable Estimation of Treatment Response Models," Harvard, mimeo.
Abadie, A., J. Angrist, and G. Imbens (2001): "Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings," Econometrica, forthcoming.
Amemiya, T. (1975): "The nonlinear limited-information maximum-likelihood estimator and the modified nonlinear two-stage least-squares estimator," J. Econometrics, 3(4), 375-386.
Amemiya, T. (1977): "The maximum likelihood and the nonlinear three-stage least squares estimator in the general nonlinear simultaneous equation model," Econometrica, 45(4), 955-968.
Amemiya, T. (1982): "Two Stage Least Absolute Deviations Estimators," Econometrica, 50, 689-711.
Amemiya, T. (1985): Advanced Econometrics. Harvard University Press.
Andrews, D. (1994): "Empirical Process Methods in Econometrics," in Handbook of Econometrics, Vol. 4, ed. by R. Engle and D. McFadden. North Holland.
Andrews, D. W. K. (1995): "Nonparametric kernel estimation for semiparametric models," Econometric Theory, 11(3), 560-596.
Andrews, D. W. K., and Y.-J. Whang (1990): "Additive interactive regression models: circumvention of the curse of dimensionality," Econometric Theory, 6(4), 466-479.
Angrist, J., and A. Krueger (1992): "Age at School Entry," JASA, 57, 11-25.
Buchinsky, M. (1994): "Changes in U.S. wage structure 1963-87: An application of quantile regression," Econometrica, 62, 405-458.
Buchinsky, M. (1998): "Recent Advances in Quantile Regression Models: A Practical Guideline for Empirical Research," Journal of Human Resources, 33(1), 88-126.
Chamberlain, G. (1987): "Asymptotic efficiency in estimation with conditional moment restrictions," J. Econometrics, 34(3), 305-334.
Chen, L., and S. Portnoy (1996): "Two-stage regression quantiles and two-stage trimmed least squares estimators for structural equation models," Comm. Statist. Theory Methods, 25(5), 1005-1032.
Christoffersen, P. F., J. Hahn, and A. Inoue (1999): "Testing, Comparing, and Combining Value-at-Risk Measures," SSRN working paper.
Das, M. (2001): "Instrumental Variable Estimation of Nonparametric Models with Discrete Endogenous Regressors," mimeo.
Doksum, K. (1974): "Empirical probability plots and statistical inference for nonlinear models in the two-sample case," Ann. Statist., 2, 267-277.
Heckman, J. (1990): "Varieties of Selection Bias," American Economic Review, Papers and Proceedings, 80, 313-338.
Heckman, J., and R. Robb (1986): "Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes," in Drawing Inference from Self-Selected Samples, ed. by H. Wainer, pp. 63-107. Springer-Verlag, New York.
Heckman, J. J., and J. Smith (1997): "Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts," Rev. Econom. Stud., 64(4), 487-535. With the assistance of Nancy Clements, Evaluation of training and other social programmes (Madrid, 1993).
Hogg, R. V. (1975): "Estimates of Percentile Regression Lines Using Salary Data," Journal of the American Statistical Association, 70(349), 56-59.
Hong, H., and E. Tamer (2001): "Estimation of Censored Regression with Endogeneity," Preprint.
Horowitz, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Models," Econometrica, 60(3).
Imbens, G. W. (1997): "One-step estimators for over-identified generalized method of moments models," Rev. Econom. Stud., 64(3), 359-383.
Imbens, G. W., and J. D. Angrist (1994): "Identification and Estimation of Local Average Treatment Effects," Econometrica, 62(2), 467-475.
Imbens, G. W., and W. K. Newey (2001): "Estimation of Nonparametric Triangular Simultaneous Equations," mimeo.
Kitamura, Y., and M. Stutzer (1997): "An information-theoretic alternative to generalized method of moments estimation," Econometrica, 65(4), 861-874.
Knight, K. (1999): "Epi-convergence and Stochastic Equisemicontinuity," Preprint.
Koenker, R. (1994): "Confidence intervals for regression quantiles," in Asymptotic statistics (Prague, 1993), pp. 349-359. Physica, Heidelberg.
Koenker, R. (1998): "Treating the treated, varieties of causal analysis," a note.
Koenker, R., and G. S. Bassett (1978): "Regression Quantiles," Econometrica, 46, 33-50.
Koenker, R., and Y. Bilias (2001): "Quantile Regression for Duration Data: A Reappraisal of the Pennsylvania Reemployment Bonus Experiments," Empirical Economics, to appear, also at http://www.econ.uiuc.edu/~roger/research/home.html.
Koenker, R., and K. Hallock (2000): "Quantile Regression: An Introduction," Journal of Economic Perspectives, forthcoming.
Lehmann, E. L. (1974): Nonparametrics: statistical methods based on ranks. Holden-Day Inc., San Francisco, Calif. With the special assistance of H. J. M. d'Abrera, Holden-Day Series in Probability and Statistics.
MaCurdy, T., and C. Timmins (1998): "Application of Smoothed Quantile Estimation," paper presented at Quantile Regression Conference in Konstanz, mimeo.
Manski, C. F. (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator," Journal of Econometrics, 27, 313-333.
Manski, C. F. (1988): "Ordinal utility models of decision making under uncertainty," Theory and Decision, 25(1), 79-104.
Newey, W. K. (1990): "Efficient instrumental variables estimation of nonlinear models," Econometrica, 58(4), 809-837.
Newey, W. K. (1997): "Convergence rates and asymptotic normality for series estimators," J. Econometrics, 79(1), 147-168.
Newey, W. K., and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. McFadden.
Newey, W. K., and J. L. Powell (1990): "Efficient estimation of linear and type 1 censored regression models under conditional quantile restrictions," Econometric Theory, 6(3), 295-317.
Newey, W. K., and J. L. Powell (2001): "Nonparametric Instrumental Variable Estimation," Preprint.
Newey, W. K., and R. Smith (2001): "Higher Order Properties of Generalized Empirical Likelihood," working paper, MIT.
Portnoy, S., and R. Koenker (1997): "The Gaussian Hare and the Laplacian Tortoise," Statistical Science, 12, 279-300.
Powell, J. L. (1986): "Censored Regression Quantiles," Journal of Econometrics, 32, 143-155.
Robins, J. M., and A. A. Tsiatis (1991): "Correcting for noncompliance in randomized trials using rank preserving structural failure time models," Comm. Statist. Theory Methods, 20(8), 2609-2631.
Rockafellar, R. T., and R. B. Wets (1998): Variational Analysis. Springer-Verlag, Berlin.
van der Vaart, A. W. (1998): Asymptotic statistics. Cambridge University Press, Cambridge.
van der Vaart, A. W., and J. A. Wellner (1996): Weak convergence and empirical processes. Springer-Verlag, New York.
Vytlacil, E. (2000): "Semiparametric Identification of the Average Treatment Effect in Nonseparable Models," mimeo.
Vytlacil, E. (2001): "Selection Model and LATE model: Equivalence Result," Econometrica, forthcoming.