Massachusetts Institute of Technology
Department of Economics
Working Paper Series

INFERENCE ON COUNTERFACTUAL DISTRIBUTIONS

Victor Chernozhukov
Ivan Fernandez-Val
Blaise Melly

Working Paper 08-1
August 8, 2008
Revised: April 4, 2009

Room E52-251, 50 Memorial Drive, Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://ssrn.com/abstract=1235529

Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries. http://www.archive.org/details/inferenceoncount00cher2

Abstract. In this paper we develop procedures for performing inference in regression models about how potential policy interventions affect the entire marginal distribution of an outcome of interest. These policy interventions consist of either changes in the distribution of covariates related to the outcome holding the conditional distribution of the outcome given covariates fixed, or changes in the conditional distribution of the outcome given covariates holding the marginal distribution of the covariates fixed. Under either of these assumptions, we obtain uniformly consistent estimates and functional central limit theorems for the counterfactual and status quo marginal distributions of the outcome as well as other function-valued effects of the policy, including, for example, the effects of the policy on the marginal distribution function, quantile function, and other related functionals. We construct simultaneous confidence sets for these functions; these sets take into account the sampling variation in the estimation of the relationship between the outcome and covariates. Our procedures rely on, and our theory covers, all main regression approaches for modeling and estimating conditional distributions, focusing especially on classical, quantile, duration, and distribution regressions. Our procedures are general and accommodate both simple unitary changes in the values of a given covariate as well as changes in the distribution of the covariates or the conditional distribution of the outcome given covariates of general form. We apply the procedures to examine the effects of labor market institutions on the U.S. wage distribution.

Key Words: Policy effects, counterfactual distribution, quantile regression, duration regression, distribution regression

JEL Classification: C14, C21, C41, J31, J71

Date: April 4, 2009. This paper replaces the earlier independent projects started in 2005, "Inference on Counterfactual Distributions Using Conditional Quantile Models," by Chernozhukov and Fernandez-Val, and "Estimation of Counterfactual Distributions Using Quantile Regression," by Melly. We would like to thank Alberto Abadie, Josh Angrist, Manuel Arellano, David Autor, Arun Chandrasekhar, Flavio Cunha, Brigham Frandsen, Jerry Hausman, Michael Jansson, Joonhwan Lee, Pierre-Andre Maugis, and seminar participants at Banff International Research Station Conference on Semiparametric and Nonparametric Methods in Econometrics, Berkeley, Boston University, CEMFI, Columbia, Harvard/MIT, MIT, Ohio State, St. Gallen, and 2008 Winter Econometric Society Meetings for very useful comments that helped improve the paper. Companion software developed by the authors (counterfactual package for Stata) is available from Blaise Melly.

1. Introduction

A basic objective in empirical economics is to predict the effect of a potential policy intervention or a counterfactual change in economic conditions on some outcome variable of interest. For example, we might be interested in what the wage distribution would be in 2000 if workers had the same characteristics as in 1990, what the distribution of infant birth weights would be for black mothers if they received the same amount of prenatal care as white mothers, what the distribution of consumer expenditure would be if we changed the income tax, or what the distribution of housing prices would be if we cleaned up a local hazardous waste site. In other examples, we might be interested in what the distribution of wages for female workers would be in the absence of gender discrimination in the labor market (e.g., if female workers are paid as male workers with the same characteristics), or what the distribution of wages for black workers would be in the absence of racial discrimination in the labor market (e.g., if black workers are paid as white workers with the same characteristics). More generally, we can think of a policy intervention either as a change in the distribution of a set of explanatory variables $X$ that determine the outcome variable of interest $Y$, or as a change in the conditional distribution of $Y$ given $X$. Policy analysis consists of estimating the effect on the distribution of $Y$ of a given change in the distribution of $X$ or in the conditional distribution of $Y$ given $X$.

In this paper we develop procedures to perform inference in regression models about how these counterfactual policy interventions affect the entire marginal distribution of $Y$. The main assumption is that either the policy does not alter the conditional distribution of $Y$ given $X$ and only alters the marginal distribution of $X$, or that the policy does not alter the marginal distribution of $X$ and only alters the conditional distribution of $Y$ given $X$. Starting from estimates of the conditional distribution or quantile functions of the outcome given covariates, we obtain uniformly consistent estimates for functionals of the marginal distribution function of the outcome before and after the intervention. Examples of these functionals include distribution functions, quantile functions, quantile policy effects, distribution policy effects, means, variances, and Lorenz curves. We then construct confidence sets around these estimates that take into account the sampling variation coming from the estimation of the conditional model. These confidence sets are uniform in the sense that they cover the entire functional of interest with pre-specified probability.

Our analysis specifically targets and covers the principal approaches to estimating conditional distribution models most often used in empirical work, including classical, quantile, duration, and distribution regressions. Moreover, our approach can be used to analyze the effect of both simple interventions consisting of unitary changes in the values of a given covariate as well as more elaborate policies consisting of general changes in the covariate distribution or in the conditional distribution of the outcome given covariates. In addition, the counterfactual distribution of $X$ and conditional distribution of $Y$ given $X$ can correspond to known transformations of these distributions or to the distributions in a different subpopulation or group. This array of alternatives allows us to answer a wide variety of policy questions such as the ones mentioned in the first paragraph.

To develop the inference results, we establish the (Hadamard) differentiability of the functionals of the marginal distribution functions before and after the policy with respect to the limit of the functional estimators of the conditional model of the outcome given the covariates.
This result allows us to derive the asymptotic distribution for the functionals of interest, taking into account the sampling variation coming from the first stage estimation of the relationship between the outcome and covariates, by means of the functional delta method. Moreover, this general approach based on functional differentiability allows us to establish the validity of convenient resampling methods, such as bootstrap and other simulation methods, to make uniform inference on the functionals of interest. Because our analysis relies only on the conditional quantile estimators or conditional distribution estimators satisfying a functional central limit theorem, it applies quite broadly and we show it covers the major regression methods listed above. As a consequence, we cover a wide array of techniques, though in the discussion we devote attention primarily to the most practical and commonly used methods of estimating conditional distribution and quantile functions.

This paper contributes to the previous literature on estimating policy effects using regression methods. In particular, important developments include the work of Stock (1989), which introduced regression-based estimators to evaluate the mean effect of policy interventions, and of Gosling, Machin, and Meghir (2000) and Machado and Mata (2005), which proposed quantile regression-based policy estimators to evaluate distributional effects of policy interventions, but did not provide distribution or inference theory for these estimators. Our paper contributes to this literature by providing regression-based policy estimators to evaluate quantile, distributional, and other effects (e.g., Lorenz and Gini effects) of a general policy intervention and by deriving functional limit theory as well as practical inferential tools for these policy estimators. Our policy estimators are based on a rich variety of regression models for the conditional distribution, including classical, quantile, duration, and distribution regressions. In particular, our theory covers the previous estimators of Gosling, Machin, and Meghir (2000) and Machado and Mata (2005) as important special cases. In fact, our limit theory is generic and applies to any estimator of the conditional distribution that satisfies a functional central limit theorem. Accordingly, we cover not only a wide array of the most practical current approaches for estimating conditional distributions, but also many other existing and future approaches, including, for example, approaches that accommodate endogeneity (Abadie, Angrist, and Imbens, 2002, Chesher, 2003, Chernozhukov and Hansen, 2005, and Imbens and Newey, 2009).

Our paper is also related to the literature that evaluates policy effects and treatment effects using propensity score methods. The influential article of DiNardo, Fortin, and Lemieux (1996) developed estimators for counterfactual densities using propensity score reweighting in the spirit of Horvitz and Thompson (1952). Important related work by Hirano, Imbens, and Ridder (2003) and Firpo (2007) used a similar reweighting approach in exogenous treatment effects models to construct efficient estimators of average and quantile treatment effects, respectively. As we comment later in the paper, it is possible to adapt the reweighting methods of these articles to develop policy estimators and limit theory for such estimators. Here, however, we focus on developing inferential theory for policy estimators based on regression methods, thus supporting empirical research using regression techniques as its primary method (Buchinsky, 1994, Chamberlain, 1994, Han and Hausman, 1990, Machado and Mata, 2005). The recent book of Angrist and Pischke (2008, Chap. 3) provides a nice comparative discussion of regression and propensity score methods.
Finally, a related work by Firpo, Fortin, and Lemieux (2007) studied the effects of special policy interventions consisting of marginal changes in the values of the covariates. As we comment later in the paper, their approach, based on a linearization of the functionals of interest, is quite different from ours. In particular, our approach focuses on more general non-marginal changes in both the marginal distribution of covariates and conditional distribution of the outcome given covariates.

Footnote on the choice of regression models: We focus on semi-parametric estimators due to their dominant role in empirical work (Angrist and Pischke, 2008). In contrast, fully nonparametric estimators are practical only in situations with a small number of regressors. In future work, however, we hope to extend the analysis to nonparametric estimators.

Footnote on approaches that accommodate endogeneity: In this case, the literature provides estimators for $F_{Y_d}$, the distribution of potential outcome $Y$ under treatment $d$, and $F_{D,Z}$, the joint distributions of (endogenously determined) treatment status $D$ and exogenous regressors $Z$ before and after policy. As long as the estimator of $F_{Y_d}$ satisfies the functional central limit theorem specified in the main text and the estimator of $F_{D,Z}$ satisfies the functional central limit theorem specified in Appendix D, our inferential theory applies to the resulting policy estimators.

We illustrate our estimation and inference procedures with an analysis of the evolution of the U.S. wage distribution. Our analysis is motivated by the influential article by DiNardo, Fortin, and Lemieux (1996), which studied the institutional and labor market determinants of the changes in the wage distribution between 1979 and 1988 using data from the CPS. We complement and complete their analysis by using a wider range of techniques, including quantile regression and distribution regression, providing standard errors for the estimates of the main effects, and extending the analysis to the entire distribution using simultaneous confidence bands. Our results reinforce the importance of the decline in the real minimum wage in explaining the increase in wage inequality. They also indicate the importance of changes in both the composition of the workforce and the returns to worker characteristics in explaining the evolution of the entire wage distribution. Our results show that, after controlling for other composition effects, the process of de-unionization during the 80s played a minor role in explaining the evolution of the wage distribution.

We organize the rest of the paper as follows. In Section 2 we describe methods for performing counterfactual analysis, setting up the modeling assumptions for the counterfactual outcomes, and introduce the policy estimators. In Section 3 we derive distributional results and inferential procedures for the policy estimators. In Section 4 we present the empirical application, and in Section 5 we give a summary of the main results. In the Appendix, we include proofs and additional theoretical results.

2. Methods for Counterfactual Analysis

2.1. Observed and counterfactual outcomes. In our analysis it is important to distinguish between observed and counterfactual outcomes. Observed outcomes come from the population before the policy intervention, whereas (unobserved) counterfactual outcomes come from the population after the potential policy intervention. We use the observed outcomes and covariates to establish the relationship between outcome and covariates and the distribution of the covariates, which, together with either a postulated distribution of the covariates under the policy or a postulated conditional distribution of outcomes given covariates under the policy, determine the distribution of the outcome after the policy intervention, under conditions precisely stated below.

We divide our population in two groups or subpopulations indexed by $j \in \{0, 1\}$.
Index 0 corresponds to the status quo or reference group, whereas index 1 corresponds to the group from which we obtain the marginal distribution of $X$ or the conditional distribution of $Y$ given $X$ to generate the counterfactual outcome distribution.

Footnote on joint changes: Our results also cover the policy intervention of changing both the marginal distribution of $X$ and the conditional distribution of $Y$ given $X$. In this case the counterfactual outcome corresponds to the observed outcome in group 1.

In order to discuss various regression models of outcomes given covariates, it is convenient to consider the following representation. Let $Q_{Y_j}(u|x)$ be the conditional $u$-quantile of $Y$ given $X$ in group $j$, and let $F_{X_k}$ be the marginal distribution of the $p$-vector of covariates $X$ in group $k$, for $j, k \in \{0, 1\}$. We can describe the observed outcome $Y_j^j$ in group $j$ as a function of covariates and a non-additive disturbance $U_j^j$ via the Skorohod representation:

$$Y_j^j = Q_{Y_j}(U_j^j \mid X_j), \quad \text{where } U_j^j \sim U(0,1) \text{ independently of } X_j \sim F_{X_j}, \text{ for } j \in \{0, 1\}.$$

Here the conditional quantile function plays the role of a link function. More generally we can think of $Q_{Y_j}(u|x)$ as a structural or causal function mapping the covariates and the disturbance to the outcome, where the covariate vector can include control variables to account for endogeneity. In the classical regression model, the disturbance is separable from the covariates, as in the location shift model described below, but generally it need not be. Our analysis will cover either case.

We consider two different counterfactual experiments. The first experiment consists of drawing the vector of covariates from the distribution of covariates in group 1, i.e., $X_1 \sim F_{X_1}$, while keeping the conditional quantile function as in group 0, $Q_{Y_0}(u|x)$. The counterfactual outcome $Y_0^1$ is therefore generated by

$$Y_0^1 := Q_{Y_0}(U_0^1 \mid X_1), \quad \text{where } U_0^1 \sim U(0,1) \text{ independently of } X_1 \sim F_{X_1}. \qquad (2.1)$$

This construction assumes that we can evaluate the quantile function $Q_{Y_0}(u|x)$ at each point $x$ in the support of $X_1$. This requires that either the support of $X_1$ is a subset of the support of $X_0$ or we can extrapolate the quantile function outside the support of $X_0$.

For purposes of analysis, it is useful to distinguish two different ways of constructing the alternative distributions of the covariates. (1) The covariates before and after the policy arise from two different populations or subpopulations. These populations might correspond to different demographic groups, time periods, or geographic locations. Specific examples include the distributions of worker characteristics in different years and distributions of socioeconomic characteristics for black versus white mothers. (2) The covariates under the policy intervention arise as some known transformation of the covariates in group 0; that is, $X_1 = g(X_0)$, where $g(\cdot)$ is a known function. This case covers, for example, unitary changes in the location of one of the covariates, $X_1 = X_0 + e_j$, where $e_j$ is a unitary $p$-vector with a one in the position $j$; or mean preserving redistributions of the covariates implemented as $X_1 = (1 - \alpha) E[X_0] + \alpha X_0$. These types of policies are useful for estimating the effect of smoking on the marginal distribution of infant birth weights, the effect of a change in taxation on the marginal distribution of food expenditure, or the effect of cleaning up a local hazardous waste site on the marginal distribution of housing prices (Stock, 1991). Even though these two cases correspond to conceptually different thought experiments, our econometric analysis will cover either situation within a unified framework.
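To make the first experiment concrete, the following is a minimal simulation sketch of (2.1) under case (2), a known transformation of the covariates. Everything specific in it is an assumption made for illustration and is not taken from the paper: the functional form of `q_y0` (a location-scale model with a logistic disturbance), the uniform covariate, and the value `alpha = 0.5`.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_y0(u, x):
    """Hypothetical group-0 conditional quantile function Q_{Y_0}(u | x).

    Location-scale form with a logistic disturbance, chosen only for illustration.
    """
    location = 1.0 + 2.0 * x
    scale = 0.5 + 0.25 * x              # positive on the support used below
    return location + scale * np.log(u / (1.0 - u))

n = 200_000
x0 = rng.uniform(0.0, 1.0, size=n)      # status quo covariates, X_0 ~ F_{X_0}

# Observed outcome: Y_0^0 = Q_{Y_0}(U | X_0), with U ~ U(0,1) independent of X_0.
y00 = q_y0(rng.uniform(size=n), x0)

# Known transformation g: mean-preserving redistribution X_1 = (1 - a) E[X_0] + a X_0.
alpha = 0.5
x1 = (1.0 - alpha) * x0.mean() + alpha * x0

# Counterfactual outcome (2.1): same quantile function, independent uniform draw, new covariates.
y01 = q_y0(rng.uniform(size=n), x1)

# Quantile policy effects at a few quantile indices.
taus = np.array([0.1, 0.5, 0.9])
quantile_effect = np.quantile(y01, taus) - np.quantile(y00, taus)
```

In this design the support of the transformed covariate lies inside the support of the original one, so no extrapolation of the quantile function is needed. The redistribution leaves the mean of the outcome roughly unchanged while pulling the lower quantiles up and the upper quantiles down, which illustrates why a policy can have heterogeneous effects across the outcome distribution.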
The second experiment consists of generating the outcome from the conditional quantile function in group 1, $Q_{Y_1}(u|x)$, while keeping the marginal distribution of the covariates as in group 0, that is, $X_0 \sim F_{X_0}$. The counterfactual outcome $Y_1^0$ is therefore generated by

$$Y_1^0 := Q_{Y_1}(U_1^0 \mid X_0), \quad \text{where } U_1^0 \sim U(0,1) \text{ independently of } X_0 \sim F_{X_0}. \qquad (2.2)$$

This construction assumes that we can evaluate the quantile function $Q_{Y_1}(u|x)$ at each point $x$ in the support of $X_0$. This requires that either the support of $X_0$ is a subset of the support of $X_1$ or we can extrapolate the quantile function outside the support of $X_1$.

In this second experiment, the conditional quantile functions before and after the policy intervention may arise from two different populations or subpopulations. These populations might correspond to different demographic groups, time periods, or geographic locations. This type of policy is useful for conceptualizing, for example, what the distribution of wages for female workers would be if they were paid as male workers with the same characteristics, or similarly for blacks or other minority groups.

We formally state the assumptions mentioned above as follows:

Condition M. Counterfactual outcome variables of interest are generated by either (2.1) or (2.2). The conditional distributions of the outcome given the covariates in both groups, namely the conditional quantile functions $Q_{Y_j}(\cdot|\cdot)$ or the conditional distribution functions $F_{Y_j}(\cdot|\cdot)$ for $j \in \{0, 1\}$, apply or can be extrapolated to all $x \in \mathcal{X}$, where $\mathcal{X}$ is a compact subset of $\mathbb{R}^p$ that contains the supports of $X_0$ and $X_1$.

2.2. Parameters of interest. The primary (function-valued) parameters of interest are the distribution and quantile functions of the outcome before and after the policy as well as functionals derived from them.
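Returning briefly to the second experiment, the sketch below simulates (2.2) and checks the support requirement behind Condition M on the simulated samples. The quantile functions in `q_y`, the covariate distributions, and the slopes are hypothetical choices made only for illustration; they are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def q_y(u, x, group):
    """Hypothetical conditional quantile functions Q_{Y_j}(u | x), j in {0, 1}."""
    slope = 1.0 if group == 0 else 1.5   # group 1 rewards x more (assumed)
    return 0.5 + slope * x + 0.3 * np.log(u / (1.0 - u))

n = 100_000
x0 = rng.uniform(0.2, 0.8, size=n)       # covariates in group 0
x1 = rng.uniform(0.0, 1.0, size=n)       # covariates in group 1

# Support requirement for (2.2): the support of X_0 should lie inside the support
# of X_1, otherwise Q_{Y_1} has to be extrapolated. Checked here on the samples.
support_ok = (x0.min() >= x1.min()) and (x0.max() <= x1.max())

# Observed outcome in group 0 and counterfactual outcome (2.2):
# group-1 quantile function evaluated at group-0 covariates.
y00 = q_y(rng.uniform(size=n), x0, group=0)
y10 = q_y(rng.uniform(size=n), x0, group=1)

# In this design the mean effect is (1.5 - 1.0) * E[X_0] = 0.25.
mean_effect = y10.mean() - y00.mean()
```

Only the conditional quantile function changes between `y00` and `y10`; the covariates are held at their group-0 values, which is what distinguishes this experiment from the first one.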
In order to define these parameters, we first recall that the conditional distribution associated with the quantile function $Q_{Y_j}(u|x)$ is:

$$F_{Y_j}(y|x) = \int_0^1 1\{Q_{Y_j}(u|x) \le y\}\, du, \quad j \in \{0, 1\}. \qquad (2.3)$$

Given our definitions (2.1) or (2.2) of the counterfactual outcome, the marginal distributions of interest are:

$$F_{Y_j}^k(y) := \Pr\{Y_j^k \le y\} = \int_{\mathcal{X}} F_{Y_j}(y|x)\, dF_{X_k}(x), \quad j, k \in \{0, 1\}. \qquad (2.4)$$

The corresponding marginal quantile functions are:

$$Q_{Y_j}^k(u) = \inf\{y : F_{Y_j}^k(y) \ge u\}, \quad j, k \in \{0, 1\}.$$

The $u$-quantile policy effect and the $y$-distribution policy effect are:

$$QE_{Y_j}^k(u) = Q_{Y_j}^k(u) - Q_{Y_0}^0(u) \quad \text{and} \quad DE_{Y_j}^k(y) = F_{Y_j}^k(y) - F_{Y_0}^0(y), \quad j, k \in \{0, 1\}.$$

It is useful to mention a couple of examples to understand the notation. For instance, $Q_{Y_0}^1(u) - Q_{Y_0}^0(u)$ is the quantile effect under a policy that changes the marginal distribution of covariates from $F_{X_0}$ to $F_{X_1}$, fixing the conditional distribution of outcome to $F_{Y_0}(y|x)$. On the other hand, $Q_{Y_1}^0(u) - Q_{Y_0}^0(u)$ is the quantile effect under a policy that changes the conditional distribution of the outcome from $F_{Y_0}(y|x)$ to $F_{Y_1}(y|x)$, fixing the marginal distribution of covariates to $F_{X_0}$.

Other parameters of interest include, for example, Lorenz curves of the observed and counterfactual outcomes. Lorenz curves, commonly used to measure inequality, are ratios of partial means to overall means,

$$L(y, F_{Y_j}^k) = \int_0^y t\, dF_{Y_j}^k(t) \Big/ \int_0^\infty t\, dF_{Y_j}^k(t),$$

defined for non-negative outcomes only. More generally, we might be interested in arbitrary functionals of the marginal distributions of the outcome before and after the interventions

$$H_Y(y) := \phi(y, F_{Y_0}^0, F_{Y_0}^1, F_{Y_1}^0, F_{Y_1}^1). \qquad (2.5)$$

These functionals include the previous examples as special cases as well as other examples such as means, with $H_Y(y) = \int t\, dF_{Y_j}^k(t) =: \mu_{Y_j}^k$; mean policy effects, with $H_Y(y) = \mu_{Y_j}^k - \mu_{Y_0}^0$; variances, with $H_Y(y) = \int t^2\, dF_{Y_j}^k(t) - (\mu_{Y_j}^k)^2 =: (\sigma_{Y_j}^k)^2$; variance policy effects, with $H_Y(y) = (\sigma_{Y_j}^k)^2 - (\sigma_{Y_0}^0)^2$; Lorenz policy effects, with $H_Y(y) = L(y, F_{Y_j}^k) - L(y, F_{Y_0}^0) =: LE_{Y_j}^k(y)$; Gini coefficients, with $H_Y(y) = 1 - 2 \int L(y, F_{Y_j}^k)\, dy =: G_{Y_j}^k$; and Gini policy effects, with $H_Y(y) = G_{Y_j}^k - G_{Y_0}^0 =: GE_{Y_j}^k$.

Footnote on the general functional: In the rest of the discussion we keep the distribution, quantile, quantile policy effects, and distribution policy effects functions as separate cases to emphasize the importance of these functionals in practice. Lorenz curves are special cases of the general functional with $H_Y(y) = \int_0^y t\, dF_Y(t) / \int_0^\infty t\, dF_Y(t)$, and will not be considered separately.

In the case where the policy consists of either a known transformation of the covariates, $X_1 = g(X_0)$, or a change in the conditional distribution of $Y$ given $X$, we can also identify the distribution and quantile functions for the effect of the policy, $\Delta_j^k = Y_j^k - Y_0^0$, by:

$$F_{\Delta_j}^k(\delta) = \int_{\mathcal{X}} \int_0^1 1\{Q_{\Delta_j}(u|x) \le \delta\}\, du\, dF_{X_k}(x), \quad j, k \in \{0, 1\}, \qquad (2.6)$$

where $Q_{\Delta_g}(u|x) = Q_{Y_0}(u|g(x)) - Q_{Y_0}(u|x)$ and $Q_{\Delta_1}(u|x) = Q_{Y_1}(u|x) - Q_{Y_0}(u|x)$; and

$$Q_{\Delta_j}^k(\alpha) = \inf\{\delta : F_{\Delta_j}^k(\delta) \ge \alpha\}, \quad j, k \in \{0, 1\}, \qquad (2.7)$$

under the additional assumption (Heckman, Smith, and Clements, 1997):

Condition RP. Conditional rank preservation: $U_0^1 = U_0^0 \mid X_0$ and $U_1^0 = U_0^0 \mid X_0$.

2.3. Conditional models. The preceding analysis shows that the marginal distribution and quantile functions of interest depend on either the underlying conditional quantile function or conditional distribution function. Thus, we can proceed by modeling and estimating either of these conditional functions. We can rely on several principal approaches to carrying out these tasks. In this section we drop the dependence on the group index to simplify the notation.

Example 1. Classical regression and generalizations. Classical regression is one of the principal approaches to modeling and estimating conditional quantile functions. The classical location-shift model takes the form

$$Y = m(X) + V, \quad V = Q_V(U), \qquad (2.8)$$

where $U \sim U(0,1)$ is independent of $X$, and $m(\cdot)$ is a location function such as the conditional mean. The disturbance $V$ has the quantile function $Q_V(u)$, and $Y$ therefore has conditional quantile function $Q_Y(u|x) = m(x) + Q_V(u)$. This model is parsimonious in that covariates impact the outcome only through the location. Even though this is a location model, it is clear that a general change in the distribution of covariates or the conditional quantile function can have heterogeneous effects on the entire marginal distribution of $Y$, affecting its various quantiles in a differential manner. The most common model for the regression function $m(x)$ is linear in parameters, $m(x) = x'\beta$, and we can estimate it using least squares or instrumental variable methods. We can leave the quantile function $Q_V(u)$ unrestricted and estimate it using the empirical quantile function of the residuals. Our results cover such common estimation schemes as special cases, since we only require the estimates to satisfy a functional central limit theorem.

The location model has played a classical role in regression analysis. Many endogenous and exogenous treatment effects models, for example, can be analyzed and estimated using variations of this model (Cameron and Trivedi, 2005, Chap. 25, and Imbens and Wooldridge, 2008). A variety of standard survival and duration models also imply (2.8) after a transformation, such as the Cox model with Weibull hazard or the accelerated failure time model, cf. Doksum and Gasko (1990).

The location-scale shift model is a generalization that enables the covariates to impact the conditional distribution through the scale function as well:

$$Y = m(X) + \sigma(X) \cdot V, \quad V = Q_V(U),$$

where $U \sim U(0,1)$ independently of $X$, and $\sigma(\cdot)$ is a positive scale function. In this model the conditional quantile function takes the form $Q_Y(u|x) = m(x) + \sigma(x) Q_V(u)$. It is clear that changes in the distribution of $X$ or in $Q_Y(u|x)$ can have a nontrivial effect on the entire marginal distribution of $Y$, affecting its various quantiles in a differential manner. This model can be estimated through a variety of means (see, e.g., Rutemiller and Bowers, 1968, and Koenker and Xiao, 2002).

Example 2. Quantile regression. We can also rely on quantile regression as a principal approach to modeling and estimating conditional quantile functions. In this approach, we have the general non-separable representation

$$Y = Q_Y(U|X).$$

The model permits covariates to impact the outcome by changing not only the location and scale of the distribution but also its entire shape. An early convincing example of such effects goes back to Doksum (1974), who showed that real data can be sharply inconsistent with the location-scale shift paradigm. Quantile regression precisely addresses this issue. The leading approach to quantile regression entails approximating the conditional quantile function by a linear form $Q_Y(u|x) = x'\beta(u)$. Koenker (2005) provides an excellent review of this method.

Quantile regression allows researchers to fit parsimonious models to the entire conditional distribution. It has become an increasingly important empirical tool in applied economics. In labor economics, for example, quantile regression has been widely used to model changes in the wage distribution (Buchinsky, 1994, Chamberlain, 1994, Abadie, 1997, Gosling, Machin, and Meghir, 2000, Machado and Mata, 2005, Angrist, Chernozhukov, and Fernandez-Val, 2006, and Autor, Katz, and Kearney, 2006b). Variations of quantile regression can be used to obtain quantile and distribution treatment effects in endogenous and exogenous treatment effects models (Abadie, Angrist, and Imbens, 2002, Chernozhukov and Hansen, 2005, and Firpo, 2007).

Example 3. Duration regression.
A common way to model conditional distribution functions in duration and survival analysis is through the transformation model:

$$F_Y(y|x) = 1 - \exp(-\exp(m(x) + t(y))), \qquad (2.9)$$

where $t(\cdot)$ is a monotonic transformation. This model is rather rich, yet the role of covariates is limited in an important way. In particular, the model leads to the following location-shift representation (up to the sign convention for the location function $m$):

$$t(Y) = m(X) + V,$$

where $V$ has an extreme value distribution and is independent of $X$. Therefore, covariates impact a monotone transformation of the outcome only through the location function. The estimation of this model is the subject of a large and important literature (e.g., Lancaster, 1990, Donald, Green, and Paarsch, 2000, and Dabrowska, 2005).

Example 4. Distribution regression. Instead of restricting attention to transformation models for the conditional distribution, we can consider directly modeling $F_Y(y|x)$ separately for each threshold $y$. An example is the model

$$F_Y(y|x) = \Lambda(m(y, x)),$$

where $\Lambda$ is a known link function and $m(y, x)$ is unrestricted in $y$. This specification includes the previous example as a special case (put $\Lambda(v) = 1 - \exp(-\exp(v))$ and $m(y, x) = m(x) + t(y)$) and allows for a more flexible effect of the covariates. The leading example of this specification would be a probit or logit link function $\Lambda$ and $m(y, x) = x'\beta(y)$, where $\beta(y)$ is an unknown function in $y$ (Han and Hausman, 1990, and Foresi and Peracchi, 1995). This approach is similar in spirit to quantile regression. In particular, as with quantile regression, this approach leads to the specification $Y = Q_Y(U|X) = m^{-1}(\Lambda^{-1}(U), X)$, where $U \sim U(0,1)$ independently of $X$.

Footnote on linear specifications: Throughout, by "linear" we mean specifications that are linear in the parameters but could be highly non-linear in the original covariates; that is, if the original covariate is $X$, then the conditional quantile function takes the form $z'\beta(u)$ where $z = f(x)$.

2.4. Policy estimators and inference questions. All of the preceding approaches generate estimates $\hat F_{Y_j}(y|x)$, $j \in \{0, 1\}$, of the conditional distribution functions either directly or indirectly using the relation (2.3):

$$\hat F_{Y_j}(y|x) = \int_0^1 1\{\hat Q_{Y_j}(u|x) \le y\}\, du, \quad j \in \{0, 1\},$$

where $\hat Q_{Y_j}(u|x)$ is a given estimate of the conditional quantile function. We then estimate the marginal distribution functions and quantile functions for the outcome by

$$\hat F_{Y_j}^k(y) = \int_{\mathcal{X}} \hat F_{Y_j}(y|x)\, d\hat F_{X_k}(x) \quad \text{and} \quad \hat Q_{Y_j}^k(u) = \inf\{y : \hat F_{Y_j}^k(y) \ge u\}, \qquad (2.10)$$

respectively, for $j, k \in \{0, 1\}$. We estimate the quantile and distribution policy effects by

$$\widehat{QE}_{Y_j}^k(u) = \hat Q_{Y_j}^k(u) - \hat Q_{Y_0}^0(u), \quad \text{and} \quad \widehat{DE}_{Y_j}^k(y) = \hat F_{Y_j}^k(y) - \hat F_{Y_0}^0(y).$$

We estimate the general functional introduced in (2.5) similarly, using the plug-in rule:

$$\hat H_Y(y) = \phi(y, \hat F_{Y_0}^0, \hat F_{Y_0}^1, \hat F_{Y_1}^0, \hat F_{Y_1}^1). \qquad (2.11)$$

For example, in this way we can construct estimates of the distribution and quantiles of the effects defined in (2.6) and (2.7).

Common inference questions that arise in policy analysis involve features of the distribution of the outcome before and after the intervention. For example, we might be interested in the average effect of the policy, or in quantile policy effects at several quantiles to measure the impact of the policy on different parts of the outcome distribution. More generally, many questions of interest in this analysis involve the entire distribution or quantile functions of the outcome. Examples include the hypotheses that the policy has no effect, that the effect is constant, or that it is positive for the entire distribution (McFadden, 1989, Barrett and Donald, 2003, Koenker and Xiao, 2002, Linton, Maasoumi, and Whang, 2005). The statistical problem is to account for the sampling variability in the estimation of the conditional model to make inference on the functionals of interest. Section 3 provides limit distribution theory for the policy estimators.
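The plug-in construction of Section 2.4 can be sketched in a few lines for the classical location-shift model of Example 1. This is an illustrative sketch only, not the authors' companion Stata package: the simulated design, the grid `y_grid`, and the helper names `cond_cdf_hat`, `marginal_cdf_hat`, and `left_inverse` are all hypothetical. It covers point estimation only and does not attempt the inference theory of Section 3.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (hypothetical design): group 0 supplies outcomes and covariates,
# group 1 supplies only the counterfactual covariate distribution.
n0, n1 = 400, 400
x0 = rng.uniform(0.0, 1.0, size=n0)
y0 = 1.0 + 2.0 * x0 + rng.normal(scale=0.5, size=n0)
x1 = rng.uniform(0.5, 1.0, size=n1)

# Step 1: classical regression in group 0 (least squares, then sorted residuals).
design0 = np.column_stack([np.ones(n0), x0])
beta_hat, *_ = np.linalg.lstsq(design0, y0, rcond=None)
resid = np.sort(y0 - design0 @ beta_hat)

def cond_cdf_hat(y, x):
    """Estimated conditional CDF at y: share of residuals e with fitted(x) + e <= y."""
    fitted = beta_hat[0] + beta_hat[1] * np.asarray(x)
    counts = np.searchsorted(resid, y - fitted, side="right")
    return counts / resid.size

def marginal_cdf_hat(y_grid, x_sample):
    """Plug-in step: average the conditional CDF over the empirical covariate distribution."""
    return np.array([cond_cdf_hat(y, x_sample).mean() for y in y_grid])

def left_inverse(y_grid, cdf_vals, u):
    """Quantile as the left-inverse on the grid: smallest grid point with CDF >= u."""
    idx = np.searchsorted(cdf_vals, u, side="left")
    return y_grid[np.minimum(idx, y_grid.size - 1)]

y_grid = np.linspace(-2.0, 6.0, 801)
f00 = marginal_cdf_hat(y_grid, x0)      # status quo distribution
f01 = marginal_cdf_hat(y_grid, x1)      # counterfactual covariates, group-0 model

taus = np.array([0.25, 0.5, 0.75])
qe_hat = left_inverse(y_grid, f01, taus) - left_inverse(y_grid, f00, taus)
de_hat = f01 - f00                      # distribution policy effect on the grid
```

Restricting the quantile inversion to a finite grid is a simplification made here for brevity; the resolution of `y_grid` bounds the precision of `qe_hat`. Swapping `cond_cdf_hat` for an estimate based on quantile or distribution regression would leave the remaining steps unchanged.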
The limit theory in Section 3 applies to the entire marginal distribution and quantile functions of the outcome before and after the policy, and therefore is valid for performing either uniform inference about the entire distribution function, quantile function, or other functionals of interest, or pointwise inference about values of these functions at a specific point.

2.5. Alternative approaches. An alternative way to proceed with policy analysis is to use reweighting methods (DiNardo, 2002). Indeed, under Condition M, we can express the marginal distribution of the counterfactual outcome in (2.4) as

$$F_{Y_j}^k(y) = \int_{\mathcal{Y}} \int_{\mathcal{X}} 1\{y' \le y\}\, w_j^k(x)\, dF_{Y_j}(y'|x)\, dF_{X_j}(x), \quad j, k \in \{0, 1\}, \qquad (2.12)$$

where, for $k \ne j$, $w_j^k(x) = f_{X_k}(x)/f_{X_j}(x) = p_j (1 - p_j(x)) / [(1 - p_j)\, p_j(x)]$, $p_j(x) := \Pr\{J = j \mid X = x\}$ is the propensity score, $p_j = \Pr\{J = j\}$, $J$ is an indicator for the group, $f_{X_j}$ is the density of the covariate given $J = j$, and $\mathcal{Y}$ is the support of $Y$. The second form of the weighting function $w_j^k$ follows from Bayes' rule. We can use the expression (2.12) along with either density or propensity score weighting to construct policy estimators. Firpo (2007) used a similar propensity score reweighting approach to derive efficient estimators of quantile effects in treatment effect models. With some work, one can adapt the results of Firpo (2007) to obtain the results needed to perform pointwise inference, namely, inference on quantile policy effects at a specific point. However, more work is needed to develop the results required for uniform inference on the entire quantile or distribution function. We are carrying out such work in a companion paper.

Footnote on propensity score methods: See Angrist and Pischke (2008) for a detailed review of propensity score methods and a comparison to regression methods in the context of treatment effect models. The pros and cons of these two methods are also likely to apply to policy analysis. In this paper we focus on the regression method.

In a recent important development, Firpo, Fortin, and Lemieux (2007) propose an alternative useful procedure to estimate policy effects of changes in the distribution of $X$. Given a functional of interest $\phi$, they use a first order approximation of the policy effect:

$$\phi(F_{Y_0}^1) - \phi(F_{Y_0}^0) = \phi'(F_{Y_0}^1 - F_{Y_0}^0) + R(F_{Y_0}^1, F_{Y_0}^0),$$

where $\phi'(F_{Y_0}^1 - F_{Y_0}^0) = \int a(y, F_{Y_0}^0)\, d(F_{Y_0}^1(y) - F_{Y_0}^0(y))$ is the first order linear approximation term, the function $a$ is the influence or score function, and $R(F_{Y_0}^1, F_{Y_0}^0)$ is the remaining approximation error. In the context of our problem, this approximation error is generally not equal to zero and does not vanish with the sample size. Firpo, Fortin, and Lemieux (2007) propose a practical mean regression method to estimate the first order term $\phi'(F_{Y_0}^1 - F_{Y_0}^0)$; this method cleverly exploits the law of iterated expectations and the linearity of the term in the distributions. In contrast to our approach, the estimand of this method is an approximation to the policy effect with a non-vanishing approximation error, whereas we directly estimate the exact effect $\phi(F_{Y_0}^1) - \phi(F_{Y_0}^0)$ without approximation error.

3. Limit Distribution and Inference Theory for Policy Estimators

In this section we provide a set of simple, general sufficient conditions that facilitate inference in large samples. We design the conditions to cover the principal practical approaches and to help us think about what is needed for various approaches to work. Even though the conditions are reasonably general, they do not exhaust all scenarios under which the main inferential methods will be valid.

3.1. Conditions on estimators of the conditional distribution and quantile functions. We provide general assumptions about the estimators of the conditional quantile or distribution function, which allow us to derive the limit distribution for the policy estimators constructed from them.
These assumptions hold for commonly used parametric and semiparametric estimators of conditional distribution and quantile functions, such as classical, quantile, duration, and distribution regressions.

We begin the analysis by stating regularity conditions for estimators of conditional quantile functions, such as classical or quantile regression. In the sequel, let $\ell^\infty((0,1) \times \mathcal{X})$ denote the space of bounded functions mapping from $(0,1) \times \mathcal{X}$ to $\mathbb{R}$, equipped with the uniform metric. We assume we have a sample $\{(X_i, Y_i), i = 1, \ldots, n\}$ of size $n$ for the outcome and covariates before the policy intervention. In this sample $n_0 = n/\lambda_0$ observations come from group 0 and $n_1 = n/\lambda_1$ observations come from group 1. In what follows we use $\Rightarrow$ to denote weak convergence.

Condition C. The conditional density $f_{Y_j}(y|x)$ of the outcome given covariates exists, and is continuous and bounded above and away from zero, uniformly on $y \in \mathcal{Y}$ and $x \in \mathcal{X}$, where $\mathcal{Y}$ is a compact subset of $\mathbb{R}$, for $j \in \{0,1\}$.

Condition Q. The estimators $(u,x) \mapsto \hat Q_{Y_j}(u|x)$ of the conditional quantile functions $(u,x) \mapsto Q_{Y_j}(u|x)$ of the outcome given covariates jointly converge in law to continuous Gaussian processes:

$$\sqrt{n}\,(\hat Q_{Y_j}(u|x) - Q_{Y_j}(u|x)) \Rightarrow \sqrt{\lambda_j}\, V_j(u,x), \quad j \in \{0,1\}, \tag{3.1}$$

in $\ell^\infty((0,1) \times \mathcal{X})$, where $(u,x) \mapsto V_j(u,x)$, $j \in \{0,1\}$, have zero mean and covariance function $\Sigma_{V_{jr}}(u,x,\tilde u,\tilde x) := E[V_j(u,x)V_r(\tilde u,\tilde x)]$, for $j,r \in \{0,1\}$.

These conditions appear reasonable in practice when the outcome is continuous. If the outcome is discrete, the conditions C and Q do not hold. However, in this case we can use the distribution approach discussed below. Conditions C and Q focus on the case where the outcome has a compact support with a density bounded away from zero, which is a reasonable first case to analyze in detail. Condition Q applies to the most
Q applies to the most is common estimators of conditional quantile functions under suitable regularity conditions (Doss and Gill, 1992, Gutenbrunner and Jureckova, 1992, Angrist, Chernozhukov, and Fernandez- Val, 2006, and Appendix F). Conditions without affecting subsequent we want results. C and Q could be extended to include other cases, y For instance, given set to estimate the counterfactual distribution, Condition C Condition in Q needs only to hold over a smaller region UX = a convergence requirement, without affecting any subsequent less restrictive {{u.x) G joint convergence holds trivially We (0, 1) x A" C Qy{u\x) e y} : over which (0, 1) x M, which leads to results. The the samples for each group are mutually independent. if next state regularity conditions for estimators of conditional distribution functions, such as duration or distribution regressions. Let i'^{y x functions mapping from compact subset .t) I—> Fy X x of R. to M, equipped with the uniform metric, where estimators {y\^) of the (y, x) i—> >^ : ^;^^ ,• : ; '. .^ in law to a continuous ; '.^ v^(Py.(y|.T)-FK,(y|:r;))=^ V^Z,(y,,7:), .7G{0,1}, in i°^{y X X), where (y, x) i—* Zj(y, x), j £ {0, 1}, have zero Sz,>(y,a;,y,x) := £;[Zj(y,i:)Zr(y,x)], /or This condition holds for common 1977, Burr and Doss, 1993, and a Fy^(y|x) of the conditional distribution func- outcome given covariates converges Gaussian processes: 3^ is :,..... . , Condition D. The tions (y, y denote the space of bounded A!) J, r (3.2) ^ mean and covariance function G {0,1}. ,: , .. / ... .,'.,. estimators of conditional distribution functions (Beran, Appendix F). These estimators, however, might produce estimates that are not monotonic in the level of the outcome y (Foresi and Peracchi, 1995, and Hall, Wolff, and Yao, 1999). A way to avoid this problem and to improve the sample properties of the conditional distribution estimators is by rearranging the estimates (Chernozhukov, Fernandez- Val, and Galichon, 2006). 
The joint convergence holds trivially if the samples for each group are mutually independent.

If we start from a conditional quantile estimator $\hat Q_{Y_j}(u|x)$, we can define the conditional distribution function estimator $\hat F_{Y_j}(y|x)$ using the relation (2.10). It turns out that if the original quantile estimator satisfies conditions C and Q, then the resulting conditional distribution estimator satisfies condition D. This result allows us to give a unified treatment of the policy estimators based on either quantile or distribution estimators.

Lemma 1. Under conditions C and Q, the estimators of the conditional distribution function defined by (2.10) satisfy the condition D with

$$Z_j(y,x) = -f_{Y_j}(y|x)\, V_j(F_{Y_j}(y|x), x), \quad j \in \{0,1\}.$$

3.2. Examples of Conditional Estimators. Here we verify that the principal estimators of conditional distribution and quantile functions satisfy the functional central limit theorem, which we required to hold in our main Conditions D and Q. In this section we drop the dependence on the group index to simplify the notation.

Example 1 continued. Classical regression. Consider the classical linear regression model $Y = X'\beta_0 + V$, where the disturbance $V$ is independent of $X$ and has mean zero, finite variance and quantile function $\alpha_0(u)$. In this case, we can estimate $\beta_0$ by mean regression and quantiles of $V$ by the empirical quantile function of the residuals. We show in Appendix F that the resulting estimator $\hat\theta(u) = (\hat\alpha(u), \hat\beta')'$ of $\theta_0(u) = (\alpha_0(u), \beta_0')'$ obeys a functional central limit theorem $\sqrt{n}(\hat\theta(u) - \theta_0(u)) \Rightarrow G_0(u)^{-1} Z(u)$, where $Z$ is a zero mean Gaussian process with covariance function $\Omega(u,\tilde u)$ specified in (F.6) and matrix $G_0(u) := G(\alpha_0(u), \beta_0, u)$ specified in (F.5). The resulting estimator, $\hat Q_Y(u|x) = \hat\alpha(u) + x'\hat\beta$, of the conditional quantile function $Q_Y(u|x)$ obeys a functional central limit theorem,

$$\sqrt{n}\,(\hat Q_Y(u|x) - Q_Y(u|x)) \Rightarrow (1,x')\, G_0(u)^{-1} Z(u) =: V(u,x),$$

in $\ell^\infty((0,1) \times \mathcal{X})$, where $V(u,x)$ is a zero mean Gaussian process with covariance function,

$$\Sigma_V(u,x,\tilde u,\tilde x) = (1,x')\, G_0(u)^{-1}\, \Omega(u,\tilde u)\, [G_0(\tilde u)^{-1}]'\, (1,\tilde x')'.$$

Example 2 continued. Quantile regression. Consider a linear quantile regression model where $Q_Y(u|x) = x'\beta_0(u)$. In Appendix F we show that the canonical quantile regression estimator satisfies a functional central limit theorem,

$$\sqrt{n}\,(\hat\beta(u) - \beta_0(u)) \Rightarrow G_0(u)^{-1} Z(u),$$

where $Z(u)$ is a zero mean Gaussian process with covariance function $\Omega(u,\tilde u) = \{\min(u,\tilde u) - u\tilde u\}\, E[XX']$ and $G_0(u) := G(\beta_0(u),u) = -E[f_Y(X'\beta_0(u)|X)\, XX']$. The estimator of the conditional quantile function also obeys a functional central limit theorem,

$$\sqrt{n}\,(\hat Q_Y(u|x) - Q_Y(u|x)) = \sqrt{n}\,(x'\hat\beta(u) - x'\beta_0(u)) \Rightarrow x' G_0(u)^{-1} Z(u) := V(u,x),$$

in $\ell^\infty((0,1) \times \mathcal{X})$, where $V(u,x)$ is a zero mean Gaussian process with covariance function given by:

$$\Sigma_V(u,x,\tilde u,\tilde x) = x' G_0(u)^{-1}\, \Omega(u,\tilde u)\, G_0(\tilde u)^{-1}\, \tilde x.$$

Example 3 continued. Duration regression. Consider the transformation model for the conditional distribution function stated in equation (2.9). A common model that gives rise to this specification is the proportional hazard model of Cox (1972), where the conditional hazard rate of an individual with covariate vector $x$ is $\lambda_Y(y|x) = \lambda_0(y) \exp(x'\beta_0)$, $\beta_0$ is a $p$-vector of regression coefficients, $\lambda_0$ is the nonnegative baseline hazard rate function, and $y \in \mathcal{Y} = [0, \bar y]$ for some maximum duration $\bar y$. Let $\Lambda_0(y) = \int_0^y \lambda_0(\tilde y)\, d\tilde y$ denote the integrated baseline hazard function. Then $F_Y(y|x) = 1 - \exp\{-\exp(x'\beta_0 + \ln \Lambda_0(y))\}$, delivering the transformation model (2.9) with $t(y) = \ln \Lambda_0(y)$ and $m(x) = x'\beta_0$.

In order to discuss estimation, let us assume i.i.d. sampling of $(Y_i, X_i)$ without censoring. Then Cox's (1972) partial maximum likelihood estimator of $\beta_0$ takes the form

$$\hat\beta = \arg\max_\beta \sum_{i=1}^n \int_0^{\bar y} \log\Big\{ J_i(y) \exp(x_i'\beta) \Big/ \sum_{j=1}^n J_j(y) \exp(x_j'\beta) \Big\}\, dN_i(y),$$

and the Breslow-Nelson-Aalen estimator of $\Lambda_0$ takes the form

$$\hat\Lambda(y) = \int_0^y \Big\{ \sum_{j=1}^n J_j(\tilde y) \exp(x_j'\hat\beta) \Big\}^{-1} d\Big\{ \sum_{i=1}^n N_i(\tilde y) \Big\},$$

where $N_i(y) := 1\{Y_i \le y\}$ and $J_i(y) := 1\{Y_i \ge y\}$, $y \in \mathcal{Y}$; see Breslow (1972, 1974). Let $W$ denote a standard Brownian motion on $\mathcal{Y}$ and
let $Z$ denote an independent $p$-dimensional standard normal vector. Andersen and Gill (1982) show that

$$\sqrt{n}\,(\hat\beta - \beta_0,\ \hat\Lambda(y) - \Lambda_0(y)) \Rightarrow (\Sigma^{-1/2} Z,\ W(a(y)) - b(y)'\Sigma^{-1/2} Z)$$

in $\mathbb{R}^p \times \ell^\infty(\mathcal{Y})$, with the terms $a(y)$, $b(y)$, and $\Sigma$, and regularity conditions defined in Andersen and Gill (1982) and Burr and Doss (1993). Let $\hat F_Y(y|x) = 1 - \exp\{-\exp(x'\hat\beta + \log \hat\Lambda(y))\}$ be the estimator of $F_Y(y|x)$. Since $F_Y(y|x)$ is Hadamard-differentiable in $(\beta, \Lambda)$, by the functional delta method we have the functional central limit theorem

$$\sqrt{n}\,(\hat F_Y(y|x) - F_Y(y|x)) \Rightarrow \{1 - F_Y(y|x)\}\,\{\exp(x'\beta_0)\, W(a(y)) + b(y,x)'\Sigma^{-1/2} Z\} =: Z(y,x),$$

in $\ell^\infty(\mathcal{Y} \times \mathcal{X})$, where $b(y,x) = \Lambda_Y(y|x)\, x - \exp(x'\beta_0)\, b(y)$, and $Z(y,x)$ is a zero mean Gaussian process with covariance function, for $y \le \tilde y$,

$$\Sigma_Z(y,x,\tilde y,\tilde x) = \{1 - F_Y(y|x)\}\{1 - F_Y(\tilde y|\tilde x)\}\,\{\exp(x'\beta_0)\exp(\tilde x'\beta_0)\, a(y) + b(y,x)'\Sigma^{-1} b(\tilde y,\tilde x)\}.$$

In Appendix F we also discuss another estimator of this model.

Example 4 continued. Distribution regression. Consider the model $F_Y(y|x) = \Lambda(x'\beta_0(y))$ for the conditional distribution function, where $\Lambda$ is a known link function, such as the logistic or normal distribution. We can estimate the function $\beta_0(y)$ by applying maximum likelihood to the indicator variables $1\{Y \le y\}$ for each value of $y \in \mathcal{Y}$ separately. In Appendix F, we prove that the resulting estimator $\hat\beta(y)$ of $\beta_0(y)$ obeys a functional central limit theorem

$$\sqrt{n}\,(\hat\beta(y) - \beta_0(y)) \Rightarrow -G_0(y)^{-1} Z(y),$$

where $G_0(y) := G(\beta_0(y), y) = E[\lambda[X'\beta_0(y)]^2\, XX' / \{\Lambda[X'\beta_0(y)](1 - \Lambda[X'\beta_0(y)])\}]$, $\lambda$ is the derivative of $\Lambda$, and $Z(y)$ is a zero mean Gaussian process with covariance function, for $y \ge \tilde y$,

$$\Omega(y,\tilde y) = E\big[XX'\, \lambda[X'\beta_0(y)]\, \lambda[X'\beta_0(\tilde y)] / \{\Lambda[X'\beta_0(y)]\,(1 - \Lambda[X'\beta_0(\tilde y)])\}\big].$$

Hence the resulting estimator $\hat F_Y(y|x) := \Lambda(x'\hat\beta(y))$ of the conditional distribution function also obeys the functional central limit theorem,

$$\sqrt{n}\,(\hat F_Y(y|x) - F_Y(y|x)) \Rightarrow -\lambda[x'\beta_0(y)]\, x' G_0(y)^{-1} Z(y) =: Z(y,x),$$

in $\ell^\infty(\mathcal{Y} \times \mathcal{X})$, where $Z(y,x)$ is a zero mean Gaussian process with covariance function:

$$\Sigma_Z(y,x,\tilde y,\tilde x) = \lambda[x'\beta_0(y)]\, \lambda[\tilde x'\beta_0(\tilde y)]\, x' G_0(y)^{-1}\, \Omega(y,\tilde y)\, G_0(\tilde y)^{-1}\, \tilde x.$$

3.3. Basic principles underlying the limit theory. The derivation of the limit theory for policy estimators relies on several basic principles that allow us to link the properties of the estimators of conditional (quantile and distribution) functions with the properties of estimators of marginal functions. First, although there does not exist a direct connection between conditional and marginal quantiles, we can always switch from conditional quantiles to conditional distributions using Lemma 1, then use the law of iterated expectations to go from conditional distribution to marginal distribution, and finally get to marginal quantiles by inverting. Second, as the functionals of interest depend on the entire conditional function, we must rely on the functional delta method to obtain the limit theory for these functionals as well as to obtain intermediate limit results such as Lemma 1. Since the estimated conditional distributions and quantile functions are usually non-monotone and discontinuous in finite samples, we must use refined forms of the functional delta method. Accordingly, the key ingredient in the derivation and one of the main theoretical contributions of the paper is the demonstration of the Hadamard differentiability of the functionals of interest with respect to the limit of the conditional processes, tangentially to the subspace of continuous functions. Indeed, we need this refined form of differentiability to deal with our conditional processes, which typically are discontinuous random functions in finite samples yet converge to continuous random functions in large samples. These refined differentiability results in turn enable us to use the functional delta method to derive all of the following limit distribution and inference theory.

3.4. Limit theory for counterfactual distribution and quantile functions.
Our first main result shows that the estimators of the marginal distribution and quantile functions before and after the policy intervention satisfy a functional central limit theorem.

Theorem 1 (Limit distribution for marginal distribution functions). Under Conditions M and D, the estimators $\hat F^k_{Y_j}(y)$ of the marginal distribution functions $F^k_{Y_j}(y)$ converge jointly in law to the following Gaussian processes:

$$\sqrt{n}\,(\hat F^k_{Y_j}(y) - F^k_{Y_j}(y)) \Rightarrow \sqrt{\lambda_j} \int_{\mathcal{X}} Z_j(y,x)\, dF_{X_k}(x) =: \sqrt{\lambda_j}\, Z^k_j(y), \quad j,k \in \{0,1\}, \tag{3.3}$$

in $\ell^\infty(\mathcal{Y})$, where $y \mapsto Z^k_j(y)$, $j,k \in \{0,1\}$, have zero mean and covariance function, for $j,k,r,s \in \{0,1\}$,

$$\Sigma^{ks}_{Z_{jr}}(y,\tilde y) := E[Z^k_j(y) Z^s_r(\tilde y)] = \int_{\mathcal{X}} \int_{\mathcal{X}} \Sigma_{Z_{jr}}(y,x,\tilde y,\tilde x)\, dF_{X_k}(x)\, dF_{X_s}(\tilde x). \tag{3.4}$$

Theorem 2 (Limit distribution for marginal quantile functions). Under Conditions M, C, and D, the estimators $\hat Q^k_{Y_j}(u)$ of the marginal quantile functions $Q^k_{Y_j}(u)$ converge jointly in law to the following Gaussian processes:

$$\sqrt{n}\,(\hat Q^k_{Y_j}(u) - Q^k_{Y_j}(u)) \Rightarrow -\sqrt{\lambda_j}\, Z^k_j(Q^k_{Y_j}(u)) / f^k_{Y_j}(Q^k_{Y_j}(u)) =: \sqrt{\lambda_j}\, V^k_j(u), \quad j,k \in \{0,1\}, \tag{3.5}$$

in $\ell^\infty((0,1))$, where $f^k_{Y_j}(y) = \int_{\mathcal{X}} f_{Y_j}(y|x)\, dF_{X_k}(x)$, and $u \mapsto V^k_j(u)$, $j,k \in \{0,1\}$, have zero mean and covariance function, for $j,k,r,s \in \{0,1\}$,

$$\Sigma^{ks}_{V_{jr}}(u,\tilde u) := E[V^k_j(u) V^s_r(\tilde u)] = \Sigma^{ks}_{Z_{jr}}(Q^k_{Y_j}(u), Q^s_{Y_r}(\tilde u)) / [f^k_{Y_j}(Q^k_{Y_j}(u))\, f^s_{Y_r}(Q^s_{Y_r}(\tilde u))].$$

Our second main result shows that the estimators of the marginal quantile and distribution policy effects also satisfy a functional central limit theorem.

Corollary 1 (Limit distribution for quantile policy effects). Under Conditions M, C, and D, the estimators of the quantile policy effects converge in law to the following Gaussian processes:

$$\sqrt{n}\,(\widehat{QE}^k_{Y_j}(u) - QE^k_{Y_j}(u)) \Rightarrow \sqrt{\lambda_j}\, V^k_j(u) - \sqrt{\lambda_0}\, V^0_0(u) =: W^k_j(u), \quad j,k \in \{0,1\}, \tag{3.6}$$

in the space $\ell^\infty((0,1))$, where the processes $u \mapsto W^k_j(u)$, $j,k \in \{0,1\}$, have zero mean and covariance function $\Sigma^{ks}_{W_{jr}}(u,\tilde u) := E[W^k_j(u) W^s_r(\tilde u)]$, for $j,k,r,s \in \{0,1\}$.

Corollary 2 (Limit distribution for distribution policy effects). Under Conditions M and D, the estimators of the distribution policy effects converge in law to the following Gaussian processes:

$$\sqrt{n}\,(\widehat{DE}^k_{Y_j}(y) - DE^k_{Y_j}(y)) \Rightarrow \sqrt{\lambda_j}\, Z^k_j(y) - \sqrt{\lambda_0}\, Z^0_0(y) =: S^k_j(y), \quad j,k \in \{0,1\}, \tag{3.7}$$

in the space $\ell^\infty(\mathcal{Y})$, where the processes $y \mapsto S^k_j(y)$, $j,k \in \{0,1\}$, have zero mean and covariance function $\Sigma^{ks}_{S_{jr}}(y,\tilde y) := E[S^k_j(y) S^s_r(\tilde y)]$, for $j,k,r,s \in \{0,1\}$.

Our third main result shows that various functionals of the status quo and counterfactual marginal distribution and quantile functions satisfy a functional central limit theorem.

Corollary 3 (Limit distribution for differentiable functionals). Let $H_Y(y) = \phi(y, F^0_{Y_0}, F^1_{Y_1}, F^1_{Y_0}, F^0_{Y_1})$, a functional taking values in $\ell^\infty(\mathcal{Y})$, be Hadamard differentiable in $(F^0_{Y_0}, F^1_{Y_1}, F^1_{Y_0}, F^0_{Y_1})$ tangentially to the subspace of continuous functions with derivative $(\phi'_{00}, \phi'_{11}, \phi'_{01}, \phi'_{10})$. Then under Conditions M and D the plug-in estimator $\hat H_Y(y)$ defined in (2.11) converges in law to the following Gaussian process:

$$\sqrt{n}\,(\hat H_Y(y) - H_Y(y)) \Rightarrow \sum_{j,k \in \{0,1\}} \sqrt{\lambda_j}\, \phi'_{jk}(y, F^0_{Y_0}, F^1_{Y_1}, F^1_{Y_0}, F^0_{Y_1})\, Z^k_j(y) =: T_H(y), \tag{3.8}$$

in $\ell^\infty(\mathcal{Y})$, where $y \mapsto T_H(y)$ has zero mean and covariance function $\Sigma_{T_H}(y,\tilde y) := E[T_H(y) T_H(\tilde y)]$.

Examples of functionals covered by Corollary 3 include function-valued parameters, such as Lorenz curves and Lorenz policy effects, as well as scalar-valued parameters, such as Gini coefficients and Gini policy effects (Barrett and Donald, 2009). These examples also include quantile and distribution functions of the effect of the policy defined under Condition RP; in Appendix C we state the results for these effects separately in order to give them some emphasis.

3.5. Uniform inference and resampling methods. We can readily apply the preceding limit distribution results to perform inference on the distributions and quantiles of the outcome before and after the policy at a specific point. For example, Corollary 1 implies that the quantile policy effect estimator for a given quantile index $u$ is asymptotically normal with mean $QE^k_{Y_j}(u)$ and variance $\Sigma^{kk}_{W_{jj}}(u,u)/n$. We can therefore perform inference on $QE^k_{Y_j}(u)$ for a particular quantile index $u$ using this normal distribution and replacing $\Sigma^{kk}_{W_{jj}}(u,u)$ by a consistent estimate.

However, pointwise inference permits looking at the effect of the policy at a specific point only. This approach might be restrictive for policy analysis where the quantities and hypotheses of interest usually involve many points or a continuum of points. That is, the entire distribution or quantile function of the observed and counterfactual outcomes is often of interest. For example, in order to test hypotheses of the policy having no effect on the distribution, having a constant effect throughout the distribution, or having a first order dominance effect, we must use the entire outcome distribution, and not only a single specific point. Moreover, simultaneous inference corrections to pointwise procedures based on the normal distribution, such as Bonferroni-type corrections, can be very conservative for simultaneous testing of highly dependent hypotheses, and become completely inadequate for testing a continuum of hypotheses.

A convenient and computationally attractive approach for performing inference on function-valued parameters is to use Kolmogorov-Smirnov type procedures. Some complications arise in our case because the limit processes are non-pivotal, as their covariance functions depend on unknown, though estimable, nuisance parameters. A practical and valid way to deal with non-pivotality is to use resampling and related simulation methods. An attractive feature of our theoretical analysis is that validity of resampling and simulation methods follows from the Hadamard differentiability of the policy functionals with respect to the underlying conditional functions. Indeed, given that bootstrap and
other methods can consistently estimate the limit laws of the estimators of the conditional distribution and quantile functions, they also consistently estimate the limit laws of our policy estimators. This convenient result follows from preservation of validity of bootstrap and other resampling methods for estimating laws of Hadamard differentiable functionals; see more on this in Lemma 6 in Appendix A.

Footnote: Similar non-pivotality issues arise in a variety of goodness-of-fit problems studied by Durbin and others, and are referred to as the Durbin problem by Koenker and Xiao (2002).

Theorem 3 (Validity of bootstrap and other simulation methods for estimating the laws of policy estimators of function-valued parameters). If the bootstrap or any other simulation method consistently estimates the laws of the limit stochastic processes (3.1) and (3.2) for the estimators of the conditional quantile or distribution function, then this method also consistently estimates the laws of the limit stochastic processes (3.3), (3.5), (3.6), (3.7), and (3.8) for policy estimators of marginal distribution and quantile functions and other functionals.

Theorem 3 shows that the bootstrap is valid for estimating the limit laws of various inferential processes. This is true provided that the bootstrap is valid for estimating the limit laws of the (function-valued) estimators of the conditional distribution and quantile functions. This is a reasonable condition, but, to the best of our knowledge, there are no results in the literature that verify this condition for our principal estimators.
Indeed, the previous results on the bootstrap established its validity only for estimating the pointwise laws of our principal estimators, which is not sufficient for our purposes. To overcome this difficulty, in Appendix F we prove validity of the empirical bootstrap and other related methods, such as Bayesian bootstrap, wild bootstrap, k out of n bootstrap, and subsampling bootstrap, for estimating the laws of function-valued estimators, such as quantile regression and distribution regression processes. These results may be of substantial independent interest.

Footnote: Exceptions include Chernozhukov and Hansen (2006) and Chernozhukov and Fernandez-Val (2005), but they looked at forms of subsampling only.

We can then use Theorem 3 to construct the usual uniform bands and perform inference on the marginal distribution and quantile functions, and various functionals, as described in detail in Chernozhukov and Fernandez-Val (2005) and Angrist, Chernozhukov, and Fernandez-Val (2006). Moreover, if the sample size is large, we can reduce the computational complexity of the inference procedure by resampling the first order approximation to the estimators of the conditional distribution and quantile functions (Chernozhukov and Hansen, 2006); by using subsampling bootstrap (Chernozhukov and Fernandez-Val, 2005); or by simulating the limit processes $Z_j$ or $V_j$, $j \in \{0,1\}$, appearing in expressions (3.1) and (3.2), using multiplier methods (Barrett and Donald, 2003).

3.6. Incorporating uncertainty about the distribution of the covariates. In the preceding analysis we assumed that we know the distributions of the covariates before and after the policy intervention for the target population. In practice, however, we usually observe such distributions only for individuals in the sample. If the individuals in the sample are the target population, then the previous limit theory is valid for performing inference without any adjustments. If a more general population group is the target population, then the distributions of the covariates need to be estimated, and the previous limit theory needs to be adjusted to take this into account. Here we highlight the main ideas, while in Appendix D we present formal distribution and inference theory.

We begin by assuming that the estimators $x \mapsto \hat F_{X_k}(x)$, $k \in \{0,1\}$, of the covariate distribution functions are well behaved, specifically that they converge jointly in law to Gaussian processes $B_{X_k}$, $k \in \{0,1\}$:

$$\sqrt{n}\,(\hat F_{X_k}(x) - F_{X_k}(x)) \Rightarrow \sqrt{\lambda_k}\, B_{X_k}(x), \quad k \in \{0,1\},$$

as rigorously defined in Appendix D.1. This assumption is quite general and holds for conventional estimators such as the empirical distribution under i.i.d. sampling as well as various modifications of conventional estimators, as discussed further in Appendix D. The joint convergence holds trivially in the leading cases where the distribution in group 1 is a known transformation of the distribution in group 0, or when the two distributions are estimated from independent samples.

The estimation of the covariate distributions affects limit distributions of functionals of interest. Let us consider, for example, the marginal distribution functions. When the covariate distributions are unknown, the plug-in estimators for these functions take the form $\hat F^k_{Y_j}(y) = \int_{\mathcal{X}} \hat F_{Y_j}(y|x)\, d\hat F_{X_k}(x)$, $j,k \in \{0,1\}$. The limit processes for these estimators become

$$\sqrt{n}\,(\hat F^k_{Y_j}(y) - F^k_{Y_j}(y)) \Rightarrow \sqrt{\lambda_j}\, Z^k_j(y) + \sqrt{\lambda_k} \int_{\mathcal{X}} F_{Y_j}(y|x)\, dB_{X_k}(x), \quad j,k \in \{0,1\},$$

where the familiar first component arises from the estimation of the conditional distribution and the second comes from the estimation of the distributions of the covariates. In Appendix D we discuss further details.
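As a rough illustration of the uniform bands discussed in Section 3.5, the sketch below builds a Kolmogorov-Smirnov (sup-t) type band from bootstrap draws of a function-valued estimate on a grid. It shows one common construction and is not necessarily the exact algorithm used in the paper; the function name, the studentization by bootstrap standard deviations, and the simulated draws are assumptions made for the example.

```python
import numpy as np

def uniform_band(estimate, boot_draws, level=0.95):
    """Sup-t band: estimate +/- c * se, where c is the `level` quantile of the
    bootstrap distribution of max over the grid of |draw - estimate| / se.

    estimate:   array (m,), the function estimate on a grid
    boot_draws: array (B, m), bootstrap re-estimates on the same grid
    """
    se = np.maximum(boot_draws.std(axis=0, ddof=1), 1e-12)  # guard against zero spread
    max_t = np.max(np.abs(boot_draws - estimate) / se, axis=1)
    crit = np.quantile(max_t, level)
    return estimate - crit * se, estimate + crit * se, crit

# Simulated (hypothetical) bootstrap output, for illustration only.
rng = np.random.default_rng(0)
estimate = np.linspace(-0.1, 0.1, 50)
boot_draws = estimate + 0.02 * rng.standard_normal((200, 50))
lower, upper, crit = uniform_band(estimate, boot_draws)
```

The critical value exceeds the pointwise normal value because it must control the maximal deviation over the whole grid, which is what makes the band valid for hypotheses about the entire function.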
4. Labor Market Institutions and the Distribution of Wages

The empirical application in this section draws its motivation from the influential article by DiNardo, Fortin, and Lemieux (1996, DFL hereafter), which studied the effects of institutional and labor market factors on the evolution of the U.S. wage distribution between 1979 and 1988. The goal of our empirical application is to complete and complement DFL's analysis by using a wider range of techniques, including quantile regression and distribution regression, and to provide confidence intervals for scalar-valued effects as well as function-valued effects of the institutional and labor market factors, such as quantile, distribution, and Lorenz policy effects.

We use the same dataset as in DFL, extracted from the outgoing rotation groups of the Current Population Surveys (CPS) in 1979 and 1988. The outcome variable of interest is the hourly log-wage in 1979 dollars. The regressors include a union status dummy, nine education dummies interacted with experience, a quartic term in experience, two occupation dummies, twenty industry dummies, and dummies for race, SMSA, marital status, and part-time status. Following DFL we weigh the observations by the product of the CPS sampling weights and the hours worked. We analyze the data for men and women separately.

The major factors suspected to have an important role in the evolution of the wage distribution between 1979 and 1988 are the minimum wage, whose real value declined by 27 percent, the level of unionization, whose level also declined from 30 percent to 21 percent in our sample, and the composition of the labor force, whose education levels and other characteristics have also changed substantially during this period. Thus, following DFL, we decompose the total change in the US wage distribution into the sum of four effects: (1) the effect of a change in the minimum wage, (2) the effect of de-unionization, (3) the effect of changes in the composition of the labor force, and (4) the price effect. The effect (1) measures changes in the marginal distribution of wages that occur due to a change in the minimum wage; the effects (2) and (3) measure changes in the marginal distribution of wages that occur due to a change in the distribution of a particular factor, having fixed the distribution of other factors at some constant level; the effect (4) measures changes in the marginal distribution of wages that occur due to a change in the wage structure, or conditional distribution of wages given worker characteristics.

Next we formally define these four effects as differences between appropriately chosen counterfactual distribution functions. Let $F^{U_r,Z_v}_{Y_t,m_s}$ denote the counterfactual marginal distribution function of log-wages $Y$ when the wage structure is as in year $t$, the minimum wage, $m$, is as the level observed for year $s$, the distribution of union status, $U$, is as the distribution observed in year $r$, and the distribution of other worker characteristics, $Z$, is as the distribution observed in year $v$. We identify and estimate such counterfactual distributions using the procedures described below. Given these counterfactual distributions, we can decompose the observed total change in the distribution of wages between 1979 and 1988 into the sum of four effects:

$$
\begin{aligned}
F^{U_{88},Z_{88}}_{Y_{88},m_{88}} - F^{U_{79},Z_{79}}_{Y_{79},m_{79}} ={}& \underbrace{[F^{U_{88},Z_{88}}_{Y_{88},m_{88}} - F^{U_{88},Z_{88}}_{Y_{88},m_{79}}]}_{(1)} + \underbrace{[F^{U_{88},Z_{88}}_{Y_{88},m_{79}} - F^{U_{79},Z_{88}}_{Y_{88},m_{79}}]}_{(2)} \\
&+ \underbrace{[F^{U_{79},Z_{88}}_{Y_{88},m_{79}} - F^{U_{79},Z_{79}}_{Y_{88},m_{79}}]}_{(3)} + \underbrace{[F^{U_{79},Z_{79}}_{Y_{88},m_{79}} - F^{U_{79},Z_{79}}_{Y_{79},m_{79}}]}_{(4)}.
\end{aligned} \tag{4.1}
$$

The first component is the effect of the change in the minimum wage, the second is the effect of de-unionization, the third is the effect of changes in worker characteristics, and the fourth is the price effect. As stated above, we see that the effects (2) and (3) measure changes in the marginal distribution of wages that occur due to a change in the distribution of a particular factor, having fixed the distribution of other factors at some constant level. The effect (4) captures changes in the wage structure or conditional distribution of wages given observed characteristics; in particular, it captures the effect of changes in the market returns to workers' characteristics, including education and experience. Finally, we discuss the interpretation of the minimum wage effect (1) in detail below.

The decomposition (4.1) is the distribution version of the Oaxaca-Blinder decomposition for the mean. We obtain similar decompositions for other functionals of interest $\phi(F^{U_r,Z_v}_{Y_t,m_s})$, such as marginal quantiles and Lorenz curves, by making an appropriate substitution in equation (4.1):

$$
\begin{aligned}
\phi(F^{U_{88},Z_{88}}_{Y_{88},m_{88}}) - \phi(F^{U_{79},Z_{79}}_{Y_{79},m_{79}}) ={}& \underbrace{[\phi(F^{U_{88},Z_{88}}_{Y_{88},m_{88}}) - \phi(F^{U_{88},Z_{88}}_{Y_{88},m_{79}})]}_{(1)} + \underbrace{[\phi(F^{U_{88},Z_{88}}_{Y_{88},m_{79}}) - \phi(F^{U_{79},Z_{88}}_{Y_{88},m_{79}})]}_{(2)} \\
&+ \underbrace{[\phi(F^{U_{79},Z_{88}}_{Y_{88},m_{79}}) - \phi(F^{U_{79},Z_{79}}_{Y_{88},m_{79}})]}_{(3)} + \underbrace{[\phi(F^{U_{79},Z_{79}}_{Y_{88},m_{79}}) - \phi(F^{U_{79},Z_{79}}_{Y_{79},m_{79}})]}_{(4)}.
\end{aligned} \tag{4.2}
$$

In constructing the decompositions (4.1) and (4.2), we follow the same sequential order as in DFL. Also, like DFL, we follow a partial equilibrium approach, but, unlike DFL, we do not incorporate supply and demand factors in our analysis because they do not fit well in our framework.

Footnote: The choice of sequential order matters and can affect the relative importance of the four effects. We report some results for the reverse sequential order in the Appendix.

We next describe how to identify and estimate the various counterfactual distributions appearing in (4.1). The first counterfactual distribution we need is $F^{U_{88},Z_{88}}_{Y_{88},m_{79}}$, the distribution of wages that we would observe in 1988 if the real minimum wage were as high as in 1979. Identifying this quantity requires additional assumptions. Following DFL, the first strategy we employ is to assume the conditional wage density at or below the minimal wage depends only on the value of the minimum wage, and the minimum wage has no employment effects and no spillover effects on wages above its level. The second strategy we employ completely avoids modeling the conditional wage distribution below the minimum wage by simply censoring the observed wages below the minimum wage.

Footnote: We cannot identify this quantity from random variation in minimum wage, since the federal minimum wage does not vary across individuals and varies little across states in the years considered.

Under the first strategy, DFL show that

$$
F_{Y_{88},m_{79}}(y|u,z) =
\begin{cases}
F_{Y_{79},m_{79}}(y|u,z)\, \dfrac{F_{Y_{88},m_{88}}(m_{79}|u,z)}{F_{Y_{79},m_{79}}(m_{79}|u,z)}, & \text{if } y < m_{79}; \\[2ex]
F_{Y_{88},m_{88}}(y|u,z), & \text{if } y \ge m_{79};
\end{cases} \tag{4.3}
$$

where $F_{Y_s,m_s}(y|u,z)$ denotes the conditional distribution of wages at year $s$ given worker characteristics when the level of the minimum wage is as in year $s$. Under the second strategy, we have that

$$
F_{Y_{88},m_{79}}(y|u,z) =
\begin{cases}
0, & \text{if } y < m_{79}; \\
F_{Y_{88},m_{88}}(y|u,z), & \text{if } y \ge m_{79}.
\end{cases} \tag{4.4}
$$

Given either (4.3) or (4.4) we identify the counterfactual distribution of wages using the representation:

$$F^{U_{88},Z_{88}}_{Y_{88},m_{79}}(y) = \int F_{Y_{88},m_{79}}(y|u,z)\, dF_{UZ_{88}}(u,z), \tag{4.5}$$

where $F_{UZ_t}$ is the joint distribution of worker characteristics and union status in year $t$. We can then estimate this distribution using the plug-in principle. In particular, we estimate the conditional distribution in expressions (4.3) and (4.4) using one of the regression methods described below, and the distribution function $F_{UZ_{88}}$ using its empirical analog.

The other counterfactual marginal distributions we need are

$$F^{U_{79},Z_{88}}_{Y_{88},m_{79}}(y) = \int \int F_{Y_{88},m_{79}}(y|u,z)\, dF_{U_{79}}(u|z)\, dF_{Z_{88}}(z) \tag{4.6}$$

and

$$F^{U_{79},Z_{79}}_{Y_{88},m_{79}}(y) = \int F_{Y_{88},m_{79}}(y|u,z)\, dF_{UZ_{79}}(u,z). \tag{4.7}$$

Given either of our assumptions on the minimum wage all the components of these distributions are identified and we can estimate them using the plug-in principle. In particular, we estimate the conditional distribution $F_{Y_{88},m_{79}}(y|u,z)$ using one of the regression methods described below, the conditional distribution $F_{U_{79}}(u|z)$, $u \in \{0,1\}$, using logistic regression, and $F_{Z_{88}}(z)$ and $F_{UZ_{79}}$ using the empirical distributions.
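The sequential structure of (4.1) and the censoring rule (4.4) can be sketched as follows. The code is illustrative only: the function names, argument names, and the made-up distribution functions are hypothetical, and each input is assumed to be a distribution function already evaluated on a common grid of wages.

```python
import numpy as np

def censor_below(cond_cdf_88, y_grid, m79):
    """Censoring strategy (4.4): zero below m79, the 1988 conditional CDF at or above it.

    cond_cdf_88 may be a vector (n_y,) or a matrix (n_obs, n_y) on the grid y_grid.
    """
    return np.where(y_grid < m79, 0.0, cond_cdf_88)

def decompose(F_88, F_mw, F_union, F_chars, F_79):
    """Four sequential effects of (4.1) from five marginal CDFs on a common grid.

    F_88:    observed 1988 distribution
    F_mw:    1988 with the 1979 minimum wage
    F_union: additionally with the 1979 distribution of union status
    F_chars: additionally with the 1979 distribution of other characteristics
    F_79:    observed 1979 distribution
    """
    return {
        "minimum_wage": F_88 - F_mw,
        "de_unionization": F_mw - F_union,
        "characteristics": F_union - F_chars,
        "prices": F_chars - F_79,
    }

# Made-up distribution functions on a small grid, for illustration only.
y_grid = np.linspace(0.0, 1.0, 5)
F_88 = np.array([0.10, 0.35, 0.60, 0.85, 1.00])
F_79 = np.array([0.05, 0.30, 0.65, 0.90, 1.00])
F_mw = censor_below(F_88, y_grid, m79=0.25)
F_union = 0.5 * (F_mw + F_79)
F_chars = 0.25 * F_mw + 0.75 * F_79
effects = decompose(F_88, F_mw, F_union, F_chars, F_79)
```

By construction the four effects telescope, so they always add up to the total change; changing the sequential order changes the intermediate counterfactuals and therefore the individual effects, but not their sum.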
Formulas (4.5)-(4.7) giving the expressions for the counterfactual distributions reflect the assumptions that give the counterfactual distributions a formal causal interpretation. Indeed, we assume in (4.6) and (4.7) that we can fix the relevant conditional distributions and change only the marginal distributions of the relevant covariates. In (4.5), we also specify how the conditional distribution of wages changes with the level of the minimum wage. Note that we directly observe the marginal distributions appearing on the left side of the decomposition (4.1) and estimate them using the plug-in principle.

To estimate the conditional distributions of wages we consider three different regression methods: classical regression, linear quantile regression, and distribution regression with a logit link. The classical regression, despite its wide use in the literature, is not appropriate in this application due to substantial conditional heteroscedasticity in log wages (Lemieux, 2006, and Angrist, Chernozhukov, and Fernandez-Val, 2006). The linear quantile regression is more flexible, but it also has shortcomings in this application. First, there is a considerable amount of rounding, especially at the level of the minimum wage, which makes the wage variable highly discrete. Second, a linear model for the conditional quantile function may not provide a good approximation to the conditional quantiles near the minimum wage, where the conditional quantile function may be highly nonlinear. The distribution regression approach does not suffer from these problems, and we therefore employ it to generate the main empirical results. In order to check the robustness of our empirical results, we also employ the censoring approach described above. We set the wages below the minimum wage to the value of the minimum wage and then apply censored quantile and distribution regressions to the resulting data.
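Distribution regression with a logit link amounts to fitting one logit model of the indicator $1\{Y \le y\}$ on the covariates for each threshold $y$. The following is a minimal sketch of that idea, not the authors' code: the function names are hypothetical, the Newton-Raphson solver with a tiny ridge term is a convenience assumed here for numerical stability, and the simulated data stand in for the wage sample.

```python
import numpy as np

def fit_logit(X, d, n_iter=50, ridge=1e-8):
    """Newton-Raphson for a logit of the binary vector d on X (X includes a constant).

    The small ridge term is an assumption of this sketch that keeps the Hessian invertible.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p) - ridge * beta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def distribution_regression(X, y, thresholds):
    """One logit fit per threshold t, with dependent variable 1{y <= t}."""
    return np.array([fit_logit(X, (y <= t).astype(float)) for t in thresholds])

def predict_cdf(betas, x):
    """Estimated F(t|x) = Lambda(x' beta(t)) at each threshold, for one covariate vector x."""
    return 1.0 / (1.0 + np.exp(-(betas @ x)))

# Simulated data where the logit link is correctly specified:
# Y = 1 + 0.5 * X1 + logistic noise, so F(t|x) = Lambda(t - 1 - 0.5 * x1).
rng = np.random.default_rng(0)
n = 4000
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 1.0 + 0.5 * x1 + rng.logistic(size=n)
thresholds = np.array([0.0, 1.0, 2.0])
betas = distribution_regression(X, y, thresholds)
cdf_at_x0 = predict_cdf(betas, np.array([1.0, 0.0]))
```

Because each threshold is fitted separately, the estimated curve need not be monotone in the threshold in finite samples, which is where the rearrangement mentioned in Section 3.1 becomes useful.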
In what follows, we first present the empirical results obtained using distribution regression, and then briefly compare them with the results obtained using censored quantile regression and censored distribution regression. We present our empirical results in Tables 1-3 and Figures 1-9. In Figure 1, we compare the empirical distributions of wages in 1979 and 1988. In Table 1, we report the estimation and inference results for the decomposition (4.2) of the changes in various measures of wage dispersion between 1979 and 1988 estimated using distribution regressions. Figures 2-7 refine these results by presenting estimates and 95% simultaneous confidence intervals for several major functionals of interest, including the effects on entire quantile functions, distribution functions, and Lorenz curves. We construct the simultaneous confidence bands using 100 bootstrap replications and a grid of quantile indices {0.02, 0.021, ..., 0.98}. We plot all of these function-valued effects against the quantile indices of wages. In Tables 2-3 and Figures 8-9, we present the estimates of the same effects as in Table 1 and Figures 2-3 estimated using various alternative methods, such as censored quantile regression and censored distribution regression.

Footnote: The estimation results parallel the results presented in DFL. Table A1 in the Appendix gives the results for the decomposition in reverse order.

Overall, we find that our estimates, confidence intervals, and robustness checks reinforce the findings of DFL, giving them a rigorous econometric foundation. Indeed, we provide standard errors and confidence intervals, without which we would not be able to assess the statistical significance of the results. Moreover, we validate the results with a wide array of estimation methods. In what follows below, we discuss each of our results in more detail.

In Figure 1, we present estimates and uniform confidence intervals for the marginal distributions of wages in 1979 and 1988. We see that the low end of the distribution is significantly lower in 1988 while the upper end is significantly higher in 1988. This pattern reflects the well-known increase in wage inequality during this period. Next we turn to the decomposition of the total change into the sum of the four effects. For this decomposition we focus mostly on quantile functions for comparability with recent studies and to facilitate the interpretation. In Figures 2-3, we present estimates and uniform confidence intervals for the total change in the marginal quantile function of wages and the four effects that form a decomposition of this total change. We report the marginal quantile functions in 1979 and 1988 in the top left panels of Figures 2 and 3. In Figures 4-7, we plot analogous results for the decomposition of the total change in marginal distribution functions and Lorenz curves.

From Figures 2 and 3, we see that the contribution of union status to the total change is quantitatively small and has a U-shaped effect across the quantile function for men. The magnitude and shape of this effect on the marginal quantiles between the first and last decile sharply contrast with the quantitatively large and monotonically decreasing shape of the effect of the union status on the conditional quantile function for this range of indexes (Chamberlain, 1994), and illustrate the difference between conditional and unconditional effects. In general, interpreting the unconditional effect of changes in the distribution of a covariate requires some care, because the covariate may change only over certain parts of its support. For example, de-unionization cannot affect those who were not unionized at the beginning of the period, which is 70 percent of the workers; and in our data, the unionization declines from 30 to 21 percent, thus affecting only 9 percent of the workers.
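The simultaneous confidence bands mentioned above can be built from bootstrap draws of a function-valued estimate evaluated on a grid. The sketch below uses one common construction (a sup-t band whose critical value is a quantile of the maximal absolute t-statistic across the grid). It is an assumption that this matches the paper's implementation in every detail, and the function name `simultaneous_band` and the toy draws are hypothetical.

```python
import math


def simultaneous_band(estimate, draws, level=0.95):
    """Sup-t band from bootstrap draws of a function evaluated on a grid.

    Assumes every grid point has a strictly positive bootstrap standard error.
    """
    n_grid, n_draws = len(estimate), len(draws)
    # Pointwise bootstrap standard errors (standard deviation of the draws).
    se = []
    for j in range(n_grid):
        mean_j = sum(d[j] for d in draws) / n_draws
        se.append(math.sqrt(sum((d[j] - mean_j) ** 2 for d in draws) / n_draws))
    # Maximal absolute t-statistic over the grid, one value per bootstrap draw.
    max_t = sorted(
        max(abs(d[j] - estimate[j]) / se[j] for j in range(n_grid)) for d in draws
    )
    # Critical value: empirical quantile of the maximal t-statistics.
    crit = max_t[math.ceil(level * n_draws) - 1]
    lower = [estimate[j] - crit * se[j] for j in range(n_grid)]
    upper = [estimate[j] + crit * se[j] for j in range(n_grid)]
    return lower, upper


# Toy input: an effect estimated at two quantile indices and four bootstrap draws.
estimate = [0.0, 0.0]
draws = [[1.0, -1.0], [-1.0, 1.0], [2.0, -2.0], [-2.0, 2.0]]
lower, upper = simultaneous_band(estimate, draws)
print(lower, upper)
```

Because the critical value is taken from the distribution of the maximum over the grid, the band covers the whole function with the stated probability, rather than each grid point separately.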
Thus, even though the conditional impact of switching from union to non-union status can be quantitatively large, it has a quantitatively small effect on the marginal distribution since only 9 percent of the workers are affected.

Footnote (on the quantile functions shown in the figures): Discreteness of wage data implies that the quantile functions have jumps. To avoid this erratic behavior in the graphical representations of the results, we display smoothed quantile functions. The non-smoothed results are available from the authors. The quantile functions were smoothed using a bandwidth of 0.015 and a Gaussian kernel. The results in Tables 1-3 and A1 have not been smoothed.

Footnote (on the conditional effect of union status): We find similar estimates to Chamberlain (1994) for the effect of union on the conditional quantile function in our CPS data.

From Figures 2 and 3, we also see that the change in the distribution of worker characteristics (other than union status) is responsible for a large part of the increase in inequality in the upper tail of the distribution. The importance of these composition effects has been recently stressed by Lemieux (2006) and Autor, Katz and Kearney (2008). The composition effect is realized through at least two channels. The first channel operates through between-group inequality. In our case, higher educated and more experienced workers earn higher wages. By increasing their proportion, we induce a larger gap between the lower and upper tails of the marginal wage distribution. The second channel is that within-group inequality varies by group, so increasing the proportion of high variance groups increases the dispersion in the marginal distribution of wages. In our case, higher educated and more experienced workers exhibit higher within-group inequality. By increasing their proportion, we induce a higher inequality within the upper tail of the distribution. To understand the effect of these channels in wage dispersion it is useful to consider a linear quantile model $Y = X'\beta(U)$, where $X$ is independent of $U$. By the law of total variance, we can decompose the variance of $Y$ into:

$$\mathrm{Var}[Y] = E[\beta(U)]'\,\mathrm{Var}[X]\,E[\beta(U)] + \mathrm{trace}\{E[XX']\,\mathrm{Var}[\beta(U)]\}. \qquad (4.8)$$

The first channel corresponds to changes in the first term of (4.8), where $\mathrm{Var}[X]$ represents the heterogeneity of the labor force (between group inequality); whereas the second channel corresponds to changes in the second term of (4.8) operating through the interaction of between group inequality $E[XX']$ and within group inequality $\mathrm{Var}[\beta(U)]$.

In Figures 2 and 3, we also include estimates of the price effect. This effect captures changes in the conditional wage structure. It represents the difference we would observe if the distribution of worker characteristics and union status, and the minimum wage remained unchanged during this period. This effect has a U-shaped pattern, which is similar to the pattern Autor, Katz and Kearney (2006a) find for the period between 1990 and 2000. They relate this pattern to a bi-polarization of employment into low and high skill jobs. However, they do not find a U-shaped pattern for the period between 1980 and 1990. A possible explanation for the apparent absence of this pattern in their analysis might be that the declining minimum wage masks this phenomenon. In our analysis, once we control for this temporary factor, we do uncover the U-shaped pattern for the price component in the 80s.

In Tables 2-3 and Figures 8-9, we present several interesting robustness checks. As we mentioned above, the assumptions about the minimum wage are particularly delicate, since the mechanism that generates wages strictly below this level is not clear; it could be measurement error, non-coverage, or non-compliance with the law.
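The variance decomposition (4.8) for the linear quantile model Y = X'β(U) with X independent of U can be verified by direct enumeration on a small discrete example. The sketch below is only a numerical check of the identity. The two-point distributions for X and β(U) are hypothetical numbers chosen for illustration, not estimates from the wage data.

```python
# Discrete distributions as (value, probability) pairs; all numbers are illustrative.
x_dist = [((1.0, 0.0), 0.7), ((1.0, 1.0), 0.3)]     # intercept and a binary covariate
beta_dist = [((1.0, 0.5), 0.5), ((3.0, 0.1), 0.5)]  # two equally likely values of beta(U)


def mean(dist):
    """Mean vector of a two-dimensional discrete distribution."""
    return [sum(p * v[k] for v, p in dist) for k in range(2)]


def second_moment(dist):
    """Matrix of second moments E[vv']."""
    return [[sum(p * v[i] * v[j] for v, p in dist) for j in range(2)] for i in range(2)]


def variance(dist):
    """Variance matrix E[vv'] - E[v]E[v]'."""
    m, s = mean(dist), second_moment(dist)
    return [[s[i][j] - m[i] * m[j] for j in range(2)] for i in range(2)]


# Direct variance of Y = X'beta(U); independence gives product probabilities.
y_dist = [(x[0] * b[0] + x[1] * b[1], px * pb) for x, px in x_dist for b, pb in beta_dist]
ey = sum(p * y for y, p in y_dist)
total = sum(p * (y - ey) ** 2 for y, p in y_dist)

# First term of (4.8): E[beta]' Var[X] E[beta] (between-group channel).
eb, vx = mean(beta_dist), variance(x_dist)
between = sum(eb[i] * vx[i][j] * eb[j] for i in range(2) for j in range(2))

# Second term of (4.8): trace(E[XX'] Var[beta]) (within-group channel).
exx, vb = second_moment(x_dist), variance(beta_dist)
within = sum(exx[i][j] * vb[j][i] for i in range(2) for j in range(2))

print(total, between, within)
```

In this toy example most of the dispersion comes from the second term, which shows how a shift in the covariate distribution changes dispersion through E[XX'] even when Var[X] itself is small.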
To check the robustness of the results to the DFL assumptions about the minimum wage and to our semi-parametric model of the conditional distribution, we re-estimate the decomposition using censored linear quantile regression and censored distribution regression with a logit link, using the wage data censored below the minimum wage. For censored quantile regression, we use Powell's (1986) censored quantile regression estimated using Chernozhukov and Hong's (2002) algorithm. For censored distribution regression, we simply censor to zero the distribution regression estimates of the conditional distributions below the minimum wage and recompute the functionals of interest. Overall, we find the results are very similar for the quantile and distribution regressions, and they are not very sensitive to the censoring.

5. Conclusion

This paper develops methods for performing inference about the effect on an outcome of interest of a change in either the distribution of policy-related variables or the relationship of the outcome with these variables. The validity of the proposed inference procedures in large samples relies only on the applicability of a functional central limit theorem for the estimator of the conditional distribution or conditional quantile function. This condition holds for most important semiparametric estimators of conditional distribution and quantile functions, such as classical, quantile, duration, and distribution regressions.

† Massachusetts Institute of Technology, Department of Economics & Operations Research Center; and University College London, CEMMAP. E-mail: vchern@mit.edu. Research support from the Castle Krob Chair, National Science Foundation, the Sloan Foundation, and CEMMAP is gratefully acknowledged.

§ Boston University, Department of Economics. E-mail: ivanf@bu.edu. Research support from the National Science Foundation is gratefully acknowledged.

‡ Brown University, Department of Economics. E-mail: Blaise_Melly@brown.edu.
Footnote: We have additional results on quantile, distribution and Lorenz effects for the [...] and censored estimates; these are available on request from the authors. We do not report them here to save space.

Appendix

This Appendix contains proofs and additional results. Section A collects preliminary lemmas on the functional delta method and derives the functional delta method for any simulation method, extending its applicability beyond the bootstrap. Section B collects the proofs for the results in the main text of the paper. Section C gives limit distribution theory for policy effects estimators. Section D presents additional results for the case where the covariate distributions are estimated. These results complement the results in the main text. Section E derives limit theory, including Hadamard differentiability, for Z-processes and Section F applies this theory to the principal estimators of conditional distribution and quantile functions. These results establish the validity of bootstrap and other resampling schemes for the entire quantile regression process, the entire distribution regression process, and related processes arising in the estimation of various conditional quantile and distribution functions. These results may be of a substantial independent interest.

Appendix A. Functional Delta Method, Bootstrap, and Other Methods

This section collects preliminary lemmas on the functional delta method and derives the functional delta method for any simulation method, extending its applicability beyond the bootstrap.

A.1. Some definitions and auxiliary results. We begin by quickly recalling from van der Vaart and Wellner (1996) the details of the functional delta method.

Definition 1 (Hadamard-differentiability). Let $\mathbb{D}_0$, $\mathbb{D}$, and $\mathbb{E}$ be normed spaces, with $\mathbb{D}_0 \subset \mathbb{D}$. A map $\phi : \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$ is called Hadamard-differentiable at $\theta \in \mathbb{D}_\phi$ tangentially to $\mathbb{D}_0$ if there is a continuous linear map $\phi'_\theta : \mathbb{D}_0 \mapsto \mathbb{E}$ such that

$$\frac{\phi(\theta + t_n h_n) - \phi(\theta)}{t_n} \to \phi'_\theta(h), \quad n \to \infty,$$

for all sequences $t_n \to 0$ and $h_n \to h \in \mathbb{D}_0$ such that $\theta + t_n h_n \in \mathbb{D}_\phi$ for every $n$.

This notion works well together with the continuous mapping theorem.

Lemma 2 (Extended continuous mapping theorem). Let $\mathbb{D}_n \subset \mathbb{D}$ be arbitrary subsets and $g_n : \mathbb{D}_n \mapsto \mathbb{E}$ be arbitrary maps ($n \ge 0$), such that for every sequence $x_n \in \mathbb{D}_n$: if $x_{n'} \to x \in \mathbb{D}_0$ along a subsequence, then $g_{n'}(x_{n'}) \to g_0(x)$. Then, for arbitrary maps $X_n : \Omega_n \mapsto \mathbb{D}_n$ and every random element $X$ with values in $\mathbb{D}_0$ such that $g_0(X)$ is a random element in $\mathbb{E}$:
(i) If $X_n \Rightarrow X$, then $g_n(X_n) \Rightarrow g_0(X)$;
(ii) If $X_n \to_p X$, then $g_n(X_n) \to_p g_0(X)$.

The combination of the previous definition and lemma is known as the functional delta method.

Lemma 3 (Functional delta-method). Let $\mathbb{D}_0$, $\mathbb{D}$, and $\mathbb{E}$ be normed spaces. Let $\phi : \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$ be Hadamard-differentiable at $\theta$ tangentially to $\mathbb{D}_0$. Let $X_n : \Omega_n \mapsto \mathbb{D}_\phi$ be maps with $r_n(X_n - \theta) \Rightarrow X$ in $\mathbb{D}$, where $X$ is separable and takes its values in $\mathbb{D}_0$, for some sequence of constants $r_n \to \infty$. Then $r_n(\phi(X_n) - \phi(\theta)) \Rightarrow \phi'_\theta(X)$. If $\phi'_\theta$ is defined and continuous on the whole of $\mathbb{D}$, then the sequence $r_n(\phi(X_n) - \phi(\theta)) - \phi'_\theta(r_n(X_n - \theta))$ converges to zero in outer probability.

The applicability of the method is greatly enhanced by the fact that Hadamard differentiation obeys the chain rule.

Lemma 4 (Chain rule). If $\phi : \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}_\psi$ is Hadamard-differentiable at $\theta \in \mathbb{D}_\phi$ tangentially to $\mathbb{D}_0$ and $\psi : \mathbb{E}_\psi \mapsto \mathbb{F}$ is Hadamard-differentiable at $\phi(\theta)$ tangentially to $\phi'_\theta(\mathbb{D}_0)$, then $\psi \circ \phi : \mathbb{D}_\phi \mapsto \mathbb{F}$ is Hadamard-differentiable at $\theta$ tangentially to $\mathbb{D}_0$ with derivative $\psi'_{\phi(\theta)} \circ \phi'_\theta$.

Another technical result to be used in the sequel concerns the equivalence of continuous and uniform convergence.

Lemma 5 (Uniform convergence via continuous convergence). Let $\mathbb{D}$ and $\mathbb{E}$ be complete separable metric spaces, with $\mathbb{D}$ compact. Suppose $f : \mathbb{D} \mapsto \mathbb{E}$ is continuous. Then a sequence of functions $f_n : \mathbb{D} \mapsto \mathbb{E}$ converges to $f$ uniformly on $\mathbb{D}$ if and only if for any convergent sequence $x_n \to x$ in $\mathbb{D}$ we have that $f_n(x_n) \to f(x)$.

Proof of Lemmas 2-4: See van der Vaart and Wellner (1996) Chap. 1.11 and 3.9. □

Proof of Lemma 5: See, for example, Resnick (1987), page 2. □

A.2. Functional delta-method for bootstrap and other simulation methods. Let $\mathcal{D}_n = (W_1, ..., W_n)$ denote the data. Consider sequences of random elements $V_n = V_n(\mathcal{D}_n)$ in a normed space $\mathbb{D}$, where the sequence $\sqrt{n}(V_n - V)$ converges unconditionally to the process $G$. Let the sequence of random elements

$$G_n := \sqrt{m}(\widehat{V}_n - V_n), \qquad (A.1)$$

where $m = m(n)$ is a possibly random sequence such that $m/m_0 \to_p 1$ for some sequence of constants $m_0 \to \infty$ such that $m_0/n \to c \ge 0$, and the "draw" $\widehat{V}_n$ is produced by bootstrap, simulation, or any other consistent method that guarantees that the sequence $G_n$ converges conditionally given $\mathcal{D}_n$ in distribution to a tight random element $G$,

$$\sup_{h \in BL_1(\mathbb{D})} \left| E_{|\mathcal{D}_n} h(G_n)^* - E h(G) \right| \to 0, \qquad (A.2)$$

in outer probability, where $BL_1(\mathbb{D})$ denotes the space of functions with Lipschitz norm at most 1 and $E_{|\mathcal{D}_n}$ denotes the conditional expectation given the data. In the definition, we can take $G$ to be independent of $\mathcal{D}_n$.

Footnote: The random scaling is needed to cover wild bootstrap, for example.

Given a map $\phi : \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$, we wish to show that

$$\sup_{h \in BL_1(\mathbb{E})} \left| E_{|\mathcal{D}_n} h\left(\sqrt{m}(\phi(\widehat{V}_n) - \phi(V_n))\right)^* - E h(\phi'_V(G)) \right| \to 0, \qquad (A.3)$$

in outer probability.

Lemma 6 (Delta-method for bootstrap and other simulation methods). Let $\mathbb{D}_0$, $\mathbb{D}$, and $\mathbb{E}$ be normed spaces, with $\mathbb{D}_0 \subset \mathbb{D}$. Let $\phi : \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$ be Hadamard-differentiable at $V$ tangentially to $\mathbb{D}_0$. Let $V_n$ and $\widehat{V}_n$ be maps as indicated previously with values in $\mathbb{D}_\phi$ such that $\sqrt{n}(V_n - V) \Rightarrow G$ and (A.2) holds in outer probability, where $G$ is separable and takes its values in $\mathbb{D}_0$. Then (A.3) holds in outer probability.

Proof of Lemma 6: The proof
generalizes the functional delta-method for empirical bootstrap in Theorem 3.9.11 of van der Vaart and Wellner (1996) to exchangeable bootstrap. This expands the applicability of delta-method to a wide variety of resampling and simulation schemes that are special cases of exchangeable bootstrap, including empirical bootstrap, Bayesian bootstrap, wild bootstrap, k out of n bootstrap, and subsampling bootstrap (see next section for details).

Without loss of generality, assume that the derivative $\phi'_V : \mathbb{D} \mapsto \mathbb{E}$ is defined and continuous on the whole space. Otherwise, replace $\mathbb{E}$ by its second dual $\mathbb{E}^{**}$ and the derivative by an extension $\phi'_V : \mathbb{D} \mapsto \mathbb{E}^{**}$. For every $h \in BL_1(\mathbb{E})$, the function $h \circ \phi'_V$ is contained in $BL_{\|\phi'_V\|}(\mathbb{D})$. Thus (A.2) implies $\sup_{h \in BL_1(\mathbb{E})} |E_{|\mathcal{D}_n} h(\phi'_V(G_n))^* - E h(\phi'_V(G))| \to 0$, in outer probability. Next

$$\sup_{h \in BL_1(\mathbb{E})} \left| E_{|\mathcal{D}_n} h\left(\sqrt{m}(\phi(\widehat{V}_n) - \phi(V_n))\right)^* - E_{|\mathcal{D}_n} h(\phi'_V(G_n))^* \right| \le \varepsilon + 2 P_{|\mathcal{D}_n}\left( \left\| \sqrt{m}(\phi(\widehat{V}_n) - \phi(V_n)) - \phi'_V(G_n) \right\|^* > \varepsilon \right). \qquad (A.4)$$

The theorem is proved once it has been shown that the conditional probability on the right converges to zero in outer probability.

Both sequences $\sqrt{m}(V_n - V)$ and $\sqrt{m}(\widehat{V}_n - V)$ converge (unconditionally) in distribution to separable random elements that concentrate on the space $\mathbb{D}_0$. The first sequence converges by assumption and Slutsky's theorem when $m/m_0 \to_p 1$ and $m_0/n \to c > 0$, and converges to zero when $m_0/n \to 0$ by assumption and Slutsky's theorem. The second sequence converges, by noting that $\sqrt{m}(\widehat{V}_n - V) = \sqrt{m}(\widehat{V}_n - V_n) + \sqrt{m}(V_n - V)$ and that, with $t_n = \sqrt{m}(V_n - V)$,

$$E\left| E_{|\mathcal{D}_n} h(G_n + t_n)^* - E_{|\mathcal{D}_n} h(G + t_n) \right| \le \sup_{h \in BL_1(\mathbb{D})} E\left| E_{|\mathcal{D}_n} h(G_n)^* - E_{|\mathcal{D}_n} h(G) \right|,$$

which converges to zero by (A.2), and by $E_{|\mathcal{D}_n} h(G) = E h(G)$ due to independence of $G$ from $\mathcal{D}_n$. By Lemma 3,

$$\sqrt{m}(\phi(\widehat{V}_n) - \phi(V)) = \phi'_V(\sqrt{m}(\widehat{V}_n - V)) + o^*_p(1),$$
$$\sqrt{m}(\phi(V_n) - \phi(V)) = \phi'_V(\sqrt{m}(V_n - V)) + o^*_p(1).$$

Subtract these equations to conclude that the sequence $\sqrt{m}(\phi(\widehat{V}_n) - \phi(V_n)) - \phi'_V(\sqrt{m}(\widehat{V}_n - V_n))$ converges unconditionally to zero in outer probability.
Thus, the conditional probability on the right in (A. 4) Exchangeable Bootstrap. A. 3. D converges to zero in outer mean. Let {\Vi, ...,Wn) denote the data. i.i.d. Next we define the collection of exchangeable bootstrap methods that we can employ for inference. each 71, let (e„i, .... e„„) be an exchangeable, nonnegative random vector. For Exchangeable bootstrap uses the components of this vector as random sampling weights in place of constant weights (1, ..., 1). A simple way to think of exchangeable bootstrap each variable Wi the number of times equal to Given an empirical process Ki(/) valued. e,i,', = is as albeit without requiring e„, to - X2,'Li /(A',), we samphng be integer- define an exchangeable bootstrap draw of this process as where e^ = XliLi ^ml'n- This insures that each each observation, which is important draw of V„ assigns nonnegative weights to in applications of to preserve con\-exity of criterion functions. We bootstrap to extremum estimators assume that, for some c > n sup£;[e2+^] < cx), n"^ J](e„, -e„)' ^P 1, e^ ^p > c (A.6) 0, ,=1 where the the last one cases: is (1) two conditions are standard, see Van der Vaart and Wellner (1996), and first is needed to apply the previous lemma. Let us consider the following special The standard empirical bootstrap corresponds to the case where (e„i, a multinomial vector with parameters n and probabilities (1/n, and m= n. (2) The Bayesian bootstrap corresponds nonnegative random variables, and e„j = Ui/Un, so that e„ case where e^i, ..., e„„ are = i.i.d. e.g. 1 to the case ..., and m= vectors with n. (3) £'[e^,|'^] The < 1/n), so that £„ where Ui, unit exponential, with E\U{'^^] ..., < ..., Un are co for some £" £„„) = 1 i.i.d. > 0, wild bootstrap corresponds to the oo for some t > 0, and Karfeni] = 1, 35 m/n = SO that < n resamples k letting (e„i, —>p e^ ..., = k/n < n number m = and both m = nk/{n — k) -^ oo. As a consequence row A; of — > (A. 6) (5) .... 
oo and n method described above on the weights holds — number n{n — Lemma 7: By Lemma 6, we only need by y^ the support of Vi, C [V, throughout that y^ B.2. = result, to and n — k times k/{n ~ > k) -^ c Q and which might be of inde- to this the conclusions of method. to verify condition (A. 2), which follows D ' for the results in the U ~ main text of the paper. Uniform(ZY) with UX yA! := {(y, x) which a compact subset of M, and that x € is : y E yx, x e ^}, and := U = (0, 1). U x X We . A", what I—> M, and CiJAX) denotes the set of continuous functions mapping h follows, £°^(^/<Y) denotes the set of Uniform Hadamard Hadamard Denote assume a compact subset In bounded and measurable functions : UX i—> M. differentiability of conditional distribution functions with respect to the conditional quantile functions. The the oo, so that Inference Theory for Counterfactual Estimators (Proofs) B, Notation. Define K, := Qy{U\x), where UX ^ condition (A. 6) on the and therefore satisfies condition. (A. 2), This section collects the proofs : The 6 about validity of the functional delta method apply Appendix h /c k)~^^'^k~^/^ k -^ oo. In this case e^ we obtain the following 6, if without replacement. This corresponds to by Theorem 3.6.13 of Van der Vaart and Wellner (1996). of R''. This corresponds to The subsampling bootstrap corresponds Wn of k times the Lemma n bootstrap of 7 (Functional delta method for exchangeable bootstrap). The exchangeable boot- Proof of B.l. The k out interest. Lemma Lemma (4) ordered at random, independent of the Wj's. 0, if strap k —> oo. observations from Wi, weights holds pendent The condition 1/n). ..., letting (e„i, ...,enn) be a the nEe^^^ -^ oo. y/n/k times multinomial vectors with parameters k and e^n) be equal to —^c>0 resampling k = and mo observations from Wi,...,Wn with replacement. 
probabilities (1/n, e^ > Ee\-^ following lemma establishes differentiabihty of the conditional distribution function with respect to the conditional quantile function. We use this result to prove Lemma 1 in the main text and to derive the limit distribution for the policy estimators based on conditional quantile models. We drop the dependence on the group index to simplify the notation. 36 Lemma ;= 8 (Hadamard derivative of J 1{Qy{u\x) + < y}du. Under i/i((u|x) m The convergence holds uniformly - \\ht -^ /i||oo Lemma Proof of 1{Qy{u\x) for all where 0, We 8: u G 5e(Fy(y|x)) and whereas condition C, as L \ /i,) 0, ^Hy|.^,/^.0^-^y(y|^-) _^ ^^^y|^) .^ -fyiy\x)hiFy{y\x)\x). D,MxJ) = for every with respect to Qy{u\x)). Define Fy(y|x, Fy-(y|a:) for + G h,, i°° small enough := {{y.x) : y ^ y^^, x G P(] {UX), and h G C{UX). have that for any J th,{u\x) ofyX any compact subset > there exists 0, e > - 5) such that for > t <y]< + 1{Qy{u\x) t{h{FY{y\x)\x) < y}; u ^ B(^[Fy{y\x)), \{Qy{u\x) Therefore, for small enough j^ l{gv'(u|.r) + + tht[u\x) <y] = 1{Qy{u\x) <y]. > f < y}du - th,{u\x) 1{Qy{u\x) < y}du Jo (B.l) t l{Qy{u\x) r ^ + t{h{FYiy\x)\x) - < 5) - y} l{Qr(i/|.x) < y} ^^ ' JB,iFy{y\x)) which by the change of variable y = Q)-{u\x) equal to is • fY{y\x)d:y, i where J the image of B£(Fy(y|x)) under u is because Qy{-\^) Fixing JJn[y,y-t{h(FY(y\:r)\:r)-5)] e > f(/i(Fy(y|.r)|x) hand term t-^ 0, for - (5)], in (B.l) is — /y(y|a;) I \ 0, we have that J D and /y(y|x) -^ Lemma 5. - = possible \y,y - i {h{FY{y\x)\x) {h{Fy{y\x)\x) Take a sequence + S) + + [y,y - o{l). o(l) bounds (B.l) from below. Since 6 of (y^, Xt) in uniformly continuous on K. in > can result follows. (y, x) E K, a compact subset K that converges to (y, to this sequence, since the function {y,x) Fy(y|x) and /y(y|x) 5)] /)'(y|x) as Fy(y|x) -^ Fy(y|a;). 
Therefore, the right that the result holds uniformly in argument apphes is no greater than be made arbitrarily small, the To show of variable one-to-one between Bf:{F)-{y\x)) and J. is -fy{y\x){h{FY{y\x)\x)-5) Similarly The change Q)-{-\x). of yX, we x) G A', then the preceding >-^ — /y(y|.T)/7.(Fy(y|.7:)|.r) This result follows by the assumed continuity of both arguments, and the compactness use of K. is li{u\x), D 37 Proof of Lemma B.3. This result follows by the Hadamard differentiability of the con- 1. Lemma ditional distribution function with respect to the conditional quantile function in 8, Condition Q, and the functional delta method in Proof of Theorem B.4. D 1. The joint Lemma D 3. uniform convergence result follows from Condition by the extended continuous mapping theorem in Lemma 2, since the integral uous operator. Gaussianity of the limit process follows from linearity of the Proof of Theorem B.5. limit process follow uous mapping theorem uous mapping theorem 1. 2. Proof of Corollary B.8. method in Lemma 3 by the functional delta method differentiable (see, e.g., 3. C. 3 since , D D 1 by the extended contin- D 2. rule for Hadamard 1 by the functional delta differentiable functionals in -. Lemma ^ , This result follows from the functional delta method for the bootstrap and other simulation methods Appendix Lemma 2. :.: Proof of Theorem in Doss and GiU, 1992). This result follows from Theorem and the chain ,. B.9. 3. uniform convergence result and Gaussianity of the This result follows from Theorem Lemma in 1 D integral. This result follows from Theorem 2 by the extended contin- Lemma in Proof of Corollary B.7. joint Hadamard is Proof of Corollary B.6. The from Theorem the quantile operator 4. 2. a contin- is in Lemma D 6. 
Limit distribution for the estimators of the effects For policy interventions that can be implemented either as a known transformation of the covariate, X, we can Yj' — Xj = also identify g{Xo), or as a change in the conditional distribution of and estimate the distribution Yq, j,k G {0,1}, under Condition RP results provide estimators for the distribution limit distribution theory for them. Let Lemma Y of the effect of the policy, stated in the main text. and quantile functions The given Aj following of the effects V={6eR: S = y — y,yEy,yE = and y}. 9 (Limit distribution for estimators of conditional distribution and quantile func- tions). Let Qao{u\x) = QYo{u\g{x)) - Qv-o(u|.t) and Qai{u\x) = Qyi{u\x) - QYg{u\x) be 38 estimators of the conditional quantile function of the effect Q/\^{u\x),j £ {0, and RP, we have: the conditions C, Q, Vn in i°°{{0,l) X X), \/X^Vi{u, x) zero — where . - Qa^{u\x)j (Qa,{u\x) := •3^) \4.o('"'> «^ ' ,,,"; , ,: Under 1}.''^ =^ \\{u,x), j G {0,1}, \/^[K){", - 5'(3:)) ^/X^Vo{u,x). The Gaussian processes (u,x) i— mean and covariance function Q\r^{u,x,u,x) := '' and Vaj^{u,x) yo{u,x)] > ' := {u,x), j G {0, 1}, have Va E[V/^^{u, x)Vi\r{u,x)], for j,r G {0,1}. Let F^j{S\x) = the effects F^,^{5\x), for j G Jlz x) := ^((5. .T, 5, effect, and — 9. Under ^ The uniform convergence G bution processes \/n(-^Aj('5|x) {0, 1}, follows in — Lemma {5\x) as in the proof of Theorem functions). /v with respect to Lemma -^Aj((^|x)cfF\-^, (x) the where 5 h-> the 2. result for the conditional quantile processes from Conditions Q and RP by the extended Uniform convergence of the conditional distri- Qa method in Lemma 3. The Hadamard {u\x) can be established using the differentia- same argument D . 
for estimators of the conditions M, C, Q, and marginal distribution and quantile RP, the estimators F^ F^ (6) jointly (S) = converge Gaussian processes: VTi{Fi^[8) - Fi^{6)) in i°°{T>), The conditional density of of the marginal distributions of the effects in law to the following ' have zero mean and covariance function 8. 4 (Limit distribution Under G {0,1}. Faj{5\x)),j G {0, 1}, follows from the covergence of the quantile process by the functional delta Fa {0, 1}, j hounded above and away from zero}' to be (3aj("|x)), j and RP. we have: -fA,{6\x)VA^{FA,{S\x)^x) =: Za,{S,x), Z^ji^.x),] G continuous mapping theorem bility of the conditions C, Q, x)ZAr(i5, x)], for j,r G {0,1}. assumed Proof of Lemma \/^(OAj(fi|x) t-^ £'[Z/\,^((5, (5|x), is /a {S,x) an estimMor of the conditional distribution of S}d,u be {0, 1}. %/^(FA,(5ix) -Fa/(5|x)) in i°^{T> X A!), < Jq l{QAji'>i'\-T) Z^ (5), ^ ^^ZA,(<5,,T;)r/Fv,(x) ^: Zi^[8), j,k G {0,1}, j,k G {0,1}, have zero n'/J6,6) := E[Zi^{S)Z^^/s)l forj,k.,r^s G mean and covariance function {0, 1}. In the distribution approach, Qy, (u|a:) can be obtained by inversion of the estimator of the conditional distribution. This assumption rules out degenerated distributions for the distribution of effects, such as constant policy effects. These "distributions" can be estimated using standard regression methods. 39 Under the conditions M, C, Q, and RP, the estimators the marginal quantile functions of the effects Qa (^^) Q^ = (u) inf{(5 F^ : > u] [5) of jointly converge in law to the following Gaussian processes: V^{q1^{u)-Q%{u)) = in /?°°((0, 1)), where fi^{S) mean and -Zi^(Qi^(u))/4(Qi^(u)) =: \%[u), j,k e {0,1}, => E[V^ variance function fly^{u,u) := Proof of Theorem V^ {u)V^^{u)], for The uniform convergence 4. ^ J^ fA^{S\x)dFxi^ix) and u {u), e j k,r,s E {0,1}. j, marginal distribution result for the functions follows from the convergence of the conditional processes in extended continuous mapping theorem in Lemma erator. 
Gaussianity of the limit process follows convergence result differentiable (see, e.g.. Appendix D. method Doss and are estimated. These results integral. in Lemma 3, D The uniform since the quantile operator Gill, 1992). r— ; is D / for the case complement the analysis where the covariate distributions in the main text. We Limit theory, bootstrap, and other simulation methods. ing Condition a continuous op- Case with Estimated Covariate Distributions ,. This section presents additional results D.l. from linearity of the is by the 9 Inference Theory for Counterfactuals Estimators: The -/ ;: . since the integral Lemma function follows from the convergence of the distribu- for the quantile tion function by the functional delta Hadamard 2, have zero {0, 1}, start by restat- to incorporate the assumptions about the estimators of the covariate distributions. Condition DC. . i/n J fd{Fxi^ (x) Let (a) — Fx;, {x)), ^{FY^{y\x) - FY^{y\x)) := Z,{y,x) and G.v,(/) where Fx^ are estimated prohahility measures, for j, k € {0, ; = 1 } These measures must support the P-Donsker property, namely Zq, Zj, m the space i^{y x G X, G xj => X) x i°°{y x where the right hand side is a zero (v '^0-2^0, v ^\Zi, V '^oCa'o, vAiGxi ;f ) x i°^{J=-) x i°°{J^), for each mean Gaussian process and Xj Fx-Donsker class T, the limit of the ratio is of the sample size in group j to the total sample size n, for j G {0, 1}. (b) The function The condition on class {Fy^ {y\^)> y ^ y} the estimated measure measure based on a random sample. is is Fx^-Donsker, for weak and is satisfied j, fc € {0, 1}. wh6n Fxj Moreover, the condition holds is an empirical for various smooth 40 empirical measures; in fact, in this case the class of functions T for which DC(a) holds can be much larger than Glivenko-Cantelli or Donsker (see Radulovic and Wegkamp, 2003, and Gine and Nickl, 2008). Condition classes of functions, see, e.g., DC(b) is weak condition that holds also a for rich van der Vaart (1998). 
Theorem 5 (Limit distribution and inference theory for counterfactual marginal distribu- tions). (1) Under conditions M and DC the estimators Fy (y) = J^ Fy^ {y\x)dFx^{x) marginal distribution Junctions Fy (y) jointly converge m law to the of the following Gaussian processes: V^[F^-iy) - F^-{y)] in £°^{y), where y ^ VXjZ^iy) + ^X,Gx,{Fy^{y\-)) =: Z^{y), Zj{y), j,k £ {0,1}, have zero i-^ j,k e mean and covariance {0, 1}. (D.l) function, for j,k,r,se {0.1}, st(y,y) where El'^ Any (2) is x/AASt(y-.y) + Vh>^sE [GxAFyM-))GxAFvM-))] := (D-2) , defined as in (3.4)- bootstrap or other simulation empirical process (Zq, Zj, G.Vo, GyJ method that consistently estimates the law of the in the space £^(J^ x X)xt'^{y x X) x (.'^ [T) x t°° [T) also consistently estimates the law of the empirical process {Z^.Z\,Z\,Z\) in the space e°°{y) X e'^iy) x e°^{y) x f°°(3^). Proof of Theorem in Lemma. Lemma delta 3 10 below with for the t = part of the theorem follows by the functional delta first and the Hadamard method The The 5: differentiability of the l/\/n. The second method marginal functions demonstrated in part of the theorem follows by the functional bootstrap and other simulation methods in Lemma D 6. expressions for the covariance functions can be further characterized in some leading cases; (1) tions The distributions of the covariates in groups and 1 correspond to different popula- and are estimated by the empirical distributions using mutually independent random samples. In this case Gxq and Gxi are independent integrals over Brownian bridges, and the second component of the covariance function in (D.2) Fy^{y)]dFx^{x) for 0, (2) The A'l = /c = s and zero for covariates in group j are k ^ is random sample. — Fy {y)]{F)-^{y\x) — s. known transformations g{Xo), and the covariate distribution in group distribution from a J;^\F)' {y\x) In this case is of the covariates in group estimated by the empirical Gxo and Gxj are highly dependent 41 The second components processes. 
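of the covariance function in (D.2) are $\int_{\mathcal{X}} [F_{Y_j}(y|x) - F^0_{Y_j}(y)][F_{Y_r}(\tilde y|x) - F^0_{Y_r}(\tilde y)]\, dF_{X_0}(x)$ for $k = s = 0$, $\int_{\mathcal{X}} [F_{Y_j}(y|g(x)) - F^1_{Y_j}(y)][F_{Y_r}(\tilde y|g(x)) - F^1_{Y_r}(\tilde y)]\, dF_{X_0}(x)$ for $k = s = 1$, and $\int_{\mathcal{X}} [F_{Y_j}(y|x) - F^0_{Y_j}(y)][F_{Y_r}(\tilde y|g(x)) - F^1_{Y_r}(\tilde y)]\, dF_{X_0}(x)$ for $k \ne s$.

Corollary 4. Limit distribution theory and validity of bootstrap and other simulation methods for the estimators of the marginal quantile function, quantile policy effects, distribution policy effects, and differentiable functionals can be obtained using similar arguments to Theorems 2 and 3, and Corollaries 1-3 with obvious changes of notation.

D.2. Hadamard derivatives of marginal functionals. In order to state the next result, we define the pseudometrics $\rho^j_{L_2(P)}$ on $\mathcal{Y}\times\mathcal{X}$ and $\rho^k_{L_2(P)}$ on $\mathcal{F}$ by
$$\rho^j_{L_2(P)}((y,x),(\tilde y,\tilde x)) = \big[E(Z_j(y,x) - Z_j(\tilde y,\tilde x))^2\big]^{1/2}, \qquad \rho^k_{L_2(P)}(f,\tilde f) = \big[E(G_{X_k}(f) - G_{X_k}(\tilde f))^2\big]^{1/2}, \qquad j,k \in \{0,1\}.$$
It follows from Lemma 18.15 in van der Vaart (1998) that $\mathcal{Y}\times\mathcal{X}$ is totally bounded under $\rho^j_{L_2(P)}$ and $Z_j$ has continuous paths with respect to $\rho^j_{L_2(P)}$, for $j \in \{0,1\}$. Likewise, $\mathcal{F}$ is totally bounded under $\rho^k_{L_2(P)}$, for each $k \in \{0,1\}$. Moreover, the completion of $\mathcal{Y}\times\mathcal{X}$ with respect to either of the pseudometrics $\rho^j_{L_2(P)}$ is compact.

Lemma 10. Consider the mapping $\phi : \mathbb{D}_\phi \subset \mathbb{D} = \ell^\infty(\mathcal{Y}\times\mathcal{X}) \times \ell^\infty(\mathcal{F}) \to \mathbb{E} = \ell^\infty(\mathcal{Y})$,
$$\phi(F_{Y_j}, F_{X_k}) := \int F_{Y_j}(\cdot|x)\, dF_{X_k}(x),$$
where the domain $\mathbb{D}_\phi$ is the product of the space of the conditional distribution functions $F_Y(\cdot|\cdot)$ on $\mathcal{Y}\times\mathcal{X}$ and the space of bounded maps $f \mapsto \int f\, dF_X$, where $F_X$ is a distribution function on $\mathcal{X}$, for $j,k \in \{0,1\}$. That is, we identify $F_{X_k}$ with the map $f \mapsto \int f\, dF_{X_k}$ in $\ell^\infty(\mathcal{F})$. Finally, we assume that $\{F_{Y_j}(y|\cdot), y \in \mathcal{Y}\}$ is $F_{X_k}$-Donsker, for $j,k \in \{0,1\}$. Consider the sequence $(F^t_{Y_j}, F^t_{X_k}) \in \mathbb{D}_\phi$ such that for $\alpha^t_j := (F^t_{Y_j} - F_{Y_j})/(t\sqrt{\lambda_j})$ and $d\beta^t_k := d(F^t_{X_k} - F_{X_k})/(t\sqrt{\lambda_k})$,
$$\alpha^t_j \to \alpha_j \in C(\mathcal{Y}\times\mathcal{X}, \rho^j_{L_2(P)}) \text{ in } \ell^\infty(\mathcal{Y}\times\mathcal{X}), \qquad \beta^t_k \to \beta_k \in C(\mathcal{F}, \rho^k_{L_2(P)}) \text{ in } \ell^\infty(\mathcal{F}),$$
for the $F_{X_k}$-Donsker class $\mathcal{F}$ and $j,k \in \{0,1\}$. Then, as $t \searrow 0$,
$$\frac{\phi(F^t_{Y_j}, F^t_{X_k}) - \phi(F_{Y_j}, F_{X_k})}{t} \to \phi'_{F_{Y_j},F_{X_k}}(\alpha_j, \beta_k),$$
where
$$\phi'_{F_{Y_j},F_{X_k}}(\alpha_j, \beta_k)(y) := \sqrt{\lambda_j}\int \alpha_j(y,x)\, dF_{X_k}(x) + \sqrt{\lambda_k}\, \beta_k(F_{Y_j}(y|\cdot)),$$
and the derivative map $(\alpha,\beta) \mapsto \phi'_{F_{Y_j},F_{X_k}}(\alpha,\beta)$, mapping $\mathbb{D}$ to $\mathbb{E}$, is continuous.

Proof of Lemma 10. Write
$$\frac{\phi(F^t_{Y_j}, F^t_{X_k}) - \phi(F_{Y_j}, F_{X_k})}{t} - \phi'_{F_{Y_j},F_{X_k}}(\alpha_j, \beta_k) = \sqrt{\lambda_j}\int(\alpha^t_j - \alpha_j)\, dF_{X_k} + \sqrt{\lambda_k}\int F_{Y_j}(d\beta^t_k - d\beta_k) + \sqrt{\lambda_j\lambda_k}\int(\alpha^t_j - \alpha_j)\, t\, d\beta^t_k + \sqrt{\lambda_j\lambda_k}\int \alpha_j\, t\, d\beta^t_k. \qquad (D.3)$$
The first term of (D.3) is bounded by $\|\alpha^t_j - \alpha_j\|_{\mathcal{YX}} \int dF_{X_k} \to 0$. The second term vanishes, since for any $F_{X_k}$-Donsker set $\mathcal{F}$, $\int f\, d\beta^t_k \to \int f\, d\beta_k$ in $\ell^\infty(\mathcal{F})$, and $\{F_{Y_j}(y|x), y \in \mathcal{Y}\} \subset \mathcal{F}$ by assumption. The third term vanishes, since $|\int(\alpha^t_j - \alpha_j)\, t\, d\beta^t_k| \le \|\alpha^t_j - \alpha_j\|_{\mathcal{YX}} \int |t\, d\beta^t_k| \le 2\|\alpha^t_j - \alpha_j\|_{\mathcal{YX}} \to 0$. The fourth term vanishes by the argument provided below.

Since $\alpha_j$ is continuous on the compact semi-metric space $(\mathcal{YX}, \rho^j_{L_2(P)})$, for any $\epsilon > 0$ there exists a finite measurable partition $\cup_{i=1}^m \mathcal{YX}_{im}$ of $\mathcal{YX}$ such that $\alpha_j$ varies less than $\epsilon$ on each subset. Let $\pi_m(y,x) = (y_{im}, x_{im})$ if $(y,x) \in \mathcal{YX}_{im}$, where $(y_{im}, x_{im})$ is an arbitrarily chosen point within $\mathcal{YX}_{im}$ for each $i$; also let $1_{im}(y,x) = 1\{(y,x) \in \mathcal{YX}_{im}\}$. Then
$$\Big|\int \alpha_j\, t\, d\beta^t_k\Big| \le \Big|\int(\alpha_j - \alpha_j\circ\pi_m)\, t\, d\beta^t_k\Big| + \Big|\int \alpha_j\circ\pi_m\, t\, d\beta^t_k\Big| \le 2\|\alpha_j - \alpha_j\circ\pi_m\|_{\mathcal{YX}} + \sum_{i=1}^m |\alpha_j(y_{im},x_{im})|\,\big(|\beta_k(1_{im})| + o(1)\big)\, t \le 2\epsilon + m\, t\, \|\alpha_j\|_{\mathcal{YX}}\big(\max_{i\le m}|\beta_k(1_{im})| + o(1)\big) \le 2\epsilon + O(t),$$
since $\{1_{im}, i \le m\}$ is a $F_{X_k}$-Donsker class. The constant $\epsilon$ is arbitrary, so the left hand side of the preceding display converges to zero.

Finally, the norm on $\mathbb{D}$ is given by $\|\cdot\|_{\mathcal{YX}} \vee \|\cdot\|_{\mathcal{F}}$. The second component of the derivative map is trivially continuous with respect to $\|\cdot\|_{\mathcal{F}}$. The first component is continuous with respect to $\|\cdot\|_{\mathcal{YX}}$ by the first term in (D.3) vanishing, as shown above. Hence the derivative map is continuous. □

Appendix E. Functional Delta Method and Bootstrap and Other Simulation Methods for Z-processes

This section derives a preliminary result that is key to deriving the limit distribution and inference theory for various estimators of the conditional distribution and quantile functions. This result shows that suitably defined Z-estimators satisfy a functional central limit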
theorem and that we can estimate their laws using bootstrap and related methods. The result follows from a lemma that establishes Hadamard differentiability of Z-functionals in spaces that are particularly well-suited for our applications.

E.1. Limit distribution and inference theory for approximate Z-processes. Let us consider an index set $T$ and a set $\Theta \subset \mathbb{R}^p$. We consider Z-estimation processes $\{\hat\theta(u), u \in T\}$, where for each $u \in T$, $\hat\theta(u)$ satisfies $\|\hat\Psi(\hat\theta(u), u)\| \le \inf_{\theta\in\Theta}\|\hat\Psi(\theta, u)\| + \epsilon_n$, with $\epsilon_n \searrow 0$ at some rate. That is, $\hat\theta(u)$ is an approximate solution to the problem of minimizing $\|\hat\Psi(\theta, u)\|$ over $\theta \in \Theta$. The random function $(\theta,u) \mapsto \hat\Psi(\theta,u)$ is an estimator of some fixed population function $(\theta,u) \mapsto \Psi(\theta,u)$, and satisfies a functional central limit theorem. The following lemma specifies conditions under which the Z-processes satisfy a functional central limit theorem, and under which bootstrap and other simulation methods consistently estimate the law of this process.

Lemma 11 (Limit distribution and inference theory for approximate Z-processes). Let $T$ be a relatively compact set of some metric space, and $\Theta$ be a compact subset of $\mathbb{R}^p$. Assume that
(i) for each $u \in T$, $\Psi(\cdot,u) : \Theta \to \mathbb{R}^p$ possesses a unique zero at $\theta_0(u) \in$ interior $\Theta$, and has inverse $\Psi^{-1}(\cdot,u)$ that is continuous at $0$ uniformly in $u \in T$,
(ii) $\Psi(\cdot,u)$ is continuously differentiable at $\theta_0(u)$ uniformly in $u \in T$, with derivative $\dot\Psi_{\theta_0(u),u}$ that is uniformly non-singular, namely $\inf_{u\in T}\inf_{\|h\|=1}\|\dot\Psi_{\theta_0(u),u}h\| > 0$,
(iii) $\sqrt{n}(\hat\Psi - \Psi) \Rightarrow Z$ in $\ell^\infty(\Theta\times T)$, where $Z$ is a.s. continuous on $\Theta\times T$ with respect to the Euclidean metric,
(iv) Bootstrap or some other method consistently estimates the law of $\sqrt{n}(\hat\Psi - \Psi)$.
For each $u \in T$, let $\hat\theta(u)$ be such that $\|\hat\Psi(\hat\theta(u),u)\| \le \inf_{\theta\in\Theta}\|\hat\Psi(\theta,u)\| + \epsilon_n$, with $\epsilon_n = o(n^{-1/2})$. Then, under conditions (i)-(iii),
$$\sqrt{n}(\hat\theta(\cdot) - \theta_0(\cdot)) \Rightarrow -\dot\Psi^{-1}_{\theta_0(\cdot),\cdot}[Z(\theta_0(\cdot),\cdot)] \text{ in } \ell^\infty(T).$$
Moreover, any bootstrap or other method that satisfies condition (iv) consistently estimates the law of the empirical process $\sqrt{n}(\hat\theta - \theta_0)$ in $\ell^\infty(T)$.

Proof of Lemma 11. The results follow by the functional delta method in Lemma 3 and by the functional delta method for bootstrap and other methods in Lemma 6, and Hadamard differentiability of Z-functionals established in Lemma 12 with $t = 1/\sqrt{n}$. □

The proof of the preceding result relies on the following lemma. Let $T$ be a relatively compact set of some metric space, and $\Theta$ be a compact subset of $\mathbb{R}^p$. An element $\theta \in \Theta$ is an $r$-approximate zero of the map $\theta \mapsto z(\theta,u)$ if
$$\|z(\theta,u)\| \le \inf_{\theta'\in\Theta}\|z(\theta',u)\| + r,$$
for some $r \ge 0$. Let $\phi(\cdot,r) : \ell^\infty(\Theta) \to \Theta$ be a map that assigns one of its $r$-approximate zeroes $\phi(z(\cdot,u),r)$ to each element $z(\cdot,u) \in \ell^\infty(\Theta)$.

Lemma 12. Assume that conditions (i) and (ii) stated in the preceding lemma hold. Take any $z_t \to z$ uniformly on $\Theta\times T$ as $t \searrow 0$ for a continuous map $z : \Theta\times T \to \mathbb{R}^p$, and suppose that $q_t \searrow 0$ uniformly on $T$ as $t \searrow 0$. Then, for the $t q_t(u)$-approximate zero of $\Psi(\cdot,u) + t z_t(\cdot,u)$, denoted as $\theta_t(u) = \phi(\Psi(\cdot,u) + t z_t(\cdot,u), t q_t(u))$, we have that, uniformly in $u \in T$, as $t \searrow 0$,
$$t^{-1}[\theta_t(u) - \theta_0(u)] \to -\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)].$$
Here, it is useful to think of $t$ as $1/\sqrt{n}$, where $n$ is the sample size.

Remark. Our lemma is an alternative to van der Vaart and Wellner's (1996) lemma 3.9.34 on Hadamard differentiability of Z-functionals in general normed spaces. The conditions of their lemma are difficult to meet in our context because they include the uniform convergence of the functions $z_t \to z$ over the parameter space $\mathcal{F} = \ell^\infty(T)$, the collection of all bounded functions on $T$, which is an extremely large parameter space. In particular, to apply their lemma we need that the empirical processes $\sqrt{n}(\hat\Psi - \Psi)$ indexed by $\mathcal{F} = \ell^\infty(T)$ converge weakly in the space $\ell^\infty(\mathcal{F}\times T)$, which appears to be difficult to attain in applications such as quantile regression processes. Indeed, note that weak convergence in this space requires $\mathcal{F}$ to be totally bounded, which is hard to attain when $\mathcal{F}$ is too rich a space. See Van der Vaart and Wellner (1996) p. 396 for a comment on the limitation of their lemma 3.9.34. Moreover, our lemma allows for approximate Z-estimators. This allows us to cover quantile regression processes, where exact Z-estimators do not exist.

Proof of Lemma 12. We have that $\Psi(\theta_0(u),u) = 0$ for all $u \in T$. Let $z_t \to z$ uniformly on $\Theta\times T$ for a map $z : \Theta\times T \to \mathbb{R}^p$ that is continuous at each point, and $q_t \searrow 0$ uniformly in $u \in T$ as $t \searrow 0$. By definition $\theta_t(u) = \phi(\Psi(\cdot,u) + t z_t(\cdot,u), t q_t(u))$ satisfies
$$\|\Psi(\theta_t(u),u) + t z_t(\theta_t(u),u)\| \le \inf_{\theta\in\Theta}\|\Psi(\theta,u) + t z_t(\theta,u)\| + t q_t(u) =: t\lambda_t(u) + t q_t(u),$$
uniformly in $u \in T$. The rest of the proof has three steps. In Step 1, we establish a rate of convergence of $\theta_t(\cdot)$ to $\theta_0(\cdot)$. In Step 2, we verify the main claim of the lemma concerning the linear representation for $t^{-1}(\theta_t(\cdot) - \theta_0(\cdot))$, assuming that $\lambda_t(\cdot) = o(1)$. In Step 3, we verify that $\lambda_t(\cdot) = o(1)$.

Step 1. Here we show that uniformly in $u \in T$, $\|\theta_t(u) - \theta_0(u)\| = O(t)$. Note that $\lambda_t(u) \le \|t^{-1}\Psi(\theta_0(u),u) + z_t(\theta_0(u),u)\| = \|z(\theta_0(u),u) + o(1)\| = O(1)$ uniformly in $u \in T$. We conclude that uniformly in $u \in T$, as $t \searrow 0$,
$$t^{-1}(\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)) = -z_t(\theta_t(u),u) + O(\lambda_t(u) + q_t(u)) = O(1),$$
and that $\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\| = O(t)$. By assumption $\Psi(\cdot,u)$ has a unique zero at $\theta_0(u)$ and has an inverse that is continuous at zero uniformly in $u \in T$; hence it follows that uniformly in $u \in T$,
$$\|\theta_t(u) - \theta_0(u)\| \le d_H\big(\Psi^{-1}(\Psi(\theta_t(u),u),u), \Psi^{-1}(0,u)\big) \to 0,$$
where $d_H$ is the Hausdorff distance.
By continuous differentiability assumed to hold uniformly in $u \in T$, $\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u) - \dot\Psi_{\theta_0(u),u}[\theta_t(u) - \theta_0(u)]\| = o(\|\theta_t(u) - \theta_0(u)\|)$, so that uniformly in $u \in T$
$$\liminf_{t\searrow 0}\frac{\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\|}{\|\theta_t(u) - \theta_0(u)\|} \ge \liminf_{t\searrow 0}\frac{\|\dot\Psi_{\theta_0(u),u}[\theta_t(u) - \theta_0(u)]\|}{\|\theta_t(u) - \theta_0(u)\|} \ge \inf_{\|h\|=1}\|\dot\Psi_{\theta_0(u),u}(h)\| = c > 0,$$
where $h$ ranges over $\mathbb{R}^p$, and $c > 0$ by assumption. Thus, uniformly in $u \in T$, $\|\theta_t(u) - \theta_0(u)\| \le (c^{-1} + o(1))\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\| = O(t)$.

Step 2. Here we verify the main claim of the lemma. Using continuous differentiability again, conclude $\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u) - \dot\Psi_{\theta_0(u),u}[\theta_t(u) - \theta_0(u)]\| = o(t)$ uniformly in $u \in T$. Below we will show that $\lambda_t(u) = o(1)$, and we also have $q_t(u) = o(1)$ uniformly in $u \in T$ by assumption. Thus, we can conclude that uniformly in $u \in T$,
$$t^{-1}(\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)) = -z_t(\theta_t(u),u) + o(1) = -z(\theta_0(u),u) + o(1)$$
and
$$t^{-1}[\theta_t(u) - \theta_0(u)] = \dot\Psi^{-1}_{\theta_0(u),u}\big[t^{-1}(\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)) + o(1)\big] = -\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)] + o(1).$$

Step 3. In this step we show that $\lambda_t(u) = o(1)$ uniformly in $u \in T$. Note that for $\bar\theta_t(u) := \theta_0(u) - t\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)] = \theta_0(u) + O(t)$, we have that $\bar\theta_t(u) \in \Theta$, for small enough $t$, uniformly in $u \in T$; moreover,
$$\lambda_t(u) \le \|t^{-1}\Psi(\bar\theta_t(u),u) + z_t(\bar\theta_t(u),u)\| = \big\|-\dot\Psi_{\theta_0(u),u}\{\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)]\} + z(\theta_0(u),u) + o(1)\big\| = o(1), \text{ as } t \searrow 0. \quad □$$

Appendix F. Z-Estimators of Conditional Quantile and Distribution Functions

This section derives limit theory for the principal estimators of conditional distribution and quantile functions. These results establish the validity of bootstrap and other resampling plans for the entire quantile regression process, the entire distribution regression process, and related processes arising in estimation of various conditional quantile and distribution functions. These results may be of a substantial independent interest.

In order to prove the results, we use Lemmas 11 and 12. We also specify some primitive conditions that cover all of our leading examples. In all these examples, we have functional parameter values $u \mapsto \theta(u)$ where $u \in T \subset \mathbb{R}^d$ and $\theta(u) \in \Theta \subset \mathbb{R}^p$, where for each $u \in T$, $\theta_0(u)$ solves the equation
$$\Psi(\theta,u) := E[g(W,\theta,u)] = 0,$$
where $g : \mathcal{W}\times\Theta\times T \to \mathbb{R}^p$, and $W := (X,Y)$ is a random vector with support $\mathcal{W}$. For estimation purposes we have an empirical analog of the above moment functions
$$\hat\Psi(\theta,u) = E_n[g(W_i,\theta,u)],$$
where $E_n$ is the empirical expectation and $(W_1,\ldots,W_n)$ is a random sample from $W$. For each $u \in T$, the estimator $\hat\theta(u)$ satisfies $\|\hat\Psi(\hat\theta(u),u)\| \le \inf_{\theta\in\Theta}\|\hat\Psi(\theta,u)\| + \epsilon_n$, with $\epsilon_n = o(n^{-1/2})$.

Condition Z.1. The set $\Theta$ is a compact subset of $\mathbb{R}^p$ and $T$ is either a finite subset or a bounded open subset of $\mathbb{R}^d$.
(i) For each $u \in T$, $\Psi(\theta,u) := Eg(W,\theta,u) = 0$ has a unique zero at $\theta_0(u) \in$ interior $\Theta$.
(ii) The map $(\theta,u) \mapsto \Psi(\theta,u)$ is continuously differentiable at $(\theta_0(u),u)$ with a uniformly bounded derivative on $T$, where differentiability in $u$ needs to hold for the case of $T$ being a bounded open subset of $\mathbb{R}^d$; $\dot\Psi_{\theta,u} = G(\theta,u) = \frac{\partial}{\partial\theta'}Eg(W,\theta,u)$ is uniformly nonsingular at $\theta_0(u)$, namely $\inf_{u\in T}\inf_{\|h\|=1}\|\dot\Psi_{\theta_0(u),u}h\| > 0$.
(iii) The function set $\mathcal{G} = \{g(W,\theta,u), (\theta,u) \in \Theta\times T\}$ is P-Donsker with a square integrable envelope $G$. The map $(\theta,u) \mapsto g(W,\theta,u)$ is continuous at each $(\theta,u) \in \Theta\times T$ with probability one.

Condition Z.2. Either of the following holds: (a) the conditional distribution has the form $F_Y(u|x) = \Lambda(x,\theta_0(u))$; or (b) the quantile functions have the form $Q_Y(u|x) = Q(x,\theta_0(u))$, where the functions $\theta \mapsto \Lambda(x,\theta)$ and $\theta \mapsto Q(x,\theta)$ are continuously differentiable in $\theta$ with derivatives that are uniformly bounded over the set $\mathcal{X}$.

Lemma 13. Condition Z.1 implies conditions (i)-(iv) of Lemma 11. In particular, condition (iii) holds with $\sqrt{n}(\hat\Psi - \Psi) \Rightarrow Z$ in $\ell^\infty(\Theta\times T)$, where $Z$ is a zero mean Gaussian process with continuous paths in $u \in T$ and covariance function
$$\Omega(u,\tilde u) = E[g(W,\theta_0(u),u)\, g(W,\theta_0(\tilde u),\tilde u)'].$$
Condition (iv) holds with the set of consistent methods for estimating the law of $\sqrt{n}(\hat\Psi - \Psi)$ consisting of bootstrap and exchangeable bootstraps, more generally. Consequently, the conclusions of Lemma 11 hold, namely $\sqrt{n}(\hat\theta(\cdot) - \theta_0(\cdot)) \Rightarrow -G(\theta_0(\cdot),\cdot)^{-1}[Z(\theta_0(\cdot),\cdot)]$ in $\ell^\infty(T)$. Moreover, bootstrap and exchangeable bootstraps consistently estimate the law of the empirical process $\sqrt{n}(\hat\theta - \theta_0)$.

This lemma presents a useful result in its own right. From the point of view of this paper, the following result, a corollary of the lemma, is of immediate interest to us since it verifies Condition D and Condition Q for a wide class of estimators of conditional distribution and quantile functions.

Theorem 6 (Limit distribution and inference theory for Z-estimators of conditional distribution and quantile functions). 1. Under conditions Z.1-Z.2(a), the estimator $(u,x) \mapsto \hat F_Y(u|x)$ of the conditional distribution function $(u,x) \mapsto F_Y(u|x)$ converges in law to a continuous Gaussian process:
$$\sqrt{n}\,(\hat F_Y(u|x) - F_Y(u|x)) \Rightarrow Z(u,x) := -\frac{\partial\Lambda(x,\theta_0(u))}{\partial\theta'}\, G(\theta_0(u),u)^{-1} Z(\theta_0(u),u) \qquad (F.1)$$
in $\ell^\infty(\mathcal{Y}\times\mathcal{X})$, where $(u,x) \mapsto Z(u,x)$ has zero mean and covariance function $\Sigma_Z(u,x,\tilde u,\tilde x) := E[Z(u,x)Z(\tilde u,\tilde x)]$. Moreover, bootstrap and exchangeable bootstraps consistently estimate the law of $Z$.
2. Under conditions Z.1-Z.2(b), the estimator $(u,x) \mapsto \hat Q_Y(u|x)$ of the conditional quantile function $(u,x) \mapsto Q_Y(u|x)$ converges in law to a continuous Gaussian process:
$$\sqrt{n}\,(\hat Q_Y(u|x) - Q_Y(u|x)) \Rightarrow V(u,x) := -\frac{\partial Q(x,\theta_0(u))}{\partial\theta'}\, G(\theta_0(u),u)^{-1} Z(\theta_0(u),u) \qquad (F.2)$$
in $\ell^\infty((0,1)\times\mathcal{X})$, where the process $(u,x) \mapsto V(u,x)$ has zero mean and covariance function $\Sigma_V(u,x,\tilde u,\tilde x) := E[V(u,x)V(\tilde u,\tilde x)]$. Moreover, bootstrap and exchangeable bootstraps consistently estimate the law of $V$.

Proof of Lemma 13. We shall verify conditions (i)-(iv) of Lemma 11. We consider the case where $T$ is a bounded open subset of $\mathbb{R}^d$. The proof for the case with a finite $T$ is simpler, and follows similarly.

To show condition (i), we note that by the implicit function theorem and uniqueness of $\theta_0(u)$, the inverse map $\Psi^{-1}(\mu,u)$ exists on an open neighborhood of each pair $(\mu,u) = (0,u)$, and it is continuously differentiable in $(\mu,u)$ at each pair $(\mu,u) = (0,u)$ with a uniformly bounded derivative. This implies that for any sequence of points $(\mu_t,u_t) \to (0,u)$ with $u \in \bar T$, where $\bar T$ is the closure of $T$, we have that $\|\Psi^{-1}(\mu_t,u_t) - \Psi^{-1}(0,u_t)\| = O(\|\mu_t\|) = o(1)$, verifying the continuity of the inverse map at $0$ uniformly in $u$. We can also conclude that $\theta_0(u)$ is uniformly continuous on $T$ and we can extend it to $\bar T$ by taking limits.

To show condition (ii), we take any sequence $(u_t,h_t) \to (u,h)$ with $u \in T$, $h \in \mathbb{R}^p$, and then note that, for $t^* \in [0,t]$,
$$\Delta_t(u_t,h_t) = t^{-1}\{\Psi(\theta_0(u_t) + t h_t, u_t) - \Psi(\theta_0(u_t),u_t)\} = \frac{\partial\Psi}{\partial\theta'}(\theta_0(u_t) + t^* h_t, u_t)h_t \to \frac{\partial\Psi}{\partial\theta'}(\theta_0(u),u)h = G(\theta_0(u),u)h,$$
using the continuity hypotheses on the derivative $\partial\Psi/\partial\theta$ and the continuity of $u \mapsto \theta_0(u)$. Hence by Lemma 5, we conclude that $\sup_{u\in T,\|h\|=1}|\Delta_t(u,h) - G(\theta_0(u),u)h| \to 0$ as $t \searrow 0$.

To show condition (iii), note that by the Donsker central limit theorem for $\hat\Psi(\theta,u) = E_n[g(W_i,\theta,u)]$ we have that $\sqrt{n}(\hat\Psi - \Psi) \Rightarrow Z$, where $Z$ is a zero mean Gaussian process with covariance function $\Omega(u,\tilde u) = E[g(W,\theta_0(u),u)g(W,\theta_0(\tilde u),\tilde u)']$ that has continuous paths with respect to the $L_2(P)$ semi-metric on $\mathcal{G}$. The only result that is not immediate from the assumptions stated is that $Z$ also has continuous paths on $\Theta\times T$ with respect to the Euclidean metric $\|\cdot\|$. By assumption $Z$ has continuous paths with respect to $\rho_{L_2(P)}((\theta,u),(\tilde\theta,\tilde u)) = (E[g(W,\theta,u) - g(W,\tilde\theta,\tilde u)]^2)^{1/2}$. The map $(\theta,u) \mapsto g(W,\theta,u)$ is continuous at each $(\theta,u)$ with probability one. As $\|(\theta,u) - (\tilde\theta,\tilde u)\| \to 0$, we have that $g(W,\theta,u) - g(W,\tilde\theta,\tilde u) \to 0$ almost surely. It follows by the dominated convergence theorem, with dominating function equal to $(2G)^2$, where $G$ is the square integrable envelope for the function class $\mathcal{G}$, that $(E[g(W,\theta,u) - g(W,\tilde\theta,\tilde u)]^2)^{1/2} \to 0$. This verifies the continuity condition. The square integrable envelope $G$ exists by assumption.
To show (iv), we simply invoke Theorem 3.6.13 in Van der Vaart and Wellner (1996), which implies that the bootstrap and exchangeable bootstraps, more generally, consistently estimate the limit law of $\sqrt{n}(\hat\Psi - \Psi)$, say $\mathbb{G}$, in the sense of equation (A.2). □

Proof of Theorem 6. This result follows directly from Lemma 12, the functional delta method in Lemma 3, the chain rule for Hadamard differentiable functionals in Lemma 4, and the preservation of validity of bootstrap and other methods for Hadamard differentiable functionals in Lemma 6. □

F.1. Examples of conditional quantile estimation methods. We consider the location and quantile regression models described in the text.

Example 2. Quantile regression. The conditional quantile function of the outcome variable $Y$ given the covariate vector $X$ is given by $X'\beta_0(\cdot)$. Here we can take the moment functions corresponding to the canonical quantile regression approach:
$$g(W,\beta,u) = (u - 1\{Y \le X'\beta\})X. \qquad (F.3)$$
We assume that the conditional density $f_Y(\cdot|X)$ is uniformly bounded and is continuous at $X'\beta_0(u)$ uniformly in $u \in T$, almost surely; moreover, $\inf_{u\in T} f_Y(X'\beta_0(u)|X) > c > 0$ almost surely; and $E[XX']$ is finite and of full rank. The true parameter $\beta_0(u)$ solves $Eg(W,\beta,u) = 0$, and we assume that the parameter space $\Theta$ is such that $\beta_0(u) \in$ interior $\Theta$ for each $u \in T = (0,1)$.

Lemma 14. Conditions Z.1-Z.2(b) hold for this example with moment function given by (F.3), $T = (0,1)$, $Q_Y(u|x) = x'\beta_0(u)$,
$$G(\beta_0(u),u) = -E[f_Y(X'\beta_0(u)|X)XX'],$$
and
$$\Omega(u,\tilde u) = \{\min(u,\tilde u) - u\tilde u\}E[XX'].$$

Proof of Lemma 14. To show Z.1, we need to verify conditions on the derivatives of the map $(\beta,u) \mapsto Eg(W,\beta,u)$. It is straightforward to show that at $(\beta,u) = (\beta_0(u),u)$,
$$\frac{\partial}{\partial(\beta',u)}Eg(W,\beta,u) = [G(\beta,u), EX] = [-E[f_Y(X'\beta|X)XX'], EX],$$
and the right hand side is continuous at $(\beta_0(u),u)$. This follows using the dominated convergence theorem, the a.s.
This follows using the dominated continuity and boundedness of the well as finiteness of i?||X|p. Finally, note that Po{u) mapping y is i-h> /y-(y|A') the unique solution to 50 .. : Eg{W, P,u) = uniformly for u G in each u because it is G{Po{u),u) (0, 1), a root of a gradient of convex function. Moreover, > J'EXX' > where / 0, the uniform lower is bound ' To show we Z.l(iii) W} VC are The Q = classes, so {T\j which is - = li ••:P} & Q x T is of a VC in class J^j J^^j map {d,u) = T, To = 1{K < = ^k^j are also VC with a fixed function is (Lemma a Lipschitz transform van der Vaart, 1998. The collection The envelope thus also Donsker. Finally, the square-integrable. at each {P,u) function classes P-Donkser by Example 19.9 it is J^2]yj formed as products i P-Donsker with a square integrable and Wellner, 1996). The difference Tij—J^2, 2.6.18 in van der Vaart VC is Therefore the function classes classes. classes because they are of Q verify that the function class envelope and the continuity hypothesis. X'0,P G ' - on fY{X'goiu)\X). \—^ {u is given by 2 — 1{Y < X'(5))X is max^ \X.j\ continuous with probability one by the absolute continuity of the conditional distribution of V. To show Z.2(b), we note that the map Z.2(b) provided the set A" Example is (x, 0) i—> x'9 trivially verifies the hypotheses of D compact. V= + V, where X is independent of V. so the conditional quantile function of outcome variable Y given the conditioning variable A' is given by yY'/?o + a'o(-)i where EfKIA'] = X'Po and Q'o(-) = Qv{-)1. Classical regression. 
Here we can take the moment functions corresponding to using least squares to estimate $\beta_0$ and sample quantiles of residuals to estimate $\alpha_0$:
$$g(W,\alpha,\beta,u) = [(u - 1\{Y - X'\beta \le \alpha\}), (Y - X'\beta)X']'. \qquad (F.4)$$
We assume that the density of $V = Y - X'\beta_0$, $f_V(\cdot)$, is uniformly bounded and is continuous at $\alpha_0(u)$ uniformly in $u \in T$, almost surely; moreover, $\inf_{u\in T} f_V(\alpha_0(u)) > c > 0$ almost surely; $EXX'$ is finite and full rank, and $EY^2 < \infty$. The true parameter value $(\alpha_0(u),\beta_0')'$ solves $Eg(W,\alpha,\beta,u) = 0$, and we assume that the parameter space $\Theta$ is such that $(\alpha_0(u),\beta_0')' \in$ interior $\Theta$ for each $u \in T = (0,1)$.

Lemma 15. Conditions Z.1-Z.2(b) hold for this example with moment function given by (F.4), $T = (0,1)$, $Q_Y(u|x) = x'\beta_0 + \alpha_0(u)$,
$$G(\alpha_0(u),\beta_0,u) = -\begin{bmatrix} f_V(\alpha_0(u)) & f_V(\alpha_0(u))E[X]' \\ 0_{p\times 1} & EXX' \end{bmatrix} \qquad (F.5)$$
and
$$\Omega(u,\tilde u) = \begin{bmatrix} \min(u,\tilde u) - u\tilde u & -E[V1\{V \le \alpha_0(u)\}]E[X]' \\ -E[V1\{V \le \alpha_0(\tilde u)\}]E[X] & E[V^2]EXX' \end{bmatrix}. \qquad (F.6)$$

Proof of Lemma 15. The proof follows analogously to the proof of Lemma 14. Uniqueness of roots can also be argued similarly, with $\beta_0$ uniquely solving the least squares normal equation, and $\alpha_0$ uniquely solving the quantile equation. □

F.2. Examples of conditional distribution function estimation methods. We consider the distribution regression model described in the text and an alternative estimator for the duration model based on distribution regression.

Example 4. Distribution regression. The conditional distribution function of the outcome variable $Y$ given the covariate vector $X$ is given by $\Lambda(X'\beta_0(\cdot))$, where $\Lambda$ is either the probit or the logit link function. Here we can take the moment functions corresponding to the pointwise maximum likelihood estimation:
$$g(W,\beta,y) = \frac{\Lambda(X'\beta) - 1\{Y \le y\}}{\Lambda(X'\beta)(1 - \Lambda(X'\beta))}\,\lambda(X'\beta)X, \qquad (F.7)$$
where $\lambda$ is the derivative of $\Lambda$. Let $\mathcal{Y}$ be either a finite set or a bounded open subset of $\mathbb{R}^d$. For the latter case we assume that the conditional distribution function $y \mapsto F_Y(y|X)$ admits a density $y \mapsto f_Y(y|x)$, which is continuous at each $y \in \mathcal{Y}$, a.s.
Moreover, $EXX'$ is finite and full rank; the true parameter value $\beta_0(y)$ belongs to the interior of the parameter space $\Theta$ for each $y \in \mathcal{Y}$; and $\Lambda(X'\beta)(1 - \Lambda(X'\beta)) > c > 0$ uniformly on $\beta \in \Theta$, a.s.

Lemma 16. Conditions Z.1-Z.2(a) hold for this example with moment function given by (F.7), $T = \mathcal{Y}$, $u = y$, $F_Y(y|x) = \Lambda(x'\beta_0(y))$,
$$G(\beta_0(y),y) := E\left[\frac{\lambda(X'\beta_0(y))^2}{\Lambda(X'\beta_0(y))[1 - \Lambda(X'\beta_0(y))]}XX'\right],$$
and, for $y \ge \tilde y$,
$$\Omega(y,\tilde y) = E\left[\frac{\lambda(X'\beta_0(y))\lambda(X'\beta_0(\tilde y))}{\Lambda(X'\beta_0(y))[1 - \Lambda(X'\beta_0(\tilde y))]}XX'\right].$$

Proof of Lemma 16. We consider the case where $\mathcal{Y}$ is a bounded open subset of $\mathbb{R}^d$. The case where $\mathcal{Y}$ is a finite set is simpler and follows similarly.

To show Z.1, we need to verify conditions on the derivatives of the map $(\beta,y) \mapsto Eg(W,\beta,y)$. By a straightforward calculation we have that at $(\beta,y) = (\beta_0(y),y)$,
$$\frac{\partial}{\partial(\beta',y)}Eg(W,\beta,y) = \left[E\frac{\partial}{\partial\beta'}g(W,\beta,y), \frac{\partial}{\partial y}Eg(W,\beta,y)\right] = [G(\beta,y), R(\beta,y)],$$
where, for $H(z) = \lambda(z)/\{\Lambda(z)[1 - \Lambda(z)]\}$ and $h(z) = dH(z)/dz$,
$$G(\beta,y) = E[\{h(X'\beta)[\Lambda(X'\beta) - 1\{Y \le y\}] + H(X'\beta)\lambda(X'\beta)\}XX'], \qquad R(\beta,y) = -E[H(X'\beta)f_Y(y|X)X].$$
Both terms are continuous in $(\beta,y)$ at $(\beta_0(y),y)$ for each $y \in \mathcal{Y}$. This follows from using the dominated convergence theorem and the following ingredients: (1) a.s. continuity of the map $(\beta,y) \mapsto \frac{\partial}{\partial\beta'}g(W,\beta,y)$, (2) domination of $\|\frac{\partial}{\partial\beta'}g(W,\beta,y)\|$ by a square-integrable function const$\|X\|$, (3) a.s. continuity of the conditional density function $y \mapsto f_Y(y|X)$, and (4) $\Lambda(X'\beta)(1 - \Lambda(X'\beta)) > c > 0$ uniformly on $\beta \in \Theta$, a.s. Finally, also note that the solution $\beta_0(y)$ to $Eg(W,\beta,y) = 0$ is unique for each $y \in \mathcal{Y}$ because it is a root of a gradient of a convex function.

To show Z.1(iii), we verify that the function class $\mathcal{G}$ is P-Donsker with a square integrable envelope. Function classes $\mathcal{F}_1 = \{X'\beta, \beta \in \Theta\}$, $\mathcal{F}_2 = \{1\{Y \le y\}, y \in \mathcal{Y}\}$, and $\{X_j\}$, $j = 1,\ldots,p$, are VC classes of functions.
The final class
$$\mathcal{G} = \left\{\frac{\Lambda(\mathcal{F}_1) - \mathcal{F}_2}{\Lambda(\mathcal{F}_1)(1 - \Lambda(\mathcal{F}_1))}\,\lambda(\mathcal{F}_1)X_j,\ j = 1,\ldots,p\right\}$$
is a Lipschitz transformation of VC classes with Lipschitz coefficient bounded by $c\max_j|X_j|$ and the envelope function $c'\max_j|X_j|$, which is square-integrable; here $c$ and $c'$ are some positive constants. Hence $\mathcal{G}$ is Donsker by Example 19.9 in van der Vaart (1998). Finally, the map $(\beta,y) \mapsto g(W,\beta,y)$ is continuous at each $(\beta,y) \in \Theta\times\mathcal{Y}$ with probability one by the absolute continuity of the conditional distribution of $Y$ and by the assumption that $\Lambda(X'\beta)(1 - \Lambda(X'\beta)) > c > 0$ uniformly on $\beta \in \Theta$, a.s.

To show Z.2(a), we note that the map $(x,\theta) \mapsto \Lambda(x'\theta)$ trivially verifies the hypotheses of Z.2(a) provided the set $\mathcal{X}$ is compact. □

Example 3b. Duration regression. An alternative to the proportional hazard model in duration and survival analysis is to specify the conditional distribution function of the duration $Y$ given the covariate vector $X$ as $\Lambda(\alpha_0(\cdot) + X'\beta_0)$, where $\Lambda$ is either the probit or the logit link function. We normalize $\alpha_0(y_0) = 0$ at some $y_0 \in \mathcal{Y}$. Here we can take the following moment functions:
$$g(W,\alpha,\beta,y) = \begin{bmatrix} \dfrac{\Lambda(\alpha + X'\beta) - 1\{Y \le y\}}{\Lambda(\alpha + X'\beta)(1 - \Lambda(\alpha + X'\beta))}\,\lambda(\alpha + X'\beta) \\[2ex] \dfrac{\Lambda(X'\beta) - 1\{Y \le y_0\}}{\Lambda(X'\beta)(1 - \Lambda(X'\beta))}\,\lambda(X'\beta)X \end{bmatrix},$$
where $\lambda$ is the derivative of $\Lambda$. The first set of equations is used for estimation of $\alpha_0(y)$ and the second for estimation of $\beta_0$.

Let $\mathcal{Y}$ be either a finite set or a bounded open subset of $\mathbb{R}^d$. For the latter case we assume that the conditional distribution function $y \mapsto F_Y(y|X)$ admits a density $y \mapsto f_Y(y|x)$, which is continuous at each $y \in \mathcal{Y}$, a.s. Moreover, $EXX'$ is finite and full rank; the true parameter value $(\alpha_0(y),\beta_0')'$ belongs to the interior of the parameter space $\Theta$ for each $y \in \mathcal{Y}$; and $\Lambda(\alpha + X'\beta)(1 - \Lambda(\alpha + X'\beta)) > c > 0$ uniformly on $(\alpha,\beta')' \in \Theta$, a.s.

Lemma 17. Conditions Z.1-Z.2(a) hold for this example with moment function given by (F.7), $T = \mathcal{Y}$, $u = y$, $F_Y(y|x) = \Lambda(\alpha_0(y) + x'\beta_0)$,
$$G(\alpha_0(y),\beta_0,y) := \frac{\partial}{\partial(a,b')}E\,g(W,a,b,y)\Big|_{(a,b')=(\alpha_0(y),\beta_0')},$$
and $\Omega(y,\tilde y) = E[g(W,\alpha_0(y),\beta_0,y)\,g(W,\alpha_0(\tilde y),\beta_0,\tilde y)']$.

Proof of Lemma 17. The proof follows analogously to the proof of Lemma 16. □

References

[1] Abadie, A. (1997): "Changes in Spanish Labor Income Structure during the 1980's: A Quantile Regression Approach," Investigaciones Economicas XXI, pp. 253-272.
[2] Abadie, A., Angrist, J., and G. Imbens (2002): "Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings," Econometrica 70, pp. 91-117.
[3] Andersen, P. K., and R. D. Gill (1982): "Cox's Regression Model for Counting Processes: A Large Sample Study," The Annals of Statistics 10, pp. 1100-1120.
[4] Angrist, J., Chernozhukov, V., and I. Fernandez-Val (2006): "Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure," Econometrica 74, pp. 539-563.
[5] Angrist, J., and J.-S. Pischke (2008): Mostly Harmless Econometrics: An Empiricist's Companion, Princeton University Press, Princeton.
[6] Autor, D., Katz, L., and M. Kearney (2006a): "The Polarization of the U.S. Labor Market," American Economic Review 96, pp. 189-194.
[7] Autor, D., Katz, L., and M. Kearney (2006b): "Rising Wage Inequality: The Role of Composition and Prices," NBER Working Paper w11986.
[8] Autor, D., Katz, L., and M. Kearney (2008): "Trends in U.S. Wage Inequality: Revising the Revisionists," Review of Economics and Statistics 90, pp. 300-323.
[9] Barrett, G., and S. Donald (2003): "Consistent Tests for Stochastic Dominance," Econometrica 71, pp. 71-104.
[10] Barrett, G., and S. Donald (2009): "Statistical Inference with Generalized Gini Indexes of Inequality, Poverty, and Welfare," Journal of Business and Economic Statistics 27, pp. 1-17.
[11] Beran, R.J. (1977): "Estimating a distribution function," Annals of Statistics 5, pp. 400-404.
[12] Breslow, N. E. (1972): Contribution to the Discussion of "Regression Models and Life Tables," by D. R. Cox, Journal of the Royal Statistical Society, Ser.
B, 34, pp. 216-217.
[13] Breslow, N. E. (1974): "Covariance Analysis of Censored Survival Data," Biometrics 30, pp. 89-100.
[14] Buchinsky, M. (1994): "Changes in the US Wage Structure 1963-1987: Application of Quantile Regression," Econometrica 62, pp. 405-458.
[15] Burr, B., and H. Doss (1993): "Confidence Bands for the Median Survival Time as a Function of the Covariates in the Cox Model," Journal of the American Statistical Association 88, pp. 1330-1340.
[16] Cameron, A.C., and P.K. Trivedi (2005): Microeconometrics: Methods and Applications, Cambridge University Press, New York.
[17] Chamberlain, G. (1994): "Quantile Regression, Censoring, and the Structure of Wages," in C. A. Sims (ed.), Advances in Econometrics, Sixth World Congress, Volume 1, Cambridge University Press, Cambridge.
[18] Chernozhukov, V., and I. Fernandez-Val (2005): "Subsampling Inference on Quantile Regression Processes," Sankhyā 67, pp. 253-276.
[19] Chernozhukov, V., I. Fernandez-Val and A. Galichon (2006): "Quantile and Probability Curves without Crossing," mimeo, MIT.
[20] Chernozhukov, V., and C. Hansen (2005): "An IV Model of Quantile Treatment Effects," Econometrica 73, pp. 245-261.
[21] Chernozhukov, V., and C. Hansen (2006): "Instrumental quantile regression inference for structural and treatment effect models," Journal of Econometrics 132, pp. 491-525.
[22] Chernozhukov, V., and H. Hong (2002): "Three-step censored quantile regression and extramarital affairs," Journal of the American Statistical Association 97, pp. 872-882.
[23] Chesher, A. (2003): "Identification in Nonseparable Models," Econometrica 71, pp. 1405-1441.
[24] Cox, D. R. (1972): "Regression Models and Life Tables," (with discussion), Journal of the Royal Statistical Society, Ser. B, 34, pp. 187-220.
[25] Dabrowska, D.M. (2005): "Quantile Regression in Transformation Models," Sankhyā 67, pp. 153-186.
[26] DiNardo, J.
(2002): "Propensity Score Reweighting and Changes in Wage Distributions," unpublished manuscript, University of Michigan.
[27] DiNardo, J., Fortin, N., and T. Lemieux (1996): "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach," Econometrica 64, pp. 1001-1044.
[28] Doksum, K. (1974): "Empirical Probability Plots and Statistical Inference for Nonlinear Models in the Two-Sample Case," Annals of Statistics 2, pp. 267-277.
[29] Doksum, K.A., and M. Gasko (1990): "On a correspondence between models in binary regression analysis and survival analysis," International Statistical Review 58, pp. 243-252.
[30] Donald, S. G., Green, D. A., and H. J. Paarsch (2000): "Differences in Wage Distributions Between Canada and the United States: An Application of a Flexible Estimator of Distribution Functions in the Presence of Covariates," Review of Economic Studies 67, pp. 609-633.
[31] Doss, H., and R. D. Gill (1992): "An Elementary Approach to Weak Convergence for Quantile Processes, With Applications to Censored Survival Data," Journal of the American Statistical Association 87, pp. 869-877.
[32] Firpo, S. (2007): "Efficient semiparametric estimation of quantile treatment effects," Econometrica 75, pp. 259-276.
[33] Firpo, S., N. Fortin, and T. Lemieux (2007): "Unconditional Quantile Regressions," Econometrica, forthcoming.
[34] Foresi, S., and F. Peracchi (1995): "The Conditional Distribution of Excess Returns: an Empirical Analysis," Journal of the American Statistical Association 90, pp. 451-466.
[35] Gine, E. and R. Nickl (2008): "Uniform central limit theorems for kernel density estimators," Probability Theory and Related Fields 141, pp. 333-387.
[36] Gosling, A., Machin, S., and C. Meghir (2000): "The Changing Distribution of Male Wages in the U.K.," Review of Economic Studies 67, pp. 635-666.
[37] Gutenbrunner, C., and J.
Jureckova (1992): "Regression Quantile and Regression Rank Score Process Linear Model and Derived Statistics," Annals of Statistics 20, pp. 305-330. [38] Hall, P., Wolff, R., [39] Han, A., and and Yao, Q. (1999), "Methods for estimating a conditional distribution function," Journal of the American Statistical Association 94, pp. 154-163. J. A. Hausman (1990), "Flexible Parametric Estimation of Duration Risk Models," Journal of Applied Econometrics [40] Heckman, tions Smith, J., pp. 1-28. 5, and N. Clements (1997), "Making the Most Out of Programme Evalua- and Social Experiments: Accounting Economic [41] J. J., and Competing for Heterogeneity in Programme Impacts," The Review of ' Studies. Vol. 64, pp. 487-535. • . Hirano, K., Imbens, G. W., and G. Ridder, (2003), "Efficient estimation of average treatment effects using the estimated propensity score," Econometrica, Vol. 71, pp. 1161-1189. [42] Horvitz, D., and D. Thompson (1952), Finite Universe," Journal of the [43] Imbens, G.W., and W. "A Generalization American of Sampling Without Replacement from a Statistical Association, Vol. 47, pp. 663-685. K. Newey (2009), "Identification and Estimation of Nonseparable Triangular Simultaneous Equations Models Without Additivity," Econometrica, forthcoming. [44] Imbens, G.W., and Evaluation," [45] J. Wooldridge (2008), "Recent Developments NBER Working in the Econometrics of Program Paper No. 14251. Koenker, R. (2005), Quantile Regression. Econometric Society Monograph Series 38, Cambridge University Press. : .. ' . [46] Koenker R. and G. Bassett (1978): "Regression Quantiles," Econometrica [47] Koenker, R., and Z. 46, pp. 33-50. Xiao (2002): "Inference on the Quantile Regression Process," Econometrica 70, no. 4, pp. 1583-1612. [48] Lancaster, T. (1990): The Econometric Analysis of Transition Data, graph, Cambridge University Press. An Econometric Society Mono- 56 [49] . Lemieux, T. (2006): Rising [50] Demand [51] for Skill?," A Machado, (2005): Mata J. 
Effects, Noisy Data, or 96, pp. 451-498. "Testing for Stochastic subsampling approach," Review of Economic Studies and J., Whang Composition Inequality; American Economic Review Linton, O., Maasomi, E., and Y. conditions: Wage "Increasing Residual Dominance under general 72, pp. (2005): "Counterfactual Decomposition of 735-765. Changes in Wage Distributions Using Quantile Regression," Journal of Applied Econometrics 20, pp. 445-465. [52] McFadden, D. (1989): Economics of Uncertainty (eds) Studies in the Powell, [54] Radulovic. D. and M. of [55] (1986): Wegkamp smoothed empirical processes," Resnick, S. I. honor of J. of T. Fomby and T. K. Sec Hadar), Springer- Verlag. (1987), Extreme "Necessary and sufficient conditions for weak convergence (2003): Statistics & Probability Letters, 61, pp. 321-336. values, regular variation, Series of the Applied Probability Trust, [56] (in II "Censored Regression Quantiles," Journal of Econometrics 32, pp. 143-155. [53] J. L. "Testing for Stochastic Dominance," in Part 4. and point processes, Applied Probability. Springer- Verlag, Rutemiller, H.C., and D.A. Bowers (1968): "Estimation New in A York. a Heteroscedastic Regression VIodel," Journal of the American Statistical Association 63, pp. 552-557. [57] Stock, J.H. (1989): "Nonparametric Policy Analysis," Journal of the Am.erican Statistical Association 84, pp. 567-575. [58] Stock, J.H. (1991): "Nonparametric Policy Analysis: Cleanup Benefits," eds. [59] W. Barnett, and Semiparam.etric Methods Asymptotic statistics, Cambridge m Econometrics and Waste Statistics, Cambridge University Press. Series in Statistical and Probabilistic 3. van der Vaart, A., and to statistics, Application to Estimating Hazardous Powell, and G. Tauchen, Cambridge, U.K.: van der Vaart, A. (1998): Mathematics, [60] J. in Nonparam.etric An New J. Wellner (1996): Weak convergence and empirical processes: with applications York: Springer. 

Table 1: Decomposing Changes in Measures of Wage Dispersion: 1979-1988, DR

                                    Effect of:
Statistic            Total change   Minimum wage   Unions         Individual attributes   Coefficients
Men:
Standard Deviation   8.0 (0.3)      2.8 (0.1)      0.7 (0.0)      1.8 (0.2)               2.7 (0.3)
                                    35.4 (1.4)     8.5 (0.6)      22.9 (1.9)              33.1 (2.4)
90-10                21.5 (1.0)     11.2 (0.1)     0.0 (0.0)      9.2 (0.8)               1.1 (1.3)
                                    52.1 (2.4)     0.0 (0.1)      42.6 (4.4)              5.3 (5.9)
50-10                11.3 (1.4)     11.2 (0.1)     -2.0 (1.0)     5.1 (0.4)               -3.1 (1.1)
                                    99.6 (14.1)    -17.9 (1.2)    45.5 (8.3)              -27.1 (14.0)
90-50                10.2 (1.2)     0.0 (0.0)      2.0 (1.0)      4.0 (0.8)               4.2 (1.1)
                                    0.0 (0.0)      19.7 (8.4)     39.3 (8.8)              41.0 (9.8)
75-25                15.4 (1.1)     0.0 (0.0)      4.1 (1.0)      0.3 (1.3)               11.1 (1.2)
                                    0.0 (0.0)      26.5 (6.2)     1.7 (8.6)               71.8 (8.7)
95-5                 33.0 (2.1)     23.0 (0.7)     0.0 (0.6)      8.5 (1.1)               1.4 (1.5)
                                    69.9 (4.1)     0.0 (1.7)      25.8 (2.6)              4.3 (4.4)
Gini coefficient     4.1 (0.1)      1.3 (0.0)      0.5 (0.0)      0.3 (0.1)               2.0 (0.1)
                                    32.1 (1.2)     11.7 (0.6)     6.8 (1.8)               49.4 (1.8)
Women:
Standard Deviation   10.9 (0.4)     3.8 (0.1)      0.3 (0.0)      4.7 (0.2)               2.1 (0.3)
                                    34.9 (1.5)     3.2 (0.4)      42.8 (1.8)              19.1 (2.5)
90-10                39.8 (1.4)     23.0 (0.2)     0.9 (0.5)      14.5 (0.7)              1.3 (1.1)
                                    57.9 (1.9)     2.3 (1.2)      36.4 (1.7)              3.4 (2.6)
50-10                33.0 (0.7)     23.0 (0.2)     0.0 (0.1)      11.3 (0.4)              -1.4 (0.7)
                                    69.9 (1.6)     0.0 (0.4)      34.4 (1.3)              -4.3 (2.4)
90-50                6.8 (1.4)      0.0 (0.0)      0.9 (0.5)      3.1 (0.8)               2.8 (1.4)
                                    0.0 (0.0)      13.6 (7.2)     46.0 (11.3)             40.3 (9.9)
75-25                12.8 (0.9)     0.0 (0.0)      0.0 (0.5)      8.3 (0.2)               4.5 (0.8)
                                    0.0 (0.0)      0.0 (3.9)      65.1 (5.0)              35.0 (4.5)
95-5                 38.8 (1.9)     16.8 (0.5)     0.7 (0.7)      16.4 (2.0)              5.0 (2.1)
                                    43.2 (2.2)     1.9 (1.9)      42.1 (5.0)              12.8 (5.1)
Gini coefficient     4.0 (0.1)      2.0 (0.1)      0.1 (0.0)      1.0 (0.1)               0.9 (0.1)
                                    49.0 (1.8)     3.5 (0.4)      24.5 (1.4)              23.0 (2.2)

Notes: All numbers are in %. Bootstrapped standard errors are given in parentheses. The second line in each cell indicates the percentage of total variation. The distribution regression model has been applied.

Table 2: Decomposing Changes in Measures of Wage Dispersion: 1979-1988, CDR

                                    Effect of:
Statistic            Total change   Minimum wage   Unions         Individual attributes   Coefficients
Men:
Standard Deviation   8.2 (0.3)      3.3 (0.0)      0.6 (0.0)      1.9 (0.2)               2.4 (0.2)
                                    40.7 (1.4)     7.9 (0.5)      22.5 (1.8)              28.9 (2.4)
90-10                21.5 (1.0)     11.2 (0.1)     0.0 (0.0)      9.2 (0.8)               1.1 (1.3)
                                    52.1 (2.4)     0.0 (0.1)      42.6 (4.4)              5.3 (5.9)
50-10                11.3 (1.4)     11.2 (0.1)     -2.0 (1.0)     5.1 (0.4)               -3.1 (1.1)
                                    99.6 (14.1)    -17.9 (1.2)    45.5 (8.3)              -27.2 (14.0)
90-50                10.2 (1.2)     0.0 (0.0)      2.0 (1.0)      4.0 (0.8)               4.2 (1.1)
                                    0.0 (0.0)      19.7 (8.4)     39.3 (8.8)              41.0 (9.8)
75-25                15.4 (1.1)     0.0 (0.0)      4.1 (1.0)      0.3 (1.3)               11.1 (1.2)
                                    0.0 (0.0)      26.5 (6.2)     1.7 (8.6)               71.8 (8.7)
95-5                 36.4 (2.1)     26.4 (0.7)     0.0 (0.6)      8.5 (1.1)               1.4 (1.5)
                                    72.7 (3.8)     0.0 (1.5)      23.4 (2.7)              3.9 (4.0)
Gini coefficient     4.2 (0.1)      1.6 (0.0)      0.4 (0.0)      0.3 (0.1)               1.8 (0.1)
                                    37.9 (1.1)     10.7 (0.5)     7.1 (1.6)               44.2 (1.6)
Women:
Standard Deviation   12.7 (0.4)     5.6 (0.1)      0.3 (0.0)      5.1 (0.2)               1.7 (0.3)
                                    44.1 (1.5)     2.2 (0.4)      39.9 (1.8)              13.8 (2.5)
90-10                43.2 (1.4)     26.4 (0.2)     0.9 (0.5)      14.5 (0.7)              1.3 (1.1)
                                    61.2 (1.9)     2.2 (1.2)      33.5 (1.7)              3.1 (2.6)
50-10                36.4 (0.7)     26.4 (0.2)     0.0 (0.1)      11.3 (0.4)              -1.4 (0.7)
                                    72.7 (1.6)     0.0 (0.4)      31.2 (1.3)              -3.9 (2.4)
90-50                6.8 (1.4)      0.0 (0.0)      0.9 (0.5)      3.1 (0.8)               2.8 (1.4)
                                    0.0 (0.0)      13.6 (7.2)     46.0 (11.3)             40.3 (9.9)
75-25                12.8 (0.9)     0.0 (0.0)      0.0 (0.5)      8.3 (0.2)               4.5 (0.8)
                                    0.0 (0.0)      0.0 (3.9)      65.1 (5.0)              35.0 (4.5)
95-5                 52.7 (1.9)     30.6 (0.5)     0.7 (0.7)      16.7 (2.0)              4.7 (2.1)
                                    58.1 (2.2)     1.4 (1.9)      31.6 (5.0)              8.8 (5.1)
Gini coefficient     4.9 (0.1)      2.9 (0.1)      0.1 (0.0)      1.3 (0.1)               0.6 (0.1)
                                    59.2 (1.8)     1.9 (0.4)      26.1 (1.4)              12.8 (2.2)

Notes: All numbers are in %. Bootstrapped standard errors are given in parentheses. The second line in each cell indicates the percentage of total variation. The censored distribution regression model has been applied.

Table 3: Decomposing Changes in Measures of Wage Dispersion: 1979-1988, CQR

                                    Effect of:
Statistic            Total change   Minimum wage   Unions         Individual attributes   Coefficients
Men:
Standard Deviation   9.0 (0.3)      4.1 (0.0)      0.3 (0.0)      1.8 (0.1)               2.8 (0.2)
                                    45.3 (1.5)     3.2 (0.5)      20.0 (1.6)              31.5 (2.2)
90-10                22.3 (1.1)     14.2 (0.4)     -0.5 (0.1)     7.2 (0.4)               1.4 (1.1)
                                    63.6 (3.4)     -2.2 (0.6)     32.3 (2.8)              6.4 (5.1)
50-10                9.5 (0.9)      14.2 (0.4)     -1.8 (0.1)     4.6 (0.4)               -7.4 (0.9)
                                    148.7 (6.7)    -18.7 (3.0)    47.9 (9.0)              -78.0 (21.6)
90-50                12.7 (0.7)     0.0 (0.0)      1.3 (0.1)      2.6 (0.3)               8.8 (0.5)
                                    0.0 (0.0)      10. (1.0)      20.6 (2.4)              69.4 (2.5)
75-25                12.7 (0.6)     0.0 (0.0)      1.6 (0.1)      2.0 (0.4)               9.1 (0.5)
                                    0.0 (0.0)      12.9 (1.2)     15.5 (3.0)              71.5 (3.1)
95-5                 39.2 (0.8)     30.6 (0.0)     -0.5 (0.1)     7.4 (0.5)               1.6 (0.8)
                                    78.1 (1.8)     -1.2 (0.3)     18.9 (1.2)              4.2 (2.1)
Gini coefficient     4.5 (0.1)      1.9 (0.0)      0.3 (0.0)      0.3 (0.1)               2.1 (0.1)
                                    42.2 (1.1)     5.9 (0.4)      6.1 (1.4)               45.9 (1.4)
Women:
Standard Deviation   13.1 (0.4)     6.2 (0.0)      0.3 (0.1)      4.5 (0.3)               2.0 (0.3)
                                    47.6 (1.6)     2.6 (0.4)      34.8 (1.5)              15.0 (1.9)
90-10                48.8 (1.2)     30.6 (0.0)     0.8 (0.2)      14.7 (0.8)              2.7 (0.8)
                                    62.8 (1.5)     1.6 (0.3)      30.1 (1.3)              5.5 (1.6)
50-10                37.2 (0.7)     30.6 (0.0)     -0.3 (0.1)     10.9 (0.8)              -4.1 (0.5)
                                    82.3 (1.6)     -0.7 (0.3)     29.4 (1.7)              -10.9 (1.4)
90-50                11.5 (0.9)     0.0 (0.0)      1.1 (0.1)      3.7 (0.5)               6.7 (0.8)
                                    0.0 (0.0)      9.1 (1.1)      32.3 (3.3)              58.5 (3.5)
75-25                15.3 (0.9)     0.0 (0.0)      0.8 (0.1)      11.8 (0.8)              2.6 (0.9)
                                    0.0 (0.0)      5.6 (0.7)      77.6 (5.2)              16.9 (5.1)
95-5                 50.8 (1.4)     30.6 (0.0)     1.1 (0.2)      15.1 (0.8)              4.0 (1.0)
                                    60.3 (1.6)     2.1 (0.4)      29.7 (1.2)              7.9 (1.8)
Gini coefficient     5. (0.1)       3.2 (0.0)      0.1 (0.0)      1.1 (0.1)               0.7 (0.1)
                                    62.5 (1.5)     2.1 (0.3)      21.8 (1.2)              13.6 (1.5)

Notes: All numbers are in %. Bootstrapped standard errors are given in parentheses. The second line in each cell indicates the percentage of total variation. The censored quantile regression model has been applied.

[Figure 1 image: upper panel "Men", bottom panel "Women"; legend: Distribution function in 79, Uniform CI in 79, Distribution function in 88, Uniform CI in 88.]

Figure 1. Empirical CDFs and 95% simultaneous confidence intervals for observed wages in 1979 and 1988. Distributions for men are plotted in the upper panel and distributions for women are plotted in the bottom panel. Confidence intervals were obtained by bootstrap with 100 repetitions. Vertical lines are the levels of the minimum wage.

[Figure 2 image: panels "Observed quantile functions", "Observed differences", "Minimum wage", "De-unionization", "Individual characteristics", "Residual"; horizontal axis: Quantile; legend: QE, Uniform confidence bands.]

Figure 2. 95% simultaneous confidence intervals for observed quantile functions, observed quantile policy effects and decomposition of the quantile policy effects for men. Confidence intervals were obtained by bootstrap with 100 repetitions.

[Figure 3 image: panels "Observed quantile functions", "Observed differences", "Minimum wage", "De-unionization", "Individual characteristics", "Residual"; horizontal axis: Quantile; legend: QE, Uniform confidence bands.]

Figure 3. 95% simultaneous confidence intervals for observed quantile functions, observed quantile policy effects and decomposition of the quantile policy effects for women. Confidence intervals were obtained by bootstrap with 100 repetitions.

[Figure 4 image: panels "Observed distribution functions", "Observed differences", "Minimum wage", "De-unionization", "Individual characteristics", "Residual"; horizontal axis: Quantile; legend: DE, Uniform confidence bands.]

Figure 4. 95% simultaneous confidence intervals for observed distribution functions, observed distribution policy effects and decomposition of the distribution policy effects for men. Confidence intervals were obtained by bootstrap with 100 repetitions.

[Figure 5 image: panels "Observed distribution functions", "Observed differences", "Minimum wage", "De-unionization"; horizontal axis: Quantile; legend: DE, Uniform confidence bands.]

Figure 5. 95% simultaneous confidence intervals for observed distribution functions, observed distribution policy effects and decomposition of the distribution policy effects for women. Confidence intervals were obtained by bootstrap with 100 repetitions.

[Figure 6 image: panels "Lorenz curves", "Observed differences", "Minimum wage", "De-unionization", "Individual characteristics", "Residual"; horizontal axis: Quantile; legend: LE, Uniform confidence bands.]

Figure 6. 95% simultaneous confidence intervals for observed Lorenz, observed Lorenz policy effects and decomposition of the Lorenz policy effects for men. Confidence intervals were obtained by bootstrap with 100 repetitions.

[Figure 7 image: panels "Lorenz curves", "Observed differences", "Minimum wage", "De-unionization", "Individual characteristics"; horizontal axis: Quantile; legend: LE, Uniform confidence bands.]

Figure 7. 95% simultaneous confidence intervals for observed Lorenz, observed Lorenz policy effects and decomposition of the Lorenz policy effects for women. Confidence intervals were obtained by bootstrap with 100 repetitions.

[Figure 8 image: panels "Minimum wage", "De-unionization", "Individual characteristics", "Residual"; horizontal axis: Quantile; legend: Distribution regression, Censored distribution regression, Censored quantile regression.]

Figure 8. Comparison of distribution regression, censored distribution regression and censored quantile regression estimates of the decomposition of quantile policy effects for men.

[Figure 9 image: panels "Minimum wage", "De-unionization", "Individual characteristics", "Residual"; horizontal axis: Quantile; legend: Distribution regression, Censored distribution regression, Censored quantile regression.]

Figure 9. Comparison of distribution regression, censored distribution regression and censored quantile regression estimates of the decomposition of quantile policy effects for women.

Table A1: Reversing the order of the decomposition: 1979-1988, DR

                                    Effect of:
Statistic            Total change   Individual attributes   Unions         Minimum wage   Coefficients
Men:
Standard Deviation   8.0 (0.3)      0.9 (0.2)               1.5 (0.1)      2.9 (0.2)      2.7 (0.3)
                                    11.4 (2.8)              19.2 (1.0)     36.3 (2.8)     33.1 (2.4)
90-10                21.5 (1.0)     0.4 (1.3)               8.8 (1.2)      11.2 (0.9)     1.1 (1.3)
                                    1.8 (5.8)               40.7 (5.6)     52.1 (4.8)     5.3 (5.9)
50-10                11.3 (1.4)     2.5 (1.5)               0.7 (1.2)      11.2 (0.5)     -3.1 (1.1)
                                    21.8 (12.7)             5.8 (11.3)     99.6 (15.5)    -27.1 (14.0)
90-50                10.2 (1.2)     -2.1 (1.2)              8.1 (0.8)      0.0 (0.9)      4.2 (1.1)
                                    -20.1 (12.9)            79.1 (0.7)     0.0 (9.5)      41.0 (9.8)
75-25                15.4 (1.1)     6.4 (1.2)               -2.1 (1.3)     0.0 (1.0)      11.1 (1.2)
                                    41.6 (6.8)              -13.4 (9.2)    0.0 (7.4)      71.8 (8.7)
95-5                 33.0 (2.1)     2.6 (1.7)               5.9 (1.1)      23.0 (1.0)     1.4 (1.5)
                                    7.9 (4.8)               17.9 (3.4)     69.9 (5.5)     4.3 (4.4)
Gini coefficient     4.1 (0.1)      -0.3 (0.1)              1.0 (0.0)      1.4 (0.1)      2.0 (0.1)
                                    -6.8 (2.8)              24.1 (1.1)     33.3 (2.5)     49.4 (1.8)
Women:
Standard Deviation   10.9 (0.4)     4.5 (0.2)               0.0 (0.0)      4.4 (0.3)      2.1 (0.3)
                                    41.1 (2.4)              -0.2 (0.2)     40.0 (2.4)     19.1 (2.5)
90-10                39.8 (1.4)     11.2 (0.7)              0.0 (0.4)      27.2 (0.2)     1.3 (1.1)
                                    28.2 (1.6)              0.0 (1.0)      68.4 (2.3)     3.4 (2.6)
50-10                33.0 (0.7)     7.9 (0.8)               -0.8 (0.5)     22.7 (0.5)     -1.4 (0.7)
                                    24.1 (2.6)              -2.4 (1.7)     82.6 (2.2)     -4.3 (2.4)
90-50                6.8 (1.4)      3.3 (0.8)               0.8 (0.5)      0.0 (0.5)      2.8 (1.4)
                                    47.9 (13.4)             11.7 (7.2)     0.0 (6.5)      40.3 (9.9)
75-25                12.8 (0.9)     2.8 (0.7)               0.0 (0.5)      5.5 (0.3)      4.5 (0.8)
                                    22.0 (5.6)              0.0 (3.9)      43.0 (3.5)     35.0 (4.5)
95-5                 38.8 (1.9)     17.4 (0.9)              0.0 (0.3)      16.5 (2.0)     5.0 (2.1)
                                    44.8 (3.0)              0.0 (0.9)      42.4 (5.1)     12.8 (5.1)
Gini coefficient     4.0 (0.1)      0.6 (0.1)               0.0 (0.0)      2.5 (0.1)      0.9 (0.1)
                                    14.7 (2.4)              1.2 (0.3)      61.1 (2.7)     23.0 (2.2)

Notes: All numbers are in %. Bootstrapped standard errors are given in parentheses. The second line in each cell indicates the percentage of total variation. The distribution regression model has been applied.