Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries http://www.archive.org/details/inferenceonquantOOcher I Massachusetts Technology Department ot Economics Working Paper Series INFERENCE ON Institute ot QUANTILE REGRESSION PROCESS, AN ALTERNATIVE Victor Chernozhukov Working Paper 02- 12 February 2002 Room E52-251 50 Memorial Drive Cambridge, MA 02142 This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://papers.ssrn.com/paper.taf '/abstract id^ xxxxx DEWEY 2 f Massachusetts Institute of Technology Department of Economics Working Paper Series INFERENCE ON QUANTILE REGRESSION PROCESS, AN ALTERNATIVE Victor Chernozhukov Working Paper 02-1 February 2002 Room E52-251 50 Memorial Drive Cambridge, MA 02142 This paper can be downloaded without charge from the Social Science Research Network Paper Collection http://papers.ssni.com/paper.taf7abstract id= xxxxx at INFERENCE ON QUANTILE REGRESSION PROCESS, AN ALTERNATIVE VICTOR CHERNOZHUKOV Abstract. A very simple and practical resampling test inference based on Kmaladzation, as developed in in alternative has competitive or better power, accurate size, of non-parametric sparsity and score functions. series data. effect of It is offered as an alternative to Koenker and Xiao (2002a). This and does not require estimation applies not only to iid but also time Computational experiments and an empirical example that re-examines the re-employment bonus on the unemployment duration support this approach. Key Words: bootstrap, subsampling, quantile regression, quantile regression process, Kolmogorov-Smirnov test, unemployment duration 1. Introduction Inference in quantile regression models, pioneered by the classical work of Koenker Bassett (1978), crucial to a is and wide range of economic analyses. For example, evaluation of the distributional consequences of social programs requires inference concerning nature, direction, and quantity of the impact throughout the entire outcomes distribution. See e.g. Abadie (2002), Buchinsky (1994), Heckman and Smith (1997), Gutenbrunner and Jureckova (1992), McFadden (1989), Koenker and Xiao (2002a), and Portnoy (2001). Just like in the classical p-sample theory, e.g. Doksum (1974) and Shorack and Wellner (1986), this kind of inference is based on the empirical quantile regression process. It differs however from the early approaches by replacing the basic (indicator) regressors with general ones. The main difficulty associated with such inference tures, estimated nuisance parameter, or non-i.i.d. is the Durbin problem - the model's fea- data induce parameter-dependent asymp- 1 jeopardizing distribution-free inference. In a recent Econometrica paper Koenker and Xiao (2001) proposed an ingenious and intricate theory, based on Khmaladze transformation, that purges the tests statistics from the non-distribution-free components, restoring totics, MIT working paper, January 2002. Contact: 50 Memorial Drive, E52-262 F, Cambridge MA 02142, VCHERN@MIT.EDU. This project would have been possible without receiving help and encouragement from Takeshi and Roger Koenker. Hansen for I would like to express my appreciation and gratitude to them. I also Amemiya thank Christian very useful comments. The problem gained importance after a long series of bibliography. 1 remarkable works by Durbin, listed in the VICTOR CHERNOZHUKOV 2 distribution-free inference. dictable component The approach uses recursive projections to annihilate the pre- a martingale that limits to a standard Brownian in the process, leaving motion, thus overcoming the Durbin problem. Here we suggest a simple resampling alternative that complex Khmaladze transformation, (ii) metric nuisance and score functions, (iii) the sense that does not require the estimation of the nonparahas an accurate size and the optimal power (in has same power as the test with its does not require the somewhat (i) very competitive with Khamaldzation, (iv) known The basic idea The process. is is aimed at substantively extremely simple. H The key correctly, regardless of ), which makes is it com- a useful complement to based on the quantile regression statistic is denoted H, under the null hypothesis. In whether the null As a an appropriately re-centered quantile process. result, is H sampling purposes, we choose the subsample bootstrap, cf. we resampie true or not, as well as the entire null law of the process are correctly estimated under local departures from the (1999). is expanding the scope of empirical inference. statistic has a limit distribution, order to estimate 2 robust to dependent data, and (v) is putationally and practically attractive. Therefore, the approach Khmaladzation and value critical Politis, null. For re- Romano, and Wolf which has computational, practical, and certain theoretical advantages over the usual (unsmoothed) bootstrap for quantile regression, Bickel (2000). Horowitz (1992)'s mechanism cf. Buchinsky (1995) and Sakov and smoothed bootstrap may also be an attractive resampling for these tests. The underlying principle differs from the conventional bootstrap tests for goodness of fit. The conventional tests resampie from a probability model that is consistent with the null, see e.g Romano (1988), Andrews (1997), and Abadie (2002). Although such approach is potentially useful in quantile regression settings, remains unknown, because the its validity quantile regression families estimated in emprical work are typically mis-specified and in- complete probability models (regression quantile and the In tail what and id => and we use P* follows, — > to denote in distribution for to denote (outer) probability, weak convergence random The questions posed in the for (1974), (1989). under P*. fundamental econometric and effect, effect, cf. Quantile regression is statistical literature are (1999), Koenker and Xiao (2002a). tests do not generally have or, Abadie (2002), Heckman and Smith (1997), an important and practical tool about such distributional phenomena. Khmaladze whether a location-scale effect, or a general shape Koenker and Machado example, a stochastic dominance and McFadden and convergence The Testing Problem the treatment exhorts a pure location Doksum which possibly depends on n, bounded functions in a space of boi vectors, respectively, 2. effect, cf. lines are not constrained to avoid crossing, regression quantiles are not estimated). this property, see Koenker and Xiao. for learning INFERENCE ON QUANTJLE REGRESSION PROCESS Suppose Y is X the outcome variable, and FYlx are regressors. Let the conditional distribution function and the r-quantile of Y 3 and Fy\ X (r) denote The given X. basic conditional quantile model takes the linear in parameters form: F- A.(r)=X'/?n (r). 1 7 = 6 T, where for all t [e, 1 — U~ e] This are quantiles of interest. model Y— entire shape of the conditional distribution and includes the X'f3n (U), where location-scale models as special cases. Pn(r) is As in made dependent on The f/(0, 1). To a is random coefficient stated model allows regressors to affect the classical linear location power facilitate the local and analysis, parameter the sample size n. Koenker and Xiao (2002a), we consider the following #(t)/3„(t)-7-(t) = null hypothesis: reT, *(t). (1) where R{t) denotes a q x p matrix, q < p = dim(/3), r£l', and \£(t) denotes a known ^ T -» R9 We assume that functions R{t), ^(t), t(t), and /?o( T ) = lim n /?n (T) function : . are continuous in r. The tests will be based on the Koenker-Bassett quantile regression process (3n (-): n V pn {r) = arg min pT (Y, - Xtf) r € T, , (=1 where p T (u) = u(t — I(u < Other estimators, such 0)). as Chamberlain's minimum distance or instrumental variable quantile regression estimators, can be considered as well, depending on the problem. We will focus on the basic inference process: «n(T)=(5(T)A,(T)-f(T)-*(T)) and derived from it Kolmogorov-Smirnov and Smirnov Sn = where ||a|| definite v = v/nsup||i; n (r)||v (T) r€T Example 1 The in r. (Location Hypothesis). An Sn = n statistics / ||u n Sn = f{\Znv n (-)), (r)||| (r) dr, (3) JJ —^ Va'Va, the symmetric V(r) symmetric matrix uniformly , (2) ) V(t) uniformly choice of V and in r, and V(t) V discussed in section is important hypothesis is a positive 3. that of the classical is location-shift regression F^x (T)=X'a + 1 -Fwhere V is (a2, ...,a p )', independent of X. i {r), or In this case, R(t) = R = [0 : • V, /p -i] and r(r) = r = which asserts that the quantile regression slopes are constant, independent of The component r can be estimated by any method LAD, OLS, and others in Koenker and Xiao (2002a). t. Y = X'a + 7 consistent with the null, for example VICTOR CHERNOZHUKOV 4 Example X Hypothesis). The 2 (Location-scale and to affect only the location F-^{T) = X'a + X'j-F- l V is independent of X. In this case, = 0. Estimates of a and 7 F 7 (t) can 1 \I/(t) • v Example D= 3 (Stochastic /0_i(-) is Dominance • (t), be obtained by using see /?i(-), R = OLS [0 : Ip-i], and projection of each Koenker and Xiao (2002a). Hypothesis). Suppose The V, D = test of stochastic 1 denotes receipt and dominance, or whether unambigiously beneficial, in the model F^ x dominance involves the = a + 7 Fy r(r) on the intercept denotes non-receipt of a treatment. the treatment moments: Y = X'a + X'-y or (T), where component of slopes vector location- scale shift regression constraints scale of Y, but not any other (T) = D5(T) + X'd( T ), null 5(t) > for all t 0, 1 G versus the non-dominance alternative 5(t) < 0, for some r € T. = 0, In this case, the least favorable null involves t(t) P(t) = R — —[1,0...], ^(r) = 0. and and one may use the one one-sided Kolmogorov-Smirnov or Smirnov Sn = v ninfrST max( — 5n (r),0) and Sn = y/n J max( — 5n (r), 0)\\y {r) dr to test (5(t),0(t)')', statistics / || the hypothesis. We will maintain the following assumptions. A.l (Y ,X t t ,t < n) is stationary and strongly mixing on probability space A. 2 Law of (Yt,Xt,t < (a) for P ™\ n), [ is contigious to a fixed continuous function p(r) R(t)Pu(t) (b) for a fixed - r(r) = *(t) (a) continuous function g(r) Under any local alternative, 7 —> + g(r), R(r)p(T)-r(r) A. 3 : : P'"l, 3 some W g(r) and = mean Gaussian and either for each n for each n y/n - Pn {-)) => &(•)• ^(T), where (6,p,?) (J3 n {-) on p. \/"(-R(') _ are jointly functions with nondegenerate covariance kernel, Under the global alternative, A2(b), the same holds, except that the needs not have the same distribution as in A3 (a). e.g. )- = ^(T)+g(T). A2(a), (b) As defined Pn p(r)/y/n, or. 7 —> Rq and R(.)) => p(.), v/n(f(-) -r(-)) => -?(•)> jointly in zero (fi.3", 87 in van der Vaart (1998) limit (b, p, <;) INFERENCE ON QUANTILE REGRESSION PROCESS A.l allows a wide variety of data processes: sufficient but is iid, time and panels. Mixing is not necessary for consistency of subsampling. Stationarity can be replaced by more general Romano, and Wolf stability conditions, see ch. 4 in Politis, and A. 2(b) formulate a local and a global alternative. is series, 5 A. 3 is (1999). A. 2(a) very general condition, that implied by a wide variety of conditions in the literature, most remarkable and general of which are given in Portnoy (1991), who allows shape heteroscedasticity and dependent data. Thus A. 3. substantively generalizes Koenker and Xiao's (2001) or Koenker and Machado's (1999) conditions (local-to-location-scale assumption and the hypotheses in Examples Proposition 2, sampling) which elegantly suit but are restrictive and not necessary in Example Under conditions Al, A2a, A3, 1. 1. and 1 iid Vnvn {-) =>«() =u(-) + 3. in £°°(7) d{-)+p(-), MO = where u(t) and d(r) i?(r)'6(r) = {Po{ T )p{ T ) sn 2. Under Al, A2b, A3, y/n(vn {-) - d{T) = {(3o{t)p{t) 1. <f(r)). once g statistics in (3) The + And Sn usual component u iid oo if = u(-) /(%/"<?(•) components that Under the null, p = 0, + d(-). where u{r) = Z?(t)'6(t) and + Op {\)) —^ oo (which is true for illustrate the Durbin problem: typically a Gaussian process with non-standard covariance is kernel, so its distribution can not away by imposing • s = /M-)). g{-)) => v(-) —^ ?( r )) ^ 0/ limit consists of three The =» + be feasibly simulated. conditions. However, the This problem may be assumed problem does not go away, once the data is a time series or a panel. In such setting, Koenker and Xiao's method unfortunately does not apply in 2. its present form. Component d Koenker and Xiao is the Durbin isolate nonstandard covariance component that is present because R and r are estimated. d as a chief problem that makes the entire term v to have a kernel. They use Khmaladzation to annihilate this component. 3. Component p, which describes deviations from the null, determines the test's power. As Koenker and Xiao show, the Khmaladzation inadvertently removes some portion of this component as well. In fact they gave examples where p is removed completely, such as piece- wise linear densities. Such densities are not commonplace, yet they can easily approximate a well behaved density. Nevertheless, Koenker and Xiao show that Khmaladzation has respectable power in most practical cases. Khmaladzation requires estimation of several nonparametric nuisance functions - the and various score functions, see Koenker and Xiao for details. The sparsity functions feasibility of this may depend on the procedure not laborious. Otherwise, for instance in Example functions is is more difficult the underlying model. Under a location-scale shift model, and has not been implemented nor had its 3, estimation of score theory been established. VICTOR CHERNOZHUKOV 6 we describe a simple approach that In the next section, very useful in practice, does is not erase components of p under any circumstances, and does not require nuisance function estimation. From a constructive point of view, the approach of the Koenker and Xiao methods, which are brilliant Rather, the approach is meant not intended to be a critique is and useful in be a useful complement, aimed to many conceivable cases. at substantively expanding the scope of empirical inference. Resampling Test and Its Implementation 3. The 3.1. Test. The approach is v n {r) based on the mimicking process v and = v n {r) -g{r), Proposition 2. 1. Given A.l, A2a, A. 3 A. 3 v n (T)^v(-)=u(-)+d(-), S n =^S Under when local alternatives, vn = {-) Sn = => v statistic Sn f{v n (-)). § n => S. 2. (-), Given A.l, A2b, /(«(•)) Sn the statistic S n correctly mimics the null behavior of This does not happen under global alternatives, but this the null is false. : , even is not important. we use In what follows v n (r) itself to estimate g(r), and use the subsample bootstrap consistently estimate the distribution of S, which equals that of The usual bootsrap will also work, but subsampling grounds explained (2000). The (1999), in is S to under the null hypothesis. preferable on both computational Buchinsky (1995) and theoretical reasons given in Sakov and Bickel 4 subsample bootstrap, introduced by basic idea of the is statistic to approximate the sampling distribution of a Politis, statistic computed over smaller subsets of data. The resampling Romano, and Wolf based on the values of this tests based on subsampling are done in three steps. Step 1. when Wt For cases such subsets B n is "n choose = (Yt,Xt ) 6." subsets of size b of the form {Wi, each i-th subset, i < Bn is iid, construct For cases when ..., {W } t is all subsets of size 6. The number a time series, construct of B n = n— 6+1 Wi+b-i}- Compute the inference process v^ n ,i(')^ f° r 5 . Denote by v n the inference process computed over the entire sample; and by computed over the i-th subset of data: i>f> in> i the inference process Another attractive alternative is the smoothed bootstrap as in Horowitz (2001). Subsampling is used for pragmatic, computational reasons. In addition, Sakov and Bickel (2000) show that subsampling, combined -2 with interpolation, yields the same minimax order of occupancy as smoothing once subsample size b oc n ' 5 . A smaller 2.5 in Politis, number B n of randomly chosen subsets can Romano, and Wolf (1999). also be used, if Bn —> oo as n —> oo, cf. Section INFERENCE ON QUANTILE REGRESSION PROCESS and define § nj 6 S„ i6i2 for cases , = 7 = supv^lK^r) - when Sn is and -> )])< H =b u n (r)||v(r) or S nbi , under - b,i{j) un (r)|||,,dT, Define statistics, respectively. x}. - ff(-)ll = v^ x Op (l/y/n) -^4 0, including when - <?() + (g(-) - v n (-))\\ = Vb\\v nAi {-) - g(-) + op (l)\\, b -> oo, V^ll^nO) y/b\\v n \\v n H{x) = Pr{S < and bii {-) Therefore, the distribution of i. coincides with Step f° r instance - = Pr{S<x} ff(r)=p(r)/ > /fi. Therefore uniformly in ^n( Kolomogorov-Smirnov or Smirnov G{x) As 6/n — f{vb[vt,, n ,i{') 7 (r, g) §„,{,,, can consistently estimate G, which Thus, the following steps are local alternatives. clear. Estimate G[x) by 2. B„ Gn , b = S- ^l{S n (M (r)<x}. 1 (x) , !=I Step 3. The critical value is obtained as the cnJb {l-a) The size a Theorem (i) test rejects Given A.l 1. When Under where (3 Under //(x) v a) -a) = Pr{f{v p -^ is is A2b, —> oo,n —» Pn (5n > a), 0, (f // is -^ G- a (l G - is +p()) it Bn — oo, cn , a). c oo, > 6 _1 (l i6 (l — (l - continuous at Pn (Sn > - a)) -> a. H~ l (l a)) -+ continuous at G _1 (l — Pn (Sn > - a), a): — a): /?, -a)), test is asymptotically —* °°- the critical values are — continuous at i/ //"'(I if c n> (,(l x at c„, 6 (l > wften a), a)) -* i/ie —^ and Sn oo: 1. covariance function of nondegenerate. Thus the resampling Op {\) may be known power and j3 goes to one. Sn — ^ unbiased and has the same power as the critical value. Furthermore Under global if as p(r) —> useful to account for an error in G~\(l — a) caused by of standard error to oo or — oo, alternatives, the estimated oo for Kolmogorov-Smirnov and Smirnov we added 1.69 times an estimate conservative when B„ is small. simulations, when Sn > and G(x) are absolutely continuous and v In practice ^ + p{-)) > corresponding test that uses a f( v o(') - //"'(I G n<b (-): 6 a-th quantile of = G-^(l-a). 0, b H 0. if — -^> //"'(l - a), {-) - a) = A2a, p global alternative c„, 6 (l (iv) - —> A. 3 as b/n local alternative c„, 6 (l (iii) - the null is true, c„, 6 (l (ii) the null hypothesis 1 G~\(l — Bn a), to statistics. being "small". make the test In more VICTOR CHERNOZHUKOV 8 3.2. Approximations. It may be sometime more with the largest cell size 5n —> Corollary 1. as n —> practical to use a grid Propositions 1 and 2 and Theorems Estimation of V(r). 1 is — as We for estimating V*(r), uniformly consistently in t. iid n —» oo. we could In order to increase the test's power a (generalized) Andersen-Darling weight. In general cases. set _1 , many methods samples, there are are not aware of any results for more Note that we would need a Newey and West (1987) type estimator that consistent uniformly in Subsampling 7 and 2 are valid for piece-wise constant V(t) =V*{t) =Var[u(r)] which in place of oo. approximations of the finite-sample processes, given that 5 n 3.3. 7n is t. can be used to estimate V(t), even without assuming asymptotic itself integrability conditions. This is possible by using a percentile method in conjunction with asymptotic normality. Consider the truncated variance Vk( t = ) = K(t) = Uj 1 *- — _1 Var[w K (r)] p 1= i[Lj(t),Uj(t)] , where v k {t) a large compact is We a-quantile of Vj(t). set. = v{t) x E.g., I km (v{t)), L3 = a-quantile of vj(t), and can estimate V^(r) using Theorem 2 stated below. Note that having estimated the truncated variance, we may stop there since V£(r) « V*(t), and Theorem Second, using that v(t) = 1 K, applies to any positive definite symmetric V(t). N(0, V*(r)), we can use the percentile method estimate of diagonal elements of V*(r) based on V^-(t). correlations, for a large we can then estimate off-diagonal elements. to obtain an Using symmetrically trimmed In simulations we simply used un-truncated variances. Theorem 2 provides the uniformly consistent in t estimates of any truncated moments of the process v n (r), including the trimmed correlations. This theorem of the ingenious results of Politis, Romano, and Wolf (1999). Let r i—> is v(t) be an element of £°°(T), equipped with the sup norm, of measurable Lipshitz functions \\<p(v) - <p(v')\\ <p : < £°°(T) c • sup —> K M. ||i,(t) - a direct consequence and L(c, k) be a class that satisfy: v'(t)1 \\ip(v)\\ < k, T where c and k are suitably chosen positive constants. For probability laws Q and Q', define the bounded Lipshitz metric (which metrizes weak convergence) as Pbl{Q,Q') Useful examples of vp p ip include (p(v) and we replace the indicator = = sup\\EQ if- EQ np\\. w(t) to / a-('D(t)), where (v\, ...,v p) m = v™ 1 x ... x 1 K{t) (v(t)) by a smooth approximation / A( -)(u(t)) which . INFERENCE ON QUANTILE REGRESSION PROCESS K, vanishes outside compact set for all r. correlations. For example, if v(t) Vax[v K (r)] is This defines Ln^ Theorem in £ oc kinds of truncated all = E Lnb [v{r) 2 f K (v(t))} = B~ ^[(^(t) l : Under assumption A.1-A.3, 2 v n (r)) f K (v nAl (r) - vn (r))] =1 denotes the subsampling (outer) law of v n ^j;(•) 2. moments and a scalar (for clarity sake), then 2 where 9 letting L and Lq — v n {-) in £°°(7). denote the laws ofv(-) and vq(-) (7), respectively, p~ pBL{Ln,b'L) -2» and L equals Lq under t £ 7) within 0, local alternatives. In particular, for functions (v >-> v(T) m f K{r) (v(T)), Ti(c,k), sup E L ^[v(r) m fKav(r))} - EL [v(r) m fK{r) (v{r))] ^0. 7£T The last statement remains true even when /K (r) (-) Choice of Block replaced by l/r( T )( is Romano, and Wolf (1999) various rules are suggested for choosing appropriate subsample size. Politis, Romano, and Wolf (1999) focus on the calibration and minimum volatility methods. The calibration method involves picking the optimal block size and appropriate critical values on the basis of simulation experiments conducted with a model that approximates a situation at hand. The minimum volatility method involves picking (or combining) among the block sizes that yield more stable critical values. More detailed suggestions emerge from Sakov and Bickel (1999) and Buchinsky (1995). Sakov and Bickel (2000) suggest that choosing b = hi 2 5 yields 7 the optimal minimax accuracy (in conjunction with extrapolation). Our own experiments indicated that the constant k between 3 and 10 are attractive both computationally and 3.4. Size. In Sakov and Bickel (1999) and in Politis, ' qualitatively, which well accorded with the results of Sakov and Bickel (1999) for the sample median. 4. A Computational Example The computational experiment that we consider is that of Koenker and Xiao (2002a). This allows us to compare the performance of the resampling without prejudicing against the Their result is for latter. Khamaladzation Consider the location-shift hypothesis as in Example the subsample bootstrap with replacement replacement versions are asymptotically equivalent once b 2 /n (1999). test vs. — > 0. However, the replacement and nonSee e.g. Politis, Romano, and Wolf VICTOR CHERNOZHUKOV 10 1. The data is generated from the model: Y = OL + pX + cT{X )-eu i i i =70+7! o{Xi) X{, d ~N{Q,1), Xi ~iV(0,l), a= Under the = null hypothesis 71 0. 0,0 We = l,7o = l- examine the empirical rejection probabilities for the sample sizes and we used the OLS estimate of /3 and 7 = [.05, .95]. When 71 = the model is a location-shift model, and the rejection rates yield the empirical sizes. When 71 7^ the model is a heteroscedastic model, and the rejection rates give the empirical powers. Table 1 reports the results and compares them with Khmaladzation. Other details of the set up are as those reported in Koenker and Xiao (2002b). heteroscedasticity parameter 71 test for different choices of the . In constructing test, Table 2 speaks for From ples. serious itself. these results, complement The resampling test powerful and accurate even in small sam- is say that the resampling test emerges as a respectable, fair to it is Khamaladzation method. The method to the is also quite robust to variation of subsample size - a wide variety of subsample sizes performs very well (including when the resampling mechanism small sub-samples (6 = 5n 2 5 ' ) 5. To illustrate the present is the n out of n bootstrap), suggesting that even are both computationally An and qualitatively fairly attractive. Empirical Application approach, we will re-analyze and expand on the main empirical question considered in Koenker and Xiao (2002a). The question concerns the Pennsylvania re-employment bonus experiment conducted by the U.S. Department of Labor, 8 which was conducted scheme in the 1980's in order to test the incentive effects of unemployment insurance for were randomly offered a cash bonus and if an alternative compensation (UI). In these controlled experiments, if UI claimants they found a job within some prespecified of time the job was retained for a specified duration. The goal was to evaluate the impact of such a scheme on the unemployment duration. As in Koenker and Xiao (2002a) we focus on the compensation schedule that includes a lump-sum payment of six times the weekly unemployment benefit for claimants establishing the reemployment within 12 weeks (in addition to the usual weekly benefits). The definition of unemployment spell includes one waiting week, with the maximum of uninterrupted full weekly benefits of 27. The model under consideration is the linear conditional quantile model for the logarithm of duration: QW)(T\X) = There see e.g. is a(r) + 5(t) D + X'P(r), a significant empirical literature focusing on the analysis of this and other similar experiments, Meyer (1995)'s review. INFERENCE ON QUANTILE REGRESSION PROCESS where is T is D the duration of unemployment, is 1 the indicator of the bonus offer, and I X a set of socio-demographic characteristics (age, gender, number of dependents, location within the state, existence of recall expectations, and type of occupation). Further details are given in Koenker and Bilias (2001). The estimate of Quantile Treatment Effect for is <5(-) plotted in Figure 1. Unemployment Duration 0.4 Quantile Index Figure The 1 three basic hypotheses described in table 3 include: T= • treatment • treatment affects only the location and scale of the outcome log(T), • treatment constant across most of the distribution effect is unambiguously effect is beneficial: 5(t) These hypotheses specialize Examples 1-3 implemented following section 3, and the implemented for subsample size of 3000. < for all r to the present case. results are given in We sample despite that n designs The as, effectively, first = — [.15, .85]), G 1. The resampling test is Table 3. The tests were could not consider subsamples of smaller sizes because they often yielded singular designs (many components of taking on positive value with probability 2 ( 10%). Thus, we X are dummy variables dealt with effectively a small 6384 (see Goldberger (1991) on characterizing close-to-singular the small-sample designs). two hypotheses are decisively Koenker and Xiao (2002a). This and powerful inferences even tion of these hypotheses is is rejected, strongly supporting the conclusion of notable given that the resampling tests provide accurate in effectively small samples, as we saw in Table 2. The rejec- an additional strong evidence in favor of the quantile inference VICTOR CHERNOZHUKOV 12 paradigm of Lehmann-Doskum, which emphasizes ubiquity of quantile shift effects and the general impossibility of describing such treatment effects as merely shifting the location and scale. The hypothesis statistic is 0, of stochastic dominance, the third one, while for rejection it is necessary that it decisively supported. is exceeds the value of The 2.6. test This additional result complements the set of inferences given in Koenker and Xiao (2002a). Thus the bonus offer creates a first order stochastic dominance effect on the unemployment duration, supporting the efficacy of the program. 6. A Conclusion simple and practical resampling test offered as is technique, suggested in Koenker and Xiao (2002a). (same power as the test with known parametric nuisance functions. It critical value) an alternative to the Khmaladzation This alternative has optimal power and does not require estimation of non- applies both to iid and time series data. Finite-sample experiments provide a strong evidence in favor of this technique and an empirical illustration illustrates its utility. Appendix A and Proof of Proposition 1 Proof of Theorem Part in Politis, 1 2 result (1999). immediate from A.2-A.3. There are few Kolmogorov-Smirnov give the proof for the is of the proof follows standard arguments for subsampling consistency, as II Romano, and Wolf The statistic. details that we have to fill out before then. Extensions to other statistics defined in We the text are straightforward. To prove I. (i)- (iii), GnAx) = B~ Gn EG n A x = ) in Politis, Next , b (x) Pn(Sb l 1 x). l[ two For facts: fact 1, V K G n ,b(x) and write out V 1/2 (t) [sup (V6K,6,i(r) - g(r)) + -fb(g(r) - v n (r))) I iid case: (1999), by Theorem uniformly in l '2 (r) G„A X being a U-statistic of degree ) 3.2.1, combined with b; < «], and otherwise: by contiguity, conclude G n ,b(x) —^ LLN G(x). i (Vb(v n ,bAr) - g(r)) + Vb(g(r) - v„(r))) < - | ]T sup|v 1/2 (r)(\/fc(^, b .,(r)-s(r)))|< Romano, and Wolf collect 1 J^ = B- < G n ,b(x) define |vi/*(t) (Vb(v n ,bAr) - g(r)) + y/b(g(r) The paradigm is formulated in a series of works by Lehmann Machado (1999), and Koenker and Bilias (2001). - n/aT, «„(r))) (1974), Doksum (1974), Koenker and INFERENCE ON QUANTILE REGRESSION PROCESS where A n = 1/2 (r)i>(r)y- 1/2 (r)V and A„ sup, maxeig (v'- eq-ty 10 on p.460 in Amemiya (1985). < l[Ai where l n = v/l/A n and u n 10 (x/u n = v A„, Fact 2 follows from Fact - Wn )] < and wn is < l[A, qn = max[|u„ — 1|, |/„ — —^ 1|] Thus 0. < x] sup r maxeig (v'" 1/2 (r)V(r)V> 1 and by ||A||-|NI < l[Ai < {l/ln ||A + -uj[| - 1/2 < (r)) by ||.4|| + |MI> + UI„)] defined below. By A2 and A3 and assumptions on V and V w n = 0, = 13 wp— > 1 sup_ y/b V l{E n ) = 1, rl/2 (r)(u n (r) where E„ = — = Op (\/b/\/T) -^ p(r)) {v n ,q n < > S} for any 5 0. G n ,b(x - e)l{En < G n ,b{x)l{E n < < G n ,i,(x) < G n ,b(x + e). Now pick — so that [x e,x + e] are continuity points of G(x). For such small enough t G n ,b(x + c) —^ G(x — c), e > for c = e and c = — e, which implies G{x — e) — e < G nt b{x) < G(x + e) + e w.p. — 1. Since e and <5(e) -1 can be set as small as we like, G n ,b{x) —^ G(x). Now note that x = G (l — a) is a continuity point by II. Thus G n ,b(x + for small enough e > there is S > 0, so that e)l(En) so that with probability tending to one: by fact 2: G n ,b(x — ) ) e) , > assumption. Convergence of quantiles is implied by the convergence of distribution functions at continuity points. III. (iv) follows from Proof of Theorem 2 Romano, and Wolf Lifshits (1982) or I. The proof (1999). II. The Davydov, Lifshits, of the first statement class of moment is and Smorodina (1998) by A3. a direct corollary of Theorem 7.3.1 functions / defined prior to the proof measurable, since these functions are a measurable function of a random subsampling truncated moments thus follows from the definition of degenerate uniformly in r, it pi,. vector v„(r). III. Finally, is in Politis, clearly Borel The convergence because v(t) is of non- has continuous bounded density by Gaussianity, by A. 3, Ev(t) p 1 k ^^(v(t)) can be approximated arbitrarily well by Ev{r) p fK (-)(v{r)) for some v >-> f(T) p /K(-) 6 L(c',k') for some c', /.' REFERENCES Abadie, A. (2002): "Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models," JASA, forthcoming. Amemiya. T. (1985): Advanced Econometrics. Harvard University Press. Andrews, D. W. K. (1997): "A conditional Kolmogorov test," Econometrica, 65(5), 1097-1128. Buchinsky, M. (1994): "Changes in U.S. wage structure 1963-87: An application of quantile regression," Econometrica, 62, 405-458. (1995): "Estimating the Asymptotic Covarinace Matrix for Quantile Regression: A Monte-Carlo Study," J. Econometrics, 68, 303-338. Davydov, Y. A., M. A. Lifshits, AND N. V. Smorodina (1998): Local properties of distributions of stochastic functionals. American Mathematical Society, Providence, RI, Translated from the 1995 Russian original by V. E. Nazalkinskii and M. A. Shishkova. Doksum, K. (1974): "Empirical probability plots and statistical inference for nonlinear models in the twosample case," Ann. Statist., 2, 267-277. DURBIN, J. (1973a): Distribution theory for tests based on the sample distribution function. Society for Industrial and Applied Mathematics, Philadelphia, Pa., Conference Board of the Mathematical Sciences Regional Conference Series in Applied Mathematics, No. 9. (1973b): "Weak convergence of the sample distribution function when parameters are estimated," Ann. Statist., 1, 279-290. (1975): "Kolmogorov-Smirnov tests when parameters are estimated with applications to tests of exponentiality and tests on spacings," Biometrika, 62, 5-22. 10 Let A be the sup x x'Ax/x'x. maximum eigenvalue (characteristic root) of the symmetric matrix A. Then A = VICTOR CHERNOZHUKOV 14 (1976): "Kolmogorov-Smirnov tests when parameters are estimated," in Empirical distributions and processes (Selected Papers. Meeting on Math. Stochastics, Oberwolfach. 1976), pp. 33-44. Lecture Notes in Math., Vol. 566. Springer, Berlin. (1984): "Boundary crossing probabilities for Gaussian processes and applications to KolmogorovSmirnov statistics when parameters are estimated," in Limit theorems in probability and statistics. Vol. I. II (Veszprem, 1982), pp. 427-447. North-Holland, M. Knott. AND C. C. Taylor Amsterdam. "Components of Cramer-von Mises statistics. II," J. Roy. Statist. Soc. Ser. B, 37, 216-237. Goldberger. A. S. (1991): A course in econometrics. Harvard University Press, Cambridge, MA. Gutenbrunner. C. AND J. Jureckova (1992): "Regression rank scores and regression quantiles," Ann. Statist., 20(1), 305-330. Heckman. J. J., AND J. Smith (1997): "Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts," Rev. Econom. Stud., 64(4), 487-535, With the assistance of Nancy Clements, Evaluation of training and other social programmes (Madrid, Durbin. J.. (1975): 1993). J. L. (1992): "A smoothed maximum score estimator for the binary response model," Econometrica, 60(3), 505-531. (2001): "Bootstrap Methods for Median Regression Models," Econometrica. Koenker. R., AND G. S. Bassett (1978): "Regression Quantiles," Econometrica, 46, 33-50. Horowitz. "Quantile Regression for Duration Data: A Reappraisal R., AND Y. Bilias (2001): of the Pennsylvania Reemployment Bonus Experiments," Empirical Economics, to appear, also at http: //uuw.econ.uiuc edu/"roger /research/home .html. Koenker. R., AND J. A. F. Machado (1999): "Goodness of fit and related inference processes for quantile Koenker, . regression," Koenker. R J. . Amer. AND Statist. Z. Xiao Assoc, 94(448), 1296-1310. (2002a): "Inference on the Quantile Regression Process," Econometrica, forthcoming. (2002b): "Inference on the Quantile Regression Process: Electronic Appendix," at www.econ.uiuc.edu. Lehmann. cisco, E. L. (1974): Nonparametrics: statistical methods based on ranks. Holden-Day Inc., San FranWith the special assistance of H. J. M. d'Abrera, Holden-Day Series in Probability and Calif., Statistics. Lifshits, M. A. (1982): "Absolute continuity of the distributions of functionals of random processes," Teor. Veroyatnost. i Primenen., 27(3), 559-566. McFadden. D. (1989): "Testing for Stochastic Dominance," in Studies in Economics of Uncertainty in Honor of Josef Hadar, ed. by T. Fomny, and T. Seo. Meyer. B. (1995): "Lessson from the U.S. unemployment insurance experiments," J. of Economic Literature, 33, 131-191. Newey, W. K.. AND K. D. West (1987): "A simple, positive semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix," Econometrica, 55(3), 703-708. Politis. D. N.. J. P. Romano, AND M. Wolf (1999): Subsampling. Springer- Verlag, New York. Portnoy, S. (1991): "Asymptotic behavior of regression quantiles in nonstationary, dependent cases," J. Multivariate Anal, 38(1), 100-113. Portnoy. S. (2001): "Censored Regression Quantiles," prepirint, www.stat.uiuc.edu. Romano. J. P. (1988): "A bootstrap revival of some nonparametric distance tests," J. Amer. Statist. Assoc, 83(403), 698-708. Sakov. A., AND P. Bickel (1999): "On the choice of in the out of n bootstrap in estimation problems," preprint, UC Berekeley. Sakov. A., AND P. J. Bickel (2000): "An Edgeworth expansion for the out of n bootstrapped median," Statist. Probab. Lett, 49(3), 217-223. Shorack, G. R.. AND J. A. Wellner (1986): Empirical processes with applications to statistics. John Wiley & Sons Inc., New York. van der Vaart. A. W. (1998): Asymptotic statistics. Cambridge University Press, Cambridge. m m m Table H= Empirical Rejection Results for 1. .5 Power = 7 n= 100 n- 200 = n = 72 7 = .2 7i level Khmaladze Test H= Bofmger H=.6 Bofmger Bofmger Size 5% Power Size = - 7 5 = 7 = .2 Power Size 7i = 7 -5 = 7=2 7i = -5 0.101 0.264 0.898 0.035 0.211 0.755 0.016 0.126 0.641 0.070 0.480 0.988 0.041 0.406 0.990 0.022 0.280 0.964 300 0.062 0.622 0.998 0.043 0.665 1.000 0.029 0.416 0.998 400 0.043 0.809 1.000 0.043 0.809 1.000 0.035 0.632 1.000 Notes: All results are from Koenker and Xiao (2002b). Symbol relative to the TABLE tic), for Bofmger Empirical rejection results for 2. various K, = b K 5% H denotes different bandwidth choices rule. resampling test (Smirnov Statis- x n 2 / 5 using 250 bootstrap draws and 500 Repe, titions. Subsampling ' rest = 7=2 = 7=2 0.014 0.348 0.980 0.026 0.350 200 0.052 0.752 1.000 0.059 300 0.058 0.910 1.000 0.058 400 0.054 0.980 1.000 0.064 7i = -5 7 Boot strap Test (b=n) Power Size 100 7 = n= n= n = n Subsampling Test (K = 10) (K=5) Power Size Power Size 7i = -5 7 = 7 = .2 7i = .5 0.954 0.022 0.316 0.968 0.728 1.000 0.038 0.728 1.000 0.924 1.000 0.074 0.918 1.000 0.978 1.000 0.056 0.970 1.000 maximal sim. 0.009 s.e. 0.009 0.009 Notes: All results are reproducible and the programs are available from the author. Table b = 3. 3000 ( The test results for the Hypothesis Null Location-shift 6(t) Location-scale shift S(t) =S =a+ Dominance 6{t) < Effect re-employment bonus treatment, using subsampling with replacement ) Alternative ± 5 a + 7q(t) S(T) 7a(r) S(t) -fi 3t:5{t) > Smirnov Statistic 5% level critical value Decision Reject 2.46 1.31 2.47 1.30 Reject 0.00 4.59 Accept Date Due -yfet]o~> MIT LIBRARIES 3 9080 02246 0957 '