Test (2012) 21:635–655
DOI 10.1007/s11749-011-0266-y

ORIGINAL PAPER

An informative subset-based estimator for censored quantile regression

Yanlin Tang · Huixia Judy Wang · Xuming He · Zhongyi Zhu

Received: 30 December 2010 / Accepted: 23 August 2011 / Published online: 23 September 2011
© Sociedad de Estadística e Investigación Operativa 2011

Abstract Quantile regression in the presence of fixed censoring has been studied extensively in the literature. However, existing methods either suffer from computational instability or require complex procedures involving trimming and smoothing, which complicate the asymptotic theory of the resulting estimators. In this paper, we propose a simple estimator that is obtained by applying standard quantile regression to observations in an informative subset. The proposed method is computationally convenient and conceptually transparent. We demonstrate that the proposed estimator achieves the same asymptotic efficiency as Powell's estimator, as long as the conditional censoring probability can be estimated consistently at a nonparametric rate and the estimated function satisfies some smoothness conditions. A simulation study suggests that the proposed estimator has stable and competitive performance relative to more elaborate competitors.

Keywords Asymptotic efficiency · Censoring probability · Fixed censoring · Informative subset · Nonparametric · Quantile regression

Mathematics Subject Classification (2000) 62G05 · 62G20 · 62N02

Y. Tang · Z. Zhu
Department of Statistics, Fudan University, Shanghai, China
e-mail: zhuzy@fudan.edu.cn

H.J. Wang
Department of Statistics, North Carolina State University, Raleigh, USA

X. He
Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, USA

1 Introduction

Suppose that y_i^* is the ith latent response variable, i = 1, ..., n, but due to censoring, we only observe y_i = max(y_i^*, C), where C is a known censoring point.
Without loss of generality, we assume C = 0 throughout the paper. At a given quantile level 0 < τ < 1, we consider the following latent quantile regression model:

$$ y_i^* = \mathbf{x}_i^T \boldsymbol{\theta}_0(\tau) + e_i, \qquad (1.1) $$

where x_i is a p-dimensional design vector, randomly sampled from a distribution P whose support is R_x ⊂ R^p, θ_0(τ) is the unknown coefficient vector, and e_i is the random error whose τth conditional quantile given x_i equals zero. Throughout, we assume that the e_i are independent of each other. Under this model, x_i^T θ_0(τ) represents the τth conditional quantile of y_i^* given x_i.

Estimation for the quantile regression model (1.1) in the presence of censoring has been studied extensively in the literature. Noting that the τth conditional quantile of the observed y_i is max{Q_τ(y_i^* | x_i), 0}, by the equivariance property of quantiles under monotone transformations, Powell (1986) proposed to estimate θ_0(τ) by the minimizer of

$$ Q_{1,n}(\boldsymbol{\theta}) = n^{-1} \sum_{i=1}^n \rho_\tau\bigl\{ y_i - \max(\mathbf{x}_i^T \boldsymbol{\theta}, 0) \bigr\}, $$

where ρ_τ(u) = u{τ − I(u < 0)} is the quantile check function. Due to the nonconvexity of this objective function, the optimization problem is computationally challenging. Several authors have developed optimization algorithms to obtain Powell's estimator; see Koenker and Park (1996), Fitzenberger (1997a, 1997b), Fitzenberger and Winker (2007) and Koenker (2008). However, the existing computational methods are unstable, especially under heavy censoring, and the iterative linear programming methods only guarantee convergence to a local minimum; see Buchinsky (1994), Fitzenberger (1997a) and Chernozhukov and Hong (2002). Portnoy (2010) demonstrated that the use of a nonlinear fit may cause Powell's estimator to break down in cases where other estimators are robust.

Following the arguments in Powell (1986), it can be easily shown that Powell's estimator is asymptotically equivalent to the minimizer of

$$ Q_{2,n}(\boldsymbol{\theta}) = n^{-1} \sum_{i=1}^n \rho_\tau( y_i - \mathbf{x}_i^T \boldsymbol{\theta} ) \, I\bigl\{ \mathbf{x}_i^T \boldsymbol{\theta}_0(\tau) > 0 \bigr\}. $$

Let δ_i = I(y_i^* > 0) be the censoring indicator.
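To make the two objectives above concrete, the check function ρ_τ and the objectives Q_{1,n} and Q_{2,n} can be sketched in a few lines of numpy. This is an illustrative sketch only, not the authors' code; the function names and the tiny data in the usage example are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = u * {tau - I(u < 0)}, the quantile check function
    return u * (tau - (u < 0))

def powell_objective(theta, X, y, tau):
    # Q_{1,n}(theta) = n^{-1} sum_i rho_tau{ y_i - max(x_i' theta, 0) };
    # nonconvex in theta because of the max(., 0) inside the loss
    fitted = np.maximum(X @ theta, 0.0)
    return float(np.mean(check_loss(y - fitted, tau)))

def oracle_objective(theta, X, y, tau, theta0):
    # Q_{2,n}(theta): check loss restricted to {i : x_i' theta0(tau) > 0};
    # infeasible in practice since theta0(tau) is unknown
    keep = X @ theta0 > 0
    return float(np.sum(check_loss(y[keep] - X[keep] @ theta, tau)) / len(y))

# Toy usage: with noiseless data censored at 0, both objectives vanish at theta0.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, -1.0]])
theta0 = np.array([0.0, 2.0])
y = np.maximum(X @ theta0, 0.0)
```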
By the facts that π_0(x_i) = P(δ_i = 1 | x_i) = P{e_i > −x_i^T θ_0(τ) | x_i} and P{e_i > 0 | x_i} = 1 − τ, the objective function Q_{2,n} is equivalent to

$$ Q_{3,n}(\boldsymbol{\theta}) = n^{-1} \sum_{i=1}^n \rho_\tau( y_i - \mathbf{x}_i^T \boldsymbol{\theta} ) \, I\bigl\{ \pi_0(\mathbf{x}_i) > 1 - \tau \bigr\}. $$

This suggests that to obtain an estimator that is asymptotically equivalent to Powell's estimator, we can simply apply standard quantile regression for uncensored data to the subset {i : π_0(x_i) > 1 − τ}, which includes all the observations (even censored ones) for which the true τth conditional quantile is above the censoring point 0. Motivated by this, we consider a simple estimator for censored quantile regression based on the data contained in an informative subset. In practice, the conditional censoring probability π_0(x_i) is unknown and has to be estimated. We will show that π_0(x_i) can be estimated nonparametrically, and that this does not affect the asymptotic efficiency of the resulting quantile coefficient estimator.

Similar ideas of using a subset of the sample for estimation have appeared in the censored quantile regression literature. Buchinsky and Hahn (1998) and Khan and Powell (2001) developed two different two-stage estimators. They restricted the regressors to a compact subset through trimming functions of x_i, and estimated the censoring probability 1 − π_0(x_i) by either parametric or nonparametric regression. In addition, to obtain root-n consistency and the asymptotic distribution of the two-stage estimators, both methods replaced the indicator function I{π_0(x_i) > 1 − τ} by some smoothed weighting function w(x_i). The resulting estimators have asymptotic covariances that depend on the trimming and smoothing functions, which makes them asymptotically less efficient than Powell's estimator.
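The subset characterization {i : π_0(x_i) > 1 − τ} = {i : x_i^T θ_0(τ) > 0} underlying Q_{3,n} can be checked numerically. The sketch below uses a hypothetical Gaussian-error model with illustrative values (θ_0 = (1.25, 5), τ = 0.5), for which π_0(x) = P(e > −x^T θ_0 | x) = Φ(x^T θ_0); it is not taken from the paper.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical model: y* = 1.25 + 5*x + e, e ~ N(0,1), tau = 0.5
tau = 0.5
theta0 = np.array([1.25, 5.0])
x = np.linspace(-2.0, 2.0, 41)
xb = theta0[0] + theta0[1] * x                  # x' theta_0(tau)
pi0 = np.array([norm_cdf(v) for v in xb])       # P(y* > 0 | x) = Phi(x' theta_0)

# The two subsets coincide: pi_0(x) > 1 - tau  iff  x' theta_0(tau) > 0
assert np.array_equal(pi0 > 1.0 - tau, xb > 0.0)
```

Because Φ is strictly increasing with Φ(0) = 1 − τ = 0.5 here, the equivalence holds pointwise, which is exactly why quantile regression on this subset mimics Q_{2,n}.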
Under an envelope restriction on the censoring probability, Chernozhukov and Hong (2002) proposed a three-stage estimator based on selecting the subset iteratively between {i : π̂(x_i) > 1 − τ + d} and {i : x_i^T θ̂^0(τ) > δ_n}, where d and δ_n are positive tuning parameters, π̂(x_i) is some estimate of π_0(x_i), and θ̂^0(τ) is the second-stage estimator of θ_0(τ).

Quantile regression has also been studied in the context of random censoring. For instance, Portnoy (2003) proposed an estimator based on recursive reweighting, assuming that the conditional quantile functions of y_i^* are linear in the predictors across all quantile levels; Zhou (2006) proposed a stable estimator based on the inverse-probability weighting method; and Wang and Wang (2009) proposed a locally weighted estimator based on the local Kaplan-Meier estimator of the conditional survival function. The estimators of Portnoy (2003) and Wang and Wang (2009) can be adapted to fixed censoring, though the theoretical properties of such extensions have not been studied.

The proposed estimator in this paper is a further simplification of existing methods for data subject to fixed censoring. We demonstrate that even without the additional steps used in other methods, the estimator can achieve the same asymptotic efficiency as Powell's estimator, as long as the conditional censoring probability can be estimated consistently at an appropriate nonparametric rate and the estimated function satisfies some smoothness conditions.

The rest of the paper is organized as follows. In Sect. 2, we describe the proposed informative subset estimation procedure and present the theoretical results, including the consistency and the asymptotic distribution of the proposed estimator. We assess the finite sample performance of the proposed estimator through a simulation study in Sect. 3. All the theoretical proofs are deferred to the Appendix.
2 The proposed method

2.1 Estimation procedure

In this subsection, we describe the proposed informative subset-based estimator for censored quantile regression. The proposed estimator can be obtained in two steps.

Step 1. Estimate π_0(x_i) by either a parametric or a nonparametric regression method for binary data, and denote the estimated conditional probability as π̂(x_i). For details, see Remark 1 in Sect. 2.2.

Step 2. Determine the informative subset J_n = {i : π̂(x_i) > 1 − τ + c_n}, where c_n is a pre-specified small positive value with c_n → 0 as n → ∞. Then θ_0(τ) can be estimated by applying standard quantile regression to the subset J_n; that is, θ̂(τ) is the minimizer of

$$ Q_n(\boldsymbol{\theta}, \hat{\pi}) = n^{-1} \sum_{i=1}^n \rho_\tau( y_i - \mathbf{x}_i^T \boldsymbol{\theta} ) \, I\bigl\{ \hat{\pi}(\mathbf{x}_i) > 1 - \tau + c_n \bigr\}. \qquad (2.1) $$

Here c_n is added to exclude the boundary cases from the subset used in (2.1). The rate of c_n required for establishing the asymptotic properties is given in assumption A4 in Sect. 2.2.

For a given π(·), the negative subgradient of Q_n(θ, π) with respect to θ is

$$ M_n(\boldsymbol{\theta}, \pi) = n^{-1} \sum_{i=1}^n \mathbf{x}_i \bigl\{ \tau - I( y_i - \mathbf{x}_i^T \boldsymbol{\theta} < 0 ) \bigr\} I\bigl\{ \pi(\mathbf{x}_i) > 1 - \tau + c_n \bigr\}. $$

It is shown in the Appendix that E[M_n{θ_0(τ), π_0}] = 0 under model (1.1). Therefore, M_n(θ, π_0) is an unbiased estimating function for θ_0(τ).

Chernozhukov and Hong (2002) used parametric regression to estimate the conditional censoring probability, which may yield inconsistent estimation of π_0(x_i) when the parametric model is misspecified. They assumed an envelope restriction on the censoring probability, requiring that the misspecification not be severe. They used a fixed constant d in place of c_n to avoid bias, and a further step to select an informative subset based on x_i^T θ̂^0(τ), where θ̂^0(τ) is the initial estimator from their first step.
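Step 2 can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the helper names are hypothetical, and the minimizer of (2.1) itself would come from any standard quantile-regression solver applied to the selected subset (e.g. the rq function in the R package quantreg).

```python
import numpy as np

def informative_subset(pi_hat, tau, c_n):
    # Step 2: J_n = {i : pi_hat(x_i) > 1 - tau + c_n}
    return np.flatnonzero(pi_hat > 1.0 - tau + c_n)

def isub_objective(theta, X, y, pi_hat, tau, c_n):
    # Q_n(theta, pi_hat) in (2.1): check loss summed over J_n only,
    # normalized by the full sample size n
    J = informative_subset(pi_hat, tau, c_n)
    u = y[J] - X[J] @ theta
    return float(np.sum(u * (tau - (u < 0))) / len(y))
```

For example, with π̂ = (0.9, 0.4, 0.8), τ = 0.5 and c_n = 0.1, the threshold is 0.6 and the subset keeps observations 0 and 2; the middle observation, whose estimated probability of being uncensored is too low, is dropped.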
In contrast, our proposed estimator can achieve the same asymptotic efficiency as Powell's estimator as long as the estimated censoring probability π̂(·) satisfies some smoothness conditions and converges to π_0(·) at the uniform rate n^{-1/4}. Therefore, we allow both parametric and nonparametric estimation of π_0(·).

2.2 Asymptotic properties

In this subsection, we establish the asymptotic properties of the proposed quantile coefficient estimator θ̂(τ). Throughout the paper, we use ‖·‖ to denote the L_2 norm of a vector. For a given π(·), let ‖π − π_0‖_∞ = sup_x |π(x) − π_0(x)|. For any vector a = (a_1, ..., a_p) of p nonnegative integers, define the differential operator

$$ D^a = \frac{\partial^{|a|}}{\partial x_1^{a_1} \cdots \partial x_p^{a_p}}, \quad \text{where } |a| = \sum_{k=1}^p a_k. $$

Let R_x be a bounded, convex subset of R^p with nonempty interior, and let 0 < α ≤ 1 be some positive constant. For any smooth function h : R_x → R, let

$$ \|h\|_{\infty, p+\alpha} = \max_{|a| \le p} \sup_{x} |D^a h(x)| + \max_{|a| = p} \sup_{x \ne x'} \frac{|D^a h(x) - D^a h(x')|}{\|x - x'\|^{\alpha}}. $$

Let C_c^{p+α}(R_x) be the set of all continuous functions h : R_x → R with ‖h‖_{∞,p+α} ≤ c. The following assumptions are needed to establish the asymptotic properties.

A1. The covariate vector x has a bounded, convex support R_x and a density function f_x(·) that is bounded away from zero and infinity uniformly over R_x. In addition, E(xx^T) is a p × p positive definite matrix.

A2. The conditional cumulative distribution function of y^* given x, F_0(t | x), has a first derivative with respect to t, denoted f_0(t | x), which is continuous and uniformly bounded by f̄_0 < ∞.

A3. For any nonnegative sequence ε_n → 0 and n large enough, λ_{min,n}, the smallest eigenvalue of the matrix E[xx^T f_0{x^T θ_0(τ) | x} I{x^T θ_0(τ) > ε_n}], satisfies λ_{min,n} > λ_min > 0. There exists a constant ς > 0 such that for any positive ε_n → 0, sup_{‖θ − θ_0(τ)‖ ≤ ς} E{I(|x^T θ| < ε_n)} = O(ε_n).

A4. c_n → 0 and n^{1/4} c_n is greater than some positive constant c^*.

A5.
Assumptions on the conditional probability π_0(x) and its estimator:

A5.1. For any positive ε_n → 0 with ε_n/c_n → 1 and any x ∈ R_x, π_0(x) > 1 − τ + ε_n implies x^T θ_0(τ) > ε_n^* for some ε_n^* that satisfies ε_n = O(ε_n^*).

A5.2. P(π_0(x), π̂(x) ∈ C_c^{p+α}(R_x)) → 1 for some positive α ∈ (0, 1] and finite c.

A5.3. For any ε_n → 0, sup_{‖π − π_0‖_∞ ≤ ε_n} E[I{|π(x) − (1 − τ + c_n)| < ε_n}] = O(ε_n).

A6. For any positive ε_n → 0 with ‖θ − θ_0(τ)‖ ≤ ε_n,

$$ E\bigl[ \mathbf{x} I\{\pi_0(\mathbf{x}) > 1 - \tau + c_n\} I( \mathbf{x}^T \boldsymbol{\theta} \le 0 ) \bigr] = E\bigl[ \mathbf{x} I\{\pi_0(\mathbf{x}) > 1 - \tau + c_n\} \bigl\{ I( \mathbf{x}^T \boldsymbol{\theta} \le 0 ) - I( \mathbf{x}^T \boldsymbol{\theta}_0(\tau) \le 0 ) \bigr\} \bigr] = -\mathbf{D}^*_{n2} \bigl\{ \boldsymbol{\theta} - \boldsymbol{\theta}_0(\tau) \bigr\}, $$

where D*_{n2} is a positive semidefinite matrix satisfying 0 ≤ λ_min(D*_{n2}) ≤ λ_max(D*_{n2}) < ∞ for sufficiently large n.

Assumption A1 assumes a bounded support for convenience; this assumption can be relaxed if the censoring probability function is smoother. Assumption A2 ensures that E{M_n(θ, π)} can be locally expanded to establish the consistency of θ̂(τ). The first part of assumption A3 is a standard condition in censored quantile regression. The second part of assumption A3 states a boundary condition on the covariates, which can be satisfied if at least one component of x is continuous and the true quantile function is not flat; it is similar to assumption R.2 of Powell (1986). Assumption A5.1 basically requires that the derivative of π_0(x) is bounded and the true quantile curve is not flat. It differs from assumption (c) in Chernozhukov and Hong (2002), where the condition was imposed on the estimated censoring probability. Assumption A5.2 states the smoothness assumptions on π_0(·) and π̂(·), which are standard in nonparametric statistics (van der Vaart and Wellner 1996, p. 154). It can be easily verified that the π̂(·) used in the simulation study satisfies assumption A5.2; see Stone (1985) and Xue and Wang (2010) for more details on the differentiability properties of nonparametric distribution function estimators. Assumption A5.3 requires π_0(x) to be nonflat near 1 − τ.
Assumption A6 is essentially an application of the mean value theorem to the integral. By approximating the indicator function by a monotone smoothing function, we can see that D*_{n2} can be approximated by a sequence of nonnegative definite matrices. When ε_n = o(n^{-1/4}), D*_{n2}{θ − θ_0(τ)} = o(‖θ − θ_0(τ)‖) due to A1 and A5.1.

The following two theorems state the consistency and the limiting distribution of the informative subset-based estimator θ̂(τ).

Theorem 2.1 At a given quantile level 0 < τ < 1, let θ̂(τ) be the minimizer of the objective function defined in (2.1). Assume that the conditional censoring probability can be estimated consistently with ‖π̂ − π_0‖_∞ = o_p(1). Then under model (1.1) and assumptions A1–A5, we have θ̂(τ) → θ_0(τ) in probability as n → ∞.

Theorem 2.2 Under assumptions A1–A6 and ‖π̂ − π_0‖_∞ = o_p(n^{-1/4}), we have

$$ n^{1/2} \bigl\{ \hat{\boldsymbol{\theta}}(\tau) - \boldsymbol{\theta}_0(\tau) \bigr\} \longrightarrow N\bigl( \mathbf{0}, \, \mathbf{D}^{-1} \mathbf{V} \mathbf{D}^{-1} \bigr), $$

where

$$ \mathbf{D} = E\bigl[ \mathbf{x}\mathbf{x}^T f_0\{ \mathbf{x}^T \boldsymbol{\theta}_0(\tau) \,|\, \mathbf{x} \} I\{ \mathbf{x}^T \boldsymbol{\theta}_0(\tau) > 0 \} \bigr], \qquad \mathbf{V} = \tau(1-\tau) E\bigl[ \mathbf{x}\mathbf{x}^T I\{ \mathbf{x}^T \boldsymbol{\theta}_0(\tau) > 0 \} \bigr]. $$

Theorem 2.2 shows that as long as the estimated censoring probability π̂(·) satisfies some smoothness assumption and converges to π_0(·) at the uniform rate n^{-1/4}, our estimator has the same asymptotic efficiency as Powell's estimator.

Remark 1 If a parametric form of π_0(x) is known, a root-n consistent estimator π̂(x) can be obtained by maximum likelihood estimation. Otherwise, the 4th-root uniform consistency of π̂(x) can be achieved by applying existing nonparametric estimation methods to the data (δ_i, x_i), for instance, generalized linear regression with spline approximation (Stone 1982), generalized additive models (Hastie and Tibshirani 1990), local regression (Loader 1990) or nearest-neighbor generalized linear regression (Altman 1992). For all these methods, the smoothness condition on π̂(x) in A5.2 can be easily verified.
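For inference based on Theorem 2.2, the sandwich covariance D^{-1} V D^{-1} can be estimated by plug-in. The matrix V has a direct sample analogue, sketched below with hypothetical helper names; estimating D additionally requires an estimate of the conditional density f_0 (e.g. by kernel smoothing), which this hedged sketch omits.

```python
import numpy as np

def v_hat(X, theta_hat, tau):
    # Sample analogue of V = tau(1 - tau) * E[x x' I{x' theta_0(tau) > 0}],
    # with the estimate theta_hat plugged in for the unknown theta_0(tau)
    keep = X @ theta_hat > 0
    return tau * (1.0 - tau) * (X[keep].T @ X[keep]) / X.shape[0]
```

Only the observations whose estimated conditional quantile lies above the censoring point contribute, mirroring the indicator in the limiting covariance.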
3 Simulation study

We conduct a simulation study to assess the finite sample performance of the proposed informative subset-based estimator, referred to as ISUB, relative to some common competitors. We consider five different cases for generating the latent response variable y_i^*, i = 1, ..., n. Due to censoring, we only observe y_i = max(y_i^*, 0). For each case, the censoring proportion is around 40%, and the simulation is repeated 1000 times. The first four cases include one standard normal predictor, and the fifth case includes two independent standard normal predictors. As upper quantiles are not affected much by left censoring, we focus on two quantile levels, τ = 0.25 and 0.5. Two different sample sizes, n = 200 and n = 500, are considered.

For the first four cases involving one predictor x_{i1}, we employ the probit model with B-spline approximation to estimate the conditional censoring probability π_0(x_{i1}) in the implementation of ISUB. That is, we approximate π_0(·) by

$$ \pi_0(x_{i1}) \approx \Phi\bigl\{ b(x_{i1})^T \boldsymbol{\gamma} \bigr\}, $$

where b(x_{i1}) = {b_1(x_{i1}), ..., b_{k_n+ℓ+1}(x_{i1})}^T is the B-spline basis, k_n is the number of internal knots, ℓ is the degree of the B-spline basis, and γ is the spline coefficient vector. By Stone (1982), if k_n ∝ n^{1/5}, the estimate π̂(x_{i1}) is consistent for π_0(x_{i1}) at the rate n^{-2/5}. In this simulation study, we choose ℓ = 2, corresponding to quadratic splines. For both n = 200 and 500, n^{1/5} is close to 3, so we set k_n = 3. The knots are selected as the three empirical quartiles of x_{i1}. The tuning parameter c_n is set to n^{-1/4} τ in Cases 1–4 and n^{-1/5} τ in Case 5.

For comparison, we include four other estimators: Powell's estimator (Powell 1986), Portnoy's estimator (Portnoy 2003), Wang and Wang's estimator (Wang and Wang 2009) and the three-step estimator of Chernozhukov and Hong (2002), referred to as POW, POR, LCRQ and 3-Step, respectively.
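The first stage above fits a probit model on a quadratic B-spline basis. As a simplified, hedged stand-in (logistic rather than probit link, and a generic basis matrix B in place of the spline basis; the function name is hypothetical), a Newton-Raphson fit of the binary regression of δ_i on the basis can be sketched as:

```python
import numpy as np

def fit_binary(B, delta, iters=25):
    # Newton-Raphson for P(delta = 1 | x) = expit{b(x)' gamma};
    # a logistic stand-in for the paper's probit-spline first stage
    gamma = np.zeros(B.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(B @ gamma)))
        w = p * (1.0 - p) + 1e-10          # IRLS weights, floored for stability
        grad = B.T @ (delta - p)           # score vector
        hess = B.T @ (B * w[:, None])      # expected information
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma
```

The fitted probabilities π̂(x_i) = expit{b(x_i)^T γ̂} would then be thresholded at 1 − τ + c_n as in Step 2 of Sect. 2.1.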
For Powell's estimator, we use the BRCENS algorithm of Fitzenberger (1997b). Both POW and POR are implemented in the function "crq" in the R package quantreg. In the first step of the 3-Step estimator, the conditional censoring probability is estimated from logistic regression using x_{i1} and x_{i1}^2 as the covariates, and the cutoff value d is chosen as the 0.1th quantile of all π̂(x_{i1}) such that π̂(x_{i1}) > 1 − τ. In the second step, δ_n is chosen as the (1/3 n^{-1/3})th quantile of all x_i^T θ̂^0(τ) such that x_i^T θ̂^0(τ) > 0, where x_i = (1, x_{i1})^T and θ̂^0(τ) is the initial estimator from the first step.

To evaluate the performance of the different estimators, we present the root mean squared errors (RMSE) of each estimator in the first four cases in Fig. 1. In each panel, each line connects the RMSE at eight different locations: locations 1, 3, 5, 7 are for the intercept estimates, and the rest are for the slope estimates; locations 1, 2, 5, 6 are for n = 200, and the others are for n = 500; locations 1–4 are for τ = 0.25, and the rest are for τ = 0.5. In addition, we report the empirical coverage probabilities (ECP) and the empirical mean lengths (EML) of 90% confidence intervals in Table 1. For ISUB and 3-Step, we construct the confidence intervals by applying the rank score method (Koenker 2005, Chap. 3.5), assuming non-identically distributed errors, to the selected subset samples. For POR, LCRQ and POW, the confidence intervals are constructed by bootstrap, where the bootstrap samples are obtained by resampling the triples (y_i, x_i, δ_i) with replacement.

[Fig. 1: Root mean squared errors (RMSE) of different estimators in Cases 1–4. The solid line is for ISUB, the solid line with open circles is for POW, the dash-dotted line is for POR, the thin dashed line is for LCRQ, and the thick dashed line is for 3-Step. The shaded area represents the 95% pointwise confidence band for the RMSE of ISUB.]
The rank score method used for ISUB and 3-Step, and the bootstrap method used for POW and POR, are implemented in the "summary.rq" and "summary.crq" functions of the R package quantreg, respectively; for LCRQ, 300 bootstrap samples are used for each data set.

Case 1. Data are generated from the model

$$ y_i^* = \theta_1 + \theta_2 x_{i1} + e_i, \quad i = 1, \ldots, n, $$

where θ_1 = 5/4, θ_2 = 5, and x_{i1} and e_i are independent N(0, 1). Therefore, the true quantile coefficients at τ = 0.25 and τ = 0.5 are θ_1(0.25) = 5/4 + Φ^{-1}(0.25), θ_1(0.5) = 5/4, and θ_2(0.25) = θ_2(0.5) = 5. The support of x_{i1} is theoretically unbounded, but in all the cases the results show no real difference if x_{i1} is trimmed to [−5, 5]. This represents an ideal case, where both the probit link function used in ISUB and the global linearity assumption required by POR are satisfied. POR and LCRQ have smaller mean squared errors than the other three methods. For all the methods except LCRQ, the confidence intervals for the intercept have ECP lower than the nominal level. For the slope parameters, the confidence intervals from LCRQ, ISUB and 3-Step have ECP close to the nominal level, while those from POW and POR tend to be liberal.

Table 1 Empirical coverage probability (100 × ECP) and empirical mean length (EML) of confidence intervals with nominal level 90% for Cases 1–4. Columns 4–7 refer to the first case of each pair, columns 8–11 to the second.

Cases 1 and 2:

| τ | n | Method | ECP θ1 | ECP θ2 | EML θ1 | EML θ2 | ECP θ1 | ECP θ2 | EML θ1 | EML θ2 |
|---|---|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0.25 | 200 | POW | 84.2 | 88.6 | 0.62 | 0.69 | 86.1 | 86.1 | 1.13 | 1.67 |
| | | POR | 83.6 | 85.5 | 0.59 | 0.65 | 57.8 | 66.7 | 0.87 | 1.39 |
| | | LCRQ | 90.0 | 90.0 | 0.62 | 0.70 | 87.1 | 87.8 | 1.10 | 1.68 |
| | | ISUB | 85.0 | 91.5 | 0.65 | 0.73 | 89.0 | 94.0 | 1.28 | 1.85 |
| | | 3-Step | 86.3 | 93.0 | 0.67 | 0.73 | 84.8 | 93.4 | 1.27 | 1.80 |
| 0.25 | 500 | POW | 82.8 | 86.3 | 0.38 | 0.43 | 85.1 | 87.8 | 0.68 | 1.02 |
| | | POR | 86.2 | 87.1 | 0.37 | 0.42 | 31.5 | 40.5 | 0.53 | 0.86 |
| | | LCRQ | 87.5 | 87.5 | 0.39 | 0.43 | 83.3 | 85.5 | 0.69 | 1.04 |
| | | ISUB | 86.0 | 93.0 | 0.41 | 0.45 | 89.6 | 94.0 | 0.75 | 1.11 |
| | | 3-Step | 86.7 | 93.8 | 0.42 | 0.45 | 86.1 | 93.3 | 0.73 | 1.09 |
| 0.5 | 200 | POW | 83.7 | 86.0 | 0.49 | 0.58 | 84.0 | 86.2 | 0.70 | 1.20 |
| | | POR | 83.8 | 86.4 | 0.49 | 0.57 | 66.8 | 73.2 | 0.66 | 1.11 |
| | | LCRQ | 91.0 | 90.6 | 0.52 | 0.59 | 90.3 | 90.6 | 0.81 | 1.30 |
| | | ISUB | 87.7 | 92.8 | 0.53 | 0.59 | 88.5 | 93.7 | 0.85 | 1.40 |
| | | 3-Step | 87.1 | 93.8 | 0.56 | 0.62 | 85.6 | 93.3 | 0.77 | 1.31 |
| 0.5 | 500 | POW | 82.8 | 83.9 | 0.30 | 0.35 | 81.7 | 85.9 | 0.43 | 0.74 |
| | | POR | 86.5 | 85.6 | 0.31 | 0.35 | 43.7 | 53.1 | 0.41 | 0.69 |
| | | LCRQ | 87.6 | 89.7 | 0.32 | 0.37 | 86.1 | 88.6 | 0.51 | 0.82 |
| | | ISUB | 87.3 | 93.4 | 0.34 | 0.37 | 87.3 | 93.3 | 0.52 | 0.85 |
| | | 3-Step | 87.7 | 94.7 | 0.35 | 0.38 | 85.5 | 93.1 | 0.48 | 0.81 |

Cases 3 and 4:

| τ | n | Method | ECP θ1 | ECP θ2 | EML θ1 | EML θ2 | ECP θ1 | ECP θ2 | EML θ1 | EML θ2 |
|---|---|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0.25 | 200 | POW | 84.2 | 87.4 | 0.75 | 0.88 | 83.0 | 85.3 | 1.43 | 2.15 |
| | | POR | 83.1 | 86.5 | 0.71 | 0.82 | 53.9 | 64.4 | 1.00 | 1.74 |
| | | LCRQ | 89.9 | 91.4 | 0.78 | 0.89 | 92.2 | 91.2 | 1.51 | 2.24 |
| | | ISUB | 87.5 | 93.8 | 0.78 | 0.90 | 86.0 | 93.9 | 1.50 | 2.30 |
| | | 3-Step | 86.1 | 92.4 | 0.82 | 0.93 | 81.8 | 91.8 | 1.61 | 2.33 |
| 0.25 | 500 | POW | 83.9 | 85.9 | 0.46 | 0.54 | 83.8 | 86.7 | 0.86 | 1.31 |
| | | POR | 85.8 | 87.1 | 0.45 | 0.51 | 27.4 | 42.1 | 0.61 | 1.11 |
| | | LCRQ | 87.2 | 89.7 | 0.48 | 0.54 | 88.8 | 91.5 | 0.87 | 1.32 |
| | | ISUB | 88.3 | 95.4 | 0.50 | 0.55 | 87.6 | 94.8 | 0.83 | 1.33 |
| | | 3-Step | 87.0 | 93.1 | 0.52 | 0.57 | 84.6 | 94.4 | 0.93 | 1.40 |
| 0.5 | 200 | POW | 84.9 | 88.4 | 0.52 | 0.62 | 83.5 | 86.3 | 0.76 | 1.31 |
| | | POR | 84.0 | 87.1 | 0.52 | 0.61 | 64.3 | 69.5 | 0.69 | 1.16 |
| | | LCRQ | 91.1 | 91.0 | 0.56 | 0.66 | 92.7 | 93.3 | 0.89 | 1.45 |
| | | ISUB | 86.6 | 92.4 | 0.53 | 0.63 | 83.3 | 91.6 | 0.78 | 1.40 |
| | | 3-Step | 87.2 | 94.1 | 0.59 | 0.67 | 83.4 | 92.6 | 0.84 | 1.44 |
| 0.5 | 500 | POW | 84.0 | 85.8 | 0.31 | 0.37 | 81.3 | 86.4 | 0.46 | 0.80 |
| | | POR | 86.2 | 87.5 | 0.32 | 0.37 | 40.0 | 52.1 | 0.43 | 0.72 |
| | | LCRQ | 88.3 | 90.5 | 0.35 | 0.40 | 89.9 | 90.3 | 0.54 | 0.88 |
| | | ISUB | 87.9 | 92.6 | 0.34 | 0.39 | 85.7 | 93.0 | 0.47 | 0.83 |
| | | 3-Step | 88.2 | 94.4 | 0.37 | 0.41 | 85.7 | 93.4 | 0.53 | 0.89 |

Case 2. Data are generated from the model

$$ y_i^* = \theta_1(\tau) + \theta_2 x_{i1} + \bigl\{1 + (x_{i1} - 0.5)^2\bigr\} e_i(\tau), \quad i = 1, \ldots, n, $$

where θ_1(0.25) = −1/2, θ_1(0.5) = 3/4, θ_2 = 5, e_i(τ) = e_i − Φ^{-1}(τ), and x_{i1} and e_i are independent N(0, 1). In this case, there is quadratic heteroscedasticity in the error, so the true censoring probability π_0(x_{i1}) is quadratic in x_{i1} after the probit link transformation. In addition, the conditional quantile of y_i^* is linear in x_{i1} only at the τth quantile and is quadratic in x_{i1} everywhere else.
As observed in Wang and Wang (2009) for randomly censored data, POR yields biased estimates in this case because the global linearity assumption is violated. LCRQ, ISUB, 3-Step and POW perform similarly in terms of RMSE, but LCRQ and POW produce confidence intervals with lower ECP.

Case 3. Data are generated from the model

$$ y_i^* = \theta_1 + \theta_2 x_{i1} + e_i, \quad i = 1, \ldots, n, $$

where θ_1 = 4/3, θ_2 = 5, x_{i1} ~ N(0, 1) and e_i ~ t(3). Therefore, θ_2(0.25) = θ_2(0.5) = 5, θ_1(0.5) = 4/3 and θ_1(0.25) = 4/3 + F_{t_3}^{-1}(0.25), where F_{t_3}^{-1}(·) is the quantile function of the t distribution with 3 degrees of freedom. With the non-normal regression errors, the probit link function used in ISUB is misspecified in this case. Since all the quantiles of y_i^* are linear in x_{i1}, the POR method is valid and produces estimates with RMSE similar to LCRQ and smaller than the other methods. Even though the probit link function is misspecified, the ISUB method still performs competitively. LCRQ produces confidence intervals with ECP very close to the nominal level, while POW produces confidence intervals with lower ECP than the other methods.

Case 4. Data are generated from the model

$$ y_i^* = \theta_1(\tau) + \theta_2 x_{i1} + \bigl\{1 + (x_{i1} - 0.5)^2\bigr\} e_i(\tau), \quad i = 1, \ldots, n, $$

where θ_1(0.25) = −3/5, θ_1(0.5) = 3/4, θ_2 = 5, x_{i1} ~ N(0, 1) and e_i(τ) = e_i − F_{t_3}^{-1}(τ) with e_i ~ t(3). This case contains non-normal and heteroscedastic errors. Therefore, the true censoring probability π_0(x_{i1}) neither follows a probit link nor is linear in x_{i1} after a link transformation. However, by using a nonparametric estimate of π_0(·), ISUB performs reasonably well. It yields estimates with RMSE similar to the POW and 3-Step methods, and confidence intervals with ECP close to 90%. LCRQ produces smaller RMSE, especially when the sample size is small, and it yields ECP closer to the nominal level with longer confidence intervals. Similar to Case 2, POR leads to very biased estimates.
As observed in the other three cases, the confidence intervals from the 3-Step method tend to be wider than those from the ISUB method, while the ECP are comparable.

Case 5. Data are generated from the model

$$ y_i^* = \theta_1(\tau) + \theta_2 x_{i1} + \theta_3 x_{i2} + \bigl\{1 + (x_{i1} - 0.5)^2/2 + (x_{i2} - 0.5)^2/2\bigr\} e_i(\tau), \quad i = 1, \ldots, n, $$

where θ_1(0.25) = −2/3, θ_1(0.5) = 2/3, θ_2 = 3, θ_3 = 2, e_i(τ) = e_i − Φ^{-1}(τ), and x_{i1}, x_{i2} and e_i are independent N(0, 1). In this case, the censoring probability function involves the interaction between the two covariates. We estimate π_0(·) by a nearest-neighbor locally linear logistic smoother, where the size of the neighborhood is 25% of all observations at n = 200 and 20% at n = 500. Table 2 summarizes the biases, root mean squared errors and coverage probabilities of 90% confidence intervals for the different methods.

Table 2 Mean bias (Bias), root mean squared error (RMSE) and empirical coverage probability (ECP) for Case 5; all entries are multiplied by 100, and the nominal level of the confidence intervals is 90%.

| τ | n | Method | Bias θ1 | Bias θ2 | Bias θ3 | RMSE θ1 | RMSE θ2 | RMSE θ3 | ECP θ1 | ECP θ2 | ECP θ3 |
|---|---|--------|---------|---------|---------|---------|---------|---------|--------|--------|--------|
| 0.25 | 200 | POW | -9.4 | 7.1 | 3.9 | 55.6 | 49.3 | 42.4 | 82.2 | 85.4 | 84.2 |
| | | POR | 25.7 | -24.8 | -17.7 | 40.1 | 43.1 | 35.1 | 79.2 | 83.2 | 86.2 |
| | | LCRQ | -10.2 | 8.4 | 6.1 | 44.5 | 46.6 | 36.6 | 89.4 | 89.1 | 90.8 |
| | | ISUB | 25.7 | -15.7 | -10.9 | 58.6 | 45.3 | 37.4 | 83.1 | 87.4 | 87.5 |
| | | 3-Step | 5.9 | -4.6 | -3.5 | 54.7 | 44.7 | 34.3 | 77.7 | 84.8 | 85.3 |
| 0.25 | 500 | POW | -4.1 | 2.3 | 2.1 | 30.4 | 30.2 | 25.5 | 82.5 | 87.1 | 83.9 |
| | | POR | 26.6 | -27.0 | -17.7 | 33.0 | 35.4 | 26.4 | 60.3 | 68.8 | 76.5 |
| | | LCRQ | 2.7 | -1.4 | -2.3 | 24.7 | 26.8 | 22.8 | 87.9 | 89.9 | 88.4 |
| | | ISUB | 12.8 | -7.6 | -5.3 | 35.1 | 27.8 | 23.1 | 84.6 | 86.0 | 86.5 |
| | | 3-Step | 2.6 | -1.5 | -1.5 | 34.4 | 28.1 | 22.1 | 78.7 | 84.3 | 81.9 |
| 0.5 | 200 | POW | -2.1 | -0.2 | 1.6 | 28.2 | 32.4 | 31.4 | 83.6 | 84.9 | 83.0 |
| | | POR | 15.3 | -16.2 | -10.5 | 26.6 | 31.5 | 28.0 | 80.5 | 83.0 | 87.3 |
| | | LCRQ | 0.9 | -0.1 | 1.2 | 26.9 | 32.3 | 27.5 | 89.3 | 90.1 | 92.4 |
| | | ISUB | 11.0 | -8.1 | -4.1 | 38.7 | 34.1 | 29.5 | 86.1 | 87.2 | 87.9 |
| | | 3-Step | 0.2 | -2.2 | 1.4 | 35.1 | 32.1 | 26.6 | 80.4 | 83.2 | 85.2 |
| 0.5 | 500 | POW | -1.3 | 0.8 | 1.0 | 17.8 | 20.2 | 19.4 | 81.2 | 85.9 | 84.0 |
| | | POR | 14.9 | -15.8 | -10.7 | 20.0 | 22.8 | 19.2 | 70.1 | 77.8 | 81.9 |
| | | LCRQ | 5.8 | -3.8 | -3.7 | 15.9 | 19.5 | 17.7 | 89.1 | 88.5 | 88.7 |
| | | ISUB | 5.5 | -4.0 | -2.9 | 24.0 | 21.2 | 18.7 | 87.4 | 88.3 | 87.9 |
| | | 3-Step | -1.1 | 0.2 | -0.1 | 21.8 | 19.7 | 17.7 | 84.0 | 86.4 | 86.1 |

The ISUB estimates tend to have some finite sample bias. More specifically, ISUB overestimates the intercept and underestimates the slopes. In this case, the quadratic logistic regression used in 3-Step provides a good estimate of π_0(·), which leads to a decent second-step estimator of θ_0(τ). By fine-tuning the subset based on x_i^T θ̂^0(τ) in the third step, the 3-Step method corrects the finite sample bias. We note that the bias of the ISUB estimates decreases as the sample size increases. Furthermore, in this case, the bias is dominated by the variance, so ISUB and 3-Step tend to have comparable RMSE. On the other hand, as already seen in Cases 2 and 4, the POR estimates have a systematic bias that does not diminish even as the sample size increases. In this case, the LCRQ method leads to estimates with smaller RMSE than ISUB. In addition, the confidence intervals from LCRQ have ECP closer to the nominal level than those from ISUB, but the former require a computationally intensive bootstrap procedure.

Since both ISUB and 3-Step are based on informative subsets, we examine the size of the subsets (average number of subjects) selected by ISUB and 3-Step. From Table 3, we see that the size of the informative subsets increases with τ, and decreases if there is heterogeneity in the error. The sizes of the subsets selected by ISUB and 3-Step are very close, except in Case 5, where 3-Step selects larger subsets than ISUB. We find that in Case 5, 3-Step leads to smaller RMSE but lower ECP. One explanation is that 3-Step selects more subjects into the informative subset, which leads to the inclusion of some boundary cases in the subset.
If we look at the simulated data sets (flagged) for which the confidence intervals of 3-Step fail to cover the true values but the confidence intervals of ISUB.t (based on the true π_0(·)) do cover the true values, we notice that quite a few points are incorrectly included in the 3-Step subsets. We find that the differences in RMSE for the overall simulation and for the flagged samples are quite substantial. Because the frequency of such occurrences is not high (10–15% of the simulated data sets), they do not affect the overall RMSE that much, but the coverage probability could be off by around 10 percent.

Table 3 Average numbers of subjects selected into the informative subsets, with standard errors in parentheses.

τ = 0.25:

| | ISUB (n = 200) | 3-Step (n = 200) | ISUB (n = 500) | 3-Step (n = 500) |
|---|---|---|---|---|
| Case 1 | 105.0 (9.0) | 103.4 (7.3) | 263.5 (14.7) | 262.0 (11.7) |
| Case 2 | 88.8 (7.6) | 88.2 (9.0) | 222.6 (13.3) | 223.3 (14.5) |
| Case 3 | 102.3 (8.3) | 102.6 (7.8) | 258.7 (11.7) | 261.0 (12.6) |
| Case 4 | 85.2 (9.4) | 87.4 (12.1) | 217.9 (14.6) | 219.1 (18.0) |
| Case 5 | 81.2 (10.3) | 89.3 (12.2) | 206.7 (16.2) | 221.4 (18.1) |

τ = 0.5:

| | ISUB (n = 200) | 3-Step (n = 200) | ISUB (n = 500) | 3-Step (n = 500) |
|---|---|---|---|---|
| Case 1 | 114.8 (8.8) | 112.9 (7.2) | 289.5 (13.0) | 286.8 (11.2) |
| Case 2 | 103.8 (8.1) | 106.4 (8.4) | 264.2 (13.0) | 269.6 (13.1) |
| Case 3 | 113.5 (8.1) | 113.6 (7.1) | 287.0 (12.2) | 289.2 (11.8) |
| Case 4 | 103.6 (7.7) | 106.3 (8.4) | 265.2 (10.8) | 268.9 (13.3) |
| Case 5 | 96.4 (9.5) | 107.8 (9.5) | 248.7 (15.3) | 269.7 (15.2) |

In summary, Portnoy's estimator tends to be the most efficient when the global linearity assumption is satisfied, but it can be seriously biased when that assumption is violated. The ECP from LCRQ appear closer to the nominal level than those from ISUB, but much of the difference arises because LCRQ used the bootstrap while ISUB used the inversion of the rank score test to construct confidence intervals.
If we also use the bootstrap procedure to construct confidence intervals for the ISUB method, the results are comparable to those from LCRQ; however, the bootstrap is computationally much more intensive than the rank score method. As observed in Portnoy (2010), the BRCENS algorithm for Powell's estimator is unstable and broke down occasionally. In contrast, the proposed ISUB algorithm is computationally simple and stable. The theoretical properties of the ISUB and 3-Step estimators suggest that both have the same asymptotic efficiency as Powell's estimator. In our simulation, we find no clear winner between ISUB and 3-Step; it is the simplicity of ISUB that makes it attractive.

Remark 2 For the proposed ISUB method, the indicator weighting function is asymptotically orthogonal to the score function, in the sense that the efficiency of the estimator is not affected by replacing the true weights with the estimated ones. Therefore, the rank score test can be applied directly to the selected informative subset for constructing confidence intervals. In contrast, the efficiency of the LCRQ estimates is affected by the estimated local weights, so the rank score test cannot be used directly in LCRQ without quantifying the variation caused by the weight estimation.

4 Discussion

In this paper, we proposed a simple informative subset-based estimator for fixed censored quantile regression. The proposed estimator is obtained by applying standard quantile regression to the selected informative subset; the algorithm is therefore computationally simple and stable. We need to point out that the idea of informative subsets is not totally new. However, existing methods based on similar ideas require additional steps, which complicate not only the computation but also the asymptotic theory.
What we want to stress is that for fixed censoring, such additional complications are not necessary to achieve the same asymptotic efficiency as Powell's celebrated estimator. In the more general case of possibly random censoring where the censoring variable C is observed or known at all points, Chernozhukov and Hong (2002) pointed out a way to reduce the problem to one of fixed censoring; however, the asymptotic efficiency result may not carry over. In this paper we assumed random designs so that the results of Chen et al. (2003) can be applied directly, but under appropriate design conditions the same theoretical results also hold for fixed designs. The main uniform approximation results needed for the proofs rely on an exponential inequality commonly used in empirical process theory, which can be generalized to sums of independent but not necessarily identically distributed variables with bounded moments in the quantile regression framework; see, for instance, He and Shao (1996). Finally, we note that relatively little has been published on doubly censored quantile estimation; see Volgushev and Dette (2010) for a recent treatment. We hope that the idea of an informative subset can be generalized to more general settings in future work.

Acknowledgement The research of Dr. Wang is partially supported by NSF grants DMS-0706963 and DMS-1007420. The research of Dr. He is partially supported by NSF grants DMS-1007396 and DMS-0724752 and NNSF of China grant 10828102. The research of Dr. Zhu is partially supported by NSFC grants 10931002 and 1091112038. We would like to thank the Editor, an associate editor and two anonymous reviewers for their constructive comments that led to a major improvement of this article.

Appendix

Throughout the appendix, we omit τ from various expressions such as θ0(τ) and θ̂(τ) whenever it is clear.
Let Θ denote a compact finite-dimensional parameter set and H an infinite-dimensional parameter space, with θ ∈ Θ and π(·) ∈ H. Let C_k, k = 1, …, 8, denote finite positive constants. Recall the negative subgradient of the objective function Q_n(θ, π),

\[
M_n(\theta, \pi) = n^{-1} \sum_{i=1}^n x_i \psi(x_i, y_i, \theta)\, I\{\pi(x_i) > 1 - \tau + c_n\} = n^{-1} \sum_{i=1}^n m_i(\theta, \pi, c_n),
\]

where

\[
\psi(x, y, \theta) = \tau - I(y - x^T\theta < 0) = \tau - I(y^* < 0,\ x^T\theta > 0) - I(y^* > 0,\ y^* - x^T\theta < 0).
\]

Note that

\[
E\{ I(y^* < 0,\ x^T\theta > 0) \mid x \} = P(y^* < 0 \mid x)\, I(x^T\theta > 0) = \{1 - \pi_0(x)\}\, I(x^T\theta > 0),
\]

and

\[
E\{ I(y^* > 0,\ y^* - x^T\theta < 0) \mid x \} = I(x^T\theta > 0)\, P(0 < y^* < x^T\theta \mid x)
= I(x^T\theta > 0) \big[ P(y^* < x^T\theta \mid x) - \{1 - \pi_0(x)\} \big].
\]

Therefore,

\[
E\{ M_n(\theta, \pi) \} = E\{ m_i(\theta, \pi, c_n) \}
= E\big[ x\, I\{\pi(x) > 1 - \tau + c_n\} \big\{ \tau - P(y^* < x^T\theta \mid x)\, I(x^T\theta > 0) \big\} \big].
\]

Lemma 5.1 Under assumptions A1–A5, for all positive ε_n = o(1), we have

\[
\sup_{\|\theta - \theta_0\| \le \varepsilon_n,\ \|\pi - \pi_0\|_\infty \le \varepsilon_n}
\big\| M_n(\theta, \pi) - E\{ M_n(\theta, \pi) \} - M_n(\theta_0, \pi_0) \big\| = o_p(n^{-1/2}).
\]

Proof Let Θ_n = {θ : ‖θ − θ_0‖ ≤ ε_n}, H_n = {π : ‖π − π_0‖_∞ ≤ ε_n}, and take (θ, π), (θ′, π′) ∈ Θ_n × H_n with ‖θ − θ′‖ ≤ ε_n and ‖π − π′‖_∞ ≤ ε_n. The result is a direct extension of Theorem 3 in Chen et al. (2003), from their fixed estimating function to our estimating function, in which the infinite-dimensional parameter depends on n through c_n, and it can be proved in the same way. We only need to verify conditions (3.2) and (3.3) in their paper, as condition (3.1) is trivially satisfied when we set their m_{cj} ≡ 0.
Note that

\[
\begin{aligned}
\big\| m_i(\theta, \pi, c_n) - m_i(\theta', \pi', c_n) \big\|^2
&= \big\| x_i \psi(x_i, y_i, \theta)\, I\{\pi(x_i) > 1-\tau+c_n\} - x_i \psi(x_i, y_i, \theta')\, I\{\pi'(x_i) > 1-\tau+c_n\} \big\|^2 \\
&\le \|x_i\|^2 \tau^2 \big| I\{\pi(x_i) > 1-\tau+c_n\} - I\{\pi'(x_i) > 1-\tau+c_n\} \big| \\
&\quad + \|x_i\|^2 \big| I(y_i^* < 0,\ x_i^T\theta > 0)\, I\{\pi(x_i) > 1-\tau+c_n\} - I(y_i^* < 0,\ x_i^T\theta' > 0)\, I\{\pi'(x_i) > 1-\tau+c_n\} \big| \\
&\quad + \|x_i\|^2 \big| I(y_i^* > 0,\ y_i^* - x_i^T\theta < 0)\, I\{\pi(x_i) > 1-\tau+c_n\} - I(y_i^* > 0,\ y_i^* - x_i^T\theta' < 0)\, I\{\pi'(x_i) > 1-\tau+c_n\} \big| \\
&\le \|x_i\|^2 (\tau^2 + 2) \big| I\{\pi(x_i) > 1-\tau+c_n\} - I\{\pi'(x_i) > 1-\tau+c_n\} \big| \\
&\quad + \|x_i\|^2 \big| I(y_i^* < 0,\ x_i^T\theta > 0) - I(y_i^* < 0,\ x_i^T\theta' > 0) \big| \\
&\quad + \|x_i\|^2 \big| I(y_i^* > 0,\ y_i^* - x_i^T\theta < 0) - I(y_i^* > 0,\ y_i^* - x_i^T\theta' < 0) \big| \\
&=: B_1 + B_2 + B_3.
\end{aligned}
\]

To verify condition (3.2) of Chen et al. (2003), it is sufficient to show that

\[
E\Big\{ \sup_{\|\theta' - \theta\| \le \varepsilon_n,\ \|\pi' - \pi\|_\infty \le \varepsilon_n} (B_1 + B_2 + B_3) \Big\} = O(\varepsilon_n).
\]

Note that

\[
B_1 \le \|x_i\|^2 (\tau^2 + 2) \big[ I\{\pi'(x_i) > 1-\tau+c_n \ge \pi(x_i)\} + I\{\pi(x_i) > 1-\tau+c_n \ge \pi'(x_i)\} \big].
\]

By assumptions A1 and A5.3,

\[
E\big[ \|x_i\|^2 I\{\pi'(x_i) > 1-\tau+c_n \ge \pi(x_i)\} \big]
\le E\big[ \|x_i\|^2 I\{\pi(x_i) + \varepsilon_n > 1-\tau+c_n \ge \pi(x_i)\} \big]
\le C_1 E\big[ I\{\pi(x_i) + \varepsilon_n > 1-\tau+c_n \ge \pi(x_i)\} \big] \le C_2 \varepsilon_n. \tag{5.1}
\]

Similarly, E[‖x_i‖² I{π(x_i) > 1−τ+c_n ≥ π′(x_i)}] ≤ C_3 ε_n. Therefore,

\[
E \sup_{\pi' :\ \|\pi' - \pi\|_\infty \le \varepsilon_n} B_1 \le (\tau^2 + 2)(C_2 + C_3)\varepsilon_n.
\]

By assumption A3,

\[
E \sup_{\theta' :\ \|\theta' - \theta\| \le \varepsilon_n} B_2
\le E \sup_{\theta' :\ \|\theta' - \theta\| \le \varepsilon_n} \|x_i\|^2 \big| I(x_i^T\theta > 0) - I(x_i^T\theta' > 0) \big|
\le C_4 E\big[ I\{-\|x_i\|\varepsilon_n < x_i^T\theta < \|x_i\|\varepsilon_n\} \big] \le C_5 \varepsilon_n.
\]

By assumption A2,

\[
\begin{aligned}
E \sup_{\theta' :\ \|\theta' - \theta\| \le \varepsilon_n} B_3
&\le E \sup_{\theta' :\ \|\theta' - \theta\| \le \varepsilon_n} \|x_i\|^2 \big| I(y_i^* - x_i^T\theta < 0) - I(y_i^* - x_i^T\theta' < 0) \big| \\
&\le E\big[ \|x_i\|^2 E\big\{ I(y_i^* - x_i^T\theta < \|x_i\|\varepsilon_n) - I(y_i^* - x_i^T\theta < -\|x_i\|\varepsilon_n) \mid x_i \big\} \big] \\
&= E\big[ \|x_i\|^2 \big\{ F_0(x_i^T\theta + \|x_i\|\varepsilon_n \mid x_i) - F_0(x_i^T\theta - \|x_i\|\varepsilon_n \mid x_i) \big\} \big]
\le 2 E\big( \|x_i\|^3 \big) \bar f_0 \varepsilon_n \le C_6 \varepsilon_n.
\end{aligned}
\]

Thus, condition (3.2) of Chen et al. (2003) holds with r = 2 and s_j = 1/2, and their condition (3.3) is satisfied by our assumption A5.2 and their Remark 3(ii).

Proof of Theorem 2.1 Note that P(y^* < x^Tθ_0 | x) = τ, and π_0(x) > 1−τ+c_n implies x^Tθ_0 > 0. Therefore, plugging the true θ_0 and π_0 into E{M_n(·, ·)}, we have

\[
E\{ M_n(\theta_0, \pi_0) \}
= E\big[ x\, I\{\pi_0(x) > 1-\tau+c_n\} \big\{ \tau - I(x^T\theta_0 > 0)\, P(y^* < x^T\theta_0 \mid x) \big\} \big]
= E\big[ x\, I\{\pi_0(x) > 1-\tau+c_n\} \big\{ \tau - \tau I(x^T\theta_0 > 0) \big\} \big] = 0.
\]
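The identity E{M_n(θ0, π0)} = 0 can be checked by simulation. The following sketch assumes a simple normal-error design of our own choosing (the error is shifted so that its τth quantile is zero, as the model requires); the sample average M_n(θ0, π0) over the informative subset should then be near zero in every coordinate, of order n^{-1/2}.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, tau, cn = 200_000, 0.3, 0.05
theta0 = np.array([0.5, 1.0])

x1 = rng.uniform(-2.0, 2.0, n)
X = np.column_stack([np.ones(n), x1])
q_tau = norm.ppf(tau)                       # tau-th N(0,1) quantile
e = rng.normal(size=n) - q_tau              # shifted so the tau-th quantile of e is 0
ystar = X @ theta0 + e
y = np.maximum(ystar, 0.0)                  # fixed censoring at 0

# true censoring probability: pi0(x) = P(y* > 0 | x) = 1 - Phi(q_tau - x'theta0)
pi0 = 1.0 - norm.cdf(q_tau - X @ theta0)

# M_n(theta0, pi0): the negative subgradient evaluated at the truth
psi = tau - (y - X @ theta0 < 0)
w = (pi0 > 1 - tau + cn).astype(float)      # informative-subset indicator
Mn = (X * (psi * w)[:, None]).mean(axis=0)  # each coordinate should be near 0
```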
The proof of the consistency of θ̂ follows the lines of the proof of Theorem 1 in Chen et al. (2003); we only need to verify conditions (1.1)–(1.3) and (1.5′) in their paper, as (1.4) follows from Lemma 5.1.

(1.1) By the subgradient condition of quantile regression (Koenker 2005), there exists a vector v with coordinates |v_i| ≤ 1 such that

\[
M_n(\hat\theta, \hat\pi) = n^{-1} \Big\{ \sum_{i \in \Xi} x_i v_i \Big\} = o_p(n^{-1/2}), \tag{5.2}
\]

where Ξ denotes a p-element subset of {1, 2, …, n}.

(1.2) By assumptions A3 and A5.1, θ_0 is the minimizer of

\[
E\{ Q_n(\theta, \pi_0) \} = E\big[ \rho_\tau(y - x^T\theta)\, I\{\pi_0(x) > 1-\tau+c_n\} \big].
\]

Because E{Q_n(θ, π_0)} is a strictly convex function of θ, θ_0 is its unique minimizer. This implies condition (1.2) of Chen et al. (2003).

(1.3) By assumption A5.3, for any positive ε_n → 0, θ ∈ Θ, and ‖π − π_0‖_∞ ≤ ε_n, similar to (5.1) we have

\[
\begin{aligned}
\big\| E\{ M_n(\theta, \pi) \} - E\{ M_n(\theta, \pi_0) \} \big\|
&= \Big\| E\big[ x \big\{ \tau - P(y^* < x^T\theta \mid x)\, I(x^T\theta > 0) \big\}
\big( I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\} \big) \big] \Big\| \\
&\le E\big[ \|x\| \big| I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\} \big| \big] \le C_7 \varepsilon_n.
\end{aligned}
\]

Thus, E{M_n(θ, π)} is uniformly continuous in π at π = π_0.

(1.5′) Let {ε_n} be a sequence of positive numbers approaching zero as n → ∞. Under assumption A1,

\[
E\big[ \big\| x_i\, I\{y_i \le x_i^T\theta\}\, I\{\pi(x_i) > 1-\tau+c_n\} \big\|^2 \big] \le E\big( \|x_i\|^2 \big) \le C_8.
\]

It then follows from Chebyshev's inequality that

\[
\sup_{\theta \in \Theta,\ \|\pi - \pi_0\|_\infty \le \varepsilon_n} \big\| M_n(\theta, \pi) - E\{ M_n(\theta, \pi) \} \big\| = o_p(1).
\]

Proof of Theorem 2.2 (I) We first prove the fourth-root consistency of θ̂. Rewrite E{M_n(θ, π)} as

\[
E\{ M_n(\theta, \pi) \} = E\big[ x \psi(x, y, \theta)\, I\{\pi(x) > 1-\tau+c_n\} \big] =: b_1(\theta) + b_2(\pi) + b_3(\theta, \pi),
\]

where

\[
\begin{aligned}
b_1(\theta) &= E\big[ x \psi(x, y, \theta)\, I\{\pi_0(x) > 1-\tau+c_n\} \big], \\
b_2(\pi) &= E\big[ x \psi(x, y, \theta_0) \big( I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\} \big) \big], \\
b_3(\theta, \pi) &= E\big[ x \big\{ \psi(x, y, \theta) - \psi(x, y, \theta_0) \big\}
\big( I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\} \big) \big].
\end{aligned}
\]

(i) From Theorem 2.1, θ̂ is a consistent estimator. Let ε_n → 0 be any positive sequence and let θ ∈ Θ with ‖θ − θ_0‖ ≤ ε_n.
Since π_0(x) > 1−τ+c_n implies x^Tθ_0 > 0, we have

\[
P(y < x^T\theta_0 \mid x)\, I\{\pi_0(x) > 1-\tau+c_n\}
= P(y^* < x^T\theta_0 \mid x)\, I\{\pi_0(x) > 1-\tau+c_n\}
= \tau\, I\{\pi_0(x) > 1-\tau+c_n\}.
\]

For ‖θ − θ_0‖ ≤ ε_n, we have

\[
\begin{aligned}
b_1(\theta) &= E\{ M_n(\theta, \pi_0) \} = E\big[ x\, E\{\psi(x, y, \theta) \mid x\}\, I\{\pi_0(x) > 1-\tau+c_n\} \big] \\
&= E\big[ x \big\{ \tau - P(y < x^T\theta \mid x) \big\} I\{\pi_0(x) > 1-\tau+c_n\} \big] \\
&= -E\big[ x \big\{ P(y < x^T\theta \mid x) - P(y^* < x^T\theta_0 \mid x) \big\} I\{\pi_0(x) > 1-\tau+c_n\} \big] \\
&= -E\big[ x \big\{ P(y^* < x^T\theta \mid x) - P(y^* < x^T\theta_0 \mid x) \big\} I\{\pi_0(x) > 1-\tau+c_n\} \big] \\
&\quad + E\big[ x \big\{ P(y^* < x^T\theta \mid x) - P(y < x^T\theta \mid x) \big\} I\{\pi_0(x) > 1-\tau+c_n\} \big] \\
&= -E\big[ f_0(x^T\theta_0 \mid x)\, x x^T I\{\pi_0(x) > 1-\tau+c_n\} \big] (\theta - \theta_0) \\
&\quad + E\big[ x\, P(y^* < x^T\theta \mid x)\, I\{\pi_0(x) > 1-\tau+c_n\}\, I(x^T\theta \le 0) \big] + o(\|\theta - \theta_0\|). \tag{5.3}
\end{aligned}
\]

Let D = E[f_0(x^Tθ_0 | x) x x^T I(x^Tθ_0 > 0)]. By assumption A5.3,

\[
\big\| E\big[ f_0(x^T\theta_0 \mid x)\, x x^T I\{1-\tau < \pi_0(x) \le 1-\tau+c_n\} \big] (\theta - \theta_0) \big\|
= O(c_n \|\theta - \theta_0\|) = o(\|\theta - \theta_0\|).
\]

Since π_0(x) > 1−τ is equivalent to x^Tθ_0 > 0, we have

\[
\big\| \big( D - E\big[ f_0(x^T\theta_0 \mid x)\, x x^T I\{\pi_0(x) > 1-\tau+c_n\} \big] \big) (\theta - \theta_0) \big\| = o(\|\theta - \theta_0\|).
\]

For the second term of (5.3), we use assumptions A3 and A6 to get

\[
\begin{aligned}
&E\big[ x\, P(y^* < x^T\theta \mid x)\, I\{\pi_0(x) > 1-\tau+c_n\}\, I(x^T\theta \le 0) \big] \\
&\quad = E\big[ x\, P(y^* < x^T\theta_0 \mid x)\, I\{\pi_0(x) > 1-\tau+c_n\}\, I(x^T\theta \le 0) \big] \\
&\qquad + E\big[ x \big\{ P(y^* < x^T\theta \mid x) - P(y^* < x^T\theta_0 \mid x) \big\} I\{\pi_0(x) > 1-\tau+c_n\}\, I(x^T\theta \le 0) \big] \\
&\quad = \tau E\big[ x\, I\{\pi_0(x) > 1-\tau+c_n\}\, I(x^T\theta \le 0) \big]
+ E\big[ f_0(x^T\theta_0 \mid x)\, x x^T (\theta - \theta_0)\, I\{\pi_0(x) > 1-\tau+c_n\}\, I(x^T\theta \le 0) \big] + o(\|\theta - \theta_0\|) \\
&\quad = -\tau D_{n2}^* (\theta - \theta_0) + o(\|\theta - \theta_0\|).
\end{aligned}
\]

So b_1(θ) = −(D + τ D*_{n2})(θ − θ_0) + o(‖θ − θ_0‖), in which, by assumption A6, D + τ D*_{n2} is positive definite.

(ii) Because ‖π − π_0‖_∞ = o_p(n^{−1/4}) and n^{1/4} c_n > c^* > 0, π(x) > 1−τ+c_n implies π_0(x) > 1−τ, and therefore x^Tθ_0 > 0. Since π_0(x) > 1−τ+c_n also implies x^Tθ_0 > 0, it follows that

\[
\begin{aligned}
b_2(\pi) &= E\big[ x \big\{ \tau - P(y < x^T\theta_0 \mid x) \big\}
\big( I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\} \big) \big] \\
&= E\big[ x \big\{ \tau - P(y^* < x^T\theta_0 \mid x) \big\}
\big( I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\} \big) \big] = 0.
\end{aligned}
\]
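The equivalence π_0(x) > 1 − τ ⟺ x^Tθ_0 > 0, used repeatedly above, holds because x^Tθ_0 is the τth conditional quantile of y^*: P(y^* ≤ 0 | x) < τ exactly when that quantile is positive. A quick numerical confirmation under an assumed normal-error design of our own choosing:

```python
import numpy as np
from scipy.stats import norm

tau = 0.3
theta0 = np.array([0.5, 1.0])
q_tau = norm.ppf(tau)                      # tau-th quantile of the raw N(0,1) error

# model: y* = x'theta0 + (z - q_tau), z ~ N(0,1), so Q_tau(y*|x) = x'theta0
x1 = np.linspace(-3.0, 3.0, 2001)
X = np.column_stack([np.ones_like(x1), x1])
xb = X @ theta0                            # tau-th conditional quantile of y*
pi0 = 1.0 - norm.cdf(q_tau - xb)           # P(y* > 0 | x)

# the two events coincide at every grid point
agree = bool(np.all((pi0 > 1 - tau) == (xb > 0)))
```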
(iii) Similar to (5.1), we have

\[
\begin{aligned}
\big\| b_3(\theta, \pi) \big\|
&\le E\big[ \|x\| \big| P(y < x^T\theta_0 \mid x) - P(y < x^T\theta \mid x) \big|
\big( I\{\pi_0(x) > 1-\tau+c_n \ge \pi(x)\} + I\{\pi(x) > 1-\tau+c_n \ge \pi_0(x)\} \big) \big] \\
&\le E\big[ \|x\| \big( I\{\pi_0(x) > 1-\tau+c_n \ge \pi(x)\} + I\{\pi(x) > 1-\tau+c_n \ge \pi_0(x)\} \big) \big] = o(n^{-1/4}).
\end{aligned}
\]

It follows that sup_{‖θ−θ_0‖ ≤ ε_n, ‖π−π_0‖_∞ = o_p(n^{−1/4})} ‖b_3(θ, π)‖ = o(n^{−1/4}).

(iv) By Lemma 5.1, we have

\[
M_n(\theta, \pi) - M_n(\theta_0, \pi_0)
= -(D + \tau D_{n2}^*)(\theta - \theta_0) + o(\|\theta - \theta_0\|) + b_3(\theta, \pi) + o_p(n^{-1/2}). \tag{5.4}
\]

By Lemma 5.1 and (5.2), plugging (θ̂, π̂) into (5.4) yields

\[
-(D + \tau D_{n2}^*)(\hat\theta - \theta_0) + o_p(\|\hat\theta - \theta_0\|) = -M_n(\theta_0, \pi_0) + o_p(n^{-1/4}).
\]

By the central limit theorem, M_n(θ_0, π_0) is asymptotically normal and n^{1/2} M_n(θ_0, π_0) = O_p(1). By assumptions A3 and A6, θ̂ − θ_0 = o_p(n^{−1/4}).

(II) We now prove the root-n consistency and the asymptotic distribution of θ̂. By assumption A5.1, π_0(x) > 1−τ+c_n implies x^Tθ_0 > d_n, where n^{1/4} d_n is greater than some positive constant, say d^*. Since θ̂ − θ_0 = o_p(n^{−1/4}), assumptions A1 and A5.1 give D*_{n2}(θ̂ − θ_0) = o_p(‖θ̂ − θ_0‖). Since n^{1/4} c_n > c^* > 0 and ‖π̂ − π_0‖_∞ = o_p(n^{−1/4}), by assumptions A1 and A5.1, both π_0(x) > 1−τ+c_n and π̂(x) > 1−τ+c_n imply x^Tθ_0 > 0 and x^Tθ̂ > 0. Therefore,

\[
\begin{aligned}
b_3(\hat\theta, \hat\pi)
&= E\big[ x \big\{ P(y^* < x^T\theta_0 \mid x) - P(y^* < x^T\hat\theta \mid x) \big\}
\big( I\{\hat\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\} \big) \big] \\
&= E\big[ \big\{ -x x^T f_0(x^T\theta_0 \mid x)(\hat\theta - \theta_0) + o_p(\|\hat\theta - \theta_0\|) \big\}
\big( I\{\hat\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\} \big) \big] \\
&= o_p(\|\hat\theta - \theta_0\|).
\end{aligned}
\]

By Lemma 5.1 and (5.2), plugging (θ̂, π̂) into (5.4) gives

\[
-M_n(\theta_0, \pi_0) = -D(\hat\theta - \theta_0) + o_p(n^{-1/2}) + o_p(\|\hat\theta - \theta_0\|).
\]

By the central limit theorem, M_n(θ_0, π_0) is asymptotically normal and n^{1/2} M_n(θ_0, π_0) = O_p(1). Therefore, θ̂ − θ_0 = O_p(n^{−1/2}), and

\[
\sqrt{n}(\hat\theta - \theta_0) = D^{-1} \sqrt{n}\, M_n(\theta_0, \pi_0) + o_p(1). \tag{5.5}
\]

Let V = τ(1−τ) E[x x^T I{π_0(x) > 1−τ}] = τ(1−τ) E[x x^T I(x^Tθ_0 > 0)]. Note that

\[
\operatorname{Cov}\big\{ \sqrt{n}\, M_n(\theta_0, \pi_0) \big\}
= \tau(1-\tau)\, n^{-1} \sum_{i=1}^n x_i x_i^T I\{\pi_0(x_i) > 1-\tau+c_n\}.
\]
By assumption A5.3, n^{−1} Σ_{i=1}^n I{1−τ < π_0(x_i) ≤ 1−τ+c_n} = O_p(c_n) = o_p(1), which implies

\[
\operatorname{Cov}\big\{ \sqrt{n}\, M_n(\theta_0, \pi_0) \big\} \longrightarrow V.
\]

It follows from (5.5) that

\[
\sqrt{n}(\hat\theta - \theta_0) \to N\big(0,\ D^{-1} V D^{-1}\big).
\]

References

Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46:175–185
Buchinsky M (1994) Changes in U.S. wage structure 1963–87: an application of quantile regression. Econometrica 62:405–458
Buchinsky M, Hahn J (1998) An alternative estimator for the censored quantile regression model. Econometrica 66:653–671
Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71:1591–1608
Chernozhukov V, Hong H (2002) Three-step censored quantile regression and extramarital affairs. J Am Stat Assoc 97:872–882
Fitzenberger B (1997a) Computational aspects of censored quantile regression. In: Dodge Y (ed) Proceedings of the 3rd international conference on statistical data analysis based on the L1 norm and related methods. IMS, Hayward, CA, pp 171–186
Fitzenberger B (1997b) A guide to censored quantile regressions. In: Maddala GS, Rao CR (eds) Handbook of statistics: robust inference, vol 15. North-Holland, Amsterdam, pp 405–437. MR1492720
Fitzenberger B, Winker P (2007) Improving the computation of censored quantile regressions. Comput Stat Data Anal 52:88–108
Hastie T, Tibshirani R (1990) Generalized additive models. Chapman and Hall, London
He X, Shao Q (1996) A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs. Ann Stat 24:2608–2630
Khan S, Powell J (2001) Two-step estimation of semiparametric censored regression models. J Econom 103:73–110
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
Koenker R (2008) Censored quantile regression redux.
J Stat Softw 27:1–25
Koenker R, Park B (1996) An interior point algorithm for nonlinear quantile regression. J Econom 71:265–283
Loader C (1999) Local regression and likelihood. Springer, New York
Portnoy S (2003) Censored regression quantiles. J Am Stat Assoc 98:1001–1012
Portnoy S (2010) Inconsistency of the Powell estimator: examples. Manuscript
Powell J (1986) Censored regression quantiles. J Econom 32:143–155
Stone CJ (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10:1040–1053
Stone CJ (1985) Additive regression and other nonparametric models. Ann Stat 13:689–705
van der Vaart A, Wellner J (1996) Weak convergence and empirical processes. Springer, New York
Volgushev S, Dette H (2010) Nonparametric quantile regression for twice censored data. arXiv:1007.3376
Wang H, Wang L (2009) Locally weighted censored quantile regression. J Am Stat Assoc 104:1117–1128
Xue L, Wang J (2010) Distribution function estimation by constrained polynomial spline regression. J Nonparametr Stat 22:443–457
Zhou L (2006) A simple censored median regression estimator. Stat Sin 16:1043–1058