Test (2012) 21:635–655
DOI 10.1007/s11749-011-0266-y
O R I G I N A L PA P E R
An informative subset-based estimator for censored
quantile regression
Yanlin Tang · Huixia Judy Wang · Xuming He ·
Zhongyi Zhu
Received: 30 December 2010 / Accepted: 23 August 2011 / Published online: 23 September 2011
© Sociedad de Estadística e Investigación Operativa 2011
Abstract Quantile regression in the presence of fixed censoring has been studied
extensively in the literature. However, existing methods either suffer from computational instability or require complex procedures involving trimming and smoothing,
which complicates the asymptotic theory of the resulting estimators. In this paper, we
propose a simple estimator that is obtained by applying standard quantile regression
to observations in an informative subset. The proposed method is computationally
convenient and conceptually transparent. We demonstrate that the proposed estimator achieves the same asymptotic efficiency as Powell's estimator, as long as the
conditional censoring probability can be estimated consistently at a nonparametric
rate and the estimated function satisfies some smoothness conditions. A simulation
study suggests that the proposed estimator has stable and competitive performance
relative to more elaborate competitors.
Keywords Asymptotic efficiency · Censoring probability · Fixed censoring ·
Informative subset · Nonparametric · Quantile regression
Mathematics Subject Classification (2000) 62G05 · 62G20 · 62N02
Y. Tang · Z. Zhu (✉)
Department of Statistics, Fudan University, Shanghai, China
e-mail: zhuzy@fudan.edu.cn
H.J. Wang
Department of Statistics, North Carolina State University, Raleigh, USA
X. He
Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, USA
1 Introduction
Suppose that yi∗ is the ith latent response variable, i = 1, . . . , n, but due to censoring,
we only observe yi = max(yi∗ , C), where C is a known censoring point. Without
loss of generality, we assume C = 0 throughout the paper. At a given quantile level
0 < τ < 1, we consider the following latent quantile regression model:
yi∗ = xTi θ 0 (τ ) + ei ,
(1.1)
where xi is a p-dimensional design vector, which is randomly sampled from a distribution P whose support is Rx ⊂ Rp , θ 0 (τ ) is the unknown coefficient vector, and
ei is the random error whose τ th conditional quantile given xi equals zero. Throughout, we assume that ei are independent of each other. Under this model, xTi θ 0 (τ )
represents the τ th conditional quantile of yi∗ given xi .
Estimation for the quantile regression model (1.1) in the presence of censoring has
been studied extensively in the literature. Noticing that the τ th conditional quantile
of the observed yi is max{Qτ (yi∗ |xi ), 0} by the equivariance property of quantiles to
monotonic transformations, Powell (1986) proposed to estimate θ 0 (τ ) by the minimizer of
Q1,n(θ) = n−1 Σ_{i=1}^n ρτ{yi − max(xTi θ, 0)}.
Due to the nonconvexity of this objective function, the optimization problem is computationally challenging. Several authors have developed optimization algorithms
to obtain Powell’s estimator; see Koenker and Park (1996), Fitzenberger (1997a,
1997b), Fitzenberger and Winker (2007) and Koenker (2008). However, the existing
computational methods are unstable especially under heavy censoring, and the iterative linear programming methods only guarantee convergence to a local minimum;
see Buchinsky (1994), Fitzenberger (1997a) and Chernozhukov and Hong (2002).
Portnoy (2010) demonstrated that the use of a nonlinear fit may cause Powell's estimator to break down in cases where other estimators are robust.
Following the arguments in Powell (1986), it can be easily shown that Powell’s
estimator is asymptotically equivalent to the minimizer of
Q2,n(θ) = n−1 Σ_{i=1}^n ρτ(yi − xTi θ) I{xTi θ0(τ) > 0}.
Let δi = I (yi∗ > 0) be the censoring indicator. By the facts that π0 (xi ) = P (δi =
1|xi ) = P {ei > −xTi θ 0 (τ )|xi } and P {ei > 0|xi } = 1 − τ , the objective function Q2,n
is equivalent to
Q3,n(θ) = n−1 Σ_{i=1}^n ρτ(yi − xTi θ) I{π0(xi) > 1 − τ}.
This suggests that to obtain an estimator that is asymptotically equivalent to Powell’s
estimator, we can simply apply standard quantile regression for uncensored data to
the subset {i : π0 (xi ) > 1 − τ }, including all the observations (even censored ones)
for which the true τ th conditional quantile is above the censoring point 0. Motivated
by this, we consider a simple estimator for censored quantile regression through analyzing data contained in an informative subset. In practice, the conditional censoring
probability π0 (xi ) is unknown and has to be estimated. We will show that π0 (xi ) can
be estimated nonparametrically, and this will not affect the asymptotic efficiency of
the resulting quantile coefficient estimator.
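The equivalence between the two index sets can be checked numerically. The sketch below is our own illustration, assuming a probit-type model with standard normal errors and illustrative coefficient values; it verifies that I{π0(xi) > 1 − τ} and I{xTi θ0(τ) > 0} select exactly the same observations.

```python
import numpy as np
from scipy.stats import norm

# Toy check of the identity behind Q_{3,n}: with e_i(tau) = e_i - Phi^{-1}(tau),
# e_i ~ N(0,1), the probability pi_0(x) = P(e(tau) > -x^T theta_0(tau)) exceeds
# 1 - tau exactly when x^T theta_0(tau) > 0.  All parameter values are illustrative.
rng = np.random.default_rng(0)
tau = 0.25
theta0 = np.array([5 / 4 + norm.ppf(tau), 5.0])    # tau-th quantile coefficients

x = np.column_stack([np.ones(10_000), rng.normal(size=10_000)])
xb = x @ theta0                                    # x^T theta_0(tau)
# pi_0(x) = P(e > Phi^{-1}(tau) - x^T theta_0) = 1 - Phi(Phi^{-1}(tau) - x^T theta_0)
pi0 = 1.0 - norm.cdf(norm.ppf(tau) - xb)

# The two index sets coincide observation by observation.
assert np.array_equal(pi0 > 1 - tau, xb > 0)
```

By monotonicity of the normal cdf, pi0 > 1 − τ reduces algebraically to xb > 0, which is exactly why standard quantile regression on the subset works.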
Similar ideas of using a subset of the sample for estimation have appeared in the censored quantile regression literature. Buchinsky and Hahn (1998) and Khan and Powell (2001) developed two different two-stage estimators. They restricted the regressors to a compact subset through trimming functions of xi, and estimated the censoring probability 1 − π0(xi) by either parametric or nonparametric regression. In addition, to obtain root-n consistency and the asymptotic distribution of the two-stage estimators, both methods replaced the indicator function I{π0(xi) > 1 − τ} by a smoothed weighting function w(xi). The resulting estimators have asymptotic covariances that depend on the trimming and smoothing functions, which makes them asymptotically less efficient than Powell's estimator.
Under an envelope restriction on the censoring probability, Chernozhukov and Hong (2002) proposed a three-stage estimator based on selecting the subset iteratively between {i : π̂(xi) > 1 − τ + d} and {i : xTi θ̂0(τ) > δn}, where d and δn are positive tuning parameters, π̂(xi) is some estimate of π0(xi), and θ̂0(τ) is the second-stage estimator of θ0(τ).
Quantile regression was also studied in the context of random censoring. For instance, Portnoy (2003) proposed an estimator through recursive reweighting by assuming that the conditional quantile functions of yi∗ are all linear in the predictors
across all quantile levels, Zhou (2006) proposed a stable estimator based on the
inverse-probability weighting method, and Wang and Wang (2009) proposed a locally
weighted estimator based on the local Kaplan-Meier estimator of the conditional survival function. The estimators of Portnoy (2003) and Wang and Wang (2009) can be
adapted for fixed censoring, though the theoretical properties of such extensions have
not been studied.
The proposed estimator in this paper is a further simplification of existing methods
for data subject to fixed censoring. We demonstrate that even without additional steps
used in other methods, the estimator can achieve the same asymptotic efficiency as
Powell’s estimator, as long as the conditional censoring probability can be estimated
consistently at an appropriate nonparametric rate and the estimated function satisfies
some smoothness conditions.
The rest of the paper is organized as follows. In Sect. 2, we describe the proposed
informative subset estimation procedure, and present the theoretical results, including
the consistency and the asymptotic distribution of the proposed estimator. We assess
the finite sample performance of the proposed estimator through a simulation study
in Sect. 3. All the theoretical proofs are deferred to the appendix.
2 The proposed method
2.1 Estimation procedure
In this subsection, we describe the proposed informative subset-based estimator for
censored quantile regression. The proposed estimator can be obtained in two steps.
Step 1. Estimate π0(xi) using either a parametric or a nonparametric regression method for binary data, and denote the estimated conditional probability by π̂(xi).
For details, see Remark 1 in Sect. 2.2.
Step 2. Determine the informative subset Jn = {i : π̂ (xi ) > 1 − τ + cn }, where cn
is a pre-specified small positive value with cn → 0 as n → ∞. Then θ 0 (τ ) can be
estimated by applying standard quantile regression to the subset Jn , that is, θ̂ (τ ) is
the minimizer of
Qn(θ, π̂) = n−1 Σ_{i=1}^n ρτ(yi − xTi θ) I{π̂(xi) > 1 − τ + cn}.   (2.1)
Here cn is added to exclude the boundary cases from the subset used in (2.1). The
rate of cn required for establishing asymptotic properties is given in assumption A4
in Sect. 2.2.
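The two steps can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: Step 1 uses a plain parametric probit fit of the censoring indicator on the design (the paper also allows nonparametric estimators of π0), and Step 2 solves standard quantile regression on the selected subset via its linear programming formulation. The helper names `quantile_reg` and `isub` are ours.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import norm

def quantile_reg(X, y, tau):
    """Standard quantile regression via its LP form:
    min tau*1'u + (1-tau)*1'v  s.t.  X(b+ - b-) + u - v = y, all vars >= 0."""
    m, p = X.shape
    c = np.concatenate([np.zeros(2 * p), tau * np.ones(m), (1 - tau) * np.ones(m)])
    A_eq = np.hstack([X, -X, np.eye(m), -np.eye(m)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]

def isub(X, y, delta, tau, cn):
    """Informative subset estimator (illustrative sketch).
    Step 1: probit fit of the censoring indicator delta on X by Fisher scoring.
    Step 2: quantile regression on {i : pi_hat(x_i) > 1 - tau + c_n}."""
    gamma = np.zeros(X.shape[1])
    for _ in range(25):                        # Fisher scoring for the probit model
        eta = X @ gamma
        mu = np.clip(norm.cdf(eta), 1e-10, 1 - 1e-10)
        phi = np.clip(norm.pdf(eta), 1e-10, None)
        w = phi ** 2 / (mu * (1 - mu))         # working weights
        z = eta + (delta - mu) / phi           # working response
        gamma = np.linalg.solve((X * w[:, None]).T @ X, (X * w[:, None]).T @ z)
    pi_hat = norm.cdf(X @ gamma)               # estimated P(delta = 1 | x)
    J = pi_hat > 1 - tau + cn                  # informative subset J_n
    return quantile_reg(X[J], y[J], tau)
```

On data simulated from a toy model y* = 0.5 + x + e with e ~ N(0, 1) and fixed censoring at 0, `isub` should recover θ0(0.5) = (0.5, 1) up to sampling error, because every selected observation has its conditional median above the censoring point.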
For a given π(·), the negative subgradient of Qn(θ, π) with respect to θ is
Mn(θ, π) = n−1 Σ_{i=1}^n xi [τ − I{yi − xTi θ < 0}] I{π(xi) > 1 − τ + cn}.
It is shown in the appendix that E[Mn {θ 0 (τ ), π0 }] = 0 under model (1.1). Therefore,
Mn (θ , π0 ) is an unbiased estimating function for θ 0 (τ ).
Chernozhukov and Hong (2002) used parametric regression to estimate the conditional censoring probability, which may give an inconsistent estimate of π0(xi) when the parametric model is misspecified. They assumed an envelope restriction on the censoring probability, requiring that the misspecification is not severe. They used a fixed constant d in place of cn to avoid bias, and added a further step to select an informative subset based on xTi θ̂0(τ), where θ̂0(τ) is the initial estimator from their first step.
In contrast, our proposed estimator can achieve the same asymptotic efficiency as
Powell’s estimator, as long as the estimated censoring probability π̂(·) satisfies some
smoothness conditions and converges to π0 (·) at the uniform rate of n−1/4 . Therefore,
we allow both parametric and nonparametric estimation of π0 (·).
2.2 Asymptotic properties
In this subsection, we establish the asymptotic properties of the proposed quantile
coefficient estimator θ̂(τ). Throughout the paper, we use ‖·‖ to denote the L2 norm of a vector. For a given π(·), let ‖π − π0‖∞ = supx |π(x) − π0(x)|. For any
vector a = (a1, . . . , ap) of p nonnegative integers, define the differential operator
D^a = ∂^{|a|}/(∂x1^{a1} · · · ∂xp^{ap}), where |a| = Σ_{k=1}^p ak. Let Rx be a bounded, convex subset of Rp with nonempty interior, and let 0 < α ≤ 1 be some positive constant. For any smooth function h : Rx → R, let
‖h‖∞,p+α = max_{|a|≤p} sup_x |D^a h(x)| + max_{|a|=p} sup_{x≠x′} |D^a h(x) − D^a h(x′)| / ‖x − x′‖^α.
Let C_c^{p+α}(Rx) be the set of all continuous functions h : Rx → R with ‖h‖∞,p+α ≤ c.
The following assumptions are needed to establish the asymptotic properties.
A1. The covariate vector x has a bounded, convex support Rx , and a density function
fx (·), which is bounded away from zero and infinity uniformly over Rx . In addition,
E(xxT ) is a p × p positive definite matrix.
A2. The conditional cumulative distribution function of y ∗ given x, F0 (t|x), has
the first derivative with respect to t, denoted as f0 (t|x), which is continuous and
uniformly bounded by f¯0 < ∞.
A3. For any nonnegative sequence εn → 0 and n large enough, λmin,n, the smallest eigenvalue of the matrix E[xxT f0{xT θ0(τ)|x} I{xT θ0(τ) > εn}], satisfies λmin,n > λmin > 0. There exists a constant ς > 0 such that for any positive εn → 0, sup_{‖θ−θ0(τ)‖≤ς} E{I(|xT θ| < εn)} = O(εn).
A4. cn → 0 and n1/4 cn is greater than some positive constant c∗ .
A5. Assumptions on the conditional probability π0(x) and its estimator:
A5.1. For any positive εn → 0 with εn/cn → 1 and any x ∈ Rx, π0(x) > 1 − τ + εn implies xT θ0(τ) > εn∗ for some εn∗ that satisfies εn = O(εn∗).
A5.2. P{π0(x), π̂(x) ∈ C_c^{p+α}(Rx)} → 1 for some positive α ∈ (0, 1] and finite c.
A5.3. For any εn → 0, sup_{‖π−π0‖∞ ≤ εn} E[I{|π(x) − (1 − τ + cn)| < εn}] = O(εn).
A6. For any positive εn → 0 with ‖θ − θ0(τ)‖ ≤ εn,
E[x I{π0(x) > 1 − τ + cn} I{xT θ ≤ 0}]
= E[x I{π0(x) > 1 − τ + cn} (I{xT θ ≤ 0} − I{xT θ0(τ) ≤ 0})]
= −D∗n2 {θ − θ0(τ)},
where D∗n2 is a positive semi-definite matrix satisfying 0 ≤ λmin(D∗n2) ≤ λmax(D∗n2) < ∞ for sufficiently large n.
Assumption A1 assumes a bounded support for convenience; this assumption can be relaxed if the censoring probability function is smoother. Assumption A2 ensures that E{Mn(θ, π)} can be locally expanded to establish the consistency of θ̂(τ). The first part of assumption A3 is a standard condition in censored
quantile regression. The second part of assumption A3 states a boundary condition
on the covariates, which can be satisfied if at least one component of x is continuous and the true quantile function is not flat, and it is similar to assumption R.2
of Powell (1986). Assumption A5.1 basically requires that the derivative of π0 (x) is
bounded and the true quantile curve is not flat. It is different from assumption (c) in
Chernozhukov and Hong (2002), where the condition was imposed on the estimated
censoring probability. Assumption A5.2 states the smoothness assumptions on π0 (·)
and π̂(·), which is standard in nonparametric statistics (van der Vaart and Wellner
1996, p. 154). It can be easily verified that π̂(·) used in the simulation study satisfies
assumption A5.2; see Stone (1985) and Xue and Wang (2010) for more details on
the differentiability properties of nonparametric distribution function estimators. Assumption A5.3 requires π0(x) to be nonflat near 1 − τ. Assumption A6 is essentially an application of the mean value theorem to the integral. By approximating the indicator function by a monotone smoothing function, we can see that D∗n2 can be approximated by a sequence of nonnegative definite matrices. When εn = o(n−1/4), D∗n2 {θ − θ0(τ)} = o(‖θ − θ0(τ)‖) due to A1 and A5.1.
The following two theorems state the consistency and the limiting distribution of
the informative subset-based estimator θ̂ (τ ).
Theorem 2.1 At a given quantile level 0 < τ < 1, let θ̂(τ) be the minimizer of the objective function defined in (2.1). Assume that the conditional censoring probability can be estimated consistently with ‖π̂ − π0‖∞ = op(1). Then under model (1.1) and assumptions A1–A5, we have
θ̂(τ) −→ θ0(τ)
in probability as n → ∞.
Theorem 2.2 Under assumptions A1–A6 and ‖π̂ − π0‖∞ = op(n−1/4), we have
n1/2 {θ̂(τ) − θ0(τ)} −→ N(0, D−1 VD−1),
where
D = E[xxT f0{xT θ0(τ)|x} I{xT θ0(τ) > 0}]
and
V = τ(1 − τ) E[xxT I{xT θ0(τ) > 0}].
Theorem 2.2 suggests that as long as the estimated censoring probability π̂(·)
satisfies some smoothness assumption and converges to π0 (·) at the uniform rate of
n−1/4 , our estimator has the same asymptotic efficiency as Powell’s estimator.
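For a concrete design, the sandwich covariance D−1VD−1 in Theorem 2.2 can be approximated by Monte Carlo. The sketch below is our own illustration, assuming a homoscedastic toy model y* = 0.5 + x + e with e ~ N(0, 1) at τ = 0.5, so that f0{xTθ0(τ)|x} equals the standard normal density at zero for every x.

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo approximation of D = E[x x^T f_0{x^T theta_0|x} I{x^T theta_0 > 0}]
# and V = tau(1-tau) E[x x^T I{x^T theta_0 > 0}] for an assumed toy model.
rng = np.random.default_rng(0)
tau = 0.5
theta0 = np.array([0.5, 1.0])                   # theta_0(0.5) in the toy model

x = np.column_stack([np.ones(200_000), rng.normal(size=200_000)])
xb = x @ theta0
xk = x[xb > 0]                                  # observations with x^T theta_0(tau) > 0
f0 = norm.pdf(0.0)                              # error density at its own median

D = f0 * (xk.T @ xk) / len(x)
V = tau * (1 - tau) * (xk.T @ xk) / len(x)
avar = np.linalg.inv(D) @ V @ np.linalg.inv(D)  # limit covariance of sqrt(n){theta_hat - theta_0}
```

Because f0 is constant here, the sandwich collapses to τ(1 − τ)/f0² times the inverse of E[xxT I{xTθ0(τ) > 0}], matching the familiar uncensored quantile regression variance restricted to the informative region.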
Remark 1 If a parametric form of π0(x) is known, a root-n consistent estimator π̂(x) can be obtained by maximum likelihood estimation. Otherwise, the fourth-root uniform consistency of π̂(x) can be achieved by applying existing nonparametric estimation methods to the data (δi, xi), for instance, generalized linear regression with spline approximation (Stone 1982), generalized additive models (Hastie and Tibshirani 1990), local regression (Loader 1990), or nearest-neighbor generalized linear regression (Altman 1992). For all these methods, the smoothness condition on π̂(x) in A5.2 can be easily verified.
3 Simulation study
We conduct a simulation study to assess the finite sample performance of the proposed informative subset-based estimator, referred to as ISUB, relative to some common competitors. We consider five different cases for generating the latent response
variable yi∗ , i = 1, . . . , n. Due to censoring, we only observe yi = max(yi∗ , 0). For
each case, the censoring proportion is around 40%, and the simulation is repeated
1000 times. The first four cases include one standard normal predictor, and the fifth
case consists of two independent standard normal predictors. As upper quantiles are
not affected much by left censoring, we focus on two quantile levels τ = 0.25 and
0.5. Two different sample sizes n = 200 and n = 500 are considered.
For the first four cases involving one predictor xi1 , we employ the Probit
model with B-spline approximation to estimate the conditional censoring probability
π0 (xi1 ) in the implementation of ISUB. That is, we approximate π0 (·) by
π0(xi1) ≈ Φ{b(xi1)T γ},
where b(xi1) = {b1(xi1), . . . , bkn+ℓ+1(xi1)}T is the vector of B-spline basis functions, kn is the number of internal knots, ℓ is the degree of the B-spline basis, and γ is the spline coefficient vector. By Stone (1982), if kn ∝ n1/5, the estimator π̂(xi1) is consistent for π0(xi1) at the rate n−2/5. In this simulation study, we choose ℓ = 2, corresponding to quadratic splines. For both n = 200 and 500, n1/5 is close to 3, so we set kn = 3.
The knots are selected as the three empirical quartiles of xi1 . The tuning parameter
cn is set as n−1/4 τ in Cases 1–4 and n−1/5 τ in Case 5.
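The basis construction in this step can be illustrated with scipy. The snippet below, our own sketch on simulated covariate values, builds a quadratic (degree 2) B-spline basis with three internal knots at the empirical quartiles, giving 3 + 2 + 1 = 6 basis functions; the probit fit of γ itself is omitted.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
x = rng.normal(size=500)                        # illustrative covariate sample

ell = 2                                         # quadratic splines
internal = np.quantile(x, [0.25, 0.5, 0.75])    # k_n = 3 internal knots at the quartiles
a, b = x.min(), x.max()
# clamped knot vector: boundary knots repeated ell + 1 times
knots = np.concatenate([[a] * (ell + 1), internal, [b] * (ell + 1)])
n_basis = len(knots) - ell - 1                  # = k_n + ell + 1 = 6 basis functions

# evaluate b(x) = {b_1(x), ..., b_6(x)} at the sample points
B = np.column_stack(
    [BSpline(knots, np.eye(n_basis)[j], ell)(x) for j in range(n_basis)]
)
```

With a clamped knot vector the basis is a partition of unity on [a, b], which keeps the fitted probit index well behaved near the boundary of the covariate support.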
For comparison, we include four other estimators, Powell’s estimator (Powell
1986), Portnoy’s estimator (Portnoy 2003), Wang & Wang’s estimator (Wang and
Wang 2009) and the three-step estimator proposed in Chernozhukov and Hong
(2002), referred to as POW, POR, LCRQ and 3-Step, respectively. For Powell’s estimator, we use the BRECNS algorithm of Fitzenberger (1997b). Both POW and POR
are implemented in the function “crq” in the R package quantreg. In the first step of
the 3-Step estimator, the conditional censoring probability is estimated from logistic regression using xi1 and xi1^2 as the covariates, and the cutoff value d is chosen as the 0.1th quantile of all π̂(xi1) such that π̂(xi1) > 1 − τ. In the second step, δn is chosen as the ((1/3)n−1/3)th quantile of all xTi θ̂0(τ) such that xTi θ̂0(τ) > 0, where xi = (1, xi1)T, and θ̂0(τ) is the initial estimator from the first step.
To evaluate the performance of different estimators, we present the root mean
squared errors (RMSE) of each estimator in the first four cases in Fig. 1. In each
panel, each line connects the RMSE in eight different locations: locations 1, 3, 5, 7
are for the intercept estimates, and the rest are for the slope estimates; locations 1, 2,
5, 6 are for n = 200, and the others are for n = 500; locations 1–4 are for τ = 0.25,
and the rest are for τ = 0.5. In addition, we report the empirical coverage probabilities
(ECP) and the empirical mean lengths (EML) of 90% confidence intervals in Table 1.
For ISUB and 3-Step, we construct the confidence intervals by applying the rank
score method (Koenker 2005, Chap. 3.5) assuming non-identically distributed errors
to the selected subset samples. For POR, LCRQ and POW, the confidence intervals
are constructed by bootstrap where the bootstrap samples are obtained by resampling
642
Y. Tang et al.
Fig. 1 Root mean squared errors (RMSE) of different estimators in Cases 1–4. The solid line is for ISUB,
the solid line with open circles is for POW, the dash-dotted line is for POR, the thin dashed line is for
LCRQ, and the thick dashed line is for 3-Step. The shaded area represents the 95% pointwise confidence
band for the RMSE of ISUB
the triples (yi , xi , δi ) with replacement. The rank score method used for ISUB and
3-Step, and the bootstrap method used for POW and POR are implemented in the
“summary.rq” and “summary.crq” functions of the R package quantreg, respectively,
and for LCRQ, 300 bootstrap samples are used for each data set.
Case 1. Data are generated from the model
yi∗ = θ1 + θ2 xi1 + ei ,
i = 1, . . . , n,
where θ1 = 5/4, θ2 = 5, and xi1 and ei are independent N (0, 1). Therefore, the
true quantile coefficients at τ = 0.25 and τ = 0.5 are θ1 (0.25) = 5/4 + Φ −1 (0.25),
θ1(0.5) = 5/4, and θ2(0.25) = θ2(0.5) = 5. The support of xi1 is theoretically unbounded, but in all the cases the results show no real difference if xi1 is trimmed to [−5, 5].
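The quoted overall censoring proportion of about 40% can be verified in closed form for this case: since 5xi1 + ei ~ N(0, 26), P(yi* < 0) = Φ(−1.25/√26) ≈ 0.40. A quick simulation (our own check, with an arbitrary seed) confirms it.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
ystar = 5 / 4 + 5 * x1 + rng.normal(size=n)     # Case 1 latent responses
y = np.maximum(ystar, 0.0)                      # fixed censoring at 0

censored = np.mean(ystar < 0)                   # empirical censoring proportion
exact = norm.cdf(-1.25 / np.sqrt(26))           # = P(N(0, 26) < -5/4), about 0.40
```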
This represents an ideal case, where both the Probit link function used in ISUB and
the global linearity assumption required by POR are satisfied. The POR and LCRQ
have smaller mean squared errors than the other three methods. For all the methods,
the confidence intervals for the intercept have ECP lower than the nominal level, except those from LCRQ. For the slope parameters, the confidence intervals from LCRQ, ISUB and 3-Step have ECP close to the nominal level, while those from POW and POR tend to be liberal.

Table 1 Empirical coverage probability (ECP) and empirical mean length (EML) for confidence intervals with nominal level 90% for Cases 1–4

                         Case 1                          Case 2
                   100×ECP       EML               100×ECP       EML
τ     n    Method   θ1    θ2     θ1    θ2          θ1    θ2     θ1    θ2
0.25  200  POW     84.2  88.6   0.62  0.69        86.1  86.1   1.13  1.67
           POR     83.6  85.5   0.59  0.65        57.8  66.7   0.87  1.39
           LCRQ    90.0  90.0   0.62  0.70        87.1  87.8   1.10  1.68
           ISUB    85.0  91.5   0.65  0.73        89.0  94.0   1.28  1.85
           3-Step  86.3  93.0   0.67  0.73        84.8  93.4   1.27  1.80
      500  POW     82.8  86.3   0.38  0.43        85.1  87.8   0.68  1.02
           POR     86.2  87.1   0.37  0.42        31.5  40.5   0.53  0.86
           LCRQ    87.5  87.5   0.39  0.43        83.3  85.5   0.69  1.04
           ISUB    86.0  93.0   0.41  0.45        89.6  94.0   0.75  1.11
           3-Step  86.7  93.8   0.42  0.45        86.1  93.3   0.73  1.09
0.5   200  POW     83.7  86.0   0.49  0.58        84.0  86.2   0.70  1.20
           POR     83.8  86.4   0.49  0.57        66.8  73.2   0.66  1.11
           LCRQ    91.0  90.6   0.52  0.59        90.3  90.6   0.81  1.30
           ISUB    87.7  92.8   0.53  0.59        88.5  93.7   0.85  1.40
           3-Step  87.1  93.8   0.56  0.62        85.6  93.3   0.77  1.31
      500  POW     82.8  83.9   0.30  0.35        81.7  85.9   0.43  0.74
           POR     86.5  85.6   0.31  0.35        43.7  53.1   0.41  0.69
           LCRQ    87.6  89.7   0.32  0.37        86.1  88.6   0.51  0.82
           ISUB    87.3  93.4   0.34  0.37        87.3  93.3   0.52  0.85
           3-Step  87.7  94.7   0.35  0.38        85.5  93.1   0.48  0.81

                         Case 3                          Case 4
                   100×ECP       EML               100×ECP       EML
τ     n    Method   θ1    θ2     θ1    θ2          θ1    θ2     θ1    θ2
0.25  200  POW     84.2  87.4   0.75  0.88        83.0  85.3   1.43  2.15
           POR     83.1  86.5   0.71  0.82        53.9  64.4   1.00  1.74
           LCRQ    89.9  91.4   0.78  0.89        92.2  91.2   1.51  2.24
           ISUB    87.5  93.8   0.78  0.90        86.0  93.9   1.50  2.30
           3-Step  86.1  92.4   0.82  0.93        81.8  91.8   1.61  2.33
      500  POW     83.9  85.9   0.46  0.54        83.8  86.7   0.86  1.31
           POR     85.8  87.1   0.45  0.51        27.4  42.1   0.61  1.11
           LCRQ    87.2  89.7   0.48  0.54        88.8  91.5   0.87  1.32
           ISUB    88.3  95.4   0.50  0.55        87.6  94.8   0.83  1.33
           3-Step  87.0  93.1   0.52  0.57        84.6  94.4   0.93  1.40
0.5   200  POW     84.9  88.4   0.52  0.62        83.5  86.3   0.76  1.31
           POR     84.0  87.1   0.52  0.61        64.3  69.5   0.69  1.16
           LCRQ    91.1  91.0   0.56  0.66        92.7  93.3   0.89  1.45
           ISUB    86.6  92.4   0.53  0.63        83.3  91.6   0.78  1.40
           3-Step  87.2  94.1   0.59  0.67        83.4  92.6   0.84  1.44
      500  POW     84.0  85.8   0.31  0.37        81.3  86.4   0.46  0.80
           POR     86.2  87.5   0.32  0.37        40.0  52.1   0.43  0.72
           LCRQ    88.3  90.5   0.35  0.40        89.9  90.3   0.54  0.88
           ISUB    87.9  92.6   0.34  0.39        85.7  93.0   0.47  0.83
           3-Step  88.2  94.4   0.37  0.41        85.7  93.4   0.53  0.89
Case 2. Data are generated from the model
yi∗ = θ1(τ) + θ2 xi1 + {1 + (xi1 − 0.5)2} ei(τ),
i = 1, . . . , n,
where θ1 (0.25) = −1/2, θ1 (0.5) = 3/4, θ2 = 5, and ei (τ ) = ei − Φ −1 (τ ), where xi1
and ei are independent N (0, 1). In this case, there is a quadratic heteroscedasticity in
the error, so the true censoring probability π0 (xi1 ) is quadratic in xi1 after the Probit
link transformation. In addition, the conditional quantile of yi∗ is linear in xi1 only
at the τ th quantile but is quadratic in xi1 anywhere else. As observed in Wang and
Wang (2009) for randomly censored data, POR yields biased estimates in this case
because the global linearity assumption is violated. The LCRQ, ISUB, 3-Step and
POW perform similarly in terms of RMSE, but LCRQ and POW produce confidence
intervals with lower ECP.
Case 3. Data are generated from the model
yi∗ = θ1 + θ2 xi1 + ei ,
i = 1, . . . , n,
where θ1 = 4/3, θ2 = 5, xi1 ∼ N(0, 1) and ei ∼ t(3). Therefore, θ2(0.25) = θ2(0.5) = 5, θ1(0.5) = 4/3 and θ1(0.25) = 4/3 + F_{t3}^{-1}(0.25), where F_{t3}^{-1}(·) is the
quantile function of the t distribution with 3 degrees of freedom. With the non-normal
regression errors, the Probit link function used in ISUB is misspecified in this case.
Since all the quantiles of yi∗ are linear in xi1, the POR method is valid and produces estimates with RMSE similar to LCRQ and smaller than the other methods. Even
though the Probit link function is misspecified, the ISUB method still performs competitively well. The LCRQ produces confidence intervals with ECP very close to the
nominal level, while the POW produces confidence intervals with lower ECP than
other methods.
Case 4. Data are generated from the model
yi∗ = θ1(τ) + θ2 xi1 + {1 + (xi1 − 0.5)2} ei(τ),
i = 1, . . . , n,
where θ1(0.25) = −3/5, θ1(0.5) = 3/4, θ2 = 5, xi1 ∼ N(0, 1) and ei(τ) = ei − F_{t3}^{-1}(τ) with ei ∼ t(3). This case contains non-normal and heteroscedastic errors. Therefore, the true censoring probability π0(xi1) neither follows a Probit link nor is linear in xi1 after a link transformation. However, by using nonparametric estimation of π0(·), ISUB performs reasonably well. It yields estimates with RMSE similar to the
POW and 3-Step methods, and confidence intervals with ECP close to 90%. LCRQ
produces smaller RMSE, especially when the sample size is small, and it yields ECP
closer to the nominal level with longer confidence intervals. Similar to Case 2, POR
leads to very biased estimates. As observed in the other three cases, the confidence
intervals from the 3-Step method tend to be wider than those from the ISUB method
while the ECP are comparable.
Case 5. Data are generated from the model
yi∗ = θ1(τ) + θ2 xi1 + θ3 xi2 + {1 + (xi1 − 0.5)2/2 + (xi2 − 0.5)2/2} ei(τ),
i = 1, . . . , n,
where θ1 (0.25) = −2/3, θ1 (0.5) = 2/3, θ2 = 3, θ3 = 2, and ei (τ ) = ei − Φ −1 (τ ),
where xi1 , xi2 and ei are independent N (0, 1). In this case, the censoring probability
function involves the interaction between two covariates. We estimate π0 (·) by the
nearest-neighbor locally linear logistic smoother, where the size of the neighborhood
is 25% of all observations at n = 200 and 20% at n = 500. Table 2 summarizes the biases, root mean squared errors and coverage probabilities of 90% confidence intervals for the different methods.

Table 2 The mean bias (Bias), root mean squared error (RMSE), and empirical coverage probability (ECP) for Case 5. The nominal level of the confidence interval is 90%

Method        100×Bias               100×RMSE              100×ECP
            θ1     θ2     θ3      θ1    θ2    θ3        θ1    θ2    θ3
τ = 0.25, n = 200
POW       −9.4    7.1    3.9    55.6  49.3  42.4      82.2  85.4  84.2
POR       25.7  −24.8  −17.7    40.1  43.1  35.1      79.2  83.2  86.2
LCRQ     −10.2    8.4    6.1    44.5  46.6  36.6      89.4  89.1  90.8
ISUB      25.7  −15.7  −10.9    58.6  45.3  37.4      83.1  87.4  87.5
3-Step     5.9   −4.6   −3.5    54.7  44.7  34.3      77.7  84.8  85.3
τ = 0.25, n = 500
POW       −4.1    2.3    2.1    30.4  30.2  25.5      82.5  87.1  83.9
POR       26.6  −27.0  −17.7    33.0  35.4  26.4      60.3  68.8  76.5
LCRQ       2.7   −1.4   −2.3    24.7  26.8  22.8      87.9  89.9  88.4
ISUB      12.8   −7.6   −5.3    35.1  27.8  23.1      84.6  86.0  86.5
3-Step     2.6   −1.5   −1.5    34.4  28.1  22.1      78.7  84.3  81.9
τ = 0.5, n = 200
POW       −2.1   −0.2    1.6    28.2  32.4  31.4      83.6  84.9  83.0
POR       15.3  −16.2  −10.5    26.6  31.5  28.0      80.5  83.0  87.3
LCRQ       0.9   −0.1    1.2    26.9  32.3  27.5      89.3  90.1  92.4
ISUB      11.0   −8.1   −4.1    38.7  34.1  29.5      86.1  87.2  87.9
3-Step     0.2   −2.2    1.4    35.1  32.1  26.6      80.4  83.2  85.2
τ = 0.5, n = 500
POW       −1.3    0.8    1.0    17.8  20.2  19.4      81.2  85.9  84.0
POR       14.9  −15.8  −10.7    20.0  22.8  19.2      70.1  77.8  81.9
LCRQ       5.8   −3.8   −3.7    15.9  19.5  17.7      89.1  88.5  88.7
ISUB       5.5   −4.0   −2.9    24.0  21.2  18.7      87.4  88.3  87.9
3-Step    −1.1    0.2   −0.1    21.8  19.7  17.7      84.0  86.4  86.1

The ISUB estimates tend to have some finite sample bias.
More specifically, ISUB overestimates the intercept and underestimates the slopes.
In this case, the quadratic logistic regression used in 3-Step provides a good estimate of π0(·), which leads to a decent second-step estimator of θ0(τ). By fine-tuning the subset based on xTi θ̂0(τ) in the third step, the 3-Step method corrects the finite
sample bias. We note that the bias of ISUB estimate decreases as the sample size
increases. Furthermore, in this case, the bias is dominated by variance. Therefore,
ISUB and 3-Step tend to have comparable RMSE. On the other hand, as already seen
in Cases 2 and 4, the POR estimates have a systematic bias, which does not diminish even when the sample size increases. In this case, the LCRQ method leads to
estimates with smaller RMSE than ISUB. In addition, the confidence intervals from LCRQ have ECP closer to the nominal level than those from ISUB, but the former require a computationally intensive bootstrap procedure.
Since both ISUB and 3-Step are based on informative subsets, we examine the size
of subsets (average number of subjects) selected in ISUB and 3-Step. From Table 3,
we see that the size of informative subsets increases with τ , and decreases if there is
heterogeneity in the error. The sizes of subsets selected by ISUB and 3-Step are very
close, except in Case 5 where 3-Step selects larger subsets than ISUB. We find that in
Case 5, 3-Step leads to smaller RMSE but lower ECP. One explanation is that 3-Step
selects more subjects to the informative subset, which leads to the inclusion of some
boundary cases in the subset. If we look at the simulated data sets (flagged) for which
the confidence intervals of 3-Step fail to cover the true values but the confidence
intervals of ISUB.t (based on the true π0 (·)) cover the true values, we notice that
quite a few points are incorrectly included in the 3-Step subsets. We find that the differences in RMSE for the overall simulation and for the flagged samples are quite substantial. Because the frequency of such occurrences is not high (10–15% of the simulated data sets), they do not affect the overall RMSE much, but the coverage probability can be off by around 10 percentage points.

Table 3 Average numbers of the subjects selected in the informative subsets, with standard errors in parentheses

τ = 0.25
          n = 200                       n = 500
          ISUB           3-Step         ISUB            3-Step
Case 1    105.0 (9.0)    103.4 (7.3)    263.5 (14.7)    262.0 (11.7)
Case 2     88.8 (7.6)     88.2 (9.0)    222.6 (13.3)    223.3 (14.5)
Case 3    102.3 (8.3)    102.6 (7.8)    258.7 (11.7)    261.0 (12.6)
Case 4     85.2 (9.4)     87.4 (12.1)   217.9 (14.6)    219.1 (18.0)
Case 5     81.2 (10.3)    89.3 (12.2)   206.7 (16.2)    221.4 (18.1)

τ = 0.5
          n = 200                       n = 500
          ISUB           3-Step         ISUB            3-Step
Case 1    114.8 (8.8)    112.9 (7.2)    289.5 (13.0)    286.8 (11.2)
Case 2    103.8 (8.1)    106.4 (8.4)    264.2 (13.0)    269.6 (13.1)
Case 3    113.5 (8.1)    113.6 (7.1)    287.0 (12.2)    289.2 (11.8)
Case 4    103.6 (7.7)    106.3 (8.4)    265.2 (10.8)    268.9 (13.3)
Case 5     96.4 (9.5)    107.8 (9.5)    248.7 (15.3)    269.7 (15.2)
In summary, Portnoy's estimator tends to be the most efficient when the global linearity assumption is satisfied, but it can be seriously biased when the assumption is violated. The ECP from LCRQ appear closer to the nominal level than those from ISUB, but much of the difference arises because the LCRQ method uses bootstrap while ISUB uses the inversion of the rank score test to construct confidence intervals. If we also use the bootstrap procedure to construct confidence intervals for the ISUB method, the results are comparable to those from LCRQ. However, bootstrap is computationally much more intensive than the rank score method. As observed in Portnoy
(2010), the BRECNS algorithm for Powell’s estimator is unstable and it broke down
occasionally. In contrast, our proposed algorithm ISUB is computationally simple and
stable. The theoretical properties of ISUB and 3-Step estimators suggest that both estimators have the same asymptotic efficiency as Powell’s estimator. In our simulation,
we find that there is no clear winner between ISUB and 3-Step. It is the simplicity of
ISUB that makes it attractive.
Remark 2 For the proposed ISUB method, the indicator weighting function is asymptotically orthogonal to the score function, in the sense that the efficiency of the estimator is not affected by replacing the true weights with estimated ones. Therefore, the rank score test can be applied directly to the selected informative subset to construct confidence intervals. In contrast, the efficiency of the LCRQ estimator is affected by the estimated local weights, so the rank score test cannot be used directly for LCRQ without quantifying the variation caused by the weight estimation.
4 Discussion
In this paper, we proposed a simple informative subset-based estimator for fixed censored quantile regression. The proposed estimator is obtained by applying standard quantile regression to the selected informative subset, so the algorithm is computationally simple and stable. We point out that the idea of informative subsets is not entirely new; however, existing methods based on a similar idea require additional steps, which complicate not only the computation but also the asymptotic theory. What we want to stress is that for fixed censoring, such additional complications are not necessary in order to achieve the same asymptotic efficiency as the celebrated Powell estimator. In the more general case of possibly random censoring in which the censoring variable C is observed or known at all points, Chernozhukov and Hong (2002) pointed out a way to reduce the problem to one of fixed censoring; however, the asymptotic efficiency result may not carry over. In this paper, we assumed random designs so that the results of Chen et al. (2003) can be applied directly. Under appropriate design conditions, the same theoretical results also hold for fixed designs. The main uniform approximation results needed for the proofs rely on an exponential inequality commonly used in empirical process theory, which can be generalized to sums of independent but not necessarily identically distributed variables with bounded moments in the quantile regression framework; see, for instance, He and Shao (1996). Finally, we note that relatively little has been published on doubly censored quantile estimation; see Volgushev and Dette (2010) for a recent treatment. We hope that the idea of an informative subset can be generalized to more general settings in future work.
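The reduction to fixed censoring mentioned above can be sketched as follows (our paraphrase of the Chernozhukov–Hong device, not their notation, and assuming quantile equivariance under the observed shift $C_i$): when $y_i = \max(y_i^*, C_i)$ with every $C_i$ observed, subtracting $C_i$ turns the problem into fixed censoring at zero,
\[
y_i - C_i = \max(y_i^* - C_i,\ 0), \qquad Q_{y^*-C}(\tau \mid x, C) = x^T\theta_0(\tau) - C,
\]
so fixed-censoring methods can be applied to the shifted responses, with $C$ entering the linear specification with known coefficient $-1$.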
Acknowledgement The research of Dr. Wang is partially supported by NSF grants DMS-0706963 and DMS-1007420. The research of Dr. He is partially supported by NSF grants DMS-1007396 and DMS-0724752 and NNSF of China grant 10828102. The research of Dr. Zhu is partially supported by NSFC grants 10931002 and 1091112038. We would like to thank the Editor, an associate editor, and two anonymous reviewers for their constructive comments, which led to a major improvement of this article.
Appendix
Throughout the Appendix, we omit $\tau$ from expressions such as $\theta_0(\tau)$ and $\hat\theta(\tau)$ whenever the meaning is clear. Let $\Theta$ denote a compact finite-dimensional parameter set and $\mathcal{H}$ an infinite-dimensional parameter space, where $\theta \in \Theta$ and $\pi(\cdot) \in \mathcal{H}$. Let $C_k$, $k = 1,\dots,8$, denote finite positive constants.
Recall the negative subgradient of the objective function $Q_n(\theta,\pi)$,
\[
M_n(\theta,\pi) = n^{-1}\sum_{i=1}^{n} x_i\,\psi(x_i,y_i,\theta)\,I\{\pi(x_i) > 1-\tau+c_n\} = n^{-1}\sum_{i=1}^{n} m_i(\theta,\pi,c_n),
\]
where
\[
\psi(x,y,\theta) = \tau - I\{y - x^T\theta < 0\} = \tau - I\{y^* < 0,\, x^T\theta > 0\} - I\{y^* > 0,\, y^* - x^T\theta < 0\}.
\]
Note that
\[
E\big[I\{y^* < 0,\, x^T\theta > 0\}\mid x\big] = P(y^* < 0\mid x)\,I\{x^T\theta > 0\} = \{1-\pi_0(x)\}\,I\{x^T\theta > 0\},
\]
and
\[
E\big[I\{y^* > 0,\, y^* - x^T\theta < 0\}\mid x\big] = I\{x^T\theta > 0\}\,P(0 < y^* < x^T\theta\mid x) = I\{x^T\theta > 0\}\big[P(y^* < x^T\theta\mid x) - \{1-\pi_0(x)\}\big].
\]
Therefore,
\[
E\{M_n(\theta,\pi)\} = E\{m_i(\theta,\pi,c_n)\} = E\big[x\,I\{\pi(x) > 1-\tau+c_n\}\big\{\tau - P(y^* < x^T\theta\mid x)\,I\{x^T\theta > 0\}\big\}\big].
\]
Lemma 5.1 Under assumptions A1–A5, for all positive values $\epsilon_n = o(1)$, we have
\[
\sup_{\|\theta-\theta_0\|\le\epsilon_n,\ \|\pi-\pi_0\|_\infty\le\epsilon_n} \big\|M_n(\theta,\pi) - E\{M_n(\theta,\pi)\} - M_n(\theta_0,\pi_0)\big\| = o_p\big(n^{-1/2}\big).
\]
Proof Let $\Theta_n = \{\theta: \|\theta-\theta_0\| \le \epsilon_n\}$, $\mathcal{H}_n = \{\pi: \|\pi-\pi_0\|_\infty \le \epsilon_n\}$, and $(\theta,\pi), (\theta',\pi') \in \Theta_n \times \mathcal{H}_n$, with $\|\theta-\theta'\| \le \epsilon_n$ and $\|\pi-\pi'\|_\infty \le \epsilon_n$. The result is a direct extension of Theorem 3 in Chen et al. (2003), from their fixed estimating function to our estimating function, in which the infinite-dimensional parameter depends on $n$ through $c_n$, and it can be proved in the same way. We only need to verify conditions (3.2) and (3.3) in their paper, as condition (3.1) is trivially satisfied when we set their $m_{cj} \equiv 0$. Note that
\[
\begin{aligned}
&\big\|m_i(\theta,\pi,c_n) - m_i(\theta',\pi',c_n)\big\|^2 \\
&\quad= \big\|x_i\psi(x_i,y_i,\theta)I\{\pi(x_i) > 1-\tau+c_n\} - x_i\psi(x_i,y_i,\theta')I\{\pi'(x_i) > 1-\tau+c_n\}\big\|^2 \\
&\quad\le \|x_i\|^2\tau^2\big|I\{\pi(x_i) > 1-\tau+c_n\} - I\{\pi'(x_i) > 1-\tau+c_n\}\big| \\
&\qquad+ \|x_i\|^2\big|I\{y_i^* < 0,\, x_i^T\theta > 0\}I\{\pi(x_i) > 1-\tau+c_n\} - I\{y_i^* < 0,\, x_i^T\theta' > 0\}I\{\pi'(x_i) > 1-\tau+c_n\}\big| \\
&\qquad+ \|x_i\|^2\big|I\{y_i^* > 0,\, y_i^* - x_i^T\theta < 0\}I\{\pi(x_i) > 1-\tau+c_n\} - I\{y_i^* > 0,\, y_i^* - x_i^T\theta' < 0\}I\{\pi'(x_i) > 1-\tau+c_n\}\big| \\
&\quad\le \|x_i\|^2\big(\tau^2+2\big)\big|I\{\pi(x_i) > 1-\tau+c_n\} - I\{\pi'(x_i) > 1-\tau+c_n\}\big| \\
&\qquad+ \|x_i\|^2\big|I\{y_i^* < 0,\, x_i^T\theta > 0\} - I\{y_i^* < 0,\, x_i^T\theta' > 0\}\big| \\
&\qquad+ \|x_i\|^2\big|I\{y_i^* > 0,\, y_i^* - x_i^T\theta < 0\} - I\{y_i^* > 0,\, y_i^* - x_i^T\theta' < 0\}\big| \\
&\quad= B_1 + B_2 + B_3.
\end{aligned}
\]
To verify condition (3.2) of Chen et al. (2003), it is sufficient to show that
\[
E\sup_{\|\theta'-\theta\|\le\epsilon_n,\ \|\pi'-\pi\|_\infty\le\epsilon_n} (B_1 + B_2 + B_3) = O(\epsilon_n).
\]
Note that
\[
B_1 \le \|x_i\|^2\big(\tau^2+2\big)\big[I\{\pi'(x_i) > 1-\tau+c_n \ge \pi(x_i)\} + I\{\pi(x_i) > 1-\tau+c_n \ge \pi'(x_i)\}\big].
\]
By assumptions A1 and A5.3,
\[
\begin{aligned}
E\big[\|x_i\|^2 I\{\pi'(x_i) > 1-\tau+c_n \ge \pi(x_i)\}\big]
&\le E\big[\|x_i\|^2 I\{\pi(x_i)+\epsilon_n > 1-\tau+c_n \ge \pi(x_i)\}\big] \\
&\le C_1 E\big[I\{\pi(x_i)+\epsilon_n > 1-\tau+c_n \ge \pi(x_i)\}\big] \le C_2\epsilon_n.
\end{aligned}
\tag{5.1}
\]
Similarly, $E[\|x_i\|^2 I\{\pi(x_i) > 1-\tau+c_n \ge \pi'(x_i)\}] \le C_3\epsilon_n$. Therefore,
\[
E\sup_{\pi':\|\pi'-\pi\|_\infty\le\epsilon_n} B_1 \le \big(\tau^2+2\big)(C_2+C_3)\epsilon_n.
\]
By assumption A3,
\[
E\sup_{\theta':\|\theta'-\theta\|\le\epsilon_n} B_2
\le E\sup_{\theta':\|\theta'-\theta\|\le\epsilon_n} \|x_i\|^2\big|I\{x_i^T\theta > 0\} - I\{x_i^T\theta' > 0\}\big|
\le C_4 E\big[I\{-\|x_i\|\epsilon_n < x_i^T\theta < \|x_i\|\epsilon_n\}\big]
\le C_5\epsilon_n.
\]
By assumption A2,
\[
\begin{aligned}
E\sup_{\theta':\|\theta'-\theta\|\le\epsilon_n} B_3
&\le E\sup_{\theta':\|\theta'-\theta\|\le\epsilon_n} \|x_i\|^2\big|I\{y_i^* - x_i^T\theta' < 0\} - I\{y_i^* - x_i^T\theta < 0\}\big| \\
&\le E\big[\|x_i\|^2 E\big(I\{y_i^* - x_i^T\theta < \|x_i\|\epsilon_n\} - I\{y_i^* - x_i^T\theta < -\|x_i\|\epsilon_n\}\mid x_i\big)\big] \\
&= E\big[\|x_i\|^2\big\{F_0\big(x_i^T\theta + \|x_i\|\epsilon_n\mid x_i\big) - F_0\big(x_i^T\theta - \|x_i\|\epsilon_n\mid x_i\big)\big\}\big] \\
&\le 2E\big[\|x_i\|^3\big]\bar f_0\,\epsilon_n \le C_6\epsilon_n.
\end{aligned}
\]
Thus, condition (3.2) of Chen et al. (2003) holds with $r = 2$ and $s_j = 1/2$, and their condition (3.3) is then satisfied by our assumption A5.2 and their Remark 3(ii). $\square$

Proof of Theorem 2.1 Note that $P(y^* < x^T\theta_0\mid x) = \tau$, and $\pi_0(x) > 1-\tau+c_n$ implies $x^T\theta_0 > 0$. Therefore, plugging the true $\theta_0$ and $\pi_0$ into $E\{M_n(\cdot,\cdot)\}$, we have
\[
\begin{aligned}
E\{M_n(\theta_0,\pi_0)\}
&= E\big[x\,I\{\pi_0(x) > 1-\tau+c_n\}\big\{\tau - I\{x^T\theta_0 > 0\}P(y^* < x^T\theta_0\mid x)\big\}\big] \\
&= E\big[x\,I\{\pi_0(x) > 1-\tau+c_n\}\big\{\tau - \tau I\{x^T\theta_0 > 0\}\big\}\big] = 0.
\end{aligned}
\]
The proof of the consistency of $\hat\theta$ follows the lines of the proof of Theorem 1 of Chen et al. (2003); we only need to verify conditions (1.1)–(1.3) and (1.5′) in their paper, as (1.4) follows from Lemma 5.1.

(1.1) By the subgradient condition of quantile regression (Koenker 2005), there exists a vector $v$ with coordinates $|v_i| \le 1$ such that
\[
\big\|M_n(\hat\theta,\hat\pi)\big\| = \Big\|n^{-1}\sum\{x_i v_i : i \in \Xi\}\Big\| = o_p\big(n^{-1/2}\big),
\tag{5.2}
\]
where $\Xi$ denotes a $p$-element subset of $\{1,2,\dots,n\}$.
(1.2) By assumptions A3 and A5.1, $\theta_0$ is the minimizer of
\[
E\{Q_n(\theta,\pi_0)\} = E\big[\rho_\tau\big(y - x^T\theta\big)\,I\{\pi_0(x) > 1-\tau+c_n\}\big].
\]
Because $E\{Q_n(\theta,\pi_0)\}$ is a strictly convex function of $\theta$, $\theta_0$ is its unique minimizer. This implies (1.2) of Chen et al. (2003).

(1.3) By assumption A5.3, for any positive $\epsilon_n \to 0$, $\theta \in \Theta$, and $\|\pi-\pi_0\|_\infty \le \epsilon_n$, similarly to (5.1) we have
\[
\begin{aligned}
\big\|E\{M_n(\theta,\pi)\} - E\{M_n(\theta,\pi_0)\}\big\|
&= \big\|E\big[x\big\{\tau - P(y^* < x^T\theta\mid x)I\{x^T\theta > 0\}\big\}\big(I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\}\big)\big]\big\| \\
&\le E\big[\|x\|\,\big|I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\}\big|\big] \le C_7\epsilon_n.
\end{aligned}
\]
Thus, $E\{M_n(\theta,\pi)\}$ is uniformly continuous in $\pi$ at $\pi = \pi_0$.

(1.5′) Let $\{\epsilon_n\}$ be a sequence of positive numbers approaching zero as $n \to \infty$. Note that $E[\|x_i I\{y_i \le x_i^T\theta\}I\{\pi(x_i) > 1-\tau+c_n\}\|^2] \le E(\|x_i\|^2) \le C_8$ under assumption A1. It then follows from Chebyshev's inequality that
\[
\sup_{\theta\in\Theta,\ \|\pi-\pi_0\|_\infty\le\epsilon_n} \big\|M_n(\theta,\pi) - E\{M_n(\theta,\pi)\}\big\| = o_p(1). \qquad\square
\]
Proof of Theorem 2.2 (I) We first prove the fourth-root consistency of $\hat\theta$.

We rewrite $E\{M_n(\theta,\pi)\}$ as
\[
E\{M_n(\theta,\pi)\} = E\big[x\psi(x,y,\theta)I\{\pi(x) > 1-\tau+c_n\}\big] =: b_1(\theta) + b_2(\pi) + b_3(\theta,\pi),
\]
where
\[
\begin{aligned}
b_1(\theta) &= E\big[x\psi(x,y,\theta)I\{\pi_0(x) > 1-\tau+c_n\}\big],\\
b_2(\pi) &= E\big[x\psi(x,y,\theta_0)\big(I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\}\big)\big],\\
b_3(\theta,\pi) &= E\big[x\big\{\psi(x,y,\theta) - \psi(x,y,\theta_0)\big\}\big(I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\}\big)\big].
\end{aligned}
\]
(i) From Theorem 2.1, $\hat\theta$ is a consistent estimator. Let $\epsilon_n \to 0$ be any positive sequence and $\theta \in \Theta$ with $\|\theta-\theta_0\| \le \epsilon_n$. Since $\pi_0(x) > 1-\tau+c_n$ implies $x^T\theta_0 > 0$, we have $P(y < x^T\theta_0\mid x)I\{\pi_0(x) > 1-\tau+c_n\} = P(y^* < x^T\theta_0\mid x)I\{\pi_0(x) > 1-\tau+c_n\} = \tau I\{\pi_0(x) > 1-\tau+c_n\}$. For $\|\theta-\theta_0\| \le \epsilon_n$, we have
\[
\begin{aligned}
b_1(\theta) &= E\{M_n(\theta,\pi_0)\}\\
&= E\big[x\,E\{\psi(x,y,\theta)\mid x\}\,I\{\pi_0(x) > 1-\tau+c_n\}\big]\\
&= E\big[x\big\{\tau - P(y < x^T\theta\mid x)\big\}I\{\pi_0(x) > 1-\tau+c_n\}\big]\\
&= -E\big[x\big\{P(y < x^T\theta\mid x) - P(y^* < x^T\theta_0\mid x)\big\}I\{\pi_0(x) > 1-\tau+c_n\}\big]\\
&= -E\big[x\big\{P(y^* < x^T\theta\mid x) - P(y^* < x^T\theta_0\mid x)\big\}I\{\pi_0(x) > 1-\tau+c_n\}\big]\\
&\quad + E\big[x\big\{P(y^* < x^T\theta\mid x) - P(y < x^T\theta\mid x)\big\}I\{\pi_0(x) > 1-\tau+c_n\}\big]\\
&= -E\big[f_0\big(x^T\theta_0\mid x\big)\,xx^T I\{\pi_0(x) > 1-\tau+c_n\}\big](\theta-\theta_0)\\
&\quad + E\big[x\,P(y^* < x^T\theta\mid x)I\{\pi_0(x) > 1-\tau+c_n\}I\{x^T\theta \le 0\}\big]\\
&\quad + o\big(\|\theta-\theta_0\|\big).
\end{aligned}
\tag{5.3}
\]
Let $D = E[f_0(x^T\theta_0\mid x)\,xx^T I\{x^T\theta_0 > 0\}]$. By assumption A5.3,
\[
\big\|E\big[f_0\big(x^T\theta_0\mid x\big)\,xx^T I\{1-\tau < \pi_0(x) < 1-\tau+c_n\}\big](\theta-\theta_0)\big\| = O\big(c_n\|\theta-\theta_0\|\big) = o\big(\|\theta-\theta_0\|\big).
\]
Since $\pi_0(x) > 1-\tau$ is equivalent to $x^T\theta_0 > 0$, we have
\[
\big\|\big(D - E\big[f_0\big(x^T\theta_0\mid x\big)\,xx^T I\{\pi_0(x) > 1-\tau+c_n\}\big]\big)(\theta-\theta_0)\big\| = o\big(\|\theta-\theta_0\|\big).
\]
For the second term of (5.3), we use assumptions A3 and A6 to get
\[
\begin{aligned}
&E\big[x\,P(y^* < x^T\theta\mid x)I\{\pi_0(x) > 1-\tau+c_n\}I\{x^T\theta \le 0\}\big]\\
&\quad= E\big[x\,P(y^* < x^T\theta_0\mid x)I\{\pi_0(x) > 1-\tau+c_n\}I\{x^T\theta \le 0\}\big]\\
&\qquad+ E\big[x\big\{P(y^* < x^T\theta\mid x) - P(y^* < x^T\theta_0\mid x)\big\}I\{\pi_0(x) > 1-\tau+c_n\}I\{x^T\theta \le 0\}\big]\\
&\quad= \tau E\big[x\,I\{\pi_0(x) > 1-\tau+c_n\}I\{x^T\theta \le 0\}\big]\\
&\qquad+ E\big[f_0\big(x^T\theta_0\mid x\big)\,xx^T(\theta-\theta_0)I\{\pi_0(x) > 1-\tau+c_n\}I\{x^T\theta \le 0\}\big] + o\big(\|\theta-\theta_0\|\big)\\
&\quad= -\tau D_{n2}^*(\theta-\theta_0) + o\big(\|\theta-\theta_0\|\big).
\end{aligned}
\]
So $b_1(\theta) = -(D + \tau D_{n2}^*)(\theta-\theta_0) + o(\|\theta-\theta_0\|)$, in which, by assumption A6, $D + \tau D_{n2}^*$ is positive definite.
(ii) Because $\|\pi-\pi_0\|_\infty = o_p(n^{-1/4})$ and $n^{1/4}c_n > c^* > 0$, $\pi(x) > 1-\tau+c_n$ implies $\pi_0(x) > 1-\tau$, and therefore $x^T\theta_0 > 0$. Since $\pi_0(x) > 1-\tau+c_n$ also implies $x^T\theta_0 > 0$, it follows that
\[
\begin{aligned}
b_2(\pi) &= E\big[x\big\{\tau - P(y < x^T\theta_0\mid x)\big\}\big(I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\}\big)\big]\\
&= E\big[x\big\{\tau - P(y^* < x^T\theta_0\mid x)\big\}\big(I\{\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\}\big)\big] = 0.
\end{aligned}
\]
(iii) Similarly to (5.1), we have
\[
\begin{aligned}
\big\|b_3(\theta,\pi)\big\|
&\le E\big[\|x\|\,\big|P(y < x^T\theta_0\mid x) - P(y < x^T\theta\mid x)\big|\\
&\qquad\times\big(I\{\pi_0(x) > 1-\tau+c_n \ge \pi(x)\} + I\{\pi(x) > 1-\tau+c_n \ge \pi_0(x)\}\big)\big]\\
&\le E\big[\|x\|\big(I\{\pi_0(x) > 1-\tau+c_n \ge \pi(x)\} + I\{\pi(x) > 1-\tau+c_n \ge \pi_0(x)\}\big)\big]\\
&= o\big(n^{-1/4}\big).
\end{aligned}
\]
It follows that $\sup_{\|\theta-\theta_0\|\le\epsilon_n,\ \|\pi-\pi_0\|_\infty = o_p(n^{-1/4})}\|b_3(\theta,\pi)\| = o(n^{-1/4})$.
(iv) By Lemma 5.1, we have
\[
M_n(\theta,\pi) - M_n(\theta_0,\pi_0) = -\big(D + \tau D_{n2}^*\big)(\theta-\theta_0) + o\big(\|\theta-\theta_0\|\big) + b_3(\theta,\pi) + o_p\big(n^{-1/2}\big).
\tag{5.4}
\]
By Lemma 5.1 and (5.2), plugging $(\hat\theta,\hat\pi)$ into (5.4), we have
\[
-\big(D + \tau D_{n2}^*\big)(\hat\theta-\theta_0) + o_p\big(\|\hat\theta-\theta_0\|\big) = -M_n(\theta_0,\pi_0) + o_p\big(n^{-1/4}\big).
\]
By the Central Limit Theorem, $M_n(\theta_0,\pi_0)$ is asymptotically normal and $n^{1/2}M_n(\theta_0,\pi_0) = O_p(1)$. By assumptions A3 and A6, $\|\hat\theta - \theta_0\| = o_p(n^{-1/4})$.
(II) Now we prove the root-$n$ consistency and the asymptotic distribution of $\hat\theta$.

By assumption A5.1, $\pi_0(x) > 1-\tau+c_n$ implies $x^T\theta_0 > d_n$, where $n^{1/4}d_n$ is greater than some positive constant, say $d^*$. Since $\|\hat\theta - \theta_0\| = o_p(n^{-1/4})$, by assumptions A1 and A5.1 we have $D_{n2}^*(\hat\theta-\theta_0) = o_p(\|\hat\theta-\theta_0\|)$.

Since $n^{1/4}c_n > c^* > 0$ and $\|\hat\pi - \pi_0\|_\infty = o_p(n^{-1/4})$, by assumptions A1 and A5.1, both $\pi_0(x) > 1-\tau+c_n$ and $\hat\pi(x) > 1-\tau+c_n$ imply $x^T\theta_0 > 0$ and $x^T\hat\theta > 0$. Therefore,
\[
\begin{aligned}
b_3(\hat\theta,\hat\pi) &= E\big[x\big\{P(y^* < x^T\theta_0\mid x) - P(y^* < x^T\hat\theta\mid x)\big\}\big(I\{\hat\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\}\big)\big]\\
&= -E\big[\big\{xx^T f_0\big(x^T\theta_0\mid x\big)(\hat\theta-\theta_0) + o_p\big(\|\hat\theta-\theta_0\|\big)\big\}\big(I\{\hat\pi(x) > 1-\tau+c_n\} - I\{\pi_0(x) > 1-\tau+c_n\}\big)\big]\\
&= o_p\big(\|\hat\theta-\theta_0\|\big).
\end{aligned}
\]
By Lemma 5.1 and (5.2), plugging $(\hat\theta,\hat\pi)$ into (5.4), we have
\[
-M_n(\theta_0,\pi_0) = -D(\hat\theta-\theta_0) + o_p\big(n^{-1/2}\big) + o_p\big(\|\hat\theta-\theta_0\|\big).
\]
By the Central Limit Theorem, $M_n(\theta_0,\pi_0)$ is asymptotically normal and $n^{1/2}M_n(\theta_0,\pi_0) = O_p(1)$. Therefore, $\hat\theta - \theta_0 = O_p(n^{-1/2})$, and
\[
\sqrt{n}(\hat\theta-\theta_0) = D^{-1}\sqrt{n}\,M_n(\theta_0,\pi_0) + o_p(1).
\tag{5.5}
\]
Let $V = \tau(1-\tau)E[xx^T I\{\pi_0(x) > 1-\tau\}] = \tau(1-\tau)E[xx^T I\{x^T\theta_0 > 0\}]$. Note that
\[
\mathrm{Cov}\big\{\sqrt{n}\,M_n(\theta_0,\pi_0)\big\} = \tau(1-\tau)\,n^{-1}\sum_{i=1}^{n} x_i x_i^T I\{\pi_0(x_i) > 1-\tau+c_n\}.
\]
By assumption A5.3, $n^{-1}\sum_{i=1}^{n} I\{1-\tau < \pi_0(x_i) \le 1-\tau+c_n\} = O_p(c_n) = o_p(1)$, which implies
\[
\mathrm{Cov}\big\{\sqrt{n}\,M_n(\theta_0,\pi_0)\big\} \longrightarrow V.
\]
It follows from (5.5) that
\[
\sqrt{n}(\hat\theta-\theta_0) \to N\big(0,\ D^{-1}VD^{-1}\big). \qquad\square
\]
References
Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat
46:175–185
Buchinsky M (1994) Changes in U.S. wage structure 1963–87: an application of quantile regression. Econometrica 62:405–458
Buchinsky M, Hahn J (1998) An alternative estimator for the censored quantile regression model. Econometrica 66:653–671
Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71:1591–1608
Chernozhukov V, Hong H (2002) Three-step censored regression quantile and extramarital affairs. J Am
Stat Assoc 97:872–882
Fitzenberger B (1997a) Computational aspects of censored quantile regression. In: Dodge Y (ed) Proceedings of the 3rd international conference on statistical data analysis based on the L1-norm and related methods. IMS, Hayward, CA, pp 171–186
Fitzenberger B (1997b) A guide to censored quantile regressions. In: Maddala GS, Rao CR (eds) Handbook
of statistics: robust inference, vol 15. North-Holland, Amsterdam, pp 405–437. MR1492720
Fitzenberger B, Winker P (2007) Improving the computation of censored quantile regressions. Comput
Stat Data Anal 52:88–108
Hastie T, Tibshirani R (1990) Generalized additive models. Chapman and Hall, London
He X, Shao Q (1996) A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs. Ann Stat 24:2608–2630
Khan S, Powell J (2001) Two-step estimation of semiparametric censored regression models. J Econom
103:73–110
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
Koenker R (2008) Censored quantile regression redux. J Stat Softw 27:1–25
Koenker R, Park B (1996) An interior point algorithm for nonlinear quantile regression. J Econom 71:265–283
Loader C (1999) Local regression and likelihood. Springer, New York
Portnoy S (2003) Censored regression quantiles. J Am Stat Assoc 98:1001–1012
Portnoy S (2010) Inconsistency of the Powell estimator: examples. Manuscript
Powell J (1986) Censored regression quantiles. J Econom 32:143–155
Stone CJ (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10:1040–1053
Stone CJ (1985) Additive regression and other nonparametric models. Ann Stat 13:689–705
van der Vaart A, Wellner J (1996) Weak convergence and empirical processes. Springer, New York
Volgushev S, Dette H (2010) Nonparametric quantile regression for twice censored data. arXiv:1007.3376
Wang H, Wang L (2009) Locally weighted censored quantile regression. J Am Stat Assoc 104:1117–1128
Xue L, Wang J (2010) Distribution function estimation by constrained polynomial spline regression. J
Nonparametr Stat 22:443–457
Zhou L (2006) A simple censored median regression estimator. Stat Sin 16:1043–1058