
Measurement Errors in Quantile Regression Models∗
Sergio Firpo†
Antonio F. Galvao‡
Suyong Song§
June 30, 2015
Abstract
This paper develops estimation and inference for quantile regression models with
measurement errors. We propose an easily-implementable semiparametric two-step estimator when we have repeated measures for the covariates. Building on recent theory
on Z-estimation with infinite-dimensional parameters, consistency and asymptotic normality of the proposed estimator are established. We also develop statistical inference
procedures and show the validity of a bootstrap approach to implement the methods in
practice. Monte Carlo simulations assess the finite sample performance of the proposed
methods. We apply our methods to the well-known example of returns to education
on earnings using a data set on female monozygotic twins in the U.K. We document
strong heterogeneity in returns to education along the conditional distribution of earnings. In addition, the returns are relatively larger at the lower part of the distribution,
providing evidence that a potential economic redistributive policy should focus on such
quantiles.
Key Words: Quantile regression; measurement errors; returns to education
JEL Classification: C14, C23, J31
∗ The authors would like to express their appreciation to Roger Koenker, Yuya Sasaki, Susanne Schennach, and Liang Wang for helpful comments and discussions. All the remaining errors are ours.
† Sao Paulo School of Economics, FGV. E-mail: sergio.firpo@fgv.br
‡ Department of Economics, University of Iowa, W284 Pappajohn Business Building, 21 E. Market Street, Iowa City, IA 52242. E-mail: antonio-galvao@uiowa.edu
§ Department of Economics, University of Iowa, W360 Pappajohn Business Building, 21 E. Market Street, Iowa City, IA 52242. E-mail: suyong-song@uiowa.edu
1 Introduction
Quantile regression (QR) models have provided a valuable tool in economics as a way of
capturing heterogeneous effects that covariates may have on the outcome of interest, exposing
a wide variety of forms of conditional heterogeneity under weak distributional assumptions.
Under some assumptions on the unobservable factors, QR can also be interpreted as providing
a structural relationship between the outcome of interest and its observable and unobservable
determinants. Also importantly, QR provides a framework for robust inference when the
presence of outliers is an issue.
Measurement errors (ME) have important implications for the reliability of standard estimation and testing procedures. Variables used in empirical economic analysis are frequently
measured with error, particularly if information is collected through one-time retrospective
surveys, which are notoriously susceptible to recall errors. If the regressors are indeed subject to classical ME, it is well known that the slope coefficient of the ordinary least squares
(OLS) estimator is inconsistent. In the one-regressor case (or with multiple uncorrelated regressors), under standard assumptions, the OLS estimator is biased toward zero, a problem often denoted
as attenuation (see, e.g., Carroll et al. (2006) and references therein for an overview of ME
models).
Recently, the topic of ME in variables has received considerable attention in the QR
literature. As in the OLS case, the standard QR estimator has been shown to be inconsistent
in the presence of ME (see, e.g., Montes-Rojas (2011)). He and Liang (2000) consider
the problem of estimating QR coefficients in errors-in-variables models, and propose an
estimator in the context of linear and partially linear models. Chesher (2001) studies the
impact of covariate ME on quantile functions using a small variance approximation argument.
Schennach (2008) discusses identification of a nonparametric quantile function under various
settings when there is an instrumental variable measured on all sampling units. Identification
and estimation for general quantile functions are based on Fourier transforms and previous
results for nonlinear models (see, e.g., Schennach, 2007). Wei and Carroll (2009) propose a
method for a linear QR model that corrects bias induced by the ME by constructing joint
estimating equations that simultaneously hold for all the quantile levels. More recently,
Torres-Saavedra (2013) and Hausman, Luo, and Palmer (2014) study ME in the dependent
variable of QR models. We refer to Ma and Yin (2011), Wang, Stefanski, and Zhu (2012),
and Wu, Ma, and Yin (2014) for other recent developments in QR models with ME. Thus, in
the analysis of QR with mismeasured covariates, it has been common to employ estimation
methods that either impose parametric restrictions on nuisance functionals or use exogenous
information as those provided by instrumental variables (see, e.g., Wei and Carroll (2009),
Schennach (2008), and Chernozhukov and Hansen (2006)). Nevertheless, methods relying
on parametric assumptions are very sensitive to misspecification of such conditions, which
are indeed relevant for inference as the asymptotic variance typically requires estimation of
conditional densities. In addition, finding exogenous instrumental variables is known to be
a nontrivial task in most economic models.
This paper contributes to both the QR and ME branches of the literature by developing
estimation and inference methods for QR models in the presence of ME in the covariates.
This is achieved by exploring repeated measures of the true regressor. Identification and
estimation of conditional mean regression models with repeated measures of the true regressor have already been studied in Li (2002), Schennach (2004) and Hu and Sasaki (2015),
among others. However, to the best of our knowledge, there has been no attempt yet to
develop estimation and inference for QR models using repeated measures of the true regressor. This paper bridges this gap. We propose a simple, easily-implementable, and well-behaved two-step semiparametric estimation procedure that preserves the semiparametric
distribution-free and heteroscedastic features of the model. The first step employs a general nonparametric estimation of the density function. The second step uses the estimated
densities as weights in a weighted QR estimation. We establish the asymptotic properties
of the two-step estimator, assuming that the conditional densities satisfy smoothness conditions and can be estimated at an appropriate nonparametric rate. We also develop practical
statistical inference, and propose testing procedures for general linear hypotheses based on
the Wald statistic. To implement these tests in practice the critical values are computed
using a bootstrap method. We provide sufficient conditions under which the bootstrap is
theoretically valid, and discuss an algorithm for its practical implementation. Our method
leads to a simple algorithm that can be conveniently implemented in empirical applications.
Compared to the existing procedures for QR models with ME, our approach has several
distinctive advantages. First, our method does not assume global linearity at all quantile
levels for the estimation of the conditional density function as in Wei and Carroll (2009).
This feature makes our procedure applicable to any τ-quantile of interest, thus relaxing the
requirement of a joint estimation and providing more flexibility. Second, our algorithm is
computationally simple and easy to implement in practice because estimation of the weights
does not require recursive algorithms, allowing the weights for all observations to be obtained
in one single step. As a result, the quantile estimate is attained by minimizing only one
single convex objective function at the quantile of interest. Third, the methodology does
not rely on instrumental variables. Therefore, information from outside the model is not
necessary for identification. Finally, our estimated weights exhibit a property of uniform
consistency, implying that it is feasible to establish both the consistency and asymptotic
normality of the resulting estimators of the parameters of interest. Hence, the method
provides standard inference and testing procedures.
Monte Carlo simulations assess the finite sample properties of the proposed methods.
We evaluate the estimator in terms of empirical bias, standard deviation, and mean squared
error, and compare its performance with methods that are not designed for dealing with
ME issues. The experiments suggest that the proposed approach performs relatively well in
finite samples and effectively removes bias induced by ME.
Our procedure will hopefully be useful for those empirical settings based on QR models
in which ME in the independent variables is a concern because the method does provide
intuitive and practical ways of handling the problem. To motivate and illustrate the applicability of the methods, we revisit and analyze the important example of returns to education
on earnings. The QR approach is an important tool in this example because it allows us to
capture the heterogeneity in the returns to education along the conditional wage distribution. At the same time, endogeneity induced by ME has been extensively discussed in the
returns to education example, as misreporting in the number of schooling years is a genuine concern (Card (1995), Card (1999), and Harmon and Oosterbeek (2000)). Within that
framework, finding valid and strong instrumental variables to solve the endogeneity problem
is not, in general, an easy task (see, e.g., Card (1999)). Thus, our method is a natural
alternative solution to the ME problem when repeated measures on educational achievement
are available.
In our empirical example we use a data set on female monozygotic twins in the U.K.
that had been previously used in Bonjour et al. (2003) to study the problem of returns to
education. Bonjour et al. (2003) use the information on one twin to obtain an instrumental
variable for schooling years on the other twin. Amin (2011) points out that the results in
Bonjour et al. (2003) are largely affected by outlier observations and revisits the problem
using QR. He uses the same data on twins and applies the instrumental variables methodology described in Arias, Hallock, and Sosa-Escudero (2001). We compare our results with
those from Bonjour et al. (2003) and Amin (2011). Our empirical findings exemplify and
support the idea that the proposed methods are a useful alternative to existing approaches
in economic applications in which ME is an important concern. We document strong heterogeneity in returns to education along the conditional distribution of earnings. In addition,
the returns are relatively larger at the lower part of the distribution, providing evidence that
a potential economic redistributive policy should focus on such quantiles.
The rest of the paper is organized as follows. Section 2 presents the model and discusses
identification of the parameters of interest in the presence of ME. Section 3 proposes the two-step
QR estimator. Section 4 establishes the asymptotic properties of the estimator. Inference
is discussed in Section 5. Section 6 presents the Monte Carlo experiments. In Section 7, we
illustrate the empirical usefulness of the new approach by applying it to returns to education.
Finally, Section 8 concludes the paper.
2 Model and identification

2.1 Model
We first introduce the model studied in this paper. Given a quantile τ ∈ (0, 1), we define
the following quantile regression (QR) model,
$$Y_i = X_i^\top\beta_0(\tau) + Z_i^\top\delta_0(\tau) + \varepsilon_i(\tau), \qquad (1)$$
where Yi is the scalar dependent variable of interest, Xi is a vector of potentially-mismeasured
covariates, Zi is a vector of correctly-observed covariates, and εi (τ ) is the innovation term
whose τ -th quantile is zero conditional on (Xi , Zi ). The structural parameters of interest are
θ0 (τ ) = (β0 (τ ), δ0 (τ )). In general, each β0 (τ ) and δ0 (τ ) will depend on τ , but we assume τ
to be fixed throughout the paper and suppress such a dependence for notational simplicity.
Suppose (Yi , Xi , Zi ) are i.i.d. random variables defined on a complete probability space
(Ω, F, P ). Define the population objective function for the τ -th conditional quantile as
$$Q(\beta_0,\delta_0) := E\big[\psi_\tau(Y_i - X_i^\top\beta_0 - Z_i^\top\delta_0)[X_i\;Z_i]\big] = 0, \qquad (2)$$
where $\psi_\tau(u) := (\tau - I\{u < 0\})$ with the indicator function $I\{\cdot\}$. When the true covariates
$(X,Z)$ are observed, $\beta_0$ and $\delta_0$ can be consistently estimated from the standard quantile
regression model with the sample analog of $Q(\beta,\delta)$ in (2):
$$Q_n(\beta,\delta) := \frac{1}{n}\sum_{i=1}^n \psi_\tau(Y_i - X_i^\top\beta - Z_i^\top\delta)[X_i\;Z_i] = 0. \qquad (3)$$
The presence of the indicator function in the above equation implies that the solution may
not be an exact zero. It is usual to write this estimator as a minimization problem, and then
use linear programming to solve the optimization. Thus, the above moment condition is a
slight abuse of notation, but since everything else involving observed data is an estimating
equation that will have a zero, we will use the estimating equation nomenclature. For more
details on Z-estimators with non-smooth objective functions, see He and Shao (1996, 2000).
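For concreteness, the following is a minimal sketch of how the error-free estimating equation (3) is solved in practice via the check-function minimization, using the R package quantreg referenced later in Section 3. The data are simulated purely for illustration.

```r
# Minimal illustration: with correctly observed (X, Z), the standard QR estimator
# solving (3) is obtained by minimizing the check-function objective, which
# quantreg::rq() handles via linear programming.
library(quantreg)

set.seed(1)
n   <- 500
X   <- rnorm(n)
Z   <- rnorm(n)
Y   <- 0.5 * X - 0.5 * Z + rnorm(n)   # illustrative data generating process
tau <- 0.5

fit <- rq(Y ~ X + Z, tau = tau)        # an intercept is included by default
coef(fit)                              # approximate zeros of the estimating equation (3)
```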
2.2 Measurement error bias and its solution
Under the assumption of perfectly-measured regressors, the solution of equations (3) can
be shown to produce consistent estimates of (β0 , δ0 ). Nevertheless, it is commonly observed
that researchers have to use the regressor X measured with error. Using mismeasured X
in the standard QR estimation in (3) induces a substantial bias in the estimates of the
coefficients of interest (see, e.g., He and Liang, 2000). Thus, estimation of the standard QR
model under measurement errors (ME) leads to inconsistent estimates. To overcome this
drawback we propose a methodology that makes use of repeated measures, that is, two mismeasured observables of the true covariate.
Suppose that the true covariate X is unobservable due to ME. Instead, a researcher observes
two error-laden measurements, which are noisy measures of X defined as follows:
$$X_{1i} = X_i + U_{1i}, \qquad X_{2i} = X_i + U_{2i},$$
where U1i and U2i are ME. Therefore, the observed random variables are (Yi , X1i , X2i , Zi ),
and one seeks to estimate the parameters (β0 , δ0 ).
We show how to use information from the measures X1 and X2 to obtain consistent
estimates of the parameters of interest. For that purpose, it is useful to rewrite $Q(\beta,\delta)$ as a function of the conditional density as well as $(\beta,\delta)$:
$$\begin{aligned}
\tilde{Q}(\beta_0,\delta_0,f_0) &:= E\big[\psi_\tau(Y - X^\top\beta_0 - Z^\top\delta_0)[X\;Z]\big]\\
&= \int \psi_\tau(y - x^\top\beta_0 - z^\top\delta_0)[x\;z]\cdot f_{YXZ}(y,x,z)\,dy\,dx\,dz\\
&= \int \psi_\tau(y - x^\top\beta_0 - z^\top\delta_0)[x\;z]\cdot f_{X|YZ}(x\mid y,z)f_{YZ}(y,z)\,dy\,dx\,dz\\
&= E\left[\int_x \psi_\tau(Y - x^\top\beta_0 - Z^\top\delta_0)[x\;Z]\cdot f_{X|YZ}(x\mid Y,Z)\,dx\right] \qquad (4)\\
&= 0,
\end{aligned}$$
where $f_{YXZ}(y,x,z)$ and $f_{YZ}(y,z)$ are the joint densities of $(Y,X,Z)$ and $(Y,Z)$, respectively,
and where $f_{X|YZ}(x\mid y,z) \equiv f_0$ is the conditional density of $X$ given $(Y,Z)$. By replacing
the outer expectation with its empirical counterpart, we write the sample analog of the
population objective function (4) as:
$$\tilde Q_n(\beta,\delta,f) := \frac{1}{n}\sum_{i=1}^n\int_x \psi_\tau(Y_i - x^\top\beta - Z_i^\top\delta)[x\;Z_i]\cdot f_{X|YZ}(x\mid Y_i,Z_i)\,dx = 0. \qquad (5)$$
The integration in (5) makes the function continuous in its argument. The summand
of (5) is $E_x\big[\psi_\tau(Y_i - x^\top\beta - Z_i^\top\delta)[x\;Z_i]\mid Y_i,Z_i\big]$, the conditional mean of the original score
function given the observed $Y$ and $Z$. Moreover, (5) is an unbiased estimating function,
that is, it has mean zero, and will be the basis for constructing estimating equations to obtain
consistent estimates of the parameters of interest.
Therefore, one would solve the new estimating equation (5) to estimate the parameters
of interest. In empirical applications, however, the true conditional density $f_{X|YZ}(x\mid y,z)$
is unknown, and to implement the estimator (5) in practice one needs to replace it with
$\hat f_{X|YZ}(x\mid y,z)$, a consistent estimate of $f_{X|YZ}(x\mid y,z)$. Thus, a (feasible) estimator would
first estimate $f_{X|YZ}(x\mid y,z)$. The fitted density function from this step would be used to
estimate the coefficients of interest in a second step. Finally, with a consistent estimate
of the conditional density, $(\beta_0,\delta_0)$ can be consistently estimated. However, in general, the
conditional density is not stochastically identified due to the unobservability of the true $X$.
In a related model, Wei and Carroll (2009) make use of an iterative algorithm to obtain a
consistent estimator of the conditional density $f_{X|YX_1}(x\mid y,x_1)$ in the presence of ME on $X$.¹

¹ We note that their conditional density is slightly different from ours since there is a mismeasured covariate $X_1$ in their conditioning set.
They focus on a model with one measurement of the true X (here $X_1$) and with no other observed
covariates Z, for simplicity. Although their approach can be useful in some applications, it
faces important technical challenges. First, in order to implement the estimator, one needs
to estimate the conditional density $f_{X|YX_1}(x\mid y,x_1)$, which requires a pre-specified parametric
form of $f_{X|X_1}(x\mid x_1)$. This suffers from potentially serious model misspecification. Second,
and related to the first problem, solving the estimating equations is difficult, since
estimating the conditional density $f_{X|YX_1}(x\mid y,x_1)$ involves estimation of the entire process $\beta_0(\tau)$ over quantiles $\tau$. In other words, the estimating equations in Wei and Carroll
(2009) need to be solved jointly for all the $\tau$'s, which increases the dimensionality of the
problem substantially and makes implementation considerably difficult. This is reflected in
the tractability of inference for their method.
In this paper, we propose a novel way to nonparametrically estimate the conditional
density without assuming known distributions for the ME. Specifically, we
make use of the repeated measures, X1 and X2 , and show that two mismeasured covariates
are sufficient to identify the conditional density in the presence of ME on the covariate. In
turn, the result guarantees consistent estimation of parameters of interest. The approach
with repeated measurements has been recently studied in the ME literature. Most studies
have focused on i.i.d. measurement errors (e.g., Li and Vuong, 1998; Delaigle, Hall, and
Meister, 2008). We extend the literature by relaxing such strong conditions. We also extend
results for the smooth objective functions of mean regression with ME (e.g., Schennach, 2004) to
a non-smooth objective function such as that of QR.
In the next section we propose a procedure that yields a consistent estimator of (β0 , δ0 )
in (5). We develop a method for QR with measurement errors, which relies on estimating
the conditional density function nonparametrically. The method is a two-step estimator,
where in the first step we estimate the density nonparametrically and then in the second
step we employ a standard weighted QR procedure. Before we proceed to estimation, we
show an identification result for the density function which is essential in the estimation.
For expositional ease, we use fX|Y Z (x | y, z) and f (x | y, z) synonymously.
2.3 Conditional density
As described above, $f(x\mid y,z)$ is an important element for the identification of the parameters of interest in the QR model with ME. This section describes the identification of the
conditional density function $f(x\mid y,z)$, which is required to compute the two-step estimator.
The identification is based on the assumption that repeated measures of the true regressor
are observed. We state the following assumptions to obtain the main identification result.
Assumption A.I: (i) $E[U_1\mid X, U_2] = 0$; (ii) $U_2 \perp (Y,X,Z)$.

Assumption A.II: (i) $E[|X|] < \infty$; (ii) $E[|U_1|] < \infty$; (iii) $|E[\exp(i\zeta X_2)]| > 0$ for any finite $\zeta\in\mathbb{R}$.

Assumption A.III: (i) $\sup_{(x,y,z)\in\mathrm{supp}(X,Y,Z)} f(x\mid y,z) < \infty$; (ii) $f(x\mid y,z)$ is integrable on $\mathbb{R}$ for each $(y,z)\in\mathrm{supp}(Y,Z)$.
Assumption A.I imposes restrictions on the repeated measures of X. Assumption A.I (i)
requires the ME on $X_1$ to have conditional mean zero, but allows dependence between this ME and $(X, U_2)$.
Assumption A.I (ii) requires that the ME on $X_2$ be independent of the true X as well as the other
variables. However, it does not necessarily require zero mean of $U_2$. Thus, our setting for
the repeated measures can be useful, for example, when there is a drift or trend in the
mismeasured covariates. Assumption A.II imposes mild restrictions on the existence of the
first moments of X and $U_1$, and a nonvanishing characteristic function for $X_2$. These have been
commonly assumed in the deconvolution literature (see, e.g., Fan, 1991b; Fan and Truong,
1993). Assumption A.III is trivially satisfied by commonly-used conditional densities.
Let $\phi(\zeta,y,z) \equiv E[e^{i\zeta X}\mid Y=y, Z=z]$ be the conditional characteristic function of X given
Y and Z. The following theorem presents the identification of $f(x\mid y,z)$.
Theorem 1 Suppose Assumptions A.I–A.III hold. Then, for $(x,y,z)\in\mathrm{supp}(X,Y,Z)$,
$$f(x\mid y,z) = \frac{1}{2\pi}\int \phi(\zeta,y,z)\exp(-i\zeta x)\,d\zeta, \qquad (6)$$
where for each real $\zeta$,
$$\phi(\zeta,y,z) = \frac{E[e^{i\zeta X_2}\mid Y=y,Z=z]}{E[e^{i\zeta X_2}]}\,\exp\left(\int_0^\zeta \frac{i\,E[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\,d\xi\right).$$
Proof. See Appendix.
The theorem implies that the conditional density $f(x\mid y,z)$ can be written as a function of
purely-observed variables. For this, we use properties of the Fourier transform. Namely, we
write $f(x\mid y,z)$ as the inverse Fourier transform of $\phi(\zeta,y,z)$. This simplifies identification
since $\phi(\zeta,y,z)$ is easily identified from Assumptions A.I–A.III by removing the ME, $U_1$
and $U_2$, in the frequency domains $(\zeta,\xi)$. It is worth noting that the identification result is
similar to Kotlarski (1967), who identifies the density of X from its repeated measurements by
assuming mutual independence of X, $U_1$, and $U_2$. Our approach rests on weaker assumptions
than mutual independence, which is highlighted in Assumption A.I. As a result, the
proposed method can be applied to many interesting topics which allow for dependence
among variables and their ME.
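To see how Assumption A.I delivers the formula in Theorem 1, the following is a brief sketch of the two identities that underlie it (the formal argument is given in the Appendix). Under A.I (ii),
$$\frac{E[e^{i\zeta X_2}\mid Y=y,Z=z]}{E[e^{i\zeta X_2}]} = \frac{E[e^{i\zeta X}\mid Y=y,Z=z]\,E[e^{i\zeta U_2}]}{E[e^{i\zeta X}]\,E[e^{i\zeta U_2}]} = \frac{\phi(\zeta,y,z)}{E[e^{i\zeta X}]},$$
and, combining A.I (i) and (ii),
$$\frac{i\,E[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]} = \frac{i\,E[X e^{i\xi X}]\,E[e^{i\xi U_2}]}{E[e^{i\xi X}]\,E[e^{i\xi U_2}]} = \frac{d}{d\xi}\ln E[e^{i\xi X}],$$
so that $\exp\big(\int_0^\zeta i\,E[X_1 e^{i\xi X_2}]/E[e^{i\xi X_2}]\,d\xi\big) = E[e^{i\zeta X}]$, and the product of the two terms recovers $\phi(\zeta,y,z)$.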
2.4 Identification

Given the result in equation (6), we can rewrite $\tilde Q(\beta,\delta,f)$ as:
$$\begin{aligned}
\tilde Q(\beta,\delta,f) &= E\left[\int_x \psi_\tau\big(Y - x^\top\beta - Z^\top\delta\big)[x\;Z]\cdot f_{X|YZ}(x\mid Y,Z)\,dx\right]\\
&= E\left[\int_x \psi_\tau\big(Y - x^\top\beta - Z^\top\delta\big)[x\;Z]\cdot \frac{1}{2\pi}\int_\zeta \phi(\zeta,Y,Z)\exp(-i\zeta x)\,d\zeta\,dx\right],
\end{aligned}$$
which does not depend on data on X. Thus, estimation of $(\beta_0,\delta_0)$ follows from solving a feasible version of $\tilde Q_n(\beta,\delta,f)$:
$$\tilde Q_n(\beta,\delta,\hat f) = \frac{1}{n}\sum_{i=1}^n\int_x \psi_\tau\big(Y_i - x^\top\beta - Z_i^\top\delta\big)[x\;Z_i]\cdot \hat f_{X|YZ}(x\mid Y_i,Z_i)\,dx,$$
where
$$\hat f_{X|YZ}(x\mid Y_i,Z_i) = \frac{1}{2\pi}\int_\zeta \hat\phi(\zeta,Y_i,Z_i)\exp(-i\zeta x)\,d\zeta,$$
and the only feature of this sample objective function that has not yet been presented is $\hat\phi$, the estimate of $\phi$, which is defined in the next section. In practice, as we discuss next, we approximate the integrals by sums, so the actual implementation solves a slightly different objective function. By approximating the integral by a sum, we end up with a double sum (over observations and over grid values of X). Important in that representation is the fact that the estimates $(\hat\beta,\hat\delta)$ will be obtained by a weighted QR, whose weights are given by the estimate $\hat f_{X|YZ}$.
3 Estimation
Given the identification condition in equation (6) of Theorem 1, we are able to estimate
the structural parameters of interest, (β0 , δ0 ). We propose a semiparametric estimator that
involves two-step estimation. Implementation of the estimator is simple in practice. In
the first step, one estimates the nuisance parameter, the conditional distribution, using a
nonparametric method which requires no optimization. In the second step, by plugging-in
these estimates, a general weighted quantile regression (QR) is performed.
3.1 Estimation of nuisance parameter
In this subsection we discuss the estimation of the nuisance parameter in the first step,
i.e., the conditional density $f(x\mid y,z)$. It is important to note that the proposed density
estimation is novel in the literature and makes use of repeated measures and properties
of the Fourier transform.
The estimation of the nuisance parameter is a very important step for the implementation of
the proposed estimator in practice. We propose a nonparametric method to estimate the
density consistently. To obtain a consistent estimator of $f(x\mid y,z)$, we adapt the class
of flat-top kernels of infinite order proposed by Politis and Romano (1999). Consider the
following assumption.
Assumption A.IV: The real-valued kernel $x\mapsto k(x)$ is measurable and symmetric with $\int k(x)\,dx = 1$, and its Fourier transform $\xi\mapsto\kappa(\xi)$ is bounded, compactly supported, and equal to one for $|\xi| < \bar\xi$ for some $\bar\xi > 0$.
From Assumption A.IV, we allow for a kernel of the form (see, e.g., Li and Vuong, 1998)
$$k(x) = \frac{\sin(x)}{\pi x}, \qquad (7)$$
with its Fourier transform such that
$$\kappa(h^x\zeta) = \int \frac{1}{h^x}\,k\!\left(\frac{x}{h^x}\right)\exp(i\zeta x)\,dx, \qquad (8)$$
for a bandwidth $h^x$. This flat-top kernel of infinite order has the property that its Fourier
transform is equal to one over the interval $[-1,1]$ and zero elsewhere, which guarantees that
the bias goes to zero faster than any power of the bandwidth. We note that an ill-posed
inverse problem occurs when one tries to invert a convolution operation. This is true for our
proposed estimator because it involves division by a quantity which converges to zero as the frequency
parameter goes to infinity, by the Riemann-Lebesgue lemma. By estimating the numerator using
a kernel whose Fourier transform is compactly supported, one can guarantee that the
ratio is under control. This is because the numerator can decay to zero before the
denominator converges to zero. This compact support of the Fourier transform of the kernel
can be easily implemented while preserving most of the properties of the original kernel. For
instance, one can transform any given kernel $\tilde k$ into a modified kernel $k$ with compact Fourier
support by using a window function that is constant in the neighborhood of the origin and
vanishes beyond a given frequency.
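As a small illustration of this construction, the following is a sketch in R of the sinc kernel in (7) and its flat-top Fourier transform; the bandwidth value is arbitrary.

```r
# The sinc kernel in (7) and its Fourier transform, which equals one on [-1, 1] and
# zero elsewhere; only frequencies with |h^x * zeta| <= 1 survive in the inversion.
k_sinc <- function(x) ifelse(x == 0, 1 / pi, sin(x) / (pi * x))   # limit at 0 is 1/pi
kappa  <- function(u) as.numeric(abs(u) <= 1)                     # flat-top transform

hx   <- 0.2                          # illustrative bandwidth
zeta <- seq(-10, 10, by = 0.5)
kappa(hx * zeta)                     # equals 1 for |zeta| <= 1/hx = 5, and 0 beyond
```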
The following theorem summarizes the result.
Theorem 2 Suppose Assumptions A.I–A.III hold, and let $k$ satisfy Assumption A.IV. For $(x,y,z)\in\mathrm{supp}(X,Y,Z)$ and $h^x > 0$, let
$$f(x\mid y,z;h^x) \equiv \int \frac{1}{h^x}\,k\!\left(\frac{\tilde x - x}{h^x}\right) f(\tilde x\mid y,z)\,d\tilde x. \qquad (9)$$
Then we have
$$f(x\mid y,z;h^x) = \frac{1}{2\pi}\int \kappa(h^x\zeta)\,\phi(\zeta,y,z)\exp(-i\zeta x)\,d\zeta. \qquad (10)$$
Proof. See Appendix.
Definition 2.3 Let $h_n \equiv (h_n^x, h_n^{(2)})$ with $h_n^{(2)} \equiv (h_n^y, h_n^z)$ be a set of smoothing parameters. Let $\hat E[\cdot]$ denote a sample average, i.e., $\frac{1}{n}\sum_{i=1}^n[\cdot]$. Finally, we introduce a consistent nonparametric estimator of $f(x\mid y,z)$ motivated by Theorem 2. The estimator of $f(x\mid y,z)$ is defined as
$$\hat f(x\mid y,z;h_n) \equiv \frac{1}{2\pi}\int \kappa(h_n^x\zeta)\,\hat\phi(\zeta,y,z,h_n^{(2)})\exp(-i\zeta x)\,d\zeta, \qquad (11)$$
for $h_n \to 0$ as $n\to\infty$, where
$$\hat\phi(\zeta,y,z,h_n^{(2)}) \equiv \frac{\hat E[e^{i\zeta X_2}\mid Y=y,Z=z]}{\hat E[e^{i\zeta X_2}]}\,\exp\left(\int_0^\zeta \frac{i\,\hat E[X_1 e^{i\xi X_2}]}{\hat E[e^{i\xi X_2}]}\,d\xi\right).$$
The above estimator is useful to compute the structural parameters of interest. Since it
has an explicit closed form, it requires no optimization routine unlike other likelihood-based
approaches. Estimation of the conditional mean, $\hat E[e^{i\zeta X_2}\mid Y=y,Z=z]$, can be achieved via
any nonparametric method. For instance, one might use popular kernel estimation with
$k_{h_n}(\cdot)\equiv h_n^{-1}k(\cdot/h_n)$ (e.g., the Epanechnikov kernel), defined as
$$\hat E[e^{i\zeta X_2}\mid Y=y,Z=z] \equiv \frac{\hat E\big[e^{i\zeta X_2}\,k_{h_n^y}(Y-y)\,k_{h_n^z}(Z-z)\big]}{\hat E\big[k_{h_n^y}(Y-y)\,k_{h_n^z}(Z-z)\big]}.$$
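To fix ideas, the following is a minimal sketch in R of this first step, assuming a scalar correctly-observed covariate $Z$, Gaussian product kernels in the Nadaraya-Watson step, and simple grid-based numerical integration in place of the FFT; function names, grids, and bandwidths are illustrative, not part of the formal procedure.

```r
# A sketch of the first-step estimator in Definition 2.3: estimate phi-hat on a frequency
# grid and invert it, keeping only |zeta| <= 1/hx (the flat-top kernel region).
phi_hat <- function(zeta_grid, y0, z0, Y, Z, X1, X2, hy, hz) {
  # Nadaraya-Watson weights for Ehat[exp(i zeta X2) | Y = y0, Z = z0] (Gaussian kernels)
  w <- dnorm((Y - y0) / hy) * dnorm((Z - z0) / hz)
  w <- w / sum(w)
  sapply(zeta_grid, function(zeta) {
    num <- sum(w * exp(1i * zeta * X2))          # Ehat[e^{i zeta X2} | Y = y0, Z = z0]
    den <- mean(exp(1i * zeta * X2))             # Ehat[e^{i zeta X2}]
    # exponent: integral over [0, zeta] of i*Ehat[X1 e^{i xi X2}]/Ehat[e^{i xi X2}] (trapezoid rule)
    xi  <- seq(0, zeta, length.out = 50)
    g   <- sapply(xi, function(u) 1i * mean(X1 * exp(1i * u * X2)) / mean(exp(1i * u * X2)))
    int <- sum((g[-1] + g[-length(g)]) / 2 * diff(xi))
    (num / den) * exp(int)
  })
}

fhat_xyz <- function(x_grid, y0, z0, Y, Z, X1, X2, hx, hy, hz) {
  zeta <- seq(-1 / hx, 1 / hx, length.out = 201) # frequencies kept by the flat-top kernel
  phi  <- phi_hat(zeta, y0, z0, Y, Z, X1, X2, hy, hz)
  sapply(x_grid, function(x)
    Re(sum(phi * exp(-1i * zeta * x)) * diff(zeta)[1]) / (2 * pi))
}
```

In finite samples such an estimate may take small negative values and need not integrate exactly to one, so in practice one may wish to truncate the negative values and renormalize before using it as a weight.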
3.2 Estimation of the structural parameters
This section describes the general estimator for QR models with ME. The estimator can be
obtained in two steps. Given the identification condition in equation (5) and the estimator of
the density function described in the previous section, we are able to estimate the structural
parameters of interest. We propose a Z-estimator that involves two-step estimation. We
estimate the parameters of interest, θ0 = (β0 , δ0 ) for a selected τ of interest, from the
following two steps:
Step 1. Estimate $\hat f(\tilde x_j\mid Y_i,Z_i;h)$, the weights appearing in equation (5), for each observation $i$ and each grid point $j$, where $j\in J\equiv\{1,2,\ldots,m\}$ and $m$ is the number of grid points used to approximate the numerical integral. The choices of kernels and bandwidths are provided in Definition 2.3 above.
The integrals in equation (11) are performed using the fast Fourier transform (FFT) algorithm. Good performance of the algorithm is guaranteed by the smoothness of the
characteristic function $\phi(\cdot)$ and the finiteness of the moments.
Step 2. Then, to compute equation (5) in practice, we have to make a numerical
approximation to the integral over $x$. We do this by translating the problem into a weighted
quantile regression problem. Let $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_m)$ be a fine grid of possible $x$ values, akin
to a set of abscissas in Gaussian quadrature. For each $\tau$, $\hat\theta(\tau) = (\hat\beta(\tau),\hat\delta(\tau))$ can be computed
by solving
$$\sum_{i=1}^n\sum_{j=1}^m \psi_\tau(Y_i - \tilde x_j^\top\beta - Z_i^\top\delta)[\tilde x_j\;Z_i]\cdot \hat f(\tilde x_j\mid Y_i,Z_i;h) = 0, \qquad (12)$$
where $\hat f(\tilde x_j\mid Y_i,Z_i;h)$ is obtained from Step 1. The weighted quantile regression of $Y_i$ on
$\tilde x_j$ and $Z_i$ with corresponding weights $\hat f(\tilde x_j\mid Y_i,Z_i;h)$ can be readily computed using the
function "rq" in the R package quantreg, as illustrated in the sketch below.
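The following is a minimal sketch of Step 2 in R, reusing the illustrative fhat_xyz() sketched in Section 3.1; the wrapper name qrme_fit and its arguments are our own notation and not part of quantreg.

```r
# Step 2 as a weighted quantile regression: expand the data over the grid xtilde and pass
# the first-step densities as weights to quantreg::rq(), whose first-order condition is (12)
# (up to the intercept added by the formula; use y ~ x + z - 1 to drop it if Z has a constant).
library(quantreg)

qrme_fit <- function(Y, Z, X1, X2, tau, xtilde, hx, hy, hz) {
  n <- length(Y); m <- length(xtilde)
  # n x m matrix of weights fhat(xtilde_j | Y_i, Z_i)
  W <- t(sapply(seq_len(n), function(i)
    fhat_xyz(xtilde, Y[i], Z[i], Y, Z, X1, X2, hx, hy, hz)))
  # pseudo-data: observation i is paired with every grid point xtilde_j
  dat <- data.frame(y = rep(Y, each = m),
                    x = rep(xtilde, times = n),
                    z = rep(Z, each = m),
                    w = pmax(as.vector(t(W)), 0))   # trim small negative weights
  rq(y ~ x + z, tau = tau, data = dat, weights = w)
}

# Example call (illustrative grid and bandwidths):
# fit <- qrme_fit(Y, Z, X1, X2, tau = 0.5, xtilde = seq(-3, 3, length.out = 25),
#                 hx = 0.2, hy = 0.5, hz = 0.5)
# coef(fit)
```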
The asymptotic properties of the estimator given in equation (11), and also of $(\hat\beta(\tau),\hat\delta(\tau))$ in equation (12), are established in Section 4 below.
4 Asymptotic properties
This section investigates the large sample properties of the proposed two-step estimator.
While these methods seem similar to the ones discussed by Wei and Carroll (2009), the
novel estimation of the conditional density function raises some new issues for the asymptotic
analysis of the estimator. First, we establish the asymptotic results for the estimator of the
conditional density function given in (11). Second, we establish consistency and asymptotic
normality of the two-step estimator in (12).
4.1 Asymptotic properties of the density estimator
In this subsection we establish the asymptotic properties of the density function estimator in
equation (11). Let $\mu(\zeta)\equiv E[e^{i\zeta X}]$, $\omega_1(\zeta)\equiv E[e^{i\zeta X_2}]$, and $\chi(\zeta,y,z)\equiv \int e^{i\zeta x_2} f_{X_2YZ}(x_2,y,z)\,dx_2$.
We impose the following assumptions.
Assumption B.I: (i) There exist constants $C_1 > 0$ and $\gamma_\mu \ge 0$ such that
$$|D_\zeta \ln\mu(\zeta)| = \left|\frac{D_\zeta\,\mu(\zeta)}{\mu(\zeta)}\right| \le C_1(1+|\zeta|)^{\gamma_\mu};$$
(ii) There exist constants $C_\phi > 0$, $\alpha_\phi \le 0$, $\nu_\phi \ge 0$, and $\gamma_\phi\in\mathbb{R}$ such that $\nu_\phi\gamma_\phi \ge 0$ and
$$\sup_{(y,z)\in\mathrm{supp}(Y,Z)}|\phi(\zeta,y,z)| \le C_\phi(1+|\zeta|)^{\gamma_\phi}\exp(\alpha_\phi|\zeta|^{\nu_\phi}),$$
and if $\alpha_\phi = 0$, then $\gamma_\phi < -1$;
(iii) There exist constants $C_\omega > 0$, $\alpha_\omega \le 0$, $\nu_\omega \ge \nu_\phi \ge 0$, and $\gamma_\omega\in\mathbb{R}$ such that $\nu_\omega\gamma_\omega \ge 0$ and
$$\min\left\{\inf_{(y,z)\in\mathrm{supp}(Y,Z)}|\chi(\zeta,y,z)|,\; |\omega_1(\zeta)|\right\} \ge C_\omega(1+|\zeta|)^{\gamma_\omega}\exp(\alpha_\omega|\zeta|^{\nu_\omega}).$$
Assumption B.II: (i) $E[|X_1|^2] < \infty$; (ii) $E[|X_1||X_2|] < \infty$; (iii) $E[|X_2|] < \infty$.
Assumption B.III: $\sup_{(y,z)\in\mathrm{supp}(Y,Z)}|\hat f(y,z) - f(y,z)| = O_p\!\left(\dfrac{(\ln n)^{1/2}}{(nh^yh^z)^{1/2}} + \sum_{s=y,z}(h^s)^2\right)$.
These assumptions are standard for nonparametric deconvolution estimators because
their rates of convergence will depend on the tails of the Fourier transforms (see, e.g., Fan,
1991b; Fan and Truong, 1993). The literature commonly adopts two types of smoothness
assumptions: ordinary and super smoothness. Ordinary smoothness admits a Fourier transform whose tail decays to zero at a geometric rate $|\zeta|^\gamma$, $\gamma < 0$, whereas super smoothness
admits a Fourier transform whose tail decays to zero at an exponential rate $\exp(\alpha|\zeta|^\gamma)$,
$\alpha < 0$, $\gamma > 0$.² Assumption B.I simultaneously imposes ordinary and super smoothness
conditions.³ Assumption B.II imposes mild moment restrictions required for the consistency
results. Assumption B.III imposes a standard condition on the nonparametric estimator of the
joint density $f(y,z)$.
The next result establishes the asymptotic properties of the density function estimator.
Theorem 3 Let Assumptions A.I–A.IV and B.I–B.III hold. Then for $(x,y,z)\in\mathrm{supp}(X,Y,Z)$
and $h > 0$ satisfying $\max\{(h_n^y)^{-1},(h_n^z)^{-1}\} = O(n^\eta)$ and
$$(h_n^x)^{-1} = O\big((\ln n)^{1/\nu_\omega - \eta}\big) \text{ if } \nu_\omega \neq 0, \qquad (h_n^x)^{-1} = O\big(n^{(1-20\eta)/2(\gamma_\mu-\gamma_\omega)}\big) \text{ if } \nu_\omega = 0,$$
for some $\eta > 0$, we have
$$\begin{aligned}
\sup_{(x,y,z)\in\mathrm{supp}(X,Y,Z)}|\hat f(x\mid y,z;h) - f(x\mid y,z)|
&= O\Big(\big((h^x)^{-1}\big)^{\gamma_B}\exp\big(\alpha_B\,((h^x)^{-1})^{\nu_B}\big)\Big)\\
&\quad + O_p\Big(n^{-1/2}\max\big\{\big(1+(h^x)^{-1}\big)^{\delta_L},\,(h^yh^z)^{-1}\big\}\big(1+(h^x)^{-1}\big)^{\gamma_L}\exp\big(\alpha_L\,((h^x)^{-1})^{\nu_L}\big)\Big),
\end{aligned}$$
with $\alpha_B \equiv \alpha_\phi\bar\xi^{\nu_\phi}$, $\nu_B \equiv \nu_\phi$, $\gamma_B \equiv \gamma_\phi + 1$, $\alpha_L \equiv \alpha_\phi 1\{\nu_\phi=\nu_\omega\} - \alpha_\omega$, $\nu_L \equiv \nu_\omega$, $\gamma_L \equiv 1 + \gamma_\phi - \gamma_\omega$,
and $\delta_L \equiv 1 + \gamma_\mu$.
Proof. See Appendix.
The theorem above establishes consistency and a uniform convergence rate of the proposed estimator. The conditions on the bandwidths are imposed to guarantee that the asymptotic
behavior of the linear approximation of the expression $\hat f(x\mid y,z;h) - f(x\mid y,z)$ is
essentially determined by a variance term, since the nonlinear remainder term is asymptotically negligible. The result also shows that the convergence rate depends on the tail behavior of
the associated quantities. For instance, when $\chi(\zeta,y,z)$ and $\omega_1(\zeta)$ in Assumption B.I are ordinarily smooth (i.e., $\nu_\omega = 0$), one can choose a small bandwidth so that the resulting convergence
rate of the estimator is faster than when they are super smooth.

² Typical examples of ordinarily smooth functions are the uniform, gamma, symmetric gamma, Laplace (or double exponential) distributions, and their mixtures. The normal, Cauchy, and their mixtures are super smooth functions.
³ A term $\exp(\alpha_1|\zeta|^{\nu_1})$ is omitted in Assumption B.I (i) with merely a small loss of generality, since $\ln\mu(\zeta)$ is indeed a power of $\zeta$.
4.2 Asymptotic properties of the two-step estimator

In this subsection, we derive the asymptotic properties of the two-step estimator of the parameters of interest. We establish its consistency and asymptotic normality.
4.2.1 Consistency

Consistency is a desirable property for most estimators. We wish to establish consistency of
the estimator $\hat\theta = (\hat\beta,\hat\delta)$ defined in equation (12), where $\hat f$, given in (11), is an estimator of
$f_0 := f(x\mid y,z)$.
First, notice that from the estimating equation in (5) we have
$$\tilde Q_n(\beta,\delta,f) = \frac{1}{n}\sum_{i=1}^n\int \psi_\tau(Y_i - x^\top\beta - Z_i^\top\delta)(x\;Z_i)\cdot f(x\mid Y_i,Z_i)\,dx,$$
and its expectation is
$$\tilde Q(\beta,\delta,f) = E\left[\int \psi_\tau(Y_i - x^\top\beta - Z_i^\top\delta)(x\;Z_i)\cdot f(x\mid Y_i,Z_i)\,dx\right].$$
The estimator $\hat\theta = (\hat\beta,\hat\delta)$ is obtained by equating $\tilde Q_n(\beta,\delta,\hat f)$ to zero, where $\hat f$ is an estimator
of $f_0$. Note that $\tilde Q(\beta,\delta,f_0) = 0$ if and only if $(\beta^\top,\delta^\top)^\top = (\beta_0^\top,\delta_0^\top)^\top \in \Theta$.
Now we formally state the following sufficient conditions for the two-step estimator to be
consistent.
Assumption C.I: $\tilde Q_n(\hat\beta,\hat\delta,\hat f) = o_p(1)$.

Assumption C.II: $X\in\mathcal{X}$, a compact set in $\mathbb{R}^{d_x}$.

Assumption C.III: $E[|Z|] < \infty$.
Condition C.I defines the estimating equation (Z-estimator). Pakes and Pollard (1989)
and Chen, Linton, and Van Keilegom (2003) have similar assumptions. For a detailed discussion of this type of identification assumption, see, e.g., He and Shao (1996, 2000). C.II
imposes compactness for the true covariate. A similar assumption in the QR literature
appears in Chernozhukov and Hansen (2006). C.III only requires the first moment of the
well-measured regressor to be finite. A uniform law of large numbers for the first-step estimator $\hat f(x\mid y,z)$ is standard in the two-step estimation literature; see, e.g., Newey and McFadden
(1994). We note that this is straightforwardly satisfied by Theorem 3.
The following theorem derives consistency of the proposed two-step estimator, $\hat\theta = (\hat\beta,\hat\delta)$.

Theorem 4 Under Assumptions C.I–C.III and the conditions of Theorem 3, as $n\to\infty$,
$$\hat\theta \overset{p}{\longrightarrow} \theta_0.$$
Proof. See Appendix.
4.2.2 Weak convergence
Now we derive the limiting distribution of the two-step estimator in (12). We impose the
following assumptions for weak convergence.
Assumption G.I: $\tilde Q_n(\hat\beta,\hat\delta,\hat f) = o_p(n^{-1/2})$.

Assumption G.II: The conditional density $g_Y(y\mid X=x,Z=z)$ is bounded and uniformly continuous in $y$, uniformly in $x$ and $z$ over the support of $(Y,X,Z)$.

Assumption G.III: Let $\Gamma_1 := E\big[g_Y(X^\top\beta_0 + Z^\top\delta_0\mid X,Z)(X^\top,Z^\top)^\top(X^\top,Z^\top)\big]$ be positive definite and $V_n := \mathrm{var}[Q_n(\theta_0)]$. There exists a nonnegative definite matrix $V$ such that $V_n \to V$ as $n\to\infty$.

Assumption G.IV: $\|\hat f - f_0\| = o_p(n^{-1/4})$.

Assumption G.V: $Z\in\mathcal{Z}$ is compact.

Assumption G.VI: For some $\epsilon > 0$, $\mathcal{F} = \{f : \|f - f_0\| \le \epsilon\}$ is uniformly bounded and Donsker.
Condition G.I defines the estimator. It is slightly stronger than condition C.I but still
allows the right-hand side to be only approximately zero. This type of op (n−1/2 ) condition is
also assumed in Theorem 3.3 of Pakes and Pollard (1989) and Theorem 2 of Chen, Linton,
and Van Keilegom (2003). Conditions G.II and G.III are standard in the QR literature;
see, e.g., Koenker (2005). Condition G.IV imposes that the estimator of the nuisance
parameter converges at a rate faster than n−1/4 . A similar condition appears in condition
(2.4) in Theorem 2 of Chen, Linton, and Van Keilegom (2003). Assumption G.V strengthens
C.III and imposes compactness on the well-measured regressor. Finally, condition G.VI
is similar to Chen, Linton, and Van Keilegom (2003) and Galvao and Wang (2015), and
guarantees that f is asymptotically well behaved. This condition is related to the stochastic
equicontinuity of the moment function associated with $\tilde Q_n$. It allows for many nonparametric
estimators of the conditional density f0 . Primitive conditions can be obtained through the
derivation of asymptotic normality of fb, which requires finding a lower bound for the variance
of the estimator. In fact, an exact asymptotic rate of convergence can be obtained from the
assumption that the limiting behavior of the relevant Fourier transforms has a power law or
an exponential form; see e.g., Fan (1991a) for the kernel deconvolution estimator.
We note that Assumption G.IV is verifiable for particular examples through Theorem 3.
As shown in Theorem 3, the convergence rate is controlled by the smoothness of quantities
such as $\phi(\zeta,y,z)$, $\chi(\zeta,y,z)$, and $\omega_1(\zeta)$. Recall that $\phi(\zeta,y,z)$ is the conditional characteristic function of
X given $Y=y$ and $Z=z$, whose inversion yields $f(x\mid Y=y,Z=z)$, the parameter of interest in the
first step; $\chi(\zeta,y,z)$ is the conditional characteristic function of $X_2$ given $Y=y$ and $Z=z$,
weighted by the joint density of $(Y,Z)$ (i.e., $E[e^{i\zeta X_2}\mid Y=y,Z=z]f(y,z)$); and $\omega_1(\zeta)$ is the
characteristic function of $X_2$. Since $\omega_1(\zeta) = E[e^{i\zeta X_2}] = E[e^{i\zeta X}]E[e^{i\zeta U_2}]$, the smoothness of
ω1 (ζ) is determined by X and U2 . Therefore, the rate of convergence depends on the possible
combinations of the smoothness of various quantities. For instance, if φ(ζ, y, z) is ordinarily
smooth and if χ(ζ, y, z) and ω1 (ζ) are super smooth, a convergence rate of the form (ln n)−υ
for some υ > 0 is achieved. This case illustrates a very slow rate of convergence. On the other
hand, a faster convergence rate, n−υ for some υ > 0, which satisfies Assumption G.IV, can
be achieved when φ(ζ, y, z) is also super smooth. In addition, if all three quantities, φ(ζ, y, z),
χ(ζ, y, z), and ω1 (ζ), are ordinarily smooth, the slow convergence problem is easily avoided.
Weak convergence of the two-step estimator, $\hat\theta = (\hat\beta,\hat\delta)$, is established in the following result.

Theorem 5 Under Assumptions C.I–C.III, G.I–G.VI, and the conditions of Theorem 3, as $n\to\infty$,
$$\sqrt{n}\,(\hat\theta - \theta_0) \rightsquigarrow N(0,\Lambda)$$
for some positive definite matrix $\Lambda = \Gamma_1^{-1} V \Gamma_1^{-1}$.
Proof. See Appendix.
5 Inference
In this section, we turn our attention to inference in the quantile regression (QR) with measurement errors (ME) model. Important questions posed in the econometric and statistical
literatures concern the nature of the impact of a policy intervention or treatment on the
outcome distributions of interest; for example, whether a policy exerts a significant effect, a
constant versus heterogeneous effect, or a non-decreasing effect. It is possible to formulate
a wide variety of tests using variants of the proposed method, from simple tests on a single
quantile regression coefficient to joint tests involving many covariates and distinct quantiles
simultaneously. We suggest a bootstrap-based inference procedure to test general linear
hypotheses.
5.1 Test statistic
General hypotheses on the vector $\theta(\tau)$ can be accommodated by standard tests. The proposed statistic and the associated limiting theory provide a natural foundation for the hypothesis $R\theta(\tau) = r$ when $r$ is known. The following are examples of hypotheses that may be considered in this framework.

Example 1 (No effect of the mismeasured variable). For a given $\tau$, if the mismeasured variable has no effect in the model, then the null is $H_0: \beta(\tau) = 0$. Thus, $\theta(\tau) = (\beta(\tau),\delta(\tau))^\top$, $R = [1, 0]$ and $r = 0$.

Example 2 (Location shifts). The hypotheses of location shifts for $\beta(\tau)$ and $\delta(\tau)$ can be accommodated in the model. For the first case, $H_0: \beta(\tau) = \beta$, so $\theta(\tau) = (\beta(\tau),\delta(\tau))^\top$, $R = [1, 0]$ and $r = \beta$. For the latter case, $H_0: \delta(\tau) = \delta$, so that $R = [0, 1]$ and $r = \delta$.
More general hypotheses are also easily accommodated by the linear hypothesis framework. Let $\nu = (\theta(\tau_1)^\top,\ldots,\theta(\tau_m)^\top)^\top$ and define the null hypothesis as $H_0: R\nu = r$. This formulation accommodates a wide variety of testing situations, from a simple test on a single QR coefficient to joint tests involving several covariates and distinct quantiles. Thus, for instance, we might test for the equality of several slope coefficients across several quantiles.

Example 3 (Same mismeasured effect for two distinct quantiles). If the effects are the same for two given distinct quantiles in the model, then under $H_0$, $\beta(\tau_1) = \beta(\tau_2)$. Thus, $\nu = (\theta(\tau_1)^\top,\theta(\tau_2)^\top)^\top = (\beta(\tau_1),\delta(\tau_1),\beta(\tau_2),\delta(\tau_2))^\top$, $R = [1, 0, -1, 0]$ and $r = 0$.
Consider the following general null hypothesis for a given $\tau$ of interest:
$$H_0: R\theta(\tau) - r = 0,$$
where $R$ is a full-rank matrix imposing $q$ restrictions on the parameters, and $r$ is assumed to be a known column vector of $q$ elements. Practical implementation of testing procedures can be carried out based on the following statistic:
$$W_n(\tau) = R\hat\theta(\tau) - r. \qquad (13)$$
From Theorem 5, at a given $\tau$ and under the null hypothesis, it follows that $\sqrt{n}\,(R\hat\theta(\tau) - r) \rightsquigarrow N(0, R\Lambda R^\top)$. If we are interested in testing $H_0$, a Chi-square test could be conducted based
on the statistic in equation (13). However, to carry out practical inference procedures, even
for a fixed quantile of interest, constructing a Wald statistic would require one to first estimate
$\Lambda$ consistently, and consequently nuisance parameters which depend on both the unknown $\theta_0$
and $f_0$ in a complicated way. The estimation of $\Lambda$ is potentially difficult because it contains
additional terms arising from the effect of $\theta$ on the objective function indirectly through $f_0$. An
alternative method is to use the statistic $W_n$ directly and the bootstrap to compute critical
values and also form confidence regions. Therefore, to make practical inference feasible we suggest
the use of bootstrap techniques to approximate the limiting distribution.
5.2 Implementation of testing procedures
Practical implementation of the proposed tests is simple. To test $H_0$ with known $r$, one needs
to compute the test statistic $W_n(\tau)$ for a given $\tau$ of interest. The steps for implementing
the tests are as follows:
First, the estimates of $\theta(\tau)$ are computed by solving the problem in equation (12). Second,
$W_n(\tau)$ is calculated by centering $\hat\theta(\tau)$ at $r$. Third, after obtaining the test statistic, it is
necessary to compute the critical values. We propose the following scheme. Take $B$ as a
large integer. For each $b = 1,\ldots,B$:

(i) Obtain the resampled data $\{(Y_i^b, X_{1i}^b, X_{2i}^b, Z_i^b),\ i = 1,\ldots,n\}$.

(ii) Estimate $\hat\theta^b(\tau)$ and set $W_n^b(\tau) := R(\hat\theta^b(\tau) - \hat\theta(\tau))$.

(iii) Go back to step (i) and repeat the procedure $B$ times.

Let $\hat c_{1-\alpha}^B$ denote the empirical $(1-\alpha)$-quantile of the simulated sample $\{W_n^1,\ldots,W_n^B\}$, where $\alpha\in(0,1)$ is the nominal size. We reject the null hypothesis if $W_n$ is larger than $\hat c_{1-\alpha}^B$. Confidence intervals for the parameters of interest can be easily constructed by inverting the tests described above.
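For concreteness, the following is a minimal sketch in R of this scheme for a single (scalar) restriction, reusing the illustrative qrme_fit() wrapper from Section 3; R, r, the number of replications, and the bandwidth arguments are user-supplied and illustrative.

```r
# Bootstrap critical values for H0: R %*% theta(tau) = r (scalar restriction, q = 1),
# following steps (i)-(iii) above; extra arguments (...) are passed on to qrme_fit().
# R must conform to the full coefficient vector returned by qrme_fit (intercept, x, z).
boot_test <- function(Y, Z, X1, X2, tau, R, r, B = 200, alpha = 0.05, ...) {
  theta_hat <- coef(qrme_fit(Y, Z, X1, X2, tau, ...))
  Wn        <- drop(R %*% theta_hat - r)              # test statistic (13)
  Wb        <- replicate(B, {
    idx <- sample(length(Y), replace = TRUE)          # resample (Y, X1, X2, Z) jointly
    drop(R %*% coef(qrme_fit(Y[idx], Z[idx], X1[idx], X2[idx], tau, ...)) - R %*% theta_hat)
  })
  crit <- quantile(Wb, 1 - alpha)                     # empirical (1 - alpha)-quantile
  list(Wn = Wn, crit = crit, reject = Wn > crit)      # a two-sided version would use |Wn|
}
```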
We provide a formal justification of the simulation method. Consider the following conditions.
Assumption G.IB: For any $\delta_n \downarrow 0$, $\sup_{\|f-f_0\|\le\delta_n}\big\|\frac{1}{n}\sum_{i=1}^{n} f^*(\cdot) - E[f_0(\cdot)]\big\| = o_{p^*}(1/\sqrt{n})$.

Assumption G.IIB: $\sqrt{n}\,\frac{1}{n}\sum_{i=1}^{n}\big[(\tau - 1\{Y_i < q_{\tau 0}\})(\hat f^*(\cdot) - \hat f(\cdot))\big]$ converges weakly to a tight random element $\mathbb{G}$ in $L$ in $P$-probability.
Lemma 1 Under Assumptions C.I–C.III, G.IB–G.IIB, and G.VI with "in probability" replaced by "almost surely", the bootstrap estimator of $\theta_0$ is $\sqrt{n}$-consistent and $\sqrt{n}(\hat\theta^* - \hat\theta) \rightsquigarrow N(0,\Lambda)$ in $P^*$-probability.
Proof. See Appendix.
Lemma 1 establishes the consistency of the bootstrap procedure. It is important to
highlight the connection between this result and the previous section. In fact, Lemma 1 shows
that the limiting distribution of the bootstrap estimator is the same as that of Theorem 5,
and hence the above resample scheme is able to mimic the asymptotic distribution of interest.
Thus, computation of critical values and practical inference are feasible.
6 Monte Carlo simulations

6.1 Monte Carlo design
In this section, we describe the design of a small simulation experiment that has been conducted to assess the finite sample performance of the proposed two-step estimator discussed
in the previous sections. We consider the following model as a data generating process:
$$Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i,$$
where $\varepsilon \sim N(0, 0.25)$, and $\beta_1$ and $\beta_2$ are the parameters of interest.⁴ We set them as
$(\beta_1,\beta_2) = (0.5, -0.5)$. The true variable X is not observed by the researcher, and we use
additive forms of measurement errors (ME) to generate the mismeasured X as follows:
$$X_{1i} = X_i + U_{1i}, \qquad X_{2i} = X_i + U_{2i},$$
where we generate $X \sim N(0,1)$, and we use a Laplace distribution $L(0, 0.25)$ to generate both measurement errors, $U_1$ and $U_2$. We compute and report results for the
proposed QR estimator. For comparison, we compute the density $\hat f_{X|Y}$ using different procedures. First, we construct our proposed estimator to control for ME, using the variables
$(Y, X_1, X_2)$, where the density is estimated by the Fourier estimator. Second, we use the
variables $(Y, X)$ to construct an "infeasible" kernel estimator of $f_{X|Y}$ in the first step. Finally, the variables $(Y, X_1)$ are used for a "naive" kernel estimator of $f_{X|Y}$, which still suffers
from ME. For all estimators, we consider a fourth-order Gaussian kernel. We approximate the
inner summation in equation (12) using Gauss-Hermite quadrature, which is useful for the
integral over an unbounded support. We perform 1000 simulations with $n = 500$ and $n = 1000$. We scan a
set of bandwidths for X and Y in order to find empirical optimal bandwidths in terms of
minimizing mean squared error.

⁴ For simplicity, the perfectly-observed covariate Z is absent here.
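A minimal sketch in R of this data generating process follows; we read $N(0, 0.25)$ as a variance of 0.25 and $L(0, 0.25)$ as a Laplace distribution with scale 0.25, and the seed is illustrative.

```r
# Data generating process of Section 6.1: true X is latent, only the two noisy
# measurements X1 and X2 are available to the researcher.
set.seed(123)
n    <- 500
beta <- c(0.5, -0.5)                                  # (beta1, beta2)
X    <- rnorm(n)                                      # true regressor, N(0, 1)
eps  <- rnorm(n, sd = sqrt(0.25))                     # N(0, 0.25)
Y    <- beta[1] + beta[2] * X + eps

rlaplace <- function(n, b) rexp(n, rate = 1 / b) - rexp(n, rate = 1 / b)  # Laplace(0, b)
X1 <- X + rlaplace(n, 0.25)                           # first noisy measurement
X2 <- X + rlaplace(n, 0.25)                           # second noisy measurement

# Naive median regression of Y on X1, ignoring the ME (slope attenuated toward zero):
library(quantreg)
coef(rq(Y ~ X1, tau = 0.5))
```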
6.2 Monte Carlo results
We report results for the following statistics of the coefficient $\beta_2$: bias (B), standard deviation (SD), and mean squared error (MSE). First of all, in order to illustrate the problem of ME
in practice, we consider a model estimation where the researcher ignores the ME problem
and performs a parametric median regression of Y on $X_1$ without correcting for the ME in
X. This simple regression produces a bias of 0.1686, a standard error of 0.02655, and an
MSE of 0.02586. These results highlight the importance of correcting for the ME problem.
Now we discuss and present the results for the nonparametric estimators with and without correction for ME. Tables 1–3 report the finite-sample performance of three different two-step estimators
at the median: (i) our proposed estimator (Fourier estimator); (ii) the infeasible kernel estimator; (iii) the naive kernel estimator. These results are for $n = 500$; the results for $n = 1000$
are similar. At the bottom of each table, the B, SD, and MSE from the optimal bandwidth are
reported. In Table 4 we vary the quantiles and present results for the different estimators
across different deciles with $n = 1000$.
Tables 1 - 3 Simulation Results
[ABOUT HERE]
Table 1 shows that the proposed estimator is effective in reducing the bias when true X
is measured with errors and repeated measures of the mismeasured covariate are available.
These results are comparable to the infeasible kernel estimator in Table 2. On the other
hand, the results in Table 3 from the naive kernel estimator ignoring ME in X show much
larger bias over all selected bandwidths. Therefore, our estimator outperforms the naive
kernel estimator in terms of both bias and MSE. The minimum MSE for our proposed
method is 0.00674 while the minimum MSE from the naive kernel estimator is 0.01008. This
result confirms that the methods proposed in this paper are beneficial in finite samples when
repeated measures of the mismeasured regressor are available to the researcher.
Table 4 reports the finite-sample performance of the three estimators over various quantiles with
$n = 1000$. For simplicity, we use the optimal bandwidths obtained from the simulation
results above. The results confirm that our proposed estimator performs well over different
quantile levels.
Table 4 - Simulation Results
[ABOUT HERE]
7 Empirical application
This section illustrates the usefulness of the new proposed methods in an empirical example.
One of the most commonly studied topics in labor economics is the impact of education
on earnings. The problem of measuring returns to education is an important research area
in economics with a very large literature on the subject. For examples of comprehensive
studies, see, e.g., Card (1995), Card (1999), and Harmon and Oosterbeek (2000). The large
volume of research in this area has been explained by both the interest in the causal effect
of education on earnings and the inherent difficulty in measuring this effect. The difficulty
arises for several reasons. The classical one is the fact that unobserved factors, such as
ability, are probably related to both educational level and earnings. In a mean regression
framework, if ability is positively correlated with both education and earnings, ordinary
least squares (OLS) will overestimate the true causal impact of education on earnings. Finding
strong instrumental variables (IV) that are not correlated with unobserved ability is usually
a difficult task. Nevertheless, even when available, IV estimators do not necessarily produce
estimated coefficients of education that are significantly lower than those obtained by OLS.
A potential reason for these findings in the returns to education literature is that IVs
are used for two simultaneous purposes: to correct for both an omitted variable bias (since
ability is unobservable) and measurement errors (ME) in reported schooling years. Education measures are frequently measured with error, particularly if the information is collected
through one-time retrospective surveys, which are notoriously susceptible to recall errors
(see, e.g., Ashenfelter and Krueger (1994), Kane, Rouse, and Staiger (1999), Bound, Brown,
and Mathiowetz (1999), and Black, Sanders, and Taylor (2003)). It is also known that ME
in a simple framework can provoke attenuation bias; thus OLS may not necessarily be overestimating the true returns to education if ME is a quantitatively more important problem
than omitting a covariate. Thus, it became important in that literature to understand the
isolated role of ME in the bias of the estimated coefficients.
We use quantile regression (QR) methods to study returns to education. We accommodate possible heterogeneity in the returns to education across the earnings distribution by
applying QR. Indeed, this heterogeneity is not revealed by conventional least squares or two-stage least squares, while the QR approach constitutes a suitable way to investigate whether
the returns to education differ along the conditional wage distribution. In this paper, we
primarily focus on controlling for ME in education, even though the omitted variable bias
may be an important issue. To the best of our knowledge, there is no published work which
effectively controls for both omitted variables and ME in QR.⁵ Careful research is required
to control for both sources of endogeneity of education in the QR framework. We leave this
topic for future research.
Our QR method proposes a solution to the ME problem in education by using repeated
measures of the education variable. The literature on the returns to education has used
useful information on repeated measurements of education where one twin is asked to report
on both his/her own schooling and the schooling of the other twin (Ashenfelter and Krueger
(1994) and Bonjour et al. (2003)). This allows one to treat the information reported by the
other twin as a repeated measure of the true education. We therefore apply our method to a
data set on female monozygotic twins from the Twins Research Unit, St. Thomas’ Hospital,
in the United Kingdom. Our data are taken from Bonjour et al. (2003) and Amin (2011).
The sample consists of 428 individuals comprising 214 identical twin pairs with complete
wage, age, and schooling information. The summary statistics are described in Table 5.
Table 5 - Summary Statistics
[ABOUT HERE]
The proposed QR estimator is designed to correct for the ME problem while exploring
heterogeneous covariate effects, and therefore provides a flexible method for the practical
analysis of returns to education. Thus, our objective is to estimate the following conditional
quantile function:
$$Q_{W_i}(\tau\mid edu_i, Z_i) = \beta(\tau)\,edu_i + Z_i^\top\delta(\tau), \qquad (14)$$
where Wi is the earnings of individual i, edui is the true number of years of education
which is latent, and Zi is a vector of exogenous covariates. The parameters of interest are
(β(τ ), δ(τ )). As mentioned earlier, if edui is subject to ME, and only edu1i and edu2i are
observed, standard QR estimates of β(τ ) using edu1i or edu2i will be inconsistent. For the
practical implementation of the procedures, the dependent variable is the log of wage (Y ).
The independent variable subject to ME is education and the observed repeated measures of true education are twin 1's education ($X_1$) and twin 2's report of twin 1's education ($X_2$). These Y, $X_1$, and $X_2$ are standardized to have mean zero and standard deviation one, for the purpose of bandwidth selection. We use age and squared age as correctly-observed exogenous covariates (Z).

⁵ Amin (2011) uses the average education of the twins as an additional covariate to proxy for omitted ability bias and uses the co-twin's estimate of education as an instrument to control for ME in self-reported education. However, this procedure generates an issue of two mismeasured covariates, which require two valid instruments. Amin (2011) instruments both mismeasured covariates with reported education variables. However, there will be ME in those instruments, which makes the IV approach in QR invalid.
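For concreteness, the following is a sketch of the variable construction just described; twins, wage, edu1, edu2, and age are placeholder column names for the data set, not the original code.

```r
# Variable construction for the application: standardized log wage and the two education
# reports, plus the correctly observed covariates. Column names are placeholders.
Y  <- as.numeric(scale(log(twins$wage)))           # standardized log wage
X1 <- as.numeric(scale(twins$edu1))                # twin 1's own report of her schooling
X2 <- as.numeric(scale(twins$edu2))                # twin 2's report of twin 1's schooling
Z  <- cbind(age = twins$age, age2 = twins$age^2)   # correctly-observed covariates
```

These variables then enter Steps 1–2 of Section 3, with $(X_1, X_2)$ as the repeated measures of schooling and Z treated as correctly observed; the single-covariate sketches given earlier would need to be extended to handle the two columns of Z.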
Clearly, the model in equation (14) is very simple: ability has a monotonically positive
or negative impact on education return. However, as emphasized by Arias, Hallock, and
Sosa-Escudero (2001), QR provides a more flexible approach to distinguishing the effect of
education on different percentiles of the conditional earning distribution, being consistent
with a non-trivial and, in fact, unknown interaction between education and ability.
We compare the estimates using our proposed methods with those from the existing literature, in particular the results presented in Amin (2011) for QR and IV-QR. Amin (2011)
presents results for the parameter of interest using the two-stage QR estimator of Arias,
Hallock, and Sosa-Escudero (2001) and Powell (1983), where the fitted value of education is estimated in the first stage and a QR of the log of wage on the fitted value of education follows in
the second stage. However, for comparison purposes, we report estimates using the standard
IV-QR proposed by Chernozhukov and Hansen (2006). For this, we use the variable edu2
as an instrument for education edu1 . The IV strategy is based on the assumption that the
co-twin’s education is strongly related to the other’s report of the co-twin’s education (i.e.,
IV) but the IV is independent of unobservable factors of earnings as well as measurement
errors (e.g., Chernozhukov and Hansen (2005)). We conjecture that the IV approach delivers
different estimates than our proposed ME estimator since they rely on different sets of conditions. Our method is particularly useful for this data set, where it is unlikely that the IV is
independent of the regression error, which contains the ME on self-reported education, since the
IV is also mismeasured.⁶
Our results for the estimates of the returns to education coefficient are reported in Figures 1–4. The figures present results for the coefficients and confidence bands, for a range of
quantiles, for QR, IV-QR, and QRME, respectively. The shaded region in each panel represents the 95% confidence interval. In addition, the estimates for simple OLS and the IV-OLS
appear in the respective figures, with dashed red lines for confidence bounds. In Figure 1
we report standard QR and OLS estimates. The estimation strategy follows Koenker and Bassett (1978) for the usual QR method.

⁶ We note that the independence condition implies independence between the ME on the co-twin's education and the other's report of the co-twin's education. However, our approach requires a weaker assumption of conditional mean zero, as in Assumption A.I (i).
[Figure 1: Returns of Education. QR and OLS (coefficients plotted against quantiles)]
Bassett (1978) for the usual QR method. Figure 2 uses the instrumental variables (IV-QR)
estimator of Chernozhukov and Hansen (2006, 2008). For completeness, we also provide
results for the corresponding IV-OLS estimates. We use the IV as described above. Figure
3 displays the results after correcting for ME using our proposed estimator. Finally, for
comparison, in Figure 4 we report estimates from a simple nonparametric kernel density estimation in which we do not correct for ME; namely, Y, X1, and Z are used directly. In both nonparametric estimations, most bandwidths are chosen based on Silverman's rule of thumb. For the frequency-domain bandwidth in our proposed estimator (hxn in equation (11)), we use an informal rule under which the estimates are not sensitive to marginal changes in the neighborhood of the optimal bandwidth.
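For concreteness, the following is a minimal sketch (not the paper's code) of these two bandwidth choices, assuming the standardized data are held in NumPy arrays; the function `fit_fn` is a hypothetical wrapper around a full QRME fit, and the grid of candidate values for hx is left to the user.

```python
import numpy as np

def silverman_bandwidth(v):
    """Silverman's rule-of-thumb bandwidth for a univariate kernel."""
    v = np.asarray(v, dtype=float)
    n = v.size
    iqr = np.subtract(*np.percentile(v, [75, 25]))
    sigma = min(v.std(ddof=1), iqr / 1.349)
    return 0.9 * sigma * n ** (-0.2)

def pick_hx(candidates, fit_fn, tol=0.01):
    """Informal sensitivity rule for the frequency-domain bandwidth: scan a
    grid of candidate values and keep the first one whose estimate barely
    moves when the bandwidth is perturbed to the next grid point.
    `fit_fn(hx)` is a hypothetical routine returning the QRME coefficient
    computed with bandwidth hx."""
    estimates = np.array([fit_fn(h) for h in candidates])
    gaps = np.abs(np.diff(estimates))
    stable = np.where(gaps < tol)[0]
    return candidates[stable[0]] if stable.size else candidates[-1]
```

In this illustration, hy and hz would be obtained by applying silverman_bandwidth to the standardized Y and Z, while hx is chosen from a user-supplied grid via pick_hx.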
We note that all QR estimates (QR, IV-QR, and QRME) show returns to schooling
varying over the earnings distribution. The variability of the effects is the most apparent
and dramatic in the QRME estimates. While the QR and IV-QR estimates are statistically
different from zero, they are all closely clustered around the corresponding OLS estimate. In
Figure 1, the OLS value is 0.336 while the QR estimate varies from 0.288 to 0.356. Figure 2
shows more variability across quantiles. The IV-OLS value is 0.382 while the IV-QR ranges
from 0.539 to 0.243. Therefore, relative to the IV-QR estimates, the QR estimates appear to be approximately constant. In addition, Figure 2 displays a decreasing pattern over the conditional distribution of wages; that is, the returns to education are smaller at the upper quantiles.

Figure 2: Returns of Education. IV-QR and IV-OLS. (Estimated coefficients plotted against quantiles.)
Figure 3 reports the QRME results after correcting for the ME problem. In general, the estimates are smaller than those from QR and IV-QR. The QRME also presents a different pattern relative to the other estimates. The shape of the estimated coefficients for returns to education is striking: the QRME estimates exhibit a distinct inverted U-shape, implying higher returns to schooling for those in the middle quantiles. In particular, the QRME results show positive and monotonically-increasing returns to schooling at low quantiles of the earnings distribution; the estimated coefficient increases from 0.15 to 0.23 up to approximately the 0.25 quantile. The returns start decreasing for higher quantiles. This implies that the relatively large wage gains from additional years of schooling accrue to those at the lower end of the earnings distribution, while for high quantiles the returns to education are smaller. The result can be, in part, associated with the fact that individuals at the top of the earnings distribution have high ability, so an additional year of education increases their wages very little.
Figure 3: Returns of Education. Measurement Error Correction QRME. Fourier density estimates. (Estimated coefficients plotted against quantiles.)

Figure 4: Returns of Education. Measurement Error Correction QRME. Kernel density estimates. (Estimated coefficients plotted against quantiles.)

Figure 4 reports estimation results from QR based on a simple kernel density estimator
which suffers from ME in X. The estimates are similar to the standard QR estimates in Figure 1. In general, they are larger than those in Figure 3, and they do not show a decreasing pattern of returns at the top quantiles.
All in all, the application illustrates that the QR method is an important tool for studying returns to schooling. It allows us to estimate returns to schooling for individuals at different quantiles of the conditional distribution of earnings, which might be viewed as reflecting the distribution of unobservable ability (Arias, Hallock, and Sosa-Escudero (2001)). Our empirical findings document that the larger returns occur at the lower to middle part of the distribution, providing empirical evidence that a potential economic redistributive policy should concentrate on education at that lower part of the distribution.
8 Conclusion

This paper develops estimation and inference for quantile regression models with measurement errors. We propose a semiparametric two-step estimator assuming the availability of repeated measures of the true covariate. The asymptotic properties of the estimator are established. We also develop statistical inference procedures and establish the validity of a bootstrap approach to implement the methods in practice. Monte Carlo simulations show that the proposed methods have good finite sample performance. We apply the methods to an empirical application on the returns to education. The results document important heterogeneity in the returns to education and illustrate that our methods are useful in empirical models where measurement error is an important issue.
A Mathematical Appendix
Proof of Theorem 1. Given Assumption A.III, we have
\begin{align}
\phi(\zeta, y, z) \equiv E[e^{i\zeta X} \mid Y = y, Z = z]
&= \int E[e^{i\zeta X} \mid Y = y, Z = z, X = x]\, f(x \mid y, z)\, dx \tag{15} \\
&= \int f(x \mid y, z)\, e^{i\zeta x}\, dx, \tag{16}
\end{align}
where the last expression is the Fourier transform of $f(x \mid y, z)$. Note that for $(x, y, z) \in \operatorname{supp}(X, Y, Z)$,
\[
\frac{1}{2\pi} \int \phi(\zeta, y, z) \exp(-i\zeta x)\, d\zeta
\]
is the inverse Fourier transform of $\phi(\zeta, y, z)$. Thus we have
\[
f(x \mid y, z) = \frac{1}{2\pi} \int \phi(\zeta, y, z) \exp(-i\zeta x)\, d\zeta.
\]
We now need to show that
\[
\phi(\zeta, Y, Z) = \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]} \exp\left( \int_0^{\zeta} \frac{i E[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\, d\xi \right).
\]
From Assumptions A.I--II,
\begin{align*}
D_\xi \ln\big(E[e^{i\xi X}]\big)
&= \frac{i E[X e^{i\xi X}]}{E[e^{i\xi X}]}
 = \frac{i E[X e^{i\xi X}]\, E[e^{i\xi U_2}]}{E[e^{i\xi X}]\, E[e^{i\xi U_2}]}
 = \frac{i E[X e^{i\xi(X+U_2)}]}{E[e^{i\xi(X+U_2)}]} \\
&= \frac{i E[X e^{i\xi(X+U_2)}] + i E\big[E(U_1 \mid X, U_2)\, e^{i\xi(X+U_2)}\big]}{E[e^{i\xi X_2}]} \\
&= \frac{i E[X e^{i\xi(X+U_2)}] + i E\big[E\big(U_1 e^{i\xi(X+U_2)} \mid X, U_2\big)\big]}{E[e^{i\xi X_2}]} \\
&= \frac{i E[X e^{i\xi(X+U_2)}] + i E[U_1 e^{i\xi(X+U_2)}]}{E[e^{i\xi X_2}]}
 = \frac{i E[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}.
\end{align*}
Therefore, for each real $\zeta$,
\begin{align*}
\phi(\zeta, Y, Z) \equiv E[e^{i\zeta X} \mid Y, Z]
&= \frac{E[e^{i\zeta X} \mid Y, Z]\, E[e^{i\zeta U_2}]}{E[e^{i\zeta X}]\, E[e^{i\zeta U_2}]}\, E[e^{i\zeta X}] \\
&= \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]}\, E[e^{i\zeta X}] \\
&= \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]} \exp\big( \ln(E[e^{i\zeta X}]) - \ln 1 \big) \\
&= \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]} \exp\left( \int_0^{\zeta} D_\xi \ln(E[e^{i\xi X}])\, d\xi \right) \\
&= \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]} \exp\left( \int_0^{\zeta} \frac{i E[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\, d\xi \right),
\end{align*}
where the third equality is obtained by $U_2 \perp (Y, X, Z)$.
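As an illustration of how the identity above can be used in estimation, the sketch below (a simplified stand-in, not the paper's implementation) computes a plug-in version of φ(ζ, y, z) from a sample (Y, X1, X2, Z); it assumes scalar Y and a single correctly-measured covariate Z, a Gaussian product kernel for the local weights, and a simple trapezoidal rule for the inner integral.

```python
import numpy as np

def trap(vals, grid):
    """Trapezoidal rule on a uniform grid (also handles descending grids)."""
    return np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0

def phi_hat(zeta, y, z, Y, X1, X2, Z, hy, hz, n_grid=200):
    """Plug-in estimate of phi(zeta, y, z): the ratio of a locally weighted
    empirical c.f. of X2 to its unconditional empirical c.f., multiplied by
    exp of the integral of i*E[X1 e^{i xi X2}] / E[e^{i xi X2}] over [0, zeta]."""
    def kh(u, h):                                   # Gaussian kernel weight
        return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    w = kh(Y - y, hy) * kh(Z - z, hz)               # local weights around (y, z)
    ecf_cond = np.sum(w * np.exp(1j * zeta * X2)) / np.sum(w)
    ecf_marg = np.mean(np.exp(1j * zeta * X2))
    xi = np.linspace(0.0, zeta, n_grid)             # grid for the inner integral
    integrand = np.array([1j * np.mean(X1 * np.exp(1j * t * X2))
                          / np.mean(np.exp(1j * t * X2)) for t in xi])
    return ecf_cond / ecf_marg * np.exp(trap(integrand, xi))
```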
Proof of Theorem 2. Note that the inverse Fourier transform of $\kappa(h^x\zeta)$ is $k(x/h^x)/h^x$, and the inverse Fourier transform of $E[e^{i\zeta X} \mid Y = y, Z = z]$ is $f(x \mid y, z)$ by equation (13). Also note that, by the convolution theorem, the inverse Fourier transform of the product of $\kappa(h^x\zeta)$ and $E[e^{i\zeta X} \mid Y = y, Z = z]$ is the convolution of the inverse Fourier transform of $\kappa(h^x\zeta)$ with the inverse Fourier transform of $E[e^{i\zeta X} \mid Y = y, Z = z]$. Because Assumptions A.II (iii)--A.IV guarantee the existence of $f(x \mid y, z; h^x)$, we conclude that
\begin{align*}
f(x \mid y, z; h^x)
&\equiv \int \frac{1}{h^x}\, k\!\left( \frac{x - \tilde{x}}{h^x} \right) f(\tilde{x} \mid y, z)\, d\tilde{x} \\
&= \frac{1}{2\pi} \int \kappa(h^x\zeta)\, E[e^{i\zeta X} \mid Y = y, Z = z] \exp(-i\zeta x)\, d\zeta \\
&= \frac{1}{2\pi} \int \kappa(h^x\zeta)\, \phi(\zeta, y, z) \exp(-i\zeta x)\, d\zeta.
\end{align*}
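To complete the picture, a matching sketch of the smoothed conditional density is given below: it numerically inverts the product of a flat-top taper κ(h^x ζ) and the plug-in φ estimate from the sketch following Theorem 1 (whose phi_hat and trap helpers it reuses). The particular taper and the integration grid are illustrative choices rather than the ones used in the paper.

```python
import numpy as np

def f_hat(x, y, z, Y, X1, X2, Z, hx, hy, hz, n_zeta=201):
    """Smoothed conditional density f(x | y, z; h): numerically invert
    kappa(hx*zeta) * phi_hat(zeta, y, z) via (1/2pi) int (.) e^{-i zeta x} d zeta,
    using a flat-top taper kappa supported on |hx*zeta| <= 1."""
    def kappa(t):                                   # 1 on |t|<=0.5, linear to 0 at |t|=1
        a = abs(t)
        return 1.0 if a <= 0.5 else max(0.0, 2.0 * (1.0 - a))
    zeta_max = 1.0 / hx                             # kappa(hx*zeta) = 0 beyond this point
    zetas = np.linspace(-zeta_max, zeta_max, n_zeta)
    vals = np.array([kappa(hx * zt)
                     * phi_hat(zt, y, z, Y, X1, X2, Z, hy, hz)
                     * np.exp(-1j * zt * x) for zt in zetas])
    return float(np.real(trap(vals, zetas)) / (2.0 * np.pi))
```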
The following lemma is helpful to derive the result given in Theorem 3.

Lemma A.1 For $(x, y, z) \in \operatorname{supp}(X, Y, Z)$ and $h_n > 0$,
\[
\hat f(x \mid y, z; h) - f(x \mid y, z) = B(x, y, z; h^x) + L(x, y, z; h) + R(x, y, z; h),
\]
where $B(x, y, z; h^x)$ is a nonrandom "bias term" defined as
\[
B(x, y, z; h^x) \equiv f(x \mid y, z; h^x) - f(x \mid y, z);
\]
$L(x, y, z; h)$ is a "variance term" admitting the linear representation
\[
L(x, y, z; h) \equiv \bar f(x \mid y, z; h) - f(x \mid y, z; h^x) = \hat E\,[\ell(x, y, z, h; Y, X_1, X_2, Z)],
\]
where $\ell(x, y, z, h; Y, X_1, X_2, Z)$ is defined in the proof of the lemma; and $R(x, y, z; h)$ is a "remainder term,"
\[
R(x, y, z; h) \equiv \hat f(x \mid y, z; h) - \bar f(x \mid y, z; h).
\]
Proof of Lemma A.1. Let $\omega_A(\zeta) \equiv E[A e^{i\zeta X_2}]$ for $A = 1, X_1$, and
\[
\omega(\zeta, y, z) \equiv E\big[e^{i\zeta X_2} \mid Y = y, Z = z\big]
= \int e^{i\zeta x_2} f(x_2 \mid y, z)\, dx_2
= \frac{\chi(\zeta, y, z)}{f(y, z)},
\]
where $\chi(\zeta, y, z) \equiv \int e^{i\zeta x_2} f(x_2, y, z)\, dx_2$. Also let $\hat\omega_A(\zeta) \equiv \hat E[A e^{i\zeta X_2}]$ and $\delta\hat\omega_A(\zeta) \equiv \hat\omega_A(\zeta) - \omega_A(\zeta)$, and let
\[
\hat\omega(\zeta, y, z) \equiv \hat E\big[e^{i\zeta X_2} \mid Y = y, Z = z\big] \equiv \hat\chi(\zeta, y, z)/\hat f(y, z),
\]
where
\[
\hat\chi(\zeta, y, z) = \frac{1}{n}\sum_{j=1}^n e^{i\zeta X_{2j}} k_{h^y}(Y_j - y)\, k_{h^z}(Z_j - z)
= \hat E\big[e^{i\zeta X_2} k_{h^y}(Y - y)\, k_{h^z}(Z - z)\big],
\]
\[
\hat f(y, z) = \frac{1}{n}\sum_{j=1}^n k_{h^y}(Y_j - y)\, k_{h^z}(Z_j - z)
= \hat E\big[k_{h^y}(Y - y)\, k_{h^z}(Z - z)\big],
\]
and $\delta\hat\chi(\zeta, y, z) \equiv \hat\chi(\zeta, y, z) - \chi(\zeta, y, z)$ and $\delta\hat f(y, z) \equiv \hat f(y, z) - f(y, z)$. We use the following representation:
\[
\frac{\hat\omega_{X_1}(\zeta)}{\hat\omega_1(\zeta)}
= \frac{\omega_{X_1}(\zeta) + \delta\hat\omega_{X_1}(\zeta)}{\omega_1(\zeta) + \delta\hat\omega_1(\zeta)}
= q_{X_1}(\zeta) + \delta\hat q_{X_1}(\zeta), \tag{17}
\]
where $q_{X_1}(\zeta) = \omega_{X_1}(\zeta)/\omega_1(\zeta)$ and where $\delta\hat q_{X_1}(\zeta)$ can be written as either
\[
\delta\hat q_{X_1}(\zeta)
= \left( \frac{\delta\hat\omega_{X_1}(\zeta)}{\omega_1(\zeta)} - \frac{\omega_{X_1}(\zeta)\,\delta\hat\omega_1(\zeta)}{(\omega_1(\zeta))^2} \right)
\left( 1 + \frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)} \right)^{-1}
\]
or $\delta\hat q_{X_1}(\zeta) = \delta_1\hat q_{X_1}(\zeta) + \delta_2\hat q_{X_1}(\zeta)$ with
\begin{align*}
\delta_1\hat q_{X_1}(\zeta) &\equiv \frac{\delta\hat\omega_{X_1}(\zeta)}{\omega_1(\zeta)} - \frac{\omega_{X_1}(\zeta)\,\delta\hat\omega_1(\zeta)}{(\omega_1(\zeta))^2}, \\
\delta_2\hat q_{X_1}(\zeta) &\equiv \frac{\omega_{X_1}(\zeta)}{\omega_1(\zeta)}\left(\frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)}\right)^2 \left(1 + \frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)}\right)^{-1}
- \frac{\delta\hat\omega_{X_1}(\zeta)}{\omega_1(\zeta)}\,\frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)} \left(1 + \frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)}\right)^{-1}.
\end{align*}
Similarly,
\[
\frac{1}{\hat\omega_1(\zeta)} = \frac{1}{\omega_1(\zeta) + \delta\hat\omega_1(\zeta)} = q_1(\zeta) + \delta\hat q_1(\zeta), \tag{18}
\]
where $q_1(\zeta) \equiv 1/\omega_1(\zeta)$, and where
\[
\delta\hat q_1(\zeta) = -\frac{\delta\hat\omega_1(\zeta)}{(\omega_1(\zeta))^2}\left(1 + \frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)}\right)^{-1}
\]
or $\delta\hat q_1(\zeta) = \delta_1\hat q_1(\zeta) + \delta_2\hat q_1(\zeta)$ with
\[
\delta_1\hat q_1(\zeta) \equiv -\frac{\delta\hat\omega_1(\zeta)}{(\omega_1(\zeta))^2}, \qquad
\delta_2\hat q_1(\zeta) \equiv \frac{1}{\omega_1(\zeta)}\left(\frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)}\right)^2 \left(1 + \frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)}\right)^{-1}.
\]
And also
\[
\frac{\hat\chi(\zeta, y, z)}{\hat f(y, z)} = \frac{\chi(\zeta, y, z) + \delta\hat\chi(\zeta, y, z)}{f(y, z) + \delta\hat f(y, z)} = q_2(\zeta, y, z) + \delta\hat q_2(\zeta, y, z), \tag{19}
\]
where $q_2(\zeta, y, z) \equiv \chi(\zeta, y, z)/f(y, z)$, and where
\[
\delta\hat q_2(\zeta, y, z) = \left( \frac{\delta\hat\chi(\zeta, y, z)}{f(y, z)} - \frac{\chi(\zeta, y, z)\,\delta\hat f(y, z)}{(f(y, z))^2} \right)\left(1 + \frac{\delta\hat f(y, z)}{f(y, z)}\right)^{-1}
\]
or $\delta\hat q_2(\zeta, y, z) = \delta_1\hat q_2(\zeta, y, z) + \delta_2\hat q_2(\zeta, y, z)$ with
\begin{align*}
\delta_1\hat q_2(\zeta, y, z) &\equiv \frac{\delta\hat\chi(\zeta, y, z)}{f(y, z)} - \frac{\chi(\zeta, y, z)\,\delta\hat f(y, z)}{(f(y, z))^2}, \\
\delta_2\hat q_2(\zeta, y, z) &\equiv \frac{\chi(\zeta, y, z)}{f(y, z)}\left(\frac{\delta\hat f(y, z)}{f(y, z)}\right)^2\left(1 + \frac{\delta\hat f(y, z)}{f(y, z)}\right)^{-1}
- \frac{\delta\hat\chi(\zeta, y, z)}{f(y, z)}\,\frac{\delta\hat f(y, z)}{f(y, z)}\left(1 + \frac{\delta\hat f(y, z)}{f(y, z)}\right)^{-1}.
\end{align*}
Let $Q_{X_1}(\zeta) \equiv \int_0^{\zeta} \big(i\omega_{X_1}(\xi)/\omega_1(\xi)\big)\, d\xi$ and $\delta\hat Q_{X_1}(\zeta) \equiv \int_0^{\zeta} \big(i\hat\omega_{X_1}(\xi)/\hat\omega_1(\xi)\big)\, d\xi - Q_{X_1}(\zeta)$. Note that for some random function $\delta\bar Q_{X_1}(\zeta)$ such that $|\delta\bar Q_{X_1}(\zeta)| \le |\delta\hat Q_{X_1}(\zeta)|$ for all $\zeta$,
\[
\exp\big( Q_{X_1}(\zeta) + \delta\hat Q_{X_1}(\zeta) \big)
= \exp\big(Q_{X_1}(\zeta)\big)\left( 1 + \delta\hat Q_{X_1}(\zeta) + \frac{\exp\big(\delta\bar Q_{X_1}(\zeta)\big)}{2}\big(\delta\hat Q_{X_1}(\zeta)\big)^2 \right). \tag{20}
\]
From equations (14)--(16), we have
\begin{align*}
&\hat f(x \mid y, z; h) - f(x \mid y, z; h^x) \\
&= \frac{1}{2\pi}\int \kappa(h^x\zeta)\,\hat\phi(\zeta, y, z, h^{(2)})\exp(-i\zeta x)\,d\zeta
 - \frac{1}{2\pi}\int \kappa(h^x\zeta)\,\phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta \\
&= \frac{1}{2\pi}\int \kappa(h^x\zeta)\exp(-i\zeta x)\left[
    \frac{\hat\omega(\zeta, y, z)}{\hat\omega_1(\zeta)}\exp\left(\int_0^{\zeta}\frac{i\hat\omega_{X_1}(\xi)}{\hat\omega_1(\xi)}\,d\xi\right)
  - \frac{\omega(\zeta, y, z)}{\omega_1(\zeta)}\exp\left(\int_0^{\zeta}\frac{i\omega_{X_1}(\xi)}{\omega_1(\xi)}\,d\xi\right)\right]d\zeta \\
&= \frac{1}{2\pi}\int \kappa(h^x\zeta)\exp(-i\zeta x)\Bigg[
  -\frac{\omega(\zeta, y, z)}{\omega_1(\zeta)}\exp\left(\int_0^{\zeta}\frac{i\omega_{X_1}(\xi)}{\omega_1(\xi)}\,d\xi\right) \\
&\qquad + \left\{\frac{\chi(\zeta, y, z)}{f(y, z)} + \frac{\delta\hat\chi(\zeta, y, z)}{f(y, z)} - \frac{\chi(\zeta, y, z)\,\delta\hat f(y, z)}{(f(y, z))^2} + \delta_2\hat q_2(\zeta, y, z)\right\}
  \times \left\{\frac{1}{\omega_1(\zeta)} - \frac{\delta\hat\omega_1(\zeta)}{(\omega_1(\zeta))^2} + \delta_2\hat q_1(\zeta)\right\}\times\exp\big(Q_{X_1}(\zeta)\big) \\
&\qquad \times \left\{1 + \int_0^{\zeta} i\delta_1\hat q_{X_1}(\xi)\,d\xi + \int_0^{\zeta} i\delta_2\hat q_{X_1}(\xi)\,d\xi + \frac{\exp\big(\delta\bar Q_{X_1}(\zeta)\big)}{2}\left(\int_0^{\zeta} i\delta\hat q_{X_1}(\xi)\,d\xi\right)^2\right\}\Bigg]d\zeta.
\end{align*}
We denote the linearization of $\hat f(x \mid y, z; h^x)$ by $\bar f(x \mid y, z; h^x)$. Then
\begin{align*}
L(x, y, z; h)
&\equiv \bar f(x \mid y, z; h) - f(x \mid y, z; h^x) \\
&= \frac{1}{2\pi}\int \kappa(h^x\zeta)\exp(-i\zeta x)\exp\big(Q_{X_1}(\zeta)\big)\Bigg[
  -\frac{\chi(\zeta, y, z)}{f(y, z)}\frac{\delta\hat\omega_1(\zeta)}{(\omega_1(\zeta))^2}
  + \frac{\chi(\zeta, y, z)}{f(y, z)}\frac{1}{\omega_1(\zeta)}\int_0^{\zeta} i\delta_1\hat q_{X_1}(\xi)\,d\xi \\
&\qquad + \frac{1}{\omega_1(\zeta)}\frac{\delta\hat\chi(\zeta, y, z)}{f(y, z)}
  - \frac{1}{\omega_1(\zeta)}\frac{\chi(\zeta, y, z)\,\delta\hat f(y, z)}{(f(y, z))^2}\Bigg]d\zeta \\
&= \frac{1}{2\pi}\int \kappa(h^x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z)\Bigg[
  -\frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)} + \frac{\delta\hat\chi(\zeta, y, z)}{\chi(\zeta, y, z)} - \frac{\delta\hat f(y, z)}{f(y, z)}
  + \int_0^{\zeta}\left(\frac{i\delta\hat\omega_{X_1}(\xi)}{\omega_1(\xi)} - \frac{i\omega_{X_1}(\xi)\,\delta\hat\omega_1(\xi)}{(\omega_1(\xi))^2}\right)d\xi\Bigg]d\zeta \\
&= \frac{1}{2\pi}\int \kappa(h^x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z)\left(
  -\frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)} + \frac{\delta\hat\chi(\zeta, y, z)}{\chi(\zeta, y, z)} - \frac{\delta\hat f(y, z)}{f(y, z)}\right)d\zeta \\
&\qquad + \frac{1}{2\pi}\int\left(\int_{\xi}^{\pm\infty}\kappa(h^x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z)\,d\zeta\right)
  \left(\frac{i\delta\hat\omega_{X_1}(\xi)}{\omega_1(\xi)} - \frac{i\omega_{X_1}(\xi)\,\delta\hat\omega_1(\xi)}{(\omega_1(\xi))^2}\right)d\xi \\
&= \frac{1}{2\pi}\int \kappa(h^x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z)\left(
  -\frac{\delta\hat\omega_1(\zeta)}{\omega_1(\zeta)} + \frac{\delta\hat\chi(\zeta, y, z)}{\chi(\zeta, y, z)} - \frac{\delta\hat f(y, z)}{f(y, z)}\right)d\zeta \\
&\qquad + \frac{1}{2\pi}\int\left(\int_{\zeta}^{\pm\infty}\kappa(h^x\xi)\exp(-i\xi x)\,\phi(\xi, y, z)\,d\xi\right)
  \left(\frac{i\delta\hat\omega_{X_1}(\zeta)}{\omega_1(\zeta)} - \frac{i\omega_{X_1}(\zeta)\,\delta\hat\omega_1(\zeta)}{(\omega_1(\zeta))^2}\right)d\zeta \\
&= \int\Bigg[\left(-\frac{1}{2\pi}\frac{1}{\omega_1(\zeta)}\kappa(h^x\zeta)\exp(-i\zeta x)\phi(\zeta, y, z)
  - \frac{1}{2\pi}\frac{i\omega_{X_1}(\zeta)}{(\omega_1(\zeta))^2}\int_{\zeta}^{\pm\infty}\kappa(h^x\xi)\exp(-i\xi x)\phi(\xi, y, z)\,d\xi\right)\delta\hat\omega_1(\zeta) \\
&\qquad + \left(\frac{1}{2\pi}\frac{i}{\omega_1(\zeta)}\int_{\zeta}^{\pm\infty}\kappa(h^x\xi)\exp(-i\xi x)\phi(\xi, y, z)\,d\xi\right)\delta\hat\omega_{X_1}(\zeta)
  + \left(\frac{1}{2\pi}\frac{1}{\chi(\zeta, y, z)}\kappa(h^x\zeta)\exp(-i\zeta x)\phi(\zeta, y, z)\right)\delta\hat\chi(\zeta, y, z) \\
&\qquad + \left(-\frac{1}{2\pi}\frac{1}{f(y, z)}\kappa(h^x\zeta)\exp(-i\zeta x)\phi(\zeta, y, z)\right)\delta\hat f(y, z)\Bigg]d\zeta \\
&= \int\Big[\Psi_1(\zeta, x, y, z, h^x)\big(\hat E[e^{i\zeta X_2}] - E[e^{i\zeta X_2}]\big)
  + \Psi_2(\zeta, x, y, z, h^x)\big(\hat E[X_1 e^{i\zeta X_2}] - E[X_1 e^{i\zeta X_2}]\big) \\
&\qquad + \Psi_3(\zeta, x, y, z, h^x)\big(\hat E[e^{i\zeta X_2} k_{h^y}(Y - y) k_{h^z}(Z - z)] - E[e^{i\zeta X_2} k_{h^y}(Y - y) k_{h^z}(Z - z)]\big) \\
&\qquad + \Psi_4(\zeta, x, y, z, h^x)\big(\hat E[k_{h^y}(Y - y) k_{h^z}(Z - z)] - E[k_{h^y}(Y - y) k_{h^z}(Z - z)]\big)\Big]d\zeta \\
&= \hat E\int\Big[\Psi_1(\zeta, x, y, z, h^x)\big(e^{i\zeta X_2} - E[e^{i\zeta X_2}]\big)
  + \Psi_2(\zeta, x, y, z, h^x)\big(X_1 e^{i\zeta X_2} - E[X_1 e^{i\zeta X_2}]\big) \\
&\qquad + \Psi_3(\zeta, x, y, z, h^x)\big(e^{i\zeta X_2} k_{h^y}(Y - y) k_{h^z}(Z - z) - E[e^{i\zeta X_2} k_{h^y}(Y - y) k_{h^z}(Z - z)]\big) \\
&\qquad + \Psi_4(\zeta, x, y, z, h^x)\big(k_{h^y}(Y - y) k_{h^z}(Z - z) - E[k_{h^y}(Y - y) k_{h^z}(Z - z)]\big)\Big]d\zeta \\
&\equiv \hat E\,[\ell(x, y, z, h; Y, X_1, X_2, Z)],
\end{align*}
where the following identity was used in the fourth equality: for any absolutely integrable function $g$,
\[
\int_{-\infty}^{\infty}\int_0^{\zeta} g(\zeta, \xi)\,d\xi\,d\zeta
= \int_0^{\infty}\int_{\xi}^{\infty} g(\zeta, \xi)\,d\zeta\,d\xi
+ \int_{-\infty}^{0}\int_{\xi}^{-\infty} g(\zeta, \xi)\,d\zeta\,d\xi
\equiv \int\int_{\xi}^{\pm\infty} g(\zeta, \xi)\,d\zeta\,d\xi,
\]
and where
\begin{align*}
\Psi_1(\zeta, x, y, z, h^x) &\equiv -\frac{1}{2\pi}\frac{1}{\omega_1(\zeta)}\kappa(h^x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z)
- \frac{1}{2\pi}\frac{i\omega_{X_1}(\zeta)}{(\omega_1(\zeta))^2}\int_{\zeta}^{\pm\infty}\kappa(h^x\xi)\exp(-i\xi x)\,\phi(\xi, y, z)\,d\xi, \\
\Psi_2(\zeta, x, y, z, h^x) &\equiv \frac{1}{2\pi}\frac{i}{\omega_1(\zeta)}\int_{\zeta}^{\pm\infty}\kappa(h^x\xi)\exp(-i\xi x)\,\phi(\xi, y, z)\,d\xi, \\
\Psi_3(\zeta, x, y, z, h^x) &\equiv \frac{1}{2\pi}\frac{1}{\chi(\zeta, y, z)}\kappa(h^x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z), \\
\Psi_4(\zeta, x, y, z, h^x) &\equiv -\frac{1}{2\pi}\frac{1}{f(y, z)}\kappa(h^x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z).
\end{align*}
We use the following convenient notation for expositional simplicity.

Definition A.1 We write $f(\zeta) \lesssim g(\zeta)$ for $f, g : \mathbb{R} \mapsto \mathbb{R}$ when there exists a constant $C > 0$, independent of $\zeta$, such that $f(\zeta) \le C g(\zeta)$ for all $\zeta \in \mathbb{R}$ (and similarly for $\gtrsim$). Analogously, we write $a_n \lesssim b_n$ for two sequences $a_n, b_n$ when there exists a constant $C$ independent of $n$ such that $a_n \le C b_n$ for all $n \in \mathbb{N}$.
Proof of Theorem 3. In order to obtain the uniform convergence rate of $\hat f(x \mid y, z; h)$, we derive the asymptotic convergence rate of the bias term and the divergence rate of the variance term, and rely on negligibility of the remainder term. First, from Parseval's identity and Assumption A.IV, we have
\begin{align*}
|B(x, y, z, h^x)| &= |f(x \mid y, z; h^x) - f(x \mid y, z)|
 = |f(x \mid y, z; h^x) - f(x \mid y, z; 0)| \\
&= \left| \frac{1}{2\pi}\int \kappa(h^x\zeta)\phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta - \frac{1}{2\pi}\int \phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta \right| \\
&= \left| \frac{1}{2\pi}\int \big(\kappa(h^x\zeta) - 1\big)\phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta \right| \\
&\le \frac{1}{2\pi}\int \big|\kappa(h^x\zeta) - 1\big|\,|\phi(\zeta, y, z)|\,d\zeta
 = \frac{1}{\pi}\int_{\bar\xi/h^x}^{\infty} \big|\kappa(h^x\zeta) - 1\big|\,|\phi(\zeta, y, z)|\,d\zeta \\
&\lesssim \int_{\bar\xi/h^x}^{\infty} |\phi(\zeta, y, z)|\,d\zeta.
\end{align*}
Then, by Assumption B.I (ii), we have
\begin{align}
\sup_{(x,y,z)\in\operatorname{supp}(X,Y,Z)} |B(x, y, z, h^x)|
&\lesssim \int_{\bar\xi/h^x}^{\infty} C_\phi (1 + |\zeta|)^{\gamma_\phi}\exp(\alpha_\phi|\zeta|^{\nu_\phi})\,d\zeta
 \lesssim \int_{\bar\xi/h^x}^{\infty} (1 + |\zeta|)^{\gamma_\phi}\exp(\alpha_\phi|\zeta|^{\nu_\phi})\,d\zeta \tag{21} \\
&= O\Big( \big(\bar\xi/h^x\big)^{\gamma_\phi + 1}\exp\big(\alpha_\phi(\bar\xi/h^x)^{\nu_\phi}\big)\Big)
 = O\big((h^x)^{-\gamma_B}\exp\big(\alpha_B (h^x)^{-\nu_B}\big)\big). \nonumber
\end{align}
For the asymptotic divergence rate of the variance term, define
\[
\Psi^+(h) \equiv \int \Psi_1^+(\zeta, h^x)\,d\zeta + \int \Psi_2^+(\zeta, h^x)\,d\zeta
+ (h^y h^z)^{-1}\int \Psi_3^+(\zeta, h^x)\,d\zeta + (h^y h^z)^{-1}\int \Psi_4^+(\zeta, h^x)\,d\zeta,
\]
where $\Psi_A^+(\zeta, h^x) \equiv \sup_{(x,y,z)\in\operatorname{supp}(X,Y,Z)} |\Psi_A(\zeta, x, y, z, h^x)|$ for $A = 1, 2, 3, 4$. From Assumptions A.IV and B.II, and from arguments similar to those above, one can show that
\[
E\big[n\,|\delta\hat\omega_1(\zeta)|^2\big] \lesssim 1, \qquad
E\Big[n h^y h^z \cdot \sup_{(y,z)\in\operatorname{supp}(Y,Z)}\big(\delta\hat\chi(\zeta, y, z)\big)^2\Big] \lesssim 1,
\]
\[
E\big[n\,|\delta\hat\omega_{X_1}(\zeta)|^2\big] \lesssim 1, \qquad
E\Big[n h^y h^z \cdot \sup_{(y,z)\in\operatorname{supp}(Y,Z)}\big(\delta\hat f(y, z)\big)^2\Big] \lesssim 1,
\]
and
\begin{align*}
\int \Psi_1^+(\zeta, h^x)\,d\zeta &\lesssim \big(1 + (h^x)^{-1}\big)^{\gamma_\mu + \gamma_\phi - \gamma_\omega + 2}\exp\big(-\alpha_\omega((h^x)^{-1})^{\nu_\omega}\big)\exp\big(\alpha_\phi((h^x)^{-1})^{\nu_\phi}\big), \\
\int \Psi_2^+(\zeta, h^x)\,d\zeta &\lesssim \big(1 + (h^x)^{-1}\big)^{\gamma_\phi - \gamma_\omega + 2}\exp\big(-\alpha_\omega((h^x)^{-1})^{\nu_\omega}\big)\exp\big(\alpha_\phi((h^x)^{-1})^{\nu_\phi}\big), \\
(h^y h^z)^{-1}\int \Psi_3^+(\zeta, h^x)\,d\zeta &\lesssim (h^y h^z)^{-1}\big(1 + (h^x)^{-1}\big)^{\gamma_\phi - \gamma_\omega + 1}\exp\big(-\alpha_\omega((h^x)^{-1})^{\nu_\omega}\big)\exp\big(\alpha_\phi((h^x)^{-1})^{\nu_\phi}\big), \\
(h^y h^z)^{-1}\int \Psi_4^+(\zeta, h^x)\,d\zeta &\lesssim (h^y h^z)^{-1}\big(1 + (h^x)^{-1}\big)^{\gamma_\phi + 1}\exp\big(\alpha_\phi((h^x)^{-1})^{\nu_\phi}\big).
\end{align*}
Then we have
\[
\Psi^+(h) = O\Big( \max\big\{\big(1 + (h^x)^{-1}\big)^{\gamma_\mu + 1}, (h^y h^z)^{-1}\big\}\,\big(1 + (h^x)^{-1}\big)^{\gamma_\phi - \gamma_\omega + 1}\exp\big((\alpha_\phi 1\{\nu_\phi = \nu_\omega\} - \alpha_\omega)((h^x)^{-1})^{\nu_\omega}\big)\Big).
\]
Note that by the Minkowski inequality,
\begin{align*}
&E\Big[\sup_{(x,y,z)\in\operatorname{supp}(X,Y,Z)} |L(x, y, z, h)|\Big]
= E\Big[\sup_{(x,y,z)\in\operatorname{supp}(X,Y,Z)} |\bar f(x \mid y, z; h) - f(x \mid y, z; h^x)|\Big] \\
&= E\Bigg[\sup_{(x,y,z)\in\operatorname{supp}(X,Y,Z)}\bigg|\int\big[\Psi_1(\zeta, x, y, z, h^x)\delta\hat\omega_1(\zeta) + \Psi_2(\zeta, x, y, z, h^x)\delta\hat\omega_{X_1}(\zeta)
 + \Psi_3(\zeta, x, y, z, h^x)\delta\hat\chi(\zeta, y, z) + \Psi_4(\zeta, x, y, z, h^x)\delta\hat f(y, z)\big]d\zeta\bigg|\Bigg] \\
&\le E\int\Bigg(\sup_{(x,y,z)}|\Psi_1(\zeta, x, y, z, h^x)|\,|\delta\hat\omega_1(\zeta)|
 + \sup_{(x,y,z)}|\Psi_2(\zeta, x, y, z, h^x)|\,|\delta\hat\omega_{X_1}(\zeta)| \\
&\qquad\qquad + \sup_{(x,y,z)}|\Psi_3(\zeta, x, y, z, h^x)|\sup_{(y,z)}|\delta\hat\chi(\zeta, y, z)|
 + \sup_{(x,y,z)}|\Psi_4(\zeta, x, y, z, h^x)|\sup_{(y,z)}|\delta\hat f(y, z)|\Bigg)d\zeta \\
&\le \int\Bigg( \Psi_1^+(\zeta, h^x)\big\{E|\delta\hat\omega_1(\zeta)|^2\big\}^{1/2}
 + \Psi_2^+(\zeta, h^x)\big\{E|\delta\hat\omega_{X_1}(\zeta)|^2\big\}^{1/2} \\
&\qquad\qquad + (h^y h^z)^{-1}\Psi_3^+(\zeta, h^x)\Big\{E\Big[h^y h^z\Big(\sup_{(y,z)}\delta\hat\chi(\zeta, y, z)\Big)^2\Big]\Big\}^{1/2}
 + (h^y h^z)^{-1}\Psi_4^+(\zeta, h^x)\Big\{E\Big[h^y h^z\Big(\sup_{(y,z)}\delta\hat f(y, z)\Big)^2\Big]\Big\}^{1/2}\Bigg)d\zeta \\
&\le n^{-1/2}\int\Bigg( \Psi_1^+(\zeta, h^x)\big\{E\big[n|\delta\hat\omega_1(\zeta)|^2\big]\big\}^{1/2}
 + \Psi_2^+(\zeta, h^x)\big\{E\big[n|\delta\hat\omega_{X_1}(\zeta)|^2\big]\big\}^{1/2} \\
&\qquad\qquad + (h^y h^z)^{-1}\Psi_3^+(\zeta, h^x)\Big\{E\Big[n h^y h^z\Big(\sup_{(y,z)}\delta\hat\chi(\zeta, y, z)\Big)^2\Big]\Big\}^{1/2}
 + (h^y h^z)^{-1}\Psi_4^+(\zeta, h^x)\Big\{E\Big[n h^y h^z\Big(\sup_{(y,z)}\delta\hat f(y, z)\Big)^2\Big]\Big\}^{1/2}\Bigg)d\zeta \\
&\lesssim n^{-1/2}\,\Psi^+(h).
\end{align*}
Thus, by Markov's inequality, we have
\begin{align}
\sup_{(x,y,z)\in\operatorname{supp}(X,Y,Z)} |L(x, y, z, h)|
= O_p\Big( n^{-1/2}\max\big\{\big(1 + (h^x)^{-1}\big)^{\gamma_\mu + 1}, (h^y h^z)^{-1}\big\}\big(1 + (h^x)^{-1}\big)^{\gamma_\phi - \gamma_\omega + 1}\exp\big((\alpha_\phi 1\{\nu_\phi = \nu_\omega\} - \alpha_\omega)((h^x)^{-1})^{\nu_\omega}\big)\Big). \tag{22}
\end{align}
From Assumptions B.II--III, the selection of the bandwidths in the statement of the theorem, and a minor adjustment of the argument for the variance term above, one can show that the remainder term is asymptotically negligible; the detailed proof is omitted here for brevity. Putting equations (21) and (22) together then yields the result.
Proof of Theorem 4. To show consistency of the estimator, we apply Theorem 1 of Chen, Linton, and Van Keilegom (2003). Thus, we need to verify Conditions (1.1)--(1.5') in Chen, Linton, and Van Keilegom (2003). Recall that
\[
\tilde Q_n(\beta, \delta, f) = \frac{1}{n}\sum_{i=1}^n \int \psi(Y_i - x^\top\beta - Z_i^\top\delta)\,[x\ Z_i]\cdot f(x \mid Y_i, Z_i)\,dx,
\]
and
\[
\tilde Q(\beta, \delta, f) = E\int \psi(Y_i - x^\top\beta - Z_i^\top\delta)\,[x\ Z_i]\cdot f(x \mid Y_i, Z_i)\,dx.
\]
a) Condition (1.1) is directly satisfied by our Assumption C.I.

b) For verification of Condition (1.2), note that $\tilde Q(\beta, \delta, f)$ is the derivative of $E\int \rho(Y - x^\top\beta - Z^\top\delta)f_0(x \mid Y, Z)\,dx$ with respect to $(\beta, \delta)$ and that $\int \rho(Y - x^\top\beta - Z^\top\delta)f_0(x \mid Y, Z)\,dx$ is convex in $(\beta, \delta)$.

c) Now we show that Condition (1.3) is satisfied by verifying that $\tilde Q(\beta, \delta, f)$ is continuous in $f$ uniformly for all $(\beta^\top, \delta^\top)^\top \in \Theta$. For any $\|f - f_0\| \le \epsilon$,
\begin{align*}
\|\tilde Q(\beta, \delta, f) - \tilde Q(\beta, \delta, f_0)\|
&= \Big\|E\int \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x \mid Y_i, Z_i)\,dx
 - E\int \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f_0(x \mid Y_i, Z_i)\,dx\Big\| \\
&= \Big\|E\int \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot\big[f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)\big]\,dx\Big\| \\
&\le E\int \big\|[x\ Z_i]\big\|\cdot\big|f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)\big|\,dx \\
&\le E\int \big\|[x\ Z_i]\big\|\,dx \times \epsilon.
\end{align*}
The first inequality holds by the property of exchanging norms and integrals, the Cauchy inequality, and the fact that $\psi(\cdot) \le 1$. By Assumptions C.II and C.III, $E\int \|[x\ Z_i]\|\,dx < \infty$. Therefore, Condition (1.3) holds.

d) Condition (1.4) is satisfied by our Theorem 3.

e) It only remains to verify Condition (1.5'). For any $\epsilon_n = o(1)$,
\[
\sup_{(\beta,\delta)\in\Theta,\,\|f - f_0\|\le\epsilon_n}\big\|\tilde Q_n(\beta, \delta, f) - \tilde Q(\beta, \delta, f)\big\| = o_p(1).
\]
Let $\operatorname{diam}(\mathcal{X})$ denote the diameter of $\mathcal{X}$. Since $\mathcal{X}$ is compact, $\operatorname{diam}(\mathcal{X})$ is finite. By noting that
\begin{align*}
\big\|\tilde Q_n(\beta, \delta, f) - \tilde Q(\beta, \delta, f)\big\|
&= \Big\|\int\Big(\frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)
 - E\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)\Big)\,dx\Big\| \\
&\le \int \Big\|\frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)
 - E\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)\Big\|\,dx \\
&\le \operatorname{diam}(\mathcal{X})\sup_x\Big\|\frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)
 - E\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)\Big\|,
\end{align*}
we have
\begin{align*}
&\sup_{(\beta,\delta)\in\Theta,\,\|f - f_0\|\le\epsilon_n}\big\|\tilde Q_n(\beta, \delta, f) - \tilde Q(\beta, \delta, f)\big\| \\
&\le \operatorname{diam}(\mathcal{X})\sup_{(\beta,\delta)\in\Theta,\,\|f - f_0\|\le\epsilon_n,\,x}
\Big\|\frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)
 - E\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)\Big\|.
\end{align*}
Denote $\phi_{\beta,\delta,f,x}(Y_i, Z_i) = \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)$. We need to show that $\{\phi_{\beta,\delta,f,x} : (\beta^\top, \delta^\top)^\top \in \Theta,\ \|f - f_0\|_{\mathcal{F}} \le \epsilon,\ x \in \mathcal{X}\}$ is Glivenko--Cantelli (G-C). Because $\{\psi(Y_i - x^\top\beta - Z_i^\top\delta) : (\beta, \delta) \in \Theta,\ x \in \mathcal{X}\}$ is bounded and VC, it is G-C. Also $x \in \mathcal{X}$, which is compact by Assumption C.II, and $E[|Z|] < \infty$ by Assumption C.III. Finally, $\mathcal{F}_n = \{f : \|f - f_0\| \le \epsilon\}$ is G-C by Theorem 3. These conditions and Corollary 9.27 (ii) of Kosorok (2008) lead to our conclusion.
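For intuition only, here is a minimal numerical sketch of the second step. It exploits the fact, used in the verification of Condition (1.2) above, that the moment function is (up to sign) the derivative of the integrated check loss, so the estimator can be obtained by minimizing that loss; the x-grid, the precomputed density matrix f_cond, and the derivative-free optimizer are all simplifying assumptions rather than the paper's algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def qrme_step2(tau, Y, Z, f_cond, x_grid, theta0):
    """Second-step QRME estimate at quantile tau: minimize
    (1/n) sum_i  int rho_tau(Y_i - x*beta - Z_i' delta) f_hat(x | Y_i, Z_i) dx,
    with the x-integral discretized on x_grid; f_cond[i, j] holds a precomputed
    estimate of f_hat(x_grid[j] | Y_i, Z_i)."""
    dx = np.diff(x_grid)
    def rho(u):                                     # check (tilted absolute) loss
        return u * (tau - (u < 0))
    def objective(theta):
        beta, delta = theta[0], theta[1:]
        resid = Y[:, None] - beta * x_grid[None, :] - (Z @ delta)[:, None]
        vals = rho(resid) * f_cond                  # integrand on the x grid
        integrals = np.sum((vals[:, 1:] + vals[:, :-1]) * dx[None, :], axis=1) / 2.0
        return integrals.mean()
    res = minimize(objective, theta0, method="Nelder-Mead")
    return res.x                                    # (beta_hat, delta_hat)
```

In this sketch, theta0 stacks an initial value for the education coefficient and the coefficients on the correctly-observed covariates; bootstrap inference would simply re-run the two steps on resampled data.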
Proof of Theorem 5. We now apply Theorem 2 of Chen, Linton, and Van Keilegom (2003) to establish weak convergence. We need to check their Conditions (2.1)--(2.6).

a) Condition (2.1) is satisfied by Assumption G.I.

b) To verify Condition (2.2), note that
\begin{align*}
\tilde Q(\beta, \delta, f_0) &= E\int \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f_0(x \mid Y_i, Z_i)\,dx \\
&= E\,E\big[\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i] \mid Y_i, Z_i\big] \\
&= E\big[\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\big] \\
&= E\,E\big[\psi(Y - x^\top\beta - Z^\top\delta)[x\ Z] \mid X, Z\big] \\
&= E\big[E[\psi(Y - x^\top\beta - Z^\top\delta) \mid X, Z]\,[x\ Z]\big] \\
&= E\big[(\tau - G(x^\top\beta + Z^\top\delta))[x\ Z]\big].
\end{align*}
The derivative with respect to $(\beta, \delta)$, denoted by $\Gamma_1(\beta, \delta, f)$, is $-E\big[g(x^\top\beta + Z^\top\delta)[x\ Z][x\ Z]^\top\big]$. It is continuous in $(\beta, \delta)$ at $(\beta_0, \delta_0)$ and positive definite by Assumptions G.II and G.III.

c) Now we verify Condition (2.3). We first calculate the pathwise derivative of $\tilde Q(\beta, \delta, f)$ at $f_0$:
\begin{align*}
\Gamma_2(\beta, \delta, f_0)[f - f_0] &= \big[\tilde Q(\beta, \delta, f_0 + \zeta(f - f_0)) - \tilde Q(\beta, \delta, f_0)\big]/\zeta \\
&= E\int \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot\big[f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)\big]\,dx.
\end{align*}
For any $\epsilon_n \downarrow 0$ such that $\|(\beta, \delta) - (\beta_0, \delta_0)\| \le \epsilon_n$ and $\|f - f_0\| \le \epsilon_n$:
\[
\big\|\tilde Q(\beta, \delta, f) - \tilde Q(\beta, \delta, f_0) - \Gamma_2(\beta, \delta, f_0)[f - f_0]\big\| = 0
\]
and
\begin{align*}
&\big\|\Gamma_2(\beta, \delta, f_0)[f - f_0] - \Gamma_2(\beta_0, \delta_0, f_0)[f - f_0]\big\| \\
&= \Big\|E\int \big[\psi(Y_i - x^\top\beta - Z_i^\top\delta) - \psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)\big][x\ Z_i]\cdot\big[f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)\big]\,dx\Big\| \\
&\le E\int \big|\psi(Y_i - x^\top\beta - Z_i^\top\delta) - \psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)\big|\cdot\big\|[x\ Z_i]\big\|\,dx \times \epsilon_n \\
&= E\int \big|1\{Y_i - x^\top\beta - Z_i^\top\delta < 0\} - 1\{Y_i - x^\top\beta_0 - Z_i^\top\delta_0 < 0\}\big|\cdot\big\|[x\ Z_i]\big\|\,dx \times \epsilon_n \\
&= o(1)\times\epsilon_n.
\end{align*}
The inequality holds by the property of exchanging norms and integrals. The last equality holds because the domain of integration is $o(1)$ and $\mathcal{X}$ is compact.

d) Condition (2.4) holds by Assumption G.IV.

e) Now we verify Condition (2.5'):
\[
\sup_{\|\beta-\beta_0\|\le\epsilon_n,\,\|\delta-\delta_0\|\le\epsilon_n,\,\|f-f_0\|\le\epsilon_n}
\big\|\tilde Q_n(\beta, \delta, f) - \tilde Q(\beta, \delta, f) - \tilde Q_n(\beta_0, \delta_0, f_0)\big\| = o_p(1/\sqrt n).
\]
Note that
\begin{align*}
&\big\|\tilde Q_n(\beta, \delta, f) - \tilde Q(\beta, \delta, f) - \tilde Q_n(\beta_0, \delta_0, f_0)\big\| \\
&= \Big\|\int\Big(\frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)
 - E\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i) \\
&\qquad - \Big(\frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x|Y_i, Z_i)
 - E\psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x|Y_i, Z_i)\Big)\Big)\,dx\Big\| \\
&\le \int\Big\|\frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)
 - E\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i) \\
&\qquad - \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x|Y_i, Z_i)
 + E\psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x|Y_i, Z_i)\Big\|\,dx \\
&\le \operatorname{diam}(\mathcal{X})\sup_x\Big\|\frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)
 - E\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i) \\
&\qquad - \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x|Y_i, Z_i)
 + E\psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x|Y_i, Z_i)\Big\|.
\end{align*}
So we need to show that
\begin{align*}
&\sup_{\|\beta-\beta_0\|\le\epsilon_n,\,\|\delta-\delta_0\|\le\epsilon_n,\,\|f-f_0\|\le\epsilon_n,\,x}
\Big\|\frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i)
 - E\psi(Y_i - x^\top\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x|Y_i, Z_i) \\
&\qquad - \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x|Y_i, Z_i)
 + E\psi(Y_i - x^\top\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x|Y_i, Z_i)\Big\| = o_p(1/\sqrt n).
\end{align*}
We need to show that $\phi_{\beta,\delta,f,x}$ is Donsker. Because $\{\psi(Y_i - x^\top\beta - Z_i^\top\delta) : (\beta, \delta) \in \Theta,\ x \in \mathcal{X}\}$ is bounded and VC, it is Donsker. Also $x \in \mathcal{X}$, which is compact by Assumption C.II, and $Z \in \mathcal{Z}$, which is also compact by Assumption G.V. Finally, $\mathcal{F}_n$ is uniformly bounded Donsker by Assumption G.VI. These conditions and Corollary 9.32 (iii) of Kosorok (2008) lead to our conclusion.

Finally, we verify Condition (2.6). Noting that $\sqrt n\,\tilde Q_n(\beta_0, \delta_0, f_0)$ converges weakly and using Assumption G.III, we only need to verify that
\[
\sqrt n\,\Gamma_2(\beta_0, \delta_0, f_0)[\hat f - f_0]
= \sqrt n\,E\int \psi(Y - x^\top\beta_0 - Z^\top\delta_0)[x\ Z]\cdot\big(\hat f(x|Y, Z) - f_0(x|Y, Z)\big)\,dx
\]
converges weakly. Also, since the bias of $\hat f$ is $o_p(1/\sqrt n)$, we only need to verify that $\sqrt n\,E\int \psi(Y - x^\top\beta_0 - Z^\top\delta_0)[x\ Z]\cdot(\hat f(x|Y, Z) - E\hat f(x|Y, Z))\,dx$ converges weakly:
\[
\sqrt n\,E\int \psi(Y - x^\top\beta_0 - Z^\top\delta_0)[x\ Z]\cdot\Big[\frac{1}{2\pi}\int \kappa(h_n^x\zeta)\big(\hat\phi - E\hat\phi\big)e^{-i\zeta x}\,d\zeta\Big]\,dx. \tag{23}
\]
First,
\[
\sup_{\zeta}\Big|\frac{1}{n}\sum_{j=1}^n e^{i\zeta X_{2j}} - E e^{i\zeta X_{2j}}\Big| \overset{p}{\to} 0.
\]
This is because $e^{i\zeta X_{2j}} = \cos(\zeta X_{2j}) + i\sin(\zeta X_{2j})$ and those two terms are Lipschitz in $\zeta$. Similarly, $\frac{1}{n}\sum_{j=1}^n X_{1j} e^{i\zeta X_{2j}} \overset{p}{\to} E[X_1 e^{i\zeta X_2}]$. Therefore,
\[
\frac{\frac{1}{n}\sum_{j=1}^n X_{1j} e^{i\zeta X_{2j}}}{\frac{1}{n}\sum_{j=1}^n e^{i\zeta X_{2j}}} \overset{p}{\to} \frac{E[X_1 e^{i\zeta X_2}]}{E[e^{i\zeta X_2}]}.
\]
By the continuous mapping theorem,
\[
\exp\left(\int_0^{\zeta}\frac{i\,\frac{1}{n}\sum_{j=1}^n X_{1j} e^{i\xi X_{2j}}}{\frac{1}{n}\sum_{j=1}^n e^{i\xi X_{2j}}}\,d\xi\right)
\overset{p}{\to}
\exp\left(\int_0^{\zeta}\frac{i\,E[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\,d\xi\right).
\]
Also we have
\[
\hat E\big[e^{i\zeta X_2} \mid Y = y, Z = z\big]
\equiv \frac{\frac{1}{h_n^y h_n^z n}\sum\big[e^{i\zeta X_2} k_{h_n^y}(Y - y)\,k_{h_n^z}(Z - z)\big]}{\frac{1}{h_n^y h_n^z n}\sum\big[k_{h_n^y}(Y - y)\,k_{h_n^z}(Z - z)\big]}.
\]
So (23) equals
\begin{align*}
&\sqrt n\int\!\!\int\!\!\int \psi(y - x^\top\beta_0 - z^\top\delta_0)[x\ z]\cdot\Big[\frac{1}{2\pi}\int \kappa(h_n^x\zeta)\big(\hat\phi - \phi\big)e^{-i\zeta x}\,d\zeta\Big]\,dx\,dy\,dz \\
&= \sqrt n\int\!\!\int\!\!\int \psi(y - x^\top\beta_0 - z^\top\delta_0)[x\ z]\cdot\Bigg[\frac{1}{2\pi}\int \kappa(h_n^x\zeta)
\Bigg(\frac{\frac{1}{h_n^y h_n^z n}\sum\big[e^{i\zeta X_2} k_{h_n^y}(Y - y)\,k_{h_n^z}(Z - z)\big]}{\frac{1}{h_n^y h_n^z n}\sum\big[k_{h_n^y}(Y - y)\,k_{h_n^z}(Z - z)\big]} \\
&\qquad\qquad \times\exp\Bigg(\int_0^{\zeta}\frac{i\,\frac{1}{n}\sum_{j=1}^n X_{1j} e^{i\xi X_{2j}}}{\frac{1}{n}\sum_{j=1}^n e^{i\xi X_{2j}}}\,d\xi\Bigg) - \phi\Bigg)e^{-i\zeta x}\,d\zeta\Bigg]\,dx\,dy\,dz \\
&= \sqrt n\int\!\!\int\!\!\int \psi(y - x^\top\beta_0 - z^\top\delta_0)[x\ z]\cdot\Bigg[\frac{1}{2\pi}\int \kappa(h_n^x\zeta)
\Bigg(\frac{1}{h_n^y h_n^z n}\sum\big[e^{i\zeta X_2} k_{h_n^y}(Y - y)\,k_{h_n^z}(Z - z)\big] \\
&\qquad\qquad \times\Big[\exp\Big(\int_0^{\zeta}\frac{i\,E[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\,d\xi\Big) + o_p(1)\Big]\Big/\big[E[e^{i\zeta X_2}]f(y, z) + o_p(1)\big] - \phi\Bigg)e^{-i\zeta x}\,d\zeta\Bigg]\,dx\,dy\,dz \\
&= \sqrt n\Bigg\{\frac{1}{n}\sum_j\int \psi(Y_j - x^\top\beta_0 - Z_j^\top\delta_0)[x\ Z_j]\cdot\Bigg[\frac{1}{2\pi}\int \kappa(h_n^x\zeta)\,
\big[e^{i\zeta X_{2j}}\big]\Big[\exp\Big(\int_0^{\zeta}\frac{i\,E[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\,d\xi\Big) + o_p(1)\Big] \\
&\qquad\qquad \Big/\big[E[e^{i\zeta X_2}]f(y, z) + o_p(1)\big]\,e^{-i\zeta x}\,d\zeta\Bigg]\,dx - \phi\Bigg\} + o_p(n^{-1/2}),
\end{align*}
which converges weakly, and the result follows.
Proof of Lemma 1. The proof is a direct application of Theorem B in Chen, Linton, and Van Keilegom (2003) and parallels the proof of weak convergence above.
References
Amin, V. (2011): “Returns to Education: Evidence from UK Twins: Comment,” American Economic Review, 101, 1629–1635.
Arias, O., K. F. Hallock, and W. Sosa-Escudero (2001): “Individual Heterogeneity in the
Returns to Schooling: Instrumental Variables Quantile Regression Using Twins Data,” Empirical
Economics, 26, 7–40.
Ashenfelter, O., and A. Krueger (1994): “Estimates of the Economic Return to Schooling
from a New Sample of Twins,” American Economic Review, 84, 1157–1173.
Black, D., S. Sanders, and L. Taylor (2003): “Measurement of Higher Education in the
Census and Current Population Survey,” Journal of the American Statistical Association, 98,
545–554.
Bonjour, D., L. F. Cherkas, J. E. Haskel, D. D. Hawkes, and T. D. Spector (2003):
“Returns to Education: Evidence from U.K. Twins,” American Economic Review, 93, 1799–1812.
Bound, J., C. Brown, and N. Mathiowetz (1999): “Measurement Error in Survey Data,” in
Handbook of Econometrics, ed. by J. Heckman, and E. Leamer, vol. 5. North-Holland, Amsterdam.
Card, D. (1995): “Earnings, Schooling, and Ability Revisited,” in Research in Labor Economics,
ed. by S. Polachek, vol. 14. JAI Press.
(1999): “The Causal Effect of Education on Earnings,” in Handbook of Labor Economics,
ed. by O. Ashenfelter, and D. Card, vol. 3. Amsterdam: Elsevier.
Carroll, R. J., D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu (2006): Measurement
Error in Nonlinear Models. Chapman & Hall, Boca Raton, Florida.
Chen, X., O. Linton, and I. Van Keilegom (2003): “Estimation of Semiparametric Models
When the Criterion Function is not Smooth,” Econometrica, 71, 1591–1608.
Chernozhukov, V., and C. Hansen (2005): “An IV Model of Quantile Treatment Effects,”
Econometrica, 73, 245–261.
(2006): “Instrumental Quantile Regression Inference for Structural and Treatment Effects
Models,” Journal of Econometrics, 132, 491–525.
(2008): “Instrumental Variable Quantile Regression: A Robust Inference Approach,”
Journal of Econometrics, 142, 379–398.
Chesher, A. (2001): “Parameter Approximations for Quantile Regressions with Measurement
Error,” Working Paper CWP02/01, Department of Economics, University College London.
Delaigle, A., P. Hall, and A. Meister (2008): “On Deconvolution With Repeated Measurements,” The Annals of Statistics, 36, 665–685.
Fan, J. (1991a): “Asymptotic Normality for Deconvolution Kernel Density Estimators,” Sankhyā
: The Indian Journal of Statistics, Series A, 53, 97–110.
(1991b): “On the Optimal Rates of Convergence for Nonparametric Deconvolution Problems,” The Annals of Statistics, 19, 1257–1272.
Fan, J., and Y. K. Truong (1993): “Nonparametric Regression with Errors in Variables,” The
Annals of Statistics, 21, 1900–1925.
Galvao, A. F., and L. Wang (2015): “Uniformly Semiparametric Efficient Estimation of Treatment Effects with a Continuous Treatment,” Journal of the American Statistical Association,
forthcoming.
Harmon, C. P., and H. Oosterbeek (2000): “The Returns to Education: A Review of the
Evidence, Issues and Deficiencies in the Literature,” Centre for the Economics of Education,
LSE.
Hausman, J., Y. Luo, and C. Palmer (2014): “Errors in the Dependent Variable of Quantile
Regression Models,” Working Paper, MIT.
He, X., and H. Liang (2000): “Quantile Regression Estimates for a Class of Linear and Partially
Linear Errors-in-Variables Models,” Statistica Sinica, 10, 129–140.
He, X., and Q.-M. Shao (1996): “A General Bahadur Representation of M-estimators and
Its Application to Linear Regression with Nonstochastic Designs,” The Annals of Statistics, 6,
2608–2630.
(2000): “On Parameters of Increasing Dimensions,” Journal of Multivariate Analysis, 73,
120–135.
Hu, Y., and Y. Sasaki (2015): “Closed-form Estimation of Nonparametric Models with Nonclassical Measurement Errors,” Journal of Econometrics, 185, 392–408.
Kane, T., C. Rouse, and D. Staiger (1999): “Estimating Returns to Schooling When Schooling
is Misreported,” NBER Working Papers No. 7235.
Koenker, R. (2005): Quantile Regression. Cambridge University Press, New York, New York.
Koenker, R., and G. W. Bassett (1978): “Regression Quantiles,” Econometrica, 46, 33–49.
Kosorok, M. (2008): Introduction to Empirical Processes and Semiparametric Inference. Springer.
Kotlarski, I. (1967): “On Characterizing the Gamma and the Normal Distribution,” Pacific
Journal of Mathematics, 20, 69–76.
Li, T. (2002): “Robust and Consistent Estimation of Nonlinear Errors-in-variables Models,” Journal of Econometrics, 110, 1–26.
Li, T., and Q. Vuong (1998): “Nonparametric Estimation of the Measurement Error Model
Using Multiple Indicators,” Journal of Multivariate Analysis, 65, 139–165.
Ma, Y., and G. Yin (2011): “Censored Quantile Regression with Covariate Measurement Errors,”
Statistica Sinica, 21, 949–971.
Montes-Rojas, G. V. (2011): “Quantile Regression with Classical Additive Measurement Errors,” Economics Bulletin, 31, 2863–2868.
Newey, W. K., and D. L. McFadden (1994): “Large Sample Estimation and Hypothesis
Testing,” in Handbook of Econometrics, Vol. 4, ed. by R. F. Engle, and D. L. McFadden. North
Holland, Elsevier, Amsterdam.
Pakes, A., and D. Pollard (1989): “Simulations and the Asymptotics of Optimization Estimators,” Econometrica, 57, 1027–1057.
Politis, D. N., and J. P. Romano (1999): “Multivariate Density Estimation with General
Flat-Top Kernels of Infinite Order,” Journal of Multivariate Analysis, 68, 1–25.
Powell, J. L. (1983): “The Asymptotic Normality of Two-Stage Least Absolute Deviations
Estimators,” Econometrica, 51, 1569–1575.
Schennach, S. M. (2004): “Estimation of Nonlinear Models with Measurement Error,” Econometrica, 72, 33–75.
(2007): “Instrumental Variable Estimation of Nonlinear Errors-in-Variables Models,”
Econometrica, 75, 201–239.
(2008): “Quantile Regression with Mismeasured Covariates,” Econometric Theory, 24,
1010–1043.
Torres-Saavedra, P. (2013): “Quantile Regression for Repeated Responses Measured with Error,” North Carolina State University, mimeo.
Wang, H. J., L. A. Stefanski, and Z. Zhu (2012): “Corrected-Loss Estimation for Quantile
Regression with Covariate Measurement Errors,” Biometrika, 99, 405–421.
Wei, Y., and R. J. Carroll (2009): “Quantile regression with measurement error,” Journal of
the American Statistical Association, 104, 1129–1143.
Wu, Y., Y. Ma, and G. Yin (2014): “Smoothed and Corrected Score Approach to Censored
Quantile Regression With Measurement Errors,” Journal of the American Statistical Association,
forthcoming.
Table 1: Fourier Estimator

hx \ hy        0.2        0.3        0.4        0.5        0.6
1.5  B       0.09592    0.07751    0.09131    0.06610    0.09373
     SD      0.09765    0.08403    0.11718    0.06085    0.08298
     MSE     0.01874    0.01307    0.02207    0.00807    0.01567
1.6  B       0.09770    0.07553    0.08439    0.08457    0.09696
     SD      0.09206    0.07425    0.08380    0.07819    0.07446
     MSE     0.01802    0.01122    0.01414    0.01327    0.01495
1.7  B       0.10225    0.06963    0.07425    0.07394    0.09968
     SD      0.10788    0.04347    0.09206    0.08228    0.08720
     MSE     0.02209    0.00674    0.01399    0.01224    0.01754
1.8  B       0.09728    0.07464    0.07571    0.08745    0.10426
     SD      0.08762    0.05453    0.06794    0.07288    0.07359
     MSE     0.01714    0.00855    0.01035    0.01296    0.01629
1.9  B       0.08473    0.09663    0.08869    0.10269    0.10458
     SD      0.02795    0.09039    0.08893    0.11265    0.08798
     MSE     0.007961   0.017507   0.015774   0.023235   0.018679

Optimal: hx = 1.7, hy = 0.3 (B = 0.06963, SD = 0.04347, MSE = 0.00674)
Table 2: Infeasible Kernel Estimator

hx \ hy        0.1        0.2        0.3        0.4        0.5
0.1  B       0.00706    0.01670    0.01124    0.01097    0.03331
     SD      0.04271    0.04013    0.04270    0.04468    0.04288
     MSE     0.00187    0.00189    0.00195    0.00212    0.00295
0.2  B       0.01065    0.00200    0.00537    0.01181    0.03037
     SD      0.03394    0.03158    0.03454    0.02594    0.02880
     MSE     0.00127    0.00100    0.00122    0.00081    0.00175
0.3  B       0.00513    0.00737    0.00777    0.01121    0.01890
     SD      0.02867    0.02620    0.02737    0.02755    0.02652
     MSE     0.00085    0.00074    0.00081    0.00088    0.00106
0.4  B       0.01320    0.00935    0.01242    0.01729    0.02802
     SD      0.02411    0.02800    0.02527    0.02289    0.02464
     MSE     0.00076    0.00087    0.00079    0.00082    0.00139
0.5  B       0.02175    0.01778    0.01631    0.02742    0.03347
     SD      0.02270    0.02195    0.02541    0.02582    0.02875
     MSE     0.00099    0.00080    0.00091    0.00142    0.00195

Optimal: hx = 0.3, hy = 0.2 (B = 0.00737, SD = 0.02620, MSE = 0.00074)
Table 3: Naive Kernel Estimator

hx \ hy        0.1        0.2        0.3        0.4        0.5
0.1  B       0.11254    0.11132    0.10409    0.10296    0.13249
     SD      0.04776    0.04581    0.04199    0.04449    0.04084
     MSE     0.01495    0.01449    0.01260    0.01258    0.01922
0.2  B       0.09695    0.09553    0.10231    0.10989    0.12630
     SD      0.03399    0.03093    0.03381    0.02886    0.03153
     MSE     0.01055    0.01008    0.01161    0.01291    0.01695
0.3  B       0.10133    0.09800    0.10012    0.10388    0.11663
     SD      0.02957    0.02854    0.02820    0.02557    0.02600
     MSE     0.01114    0.01042    0.01082    0.01145    0.01428
0.4  B       0.10243    0.09939    0.10476    0.10609    0.11884
     SD      0.02715    0.03008    0.02441    0.02245    0.02630
     MSE     0.01123    0.01078    0.01157    0.01176    0.01481
0.5  B       0.10757    0.10313    0.10299    0.11340    0.11782
     SD      0.02196    0.02276    0.02703    0.02525    0.02700
     MSE     0.01205    0.01115    0.01134    0.01350    0.01461

Optimal: hx = 0.2, hy = 0.2 (B = 0.09553, SD = 0.03093, MSE = 0.01008)
Table 4: Simulation Results over Various Quantiles

τ \ estimator        Fourier    Infeasible    Naive
τ = 0.2   B         0.07068     0.00510     0.09976
          SD        0.06196     0.02002     0.02746
          MSE       0.00883     0.00043     0.01070
τ = 0.3   B         0.06372     0.00515     0.09785
          SD        0.05685     0.01775     0.02568
          MSE       0.00729     0.00034     0.01023
τ = 0.4   B         0.06014     0.00480     0.09737
          SD        0.05530     0.01734     0.02507
          MSE       0.00667     0.00032     0.01011
τ = 0.5   B         0.05943     0.00457     0.09778
          SD        0.05487     0.01770     0.02383
          MSE       0.00654     0.00033     0.01013
τ = 0.6   B         0.06029     0.00479     0.09818
          SD        0.05546     0.01792     0.02391
          MSE       0.00671     0.00034     0.01021
τ = 0.7   B         0.06326     0.00542     0.09915
          SD        0.05681     0.02003     0.02422
          MSE       0.00723     0.00043     0.01042
τ = 0.8   B         0.07213     0.00458     0.10130
          SD        0.05947     0.02265     0.02564
          MSE       0.00874     0.00053     0.01092
Table 5: Summary Statistics

                              MEAN      S.D.      MIN       MAX
Log of Wage                   2.117     0.571    -1.426     4.573
Education (self-report)      14.110     2.501    10.000    17.000
Education (twin's report)    13.925     2.526    10.000    17.000
Age                          42.477    10.050    21.000    59.000
Number of observations          428