NONPARAMETRIC INSTRUMENTAL VARIABLES
ESTIMATION OF A QUANTILE REGRESSION MODEL
Joel Horowitz
Sokbae Lee
THE INSTITUTE FOR FISCAL STUDIES
DEPARTMENT OF ECONOMICS, UCL
cemmap working paper CWP09/06
NONPARAMETRIC INSTRUMENTAL VARIABLES ESTIMATION OF A QUANTILE
REGRESSION MODEL
by
Joel L. Horowitz
Department of Economics
Northwestern University
Evanston, IL 60208
U.S.A.
and
Sokbae Lee
Department of Economics
University College London
London WC1E 6BT
United Kingdom
May 2006
Abstract
We consider nonparametric estimation of a regression function that is identified by
requiring a specified quantile of the regression “error” conditional on an instrumental
variable to be zero. The resulting estimating equation is a nonlinear integral equation of
the first kind, which generates an ill-posed-inverse problem. The integral operator and
distribution of the instrumental variable are unknown and must be estimated
nonparametrically. We show that the estimator is mean-square consistent, derive its rate
of convergence in probability, and give conditions under which this rate is optimal in a
minimax sense. The results of Monte Carlo experiments show that the estimator behaves
well in finite samples.
JEL Codes: C13, C31
Key words: Statistical inverse, endogenous variable, instrumental variable, optimal rate,
nonlinear integral equation, nonparametric regression
The research of Joel L. Horowitz was supported in part by NSF Grant 00352675. The work of
both authors was supported in part by ESRC Grant RES-000-22-0704 and by the Leverhulme
Trust through its funding of the Centre for Microdata Methods and Practice and the research
program “Evidence, Inference, and Inquiry.”
1. Introduction
Quantile regression models are increasingly important in applied econometrics. This
paper is concerned with nonparametric estimation of a quantile regression model that has a
possibly endogenous explanatory variable and is identified through an instrumental variable.
Specifically, we estimate the function g in the model
(1.1)
Y = g( X ) + U
(1.2)
P (U ≤ 0 | W = w) = q ,
where Y is the dependent variable, X is an explanatory variable, W is an instrument for X , U
is an unobserved random variable, and q is a known constant satisfying 0 < q < 1 . We do not
assume that P (U ≤ 0 | X = x) is independent of x . Therefore, the explanatory variable X may
be endogenous in the quantile regression model (1.1)-(1.2). The function g is assumed to satisfy
regularity conditions but is otherwise unknown. In particular, it does not belong to a known,
finite-dimensional parametric family. The data are an iid random sample, {Yi, Xi, Wi : i = 1, ..., n},
of (Y , X ,W ) . We present an estimator of g , derive its L2 rate of convergence in probability,
and provide conditions under which this rate is the fastest possible in a minimax sense.
Estimators of linear quantile regression models with endogenous right-hand side
variables are described by Amemiya (1982), Powell (1983), Chen and Portnoy (1996),
Chernozhukov and Hansen (2005a), and Ma and Koenker (2006). Chernozhukov and Hansen
(2004) and Januszewski (2002) use such models in economic applications.
Research on
nonparametric estimation of quantile regression models is relatively recent. Chesher (2003)
considers a triangular-array structure that is not necessarily additively separable and investigates
nonparametric identification of derivatives of the unknown functions in that structure.
Chernozhukov and Hansen (2005b) give conditions under which g in (1.1)-(1.2) is identified.
Chernozhukov, Imbens, and Newey (2004) give conditions for consistency of a series estimator
of g in a quantile instrumental-variables model that is not necessarily additively separable. The
rate of convergence of their estimator is unknown.
There has also been research on nonparametric estimation of g in the model

(1.3)   Y = g(X) + U ;   E(U | W = w) = 0 .
Here, as in (1.1)-(1.2), X is a possibly endogenous explanatory variable and W is an instrument
for X , but the quantile restriction (1.2) is replaced by the condition E (U | W = w) = 0 . Blundell
and Powell (2003); Darolles, Florens, and Renault (2002); Florens (2003); Newey and Powell
(2003); Newey, Powell, and Vella (1999); Carrasco, Florens, and Renault (2005); Horowitz
(2005); and Hall and Horowitz (2005) discuss estimators of g in (1.3). Our estimator of g in
(1.1)-(1.2) is related to Hall and Horowitz's (2005) estimator of g in (1.3), but for reasons that will now be explained, estimating g in (1.1)-(1.2) presents problems that are different from those of estimating g in (1.3).
In both (1.1)-(1.2) and (1.3), the relation that identifies g is an operator equation,
(1.4)   Tg = θ ,

say, where T is a non-singular integral operator and θ is a function. T and θ are unknown but can be estimated consistently without difficulty. However, T⁻¹ is discontinuous in both (1.1)-(1.2) and (1.3), so g cannot be estimated consistently by replacing T and θ in (1.4) with
consistent estimators. This “ill-posed inverse problem” is familiar in the literature on integral
equations. See, for example, Groetsch (1984); Engl, Hanke, and Neubauer (1996); and Kress
(1999). It is dealt with by regularizing (that is, modifying) T to make the resulting inverse
operator continuous. As in Darolles, Florens, and Renault (2002) and Hall and Horowitz (2005),
we use Tikhonov (1963a, 1963b) regularization. This consists of choosing the estimator ĝ to
solve

(1.5)   minimize over φ ∈ G :   ‖T̂φ − θ̂‖² + an‖φ‖² ,
where G is a parameter set (a set of functions in this case), Tˆ and θˆ , respectively, are consistent
estimators of T and θ ; {an } is a sequence of non-negative constants that converges to 0 as
n → ∞; and ‖ν‖² = ∫ ν(x)² dx for any square-integrable function ν. In model (1.3), T and T̂ are linear operators, and the
first-order condition for (1.5) is a linear integral equation (a Fredholm equation of the first kind).
In (1.1)-(1.2), however, T and Tˆ are nonlinear operators, and the first-order condition for (1.5)
is a nonlinear integral equation.
The nonlinearity of T in (1.1)-(1.2) complicates the task of deriving the rate of convergence of ĝ to g. In contrast to the situation in many other nonlinear estimation
problems, using a Taylor series expansion to make a linear approximation to the first-order
condition is unattractive because, as a consequence of the ill-posed inverse problem, the
approximation error dominates other sources of estimation error and controls the rate of
convergence of ĝ unless very strong assumptions are made about the probability density
function of (Y , X ,W ) . To avoid making these assumptions, we use a modified version of a
method developed by Engl, Hanke, and Neubauer (1996, Theorem 10.7) to derive the rate of
convergence of ĝ . This method works directly from the objective function in (1.5), rather than
from the first-order condition. It requires us to assume that the norm of the second Fréchet
derivative of T is not too large (see assumption 6 in Section 3.2). It is an open question whether
the same rate of convergence of ĝ can be attained without making this assumption or one that is
similar to it.
The remainder of the paper is organized as follows. Section 2 presents the estimator for
the special case in which X and W are scalars. Section 3 gives conditions under which the
estimator is consistent, obtains its rate of convergence, and shows that the rate of convergence is
the fastest possible (in a minimax sense) under our assumptions. Section 4 extends the results of
sections 2 and 3 to a multivariate model in which X is a vector that may have some exogenous
components. Section 5 presents the results of a Monte Carlo investigation of the estimator’s
finite-sample performance.
Concluding comments are given in Section 6.
The proofs of
theorems are in the appendix, which is Section 7.
2. The Estimator
This section describes our estimator of g in (1.1)-(1.2) when X and W are scalars. Let
FY|XW denote the distribution function of Y conditional on (X, W). We assume that the conditional distribution of Y has a probability density function, fY|XW, with respect to Lebesgue measure. We also assume that (X, W) has a probability density function with respect to Lebesgue measure, fXW. Let fW denote the marginal density of W. Define FYXW = FY|XW fXW and fYXW = fY|XW fXW. Assume without loss of generality that the support of (X, W) is contained in [0,1]².
It follows from (1.1)-(1.2) that

(2.1)   ∫₀¹ FYXW[g(x), x, w] dx = q fW(w)
for almost every w . We assume that (2.1) uniquely identifies g up to a set of x values whose
Lebesgue measure is 0.
Chernozhukov and Hansen (2005b, Theorem 4) give sufficient
conditions for this assumption to hold.
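For the reader's convenience, (2.1) follows from (1.1)-(1.2) by a short calculation; this sketch uses only objects already defined:

```latex
% U \le 0 iff Y \le g(X) by (1.1), so the quantile restriction (1.2) gives, for a.e. w,
\[
q = P\{U \le 0 \mid W = w\}
  = \int_0^1 F_{Y|XW}[g(x), x, w]\, f_{X|W}(x \mid w)\, dx ,
\]
% and multiplying by f_W(w), using f_{XW} = f_{X|W} f_W and F_{YXW} = F_{Y|XW} f_{XW}:
\[
\int_0^1 F_{YXW}[g(x), x, w]\, dx = q f_W(w),
\]
% which is (2.1).
```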
Now define the operator T on L²[0,1] by

(2.2)   (Tφ)(w) = ∫₀¹ FYXW[φ(x), x, w] dx ,

where φ is any function in L²[0,1]. Then (2.1) is equivalent to the operator equation
Tg = qfW .
Identifiability of g is equivalent to assuming that T is invertible. Thus,

(2.3)   g = q T⁻¹ fW .

However, T⁻¹ is discontinuous because the Fréchet derivative of T is a completely continuous
operator and, therefore, has an unbounded inverse if fYXW is “well behaved”. Consequently, g
cannot be estimated consistently by replacing T and fW with consistent estimators in (2.3). As
was explained in Section 1, we use Tikhonov regularization to deal with this problem.
We now describe the version of problem (1.5) that we solve to obtain the regularized estimator of g. We assume that fYXW has r > 0 continuous derivatives with respect to any combination of its arguments. Let K denote a continuously differentiable kernel function that is supported on [−1,1], is symmetrical about 0, and satisfies

(2.4)   ∫₋₁¹ vʲ K(v) dv = 1 if j = 0,  and 0 if 1 ≤ j ≤ max(1, r − 1).
Let h > 0 denote a bandwidth parameter, and define K h (v ) = K (v / h) for any v ∈ [−1,1] . We
estimate fW, fYXW, and FYXW, respectively, by

f̂W(w) = (nh)⁻¹ Σ_{i=1}^{n} Kh(w − Wi) ,

f̂YXW(y, x, w) = (nh³)⁻¹ Σ_{i=1}^{n} Kh(y − Yi) Kh(x − Xi) Kh(w − Wi) ,

and

F̂YXW(y, x, w) = ∫_{−∞}^{y} f̂YXW(v, x, w) dv .
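The three kernel estimators above are straightforward to compute. The following sketch is our own illustration (not the authors' code); it uses the quartic kernel that appears in Section 5, and integrates the y-direction kernel analytically to obtain F̂YXW:

```python
import numpy as np

# Quartic (biweight) kernel of Section 5 and its antiderivative (CDF).
def K(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, (15.0 / 16.0) * (1 - u**2)**2, 0.0)

def K_cdf(t):
    # \int_{-1}^{t} K(v) dv, clipped so the CDF is 0 below -1 and 1 above 1.
    t = np.clip(t, -1.0, 1.0)
    return (15.0 / 16.0) * (t - 2 * t**3 / 3 + t**5 / 5) + 0.5

def f_W_hat(w, W, h):
    """Kernel density estimate (1/(nh)) sum_i K_h(w - W_i)."""
    return np.mean(K((w - W) / h)) / h

def F_YXW_hat(y, x, w, Y, X, W, h):
    """Estimate of F_YXW(y, x, w); integrating K_h in the y direction
    turns it into h * K_cdf((y - Y_i)/h), hence the 1/(n h^2) factor."""
    n = len(Y)
    return np.sum(K_cdf((y - Y) / h) * K((x - X) / h) * K((w - W) / h)) / (n * h**2)

# Toy data purely for illustration:
rng = np.random.default_rng(0)
n = 500
W = rng.uniform(0, 1, n)
X = np.clip(W + 0.1 * rng.standard_normal(n), 0, 1)
Y = X**2 + 0.1 * rng.standard_normal(n)
print(f_W_hat(0.5, W, h=0.2))                    # roughly 1 for uniform W
print(F_YXW_hat(0.3, 0.5, 0.5, Y, X, W, h=0.2))
```

Because F̂YXW integrates a nonnegative kernel, it is nondecreasing in y, which is what the operator T̂ below requires.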
Define the operator T̂ by

(T̂φ)(w) = ∫₀¹ F̂YXW[φ(x), x, w] dx

for any φ ∈ L²[0,1]. Let ‖φ‖ denote the L² norm of φ. Define G = {φ ∈ L²[0,1] : ‖φ‖² ≤ M} for
some constant M < ∞ . Our estimator of g is any solution to the problem
(2.5)   ĝ = arg min_{φ∈G} Sn(φ) ≡ ‖T̂φ − q f̂W‖² + an‖φ‖² .
Under our assumptions, a function ĝ that minimizes S n always exists, though it may not be
unique (Bissantz, Hohage, and Munk 2004).
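To make the minimization in (2.5) concrete, the sketch below discretizes [0,1] and minimizes the Tikhonov objective numerically. For illustration it uses a stand-in F that is linear in its first argument, so the resulting T̂ is a linear integral operator with a kernel k(x, w) of our own invention; in the model itself T̂ is nonlinear and is built from F̂YXW:

```python
import numpy as np
from scipy.optimize import minimize

# Discretize [0,1]; integrals become Riemann sums.
m = 51
x = np.linspace(0.0, 1.0, m)
dx = x[1] - x[0]

# Stand-in for Fhat_YXW(t, x, w), linear in t, so That is the linear
# integral operator with kernel k (chosen arbitrarily for this demo).
def k(xx, ww):
    return 1.0 + 0.5 * np.cos(np.pi * (xx - ww))

K_mat = k(x[None, :], x[:, None])          # K_mat[i, j] = k(x_j, w_i)

def T_hat(phi):
    # (That phi)(w) = \int_0^1 F[phi(x), x, w] dx with F(t, x, w) = t k(x, w)
    return (K_mat @ phi) * dx

g_true = np.sin(np.pi * x)
theta = T_hat(g_true)                      # plays the role of q * fhat_W

a_n = 1e-4                                 # regularization parameter
def S_n(phi):
    resid = T_hat(phi) - theta
    return np.sum(resid**2) * dx + a_n * np.sum(phi**2) * dx

res = minimize(S_n, np.zeros(m), method="L-BFGS-B", options={"maxiter": 2000})
g_hat = res.x
print(np.max(np.abs(g_hat - g_true)))      # small here: only Tikhonov bias
```

Without the an‖φ‖² term the discretized problem is badly conditioned; the penalty is what makes the minimizer stable, at the cost of the regularization bias discussed below.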
3. Theoretical Properties
This section gives conditions under which the estimator ĝ of Section 2 is consistent, derives the rate at which ‖ĝ − g‖² converges in probability to 0, and gives conditions under which this is the fastest possible rate, in a minimax sense.
3.1 Consistency
This section gives conditions under which E‖ĝ − g‖² → 0 as n → ∞. Define δn = h^{2r} + (nh)⁻¹. Make the following assumptions.
Assumption 1: (a) The function g is identified. That is, (2.1) specifies g(x) uniquely up to a set of x values whose Lebesgue measure is 0. (b) ‖g‖² ≤ M for some constant M < ∞.
Assumption 2: fYXW has r > 0 continuous derivatives with respect to any combination
of its arguments. These derivatives and fYXW are bounded in absolute value by M .
Assumption 3: As n → ∞ , an → 0 , δ n → 0 , and δ n / an → 0 .
Assumption 4:
The kernel function K
is supported on [−1,1] , continuously
differentiable, symmetrical about 0, and satisfies (2.4).
In assumption 2, r need not be an integer. If r is not an integer, then the assumption means that

|D^{[r]}[fYXW(y₁, x₁, w₁) − fYXW(y₂, x₂, w₂)]| ≤ M ‖(y₁, x₁, w₁) − (y₂, x₂, w₂)‖^{r−[r]} ,

where [r] is the integer part of r, D^{[r]} fYXW denotes any order-[r] derivative of fYXW, and ‖z₁ − z₂‖ is the Euclidean distance between the points z₁ and z₂.
Assumptions 1(b) and 2 are standard in nonparametric estimation. Assumptions 3 and 4
are satisfied by a wide range of choices of h , an and K . The choice of h in applications is
discussed in Section 3.2.
Theorem 1: Let assumptions 1-4 hold. Then

(3.1)   lim_{n→∞} E‖ĝ − g‖² = 0 .
Result (3.1) implies that ‖ĝ − g‖² converges in probability to 0 as n → ∞. The next section
obtains the rate of convergence.
3.2 Rate of Convergence
In model (1.3), where T is a linear operator, the source of the ill-posed inverse problem is that the sequence of singular values of T (or, equivalently, of eigenvalues of T*T, where T* is the adjoint of T) converges to 0. Consequently, the rate at which ‖ĝ − g‖² converges to 0 depends on the rate of convergence of the singular values (or eigenvalues). See Hall and Horowitz (2005). In (1.1)-(1.2), where T is nonlinear, the source of the ill-posed inverse problem is convergence to 0 of the singular values of the Fréchet derivative of T at g. Denote this derivative by Tg. The rate of convergence of ‖ĝ − g‖² in (1.1)-(1.2) depends on the rate of convergence of the singular values of Tg or, equivalently, of the eigenvalues of Tg*Tg, where Tg* is the adjoint of Tg. Accordingly, the regularity conditions for our rate-of-convergence result are framed in terms of the spectral representation of Tg*Tg.
It is easy to show that the Fréchet derivative of T at g is the operator Tg defined by

(Tg φ)(w) = ∫₀¹ fYXW[g(x), x, w] φ(x) dx .

The adjoint operator is defined by

(Tg* φ)(w) = ∫₀¹ fYXW[g(w), w, x] φ(x) dx .
We assume that Tg*Tg is non-singular, so its eigenvalues are strictly positive. Let {λj, φj : j = 1, 2, ...} denote the eigenvalues and orthonormal eigenvectors of Tg*Tg ordered so that λ₁ ≥ λ₂ ≥ ... > 0. Under our assumptions, Tg*Tg is a completely continuous operator, so {φj} forms a basis for L²[0,1]. Therefore, we may write

(3.2)   g(x) = Σ_{j=1}^{∞} bj φj(x) ,
where the Fourier coefficients bj are given by

bj = ∫₀¹ g(x) φj(x) dx .
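The expansion (3.2) is easy to illustrate numerically. Taking the orthonormal sine basis φj(x) = √2 sin(jπx) of L²[0,1] and the design function g used in the Monte Carlo experiments of Section 5 (for which bj = (−1)^{j+1} j⁻⁶ in this basis, so β ≈ 6), the coefficients can be recovered by quadrature:

```python
import numpy as np

# Orthonormal sine basis of L2[0,1]: phi_j(x) = sqrt(2) sin(j pi x).
x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]

def phi(j_, x_):
    return np.sqrt(2.0) * np.sin(j_ * np.pi * x_)

def g(x_, J=25):
    """Design g from Section 5, truncated at J terms for computation."""
    jj = np.arange(1, J + 1)
    terms = (-1.0)**(jj + 1) * jj**(-6.0)
    return np.sqrt(2.0) * np.sin(np.outer(np.atleast_1d(x_), jj) * np.pi) @ terms

gx = g(x)
# b_j = \int_0^1 g(x) phi_j(x) dx; for this g the coefficients should come
# out close to (-1)^{j+1} j^{-6}, i.e. rapid (beta ~ 6) decay.
for j in range(1, 5):
    b_j = np.sum(gx * phi(j, x)) * dx
    print(j, b_j)
```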
We make the following additional assumptions.
Assumption 5: (a) There are constants α > 1, β > 1, and C₀ < ∞ such that β − 1/2 ≤ α < 2β − 1, j^{−α} ≤ C₀ λj, and |bj| ≤ C₀ j^{−β} for all j ≥ 1. (b) Moreover, r ≥ max[(2β + α − 1)/2, (3β + α − 1/2)/(α + 1)].
Assumption 6: There is a finite constant L > 0 such that

(3.3)   ‖T(g₁) − T(g₂) − Tg₂(g₁ − g₂)‖ ≤ 0.5 L ‖g₁ − g₂‖²

for any g₁, g₂ ∈ L²[0,1] and

(3.4)   Σ_{j=1}^{∞} bj²/λj < 1/L .
Assumption 7: The tuning parameters h and an satisfy h = Ch n^{−1/(2r+1)} and an = Ca n^{−α/(2β+α)}, where Ch and Ca are positive, finite constants.
In assumption 5(a), α characterizes the severity of the ill-posed inverse problem. As α
increases, the problem becomes more severe and, consequently, the fastest possible rate of
convergence of any estimator of g decreases. The parameter β characterizes the complexity of
g in the sense that if β is large, then g can be well approximated by the finite-dimensional
parametric model that is obtained by truncating the series on the right-hand side of (3.2) at j = J
for some small integer J. A finite-dimensional g can be estimated with an n^{−1/2} rate of convergence, so the rate of convergence of ‖ĝ − g‖² increases as β increases.
Assumption 5(a) places tight restrictions on the rates of convergence of λj and bj. This
is unavoidable for obtaining optimal rates of convergence with Tikhonov regularization. The
restrictions are not needed for consistent estimation, as Theorem 1 shows. In linear inverse
problems, the restrictions can sometimes be relaxed by using other forms of regularization. See,
for example, Carrasco, Florens, and Renault (2005) and Bissantz, Hohage, Munk, and Ruymgaart
(2006). However, there has been little research on the application of non-Tikhonov methods to
nonlinear inverse problems, especially when T in (1.4) is unknown and must be estimated.
Much additional research will be needed to determine the extent to which such methods are
useful in nonlinear problems with an unknown T .
Together, assumptions 2 and 5(a) imply that Tg*Tg is a completely continuous linear
operator, so its eigenvectors form a basis for L2 [0,1] . Assumption 2 implies that inequality (3.3)
holds for some L < ∞ , so (3.3) amounts to the definition of L . Inequality (3.4) restricts the norm
of the second Fréchet derivative of T . A sufficient condition for (3.4) is
sup_{y,x,w} |∂fYXW(y, x, w)/∂y| < ( Σ_{j=1}^{∞} bj²/λj )⁻¹ .
Restrictions similar to (3.4) are well-known in the theory of nonlinear integral equations. It is an
open question whether an estimator of g that has our rate of convergence can be achieved
without making an assumption similar to (3.4). Assumptions 2 and 5(b) imply that fW and Tφ for any φ ∈ L²[0,1] can be estimated with a rate of convergence in probability of Op[n^{−(β−1/2+α/2)/(2β+α)}] and that fYXW can be estimated with a rate of convergence of Op[n^{−(β−1/2)/(2β+α)}]. These rates are needed to obtain our rate of convergence of ĝ. Under
assumption 7, h has the optimal rate of convergence for nonparametric estimation of an r -times
differentiable density function of a scalar argument.
Accordingly, h can be chosen in
applications by, say, cross-validation applied to fˆW .
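Least-squares cross-validation applied to f̂W can be sketched as follows; this is our own minimal implementation, with a simulated sample standing in for the data:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.beta(2, 2, size=300)            # simulated sample for the instrument

def K(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, (15.0 / 16.0) * (1 - u**2)**2, 0.0)

def lscv(h, W):
    """Least-squares cross-validation criterion for the KDE of f_W:
    \\int fhat_W^2 - (2/n) sum_i fhat_{-i}(W_i)."""
    n = len(W)
    grid = np.linspace(-0.5, 1.5, 801)
    f_grid = K((grid[:, None] - W[None, :]) / h).mean(axis=1) / h
    int_f2 = np.sum(f_grid**2) * (grid[1] - grid[0])
    D = K((W[:, None] - W[None, :]) / h)
    np.fill_diagonal(D, 0.0)             # leave-one-out: drop the i = j term
    loo = D.sum(axis=1) / ((n - 1) * h)
    return int_f2 - 2.0 * loo.mean()

hs = np.linspace(0.05, 0.5, 19)
scores = np.array([lscv(h, W) for h in hs])
h_cv = hs[int(np.argmin(scores))]
print(h_cv)
```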
Let H = H( M , C0 , α , β , L) be the set of distributions of (Y , X ,W ) that satisfy
assumptions 1, 2, 5, and 6 with fixed values of M , C0 , α , β , and L . Our rate-of-convergence
result is given by the following theorem.
Theorem 2: Let assumptions 1-2 and 4-7 hold. Then

lim_{D→∞} limsup_{n→∞} sup_{H∈H} PH[ ‖ĝ − g‖² > D n^{−(2β−1)/(2β+α)} ] = 0 .
As expected, the rate of convergence of ‖ĝ − g‖² decreases as α increases (the ill-posed inverse problem becomes more severe) and increases as β increases (g increasingly resembles a finite-dimensional parametric model).
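The exponent in Theorem 2 is simple arithmetic in (α, β); a two-line check makes the comparative statics explicit:

```python
# L2 rate exponent from Theorem 2: ||ghat - g||^2 = Op(n^{-(2b-1)/(2b+a)}).
def rate_exponent(alpha, beta):
    return (2.0 * beta - 1.0) / (2.0 * beta + alpha)

# Larger alpha (more severe ill-posedness) -> slower rate:
print([round(rate_exponent(a, 3.0), 3) for a in (1.5, 2.0, 4.0)])  # [0.667, 0.625, 0.5]
# Larger beta (g closer to finite-dimensional) -> faster rate:
print([round(rate_exponent(2.0, b), 3) for b in (2.0, 3.0, 6.0)])  # [0.5, 0.625, 0.786]
```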
The next theorem shows that the convergence rate in Theorem 2 is optimal in a minimax
sense under our assumptions. Let {gn} denote any sequence of estimators of g satisfying ‖gn‖² ≤ M for some M < ∞.
Theorem 3: Let assumptions 1-2 and 4-7 hold with α ≥ 2. Then for every finite D > 0,

liminf_{n→∞} sup_{H∈H} PH[ ‖gn − g‖² > D n^{−(2β−1)/(2β+α)} ] > 0 .
Our proof of Theorem 3 requires α ≥ 2 . It is an open question whether the rate of
convergence n −(2 β −1) /(2 β +α ) is optimal when 1 < α < 2 .
4. Multivariate Model
This section extends the results of Sections 2 and 3 to a multivariate model in which X
is a vector that may have some exogenous components. We rewrite model (1.1)-(1.2) as
(4.1)   Y = g(X, Z) + U

(4.2)   P(U ≤ 0 | W = w, Z = z) = q ,

where Y is the dependent variable, X ∈ ℝ^A (A ≥ 1) is a vector of possibly endogenous explanatory variables, Z ∈ ℝ^m (m ≥ 0) is a vector of exogenous explanatory variables, W ∈ ℝ^A is a vector of instruments for X, U is an unobserved random variable, and q is a known constant satisfying 0 < q < 1. If m = 0, then Z is not in the model. The problem is to estimate g nonparametrically from data consisting of the independent random sample {Yi, Xi, Wi, Zi : i = 1, ..., n}.
The estimator is obtained by applying the method of Section 2 after conditioning on Z. To do this, let K_{A,h}(ν) = Π_{j=1}^{A} Kh(νj), where νj is the j'th component of the A-vector ν. Define K_{m,h}(ν) for an m-vector ν similarly. Let fWZ and fYXWZ, respectively, denote the probability density functions of (W, Z) and (Y, X, W, Z). Define

FYXWZ(y, x, w, z) = ∫_{−∞}^{y} fYXWZ(ν, x, w, z) dν .
Define the following kernel estimators of fWZ, fYXWZ, and FYXWZ:

f̂WZ(w, z) = (nh^{A+m})⁻¹ Σ_{i=1}^{n} K_{A,h}(w − Wi) K_{m,h}(z − Zi) ,

f̂YXWZ(y, x, w, z) = (nh^{2A+m+1})⁻¹ Σ_{i=1}^{n} Kh(y − Yi) K_{A,h}(x − Xi) K_{A,h}(w − Wi) K_{m,h}(z − Zi) ,

and

F̂YXWZ(y, x, w, z) = ∫_{−∞}^{y} f̂YXWZ(ν, x, w, z) dν .
For each z ∈ [0,1]^m, define the operators T^z and T̂^z on L²[0,1]^A by

(T^z φz)(w) = ∫_{[0,1]^A} FYXWZ[φz(x), x, w, z] dx

and

(T̂^z φz)(w) = ∫_{[0,1]^A} F̂YXWZ[φz(x), x, w, z] dx ,

where φz is any function in L²[0,1]^A.
The function g satisfies

(4.3)   (T^z g)(w, z) = q fWZ(w, z) .

The function g(·, z) is identified if (4.3) has a unique solution (up to a set of x values of Lebesgue measure 0) for the specified z. Define G = {φ ∈ L²[0,1]^A : ‖φ‖² ≤ M} for some constant M < ∞. For each z ∈ [0,1]^m, the estimator of g(·, z) is any solution to the problem

(4.4)   ĝ(·, z) = arg min_{φz∈G} { ∫_{[0,1]^A} [(T̂^z φz)(w) − q f̂WZ(w, z)]² dw + an ∫_{[0,1]^A} φz(w)² dw } .
4.1 Consistency

This section gives conditions under which E‖ĝ − g‖² → 0 as n → ∞ in model (4.1)-(4.2). Define δn = h^{2r} + (nh^{A+m})⁻¹. Make the following assumptions, which are extensions of the assumptions of Section 3.1.
Assumption 1′: (a) The function g is identified. (b) ∫_{[0,1]^A} g(x, z)² dx ≤ M for each z ∈ [0,1]^m and some constant M < ∞.
Assumption 2′: fYXWZ has r > 0 continuous derivatives with respect to any combination
of its arguments. These derivatives and fYXWZ are bounded in absolute value by M .
Assumption 3′: As n → ∞ , an → 0 , δ n → 0 , and δ n / an → 0 .
Theorem 4: Let assumptions 1′-3′ and 4 hold. Then for each z ∈ [0,1]^m,

lim_{n→∞} E ∫_{[0,1]^A} [ĝ(x, z) − g(x, z)]² dx = 0 .
4.2 Rate of Convergence
As when X and W are scalars, the rate of convergence of ĝ in the multivariate model
depends on the rate of convergence of the singular values of the Fréchet derivative of T^z.
Accordingly, let Tgz denote the Fréchet derivative of T^z at g, and let Tgz* denote the adjoint of Tgz. Tgz and Tgz*, respectively, are the operators defined by

(Tgz φz)(w) = ∫_{[0,1]^A} fYXWZ[g(x, z), x, w, z] φz(x) dx

and

(Tgz* φz)(w) = ∫_{[0,1]^A} fYXWZ[g(w, z), w, x, z] φz(x) dx .

Assume that for each z ∈ [0,1]^m, Tgz*Tgz is non-singular. Let {(λzj, φzj) : j = 1, 2, ...} denote the eigenvalues and eigenvectors of Tgz*Tgz ordered so that λz1 ≥ λz2 ≥ ... > 0. The eigenvectors {φzj} form a complete, orthonormal basis for L²[0,1]^A. Thus, for each z ∈ [0,1]^m we can write

g(x, z) = Σ_{j=1}^{∞} bzj φzj(x) ,

where

bzj = ∫_{[0,1]^A} g(x, z) φzj(x) dx .
Now make the following additional assumptions, which extend those of Section 3.2.
Assumption 5′: (a) There are constants α > 1, β > 1, and C₀ < ∞ such that β − 1/2 ≤ α < 2β − 1, j^{−α} ≤ C₀ λzj, and |bzj| ≤ C₀ j^{−β} uniformly in z ∈ [0,1]^m for all j ≥ 1. (b) Moreover, r ≥ max{[A(2β + α − 1) − m]/2, r*}, where r* is the largest root of the equation

4[(α + 1)/(2β + α)] r² − 2[(A + m)(2β − 1)/(2β + α) + A + 1 − m] r − (A + 1)m = 0 .

Assumption 6′: There is a finite constant L > 0 such that

‖T^z(g₁) − T^z(g₂) − Tg₂z(g₁ − g₂)‖ ≤ 0.5 L ‖g₁ − g₂‖²

for any g₁, g₂ ∈ L²[0,1]^A uniformly in z ∈ [0,1]^m, and

Σ_{j=1}^{∞} bzj²/λzj < 1/L

uniformly in z ∈ [0,1]^m.

Assumption 7′: The tuning parameters h and an satisfy h = Ch n^{−1/(2r+A+m)} and an = Ca n^{−τα/(2β+α)}, where τ = 2r/(2r + m) and Ch and Ca are positive, finite constants.
Let HM = HM ( M , C0 , α , β , L, r , A, m) be the set of distributions of (Y , X , Z ,W ) that
satisfy assumptions 1′, 2′, 5′, and 6′ with fixed values of M , C0 , α , β , L , r , A , and m . The
multivariate extension of Theorems 2 and 3 is
Theorem 5: Let assumptions 1′-2′, 4, and 5′-7′ hold. Then for each z ∈ [0,1]^m,

(4.5)   lim_{D→∞} limsup_{n→∞} sup_{H∈HM} PH{ ∫_{[0,1]^A} [ĝ(x, z) − g(x, z)]² dx > D n^{−τ(2β−1)/(2β+α)} } = 0 .

If, in addition,

(4.6)   [A(2β + α − 1) − m]/2 ≥ r* ,

then for each z ∈ [0,1]^m,

(4.7)   liminf_{n→∞} sup_{H∈HM} PH{ ∫_{[0,1]^A} [ĝ(x, z) − g(x, z)]² dx > D n^{−τ(2β−1)/(2β+α)} } > 0 .
If m = 0, then (4.6) simplifies to α ≥ 1 + 1/A. The rate of convergence in Theorem 5 is the same as that in Theorem 3 if A = 1 and m = 0. The theorem shows that increasing m decreases the rate of convergence of ĝ for any fixed r. This is the familiar curse of dimensionality of nonparametric estimation. In addition, assumption 5′ implies that as A increases, r must also increase to maintain the rate of convergence n^{−τ(2β−1)/(2β+α)}. This is a form of the curse of dimensionality that is associated with the endogenous explanatory variable X.
5. Monte Carlo Experiments
This section reports the results of Monte Carlo simulations that illustrate the finite-sample
performance of the estimator that is obtained by solving (2.5). Samples of (Y , X ,W ) were
generated from the model
Y = g(X ) + U ,
where
g(x) = 2^{1/2} Σ_{j=1}^{∞} (−1)^{j+1} j^{−6} sin(jπx)

and (U, X, W) is sampled from the distribution whose density is
fUXW(u, x, w) = C Σ_{j=1}^{∞} aj sin[jπ(u + 1)/2] sin(jπx) sin(jπw)
              + 0.15 Σ_{j=1}^{∞} dj sin[jπ(u + 1)] sin(2jπx) sin(jπw)

for (U, X, W) ∈ [−1,1] × [0,1]². In this density,

aj = j⁻³ if j = 1, 3, 5, ... and aj = 0 otherwise,

dj = j⁻¹⁰, and

C = [ (16/π³) Σ_{j=1}^{∞} aj/j³ ]⁻¹ .
The density is cumbersome algebraically, but it is convenient for computations and has a
conventional shape. This is illustrated in Figures 1-3, which show graphs of the density of W ,
the density of X conditional on W , and the density of U conditional on ( X ,W ) .
For
computational purposes, the infinite series were truncated at j = 100 .
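The design quantities are easy to reproduce. The sketch below (our own code, not the authors') builds aj, dj, and C, evaluates the truncated density, and confirms by a midpoint rule that it integrates to approximately 1 over [−1,1] × [0,1]²:

```python
import numpy as np

J = 100                                  # truncation used in the experiments
j = np.arange(1, J + 1)
a_j = np.where(j % 2 == 1, j**(-3.0), 0.0)
d_j = j**(-10.0)
C = 1.0 / ((16.0 / np.pi**3) * np.sum(a_j / j**3.0))

def f_UXW(u, x, w):
    """Truncated Monte Carlo design density on [-1,1] x [0,1]^2."""
    su = np.sin(np.multiply.outer(j, np.pi * (u + 1) / 2))
    sx = np.sin(np.multiply.outer(j, np.pi * x))
    sw = np.sin(np.multiply.outer(j, np.pi * w))
    s1 = np.einsum("i,i...->...", a_j, su * sx * sw)
    su2 = np.sin(np.multiply.outer(j, np.pi * (u + 1)))
    sx2 = np.sin(np.multiply.outer(2 * j, np.pi * x))
    s2 = np.einsum("i,i...->...", d_j, su2 * sx2 * sw)
    return C * s1 + 0.15 * s2

def g(x_):
    """Design function g(x) = sqrt(2) sum_j (-1)^{j+1} j^{-6} sin(j pi x)."""
    terms = (-1.0)**(j + 1) * j**(-6.0)
    return np.sqrt(2.0) * np.sin(np.outer(np.atleast_1d(x_), j) * np.pi) @ terms

# Midpoint-rule check that the truncated density integrates to about 1:
m = 20
u = -1.0 + 2.0 * (np.arange(m) + 0.5) / m
t = (np.arange(m) + 0.5) / m
U, X, W = np.meshgrid(u, t, t, indexing="ij")
total = f_UXW(U, X, W).sum() * (2.0 / m) * (1.0 / m) * (1.0 / m)
print(round(total, 3))                   # close to 1
```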
The kernel function used for density estimation is K ( x) = (15 /16)(1 − x 2 ) 2 for | x | ≤ 1 .
The estimates of g were computed by using the Levenberg-Marquardt method (Engl, Hanke and
Neubauer 1996, p. 285). The starting function was obtained by carrying out a cubic quantile
regression of Y on X without controlling for endogeneity.
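The uncorrected cubic quantile regression used for the starting function can be sketched by direct minimization of the Koenker-Bassett check loss; this is our own minimal implementation on simulated data, not the GAUSS code used in the experiments:

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, q):
    """Koenker-Bassett check function rho_q(u) = u (q - 1{u < 0})."""
    return u * (q - (u < 0))

def cubic_quantile_fit(Y, X, q):
    """Cubic q-th quantile regression of Y on X, ignoring endogeneity
    (as a starting value only).  Nelder-Mead handles the kink at 0;
    the OLS fit is used as the starting point."""
    Z = np.vander(X, 4, increasing=True)          # columns 1, x, x^2, x^3
    b0 = np.linalg.lstsq(Z, Y, rcond=None)[0]
    obj = lambda b: np.mean(check_loss(Y - Z @ b, q))
    return minimize(obj, b0, method="Nelder-Mead",
                    options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-10}).x

# Simulated illustration (not the paper's design):
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, 400)
Y = np.sin(np.pi * X) + 0.2 * rng.standard_normal(400)
b = cubic_quantile_fit(Y, X, q=0.5)
g0 = np.vander(np.array([0.5]), 4, increasing=True) @ b
print(g0[0])                                      # near sin(pi/2) = 1
```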
Each experiment consisted of estimating g at the 99 points x = 0.01,0.02,...,0.99 . The
experiments were carried out in GAUSS using GAUSS pseudo-random number generators.
There were 500 Monte Carlo replications in each experiment. Experiments were carried out
using sample sizes n = 200 and n = 800 .
The results are summarized in Table 1 and illustrated graphically in Figures 4-5,
respectively, for n = 200 and n = 800 .
For each sample size, there are two values of the
bandwidth parameter, h (0.5 and 0.8), and two values of the regularization parameter, an (0.5
and 1). For estimation of fYXW the bandwidth 2h is used in the Y direction and h is used in the
X and W directions, because the standard deviation of Y is about double those of X and W .
The results in Table 1 show that the empirical integrated variance and integrated mean-square
error (IMSE) decrease as the sample size increases. The bias does not decrease if an remains
fixed. This is not surprising because an is the main source of estimation bias. Figures 4-5 show
g ( x) (dashed line), the Monte Carlo approximation to E[ gˆ ( x )] (solid line), and the estimates,
ĝ , whose IMSEs are the 25th, 50th, and 75th percentiles of the IMSEs of the 500 Monte Carlo
replications. The figures show, not surprisingly, that ĝ is biased but that its shape is similar to
that of g .
6. Conclusions
This paper has presented a nonparametric instrumental variables estimator of a quantile
regression model, derived the estimator’s rate of mean-square convergence in probability, and
given conditions under which this rate is the fastest possible in a minimax sense. The estimator’s
finite-sample performance has been illustrated by a small set of Monte Carlo experiments.
Several topics remain for future research.
The problem of deriving the asymptotic
distribution of ĝ appears to be quite difficult. In contrast to the situation in many other nonlinear
estimation problems, asymptotic normality cannot be obtained by using a Taylor series
approximation to linearize the first-order condition for (2.5). This is because, as was explained in
Section 1, the ill-posed inverse problem causes the error of the linear approximation to dominate
other sources of estimation error unless very strong assumptions are made about the distribution
of (Y , X ,W ) . This problem does not arise with the mean-regression estimator of Hall and
Horowitz (2005) and Horowitz (2005), because the first-order condition in the mean regression is
a linear equation for ĝ . Other topics for future research include determining whether the rate of
convergence of Theorem 2 is optimal when 1 < α < 2 (or when r < r * in Theorem 5) and finding
a method to choose the regularization parameter an in applications.
7. Mathematical Appendix: Proofs of Theorems
This appendix provides proofs of Theorems 1-3. Theorems 4-5 can be proved by
following the same steps after conditioning on Z .
6.1 Proof of Theorem 1
The proof is a modification of the proof of Theorem 2 of Bissantz, Hohage, and Munk (2004). By (2.5),

(6.1)   ‖T̂(ĝ) − q f̂W‖² + an‖ĝ‖² ≤ ‖T̂(g) − q f̂W‖² + an‖g‖² .

In addition, E‖T̂(g) − T(g)‖² = O(δn) and E‖f̂W − fW‖² = O(δn). Therefore, by assumption 3,

E‖T̂(g) − q f̂W‖² ≤ 2E‖T̂(g) − T(g)‖² + 2q E‖f̂W − fW‖² = O(δn) .

Combining this result with (6.1) gives

E‖T̂(ĝ) − q f̂W‖² + an E‖ĝ‖² ≤ Cδn + an‖g‖²

for some constant C < ∞ and all sufficiently large n. Therefore, by assumption 3,

limsup_{n→∞} E‖ĝ‖² ≤ ‖g‖² .

Note, in addition, that E‖T̂(ĝ) − q f̂W‖² → 0 as n → ∞. Moreover, assumptions 1(b) and 2 imply that T is weakly closed. Consistency now follows from arguments identical to those used to prove Theorem 2 of Bissantz, Hohage, and Munk (2004, p. 1777). Q.E.D.
6.2 Proof of Theorem 2
Assumptions 1-2 and 4-7 hold throughout this section. Let ⟨·, ·⟩ denote the inner product in L²[0,1]. Define ωg = (Tg Tg*)⁻¹ Tg g and

(6.2)   ḡ = g − an (Tg*Tg + an I)⁻¹ Tg* ωg ,

where I is the identity operator. Observe that by (3.4),

(6.3)   L‖ωg‖ < 1 .

Let r̂ and r̄ be Taylor series remainder terms with the properties that

(6.4)   T(ĝ) = q fW + Tg(ĝ − g) + r̂

and

(6.5)   T(ḡ) = q fW + Tg(ḡ − g) + r̄ ,

where q fW = T(g). By (3.3), ‖r̂‖ ≤ (L/2)‖ĝ − g‖² and ‖r̄‖ ≤ (L/2)‖ḡ − g‖².
Lemma 6.1: For any ḡ ∈ G,

(1 − L‖ωg‖) ‖ĝ − g‖² ≤ ‖ḡ − g‖² + an⁻¹ ‖T̂(ḡ) − T(ḡ) + r̄ + q fW − q f̂W‖²
    + ⟨2Tg(ḡ − g) + an ωg, ωg⟩ + an⁻¹ ‖Tg(ḡ − g)‖²
    + 2⟨q fW − q f̂W, ωg⟩ + 2an⁻¹ ⟨r̄ + q fW − q f̂W, Tg(ḡ − g)⟩
    + 2⟨T̂(ĝ) − T(ĝ), ωg⟩ + 2an⁻¹ ⟨T̂(ḡ) − T(ḡ), Tg(ḡ − g)⟩ .
Proof: By (6.5),

(6.6)   ‖T̂(ḡ) − q f̂W‖² = ‖T̂(ḡ) − T(ḡ) + r̄ + q fW − q f̂W‖² + ‖Tg(ḡ − g)‖²
    + 2⟨r̄ + q fW − q f̂W, Tg(ḡ − g)⟩ + 2⟨T̂(ḡ) − T(ḡ), Tg(ḡ − g)⟩ .

Also,

(6.7)   ‖T̂(ĝ) − q f̂W‖² = ‖T̂(ĝ) − q f̂W + an ωg‖² + an²‖ωg‖²
    − 2an⟨T̂(ĝ) − T(ĝ), ωg⟩ − 2an⟨T(ĝ) − q fW, ωg⟩
    − 2an⟨q fW − q f̂W, ωg⟩ − 2an²‖ωg‖² .

Moreover,

(6.8)   ⟨ḡ − g, g⟩ = ⟨ḡ − g, Tg* ωg⟩ = ⟨Tg(ḡ − g), ωg⟩ .

By (6.4),

(6.9)   ⟨ĝ − g, g⟩ = ⟨ĝ − g, Tg* ωg⟩ = ⟨Tg(ĝ − g), ωg⟩ = ⟨T(ĝ) − q fW, ωg⟩ − ⟨r̂, ωg⟩ .

By (2.5),

(6.10)   ‖T̂(ĝ) − q f̂W‖² + an‖ĝ‖² ≤ ‖T̂(ḡ) − q f̂W‖² + an‖ḡ‖² .

Rearranging and expanding terms in (6.10) gives

(6.11)   ‖ĝ − g‖² ≤ an⁻¹ [ ‖T̂(ḡ) − q f̂W‖² − ‖T̂(ĝ) − q f̂W‖² ] + ‖ḡ − g‖² + 2⟨ḡ − g, g⟩ − 2⟨ĝ − g, g⟩ .

Combining (6.11) with (6.6)-(6.9) gives

(6.12)   ‖ĝ − g‖² ≤ 2⟨r̂, ωg⟩ − an⁻¹ ‖T̂(ĝ) − q f̂W + an ωg‖² + ‖ḡ − g‖²
    + an⁻¹ ‖T̂(ḡ) − T(ḡ) + r̄ + q fW − q f̂W‖² + ⟨2Tg(ḡ − g) + an ωg, ωg⟩
    + an⁻¹ ‖Tg(ḡ − g)‖² + 2⟨q fW − q f̂W, ωg⟩ + 2an⁻¹ ⟨r̄ + q fW − q f̂W, Tg(ḡ − g)⟩
    + 2⟨T̂(ĝ) − T(ĝ), ωg⟩ + 2an⁻¹ ⟨T̂(ḡ) − T(ḡ), Tg(ḡ − g)⟩ .

The lemma follows by noting that the second term on the right-hand side of (6.12) is non-positive and that 2⟨r̂, ωg⟩ ≤ L‖ωg‖ ‖ĝ − g‖². Q.E.D.
Lemma 6.2: For any ḡ ∈ G,

(6.13)   an⁻¹ ‖T̂(ḡ) − T(ḡ) + r̄ + q fW − q f̂W‖² ≤ 4an⁻¹ ( ‖T̂(ḡ) − T(ḡ)‖² + (L²/4)‖ḡ − g‖⁴ + ‖q fW − q f̂W‖² ) ,

(6.14)   ⟨2Tg(ḡ − g) + an ωg, ωg⟩ + an⁻¹ ‖Tg(ḡ − g)‖² = an³ ‖(Tg Tg* + an I)⁻¹ ωg‖² ,

(6.15)   2⟨q fW − q f̂W, ωg⟩ + 2an⁻¹ ⟨r̄ + q fW − q f̂W, Tg(ḡ − g)⟩
    ≤ L‖ωg‖ ‖ḡ − g‖² + 2an ‖q fW − q f̂W‖ ‖(Tg Tg* + an I)⁻¹ ωg‖
    + L an ‖ḡ − g‖² ‖(Tg Tg* + an I)⁻¹ ωg‖ ,

and

(6.16)   2⟨T̂(ĝ) − T(ĝ), ωg⟩ + 2an⁻¹ ⟨T̂(ḡ) − T(ḡ), Tg(ḡ − g)⟩
    ≤ 2an ‖T̂(ḡ) − T(ḡ)‖ ‖(Tg Tg* + an I)⁻¹ ωg‖ + 2‖ωg‖ ‖[T̂(ĝ) − T(ĝ)] − [T̂(g) − T(g)]‖
    + 2‖ωg‖ ‖[T̂(ḡ) − T(ḡ)] − [T̂(g) − T(g)]‖ .
Proof: Inequality (6.13) follows from (6.5) and the relation
\|A + B + C\|^2 \le 4\big( \|A\|^2 + \|B\|^2 + \|C\|^2 \big)
for any functions A, B, and C.
To show (6.14), note that
(6.17)  a_n(T_gT_g^* + a_nI)^{-1} = I - T_g(T_g^*T_g + a_nI)^{-1}T_g^*.
By (6.2),
(6.18)  T_g(\bar{g} - g) = -a_nT_g(T_g^*T_g + a_nI)^{-1}T_g^*\omega_g.
It follows from (6.17) and (6.18) that
(6.19)  a_n\omega_g + T_g(\bar{g} - g) = a_n^2(T_gT_g^* + a_nI)^{-1}\omega_g.
Taking the squares of the norms of both sides of (6.19) and expanding the term on the left-hand side yields
(6.20)  a_n^2\|\omega_g\|^2 + \|T_g(\bar{g} - g)\|^2 + 2\langle a_n\omega_g, T_g(\bar{g} - g) \rangle = a_n^4\|(T_gT_g^* + a_nI)^{-1}\omega_g\|^2.
Then (6.14) follows by dividing both sides of (6.20) by a_n.
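The operator identity (6.17) and its consequence (6.19) can be spot-checked numerically on a finite-dimensional analogue, with a random matrix standing in for the Fréchet derivative T_g (a sketch, not part of the proof; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
k, a = 6, 0.3                      # dimension and regularization parameter a_n
T = rng.standard_normal((k, k))    # stand-in for T_g
I = np.eye(k)

# (6.17): a_n (T T* + a_n I)^{-1} = I - T (T*T + a_n I)^{-1} T*
lhs = a * np.linalg.inv(T @ T.T + a * I)
rhs = I - T @ np.linalg.inv(T.T @ T + a * I) @ T.T
assert np.allclose(lhs, rhs)

# (6.18)-(6.19): with gbar - g = -a_n (T*T + a_n I)^{-1} T* w,
# a_n w + T (gbar - g) = a_n^2 (T T* + a_n I)^{-1} w
w = rng.standard_normal(k)         # stand-in for omega_g
diff = -a * np.linalg.inv(T.T @ T + a * I) @ T.T @ w
assert np.allclose(a * w + T @ diff, a**2 * np.linalg.inv(T @ T.T + a * I) @ w)
```

Both identities are exact, so the assertions hold for any matrix T and any a > 0.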
We now turn to (6.15). First note that
(6.21)  \langle r + qf_W - q\hat{f}_W, T_g(\bar{g} - g) \rangle = \langle r + qf_W - q\hat{f}_W, T_g(\bar{g} - g) + a_n\omega_g \rangle - \langle r + qf_W - q\hat{f}_W, a_n\omega_g \rangle.
It follows from (6.19) and (6.21) that
(6.22)  2\langle qf_W - q\hat{f}_W, \omega_g \rangle + 2a_n^{-1}\langle r + qf_W - q\hat{f}_W, T_g(\bar{g} - g) \rangle
          = 2a_n\langle r + qf_W - q\hat{f}_W, (T_gT_g^* + a_nI)^{-1}\omega_g \rangle - 2\langle r, \omega_g \rangle.
Then (6.15) follows by applying the Cauchy-Schwarz and triangle inequalities to (6.22).
Now we prove (6.16). Observe that by (6.19) and algebra like that yielding (6.21),
2\langle \hat{T}(\hat{g}) - T(\hat{g}), \omega_g \rangle + 2a_n^{-1}\langle \hat{T}(\bar{g}) - T(\bar{g}), T_g(\bar{g} - g) \rangle
 = 2a_n\langle \hat{T}(\bar{g}) - T(\bar{g}), (T_gT_g^* + a_nI)^{-1}\omega_g \rangle + 2\langle [\hat{T}(\hat{g}) - T(\hat{g})] - [\hat{T}(\bar{g}) - T(\bar{g})], \omega_g \rangle,
which yields (6.16) by the Cauchy-Schwarz and triangle inequalities. Q.E.D.
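Identity (6.22) can likewise be verified in a finite-dimensional analogue, using (6.18) to generate the increment \bar{g} - g (an illustrative sketch; the vectors r and d stand in for the remainder and for qf_W - q\hat{f}_W):

```python
import numpy as np

rng = np.random.default_rng(1)
k, a = 5, 0.2
T = rng.standard_normal((k, k))    # stand-in for T_g
w = rng.standard_normal(k)         # stand-in for omega_g
r = rng.standard_normal(k)         # stand-in for the linearization remainder r
d = rng.standard_normal(k)         # stand-in for q f_W - q fhat_W
I = np.eye(k)
diff = -a * np.linalg.inv(T.T @ T + a * I) @ T.T @ w   # gbar - g, from (6.18)

# (6.22): 2<d, w> + 2 a^{-1} <r + d, T (gbar - g)>
#         = 2 a <r + d, (T T* + a I)^{-1} w> - 2 <r, w>
lhs = 2 * d @ w + (2 / a) * (r + d) @ (T @ diff)
rhs = 2 * a * (r + d) @ (np.linalg.inv(T @ T.T + a * I) @ w) - 2 * r @ w
assert np.isclose(lhs, rhs)
```

The identity is exact, so it holds for any choice of T, w, r, and d.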
Lemma 6.3: The following relations hold uniformly over H \in \mathcal{H}.
(a)  \|\bar{g} - g\|^2 = O[n^{-(2\beta-1)/(2\beta+\alpha)}],
(b)  a_n^{-1}\|\bar{g} - g\|^4 = O[n^{-(2\beta-1)/(2\beta+\alpha)}],
(c)  a_n^3\|(T_gT_g^* + a_nI)^{-1}\omega_g\|^2 = O[n^{-(2\beta-1)/(2\beta+\alpha)}],
(d)  a_n\|\bar{g} - g\|^2\,\|(T_gT_g^* + a_nI)^{-1}\omega_g\| = O[n^{-(2\beta-1)/(2\beta+\alpha)}],
(e)  a_n\|qf_W - q\hat{f}_W\|\,\|(T_gT_g^* + a_nI)^{-1}\omega_g\| = O_p[n^{-(2\beta-1)/(2\beta+\alpha)}],
(f)  a_n\|\hat{T}(\bar{g}) - T(\bar{g})\|\,\|(T_gT_g^* + a_nI)^{-1}\omega_g\| = O_p[n^{-(2\beta-1)/(2\beta+\alpha)}],
(g)  There are random variables \Delta_n = O_p[n^{-(\beta-1/2)/(2\beta+\alpha)}] and \Gamma_n = o_p(1) such that
     \big\|[\hat{T}(\hat{g}) - T(\hat{g})] - [\hat{T}(g) - T(g)]\big\| \le \Delta_n\|\hat{g} - g\| + \Gamma_n\|\hat{g} - g\|^2,
(h)  \big\|[\hat{T}(\bar{g}) - T(\bar{g})] - [\hat{T}(g) - T(g)]\big\| = O_p[n^{-(2\beta-1)/(2\beta+\alpha)}].
Proof: To prove (a), note that by (6.2) and T_g^*\omega_g = g,
\bar{g}(x) - g(x) = -a_n \sum_{j=1}^{\infty} \frac{1}{\lambda_j + a_n}\,\phi_j(x)\,\langle \phi_j, T_g^*\omega_g \rangle = -a_n \sum_{j=1}^{\infty} \frac{b_j}{\lambda_j + a_n}\,\phi_j(x).
Therefore,
\|\bar{g} - g\|^2 = a_n^2 \sum_{j=1}^{\infty} \frac{b_j^2}{(\lambda_j + a_n)^2} = O[n^{-(2\beta-1)/(2\beta+\alpha)}],
where the second line follows from arguments identical to those used to prove equation (6.4) of
Hall and Horowitz (2005). This proves (a). It follows from (a) that
a_n^{-1}\|\bar{g} - g\|^4 = O[n^{-(2\beta-1)/(2\beta+\alpha)}]
whenever \alpha < 2\beta - 1, thereby proving (b).
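As a numerical sanity check (not part of the proof), the series for \|\bar{g} - g\|^2 can be evaluated under the illustrative choices \lambda_j = j^{-\alpha}, b_j = j^{-\beta}, and a_n = n^{-\alpha/(2\beta+\alpha)}; the ratio to the claimed rate stays roughly constant as n grows:

```python
# Illustrative eigenvalue decay and Fourier coefficients (assumed, not from
# the paper): lambda_j = j^{-alpha}, b_j = j^{-beta}.
alpha, beta = 2.0, 2.0

def bias_sq(n, terms=20000):
    # a_n^2 * sum_j b_j^2 / (lambda_j + a_n)^2, truncated at `terms`
    a_n = n ** (-alpha / (2 * beta + alpha))
    return a_n ** 2 * sum(
        j ** (-2 * beta) / (j ** (-alpha) + a_n) ** 2 for j in range(1, terms)
    )

rate = lambda n: n ** (-(2 * beta - 1) / (2 * beta + alpha))
ratios = [bias_sq(n) / rate(n) for n in (10**2, 10**4, 10**6)]
print(ratios)  # roughly constant across n, consistent with part (a)
```

The truncation point and parameter values are arbitrary; any \alpha, \beta with \alpha < 2\beta - 1 give the same qualitative picture.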
We now turn to (c). Define \psi_j = T_g\phi_j / \|T_g\phi_j\|. Use \omega_g = (T_gT_g^*)^{-1}T_g g and the singular
value decomposition T_g^*\psi_j = \lambda_j^{1/2}\phi_j to obtain
(T_gT_g^* + a_nI)^{-1}\omega_g = \sum_{j=1}^{\infty} \frac{1}{\lambda_j(\lambda_j + a_n)}\,\psi_j\,\langle \psi_j, T_g g \rangle
 = \sum_{j=1}^{\infty} \frac{1}{\lambda_j(\lambda_j + a_n)}\,\psi_j\,\langle T_g^*\psi_j, g \rangle
 = \sum_{j=1}^{\infty} \frac{1}{\lambda_j(\lambda_j + a_n)}\,\psi_j\,\lambda_j^{1/2}\langle \phi_j, g \rangle
 = \sum_{j=1}^{\infty} \frac{b_j}{\lambda_j^{1/2}(\lambda_j + a_n)}\,\psi_j.
Therefore,
(6.23)  \|(T_gT_g^* + a_nI)^{-1}\omega_g\|^2 = \sum_{j=1}^{\infty} \frac{b_j^2}{\lambda_j(\lambda_j + a_n)^2} = O[a_n^{(2\beta-3\alpha-1)/\alpha}]
by arguments like those used to prove equation (6.4) of Hall and Horowitz (2005). Therefore, (c)
follows from a_n = C_a n^{-\alpha/(2\beta+\alpha)}.
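The exponent bookkeeping behind (c) — namely that a_n^3 \cdot a_n^{(2\beta-3\alpha-1)/\alpha} = n^{-(2\beta-1)/(2\beta+\alpha)} once a_n \propto n^{-\alpha/(2\beta+\alpha)} — holds identically in \alpha and \beta, and can be confirmed with exact rational arithmetic (a sketch with illustrative parameter values):

```python
from fractions import Fraction

# Illustrative values only; the identity holds for any alpha, beta.
alpha, beta = Fraction(3), Fraction(5, 2)

a_exp = -alpha / (2 * beta + alpha)          # a_n = C_a * n^{a_exp}
# (c): a_n^3 * ||(T_g T_g* + a_n I)^{-1} omega_g||^2 = O(a_n^{3 + (2b-3a-1)/a})
total = (3 + (2 * beta - 3 * alpha - 1) / alpha) * a_exp
target = -(2 * beta - 1) / (2 * beta + alpha)
assert total == target
print(total)  # -(2*beta - 1)/(2*beta + alpha); equals -1/2 for these values
```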
To prove (d), note that by (6.23),
(6.24)  a_n\|(T_gT_g^* + a_nI)^{-1}\omega_g\| = O[a_n^{(2\beta-\alpha-1)/(2\alpha)}] = O[n^{-(2\beta-\alpha-1)/(4\beta+2\alpha)}].
Therefore, (d) follows from (6.24) and (a), because \alpha < 2\beta - 1. Now by assumptions 2 and 5(b),
\|\hat{f}_W - f_W\|^2 = O_p[n^{-(2\beta-1+\alpha)/(2\beta+\alpha)}]
and
\|\hat{T} - T\|^2 = O_p[n^{-(2\beta-1+\alpha)/(2\beta+\alpha)}]
uniformly over \mathcal{H}. Therefore, (e) and (f) follow from (6.24).
We now turn to (g). By the mean value theorem,
\big\{[\hat{T}(\hat{g}) - T(\hat{g})] - [\hat{T}(g) - T(g)]\big\}(w) = \int_0^1 \{\hat{f}_{YXW}[\tilde{g}(x), x, w] - f_{YXW}[\tilde{g}(x), x, w]\}[\hat{g}(x) - g(x)]\,dx,
where \tilde{g} is between \hat{g} and g. Then by the Cauchy-Schwarz inequality,
\big\|[\hat{T}(\hat{g}) - T(\hat{g})] - [\hat{T}(g) - T(g)]\big\|^2
 = \int_0^1 \Big( \int_0^1 \{\hat{f}_{YXW}[\tilde{g}(x), x, w] - f_{YXW}[\tilde{g}(x), x, w]\}[\hat{g}(x) - g(x)]\,dx \Big)^2 dw
 \le \int_0^1 \Big( \int_0^1 \{\hat{f}_{YXW}[\tilde{g}(x), x, w] - f_{YXW}[\tilde{g}(x), x, w]\}^2\,dx \int_0^1 [\hat{g}(x) - g(x)]^2\,dx \Big)\,dw
 = \int_0^1 \int_0^1 \{\hat{f}_{YXW}[\tilde{g}(x), x, w] - f_{YXW}[\tilde{g}(x), x, w]\}^2\,dx\,dw\;\|\hat{g} - g\|^2.
But
\int_0^1 \int_0^1 \{\hat{f}_{YXW}[\tilde{g}(x), x, w] - f_{YXW}[\tilde{g}(x), x, w]\}^2\,dx\,dw = O_p[n^{-(2\beta-1)/(2\beta+\alpha)}] + \|\hat{g} - g\|^2 o_p(1)
by assumptions 2 and 5(b), thereby yielding (g). Finally, (h) can be proved by combining (a) with
arguments similar to those used to prove (g). The lemma is now proved because the foregoing
arguments hold uniformly over H \in \mathcal{H}. Q.E.D.
Proof of Theorem 2: The theorem follows by combining the results of Lemmas 6.1-6.3
with L\|\omega_g\| < 1. Q.E.D.
6.3 Proof of Theorem 3
It suffices to find a sequence of finite-dimensional models \{g_n\} \in \mathcal{H} for which
\liminf_{n \to \infty} P_H\big[\|\hat{g}_n - g_n\|^2 > Dn^{-(2\beta-1)/(2\beta+\alpha)}\big] > 0.
To this end, let m denote the integer part of n^{1/(2\beta+\alpha)} and let f_{XW} denote the density of (X, W).
Since \alpha \ge 2, (2\beta + \alpha - 1)/2 \ge (3\beta + \alpha - 1/2)/(\alpha + 1). Let r = (2\beta + \alpha - 1)/2. Assume that
f_{XW}(x, w) \le C for all (x, w) \in [0,1]^2 and some constant C < \infty. Let
Y = g_n(X) + U,
where U is independent of (X, W), and P(U \le 0) = q. Let F_U and f_U, respectively, denote the
distribution function and density of U. Assume that f_U(0) > 0 and that F_U is twice continuously
differentiable everywhere with |F_U''(u)| < M for all u and some M < \infty. Define the operator \Pi
on L^2[0,1] by
(\Pi g)(x) = \int_0^1 \pi(x, z)g(z)\,dz
for any g \in L^2[0,1], where
\pi(x, z) = f_U(0)^2 \int_0^1 f_{XW}(x, w)f_{XW}(z, w)\,dw.
Let \{\lambda_j, \phi_j : j = 1, 2, \ldots\} denote the orthonormal eigenvalues and eigenvectors of \Pi, ordered so
that \lambda_1 \ge \lambda_2 \ge \ldots > 0. Assume that j^{\alpha}\lambda_j is bounded away from 0 and \infty for all j. Set
g_n(x) = \theta \sum_{j=m}^{\infty} j^{-\beta}\phi_j(x)
for some finite constant \theta > 0. Then for any h \in L^2[0,1],
(Th)(w) = \int_0^1 F_U[h(x) - g_n(x)]f_{XW}(x, w)\,dx,
and the Fréchet derivative of T at g_n is
(T_{g_n}h)(w) = f_U(0)\int_0^1 f_{XW}(x, w)[h(x) - g_n(x)]\,dx.
Assumption 6 is satisfied with L = MC whenever \theta > 0 is sufficiently small.
Now let \hat{\theta} be an estimator of \theta. Then
\hat{g}(x) \equiv \hat{\theta} \sum_{j=m}^{\infty} j^{-\beta}\phi_j(x)
is an estimator of g_n(x). Moreover,
(6.25)  \|\hat{g} - g_n\|^2 = (\hat{\theta} - \theta)^2 R_n,
where R_n = \sum_{j=m}^{\infty} j^{-2\beta}. Note that n^{(2\beta-1)/(2\beta+\alpha)}R_n is bounded away from 0 and \infty as n \to \infty.
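The boundedness of n^{(2\beta-1)/(2\beta+\alpha)}R_n can be illustrated numerically (a sketch with assumed parameter values, not from the paper):

```python
# With m = floor(n^{1/(2*beta+alpha)}) and R_n = sum_{j>=m} j^{-2*beta},
# the product n^{(2*beta-1)/(2*beta+alpha)} * R_n stays bounded away from
# 0 and infinity, as claimed after (6.25). Parameter values are illustrative.
alpha, beta = 2.0, 2.0

def scaled_Rn(n, tail=200000):
    m = int(n ** (1.0 / (2 * beta + alpha)))
    R_n = sum(j ** (-2 * beta) for j in range(m, tail))
    return n ** ((2 * beta - 1) / (2 * beta + alpha)) * R_n

vals = [scaled_Rn(n) for n in (10**6, 10**9, 10**12)]
print(vals)  # all of the same order of magnitude
```

The small wobble across n comes from the integer part in the definition of m; it does not affect boundedness.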
In addition, f_W is estimated by
\hat{f}_W(w) = q^{-1}(T\hat{g})(w).
Define \psi_j = T_{g_n}\phi_j / \|T_{g_n}\phi_j\|. Then a Taylor series approximation and singular value expansion give
\hat{f}_W(w) - f_W(w) = q^{-1}(\hat{\theta} - \theta)\sum_{j=m}^{\infty} j^{-\beta}(T_{g_n}\phi_j)(w) + (\hat{\theta} - \theta)^2 O[n^{-(2\beta-1)/(2\beta+\alpha)}]
 = q^{-1}(\hat{\theta} - \theta)\sum_{j=m}^{\infty} j^{-\beta}\lambda_j^{1/2}\psi_j(w) + (\hat{\theta} - \theta)^2 O[n^{-(2\beta-1)/(2\beta+\alpha)}].
Now,
n^{(2\beta+\alpha-1)/(2\beta+\alpha)}\Big\| \sum_{j=m}^{\infty} j^{-\beta}\lambda_j^{1/2}\psi_j \Big\|^2
is bounded away from 0 and \infty as n \to \infty. Therefore, there is a finite constant C_\theta > 0 such that
(6.26)  (\hat{\theta} - \theta)^2 \ge C_\theta n^{(2\beta+\alpha-1)/(2\beta+\alpha)}\|\hat{f}_W - f_W\|^2.
Combining (6.25) and (6.26) shows that there is a finite constant C_g > 0 such that
(6.27)  n^{(2\beta-1)/(2\beta+\alpha)}\|\hat{g} - g_n\|^2 \ge C_g n^{(2\beta+\alpha-1)/(2\beta+\alpha)}\|\hat{f}_W - f_W\|^2.
The theorem now follows from (6.27) and the observation that with r = (2\beta + \alpha - 1)/2,
O_p[n^{-(2\beta+\alpha-1)/(2\beta+\alpha)}] is the fastest possible minimax rate of convergence of \|\hat{f}_W - f_W\|^2. Q.E.D.
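The closing step links the smoothness index r = (2\beta + \alpha - 1)/2 to the classical n^{-2r/(2r+1)} minimax rate for density estimation; the identity 2r/(2r+1) = (2\beta + \alpha - 1)/(2\beta + \alpha) can be confirmed with exact rational arithmetic (illustrative parameter pairs only):

```python
from fractions import Fraction

# For r = (2*beta + alpha - 1)/2, the density-estimation rate exponent
# 2r/(2r+1) reduces to (2*beta + alpha - 1)/(2*beta + alpha).
for beta, alpha in [(Fraction(2), Fraction(2)), (Fraction(5, 2), Fraction(3))]:
    r = (2 * beta + alpha - 1) / 2
    assert 2 * r / (2 * r + 1) == (2 * beta + alpha - 1) / (2 * beta + alpha)
print("rate identity verified")
```

Since 2r = 2\beta + \alpha - 1 and 2r + 1 = 2\beta + \alpha, the identity holds for every \beta and \alpha, not just the pairs checked here.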
FOOTNOTES
1. The results of this paper hold even if f_{YXW} or its derivatives are discontinuous at one or more
boundaries of the support of (Y, X, W), provided that the kernel function K is replaced by a
boundary kernel.
REFERENCES
Amemiya, T. (1982). Two stage least absolute deviations estimators. Econometrica, 50, 689-711.
Bissantz, N., T. Hohage, and A. Munk (2004). Consistency and rates of convergence of nonlinear
Tikhonov regularization with random noise. Inverse Problems, 20, 1773-1789.
Bissantz, N., T. Hohage, A. Munk, and F. Ruymgaart (2006). Convergence rates of general
regularization methods for statistical inverse problems and applications, working paper, Institut
für Numerische und Angewandte Mathematik, Georg-August-Universität Göttingen, Germany.
Blundell, R. and J.L. Powell (2003). Endogeneity in nonparametric and semiparametric
regression models. In Advances in Economics and Econometrics: Theory and Applications, M.
Dewatripont, L.-P. Hansen, and S.J. Turnovsky, eds., 2, 312-357, Cambridge: Cambridge
University Press.
Carrasco, M., J.-P. Florens, and E. Renault (2005). Linear inverse problems in structural
econometrics: estimation based on spectral decomposition and regularization. In Handbook of
Econometrics, Vol. 6, E.E. Leamer and J.J. Heckman, eds, Amsterdam: North-Holland,
forthcoming.
Chen, L. and S. Portnoy (1996). Two-stage regression quantiles and two-stage trimmed least
squares estimators for structural equation models. Communications in Statistics, Theory and
Methods, 25, 1005-1032.
Chernozhukov, V. and C. Hansen (2004). The effects of 401(k) participation on the wealth
distribution: an instrumental quantile regression analysis. Review of Economics and Statistics,
86, 735-751.
Chernozhukov, V. and C. Hansen (2005a). Instrumental quantile regression inference for
structural and treatment effect models. Journal of Econometrics, forthcoming.
Chernozhukov, V. and C. Hansen (2005b). An IV model of quantile treatment effects.
Econometrica, 73, 245-261.
Chernozhukov, V., G.W. Imbens, and W.K. Newey (2004). Instrumental variable identification
and estimation of nonseparable models via quantile conditions. Working paper, Department of
Economics, Massachusetts Institute of Technology, Cambridge, MA.
Chesher, A. (2003). Identification in nonseparable models. Econometrica, 71, 1405-1441.
Darolles, S., J.-P. Florens, and E. Renault (2002). Nonparametric instrumental regression.
Working paper, GREMAQ, University of Social Science, Toulouse, France.
Engl, H.W., M. Hanke, and A. Neubauer (1996). Regularization of Inverse Problems.
Dordrecht: Kluwer Academic Publishers.
Florens, J.-P. (2003). Inverse problems and structural econometrics: the example of instrumental
variables. In Advances in Economics and Econometrics: Theory and Applications, M.
Dewatripont, L.-P. Hansen, and S.J. Turnovsky, eds., 2, 284-311, Cambridge: Cambridge
University Press.
Groetsch, C. (1984). The Theory of Tikhonov Regularization for Fredholm Equations of the First
Kind. London: Pitman.
Hall, P. and J.L. Horowitz (2005). Nonparametric methods for inference in the presence of
instrumental variables. Annals of Statistics, 33, 2904-2929.
Horowitz, J.L. (2005). Asymptotic normality of a nonparametric instrumental variables
estimator. Working paper, Department of Economics, Northwestern University, Evanston, IL.
Januszewski, S.I. (2002). The effect of air traffic delays on airline prices. Working paper,
Department of Economics, University of California at San Diego, La Jolla, CA.
Kress, R. (1999). Linear Integral Equations, 2nd edition. New York: Springer.
Ma, L. and R. Koenker (2006). Quantile regression methods for recursive structural equation
models. Journal of Econometrics, forthcoming.
Newey, W.K. and J.L. Powell (2003). Instrumental variable estimation of nonparametric models.
Econometrica, 71, 1565-1578.
Newey, W.K., J.L. Powell, and F. Vella (1999). Nonparametric estimation of triangular
simultaneous equations models. Econometrica, 67, 565-603.
Powell, J.L. (1983). The asymptotic normality of two-stage least absolute deviations estimators.
Econometrica, 50, 1569-1575.
Tikhonov, A.N. (1963a). Solution of incorrectly formulated problems and the regularization
method. Soviet Mathematics Doklady, 4, 1035-1038.
Tikhonov, A.N. (1963b). Regularization of incorrectly posed problems. Soviet Mathematics
Doklady, 4, 1624-1627.
Table 1.
Results of Monte Carlo Experiments

   n     a_n     h     Bias^2   Variance    MISE
  200    0.5    0.5    0.0226    0.0199    0.0425
  200    0.5    0.8    0.0795    0.0220    0.1015
  200    1.0    0.5    0.0146    0.0195    0.0341
  200    1.0    0.8    0.0247    0.0203    0.0450
  800    0.5    0.5    0.0233    0.0048    0.0282
  800    0.5    0.8    0.0816    0.0054    0.0870
  800    1.0    0.5    0.0151    0.0047    0.0199
  800    1.0    0.8    0.0258    0.0050    0.0308

Note: Bias^2, Variance, and MISE, respectively, are Monte Carlo approximations to
\int_0^1 (E[\hat{g}(x)] - g(x))^2\,dx, \int_0^1 (\hat{g}(x) - E[\hat{g}(x)])^2\,dx, and \int_0^1 (\hat{g}(x) - g(x))^2\,dx.
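As an internal consistency check on Table 1 (an editorial aid, not part of the paper), MISE should equal Bias^2 + Variance up to rounding in the fourth decimal place:

```python
# Rows transcribed from Table 1: (n, a_n, h, bias_sq, variance, mise).
rows = [
    (200, 0.5, 0.5, 0.0226, 0.0199, 0.0425),
    (200, 0.5, 0.8, 0.0795, 0.0220, 0.1015),
    (200, 1.0, 0.5, 0.0146, 0.0195, 0.0341),
    (200, 1.0, 0.8, 0.0247, 0.0203, 0.0450),
    (800, 0.5, 0.5, 0.0233, 0.0048, 0.0282),
    (800, 0.5, 0.8, 0.0816, 0.0054, 0.0870),
    (800, 1.0, 0.5, 0.0151, 0.0047, 0.0199),
    (800, 1.0, 0.8, 0.0258, 0.0050, 0.0308),
]
for n, a_n, h, bias_sq, var, mise in rows:
    # Allow a one-unit discrepancy in the last reported digit.
    assert abs(bias_sq + var - mise) <= 0.0001
print("Bias^2 + Variance matches MISE for all 8 rows")
```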
Figure 1: The Density of W
Figure 2: The Density of X Conditional on W
Figure 3: The Density of U Conditional on X and W
Figure 4: Monte Carlo Results for n = 200
Figure 5: Monte Carlo Results for n = 800