Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries
http://www.archive.org/details/kernelestimationOOnewe
working paper
department of economics

KERNEL ESTIMATION OF PARTIAL MEANS
AND A GENERAL VARIANCE ESTIMATOR

Whitney K. Newey

No. 93-3    Dec. 1992

massachusetts institute of technology
50 memorial drive
Cambridge, mass. 02139
MIT Economics Working Paper 93-3

KERNEL ESTIMATION OF PARTIAL MEANS AND
A GENERAL VARIANCE ESTIMATOR*

Whitney K. Newey
MIT Department of Economics

December, 1991
Revised: December, 1992

* Financial support was provided by the NSF. P. Robinson and T. Stoker provided useful comments.
Abstract

Econometric applications of kernel estimators are proliferating, suggesting the need for convenient variance estimates and conditions for asymptotic normality. This paper develops a general "delta method" variance estimator for functionals of kernel estimators. Also, regularity conditions for asymptotic normality are given, along with a guide to verifying them for particular estimators. The general results are applied to partial means, which are averages of kernel estimators over some of their arguments with other arguments held fixed. Partial means have econometric applications, such as consumer surplus estimation, and are useful for estimation of additive nonparametric models.

Keywords: Kernel estimation, partial means, standard errors, delta method, functional estimation.
1. Introduction

There are a growing number of applications where estimators use the kernel method in their construction, i.e. where functionals of kernel estimators are involved. Examples include average derivative estimation (Hardle and Stoker, 1989, and Powell, Stock, and Stoker, 1989), nonparametric policy analysis (Stock, 1989), consumer surplus estimation (Hausman and Newey, 1992), and others that are the topic of current research. An important example in this paper is a partial mean, which is an average of a kernel regression estimator over some components, holding others fixed. The growth of kernel applications suggests the need for a general variance estimator that applies to many cases, including partial means. This paper presents one such estimator. Also, the paper gives general results on asymptotic normality of functionals of kernel estimators.

Partial means control for covariates by averaging over them. They are related to additive nonparametric models and have important uses in economics, as further discussed below. It is shown here that their convergence rate is determined by the number of components that are averaged out, being faster the more components that are averaged over.

The variance estimator is based on differentiating the functional with respect to the contribution of each observation to the kernel. A more common method is to calculate the asymptotic variance formula and then "plug-in" consistent estimators. This method can be quite difficult when the asymptotic formula is complicated, as often seems to be the case. In contrast, the approach described here only requires knowing the form of the functional and kernel. In this way it is like the Huber (1967) asymptotic variance for m-estimators. Also, it gives consistent standard errors even for fixed bandwidths (when the estimator is centered at its limit), unlike the more common approach. Also, it is a generalization of the "delta method" for functions of sample means.

An alternative approach to variance estimation, or confidence intervals, is the bootstrap. The bootstrap may give consistent confidence intervals (e.g. by the percentile method) for the same types of functionals considered here, although this does not appear to be known. In any case, variance estimates are useful for bootstrap improvements to the asymptotic distribution, as considered in Hall (1992).

The variance formula given here has antecedents in the literature. For a kernel density at a point it is equal to the sample variance of the kernel observations, as recently considered by Hall (1992). For a kernel regression at a point, a related estimator was proposed by Bierens (1987). Also, the standard errors for average derivatives in Hardle and Stoker (1989) and Powell, Stock, and Stoker (1989) are equal to this estimator when the kernel is symmetric. New cases included here are partial means and estimators that depend (possibly) nonlinearly on all of the density or regression function, and not just on its value at sample points.

Section 2 sets up m-estimators that depend on kernel densities or regressions, and gives examples. Section 3 gives the standard errors, i.e. the asymptotic variance estimator. Section 4 describes partial means and their estimators, and associated asymptotic theory. Section 5 gives some general lemmas that are useful for the asymptotic theory of partial means, and more generally for other nonlinear functionals of kernel estimators. The proofs are collected in Appendix A, and Appendix B contains some technical lemmas.
2. The Estimators

The estimators considered in this paper are two step estimators where the first step is a vector of kernel estimators. To describe the first step, let z_i (i = 1, ..., n) denote data observations that include observations x_i and y_i on a k x 1 vector x of continuously distributed variables and an r x 1 vector y of variables. The first step is a kernel estimator of the product of the density f_0(x) of x with E[y|x],

(2.1)    h_0(x) = E[y|x] f_0(x).

Let K(u) denote a kernel function satisfying ∫K(u)du = 1 and other conditions given in Section 4, and let σ > 0 denote a bandwidth. Then for K_σ(u) = σ^{-k} K(u/σ), the first step considered here is

(2.2)    ĥ(x) = n^{-1} Σ_{i=1}^n y_i K_σ(x − x_i).

A second step allowed for in this paper is an m-estimator that depends on the estimated function ĥ. To describe such an estimator, let β denote a vector of parameters, with true value β_0, and m(z, β, h) a vector of functions that depend on the observation, the parameter, and the function h. Suppose that

    E[m(z, β_0, h_0)] = 0.

Here m(z, β, h) is allowed to depend on the entire function h, and not just its value at observed points; see below for examples. A second step estimator is a β̂ that solves a corresponding sample equation

(2.3)    Σ_{i=1}^n m(z_i, β, ĥ) = 0.

This is a two-step m-estimator where the first step is the kernel estimator described above.
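A minimal numerical sketch of the first step (2.2), assuming a product Gaussian kernel; the paper leaves the kernel K general, so this choice and all names here are illustrative:

```python
import numpy as np

def gaussian_kernel(u):
    """Product Gaussian kernel on R^k; integrates to one (illustrative choice)."""
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi)**(u.shape[-1] / 2)

def h_hat(x, X, Y, sigma):
    """First step (2.2): h(x) = n^{-1} sum_i y_i K_sigma(x - x_i),
    with K_sigma(u) = sigma^{-k} K(u/sigma).  X is (n, k), Y is (n, r)."""
    n, k = X.shape
    w = gaussian_kernel((x - X) / sigma) / sigma**k   # K_sigma(x - x_i), shape (n,)
    return w @ Y / n                                  # shape (r,)
```

With Y a column of ones, h_hat reduces to the kernel density estimator f̂(x); with Y = (1, q) it yields the numerator and denominator of a kernel regression.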
This estimator includes as special cases functions of kernel estimators evaluated
at points, e.g. a kernel density estimator at a point.
Some other interesting examples
are as follows:
Partial Means: An example that is (apparently) new is an average of a nonparametric regression over some variables holding others fixed. Let q denote a random variable that is included in z, and let g_0(x) = E[q|x]. Partition x = (x_1, x_2), let x̄_1 be some fixed value for x_1, and let τ(x_2) be some weight function, possibly associated with fixed "trimming" that keeps a denominator bounded away from zero. The object of interest is

(2.3a)    β_0 = E[τ(x_2) g_0(x̄_1, x_2)].

This object is a partial mean, an average over some conditioning variables holding others fixed. The estimator is a special case of equation (2.3) with y = (1, q)', f̂(x) = ĥ_1(x), ĝ(x) = ĥ_2(x)/ĥ_1(x), and

    β̂ = n^{-1} Σ_{i=1}^n τ(x_{2i}) ĝ(x̄_1, x_{2i}),    m(z, β, h) = τ(x_2) h_2(x̄_1, x_2)/h_1(x̄_1, x_2) − β.

It shows how explicit estimators can be included as special cases of equation (2.3). Further discussion is given in Section 4.
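Under the same assumed Gaussian kernel, the partial mean estimator β̂ can be sketched as an average of kernel-regression ratios evaluated at (x̄_1, x_{2i}); a hypothetical implementation, not the paper's code:

```python
import numpy as np

def kernel(u):
    """Product Gaussian kernel (illustrative choice)."""
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi)**(u.shape[-1] / 2)

def partial_mean(x1bar, X1, X2, q, sigma, tau=None):
    """beta = n^{-1} sum_i tau(x2_i) ghat(x1bar, x2_i), with ghat = h2/h1 the
    kernel regression of q on x = (x1, x2); equation (2.3a)'s sample analog."""
    n = len(q)
    X = np.column_stack([X1, X2])
    k = X.shape[1]
    # evaluation points (x1bar, x2_i)
    P = np.column_stack([np.broadcast_to(np.atleast_1d(x1bar), (n, np.size(x1bar))), X2])
    W = kernel((P[:, None, :] - X[None, :, :]) / sigma) / sigma**k   # (n, n)
    g = (W @ q) / W.sum(axis=1)          # ghat at each evaluation point
    t = np.ones(n) if tau is None else tau(X2)
    return float(np.mean(t * g))
```

With q linear in (x_1, x_2) and x̄_1 = 0, the population partial mean with τ = 1 is E[x_2], which a moderate sample should approximate.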
Differential Equation Solution: An estimator with economic applications is one that solves a differential equation depending on a nonparametric regression. To describe this estimator, let y = (1, q)' and k = 2, with x = (x_1, x_2)', f(x) = h_1(x), and g(x) = h_2(x)/f(x), and consider two possible values for x_2, denoted p^0 and p^1, with p^0 < p^1. Let S(p) solve the differential equation

(2.4)    dS(p)/dp = −g(x_1 − S(p), p),    S(p^0) = 0,

and let β_0 = S(p^1). The economic interpretation of β_0 is that it is the cost of a change of price of a commodity from p^0 to p^1, for an individual with income x_1, price p, and demand function g_0(x) = E[q|x]. It can be estimated by substituting a kernel estimator for g, i.e. ĝ(x) = ĥ_2(x)/ĥ_1(x) = ĥ_2(x)/f̂(x), and solving

(2.5)    dŜ(p)/dp = −ĝ(x_1 − Ŝ(p), p),    Ŝ(p^0) = 0,

with β̂ = Ŝ(p^1). This is a special case of equation (2.3) where m(z, β, h) is the solution of the differential equation minus β. The example shows one way that m(z, β, h) can be allowed to depend on the entire function h, and not just on its value at sample points. It is analyzed in Hausman and Newey (1992), using results developed here.
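The estimator β̂ = Ŝ(p^1) requires only a numerical ODE solver; a classical fourth-order Runge-Kutta sketch of (2.5), where the solver choice and step count are illustrative assumptions:

```python
import numpy as np

def solve_surplus(g, x1, p0, p1, steps=200):
    """Solve dS/dp = -g(x1 - S(p), p), S(p0) = 0, by classical RK4 and
    return S(p1), the estimated cost of the price change (equation (2.5))."""
    f = lambda p, S: -g(x1 - S, p)
    h = (p1 - p0) / steps
    S, p = 0.0, p0
    for _ in range(steps):
        k1 = f(p, S)
        k2 = f(p + h / 2, S + h * k1 / 2)
        k3 = f(p + h / 2, S + h * k2 / 2)
        k4 = f(p + h, S + h * k3)
        S += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        p += h
    return S
```

For the linear test case g(y, p) = y the equation has the closed-form solution S(p) = x_1(1 − e^{p − p^0}), which the solver reproduces to high accuracy; in practice g would be the kernel estimate ĝ.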
Inverse Density Weighted Least Squares: An estimator that is useful for estimating the semiparametric generalized regression model E[y|x] = t(x'δ_0), where t(·) is an unknown transformation, is a weighted least squares estimator that can be described as follows. Let τ(x) be a density of an elliptically symmetric distribution (i.e. a density that depends only on (x − μ)'Σ(x − μ) for some μ and Σ) that has bounded support. The estimator minimizes

(2.6)    Σ_{i=1}^n f̂(x_i)^{-1} τ(x_i)[q_i − x_i'β]^2.

This estimator has the form given in equation (2.3), with m(z, β, h) = h(x)^{-1} τ(x) x [q − x'β]. The weighting by the inverse density leads to β̂ converging to the least squares projection of E[q|x] on x under the density τ(x), which is consistent for scaled coefficients of a generalized regression model, as discussed in Ruud (1986) and Li and Duan (1991). This estimator is analyzed in Newey and Ruud (1991), using results developed here.
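Since (2.6) is quadratic in β, the minimizer has a closed weighted-least-squares form; a sketch with an assumed Gaussian kernel for f̂ (the elliptically symmetric weight τ is supplied by the user):

```python
import numpy as np

def kernel(u):
    """Product Gaussian kernel (illustrative choice)."""
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi)**(u.shape[-1] / 2)

def inv_density_wls(X, q, sigma, tau):
    """Minimize sum_i fhat(x_i)^{-1} tau(x_i) [q_i - x_i'b]^2  (equation (2.6)),
    where fhat is the kernel density estimate at the sample points."""
    n, k = X.shape
    W = kernel((X[:, None, :] - X[None, :, :]) / sigma) / sigma**k
    fhat = W.mean(axis=1)
    w = tau(X) / fhat                         # inverse-density weights
    A = (X * w[:, None]).T @ X                # sum_i w_i x_i x_i'
    b = (X * w[:, None]).T @ q                # sum_i w_i x_i q_i
    return np.linalg.solve(A, b)
```

When q lies exactly in the column space of X, the weighted solution recovers the coefficients regardless of the weights, which gives a simple correctness check.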
The results of this paper apply to each of these examples, as discussed below. They will also apply to other estimators, including those that minimize a quadratic form in a sample average depending on β and ĥ, or that minimize a sample average, such as quasi-maximum likelihood estimators that depend on kernel estimators.
3. The Asymptotic Variance Estimator

To form approximate confidence intervals and test statistics it is important to have consistent standard errors. To motivate the form of the asymptotic variance estimator, it is helpful to briefly sketch the asymptotic distribution theory. Expanding the left-hand side of equation (2.3) around β_0 and solving for β̂ − β_0 gives

(3.1)    β̂ − β_0 = −[n^{-1} Σ_{i=1}^n ∂m(z_i, β̄, ĥ)/∂β]^{-1} m̂_n(β_0),    m̂_n(β) = Σ_{i=1}^n m(z_i, β, ĥ)/n,

where β̄ is a mean value. By the uniform law of large numbers discussed in Section 5, n^{-1} Σ_{i=1}^n ∂m(z_i, β̄, ĥ)/∂β will converge in probability to M = E[∂m(z, β_0, h_0)/∂β], so that the asymptotic distribution of β̂ will be determined by m̂_n(β_0). In Section 5 conditions will be given for the existence of a constant a such that

(3.1a)    √n σ^a m̂_n(β_0) →d N(0, V).

The magnitude of a will be determined by the form of E[m(z, β_0, h)] as a function of h, being smaller the more dimensions are integrated over in E[m(z, β_0, h)]. By the Slutzky theorem the asymptotic distribution of β̂ will be

(3.1b)    √n σ^a (β̂ − β_0) →d N(0, M^{-1} V M^{-1}').

A consistent asymptotic variance estimator can be constructed by substituting estimates for true values in the formula M^{-1} V M^{-1}'. It is easy to construct an estimator of M, as

(3.1c)    M̂ = n^{-1} Σ_{i=1}^n ∂m(z_i, β̂, ĥ)/∂β.

Finding a consistent estimator of V is more difficult, because of the need to account for the presence of ĥ in m̂_n(β_0). One common approach to this problem is to calculate the asymptotic variance, and then form an estimator by substituting estimates for unknown functions, such as sample averages for expectations.
This approach can be difficult when the asymptotic variance is very complicated. Also, it may be sensitive to the bandwidth parameter, because the variance formula is only valid in the limit as σ → 0.

The asymptotic variance estimator here is constructed by estimating the influence of each observation in ĥ on m̂_n(β̂) = Σ_{i=1}^n m(z_i, β̂, ĥ)/n. Let ζ denote a scalar and let

(3.2)    δ̂_i = ∂[Σ_{j=1}^n m(z_j, β̂, ĥ + ζ y_i K_σ(· − x_i))/n]/∂ζ,  evaluated at ζ = 0.

The interpretation of δ̂_i is that it estimates the first-order effect of the i-th observation in ĥ on m̂_n(β̂). In this sense it is an "influence function" estimator. The variance can be estimated by including this term with m(z_i, β̂, ĥ) in a sample variance, as in

(3.3)    V̂ = Σ_{i=1}^n ψ̂_i ψ̂_i'/n,    ψ̂_i = m(z_i, β̂, ĥ) + δ̂_i − Σ_{j=1}^n δ̂_j/n.

An asymptotic variance estimator for β̂ can then be constructed by combining V̂ with a Jacobian estimator in the usual way, as in

(3.4)    Var(β̂) = M̂^{-1} V̂ M̂^{-1}',    M̂ = n^{-1} Σ_{i=1}^n ∂m(z_i, β̂, ĥ)/∂β.
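Given the moment contributions m(z_i, β̂, ĥ), the influence terms δ̂_i, and a Jacobian estimate M̂, equations (3.3) and (3.4) combine mechanically; a small sketch, with array names chosen for illustration:

```python
import numpy as np

def sandwich_variance(m, delta, M):
    """Equations (3.3)-(3.4): psi_i = m_i + delta_i - mean(delta),
    Vhat = sum_i psi_i psi_i'/n, Var = Mhat^{-1} Vhat Mhat^{-1}'.
    m and delta are (n, p) arrays of moment and influence terms; M is (p, p)."""
    psi = m + delta - delta.mean(axis=0)
    V = psi.T @ psi / len(psi)
    Minv = np.linalg.inv(M)
    return Minv @ V @ Minv.T
```

The sandwich form is invariant to the sign convention used for M̂, since M̂ enters on both sides.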
In Section 5 conditions will be given that are sufficient for σ^{2a} V̂ →p V and M̂ →p M. Consequently, inference procedures based on β̂ being normally distributed with mean β_0 and variance Var(β̂)/n will be asymptotically valid. For example, β̂_j ± 1.96[Var(β̂)_{jj}/n]^{1/2} will be an asymptotic 95 percent confidence interval. It is interesting to note that the form of Var(β̂) does not depend on the convergence rate of β̂ (i.e. on a), but that its large sample behavior will.

This asymptotic variance estimator accounts for the presence of ĥ by including the terms δ̂_i. These terms are straightforward to compute, requiring only knowledge of the form of m(z, β, h) and the kernel. In particular, δ̂_i can be calculated by analytic differentiation with respect to the scalar ζ. Alternatively, if the analytic formula is very hard to construct, δ̂_i can be calculated as the numerical derivative of Σ_{j=1}^n m(z_j, β̂, ĥ + ζ y_i K_σ(· − x_i))/n with respect to ζ.

In this sense Var(β̂) is a "delta-method" variance for kernel estimators, exactly analogous to delta method variances for parametric estimators. For example, if ĥ were a sample mean rather than a kernel estimator, say ĥ = n^{-1} Σ_{i=1}^n y_i = ȳ, then ∂[Σ_{j=1}^n m(z_j, β̂, ĥ + ζ y_i)/n]/∂ζ = m̂_h y_i for m̂_h = n^{-1} Σ_{j=1}^n ∂m(z_j, β̂, ĥ)/∂h, and the analogous influence function estimator would be ψ̂_i = m(z_i, β̂, ĥ) + m̂_h(y_i − ȳ), the usual delta-method formula for the presence of a sample average in an m-estimator.
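The numerical-derivative route to δ̂_i in (3.2) can be sketched generically, for any sample moment mbar that accepts a function h(·); the central-difference step and all names are illustrative assumptions:

```python
import numpy as np

def delta_hat(mbar, h_fun, y, X, kernel_sigma, i, eps=1e-6):
    """Numerical derivative of (3.2): perturb hhat by zeta * y_i K_sigma(. - x_i)
    and central-difference the sample moment mbar(h) in zeta at zero."""
    def h_pert(x, zeta):
        return h_fun(x) + zeta * y[i] * kernel_sigma(x - X[i])
    up = mbar(lambda x: h_pert(x, eps))
    dn = mbar(lambda x: h_pert(x, -eps))
    return (up - dn) / (2 * eps)
```

When mbar is linear in h, such as point evaluation mbar(h) = h(x0), the central difference is exact and returns y_i K_σ(x0 − x_i), matching the analytic derivative.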
in,
h
Another feature of
bandwidth shrinking to zero for
its validity.
when the bandwidth
The terms
£
is
and
m(z.,£,h)
convergence rate of
y'.
n
i
^j=l
is
reasons the formula for
Var(/3)
held fixed at
S./n
£.
slower than
5
.
that
is
it
estimator
Vn(fi -
K
(x-x.).
(3
),
= f(x)
|3
at
(e.g.
form of
some
,
as
j
X
[q.
M
i
- f(x.)
j
^(xJK
2
j
(x.-x.).
cr
The asymptotic variance estimator can then be formed as
- 8 -
j
i
to
that affect the
levels
and
will dominate).
The simplest
Other examples are:
can be obtained by explicit differentiation of
tV^.lftx.)
^j=l
2j
h
the sample variance of
i
= n
the
where the asymptotic variance
x,
1
n
n" X
T(x_.)[h (x.) + <q-K
(x - x .)]/[f(x.) + <K (x.-x.)],
^ M i o- j
^j=l
2 j
<r
2j
i
l
j
j
l
is
Also, for analogous
this estimator.
This estimator was recently considered by Hall (1992).
5.
would
when the
\j>.
between pointwise density
.
5.
|3
it
They are retained because they are easy
1/Vn.
Var(p) = Y. ,K (x-x.) /n - [Y ,K (x-x.)/n]
^i=l <r
i
^j=l cr
j
Here
where
are asymptotically negligible in
to illustrate the
a density estimator
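For the density-at-a-point example, the variance formula is just the sample variance of the kernel observations; a sketch with an assumed Gaussian kernel:

```python
import numpy as np

def density_point_variance(x, X, sigma):
    """Kernel density estimate at x and the sample variance of the kernel
    observations K_sigma(x - x_i); the standard error of fhat(x) is sqrt(V/n)."""
    k = X.shape[1]
    K = np.exp(-0.5 * np.sum(((x - X) / sigma)**2, axis=1)) \
        / ((2 * np.pi)**(k / 2) * sigma**k)
    fhat = K.mean()
    V = (K**2).mean() - fhat**2
    return fhat, V, np.sqrt(V / len(X))
```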
Partial Means: Here δ̂_i can be obtained by explicit differentiation of

    n^{-1} Σ_{j=1}^n τ(x_{2j})[ĥ_2(x̃_j) + ζ q_i K_σ(x̃_j − x_i)]/[f̂(x̃_j) + ζ K_σ(x̃_j − x_i)],    x̃_j = (x̄_1, x_{2j}),

with respect to ζ at ζ = 0, giving

    δ̂_i = n^{-1} Σ_{j=1}^n τ(x_{2j}) f̂(x̃_j)^{-1} [q_i − ĝ(x̃_j)] K_σ(x̃_j − x_i),    ĝ(x) = ĥ_2(x)/f̂(x).

The asymptotic variance estimator can then be formed as

(3.5)    Var(β̂) = Σ_{i=1}^n ψ̂_i^2/n,    ψ̂_i = τ(x_{2i}) ĥ_2(x̃_i)/f̂(x̃_i) − β̂ + δ̂_i − Σ_{j=1}^n δ̂_j/n.
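The explicit δ̂_i for the partial mean can be computed by vectorizing the double sum (Gaussian kernel assumed, names illustrative); a useful check is that δ̂_i vanishes when q is constant, since then q_i − ĝ(x̃_j) = 0:

```python
import numpy as np

def kernel(u):
    """Product Gaussian kernel (illustrative choice)."""
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi)**(u.shape[-1] / 2)

def partial_mean_influence(x1bar, X1, X2, q, sigma, tau=None):
    """delta_i = n^{-1} sum_j tau(x2_j) fhat(xtil_j)^{-1}
                 [q_i - ghat(xtil_j)] K_sigma(xtil_j - x_i),
    with xtil_j = (x1bar, x2_j)."""
    n = len(q)
    X = np.column_stack([X1, X2])
    k = X.shape[1]
    P = np.column_stack([np.broadcast_to(np.atleast_1d(x1bar), (n, np.size(x1bar))), X2])
    W = kernel((P[:, None, :] - X[None, :, :]) / sigma) / sigma**k  # W[j, i] = K(xtil_j - x_i)
    fhat = W.mean(axis=1)
    ghat = (W @ q) / (n * fhat)
    t = np.ones(n) if tau is None else tau(X2)
    A = (t / fhat)[:, None] * W
    return (A * (q[None, :] - ghat[:, None])).mean(axis=0)
```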
Differential Equation Solution: It is possible to derive an analytical expression for δ̂_i, but this expression is quite complicated and difficult to evaluate. An alternative approach, that is used by Hausman and Newey (1992), is to numerically differentiate with respect to ζ the numerical solution to

    dŜ(p)/dp = −[ĥ_2(x_1 − Ŝ(p), p) + ζ q_i K_σ((x_1 − Ŝ(p), p) − x_i)]/[f̂(x_1 − Ŝ(p), p) + ζ K_σ((x_1 − Ŝ(p), p) − x_i)],

to form δ̂_i. This approach is quite feasible using existing fast and accurate numerical algorithms for ordinary differential equations.
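A sketch of this numerical-differentiation approach: solve the ζ-perturbed equation twice and take a central difference in ζ (RK4 and the step sizes are illustrative assumptions):

```python
import numpy as np

def surplus_perturbed(h2, f, qi, xi, x1, p0, p1, sigma, zeta, steps=200):
    """Solve the zeta-perturbed version of (2.5), where ghat is replaced by
    [h2 + zeta q_i K_sigma(. - x_i)] / [f + zeta K_sigma(. - x_i)]."""
    def Ks(y, p):
        u = (np.array([y, p]) - xi) / sigma
        return np.exp(-0.5 * float(u @ u)) / (2 * np.pi * sigma**2)
    def g(y, p):
        return (h2(y, p) + zeta * qi * Ks(y, p)) / (f(y, p) + zeta * Ks(y, p))
    fS = lambda p, S: -g(x1 - S, p)
    h = (p1 - p0) / steps
    S, p = 0.0, p0
    for _ in range(steps):
        k1 = fS(p, S)
        k2 = fS(p + h / 2, S + h * k1 / 2)
        k3 = fS(p + h / 2, S + h * k2 / 2)
        k4 = fS(p + h, S + h * k3)
        S += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        p += h
    return S

def delta_ode(h2, f, qi, xi, x1, p0, p1, sigma, eps=1e-5):
    """delta_i as a central finite difference of the surplus in zeta."""
    up = surplus_perturbed(h2, f, qi, xi, x1, p0, p1, sigma, eps)
    dn = surplus_perturbed(h2, f, qi, xi, x1, p0, p1, sigma, -eps)
    return (up - dn) / (2 * eps)
```

A sanity check: if q_i equals the unperturbed regression value everywhere (here h2/f constant and q_i equal to that constant), the perturbation cancels and δ̂_i is zero.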
Inverse Density Weighted Least Squares: Let û_i = q_i − x_i'β̂. Explicit differentiation gives

    δ̂_i = −n^{-1} Σ_{j=1}^n τ(x_j) f̂(x_j)^{-2} x_j û_j K_σ(x_j − x_i),    ψ̂_i = f̂(x_i)^{-1} τ(x_i) x_i û_i + δ̂_i − Σ_{j=1}^n δ̂_j/n.

The variance estimator is

(3.5a)    Var(β̂) = M̂^{-1} [Σ_{i=1}^n ψ̂_i ψ̂_i'/n] M̂^{-1}',    M̂ = −n^{-1} Σ_{i=1}^n f̂(x_i)^{-1} τ(x_i) x_i x_i'.
For partial means, asymptotic normality and consistency of the variance estimator are shown in Section 4, using the lemmas of Section 5. Corresponding theoretical results for the other examples are given elsewhere.
4. Partial Means

Partial means have a number of applications in economics. For example, they can be used to approximate the solution to the differential equation described in Section 2. Dropping the S(p) term from inside g_0(x_1 − S(p), p) leads to an approximation as a partial mean,

    β_0 = ∫_{p^0}^{p^1} g_0(x_1, p) dp = E[(p^1 − p^0) g_0(x_1, x_2)],

where x_2 is distributed uniformly on [p^0, p^1]. It is known that this approximation is quite good in many economic examples, where S(p) is a small proportion of x_1 (see Willig, 1978). The partial mean can be estimated by

(4.1)    β̂ = (p^1 − p^0) Σ_{i=1}^n ĝ(x_1, p_i)/n,

where p_i is drawn from a uniform distribution on [p^0, p^1]. This is a simulation estimator similar in spirit to that of Lerman and Manski (1981).
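Given a regression estimate ĝ, the simulation estimator (4.1) is a one-liner; a sketch with an assumed seeded uniform sampler:

```python
import numpy as np

def simulated_partial_mean(ghat, x1, p0, p1, n_sim, seed=0):
    """Equation (4.1): beta = (p1 - p0) * average of ghat(x1, p_i) over
    uniform draws p_i on [p0, p1]."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(p0, p1, n_sim)
    return (p1 - p0) * np.mean(ghat(x1, p))
```

With ĝ(x_1, p) = p the population target is the integral of p over [p^0, p^1], which the simulation average approximates.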
Partial means are also of interest from a purely statistical point of view, as dimension attenuation devices. Like E[q|x_1], the partial mean is a function of a smaller dimensional argument. Consequently, partial mean estimators will converge faster than estimators of g_0(x). However, unlike E[q|x_1], a partial mean controls for the covariates x_2, in an average way.

The way in which partial means control for covariates is illustrated by their relationship to additive nonparametric models. Suppose that the conditional expectation takes an additive form,

(4.2)    E[q|x] = g_{10}(x_1) + g_{20}(x_2),

and that E[τ(x_2)] = 1. Then E[τ(x_2) g_0(x̄_1, x_2)] = g_{10}(x̄_1) + E[τ(x_2) g_{20}(x_2)]. Thus, as a function of x̄_1, the partial mean estimates the corresponding component of an additive model, up to a constant.

In comparison with other additive model estimators, partial means are easier to compute but may be less asymptotically efficient. Unlike the alternating conditional expectation estimator for additive models (ACE; Breiman and Friedman, 1985), the partial mean is an explicit functional, so the kernel estimator will not require iteration. However, because the partial mean does not impose additivity it may be a less efficient estimator. Also, the partial mean depends on the full conditional expectation E[q|x], so the curse of dimensionality may result in slower convergence to the limiting distribution.

The partial mean and ACE are different statistical objects when no restrictions are placed on E[q|x]. The partial mean is given in equation (2.3a), while the ACE object is the mean-square projection of E[q|x] on the set of functions of the form g_1(x_1) + g_2(x_2). These estimators summarize different features of E[q|x].
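The additive-model property can be verified directly: when E[q|x] = g_1(x_1) + g_2(x_2) and E[τ(x_2)] = 1, differences of partial means across values of x̄_1 equal differences of g_1 exactly. A sketch at the population-formula level, with illustrative function names:

```python
import numpy as np

def additive_partial_mean(g1, g2, x1bar, X2, tau=None):
    """With E[q|x] = g1(x1) + g2(x2) and E[tau(x2)] = 1, the partial mean is
    g1(x1bar) + E[tau(x2) g2(x2)], i.e. the additive component up to a constant."""
    t = np.ones(len(X2)) if tau is None else tau(X2)
    return float(np.mean(t * (g1(x1bar) + g2(X2))))
```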
Let the argument u of K(u) be partitioned conformably with x as u = (u_1, u_2), let f_0(x_1, x_2) denote the true density of x, and let f_2(x_2) denote the marginal density of x_2. The asymptotic variance of the partial mean estimator will be

(4.3)    V = [∫{∫K(u_1, u_2)du_2}^2 du_1] ∫ τ(t)^2 f_2(t)^2 f_0(x̄_1, t)^{-1} Var(q|x = (x̄_1, t)) dt.

Theorem 4.1: Suppose that Assumptions K and H are satisfied for d ≤ s; i) E[|q|^4] < ∞ and E[|q|^4|x] f_0(x) is bounded; ii) Var(q|x = (x̄_1, t)), f_0(x̄_1, x_2), and f_2(x_2) are continuous, and E[q|x] is continuous a.e.; iii) τ(x_2) is bounded and zero except on a compact set where f_0(x̄_1, x_2) is bounded away from zero; iv) ∫ sup_{‖η‖≤ε} {1 + E[|q|^4|x = (x̄_1 + η, x_2)]} f_0(x̄_1 + η, x_2) dx_2 < ∞ for some ε > 0; v) n σ^{k+2d}/ln(n) → ∞ and n σ^{k_1+2s} → 0. Then for the partial mean estimator β̂,

    √n σ^{k_1/2} (β̂ − β_0) →d N(0, V).

If, in addition, n σ^{2k} → ∞, then σ^{k_1} V̂ →p V.
V.
The conditions here embody "undersmoothing," meaning that the bias goes to zero faster
than the variance.
distribution
is
Undersmoothing
reflected in the conclusion, where the limiting
is
centered at zero, rather than at a bias term.
An improved convergence rate for
in the
k /2
normalizing factor
v'ntT
partial
means over pointwise estimators
k -1/2
the asymptotic distribution result
is
(no- l)
while the corresponding rate from the
k -1/2
usual asymptotic normality result for pointwise estimators
converges to zero slower by
is
cr
is
(n<r
out,
)
which
,
Furthermore, the rate for partial means
going to zero.
exactly the nonparametric pointwise rate when the dimension
components are averaged
embodied
The rate implied by
for the asymptotic distribution.
i
is
the smaller will be
k
,
is
k
.
Thus, the more
and hence the faster will be the
convergence rate.
One important feature of this result
trimming" condition, where the density of
nonzero.
problem."
This condition
It
is
is
is
x
hypothesis
is
iii),
bounded away from zero where
theoretically convenient because
used here because
it
is
that amounts to a "fixed
not restrictive in
- 12 -
it
t
is
avoids the "denominator
many cases
(e.g.
pointwise
estimation) and because the resulting theory roughly corresponds to trimming based on a
large auxiliary sample, which
is
often available.
might be possible to modify the
It
results to allow trimming to depend on the sample size, e.g. as in Robinson (1988), but
would be very complicated.
this modification
Estimators will be √n-consistent when they are full means, i.e. averages over all components. There are many interesting examples of such estimators, such as the policy analysis estimator of Stock (1989). The general conditions given here are slightly different for that case, so it is helpful to describe the estimator and result in a slightly different way. Suppose that x is a continuously distributed variable that may be different than the x above, a_0(z) is some function of the data, and

(4.4)    β_0 = E[a_0(z) g_0(x)],    g_0(x) = E[q|x].

A kernel estimator is

    β̂ = Σ_{i=1}^n a_0(z_i) ĝ(x_i)/n,    ĝ(x) = ĥ_2(x)/ĥ_1(x),

as above, along with the associated asymptotic variance estimator from Section 3,

    Var(β̂) = Σ_{i=1}^n ψ̂_i^2/n,    ψ̂_i = a_0(z_i) ĝ(x_i) − β̂ + δ̂_i − Σ_{j=1}^n δ̂_j/n,
    δ̂_i = n^{-1} Σ_{j=1}^n a_0(z_j) f̂(x_j)^{-1} [q_i − ĝ(x_j)] K_σ(x_j − x_i).

The asymptotic variance of this estimator will be

(4.5)    V = Var(ψ_i),    ψ_i = a_0(z_i) g_0(x_i) − β_0 + E[a_0(z)|x = x_i] [q_i − g_0(x_i)].
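The full-mean estimator and its influence terms can be sketched together (Gaussian kernel assumed, names illustrative); note that the centered ψ̂_i average to zero by construction, which gives a built-in check:

```python
import numpy as np

def kernel(u):
    """Product Gaussian kernel (illustrative choice)."""
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi)**(u.shape[-1] / 2)

def full_mean(a0, X, q, sigma):
    """beta = mean_i a0_i ghat(x_i) for the kernel regression ghat = h2/h1,
    together with psi_i = a0_i ghat_i - beta + delta_i - mean(delta)."""
    n, k = X.shape
    W = kernel((X[:, None, :] - X[None, :, :]) / sigma) / sigma**k  # W[j, i] = K(x_j - x_i)
    fhat = W.mean(axis=1)
    ghat = (W @ q) / (n * fhat)
    beta = float(np.mean(a0 * ghat))
    # delta_i = n^{-1} sum_j a0_j fhat_j^{-1} [q_i - ghat_j] K_sigma(x_j - x_i)
    A = (a0 / fhat)[:, None] * W
    delta = (A * (q[None, :] - ghat[:, None])).mean(axis=0)
    psi = a0 * ghat - beta + delta - delta.mean()
    return beta, psi
```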
Theorem 4.2: Suppose that Assumptions K and H are satisfied for d ≤ s; i) E[|q|^4] < ∞ and E[|q|^4|x] f_0(x) is bounded; ii) E[a_0(z)|x], g_0(x), and f_0(x) are continuous a.e.; iii) a_0(z) is bounded, zero if x is not inside a compact set, and f_0(x) is bounded away from zero on that compact set; iv) E[|a_0(z)|^4] < ∞; v) n σ^{2k}/ln(n) → ∞ and n σ^{2s} → 0. Then

    √n (β̂ − β_0) →d N(0, V).

If, in addition, n σ^{3k} → ∞, then V̂ →p V.

This result gives asymptotic normality for a trimmed version of Stock's (1989) estimator, as well as being a general result on the asymptotic normality of sample moments that are random linear functions of kernel regressions.
5. Useful Lemmas

Several intermediate results of a familiar type are useful in developing asymptotic theory for the m-estimator β̂ described in Section 2. Uniform convergence results are useful for showing consistency of β̂ and of the Jacobian term in the expansion (3.1). Asymptotic normality of β̂ will follow from √n σ^a m̂_n(β_0) →d N(0, V). Also, when a = 0, corresponding to √n-consistency of β̂, it can be shown that there are ψ_i such that √n m̂_n(β_0) = Σ_{i=1}^n ψ_i/√n + o_p(1) and Σ_{i=1}^n ‖ψ̂_i − ψ_i‖^2/n →p 0, so that σ^{2a} V̂ →p V; this is very important for consistent estimation of the asymptotic variance. Primitive conditions for each of these results are given in this section. Examples of how these results can be used to derive results for particular functionals are given in the proofs of Theorems 4.1 and 4.2, and in the proofs of results in Hausman and Newey (1992), Matzkin and Newey (1992), and Newey and Ruud (1991).
A number of additional regularity conditions are used in the analysis to follow. The first regularity condition imposes some moment assumptions. For a matrix B, let ‖B‖ = [tr(B'B)]^{1/2}, where tr(·) denotes the trace of a square matrix.

Assumption Y: For p ≤ 4, E[‖y‖^p] < ∞, E[‖y‖^p|x] f_0(x) is bounded, and E[‖m(z, β_0, h_0)‖^2] < ∞.

This condition, like Assumptions K and H, is a standard type of condition. The fourth moment condition for y is useful for obtaining optimal convergence rates for ĥ. For the asymptotic theory, it is useful to impose smoothness conditions on m(z, β, h) as a function of h, in terms of a metric on the set of possible functions. Here, the metric is the supremum norm on the function and its derivatives, i.e. a Sobolev norm. The supremum norm is quite strong, but uniform convergence rates for a kernel estimator and its derivatives are either well known or straightforward to derive (see Appendix B), and are not very much slower than L^2 convergence rates (there is only an additional "log term" in the uniform rates). Consequently, conditions for remainder terms to go to zero fast enough to achieve asymptotic normality will not be much stronger with the supremum norm than they will be with L^2 norms. Furthermore, it is quite easy to show smoothness of many functionals in the supremum norm.

To define the norm, for a matrix of functions B(x) let ∂^j B(x)/∂x^j denote a vector consisting of all distinct j-th order partial derivatives of all elements of B(x). Also, let X denote a set that is contained in the support of x, and for a nonnegative integer d let

    ‖B‖_d = max_{j≤d} sup_{x∈X} ‖∂^j B(x)/∂x^j‖.

This is a Sobolev supremum norm of order d, taken equal to infinity if the derivatives do not exist for some x ∈ X.
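The Sobolev supremum norm can be approximated on a finite grid, with derivatives by central differences; a scalar-argument sketch for d ≤ 1, where the grid and step are illustrative assumptions:

```python
import numpy as np

def sobolev_sup_norm(B, grid, d=1, eps=1e-5):
    """||B||_d = max_{j<=d} sup_{x in X} |d^j B(x)/dx^j| for a scalar function B,
    with the supremum taken over a grid and the first derivative approximated
    by central differences (sketch; handles d <= 1 only)."""
    sup0 = max(abs(B(x)) for x in grid)
    if d == 0:
        return sup0
    sup1 = max(abs((B(x + eps) - B(x - eps)) / (2 * eps)) for x in grid)
    return max(sup0, sup1)
```

For B = sin on [0, π/2], both the level and the derivative have supremum one, so the order-0 and order-1 norms are both approximately one.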
One useful type of result is uniform convergence in probability, as in the conclusion of the following result. Let m̄(β) = E[m(z, β, h_0)].

Lemma 5.1: Suppose that Assumptions K, H, and Y are satisfied with d ≤ Δ + 1 and ln(n)/(n σ^{k+2Δ}) → 0, B is compact, and i) m(z, β, h_0) is continuous at each β ∈ B with probability one and E[sup_{β∈B} ‖m(z, β, h_0)‖] < ∞; ii) there are ε > 0 and b(z) with E[b(z)] < ∞ such that for all β ∈ B and ‖h − h_0‖_Δ < ε, ‖m(z, β, h) − m(z, β, h_0)‖ ≤ b(z) ‖h − h_0‖_Δ. Then

(5.1)    sup_{β∈B} ‖n^{-1} Σ_{i=1}^n m(z_i, β, ĥ) − m̄(β)‖ →p 0,

and m̄(β) is continuous on B.

The uniform convergence conclusion of equation (5.1) is a well known condition for consistency of the solution to equation (2.3). Also, equation (5.1) is useful in showing consistency of an estimator that maximizes an objective function n^{-1} Σ_{i=1}^n m(z_i, β, ĥ), where m is a scalar, and for showing consistency of the Jacobian term n^{-1} Σ_{i=1}^n ∂m(z_i, β̂, ĥ)/∂β, by letting the m in the statement of the lemma be each column of the derivative.
Asymptotic normality of √n σ^a m̂_n(β_0) is essential for asymptotic normality of β̂. This result has two components, which are a linearization around the true h and asymptotic normality of the linearization. It is useful to state these two components separately.

For the moment, assume that m(h) = m(z, β_0, h) is a linear functional that does not depend on z. The rate of convergence and the asymptotic variance of m(ĥ) will depend on the nature of m(h). Here the results are grouped into two main ones, the first involving √n-consistency (i.e. a = 0).

Lemma 5.2: If m(h) = ∫ v(x)' h(x) dx, where v(x) is continuous almost everywhere and zero outside a compact set, and there is a scalar c > 0 such that E[sup_{‖u‖≤c} ‖v(x + u)‖^2 E[‖y‖^2|x]] < ∞, then for δ_i = v(x_i)'y_i,

    √n [m(ĥ) − E[m(ĥ)]] = Σ_{i=1}^n {δ_i − E[δ_i]}/√n + o_p(1).
Cases where convergence is slower than 1/√n are somewhat more complicated. For the moment let J be a nonnegative integer, and let ∂^J ĥ(x)/∂x^J be ordered so that ∂^J[y_i K_σ(x − x_i)]/∂x^J = y_i ⊗ [∂^J K_σ(x − x_i)/∂x^J]. The following assumption is useful for these cases.

Assumption 5.1: k = k_1 + k_2 for k_1 ≤ k, and there are a matrix of functions ω(t) and a vector of functions x(t) = (x_1(t)', t')', with t ∈ R^{k_2} and x(t) in the domain R^k, such that i) m(h) = ∫ ω(t) [∂^J h(x(t))/∂x^J] dt; ii) ω(t) is bounded and continuous almost everywhere and zero outside a compact set T in R^{k_2}, and x_1(t) is continuously differentiable with bounded partial derivatives on a convex, compact set containing T in its interior; iii) Σ(x) = E[yy'|x] is continuous a.e., and for some ε > 0, ∫ sup_{‖η‖≤ε} E[‖y‖^4|x = (x_1(t) + η, t)] f_0(x_1(t) + η, t) dt < ∞.

The key condition here is i), the integral representation of m(h). The dimension of the argument being integrated and the order of the derivative lead to the convergence rate √n σ^{k_1/2 + J} for m(ĥ). Thus, every additional dimension of integration speeds up the convergence rate by a factor of √σ, while every additional derivative slows the rate by a factor of 1/σ. This hypothesis also leads to a specific form for the asymptotic variance of m(ĥ), which for a = k_1/2 + J is

(5.2)    V = ∫ ω(t) [Σ(x(t)) ⊗ ∫ K_J(u_1, t) K_J(u_1, t)' du_1] ω(t)' f_0(x(t)) dt,
         K_J(u_1, t) = ∫ ∂^J K(u_1 + [∂x_1(t)/∂t]v, v)/∂u_1^J dv.

Lemma 5.3: If Assumptions K, H, Y, and 5.1 are satisfied, √n σ^{a+s} → 0, and n σ^{k+2J}/ln(n) → ∞, then for a = k_1/2 + J,

    √n σ^a [m(ĥ) − m(h_0)] →d N(0, V).
Asymptotic normality in the more general case, where m(z, β_0, h) is nonlinear in h and depends on z, can be reduced to the previous cases by a linearization. Let m(z, h) = m(z, β_0, h), and again assume that this is a scalar. The following result is useful for the linearization.

Lemma 5.4: Suppose that Assumptions K, H, and Y are satisfied, X is compact, and there are a vector of functionals D(z, h) that is linear in h, a function b(z) with E[b(z)^2] < ∞, and nonnegative constants Δ, Δ_1, Δ_2 with Δ_1, Δ_2 ≤ Δ and d ≥ max{Δ + 1, Δ_1 + s, Δ_2 + s}, such that i) for all h in {h : ‖h − h_0‖_Δ ≤ c}, ‖m(z, h) − m(z, h_0) − D(z, h − h_0)‖ ≤ b(z) ‖h − h_0‖_{Δ_1} ‖h − h_0‖_{Δ_2}; ii) ‖D(z, h)‖ ≤ b(z) ‖h‖_Δ; iii) √n σ^a [ln(n)/(n σ^{k+2Δ_1})]^{1/2} [ln(n)/(n σ^{k+2Δ_2})]^{1/2} → 0; and iv) n σ^{k+2Δ}/ln(n) → ∞. Then for m(h) = ∫ D(z, h) dF(z),

    σ^a Σ_{i=1}^n [m(z_i, ĥ) − m(z_i, h_0)]/√n = √n σ^a [m(ĥ) − m(h_0)] + o_p(1).

The conditions of this result imply Frechet differentiability of m(z, h) at h_0 as a function of h, in the Sobolev norm ‖h‖_Δ with Δ = max{Δ_1, Δ_2}. The remainder bounds are formulated with different norms, rather than a single norm, to allow weaker conditions for asymptotic normality in some cases.
Asymptotic normality of
either
Lemma
Lemmas
5.2 or 5.3.
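The linearization idea behind the "delta method" variance estimator can be sketched numerically: perturb the kernel estimate toward each observation's own kernel term, difference the functional, and take the sample variance of the resulting influence approximations. This sketch is not the paper's exact estimator; the functional ∫h(x)²dx, the Gaussian kernel, the perturbation direction, and the step size ζ are all illustrative assumptions.

```python
import numpy as np

def kde(grid, x, sigma):
    """Kernel density estimate on a grid: h_hat = sum_i K_sigma(grid - x_i)/n."""
    u = (grid[:, None] - x[None, :]) / sigma
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (sigma * np.sqrt(2 * np.pi))

def functional(h, grid):
    """Example functional m(h) = integral of h(x)^2 dx (Riemann sum on a uniform grid)."""
    return float(np.sum(h ** 2) * (grid[1] - grid[0]))

def delta_method_se(x, sigma, grid, zeta=1e-4):
    """Numerical delta-method standard error for m(h_hat): tilt h_hat toward each
    observation's kernel term and difference the functional (a sketch, not the
    paper's exact construction)."""
    n = len(x)
    h_hat = kde(grid, x, sigma)
    m0 = functional(h_hat, grid)
    deltas = np.empty(n)
    for i in range(n):
        k_i = np.exp(-0.5 * ((grid - x[i]) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        h_pert = h_hat + zeta * (k_i - h_hat)   # move mass toward observation i
        deltas[i] = (functional(h_pert, grid) - m0) / zeta
    return float(np.sqrt(np.var(deltas) / n))

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
grid = np.linspace(-4.0, 4.0, 401)
se = delta_method_se(x, sigma=0.3, grid=grid)
```

For m(h) = ∫h², the influence function is 2f(x) - 2∫f², so the computed standard error should be close to √(4·Var(f(X))/n), which is about 0.01 for 500 standard normal draws.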
In the √n-consistent case of Lemma 5.2, it will follow from Lemmas 5.2 and 5.4 that

Σ_i m(z_i,ĥ)/√n = Σ_i{m(z_i,h_0) + δ_i - E[δ_i]}/√n + o_p(1),

so that asymptotic normality, with asymptotic variance V = Var(m(z_i,h_0) + δ_i), follows by the central limit theorem. In the slower than √n-consistent case, where σ^a Σ_i{m(z_i,h_0) - E[m(z,h_0)]}/√n →p 0, it will be the case that m(h) = ∫D(z,h)dF(z) satisfies the conditions of Lemma 5.3, so that

√n·σ^a Σ_i{m(z_i,ĥ) - E[m(z,h_0)]}/n →d N(0,V).
Assumption 5.2: There are b(z), ε > 0, and a functional D(z,h;β,h̄) that is linear in h, such that for all (β,h̄) satisfying ||β-β_0|| < ε and ||h̄-h_0||_Δ < ε: i) ||m(z,β,h) - m(z,β_0,h_0)|| ≤ b(z)[||β-β_0|| + ||h-h_0||_{Δ_1}]; ii) |m(z,β,h) - m(z,β,h̄) - D(z,h-h̄;β,h̄)| ≤ b(z)||h-h̄||²_{Δ_1}; iii) ||D(z,h;β,h̄) - D(z,h;β_0,h_0)|| ≤ b(z)[||β-β_0|| + ||h̄-h_0||_{Δ_2}]·||h||_{Δ_2}; iv) E[b(z)⁴] < ∞.

Lemma 5.5: Suppose that Assumption 5.2 is satisfied, ||β̂-β_0|| = O_p(1/√n), m(h) = ∫D(z,h;β_0,h_0)dF(z) satisfies the conditions of Lemma 5.4, nσ^{2k+2Δ-2a}/ln(n) → ∞, and nσ^{3k+2Δ_1+2Δ_2-2a} → ∞. Then V̂ →p V, where V = Var(m(z_i,h_0) + δ_i) in the √n-consistent case of Lemma 5.2, and V is as in equation (5.2) in the slower than √n-consistent case of Lemma 5.3.
Appendix A: Proofs of Theorems

Throughout the appendix, C will denote a generic constant that may be different in different uses, and Σ_i = Σ_{i=1}^n. Also, CS, M, and T will refer to the Cauchy-Schwartz, Markov, and triangle inequalities, respectively, and DCT to the dominated convergence theorem. Before proving the results in the body of the paper it is useful to state and prove some intermediate results.
Proof of Theorem 4.1: The proof proceeds by checking the conditions of Lemmas 5.3-5.5, for a = k_1/2. Let x = (x_1,x_2), τ(x) = τ(x_1), ||h|| = sup_{x∈X}||h(x)|| for the compact set X of hypothesis iii), m(z,h) = τ(x)h_2(x)/h_1(x), and

D(z,h;h̄) = τ(x)h̄_1(x)^{-1}[h_2(x) - {h̄_2(x)/h̄_1(x)}h_1(x)],  D(z,h) = D(z,h;h_0).

Choose ε small enough that h_1(x) is bounded below on X for all h with ||h-h_0|| < ε. Then for such h and h̄,

|m(z,h) - m(z,h̄) - D(z,h-h̄;h̄)| = |h̄_1(x)^{-1}[h_1(x) - h̄_1(x)]·D(z,h-h̄;h̄)| ≤ C||h-h̄||²,  |D(z,h;h̄)| ≤ C||h||,

so that Assumption 5.2 is satisfied. By hypothesis, ||ĥ-h_0|| = O_p([ln(n)/(nσ^k)]^{1/2} + σ^s), and k - a = k_1/2 + k_2 and 3k/2 - 2a = k_1/2 + 3k_2/2, so that the rate hypotheses of Lemma 5.4 are satisfied. Furthermore,

m(h) = ∫D(z,h;h_0)dF(z) = ∫τ(x(t))f_0(x(t))^{-1}[h_2(x(t)) - g_0(x(t))h_1(x(t))]f_2(t)dt,

so that Assumption 5.1 is satisfied with x(t) = (x_1,t) and ω(t) = τ(x(t))f_0(x(t))^{-1}f_2(t)[-g_0(x(t)),1]; ω(t) is bounded and continuous a.e. and zero outside a compact set by continuity of f_0, f_2, and g_0 and the assumption about τ, and the other conditions of Assumption 5.1 are also satisfied by hypothesis. Thus, the conclusion of Lemma 5.3 holds, for V in equation (4.3). To finish the proof, note that

√n·σ^a Σ_i{m(z_i,ĥ) - E[m(z,h_0)]}/n = √n·σ^a Σ_i{m(z_i,h_0) - E[m(z,h_0)]}/n + √n·σ^a[m(ĥ) - m(h_0)] + o_p(1),

and the first term converges in probability to zero by M, because σ^{2a}·Var(m(z_i,h_0)) → 0. The conclusion then follows by the conclusion of Lemma 5.3. QED.
Proof of Theorem 4.2: The proof proceeds similarly to the proof of Theorem 4.1, except that m(z,h) = a(z)h_2(x)/h_1(x) and

D(z,h;h̄) = a(z)h̄_1(x)^{-1}[h_2(x) - {h̄_2(x)/h̄_1(x)}h_1(x)],  b(z) = C||a(z)||.

Here, for ψ(x) = E[a(z)|x],

m(h) = ∫D(z,h;h_0)dF(z) = E[a(z)f_0(x)^{-1}{h_2(x) - g_0(x)h_1(x)}] = E[ψ(x)f_0(x)^{-1}{h_2(x) - g_0(x)h_1(x)}] = ∫ψ(x)[h_2(x) - g_0(x)h_1(x)]dx,

so that the conditions of Lemma 5.3 are satisfied for this function, as in the proof of Theorem 4.1, with δ_i = ν(x_i)y_i for ν(x) = ψ(x)f_0(x)^{-1}f_0(x)(-g_0(x),1). Also, here a = k/2, so that by the conclusion of Lemma 5.4,

√n·σ^a Σ_i{m(z_i,ĥ) - E[m(z,h_0)]}/n = Σ_i{δ_i - E[δ_i]}/√n + o_p(1),

and the first conclusion then follows as before. The second conclusion follows from Lemma 5.5 similarly to the proof of Theorem 4.1. QED.

Proof of Theorem 5.1: It follows by Lemma B.2 and the hypotheses that

sup_β||n^{-1}Σ_i[m(z_i,β,ĥ) - m(z_i,β,h_0)]|| ≤ (ln(n)^{1/2}(nσ^{k+2Δ})^{-1/2} + σ^s)·O_p(1) →p 0,

and it follows by standard results (e.g. Tauchen, 1985) that E[m(z,β,h_0)] is continuous in β and that sup_β||n^{-1}Σ_i m(z_i,β,h_0) - E[m(z,β,h_0)]|| →p 0. Therefore, by T, sup_β||n^{-1}Σ_i m(z_i,β,ĥ) - E[m(z,β,h_0)]|| →p 0, and the first conclusion then follows. Also, the second conclusion follows by standard results, and the fact that this last condition implies the uniform convergence needed for consistency. QED.
Proof of Lemma 5.2: By the Fubini theorem, E[m(ĥ)] = m(E[ĥ]). Also, by standard results, sup_{x∈X}||E[ĥ](x) - h_0(x)|| = O(σ^s) for any compact set X. Then by ν(x) bounded and zero outside a compact set X and m(h) linear,

√n|E[m(ĥ)] - m(h_0)| ≤ √n·C·sup_{x∈X}||E[ĥ](x) - h_0(x)|| = O(√n·σ^s) → 0.

Next, note that m(ĥ) = Σ_i δ_i/n for

δ_i = [∫ν(x)K_σ(x-x_i)dx]y_i = [∫ν(x_i+σu)K(u)du]y_i,

where the last equality follows by the change of variables u = (x-x_i)/σ. Let δ̄_i = ν(x_i)y_i and e_i = δ_i - δ̄_i. By ν bounded and continuous a.e., ||ν(x_i+σu)K(u)|| ≤ b(x_i)|K(u)| with ∫b(x_i)|K(u)|du < ∞, so by DCT, E[||e_i||²] → 0 as σ(n) → 0, and hence Σ_i{e_i - E[e_i]}/√n →p 0 by M. Therefore,

√n[m(ĥ) - m(h_0)] = √n{m(ĥ) - E[m(ĥ)]} + √n{E[m(ĥ)] - m(h_0)} = Σ_i{δ̄_i - E[δ̄_i]}/√n + Σ_i{e_i - E[e_i]}/√n + o(1) = Σ_i{δ̄_i - E[δ̄_i]}/√n + o_p(1),

so the conclusion follows by the central limit theorem, with V = Var(δ̄_i). QED.
Proof of Lemma 5.3: Let p_σ(x) = ∫ω(t)[I⊗K*((x(t)-x)/σ)]dt, where I is an identity matrix with the same dimension as y, so that m(ĥ) = Σ_i p_σ(x_i)y_i/n. Let T̄ be a compact, convex set containing T in its interior. Then for small enough σ, p_σ(x) is zero for x_2 outside T̄, by ω(t) bounded and zero outside T and K(u) zero outside a bounded set U. For x_2 ∈ T̄, the change of variables v = (t-x_2)/σ and a mean value expansion give

σ^{-k_2}·p_σ(x_1(x_2)-σu, x_2) = ∫ω(x_2+σv)[I⊗K([x_1(x_2+σv)-x_1(x_2)]/σ + u, v)]dv = ∫ω(x_2+σv)[I⊗K(J(x̄_2)v + u, v)]dv,  J(t) = ∂x_1(t)/∂t,

which is bounded over v ∈ U and x_2 ∈ T̄ and, by continuous differentiability of x_1(t) and DCT, converges to ∫ω(x_2)[I⊗K(J(x_2)v + u, v)]dv with probability one as σ → 0. Therefore, σ^{2a}·Var(p_σ(x_i)y_i) → V and σ^{4a}·E[||p_σ(x_i)y_i||⁴]/n → 0 by the hypothesis √n·σ^{k/2+ℓ} → ∞. Also, E[m(ĥ)] = m(E[ĥ]) and √n·σ^a{E[m(ĥ)] - m(h_0)} = O(√n·σ^{a+s}) → 0. Thus, it suffices, by the Liapunov central limit theorem, to conclude that √n·σ^a{m(ĥ) - E[m(ĥ)]} →d N(0,V). QED.
Proof of Lemma 5.4: By ii), Lemma B.3, and hypothesis iv),

√n·σ^a||Σ_i{m(z_i,ĥ) - m(z_i,h_0) - D(z_i,ĥ-h_0)}/n|| ≤ √n·σ^a[Σ_i b(z_i)/n]·||ĥ-h_0||²_{Δ_1} = O_p(√n·σ^a{[ln(n)/(nσ^{k+2Δ_1})]^{1/2} + σ^s}²) →p 0.

By linearity of D(z,h) in h, Σ_i D(z_i,ĥ)/n = Σ_i Σ_j D_{ij}/n² for D_{ij} = D(z_i, y_j K_σ(·-x_j)). Let D̄_i = E[D_{ij}|z_i] and D̃_j = E[D_{ij}|z_j]. Note that E[||D_{ij}||²] ≤ E[b(z)²]·E[||y||²]·Cσ^{-2k-2Δ_2}, so by a V-statistic projection on the basic observations (e.g. Serfling, 5.2.2b),

√n·σ^a·Σ_i Σ_j{D_{ij} - D̄_i - D̃_j + E[D_{ij}]}/n² = O_p(σ^{a-k-Δ_2}/√n) →p 0.

Also, by iii), ||E[ĥ]-h_0||_{Δ_2} → 0 implies E[||D̄_i - D(z_i,h_0)||²] → 0, so by Chebyshev's inequality,

σ^a·Σ_i{D̄_i - E[D̄_i] - D(z_i,h_0) + E[D(z_i,h_0)]}/√n →p 0,

and similarly for D̃_j. The conclusion then follows by T. QED.
Proof of Lemma 5.5: Let D_{ij} = D(z_i, y_j K_σ(·-x_j); β̂, ĥ) and D⁰_{ij} = D(z_i, y_j K_σ(·-x_j); β_0, h_0), so that δ̂_i = Σ_j D_{ji}/n and δ_i = Σ_j D⁰_{ji}/n. By Assumption 5.2 and ||K_σ(·-x_j)||_{Δ_2} ≤ Cσ^{-k-Δ_2},

||D_{ij} - D⁰_{ij}|| ≤ b(z_i)||y_j||σ^{-k-Δ_2}(||β̂-β_0|| + ||ĥ-h_0||_Δ) = b(z_i)||y_j||σ^{-k-Δ_2}·O_p(ρ_n),  ρ_n = n^{-1/2} + [ln(n)/(nσ^{k+2Δ})]^{1/2} + σ^s.

Then by CS,

σ^{2a}·Σ_i||δ̂_i - δ_i||²/n ≤ Cσ^{2a-2k-2Δ_2}·O_p(ρ_n²)·[Σ_i b(z_i)²/n][Σ_j||y_j||²/n] →p 0,

by the data i.i.d., M, and the rate hypotheses nσ^{2k+2Δ-2a}/ln(n) → ∞ and nσ^{3k+2Δ_1+2Δ_2-2a} → ∞. Also, as shown in the proof of Lemma 5.3, for p_σ(x) defined in that proof,

σ^{2a}·E[δ_i δ_i'] = σ^{2a}·∫p_σ(x_1(t)-σu,t)Σ(x_1(t)-σu,t)p_σ(x_1(t)-σu,t)'f_0(x_1(t)-σu,t)dtdu → V,   (A.6)

and σ^{4a}·E[||δ_i||⁴]/n → 0, so by M and the law of large numbers, σ^{2a}·Σ_i δ_i δ_i'/n →p V and σ^{2a}·||Σ_i δ_i/n||² →p 0. Therefore, by T and CS,

σ^{2a}·Σ_i(δ̂_i - Σ_j δ̂_j/n)(δ̂_i - Σ_j δ̂_j/n)'/n →p V,

so the conclusion follows by T. QED.
Appendix B: Technical Details

This appendix derives rates for uniform convergence in probability in Sobolev norms for derivatives of kernel estimators.
Lemma B.1: Suppose that Assumption K is satisfied, E[||y||^p] < ∞ for some p > 2, E[||y||^p|x]f_0(x) is bounded, and σ(n) → 0 with n^{1-(2/p)}σ(n)^{k+2J}/ln(n) → ∞. Then for a closed, compact set X,

||ĥ - E[ĥ]|| = sup_{j≤J} sup_{x∈X} ||∂^j ĥ(x)/∂x^j - ∂^j E[ĥ](x)/∂x^j|| = O_p((ln(n))^{1/2}(nσ^{k+2J})^{-1/2}).

Proof: It suffices to prove the result for y a scalar and for each fixed j ≤ J; by K(u) having bounded support the order of differentiation and integration can be interchanged to obtain E[∂^j ĥ(x)/∂x^j] = ∂^j E[ĥ](x)/∂x^j, and the derivative index is suppressed for notational convenience. Let k(x) denote the corresponding j-th order partial derivative of K(x), Ĥ(x) = n^{-1}σ^{-k-j}Σ_i y_i k((x-x_i)/σ), and δ = [ln(n)/(nσ^{k+2j})]^{1/2}. Also, for a constant P, let ỹ_i = y_i·1(|y_i| ≤ Pn^{1/p}) and H̃(x) = n^{-1}σ^{-k-j}Σ_i ỹ_i k((x-x_i)/σ). Note that by Bonferroni's inequality and M,

(B.1)  Prob(H̃(x) ≠ Ĥ(x) for some x) ≤ n·Prob(|y_i| > Pn^{1/p}) ≤ E[|y_i|^p]/P^p.

Also, for c(x) = E[|y_i|^p|x_i=x], c(x)f_0(x) is bounded, so that

(B.2)  δ^{-1}|E[H̃(x)] - E[Ĥ(x)]| ≤ δ^{-1}σ^{-k-j}E[1(|y_i|>Pn^{1/p})|y_i||k((x-x_i)/σ)|] ≤ δ^{-1}σ^{-j}(Pn^{1/p})^{1-p}·∫|k(v)|c(x-σv)f_0(x-σv)dv = o(1),

uniformly in x ∈ X. Next, by k(x) Lipschitz and X compact, X can be covered by less than Cn^{3k} open balls of radius n^{-3}. Let x_{jε} denote the centers of these open balls and x_{jε}(x) a center of an open ball containing x. Then for all x,

(B.3)  sup_X|H̃(x)-E[H̃(x)]| ≤ sup_X|{H̃(x)-E[H̃(x)]} - {H̃(x_{jε}(x))-E[H̃(x_{jε}(x))]}| + sup_{jε}|H̃(x_{jε})-E[H̃(x_{jε})]| ≤ Cn^{(1/p)-3}σ^{-k-j-1} + sup_{jε}|H̃(x_{jε})-E[H̃(x_{jε})]|,

and ln(n)·δ^{-1}·n^{(1/p)-3}σ^{-k-j-1} → 0. Also, σ^{-2k-2j}Var(ỹ_i k((x-x_i)/σ)) ≤ Cσ^{-k-2j} by E[y²|x]f_0(x) bounded, so by Bernstein's inequality, for M large enough,

(B.4)  Prob(sup_{jε}|H̃(x_{jε})-E[H̃(x_{jε})]| > Mδ/2) ≤ Σ_{jε}Prob(|H̃(x_{jε})-E[H̃(x_{jε})]| > Mδ/2) ≤ Cn^{3k}exp(-CM²ln(n)) ≤ C·exp(-[CM²-3k]ln(n)) → 0.

Since these inequalities hold for any M large enough, sup_X|H̃(x)-E[H̃(x)]| = O_p(δ). Consider any e > 0. Choose P such that E[|y_i|^p]/P^p < e/2, so that by eq. (B.1), Prob(δ^{-1}sup_X|Ĥ(x)-H̃(x)| > M/2) < e/2 for all n. For this fixed P, there exists M such that Prob(δ^{-1}sup_X|H̃(x)-E[H̃(x)]| > M/2) < e/2 for all n large enough. Then by eq. (B.2) and the triangle inequality, Prob(δ^{-1}sup_X|Ĥ(x)-E[Ĥ(x)]| > M) < e, so that sup_X|Ĥ(x)-E[Ĥ(x)]| = O_p(δ). The conclusion then follows by applying this conclusion to each derivative of order up to J. QED.
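The uniform convergence rate can be checked informally by simulation. The sketch below is not from the paper: it measures the sup-norm error of a Gaussian kernel density estimate against the true standard normal density over a fixed compact set, with a bandwidth that shrinks slowly with n so that both the variance and bias terms vanish; the bandwidth rule n^{-1/5}, the grid, and the seed are assumptions of the example.

```python
import numpy as np

def kde(grid, x, sigma):
    """Gaussian kernel density estimate evaluated on a grid."""
    u = (grid[:, None] - x[None, :]) / sigma
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (sigma * np.sqrt(2 * np.pi))

def sup_error(n, sigma, rng):
    """Sup-norm error over a fixed compact set, as in the uniform-convergence
    lemma, for data drawn from N(0,1)."""
    x = rng.standard_normal(n)
    grid = np.linspace(-2.0, 2.0, 201)
    truth = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)
    return float(np.max(np.abs(kde(grid, x, sigma) - truth)))

rng = np.random.default_rng(0)
# Shrink sigma slowly with n so both the ln(n)/(n*sigma) variance term
# and the smoothing-bias term go to zero.
err_small = sup_error(200, sigma=200.0 ** -0.2, rng=rng)
err_large = sup_error(20000, sigma=20000.0 ** -0.2, rng=rng)
```

With a hundred-fold larger sample, the sup-norm error should shrink noticeably, consistent with the [ln(n)/(nσ^{k+2j})]^{1/2} rate.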
Lemma B.2: If Assumptions K, H, and Y are satisfied for d ≥ j+s, then ||E[ĥ] - h_0||_j = O(σ^s).

Proof: Note that

(B.5)  E[ĥ](x) = E[y_i K_σ(x-x_i)] = ∫h_0(t)[K((x-t)/σ)/σ^k]dt = ∫K(u)h_0(x+uσ)du,

so that by K(u) having finite support, ∂^j E[ĥ](x)/∂x^j = ∫K(u)[∂^j h_0(x+uσ)/∂x^j]du. Then by a Taylor expansion around σ = 0, ∫K(u)du = 1, and ∫K(u){⊗^m_{r=1}u}du = 0 for m < s, it follows that for constant matrices C_m,

(B.6)  ||∂^j E[ĥ](x)/∂x^j - ∂^j h_0(x)/∂x^j|| = ||∫K(u){⊗^s_{r=1}u}⊗{∂^{j+s}h_0(x+ūσ)/∂x^{j+s}}du||·σ^s ≤ [∫|K(u)|·||u||^s du]·sup_x||∂^{j+s}h_0(x)/∂x^{j+s}||·σ^s ≤ Cσ^s.

QED.

Lemma B.3: If the hypotheses of Lemmas B.1 and B.2 are satisfied with d ≥ j+s, then ||ĥ - h_0||_j = O_p((ln(n))^{1/2}(nσ^{k+2j})^{-1/2} + σ^s).

Proof: Follows by Lemmas B.1 and B.2 and the triangle inequality. QED.

Lemma B.4: If Assumption K is satisfied, m(h) is linear, and |m(h)| ≤ C||h||_j, then E[m(ĥ)] = m(E[ĥ]).

Proof: Let F_ℓ be a sequence of measures with finite support that converge in distribution to the distribution F_0 of x_i (e.g. the empirical measure from a sequence of i.i.d. draws) as ℓ → ∞. By linearity of m(h), it follows that

m(∫g(x)K_σ(·-x)F_ℓ(dx)) = ∫m(g(x)K_σ(·-x))F_ℓ(dx).

Since each derivative of g(x)K_σ(·-x) up to order j is continuous and bounded on a compact set, ||∫g(x)K_σ(·-x)F_ℓ(dx) - E[ĥ]||_j → 0 as ℓ → ∞, and hence, since m is linear and continuous on {h : ||h||_j < ∞}, m(∫g(x)K_σ(·-x)F_ℓ(dx)) → m(E[ĥ]). Also, by K(u) having finite support, m(g(x)K_σ(·-x)) is bounded and continuous in x for all x in a compact set W and zero for x outside W, so that

∫m(g(x)K_σ(·-x))F_ℓ(dx) → ∫m(g(x)K_σ(·-x))f_0(x)dx = E[m(ĥ)].

Combining these two limits gives the conclusion. QED.
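The bias-reduction role of higher-order kernels (zero moments ∫u^m K(u)du = 0 for m < s give bias of order σ^s) can be illustrated numerically. This deterministic sketch compares the smoothing bias at the origin for a second-order Gaussian kernel and the standard fourth-order Gaussian kernel (3 - u²)φ(u)/2, against the true standard normal density; it is an illustration, not part of the paper's argument.

```python
import numpy as np

def phi(u):
    """Standard normal density."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def bias_at_zero(kernel, sigma):
    """Smoothing bias E[h_hat](0) - f(0) = int K(u) f(sigma*u) du - f(0)
    for the true density f = standard normal, by a Riemann sum."""
    u = np.linspace(-8.0, 8.0, 4001)
    du = u[1] - u[0]
    return float(np.sum(kernel(u) * phi(sigma * u)) * du - phi(0.0))

def k2(u):
    """Second-order (Gaussian) kernel."""
    return phi(u)

def k4(u):
    """Fourth-order Gaussian kernel: integrates to 1, second moment zero."""
    return 0.5 * (3.0 - u ** 2) * phi(u)

b2 = bias_at_zero(k2, sigma=0.3)
b4 = bias_at_zero(k4, sigma=0.3)
```

At σ = 0.3 the second-order bias is of order σ² while the fourth-order bias is of order σ⁴, so |b4| should be roughly an order of magnitude smaller than |b2|.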
Lemma B.5: If Assumption K is satisfied and for given h̄ there is a linear functional D(h) with |m(h) - m(h̄) - D(h-h̄)| = o(||h-h̄||_Δ) as ||h-h̄||_Δ → 0, then

dm(h̄ + ζyK_σ(·-x))/dζ|_{ζ=0} = D(yK_σ(·-x)).

Proof: Let h_ζ = h̄ + ζyK_σ(·-x), so that ||h_ζ - h̄||_Δ ≤ |ζ|·||yK_σ(·-x)||_Δ ≤ C|ζ|. Then

|m(h_ζ) - m(h̄) - ζD(yK_σ(·-x))|/|ζ| = |m(h_ζ) - m(h̄) - D(h_ζ-h̄)|/|ζ| = o(||h_ζ-h̄||_Δ)/|ζ| = o(1).

QED.
References

Bierens, H.J. (1987): "Kernel Estimators of Regression Functions," in Bewley, T.F., ed., Advances in Econometrics: Fifth World Congress, Cambridge: Cambridge University Press.

Breiman, L. and J.H. Friedman (1985): "Estimating Optimal Transformations for Multiple Regression and Correlation," Journal of the American Statistical Association, 80, 580-598.

Hall, P. (1992): The Bootstrap and Edgeworth Expansion, New York: Springer-Verlag.

Hardle, W. and T.M. Stoker (1989): "Investigating Smooth Multiple Regression by the Method of Average Derivatives," Journal of the American Statistical Association, 84, 986-995.

Hausman, J.A. and W.K. Newey (1992): "Nonparametric Estimation of Exact Consumer Surplus and Deadweight Loss," working paper, MIT Department of Economics.

Huber, P. (1967): "The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley: University of California Press.

Lerman, S.R. and C.F. Manski (1981): "On the Use of Simulated Frequencies to Approximate Choice Probabilities," in Manski, C.F. and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications, Cambridge: MIT Press.

Li, K.C. and N. Duan (1989): "Regression Analysis Under Link Violation," Annals of Statistics, 17, 1009-1052.

Matzkin, R. and W.K. Newey (1992): "Nonparametric Estimation of Structural Discrete Choice Models," mimeo, Department of Economics, Yale University.

Newey, W.K. and P.A. Ruud (1992): "Density Weighted Least Squares Estimation," working paper, MIT Department of Economics.

Powell, J.L., J.H. Stock, and T.M. Stoker (1989): "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403-1430.

Robinson, P. (1988): "Root-N-Consistent Semiparametric Regression," Econometrica, 56, 931-954.

Ruud, P.A. (1986): "Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution," Journal of Econometrics, 32, 157-187.

Stock, J.H. (1989): "Nonparametric Policy Analysis," Journal of the American Statistical Association, 84, 567-575.

Tauchen, G.E. (1985): "Diagnostic Testing and Evaluation of Maximum Likelihood Models," Journal of Econometrics, 30, 415-443.

Willig, R. (1976): "Consumer's Surplus without Apology," American Economic Review, 66, 589-597.