Document 11158560

advertisement
Digitized by the Internet Archive
in
2011 with funding from
Boston Library Consortium
Member
Libraries
http://www.archive.org/details/convergencerates00newe2
uewcjf
HB31
.M415
working paper
department
of economics
CONVERGENCE RATES & ASYMPTOTIC NORMALITY
FOR SERIES ESTIMATORS
Whitney
95-13
K.
Newey
Jan. 1995
massachusetts
institute of
technology
50 memorial drive
Cambridge, mass. 02139
CONVERGENCE RATES & ASYMPTOTIC NORMALITY
FOR SERIES ESTIMATORS
Whitney
K.
Newey
Jan. 1995
95-13
MASSACHUSETTS INSTITUTE
OF TFCKNtunGY
AUG 1 6
1995
95-13
CONVERGENCE RATES AND ASYMPTOTIC NORMALITY
FOR SERIES ESTIMATORS
Whitney K. Newey
Department of Economics, MIT
January, 1995
ABSTRACT
This paper gives general conditions for convergence rates and asymptotic normality of
series estimators of conditional expectations, and specializes these conditions to
polynomial regression and regression splines.
rates are derived.
estimators, covering
Asymptotic normality
is
Both mean-square and uniform convergence
shown for nonlinear functionals of series
many cases not previously
Also, a simple condition for
treated.
v'n-consistency of a functional of a series estimator
is
given.
The regularity conditions
are straightforward to understand, and several examples are given to illustrate their
application.
Keywords:
Nonparametric Estimation, Series Estimation, Convergence Rates,
Asymptotic Normality.
Whitney K. Newey
Department of Economics
MIT; E52-262D
Cambridge,
MA
Phone:
(617) 253-6420
Fax:
(617) 253-1330
E-mail: wnewey@mit.edu
02139
Support was provided by the NSF for research for this paper.
Introduction
1.
Nonparametric estimation
is
depend on a conditional expectation
analysis, one could model the
E[y|x]
is
1990).
Series estimation
unknown.
For example,
demand
in
Hausman and Newey,
One way to estimate
1995).
Also,
it
is
convenient for imposing certain restrictions on
is
such as additive separability
relatively
is
Models often
by least squares regression on approximating functions, referred to here as
series estimation.
E[y|x],
that
applications.
demand function as a conditional expectation with unknown
functional form (e.g. see Deaton, 1988, or
E[y|x]
many econometric
useful in
(e.g.
see Stone, 1985, or Andrews and Whang,
computationally convenient, because the data
few estimated
is
summarized by a
coefficients.
Large sample properties of series estimators have been derived by Stone
(1985),
Cox
(1988),
Andrews and Whang
(1990),
Newey
(1988, 1994a, b, c).
This paper extends previous results on convergence rates and
Eastwood and Gallant
Gallant and Souza (1991),
These extensions include convergence rates that
asymptotic normality in several ways.
are faster than some published ones
(1991),
(e.g.
such as Cox, 1988) or have weaker side
conditions than current results (e.g. than Newey, 1994a).
Also, asymptotic normality is
shown for regression splines and/or nonlinear functionals that are not covered by the
results of
Andrews
(1991),
and a simple primitive condition for v'n-consistency
Newey
K
that
is less
restrictive than previous results (e.g. those of
1994), requiring only
2
K /n
given.
under an upper bound on the growth of number of
In addition, the results are derived
terms
is
—
for regression splines and
>
Andrews
3
K /n
—
>
1991 or
for power
series estimators of linear functionals.
To describe a series estimator,
expectation and
g
let
g n (x) = E[y|x]
denote some function of
x.
denote the true conditional
Also, consider a vector of approximating
functions
K
p (x) = (P 1K (x)
Pkk (x))
''
(1)
having the property that a linear combination can approximate
(i
=
1,
n)
...,
g(x) = p
where
A
denote the data.
K
series estimator of
will be nonsingular with probability approaching one,
is
P (x n )]'.
(2)
Under conditions given below,
denotes any symmetric generalized inverse.
B
(y.,x.),
K
[p^x^
P =
= (P'PfP'Y,
(x)'/3,
g n (x)
Let
g n (x).
and hence
P'P
will be the
(P'P)
standard inverse.
Series estimators are convenient for imposing certain types of restrictions.
example, suppose that
E[y|x]
is
additively separable in
two subvectors
x
For
and
x,
of
cL
x,
so that
E[y|x] = g (x ) + g (x ).
a a
b b
(3)
This restriction can be imposed by including in
x
or
3.
x,
,
D
but not on both.
Additivity of
p (x)
functions that depend either on
then results from
g(x)
combination of these functions of the separate arguments.
linear model,
where one of the components
is
it
being a linear
Another example
a linear function, as
a partially
is
in
E[y|x] = x r + g (x )
a Q
b b
(4)
This restriction can be imposed by including only
x
and functions of
cL
The value of imposing either of these restrictions
is
when
of categorical variables.
We
x
a
p
in
D
(x).
that efficiency improvements
result, in the sense that convergence rates are faster.
particularly convenient
x,
The partially linear model
has a large number of elements,
e.g.
when
x
a
is
consists
explicitly consider both types of restrictions in this
paper.
There are several different types of series estimators that could be used, such as
Fourier series, power series, and splines.
because, as in Andrews (1991),
g
is periodic,
(x)
which
is
Fourier series are not considered here
is difficult
it
to derive primitive conditions except
not relevant for most econometric applications.
when
Primitive
conditions will be given in Sections 5 and 6 for power series and regression splines.
Convergence Rates
2.
The results
will follow
from a few
primitive, easy to interpret conditions.
The
first condition is
Assumption
1:
(y.,x
(y ,x
)
are
)
The bounded conditional variance assumption
convergence rates.
For the next condition
norm for a matrix
Assumption
P
K
= Bp
(x)
2:
(x);
K
xeX
IIP
(x)ll
there
K
and;
£ < (K)
Q
and
ii)
let
IIBII
is
K(n)
is
bounded.
to relax without affecting the
= [trace(B'B)]
be the support of
There
K =
Var(y|x)
is difficult
1/2
is
K
K
E[P (x)P (x)']
is
a sequence of constants
such that
be the Euclidean
x..
a nonsingular constant matrix
the smallest eigenvalue of
i)
zero uniformly in
sup
K
For every
K
X
Also, let
B.
and
i.i.d.
2
C Q (K) K/n
—
»
B
such that for
bounded away from
satisfying
C n (K)
as
n
-4
«.
This condition imposes a normalization on the approximating functions, bounding the
second moment matrix away from singularity, and restricting the magnitude of the series
terms.
This condition
is
useful for controlling the convergence in probability of the
sample second moment matrix of the approximating functions to
the Euclidean norm.
This
is
Newey
(1988) and
Andrews
(1991) also
also difficult for Gallant's Fourier flexible form.
its
expectation, in
impose bounds on the smallest
The reasons for
difficulty are discussed in detail in Gallant and Souza (1991).
this
moment matrix.
eigenvalue of the second
Primitive conditions are given below for
regression splines and power series when the density of
with
CVK
equal to
C n (K)
2
—
K /n
restrictions
CK
3
—
K /n
or
»
and
mentioned
»
To do
so,
let
is
it
|gl
Assumption
as
K
—
3:
d
=
leading to the
useful specify a rate of
A
(A.
= A
)'
let
x,
denote a vector of
|A|
= E-.A.,
and
|
£d
sup
|A|
xg;r
For an integer
d £
1
g(x)/9x
la
«««ax
there are
a,
r
r
|
such that
£
lg n ~P
'Pv'h =
derivatives up to order
d
K
shrink at
The integer
.
d
will be specified
the order of the derivative for which a uniform convergence rate
x,
^^
'
m.
»
This condition requires that the uniform approximation error to the function and
integer
be
d
Define
max
|;V
C,
in the introduction.
nonnegative integers, having the same dimension as
any nonnegative integer.
bounded away from zero,
is
respectively, for a constant
For controlling the bias of the estimator
approximation for the series.
x
a
is
related to the smoothness of the function
and the size of
d.
g n (x)
a =
that exist and
where
s/r,
r
is
s
is
below as
The
the dimensionality of
For example, for splines and power series and
assumption will be satisfied with
derivatives of
g n (x),
derived.
is
its
d = 0,
this
the number of continuous
the dimension of
x.
It
should also be
noted that to derive a mean-square convergence rate only a mean square approximation rate
is
needed
(i.e.
E[<g (x)-p (x)'P„}
u
K.
]
= 0(K
rate specified here, but for simplicity
we do
To state the general convergence rate
rather than the uniform convergence
)),
make
not
result,
denote the cumulative distribution function of
x.
l
C d (K) =
max
lx|sd
sup
A
xeX
lia
P
K
(x)ll.
this generalization explicit.
some notation
and
is
required.
Let
F
Q
(x)
It
assumed throughout that
will be
exists whenever
result on
Theorem
appears
it
C rf (K) £
for large enough
1
and that
K,
C H (K)
The following result gives a general
in the assumptions.
mean-square and uniform convergence rates.
If Assumptions
1:
1-3
d =
are satisfied with
2
S[g (x) - g(x)] dFQ ( X ) = O (K/n + K
then
2CL
).
p
some
Also, if Assumptions 1-3 are satisfied for
\g -
This result
is
g
\
d
= O «: (K)[VR/Vn + K~
p d
similar to
Newey
d £
a
]).
mean square
is
K
(1994a), except that the requirement that
2
C n (K) K/n
depend on the data here and the requirement that
square error result
then
different than Andrews and
error, rather than sample
Whang
mean square
—
is
>
does not
The mean
different.
(1990), in applying to integrated
error, and in imposing Assumption 2.
The conclusion for mean-square error leads to optimal convergence rates for power
series and splines,
i.e.
that attain Stone's (1982) bound.
K
corresponds to a variance term and
to a bias term.
these two terms go to zero at the same rate, which occurs
the same rate as
*""'"
n
convergence rate will be
rate will be
n
-r/(r+2s)
,
(and the side condition
n
0,
K
when
C n (K) K/n
is
chosen so that
goes to infinity at
—
is satisfied),
>
a =
s/r,
will not be optimal.
K/v^i +
K
1/2— s/r
,
For example, for splines and
which cannot attain Stone's
Nevertheless, these uniform convergence rates improve on some in the
literature, e.g. on
Cox
the
the
which equals Stone's (1982) bound.
the corresponding rate will be
(1982) bound.
K
When
essentially
For power series and splines, where
.
The uniform convergence rates
=
K/n
The term
(1988).
Also,
it
is
not yet
known whether
it
is
attain the optimal uniform convergence rates using a series estimator.
possible to
d
3.
Asymptotic Normality
There are many applications where a functional of a conditional expectation
For example, a common practice
interest.
function in log linear form, where
y
is
in
demand analysis
is
the log of consumption,
is
of
estimation of the demand
x
is
a two
dimensional vector with first argument equal to the log of price and second the log of
income, and the estimated demand function
analysis
e
is
g(x)
.A
functional of interest in demand
approximate consumer surplus, equal to the integral of the demand function
is
over a range of prices.
6 =
e
For a fixed income
i(lnt ' lnI)
dt
I
an estimator of this functional would be
(5)
.
Jjj
An asymptotic normality result for
this functional could be useful in constructing
approximate confidence intervals and
tests.
This Section gives conditions for asymptotic normality of functionals of series
In this Section
estimates.
we focus on
i/n-consistent case in the next Section.
functional of
g,
i.e.
To describe the
results, let
a(
• )
be a vector
a mapping from a possible conditional expectation function to a
The estimator
real vector.
the "slower than 1/Vn" case, and discuss the
will be
assumed to take the form
9 = a(£).
(6)
For example, the approximate consumer surplus estimator satisfies this equation with
a(g)
=
e
s
e
where
gn
To use
'
Q
dt.
= a(g
The true value corresponding to
this estimator will be
(7)
),
denotes the true conditional expectation function.
6
for approximate inference procedures,
asymptotic variance estimator.
it
is
important to have an
Such can be formed from a delta-method estimator of the
I
variance of
A =
A
when
as a function of the estimated coefficients
6
aa( P
exists,
K
'/3)/8pi
g,
and otherwise
A
let
be any vector with the same dimension as
A
regularity conditions given below will be imply that
approaches one
in large samples.
is
The
exists with probability that
2
t = E^p^x-lp^x.)' [y.-g(x.)] /n.
(8)
just the usual one for a nonlinear function of least squares
A
The vector
coefficients.
£.
Let
Q = P'P/n,
V = A'Q~IQ~A,
This estimator
Let
£.
is
a Jacobian term, and
Q ZQ
is
the White (1980)
estimator of the least squares asymptotic variance for a possibly misspecified model.
This estimator will lead to correct asymptotic inferences because
accounts properly
it
for variance, and because bias will be small relative to variance under the regularity
conditions discussed below.
Some additional conditions are important for the asymptotic normality
Assumption
E[{y-g_(x)>
4:
4
|x]
is
bounded, and
Var(ylx)
This assumption requires that the fourth conditional
strengthening Assumption
that
it
The next one
1.
is
is
moment
result.
bounded away from zero.
of the error is bounded,
a smoothness condition on
be approximated sufficiently well by a linear functional when
requiring
a(g),
g
is
close to
%
Assumption
4 2
£
,(K)
K /n
for some
that
Either
5:
—
c
a(g)
is
linear in
and there exists a function
>
C,
a)
>
and
all
g,
g
lla(g)-a(i)-D(g-i;i)ll £ C(|g-il
The interpretation of
D(g;g)
is
with
2
d
that
)
it
D(g;g)
lg-gn
and
is
or;
g,
l
< e,
For
b)
that
is
lg-g
IID(g;g)-D(g;g)ll
as in Assumption 3,
d
l
,
< e,
s L g
I
a functional derivative of
it
d
I
and such that
g
linear in
is
g-g
a(g).
I
true
-
d
Indeed,
this assumption implies that
norm
I
g
is
is
Frechet differentiate in
often straightforward to verify.
easy to check that
will be satisfied with
it
D(g;i) = JPg(lnt,lnT)e
as long as
[lnE,lnp]xOnT>
note that for
lg-g Q
<
l
is
expansion of
e
with respect to the
g
'
d
=
and
dt,
(9)
contained in the support for
lg-g
l
Q
and
g
e,
<
and
lla(g)-a(g)-D(g-g;g)ll
=£
z
gives
z
|e
are in some bounded set.
/Pc|g(lnp,lnT)-i(lnp,lnI)|
2
i
= e
z
Assumption
n
g = gn
denote the derivative at
)
a(g)
5:
from Assumption
3,
is
a scalar, there exists
and there exists
zzz~
~ 2
Therefore,
dp * C(p-E )( lg-il
such that
K
~
is
)
Let
^ C g
|
E[g-.(x)
K.
for
,
I
2
]
—
d
and
>
K.
bounded away from zero.
This assumption says that the derivative
mean-square norm
estimator
2
Q
derivative.
|D(g)|
such that
(x)'/3..
K.
D(g,.)
for some
-e -e (z-z)| £ C|z-z|
.
C
g„(x) = p
bounded on
so that a mean-value
,
The next requirement imposes some continuity conditions on the
D(g) = D(g;g
this is so,
will be uniformly
g
z
d J e /dz
z,
To see that
x.
i
Also, for any scalar
around some other point
z
consumer surplus example
In the
~
z
when
C
i(lnt lnT)
and
e
the range of integration.
constant
is
..
I
This condition
it
a(g)
6
(E[g(x)
2 1/2
])
.
restriction imposed
is
cases of interest.
When
that
a(g)
a(g)
is
is
continuous in
lgl H
.
but not in the
The lack of mean-square continuity
not v'n-consistent, and
is
is
is
also a useful regularity condition.
a scalar, which
is
is
Another
general enough to cover many
a vector asymptotic normality would follow from
conditions like those of Andrews (1991), which are difficult to verify.
Assumption 5
will imply that the
a primitive condition, that
is
In contrast,
relatively easy to verify.
For example, for the consumer surplus estimator previously discussed where
p (x)
»
a power series, suppose that
is
x
T =
bounded density, and that the set
equal to
everywhere
CJ"
<(lnp,lnT)
on the line
l/(p-p_)
g.dnp.lnljdp = C
D(q.(x))
so that
T,
p)
is
C
E[q.(x)
combination of a power series
it
and
>
2
]
—
A =
Let
(D(p
1K
2
<r
(x)
),...,D(p
Theorem
= A'Q^SQ^A,
K
If Assumptions
2:
(C,JK)/Vn)
d
—
]
By the Weirstrass
0.
»
q,(x),
it
is
useful to
is
satisfied in this example.
work with an asymptotic
and
))'.
The asymptotic variance formula
V
E[g.(x)
follows that Assumption 5
= Var(ylx)
KK
g.(x)
Since a polynomial is a linear
0.
»
To state the asymptotic normality result
variance formula.
such that
can be approximated uniformly by a polynomial
g.(x)
and
g.(x)
D(g.) = f g.(lnp,lnT)exp[g (lnp,lnT)]dp £
for a constant
C/2,
>
contained in this
uniformly bounded, and converges to zero
is
For this sequence,
else.
approximation theorem, any
P
£ s p i
:
Then there exists a sequence of continuous functions
support.
is
continuously distributed with compact support and
is
is
Q = E[p
1-6
K
(x)p
K
(x)'],
K
I = E[p (x)p
VnK
are satisfied and
—
K
(x)'<r(x)
then
»
2
].
(10)
Q = 8_ +
and
l
VnV~ "(e
K
- e )
Q
-i* N(0,1),
VnV
u '(e
- e
Q
)
-!> N(0,1).
This theorem includes as a special case linear functionals, which were studied by Andrews
(1991).
(1991) conditions.
(1991),
in
to be
Theorem 2 are quite simple
In that case the conditions of
Also, the restriction
leading to only requiring that
Andrews
i.i.d.
(1991).
here, while
2
CJK) K/n
K/n —
>
—
>
Andrews
weaker than that of Andrews
for power series, rather than
One reason for this contrast,
Andrews
is
relative to
is
K/n —
that the regressors are assumed
(1991) general results allows for the regressors to not
be identically distributed.
10
may
This bound
6.
We know
not be sharp.
d =
considered next, where
Whether
0.
/
O
This result only gives an upper bound
on the convergence rate for
will not be sharp in the v'n-consistent case
it
sharp for other cases
is
it
,(K)/v n)
{C,
is still
an open
question.
v'n-Consistency
4.
The key condition for v'n-Consistency
is
that the derivative
D(g)
be mean-square
continuous, as specified in the following hypothesis.
Assumption
D(g
Q
PK
)
There
6:
= E[v(x)g
with
is
(x)],
E[lMxH3
DCpj^) =
K
Kp
2
(x)ll
This condition allows for
a(g)
with
i>(x)
EMxJp^fx)]
-»
]
a(g)
E[v(x)v(x)']
finite
for
all
k
and nonsingular such that
and
K,
and there
is
0.
to be a vector.
as an expected outer product,
when
g
is
approximating functions, and for the functional
It
requires a representation of
equal to the truth or any of the
i>(x)
in the outer
product
representation to be approximated in mean-square by some linear combination of the
functions.
This condition and Assumption 5 are mutually exclusive, and together cover
most cases of interest
A
(i.e.
they seem to be exhaustive).
sufficient condition for Assumption 6
continuous in
g
is
that the functional
a(g)
be mean-square
over some linear domain that includes the truth and the approximating
functions, and that the approximation functions
form a basis for
this domain.
The outer
product representation in Assumption 6 will then follow from the Riesz representation
theorem.
This condition
is
somewhat
like
Van der Vaart's
(1991) condition for
v'n-consistent estimability of functionals, except that his mean-square continuity
hypothesis pertains to the set of scores rather than a set of conditional expectations.
Also, here
it
is
a sufficient condition for Vn-consistency of a particular estimator,
11
A similar
rather than a necessary condition for existence of such an estimator.
condition
was
Newey
also used by
(1994b) to derive primitive conditions for
v'n-consistency of series estimators, under stronger regularity conditions.
There are many interesting examples of functionals that satisfy this condition.
example
an average consumer surplus estimator,
is
(11)
JMDdl =
some weight function, with
is
surplus over different income values.
This
1.
An estimator of
derivative of this functional
l(
Assumption 6 will then be satisfied, with
and
(4).
Suppose that
Similarly to equation (10), the
E£p£p,lsl<l)v(I)exp(g
=
v(x)
(lnp,lnD).
where
v(I,p),
f(I,p)
assumed to be bounded away from zero on
p,
Another example
an average of consumer
is
D(g) = Jv(I,p)g(lnp,lnI)dpdI, v(I,p) =
I
is
this functional could be used as
a summary of consumer surplus as a function of income.
density for
but
= jJvmjfexpCgUnp.lnnjdpdl,
a(g)
v(I)
(5),
Consider the functional
integrated over income.
where
equation
like that of
is
the coefficients
E[Var(x
|x,
)]
is
yn
(12)
f(I,p)
[p_,p]
x
is
the
[1,1].
from the partially linear model of equation
nonsingular, an identification condition for
y
3.
and
let
U = x
a(
In this
g()
a
)
- E[x
=
example
a
|xj
b
and
E[i>(x)g (x)]
a(g)
is
v(x) = (E[UU'
= (EIUU'
_1
])
-1
])
(E[Ux^]y
a linear functional of
g,
a(g) =
Then for
U.
+
E[Ug
and
b
(x
a(g)
b
)])
= y
Q
= D(g)
.
(13)
satisfies
third example is a weighted average derivative, where
a(g)
This functional
= .Tw(x)[3g(x)/3x]dx,
is
/w(x)dx =
1,
w(x) a
0.
(14)
useful for estimating scaled coefficients of an index model, as
12
U
EMx)g(x)],
Assumption 6 by construction.
A
One
1
discussed in Stoker (1986), and can also be used to quantify the average slope of a
Assuming that
function.
w(x)
zero outside some compact set and that
is
continuously distributed with density
where
w(x)
that
is
bounded away from zero on the set
= -J*[aw(x)/ax]g(x)dx = EMx)g(x)],
Therefore, Assumption 6
is
integration by parts gives
is positive,
a(g)
f(x)
x
vlx)
= -f(x)
_1
3w(x)/3x.
satisfied for this functional, and hence
is
(15)
Theorem 3 below
will
give v'n-consistency of a series estimator of this functional.
The asymptotic variance of the estimator
from Assumption
6.
It
will be determined by the function
v(x)
will be equal to
V = EMx)Wx)'Var(ylx)].
Theorem
3:
If Assumptions 1-4 and 6 are satisfied for
Vn(G - 9
5.
Q
)
-U
V
N(0, V),
-^
d =
0,
and
—
VnK
>
then
V.
Power Series
One particular type of series for which primitive regularity conditions can be
specified
X
is
a power series.
Suppose that
r
is
the dimension of
~ Z-_i^->
—
J * J
denote a vector of nonnegative integers,
X
X
r
an d l et z = TI. — (z.)
For a sequence
vectors, a
power series approximation has
(X.
)'
r
.
J
PkK (x)
It will
= x
1
J
i.e.
x,
X =
let
a multi-index, with
(A(k))„
norm
of distinct such
*--
Mk)
(16)
.
be assumed that the multi-index sequence
13
is
ordered with degree
I
A(K)
|
|i\|
increasing in
K.
The theory to follow uses orthonormal polynomials, which may also have computational
For the first step, replacing each power
advantages.
x
polynomials of order corresponding to components of
may
distribution
by the product of orthonormal
with respect to some
X,
when the
lead to reduced collinearity, particularly
matches that of the data.
The estimator
replacement, because
I
|
\(l)
distribution closely
will be numerically invariant to such a
monotonically increasing.
is
For power series primitive conditions for Assumptions 2 and 3 can be specified.
Assumption 2 will be a consequence of the following condition.
Assumption
x
on which
The support of
7:
x
is
a Cartesian product of compact connected intervals
has a probability density function that
This assumption can be relaxed by specifying that
distribution of
x),
but
it
To be
x
it
is
bounded away from zero.
only holds for a component of the
(which would allow points of positive probability in the support of
appears difficult to be more general.
specific about Assumption 3
we need uniform approximation
rates for power series.
These rates will follow from the following smoothness assumption.
Assumption
of
It
8:
g n (x) = E[y|x]
is
continuously differentiate of order
s
on the support
x.
d =
follows from this condition and Lorentz (1986) that for
rate of Assumption 3
is
a =
s/r,
where
r
is
the dimension of
search has not yet revealed a corresponding result for
also approximated, although
hold for
a
it
is
any positive number.
known that
if
0,
E[y|x]
d
>
is
0,
the approximation
x.
A
literature
where derivatives are
analytical Assumption 3 will
Because of this absence of approximation rates for
derivatives, the rest of the Section will confine discussion to the case
The first result for polynomials gives convergence rates.
14
where
d =
0.
Theorem
If Assumptions
4:
1,
7,
hf/n
and 8 are satisfied and
2
Slg (x) - g(x)] dF (x) =
(K/n + K
Q
2s/r
\g -
),
—
then
>
gQ Q = 0(K[VR/Vn
\
This result gives a mean-square and uniform approximation rates for
K~
+
s/r
]).
Uniform
g.
convergence rates for derivatives are not given for the reason discussed above, that
approximation rates for derivatives are not readily available
in the literature.
As previously noted, the mean-square approximation rate attains Stone's (1982)
r/(r+2s)
J~
K = Cn
bounds for
obtained here
and
s
£ r
(implying
different than that of Andrews and
is
3
K /n
—
Whang
0).
>
The mean-square rate
(1990), because
it
population mean-square error rather than the sample mean-square error, which
because
it
for the
is
is
a fixed norm rather than one that changes with the sample size and
On the other hand, Assumption 7
configuration of the regressors.
Whang
conditions imposed in Andrews and
is
stronger than the
(1990).
The uniform approximation rate does not appear to be optimal, although
it
seems to
improve on rates existing in the literature, being faster than that of Cox (1988).
implication of this rate
—
>
appealing
is
that g
will be uniformly consistent
when
£ r
s
One
3
K /n
and
0.
This result can also be extended to additive models, such as the one in equation
A power
(3).
series estimator could be constructed as described above, except that
x
products of powers from both
ab
and
x,
are continuously differentiable of order
s,
are excluded.
x
a
and
x,
b
.
each of
g (x
a
abb
)
and
g, (x,
The conclusion of Theorem 4
K
—
— s/r
,
where
r
is
will then hold with
the
r
maximum
dimension
replaced by
r.
This gives mean-square convergence rates for polynomials like the regression spline rates
of Stone (1985).
r
An extension
to the partially linear case
replaced by the dimension of
rather than
x.
x,
would also be similar, with
and Assumption 7 only required to hold for
x,
Because these extensions are straightforward, explicit results are not
15
)
the exclusion of the (many) interaction
terms will increase the approximation rate to
of
If
given.
Assumptions 7 and 8 are primitive conditions for Assumptions 2 and
3,
so that
asymptotic normality and v'n-consistency results will follow under the other conditions
of Sections 3 and 4.
The following more primitive version of Assumption 5
to verify, except for Assumption 5.
is
are straightforward
All of these other conditions are primitive,
easier to check and will suffice for Assumption 5 for power series.
Assumption
a(g)
9:
a scalar and there exists a sequence of continuous functions
is
00
such that
{gjfx)}..
a(g
An asymptotic normality
Theorem
VnK
up
9 = 9n +
'
(K/Vn)
—
result for
1
VnV~
and
power series estimators
and 4 are satisfied for
either
0,
»
2/2
—
K /n
(6 - 6 n )
u
>
-U
The rate conditions of this result require that
functional, or that
s >
3r
?
bounded away from zero but
is
If Assumptions
5:
satisfied,
)
or
a(g)
E[g.(x)
is
d =
—
0.
>
given by the following:
Assumptions
0,
is linear
and
¥T /n
7-9
—
then
0,
>
are
N(0,1).
s >
3r/2
in the
case of a linear
a nonlinear functional.
in the case of
]
certain amount of smoothness of the regression function
is
In this sense
a
required for asymptotic
normality.
For the consumer surplus example,
are satisfied.
Since
it
is
s/2
A
->
has already been shown that Assumptions 4 and 9
nonlinear, and since the dimension of
case, asymptotic normality will follow
tfft-
it
x
is
2
from Assumptions 7 and 8 and from
in this
K /n
—
and
•>
0.
result can also be formulated for v'n-consistency of a functional of a polynomial
regression estimator.
Theorem
6:
If Assumptions
satisfied,
VnK
Vn(B - B n )
-^
—
N(0,V)
>
0,
and
1
and 4 are satisfied for
either
V
—
>
K /n
—
V.
16
>
or
a(g)
d =
0,
Assumptions
is linear
and
YT/n
6-8
—
>
0,
are
then
The conditions of this result are very similar to those of Theorem
Assumption 9 has been replaced by Assumption
6 are not given because
it
is
satisfies Assumptions 4 and 6 with
—
>
and
v^nK
— s/2
—
"
it
>
4.
d = 0,
except that
More primitive conditions for Assumption
6.
already in a form which
demonstrated by the examples of Section
5,
is
straightforward to verify, as
The average consumer surplus estimator
so that under Assumptions 7 and 8 and
will be v'n-consistent.
K /n
Also, the partially linear
coefficients and weighted average derivative examples also satisfy the assumptions, and
are linear functionals, so that they will be ViT-consistent
6.
K /n
if
—
and
>
VnK
—
-»
Regression Splines
Regression splines are linear combinations of functions that are smooth piecewise
Spline approximating
polynomials of a given order with fixed knots (join points).
functions have attractive features relative to polynomials, being less sensitive to
outliers and to bad approximation over small regions.
as the theory here
is
concerned, that support of
required for knot placement.
x
They have the disadvantage, as far
must be known.
A known support
is
Alternatively, the results could also be obtained for
estimators that only use data where
x
some specified region, but the conclusion
is in
would have to be modified for that case.
In particular, the
uniform and mean-square
convergence rates would only apply to the true function over the region of the data that
is
used, and asymptotic normality would only hold for functionals that did not depend on
the function
g
for
x
For simplicity, this section will only
outside the region.
consider the case where the support
is
known and
satisfies Assumption 7,
following condition can be imposed without loss of generality.
Assumption
10:
The support of
x
is
[-1,1]
17
.
where the
x
When the support of
is
known and Assumption 7
is
can always be rescaled
x
satisfied,
so that this condition holds.
To describe a
spline, let
degree polynomial with
L-l
r
{[x +
A
A.(k,K) s
with
For
x.
l(x > 0)*x.
knots
is
£-1
1st* m+1,
- 2(£-m-l)/L]}
1
A
spline basis for a univariate
m
(17)
m
m+2 £
,
£
m+L
s
formed from products of these functions for the individual
spline basis can be
components of
=
(x)
m+J
the set of distinct r-tuples of nonnegative integers
{A(k,K)>
for each
p kK (x) = nj=iPx
(x
(k)
j
j'
and
L)>
and
£,
—
(k=1 *
K)
K =
(m+J)
let
,
(18)
-
The theory to follow uses B-splines, which are a linear transformations of the above
functions that have lower multicollinearity.
The low multicollinearity of B-splines and
recursive formula for calculation also leads to computational advantages; e.g. see Powell
(1981).
The rate at which splines uniformly approximate a function
power
series, so the
smoothness condition of Assumption 8 will be
splines there does not
in the literature, so
Theorem
7:
we
limit attention to the case
If Assumptions
1,
7, 8,
Q
is
the same as that for
For
left unchanged.
seem to be approximation rates for derivatives readily available
2
S[g (x) - g(x)] dF (x) =
It
is
and
10
d =
0,
as
we do
for power series.
are satisfied then
(K/n + K~
2s/r
),
\g -
gQ Q = 0(VK[VR/Vn
\
+
K~
s/r
]).
interesting to note that the uniform convergence rate for splines is faster than
power
series.
It
may be
intrinsic feature of the
that this
is
an artifact of the proof, rather than some
two types of approximation, but more work
that.
18
is
needed to determine
Asymptotic normality and consistency will follow similarly as for power series.
Theorem
If Assumptions
8:
Up
9 = Qn +
In
"'
VnK
satisfied,
(K/Vn)
—
and
VnV~
comparison with Theorem
5,
d =
and 4 are satisfied for
either
0,
>
1
1/2
K /n
—
or
>
a(g)
0,
Assumptions
is linear
and
¥T/n
7-10
are
—
then
>
0,
(9 - 9 n ) -±> N(0,1).
u
this asymptotic normality result for functionals of a
growth rate
spline estimator imposes less stringent conditions on the
Consequently,
K.
the necessary smoothness conditions for asymptotic normality will be less severe,
requiring only
for a nonlinear functional or
s > 2r
As
for a linear one.
s > r
in
the case of polynomials, the consumer surplus functional satisfies the conditions of this
A
9:
If Assumptions
are satisfied,
VnK
Vn(Q - 6 n )
—
'
-U
»
1
0,
and 4 are satisfied for
and
N(0,V)
K /n
either
V
—
>
—
or
»
splines, the average
— s/2
""
—
>
0,
be Vn-consistent
7.
and
=
v'nK
'
—
"
>
0.
d =
a(g)
0,
Assumptions
is linear
and
6-8
K /n
—
and
»
0,
V.
The examples of Section 4 meet the conditions of
v'nK
—
vlT-consistency result for splines can also be formulated.
Theorem
then
K /n
normal for
result, so that it will be asymptotically
consumer surplus estimator
this result, and so for regression
will be v'n-consistent if
K /n
—
>
and
and the partially linear coefficients and weighted average derivative will
2
if
K /n
—
»
and
VnK
— s/r
—
>
0.
Conclusion
This paper has derived convergence rate and asymptotic normality results for series
estimators.
Primitive conditions were given for power series and regression splines.
Further refinements of these results would be useful.
for estimation of derivatives of functions.
19
It
would be good to have results
These could easily be obtained from
10
approximation rates for power series or splines.
Also,
it
whether the uniform convergence rates could be improved
20
would be useful to know
on.
Appendix: Proofs of Theorems
Throughout the Appendix,
different uses and
C
let
= £•_,•
J\
denote a generic constant that
Also, before proving
Theorems
1-3,
the following observations that apply to each of those proofs.
nonsingular linear transformations of of
K
p (x) = P
i.e.
I,
K
bounded away from zero,
it
Q
p (x)
Q
A
Hence, the conditions imposed on
K
replaces
p
K
-1/? K
p (x)ll
K
g
invariant to
is
K
(x)p
K
.
is
(x)'
]
Q =
has smallest eigenvalue
I.
This
because, for a
is
....
a nonsingular linear
*CC d (K).
and convergence rate bounds in terms of
E[ IIQ-III
2
]
=
CJK)
derived for
C H (K).
E^Zj^EKEi^Pj^Cx.Jp^Cx^/n
-
2
I
jk
>
* Ik =lEj=l E[ PkK (x)2p jK (x)2l/n = E[ ^k=l P kK (x)25:j=l P jK (x)2l/n " < (K)2tr(I)/n "
—
>
]
V
K)2K/n
so that
0,
IIQ-III
= O «.(K)K
1/2
/vn) = o
Furthermore, since the difference
in smallest eigenvalues is
the smallest eigenvalue of
IIQ-III,
(19)
(1).
p
P
by
B =
~
(x),
First,
1:
Since
make
helpful to
Assumptions 2 and 5 will be satisfied when
in
this replacement will also hold for the original
Proof of Theorem
is
satisfying
~
p (x)
,
it
different in
can be assumed throughout that
Q = E[p
,.-1/2 K, ,
Q
p (x)
_-l
_
of
< d (K) = sup xeX(|A|£d ll5Q
—1/2
it
can be assumed throughout that
-.-1/2
,
symmetric square root
Q
(x),
Furthermore, since
(x).
transformation of
p
may be
Q
n
=
1)
—
Next, for
in absolute value
Let
converges in probability to one.
the indicator function for the smallest eigenvalue of
Prob(l
bounded
Q
being greater than
1/2,
be
1
so
> 1.
G = (g^x^....^^))'
Boundedness of Var(y|x)
let
e =
Y-G
and
X =
(Xj
and independence of the observations implies
Therefore,
21
x
n
).
E[ee' |X] ^ CI.
1/2
E[l
n
IIQ
P'£/nll
2
1
n
VelXl/n
E[e'P(P'P)
n
=
1
n
E[tr<P(P'P)
V'ee'HXl/n
tr<P(P'P)~ P'E[ce\X])/n * CI tr<P(P'P)~ V' >/n * CK/n,
n
1/2
so by the Markov inequality,
_1
n
IIQ
P'e/nll =
-1/2
£
(1)1
p
Let
1
l
=
1
=
|X]
n
IIQ
1
n
IIQ"
1
P'e/nll =
1/2
/Vn).
follows that
It
1/2 _1 Q" 1/2
1/2
P'c/n>
{(e'P/n)Q"
P'e/nll =
{K
Q
(K
1/2
/vn).
p
be as in Assumption
g = (gJx.)
Then for
3.
g (x
by
))',
1
P(P'P)~P'
idempotent,
_1
P' (g-P0)/nll *
IIQ
1
£
(l)[(g-P0)'(g-P0)/n]
Therefore, by
1
n
1
-
110
(0 - 0)
s
011
=
1/2
=
(K~
Q
n
P'(g-P0)/n]
1/2
).
_1
P' (y-g)/n +
1
-1
1
1
a
_1
1
pf
[(g-P0)'P(P /
(1)1
IIQ^P'e/nll +
1
n
IIQ
P' (g-P0)/n,
Q
it
P'(g-P0)/nll =
(K
follows that
1/2
/vn + K~
a
).
(20)
p
Next, by the triangle inequality,
l
n
2
X[g(x)- g() (x)] dF(x) =
£ MI0-0II
2
K
,
+
l
n
J [p
K
l
nJ"[p
lg " g
=
P
(21)
2
(x)'0-g (x)] dF(x) =
(K/n + K~
o
=
1
1
2a
)
+ 0(K~
2a
)
=
(K/n + K~
with probability approaching one.
and Cauchy-Schwartz inequalities,
Also, by the triangle
n
2
o
Therefore, the first conclusion follows by
l
K
(x)'(0-0) + p (x)'0-g (x)] dF(x)
l
Q d
*
K
l
n
lp
« d.(K)[(K 1/2/vn)
K
'(0-0)l
+
K"
d
+ |p '0-g
a
]).
22
l
o d
s C (K)l 110-011 + 0(K"
d
n
a
)
(22)
2a
).
Q
The second conclusion follows from
from
this equation like the first follows
eq.
(21).
QED.
Proof of Theorem
rfi[<;
d
Note that Assumption 5
2:
(K)(VR/Vn + K~
2
C d (K) (v'R/vfi +
-1/2
F =
Let
V..
= (A'EA)
a 2
_a
K
=
)
1/2
4 2
=
)]
b) implies
K
<(C (K)
d
[C (K)
d
4
K/n]
1/2
|D(g
CI
by
K
= |A'/3
)|
K
I
^
IIAIIII/3
K
=
II
IIAII(E[g
Q =
bounded below and
Var(ylx)
1/2
+ (•nK
_a 2
)
4
+ L[£ (K) /n]
4
[C (K) /n]
K
2 1/2
(xr])
1/2
n'nk~
implying
,
II
a
-»
All
>
(23)
0,
0.
as in Assumption
g
—
->
>
Z s
Further,
».
= A'ZA i CHAN
V
so that so that
I,
1/2
d
Then by Cauchy-Schwartz inequality, for
.
~
6,
/n)
2
and
,
hence
Then, for
2
£ C,
|F|
_1
1
n
and
e
n
IIFA'Q
IIFA'Q~
Now,
Theorem
as in the proof of
1
1
= tr(FA'AF) £ tr(CFV„F) =
IIFAII
1/2 2
II
II
g
= Y-G.
K
(x)
1,
_1
1
n
UFA'
s UFA'
—
let
s
II
+
1
n
IIFA'(Q-I)Q
+
1
n
K
= p (x)'0
tr(FA' (Q-I)Q
1/2
(e-9 n ) =
v^F[a(i)-a(g n )] =
i/nV"
+
FA'P'c/Vn + v^FA'(Q -I)P'c/n
By Assumptions 3 and 5
n
l
n
1
iv
(v^TK
(•nK
p
P
)
II
=
IIQ-III
(24)
(1),
p
AF) £ C + CIIFA'
111
n
UFA'
_i
ll
=
IIQ-III
(1).
p
G =
3,
l^nF[D(g" )-D(g
K
= o
(1).
(
p
P
1
n
Next, let
1
(g (x
g (x
)
<VnF[a(g)-a(g n )-D(g)+D(g n )]
1
+ v^nFA'Q" ?' (G-Pp
)]|
Cauchy-Schwartz inequality and
IIFA'Q T'/v'nllllg-P/VI s
_a
a
UFA'
(1)
))',
Then
_1
1
s C +
from Assumption
for
1
Also, by the
II
p
_1
2
II
C.
=s
eq.
n
i'Q
IVnFA'Q
Vi
P'(g-P0,J/n|
N
)
I
Hlvfimax.^ |g.(x.)-g„(x.)
u i
i
i^n
IIFA'Q
is.
X =
x
(x,
1
23
n
)
and
p.
i
K
Ci/nl^-g^
K
1
)/n + v/nF[D(g
*
i/nC|D(g -g
(24),
K
(25)
= p
K
(x.).
i
|
£
1
n
IIFA'Q
Then by
)-D(g
Q
)]>.
£ Cv/nK~
a
->
0.
s
ll^n|g_-g„ |_ =
u
E[e|X
n
]
=
iv
0,
u
Etl
n
Hv'nFA'fQ^-nP'e/nl^lX
1
n
-I)[y\p.p'.(r (x.)/n](Q *-I)AF'
**i
n
2
llvfiFA'tQ^-DP'e/nll
Next
Z.
,
(i
=
m
n)
1
-^
i
so that
E[Z.
Also,
is i.i.d..
e)Z
>
|
£ ne
2
4
IIFA'
ll
2
in
C n (K)
by
N(0,I)
-5->
1
2
2
E[llp.ll
=
]
conclusion follows by
*
)]|
(1-1
n
.^i
|x.]]/n
11
L"/n(
I
VnV~K
K
A =
so
VnK
)
—
>
Assumption 5 and
=
Prob(I
1)
—
I
i-g
1
2
I
Q d
K
|p
I a(p
n
K
'
|
s T C(|p
n
as
—
»
0,
<
J
K
so that
2
A
)
»
—
0.
>
so
N(0,1),
and
5,
eq.
>
FAQ
1
P'
e//n
(23),
_a 2
)]
Then by
-£-> 0.
)
so that the first
N(0,1),
a(g)
lp
/iip-pii
< (K)[VR/Vn + K~
d
5,
-£-» 0.
l
D(p
K '3-g
n
s
linear in
is
;g))'.
a
]
Then
g.
l
,
<
|
Q
l
a(p
'|3)
2
1/2
= [C (K) K/n]
(l +
d
Therefore, for the
lg-g n
a(p
e/2
By Assumption
and
5,
e
1
from
=1,
for any
i\i\
£
< e,
K
'
0)-a(g)-D(p
K/
0;g)+D(g;g) /II0-0H
I
Ia; d (K) 2 ii0-0ii -»
exists and equals
probability approaching one.
and Theorem
—
n,
(26)
in
d
where
5,
;g)
|
d
]
(Li/n[C (K)(v^/i/n+K
u
0)-a(g)-J' (0-0) /II0-0II = T
p-£i
2
4
£ ne E[(Z. /e) ]
2
Assumption
lg-g n
1,
= (D(p
implying
e/2,
K,
2 -5-
and
1,
—
£.Z.
(0~0 n )
VnV,.
n
IK
'/3-g|
IIQ-III
Note that for each
2
s C< rt (K) K/n
^0
e
1,
Assumption
II
such that
2 4
=
)
=
]
equal to the indicator function for
Also, let
> 1.
2
follows that
1)(Z. /e)
in
>
Theorem
so that by Theorem
0,
*
(9-9„) -^-» 0.
In case b) of
A.
2
in
limit theorem,
Next, consider case a) of Assumption
= A'£,
F.EtZ
0,
4
E[e
l
and the triangle inequality,
(25)
it
CIIFAII
= FA'P'e/Vn.
in
in
Finally, by
1.
|vGF[a(g)-a(g )-D(g)+D(g
eq.
h
2
= ne E[l(lZ. /e|
]
Then by the Lindbergh-Feller central
>
^
)
(1),
V.Z.
in
in
—
o
)
1
(Q-I)AF'
is
l
in
nE[l(|Z.
1
0.
= FA'p.e./Vn,
r
Z.
let
,
1
_1
n
Therefore, since this conditional expectation
1
2
1
tr(FA' (Q
-I)AF) £ CI tr(FA' (Q-I)Q
-I)Q(Q
n
=
]
_1
-1
CI tr(FA' (Q
n
J
when
1=1.
Furthermore, by linearity of
1,
24
Therefore,
D(g;g)
in
g,
A
exists with
Assumption
5,
Q
2
l
n
UA-A)'
=
IIA-AII
£ CT l(A-A)'p
n
K
|
d
l£-g
l
Eq. (23) and dividing through by
p
« dJ (K) 2 [v'R/vn
K~
+
a
])
K
(A-A) =
-£-» 0.
n
K
lD((A-A)'p ;i)-D((A-A)'p ;g
£ I CIIA-AIIC (K)li-g
n
d
d
II
l
A-A
gives
II
Therefore,
I
1
n
II
A-A
£ I
IIFAII
.
l
d
k
£ C< (K)|g-g
II
n
(27)
)|
+
IIFIIIIA-AII
=
IIFAII
=
(1),
and
p
-1
similarly
I IIFAQ
n
=
II
(1).
p
h = I Q
n
Next, let
_1
AF
h = I AF.
n
and
Then
=
llhll
_1
_1
llh-hll
E s
Since
£ I IIFA'Q
n
+ I
(I-Q)II
F( A-A)'
II
n
Z
the largest eigenvalue of
CI,
I Ih'Zh n
£ Cllh-hll
II
£ I UFA'
n
II
+ I |F|IIA-AII
III-QII
n
bounded above, and hence by
is
+ 2[(h-h)'Z(h-h)]
1/2
[h'Zh]
1/2
£ o
(1)
+ Cllh-hll
-^
h'Zh =
= Ih'Zh - h'Zhl £ (h-h)'Z(h-h) + |2(h-h)'Zh|
1|
2
and
(1)
p
0.
1
(28)
-^
0.
P
2
Z = r.p.p'.e./n.
Also, let
for
^111
4
E[e.|x.]
follows by
from Theorem
-£-»
IIQ-III
It
that
1,
n
vnK
= n
_a
/K
2
llhll
implying
=
IIZ-ZII
(Do
P
= g (x.)-£(x.).
A.
let
-E-» 0,
IIZ-ZII
I Ih'Zh - h'Zhl = |h'(Z-Z)h| £
Now,
bounded, and an argument like that
11
l
1/2
—
0,
>
)
so by
-1
Theorem
1,
< (K)[(K/n)
max.^
I
A.
I
bounded and a similar argument to
IIQ-III
-J-» 0,
1/2
+K~
£ lg-g
S = E[p.p'. |e.|] = E[p.p'.E[|e.
and
£.p.p'. |e.|
Note that
|
|x.]]
it
(1)
-^
(29)
0.
P
a
]
1/2
2
= [< (K) K/n]
(l +
=
l
(o(D) -?-> 0.
£ CQ =
follows that
CI.
II
By
S— S
II
E[e
—^-»
Also, let
2
|x.]
0.
Therefore,
I IFVF - h'Zhl =
n
£ 2max.
i£n
_1
I
A. |n
l
V.(h'p.)
u
\
*i
|h'(Z-Z)h| = |n
2
_1
Hi
2
|e.
i
I
]
2
y.(n'p.) (2c.A.+A
+ max.
|A |n"V.(h'p.)
u\
*i
i^n
l
25
2
ill
2
)|
£ o U)[h' (S+Q)hl
p
(29)
S
pV
s o (l)[h'(S+Q-S-I)h] + o (l)[h'(S+I)h] £ o (l)llhll
P
P
P
Then by the triangle inequality and
=
Probd
Then by
i/nv"
1/2
—>
1)
(e-e
F V
1,
eqs.
-£->
(28)-(30),
2
+ IIQ-III) + o
(IIS-SII
-^
(l)llhll
0.
P
follows that
it
|FVF-1| -!->
1
0.
implying
1,
2 1/2
= VnF(e-e )/(F v)
)
2
-U
n(o.i),
giving the last conclusion.
show the
Finally, to
first conclusion,
it
show that
suffices to
|V
|
s CC,(K)
/2
9 = 6. + (V*
because
/
/v n)vnF(9-9_) = 9 n +
K
Cauchy-Schwartz inequality implies
C|p
K,
CHAN
2
A|
CC.UO IIAII.
s
.
s CC (K)
d
2
Dividing by
/
—
J
f
with
satisfied with
=
llcll
>
Let
and
N(0,c'Vc)
IIAII
* CC.(K),
IIAII
c'Vc
c'Vc
-£-»
Furthermore, for the functional
1.
replaced by
v(x)
Wx).
for scalar
2
so that
(K)II0II,
then gives
IIAII
Note that for any
/Vn).
the
0,
K
= |D(p 'A)| £
|V„| s
and hence
V
By the Cramer-Wold device and by symmetry of
3:
to prove that c'v n(9-6
c
s £
'|3|
/2
,
QED.
.
Proof of Theorem
vector
|p
(V*
2
u
K.
Therefore,
c'v(x).
K
K
v v = Ap (x) = E[i>(x)p (x)']Q
V,
it
suffices
for any conformable constant
c'a(g),
Assumption 6
is
suffices to prove the result
it
—1
and
K
be the mean square
p (x)
IX
v = v(x)
projection of
on the approximating functions, where the
dropped for notational simplicity.
2 2
Then
V,.
ix
K
2
E[{v-p (x)'
v}
I
K
—
]
-V
I
>
Assumptions
0.
* E[|i>
2
2
-i>
|]
s
EUv
2 1/?
s od) + 2iEivnr'(Eiiv-v
Thus,
—^->
V
V.
By
<r
(x)
1
= E[v v <r
K.
x
argument
by Assumption
(x)]
6,
is
E[[v-v v )
K
?
-v)
Z
1/?
K r]r*
]
+ 2E[|v||i> -v|]
K
->
o.
bounded away from zero and
proof now follows exactly as in the proof of Theorem
—
V
i
v(x)
non-zero,
V
>
The
0.
2,
except that
F = V
-1/2
"
is
/*y
—> l/W.
s
]
and the Cauchy-Schwartz inequality then give
K.
bounded by
2
K.
Therefore, by the conclusion of Theorem
26
2,
^(9-9^
=
V^
/2
1/2
v'nV
V
K
(e-e
—^->
V
is
P
Let
4:
bounded below,
x
density of
(3.14) of
and hence
Andrews
(1991) that
2
3
C->(K)
—
K/n £ CK /n
Proof of Theorem
2
]
—
»
5:
0.
Furthermore,
d =
and
K(J
K a
for
1/J
c
0,
that
J
c
>
g.(x)
= VnST
so that
(x)
IIP
K(J),
1/e,
~ CK>
"
follows by Theorem 8 of Lorentz
a =
and
such that
such that
(3
Let
J.
so the conclusion
s/r,
|a(g.)|
is
>
a sequence of
e
for
all
and
.
and
pv = pv
let
JK.
.
for
Then by the triangle
E[gv (x)
2
]
= sup
A
^.|g (x)-|3 'p
XGo.
J
K.J
and
J
supremum
—
(x)|
—
0,
>
K(J) £
K
< K(J+1),
inequality,
so Assumption 5
|a(g.J|
is
p„ =
>
e
satisfied.
K
for
for
<
< K(l),
K
all
as
»
A„.
be a monotonic increasing sequence such that
K(J)
>
d =
Then, since
K.
and from the proof of Theorem
C (K)
V
'
note that by Assumption 9 there
5,
set, there is
g K (x) = p (x)'3 K
),
K
follows as in the proof of Theorem 4 that Assumptions 2 and 3 are
It
for any fixed
oo
s
'
it
K.J
»
follows as in eqs. (3.13)
Since power series can approximate a continuous function in the
0.
norm on a compact
—
By the
(x).
1/9
|P
(x)
kK
k < K X €X
uniformly bounded continuous functions
E[g.(x)
it
p
QED.
1.
To show Assumption
satisfied.
Also,
].
max
(1986) that Assumption 3 is satisfied with
follows by Theorem
so that their
x's
each term by the Jacobi
in
K
K
E[P (x)P (x)'
=s
>
so that
1,
orthonormal with respect to the uniform distribution
is
CI
1/2 -i>
QED.
which comprises a nonsingular linear transformation of
[-1,1],
and
-^-» V.
FV
2,
be obtained by transforming the
(x)
polynomial of the same order that
on
V
and replacing the individual powers
[-1,1]
Theorem
Also, as in the proof of
and squaring gives
1,
Proof of Theorem
support
-i> N(0,V).
)
K /n s CK /n
5/r
Proof of Theorem
—
>
C n (K) ^ CK,
so for nonlinear
and, since Assumption 3
is
a(g)
satisfied with
follows
it
a =
s/r,
vftK
QED.
6:
Follows by Theorem 3 similarly to the proof of Theorem
Assumption 6 replacing Assumption
Proof of Theorem
4,
7:
It
follows by
9.
5,
with
QED.
Lemma
27
A. 16 of
Newey
(1994a), that for
P
(x)
equal to
K
the products of normalized B-splines (e.g. see Powell, 1981) multiplied by the square
root of the number of knots,
0.
Furthermore,
it
K
IIP
CK
*
(x)ll
1/2
= C (K),
Q
and hence
?
< (K) K/n s
Q
follows by the argument of Burman and Chen (1989,
rest of Assumption 2 is satisfied.
by Theorem 12.8 of Schumaker
Also,
(1981),
Assumption 3 with
d =
so the conclusion follows
Proof of Theorem
8:
Follows as
proof of Theorem
5.
Proof of Theorem
9:
Follows as in the proof of Theorem
6.
in the
28
and
p.
?
CK /n
—
1587), that the
a = s/r
from Theorem
1.
follows
QED.
References
Andrews, D.W.K. 1991, Asymptotic normality of series estimators for nonparametric and
semiparametric models, Econometrica 59, 307-345.
,
Andrews, D.W.K. and Y.J. Whang, 1990, Additive interactive regression models:
Circumvention of the curse of dimensionality, Econometric Theory 6, 466-479.
Burman,
P.
and K.W. Chen, 1989, Nonparametric estimation of a regression function, Annals
of Statistics
17,
1567-1596.
Chen, H. 1988, Convergence rates for parametric components in a partly linear model,
Annals of Statistics 16, 136-146.
Cox, D.D. 1988, Approximation of least squares regression on nested subspaces, Annals of
Statistics 16, 713-732.
Deaton, A., 1988, Rice prices and income distribution in Thailand: A nonparametric
analysis, Economic Journal 99 supplement, 1-37.
B.J. and A.R. Gallant, 1981, Adaptive rules for seminonparametric estimation
that achieve asymptotic normality, Econometric Theory 7, 307-340.
Eastwood,
Gallant, A.R. and G. Souza, 1991, On the asymptotic normality of Fourier flexible
functional form estimates, Journal of Econometrics 50, 329-353.
Hausman, J. A. and W.K. Newey, 1995, Nonparametric estimation of exact consumer surplus
and deadweight loss, forthcoming, Econometrica.
Lorentz, G.G. 1986, Approximation of Functions,
New
York: Chelsea Publishing Company.
Newey, W.K. 1988, Adaptive estimation of regression models via moment restrictions,
Journal of Econometrics 38, 301-339.
Newey, W.K. 1994a, Convergence rates for series estimators, Statistical Methods of
Economics and Quantitative Economics: Essays in Honor of C.R. Rao, G.S. Maddalla,
ed.,
forthcoming.
Newey, W.K. 1994b, The asymptotic variance of semiparametric estimators, Econometrica
1349-1382.
Newey, W.K. 1994c, Series estimation of regression functionals, Econometric Theory
10,
1-28.
Powell, M.J.D. 1981, Approximation Theory and Methods, Cambridge, England: Cambridge
University Press.
Schumaker, L.L.
1981, Spline Functions:
Basic Theory, Wiley:
New
York.
Stoker, T.M. 1986, Consistent estimation of scaled coefficients, Econometrica 54,
1461-1481.
Stone, C.J. 1982, Optimal global rates of convergence for nonparametric regression,
Annals of Statistics
10,
1040-1053.
29
62,
MIT LIBRARIES
3 9080 00940 0380
Stone, C.J. 1985, Additive regression and other nonparametric models, Annals of
Statistics 13, 689-705.
White, H. 1980, Using least squares to approximate unknown regression functions,
International Economic Review 21, 149-170.
30
7966
]
Date Due
Lib-26-67
Download