Consistency and Asymptotic Normality
of
Nonparametric Projection Estimators

Whitney K. Newey

No. 584    Rev. July 1991

Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139
MIT Working Paper 584

Consistency and Asymptotic Normality of Nonparametric Projection Estimators

Whitney K. Newey
MIT Department of Economics

March, 1991
Revised, July, 1991

Helpful comments were provided by Andreas Buja and financial support by the NSF and the Sloan Foundation.
Abstract

Least squares projections are a useful way of describing the relationship between random variables. These include conditional expectations and projections on additive functions. Sample least squares provides a convenient way of estimating such projections. This paper gives convergence rates and asymptotic normality results for least squares estimators of linear functionals of projections. General results are derived, and primitive regularity conditions given for power series and splines. Also, it is shown that mean-square continuity of a linear functional is necessary for $\sqrt{n}$-consistency and sufficient, under conditions, for asymptotic normality, and this result is applied to estimating the parameters of a finite dimensional component of a projection and to weighted average derivatives of projections.

Keywords: Nonparametric regression, additive interactive models, partially linear models, average derivatives, polynomials, splines, convergence rates, asymptotic normality.
1. Introduction

Least squares projections of a random variable $y$ on functions of a random vector $q$ provide a useful way of describing the relationship between $y$ and $q$. The most familiar example is the conditional expectation $E[y|q]$, which is the projection on the linear space of all (measurable, finite mean-square) functions of $q$. Estimation of this projection is the nonparametric regression problem. Motivated partly by the difficulty of estimating $E[y|q]$ when $q$ has high dimension, projections on smaller sets of functions have been considered by Breiman and Stone (1978), Breiman and Friedman (1985), Friedman and Stuetzle (1981), Stone (1985), and Zeldin and Thomas (1977). These include projections on the set of functions that are additive in linear combinations of $q$, and generalizations that allow the component functions to be multi-dimensional.
One simple way to estimate nonparametric projections is by regression on a finite dimensional subset, with dimension allowed to grow with the sample size, e.g. as in Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988), and Andrews (1991), which will be referred to here as series estimation. This type of estimator may not be good at recovering the "fine structure" of the projection relative to other smoothers, e.g. see Buja, Hastie, and Tibshirani (1989), but is computationally simple. Also, the fine structure is less important for mean-square continuous functionals of the projection, such as the parameters of partially linear models or weighted average derivatives (examples discussed below), which are essentially averages.
This paper derives convergence rates and asymptotic normality of series estimators of projection functionals. Convergence rates are important because they show how dimension affects the asymptotic accuracy of the estimators (e.g. Stone 1982, 1985) and are useful for the theory of semiparametric estimators that depend on projection estimates (e.g. Newey 1991). Asymptotic normality is useful for statistical inference about functionals of projections, such as derivatives. The paper gives mean-square rates for estimation of the projection and uniform convergence rates for estimation of functions and derivatives. Conditions for asymptotic normality and consistent estimation of asymptotic standard errors are given, and applied to estimation of a component of an additive projection and its derivatives. Fully primitive regularity conditions are given for power series and spline regression, as well as more general conditions that may apply to other types of series. The regularity conditions allow for dependent observations, so that they are of use for time series models.
The paper also relates continuity properties of linear functionals of projections to $\sqrt{n}$-consistent estimation. Under a regularity condition on the projection residual variance, continuity in mean-square is shown to be necessary for existence of a (regular) $\sqrt{n}$-consistent estimator, and sufficient for asymptotically normal series estimators. This result is used to derive $\sqrt{n}$-consistency results for partially linear models with an additive nonparametric component (a generalization of the model of Engle et al., 1984) and for weighted average derivatives of (possibly) additive models (a generalization of the Stoker (1986) functional).
One problem that motivates the results given here is estimation of an additive nonparametric autoregression,

(1.1)  $y_t = h_1(y_{t-1}) + \cdots + h_r(y_{t-r}) + e_t$,

where $e_t$ is the residual from the projection of $y_t$ on additive functions of $r$ lags. This model avoids high dimensional arguments but allows for several lags, which seems useful for short time series. The convergence rates and asymptotic normality results apply to estimates of this projection, although asymptotic normality here requires it to be a dynamic regression, where $E[e_t\,|\,y_{t-1}, y_{t-2}, \ldots] = 0$. Effects of lagged values can be quantified by the weighted average derivative $\int w(y_j)[\partial h_j(y_j)/\partial y_j]\,dy_j$, $(j = 1, \ldots, r)$. The results to follow include primitive regularity conditions for $\sqrt{n}$-consistency and asymptotic normality of series estimators of this functional, and could also be applied to generalizations that allow interactions between lags in equation (1.1).
These results are an addition to previous work on the subject, including that of Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988), Andrews (1991), and Andrews and Whang (1990), because the results (including asymptotic normality) do not require that the projection equal the conditional expectation. In addition, the results allow for dependent observations, similarly to Stone (1985) but not to the others, and apparently improve in some respects on those of Cox (1988) and Andrews (1991) for the special case of conditional expectations. There is some overlap of the convergence rates with a recent paper by Stone (1990) on additive-interactive spline estimators, which the author saw only after the first version of this paper was written. Stone's (1990) rate results are implied by those of Section 7 below, under conditions that are weaker in some respects (allowing for dependence) and stronger in others (imposing a side condition on the allowed number of terms). Also, the same convergence rate result is given in Section 6 for variable degree polynomial regression, which is not considered in Stone (1990), and uniform rates and asymptotic normality are shown here.
2. Series Estimators

The results of this paper concern estimators of least squares projections that can be described as follows. Let $z$ denote a data observation, with $y$ and $q$ subvectors of $z$, $q$ having dimension $r$ and $y$ having finite mean-square. Let $H$ denote a mean-square closed, linear subspace of the set of all (measurable) functions of $q$ with finite mean-square. The projection of $y$ on $H$ is

(2.1)  $h(q) = \operatorname{argmin}_{\tilde h \in H}\, E[\{y - \tilde h(q)\}^2]$.
An example is the conditional expectation, $h(q) = E[y|q]$, where $H$ is the set of all measurable functions of $q$ with finite mean-square. An important generalization has $H$ a smaller set of functions, whose consideration is motivated partly by the difficulty of estimating conditional expectations for $q$ with many dimensions; see Stone (1985) for discussion and references. For $q = (q_1', q_2')'$ and subvectors $q_{2\ell}$ of $q_2$, $(\ell = 1, \ldots, L)$, one such set is the set of additive functions

(2.2)  $H = \{\,q_1'\beta + \sum_{\ell=1}^{L} h_{2\ell}(q_{2\ell}) : E[h_{2\ell}(q_{2\ell})^2] < \infty\,\}$.

Primitive conditions for this set to be closed are given in Section 6. The general theory allows for any closed $H$ (e.g. $H = \{w(q)[\sum_{\ell} h_\ell(q_\ell)]\}$, with $w(q)$ a known function, under conditions for this to be closed), and primitive conditions are given for power series and spline estimators of a projection on the $H$ of equation (2.2).
The estimators of $h(q)$ considered here are sample projections on a finite dimensional subspace of $H$, which can be described as follows. Let $p^K(q) = (p_{1K}(q), \ldots, p_{KK}(q))'$ be a vector of functions, each of which is an element of $H$. Denote the data observations by $y_i$ and $q_i$, $(i = 1, 2, \ldots)$, and let $y = (y_1, \ldots, y_n)'$ and $p^K = [p^K(q_1), \ldots, p^K(q_n)]'$ for sample size $n$, where subscripts for $K$ have been suppressed for notational convenience. An estimator of $h(q)$ is

(2.3)  $\hat h(q) = p^K(q)'\hat\pi$,  $\hat\pi = (p^{K\prime}p^K)^{-}p^{K\prime}y$,

where $(\cdot)^{-}$ denotes a generalized inverse. The matrix $p^{K\prime}p^K/n$ will be asymptotically nonsingular under conditions given below, making the choice of generalized inverse asymptotically irrelevant.
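As a concrete illustration, the sample projection in equation (2.3) is ordinary least squares on the approximating functions. The following minimal sketch (a univariate $q$ and a simple power basis are illustrative choices, not taken from the paper) uses a pseudoinverse in place of the generalized inverse $(p^{K\prime}p^K)^{-}$:

```python
import numpy as np

def basis(q, K):
    # Illustrative power series basis: p^K(q) = (1, q, ..., q^{K-1})'
    return np.column_stack([q ** k for k in range(K)])

def series_estimator(y, q, K):
    # pi_hat = (p^K' p^K)^- p^K' y, with pinv as the generalized inverse
    P = basis(q, K)
    pi_hat = np.linalg.pinv(P.T @ P) @ (P.T @ y)
    # h_hat(q) = p^K(q)' pi_hat
    return lambda x: basis(np.asarray(x), K) @ pi_hat

rng = np.random.default_rng(0)
q = rng.uniform(-1, 1, 400)
y = np.sin(2 * q) + 0.1 * rng.standard_normal(400)
h_hat = series_estimator(y, q, K=6)
```

With a smooth target and moderate $K$, the fitted function tracks the projection closely; the choice of $K$ is taken up later in this section.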
Often the object of interest is not the projection, but rather some functional $A(h)$ of $h(q)$, where $A(h)$ is an $s \times 1$ vector of real numbers. Examples are $A(h) = \partial^\lambda h(\bar q)$, a partial derivative evaluated at some point $\bar q$; $A(h) = h(\bar q) - h(\tilde q)$, the difference of the projection at two points; and $A(h) = h_J(\bar q_J) - h_J(\tilde q_J)$ for an additive projection $h(q) = \sum_{\ell=1}^{L} h_\ell(q_\ell)$, the difference of the $J$th component at two different points.
Another example is the parameters of the finite dimensional component of a projection, i.e. $\beta$ in $h(q) = q_1'\beta + \sum_{\ell=1}^{L} h_{2\ell}(q_{2\ell})$. Estimators of such parameters have been analyzed by Chamberlain (1986), Heckman (1986), Rice (1986), Robinson (1988), Schick (1986), and others, but only under the conditions $h(q) = E[y|q]$ and $L = 1$. Here the nonparametric component can be additive, which leads to a more efficient estimator if $\mathrm{Var}(y|q)$ is constant: see Section 5. Let $H_2 = \{\sum_{\ell=1}^{L} h_{2\ell}(q_{2\ell})\}$, assume for the moment that $H_2$ is closed, let $P(q_1|H_2)$ denote the vector of projections of each element of $q_1$ on $H_2$, and let $M = E[\{q_1 - P(q_1|H_2)\}\{q_1 - P(q_1|H_2)\}']$. Assuming that $M$ is nonsingular, the identification condition for $\beta$, it follows that

(2.4)  $\beta = A(h) = M^{-1}E[\{q_1 - P(q_1|H_2)\}h(q)]$.

In addition, once $\hat\beta$ is in hand, one can consider linear functionals of the additive component $h_2(q_2) = h(\bar q_1, q_2) - \bar q_1'\beta$, obtained by subtracting off $\bar q_1'\beta$ for some specified value $\bar q_1$ of $q_1$.
Another interesting example is a weighted integral of a partial derivative of $h(q)$, of the form

(2.5)  $A_j(h) = \int w_j(q)\,\partial^{\lambda_j}h(q)\,dq$,  $(j = 1, \ldots, s)$,

for multi-indices $\lambda_j$ and functions $w_j(q)$. This is an average derivative functional similar to that of Stoker (1986), including the nonparametric autoregression from Section 1. Estimators of similar functionals have been analyzed by Hardle and Stoker (1989), Powell, Stock, and Stoker (1989), and Andrews (1991).
In this paper most of the analysis will concern linear functionals of $h$, such as the above examples. The natural "plug-in" estimators of linear functionals have a simple form. Let $A = (A(p_{1K}(q)), \ldots, A(p_{KK}(q)))'$. Because $\hat h(q)$ is a linear combination of elements of $H$, linearity of $A(h)$ implies $A(\hat h) = A'\hat\pi$. The paper focuses on linear functionals because the linearity in $\hat\pi$ of this estimator leads to straightforward asymptotic distribution theory for $A(\hat h)$. Of course, the delta-method can also be used to analyze nonlinear functions of such estimators.
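The plug-in construction $A(\hat h) = A'\hat\pi$ can be sketched for the derivative-at-a-point functional $A(h) = \partial h(\bar q)/\partial q$: apply $A$ to each basis function to build the vector $A$, then dot it with the fitted coefficients. The power basis and all names below are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def fit_pi(y, q, K):
    # Series coefficients pi_hat for the power basis p_k(q) = q^{k-1}
    P = np.column_stack([q ** k for k in range(K)])
    return np.linalg.pinv(P.T @ P) @ (P.T @ y)

def deriv_functional_vector(qbar, K):
    # A_k = A(p_k) = d/dq q^{k-1} evaluated at qbar = (k-1) qbar^{k-2}
    return np.array([0.0] + [k * qbar ** (k - 1) for k in range(1, K)])

rng = np.random.default_rng(1)
q = rng.uniform(-1, 1, 500)
y = q ** 2 + 0.05 * rng.standard_normal(500)   # true h(q) = q^2
K = 4
pi_hat = fit_pi(y, q, K)
A = deriv_functional_vector(0.5, K)
A_hat = A @ pi_hat    # plug-in estimate of h'(0.5) = 1
```

Because $A(\hat h)$ is linear in $\hat\pi$, its sampling distribution follows directly from that of the least squares coefficients, which is the point made in the text.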
The idea of sample projection estimators is that they should approximate $h(q)$ if $K$ is allowed to grow with the sample size. The two key features of this approximation are that 1) each component of $p^K(q)$ is an element of $H$, and 2) $p^K(q)$ "spans" $H$ as $K$ grows (i.e. for any function in $H$, $K$ can be chosen big enough that there is a linear combination of $p^K(q)$ that approximates it arbitrarily closely in mean square). Under 1),

$\pi = (E[p^K(q_i)p^K(q_i)'])^{-1}E[p^K(q_i)y_i] = (E[p^K(q_i)p^K(q_i)'])^{-1}E[p^K(q_i)h(q_i)]$,

so that $\hat\pi$ estimates $\pi$, the coefficients of the projection of $h(q)$ on $p^K(q)$. Consequently, when the estimation error in $\hat\pi$ is small, $p^K(q)'\hat\pi$ should approximate $p^K(q)'\pi$. Under 2), $p^K(q)'\pi$ will approximate $h(q)$. Thus, $\hat h(q)$ should approximate $h(q)$.
Two types of approximating functions will be considered in detail. They are:

Power Series: Let $\lambda = (\lambda_1, \ldots, \lambda_r)'$ denote an $r$-dimensional vector of nonnegative integers, i.e. a multi-index, with norm $|\lambda| = \sum_{j=1}^{r}\lambda_j$, and let $q^\lambda = \prod_{j=1}^{r} q_j^{\lambda_j}$. For a sequence $\{\lambda(k)\}$ of distinct such vectors, a power series approximation corresponds to

(2.6)  $p_{kK}(q) = q_{1k}$, $k = 1, \ldots, s$;  $p_{kK}(q) = q^{\lambda(k-s)}$, $k = s+1, \ldots$,

including in $p^K$ the finite dimensional component of $H$. Throughout the paper it will be assumed that the $\{\lambda(k-s)\}$ are ordered in the natural way, with the degree $|\lambda(k-s)|$ monotonically increasing in $k$. An additive component of the projection will be allowed by including only those terms with components that are subvectors of some $q_{2\ell}$, i.e. by requiring that for each multi-index $\lambda(k-s)$ there exists a $q_{2\ell}$ such that the only nonzero components of $\lambda(k-s)$ are those where the corresponding component of $q$ is included in $q_{2\ell}$. The spanning condition will be that all such terms appear in $\{\lambda(k-s)\}$, i.e. that $\{\lambda(k-s)\}$ includes all multivariate powers of each $q_{2\ell}$. All of these requirements are summarized in the statement that $\{\lambda(k-s)\}$ consists of all multi-indices of the type discussed above, ordered so that $|\lambda(k-s)|$ is monotonically increasing.

The theory to follow uses orthogonal polynomials, which may also have computational advantages. If each $q^{\lambda(k-s)}$ is replaced with the product of orthogonal polynomials of order corresponding to components of $\lambda(k-s)$, with respect to some weight function on the range of $q$, then there should be little collinearity among the different terms if the distribution of $q$ is similar to this weight. The estimator will be numerically invariant to such a replacement (because $|\lambda(k-s)|$ is monotonically increasing), but it may alleviate the well known multicollinearity problem for power series.
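The numerical invariance and the conditioning gain can be checked directly: replacing raw powers with Chebyshev polynomials of the same degrees is a nonsingular triangular recombination, so the fitted values are unchanged while the design matrix becomes far better conditioned. A univariate sketch (illustrative names, Chebyshev chosen only as one convenient orthogonal family):

```python
import numpy as np
from numpy.polynomial import chebyshev

rng = np.random.default_rng(2)
q = rng.uniform(-1, 1, 300)
y = np.exp(q) + 0.1 * rng.standard_normal(300)
K = 8

P_raw = np.column_stack([q ** k for k in range(K)])   # 1, q, ..., q^{K-1}
P_cheb = chebyshev.chebvander(q, K - 1)               # T_0(q), ..., T_{K-1}(q)

fit = lambda P: P @ np.linalg.lstsq(P, y, rcond=None)[0]
h_raw, h_cheb = fit(P_raw), fit(P_cheb)

# Same column span, hence identical fitted values; much smaller condition number
same_fit = np.allclose(h_raw, h_cheb, atol=1e-6)
cond_ratio = np.linalg.cond(P_raw) / np.linalg.cond(P_cheb)
```

The conditioning gap widens quickly with $K$, which is the multicollinearity problem the text refers to.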
Splines: Splines, which are smooth piecewise polynomials, can be formulated as projections if their knots (the joining points for the polynomials) are held fixed. They have attractive features relative to power series, being less oscillatory and less sensitive to bad approximation over small regions. The theory here requires that the knots be placed in the support of $q$, which therefore must be known. For convenience the support is normalized to be $\prod_{j=1}^{r-s}[-1,1]$, with attention restricted to evenly spaced knots. Let $(\cdot)_+ = 1(\cdot > 0)(\cdot)$. An $m$th degree spline with $L+1$ evenly spaced knots on $[-1,1]$ is a linear combination of

$\{\,u^\ell,\ 0 \le \ell \le m;\ [u + 1 - 2(k-m)/(L+1)]_+^m,\ m+1 \le k \le m+L\,\}$.

For a set of multi-indices $\{\lambda(k)\}$, the approximating functions for $H$ will be products of univariate splines, i.e.

(2.7)  $p_{kK}(q) = q_{1k}$, $k = 1, \ldots, s$;  $p_{kK}(q) = \prod_{j=1}^{r-s} S_{\lambda_j(k-s),L}(q_{2j})$, $k = s+1, \ldots, K$,

where $S_{\ell,L}(u)$ denotes the $\ell$th element of the univariate basis above. Note that implicit in $k$ is a choice of number of knots for each of the components of $q_2$ and a choice of which multiplicative components to include. Throughout the paper it will be assumed that the ratio of numbers of knots for each pair of elements of $q_2$ is bounded above and below. An additive component of the projection will be allowed for by requiring that the multi-indices satisfy the same condition as for power series, which can be summarized in the statement that the $q_2$ components that appear in any terms consist of all interactions of components within some $q_{2\ell}$.
The condition that the support of $q_2$ is $\prod_{j=1}^{r-s}[-1,1]$ is not as restrictive as it may first appear. Suppose that there are "original" variables $x$ with support $\mathbb{R}^{r-s}$, and that $t(\cdot)$ is a univariate one-to-one transformation with domain $\mathbb{R}$ and range $[-1,1]$; then $q_2 = (t(x_1), \ldots, t(x_{r-s}))'$ will have support $\prod_{j=1}^{r-s}[-1,1]$. Since additive projections are invariant to such componentwise, one-to-one transformations, the spline estimator based on $q_2$ will also estimate the original projection. Of course, the condition that the support of $x$ is $\mathbb{R}^{r-s}$ is restrictive. Also, the bounds on derivatives of the projection imposed to obtain convergence rates in what follows are restrictive. The transformation must be continuously differentiable with positive derivative to preserve differentiability of the projection, and boundedness of the derivatives will require that the derivatives of the original projection go to zero as $x$ grows faster than the derivatives of the transformation. For example, if $t(\cdot) = 2F(\cdot) - 1$, where $F(x)$ is a CDF that is continuously differentiable of all orders, then the order of differentiability of the projection is preserved under the transformation, but boundedness of derivatives requires that the derivatives of the projection go to zero faster than the density of $F$ as $x$ goes to infinity.

Fixed, evenly spaced knots are restrictive, and are motivated by theoretical convenience. A judicious choice of transformation may help alleviate the effects of evenly spaced knots. If a distribution function is used to transform the data, as discussed above, and the distribution matches closely the true distribution, then the transformed variable will be "spread out," which can improve splines with fixed evenly spaced knots. Allowing for estimated knots (e.g. via smoothing splines, as in Wahba, 1984) is known to lead to more accurate estimates, but is outside the scope of this paper.

The theory to follow uses B-splines, which are a linear transformation of the above basis that is nonsingular on $[-1,1]$ and has low multicollinearity. The low multicollinearity of B-splines and the recursive formula for their calculation also lead to computational advantages; e.g. see Powell (1981).
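A minimal univariate sketch of the truncated power basis defined above (the theory's B-splines are a nonsingular recombination of it): degree $m$ powers plus one truncated power term at each evenly spaced interior knot, fitted by least squares. Names and the test function are illustrative assumptions:

```python
import numpy as np

def spline_basis(u, m, L):
    # u^0, ..., u^m, plus (u - t_k)_+^m at L evenly spaced interior knots
    u = np.asarray(u)
    knots = -1 + 2 * np.arange(1, L + 1) / (L + 1)
    powers = [u ** j for j in range(m + 1)]
    truncated = [np.maximum(u - t, 0.0) ** m for t in knots]
    return np.column_stack(powers + truncated)

rng = np.random.default_rng(3)
q = rng.uniform(-1, 1, 400)
y = np.abs(q) + 0.05 * rng.standard_normal(400)   # kinked target
B = spline_basis(q, m=3, L=4)                     # 4 + 4 = 8 columns
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ coef
```

Local pieces let the spline track the kink without the global oscillation a high-degree polynomial would produce, which is the advantage the text describes.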
Series estimates depend on the choice of the number of terms $K$, so that it is desirable to choose $K$ based on the data. For example, one might choose $K$ by delete-one cross-validation, by minimizing the sum of squared residuals $\sum_{i=1}^{n}[y_i - \hat h_{-i}(q_i)]^2$, where $\hat h_{-i}(q_i)$ is the estimate of the regression function computed from all the observations but the $i$th. With a data-based $K$, these estimates have the flexibility to adjust to conditions in the data. Some of the results to follow will allow for data-based $K$.
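The delete-one criterion need not be computed by refitting $n$ times: for a linear smoother with hat matrix $H = P(P'P)^{-}P'$, the deleted residual is the standard shortcut $(y_i - \hat h(q_i))/(1 - H_{ii})$. A sketch (power basis and names are illustrative):

```python
import numpy as np

def cv_score(y, P):
    # Delete-one CV via the hat-matrix identity: no refitting needed
    H = P @ np.linalg.pinv(P.T @ P) @ P.T
    resid = y - H @ y
    return np.sum((resid / (1.0 - np.diag(H))) ** 2)

rng = np.random.default_rng(4)
q = rng.uniform(-1, 1, 200)
y = np.sin(3 * q) + 0.2 * rng.standard_normal(200)

scores = {K: cv_score(y, np.column_stack([q ** k for k in range(K)]))
          for K in range(2, 12)}
K_hat = min(scores, key=scores.get)
```

Small $K$ is penalized through bias in the residuals, large $K$ through leverage, so the minimizer trades the two off in the way the upper and lower limits of Assumption 3.5 below formalize.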
3. Regularity Conditions

This section lists and discusses some fundamental regularity conditions on which all the following results are based. The first assumption limits dependence of the observations.

Assumption 3.1: $\{(y_i, q_i)\}$ is stationary and $\alpha$-mixing with mixing coefficients $\alpha(t) = O(t^{-\mu})$, $(t = 1, 2, \ldots)$, for $\mu > 2$.

The stationarity assumption could be relaxed, but is not done so in order to keep the notation as simple as possible.
The results will also make use of moment conditions on $y$. Let $u = y - h(q)$ and $u_i = y_i - h(q_i)$. For a matrix $D$, let $\|D\| = [\mathrm{trace}(D'D)]^{1/2}$; for a random matrix $Y$, let $|Y|_\nu = \{E[\|Y\|^\nu]\}^{1/\nu}$ for $\nu < \infty$, and let $|Y|_\infty$ be the infimum of constants $C$ such that $\mathrm{Prob}(\|Y\| \le C) = 1$.

Assumption 3.2: $|u_i|_s$ is finite for some $s \ge 2$, and $E[u_i^2|q_i]$ is bounded.

The bounded second conditional moment assumption is quite common in the literature (e.g. Stone, 1985). Apparently it can be relaxed only at the expense of affecting the convergence rates (e.g. see Newey, 1990), so to avoid further complication this assumption is retained.
Assumption 3.3: Either a) $z_i$ is uniform mixing with mixing coefficients $\varphi(t) = O(t^{-\mu})$, $(t = 1, 2, \ldots)$, for $\mu > 2$; or b) there exists $c(t)$ such that $|E[u_i u_{i+t}|q_i, q_{i+t}]| \le c(t)$ and $\sum_{t=1}^{\infty}c(t) < \infty$.

This assumption is restrictive, but covers many cases of interest, including a dynamic nonparametric regression with $h(q_i) = E[y_i\,|\,q_i, q_{i-1}, y_{i-1}, \ldots]$.
The next assumption is useful for controlling $p^{K\prime}p^K/n$.

Assumption 3.4: For each $K$ there is a nonsingular, constant matrix $A_K$ such that, for $P^K(q) = (P_{1K}(q), \ldots, P_{KK}(q))' = A_K p^K(q)$: i) for the support $Q$ of $q_i$ there is a probability measure $P$ with $P(q_i \in \tilde Q) \ge cP(q \in \tilde Q)$ for any measurable set $\tilde Q \subseteq Q$; ii) there is $\zeta_0(K)$ such that $\max_{k \le K}|P_{kK}(q)| \le \zeta_0(K)$ for $q \in Q$; iii) the smallest eigenvalue of $\int_Q P^K(q)P^K(q)'\,dP(q)$ is bounded away from zero as $K \to \infty$.

The bounds in ii) give a convergence rate for $p^{K\prime}p^K/n$, while iii) controls its singularity. Hypothesis iii) is essentially a normalization. Without this type of normalization the second moment matrix can be ill-conditioned, leading to technical difficulties. For example, if $p_{kK}(q) = q^{k-1}$ and $q$ is uniformly distributed on $[0,1]$, then $E[p^K(q)p^K(q)'] = [\sigma_{ij}]$ with $\sigma_{ij} = 1/(i+j-1)$, which has a smallest eigenvalue that goes to zero faster than $K$ factorial. One approach to verifying this assumption is to find a lower bound $\lambda(K)$ on the smallest eigenvalue of $E[p^K(q)p^K(q)']$ and then let $P^K(q) = p^K(q)[\lambda(K)]^{-1/2}$, as in Newey (1988a) for power series. Another approach is to let $P^K(q)$ be a transformation that is orthonormal with respect to some density, assume that the true distribution dominates the one corresponding to that density, and use known bounds for orthonormal functions, as in Cox (1988), Newey (1988b), and Andrews (1991) for power series. A third approach is to find a transformation that is well conditioned, though not orthogonal, as for B-splines in Agarwal and Studden (1980) and Stone (1985).
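The Hilbert-matrix example and the orthonormalization remedy can be checked numerically. The sketch below (an illustration, not from the paper) compares the smallest eigenvalue of $E[p^K(q)p^K(q)']$ for raw powers on $[0,1]$ with that for shifted Legendre polynomials, which are orthonormal for the uniform weight:

```python
import numpy as np
from numpy.polynomial import legendre

K = 8
# Raw powers, q ~ U[0,1]: second moment matrix is the Hilbert matrix
hilbert = np.array([[1.0 / (i + j + 1) for j in range(K)] for i in range(K)])
min_eig_raw = np.linalg.eigvalsh(hilbert).min()

# Shifted Legendre polynomials, scaled to be orthonormal on [0,1];
# approximate E[P(q)P(q)'] on a fine midpoint grid
q = (np.arange(10**5) + 0.5) / 10**5
X = legendre.legvander(2 * q - 1, K - 1)
X *= np.sqrt(2 * np.arange(K) + 1)        # normalize: E[P_j P_k] = delta_jk
second_moment = X.T @ X / len(q)
min_eig_orth = np.linalg.eigvalsh(second_moment).min()
```

The raw-power eigenvalue is already near machine zero at $K = 8$, while the orthonormalized one stays near one, illustrating why iii) is a normalization rather than a substantive restriction.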
The next assumption specifies the way in which the number of terms is allowed to depend on the data.

Assumption 3.5: $K = K(z_1, \ldots, z_n)$, and there are nonrandom $\underline K(n) \le \overline K(n)$ such that i) $\underline K(n) \le K \le \overline K(n)$ with probability approaching one; ii) $p^{\underline K(n)}(q)$ is a subvector of $p^K(q)$, which is a subvector of $p^{\overline K(n)}(q)$, for all $\underline K(n) \le K \le \overline K(n)$.

That is, $K$ is allowed to be random, but must lie between nonrandom upper and lower limits with probability approaching one, and the approximating functions must be nested. Nonrandom $K$ is included as a special case where $\underline K = \overline K = K$. These upper and lower limits control variance and bias, respectively (the larger is $K$, the less bias there is but the more variance). It would be interesting to derive such upper and lower limits for specific choices of $K$ (e.g. cross-validation), but these results are beyond the scope of this paper.
Some results below will require that the transformed approximating functions are also nested:

Assumption 3.6: Assumption 3.4 is satisfied, and for the $P^K(q)$ of Assumption 3.4, $P^{\underline K(n)}(q)$ is a subvector of $P^K(q)$, which is a subvector of $P^{\overline K(n)}(q)$, for all $\underline K(n) \le K \le \overline K(n)$.

This assumption is satisfied for power series but not for splines, so that the following spline results are limited to the nonrandom $K$ case.
The next assumption imposes a rate condition for convergence of the second moment matrix. Let $\tilde K$ denote the number of elements of $P^{\overline K}(q)P^{\overline K}(q)'$ that are nonzero for some $q \in Q$, and for notational convenience suppress the $n$ argument in $\underline K$, $\overline K$, and $\tilde K$ henceforth.

Assumption 3.7: $\tilde K\,\zeta_0(\overline K)^2/n \to 0$.

This is a side condition that will be maintained throughout. It limits the growth rate of $\overline K$, in a way that may be nonoptimal in the mean-square error sense, as discussed in Sections 6 and 7, although it is weaker than similar side conditions imposed by Cox (1988) and Andrews (1991).
The bias of these estimators depends on the error from the finite dimensional approximation. Sobolev norms will be used to quantify this error. For a measurable function $f(q)$, let

$|f|_{d,\nu} = \max_{|\lambda| \le d}\{E[|\partial^\lambda f(q_i)|^\nu]\}^{1/\nu}$ for $\nu < \infty$,  $|f|_{d,\infty} = \max_{|\lambda| \le d}\sup_{q \in Q}|\partial^\lambda f(q)|$.

The norm will be taken to be infinite if $\partial^\lambda f(q)$ does not exist for some $|\lambda| \le d$. Inclusion of derivatives in these norms will be useful for deriving properties of $\partial^\lambda \hat h(q)$.
Many of the results will be based on the following polynomial approximation rate hypothesis:

Assumption 3.8: For each class of functions $\mathcal F$ there exist $C = C(\mathcal F, d, \nu)$ and $\alpha = \alpha(\mathcal F, d, \nu)$ such that for all $f \in \mathcal F$, $\min_{\pi \in \mathbb{R}^K}|f - p^K(\cdot)'\pi|_{d,\nu} \le CK^{-\alpha}$.

This condition is not primitive, but is known to be satisfied in many cases. Primitive conditions for power series and splines are given in Sections 6 and 7.
In order for the same bias bounds to apply to estimated functionals of interest, it is necessary that $A(h)$ be continuous with respect to the same norm as in Assumption 3.8, which is imposed in the following condition.

Assumption 3.9: For the $d$ and $\nu$ of Assumption 3.8, $A(h)$ is a continuous linear functional with respect to the Sobolev norm $|h|_{d,\nu}$, i.e. there is $C$ such that for all $h \in H$, $\|A(h)\| \le C|h|_{d,\nu}$.
4. Convergence Rates

This section gives mean-square convergence rates for $\hat h(q)$ and uniform consistency rates for its derivatives and continuous linear functionals. The results include both sample and population mean-square error rates.
Theorem 4.1: If Assumptions 3.1 - 3.5, 3.7, and 3.8 are satisfied for $\mathcal F = \{h\}$ and $d = 0$, then

$\sum_{i=1}^{n}[\hat h(q_i) - h(q_i)]^2/n = O_p(\overline K/n + \underline K^{-2\alpha}\{\sum_{\underline K \le K \le \overline K}(\underline K/K)^{\alpha\nu}\}^{2/\nu})$.

If Assumption 3.6 is also satisfied, then for the CDF $F(q)$ of $q_i$,

$\int[\hat h(q) - h(q)]^2\,dF(q) = O_p(\overline K/n + \underline K^{-2\alpha}\{\sum_{\underline K \le K \le \overline K}(\underline K/K)^{\alpha\nu}\}^{2/\nu})$.

The two terms in the convergence rate essentially correspond to variance and bias. The bias term, which is $\underline K^{-2\alpha}\{\sum_{\underline K \le K \le \overline K}(\underline K/K)^{\alpha\nu}\}^{2/\nu}$, will be equal to $\underline K^{-2\alpha}$ if $\underline K = \overline K$, and for $\nu > 1/\alpha$ is bounded above by $C\underline K^{-2\alpha}$. A consequence of the second conclusion is convergence rates for some version of the additive components, because $H$ closed implies that the mapping from $h(q)$ to some decomposition into additive components is mean-square continuous (see Bickel et al., 1990, Appendix).
Uniform convergence rates depend on bounds for the derivatives of the series terms.

Assumption 4.1: For each $k \le \overline K$, $p_{kK}(q)$ is differentiable of order $d$, and for all multi-indices $\lambda$ with $|\lambda| \le d$ there is $\zeta_{|\lambda|}(K)$ such that with probability one $\max_{k \le K}|\partial^\lambda p_{kK}(q_i)| \le \zeta_{|\lambda|}(K)$.

Theorem 4.2: If Assumptions 3.1 - 3.8 are satisfied for $\mathcal F = \{h\}$, $d = |\lambda|$, and $\nu = \infty$, and Assumption 4.1 is satisfied, then

$\sup_{q \in Q}|\partial^\lambda \hat h(q) - \partial^\lambda h(q)| = O_p(\overline K^{1/2}\zeta_{|\lambda|}(\overline K)\{(\overline K/n)^{1/2} + \underline K^{-\alpha}\})$.

The uniform convergence rate for $\hat h(q)$ is slower than the mean-square rate and does not attain Stone's (1982) bounds, although it is faster than previously derived rates for series estimators, as further discussed below.
Convergence rates for continuous linear functionals of $h(q)$ will follow from this result.

Theorem 4.3: If Assumptions 3.1 - 3.9 and 4.1 are satisfied for $d = |\lambda|$ and $\nu = \infty$, and $\zeta_{|\lambda|}(K)$ is monotonically increasing in $K$, then

$A(\hat h) - A(h) = O_p(\overline K^{1/2}\zeta_{|\lambda|}(\overline K)\{(\overline K/n)^{1/2} + \underline K^{-\alpha}\})$.

The implied convergence rate for mean-square continuous linear functionals is not sharp, as they are shown to be $\sqrt{n}$-consistent (under slightly stronger conditions) in Section 5.
5. Asymptotic Normality

An estimator of the asymptotic variance of $A(\hat h)$ can be formed from the usual estimator of the asymptotic variance of the projection coefficients $\hat\pi$. The asymptotic normality result below will require that the products of elements of $H$ and the residual be martingale differences, so that no autocorrelation correction is required. Treating $K$ as fixed, let

$\hat\Sigma \equiv p^{K\prime}p^K/n$,  $\hat V = \sum_{i=1}^{n}p^K(q_i)p^K(q_i)'[y_i - \hat h(q_i)]^2/n$,

so that $\hat\Sigma^{-}\hat V\hat\Sigma^{-}$ is the White (1980) estimator of the asymptotic variance of the projection coefficient estimator $\hat\pi$. Since $A(\hat h) = A'\hat\pi$ is a linear combination of $\hat\pi$, a corresponding estimator of the asymptotic variance of $A(\hat h)$ is

$\hat\Omega = A'\hat\Sigma^{-}\hat V\hat\Sigma^{-}A/n$.

This estimator is consistent as $K$ grows, under conditions to follow. Further conditions are useful for asymptotic normality of $A(\hat h)$ and consistency of $\hat\Omega$. Let $\Sigma = E[p^K(q_i)p^K(q_i)']$, $V = E[p^K(q_i)p^K(q_i)'u_i^2]$, and $\Omega = A'\Sigma^{-1}V\Sigma^{-1}A/n$.

Assumption 5.1: Assumptions 3.1 - 3.2 are satisfied with $s > 4\mu/(\mu - 1)$; i) $E[u_i^2|q_i]$ is bounded away from zero; ii) for any $h(q) \in H$, $E[h(q_i)u_i\,|\,z_{i-1}, z_{i-2}, \ldots] = 0$; iii) $K = K(n)$ is nonrandom.

Part ii) is the martingale difference assumption: it holds if the observations are independent or if $h(q_i) = E[y_i\,|\,z_{i-1}, z_{i-2}, \ldots]$.
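The variance estimator $\hat\Omega = A'\hat\Sigma^{-}\hat V\hat\Sigma^{-}A/n$ is straightforward to compute. A sketch for the scalar functional $A(h) = h(\bar q)$ with a power basis (all names and the data-generating process are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K, qbar = 500, 5, 0.3
q = rng.uniform(-1, 1, n)
y = np.cos(q) + 0.1 * rng.standard_normal(n)

P = np.column_stack([q ** k for k in range(K)])
Sigma_inv = np.linalg.pinv(P.T @ P / n)             # Sigma_hat^-
pi_hat = Sigma_inv @ (P.T @ y) / n
resid = y - P @ pi_hat

# V_hat = sum_i p(q_i) p(q_i)' resid_i^2 / n  (White, 1980)
V_hat = (P * resid[:, None] ** 2).T @ P / n
A = np.array([qbar ** k for k in range(K)])          # A_k = p_k(qbar)
A_hat = A @ pi_hat                                   # estimate of h(qbar)
Omega_hat = A @ Sigma_inv @ V_hat @ Sigma_inv @ A / n
se = np.sqrt(Omega_hat)                              # standard error
```

Because the residuals are used without any autocorrelation correction, this matches the martingale-difference setting just described; dependent residuals would require a different $\hat V$.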
Assumption 5.2: i) $A$ has full column rank for some $K$; ii) the smallest eigenvalue of $n\Omega$ is bounded away from zero for all $K$ large enough.

Part ii) of this hypothesis rules out asymptotic linear dependence among different components of $A(\hat h)$. When $A(h)$ is mean-square continuous in $h$, a primitive condition for this hypothesis is that $A(h)$ is onto $\mathbb{R}^s$, as discussed below.
Assumption 5.3: Assumptions 3.4, 3.8, and 3.9 are satisfied for $\nu \ge 2$, and $K$ satisfies $K\tilde K\zeta_0(K)^2/n \to 0$ and $\sqrt{n}\,K^{-\alpha} \to 0$.

The second condition requires essentially that the bias converge to zero faster than $1/\sqrt{n}$ (see Assumption 3.8), which is stronger than the natural condition that the bias go to zero faster than the variance.
Theorem 5.1: If Assumptions 5.1 - 5.3 are satisfied, then

$\Omega^{-1/2}[A(\hat h) - A(h)] \stackrel{d}{\to} N(0, I)$,  $\hat\Omega - \Omega \stackrel{p}{\to} 0$.

Furthermore, if there exist a scalar $\psi_n$ and nonsingular $\Omega_0$ such that $\psi_n\Omega \to \Omega_0$, then

$\psi_n^{1/2}[A(\hat h) - A(h)] \stackrel{d}{\to} N(0, \Omega_0)$,  $\psi_n\hat\Omega \stackrel{p}{\to} \Omega_0$.

This result improves on Andrews (1991) in applying to estimators of projections other than conditional expectations, allowing for dependence, and having a faster growth rate for $K$, but restricts $K$ to be nonrandom: it is more difficult to allow for random $K$ with dependent observations.
The second conclusion is useful for forming asymptotic Gaussian confidence intervals in the usual way. If the hypotheses of Theorem 4.3 are satisfied and $K^{1/2}\zeta_{|\lambda|}(K)\{(K/n)^{1/2} + K^{-\alpha}\} \to 0$, so that $A(\hat h)$ is a consistent estimator of $A(h)$, the delta-method can be used to make inference about smooth nonlinear functions of $A(h)$ in the standard way. The hypothesis about $\psi_n$ is not restrictive when $A(h)$ is a scalar, where $\psi_n = 1/\Omega$ will satisfy the hypotheses. However, when $A(h)$ is a vector, it requires essentially that the variance of each component of $A(\hat h)$ converge to zero at the same rate, which may not be true when, e.g., $A(\hat h)$ includes both $\hat h(q)$ and its derivatives at a single point. It is possible to derive a primitive condition for $\psi_n = n$, corresponding to $\sqrt{n}$-consistency of $A(\hat h)$, which is stated in the following result.
Theorem 5.2: Suppose that i) Assumptions 5.1 and 5.3 are satisfied for $\nu = 2$ and $d = 0$; ii) for any $h(q) \in H$ there exists $\pi_K$ such that $E[\{h(q) - p^K(q)'\pi_K\}^2] \to 0$ as $K \to \infty$; iii) there exists an $s \times 1$ vector $\delta(q)$ of elements of $H$ such that $A(h) = E[\delta(q)h(q)]$ for all $h \in H$, and $\Omega_0 = E[\sigma^2(q)\delta(q)\delta(q)']$ exists and is nonsingular, where $\sigma^2(q) = E[u_i^2|q_i = q]$. Then

$\sqrt{n}\,[A(\hat h) - A(h)] \stackrel{d}{\to} N(0, \Omega_0)$,  $n\hat\Omega \stackrel{p}{\to} \Omega_0$.

Hypothesis ii) is the minimal mean-square spanning requirement for consistency of the sample projection. By the multivariate Riesz representation theorem (e.g. see Hansen, 1985), hypothesis iii) is equivalent to the statement that $A(h)$ is mean-square continuous and has range $\mathbb{R}^s$. Furthermore, mean-square continuity of such linear functionals is a necessary condition for a finite semiparametric variance bound for $A(h)$, as in Stein (1956), and hence for existence of a (regular) $\sqrt{n}$-consistent estimator (Chamberlain, 1985), so that mean-square continuity of $A(h)$ characterizes the $\sqrt{n}$-consistent case.
Theorem 5.3: If $A(h)$ is a scalar functional that is not mean-square continuous, then the semiparametric variance bound for $A(h)$ is infinite.
Theorem 5.2 can be specialized to many interesting examples, including the parameters of the finite dimensional component of the projection and average derivatives. As noted in equation (2.4), $\beta = E[\delta(q)h(q)]$ for $\delta(q) = M^{-1}[q_1 - P(q_1|H_2)]$, the parameters of the finite dimensional component of $h(q) = q_1'\beta + \sum_{\ell=1}^{L}h_{2\ell}(q_{2\ell})$. Furthermore, the mean-square spanning hypothesis of ii) will be satisfied as long as the approximating functions span $H$, giving the following result:

Theorem 5.4: Suppose that i) Assumptions 5.1 and 5.3 are satisfied; ii) for any $h(q) \in H$ there exists $\pi_K$ such that $E[\{h(q) - p^K(q)'\pi_K\}^2] \to 0$; iii) $M = E[\{q_1 - P(q_1|H_2)\}\{q_1 - P(q_1|H_2)\}']$ is nonsingular. Then for $\Omega_0 = M^{-1}E[\sigma^2(q)\{q_1 - P(q_1|H_2)\}\{q_1 - P(q_1|H_2)\}']M^{-1}$,

$\sqrt{n}(\hat\beta - \beta) \stackrel{d}{\to} N(0, \Omega_0)$,  $n\hat\Omega \stackrel{p}{\to} \Omega_0$.
Sample projection estimators of β_0 have been previously analyzed by Chamberlain (1986), Andrews (1991), and Newey (1990), but only under an unrestricted functional form for E[y|q] (e.g. h_0(q) = q_1'β_0 + h_2(q_2) could not be additive) and independent observations.  One implication of this result is that if E[y|q] = q_1'β_0 + h_2(q_2) and σ²(q) = Var(y|q) is constant, then an estimator that imposes additivity is asymptotically more efficient than one that does not: the asymptotic variance matrices are

σ²(E[{q_1 - P(q_1|H_2)}{q_1 - P(q_1|H_2)}'])⁻¹ and σ²(E[{q_1 - E[q_1|q_2]}{q_1 - E[q_1|q_2]}'])⁻¹,

respectively, which have a positive semi-definite difference.  Thus, although imposing additivity does not improve the convergence rate of β̂, it can lower its asymptotic variance.
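As a concrete illustration of the sample projection behind Theorem 5.4, the sketch below simulates a partially linear model y = q_1'β_0 + h_2(q_2) + ε and estimates β_0 by least squares of y on q_1 together with a power series in q_2.  This is a minimal numerical sketch, not the paper's estimator in full generality: the basis (1, q_2, q_2², q_2³), the sample size, and the data-generating design are all illustrative assumptions.

```python
import random

def ols(X, y):
    """Least squares via the normal equations, solved by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    c = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for b in range(col, k):
                A[r][b] -= m * A[col][b]
            c[r] -= m * c[col]
    beta = [0.0] * k
    for col in reversed(range(k)):
        beta[col] = (c[col] - sum(A[col][b] * beta[b]
                                  for b in range(col + 1, k))) / A[col][col]
    return beta

random.seed(0)
n, beta0 = 2000, 1.5
rows, ys = [], []
for _ in range(n):
    q2 = random.uniform(-1, 1)
    q1 = q2 + random.gauss(0, 1)                       # q1 correlated with q2
    y = beta0 * q1 + q2 * q2 + random.gauss(0, 0.2)    # h2(q2) = q2^2
    rows.append([q1, 1.0, q2, q2 * q2, q2 ** 3])       # q1 plus power series in q2
    ys.append(y)

beta_hat = ols(rows, ys)[0]                            # coefficient on q1
```

Because h_2 here lies in the span of the series terms, the coefficient on q_1 recovers β_0 up to sampling noise, which is the finite-sample analogue of the √n-consistency in the theorem.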
Theorem 5.2 can also be specialized to the average derivative functional of equation (2.4) for certain weights.  If there are no boundary terms then integration by parts gives

A_λ(h) = ∫_Q w_λ(q)∂^λh(q)dq = (-1)^|λ| ∫_Q [∂^λw_λ(q)]h(q)dq = E[δ_λ(q)h(q)],

where f(q) is the density of q and, following Stoker (1986), δ_λ(q) = (-1)^|λ| P(∂^λw_λ(q)/f(q)|H)(q).  Here, E[δ_λ(q)²] will be finite if f(q) is not too small relative to ∂^λw_λ(q).  Let δ(q) = (δ_λ₁(q), ..., δ_λₛ(q))'.  The previous integration by parts will be valid, and hence A(h) mean-square continuous, under the hypotheses of the following result.
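The integration-by-parts identity above can be checked numerically.  The sketch below uses an illustrative weight w(q) = (1 - q²)², which vanishes on the boundary of Q = [-1, 1], and h(q) = q³ + q; both functions and the quadrature step count are assumptions made only for the example.

```python
def trapezoid(f, a, b, n=4000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * step) for i in range(1, n))
    return total * step

w  = lambda q: (1 - q * q) ** 2       # weight, zero at q = ±1: no boundary terms
dw = lambda q: -4 * q * (1 - q * q)   # its derivative
h  = lambda q: q ** 3 + q
dh = lambda q: 3 * q * q + 1

lhs = trapezoid(lambda q: w(q) * dh(q), -1.0, 1.0)    # ∫ w(q) h'(q) dq
rhs = -trapezoid(lambda q: dw(q) * h(q), -1.0, 1.0)   # -∫ w'(q) h(q) dq
```

With the weight vanishing at ±1 the two integrals agree (here both equal 32/21), which is exactly why A_λ(h) can be rewritten as an expectation E[δ_λ(q)h(q)].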
Theorem 5.5: Suppose that i) Assumptions 5.1 and 5.3 are satisfied; ii) for any h(q) ∈ H there exists π_K such that E[{h(q) - p^K(q)'π_K}²] → 0 as K → ∞; iii) Q is convex with nonempty interior, each w_λⱼ(q) is continuously differentiable to order |λⱼ| on Q, w_λⱼ(q) = 0 on the boundary of Q, and E[σ²(q)δ(q)δ(q)'] and E[δ(q)δ(q)'] exist and are nonsingular.  Then

√n(A(ĥ) - A(h_0)) →d N(0, Ω_0),   Ω̂ →p Ω_0.

Primitive conditions for these results for power series and splines are given in the following Sections.
6.  Power Series

This Section gives primitive conditions for consistency and asymptotic normality of projections that use power series.  Throughout both this and the next Sections, it will be assumed that H is as specified in equation (2.2), and that the following hypothesis is satisfied:

Assumption 6.0: i) For each ℓ and ℓ' (ℓ, ℓ' = 1, ..., L), if q_ℓ and q_ℓ' share components then one is a subvector of the other; ii) there exists a constant c > 0 such that for each ℓ and any a(q) ≥ 0, with q̃_ℓ denoting the components of q not in q_ℓ,

c∫a(q)d[F(q_ℓ)×F(q̃_ℓ)] ≤ E[a(q)] ≤ c⁻¹∫a(q)d[F(q_ℓ)×F(q̃_ℓ)];

iii) σ²(q) is bounded, and either β = 0 (i.e. q_1 is not present) or E[{q_1 - P(q_1|H_2)}{q_1 - P(q_1|H_2)}'] is nonsingular.

Conditions i) and ii) are sufficient for the closure in mean-square of {Σ_{ℓ=1}^L h_ℓ(q_ℓ)}, and hence, with iii), for H to be closed.  Boundedness of σ²(q) can be relaxed, but for brevity is not here.
Power series estimators will be mean-square consistent if the regressor distribution has an absolutely continuous component and K grows slowly enough; see Newey (1988a).  To obtain convergence rates it is useful to bound the regressor density below, as follows:

Assumption 6.1: There are finite q_j² > q_j¹ and v_j ≥ 0 (j = 1, ..., r) such that the support of q_2 is ∏_{j=1}^r [q_j¹, q_j²], and the distribution of q has an absolutely continuous component with density bounded below by C∏_{j=1}^r [(q_j - q_j¹)(q_j² - q_j)]^{v_j} on the interior of the support.

It is also possible to allow for a discrete regressor with finite support, by including all dummy variables for all points of support of the regressor, and all interactions.  Because such a regressor is essentially parametric, and allowing for it does not change any of the convergence rate results, this generalization will not be considered here.
To state further conditions, let {λ(k)} denote the sequence of multi-indices used in defining the power series or spline interactions in equations (2.3) and (2.4) respectively, and for any multi-index λ let

𝒥(k) = {j : λ_j(k) ≠ 0},  J(k) = #𝒥(k) ≤ r,  v = max_{k, j∈𝒥(k)} v_j/J(k),  Δ = max_{k, j∈𝒥(k)} λ_j/J(k).
For power series, Assumption 3.7 follows from

Assumption 6.2: K^{4+4v}/n → 0.

For v = 0 this condition is K = o(n^{1/4}), which is weaker than the corresponding requirement of Cox (1988, p. 715).

Primitive approximation rate conditions (as in Assumption 3.8) for power series follow from known results of Lorentz (1986), Powell (1981), or a Taylor expansion.
Assumption 6.3: Each of the components h_ℓ(q_ℓ) (ℓ = 1, ..., L) of h(q) is continuously differentiable of order h on the support of q_ℓ.

This hypothesis implies Assumption 3.8 with a = h/r when d = 0, and with a = (h-d)/r when d ≥ 1 and v_j = 0 (j = 1, ..., r).  A literature search has not yet revealed corresponding conditions for d ≥ 1 and v > 0, but rates for this case follow from a Taylor expansion under the following (strong) condition:

Assumption 6.4: There is a constant C such that for each multi-index λ, the λth partial derivative of each additive component of h(q) exists and is bounded by C.
The first power series result gives convergence rates.

Theorem 6.1: Suppose that Assumptions 3.1 - 3.3 and 6.0 - 6.3 are satisfied.  Then

Σ_{i=1}^n [ĥ(q_i) - h(q_i)]²/n = O_p(K/n + K^{-2a}),
∫[ĥ(q) - h(q)]²dF(q) = O_p(K/n + K^{-2a}),
sup_{q∈Q}|ĥ(q) - h(q)| = O_p(K^{1+2v}{[K/n]^{1/2} + K^{-a}}).

Suppose, in addition, that either a) Assumption 6.4 is satisfied and a is any positive number, or b) v_j = 0 (j = 1, ..., r), |λ| < h, and a = (h - |λ|)/r.  Then

sup_{q∈Q}|∂^λĥ(q) - ∂^λh(q)| = O_p(K^{1+2v+2Δ}{[K/n]^{1/2} + K^{-a}}).
This result implies optimal convergence rates for power series estimators of h(q) when K goes to infinity at the optimal rate and Assumption 6.2 is satisfied.  If the density of q is bounded away from zero, so that v = 0, then for K = Cn^γ with γ = r/(2h+r) the mean-square convergence rate for ĥ(q) is n^{-2h/(2h+r)}, which attains Stone's (1982) bounds.  The side condition h > 3r/2, which is needed to guarantee Assumption 6.2, limits this optimality result, but is weaker than the corresponding condition in Cox (1988).

These mean-square error results apply to additive projections (rather than conditional expectations), like Stone (1985, 1990) but unlike Cox (1988) or Andrews and Whang (1990), allow for interactive terms, similarly to Stone (1990) (although Stone (1985) also derives optimal rates for derivatives), and allow for dependent observations.  The side condition Assumption 6.2 is not present in Stone (1985) or Andrews and Whang (1990), but it implies a population mean-square error result, unlike Andrews and Whang (1990).  In comparison with Cox's (1988) uniform convergence rate, for univariate q with density bounded away from zero (in Cox's notation, h = 1 and k = 2), the rate here is the faster K([K/n]^{1/2} + K^{-a}), and uniform convergence rates for derivatives are given here.
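The shrinking bias term K^{-a} in Theorem 6.1 can be seen directly in a noiseless least squares fit: as the number of power series terms K grows, the in-sample mean-square approximation error of a smooth function falls rapidly.  The target exp(q), the grid, and the specific values of K below are illustrative assumptions.

```python
import math

def ols(X, y):
    """Least squares via the normal equations, solved by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    c = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for b in range(col, k):
                A[r][b] -= m * A[col][b]
            c[r] -= m * c[col]
    beta = [0.0] * k
    for col in reversed(range(k)):
        beta[col] = (c[col] - sum(A[col][b] * beta[b]
                                  for b in range(col + 1, k))) / A[col][col]
    return beta

grid = [-1 + 2 * i / 200 for i in range(201)]
target = [math.exp(q) for q in grid]

def series_mse(K):
    """In-sample MSE of the least squares fit on the power basis 1, q, ..., q^(K-1)."""
    X = [[q ** j for j in range(K)] for q in grid]
    b = ols(X, target)
    fit = [sum(bj * q ** j for j, bj in enumerate(b)) for q in grid]
    return sum((f - t) ** 2 for f, t in zip(fit, target)) / len(grid)

errs = [series_mse(K) for K in (2, 4, 6)]   # approximation error shrinks with K
```

In the stochastic setting of the theorem this bias reduction trades off against the K/n variance term, which is what the optimal rate K = Cn^{r/(2h+r)} balances.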
To state the asymptotic normality result for power series, let Ω̂ denote the variance matrix estimator described in Section 5, computed for power series.

Theorem 6.2: Suppose that i) Assumptions 3.1, 3.2, 5.1 - 5.2, 6.0, 6.1, and 6.2 are satisfied with s > 4γ/(γ-1), and A(h) is a scalar functional that is continuous with respect to the Sobolev norm |h|_{d,∞}; ii) either a) Assumption 6.3 is satisfied, d = 0, and √nK^{-h/r} → 0; or b) Assumption 6.3 is satisfied, v_j = 0 (j = 1, ..., r), and √nK^{-(h-d)/r} → 0; or c) Assumption 6.4 is satisfied and √nK^{-γ} → 0 for some γ > 0; iii) there exists a nonsingular Ω_0 such that Ω_n → Ω_0.  Then

ψ_n^{1/2}[A(ĥ) - A(h_0)] →d N(0, Ω_0),   Ω̂ →p Ω_0.

In comparison with Andrews (1991), this result applies to projections other than the conditional expectation, allows for dependent observations, and has weaker growth rate conditions for K: for d = 0, i) and ii) require K = o(n^{1/4}) (e.g. that h(q) is thrice continuously differentiable when v = 0 and r = 1), while Andrews (1991) requires a slower growth rate for K.
This result can be applied to estimation of the components of an additive projection and their derivatives, when the observations are independent.

Theorem 6.3: Suppose the observations are independent, Assumptions 6.0 and 6.1 are satisfied with v_j = 0 (j = 1, ..., r), σ²(q) is bounded and bounded away from zero, each h_ℓ is continuously differentiable to order h on Q, s > 2, K⁴/n → 0, and √nK^{-h/r} → 0.  Then for any pair of points q̄_ℓ and q̃_ℓ in the support of q_ℓ,

Ω̂^{-1/2}[{ĥ_ℓ(q̄_ℓ) - ĥ_ℓ(q̃_ℓ)} - {h_ℓ(q̄_ℓ) - h_ℓ(q̃_ℓ)}] →d N(0, 1).

Also, if √nK^{-(h-|μ|)/r} → 0 then for any μ with |μ| < h and any q̄_ℓ in the support of q_ℓ,

Ω̂^{-1/2}[∂^μĥ_ℓ(q̄_ℓ)/∂q^μ - ∂^μh_ℓ(q̄_ℓ)/∂q^μ] →d N(0, 1).

The differencing normalization here is different than the mean centering in Stone (1985), which would be more difficult to work with.
A √n-consistency and asymptotic normality result for power series estimates of mean-square continuous linear functionals is:

Theorem 6.4: Suppose that i) Assumptions 5.1, 6.0, and 6.1 are satisfied and K^{4+4v}/n → 0; ii) Assumption 6.3 is satisfied and √nK^{-a} → 0; iii) there exists an s × 1 vector δ(q) of elements of H such that A(h) = E[δ(q)h(q)] for all h ∈ H, and E[δ(q)δ(q)'] exists and is nonsingular.  Then for Ω_0 = E[σ²(q)δ(q)δ(q)'],

√n[A(ĥ) - A(h_0)] →d N(0, Ω_0),   Ω̂ →p Ω_0.
This result can be specialized to the parameters of a finite dimensional component and average derivatives, as follows.

Theorem 6.5: If hypotheses i) and ii) of Theorem 6.4 are satisfied then for Ω_0 = M⁻¹E[σ²(q){q_1 - P(q_1|H_2)}{q_1 - P(q_1|H_2)}']M⁻¹,

√n(β̂ - β_0) →d N(0, Ω_0),   Ω̂ →p Ω_0.

This result gives fully primitive regularity conditions for √n-consistency and asymptotic normality of a power series estimator of the parameters of a finite dimensional component of a projection.  It allows for h_2(q_2) to have the additive form discussed above, and also allows for dependent observations.  An analogous result can be given for weighted average derivatives, although for brevity such a result is only given below for splines.
7.  Splines

Results for splines are limited to the case v = 0:

Assumption 7.1: Assumptions 6.0 and 6.1 are satisfied with v_j = 0 (j = 1, ..., r).

Splines allow for a faster growth rate for the number of terms.

Assumption 7.2: K³/n → 0.

Approximation rate conditions for d = 0 or d = 1 follow from known results, but a literature search has not yet revealed conditions for other cases, which limits the following results.
Theorem 7.1: Suppose that Assumptions 3.1 - 3.3, 7.1, 7.2, and 6.3 are satisfied with m ≥ h - 1, d = 0 or 1, d ≤ m, and a = (h-d)/r.  Then

Σ_{i=1}^n [ĥ(q_i) - h(q_i)]²/n = O_p(K/n + K^{-2a}),
∫[ĥ(q) - h(q)]²dF(q) = O_p(K/n + K^{-2a}),
sup_{q∈Q}|ĥ(q) - h(q)| = O_p(K{[K/n]^{1/2} + K^{-a}}).

If, in addition, |λ| ≤ m, then sup_{q∈Q}|∂^λĥ(q) - ∂^λh(q)| = O_p(K^{1+Δ}{[K/n]^{1/2} + K^{-a}}).

This result yields optimal mean-square convergence rates for spline regression estimation of an additive projection with dependent observations; here the side condition of Assumption 7.2 is satisfied if K = Cn^{r/(2h+r)} and h > r.
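To make the spline case concrete, the sketch below fits a regression function with a kink using the degree-one truncated power basis 1, q, (q - t)_+, which spans the same space as linear B-splines on the same knots.  The target |q|, the single knot at 0, and the grid are illustrative assumptions.

```python
def ols(X, y):
    """Least squares via the normal equations, solved by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    c = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for b in range(col, k):
                A[r][b] -= m * A[col][b]
            c[r] -= m * c[col]
    beta = [0.0] * k
    for col in reversed(range(k)):
        beta[col] = (c[col] - sum(A[col][b] * beta[b]
                                  for b in range(col + 1, k))) / A[col][col]
    return beta

def linear_spline_basis(q, knots):
    """Degree-1 truncated power basis: 1, q, and (q - t)_+ for each knot t."""
    return [1.0, q] + [max(q - t, 0.0) for t in knots]

grid = [-1 + 2 * i / 100 for i in range(101)]
target = [abs(q) for q in grid]           # kinked function, hard for global polynomials
X = [linear_spline_basis(q, [0.0]) for q in grid]
b = ols(X, target)
fit = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
mse = sum((f - t) ** 2 for f, t in zip(fit, target)) / len(grid)
```

Because |q| = -q + 2(q - 0)_+ lies exactly in the span of this basis, the fit is exact up to rounding error; a global polynomial of the same dimension would leave a visible error near the kink, which is one reason splines achieve the rates above with faster growth in K.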
Throughout the rest of Section 7, let Ω̂ be the variance estimator computed as described in Section 5, using splines.

Theorem 7.2: Suppose that i) Assumptions 3.1, 3.2, 5.1 - 5.2, 7.1, and 7.2 are satisfied, and A(h) is a scalar functional that is continuous with respect to the Sobolev norm |h|_{d,∞} for d = 0 or 1, d ≤ m; ii) Assumption 6.3 is satisfied for m ≥ h - 1 and √nK^{-(h-d)/r} → 0; iii) there exists a nonsingular Ω_0 such that Ω_n → Ω_0.  Then

ψ_n^{1/2}[A(ĥ) - A(h_0)] →d N(0, Ω_0),   Ω̂ →p Ω_0.

Apparently, there are no other asymptotic normality results for spline projections in the literature.  The growth rate K = o(n^{1/4}) for power series is smaller than that allowed here, so asymptotic normality will require only twice differentiability of h(q), rather than the thrice differentiability for power series.  This result can be specialized analogously to Theorem 6.3, although for brevity this specialization is omitted.
A √n-consistency and asymptotic normality result for spline estimators of mean-square continuous linear functionals is:

Theorem 7.3: Suppose that i) Assumptions 5.1 and 6.1 are satisfied and K⁴/n → 0; ii) Assumption 6.3 is satisfied, m ≥ h - 1, and √nK^{-a} → 0; iii) there exists an s × 1 vector δ(q) of elements of H such that A(h) = E[δ(q)h(q)] for all h ∈ H, and E[δ(q)δ(q)'] exists and is nonsingular.  Then for Ω_0 = E[σ²(q)δ(q)δ(q)'],

√n[A(ĥ) - A(h_0)] →d N(0, Ω_0),   Ω̂ →p Ω_0.
This result can be specialized to the parameters of a finite dimensional component and average derivatives; for brevity, only the average derivative result is given here.  Let q_{-j} denote the vector of all the components of q other than the jth, and f(q_j|q_{-j}) the conditional density of the jth component given the others.

Theorem 7.4: Suppose that hypotheses i) and ii) of Theorem 7.3 are satisfied, and that for some integer d > 0, w(q_j) is continuously differentiable to order d on [-1, 1], w(q_j) = 0 on the boundary of [-1, 1], and ||[∂^d w(q_j)/∂q_j^d]/f(q_j|q_{-j})||_∞ < ∞.  Then

√n[∫_{[-1,1]} w(q_j)[∂^dĥ_j(q_j)/∂q_j^d]dq_j - ∫_{[-1,1]} w(q_j)[∂^dh_j(q_j)/∂q_j^d]dq_j] →d N(0, E[σ²(q)δ(q)²]),   Ω̂ →p E[σ²(q)δ(q)²].
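A plug-in version of the weighted average derivative in Theorem 7.4 can be sketched numerically: fit a series approximation ĥ, differentiate it analytically, and integrate against the weight.  Everything below — the target h(q) = exp(q), the weight (1 - q²)², the degree-5 power basis, and the quadrature — is an illustrative assumption rather than the paper's spline construction.

```python
import math

def ols(X, y):
    """Least squares via the normal equations, solved by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    c = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for b in range(col, k):
                A[r][b] -= m * A[col][b]
            c[r] -= m * c[col]
    beta = [0.0] * k
    for col in reversed(range(k)):
        beta[col] = (c[col] - sum(A[col][b] * beta[b]
                                  for b in range(col + 1, k))) / A[col][col]
    return beta

def trapezoid(f, a, b, n=2000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * step) for i in range(1, n))
    return total * step

grid = [-1 + 2 * i / 200 for i in range(201)]
X = [[q ** j for j in range(6)] for q in grid]
b = ols(X, [math.exp(q) for q in grid])        # series fit h_hat of h(q) = exp(q)

dh_hat = lambda q: sum(j * b[j] * q ** (j - 1) for j in range(1, 6))
w = lambda q: (1 - q * q) ** 2                 # weight vanishing on the boundary

A_hat  = trapezoid(lambda q: w(q) * dh_hat(q), -1.0, 1.0)   # plug-in functional
A_true = trapezoid(lambda q: w(q) * math.exp(q), -1.0, 1.0) # target functional
```

The plug-in value tracks the true weighted average derivative closely because the derivative of the series fit converges along with the fit itself, which is the mechanism behind the √n result.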
8.  Proofs of Theorems

The proofs of Sections 8 and 9 are abbreviated, with details provided only for central or potentially unfamiliar results.  A longer version of these sections is available from the author upon request.  Throughout, let C be a generic positive constant and λ_min(B) and λ_max(B) be the minimum and maximum eigenvalues of a symmetric matrix B.  The following Lemmas are useful in proving the results for power series and splines.
Lemma 8.0: If Assumption 6.0 i) and ii) are satisfied, then {Σ_{ℓ=1}^L h_ℓ(q_ℓ): E[h_ℓ(q_ℓ)²] < ∞, ℓ = 1, ..., L} is closed in mean-square.  If in addition Assumption 6.0 iii) is satisfied then H is closed.

Proof: For now, drop the 2 subscript, and for notational convenience let ||h||² = E[h(q)²].  By Proposition 2 of Section 4 of the Appendix of Bickel, Klaasen, Ritov, and Wellner (1990), {Σ_ℓ h_ℓ(q_ℓ)} closed is equivalent to existence of a constant C such that for each h ∈ H there is a decomposition h = Σ_ℓ h_ℓ(q_ℓ) with max_ℓ ||h_ℓ|| ≤ C||h||.  Existence of such a C can be shown using an induction argument like that of Stone (1990, Lemma 1, "L₂ Rate of Convergence for Interaction Spline Regression," Tech. Rep. No. 268, Berkeley).  As noted there, assuming that this property holds for each maximal dimension less than that of h, it suffices to show that for any "maximal" q_ℓ, i.e. one that is not a proper subvector of any other q_k, there is a constant c > 1 such that E[h(q)²] ≥ c⁻¹E[h_ℓ(q_ℓ)²].  To show this property, note that, holding fixed the vector q̃_ℓ of components of q that are not components of q_ℓ, each h_k(q_k), k ≠ ℓ, is a function of a strict subvector of q_ℓ.  Then, by Assumption 6.0 ii) and orthogonality of h_ℓ(q_ℓ) with functions of strict subvectors of q_ℓ,

E[h(q)²] ≥ c⁻¹∫∫{h_ℓ(q_ℓ) + Σ_{k≠ℓ}h_k(q_k)}²dF(q_ℓ)dF(q̃_ℓ)
= c⁻¹∫[∫{h_ℓ(q_ℓ)² + {Σ_{k≠ℓ}h_k(q_k)}²}dF(q_ℓ)]dF(q̃_ℓ)
≥ c⁻¹∫h_ℓ(q_ℓ)²dF(q_ℓ) = c⁻¹E[h_ℓ(q_ℓ)²].

To show the second conclusion, now add back the q_1'β term, so that H = {q_1'β + h_2(q_2): h_2 ∈ H_2}.  Consider a sequence h_j = q_1'β_j + h_{2j}(q_2) ∈ H that converges in mean-square.  By nonsingularity of E[{q_1 - P(q_1|H_2)}{q_1 - P(q_1|H_2)}'], β_j is a Cauchy sequence, and hence converges to some β.  Then h_{2j} = h_j - q_1'β_j is a Cauchy sequence in mean-square, and hence converges to an element h_2 of H_2, since H_2 is closed by the first conclusion.  The limit q_1'β + h_2(q_2) is an element of H, so H is closed.

Lemma 8.1: If the support Q of q is a box and 𝔉 = {f(q): each additive component of f(q) is continuously differentiable of order h, with max_{|λ|≤h}sup_{q∈Q}|∂^λf(q)| ≤ C}, then for power series there is C such that for all f ∈ 𝔉 and for d = 0 and d = 1,

inf_{π}|f(q) - p^K(q)'π|_{d,∞} ≤ CK^{-a},  a = (h-d)/r.

Proof: First, note that it suffices to show the result for a single additive component, since the approximation error of the function is bounded by the sum of the errors over all additive components.  For the first conclusion, note that by |λ(K)| monotonic increasing, the set of all linear combinations of p_1(q), ..., p_K(q) will include the set of all polynomials of degree CK^{1/r} for C small enough, so Theorem 8 of Lorentz (1986) applies.  For the second conclusion, let Q = [q¹, q²] (w.l.g. r = 1), and note that ∂p^K(q)/∂q is a spanning vector for power series up to order K - 1, so that by the first conclusion there exists π with sup_Q|∂f(q)/∂q - ∂f_K(q)/∂q| ≤ CK^{-(h-1)} for f_K(q) = p^K(q)'π, with the constant coefficient chosen so that f(q¹) = f_K(q¹).  The second conclusion then follows by integration: |f(q) - f_K(q)| ≤ ∫_{q¹}^{q}|∂f(q̃)/∂q - ∂f_K(q̃)/∂q|dq̃ ≤ CK^{-(h-1)}.
Lemma 8.2:
satisfying
(q)'Tr
= f„(q,
d =
for
e.g.
Q,
and the constant
1
Jv
a^f„(5)/aq|d5 ^ ck"^"'\
If the support
of
Q
is star-shaped and there is
q.
of
f(q)
then for power series, for all
},
for all
star-shaped,
there exists
s p £
for all
q e Q
For a function
1.
series up to order
ti\
m
C
such that
>
s CK~°^.
^
(w. l.g.
such that for all
for an expansion around
Note
q.
)
r = n.
that
By
^q + (l-3)q € Q
q e Q,
let P(f,m,q)
f(q),
denote the Taylor
5P(f
,
m,
aV(f,m,q) = PO'^f m-1 A|
so that by induction
P(af/aq.,m-l,q),
a f(q)
„fi\f(q)-p'^(q)'
max^\d f(q)\ ^
X,
there is
d >
As above, assume without loss of generality
Proof:
Q
inf
f e ?,
a,
such
C
is continuously
f(q)
different iable of all orders and for all multi-indices
C
^
|f(q) - f(,(q)|
),
1
K.
^ = {f(q): each additive component
that
such that
C
The second conclusion then follows
.
1
ia'*f(q)/aq'* -
there exists
f^(q) = P
''
by integration and boundedness of
S
is a spanning vector for power
(q)/9q
By the first conclusion,
there is
f € ?,
d p
For the second conclusion,
,
,
q)/aq
q).
.
=
Also,
also satisfies the hypotheses, so that by the intermediate value form
of the remainder,
max ^j^la'^fCq) -
Next,
let
m(K)
combination of
,
m- A| q)
|
,
|
cVe
^
(m-d)
be the largest integer such that
p (q),
ordering" hypothesis,
i C mCK)"",
PO^f
and let
fj^Cq)
a >
0,
c'"^'^
32
C
V[
]
P(f,m,q)
and
(m(K)-d)
such that
C
!
is a linear
By the "natural
= P(f,m(K),q).
there are constants
so that for any
!
]
£ CK"",
and
C,m(K)
s K
sup|^l^^ ^\d\(q)-dh^(.q)\ = sup|^|^^^Q|a^f(q)-P(a^f.m(K)-U|.q)| ^ CK'
Lemma 8.3: If the support Q of q is a box and 𝔉 = {f(q): each additive component of f(q) is continuously differentiable of order h, with max_{|λ|≤h}sup_q|∂^λf(q)| ≤ C}, then for splines with m ≥ h - 1 there is C such that for all f ∈ 𝔉 and for d = 0 and d = 1,

inf_{π}|f(q) - p^K(q)'π|_{d,∞} ≤ CK^{-a},  a = (h-d)/r.

Proof: As before, w.l.g. assume a single additive component with r = 1, and let Q = [-1, 1].  The result for d = 0 follows by Theorem 12.8 of Schumaker (1981).  For the other case, note that ∂p^K(q)/∂q is a spanning vector for splines of degree m - 1 with knot spacing bounded by C/K (Powell, 1981), so that by Theorem 12.8 of Schumaker (1981), for K large enough there exists π such that sup_Q|∂f(q)/∂q - ∂f_K(q)/∂q| ≤ CK^{-(h-1)} for f_K(q) = p^K(q)'π.  The conclusion then follows by integration.
Lemma 8.4: If Assumption 6.1 is satisfied, then for power series Assumption 3.4 is satisfied, with ζ_0(K) ≤ CK^{1+2v}, and for any d ≥ 0 there is a constant C such that ζ_d(K) ≤ CK^{1+2v+2Δ}.

Proof: First, assume that q_1 is not present.  Following the definitions in Abramowitz and Stegun (1972, Ch. 22), for exponent α let C_k^{(α)}(x) denote the ultraspherical polynomial of order k, and let

p_k^{(α)}(x) = [π2^{1-2α}Γ(k+2α)/{k!(k+α)[Γ(α)]²}]^{-1/2}C_k^{(α)}(x),  x_j(q_j) = (2q_j - q_j¹ - q_j²)/(q_j² - q_j¹),

and define P_k(q) = ∏_{j∈𝒥(k)} p_{λ_j(k)}^{(v_j+.5)}(x_j(q_j)).  By the "natural ordering" assumption (i.e. by |λ(k)| monotonic increasing), P^K(q) = (P_1(q), ..., P_K(q))' is a nonsingular combination of p^K(q).  Also, let P̄ be the distribution that is absolutely continuous on Q = ∏_{j=1}^r [q_j¹, q_j²] with pdf proportional to ∏_{j=1}^r [(q_j - q_j¹)(q_j² - q_j)]^{v_j}.  By orthonormality of the p_k^{(v_j+.5)} with respect to the corresponding weight and by the change of variables x_j = x_j(q_j), there is a constant C > 0 with

λ_min(∫P^K(q)P^K(q)'dP̄(q)) ≥ C,

so that by Assumption 6.1 the smallest eigenvalue of E[P^K(q)P^K(q)'] is bounded away from zero.  Next, by differentiating 22.5.37 of Abramowitz and Stegun and solving, dC_k^{(α)}(x)/dx = 2αC_{k-1}^{(α+1)}(x), so that by 22.14.2 of Abramowitz and Stegun, for |λ| ≤ d,

|∂^λP_k(q)| ≤ C∏_{j∈𝒥(k)}[1 + λ_j(k)]^{2(v_j+λ_j)+1} ≤ CK^{1+2v+2Δ},

where the last inequality follows by the definitions of v and Δ.

Now, for the case with q_1 present, let P(q_2) = (P_1(q_2), ..., P_K(q_2))' be as described above, and let H_{2K} be the linear space spanned by P(q_2).  The bounds of the previous equation continue to hold.  Note that E[{q_1 - P(q_1|H_{2K})}{q_1 - P(q_1|H_{2K})}'] is bigger than E[{q_1 - P(q_1|H_2)}{q_1 - P(q_1|H_2)}'] in the positive semi-definite sense, and thus has smallest eigenvalue bounded away from zero by Assumption 6.0 iii).  Furthermore, for B = E[q_1P(q_2)'](E[P(q_2)P(q_2)'])⁻¹, replacing q_1 with the residual q_1 - P(q_1|H_{2K}) gives a decomposition E[P^K(q)P^K(q)'] = BDB' with block diagonal D, so that

λ_min(E[P^K(q)P^K(q)']) ≥ λ_min(BB')λ_min(D) ≥ Cmin{λ_min(E[P(q_2)P(q_2)']), λ_min(E[{q_1 - P(q_1|H_{2K})}{q_1 - P(q_1|H_{2K})}'])},

which is bounded away from zero by the extremal characterization of the smallest eigenvalue.
Lemma 8.5: If Assumption 7.1 is satisfied then for splines Assumption 3.4 is satisfied, the number of nonzero elements of P^K(q)P^K(q)' is bounded by CK, and for any d ≤ m there is a constant C such that sup_{q∈Q}max_{|λ|≤d}|∂^λP_k(q)| ≤ CK^d, so that ζ_d(K) ≤ CK^{1/2+d}.

Proof: First, consider the case where q_1 is not present, and let Q = [-1, 1]^r.  Let B_{jL}(x) be the B-spline of order m for the knot sequence with left end-knot -1 and knots -1 + 2j/[L+1], j = ..., -1, 0, 1, ..., and let P_k(q) = ∏_j 1(λ_j(k) > 0)B_{λ_j(k),L}(q_j), the multiplicative interactions of B-splines for components of q.  Existence of a nonsingular transformation of p^K(q) then follows by the usual basis result for B-splines (e.g. Theorem 19.2 of Powell, 1981).  For the so-called normalized B-splines with evenly spaced knots, it follows by the argument of Burman and Chen (1989, p. 1587) that for q̃ a vector of r independent uniform random variables on [-1, 1],

λ_min(∫P^K(q)P^K(q)'dq) ≥ C[2(m+1)/L]^r > 0

for all positive integers L, so that boundedness away from zero of the smallest eigenvalue of E[P^K(q)P^K(q)'] follows analogously to the proof of Lemma 8.4.  Also, B_{kL}(q_j)B_{k'L}(q_j) = 0 for all q_j if |k - k'| > 2(m+1), by a well known property of B-splines, implying that the number of nonzero elements of P^K(q)P^K(q)' is bounded above by 2(m+1)^rK = CK.  Also, since changing even knot spacing is equivalent to rescaling the argument of B-splines, sup_x|∂^dB_{jL}(x)/∂x^d| ≤ CL^d for d ≤ m, implying the bounds on derivatives given in the conclusion.  The proof when q_1 is present follows as in the proof of Lemma 8.4.
Proof of Theorem 4.1: Note that Assumptions 3.2 and 3.3 imply that Assumption 9.1 is satisfied for J = 1 and y_1 = y.  The first conclusion then follows by Lemma 9.9, and the second by Lemma 9.10.

Proof of Theorem 4.2: By reasoning as in the previous proof, the theorem follows immediately from Lemma 9.11.

Proof of Theorem 4.3: Follows immediately from Theorem 4.2.
Proof of Theorem 5.1: By Assumption 3.4, Â(h) and A'Σ⁻¹A are invariant to replacing p^K(q) by a nonsingular linear combination of p^K(q), so that in analyzing the properties of Â(h) it suffices to show the conclusion with this replacement; for convenience, the K superscript on p^K(q) is dropped in what follows.  Note that for any matrix D and positive semi-definite B, ||DB^{1/2}|| = [tr(DBD')]^{1/2} ≤ ||D||λ_max(B)^{1/2}.  Thus, for F = Ω_n^{-1/2} and a positive definite square root Σ^{-1/2}, σ²(q) bounded away from zero gives

(8.1)  ||FA'Σ^{-1/2}|| = {tr[FA'Σ⁻¹AF']}^{1/2} ≤ C{tr[FA'Σ⁻¹VΣ⁻¹AF']}^{1/2} ≤ C√n.

Note that A'Σ⁻¹A is a monotonic increasing sequence in K in the positive semi-definite semi-order (since this matrix is formally identical to the inverse asymptotic variance of a minimum chi-square estimator, which rises as additional equality restrictions are added).  Therefore, the smallest eigenvalue of A'Σ⁻¹A is also monotonic increasing in K, so that by A having full rank for some K it is bounded away from zero, and ||F|| = O(1).  Also, by Lemma 9.6 and Assumption 3.7, ||Σ̂ - Σ|| = o_p(1), λ_max(Σ̂⁻¹) = O_p(1), and λ_max(Σ⁻¹) = O(1), and

(8.2)  Σ̂⁻¹ - Σ⁻¹ = Σ⁻¹(Σ - Σ̂)Σ̂⁻¹ = Σ⁻¹(Σ - Σ̂)Σ⁻¹ + Σ⁻¹(Σ - Σ̂)Σ̂⁻¹(Σ - Σ̂)Σ⁻¹.

It follows that, w.p.a.1,

||FA'Σ̂^{-1/2}||² ≤ ||FA'Σ^{-1/2}||²[1 + λ_max(Σ̂⁻¹)||Σ̂ - Σ||(1 + o_p(1))] = O_p(n).

Let π be such that |h_0(q) - p(q)'π|_0 ≤ CK^{-a}, and let h = (h_0(q_1), ..., h_0(q_n))'.  Then

(8.3)  F[Â(h) - A(h_0)] = FA'Σ̂⁻¹P'u/n + R,   R = F[A'Σ̂⁻¹P'(h - Pπ)/n + A'π - A(h_0)].

By Lemma 9.8, w.p.a.1,

||R|| ≤ ||FA'Σ̂⁻¹P'/√n||·||h - Pπ||/√n + ||F||·||A'π - A(h_0)|| ≤ C√nK^{-a} = o_p(1).

Also, FA'Σ̂⁻¹P'u/n = FA'Σ⁻¹P'u/n + R̃ for R̃ = FA'(Σ̂⁻¹ - Σ⁻¹)P'u/n.  By the proof of Lemma 9.6, ||Σ^{-1/2}P'u/√n|| = O_p(K^{1/2}) and ||(Σ̂ - Σ)Σ^{-1/2}|| = O_p(K^{1/2}ζ_0(K)/√n), so that by (8.2),

(8.4)  ||R̃|| = ||FA'Σ̂⁻¹(Σ̂ - Σ)Σ⁻¹P'u/n|| ≤ ||FA'Σ̂⁻¹/√n||·||(Σ̂ - Σ)Σ^{-1/2}||·||Σ^{-1/2}P'u/√n|| = O_p(K^{1/2}K^{1/2}ζ_0(K)²/√n) = o_p(1).

Next, let ν be a constant vector with ||ν|| = 1 and the same dimension as A(h), and let Z_in = ν'FA'Σ⁻¹p^K(q_i)u_i/√n.  Note that Z_in is a martingale difference sequence with Σ_i E[Z_in²] = 1, and |Z_in| ≤ ||ν||·||FA'Σ⁻¹p^K(q_i)||·|u_i|/√n ≤ CK^{1/2}ζ_0(K)|u_i|/√n.  By Assumption 5.1, for α = s/2 > 1, E[(Z_in²)^α] ≤ CK^αζ_0(K)^{2α}n^{-α}, so that for any ε > 0,

(8.5)  Σ_i E[1(Z_in² > ε)Z_in²] ≤ Σ_i E[(Z_in²)^α]/(ε)^{α-1} ≤ CK^αζ_0(K)^{2α}n^{1-α} → 0,

and, by Davydov's (1968) inequality, E[(Σ_i Z_in² - 1)²] ≤ (C/n)[Σ_{t=0}^n φ(t)^{1-2/α}]K²ζ_0(K)⁴/n → 0, so that Σ_i Z_in² →p 1.  It then follows by Theorem 5.2.3 of White (1984) that Σ_i Z_in →d N(0, 1).  Since this result holds for all ν, FA'Σ⁻¹P'u/n →d N(0, I) by the Cramér-Wold device.  The first conclusion then follows by eqs. (8.3), (8.4), and the triangle inequality.

To prove the second conclusion, note that Assumption 5.1 implies that the hypotheses of Theorem 4.1 are satisfied.  Therefore, for û_i = y_i - ĥ(q_i) = u_i + h_0(q_i) - ĥ(q_i),

(8.6)  Σ_i|û_i - u_i|²/n = Σ_i|ĥ(q_i) - h_0(q_i)|²/n = O_p(K/n + K^{-2a}) = o_p(1),

and Σ_i|û_i - u_i||u_i|/n ≤ O_p(1){Σ_i|û_i - u_i|²/n}^{1/2} = o_p(1).  Let V̂ = Σ_i p^K(q_i)p^K(q_i)'û_i²/n and Ṽ = Σ_i p^K(q_i)p^K(q_i)'u_i²/n.  Next, apply Lemma 9.6 with y_i there equal to u_i², J = K², s there equal to s/2, and ζ_B(K) = ζ_0(K)², to obtain ||Ṽ - V|| = O_p(K^{1/2}ζ_0(K)²K^{1/2}/n^{1/2}) = o_p(1), and note that ||V̂ - Ṽ|| ≤ Cζ_0(K)²Σ_i(|û_i - u_i|² + 2|û_i - u_i||u_i|)/n = o_p(1).  It now follows by the triangle inequality and (8.2) that

||F[Ω̂ - Ω_n]F'|| ≤ ||F[A'Σ̂⁻¹(Σ̂ - Σ)Σ⁻¹VΣ⁻¹(Σ̂ - Σ)Σ̂⁻¹A]F'|| + 2||F[A'Σ̂⁻¹(Σ̂ - Σ)Σ⁻¹VΣ̂⁻¹A]F'|| + ||F[A'Σ⁻¹(V̂ - V)Σ⁻¹A]F'|| ≤ o_p(1) + o_p(1)λ_max(Σ̂⁻¹) + o_p(1)λ_max(Σ⁻¹) = o_p(1),

giving the second conclusion.  If the final hypothesis is satisfied, then Ω_n^{1/2}Ω_0^{-1/2} → I by Ω_n → Ω_0, nonsingularity of Ω_0, and continuity of the square root, and Ω̂^{1/2}Ω_n^{-1/2} →p I by the second conclusion.  Therefore, multiplying through by Ω_n^{1/2}Ω_0^{-1/2}, the final conclusion follows from the first conclusion.
Proof of Theorem 5.2: Let A = E[p^K(q)δ(q)'] and δ_K(q) = p^K(q)'Σ⁻¹A = p^K(q)'Σ⁻¹E[p^K(q)δ(q)'].  By iii) and each component of p^K(q) an element of H, δ_K(q) is the minimum mean-square error linear combination of p^K(q) approximating δ(q), so it follows by ii) that E[||δ(q) - δ_K(q)||²] → 0, and by σ²(q) bounded that E[σ²(q)||δ(q) - δ_K(q)||²] ≤ CE[||δ(q) - δ_K(q)||²] → 0.  Therefore, E[δ_K(q)δ_K(q)'] → E[δ(q)δ(q)'], and A'Σ⁻¹A = E[δ_K(q)δ_K(q)'], so that Assumption 5.2 is satisfied by iii).  Also,

||Ω_n - Ω_0|| = ||E[σ²(q)A'Σ⁻¹p^K(q)p^K(q)'Σ⁻¹A] - Ω_0|| = ||E[σ²(q)δ_K(q)δ_K(q)'] - E[σ²(q)δ(q)δ(q)']|| ≤ E[σ²(q)||δ(q) - δ_K(q)||²] + 2E[σ²(q)||δ(q)||·||δ(q) - δ_K(q)||] → 0,

so that the final hypothesis of Theorem 5.1 is satisfied.  The conclusion then follows by the final conclusion of Theorem 5.1.
Proof of Theorem 5.3: If A(h) is not mean-square continuous then there exists a sequence h_j(q) ∈ H (j = 1, 2, ...) such that E[h_j(q)²] → 0 and |A(h_j)| is bounded away from zero.  Consider any parametric submodel such that P(y|H) = h_0(q) + τh_j(q), with true value of τ equal to zero.  By Chamberlain (1987) the supremum over all such submodels of the Cramér-Rao variance bound is the asymptotic variance of the least squares estimator, which is (E[h_j(q)²])⁻²E[σ²(q)h_j(q)²].  Furthermore, by the delta-method and σ²(q) bounded away from zero, the corresponding supremum for A(h) is

[∂A(h_0 + τh_j)/∂τ]²(E[h_j(q)²])⁻²E[σ²(q)h_j(q)²] ≥ CA(h_j)²(E[h_j(q)²])⁻¹ → ∞.

Therefore, the supremum over all parametric submodels of Cramér-Rao bounds for A(h) is not finite.
Proof of Theorem 5.4: Given in Section 5.

Proof of Theorem 5.5: Given in Section 5.
Proof of Theorem 6.1: Proceed by verifying the hypotheses of Theorems 4.1 and 4.2.  Assumption 3.4 with ζ_0(K) ≤ CK^{1+2v} follows by Assumption 6.1 and Lemma 8.4.  Assumptions 3.5 and 3.6 follow by the assumption that |λ(k)| is increasing, which implies that the products of univariate orthogonal polynomial terms form a nested sequence.  Assumption 3.7 follows by Assumption 6.2 and ζ_0(K)² ≤ CK^{2+4v}.  Assumption 3.8 with d = 0 follows by Lemma 8.1.  The first two conclusions then follow from Theorem 4.1, and the third from Theorem 4.2 similarly.  The final conclusion follows from Theorem 4.2, the bound on ζ_d(K) from Lemma 8.4, and Lemma 8.2 (which implies Assumption 3.8 for any a > 0).
Proof of Theorem 6.2: By Theorem 5.1 it suffices to show that Assumption 5.3 is satisfied.  Assumption 3.4 with ζ_0(K) ≤ CK^{1+2v} follows by Assumption 6.1 and Lemma 8.4.  Assumptions 3.8 and 3.9 follow by ii) and Lemmas 8.1 and 8.2.  Finally, note that K²ζ_0(K)²/n ≤ CK²K^{2+4v}/n = CK^{4+4v}/n → 0 by i), and √nK^{-a} → 0 by ii), so that Assumption 5.3 is satisfied.

Proof of Theorem 6.3: Proceed by verifying the hypotheses of Theorem 6.2.  Assumptions 3.1, 3.2, and 5.1 are satisfied by the independence of the observations.  Note that A(h) = h_ℓ(q̄_ℓ) - h_ℓ(q̃_ℓ) is continuous with respect to |h|_{0,∞}, while A(h) = ∂^μh_ℓ(q̄_ℓ)/∂q^μ is continuous with respect to |h|_{d,∞} for d = |μ|, so that ii) of Theorem 6.2 is satisfied.  The conclusion then follows by taking ψ = Ω̂^{-1/2}.

Proof of Theorem 6.4: Proceed from Theorem 5.2.  As in the proof of Theorem 6.2, it follows by i) and ii) that Assumption 5.3 is satisfied.  Theorem 5.2 ii) follows by the well known spanning result for power series for bounded q (e.g. Gallant, 1980), giving the result.

Proof of Theorem 6.5: Follows from Theorem 5.4 by the same argument used in the proof of Theorem 6.4.

Proof of Theorem 7.1: Proceed by verifying the hypotheses of Theorems 4.1 and 4.2.  Assumption 3.4 with ζ_0(K) ≤ CK^{1/2} follows by Assumption 7.1 and Lemma 8.5.  Assumptions 3.5 and 3.6 follow trivially, by ζ_0(K)²/K bounded by a constant.  Assumption 3.7 follows by Assumption 7.2 and Lemma 8.5, which implies ζ_0(K)² ≤ CK.  Assumption 3.8 follows by Lemma 8.3.  The conclusions then follow from Theorems 4.1 and 4.2, with the bound on ζ_d(K) from Lemma 8.5 and Assumption 3.8 from Lemma 8.3.

Proof of Theorem 7.2: By Theorem 5.1 it suffices to show that Assumption 5.3 is satisfied.  Assumption 3.4 with ζ_0(K) ≤ CK^{1/2} follows by Assumption 7.1 and Lemma 8.5.  Assumptions 3.8 and 3.9 follow by ii) and Lemma 8.3.  Finally, note that K²ζ_0(K)²/n ≤ CK³/n → 0 by i), and √nK^{-a} → 0 by ii), so that Assumption 5.3 is satisfied.
Proof of Theorem 7.3:
Follows analogously to the proof of Theorem 6.3.
Proof of Theorem 7.4: Follows analogously to the proof of Theorem 6.5, except for the explicit formula for δ(q) given, which follows by

∫w(q_j)[∂^λh_j(q_j)/∂q_j^λ]dq_j = ∫w̄(q)[∂^λh(q)/∂q^λ]dq  for  w̄(q) = w(q_j)f(q_{-j}|q_j),

where [∂^λw̄(q)/∂q^λ]/f(q) = [∂^λw(q_j)/∂q_j^λ]f(q_{-j}|q_j)/f(q) = [∂^λw(q_j)/∂q_j^λ]/f(q_j).
9. Useful Lemmas

This Section gives general results on convergence rates for certain remainder terms. It is useful to allow throughout for a vector of series estimates with dimension that can increase with sample size. To do so, it is necessary to introduce more notation and assumptions.
Let {y_ij}_{j=1}^J be a collection of functions of a single data observation z_i. For notational convenience, the i subscript on y_ij will be suppressed in what follows. The results will pertain to certain sample averages of these functions, or of series estimators of the projections of the y_j, with u_ij = y_ij − h_j(q_i). Denote the observations on y_j by y_ij, and let objects without an i subscript denote corresponding vectors of n observations, e.g. y_j = (y_1j,…,y_nj)′ and u_j = (u_1j,…,u_nj)′. For K̂ = K̂(z_1,…,z_n), let p_i = p^K̂(q_i) = [p_1K̂(q_i),…,p_K̂K̂(q_i)]′. The estimators are

ĥ_j(q) = p^K̂(q)′(p′p)⁻p′y_j.
Assumption 9.1: For s > 2 and B_yi = max_{j≤J}|y_ij|, there is ν_y(J) such that max_{j≤J}|u_ij| ≤ ν_y(J)B_yi, |B_yi|_s < ∞, and E[B_yi²|q_i] ≤ C (i = 1, 2, …). Also, either a) z is uniform mixing with mixing coefficients φ(t) = O(t^{−μ}), μ > 1, or b) there exists c(t) = O(t^{−μ}) such that |E[u_ij u_{i+t,j}|q_i,q_{i+t}]| ≤ c(t)ν_y(J)².

Henceforth, let Σ_i = Σ_{i=1}^n, Σ_j = Σ_{j=1}^J, and Σ_t = Σ_{t=1}^∞.
The first few Lemmas consist of useful convergence results for random matrices with dimension that can depend on sample size. Let Σ̂ and Σ denote such symmetric matrices, and λ_max(·) and λ_min(·) the largest and smallest eigenvalues respectively.

Lemma 9.1: If ‖Σ̂−Σ‖ = o_p(1) and λ_min(Σ) ≥ C, then λ_min(Σ̂) ≥ C/2 with probability approaching one.

Proof: For a conformable vector μ, λ_min(Σ̂) = min_{‖μ‖=1} μ′Σ̂μ ≥ min_{‖μ‖=1} μ′Σμ − ‖Σ̂−Σ‖ ≥ C − o_p(1).
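The inequality driving Lemma 9.1 is the standard eigenvalue perturbation bound λ_min(Σ̂) ≥ λ_min(Σ) − ‖Σ̂−Σ‖ (Weyl's inequality, with ‖·‖ the spectral norm). A quick numerical check (an illustration only; the matrices and the perturbation scale are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
Sigma = A @ A.T + np.eye(6)             # symmetric with lambda_min(Sigma) >= 1
E = 0.01 * rng.standard_normal((6, 6))
E = (E + E.T) / 2.0                     # small symmetric perturbation, plays "Sigma_hat - Sigma"
Sigma_hat = Sigma + E

lam_min = np.linalg.eigvalsh(Sigma)[0]        # eigvalsh returns eigenvalues in ascending order
lam_min_hat = np.linalg.eigvalsh(Sigma_hat)[0]
shift = np.linalg.norm(E, 2)                  # spectral norm of the perturbation
```

Weyl's inequality guarantees `lam_min_hat >= lam_min - shift`, which is exactly how the o_p(1) perturbation preserves the eigenvalue lower bound in the lemma.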
Lemma 9.2: If λ_min(Σ) ≥ C, ‖Σ̂−Σ‖ = o_p(1), and D_n is a conformable matrix such that ‖Σ^{−1/2}D_n‖ = O_p(ε_n) for some ε_n, then ‖Σ̂^{−1/2}D_n‖ = O_p(ε_n).

Proof: It is easy to show that for any conformable matrices A and B, ‖AB‖ ≤ ‖A‖·‖B‖ and tr(A′BA) ≤ ‖A′A‖λ_max(B) ≤ ‖A‖²λ_max(B), and that if B is positive semi-definite, ‖BA‖ ≤ ‖A‖λ_max(B). Let Σ^{−1/2} = UΛU′, where U is an orthogonal matrix and Λ a diagonal matrix consisting of the square roots of the eigenvalues of Σ^{−1}. By Lemma 9.1, λ_min(Σ̂) ≥ C/2 w.p.a.1, so that λ_max(Σ̂^{−1}) = 1/λ_min(Σ̂) = O_p(1). Therefore, w.p.a.1,

(9.1)  ‖Σ̂^{−1/2}D_n‖² = tr(D_n′Σ^{−1}D_n) + tr(D_n′[Σ̂^{−1}−Σ^{−1}]D_n)
       ≤ O_p(ε_n²)[1 + o_p(1)O_p(1) + ‖Σ̂−Σ‖²λ_max(Σ^{−1})O_p(1)] = O_p(ε_n²).

Lemma 9.3: Let P and p denote random matrices with n rows, such that the space spanned by the columns of p is a subset of the space spanned by the columns of P, and let u be a random matrix with n rows. If ‖P′P/n − Σ‖ = o_p(1), λ_min(Σ) ≥ C, and ‖Σ^{−1/2}P′u/n‖ = O_p(ε_n), then tr(u′p(p′p)⁻p′u/n) = O_p(ε_n²).

Proof: Let W̄ = P(P′P)⁻P′ and W = p(p′p)⁻p′ be the orthogonal projection operators for the linear spaces spanned by the columns of P and p. Since the space spanned by p is a subset of the space spanned by P, W̄ − W is positive semi-definite. Let Σ̂ = P′P/n. Then by Lemma 9.2,

tr(u′Wu/n) ≤ tr(u′W̄u/n) = ‖Σ̂^{−1/2}P′u/n‖² = O_p(ε_n²).

Lemma 9.4: Suppose the hypotheses of Lemma 9.3 are satisfied, and let Y = G + u, π̂ = (p′p)⁻p′Y, and Ĝ = pπ̂. Then for any conformable matrix π,

‖Ĝ−G‖²/n = O_p(ε_n²) + ‖G−pπ‖²/n.

Proof: For W = p(p′p)⁻p′ as in the proof of Lemma 9.3, Ĝ = WY, W is idempotent, Wp = p, and I−W is positive semi-definite. Therefore, by Lemma 9.3,

‖Ĝ−G‖²/n = tr[Y′WY − Y′WG − G′WY + G′G]/n = tr[u′Wu + G′(I−W)G]/n
 ≤ tr[u′Wu + (G−pπ)′(I−W)(G−pπ)]/n ≤ O_p(ε_n²) + ‖G−pπ‖²/n.

Lemma 9.5: Suppose ‖P′P/n − Σ‖ = o_p(1), λ_min(Σ) ≥ C, ‖Σ^{−1/2}P′u/n‖ = O_p(ε_n), and p = PS, where S is a random selection matrix. Then for any conformable matrix π,

‖π̂−π‖² = O_p(ε_n²) + O_p(1)‖G−pπ‖²/n,  and
tr[(π̂−π)′S′ΣS(π̂−π)] = O_p(ε_n²) + O_p(1)‖G−pπ‖²/n.

Proof: For the selection matrix S, λ_min(p′p/n) = λ_min(S′(P′P/n)S) ≥ λ_min(P′P/n) ≥ C w.p.a.1, so that λ_max((p′p/n)⁻¹) = O_p(1). Thus, for W = p(p′p)⁻p′ and Ĝ = pπ̂ as above,

‖π̂−π‖² ≤ λ_min(p′p/n)⁻¹ tr[(π̂−π)′(p′p/n)(π̂−π)] = O_p(1)‖Ĝ−pπ‖²/n
 ≤ O_p(1)[‖Ĝ−G‖²/n + ‖G−pπ‖²/n] = O_p(ε_n²) + O_p(1)‖G−pπ‖²/n,

where the last equality follows by Lemma 9.4. To prove the second conclusion, note that by the triangle inequality and the same arguments as for the previous equation,

tr[(π̂−π)′S′ΣS(π̂−π)] = tr[(π̂−π)′[S′ΣS − p′p/n](π̂−π)] + tr[(π̂−π)′(p′p/n)(π̂−π)]
 ≤ ‖π̂−π‖²·‖S′ΣS − p′p/n‖ + O_p(ε_n²) + O_p(1)‖G−pπ‖²/n
 ≤ [O_p(ε_n²) + O_p(1)‖G−pπ‖²/n](1 + ‖Σ − P′P/n‖) = O_p(ε_n²) + O_p(1)‖G−pπ‖²/n,

where ‖S′ΣS − p′p/n‖ = ‖S′[Σ − P′P/n]S‖ ≤ ‖Σ − P′P/n‖ for the selection matrix S.
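The key step in the proof of Lemma 9.4 is the exact decomposition ‖Ĝ−G‖²/n = tr[u′Wu + G′(I−W)G]/n, splitting the fit error into a variance term and an approximation-bias term. Since it is an algebraic identity, it can be checked numerically (a sketch with arbitrary simulated matrices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 200, 5
p = rng.standard_normal((n, K))               # regressor matrix p
G = np.tanh(p @ rng.standard_normal((K, 1)))  # a "true" G not in the span of p
u = rng.standard_normal((n, 1))               # disturbances
Y = G + u

W = p @ np.linalg.solve(p.T @ p, p.T)         # projection W = p(p'p)^{-1} p'
G_hat = W @ Y                                 # fitted values G_hat = p pi_hat

lhs = float(np.sum((G_hat - G) ** 2)) / n                              # ||G_hat - G||^2 / n
rhs = (float(u.T @ W @ u) + float(G.T @ (np.eye(n) - W) @ G)) / n     # tr[u'Wu + G'(I-W)G]/n
```

The identity holds because Ĝ − G = Wu − (I−W)G and the cross term vanishes, W(I−W) = 0.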
The next results give convergence rates for sample averages with dimension that can grow with sample size. For the μ in Assumption 9.1, let α > 2μ/(μ−1) be as small as desired, and for the s in Lemma 9.6 let

Δ = −1/2 for s > 2μ/(μ−1);  Δ = −(1/2) + (1/s) − (1/α) for 2 < s ≤ 2μ/(μ−1).

Lemma 9.6: If Assumption 3.1 is satisfied and there exist increasing ν_y(J) and B_yi with max_{j≤J}|y_ij| ≤ ν_y(J)B_yi (i = 1, 2, …) and |B_yi|_s < ∞, then

‖Σ_i y_i/n − E[y_i]‖ = O_p(J^{1/2}ν_y(J)n^Δ).

Proof: The proof for the case s > 2μ/(μ−1) follows immediately from applying Davydov's inequality to the covariance terms in E[‖Σ_i y_i/n − E[y_i]‖²]. The proof for the other case follows by a truncation argument analogous to that used to prove weak laws of large numbers.
For the K̄ = K̄(n) of Assumption 3.4, let P(q) = P^K̄(q), Σ = E[P(q_i)P(q_i)′], and Σ̂ = Σ_i P(q_i)P(q_i)′/n.

Lemma 9.7: If Assumptions 3.1, 3.4, and 3.7 are satisfied then ‖Σ̂ − Σ‖ = O_p(K̄ζ_0(K̄)²n^{−1/2}).

Proof: Apply Lemma 9.6 for y_i = P(q_i)P(q_i)′, with J = K̄² elements of the form p_kK̄(q_i)p_ℓK̄(q_i). Note that for all q ∈ Q, |p_kK̄(q)p_ℓK̄(q)| ≤ ζ_0(K̄)², so that Assumption 9.1 is satisfied for ν_y(J) = Cζ_0(K̄)² and s = ∞ > 2μ/(μ−1). Thus, the conclusion follows by Lemma 9.6 with J^{1/2} = K̄ and Δ = −1/2.
Let y, u = y−h, and h be n × J matrices with respective elements y_ij, u_ij, and h_j(q_i).

Lemma 9.8: If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied, then

(9.5)  tr[(y−h)′p(p′p)⁻p′(y−h)/n] = O_p(JK̄ν_y(J)²/n).

Proof: Let P_i = P^K̄(q_i), Σ = E[P_iP_i′], and u_i = (u_i1,…,u_iJ)′. By Assumption 9.1 and iterated expectations,

E[P_iP_i′u_ij²] = E[P_iP_i′E[u_ij²|q_i]] ≤ ν_y(J)²Σ.

Let C_jt = E[P_iu_ij P_{i+t}′u_{i+t,j}] (0 ≤ t ≤ n−i). Then under the uniform mixing condition of Assumption 9.1 a), it follows by Lemma 2.2 of White and Domowitz (1984) that for any conformable vector v,

(9.6)  v′[C_jt+C_jt′]v ≤ 2φ(t)^{1/2}v′E[P_iP_i′u_ij²]v ≤ 2ν_y(J)²φ(t)^{1/2}v′Σv.

Also, under Assumption 9.1 b), by iterated expectations and orthogonality of the projection residual,

(9.7)  v′[C_jt+C_jt′]v ≤ 2c(t)ν_y(J)²v′E[P_iP_i′]v = 2ν_y(J)²c(t)v′Σv.

Thus, 2ν_y(J)²c(t)Σ − (C_jt+C_jt′) is positive semi-definite in either case, with c(t) as in (9.6) or (9.7). Then it follows that

(9.8)  E[‖Σ^{−1/2}P′u/n‖²] = Σ_j E[u_j′PΣ^{−1}P′u_j]/n² ≤ ν_y(J)²Σ_j tr(Σ^{−1/2}[Σ_t c(t)]ΣΣ^{−1/2})/n = O(JK̄ν_y(J)²/n),

so that ‖Σ^{−1/2}P′u/n‖ = O_p((JK̄)^{1/2}ν_y(J)n^{−1/2}). The conclusion then follows by Lemma 9.3, Lemma 9.7, Assumption 3.5, and Assumption 3.7.
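For intuition about the K/n rate in (9.8): with i.i.d. errors of variance σ² that are independent of the regressors, the variance term has exact mean E[tr(u′Wu)/n] = σ²K/n, because the projection matrix W onto K regressors has tr(W) = K. A small Monte Carlo check (an illustration only; the Gaussian design and all constants are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, sigma2, reps = 100, 6, 1.0, 300
draws = []
for _ in range(reps):
    p = rng.standard_normal((n, K))            # fresh regressor matrix each replication
    u = np.sqrt(sigma2) * rng.standard_normal((n, 1))
    W = p @ np.linalg.solve(p.T @ p, p.T)      # projection onto K columns, tr(W) = K
    draws.append(float(u.T @ W @ u) / n)       # the variance term tr(u'Wu)/n

avg = float(np.mean(draws))                    # should be near sigma2 * K / n = 0.06
```

Averaged over replications, tr(u′Wu)/n concentrates around σ²K/n, which is the source of the K/n component of the convergence rate.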
Define

δ(h,K,d,v) = inf_π max_{|λ|≤d} {E[|∂^λ{h(q_i) − π′p^K(q_i)}|^v]}^{1/v} + exp(−exp(K)),

δ(h,K,d,∞) = inf_π max_{|λ|≤d} sup_{q∈Q} |∂^λ[h(q) − π′p^K(q)]| + exp(−exp(K)).
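For v = 2, the leading term of δ(h,K,0,2) is the root mean-square error of the best least-squares approximation of h by the first K series terms, which can be computed directly. A sketch for a power-series basis and target h(q) = e^q (both choices are assumptions made only for this illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
q = rng.uniform(-1.0, 1.0, 2000)
h = np.exp(q)                                   # target function h(q) = e^q

def l2_error(K):
    """Empirical analogue of the L2 part of delta(h, K, 0, 2): root mean squared
    error of the best least-squares fit of h by the first K powers of q."""
    p = np.vander(q, K, increasing=True)
    coef, *_ = np.linalg.lstsq(p, h, rcond=None)
    return float(np.sqrt(np.mean((h - p @ coef) ** 2)))

errs = [l2_error(K) for K in range(1, 7)]       # nonincreasing in K (nested models)
```

The error is monotone nonincreasing in K, since the models are nested, and for a smooth target like e^q it shrinks rapidly, which is why δ(h,K,0,v) → 0 as K grows.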
Lemma 9.9: If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied, then

Σ_jΣ_i[ĥ_j(q_i) − h_j(q_i)]²/n = O_p(JK̄ν_y(J)²/n + Σ_j{Σ_{K≤K̃≤K̄}δ(h_j,K̃,0,v)^v}^{2/v}).

Proof: Let π_jK̃ be such that {E[|h_j(q_i) − π_jK̃′p^K̃(q_i)|^v]}^{1/v} ≤ δ(h_j,K̃,0,v) for K ≤ K̃ ≤ K̄, and let 1̂ be the indicator function for the event K ≤ K̂ ≤ K̄, which occurs w.p.a.1 by Assumption 3.5. Then by the Markov inequality,

E[1̂‖h−pπ_K̂‖²/n] ≤ E[max_{K≤K̃≤K̄}‖h−pπ_K̃‖²/n] ≤ Σ_j{Σ_{K≤K̃≤K̄}δ(h_j,K̃,0,v)^v}^{2/v},

so that 1̂‖h−pπ_K̂‖²/n = O_p(Σ_j{Σ_{K≤K̃≤K̄}δ(h_j,K̃,0,v)^v}^{2/v}). Also, by Lemma 9.8,

(y−h)′p(p′p)⁻p′(y−h)/n = O_p(((JK̄)^{1/2}ν_y(J)n^{−1/2})²).

The conclusion then follows by Lemma 9.4.
Lemma 9.10: If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied, then for the distribution F(q) of q,

{Σ_j∫[ĥ_j(q) − h_j(q)]²dF(q)}^{1/2} = O_p((JK̄)^{1/2}ν_y(J)n^{−1/2} + {Σ_j{Σ_{K≤K̃≤K̄}δ(h_j,K̃,0,v)^v}^{2/v}}^{1/2}).

Proof: Let π̃_K̃ be as defined in the proof of the previous Lemma, with P^K̄(q) replacing p^K̃(q). Then by the same argument as there, w.p.a.1,

∫1̂‖h(q) − P^K̄(q)′π̃_K̂‖²dF(q) = O_p(Σ_j{Σ_{K≤K̃≤K̄}δ(h_j,K̃,0,v)^v}^{2/v}).

Next, apply Lemma 9.5 with P = [P^K̄(q_1),…,P^K̄(q_n)]′, p = [p^K̂(q_1),…,p^K̂(q_n)]′, Σ = ∫P^K̄(q)P^K̄(q)′dF(q), S the selection matrix such that p^K̂(q) = S′P^K̄(q), π̂ = (p′p)⁻p′y, G = h, and ε_n = (JK̄)^{1/2}ν_y(J)n^{−1/2}. From the conclusion of Lemma 9.5 and the argument in the previous Lemma, it follows that

∫‖p^K̂(q)′[π̂−π̃_K̂]‖²dF(q) = tr{(π̂−π̃_K̂)′[∫P^K̄(q)P^K̄(q)′dF(q)]... wait, = tr[(π̂−π̃_K̂)′S′ΣS(π̂−π̃_K̂)] = O_p(ε_n²) + O_p(1)‖h−pπ̃_K̂‖²/n = O_p(ε_n² + Σ_j{Σ_{K≤K̃≤K̄}δ(h_j,K̃,0,v)^v}^{2/v}).

Then by the first equation of this proof, w.p.a.1,

∫‖ĥ(q)−h(q)‖²dF(q) ≤ C{∫‖p^K̂(q)′[π̂−π̃_K̂]‖²dF(q) + ∫1̂‖h(q) − p^K̂(q)′π̃_K̂‖²dF(q)},

so that the conclusion follows.
Lemma 9.11: If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied, and the h_j(q) are differentiable of order |λ|, then

sup_{q∈Q}‖∂^λĥ(q) − ∂^λh(q)‖ = O_p(K̄^{1/2}ζ_{|λ|}(K̄)[(K̄/n)^{1/2} + {Σ_jδ(h_j,K,|λ|,∞)²}^{1/2}]).

Proof: Let π_jK̃ be such that, for each j and K ≤ K̃ ≤ K̄, sup_{q∈Q}|∂^λ̃Δ_jK̃(q)| ≤ δ(h_j,K̃,|λ|,∞) for each |λ̃| ≤ |λ|, where Δ_jK̃(q) = h_j(q) − P^K̃(q)′π_jK̃. Also, let π_K̃ be the K̃ × J matrix with j-th column π_jK̃, Δ the n × J matrix with ij-th element Δ_jK̂(q_i), and P = [P^K̂(q_1),…,P^K̂(q_n)]′. Note that by Assumption 3.7, max_{|λ̃|≤|λ|}sup_{q∈Q}|∂^λ̃p_kK̄(q)| ≤ ζ_{|λ|}(K̄), and δ(h_j,K̄,|λ|,∞) ≤ δ(h_j,K,|λ|,∞) w.p.a.1. Thus, w.p.a.1,

(9.9)  ‖h − Pπ_K̂‖²/n = ‖Δ‖²/n = Σ_jΣ_iΔ_jK̂(q_i)²/n ≤ Σ_jδ(h_j,K,|λ|,∞)².

Next, let Y = y and G = h. Note that the columns of P are a nonsingular linear transformation of the columns of p, so that P(P′P)⁻P′ = p(p′p)⁻p′. Thus, by Lemma 9.8, tr[(Y−G)′P(P′P)⁻P′(Y−G)]/n = O_p(K̄/n), so that by Lemma 9.5,

(9.10)  ‖π̂−π_K̂‖² = O_p(K̄/n) + O_p(1)Σ_jδ(h_j,K,|λ|,∞)² = O_p([(K̄/n)^{1/2} + {Σ_jδ(h_j,K,|λ|,∞)²}^{1/2}]²).

Noting that ∂^λĥ(q) = π̂′∂^λP^K̂(q), it then follows by the Cauchy-Schwartz inequality that for any q ∈ Q,

‖∂^λĥ(q) − ∂^λh(q)‖² ≤ C{‖(π̂−π_K̂)′∂^λP^K̂(q)‖² + ‖π_K̂′∂^λP^K̂(q) − ∂^λh(q)‖²}
 ≤ ‖π̂−π_K̂‖²‖∂^λP^K̂(q)‖² + Σ_j[∂^λΔ_jK̂(q)]² ≤ K̄ζ_{|λ|}(K̄)²‖π̂−π_K̂‖² + Σ_jδ(h_j,K,|λ|,∞)².

Since the term following the last inequality does not depend on q, the conclusion then follows from eq. (9.10).
References

Abramowitz, M. and Stegun, I.A., eds. (1972). Handbook of Mathematical Functions. Washington, D.C.: Commerce Department.

Andrews, D.W.K. (1991). Asymptotic normality of series estimators for various nonparametric and semiparametric models. Econometrica. 59 307-345.

Andrews, D.W.K. and Whang, Y.J. (1990). Additive interactive regression models: Circumvention of the curse of dimensionality. Econometric Theory. 6 466-479.

Bickel, P., Klaassen, C.A.J., Ritov, Y., and Wellner, J.A. (1990). Efficient and Adaptive Inference in Semiparametric Models. Monograph, forthcoming.

Breiman, L. and Friedman, J.H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association. 80 580-598.

Breiman, L. and Stone, C.J. (1978). Nonlinear additive regression. Note.

Buja, A., Hastie, T., and Tibshirani, R. (1989). Linear smoothers and additive models. Annals of Statistics. 17 453-510.

Burman, P. and Chen, K.W. (1989). Nonparametric estimation of a regression function. Annals of Statistics. 17 1567-1596.

Chamberlain, G. (1986). Notes on semiparametric regression. Preprint, Department of Economics, Harvard University.

Davydov, Y.A. (1968). Convergence of distributions generated by stationary stochastic processes. Theory of Probability and Its Applications. 13 691-696.

Engle, R.F., Granger, C.W.J., Rice, J., and Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association. 81 310-320.

Friedman, J. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association. 76 817-823.

Gallant, A.R. (1980). Explicit estimators of parametric functions in nonlinear regression. Journal of the American Statistical Association. 75 182-193.

Gallant, A.R. (1981). On the bias in flexible functional forms and an essentially unbiased form: The Fourier flexible form. Journal of Econometrics. 15 211-245.

Hansen, L.P. (1985). A method for calculating bounds on the asymptotic covariance matrices of generalized method of moments estimators. Journal of Econometrics. 30 203-238.

Hardle, W. and Stoker, T. (1989). Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association. 84 986-995.

Heckman, N.E. (1986). Spline smoothing in a partly linear model. Journal of the Royal Statistical Society, Series B. 48 244-248.

Lorentz, G.G. (1986). Approximation of Functions. New York: Chelsea Publishing Company.

Newey, W.K. (1988a). Adaptive estimation of regression models via moment restrictions. Journal of Econometrics. 38 301-339.

Newey, W.K. (1988b). Two-step series estimation of sample selection models. Preprint, Princeton University, Department of Economics.

Newey, W.K. (1990). Series estimation of regression functionals. Preprint, MIT Department of Economics.

Newey, W.K. (1991). The asymptotic distribution of semiparametric estimators. Preprint, MIT Department of Economics.

Powell, M.J.D. (1981). Approximation Theory and Methods. Cambridge, England: Cambridge University Press.

Powell, J.L., Stock, J.H., and Stoker, T.M. (1989). Semiparametric estimation of index coefficients. Econometrica. 57 1403-1430.

Rice, J. (1986). Convergence rates for partially splined models. Statistics and Probability Letters. 4 203-208.

Robinson, P. (1988). Root-n-consistent semiparametric regression. Econometrica. 56 931-954.

Schick, A. (1986). On asymptotically efficient estimation in semiparametric models. Annals of Statistics. 14 1139-1151.

Schumaker, L.L. (1981). Spline Functions: Basic Theory. New York: Wiley.

Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Annals of Statistics. 10 1040-1053.

Stone, C.J. (1985). Additive regression and other nonparametric models. Annals of Statistics. 13 689-705.

Stone, C.J. (1990). L2 rate of convergence for interaction spline regression. Tech. Rep. No. 268, University of California, Berkeley.

Wahba, G. (1984). Cross-validated spline methods for the estimation of multivariate functions from data on functionals. In Statistics: An Appraisal, Proceedings 50th Anniversary Conference, Iowa State Statistical Laboratory (H.A. David and H.T. David, eds.), 205-235. Iowa State University Press, Ames, Iowa.

White, H. (1980). Using least squares to approximate unknown regression functions. International Economic Review. 21 149-170.

White, H. (1984). Asymptotic Theory for Econometricians. Orlando: Academic Press.

White, H. and Domowitz, I. (1984). Nonlinear regression with dependent observations. Econometrica. 52 143-161.

Zeldin, M.D. and Thomas, D.M. (1975). Ozone trends in the Eastern Los Angeles basin corrected for meteorological variations. Proceedings, International Conference on Environmental Sensing and Assessment 2, held September 14-19, 1975, in Las Vegas, Nevada.