The Asymptotic Variance of Semiparametric Estimators

Whitney K. Newey

No. 583
Rev. July 1991

Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139
MIT Working Paper 583

The Asymptotic Variance of Semiparametric Estimators

by

Whitney K. Newey
MIT Department of Economics

August 1989
Revised July 1991
Helpful comments were provided by referees and M. Arellano, L. Hansen, J. Hausman, R. Klein, P.C.B. Phillips, J. Powell, J. Robins, and T. Stoker. Financial support was provided by the NSF, the Sloan Foundation, and BellCore.
Abstract

Knowledge of the asymptotic variance of an estimator is important for large sample inference, efficiency, and as a guide to the specification of regularity conditions. The purpose of this paper is the presentation of a general formula for the asymptotic variance of a semiparametric estimator. A particularly important feature of this formula is a way of accounting for the presence of nonparametric estimates of nuisance functions. The general form of an adjustment factor for nonparametric estimates is derived and analyzed. The usefulness of the formula is illustrated by deriving propositions on asymptotic equivalence for different nonparametric estimators of the same function, conditions for estimation of the nuisance functions to have no effect on the asymptotic variance, and the form of a correction term for the presence of a linear function of a conditional expectation estimator or other projection estimator (e.g. partially linear and/or additive nonparametric projections), and for a function of a density. Specific results cover a semiparametric random effects model for binary panel data, nonparametric consumer surplus, nonparametric prediction, and average derivatives. Regularity conditions are given for many of the propositions. These include primitive conditions for $\sqrt{n}$-consistency, asymptotic normality, and consistency of an asymptotic variance estimator with series estimators of conditional expectations (or projections), in each of the examples.

Keywords: semiparametric estimation, asymptotic variance, nonparametric regression, series estimation, panel data, consumer surplus, average derivative.
1. Introduction
This paper develops a general form for the asymptotic variance of semiparametric estimators. Despite the complicated nature of such estimators, which can depend on estimators of functions, the formula is straightforward to derive in many cases, requiring only some calculus. Although the formula is not based on primitive conditions, it should be useful for semiparametric models, just as analogous formulae are for parametric models, such as Huber's (1967) for m-estimators. It gives the form of remainder terms, which facilitates specification of primitive conditions. It also can be used to make asymptotic efficiency comparisons, in order to find an efficient estimator in some class.
The usefulness of this formula is illustrated in several ways. New examples are considered throughout, in order to emphasize that it can be a useful tool for further work in semiparametric estimation, and not just a way of "unifying" existing results. A number of propositions are derived, and primitive conditions are given for many of them. The propositions include showing that the method of estimating a function (e.g. kernel or polynomial regression) does not affect the asymptotic variance of the estimator. Also, two sufficient conditions are given for the absence of an effect on the asymptotic variance from the presence of a function estimator. One is that the limit of the function estimator maximizes the same expected objective function as the population parameter, i.e. the function has been "concentrated out." The other is a certain orthogonality condition.
Several propositions are given on the form of correction terms for the presence of function estimates. One has sufficient conditions for this adjustment to take the form of the projection on the tangent set (the mean-square closure of all scores for parametric models of the nuisance functions) for a semiparametric model. More specific results are given for the case of conditional expectations, or other mean square projections, and for densities. A characterization of the correction term for estimators of linear functions of projections and densities is given, with specific formulas given for semiparametric individual effects regression for binary panel data, nonparametric consumer surplus, and Stock's (1989) nonparametric prediction estimator.

Regularity conditions for $\sqrt{n}$-consistency and asymptotic normality are formulated. The discussion is organized around a few "high-level" assumptions. Time series are covered, including weighted autocovariance estimation of the asymptotic variance, with data-based lag choice. Primitive conditions are given for power series estimators of conditional expectations and other projections, including several examples.
The formula builds on previous work, including that on Von Mises (1947) estimators, i.e. functionals of the empirical distribution, by Reeds (1976), Boos and Serfling (1980), and Fernholz (1983). The formula here allows for explicit dependence on nonparametric function estimators, such as conditional expectations or densities, which are difficult to allow for in the Gateaux derivative formula for Von Mises estimators. It is based on calculating the semiparametric efficiency bound, as in Koshevnik and Levit (1976), Pfanzagl and Wefelmeyer (1982), and Van der Vaart (1991), for the functional that the estimator is a nonparametric estimator of, as discussed in the next section. Also, some of the examples build on previous work on semiparametric estimation, including Bickel, Klaassen, Ritov, and Wellner (1990), Hardle and Stoker (1989), Klein and Spady (1987), Powell, Stock, and Stoker (1989), Robinson (1988), Stock (1989), and others cited below.
Section 2 gives the formula for the asymptotic variance. Sections 3 and 4 apply this formula to derive some propositions on the effect of preliminary nonparametric estimators on the asymptotic variance. Some high-level regularity conditions are collected in Section 5. Section 6 gives general regularity conditions for $\sqrt{n}$-consistency and asymptotic normality when an estimator depends on a series estimator of a conditional expectation or other projection, and Section 7 applies these results to specify primitive regularity conditions for several examples.
2. The Pathwise Derivative Formula for the Asymptotic Variance
The formula is based on the observation that $\sqrt{n}$-consistent nonparametric estimators are often efficient. For example, the sample mean is known to be an efficient estimator of the population mean in a nonparametric model where no restrictions, other than regularity conditions (e.g. existence of the second moment), are placed on the distribution of the data. The idea here is to use this observation to calculate the asymptotic variance of a semiparametric estimator, i.e. by finding the functional that it nonparametrically estimates, the object that it converges to under general misspecification, and calculating the semiparametric variance bound for this functional.
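The notion of "the object the estimator converges to under general misspecification" can be made concrete with a small simulation. This is a hypothetical illustration of ours, not an example from the paper: the OLS slope is viewed as a nonparametric estimator of the best-linear-predictor functional $\mu(F) = \operatorname{Cov}(x,y)/\operatorname{Var}(x)$, whatever the true regression function is.

```python
import numpy as np

# Hypothetical illustration: the OLS slope as a nonparametric estimator of
# the functional mu(F) = Cov(x, y)/Var(x), its probability limit under
# general misspecification of the linear model.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = np.exp(x / 2) + rng.normal(size=n)      # nonlinear truth: E[y|x] = exp(x/2)

beta_hat = np.cov(x, y)[0, 1] / np.var(x)   # OLS slope (E[x] = 0 here)

# Population value of mu(F): for x ~ N(0,1), Stein's lemma gives
# Cov(x, exp(x/2)) = E[(1/2)exp(x/2)] = (1/2)exp(1/8), and Var(x) = 1.
mu_F = 0.5 * np.exp(1 / 8)
print(beta_hat, mu_F)
```

Even though the linear model is misspecified, the estimator has a well-defined plim, and that plim is the functional whose variance bound the paper proposes to compute.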
To be more precise, let $\hat\beta$ be an estimator, and suppose that one can associate with it the triple

(2.1)  $(z, \mathcal{F}, \mu)$: $z$ a finite dimensional data vector; $\mathcal{F}$ an unrestricted family of distributions of $z$; $\mu(F_z) = \operatorname{plim}(\hat\beta)$ when $F_z$ is true.

That is, $\hat\beta$ is a nonparametric estimator of $\mu(F_z)$, having this as its probability limit for all distributions of $z$ belonging to a family $\mathcal{F}$ that is unrestricted, except for regularity conditions. In other words, $\mu(F_z)$ is the object estimated by $\hat\beta$ under general misspecification, when the distribution of $z$ does not necessarily satisfy restrictions on which $\hat\beta$ is based. The asymptotic variance formula discussed here is taken to be the variance bound for estimation of $\mu(F_z)$, $F_z \in \mathcal{F}$. This formula is an alternative to the Gateaux derivative for Von Mises estimators, because the domain of $\mu(F_z)$ need not include all distributions, e.g. so that $\mu(F_z)$ can depend explicitly on a density function. In the technical conditions to follow, this feature of the formula results from $F_z$ having a density with respect to a measure for which the true distribution also has a density.
The formula for calculating the variance bound for $\mu(F_z)$, $F_z \in \mathcal{F}$, is that given in previous work by Koshevnik and Levit (1976), Pfanzagl and Wefelmeyer (1982), and others. Following Van der Vaart (1991), let $\{F_{z\theta} : \theta \in (0,\varepsilon)\}$, $\varepsilon > 0$, denote a one-dimensional subfamily of $\mathcal{F}$, i.e. a path in $\mathcal{F}$, such that the true distribution and each member of this subfamily are absolutely continuous with respect to the same $\sigma$-finite measure. Let $E[\cdot]$ be the expectation under the true distribution $F_{z0}$, let $dF_{z\theta}$ and $dF_{z0}$ be the densities with respect to the common dominating measure, and let $\int \cdot \, dz$ denote integration with respect to that measure. Let $\mathcal{T}$ denote a set of paths such that for each one there is a random variable $S_\theta(z)$ with $E[S_\theta(z)^2] < \infty$ and

$\int [\theta^{-1}(dF_{z\theta}^{1/2} - dF_{z0}^{1/2}) - \tfrac{1}{2} S_\theta(z)\, dF_{z0}^{1/2}]^2 dz \to 0$, as $\theta \to 0$.

Here $S_\theta(z)$ is a "mean-square version" of the score $\partial \ln(dF_{z\theta})/\partial\theta|_{\theta=0}$ associated with the path, which quantifies a direction of departure from the truth allowed by $\mathcal{F}$. The requirement that $\mathcal{F}$ be unrestricted is formalized in the condition that there is a set of paths $\mathcal{T}$, with associated set of scores $\mathcal{S}$, satisfying the following property:
Assumption 2.1: $\mathcal{S}$ is linear, and for any $s(z)$ with $E[s(z)] = 0$ and $E[s(z)^2] < \infty$, and any $\varepsilon > 0$, there is $S_\theta(z) \in \mathcal{S}$ such that $E[\{s(z) - S_\theta(z)\}^2] < \varepsilon$.

That is, the mean-square closure of the set of scores is all mean-zero random variables, i.e. $\mathcal{F}$ allows for any direction of departure from the truth.
The functional $\mu(F_z)$ is pathwise differentiable if there is a mapping $\dot\mu_F(S_\theta) : \mathcal{S} \to \mathbb{R}^q$ that is linear and mean-square continuous with respect to the true distribution (i.e. for every $\varepsilon > 0$ there exists $\delta > 0$ such that $\|\dot\mu_F(S_\theta)\| < \varepsilon$ if $E[S_\theta(z)^2] < \delta$), such that for each path,

(2.2)  $\lim_{\theta \to 0} \theta^{-1}[\mu(F_{z\theta}) - \mu(F_{z0})] = \dot\mu_F(S_\theta)$,

i.e. the derivative from the right of $\mu(F_{z\theta})$ at the truth ($\theta = 0$) is $\dot\mu_F(S_\theta)$.
The linearity and mean-square continuity of $\dot\mu_F(S_\theta)$, Assumption 2.1, and the Riesz representation theorem imply the existence of a unique (up to the usual a.s. equivalence) random vector $d(z)$, the pathwise derivative, such that

(2.3)  $E[d(z)] = 0$, $E[\|d(z)\|^2] < \infty$, and $\dot\mu_F(S_\theta) = E[d(z)S_\theta(z)]$.
Under Assumption 2.1 and with i.i.d. data, the asymptotic variance bound for estimators of the functional $\mu(F_z)$ is $E[d(z)d(z)']$. Hence, the formula for the asymptotic variance suggested here is the variance of the pathwise derivative of the functional that $\hat\beta$ is a nonparametric estimator of.
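As a worked example of (2.3), consider the mean functional. The following derivation is our sketch of a standard calculation in this literature, not text from the paper:

```latex
% The mean functional mu(F_z) = \int z\,dF_z.  Differentiating along a path
% and using E[S_\theta(z)] = 0,
\begin{align*}
\partial\mu(F_{z\theta})/\partial\theta\big|_{\theta=0}
  &= \int z\,S_\theta(z)\,dF_{z0} = E[z\,S_\theta(z)]
   = E[(z-\mu_0)\,S_\theta(z)], \qquad \mu_0 = E[z],
\end{align*}
% so d(z) = z - \mu_0, and the bound E[d(z)d(z)'] = Var(z) is attained by
% the sample mean, consistent with the observation opening this section.
```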
A stronger justification for regarding the pathwise derivative of $\mu(F_z)$ as a correct formula for the asymptotic variance of $\hat\beta$ is available when $\hat\beta$ is asymptotically equivalent to a sample average. Define $\hat\beta$ to be asymptotically linear with influence function $\psi(z)$ if at the truth,

(2.4)  $\sqrt{n}(\hat\beta - \beta_0) = \sum_{i=1}^n \psi(z_i)/\sqrt{n} + o_p(1)$,  $E[\psi(z)] = 0$,  $\operatorname{Var}(\psi(z))$ finite.

This condition is satisfied by many semiparametric estimators, under sufficient regularity conditions. For i.i.d. data, asymptotic linearity and the central limit theorem imply $\hat\beta$ is asymptotically normal with variance $\operatorname{Var}(\psi(z))$. Define $\hat\beta$ to be a regular estimator of $\mu(F_z)$ if for any path in $\mathcal{T}$ and $\theta_n = O(1/\sqrt{n})$, when $z_i$ has distribution $F_{z\theta_n}$, $\sqrt{n}(\hat\beta - \mu(F_{z\theta_n}))$ has a limiting distribution that does not depend on $\{\theta_n\}$ or on the path. Regularity is the precise condition that specifies that $\hat\beta$ is a nonparametric estimator of $\mu(F_z)$, because it requires that $\hat\beta$ is asymptotically, locally consistent for $\mu(F_{z\theta})$.

Theorem 2.1: Suppose that $z_1, z_2, \ldots$ are i.i.d., $\hat\beta$ is asymptotically linear and regular for $\mu(F_z)$, $\mu(F_z)$ is pathwise differentiable, and Assumption 2.1 is satisfied. Then $\psi(z) = d(z)$.
The thing that seems to be novel here is the idea of applying this result to the functional $\mu(F_z)$ that is nonparametrically estimated by $\hat\beta$. The fact that asymptotic linearity and regularity imply pathwise differentiability follows by Van der Vaart (1991, Theorem 2.1), and the fact that Assumption 2.1 implies that there is only one influence function and that it equals the pathwise derivative is a small additional step that has been discussed in Newey (1990a).
This result can also be used to detect whether an estimator is $\sqrt{n}$-consistent. As shown by Van der Vaart (1991), if equation (2.2) is satisfied but $\dot\mu_F(S_\theta)$ is not mean-square continuous (i.e. $d(z)$ satisfying equation (2.3) does not exist), then no $\sqrt{n}$-consistent, regular estimator exists. For example, the value of a density function at a point does not have a mean-square continuous derivative, and neither does the functional that is nonparametrically estimated by Manski's (1975) maximum score estimator. The pathwise derivative does not help in finding the asymptotic distribution (at a slower than $\sqrt{n}$ rate) of such estimators, which can be quite complicated: e.g. see Kim and Pollard (1989).
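A heuristic version of the density example may be helpful. This is our gloss; the formal argument is in Van der Vaart (1991):

```latex
% For mu(F_z) = f_0(z_0), the density at a fixed point z_0, a path with
% density f_\theta = f_0(1 + \theta S) has
%   \partial\mu(F_{z\theta})/\partial\theta|_{\theta=0} = f_0(z_0)\,S(z_0).
% Taking scores concentrated near z_0, e.g.
%   S(z) = [1(|z - z_0| < \epsilon) - P_\epsilon]/\sqrt{P_\epsilon},
%   P_\epsilon = \Pr(|z - z_0| < \epsilon),
% keeps E[S(z)^2] = 1 - P_\epsilon bounded while S(z_0), and hence the
% derivative, grows without bound as \epsilon \to 0.  No square-integrable
% d(z) can satisfy E[d(z)S(z)] = f_0(z_0)S(z_0) for all such scores, so
% the derivative is not mean-square continuous.
```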
The hypotheses of Theorem 2.1 are not primitive, but the point of Theorem 2.1 is to formalize the statement that "under sufficient regularity conditions" the influence function of a semiparametric estimator is the pathwise derivative of the functional that is nonparametrically estimated by $\hat\beta$. In Sections 3 and 4, this result and some pathwise derivative calculations are used to derive propositions about semiparametric estimators. These results are labeled as "propositions" because primitive conditions for their validity are not given in Sections 3 and 4. They might also be labeled as "conjectures," although this word does not convey the same sense that the validity of the results only requires regularity conditions. In Sections 3 and 4, the solution to equation (2.3) is calculated using the chain rule of calculus, differentiation under integrals, and $\partial \int a(z) dF_{z\theta}/\partial\theta|_{\theta=0} = E[a(z)S_\theta(z)]$ for $a(z)$ with finite mean square wherever needed, and then in Sections 5-7 conditions for implied remainder terms to be small are given. This approach, with formal calculation followed by regularity conditions, is similar to that used in parametric asymptotic theory (e.g. for Edgeworth expansions), and is meant to illustrate the usefulness of the pathwise derivative calculation.
3. Semiparametric M-Estimators
The rest of the paper will focus on a class of semiparametric m-estimators, obtained from moment conditions that can depend on estimated functions. Let $m(z,\beta,h)$ be a vector of functions with the same dimension as $\beta$, depending on a data observation $z$ and a vector of unknown functions $h$. Let $\hat h(\beta)$ denote an estimator of $h$, with corresponding limit $h(\beta)$. A semiparametric m-estimator $\hat\beta$ is one which solves an asymptotic moment equation

(3.1)  $\sum_{i=1}^n m(z_i, \hat\beta, \hat h(\hat\beta))/n = 0$.

The general idea here is that $\hat\beta$ is obtained by a procedure that "plugs in" an estimated function $\hat h(\beta)$ that can depend on $\beta$.
An early and important example is the Buckley and James (1979) estimator for censored regression. Other examples are Robinson's (1988) semiparametric regression estimator and Powell, Stock, and Stoker's (1989) weighted average derivative estimator. For a new example, consider a semi-linear model with additive nonparametric component, $E[y|x,v] = x'\beta_0 + g_1(v_1) + g_2(v_2)$, where $v = (v_1, v_2)$. The motivation for this model is that if $v$ is high dimensional, the asymptotic properties of $\hat\beta$ could be adversely affected if additivity of $g_1(v_1) + g_2(v_2)$ is true but not imposed: see Section 4 for further discussion.
Assume that the set of additive functions $g_1(v_1) + g_2(v_2)$ with finite mean-square is closed in mean square, and let $\Pi(\cdot|v_1,v_2)$ denote the mean-square (Hilbert space) projection on this set. Also, let $\hat\Pi(\cdot|v_1,v_2)$ denote an estimator of this projection, such as the series estimator considered in Stone (1985) and in Section 6, or the alternating conditional expectation estimator in Breiman and Friedman (1985). Consider

(3.2)  $\hat\beta = \operatorname{argmin}_\beta \sum_{i=1}^n [y_i - x_i'\beta - \hat h(v_i,\beta)]^2/2$,  $\hat h(v,\beta) = \hat\Pi(y|v_1,v_2) - \hat\Pi(x|v_1,v_2)'\beta$.

This is a semiparametric m-estimator with $m(z,\beta,h(\beta)) = [x + \partial h(v,\beta)/\partial\beta][y - x'\beta - h(v,\beta)]$.
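A minimal numerical sketch of the estimator in (3.2) may help fix ideas. The data-generating process, polynomial basis, and sample size below are illustrative choices of ours (the paper specifies no implementation): the additive projections are estimated by least squares on an additive polynomial basis, and $\hat\beta$ is then the residual-on-residual OLS coefficient.

```python
import numpy as np

# Sketch of the semi-linear additive estimator (3.2), with a polynomial
# series estimator of the additive projection.  DGP, basis order, and n
# are illustrative assumptions, not choices from the paper.
rng = np.random.default_rng(1)
n, beta0 = 50_000, 2.0
v1, v2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
x = v1 - v2 + rng.normal(size=n)                # regressor correlated with v
y = x * beta0 + np.sin(np.pi * v1) + v2**2 + rng.normal(size=n)

# Additive series basis in (v1, v2): powers of each component separately.
B = np.column_stack([np.ones(n)] + [v1**k for k in (1, 2, 3, 4)]
                                 + [v2**k for k in (1, 2, 3, 4)])

def proj(w):
    """Series estimate of the additive projection Pi(w | v1, v2)."""
    coef, *_ = np.linalg.lstsq(B, w, rcond=None)
    return B @ coef

ex, ey = x - proj(x), y - proj(y)    # residuals from estimated projections
beta_hat = ex @ ey / (ex @ ex)       # minimizes the objective in (3.2)
print(beta_hat)
```

Replacing the polynomial basis by a kernel or spline estimator of the same additive projection should leave the asymptotic variance unchanged, as Proposition 1 below asserts.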
It is possible, at the level of generality of equation (3.1), to derive a number of propositions. To use the pathwise derivative formula in this derivation, it is necessary to identify the functional that is nonparametrically estimated by $\hat\beta$, i.e. the limit $\mu(F)$ of $\hat\beta$ for a general distribution $F = F_z$, where the $z$ subscript is suppressed henceforth for notational convenience. Let $h(\beta,F)$ denote the limit of $\hat h(\beta)$ for a general $F$, so that $h(\beta)$ is $h(\beta,F)$ at the true distribution. By the usual method of moments reasoning, equation (3.1) sets sample moments to zero, and the sample moments have a limit (by the law of large numbers), so that $\hat\beta$ is consistent for the value of $\beta$ that sets the population moments to zero. That is, $\mu(F)$ should be the solution to

(3.3)  $E_F[m(z, \mu, h(\mu,F))] = 0$.

Before computing the pathwise derivative, it is interesting to note that $\mu(F)$ depends only on the limit $h(\beta,F)$, and not on the particular form of $\hat h(\beta)$. Thus, different nonparametric estimators of the same functions should result in the same asymptotic variance. For example, this reasoning explains why replacing the kernel estimator of Robinson (1988) by series estimators gives an asymptotically equivalent estimator, as shown by Newey (1990b), and suggests that for estimation of the additive model above, the distribution is invariant to the estimator of the projection. Also, two estimators may not be asymptotically equivalent if the nuisance functions estimate different objects nonparametrically.
Proposition 1: The asymptotic variance of semiparametric estimators depends only on the function that is nonparametrically estimated, and not on the type of estimator (such as kernel or series nonparametric regression).
To obtain more results, it is useful to be more specific about the form of the pathwise derivative. Suppose that $h$ has $J$ components, $h = (h_1, \ldots, h_J)$. For a path $F_{z\theta}$, let $E_\theta[\cdot]$ denote the expectation with respect to $F_{z\theta}$, let $h_j(\beta,\theta) = h_j(\beta, F_{z\theta})$ and $h_j(\theta) = h_j(\beta_0, \theta)$, equal to the truth $h_j(\beta_0)$ when $\theta = 0$, and let the same expressions without the $j$ subscript denote corresponding vectors. For a path, $\mu(\theta)$ will be the functional satisfying the parametric version of equation (3.3),

(3.4)  $E_\theta[m(z, \mu, h(\mu,\theta))] = 0$.
Then for $m(z, h(\theta)) \equiv m(z, \beta_0, h(\theta))$, differentiation gives

$\partial E_\theta[m(z, h(\theta_0))]/\partial\theta|_{\theta_0} = \int m(z, h(\theta_0))[\partial dF_{z\theta}/\partial\theta] dz = E[m(z, h(\theta_0))S_\theta(z)]$.

Then, applying the chain rule to $E_\theta[m(z, h(\theta))]$, it follows that

$\partial E_\theta[m(z, h(\theta))]/\partial\theta|_{\theta_0} = E[m(z, h(\theta_0))S_\theta(z)] + \partial E[m(z, h(\theta))]/\partial\theta|_{\theta_0}$.

Assuming $D \equiv \partial E[m(z, \beta, h(\beta, \theta_0))]/\partial\beta|_{\beta_0}$ is nonsingular, the implicit function theorem gives

(3.5)  $\partial\mu(\theta)/\partial\theta|_{\theta_0} = -D^{-1}\{E[m(z, h(\theta_0))S_\theta(z)] + \partial E[m(z, h(\theta))]/\partial\theta|_{\theta_0}\}$.
The first term is already in the outer product form of equation (2.3), so that the pathwise derivative will exist if the second term can be put in a similar form. Suppose there are $\alpha_j(z)$, $j = 1, \ldots, J$, such that for each $j$,

(3.6)  $\partial E[m(z, h_1(\theta_0), \ldots, h_j(\theta), \ldots, h_J(\theta_0))]/\partial\theta|_{\theta_0} = E[\alpha_j(z)S_\theta(z)]$.

Then, applying the chain rule to each $h_j(\theta)$, it follows that

$\partial E[m(z, h(\theta))]/\partial\theta|_{\theta_0} = \sum_{j=1}^J \partial E[m(z, h_1(\theta_0), \ldots, h_j(\theta), \ldots, h_J(\theta_0))]/\partial\theta|_{\theta_0} = E[\{\sum_{j=1}^J \alpha_j(z)\}S_\theta(z)]$,

giving the outer product form. Then, moving $-D^{-1}$ inside the expectation, it follows that the pathwise derivative is

(3.7)  $d(z) = -D^{-1}\{m(z, \beta_0, h(\beta_0)) + \sum_{j=1}^J [\alpha_j(z) - E[\alpha_j(z)]]\}$,

so that by Theorem 2.1 the influence function of $\hat\beta$ equals

$\psi(z) = -D^{-1}\{m(z, \beta_0, h(\beta_0)) + \sum_{j=1}^J [\alpha_j(z) - E[\alpha_j(z)]]\}$.
This influence function has an interesting structure. The leading term $-D^{-1}m(z, \beta_0, h(\beta_0))$ is the usual Huber (1967) formula for the influence function of an m-estimator with moment functions $m(z, \beta, h(\beta))$, i.e. the formula that would be obtained if $\hat h(\beta)$ were equal to $h(\beta)$. Thus, the second term is an adjustment term for the estimation of $h(\beta)$, a nonparametric analog of adjustments that are familiar for two-step parametric estimators.
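The role of the adjustment term can be seen in a toy parametric two-step example of ours (not from the paper): take $m(z, \beta, h) = z_2(z_1 - h) - \beta$ with nuisance $h = E[z_1]$ estimated by the sample mean, so $\hat\beta$ estimates $\operatorname{Cov}(z_1, z_2)$. Equation (3.6) gives $\alpha(z) = -E[z_2](z_1 - E[z_1])$, and with $D = -1$ the influence function is the moment plus the adjustment. The simulation checks that the expansion with the adjustment tracks $\hat\beta - \beta_0$ far more closely than the unadjusted moment alone.

```python
import numpy as np

# Toy two-step example (ours): m(z, beta, h) = z2*(z1 - h) - beta with
# h = E[z1] estimated by the sample mean, so beta0 = Cov(z1, z2).
# From (3.6), alpha(z) = -E[z2]*(z1 - E[z1]); here D = -1.
rng = np.random.default_rng(2)
n, mu1 = 4000, 1.0
z1 = mu1 + rng.normal(size=n)
z2 = 3.0 + 0.5 * z1 + rng.normal(size=n)
beta0, Ez2 = 0.5, 3.0 + 0.5 * mu1            # Cov(z1, z2) and E[z2]

beta_hat = np.mean(z2 * (z1 - z1.mean()))    # two-step plug-in estimator

psi = z2 * (z1 - mu1) - beta0 - Ez2 * (z1 - mu1)   # moment plus adjustment
m_only = z2 * (z1 - mu1) - beta0                   # adjustment omitted

err_adj = abs((beta_hat - beta0) - psi.mean())
err_raw = abs((beta_hat - beta0) - m_only.mean())
print(err_adj, err_raw)    # the adjusted expansion error is far smaller
```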
The adjustment term can also be interpreted as the pathwise derivative of the functional $\int m(z, \beta_0, h(\beta_0, F)) dF(z)$. Furthermore, the adjustment contains exactly one term for each component of $h$, and the $j$th adjustment term can be interpreted as the pathwise derivative of $E[m(z, \beta_0, h_1(\beta_0), \ldots, h_j(\beta_0, F), \ldots, h_J(\beta_0))]$. This property is useful, because the adjustment terms can be calculated for each function $h_j$, holding the other functions fixed at their true values, and then the total adjustment formed as the sum. For this reason the $j$ subscript will be dropped in the rest of Sections 3 and 4, with the understanding that the results can be applied to individual $h_j$ terms, and then combined to derive the total adjustment (e.g. when some adjustment terms are zero and others are not).
It is useful to know when an adjustment term is zero. In such cases, it should not be necessary to account for the presence of $\hat h(\beta)$, i.e. it can be treated as if it were equal to $h(\beta)$, greatly simplifying the calculation of the asymptotic variance and the finding of a consistent estimator of it. One case where an adjustment term will be zero is when equation (3.1) is the first-order condition to a maximization problem, and $h(\beta)$ maximizes the population value of the same function. To be specific, suppose that there is a function $q(z, \beta, h(\beta))$ and a set of functions $\mathcal{H}(\beta)$, possibly depending on $\beta$ but not on the distribution $F$ of $z$, such that

(3.8)  $m(z, \beta, h(\beta)) = \partial q(z, \beta, h(\beta))/\partial\beta$,  $h(\beta, F) = \operatorname{argmax}_{h \in \mathcal{H}(\beta)} E_F[q(z, \beta, h)]$.
The interpretation of this condition is that $m(z, \beta, h(\beta))$ are the first order conditions for a stationary point of the function $q$, and that $h(\beta, F)$ maximizes the expected value of the same function, i.e. $h(\beta, F)$ has been "concentrated out." Then for any parametric model $F_{z\theta}$, $E[q(z, \beta, h(\beta, \theta))]$ is maximized at $\theta_0$, since $h(\beta, \theta_0) = h(\beta, F_{z0})$. The first order conditions for this maximization are $\partial E[q(z, \beta, h(\beta, \theta))]/\partial\theta|_{\theta_0} = 0$, identically in $\beta$. Differentiating again with respect to $\beta$,

(3.9)  $0 = \partial^2 E[q(z, \beta, h(\beta, \theta))]/\partial\theta\partial\beta|_{\theta_0} = \partial E[\partial q(z, \beta, h(\beta, \theta))/\partial\beta]/\partial\theta|_{\theta_0} = \partial E[m(z, \beta, h(\beta, \theta))]/\partial\theta|_{\theta_0}$.

Evaluating this equation at $\beta_0$, it follows that $\alpha(z) = 0$ will solve equation (3.6), and hence the adjustment term is zero. Summarizing:

Proposition 2: If equation (3.8) is satisfied, then the estimation of $h(\beta)$ can be ignored in calculating the asymptotic variance, i.e. the asymptotic variance is the same as if $\hat h(\beta)$ were equal to $h(\beta)$.
Examples of estimators that satisfy the hypotheses of this proposition are those of Robinson (1988), Ichimura (1987), Klein and Spady (1987), and Ai (1990). A new example is the additive semi-linear estimator of equation (3.2). Suppose that the set of additive functions is closed under any $F \in \mathcal{F}$ and is invariant to $F$, and let $\Pi_F(\cdot|v_1,v_2)$ denote the projection under $F$. Then $\hat h(\beta)$ is a nonparametric estimator of $h(v, \beta, F) = \Pi_F(y|v_1,v_2) - \Pi_F(x|v_1,v_2)'\beta = \Pi_F(y - x'\beta|v_1,v_2)$, which minimizes $E_F[(y - x'\beta - h(v,\beta))^2]$, the same objective function minimized by the limit of $\hat\beta$. Therefore, by Proposition 2, estimation of $\Pi(y|v_1,v_2)$ and $\Pi(x|v_1,v_2)$ should have no effect on the asymptotic variance of $\hat\beta$. Thus, for $\varepsilon = y - x'\beta_0 - g_1(v_1) - g_2(v_2)$, the formula for the influence function is

(3.10)  $\psi(z) = (E[\{x - \Pi(x|v_1,v_2)\}\{x - \Pi(x|v_1,v_2)\}'])^{-1}\{x - \Pi(x|v_1,v_2)\}\varepsilon$.

Primitive conditions for this result are given in Newey (1991), and somewhat weaker conditions could be formulated using the results of Section 6.
There is another, more direct condition under which estimation of the nuisance function does not affect the asymptotic variance. To formulate this condition, suppose that $m(z,h)$ depends on $h$ only through the value it takes on as a function $h(v)$ of a subvector $v$ of $z$, i.e. $h$ is a real vector argument in $m(z,h)$. The additive semi-linear example has this property if $h$ is redefined to include $\Pi(x|v_1,v_2)$. Let $h(v,\theta)$ denote the limiting value of $\hat h(v, \beta_0)$ for a path. For $M(z) = \partial m(z, \beta_0, h)/\partial h|_{h(v)}$, differentiation gives

(3.11)  $\partial E[m(z, h(\theta))]/\partial\theta|_{\theta_0} = \partial E[M(z)h(v,\theta)]/\partial\theta|_{\theta_0} = E[M(z)\partial h(v,\theta)/\partial\theta|_{\theta_0}]$.

If the term on the right-hand side is zero, then $\alpha(z) = 0$ will solve equation (3.6), and the adjustment term is zero.
One simple condition for this is that $E[M(z)|v] = 0$; more generally, it suffices that $h(v,\theta)$ is an element of a set to which $M(z)$ is orthogonal. Summarizing:

Proposition 3: If $E[M(z)|v] = 0$, or more generally $h(v,F)$ is an element of a set $\mathcal{H}$ such that $E[M(z)h(v)] = 0$ for all $h \in \mathcal{H}$, then estimation of $h$ can be ignored in calculating the asymptotic variance.

The semi-linear, additive model is also an example here.
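A toy case of ours that satisfies the orthogonality condition: $m(z, \beta, h) = (z - h)^2 - \beta$ with a constant nuisance $h = E[z]$, so that $\hat\beta$ is the sample variance with a plugged-in sample mean. Here $M(z) = -2(z - E[z])$ has $E[M(z)] = 0$, so by Proposition 3 estimating the mean has no first-order effect; algebraically the feasible and infeasible estimators differ by exactly $(\bar z - \mu)^2 = O_p(1/n)$.

```python
import numpy as np

# Toy illustration of Proposition 3 (ours): for m(z,beta,h) = (z-h)^2 - beta
# with nuisance h = E[z], M(z) = -2(z - E[z]) has mean zero, so plugging in
# the sample mean has no first-order effect on the variance estimator.
rng = np.random.default_rng(3)
mu, n = 2.0, 1000
z = mu + rng.normal(size=n)

var_feasible = np.mean((z - z.mean())**2)    # nuisance (mean) estimated
var_infeasible = np.mean((z - mu)**2)        # true mean known
gap = var_infeasible - var_feasible
print(gap, (z.mean() - mu)**2)               # equal: the gap is O_p(1/n)
```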
In cases where the correction term is nonzero, its form will depend on the limit of $\hat h(\beta)$. Therefore, it is difficult to give a general characterization of the correction term. One result that does not depend on completely specifying the form of $h$ can be obtained in semiparametric models where the data, $z_1, z_2, \ldots$, are i.i.d. and $z$ is restricted to have a density function of the form $f(z|\beta_0, g)$, where $g$ is a nonparametric (functional) component.
Let $S_\beta(z) = \partial \ln f(z|\beta, g_0)/\partial\beta|_{\beta_0}$ denote the score for $\beta$, and for a finite-dimensional parameterization $g(\eta)$ of $g$ (with $g(\eta_0) = g_0$ the truth), let $S_\eta(z) = \partial \ln f(z|\beta_0, g(\eta))/\partial\eta|_{\eta_0}$. Also, let $A$ denote a constant matrix with number of rows equal to the number of elements of $\beta$. The tangent set $\mathcal{J}$ is defined as the mean-square closure of the set of all linear combinations $A S_\eta(z)$. The tangent set is useful in calculating the asymptotic variance bound for estimators of $\beta$ in the semiparametric model $f(z|\beta, g)$. The form of this bound is $V = (E[S(z)S(z)'])^{-1}$, where $S(z) = S_\beta(z) - \Pi(S_\beta(z)|\mathcal{J})$ and $\Pi(\cdot|\mathcal{J})$ denotes the mean-square projection on the tangent set. See, for example, Newey (1990a) for further discussion.
See,
m(z) = m(z,p
p
denotes the mean-square projection on the
Tl{'\J)
Under certain conditions,
for
p.
is defined as the mean-square closure of the set of all
linear combinations
f(z|p,g).
is
,
hO
Let
discussed in Section 2
)),
6
(6
a(z) = -IT(m(z)|3')
will solve equation (3.5),
so that the correction term can be calculated from
be the parameter of an unrestricted path, as
does not have anything to do with
14
/3
or
t)).
Suppose that there is
such that,
g(0)
(3.12)
Jm(z,h(e))f(z|pQ.g(e))dz =
In words,
for the limit of
hO
)
0.
under a general distribution there is a
corresponding value of the nonparametric component of the semiparametric model
where the population moment conditions (corresponding to equation (3.1)) are
satisfied.
Let $S_{g\theta}(z) = \partial \ln f(z|\beta_0, g(\theta))/\partial\theta$, and note that $S_{g\theta}(z)$ is an element of the tangent set. Then differentiating equation (3.12) with respect to $\theta$,

$\partial E[m(z, h(\theta))]/\partial\theta|_{\theta_0} = -E[m(z)S_{g\theta}(z)] = -E[\Pi(m(z)|\mathcal{J})S_{g\theta}(z)]$.

Suppose that $S_{g\theta}(z) = \Pi(S_\theta(z)|\mathcal{J})$. Then, under the previous conditions,

$\partial E[m(z, h(\theta))]/\partial\theta|_{\theta_0} = -E[\Pi(m(z)|\mathcal{J})\Pi(S_\theta(z)|\mathcal{J})] = E[-\Pi(m(z)|\mathcal{J})S_\theta(z)]$.

Thus, $\alpha(z) = -\Pi(m(z)|\mathcal{J})$ satisfies equation (3.6). Summarizing:

Proposition 4: If for all unrestricted paths there exists $g(\theta)$ such that equation (3.12) is satisfied and $\partial \ln f(z|\beta_0, g(\theta))/\partial\theta = \Pi(S_\theta(z)|\mathcal{J})$, then $\alpha(z) = -\Pi(m(z, \beta_0, h(\beta_0))|\mathcal{J})$, and the influence function of $\hat\beta$ is

$\psi(z) = -D^{-1}[m(z, \beta_0, h(\beta_0)) - \Pi(m(z, \beta_0, h(\beta_0))|\mathcal{J})]$.
This form of the correction term has previously been derived by Bickel, Klaassen, Ritov, and Wellner (1990) and Newey (1990a). The contribution of Proposition 4 is to give a general formulation for this result in terms of the pathwise derivative calculation developed in Section 2.
This result leads to a sufficient condition for asymptotic efficiency of a semiparametric m-estimator: that the hypotheses of Proposition 4 are satisfied and $m(z) = S_\beta(z)$. In this case, $m(z) + \alpha(z) - E[\alpha] = S_\beta(z) - \Pi(S_\beta(z)|\mathcal{J}) = S(z)$. Furthermore, any semiparametric m-estimator that is regular under the semiparametric model $f(z|\beta, g)$ and has influence function $-D^{-1}(m(z) + \alpha(z) - E[\alpha])$ will satisfy

(3.13)  $D = -E[(m(z) + \alpha(z) - E[\alpha])S(z)']$,

as discussed in Newey (1990a). Thus, the influence function of $\hat\beta$ is $(E[S(z)S(z)'])^{-1}S(z) = VS(z)$, with corresponding asymptotic variance $VE[S(z)S(z)']V = V$, which equals the lower bound.
4. Functions of Mean-Square Projections and Densities

In this section, the form of the correction term is derived when the nuisance functions are linear functions of conditional expectations or other mean-square projections, such as additive or partially linear regressions, and for densities.
Let $y$ be a random variable with finite second moment and $x$ an $r \times 1$ vector. Let $\mathcal{G}$ denote a linear set of functions of $x$ that is closed in mean-square, and let $g(x) = \operatorname{argmin}_{\tilde g \in \mathcal{G}} E[(y - \tilde g(x))^2]$ denote the least squares (Hilbert-space) projection of $y$ on $\mathcal{G}$. The $h(v)$ considered in this section will be a linear function of $g$, $h(v) = A(g, v)$, where $v$ is a subvector of $z$. The simplest nonparametric example of a projection is $g(x) = E[y|x]$, where $\mathcal{G}$ is all measurable functions of $x$ with finite mean-square. A more general example is a projection on additive functions, with each component a function of a subvector of $x$. This is a smaller set of functions, whose consideration is motivated partly by the difficulty of estimating conditional expectations for $x$ with many dimensions; e.g. see Stone (1985) for discussion and references. Here, where $g(x)$ is a nuisance function, important reasons to avoid high dimensional nonparametric regressions are that a projection on a larger set of functions than that to which $g(x)$ belongs will lead to higher asymptotic variances for $\hat\beta$ in some cases, as noted in Newey (1991), and will lower the rate at which remainder terms converge to zero, affecting accuracy of the asymptotic normal approximation.
The correction term is derived first for the simplest case, where $h(v) = g(x)$. Let $g(x, \theta) = \operatorname{argmin}_{\tilde g \in \mathcal{G}} E_\theta[(y - \tilde g(x))^2]$ denote the projection of $y$ on $\mathcal{G}$ for a path, and let $\delta(x) = \Pi(M(z)|\mathcal{G})$ denote the vector of projections of the elements of $M(z)$ on $\mathcal{G}$. Since $\delta(x) \in \mathcal{G}$, it follows that $E[M(z)g(x,\theta)] = E[\delta(x)g(x,\theta)]$, and $E_\theta[\delta(x)g(x,\theta)] = E_\theta[\delta(x)y]$ identically in $\theta$, so by the chain rule,

(4.2)  $E[M(z)\partial g(x,\theta)/\partial\theta]|_{\theta_0} = \partial E[\delta(x)g(x,\theta)]/\partial\theta|_{\theta_0} = \{\partial E_\theta[\delta(x)g(x,\theta)]/\partial\theta - \partial E_\theta[\delta(x)g(x,\theta_0)]/\partial\theta\}|_{\theta_0} = \partial E_\theta[\delta(x)\{y - g(x)\}]/\partial\theta|_{\theta_0} = E[\delta(x)\{y - g(x)\}S_\theta(z)]$.

Equation (4.2) implies the next result.
Proposition 5: If $h(v) = g(x)$ is the projection of $y$ on $\mathcal{G}$, then the correction term is $\alpha(z) = \Pi(M(z)|\mathcal{G})[y - g(x)]$.
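A toy case of ours in the spirit of the nonparametric prediction example: estimate $\beta_0 = E[E[y|x]] = E[y]$ by averaging a series estimate of $g(x) = E[y|x]$. Here $m(z, \beta, h) = h(x) - \beta$, $M(z) = 1$, and $\mathcal{G}$ is all functions of $x$, so Proposition 5 gives $\alpha(z) = y - g(x)$ and influence function $y - E[y]$. With a basis containing a constant, least squares residuals sum to zero, so the plug-in estimator equals $\bar y$ exactly; its asymptotic variance is the corrected $\operatorname{Var}(y)$, not $\operatorname{Var}(g(x))$.

```python
import numpy as np

# Toy illustration of Proposition 5 (ours): averaging a series estimate of
# g(x) = E[y|x].  Since the basis contains a constant, the least squares
# residuals sum to zero and the plug-in average equals the sample mean of
# y, matching the corrected influence function y - E[y].
rng = np.random.default_rng(4)
n = 5000
x = rng.uniform(-1, 1, n)
y = np.sin(np.pi * x) + rng.normal(size=n)

B = np.column_stack([x**k for k in range(5)])  # polynomial basis w/ constant
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
g_hat = B @ coef                               # series estimate of E[y|x]

beta_hat = g_hat.mean()                        # plug-in estimator of E[y]
print(beta_hat, y.mean())                      # equal up to rounding
```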
A new example is an estimator for a semiparametric random effects model. Let $(y_t, x_t)$, $t = 1, 2$, be observations for two time periods, where $y_t$ is binary, and suppose that for $x = (x_1, x_2)$,

$E[y_t|x] = \Phi([x_t'\beta_0 + p(x)]/\sigma_t)$,

where $p(x)$ is an unknown function, $\sigma_1 = 1$, and $\Phi$ denotes the standard normal CDF. This is a binary panel data model with $y_t = 1(x_t'\beta_0 + \alpha + \varepsilon_t > 0)$, where the conditional distribution of $\alpha + \varepsilon_t$ given $x$ is $N(p(x), \sigma_t^2)$ for an individual effect $\alpha$. This model generalizes Chamberlain's (1980) random effects model by allowing the conditional mean of $\alpha$ to be unknown. In contrast to Manski's (1987) semiparametric individual effects model, $\varepsilon_t$ is allowed to be heteroskedastic over time, but the conditional distribution of $\alpha + \varepsilon_t$ is restricted to be Gaussian. An implication of this model is that

(4.3)  $\Phi^{-1}(E[y_1|x]) = \sigma_2 \Phi^{-1}(E[y_2|x]) + (x_1 - x_2)'\beta_0$.
This implication can be used to construct a semiparametric minimum distance
estimator by replacing the conditional expectations with nonparametric
h (x) = E[y |x]
estimators
-1
regression of
$
and choosing
and
p
-1
-
(h, (x.))
1
on
1
x, .-x„.
li
2i
and
$
from the least squares
o"
-
This estimator can
(h„(x.)).
2
1
also be generalized to the case where the distribution of disturbances is
unknown, by normalizing the scale of
^
and replacing
by a series
$
approximation to the unknown inverse marginal distribution functions, although
further development of this estimator is beyond the scope of this paper.
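The minimum distance step implied by equation (4.3) is just a least squares regression once first-stage estimates of the conditional means are in hand. A minimal numerical sketch (the function name and the choice of p(x) are illustrative, and the exact conditional means stand in for the nonparametric first stage):

```python
import numpy as np
from scipy.stats import norm

def random_effects_md(h1, h2, x1, x2):
    """Minimum distance step behind equation (4.3): least squares of
    Phi^{-1}(h1) on (x1 - x2) and Phi^{-1}(h2).  Returns (beta, sigma_2).
    h1, h2 stand for first-stage estimates of E[y_t | x], supplied as
    arrays by the caller."""
    lhs = norm.ppf(h1)                               # Phi^{-1}(E[y_1|x])
    rhs = np.column_stack([x1 - x2, norm.ppf(h2)])   # regressors
    coef, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
    return coef[:-1], coef[-1]

# Check with exact conditional means in place of the nonparametric first
# stage (illustrative values beta_0 = 0.7, sigma_2 = 1.5, p bounded):
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=500), rng.normal(size=500)
p = 0.5 * np.sin(x1 + x2)                            # unknown p(x)
h1 = norm.cdf(x1 * 0.7 + p)                          # sigma_1 = 1
h2 = norm.cdf((x2 * 0.7 + p) / 1.5)
beta_hat, sigma2_hat = random_effects_md(h1, h2, x1, x2)
```

Because the relation (4.3) holds exactly at the population conditional means, the regression recovers β₀ and σ₂ exactly in this check.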
To derive the influence function of the estimator of (β₀', σ₂)', note that it is a semiparametric m-estimator with v = x, h(v) = (h₁(x), h₂(x))', and

m(z,β,h(v)) = A(x,h₂)[Φ^{-1}(h₁(x)) − Φ^{-1}(h₂(x))σ₂ − (x₁−x₂)'β],  A(x,h₂) = ((x₁−x₂)', Φ^{-1}(h₂(x)))'.

Here the correction terms are the only source of variation, since m(z,β₀,h(v)) = 0. Also, D = −E[A(x,h₂)A(x,h₂)']. Then by Proposition 5, with σ₁ = 1,

M_j(z) = A(x,h₂)(−1)^{j−1}σ_jφ(Φ^{-1}(h_j(x)))^{-1},  α_j(z) = M_j(z)[y_j − h_j(x)],

where the projection of M_j(z) on 𝒢 is M_j(z) itself, M_j(z) being a function of x alone, and

(4.4)  ψ(z) = −D^{-1}[α₁(z) + α₂(z)]
       = −D^{-1}A(x,h₂){φ(Φ^{-1}(h₁(x)))^{-1}[y₁−h₁(x)] − σ₂φ(Φ^{-1}(h₂(x)))^{-1}[y₂−h₂(x)]}.
Proposition 5 can be generalized to linear functionals of the projection that have a particular property specified in the following assumption.

Assumption 4.1: h(v) = A(v,g), A(v,g) is a linear function of g, and there is δ(x) ∈ 𝒢 such that for all g(x) ∈ 𝒢,

(4.5)  E[M(z)A(v,g)] = E[δ(x)g(x)].

By the Riesz representation theorem, equation (4.5) is equivalent to assuming that the functional E[M(z)A(v,g)] is mean-square continuous in g. This condition is necessary for ∫M(z)h(v)dF(z) to be a √n-consistently estimable functional of h(v), as discussed in Newey (1991), so that the estimation of h(v) will affect the convergence rate of β̂ unless Assumption 4.1 is satisfied. Thus, for h(v) a linear function of g(x), Assumption 4.1 and the form of the correction term given below characterize the adjustment for mean-square projections.

Equation (4.5) leads to a straightforward form for the correction term. Noting that h(v,θ) = A(v,g(θ)), differentiation gives

(4.6)  E[M(z)∂h(v,θ₀)/∂θ] = ∂E[M(z)h(v,θ)]/∂θ|_{θ₀} = ∂E[M(z)A(v,g(θ))]/∂θ|_{θ₀}
       = ∂E[δ(x)g(x,θ)]/∂θ|_{θ₀} = E[δ(x){y−g(x)}S_θ(z)],

where the last equality follows as in equation (4.2).

Proposition 6: If Assumption 4.1 is satisfied, the correction term is α(z) = δ(x)[y−g(x)].
In order for this result to provide an interesting formula, it must be possible to find δ(x). In a number of cases δ(x) takes a projection form similar to that of Proposition 5. One interesting case is that where x = (x₁,x₂), v = x₂, and h(v) = A(v,g) = ∫_A g(x₁,x₂)dx₁, where x₁ may be a vector. In this case,

E[M(z)h(v)] = E[E[M(z)|v]h(v)] = E[∫_A E[M(z)|x₂]g(x)dx₁]
 = E[1(x₁∈A)f(x₁|x₂)^{-1}E[M(z)|x₂]g(x)] = E[Π(1(x₁∈A)f(x₁|x₂)^{-1}E[M(z)|x₂]|𝒢)g(x)],

where f(x₁|x₂) is the conditional density of x₁ given x₂.

Proposition 7: If h(v) = ∫_A g(x₁,x₂)dx₁, the distribution of x₁ given x₂ is absolutely continuous with respect to the product measure corresponding to dx₁, with density f(x₁|x₂), and 1(x₁∈A)f(x₁|x₂)^{-1}E[M(z)|x₂] has finite second moment, then the correction term is δ(x)[y−g(x)] for δ(x) = Π(1(x₁∈A)f(x₁|x₂)^{-1}E[M(z)|x₂]|𝒢).
An example is average approximate consumer surplus, where x₁ is a price variable and β̂ = Σ_{i=1}^n ∫_a^b ĝ(x₁,x_{2i})dx₁/n, which is a semiparametric m-estimator with M(z) = 1. By Proposition 7, the influence function for this estimator will be

(4.7)  ψ(z) = ∫_a^b g(x₁,x₂)dx₁ − β₀ + Π(1(a≤x₁≤b)f(x₁|x₂)^{-1}|𝒢)[y − g(x)].

Results for exact consumer surplus (i.e. equivalent variation) and where the demand function is a nonlinear function of a projection (e.g. log-linear models) are analyzed in Hausman and Newey (1991).
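The sample analogue β̂ above is an average of one-dimensional integrals of the estimated demand function. A minimal sketch, with a hypothetical fitted demand g_hat supplied by the caller and the inner integral computed by the trapezoid rule:

```python
import numpy as np

def avg_consumer_surplus(g_hat, x2, a, b, grid=201):
    """Sample analogue of beta-hat = sum_i Int_a^b g-hat(x1, x2_i) dx1 / n
    from the average approximate consumer surplus example.  g_hat is any
    fitted demand function; names here are illustrative, not from the
    paper."""
    x1 = np.linspace(a, b, grid)
    total = 0.0
    for x2i in np.atleast_1d(x2):
        d = g_hat(x1, x2i)                   # demand along the price grid
        total += np.sum((d[:-1] + d[1:]) / 2 * np.diff(x1))
    return total / len(np.atleast_1d(x2))

# Linear demand g(x1, x2) = 4 - x1; the trapezoid rule is exact here and
# Int_0^2 (4 - x1) dx1 = 6:
beta_hat = avg_consumer_surplus(lambda x1, x2: 4.0 - x1, np.zeros(5), 0.0, 2.0)
```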
Another case where δ(x) takes a projection form is where h(v) is a derivative of a projection evaluated at some other variable v. For a vector λ = (λ₁,...,λ_r)' of nonnegative integers, with r = dim(x), let |λ| = Σ_{j=1}^r λ_j, and denote a partial derivative by

(4.8)  D^λg(x) = ∂^{|λ|}g(x)/∂x₁^{λ₁}···∂x_r^{λ_r}.

Suppose that h(v) = D^λg(x)|_{x=v}. Let f_v(v) and f_x(x) be the densities of v and x, respectively, with respect to the same dominating measure, and let M(v) = E[M(z)|v]. Assuming that v has a density that is differentiable to sufficient order in the components of v corresponding to nonzero components of λ, with zero derivatives on the boundary of its support, repeated integration by parts gives

E[M(z)h(v)] = ∫M(v)D^λg(v)f_v(v)dv = (−1)^{|λ|}∫D^λ[M(v)f_v(v)]g(v)dv
 = (−1)^{|λ|}∫{D^λ[M(v)f_v(v)]|_{v=x}}g(x)dx = E[(−1)^{|λ|}f_x(x)^{-1}D^λ[M(v)f_v(v)]|_{v=x}g(x)]
 = E[δ(x)g(x)],  δ(x) = (−1)^{|λ|}Π(f_x(x)^{-1}D^λ[M(v)f_v(v)]|_{v=x}|𝒢).

Proposition 8: If h(v) = D^λg(x)|_{x=v}; v and x are absolutely continuous with respect to the same measure, which is Lebesgue measure for the components of x corresponding to nonzero components of λ; the support of x is a convex set with nonempty interior; f_v(v) and E[M(z)|v] are continuously differentiable to order |λ|; D^λ[M(v)f_v(v)] is zero on the boundary of the support of x; and for each x, f_x(x)^{-1}D^λ[M(v)f_v(v)]|_{v=x} has finite second moment; then the correction term is δ(x)[y−g(x)] for δ(x) = (−1)^{|λ|}Π(f_x(x)^{-1}D^λ[M(v)f_v(v)]|_{v=x}|𝒢).
An example with no derivatives involved is Stock's (1989) nonparametric prediction estimator, where β₀ = E[g(v)]−E[g(x)], 𝒢 = {g₁(x₁) + x₂'η}, so that g(x) is a partially linear projection, x = (x₁', x₂')' is partitioned conformably with v = (v₁', x₂')', and β̂ = Σ_{i=1}^n[ĝ(v_i)−ĝ(x_i)]/n. This is a semiparametric m-estimator where h₁(v) = g(v), h₂(x) = g(x), and M₁(z) = −M₂(z) = 1. From the form of the correction terms in Proposition 8, the influence function is

(4.9)  ψ(z) = g(v)−g(x)−β₀ + [Π(f_x(x)^{-1}f_v(x)|𝒢) − 1][y − g(x)].

This result differs from Stock's in the inclusion of the term g(v)−g(x)−β₀, because Stock's result only derived the conditional distribution given the observations on x and v. The variance of the second term is the same as in Stock's formula, because

Π(f_x(x)^{-1}f_v(x)|𝒢) = f_{x₁}(x₁)^{-1}f_{v₁}(x₁)
 + (x₂−E[x₂|x₁])'E[Var(x₂|x₁)]^{-1}E[{x₂−E[x₂|x₁]}f_{x₁}(x₁)^{-1}f_{v₁}(x₁)],

for f_{x₁}(x₁) and f_{v₁}(v₁) equal to the densities of x₁ and v₁, respectively. Proposition 8 also gives the form of correction terms for the dynamic discrete choice estimators of Ahn and Manski (1989) and Hotz and Miller (1989), and the average derivative estimator of Hardle and Stoker (1989).
A correction term for density estimation can be derived under conditions similar to those for the projection. Suppose that h(v) = A(v,f_w), where A(v,f_w) is a linear function of the density f_w(w) of a vector w, with respect to some measure. Suppose that there is a(w) such that E[M(z)A(v,f_w)] = ∫a(w)f_w(w)dw. Let f_w(w|θ) denote the density of w for a path. Then

E[M(z)∂h(v,θ₀)/∂θ] = ∂E[M(z)A(v,f_w(θ))]/∂θ|_{θ₀} = ∂E_θ[a(w)]/∂θ|_{θ₀} = E[a(w)S_θ(z)].

Proposition 9: If h(v) = A(v,f_w) for a density f_w and there is a(w) such that E[M(z)A(v,f_w)] = ∫a(w)f_w(w)dw, then the correction term is a(w)−E[a(w)].

Existence of such an a(w) will follow from the Riesz representation theorem if ∫f_w(w)²dw is finite and E[M(z)A(v,f_w)] can be extended to a linear functional on the Hilbert space of square integrable (dw) functions that is continuous. Continuity of E[M(z)A(v,f_w)] in f_w, in the square integrable sense, appears to be essentially necessary for the correction term to be √n-consistent, although it is difficult to give a precise result, because the usual parameterization for checking √n-consistency is the square root of the density, rather than the density itself.
A case where it is easy to compute a density correction term is that with w = (y',x')' and h(v) = D^λ[∫a(y)f_w(y,x)dy]|_{x=v}. Integration by parts gives

E[M(z)A(v,f_w)] = ∫M(v)D^λ[∫a(y)f_w(w)dy]|_{x=v}f_v(v)dv
 = (−1)^{|λ|}∫{D^λ[M(v)f_v(v)]|_{v=x}}[∫a(y)f_w(w)dy]dx = ∫ã(w)f_w(w)dw,
 ã(w) = (−1)^{|λ|}D^λ[M(v)f_v(v)]|_{v=x}a(y).

Proposition 10: If h(v) = D^λ[∫a(y)f_w(w)dy]|_{x=v}; v and x are absolutely continuous with respect to the same measure, which is Lebesgue measure for the components of x corresponding to nonzero components of λ; the support of x is a convex set with nonempty interior; f_v(v) and E[M(z)|v] are continuously differentiable to order |λ|; D^λ[M(v)f_v(v)] is zero on the boundary of the support of x; and ã(w) = (−1)^{|λ|}D^λ[M(v)f_v(v)]|_{v=x}a(y) has finite second moment; then the correction term is ã(w) − E[ã(w)].
This result gives the form of the correction term for Powell, Stock,
and
Stoker's (1989) weighted average derivative estimator and Robinson's (1989)
test statistics. Another example is Ruud's (1986) density weighted least squares estimator, which is treated in Newey and Ruud (1991).
There may be other interesting cases where the form of the correction
term can be calculated.
Hopefully,
the ones given here illustrate the
usefulness of the pathwise derivative calculation of the influence function.
In the next two Sections,
regularity conditions for the validity of many of
these calculations are given.
5. Regularity Conditions.
This Section develops a set of regularity conditions that are sufficient
for validity of the pathwise derivative formula.
The regularity conditions
are based on direct verification that remainder terms from the pathwise
derivative are small.
The rest of the paper will focus on semiparametric generalized method of moments estimators where h(v) does not depend on parameters. Let h(v) = (h₁(v₁),...,h_J(v_J))' be a vector of functions, each v_j a vector of variables in the data observation z, and m(z,β,h) a q×1 vector of functions of z, the parameter vector β, and h, and assume that the moment condition E[m(z,β₀,h(v))] = 0 is satisfied. Note that this setup allows h(v) to include parameter values, by specifying that some elements of h(v) are trivial (can only take on one value). For example, some elements might be trimming parameters, as in Newey and Ruud (1991). Let ĥ denote an estimator of this vector function, with h representing a possible value, let m̂_n(β) = Σ_{i=1}^n m(z_i,β,ĥ(v_i))/n, and let Ŵ be a positive semi-definite matrix. The estimator to be analyzed satisfies

(5.1)  β̂ = argmin_{β∈B} m̂_n(β)'Ŵm̂_n(β).
Although ĥ is not allowed to depend on β in this section, the results are still useful for the general case, because they provide conditions for the important intermediate result that Σ_{i=1}^n m(z_i,β₀,ĥ(v_i,β₀))/√n is asymptotically normal. Because of the importance of asymptotic normality of Σ_{i=1}^n m(z_i,ĥ(v_i))/√n (for m(z,h) = m(z,β₀,h)), and because this function is the source of the correction terms, it is useful to discuss this result first and organize the discussion around a few high-level conditions.
The pathwise derivative calculation is very useful in formulating these conditions, because it gives the form of a remainder term that should converge in probability to zero, implying asymptotic normality. Let α_j(z) be the solution to equation (3.5), (j = 1,...,J), and α(z) = Σ_{j=1}^J α_j(z). Then, from the form of equation (3.7), one would expect that the following remainder term R_n should converge in probability to zero:

R_n = Σ_{i=1}^n m(z_i,ĥ(v_i))/√n − Σ_{i=1}^n u_i/√n,  u_i = m(z_i,h(v_i)) + α(z_i) − E[α(z)].

If R_n →_p 0, then asymptotic normality of Σ_{i=1}^n m(z_i,ĥ(v_i))/√n will follow from the central limit theorem applied to Σ_{i=1}^n u_i/√n. To give conditions for R_n to be small it is helpful to decompose this remainder term. For M(z) = ∂m(z,h)/∂h|_{h=h(v)}, let M_j(z) denote the jth column of M(z), and

R_n^1 = Σ_{i=1}^n{m(z_i,ĥ(v_i)) − m(z_i,h(v_i)) − M(z_i)[ĥ(v_i) − h(v_i)]}/√n,
R_{nj}^2 = Σ_{i=1}^n{M_j(z_i)[ĥ_j(v_i) − h_j(v_i)] − α_j(z_i) + E[α_j(z)]}/√n.

Note that R_n = R_n^1 + Σ_{j=1}^J R_{nj}^2, so that R_n →_p 0 if each of the following conditions is satisfied.

Asymptotic Linearity: R_n^1 →_p 0.

Asymptotic Differentiability: R_{nj}^2 →_p 0, (j = 1,...,J).
Asymptotic linearity is similar to a condition formulated in Hardle and Stoker (1989), and will follow from a Taylor expansion and a sample mean-square convergence rate for ĥ of slightly faster than n^{-1/4}, as discussed below. Asymptotic differentiability is a deeper, more important condition. It can be shown to hold if ĥ_j(v) is a kernel estimator of ∫a(y)f(y,x)dy, using U-statistic projection results and higher-order bias reducing kernels, as in Powell, Stock, and Stoker (1989) and Robinson (1988), or if ĥ_j(v) is a series estimator, using properties of sample projections and series approximations, as in Newey (1990b) and Section 6. It is also possible to further decompose the asymptotic differentiability remainder in a way that allows application of Andrews' (1990b) stochastic equicontinuity results. Let

R_{nj}^3 = Σ_{i=1}^n{M_j(z_i)[ĥ_j(v_i) − h_j(v_i)] − ∫M_j(z)[ĥ_j(v) − h_j(v)]dF(z)}/√n,
R_{nj}^4 = √n{∫M_j(z)[ĥ_j(v) − h_j(v)]dF(z) − Σ_{i=1}^n[α_j(z_i) − E[α_j(z)]]/n}.

Note that R_{nj}^2 = R_{nj}^3 + R_{nj}^4, so that R_{nj}^2 →_p 0 if each of the following conditions is satisfied.

Stochastic Equicontinuity: R_{nj}^3 →_p 0.

Functional Convergence: R_{nj}^4 →_p 0.
Conditions for stochastic equicontinuity are given below. Functional convergence is specific to the form of ĥ(v). One interesting result is that if E[M_j(z)|v] = 0, then functional convergence holds trivially for α_j(z) = 0. Thus, asymptotic linearity and stochastic equicontinuity are regularity conditions for Proposition 3, as further discussed below. When α_j(z) is nonzero, functional convergence may follow from asymptotic normality of mean-square continuous linear functionals of ĥ(v), since functional convergence is only slightly stronger than asymptotic normality of √n∫E[M_j(z)|v][ĥ_j(v) − h_j(v)]dF(z).

Some of these high level conditions will be consequences of more primitive hypotheses. The first of these limits the dependence between observations that are far apart.

Assumption 5.1: z_i is strictly stationary and strong (α) mixing with mixing coefficients α(t) = O(t^{-μ}) for μ > 2.

The next condition is uniform consistency of ĥ.

Assumption 5.2: For each j and the support 𝒱_j of v_j, sup_{v_j∈𝒱_j}|ĥ_j(v_j)−h_j(v_j)| →_p 0.
Primitive conditions for this and the other assumptions about ĥ are given in Section 6. The following pair of hypotheses are more primitive conditions for Asymptotic Linearity. The first imposes smoothness conditions on m. For a random variable Y, let |Y|_s = (E[|Y|^s])^{1/s}, and for any ε > 0 let 𝒩(v,ε) = {h: ‖h−h(v)‖ < ε}.

Assumption 5.3: There is a neighborhood N of β₀ and ε > 0 such that, with probability one, m(z,β,h) is twice continuously differentiable on N×𝒩(v,ε), with derivatives of order j+k (1 ≤ j+k ≤ 2) bounded in norm by b(z) such that |b(z)|_s is finite for some s ≥ 2μ/(μ−1).

The next hypothesis imposes a convergence rate on ĥ.

Assumption 5.4: Either i) m is linear in h, or ii) for each j, Σ_{i=1}^n|ĥ_j(v_i)−h_j(v_i)|²/n = o_p(n^{-1/2}).

Assumption 5.4 is stated in terms of the sample L₂ norm rather than a more general norm because the literature on convergence rates of nonparametric estimators seems to give the sharpest results for this norm.
Lemma 5.1: If Assumptions 5.2 - 5.4 are satisfied then Asymptotic Linearity is satisfied.

The following condition is sufficient for stochastic equicontinuity.

Assumption 5.5: ‖M_j(z)‖_{s'} is finite for s' = 2μ/(μ−2), and for each j, either a) 𝒱_j is a singleton, or b) 𝒱_j is convex with Prob(Boundary(𝒱_j)) = 0, and there is a positive integer d_j > dim(v_j)/2 such that h_j(v) is continuously differentiable to order d_j on 𝒱_j, with bounded derivatives, and for all |λ| ≤ d_j, sup_{v_j∈𝒱_j}|D^λĥ_j(v_j)−D^λh_j(v_j)| →_p 0.

Lemma 5.2: If Assumptions 5.1 and 5.5 are satisfied, then Stochastic Equicontinuity is satisfied.
Although the main focus here is asymptotic distribution theory, for completeness it is appropriate to give a consistency result. The next hypothesis imposes identification and regularity conditions for consistency. Let ρ: ℝ₊ → ℝ₊ be continuous at zero, with ρ(0) = 0.

Assumption 5.6: Ŵ →_p W, W is positive definite, E[m(z,β,h(v))] = 0 has a unique solution at β₀, B is compact, m(z,β,h(v)) is continuous in β with probability one, there is b(z) with E[b(z)] < ∞ such that sup_{β∈B}‖m(z,β,h(v))‖ ≤ b(z), and either a) m(z,β,h(v)) is convex in β; or b) there is b(z) with E[b(z)] < ∞ and ρ(·) continuous at zero such that ‖m(z,β,h)−m(z,β,h(v))‖ ≤ b(z)ρ(ε) for all β ∈ B and h ∈ 𝒩(v,ε), so that sup_{β∈B,h∈𝒩(v,ε)}‖m(z,β,h)−m(z,β,h(v))‖ →_p 0 as ε → 0.

Theorem 5.3: If Assumptions 5.1, 5.2, and 5.6 are satisfied then β̂ →_p β₀.
Next, regularity conditions for Proposition 3 are given.

Theorem 5.4: If Assumptions 5.1-5.6 are satisfied and for each j, E[M_j(z)|v_j] = 0 or, more generally, E[M_j(z)h̃_j(v_j)] = 0 for all h̃_j such that h_j(v_j) and h_j(v_j) + h̃_j(v_j) are elements of the set of possible h_j, then for m_i = m(z_i,β₀,h(v_i)) and Ω = E[m_im_i'] + Σ_{ℓ=1}^∞{E[m_im'_{i+ℓ}] + E[m_{i+ℓ}m_i']},

√n(β̂−β₀) →_d N(0,V),  V = (D'WD)^{-1}D'WΩWD(D'WD)^{-1}.

This theorem shows that Andrews' (1990a) "independence" hypothesis, that estimation of h(v) does not affect the limiting distribution of β̂, is a consequence of orthogonality of M(z) with the set of possible h(v).
The next asymptotic normality result allows for a nonzero correction term. Let Ω_ℓ = E[u_iu'_{i+ℓ}].

Theorem 5.5: If |α_j(z)|_{s'} is finite for some s' > 2μ/(μ−1), (j = 1,...,J), and Assumptions 5.1 - 5.4, 5.6, and Asymptotic Differentiability are satisfied, then √n(β̂−β₀) →_d N(0,V), V = (D'WD)^{-1}D'WΩWD(D'WD)^{-1}, for Ω = Ω₀ + Σ_{ℓ=1}^∞(Ω_ℓ + Ω'_ℓ).

A consistent estimator V̂ of the asymptotic variance of β̂ can be formed from estimates û_i of u_i. Let m̂_i = m(z_i,β̂,ĥ(v_i)), and let α̂_{ji} denote estimates of α_j(z_i), (j = 1,...,J); such estimates are required to form a consistent estimator Ω̂ of Ω. If u_i is not autocorrelated then Ω̂ = Ω̂₀ will be an appropriate estimator, where

û_i = m̂_i + Σ_{j=1}^J[α̂_{ji} − Σ_{i=1}^n α̂_{ji}/n],  Ω̂_ℓ = Σ_{i=1}^{n−ℓ}û_iû'_{i+ℓ}/n.

When u_i may be autocorrelated, consider a weighted autocovariance estimator like that in Newey and West (1987), with
Ω̂ = Ω̂₀ + Σ_{ℓ=1}^L w(ℓ,L)[Ω̂_ℓ + Ω̂'_ℓ],

where w(ℓ,L) is a weight such that Ω̂ is positive semi-definite, such as w(ℓ,L) = 1 − ℓ/(L+1), and L is a lag truncation number. Given Ω̂, an estimator of the asymptotic covariance matrix of β̂ can be formed in the usual way, as V̂ = (D̂'ŴD̂)^{-1}D̂'ŴΩ̂ŴD̂(D̂'ŴD̂)^{-1}.

Theorem 5.6: Suppose that Assumptions 5.1 and 5.3 are satisfied with s > 4μ/(μ−2), |m_i|_s is finite, ‖β̂−β₀‖ = O_p(1/√n), for each j, sup_v|ĥ_j(v)−h_j(v)| →_p 0, there is ε_n = o(1) such that 1/n = O(ε_n²) and Σ_{i=1}^n‖α̂_{ji}−α_j(z_i)‖²/n = O_p(ε_n²) for each j, w(ℓ,L) is bounded uniformly in ℓ and L, lim_{L→∞}w(ℓ,L) = 1 for each ℓ, and either a) Ω̂ = Ω̂₀ and Ω = Ω₀; or b) L → ∞ and L = o_p(ε_n^{-1}). Then V̂ →_p V.

As usual for minimum distance estimators, the asymptotic variance depends on W, and an optimal (asymptotic variance minimizing) choice of W is Ω^{-1} when Ω is nonsingular. The estimator Ω̂ can depend on the data, as is important for applications. Given this estimator of Ω, it can be used to form a feasible version of the optimal minimum distance estimator, by using Ŵ = Ω̂^{-1} in equation (5.1). The resulting estimator will be an optimal estimator that adjusts for the presence of first-stage, plug-in estimators in the moment functions, similarly to the estimator of Hansen (1985). For this choice of W, (D̂'Ω̂^{-1}D̂)^{-1} will be a consistent estimator of the asymptotic variance of β̂.
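The weighted autocovariance estimator of Ω with Bartlett weights w(ℓ,L) = 1 − ℓ/(L+1) can be sketched as follows (a generic implementation in the spirit of Newey and West (1987), not code from the paper):

```python
import numpy as np

def long_run_variance(u, L):
    """Weighted autocovariance estimator:
    Omega-hat = Omega_0 + sum_{l=1}^{L} w(l,L)[Omega_l + Omega_l'],
    with Bartlett weights w(l,L) = 1 - l/(L+1), which keep Omega-hat
    positive semi-definite."""
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    u = u - u.mean(axis=0)                 # center the moment estimates
    n = u.shape[0]
    omega = u.T @ u / n                    # Omega_0
    for l in range(1, L + 1):
        gamma = u[l:].T @ u[:-l] / n       # Omega_l
        omega += (1.0 - l / (L + 1)) * (gamma + gamma.T)
    return omega

# Alternating series: Omega_0 = 1, Omega_1 = -3/4, w(1,1) = 1/2,
# so the estimate is 1 + (1/2)(-3/4 - 3/4) = 1/4.
omega_hat = long_run_variance([1.0, -1.0, 1.0, -1.0], L=1)
```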
6. Series Estimation of Projection Functionals.

This Section develops regularity conditions for linear functionals of power series estimators of projections. Power series are considered because they are computationally convenient and the most complete convergence rate results seem to be available for them. Although in some contexts power series are thought to be inferior to other approximating functions, because of their "roly-poly" behavior and global sensitivity of best uniform approximations to singularities, these considerations may not be as important here, where the projection estimator is a nuisance function. Under Assumption 4.1, β̂ depends on the series approximation essentially only through a weighted average, where these problems with power series seem not to be so important. An example is provided by the Monte-Carlo results of Newey (1988a), where a semiparametric power series estimator performs extremely well relative to a kernel estimator.

Here, the domain 𝒢 of the projection will be assumed to take the form in equation (4.1).
The conditions to follow will depend on the maximum across ℓ ≤ L of the dimension of x_ℓ, which will be denoted by 𝒶. A power series estimator of the projection can be obtained from a regression of y on a truncated power series with elements restricted to lie in 𝒢, analogous to Stone's (1985) spline estimator. Let λ denote a vector of nonnegative integers as before, and let x^λ = Π_j x_j^{λ_j}. For a sequence (λ(k))_{k=1}^∞ of distinct such vectors, a power series is

(6.1)  p_k(x) = x_{L+1,k}, k = 1,...,dim(x_{L+1});  p_k(x) = x^{λ(k−dim(x_{L+1}))}, k > dim(x_{L+1}).

It will be assumed that {p_k(x)}_{k>dim(x_{L+1})} consists of all multivariate powers of each x_ℓ for ℓ ≤ L, and no more, ordered so that |λ(k)| is monotonic increasing. This assumption imposes the essential restriction that each p_k(x) belongs to 𝒢, and the spanning condition that the sequence includes all such terms.
The estimator of the projection considered here is that obtained from the least squares regression of y on p^K(x) = (p₁(x),...,p_K(x))' for K terms, where K is allowed to depend on the data. For y = (y₁,...,y_n)' and p^K = [p^K(x₁),...,p^K(x_n)]', the estimator of g(x) is

(6.2)  ĝ(x) = p^K(x)'π̂,  π̂ = Σ̂^- p^{K}'y/n,  Σ̂ = p^{K}'p^K/n,

where (·)^- denotes a generalized inverse. Under the conditions to follow, p^{K}'p^K will be nonsingular with probability approaching one, so that the choice of generalized inverse does not matter, asymptotically. A data based K is essential for making operational the nonparametric properties of series estimators, allowing the estimator to adjust to conditions in particular applications. It would also be interesting to know how to best choose K in the current context, but this question is outside the scope of this paper.
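Equation (6.2) is ordinary least squares on the truncated series, with a generalized inverse guarding against singular designs. A minimal sketch for scalar x, using np.linalg.pinv as the generalized inverse:

```python
import numpy as np

def series_projection(y, x, K):
    """Least squares on a power series, as in equation (6.2), sketched for
    scalar x with p_k(x) = x^(k-1).  np.linalg.pinv plays the role of the
    generalized inverse of Sigma-hat = p^K' p^K / n."""
    n = len(y)
    p = x[:, None] ** np.arange(K)                  # n x K matrix p^K
    pi_hat = np.linalg.pinv(p.T @ p / n) @ (p.T @ y / n)
    return lambda x0: (np.asarray(x0)[:, None] ** np.arange(K)) @ pi_hat

# Exact recovery when g lies in the span of the first K powers:
x = np.linspace(-1.0, 1.0, 50)
g_hat = series_projection(2.0 + 3.0 * x**2, x, K=4)
```

Since y = 2 + 3x² lies in the span of {1, x, x², x³}, the fitted function reproduces it, e.g. ĝ(0.5) = 2.75.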
For computational purposes it may be useful to replace p^K(x) with a nonsingular linear transformation to polynomials that are orthogonal with respect to some distribution, since these may have less of a multicollinearity problem than power series are known to have. Of course, this replacement will not affect the estimator. Also, note that the elements of each x_ℓ may be smooth, bounded transformations (e.g. the logit distribution function) of "original" variables, which may help to limit the sensitivity of the estimator to outliers. In the Monte Carlo example of Newey (1988a), such a transformation lead to reduced sensitivity to the choice of K.
The function A(v,g) that appears in the moments will be taken to be a linear function of the projection g, as analyzed in Section 4, and h(v) will be estimated by replacing g by ĝ. By linearity of A in the function, the resulting estimator takes the form

(6.3)  ĥ(v) = A(v,ĝ) = A(v)'π̂,  A(v) = (A(v,p₁),...,A(v,p_K))'.

Note that this estimator requires that A(v,g) have an explicit form that does not depend on the true data-generation process.
An estimator of the correction term is required for estimation of the asymptotic variance of β̂. Under Assumption 4.1, such an estimator can be constructed in a straightforward way. Let

(6.4)  α̂(z) = Ψ̂'Σ̂^- p^K(x)[y − ĝ(x)],  Ψ̂ = Σ_{i=1}^n[∂m(z_i,β̂,ĥ(v_i))/∂h]A(v_i)/n.

By Assumption 4.1, Ψ̂'Σ̂^- p^K(x) is an estimator of the regression of δ(x) on p^K(x), which will approximate δ(x) for large K and n. Alternatively, α̂(z) can be viewed as the estimator of the correction term obtained by treating g(x) as if it were a parametric regression, with K fixed. This procedure results in a consistent estimator of the correction term because it accounts properly for its variance, while bias from the series approximation will be small because of smoothness restrictions on g(x) and δ(x) imposed below.
The following conditions are needed to apply the results of Newey (1991). Let x̃ denote the vector consisting of the union of all distinct variables appearing in x_ℓ, (ℓ ≤ L), and let 𝒢̄ denote the mean-square closure of 𝒢. The first condition is sufficient for 𝒢 to be closed.
Assumption 6.0: i) For each ℓ (ℓ = 1,...,L), x = (x_ℓ', x̃_ℓ')' for some partitioning, where either x̃_ℓ is not present or x_ℓ is a subvector of x̃; ii) there exists a constant c > 0 such that c∫a(q)d[F(x_ℓ)×F(x̃_ℓ)] ≤ E[a(q)] ≤ c^{-1}∫a(q)d[F(x_ℓ)×F(x̃_ℓ)] for all a(q) ≥ 0; iii) either x_{L+1} is not present, or 𝒢 is bounded and, for the closure 𝒢̄ of 𝒢, E[{x_{L+1}−Π(x_{L+1}|𝒢̄)}{x_{L+1}−Π(x_{L+1}|𝒢̄)}'] is nonsingular.

The next condition requires that the support of x̃ be a box and places a lower bound on its distribution.

Assumption 6.1: There are finite X_{jl} < X_{ju}, (j = 1,...,dim(x̃)), and ν ≥ 0 such that the support of x̃ is Π_{j=1}^{dim(x̃)}[X_{jl},X_{ju}], and the distribution of x̃ has an absolutely continuous component with density bounded below by cΠ_{j=1}^{dim(x̃)}[(X_{ju}−x̃_j)(x̃_j−X_{jl})]^ν on the support.
The nonsingularity condition is a normalization, unless η is a parameter of interest, where it is an identification assumption for η.

Assumption 6.2: For ε = y−g(x), |ε|_s is finite for s ≥ 2 and E[ε²|x] is bounded.

The bounded second conditional moment assumption is quite common in the literature (e.g. Stone, 1985), and simplifies the regularity conditions.

Assumption 6.3: Either a) z_i (i = 1, 2, ...) is uniform mixing with mixing coefficients φ(t) = O(t^{-μ}) for μ > 2; or b) there exists ζ(t) such that |E[ε_iε_{i+t}|x_i,x_{i+t}]| ≤ ζ(t) and Σ_{t=1}^∞ ζ(t) < ∞.

This assumption is restrictive, but covers many cases of interest, including independent observations and dynamic nonparametric regression with g(x_i) = E[y_i|x_i,y_{i−1},x_{i−1},y_{i−2},...]. The next condition restricts the amount of variation allowed in the choice of number of terms K.
Assumption 6.4: K = K̂(n) such that with probability approaching one, K̲ ≤ K̂(n) ≤ K̄, where K̲ = K̲(n) and K̄ = K̄(n) are sequences of constants, and there is ε > 0 such that K̄(n) ≤ n^{1−ε} for all n large enough.

The bounds K̲ and K̄ control the bias and variance, respectively, of ĝ. The next set of conditions impose smoothness assumptions used to control the bias of the estimator.

Assumption 6.5: g_ℓ(x_ℓ) is continuously differentiable to order d, (ℓ ≤ L).
Two results will be given, because the conditions are weaker and simpler in the special case where h(v) = g(x), meriting its separate treatment. For any nonnegative integer d, let

(6.5)  ζ_d(K) = K^{1+2d}.

The covariance matrix of β̂ can be estimated by the procedure discussed in Section 5, using the estimators α̂_j(z) given above. The asymptotic distribution results will include consistency of this variance estimator.
Theorem 6.1: Suppose that Assumptions 5.1, 5.3, 5.6, and 6.0-6.5 are satisfied, m(z,β₀,h) is linear in h, and for each j: i) E[‖M_j(z)−δ_j(x)‖^{s'}] is finite for some s' > 4μ/(μ−2) and E[‖M_j(z)‖²|x] is bounded; ii) δ_j(x) is continuously differentiable to order d_δ; iii) each of the following converges to zero: K̄^{1+2ν}ζ₀(K̄)/√n, K̄^{1/2}ζ₀(K̄)K̲^{−d/𝒶}, and √n K̲^{−(d+d_δ)/𝒶}; and iv) either a) Ω̂ = Ω̂₀ and Ω = Ω₀, or b) L → ∞ and L = o_p(ε_n^{-1}). Then √n(β̂−β₀) →_d N(0,V) and V̂ →_p V.
The upper bound on the rate of growth for the number of terms in the series expansion is n^{1/4} when the density of x̃ is bounded away from zero (ν = 0), which is less than the n^{1/2} rate derived in Newey (1990b). This result also requires existence of derivatives of both g(x) and δ(x) up to order more than the largest dimension of an additive component, as in Newey (1990b).
To obtain asymptotic normality in the more general case, where h(v) = A(v,g) is some other linear function of g, it is useful to impose a continuity condition on A(v,g) as a function of g. Let 𝒱 denote the support of v, and denote supremum Sobolev norms by

‖h(v)‖_d = sup_{|λ|≤d, v∈𝒱}|D^λh(v)|,  ‖g(x)‖_d = sup_{|λ|≤d, x}|D^λg(x)|.

Assumption 6.6: There is a constant C and an integer Δ such that ‖A(v,g)‖₀ ≤ C‖g(x)‖_Δ.

This Assumption will imply that the bias from approximating the function h(v) by a linear combination of A(v) is bounded by the bias of approximating g(x) and its derivatives to order Δ by a linear combination of p^K(x). Unfortunately, for multivariate functions, a literature search has not yet revealed bias bounds for approximating a function and its derivatives by power series, except under part b) of the following condition.
Assumption 6.7: Either a) 𝒶 = 1 or Δ = 0; or b) for each ℓ, g_ℓ(x_ℓ) is continuously differentiable to all orders, and there is a constant C such that sup_{x_ℓ}|D^λg_ℓ(x_ℓ)| ≤ C^{|λ|} for all λ.

Condition b) implies an approximation rate for g(x) and its derivatives that is faster than any power of K. When K̂ is random, it is useful to also have an approximation rate for δ(x). In order that the results would apply to many cases, with more general approximation results that one anticipates will appear eventually, the rest of the results of this section will be stated in terms of

e_δ(K) = min_π sup_x|δ(x)−p^K(x)'π|.

Let 𝒦 = 𝒦(n) = [K̲, K̄].
Theorem 6.2: Suppose that Assumptions 4.1, 5.1, 5.3, 5.6, and 6.0-6.7 are satisfied, and |m(z,β₀,h(v))|_s and |δ_j(x)[y_j−g_j(x)]|_s are finite for some s > 4μ/(μ−2) and each j. Also suppose that: i) either a) z_i is uniform mixing, or b) m(z,h) is linear in h; ii) √n K̲^{−d/𝒶}e_δ(K̲) → 0 and [Σ_{K∈𝒦}e_δ(K)²]^{1/2} → 0; iii) n^{1/s}K̄^{3/2}ζ_Δ(K̄)/√n → 0 and K̄^{1/2}ζ_Δ(K̄)K̲^{−d/𝒶} → 0; and iv) there is ε_n = o(1) such that 1/n = O(ε_n²), and either a) Ω̂ = Ω̂₀ and Ω = Ω₀, or b) L → ∞ and L = o_p(ε_n^{-1}). Then √n(β̂−β₀) →_d N(0,V) and V̂ →_p V.
The smallest upper bound on the number of terms allowed by this theorem is n^{1/5}, for ν = 0 and s = ∞, a rate also derived in Newey (1991). This result also requires existence of derivatives of g(x) of order more than 3/2 times the maximum dimension of the additive components, but imposes weak smoothness restrictions on δ(x).

The requirement that the bias go to zero faster than 1/√n, as needed for √n-consistency, is that √n K̲^{−d/𝒶}e_δ(K̲) converge to zero. This term is the product of the approximation rate for g(x) (i.e. the bias from estimating g(x) by truncated series) and the approximation rate for δ(x). Consequently, "undersmoothing" is not required for asymptotic normality with "plug-in" series estimators, as it is for some kernel estimators, because the bias from estimating h(v) does not have to go to zero faster than 1/√n.
This result is a consequence of the usual orthogonality property of mean-square projections. Let g_K(x) and δ_K(x) be the population regressions of g(x) (or y) and δ(x), respectively, on p^K(x), and h_K(v) = A(v,g_K) the corresponding value of h(v). Using Assumption 4.1, the population first order bias term, analogous to that considered by Stoker (1990) for kernels, is

(6.6)  E[M(z){h_K(v)−h(v)}] = E[δ(x){g_K(x)−g(x)}] = −E[{δ_K(x)−δ(x)}{g_K(x)−g(x)}],

so that the bias term for h is equal to the product of bias terms for δ and g.
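The orthogonality argument behind equation (6.6) can be illustrated with sample moments: the least squares residual g_K − g is orthogonal to the span of p^K, and δ_K lies in that span. The functions g(x) = exp(x) and δ(x) = cos(x) below are illustrative stand-ins:

```python
import numpy as np

# Sample-moment check of equation (6.6):
#   E[delta {g_K - g}] = -E[{delta_K - delta}{g_K - g}],
# which holds because E[delta_K {g_K - g}] = 0 by least squares
# orthogonality when both projections use the same basis p^K.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 10000)
P = x[:, None] ** np.arange(3)                  # basis p^K with K = 3
proj = lambda f: P @ np.linalg.lstsq(P, f, rcond=None)[0]
g, delta = np.exp(x), np.cos(x)
gK, deltaK = proj(g), proj(delta)
lhs = np.mean(delta * (gK - g))                 # E[delta {g_K - g}]
rhs = -np.mean((deltaK - delta) * (gK - g))     # -E[{delta_K - delta}{g_K - g}]
```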
Under Assumption 6.1 with
v = 0,
K
nonrandom, and Assumption 6.7 a),
these results will also apply to uniform knot spline estimators of
the definition of
is changed to
Cj(K)
Cj(K) =
5+d
K'
.
g(x),
if
More generally, the
results apply to any series estimator satisfying Assumptions 3.1-3.8 in Newey
(1991),
7.
although these do not allow for Fourier series estimators.
Examples
This Section gives primitive regularity conditions for the validity of the examples of Section 4, and one or two additional examples. To save space, this Section has a special format, where each subsection gives an estimator, an estimator of its asymptotic variance, and a result on √n-consistency and asymptotic normality of the estimator and consistency of the estimated asymptotic variance, with discussion of the results reserved until the end.

To give more specific results on the rate at which the number of terms can grow it will be useful to impose the following Assumption.
Assumption 7.1: K̄(n) = n^T and K(n) = n^γ for some T > γ > 0.

7.1 Semiparametric Random Effects for Binary Panel Data
The estimator of (β', σ)' is

(β̂', σ̂)' = (Σ_{i=1}^n Î_i Â_i Â_i')^(-1) Σ_{i=1}^n Î_i Â_i Φ^(-1)(ĥ_1(x_i)),

where ĥ_t(x) is a series estimator of h_t(x) = E[y_t|x], (t = 1, 2), x_i = (x_{i1}', x_{i2}')', Â_i = ((x_{i1} - x_{i2})', Φ^(-1)(ĥ_2(x_i)))', and Î_i = 1(0 < ĥ_t(x_i) < 1, t = 1, 2) trims observations where an estimated choice probability falls outside the unit interval. The asymptotic variance estimator is

V̂ = (Σ_{i=1}^n Î_i Â_i Â_i')^(-1) (Σ_{i=1}^n Ĝ_i Ĝ_i') (Σ_{i=1}^n Î_i Â_i Â_i')^(-1),

Ĝ_i = Î_i Â_i [Φ^(-1)(ĥ_1(x_i)) - Â_i'(β̂', σ̂)']
    + Σ_{t=1}^2 (-1)^(t-1) σ̂_t {Σ_{j=1}^n Î_j Â_j [dΦ^(-1)(ĥ_t(x_j))/dh] p^K(x_j)'/n} Σ̂^(-1) p^K(x_i) [y_{ti} - ĥ_t(x_i)],

where the second term in Ĝ_i accounts for the presence of the nonparametric estimates ĥ_t.
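The two-step structure of this estimator, series-estimate the conditional response probability, invert Φ, then run least squares, can be sketched in a stylized cross-section version with a single regressor. Everything in this sketch (the cubic basis, the trimming bounds, the design) is an illustrative assumption rather than the paper's exact panel construction:

```python
import numpy as np
from statistics import NormalDist  # stdlib standard-normal cdf / inverse cdf

nd = NormalDist()
rng = np.random.default_rng(1)
n = 20000
x = rng.uniform(-1, 1, n)
beta = 1.0
y = (x * beta + rng.normal(size=n) > 0).astype(float)  # binary response

# Step 1: series (cubic power series) estimate of h(x) = E[y|x] = Phi(x*beta).
P = np.vander(x, 4, increasing=True)
coef, *_ = np.linalg.lstsq(P, y, rcond=None)
h_hat = P @ coef

# Trimming indicator: keep observations with fitted probability inside (0, 1).
keep = (h_hat > 0.01) & (h_hat < 0.99)

# Step 2: regress Phi^{-1}(h_hat) on x over the trimmed sample.
z = np.array([nd.inv_cdf(h) for h in h_hat[keep]])
slope = np.polyfit(x[keep], z, 1)[0]   # should be close to beta
```

Since Φ^(-1)(Φ(xβ)) = xβ, the second-step slope recovers β when the first-step series estimate is accurate.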
Theorem 7.1: Suppose that i) z_i are i.i.d.; ii) Assumptions 6.0 and 6.1 - 6.5 are satisfied for v = 0; iii) x = (x_1', x_2')'; iv) p(x) has continuous derivatives of up to order d on X; v) E[(x_1 - x_2', p(x))'(x_1 - x_2', p(x))] is nonsingular; vi) Assumption 7.1 is satisfied with T < 1/4 and γ > max{2T/d_1, T/(d + d_2)}. Then √n[(β̂', σ̂) - (β_o', σ_o)] → N(0, V) and V̂ → V.

7.2 Nonparametric Consumer Surplus

μ̂ = Σ_{i=1}^n ∫ ĝ(x_1, x_{2i}) dx_1 / n,   V̂ = Σ_{i=1}^n Ĝ_i² / n,

Ĝ_i = ∫ ĝ(x_1, x_{2i}) dx_1 - μ̂ + {Σ_{j=1}^n ∫ p^K(x_1, x_{2j}) dx_1 / n}' Σ̂^(-1) p^K(x_i) [y_i - ĝ(x_i)].
Theorem 7.2: Suppose that i) z_i are i.i.d.; ii) Assumptions 6.0 - 6.2 and 6.5 are satisfied for v = 0, s > 4; iii) x = (x_1, x_2')' and the distribution of x is absolutely continuous with respect to the product of the uniform density on X_1 and the distribution of x_2, with bounded density f(x_1, x_2); g(x) and f(x_1|x_2) are continuously differentiable to order d on the support of x, with d > 3d_1/2, and either a) G is the set of all mean-square integrable functions of x, so that δ(x) = Π(f(x)|G), or b) G = {g_1(x_2) + x_1'γ} and E[x_1|x_2] and Var(x_1|x_2) are continuously differentiable in x_2; iv) Assumption 7.1 is satisfied with T < (s-2)/5s and γ > d_1/[2(d + d_2)]. Then √n(μ̂ - μ) → N(0, V) and V̂ → V.
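The consumer surplus estimator averages the integral of the fitted series over the first regressor, an integral that is available in closed form for a power-series basis. A minimal sketch under assumed specifics (the tensor-product quadratic basis, the unit support for x_1, and the hypothetical demand function g are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x1 = rng.uniform(0, 1, n)
x2 = rng.uniform(0, 1, n)
y = x1 * x2 + x2 + rng.normal(0, 0.05, n)   # hypothetical g(x1, x2) = x1*x2 + x2

# Tensor-product power-series basis p^K(x) with terms x1^j * x2^k.
deg = 2
terms = [(j, k) for j in range(deg + 1) for k in range(deg + 1)]
P = np.column_stack([x1**j * x2**k for j, k in terms])
coef, *_ = np.linalg.lstsq(P, y, rcond=None)

# Integrate the fitted series over x1 on [0, 1] analytically: int x1^j dx1 = 1/(j+1).
I = np.column_stack([(1.0 / (j + 1)) * x2**k for j, k in terms])
mu_hat = np.mean(I @ coef)   # series estimate of E[ int g(x1, x2) dx1 ]
```

With g(x1, x2) = x1·x2 + x2 and both regressors uniform on [0, 1], the target is E[x2/2 + x2] = 0.75, which the plug-in series estimate recovers.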
7.3 Nonparametric Prediction
μ̂ = Σ_{i=1}^n [ĝ(v_i) - ĝ(x_i)]/n,   V̂ = Σ_{i=1}^n Ĝ_i²/n,

Ĝ_i = ĝ(v_i) - ĝ(x_i) - μ̂ + {Σ_{j=1}^n [p^K(v_j) - p^K(x_j)]/n}' Σ̂^(-1) p^K(x_i) [y_i - ĝ(x_i)].

Theorem 7.3: Suppose that i) z_i are i.i.d.; ii) Assumptions 6.0 - 6.2 and 6.5 are satisfied for v = 0, s > 4; iii) v = (v_1, x_2')' and the distribution of v is absolutely continuous with respect to the product of the uniform density on V_1 and the distribution of x_2, with bounded density f(v_1, x_2); g and f(v_1|x_2) are continuously differentiable to order d on the support of v, with d > 3d_1/2, and either a) G is the set of all mean-square integrable functions of x or b) G = {g_1(x_2) + x_1'γ} and E[x_1|x_2] and Var(x_1|x_2) are continuously differentiable in x_2; iv) Assumption 7.1 is satisfied with T < (s-2)/5s and γ > d_1/[2(d + d_2)]. Then √n(μ̂ - μ) → N(0, V) and V̂ → V.
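The prediction estimator is a simple plug-in: fit ĝ by series least squares, then average the difference of fitted values at the counterfactual and observed regressor points. A minimal sketch under assumed specifics (scalar regressor, cubic basis, a hypothetical linear g, and a fixed counterfactual shift):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.05, n)   # hypothetical g(x) = 1 + 2x
v = x + 0.5                                   # counterfactual regressor values

P = np.vander(x, 4, increasing=True)          # power-series basis p^K(x), K = 4
coef, *_ = np.linalg.lstsq(P, y, rcond=None)

def g_hat(t):
    """Fitted series regression evaluated at points t."""
    return np.vander(t, 4, increasing=True) @ coef

mu_hat = np.mean(g_hat(v) - g_hat(x))         # plug-in estimate of E[g(v) - g(x)]
```

Here the target is E[g(x + 0.5) - g(x)] = 2·0.5 = 1, and the plug-in series estimate is close to it.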
7.4 Average Derivatives
V̂ = Σ_{i=1}^n Ĝ_i Ĝ_i'/n,

Ĝ_i = m(z_i, ĥ(x_i)) - μ̂ - [Σ_{j=1}^n {∂m(z_j, ĥ(x_j))/∂h}{∂p^K(x_j)/∂x}'/n] Σ̂^(-1) p^K(x_i) [y_i - ĝ(x_i)].

Theorem 7.4: Suppose that i) g(x_i) = E[y_i|x_i, y_{i-1}, x_{i-1}, ...], Assumption 5.1 is satisfied, and m(z, h) is linear in h, with |a_j(z)| ≤ Δ < ∞; ii) K = K(n) is not random; iii) Assumptions 6.0 - 6.2, 6.5, and 6.1 b) are satisfied; iv) the distribution of all elements of x is absolutely continuous with respect to the product of Lebesgue measure, with density f(x) that is continuously differentiable in x on the interior of a convex support, zero on the boundary of the support, and such that E[||∂f(x)/∂x||] is finite; v) s > 4μ/(μ - 2); vi) K ≤ n^T for some T < (s-2)/[s(14+4ν)], and K = O(n^((s-2)/[s(7+4ν)])). Then √n(μ̂ - μ) → N(0, V) and V̂ → V.
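For a power-series estimator the average derivative is obtained by differentiating the basis analytically and averaging, which is what makes the plug-in form above computable. A minimal sketch under assumed specifics (scalar x, cubic basis, hypothetical g(x) = x²):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0, 1, n)
y = x**2 + rng.normal(0, 0.05, n)    # hypothetical g(x) = x^2, so E[dg/dx] = E[2x] = 1

P = np.vander(x, 4, increasing=True)  # basis 1, x, x^2, x^3
coef, *_ = np.linalg.lstsq(P, y, rcond=None)

# Differentiate the basis analytically: d/dx [1, x, x^2, x^3] = [0, 1, 2x, 3x^2].
dP = np.column_stack([np.zeros(n), np.ones(n), 2 * x, 3 * x**2])
mu_hat = np.mean(dP @ coef)           # series estimate of the average derivative
```

With x uniform on [0, 1] the target is E[2x] = 1, and the series plug-in estimate is close to it.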
7.5 Discussion

The conditions in Section 7.4 for multidimensional average derivatives are quite restrictive, but could be relaxed if better approximation rate results were available on approximation of derivatives and unbounded functions by power series. In particular, nonrandom K results from not having an approximation rate for δ(x), which is unbounded for average derivatives. Also, one can relax these conditions substantially for weighted average derivatives, where μ = E[w(x)∂g(x)/∂x] and the weight function w(x) is such that w(x)∂f(x)/∂x + f(x)∂w(x)/∂x is continuously differentiable on the support of x, including allowing for random K.

The estimators for the semiparametric random effects model and for nonparametric consumer surplus seem to be new. The result for nonparametric prediction is the first result on √n-consistency of such an estimator, and it includes the unconditional variance in the estimation of the asymptotic variance. Series estimators for average derivatives were previously suggested by Andrews (1991), although the result here includes conditions for √n-consistency and applies to time series.
Appendix A: Proofs of Theorems

Throughout this appendix C will denote a generic positive constant that can be different in different uses.
Proof of Theorem 2.1: Pathwise differentiability of μ(F) and the Lindeberg-Levy central limit theorem imply that for any S(z) in the tangent set, (√n(β̂ - β_o), Σ_{i=1}^n S(z_i)/√n)' converges in distribution to N(0, E[(ψ(z)', S(z))'(ψ(z)', S(z))]). It follows by asymptotic linearity that for any vector b, b'E[ψ(z)S(z)] is the limit of b'[μ(F_η) - μ(F_o)]/η along the corresponding path, while by pathwise differentiability b'E[ψ(z)S(z)] = b'E[d(z)S(z)]. Furthermore, by the conclusion of Theorem 2.1 of Van der Vaart (1991), this equality must hold for any path, so that choosing s(z) = ψ(z) - d(z), it follows that E[(ψ(z) - d(z))s(z)] = 0 for all mean-zero s(z), and hence ψ(z) = d(z).

Proof of Lemma 5.1: It suffices to prove the result for scalar m. Let m(z, h) = m(z, β, h), h_i = h(v_i), ĥ_i = ĥ(v_i), and m_i = m(z_i, h_i). By Assumption 5.2, a standard result implies max_{i≤n} ||ĥ_i - h_i|| = O_p(n^(-δ₂)) w.p.a.1. Then by an expansion,

(A.1)  ||Σ_{i=1}^n [m(z_i, ĥ_i) - m_i - M(z_i)(ĥ_i - h_i)]||/√n ≤ Σ_{i=1}^n ||∂²m(z_i, h̄_i)/∂h∂h'|| ||ĥ_i - h_i||²/√n
       ≤ max_{i≤n}{b_{02}(z_i)} Σ_{i=1}^n ||ĥ_i - h_i||²/√n = O_p(n^(1/δ)) O_p(n^(1/2 - 2δ₂)) = o_p(1).
Proof of Lemma 5.2: Suppress the j subscript. In Andrews (1990b) notation, let W_a = v_i, W̄_a = z_i, τ = 1(v ∈ V)ĥ(v), m(w_a, τ) = τ(v), q(w) = M(z), q = d, and k_a = dim(v). Note that Andrews Assumption F ii) is satisfied by hypothesis. Also by hypothesis, τ = 1(v ∈ V)ĥ(v) has derivatives of up to order q on V, and for |λ| ≤ q, sup_{v∈V} |D^λ τ̂(v) - D^λ τ_o(v)| = o_p(1). Let T = {τ(v) : sup_{v∈V} |D^λ τ(v)| ≤ C + 1 for all |λ| ≤ q}. Then sup_{τ∈T} [Σ_{|λ|≤q} ∫_V |D^λ τ(v)|² dv]^(1/2) < ∞, giving Andrews Assumption F iii). Also, m(w_a, τ) = τ(v) is zero (and hence constant) outside V by construction, giving Andrews Assumption F iv). Also, s' > 2μ/(μ - 2) implies μ > 2s'/(s' - 2), so that Andrews Assumptions v) and vi) are satisfied. Finally, τ̂(v) ∈ T w.p.a.1, so that the conclusion follows by Theorem II.7 of Andrews (1990b).
Proof of Theorem 5.3: Let m̂(β) = Σ_{i=1}^n m(z_i, β, ĥ(v_i))/n, m̃(β) = Σ_{i=1}^n m(z_i, β, h_o(v_i))/n, m̄(β) = E[m(z, β, h_o(v))], Q̂(β) = m̂(β)'Ŵm̂(β), and Q̄(β) = m̄(β)'Wm̄(β). By Assumption 5.2, sup_{β∈B} ||m̂(β) - m̃(β)|| → 0 in probability, while by z_i ergodic, m̃(β) → m̄(β) for each β, so that m̂(β) → m̄(β). Under Assumption 5.6 a), Assumption 5.2 implies that for each β, m̂(β) is convex, so that sup_{β∈B} ||m̂(β) - m̄(β)|| → 0 follows by Andrews (1987), as in Andersen and Gill (1982); under Assumption 5.6 b), uniform convergence holds directly, so that sup_{β∈B} |Q̂(β) - Q̄(β)| → 0. Noting that Q̄(β) is uniquely minimized at β_o, the conclusion now follows by the Wald argument for extremum estimators.
Proof of Theorem 5.4: By Lemmas 5.1 and 5.2, Asymptotic Linearity and Stochastic Equicontinuity are satisfied, and by orthogonality of M(z) and ĥ(v), √n ∫ M(z)[ĥ(v) - h(v)]dF(z) = 0. Then by the α-mixing central limit theorem of White and Domowitz (1984), √n m̂(β_o) = Σ_{i=1}^n m_i/√n + o_p(1) → N(0, Ω). The remainder of the proof then follows from a standard minimum distance argument, such as that in Newey (1988b).
Proof of Theorem 5.5: It follows by Lemma 5.1 that Asymptotic Linearity is satisfied, so that by Asymptotic Differentiability, the triangle inequality, and the α-mixing central limit theorem of White and Domowitz (1984), √n m̂(β_o) = Σ_{i=1}^n u_i/√n + o_p(1) → N(0, Ω). The remainder of the proof then follows by a standard minimum distance argument, such as that in Newey (1988b).
Proof of Theorem 5.6: By a mean value expansion,

(A.2)  Σ_{i=1}^n ||m̂_i - m_i||²/n ≤ C(||β̂ - β_o||² + sup_V ||ĥ(v) - h(v)||²) Σ_{i=1}^n [b_{10}(z_i)² + b_{01}(z_i)²]/n = O_p(ε_n²).

Therefore, Σ_{i=1}^n ||û_i - u_i||²/n = O_p(ε_n²). Let Ω̂_ℓ = Σ_i û_i û_{i+ℓ}'/n, Ω̃_ℓ the same expression with u_i in place of û_i, Ω̂ = Ω̂_0 + Σ_{ℓ=1}^L w(ℓ, L)[Ω̂_ℓ + Ω̂_ℓ'], and Ω̃ defined accordingly. By the Cauchy-Schwartz and Markov inequalities, for all ℓ ≥ 0,

(A.3)  ||Ω̂_ℓ - Ω̃_ℓ|| ≤ {Σ_i ||û_i - u_i||²/n}^(1/2){Σ_i ||û_{i+ℓ} - u_{i+ℓ}||²/n}^(1/2)
       + {Σ_i ||u_i||²/n}^(1/2){Σ_i ||û_{i+ℓ} - u_{i+ℓ}||²/n}^(1/2) + {Σ_i ||u_{i+ℓ}||²/n}^(1/2){Σ_i ||û_i - u_i||²/n}^(1/2) = O_p(ε_n).

In case a), ||Ω̂ - Ω̃|| = O_p(ε_n) = o_p(1), while ||Ω̃ - Ω|| → 0 follows by the law of large numbers, giving the conclusion. In case b), there is a sequence of numbers L'_n = δ_n/ε_n such that Prob(L ≤ L'_n) → 1, where L'_n can be chosen as an integer by the argument from Newey (1990b). Then by boundedness of w(ℓ, L) and eq. (A.3), with probability approaching one, ||Ω̂ - Ω̃|| ≤ ||Ω̂_0 - Ω̃_0|| + C Σ_{ℓ=1}^{L'_n} ||Ω̂_ℓ - Ω̃_ℓ|| = (1 + CL'_n)O_p(ε_n) = o_p(1). Also, arguing as in Kool (1988), with probability approaching one Ω̂ can be replaced by Ω̃_0 + Σ_{ℓ=1}^{L'_n} w(ℓ, L)[Ω̃_ℓ + Ω̃_ℓ'], and by Davydov's inequality, ||Σ_{ℓ>L'_n} [Ω_ℓ + Ω_ℓ']|| = O(L'_n/√n) = o(1). Finally, applying the dominated convergence theorem as in Newey and West (1987), it follows that ||Ω̃_0 + Σ_{ℓ=1}^{L'_n} w(ℓ, L)[Ω̃_ℓ + Ω̃_ℓ'] - Ω|| = o_p(1), giving the conclusion.
Proof of Theorem 6.1: To show consistency, note first that iii) implies that K^(1/2)ζ_0(K)/√n = o(1) and K^(1/2)ζ_0(K)K^(-α) = o(1), so that Assumption 5.2 is satisfied by Theorem 6.1 of Newey (1991). Consistency of β̂ then follows by Theorem 5.3. Asymptotic Linearity follows by Lemma 5.1 and iv), since by Theorem 6.1 of Newey (1991), Σ_{i=1}^n ||ĝ_i - g_i||²/n = O_p(K/n + K^(-2α)) = o_p(n^(-2δ₂)), so Assumption 5.4 is satisfied.

Next, Asymptotic Differentiability is shown, with M(z) as derived in Section 4. For notational convenience the j subscript will be dropped and M(z) treated as a scalar (for vector M(z) the result follows by applying the following argument to each of its elements). Let objects without subscripts denote vectors of observations, Q = p(p'p)⁻p', M = (M(z_1), ..., M(z_n))', and δ̂ = QM, so that Q is idempotent. Then

(A.4)  M'(ĝ - g) - δ'(y - g) = (δ̂ - δ)'(ĝ - g) + (δ̂ - δ)'(y - g) + (δ̂ - δ)'g = R₁ + R₂ + R₃.

By Theorem 6.1 of Newey (1991) and iii),

(A.5)  |R₁|/√n ≤ √n ||δ̂ - δ|| ||ĝ - g|| = O_p(√n (K^(1/2)/√n + K^(-α_δ))(K^(1/2)/√n + K^(-α))) = o_p(1).

By Lemma 8.1 of Newey (1991), there are π_K and π_{δK} such that g_K(x) = p^K(x)'π_K and δ_K(x) = p^K(x)'π_{δK} satisfy ||g(x) - g_K(x)||_∞ ≤ CK^(-α) and ||δ(x) - δ_K(x)||_∞ ≤ CK^(-α_δ). Then by Q idempotent,

|R₂|/√n ≤ |(δ̂ - δ_K)'(y - g)|/√n + |(δ_K - δ)'(y - g)|/√n = R₂₁ + R₂₂.

By Lemma 9.8 of Newey (1991), with probability approaching one, ||δ̂ - δ_K|| ≤ CK^(-α_δ) and [(y - g)'Q(y - g)/n]^(1/2) = O_p((K/n)^(1/2)), so that R₂₁ = O_p(√n K^(-α_δ)(K/n)^(1/2)) = o_p(1). By the strong mixing hypotheses, Davydov's inequality, and Assumption 6.4,

|(δ_K - δ)'(y - g)|²/n = O_p(Σ_K E[|(δ_K(x) - δ(x))(y - g(x))|²]) = O_p(Σ_K K^(-2α_δ)),

so that R₂₂ = o_p(1). R₃ has the same form as R₂ with g and δ interchanged, so R₃/√n = o_p(1) follows similarly, noting that |ε| has bounded conditional moments for s > 2μ/(μ - 1). All of the hypotheses of Theorem 5.5 are thus satisfied and the first conclusion follows from its conclusion.

To prove the second conclusion, note first that by Theorem 6.1 of Newey (1991),

(A.6)  sup_x |ĝ(x) - g(x)| = O_p(K^(1/2)ζ_0(K)[K^(1/2)/√n + K^(-α)]) = O_p(ε_n).

It suffices to show the result for each element of V̂, so a(z) can be assumed to be scalar without loss of generality. Let ε_i = y_i - g(x_i), ε̂_i = y_i - ĝ(x_i), M̂_i = ∂m(z_i, β̂, ĥ(x_i))/∂h, and M̂ = (M̂_1, ..., M̂_n)'. Then

Σ_{i=1}^n ||â_i - a(z_i)||²/n ≤ C Σ_{i=1}^n ||M̂'p^K⁻p^K(ε̂_i - ε_i)||²/n + C Σ_{i=1}^n ||(M̂'p^K⁻p^K - δ_i)ε_i||²/n = R₄ + R₅.

By Q idempotent, M̂'QM̂/n ≤ M̂'M̂/n = O_p(1), so that by Theorem 6.1 of Newey (1991),

(A.7)  R₄ ≤ sup_x |ĝ(x) - g(x)|² M̂'QM̂/n = O_p(ε_n²).

Also, by Q idempotent and a Taylor expansion,

(A.8)  (M̂ - M)'Q(M̂ - M)/n ≤ Σ_{i=1}^n ||M̂_i - M_i||²/n = O_p(ε_n²),

and, with ε̂_i bounded with probability approaching one,

(A.9)  R₅ ≤ C(max_{i≤n} ε̂_i²) Σ_{i=1}^n ||δ̂(x_i) - δ(x_i)||²/n = O_p(ε_n²).

Therefore Σ_{i=1}^n ||â_i - a(z_i)||²/n = O_p(ε_n²), so that consistency of V̂, and hence the second conclusion, follows by Theorem 5.6.
Proof of Theorem 6.2: First, consider the case where Assumption 6.7 a) is satisfied. Let Σ_K = ∫ p^K(x)p^K(x)'dF(x), π_K = Σ_K^(-1)E[p^K(x)g(x)], g_K(x) = p^K(x)'π_K, h_K(v) = A(v, g_K) = A(v)'π_K, Φ_K = ∫ δ(x)p^K(x)dF(x), δ_K(x) = p^K(x)'Σ_K^(-1)Φ_K, and Φ̂ = Σ_{i=1}^n M(z_i)A(v_i)'/n. Note that ĥ(v) is invariant to nonsingular linear transformations, so that without loss of generality p_1(x), p_2(x), ... can be assumed to be the functions in the conclusion of Lemma 8.4 of Newey (1991), for which the smallest eigenvalue of Σ_K is bounded below and there is a constant C such that sup_x |D^λ p_K(x)| ≤ Cζ_λ(K). Then, by Assumption 6.6,

(A.10)  max_{K≤K̄} ||A(v, p_K)||_Q ≤ Cζ_A(K).

It then follows by |M(z)| having finite moments for s > 2μ/(μ - 1) and Lemma 9.6 of Newey (1991) that

(A.11)  ||Σ̂ - Σ|| = O_p(Kζ_0(K)²/√n) = o_p(1),   ||Φ̂ - Φ_K|| = O_p(K^(1/2)ζ_A(K)/√n) = o_p(1).

Also, as in the proof of Lemma 9.8 of Newey (1991),

(A.12)  ||Σ^(-1/2)p'(y - g)/√n|| = O_p(K^(1/2)).

Next, by Lemma 8.1 of Newey (1991) there exists π̃_K such that g̃_K(x) = p^K(x)'π̃_K satisfies ||g(x) - g̃_K(x)||_∞ ≤ CK^(-α). Note that ||π_K - π̃_K|| ≤ C[(π_K - π̃_K)'Σ(π_K - π̃_K)]^(1/2) = C|g_K(x) - g̃_K(x)|₂ ≤ C[|g(x) - g_K(x)|₂ + |g(x) - g̃_K(x)|₂] ≤ C|g(x) - g̃_K(x)|₂ ≤ CK^(-α). Therefore,

(A.13)  ||g(x) - g_K(x)||_∞ ≤ ||g(x) - g̃_K(x)||_∞ + ||g̃_K(x) - g_K(x)||_∞ ≤ C(K^(-α) + ||p^K(x)||_∞ ||π_K - π̃_K||) ≤ CK^(1/2)ζ_0(K)K^(-α).

By the definition of the least squares coefficients π_K it follows that E[p^K(x){g(x) - g_K(x)}] = 0, so that under uniform mixing,

(A.14)  E[Σ_K ||p^K'(g - g_K)/√n||²] ≤ C Σ_K E[||p^K(x)||²{g(x) - g_K(x)}²],

which converges to zero by i). Without uniform mixing, it follows by strong mixing and boundedness of p^K(x), g(x), and g_K(x) that the same expectation is bounded by C Σ_K Kζ_0(K)²K^(-2α), which converges to zero by ii) b). Also, ||Φ_K'Σ^(-1)||² ≤ C Φ_K'Σ^(-1)Φ_K ≤ CE[δ(x)²] is bounded, and the largest eigenvalue of Σ̂^(-1) is bounded in probability, so ||Φ̂'Σ̂^(-1) - Φ_K'Σ^(-1)|| ≤ ||(Φ̂ - Φ_K)'Σ̂^(-1)|| + ||Φ_K'Σ^(-1)(Σ - Σ̂)Σ̂^(-1)|| = o_p(1). Now, it follows by a little arithmetic that

(A.16)  Σ_{i=1}^n M(z_i)[ĥ(v_i) - h(v_i)]/n - δ'(y - g)/n - ∫ M(z)[h_K(v) - h(v)]dF(z)
  = (Φ̂ - Φ_K)'Σ̂^(-1)p'(g - g_K)/n + Φ_K'(Σ̂^(-1) - Σ^(-1))p'(g - g_K)/n + Φ_K'Σ^(-1)p'(g - g_K)/n
  + (Φ̂ - Φ_K)'Σ̂^(-1)p'(y - g)/n + Φ_K'(Σ̂^(-1) - Σ^(-1))p'(y - g)/n + (δ_K - δ)'(y - g)/n
  + Σ_{i=1}^n {M(z_i)[h_K(v_i) - h(v_i)] - ∫ M(z)[h_K(v) - h(v)]dF(z)}/n.

The first five terms on the right are o_p(1/√n) by the Markov inequality, (A.11), (A.12), (A.14), and iii). Denote the sixth term, the seventh term, and the bias term ∫ M(z)[h_K(v) - h(v)]dF(z) by R₄, R₅, and R₆, respectively. Next, note that by either uniform mixing or the bound on the conditional covariances,

(A.17)  E[{(δ_K - δ)'(y - g)}²/n] ≤ CE[{1 + ε²}{δ_K(x) - δ(x)}²] ≤ CE[{1 + E[ε²|x]}{δ_K(x) - δ(x)}²] ≤ Cε_δ(K)²,

so that by Assumption 6.4 and iii), n|R₄|² = O_p(Σ_K ε_δ(K)²) = o_p(1). Similarly, note that by eq. (A.13), |h_K(v) - h(v)|_∞ ≤ CK^(1/2)ζ_A(K)K^(-α), so that by strong mixing, Davydov's inequality, and i),

(A.18)  n|R₅|² = O_p(Σ_K |M(z)[h_K(v) - h(v)]|²) = O_p(Σ_K Kζ_A(K)²K^(-2α)) = o_p(1).

Furthermore, by Assumption 4.1, Φ_K = E[δ(x)p^K(x)], so that ∫ δ(x)g_K(x)dF(x) = ∫ δ_K(x)g_K(x)dF(x) = ∫ δ_K(x)g(x)dF(x), and hence

(A.19)  √n|R₆| = √n|∫ δ(x)g_K(x)dF(x) - E[δ(x)g(x)]| = √n|∫ [δ(x) - δ_K(x)][g_K(x) - g(x)]dF(x)| ≤ √n ε_δ(K)K^(-α),

where the last inequality follows by projection orthogonality, and √n ε_δ(K)K^(-α) → 0 by iii), with ε_δ(K) monotonically decreasing in K. This gives the first conclusion under Assumption 6.7 a). For the case where Assumption 6.7 b) is satisfied, ε_δ(K) is bounded by a power of K, and it follows by Lemma 8.2 of Newey (1991) and Assumption 6.4 that all terms above where α_δ appears are small, so that all the terms depending on α_δ can be ignored, i.e. α_δ in the statement of the Theorem can be replaced by any (arbitrarily large) positive number, again giving the first conclusion.

The second conclusion will be shown only under case a) of Assumption 6.7, because under case b) the result will follow as above. For notational convenience, suppress the j subscript on each h_j(v). Let ε_n = K^(1/2)ζ_A(K)[K^(1/2)/√n + K^(-α)] if m(z, β, h) is linear in h and ε_n = Kζ_A(K)²[K^(1/2)/√n + K^(-α)] otherwise; it follows from Theorem 6.1 of Newey (1991) that |ĥ(v) - h(v)|_∞ = O_p(ε_n). Therefore, by Theorem 5.6, it only remains to be shown that Σ_{i=1}^n ||â_i - a(z_i)||²/n = O_p(ε_n²). Let Φ̃ = Σ_{i=1}^n m_h(z_i, β̂, ĥ(v_i))A(v_i)'/n. It follows from (A.10) and an expansion that sup_{v∈V} ||A(v)|| ≤ CK^(1/2)ζ_A(K) and

(A.20)  ||Φ̃ - Φ_K|| ≤ {||β̂ - β_o|| Σ_i ||m_h(z_i, β̂, ĥ(v_i)) - m_h(z_i, β_o, h(v_i))|| ||A(v_i)||/n + |ĥ(v) - h(v)|_∞ Σ_i ||m_h(z_i, β_o, h(v_i))|| ||A(v_i)||/n} = O_p(ε_n + ε_Σ),

for ε_Σ = Kζ_A(K)²/√n, so that as in the proof of eq. (A.11), ||Φ̃'Σ̂^(-1) - Φ_K'Σ^(-1)|| = O_p(ε_n + ε_Σ) = o_p(1) and ||Φ̃'Σ̂^(-1)|| = O_p(1). Then, with ε̂_i = y_i - ĝ(x_i),

(A.21)  Σ_{i=1}^n |â_i - a_i|²/n ≤ C Σ_i |Φ̃'Σ̂^(-1)p_i(ε̂_i - ε_i)|²/n + C Σ_i |(Φ̃'Σ̂^(-1) - Φ_K'Σ^(-1))p_i ε_i|²/n + C Σ_i |[δ_K(x_i) - δ(x_i)]ε_i|²/n = R₁ + R₂ + R₃.

Therefore, by Theorem 6.1 of Newey (1991),

(A.22)  |R₁| ≤ C||Φ̃'Σ̂^(-1)||² ||p^K(x)||_∞² Σ_{i=1}^n |ĝ(x_i) - g(x_i)|²/n = O_p(Kζ_0(K)²[(K/n) + K^(-2α)]) = O_p(ε_n²).

Also, Σ_{i=1}^n ||Σ^(-1/2)p_i||²/n = tr(Σ^(-1/2)(p'p/n)Σ^(-1/2))/n, which with probability approaching one is less than or equal to twice the dimension of Σ divided by n, so that

(A.23)  |R₂| ≤ C(max_{i≤n} ε_i²)(Σ_{i=1}^n ||Σ^(-1/2)p_i||²/n)[||(Φ̃ - Φ_K)'Σ̂^(-1/2)||² + ||Φ_K'Σ^(-1/2)||² ||Σ^(-1/2)(Σ̂ - Σ)Σ̂^(-1/2)||²] = O_p(n^(-1)K(ε_n + ε_Σ)²) = O_p(ε_n²).

Finally, |R₃| = O_p(Σ_K E[{δ_K(x) - δ(x)}²]) = O_p(ε_δ(K)²) = O_p(ε_n²), using E[ε²|x] bounded. The second conclusion then follows by Theorem 5.6.
Proof of Theorem 7.1: The proof proceeds by verifying the hypotheses of Theorem 6.1. The estimator has the form of Section 6, where

m(z, β, σ, h) = 1(0 < h₁ < 1, 0 < h₂ < 1)(x₁ - x₂', Φ^(-1)(h₂))'[Φ^(-1)(h₁) - (x₁ - x₂)'β - σΦ^(-1)(h₂)].

Assumption 5.1 holds by i). Note that Φ^(-1)(·) is continuously differentiable on any set where its argument is bounded away from zero and one, and that h_t(x) = Φ([x_t'β + p(x)]/σ_t) is bounded away from zero and one, since x and p(x) are bounded. It follows that Assumptions 5.3 and 5.6 are satisfied for case a) of Assumption 5.6 and Δ = ∞, 1 ≤ j + k ≤ 2. Also, note that M(z) is a bounded function of x, so that Assumptions 6.1 - 6.5 also hold, with v = 0, the set of nuisance functions being all mean-square integrable functions of x, M_j(z) = δ_j(x), d₂ = d, s = ∞, and ζ_0(K) = K^(1/2). Noting that ε_i = y_i - h(x_i) is bounded, it follows by vi) that Theorem 6.1 iii) and iv) are satisfied, since each of the required rate terms converges to zero. Theorem 6.1 v) is implied by Theorem 6.1 iii), so the first conclusion now follows by Theorem 6.1. Next, note that ε_i is bounded, so that E[ε²|x] is bounded and the second conclusion also follows by Theorem 6.1.
Proof of Theorem 7.2: Follows similarly to the proof of Theorem 7.3 below, on noting that 1) Assumption 4.1 is satisfied, where δ(x) = ℓ(X₁)Π(f(x)|G) and ℓ(X₁) is the Lebesgue measure of X₁; and 2) E[A(v, g)] = E[∫ g(x₁, x₂)dx₁] = ℓ(X₁)E[g(x)f(x)], so that in case b), where ℓ(X₁)E[f(x)|x₂] = f(x₁|x₂)^(-1)f(x₂), the projection has an explicit form.
Proof of Theorem 7.3: The proof proceeds by verifying the hypotheses of Theorem 6.2. The estimator has the form of Section 6, where m(z, β, h) = g(v) - g(x) - β, so that Assumptions 5.3 and 5.6 are satisfied for case a) of Assumption 5.6 and Δ = ∞, j + k ≤ 2. Assumption 5.1 holds by i). Let G be as in iii), so that G is closed and Assumption 6.7 a) is satisfied. To discuss δ(x), note first that by Lemma 8.0 of Newey (1991), δ(x) = Π(f(x)|G) exists, so that Assumption 4.1 is satisfied, as shown in Section 4. Under iii) a), it follows by Lemma 8.2 of Newey (1991) that ε_δ(K) ≤ CK^(-d_δ/d); under iii) b), the projection has an explicit form,

Π(M(z)|G) = E[M(z)|x₂] + {x₁ - E[x₁|x₂]}(E[Var(x₁|x₂)])^(-1)E[{x₁ - E[x₁|x₂]}M(z)].

Noting that ζ_0(K) = ζ_A(K) = CK^(1/2) and α = d/a for a the dimension of the regressors, it follows by iv) that each of n^(T(3 - 2d/a)), n^(1/2 - γ(d + d₂)/a), and n^(5T - 1 + (1/s)) converges to zero, so that the hypotheses of Theorem 6.2 are satisfied. The conclusions now follow from the conclusion of Theorem 6.2.
Proof of Theorem 7.4: First the result will be proven when x is a scalar. The estimator has the form of Section 6, where m(z, β, h) = m(z, ∂g(x)/∂x) - β, so that Assumptions 5.3 and 5.6 are satisfied for case a) of Assumption 5.6 and Δ = ∞, j + k ≤ 2. Assumption 5.1 holds by i). As shown in Section 4, δ(x) = Π(f(x)^(-1)∂f(x)/∂x|G), so that by the usual mean-square spanning result for polynomials, ε_δ(K) → 0 as K → ∞. Also, since Assumption 6.7 b) is satisfied, none of the conditions of Theorem 6.2 that depend on ε_δ(K) are binding. When m(z, h) is linear in h, both n^((1/s) + T[(5/2) + 2ν] - (1/2)) and n^((1/s) + T[(7/2) + ν] - (1/2)) go to zero, while otherwise n^((1/s) + T(7 + 2ν) - (1/2)) converges to zero, so the conclusion follows by Theorem 6.2.
References

Ahn, H. and C.F. Manski (1989): "Distribution Theory for Binary Choice Analysis with Nonparametric Estimation of Expectations," mimeo, University of Wisconsin.

Ai, C. (1990): "Semiparametric Estimation of Index Distribution Models," MIT Ph.D. Thesis.

Andersen, P.K. and R.D. Gill (1982): "Cox's Regression Model for Counting Processes: A Large Sample Study," Annals of Statistics, 10, 1100-1120.

Andrews, D.W.K. (1987): "Consistency in Nonlinear Econometric Models: A Generic Uniform Law of Large Numbers," Econometrica, 55, 1465-1471.

Andrews, D.W.K. (1990a): "Asymptotics for Semiparametric Econometric Models: I. Estimation," mimeo, Cowles Foundation, Yale University.

Andrews, D.W.K. (1990b): "Asymptotics for Semiparametric Econometric Models: II. Stochastic Equicontinuity," mimeo, Cowles Foundation, Yale University.

Andrews, D.W.K. (1991): "Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Models," Econometrica, 59, 307-345.

Bickel, P., C.A.J. Klaassen, Y. Ritov, and J.A. Wellner (1990): "Efficient and Adaptive Inference in Semiparametric Models," monograph, forthcoming.

Boos, D.D. and R.J. Serfling (1980): "A Note on Differentials and the CLT and LIL for Statistical Functions, with Application to M-Estimates," Annals of Statistics, 8, 618-624.

Breiman, L. and J.H. Friedman (1985): "Estimating Optimal Transformations for Multiple Regression and Correlation," Journal of the American Statistical Association, 80, 580-598.

Buckley, J. and I. James (1979): "Linear Regression with Censored Data," Biometrika, 66, 429-436.

Chamberlain, G. (1980): "The Analysis of Covariance with Qualitative Data," Review of Economic Studies, 47, 225-238.

Fernholz, L.T. (1983): von Mises Calculus for Statistical Functionals, Lecture Notes in Statistics 19, Berlin: Springer.

Hansen, L.P. (1985): "Two-Step Generalized Method of Moments Estimators," discussion, North American Winter Meeting of the Econometric Society, New York.

Hardle, W. and T. Stoker (1989): "Investigation of Smooth Multiple Regression by the Method of Average Derivatives," Journal of the American Statistical Association, 84, 986-995.

Hausman, J.A. and W.K. Newey (1991): "Nonparametric Estimation of Exact Consumer Surplus and Deadweight Loss," working paper, MIT Department of Economics.

Hotz, J. and R. Miller (1989): "Estimation of Dynamic Discrete Choice Models," preprint, Carnegie-Mellon University.

Huber, P. (1967): "The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley: University of California Press.

Ibragimov, I.A. and R.Z. Has'minskii (1981): Statistical Estimation: Asymptotic Theory, New York: Springer-Verlag.

Ichimura, H. (1987): "Estimation of Single Index Models," Ph.D. dissertation, Massachusetts Institute of Technology.

Kim, J. and D. Pollard (1989): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

Klein, R.W. and R.S. Spady (1987): "An Efficient Semiparametric Estimator of the Binary Response Model," manuscript, Bell Communications Research.

Kool, H. (1988): "A Note on Consistent Estimation of Heteroskedastic and Autocorrelated Covariance Matrices," mimeo, Department of Econometrics, Free University of Amsterdam.

Koshevnik, Y.A. and B.Y. Levit (1976): "On a Non-parametric Analogue of the Information Matrix," Theory of Probability and Applications, 21, 738-753.

Lorentz, G.G. (1986): Approximation of Functions, New York: Chelsea Publishing Company.

Manski, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

Manski, C. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

Newey, W.K. (1988a): "Adaptive Estimation of Regression Models Via Moment Restrictions," Journal of Econometrics, 38, 301-339.

Newey, W.K. (1988b): "Asymptotic Equivalence of Closest Moments and GMM Estimators," Econometric Theory, 4, 336-340.

Newey, W.K. (1990a): "Semiparametric Efficiency Bounds," Journal of Applied Econometrics, 5, 99-135.

Newey, W.K. (1990b): "Series Estimation of Regression Functionals," mimeo, Department of Economics, Princeton University.

Newey, W.K. (1991): "Consistency and Asymptotic Normality of Nonparametric Projection Estimators," mimeo, MIT Department of Economics.

Newey, W.K. and K.D. West (1987): "A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator," Econometrica, 55, 703-708.

Newey, W.K. and P.A. Ruud (1991): "Density Weighted Least Squares Estimation," working paper, MIT Department of Economics.

Pfanzagl, J. and W. Wefelmeyer (1982): Contributions to a General Asymptotic Statistical Theory, New York: Springer-Verlag.

Powell, J.L., J.H. Stock, and T.M. Stoker (1989): "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403-1430.

Reeds, J.A. (1976): On the Definition of Von Mises Functionals, Ph.D. Thesis, Harvard University.

Ritov, Y. and P.J. Bickel (1987): "Achieving Information Bounds in Non and Semiparametric Models," Technical Report No. 116, Department of Statistics, University of California, Berkeley.

Robinson, P. (1988): "Root-N-Consistent Semiparametric Regression," Econometrica, 56, 931-954.

Robinson, P. (1989): "Hypothesis Testing in Semiparametric and Nonparametric Models for Economic Time Series," Review of Economic Studies, 56, 511-534.

Ruud, P.A. (1986): "Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution," Journal of Econometrics, 32, 157-187.

Stock, J.H. (1989): "Nonparametric Policy Analysis," Journal of the American Statistical Association, 84, 567-575.

Stoker, T.M. (1990): "Smoothing Bias in Derivative Estimation," MIT Sloan School of Management, Working Paper 8185-90-EFA.

Stone, C.J. (1985): "Additive Regression and Other Nonparametric Models," Annals of Statistics, 13, 689-705.

Stone, C.J. (1990): "L_2 Rate of Convergence for Interaction Spline Regression," Technical Report No. 268, Berkeley Statistics Department.

Van der Vaart, A. (1991): "On Differentiable Functionals," Annals of Statistics, 19, 178-204.

Von Mises, R. (1947): "On the Asymptotic Distributions of Differentiable Statistical Functionals," Annals of Mathematical Statistics, 18, 309-348.

White, H. and I. Domowitz (1984): "Nonlinear Regression with Dependent Observations," Econometrica, 52, 143-161.