Efficiency of Weighted Average Derivative Estimators

Whitney K. Newey
Thomas M. Stoker

No. 92-8
April 1992
Economics Working Paper #92-8

EFFICIENCY OF WEIGHTED AVERAGE DERIVATIVE ESTIMATORS
AND INDEX MODELS

by Whitney K. Newey*
and Thomas M. Stoker**

June 1988, revised April, 1992

*Department of Economics
Massachusetts Institute of Technology
Cambridge, MA 02139

**Sloan School of Management
Massachusetts Institute of Technology
Cambridge, MA 02139
An earlier version of this paper was presented at the 1988 summer meeting of the Econometric Society at the University of Minnesota. Financial support was provided by the NSF and the Sloan Foundation. Helpful comments were provided by referees and G. Chamberlain, L. Hansen, R. Matzkin, and D. McFadden.
ABSTRACT

Weighted average derivatives are useful parameters for semiparametric index models, including limited dependent variables, and in nonparametric demand estimation. Efficiency of weighted average derivative estimators is a concern, because the weight may affect efficiency and the presence of a nonparametric function estimator might lead to low efficiency. The purpose of this paper is to give efficiency results for average derivative estimators, including formulating estimators that have high efficiency.

Our efficiency analysis proceeds by deriving the semiparametric efficiency bounds for average derivative and index models, and then formulating average derivative estimators with high efficiency. We first derive the bound for weighted average derivatives of conditional location functionals, such as the conditional mean and median. This bound gives the asymptotic distribution of any sufficiently well-behaved weighted average derivative estimator. We compare efficiency for different location measures, e.g. mean versus median, finding that the comparison is similar to the location model. Many semiparametric limited dependent variable and regression models take the form of index models, where the location measure (e.g. conditional mean) depends only on linear combinations of the regressors, i.e. on "indices." We derive the semiparametric efficiency bound for these index models. We then compare the bound with the asymptotic variance of weighted average derivative estimators of the index coefficients. We derive the form of an efficient weight function when the distribution of the regressors is elliptically symmetric and discuss existence of optimal weights. We discuss combining different weights to achieve efficiency, in the process deriving a general condition for approximate efficiency of a pooled minimum chi-square estimator in a semiparametric model. Also, we discuss ways the type and number of weights could be selected to achieve high efficiency in practice.

Keywords: average derivative, index model, efficiency bound, optimal weights, minimum chi-square, spanning condition.
1. Introduction

Average derivatives are useful parameters in a number of semiparametric models. As discussed in Stoker (1986), they can be used in estimation of index models, including limited dependent variable models and partially linear regression models. Also, they are used in nonparametric demand estimation, as in Hardle, Hildenbrand, and Jerison (1991). Efficiency of average derivative estimators is a concern, because there are several types that have been proposed. Also, the presence of a nonparametric function estimator might lead to low efficiency. The purpose of this paper is to give efficiency results for average derivative estimators, including formulating estimators that have high efficiency.

Our efficiency analysis proceeds by deriving the semiparametric efficiency bounds for average derivative and index models, and then formulating average derivative estimators with high efficiency. We first derive the bound for weighted average derivatives of conditional location functionals, such as the conditional mean and median. For the conditional mean the previously suggested estimators of Hardle and Stoker (1989) and Stoker (1991a) attain this bound. This result is what one would anticipate, because this bound places no restrictions on the data distribution, and in such cases any estimator that is asymptotically equivalent to a sample average and sufficiently regular will be efficient (e.g. see Newey, 1990a). We also give the bound for other location functionals such as the median. We find that when the efficiency of average derivative estimators for different location functionals can be compared, the comparison is similar to that for location models, e.g. with the average derivative conditional median being more efficient than the conditional mean for "fat-tailed" distributions.
Many semiparametric limited dependent variable and regression models take the form of index models, where the location measure (e.g. conditional mean) depends only on linear combinations of the regressors, i.e. on "indices." We derive the semiparametric efficiency bound for these index models. We then compare the bound with the asymptotic variance of weighted average derivative estimators of the index coefficients. In an index model, consistent estimators arise from the use of different weighting functions, so an important efficiency question is the choice of weights. We derive the form of an efficient weight function when the distribution of the regressors is elliptically symmetric. It is also shown that linearity of certain conditional expectations is necessary for existence of an efficient weight. We discuss the possibility of combining different weights to achieve efficiency, showing that it is possible to obtain an approximately efficient estimator by pooling using minimum chi-square. We give general results on when such pooling will lead to efficiency in semiparametric models and on how the number of estimators to combine can be chosen from the data. These results are specialized to derive conditions for achieving approximate efficiency from combining many weighted average derivative estimators. Also, we suggest ways of combining a few weighted average derivative estimators so as to achieve high efficiency.
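The minimum chi-square pooling idea can be illustrated in a simple finite-dimensional setting. The sketch below is our illustration, not the paper's estimator: it combines two estimators of the same scalar parameter with a known (hypothetical) joint covariance matrix, using the minimum chi-square (generalized least squares) weighting.

```python
import numpy as np

def min_chi_square_pool(estimates, cov):
    """Pool several estimates of one scalar parameter by minimum chi-square:
    minimize (d - delta*1)' cov^{-1} (d - delta*1) over delta."""
    d = np.asarray(estimates, dtype=float)
    ones = np.ones_like(d)
    w = np.linalg.solve(cov, ones)      # cov^{-1} 1
    delta = w @ d / (w @ ones)          # GLS (minimum chi-square) combination
    variance = 1.0 / (w @ ones)         # variance of the pooled estimate
    return delta, variance

# Two estimators of the same delta, variances 1.0 and 2.0, correlation 0.3.
cov = np.array([[1.0, 0.3], [0.3, 2.0]])
pooled, var = min_chi_square_pool([0.9, 1.3], cov)
# The pooled variance is never larger than the best individual variance.
```

The pooled variance here works out to about 0.80, below the smaller individual variance of 1.0, which is the sense in which pooling different weightings can raise efficiency.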
Other papers give some efficiency results on average derivatives or index models. Chamberlain (1987) previously derived the semiparametric efficiency bound for conditional mean, single index models.(1) Following our initial work, Samarov (1990) gave the efficiency bound for the (unweighted) average derivative of the conditional mean. Newey (1991a) gives efficiency bounds for linear functionals of mean-square projections (that includes average derivatives of conditional means as a special case). None of these papers gives regularity conditions for the bounds.(2) Recently Hall and Ichimura (1991) derived some efficiency results for index estimators when there is a residual that is independent of the regressors.

(1) We cite the working paper Chamberlain (1987) because the index model bound does not appear in the published version.
(2) To be precise, they do not exhibit a sequence of regular parametric submodels for which the Cramer-Rao bound for the submodel approximates the candidates for the bound they suggest.
2. Weighted Average Derivatives and Partial Index Models

Let y denote a dependent variable, x a k × 1 vector of regressors, ρ(ε) a loss function of a real-valued variable, and

(2.1)    g(x) = argmin_g E[ρ(y − g)|x].

Here g(x) is a conditional location function. Examples include the conditional mean for ρ(ε) = ε², the conditional median for ρ(ε) = |ε|, as well as other more exotic location functionals such as quantiles or expectiles. Our interest is in properties of estimators of weighted average derivatives of g(x). Partition x as x = (x₁ᵀ, x₂ᵀ)ᵀ, suppose that x₁ is continuously distributed, and for a function a(x) let a′(x) = ∂a(x)/∂x₁. A weighted average derivative of g(x) is

(2.2)    δ = E[w(x)g′(x)],

where w(x) is a scalar function. For the average derivative to be well defined, x₁ must be continuously distributed, but x₂ can be discrete.
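As a concrete instance of definition (2.2), the following sketch (our illustration, with a hypothetical choice of g and w) checks a weighted average derivative by Monte Carlo against its closed form: for x ~ N(0,1), g(x) = exp(x) and w(x) = 1, so that δ = E[g′(x)] = E[exp(x)] = exp(1/2).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)

g_prime = np.exp(x)            # g(x) = exp(x) implies g'(x) = exp(x)
w = np.ones_like(x)            # unit weight

delta_mc = np.mean(w * g_prime)    # Monte Carlo estimate of E[w(x)g'(x)]
delta_true = np.exp(0.5)           # E[exp(x)] = exp(1/2) for x ~ N(0,1)
```

With 200,000 draws the Monte Carlo average is within a small fraction of a percent of exp(1/2).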
A primary motivation for weighted average derivatives is a partial index model, with

(2.3)    g(x) = G(x₁ᵀβ, x₂).

Under this model, δ = E[w(x)∂G(x₁ᵀβ, x₂)/∂(x₁ᵀβ)]·β, so that the weighted average derivative is proportional to β, and hence can be used to estimate β up to scale. Rodriguez and Stoker (1992) have recently used this model in specification analysis for estimation of conditional means.

The motivation for the index model where g(x) is the conditional mean and x₂ is not present is well known (e.g. see Stoker, 1986). Other cases can be motivated by a variety of semiparametric regression models.
In particular, suppose that

(2.4)    y|x ∼ y|(x₁ᵀβ, x₂),

a conditional distribution index model that allows y to have a conditional mean and variance (and other moments) that depend on x₂ in an arbitrary way. It is implied by many more restrictive models. For instance, it is implied by y = τ(x₁ᵀβ + σ(x₂)v), where v is independent of x and τ(r) is a transformation that can be either known or unknown, a model analyzed in Newey (1990b). The transformation could be τ(r) = 1(r > 0), corresponding to a binary choice model that allows for heteroskedasticity to depend on x₂ in an arbitrary way, or it could be τ(r) = ℓ⁻¹(r), corresponding to a model ℓ(y) = x₁ᵀβ + η that is important for duration data. This model implies that equation (2.3) is satisfied, so that partial index models are implied by transformed, semiparametric regression models that allow heteroskedasticity to depend nonparametrically on some regressors.
For the semiparametric model in equation (2.4), the choice of loss function ρ(ε) can be motivated by efficiency considerations similar to those for the linear model, namely that if the distribution of y given x is fat-tailed, then a more efficient estimator might be obtained by working with a ρ(ε) that gives less weight to large values of ε. This feature will become apparent from the semiparametric efficiency bounds derived below. Also, comparison of parameters from different choices of ρ(ε) may allow one to test restrictions on the conditional distribution of y given x, similarly to Koenker and Bassett (1982).
In some cases it is possible to weaken equation (2.4) so that essentially only one choice of ρ(ε) will produce a partial index model. An important such case is

(2.5)    y = τ(x₁ᵀβ + u(x₂) + v, x₂),    median(v|x) = 0,

where τ(r, x₂) is monotonic in r. Because the median of a monotonic transformation is the transformation of the median, this is a partial index model for the conditional median, where ρ(ε) = |ε|, but not for other conditional location measures. This model is a generalization of one considered by Powell (1991). For the case where x₂ is not present, Doksum and Samarov (1992) have suggested using average derivative estimators to estimate the inverse of τ when τ is monotonic.
The primary purpose of this paper is to develop the efficiency properties of weighted average derivative and partial index estimators, and discuss how and when different weighted average derivative estimators can be combined into an approximately efficient estimator of a partial index model. It is beyond the scope of this paper to discuss the properties of particular estimators, although the results here have implications for the asymptotic properties of average derivative estimators. As discussed in Newey (1991a), the semiparametric efficiency bound for an unrestricted functional, such as an average derivative, is the asymptotic variance of any sufficiently regular estimator. Thus, one would expect the asymptotic variance of average derivative estimators to have the form given below.
For instance, consider the following kernel estimator that is well-defined even when ρ(ε) is not smooth (e.g. for ρ(ε) = |ε|). Let f(x) be the density of x. By integration by parts,

(2.6)    δ = E[ℓ(x)g(x)],    ℓ(x) = −w′(x) − w(x)f′(x)/f(x).

For a kernel K(v), let K_h(x) = h⁻ᵏK(x/h), let f̂(x) = Σ_{i=1}^n K_h(x − x_i)/n be a kernel density estimator, and let ĝ(x) = argmin_g Σ_{i=1}^n K_h(x − x_i)ρ(y_i − g)/n be the kernel location estimator of Tsybakov (1982). Then an estimator of δ corresponding to equation (2.6) is

(2.7)    δ̂ = (Σ_{i=1}^n w(x_i)/n)[Σ_{i=1}^n ℓ̂(x_i)x_{1i}ᵀ/n]⁻¹ Σ_{i=1}^n ℓ̂(x_i)ĝ(x_i)/n,    ℓ̂(x) = −w′(x) − w(x)f̂′(x)/f̂(x).

For ρ(ε) = ε², this estimator is similar to those analyzed in Stoker (1991a).(3)

(3) The two leading terms form a nonparametric estimator of the identity matrix, and so do not contribute to the asymptotic variance, but they lead to a reduction in the severe finite sample bias in the estimator, as discussed in Stoker (1991b).
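A minimal numerical sketch of a kernel-based weighted average derivative estimator for the conditional mean case (ρ(ε) = ε²). This is our illustration rather than the exact estimator in (2.7): it estimates g′(x) directly by the slope of a local linear regression and averages w(x_i)ĝ′(x_i); the bandwidth, the linear data-generating design, and w(x) = 1 are hypothetical choices.

```python
import numpy as np

def local_linear_slope(x0, x, y, h):
    """Slope of a Gaussian-kernel-weighted linear fit of y on x at point x0."""
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)        # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    W = k[:, None] * X
    beta = np.linalg.solve(X.T @ W, W.T @ y)      # weighted least squares
    return beta[1]

rng = np.random.default_rng(1)
n, h = 2000, 0.3
x = rng.standard_normal(n)
y = 2.0 * x + 0.1 * rng.standard_normal(n)        # g(x) = 2x, so g'(x) = 2

slopes = np.array([local_linear_slope(xi, x, y, h) for xi in x])
delta_hat = np.mean(slopes)                       # w(x) = 1
```

Because g is linear here, the averaged local slopes recover the population average derivative of 2 closely; for nonlinear g the bandwidth choice matters much more.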
An alternative approach is to use a series estimator with smooth approximating functions. Let Pᴷ(x) be a K × 1 vector of differentiable functions, and suppose that linear combinations of this vector can approximate functions and their derivatives. Consider the estimator

(2.8)    δ̂ = Σ_{i=1}^n w(x_i)ĝ′(x_i)/n,    ĝ(x) = π̂ᵀPᴷ(x),    π̂ = argmin_π Σ_{i=1}^n ρ(y_i − πᵀPᴷ(x_i)).

This estimator is based on differentiating the series estimator ĝ(x) of g(x). For the conditional mean case this estimator is analyzed in Newey (1991a).
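A matching sketch of the series approach in (2.8) for the conditional mean case, with a polynomial basis; the data-generating design and the degree of the basis are our hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.standard_normal(n)
y = x ** 3 + 0.1 * rng.standard_normal(n)      # g(x) = x^3, so g'(x) = 3x^2

coef = np.polyfit(x, y, deg=4)                 # pi-hat: least squares series fit
dcoef = np.polyder(coef)                       # derivative of the fitted series
delta_hat = np.mean(np.polyval(dcoef, x))      # w(x) = 1: average of g'-hat(x_i)
# Population target: delta = E[3 x^2] = 3 for x ~ N(0,1).
```

The polynomial basis nests x³ exactly, so the averaged derivative is close to the population value 3 up to sampling noise.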
Regularity conditions for asymptotic normality of these estimators are beyond the scope of this paper. Nevertheless, the results of this paper do provide a formula for the asymptotic variance of these estimators. As discussed in Section 3, the semiparametric bound we derive will be the asymptotic variance of any estimator that is asymptotically equivalent to a sample average and sufficiently regular.
3. Efficiency for Weighted Average Derivative Estimation

In this Section we derive the semiparametric efficiency bound for weighted average derivative estimators. The average derivative is an unrestricted parameter, in that its definition places no substantive restrictions on the data distribution. The efficiency bound for estimators of such unrestricted parameters can be calculated as the variance of the pathwise derivative of the parameter with respect to the distribution of the data, as shown in Pfanzagl and Wefelmeyer (1982).
To discuss the pathwise derivative it is useful to introduce some terminology. Let z = (y, xᵀ)ᵀ denote a single observation and f(z) the true density of z (with respect to some dominating measure). A "parametric submodel" or "path" is a parametric family of densities f(z|θ) (with respect to some dominating measure) that pass through the truth, i.e. such that f(z|θ₀) = f(z) for some parameter value θ₀. Let δ(θ) denote the value of the average derivative in equation (2.2) when z is distributed as f(z|θ), and let S_θ(z) = ∂ln f(z|θ)/∂θ|_{θ=θ₀} denote the score of f(z|θ) at θ₀. The pathwise derivative is precisely defined, e.g., in Newey (1990a), where typically ψ(z) is a function with finite mean-square (i.e. E[ψ(z)ᵀψ(z)] < ∞) such that E[ψ(z)] = 0 and, for any sufficiently regular submodel,

(3.1)    ∂δ(θ)/∂θ|_{θ=θ₀} = E[ψ(z)S_θ(z)ᵀ].
o
When the
z
distribution of
is
unrestricted, as
here,
is
it
it
is
typically possible to
show that a parametric submodel can be chosen so that the score
function of
z
with mean zero and finite mean-square.
asymptotic variance of semiparametric estimators of
V =
(3.2)
E[0(z)0(z)
formula
Cramer-Rao bound for a submodel
V
when
Another interpretation of
estimator
(3.3)
8,
is
Ed/»(z)S Q (z)
G
T
is
all
9
is
The
parametric submodels.
](E[S Q (z)S Q (z)
This matrix
S (z)
^(z)
V
that by the reasoning of Stein (1956),
is
and the "delta-method."
approximately equal to
is
].
should be the supremum of Cramer-Rao bounds over
(3.1)
a lower bound on the
In that case,
8
approximates any
T
Briefly, the idea behind this
equation
S (z)
T
_1
])
E[S Q (z)i//(z)
bounded above by
approximately equal to
T
by
]
9
Another interpretation of V is as the asymptotic variance of an efficient estimator, with ψ(z) as the influence function of an efficient estimator δ̂ satisfying

(3.3)    √n(δ̂ − δ) = Σ_{i=1}^n ψ(z_i)/√n + o_p(1).

The "influence function" terminology, which originated in the robust estimation literature, is motivated by the fact that in large samples ψ(z) approximately gives the effect of a single observation on δ̂. It is a convenient way to think about the asymptotic properties of √n-consistent estimators, because most will satisfy equation (3.3) for some influence function (e.g. mean average derivative estimators, as shown in Stoker, 1991a). By the central limit theorem the asymptotic variance of δ̂ will be E[ψ(z)ψ(z)ᵀ], equal to the bound when the influence function equals the pathwise derivative in equation (3.2).
In general, any influence function will be a pathwise derivative, i.e. if an estimator satisfies equation (3.3) for some ψ(z), and certain regularity conditions hold, then its influence function satisfies equation (3.1) (e.g. see Newey, 1990a). When the model imposes no substantive restrictions on the data distribution, the set of scores is unrestricted (except for having mean zero), as it is in this Section. Therefore, there can be at most one pathwise derivative, and hence at most one influence function.(4) As discussed in Newey (1991a), this fact can be used to find the influence function of any estimator, by calculating the pathwise derivative of the parameter that is estimated under unrestricted distributions. In particular, the influence function of an average derivative estimator will be equal to the pathwise derivative derived below, because we impose no substantive restrictions on the data distribution. Thus, we are justified in interpreting V not only as the variance bound for average derivative estimators but also as the asymptotic variance of any (regular) average derivative estimator satisfying equation (3.3). For instance, the bound derived below will be the asymptotic variance of the kernel and series estimators suggested at the end of Section 2, under the plausible assumptions that they satisfy equation (3.3) and are regular.

Because the form of the pathwise derivative is the main result of this Section, we will first give this form and discuss it, postponing the regularity conditions until the end of the Section.

(4) For two different influence functions ψ(z) and ψ*(z), choosing S_θ(z) (approximately) equal to ψ*(z) − ψ(z) and differencing equation (3.1) implies 0 = E[{ψ*(z) − ψ(z)}ᵀS_θ(z)] = E[{ψ*(z) − ψ(z)}ᵀ{ψ*(z) − ψ(z)}].
Let ε = y − g(x) and

(3.4)    m(ε) = dρ(ε)/dε,    ν(x) = ∂E[m(y − g)|x]/∂g|_{g=g(x)},    u = −ν(x)⁻¹m(y − g(x)),

and let ℓ(x) be as given in equation (2.6). Theorem 3.1 below shows that the pathwise derivative is

(3.5)    ψ(z) = w(x)g′(x) − δ + ℓ(x)u = w(x)g′(x) − δ − ℓ(x)ν(x)⁻¹m(ε).

The semiparametric variance bound is then

(3.6)    V = E[ψ(z)ψ(z)ᵀ].

Two interesting special cases are the conditional mean and median. For the mean, u = y − g(x), so that

(3.7)    ψ(z) = w(x)g′(x) − δ + ℓ(x)[y − g(x)].

For the median, u = [2f(0|x)]⁻¹sgn(ε) for sgn(ε) = 1(ε > 0) − 1(ε < 0), where f(0|x) is the conditional density of ε given x at ε = 0, so that

(3.8)    ψ(z) = w(x)g′(x) − δ + ℓ(x)[2f(0|x)]⁻¹sgn(ε).

In general, the variance bound can be decomposed into two terms,

E[ψ(z)ψ(z)ᵀ] = Var(w(x)g′(x)) + E[u²ℓ(x)ℓ(x)ᵀ],

where the two terms are uncorrelated because E[u|x] = 0.
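As a consistency check on the definitions in (3.4), the mean and median cases can be worked out directly (our derivation):

```latex
% Conditional mean: rho(e) = e^2
m(\varepsilon) = 2\varepsilon, \qquad
\nu(x) = \partial E[2(y-g)\,|\,x]/\partial g = -2, \qquad
u = -\nu(x)^{-1} m(\varepsilon) = y - g(x).

% Conditional median: rho(e) = |e|
m(\varepsilon) = \mathrm{sgn}(\varepsilon), \qquad
\nu(x) = \partial E[\mathrm{sgn}(y-g)\,|\,x]/\partial g = -2 f(0|x), \qquad
u = [2 f(0|x)]^{-1}\,\mathrm{sgn}(\varepsilon).
```

Substituting each u into (3.5) gives the influence functions (3.7) and (3.8).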
The first term is the asymptotic variance of Σ_{i=1}^n w(x_i)g′(x_i)/n, and is the bound when g(x) is known. The second term is the bound when g(x) is unknown but the distribution f(x) of x is known, by the following argument:(5) when f(x) is known, a parametric submodel parameterizes the conditional distribution of y given x, so that scores satisfy E[S_θ(z)|x] = 0. Such a score can approximate any conditional mean zero function, and hence can approximate ℓ(x)u. Also, since E[u|x] = 0, the function ℓ(x)u satisfies equation (3.1) for these more restrictive parametric submodels, so the argument for equation (3.2) also applies here.

The magnitude of the second piece depends on the loss function ρ(ε), similarly to the way the asymptotic variance of location estimators depends on the loss function. Because ℓ(x) does not depend on ρ(ε), the dependence is through E[u²ℓ(x)ℓ(x)ᵀ] = E[E[u²|x]ℓ(x)ℓ(x)ᵀ], where E[u²|x] = ν(x)⁻²E[m(ε)²|x] for m(ε) and ν(x) given in equation (3.4). In particular, if ε were i.i.d. with density f(ε|x), E[u²|x] would be equal to the asymptotic variance for estimation of the location parameter μ = argmin_μ E[ρ(ε − μ)] corresponding to ρ(ε). Thus, when average derivatives for different ρ(ε) functions are equal, so that the corresponding bounds can be compared, the comparisons can be carried out in a way similar to that for estimation of location parameters. For example, when the conditional density of ε given x has "thick tails" for most values of x, the second piece of the bound will tend to be smaller for the conditional median than for the conditional mean.

(5) We thank Gary Chamberlain for suggesting the following interpretation.
Turning now to the statement of regularity conditions, we first give an assumption that is essential to the result.

Assumption 3.1: w(x)f(x) is zero on the boundary of the support of x.

This condition allows us to ignore boundary terms in the derivation of the bounds. Without this assumption, E[w(x)g′(x)] may include boundary terms that depend on g(x) evaluated at particular points. For continuously distributed regressors, the value of a conditional expectation at a point has an infinite variance bound, so that the average derivative will not be √n-consistently estimable. For a simple example, suppose that x is a scalar that is uniformly distributed on [0,1], g(x) is the conditional mean, and w(x) = 1. Then

(3.9)    δ = E[g′(x)] = ∫ g′(x)dx = g(1) − g(0),

which is not √n-consistently estimable. It is easy to show that the semiparametric variance bound is infinite in this case, which is consistent with the well known fact that the value of a conditional expectation at a particular point is not √n-consistently estimable. One way to guarantee that this assumption holds in index models is to choose w(x) to be zero outside the interior of the support of x. Of course, such a choice might be in conflict with the efficient choice of weights discussed in Section 5.
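The boundary identity in (3.9) is easy to verify numerically; the cubic g below is our hypothetical choice.

```python
import numpy as np

g = lambda x: x ** 3 + x             # a smooth g on [0, 1]
g_prime = lambda x: 3 * x ** 2 + 1

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 500_000)   # x uniform on [0,1], so f(x) = 1

delta_mc = np.mean(g_prime(x))       # Monte Carlo E[g'(x)] with w(x) = 1
boundary = g(1.0) - g(0.0)           # the boundary-value representation
```

Both quantities equal 2 here, illustrating that with uniform x and unit weight the average derivative reduces to a difference of boundary values of g.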
Additional regularity conditions are useful for deriving the result.

Assumption 3.2: The support X of x is convex and compact, w(x) and w′(x) are bounded on X, E[‖f′(x)/f(x)‖²] < ∞, and E[ψ(z)ψ(z)ᵀ] exists. Also, g(x) is continuously differentiable in x on X and Prob(ν(x) ≠ 0) = 1. There is a convex and compact set 𝒢 containing the closure of g(X) in its interior such that E[m(y − g)|x] = 0 has a unique solution at g = g(x) on 𝒢 and, for any ζ(y,x) that is bounded with E[ζ(y,x)|x] continuously differentiable in x, E[m(y − g)ζ(y,x)|x] is continuously differentiable in x and g on X × 𝒢 with bounded derivatives.

This assumption will not be satisfied if y is discrete and m(ε) is not continuous over the range of y − g(x). In particular, it does not hold if y is binary and m(ε) = sgn(ε), because E[m(y − g)|x] will not be continuous in g. The last smoothness condition does not seem very primitive, but it is straightforward to give sufficient conditions for particular m(ε). For example, for m(ε) = ε it will follow from the other assumptions and continuity of E[y²|x], and for m(ε) = sgn(ε) it will be implied by y being continuously distributed given x, with conditional density f(y|x) that is absolutely continuous with f(y + a|x)^{1/2} mean-square continuously differentiable in the scalar a.(6)
As is well known, the class of estimators must be restricted to obtain an efficiency bound result. We do this by restricting attention to estimators δ̂ that are regular, meaning that for a class of regular parametric submodels the limiting distribution of √n(δ̂ − δ(θ_n)) does not depend on {θ_n} when √n(θ_n − θ₀) is bounded and the data has distribution f(z|θ_n) for each sample size n.

(6) Assumption 3.2 follows in these cases by Lemmas C.2 and C.3 of Newey (1991b) and by E[m(y − g)ζ(y,x)|x] = ∫ m(u)ζ(u + g, x)f(u + g|x)du when y is continuously distributed.
Theorem 3.1: If Assumptions 3.1 and 3.2 are satisfied, then ψ(z) in equation (3.5) satisfies E[ψ(z_i)] = 0 and E[ψ(z_i)ᵀψ(z_i)] < ∞, and there is a class of regular parametric models such that ∂δ(θ)/∂θ|_{θ=θ₀} = E[ψ(z)S_θ(z)ᵀ]. If V = E[ψ(z)ψ(z)ᵀ] is nonsingular, then V is the supremum of Cramer-Rao bounds of regular parametric models, and any estimator δ̂ of δ that is regular satisfies √n(δ̂ − δ) → Z + U, where Z is distributed N(0,V) and U is independent of Z. Furthermore, if √n(δ̂ − δ) = Σ_{i=1}^n ψ*(z_i)/√n + o_p(1) and δ̂ is regular, then ψ*(z_i) = ψ(z_i).
In the environment considered here, where the data distribution is not restricted, the assumption that the pathwise derivative formula hold for a parametric submodel is just a convenient regularity condition. It is verified in the proof of the theorem that there exists a class of regular parametric models where the pathwise derivative formula is satisfied, with score that can be chosen to approximate any mean square random vector. The last conclusion of this theorem shows that any influence function must equal that given in equation (3.5). In particular, the kernel and series estimators described in equations (2.7) and (2.8) will have ψ(z) as their influence function, and hence asymptotic variance E[ψ(z)ψ(z)ᵀ], as long as they satisfy equation (3.3) (for some ψ*(z)) and are sufficiently regular.

4. Efficiency Bounds for Multiple Index Models

In this Section we derive the variance bound for the parameters of multiple index models, where the function g(x) described earlier is restricted to depend on a function of x and parameters. Let v(x,β) be a vector of functions of x and a q × 1 parameter vector β. A multiple index model is one where there is a function G(v) such that

(4.1)    g(x) = G(v(x,β₀)).

An important example is the partial index model discussed in Section 2, where v(x,β) = (x₁ᵀβ, x₂ᵀ)ᵀ.
The pathwise derivative can be used to calculate the efficiency bound for estimators of β₀, although this approach must be modified to account for the restrictions imposed by equation (4.1). We now carry out this calculation, using tangent set and projection methods. Because β₀ is now an implicit parameter rather than an explicit functional, a more specific parameterization of the problem is useful. Consider parameterizing the distribution of z by submodels f(z|θ) with θ = (βᵀ, ηᵀ)ᵀ, where η is a nuisance parameter vector for any feature of the distribution of z other than β. Let S_β and S_η denote the respective scores, where for notational convenience we have suppressed the θ argument.
Then equation (3.1) for a pathwise derivative reduces to

(4.2)    E[ψ(z)S_βᵀ] = I,    E[ψ(z)S_ηᵀ] = 0.

By the same reasoning following equation (3.2), the efficiency bound will be the variance of a ψ(z) such that this equation is satisfied and ψ(z) can be approximated by a linear combination of S_β and nuisance scores. This ψ(z) can be found by a projection calculation. Let

𝒯 = {t(z) : for each ε > 0 there is a constant matrix C and a score S_η with E[‖t − CS_η‖²] < ε}

be the mean-square closure of the union of all q × 1 linear combinations of all possible nuisance scores, referred to as the tangent set. Assuming that 𝒯 is a linear set, let t̄ be the mean square projection of S_β on 𝒯, characterized by the two conditions t̄ ∈ 𝒯 and E[(S_β − t̄)ᵀt] = 0 for all t ∈ 𝒯, and let S = S_β − t̄, referred to as the efficient score. The random vector ψ(z) = (E[SSᵀ])⁻¹S satisfies equation (4.2) and can be approximated by a linear combination of scores, so that the semiparametric variance bound for β will be V = (E[SSᵀ])⁻¹. Begun, Hall, Huang, and Wellner (1983) and Bickel, Klaasen, Ritov, and Wellner (1992) developed this projection form of the bound.
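The projection form of the bound can be checked in a finite-dimensional example. The sketch below (our illustration, with a hypothetical information matrix) draws jointly normal scores (S_β, S_η), forms the efficient score S = S_β − t̄ by regressing S_β on S_η over the sample, and compares (E[SSᵀ])⁻¹ with the partitioned-information formula (I_ββ − I_βη I_ηη⁻¹ I_ηβ)⁻¹.

```python
import numpy as np

# Joint information matrix of (S_beta, S_eta): 1-dim beta, 2-dim eta.
I = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])

rng = np.random.default_rng(4)
scores = rng.multivariate_normal(np.zeros(3), I, size=400_000)
s_beta, s_eta = scores[:, :1], scores[:, 1:]

# Mean-square projection of S_beta on the linear span of the nuisance scores.
C, *_ = np.linalg.lstsq(s_eta, s_beta, rcond=None)
S = s_beta - s_eta @ C                        # efficient score S = S_beta - t-bar

V_mc = np.linalg.inv(S.T @ S / len(S))        # (E[S S'])^{-1}, Monte Carlo
V_formula = np.linalg.inv(
    I[:1, :1] - I[:1, 1:] @ np.linalg.inv(I[1:, 1:]) @ I[1:, :1]
)
```

The two variance bounds agree up to Monte Carlo error, illustrating that projecting out the nuisance scores reproduces the partitioned inverse of the information matrix.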
The form of the efficient score is the main result of this Section, so we present and discuss it first, and then give regularity conditions. Let v = v(x,β₀) and

v_β = [∂v(x,β₀)ᵀ/∂β]·∂G(v)/∂v,

where by convention each component of v_β is set equal to zero if the corresponding component of v(x,β) does not depend on β, and let σ²(x) = E[u²|x] = ν(x)⁻²E[m(ε)²|x]. Theorem 4.1 below shows that the efficient score is

(4.3)    S = σ⁻²(x)·u·{v_β − E[σ⁻²(x)v_β|v]/E[σ⁻²(x)|v]}.

Thus, the semiparametric variance bound is

(4.4)    V = (E[SSᵀ])⁻¹ = (E[σ⁻²(x){v_β − E[σ⁻²(x)v_β|v]/E[σ⁻²(x)|v]}{v_β − E[σ⁻²(x)v_β|v]/E[σ⁻²(x)|v]}ᵀ])⁻¹.

Although the bound is complicated, it has a straightforward interpretation as the asymptotic variance of an optimally weighted m-estimator where the regression function has been "concentrated out." Assuming that ν(x) ≠ 0, define the weight ω(x) = 1/{ν(x)E[u²|x]}.
Note that

G(v(x,β),β) = argmin_{g(v(x,β))} E[ω(x)ρ(y − g(v(x,β)))] = argmin_g E[ω(x)ρ(y − g)|v(x,β)],

so that this function minimizes the population value of a weighted m-estimation criterion. Consider the estimator of β that minimizes the sample counterpart to this criterion, with G(v(x,β),β) substituted for g:

(4.5)    β̂ = argmin_β Σ_{i=1}^n ω(x_i)ρ(y_i − G(v(x_i,β),β)).

This is an estimator where the unknown function G(v) has been replaced by the function that minimizes the population counterpart, i.e. where G(v) has been "concentrated out" in the population. By the usual formula for a parametric m-estimator, β̂ will have asymptotic variance (E[σ⁻²(x)G_β(x)G_β(x)ᵀ])⁻¹ for G_β(x) = ∂G(v(x,β),β)/∂β|_{β=β₀}. This asymptotic variance is the semiparametric bound, because

(4.6)    G_β(x) = v_β − E[σ⁻²(x)v_β|v]/E[σ⁻²(x)|v].(7)
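The concentrated criterion (4.5) can be sketched for a single-index conditional mean (ρ(ε) = ε², ω(x) = 1): for each candidate β the unknown G is replaced by a leave-one-out kernel regression of y on the index, and β̂ minimizes the resulting sum of squared residuals. This is our illustration of the profiling idea, not the paper's estimator; the link function, bandwidth, and grid are hypothetical choices.

```python
import numpy as np

def profile_ssr(b2, x, y, h=0.3):
    """Concentrated criterion: kernel-regress y on v = x1 + b2*x2
    (leave-one-out), then return the sum of squared residuals."""
    v = x[:, 0] + b2 * x[:, 1]
    k = np.exp(-0.5 * ((v[:, None] - v[None, :]) / h) ** 2)
    np.fill_diagonal(k, 0.0)                      # leave-one-out
    g_hat = k @ y / (k.sum(axis=1) + 1e-8)        # G concentrated out
    return np.sum((y - g_hat) ** 2)

rng = np.random.default_rng(5)
n = 500
x = rng.standard_normal((n, 2))
y = np.tanh(x[:, 0] + 1.5 * x[:, 1]) + 0.1 * rng.standard_normal(n)

grid = np.arange(0.5, 2.51, 0.05)                 # first coefficient set to one
b2_hat = grid[np.argmin([profile_ssr(b, x, y) for b in grid])]
```

The grid minimizer lands near the true second index coefficient of 1.5; with a smooth optimizer in place of the grid this is the structure of a profile (concentrated) semiparametric least squares estimator.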
This interpretation suggests an approach to efficient estimation that proceeds by replacing ω(x) and G(v(x,β),β) in equation (4.5) by nonparametric estimators. In particular, it follows by Proposition 2 of Newey (1991a) that the replacement of G by a nonparametric estimator will not affect the asymptotic variance, essentially because G has been "concentrated out." Also, as usual in m-estimation, estimation of ω(x) will not affect the asymptotic variance, so that this estimator will be efficient. In the single index, conditional mean case, where σ²(x) = Var(ε|x), Ichimura's (1991) weighted kernel estimator that uses a known weight ω(x) can be interpreted as an estimator of this type. In general, the estimation of G and ω(x) will not affect the asymptotic distribution under appropriate regularity conditions.
The first assumption gives regularity conditions for the distribution of the data as a function of β. Let ε(β) = y − G(v(x,β),β), where G(v(x,β),β) denotes the location functional when β is the true parameter, and let E_β[·] denote expectations for a parametric submodel when β is the true parameter. The way that G can depend on β directly will be left unspecified, because it does not affect the form of the bound.

Assumption 4.1: Let X denote the support of x and B an open set containing β₀. i) The marginal distribution of x does not depend on β, and v(x,β) is bounded and continuously differentiable in β with bounded derivative on X × B; ii) G(v,β) is bounded and continuously differentiable in β with bounded derivatives;
(7) The moment restriction E[m(ε)|x] = 0 implies that ∂E[ω(x)m(ε)|v(x,β)]/∂β = ∂E[ω(x)E[m(ε)|x]|v(x,β)]/∂β = 0. Then differentiation of the first order condition E[ω(x)m(y − G(v(x,β),β))|v(x,β)] = 0 with respect to β gives ∂G(v,β)/∂β = −E[σ⁻²(x)v_β|v]/E[σ⁻²(x)|v], so that differentiating separately with respect to the two β arguments in G(v(x,β),β) gives equation (4.6).
iii) the conditional distribution of y given x at β has density f(y|x,β) that is regular in β with (conditional) information matrix that is nonsingular and bounded, and σ²(x) = E_β[m(ε(β))²|x] is bounded and bounded away from zero with probability one; iv) there is a compact set 𝒢 containing in its interior the closure of {G(v(x,β),β)} such that on X × B × 𝒢, E_β[m(y − g)|x] is continuously differentiable with bounded derivative, G(v(x,β),β) solves E_β[m(y − g)|x] = 0, and ∂E_β[m(y − g)|x]/∂g|_{g=G(v(x,β),β)} is bounded and bounded away from zero.
This Assumption consists of more or less standard regularity conditions.
The next
hypothesis imposes additional smoothness conditions.
Assumption 4.2: For any function ℓ(y,x,β) that is bounded, continuously differentiable in y and β, and has bounded derivative, E[m(y−g)ℓ(y,x,β)|x,β] is continuously differentiable in β and g with bounded derivative. Also, there exists a bounded, continuously differentiable function m̄(ε) with bounded derivatives such that E_β[m̄(ε(β))m(ε(β))|x] is bounded away from zero uniformly in x and β.

The first hypothesis is similar to the last part of Assumption 3.2, so that primitive conditions for this condition can be specified as in Section 3. Also, it is straightforward to formulate more primitive conditions for the second hypothesis with particular m(ε). By E_β[m̄(ε(β))²|x] bounded and bounded away from zero, it suffices to find m̄(ε) such that E_β[{m̄(ε(β)) − m(ε(β))}²|x] is small uniformly in x and β on X × B. For example, in the conditional mean case, with m(ε) = ε and E_β[|ε|²|x] bounded, choosing m̄(ε) with |m̄(ε)| ≤ |ε| and m̄(ε) = ε except when |ε| is big enough will satisfy this assumption. Also, in the conditional median case with m(ε) = sgn(ε), with ε continuously distributed given x with bounded conditional density, choosing m̄(ε) with |m̄(ε)| ≤ 1 and m̄(ε) = sgn(ε) except in a small enough neighborhood of zero will satisfy this assumption.
Theorem 4.1: If Assumptions 4.1 to 4.3 are satisfied and Σ = E[SS′] is nonsingular, then for the class of parametric submodels such that E_η[m(y−g)|x] is continuously differentiable in a neighborhood of the truth, ∂E_η[m(ε)|x]/∂η = E[m(ε)S_η|x], and G(v,η) solves E_η[m(y−g)|x] = 0, Σ^{-1} is the supremum of Cramer-Rao bounds, and any estimator of β that is regular satisfies √n(β̂ − β) is distributed as U + Z, where U is N(0,V) for V = (E[SS′])^{-1} and Z is independent of U.

Of particular interest for studying weighted average derivatives is the case where v(x,β) = (x_11 + x_12′β, x_2′)′, corresponding to a partial index model. In this case, β is only identified up to scale, so we normalize the first coefficient to be one, with G_1(v) = ∂G(s,x_2)/∂s|_{s = x_11 + x_12′β}. Here the efficient score is

(4.7)  S = σ^{-2}(x)·u·G_1(v)·{x_12 − E[σ^{-2}(x)x_12|v]/E[σ^{-2}(x)|v]}.

5. Efficiency of Weighted Average Derivative Estimators of Index Coefficients

As we discussed in Section 2, average derivatives estimate the parameters of partial index models up to scale. In this Section we consider the efficiency of average derivative estimators of index model coefficients. We adopt the same normalization as before, where the first coefficient equals one. Assuming that g(x) = G(x_11 + x_12′β, x_2) obeys the identification condition δ_1 = E[w(x)∂g(x)/∂x_11] ≠ 0, let δ_2 = E[w(x)∂g(x)/∂x_12], so that β̂ = δ̂_2/δ̂_1 will be consistent for β. The asymptotic efficiency of δ̂_2/δ̂_1 is analyzed in this Section.

To evaluate the efficiency of this estimator we will assume that δ̂ satisfies equation (3.3), having influence function given in equation (3.5). This assumption is justified because the influence function of any (regular) average derivative estimator satisfies equation (3.5), by Theorem 3.1. We also assume that equation (4.2) is satisfied, a standard regularity condition that is known to be a consequence of regularity of the estimator (e.g. see Newey, 1990a).
5.1 The Asymptotic Variance of Relative Average Derivative Estimators

Given that δ̂ has influence function in equation (3.5), the influence function of β̂ (and hence its asymptotic variance) can be derived by the delta method. Recalling that the partial derivative of δ_2/δ_1 with respect to δ is ∂(δ_2/δ_1)/∂δ = δ_1^{-1}[−β,I], the influence function of β̂ will equal

(5.1)  ψ(z) = δ_1^{-1}[−β,I]{w(x)g′(x) − δ − [w′(x) + w(x)f′(x)/f(x)]u} = ℓ(x)u,
(5.2)  ℓ(x) = −δ_1^{-1}[−β,I]{w′(x) + w(x)f′(x)/f(x)}.

Note that for any function of the form a(x) = a(x_11 + x_12′β, x_2), i.e. a function of v, [−β,I]a′(x) = 0, since its derivatives with respect to x_11 and x_12 are proportional; in particular [−β,I]g′(x) = 0 and [−β,I]δ = 0, so the first term in (3.5) drops out of (5.1). For notational convenience we use the same notation ℓ(x) here as in Sections 2 and 3, even though the expressions are not the same.

The asymptotic variance of β̂ is E[ψ(z)ψ(z)′]. It is interesting to note that the term Var(w(x)g′(x)) in the average derivative bound has dropped out of the asymptotic variance of the index estimator. This result is consistent with the interpretation of this term as the one that accounts for the density of x being unknown. Since β is a feature of the conditional distribution of y given x, the distribution of x is ancillary for estimation of β, and knowledge of this distribution should not affect the efficiency of estimators for β.
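The algebra behind (5.1) and (5.2), that [−β,I] annihilates the gradient of any function of the index, can be checked numerically. A minimal sketch with a made-up link function and evaluation point (none of the numbers below come from the paper):

```python
import numpy as np

# Numerical check (with a hypothetical index function G) that [-beta, I]
# annihilates the gradient of g(x) = G(x11 + x12'beta) with respect to
# (x11, x12), which is why w(x)g'(x) - delta drops out of (5.1).
beta = np.array([0.5, -2.0])
G = np.tanh  # hypothetical smooth link

def g(x):  # x = (x11, x12_1, x12_2)
    return G(x[0] + x[1:] @ beta)

def grad(f, x, h=1e-6):
    # central finite-difference gradient
    e = np.eye(len(x))
    return np.array([(f(x + h * e[i]) - f(x - h * e[i])) / (2 * h)
                     for i in range(len(x))])

x = np.array([0.3, 1.0, -0.7])
J = np.hstack([-beta.reshape(-1, 1), np.eye(2)])  # the matrix [-beta, I]
print(np.abs(J @ grad(g, x)).max() < 1e-6)  # True
```

The gradient is proportional to (1, β′)′, which [−β,I] maps to zero regardless of the link G.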
5.2 Efficient Weighting

In this Section we consider the efficiency of β̂ as an estimator in the partial index model of Sections 2 and 4, where ρ(ε) is given. In this model the efficiency of β̂ depends on the weight function w(x), and it would be useful to know whether a weight function can be chosen so that β̂ is efficient, with asymptotic variance equal to the bound of Section 4. In this subsection we consider existence and the form of such an efficient weight.

Our first result is that an efficient weight function will exist when x has an elliptically symmetric distribution and σ²(x) depends only on v, as described in the following result.
Theorem 5.1: Suppose that i) x has an elliptically symmetric distribution, with nonsingular variance and density ℓ((x−μ)′A(x−μ)) for a differentiable ℓ(·), vector μ, and positive definite matrix A; ii) σ²(x) = E[u²|v] = σ²(v); and iii) G_1(v) is nonzero with probability one. Then an efficient weight function is

(5.5)  w(x) = σ^{-2}(v)G_1(v)/h((x−μ)′A(x−μ)),   h(q) = ℓ(q)/∫_q^∞ ℓ(r)dr.

The optimal weight depends on the density of the regressors through the inverse of the hazard h(q), evaluated at q = (x−μ)′A(x−μ). The factor 1/h(q) is constant, and hence does not affect the weight, if and only if ℓ(q) is proportional to exp(−aq), corresponding to exponential ℓ(q) and hence normally distributed x. In cases where ℓ(q) is "thicker tailed" than normal (e.g. ℓ(q) proportional to 1/(1+q)^a, a > 1), 1/h(q) will tend to give more weight to larger values of x, while if ℓ(q) is "thinner tailed" than normal (e.g. ℓ(q) proportional to exp(−aq^α), α > 1) it will tend to give less weight.
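The behavior of the hazard h(q) = ℓ(q)/∫_q^∞ ℓ(r)dr described above can be illustrated numerically; the tail functions below are illustrative choices, with the exponential case corresponding to normal regressors:

```python
import numpy as np

# Illustration of the hazard h(q) = l(q) / int_q^inf l(r) dr from
# Theorem 5.1.  For l(q) proportional to exp(-q/2) (normal regressors)
# h is constant, so 1/h(q) does not affect the weight; for a thinner
# tail such as l(q) = exp(-q^2), h(q) grows, down-weighting large q.
def hazard(l, q, upper=60.0, n=200001):
    r = np.linspace(q, upper, n)        # truncate the upper limit
    y = l(r)
    tail = ((y[:-1] + y[1:]) / 2 * np.diff(r)).sum()  # trapezoid rule
    return l(q) / tail

qs = [0.5, 1.0, 2.0, 4.0]
h_normal = [hazard(lambda r: np.exp(-r / 2), q) for q in qs]
h_thin = [hazard(lambda r: np.exp(-r ** 2), q) for q in qs]
print(np.allclose(h_normal, 0.5, rtol=1e-3))  # constant hazard
print(h_thin[0] < h_thin[-1])                 # increasing hazard
```

For ℓ(q) = exp(−q/2) the tail integral is 2exp(−q/2), so h(q) = 1/2 exactly, consistent with the normal case above.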
Some properties of elliptically symmetric distributions are essential to existence of an efficient weight, because of implicit constraints on ℓ(x). Let x_{-k} denote the vector of all elements of x other than the k-th, x = (x_k, x_{-k}′)′. Then ℓ(x) is constrained in the following way.

Lemma 5.2: E[ℓ_k(x)|x_{-k}] = 0, k ≤ q − 1.

This result can be interpreted in terms of the amount of information used by a single average derivative ratio. Each ratio β̂_k uses only the restriction that the index includes x_11 + x_{12,k}β_k, and allows for g(x) to depend on the other elements of x in an arbitrary way. In other words, each β̂_k is consistent for the coefficient in a partial index model with index x_11 + x_{12,k}β_k, and Lemma 5.2 is a consequence: it is exactly the condition that makes the influence function uncorrelated with nuisance parameter scores in such a partial index model, as required by consistency of β̂_k. Also, each term ℓ_k(x) multiplying u in the influence function is of this constrained form. In contrast, the elements of the efficient score, ℓ*(x) = G_1(v)σ^{-2}(x){x_12 − E[σ^{-2}(x)x_12|v]/E[σ^{-2}(x)|v]}, only have conditional expectation zero given v, as in equation (4.2). This results from the index model imposing more restrictions than any individual average derivative ratio, leading to more information (variance) in the efficient score.

Theorem 5.1 shows that it is sometimes possible to combine the information from the individual derivative ratios to obtain efficiency. Intuitively, although the individual terms have less information than the efficient score, together they can have as much, because the index interpretation of the vector of derivative ratios comes from the index model. However, the requirement that a single average derivative vector use all the information is quite restrictive, relying on linearity of certain conditional expectations as a necessary condition.

———————
8. The declining weight implicit in the density weighted estimator of Powell, Stock, and Stoker (1989) would tend to have high efficiency when the x distribution has thinner tails than normal, although it is difficult to find an example where 1/h(q) behaves exactly like ℓ(q).
Theorem 5.3: Suppose that σ²(x) = σ²(v), G_1(v) is nonzero with probability one, and an efficient weight function exists. Then for each k ≤ q − 1 there is a constant c_k such that E[x_k|x_{-k}] = E[x_k|v] + c_k′{x_{-k} − E[x_{-k}|v]}.

Thus, linearity of E[x_k|x_{-k}] is necessary for existence of an efficient weight. We could also derive a result for the case where σ²(x) depends on x in a more general way, but for simplicity we have not allowed for this generality here.
Although x having nonlinear conditional expectations will rule out the existence of an efficient weight, approximate efficiency can still be achieved by combining influence functions from many derivative estimators. Intuitively, combining average derivatives imposes the index model information, that can lead to efficiency if results from different weighting functions are used. Unfortunately, the efficient combination will not generally exist in closed form, and hence is difficult to describe.
We use certain Hilbert space results to argue that average derivatives can be combined to achieve approximate efficiency, but not in closed form. The basic result that is useful here is that the closure of the direct sum of closed linear subspaces is equal to the orthogonal complement of the intersection of orthogonal complements of the subspaces. Take the Hilbert space to be the usual one of functions of x with finite mean square. Take the subspaces to be the set of functions of x satisfying Lemma 5.2 for each k. These have orthogonal complement equal to the set of functions of x_{-k}, for each k. The intersection (over k ≤ q−1) of this set is the set of functions of v. This intersection has orthogonal complement equal to the set of functions that have conditional mean zero given v. Then, if an efficient weight function exists, for each k ≤ q − 1 it follows by E[ℓ_k(x)|v] = 0 that each element of ℓ_k(x) is in the closure of {ℓ_1(x)+···+ℓ_{q-1}(x) : E[ℓ_k(x)|x_{-k}] = 0}. Thus, the terms in the efficient score that depend on x can be approximated by the terms in the average derivative that depend on x, which will lead to approximate efficiency. However, it is impossible to give an explicit form for this additive decomposition, because a direct sum of subspaces need not be closed. Also, even when it is closed the decomposition into the sum does not generally have an explicit form, except when the subspaces are orthogonal or finite dimensional. Here the subspaces {ℓ_k(x) : E[ℓ_k(x)|x_{-k}] = 0} are never orthogonal, and many finite dimensional cases are covered by Theorem 5.1. Thus, in general an explicit form for the optimal combination of different weighted average derivative estimators cannot be found.
5.3 Pooling Weighted Average Derivatives Via Minimum Chi-Square

Combining different weighted average derivative estimators provides a way to achieve approximate efficiency. Also, when an efficient weight exists, it may have components that are unknown, and pooling estimators with known weights is an approach to feasible efficient estimation. An optimal way to pool different estimators so as to improve efficiency is by minimum chi-square estimation, that can be described as follows.

Let w_j(x), (j = 1, ..., J), be linearly independent weighting functions, let δ̂_j denote the weighted average derivative estimator using w_j(x), and let β̂_j = δ̂_{j2}/δ̂_{j1} be the associated ratio estimator. Stack the separate estimators into a vector γ̂ = (β̂_1′, ..., β̂_J′)′. Let ψ_j(z) denote the influence function of β̂_j (satisfying equation (5.2) for w_j(x)), let Ψ(z) = (ψ_1(z)′, ..., ψ_J(z)′)′, and let Ψ̂(z) be obtained by replacing the unknown components of Ψ(z) in equation (5.2) by nonparametric estimators. The asymptotic variance of γ̂ is then Ω = E[Ψ(z)Ψ(z)′]. Let H = [I, ..., I]′ = e ⊗ I, where e is the J vector of ones and I is the identity matrix with dimension equal to the number of elements of β.
The pooled estimator is

(5.6)  β̄ = argmin_β (γ̂ − Hβ)′Ω̂^{-1}(γ̂ − Hβ) = (H′Ω̂^{-1}H)^{-1}H′Ω̂^{-1}γ̂,   Ω̂ = Σ_{i=1}^n Ψ̂(z_i)Ψ̂(z_i)′/n.

It follows from standard minimum chi-square theory that β̄ is √n-consistent and asymptotically normal with asymptotic variance (H′Ω^{-1}H)^{-1} that is no larger than the asymptotic variance of any linear combination of the β̂_j, (j = 1, ..., J). Also, a consistent estimator of the asymptotic variance will be (H′Ω̂^{-1}H)^{-1}. The minimum chi-square estimator will be efficient if the efficient score is a linear combination of the influence functions. Also, it will be approximately efficient for large J if a linear combination of the influence functions can approximate the efficient score in mean-square. We state these results in the following theorem, also giving an interpretation of the difference of the efficient information matrix and the minimum chi-square precision matrix.
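The pooling step in equation (5.6) is a generalized-least-squares computation; a minimal sketch with made-up inputs (the numbers below are not estimates from any data set):

```python
import numpy as np

# Sketch of the minimum chi-square pooling step: given stacked ratio
# estimates gamma (J blocks of dimension p) and an estimate Omega of
# their joint asymptotic variance, the pooled estimator is
# (H'inv(Omega)H)^{-1} H'inv(Omega) gamma with H = e (x) I.
def pool_min_chi_square(gamma, omega, J, p):
    H = np.kron(np.ones((J, 1)), np.eye(p))   # H = e (x) I
    omega_inv = np.linalg.inv(omega)
    A = H.T @ omega_inv @ H
    beta = np.linalg.solve(A, H.T @ omega_inv @ gamma)
    avar = np.linalg.inv(A)                   # estimated asymptotic variance
    return beta, avar

J, p = 3, 2
gamma = np.array([1.0, -0.5, 1.2, -0.4, 0.9, -0.6])  # three estimates of beta
omega = np.diag([1.0, 1.0, 4.0, 4.0, 2.0, 2.0])      # their (diagonal) variance
beta, avar = pool_min_chi_square(gamma, omega, J, p)
print(beta)  # precision-weighted combination of the three estimates
```

With a diagonal Ω̂ the pooled estimate reduces to a precision-weighted average of the three component estimates, and the reported variance is no larger than that of any single one.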
Theorem 5.4: Σ − H′Ω^{-1}H = min_Π E[{S − ΠΨ(z)}{S − ΠΨ(z)}′], so that if there exists Π such that S = ΠΨ(z), then β̄ is efficient. Furthermore, if for each ε > 0 there exist J and Π such that E[‖S − ΠΨ(z)‖²] < ε, then lim_{J→∞} (H′Ω^{-1}H)^{-1} = V.
The second condition, on existence of a mean-square approximation of the scores by the influence function, is a "spanning condition" for efficiency. It is the (minimum chi-square) analog of the generalized method of moments spanning condition given in Newey (1990b).

To achieve approximate efficiency by combining weighted average derivative estimators, the weights will have to satisfy a spanning condition corresponding to that of Theorem 5.4. This spanning condition is quite complicated, because the influence functions depend on both the weight and its derivative. For this reason, we do not try and give as general a spanning condition as possible, instead focusing on conditions that are relatively easy to verify. The first of these conditions involves the data distribution. Let f(x) denote the density of x and let X_k denote the random variable with realization x_k.
Assumption 5.1: x has compact support, E[‖f′(x)/f(x)‖²] < ∞, and for k ≤ q−1 and any positive integer r, f(x_k|x_{-k}) is bounded and bounded away from zero, and σ^{-2}(x), ∂f(x_k|x_{-k})/∂x_k, and f(x_k|x_{-k})^{-2}[∂f(x_k|x_{-k})/∂x_k]Cov(1(X_k ≤ x_k), X_k^r | x_{-k}) are continuous on the support of x.

The last part of this condition does not seem primitive, but it is straightforward to check for particular distributions. In particular, as long as f(x_k|x_{-k}) and ∂f(x_k|x_{-k})/∂x_k are continuous on the interior of the support, the last expression will be continuous on the interior, so that it suffices to show continuity on the boundary.
it
a
beta density, proportional to
x_
,
x,
,
b
x, (l-x,
)
For example, suppose
for coefficients
b
a,
T{x,\x_,)
k
|
x_
=
)~ 2
k
dT{x
|
k
a;_
k
and
on
b
)/ax Cov(l(X sa; ),X^ x_ )
k k
k
k
[a - (a+b)a^]|
|
r
lc
(t
-[a/(a+b)]
This function will be continuous at any
convergence).
a
Then
suppressed for notational convenience.
f (x
a
that are continuous in
bounded, positive, and bounded away from one, where dependence of
is
is
X
a
r
)t
b
where
Also, if a sequence is such that
dt/[a^
(l-t)
X,
<
x,
+1
(l-a^)
< 1
b+1
]
(by dominated
converges to zero or one, then
continuity follows by L'Hopital's rule, with the limit of this expression equal to
r
-a[a/(a+b)] /(a+l)
and
at
at the limiting value of
When Assumption
x
,
r
-bU-[a/(a+b)] }/(l-b)
k
1
for
a
and
b
evaluated
.
5.1 is satisfied there
for the spanning condition to be satisfied.
the
at
are simple conditions on the weight functions
Let
xAt,x)
denote the vector with
position and other components equal to the corresponding components of
- 24 -
t
in
x_^.
Assumption 5.2: w_j(x) = ω_j(v)p_j(x), where 1/C ≤ ω_j(v) ≤ C for some C > 0. Also, for each k ≤ q − 1, any continuous function a(x), and any ε > 0, there is a J̄ such that for all J ≥ J̄ there are constants π_{1k}, ..., π_{Jk} with sup_x |a(x) − Σ_{j=1}^J π_{jk}∂p_j(x)/∂x_k| < ε and Σ_{j=1}^J π_{jk}p_j(x) = Σ_{j=1}^J π_{jk}∫ ∂p_j(x_k(t,x))/∂x_k dt, the integral running from the infimum of the support of X_k to x_k.
This hypothesis says that the partial derivatives with respect to each x_k can approximate any continuous function, and that the linear combination is a definite integral of the partial derivative. It is easy to check that this hypothesis is satisfied for particular choices of p_j(x). For example, suppose p_j(x) is a power series with all terms of a given integer order (sum of exponents) and below included. Then the hypotheses follow by the Weierstrass theorem and because derivatives and definite integrals of power series are also power series. Also, for similar reasons, this hypothesis will be satisfied for Gallant's (1981) flexible Fourier form.
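The closure properties of power series used in this verification (derivatives and definite integrals of polynomials are again polynomials, and the definite integral of the derivative recovers the function up to a constant) can be checked mechanically; a small sketch with arbitrary coefficients:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Differentiation and integration stay inside the polynomial family,
# which is why power-series weights satisfy Assumption 5.2.  The
# coefficients below are arbitrary.
c = np.array([1.0, -2.0, 0.5, 3.0])   # 1 - 2x + 0.5x^2 + 3x^3
dc = P.polyder(c)                     # derivative: still a polynomial
ic = P.polyint(dc)                    # definite integral from 0 to x
# Fundamental theorem of calculus: integral of the derivative recovers
# the polynomial up to its constant term.
x = np.linspace(-1.0, 1.0, 11)
print(np.allclose(P.polyval(x, ic) + c[0], P.polyval(x, c)))  # True
```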
An important assumption here is that w_j(x) is the product of ω_j(v), a function of v, and p_j(x), a function of x, where the index x_11 + x_12′β in v depends on the unknown coefficients, so that these weights are not feasible. They can be made feasible by replacing β with an estimator. Because the choice of weights does not affect the consistency of the estimators, under appropriate regularity conditions the replacement of β with its corresponding estimated value will have no effect on the limiting distribution of the minimum chi-square estimator, and hence no effect on its efficiency.
The next result shows Assumptions 5.1 and 5.2 are sufficient for the spanning condition for near efficiency of minimum chi-square.

Theorem 5.5: If Assumptions 5.1 and 5.2 are satisfied, then lim_{J→∞} (H′Ω^{-1}H)^{-1} = V.
5.4 The Choice of Weights in Practice

We can combine the near efficiency of minimum chi-square with the form of the optimal weights given for the elliptically symmetric case to suggest a sequence of weights that should have high efficiency with just a few terms included. The basic idea is to initialize the weight sequence at values that are close to efficient when the regressors are normal, add first some terms to allow for nonnormal but elliptically symmetric x's, and then include terms that would account for non elliptical distributions.

Let Ĝ_1(v) and σ̂²(v), where v = x_11 + x_12′β, be obtained from a preliminary parametric estimator of the index model. For example, if y is binomial and g(x) is the conditional mean, then corresponding to a preliminary probit estimator one might choose Ĝ_1(v) = φ(a + b·v) and σ̂²(v) = Φ(a + b·v)[1 − Φ(a + b·v)], with a and b the constant and the index coefficient from probit. Let μ̂ and Σ̂ be the sample mean and variance of the regressors and let m_j(·) be functions of a scalar argument, with m_1(·) = 1 and m_j(q) = [q/(1+q)]^{j-1} for j ≤ J̄, where J̄ is some (small) positive integer. Then a weight sequence that is initialized at normal and elliptically symmetric regressors is as given in Assumption 5.2 for ω_j(v) = σ̂^{-2}(v)Ĝ_1(v) and p_j(x) = m_j((x−μ̂)′Σ̂^{-1}(x−μ̂)). Higher (than J̄) order terms might include powers of x, allowing for distributions other than normal and other elliptically symmetric distributions. As noted above, estimation of the weights will not affect the asymptotic distribution of relative average derivative estimators.
An important practical issue is the choice of the number of weights J in applications. One way the number of weights might be chosen is by the minimum chi-square analog of the GMM cross-validation criteria suggested in Newey (1990b). The basic idea is to use equation (4.2) to construct a cross-validated estimator of the mean-square of the difference between the efficient score and the influence function, which by Theorem 5.4 is related to the difference of the efficient information and the minimum chi-square precision.

By equation (4.2), E[{S − ΠΨ(z)}′T{S − ΠΨ(z)}] = E[S′TS] − 2·trace(TΠH) + trace(TΠΩΠ′) for positive definite T; the first term does not depend on J, so in comparing mean-squared error for different J this term can be dropped. Let Ψ̂(z) be an estimator of Ψ(z), as described above for the minimum chi-square estimator, and let Ω̂_{-i} = Σ_{j≠i}Ψ̂(z_j)Ψ̂(z_j)′/(n−1) and Π̂_{-i} = H′Ω̂_{-i}^{-1}. Then a cross-validated criteria for choosing J is

CV(J) = trace(T·Σ_{i=1}^n {Π̂_{-i}Ψ̂(z_i)Ψ̂(z_i)′Π̂_{-i}′ − 2Π̂_{-i}H}/n),

an estimator of the scalar mean-square error criteria E[{S − ΠΨ(z)}′T{S − ΠΨ(z)}] up to an additive constant. The leave-one-out inverse can be computed as Ω̂_{-i}^{-1} = (n−1)n^{-1}[Ω̂^{-1} + n^{-1}Ω̂^{-1}Ψ̂(z_i)(1 − Ψ̂(z_i)′n^{-1}Ω̂^{-1}Ψ̂(z_i))^{-1}Ψ̂(z_i)′Ω̂^{-1}].
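The leave-one-out inverses that a criterion of this kind requires can be computed with a rank-one update of the full-sample inverse rather than n separate inversions; a sketch on simulated influence-function draws (not from any estimator in the paper):

```python
import numpy as np

# Leave-one-out inverse of Omega_hat = sum_i psi_i psi_i'/n via a
# rank-one (Sherman-Morrison) update: with A = n*Omega_hat,
# Omega_{-i} = (A - u u')/(n-1), so its inverse is (n-1)*(A - u u')^{-1}.
rng = np.random.default_rng(0)
n, p = 200, 3
psi = rng.standard_normal((n, p))        # simulated influence functions
omega = psi.T @ psi / n
omega_inv = np.linalg.inv(omega)

i = 7
u = psi[i]
A_inv = omega_inv / n                    # inverse of A = n*Omega_hat
denom = 1.0 - u @ A_inv @ u
loo_inv = (n - 1) * (A_inv + np.outer(A_inv @ u, u @ A_inv) / denom)

direct = np.linalg.inv((psi.T @ psi - np.outer(u, u)) / (n - 1))
print(np.allclose(loo_inv, direct))  # True
```

The update costs O(p²) per observation instead of O(p³) per inversion, which matters when the criterion is evaluated over many J.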
5.5 Efficiency For Conditional Distribution Index Models

When the conditional index distribution model of equation (2.4) holds, relative average derivatives will estimate the same coefficients for different ρ(ε) functions, so that their asymptotic efficiencies can be compared. The asymptotic variance is E[σ²(x)ℓ(x)ℓ(x)′] for σ²(x) = ν(x)^{-2}E[m(ε)²|x]. As discussed in Section 3, σ²(x) is the asymptotic variance for estimation of the location parameter μ(x) = argmin_u E[ρ(y−u)|x], so that the asymptotic variance of relative average derivative estimators depends on ρ(ε) in a way that is analogous to the location parameter. Here the dependence is even more direct, since the first term in the average derivative bound has dropped out. For example, if the conditional distribution of y given x is "fat-tailed" enough at each x, so that the median has lower asymptotic variance than the mean, then the asymptotic variance of a weighted average derivative estimator based on the median will be smaller than that based on the mean.

———————
9. In a Monte Carlo example for a different semiparametric estimation problem, given in Newey (1990b), an analogous GMM criteria led to an estimator with dispersion that was almost as small as that of the best J.
It is possible to approximately attain the semiparametric bound for the conditional index model, that is given in Newey (1990b), by combining relative average derivative estimators for different weights and ρ(ε) functions. A sufficient spanning condition is that the weighted average derivative estimators are calculated from all combinations of a sequence of weights satisfying Assumptions 5.1 and 5.2 and a sequence of m(ε) functions for which linear combinations can, for the conditional distribution of ε given any x, approximate any function with finite mean square. For brevity, we have omitted a full description and the proof of this result.
Appendix: Proofs of Theorems

Throughout the appendix let C denote a generic matrix of positive constants that may be different in different appearances.
Proof of Theorem 3.1: Let ζ(y,x) and r(x) be bounded and continuously differentiable, and let ζ̄(y,x) = ζ(y,x) − E[ζ|x] and γ(x) = r(x) − E[r(x)]. Consider the parametric submodel

(A.1)  f(z|θ) = f(z)[1 + θ′ζ̄(y,x)][1 + θ′γ(x)].

Both ζ̄(y,x) and γ(x) are bounded, so this is a density function for θ close enough to zero. In addition,

(A.2)  E_θ[a|x] = E[a|x] + θ′E[aζ̄|x].

Mean-square continuous differentiability of f(z|θ)^{1/2} in a neighborhood of θ = 0 follows by Lemma C.4 of Newey (1991b), with score

(A.3)  S_θ(z) = ζ̄(y,x) + γ(x).

By Assumption 3.2 and Lemma C.2 of Newey (1991b), E[m(y−g)|x] and E[m(y−g)ζ̄(y,x)|x] are continuously differentiable in g with bounded derivatives, and hence by eq. (A.2), E_θ[m(y−g)|x] is continuously differentiable on X×S×Θ. By the implicit function theorem, for each (x,θ) there is a unique solution g(x,θ) to E_θ[m(y−g)|x] = 0 on a neighborhood on which g(x,θ) and ∂g(x,θ)/∂θ exist and are continuous, so that by compactness of X, Θ can be chosen small enough that g(x,θ) and ∂g(x,θ)/∂θ exist and are continuous and bounded on X×Θ, with

(A.4)  ∂g(x,θ)/∂θ′|_{θ=0} = ν(x)^{-1}E[uζ̄(y,x)′|x].

Next, the marginal density of x in the parametric submodel is f(x|θ) = f(x)[1 + θ′γ(x)], so that

(A.5)  δ(θ) = E_θ[w(x)g′(x,θ)] = E[w(x)g′(x,θ)] + E[w(x)g′(x,θ)γ(x)′]θ.

Then δ(θ) is differentiable in θ, and by eq. (A.4) and an integration by parts applied to the first term of eq. (A.5),

(A.6)  ∂δ(θ)/∂θ′|_{θ=0} = E[ℓ(x)uζ̄(y,x)′] + E[w(x)g′(x)γ(x)′] = E[ψ(z)S_θ(z)′],

where the second equality uses E[ζ̄|x] = 0 and E[u|x] = 0. Thus, ψ(z) is the pathwise derivative for all parametric submodels as specified above. Also, for any s(z) with E[s(z)] = 0 and E[‖s(z)‖²] finite, and for any ε > 0, it follows, e.g. by Lemma C.7 of Newey (1991b), that there are ζ(y,x) and r(x) satisfying the above boundedness and smoothness hypotheses with E[‖s(z) − E[s|x] − ζ̄‖²] < ε and E[‖E[s|x] − E[s] − γ‖²] < ε, so that

(A.7)  E[‖s(z) − S_θ(z)‖²] ≤ 2E[‖s(z) − E[s|x] − ζ̄‖²] + 2E[‖E[s|x] − E[s] − γ‖²] ≤ 4ε,

where the inequality follows by the Cauchy-Schwartz inequality. The first conclusion then follows by Bickel, Klaasen, Ritov, and Wellner (1992, Chapter 3, Theorem 2). To obtain the second conclusion, note that by regularity of the estimator and Theorem 2.2 of Newey (1990a), ∂δ(θ)/∂θ′|_{θ=0} = E[ψ̃(z)S_θ(z)′] for the influence function ψ̃(z) of the estimator, so by eq. (A.6), E[{ψ̃(z) − ψ(z)}S_θ(z)′] = 0. The second conclusion then follows because S_θ(z) can approximate any mean zero vector function in mean square, and hence can approximate ψ̃(z) − ψ(z), so that E[‖ψ̃(z) − ψ(z)‖²] = 0. QED.
We prove Theorem 4.1 using two Lemmas. The first gives the form of the tangent set and the second the projection on the tangent set.
Lemma A.1: If Assumptions 4.1 to 4.3 are satisfied then the tangent set, for all parametric submodels satisfying the hypotheses of Theorem 4.1, is T = {t(z) : E[t(z)] = 0, E[t(z)²] < ∞, E[tu|x] = E[tu|v]}.

Proof: We prove this result by showing that any nuisance score must lie in T and by exhibiting a class of parametric submodels whose scores can approximate anything in T in mean square. Consider first any nuisance score S_η for a submodel satisfying the hypotheses of Theorem 4.1. By the implicit function theorem, G(v,η) is differentiable at the truth with ∂G(v,η)/∂η = ν(x)^{-1}∂E_η[m(ε)|x]/∂η = E[uS_η|x]. Since G(v,η) depends on x only through v, E[uS_η|x] is a function of v, and so S_η lies in T.

To construct a class of regular parametric submodels with scores that can approximate anything in T, let D denote the statement "the function is bounded, continuously differentiable in y and β, and has bounded derivatives." Let ζ(y,x) satisfy D. By Assumption 4.1 and Lemma C.2 of Newey (1991b), E_β[ζ(y,x)|x] is continuously differentiable in β with derivative E_β[ζ(y,x)S_β(y|x)′|x], where S_β(y|x) is the (conditional) score for f(y|x,β); this matrix is bounded by boundedness of the conditional information. Let m̄(ε) be as specified in Assumption 4.2, and let m̃(ε(β)) = m̄(ε(β)) − E_β[m̄(ε(β))|x] and ζ̄(y,x,β) = ζ(y,x) − E_β[ζ(y,x)|x], so that by Assumptions 4.1 and 4.2, E_β[m̃(ε(β))m(ε(β))|x] and E_β[ζ̄(y,x,β)m(ε(β))|x] satisfy D. For a function a(v) that is bounded and continuously differentiable with bounded derivative, let

ζ̃(y,x,β) = ζ̄(y,x,β) − m̃(ε(β))(E_β[m̃(ε(β))m(ε(β))|x])^{-1}{E_β[ζ̄(y,x,β)m(ε(β))|x] − a(v(x,β))}.

Then E_β[ζ̃(y,x,β)|x] = 0 and E_β[ζ̃(y,x,β)m(ε(β))|x] = a(v(x,β)) by construction. Therefore, for θ = (β,η′)′ with η in a small enough neighborhood of zero, Λ(z,θ) = 1 + η′ζ̃(y,x,β) is bounded away from zero and E_β[Λ(z,θ) − 1] = 0, so that by Lemma C.4 of Newey (1991b), f(z|θ) = f(z|β)Λ(z,θ) is a density with mean-square continuously differentiable square root, and score for η

(A.8)  S_η = ζ̄(z) − m̃(ε)(E[m̃(ε)m(ε)|x])^{-1}{E[ζ̄(z)m(ε)|x] − a(v)}.

To show that f(z|θ) is a parametric submodel, note that

(A.9)  E_θ[m(y−g)|x] = E_β[m(y−g)|x] + η′E_β[m(y−g)ζ̃(y,x,β)|x],

and by Assumption 4.1 there is ε > 0 small enough that ∂E_β[m(y−g)|x]/∂g is bounded away from zero uniformly in x and β for all g ∈ [G(v(x,β),β)−ε, G(v(x,β),β)+ε]. Therefore, by eq. (A.9), for η small enough there is λ(v(x,β),η) in a neighborhood of zero such that

(A.10)  E_θ[m(y − G(v(x,β),β) − λ(v(x,β),η))|x] = 0,  G(v(x,β),θ) = G(v(x,β),β) + λ(v(x,β),η).

This is a local minimum of E_θ[ρ(y−g)|x] by continuous differentiability of E_θ[m(y−g)|x] and the derivative bounded away from zero, and a global minimum for η small enough by the theorem of the maximum and compactness of S. Thus, f(z|θ) satisfies the index restriction, and hence is a smooth parametric submodel. Furthermore, the other hypotheses in Theorem 4.1 for the parametric submodels are satisfied by construction.

To show that S_η can approximate anything in T, note that for t ∈ T, by boundedness of E[m(ε)²|x], E[‖E[m(ε)t|v]‖²] ≤ E[E[m(ε)²|x]E[‖t‖²|x]] ≤ CE[‖t‖²] < ∞. Then for any ε > 0 it follows by Lemma C.7 of Newey (1991b) that there are ζ(y,x) and a(v), bounded and continuously differentiable with bounded derivatives, such that E[‖t − ζ‖²] < ε and E[‖E[m(ε)t|v] − a(v)‖²] < ε. Also, E[‖E[m(ε)(ζ−t)|x]‖²] ≤ CE[‖ζ−t‖²] < Cε and E[‖E[t|x] − E[ζ|x]‖²] ≤ E[‖t − ζ‖²] < ε. Therefore,

(A.11)  E[‖t − S_η‖²] ≤ C{E[‖t − E[t|x] − ζ̄‖²] + E[Var(m̃(ε)|x)(E[m̃(ε)m(ε)|x])^{-2}‖E[m(ε)ζ|x] − a(v)‖²]} ≤ Cε + CE[‖E[m(ε)(ζ−t)|x]‖²] + CE[‖E[m(ε)t|v] − a(v)‖²] ≤ Cε.

QED.
Lemma A.2: If σ²(x) = E[u²|x] exists, is nonzero with probability one, and E[σ^{-2}(x)|v] < ∞, then the projection of a random vector s with finite mean-square on T is t = s − E[s] − uR(x), where R(x) = σ^{-2}(x){E[u·s|x] − E[σ^{-2}(x)u·s|v]/E[σ^{-2}(x)|v]}.

Proof: For notational simplicity, consider the case where s is a scalar; the vector case follows element by element. By the Cauchy-Schwartz inequality, E[u·s|x]² ≤ σ²(x)E[s²|x], so that E[{σ^{-2}(x)uE[u·s|x]}²] ≤ E[E[{u/σ²(x)}²|x]σ²(x)E[s²|x]σ²(x)]/σ²(x)·... = E[E[s²|x]] = E[s²] < ∞, and similarly for the second term of R(x), so the expressions in R(x) are finite with probability one and t has finite mean-square. Then

E[tu|x] = E[su|x] − E[u²|x]R(x) = E[σ^{-2}(x)u·s|v]/E[σ^{-2}(x)|v],

which is a function of v, so that E[tu|x] = E[tu|v] and t ∈ T. Also,

E[(s−t)t] = E[uR(x)t] = E[E[tu|x]R(x)] = E[E[tu|v]R(x)] = E[E[tu|v]E[R(x)|v]] = 0,

since E[R(x)|v] = E[σ^{-2}(x)E[u·s|x]|v] − E[σ^{-2}(x)|v]E[σ^{-2}(x)u·s|v]/E[σ^{-2}(x)|v] = 0. QED.
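The two properties established in this proof — that E[tu|x] is a function of v alone and that s − t is orthogonal to t — can be verified exactly on a small finite distribution; the σ²(x), s(x,u), and v below are arbitrary choices:

```python
import numpy as np

# Finite-state check of the projection in Lemma A.2: with made-up
# sigma^2(x), s(x,u), and v = x mod 2, the residual
# t = s - E[s] - u R(x) satisfies E[tu|x] = E[tu|v] and E[(s-t)t] = 0.
xs = np.array([0, 1, 2, 3])
px = np.array([0.1, 0.2, 0.3, 0.4])
vx = xs % 2                              # index v(x)
sig2 = np.array([1.0, 2.0, 0.5, 3.0])    # sigma^2(x)

# outcomes: (x, u) with u = +/- sigma(x), each with prob p(x)/2
X = np.repeat(xs, 2)
P = np.repeat(px, 2) / 2
U = np.sqrt(np.repeat(sig2, 2)) * np.tile([1.0, -1.0], 4)
S = X + U * (X + 1.0)                    # an arbitrary s(x,u)

Eus = np.array([np.sum(P[X == x] * U[X == x] * S[X == x]) / px[i]
                for i, x in enumerate(xs)])            # E[us|x]
c = np.array([np.sum(px[vx == v] / sig2[vx == v] * Eus[vx == v])
              / np.sum(px[vx == v] / sig2[vx == v]) for v in (0, 1)])
R = (Eus - c[vx]) / sig2                 # R(x) of Lemma A.2

T = S - np.sum(P * S) - U * R[X]
Etu = np.array([np.sum(P[X == x] * T[X == x] * U[X == x]) / px[i]
                for i, x in enumerate(xs)])
print(np.allclose(Etu, c[vx]))                 # E[tu|x] depends only on v
print(abs(np.sum(P * (S - T) * T)) < 1e-12)    # orthogonality
```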
Proof of Theorem 4.1: By Assumption 4.1 and the implicit function theorem, f(z|θ) is smooth with score S_β satisfying

(A.12)  E[uS_β|x] = v_β + ∂G(v,β)/∂β,

where v_β denotes the derivative of G(v(x,β),β) with respect to β through the index. This equation implies that

(A.13)  E[σ^{-2}(x)uS_β|v] = E[σ^{-2}(x)E[uS_β|x]|v] = E[σ^{-2}(x)v_β|v] + [∂G(v,β)/∂β]E[σ^{-2}(x)|v].

By Lemma A.1 the tangent set is T, so by Lemma A.2 and eq. (A.13) the residual from the projection of S_β on T is S. The conclusion now follows by Bickel, Klaasen, Ritov, and Wellner (1992, Chapter 3, Theorem 2). QED.
There is a convenient characterization of efficiency that is useful for proving Theorem 5.1.

Lemma A.3: A regular estimator with influence function ψ(z) is efficient if and only if there is a constant matrix B such that ψ(z) = BS.

Proof: Let B = E[ψ(z)Sᵀ](E[SSᵀ])^{-1}. Then

    E[{ψ(z) − BS}{ψ(z) − BS}ᵀ] = E[ψ(z)ψ(z)ᵀ] − E[ψ(z)Sᵀ](E[SSᵀ])^{-1}E[Sψ(z)ᵀ] ≥ 0.

By eq. (4.2), the right-hand side is the difference between the variance of the estimator and the efficiency bound, so the estimator is efficient if and only if the left-hand side is zero, i.e. if and only if ψ(z) = BS. QED.
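The "only if" direction of Lemma A.3 can be made concrete with a hypothetical finite example (illustrative, not from the paper): perturbing ψ = BS by any e with E[eS] = 0 leaves the cross moment E[ψS] unchanged but strictly inflates the variance, so no influence function outside the span of the score can attain the bound.

```python
from fractions import Fraction as F

# Hypothetical 4-point sample space, each point with probability 1/4.
# Score S and perturbation e are chosen with E[S] = 0, E[e] = 0, E[eS] = 0.
S = [F(1), F(-1), F(2), F(-2)]
e = [F(1), F(1), F(-1), F(-1)]   # orthogonal to S: (1 - 1 - 2 + 2)/4 = 0

def E(vals):
    # exact expectation under equal probabilities
    return sum(vals) / len(vals)

assert E([e[i] * S[i] for i in range(4)]) == 0  # orthogonality of e and S

B = F(3, 2)                                      # a constant (scalar) matrix B
psi_eff = [B * S[i] for i in range(4)]           # the efficient form ψ = BS
psi_alt = [psi_eff[i] + e[i] for i in range(4)]  # same E[ψS], extra noise

m_eff = E([psi_eff[i] * S[i] for i in range(4)])  # cross moment of ψ = BS
m_alt = E([psi_alt[i] * S[i] for i in range(4)])  # unchanged by the perturbation
v_eff = E([psi_eff[i] ** 2 for i in range(4)])    # variance of the efficient ψ
v_alt = E([psi_alt[i] ** 2 for i in range(4)])    # strictly larger variance
```

Here v_alt − v_eff = E[e²] exactly, the Pythagorean decomposition behind the display in the proof.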
Proof of Theorem 5.1: By Lemma A.3, it suffices to show that E[ℓ(x)ᵀu·t] = 0 for all t ∈ 𝒯 implies E[ℓ(x)|v] = 0. For any bounded function a(v), σ^{-2}(x)u·a(v) ∈ 𝒯, since it has finite mean-square (E[σ^{-2}(x)] finite) and

    E[u·{σ^{-2}(x)a(v)u}|x] = a(v)σ^{-2}(x)E[u²|x] = a(v).

Thus,

    0 = E[ℓ(x)ᵀu·{σ^{-2}(x)a(v)u}] = E[ℓ(x)ᵀa(v)] = E[E[ℓ(x)|v]ᵀa(v)]

for any bounded a(v), and choosing a(v) to approximate E[ℓ(x)|v] in mean-square gives E[ℓ(x)|v] = 0, implying the conclusion. QED.

Proof of Theorem 5.2: As noted in the text, the density of x has the form f(x) = h(q(x)) for q(x) = (x − μ)ᵀΛ(x − μ), so that

(A.14)    F(x) ≡ −f(x)^{-1}∂f(x)/∂x = −2{h(q)^{-1}dh(q)/dq}Λ(x − μ).

Nonsingularity of Var(x) and of Λ imply that Var(F(x)) is nonsingular. Standard linear regression arguments then imply that there is a nonsingular matrix D such that DF(x) is the residual vector from the population linear regression of x₁₂ on v, and linearity of conditional expectations for spherically symmetric distributions gives

    DF(x) = x₁₂ − E[x₁₂|v].

Then since σ²(x) = σ²(v), by eq. (A.14) and Lemma A.3,

    Dψ(z) = σ^{-2}(v)G_v(v)DF(x)u = σ^{-2}(v)G_v(v){x₁₂ − E[x₁₂|v]}u,

so ψ(z) is a nonsingular linear combination of the efficient influence function, giving the conclusion. QED.

Proof of Theorem 5.3: As noted in the text, in the partial index model the influence function is ℓ(x)u with w(x) equal to the efficient weight, and without changing w(x), efficiency and Lemma A.3 imply that there is a matrix B such that ℓ(x) = B{x₁₂ − E[x₁₂|v]}. For notational simplicity let x_j denote the jth element of x₁₂ and x₋ⱼ = (x₁₂)₋ⱼ the remaining elements. Then, with Prob(u ≠ 0) = 1, this equation implies that

(A.15)    ℓ_j(x) = b_j(x_j − E[x_j|v]) + b₋ⱼᵀ(x₋ⱼ − E[x₋ⱼ|v])

for a constant scalar b_j and vector b₋ⱼ.
Also, b_j ≠ 0. Either b_j or b₋ⱼ is not zero, because nonsingularity of B follows from nonsingularity of E[SSᵀ]; and if b_j = 0, a nonzero b₋ⱼᵀ(x₋ⱼ − E[x₋ⱼ|v]) (again implied by finiteness of the variance bound for the index model) contradicts eq. (A.15). Dividing by b_j, we obtain

    E[x_j|v, x₋ⱼ] = E[x_j|v] + (−b₋ⱼ/b_j)ᵀ{x₋ⱼ − E[x₋ⱼ|v]},

giving the conclusion. QED.

Proof of Theorem 5.4: By equation (4.6),

    H′Ω^{-1}H = E[Sψ(z)ᵀ](E[ψ(z)ψ(z)ᵀ])^{-1}E[ψ(z)Sᵀ] = E[SSᵀ] − min_Π E[{S − Πψ(z)}{S − Πψ(z)}ᵀ],

and the other statements follow as immediate consequences. QED.
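The minimization in the last display is a second-moment projection. In the scalar case (a hypothetical check with made-up values, not the paper's multivariate statement), E[S²] − min_b E[(S − bΦ)²] = E[SΦ]²/E[Φ²], with the minimum attained at b* = E[SΦ]/E[Φ²]:

```python
from fractions import Fraction as F

# Hypothetical 4-point space, equal probabilities; scalar S and Φ.
S   = [F(2), F(-1), F(3), F(0)]
Phi = [F(1), F(1), F(-1), F(2)]

def E(vals):
    # exact expectation under equal probabilities
    return sum(vals) / len(vals)

ESS   = E([s * s for s in S])                    # E[S^2]
ESPhi = E([S[i] * Phi[i] for i in range(4)])     # E[SΦ]
EPhi2 = E([p * p for p in Phi])                  # E[Φ^2]

b_star = ESPhi / EPhi2                           # projection coefficient

def obj(b):
    # the objective E[(S - bΦ)^2] being minimized
    return E([(S[i] - b * Phi[i]) ** 2 for i in range(4)])

# The quadratic identity behind the displayed bound:
gap = ESS - obj(b_star)                          # equals E[SΦ]^2 / E[Φ^2]
```

Since obj(b) is quadratic in b, b* beats any other coefficient, which is what makes the "min" in the display well defined.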
Proof of Theorem 5.5: It suffices to prove the result with w(x) = σ²(v), since σ²(v) is bounded away from zero and factors out of each ℓ_j(x). Note first that for any vector b(x), E[‖b(x)u‖²] ≤ C·E[‖b(x)‖²], so it suffices to show the existence of square matrices Π_j such that E[‖ℓ(x) − Σ_j Π_j ℓ_j(x)‖²] → 0, i.e. that ℓ(x) is in the closure of ℒ₁ ⊕ ··· ⊕ ℒ_J, where

    ℒ_k = {ℓ: ℓ(x) = ∂a(x)/∂x_k + f(x)^{-1}{∂f(x)/∂x_k}a(x), a(x) ∈ ℒ₂, E[ℓ|x₋ₖ] = 0, E[ℓ²] < ∞},

and E[{f(x)^{-1}∂f(x)/∂x_k}²] is finite. Since x is bounded, polynomials in x are dense in ℒ₂, while for any polynomial p(x) and any a(x) with E[a|x₋ⱼ] = 0,

    E[{a(x) − (p(x) − E[p(x)|x₋ⱼ])}²] = E[Var(a(x) − p(x)|x₋ⱼ)] ≤ Var(a(x) − p(x)) ≤ E[{a(x) − p(x)}²],

so that the set {p(x) − E[p(x)|x₋ⱼ]: p(x) a polynomial} is dense in {a: E[a|x₋ⱼ] = 0, E[a²] < ∞}. Also, by the Hilbert space fact cited in the text, it suffices to show that for each k and each a(x) = p(x) − E[p(x)|x₋ₖ] there are coefficients π_jk such that

(A.16)    E[{a(x) − Σ_j π_jk ℓ_jk(x)}²] is arbitrarily small.

Let x̄(t, x) denote x with its kth component replaced by t. Since E[a|x₋ₖ] = 0,

    a(x) = b(x) + f(x)^{-1}{∂f(x)/∂x_k}∫_{t≤x_k} b(x̄(t, x))dt

for a continuous b(x). It follows by Assumption 5.1 that, for any ε > 0, there are polynomials p_k and coefficients π_jk of Assumption 5.2 such that |b(x) − Σ_j π_jk ∂p_k(x)/∂x_k| ≤ ε on the support of x, so that

    |a(x) − Σ_j π_jk ℓ_jk(x)| ≤ |b(x) − Σ_j π_jk ∂p_k(x)/∂x_k|
        + |f(x)^{-1}∂f(x)/∂x_k|·|∫_{t≤x_k} [b(x̄(t, x)) − Σ_j π_jk ∂p_k(x̄(t, x))/∂x_k]dt|
        ≤ C(1 + |f(x)^{-1}∂f(x)/∂x_k|)ε.

Therefore, since E[{f(x)^{-1}∂f(x)/∂x_k}²] is finite and ε is arbitrarily small, eq. (A.16) is satisfied. QED.
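The denseness step used above, that on a bounded support least-squares polynomial fits of increasing degree drive the mean-square approximation error down, is easy to see numerically. A small hypothetical illustration in pure Python (fitting |x| on a grid; the function, grid, and degrees are illustrative choices, not from the paper):

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivot
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def poly_mse(f, deg, grid):
    """Mean-square error of the degree-`deg` least-squares polynomial fit."""
    X = [[t ** j for j in range(deg + 1)] for t in grid]  # monomial design
    y = [f(t) for t in grid]
    n = deg + 1
    # normal equations X'X c = X'y
    AtA = [[sum(X[i][a] * X[i][b] for i in range(len(grid))) for b in range(n)]
           for a in range(n)]
    Aty = [sum(X[i][a] * y[i] for i in range(len(grid))) for a in range(n)]
    c = solve(AtA, Aty)
    fit = [sum(c[j] * X[i][j] for j in range(n)) for i in range(len(grid))]
    return sum((fit[i] - y[i]) ** 2 for i in range(len(grid))) / len(grid)

grid = [-1 + 2 * i / 200 for i in range(201)]  # bounded support [-1, 1]
mse2 = poly_mse(abs, 2, grid)                  # degree-2 fit of |x|
mse6 = poly_mse(abs, 6, grid)                  # degree-6 fit: strictly better
```

Since the degree-6 space nests the degree-2 space, the error can only fall as the degree grows, which is the mean-square denseness the proof relies on.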
References

Begun, J., W. Hall, W. Huang, and J. Wellner (1983): "Information and Asymptotic Efficiency in Parametric-Nonparametric Models," Annals of Statistics, 11, 432-452.

Bickel, P., C. Klaasen, Y. Ritov, and J. Wellner (1992): Efficient and Adaptive Inference in Semiparametric Models, forthcoming, Johns Hopkins University Press.

Chamberlain, G. (1987): "Efficiency Bounds for Semiparametric Regression," working paper, University of Wisconsin.

Doksum, K. and A. Samarov (1992): "Average Derivative Estimators of Transformations," work in progress.

Gallant, A.R. (1981): "On the Bias in Flexible Functional Forms and an Essentially Unbiased Form: The Fourier Flexible Form," Journal of Econometrics, 15, 211-245.

Hall, P. and H. Ichimura (1991): "Optimal Semiparametric Estimation in Single Index Models," preprint, Australian National University.

Hardle, W., W. Hildenbrand, and M. Jerison (1991): "Empirical Evidence on the Law of Demand," Econometrica, 59, 1525-1549.

Hardle, W. and T. Stoker (1989): "Investigating Smooth Multiple Regression by the Method of Average Derivatives," Journal of the American Statistical Association, 84, 986-995.

Ichimura, H. (1991): "Semiparametric Least Squares (SLS) and Weighted SLS Estimation of Single Index Models," preprint, Department of Economics, University of Minnesota.

Koenker, R. and G. Bassett (1982): "Robust Tests for Heteroskedasticity Based on Regression Quantiles," Econometrica, 50, 43-61.

Newey, W.K. (1990a): "Semiparametric Efficiency Bounds," Journal of Applied Econometrics, 5, 99-135.

Newey, W.K. (1990b): "Efficient Estimation of Semiparametric Models via Moment Restrictions," preprint, MIT Department of Economics.

Newey, W.K. (1991a): "The Asymptotic Variance of Semiparametric Estimators," MIT Department of Economics Working Paper No. 583.

Newey, W.K. (1991b): "Efficient Estimation of Tobit Models Under Symmetry," in Nonparametric and Semiparametric Methods in Econometrics and Statistics, W.A. Barnett, J.L. Powell, and G. Tauchen, eds., Cambridge: Cambridge University Press.

Pfanzagl, J. and W. Wefelmeyer (1982): Contributions to a General Asymptotic Statistical Theory, New York: Springer-Verlag.

Powell, J.L. (1991): "Estimation of Monotonic Regression Models Under Quantile Restrictions," in Nonparametric and Semiparametric Methods in Econometrics and Statistics, W.A. Barnett, J.L. Powell, and G. Tauchen, eds., Cambridge: Cambridge University Press.

Powell, J.L., J.H. Stock, and T.M. Stoker (1989): "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403-1430.

Rodriguez, D. and T. Stoker (1992): "A Regression Test of Semiparametric Index Model Specification," working paper, Sloan School of Management, MIT.

Samarov, A. (1990): "On Asymptotic Efficiency of Average Derivative Estimates," preprint, Massachusetts Institute of Technology.

Stein, C. (1956): "Efficient Nonparametric Testing and Estimation," Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, Berkeley: University of California Press.

Stoker, T.M. (1986): "Consistent Estimation of Scaled Coefficients," Econometrica, 54, 1461-1481.

Stoker, T.M. (1991a): "Equivalence of Direct, Indirect and Slope Estimators of Average Derivatives," in Nonparametric and Semiparametric Methods in Econometrics and Statistics, W.A. Barnett, J.L. Powell, and G. Tauchen, eds., Cambridge: Cambridge University Press.

Stoker, T.M. (1991b): "Smoothing Bias in Density Derivative Estimation," MIT Sloan School of Management Working Paper.

Tsybakov, A.B. (1982): "Robust Estimates of a Function," Problems of Information Transmission, 18, 190-201.