Document 11044644

WORKING PAPER
ALFRED P. SLOAN SCHOOL OF MANAGEMENT

CONSISTENT ESTIMATION OF SCALED COEFFICIENTS
by
Thomas M. Stoker
July 1984
WP #1583-84

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASSACHUSETTS 02139
* Thomas M. Stoker is Associate Professor of Applied Economics, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139. This research was funded by a grant from the National Science Foundation. The author wishes to thank A. Deaton, J. Hausmann, and J. Powell for valuable ongoing discussions, J. Rotemberg for useful comments, and P. Bickel and C. Manski for useful conversations regarding adaptive estimation.
ABSTRACT
This paper studies the estimation of coefficients $\beta$ in single index models such that $E(y \mid X) = F(\alpha + X'\beta)$, where the function $F$ is misspecified or unknown. A linear instrumental variables slope coefficient vector of $y$ regressed on $X$ is shown to be consistent for $\beta$ up to a scalar multiple, where the instruments are appropriately defined score vectors of the marginal distribution of $X$. The framework is illustrated by several common limited dependent variable models and models involving a transformed dependent variable. Similar estimators are indicated for multiple index models and models where extraneous variables are present. The construction of the instrumental variables is discussed, and illustrated by several examples. The asymptotic distribution of the instrumental variables estimator is established.
CONSISTENT ESTIMATION OF SCALED COEFFICIENTS
1. Introduction

In this paper we consider the generic econometric modeling situation in which a dependent variable $y$ is modeled as a function of a vector of independent variables $X$ and stochastic terms, where the conditional expectation of $y$ given $X$ takes the form $E(y \mid X) = F(\alpha + X'\beta)$. This situation exists for many standard models of discrete choice, censoring and selection, but is clearly not limited to such models. Our interest is in what can be learned about the values of the coefficients $\beta$ without specific assumptions on the distribution of unobserved stochastic terms or other functional form aspects; in other words, when the true form of the function $F$ is misspecified or unknown.

For different examples of limited dependent variables models, Ruud (1983a), Goldberger (1981), Deaton and Irish (1984) and Chung and Goldberger (1984), among others, have studied the conditions under which OLS regression coefficients and other quasi-maximum likelihood estimators will consistently estimate $\beta$ up to a scalar multiple. Ruud (1983a) points out that a sufficient condition for this property occurs when the conditional expectation of each component of $X$ given $Z = \alpha + X'\beta$ is linear in $Z$, which is valid, for example, when $X$ is multivariate normally distributed. Goldberger (1981), Deaton and Irish (1984) and Chung and Goldberger (1984) point out the sufficiency of an analogous condition with a more general definition of $Z$.

An intriguing feature of this work is that it provides special cases where knowledge of the marginal distribution of $X$ is very useful for estimating behavioral effects when certain features of the true model are unknown.
The question is immediately raised as to whether more general results of this type can be obtained, because as Ruud (1983a) states, the above sufficient condition is "too restrictive to be generally applicable." Results which apply to more general marginal distribution forms are of substantial practical interest because the marginal distribution of $X$ can in general be empirically characterized.
The purpose of this paper is to indicate that if $X$ is a continuously distributed random vector, knowledge of the marginal distribution of $X$ in general permits consistent estimation of $\beta$ up to a scalar multiple. In particular, we show that such a consistent estimate can be obtained as the slope coefficient vector of a linear instrumental variables regression of $y$ on $X$, where the instruments are appropriately defined score vectors from the marginal distribution of $X$. The ratio of any two components of this slope vector will consistently estimate the ratio of the corresponding components of $\beta$. These estimates may suffice for many applications, such as judging relative marginal utilities in a discrete choice situation. Moreover, because the asymptotic distribution of the instrumental variables estimator is easily established, certain scale free hypotheses can be tested, such as zero restrictions and equality restrictions on the components of $\beta$.

More broadly, the ratio estimates provide a consistent benchmark for choosing specific modelling assumptions. Namely, if alternative functional form or stochastic distribution assumptions give rise to substantively different estimates of $\beta$, the consistent ratio estimates can guide the choice of the best specification. For example, in a binary discrete choice situation, separate estimates of $\beta$ under logit or probit assumptions could be judged in relation to the consistent instrumental variable ratio estimates.
The exposition begins with notation, examples and assumptions in Section 2. The main result on consistency of the instrumental variables slope vector is presented in Section 3.1, with immediate extensions to more general models in Section 3.2. The proofs are of some potential independent interest because they utilize the results of Stoker (1982, 1983) in a novel way. Section 3.3 presents facilitating results on the construction of the instrumental variables, and Section 3.4 establishes the asymptotic distribution of the instrumental variables estimator. Specific examples of independent variable distributions are considered in Section 4, where the relation of the results to the previous literature is discussed. Section 5 contains some concluding remarks.
2. Notation and Basic Assumptions

We consider the situation where data is observed on a dependent variable $y_k$ and an $M$-vector of independent variables $X_k$, for $k = 1,\ldots,K$, where $M \geq 2$. $(y_k, X_k)$, $k = 1,\ldots,K$ represent random drawings from a distribution with density $P_0(y,X) = q(y \mid X)\,p_0(X)$, which is absolutely continuous with respect to a $\sigma$-finite measure $\nu$. $p_0(X)$ represents the density of the marginal distribution of $X$. The conditional density $q(y \mid X)$ represents the true behavioral econometric model, for which we assume that the conditional expectation $E(y \mid X)$ can be written in the form

(2.1)    $E(y \mid X) = F(\alpha + X'\beta) = F(Z)$

for some function $F$, where $\alpha$ is a constant, $\beta = (\beta_1,\ldots,\beta_M)'$ is an $M$-vector of constants, and $Z$ is an index variable, defined as $Z = \alpha + X'\beta$. We refer to (2.1) as a single index model. This framework is very general, subsuming many standard limited dependent variable models, but is not restricted to such models.
Before proceeding to specific examples, it is useful to note the following generic special case of (2.1). Suppose that $Z^*$ is a general index variable such that $Z^* - Z$ is independent of $X$; then if $E(y \mid Z^*) = F^*(Z^*)$ for some function $F^*$, (2.1) is implied. This implies the natural result that important behavioral variables can be omitted from $X$ in (2.1) without affecting our results, provided that the omitted variables are distributed independently of the included ones. We will write $Z^* = \alpha + X'\beta + \varepsilon$ for such an index, where $\varepsilon$ is distributed independently of $X$. We now turn to some specific examples:

Example 1: Binary Discrete Choice: Suppose that $y$ represents a dichotomous random variable modeled as

    $y = 1$ if $\varepsilon > -(\alpha + X'\beta)$
    $y = 0$ otherwise.

Here $E(y \mid X) = F(\alpha + X'\beta)$ is the probability of $y = 1$ given the value of $X$, with the true function $F$ determined by the true distribution of $\varepsilon$. If $\varepsilon$ is distributed normally with mean 0 and variance $\sigma^2$, then the familiar probit model results, with $F(\alpha + X'\beta) = \Phi[(\alpha + X'\beta)/\sigma]$, where $\Phi$ is the cumulative normal distribution function. Logit models, etc. can easily be included.

Example 2: Tobit Models: Suppose that $y$ is equal to an index $Z^*$ only if $Z^*$ is positive, as in the following censored tobit specification

    $y = \alpha + X'\beta + \varepsilon$ if $\varepsilon > -(\alpha + X'\beta)$
    $y = 0$ otherwise.

Alternatively, if $y = \alpha + X'\beta + \varepsilon$ is observed only when $\varepsilon > -(\alpha + X'\beta)$, we have the truncated tobit specification.
Example 3: Dependent Variable Transformations: Suppose there exists a function $g(y)$ such that the true model is of the form

    $g(y) = \alpha + X'\beta + \varepsilon$

where $g(y)$ is invertible everywhere except for a set of $\nu$-measure 0. A specific example here is the familiar Box-Cox transformation where

    $y^{(\lambda)} = \alpha + X'\beta + \varepsilon$

with $y^{(\lambda)} = (y^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$, and $y^{(\lambda)} = \ln(y)$ for $\lambda = 0$.
These examples serve to illustrate the wide spectrum of models covered by (2.1) with general function $F$, and many other single index examples can be found. Multiple index models are considered in Section 3.2. We now turn to the other assumptions required.
Formally, we assume that $X$ is continuously distributed, having carrier set $\Omega$ of the following form:

Assumption 1: $\Omega$ is a measurable, closed, convex subset of $R^M$ with nonempty interior. For $X \in \partial\Omega$, where $\partial\Omega$ is the boundary of $\Omega$, we have $F(\alpha + X'\beta)\,p_0(X) = 0$ and $X p_0(X) = 0$.

Assumption 1 allows for unbounded $X$'s, where $\Omega = R^M$ and $\partial\Omega = \emptyset$. For the bounded case $F p_0$ and $X p_0$ vanish on the boundary, which is obviously implied if $p_0$ vanishes on the boundary. While the majority of the results employ Assumption 1, the incorporation of discrete (qualitative) independent variables is discussed in Section 3.2.
The main regularity condition on the behavioral model is

Assumption 2: $F(Z)$ is differentiable for all $Z = \alpha + X'\beta$, where $X \in \tilde{\Omega}$, and $\tilde{\Omega}$ differs from $\Omega$ by a set of $\nu$-measure 0.

For technical reasons we will utilize the translation family generated by $p_0(X)$, defined as $\Pi = \{p(X \mid \theta)\}$, where $p(X \mid \theta) = p_0(X - \theta)$ is defined on $\Omega(\theta) = \{X + \theta \mid X \in \Omega\}$, with $\theta \in \Theta$, a compact subset of $R^M$ with interior point $\theta = 0$. We set $P(y,X \mid \theta) = q(y \mid X)\,p(X \mid \theta)$, so that $P(y,X \mid 0) = P_0(y,X)$, $p(X \mid 0) = p_0(X)$ and $\Omega(0) = \Omega$. We assume
Assumption 3: $P(y,X \mid \theta)$ is twice differentiable in $\theta$ for all $\theta \in \Theta$. The means, variances and covariances of $y$, $X$ and the score vector $\ell_\theta$ exist for all $\theta \in \Theta$, where

(2.2)    $\ell_\theta = \dfrac{\partial \ln P(y,X \mid \theta)}{\partial \theta}$

The matrix $E(\ell_\theta X')$ is nonsingular for all $\theta \in \Theta$.

Assumption 3 clearly guarantees the existence of the means, variances and covariances of $y$, $X$ and $\ell_0$ for $\theta = 0$, the data set moments.
Note that $\ell_0$ can be written as:

(2.3)    $\ell_0 = \left.\dfrac{\partial \ln p(X \mid \theta)}{\partial \theta}\right|_{\theta = 0} = -\dfrac{\partial \ln p_0(X)}{\partial X}$

If we denote the mean of $y$ for each $\theta \in \Theta$ as

(2.4)    $E(y) = \phi^*(\theta) = \int y\, P(y,X \mid \theta)\, d\nu$

then we assume

Assumption 4: $\phi^*(\theta)$ is differentiable for all $\theta \in \Theta$, with nonzero derivative at $\theta = 0$.

Finally, we give Assumption 5 in the Appendix, which is a purely technical regularity assumption that assures that derivatives may be taken under expectations. While somewhat formidable technically, these assumptions are collectively very weak.
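As a small numerical check of the score identity (2.3), the following sketch compares a finite-difference derivative of $\ln p_0(X - \theta)$ in $\theta$ at $\theta = 0$ with the analytic expression $-\partial \ln p_0(X)/\partial X$. The bivariate normal density, its moments, the evaluation point and the function names are all assumptions chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical illustration values: a bivariate normal p0 with these moments.
mu0 = np.array([1.0, -0.5])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
sigma_inv = np.linalg.inv(sigma)

def log_p0(x):
    """ln p0(x) up to an additive constant (the constant drops out of the score)."""
    dev = x - mu0
    return -0.5 * dev @ sigma_inv @ dev

def score_by_translation(x, h=1e-5):
    """Central difference of ln p(x | theta) = ln p0(x - theta) in theta at theta = 0."""
    grad = np.zeros(x.size)
    for j in range(x.size):
        step = np.zeros(x.size)
        step[j] = h
        # theta = +h e_j gives ln p0(x - h e_j); theta = -h e_j gives ln p0(x + h e_j)
        grad[j] = (log_p0(x - step) - log_p0(x + step)) / (2.0 * h)
    return grad

x = np.array([0.3, 1.2])                 # arbitrary evaluation point
numeric_score = score_by_translation(x)
analytic_score = sigma_inv @ (x - mu0)   # -d ln p0 / dX for the normal density
print(numeric_score, analytic_score)
```

For this density the two vectors should agree up to rounding error, which is the content of (2.3) in the translation family.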
3. Consistent Estimation of Scaled Coefficients

In this section we consider the slope estimates of the linear equation

(3.1)    $y_k = c + X_k'd + u_k$

obtained by instrumenting with $(1, \ell_{0k}')'$, where $\ell_{0k}$ is the score vector (2.3) evaluated at $X_k$. The slope coefficients $\hat{d} = (\hat{d}_1,\ldots,\hat{d}_M)'$ can be written explicitly as

(3.2)    $\hat{d} = (S_{0x})^{-1} S_{0y}$

where $S_{0x} = \sum_k \ell_{0k}(X_k - \bar{X})'/K$ and $S_{0y} = \sum_k \ell_{0k}(y_k - \bar{y})/K$ are the relevant sample covariance matrices. In Section 3.1 we establish the main result that $\hat{d}$ is a strongly consistent estimator of $\beta$ up to a scalar multiple. In Section 3.2 we extend the result to more general models with extraneous independent variables and several index variables. In Section 3.3 we discuss how to construct the instruments $\ell_{0k}$ in applications, and in Section 3.4 we establish the asymptotic distribution of $\hat{d}$ for statistical inference.

3.1 The Main Result

We begin by showing

Theorem 1: Under Assumptions 1, 2, 3, 4 and 5, $\lim_{K \to \infty} \hat{d} = \gamma\beta$ a.s., where $\gamma$ is a nonzero constant.

Proof: We first consider the unbounded case where $\Omega = R^M$ and $\partial\Omega = \emptyset$. Begin by reparameterizing the translation family $\Pi$ by $E(X) = \mu = \mu_0 + \theta$, where $\mu_0 = E_0(X)$ is the population mean of $X$ in the data, and define $E(y) = \phi(\mu) = \phi^*(\mu - \mu_0)$. Since $\Omega(\theta) = \Omega$ for all $\theta$ in the unbounded case, by a direct application of Theorem 2 of Stoker (1983) we have that

(3.3)    $\lim \hat{d} = \dfrac{\partial \phi(\mu_0)}{\partial \mu} = \dfrac{\partial \phi^*(0)}{\partial \theta}$  a.s.

where the latter equality follows from the definition of $\mu$. The result follows from computing the latter derivative. By a change of variables to $x = X - \theta$, we have that

(3.4)    $E(y) = \phi^*(\theta) = \int F(\alpha + X'\beta)\, p(X \mid \theta)\, d\nu = \int F(\alpha + (x + \theta)'\beta)\, p_0(x)\, d\nu$

Now, from Assumptions 2 and 5, we differentiate (3.4) as

(3.5)    $\dfrac{\partial \phi^*(\theta)}{\partial \theta} = \dfrac{\partial}{\partial \theta}\left[\int F(\alpha + (x + \theta)'\beta)\, p_0(x)\, d\nu\right] = \left[\int \dfrac{\partial F}{\partial Z}\, p_0(x)\, d\nu\right]\beta$

where $\partial F/\partial Z$ is evaluated at $Z = \alpha + (x + \theta)'\beta$. The result follows from evaluating (3.5) at $\theta = 0$ and inserting into (3.3), where $\gamma = \int (\partial F/\partial Z)\, p_0(X)\, d\nu$, and $Z = \alpha + X'\beta$. $\gamma$ is nonzero by Assumption 4.

The bounded case where $\partial\Omega \neq \emptyset$ follows from very careful consideration of the applicability of Theorem 2 of Stoker (1983) to this problem. Theorem 2 applies only when the carrier set does not vary with $\theta$, and so (3.3) is no longer immediately valid, because $\Omega(\theta) \neq \Omega$ when $\theta \neq 0$. The structure of the derivative $\partial\phi^*/\partial\theta$ in this case can be written as

(3.6)    $\dfrac{\partial \phi^*(0)}{\partial \theta} = \dfrac{\partial}{\partial \theta}\int_{\Omega} F(\alpha + X'\beta)\, p(X \mid \theta)\, d\nu + \dfrac{\partial}{\partial \theta}\int_{\Omega(\theta)} F(\alpha + X'\beta)\, p_0(X)\, d\nu$

where each term is evaluated at $\theta = 0$. The first term is the derivative of $\phi^*(\theta)$ holding the carrier set $\Omega(\theta)$ constant at $\Omega(0) = \Omega$, while the second term is the derivative of $\phi^*(\theta)$ holding the integrand constant at $F(\alpha + X'\beta)\,p_0(X)$ while varying the carrier set. By repeated application of Fubini's Theorem and the Fundamental Theorem of Calculus, the second term reduces to integrals of $F p_0$ over boundary points $X \in \partial\Omega$, so it vanishes by Assumption 1. Now, what the proof of Theorem 2 of Stoker (1983) actually shows is that $\lim S_{0y}$ is equal to the first derivative. By an analogous argument, we have that $\lim S_{0x} = \partial\mu/\partial\theta = I$ (an $M \times M$ identity matrix), so that equation (3.3) is shown to be valid in this case also. Consequently, the result that $\lim \hat{d} = \gamma\beta$ a.s. follows. QED
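Theorem 1 can be illustrated by simulation. The sketch below is only an illustration under stated assumptions, not the paper's own computation: the components of $X$ are taken to be independent standard logistic variables, for which the score (2.3) is $\ell_0 = \tanh(X/2)$ componentwise; the disturbance is Student t, so the binary choice model is neither probit nor logit; and the coefficient values, sample size, seed and the helper name `iv_slopes` are hypothetical.

```python
import numpy as np

def iv_slopes(y, X, scores):
    """Slope estimates of (3.1) instrumented by (1, score')', as in (3.2)."""
    K = len(y)
    Xc = X - X.mean(axis=0)          # X_k - Xbar
    yc = y - y.mean()                # y_k - ybar
    S0x = scores.T @ Xc / K          # sample covariance between scores and X
    S0y = scores.T @ yc / K          # sample covariance between scores and y
    return np.linalg.solve(S0x, S0y)

rng = np.random.default_rng(0)
K = 200_000
alpha = 0.3                              # hypothetical constant
beta = np.array([1.0, 2.0, -1.0])        # hypothetical true coefficients

X = rng.logistic(size=(K, 3))            # assumed known marginal density p0
eps = rng.standard_t(df=3, size=K)       # disturbance with "unknown" distribution
y = (alpha + X @ beta + eps > 0).astype(float)   # binary discrete choice

scores = np.tanh(X / 2.0)                # -d ln p0 / dX for the standard logistic
d_hat = iv_slopes(y, X, scores)
ratios = d_hat / d_hat[0]
print("d_hat :", d_hat)
print("ratios:", ratios, " target:", beta / beta[0])
```

The level of `d_hat` depends on the unknown scale $\gamma$, but its component ratios should be close to the ratios of the true coefficients.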
The technique of the above proof is quite nonstandard, and possibly of some independent interest. The results of Stoker (1983) (and the predecessor Stoker (1982)) connect the large sample limits of linear instrumental variables regression coefficients to the aggregate effects induced by distribution changes, or changes in the sample configuration. The above proof exploits these results by considering the implied aggregate impact on $E(y)$ of a specific (but artificial) type of distribution change. Namely, $\partial\phi^*/\partial\theta$ gives the local effect on $E(y)$ of varying the density of $X$ within the translation family $\Pi$. This effect is seen to be consistently estimated by $\hat{d}$. The desired property of $\hat{d}$ is then established by calculating the value of the aggregate effect via (3.5). This technique of proof, namely to perturb the sample distribution and then trace the aggregate implications, may be useful in other contexts.

The reason that the translation family $\Pi$ works for this problem is that changes in the implied marginal distributions of $Z = \alpha + X'\beta$ are determined locally in a neighborhood of $\theta = 0$ by changes in the parameter $\theta'\beta$. In fact, this feature provides a characterization of the scalar $\gamma$. Namely, if we denote the mean of the index $Z$ as $\eta = E(Z) = \alpha + (\mu_0 + \theta)'\beta$, then (3.5) is seen to be a chain rule formula where $\gamma = \partial\phi^*(0)/\partial\eta$, so that $\gamma$ is interpreted as the change in $E(y)$ induced by a change in the mean of the index $E(Z)$ under density translation.
3.2 Immediate Extensions - Extraneous Variables and Multiple Index Models

The logic of the above proof can be immediately applied to more general modeling circumstances than provided by (2.1), which we outline below. For this section (only), we expand the notation slightly to consider two sets of independent variables; an $M_1 \geq 2$ vector $X_1$ and an $M_2$ vector $X_2$. Suppose that the behavioral model for $y$ implies that the conditional expectation of $y$ given $X_1$ and $X_2$ is of the form

(3.7)    $E(y \mid X_1, X_2) = F(\alpha_1 + X_1'\beta_1, X_2)$

for some function $F$ and constant coefficients $\alpha_1$ and $\beta_1$. (3.7) just adds the extraneous variables $X_2$ to the model (2.1). We assume that $(X_1', X_2')'$ is distributed with density $p_0(X_1, X_2)$.

It is easy to see that if $X_1$ is continuously distributed and $X_1$ and $X_2$ have no common components, then knowledge of $p_0(X_1, X_2)$ allows consistent estimation of $\beta_1$ up to a scalar multiple. In particular, reinterpret Assumptions 1 through 5 to apply to $X_1$ (defining the translation family with respect to $X_1$ only), define the generalized score vector as

(3.8)    $\ell_1 = -\dfrac{\partial \ln p_0(X_1, X_2)}{\partial X_1}$

and consider the slope coefficient estimates $\hat{d}_1$ of the linear equation

(3.9)    $y_k = c_1 + X_{1k}'d_1 + u_{1k}$

obtained by instrumenting with $(1, \ell_{1k}')'$. By reinterpreting the proof of Theorem 1, we have that $\lim \hat{d}_1 = \gamma_1\beta_1$ a.s., where $\gamma_1 = \int (\partial F/\partial Z_1)\, p_0(X_1, X_2)\, d\nu$, $Z_1 = \alpha_1 + X_1'\beta_1$.
This result indicates that extraneous variables are accommodated in the above analysis through their impact on the instruments $\ell_{1k}$. The variables $X_2$ could be ignored if $\ell_1$ of (3.8) did not depend on the value of $X_2$, for which a sufficient condition is that $X_2$ is distributed independently of $X_1$ (as indicated in the discussion of a generalized index $Z^*$ of Section 2).
The extension permits the analysis of two commonly encountered practical situations which were not previously treated. The first occurs when the variables $X_2$ are qualitative variables, not continuously distributed. The above result says that when the qualitative variables $X_2$ are not independent of the continuous variables $X_1$, the coefficients $\beta_1$ of the continuous variables can be consistently estimated up to a scalar multiple by the instrumental variables regression (3.9). The instruments $\ell_{1k}$ in this case are just the score vectors of the distribution of $X_1$ conditional on the value of $X_2$, evaluated at $X_1 = X_{1k}$ and $X_2 = X_{2k}$.
The second practical situation occurs when the behavioral model employs several index variables. Suppose that $M_2 \geq 2$ and that the conditional expectation (3.7) can be written in the two index form

(3.10)    $E(y \mid X_1, X_2) = F(\alpha_1 + X_1'\beta_1, \alpha_2 + X_2'\beta_2) = F(Z_1, Z_2)$

where $Z_1 = \alpha_1 + X_1'\beta_1$ and $Z_2 = \alpha_2 + X_2'\beta_2$. As above, when $X_1$ is continuously distributed and $X_1$ and $X_2$ have no variables in common, the slope coefficients $\hat{d}_1$ consistently estimate $\gamma_1\beta_1$. Moreover, if $X_2$ is continuously distributed, then the same argument can be applied to estimating $\beta_2$ up to scale. Formally, reinterpret Assumptions 1 through 5 to apply to $X = (X_1', X_2')'$, noting that Assumption 3 implies that no linear combination of the components of $X_1$ is perfectly correlated with any linear combination of the components of $X_2$. If we define

(3.11)    $\ell_2 = -\dfrac{\partial \ln p_0(X_1, X_2)}{\partial X_2}$

and $\hat{d}_2$ as the estimated slope coefficients of the linear equation

(3.12)    $y_k = c_2 + X_{2k}'d_2 + u_{2k}$

obtained by instrumenting with $(1, \ell_{2k}')'$, then we have that $\lim \hat{d}_2 = \gamma_2\beta_2$ a.s., where $\gamma_2 = \int (\partial F/\partial Z_2)\, p_0(X_1, X_2)\, d\nu$, with $Z_2 = \alpha_2 + X_2'\beta_2$.

Equivalently, we can set $X = (X_1', X_2')'$ and perform the single regression (3.1), here repeated as

(3.13)    $y_k = c + X_k'd + u_k$

where $(1, \ell_{0k}')' = (1, \ell_{1k}', \ell_{2k}')'$ is used as the instrumental variable. From the above development, we clearly have that $\lim \hat{d} = (\gamma_1\beta_1', \gamma_2\beta_2')'$ a.s. Thus the coefficients of both index variables $Z_1 = \alpha_1 + X_1'\beta_1$ and $Z_2 = \alpha_2 + X_2'\beta_2$ can be estimated up to scale when the true function $F$ of (3.10) is unknown. It should be noted that the scale factors $\gamma_1$ and $\gamma_2$ will in general not be equal, so that only the ratios of components of $\beta_1$ or ratios of components of $\beta_2$ are consistently estimated. Ratios of a component of $\beta_1$ to a component of $\beta_2$ are not identified by $\hat{d}$.
A standard example of a two index model obeying (3.10) is the selection bias model studied by Heckman (1979):

Example 4: Selection Bias: Suppose that $y$ is equal to an index $Z_1^*$, as in

    $y = \alpha_1 + X_1'\beta_1 + \varepsilon_1$

but that $y$ is observed only if a second index $Z_2^* = \alpha_2 + X_2'\beta_2 + \varepsilon_2$ is positive. We assume that $(\varepsilon_1, \varepsilon_2)$ is distributed independently of $(X_1, X_2)$. Thus, the conditional expectation of $y$ given $X_1$ and $X_2$ is

    $E(y \mid X_1, X_2, Z_2^* > 0) = \alpha_1 + X_1'\beta_1 + E(\varepsilon_1 \mid \varepsilon_2 > -(\alpha_2 + X_2'\beta_2)) = F(\alpha_1 + X_1'\beta_1, \alpha_2 + X_2'\beta_2)$

so that the structural parameters $\beta_1$ and the selection parameters $\beta_2$ can be estimated up to scale without explicit assumptions on the joint distribution of $(\varepsilon_1, \varepsilon_2)$. Notice that in this example $\gamma_1 = 1$, so that $\lim \hat{d}_1 = \beta_1$ a.s.
By comparing Example 4 and the truncated tobit specification of Example 2, we see that selection parameters can be estimated up to scale in two polar situations, namely when the selection index $Z_2$ has no variables in common with the structural index $Z_1$, or when the selection index $Z_2$ is equal to the structural index $Z_1$. Moreover, it is easy to verify that if there is a common variable appearing in both $Z_1$ and $Z_2$, then the large sample limit of the corresponding component of $\hat{d}$ of (3.2) is the sum of the corresponding components of $\gamma_1\beta_1$ and $\gamma_2\beta_2$.
The above discussion has focused on two index models; clearly analogous results can be obtained for models with three or more index variables. While we now return to the notation and framework of Section 3.1, all of the following econometric results can be reformulated for the above estimators without difficulty.
3.3 Construction of the Instruments
In this section we discuss the empirical construction of the instrumental variables $\ell_{0k}$. There are two cases in which application of Theorem 1 is particularly easy. The first is when the density $p_0(X)$ is known exactly, so that $\ell_{0k}$ can be computed directly from (2.3), and evaluated at $X = X_k$. Unfortunately, this case is never likely to be met in practice. The second case occurs when the form of the density is congenial in that $\ell_0$ is exactly collinear with $X$ or some known function of $X$. We discuss this case in conjunction with the normal distribution examples in Section 4.

In general applications the above circumstances will not be valid, and the score vectors $\ell_{0k}$ will have to be estimated. Here we assume that the marginal distribution of $X$ is modeled, with the density $p_0(X)$ known up to some estimable parameters, and establish that the natural estimates of the score vectors $\ell_{0k}$ provide valid instruments. We then briefly raise the prospect of estimating $\ell_{0k}$ nonparametrically.
Suppose that the marginal density is assumed to lie within a parametric family $p^*(X \mid \Lambda)$, so that $p_0(X) = p^*(X \mid \Lambda_0)$, where $\Lambda$ is a finite vector of parameters which can contain the mean, variances and covariances, etc. of $X$, with true value $\Lambda = \Lambda_0$. We make Assumption 6 of the Appendix, which assumes that $p^*$ is twice differentiable with respect to the components of $\Lambda$ as well as some other regularity properties.
The application of Theorem 1 now proceeds in two steps. First obtain any strongly consistent estimate $\hat{\Lambda}$ of $\Lambda = \Lambda_0$ using the data $X_k$, $k = 1,\ldots,K$. Standard goodness of fit tests can easily be performed at this stage to assure the suitability of the assumed parametric form $p^*$. Next construct estimates of the score vector $\ell_{0k}$ for each $k = 1,\ldots,K$ by evaluating (2.3) at $X_k$ and $\hat{\Lambda}$ as

(3.14)    $\hat{\ell}_{0k} = -\left.\dfrac{\partial \ln p^*(X \mid \hat{\Lambda})}{\partial X}\right|_{X = X_k}$

and form the instrumental variable estimator $\hat{d}^* = (\hat{d}_1^*,\ldots,\hat{d}_M^*)'$ of (3.1) using $\hat{\ell}_{0k}$ as in

(3.15)    $\hat{d}^* = (\hat{S}_{0x})^{-1}\hat{S}_{0y}$

where $\hat{S}_{0x}$ and $\hat{S}_{0y}$ are the sample covariance matrices between $\hat{\ell}_{0k}$ and $X_k$ and $y_k$, respectively. The justification of this procedure is formalized as
Theorem 2: Under Assumptions 1, 2, 3, 4, 5 and 6, $\lim_{K \to \infty} \hat{d}^* = \gamma\beta$ a.s.

Proof: A direct application of Theorem 1 above and Theorem 7 of Stoker (1983) reinterpreted to apply to the elements of $\Lambda$. QED

While Theorem 2 permits the implementation of Theorem 1 in applications, it relies on specific modeling of the independent variable distribution. A question of significant practical importance concerns whether the score vectors $\ell_{0k}$ can be nonparametrically estimated, because then specific modeling assumptions on $p_0(X)$ would not be necessary. A number of natural methods for such nonparametric estimation come to mind, such as to use an adaptive score estimate of the type proposed in Stone (1975), Bickel (1982) and Manski (1984). Unfortunately, to the author's knowledge, no results are available on nonparametric score vector estimation for multivariate distributions, as the above papers are concerned only with univariate distributions. Consequently, this topic is mentioned because of its natural importance, but relegated to future research.
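The two-step procedure of (3.14) and (3.15) can be sketched as follows. This is an illustration under assumptions that are not in the paper: the parametric family $p^*$ is taken to be a product of logistic densities with unknown locations and scales, $\hat{\Lambda}$ is obtained by the method of moments, the outcome is a censored tobit variable, and all numerical values, the seed and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 200_000
alpha, beta = 0.5, np.array([1.0, 0.5])          # hypothetical true values
loc_true = np.array([1.0, -2.0])
scale_true = np.array([0.5, 2.0])

# Data: independent logistic regressors, censored (tobit-type) outcome.
X = rng.logistic(loc=loc_true, scale=scale_true, size=(K, 2))
y = np.maximum(0.0, alpha + X @ beta + rng.normal(size=K))

# Step 1: strongly consistent moment estimates of the density parameters.
# A logistic variable with scale s has variance (pi * s)^2 / 3.
loc_hat = X.mean(axis=0)
scale_hat = X.std(axis=0) * np.sqrt(3.0) / np.pi

# Step 2: estimated scores (3.14), i.e. -d ln p*(X | Lambda_hat) / dX at X_k.
# For the logistic density this is tanh((x - loc) / (2 s)) / s per component.
scores_hat = np.tanh((X - loc_hat) / (2.0 * scale_hat)) / scale_hat

# Step 3: instrumental variables estimator (3.15).
Xc = X - X.mean(axis=0)
yc = y - y.mean()
S0x_hat = scores_hat.T @ Xc / K
S0y_hat = scores_hat.T @ yc / K
d_star = np.linalg.solve(S0x_hat, S0y_hat)

print("scale_hat:", scale_hat)
print("d_star ratio:", d_star[1] / d_star[0], " target:", beta[1] / beta[0])
```

In this design the ratio of the two estimated slopes should be close to the ratio of the true coefficients, even though the function $F$ implied by censoring is never specified in the estimation.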
3.4 Scale Free Inferences on $\beta$

The above results establish the strong consistency of the instrumental variables coefficients $\hat{d}^*$ (and $\hat{d}$) as an estimator of $\gamma\beta$. In this section the asymptotic distribution of $\hat{d}^*$ (and $\hat{d}$) is established, which allows scale free hypothesis tests on the true value of $\beta$ to be carried out. Examples of scale free hypotheses include zero restrictions ($\beta_j = 0$), equality restrictions ($\beta_i = \beta_j$) and ratio restrictions ($\beta_i/\beta_j = c$). Because the data on observed variables and instruments represent i.i.d. drawings, the asymptotic distribution of $\hat{d}^*$ can be established by very standard methods. We sketch the argument below, which is just the appropriate specialization of the results of White (1980, 1982), among others.
Consider the generalized setting where an $M$-vector of variables $W_k$ is observed, so that the full set of observations $(y_k, X_k', W_k')$, $k = 1,\ldots,K$, represents a random sample from a joint distribution of $y$, $X$ and $W$. Of interest is the asymptotic distribution of the instrumental variables estimator $\hat{d}_w$ of equation (3.1) obtained by instrumenting with $(1, W_k')'$, defined as

(3.16)    $\hat{d}_w = (S_{wx})^{-1} S_{wy}$

where $S_{wx} = \sum_k (W_k - \bar{W})(X_k - \bar{X})'/K$ and $S_{wy} = \sum_k (W_k - \bar{W})(y_k - \bar{y})/K$ are the relevant sample covariance matrices. We collect a sufficient set of regularity conditions for the following results on $\hat{d}_w$ as Assumption IV in the Appendix. Assumption 7 lists the requirements not covered by Assumptions 1-6 for the specific application of this paper.

If we define $\delta_w = (\Sigma_{wx})^{-1}\Sigma_{wy}$, where $\Sigma_{wx}$ and $\Sigma_{wy}$ are the covariance matrices between $W$ and $X$ and $y$, respectively, then clearly $\lim \hat{d}_w = \delta_w$ a.s. If we in turn define $u_{wk} = (y_k - \bar{y}) - (X_k - \bar{X})'\delta_w$, then we have immediately that $E_0(u_{wk}) = 0$, $\lim \sum_k (W_k - \bar{W})u_{wk}/K = 0$ a.s., and

(3.17)    $\sqrt{K}(\hat{d}_w - \delta_w) = (S_{wx})^{-1}\left[\sum_k (W_k - \bar{W})u_{wk}/\sqrt{K}\right]$

By applying plim $S_{wx} = \Sigma_{wx}$, and the Central Limit Theorem to the second term, we have
Theorem 3: Under Assumption IV, as $K \to \infty$, $\sqrt{K}(\hat{d}_w - \delta_w)$ is asymptotically normal with mean 0 and covariance matrix $V_w = (\Sigma_{wx})^{-1}\Sigma_{wu,wu}(\Sigma_{wx}')^{-1}$, where $\Sigma_{wu,wu}$ is the covariance matrix of $(W - E(W))u_w$, with $u_w = (y - E(y)) - (X - E(X))'\delta_w$.

Following White (1980, 1982), $V_w$ is consistently estimated by $\hat{V}_w = (S_{wx})^{-1}S_{wu,wu}(S_{wx}')^{-1}$, where $S_{wu,wu} = \sum_k [(W_k - \bar{W})(W_k - \bar{W})'\hat{u}_{wk}^2]/K$, and $\hat{u}_{wk} = (y_k - \bar{y}) - (X_k - \bar{X})'\hat{d}_w$ is the estimated residual from (3.1) using $\hat{d}_w$.

Theorem 3 establishes the asymptotic distributions for all of the linear coefficients estimators studied in Stoker (1982, 1983). For $\hat{d}$ of (3.2) above, Theorem 3 is applied by setting $W_k = \ell_{0k}$ to yield

Corollary 4: Under Assumptions 1, 2, 3, 4, 5 and 7, as $K \to \infty$, $\sqrt{K}(\hat{d} - \gamma\beta)$ is asymptotically normal with mean 0 and covariance matrix $V = \Sigma_{0u,0u}$, where $\Sigma_{0u,0u}$ is the covariance matrix of $\ell_0 u_0$, with $u_0 = (y - E_0(y)) - (X - E_0(X))'\gamma\beta$.

Proof: Lemma 1 of Stoker (1983) implies that $\lim S_{0x}$ is the Jacobian matrix of $E(X) = \mu_0 + \theta$ with respect to $\theta$, which is the $M \times M$ identity matrix. Theorem 3 above then yields the result. QED

Following Theorem 2, since plim $\hat{S}_{0x}$ = plim $S_{0x}$, we also have

Corollary 5: Under Assumptions 1, 2, 3, 4, 5, 6 and 7, as $K \to \infty$, $\sqrt{K}(\hat{d}^* - \gamma\beta)$ is asymptotically normal with mean 0 and covariance matrix $V$.

As above, the asymptotic covariance matrix $V$ for $\hat{d}^*$ is consistently estimated by $\hat{V} = \hat{S}_{0u,0u} = \sum_k [(\hat{\ell}_{0k}\hat{\ell}_{0k}')\hat{u}_{0k}^2]/K$ or $\hat{V}^* = (\hat{S}_{0x})^{-1}\hat{S}_{0u,0u}(\hat{S}_{0x}')^{-1}$, where $\hat{u}_{0k}$ is the estimated residual from (3.1) with coefficients $\hat{d}^*$; namely $\hat{u}_{0k} = (y_k - \bar{y}) - (X_k - \bar{X})'\hat{d}^*$.

Corollaries 4 and 5 establish the asymptotic distribution required for testing hypotheses on the value of $\gamma\beta$. This facilitates the testing of certain hypotheses on the value of $\beta$, which are scale free in that they are unaffected by the true value of $\gamma$. For example, if $l$ is an $M$-vector of constants, the linear restriction $l'\beta = 0$ is equivalent to $l'(\gamma\beta) = 0$, under which the test statistic $\sqrt{K}\,l'\hat{d}^*$ is asymptotically normal with mean 0 and variance $l'Vl$. Therefore, by choosing appropriate values of $l$, tests of zero restrictions (such as $\beta_j = 0$) and equality restrictions (such as $\beta_i = \beta_j$) can be carried out using $\hat{d}^*$ and $\hat{V}$.
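The covariance estimator and the scale free tests of this section can be sketched numerically. The code below is an illustration only: it uses the sandwich form of the covariance estimate, a standard delta-method variance for a ratio of two slope components, and a simulated binary choice design with a known standard logistic density for $X$. The design, the seed, and the function names `iv_fit`, `linear_z` and `ratio_se` are all hypothetical.

```python
import numpy as np

def iv_fit(y, X, scores):
    """IV slopes of (3.1) and a sandwich covariance estimate for sqrt(K)(d - gamma*beta)."""
    K = len(y)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    Lc = scores - scores.mean(axis=0)        # centering is harmless: scores have mean 0
    S0x = Lc.T @ Xc / K
    d = np.linalg.solve(S0x, Lc.T @ yc / K)
    u = yc - Xc @ d                          # estimated residuals from (3.1)
    S0u = (Lc * (u ** 2)[:, None]).T @ Lc / K
    S0x_inv = np.linalg.inv(S0x)
    V = S0x_inv @ S0u @ S0x_inv.T
    return d, V

def linear_z(d, V, l, K):
    """z statistic for the scale free linear restriction l'beta = 0."""
    return np.sqrt(K) * (l @ d) / np.sqrt(l @ V @ l)

def ratio_se(d, V, i, j, K):
    """Delta-method standard error of the ratio d_i / d_j."""
    var = (V[i, i] / d[j] ** 2
           - 2.0 * d[i] * V[i, j] / d[j] ** 3
           + d[i] ** 2 * V[j, j] / d[j] ** 4)
    return np.sqrt(var / K)

rng = np.random.default_rng(2)
K = 100_000
beta = np.array([1.0, 1.0, 0.0])             # hypothetical: beta_1 = beta_2, beta_3 = 0
X = rng.logistic(size=(K, 3))
y = (X @ beta + rng.logistic(size=K) > 0).astype(float)

d, V = iv_fit(y, X, np.tanh(X / 2.0))        # tanh(x/2) is the standard logistic score
z_equal = linear_z(d, V, np.array([1.0, -1.0, 0.0]), K)   # true restriction
z_zero = linear_z(d, V, np.array([0.0, 0.0, 1.0]), K)     # true restriction
z_false = linear_z(d, V, np.array([1.0, 0.0, 0.0]), K)    # false restriction
se_ratio = ratio_se(d, V, 0, 1, K)
print(z_equal, z_zero, z_false, d[0] / d[1], se_ratio)
```

Under the two true restrictions the z statistics should behave like standard normal draws, while the statistic for the false restriction $\beta_1 = 0$ should be very large.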
Tests on the value of a nonlinear differentiable function of $\gamma\beta$ can be derived by the "delta" method in the usual way. As an example, for testing whether the ratio $\beta_i/\beta_j$ is equal to a specific value, we have that $\sqrt{K}[(\hat{d}_i^*/\hat{d}_j^*) - (\beta_i/\beta_j)]$ is asymptotically normal with mean 0 and variance $\sigma_{ij}$, where

(3.18)    $\sigma_{ij} = \dfrac{v_{ii}}{d_j^2} - \dfrac{2 d_i v_{ij}}{d_j^3} + \dfrac{d_i^2 v_{jj}}{d_j^4}$

where $d_i = \gamma\beta_i$ and $v_{ij}$ is the $i,j$ element of $V$. $\sigma_{ij}$ is consistently estimated by evaluating (3.18) using the appropriate components of $\hat{d}^*$ and $\hat{V}$.

4. Independent Variable Distribution Examples and Related Discussion

Here we present several examples based on specific forms of the marginal distribution of $X$, to illustrate the structure of the instruments $\ell_{0k}$ and relate our results to the previous literature.
4.1 Multivariate Normal Distributions

As indicated above, the implementation of Theorem 1 is particularly easy if the score vector $\ell_0$ is exactly collinear with $X$, for then $\hat{d}$ is the vector of OLS slope coefficients of $y_k$ regressed on $X_k$, $k = 1,\ldots,K$. It is easy to verify that this situation will occur if and only if $p_0(X)$ is of the multivariate normal form over $\Omega$, as follows. Suppose that $\ell_0$ can be written in the form

(4.1)    $\ell_0 = A + BX$

where $A$ is an $M$-vector and $B$ an $M \times M$ matrix of constants, where without loss of generality, $B$ is symmetric and nonsingular. Since $E_0(\ell_0) = 0$, we have that $A = -B(E_0(X)) = -B\mu_0$. Now, in view of (2.3), we integrate (4.1) with respect to $X$, which implies that $\ln p_0(X)$ must be of the form

(4.2)    $\ln p_0(X) = C - (1/2)(X - \mu_0)'B(X - \mu_0)$

for some constant $C$, which for $\Sigma^{-1} = B$ clearly implies that

(4.3)    $p_0(X) = \dfrac{p_N(X \mid \mu_0, \Sigma)\,1(X \in \Omega)}{\int_{\Omega} p_N(X \mid \mu_0, \Sigma)\, d\nu}$

where $p_N(X \mid \mu_0, \Sigma)$ is the multivariate normal density with mean $\mu_0$ and covariance matrix $\Sigma$, and $1(X \in \Omega)$ is the indicator function of the event $X \in \Omega$. Consequently, (4.1) implies that $p_0(X)$ is multivariate normal over the carrier $\Omega$, and when $\Omega = R^M$, then $p_0(X) = p_N(X \mid \mu_0, \Sigma)$.

It is informative to reexamine the structure of Theorem 1 when $\Omega = R^M$ and $p_0(X) = p_N(X \mid \mu_0, \Sigma)$. The translation family $\Pi_N$ is defined via $p(X \mid \theta) = p_N(X \mid \mu_0 + \theta, \Sigma)$. The induced marginal distributions of $Z = \alpha + X'\beta$ are in general determined by $\theta'\beta$, as $p^*(Z \mid \theta'\beta) = p_N(Z \mid \alpha + \mu_0'\beta + \theta'\beta, \beta'\Sigma\beta)$. The mean of $y$ can be computed via (2.4) as $E(y) = \phi^*(\theta)$, or equivalently using the marginal density of $Z$ as

(4.4)    $E(y) = \phi^*(\theta) = \int F(Z)\, p^*(Z \mid \theta'\beta)\, d\nu = \phi^{**}(\theta'\beta)$

The aggregate effects on $E(y)$ of changing $\theta$ can therefore be written as

(4.5)    $\dfrac{\partial \phi^*}{\partial \theta} = \left(\dfrac{\partial \phi^{**}}{\partial(\theta'\beta)}\right)\left(\dfrac{\partial(\theta'\beta)}{\partial \theta}\right) = \left(\dfrac{\partial \phi^{**}}{\partial(\theta'\beta)}\right)\beta$

Now, since $\Pi_N$ is an exponential family with driving variable $X$, the results of Stoker (1982) imply that the OLS slope coefficient vector $\hat{d}$ of (3.1) converges strongly to $\partial\phi^*(0)/\partial\theta$, which from (4.5) gives the result of Theorem 1. The earlier interpretation of the scaling factor $\gamma$ as the effect on $E(y)$ of varying $\eta = E(Z) = \alpha + \mu_0'\beta + \theta'\beta$ is obvious from (4.5). Finally, since $p^*(Z \mid \theta'\beta)$ is in exponential family form with driving variable $Z$, another application of the same results from Stoker (1982) indicates that $\gamma$ is the a.s. limit of the OLS regression coefficient of $y_k$ on $Z_k = \alpha + X_k'\beta$.

It is useful at this point to discuss the results of Ruud (1983a), Deaton and Irish (1984) and Chung and Goldberger (1984). While Ruud (1983a) studies quasi maximum likelihood estimators for binary discrete choice models and Deaton and Irish (1984) and Chung and Goldberger (1984) employ a generalized definition of the indicator $Z$, each paper utilizes a condition that $E(X \mid Z)$ is linear in $Z$;

(4.6)    $E(X \mid Z) = G + HZ$

where $G$ and $H$ are $M$-vectors. For $Z = \alpha + X'\beta$, (4.6) is implied by multivariate normality of $X$. The value of condition (4.6) is that it makes $X$ effectively one dimensional for the purpose of calculating covariances with $y$, so that the function $F$ impacts only on a scalar covariance.
This is easily seen from the
following proof, which is basically Chung and Gol dberger
'
s
(
1984)
result for
censored model cases. Let y=F(Z), Sxx, Exy, and Lxz denote the respective
covariance matrices between
X,
y
and Ozy and Cx' denote the respective
and Z,
scalar covariance values. Now, begin by expressing Lxy as
(4.7)
Lxy
=
Eo((X-Po)y)
=
H
=
E2(E(X-|JolZ)F(Z))
Ez ((Z-r>)o)F(Z))
=
H
azy
where the second equality follows fro* (2.1) and the latter equalities follow
:u
•from
(4.6).
function
y
F.
Note that the value of ary entirely captures the impact
recalling here that
Noh,
regressed on
X,
the
the OLS slope coefficient vector of
is
d
o-f
we have
(4.8)
Hcrz,
r^)
where the latter equality -follows from H=I!xz/o-z' and B= (Exx)
~' Exz.
This argument was recalled in order to indicate the identical role played by the linearity condition (4.6) and density translation in the normal distribution case. The fact that the marginal densities of $Z_k = \alpha + X_k'\beta$ depend only on $\theta'\beta$ suffices to reduce the dimensionality of the aggregate effects as in (4.5), which is exactly the impact of the linearity condition (4.6) on (4.7). Moreover, equality between (4.5) and (4.8) gives an alternative demonstration that $\gamma = \sigma_{zy} / \sigma_z^2$, the large sample OLS slope regression coefficient of $y_k$ on $Z_k$.
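The proportionality result for normally distributed regressors can be checked numerically. The following is a minimal simulation sketch, not part of the original paper: the parameter values, the censored-at-zero outcome and the helper name `ols_slope` are all illustrative assumptions. It compares the ratios of the OLS slopes to the index coefficients with the OLS coefficient of $y_k$ on $Z_k$.

```python
import numpy as np

def ols_slope(X, y):
    """OLS slope coefficients of y on X (the intercept is handled by centering)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)

rng = np.random.default_rng(0)
K = 400_000                                  # sample size (illustrative)
alpha = 0.3                                  # hypothetical index intercept
beta = np.array([1.0, -2.0, 0.5])            # hypothetical index slopes
mu0 = np.array([0.5, 0.0, -1.0])             # hypothetical mean of X
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [-0.3, 0.4, 1.0]])
Sigma = A @ A.T                              # hypothetical covariance matrix of X

X = rng.multivariate_normal(mu0, Sigma, size=K)
Z = alpha + X @ beta
# Assumed behavioral model: outcome censored at zero, so E(y|X) depends on X only through Z.
y = np.maximum(0.0, Z + rng.standard_normal(K))

d = ols_slope(X, y)                          # OLS slopes of y on X
gamma_hat = ols_slope(Z[:, None], y)[0]      # OLS coefficient of y on Z
ratios = d / beta                            # each ratio should be close to gamma_hat

print("d / beta:", ratios)
print("OLS coefficient of y on Z:", gamma_hat)
```

With normal regressors the three ratios should agree with each other and with the coefficient of $y_k$ on $Z_k$ up to sampling error, even though the linear model for $y$ is misspecified.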
Given interest in conditions that depend only on the marginal distribution of $X$, it is natural to inquire how much more general than multivariate normality is the linearity condition (4.6) with $Z = \alpha + X'\beta$. We have no concrete answer here, although no obvious examples of nonnormal densities where (4.6) is valid for all coefficient values are immediate. It is true that if the individual components of $X$ are independent or homoscedastic, then (4.6) implies multivariate normality (under some regularity conditions) (c.f. Kagan, Linnik and Rao(1973)), although the implications of (4.6) to more general circumstances are not known to the author.
4.2 Mixtures of Normals

An obvious circumstance where the $X$ data was nonnormal would occur if the sample distribution displayed several pronounced modes. In this case, it might be appropriate to model the $X$ density via a mixture of normals. We indicate below that the large sample limit of $d$ in this case is the appropriate weighted average of the limits of the OLS slope coefficients that would be obtained from regressions over each of the component normal distributions.

All the intuition of this example can be seen in the case of a two component normal mixture. Suppose that the marginal distribution of $X$ is given as

(4.9)   $p_0(X) = \lambda p_1(X) + (1 - \lambda) p_2(X)$

where $p_1(X) = p_N(X \mid \mu_1, \Sigma_1)$ and $p_2(X) = p_N(X \mid \mu_2, \Sigma_2)$ are the normal component densities, and $0 < \lambda < 1$. The relevant score vector $\ell_0$ is

(4.10)  $\ell_0 = -\dfrac{\partial \ln p_0(X)}{\partial X} = w(X)\, \Sigma_1^{-1}(X - \mu_1) + (1 - w(X))\, \Sigma_2^{-1}(X - \mu_2)$

where $w(X) = \lambda p_1(X) / p_0(X)$. If $d$ is the instrumental variables slope vector of (3.1) using $\ell_{0k}$ as instrument, then $\lim d = \gamma\beta$ a.s. By direct computation, the limit of $d$ can be written as

(4.11)  $\lim d = \lambda \int \Sigma_1^{-1}(X - \mu_1)\, y\, q(y \mid X)\, p_1(X)\, d\nu + (1 - \lambda) \int \Sigma_2^{-1}(X - \mu_2)\, y\, q(y \mid X)\, p_2(X)\, d\nu = \lambda d_1 + (1 - \lambda) d_2$

where $d_1$ is the large sample value of the OLS coefficient of $y$ on $X$ if $X$ was distributed with respect to the normal density $p_1(X)$, and $d_2$ is the large sample value of the OLS coefficient of $y$ on $X$ if $X$ was distributed with respect to the normal density $p_2(X)$. Consequently, one can consider the proper slope estimator in this context as weighting together regression coefficients from samples distributed with respect to each of the component densities. From Section 4.1 we have that $d_1 = \gamma_1 \beta$ and $d_2 = \gamma_2 \beta$, where $\gamma = \lambda \gamma_1 + (1 - \lambda) \gamma_2$. This is consistent with the formula $E(y) = \phi^{**}(\theta'\beta) = \lambda \phi_1^{**}(\theta'\beta) + (1 - \lambda) \phi_2^{**}(\theta'\beta)$, where $\phi^{**}$, $\phi_1^{**}$ and $\phi_2^{**}$ are the aggregate functions derived as in (4.4) from translation families generated by $p_0(X)$, $p_1(X)$ and $p_2(X)$ respectively.

Of course, separate OLS estimates of $d_1$ and $d_2$ could not in general be computed with observed data, because it is not in general possible to identify which observations $X_k$ were drawn individually from $p_1(X)$ or from $p_2(X)$. Also, to compute $d$, estimates $\hat{\ell}_{0k}$ of the true score vectors have to be constructed, which requires estimates of $\mu_1$, $\mu_2$, $\Sigma_1$, $\Sigma_2$ and $\lambda$. These could be obtained as the consistent roots of the likelihood equation for $X_k$, $k = 1,...,K$ implied by (4.9). Finally, the above weighted component regression interpretation clearly holds for the case of a mixture of more than two normal components.
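As an illustration of the mixture case, the sketch below builds the score vectors of a two component normal mixture and uses them as instruments. It is a hedged example rather than the author's implementation: the function names (`normal_pdf`, `mixture_score`, `iv_slope`), the parameter values and the binary outcome model are assumptions, and the true mixture parameters are plugged in where an application would use estimates.

```python
import numpy as np

def normal_pdf(X, mu, Sigma):
    """Multivariate normal density p_N(X | mu, Sigma), evaluated row by row."""
    diff = X - mu
    sol = np.linalg.solve(Sigma, diff.T).T
    quad = np.einsum("ij,ij->i", diff, sol)
    _, logdet = np.linalg.slogdet(Sigma)
    M = X.shape[1]
    return np.exp(-0.5 * (quad + logdet + M * np.log(2.0 * np.pi)))

def mixture_score(X, lam, mu1, Sigma1, mu2, Sigma2):
    """Score vectors of a two component normal mixture with weight lam on component 1."""
    p1 = normal_pdf(X, mu1, Sigma1)
    p2 = normal_pdf(X, mu2, Sigma2)
    w = lam * p1 / (lam * p1 + (1.0 - lam) * p2)      # w(X) = lam * p1 / p0
    s1 = np.linalg.solve(Sigma1, (X - mu1).T).T       # Sigma1^{-1} (X - mu1)
    s2 = np.linalg.solve(Sigma2, (X - mu2).T).T       # Sigma2^{-1} (X - mu2)
    return w[:, None] * s1 + (1.0 - w)[:, None] * s2

def iv_slope(L, X, y):
    """IV slope of y on X with instruments L (a constant is included via centering)."""
    Lc = L - L.mean(axis=0)
    return np.linalg.solve(Lc.T @ X, Lc.T @ y)

rng = np.random.default_rng(1)
K = 400_000                                           # sample size (illustrative)
lam = 0.4                                             # hypothetical mixing weight
mu1, mu2 = np.array([-1.0, 0.0]), np.array([2.0, 1.0])
Sigma1 = np.array([[1.0, 0.3], [0.3, 0.5]])
Sigma2 = np.array([[0.5, -0.2], [-0.2, 1.0]])
beta = np.array([1.0, -0.5])                          # hypothetical index slopes

from_first = rng.random(K) < lam
X = np.where(from_first[:, None],
             rng.multivariate_normal(mu1, Sigma1, size=K),
             rng.multivariate_normal(mu2, Sigma2, size=K))
# Assumed behavioral model: binary outcome driven by the index X'beta.
y = (X @ beta + rng.standard_normal(K) > 0).astype(float)

d = iv_slope(mixture_score(X, lam, mu1, Sigma1, mu2, Sigma2), X, y)
ratios = d / beta                                     # should be (nearly) equal across components
print("d / beta:", ratios)
```

Note that the component membership of each observation is never used when computing the estimator, only the mixture parameters.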
4.3 Elliptical Distributions

In the same fashion as OLS slope estimators arise when independent variables are multivariate normally distributed, weighted least squares estimators are called for when the independent variables are elliptically distributed. Suppose that the marginal density of $X$ has the elliptical form

(4.12)  $p_0(X) = p^*\big( (X - \mu_0)' \Sigma^{-1} (X - \mu_0) \big)$

where $\mu_0 = E_0(X)$ and $\Sigma$ is a positive definite matrix. Here the score vectors take the form

(4.13)  $\ell_0 = \omega(r(X))\, \Sigma^{-1} (X - \mu_0)$

where $r(X) = (X - \mu_0)' \Sigma^{-1} (X - \mu_0)$ is the distance measure and $\omega(r) = -2\, \partial \ln p^*(r) / \partial r$. The proper instrumental variables estimator $d$ for proportionately estimating $\beta$ is weighted least squares, where the data for the $k$th observation are weighted by $\sqrt{\omega(r(X_k))}$. In the multivariate normal case we have $\omega(r) = 1$ for all $r$. Note in general that $E_0(\ell_0) = 0$ implies that the weights $\omega(r(X))$ are uncorrelated with $X$, however correlations with squares and cross products of $X$ are possible. As above, one would require estimates of the parameters determining $p_0(X)$, in particular $\mu_0$ and $\Sigma$, in order to estimate the proper weights $\omega(r(X_k))$ for each observation $X_k$.
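To make the weighting concrete, the sketch below works through one elliptical family, the multivariate Student-t, which is an assumed example not discussed in the paper. For a density proportional to $(1 + r/\nu)^{-(\nu + M)/2}$ in the quadratic distance $r$, the negative gradient of the log density is $[(\nu + M)/(\nu + r)]\, \Sigma^{-1}(X - \mu_0)$, so the weight is $(\nu + M)/(\nu + r)$. The function names, parameter values and binary outcome model are hypothetical, and the true $\mu_0$ and $\Sigma$ are used in place of estimates.

```python
import numpy as np

def t_weights(X, mu0, Sigma, nu):
    """Weights (nu + M) / (nu + r(X)) for a multivariate Student-t with nu degrees of freedom."""
    diff = X - mu0
    r = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma, diff.T).T)
    return (nu + X.shape[1]) / (nu + r)

def weighted_slope(X, y, mu0, omega):
    """IV slope with instruments omega * Sigma^{-1} (X - mu0); Sigma^{-1} cancels out."""
    WX = omega[:, None] * (X - mu0)
    return np.linalg.solve(WX.T @ (X - X.mean(axis=0)), WX.T @ (y - y.mean()))

rng = np.random.default_rng(2)
K, nu = 400_000, 5.0                          # sample size and degrees of freedom (illustrative)
mu0 = np.array([1.0, -1.0])
A = np.array([[1.0, 0.0], [0.5, 1.0]])
Sigma = A @ A.T                               # scale matrix of the t distribution
beta = np.array([1.0, -0.5])                  # hypothetical index slopes

# Multivariate t draws: a normal vector divided by an independent chi-square scale.
z = rng.standard_normal((K, 2))
g = rng.chisquare(nu, size=K)
X = mu0 + (z @ A.T) * np.sqrt(nu / g)[:, None]
# Assumed behavioral model: binary outcome driven by the index X'beta.
y = (X @ beta + rng.standard_normal(K) > 0).astype(float)

omega = t_weights(X, mu0, Sigma, nu)
d = weighted_slope(X, y, mu0, omega)
ratios = d / beta                             # should be (nearly) equal across components
print("d / beta:", ratios)
```

As the degrees of freedom grow, the weights tend to one and the estimator reduces to OLS, in line with the multivariate normal case.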
4.4 Multivariate Lognormal Distributions

Other cases where $X$ is distributed in a nonnormal fashion occur when the sample distribution of $X$ is skewed, as one would expect for variables measuring income or other wealth components. As a final example we consider the case where $X$ is lognormally distributed; namely where $\ln(X - \lambda)$ is distributed as a multivariate normal vector with mean $\mu^*$ and covariance matrix $\Sigma^*$, with $\lambda$ a vector of constants. Here $\Omega$ is defined as the set $\Omega = \{X \mid X > \lambda\}$, with the standard definition of the lognormal density augmented by setting it to zero for $X \in d\Omega$.

By direct computation, the appropriate score vector for this case is given as

(4.14)  $\ell_0 = [\mathrm{diag}(X - \lambda)]^{-1} \big[ \iota + (\Sigma^*)^{-1} (\ln(X - \lambda) - \mu^*) \big]$

where $\iota$ is a vector of ones and $\mathrm{diag}(X - \lambda)$ is the diagonal matrix with $i$th diagonal element $X_i - \lambda_i$. To construct the vectors $\hat{\ell}_{0k}$, one would evaluate (4.14) at $X_k$, and consistent estimates of $\lambda$, $\mu^*$ and $\Sigma^*$.
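The lognormal score formula can be sanity-checked against a numerical derivative of the log density. The sketch below is illustrative only: the function names and the parameter values are assumptions, and the score is taken to be minus the gradient of the log density with respect to $X$, the sign convention under which the normal case yields $\Sigma^{-1}(X - \mu_0)$.

```python
import numpy as np

def lognormal_logpdf(x, lam, mu_star, Sigma_star):
    """Log density of X when ln(X - lam) is N(mu_star, Sigma_star), for X > lam."""
    u = np.log(x - lam) - mu_star
    quad = u @ np.linalg.solve(Sigma_star, u)
    _, logdet = np.linalg.slogdet(Sigma_star)
    return (-0.5 * (quad + logdet + len(x) * np.log(2.0 * np.pi))
            - np.sum(np.log(x - lam)))

def lognormal_score(x, lam, mu_star, Sigma_star):
    """Closed-form score: [diag(x - lam)]^{-1} [iota + Sigma*^{-1} (ln(x - lam) - mu*)]."""
    shifted = x - lam
    return (1.0 + np.linalg.solve(Sigma_star, np.log(shifted) - mu_star)) / shifted

def numerical_score(x, lam, mu_star, Sigma_star, h=1e-6):
    """Minus the central-difference gradient of the log density."""
    grad = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        grad[j] = (lognormal_logpdf(x + e, lam, mu_star, Sigma_star)
                   - lognormal_logpdf(x - e, lam, mu_star, Sigma_star)) / (2.0 * h)
    return -grad

# Hypothetical evaluation point and parameter values.
x = np.array([2.0, 3.5])
lam = np.array([0.5, 1.0])
mu_star = np.array([0.2, 0.7])
Sigma_star = np.array([[0.6, 0.1], [0.1, 0.4]])

analytic = lognormal_score(x, lam, mu_star, Sigma_star)
numeric = numerical_score(x, lam, mu_star, Sigma_star)
print("closed form:", analytic)
print("numerical:  ", numeric)
```

In an application the same closed form would be evaluated at each observation with estimated parameters substituted for the true ones.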
This example points out the close connection between the proper score vectors and the specification of the index $Z$ in the behavioral equation (2.1). The proper score function is given by (4.14) above, when $X$ is the correct specification of the variables in the index $Z = \alpha + X'\beta$. If, alternatively, we set $\lambda = 0$ and the index $Z$ were defined as $Z = \alpha + \ln(X)'\beta$, then the results of Section 4.1 would apply, with the proper estimator the OLS slope coefficients of $y_k$ regressed on $\ln(X_k)$. While in many applications the precise form of the index $Z$ may not significantly affect the coefficient ratio estimates, it is important for the correct application of our results.
5. Summary and Conclusion

In this paper a linear instrumental variables estimator $d$ is proposed for estimating the ratio of coefficients in single index models. The framework is illustrated by several common examples of limited dependent variables models, as well as models involving a transformed dependent variable. Similar estimators are indicated for multiple index models, and models where extraneous variables are present. The construction of the instrumental variables is discussed, and illustrated by several examples of specific independent variable distributions. The asymptotic distribution of $d$ is established for purposes of statistical inference.

There are two major advantages to the proposed estimator $d$. First, $d$ is nonparametric to the extent that it is robust to many specific functional form and stochastic distribution assumptions. If a particular application requires only estimates of the ratios of components of $\beta$, then $d$ will suffice. Scale free hypotheses on $\beta$ can be tested using $d$. Moreover, in a general application where different sets of modeling assumptions produce substantively different estimated parameter values, $d$ will provide useful information for choosing the best specification.

The other major advantage in using $d$ is that it is a linear estimator, once the instruments are computed. Consequently, once the distribution of the independent variables is characterized, the computation of $d$ is easy and relatively inexpensive, particularly for large data bases.

There are also two drawbacks to the results. First, to construct the proper instruments, the distribution of the independent variables must be modeled, and the score vectors derived from the assumed density. This problem can be overcome by further research on nonparametric estimation of multivariate score vectors, which given the current state of work on adaptive estimation, appears very promising. The second drawback is that our results apply only to estimating the coefficients of continuously distributed variables, but most serious applications to microeconomic data will require using discrete as well as continuous independent variables. While we have indicated above how discrete variables can be accommodated in the estimation of continuous variable coefficients, the question of how to nonparametrically estimate discrete variable coefficients up to scale remains open.
Appendix: Further Regularity Assumptions

For the purpose of differentiating under integral signs, define difference quotients as

        $D_{yj}(y, X, h) = y\, q(y \mid X)\, [p_0(X - h e_j) - p_0(X)] / h$

        $D_{ij}(X, h) = X_i\, [p_0(X - h e_j) - p_0(X)] / h$

for $i,j = 1,...,M$, where $X_i$ is the $i$th component of $X$, $e_j$ is the unit vector with $j$th component 1 and $h$ is a scalar. We now make

Assumption 5: There exist $\nu$-integrable functions $g_{yj}(y, X)$ and $g_{ij}(X)$; $i,j = 1,...,M$ such that for all $h$ where $0 < |h| < h_0$, $|D_{yj}(y, X, h)| \le g_{yj}(y, X)$ and $|D_{ij}(X, h)| \le g_{ij}(X)$ for all $i,j = 1,...,M$.

For the purpose of using estimated parameters to construct the score vector instruments, define

        $\ell_0(\Lambda) = -\dfrac{\partial \ln p^*(X \mid \Lambda)}{\partial X}$

Denote the $j$th component of $\ell_0(\Lambda)$ as $\ell_{0j}(\Lambda)$ and assume

Assumption 6: $p^*(X \mid \Lambda)$ is twice differentiable with respect to the components of $\Lambda$ in an open neighborhood of $\Lambda = \Lambda_0$. There exist measurable functions $G_{yj}(y, X)$ and $G_{ij}(X)$, $i,j = 1,...,M$ such that $|y\, \ell_{0j}(\Lambda)| \le G_{yj}(y, X)$ and $|X_i\, \ell_{0j}(\Lambda)| \le G_{ij}(X)$ for all $\Lambda$ in an open neighborhood of $\Lambda_0$, where $E_0(G_{yj}^{1+\tau})$ and $E_0(G_{ij}^{1+\tau})$ are bounded for some $\tau > 0$, $i,j = 1,...,M$.

A sufficient set of conditions for establishing the asymptotic distribution of instrumental variables slope estimators is given as

Assumption IV: The means and covariance matrices of $y$, $X$ and $W$ exist, and the covariance matrix $\Sigma_{wx} = E[(W - E(W))(X - E(X))']$ is nonsingular. For $u_w = (y - E(y)) - (X - E(X))'\delta_w$, the covariance matrix of $(W - E(W))\, u_w$ exists.

For deriving the asymptotic distribution of the specific estimator $d$ of this paper, we require

Assumption 7: For $u_0 = (y - E_0(y)) - (X - E_0(X))'\gamma\beta$, the covariance matrix of $\ell_0 u_0$ exists.

Footnotes
1. The sensitivity of estimates to specific stochastic distribution assumptions in certain limited dependent variable contexts is well known. For example, Heckman and Singer(1984) illustrate such sensitivity for duration models, and establish an approach based on nonparametrically estimating the stochastic heterogeneity distribution.
2. See also Greene(1981, 1983), Lawley(1943) and Stewart(1983). Ruud(1983b) studies a similar estimation problem and proposes a different technique.
3. The behavioral modeling framework of Deaton and Irish(1984) and Chung and Goldberger(1984) is slightly different to that considered here, since it subsumes situations where $\varepsilon$ (our notation) is uncorrelated with $X$, but possibly not independent.
4. Manski(1975) presents an alternative nonparametric method of estimating both $\alpha$ and $\beta$ for discrete choice models.
5. Note that Assumption 3 requires that $F(\alpha + X'\beta)$ is defined over the set $\Omega(\Theta) = \{X + \theta \mid X \in \Omega, \theta \in \Theta\}$.
6. There are a number of regression estimators that measure the effects of discrete variables, however none appear to estimate the coefficients of discrete variables up to the same scalar multiple as applicable to the continuous coefficients. For example, suppose that $X_2$ is a single discrete variable taking the values 0 and 1, and the behavioral model implies that $E(y \mid X_1, X_2) = F(\alpha + X_1'\beta_1 + X_2 \beta_2)$. The joint density of $X_1$ and $X_2$ can be written as

        $p_0(X_1, X_2) = (1 - \lambda)\, p^0(X_1)$ if $X_2 = 0$
        $p_0(X_1, X_2) = \lambda\, p^1(X_1)$ if $X_2 = 1$

where $\lambda$ is the probability that $X_2 = 1$ and $p^j$ is the conditional density of $X_1$ given that $X_2 = j$. Now suppose that one estimates the equation

        $y_k = c + X_{1k}' d_1 + X_{2k} d_2 + u_k$

using instruments $(1, \hat{\ell}_{0k}, \hat{\ell}_{dk})$, where $\ell_d = \partial \ln p_0(X_1, X_2) / \partial \lambda$, so that $(d_1', d_2)$ is an estimator of the macroeconomic effects of varying $E(X_1)$ and $E(X_2)$ on $E(y)$. It is easy to show that $\lim d_1 = \gamma_1 \beta_1$ and that $\lim d_2 = E_0(y \mid X_2 = 1) - E_0(y \mid X_2 = 0)$. While $d_2$ is a measure of the impact of the discrete variable $X_2$, the conditions under which $\lim d_2 = \gamma_1 \beta_2$ appear to involve severe restrictions on the structure of the function $F$.
7. For instance, in the selection model of Example 4, if a variable $X_i$ was contained in both $X_1$ and $X_2$, then its coefficient $d_i$ from (3.1) will consistently estimate $\beta_{1i} + \gamma_2 \beta_{2i}$, the structural coefficient plus a selection term.
8. Manski(1984) also proposes similar work on multivariate extensions. It should be noted that the nonparametric estimation of $\ell_{0k}$ called for in the present paper is not as demanding as that proposed by Manski, because the $X$ data is observed.
9. Moreover, $V^*$ is just the "heteroscedasticity consistent" variance estimator of White(1980).
10. On this point, see the discussion in Deaton and Irish(1984).
11.
References

Bickel, P.(1982), "On Adaptive Estimation," Annals of Statistics, 10, 647-671.

Chung, C-F. and A. S. Goldberger(1984), "Proportional Projections in Limited Dependent Variable Models," Econometrica, 52, 531-534.

Deaton, A. and M. Irish(1984), "Statistical Models for Zero Expenditures in Household Budgets," Journal of Public Economics, 23, 59-80.

Greene, W. H.(1981), "On the Asymptotic Bias of the Ordinary Least Squares Estimator of the Tobit Model," Econometrica, 49, 505-514.

Greene, W. H.(1983), "Estimation of Limited Dependent Variable Models by Ordinary Least Squares and the Method of Moments," Journal of Econometrics, 21, 195-212.

Goldberger, A. S.(1981), "Linear Regression After Selection," Journal of Econometrics, 15, 357-366.

Heckman, J.(1979), "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

Heckman, J. and B. Singer(1984), "A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data," Econometrica, 52, 271-320.

Kagan, A. M., Y. V. Linnik and C. R. Rao(1973), Characterization Problems in Mathematical Statistics, Wiley, New York.

Lawley, D.(1943), "A Note on Karl Pearson's Selection Formulae," Proceedings of the Royal Society of Edinburgh, Section A, 62, 28-30.

Manski, C. F.(1975), "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

Manski, C. F.(1984), "Adaptive Estimation of Non-linear Regression Models," draft, Department of Economics, University of Wisconsin.

Ruud, P. A.(1983a), "Sufficient Conditions for the Consistency of Maximum Likelihood Estimation Despite Misspecification of Distribution in Multinomial Discrete Choice Models," Econometrica, 51, 225-228.

Ruud, P. A.(1983b), "Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution," draft.

Stewart, M. B.(1983), "On Least Squares Estimation When the Dependent Variable is Grouped," Review of Economic Studies, 50, 737-753.

Stone, C.(1975), "Adaptive Maximum Likelihood Estimators of a Location Parameter," Annals of Statistics, 3, 267-284.

Stoker, T. M.(1982), "The Use of Cross Section Data to Characterize Macro Functions," Journal of the American Statistical Association, 77, 369-380.

Stoker, T. M.(1983), "Aggregation, Efficiency and Cross Section Regression," MIT Sloan School of Management Working Paper No. 1453-83, revised April 1984.

White, H.(1980), "A Heteroskedasticity-Consistent Covariance Estimator and Direct Test for Heteroskedasticity," Econometrica, 48, 817-838.

White, H.(1982), "Instrumental Variables Regression with Independent Observations," Econometrica, 50, 483-500.