Document 11159455

advertisement
Digitized by the Internet Archive
in
2011 with funding from
Boston Library Consortium
Member
Libraries
http://www.archive.org/details/likelihoodinfereOOcher
Dewey
bi
1415
\.6Z-\
Technology
Department of Economics
Working Paper Series
Massachusetts
Institute of
LIKELIHOOD INFERENCE FOR
SOME NON-REGULAR
ECONOMETRIC MODELS
Victor Chernozhukov,
Han Hong,
MIT
Princeton
Working Paper 02-05
February 2002
Room
E52-251
50 Memorial Drive
Combridge, MA02142
This paper can be downloaded without charge from the
Social Science Research Network Paper Collection at
ht tp://papers.ssi'n.coi'n/'paper.taf ?abstract id- 30 1486
f
Massachusetts Institute of Technology
Department of Economics
Working Paper Series
LIKELIHOOD INFERENCE FOR
SOME NON-REGULAR
ECONOMETRIC MODELS
Victor Chemozhukov,
Han Hong,
MIT
Princeton
Working Paper 02-05
February 2002
Room
E52-251
50 Memorial Drive
Cambridge, MA 02142
This paper can be downloaded without charge from the
Social Science Research Network Paper Collection at
http://papers.ssm.com/paper.taf?abstract id= 30 486
1
MASSACHUSETTS INSTITUTE
OF TECHNOLOGY
MAR
7 2002
LIBRARIES
Likelihood Inference in a Class of Nonregular
Econometric Models
*
Victor Chernozhukov
Han Hong
Depaxtment of Economics
Department of Economics
Massachusetts Institute of Technology
Princeton University
First Version:
August 2000
This version: February
11,
2002
Abstract
In this paper
we study
inference for a conditional model with a
the conditional density, where the location Eind size of the
by regression
This interesting structure
lines.
econometric models.
Two prominent
is
jump
jump
in
are described
shared by several structural
examples are the standard auction model
where density jumps from zero to a positive value, and the equilibrium job
search model, where the density jumps from one level to another, inducing
kinks in the cumulative distribution function. This paper develops the asymptotic inference theory for likelihood
based estimators of these models- the Bayes
and maximum likelihood estimators. Bayes and
sical procedures.
offer
some
While
MLE
is
ML
estimators are useful clas-
transformation invariant, Bayes estimators
and computationcJ advantages. They
theoretic
efficiency properties.
We
also have desirable
characterize the limit likelihood as a function of a
Poisson process that tracks the near-to-jump events and depends on regressors.
The approach
auction.
flexible
We
is
applied to an empirical model of a highway procurement
estimated a pareto model of Paarsch (1992) and an alternative
parametric model.
Key Words:^Structural Econometric Model, Auctions, Job Search, Highway
Procurement Auction, Likelihood, Point Process, Stochastic Equisemicontinuity
'We would like to thank Joe Altonji, Stephen Donald, Jerry Hausman, Bo Honore, Joel Horowitz, Shakeeb Khan, Yuichi Kitamura, Rosa Matzkin, Whitney Newey, George Neumann, Harry Paarsch, Frank
Schorfheide, Robin Sickles, Richard Spady, Max Stinchcome as well as participants in the CEME conference, the
EC2
meeting and at various seminars for valuable suggestions. The problem was suggested to
Amemiya. We are grateful to him for enlightening conversations and continued
us by our advisor Takeshi
support.
[For convenience of refereeing only.
To be removed.]
Contents
1
2
3
4
Introduction
1
The Basic Model
2.1
Assumptions
2.2
Definition
2
2
and Motivation of Bayes and
Asymptotic Theory
for
ML
Estimators
3
The Basic Model
5
3.1
Likelihood Limit
5
3.2
Large Sample Properties of Bayes Estimators
7
3.3
Maximum
8
Likelihood
Nonlinear Model with Nuisance Parameters
8
10
4.1
Limits of Likelihood
4.2
Asymptotic Behavior of Bayesian and
ML
Estimators
11
11
5
Efficiency
6
Confidence Intervals and
7
Empirical Illustration
15
A
Useful Background Definitions
20
A.l
A. 2
A.3
B
Practical Questions
13
20
Point Processes
Convex Objectives
20
Stochastic Equisemicontinuity
Proofs for the Linecir
20
Model
Proof of Theorem
21
1
22
B.2
Proof of Theorem 2
Proof of Theorem 3
B.4 Lemmas 4-6
25
B.3
26
B.l
C
Some
Proofs for the Non-Linear
28
Model
30
C.l
Proof of Theorem 4
C.2
Proof of Theorem 5
C.3
Proof of Theorem 6
C.4
Proof of Theorems 7 and 8
34
C.5
Lemmas
36
7-8
30
32
.
33
Introduction
1
we consider a conditional model with a jump
In this paper
whose location and
size are described
by regression
lines.
in
the conditional density,
This model was
first
proposed
by Aigner, Amemiya, and Poirier (1976) in the context of production analysis. Many
recent econometric models also share this interesting structure. For example, in standard
Donald and Paarsch (1993a), the conditional density jumps from
Neumann, and Kiefer
the
density
jumps
from
one
level
to
another,
inducing
in the distribution
kinks
(2001)),
function. In what follows, the former model is referred to as the one-sided or boundary
model, while the latter model is the two-sided. It is typical in these models that the location
auction models,
cf.
zero to a positive value: in equilibrium job search models (Bowlus,
of jump
indispensably related to the parameters of the underlying structural economic
is
model. Learning the parameters of location
is
thus crucial for learning the parameters of
the underlying economic model.
Several important, fundamental papers developed inference methods for such models,
including Aigner,
Amemiya, and
Poirier (1976), Ibragimov
and Has'minskii (1981), Flinn
and Heckman (1982), Christensen and Kiefer (1991), Donald and Paarsch (1996), Donald
and Paarsch (1993b), Donald and Paarsch (1993a), Bowlus, Neumann, and Kiefer (2001).
Ibragimov and Has'minskii (1981) (IH afterwards) obtained the limit distributions of Bayes
and maximum likelihood estimators(MLE) without covariates. Donald and Paarsch (1996)
MLE
dealt with
in the one-sided
(boundary) models with discrete covariates.
by Aigner, Amemiya, and Poirier
and ML estimators in the general two-sided regression model are still unknown. The properties
of Bayes estimators in the one-sided model and the properties of MLE in the one-sided
model with general regressors are also open questions. Without understanding these basic
properties, using classical estimation principles in these econometric applications may be
Nevertheless, the general inference problem posed
(1976) hcis remained unresolved.
The
basic asymptotic properties of Bayes
questionable.
In this paper,
we develop the
Maximum
cisymptotic theory of Bayes and
Likelihood
estimators for a general conditional model of a density jump, including one-sided and
two-sided models with arbitrary covariates.
estimation procedures. While
some
theoretic
Bayes estimators and
MLE is transformation invariant,
MLE
are attractive
the Bayes estimators offer
and computational advantages, and are convenient
in practice.
They
cJso
have desirable efficiency properties.
Further details
hood process
is
may be summarized
as follows.
We will show that
the limit of the
likeli-
a stochastic integral of a Poisson point process that tracks the conditional
near-to-jump events.
The
result
is
analogous in
spirit
to that of
Chemozhukov
(2000),
obtained for the extremal quantile regression.
It will
hood
be shown that Bayes estimators behave asymptotically as functions of the
limit.
asymptotically equivalent to
sample average
is
ML.
In fact, Bayes estimators axe efficient in terms of finite-
and asymptotic average-risk optimality, which strongly
do not study the minimax criteria. Recent contribution by Hirano
risk optimality
justifies their use.^
'MLE
likeli-
Unlike the usual case of regular parametric models, Bayes estimators are not
We
not optimal for loss functions in conventional sense, but
it
may be shown
to be optimal for a
and Porter (2001)
offer
a substantive analysis in this interesting direction in the context
of one-sided discrete covariate models.
We
will also
demonstrate that the
Our proof
likelihood limit.
(1999). In our opinion, the result
econometrics and
Finally,
we
MLE
behaves asymptotically as a function of the
uses the concept of stochastic equi-semicontinuity of Knight
makes
a convincing case for its further applications in
statistics.
will
study these methods
and apply them to estimate models
model we estimate is a stylized pareto model of
in simulations
of a highway procurement auction.
The
Paarsch (1992). The second one
a flexible parametric alternative to the non-parametric
is
first
model of Guerre, Perrigne. and Vuong (2000). We also implemented computer programs
with Monte Carlo Markov Chain methods for the estimators. These programs are available
from the authors.
The paper
organized as follows. Section 2 describes a basic linear model. Section
is
3 develops the asymptotic theory for this model.
Section 4 considers a
more general
nonlinear model with nuisance parameters. Section 5 discusses efficiency issues. Section
Throughout the paper, c and
6 discusses practical aspects of inference and estimation.
C
denote generic positive constants;
distribution, respectively;
2
and
•
|
|
—
>
—
and
>
denote convergence in probability and
denotes the supremum
norm
of a vector.
The Basic Model
This section begins with a basic linear model, which helps establish the main results
Extensions to general nonlinear models are given in section
2.1
The
clearly.
4.
Assumptions
bcisic
model, denoted R, takes the following form
Y,=XlP + ei,
where the error
Cj
(1)
has the conditional density / {e\Xi,P), parameterized by P belonging
R''. We denote the reference parameter as po^
to the set B, a compact, convex subset of
and assume Po S
interior^.
The
conditional density has a
\\mf{e\x,p)
=
jump
at zero:
q{x,p),
ctO
\imf{e\x,P)=p{x,p),
(2)
e-l-O
p{x,p)>q{x,p)+6,s>o,
VxeX,v/3eB.
X
In other words, the conditional density of Y given
jumps at the location X'P, which
depends on the parameter p and covariate X The shape of the density may also depend
on the parameter p. In section 4, it will be made dependent on other parameters as well,
.
and X'P
will
be generalized to a nonlinear function.
generalized Dirac loss function.
We
have two models to consider: the one-sided model and the two-sided model. The
one-sided model has
two-sided model has
conditional density jumping from zero to a positive constant.
its
positive value. Figure
The
conditional density jumping from one positive value to another
its
1 illustrates
The one-sided model is a special case
Amemiya, and Poirier (1976),
the two models.
of the two-sided model. In addition, as suggested by Aigner,
the two-sided model
may be applied to one-sided models, using the lower density
More generally, the two-sided model approximates models
to account for outliers.
sharp increase
finite
region
with a
the density, whose location depends on parameters and regressors.
in
sample distribution of parameter estimates
in
The
such a model are approximated by
that in the model with density jump.
It is
typical in these models that the location of
jump
is
parameters of the underlying structural economic model.
location
We
is
indispensably related to the
Learning the parameters of
thus crucial for learning the parameters of the underlying economic model.
maintain the following additional cissumptions for model R.
Assumption
(C.l) {Yi, Xi)
The
1
is
an
following statements apply to x in
i.i.d.
sequence of vectors in
Kx
M'',
X
and P
defined on
Fx, with compact support X, that does not depend on
>
c
>
(u|i, P)
=
q (x, P)
g(x,/3)
ii.
/
(C.3) Except at
e
=
=
0, for
u <
€ K,
iiT
|c|
is
0,
\fJnf{e
(5
+
>
0,
e
and
P, that are
upper-semicontinuous at
Ex J\-fyf [y - X'P\X;P)
> 0, C >
< K,
in case C.2.ii this
c.d.f
0.
0.
has continuous derivatives in
0, f{e\x, /3)
dominated: sup^^g
all c, e
T^ Pp)- Xt has
or
uniformly in u,x.p. W.l.o.g. /(e|i,/?)
(C.4) There exist
(fi,
Var{X) >
/?,
P and x
(C.2) in addition to (l)-(2), uniformly in
i.
in B:
\dy
<
0; its
bounded
derivative
is
oo.
such that uniformly in x and P: in case C.2.i, for
c\x,p)\<C{,^x)=C\i,\nf{e\x,p)\'+';
apply only to c,e
:
e
+c >
0.
Moreover, sup^ Ep^C{et,Xt)
<
oo.
Assumption C.2 allows for the boundary case, where density is zero to the left side of the
is positive on the right side. It also allows for the two-sided case, where density
positive on both sides. We distinguish these two cases to organize the proofs better.
jump and
is
C.3 and C.4 are needed for uniform convergence of a continuous part of the likelihood
ratio. It will
tails.
consider a
2.2
The
be
satisfied as long as the derivative of the density
is
not ill-behaved in the
Finally, the linearity of the regression function eases the exposition. Section
more general non-linear model with nuisance parameters.
Definition and Motivation of Bayes and
likelihood function for the
LniP)
model
=
n
i<n
is
ML
given by
/(^'
- X',p\Xi;p)dFx{Xi).
Estimators
4
will
A. density of a two-sided
model
B. density of a one-sided
model
\
J
2
4
6
S
10
y
A. cdf of a two-sided model
Figure
1:
B. cdf of a one-sided
model
Panels B. correspond to the one-sided model. Panel A. corresponds to the two-sided
model, which arises in equilibrium job search models and translates into kinks of cumulative
distribution functions, see
models,
when with a low
Bowlus
The two-sided model also arises from one-sided
we draw an outlier which ends up below the boundary of
et al (2001).
probability
the support. This model was initially proposed by Aigner et
for
al
(1976).
See also Hajari (1999)
a robustness critique of the one-sided model in the auction context. Note that the location
of the density
introduced
jump depends on parameters and
in section 4.
regressors. Additional
shape parameters
will
be
and the
ML
estimator'^
defined as
is
=argmin -
/?ML
On
LniP).
the other hand, the Bayes estimator minimizes the posterior expected loss
P
= argmf
-
p„(6
/
dp,
p)
Jb
beB
JB^n{P)q{P)
whhere p„ (x) = p{nx) is a loss function, and q(-) is the prior density or weight function on
B. In the above expression, Ln{P)q{p)/J,^ Ln(P)q{p)dP is the posterior density conditional
on the data (Yt,Xt,t < n). It does not depend on dFx{Xi). We impose the standard
assumptions on p(-) and
Assumption
(D.l) q(-) >
(D.2) p(-) >
2
is
continous on B,
is
convex, and
Examples of conventional
loss
(z)
p
= z'Wz
where A >
IH(1982).
q{-), cf.
is
majorized by a polynomial of
=
as
|u|
—
¥
oo.
quadratic
loss functions, satisfying condition (D.2), include the
W and the absolute deviation loss p
for positive definite
and abs(2)
\u\
(z)
=
A'abs(z),
{\z\}.
ML and Bayes estimators are efand asymptotically equivalent under asymptotic normality. If normality does not
apply, as in our case, the Bayes estimators and MLE are typically not asymptotically
For symmetric and bowl-shaped loss functions, the
ficient
equivalent.
Therefore,
MLE
does not inherit the efficiency of Bayes estimators; Bayes
estimators are average-risk efficient under conventional loss functions, while
Asymptotic Theory
3
We
not.^
The Basic Model
for
begin with the asymptotic behavior of the likelihood process, and then proceed to
asymptotic distributions of Bayes and
3.1
In
MLE is
ML
estimators.
Likelihood Limit
modern eisymptotic
analysis,
a common
first
step
is
to find the finite-dimensional or
marginal limit of the likelihood ratio process. The limit eventually serves to describe the
asymptotic distribution of Bayes and
ML
estimators.
Such an
initial
step
is
called the
convergence of experiments, see van der Vaart and Wellner (1996).
Consider the local likehhood ratio function
£„(z)
^dFxiXi) does not depend on P and
=
is
LniPn
+ z/n)/Ln{Pn),
hence irrelevant.
^See for example van der Vaart (1999). Efficiency for
not for quadratic or absolute deviation
loss.
MLE
could be claimed for the "delta loss" but
,
where
=
/?„
+
/?o
^In.,
6
(5
for
any
finite
denotes the true parameter sequence.
R'^
study asymptotic efficiency
Finite-dimensional
later.
This
is
needed to
weak convergence means that
J
[tn[z,)..
and
(fi-di)
^cxi() is called
a
fi-di limit.
j
<
J)
^
3<J):
{eo.{Zj)..
—
In this section,
>
(3)
denotes convergence under
P/3„
.
Define
pW =p(X/?o) and g(X) =9(X^o)Theorem
1
Given assumption
t,
the fi-di
weak
-
e^{z) =exp{z'EX\p{X)
X exp
limit of likelihood ratio in{z) equals
q{X)]}
where
/ l^{j,x)dN{j,x)
Je
x)=\n^l[0<j< x'z] -Hn 4^1 [0 > i > x' z]
Z,(i,
p(x)
N
a Poisson
is
{Xi} are
i.i.d.
q[x)
random measure N(-) =
with
d.f.
Fx and
,
Jl
=
{^/}
^^i
is
n/q{X!),
1 \[Ji, -^i)
T[
=
and
dom
variables that are also independent of {Xi}
are two
i.i.d.,
2i=i
1 K-^i'
'^D ^
]> 'where
an independent copy of{Xi},
{£i}
{£",'}
ۥ]-!-
-{£[
+
(4)
...+£l),
mutually independent sequences of standard exponential ran-
The result heis an intuitive appeal. The
The deterministic "outer" part,
and {X-}.
limit likelihood ^00(2) hais
two informative
parts.
£i(z)=exp{z'EX\piX)-q{X)]},
can be regarded as information created by the far-from-jump data. The "inner" stochastic
part,
£2(2)
=
exp
can be interpreted
N
is
l,{j,x)dJ>i{j,x)
J
cis
= exp J2UJi,Xi) + J2iM.xl)
.t=i
information created by near-to-jump data.
an asymptotic model of the near-to-jump data. As equation
(4)
of N, {Ji,Xi) and {Ji,XI), depend on regressors Xi in a complex way.
shows, the points
N
is
the limit of
the point process
N(-)= J2 i[{nei,Xi)e-]+ J2 l[ne„X,)£-].
i:ti>0
i:ei<0
The mecisure N(vl) counts the number of points in
the limit behavior of
N(A) depends
ciny
given set A. For any bounded set A,
only on near-to-jump errors nci and the corresponding
6
The
covariate values.
to mutually
dependent
smallest
|£i|'s
gamma
are the ones that matter and they converge in law
variables. Furthermore, in large samples the likelihood
driven mainly by near-to-jump data, revealing
rate
is
Op{n~^)
at
/3
rate.
The
not surprising. In a simplest one-sided case with no covariates,
fast
is
convergence
MLE is the minimal
order statistic that converges to the end-point of the support at Op{n~^) rate.
Note
g{x)
also
an important simplification of the formulae
in the one-sided case.
Since
=
£,{z)^exp{z'EXp{X)},
ifJ, >A'/z,Vi
1
'0
The
z.
inner part £2
is
Otherwise, £2{z)
very informative in assigning the zero likelihood to certain values of
Once
is flat.
£1
and
£2
£2(z) equals
by minimizing the posterior
=
7r„ [u) is
is
assigned a zero inner likelihood.
(z) is
section
=
£n (w) q
the loss function.
and q
As n —>
n{/3
(/3) is
P„)
is
related to the likelihood ratio process
/
p{z -u)-rrn(u)du.
oo, Ujt
(/?n
£„ (z)
is
-|-
/in
u/n)
+ u/u) du.
the likelihood ratio process defined in the previous
approaches
The
E"*
2 Suppose assumptions 1 and
is
is
=
it
7r„(z)
approaches the limit
thus simple to conjecture.
2, also define
T^{z)^j p{z-u)
= argmin^gudFoo
7roo(u)
a function of the likelihood limit only;
and the posterior
limit posterior ttoo
from prior information. The result
Suppose that Z^o
(«) 9 [Pn
= n{B — Pn)-
the prior density function.
^oo (w) / Jjijj^oo (w) du.
Theorem
—
the posterior density on the rescaled parameter space Un
Kn{u)
free
no z
loss:
^niz)=
is
0,
shape the limit likelihood.
The normalized Bayes estimator Z„
p
the outer part £i{z) further shapes the
Large Sample Properties of Bayes Estimators
3.2
where
1,
>
model, when q{x)
likelihood. In the two-sided
Both the
otherwise
for £ao
^^^f
{z) is uniquely defined in
(•)
specified in
Theorem
1:
du.
W^
a.s.
(*),
then
7 -^ 7
Remark
3.1
minimum
at z
The
=
condition (*)
is
automatic
for strictly
0, since iooiz^ is positive a.s.
measure by the assumed non-degeneracy of X.
convex functions p(z) with unique
on a subset of
M''
with positive Lebesgue
Maximum
3.3
MLE
Zn —
Likelihood
n^ML ~ Pn)
maximizes the
local likelihood ratio process:*
Zn = aigmm.^u^ - £„
(z)
Because ^n(-) is a highly non-regular function, the standard uniform convergence arguments are not applicable. One approach, taken by IH{1982), treats £„{) as an element
of a Skorohod space. There are substantive difficulties with this approach in the regression case where there is more than one parameter. Instead, we employ Knight's stochastic
equisemicontinuity, which converts the finite-dimensional convergence of discontinuous objective functions into convergence of argmins.^
this
Appendix
A
provides a brief discussion of
new concept.
Theorem
a.s.,
3 Suppose assumption
then
—
Z„
Remark
The
3.2
1,
and
Zoo
>
=
that
— ^ooC-^)
argmin^gKrf
-
attains a unique
may
fail
MLE
in
W^
(.oo{z)-
condition that —£oo{z) attains a unique
erwise the limit distribution
minimum
minimum
a.s.
is
needed, oth-
to exist.
with discrete covariates in the one-sided model has
The important
special case of
been studied
the remarkable pioneering work of Donald and Paarsch (1996) and Donald
in
The
and Paarsch (1993a).
importantly, also two-sided
results obtained here extend to continuous covariates and,
Ccises.
Nonlinear Model with Nuisance Parameters
4
In this section
we consider a more
general nonlinear model and also introduce nuisance
parameters. While the linear model conveys the basic flavor and allows for a better explanation of the proofs, the nonlinear setup conforms with the economic models described in
the introduction.
The
generalized model, denoted R,
Y,=g{XuP) +
where the error
and a £
AC
£j
R''^.
has conditional density / (e|Xi,
We
assume that the
the reference parameter 70
^If
Zn
is set- valued,
=
(/3o,
set
Q
=B
is
given by
U,
j5,
x
hy P & B C R'^^
compact and convex and that
a), parameterized
A
is
Qo) belongs to the interior of this
we may choose any measurable solution.
set.
Alternatively, define
— t„{Z„) < inf^g^i — fn(z)
which it may be difficult to find
\
Zn
as any measurable
0. Allowing approximate
en,fn
solutions is useful for situations in
the exact optimum.
^MLE is a special Bayes estimator minimizing the posterior loss T„{z) = f Sz{u)(n{u) du = tni^),
where Sz{-) is the delta function, which is too irregular to be a subject of the previous section.
en-approximate argmin:Zn
is s.t.
-t-
The
both
/?
jump
size of the
and
of the conditional density of
e,
may depend on
at 0, given X^.
q:
hm / (e|x, P.a) =
q [x, p. a)
.
\imf{€\x.,P,a)=p{x,P,a),
(5)
£4.0
pixj,a)>
Vi e X,
q{x,0,a)+S.,S>O:,
iP,a) e W' ,d
=
d^
+ d-z-
X
In other words, the conditional density of Y given
jumps at the location g{X,0),
which depends nonlinearly on the parameter P and covariate X. The shape of the density
depends on the parameters P, a and covariates
The additional shape parameter q is
X
.
not related to the parameter of the location function. This model
more
flexible
is
therefore considerably
than the basic linear model.
The assumptions and
model are very
results for the non-linear
the linear
similcir to
model. However, the presence of nuisance parameters adds to the complexity of exposition.
We make the following additional technical
Assumption
(E.l)
The
3
assumptions:
following statements apply to x in
and
is
bounded uniformly
in e,x,7;
in a, uniformly in e,x,7.
eachx and
Var
and sup^g^
7;
(/?,
a) in Q:
=
ln/(y;
either (a) Ep_^
-
exist
€ K,
|c|
hold for
all
^'' (t')
=
If (2/-5(x,i);x,7)
c, £
{y
for each
>
0,
< K,
c,€
:
e
C>
\-§2
+
The parameters
0,
>
0.
5
>
0,
is
except at
upper-semicontinuous at
e
—
0,
=
e
for
c».
uniformly in x and 0.
w.r.t. p,
is
=
(<,s) in
an open
ball at
[^'i(7')]'
(l')]
is
7
=
{P,a),
uniformly nonsingular
such that for any x and 7: in
+ c|x,7)| <
Ccise E.l.i, for all
C(e,x) and in case E.l.ii this only needs to
Moreover, sup^ Ep^C{ei,Xi)
include location parameters
a
7,
uniformly nonsingulcir and bounded, or (b)
and Ep^ [^'>
ln/(e
c
'^
Under the same conditions, second derivatives
< C" (£,x) and sup^ Ep^C"{ei,X,) < 00
the inference about
x and
hold uniformly in 7.
(a) ajid (b)
iiT
e,
- g{X,P)\X;'y)\dy <
g[X^,t); X, .-)'). For 7'
F^^j (7')
and bounded;
There
7
positive definite uniformly in p.
is
QR
(E.4) Let /i(7')
/(e|x,7)
W.l.o.g.
Ex f \^
e,
c.d.f
has continuous and bounded second derivative
two continuous and bounded derivatives
(E.3) g (x,P) has
(E.6)
=
and 7
{Yi,Xi) is an i.i.d. sequence of vectors in Ex R'', defined on {Q.,T,Py). Xi has
Fx, with compact support X. (5) holds, and uniformly in P,a, and x
i.
or ii. / (£|x, P) = q [x, P, a) = 0, for e < 0.
q(x,P,a) >
(E.2) Density /(e|x, 7) hcis continuous derivatives in
(E.5)
X
Eire
<
00.
dominated:
|
gfl^. In
P and shape parameters
/
(e
a. If
P
+
c|i, 7)
is
known,
|
regular. Thus, the assumptions E.1-E.6 reflect a mixture of non-
regular assumptions like in section 3 and in IH(1982) and regular ones like in van der
Vaart and Wellner (1996), chapter 7 (mean-square differentiability). Conditions E.2-E.3
impose reasonable smoothness on the location function and the density function. E.4
imposes a standard mean-square differentiability and finite information matrix for the
shape parameter a. E.5 and E.6 impose a standard domination on the score function and
derivative.
its
We
ML
next derive the limit likelihood process, followed by the asymptotics of Bayes and
estimators.
Limits of Likelihood
4.1
The
model
likelihood for this
of the form:
is
Lnh) =
n
fiy^
-
g{X,,p)\Xi;'y)dFx(Xi),
i<n
ML and Bayes estimators, 7ml
where dFx(Xi) does not depend on 7 and factors out.
iBayes, are defined as in section
The convergence rates needed
2.
n and
limiting likelihood ratio process are given by
Hn
^/n, for
respectively. Define
as a diagonal matrix with l/n in the first dim(/?) diagonal entries and l/^/n in the
=
remaining diagonal entries. Let 7n{z) = 7n + HnZ for z
ratio process is given by £„ {z) = L„(7„(2))/L„(7„).
+
^ K^
(u, v)
—
> denotes convergence in distribution under
jd
HjiS for given 6 6
In this section,
7o
and a
fi
and
to obtain a nondegenerate
Theorem
4 In model R, given assumption
P-y^
.
The
where 7„
,
the finite- dimensional
3,
localized likelihood ratio in (2) takes the following form,: for
A(x)
s
local likelihood
weak
=
(/3ji,
Qn) =
limit of the
dg(x, Po)/dl3
^00 (2) =^100(1^) x£2oo{m),
4oo(w)
=exp Iw'v- -v'Jj^vj
£2oo{u)
=exp [u'EA{X)\p{X) X exp
q(X)]
/ l.u{j,x)dN{j,x)
Ue
JE
d
I^Mto)
^1
Theorem
W
=
where lu{j,x)
process in
The
In
1.
result differs
as expected.
Second,
variable with
its
[0
<
<
j
A(a;)'u]
+
In
normally distributed
is
from that
in section 3.
we have a new term
7.
Note that
if
j
N (O^Jy^)
First,
>
A(a;)'u]
,
N
is
/3
the Poisson
and independent ofN.
A{X) = dg{x,Po)/d0
which
is
replaces
X,
a normal random
^ioo(i') is
a
van der Vaart and Wellner (1996), ch.
were known, we would end up only with the standard term
e.g.
needs to be estimated, we have the mixture of "regular" information about
the shape parameters
/?.
>
variance inversely related to the information matrix. Thus,
the parameter
/?
[0
^ioo(v), the log of
standard term for regular likelihood inference,
^ioo(v). Since
^1
-h (70)
da
a and
the "non-regular" information about the location parameters
Moreover, these information components are asymptotically independent.
10
Asymptotic Behavior of Bayesian and
4.2
=
Next consider the normalized Bayes estimator Z„
Theorem
3,
Suppose
/rj ^oo
also that
Zoo
=
argmin^gjjdFoo [z)
Z-n
—
Z^
2.
P2 {v
—
Z^ =
>
argmin„
1
i'')^2oo {v')dv'.
Z^
Theorem 4
Z„
tion 3,
is
6
(MLE
These
>
Z^ =
and Z^
—
>
/?),
^{a - a))
as-
argmin^gjjj
results generalize
—
a.s.
(*),
then
Z^^
=
argmin„ /^^^
>
=
Zoo
Theorems
Consider next the normalized
minimum
argmin^gKrf
and
2
- ^00(2)
we have Z^
and
3.
Z^
a.s,
—
¥
Z^ = J^^W =
iV(0,
J~^)
are independent.
In view of asymptotic independence befor these
parameters
Also, the limit distribution of the Bayes estimator of
shape parameter q coincides with that of
is
and
in ^looCw)
Nonlinear Models) Under model R, assump-
for
— ^2tx3(z)- Z^
are asymptotically independent.
Anderson's lemma). This
M"*
.
tween the shape information and location information, the estimators
5
Q„)).
p(z). If the additive separability does not hold, part
multiplicative additive separability of £oo{z),
and Z' —
—
-^oo
^
^ooC'^) attains a unique
Zn
By
uniquely defined in
due to multiplicative separability of £oo(^)
Asymptotics
and assuming that
is
applies, while part 2 does not.
is still
= {Zl Z^) = {n0 -
Theorem
/?„), y/n{Q.
Z^
and additive separability of
of the
MLE
—
[z') dz'
- u') (.i^o («') du'
f^^^ pi [u
and
are independent.
Note that the independence
^2oo{'^)
{n{l3
and that p(z)
"'R''
1.
=
(Z^, Z^)
Estimators
Asymptotics for Nonlinear Models) Assume model R,
= pi(u) + p-ziv). For £^ () in Theorem, 4 define:
5 (Bayesian
sumptions 2 and
ML
MLE,
if
the loss function p2
is
symmetric (by
not the case for the estimators of the location parameter
/3.
Efficiency
The Bayes estimators are
loss functions.
Theorem
efforts
This
is
exactly finite-sample average-riskefficient(ARE) under particular
an instance of a well known
result, formally stated in
Theorem
7.
statement an asymptotic one. These results justify one of the main
of this paper - the study of Bayes estimators. The ML estimators of location
8
makes
this
parameters are not equivalent to Bayes estimators even Eisymptotically and, unlike the
usual case, do not share the optimality of Bayes estimators in large samples.
Average
risk efficiency is
Lehmann, and
one of the
others. Before writing
it
classic efficiency
down
11
concepts developed by Wald,
formally for our
csise, it is
helpful to review
.
the basic idea. Given a parameter 7. an estimator 7. and a loss function p^ (x) = p [H~^x).
we can compute the expected risk as Ep^pn{-y — 7 ) The average risk takes the form
Jij
—7
Ep^pni'y
To address
)q{'y)d'y
section 4, and let 7n((5)
fn
=
where 9
is
a weight function
fn{{yi, ^i)i=i):
=
jo
+ Hn6.
and denote the
Consider
RpAf:K) =
q
[
(ARC)
Ep^^^,^ [p [h;^' [/„
JK
the weight or prior measure
is
earlier. Division
all statistics
set of all such
Define the (exact) average risk criterion
where
(e.g.
=
9(7)
1).
the asymptotic results, consider the following notation, define i7„ as in
by 'Leb{K)
mappings as F„.
as
7n(<5)])
]q{i„{S))dS
uniform) and p
/Leb(Ji:),
the loss function, defined
is
immaterial at this point.
is
Theorem 7 (Finite-Sample ARE)
For fsayes £ ^n, defined by
(e.g.
-
(measurable mappings of data)
loss
Suppose model
R
and conditions E.1-E.6
hold.
function p and prior weight q:
fsayes S arg inf Rp,g{f,U„).
/£F„
We
next define asymptotic average risk
(AARC)
as
Rp{{fn},K) =limSUpi?p,Leb(/n,-R'),
n—>oo
K
a compact cube
of W' with center 0, and sequence of estimators {/„} in {F„}. To
extend this definition to entire M'' define
for
,
R,{{fn},R'')
where
K
limsup [R,{{U},K)]
,
W' denotes an increasing sequence of cubes converging to W'
'[
Theorem
=
8 (Asymptotic
ARE)
Suppose model
R
and conditions E.1-E.6
hold.
For
{/sayes} £ {Fn}j defined by loss p and prior (weight) q:
inf
{/"}€{F„}
Because Bayes and
RpifnX) = Rp{{fBayes}X)
ML estimators of location parameters are not Jisymptotically equiv-
alent (equivalence holds for the shape psirameter
a under symmetric
ML estimators are not optimal under the convex loss functions
contrast to the usual case where, for a large class of loss functions,
equivcJent to the Bayes estimator and shares
We
its
loss function P2), the
considered here. This
is
in
MLE is asymptotically
optimality.
next examine whether theoretical efficiency translates into actual efficiency gains
with a simple monte carlo example.
We
consider simple one and two sided models with
two covariates. In the one-sided case we generate data as
~ Uniform [c/m + (1 — l/m) c, c]
c = c-PiXi- P-iX-i, Xj ~ Uniform (a, b),j =
Yo
12
1,2
Table
Mean Squared Error
1:
For
Two
Simulation Experiments
MLE
The
Pi
/?2
01
02
One-sided Model
0.8769
2.2030
0.0473
0.0395
Two-sided Model
0.1761
0.1244
0.0587
0.0502
distribution in this
model,
example can be rationalized by a simple procurement auction
C
draws a random cost
from the uniform distribution on
C through
submitted bids, which depend on
-|-
(1
— 1/m) C.
resulting in the
The two-sided example
the data
Y=
is
=
n
is
We
only observe the
the Bayesian Nash equilibrium bid function:
model above.
a contaminated version of the
Yo with prob. A and
=
0.9
and
Y=
L=
Uniform (L^c/m
first
example. In particular,
sum
200. Table 1 reports the
— l/m)c)
with prob.
using Pi
02
=
0.5
Some
Practical Questions
results enable the construction of confidence sets.
Inference about
(for
=
A
mean square errors across 300 simulations. The
smaller mean square error than MLE.
Confidence Intervals. We must distinguish between the estimates
parameter a and the estimates of the location parameters p.
Bayes
—
1
of
Confidence Intervals and
The obtained
-f (1
We simulate the above two models,
2.
Bayes estimator has a substantively
6
Each bidder
bidders.
(c, c).
generated from:
where we chose A
for
m
which there are n auctions. In each auction there are
in
Y = c/m
Bayes
a parameters
symmetric
is
fully regular.
loss function pi)
is
The
of the shape
limit distribution of either
To
given by N{0,J'~^^).
MLE and
facilitate inference
we
need to estimate J^^^- This can be done by conventional methods, taking the parameter
estimate
P
as given.
CorollEtry 1 Under assumption
^
= -\Y1
3,
for Bayes or
5!^ln/(^i
MLE a,P
- g{Xi,p)\Xi,p,a)
-^
J^„
i=l
An
alternative
brevity.
is
the familiar outer product of scores in a, which
The resampling methods
available for inference about
bootstrap. Although the bootstrap
is
a
we do not
state for
include subsampling and
not investigated formally, we can conjecture
due to Mammen's theorem and asymptotic normality
(see
it
works
Horowitz (2000)).
P poses more difficulties. Neither Bayes nor ML estimadistribution. The nonparametric bootstrap is not consistent in
Inference about the parameter
tors have a standard limit
the present setting.
A
simple counterexample
13
is
the boundary model without covariates,
.
MLE
which case both Bayes and
in
nonparametric bootstrap
We
is
known
to
are functions of the
fail in
minimum
this case (e.g.
order statistics.
The
Horowitz (2000)).
discuss a subsampling and an analytical approach to confidence intervals.
We
more practical and computationally simpler method. Following
Politis, Romano, and Wolf (1999), let Wi,...,Wn„ be equal to the Nn = (") subsets of
size 6 of {{Yi.Xi). i < n}. ordered in any fashion. Let /i ...Ib be chosen randomly with or
without replacement from {1,2, ...Nn}- Now, let On^^i be equal to the statistic of interest
believe subsampling
a
is
,
evaluated at the data set
9i,
Tn{0 — 9), where
is
Tn
=n
W, The approximation
to the limit distribution function of
.
for the location parameters or t„
=
^/n for the shape parameters,
given by
Ln,b{x)
By
inverting Ln,b{x),
=
we obtain
two-sided confidence interval
is
1
i{n{On,b,Ii
-B^
B
- On) <
various a-quantiles
c„^t,ioi)
=
—
-^n U'*)-
^^^
level 1
— a
— t~^c„,(,(1 a/2),^„ — T~^c„,6(a/2)].
— On\ can be used to construct symmetric
obtained as [On
Similarly, the empirical distribution of Tt,\6nfij-
x}
confidence intervals.
Corollary 2 The subsampling method of estmating the lim,it distribution of t„{0 — 0),
is Bayes or MLE, is consistent in the sense of Politis, Romano, and Wolf (1999)
where
(Theorem 2.2.1 (i)-(iii)), and the asymptotic coverage probability of the confidence intervals
achieves the correct nominal value, as long as b —^ oo, b/n -4 0, and B —> oo, as n —> oo.
is discussed in detailed in chapter 9 of Politis, Romano, and
They provide the calibration and the minimum volatility methods. In the
empirical section, we use 1/10 of the sample size. The confidence intervals are not sensitive
The
choice of the block size b
Wolf
(1999).
to block size variation. This
The
distribution.
is
probably due to the fast rate of convergence to the limiting
insensitivity principle underlies the
An alternative is an analytical method,
process
N, taking the estimated parameters
the solutions Z^o. This method
subsampling
is
minimum
method.
volatility
based on simulating the distribution of a poisson
detailed in
as given, then obtaining i^o and computing
Chernozhukov (1999). In the present context,
preferable on computational grounds.
is
Computational Methods. Modern computational methods
are important for
the inference and estimation methods available to practitioners.
It
making
used to be that the
Bayes computations were cumbersome and hampered the applicability
for
many
years.
Markov Chain MonteCarlo (MCMC). This technique allows the simulation of a markov chain Zi, ...Zj whose
marginal distributions are approximately the posterior distribution. The method allows
efficient and numerically stable computation of the Bayes estimates. Detailed discussion
Since approximately 1990, this problem hsis been overcome by
can be found
We
e.g. in
Robert and Casella (1998).
implement these computational methods
The programs
subroutines
pute the
cU'e
cire
available from the authors.
coded
MLE, we
in C.
Our program
for the
models considered
Our implementation
uses uninformative
(flat
is feist
in this paper.
over R"*) prior.
use simulated annealing, an algorithm that handles general
14
main
To comnonsmooth
since the
Table
Worktype
2:
Summary
Statistics
Avg.
Stdev.,
Avg #.
winning bid
winning bid
bidders
#auctions
(1989$, mil)
2
141
1.006
1.149
5.91
3
181
1.500
1.870
8.59
4
405
5.015
9.497
7.46
objective functions.
Therefore,
computer programs needed
7
for
we provide not only
Empirical Illustration
We consider a data set of bids submitted
the
the theory, but also the tools and
implementation.
New
in a
procurement contract auctions conducted by
(NJDOT)
Jersey department of transportation
period, the
NJDOT
in
the years 1989-1997. Over this
conducted 1025 low-price, sealed-bid auctions of contracts to procure
various types of services such as highway work, bridge construction and maintenance, and
road paving. Most of the services procured had few auctions conducted. In the following,
we
consider only three types of services. See table (2) for the
summary
statistics.
Hong
and Shum (2000) give a detailed description of the data.
We focus on the independent
ticular
private value
we assume that the construction
model formulated
Paarsch (1992). In par-
in
cost follows independent pareto distributions, as
studied by Paarsch (1992) and Donald and Paarsch (2000). Precisely, the cost distribution
for construction
companies
is
given by, for 9
h{c)
=
^^
=
(Oi,
9-2):
0<9i<c,0<92.
Paarsch (1992) and Donald and Paarsch (2000) showed that this implies the following
density function of the winning bids, conditional on the number of bidders
and other
m
covaxiates that affect the distribution parameters 9:
g2m{gig2(m-l)/|g2(m-l)-ll}''a"'
(
f{y\m,9)
=
ML
>
»,(™_i)_i
otherwise
I
Table 3 reports the
giffaCm-l)
-r
II 2/
»"="-+'
\
estimates and the Bayesian posterior
model. The variation in the summary
statistics of
mean estimates
indicate that the jobs defined in these contracts are very different.
separate parameter estimates for each type of contract. Also,
and 95% symmetric confidence
we
15
Hence we present
give the
intervals constructed using subsampling
earlier.
for this
the winning bid across types of contracts
95%
equal tailed
method described
Table
3:
One
sided Pareto Estimates
MLE
6-2
logL
01
0-2
0.0160483
0.57543
-1341.05
0.0789063
0.706494
-0.0140822
0.562586
0.0455028
0.541959
2
equal
tail
symmetric
0.0154688
0.598813
0.0839134
0.73919
-0.00578062
0.552359
0.0620981
0.669373
0.0378773
0.598501
0.0957144
0.743614
0.0593045
0.568535
0.0465869
0.356914
0.0377357
0.56294
0.017097
0.268854
0.060384
0.589204
0.0493621
0.357122
0.0396208
0.55182
0.0264918
0.291758
0.0789883
0.58525
0.066682
0.42207
0.0222681
0.646935
0.146176
0.555795
0.00208421
0.627295
0.131181
0.329598
0.0223695
0.666575
0.157989
0.566001
0.00392716
0.627295
0.13406
0.368623
0.040609
0.666575
0.158291
0.742967
3
equal
tail
symmetric
4
equal
Bayes
^1
worktype
tail
symmetric
-1955.05
-7975.9
Table 4 reports the parameter estimates from an alternative two sided model:
\\ e^m{e-i,0-i(m-l)l\e2(m-\)-l\\
ySj-^ + l
1-,
\^
=
f[v\m,e)
"^1
" 2/ ^ e2{m-l)-l
ifO<y<^iMiri:iil
S2(m-1)-1
and
otherwise.
We
chose A
=
0.02,
which accommodates outliers that do not conform
to the theoretical model.
In table 5
we introduce a continuous
only available for work type
where
X
denotes
4.
volume.
traffic
covariate for the traffic volume. This covariate
is
exp (qi + as x X) and 62 — exp (02),
appears to be significant, although there
We parameterize 6i =
The
coefficient
are large discrepancies depending on the estimation
method and the model. This
is
in-
dicative of misspecification.
An
alternative approach to direct parametric inference for independent private value
auction model
Their insight
is
is
the indirect inference approach of Guerre, Perrigne, and
based on examing the
a representative bidder
max
6
f
i
first
Vuong
(2000).
order condition of the optimization problem of
in the equilibrium:
g-i [x)
(6
—
c)
dx
-g^i
(b)
{b-c)+ G-i
(b)
=
Jb
where G-i and g-i denotes the survival function and the density function of the distribution of the minimum bid among bidder i's competitors. Therefore, a two step procedure
can be used. In the
first step,
G-i and g-i are estimated using the bid
16
data. In the second
Table
4:
Two
sided Pareto Estimates
MLE
2
equal
tail
symmetric
3
equal
tail
symmetric
4
equal
tail
symmetric
Bayes
9i
0-2
logL
0,
02
0.259243
0.55482
-372.445
0.22572
0.810755
0.180359
0.498768
0.149349
0.776126
0.272495
0.569737
0.236047
0.85172
0.188709
0.498771
0.154353
0.297086
0.329778
0.61087
0.770756
0.850753
0.418519
0.512854
0.3074
0.361401
0.316776
0.453844
0.203662
0.336442
0.439423
0.529476
0.31702
0.36304
0.320138
0.491592
0.208184
0.344999
0.516895
0.534106
0.406617
0.377803
2.87942
0.535876
1.84626
0.549167
2.46807
0.529865
1.35817
0.53917
3.1199
0.553112
1.98644
0.569452
2.55317
0.518958
1.42778
0.530891
3.20566
0.552793
2.26474
0.567443
worktype
Table
5:
Model with
-656.18
-2375.17
Traffic
Volume (Type
MLE
4)
Bayes
ai
02
as
logL
ai
02
as
one-side
-3.812
-0.436
-0.000
-6996.4
-1.096
-0.488
-0.697
equal
-3.833
-0.440
-0.001
-1.637
-1.092
-0.941
-3.810
-0.409
0.006
0.137
-0.453
-0.557
-3.830
-0.453
-0.006
-2.160
-0.615
-0.911
-3.793
-0.419
0.005
-0.032
-0.362
-0.484
two- side
-1.131
-0.652
0.547
-0.540
-0.639
0.091
equal
-2.150
-0.655
0.544
-0.842
-0.670
0.058
-1.114
-0.634
0.645
-0.461
-0.588
0.138
-1.820
-0.670
0.451
-0.812
-0.681
0.049
-0.441
-0.634
0.642
-0.269
-0.596
0.134
tail
symmetric
tail
symmetric
17
-1984
step, the pseudo-cost values
.9-1 (b)
are constructed for each bid in each auction.
Then the
distribution of these pseudo-values
can be used to infer the latent distribution of the cost parameter. G_, and g-i can be
easily inferred from the data for any symmetric affiliated private value model. It suffices
to observe the winning bid. since
n-l
G.dx)={F{x))
where
F
The
(•)
and /
(•)
/
-
and
g_,(x)
777
—
-
1
f
{x)
and Vuong (2000) uses a non-
F (x)
and f (x). Here we consider a flexible parametric
used the truncated normal distribution parameterized as:
parametric method to estimate
We
\
are the survival and density function of the whining bid, respectively.
indirect inference approach of Guerre, Perrigne,
approach.
1
= i^-^^j F {x)--
/<''->(^)/*(^)
where we take
In 6
=
ai
-I-
the results for worktype
a4 x m, In p
=
we notice that the new model
First,
02
-I-
as x m, In a
=
03
Qe x m. Table 6 reports
-I-
4.
is
an improvement over the pareto model
of the log Hkelihood value, suggesting a significant
improvement
in
the
fit.
in
terms
Although we
did not develop a formal testing procedure for the nonstandard likelihood, the principle
of
Vuong
in
the information-theoretic sense.
(1989) suggests that the model with the higher likelihood
Second,
we note
that the Bayes posterior
likelihood estimates, although
none of the slope
surprising, since the likelihood surface hcis
ters
is
greater.
and the
mean
is
closer to the
estimates differ from the
coefficient reverse its sign.
many more modes when
data
maximum
This
is
not
the number of parame-
Based on the computational experiments, theoretical efficiency properties,
Bayes estimators (posterior means) minimize the globally convex
fact that the
objective function, they
We did not report
may be
preferred.
the results for the case with the traffic volume covariate. Adding this
covariate produced a tiny improvement of the log likelihood and did not change the original
coefficients.
The estimates for the traffic volume coefficients were highly
methods yielded agreeable
insignificant.
Both
results concerning that covariate.
Overall, the two-sided models
fit
data better that one-sided ones, indicating that con-
do not conform the model is important. We also observe that the
parametric variant of the indirect inference approach of Guerre, Perrigne, and Vuong
(2000) is quite valuable and allows to fit the particulcir data better than the Pareto model.
trolling outliers that
Conclusion
We
studied a general model in which the conditional density of the dependent variable
jumps
at a location that
is
parameter dependent. This includes a variety of the boundary-
dependent model discussed
in the recent literature of structural estimation.
18
We
derive
Table
6:
Truncated Normal Model
for
Worktype
4
MLE
woiktype 4
Q2
a-i
as
logL
equal
tail
symmetric
ai
=
Os
oe
-694.854
-2.38883
0.404125
0.628959
-0.0702671
-2.58986
0.207458
-2.56234
-2.39153
-0.23598
-0.14270
-2.61471
0.167639
-2.3802
0.54492
0.80567
-0.04021
-2.5829
0.23593
-2.55203
-2.07808
0.15484
-0.12891
-2.60658
0.168674
-2.22562
2.88633
1.10308
-0.01162
-2.57313
0.24624
Bayes
equal
tail
symmetric
Qi
02
as
Oi
as
06
4.40077
-2.41695
0.643259
-2.38707
-2.26628
0.211352
3.84919
-2.93896
0.462919
-2.60164
-2.56957
0.172165
4.87133
-1.88786
1.047
-1.8852
-1.77164
0.23446
3.88809
-2.94462
0.347846
-2.8686
-2.72846
0.182684
4.91346
-1.88929
0.938672
-1.90554
-1.80411
0.24002
asymptotic distributions of Bayes and
ML
estimators under general conditions, and offer
and inference methods. The
practical computation
results provide a solution to a long-
standing econometric problem.
Our
models;
results extend previous
(2) inclusion
ciency; (3) considering the
and
(4)
in several directions: (1)
handling general regression
effi-
two sided model as a robust alternative to the one-sided model;
using the point process methods to give a precise characterization of large sample
distributions.
erties
work
of Bayes estimators, which enjoy the small and large sample
Bayes estimators are important alternatives to
which the
MLE
does not share.
The methodology
MLE
in this
due to efficiency prop-
paper also provides new
insights into the analysis of asymptotic distributions for models that cannot
be studied
using the conventional tools of locally asymptotically normal models.
The empirical application presented illustrates the usefulness of the results. We estimated some key auction models. The first model was a stylized pareto model of Paarsch
(1992). The second auction model represented a flexible parametric alternative to the nonparametric approach of Guerre, Perrigne, and Vuong (2000). We find that the two-sided
models
fit
data better than one-sided ones, indicating that controlling for outliers that do
not conform the model
is
important.
We
also find that a parametric variant of the indirect
inference approach of Guerre, Perrigne, cind
to
fit
Vuong
(2000)
data better than the direct parametric approach.
19
is
quite valuable
and may allow
A
Useful Background Definitions
A.l
Point Processes
cf. Resnick (1987)) Let E be a locally compact topowith a countable basis, and £ to be the Borel cr-algebra of subsets of E. A point
Definition 1 (Point Measures, Mp{E),
logical space
measure
(pm) p
on {E,£)
is
a measure of the following form: for {xi,i
lection of points (called points of p),
and any
set
A
^
p{A)
€:
=
Yli
>
1},
a countable
col-
p{K) < oo,
p{x) < 1 Vx e £,
G A).
^(^^t
If
K C E compact, then p is said to be Radon. A p.m. p is simple if
compound otherwise. Let Mp(E) be the collection of all Radon point measures. Sequence
{pn} C Mp{E) converges vaguely top, if/ /dp„ —> / fdpioi all functions / e Ck{E) [continuous,
any
for
and
is
and vanishing outside a compact set] (cf. Leadbetter, Lindgren, and Rootzen (1983)).
Vague convergence induces vague topology on Mp{E). Topological space Mp{E) is metrizable as
complete separable metric space. Mp{E) denotes such metric space hereafter. Define Aip{E) to
be tr-algebra generated by open sets.
real-valued,
Definition 2 (Point Processes: Convergence in Distribution.)
is
a measurable
map
N
:
(ft,
—> {Mp
T, P)
the realization of the point process
gence of the point process
N„
(E)
Nniw)
,
Mp(E))
is
taking values in
we shall write N„ => N in Mp{E)
ous and bounded functions h mapping Mp{E) to R.
Resnick (1987):
Je
—
f{x:)<I^n{x)
J^ f(x)dN{x)
>
for
with mean intensity measure
(a) for
F
any
e
£,
m
i.e.
if
{Fi,i
Fp/i(N„) -^ Eph{N)
Note that if Nn => N
for all continuin
Mp(E), then
Random Measure (PRM))
(defined on {E,S)),
Point process
N
is
a
if
and any non-negative integer k
k) are disjoint sets in £,
ifm(F)
then (N(F,),i
<
fc)
cire
=
oo,
independent random variables.
Convex Objectives
A. 2
The
<
A point process in Mp(E)''
if
^ '^"^"1
(b)
for
any / e Ck(E) by continuous mapping theorem.
Definition 3 (Poisson Point Process or
PRM
^'
every elementary event
ui G fi,
some point measure in Mp{E). Weak converMp(E) is the same as for any metric space, cf.
,
result
can be found in Knight (1999).
Knight and Pollaid.
It
a generalization of earlier convexity lemmas by
It is
allows discontinuities
and
R
-
valued objective functions.
Lemma
1 (Guyer) Suppose {Qt} is a sequence of lower-semi-continuous (Isc) convex ^-valued
random functions, defined on R**, and let V be a countable dense subset o/R'*. IfQx fidi-converges
to
Qoo in
R
on
P
where Qoo
is Isc
convex and
argmin Qt{z)
provided the latter
A. 3
is
uniquely defined
a.s.
in
finite
—
>
R
on an open non-empty
set a.s., then
argmin Qoo{z),
.
Stochastic Equisemicontinuity
The remarkable concepts of this section were recently developed by Knight
summarizes some essential elements we need.
20
(1999).
The
following
Epi- Convergence. Suppose the sequence of objectives {Qn} are random lower semi-continuous
(I-sc) functions (i.e.
Q„(x) < liminf^.-n Q„(j;j), V2;,Vij —> x). Let C be the space of 1-sc
functions /
:
R"^
—>
R,
s.t.
^
/
C can
oo.
be
made
sidering a special metric, convergence in which
Rockafellar and
Wets
R°,
...,
Rl, and any real
P{ Dj^i
{
inf
x£Rj
Q{x)
>
ri,
...,
<
rj})
Lemma
i.
ii.
Hi.
Zn
Zoo
by con-
Knight(2000),
in C:
with open interiors
if
for
any closed rectangles Ri,...,Rk in
R''
Qn
said to
is
rjt:
liminf
P(
n
>
n^^^i { inf Q„(a:)
x£Rj
{ inf
n*=i
Q„(x)
rj})
>
r-,})
< P(
n*^=i { inf
Q(x)
>
rj}).
a weak condition that leads to the convergence of axgmins.
Theorem 1) Suppose that
Q„{Zn) < ini^^jidQn{z) + (n, En
2 (Knight,
is s.t.
=
Qn{)
Zji
is
(cf.
Hence one can metrize the weak convergence
< limsupP(
Epi-convergence
into a complete separable metric space
equivalent to epi-convergence
Q
(1998)).
epi-converge in distribution to
is
dTgmin^^^dQooiz)
\ 0; Zn = Op(l)
uniquely defined in
is
R
a.s.
epi-converges in distribution to Qoo{), then
——>
Zoo
Epi-convergence
is
discontinuities. In
more general than uniform convergence, because
our case,
(lots of)
it
allows for rather generjil
non- vanishing discontinuities make the uniform convergence
of the likelihood function impossible.
Provided the
finite
dimensional distributional
condition for epi-convergence in distribution
is
(fidi) limit exists,
the necessary and sufficient
stochastic equi-lower-semi-continuity
(s.
e-l-sc),
developed by Knight (1999).
Stochastic equi-semi-continuity. Sequence {Qn} £ £ is s. e-sc. if for each bounded
> 0, and (5 > 0, there exist ui,...,Uk G B and some open sets V(ui),...,V{uk) covering
e
containing ui,...,Uk
s.t.
limsupP(Uj=i
Lemma
3 (Knight,
Thm
{
s-esc condition
amounts
(and hence of argmin) of
Q„
will
Q„(x) < min(e"\(5„(uj)
Qn
and only
is s.e-lsc.
if
{Qn}
-
e)})
<
<5.
Then {Qn} converges
to
Qoo in
distribu-
epi-converges in distribution to Qoo().
to the possibility of approximating the distribution of the
infimum
B
by an approximate minimum of Q„ over a
with given precision (e, 5). It is worth emphasizing that the naive
over bounded set
carefully chosen grid {ui, ...«*},
uniform grids
inf
2) Suppose
tion in finite- dimensional sense if
The
B,
and
set
B
not do the job in our case. The application of stochastic equisemi-continuity
leads to a simple proof even in the no-covariate case, which could improve (in terms of length)
over the arguments of Ibragimov and Hasminsldi.
B
Proofs for the Linear Model
In the proof
we
set
the local parameter sequence
j3„
=
Po-
sequence Pn does not change the arguments but introduces a
21
Putting through the general IoceJ
lot of notational
complexity.
—
B.l
Proof of Theorem
1.
Q„
Consider the local log likelihood ratio process
(z)
=
lnL„(/3o
+
z/n)/Ln{fio)
n
<?„(2)
= ^g,„{^)
X
[l(e,
>X',z/nVO) +
l(€,
<X',z/nA.O)]
+
1(0
>
1=1
n
+ ^9.n(^)
X [1(0
<
e,
<
X'.zfn)
>
£,
X',z/n)]
1=1
=
q,n{z)
=
Q\n
(z)
-
In [fiYi
Qi„{z) and ^211(2) behave very
I.
Limit of Qin{z)
By
like statistic.
+ Qin
X.'(/3o
(z)
where
,
+ z/n))\X,,l3o + z/n)/f{Y, - X,'/3o|X„ ^o)]
•
differently.
analyzed by techniques similar to those in IH (1982). {Q\n{z)}
is
CI,
C4 and LLN,
Qin{z)
converges in probability for each
it
—
>
is
an average
z:
= -z EX
Qloo(z)
(6)
f(€t\X,Po)
For a density function / that has a dominated derivative everywhere except at 0; fj^f'(u)du
— /(O""") + /(0~). Thus applying this to the conditional density / (e|X,/9o), we have:
Qi^{z)
=
= z'EX\p{X)-q{X)].
Next we use stochastic equi-continuity to convert pointwise convergence to uniform convergence
show that for any \zi — 22!
> 0, the term
—
in z. It suffices to
n
n
|^gi„(2i) X
l(ei
>X,'zi/7jV0)-^gi>i(22) x
>X,'z2/nV0)|
l(ei
^0,
(7)
1=1
1=1
as well as other similar
terms
in
Q„
(2).
The
left
hand
side of (7)
bounded by
is
71
^l(e,>
4^ v4^V0)
ln/(6i-4^|/3o +
x
zi/7i)
- In/
(e.
- 4^|^o + 22/n)
1=1
,
By
C.4, the
sum
first
>
jr{
l(e,
,
in (8) is
,
by, for 2' in the
bounded
> XiZi/n V XiZ2 n V 0)
(8)
fU-^\0o + z,/n)
,
—
-—
x
\
convex hull of
—
21
and
-—
22
x
f(ei-X[z'ln\Po-\rz-ln)\
n
I
I
(9)
<-y;i(ei>0)x
n -^
The second sum
sup
I
[21
-
some C, C, C"
>
{}''|f°,'' T'
/(cil^o)
by, for
x
22I
=
Op(l) x
\zx
-
22I
I
f:i(e,e(0,^V^])xCx|n!l^rx&N^
sup
.zoPZ
l.zjSZ
z-i
x
|
bounded
in (8) is
1
<
const
'^
"
E
1 (e,
e
(0,
C'/n]) x
C" =
._j
,
22
Op(l)
since f'{t\x)
X
has a compact support. Thus
Limit of ^271(2)- Qzn
II.
and can be modeled
We
as
to the point process that measures these occurrences.
w.r.t.
constructs the key point process and derives
1
<
Step 2 shows that {Q2n{zj),j
representation.
(7) follows.
driven by the "rare" occurrences of the near-to-jump observations
is
an integral
the proof into two parts: step
split
when € > u >
Other terms are similarly checked.
uniformly bounded above and f{u\x) uniformly bounded below
is
by C.2-C.3, and
J)
its
limit
a continuous transformation of the point
is
process.
Step
The Key
1:
E is taken to
is
a compact subset of E. £
The key
E = — oo,0) U (0,+oo) x X. The topological space on
R \ {0} and R"* n X. So that, e.g., [a, b] x X
Point Process. Define
(
be a product of standard topologies of
point process
is
is
the Borel u-algebra of subsets of E.
a random measure taking the following form;
n
= J2l[{ne„Xi)eA],
f<!{A)
1=1
any
for
set
A
N
in £.
a random element of Mp(E), the metric space of nonuegative point
is
measures on E, with the metric generated by the topology of vague convergence. Appendix
gives definitions.
We
will
A
show that
N =» N in Mp(E)
where
(a)
N
By
is
defined in the proposition
F 6 T,
open bounded rectangles
C.l and C.3, for any
intersections of
across
i
m
is
and
in steps (a)
compact open
in E), limn-+oo
BN(F) =
<
dFx(x)
(b).
sets in
limn->oo
E (finite unions and
nP({ne,X} £ F)
= m{F) <
00
defined in the proposition. Since the events {{nei,Xi) £
F]
>
by C.l, by the Meyer's
+
0)du
Lemma
q{x)l{u
(
lim P{SS{F)
which by Kallenberg's Theorem^
[
Appendix A) implies that
the mean measure 7n().
(
0)du] x
Meyer (1973)
n—voo
process
done
is
the basis of relatively
\p{x)l{u
/
where measure
This
1.
=
=
0)
)
we
are independent
also have:
e-"'<^*,
and
N clearly simple
N => N ia Mp(E), where iV
is
=
a.s.]
is
the definition of the Poisson
a Poisson point process with
N
Next we show that
has the same distribution as the process N, stated in the proposition.
homogeneous PRMs No and Nq with points {Pj} aind {P;}, defined in the
proposition. No has the mean measure mo(du) = du on (0, 00), and No has the mean measure
77io(d'u) = du on (— oo,0). Now because No and No are independent,
(b).
First, define canonical
Ni(-)=No(-)+No(-)'
is
the Poisson point process with the
mi{du)
Because
{A",, A"/}
are
i.i.d.
mean measure
= \(u>
Q)du
+
\(u
<
Q)du on
R\
{0}.
and independent of {FijPj}, by the Composition Lemma(cf. Proposi-
tion 3.8 in Resnick (1987)), the composed
PRM N2
with points
{{Ti,Xi},{T'i,Xl},i>l,j>\) inR"*
^For example, Resnick (1987), Prop. 3.22
23
mean
has the product
intensity
=
m2{du,dx)
Finally,
N with
measure given by
>
[l(u
+
0)du
<
l(ii
0)du] x
Fx{dx) on
R\
{0} x
X.
the transformed points {T{r,,X,),T{r',,X-)}, where
T:{u,x)>-^(l{u>0)-^ + l{u<0)^, x)
= m2oT~\dj,dx) =
m{dj,dx)
by the Transformation Theorem
has the desired mean measure on E,

  m(dj, dx) = m_2 ∘ T^{−1}(dj, dx) = [ p(x)1(j > 0) + q(x)1(j < 0) ] dj × F_X(dx),

by the Transformation Theorem for Poisson Processes (Proposition 3.7 in Resnick (1987)).

Step 2: The Functional of the Key Point Process. Here we distinguish two cases: (a) X = {x ∈ X : q(x) > 0} and (b) X = {x ∈ X : q(x) = 0}. For any compact set Z, as n → ∞,

  ln [ f(δ − x'z/n | x, β_0 + z/n) / f(δ | x, β_0) ] = ln [ q(x)/p(x) ] (1 + O(n^{−1}))
      uniformly in {δ, z, x ∈ R_+ × Z × X : x'z > 0, δ ≤ x'z/n},                   (10)

and

  ln [ f(δ − x'z/n | x, β_0 + z/n) / f(δ | x, β_0) ] = ln [ p(x)/q(x) ] (1 + O(n^{−1}))
      uniformly in {δ, z, x ∈ R_− × Z × X : x'z < 0, δ ≥ x'z/n}.                   (11)

Note that in case (a) this holds by C.3, while in case (b) equation (10) holds identically, equal to −∞. Hence, uniformly in z over Z,

  Q_2n(z) = Σ_{i=1}^n [ ln (q(X_i)/p(X_i)) 1(0 < nε_i ≤ X_i'z) + ln (p(X_i)/q(X_i)) 1(0 > nε_i ≥ X_i'z) ] × (1 + o_p(1))
          ≡ Q̄_2n(z) × (1 + o_p(1)).                                                (12)

(Again this expression may equal −∞ in case (b), but that does not create a problem for the proof.) Write Q̄_2n(z) as an integral w.r.t. N̂:

  Q̄_2n(z) = ∫_E l_z(j,x) dN̂(j,x),                                                 (13)

where l_z(j,x) is defined in the theorem. We consider cases (a) and (b) separately:

(a): By conditions C.1-2, the function (j,x) ↦ l_z(j,x) is bounded and vanishes outside the set K_z = [−η, +η] × X, where η = sup_{x∈X} |x'z|; K_z is a compact set in E (by the standard compactification of E). The function (j,x) ↦ l_z(j,x) is discontinuous only at j = x'z. Next define the map T̄ : M_p(E) ↦ R^l as

  N ↦ ( ∫_E l_{z_k}(j,x) dN(j,x), k ≤ l ).

The mapping T̄ is discontinuous at the set

  D(T̄) = { N ∈ M_p(E) : j_i^N = z_k' x_i^N for some i ≥ 1, k ≤ l },

where (j_i^N, x_i^N, i ≥ 1) denote the points of N. By C.3 and the construction of N̂ and N:

  P[ N̂ ∈ D(T̄) for some n ≥ 1 ] = 0,   P[ N ∈ D(T̄) ] = 0.                          (14)

Therefore N̂ ⇒ N in M_p(E) implies T̄(N̂) →_d T̄(N). We conclude that

  ( Q̄_2n(z_k), k ≤ l ) →_d ( Q_2∞(z_k), k ≤ l ),   where Q_2∞(z) = ∫_E l_z(j,x) dN(j,x).   (15)
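To make Step 1 concrete, the following is a minimal Monte Carlo sketch (not part of the paper) of the Poisson approximation for the counts of near-to-jump observations; the design — a scalar regressor, exponential tails, and jump weights w(x) — is an illustrative assumption, not the empirical model studied in the paper.

# Counts N_hat(F) = #{i: n*eps_i in (0,a]} are approximately Poisson with mean
# a*E[p(X)], matching the mean measure m((0,a] x X) of the key point process.
import numpy as np

rng = np.random.default_rng(0)
n, reps, a, lam = 2000, 5000, 3.0, 1.0

def draw_eps(x):
    """eps | x has density w(x)*lam*e^{-lam e} (e>0) + (1-w(x))*lam*e^{lam e} (e<0),
    so p(x) = f(0+|x) = w(x)*lam and q(x) = f(0-|x) = (1-w(x))*lam."""
    w = 0.5 + 0.3 * np.tanh(x)                   # jump weights in (0,1)
    sign = rng.random(x.shape) < w               # positive side w.p. w(x)
    mag = rng.exponential(1.0 / lam, size=x.shape)
    return np.where(sign, mag, -mag), w

counts = np.empty(reps, dtype=int)
mean_p = 0.0
for r in range(reps):
    x = rng.normal(size=n)
    eps, w = draw_eps(x)
    counts[r] = np.sum((n * eps > 0) & (n * eps <= a))   # N_hat((0,a] x X)
    mean_p += np.mean(w * lam) / reps                    # Monte Carlo E[p(X)]

print("empirical mean / variance of N_hat(F):", counts.mean(), counts.var())
print("Poisson mean m(F) = a * E[p(X)]      :", a * mean_p)

Both the empirical mean and variance of N̂(F) should be close to m(F), as expected for a Poisson count.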
(b): Now consider the second case: ℓ̂_2n(z) ≡ exp{Q̄_2n(z)}. Note that ℓ̂_2n(z) = 0 if N̂(A(z)) > 0, and ℓ̂_2n(z) = 1 if N̂(A(z)) = 0, where

  A(z) = { (j,x) ∈ R_+ × X : j ≤ x'z }.

Observe also that ℓ_2∞(z) = 0 if N(A(z)) > 0, and ℓ_2∞(z) = 1 if N(A(z)) = 0. Thus, for the finite-dimensional convergence (for any z_1, ..., z_K), it suffices to show ( N̂(A(z_k)), k ≤ K ) ⇒ ( N(A(z_k)), k ≤ K ). By the continuous mapping theorem, this follows from

  P( N̂(A(z_k)) = A_k, k ≤ K ) → P( N(A(z_k)) = A_k, k ≤ K ),

which holds since N̂ ⇒ N in M_p(E) and N(∂A(z_k)) = 0 a.s. Hence P( ℓ̂_2n(z_k) = a_k, k ≤ K ) → P( ℓ_2∞(z_k) = a_k, k ≤ K ), which completes the finite-dimensional convergence in case (b).
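The following small sketch (illustrative design only, assuming q(x) = 0 as in auction-type models) shows case (b) numerically: the jump-part likelihood exp{Q̄_2n(z)} is an indicator that switches from one to zero exactly at the extreme near-to-boundary order statistic, which is why the MLE is driven by extreme order statistics in such models.

import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(0.5, 1.5, size=n)
eps = rng.exponential(1.0, size=n)            # support (0, inf): density jumps from 0 to 1
boundary = np.min(n * eps / x)                # largest local shift with no crossings

def lik_jump_part(z):
    crossed = (n * eps > 0) & (n * eps <= x * z)
    return 0.0 if crossed.any() else 1.0      # exp(Q2n_bar(z)) with ln(q/p) = -inf

for z in (0.5 * boundary, 0.99 * boundary, 1.01 * boundary, 2 * boundary):
    print(f"z = {z:8.4f}   exp(Q2n_bar(z)) = {lik_jump_part(z)}   (boundary = {boundary:.4f})")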
B.2 Proof of Theorem 2

Using the convexity lemma (Lemma 1), it suffices to show the finite-dimensional (fidi) convergence of Γ_n(·) to Γ_∞(·):

  ( Γ_n(z_k), k ≤ K ) →_d ( Γ_∞(z_k), k ≤ K ).

There are two steps in the proof:

1. Approximate Γ_n(z) over R^d by an integral over a compact subset of R^d. For large M ∈ R_+, approximate Γ_n(z) by Γ_n^M(z) ≡ Γ_{n1}^M(z)/Γ_{n2}^M, where

  Γ_{n1}^M(z) = ∫_{|u|≤M} ρ(z − u) ℓ_n(u) q(β_0 + u/n) du,   Γ_{n2}^M = ∫_{|u|≤M} ℓ_n(u) q(β_0 + u/n) du.

Define their corresponding limit processes

  Γ_{∞1}^M(z) = ∫_{|u|≤M} ρ(z − u) ℓ_∞(u) q(β_0) du,   Γ_{∞2}^M = ∫_{|u|≤M} ℓ_∞(u) q(β_0) du.

It follows from the exponential smallness of the likelihood tails, (4) and (5), that for each z and each ε > 0, δ > 0, there is M ∈ R_+ such that

  lim sup_n P( |Γ_n(z) − Γ_n^M(z)| > ε ) < δ,                                      (16)

a tail-smallness property demonstrated in the subsequent Lemmas 4-6.

2. In view of (16), it suffices to show the finite-dimensional convergence ( Γ_{n2}^M, Γ_{n1}^M(z_k), k ≤ K ) →_d ( Γ_{∞2}^M, Γ_{∞1}^M(z_k), k ≤ K ). This follows by two facts. Fact i. E ℓ_n(z) = 1, by the definition of the likelihood ratio. Fact ii. E| ℓ_n^{1/2}(z) − ℓ_n^{1/2}(z') |^2 ≤ C|z − z'|, which is by Lemmas 5-6. These facts check the conditions of a limit theorem for integrals of random functions, Theorem 22 in Appendix I of IH (1982).
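As a rough numerical illustration of the truncation step (16) — in a stylized one-parameter shifted-exponential model, an assumption made here purely for illustration and not the paper's model — the localized posterior mean computed over a compact set |u| ≤ M stabilizes quickly as M grows, because ℓ_n(u) has exponentially thin tails.

import numpy as np

rng = np.random.default_rng(1)
n, beta0 = 200, 1.0
y = beta0 + rng.exponential(1.0, size=n)        # density of Y jumps from 0 to 1 at beta0
u_max_support = n * (y.min() - beta0)           # ell_n(u) = e^u * 1{u <= n(min Y - beta0)}

def ell_n(u):
    return np.exp(u) * (u <= u_max_support)

def truncated_posterior_mean(M, grid=20001):
    u = np.linspace(-M, M, grid)
    w = ell_n(u)                                # flat prior weight on the grid
    return np.sum(u * w) / np.sum(w)            # Gamma_{n1}^M / Gamma_{n2}^M

for M in (2.0, 5.0, 10.0, 50.0):
    print(f"M = {M:5.1f}   truncated posterior mean of u: {truncated_posterior_mean(M): .4f}")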
B.3 Proof of Theorem 3

Two steps are needed:

- Show finite-dimensional convergence of ℓ_n to the stochastic limit ℓ_∞.
- Show the stochastic equi-lower-semi-continuity (e-lsc) of {−ℓ_n}, or equivalently of {−Q_2n = −log ℓ_n}.

By condition C.3, −ℓ_n is lower-semi-continuous. IH(1982) shows that the conclusion follows by Lemma 1.5.1 in IH(1982) once these two steps are established, where (6) implies ℓ_n = o_p(1). The first step was shown in Theorem 1. We demonstrate the second step next.

We know from the proof of Theorem 1 that Q_2n(z) − Q̄_2n(z) →_p 0 uniformly in z over fixed compact sets, where Q̄_2n(z) = ∫_E l_z(j,x) dN̂(j,x), and that Q_1∞(z) is a linear function. Because Q̄_2n(z) is a piecewise constant function, it therefore suffices to show s-e-lsc of {−Q̄_2n(z)} only: for any bounded set B ⊂ R^d and δ > 0, there are open neighborhoods V(z_1), ..., V(z_m) of some fixed z_1, ..., z_m s.t. B ⊂ ∪_{k=1}^m V(z_k) and

  lim sup_n P( ∪_{k=1}^m { inf_{z∈V(z_k)} −Q̄_2n(z) < −Q̄_2n(z_k) − ε } ) ≤ δ.

This is done in several steps.

(a) [A seemingly strange point process] Construct the point process

  Ñ(·) = Σ_{i: ε_i>0} 1[ (nε_i p(X_i), X_i) ∈ · ] + Σ_{i: ε_i<0} 1[ (nε_i q(X_i), X_i) ∈ · ].

Represent the points of Ñ, or equivalently of { nε_i p(X_i), i: ε_i > 0 } and { nε_i q(X_i), i: ε_i < 0 }, in terms of the order statistics Γ_{1n}, Γ_{2n}, ... and Γ'_{1n}, Γ'_{2n}, ...; denote by X_{in}, X'_{in} the corresponding realizations of the covariate. We need to show that

  Ñ(·) ⇒ Ñ_∞(·) = Σ_{i≥1} 1[ (Γ_i, X_i) ∈ · ] + 1[ (Γ'_i, X'_i) ∈ · ],

where the distribution of the points is defined in the statement of Theorem 1. Ñ_∞ is a continuous transform of N, say T(N), from M_p(E) to M_p(E). Therefore, by the continuous mapping theorem, Ñ(·) ⇒ Ñ_∞(·) in M_p(E). Also,

  (Γ_{1n}, ..., Γ_{kn}) →_d (Γ_1, ..., Γ_k) in R^k.

We use this to characterize the equisemi-continuity of Q̄_2n. Write

  Q̄_2n(z) = ∫_{E: j>0} l_z(j,x) dN̂(j,x) + ∫_{E: j<0} l_z(j,x) dN̂(j,x) ≡ Q̄_2n^+(z) + Q̄_2n^-(z).

We examine discontinuities of Q̄_2n by examining those of Q̄_2n^+(z) and Q̄_2n^-(z). Because the arguments are identical for either part, consider only part Q̄_2n^+(·).

(b) [Picking the Cover] Fix a bounded set B ⊂ R^d. Cover X by the minimal number of closed equal-sized cubes {X^φ(x_j), j ≤ J(φ)} with side-length equal to φ and the centers of cubes denoted as x_j. Construct sets {V_{kj}, k = −m, ..., m, j ≤ J(φ)} ⊂ R^d as

  V_{kj} ≡ { z ∈ R^d : v_k − φ ≤ p(x)x'z ≤ v_k + φ, ∀x ∈ X^φ(x_j) },   where v_k = kφ, k ∈ {−m, ..., 0, ..., m}.

Since X is a nondegenerate compact subset of R^d, and p(x) is bounded away from 0 and above by assumption, {∪_{kj} V_{kj}} can cover any given bounded set B by selecting sufficiently large m.

(c) [Number of Break-points is O_p(1) and separated for small φ] Consider argument z in ∪_j V_{kj}; then a discontinuity in Q̄_2n^+(z) can potentially occur in ∪_j V_{kj} only if there exist z* ∈ ∪_j V_{kj} and (Γ_{in}, X_{in}) s.t.

  Γ_{in} = p(X_{in}) X'_{in} z*,                                                  (17)

where it must be that v_k − φ ≤ Γ_{in} ≤ v_k + φ. If there is such (Γ_{in}, X_{in}), we say that Q̄_2n^+ has a breakpoint in ∪_j V_{kj}. Note that Q̄_2n^+(z) can not have breakpoints in V_{kj} with k < 0 because Γ_{in} > 0. Define 𝒩_n = #{i : Γ_{in} ≤ k̄}, 𝒩 = #{i : Γ_i ≤ k̄}, where k̄ = sup_{x∈X, z∈B} p(x)x'z. 𝒩_n is the upper bound on the number of breakpoints of Q̄_2n^+(z) in set B. By the continuous mapping theorem, 𝒩_n →_d 𝒩 in R. So the number of breakpoints O(𝒩_n) is stochastically bounded O_p(1). Furthermore, break-points are separated: no more than one break point can happen in ∪_j V_{kj} with probability arbitrarily close to one if φ is sufficiently small. Define A_k to be the event that Q̄_2n^+(·) has more than two break-points in ∪_j V_{kj}. Then

  lim sup_n P[ ∪_k A_k ] ≤ lim sup_n P[ min_{1≤i≤K} |Γ_{in} − Γ_{(i−1)n}| ≤ 2φ ] + P[ 𝒩_n > K ]
                         ≤ P[ min_{1≤i≤K} |Γ_i − Γ_{i−1}| ≤ 2φ ] + P[ 𝒩 > K ] ≤ δ/2,

which is achieved by setting K sufficiently large so that P[𝒩 > K] ≤ δ/4, followed by setting φ sufficiently small so that P[ min_{1≤i≤K} |Γ_i − Γ_{i−1}| ≤ 2φ ] ≤ δ/4, which is possible since min_{1≤i≤K} |Γ_i − Γ_{i−1}| > 0. From now on, K and φ are fixed.

(c') ["smart" grid-points by setting φ small] Next construct "centers" z_{kj} in V_{kj} so that

  v_k − φ ≤ p(x) x'z_{kj} ≤ v_k − φ + η,   ∀x ∈ X^φ(x_j),

where η will be set sufficiently small in the next step. Depending on η, to satisfy the constraints, we will set φ sufficiently small as well.

(d) [Stochastic Equi-semicontinuity] Now follows the verification of stochastic equi-semicontinuity:

  lim sup_n P[ ∪_{kj} { inf_{z∈V_{kj}} −Q̄_2n^+(z) < −Q̄_2n^+(z_{kj}) − ε } ] ≤ lim sup_n P[B(η)] + lim sup_n P[ ∪_k A_k ],

where B(η) is the event that {Γ_{in}, i ≤ K} are separated and at least one of them falls into one of the "small" disjoint sets of the form [v_k − φ, v_k − φ + η], i ≤ K. The bound is true because the last event contains the event that (Γ_{in}, i ≤ K) are separated and one of them falls into a set of the form [v_k − φ, X'_{in}z_{kj}], which actually is the event ∪_{kj}{ inf_{z∈V_{kj}} −Q̄_2n^+(z) < −Q̄_2n^+(z_{kj}) − ε } ∩ (∪_k A_k)^c, because −Q̄_2n^+(z) is piecewise-constant and can only jump up if the index X'z increases. Now, since (Γ_{in}, i ≤ K) →_d (Γ_i, i ≤ K), which have bounded density, it follows that lim sup_n P[B(η)] = O(Kη) ≤ δ/2 by picking sufficiently small η. (Note that K stays fixed and its choice does not depend on η.) By steps (b)-(c), lim sup_n P[B(η)] + lim sup_n P[∪_k A_k] ≤ δ.
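A small sketch (stylized one-dimensional design with constant jump weights — assumptions for illustration only) of the object controlled in steps (b)-(d): Q̄_2n(z) is a piecewise-constant step function of z whose breakpoints are the handful of near-to-jump points, so controlling their number and separation controls the equi-semicontinuity.

import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0.5, 1.5, size=n)                       # positive regressor
w = 0.7                                                  # constant jump weight
sign = rng.random(n) < w
eps = np.where(sign, rng.exponential(1.0, n), -rng.exponential(1.0, n))
p, q = w * 1.0, (1.0 - w) * 1.0                          # f(0+|x), f(0-|x)

def Q2n_bar(z):
    above = (n * eps > 0) & (n * eps <= x * z)           # crossed the jump from above
    below = (n * eps < 0) & (n * eps > x * z)            # crossed from below (z < 0)
    return np.log(q / p) * above.sum() + np.log(p / q) * below.sum()

zs = np.linspace(-5, 5, 2001)
vals = np.array([Q2n_bar(z) for z in zs])
breaks = np.flatnonzero(np.diff(vals) != 0)
print("number of breakpoints on [-5, 5]:", breaks.size)  # stochastically bounded as n grows
print("approximate breakpoint locations:", np.round(zs[breaks], 3))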
B.4 Lemmas 4-6

A critical ingredient in the proof of Theorem 2 is the requirement of tail smallness. In this section such properties of ℓ_n(z) ≡ L_n(β + z/n)/L_n(β) are established, where z ∈ R^d.

Lemma 4 Suppose ∃ b > 0, B > 0 such that, uniformly in β in an open ball at β_0, for some n_0 and all n > n_0:

  (i)  E_{P_β} ℓ_n(z)^{1/2} ≤ e^{−b|z|},   ∀z ∈ R^d;
  (ii) E_{P_β} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ B|z − z'|,   ∀z, z' ∈ R^d.

Then it is true that under P = P_{β_0}:

  lim_{M→∞} lim sup_{n→∞} E [ ∫_{|u|>M} ℓ_n(u) q(β_0 + u/n) du / ∫_{R^d} ℓ_n(u) q(β_0 + u/n) du ] = 0,
  lim_{M→∞} lim sup_{n→∞} E [ ∫_{|u|>M} ρ(z − u) ℓ_n(u) q(β_0 + u/n) du / ∫_{R^d} ℓ_n(u) q(β_0 + u/n) du ] = 0,   (18)

so that

  lim_{M→∞} lim sup_{n→∞} P( |Γ_n(z) − Γ_n^M(z)| > ε ) = 0.                        (19)

IH(1982) studied the non-regression models and verified the conditions of this lemma by bounding a Hellinger distance. Our approach requires a modification: we bound a conditional Hellinger distance for the model R. The Conditional Hellinger Distance, denoted as r_2^2(β; β + h), is defined as:

  r_2^2(β; β + h) = ∫∫ [ f^{1/2}(y − x'(β + h) | x; β + h) − f^{1/2}(y − x'β | x; β) ]^2 dy F_X(dx).

Upper and lower bounds on r_2^2(·) are used to verify the conditions of Lemma 4.

Lemma 5 If there are a > 0, A > 0 such that for each h small,

  inf_β r_2^2(β; β + h) ≥ a|h| / (1 + |h|)   and   sup_β r_2^2(β; β + h) ≤ A|h|,

where sup/inf are computed over β in an open ball at β_0, then ∃ b > 0, B > 0, ∃ n_0 s.t. ∀n > n_0:

  sup_β E_{P_β} ℓ_n(z)^{1/2} ≤ e^{−b|z|};   sup_β E_{P_β} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ B|z − z'|,   ∀z, z' ∈ R^d.
The following lemma verifies the conditions of Lemma 5 and hence also verifies the bound on the conditional Hellinger distance in model R.

Lemma 6 In model R, suppose (C.1)-(C.3) hold; then ∃ a > 0, A > 0 such that for all |h| small enough, the following is true:

  r_2^2(β; β + h) ≥ a|h|   and   sup_β r_2^2(β; β + h) ≤ A|h|.

Proofs of Lemmas 4 to 6:

Lemma 4: is immediate from Lemma 1.5.2 and Theorem 1.10.2 of IH(1982), of which it is a special case; (19) follows from (18).

Lemma 5: It follows from the definition of the conditional Hellinger distance that

  E_{P_β} ℓ_n(z)^{1/2} = [ ∫∫ f^{1/2}(y − x'(β + z/n) | x; β + z/n) f^{1/2}(y − x'β | x; β) dy F_X(dx) ]^n
                       = [ 1 − (1/2) r_2^2(β; β + z/n) ]^n ≤ e^{−(n/2) r_2^2(β; β + z/n)} ≤ e^{−b|z|}.

Similarly, we can bound uniformly in β:

  E_{P_β} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 = 2 − 2 E_{P_β} ℓ_n(z)^{1/2} ℓ_n(z')^{1/2}
     ≤ 2n ( 1 − E_X ∫ f^{1/2}(y − X'(β + z/n) | X; β + z/n) f^{1/2}(y − X'(β + z'/n) | X; β + z'/n) dy )
     = n r_2^2(β + z/n; β + z'/n) ≤ B|z − z'|.

Lemma 6: To obtain the uniform upper bound, for |h| small enough,

  r_2^2(β; β + h) = E_X ∫ ( f^{1/2}(y − X'(β + h) | X; β + h) − f^{1/2}(y − X'β | X; β) )^2 dy
     ≤ E_X ∫_{[X'β, X'(β+h)]} | f(y − X'(β + h) | X; β + h) − f(y − X'β | X; β) | dy
       + E_X ∫_{[X'β, X'(β+h)]^c} | f(y − X'(β + h) | X; β + h) − f(y − X'β | X; β) | dy,

where [a, b] ≡ [a, b] if a ≤ b and ≡ [b, a] if b < a. The first inequality follows since |a − b|^2 ≤ |(a + b)(a − b)| = |a^2 − b^2| for a ≥ 0 and b ≥ 0. The first term is further bounded by 2 E_X |X'h| (p(X, β) + q(X, β)), and the second term is further bounded by

  E_X ∫ ∫_0^1 | (∂/∂u) f(y − X'(β + uh) | X; β + uh) | du dy,

which, by compactness of X, the Cauchy-Schwartz inequality and changing the order of integration by Fubini, is bounded by

  const × |h| + const × |h| × E_X ∫_0^1 ∫ | (∂/∂u) f(y − X'(β + uh) | X; β + uh) | dy du.

The upper bound is now obtained by condition C.3, uniformly in β.

Now bound the conditional Hellinger distance from below:

  r_2^2(β; β + h) ≥ E_X ∫_{[X'β, X'(β+h)]} ( f^{1/2}(y − X'(β + h) | X; β + h) − f^{1/2}(y − X'β | X; β) )^2 dy
                 ≥ inf_{x∈X} ( p^{1/2}(x, β) − q^{1/2}(x, β) )^2 × E_X |X'h| ≥ const × |h|,

where the last line follows because inf_{c∈R^d, |c|=1} E|X'c| > 0, since Var(X) > 0.
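A numerical check (stylized scalar shifted-exponential density, an illustrative assumption rather than the paper's model) of the scaling that Lemmas 5-6 exploit: for a density with a jump at its support boundary, the squared Hellinger distance behaves like |h| rather than |h|^2, which is what produces the e^{−b|z|} bound at the 1/n localization rate.

import numpy as np

lam = 1.0                                    # f(e) = lam*exp(-lam*e) on (0, inf): p = lam, q = 0
def hellinger_sq(h, grid=200_000, upper=30.0):
    y = np.linspace(-5.0, upper, grid)
    f0 = lam * np.exp(-lam * y) * (y > 0)             # density with location 0
    fh = lam * np.exp(-lam * (y - h)) * (y > h)       # density with location h
    d = (np.sqrt(fh) - np.sqrt(f0)) ** 2
    return d.sum() * (y[1] - y[0])

for h in (0.4, 0.2, 0.1, 0.05, 0.025):
    r2 = hellinger_sq(h)
    print(f"h = {h:6.3f}   r2^2 = {r2:.4f}   r2^2 / h = {r2 / h:.3f}")   # ratio stabilizes near lam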
C Proofs for the Non-Linear Model

In the proofs we set the local parameter sequence γ_n = γ_0. Putting through the general local sequence γ_n does not change the arguments but introduces a lot of notational complexity.

C.1 Proof of Theorem 4

As in Theorem 1, we split the log likelihood ratio process Q_n(z) = ln L_n(β_0 + H_n z)/L_n(β_0) into the "jump" part and the "smooth" part, and analyze each part separately. For z = (u, v),

  Q_n(z) = Σ_{i=1}^n q_{in}(z) × [ 1(ε_i > Δ_n(X_i,u)/n ∨ 0) + 1(ε_i < Δ_n(X_i,u)/n ∧ 0) ]
         + Σ_{i=1}^n q_{in}(z) × [ 1(0 < ε_i ≤ Δ_n(X_i,u)/n) + 1(0 > ε_i ≥ Δ_n(X_i,u)/n) ]
         ≡ Q_1n(z) + Q_2n(z),

where

  q_{in}(z) = ln [ f( Y_i − g(X_i, β_0 + u/n) | X_i, β_0 + u/n, α_0 + v/√n ) / f( Y_i − g(X_i, β_0) | X_i, β_0, α_0 ) ],
  Δ_n(X_i, u) = n ( g(X_i, β_0 + u/n) − g(X_i, β_0) ).

I. We first find the uniform limit of Q_1n(z). In this part only, we apply arguments similar to those of IH (1982). For each z, using assumptions (E.1) to (E.3) with a second order Taylor expansion,

  Q_1n(z) ⇒ Q_1∞(z) ≡ u' E[ (∂g(X, γ_0)/∂β) ( p(X) − q(X) ) ] + W'v − (1/2) v'Jv,   W ∼ N(0, J),

where the definitions of p(X) and q(X) are as in the statement of the theorem; the first term follows from the law of large numbers applied to (1/n) Σ_{i=1}^n (∂g(X_i, γ_0)/∂β) f'(ε_i | X_i, γ_0)/f(ε_i | X_i, γ_0), the term W'v from the CLT applied to (1/√n) Σ_{i=1}^n (∂/∂α) ln f(ε_i | X_i, γ_0), and the quadratic term from the information matrix equality (1/n) Σ_{i=1}^n (∂^2/∂α∂α') ln f(ε_i | X_i, γ_0) →_p −J. Hence Q_1∞(z) is a continuous Gaussian process, and this convergence holds in ℓ^∞(Z) provided we show that it is uniform by demonstrating stochastic equicontinuity of Q_1n(z). In particular, for any |z_1 − z_2| → 0, we can show that terms like

  Σ_{i=1}^n q_{in}(z_1) 1(ε_i > Δ_n(X_i,u_1)/n ∨ 0) − q_{in}(z_2) 1(ε_i > Δ_n(X_i,u_2)/n ∨ 0)

vanish. Split the term into

  Σ_{i=1}^n 1(ε_i > Δ_n(X_i,u_1)/n ∨ Δ_n(X_i,u_2)/n ∨ 0)
      × | ln f(ε_i − Δ_n(X_i,u_1)/n | X_i; β_0 + u_1/n, α_0 + v_1/√n) − ln f(ε_i − Δ_n(X_i,u_2)/n | X_i; β_0 + u_2/n, α_0 + v_2/√n) |
  + Σ_{i=1}^n 1(ε_i ∈ (0, Δ_n(X_i,u_1)/n ∨ Δ_n(X_i,u_2)/n ∨ 0))
      × max_{j=1,2} | ln [ f(ε_i − Δ_n(X_i,u_j)/n | β_0 + u_j/n, α_0 + v_j/√n) / f(ε_i | β_0, α_0) ] |.

By the argument of the proof of Theorem 1, the second term is bounded by Σ_{i=1}^n 1(ε_i ∈ (0, C/n ∨ 0)) × const →_p 0. Each summand in the first summation can be bounded by, for some z' = (u', v') between z_1 and z_2,

  Σ_{i=1}^n 1(ε_i > Δ_n(X_i,u_1)/n ∨ Δ_n(X_i,u_2)/n ∨ 0)
      × ( | (∂/∂u) ln f(ε_i − Δ_n(X_i,u')/n | β_0 + u'/n, α_0 + v'/√n) | |u_1 − u_2| / n
        + | (∂/∂v) ln f(ε_i − Δ_n(X_i,u')/n | β_0 + u'/n, α_0 + v'/√n) | |v_1 − v_2| / √n ).

Using assumption (E.5), the first term in this summation can be bounded by

  C (1/n) Σ_{i=1}^n | (∂/∂u) ln f(ε_i | X_i, γ_0) | × |u_1 − u_2| = O_p(1) |u_1 − u_2|.

The second term in the first summation can be further expanded as, for some z'' between z_1 and z_2, the sum of

  (1/√n) Σ_{i=1}^n 1(ε_i > Δ_n(X_i,u_1)/n ∨ Δ_n(X_i,u_2)/n ∨ 0) (∂/∂α) ln f(ε_i | X_i, γ_0)' (v_1 − v_2),

which is |v_1 − v_2| O_p(1) by the CLT (the indicator in the summation can be replaced by 1(ε_i > 0), by the Chebyshev inequality), plus a quadratic remainder that is linear in (v_1 − v_2) and is O_p(1)|v_1 − v_2| by assumptions (E.4) and (E.6). Thus, as |z_1 − z_2| → 0, the entire term goes to 0 in probability. This completes the proofs for stochastic equicontinuity and uniform convergence of Q_1n(z) for the nonlinear model.
II. Limit of Q_2n(z). As in the proof of Theorem 1, in the expression for Q_2n(z) we can replace the r.v. q_{in}(z) by either ln p(X_i)/q(X_i) or ln q(X_i)/p(X_i), uniformly in z in Z, depending on the sign of ε_i. Note that this expression may equal −∞. So uniformly in z over Z, Q_2n(z) = Q̄_2n(z) + o_p(1), where

  Q̄_2n(z) = Σ_{i=1}^n [ ln (q(X_i)/p(X_i)) 1(0 < nε_i ≤ Δ_n(X_i, u)) + ln (p(X_i)/q(X_i)) 1(0 > nε_i ≥ Δ_n(X_i, u)) ].

Next we note that by straightforward calculations

  E Σ_{i=1}^n [ | 1(0 < nε_i ≤ Δ_n(X_i, u)) − 1(0 < nε_i ≤ Δ(X_i, u)) | + | 1(0 > nε_i ≥ Δ_n(X_i, u)) − 1(0 > nε_i ≥ Δ(X_i, u)) | ] = o(1),

where Δ(X_i, u) = (∂g(X_i, β_0)/∂β)' u, which yields that for any fixed z,

  Q̄_2n(z) = Σ_{i=1}^n [ ln (q(X_i)/p(X_i)) 1(0 < nε_i ≤ Δ(X_i, u)) + ln (p(X_i)/q(X_i)) 1(0 > nε_i ≥ Δ(X_i, u)) ] + o_p(1).

Having obtained the "linearized expression," the finite-dimensional convergence of Q̄_2n(z) now follows by the arguments identical to those in Theorem 1. It remains to show that (Q̄_2n(z_i), i ≤ J) and (Q_1n(z_i), i ≤ J), for any finite J, are asymptotically independent. This follows by noting that the dependence between these terms is realized through sums that disappear in probability for any fixed z = (u, v):

  (1/√n) Σ_{i=1}^n (∂/∂α) ln f(ε_i | X_i, γ_0) 1( nε_i ∈ (0, Δ_n(X_i, u)) ) = o_p(1),
  (1/√n) Σ_{i=1}^n (∂/∂α) ln f(ε_i | X_i, γ_0) 1( nε_i ∈ (Δ_n(X_i, u), 0) ) = o_p(1).
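A tiny numerical sketch of the linearization Δ_n(x,u) → Δ(x,u) used above; the function g and the evaluation points are illustrative assumptions, not the paper's specification.

import numpy as np

def g(x, b):          # example smooth nonlinear regression function
    return np.exp(x * b)

b0, x, u = 0.5, 1.3, 2.0
dg_db = x * np.exp(x * b0)                         # analytic derivative dg/db at b0
for n in (10, 100, 1000, 10000):
    delta_n = n * (g(x, b0 + u / n) - g(x, b0))    # Delta_n(x,u)
    print(f"n = {n:6d}   Delta_n = {delta_n:.6f}   Delta(x,u) = dg/db * u = {dg_db * u:.6f}")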
C.2 Proof of Theorem 5

Similar to Theorem 2, the proof is a straightforward application of the convexity lemma (Lemma 1), by which it suffices to show the finite-dimensional (fidi) convergence of Γ_n(·) to Γ_∞(·):

  ( Γ_n(z_k), k ≤ K ) →_d ( Γ_∞(z_k), k ≤ K ).

As before we follow two steps.

1. First we approximate the integral Γ_n(z) over R^d by an integral over a compact subset of R^d. For each M ∈ R_+, the approximation of Γ_n(z) is given by Γ_n^M(z) ≡ Γ_{n1}^M(z)/Γ_{n2}^M, where

  Γ_{n1}^M(z) = ∫_{|z'|≤M} ρ(z − z') ℓ_n(z') q(β_0 + H_n z') dz',   Γ_{n2}^M = ∫_{|z'|≤M} ℓ_n(z') q(β_0 + H_n z') dz'.

Define

  Γ_{∞1}^M(z) = ∫_{|z'|≤M} ρ(z − z') ℓ_∞(z') q(β_0) dz',   Γ_{∞2}^M = ∫_{|z'|≤M} ℓ_∞(z') q(β_0) dz'.

Similar to (19), Lemmas 7 and 8 in section C.5 demonstrate that the tail of the likelihood ratio process is exponentially small: for each ε > 0, δ > 0, ∃ M ∈ R_+:

  lim sup_n P( |Γ_n(z) − Γ_n^M(z)| > ε ) < δ.                                      (20)

2. In view of (20), it suffices to show ( Γ_{n2}^M, Γ_{n1}^M(z_k), k ≤ K ) →_d ( Γ_{∞2}^M, Γ_{∞1}^M(z_k), k ≤ K ). This follows by two facts. Fact i. E ℓ_n(z) = 1, by definition of the likelihood ratio. Fact ii. E| ℓ_n^{1/2}(z) − ℓ_n^{1/2}(z') |^2 ≤ c|z − z'|, which is by Lemma 7. These facts check the conditions of a limit theorem for integrals of random functions, Theorem 22 in Appendix I of IH (1982).
C.3 Proof of Theorem 6

Two steps are needed:

- Show finite-dimensional convergence of ℓ_n to the stochastic limit ℓ_∞.
- Show the stochastic equi-lower-semi-continuity (e-lsc) of {−ℓ_n}.

By assumption, −ℓ_n is lower-semi-continuous. Theorem 4 handled step i. After having these two steps, the conclusion follows by Lemma 1.5.1 in IH(1982), since ℓ_n(z) = o_p(1) as |z| → ∞ by the tail smallness Lemma 8 and Theorem 1.5.1 in IH(1982). We demonstrate step ii next. It therefore remains to show the stochastic equi-lower-semi-continuity (e-lsc) of {−ℓ_n}, or equivalently of {−Q_2n = −log ℓ_n}. From the proof of Theorem 4:

  Q_1n(z) ⇒ u'ζ + Q_1∞(v), in ℓ^∞(Z),                                              (21)

where Q_1∞(·) is the Gaussian process defined in the proof of Theorem 4 and Z is any compact subset of R^d. Also

  Q_2n(z) − Q̄_2n(z) →_p 0, uniformly in z over Z, where Q̄_2n(z) = ∫_E l_{un}(j,x) dN̂(j,x),
  l_{un}(j,x) = ln (q(x)/p(x)) 1(0 < j ≤ Δ_n(x,u)) + ln (p(x)/q(x)) 1(0 > j ≥ Δ_n(x,u)).

Because of (21), it suffices to show s-e-lsc of {−Q̄_2n(z)} only. Because Q̄_2n(z) depends on z only through u, we may write Q̄_2n(u) instead. Because Q̄_2n(u) is a piecewise-constant function, it suffices to show that for any bounded set B ⊂ R^{d'} and δ > 0, there are open neighborhoods V(u_1), ..., V(u_m) of some u_1, ..., u_m s.t. B ⊂ ∪_{k=1}^m V(u_k) and

  lim sup_n P( ∪_{k=1}^m { inf_{u∈V(u_k)} −Q̄_2n(u) < −Q̄_2n(u_k) − ε } ) ≤ δ.

This is done in several steps. The proof is nearly identical to the proof of Theorem 3, and only the minor differences are highlighted below.

(a) This step is identical to step (a) in the proof of Theorem 3, where the point process N̂ is constructed and its limit N is found. This process allows us to demonstrate the equisemi-continuity of Q̄_2n. Write

  Q̄_2n(u) = ∫_{E: j>0} l_{un}(j,x) dN̂(j,x) + ∫_{E: j<0} l_{un}(j,x) dN̂(j,x) ≡ Q̄_2n^+(u) + Q̄_2n^-(u).

We wish to examine the nature of discontinuities of Q̄_2n by examining those of Q̄_2n^+ and Q̄_2n^-. Because the arguments are identical for either part, consider only part Q̄_2n^+.

(b) This step is identical to step (b) in the proof of Theorem 3 except for the construction of {V_{kj}}: for sufficiently large n,

  V_{kj} ≡ { u ∈ R^{d'} : v_k − φ ≤ p(x) Δ_n(x, u) ≤ v_k + φ, ∀x ∈ X^φ(x_j) },   where v_k = kφ, k ∈ {−m, ..., 0, ..., m}.

Observe that Δ_n(X, u) = (∂g(X, β_0 + u_t(X)/n)/∂β)' u for an intermediate point u_t(X), which has a positive definite variance matrix uniformly in u in any compact set, as n → ∞, by assumption. Thus the support of this vector is non-degenerate in R^{d'}. Thus, as in Theorem 3, since also X is compact and p(x) is bounded away from 0 and above by assumption, {∪_{kj} V_{kj}} can cover any given bounded set B by selecting sufficiently large m, for large n.

(c) [Number of Break-points is O_p(1) and separated for small φ] Consider argument u in ∪_j V_{kj}; then a discontinuity in Q̄_2n^+(u) (since it only depends on u through the index Δ_n(x, u)) can potentially occur in the set ∪_j V_{kj} only if there exist u* ∈ ∪_j V_{kj} and (Γ_{in}, X_{in}) s.t.

  Γ_{in} = p(X_{in}) Δ_n(X_{in}, u*),                                              (22)

where for sufficiently large n it should be the case that v_k − φ ≤ Γ_{in} ≤ v_k + φ. If there is such (Γ_{in}, X_{in}), we say that Q̄_2n^+ has a breakpoint in ∪_j V_{kj}. Define 𝒩_n = #{i : Γ_{in} ≤ k̄}, 𝒩 = #{i : Γ_i ≤ k̄}, where k̄ = sup_{x∈X, u∈B} p(x) Δ_n(x, u). For sufficiently large n, 𝒩_n is the upper bound on the number of breakpoints of Q̄_2n^+(u) in set B. By the continuous mapping theorem, 𝒩_n →_d 𝒩 in R. So the number of breakpoints O(𝒩_n) is stochastically bounded O_p(1). Furthermore, break-points are separated in the same sense as in the proof of Theorem 3. Define A_k to be the event that Q̄_2n^+(·) has more than two break-points in ∪_j V_{kj}. Then, by the arguments that are identical to those in the proof of Theorem 3, for any δ > 0, φ can be picked small so that lim sup_n P[∪_k A_k] ≤ δ/2.

(c') [grid-points by setting φ small] Next step is to construct the "centers" u_{kj} of V_{kj}: pick u_{kj} ∈ V_{kj} so that for large n,

  v_k − φ ≤ p(x) Δ_n(x, u_{kj}) ≤ v_k − φ + η,   ∀x ∈ X^φ(x_j),

where η will be set sufficiently small in the next step. Depending on η, to satisfy the constraints, set φ sufficiently small as well.

(d) [Stochastic Equi-semicontinuity] This step is identical to step (d) in the proof of Theorem 3, except we replace X'z with Δ_n(X, u).
C.4 Proof of Theorems 7 and 8

Proof of Theorem 7. The result is well known. Under the stated conditions, the posterior risk is finite for the Bayes estimator, so Theorem 1.1 in Ch. 4 of Lehmann applies.

Proof of Theorem 8. Z_n denotes the re-scaled Bayes estimator H_n^{−1}(γ̂_Bayes − γ). As a preliminary step we have, for large n, for c_1, c_2 > 0 and any H,

  P{ |Z_n| > H } ≤ c_1 exp{ −c_2 |H| },                                            (23)

which follows by Theorem 1.5.3 in Ibragimov and Has'minskii, with the required conditions verified in Lemmas 7 and 8. Z_n under P_γ depends on γ, which we emphasize by writing Z_n^γ. (23) and majorization of ρ by a polynomial imply that, for some n_0,

  { ρ(Z_n^γ), n ≥ n_0, γ in an open ball at γ_0 } is uniformly integrable.          (24)

Define Z_n^γ(K) as H_n^{−1}(γ̂_{Bayes,λ_K} − γ), where γ̂_{Bayes,λ_K} is the Bayes estimator defined with respect to the prior weight λ_K(x) = 1{ H_n^{−1}(x − γ_0) ∈ K }. By construction, for large H:

  P_γ{ |Z_n^γ(K)| > H } = 0,                                                       (25)

for some n_0 and any compact set K, so

  { ρ(Z_n^γ(K) − δ), n ≥ n_0, γ in an open ball at γ_0 } is uniformly integrable.   (26)

Next we (a) evaluate I = R_ρ({γ̂_Bayes}, R^d) and II(K) = R_ρ({γ̂_{Bayes,λ_K}}, K), and (b) show that II(K) approaches I from below as K ↑ R^d.

Step (a).

  R_ρ({γ̂_Bayes}, K) = lim_n ∫_K E_{P_{γ_0 + H_n δ}} ρ( Z_n^{γ_0 + H_n δ} ) dδ / Leb(K)
                     = ∫_K E_{P_{γ_0}} ρ(Z_∞) dδ / Leb(K) = E_{P_{γ_0}} ρ(Z_∞),

by Theorem 5, equations (23)-(24), and the dominated convergence theorem. Thus

  R_ρ({γ̂_Bayes}, R^d) = E_{P_{γ_0}} ρ(Z_∞).

Analogously,

  R_ρ({γ̂_{Bayes,λ_K}}, K) = lim_n ∫_K E_{P_{γ_0 + H_n δ}} ρ( Z_n^{γ_0 + H_n δ}(K) ) dδ / Leb(K)
                           = ∫_K E_{P_{γ_0}} ρ( Z_∞^δ(K) ) dδ / Leb(K),

by Theorem 5 (which applies to Z_n^{γ_0 + H_n δ}(K) by the same argument as it applies to Z_n^{γ_0 + H_n δ}), equations (25)-(26), and dominated convergence, where, for any δ, Z_∞^δ(K) is the arg inf of the limit criterion z ↦ ∫ ρ(z − h − δ) ℓ_∞(h − δ) dh with the prior weight restricted to K.

Step (b). Next,

  ∫_K E_{P_{γ_0}} ρ( Z_∞^δ(K) ) dδ / Leb(K) ≤ E_{P_{γ_0}} ρ(Z_∞).

This follows because the lhs is the lower risk bound for R_ρ({γ̃_n}, K): no other estimator sequence in {F_n} that differs from {γ̂_{Bayes,λ_K}} achieves a lower risk value. Rewrite the inequality as

  ∫_K E_{P_{γ_0}} [ ρ(Z_∞^δ(K)) − ρ(Z_∞) ] dδ / Leb(K)
    = ∫_K E_{P_{γ_0}} [ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^+ dδ / Leb(K) − ∫_K E_{P_{γ_0}} [ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^- dδ / Leb(K) ≤ 0.   (27)

Next,

  ∫_K E_{P_{γ_0}} [ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^+ dδ / Leb(K) → 0   as r(K) → ∞,          (28)

where r(K) denotes the width of the cube K (which is centered at zero by definition). This last conclusion follows by (a) domination: for any δ,

  [ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^+ ≤ ρ(Z_∞),                                              (29)

so that by (29) the collection of variables { [ρ(Z_∞^δ(K)) − ρ(Z_∞)]^+, δ ∈ (0,1)^d, r(K) < ∞ } is uniformly integrable; and (b) pointwise convergence: for any δ ∈ (0,1)^d,

  E_{P_{γ_0}} [ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^+ → 0   as r(K) → ∞,                          (30)

since Z_∞^δ(K) → Z_∞ as r(K) → ∞, due to the fidi convergence, in probability, of the objective function that Z_∞^δ(K) minimizes to the objective function that Z_∞ minimizes, Lemma 1, and convexity. Note that (27)-(30) also imply that ∫_K E_{P_{γ_0}} [ ρ(Z_∞^δ(K)) − ρ(Z_∞) ]^- dδ / Leb(K) → 0 as r(K) → ∞. Thus II(K) ↑ I as r(K) → ∞, i.e. as K ↑ R^d.

This shows that {γ̂_Bayes} minimizes R_ρ({γ̃}, R^d). Suppose that there is a sequence {γ̃_n} in {F_n} that achieves a lower risk value than E_{P_{γ_0}} ρ(Z_∞). But then this sequence must achieve a lower value than II(K) for some large K, which is impossible by the previous comment.
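The message of Theorem 8 can be illustrated in a stylized one-parameter boundary limit experiment (an assumption made purely for illustration, not the paper's limit experiment with regressors): with ℓ_∞(h) = e^h 1{h ≤ Γ}, Γ ∼ Exp(1), the limit posterior mean under a flat prior is Γ − 1 while the limit MLE is Γ, and the Bayes rule attains a strictly smaller quadratic risk.

import numpy as np

rng = np.random.default_rng(3)
gamma = rng.exponential(1.0, size=1_000_000)   # boundary point of the limit likelihood ratio
z_mle = gamma                                  # argmax of ell_inf(h) = e^h * 1{h <= Gamma}
z_bayes = gamma - 1.0                          # posterior mean: int h e^h dh / int e^h dh over h <= Gamma
print("limit risk of MLE   E[Z^2]:", np.mean(z_mle ** 2))    # ~ 2
print("limit risk of Bayes E[Z^2]:", np.mean(z_bayes ** 2))  # ~ 1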
C.5 Lemmas 7-8

In this section some important properties of ℓ_n(z) = L_n(γ(z))/L_n(γ) are established, where γ(z) ≡ γ + H_n z for z = (u, v) ∈ R^d. First we note, by the proof of Lemma 1.5.2 of IH(1982), that the conditions of Lemma 4 — uniformly in γ in a ball at γ_0,

  E_{P_γ} ℓ_n(z)^{1/2} ≤ e^{−b|z|}   and   E_{P_γ} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ B|z − z'| —

need only hold in a weakened form: the first inequality only needs to hold for z large, and the second inequality only needs to hold for |z − z'| ≤ 1. For the nonlinear model R, the conditional Hellinger distance r_2^2(γ; γ + h) is defined as

  r_2^2(γ; γ + h) = ∫∫ [ f^{1/2}( y − g(x, t + h_1) | x; γ + h ) − f^{1/2}( y − g(x, t) | x; γ ) ]^2 dy F_X(dx),

where γ = (t, s) and h = (h_1, h_2). Next we obtain the nonlinear versions of Lemmas 5 and 6.

Lemma 7 If ∃ a > 0, A > 0 such that for all h small enough, for γ in a ball at γ_0,

  inf_γ r_2^2(γ; γ + h) ≥ a max( |h_1|, |h_2|^2 )   and   sup_γ r_2^2(γ; γ + h) ≤ A ( |h_1| + |h_2|^2 ),

then ∃ b > 0, B > 0, ∃ n_0 s.t. ∀ n > n_0 and ∀ z, z' ∈ R^d:

  sup_γ E_{P_γ} ℓ_n(z)^{1/2} ≤ e^{−b|z|} for |z| large,   and   sup_γ E_{P_γ} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ B|z − z'| for |z − z'| small.

Proof: For z = (u, v) large,

  E_{P_γ} ℓ_n(z)^{1/2} = [ 1 − (1/2) r_2^2( γ_0; β_0 + u/n, α_0 + v/√n ) ]^n ≤ e^{−(n/2) r_2^2} ≤ e^{−b|z|}.

On the other hand, for |z − z'| small, the same calculation as in Lemma 5 leads to

  E_{P_γ} | ℓ_n(z)^{1/2} − ℓ_n(z')^{1/2} |^2 ≤ n r_2^2( γ_0 + (u/n, v/√n); γ_0 + (u'/n, v'/√n) )
     ≤ A ( |u − u'| + |v − v'|^2 ) ≤ A ( |z − z'| + |z − z'|^2 ) ≤ B |z − z'|.

Lemma 8 In model R, suppose conditions (E.1)-(E.6) hold; then ∃ a > 0, A > 0 s.t. for all h small enough and for γ in a ball at γ_0,

  inf_γ r_2^2(γ; γ + h) ≥ a max( |h_1|, |h_2|^2 )   and   sup_γ r_2^2(γ; γ + h) ≤ A ( |h_1| + |h_2|^2 ).

Proof of Lemma 8: For ε and η small enough, let γ = (t, s) and h = (ε, η). Then

  r_2^2(γ; γ + h) = E_X ∫ ( f^{1/2}( y − g(X, t + ε) | X; t + ε, s + η ) − f^{1/2}( y − g(X, t) | X; t, s ) )^2 dy
   ≤ E_X ∫_{[g(X,t), g(X,t+ε)]} | f( y − g(X, t + ε) | X; t + ε, s + η ) − f( y − g(X, t) | X; t, s ) | dy
    + E_X ∫_{[g(X,t), g(X,t+ε)]^c} ( f^{1/2}( y − g(X, t + ε) | X; γ + h ) − f^{1/2}( y − g(X, t + ε) | X; t + ε, s ) )^2 dy
    + E_X ∫_{[g(X,t), g(X,t+ε)]^c} ( f^{1/2}( y − g(X, t + ε) | X; t + ε, s ) − f^{1/2}( y − g(X, t) | X; t, s ) )^2 dy
   ≤ 2 E_X | g(X, t + ε) − g(X, t) | ( p(X, γ) + q(X, γ) ) + O(|η|^2) + O(|ε|)
   ≤ 2 |ε| E_X ( p(X, γ) + q(X, γ) ) ∫_0^1 | ∂g(X, t + sε)/∂t | ds + O(|η|^2) + O(|ε|)
   = O(|ε|) + O(|η|^2),

where [a, b] ≡ [a, b] if a ≤ b and ≡ [b, a] if b < a, and the bound is uniform in γ by (E.4). The first inequality follows by the triangle inequality and from |a − b|^2 ≤ |a^2 − b^2| for a ≥ 0 and b ≥ 0. The first term in the second inequality follows from the fact that for y ∈ [g(X, t), g(X, t + ε)] and ε small enough, |f(·)| ≤ 2(p(X, γ) + q(X, γ)), and then we integrate y over that area. The second and third terms in the second inequality are usual multivariate first order Taylor expansions and Fubini. The first term in the third inequality follows from Taylor expansion and Fubini. The second term in the third inequality follows from assumption (E.4), while the third term in that inequality follows from assumption (E.2).

To obtain a bound from below, consider r_2^2(γ; γ + h):

  r_2^2(γ; γ + h) = E_X ∫_{[g(X,t), g(X,t+ε)]} ( f^{1/2}( y − g(X, t + ε) | X; γ + h ) − f^{1/2}( y − g(X, t) | X; γ ) )^2 dy
                  + E_X ∫_{[g(X,t), g(X,t+ε)]^c} ( f^{1/2}( y − g(X, t + ε) | X; γ + h ) − f^{1/2}( y − g(X, t) | X; γ ) )^2 dy.

We can bound the first term from below uniformly in γ by

  c E_X [ | g(X, t + ε) − g(X, t) | ( p^{1/2}(X, γ) − q^{1/2}(X, γ) )^2 ] ≥ c |ε| E_X | ∂g(X, t)/∂t | ≥ c |ε|,

using assumption (E.3), Taylor expansion and the Cauchy-Schwartz inequality. On the other hand, bound the second term of r_2^2(γ; γ + h) uniformly below by

  E_X ∫_{[g(X,t), g(X,t+ε)]^c} ( f^{1/2}( y − g(X, t + ε) | X; γ + h ) − f^{1/2}( y − g(X, t + ε) | X; t + ε, s ) )^2 dy.

Under assumption (E.4)(a) a lower bound is |η|^2 inf_{|v|=1} E_X ∫ ( v' ∂f^{1/2}( y − g(X, t) | X; γ )/∂s )^2 dy ≥ c|η|^2; on the other hand, if assumption (E.4)(b) holds, the same uniform lower bound c|η|^2 obtains. Thus, conclude that inf_γ r_2^2(γ; γ + h) ≥ c max( |ε|, |η|^2 ).
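Finally, a numerical sketch (a stylized exponential location-scale model, assumed only for illustration) of the two-rate behaviour behind Lemmas 7-8: the conditional Hellinger distance grows linearly in the jump-direction increment h_1 but quadratically in the regular-direction increment h_2, matching the max(|h_1|, |h_2|^2) lower bound.

import numpy as np

def hellinger_sq(t1, s1, t2, s2, grid=400_000, upper=40.0):
    """Squared Hellinger distance between f(y|t,s) = s*exp(-s(y-t))*1{y>t} densities."""
    y = np.linspace(-1.0, upper, grid)
    f1 = s1 * np.exp(-s1 * (y - t1)) * (y > t1)
    f2 = s2 * np.exp(-s2 * (y - t2)) * (y > t2)
    return np.sum((np.sqrt(f1) - np.sqrt(f2)) ** 2) * (y[1] - y[0])

t0, s0 = 0.0, 1.0
for h1 in (0.2, 0.1, 0.05):
    r2 = hellinger_sq(t0, s0, t0 + h1, s0)
    print(f"jump-location shift h1 = {h1:5.2f}   r2^2 / |h1|  = {r2 / h1:.3f}")    # ~ constant
for h2 in (0.2, 0.1, 0.05):
    r2 = hellinger_sq(t0, s0, t0, s0 + h2)
    print(f"regular (scale) shift h2 = {h2:5.2f}   r2^2 / h2^2 = {r2 / h2 ** 2:.3f}")  # ~ constant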
References

Aigner, D., T. Amemiya, and D. Poirier (1976): "On the Estimation of Production Frontiers: Maximum Likelihood Estimation of the Parameters of a Discontinuous Density Function," International Economic Review, 17, 377-396.

Bajari, P. (1999): "Auction Models Are Not Robust When Bidders Make Small Mistakes," Stanford University.

Bowlus, A., G. Neumann, and N. Kiefer (2001): "Equilibrium Search Models and the Transition from School to Work," International Economic Review, 42(2), 317-343.

Chernozhukov, V. (2000): "Conditional Extremes and Near Extremes," mimeo, MIT and Stanford University.

Christensen, B., and N. Kiefer (1991): "The Exact Likelihood Function for an Empirical Job Search Model," Econometric Theory, 7, 464-486.

Donald, S., and H. Paarsch (2000): "Superconsistent Estimation and Inferences in Structural Econometric Models using Extreme Order Statistics," University of Texas and University of Iowa.

Donald, S. G., and H. J. Paarsch (1993a): "Maximum Likelihood Estimation when the Support of the Distribution depends upon Some or All of the Unknown Parameters," Typescript. London, Canada: Department of Economics, University of Western Ontario.

Donald, S. G., and H. J. Paarsch (1993b): "Piecewise Pseudo-Maximum Likelihood Estimation in Empirical Models of Auctions," International Economic Review, 34, 121-148.

Donald, S. G., and H. J. Paarsch (1996): "Identification, Estimation, and Testing in Parametric Empirical Models of Auctions within the Independent Private Values Paradigm," Econometric Theory, 12, 517-567.

Flinn, C., and J. Heckman (1982): "New Methods for Analyzing Structural Models of Labor Force Dynamics," Journal of Econometrics, 18, 115-190.

Guerre, E., I. Perrigne, and Q. Vuong (2000): "Optimal Nonparametric Estimation of First-Price Auctions," Econometrica, 68(3), 525-574.

Hirano, K., and J. Porter (2001): "Asymptotic Efficiency in Parametric Structural Models with Parameter-Dependent Support," UCLA and Harvard University.

Hong, H., and M. Shum (2000): "Increasing Competition and the Winner's Curse: Evidence from Procurement," manuscript.

Horowitz, J. (2000): "The Bootstrap," University of Iowa; to appear in Handbook of Econometrics, vol. 5.

Ibragimov, I., and R. Has'minskii (1981): Statistical Estimation: Asymptotic Theory. Springer Verlag.

Knight, K. (1999): "Epi-convergence and Stochastic Equisemicontinuity," Preprint.

Leadbetter, M., G. Lindgren, and H. Rootzen (1983): Extremes and Related Properties of Random Sequences and Processes. Springer Verlag.

Meyer, R. M. (1973): "A poisson-type limit theorem for mixing sequences of dependent 'rare' events," Annals of Probability, 1.

Paarsch, H. (1992): "Deciding between the common and private value paradigms in empirical models of auctions," Journal of Econometrics, 51, 191-215.

Politis, D., J. Romano, and M. Wolf (1999): Subsampling. Springer Series in Statistics.

Resnick, S. I. (1987): Extreme Values, Regular Variation, and Point Processes. Springer-Verlag.

Robert, C. P., and G. Casella (1998): Monte Carlo Statistical Methods. Springer-Verlag.

Rockafellar, R. T., and R. J. B. Wets (1998): Variational Analysis. Springer-Verlag, Berlin.

van der Vaart, A. W. (1999): Asymptotic Statistics. Cambridge University Press.

van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes. Springer-Verlag, New York.

Vuong, Q. (1989): "Likelihood-ratio tests for model selection and non-nested hypotheses," Econometrica, pp. 307-333.