Document 11159492

advertisement
Digitized by the Internet Archive
in
2011 with funding from
Boston Library Consortium Member Libraries
http://www.archive.org/details/likelihoodestimaOOcher
»
:>
Massachusetts Institute of Technology
Department of Economics
Working Paper Series
A CLASS OF
NONREGULAR ECONOMETRIC MODELS
LIKELIHOOD ESTIMATION & INFERENCE
IN
Victor Chernozhukov
Han Hong
Working Paper 03-1
January 2003
Revised: May 9, 2003
Room
E52-251
50 Memorial Drive
Cambridge, MA 02142
This paper can be downloaded without charge from the
Social Science Research
Network Paper Collection
http://ssm.com/abstract=386680
at
MASSACHUSETTS INSTITUTE
OF TECHNOLOGY
MAY
I
3
2003
LIBRAR IES
LIKELIHOOD ESTIMATION AND INFERENCE LN A CLASS OF
NONREGULAR ECONOMETRIC MODELS
VICTOR CHERNOZHUKOV AND HAN HONG
Abstract.
In this
paper we study estimation and inference
the conditional density, where the location and size of the
Two prominent examples
are auction models,
in structural
jump
models with a jump
where the density jumps from zero to a positive
and the equilibrium job search model, where the density jumps from one
value,
inducing kinks
in
in
are described by regression lines.
the cumulative distribution function.
An
early
model
level to another,
of this kind
by Aigner, Amemiya, and Poirier (1976), but the estimation and inference
in
was introduced
such models remained
an unresolved problem, with the important exception of the specific cases studied by Donald and
Paarsch (1993a) and the univariate case
is
in
Ibragimov and Has'minskii (1981a). The main
the statistical non-regularity of the problem caused by discontinuities
in
difficulty
the likelihood function.
This difficulty also makes the problem computationally challenging.
This paper develops estimation and inference theory and methods
such models based on
for
likelihood procedures, focusing on the optimal (Bayes) procedures, including the
results
and confidence
tionally.
intervals.
The Bayes
The Bayes procedures
obtain
of independent practical
method with good small sample
and theoretical
likelihood in these models, in
statistic is not
are attractive both theoretically and computa-
confidence intervals, based on the posterior quantiles, are
a valid large sample inference
is
MLEs. We
on convergence rates and distribution theory, and develop Wald and Bayes type inference
which the
interest
maximum
shown
to provide
This inference result
properties.
due to the highly non-regular nature of the
any
finite
dimensional
Principle. Point Process, Frequentist Validity, Posterior, Structural
Econometric
likelihood statistic or
asymptotically sufficient.
Key Words: Likelihood
Model, Auctions, Equilibrium Search, Production Frontier
The original
2003.
We
draft of this paper
would
like to
was presented
at the
December 2000
EC 2
meetings
in
Dublin. This version
is
March,
thank Joe Altonji, Stephen Donald, Christian Hansen, Jerry Hausman, Joel Horowitz, Ivan
Fernandez, Shakeeb Khan, Yuichi Kitamura, Rosa Matzkin, Whitney Newey, George Neumann, Harry Paarsch, Frank
Schorfheide, Robin Sickles, Richard Spady,
EC2
We
meeting,
MIT, Northwestern,
Max
Stinchcombe as
well as participants in the
Princeton, Rice, University of Texas at Austin, and
thank three referees and the co-editor
are grateful to our our advisor Takeshi
for
numerous
Amemiya
Duke
CEME
conference, the
for valuable suggestions.
useful suggestions that improved the paper considerably.
for suggesting this
1
problem to
us.
We
1.
Introduction
This paper develops theory for estimation and inference methods in structural models with jumps
in
the conditional density, where the locations of the jumps are described by parametric regression
curves.
The jumps
in the density are
result in non-regular
and
very informative about the parameters of these curves, and
difficult inference theory,
implying highly discontinuous likelihoods, non-
standard rates of convergence and inference, and considerable implementation
Amemiya, and
Many
sis.
difficulties.
Aigner,
Poirier (1976) proposed early models of this type in the context of production analy-
recent econometric models also share this interesting structure. For example, in structural
procurement auction models,
cf.
Donald and Paarsch (1993a), the conditional density jumps from
zero to a positive value at the lowest cost; in equilibrium job search models (Bowlus,
and Kiefer (2001)), the density jumps from one positive
Neumann,
another at the wage reservation,
level to
inducing kinks in the wage distribution function. In what follows,
we
refer to the
former model as
the one-sided or boundary model, and to the latter model as the two-sided model. In these models,
the locations of the jumps are linked to the parameters of the underlying structural economic model.
Learning the parameters of »the location of the jumps
is
thus crucial for learning the parameters of
the underlying economic model.
Several early fundamental papers have developed inference methods for several cases of such
els,
including Aigner,
Heckman
Amemiya, and
Poirier (1976), Ibragimov
mod-
and Has'minskii (1981a), Flinn and
and Kiefer (1991), Donald and Paarsch (1993a, 1993b, 1996, 2002),
(1982), Christensen
and Bowlus, Neumann, and Kiefer (2001).
Ibragimov and Has'minskii (1981a) (Chapter V) ob-
tained the limit theory of the likelihood-based optimal (Bayes) estimators in the general univariate
non-regression case, and obtained the properties of
MLE
in
the case of one-dimensional parameter,
van der Vaart (1999) (Chapters 9.4-9.5) discussed the limit theory for the likelihood
in
the univari-
ate Uniform and Pareto models, including Pareto models with parameter-dependent support and
additional shape parameters. Paarsch (1992) and Donald and Paarsch (1993a, 1993b, 1996, 2002)
introduced and developed the theory of likelihood
(MLE) and
related procedures in the one-sided
regression models with discrete regressors, demonstrated the wide prevalence of such models in
structural econometric modeling,
and stimulated further research
Nevertheless, the general inference problem posed by Aigner,
remained unsolved previously. Very
in
little is
known about
in this area.
Amemiya, and
Poirier (1976) has
likelihood-based estimation and inference
the general two-sided regression model. In the general one-sided regression model, the problem
and inference
of likelihood-based estimation
important exception being the
(1993a).
1
The
is
remains an important unresolved question, an
general theory of such regression models
different structure
'There
also
MLE theory for discrete regressors developed
than the corresponding theory
also a literature on the
continuous covariate case, see
e.g.
is
by Donald and Paarsch
more involved and has a substantively
for the univariate (non-regression) or
dummy
ad hoc "linear programming" estimators of linear boundary functions covering
Smith (1994) (Chernozhukv (2001) provides a detailed review and other
related
regressor case.
computational
This paper
methods
in
2
Moreover, there
difficulty of
is
a considerable implementation problem caused by the inherent
the classical
offers solutions to these
(maximum
likelihood) estimates.
open questions by providing theory
both one and two-sided models with general regressors.
likelihood-based optimal 3 Bayes and also the
MLE procedures.
for estimation
and
These methods
inference
on the
rely
This paper demonstrates that these
are tractable, computationally and theoretically attractive ways to obtain parameter estimates,
construct confidence intervals, and carry out statistical inference. These results cover Bayes type
inference as well as
Wald type
inference.
We show that Bayes inference methods, based on the posterior quantiles, are valid in
and
also perform well in small samples. Moreover, these inference
no knowledge of asymptotic theory on the practitioner's
attractive due to their well-known finite-sample
are computationally attractive
(MCMC),
in
see e.g.
when
and
carried out through the
likelihood principle. Importantly,
models
(in
sample theory of likelihood
we show that the
procedures are generally not functions of the
dummy
for these
models,
any kind of inference based on the
MLE is generally not an asymptotically sufficient
contrast to the non-regression case or
the likelihood contains more information than the
or
They
••
useful not only for the present analysis but also for
statistic in these
risk optimality.
Markov Chain Monte Carlo procedure
MLE.
All of these results are preceded by a complete large
is
These estimation methods are also
sample average
Robert and Casella (1998), which helps avoid the inherent curse of dimensionality
the computation of the
which
part.
large
large samples
methods are tractable and require
MLE
does,
dummy
and the
regressor case). Therefore,
totality of likelihood-based
MLE asymptotically, as they
are in the non-regression
regressor case (or regular models). This motivates the study of the entire likelihood
and
the wide class of the likelihood-based procedures.
results). In
with the
2
some
MLE
In fact,
special cases, such as homoscedastic exponential linear regression models, these estimators coincide
asymptotically.
we show
in this
paper that unlike
in
discussed in details by van der Vaart (1999) or
statistics.
MLE
limit likelihoods
3
is
the univariate models, such as the Uniform and Pareto models
dummy
regression case, there are no finite-dimensional sufficient
not asymptotically sufficient either, making inference theory difficult to analyze.
We
depend on multivariate Poisson point processes with complex correlation structure.
This terminology follows that of Berger (1993),
p. 17.
show that the
and Porter
(2002).
They
4
provide a detailed analysis of asymptotic minimax efficiency in a class of boundary models.
They
Our work
also related to a recent important contribution by Hirano
is
employ an exponential-shift experiment framework along with group analysis to generate new
results
and insights on the efficiency structure of Bayesian estimators (which also motivate the present
research) and prove the inefficiency (sub-optimality) of the
and absolute deviation
and inference problem
We briefly
criteria.
We
MLE
under the
study a different set of questions
in the general two-sided
common mean
squared
focusing on the estimation
-
and boundary models.
summarize the contributions of this paper as
behavior of the likelihood ratio process and show that
it
follows. First,
we
derive the large sample
approaches a simple, explicit function of
a Poisson process that tracks the extreme (near-to-jump) events and depends on regressors
interesting way. This limit result
The
limit
useful since
is
it
is
useful for
any inference that
can be easily simulated
we prove the
on the likelihood
in
an
principle.
order to evaluate the limit distributions of
in
derived estimators and various likelihood-based statistics.
Second,
relies
To our knowledge,
these results are new.
consistency, derive the rates of convergence, and provide the limit distribu-
tion of the likelihood-based optimal estimators (BE)
for using these estimators in empirical
work.
More
and MLE. The
results are basic prerequisites
Wald
importantly, these results justify general
type inference based on limit distribution, subsampling, and parametric bootstrap.
Third, we show that posterior r-quantiles are asymptotically
of the true parameters. This property implies the validity of
on the posterior quantiles.
(1
—
r)-quantile unbiased estimators
Bayes type confidence
intervals based
These confidence intervals provide valuable practical inference meth-
ods since they are simple to implement and require no detailed knowledge of asymptotic theory.
This frequentist validity result
also of general theoretical interest because
is
complicated likelihoods where no
finite
We
proof applies more generally to other problems.
inference about general
it
covers models with
dimensional sufficient statistics exist asymptotically, and
smooth functions
its
further generalize this result to cover Bayes
of parameters,
and show that
it
provides valid inference
asymptotically.
Fourth,
we
briefly discuss
how the well-known
procedures carries over to the
4
The
limit.
finite-sample (average-risk) optimality of Bayes
discussion
is
auxiliary and given here to prove
some
Hirano and Porter (2002) also derive the limit distributions of BE's for continuous covariate boundary models
different form.
theirs somewhat.)
An important
Poisson process, hence
(2002)'s limit
in
a
(As Hirano and Porter (2002) note, the present treatment of continuous covariates appears to precede
is
it
is
difference
is
that our limit likelihood
is
stated in terms of a simple transform of a
quite simple to simulate for inference purposes (which
is
our focus). Hirano and Porter
implicit, given as a process indexed by continuous covariate values. Consequently, their result
suited for efficiency analysis (which
is
their focus),
and
its
use for classical inference
without the (equivalent) Poisson type representations obtained
in this
paper.
in practice
may be
is
more
infeasible
Also, the present paper focuses on
inference in both one- and two-sided models.
5
The optimality
of
Bayes estimates
is
treated in considerable details elsewhere in the literature.
The
recent
contribution by Hirano and Porter (2002) provides a detailed limit-of-experiments analysis for a class of boundary
models.
Lehmann and
Casella (1998) provides a basic discussion,
3
van der Vaart (1999), Chapter 9.3-9.4, treats
of the previously stated results and for the justification of BE's, including the approximate
MLEs. The
exact
MLE,
exact
procedures asymptotically. But
even
it is
when
close to the approximate
MLE
defined as a
function that approximates the delta function such as the 0-1 loss I[\u\
power
losses.
6
Such
loss
that sense, the exact
delta function,
>
BE
under any
MLEs
is
approximately optimal under any
and may perform better under the alternative
generally can not be dominated by any other given
e]/e or (truncated) p-th
loss function that
loss functions
approximate the
than other likelihood
BE
when the
the loss function of the end user of her result. Thus, the
method
for
comparisons are
made
across
MLE
method, advocated by Donald
the context of the discrete-covariate boundary models, provides a valuable
in
estimation and inference in both the two-sided and one-sided regression models. 7
we show through simulation examples based on an empirical auction model from Paarsch
Fifth,
(1992) that(l) particular BE's and
critically
on the measure of
intervals based
on the
risk,
MLEs
and
work quite
(2) the
limit distributions
and
well
their relative performance
Bayes confidence intervals and the Wald confidence
less
expensive computationally.
The Bayes
produce the shortest confidence intervals among other methods.
intervals also
depends
perform as accurately as the Wald confidence intervals based
on the parametric bootstrap, but are much
justifies
risk
MLE
Such comparisons are relevant when the empirical investigator does not
different loss functions.
and Paarsch (1993a)
loss
functions penalize mistakes differently than the squared loss function. In
procedures such as posterior means or posterior medians. Importantly, this implies that the
know
and
bias-corrected, generally does not coincide with optimal
confidence
Thus, this paper
a whole array of useful and practical inference techniques, ranging from Wald type to Bayes
type inference methods.
The Model, Examples, Assumptions, Procedures
2.
This section describes the model and provides an informal discussion of the assumptions, results,
and inference procedures developed
The Model.
2.1.
It is
in
the later sections of the paper.
convenient to describe the class of models
model where the errors have a discontinuous
iid
sample of
size
density. Let
consider in terms of a regression
(Yj, Xi) ,i
= 1,
. . .
,n, denote the
random
n generated by the model
Y =g(X ,0)+e
i
efficiency jd the
we
Uniform and Pareto models
i
(2.1)
i ,
in the non-regression case.
Ibragimov and Has'minskii (1981b), p.93
prove a fundamental result on the generic asymptotic efficiency of Bayes procedures under general, non-primitive
(hard-to-verify) conditions.
The approximations
exact
are
Bayes procedures and are optimal
in that regard,
and hence can be a good substitute for
MLE.
Additional important and distinct properties of the
MLE
include
(i)
invariance to reparameterization and
independence from prior information. Arguably, both properties are very useful.
4
(ii)
where
F; is
the dependent variable, Xj
and the error
e*
is
a vector of covariates that has distribution function
The
has conditional density / (e\Xi,/3,a).
central assumption of the model
Fx
,
that
is
the conditional density of the error / (e|Xj, 0, a) has a jump (or discontinuity) normalized to be at
0,
which may depend on the parameters
and a:
Yimf(e\x,0,a) =q{x,0,a),
(TO
\\mf{e\x,0,a)=p{x,0,a),
(2.2)
(10
p(x,0,a) >q(x,0,a) +8,
Hence,
>
8
sets of parameters, collected into
possibly the error distribution
e
X = support(X), V(Aq)e6x1
model the location of the discontinuity
in this
by the regression function g{X,0), which
that
Vx €
0,
BCR
d/>
and a £
=
a vector 7
and a
Ac Rda
.
(0',a')',
affects the
We
in the density of
conditional on
where
is
given
affects the regression curve
assume that the parameter
also
X
Thus, there are two
shape of the error distribution only.
and convex, and that the true parameter belongs to the
We
Y
described by the parameter 0.
is
set
Q=Bx
and
We assume
A is compact
interior of this set.
consider two models: the one-sided model and the two-sided model. In the one-sided model,
the conditional density jumps from zero to a positive constant. In the two-sided model, the conditional density
jumps from one
positive value to another.
two-sided model. In addition, Aigner,
model may
model the
Amemiya, and
outliers.
in the density,
More
is
a special case of the
Poirier (1976) suggested that the two-sided
generally, the two-sided
model approximates models with a sharp change
where the location of the change depends on parameters and regressors. The
with a density jump.
in
The two-sided models
such models
is
finite
approximated by that in the model
also naturally arise in equilibrium search models, see
Bowlus, Neumann, and Kiefer (2001).
The key
feature of the regression
model
is
that the conditional density of
sharp discontinuities
The
difficulties.
in
given
X
jumps at
the likelihood, which create statistical non-regularities and computational
discontinuities are highly informative
(The simplest univariate example
is
the other hand, inference about
Note that
Y
and covariates X. This feature generates
the location g(X,0), which depends on the parameter
On
one-sided model
be applied to one-sided models in the presence of outliers, using an additional side to
sample distribution of parameter estimates
e.g.
The
classification of the
the uniform model
a
is
standard
in
and imply estimability at rate n.
about
U (0,/3),
many
model's parameters into
where
is
estimated at the rate n).
regards.
a and
is
motivated
statistically, as in
Donald and Paarsch (1993a) and van der Vaart (1999) (who considered univariate Pareto models).
The boundary parameters
earlier. If
usually coincide with the
they do not and the Wald type inference
is
main economic parameters,
to be used, then
as indicated
one needs to reparameterize
them
into
a and
/3,
see e.g.
Donald and Paarsch (1993a). 8 However, the practical use of Bayes type
inference or parametric bootstrap
In the following,
we
methods do not require such reparameterization.
a structural example, which
briefly review
our regularity conditions and explain the results.
bility of
will
serve to illustrate the plausi-
provides an example for the
It also
Monte
Carlo work.
Example: Independent Private Value Procurement Auction.
Consider the following
econometric model of an independent private value procurement auction, formulated in Paarsch
(1992) and Donald and Paarsch (2002). Here, Yi
Xi =
i-th auction
The
of
V
is
the winning bid for auction
minus
1,
and Z, denotes other observed
bidders' privately observed costs
given
X
follow
an
and the covariates
characteristics of auctions.
Pareto distribution given X,
iid
i.e.
the density
described by
is
=
fv (v\x)
where 6 2 and
V
i
mi denotes the number of bidders in the
(Zi,rrii) describes variation across auctions, where
M£
v
are parameterized as functions of
9\
—
notation convenience). E.g. 9\ {X,fi)
>
X
>
e1
and
exp(/?(Z) and
2
o,
(3
e2
>
o,
(but this dependence
(X,P)
=
is
suppressed for
exp(P'2 Z).
Assuming the Bayesian Nash Equilibrium solution concept, the equilibrium bidding function
satisfies
a(v)
= v+
_ Fv{v]x))m
(1
which
is
=
at v
the cost plus the expected net revenue conditional on winning the auction. Evaluating a (v)
6\ gives
the conditional support for the winning bid. As shown in Paarsch (1992), this implies
the following density function of the winning bid Y, which
the specified bidding
rule, conditional
=
(y\x,e u e
U, ,,Y/l,M
2
8
A
,
is
the
first
order statistic generated by
on covariates X:
02m [M™-D-i]
^^
)
referee pointed out the following example.
J^(m-l)
./ >
^
1
Suppose g($)
=
#i 62,
6>
2
(m-l)-l
where 62 also
error distribution. Although this example does not correspond to the economic model
affects the
we used
in
shape of the
the simulations,
it
highlights the important issue of reparameterization. In this case the asymptotic theory requires reparameterization
into
/3
=
6\02 and
type inference
a =
may
9\
,
and then the estimates
finite-sample estimation uncertainty about
2(6
-q) /q & n~
3
2
of
fli
and 62 are deduced from the estimates of a and 0, and Wald
be carried out using the Delta method (preferably based on the second order expansion, so that
l
ZB la
+ n-
l/2
Z a /a 2
not neglected). E.g. for 02 = PI a (02 — 02) ~ {fi — P)/a — (a — a) /a 2 +
_1
2n (Z a ) 2 /a 3 where Z s and Z a are the limit distributions of /3 and
/3 is
+
,
a. This expansion can be important because in finite samples, variability of estimates of
larger order than that of 6, motivating this expansion.
(02
—
02)
~
~(oc
- a)/a 2 +
Of
o p (l) but this approximation
course, one could use the
is less
6
accurate.
first
/3
may be
of comparable or
order Taylor expansion too,
Therefore, this
is
an example of a one-sided regression model
Y =g(X
i
i
,0)
+e
(2.1)
where
i,
with
g(X,0)
=
9 1 (X,0)
=
(m -
9 2 (X, 0)
1)
/(9 2 {X, 0)
ti
2.2.
Regularity Conditions. The main regularity conditions
A. They serve to impose
identification
(a)
1)
-
1),
fY (g(X,0) + e\X,0) conditional on X.
and
has density f(e\X,0)
(m -
C0-C5
are collected in
and compact, convex parameter space (with true parameters
(b) continuous differentiability of the regression function
in the interior),
g{x;0) in 0,
(c)
nondegeneracy and boundedness of the vector dg (X,0) /d0,
(d)
continuous differentiability and boundedness of the density function /(e|x,7), of
first
derivatives in
7 and
€,
and of the second partial derivatives
continuous differentiability and integrability of the
(e)
Appendix
types of assumptions:
five basic
first
in
7(except at
and the second
e
its
=
partial
0).
partial derivatives
of In/ (e|a;,7) in 7.
Conditions of types
(a)
-
(c)
are standard in nonlinear likelihood analysis.
tions of type (d) represent a generalization of the conditions of Ibragimov
Conditions of type
are the standard conditions for regular
(e)
van der Vaart (1999), Chapter
if
is
7.
Conditions of type
flexible
likelihood models, e.g. as in
that inference about
enough to cover various auction models,
models, and equilibrium search models.
a
is
standard
= II /W-*(*<,W«;7).
•
technical
Define the
where the
addendum
risk is
gives an
likelihood-based estimators that minimize the average
computed under
example
(2.3)
<n
The optimal Bayes estimators are the
risk,
and Informal Overview of Results.
10
Lnil)
expected
frontier production function
9
Definitions of Estimation Procedures
likelihood function as
A
and Has'minskii (1981a).
known.
These conditions are
2.3.
smooth
(e) reflect
Smoothness condi-
different
parameter values and then averaged over
of verification of these conditions in the auction
model that underlies
our Monte-Carlo simulations.
I0
The
of {X,,i
likelihood can be
<
n). This term
is
made unconditional by multiplying through with the
density (probability mass) function
omitted because this additional term does not affect the definition of the likelihood ratio
or can be otherwise canceled out.
these parameter values.
The procedures
7
=
where p n (7)
Q,
and
p(n/3,y/na)
= arginf [
a
is
i n (7)A*(7)//c ^n(T')/
i
loss function
(7')^7'
analysis, as in
made
is
pn
p„( 7
7)
^fji-
,
(2.4)
the weight density function (prior density) on
is
tne posterior density.
ls
d),
The optimality
explicitly
dependent on the sample size
purposes of asymptotic
for
Ibragimov and Has'minskii (1981b), but this may be ignored
Appendix A. Examples
properties of the Bayes
11
and standard conditions are imposed on the
in
-
loss function, p(-)
procedures carry over to the limit.
The
are generally of the following form:
loss function p
and the prior
p,
in practice.
Convexity
D1-D3
and collected as
of such loss functions include
= z'z, a quadratic loss function,
= Sj_i \zj\, an absolute deviation loss function,
p (z; t) = Y.%1 1 zj > °) ~ T z3> T e (°> !)> a variant
(A) p (z)
(B) p (z)
(C)
I
check deviation
)
(
of the
Koenker and Bassett (1978)
loss function,
Solutions of (2.4) with loss functions (A), (B), (C) generate
BEs 7
that are, respectively,
(a)
a vector
of posterior means, (b) a vector of posterior medians (for each parameter component),
(c)
a vector
of posterior r-th quantiles.
Since
BEs become
very difficult to compute
when p
functions for pragmatic reasons. However, proofs of the
loss functions specified in
is
not convex, we focus on convex loss
main
results
Ibragimov and Has'minskii (1981b). In practice, 7 can be computed using
Markov Chain Monte Carlo methods, which produce a sequence
(1,
(7
whose marginal distribution
is
More
,...,7
(
of
draws
>)
(2-5)
),
given by the posterior. Appropriate statistics of that sequence can be
taken depending on the choice of
(B) above.)
apply more generally to other
p.
(E.g. the
generally, estimators
7
means
or component-wise medians for cases (A)
and
are solutions of well defined globally convex differentiable
optimization problems. 12
The computational
attractiveness of estimation and inference based on the Bayes procedures stems
from the use of Markov Chain Monte Carlo
of the
(MCMC) and
Bayes procedures. Since the Bayes estimates and the
the statistical motivation of definition
interval estimates are typically
medians, or quantiles of the posterior distribution, by drawing the
posterior distribution,
we can compute these
Furthermore, another motivation for
(2.3)
is
1
pn
Given the
is
MCMC
series (2.5)
7
MCMC sample of size 6 from the
quantities with an accuracy of order 1/vb. In contrast,
that any optimal (admissible) estimation procedure
procedure or a Bayes procedure with improper priors,
(if
means,
cf.
Wald
solves arginf 76 g j X]<=i P" [f
smooth) optimization problem.
8
is
a Bayes
(1950).
~
7' 4 ')>
which
is
a globally convex and smooth
MLE
the computation of exact
requires optimization of a highly non-convex, discontinuous
otherwise highly nonlinear likelihood.
MLE
can be estimated by grid-based algorithms or
and
MCMC
only with an accuracy that worsens exponentially in the parameter dimension.
MLE
The BEs and the
are consistent and
0-0 = O p (nThe BE's
are
shown to converge
process.
We
first
is
a prerequisite
shown
it is
in this paper that
and 5 - a
1
)
in distribution to
= Op {n-
1
'2
(2.6)
).
Pitman 13 functionals of the
limit likelihood ratio
develop a complete large sample theory of likelihood for these models, which
for
any inference based on the likelihood
principle.
In particular,
we obtain an
explicit form of the limit likelihood ratio process as a function of a Poisson process that can be
easily simulated.
This result implies that the limit distributions of the estimators can be simulated for purposes of
Wald type
inference through either (a) simulation of the limit likelihood process, or (b) resampling
may be more robust than
techniques including subsampling and parametric bootstrap. Subsampling
other methods under local misspecification of the parametric assumptions. However, the resampling
methods are much more computationally expensive and require much more computational time than
Bayes type inference. Simulating the limit distribution
is
comparable
in
terms of the computational
expense to Bayes inference due to the linearity of the limit process.
An
attractive practical alternative
results establish its large
intervals for r n (7),
is
the Bayes inference based on the posterior quantiles.
sample frequentist
where rn
is
a smooth
Our
Consider constructing a r x 100% confidence
validity.
real function that possibly
depends on
n. Define the r-th
posterior quantile of the posterior distribution as
c(t)
where p{z;r)
is
=
arginf f p
(f
MCMC
(r, i (7
r)
I
(1)
under mild conditions on r„, which
pragmatic motivation
for
is
^^
=
,
d7
(2.7)
,
{r„(7),7 € Q). In practice, c(t)
is
sequence evaluated at rn
),-,r n (7 (6) ))-
resulting r x 100%-confidence intervals are given
[c(r/2),c(l-r/2)], where
A
r„ (7)
the check function defined above, and TZ n
computed taking the rth-quantile of the
The
-
(2.8)
by
Kn^P^cir/Z) <
r„ (70)
<
c(l
-t/2)}
= r,
(2.9)
one of the main results of this paper.
Bayesian intervals
is
that the empirical researcher does not need to
have detailed knowledge of complex asymptotic limit theory to apply them. She can simply compute
the intervals through generic
MCMC methods, and then rely upon the present results that establish
the large sample frequentist validity of these intervals.
13
We
follow the terminology of Ibragimov and Has'minskii (1981b) p. 21.
Another
classical
procedure
is
MLE, which
the
7
=
=
0', a')'
defined by maximizing the likelihood function:
is
axgsup
Ln
(7)
MLE is a limit of BEs under any sequence of loss functions that approximates the delta functions.
We shall only briefly discuss the limit distribution of exact MLE for editorial reasons. A detailed
analysis of MLE is given in the technical report, cf. Chernozhukov and Hong (2003). The MLE
The
Converges in distribution to a random variable that maximizes the limit likelihood ratio.
Large Sample Theory
3.
This section contains the main formal results of the paper. Section 3.1 examines the large sample
properties of the likelihood ratio function. Characterization of the limiting behavior of the likelihood
is
necessary for obtaining
all
of the main results and
is
any likelihood based inference
useful for
methojds. Section 3.2 provides an intuitive discussion of this result and subsequent results through
an example. Section 3.3 describes the large sample properties of optimal Bayes estimators and both
Wald and Bayes type
inference procedures.
Section 3.5 briefly discusses the limit theory of exact
MLE.
3.1.
Large Sample Theory for the Likelihood.
analysis
is
A common
the limit serves to describe the asymptotic distribution of
step
step in
modern asymptotic
and Knight (2000). After appropriate strengthening,
criterion functions, e.g. van der Vaart (1999)
initial
first
to find the finite-dimensional marginal limit of the likelihood ratio process or other
is
all
Such an
likelihood based estimators.
sometimes called the convergence of experiments, see van der Vaart (1999).
Consider the local likelihood ratio function
e n (z)
where 7n (<5)
=
+
70
Hn S
matrix with 1/n in the
= Ln (nn (5) + Hn z)/Ln {ln {$)),
denotes the true parameter sequence.
first
dp
— dim (/5)
The
any
Hn
scaling by
corresponds to the convergence rates y/n for
function £„(z)
finite
is
Pln
Hn
is
a diagonal
is
=
dim
(a)
necessary for subsequent results.
a and n
for
14
j3.
if
for
k
^oo(-) is called
under
and
said to converge in distribution to Eoa (z) in finite-dimensional sense
(*»(**)'
and
Rd
diagonal entries and l/y/n in the remaining d a
diagonal entries. Consideration of the local parameter sequence
The
6 €
(S)-
We
a finite-dimensional
3<k)-> d
(4o(*i)>
3<k),
limit. In this section, ->,j
denotes convergence in distribution
partition the localized parameter z accordingly into z
The convergence
(3.1)
=
(u',v')
,
where u £
R
dff
rates are established as parts of the proof of the subsequent theorems, and follow from the
exponential decay of the likelihood tails
E£ n
(z)
~
const
10
e
_c
'
z
'
as
\z\
—>
00, see the proof of
Theorem
3.2.
Rda
corresponds to the localized location parameters and v €
corresponds to the localized shape
parameters.
Theorem
3.1 (Limits of the Likelihood Function).
C0-C5
Given conditions
pendix A, the finite- dimensional weak limit of the likelihood ratio process t n
form: for A(x)
=
p{X) = p{X,-y
dg(x,0o )/dP,
(z)
=
e l0O {v)
1 1oo (v)
=
exp
e
x
),
l
u (j,x)
exp(u'm+
= \n?P-l[0 <
j
p(x)
— oo
N
lnoo
,
is
=
oo and 1/0
co, oo
•
a Poisson random measure N(-)
J
t
variable.
>
1} is
an
>
{X!,£-,i
iid
l
u (j,x)dN(j,x)),
'
=T
t
=
's
is
>
j
J), and
TV (0,
A(x)' u]
,
convention applies to the case when q(x)
=
0:
InO
=
0, see equation (3.6) below].
= Y^\
/p(X,),
[W>-*t') e
1
S1 +
Ti=
T'i
=
"]
-(£[
+
+ SSi
.. .
an independent copy of {Xj,£j,i
1
Wn^i)
e
']>
where
+E
i>\
(3.3)
+ £<),
I>1
(3.4)
...
i,
sequence of variables where Xi follows law
1}
WA
q(x)
Jl=rjq{Xl),
{Xi,£i,i
/
JlixX
< A(x)' u]+\n?P-l[0>
[where Ibragimov and Has'minskii (1981b)
=
=\nf(ei \Xi ,'y),
(7)
/*
(W'v - v'Jv/2)
£h (lb)'), m = EPyo A(X)\p(X) - q(X)},
(£i, ( 7 o)
io
),
(z) takes the following
x £ 2 oo(w),
V
J = E„
and
(3 2)
i 2oo{u)=
where
q{X) =q{X,^
Ap-
collected in
>
1},
Fx, and
£, is a unit exponential
and both sequences are independent
o/W.
Remark
fRxX
l
3.1 (Alternative
Form). To analyze
u (j,x)dN(j,x) appearing
in
OO
the limit ^2oo(u) further, write the Lebesgue integral
the statement of
Theorem
OO
OO
£*««,*) +^(Ji,X!) =
£>
/
-v,
3.1 as
v
Syil
[0
<
J,
<
A [X,)'u]
(3.5)
+ £l*fSM°> 4 >*(*/)'*],
which
is
a simple function of the variables {X^X-, Jj, J-}. This suggests that the
function can be simulated simply by generating sequences of {X<, X-,
distributions specified in
Theorem
In practice, the quantities
p
(Xi)
3.1 for large
J-,i
<
b} according to the
and then evaluating the corresponding expressions.
and q (Xi) are replaced by
the empirical distribution function.
inference, see e.g.
b,
J,-,
limit likelihood
This replacement
Chernozhukov (2001).
1
is
their estimates,
and
Fx
is
replaced by
permissible for purposes of large sample
Remark
(Boundary Case). There
3.2
(boundary) model. Since q(X)
=
a.s.,
a drastic simplification of
is
using the rules stated
+X>GM) = J2 -°°
£>(./„ *,)
1 [°
£200 (u) in the one-sided
Theorem
< J <
'
3.1
A (*»)'«]
1=1
1=1
i=i
in
=0
(3.6)
0,
"{
Hence
for
,
Remark
3.3
I
.
exp(u'm)
if Jj
i
„
>
1,
otherwise.
> A(Xi)'u, for
.
alii
>
1
(3.7)
.
otherwise.
Theorem
3.1 (hence
in (3.4).
(Robustness to Misspecification)
Theorems
.
not
It is
of densities at the
jump
difficult to
3.2-3.4) that the limit theory for
fication of the regression function g{x,0) of order o(l/n),
and
/? is
observe from the proof of
robust under local misspeci-
local misspecification of the height
points p{x,-y) and q(x, 7) of order o(\/ri).
It
also appears that the qual-
nature of the limit theory would be preserved under local 0(l/n)- misspecification of the
and possibly under 0(1) misspecification on
regression function g(x,fi)
p{x,f)
all i
the one-sided models, the limit depends only on the set of variables in (3.3) and does not
depend on the variables
itative
(^i)'u, for
—00,
\
in
J4 > A
m = EA(X)p{X)
(u) = <
2oo(u)=S
Thus
if
>
(7(2,7) for all x.
p{x,~f)
Theorem
3.1
is
beyond the scope
its
A
formal development
of this paper.
extends the results of Donald and Paarsch (1993a) on the boundary models with
discrete regressors
Despite
q(x, 7) as long as
Given these mentioned conditions hold, the inference about a appears to
be robust up to local o(l)- violations of the information matrix equality for a.
of these results
and
and the
results of
Ibragimov and Has'minskii (1981a) on the univariate models.
unusual form, the limit likelihood has a simple structure. The term ^ioo(v)
is
a standard
expression for the limit likelihood ratio in regular models, and inference about the shape parameter
a
is
thus asymptotically regular. The limit log-likelihood has a standard linear-quadratic expression:
t/W This limit contains a normal vector
for
W=N
v'Jv/2,
(0,
J) and the information matrix
J
example, that conventional estimators of a, such as the posterior mean and the
.
This implies
MLE, have
the
standard limit distribution
jBecause
is
unknown, the
1
w = jv(o,j-
1
limit likelihood also includes
tinuities in the density are highly informative
about
12
/?
).
a nonstandard term ^2oo(u).
and are of a
local nature.
A
lot of
The
discon-
information
about P
that
is
contained
the observations YJ that are near the location of the discontinuity g(Xi,P),
in
for those Yi such that
is,
ei
close to zero.
is
=Y -g{X ,P)
i
i
Thus the behavior of extreme
(closest to zero) e[s determines the behavior of
^200 (u), as further explained in Section 3.2. Consequently, one expects that the rate of convergence
n
of likelihood-based estimators will be
likelihood estimators of
3.2.
will
ft
for
be determined by £2oo(")-
Informal Explanation Through an Example. Consider a simple model 15
Y =X'
i
where £
is
inference about
more
a
as
it is
=
1.
Assume that
z
i
,
= n(P — Po),
if
Yi
— X[p >
this constraint takes the
likelihood for this example
Ln
(P) /
is
a boundary model with the density at the
(3-8)
a (We do not
discuss the
a linearized, homoscedastic version
is
be most informative about P, as the likelihood function
0, for all
n (P
)
as
is
i,
ne*
> X'^P —
depend on these
constraints.
that
>
X[z, for
/?o will
L n {p) =
Y\ i
a function of z
M*) = II
when
is,
/3n),
for all
i.
Letting
form
learn about the parameter
likelihood ratio
= E,
models.
nti
What we can
ei
The model
regular as stated earlier).
realistic nonlinear
be positive only
The
+e
there are no shape parameters
Intuitively, the smallest values of tj will
will
o
i
a standard unit exponential variable. This
boundary equal to p{X)
of
contrast to ^/n for a), and that the behavior of
(in
(e-
e
'
all i.
e- <i+x '^-M\{ne > X[n(P - p
=
i
)).
Hence the
n(P — Pq) takes the form
+X '* /n /e-<-)
> X'z),
(nei
1
i<n
which further reduces to
= ex z l(n€i > X' z,
'
in (z)
Since
X-* p EX,
f
the behavior of E n {z) for fixed z
is
for all
(3.9)
i).
determined by the lowest order
statistics
"«(1). ««(2)) "f(3)>
The Reny
representation, see e.g. Embrechts, Kliippelberg,
and Mikosch (1997)
re-scaled order statistics to be represented almost surely as
n,
£\,
We
thank a referee
£\ H
n—17^2
for suggesting using
,
t\ H
n
-Si
n— 1
a similar example.
13
n
-\
n
—
xt3,...,
2
p.
189, allows these
where {£i,£2,...,£n }
stochastically
n
an
is
sequence of unit-exponential variables. For given z, essentially only a
iid
bounded number
—¥ co, for any finite
of order statistics, say
fc,
matters in the constraints
Hence as
(3.9).
A;
k
(ne(i), 7i€( 2 ) ,—,ne(k))-*d {£1,
£\
+
£2
,
-,
/]
£j)
3=1
=
Hence the marginal
may be
limit of i n {z)
=
£«,(«)
where {I\}
is
the sequence of
point process methods in
result stated in
The
density of
variables Tj
3.
term
and
>
l{Ti
X[z, for
just a
all i
>
above, and Xi
1).
is
the
iid
sequence of regressors
special case of the limit stated in equation (3.3),
in
equation
3.1 formalizes the intuition described
3.1
is
more complicated
=
e*'s
Eioo(v)-
a
information about
/3
statistics,
is
The
The
(3.3).
use of
it
to
is
for the following reasons:
also an additional negative error in equations like
from above and the
in equation (3.4) as the limit distributions of "extremes
may
gamma
variables
from below".
vary near zero, which changes the hazard rates of the limit
a
leads to the presence of an additional
of this term reflects that the inference about
given by the limit average score
W and
comes from a small portion of the
the average score
W
is
gamma
by varying hazard functions p{Xj) or q{X-).
Uncertainty about the additional shape parameter
The form
where
^2oo(w).)
above and extends
then largely deduced from the e^s closest to
/? is
r'; , resulting in their division
information about
type
z
to zero from below. This explains the presence of the additional set of
and associated regressors
2.
).
errors.
Theorem
The information about
e;'s closest
fc
'
more general two-sided models, there
1. In
r
,
a special case of points (J{,Xi) stated
is
Theorem
more general heteroscedastic
(3.8).
is
...
(Also there are no nuisance parameters in this example so that £tx{z)
1.
definition of points (1^, Xi)
The
E(x)
,
seen as
gamma variables defined
with distribution Fx- Note that this
p(X) =
e
r2
(r l5
a
The
regular.
the information matrix
entire sample
independent of those
is fully
and
is
J
.
limit
Since
based on extreme
statistics asymptotically.
This follows
from the standard proof of asymptotic independence of sample averages of general form and sample
minimal order
Lemma
3.3.
statistics, see e.g.
Large Sample Properties of Bayes Procedures. Given the above
Theorem
3.2
can be easily conjectured. The Bayes estimator Z„
centered at the true parameter 7„
is
Resnick (1986) for a general treatment and van der Vaart (1999)
21.19 for a simple example.
(6)
=
(/?„ (S)
related to the localized likelihood ratio £ n (z)
,
q„
(5)
)
—
discussion, the following
—
a„
(5))')'
and normalized by the convergence
rates,
as follows 14
=
it
(n(/3
/?„ (<5))',
-Jn (2
minimizes the posterior loss redefined
in
terms of the local deviation from the true parameter:
T n {z)=
p(z - z')n n
/
(z')dz'.
Ju d
Here n n
(z) is the posterior density for
nn
where £n
(z) is
(z)
=
l n (*)
M
(7n (6)
= A»
+
H n z) / f
JR
£ n [z]
fRd too {z) dz.
(z) I
The
fj. (
7n
(6)
H n z) dz,
+
d
the local likelihood ratio process and
be conjectured from the discussion
7roo {z)
the local deviation z from the true parameter:
p.
is
the prior density.
As n —>
in the previous section that the posterior
limit local posterior density
ir
oo,
it
can
n (z) approaches
a function of the likelihood
iTao is
only and does not depend on prior information.
Theorem
3.2 {Properties of BEs). Suppose that the conditions of Theorem 3.1 and
1.
The convergence rate
2.
Zn
—
>
is
n for estimating
If p(z)
=
pp{u)
y/n(a n — a n
Theorem
(5))
+
=
limit
is
terms £ioo(w) and
- Pn
i.e.
Z„
— Op (1).
Z& and Z a
^200 («)•
If
p{z)
Z
(&))-* d
—
{ *'_\
13
dz'
(3.10)
(z)dz
= argmf u fR *
v') £ioo («') dv' ,
given in the form
cf.
^
z')
pp (u -
and Z® and
u') e2oo [u')du'
Za
of a
Pitman functional of a
methods according to Remark
Remark
3.1.
= PpW) +
MLE
(1999)),
the limit distribution of
3.5.
p a (y) does not hold, part 3 of Theorem 3.2 does not
the loss function p a
5
is
is
is
a
coincides with
symmetric (by Anderson's lemma, see van der Vaart
given by
Af(0,JThis
and
result also justifies
are independent due to the factorization of too(z) into independent
that of the
if
limit distri-
limit likelihood
The
apply. Also, the limit distribution of the Bayes estimator of the shape parameter
i.e.
and
are independent.
and establishes the rates of convergence and the
MCMC
the use of the parametric bootstrap,
In the stated result,
-
J Rd too
JR<!a p a (v
3.2 obtains the consistency
The
p(z
2€R J iR i
arginf v
not difficult to simulate using
/
arg inf
p a (v), then n(J3 n
Za
-)•<*
butions of the BEs.
is
and -Jn for estimating a,
(3
Z, where
Z=
3.
D1-D3
Then
hold.
]
).
not the case for the estimators of the location parameter 0. Furthermore, as shown below
the optimal estimators generally are not transformations of the
the non-regression or
Remark
lidity of
3.4
(
dummy
MLE
asymptotically, contrary to
regression cases.
Wald Inference with Sub sampling). Theorem
subsampling for Wald type inference.
3.2
immediately
justifies
the va-
Subsampling approximates the distribution of the
estimator in the
sample based on values of
full
Implementation protocols are standard and can be found in
orem
Romano, and Wolf
2.2.1 in Politis,
polynomial in n
rates,
proven in Theorem
(ii)
3.2.
strap and (b)
is likely
(1999) applies provided
Romano, and Wolf
3.5
may
not be as high quality as the parametric bootstrap or simulation of
is (a)
computationally
less
demanding than the parametric boot-
to be more robust than other methods to local misspecification of parametric
affect the rates of convergence.
(Wald Inference with Parametric Bootstrap). As
weak convergence
(1981a), the
The-
the estimates are consistent at
(i)
models that change the parameters of the limit distribution but do not
Remark
(1999).
the estimates posses a limit distribution. Both of these conditions are
However, subsampling
limit.
Politis,
smaller subsets of data.
Thus, Theorem 3.2 immediately implies the validity of inference based on
subsampling. Subsampling
the
many
this estimator in
and conditional on almost every
in
Ibragimov and Has'minskii
and the proof can be stated uniformly
in
the parameter 7,
realization of the covariate sequence {Xj,i
<
n} (n —>
results
order to do so, the notation must be
made more complicated
in
00).
In
a manner similar to Ibragimov and
Has'minskii (1981a) to denote the dependence of the limit on the parameter 7 and on the realization
of the covariate sequence.
The uniform convergence
in distribution
is
defined as the convergence
of distributions under the Levy metric uniformly in the parameter 7, and conditional on covariate
sequences. This immediately implies that the parametric bootstrap
is
valid in the usual sense that
the bootstrap distribution converges to the limit distribution in probability under the Levy metric
as long as the preliminary estimate
estimator
7 may be
used.
7—
p 7, conditional
on covariate samples. Any
Hence the parametric bootstrap can be used
for
initial consistent
Wald type
inference
based on the point estimates (See for example Horowitz (2000)). Although the Bayes estimates are
not
difficult
to recompute (especially with a good starting value such as the
7), the parametric bootstrap appears to be very expensive computationally.
4, it is
may
much more computationally demanding than any
other method.
Bayes estimate
As discussed
in Section
The parametric bootstrap
not be robust against the local misspecification of the parametric models.
Next consider the posterior mean 7 and posterior quantile 7
(2.4)
initial
under the squared
Also define
Z
and
Z
loss
(r) as
and the check
(t) as
the solutions to the problem
loss functions, respectively (each defined in Section 2).
the solutions of the limit problem (3.10) under the squared and the check
functions, respectively.
Theorem
tervals).
3.3 (Mean-Unbiasedness,
Under
1. Posterior
the conditions of
mean
Quantile- Unbiasedness, Posterior Confidence In-
Theorem 3.2
estimators are asymptotically mean-unbiased:
16
<
Consider any
2.
=
and t
t"
<
r'
<
r"
1.
If
Z (t)
has positive density in the neighborhood of
then posterior r-quantiles are
,
Vm
o
P- rm w{<rt{T))
<
i
—
1
a.
[(7
r"))j]
(
(7
1
Results
and
1
For example,
2 follow
be that
A
0.
losses,
mean Z had a mean
asymptotic
risk regardless of
it is
r„
:
result concerns the
-
r n (7)
R
is
EZ =
c
/
0,
then the estimator 7
the local parameter sequence. Hence
= R'(a-
is
>
1
00) y/n
and
R=
[R',R']' with rank
+ R'{0-0o )n + O
determined by either parameter
m
function m (a)
If
a smooth function
If
a smooth
(3.13).
(2.9) in
Hn c
must
(/?)
is
(n\0
R=
a
-
o
\
1:
+ ^\a - a
made dependent on n
a
or
of interest, then taking r n (7)
is
Note that these transformations by %fn or n do not
all
)
.
(3.13)
have a better
of the asymptotic
in rates of
= n m (0) fulfills
= ^/n m (a) also
affect
a
\
specifically to
due to the difference
of prime interest, taking r n (7)
is
The
technical.
finite-sample approximation through the avoidance of the trivial case where
is
+
it
Consider inference about the function r n (7) where
For purposes of theoretical analysis, the function
inference
in section 3.4.
asymptotic validity of the posterior quantiles c(t) for inference
such that for a
rn (70)
..
asymptotically optimal under the r-check loss function.
about smooth functions of the parameters.
I »+ d " ->
(7)
argument applies to the rth-posterior quantile. The r-posterior quantile
similar
r-quantile unbiased because
d
the validity of confidence intervals
which are defined and established
requirement that Zj (r) has positive density around
The next
(3.12)
from the asymptotic optimality of posterior means and quantiles respec-
strictly lower
EZ =
is
sample inference on parameter components
ar g e
the limit posterior
if
would have a
is 1
l
under the squared and check function
tively
—
f° r
r'
(3.11)
< (tV)),} =r"-r'.
very useful implication of the quantile unbiasedness result
•
—
Hence
taP^JRMlj < (%(*)),
r '))
(
for t
= Ac{(^ W)j < 0} = 1-T,
(7n (*)),-}
where (a)- denotes the j-th components of vector
A
0,
r-quantile unbiased:
convergence.
condition (3.13).
fulfills
condition
the practical formulations (2.7)
-
Section 2.3 by the linearity of transformations and equivariance of quantiles to monotone
transformations.
Theorem
1.
3.4
For any
<
Inference with Posterior Quantiles
(
r
<
1, (c{t)
-
Z(r)
2. Provided
Z (r)
r n (y n (5))) -> d
S
arginf
Z
(r),
).
where
|/( i - R> Z ;r) £ ^^d,
^
has positive density over a neighborhood of
limPTn(4) {c(T) <
n
Under the conditions of Theorem 3.2
rn
(
7b (6))}
17
for r
= P7o {z(r) <
=r
o}
l
=
and t
1
-
r,
=
r"
(3.14)
and
lim
n
o
Theorem
3.4 generalizes
P7T>w {c(T )<r n
,
Theorem
3.3 to
(
more
inference about
struction of parameter confidence intervals. For example, a 1
hypothesis r n (70)
= rn
is
=
7„((5))<c(r")}
—
-
t"
t'
more general functions than conasymptotic test of the null
r-level
given by the decision rule that rejects the null
if
Optimality.
Lemma
3.1
below
optimality properties of BE's, which
3.3
and
A
3.4.
needed only
r/2)],
test.
and asymptotic average
briefly records the finite-sample
is
—
r„ $ [c(t/2) ,c(1
where (3.15) can be used to deduce the local power and consistency of the
3.4.
(3.15)
.
risk
the auxiliary purposes of proving Theorems
for
detailed valuable analysis of (minimax) optimality in non-regular models can be
found in Hirano and Porter (2002).
Define the normalization matrix
Hn
as in section 3.1,
the local parameter sequence. Consider the set
7„.
Tn
and
let 7„((5)
=
+ Hn 5,
70
of all statistics (measurable
5 €
Zn =
where
measures of
The
i/" [7„
— 7 n (^)] and
the expectation
finite
sample average risk (AR) of 7
is
the weight or prior measure over
measure. The asymptotic average risk
K|K
Compared
p(Zn ),
computed under 7n (<5). Consider the following
(S)
K, p
(AAR)
p(Zn )p^n (5))d6,
is
d
(3.16)
the loss function over K, and A
of estimator sequence {7 n }
limsuplimsup —tjtt \
KtR d n->°° A l-ttJ JK
where
EP
given by:
P
p. is
denote
risk.
~y J^E ^
where
is
,
mappings of data)
Define the expected risk associated with a loss function p and estimator jn as
1
Kd
EP
is
the Lebesgue
given by
is
p{Zn )d6,
(3.17)
denotes an increasing sequence of cubes centered at the origin and converging to
to the previous formula, the weight
p. is
Rd
.
replaced by the objective (uninformative) weight
overM'*.
Lemma
3.1. Suppose the conditions of
estimator under loss p and prior weight p,
1.
For each n
>
1 the
i.e.
at
The infimum of asymptotic average
00 and
is
(3.17).
(Z denotes the weak
Statement 1
is
i/" 1
For
hold.
[j P ,,,, n
—
%,,,,„
7n(<5)],
Zn — Zn
K = Un
limit of
denoting the Bayes
is
achieved over
T„
in (3.16).
risk over estimator sequences in
attained by the sequence of the Bayes estimators
a basic result of
Tn
€
Un = n (B — /?o) x y/n (-A — cxq)
of finite sample average risk for
infimum
by the Bayes estimator %,„,„,
2.
Theorem 3.2
Zn =
jPlll ,„,
T„
i.e.
equals
E P^
o
p(Z) <
{Zn } = {Zn }
in
Zn ).
statistics,
that the optimal estimator under loss p
estimator defined by the risk-weighting function
18
p and a
loss function p, cf.
Wald
is
a Bayes
(1950) and
Lehmann and
Casella (1998), Chapter
Statement 1
5.
average risk efficiency
MLE
may be worse than
but performs better under other loss functions,
Large Sample Theory of
MLE.
discussion of the
is
asymptotic
from Ibragimov and Has'minskii (1981b)
(this result essentially follows
For example,
loss function p.
which
an alternate definition
efficiency into
p.93).
that unlike in the regular case, the efficiency rankings are largely determined by the
It is critical
3.5.
often simply used as
is
Bayes (optimal) procedures. Statement 2 translates finite-sample
of the
Consider
the posterior
Section
cf.
mean under the squared
loss,
4.
Maximum Likelihood Procedures. We provide only a brief
the MLE Z n = (z% Z°'\ = (n{0 - n (5))', Jn~{a - a n (<5))'Y,
,
centered at the true parameter and normalized by the convergence rates.
Theorem
minimum
MLE). Under C0-C5,
Zn — O p (l) and
3.5 {Properties of
in
Rd
a.s.,
then
Zn ^ d Z =
Z°^t d Z a =
In particular,
J~
1
and supposing
arginf. 6RJ
W = N{0,J-
1
),
Z
-
£00(2)-
Z =
-> d
that -£oo{z) attains a unique
argmin u6R<f/ ,
and
l2oo {u),
Z
and
Z a are independent.
The proof
is
Due
Hong
given in the technical report (Chernozhukov and
is
an argmin of a
limit likelihood,
The
(2003)).
which inherits the discontinuities of the
finite
limit variable
sample likelihood.
to asymptotic independence of the information about the shape parameter from the information
about the location parameter, the
MLEs
for these
parameters are asymptotically independent. In
the boundary models, the limit result can be stated more explicitly for P as follows:
n(0 -
/?„
{8))-> d
Z =
arginf (
= argsup
- exp(u'm)
u'm
(
such that
such that
J;
>A
(*,•)'
> A(Ai)'u,for
J;
u, for all
all i
>
>
i
l\
,
lj.
This result generalizes the results of Donald and Paarsch (1993a) and Smith (1994). Note that the
weak conditions.
solutions of the linear programs like these are unique under fairly
Remark
3.6
the posterior
(Asymptotic Non- Sufficiency of MLE's).
means and medians are generally not equal
example of Section
3.2
E (X)
limit
maximum
is
important to note here that
to the bias corrected
(z)
= eE
W
likelihood variable
z subject to the constraint T;
>
z
Z
l [Tt
>
X(zM
maximizes
X\z, for
all i
all i
£00 (z),
>
1.
>
When A (X,)
who show that
A
Consider the
1).
which
equivalent to maximizing
is
In the no covariate case, the limit
For example, a very simple sufficient condition for almost sure uniqueness
distributed element in
MLE.
where
4o
The
It
(A,) and that Af-Y,) has a nondegenerate distribution,
is
that there
cf.
is
one continuously
Portnoy (1991)
for a related
problem.
has discrete support, the stated limit result coincides with the result of Donald and Paarsch
(1993a)
uniqueness holds
if
A(/f,) has nondegenerate distribution (assumed in C3).
19
MLE Z
implying sufficiency of Z.
transformations of
shift
Z
whether
X=
is
If
Z
Z
X
(1,X) where
implying that
(z)
Z
is
is
continuous,
(z)
Design.
where we
=
set 6 2
=
100 and n
auctions.
estimates.
We
1
£<*,
and
was
set to
all i
median,
(2) the posterior
mean,
(3) the posterior
mode (MLE),
across different risk measures.
(c)
the
>
1)
MCMC
B—
be
Here the
sequence.
results given in Table 1
nonrandom functions of the
[/3n
±
x
5]
±
[j3i
MCMC
We
5]
made
Z.
and a flat prior to compute the Bayes
algorithm described on
245 in Robert
p.
MLE
is
computed by taking the argmax over the
and Table 2 show that:
better under the
implementation the
first
grid
18
mean
(a)
mean
is
the posterior median
the best under the
10-th power loss function. Thus,
some
20,000 draws are
made
risk
with a fixed variance, and used
in
it
is
the best under
mean squared
appears that
all
loss,
of the
measure.
for a "burn-in stage",
variance of transition kernel every 200 draws in order to keep the rejection rate near
were
MLE
compare the performance of
likelihood procedures perform quite well relative to
In the
limit
computation of the estimates. The computations
in the
absolute deviation loss, (b) the posterior
MLEs do
with strictly positive probability,
17
(1) the posterior
mean
some
all
<?i
Quality of Estimation Procedures.
the
Z),
400, which are close to practical sample sizes encountered in empirical work on
and Casella (1998).
The
<
We used a simple procurement auction model similar to that in section
= exp(/? + A-X), with X ~ 1/(0, 1), /? = 1, P\ = 1, m = 3. We take
starting value
generated by the
z
e l(z
the general regression case. Taking the example with
were performed using the canonical random-walk
4.2.
=
n}, l<x>(z)
Computational Experiments
used the parameter space
The
<
even conditional on covariates. Thus, the limit likelihood-
are generally not
Monte Carlo
=
min{r;,i
easy to see that
it is
not sufficient for
4.
2.1,
in
X!Z,tor
Z
Z =
thus
then the limit optimal Bayes estimators are
sufficient
is
^ eEm '*l(X!z <
based Bayes estimators
n
z,
by the well-known Rao-Blackwell argument. This raises the question of
a sufficient statistic for £«, (z)
4o
4.1.
>
maximizes over z such that I*
the computation of estimates.
The
with adjustments
Then
.5.
C++
made
to the
additional 20,000 draws
implementation
is
available
from the authors.
18
We
also tried the approximate
MLE
performance of the approximate and exact
loss functions that
defined as a
MLEs
BE
coincided
under (truncated) 10-th power
up to many
digits
and
approximate delta function can also be used to approximate
20
is
loss function but the
not reported separately. Other
MLEs
by some BE.
4.3.
Quality of Inferential Procedures.
In the next step,
we compare the performance
inference methods, focusing on the coverage properties of the confidence intervals.
of several
We compare
(1)
the confidence intervals based on the posterior quantiles,
(2)
the percentile confidence intervals based on the parametric bootstrap of the posterior mean,
(3)
the percentile confidence intervals based in the parametric bootstrap of the
MLE,
(4) the percentile confidence intervals based on the subsampling of the posterior
1/4 x n as the subsample
(5)
=
(using
size),
the simulation of the limit distribution of
3.1 (using b
mean
MLE and other estimators as described
in
Remark
n).
Results reported in Table 3 indicate that the intervals based on the posterior quantiles and simulation
of the limit distribution perform nearly as well as the parametric bootstrap, while subsampling
performs worse than any of them. The confidence intervals based on the posterior quantiles also
appear to be the shortest on average. Given that the posterior intervals are the
compute, they should be preferred. The subsampling
and
is
is
least
expensive to
expensive than the parametric bootstrap
less
probably more robust for inference purposes under local misspecification.
In terms of computational expense,
on a Pentium
III
PC. Simulation
computation of the posterior quantiles takes
of the limit distribution
is
less
than
1
minute
roughly twice as expensive (because
the limit expressions are simple transformations of linear functions and do not contain nonlinear
We
expressions).
the
used 200 bootstrap draws and the
MCMC algorithm
(which reduces the number of
full
sample estimate as the starting value
in
MCMC draws needed in the re-computation of
the estimates). Using this implementation, 200 bootstrap draws take between 7 and 30 minutes for
samples n
=
100 and n
=
400. (Thus, 1000 bootstrap draws take up to 150 minutes for
The subsampling takes about 1/5-th
of draws.
The
entire
of the time of the parametric bootstrap, using the
Monte Carlo work took
5.
We
studied estimation
models discussed
in
400).
same number
weeks of computer time.
Conclusion
and inference a general model
dependent variable jumps at a location that
of interesting economic
several
n =
is
in
which the conditional density of the
parameter dependent. This model includes a number
the recent literature of structural estimation.
We derived
the large sample theory of a variety of likelihood based procedures, and offered an array of useful
and
practical inference techniques, including
results provide a theoretical
and
Wald type and Bayes type
practical solution to
inference methods.
The
an important econometric problem.
Appendix A. Regularity Conditions C0-C5 and D1-D3
Notation.
Throughout the paper,
—> v and —>d denote
convergence
c
and const denote generic
in probability
positive constants unless stated otherwise;
and distribution, respectively;
21
||x||
is
the usual Euclidian
norm Vx'x, and
Note the
|x| is
supremum norm
used to denote the
densities of interest /(e|x,7) are discontinuous at
simplify notation,
we
e
=
radius 5 as measured by
7 with
Conditions C0-C5 The
7
=
{P,a)
CO
d
Q CR
-
/(Y,
in Q.
sup J<fc
and are not
C4
(Y,,
Xi)
compact convex
0, etc.
(xi,...,Xk)-
=
differentiable in e at e
when
e
^
To
0.
and also use
0,
it
to
Also, .6,5(7) denotes a closed ball
|
C5
and
an
is
=
e
—
where x
\xj\
•
|
x
following conditions apply to
Conditions
For each 7,
is
=
use 9/(e|x,7)/3e to denote the usual partial derivative
denote the directional partial derivative df(Q + \x,f)/dt when
at
|x|
i.e.
,
apply to any 7
=
£
=
(/3,
a) in
CO -C3
apply to any
some
.8,5(7) f°r
denned on probability space
any 7 and
interior Q\ for
Conditions
6 R.
e
and 7
R x R*,
sequence of vectors in
iid
set such that 70
X,
in
(/3,a)
7^7, P7 {/(yi
—
>
5
0.
(Q,7",
P7
<?(Xi,/?)|.X,,7)
).
^
S (X, 3)|X,,7)}>0.
/
CI
Xi has cdf Fx, that does not depend on
in 7 and x, we have either
and has compact support X.
7,
In addition to (2.2),
uniformly
(i)
the two-sided model: p(x,y)
(ii)
the one-sided model: p(x,7)
C2
7.
Without
The
e
c
second partial derivative
>
>
c
and f
0,
the density /(e|x,7)
=
e
in
0) that
7 that
is
bounded uniformly
X for each
x on
and
C3
g(x,fi) has two continuous
1
da
aa
continuous in x on
ffi
When
C4
uniformly
in
each
either (a)
P7 -a.s.,
then £^-,[^'1 (7)
C5
In the two-sided
a
^U (7)']
model
is
is
are
;,
bounded
sup 7
EPy Cj
Lemma
f",
EPl Cj (e,
Xi )
<
Cl.i, the
00 for j
=
Xi )
<
00 for j
=
1
,
JWj
sup
=
e
for each
has continuous
first partial
£,7); f(e\x,*y) has continuous
The
Ex J
density
function
and the
in
and the
/?,
first
uniformly
in
00.
x and
/?,
and
specified above derivatives are
=
In/ (Y,
— g(Xi,p)
(7)'] is positive definite
\Xi,^) where 7
and bounded
or (b)
if
=
-^U
(J3,a),
(7)
=
and bounded.
y
-*(X*,0)|**,7) f,
=
1,
(
*
2,3, for all Y,
—
W^rlnf (Y - g(Xi,p)\Xi^)\\
(A.l)
t
g(Xi,/3) 6
R \ {0},
uniformly in
76
Bs(~)o),
=
1, 2,3, for all
Yi
— p(X,,/3) >
0,
uniformly
in
7 e
#,5(70),
where
2, 3.
conditions
C0-C3
imply that there are finite constants f, f,
9" such that
/(e|*,7)</,
sup
\g(x,/3)\<g,
J68.I6X
and
derivatives specified
— g(X,/3)\X;~/)\dy <
\$-f(y
x and
terms
A.l (Important Constants). The
f_, 9, 9',
sup 7
The
0.
1,2,3. Similarly, for the one-sided model Cl.ii the terms in (A.l)
respectively by Cj(e,,X,), j
(e,„
/?.
positive definite
bounded respectively by Cj(f,,Xj), j
where sup 7
in (e,
and bounded derivatives
in
<
in (e,x,i). f(e\x,*y)
in (e, x, 7).
7. Lastly,
0, for all e
upper-semicontinuous at
is
present, for U (7)
EPl [-§^U (7)
\%lnf(Yi - g{XiJ)\Xin)\, \\^ ln f
are
=
/3.
the nuisance parameter
7 and 7
«
uniformly
s positive definite
X for
q(x,~/)
bounded uniformly
is
in
'
=
(t\x,~y)
above are continuous
The function
or
bounded from above uniformly
is
(except at
>
q(x,~f)
loss of generality,
density f(e\x,f)
derivative in
>
>
sup
S6B,i6X
sup
||^*
||^,s(x,/?)||
w /(«|«,7)ll</',
< 3,
22
sup
sup
Hj^cKx,/?)!!
SgB.igX
||Jfc,/(«|x,7)||
<
s",
<
/",
and
also that for
some 6 >
=
and V(0)
{0} in case of Cl(i) and V(0)
[—5,(5] \
/(e|x,7)
inf
*ev(0),xex,7?g
Remark
A.l. The regularity conditions are summarized
location parameters
the conditions
/?
C0-C5
and the shape parameters
are a mixture of non-regular
(1981a) and van der Vaart (1999), Ch.
is
zero to the
left side
a.
jump and
of the
model where the density
Conditions D1-D3 The
prior
p
Q
:
and
Section
in
in case of Cl(ii)
(0,<5]
0.
The parameters 7
2.2.
known, the inference about a
is
regular assumptions, as in Ibragimov
include the
regular.
Thus,
and Has'minskii
Condition Cl(ii) allows for the boundary model, where the density
positive on the right side. Condition
is
on both
positive
is
7.
If j3 is
> / >
=
->
Conditions
sides.
R+ and the
C3-C5
loss function
p
are
:
Cl(i) allows
common
Rd " +d s
-*
for
the two-sided
nonlinear analysis.
in
R+ have
the following
properties:
Dl p
( •)
>
>
p(z)
is
D2
D3
p
Remark
()
is
continuous on Q,
and p(z)
=
iff
z
=
p
0,
convex,
is
dominated by a polynomial of
\z\
as
\z\
—
A. 2. These are standard assumptions on the
>
00.
loss function
p and the prior
ft,
Ibragimov and Has'minskii (1981b). Since BE's become essentially uncomputable when p
do not consider the non-convex
loss
see for
is
specified in Ibragimov
loss
functions
and Has'minskii (1981b).
Appendix
[N.B. In the proofs
B.l.
we
functions for pragmatic reasons. However, the proof of Theorems 3.1-
do not rely upon the convexity assumption and the results apply more generally to other
3.4
example
not convex,
we
Proof of Theorem
Proofs for Section 3
B.
extensively use the constants defined in
3.1. In the proof
we
set the local
Lemma
A.l]
parameter sequence f n
=
70-
Considering a
general sequence does not change the argument but complicates notation.
Following Ibragimov and Has'minskii (1981a), we split the log likelihood ratio process
Q„
into the continuous part
Our
goal
is
to
show that
c
Q n (z)
Q n (z)
(2)
=
\ne n (z)
=
In
L„(f + H„z)/L„(7o)
and the piece-wise constant part Q„
converges
in distribution in
Qoo(z)
= QUz) +
(2),
and analyze each part
separately.
the finite-dimensional sense to
Qi>(z),
where
Qlc(z)
= W'« -
\v' Jv
+
Qt(z)= [
m'u,
1
where each term
is
defined in
Theorem
lu(j,x)dK(j,x),
JRxX
/RxX
3.1.
Given this
^00(2)
result,
the finite-dimensional limit of t n (z)
= axp (Qao(z))
23
is
For z
=
(u
,v')' ,
using that
=
e,-
Q n (z) = ^2ri„(z)
V*
—
g(Xi,j3o),
> {A„(X„«)/n} V0) +
x[l(e<
< {A n (X,,u)/n} A
l(e,
0)]
1=1
n
+ ^(r,„(z) -
7N n
(z))
x
[1(0
A n (X„u)/n) + 1(0
<u <
>
e,
>
A„(X,-,u)/n)]
i=i
+ ^Ttn„(z) x[l(0<e, < A n (X,,u)/n) + l(0>e, >
A„(X;,u)/n)]
<?*<*)
= Ql
(z)
=
f,„(z)
+ Ql
f(Yi
,
-
where
+ u/n)
g(X,,/3
In
f(tj
=
(«)
\X,,/3o
+ u/n,a + v/-fn)
HYi-giXuM&i^o)
- A n (Xi,u)/n)\X,,l3o + u/n,a +
v/y/n)
ln
/(e,|X,-,7o)
r in (z)
=
q(Xi)
In
p(Xi)
<e,)+ln
1(0
A n (x,u) =
The convergence
tinuous part
Ql
Also, in above expressions
for
n(g(Xi,/3o
from that of
and
working with oo's defined
all
in
c
Qn
proofs
Theorem
(z),
we
c
Qn
and
(z)
is
is
standard. In sharp contrast, behavior of discon-
analyzed using the point process methods.
use the algebraic rules of Ibragimov and Has'minskii (1981a)
This
3.1.
is
done to include the proof
a special case. In particular, the expressions involving 1(0
e,
>
0.
Also in the boundary model r;„(z)
=
Part
I
is
>
e,
>
= — oo when
r,„(.z)
QL(2) =
Thus, the term Qlniz)
e,),
+ u/n) -g(Xi,0o ))-
analysis of the continuous part
(z) differs
>
1(0
q(Xi)
lP(X,)
...)
<
for the
boundary model as
cancel, since in the
e;
< A n (X",, u)/n,
boundary models
so that
o.
only non-zero for the two-sided model. Further details follow.
obtains the finite-dimensional limit of
Q cn {z). The proof method is standard for the smooth likelihood
analysis.
Application of Taylor expansion to each
by application of the Markov
LLN and
'
Q^z) = -u'EA(X
s.
f
%
)
{£
o)
'^'l
•
r, n
(z),
Chebyshev
i
=
-L£Aln/
+v'
L
*
1, ...,n,
so that the
expanded terms are
inequality, yields for a given z [see
*
(
e,|X„ 7 o)
*
£
a^'° E/(
= -J
24
iid,
followed
Addendum]
<
|A'^l
v
+ op (l),
where A(A,) = dg{Xi,/3 )
_ ai°g/Hx, T0 )aio S /^|x,-rQ
g
The information matrix
/d/3.
^
'
)
|
the
CLT
W7i _>
giyes
a
equality for
W =7V(0,J-). Also
rf
-J = E d '"l^f
"
implies
follows
it
by
'
7o)
C2
=
19
m = EA(X )(p(X )-q(X
i
Therefore, the finite-dimensional limit of Q\ n (z)
Q^(z)
It
remains to show Q\ n
=
(2)
given by
= urn + Wv -
for
any compact
- A n (x,u)/n\x,/3o + u/n,ap +
set Z, as
v/\/n)
hi,
u/n,cto
<e<
0,
in {e,
X
R_ x Z x
z,x 6
+ v/y/n)
(B.l)
p(x)
<2x
(/'//) xg'\\z\\/JZ
(B.2)
_g(x)
A„(i,a) < 0,0 >
:
<2x(/7/)xS'||z||/ v 'n
A„(x,u)/n}. Likewise
/(e|z,7o)
uniformly
hence consider the two-sided case.
0,
p(x)
A„(x, u) >
:
A n (x,u)/n\x,/3o +
-
/(e
X
R+ x Z x
z,x €
=
-> 00,
In
/(e|z,7o)
in {e,
n
g(x)
-
In
uniformly
\v'Jv.
o p (l). In the one-sided case Q2T. (2)
Note that by assumptions C1-C3,
/(e
is
i )).
i
e
> A„(i,u)/n}. Thus
71
C
sup |Q 2 „ (2)
|
<
2
V
~t
x (/'//) xg' x \\Z\\/y/H x
-•EZ
for
K=
some constant
conclusion
is
||Z||
x
g'
,
where
=
||Z||
sup{||z||
I
6 Z}, where /C
Op {\[y/n~)
(B.3)
by C3. The
is finite
Op (l/y/n)
by C2:
££^1(M <^/n)
Part
2
:
< K/n) =
l(|e,
II obtains the finite-dimensional limit of
= J2
oi(z)
[
|n
Q„
(2).
<2/A" <co.
(B.4)
Recall
.,
^^-l(0<ne,<A
u )) + ln
n (X„u))
t53 1(0 < ne - A "(^"
'
4§
1(0
>
ne -
£ A n(^..«)].
By C2 and C3
n
£^|l(0 <
=
1
+ |l(0 >
where A(X,)
=
<5n(^)
Now
8,
=
^V
£
< A„
(A\,u))
-
1(0
<
ne,
<
A(Xi)'u)\
ne,
> A„
(X,,u))
-
1(0
>
ne,
>
A(A,)'u)|
'^ which implies that
In
f
note that (Q„ (zj)
follows by applying
ne,
1
^TTT
P(X,
1
(°
,j<l) and
<
ne
'
for given
,
j
<
I),
2
2fg"\\u\\
/n
=
o(l),
2
< A (*0'") +
(Q5i (z7-)
<
for
In
(0 > ne,
^7^1
9 (A",)
any
finite
/,
> A(X
f
)'u)
are asymptotically independent. This
a standard argument concerning the independence of minimal order
averages of general form, see for example Resnick (1986) or
Op(l).
Lemma 21.19
in
statistics
van der Vaart (1999)
[
and sample
Addendum
provides the proof for completeness].
19
RecalI that for a density function /, which
derivative,
is
everywhere continuously differentiate except
+
fR f'{u)du= -/(0 ) + /(0").
25
at
with an integrable
The next
step
Q dn
to obtain the finite-dimensional limit of
is
near-to-jump observations, whose behavior
The behavior
.
steps. Step 1 constructs the required point process
and
of Q%.
is
We split
described using a point process.
is
determined by the
the argument in two
derives its limit. Step 2 applies Step
1
to obtain the
finite-dimensional limit of Q^.
Step
1:
Intuition for Step
Define
The
E= Rx
1 is
provided
in
X. The topology on
point process of interest
is
Section 3.2 of the
E
is
standard,
main
[a,b]
e.g.
text.
x
X
is
a compact subset relative to E.
a random measure taking the following form:
for
any Borel subset
A
n
N(A)
= ^l[(n fi ,I,)ei],
i=i
N
random element
a
is
of
M
P (E),
the metric space of nonnegative point measures on E, with the metric
generated by the usual topology of vague convergence, Resnick (1987) Ch.
a review].
We
[Technical
3.
Addendum
provides
show that
N => N in MP (E),
N
for
Theorem
given in
3.1.
This
done
is
in steps (a)
and
(b).
By CI and C2, for any F € 7", the basis of relatively compact open sets in E (finite unions and
bounded rectangles in E), limn-n*, E'N(F) = lim,,-^ nP((rae,,X;) e F)
(a):
intersections of open
=
where measure
F}
m
is
\p(x)l(u
I
>
0)du
defined as m(du,dx)
are independent across
i
+ q(x)l(u <
=
\p(x)l(u
>
0)du]
0)du
x dFx (x)
= m(F) <
+ q(x)l(u <
0)du] x
(B.5)
oo,
dFx (x).
Since {(ne,, -X*) g
by CO, by Meyer's Theorem, Meyer (1973),
lim
n—MX)
P(N(F) =
0)
=
e-
m(F)
(B.6)
.
Statements (B.5) and (B.6) imply by Kallenberg's Theorem - Resnick (1987), Proposition 3.22 - that
N => N in MP (E), where TV
Next we show that
(b):
N
canonical Poisson processes
is
has the same distribution as
No and
mean measure mo(du) = du on
Because
(1987), p. 138.
JVo
a Poisson point process with the mean intensity measure m(-).
N
(0, oo),
N
given in
Theorem
with points {I\} and {Tj} defined
and
N
has the mean measure m' {du)
and N' are independent, Ni(-)
=
JV
(-)
+
iV
3.1.
First, consider the
in
Theorem
=
du on (— oo,0), see Resnick
(-)'
is
No has the
3.1.
a Poisson point process
mean measure mi(du) = du on R, by definition of the Poisson process, see Resnick (1987), p. 130.
Because {Xi, X-} are i.i.d. and independent of {r^Tj}, by Proposition 3.8 in Resnick (1987), the composed
with
process
N2 with
[l(u
>
T
(u,x)
:
0)du
m(du,dx)
+
i->
=
points ({Pi, Xi}, {P;, X[},i
>
1) is
a Poisson process with the mean measure m,2(du,dx)
l(u
(l(u
TO2
1
(1987).
Step
2:
We
=
< 0)du] x Fx (dx) on R x X. Finally, N with the points {T(r,, X),T{T'U A /)}, where
> 0)u/p(x) + l(u < 0)u/q(x),x) is a Poisson process with the desired mean measure
oT~ (du,dx) = \p(x)l(u > 0) + q(x)l(u < 0)]du x Fx(dx), by Proposition 3.7 in Resnick
7
have
for 2
d
d
Q n(z) = Q Ju) =
=
(u,v)
[f>^l
[0
< nu < A(X,)'«]+f>jgjl
[0
>ne, > A(X,)'«]]
+
o p (l).
N:
Ignoring the o p (l) term, write Qi{u) as a Lebesgue integral with respect to
d
Q n (n)= f
JE
where
l
defined in
u (j,x) is
Theorem
l
(j,x)dN{j,x),
u
The convergence
3.1.
of this integral
is
implied by
N => N in both the
two-sided and one-sided model:
By
(a) In two-sided model:
Ku =
the compact set
has compact support but
(f
TV
1-4
at
V(T)
l
Uk
A
r
e
MP (E)
Since
.
=
0.
Resnick (1987),
p.
P[N
e 1>(T)]
e,'s
x X,
:
=
77
<
for
I)
=
j?
or
I
<
sup x€X |A(i)'u|, where
when
discontinuous
is
(j,x)dN(j,x),k
= {N
points of
conditions Cl-3, the function (j,x) M- l u (j,x)
[-?7,+??]
j
N
=>
153. It follows
and j
=
for
)
P[N
some
i
>
bounded and vanishes outside
implies
{Qi(u k ),k <
l)-* d (Qio(u k ),k
map T
3.13 in Resnick (1987)
<
:
T
>->
M
P (E)
is
l
u (j,x)
i->
R' as
discontinuous
where (j? ,x? ,i > 1) denote the
some n > 1] = 0, and by definition of N,
l,fc
e V(T), for
N in MP (E)
Qt(u) = [
is
00 by C3. Thus (j,x)
A(x)'u. Define the
Hence by Proposition
00.
= i4A(zf
jf
are absolutely continuous
Therefore
=
<
77
/}
r(N)-> d T(N) by the continuous mapping theorem,
<
I),
where
l.(j,x)dN(j,
J
(b) In one-sided model: Using the Ibragimov
rules for algebraic operations with
= /£ lu(j, x)dN(j, x) as a binomial random variable:
= 0, where A(u) = {(j,x) e R+ x X j < A(x)'u}.
if N(yl(u)) = 0.
Also define Qt(u) = QtM = JE u (j,x)dN(j,x) = -00 if N(A(u)) > 0, Qt(u) =
Thus, to show finite-dimensional convergence (for 7* = —00 or 0):
oo's stated in
Qi(u)
= -00
Theorem
if
note Q„(u)
3.1,
N(-4(u))
>
=
and Has'minskii (1981a)
=
Qi(u)
0,
if
Qto( u )
N(A(u))
:
l
lim
n—¥oo
it
suffices to
B.2.
C2
)
J)
= P(QUu k
show (N(A(ut)),fc < l)—*d (N(A(u k )),k <
of point processes,
since by
P(Q dn (u k = 7fcl fc <
cf.
I)
/
<
and construction of N, N(9.4(7i t ))
Proof of Theorem
3.2.
The
=
By a
00.
Embrechts, Kliippelberg, and Mikosch (1997)
=
definition of
is
weak convergence
immediate from
N => N,
a.s.
proof applies Theorem 1.10.2, p.
some
I),
p. 232, this
and N(d.4(ujt))
(1981b) that states the limit distribution of BE's provided
First,
for
=~f k ,k <
)
107 of Ibragimov and Has'minskii
general conditions hold.
BE's are measurable by Jennrich's (1969) measurability theorem since they minimize the objective
functions that are continuous in data and parameters.
Second, the following conditions (l)-(3) verify the conditions of Theorem
1.10.2, p.
107 of Ibragimov and
Has'minskii (1981b):
(1)
(a)
Holder continuity of („
(b) Exponential
(z) in
expectation proved in
bound on the expected
E PJ n
where function
a'(\z\
Has'minskii (1981b),
for
any
N
>
0,
—
i.e.
lim^^^
1)
falls
(z)
—
1) is
N - a '<l*l-»
e
U2
=
C.2,
Lemma
>
C.2: for a
<e-°'^- \
l
to the class of functions
(i) a'(\z\
\z\
Lemma
likelihood tails proved in
G
defined on
monotonically increasing to 00
0.
1-
p.
41 in Ibragimov and
in \z\
on
[0,
00),
and
(ii)
(2)
=
Finite-dimensional convergence of £ n (z)
collection (zj,j
<
exp(Q n (z))
=
to £oo(-z)
exp(Qoo(z)):
any
for
finite
I)
(in(Zj),j<l)~^d {too(Zj),j<l),
of the terms are defined in the proof of
(all
(3)
The
p. 107
and conditions
It
—
- z)-
p{z
/
uniquely solved by a random vector Z, which
cf.
3.1.)
Bayes problem:
limit
Z = arginf
is
Theorem
in
by
is
Z
°°
D2
dz,
p
(since
convex with a unique minimum,
is
Ibragimov and Has'minskii (1981b).)
Dl -D3
on the
loss functions
p and prior p.
must be noted that Ibragimov and Has'minskii (1981b) impose the symmetry
However, the inspection of the proof of Theorems 1.10.2 (and Theorem
require the
symmetry and
applies to the loss functions that satisfy
of
p throughout
1.5.2) reveals
D1-D3.
Thus, conditions (l)-(3) imply by Theorem 1.10.2 of Ibragimov and Has'minskii (1981b) our
Zn —
lim
The
B.3.
Proof of Theorem
3.3.
To show claim
lim
Consider the problem min c
to set c
= —EZ.
W.
E Pl
Suppose that
p {Z
c^O,
E7n(5)
+ c)
1,
and any
will
EP
1.10.2 of
N>
,„p{Zn )
be used
Ibragimov and Has'minskii
= EP
p(Z)
<
m.7)
00.
later.
note under the conditions of Theorem 3.2 by (B.7)
= EPlo Z.
1
[H- (7- 7„
=
where p(z)
,
d
lim
\
not needed to prove Theorem 3.2, but
last result is
and Theorem
1.5.2
local sequence 7n(5) = 70 + H„S,S 6
H- N Plnls) {\Zn > H} = 0, and
(1981b) that for any
result:
Z.
><j
Furthermore, conditions (l)-(3) imply by Theorem
their book.
that the proof does not
(*))]
z'z.
The
solution of this minimization
problem
E Plo p(Z + c)<EPlo p(Z),
where by
Lemma
3.1
and the rhs of (B.8)
the lhs of (B.8)
is
is
Yvm
i
since
is
2,
the asymptotic average risk of the sequence of estimators
mean
established in
P^ w {^(T)) j <
(70),.}
=
Ym^P^^Zn (r)).
< o} =
assumed to be a continuity point of the distribution of (Z
c
where p(z;r)
Lemma
=
(1 (z
3.1.
A
7+
Hn c
Lemma
3.1.
note that by Theorem 3.2 and definition of weak convergence
m in
or by
(B.8)
the asymptotic average risk of the sequence of posterior means 7, which contradicts
the asymptotic average risk efficiency of the posterior
To show claim
is
then
>
0)
—
Ep P
-,
((
Z
(
T ))j
t)z. Note that the quantity
solution of this problem
Py {{Z(r)) j
>
c}
_
c T
'
)
(t))
..
P-f0 {{Z{r))
<
o}.
Consider the problem
'
EPlo p UZ (t)) — c; t)
.
is
given by the root of the
first
=
r or P, {(Z (r)) J < c]
=
28
j
1
is
finite for
any c by (B.7)
order condition
-
r,
(B.9)
c
i.e.
=
—
(1
r)-th quantile of (Z(r))- (under the condition that (Z (r))
neighborhood of
Suppose
0).
has positive density in any small
c 7^ 0, then
E P7o p((Z(t)) -c;t)<E PioP ((Z(t)) j
;t),
(B.10)
j
Lemma 3.1
where by
(7(7-)
Hn
—
,
the lhs of (B.10)
and the rhs
Lemma
and
3.4
c)
is
is
the asymptotic average risk of the sequence of estimators defined as
the asymptotic average risk of the posterior r-th quantile (7 (r))
Then (B.10) contradicts the asymptotic average
3.1 for definitions.
posterior quantiles under the check function loss established in
Thus
must be that
it
second part of claim
limP7 „ (<) {( 7
=
1
-
<
(r'))
lim
Tl—HX>
=
c
in (B.9), so that the first part of
equation (3.12)
2,
P„
7
(
7n
Proof of Theorem
Lemma
Lemma
Then
<
J
C.2 and (2)
l^
(a)
(z)
>
< 0} J
f
some
00
U
Kn
.
,
9 (z)
either the set {z
is
at the origin.
marginally
to £
x
ln(z)fi (70 + Hn z)
+ HnZ )dz
jAz')»{io
sK
'
y n (S)
+ Hn z 6 Q}
in
,
is
(Rt
< l)—*d (R<x>{zj) ,j <l), where
R7"( z ) -> d arginf 2gR P«,(z).
(r")),|
(z)
(z) has the properties specified in
under a parameter sequence 7 n (<5)
R^
=
70
+
H n 5.
,
dz
>
^
^
which case
f
JK
e
.
,
9 {z)
K=R
x (z)
is
,
j~e^WW dz
;
or
Kn = K
~y n
a sequence of R-valued
is
-
a fixed cube centered
is
{8).]
random functions,
i.e.
for any
convex and finite on an open non-empty set
de-
<
I
a.s.,
is
proven on
(1)
Lemma
then
and
(a)
is
a special case of
Lemma 1.5.1
in
Ibragimov and Has'minskii (1981b).
p. 106-109 of Ibragimov and Has'minskii (1981b) under more general conditions
(2).
B.2 See Davis, Knight, and Liu
part of the proof of
Theorem
3.4
is
(1992)
and Pollard
(1991).
done by setting the true parameter sequence 7„
=
(5)
Considering general sequence does not change the argument but significantly complicates notation.
Write
Zn
Zn
=
(r)
00
,
than conditions
first
The
i
-
Assertion (b)
The
r":
(B.ll)
converges in distribution in finite- dimensional sense to Poo,
Proof of Lemma B.l Assertion
Proof of
proven.
=
J
taken under any local sequence
B.2 (Convexity Lemma). Suppose {Rt}
ar g inf zeR'!
< {Zn
is
for any vector-valued continuous function g(z) dominated
R
If
J
Suppose that (1) £„
Lemma
(zj) ,j
and r
two useful lemmas.
t 7l (z)).
[Convergence in distribution here
Rt
r'
> o\
(r'))
L
fined on
.
=
j
ball at zero a.s., (b)
:
r
for
<0}-P7o {(Z(r')) >0}=r"-r'.
£ n (z) converges
—*
equation (3.11),
2,
=
P^mUZn (r% <
P,.„,{(ZB
lim
71—fOO
3.4. Consider at first
in
= Hm
(7(1-")),}
See section
3.1.
claim
immediate from (B.9) with c
B.l (Integral Convergence of
by a polynomial as z
Here
is
(r"))
K.
= l-P,o {(Z(r"))
j
B.4.
(*)),-
{(Z„
(S)
Lemma
.
risk efficiency of the
(r)
=
(c(t)
- r„
arg inf T„ (z)
*e R
,
(70)).
I\, (i)
Note that
= / p (z - r„
JR d
(70
+ Hn z) + rn
29
(70)
;
r)
/"
(
* )/X ( 7°
.
JRd e n (z')p,(-ro
+ %"**
dz.
+ Hn z')dz'
.
,
70.
Since
\r n
+ Hn z) —
(70
—
r„ (70)
= O
R"z\
\Hn z\ a
Iz
>
for a'
J
the case by the properties of the
0, it is
check function that
-
\p (z
for
any
Lemma
Hence by
2.
Tn
r„ (70
+ Hn z) +
\
'
Recall that
Z (r)
again,
„
in
it
for all
z.
Next we need to
it
suffices to consider
(M _I £)
E^\& 2 (M-'O
(b)
EP^*{M- i)<e-
sequence of
M and
-yn (<5)
=
70
lim
ri
c '<M- l
Then
risk in
R
and
2
I
>
c'
< cW -
—
all
The
Lemma
n-too
Proof of
Chapter
5.
Lemma
(B.12)
+ Op(l)
pv;
.
by
(5) is given
.
Zn
(r).
+ Hn z) — r„ (70) — R'z =
Consider linear transformation f = M'z
Then, the likelihood for £ given by
M
.
0):
+
2|e'|
V jf '|)
(r)
3.1,
of
Ibragimov and Has'minskii (1981b)
for
any
0,
;
\
r
1
it
= J5p70 p l^Z (t)
(B13)
\
;
tj
<
00.
can be concluded that {c(t)} minimizes the
3.1.
A("-)Jk
EP^
p{Zn \T)d&.
{£} which are measurable functions of sample (Yj, Xi,
rest of the
i
<
n),
argument that establishes the r-posterior quantiles are
unbiasedness and resulting coverage properties
B.5.
dz
,
the sense of achieving the infimum of
statistic sequences
r„ (7 n (5))).
(r)
e"| (1
(hm^ EP^ {s) p \Zn
KfR d
(c
r„
a column of
is
lim sup lim sup —T^r /
over
.
\
identically to the steps in the proof of
asymptotic average
Z
(t) -> d
H- N P, n(s) {\Zn (r)\>H} =
hence
n z') dz'
j Ri in (z')ft (70
Lemma C.2. By Theorem 1.5.2
+ Hn 5,5 e Rd and any N >
—too, n—too
n z)
(70
By Lemma B.2
(z).
that
>
- e n/2 (M~ l t)
l
by nonsingularity of
p.
only linear functions such that r„ (70
M such
has the properties (for some c
(a)
local
;
establish the uniform integrability for
defined by the nonsingular matrix
£n
in (z)
follow that the marginal limit of
denotes the minimizer of T^,
what follows
\Hn zf)\,
2\0(z
+H
/77wt^L^
+H
+ H„Z)
—
(jO
ft
.
.
Zn
Thus,
<
|
fRd ln(z')ft(yo+Hn z')dz'
./if*
Lemma B.l
In (z)
ID
p(z-Rz\T)-f
f
/
r)
B.l
(p(S- Hz; r)+0(z- \Hn z\"))
=f
Jk*
(z)
=
Applying
;t)- P (z- R' z;
r„ (70)
Claim 1
is
is
identical to the proof of
just a special case of
Theorem
Theorem
1.1 of
(1
where
—
Zn =
r)-quantile
3.3
Lehmann and
Casella (1998),
Claim 2 follows by an argument similar to that given by Ibragimov and Has'minskii (1981b)
p. 93. Details
are omitted for brevity. See
Chernozhukov and Hong (2003).
30
Appendix
Let 7
=
=
and h
(/?,a)
/2
jJ
(note that
Lemma
all
Ft
(dx)
x,f
);
2
taken outside the
•
|
+
ft)
-
6 Q, uniformly
'2
-brackets, since
in
z,
(*)'
E PJ n
/2
it
I^P)
',
W
"*
1
Kg
1
+
and 7
(z)
B>
C.2. For some
<
2>
1
n
there are a
r| (7;
7
+
'2
-
and
/2
In (z'Y
|
|
_r
<
e
E P,\tn (z)' /2
-
*„ (z')'
r2
2
h> P
(1) follows
is
\
dyFx
(dx)
>
A>
and
such that for
< A
(|ft,|
\
ha
2
\
)
(C.l)
.
some no and
(C.l), for
>
+
all
n > no
uniformly in 7
<A (\z - z'\)
(1
+2
•
V
\z'\
\z\)
(C.2)
.
denoting the sup norm,
«.
2/
/
+ u / n a + V /Vn)
>
.
/
/--\
(
3)
""(l-I.H?)
„
when
/2
|
r
>
2
<nr|
0, (3) is
(7
+
p. 260
in
m.«(M.I"l 2 )
(4)
given by (C.l), and
(u/n, v/-J^)
2
)
;
7
(4)
and
(\z
-
and
(4) are obvious.
31
by the standard
(2) follows
by the
(5) are obvious. Also,
z'\) (1
by the standard manipulation of Hellinger distance, as on
given, (3)
(1) follows
+ (u'/n, w'A/n)) < A
< A
-.I-I + -
(6)
Ibragimov and Has'minskii (1981a);
<A (\z -z'\ + \z- z'\
(1981a), (2)
2
\
a'
ft)
depends only on the diameter of the parameter space Q\
r)
inequality (1
1,7)
does not depend on the parameters).
and some
H„,z' 6 Q,
1
£p7 |4,
',
manipulations of Hellinger distance, as on
—
;
(i)
,„
1/2
:
where constant
+ Hn z 6 5
< e-'""-
Lemma
(z)
2
(y-g (x,0)
7
^J ^
(r,^h)>2
z such that 7
Proof of
where
1
f
C.2 (Exponential Tails and Holder Continuity). Given
£p7 «»
C.l.
(v-9(x,0 + h
|
+h
(a) rl
Lemma
is
+ ft) 2 =
(hg,h a ). Define the standard Hellinger distance ri (757
C.l (Hellinger Distance Properties). Under C0-C5,
h such that 7
/or any
l/'
Important Lemmas
C.
+
2
(\u
\z'\
p. 260 in
V
-
u'\
|*|)
+
\v-
2
v'\
)
,
Ibragimov and Has'minskii
Lemma
Proof of
C.2.
1/2
+ h)<Ex
rl{T,l
C.l. In order to establish (C.l)(b),
j (/
(
M
y -g(X,l3+
I*;/?
+
fc/j,
=
7
let
=
a) and h
(/?,
i/2
Q + ha ) - f
(hp,h a ),
2
(y
-
dy
g (X,0) \X; 1 ))
(i)
Ex
<
f
\f (y
- g(X,p + h )\X;P + h ,a+ha ) - f(y - g(X,0)\X;i)
dy
'[9{X,0),g(x,0+h /) )]
Hsix,,
+ Ex j
J\9iX.0),9( X
/2
(/'
+h
(y- g(X,0
n + h) - f
)\X
'0 + h t))Y
Ex
J\ 9 <.x,0), g (x„
(X,8+h
f (y
c
M
-g(X,0 +
+ h e ,a) -
\X;p
1/2
/ (y
(y- g(X,p +
-
hp)\X;fi
+ h p ,a)Y
dy
g (X,0) \X;-y) dy
)]
(2)
< 2Ex \g(X,p +
df
Ex [>
2
+
I
\h a
\
+ ]h0]
Ex
l
\
<
2f\h s \Ex
df(y-g (X, p + whff)
\)
=
\X,
[0,6] if
u,hg)
,
a)
du
2
+O
(\h a
+
)
\
+u
J
h.
dyduj
dydw
(\h
\)
2
)
\
a
<
6
and
by triangle inequality and from
=
<
[b,a] if 6
\a
—
2
<
b\
by the Taylor expansion and Pubini. The
and Pubini. The second term
2
<
|
and the bound
a,
\a
inequality follows from the fact that \f (|)
is
+ wh e
§_
dp
= 0(\h e + 0(\h a
[a,b]
-9(X,/3 + h )\X,p + hg,a
da
\dg(X,p +
dp
[
Jo
where
(y
f\
_
(3)
h fi )-g(X,0)\f
—
b
2
\
is
>
a
uniform
and
The second and
/.
term
first
for
in
6
>
in 7.
The
The
0.
first
first
inequality follows
term
in
the second
third terms in the second inequality are
the third inequality follows from Taylor expansion
from C4, while the third term
in the third inequality follows
in
that inequality
by C2.
The
lower bound from below, the equation (C.l)(a),
sufficiently small S
and
\h\
>
Indeed, for sufficiently small S and
r
On
<
\h\
2
{T,l
8
it is
+ h)>
r 2(7;7
It
for
some a >
the bound in (C.l)(a)
remains to prove (C.3)
'(-
'"
>'•./
for \h\
<
8 for
,/2
(f
l[9(X,e),g(x,(!+h l) )] x
Ex f
established by considering separately
\h\
<
S for
some
is
CO
for all
+ /0 >
es
\h\
>
>
2
|/i„|
some
y
sufficiently small 8.
-g(X,P +
y
+
h £ G
(C4)
0.
-
(C.4).
Write
(y-g(X,p + hg)\X n + h)-f
(
(C.3)
).
8 such that
immediate from (C.3)
1/2
(f
shown below that
const max(|ft^|,
the other hand, by the identification condition
Hence
is
S.
h s )\X; 1
i
'2
(y-g(X,p)\X--y)Ydy
'
+ h)-f ,/2 (y-g(X,p)\X; 1 )Ydy.
we can bound
For small hg,
/ from below uniformly in
Ex ^\g(X,p + hg)-g(X,fi)
using assumption
C3 and
,/2
(X, 7 )
(X,f) - q
Taylor expansion.
i/
Ex f
J\g{x,e),9(x.g+h
l[g(X,3),g(x,B+hg)Y
(f
v
9 )Y
X
I/2
p
(*
c
f
l[g(X,0),g(x,0+h
J\g(X.0),g(X.e
+ h„)Y
)]
On
df
ll2
w&
small
for sufficiently
area
[g
C4(b)
Ex
|ft|,
(X, 0) g (X, p
,
f
2
inf, u
is \h\
u
dy
1
where the remainder term
+ hg)} and
using bounds in
2
.
(757
+
=1
>
const \hg\
C1-C3, bound // from below
bound
h)
>
const
(y-g(X,0)\X-y)) dy
J
Ex
^ 7f
5(!,
Ju
+ O (|/ig|)
(
x,fS),g<x.0+hg)Y (
1
>
const
|/i|
>
const
|/i
Q
|
,g);i,)
'u)
dy
,
neglecting the integrand over the small
so.
On
the other hand,
if
assumption
is
*
•
by:
2
i/2
dy-o(\h\ 2 )
O (\hg\) arises from
C2 and C3 to do
inf
&•)
inf 7 r
|
dg{X-p)\
Ex
<>,
(y-g(*;/?)l7)V
:
f(«
Conclude
)
<9t
ff^(y-g(X,/3)-n )'
holds, the uniform lower
,a/' /2
const
the other hand, by
(y-9(X\P)h)
Under C2, C3 and C4(a), a further lower bound
>i*r
>
*(y-g(X,p + hg)\X- 1 + h)-f
V
\
l)
7 by
max
(\hg\, \ha\
S3
Ex
^
J
I
X '^'^
al
da
u
)
d V> const l^|
2
References
Aigner, D., T. Amemiya, AND D. Poirier (1976): "On the Estimation of Production Frontiers: Maximum Likelihood
Estimation of the Parameters of a Discontinuous Density Function," International Economic Review, 17, 377-396.
Andrews, D. (1994): "Empirical Process Methods in Econometrics," in Handbook of Econometrics, Vol. 4, ed. by
R. Engle, and D. McFadden. North Holland.
Andrews, D. W. (1997): "A stopping rule for the computation of generalized method of moments estimators,"
Econometrica, 65(4), 913-931.
Artstein, Z., AND R. J.-B. Wets (1995): "Consistency of minimizers and the
SLLN
for stochastic programs," J.
Convex Anal., 2(1-2), 1-17.
Beroer, J. O. (1993): Statistical decision theory and Bayesian
analysis, Springer Series in Statistics. Springer- Verlag,
York, Corrected reprint of the second (1985) edition.
Bowlus, A., G. Neumann, AND A. Kiefer (2001): "Equilibrium Search Models and the Transition from School to
Work," International Economic Review, 42(2), 317-343.
Chernozhukov, V. (2001): "Extremal Quantile Regression," MIT Working Paper.
Chernozhukov, V., AND H. Hong (2003): "Likelihood Estimation and Inference in Some Nonregular Econometrics
Models," Working Paper, Department of Economics MIT, to be published by the Social Science Reserach Network
New
at www.ssrn.com.
Christensen, B., AND A. N. Kiefer (1991): "The Exact Likelihood Function for an Empirical Job Search Model,"
Econometric Theory, 7, 464-486.
Davis, R. A., K. Knight, and J. Liu (1992): "M-estimation for autoregressions with infinite variance," Stochastic
Process. Appl., 40(1), 145-180.
S. G., AND H. J. Paarsch (1993a): "Maximum Likelihood Estimation when the Support of the Distribution
depends upon Some or All of the Unknown Parameters," Typescript. London, Canada: Department of Economics,
University of Western Ontario.
Donald, S. G., AND H. J. Paarsch (1993b): "Piecewise Pseudo-Maximum Likelihood Estimation in Empirical
Models of Auctions," International Economic Review, 34, 121-148.
Donald, S. G., AND H. J. Paarsch (1996): "Identifiction, Estimation, and Testing in Parametric Empirical Models
of Auctions within Independent Private Values Paradigm," Econometric Theory, 12, 517-567.
(2002): "Superconsistent estimation and inference in structural econometric models using extreme order
Donald,
Econometrics, 109(2), 305-340.
and R. Wets (1988): "Asymptotic behavior of statistical estimators and of optimal solutions of
stochastic optimization problems," Ann. Statist., 16(4), 1517-1549.
Embrechts, P., C. Kluppelberg, AND T. Mikosch (1997): Modelling extremal events, vol. 33 of Applications of
Mathematics. Springer- Verlag, Berlin, For insurance and finance.
Flinn, C, AND J. Heckman (1982): "New Methods for Analyzing Structural Models of Labor Force Dynamics,"
Journal of Econometrics, 18, 115-190.
Hirano, K., and J. Porter (2002): "Asymptotic Efficiency in Parametric Structural Models with ParameterDependent Support," October 2002, forthcoming in Econometrica.
Horowitz, J. (2000): "The Bootstrap," University of Iowa; to appear, Handbook of Econometrics, vol. 5.
Ibragimov, I., AND R. Has'minskii (1981a): Statistical Estimation: Asymptotic Theory. Springer Verlag, Chapter
V: Independent Identically Distributed Observations. Densities with Jumps.
(1981b): Statistical Estimation: Asymptotic Theory. Springer Verlag, Chapter I: The Problem of Statistical
Estimation.
Jennrich, R. I. (1969): "Asymptotic properties of non-linear least squares estimators," Ann. Math. Statist., 40,
633-643.
Knight, K. (2000): "Epi-convergence and Stochastic Equisemicontinuity," Preprint.
Koenker, R., AND G. S. Bassett (1978): "Regression Quantiles," Econometrica, 46, 33-50.
Lehmann, E., AND G. Casella (1998): Theory of Point Estimation. Springer.
Meyer, R. M. (1973): "A poisson-type limit theorem for mixing sequences of dependent 'rare' events," Ann. Probastatistics," J.
Dupacova,
J.,
bility, 1.
Paarsch, H. (1992): "Deciding between the common and private value paradigms
in empirical
models of auctions,"
Journal of Econometrics, 51, 191-215.
Pflug, G. C. (1995): "Asymptotic stochastic programs," Math. Oper. Res., 20(4), 769-789.
Politis, D., J. Romano, AND M. Wolf (1999): Subsampling. Springer Series in Statistics.
Pollard, D. (1991): "Asymptotics for Least Absolute Deviation Regression Estimator," Econometric Theory,
186-199.
7,
AM
S. (1991): "Asymptotic behavior of the number of regression quantile breakpoints," SI
J. Set. Statist.
Comput., 12(4), 867-883.
Portnoy, S., AND R. Koenker (1997): "The Gaussian Hare and the Laplacian Tortoise," Statistical Science, 12,
279-300.
Resnick, S. I. (1986): "Point processes, regular variation and weak convergence," Adv. in Appl. Probab., 18(1),
Portnoy,
66-138.
(1987): Extreme Values, Regular Variation, and Point Processes. Springer- Verlag.
Robert, C. P., AND G. Casella (1998): Monte Carlo Statistical Methods. Springer- Verlag.
Rockafellar, R. T., AND R. B. Wets (1998): Variational Analysis. Springer- Verlag, Berlin.
AND R. J.-B. Wets (1986): "On the convergence in distribution of measurable multifunctions (random
normal integrands, stochastic processes and stochastic infima," Math. Oper. Res., 11(3), 385-419.
Salinetti, G.,
sets),
SMITH, R. L. (1994): "Nonregular regression," Biometrika, 81(1), 173-183.
van der Vaart, A. (1999): Asymptotic Statistics. Cambridge University Press.
Van der Vaart, A. (2000): "Lecture notes on Bayesian Estimation," preprint, Aarhus University.
van DER Vaart, A. W., AND J. A. Wellner (1996): Weak convergence and empirical processes. Springer- Verlag,
New
York.
Wald, A.
(1950): Statistical Decision Functions.
John Wiley
&
Sons
Inc.,
New
York, N. Y.
Table
1.
Estimator Performance for Intercept
Estimator
RMSE
MAD
(Based on 500 repetitions).
/?o-
Median
AD
pthloss (p=10)
n=100
Posterior
mean
0.0114
0.0081
0.0059
0.0401
Posterior
median
0.0115
0.0078
0.0054
0.0433
0.0127
0.0088
0.0063
0.0369
MLE
n=400
Posterior
mean
0.0029
0.0021
0.0015
0.0104
Posterior
median
0.0030
0.0020
0.0014
0.0106
0.0034
0.0023
0.0015
0.0103
MLE
Table
Estimator Performance for Slope
2.
Estimator
RMSE
MAD
/3i.
Median
(Based on 500 repetitions)
AD
pthloss (p=10)
n=100
mean
0.0212
0.0147
0.0105
0.0983
Posterior median
0.0214
0.0144
0.0096
0.1053
MLE
0.0216
0.0145
0.0096
0.0693
Posterior
n=400
Posterior
mean
0.0053
0.0037
0.0026
0.0227
Posterior
median
0.0054
0.0037
0.0025
0.0228
0.0057
0.0038
0.0024
0.0201
MLE
Table
3.
Comparison of Inference Methods: Coverage and Average Length of the Nominal
90% Confidence Intervals (Based on 500 repetitions)
Confidence Interval
coverage: intercept
length: intercept
coverage: slope
length: slope
n=100
Posterior Interval
0.87
0.0298
0.85
0.0555
Bootstrap P-mean
0.88
0.0392
0.86
0.0720
Subsampling: P-mean
0.83
0.0416
0.82
0.0770
Limit process: P-mean
0.86
0.0346
0.84
0.0587
Limit process: P-median
0.85
0.0352
0.86
0.0610
0.93
0.0347
0.95
0.0653
Posterior Intervals
0.87
0.0075
0.86
0.0140
Bootstrap: P-mean
0.84
0.0089
0.88
0.0167
Subsampling: P-mean
0.82
0.0085
0.83
0.0158
Limit process: P-mean
0.89
0.0085
0.82
0.0145
Limit process: P-median
0.86
0.0087
0.86
0.0150
0.89
0.0084
0.92
0.0157
Limit process:
MLE
n=400
Limit process:
MLE
Technical
This
addendum
Addendum, Part
includes the excluded material, such as omitted simpler calculations and simpler proofs,
and some background material on point processes. The addendum
MIT
I
be made available as a part of an
will
Economics Department Technical Report published by the Social Science Research Network.
Appendix D. Point Processes
The
following definitions are collected from Resnick (1987).
Definition D.l
(MP (E)).
Let
E be a locally compact topological
the Borel cr-algebra of subsets of E.
>
p(K) <
{Xi,i
is
point measure (p.m.)
K
oo, for any
compound
CE
compact, then p
otherwise. Let
outside a compact
A
said to be Radon.
is
MP (E) be the collection of
all
Radon
£ to be
space with a countable basis, and
pon (E,£)
is
a countable collection of points (called points of p), and any
1},
converges vaguely to p,
is
A
a measure of the following form:
set
p.m. p
is
A
G
£:
simple
for
= £V l(i, 6 A). If
if p{x) < 1
Vx 6 E, and
p(A)
MP (E)
C
point measures. Sequence {p n }
/ /dp„ — > / fdp for all functions / € Ck (E) [continuous, real-valued, and vanishing
Vague convergence induces vague topology on
P (E). The topological space
V (E)
if
M
set].
metrizable as complete separable metric space.
M
P (E)
M
denotes such metric space hereafter. Define Ai p (E)
to be cr-algebra generated by open sets.
Definition D.2 (Point Processes.
map
measurable
N
:
(CI,
the point process N(u;)
values in
M
P (E)
is
the
J7 P) —>
is
,
Convergence
(Mp (E) ,M P (E))
some point measure
same
as for
in
,
in Distribution.).
i.e.
for
MP (E).
any metric space,
cf.
A
point process
every elementary event
Weak convergence
w
6
ft,
in
M
P (E) is
of the point processes
Resnick (1987): we shall write
N„
a
the realization of
=$
N
N„
in
taking
MP (E)
—> Eph(N) for all continuous and bounded functions h mapping MP (E) to R. Note that if
if Eph(N n
N„ => N in MP (E), then JE f(x)dN n (x)—>d J f(x)dN(x) for any / 6 Ck(E) by continuous mapping
)
f:
theorem.
Definition D.3. (Poisson Point Process) The point process
intensity measure
(a) for
any
F
m
on (E,£),
= *) =
<
<
(Ft,i
k) are disjoint sets in £,
then (N(F,),i <
Appendix
k) are
E. Plausibility
This section illustrates the plausibility of conditions
private value auction
section: 02
similarly.
is
=
1
Assumptions
Yi,Xi
and
8\
model
=
in
if
< oo
ifm(F)=oo,
=
section 2.1.
independent random variables.
of
C0-C5
CO C5
using the Paarsch's (1992) independent
For clarity consider the example used
exp(/3'Z), with the following assumptions.
For example 62 can be
notationally
Al
a Poisson Point Process with mean
-W
--"MfOVH
{
if
is
€ £, and any non-negative integer k
P(N(F)
(b)
N
if
made a
in
the Monte-Carlo
More general examples can be
verified
function of regressors too, and verification proceeds similarly but
much more burdensome.
(in
addition to those listed in Section 2.1)
(Z,,mi) are
iid, /?
G B, a compact convex subset of W*.
1
X
6
X, a compact subset
of
Rd " +I
.
A2 Fx
A3
3
(x) does not
M
< m, <
Verification of
for
CO:
depend on
is
positive definite.
M < oo, where m,
some
Al
EZZ'
/3.
implies iid sampling
number
(non-degenerate)
is
of bidders (minus
and compactness and convexity
of
1).
parameter space stated
in
CO.
Under the stated parameterization
g
=
(A',/3)
-
exp {0 Z) (m
1) /
(m -
2)
and
9{x
'('••
",
'
A2
Note that 7
=
probability.
Since the f (y
and
j3
and
A3
imply that
— g (X,/3)
^r,m+1 He>o).
.
for
/3
^
/3'
g(X,@)
,
\X,/3) density function
is strictly
X
g(X,/3') for
some
monotone
g (X,/3), identification
^
in
with positive
holds.
Verification of
CI: Clearly the model
is
a one-sided model Cl.(ii) in which f(e\x,(3)
P(XJ)=m
is
strictly
for
is
each
A
=
and
M m+,= 9lxT)
Al
and A3.
C2: / (e|A,/3), as defined, is obviously upper semi-continuous at
/3, and is uniformly bounded by Al. Its first derivative in e:
in e
first partial
except at
e
=
and bounded uniformly
derivative of / (e\X,
«
=
0, is
maximized at
e
=
X
clearly continuous in
and
/3
in (e,
A, /?) by
Al
and A3.
/?) in
d
f{e\X,p) = Z
dp
is
q(x,/3)
and
continuous
The
and from above by
bounded away from
Verification of
[g(X
=
9(X,0)
m — (m + 1)
[e
and bounded uniformly
+ g(X,/3)]i
in (e, A,/3)
by
f(e\X,P)
Al
and A3. This holds
similarly
2
for
a
-*
epep rf(e\X,P)
2
d/Sd/3 7
f(e\X,0)
=
ZZ'f(e\X,j3)
I
[e
All of the
above quantities are continuous
dp
in
X
'i_V_ (m + 1)r
9(X,j3)
m — (m + 1)
+ g[X
for each e
f(y-g(X,P)\X,0) \=l(y>9(X,p))
m
2
i^L +
[e+g(X,p)]
and
/?.
Finally, for
g(X,pr
,,771
+
1
some constant
-9(X,P)Z
Verification of
Ex f\&{*-'&>fi\*' i*)\*s Bx
C3: This
is
verified
immediately by noting that:
^g(X,P) =
g(X,p)Z,
+
1)
jM
[e
C
<i{y>-C)—^,
which implies
sup
(ra
cy=x dy <
00.
+ g(X,0)Y
and
-g{X,p)=g(X,p)ZZ',
dpdp
due to Al. Also note that
dg(X;p)dg(X;p)
dp
dp
for
some
>
c
due to
Al and A3. By A2 EZZ'
Verification of
C4:
Verification of
Verification of
C5:
When
/
(Y,
C4
is
2
E
[g(X,j3) ZZ']
7 so that
is
cEZZ
is
not present.
positive definite.
is
a
not needed since parameter
the true parameter
>
e,
=
Y,
—
g(Xj,P), we have that
m—l^S-^-^l (Y, - g(X„p) > 0)
- g(X„0)\X,p) =
.
[e,+g(X„l3)r
Al and A3
Therefore by
sup
£\nf(Y-g(X„0)\X„p) = supl(Y,-g(X„P)>0)
(m+D
&
B
m — (m + 1)
g(x„p)
[t,+g(X„p)}\
z,
9ix
t,
<
+
rJK^
g{Xi,P)
C\
<
00,
and
9(x J)
^\nf(Y,-g(X,J)\X,J) = sapl{Yi-g(XiJ)>Q) {m + d
;
e,+g(X„p)
8
sup
3
g(x,J)
[e,+g{X„p)}\
(m+1)
Appendix
F.l.
In Proof of
Theorem
F.
zX
< Cz <
iZ
,
00.
Excluded Simple Derivations
added
3.1. These additional derivations are
at the request of a referee. Write
Tl
<?!»(*)= £>«(*) xl„(«)
(F.l)
t=i
where
At
for
=
l )n («)
first, for
[l(e,
a fixed
>{A„(X„u)/nV0}) +
2,
l(«,
< {A n (X„u)/n A
we use a Taylor expansion
of f,„(z) for each
()})]
i
.
and plugging
this back into expression
Q\n {z)
,1
Tl
dg(Xu p
^
*—
+^)
d
In/(
,
dt
5/3
A n (X„n, n
(
oa
+
1
n
I
op
,
V
2
~~
f
/ f oaoa
a a
n f—
=
,
1
1
e>
\
A,,/3
n
Xi po
+ u/n, Qo + v/y/nj
e,
2
,
V-=V"*
5— ln/(«i|A'i,7o) lin(u) +«'- V"
In/
*—* ^3
'—^
y/n
)
(«
+
"/rJ,Qo
,
\
+
n
V,„/VJ1
\X,,Po
t
+ uln /n,a +
v/y/n
)
/
l,n(u)
'I
l«n(«)
v,
l,„(u)
where «i„
and
a point on the
is
Vi n is
a point on the line
Thus, the terms are
and u that depend on u and
between
line
and v that depends on v and
between
Xi) (but not on other observations),
(e,,
(ty,
Xj) (but not on other observations).
the above summations.
iid in
and the bounded
Application of Chebyshev inequality, the bounded density condition Cl,
C3 removes
condition
Var
derivative
the term II, and adds o p (l) to the above expression. For example, consider
li n (u) in
<
-^f^ln/(ei|*,-, 7o)(l,-n(u)-l)
const
-f]E\li
n (u) {f/ff
*
—
*
•
1|
Tt.
2
< const(/7/)
-2-(/p')/7i
=
(F.2)
(l).
Similarly
£
iE£
*
1
=
ln
f^\ X»^ (!.»(«) -
1)
=
o(l).
1
Hence
Also elementary calculations, as
bounded density conditions
C2
Chebyshev
of
inequality,
C5
show that by the
«,„, Vi n
by
and
and ///, and add a term o p (l) to the whole expression. The
for /, //'
Markov LLN along with
application of
and application
(F.2),
and bounded derivative condition C3, we can replace
remove l,„(a) from the expression
and
in
allows to replace each of the terms with
its
limit expectation,
gives the required conclusion.
Proof of Independence needed in the Proof of Theorem 3.1. These additional derivations are
added at the request of a referee. The proof was omitted from the main text, because it follows essentially
F.2.
from the standard proof concerning the independence of minimal order
sample averages of general form
Vaart (1999).
It is
(partial
sum
statistics (extremal processes)
processes), see e.g. Resnick (1986)
and Lemma 21.19
in
and
van der
presented here for completeness.
c
{Q n (zj) ,j < I) is W„, we need to show that
where ignoring o p (l) term
Since up to o p (l) terms, the only stochastic element in
(Qn
("Uj) ,j
<
')
are asymptotically independent of
On(u)
For clarity
modate
Case
I:
>
I
/
=
we
1
(°
<
nti
< A (*')'«) +ln£
7$l 1
q(X,
present the proof for the case
first
n
=
1
=
P
when
by
similar to that in (F.2),
-^^^ln/(e
V
it
^Ya
p(Xi
By an argument
W
Therefore
ln
f
For notation sake, we do not index
1.
1.
=E
W„,
1
|X,,
I
=
1,
> "e >
'
A W«)
discuss the changes needed to accom-
7.
W —W
n
=
n
7o)l(e, > {A(X,)'u/n} V
op
or
(1),
where
a < {A(X,)'u/n}
1
show asymptotic independence between
suffices to
and
(°
W
n
and Q^
>
ne,
(u).
Define
71
Ql ( u ) =
^
[1
(0
<
nc,
< A{X,)'u) +
1
(0
> A(X,)'u)]
/\0)
and
CO
= J2
QZ,(u)
< * <&{Xi)'u) +
[!(°
limsupP IqI
To show
=
(u)
(F.3), proceed in
P [Ql
= k,Qi
(«)
(u)
J'f
>A
(#/)'«)]
and Q^
(«) follows
by the Portmanteau
W,
< y} <
-P
= *,<&
Lemma and
proving
k:
<
fc,q* (u)
two
>
W„
Then asymptotic independence between
that for any real x,y and integer
1(0
x,
steps. In
Step
1
{<?L
(«)
below, invoking
<x,W n <y}=P [Ql
(ti)
=
iid
(u)
sampling,
< x}
fc,Q* (u)
< x}
it is
P{W
<
y}
(F.3)
.
shown that
p{yiZiW„ <
(F.4)
y}
where
W„
where
?;,
X;
for
i
<n—
k axe
CLT
Step 2 applies
Theorem
3.1
to
=
Thus the proof
1.
P{QZ
> A(X,)'u/n V
{e,
n
—> d W.
and by the Portmanteau
lim sup
Step
W
show that
(u)
Define p n
= P {A(ci, Xi) c }. By
i.i.d.
M"«»,*i)<*
J2
;=n-i +
(u)
and
1
P\
e,
< A(X,)'u/« A
0}.
N => N implies similarly to the proof in
Finally, convergence
= k,Qdn
complete given Steps
n-k
or
Lemma that
is
(l)pUl-Pn)
(F.5)
draws from the distribution of ei,X, conditional on
i.i.d.
A(ei,Xi)
yjn-k E^/tel*.*).
f^ da
< x) <
PJQ^ (u) = k,Qt («)
< x}
.
2.
sampling
1
(0
<
nti
in
CO, the
left
< A(Xj)'u) +
1
hand side of
(0
>
ru,
(F.4) can be written as
> A(X,)'u) =
1, for i
= n- k + 1,.,...,71
]
= P{Q^M=k,Qi(u)<x}
XP
{^g£' n/(f
where function
Step
2.
W„—>d
/„ is
W
defined
follows
£W n ->0,
Var (w„)
(c)
Lindeberg's condition
By C2
E*
^
3.1
e,
> {A(Xj)'u/n} V
and
W„
or
e,
< {A(X,)'u/n} A
0, for
a lit <
n-
k
>,
in (F.5).
by checking three conditions:
(a)
EW„ =Vn^k
|X -' 70)
Theorem
in
(b)
Vor(W), and
-»
Condition (a) requires
'
is satisfied.
EW„ = Vn - kE-§^ In / (ii\Xi,~fo)
/>!(<,,
x,)£
ln
/
-> 0. This
is
true because
-Sx^(«., x «£ln/M^7o)/(*i|*,-,7o)<fei
W*.>7o) /( f ,|X, 7o)^, _
/=
= — vnfcP{A((„X,)}
.
)
P{>4( f „X,)}
limn-,00 P{A(e,,X,)}
=
-Ex f
JAUi.Xi)*
1.
In addition, for large
enough
n:
^-\nf(e,\X, no )f(u\X„ 1 o)de,<2U'/f)(9f)(b
oa
n
>
Hence
£W„
—
A
0.
>
similar calculation verifies
E— In/ (e,|X
Var(wn ) =JS^ln/(e,-|X„7o)^ln/(ei|X 7o)'1-,
^££ln/(e,|X ,7o)^ln/(e
I
The
step
final
|X
1
to verify the Lindeberg condition, that for
is
n-k
^£#l
ln
!
= Var(W„).
,7o)'
all
^
A
1)7o )
and
all
>NW)i( V^ln/^IXi.To)
f
>
0,
> &n-k\/n]
(F.6)
ln/(ei|Xi,7o)
(F.7)
where we define
2
^T, Var
°n-k
n — k
x
\
'i:
lQ
f(^^)
= X'Var
da
Since a\_ k converges to a positive constant by (b), the conclusion (F.6) follows from
^(A'^ln/( e-,|X,, 7o))
P{
which, in turn, follows from lim n _nx,
Case
II: I
>
J
^(V^ln/^pG^o))
4(e,, -X,)}
=
/ P{A(u,
<
oo;
and by C4.
1
This involves more tedious notations but follows the same
1.
t )}
logic.
W„-W, = ov (1),
Note that
where
W
Hence
teau
= ^y"#-ln/(€,|X,, 7o )l(e, >{A(X0'%/n}V0ore, < {^{X^'uj/n}
vn ~^ oa
n
it
show the asymptotic independence between (Q^ (uj) ,j<l) and
suffices to show that for any real Xj and y and any integers kj and k
suffices to
Lemma it
limsupP {qI (m) = k u Q dn
( Ul )
<
x u ...,Q pn (u,)
d
=
k,,Q n
< x h Qi =
(ui)
k,
A
W
n
.
for j
W„
<
for all j
0,
By
=
i)
•
the Portman-
1, ...,l
< y\
(F.8)
<P{Q PX («i) = fci.Qi («i)
<
x,, ...,(&,
=
(ti,)
fc,,Qi («i)
< *,,QU = *} -P{W <
y}
where
Q*n
=^2(l[0 <
ne,
< A (X,)' uj,
for
<
l]
+ 1 [0 >
some j <
l]
+ 1[0 >
some j
ne t
-
> A (X,)' u,,
for
some
j
<
I])
and
QU =
By
iid
^
sampling
fl
in
[0<
CO, the
P {QKu,) = huQiim)
„f
i
v;'«
da
J,
\i>f(e i
< A
left
(Xi)' uj, for
hand
< x U -,Q*(u
\X i ,jo)<y
€i
J\
> A
(AT/)' jy, for
some j <
I])
side of (F.8) (without limsup) equals
t )
= k h Qi(u
< xi,Q n = k)
(
t )
> {A(Xj)' uj/n] V
or
c,
<
{A(X,-)'tt//n}
A
0,
for all j
<
J,
for alii
< n -
w "- '}
- p {v^?
where
W, = -J=f^°=1
^
In
/
I
!
e,|X,,7o) and e,,X, are
i.i.d.
draws from the distribution of a,Xi
conditional on
A(u,Xi)
=
{e,
> {A(X,)'uj/n} V
or
6
u <
{A(A\)'u_,/ti}
A
0, for all j
<
I}.
fc
I,
Theorem
Similarly to the proof of
p{Q pn (ui) = ki,Qi
limsup
<p{Q^(ui) =
Finally, that lim n _,<x,
Case
I
(when
1
Proof of
F.3.
Chapter
5.
f{
^W
ZL
/
1
<
n
N
convergence
3.1,
<
(ui)
N
=>
QS (w) = fo.On
xi,...,
= P {VK <
<
(w/)
=
x,,Q|,
that
A:}
= fci,Q»(ui)<x,,QU =
*i,Q£,(ui)<zi,...,Q^(ui)
y}
Lemma
implies that by the Portmanteau
*}
y} follows similarly to the proof of (a)-(c)
Step 2 in
in
= 1).
Lemma
3.1. Claim 1
is
just a special case of
Theorem
Lehmann and
1.1 of
Casella (1998),
Claim 2 follows by an argument similar to that given by Ibragimov and Has'minskii (1981b)
p.93.
Let Z*
local
= H„' (7,.^,„ — 7n(<5)), where
parameter sequence
I(K)
It
I
follows
=
=
index 5 emphasizes the dependence of the distribution of Zt on the
Define
7n(<5).
In (K)
limsup In (K),
= -^— [ EP
from Fatou's lemma and conclusion (B.7) that I(K)
limsup^R,; I(K)
II(K)
=
By Lemma B.2 and Lemma
— 7n(<5)),
= 1{H~
where
(7
limsup //„(«"),
B.l
+
|K|
=
P ,x
EP
_
lo
p(Z ao )d&
any
/
K
=
by Fatou's lemma implies that II(K)
= rr^r [ E P
-
p{ z
=
0, for |/C|
E
'
K fK
.
Jk(
IIn (K) < In (K)
n
l
(Zn (K))d6.
{r,
-
^_~ ^..
8))
sup{|z|
:
2 6
EP
K}
for
dr,.
p(Z„(K))
and
infinitely
now
follows.
<5
e
risk-efficiency of the
no,
p(Z s (K))d5
for each n,
J
E ",o
risk.
Bayes estimator
S
I.
= H~
Define Z£
wW fK EP
Proof of limsup KTR<i //(X) =
III
~j
P ,>.
hence IJ(K)
K
,
n
< I(K) =
I.
for
any
8
6
Md
as
Indeed, suppose there exists an estimator sequence {7,1} that achieves a
asymptotic average
many n >
d
and n < 00, = E P p(Z s (K)); which
any
Then limsup KTR j H{K) — I follows from (a) II(K) < I for each K, (b) noting that
K f R Z (A") — p Z, and (c) dominated convergence theorem, as shown below.
(2)
Thus
<eKd
finite-sample average risk efficiency of the Bayes estimator
The claim
p(Z).
Bayes estimator defined with respect to
n is the
,
provides the necessary uniform integrability to conclude lim n _Kx,
strictly lower
E P^
=
K}. Define
//„(*)
axg inf
<$}
~/
fo) 6
follows that for
it
P7n(6) {Z*(iir) >
—
1
Zns (K)-^ d Z 5 (K) =
By
JK
-^rj^.
= E P^ o p{Z).
Next let Z^(-K) = H„' (7p.a k ,t.
the loss function p and prior A«(7)
The property -
=
p(Zns )d6.
~t P .\
+
(yn
— 7n(<5)),
then
it
must be that
for
some K, no,
p{Z n )d8 < II„(K), which contradicts to finite-sample average
K
,„
for
each such n.
Rewrite II(K)
[p(Z (K))-p{Z)]
l
s
< I(K)
as
dS/^K)-J E Pno
JK
E P^ o
s
[p(Z s (K))
-
p(Z)]
d5/X(K) <
[p{Z {K))-p(Z)]~ d8/\{K)
<
0.
or
(F.9)
Next as r(K) —> oo
f
JK
E Plo
s
\p(Z (K))
-
p(Z)] ' dS/X(K)
l
where r(K) denotes the width of the cube
follows by (b)
for
But
any
,
77
(c),
6
(F.9)- (F.ll)
Thus JI(K) - J
-+
= [
EPlQ \p{Z^ K \K)) J (-1,1)*
K
(which
and the domination (uniform
(0, l)
d
as
is
it
#tR
d
-
must be that JK
~
dr,
-> 0,
(F.10)
J
assumed to be centered at
zero).
Conclusion (F.10)
integrability) condition:
and any K, \p(ZVT{K) (K)) - p(Z)Y < p(Z), where
imply that
p(Z)]
L
J
EP^
o
[p(Z*(K))
E
Pl(j
p(Z)
- p(Z)] + d5/X(K)
->
<
00.
(F.ll)
as r(K) -> 00.
Addendum, Part
Technical
This addendum includes the material on the
The addendum
the statement of the result.
Working Paper published by the
Maximum
II:
Likelihood Estimation
maximum likelihood
The main
estimation.
text only contains
MIT Economics Department
be made available as part of a
will
Social Science Research Network.
Appendix G. Maximum Likelihood Procedures.
The Maximum
Likelihood Estimator
7
(MLE)
=
given by
is
=
(&',&')'
arginf
- L n (f).
765
We
obtain various properties of the
and -Jn
for the
parameter a), and
one for smooth likelihood analysis.
may be
used for inference
variable
Z e and
in
MLE
its
such as consistency, rates of convergence (n for the parameter
asymptotic distribution. The limit distribution
The asymptotic
/3 is
is
j3
the standard
an extreme type distribution, which
the same way as any standard distribution. For example, denoting the limit
the parameter sequence 7n(<5)
=
7o
+ Hn 5,
n0 - /?„(<*))->*
Zi =
for
distribution for
a
for
any continuously differentiate functions r of
Z*
,
/?,
«(rW-r(ft))^|r(W'Z'.
Then, quantiles of distribution of ^r(/?o)'
'
£sr((3)'ZP
s can be estimated by simulating a series of draws of
according to the formulas given in this paper, and then used for classical inference and hy-
,
(Alternatively, parametric bootstrap
pothesis testing.
Z°
Q-quantile of §gr(f5)'
an asymptotic
,
90%
may be
limit distribution
median
On
bias,
is
also useful for bias correction.
simply take
r(ft)
—
loss functions)
given by
For example, to remove the (first-order) asymptotic
£~-
the other hand, inference about
symmetric
is
kA-S ?}
'rcft-321,
The
For example, denoting by c(t) the
used).
confidence interval
a
is
The
standard.
under the parameter sequence
limit distribution of either
7n(<5)
=
70
+
Hn 5
a
Zl = V^(a - a„(<5)H d Z ±N{Q,J~
where
is
J = —E
in fact
o*
a
,
ln/(Yi —g{Xi,j3)\Xi,p,a)
Z
independent from
,
20
This intuition
(so that
is
points,
e.g.
statistics
or
BE's
(for
),
the usual information matrix for a.
/9
The
limit variable
Za
coming from a small fraction
and information about a coming from the whole sample
impact of the small fraction
based on
dence of minimal order
jump
MLE
given by
l
which follows from the information about
of the whole sample located near the
and averaged over
is
is
is
negligible).
20
Resnick (1986) and van der Vaart (1999)'s
and sample averages.
9
Lemma
21.19 concerning the indepen-
The
combined with the
usual estimates of the information matrix can be used for inference. This result,
one above, can also be used for inference about functions
-
>/n(r08, a)
r(/3
,a
=
))
dr
^
ao)
'
^0 -
a) of both
r(fi,
fa)
+
dr{
^
ao)
and a using the delta-method:
/?
'
^i(& - Qo ) + o,(l/VS)
&dr(g2i aoy za
+
oa
Also, in this case
it is
possible to use the second order expansion,
better capture the estimation uncertainty,
/ZMA A\ r lR ,ao))«
n ^ i.
v^(r(^a)-r(/?
In
many
/?,
and the
dr(p ,a
a
Z
+—-—^—Z
Y Z B + gK^op) Z
-j*
'
'
1
d
2
situations, such as the previous auction example, the functions of
regular parameter
a
is
does not vanish, to
i.e.
9q
Q&
— /?o)
from which ($
r(/3o,a
,-
„„
)
+0p (l/^).
prime interest depend only on
not present. Quantiles of the above distributional approximation can be
obtained by simulation, which consists of making draws of the variables
Z
and
Za
independently of each
,
other, evaluating the above expressions (with derivatives of function r evaluated at the estimates),
taking the appropriate quantiles of the simulated
Wald
intervals
and hypothesis
Theorem G.l
mum
in
R
Particularly,
then
MLE). Under C0-C5, and
Zn = Op (l)
Z°-> d Z a =
The
resulting quantiles can be used for classical
testing.
(Properties of
a.s.,
series.
and then
supposing that
— too(z)
attains a unique mini-
and
J-'W = A/"(0, J--1
),
Z%-> d
Z =
argmin^ cR
^
-
and
£>oo(u),
Ze
and Z" are
independent.
This result states the consistency, the rates of convergence, and the limit distribution of the MLE. The
limit
tion
is
given in form of argmin of a limit likelihood.
Due
to asymptotic independence of the informa-
about the shape parameter from the information about the location parameter, the MLE's
for these
parameters are asymptotically independent.
Remark
G.l (Boundary Models).
y/n(a
and by
In the
- a n (6))-+ d argsup
(Wv - v'jv/2) = J"
u
limit distribution of
MLE
is
— exp(u'm)
(
= argsup
1
W = ^(O.J^
such that
J,
> A(Xi)'u, for
such that
J,
> A(Xi)
u'm
f
explicit:
1
),
point algorithms,
cf.
i
>
lj,
u, for all
i
>
11.
1
'
thus convenient in the boundary models and can be simulated by solving
OLS
through the use of
is
much
less practical.
(Uniqueness). The condition that — loo(z) attains a unique minimum
does not appear to be problematic. The invertibility of the information matrix
Za
,
interior
Portnoy and Koenker (1997). In contrast, bootstrapping requires repeated solutions of
a nonlinear programming problem and
Remark G.2
all
^
an Li-linear programming problem, which can be done at the speeds of
ness of
be made more
limit result can
(3.7)
n(P — Pn(i))—>d arginf
The
boundary models the
and the solutions of
linear
programs
like
J
a.s.
is
necessary and
guarantees the unique-
those above are unique under mild conditions, which
10
guarantees uniqueness of Z®
A (^^'s
when
E.g.
.
Portnoy (1991)
see e.g.
support does not concentrate on a proper linear subspace
component and the variables
of lower dimension, has an absolutely continuous
corresponds to the result of Donald and Paarsch (1993a)
G.l.
Epi-graphical Convergence. Epi-convergence
and Wets (1998), among
(1986), Rockafellar
x.
£ be
Let
Qn
lower semi-continuous
is
the space of
1-sc
(1-sc)
functions /
and any
,
n,
real
p{
..-,
rk
(1)
(2)
P[inf Ig K,i< a Qn(x)
inf ieK ,x<a Q-n(x)
The
following
Lemma
i.
ii.
iii.
<
and
Zn
Zoo
be
>
Q(x)
inf
rk
oo.
.R]
,
...,
Rk
in
R
with open
)
Q„(i) > n,
...,
inf
Q n (x) >
rk
Q n {x) > n
...,
inf
Q n {x) >
rk
\
(G.l)
,
Q(x)>r,,...,
Q{x)>rk
inf
1
f
\.
simply by lower-semi-continuity. Epi-convergence
Q
Q n {x),
It is also
n (x)].
Thus
if
Q n (Z„) <
eR j
a weak condition that
inf l£ K
Qn(x) <
a]
=
one can characterize the joint distribution of the terms
one obtains the
inf.
is
an evident condition, since P[arg
limit distribution of argmin.
given in Knight (2000), Pflug (1995) Salinetti
s.t.
^
Q n (z) + e n
,
e„
\
0,
and Wets
(1986),
among
others.
and suppose
=O p (l),
=
Qn()
arginf ieR jQoo(z)
13
uniquely defined
m TRd
a.s.,
and
epi-converges in distribution to Qoo(),
Zn —>i
Epi-convergence
is
Zoo.
more general than uniform convergence, because
In our case, the non-vanishing discontinuities
impossible.
such that /
are
Qn(x3 ), Vz, Vij —
-,
inf
of argmins.
inf l€ K>ga
is
liminf Ij._n
any closed rectangles
for
f
in{ X £K tT .£ a
lemma
G.l. Let Z„
then
ities.
(2) is
if
-t-oo]
<
f
< P\
weak convergence
inf
limsupP-^ inf
(3)
leads to
work on
-1
f
< liminfP^
Note that the inequality
C
in
in the
:
Q(x)>r u...,
inf
<
Q
each n, Qn{x)
[— oo,
>
said to epi-converge in distribution to
interiors R*\, ...,R k
has been developed
Knight (2000), Pflug (1995), Salinetti and Wets
cf.
for
is
Rd — R =
:
that uniqueness holds in that case too.
Suppose that the sequence of objectives {Qn}
others.
functions, that
who show
in distribution
the stochastic approximation of optimization problems,
random
are absolutely continuous,
J,
a related problem. Also when A(/f,)'s have discrete support the limit result
for
The recent remarkable work
ing epi-convergence, which
amount
stochastic equisemicontinuity.
of
it
allows for non-vanishing discontinu-
make the uniform convergence
Knight (2000) provides convenient
of the likelihood function
sufficient conditions for verify-
to converting the finite-dimensional limits to epi-limits via a device called
The work extends
version of stochastic equisemicontinuity a.s.
We
Salinetti
shall
and Wets (1986) by defining an
"in probability"
prove epi-convergence directly, albeit borrowing the
general structure of the proof from Knight (2000). In fact, part of the proof replicates the proof of
2 of Knight (2000).
Theorem
Proof of Theorem G.l.
G.2.
measurable by Proposition 3.2
Second,
we
use
Lemma
First,
in
Theorem
is
arg sup tn(z)
the rescaled parameter space ^/n(A
Zn ->d Z =
Lemma
defined in the proof of
is
may
G.l
Theorem
are
By
definition
— a n (S)) x n(B —
/? n (<5)),
Q n (z)
defined in the proof of
is
= arg
inf -<2oo(z),
3.1.
be verified by checking three conditions:
(a)
epi-convergence in distribution of
Zn =
(c)
uniqueness of Z.
—Qn
to its finite-dimensional limit
— Qoo,
0„(1), and
(c) is
of the proof
— Qn(z),
-Q n {z),
arg inf
arg sup £oo(z)
(b)
Conditions
=
inf ze /c
C0-C3.
be proved that
3.1. It will
where Qao(z)
and other variables such as
(1988), given
G.l on epi-convergence to prove the result.
Zn =
where U„
MLE
note that the
Dupacova and Wets
is
assumed. Condition (b)
is
shown below.
borrowed from Knight (2000) 's proof of
two types of modulus
of continuity
by a strategy that
more
It is
his
is
Theorem
difficult to
2.
The
prove
The
(a).
specifics are
general idea
based on bounding
similar to the one in Ibragimov
and Has'minskii
(1981a).
We
Definition of epi-convergence in (G.l) consists of parts (l)-(3).
almost identically, and part
what
follows
we do not index
P
K
Thus, to verify
inf
z£R]
-Qn(z) >
(1) in (G.l)
it
Ri,...,
suffices to
result clearly, first
{ inf
Denote
R = R^
x R"
.
Rk, write
n,..., inf
z€.Rk
limsupP{u7<t {
To explain the
inf
=
-Qn(z) > r k \
1
- p{u,<
J
I.
fc
{ inf
z£Rj
-Q n (z) <
-Q„(z) < rj}\ <
bound the probability
-Q n (z) <r} =
}\.
)
P\uj<k {
inf
-Qoo(z)
<
rj}\.
(G.2)
of the event
{ inf (-Ql(z)
- Q dn {u)) <
r\.
Define two sets of grid-points as follows.
distance between the adjacent points
Lemma G.2 where
proof of Lemma G.2.
rj
show
Consider the grid of equidistant points {v s } and {u m } inside the rectangles
of
in
by 7n(5).
Given a collection of rectangles
Pi
verify part (1) only, part (3) follows
(by definition of lower-semi-continuity.) For notation sake,
(2) holds trivially
and
let
is
at
most
tp.
Also cover
R
by the
Ra
Ukj denote a carefully chosen point inside the cover
Next, define collection of points {zi} as the Cartesian product {v s } x ({w m }
grid points has the property that the nearest grid-points are at
of {zi} will be used to approximate the inf z6 fl
used to approximate the behavior of inf ueH ^
— Q cn (z) by
—Qn( u ) by
12
most
R
such that sup-
U
proof
set Vjy, as defined in
the
{«*j})- This collection of
apart from each other.
— Q cn (z). The
— Q„(u).
}
inf^j.,}
inf u6 { ut
<p
and
sets V,*, as defined in the
The
collection
collection of {ukj} will
be
Then
\mf -Q n (z)<r\c\\
A
where
is
-Q n {z) +
c
inf
w Q ^(R,<p) and Ld (R,<p)
part
=
SU P
Il,^2e'?i|z ]
^(fl,rf =
The modulus WQ^(R,(p)
Skorohod-type modulus,
is
(
G .3)
complement, that
is
Q cn
and discontinuous
=
is
sup{|z|
:
Qn(22)l.
-Qi(u)\.
inf
c
Qn
whether the infimum of the step function
G.3 bounds the probability of
<
c\
'
\R\
-<#(«)<
inf
l
-
l<3n(2l)
-Z2l<V
The modulus
.
£ Q j(R,<p) is a
coincides with the mini-
— Qi(z)
over a finite set of grid points {zi}-
P{w Q n (R,ip) >
of (G.5)
is its
,
0}
a standard measure of equicontinuity of
it tells
— Qn(z) computed
Lemma
where
=
<p)
a}\ U A c
are the moduli of continuity of the continuous part
WQ'„(R><P)
of
i Qi (R,
{
respectively:
Qi,
mum
e,
n
Ac
the event that the finite-dimensional approximation "works" and
A = {» g ; (R, <p) <
where
-Q dn (u) < r + e\
inf
Ac
:
l
const -\R\ -£~ <p,
z e R}. For any e
>
P{( Q d (R,¥>) =
and given R, we can pick
<
l}
ip(e)
const
\R\
(G.5)
tp,
small enough such that the rhs
smaller than e/2.
Hence
P{ inf-Q„(z) <r\<p{
and by Theorem
3.1
and the Portmanteu
limsupPJinf
-Q n {z) <
>
is
r) <
p{
UfP) C R,
for
it
is
Finally,
inf
inf
-Q n (z) <
-Q„(z)
it
remains to establish
of
MLE
-Q K (j)
<
r
+
e}
+
<
r
+
c\
+e
e.
r\ <
p\
-Q x (z) < A.
inf
<r
;
}| < P{u* = i{
inf
-Qoo{z) <
r,
1
+ e}\ +
e.
Zn = Op (l).
— ln/(V, —
g(Xi, /3)|X,,7)
sup^EPl sup T
>-»
First, the
MLE
7
is
consistent by a generalization of Wald's
- Theorem 3.3 of Artstein and Wets (1995) - which requires
(b) the domination:
7
inf
-Q?,(z)
inf
arbitrary, the required conclusion (G.2) follows.
Theorem on Consistency
(a)
c
follows that
limsupP{u* =1 {
Since e
-Q n (z) +
inf
arbitrary
limsupPJ
Thus
-Q dn (u) < r + e\+e,
inf
Lemma
< p{
Since £
-Q cn (z) +
inf
is a.s.
,
ln/(K,
(c)
identification (by CO),
(d)
compact parameter space (by CO).
lower semi-continuous (which
-
g(Xi,P')\Xi,*j')
13
is
true
by
< In/ < +oo, which
C2
is
and C3),
true by C2,
wp —>
Consistency implies
Zn
1
Sn =
€
{z
given in
Lemma
C.2.
These
and Has'minskii (1981a)
Lemma
inequalities imply,
Lemma
[see
p{
e n (z)
,
\
N
Lemma G.2 (Lipschitz Continuity of Q n
Cn > such that for all n > no and some large
).
which aJlows the
-> 0;
<CN A~
\
N
like
A>
and any
Ov >
where
,
that on
p.
265
in
Ibragimov
N>
0.
Zn = Op (l).
follows that
it
some e„
e n } for
by a standard argument
>A~
c
\Ql(zi) -<?.»(*i)|
<
£n, \u\/n
G.3, along with the exponential inequality for El]/ (z)
G.4], that for sufficiently large
sup
< C N A~ N and
Hence P{\Zn > A}
<
|u|/%A*
:
use of inequalities (G.6) and (G.7) proved in
Under C0-C5, for
a small 8
>
0, there is a
random
variable
no
< Cn \zi - z2 \{\zi |+1),
E P^Cn
sup
<oo,
i>no,7efl<(7o)
uniformly over
all \z\
—
zi\
<
1
in the set
Sn =
{z
lul/v^
:
<
£n, |«|/"
<
Lemma
G.3 (Bounding Moduli of Continuity). Under C0-C5,
sufficiently large, for some 5 > 0, and any bounded rectangle R C Sn :
1.
For
all sufficiently
small
<p
>
and
for
tuftere
n
all
:
£„ -4 0.
N
>
no, where no
<
const
•
\R\
V.
e
(G
v
76B4<-ro)
2.
For
all sufficiently
small
ip
\R\
Lemma
=
sup{|z|
G.4
:
>
P7 (£
z G fl}, and
P7 {
uniformly in 7 €
G.3.
Proof of
tziQ^
,i
.8,5
(70) for
Lemma
sup
some 8
e n (z)
>
(P,
= 1) < const
tp)
|i?|
for sufficiently large
> A~
N
\
< C N A~
N
,
A>
and any
where
CN >
N>
0.
0.
G.2(short proof). The
Lemma G.2
is
needed
for the
MLE
follows by a standard empirical process argument, noting that the object of interest
average and that
Proof of Lemma G.3. The
is
first
part follows by the
p. 262 in
an
Markov
inequality
and
Lemma
G.2.
The second
Ibragimov and Has'minskii (1981a).
[Covering Sets.] For a hyper-cube
R = Ra
collection of (possibly overlapping) subsets {Vkj} of
R
1.
There are
J{4>)
<
const
(l/<f>)
dB such cubes.
14
Rd "
R C Rd ",
x
R8
8
as follows. First cover the support of vector
,
where R" C
by the minimal number of closed equal-sized cubes {Xj,j,j <
<
is
proven below. In the one-dimensional case with no covariates, the argument essentially reduces to
the proof given on
<p
a function that
document.]
G.4.
to
result
,
part
(a)
The
Cn (ei,Zi Z2)|zi — £2|||zi| + 1|, and finally applying a maximal moment inequality
Cn (u,zi,Z2), specifically Lemma 19.34 in van der Vaart (1999). [Details are given in the
expressions of the from
last section of this
is
part only.
a spline type object. The result then follows by the Taylor-like expansion and obtaining
is
to the coefficients
G^
/
vs,
() and £ Q d () are defined in the proof of Theorem G.l.
Bound). Under C0-C5,
(Tail
61'
'
sup
where
is
>
e
P7 (w Q ^(R,tp) > e)
su
sup
£«},
J((f>)}
and
construct a
A(X)
with the side-length of the cube equal
Recall that
dA n (X,u) _
dg{x,P)
du
Note
en
u
also that uniformly in
—>
0,
Rs
in
and uniformly
In particular, choose no such that for
du
n > no
all
for
"^""^
Uuefl"
of
R
any given x and any
—
E n } for
some given sequence
1 '"'
K"
is
K" =
a.s.
(j>
2 " cubes of the
independent of
<f>
as being equal A(x) and independent from
Q^
7
A(X)| <
,
and
is finite
<j>)
and given that A(x) G X^, ; we have that
,
belongs at most to
only need that
(no depends on
du
I
aA
<
\\u\\/n
:
= A(X) + o(l).
dA n (X,u
I
(We
\0=e o +u/n
Rr C {u
in
we have
dA n (X,u)
Thus,
\
d/3
form X^j/ that are adjacent to X^
R
and
Thus,
).
in
what
follows
it is
.
(G.8)
helpful to think
u.
Construct the (overlapping) sets 21
{vk] ,k = -m,...,m,j =
de
l,...,J{<l>)}cR
such that
where
ip
Vkj
=
>
and
u G
-j
R
"
vk
'
—
A n (x,u) <
<
ip
=
vk
Since the range of |A n (X, u)\
—
brackets of the form [v k
ip,v k
is
fci^,
bounded
+ <p]
where
for k
at
Hence the
<p.
most polynomially
number L
total
\R
in
\
and
in 1/V-
Next, construct the "centers" u k j in
n > no and
for all
G {— m,
by
a.s.
m<
oc
p||i?^|| for all n,
</l\R
all
x
s.t.
A(x) G X^.j
k
...,0,...m}.
const \R g \/<p,
<t>
for all small
+ ip
t^
\R S
\
we can cover the range by
=
sup{|u|
:
u G
R e }.
2m
-I-
1
Choose
e
(G.9)
\
of covering sets Vkj
L < (2m +
bounded as
is
1)
•
J(<t>)
and grows
22
Vk j n R s
so that for
all
n > no
23
<Skj+V, Vi:A(i)eX^
hj < A„(x,u tj-)
(G.10)
where
5,.-
We
will
satisfy
A„(x,u) where
inf
need that n
<
:
77
<<
as n oc
2
<p
A(x) and hence
and
of
The covering
22
Note also that
sets
Vk j
Vk j
-
+
R&
[v k
dA„(i,u)/du
n > no- Hence u G
for all
for
<t>
at
i?
and x
:
small relative to
A(x) 6 X^
tp.
•
4>
when x
:
.-.
Moreover,
set sufficiently small as well. Setting
most to const
can be thought of as "approximate linear subspaces" of
clearly cover
brackets of the form
Note that
to have
dA n (x,Uk,)ldu
is sufficiently
PI
<f>
in
order to
small restricts the
A(x) G X^j. Thus, we choose
as stated in (G.9).
21
23
taken over u 6 Vkj
inf is
that n
i.e.
if,
we need
the constraint in (G.10),
variation of
77
=
Vk j
in
<P,v k
<p]
for
the interior of
K
<*.
n > no, because given u we have A n (x,u) belongs to at least two different
2
all n > no, and A(x) G
away from
for some X^j that is at most
for
X^
Vk j
R?
for
,
some k and
it is
j.
the case that 6^,
15
=
vk —
<p;
but otherwise, this
is
not the case.
(b) [Characterization of Break-Points] Recall that
n
Qt{u)=Y.
•n
l(Xi
i
7^{l(0<ne, <
A n (X
i7
r
+^
u))
\n^\l(0>n
>A n (X„u))
(A,)
ei
q
1=1
We
next examine the nature of the discontinuities of Qt(u) by
examining those of Qi. + (u) and then
first
those of Qi~(u).
Suppose we have nti
=
A„(A'j,u) for some u £ Vkj and A(A";) 6 X^,j, then the pair (ne,, Xi)
induce a break-point in the set Vkj and in the bracket
Given that this
is
— Q^ + (u)
-Qt
piecewise-constant
is
—
ip,
+
vk
the only pair that induces a break-point in Vkj
inf
since
\v k
Thus, what we need
# -Q n
(u kj )
only
and can only jump up
as follows.
is
than one break-point happens
d+
+ (u)
we need
First,
if
net
if
follows that
£
[5 kj
,
+ n]
6 kj
the index A„(A',-,
increases.
v.)
to control the probability of the event that
any of the brackets of the form
in
it
—
[v k
+
ip,v k
for
tp]
< m.
|fc|
included in the event that the errors net are not separated in non-overlapping brackets, which
A\(R)
=
Ujt<77i{ there
are nei,nei' e [v k
said to
is
which A„(X,, u) belongs. 24
to
<p]
— <p,y k + <p]}- Second, we need to control
— <p,v k + tp], they do not fall into the
nei that are separated into the brackets \v k
the probability that for
"bad subset"
is
the event
is
\5 k
-,
Sk
A(Xj) G X^j. Formally, conditionally on the complement of A\(R),
of such brackets, given that
more
This
•
all
+
i.e.
rj\
on
A\(R) define the event A2 (R) as the union of
A 2i ,kj(R) =
across
i
<
n,
\k\
< m,j <
[nei 6
[S kj
,S kj
+
f?]|ne.
6
[v k
- f,v k +
ip],A{Xi) 6
X^-.u 6
Vk}
j
J(4>).
To begin,
n
71
p{a
1
(r)]
< J2
J2
x^ p { ne " ne
e
'
fc*
-
*>**
+ fi} ^ 2m +
(
*)
<2
2
-^ ^
const fl ivi
|k|<m i'=l:i'£i i=l
Denote the
bracket
at
[v k
most 3
in (G.8)).
•
number
total
of
rce,
that
fall
into brackets of the
form
[v k
— ip, v k + tp] by Afn
— <p, v k + <p] overlaps with at most two other brackets and (ii) (G.8)
K" jV„ neighborhoods of the form Vkj in which the break-point may
holds for
.
Because
n>
occur (where
,
Al(R)} <
3
N
K'
n
p{A 2i
sup
,j,
k
(R)\
<
3
•
K'
M
n
(///)
i<»,l*l<mj'<J»)
F{M,} < nEl{\nu\ <
g'\\R\\)
<
2g'\\R\\f,
p{a 2 (R)\AUR)}
Hence
K"
is
(since n oc
P{
<
const
\R\(nM.
2
<p
)
Ukj
{Jnf -Qt+(u)
# -Q* + («,-»)}} < P{A2 (J?)|A;(iJ)} + p{^(fi)}
<
The terminology
"break-points"
is
borrowed from the
16
const
linear
\R\(r)/tp
programming
+ ip) <
const
literature.
\R\<p.
any
defined
Then
p\A2 (R)\tfn
Since
(i)
no, there are
fo/(2p)).
Therefore, conclude that
J
Likewise,
it
= p\ inf -Q dn + {u) <
UtH"
+ {u
<
const
\R\<p.
-Q dn {u kj )\<
const
\R\<p.
-Qi
inf
Ukj}
kj
)
}
>
follows that for a finite collection of grid points {Ukj}
II
= P{
-Qi~(u)<
inf
inf
Finally,
P\
inf
'•u£R0
G.5.
-Q dn {u) <
Proof of Lemma G.4. This
(1981a).
We
need to establish for
uses the
Cn
suffice to
method
sup
en
of the proof on p. 265-266 of
A>
and any
1
(z)>A-"\
N
for sufficiently large
<CN A~ N
{GU)
.
=
{z
:
<
t
\z\
<
t
+
1}.
It will
>A
t
PJ
Ibragimov and Has'minskii
>
denotes a generic constant that only depends on N. Let R(t)
show that
|#|¥>.
>
sufficiently large
PJ
where
-Q*(u)) < J + JJ < const
inf
^H^kj.^kj}
en(z)>i-
sup
N }<C i~ N
N
(G.12)
,
since then
oo
PJ
*•
sup
£ n (z)
z£S„:\z\>A
> A~"\ < VP-rj
L
Next cover R(t) by grid-points
pJ
^ :
{zi} in the
N
tn(z)>r \<Py\
sup
f^
'
L
>
eR(t)ns n
way
£„(z)> (i
sup
+ f)"""
defined in the proof of
sup £n (z)>r
N
}
,v
.
Theorem
G.l. It follows that
+ pJw QCJR(t), <p) >\\n(r N /2)\UZ Q * (R(t),<p) =
/2}
J
*e{*i}
<CnA"
1
J
:6S(/l+t)nS n
v_!;
2
//
where ujqc and £
The number
i?(t)}
= t + 1,
,i
are the moduli of continuity defined in (G.4).
of partition points {zj}
that
is
L <
const
By Lemma C.2 and Markov
(2
noting that \R(t)\
Il<
Select
v?
=
<"
2Ar
"\
l)"^
inequality,
I<
By Lemma G.3
+
bounded by L < const
is
const
= +
i
const
-
",
where
1
<
k.
<
(\R(t)\/<p)
oo (n
is
K
,
= sup{|z| z
proof of Lemma G.2).
where
given in the
Py {( n (zi) > <~'v /2} < const exp(— b\t\).
V"" t'v -e" 6 " 6>0
(t + 1)
,v
t
1
•
•
.
\R(t)\
Hence
:
for
t
6
>A
(G.13)
1
(<
+
1)|
ln(r
w
/2)|"V+
const
(t
+
1)^.
then (G.12) immediately follows from (G.13) and (G.14).
17
l\,
i,
'
'
(G.14)
G.6.
Proof of
Lemma
We
G.2(Detailed proof).
have
n
c
Q n {z)=^fin{z)
>{A„(X,,M)/n}V0) +
x[l(ej
< {A n (X„u)/n} A
l(e i
0)]
1=1
+ ^( f "'( z ) -
<
r '"( z )) x l^
^ A„(Xi,u)/n) +
e<
1(0
>
et
> A„(Xi,u)/n)]
i=i
"
'
«
Recall that
Q| n ( z ) =
Intuitively,
note that the functions of interest are
the one-sided models.
in
assumptions
tion, given the differentiability
Under C1-C5,
>
a small 5
for
0,
there
all
Lipschitz-smooth (spline-type) objects by construc-
CI- C3. Thus,
in
it is
a random variable
is
reasonable to expect the final result:
Cn >
such that for
all
n > no and some
large no
\<&M-QU*2)\<Cn\Zl-Z2\QZl\ + l),
EP_,Cn
SUp
<OC,
7i>no,76Bj(7o)
uniformly over
—
all \zi
z2
\
The proof is tedious but
the
maximal
Split
Sn =
<
<
—>
<
1 in
it
does have a very simple structure. Given some careful Taylor-type expansions,
the set
{z
:
\v\/*Jn
e„, \u\/n
e„}, where e n
0.
inequalities will be applied to the coefficients of those expansions to obtain the required result.
Q c n {zi) —
Qi„(z 2 )
\
into
two terms
n
7
= Y, f
'
n (*i)l(ei
> {A„ (Xi,B,)/n}V0) -
r, n
(z 2 ) 1 (e t
> {A n (X„u 2 ) /n] V
0)
< {A„ (X„ Ui )/n} A 0) -
f, n
(z 2 ) 1
< {A n
0)
1=1
n
II
= Y^f,n
(zi)
1 (e,
(a
(X,,u 2 ) /n} A
i=i
We focus
on term
I,
and only
briefly indicate
the differences for term
h -h <
I
II.
The term
I
< h + h,
where
/,
=J^l(ei >{A„(I„u
1
)/t! }V{A„(X1 ,i12 )/ji}v0)
i=i
x (In/
-
In/
— A n (Xi,u\) ln\Xi\po + ui/n,ao + v\/y/n)
(e;
(e,
- A„
+ u 2 /n, ao + v2 /y/n))
(X,,u 2 ) /n\Xi;
n
72
= ^l(0<e
t
e [A n (Xi,ui) /n,
A n (X u u 2
)
/n])
i=i
x
max
j
where
[a, 6]
denotes {i
:
a
= i.2
I
in
/
( ei
- A„
— —
(X,,uj) /n\Xi;p
<i<
6 or
i)
<
i
<
+ Uj/n,a + Vj/\/n)
—. r—
r
/(e,|X,;/3 ,ao)
I
I
a}.
Analogously approximate the term 77 as follows:
77,
- 772 < 77 <
18
7/i
+
772
I
,
can bounded as
where
Tl
=^
IIj
x (In/
-In/
- A„
(e,
(e,
< {A„(A",,w,)/n} A{A„(A'„b 2 )/ii}AO)
1(e '
- An
(A,, ui)/7j|A;;/3
(Xi,Ut) /n\Xi;/3
+
+
Ui/n,a +Vi/Vn)
+
u 2 /n,a
v 2 /%fn))
n
=
J/ 2
][] 1(0
>
e,
e [A n
(A„u,)/n,A n (X,,U2)/n])
i=i
x
max
/(e,
I.
- A„
where
Part
[a,
I.
fc]
denotes
{x:a<x<b
Bounds on Terms
I\
or
and
+ Uj/n,a +
(Aj.Mjj/nlA.jjgo
—; —— /(e,|A,;/3o,ao)
in
j=J.»l
Vj/-fn)
\
I
6<i< a}.
II\.
Term
7i
equals by Taylor expansion
n
^
1 (f,
> {A„
(A, lUl )/n}
V {A„ (X„u 2 )/n} V
0)
i=i
x
—
ou
ln/(e, -
A„
(A,,u*)/n|A,;/3
+
u"/n,a
+ v* j-Jri)
(u\
- u 2 )/n +
In
$S(e,- > {A n (X„ui)/n}V{A n (X ,u 2 )/n}V0)-^\nf((,\X,, lo )(v -«s)/Vnda
1
l
i=i
£)l(* > {A^A.^O/nlViAnCA,,^)/^} V0)
i=i
92
x(H„z")'^-5-r
ln/(e,
- A„ (A,,u") /n|A,;/3 +
w"
n
o-yoa
Analogously decompose the term II\ as 77 n
J2 1
(«>
< { A « (A„ Ul
)
/n} A
,<*o
+
u"
-7=)
^/n
(ui
-
uz) /%/«.
+ 1 hi + U\z-
{A n (A„u 2 /n} A
)
0)
1=1
O
x
— ln/(e, - A
n
(
A, u" ) /n\ Ar, 0o
,
+
u' /n,a
+v"/y/n)
"
£l(e,< {A„(A„ Ul )/«} A{A n
—
(ui
- u 2 )/n +
<9
(A,, U2 )/n}
A
0)
In
/ (e,|A,, 7 o)
(u,
-^2)/^ +
]£l(e. < {A (A,,w,)/n} A{A„(A,,u 2 )/n}A0)
7I
1=1
*(Hnz')'J!—\nf( e - A„ (A,,0/"|A,,,3o +
070a
,
19
—
n
,a
+ ^=)
-Jn
(w,
- w) /VS.
By C5,
the term
+
|Jji
II\\\
is
bounded by
(I
£ VC2(e„X,))
where the expectation of the random term
By C5,
+
the term I i3
II\ 3
is finite
|
-
UI
(G.15)
ual,
and constant across n by
iid
sampling.
bounded by
is
n
1
z^\
+
-
l){z,
z2
3 {u,Xi),
)\-Y^C
n
i
where the expectation of the random term
is
constant for
=
(G.16)
l
all n.
Write
Jn *—
oa
1=1
'
"
'
v
Jill
-4=y"l(0<e
v
J
e[A„(X,-,ti,)/n,A n (Xi,ti2)/n])^-ln/(e
j
|Jfi,To)(t;,
-w
2)
(G.17)
1=1
;/7 2
-
Y"
—
vn i=i
-7=
>
1(0
£i
6 [A„ (A-j.m,) /n,
A„ (Xj; «2 ) /n])^-ln/ (e,|X„ 7o)(vi *
»
By C5
u 2 )|
Oa
J//i has two finite moments, which remain constant for
and the same bound
III3 follows
for
E
By Lemma
identically.
all n.
Next we show a bound
\IIh - EIII2 < J {F,F,L 2 (P))
sup
\
for llli
19.34 in van der Vaart (1999)
{]
<
oo,
(G
v
I*ll/n+|ii2l/n<2€„
where
J[\(F,
original
T, L%(P ))
the L2(P) bracketing entropy of the function class, which
is
we
rewrite in terms of
\02
-
parameter
F={l(g(X„,3
)
<
Y,
e [s(X„/3 ),9(X,-, 9 2 )])^-ln/(e |X 7 o), |£
with the constant enveloped
e„ eventually puts
1
=
/?o + ttj /n's
of
a Donsker
in (G.18) is finite
V
-
+
fa\
#>|
< 2e„}
the vector of ones. Note that the bound \ui— U2\fn
for
n >
no,
and
also puts
T
uniformly in n by a standard argument, because
is
A„
<
(X{ Uj ) /n's
,
formed as a product
class
{l(p(X,,/?o)
(see type
is
) ,
a small fixed neighborhood of /?o
any small fixed neighborhood of 0.
The entropy
1
/
/'//I by C3, where 1
in
in
in
18);
'
functions in
<
Yi 6
Andrews
[g(X„p ),g(X„0 2 )]),
(1994))
1
and a bounded by
van der Vaart and Wellner (1996) implies that
T
is
20
|/3,
F
- fa\ +
random
\fo
~
Po\
variable,
Donsker, and thus
J[)(F,
< 2e„}
which by Theorem 2.10.6
T, L2(P))
<
oo.
Also note that by
C2
and C3:
E^\
=
\EIIh\
lfe
r An (X^
< E^fn
,112
.f'
7JI
>
<
/(e|X„ 70 )|l( e > 0)de
)/ n
/'l(e
I
0)<7e
f
g
\\u,
- u 2 \\/y/n,
/i,(x„ii,)/»
where the constants are defined
Thus,
for
some random
Lemma A.l.
in
C„ with bounded expectation uniformly
variable
|Ii2+//h| <
Now
bounds established so
collecting all the
expectation uniformly
II.
we have
far,
n
\.
some random
for
variable
Cn
with bounded
n
in
Bounds on Terms
and 772
72
<C n
+IIl\
|/l
Part
Cn \vi -v 2
in
.
Let's get
\zi
+ l).
-Z2\(\Zl\
(G.19)
back now to the terms 72 and 772
.
Recall that
n
=^1(0
72
<e. e [A„(A',,M,)/n,A n (X,,« 2 )/n])
>=i
f
I,
max
x
in
j = i.2
- A„
((,
(Xj,Uj) /n\Xj;Po
+ Uj/n,ao + Vj/\/n)
\
,
/(e,|A,;/3o,Qo)
I
I
and that
n
772
=^l(0>f,
)/n,A„(A,,« 2 )/n])
6 [A„(X,, Ul
t=i
x
By C2-C3
72
is
/(ei
max
j
-
An
+
(X,,Uj) /n\X,;0o
In-
= i.2
u j /n,a
+ Vj/y/n)\
/(ei|Ai;/?o,ao)
1
bounded by
i^
1(0
<
e,-
e [A„ (X\,u,) /n,
A„ (A„ U2
)
A„ (A„ « 2
)
/n])(/'//) x||«,
- «2
||,
x|| Ul
- «2
||,
'2
where the constants are defined
By C2-C3 772
Lemma
in
A.l.
bounded by
is
-J*
— 1(0 >
7J.
e,-
<
6 [A„ (Xi,ui)ln,
/n])(f'/f)
—
7/21
where the constants are defined
inequality (G.18)
we
in
Lemma
A.l.
Then, by an argument that
obtain that uniformly in n
E
sup \y/n(hi -
EI2 t)\ <
co,
"1 ."2
and
identical
bound
follows for the
term 772 i:
E
sup \y/n(II2 - £77 2
i
U],T12
Furthermore by C2-
C3
7J(7 2
i
+772] <
)
const
•
|m
- u2
21
|.
i
)
|
<
oo.
is
identical to the proof of
Hence
for
a random variable
II2
Combining
Cn
with uniformly bounded expectation (uniformly in
+ /I2I <
this inequality with the
For a small 5
>
0,
there
IQi.(«i)
is
U21 +II21I
one
const(l
cr»|*i
+ Cn/Vn)\ui -
the bound
in (G.19),
a random variable
- Q?»(«>)l <
<
Cn >
in
such that
- a|(|a| + 1),
u2
n and
\.
the statement of the
for all
lemma
n > no and some
E Pl cn
sup
in 7)
follows:
large no
<oo,
™>no."r€B«(7o)
uniformly over
Similarly
it
all \z\
—
zi\
<
1 in
=
the set 5„
follows that for a small 5
>
0,
{z
there
:
\v \/y/n
<
e„, \u\/n
a random variable
is
<
e„}, where c„
Cn >
—>
0.
such that for
all
n > no and
some large no
IQLfcO-QLMI
uniformly over
C„
=
all
\z\
—
zi\
<
1 in
<Cn|zi
the set
-22|(|2l|
5n =
{z
:
22
ZM
SU P
n>no.T6Si(7o)
l),
\v\/y/n
in the one-side models.
J
+
<
e n ,|tt|/7i
<
E P^Cn
£„},
<00,
where e„ —»
0.
Note that
MIT LIBRARIES
3 9080 02524 4606
!
.
m
''.'••
:
Download