Massachusetts Institute of Technology
Department of Economics
Working Paper Series

AN MCMC APPROACH TO CLASSICAL ESTIMATION

Victor Chernozhukov
Han Hong

Working Paper 03-21
December 2002

Room E52-251
50 Memorial Drive
Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at
http://ssrn.com/abstract=42037
An MCMC Approach to Classical Estimation

Victor Chernozhukov (a), Han Hong (b)

(a) Department of Economics, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
(b) Department of Economics, Princeton University, Princeton, NJ 08544, USA

First Version: October 2000. This Version: December 2002

Project funded by the National Science Foundation.
Abstract

This paper studies computationally and theoretically attractive estimators, referred to here as the Laplace type estimators (LTE). The LTE include means and quantiles of Quasi-posterior distributions defined as transformations of general (non-likelihood-based) statistical criterion functions, such as those in GMM, nonlinear IV, empirical likelihood, and minimum distance methods. The approach generates an alternative to classical extremum estimation and also falls outside the parametric Bayesian approach. For example, it offers a new attractive estimation method for such important semi-parametric problems as censored and instrumental quantile regression, nonlinear IV, GMM, and value-at-risk models. The LTE's are computed using Markov Chain Monte Carlo methods, which help circumvent the computational curse of dimensionality. A large sample theory is obtained and illustrated for regular cases.

JEL Classification: C10, C11, C13, C15

Keywords: Laplace, Bayes, Markov Chain Monte Carlo, GMM, instrumental regression, censored quantile regression, instrumental quantile regression, empirical likelihood, value-at-risk
1 Introduction

A variety of important econometric problems pose not only a theoretical but also a serious computational challenge, cf. Andrews (1997). A small (and by no means exhaustive) set of such examples includes (1) Powell's censored median regression for linear and nonlinear problems, (2) nonlinear IV estimation, e.g. in the Berry et al. (1995) model, (3) the instrumental quantile regression, and (4) the continuous-updating GMM estimator of Hansen et al. (1996) and related empirical likelihood problems. These problems represent a formidable practical challenge, as the extremum estimators are known to be difficult to compute due to highly nonconvex criterion functions with many local optima (but a well pronounced global optimum). Despite extensive efforts, see notably Andrews (1997), the problem of extremum computation remains a formidable impediment in these applications.

¹ A shorter version of this paper is forthcoming in the Journal of Econometrics 115 (August 2003), pp. 293-346.
This paper develops a class of estimators, which we call the Laplace type estimators (LTE) or Quasi-Bayesian estimators (QBE),² which are defined similarly to Bayesian estimators but use general statistical criterion functions in place of the parametric likelihood function. This formulation circumvents the curse of dimensionality inherent in the computation of the classical extremum estimators by instead focusing on LTE, which are functions of integral transformations of the criterion functions and can be computed using Markov Chain Monte Carlo methods (MCMC), a class of simulation techniques from Bayesian statistics. This formulation will be shown to yield both computable and theoretically attractive new estimators for such important problems as (1)-(4) listed above. Although the aforementioned applications are mostly microeconometric, the obtained results extend to many other models, including GMM and quasi-likelihoods in the nonlinear dynamic framework of Gallant and White (1988).
The class of LTE's or QBE's aims to explore the use of the Laplace approximation (developed by Laplace to study large sample approximations of Bayesian estimators and for use in other nonstatistical problems) outside of the canonical Bayesian framework, that is, outside of parametric likelihood settings when the likelihood function is not known. Instead, the approach relies upon other statistical criterion functions of interest in place of the likelihood, transforms them into proper distributions (Quasi-posteriors) over a parameter of interest, and defines various moments and quantiles of that distribution as the point estimates and confidence intervals, respectively.
It is important to emphasize that the underlying criterion functions are mainly motivated by the analogy principle in place of the likelihood principle, are not the likelihoods (densities) of the data, and are most often semi-parametric.³
The resulting estimators and inference procedures possess a number of good theoretical and computational properties and yield new, alternative approaches for the important problems mentioned earlier. The estimates are as efficient as the extremum estimates; and, in many cases, the inference procedures based on the quantiles of the Quasi-posterior distribution or other posterior quantities yield asymptotically valid confidence intervals, which also perform notably well in finite samples. For example, in the quantile regression setting, those intervals provide valid large sample and excellent small sample inference without requiring nonparametric estimation of the conditional density function (needed in the standard approach). The obtained results are general and useful: they cover the examples listed above under general, non-likelihood based conditions that allow discontinuous, non-smooth semi-parametric criterion functions, and data generating processes that range from iid settings to the nonlinear dynamic framework of Gallant and White (1988). The results thus extend the theoretical work on large sample theory of Bayesian procedures in econometrics and statistics, e.g. Bickel and Yahav (1969), Ibragimov and Has'minskii (1981), Andrews (1994b), Kim (1998).
The LTE's are computed using MCMC, which simulates a series of parameter draws such that the marginal distribution of the series is (approximately) the Quasi-posterior distribution of the parameters. The estimator is therefore a function of this series, and may be given explicitly as the mean or a quantile of the series, or implicitly as the minimizer of a smooth globally convex function. As stated above, the LTE approach is motivated by the estimation and inference efficiency as well as by computational attractiveness. Indeed, the LTE approach is typically as efficient as the extremum approach, but generally does not suffer from the computational curse of dimensionality (through the use of MCMC). The reason is that the computation of LTE's is itself statistically motivated. LTE's are typically means or quantiles of a quasi-posterior distribution, hence can be estimated (computed) at the parametric rate $1/\sqrt{B}$, where $B$ is the number of draws from that distribution (functional evaluations). In contrast, the mode (extremum estimator) is estimated (computed) by the MCMC and similar grid-based algorithms at the nonparametric rate $(1/B)^{p/(d+2p)}$, where $d$ is the parameter dimension and $p$ is the smoothness order of the objective function.

² A preferred terminology is taken here to be the 'Laplace Type Estimators', since the term 'Quasi-Bayesian Estimators' is already used to name Bayesian procedures that use either 'vague' or 'data-dependent' priors or multiple priors, cf. Berger (2002).

³ In this paper, the term 'semi-parametric' refers to the cases where the parameters of interest are finite-dimensional but there are nonparametric nuisance parameters such as unspecified distributions.
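To make the parametric rate for means concrete, the following Python sketch (an illustration, not the authors' code) treats $B$ iid $N(0,1)$ draws as a stand-in for MCMC output whose mean is known to be 0. Real MCMC draws are autocorrelated, which typically inflates the constant but not the $1/\sqrt{B}$ rate. The helper name `mc_mean_rmse` and the replication count are hypothetical choices; the sketch says nothing about the slower rate for the mode.

```python
import math
import random


def mc_mean_rmse(num_draws, replications=200, seed=0):
    """RMSE of the sample mean of `num_draws` iid N(0, 1) draws.

    The draws stand in for MCMC output with known mean 0, so the error of each
    replication is simply its sample mean.
    """
    rng = random.Random(seed)
    total_sq_error = 0.0
    for _ in range(replications):
        estimate = sum(rng.gauss(0.0, 1.0) for _ in range(num_draws)) / num_draws
        total_sq_error += estimate ** 2
    return math.sqrt(total_sq_error / replications)


if __name__ == "__main__":
    for b in (100, 400, 1600, 6400):
        rmse = mc_mean_rmse(b)
        # sqrt(B) * RMSE should stay roughly constant (near the draw s.d. of 1)
        print(b, round(rmse, 4), round(math.sqrt(b) * rmse, 3))
```

Quadrupling $B$ should roughly halve the error, which is the sense in which means and quantiles of the Quasi-posterior are cheap to compute.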
Another useful feature of LT estimation is that, by using information about the overall shape of the objective function, point estimates and confidence intervals may be calculated simultaneously. It also allows incorporation of prior information, and allows for a simple imposition of constraints in the estimation procedure.
The remainder of the paper proceeds as follows. Section 2 formally defines and further motivates the Laplace type estimators with several examples, reviews the literature, and explains other connections. The motivating examples, which are all semi-parametric and involve no parametric likelihoods, will justify the pursuit of a more general theory than is currently available. Section 3 develops the large sample theory, and Sections 3 and 4 further explore it within the context of the econometric examples mentioned earlier. Section 4 briefly reviews important computational aspects and illustrates the use of the estimator through simulation examples. Section 5 contains a brief empirical example, and Section 6 concludes.
Notation. Standard notation is used throughout. Given probability measure $P$, $\to_{p^*}$ denotes the convergence in (outer) probability with respect to the outer probability $P^*$; $\to_d$ denotes the convergence in distribution under $P^*$, etc. See e.g. van der Vaart and Wellner (1996) for definitions. $|x|$ denotes the Euclidean norm $\sqrt{x'x}$; $B_\delta(x)$ denotes the ball of radius $\delta$ centered at $x$. A notation table is given in the appendix.

2 Laplacian or Quasi-Bayesian Estimation: Definition and Motivation

2.1 Motivation

Extremum estimators are usually motivated by the analogy principle and defined as maximizers of random, average-like criterion functions $L_n(\theta)$, where $n$ denotes the sample size. The functions $n^{-1}L_n(\theta)$ are typically viewed as transformations of sample averages that converge to criterion functions $M(\theta)$ that are maximized uniquely at some $\theta_0$. Extremum estimators are usually consistent and asymptotically normal, cf. Amemiya (1985), Gallant and White (1988), Newey and McFadden (1994), Pötscher and Prucha (1997). However, in many important cases, actually computing the extremum estimates remains a large problem, as discussed by Andrews (1997).
Example 1: Censored and Nonlinear Quantile Regression. A prominent model in econometrics is the censored median regression model of Powell (1984). Powell's censored quantile regression estimator is defined to maximize the following nonlinear objective function

$$L_n(\theta) = -\sum_{i=1}^n w_i\, \rho_\tau\big(Y_i - q(X_i, \theta)\big), \qquad q(X_i, \theta) = \max\big(0, g(X_i, \theta)\big),$$

where $\rho_\tau(u) = (\tau - 1(u < 0))\,u$ is the check function of Koenker and Bassett (1978), $w_i$ is a weight, and $Y_i$ is either positive or zero. Its conditional quantile $q(X_i, \theta)$ is specified as $\max(0, g(X_i, \theta))$. The censored quantile regression model was first formulated by Powell (1984) as a way to provide valid inference in Tobin-Amemiya models without distributional assumptions and with heteroscedasticity of unknown form. The extremum estimator based on Powell's criterion function, while theoretically elegant, has a well-known computational difficulty. The objective function is similar to that plotted in Figure 1: it is nonsmooth and highly nonconvex, with numerous local optima, posing a formidable obstacle to the practical use of this extremum estimator; see Buchinsky (1991), Fitzenberger (1997), Buchinsky and Hahn (1998), and Khan and Powell (2001) for related discussions. In this paper, we shall explore the use of LT estimators based on Powell's criterion function and show that this alternative is attractive both theoretically and computationally.

Example 2: Nonlinear IV and GMM. Amemiya (1977), Hansen (1982), and Hansen et al. (1996) introduced nonlinear IV and GMM estimators that maximize

$$L_n(\theta) = -\frac{n}{2}\Big(\frac{1}{n}\sum_{i=1}^n m_i(\theta)\Big)' W_n(\theta) \Big(\frac{1}{n}\sum_{i=1}^n m_i(\theta)\Big),$$

where $m_i(\theta)$ is a moment function defined such that the economic parameter of interest solves $E\, m_i(\theta_0) = 0$. The weighting matrix may be given by

$$W_n(\theta) = \Big[\frac{1}{n}\sum_{i=1}^n m_i(\theta)\, m_i(\theta)'\Big]^{-1} + o_p(1)$$

or other sensible choices. Note that the term "$o_p(1)$" in $W_n$ implicitly incorporates generalized empirical likelihood estimators, which will be discussed in Section 4. Up to the first order, objective functions of empirical likelihood estimators for $\theta$ (with the Lagrange multiplier concentrated out) locally coincide with $L_n$.

Applications of these estimators are numerous and important (e.g. Berry et al. (1995), Hansen et al. (1996), Imbens (1997)), but while global maxima are typically well-defined, it is also typical to see many local optima in applications. This leads to serious difficulties with applying the extremum approach in applications where the parameter dimension is high. As in the previous example, LTE's provide a computable and theoretically attractive alternative to extremum estimators. Furthermore, Quasi-posterior quantiles provide a valid and effective way to construct confidence intervals and explore the shape of the objective function.
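The two criterion functions of Examples 1 and 2 can be written down directly. The Python sketch below is illustrative only: it assumes a linear index $g(X, \theta) = X'\theta$ for Example 1 and, for Example 2, a fixed weighting matrix supplied by the caller instead of the continuous-updating $W_n(\theta)$. All function names are hypothetical. Either function can serve as $L_n(\theta)$ in the Quasi-posterior defined later in this section.

```python
def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = (tau - 1(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def powell_objective(theta, data, tau=0.5, weights=None):
    """Example 1: L_n(theta) = -sum_i w_i rho_tau(Y_i - max(0, X_i'theta)).

    `data` is a list of (y, x) pairs; the linear index X_i'theta is an
    illustrative choice of g(X_i, theta).
    """
    if weights is None:
        weights = [1.0] * len(data)
    total = 0.0
    for w, (y, x) in zip(weights, data):
        index = sum(xj * tj for xj, tj in zip(x, theta))
        total += w * check_loss(y - max(0.0, index), tau)  # censored quantile q = max(0, g)
    return -total


def gmm_objective(theta, moment_fn, data, weight):
    """Example 2: L_n(theta) = -(n/2) gbar' W gbar with gbar = (1/n) sum_i m_i(theta).

    `weight` is a fixed k x k matrix (list of lists), not the continuous-updating W_n(theta).
    """
    n = len(data)
    moments = [moment_fn(theta, obs) for obs in data]
    k = len(moments[0])
    gbar = [sum(m[j] for m in moments) / n for j in range(k)]
    quad = sum(gbar[a] * weight[a][b] * gbar[b] for a in range(k) for b in range(k))
    return -0.5 * n * quad
```

For instance, a linear IV moment $m_i(\theta) = Z_i(Y_i - X_i\theta)$ can be passed as `lambda theta, obs: [obs[1] * (obs[0] - obs[2] * theta)]` with observations stored as `(y, z, x)` tuples.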
Example 3: Instrumental and Robust Quantile Regression. Instrumental quantile regression may be defined by maximizing a standard nonlinear IV or GMM objective function⁴

$$L_n(\theta) = -\frac{n}{2}\Big(\frac{1}{n}\sum_{i=1}^n m_i(\theta)\Big)' W_n(\theta) \Big(\frac{1}{n}\sum_{i=1}^n m_i(\theta)\Big), \qquad m_i(\theta) = \big(\tau - 1(Y_i \le q(D_i, X_i, \theta))\big)\, Z_i,$$

where $Y_i$ is the dependent variable, $D_i$ is a vector of possibly endogenous variables, $X_i$ is a vector of regressors, $Z_i$ is a vector of instruments, and $W_n(\theta)$ is a positive definite weighting matrix, e.g.

$$W_n(\theta) = \frac{1}{\tau(1-\tau)}\Big[\frac{1}{n}\sum_{i=1}^n Z_i Z_i'\Big]^{-1} + o_p(1)$$

or other sensible versions. Motivations for estimating equations of this sort arise from traditional separable simultaneous equations, cf. Amemiya (1985), and also from more general nonseparable simultaneous equation models and heterogeneous treatment effect models.⁵

Clearly, a variety of Huber (1973) type robust estimators can be defined in this way. For example, suppose that in the absence of endogeneity $q(X, \theta) = X'\beta(\tau)$; then $Z = f(X)$ can be constructed to preclude the influence of outliers in $X$ on the inference. For example, choosing $Z_{ij} = 1(X_{ij} < \tilde X_j)$, $j = 1, \ldots, \dim(X)$, where $\tilde X_j$ denotes the median of $\{X_{ij}, i \le n\}$, produces an approach that is highly robust to both outliers in $X_{ij}$ and $Y_i$. In fact, it appears that the breakdown properties of this objective function are similar to those of the maximal regression depth estimator of Rousseeuw and Hubert (1999).⁶

Despite a clear appeal, the computational problem is daunting. The function $L_n(\beta)$ is highly nonconvex, almost everywhere flat, and has numerous discontinuities and local optima. (Note that the global optimum is well pronounced.) Figure 1 illustrates the situation. Again, in this case the LTE approach will yield a computable and theoretically attractive alternative to the extremum-based estimation and inference.⁷ Furthermore, we will show that the Quasi-posterior confidence intervals provide a valid and effective way to construct confidence intervals for parameters and their smooth functions without non-parametric estimation of the conditional density function evaluated at quantiles (needed in the standard approach).

⁴ Early variants based on the Wald instruments go back to Mood (1950) and Hogg (1975), cf. Koenker (1998). Macurdy and Timmins (2001) propose to smooth out the edges using kernels; however, this does not eliminate non-convexities and local optima; see also Abadie (1995).

⁵ See Chernozhukov and Hansen (2001) for the development of this direction.

⁶ The resulting objective function is similar in spirit to the objective function of Rousseeuw and Hubert (1999), whose computational difficulty is well known, as discussed in van Aelst et al. (2002).

⁷ Another computationally attractive approach, based on an extension of the Koenker and Bassett (1978) quantile regression estimator to instrumental problems like these, is given in Chernozhukov and Hansen (2001).
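A sketch of the instrumental quantile regression criterion in the no-endogeneity case $q(X, \beta) = X'\beta$, using the robust instruments $Z_{ij} = 1(X_{ij} < \tilde X_j)$ and the weighting matrix $[\tau(1-\tau)\, n^{-1}\sum_i Z_iZ_i']^{-1}$ with the $o_p(1)$ term dropped. It is written in standard-library Python with a small Gauss-Jordan inverse; the helper names are hypothetical. In practice one would usually append a constant instrument to the indicator columns, which is an assumption not stated in the text.

```python
import statistics


def robust_instruments(xs):
    """Z_ij = 1(X_ij < median_j), j = 1..dim(X), for regressor rows `xs`."""
    dims = len(xs[0])
    medians = [statistics.median(x[j] for x in xs) for j in range(dims)]
    return [[1.0 if x[j] < medians[j] else 0.0 for j in range(dims)] for x in xs]


def invert(matrix):
    """Gauss-Jordan inverse (with partial pivoting) of a small nonsingular matrix."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ivqr_objective(beta, ys, xs, zs, tau):
    """L_n(beta) = -(n/2) gbar' W gbar, with m_i = (tau - 1(Y_i <= X_i'beta)) Z_i
    and W = [tau (1 - tau) (1/n) sum_i Z_i Z_i']^{-1}."""
    n, k = len(ys), len(zs[0])
    gbar = [0.0] * k
    for y, x, z in zip(ys, xs, zs):
        q = sum(a * b for a, b in zip(x, beta))  # q(X_i, beta) = X_i'beta
        sign = tau - (1.0 if y <= q else 0.0)
        for j in range(k):
            gbar[j] += sign * z[j] / n
    zz = [[tau * (1 - tau) * sum(z[a] * z[b] for z in zs) / n for b in range(k)]
          for a in range(k)]
    w = invert(zz)
    quad = sum(gbar[a] * w[a][b] * gbar[b] for a in range(k) for b in range(k))
    return -0.5 * n * quad
```

Because the indicator $1(Y_i \le X_i'\beta)$ only changes when $\beta$ crosses a data point, this function is piecewise constant in $\beta$, which is exactly the flatness and discontinuity described above.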
The LTE's studied in this paper can be easily computed through Markov Chain Monte Carlo and other posterior simulation methods. To describe these estimators, note that although the objective function $L_n(\theta)$ is generally not a log-likelihood function, the transformation

$$p_n(\theta) = \frac{e^{L_n(\theta)}\,\pi(\theta)}{\int_\Theta e^{L_n(\theta)}\,\pi(\theta)\,d\theta} \qquad (2.1)$$

is a proper distribution density over the parameter of interest, called here the Quasi-posterior. Here $\pi(\theta)$ is a weight or prior probability density that is strictly positive and continuous over $\Theta$; for example, it can be constant over the parameter space. Note that $p_n$ is generally not a true posterior in the Bayesian sense, since it may not involve the conditional data density or likelihood, and is thus generally created through non-Bayesian statistical learning.

The Quasi-posterior mean is then defined as

$$\hat\theta = \int_\Theta \theta\, p_n(\theta)\,d\theta = \int_\Theta \theta \Big(\frac{e^{L_n(\theta)}\,\pi(\theta)}{\int_\Theta e^{L_n(\theta)}\,\pi(\theta)\,d\theta}\Big) d\theta, \qquad (2.2)$$

where $\Theta$ is the parameter space. Other quantities such as medians and quantiles will also be considered. A formal definition of LTE's is given in Definition 1.

In order to compute these estimators using Markov Chain Monte Carlo methods, we can draw a Markov chain (see Figure 1)

$$S = \big(\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(B)}\big),$$

whose marginal density is approximately given by $p_n(\theta)$, the Quasi-posterior distribution. Then the estimate $\hat\theta$, e.g. the Quasi-posterior mean, is computed as

$$\hat\theta = \frac{1}{B}\sum_{i=1}^B \theta^{(i)}.$$

Analogously, for a given continuously differentiable function $g : \Theta \to \mathbb{R}$, the 90%-confidence intervals are constructed simply by taking the .05-th and .95-th quantiles of the sequence

$$g(S) = \big(g(\theta^{(1)}), \ldots, g(\theta^{(B)})\big);$$

see Figure 1. Under the information equality restrictions discussed later, such confidence regions are asymptotically valid. Under other conditions, it is possible to use other Quasi-posterior quantities, such as the variance-covariance matrix of the series $S$, to define asymptotically valid confidence regions; see Section 3. It shall be emphasized repeatedly that the validity of this approach does not depend on the likelihood formulation.
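The computation just described can be sketched with a random-walk Metropolis sampler, which is one standard MCMC choice; the text does not prescribe a particular sampler here (Section 4 reviews computational aspects). The sketch uses the scalar criterion $L_n(\theta) = -\sum_i |Y_i - \theta|$, i.e. Example 1 with $\tau = 1/2$, weights $w_i = 2$ and no censoring, together with a flat prior on $\Theta = [-10, 10]$. All names, the step size, and the burn-in length are illustrative assumptions.

```python
import math
import random


def random_walk_metropolis(log_target, theta0, step, num_draws, seed=0):
    """Random-walk Metropolis chain with stationary density proportional to
    exp(log_target(theta)); for an LTE, log_target = L_n + log(pi)."""
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_target(theta)
    chain = []
    for _ in range(num_draws):
        proposal = [t + step * rng.gauss(0.0, 1.0) for t in theta]
        candidate = log_target(proposal)
        # Accept with probability min(1, exp(candidate - current)); a log_target of
        # -inf (outside Theta) is never accepted. 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        chain.append(list(theta))
    return chain


def empirical_quantile(values, q):
    """Order-statistic quantile (no interpolation)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def lte_summary(chain, burn_in, g=lambda theta: theta[0]):
    """Quasi-posterior mean of g(theta) and the 90% interval from its .05/.95 quantiles."""
    values = [g(theta) for theta in chain[burn_in:]]
    mean = sum(values) / len(values)
    return mean, (empirical_quantile(values, 0.05), empirical_quantile(values, 0.95))


if __name__ == "__main__":
    data_rng = random.Random(1)
    ys = [data_rng.gauss(1.0, 1.0) for _ in range(200)]

    def log_quasi_posterior(theta):
        # L_n(theta) = -sum |Y_i - theta| plus a flat prior on Theta = [-10, 10]
        if abs(theta[0]) > 10.0:
            return -math.inf
        return -sum(abs(y - theta[0]) for y in ys)

    chain = random_walk_metropolis(log_quasi_posterior, [0.0], step=0.1, num_draws=5000)
    print(lte_summary(chain, burn_in=1000))
```

Only evaluations of $L_n$ are needed, never its maximizer, which is why the nonconvexity of the criterion does not obstruct the computation.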
2.2 Formal Definitions

Let $\rho_n(u)$ be a penalty or loss function associated with making an incorrect decision. Examples of $\rho_n(u)$ include

i. $\rho_n(u) = |\sqrt{n}\,u|^2$, the squared loss function,
ii. $\rho_n(u) = \sqrt{n}\sum_{j=1}^d |u_j|$, the absolute deviation loss function,
iii. $\rho_n(u) = \sqrt{n}\sum_{j=1}^d (\tau_j - 1(u_j < 0))\,u_j$, for $\tau_j \in (0, 1)$ for each $j$, the check loss function of Koenker and Bassett (1978).

The parameter $\theta$ is assumed to belong to the subset $\Theta$ of Euclidean space. Using the Quasi-posterior density $p_n$ in (2.1), define the Quasi-posterior risk function as

$$Q_n(\zeta) = \int_\Theta \rho_n(\theta - \zeta)\, p_n(\theta)\,d\theta = \int_\Theta \rho_n(\theta - \zeta) \Big(\frac{e^{L_n(\theta)}\,\pi(\theta)}{\int_\Theta e^{L_n(\theta)}\,\pi(\theta)\,d\theta}\Big) d\theta. \qquad (2.3)$$

Definition 1 The class of LTE minimize the function $Q_n(\zeta)$ in (2.3) for various choices of $\rho_n$:

$$\hat\theta = \arg\inf_{\zeta \in \Theta} \big[Q_n(\zeta)\big]. \qquad (2.4)$$

The estimator $\hat\theta$ is a decision rule that is least unfavorable given the statistical (non-likelihood) information provided by the probability measure $p_n$, using the loss function $\rho_n$. In particular, the loss function $\rho_n$ may asymmetrically penalize deviations from the truth, and $\pi$ may give differential weights to different values of $\theta$. The solutions to the problem (2.4) for loss functions i-iii include the Quasi-posterior means, medians, and marginal $\tau_j$-th quantiles, respectively.⁸
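Definition 1 can be checked numerically. Replacing $p_n$ by $B$ draws turns $Q_n(\zeta)$ into an average of losses, which is convex in $\zeta$ for losses i-iii, so a one-dimensional ternary search finds a minimizer. The Python sketch below is illustrative: the names are hypothetical, the draws would in practice come from an MCMC chain, and the $\sqrt{n}$ factor is dropped because it does not change the minimizer. It recovers the sample mean, median, and $\tau$-quantile of the draws.

```python
import random


def quasi_posterior_risk(zeta, draws, loss):
    """Monte Carlo version of Q_n(zeta) = int rho(theta - zeta) p_n(theta) d theta."""
    return sum(loss(theta - zeta) for theta in draws) / len(draws)


def lte_from_draws(draws, loss, iterations=100):
    """Minimise the (convex) Monte Carlo risk over scalar zeta by ternary search."""
    lo, hi = min(draws), max(draws)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        # For a convex risk, a minimiser always stays inside the retained bracket.
        if quasi_posterior_risk(m1, draws, loss) <= quasi_posterior_risk(m2, draws, loss):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def squared_loss(u):       # loss i (sqrt(n) scaling dropped)
    return u * u


def absolute_loss(u):      # loss ii
    return abs(u)


def make_check_loss(tau):  # loss iii for a scalar parameter
    return lambda u: (tau - (1.0 if u < 0 else 0.0)) * u
```

With, say, 1001 draws, `lte_from_draws(draws, make_check_loss(0.9))` lands on the 901st order statistic, i.e. the empirical .9-quantile, matching the description of the solutions to (2.4).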
⁸ This formulation implies that, conditional on the data, the decision $\hat\theta$ satisfies Savage's axioms of choice under uncertainty with subjective probabilities given by $p_n$ (these include the usual asymmetry and negative transitivity of the strict preference relationship, independence, and some other standard axioms).

2.3 Related Literature

Our analysis will rely heavily on the previous work on Bayesian estimators in the likelihood setting. The initial large sample work on Bayesian estimators was done by Laplace (see Stigler (1975) for a detailed review). Further early work of Bernstein (1917) and von Mises (1931) has been considerably extended in both econometric and statistical research, cf. Bickel and Yahav (1969), Ibragimov and Has'minskii (1981), Andrews (1994b), Phillips and Ploberger (1996), and Kim (1998), among others. In general, Bayesian asymptotics require very delicate control of the tail of the posterior distribution and were developed in useful generality much later than the asymptotics of extremum estimators. The treatments of Bickel and Yahav (1969) and Ibragimov and Has'minskii (1981) are most useful for the present setting, but are inevitably tied down to the likelihood setting. For example, the latter treatment relies heavily on Hellinger bounds that are firmly rooted in the objective function being a likelihood of iid data. However, the general flavor of the approach is suited for the present purposes. The treatment of Bickel and Yahav (1969) can be easily extended to smooth, possibly incorrect iid likelihoods,⁹ but does not apply to censored median regression or any of the GMM type settings. Andrews (1994b) and Phillips and Ploberger (1996) study the large sample approximation of posteriors and posterior odds ratio tests in relation to the classical Wald tests in the context of smooth, correctly specified likelihoods. Kim (1998) derives the limit behavior of posteriors in likelihood models over shrinking neighborhood systems. Kim's approach and related approaches have been important in describing the essence of posterior behavior, but the limit behavior of point estimates like ours does not follow from it.¹⁰

Formally and substantively, none of the above treatments apply to our motivating examples and the estimators given in Definition 1. These examples do not involve likelihoods, deal mostly with GMM type objective functions, and often involve discontinuous and non-smooth criterion functions to which the above mentioned results do not apply. In order to develop the theory of LTE's for such examples, we extend the previous arguments. The results obtained here enable the use of Bayesian tools outside of the Bayesian framework, covering models with non-likelihood-based criterion functions, such as the examples listed earlier and other semi-parametric objective functions that may, for example, depend on preliminary estimates of infinite-dimensional nuisance parameters. Moreover, our results apply to general forms of data generating processes, from the cross-sectional framework of Amemiya (1985) to the nonlinear dynamic framework of Gallant and White (1988) and Pötscher and Prucha (1997).

Our motivating problems are all semi-parametric, and there are several pure Bayesian approaches to such problems, see notably Doksum and Lo (1990), Diaconis and Freedman (1986), Hahn (1997), Chamberlain and Imbens (1997), Kottas and Gelfand (2001). Semi-parametric models have some parametric and nonparametric components, e.g. the unspecified nonparametric distribution of data in Examples 1-3. The mentioned papers proceed with the pure Bayesian approach to such problems, which involves Bayesian learning about these two components via a two-step process. In the first step, Bayesian non-parametric learning with Dirichlet priors is used to form beliefs about the joint non-parametric density of data, and then draws of the non-parametric density ("Bayesian bootstrap") are made repeatedly to compute the extremum parameter of interest. This approach is purely Bayesian, as it fully conforms to the Bayes learning model. It is clear that this approach is generally quite different from the LTE's or QBE's studied in this paper, and in applications it still requires numerous re-computations of the extremum estimates in order to construct the posterior distribution over the parameter of interest. In sharp contrast, LT estimation takes a "shortcut" by essentially using the common criterion functions as posteriors, and thus entirely avoids both the estimation of the nonparametric distribution of the data and the repeated computation of extremum estimates.
⁹ See Bunke and Milhaud (1998) for an extension to the smooth misspecified iid likelihood case.

¹⁰ The conditions do not apply to Example 1, or even to GMM, since they require the criterion function to be more than three times differentiable. Moreover, to describe the behavior of the posterior median one needs to know $\int_{-\infty}^{\hat\theta} p_n(\theta)\,d\theta$, which requires the study of $L_n(\theta)$ beyond the compact $1/\sqrt{n}$ neighborhoods of $\theta_0$. Similarly, the posterior mean is $\int_{-\infty}^{\infty} \theta\, p_n(\theta)\,d\theta$, which also requires the study of the complete $L_n(\theta)$.
Finally, note that the LTE approach has a limited-information or semi-parametric nature, in the sense that we do not know or are not willing to specify the complete data density. The limited-information principle is powerfully elaborated in the recent work of Zellner (1998), who starts with a set of moment conditions, calculates the maximum entropy densities consistent with the moment equations, and uses those as formal (misspecified) likelihoods. While calculation of the maximum entropy densities is not needed in the present framework, the large sample theory obtained here does cover Zellner's (1998) estimators as one fundamental case. Related work by Kim (2002) derives a limited information likelihood interpretation for certain smooth GMM settings.¹¹ In addition, the LTE's based on the empirical likelihood are introduced in Section 4 and motivated there as respecting the limited information principle.
3 Large Sample Properties

This section shows that under general regularity conditions the Quasi-posterior distribution concentrates at the speed $1/\sqrt{n}$ around the true parameter $\theta_0$ as measured by the "total variation of moments" norm (and the total variation norm as a special case), that the LT estimators are consistent and asymptotically normal, and that Quasi-posterior quantiles and other relevant quantities provide asymptotically valid confidence intervals.
3.1 Assumptions

We begin by stating the main assumptions. In addition, it is assumed without further notice that the criterion functions $L_n(\theta)$ and other primitive objects have the standard measurability properties. For example, given the underlying probability space $(\Omega, \mathcal{F}, P)$, for any $\theta \in \Theta$, $L_n(\theta)$ is a random variable, that is, a measurable function of $\omega \in \Omega$, and for any $\omega \in \Omega$, $L_n(\theta)$ is a measurable function of $\theta$.

ASSUMPTION 1 (Parameter) The true parameter $\theta_0$ belongs to the interior of a compact convex subset $\Theta$ of Euclidean space $\mathbb{R}^d$.

ASSUMPTION 2 (Penalty Function) The loss function $\rho_n : \mathbb{R}^d \to \mathbb{R}_+$ satisfies:
i. $\rho_n(u) = \rho(\sqrt{n}\,u)$, where $\rho(u) \ge 0$ and $\rho(u) = 0$ iff $u = 0$;
ii. $\rho$ is convex and $\rho(h) \le 1 + |h|^p$ for some $p \ge 1$;
iii. $\varphi(\xi) = \int_{\mathbb{R}^d} \rho(u - \xi)\, e^{-u'au}\,du$ is minimized uniquely at some $\xi^* \in \mathbb{R}^d$ for any finite $a > 0$;
iv. the weighting function $\pi : \Theta \to \mathbb{R}_+$ is a continuous, uniformly positive density function.

ASSUMPTION 3 (Identifiability) For any $\delta > 0$, there exists $\epsilon > 0$ such that

$$\liminf_{n \to \infty} P^*\Big\{\sup_{|\theta - \theta_0| \ge \delta} \frac{1}{n}\big(L_n(\theta) - L_n(\theta_0)\big) \le -\epsilon\Big\} = 1.$$

ASSUMPTION 4 (Expansion) For $\theta$ in an open neighborhood of $\theta_0$,
i. $L_n(\theta) - L_n(\theta_0) = (\theta - \theta_0)'\Delta_n(\theta_0) - \frac{1}{2}(\theta - \theta_0)'\,[nJ_n(\theta_0)]\,(\theta - \theta_0) + R_n(\theta)$;
ii. $\Omega_n(\theta_0)^{-1/2}\Delta_n(\theta_0)/\sqrt{n} \to_d N(0, I)$;
iii. $\Omega_n(\theta_0) = O(1)$ and $J_n(\theta_0) = O(1)$ are uniformly in $n$ positive-definite constant matrices;
iv. for each $\epsilon > 0$, there is a sufficiently small $\delta > 0$ and large $M > 0$ such that
(a) $\limsup_n P^*\Big\{\sup_{M/\sqrt{n} \le |\theta - \theta_0| \le \delta} \dfrac{|R_n(\theta)|}{n|\theta - \theta_0|^2} > \epsilon\Big\} < \epsilon$,
(b) $\limsup_n P^*\Big\{\sup_{|\theta - \theta_0| \le M/\sqrt{n}} |R_n(\theta)| > \epsilon\Big\} = 0$.

¹¹ Kim (2002) also provided some useful asymptotic results for $\exp(L_n(\theta))$ using the shrinking neighborhood approach. However, Kim's (2002) approach does not cover the estimators and procedures considered here; see the previous footnote.
3.2 Discussion of Assumptions

In the following we discuss the stated assumptions under which Theorems 1-4 stated below will be true. We argue that these assumptions are simple but encompass a wide variety of econometric models, from cross-sectional models to nonlinear dynamic models. This means that Theorems 1-4 are of wide interest and applicability.

In general, Assumptions 1-4 are related to but different from those in Bickel and Yahav (1969) and Ibragimov and Has'minskii (1981). The most substantial differences appear in Assumption 4 and are due to the general non-likelihood setting. Also, in Assumption 4 we introduce Huber type conditions to handle the tail behavior of discontinuous and non-smooth criterion functions. In general, the early approaches are inevitably tied to the iid likelihood formulation, which is not suited for the present purposes.

The compactness Assumption 1 is conventional. It is shown in the proof of Theorem 1 that it is not difficult to drop compactness. For example, in the case of Quasi-posterior quantiles in Theorem 3, it is only required that $\pi$ is a proper density; in the case of Quasi-posterior variances in Theorem 4, it is only required that $\int_\Theta |\theta|^2\, \pi(\theta)\,d\theta < \infty$; and for the general loss functions considered in Theorem 2, it is only required that $\int_\Theta |\theta|^p\, \pi(\theta)\,d\theta < \infty$. Of course, compactness guarantees all of the above given Assumption 2. Also, the requirement that the parameter lies in the interior of the parameter space rules out some non-regular cases; see for example Andrews (1999).

Assumption 2 imposes convexity on the penalty function. We do not consider non-convex penalty functions for pragmatic reasons. One of the main motivations of this paper is the generic computability of the estimates, given that they solve well-defined convex optimization problems.
The domination condition, $\rho(h) \le 1 + |h|^p$ for some $1 \le p < \infty$, is conventional and is satisfied in all examples of $\rho$ we gave. The assumption that $\varphi(\xi) = \int \rho(u - \xi)\, e^{-u'au}\,du \propto E\,\rho\big(N(0, (2a)^{-1}) - \xi\big)$ attains a unique minimum at some finite $\xi^*$ for any positive definite $a$ is required, and it clearly holds for all of the examples of $\rho$ mentioned. In fact, when $\rho$ is symmetric, $\xi^* = 0$ by Anderson's (1955) lemma.

Assumption 3 is implied by the usual uniform convergence and unique identification conditions, as in Amemiya (1985).

LEMMA 1 Given Assumption 1, Assumption 3 holds if there is a function $M_n(\theta)$ that
i. is nonstochastic, continuous on $\Theta$, and for any $\delta > 0$, $\limsup_n \big(\sup_{|\theta - \theta_0| \ge \delta} M_n(\theta) - M_n(\theta_0)\big) < 0$;
ii. is such that $L_n(\theta)/n - M_n(\theta)$ converges to zero in (outer) probability uniformly over $\Theta$.

The proof of Lemma 1 can be found in Amemiya (1985) and White (1994).

Assumption 4 is satisfied under the conditions of Lemma 2, which are known to be mild in nonlinear models. Assumption 4.ii requires asymptotic normality to hold, and is generally a weak assumption for cross-sectional models and many time-series applications. Assumption 4.iii rules out the cases of mixed asymptotic normality for some non-stationary time series models (which can be incorporated at a notational cost with different scaling rates). Assumption 4.iv easily holds when there is enough smoothness.

LEMMA 2 Given Assumptions 1 and 3, Assumption 4 holds with $\Delta_n(\theta_0) = \nabla_\theta L_n(\theta_0)$ and $J_n(\theta_0) = -\nabla_{\theta\theta'} M_n(\theta_0)$ if
i. for some $\delta > 0$, $L_n(\theta)$ and $M_n(\theta)$ are twice continuously differentiable in $\theta$ when $|\theta - \theta_0| < \delta$;
ii. there is $\Omega_n(\theta_0) = O(1)$ and $J_n(\theta_0) = O(1)$, uniformly positive definite, such that $\Omega_n(\theta_0)^{-1/2}\,\nabla_\theta L_n(\theta_0)/\sqrt{n} \to_d N(0, I)$;
iii. for some $\delta > 0$ and each $\epsilon > 0$,

$$\limsup_n P^*\Big\{\sup_{|\theta - \theta_0| \le \delta} \big|\nabla_{\theta\theta'} L_n(\theta)/n - \nabla_{\theta\theta'} M_n(\theta)\big| > \epsilon\Big\} = 0.$$

Lemma 2 is immediate, hence its proof is omitted.
Both Lemmas 1 and 2 are simple but useful conditions that can be easily verified using standard uniform laws of large numbers and central limit theorems. In particular, they have been proven to hold for criterion functions corresponding to:
1. Most smooth cross-sectional models described in Amemiya (1985);
2. The smooth GMM and Quasi-likelihood models of Hansen (1982), Gallant and White (1988), and Pötscher and Prucha (1997) for nonlinear stationary and dynamic settings, covering Gordin (mixingale type) conditions and near-epoch dependent processes such as ARMA, GARCH, ARCH, and other models alike;
3. General empirical likelihood models for smooth moment equation models studied by Imbens (1997), Kitamura and Stutzer (1997), Newey and Smith (2001), Owen (1989, 1990, 1991, 2001), Qin and Lawless (1994), and the recent extensions to the conditional moment equations.

Hence the main results of this paper, Theorems 1-4, apply to these fundamental econometric and statistical models. Moreover, Assumption 4 does not require differentiability of the criterion function and thus holds even more generally. Assumption 4.iv is a Huber-like stochastic equicontinuity condition, which requires that the remainder term of the expansion can be controlled in a particular way over a neighborhood of $\theta_0$. In addition to Lemma 2, many sufficient conditions for Assumption 4 are given in the empirical process literature, e.g. Amemiya (1985), Andrews (1994a), Newey (1991), Pakes and Pollard (1989), and van der Vaart and Wellner (1996). Section 4 verifies Assumption 4 for the leading models with nonsmooth criterion functions, including the examples discussed in the previous section.
3.3 Convergence in the Total Variation of Moments Norm

Under Assumptions 1-4, we show that the Quasi-posterior density concentrates around $\theta_0$ at the speed $1/\sqrt{n}$ as measured by the total variation of moments norm, and then use this preliminary result to prove all other main results.

Define the local parameter $h$ as a normalized deviation from $\theta_0$, centered at the normalized random "score function":
$$h = \sqrt{n}(\theta - \theta_0) - J_n(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt{n}.$$
Define by the Jacobi rule the localized Quasi-posterior density for $h$ as
$$p_n^*(h) = \frac{1}{\sqrt{n}^{\,d}}\, p_n\big(h/\sqrt{n} + \theta_0 + J_n(\theta_0)^{-1}\Delta_n(\theta_0)/n\big).$$
Define the total variation of moments norm for a real-valued measurable function $f$ on $S$ as
$$\|f\|_{TVM(\alpha)} = \int (1 + |h|^\alpha)\,|f(h)|\,dh.$$

THEOREM 1 (Convergence in Total Variation of Moments Norm) Under Assumptions 1-4, for any $0 \le \alpha < \infty$,
$$\|p_n^*(h) - p_\infty^*(h)\|_{TVM(\alpha)} = \int_{H_n} (1 + |h|^\alpha)\,|p_n^*(h) - p_\infty^*(h)|\,dh \to_p 0,$$
where $H_n = \{\sqrt{n}(\theta - \theta_0) - J_n(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt{n} : \theta \in \Theta\}$ and
$$p_\infty^*(h) = \sqrt{\det J_n(\theta_0)/(2\pi)^d}\,\exp\big(-\tfrac12 h'J_n(\theta_0)h\big).$$

Theorem 1 shows that $p_n(\theta)$ is concentrated in a $1/\sqrt{n}$ neighborhood of $\theta_0$ as measured by the total variation of moments norm. For large $n$, $p_n(\theta)$ is approximately a random normal density with the random mean parameter $\theta_0 + J_n(\theta_0)^{-1}\Delta_n(\theta_0)/n$ and constant variance parameter $J_n(\theta_0)^{-1}/n$. Theorem 1 applies to general statistical criterion functions $L_n(\theta)$, hence it covers the parametric likelihood setting as a fundamental case, in particular implying the Bernstein-Von Mises theorems, which state the convergence of the likelihood posterior to the limit random density in the total variation norm. Note also that the total variation norm results from setting $\alpha = 0$ in the total variation of moments norm. The use of the latter is needed to deduce the convergence of LTE's such as the posterior means or variances in Theorems 2-4.
3.4 Limit Results for Point Estimates and Confidence Intervals

As a consequence of Theorem 1, Theorem 2 establishes $\sqrt{n}$-consistency and asymptotic normality of LTE's. When the loss function $\rho(\cdot)$ is symmetric, LTE's are asymptotically equivalent to the extremum estimators. Recall that the extremum estimator $\sqrt{n}(\hat\theta_{ex} - \theta_0)$, where $\hat\theta_{ex} = \arg\sup_{\theta\in\Theta} L_n(\theta)$, is first-order equivalent to
$$U_n = J_n(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt{n}.$$
Given that $p_n^*$ approaches $p_\infty^*$, it may be expected that the LTE $\sqrt{n}(\hat\theta - \theta_0)$ is asymptotically equivalent to
$$Z_n = \arg\inf_{z\in\mathbb{R}^d}\int_{\mathbb{R}^d}\rho(z - u)\,p_\infty^*(u - U_n)\,du.$$
To see a relationship between $Z_n$ and $U_n$, define
$$\xi_n(\theta_0) = \arg\inf_{z\in\mathbb{R}^d}\int_{\mathbb{R}^d}\rho(z - u)\,p_\infty^*(u)\,du,$$
which exists by Assumption 2. Hence $Z_n = \xi_n(\theta_0) + U_n$. If $\rho$ is symmetric, i.e. $\rho(h) = \rho(-h)$, then by Anderson's lemma $\xi_n(\theta_0) = 0$. For example, in the scalar parameter case, if $\rho(h) = (\alpha - 1(h < 0))h$, the constant $\xi_n(\theta_0) = q_\alpha J_n(\theta_0)^{-1/2}$, where $q_\alpha$ is the $\alpha$-quantile of $\mathcal{N}(0,1)$. We are now prepared to state the result.
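THEOREM 2 (LTE in Large Samples) Under Assumptions 1-4,
$$\sqrt{n}(\hat\theta - \theta_0) = \xi_n(\theta_0) + U_n + o_p(1), \qquad \Omega_n^{-1/2}(\theta_0)J_n(\theta_0)U_n \to_d \mathcal{N}(0, I).$$
Hence if the loss function $\rho_n$ is symmetric, i.e. $\rho_n(h) = \rho_n(-h)$ for all $h$, then $\xi_n(\theta_0) = 0$ for each $n$.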
In order for the Quasi-posterior distribution to provide valid large sample confidence intervals, the density of $W_n = \mathcal{N}\big(0, J_n(\theta_0)^{-1}\Omega_n(\theta_0)J_n(\theta_0)^{-1}\big)$ should coincide with that of $p_\infty^*(h)$. This requires
$$\operatorname{Var}(W_n) \equiv J_n(\theta_0)^{-1}\Omega_n(\theta_0)J_n(\theta_0)^{-1} = \int hh'\,p_\infty^*(h)\,dh = J_n(\theta_0)^{-1},$$
or equivalently
$$\Omega_n(\theta_0) = J_n(\theta_0),$$
which is a generalized information equality. The information equality is known to hold for regular, correctly specified likelihoods. It is also known to hold for appropriately constructed criterion functions of generalized method of moments, minimum distance estimators, generalized empirical likelihood estimators, and properly weighted extremum estimators; see Section 4.
Consider construction of the confidence intervals for the quantity $g(\theta_0)$, and suppose $g$ is continuously differentiable. Define
$$F_{g,n}(x) = \int_{\theta\in\Theta:\, g(\theta)\le x} p_n(\theta)\,d\theta \quad\text{and}\quad c_{g,n}(\alpha) = \inf\{x : F_{g,n}(x) \ge \alpha\}.$$
Then an LT confidence interval is given by $[c_{g,n}(\alpha/2),\, c_{g,n}(1 - \alpha/2)]$. As previously mentioned, these confidence intervals can be constructed by using the $\alpha/2$ and $1 - \alpha/2$ quantiles of the MCMC sequence $(g(\theta^{(1)}), \ldots, g(\theta^{(B)}))$ and thus are quite simple in practice. In order for the intervals to be valid in large samples, one needs to ensure the generalized information equality, which can be done easily through the use of optimal weighting in GMM and minimum-distance criterion functions or the use of generalized empirical likelihood functions; see Section 4.
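As a minimal sketch of how these intervals are read off an MCMC run, assuming NumPy, the function below takes stored draws and a user-supplied `g` and returns the empirical $\alpha/2$ and $1-\alpha/2$ quantiles of $(g(\theta^{(1)}), \ldots, g(\theta^{(B)}))$. The helper name `lt_confidence_interval` is ours, and the synthetic normal draws stand in for a real chain purely for illustration.

```python
import numpy as np


def lt_confidence_interval(draws, g, alpha=0.10):
    """Equal-tailed LT interval [c_{g,n}(alpha/2), c_{g,n}(1 - alpha/2)] for g(theta_0).

    draws : array of shape (B, d), an MCMC sequence theta^(1), ..., theta^(B) from p_n
    g     : callable mapping a length-d parameter vector to a scalar
    """
    g_draws = np.array([g(theta) for theta in draws])
    # Empirical quantiles of (g(theta^(1)), ..., g(theta^(B))) stand in for c_{g,n};
    # numpy's default interpolation differs from inf{x : F(x) >= a} only at order 1/B.
    lower, upper = np.quantile(g_draws, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Illustration with synthetic bivariate normal "draws" and g(theta) = theta_1 + theta_2
rng = np.random.default_rng(0)
draws = rng.normal(size=(100_000, 2))
interval = lt_confidence_interval(draws, lambda t: t[0] + t[1], alpha=0.10)
# interval is approximately (-2.33, 2.33), the 5% and 95% points of N(0, 2)
```

Because only sorted function values are needed, the same draws serve every $g$ of interest at no extra cost.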
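Consider now the usual asymptotic intervals based on the $\Delta$-method and any estimator with the property
$$\sqrt{n}(\hat\theta - \theta_0) = J_n(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt{n} + o_p(1).$$
Such intervals are usually given by
$$\Big[g(\hat\theta) + q_{\alpha/2}\sqrt{\nabla_\theta g(\hat\theta)'J_n^{-1}(\theta_0)\Omega_n(\theta_0)J_n^{-1}(\theta_0)\nabla_\theta g(\hat\theta)/n},\;\; g(\hat\theta) + q_{1-\alpha/2}\sqrt{\nabla_\theta g(\hat\theta)'J_n^{-1}(\theta_0)\Omega_n(\theta_0)J_n^{-1}(\theta_0)\nabla_\theta g(\hat\theta)/n}\Big],$$
where $q_\alpha$ is the $\alpha$-quantile of the standard normal distribution. The following theorem establishes the large sample correspondence of the Quasi-posterior confidence intervals to the above intervals.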
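THEOREM 3 (Large Sample Inference I) Suppose Assumptions 1-4 hold. In addition suppose that the generalized information equality holds:
$$\lim_{n\to\infty} J_n(\theta_0)\Omega_n(\theta_0)^{-1} = I.$$
Then for any $\alpha \in (0,1)$,
$$c_{g,n}(\alpha) - g(\hat\theta) - q_\alpha\sqrt{\nabla_\theta g(\theta_0)'J_n^{-1}(\theta_0)\nabla_\theta g(\theta_0)/n} = o_p\!\left(\frac{1}{\sqrt{n}}\right),$$
and
$$\lim_{n\to\infty} P^*\{c_{g,n}(\alpha/2) \le g(\theta_0) \le c_{g,n}(1 - \alpha/2)\} = 1 - \alpha.$$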
One practical limitation of this result arises in the case of regression criterion functions (M-estimators), where achieving the information equality may require nonparametric estimation of appropriate weights, e.g. as in censored quantile regression discussed in Section 4. This may be entirely avoided by using a different method for construction of confidence intervals. Instead of the Quasi-posterior quantiles, we can use the Quasi-posterior variance as an estimate of the inverse of the population Hessian matrix $J_n^{-1}(\theta_0)$, and combine it with any available estimate of $\Omega_n(\theta_0)$ (which typically is easier to obtain) in order to obtain the $\Delta$-method style intervals. The usefulness of this method is particularly evident in the censored quantile regression, where direct estimation of $J_n(\theta_0)$ requires the use of nonparametric methods.
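THEOREM 4 (Large Sample Inference II) Suppose Assumptions 1-4 hold. Define, for $\hat\theta = \int_\Theta \theta\, p_n(\theta)\,d\theta$,
$$\hat J_n^{-1}(\theta_0) = \int_\Theta n(\theta - \hat\theta)(\theta - \hat\theta)'\,p_n(\theta)\,d\theta,$$
and
$$c_{g,n}(\alpha) = g(\hat\theta) + q_\alpha\sqrt{\nabla_\theta g(\hat\theta)'\hat J_n^{-1}(\theta_0)\hat\Omega_n(\theta_0)\hat J_n^{-1}(\theta_0)\nabla_\theta g(\hat\theta)/n},$$
where $\hat\Omega_n(\theta_0)\Omega_n(\theta_0)^{-1} \to_p I$. Then $\hat J_n^{-1}(\theta_0)J_n(\theta_0) \to_p I$, and
$$\lim_{n\to\infty} P^*\{c_{g,n}(\alpha/2) \le g(\theta_0) \le c_{g,n}(1 - \alpha/2)\} = 1 - \alpha.$$

In practice $\hat J_n^{-1}(\theta_0)$ is computed by multiplying by $n$ the variance-covariance matrix of the MCMC sequence $S = (\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(B)})$.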
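The recipe of Theorem 4 can be sketched in a few lines, assuming NumPy and Python's `statistics.NormalDist` for the normal quantile $q_\alpha$. The function name `theorem4_interval` and its argument layout are hypothetical. Note that $\hat\Omega_n(\theta_0)$ has to be supplied from a separate estimator: the draws only deliver $\hat\theta$ and $\hat J_n^{-1}(\theta_0)$.

```python
import numpy as np
from statistics import NormalDist


def theorem4_interval(draws, n, g, grad_g, omega_hat, alpha=0.10):
    """Delta-method interval of Theorem 4 built from quasi-posterior draws.

    draws     : (B, d) MCMC sequence from p_n
    n         : sample size used in L_n
    g, grad_g : scalar function of theta and its gradient (length-d array)
    omega_hat : (d, d) consistent estimate of Omega_n(theta_0), obtained separately
    """
    draws = np.asarray(draws, dtype=float)
    theta_hat = draws.mean(axis=0)                          # quasi-posterior mean
    # n times the variance-covariance matrix of the draws estimates J_n^{-1}(theta_0)
    j_inv_hat = np.atleast_2d(n * np.cov(draws, rowvar=False))
    grad = np.atleast_1d(grad_g(theta_hat))
    avar = grad @ j_inv_hat @ omega_hat @ j_inv_hat @ grad
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(avar / n)
    center = g(theta_hat)
    # q_{alpha/2} = -q_{1-alpha/2}, so the interval is symmetric around g(theta_hat)
    return center - half_width, center + half_width
```

When the information equality does hold, passing $\hat\Omega_n = \hat J_n$ makes the sandwich collapse to $\hat J_n^{-1}$, and the interval should be close to the quantile-based one of Theorem 3.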
4 Applications to Selected Problems

This section further elaborates the approach through several examples. Assumptions 1-4 cover a wide variety of smooth econometric models (by virtue of Lemma 1 and Lemma 2). Thus, what follows next is mainly motivated by models with non-smooth moment equations, such as those occurring in Examples 1-3. Verification of the key Assumption 4 is not immediate in these examples, and Propositions 1-3 and the forthcoming examples show how to do this in a class of models that are of prime interest to us.
4.1 Generalized Method of Moments and Nonlinear Instrumental Variables

Going back to Example 2, recall that a typical model that underlies the applications of GMM is a set of population moment equations:
$$E m_i(\theta) = 0 \text{ if and only if } \theta = \theta_0. \tag{4.1}$$
Method of moments estimators involve maximizing an objective function of the form
$$L_n(\theta) = -n\,(g_n(\theta))'W_n(\theta)(g_n(\theta))/2, \tag{4.2}$$
$$g_n(\theta) = \frac{1}{n}\sum_{i=1}^n m_i(\theta), \tag{4.3}$$
$$W_n(\theta) = W(\theta) + o_p(1) \text{ uniformly in } \theta \in \Theta, \tag{4.4}$$
$$W(\theta) > 0 \text{ and continuous uniformly in } \theta \in \Theta, \tag{4.5}$$
$$W(\theta_0) = \Big[\lim_{n\to\infty}\operatorname{Var}\big[\sqrt{n}\,g_n(\theta_0)\big]\Big]^{-1}. \tag{4.6}$$
The choice (4.6) of the weighting matrix implies the generalized information equality under standard regularity conditions.

Generally, by a Central Limit Theorem, $\sqrt{n}\,g_n(\theta_0) \to_d \mathcal{N}(0, W^{-1}(\theta_0))$, so that the objective function can be interpreted as the approximate log-likelihood for the sample moments of the data $g_n(\theta)$. Thus we can think of GMM¹³ as an approach that specifies an approximate likelihood for selected moments of the data without specifying the likelihood of the entire data.
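To make this likelihood interpretation concrete, here is a minimal NumPy sketch of (4.2)-(4.3) with a fixed weighting matrix; the helper name `gmm_quasi_loglik` is ours. In the toy usage, with $m_i(\theta) = y_i - \theta$ and $W_n$ equal to the inverse sample variance, $e^{L_n(\theta)}$ is proportional to a normal likelihood for the sample mean with variance $s^2/n$.

```python
import numpy as np


def gmm_quasi_loglik(theta, moment_fn, weight):
    """GMM criterion (4.2)-(4.3): L_n(theta) = -n g_n(theta)' W_n g_n(theta) / 2.

    moment_fn(theta) must return an (n, p) array whose i-th row is m_i(theta);
    weight is a (p, p) positive definite weighting matrix W_n (held fixed here).
    """
    m = np.asarray(moment_fn(theta), dtype=float)
    n = m.shape[0]
    g_n = m.mean(axis=0)
    return -n * float(g_n @ weight @ g_n) / 2.0


# Toy moment m_i(theta) = y_i - theta with W_n = 1 / (sample variance)
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([[1.0 / y.var()]])
moments = lambda th: (y - th[0])[:, None]
at_mean = gmm_quasi_loglik(np.array([2.5]), moments, w)   # zero at the sample mean, where g_n = 0
off_mean = gmm_quasi_loglik(np.array([1.5]), moments, w)  # -n (1.0)^2 / (2 * 1.25) = -1.6
```

Exponentiating this function (times a prior) is all an MCMC sampler needs, so no separate likelihood for the full data has to be written down.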
We may impose Assumptions 1-4 directly on the GMM objective function. However, to highlight their plausibility and to elaborate on some examples that satisfy Assumption 4, consider the following proposition.
Proposition 1 (Method-of-Moments and Nonlinear IV) Suppose that Assumptions 1-2 hold, conditions (4.1)-(4.5) hold, and that for all $\theta$ in $\Theta$, $m_i(\theta)$ is stationary and ergodic, and
i. $G(\theta) = \nabla_\theta E m_i(\theta)$ is continuous,
ii. $\Delta_n(\theta_0)/\sqrt{n} = -G(\theta_0)'W(\theta_0)\sqrt{n}\,g_n(\theta_0) \to_d \mathcal{N}(0, \Omega(\theta_0))$, $\Omega(\theta_0) = G(\theta_0)'W(\theta_0)G(\theta_0)$,
iii. $J(\theta) = G(\theta)'W(\theta)G(\theta)$ is continuous,
iv. for any $\varepsilon > 0$, there is $\delta > 0$ such that
$$\limsup_{n\to\infty} P^*\left\{\sup_{|\theta - \theta'| < \delta}\frac{\sqrt{n}\,\big|(g_n(\theta) - g_n(\theta')) - (E g_n(\theta) - E g_n(\theta'))\big|}{1 + \sqrt{n}\,|\theta - \theta'|} > \varepsilon\right\} < \varepsilon. \tag{4.7}$$
Then Assumption 4 holds. In addition the information equality holds by construction. Therefore the conclusions of Theorems 1-4 hold with $\Delta_n(\theta_0)$, $\Omega_n(\theta_0) = \Omega(\theta_0)$ and $J_n(\theta_0) = J(\theta_0)$ defined above, where the condition (4.6) is only needed for the conclusions of Theorem 3 to hold.

Therefore, for symmetric loss functions $\rho_n$, the LTE is asymptotically equivalent to the GMM extremum estimator. Furthermore, the generalized information equality holds by construction, hence Quasi-posterior quantiles provide a computationally attractive method of "inverting" the objective function for the confidence intervals.

¹³ This does not help much in terms of providing formal asymptotic results for the GMM model.
For twice continuously differentiable smooth moment conditions, the smoothness conditions on $\nabla_\theta L_n(\theta)$ and $\nabla_{\theta\theta'}L_n(\theta)$ stated in Lemma 2 trivially imply condition iv in Proposition 1. More generally, Andrews (1994a), Pakes and Pollard (1989), and van der Vaart and Wellner (1996) provide many methods to verify that condition in a wide variety of method-of-moments models.
Example 3 Continued. Instrumental median regression falls outside of both the classical Bayesian approach and the classical smooth nonlinear IV approach of Amemiya (1977). Yet the conditions of Proposition 1 are satisfied under mild conditions:
i. $(Y_i, D_i, X_i, Z_i)$ is an iid data sequence, $E m_i(\theta_0) = 0$, and $\theta_0$ is identifiable,
ii. $\{m_i(\theta) = (\tau - 1(Y_i \le q(D_i, X_i, \theta)))Z_i,\ \theta \in \Theta\}$ is a Donsker class,¹⁴ and $E\sup_\theta |m_i(\theta)|^2 < \infty$,
iii. $G(\theta) = \nabla_\theta E m_i(\theta) = -E\big[f_{Y|D,X,Z}(q(D, X, \theta))\,Z\,\nabla_\theta q(D, X, \theta)'\big]$ is continuous,
iv. $J(\theta) = G(\theta)'W(\theta)G(\theta)$ is continuous in an open ball at $\theta_0$.
In this case the weighting matrix can be taken as
$$W_n(\theta) = \frac{1}{\tau(1 - \tau)}\Big(\frac{1}{n}\sum_{i=1}^n Z_iZ_i'\Big)^{-1},$$
so that the information equality holds. Indeed, in this case
$$\Omega(\theta_0) = G(\theta_0)'W(\theta_0)G(\theta_0) = J(\theta_0),$$
where $W(\theta_0) = \operatorname{plim} W_n(\theta_0) = [\operatorname{Var} m_i(\theta_0)]^{-1}$ and $\operatorname{Var} m_i(\theta_0) = \tau(1 - \tau)E Z_iZ_i'$.

When the model $q$ is linear and the dimension of $D$ is small, there are computable estimators in the literature.¹⁵ In more general models, the extremum estimates are quite difficult to compute, and the inference faces the well-known practical difficulty of estimating sparsity parameters. On the other hand, the Quasi-posterior median and quantiles are easy to compute and provide asymptotically valid confidence intervals. Note that the inference does not require the estimation of the density function. The simulation example given in Section 5 strongly supports this alternative approach.

¹⁴ This is a very weak restriction on the function class, and is known to hold for all practically relevant functional forms; see van der Vaart (1999).
¹⁵ These include e.g. the "inverse" quantile regression approach in Chernozhukov and Hansen (2001), which is an extension of Koenker and Bassett (1978)'s quantile regression to endogenous settings.
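A sketch of the instrumental median regression criterion with the weighting matrix above, assuming NumPy and, for simplicity only, a linear $q(D, X, \theta) = (D', X')\theta$ (the text allows general $q$). The function name is ours, and the simulated design, in which $D$ is endogenous and $Z$ is a valid instrument, is hypothetical: it merely shows that the criterion is near zero at the true parameter and sharply negative away from it. The resulting $e^{L_n}$ is a step function of $\theta$, which is exactly the kind of kernel an MCMC sampler handles without derivatives.

```python
import numpy as np


def ivqr_quasi_loglik(theta, y, regressors, instruments, tau=0.5):
    """GMM criterion for instrumental quantile regression, assuming a linear q(D, X, theta).

    m_i(theta) = (tau - 1(Y_i <= q_i(theta))) Z_i with q_i(theta) = regressors[i] @ theta,
    and W_n = [tau (1 - tau) * (1/n) sum_i Z_i Z_i']^{-1}, the weighting matrix in the text.
    """
    n = y.shape[0]
    indicator = (y <= regressors @ theta).astype(float)
    m = (tau - indicator)[:, None] * instruments             # row i is m_i(theta)
    g_n = m.mean(axis=0)
    w_n = np.linalg.inv(tau * (1.0 - tau) * instruments.T @ instruments / n)
    return -n * float(g_n @ w_n @ g_n) / 2.0


# Hypothetical endogenous design: D is correlated with the error through v,
# Z is independent of the error, and the true theta (intercept, slope) is (0, 1).
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
v = rng.normal(size=n)
d = z + v
y = d + (v + rng.normal(size=n))        # error has conditional median 0 given z
regressors = np.column_stack([np.ones(n), d])
instruments = np.column_stack([np.ones(n), z])
```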
Another important example which poses computational challenges is the estimation problem of Berry et al. (1995). This example is similar in nature to the instrumental quantile regression, and the application of the LT methods may be fruitful there.

4.2 Generalized Empirical Likelihood

A class of objective functions that are first-order equivalent to optimally weighted GMM (after recentering) can be formulated using the generalized empirical likelihood framework. A class of generalized empirical likelihood functions (GEL) is studied in Imbens et al. (1998), Kitamura and Stutzer (1997), and Newey and Smith (2001). For a set of moment equations $E m_i(\theta_0) = 0$ that satisfy the conditions of Section 4.1, define
$$L_n(\theta, \gamma) = \sum_{i=1}^n \big(s(m_i(\theta)'\gamma) - s(0)\big). \tag{4.8}$$
Then set
$$L_n(\theta) = L_n(\theta, \hat\gamma(\theta)), \tag{4.9}$$
where $\hat\gamma(\theta)$ solves
$$\hat\gamma(\theta) = \arg\inf_{\gamma\in\mathbb{R}^p} L_n(\theta, \gamma), \tag{4.10}$$
and $p = \dim(m_i)$.

The scalar function $s(\cdot)$ is a strictly convex, finite, and three times differentiable function on an open interval of $\mathbb{R}$ containing 0, denoted $\mathcal{V}$, and is equal to $+\infty$ outside such an interval. $s(\cdot)$ is normalized so that both $\nabla s(0) = 1$ and $\nabla^2 s(0) = 1$. The choices of the function $s(v) = -\ln(1 - v)$, $\exp(v)$, and $(1 + v)^2/2$ lead to the well-known empirical likelihood, exponential tilting, and continuous-updating GMM criterion functions.
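The inner problem (4.10) is a smooth convex minimization in $\gamma$, so a few Newton steps from $\gamma = 0$ usually suffice. The sketch below assumes NumPy; the names `S_CHOICES` and `gel_criterion` are ours. It covers exponential tilting and continuous updating only: the empirical likelihood choice $s(v) = -\ln(1 - v)$ is left out because plain Newton steps would need extra step-size control to keep every $m_i(\theta)'\gamma$ below 1. For continuous updating the minimizer is available in closed form, $L_n(\theta) = -\tfrac{n}{2}\bar g'\big(\tfrac1n\sum_i m_im_i'\big)^{-1}\bar g$ with $\bar g = g_n(\theta)$, which gives a convenient check.

```python
import numpy as np

# (s, s', s'') for two normalized choices of s from the text; both have s'(0) = s''(0) = 1.
S_CHOICES = {
    "exponential_tilting": (np.exp, np.exp, np.exp),
    "continuous_updating": (
        lambda v: (1.0 + v) ** 2 / 2.0,
        lambda v: 1.0 + v,
        lambda v: np.ones_like(v),
    ),
}


def gel_criterion(m, kind="exponential_tilting", max_iter=100, tol=1e-12):
    """L_n(theta) = inf_gamma sum_i [s(m_i(theta)'gamma) - s(0)], i.e. (4.8)-(4.10).

    m : (n, p) array whose rows are m_i(theta) at a fixed theta.
    The inner problem is convex in gamma and is solved by Newton's method from gamma = 0.
    Returns (L_n(theta), gamma_hat(theta)).
    """
    s, ds, d2s = S_CHOICES[kind]
    gamma = np.zeros(m.shape[1])
    for _ in range(max_iter):
        v = m @ gamma
        grad = m.T @ ds(v)                        # gradient of sum_i s(m_i'gamma)
        hess = (m * d2s(v)[:, None]).T @ m        # Hessian, positive definite for convex s
        step = np.linalg.solve(hess, grad)
        gamma = gamma - step
        if np.max(np.abs(step)) < tol:
            break
    value = float(np.sum(s(m @ gamma) - s(0.0)))
    return value, gamma
```

Since $\gamma = 0$ is feasible and gives zero, the returned $L_n(\theta)$ is never positive, matching its role as a log quasi-likelihood.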
Simple and practical sufficient conditions for Lemma 2 are given in Qin and Lawless (1994), Imbens et al. (1998), Kitamura and Stutzer (1997), Newey and Smith (2001), Kitamura (1997), including stationary weakly dependent data, and Christoffersen et al. (1999). Thus, the application of LTE's to these problems is immediate.

To illustrate a further use of LTE's, we state a set of simple conditions geared towards non-smooth microeconometric applications such as the instrumental quantile regression problem. These regularity conditions imply the first order equivalence of the GEL and GMM objective functions. The Donskerness condition below is a weak assumption that is known to hold for all reasonable linear and nonlinear functional forms encountered in practice, as discussed in van der Vaart (1999).

Proposition 2 (Empirical Likelihood Problems) Suppose Assumptions 1-2 hold, that condition (4.1) holds and that $m_i(\theta)$ is iid, and that the following conditions are satisfied for some $\delta > 0$ and all $\theta \in \Theta$:
i. $dP[m_i(\theta) < x]/d\theta$ is continuous in $\theta$ uniformly in $x$,
ii. $\sup_{|\theta - \theta_0| \le \delta}|m_i(\theta)| < K$ a.s., for some constant $K$,
iii. $V(\theta_0) = E\{m_i(\theta_0)m_i(\theta_0)'\} > 0$,
iv. $\{m_i(\theta), \theta \in \Theta\}$ is a Donsker class.
Then Assumptions 3 and 4 hold, and thus the conclusions of Theorems 1-4 are true with
$$\Delta_n(\theta_0)/\sqrt{n} = -G(\theta_0)'V(\theta_0)^{-1}\sqrt{n}\,g_n(\theta_0) \to_d \mathcal{N}(0, \Omega(\theta_0)), \qquad \sqrt{n}\,g_n(\theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n m_i(\theta_0),$$
$$J(\theta_0) = \Omega(\theta_0) = G(\theta_0)'V(\theta_0)^{-1}G(\theta_0), \qquad G(\theta_0) = \nabla_\theta E m_i(\theta_0).$$
The information equality holds in this case.
Another (equivalent) way to proceed is through the dual formulation. Consider the following criterion function
$$L_n(\theta) = \sup_{\pi_1, \ldots, \pi_n \in [0,1]} \sum_{i=1}^n h(\pi_i) \quad\text{subject to}\quad \sum_{i=1}^n m_i(\theta)\pi_i = 0, \quad \sum_{i=1}^n \pi_i = 1, \tag{4.11}$$
where $h$ is (the negative of) the Cressie-Read divergence criterion, cf. Newey and Smith (2001),
$$h(\pi) = -\frac{(n\pi)^{\gamma+1} - 1}{\gamma(\gamma + 1)}.$$
$L_n(\theta)$ in (4.11) is the generalized empirical likelihood function for $\theta$ with the concentrated-out probabilities. In fact, (4.11) corresponds to (4.9) by the argument given in Qin and Lawless (1994), pp. 303-304, so that Proposition 2 covers (4.11) as a special case up to renormalization. Taking $\gamma \to -1$ yields the empirical likelihood case, where the empirical probabilities $\hat\pi_i(\theta)$'s are obtained in (4.11) using the maximum likelihood method. Taking $\gamma \to 0$ yields the exponential tilting case, where the $\hat\pi_i(\theta)$'s are obtained through minimization of the Kullback-Leibler distance from the empirical distribution. Taking $\gamma = 1$ yields the continuous-updating case, where the $\hat\pi_i(\theta)$'s are obtained through the minimization of the Euclidean distance from the empirical distribution.

Each approach generates the implied probabilities $\hat\pi_i(\theta)$ given $\theta$. Qin and Lawless (1994) and Newey and Smith (2001) provide the formulas:
$$\hat\pi_i(\theta) = \frac{\nabla s(\hat\gamma(\theta)'m_i(\theta))}{\sum_{j=1}^n \nabla s(\hat\gamma(\theta)'m_j(\theta))}.$$
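The Quasi-posterior for $\theta$ and $\hat\pi_i(\theta)$ can be used for predictive inference. Suppose $m_i(\theta) = m(X_i, \theta)$ for some random vector $X_i$. Then the Quasi-posterior predictive probability for $P\{X_i \in A\}$ is given by
$$\hat P\{X_i \in A\} = \int \sum_{i=1}^n \hat\pi_i(\theta)1[X_i \in A]\,p_n(\theta)\,d\theta = \int h_n(\theta)\,p_n(\theta)\,d\theta, \qquad h_n(\theta) \equiv \sum_{i=1}^n \hat\pi_i(\theta)1[X_i \in A],$$
which can be computed by averaging over the MCMC sequence evaluated at $h_n$, $(h_n(\theta^{(1)}), \ldots, h_n(\theta^{(B)}))$. It follows similarly to the proof of Theorem 1 in Qin and Lawless (1994) that $\sqrt{n}\big(\hat P\{X_i \in A\} - P\{X_i \in A\}\big) \to_d \mathcal{N}(0, \Omega_h)$, where
$$\Omega_h = P\{X \in A\}\big(1 - P\{X \in A\}\big) - E\big[m_i(\theta_0)1\{X_i \in A\}\big]'\,U\,E\big[m_i(\theta_0)1\{X_i \in A\}\big],$$
$$U = V(\theta_0)^{-1}\big\{I - G(\theta_0)J(\theta_0)^{-1}G(\theta_0)'V(\theta_0)^{-1}\big\}.$$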
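A minimal sketch of the predictive calculation for the exponential tilting case ($s = \exp$, so $\nabla s = \exp$), assuming NumPy. The names `et_implied_probabilities` and `predictive_probability`, and the toy scalar moment $m(X_i, \theta) = X_i - \theta$ one might use to exercise them, are illustrative assumptions. `predictive_probability` averages $h_n(\theta^{(b)})$ over stored draws, as described above.

```python
import numpy as np


def et_implied_probabilities(m, max_iter=100, tol=1e-12):
    """Exponential-tilting implied probabilities pi_i = exp(gamma'm_i) / sum_j exp(gamma'm_j).

    m : (n, p) array of m_i(theta); gamma solves the inner problem (4.10) with s = exp,
    found by Newton's method from gamma = 0.
    """
    gamma = np.zeros(m.shape[1])
    for _ in range(max_iter):
        w = np.exp(m @ gamma)
        step = np.linalg.solve((m * w[:, None]).T @ m, m.T @ w)
        gamma = gamma - step
        if np.max(np.abs(step)) < tol:
            break
    w = np.exp(m @ gamma)
    return w / w.sum()


def predictive_probability(draws, x, moment_fn, in_set):
    """Average of h_n(theta) = sum_i pi_i(theta) 1[X_i in A] over MCMC draws theta^(1..B).

    moment_fn(theta, x) returns the (n, p) array of m(X_i, theta);
    in_set(x) returns a boolean array marking X_i in A.
    """
    indicator = np.asarray(in_set(x), dtype=float)
    h_values = [et_implied_probabilities(moment_fn(theta, x)) @ indicator for theta in draws]
    return float(np.mean(h_values))
```

By construction the implied probabilities satisfy $\sum_i \hat\pi_i(\theta)m_i(\theta) = 0$, so they reweight the sample toward the moment restriction at each draw of $\theta$.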
4.3 M-estimation

M-estimators, which include many linear and nonlinear regressions as special cases, typically maximize objective functions of the form
$$L_n(\theta) = \sum_{i=1}^n m_i(\theta).$$
$m_i(\theta)$ need not be the log likelihood function of observation $i$, and may depend on preliminary nonparametric estimation. Assumptions 1-3 usually are satisfied by uniform laws of large numbers and by unique identification of the parameter; see for example Amemiya (1985) and Newey and McFadden (1994). The next proposition gives a simple set of sufficient conditions for Assumption 4.
Proposition 3 (M-problems) Suppose Assumptions 1-3 hold for the criterion function specified above with the following additional conditions: uniformly in $\theta$ in an open neighborhood of $\theta_0$, $m_i(\theta)$ is stationary and ergodic, and
i. there exists $\dot m_i(\theta_0)$ such that $E\dot m_i(\theta_0) = 0$ and, for some $\delta > 0$,
$$\left\{\frac{m_i(\theta) - m_i(\theta_0) - \dot m_i(\theta_0)'(\theta - \theta_0)}{|\theta - \theta_0|} : |\theta - \theta_0| < \delta\right\}$$
is a Donsker class, and
$$E\big[m_i(\theta) - m_i(\theta_0) - \dot m_i(\theta_0)'(\theta - \theta_0)\big]^2 = o(|\theta - \theta_0|^2),$$
ii. $J(\theta) = -\nabla_{\theta\theta'}E[m_i(\theta)]$ is continuous and nonsingular in a ball at $\theta_0$.
Then Assumption 4 holds. Therefore, the conclusions of Theorems 1, 2, and 4 hold. If in addition $J(\theta_0) = \Omega(\theta_0)$, then the conclusions of Theorem 3 also hold.
The above conditions apply to many well known examples such as LAD; see for example van der Vaart and Wellner (1996). Therefore, for many nonlinear regressions, Quasi-posterior means, modes, and medians are asymptotically equivalent, and Quasi-posterior quantiles provide asymptotically valid confidence statements if the generalized information equality holds. When the information equality fails to hold, the method of Theorem 4 provides valid confidence intervals.

Example 1 Continued. Under the conditions given in Powell (1984) or Newey and Powell (1990) for the censored quantile regression, it is not difficult to show that the assumptions of Proposition 3 are satisfied. Furthermore, when the weights are nonparametrically estimated, the conditions of Newey and Powell (1990) imply Assumption 4. Under iid sampling, the use of efficient weighting
$$w_i^* = \frac{f_i}{\tau(1 - \tau)}, \quad\text{where } f_i = f_{Y_i|X_i}(q_i),\ q_i = q(X_i; \theta_0), \tag{4.12}$$
validates the generalized information equality, and the Quasi-posterior quantiles form asymptotically valid confidence intervals. Indeed, since
$$J(\theta_0) = \frac{1}{\tau(1 - \tau)}E f_i^2\,\nabla q_i\nabla q_i' \quad\text{for } \nabla q_i = \partial q(X_i; \theta_0)/\partial\theta, \tag{4.13}$$
and
$$\frac{\Delta_n(\theta_0)}{\sqrt{n}} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{f_i}{\tau(1 - \tau)}\big(\tau - 1(Y_i < q_i)\big)\nabla q_i \to_d \mathcal{N}(0, \Omega(\theta_0)), \tag{4.14}$$
with
$$\Omega(\theta_0) = \frac{1}{\tau(1 - \tau)}E f_i^2\,\nabla q_i\nabla q_i', \tag{4.15}$$
we have
$$\Omega(\theta_0) = J(\theta_0). \tag{4.16}$$
For this class of problems, the Quasi-posterior means and medians are asymptotically equivalent to the extremum estimators. The Quasi-posterior quantiles provide asymptotically valid confidence intervals when the efficient weights are used. However, estimation of the efficient weights requires preliminary estimation of the parameter $\theta_0$. When other weights are used, the method of Theorem 4 provides valid confidence intervals.
5 Computation and Simulation Examples

In this section we briefly discuss the MCMC method and present simulation examples.

5.1 Markov Chain Monte Carlo

The Quasi-posterior density is proportional to
$$p_n(\theta) \propto e^{L_n(\theta)}\pi(\theta).$$
In most cases we can easily compute $e^{L_n(\theta)}\pi(\theta)$. However, computation of the point estimates and confidence intervals typically requires evaluation of integrals like
$$\frac{\int_\Theta g(\theta)e^{L_n(\theta)}\pi(\theta)\,d\theta}{\int_\Theta e^{L_n(\theta)}\pi(\theta)\,d\theta} \tag{5.1}$$
for various functions $g$. For problems for which no analytic solution exists for (5.1), especially in high dimensions, MCMC methods provide powerful tools for evaluating integrals like the one above. See for example Chib (2001), Geweke and Keane (2001), and Robert and Casella (1999) for excellent treatments.
MCMC is a collection of methods that produce an ergodic Markov chain with the stationary distribution $p_n$. Given a starting value $\theta^{(0)}$, a chain $(\theta^{(t)}, 1 \le t \le B)$ is generated using a transition kernel with stationary distribution $p_n$, which ensures the convergence of the marginal distribution of $\theta^{(B)}$ to $p_n$. For sufficiently large $B$, the MCMC methods produce a dependent sample $(\theta^{(0)}, \theta^{(1)}, \ldots, \theta^{(B)})$ whose empirical distribution approaches $p_n$. The ergodicity and construction of the chains usually imply that as $B \to \infty$,
$$\frac{1}{B}\sum_{t=1}^B g(\theta^{(t)}) \to_p \int_\Theta g(\theta)\,p_n(\theta)\,d\theta.$$
We stress that this technique does not rely on the likelihood principle and can be fruitfully used for computation of LTE's. (Appendix … provides the formal details.)

One of the most important MCMC methods is the Metropolis-Hastings algorithm.

Metropolis-Hastings (MH) algorithm with Quasi-Posteriors. Given the Quasi-posterior density $p_n(\theta) \propto e^{L_n(\theta)}\pi(\theta)$, known up to a constant, and a prespecified conditional density $q(\theta'|\theta)$, generate $(\theta^{(0)}, \ldots, \theta^{(B)})$ in the following way:
1. Choose a starting value $\theta^{(0)}$.
2. Generate $\xi$ from $q(\xi|\theta^{(j)})$.
3. Update $\theta^{(j+1)}$ from $\theta^{(j)}$ for $j = 1, 2, \ldots$, using
$$\theta^{(j+1)} = \begin{cases}\xi & \text{with probability } \rho(\theta^{(j)}, \xi),\\ \theta^{(j)} & \text{with probability } 1 - \rho(\theta^{(j)}, \xi),\end{cases}$$
where
$$\rho(x, y) = \min\left(\frac{e^{L_n(y)}\pi(y)\,q(x|y)}{e^{L_n(x)}\pi(x)\,q(y|x)},\ 1\right).$$
Note that the most important quantity in the algorithm is the probability $\rho(x, y)$ of the move from an "old" point $x$ to the "new" point $y$, which depends on how much of an improvement in $e^{L_n(y)}\pi(y)$ a possible "new" value $y$ yields relative to $e^{L_n(x)}\pi(x)$ at the "old" value $x$. Thus, the generated chain of draws spends a relatively high proportion of time in the higher density regions and a lower proportion in the lower density regions. Because such proportions of times are balanced in the right way, the generated sequence of parameter draws has the requisite marginal distribution, which we then use for the computation of means, medians, and quantiles. (How closely the sequence travels near the mode is not relevant.)
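The algorithm above can be sketched as follows, assuming NumPy and a Gaussian random-walk proposal. Because that proposal is symmetric, $q(x|y) = q(y|x)$ and the ratio in $\rho(x, y)$ reduces to $e^{L_n(y)}\pi(y)/(e^{L_n(x)}\pi(x))$. The function name and the `step` tuning parameter are ours, not the paper's; comparing on the log scale avoids overflow when $L_n$ is of order $n$.

```python
import numpy as np


def rw_metropolis_hastings(log_kernel, theta0, n_draws, step, seed=0):
    """Random-walk MH draws from p_n(theta), proportional to exp(L_n(theta)) pi(theta).

    log_kernel(theta) must return L_n(theta) + log pi(theta) (any additive constant is fine).
    The Gaussian proposal is symmetric, so q cancels in the acceptance probability rho(x, y).
    Returns the (n_draws, d) array of draws and the acceptance rate.
    """
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    current = log_kernel(theta)
    draws = np.empty((n_draws, theta.size))
    accepted = 0
    for j in range(n_draws):
        proposal = theta + step * rng.standard_normal(theta.size)
        candidate = log_kernel(proposal)
        # Accept with probability rho = min(exp(candidate - current), 1).
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
        draws[j] = theta
    return draws, accepted / n_draws
```

After discarding a user-chosen burn-in, quasi-posterior means, medians, and quantile intervals are simply `draws.mean(axis=0)`, `np.median(draws, axis=0)`, and `np.quantile(draws, [a / 2, 1 - a / 2], axis=0)`.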
Another important choice is the transition kernel $q$, also called the instrumental density. It turns out that a wide variety of kernels yield Markov chains that converge to the distribution of interest. One canonical implementation of the MH algorithm is to take
$$q(x|y) = f(|x - y|),$$
where $f$ is a density symmetric around 0, such as the Gaussian or the Cauchy density. This implies that the chain $(\theta^{(j)})$ is a random walk. This is the implementation we used in this paper. Chib (2001), Geweke and Keane (2001), and Robert and Casella (1999) can be consulted for important details concerning the implementation and convergence monitoring of the algorithm.

It is now worth repeating that the main motivation behind the LTE approach is its efficiency properties (stated in Sections 3 and 4) as well as its computational attractiveness. Indeed, the LTE approach is as efficient as the extremum approach, but may avoid the computational curse of dimensionality through the use of MCMC. LTE's are typically means or quantiles of a Quasi-posterior distribution, hence can be computed (estimated) at the parametric rate $1/\sqrt{B}$,¹⁶ where $B$ is the number of MCMC draws (functional evaluations). Indeed, under canonical implementations, the MCMC chains are geometrically mixing, so the rates of convergence are the same as under independent sampling. In contrast, the extremum estimator (mode) is computed (estimated) by MCMC and similar grid-based algorithms at the nonparametric rate $(1/B)^{p/(d+2p)}$, where $d$ is the parameter dimension and $p$ is the smoothness order of the objective function.

We used an optimistic tone regarding the performance of MCMC. Indeed, in the problems we study, the objective functions have numerous local optima, but all exhibit a well pronounced global optimum. These problems are important, and therefore the good performance of MCMC and the derived estimators is encouraging. However, various pathological cases can be constructed; see Robert and Casella (1999). Functions may have multiple separated global modes (or approximate modes), in which case MCMC may require extended time for convergence. Another potential problem is that the initial draw $\theta^{(0)}$ may be very far in the tails of the posterior $p_n(\theta)$. In this case, MCMC may also take extended time to converge to the stationary distribution. In the problems we looked at, this may be avoided by choosing a starting value based on economic considerations or other simple considerations. For example, in the censored median regression example, we may use the starting values based on an initial Tobit regression. In the instrumental median regression, we may use the two stage least squares estimates as the starting values.

¹⁶ Note that the rates are used for the informal motivation. We fix $d$ in the discussion, but the rate may typically increase linearly or polynomially in $d$ if $d$ is allowed to grow.
5.2 Monte Carlo Example 1: Censored Median Regression

As discussed in Section 2, a large literature has been devoted to the computation of Powell's censored median regression estimator. In the simulation example reported below, we find that in both small and large samples with a high degree of censoring, LT estimation may be a useful alternative to the popular iterated linear programming algorithm of Buchinsky (1991). The model we consider is
$$Y^* = \beta_0 + X'\beta + u, \quad X = \mathcal{N}(0, I_3), \quad u = X_2^2\,\mathcal{N}(0, 1), \quad Y = \max(0, Y^*).$$
The true parameter $(\beta_0, \beta_1, \beta_2, \beta_3)$ is $(-6, 3, 3, 3)$, which produces about 40% censoring. The LTE is based on Powell's objective function $L_n(\beta) = -\sum_{i=1}^n |Y_i - \max(0, \beta_0 + X_i'\beta)|$. The initial draw of the MCMC series is taken to be the ordinary least squares estimate, and other details are summarized in Appendix B.

Table 1 reports the results. The first row for the iterated linear programming (ILP) results reports the performance of the algorithm among the subset of simulations for which the algorithm does not converge to the local minimum at 0. The number in parentheses in the ILP results indicates the number of times that this algorithm converges to a local minimum of 0. The second row of the ILP results reports the results for all simulation runs, including those for which the ILP algorithm does not move away from the local minimum of 0. The LTE's (Quasi-posterior mean and median) never converge to the local minimum of 0, and they compare favorably to the ILP even when the local minima are excluded from the ILP results, as can be seen from Table 1. When the local minima are included in the ILP results, LTE's do markedly better.

[Table 1 goes here.]
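The following sketch reproduces the structure of this experiment on a single simulated sample, assuming NumPy. The design follows our reading of the text ($X \sim \mathcal{N}(0, I_3)$, $u = X_2^2\,\mathcal{N}(0,1)$, $\beta = (-6, 3, 3, 3)$); the sampler settings (`n_draws`, `burn`, `step`) are illustrative choices rather than those of Appendix B, and the function names are hypothetical. A flat prior is assumed, so the log kernel is Powell's objective itself.

```python
import numpy as np


def simulate_censored_design(n, rng):
    """Design as read from the text: X ~ N(0, I_3), u = X_2^2 * N(0, 1), Y = max(0, Y*)."""
    beta = np.array([-6.0, 3.0, 3.0, 3.0])                 # (beta_0, beta_1, beta_2, beta_3)
    x = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
    u = x[:, 2] ** 2 * rng.standard_normal(n)              # column 2 holds X_2
    y = np.maximum(0.0, x @ beta + u)
    return y, x, beta


def powell_lte(y, x, start, n_draws=20_000, burn=5_000, step=0.03, seed=0):
    """Quasi-posterior mean and median for Powell's objective under a flat prior.

    L_n(b) = -sum_i |Y_i - max(0, x_i'b)| is sampled by random-walk Metropolis-Hastings;
    x must include the constant column. Tuning values are illustrative only.
    """
    rng = np.random.default_rng(seed)

    def log_kernel(b):
        return -np.abs(y - np.maximum(0.0, x @ b)).sum()

    b = np.asarray(start, dtype=float)
    current = log_kernel(b)
    kept = np.empty((n_draws - burn, b.size))
    for j in range(n_draws):
        proposal = b + step * rng.standard_normal(b.size)
        candidate = log_kernel(proposal)
        if np.log(rng.uniform()) < candidate - current:
            b, current = proposal, candidate
        if j >= burn:
            kept[j - burn] = b
    return kept.mean(axis=0), np.median(kept, axis=0)
```

For example, `y, x, beta = simulate_censored_design(400, np.random.default_rng(0))` followed by `powell_lte(y, x, start=np.linalg.lstsq(x, y, rcond=None)[0])` starts the chain at OLS, as in the text; in practice convergence from such a start should be monitored.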
5.3 Monte Carlo Example 2: Instrumental Quantile Regression

We consider a simulation example similar to that in Koenker (1994). The model is
$$Y = \alpha_0 + D'\beta_0 + u, \quad u = \sigma(D)\epsilon, \quad D = \exp\mathcal{N}(0, I_3), \quad \epsilon = \mathcal{N}(0, 1), \quad \sigma(D) = \Big(1 + \sum_{i=1}^3 D^{(i)}\Big)\Big/5.$$
The true parameter $(\alpha_0, \beta_0)$ equals 0, and we consider the instrumental moment conditions
$$g_n(\theta) = \frac{1}{n}\sum_{i=1}^n\Big(\frac12 - 1(Y_i \le \alpha + D_i'\beta)\Big)Z_i, \qquad Z_i = (1, D_i')',$$
with the weighting matrix $W_n$ of Example 3 for $\tau = 1/2$. In simulations, the initial draw of the MCMC series is taken to be the ordinary least squares estimate, and other details are summarized in Appendix B.

While instrumental median regression is designed specifically for endogenous or nonlinear models, we use a classical exogenous example in order to provide a contrast with a clear undisputed benchmark: the standard linear quantile regression. The benchmark provides a reliable and high-quality estimation method for the exogenous model. In this regard, the performance of the LT estimation and inference, reported in Table 2 and Table 3, is encouraging.

Table 2 summarizes the performance of LTE's and the standard quantile regression estimator. Table 3 compares the performance of the LT confidence intervals to the standard inference method for quantile regression implemented in S-plus 4.0. The reported results are averaged across the slope parameters. The root mean square errors of the LTE's are no larger than those of quantile regression. Other criteria demonstrate similar performance of the two methods, as predicted by the asymptotic theory. The coverage of Quasi-posterior quantile confidence intervals is also close to the nominal level of 90% in both small and large samples. It is also noteworthy that the intervals do not require nonparametric density estimation, as the standard method does.

[Tables 2 and 3 go here.]

6 An Illustrative Empirical Application
The following illustrates the use of LT estimation in practice. We consider the problem of forecasting the conditional quantiles or value-at-risk (VaR) of the Occidental Petroleum (NYSE:OXY) security returns. The problem of forecasting quantiles of return distributions is not only important for economic analysis, but is fundamental to the real-life activities of financial firms. We offer an econometric analysis of a dynamic conditional quantile forecasting model, and show that the LTE approach provides a simple and effective method of estimating such models (despite the difficulties inherent in the estimation).
The dataset consists of 2527 daily observations (September, 1986 - November, 1998) on $Y_t$, the one-day returns of the Occidental Petroleum (NYSE:OXY) security, and $X_t$, a vector of returns and prices of other securities that affect the distribution of $Y_t$: a constant, the lagged one-day return of the Dow Jones Industrials (DJI), the lagged return on the spot price of oil (NCL, front-month contract on crude oil on NYMEX), and the lagged return $Y_{t-1}$. The choice of variables follows a general principle in which the relevant conditioning information for estimating value-at-risk of a stock return, $X_t$, may contain such variables as a market index of corresponding capitalization and type (for instance, the S&P500 returns for a large-cap value stock), the industry index, a price of a commodity or some other traded risk that the firm is exposed to, and lagged values of its stock price.
Two functional forms of predictive $\tau$-th quantile regressions were estimated:

Linear Model: $Q_{Y_{t+1}}(\tau|I_t, \theta(\tau)) = X_t'\theta(\tau)$,

Dynamic Model: $Q_{Y_{t+1}}(\tau|I_t, \theta(\tau), \varrho(\tau)) = X_t'\theta(\tau) + \varrho(\tau)\,Q_{Y_t}(\tau|I_{t-1}, \theta(\tau), \varrho(\tau))$,

where $Q_{Y_{t+1}}(\tau|I_t, \theta(\tau))$ denotes the $\tau$-th conditional quantile of $Y_{t+1}$ conditional on the information $I_t$ available at time $t$. In other words, $Q_{Y_{t+1}}(\tau|I_t, \theta(\tau))$ is the value-at-risk at the probability level $\tau$. The idea behind the dynamic models is to better incorporate the entire past information and better predict risk clustering, as introduced by Engle and Manganelli (2001). The nonlinear dynamic models described by Engle and Manganelli (2001) are appealing, but appear to be difficult to estimate using conventional extremum methods; see Engle and Manganelli (2001) for discussion. An extended empirical analysis of the linear model is given in Chernozhukov and Umantsev (2001).

The LT estimation and inference strategy is based on the Koenker and Bassett (1978) criterion function,
n
L n (9,e) = -Y,MT)PT(Y
t
-Q
Yt
(T\it - u e, e )),
(e.i)
t=s
where p T (u) — (t — l(u < 0))m. This criterion function is similar to that described in Example 1,
with the exception that there is no censoring. The starting value s = 100 initializes the recursive
specification so that the
numerically negligible
imputed
conditions (taken to be the marginal quantiles) have a
we constructed the LT estimates
In the first step,
w
The
initial
effect.
t
=
(r)
1/t(1
—
using the flat weights
t) for each
t
—
s,...,T.
results of the first step are not presented here, but they are very similar to those reported
below. Because the weights are not optimal, the information equality does not hold, hence Quasiposterior quantiles are not valid for confidence intervals. However, the confidence intervals suggested
in
Theorem
consistent
Under the assumption
4 lead to asymptotically valid inference.
specification, stationary sampling,
and the conditions specified
in
of correct
Proposition
3,
and asymptotically normal
(?(r)i(r)')^^(0,J(9o)- "(«o)J(Co)- i )
1
where
for V<? t (r)
= dQ Yl {r\It-i,Q{r),9{r))ld{Q,9')'
J(9
and
for
^M = -^= ^L,
fl(0o)
fr
= l™
the model
is
)
and q t (r)
= EfYtlIt _Mt(T))Vq
~ Iff <
£A
^—
—
T->co J
If
dynamic
the LTE's are
*(t))]
t
(6.2)
,
= Q n (r\It -i,e(r),9(T)),
(T)Wqt (r)',
V ft (r),
n (0 o )A n (0 o )'
-
r(l
- rJEVftWVftW'.
S
not correctly specified, then, for example, the Newey and West (1987) estimator
provides a consistent and robust procedure for estimation of the limit variance Q{9
26
).
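A minimal Python sketch of the two ingredients of (6.1) may help fix ideas: the recursive quantile of the dynamic model and the weighted Koenker-Bassett criterion. It assumes a 0-based time index, plain lists, and a user-supplied initial quantile `q_init` (the text uses marginal quantiles together with s = 100); the function names are illustrative and are not taken from the authors' code.

```python
def dynamic_quantiles(X, theta, rho, q_init):
    """Recursive conditional quantiles: q[t+1] = X[t]'theta + rho * q[t], with q[0] = q_init.

    X is a list of covariate vectors X_t; q[t] plays the role of Q_{Y_t}(tau | I_{t-1}).
    Setting rho = 0 gives the linear model.
    """
    q = [q_init]
    for x_t in X:
        q.append(sum(a * b for a, b in zip(x_t, theta)) + rho * q[-1])
    return q


def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = (tau - 1(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def lt_criterion(y, q, w, tau, s):
    """L_n = -sum_{t >= s} w_t * rho_tau(y_t - q_t), as in (6.1); indices are 0-based here."""
    return -sum(w[t] * check_loss(y[t] - q[t], tau) for t in range(s, len(y)))


# Usage with hypothetical data and the first-step flat weights w_t = 1 / (tau (1 - tau))
tau = 0.5
X = [[1.0, 0.2], [1.0, -0.1], [1.0, 0.4]]
q = dynamic_quantiles(X, [0.0, 1.0], 0.3, q_init=0.0)
y = [0.1, 0.3, -0.2, 0.5]
w = [1.0 / (tau * (1 - tau))] * len(y)
print(lt_criterion(y, q, w, tau, s=1))
```

The quasi-posterior is then proportional to `exp(lt_criterion(...))` times the prior, which is the object the MCMC chain samples from.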
The estimation of the matrix $J(\theta_0)^{-1}$ can be done through the use of nonparametric methods as in Powell (1984). Alternatively, as suggested in Theorem 4, we can use the variance-covariance matrix of the MCMC sequence of parameter draws multiplied by $n = T - s$ as a consistent estimate of $J(\theta_0)^{-1}$. Plugging the estimates into the variance expression (6.2), we obtain the standard errors and confidence intervals that are qualitatively similar to those reported in Figures 4-7.
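The plug-in standard errors described above can be sketched for a scalar parameter (the matrix case replaces squares by outer products). `mcmc_Jinv` follows the Theorem 4 suggestion of multiplying the variance of the MCMC draws by n, and `omega_hat` uses the correct-specification form of $\Omega(\theta_0)$ given after (6.2); under misspecification a Newey-West estimator would replace it, which this sketch does not implement. The function names and the scalar restriction are assumptions made for illustration.

```python
import math


def mcmc_Jinv(draws, n):
    """n times the variance of the MCMC draws: an estimate of J(theta0)^{-1} (scalar case)."""
    m = sum(draws) / len(draws)
    return n * sum((d - m) ** 2 for d in draws) / len(draws)


def omega_hat(w, grad, tau):
    """tau (1 - tau) * average of w_t^2 * (grad q_t)^2: plug-in for Omega(theta0) under correct specification."""
    return tau * (1 - tau) * sum(wt * wt * g * g for wt, g in zip(w, grad)) / len(w)


def sandwich_se(draws, n, w, grad, tau):
    """Standard error implied by (6.2): sqrt(J^{-1} Omega J^{-1} / n)."""
    jinv = mcmc_Jinv(draws, n)
    return math.sqrt(jinv * omega_hat(w, grad, tau) * jinv / n)
```

In practice `grad` would hold the derivatives of the fitted quantile with respect to the parameter at each $t$, evaluated at the point estimate.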
In order to illustrate the use of Quasi-posterior quantiles (Theorem 3) and improve estimation efficiency, we also carried out the second step estimation using the Koenker-Bassett criterion function (6.1) with the weights

$$w_t(\tau) = \frac{1}{\tau(1-\tau)} \cdot \frac{h}{Q_{Y_t}(\tau + h/2|I_{t-1}, \hat\theta(\tau), \hat\varrho(\tau)) - Q_{Y_t}(\tau - h/2|I_{t-1}, \hat\theta(\tau), \hat\varrho(\tau))},$$

where $h \propto C n^{-1/3}$ and $C > 0$ is chosen using the rule given in Koenker (1994). Under the assumption of correct dynamic specification, these weights imply the generalized information equality, which validates Quasi-posterior quantiles for inference purposes, as in (4.12)-(4.16). The .05-th, .5-th, and .95-th Quasi-posterior quantiles are computed for each coefficient $\theta_j(\tau)$ ($j = 1, \ldots, 4$) and $\varrho(\tau)$ based on the second step estimates, and then used to form the point estimates and the 90%-confidence intervals, which are reported in Figures 4-7 for $\tau = .2, .4, \ldots, .8$.

Figures 2 and 3 present the estimated surfaces of the conditional VaR functions of the dynamic and linear models, respectively, plotted in the time-probability level coordinates $(t, p)$, where $p$ is the quantile index. The conventional VaR reporting typically involves the probability levels at a given $\tau$. We report the VaR surface formed by varying $\tau$ over many values. Clearly, the whole VaR surface represents a much more complete depiction of conditional risk. The dynamics depicted in Figures 2 and 3 unambiguously indicate certain dates on which market risk tends to be much higher than its usual level. The difference between the linear and the recursive model is also striking. The risk surface generated by the recursive model is much smoother and much more persistent. Furthermore, this difference is statistically significant, as Figure 7 shows.

Focusing on the recursive model in the following analysis, let us examine the economic and statistical interpretation of the slope coefficients $\theta_2(\cdot)$, $\theta_3(\cdot)$, $\theta_4(\cdot)$, and $\varrho(\cdot)$, plotted in Figures 4-7. The coefficient $\theta_2(\cdot)$ on the lagged oil price return is insignificantly positive in the left and right tails of the conditional return distribution. It is insignificantly negative in the middle part. The coefficient $\theta_3(\cdot)$ on the lagged DJI return, in contrast, is significantly positive for all values of $\tau$. We also notice a sharp increase in the middle range. Thus, in addition to the strong positive relation between the individual stock return and the market return (DJI), dictated by the fact that $\theta_3(\cdot) > 0$ on $(0.2, 0.8)$, there is also additional sensitivity of the median of the security return to the market movements.

The coefficient on the own lagged return, $\theta_4(\cdot)$, on the other hand, is significantly negative, except for values of $\tau$ close to 0. This may be interpreted as a reversion effect in the central part of the distribution. However, the lagged return does not appear to significantly shift the quantile function in the tails. Thus, the lagged return is more important for the determination of intermediate risks.

Most importantly, the dynamic coefficient $\varrho(\cdot)$ on the lagged VaR is significantly negative in the low and high quantiles, but is insignificant in the middle range. The significance of $\varrho(\cdot)$ in the tails of the quantile function is strong evidence in favor of the recursive specification. The magnitude and sign of $\varrho(\cdot)$ indicate both the reversion and significant risk clustering effects in the tails of the distribution (see Figure 7). As expected, there is zero effect over the middle range, which is consistent with the random walk properties of the stock price. Thus, the dynamic effect of lagged VaR is much more important for risk management purposes.
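The second-step weights and the quasi-posterior quantile summaries used for Figures 4-7 can be sketched as follows. The sketch assumes the caller supplies the two fitted quantiles $Q_{Y_t}(\tau \pm h/2|\cdot)$ and the constant $C$; the Koenker (1994) rule for choosing $C$ is not implemented here. The point estimate is the .5 quantile of the draws and the 90% interval uses the .05 and .95 quantiles, computed with linear interpolation between order statistics (one of several common quantile conventions).

```python
import math


def second_step_weight(q_upper, q_lower, tau, h):
    """w_t = h / (tau (1 - tau) [Q(tau + h/2) - Q(tau - h/2)]): 1/(tau(1-tau)) times a difference-quotient density estimate."""
    return h / (tau * (1 - tau) * (q_upper - q_lower))


def bandwidth(n, C):
    """h = C * n^(-1/3); the constant C is left to the caller."""
    return C * n ** (-1.0 / 3.0)


def quantile(draws, p):
    """Empirical p-quantile with linear interpolation between order statistics."""
    xs = sorted(draws)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def qp_interval(draws):
    """Point estimate (.5 quantile) and 90% interval (.05 and .95 quantiles) from quasi-posterior draws."""
    return quantile(draws, 0.5), (quantile(draws, 0.05), quantile(draws, 0.95))
```

Each coefficient's interval is obtained by applying `qp_interval` to the corresponding coordinate of the retained MCMC draws.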
Conclusion

In this paper, we study the Laplace-type Estimators or Quasi-Bayesian Estimators that we define using common statistical, non-likelihood based criterion functions. Under mild regularity conditions these estimators are $\sqrt{n}$-consistent and asymptotically normal, and Quasi-posterior quantiles provide asymptotically valid confidence intervals. A simulation study and an empirical example illustrate the properties of the proposed estimation and inference methods. These results show that in many important cases the Quasi-Bayesian estimators provide useful alternatives to the usual extremum estimators. In ongoing work, we are extending the results to models in which the $\sqrt{n}$-convergence rate and asymptotic normality do not hold, including the maximum score problem.
Acknowledgments: We thank the editor for the invitation of this paper to Journal of Econometrics and an anonymous referee for prompt and highest quality feedback. We thank Xiaohong Chen, Shawn Cole, Gary Chamberlain, Ivan Fernandez, Ronald Gallant, Jinyong Hahn, Bruce Hansen, Chris Hansen, Jerry Hausman, James Heckman, Bo Honore, Guido Imbens, Roger Koenker, Shakeeb Khan, Sergei Morozov, Whitney Newey, Ziad Nejmeldeen, Stavros Panageas, Chris Sims, George Tauchen, and seminar participants at Brown University, Duke-UNC Triangle Seminar, MIT, MIT-Harvard, University of Chicago, Princeton University, University of Wisconsin at Madison, University of Michigan, Michigan State University, Texas A&M University, the Winter meeting of the Econometric Society, and the 2002 European Econometric Society Meeting in Venice for insightful comments. We gratefully acknowledge the financial support provided by the U.S. National Science Foundation grants SES-0214047 and SES-0079495.
Appendix of Proofs

A.1 Proof of Theorem 1

It suffices to show

$$\int_{H_n} |h|^\alpha \left| p_n^*(h) - p_\infty^*(h) \right| dh \to_p 0 \tag{A.1}$$

for all $\alpha \ge 0$. Our arguments follow those in Bickel and Yahav (1969) and Ibragimov and Has'minskii (1981), as presented by Lehmann and Casella (1998). As indicated in the text, the main differences are due to (i) the non-likelihood setting, (ii) the use of Huber-like conditions in Assumption 4 to handle discontinuous criterion functions, and (iii) allowing more general loss functions, which are needed for construction of confidence intervals.

Throughout this proof the range of integration for $h$ is implicitly understood to be $H_n$. For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\Omega_n(\theta)$ do not depend on $n$. The more general case follows similarly.
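Part 1. Define

$$h = \sqrt{n}(\theta - T_n), \qquad T_n = \theta_0 + \frac{1}{n}J(\theta_0)^{-1}\Delta_n(\theta_0), \qquad U_n = \frac{1}{\sqrt n}J(\theta_0)^{-1}\Delta_n(\theta_0), \tag{A.2}$$

then

$$p_n^*(h) = \frac{1}{\sqrt{n}^{\,d}}\, p_n\!\left(\frac{h}{\sqrt n} + \theta_0 + \frac{U_n}{\sqrt n}\right) = \frac{\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp\!\left(L_n\!\left(\frac{h}{\sqrt n} + T_n\right)\right)}{\int_{H_n}\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp\!\left(L_n\!\left(\frac{h}{\sqrt n} + T_n\right)\right)dh} = \frac{\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp(\omega(h))}{C_n},$$

where

$$\omega(h) = L_n\!\left(T_n + \frac{h}{\sqrt n}\right) - L_n(\theta_0) - \frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0) \tag{A.3}$$

and

$$C_n = \int_{H_n}\pi\!\left(\frac{h}{\sqrt n} + T_n\right)\exp(\omega(h))\,dh.$$

Part 2 shows that for each $\alpha \ge 0$,

$$A_{1n} = \int_{H_n} |h|^\alpha \left| \exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0) \right| dh \to_p 0. \tag{A.4}$$

Given (A.4), taking $\alpha = 0$ we have

$$C_n \to_p \int_{\mathbb{R}^d} e^{-\frac12 h'J(\theta_0)h}\,\pi(\theta_0)\,dh = \pi(\theta_0)(2\pi)^{d/2}|\det J(\theta_0)|^{-1/2}, \tag{A.5}$$

hence $C_n = O_p(1)$. Next note

$$\text{left side of (A.1)} \equiv \int_{H_n} |h|^\alpha \left|p_n^*(h) - p_\infty^*(h)\right| dh = A_n C_n^{-1},$$

where

$$A_n = \int_{H_n} |h|^\alpha \left| e^{\omega(h)}\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - (2\pi)^{-d/2}|\det J(\theta_0)|^{1/2}\exp\!\left(-\tfrac12 h'J(\theta_0)h\right)C_n \right| dh.$$

Using (A.5), to show (A.1) it suffices to show that $A_n \to_p 0$. But $A_n \le A_{1n} + A_{2n}$, where

$$A_{2n} = \int_{H_n} |h|^\alpha \left| C_n(2\pi)^{-d/2}|\det J(\theta_0)|^{1/2}\exp\!\left(-\tfrac12 h'J(\theta_0)h\right) - \pi(\theta_0)\exp\!\left(-\tfrac12 h'J(\theta_0)h\right) \right| dh.$$

Then by (A.4) $A_{1n} \to_p 0$, and by (A.5)

$$A_{2n} = \left| C_n(2\pi)^{-d/2}|\det J(\theta_0)|^{1/2} - \pi(\theta_0) \right| \int_{H_n} |h|^\alpha \exp\!\left(-\tfrac12 h'J(\theta_0)h\right) dh \to_p 0.$$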
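As a quick numerical sanity check of the limit in (A.5) in the scalar case $d = 1$, where it reads $\pi(\theta_0)\sqrt{2\pi/J}$, one can compare a trapezoid-rule approximation of the Gaussian integral with the closed form. The values of $J$ and $\pi(\theta_0)$ below are arbitrary illustrations.

```python
import math


def gaussian_integral(J, half_width=40.0, steps=20_000):
    """Trapezoid rule for the integral of exp(-J h^2 / 2) over [-half_width/sqrt(J), half_width/sqrt(J)]."""
    a = -half_width / math.sqrt(J)
    dx = 2.0 * half_width / math.sqrt(J) / steps
    f = lambda h: math.exp(-0.5 * J * h * h)
    total = 0.5 * (f(a) + f(-a))
    for i in range(1, steps):
        total += f(a + i * dx)
    return total * dx


def c_n_limit(J, prior_at_theta0):
    """Right-hand side of (A.5) for d = 1: pi(theta0) * (2 pi)^{1/2} * |J|^{-1/2}."""
    return prior_at_theta0 * math.sqrt(2.0 * math.pi / abs(J))


J, prior = 2.5, 0.3
print(prior * gaussian_integral(J), c_n_limit(J, prior))
```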
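Part 2. It remains only to show (A.4). Given Assumption 4 and the definitions in (A.2) and (A.3), write

$$\omega(h) = -\tfrac12 h'J(\theta_0)h + R_n\!\left(\frac{h}{\sqrt n} + T_n\right).$$

Split the integral $A_{1n}$ in (A.4) over three separate areas:

• Area (i): $|h| \le M$,
• Area (ii): $M < |h| \le \delta\sqrt n$,
• Area (iii): $|h| > \delta\sqrt n$.

Each of these areas is implicitly understood to intersect with the range of integration for $h$, which is $H_n$.

Area (i): We will show that for each $0 < M < \infty$ and each $\varepsilon > 0$,

$$\liminf_n P^*\left\{ \int_{|h|\le M} |h|^\alpha \left| \exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0) \right| dh < \varepsilon \right\} \ge 1 - \varepsilon. \tag{A.6}$$

This is proved by showing that

$$\sup_{|h|\le M} \left| \exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0) \right| \to_p 0. \tag{A.7}$$

Using the definition of $\omega(h)$, (A.7) follows from:

$$\text{(a)}\ \sup_{|h|\le M}\left|\pi\!\left(\frac{h}{\sqrt n} + T_n\right) - \pi(\theta_0)\right| \to_p 0, \qquad \text{(b)}\ \sup_{|h|\le M}\left|R_n\!\left(\frac{h}{\sqrt n} + T_n\right)\right| \to_p 0,$$

where (a) follows from the continuity of $\pi(\cdot)$ and because, by Assumption 4.ii-4.iii,

$$J(\theta_0)^{-1}\frac{\Delta_n(\theta_0)}{\sqrt n} = O_p(1). \tag{A.8}$$

Given (A.8), (b) follows from Assumption 4.iv, since

$$\sup_{|h|\le M}\left|\frac{h}{\sqrt n} + T_n - \theta_0\right| = O_p(1/\sqrt n).$$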
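Area (ii): We show that for each $\varepsilon > 0$ there exist large $M$ and small $\delta > 0$ such that

$$\liminf_n P^*\left\{ \int_{M<|h|\le\delta\sqrt n} |h|^\alpha \left| \exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0) \right| dh < \varepsilon \right\} \ge 1 - \varepsilon. \tag{A.9}$$

Since the integral of the second term is finite and can be made arbitrarily small by setting $M$ large, it suffices to show that for each $\varepsilon > 0$ there exist large $M$ and small $\delta > 0$ such that

$$\liminf_n P^*\left\{ \int_{M<|h|\le\delta\sqrt n} |h|^\alpha \exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) dh < \varepsilon \right\} \ge 1 - \varepsilon. \tag{A.10}$$

In order to do so, it suffices to show that for sufficiently large $M$, as $n \to \infty$,

$$\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) \le C\exp\!\left(-\tfrac14 h'J(\theta_0)h\right) \quad \text{for all } M < |h| \le \delta\sqrt n. \tag{A.11}$$

By assumption $\pi(\cdot) < K$, so we can drop it from consideration. By the definition of $\omega(h)$,

$$\exp(\omega(h)) = \exp\!\left(-\tfrac12 h'J(\theta_0)h + R_n\!\left(T_n + \frac{h}{\sqrt n}\right)\right).$$

Since $|T_n - \theta_0| = o_p(1)$, for any $\delta > 0$, wp $\to 1$, $|T_n + h/\sqrt n - \theta_0| < 2\delta$ for all $|h| \le \delta\sqrt n$. Thus, by Assumption 4.iv(a) there exist some small $\delta$ and large $M$ such that

$$\liminf_n P^*\left\{ \sup_{M<|h|\le\delta\sqrt n} \frac{\left|R_n\!\left(T_n + h/\sqrt n\right)\right|}{\left|h + J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n\right|^2} \le \tfrac14\,\mathrm{mineig}(J(\theta_0)) \right\} \ge 1 - \varepsilon.$$

Since $\left|J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n\right|^2 = O_p(1)$, for some $C > 0$

$$\liminf_n P^*\left\{ e^{\omega(h)} \le C\exp\!\left(-\tfrac14 h'J(\theta_0)h\right) \ \forall\, M<|h|\le\delta\sqrt n\right\} \ge \liminf_n P^*\left\{ e^{\omega(h)} \le C\exp\!\left(-\tfrac12 h'J(\theta_0)h + \tfrac14\,\mathrm{mineig}(J(\theta_0))|h|^2\right) \ \forall\, M<|h|\le\delta\sqrt n\right\} \ge 1 - \varepsilon. \tag{A.12}$$

(A.12) implies (A.11), which in turn implies (A.9).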
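Area (iii): We will show that for each $\varepsilon > 0$ and each $\delta > 0$,

$$\liminf_n P^*\left\{\int_{|h|>\delta\sqrt n} |h|^\alpha \left|\exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) - \exp\!\left(-\tfrac12 h'J(\theta_0)h\right)\pi(\theta_0)\right| dh < \varepsilon\right\} \ge 1 - \varepsilon. \tag{A.13}$$

The integral of the second term clearly goes to 0 as $n \to \infty$. Therefore we only need to show

$$\int_{|h|>\delta\sqrt n} |h|^\alpha \exp(\omega(h))\,\pi\!\left(T_n + \frac{h}{\sqrt n}\right) dh \to_p 0.$$

Recalling the definition of $h$, the term is bounded by

$$\sqrt{n}^{\,\alpha+d}\int_{|\theta - T_n| > \delta} |\theta - T_n|^\alpha\, \pi(\theta) \exp\!\left(L_n(\theta) - L_n(\theta_0) - \frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0)\right) d\theta.$$

Since $T_n - \theta_0 \to_p 0$, wp $\to 1$ this is bounded by

$$K_n \cdot C\sqrt{n}^{\,\alpha+d}\int_{|\theta-\theta_0|>\delta/2}(1 + |\theta|^\alpha)\,\pi(\theta)\exp\big(L_n(\theta) - L_n(\theta_0)\big)\,d\theta,$$

where

$$K_n = \exp\!\left(-\frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0)\right) = O_p(1).$$

By Assumption 3 there exists $\epsilon > 0$ such that

$$\liminf_n P^*\left\{\sup_{|\theta-\theta_0|\ge\delta/2} e^{L_n(\theta) - L_n(\theta_0)} \le e^{-n\epsilon}\right\} = 1.$$

Thus, wp $\to 1$ the entire term is bounded by

$$K_n \cdot C\sqrt{n}^{\,\alpha+d} e^{-n\epsilon}\int (1 + |\theta|^\alpha)\,\pi(\theta)\,d\theta = o_p(1). \tag{A.14}$$

Here observe that compactness is only used to ensure that

$$\int |\theta|^\alpha \pi(\theta)\,d\theta < \infty. \tag{A.15}$$

Hence by replacing compactness with the condition (A.15), the conclusion (A.14) is not affected for the given $\alpha$. The entire proof is now completed by combining (A.6), (A.9), and (A.13).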
A.2 Proof of Theorem 2

For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\Omega_n(\theta)$ do not depend on $n$. The more general case follows similarly. Recall that $h = \sqrt n(\theta - \theta_0) - J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n$. Define $U_n = J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n$. Consider the objective function

$$Q_n(z) = \int_{H_n}\rho(z - h - U_n)\,p_n^*(h)\,dh,$$

which is minimized at $\sqrt n(\hat\theta - \theta_0)$. Also define

$$Q_\infty(z) = \int_{\mathbb R^d}\rho(z - h - U_n)\,p_\infty^*(h)\,dh,$$

which is minimized at a random vector denoted $Z_n$. Define

$$\xi = \arg\inf_{z\in\mathbb R^d}\left\{\int_{\mathbb R^d}\rho(z - h)\,p_\infty^*(h)\,dh\right\}.$$

Note that the solution $\xi$ is unique and finite by Assumption 2 parts (ii) and (iii) on the loss function $\rho$. When $\rho$ is symmetric, $\xi = 0$ by Anderson's lemma. Therefore, $Z_n = \xi + U_n = O_p(1)$.

Next, we have for any fixed $z$, since by Assumption 2.ii $\rho(h) \le 1 + |h|^p$ and by convexity $|a + b|^p \le 2^{p-1}|a|^p + 2^{p-1}|b|^p$ for $p \ge 1$:

$$\begin{aligned} |Q_n(z) - Q_\infty(z)| &\le \int_{H_n}\left(1 + |z - h - U_n|^p\right)\left|p_n^*(h) - p_\infty^*(h)\right| dh + \int_{H_n^c}\left(1 + |z - h - U_n|^p\right)p_\infty^*(h)\,dh \\ &\le \int_{H_n}\left(1 + 2^{p-1}|h|^p + 2^{p-1}|z - U_n|^p\right)\left|p_n^*(h) - p_\infty^*(h)\right| dh + \int_{H_n^c}\left(1 + 2^{p-1}|h|^p + 2^{p-1}|z - U_n|^p\right)p_\infty^*(h)\,dh \\ &\le \int_{H_n}\left(1 + 2^{p-1}|h|^p + O_p(1)\right)\left|p_n^*(h) - p_\infty^*(h)\right| dh + \int_{H_n^c}\left(1 + 2^{p-1}|h|^p + O_p(1)\right)p_\infty^*(h)\,dh = o_p(1), \end{aligned}$$

where the $o_p(1)$-conclusion is by Theorem 1 and the exponentially small tails of the normal density (the Lebesgue measure of $H_n^c$ converges to zero).

Now note that $Q_n(z)$ and $Q_\infty(z)$ are convex and finite, and $Z_n = \arg\inf_{z\in\mathbb R^d}Q_\infty(z) = O_p(1)$. By the convexity lemma of Pollard (1991), pointwise convergence entails uniform convergence over compact sets $K$:

$$\sup_{z\in K}\left|Q_n(z) - Q_\infty(z)\right| \to_p 0.$$

Since $Z_n = O_p(1)$, uniform convergence and convexity arguments like those in Jureckova (1977) imply that $\sqrt n(\hat\theta - \theta_0) - Z_n \to_p 0$, as shown below.

Proof of $Z_n - \sqrt n(\hat\theta - \theta_0) = o_p(1)$. The proof follows by extending slightly the convexity argument of Jureckova (1977) and Pollard (1991) to the present context. Consider a ball $B_\delta(Z_n)$ with radius $\delta > 0$, centered at $Z_n$, and let $z = Z_n + dv$, where $v$ is a unit direction vector such that $|v| = 1$ and $d > \delta$. Because $Z_n = O_p(1)$, for any $\delta > 0$ and $\varepsilon > 0$, there exists $K > 0$ such that

$$\liminf_n P^*\{E_n\} \equiv \liminf_n P^*\{B_\delta(Z_n) \subset B_K(0)\} > 1 - \varepsilon.$$

By convexity, for any $z = Z_n + dv$ so constructed, it follows that

$$\frac{\delta}{d}\big(Q_n(z) - Q_n(Z_n)\big) \ge Q_n(z^*) - Q_n(Z_n),$$

where $z^*$ is a point of the boundary of $B_\delta(Z_n)$ on the line connecting $z$ and $Z_n$. By the uniform convergence of $Q_n(z)$ to $Q_\infty(z)$ over any compact set $B_K(0)$, whenever $E_n$ occurs:

$$\frac{\delta}{d}\big(Q_n(z) - Q_n(Z_n)\big) \ge Q_n(z^*) - Q_n(Z_n) \ge Q_\infty(z^*) - Q_\infty(Z_n) + o_p(1) \ge V_n + o_p(1), \tag{A.16}$$

where $V_n$ is a uniformly positive variable, because $Z_n$ is the unique optimizer of $Q_\infty$. That is, there exists an $\eta > 0$ such that $\liminf_n P(V_n > \eta) \ge 1 - \varepsilon$. Hence we have with probability at least as big as $1 - 3\varepsilon$ for large $n$:

$$\frac{\delta}{d}\big(Q_n(z) - Q_n(Z_n)\big) > \eta/2.$$

Thus, $\sqrt n(\hat\theta - \theta_0)$ eventually belongs to the complement of $B_\delta(Z_n)$ with probability at most $3\varepsilon$. Since we can set $\varepsilon$ as small as we like by picking (a) sufficiently large $K$, (b) sufficiently large $n$, and (c) sufficiently small $\eta > 0$, it follows that

$$\limsup_n P^*\left\{\left|Z_n - \sqrt n(\hat\theta - \theta_0)\right| > \delta\right\} = 0.$$

Since this is true for any $\delta > 0$, it follows that $Z_n - \sqrt n(\hat\theta - \theta_0) = o_p(1)$. ∎

A.3 Proof of Theorem 3

For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\Omega_n(\theta)$ do not depend on $n$. The more general case follows similarly. We have

$$F_{g,n}(x) = \int_{\theta\in\Theta:\, g(\theta)\le x} p_n(\theta)\,d\theta.$$

Evaluate it at $x = g(\theta_0) + s/\sqrt n$ and change the variable of integration:

$$H_{g,n}(s) \equiv F_{g,n}\!\left(g(\theta_0) + s/\sqrt n\right) = \int_{h\in H_n:\, g(\theta_0 + h/\sqrt n + U_n/\sqrt n)\le g(\theta_0) + s/\sqrt n} p_n^*(h)\,dh.$$

Define also

$$\bar H_{g,n}(s) = \int_{h\in\mathbb R^d:\, g(\theta_0 + h/\sqrt n + U_n/\sqrt n)\le g(\theta_0) + s/\sqrt n} p_\infty^*(h)\,dh$$

and

$$H_{g,\infty}(s) = \int_{h\in\mathbb R^d:\, \nabla_\theta g(\theta_0)'(h + U_n)\le s} p_\infty^*(h)\,dh.$$

By the definition of the total variation of moments norm and Theorem 1,

$$\sup_s\left|H_{g,n}(s) - \bar H_{g,n}(s)\right| \to_p 0,$$

where the sup is taken over the support of $H_{g,n}(s)$. By the uniform continuity of the integral of the normal density with respect to the boundary of integration,

$$\sup_s\left|\bar H_{g,n}(s) - H_{g,\infty}(s)\right| \to_p 0,$$

which implies

$$\sup_s\left|H_{g,n}(s) - H_{g,\infty}(s)\right| \to_p 0,$$

where the sup is taken over the support of $H_{g,n}(s)$. The convergence of distribution functions implies the convergence of quantiles at continuity points of the distribution functions, see e.g. Billingsley (1994), so

$$H_{g,n}^{-1}(\alpha) - H_{g,\infty}^{-1}(\alpha) \to_p 0.$$

Next observe

$$H_{g,\infty}(s) = P\left\{\nabla_\theta g(\theta_0)'\mathcal N\big(U_n, J^{-1}(\theta_0)\big) \le s \,\middle|\, U_n\right\}, \qquad H_{g,\infty}^{-1}(\alpha) = \nabla_\theta g(\theta_0)'U_n + q_\alpha\sqrt{\nabla_\theta g(\theta_0)'J^{-1}(\theta_0)\nabla_\theta g(\theta_0)},$$

where $q_\alpha$ is the $\alpha$-quantile of $\mathcal N(0, 1)$. Recalling that we defined $c_{g,n}(\alpha) = F_{g,n}^{-1}(\alpha)$, by quantile equivariance with respect to monotone transformations,

$$H_{g,n}^{-1}(\alpha) = \sqrt n\big(c_{g,n}(\alpha) - g(\theta_0)\big),$$

so that

$$\sqrt n\big(c_{g,n}(\alpha) - g(\theta_0)\big) = \nabla_\theta g(\theta_0)'U_n + q_\alpha\sqrt{\nabla_\theta g(\theta_0)'J^{-1}(\theta_0)\nabla_\theta g(\theta_0)} + o_p(1).$$

The rest of the result follows by the $\Delta$-method.
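The equivariance step above, $H_{g,n}^{-1}(\alpha) = \sqrt n(c_{g,n}(\alpha) - g(\theta_0))$, is what allows intervals for $g(\theta)$ to be read directly off transformed MCMC draws. A small sketch, assuming an increasing $g$ and the inverse-empirical-CDF (order statistic) quantile, for which equivariance holds exactly; interpolated quantiles satisfy it only approximately. The draws below are made-up numbers.

```python
import math


def inverse_ecdf(draws, alpha):
    """Smallest draw x with empirical CDF F_B(x) >= alpha, for 0 < alpha <= 1."""
    xs = sorted(draws)
    k = max(math.ceil(alpha * len(xs)) - 1, 0)
    return xs[k]


def interval_for_g(draws, g, level=0.90):
    """Equal-tailed quasi-posterior interval for g(theta) from the transformed draws g(theta^(j))."""
    tail = (1.0 - level) / 2.0
    gs = [g(t) for t in draws]
    return inverse_ecdf(gs, tail), inverse_ecdf(gs, 1.0 - tail)


draws = [0.3, -1.2, 2.5, 0.0, 1.1, -0.4, 0.8, 1.9, -2.0, 0.5]
# For increasing g, the quantile of g(theta) equals g(quantile of theta)
print(inverse_ecdf([math.exp(t) for t in draws], 0.3), math.exp(inverse_ecdf(draws, 0.3)))
```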
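A.4 Proof of Theorem 4

In view of Assumption 4, it suffices to show that

$$\hat J_n^{-1}(\theta_0) - J_n^{-1}(\theta_0) \to_p 0, \tag{A.17}$$

and then conclude by the $\Delta$-method. Recall that $h = \sqrt n(\theta - \theta_0) - J_n^{-1}(\theta_0)\Delta_n(\theta_0)/\sqrt n$, $U_n = J_n^{-1}(\theta_0)\Delta_n(\theta_0)/\sqrt n$, and the localized Quasi-posterior density for $h$ is

$$p_n^*(h) = \frac{1}{\sqrt n^{\,d}}\, p_n\!\left(h/\sqrt n + \theta_0 + U_n/\sqrt n\right).$$

Note also

$$\hat J_n^{-1}(\theta_0) = \int n(\theta - \hat\theta)(\theta - \hat\theta)'\,p_n(\theta)\,d\theta = \int_{H_n}\big(h - \sqrt n(\hat\theta - \theta_0) + U_n\big)\big(h - \sqrt n(\hat\theta - \theta_0) + U_n\big)'\,p_n^*(h)\,dh,$$

and

$$J_n^{-1}(\theta_0) = \int_{\mathbb R^d} hh'\,p_\infty^*(h)\,dh.$$

We have, denoting $h = (h_1, \ldots, h_d)$ and $\tilde T_n = (\tilde T_{n1}, \ldots, \tilde T_{nd})$, where $\tilde T_n = \sqrt n(\hat\theta - \theta_0) - U_n$, for all $i, j \le d$:

(a) $\int_{H_n} h_ih_j\big(p_n^*(h) - p_\infty^*(h)\big)dh = o_p(1)$ by Theorem 1,
(b) $\int_{H_n^c} h_ih_j\, p_\infty^*(h)\,dh = o_p(1)$ by the definition of $p_\infty^*$ and $J_n(\theta_0)$ being uniformly nonsingular,
(c) $\int_{H_n}|\tilde T_n|^2\big(p_n^*(h) - p_\infty^*(h)\big)dh = o_p(1)$ by Theorems 1 and 2,
(d) $\int_{H_n^c}|\tilde T_n|^2\, p_\infty^*(h)\,dh = o_p(1)$ by Theorem 2, the definition of $p_\infty^*$, and $J_n(\theta_0)$ being uniformly nonsingular,
(e) $\int_{H_n}\tilde T_{ni}h_j\big(p_n^*(h) - p_\infty^*(h)\big)dh = o_p(1)$ by Theorems 1 and 2,
(f) $\int_{H_n^c}\tilde T_{ni}h_j\, p_\infty^*(h)\,dh = o_p(1)$ by Theorems 1 and 2, the definition of $p_\infty^*$, and $J_n(\theta_0)$ being uniformly nonsingular,

from which the required conclusion follows.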
A.5 Proof of Proposition 1

Assumption 3 is directly implied by (4.1)-(4.4) and the uniform continuity of $Em_i(\theta)$, as shown in Lemma 1. It remains only to verify Assumption 4. Define the identity

$$L_n(\theta) - L_n(\theta_0) = \underbrace{-n g_n(\theta_0)'W(\theta_0)G(\theta_0)}_{\Delta_n(\theta_0)'}(\theta - \theta_0) - \frac12(\theta - \theta_0)'\underbrace{nG(\theta_0)'W(\theta_0)G(\theta_0)}_{nJ(\theta_0)}(\theta - \theta_0) + R_n(\theta). \tag{A.18}$$

Next, given the definition of $\Delta_n(\theta_0)$ and $J(\theta_0)$, conditions i, ii, and iii of Assumption 4 are immediate from conditions i-iii of Proposition 1. Condition iv is verified as follows. Condition iv of Assumption 4 can be succinctly stated as: for each $\varepsilon > 0$ there exists a $\delta > 0$ such that

$$\limsup_n P^*\left\{\sup_{|\theta - \theta_0|\le\delta}\frac{|R_n(\theta)|}{1 + n|\theta - \theta_0|^2} > \varepsilon\right\} < \varepsilon.$$
This stochastic equicontinuity condition is equivalent to the following stochastic equicontinuity condition, see e.g. Andrews (1994a): for any $\delta_n \to 0$,

$$\sup_{|\theta - \theta_0|\le\delta_n}\frac{|R_n(\theta)|}{1 + n|\theta - \theta_0|^2} = o_p(1). \tag{A.19}$$

This is weaker than condition (v) of Theorem 7.1 in Newey and McFadden (1994), which requires

$$\sup_{|\theta - \theta_0|\le\delta_n}\frac{|R_n(\theta)|}{\sqrt n|\theta - \theta_0| + n|\theta - \theta_0|^2} = o_p(1). \tag{A.20}$$

Hence the arguments of the proof, except for several important differences, follow those of Theorem 7.2 in Newey and McFadden (1994).

At first note that condition iv of Proposition 1 is implied by the condition (where we let $g(\theta) = Eg_n(\theta)$)

$$\sup_{\theta\in B_{\delta_n}(\theta_0)}\frac{|\epsilon(\theta)|}{1 + \sqrt n|\theta - \theta_0|} = o_p\!\left(\frac{1}{\sqrt n}\right), \qquad \epsilon(\theta) = g_n(\theta) - g_n(\theta_0) - g(\theta), \tag{A.21}$$

for any $\delta_n \to 0$. From (A.18),

$$R_n(\theta) = R_{1n}(\theta) + R_{2n}(\theta) + R_{3n}(\theta),$$

where

$$\begin{aligned} R_{1n}(\theta) &= n\Big(-\tfrac12 g_n(\theta)'W_n(\theta)g_n(\theta) + \tfrac12 g_n(\theta_0)'W_n(\theta)g_n(\theta_0) + g_n(\theta_0)'W_n(\theta)G(\theta_0)(\theta - \theta_0) + \tfrac12(\theta - \theta_0)'G(\theta_0)'W(\theta)G(\theta_0)(\theta - \theta_0)\Big), \\ R_{2n}(\theta) &= \tfrac{n}{2}\, g_n(\theta_0)'\big(W_n(\theta_0) - W_n(\theta)\big)g_n(\theta_0), \\ R_{3n}(\theta) &= n\Big(g_n(\theta_0)'\big(W(\theta_0) - W_n(\theta)\big)G(\theta_0)(\theta - \theta_0) + \tfrac12(\theta - \theta_0)'G(\theta_0)'\big(W(\theta_0) - W(\theta)\big)G(\theta_0)(\theta - \theta_0)\Big). \end{aligned}$$

Verification of (A.19) for the terms $R_{2n}(\theta)$ and $R_{3n}(\theta)$ immediately follows from $\sqrt n g_n(\theta_0) = O_p(1)$, from the uniform consistency of $W_n(\theta)$ in $\theta$ assumed in Proposition 1, so that $W_n(\theta) - W(\theta) = o_p(1)$ uniformly in $\theta$, and from the continuity of $W(\theta)$ in $\theta$, so that $W(\theta) - W(\theta_0) = o(1)$ as $|\theta - \theta_0| \to 0$. It remains to check condition (A.19) for the term $R_{1n}(\theta)$.

Verification of (A.19) for the term $R_{1n}(\theta)$. Note that $g_n(\theta) = \epsilon(\theta) + g(\theta) + g_n(\theta_0)$. Substitute this into $R_{1n}(\theta)$ and decompose

$$-\frac{R_{1n}(\theta)}{n} = \tfrac12\epsilon(\theta)'W_n(\theta)\epsilon(\theta) + \epsilon(\theta)'W_n(\theta)\big(g(\theta) + g_n(\theta_0)\big) + g_n(\theta_0)'W_n(\theta)\big(g(\theta) - G(\theta_0)(\theta - \theta_0)\big) + \tfrac12 g(\theta)'\big(W_n(\theta) - W(\theta)\big)g(\theta) + \tfrac12 g(\theta)'W(\theta)g(\theta) - \tfrac12(\theta - \theta_0)'G(\theta_0)'W(\theta)G(\theta_0)(\theta - \theta_0).$$

Using the inequalities, valid for $x \ge 0$,

$$\frac{(1 + \sqrt n x)^2}{1 + nx^2} \le 2, \qquad \frac{1 + \sqrt n x}{1 + nx^2} \le 2, \tag{A.22}$$

each of these terms can be dealt with separately, by applying conditions i-iii of Proposition 1 and (A.21):

(a) $\sup_{\theta\in B_{\delta_n}(\theta_0)}\frac{n|\epsilon(\theta)'W_n(\theta)\epsilon(\theta)|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,
(b) $\sup_{\theta\in B_{\delta_n}(\theta_0)}\frac{n|\epsilon(\theta)'W_n(\theta)g(\theta)|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,
(c) $\sup_{\theta\in B_{\delta_n}(\theta_0)}\frac{n|\epsilon(\theta)'W_n(\theta)g_n(\theta_0)|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,
(d) $\sup_{\theta\in B_{\delta_n}(\theta_0)}\frac{n|g_n(\theta_0)'W_n(\theta)(g(\theta) - G(\theta_0)(\theta - \theta_0))|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,
(e) $\sup_{\theta\in B_{\delta_n}(\theta_0)}\frac{n|g(\theta)'(W_n(\theta) - W(\theta))g(\theta)|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,
(f) $\sup_{\theta\in B_{\delta_n}(\theta_0)}\frac{n|g(\theta)'W(\theta)g(\theta) - (\theta - \theta_0)'G(\theta_0)'W(\theta)G(\theta_0)(\theta - \theta_0)|}{1 + n|\theta - \theta_0|^2} = o_p(1)$,

where (a) follows from (A.22), (A.21), and $W_n(\theta)$ being uniformly bounded in probability; (b) follows from (A.22), (A.21), and the Taylor expansion $g(\theta) = G(\theta_0)(\theta - \theta_0) + o(|\theta - \theta_0|)$; (c) follows from (A.22), (A.21), and $\sqrt n g_n(\theta_0) = O_p(1)$; (d) follows from replacing $g(\theta)$ with $G(\theta_0)(\theta - \theta_0) + o(|\theta - \theta_0|)$ and $\sqrt n g_n(\theta_0) = O_p(1)$; (e) follows from $W_n(\theta) = W(\theta) + o_p(1)$ uniformly in $\theta$ and $g(\theta) = O(|\theta - \theta_0|)$; and (f) follows from replacing $g(\theta)$ with $G(\theta_0)(\theta - \theta_0) + o(|\theta - \theta_0|)$. The conclusion now follows by putting these terms together.
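The decomposition $R_n = R_{1n} + R_{2n} + R_{3n}$ and the split of $R_{1n}$ are pure algebra, so they can be checked numerically. The sketch below does this in the scalar case, assuming for illustration that the GMM criterion is $L_n(\theta) = -\frac n2 g_n(\theta)'W_n(\theta)g_n(\theta)$, the form matching $\Delta_n(\theta_0)$ and $J(\theta_0)$ in (A.18). The particular functions used in the check are arbitrary.

```python
def remainder_pieces(n, gn, Wn, W, G0, theta0, theta):
    """Scalar version of (A.18): returns (R_n, R_1n + R_2n + R_3n).

    Assumes, for illustration, L_n(t) = -(n/2) * gn(t)**2 * Wn(t).
    """
    d = theta - theta0
    g0, g = gn(theta0), gn(theta)
    L = lambda t: -0.5 * n * gn(t) ** 2 * Wn(t)
    delta = -n * g0 * W(theta0) * G0            # Delta_n(theta0)
    nJ = n * G0 * W(theta0) * G0                # n J(theta0)
    R = L(theta) - L(theta0) - delta * d + 0.5 * nJ * d * d
    R1 = n * (-0.5 * g * Wn(theta) * g + 0.5 * g0 * Wn(theta) * g0
              + g0 * Wn(theta) * G0 * d + 0.5 * d * G0 * W(theta) * G0 * d)
    R2 = 0.5 * n * g0 * (Wn(theta0) - Wn(theta)) * g0
    R3 = n * (g0 * (W(theta0) - Wn(theta)) * G0 * d
              + 0.5 * d * G0 * (W(theta0) - W(theta)) * G0 * d)
    return R, R1 + R2 + R3


def r1_split(n, gn, gbar, Wn, W, G0, theta0, theta):
    """Returns (-R_1n / n, five-term split), with eps = gn(theta) - gn(theta0) - gbar(theta)."""
    d = theta - theta0
    g0, g, gb = gn(theta0), gn(theta), gbar(theta)
    eps = g - g0 - gb
    wn, w = Wn(theta), W(theta)
    minus_r1_over_n = (0.5 * g * wn * g - 0.5 * g0 * wn * g0
                       - g0 * wn * G0 * d - 0.5 * d * G0 * w * G0 * d)
    split = (0.5 * eps * wn * eps + eps * wn * (gb + g0) + g0 * wn * (gb - G0 * d)
             + 0.5 * gb * (wn - w) * gb + 0.5 * gb * w * gb - 0.5 * d * G0 * w * G0 * d)
    return minus_r1_over_n, split


# Arbitrary smooth ingredients for the check
gn = lambda t: 0.3 + 1.7 * t + 0.2 * t ** 2
gbar = lambda t: 1.6 * (t - 0.1)
Wn = lambda t: 2.0 + 0.1 * t
W = lambda t: 2.05 + 0.1 * t ** 2
print(remainder_pieces(50, gn, Wn, W, 1.5, 0.1, 0.35), r1_split(50, gn, gbar, Wn, W, 1.5, 0.1, 0.35))
```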
A.6 Proof of Proposition 2

Verification of Assumption 3 is standard given the stated conditions and is subsumed as a step in the consistency proofs of extremum estimators based on GEL in Kitamura and Stutzer (1997) for cases when $s$ is finite and Kitamura (1997) for cases when $s$ takes on infinite values. We shall not repeat it here. Next, we will verify Assumption 4. Define

$$\hat\gamma(\theta) = \arg\inf_\gamma L_n(\theta, \gamma).$$

It will suffice to show that uniformly in $\theta_n \in B_{\delta_n}(\theta_0)$ for any $\delta_n \to 0$, we have the GMM set-up:

$$L_n(\theta_n) = L_n(\theta_n, \hat\gamma(\theta_n)) = -\frac12\left(\frac{1}{\sqrt n}\sum_{i=1}^n m_i(\theta_n)\right)'\big(V(\theta_0) + o_p(1)\big)^{-1}\left(\frac{1}{\sqrt n}\sum_{i=1}^n m_i(\theta_n)\right), \tag{A.23}$$

where $V(\theta_0) = Em_i(\theta_0)m_i(\theta_0)'$. Assumptions 4.i-iii follow immediately from the conditions of Proposition 2, and Assumption 4.iv is verified exactly as in the proof of Proposition 1, given the reduction to the GMM case. Indeed, defining $g_n(\theta) = \frac1n\sum_{i=1}^n m_i(\theta)$, the Donsker property assumed in condition iv implies that for any $\varepsilon > 0$ there is $\delta > 0$ such that

$$\limsup_{n\to\infty} P^*\left\{\sup_{\theta\in B_\delta(\theta_0)}\sqrt n\left|g_n(\theta) - g_n(\theta_0) - \big(Eg_n(\theta) - Eg_n(\theta_0)\big)\right| > \varepsilon\right\} < \varepsilon,$$

which implies

$$\limsup_{n\to\infty} P^*\left\{\sup_{\theta\in B_\delta(\theta_0)}\frac{\sqrt n\left|g_n(\theta) - g_n(\theta_0) - \big(Eg_n(\theta) - Eg_n(\theta_0)\big)\right|}{1 + \sqrt n|\theta - \theta_0|} > \varepsilon\right\} < \varepsilon,$$

which is condition iv in Proposition 1. The rest of the arguments follow those in the proof of Proposition 1. It only remains to show the requisite expansion (A.23). For that purpose we use the convexity lemma, which was obtained by C. Geyer and can be found in Knight (1999).

Convexity Lemma. Suppose $Q_n$ is a sequence of lower-semi-continuous convex $\bar{\mathbb R}$-valued random functions, defined on $\mathbb R^d$, and let $\mathcal D$ be a countable dense subset of $\mathbb R^d$. If $Q_n$ weakly converges to $Q_\infty$ in $\bar{\mathbb R}$ marginally (in the finite-dimensional sense) on $\mathcal D$, where $Q_\infty$ is lower-semi-continuous convex and finite on an open non-empty set a.s., then

$$\arg\inf_{z\in\mathbb R^d} Q_n(z) \to_d \arg\inf_{z\in\mathbb R^d} Q_\infty(z),$$

provided the latter is uniquely defined in $\mathbb R^d$ a.s.

Next, we show that $\hat\gamma(\theta_n) \to_p 0$. Define $F = \{\gamma : Es[m_i(\theta_0)'\gamma] < \infty\}$ and $F^c = \{\gamma : Es[m_i(\theta_0)'\gamma] = \infty\}$. By convexity and lower-semicontinuity of $s$, $F$ is convex, open, and its boundary is nowhere dense in $\mathbb R^p$. For $\gamma \in F$, $Es[m_i(\theta)'\gamma] < \infty$ for all $\theta \in B_\delta(\theta_0)$ and some $\delta > 0$, which follows by continuity of $Es[m_i(\theta)'\gamma]$ over $B_\delta(\theta_0)$ implied by conditions ii and iii. Thus, for a given $\gamma \in F$,

$$\frac1n\sum_{i=1}^n s[m_i(\theta_n)'\gamma] \to_p Es[m_i(\theta_0)'\gamma] < \infty.$$

This follows from the uniform law of large numbers implied by

1. $\{s[m_i(\theta)'\gamma], \theta \in B_\delta(\theta_0)\}$, where $\delta > 0$ is sufficiently small, being a Donsker class,
2. $Es[m_i(\theta)'\gamma]$ being continuous in $\theta$ over $B_\delta(\theta_0)$.

The above function set is Donsker because (a) $m_i(\theta)'\gamma \in M$ for all $\theta \in B_\delta(\theta_0)$, some $\delta > 0$, some compact $M$, and a given $\gamma \in F$, by conditions ii and iii; (b) $\{m_i(\theta), \theta \in B_\delta(\theta_0)\}$ is a Donsker class by condition iv; (c) $s$ is a uniform Lipschitz function over $M \subset \mathcal V$ by the assumption on $s$, where $\mathcal V$ is defined as the open convex set on which $s$ is finite; (d) $\{m_i(\theta)'\gamma, \theta \in B_\delta(\theta_0)\}$ is a Donsker class by (b); and (e) Theorem 2.10.6 in van der Vaart and Wellner (1996) says that a uniform Lipschitz transform of a Donsker class is a Donsker class itself.

Now take $\gamma$ in $F^c \setminus \partial F$, where $\partial F$ denotes the boundary of $F$. Then wp $\to 1$

$$\frac1n\sum_{i=1}^n s[m_i(\theta_n)'\gamma] = \infty \to_p Es[m_i(\theta_0)'\gamma] = \infty.$$

Now take all the rational numbers $\gamma \in \mathbb R^p \setminus \partial F$ as the set $\mathcal D$ appearing in the statement of the Convexity Lemma and conclude that

$$\hat\gamma(\theta_n) \to_p \arg\inf_\gamma Es[m_i(\theta_0)'\gamma] = 0.$$

Given this result, in order to obtain the expression for $\hat\gamma(\theta_n)$ we can expand the first order condition for $\hat\gamma(\theta_n)$:

$$0 = \frac1n\sum_{i=1}^n\nabla s\big(\hat\gamma(\theta_n)'m_i(\theta_n)\big)m_i(\theta_n) = \frac1n\sum_{i=1}^n m_i(\theta_n) + V_n\hat\gamma(\theta_n), \tag{A.24}$$

where

$$V_n = \frac1n\sum_{i=1}^n\nabla^2 s\big(\bar\gamma(\theta_n)'m_i(\theta_n)\big)m_i(\theta_n)m_i(\theta_n)'$$

for some $\bar\gamma(\theta_n)$ between 0 and $\hat\gamma(\theta_n)$, which is different from row to row of the matrix $V_n$. Then $V_n \to_p V(\theta_0) = Em_i(\theta_0)m_i(\theta_0)'$. This follows from the uniform law of large numbers implied by

1. $\{\nabla^2 s(\gamma'm_i(\theta^*))m_i(\theta)m_i(\theta)', (\theta^*, \gamma, \theta) \in B_{\delta_1}(\theta_0) \times B_{\delta_2}(0) \times B_{\delta_3}(\theta_0)\}$, where $\delta_j > 0$ are sufficiently small, being a Donsker class wp $\to 1$,
2. $Em_i(\theta)m_i(\theta)'$ being a continuous function in $\theta$ by condition i,
3. $E\nabla^2 s(\gamma'm_i(\theta^*))m_i(\theta)m_i(\theta)' = E\nabla^2 s(0)m_i(\theta)m_i(\theta)' + o(1)$ uniformly in $(\theta, \theta^*) \in B_\delta(\theta_0) \times B_\delta(\theta_0)$ for any $\gamma \to 0$, for sufficiently small $\delta > 0$, by the assumptions on $s$ and condition iii.

Claim 1 is verified by applying exactly the same logic as in the previously stated steps (a)-(e). For the sake of brevity, this will not be repeated. Therefore, wp $\to 1$

$$\hat\gamma(\theta_n) = -(V_n)^{-1}\frac1n\sum_{i=1}^n m_i(\theta_n) = -\big(V(\theta_0) + o_p(1)\big)^{-1}\frac1n\sum_{i=1}^n m_i(\theta_n). \tag{A.25}$$

Consider the second order expansion,

$$L_n(\theta_n, \hat\gamma(\theta_n)) = \sqrt n\hat\gamma(\theta_n)'\frac{1}{\sqrt n}\sum_{i=1}^n m_i(\theta_n) + \frac12\sqrt n\hat\gamma(\theta_n)'\bar V_n\sqrt n\hat\gamma(\theta_n), \tag{A.26}$$

where

$$\bar V_n = \frac1n\sum_{i=1}^n\nabla^2 s\big(\tilde\gamma(\theta_n)'m_i(\theta_n)\big)m_i(\theta_n)m_i(\theta_n)'$$

for some $\tilde\gamma(\theta_n)$ between 0 and $\hat\gamma(\theta_n)$, which is different from row to row of the matrix $\bar V_n$. By a preceding argument, $\bar V_n \to_p V(\theta_0)$. Inserting (A.25) and $\bar V_n = V(\theta_0) + o_p(1)$ into (A.26), we obtain the required expansion (A.23).
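To see the expansion (A.23) at work, consider a scalar moment and the illustrative choice $s(v) = e^v - 1$, an exponential-tilting-type function with $s(0) = 0$ and $s'(0) = s''(0) = 1$. This is a normalization under which (A.24)-(A.26) take the form above, and the choice is an assumption of the sketch. Newton's method computes $\hat\gamma$, and the profiled value is compared with the GMM-type quadratic $-z^2/(2V)$, where $z = \sum_i m_i/\sqrt n$ and $V = \frac1n\sum_i m_i^2$.

```python
import math


def gel_profile(m, s, ds, d2s, iters=50):
    """Scalar GEL step: gamma_hat = arg inf_gamma sum_i s(gamma * m_i), by Newton's method.

    s, ds, d2s are s and its first two derivatives (s is assumed convex).
    Returns (gamma_hat, L_n(gamma_hat)).
    """
    g = 0.0
    for _ in range(iters):
        grad = sum(ds(g * mi) * mi for mi in m)       # first order condition, cf. (A.24)
        hess = sum(d2s(g * mi) * mi * mi for mi in m)
        step = grad / hess
        g -= step
        if abs(step) < 1e-14:
            break
    return g, sum(s(g * mi) for mi in m)


def gmm_approximation(m):
    """Right-hand side of (A.23) without the o_p(1) term: -z^2 / (2 V)."""
    n = len(m)
    z = sum(m) / math.sqrt(n)
    V = sum(mi * mi for mi in m) / n
    return -0.5 * z * z / V


m = [1.01 if i % 2 == 0 else -0.99 for i in range(100)]   # illustrative moment values
g_hat, L_hat = gel_profile(m, lambda v: math.exp(v) - 1.0, math.exp, math.exp)
print(g_hat, L_hat, gmm_approximation(m))
```

With a sample mean of the moments that is small relative to their spread, the two values agree closely, as (A.23) predicts near $\theta_0$.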
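A.7 Proof of Proposition 3

Assumption 3 is assumed. We need to verify Assumption 4. Define the identity

$$L_n(\theta) - L_n(\theta_0) = \underbrace{\sum_{i=1}^n \dot m_i(\theta_0)'}_{\Delta_n(\theta_0)'}(\theta - \theta_0) - \frac12(\theta - \theta_0)'n\underbrace{\big(-\nabla_{\theta\theta'}Em_i(\theta_0)\big)}_{J(\theta_0)}(\theta - \theta_0) + R_n(\theta). \tag{A.27}$$

Assumption 4.i-iii then follows immediately from conditions i and ii. Assumption 4.iv is verified given the following decomposition:

$$R_n(\theta) = \underbrace{\sum_{i=1}^n\Big(m_i(\theta) - m_i(\theta_0) - Em_i(\theta) + Em_i(\theta_0) - \dot m_i(\theta_0)'(\theta - \theta_0)\Big)}_{R_{1n}(\theta)} + \underbrace{n\big(Em_i(\theta) - Em_i(\theta_0)\big) + \tfrac12(\theta - \theta_0)'nJ(\theta_0)(\theta - \theta_0)}_{R_{2n}(\theta)}.$$

It suffices to verify Assumption 4.iv separately for $R_{1n}(\theta)$ and $R_{2n}(\theta)$. Since

$$R_{2n}(\theta) = -\tfrac12 n(\theta - \theta_0)'\big[J(\theta^*) - J(\theta_0)\big](\theta - \theta_0)$$

for some $\theta^*$ on the line connecting $\theta$ and $\theta_0$, verification of Assumption 4.iv for $R_{2n}(\theta)$ follows immediately from continuity of $J(\theta)$ in $\theta$ over a ball at $\theta_0$.

To show Assumption 4.iv-(b) for $R_{1n}(\theta)$, we note that for any given $M > 0$

$$\limsup_n P^*\left\{\sup_{|\theta - \theta_0|\le M/\sqrt n}|R_{1n}(\theta)| > \varepsilon\right\} \le \limsup_n P^*\left\{\sup_{|\theta - \theta_0|\le M/\sqrt n} M\,\frac{|R_{1n}(\theta)|}{\sqrt n|\theta - \theta_0|} > \varepsilon\right\} = 0, \tag{A.28}$$

where the last conclusion follows from two observations. First, note that

$$Z_n(\theta) \equiv \frac{R_{1n}(\theta)}{\sqrt n|\theta - \theta_0|} = \frac{1}{\sqrt n}\sum_{i=1}^n\frac{m_i(\theta) - m_i(\theta_0) - \big(Em_i(\theta) - Em_i(\theta_0)\big) - \dot m_i(\theta_0)'(\theta - \theta_0)}{|\theta - \theta_0|}$$

is Donsker by assumption, that is, it converges in $\ell^\infty(B_\delta(\theta_0))$ to a tight Gaussian process $Z$. The process $Z$ has uniformly continuous paths with respect to the semimetric $\rho$ given by

$$\rho^2(\theta_1, \theta_2) = E\big(Z(\theta_1) - Z(\theta_2)\big)^2,$$

so that $\rho(\theta, \theta_0) \to 0$ if $\theta \to \theta_0$. Thus almost all sample paths of $Z$ are continuous at $\theta_0$. Second, since by assumption

$$E\big[m_i(\theta) - m_i(\theta_0) - \dot m_i(\theta_0)'(\theta - \theta_0)\big]^2 = o(|\theta - \theta_0|^2),$$

we have for any $\theta_n \to \theta_0$

$$E\big[Z_n(\theta_n)\big]^2 \le \frac{o(|\theta_n - \theta_0|^2)}{|\theta_n - \theta_0|^2} \to 0,$$

therefore $Z(\theta_0) = 0$. Therefore for any $\theta' \to_p \theta_0$, we have by the extended continuous mapping theorem

$$Z_n(\theta') \to_d Z(\theta_0) = 0, \quad \text{that is,} \quad Z_n(\theta') \to_p 0. \tag{A.29}$$

This shows (A.28). To prove Assumption 4.iv-(a) for $R_{1n}(\theta)$, we need to show that for some $\delta > 0$ and constant $M$

$$\limsup_n P^*\left\{\sup_{M/\sqrt n\le|\theta - \theta_0|\le\delta}\frac{|R_{1n}(\theta)|}{n|\theta - \theta_0|^2} > \varepsilon\right\} < \varepsilon. \tag{A.30}$$

Using that $M/\sqrt n \le |\theta - \theta_0|$, bound the left-hand side by

$$\limsup_n P^*\left\{\sup_{M/\sqrt n\le|\theta - \theta_0|\le\delta}\frac{|R_{1n}(\theta)|}{\sqrt n|\theta - \theta_0|\,M} > \varepsilon\right\} = \limsup_n P^*\left\{\sup_{M/\sqrt n\le|\theta - \theta_0|\le\delta}\frac{|Z_n(\theta)|}{M} > \varepsilon\right\} < \varepsilon, \tag{A.31}$$

where, for any given $\varepsilon > 0$, in order to make the last inequality true, we can make either $\delta$ sufficiently small by the property (A.29) of $Z_n$ or $M$ sufficiently large by the property $Z_n = O_p(1)$.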
Appendix on Computation

B.1 A Computational Lemma

In this section we record some formal results on MCMC computation of the quasi-posterior quantities.

Lemma 3. Suppose the chain $(\theta^{(j)}, j \le B)$ is produced by the Metropolis-Hastings (MH) algorithm with $q$ such that $q(\theta|\theta') > 0$ for each $(\theta, \theta')$. Suppose also that $P\{\rho(\theta^{(j)}, \xi) = 1\} < 1$ for all $j > t$. Then

1. $p_n(\cdot)$ is the stationary density of the chain,
2. the chain is ergodic with the limit marginal distribution given by $p_n(\cdot)$:
$$\lim_{B\to\infty}\sup_A\left|P\big(\theta^{(B)}\in A\,\big|\,\theta^{(0)}\big) - \int_A p_n(\theta)\,d\theta\right| = 0,$$
where the supremum is taken over the Borel sets,
3. for any $p_n$-integrable function $g$:
$$\frac1B\sum_{j=1}^B g(\theta^{(j)}) \to_p \int_\Theta g(\theta)\,p_n(\theta)\,d\theta.$$

Proof. The result is immediate from Theorem 6.2.5 in Robert and Casella (1999). ∎

An immediate consequence of this lemma is the following result.
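Lemma 4. Suppose the chain $(\theta^{(j)}, j \le B)$ satisfies the conditions of Lemma 3, and suppose Assumptions 1 and 2 hold. Then for any convex and $p_n$-integrable loss function $\rho_n(\cdot)$, as $B \to \infty$,

$$\arg\inf_{\theta\in\Theta}\left[\frac1B\sum_{j=1}^B\rho_n\big(\theta^{(j)} - \theta\big)\right] \to_p \arg\inf_{\theta\in\Theta}\int_\Theta\rho_n(\bar\theta - \theta)\,p_n(\bar\theta)\,d\bar\theta,$$

provided that the latter is uniquely defined.

Proof. By Lemma 3 we have the pointwise convergence of the objective function: for any $\theta$,

$$\frac1B\sum_{j=1}^B\rho_n\big(\theta^{(j)} - \theta\big) \to_p \int_\Theta\rho_n(\bar\theta - \theta)\,p_n(\bar\theta)\,d\bar\theta,$$

which implies the result by the Convexity Lemma, since $\theta \mapsto \int_\Theta\rho_n(\bar\theta - \theta)\,p_n(\bar\theta)\,d\bar\theta$ is convex by convexity of $\rho_n(\cdot)$. ∎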
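Lemmas 3 and 4 suggest the basic recipe: run a Metropolis-Hastings chain whose stationary density is $p_n$, then minimize the empirical average of the loss over candidate values. Below is a minimal scalar sketch with a symmetric normal random-walk proposal (so $q$ cancels from the acceptance ratio) and absolute loss, whose minimizer is the quasi-posterior median. The function names, step size, and candidate grid are assumptions made for illustration.

```python
import math
import random


def rw_metropolis(log_quasi_post, theta0, n_draws, step, seed=0):
    """Random-walk Metropolis-Hastings chain for a scalar parameter.

    log_quasi_post(theta) should return L_n(theta) + log pi(theta) (or -math.inf off the support),
    so the chain targets p_n(theta), proportional to exp(L_n(theta)) pi(theta).
    """
    rng = random.Random(seed)
    theta, lp = theta0, log_quasi_post(theta0)
    draws = []
    for _ in range(n_draws):
        xi = theta + rng.gauss(0.0, step)             # symmetric normal proposal q(xi | theta)
        lp_xi = log_quasi_post(xi)
        # accept with probability rho(theta, xi) = min{1, p_n(xi) / p_n(theta)}
        if rng.random() < math.exp(min(0.0, lp_xi - lp)):
            theta, lp = xi, lp_xi
        draws.append(theta)
    return draws


def quasi_posterior_argmin(draws, loss, candidates=None):
    """Minimize (1/B) sum_j loss(theta^(j) - c) over a finite candidate set (default: the draws)."""
    cands = draws if candidates is None else candidates
    return min(cands, key=lambda c: sum(loss(d - c) for d in draws) / len(draws))


# Illustration: L_n(theta) = -theta^2 / 2 with a flat prior, i.e. a standard normal quasi-posterior
draws = rw_metropolis(lambda t: -0.5 * t * t, 0.0, 20_000, 2.4, seed=1)
grid = [i / 10 for i in range(-10, 11)]
print(quasi_posterior_argmin(draws, abs, candidates=grid))
```

Using the draws themselves as candidates costs $O(B^2)$ loss evaluations, so a coarse grid is passed in the illustration.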
B.2 Quasi-Bayes Estimation and Simulated Annealing

The relation between drawing from the shape of a likelihood surface and optimizing to find the mode of the likelihood function is well known. It is well established that (see e.g. Robert and Casella (1999))
$$\lim_{\lambda\to\infty}\frac{\int_\Theta\theta\,e^{\lambda L_n(\theta)}\,d\theta}{\int_\Theta e^{\lambda L_n(\theta)}\,d\theta}=\arg\max_{\theta\in\Theta}L_n(\theta). \tag{B.1}$$
Essentially, as $\lambda\to\infty$, the sequence of probability measures
$$\frac{e^{\lambda L_n(\theta)}}{\int_\Theta e^{\lambda L_n(\theta)}\,d\theta}\,d\theta \tag{B.2}$$
converges to the generalized Dirac probability measure concentrated at $\arg\max_{\theta\in\Theta}L_n(\theta)$.

The difficulty of nonlinear optimization has been an important issue in econometrics (Berndt et al. (1974), Sims (1999)). The simulated annealing algorithm (see e.g. Press et al. (1992), Goffe et al. (1994)) is usually considered a generic optimization method. It is an implementation of the simulation based optimization (B.1) with a uniform prior $\pi(\theta)=c$ on the parameter space $\Theta$. At each temperature level $1/\lambda$, the simulated annealing routine uses a large number of Metropolis-Hastings steps to draw from the quasi distribution (B.2). The temperature parameter is then decreased slowly while the Metropolis steps are repeated, until convergence criteria for the optimum are achieved.

Interestingly, the simulated annealing algorithm has been widely used in optimization of non-likelihood-based semiparametric objective functions. In principle, if the temperature parameter is decreased at an arbitrarily slow rate (that depends on the criterion function), simulated annealing can find the global optimum of non-smooth objective functions that may have many local extrema. Controlling the temperature reduction parameter is a very delicate matter and is certainly crucial to the performance of the algorithm with highly nonsmooth objective functions. On the other hand, the results of this paper apply equally to (B.2): as Theorems 1 and 2 show, we may fix the temperature parameter $1/\lambda$ at a positive constant and then compute the quasi-posterior medians or means for (B.2) using Metropolis steps. These estimates can be used in place of the exact maximum. They are consistent and asymptotically normal, and possess the same limiting distribution as the exact maximum. The interpretation of the simulated annealing algorithm as an implementation of (B.2) also suggests that for some problems with special structures, other MCMC methods, such as the Gibbs sampler, may be used to replace the Metropolis-Hastings step in the simulated annealing algorithm.
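The limit (B.1) can be visualized without any sampling. The sketch below replaces Metropolis steps by exact summation on a fine grid (flat prior) and uses a hypothetical non-smooth criterion `L` with its global maximum at 1.0 and a spurious local maximum at -1.5; both the criterion and the helper `quasi_posterior_mean` are illustrative assumptions. As $\lambda$ grows, the quasi-posterior mean moves to the global maximizer, as (B.1) states, while at a fixed $\lambda$ it is a well-defined smooth functional of the whole criterion.

```python
import numpy as np

def quasi_posterior_mean(L, grid, lam):
    """Mean of the density proportional to exp(lam * L(theta)) on a uniform grid (flat prior)."""
    logw = lam * L(grid)
    w = np.exp(logw - logw.max())      # subtract the max to avoid overflow
    return np.sum(grid * w) / np.sum(w)

def L(theta):
    """Hypothetical non-smooth criterion: global max at 1.0, local max at -1.5."""
    return -np.abs(theta - 1.0) + 0.8 * np.maximum(0.0, 1.0 - 4.0 * np.abs(theta + 1.5))

grid = np.linspace(-5.0, 5.0, 100_001)
for lam in (1.0, 10.0, 100.0, 1000.0):
    print(lam, quasi_posterior_mean(L, grid, lam))
```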
B.3 Details of Computation in Monte-Carlo Examples

The parameter space is taken to be $\Theta=[\theta_0\pm10]$. The prior is flat, truncated to $\Theta$. The transition kernel is a normal density, and each parameter is updated via a Gibbs-Metropolis procedure, which modifies slightly the basic Metropolis-Hastings algorithm: for $k=1,\dots,d$, a draw of $\xi_k$ from the univariate normal density $q(\xi_k\mid\theta_k^{(j)},\phi)$ is made, then the candidate value $\xi$ consisting of $\xi_k$ and $\theta_{-k}^{(j)}$ replaces $\theta^{(j)}$ with probability $\rho(\theta^{(j)},\xi)$ specified in the text. The variance parameter $\phi$ is adjusted every 100 draws (in the second simulation example and the empirical example) or every 200 draws (in the first simulation example) so that the rejection probability is roughly 50%. The first $N\times d$ draws (the burn-in stage) are discarded, and the remaining $N\times d$ draws are used in the computation of estimates and intervals. We use $N=5{,}000$ or $N=10{,}000$, depending on the example. The starting value is the OLS estimate in all examples. To give an idea of computational expense, computing one set of estimates takes 20-40 seconds, depending on the example. All of the codes that we used to produce figures, simulation, and empirical results are available from the authors.
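The Gibbs-Metropolis scheme described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name `gibbs_metropolis`, the multiplicative rule `exp(2 * (rate - 0.5))` for adjusting the proposal variance, the decision to adapt only during burn-in, and the toy normal target are our choices. One sweep updates all $d$ coordinates, so $N$ sweeps correspond to the $N\times d$ draws mentioned in the text.

```python
import numpy as np

def gibbs_metropolis(log_target, theta0, n_keep, burn_in, adapt_every=100, seed=0):
    """Coordinate-wise random-walk Metropolis with a normal proposal per coordinate.

    During burn-in, every `adapt_every` sweeps each proposal variance phi_k is
    rescaled so that its acceptance (and hence rejection) rate drifts toward 50%.
    Burn-in sweeps are discarded; the next `n_keep` sweeps are returned.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    d = theta.size
    phi = np.ones(d)                 # proposal variances (illustrative start values)
    accepted = np.zeros(d)           # acceptances since the last adjustment
    logp = log_target(theta)
    kept = np.empty((n_keep, d))
    for s in range(burn_in + n_keep):
        for k in range(d):           # update one coordinate at a time
            cand = theta.copy()
            cand[k] = rng.normal(theta[k], np.sqrt(phi[k]))
            logp_cand = log_target(cand)
            # Symmetric proposal: accept with probability min(1, p(cand) / p(theta)).
            if np.log(rng.uniform()) < logp_cand - logp:
                theta, logp = cand, logp_cand
                accepted[k] += 1
        if s < burn_in and (s + 1) % adapt_every == 0:
            rate = accepted / adapt_every
            phi *= np.exp(2.0 * (rate - 0.5))   # larger steps if accepting too often
            accepted[:] = 0
        if s >= burn_in:
            kept[s - burn_in] = theta
    return kept

mu = np.array([1.0, -2.0])
log_target = lambda t: -0.5 * np.sum((t - mu) ** 2)   # toy quasi-posterior N(mu, I)
draws = gibbs_metropolis(log_target, np.zeros(2), n_keep=5_000, burn_in=5_000)
print(draws.mean(axis=0))
```

Freezing the proposal variances after burn-in keeps the retained draws a genuine Markov chain; continuing to adapt throughout, as a literal reading of "every 100 draws" might suggest, is also possible but needs more care.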
Notation and Terms

$\to_p$: convergence in (outer) probability $P^*$
$\to_d$: convergence in distribution under $P^*$
wp $\to 1$: with inner probability $P_*$ converging to one
$A\sim B$: asymptotic equivalence, $A\sim B$ means $\lim AB^{-1}=I$
$B_\delta(x)$: ball centered at $x$ of radius $\delta>0$
$I$: identity matrix
$\mathcal N(0,a)$: normal random vector with mean 0 and variance matrix $a$
$\operatorname{mineig}(A)$: minimum eigenvalue of matrix $A$
$A>0$: when $A$ is a matrix, $A$ is positive definite
$\ell^\infty(\mathcal F)$: metric space of bounded functions on $\mathcal F$, see van der Vaart (1999)
Donsker class $\mathcal F$: here this means that the empirical process $f\mapsto n^{-1/2}\sum_{i=1}^n\big(f(W_i)-Ef(W_i)\big)$ is asymptotically Gaussian in $\ell^\infty(\mathcal F)$, see van der Vaart (1999)
References

Abadie, A., 1995. Changes in Spanish labor income structure during the 1980s: a quantile regression approach, CEMFI Working Paper.
Amemiya, T., 1977. The maximum likelihood and the nonlinear three-stage least squares estimator in the general nonlinear simultaneous equation model. Econometrica 45 (4), 955-968.
Amemiya, T., 1985. Advanced Econometrics. Harvard University Press.
Anderson, T. W., 1955. The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities. Proc. Amer. Math. Soc. 6, 170-176.
Andrews, D. W. K., 1994a. Empirical process methods in econometrics. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. North Holland, pp. 2248-2292.
Andrews, D. W. K., 1994b. The large sample correspondence between classical hypothesis tests and Bayesian posterior odds tests. Econometrica 62 (5), 1207-1232.
Andrews, D. W. K., 1997. A stopping rule for the computation of generalized method of moments estimators. Econometrica 65 (4), 913-931.
Andrews, D. W. K., 1999. Estimation when a parameter is on a boundary. Econometrica 66, 1341-83.
Berger, J. O., 2002. Bayesian analysis: a look at today and thoughts of tomorrow. In: Statistics in the 21st Century. Chapman and Hall, New York, pp. 275-290.
Berndt, E., Hall, B., Hall, R., Hausman, J., 1974. Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement 3 (4), 653-665.
Bernstein, S., 1917. Theory of Probability. (Russian) Fourth Edition (1946) Gostekhizdat, Moscow-Leningrad.
Berry, S., Levinsohn, J., Pakes, A., July 1995. Automobile prices in market equilibrium. Econometrica 63, 841-890.
Bickel, P. J., Yahav, J. A., 1969. Some contributions to the asymptotic theory of Bayes solutions. Z. Wahrsch. Verw. Geb. 11, 257-276.
Billingsley, P., 1994. Probability and Measure, 3rd Ed. John Wiley and Sons.
Buchinsky, M., 1991. Theory of and practice of quantile regression, Ph.D. dissertation, Department of Economics, Harvard University.
Buchinsky, M., Hahn, J., 1998. An alternative estimator for the censored regression model. Econometrica 66, 653-671.
Bunke, O., Milhaud, X., 1998. Asymptotic behavior of Bayes estimates under possibly incorrect models. The Annals of Statistics 26 (2), 617-644.
Chamberlain, G., Imbens, G., 1997. Nonparametric applications of Bayesian inference, NBER Working Paper.
Chernozhukov, V., Hansen, C., 2001. An IV model of quantile treatment effects, MIT Department of Economics Working Paper.
Chernozhukov, V., Umantsev, L., 2001. Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics 26, 271-92.
Chib, S., 2001. Markov chain Monte Carlo methods: Computation and inference. In: Heckman, J. J., Leamer, E. (Eds.), Handbook of Econometrics, Vol. 5, Chapter 5. North Holland, pp. 3564-3634.
Christoffersen, P., Hahn, J., Inoue, A., 1999. Testing, comparing and combining value at risk measures, Working Paper, Wharton School, University of Pennsylvania.
Diaconis, P., Freedman, D., 1986. On the consistency of Bayes estimates. Annals of Statistics 14, 1-26.
Doksum, K. A., Lo, A. Y., 1990. Consistent and robust Bayes procedures for location based on partial information. Ann. Statist. 18 (1), 443-453.
Engle, R., Manganelli, S., 2001. CAViaR: Conditional value at risk by regression quantiles, Working Paper, Department of Economics, UC San Diego.
Fitzenberger, B., 1997. A guide to censored quantile regressions. In: Robust Inference, Handbook of Statistics, Vol. 14. North-Holland, Amsterdam, pp. 405-437.
Gallant, A. R., White, H., 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Oxford: Basil Blackwell.
Geweke, J., Keane, M., 2001. Computationally intensive methods for integration in econometrics. In: Heckman, J. J., Leamer, E. (Eds.), Handbook of Econometrics, Vol. 5, Chapter 5. North Holland, pp. 3465-3564.
Goffe, W. L., Ferrier, G. D., Rogers, J., 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60, 65-99.
Hahn, J., 1997. Bayesian bootstrap of the quantile regression estimator: a large sample study. Internat. Econom. Rev. 38 (4), 795-808.
Hansen, L., Heaton, J., Yaron, A., 1996. Finite-sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14, 262-280.
Hansen, L. P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50 (4), 1029-1054.
Hogg, R. V., 1975. Estimates of percentile regression lines using salary data. Journal of the American Statistical Association 70, 56-59.
Huber, P. J., 1973. Robust regression: Asymptotics, conjectures, and Monte Carlo. Annals of Statistics 1, 799-821.
Ibragimov, I., Has'minskii, R., 1981. Statistical Estimation: Asymptotic Theory. Springer Verlag.
Imbens, G., 1997. One-step estimators for over-identified generalized method of moments models. Rev. Econom. Stud. 64 (3), 359-383.
Imbens, G., Spady, R., Johnson, P., 1998. Information theoretic approaches to inference in moment condition models. Econometrica 66, 333-357.
Jureckova, J., 1977. Asymptotic relations of M-estimators and R-estimators in linear regression models. Annals of Statistics 5, 464-472.
Khan, S., Powell, J. L., 2001. Two step estimation of semiparametric censored regression models. Journal of Econometrics 103, 73-110.
Kim, J.-Y., 1998. Large sample properties of posterior densities, Bayesian information criterion and the likelihood principle in nonstationary time series models. Econometrica 66 (2), 359-380.
Kim, J.-Y., 2002. Limited information likelihood and Bayesian analysis. Journal of Econometrics, 175-193.
Kitamura, Y., 1997. Empirical likelihood methods with weakly dependent processes. Ann. Statist. 25 (5), 2084-2102.
Kitamura, Y., Stutzer, M., 1997. An information-theoretic alternative to generalized method of moments estimation. Econometrica 65, 861-874.
Knight, K., 1999. Epi-convergence and stochastic equisemicontinuity, Working Paper, Department of Statistics, University of Toronto.
Koenker, R., 1994. Confidence intervals for quantile regression. In: Proceedings of the 5th Prague Symposium on Asymptotic Statistics. Heidelberg: Physica-Verlag, pp. 10-20.
Koenker, R., 1998. Treating the treated, varieties of causal analysis, Lecture Note, Department of Economics, University of Illinois.
Koenker, R., Bassett, G. S., 1978. Regression quantiles. Econometrica 46, 33-50.
Kottas, A., Gelfand, A., 2001. Bayesian semiparametric median regression modeling. Journal of the American Statistical Association 96, 1458-1468.
Lehmann, E., Casella, G., 1998. Theory of Point Estimation. Springer.
MaCurdy, T., Timmins, C., 2001. Bounding the influence of attrition on the intertemporal wage variation in the NLSY, Working Paper, Department of Economics, Yale University.
Mood, A. M., 1950. Introduction to the Theory of Statistics. McGraw-Hill Book Company, Inc.
Newey, W. K., 1991. Uniform convergence in probability and stochastic equicontinuity. Econometrica 59 (4), 1161-1167.
Newey, W. K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. North Holland, pp. 2113-2241.
Newey, W. K., Powell, J. L., 1990. Efficient estimation of linear and type I censored regression models under conditional quantile restrictions. Econometric Theory 6, 295-317.
Newey, W. K., Smith, R., 2001. Higher order properties of GMM and generalized empirical likelihood estimators, Working Paper, Department of Economics, MIT.
Newey, W. K., West, K. D., 1987. A simple, positive semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55 (3), 703-708.
Owen, A., 1989. Empirical likelihood ratio confidence regions. In: Proceedings of the 47th Session of the International Statistical Institute, Book 3 (Paris, 1989). Vol. 53. pp. 373-393.
Owen, A., 1990. Empirical likelihood ratio confidence regions. Ann. Statist. 18 (1), 90-120.
Owen, A., 1991. Empirical likelihood for linear models. Ann. Statist. 19 (4), 1725-1747.
Owen, A., 2001. Empirical Likelihood. Chapman and Hall/CRC.
Pakes, A., Pollard, D., 1989. Simulation and the asymptotics of optimization estimators. Econometrica 57 (5), 1027-1057.
Phillips, P. C. B., Ploberger, W., 1996. An asymptotic theory of Bayesian inference for time series. Econometrica 64 (2), 381-412.
Pollard, D., 1991. Asymptotics for least absolute deviation regression estimators. Econometric Theory 7, 186-199.
Pötscher, B. M., Prucha, I. R., 1997. Dynamic Nonlinear Econometric Models. Springer-Verlag, Berlin.
Powell, J. L., 1984. Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 303-325.
Press, W., Teukolsky, S. A., Vetterling, W., Flannery, B., 1992. Numerical Recipes in C, The Art of Scientific Computing. Cambridge.
Qin, J., Lawless, J., 1994. Empirical likelihood and general estimating equations. Annals of Statistics 22, 300-325.
Robert, C. P., Casella, G., 1999. Monte Carlo Statistical Methods. Springer.
Rousseeuw, P. J., Hubert, M., 1999. Regression depth. J. Amer. Statist. Assoc. 94 (446), 388-433.
Sims, C., 1999. Adaptive Metropolis-Hastings, or Monte Carlo kernel estimation, Working Paper, Department of Economics, Princeton University.
Stigler, S. M., 1975. Studies in the history of probability and statistics. XXXIV. Napoleonic statistics: the work of Laplace. Biometrika 62 (2), 503-517.
van Aelst, S., Rousseeuw, P. J., Hubert, M., Struyf, A., 2002. The deepest regression method. J. Multivariate Anal. 81 (1), 138-166.
van der Vaart, A., 1999. Asymptotic Statistics. Cambridge University Press.
van der Vaart, A. W., Wellner, J. A., 1996. Weak Convergence and Empirical Processes. Springer-Verlag, New York.
von Mises, R., 1931. Wahrscheinlichkeitsrechnung. Berlin: Springer.
White, H., 1994. Estimation, Inference and Specification Analysis. Vol. 22 of Econometric Society Monographs. Cambridge University Press, Cambridge.
Zellner, A., 1998. Past and recent results on maximal data information priors. J. Statist. Res. 32 (1), 1-22.
Table 1: Monte Carlo Comparison of LTE's with Censored Quantile Regression Estimates Obtained using Iterated Linear Programming (Based on 100 repetitions).

Estimator             RMSE    MAD     Mean Bias   Median Bias   Median Abs. Dev.
n=400
Q-posterior-mean      0.473   0.378    0.138       0.134         0.340
Q-posterior-median    0.465   0.372    0.131       0.137         0.344
Iterated LP(10)       0.518   0.284    0.040       0.016         0.171
                      3.798   0.827   -0.568      -0.035         0.240
n=1600
Q-posterior-mean      0.155   0.121   -0.018       0.009         0.089
Q-posterior-median    0.155   0.121   -0.020       0.002         0.092
Iterated LP(7)        0.134   0.106    0.040       0.067         0.085
                      3.547   0.511    0.023      -0.384         0.087
Table 2: Monte Carlo Comparison of the LTE's with Standard Estimation for a Linear Quantile Regression Model (Based on 500 repetitions).

Estimator                      RMSE    MAD     Mean Bias   Median Bias   Median AD
n=200
Q-posterior-mean               .0747   .0587    .0174       .0204        .0478
Q-posterior-median             .0779   .0608    .0192       .0136        .0519
Standard Quantile Regression   .0787   .0628    .0067       .0092        .0510
n=800
Q-posterior-mean               .0425   .0323   -.0018      -.0003        .0280
Q-posterior-median             .0445   .0339   -.0023       .0001        .0295
Standard Quantile Regression   .0498   .0398    .0007       .0025        .0356
Table 3: Monte Carlo Comparison of the LT Inference with Standard Inference for a Linear Quantile Regression Model (Based on 500 repetitions).

Inference Method                                               coverage   length
n=200
Quasi-posterior confidence interval, equal tailed                .943       .377
Quasi-posterior confidence interval, symmetric (around mean)     .941       .375
Quantile Regression: Hall-Sheather Interval                      .659       .177
n=800
Quasi-posterior confidence interval, equal tailed                .920       .159
Quasi-posterior confidence interval, symmetric (around mean)     .917       .158
Quantile Regression: Hall-Sheather Interval                      .602       .082
[Figure 1 panels: Criterion for IV-QR; Criterion for QB-Estimation; Markov Chain Sequence; Q-Posterior for Theta]
Figure 1: A Nonlinear IV Example involving Instrumental Quantile Regression. In the top-left panel the discontinuous objective function $L_n(\theta)$ is depicted (one-dimensional case). The true parameter is $\theta_0=0$. In the bottom-left panel, a Markov chain sequence of draws $(\theta^{(1)},\dots,\theta^{(J)})$ is depicted. The marginal distribution of this sequence is $p_n(\theta)=e^{L_n(\theta)}/\int_\Theta e^{L_n(\theta)}\,d\theta$; see the bottom-right panel. The point estimate, the sample mean $\hat\theta$, is given by the vertical line with the rhomboid root. Two other vertical lines are the 10-th and the 90-th percentiles of the quasi-posterior distribution. The upper-right panel depicts the expected loss function that the LTE minimize.
[Figure 2 panel: Var(p) for dynamic model]
Figure 2: Recursive VaR Surface in time-probability space.

[Figure 3 panel: Var(p) for static model]
Figure 3: Non-recursive VaR Surface in time-probability space.
Figure 4: $\theta_2(\tau)$ for $\tau\in[.2,.8]$ and the 90% confidence intervals.
Figure 5: $\theta_3(\tau)$ for $\tau\in[.2,.8]$ and the 90% confidence intervals.
Figure 6: $\theta_4(\tau)$ for $\tau\in[.2,.8]$ and the 90% confidence intervals.
Figure 7: $g(\tau)$ for $\tau\in[.2,.8]$ and the 90% confidence intervals.
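Throughout this proof the range of integration for $h$ is implicitly understood to be $H_n$. For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\pi_n(\theta)$ do not depend on $n$. The more general case follows similarly.

Part 1. Define
$$h=\sqrt n(\theta-T_n),\qquad T_n=\theta_0+\frac1nJ(\theta_0)^{-1}\Delta_n(\theta_0),\qquad U_n=\frac{1}{\sqrt n}J(\theta_0)^{-1}\Delta_n(\theta_0), \tag{A.2}$$
then
$$p_n^*(h)=\frac{1}{\sqrt n^{\,d}}\,p_n\Big(\frac{h}{\sqrt n}+T_n\Big)=\frac{\pi\big(\frac{h}{\sqrt n}+T_n\big)\exp\big(L_n\big(\frac{h}{\sqrt n}+T_n\big)\big)}{\int_{H_n}\pi\big(\frac{h}{\sqrt n}+T_n\big)\exp\big(L_n\big(\frac{h}{\sqrt n}+T_n\big)\big)dh}=\frac{\pi\big(\frac{h}{\sqrt n}+T_n\big)\exp(\omega(h))}{\int_{H_n}\pi\big(\frac{h}{\sqrt n}+T_n\big)\exp(\omega(h))\,dh},$$
where
$$\omega(h)=L_n\Big(T_n+\frac{h}{\sqrt n}\Big)-L_n(\theta_0)-\frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0) \tag{A.3}$$
and
$$C_n=\int_{H_n}\pi\Big(\frac{h}{\sqrt n}+T_n\Big)\exp(\omega(h))\,dh.$$
Part 2 shows that for each $\alpha\ge0$,
$$A_{1n}=\int_{H_n}|h|^\alpha\Big|\exp(\omega(h))\,\pi\Big(T_n+\frac{h}{\sqrt n}\Big)-\exp\Big(-\frac12h'J(\theta_0)h\Big)\pi(\theta_0)\Big|\,dh\to_p0. \tag{A.4}$$
Given (A.4), taking $\alpha=0$, we have
$$C_n\to_p\int e^{-\frac12h'J(\theta_0)h}\,\pi(\theta_0)\,dh=\pi(\theta_0)(2\pi)^{d/2}|\det J(\theta_0)|^{-1/2}, \tag{A.5}$$
hence $C_n=O_p(1)$, and, since the limit in (A.5) is positive when $\pi(\theta_0)>0$, also $C_n^{-1}=O_p(1)$. Next note
$$\text{left side of (A.1)}=\int_{H_n}|h|^\alpha\,|p_n^*(h)-p_\infty^*(h)|\,dh=A_nC_n^{-1},$$
where
$$A_n=\int_{H_n}|h|^\alpha\Big|e^{\omega(h)}\pi\Big(\frac{h}{\sqrt n}+T_n\Big)-(2\pi)^{-d/2}|\det J(\theta_0)|^{1/2}\exp\Big(-\frac12h'J(\theta_0)h\Big)C_n\Big|\,dh.$$
Using (A.5), to show (A.1) it suffices to show that $A_n\to_p0$. But $A_n\le A_{1n}+A_{2n}$, where
$$A_{2n}=\int_{H_n}|h|^\alpha\Big|C_n(2\pi)^{-d/2}|\det J(\theta_0)|^{1/2}\exp\Big(-\frac12h'J(\theta_0)h\Big)-\pi(\theta_0)\exp\Big(-\frac12h'J(\theta_0)h\Big)\Big|\,dh.$$
Then by (A.4) and (A.5),
$$A_{2n}=\Big|C_n(2\pi)^{-d/2}|\det J(\theta_0)|^{1/2}-\pi(\theta_0)\Big|\int|h|^\alpha\exp\Big(-\frac12h'J(\theta_0)h\Big)dh\to_p0.$$

Part 2. It remains only to show (A.4). Given Assumption 4 and the definitions in (A.2) and (A.3), write
$$\omega(h)=-\frac12h'J(\theta_0)h+R_n\Big(T_n+\frac{h}{\sqrt n}\Big).$$
Split the integral $A_{1n}$ in (A.4) over three separate areas:
Area (i): $|h|\le M$,
Area (ii): $M<|h|\le\delta\sqrt n$,
Area (iii): $|h|>\delta\sqrt n$.
Each of these areas is implicitly understood to intersect with the range of integration for $h$, which is $H_n$.

Area (i): We will show that for each $0<M<\infty$ and each $\epsilon>0$,
$$\liminf_{n\to\infty}P^*\Big\{\int_{|h|\le M}|h|^\alpha\Big|\exp(\omega(h))\,\pi\Big(T_n+\frac{h}{\sqrt n}\Big)-\exp\Big(-\frac12h'J(\theta_0)h\Big)\pi(\theta_0)\Big|\,dh<\epsilon\Big\}\ge1-\epsilon. \tag{A.6}$$
This is proved by showing that
$$\sup_{|h|\le M}|h|^\alpha\Big|\exp(\omega(h))\,\pi\Big(T_n+\frac{h}{\sqrt n}\Big)-\exp\Big(-\frac12h'J(\theta_0)h\Big)\pi(\theta_0)\Big|\to_p0. \tag{A.7}$$
Using the definition of $\omega(h)$, (A.7) follows from:
$$\text{(a)}\ \sup_{|h|\le M}\Big|\pi\Big(\frac{h}{\sqrt n}+T_n\Big)-\pi(\theta_0)\Big|\to_p0,\qquad\text{(b)}\ \sup_{|h|\le M}\Big|R_n\Big(T_n+\frac{h}{\sqrt n}\Big)\Big|\to_p0,$$
where (a) follows from the continuity of $\pi(\cdot)$ and because, by Assumption 4.ii-4.iii,
$$\frac{1}{\sqrt n}J(\theta_0)^{-1}\Delta_n(\theta_0)=O_p(1). \tag{A.8}$$
Given (A.8), (b) follows from Assumption 4.iv, since
$$\sup_{|h|\le M}\Big|T_n+\frac{h}{\sqrt n}-\theta_0\Big|=O_p(1/\sqrt n).$$

Area (ii): We show that for each $\epsilon>0$ there exist large $M$ and small $\delta>0$ such that
$$\liminf_{n\to\infty}P^*\Big\{\int_{M<|h|\le\delta\sqrt n}|h|^\alpha\Big|\exp(\omega(h))\,\pi\Big(T_n+\frac{h}{\sqrt n}\Big)-\exp\Big(-\frac12h'J(\theta_0)h\Big)\pi(\theta_0)\Big|\,dh<\epsilon\Big\}\ge1-\epsilon. \tag{A.9}$$
Since the integral of the second term is finite and can be made arbitrarily small by setting $M$ large, it suffices to show that for each $\epsilon>0$ there exist large $M$ and small $\delta>0$ such that
$$\liminf_{n\to\infty}P^*\Big\{\int_{M<|h|\le\delta\sqrt n}|h|^\alpha\exp(\omega(h))\,\pi\Big(T_n+\frac{h}{\sqrt n}\Big)dh<\epsilon\Big\}\ge1-\epsilon. \tag{A.10}$$
In order to do so, it suffices to show that for sufficiently large $M$, with probability at least $1-\epsilon$ as $n\to\infty$,
$$\exp(\omega(h))\,\pi\Big(T_n+\frac{h}{\sqrt n}\Big)\le C\exp\Big(-\frac14h'J(\theta_0)h\Big)\quad\text{for all }M<|h|\le\delta\sqrt n. \tag{A.11}$$
By assumption $\pi(\cdot)\le K$, so we can drop it from consideration (taking $C=K$). By definition of $\omega(h)$,
$$\exp(\omega(h))=\exp\Big(-\frac12h'J(\theta_0)h+R_n\Big(T_n+\frac{h}{\sqrt n}\Big)\Big).$$
Since $|T_n-\theta_0|=o_p(1)$, for any $\delta>0$, wp $\to1$,
$$\Big|T_n+\frac{h}{\sqrt n}-\theta_0\Big|<2\delta\quad\text{for all }|h|\le\delta\sqrt n.$$
Moreover $n|T_n+h/\sqrt n-\theta_0|^2=|h+U_n|^2$, and since $U_n=O_p(1)$, for $M$ large enough $|h+U_n|^2\le2|h|^2$ for all $|h|>M$ with probability at least $1-\epsilon/2$. Thus, by Assumption 4.iv(a) there exist some small $\delta>0$ and large $M$ such that
$$\liminf_{n\to\infty}P^*\Big\{\sup_{M<|h|\le\delta\sqrt n}\frac{|R_n(T_n+h/\sqrt n)|}{|h|^2}\le\frac14\operatorname{mineig}(J(\theta_0))\Big\}\ge1-\epsilon.$$
Hence
$$\liminf_{n\to\infty}P^*\Big\{\exp(\omega(h))\le\exp\Big(-\frac12h'J(\theta_0)h+\frac14\operatorname{mineig}(J(\theta_0))|h|^2\Big)\le\exp\Big(-\frac14h'J(\theta_0)h\Big)\ \text{for all }M<|h|\le\delta\sqrt n\Big\}\ge1-\epsilon. \tag{A.12}$$
(A.12) implies (A.11), which in turn implies (A.9).

Area (iii): We will show that for each $\epsilon>0$ and each $\delta>0$,
$$\liminf_{n\to\infty}P^*\Big\{\int_{|h|>\delta\sqrt n}|h|^\alpha\Big|\exp(\omega(h))\,\pi\Big(T_n+\frac{h}{\sqrt n}\Big)-\exp\Big(-\frac12h'J(\theta_0)h\Big)\pi(\theta_0)\Big|\,dh<\epsilon\Big\}\ge1-\epsilon. \tag{A.13}$$
The integral of the second term clearly goes to 0 as $n\to\infty$. Therefore we only need to show
$$\int_{|h|>\delta\sqrt n}|h|^\alpha\exp(\omega(h))\,\pi\Big(T_n+\frac{h}{\sqrt n}\Big)dh\to_p0.$$
Recalling the definition of $h$, the term is bounded by
$$n^{(\alpha+d)/2}\int_{|\theta-T_n|>\delta}|\theta-T_n|^\alpha\,\pi(\theta)\exp\Big(L_n(\theta)-L_n(\theta_0)-\frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0)\Big)d\theta.$$
Since $T_n-\theta_0\to_p0$, wp $\to1$ this is bounded by
$$K_n\,C\,n^{(\alpha+d)/2}\int_{|\theta-\theta_0|>\delta/2}\big(1+|\theta|^\alpha\big)\pi(\theta)\exp\big(L_n(\theta)-L_n(\theta_0)\big)d\theta,$$
where
$$K_n=\exp\Big(-\frac{1}{2n}\Delta_n(\theta_0)'J(\theta_0)^{-1}\Delta_n(\theta_0)\Big)=O_p(1).$$
By Assumption 3 there exists $\epsilon>0$ such that
$$\liminf_{n\to\infty}P^*\Big\{\sup_{|\theta-\theta_0|\ge\delta/2}e^{L_n(\theta)-L_n(\theta_0)}\le e^{-n\epsilon}\Big\}=1.$$
Thus, wp $\to1$ the entire term is bounded by
$$K_n\,C\,n^{(\alpha+d)/2}e^{-n\epsilon}\int\big(1+|\theta|^\alpha\big)\pi(\theta)\,d\theta=o_p(1). \tag{A.14}$$
Here observe that compactness is only used to ensure that
$$\int_\Theta|\theta|^\alpha\pi(\theta)\,d\theta<\infty. \tag{A.15}$$
Hence by replacing compactness with the condition (A.15), the conclusion (A.14) is not affected for the given $\alpha$.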
The entire proof is now completed by combining (A.6), (A.9), and (A.13).
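The limit in (A.5) is the Gaussian integral $\int_{\mathbb R^d}e^{-\frac12h'Jh}\,dh=(2\pi)^{d/2}|\det J|^{-1/2}$ scaled by $\pi(\theta_0)$. As a quick sanity check in $d=2$, the sketch below compares a Riemann sum with the closed form; the matrix `J` and the prior value `pi0` are hypothetical numbers chosen only for illustration.

```python
import numpy as np

J = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # hypothetical positive definite J(theta_0)
pi0 = 0.3                           # hypothetical prior density value pi(theta_0)
d = J.shape[0]

# Riemann sum of pi0 * exp(-h'Jh / 2) over a wide square grid in R^2.
t = np.linspace(-12.0, 12.0, 1201)
step = t[1] - t[0]
H1, H2 = np.meshgrid(t, t, indexing="ij")
quad_form = J[0, 0] * H1**2 + 2.0 * J[0, 1] * H1 * H2 + J[1, 1] * H2**2
numeric = pi0 * np.exp(-0.5 * quad_form).sum() * step**2

closed_form = pi0 * (2.0 * np.pi) ** (d / 2) * abs(np.linalg.det(J)) ** (-0.5)
print(numeric, closed_form)
```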
A.2 Proof of Theorem 2

For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\Omega_n(\theta)$ do not depend on $n$. The more general case follows similarly.

Recall that
$$h=\sqrt n(\theta-\theta_0)-J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n.$$
Define $U_n=J(\theta_0)^{-1}\Delta_n(\theta_0)/\sqrt n$. Consider the objective function
$$Q_n(z)=\int_{H_n}\rho(z-h-U_n)\,p_n^*(h)\,dh,$$
which is minimized at $\sqrt n(\hat\theta-\theta_0)$. Also define
$$Q_\infty(z)=\int_{\mathbb R^d}\rho(z-h-U_n)\,p_\infty^*(h)\,dh,$$
which is minimized at a random vector denoted $Z_n$. Define
$$\xi=\arg\inf_{z\in\mathbb R^d}\Big\{\int_{\mathbb R^d}\rho(z-h)\,p_\infty^*(h)\,dh\Big\}.$$
Note that the solution $\xi$ is unique and finite by Assumption 2 parts (ii) and (iii) on the loss function $\rho$. When $\rho$ is symmetric, $\xi=0$ by Anderson's lemma. Therefore, $Z_n=\arg\inf_{z\in\mathbb R^d}Q_\infty(z)$ equals
$$Z_n=\xi+U_n=O_p(1).$$
Next, we have for any fixed $z$
$$\begin{aligned}|Q_n(z)-Q_\infty(z)|&\le\int_{H_n}\rho(z-h-U_n)\,|p_n^*(h)-p_\infty^*(h)|\,dh+\int_{\mathbb R^d\setminus H_n}\rho(z-h-U_n)\,p_\infty^*(h)\,dh\\&\le\int_{H_n}\big(1+2^{p-1}|h|^p+2^{p-1}|z-U_n|^p\big)|p_n^*(h)-p_\infty^*(h)|\,dh+\int_{\mathbb R^d\setminus H_n}\big(1+|z-h-U_n|^p\big)p_\infty^*(h)\,dh\\&=\int_{H_n}\big(1+2^{p-1}|h|^p+O_p(1)\big)|p_n^*(h)-p_\infty^*(h)|\,dh+o_p(1)=o_p(1),\end{aligned}$$
where we used $\rho(h)\le1+|h|^p$ by Assumption 2 part (ii) and $|a+b|^p\le2^{p-1}|a|^p+2^{p-1}|b|^p$ for $p\ge1$; the $o_p(1)$ conclusion for the first integral is by Theorem 1, and for the second integral it follows because the Lebesgue measure of $(\mathbb R^d\setminus H_n)\cap K$ converges to zero for every compact $K$ and the normal density has exponentially small tails.

Now $Q_n(z)$ and $Q_\infty(z)$ are convex and finite, and $Z_n=\arg\inf_{z\in\mathbb R^d}Q_\infty(z)=O_p(1)$. By the convexity lemma of Pollard (1991), pointwise convergence entails uniform convergence over compact sets $K$:
$$\sup_{z\in K}|Q_n(z)-Q_\infty(z)|\to_p0.$$
Since $Z_n=O_p(1)$, uniform convergence and convexity arguments like those in Jureckova (1977) imply that $\sqrt n(\hat\theta-\theta_0)-Z_n\to_p0$, as shown below.

Proof of $\sqrt n(\hat\theta-\theta_0)-Z_n=o_p(1)$. The proof follows by extending slightly the convexity argument of Jureckova (1977) and Pollard (1991) to the present context. Consider a ball $B_\delta(Z_n)$ with radius $\delta>0$, centered at $Z_n$, and let $z=Z_n+dv$, where $v$ is a unit direction vector ($|v|=1$) and $d>\delta$. Because $Z_n=O_p(1)$, for any $\delta>0$ and $\epsilon>0$ there exists $K>0$ such that
$$\liminf_{n\to\infty}P^*\big\{E_n=\{B_\delta(Z_n)\subset B_K(0)\}\big\}>1-\epsilon.$$
By convexity, for any $z=Z_n+dv$ constructed so, it follows that
$$\frac{\delta}{d}\big(Q_n(z)-Q_n(Z_n)\big)\ge Q_n(z^*)-Q_n(Z_n),$$
where $z^*$ is the point of the boundary of $B_\delta(Z_n)$ on the line connecting $z$ and $Z_n$. By the uniform convergence of $Q_n(z)$ to $Q_\infty(z)$ over any compact set $B_K(0)$, whenever $E_n$ occurs,
$$\frac{\delta}{d}\big(Q_n(z)-Q_n(Z_n)\big)\ge Q_\infty(z^*)-Q_\infty(Z_n)+o_p(1)\ge V_n+o_p(1),$$
where $V_n=\inf_{|z-Z_n|=\delta}Q_\infty(z)-Q_\infty(Z_n)>0$ is a uniformly positive variable, because $Z_n$ is the unique optimizer of $Q_\infty$. That is, there exists an $\eta>0$ such that $\liminf_nP^*(V_n>\eta)\ge1-\epsilon$. Hence we have, with probability at least $1-3\epsilon$ for large $n$,
$$\frac{\delta}{d}\big(Q_n(z)-Q_n(Z_n)\big)>\frac{\eta}{2}>0.$$
Thus, $\sqrt n(\hat\theta-\theta_0)$ eventually belongs to the complement of $B_\delta(Z_n)$ with probability at most $3\epsilon$. Since we can set $\epsilon$ as small as we like by picking (a) sufficiently large $K$, (b) sufficiently large $n$, and (c) sufficiently small $\eta>0$, it follows that
$$\limsup_{n\to\infty}P^*\big\{|Z_n-\sqrt n(\hat\theta-\theta_0)|>\delta\big\}=0.$$
Since this is true for any $\delta>0$, it follows that
$$\sqrt n(\hat\theta-\theta_0)-Z_n=o_p(1).$$
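The role of Anderson's lemma in the proof can be checked numerically in the scalar case. The sketch below assumes a hypothetical $J(\theta_0)=2$, so $p^*_\infty$ is the $\mathcal N(0,1/2)$ density, approximates $\int\rho(z-h)p^*_\infty(h)\,dh$ by a Riemann sum, and minimizes over a grid of $z$. For the symmetric absolute loss the minimizer $\xi$ is 0; for an asymmetric check loss (our choice, for contrast only) it is a nonzero quantile of $p^*_\infty$, so in that case $Z_n=\xi+U_n$ is shifted away from $U_n$.

```python
import numpy as np

j = 2.0                                   # hypothetical scalar J(theta_0)
h = np.linspace(-10.0, 10.0, 10_001)      # integration grid for h, step 0.002
dh = h[1] - h[0]
p_inf = np.sqrt(j / (2.0 * np.pi)) * np.exp(-0.5 * j * h**2)   # N(0, 1/j) density

def expected_loss(z, rho):
    """Riemann-sum approximation of the integral of rho(z - h) * p_inf(h) dh."""
    return np.sum(rho(z - h) * p_inf) * dh

def check_loss(u, tau=0.8):
    """Asymmetric check loss u * (tau - 1{u < 0}), used for contrast."""
    return u * (tau - (u < 0))

z_grid = np.linspace(-2.0, 2.0, 2001)     # candidate z values, step 0.002
xi_abs = z_grid[np.argmin([expected_loss(z, np.abs) for z in z_grid])]
xi_check = z_grid[np.argmin([expected_loss(z, check_loss) for z in z_grid])]
print(xi_abs)     # symmetric loss: minimizer at 0, as Anderson's lemma predicts
print(xi_check)   # asymmetric loss: the 0.2-quantile of N(0, 1/j), not 0
```

With the loss evaluated at $z-h$, setting the derivative $\tau-P(H>z)$ to zero shows that the check-loss minimizer is the $(1-\tau)$-quantile of $p^*_\infty$, about $-0.595$ here.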
A.3 Proof of Theorem 3

For clarity of the argument, we limit exposition only to the case where $J_n(\theta)$ and $\Omega_n(\theta)$ do not depend on $n$. The more general case follows similarly. Recall that we defined
$$F_{g,n}(x)=\int_{\theta\in\Theta:\,g(\theta)\le x}p_n(\theta)\,d\theta.$$
Evaluate it at $x=g(\theta_0)+s/\sqrt n$ and change the variable of integration:
$$H_{g,n}(s)=F_{g,n}\big(g(\theta_0)+s/\sqrt n\big)=\int_{h\in H_n:\,g(\theta_0+h/\sqrt n+U_n/\sqrt n)\le g(\theta_0)+s/\sqrt n}p_n^*(h)\,dh.$$
Define also
$$\bar H_{g,n}(s)=\int_{h\in\mathbb R^d:\,g(\theta_0+h/\sqrt n+U_n/\sqrt n)\le g(\theta_0)+s/\sqrt n}p_\infty^*(h)\,dh,\qquad H_{g,\infty}(s)=\int_{h\in\mathbb R^d:\,\nabla g(\theta_0)'(h+U_n)\le s}p_\infty^*(h)\,dh.$$
By the definition of the total variation of moments norm and Theorem 1,
$$\sup_s|H_{g,n}(s)-\bar H_{g,n}(s)|\to_p0,$$
where the sup is taken over the support of $H_{g,n}(s)$. By the uniform continuity of the integral of the normal density with respect to the boundary of integration,
$$\sup_s|\bar H_{g,n}(s)-H_{g,\infty}(s)|\to_p0,$$
which implies
$$\sup_s|H_{g,n}(s)-H_{g,\infty}(s)|\to_p0,$$
where the sup is taken over the support of $H_{g,n}(s)$. The convergence of distribution functions implies the convergence of quantiles at continuity points of distribution functions, see e.g. Billingsley (1994), so
$$H_{g,n}^{-1}(\alpha)-H_{g,\infty}^{-1}(\alpha)\to_p0.$$
Next observe that $H_{g,\infty}(s)=P\{\nabla g(\theta_0)'(\mathcal N(0,J^{-1}(\theta_0))+U_n)\le s\mid U_n\}$, so that
$$H_{g,\infty}^{-1}(\alpha)=\nabla g(\theta_0)'U_n+q_\alpha\sqrt{\nabla g(\theta_0)'J^{-1}(\theta_0)\nabla g(\theta_0)},$$
where $q_\alpha$ is the $\alpha$-quantile of $\mathcal N(0,1)$. Recalling that we defined $c_{g,n}(\alpha)=F_{g,n}^{-1}(\alpha)$, by quantile equivariance with respect to monotone transformations
$$H_{g,n}^{-1}(\alpha)=\sqrt n\big(c_{g,n}(\alpha)-g(\theta_0)\big),$$
so that
$$\sqrt n\big(c_{g,n}(\alpha)-g(\theta_0)\big)=\nabla g(\theta_0)'U_n+q_\alpha\sqrt{\nabla g(\theta_0)'J^{-1}(\theta_0)\nabla g(\theta_0)}+o_p(1).$$
The rest of the result follows by the $\Delta$-method.
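This is what licenses reading confidence intervals for $g(\theta)$ directly off MCMC output: $F_{g,n}$ is replaced by the empirical distribution of $g(\theta^{(1)}),\dots,g(\theta^{(B)})$ and $c_{g,n}(\alpha)$ by its sample quantiles. The sketch below is illustrative only: the helper `quasi_posterior_interval`, the choice $g(\theta)=\theta_1+\theta_2$, and the normal stand-in draws from $\mathcal N(\hat\theta,J^{-1}/n)$ (the large-sample shape of $p_n$) are assumptions, not the paper's code. Coverage of such intervals rests on the conditions of Theorem 3.

```python
import numpy as np

def quasi_posterior_interval(draws, g, level=0.90):
    """Equal-tailed interval [c_g(a/2), c_g(1 - a/2)], a = 1 - level, from draws of theta."""
    values = np.array([g(t) for t in draws])       # g(theta^(j)) for each draw
    a = 1.0 - level
    lo, hi = np.quantile(values, [a / 2.0, 1.0 - a / 2.0])
    return lo, hi

rng = np.random.default_rng(1)
n = 400
theta_hat = np.array([0.5, -0.2])
J_inv = np.array([[1.0, 0.3],
                  [0.3, 2.0]])                     # hypothetical J^{-1}(theta_0)
# Stand-in for MCMC output: draws from N(theta_hat, J^{-1} / n).
draws = rng.multivariate_normal(theta_hat, J_inv / n, size=50_000)

g = lambda t: t[0] + t[1]                          # hypothetical function of interest
lo, hi = quasi_posterior_interval(draws, g)
print(lo, hi)
```

Here $g$ is linear, so the interval should be close to $0.3\pm1.645\sqrt{3.6/400}$; for nonlinear $g$ the same code applies unchanged, which is the practical appeal of quantile-based intervals.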
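A.4 Proof of Theorem 4

In view of Assumption 4, it suffices to show that
$$\hat J_n^{-1}-J_n^{-1}(\theta_0)\to_p0, \tag{A.17}$$
and then conclude by the $\Delta$-method. Recall that
$$h=\sqrt n(\theta-\theta_0)-J_n^{-1}(\theta_0)\Delta_n(\theta_0)/\sqrt n,$$
and the localized quasi-posterior density for $h$ is
$$p_n^*(h)=\frac{1}{\sqrt n^{\,d}}\,p_n\big(h/\sqrt n+\theta_0+U_n/\sqrt n\big),\qquad U_n=J_n^{-1}(\theta_0)\Delta_n(\theta_0)/\sqrt n.$$
Note also
$$\hat J_n^{-1}=\int n(\theta-\hat\theta)(\theta-\hat\theta)'p_n(\theta)\,d\theta=\int_{H_n}\big(h-\sqrt n(\hat\theta-\theta_0)+U_n\big)\big(h-\sqrt n(\hat\theta-\theta_0)+U_n\big)'p_n^*(h)\,dh,$$
and
$$J_n^{-1}(\theta_0)=\int hh'\,p_\infty^*(h)\,dh.$$
We have, denoting $h=(h_1,\dots,h_d)$ and $\tilde T_n=(\tilde T_{n1},\dots,\tilde T_{nd})$, where $\tilde T_n=\sqrt n(\hat\theta-\theta_0)-U_n$, for all $i,j\le d$:
(a) $\int_{H_n}h_ih_j\big(p_n^*(h)-p_\infty^*(h)\big)dh=o_p(1)$ by Theorem 1,
(b) $\int_{H_n^c}h_ih_j\,p_\infty^*(h)\,dh=o_p(1)$ by definition of $p_\infty^*$ and $J_n(\theta_0)$ being uniformly nonsingular,
(c) $\int_{H_n}|\tilde T_n|^2\big(p_n^*(h)-p_\infty^*(h)\big)dh=o_p(1)$ by Theorem 2,
(d) $\int_{H_n}|\tilde T_n|^2\,p_\infty^*(h)\,dh=o_p(1)$ by Theorem 2, definition of $p_\infty^*$, and $J_n(\theta_0)$ being nonsingular,
(e) $\int_{H_n}h_j\tilde T_{ni}\big(p_n^*(h)-p_\infty^*(h)\big)dh=o_p(1)$ by Theorems 1 and 2,
(f) $\int_{H_n}h_j\tilde T_{ni}\,p_\infty^*(h)\,dh=o_p(1)$ by Theorems 1 and 2, definition of $p_\infty^*$, and $J_n(\theta_0)$ being uniformly nonsingular,
from which the required conclusion follows.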
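Condition (A.17) says that $n$ times the quasi-posterior variance, computed around the quasi-posterior mean, approaches $J^{-1}(\theta_0)$. With MCMC draws this is simply $n$ times the sample covariance matrix of the chain. A minimal sketch under assumptions: the helper `quasi_posterior_variance` is hypothetical, `J` is a made-up matrix, and independent normal draws stand in for the chain.

```python
import numpy as np

def quasi_posterior_variance(draws, n):
    """n * (1/B) * sum_j (theta_j - theta_bar)(theta_j - theta_bar)' for draws of shape (B, d)."""
    centered = draws - draws.mean(axis=0)      # theta_hat taken as the quasi-posterior mean
    return n * (centered.T @ centered) / draws.shape[0]

rng = np.random.default_rng(2)
n = 400
J = np.array([[2.0, 0.5],
              [0.5, 1.0]])                     # hypothetical J(theta_0)
J_inv = np.linalg.inv(J)
# Stand-in for MCMC output: the large-sample shape N(theta_hat, J^{-1} / n) of p_n.
draws = rng.multivariate_normal(np.zeros(2), J_inv / n, size=100_000)
V_hat = quasi_posterior_variance(draws, n)
print(V_hat)
print(J_inv)
```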
A.5 Proof of Proposition 1

Assumption 3 is directly implied by (4.1)-(4.4) and the uniform continuity of $Em_i(\theta)$, as shown in Lemma 1. It remains only to verify Assumption 4. Define the identity
$$L_n(\theta)-L_n(\theta_0)=-n\,g_n(\theta_0)'W(\theta_0)G(\theta_0)(\theta-\theta_0)-\frac12(\theta-\theta_0)'\,nG(\theta_0)'W(\theta_0)G(\theta_0)\,(\theta-\theta_0)+R_n(\theta). \tag{A.18}$$
Next, given the definition of $\Delta_n(\theta_0)$ and $J(\theta_0)$, conditions i, ii, iii of Assumption 4 are immediate from conditions i-iii of Proposition 1. Condition iv of Assumption 4 is verified as follows. It can be succinctly stated as: for each $\epsilon>0$ there exists a $\delta>0$ such that
$$\limsup_{n\to\infty}P^*\Big\{\sup_{|\theta-\theta_0|\le\delta}\frac{|R_n(\theta)|}{1+n|\theta-\theta_0|^2}>\epsilon\Big\}<\epsilon.$$
This stochastic equicontinuity condition is equivalent to the following stochastic equicontinuity condition, see e.g. Andrews (1994a): for any $\delta_n\to0$,
$$\sup_{|\theta-\theta_0|\le\delta_n}\frac{|R_n(\theta)|}{1+n|\theta-\theta_0|^2}=o_p(1). \tag{A.19}$$
This is weaker than condition (v) of Theorem 7.1 in Newey and McFadden (1994), which requires
$$\sup_{|\theta-\theta_0|\le\delta_n}\frac{|R_n(\theta)|}{\sqrt n|\theta-\theta_0|+n|\theta-\theta_0|^2}=o_p(1), \tag{A.20}$$
since
$$\sup\frac{|R_n(\theta)|}{1+n|\theta-\theta_0|^2}\le\sup\frac{|R_n(\theta)|}{\sqrt n|\theta-\theta_0|+n|\theta-\theta_0|^2}\cdot\sup\Big[\frac{\sqrt n|\theta-\theta_0|+n|\theta-\theta_0|^2}{1+n|\theta-\theta_0|^2}\Big],$$
where the term in brackets is bounded by 2. Hence the arguments of the proof, except for several important differences, follow those of Theorem 7.2 in Newey and McFadden (1994). At first note that condition iv of Proposition 1 implies the condition (where we let $g(\theta)=Eg_n(\theta)$): for any $\delta_n\to0$,
$$\sup_{\theta\in B_{\delta_n}(\theta_0)}|e(\theta)|=o_p\Big(\frac{1}{\sqrt n}\Big),\qquad e(\theta)=\frac{g_n(\theta)-g(\theta)-g_n(\theta_0)}{1+\sqrt n|\theta-\theta_0|}. \tag{A.21}$$
From (A.18), $R_n(\theta)=R_{1n}(\theta)+R_{2n}(\theta)+R_{3n}(\theta)$, where
$$R_{1n}(\theta)=n\Big(g_n(\theta_0)'W_n(\theta)G(\theta_0)(\theta-\theta_0)-\frac12g_n(\theta)'W_n(\theta)g_n(\theta)+\frac12g_n(\theta_0)'W_n(\theta)g_n(\theta_0)+\frac12(\theta-\theta_0)'G(\theta_0)'W_n(\theta)G(\theta_0)(\theta-\theta_0)\Big),$$
$$R_{2n}(\theta)=n\Big(\frac12g_n(\theta_0)'\big(W_n(\theta_0)-W_n(\theta)\big)g_n(\theta_0)\Big),$$
$$R_{3n}(\theta)=n\Big(g_n(\theta_0)'\big(W(\theta_0)-W_n(\theta)\big)G(\theta_0)(\theta-\theta_0)+\frac12(\theta-\theta_0)'G(\theta_0)'\big(W(\theta_0)-W_n(\theta)\big)G(\theta_0)(\theta-\theta_0)\Big).$$
Verification of (A.19) for the terms $R_{2n}(\theta)$ and $R_{3n}(\theta)$ immediately follows from $\sqrt n\,g_n(\theta_0)=O_p(1)$ and from the continuity of $W(\theta)$ and the uniform consistency of $W_n(\theta)$ in $\theta$ assumed in condition i of Proposition 1, so that $W_n(\theta)-W(\theta_0)=o_p(1)$ uniformly in $\theta\in B_{\delta_n}(\theta_0)$.

It remains to check condition (A.19) for the term $R_{1n}(\theta)$. Note that
$$g_n(\theta)=\big(1+\sqrt n|\theta-\theta_0|\big)e(\theta)+g(\theta)+g_n(\theta_0).$$
Substitute this into $R_{1n}(\theta)$ and decompose, writing $x=|\theta-\theta_0|$, $W_n=W_n(\theta)$, and $G=G(\theta_0)$:
$$R_{1n}(\theta)=-\frac n2(1+\sqrt nx)^2e(\theta)'W_ne(\theta)-n(1+\sqrt nx)e(\theta)'W_ng(\theta)-n(1+\sqrt nx)e(\theta)'W_ng_n(\theta_0)-\frac n2\big[g(\theta)'W_ng(\theta)-(\theta-\theta_0)'G'W_nG(\theta-\theta_0)\big]-n\,g_n(\theta_0)'W_n\big[g(\theta)-G(\theta-\theta_0)\big].$$
Each of these terms can be dealt with separately, using $\sup_{\theta\in B_{\delta_n}(\theta_0)}|W_n(\theta)|=O_p(1)$ (condition i), the Taylor expansion given by condition ii (recall $g(\theta_0)=0$)
$$g(\theta)=G(\theta_0)(\theta-\theta_0)+o(|\theta-\theta_0|), \tag{A.22}$$
and the inequalities, valid for $x\ge0$ because $2\sqrt nx\le1+nx^2$,
$$\frac{(1+\sqrt nx)^2}{1+nx^2}\le2,\qquad\frac{1+\sqrt nx}{1+nx^2}\le2,\qquad\frac{\sqrt nx(1+\sqrt nx)}{1+nx^2}\le2,\qquad\frac{\sqrt nx}{1+nx^2}\le1,\qquad\frac{nx^2}{1+nx^2}\le1.$$
Uniformly in $\theta\in B_{\delta_n}(\theta_0)$:
(a) $\dfrac{n(1+\sqrt nx)^2|e(\theta)'W_ne(\theta)|}{1+nx^2}\le2|W_n|\big(\sqrt n|e(\theta)|\big)^2=o_p(1)$ by (A.21);
(b) $\dfrac{n(1+\sqrt nx)|e(\theta)'W_ng(\theta)|}{1+nx^2}\le|W_n|\,\sqrt n|e(\theta)|\,\big(|G|+o(1)\big)\dfrac{\sqrt nx(1+\sqrt nx)}{1+nx^2}=o_p(1)$ by (A.21) and (A.22);
(c) $\dfrac{n(1+\sqrt nx)|e(\theta)'W_ng_n(\theta_0)|}{1+nx^2}\le|W_n|\,\sqrt n|e(\theta)|\,\sqrt n|g_n(\theta_0)|\,\dfrac{1+\sqrt nx}{1+nx^2}=o_p(1)$ by (A.21) and $\sqrt n\,g_n(\theta_0)=O_p(1)$;
(d) $\dfrac{n\,|g(\theta)'W_ng(\theta)-(\theta-\theta_0)'G'W_nG(\theta-\theta_0)|}{1+nx^2}\le|W_n|\,o(1)\,\dfrac{nx^2}{1+nx^2}=o_p(1)$ by (A.22);
(e) $\dfrac{n\,|g_n(\theta_0)'W_n[g(\theta)-G(\theta-\theta_0)]|}{1+nx^2}\le\sqrt n|g_n(\theta_0)|\,|W_n|\,o(1)\,\dfrac{\sqrt nx}{1+nx^2}=o_p(1)$ by (A.22).
The verification of (A.19) for the term $R_{1n}(\theta)$ now follows by putting these terms together.
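A minimal sketch of an LTE built on a GMM criterion, assuming the criterion has the form $L_n(\theta)=-\frac n2\,g_n(\theta)'W\,g_n(\theta)$ that the decomposition of $R_n$ above presumes, with a fixed scalar weight $W$ (a simplification; the proposition allows $W_n(\theta)$). The model is a hypothetical exactly identified linear IV design with one instrument; the simulated data and all names are illustrative. Because $g_n$ is linear in $\theta$ here, the quasi-posterior is exactly Gaussian (up to truncation by the flat prior on $[0,2]$) with mean at the simple IV estimate, which the grid computation reproduces.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
z = rng.normal(size=n)                  # instrument
v = rng.normal(size=n)
x = z + v                               # endogenous regressor
u = 0.5 * v + rng.normal(size=n)        # structural error, correlated with x through v
y = 1.0 * x + u                         # true theta_0 = 1.0

a, b = np.mean(z * y), np.mean(z * x)   # g_n(theta) = mean(z * (y - x * theta)) = a - b * theta

def L_n(theta, W=1.0):
    """GMM quasi-log-likelihood -(n/2) * g_n(theta)' W g_n(theta) for one moment, fixed W."""
    g = a - b * theta
    return -0.5 * n * W * g**2

grid = np.linspace(0.0, 2.0, 20_001)    # flat prior on [0, 2]
logp = L_n(grid)
w = np.exp(logp - logp.max())           # unnormalized quasi-posterior on the grid
qp_mean = np.sum(grid * w) / np.sum(w)  # LTE under squared loss
theta_iv = a / b                        # exactly identified GMM (= simple IV) estimator
print(qp_mean, theta_iv)
```

In overidentified or nonlinear designs the quasi-posterior is no longer exactly Gaussian, and that is where the remainder bounds (a)-(e) above do the work.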
A.6 Proof of Proposition 2

Verification of Assumption 3 is standard given the stated conditions and is subsumed as a step in the consistency proofs of extremum estimators based on GEL in Kitamura and Stutzer (1997) for cases when $s$ is finite and Kitamura (1997) for cases when $s$ takes on infinite values. We shall not repeat it here. Next, we will verify Assumption 4. Define
$$\gamma(\theta)=\arg\inf_{\gamma\in\mathbb R^p}L_n(\theta,\gamma).$$
It will suffice to show that, uniformly in $\theta\in B_{\delta_n}(\theta_0)$ for any $\delta_n\to0$, we have the GMM set-up:
$$L_n(\theta)=-\frac n2\,g_n(\theta)'\big(V(\theta_0)^{-1}+o_p(1)\big)g_n(\theta), \tag{A.23}$$
where $g_n(\theta)=\frac1n\sum_{i=1}^nm_i(\theta)$ and $V(\theta_0)=E\,m_i(\theta_0)m_i(\theta_0)'$. Assumptions 4.i-iii follow immediately from the conditions of Proposition 2, and Assumption 4.iv is verified exactly as in the proof of Proposition 1, given the reduction to the GMM case. Indeed, the Donsker property assumed in condition iv implies that for any $\epsilon>0$ there is $\delta>0$ such that
$$\limsup_{n\to\infty}P^*\Big\{\sup_{\theta\in B_\delta(\theta_0)}\frac{\sqrt n\,\big|g_n(\theta)-g_n(\theta_0)-\big(Eg_n(\theta)-Eg_n(\theta_0)\big)\big|}{1+\sqrt n|\theta-\theta_0|}>\epsilon\Big\}<\epsilon,$$
which implies condition iv in Proposition 1. The rest of the arguments follow that in the proof of Proposition 1.

It only remains to show the requisite expansion (A.23). We first show that $\gamma(\theta_n)\to_p0$ for any $\theta_n\to_p\theta_0$.
For that purpose we use the convexity lemma, which was obtained by C. Geyer, and can be found in Knight
(1999).
Convexity Lemma. Suppose $Q_n$ is a sequence of lower-semi-continuous convex $\bar{\mathbb R}$-valued random functions, defined on $\mathbb R^d$, and let $\mathcal D$ be a countable dense subset of $\mathbb R^d$. If $Q_n$ weakly converges to $Q_\infty$ in $\bar{\mathbb R}$ marginally (in the finite-dimensional sense) on $\mathcal D$, where $Q_\infty$ is lower-semi-continuous, convex, and finite on an open non-empty set a.s., then
$$\arg\inf_{z\in\mathbb R^d} Q_n(z) \to_d \arg\inf_{z\in\mathbb R^d} Q_\infty(z),$$
provided the latter is uniquely defined a.s. in $\mathbb R^d$.
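The lemma is used below to obtain $\gamma(\theta_n) \to_p 0$, that is, convergence of an arg inf to a degenerate limit. As an illustration only (not taken from the paper), the sketch below shows the same mechanism in the simplest case: the convex sample criterion $Q_n(z) = n^{-1}\sum_i (|X_i - z| - |X_i|)$ converges pointwise to $Q_\infty(z) = E(|X - z| - |X|)$, which is convex with a unique minimizer at the population median, so the arg inf of $Q_n$ (the sample median) settles down at that point. Standard normal data, the grid, and the sample sizes are assumptions; the grid search stands in for an exact convex minimizer.

```python
import numpy as np

def sample_criterion(x, z):
    """Q_n(z) = (1/n) * sum_i (|X_i - z| - |X_i|): convex in z and finite everywhere."""
    return np.mean(np.abs(x - z) - np.abs(x))

def arg_inf_on_grid(x, z_grid):
    """Grid approximation to arg inf_z Q_n(z)."""
    values = np.array([sample_criterion(x, z) for z in z_grid])
    return z_grid[np.argmin(values)]

rng = np.random.default_rng(0)
z_grid = np.linspace(-1.0, 1.0, 2001)   # grid step 0.001
for n in (100, 1_000, 10_000):
    x = rng.standard_normal(n)
    # Q_infinity is uniquely minimized at the population median (0 for N(0, 1)),
    # so the arg inf of Q_n should approach 0 as n grows.
    print(n, arg_inf_on_grid(x, z_grid), np.median(x))
```

Centering by $|X_i|$ keeps $Q_n$ finite even for heavy-tailed data without changing its minimizer.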
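We now apply the lemma to show that $\gamma(\theta_n) \to_p 0$. Define
$$\mathcal F = \{\gamma : E\, s[m_i(\theta)'\gamma] < \infty \text{ for all } \theta \in B_\delta(\theta_0) \text{ and some } \delta > 0\}, \qquad \mathcal F^c = \{\gamma : E\, s[m_i(\theta_0)'\gamma] = \infty\}.$$
By convexity and lower-semicontinuity of $s$, $\mathcal F$ is convex and open, and its boundary $\partial\mathcal F$ is nowhere dense in $\mathbb R^p$. Thus, for a given $\gamma \in \mathcal F$ and any $\theta_n \to_p \theta_0$,
$$\frac{1}{n}\sum_{i=1}^n s[m_i(\theta_n)'\gamma] \to_p E\, s[m_i(\theta_0)'\gamma] < \infty.$$
This follows from the uniform law of large numbers implied by
1. $\{s[m_i(\theta)'\gamma], \theta \in B_\delta(\theta_0)\}$, where $\delta > 0$ is sufficiently small, being a Donsker class wp $\to$ 1;
2. $E\, s[m_i(\theta)'\gamma] = \int s(x'\gamma)\, dP[m_i(\theta) \le x]$ being continuous in $\theta$ over $B_\delta(\theta_0)$, by condition iii.
The function set in 1 is Donsker by the following steps:
(a) $\{m_i(\theta), \theta \in B_\delta(\theta_0)\}$ being Donsker by condition iv;
(b) $\{m_i(\theta)'\gamma, \theta \in B_\delta(\theta_0)\}$ being Donsker for a given $\gamma$, as a fixed linear combination of the coordinates of a Donsker class;
(c) $s$ being a uniform Lipschitz function over $\mathcal V \cap M$ for some compact $M$, by assumption on $s$, where $\mathcal V$ is the open convex set on which $s$ is finite;
(d) $m_i(\theta)'\gamma \in M$ for some compact $M$ and a given $\gamma \in \mathcal F$, by condition iii;
(e) $\{s[m_i(\theta)'\gamma], \theta \in B_\delta(\theta_0)\}$ being Donsker by Theorem 2.10.6 in van der Vaart and Wellner (1996), which says that a uniform Lipschitz transform of a Donsker class is Donsker itself.
Now take $\gamma \in \mathcal F^c \setminus \partial\mathcal F$. By construction of $\mathcal F$, wp $\to$ 1,
$$\frac{1}{n}\sum_{i=1}^n s[m_i(\theta_n)'\gamma] = \infty \to_p E\, s[m_i(\theta_0)'\gamma] = \infty.$$
Now take all the rational vectors $\gamma \in \mathbb R^p \setminus \partial\mathcal F$ as the set $\mathcal D$ appearing in the statement of the Convexity Lemma, and conclude that
$$\gamma(\theta_n) \to_p 0 = \arg\inf_{\gamma} E\, s[m_i(\theta_0)'\gamma].$$
Given this result, we can expand the first order condition for $\gamma(\theta_n)$ to obtain an expression for its form. Note that
$$0 = \frac{1}{n}\sum_{i=1}^n \nabla s\big(\gamma(\theta_n)'m_i(\theta_n)\big)\, m_i(\theta_n) = \frac{1}{n}\sum_{i=1}^n m_i(\theta_n) + \bar V_n\, \gamma(\theta_n), \qquad (A.24)$$
where
$$\bar V_n = \frac{1}{n}\sum_{i=1}^n \nabla^2 s\big(\bar\gamma(\theta_n)'m_i(\theta_n)\big)\, m_i(\theta_n) m_i(\theta_n)'$$
for some $\bar\gamma(\theta_n)$ between 0 and $\gamma(\theta_n)$, which is different from row to row of the matrix $\bar V_n$. Then
$$\bar V_n \to_p V(\theta_0) = E\, m_i(\theta_0) m_i(\theta_0)'.$$
This follows from the uniform law of large numbers implied by
1. $\{\nabla^2 s(\gamma' m_i(\theta'))\, m_i(\theta) m_i(\theta)', (\theta', \gamma, \theta) \in B_{\delta_1}(\theta_0) \times B_{\delta_2}(0) \times B_{\delta_3}(\theta_0)\}$, where the $\delta_j > 0$ are sufficiently small, being a Donsker class wp $\to$ 1;
2. $E\, m_i(\theta) m_i(\theta)' = \int xx'\, dP[m_i(\theta) \le x]$ being a continuous function of $\theta$, by condition iii;
3. $E\, \nabla^2 s(\gamma' m_i(\theta'))\, m_i(\theta) m_i(\theta)' = E\, \nabla^2 s(0)\, m_i(\theta) m_i(\theta)' + o(1)$ uniformly in $(\theta, \theta') \in B_\delta(\theta_0) \times B_\delta(\theta_0)$ for sufficiently small $\delta > 0$, for any $\gamma \to 0$, by the assumptions on $s$ and condition iii.
Claim 1 is verified by applying exactly the same logic as in steps (a)-(e) above; for the sake of brevity, this is not repeated. Therefore, wp $\to$ 1,
$$\gamma(\theta_n) = -(\bar V_n)^{-1} \frac{1}{n}\sum_{i=1}^n m_i(\theta_n) = -\big(V(\theta_0)^{-1} + o_p(1)\big) \frac{1}{n}\sum_{i=1}^n m_i(\theta_n). \qquad (A.25)$$
Consider the second order expansion
$$L_n(\theta_n, \gamma(\theta_n)) = \sqrt{n}\,\gamma(\theta_n)' \frac{1}{\sqrt n}\sum_{i=1}^n m_i(\theta_n) + \frac{1}{2}\sqrt{n}\,\gamma(\theta_n)'\, \bar V_n^*\, \sqrt{n}\,\gamma(\theta_n), \qquad (A.26)$$
where $\bar V_n^*$ is defined like $\bar V_n$ with some $\gamma^*(\theta_n)$ between 0 and $\gamma(\theta_n)$ in place of $\bar\gamma(\theta_n)$, which is different from row to row of the matrix $\bar V_n^*$. By the preceding argument, $\bar V_n^* \to_p V(\theta_0)$. Inserting (A.25) and $\bar V_n^* = V(\theta_0) + o_p(1)$ into (A.26), the first term becomes $-n\,\hat g_n(\theta_n)'(V(\theta_0)^{-1} + o_p(1))\hat g_n(\theta_n)$ and the second $\frac{n}{2}\hat g_n(\theta_n)'(V(\theta_0)^{-1} + o_p(1))\hat g_n(\theta_n)$, so we obtain the required expansion (A.23).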
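The reduction (A.23) of the GEL criterion to a GMM quadratic form can be checked numerically. The sketch below is a hypothetical special case, not the paper's setting: a scalar moment $m_i(\theta) = y_i - \theta$ with standard normal data ($\theta_0 = 0$) and the exponential-tilting choice $s(v) = e^v - 1$, which has $s(0) = 0$ and $\nabla s(0) = \nabla^2 s(0) = 1$, consistent with the forms of (A.24)-(A.25). The inner problem $\inf_\gamma \sum_i s(\gamma m_i(\theta))$ is convex in $\gamma$ and is solved here by Newton's method on the first-order condition (A.24); the result is compared with $-\frac{n}{2}\hat g_n(\theta)^2/\hat V$ and with $\gamma \approx -\hat g_n(\theta)/\hat V$ from (A.25), for $\theta$ near $\theta_0$.

```python
import numpy as np

def gel_inner(m, iters=50):
    """gamma_hat = arg inf_gamma sum_i s(gamma * m_i) with s(v) = exp(v) - 1 (scalar case).

    Newton's method on the first-order condition sum_i s'(gamma m_i) m_i = 0;
    returns (gamma_hat, L) with L = sum_i s(gamma_hat * m_i).
    """
    gamma = 0.0
    for _ in range(iters):
        e = np.exp(gamma * m)
        grad = np.sum(e * m)        # derivative of the objective in gamma
        hess = np.sum(e * m * m)    # positive, so the objective is convex in gamma
        step = grad / hess
        gamma -= step
        if abs(step) < 1e-14:
            break
    return gamma, np.sum(np.expm1(gamma * m))

rng = np.random.default_rng(0)
n = 10_000
y = rng.standard_normal(n)          # hypothetical data, theta_0 = 0
for theta in (0.0, 0.01, 0.03):
    m = y - theta                   # hypothetical moment function m_i(theta) = y_i - theta
    g_hat, v_hat = m.mean(), np.mean(m * m)
    gamma_hat, L_gel = gel_inner(m)
    L_gmm = -0.5 * n * g_hat**2 / v_hat
    # Expect gamma_hat close to -g_hat / v_hat (A.25) and L_gel close to L_gmm (A.23).
    print(f"{theta:.2f} {gamma_hat:.5f} {-g_hat / v_hat:.5f} {L_gel:.4f} {L_gmm:.4f}")
```

The uncentered second moment `v_hat` is used because it is what $\bar V_n$ reduces to at $\gamma = 0$ when $\nabla^2 s(0) = 1$.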
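Proof of Proposition 3

Assumption 3 is assumed. We need to verify Assumption 4. Define the identity
$$L_n(\theta) - L_n(\theta_0) = \underbrace{\sum_{i=1}^n \dot m_i(\theta_0)'}_{\Delta_n(\theta_0)'}(\theta - \theta_0) + \frac{1}{2}(\theta - \theta_0)'\, n \underbrace{\nabla_{\theta\theta'} E\, m_i(\theta_0)}_{-J(\theta_0)} (\theta - \theta_0) + R_n(\theta). \qquad (A.27)$$
Assumption 4.i-iii then follows immediately from conditions i and ii. Assumption 4.iv is verified as follows. The remainder term $R_n(\theta)$ is given by the following decomposition:
$$R_n(\theta) = \underbrace{\sum_{i=1}^n \big\{ m_i(\theta) - m_i(\theta_0) - E m_i(\theta) + E m_i(\theta_0) - \dot m_i(\theta_0)'(\theta - \theta_0) \big\}}_{R_{1n}(\theta)} + \underbrace{n\big(E m_i(\theta) - E m_i(\theta_0)\big) + \frac{1}{2}(\theta - \theta_0)'\, nJ(\theta_0)(\theta - \theta_0)}_{R_{2n}(\theta)}.$$
It suffices to verify Assumption 4.iv separately for $R_{1n}(\theta)$ and $R_{2n}(\theta)$. Since
$$R_{2n}(\theta) = -\frac{1}{2}\, n(\theta - \theta_0)' \big(J(\theta') - J(\theta_0)\big)(\theta - \theta_0)$$
for some $\theta'$ on the line connecting $\theta$ and $\theta_0$, verification of Assumption 4 for $R_{2n}(\theta)$ follows immediately from continuity of $J(\theta)$ over a ball at $\theta_0$.

To show Assumption 4.iv-(b) for $R_{1n}(\theta)$, we note that for any given $M > 0$
$$\limsup_n P^*\Big\{\sup_{|\theta - \theta_0| \le M/\sqrt n} |R_{1n}(\theta)| > \epsilon\Big\} \le \limsup_n P^*\Big\{\sup_{|\theta - \theta_0| \le M/\sqrt n} \frac{|R_{1n}(\theta)|}{\sqrt n\,|\theta - \theta_0|} > \frac{\epsilon}{M}\Big\} = 0, \qquad (A.28)$$
where the last conclusion follows from two observations. First, note that
$$Z_n(\theta) \equiv \frac{R_{1n}(\theta)}{\sqrt n\,|\theta - \theta_0|} = \frac{1}{\sqrt n}\sum_{i=1}^n \frac{m_i(\theta) - m_i(\theta_0) - \big(E m_i(\theta) - E m_i(\theta_0)\big) - \dot m_i(\theta_0)'(\theta - \theta_0)}{|\theta - \theta_0|}$$
is Donsker by assumption, that is, it converges in $\ell^\infty(B_\delta(\theta_0))$ to a tight Gaussian process $Z$. The process has uniformly continuous paths with respect to the semimetric $\rho$ given by $\rho^2(\theta_1, \theta_2) = E\big(Z(\theta_1) - Z(\theta_2)\big)^2$, so that $\rho(\theta, \theta_0) \to 0$ if $\theta \to \theta_0$. Thus almost all sample paths of $Z$ are continuous at $\theta_0$. Second, since by assumption
$$E\big[m_i(\theta) - m_i(\theta_0) - \dot m_i(\theta_0)'(\theta - \theta_0)\big]^2 = o(|\theta - \theta_0|^2),$$
we have for any $\theta_n \to \theta_0$
$$E\, Z(\theta_n)^2 \le \frac{E\big[m_i(\theta_n) - m_i(\theta_0) - \dot m_i(\theta_0)'(\theta_n - \theta_0)\big]^2}{|\theta_n - \theta_0|^2} = \frac{o(|\theta_n - \theta_0|^2)}{|\theta_n - \theta_0|^2} \to 0,$$
and therefore $Z(\theta_0) = 0$. Hence, for any $\theta' \to_p \theta_0$, we have by the extended continuous mapping theorem
$$Z_n(\theta') \to_d Z(\theta_0) = 0, \quad \text{that is,} \quad Z_n(\theta') \to_p 0. \qquad (A.29)$$
This shows (A.28).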
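To prove Assumption 4.iv-(a) for $R_{1n}(\theta)$, we need to show that for some $\delta > 0$ and constant $M$
$$\limsup_n P^*\Big\{\sup_{M/\sqrt n \le |\theta - \theta_0| \le \delta} \frac{|R_{1n}(\theta)|}{n|\theta - \theta_0|^2} > \epsilon\Big\} < \epsilon. \qquad (A.30)$$
Using that $M/\sqrt n \le |\theta - \theta_0|$, so that $n|\theta - \theta_0|^2 \ge M\sqrt n\,|\theta - \theta_0|$, we bound the left-hand side by
$$\limsup_n P^*\Big\{\sup_{M/\sqrt n \le |\theta - \theta_0| \le \delta} \frac{|R_{1n}(\theta)|}{\sqrt n\,|\theta - \theta_0|\, M} > \epsilon\Big\} = \limsup_n P^*\Big\{\sup_{M/\sqrt n \le |\theta - \theta_0| \le \delta} |Z_n(\theta)|\,\frac{1}{M} > \epsilon\Big\} < \epsilon, \qquad (A.31)$$
where, for any given $\epsilon > 0$, to make the last inequality true we can either make $\delta$ sufficiently small, by the property (A.29) of $Z_n$, or make $M$ sufficiently large, by the property $Z_n = O_{p^*}(1)$.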
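To get a feel for the size of the remainder in (A.27) for a nonsmooth criterion, the sketch below evaluates $R_n(\theta)$ in a hypothetical special case chosen for illustration: $m_i(\theta) = -|y_i - \theta|$ with standard normal $y_i$, so that $\theta_0 = 0$, $\dot m_i(\theta_0) = \mathrm{sign}(y_i)$ and $J(\theta_0) = 2\phi(0) = \sqrt{2/\pi}$. Over $|\theta - \theta_0| \le M/\sqrt n$ the supremum of $|R_n(\theta)|$ should be small compared with the quadratic term $\frac12 nJ(\theta_0)(M/\sqrt n)^2$ and should shrink as $n$ grows, in line with Assumption 4.iv-(b). The grid over $\theta$ and the value of $M$ are assumptions.

```python
import numpy as np

J0 = 2.0 / np.sqrt(2.0 * np.pi)     # J(theta_0) = 2 * phi(0) for standard normal data

def remainder_lad(y, theta):
    """R_n(theta) from (A.27) for L_n(theta) = -sum_i |y_i - theta| with theta_0 = 0.

    Hypothetical special case: m_i(theta) = -|y_i - theta| and mdot_i(theta_0) = sign(y_i).
    """
    n = y.size
    dL = np.sum(np.abs(y) - np.abs(y - theta))     # L_n(theta) - L_n(theta_0)
    delta_n = np.sum(np.sign(y))                   # Delta_n(theta_0)
    return dL - delta_n * theta + 0.5 * n * J0 * theta**2

rng = np.random.default_rng(0)
M = 2.0
for n in (10_000, 1_000_000):
    y = rng.standard_normal(n)
    grid = np.linspace(-M, M, 41) / np.sqrt(n)     # |theta - theta_0| <= M / sqrt(n)
    sup_R = max(abs(remainder_lad(y, t)) for t in grid)
    quad = 0.5 * n * J0 * (M / np.sqrt(n))**2      # quadratic term at the edge of the ball
    print(n, round(sup_R, 4), round(quad, 4))
```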
Appendix on Computation

B.1 A computational lemma

In this section we record some formal results on MCMC computation of the quasi-posterior quantities.

LEMMA 3. Suppose the chain $(\theta^{(j)}, j \le B)$ is produced by the Metropolis-Hastings (MH) algorithm with $q$ such that $q(\theta|\theta') > 0$ for each $(\theta, \theta')$. Suppose also that $P\{\rho(\theta^{(j)}, \xi) = 1\} < 1$ for all $j$. Then
1. $p_n(\cdot)$ is the stationary density of the chain;
2. the chain is ergodic with the limit marginal distribution given by $p_n(\cdot)$:
$$\lim_{B\to\infty} \sup_A \Big| P\big(\theta^{(B)} \in A \,\big|\, \theta^{(0)}\big) - \int_A p_n(\theta)\,d\theta \Big| = 0,$$
where the supremum is taken over the Borel sets;
3. for any $p_n$-integrable function $g$:
$$\frac{1}{B}\sum_{j=1}^{B} g(\theta^{(j)}) \to_p \int g(\theta)\,p_n(\theta)\,d\theta.$$

Proof. The result is immediate from Theorem 6.2.5 in Robert and Casella (1999). □
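A minimal random-walk Metropolis-Hastings sketch of the kind of chain covered by Lemma 3, targeting $p_n(\theta) \propto e^{L_n(\theta)}$ under a flat prior on a box. The Gaussian proposal is symmetric and positive everywhere, so $q(\theta|\theta') > 0$ as the lemma requires, and the acceptance probability reduces to $\rho(\theta, \xi) = \min\{1, e^{L_n(\xi) - L_n(\theta)}\}$. The median-type criterion, the data, the step size, the box, and the burn-in length are assumptions for illustration. By part 3 of the lemma, averages over the draws approximate quasi-posterior expectations such as the quasi-posterior mean.

```python
import numpy as np

def metropolis_hastings(log_target, theta0, n_draws, step, rng):
    """Random-walk MH chain for a density proportional to exp(log_target(theta)).

    Returns the draws, shape (n_draws, d), and the acceptance rate.
    """
    theta = np.asarray(theta0, dtype=float)
    current = log_target(theta)
    draws = np.empty((n_draws, theta.size))
    accepted = 0
    for j in range(n_draws):
        xi = theta + step * rng.standard_normal(theta.size)   # symmetric proposal q
        proposed = log_target(xi)
        # Accept with probability rho(theta, xi) = min{1, exp(L_n(xi) - L_n(theta))}.
        if np.log(rng.uniform()) < proposed - current:
            theta, current = xi, proposed
            accepted += 1
        draws[j] = theta
    return draws, accepted / n_draws

rng = np.random.default_rng(0)
y = rng.standard_normal(500) + 1.0          # hypothetical data with location 1

def L_n(theta, lo=-10.0, hi=10.0):
    # Hypothetical nonsmooth criterion -sum |y_i - theta| with a flat prior on [lo, hi].
    if np.any(theta < lo) or np.any(theta > hi):
        return -np.inf
    return -np.sum(np.abs(y - theta[0]))

draws, acc = metropolis_hastings(L_n, [0.0], 20_000, 0.1, rng)
burned = draws[5_000:, 0]                   # discard a burn-in stage
print(acc, burned.mean(), np.median(burned), np.median(y))
```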
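An immediate consequence of Lemma 3 is the following result.

LEMMA 4. Suppose Assumptions 1 and 2 hold. Suppose the chain $(\theta^{(j)}, j \le B)$ satisfies the conditions of Lemma 3. Then for any convex and $p_n$-integrable loss function $\rho_n(\cdot)$,
$$\arg\inf_{\zeta\in\Theta} \frac{1}{B}\sum_{j=1}^{B} \rho_n\big(\theta^{(j)} - \zeta\big) \to_p \hat\theta = \arg\inf_{\zeta\in\Theta} \int_\Theta \rho_n(\theta - \zeta)\,p_n(\theta)\,d\theta \quad \text{as } B \to \infty,$$
provided that $\hat\theta$ is uniquely defined.

Proof. By Lemma 3 we have the pointwise convergence of the objective function: for any $\zeta$,
$$\frac{1}{B}\sum_{j=1}^{B} \rho_n\big(\theta^{(j)} - \zeta\big) \to_p \int_\Theta \rho_n(\theta - \zeta)\,p_n(\theta)\,d\theta,$$
which implies the result by the Convexity Lemma, since $\zeta \mapsto \frac{1}{B}\sum_{j=1}^{B} \rho_n(\theta^{(j)} - \zeta)$ is convex by convexity of $\rho_n$. □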
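Lemma 4 justifies computing the LTE by minimizing the sample analogue $\frac{1}{B}\sum_j \rho_n(\theta^{(j)} - \zeta)$ of the quasi-posterior risk. The sketch below uses stand-in normal draws instead of MCMC output (an assumption for brevity) and minimizes over a grid to make the convex-objective argument visible: squared loss gives the mean of the draws, absolute loss the median, and the check loss $\rho_\tau(u) = (\tau - 1\{u < 0\})u$ a $\tau$-quantile. In practice one would simply call `np.mean`, `np.median` or `np.quantile` on the draws.

```python
import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = (tau - 1{u < 0}) * u; tau = 0.5 gives half the absolute loss
    return (tau - (u < 0)) * u

def lte_from_draws(draws, loss, grid):
    """Minimize the sample expected loss (1/B) * sum_j loss(theta_j - zeta) over zeta in grid."""
    risk = np.array([np.mean(loss(draws - z)) for z in grid])
    return grid[np.argmin(risk)]

rng = np.random.default_rng(0)
draws = rng.normal(loc=0.3, scale=0.1, size=20_000)   # stand-in for MCMC draws theta^(j)
grid = np.linspace(-0.2, 0.8, 1001)                   # grid step 0.001

print(lte_from_draws(draws, np.square, grid), draws.mean())
print(lte_from_draws(draws, np.abs, grid), np.median(draws))
print(lte_from_draws(draws, lambda u: check_loss(u, 0.9), grid), np.quantile(draws, 0.9))
```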
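B.2 Quasi-Bayes Estimation and Simulated Annealing

The relation between drawing from the shape of a likelihood surface and optimizing to find the mode of the likelihood function is well known. It is well established (e.g., Robert and Casella (1999)) that
$$\lim_{\lambda\to\infty} \frac{\int_\Theta \theta\, e^{\lambda L_n(\theta)}\pi(\theta)\,d\theta}{\int_\Theta e^{\lambda L_n(\theta)}\pi(\theta)\,d\theta} = \arg\max_{\theta\in\Theta} L_n(\theta). \qquad (B.1)$$
Essentially, as $\lambda \to \infty$, the sequence of probability measures
$$\frac{e^{\lambda L_n(\theta)}\pi(\theta)}{\int_\Theta e^{\lambda L_n(\theta)}\pi(\theta)\,d\theta} \qquad (B.2)$$
converges to the generalized Dirac probability measure concentrated at $\arg\max_{\theta\in\Theta} L_n(\theta)$.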
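Equation (B.1) can be visualized on a grid: as $\lambda$ grows, the mean of the measure (B.2) moves to the maximizer of $L_n$. The sketch assumes a flat prior on $\Theta = [-3, 3]$, a hypothetical median-type criterion $L_n(\theta) = -\sum_i |y_i - \theta|$ with an odd number of observations (so the maximizer, the sample median, is unique), and Riemann-sum integration; subtracting the maximum before exponentiating avoids overflow.

```python
import numpy as np

def quasi_posterior_mean(L_values, grid, lam):
    """Riemann-sum version of the ratio in (B.1) with a flat prior on the grid's range."""
    a = lam * L_values
    w = np.exp(a - a.max())            # shift by the max to avoid overflow; the ratio is unchanged
    return np.sum(grid * w) / np.sum(w)

rng = np.random.default_rng(0)
y = rng.standard_normal(51)            # odd sample size: unique maximizer (the sample median)
grid = np.linspace(-3.0, 3.0, 60_001)  # grid step 1e-4
L_values = np.array([-np.sum(np.abs(y - t)) for t in grid])
theta_max = grid[np.argmax(L_values)]

for lam in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(lam, quasi_posterior_mean(L_values, grid, lam), theta_max)
```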
The difficulty of nonlinear optimization has been an important issue in econometrics (Berndt et al. (1974), Sims (1999)). The simulated annealing algorithm (see e.g. Press et al. (1992), Goffe et al. (1994)) is usually considered a generic optimization method. It is an implementation of the simulation based optimization (B.1) with a uniform prior $\pi(\theta) = c$ on the parameter space $\Theta$. At each temperature level $1/\lambda$, the simulated annealing routine uses a large number of Metropolis-Hastings steps to draw from the quasi-distribution (B.2). The temperature parameter is then decreased slowly while the Metropolis steps are repeated, until convergence criteria for the optimum are achieved. Interestingly, the simulated annealing algorithm has been widely used in optimization of non-likelihood-based semiparametric objective functions. In principle, if the temperature parameter is decreased at an arbitrarily slow rate (that depends on the criterion function), simulated annealing can find the global optimum of non-smooth objective functions that may have many local extrema. Controlling the temperature reduction parameter is a very delicate matter and is certainly crucial to the performance of the algorithm with highly nonsmooth objective functions.

On the other hand, as Theorems 1 and 2 apply equally to (B.2), the results of this paper show that we may fix the temperature parameter $1/\lambda$ at a positive constant and then compute the quasi-posterior medians or means for (B.2) using Metropolis steps. These estimates can be used in place of the exact maximum. They are consistent and asymptotically normal, and possess the same limiting distribution as the exact maximum. The interpretation of the simulated annealing algorithm as an implementation of (B.2) also suggests that for some problems with special structures, other MCMC methods, such as the Gibbs sampler, may be used to replace the Metropolis-Hastings step in the simulated annealing algorithm.
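The contrast above can be made concrete. In this sketch the same random-walk Metropolis step is run either with a temperature $1/\lambda_k$ that decreases along the run (simulated annealing, reporting the final state) or with the temperature fixed at $1/\lambda = 1$ (reporting the quasi-posterior median of the retained draws). The geometric cooling schedule, the step size, the burn-in, and the median-type criterion are illustrative assumptions, not the tuning used in the paper or in the cited references.

```python
import numpy as np

def metropolis_chain(L, theta, lams, step, rng):
    """One random-walk Metropolis step per entry of lams, targeting exp(lam * L(theta))."""
    cur = L(theta)
    path = np.empty(len(lams))
    for k, lam in enumerate(lams):
        xi = theta + step * rng.standard_normal()
        new = L(xi)
        if np.log(rng.uniform()) < lam * (new - cur):
            theta, cur = xi, new
        path[k] = theta
    return path

rng = np.random.default_rng(0)
y = rng.standard_normal(201) + 2.0
L = lambda t: -np.sum(np.abs(y - t))        # hypothetical nonsmooth criterion

n_steps = 20_000
lams_annealing = 1.001 ** np.arange(n_steps)  # lambda_k grows, so the temperature 1/lambda_k falls
annealing = metropolis_chain(L, 0.0, lams_annealing, 0.05, rng)
fixed = metropolis_chain(L, 0.0, np.ones(n_steps), 0.05, rng)   # temperature fixed at 1

print("annealing, final state:", annealing[-1])
print("fixed temperature, quasi-posterior median:", np.median(fixed[5_000:]))
print("sample median (exact maximizer):", np.median(y))
```

With the temperature fixed, no cooling schedule has to be tuned; the price is that the draws must be summarized (here by their median) rather than read off as a single terminal state.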
B.3 Details of Computation in Monte-Carlo Examples

The parameter space is taken to be $\Theta = [\theta_0 \pm 10]$. The prior is flat, truncated to $\Theta$. Each parameter is updated via a Gibbs-Metropolis procedure, which slightly modifies the basic Metropolis-Hastings algorithm: for $k = 1, \dots, d$, a draw of $\xi_k$ from the univariate normal density $q(\xi_k \mid \theta_k, \sigma)$ is made, and then the candidate value $\xi$, consisting of $\xi_k$ and $\theta_{-k}$, replaces $\theta^{(j)}$ with probability $\rho(\theta^{(j)}, \xi)$ specified in the text. The variance parameter $\sigma$ is adjusted every 100 draws (in the second simulation example and the empirical example) or 200 draws (in the first simulation example) so that the rejection probability is roughly 50%. The first $N \times d$ draws (the burn-in stage) are discarded, and the remaining $N \times d$ draws are used in computation of estimates and intervals. The starting value is the OLS estimate in all examples. We use N = 5,000 in the second simulation example and the empirical example and N = 10,000 in the first simulation example. To give an idea of computational expense, computing one set of estimates takes 20-40 seconds, depending on the example. All of the codes that we used to produce figures, simulation, and empirical results are available from the authors.
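A sketch of the Gibbs-Metropolis scheme described above: coordinates are updated one at a time with a univariate normal proposal while the others are held fixed, and the proposal scale of each coordinate is rescaled after every `adapt_every` of its proposals during burn-in, aiming at a rejection rate of about 50%. The text does not give the adjustment formula, so the multiplicative rule below is an assumption, as are freezing the scales after burn-in (which keeps the retained draws a genuine Markov chain), tuning a standard deviation rather than a variance, the box $[\theta_{start} \pm 10]$ standing in for $[\theta_0 \pm 10]$, and the least-absolute-deviation criterion and data in the usage example. The OLS starting value follows the text.

```python
import numpy as np

def gibbs_metropolis(L, theta_start, n_per_coord, adapt_every=100, half_width=10.0, seed=0):
    """Component-wise random-walk Metropolis with a flat prior on theta_start +/- half_width.

    The first n_per_coord * d draws are burn-in (proposal scales adapted there);
    the next n_per_coord * d draws are returned together with the final scales.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta_start, dtype=float)
    lo, hi = theta - half_width, theta + half_width
    d = theta.size
    sigma = np.full(d, 0.1)                  # proposal standard deviations
    rejects = np.zeros(d)
    cur = L(theta)
    kept = []
    total = 2 * n_per_coord * d
    for j in range(total):
        k = j % d                            # cycle through the coordinates
        xi = theta.copy()
        xi[k] += sigma[k] * rng.standard_normal()
        new = L(xi) if lo[k] <= xi[k] <= hi[k] else -np.inf   # flat prior truncated to the box
        if np.log(rng.uniform()) < new - cur:
            theta, cur = xi, new
        else:
            rejects[k] += 1
        burn_in = j < total // 2
        n_k = j // d + 1                     # proposals made so far for coordinate k
        if burn_in and n_k % adapt_every == 0:
            rate = rejects[k] / adapt_every
            sigma[k] *= np.exp(0.5 - rate)   # shrink if rejecting more than 50%, widen otherwise
            rejects[k] = 0
        if not burn_in:
            kept.append(theta.copy())
    return np.array(kept), sigma

rng = np.random.default_rng(1)
n = 400
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)          # hypothetical data
X = np.column_stack([np.ones(n), x])
L = lambda b: -np.sum(np.abs(y - X @ b))             # hypothetical LAD criterion
start = np.linalg.lstsq(X, y, rcond=None)[0]         # OLS starting value
draws, sigma = gibbs_metropolis(L, start, 2_000)
print(draws.mean(axis=0), sigma)
```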
Notation and Terms

$\to_p$ : convergence in (outer) probability $P^*$
$\to_d$ : convergence in distribution under $P^*$
wp $\to$ 1 : with inner probability $P_*$ converging to one
$A \sim B$ : asymptotic equivalence, meaning $\lim AB^{-1} = I$
$B_\delta(x)$ : ball centered at $x$ of radius $\delta$
$I$ : identity matrix
$N(0, a)$ : normal random vector with mean 0 and variance matrix $a$
$\mathcal F$ Donsker class : here this means that the empirical process $f \mapsto n^{-1/2}\sum_{i=1}^n \big(f(W_i) - Ef(W_i)\big)$ is asymptotically Gaussian in $\ell^\infty(\mathcal F)$, see van der Vaart (1999)
$\ell^\infty(\mathcal F)$ : metric space of bounded functions over $\mathcal F$, see van der Vaart (1999)
mineig$(A)$ : minimum eigenvalue of matrix $A$
$A > 0$ : $A$ is a positive definite matrix
References

Abadie, A., 1995. Changes in Spanish labor income structure during the 1980s: a quantile regression approach, CEMFI Working Paper.
Amemiya, T., 1977. The maximum likelihood and the nonlinear three-stage least squares estimator in the general nonlinear simultaneous equation model. Econometrica 45 (4), 955-968.
Amemiya, T., 1985. Advanced Econometrics. Harvard University Press.
Anderson, T. W., 1955. The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities. Proc. Amer. Math. Soc. 6, 170-176.
Andrews, D. W. K., 1994a. Empirical process methods in econometrics. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. North Holland, pp. 2248-2292.
Andrews, D. W. K., 1994b. The large sample correspondence between classical hypothesis tests and Bayesian posterior odds tests. Econometrica 62 (5), 1207-1232.
Andrews, D. W. K., 1997. A stopping rule for the computation of generalized method of moments estimators. Econometrica 65 (4), 913-931.
Andrews, D. W. K., 1999. Estimation when a parameter is on a boundary. Econometrica 66, 1341-83.
Berger, J. O., 2002. Bayesian analysis: a look at today and thoughts of tomorrow. In: Statistics in the 21st Century. Chapman and Hall, New York, pp. 275-290.
Berndt, E., Hall, B., Hall, R., Hausman, J., 1974. Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement 3 (4), 653-665.
Bernstein, S., 1917. Theory of Probability. (Russian) Fourth Edition (1946) Gostekhizdat, Moscow-Leningrad.
Berry, S., Levinsohn, J., Pakes, A., July 1995. Automobile prices in market equilibrium. Econometrica 63, 841-890.
Bickel, P. J., Yahav, J. A., 1969. Some contributions to the asymptotic theory of Bayes solutions. Z. Wahrsch. Verw. Geb. 11, 257-276.
Billingsley, P., 1994. Probability and Measure, 3rd Ed. John Wiley and Sons.
Buchinsky, M., 1991. Theory of and practice of quantile regression, Ph.D. dissertation, Department of Economics, Harvard University.
Buchinsky, M., Hahn, J., 1998. An alternative estimator for the censored regression model. Econometrica 66, 653-671.
Bunke, O., Milhaud, X., 1998. Asymptotic behavior of Bayes estimates under possibly incorrect models. The Annals of Statistics 26 (2), 617-644.
Chamberlain, G., Imbens, G., 1997. Nonparametric applications of Bayesian inference, NBER Working Paper.
Chernozhukov, V., Hansen, C., 2001. An IV model of quantile treatment effects, MIT Department of Economics Working Paper.
Chernozhukov, V., Umantsev, L., 2001. Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics 26, 271-92.
Chib, S., 2001. Markov chain Monte Carlo methods: Computation and inference. In: Heckman, J. J., Leamer, E. (Eds.), Handbook of Econometrics, Vol 5, Chapter 5. North Holland, pp. 3564-3634.
Christoffersen, P., Hahn, J., Inoue, A., 1999. Testing, comparing and combining value at risk measures, Working Paper, Wharton School, University of Pennsylvania.
Diaconis, P., Freedman, D., 1986. On the consistency of Bayes estimates. Annals of Statistics 14, 1-26.
Doksum, K. A., Lo, A. Y., 1990. Consistent and robust Bayes procedures for location based on partial information. Ann. Statist. 18 (1), 443-453.
Engle, R., Manganelli, S., 2001. CAViaR: Conditional value at risk by regression quantiles, Working Paper, Department of Economics, UC San Diego.
Fitzenberger, B., 1997. A guide to censored quantile regressions. In: Robust inference, Handbook of Statistics, Vol. 14. North-Holland, Amsterdam, pp. 405-437.
Gallant, A. R., White, H., 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Oxford: Basil Blackwell.
Geweke, J., Keane, M., 2001. Computationally intensive methods for integration in econometrics. In: Heckman, J. J., Leamer, E. (Eds.), Handbook of Econometrics, Vol 5, Chapter 5. North Holland, pp. 3465-3564.
Goffe, W. L., Ferrier, G. D., Rogers, J., 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60, 65-99.
Hahn, J., 1997. Bayesian bootstrap of the quantile regression estimator: a large sample study. Internat. Econom. Rev. 38 (4), 795-808.
Hansen, L., Heaton, J., Yaron, A., 1996. Finite-sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14, 262-280.
Hansen, L. P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50 (4), 1029-1054.
Hogg, R. V., 1975. Estimates of percentile regression lines using salary data. Journal of American Statistical Association 70, 56-59.
Huber, P. J., 1973. Robust regression: Asymptotics, conjectures, and Monte Carlo. Annals of Statistics 1, 799-821.
Ibragimov, I., Has'minskii, R., 1981. Statistical Estimation: Asymptotic Theory. Springer Verlag.
Imbens, G., 1997. One-step estimators for over-identified generalized method of moments models. Rev. Econom. Stud. 64 (3), 359-383.
Imbens, G., Spady, R., Johnson, P., 1998. Information theoretic approaches to inference in moment condition models. Econometrica 66, 333-357.
Jureckova, J., 1977. Asymptotic relations of M-estimators and R-estimators in linear regression models. Annals of Statistics 5, 464-472.
Khan, S., Powell, J. L., 2001. Two step estimation of semiparametric censored regression models. Journal of Econometrics 103, 73-110.
Kim, J.-Y., 1998. Large sample properties of posterior densities, Bayesian information criterion and the likelihood principle in nonstationary time series models. Econometrica 66 (2), 359-380.
Kim, J.-Y., 2002. Limited information likelihood and Bayesian analysis. Journal of Econometrics, 175-193.
Kitamura, Y., 1997. Empirical likelihood methods with weakly dependent processes. Ann. Statist. 25 (5), 2084-2102.
Kitamura, Y., Stutzer, M., 1997. An information-theoretic alternative to generalized method of moments estimation. Econometrica 65, 861-874.
Knight, K., 1999. Epi-convergence and stochastic equisemicontinuity, Working Paper, Department of Statistics, University of Toronto.
Koenker, R., 1994. Confidence intervals for quantile regression. In: Proceedings of the 5th Prague Symposium on Asymptotic Statistics. Heidelberg: Physica-Verlag, pp. 10-20.
Koenker, R., 1998. Treating the treated, varieties of causal analysis, Lecture Note, Department of Economics, University of Illinois.
Koenker, R., Bassett, G. S., 1978. Regression quantiles. Econometrica 46, 33-50.
Kottas, A., Gelfand, A., 2001. Bayesian semiparametric median regression modeling. Journal of the American Statistical Association 96, 1458-1468.
Lehmann, E., Casella, G., 1998. Theory of Point Estimation. Springer.
Macurdy, T., Timmins, C., 2001. Bounding the influence of attrition on the intertemporal wage variation in the NLSY, Working Paper, Department of Economics, Yale University.
Mood, A. M., 1950. Introduction to the Theory of Statistics. McGraw-Hill Book Company, Inc.
Newey, W. K., 1991. Uniform convergence in probability and stochastic equicontinuity. Econometrica 59 (4), 1161-1167.
Newey, W. K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. North Holland, pp. 2113-2241.
Newey, W. K., Powell, J. L., 1990. Efficient estimation of linear and type I censored regression models under conditional quantile restrictions. Econometric Theory 6, 295-317.
Newey, W. K., Smith, R., 2001. Higher order properties of GMM and generalized empirical likelihood estimators, Working Paper, Department of Economics, MIT.
Newey, W. K., West, K. D., 1987. A simple, positive semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55 (3), 703-708.
Owen, A., 1989. Empirical likelihood ratio confidence regions. In: Proceedings of the 47th Session of the International Statistical Institute, Book 3 (Paris, 1989). Vol. 53. pp. 373-393.
Owen, A., 1990. Empirical likelihood ratio confidence regions. Ann. Statist. 18 (1), 90-120.
Owen, A., 1991. Empirical likelihood for linear models. Ann. Statist. 19 (4), 1725-1747.
Owen, A., 2001. Empirical Likelihood. Chapman and Hall/CRC.
Pakes, A., Pollard, D., 1989. Simulation and the asymptotics of optimization estimators. Econometrica 57 (5), 1027-1057.
Phillips, P. C. B., Ploberger, W., 1996. An asymptotic theory of Bayesian inference for time series. Econometrica 64 (2), 381-412.
Pollard, D., 1991. Asymptotics for least absolute deviation regression estimators. Econometric Theory 7, 186-199.
Pötscher, B. M., Prucha, I. R., 1997. Dynamic Nonlinear Econometric Models. Springer-Verlag, Berlin.
Powell, J. L., 1984. Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 303-325.
Press, W., Teukolsky, S. A., Vetterling, W., Flannery, B., 1992. Numerical Recipes in C, The Art of Scientific Computing. Cambridge.
Qin, J., Lawless, J., 1994. Empirical likelihood and general estimating equations. Annals of Statistics 22, 300-325.
Robert, C. P., Casella, G., 1999. Monte Carlo Statistical Methods. Springer.
Rousseeuw, P. J., Hubert, M., 1999. Regression depth. J. Amer. Statist. Assoc. 94 (446), 388-433.
Sims, C., 1999. Adaptive Metropolis-Hastings, or Monte Carlo kernel estimation, Working Paper, Department of Economics, Princeton University.
Stigler, S. M., 1975. Studies in the history of probability and statistics. XXXIV. Napoleonic statistics: the work of Laplace. Biometrika 62 (2), 503-517.
van Aelst, S., Rousseeuw, P. J., Hubert, M., Struyf, A., 2002. The deepest regression method. J. Multivariate Anal. 81 (1), 138-166.
van der Vaart, A., 1999. Asymptotic Statistics. Cambridge University Press.
van der Vaart, A. W., Wellner, J. A., 1996. Weak Convergence and Empirical Processes. Springer-Verlag, New York.
von Mises, R., 1931. Wahrscheinlichkeitsrechnung. Berlin: Springer.
White, H., 1994. Estimation, Inference and Specification Analysis. Vol. 22 of Econometric Society Monographs. Cambridge University Press, Cambridge.
Zellner, A., 1998. Past and recent results on maximal data information priors. J. Statist. Res. 32 (1), 1-22.
Table 1: Monte Carlo Comparison of LTE's with Censored Quantile Regression Estimates Obtained using Iterated Linear Programming (Based on 100 repetitions).

Estimator            RMSE    MAD     Mean Bias   Median Bias   Median Abs. Dev.
n=400
Q-posterior-mean     0.473   0.378    0.138       0.134         0.340
Q-posterior-median   0.465   0.372    0.131       0.137         0.344
Iterated LP(10)      0.518   0.284    0.040       0.016         0.171
                     3.798   0.827   -0.568      -0.035         0.240
n=1600
Q-posterior-mean     0.155   0.121   -0.018       0.009         0.089
Q-posterior-median   0.155   0.121   -0.020       0.002         0.092
Iterated LP(7)       0.134   0.106    0.040       0.067         0.085
                     3.547   0.511    0.023      -0.384         0.087

Table 2: Monte Carlo Comparison of the LTE's with Standard Estimation for a Linear Quantile Regression Model (Based on 500 repetitions).

Estimator                      RMSE    MAD     Mean Bias   Median Bias   Median AD
n=200
Q-posterior-mean               .0747   .0587    .0174       .0204        .0478
Q-posterior-median             .0779   .0608    .0192       .0136        .0519
Standard Quantile Regression   .0787   .0628    .0067       .0092        .0510
n=800
Q-posterior-mean               .0425   .0323   -.0018      -.0003        .0280
Q-posterior-median             .0445   .0339   -.0023       .0001        .0295
Standard Quantile Regression   .0498   .0398    .0007       .0025        .0356

Table 3: Monte Carlo Comparison of the LT Inference with Standard Inference for a Linear Quantile Regression Model (Based on 500 repetitions).

Inference Method                                               coverage   length
n=200
Quasi-posterior confidence interval, equal tailed               .943       .377
Quasi-posterior confidence interval, symmetric (around mean)    .941       .375
Quantile Regression: Hall-Sheather Interval                     .659       .177
n=800
Quasi-posterior confidence interval, equal tailed               .920       .159
Quasi-posterior confidence interval, symmetric (around mean)    .917       .158
Quantile Regression: Hall-Sheather Interval                     .602       .082
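Table 3 compares equal-tailed and symmetric quasi-posterior intervals. One plausible way to form both from MCMC draws, sketched here as an assumption rather than the paper's exact recipe: the equal-tailed interval uses the $\alpha/2$ and $1 - \alpha/2$ quantiles of the draws, while the symmetric interval is centered at the quasi-posterior mean with half-width equal to the $1 - \alpha$ quantile of $|\theta^{(j)} - \bar\theta|$. The normal stand-in draws and $\alpha = 0.05$ are illustrative.

```python
import numpy as np

def equal_tailed_interval(draws, alpha=0.05):
    """Interval between the alpha/2 and 1 - alpha/2 quantiles of the draws."""
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def symmetric_interval(draws, alpha=0.05):
    """Interval centered at the mean of the draws; half-width is the (1 - alpha) quantile of |draw - mean|."""
    center = np.mean(draws)
    half = np.quantile(np.abs(draws - center), 1 - alpha)
    return center - half, center + half

rng = np.random.default_rng(0)
draws = rng.normal(loc=0.5, scale=0.1, size=50_000)   # stand-in for quasi-posterior draws
print(equal_tailed_interval(draws))
print(symmetric_interval(draws))
```

For a roughly symmetric quasi-posterior the two constructions nearly coincide, which is consistent with the similar coverage and length reported for them in Table 3.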
[Figure 1 panels: Criterion for IV-QR; Criterion for QB-Estimation; Markov Chain Sequence; Q-Posterior for Theta; horizontal axes: theta]
Figure 1: A Nonlinear IV Example involving Instrumental Quantile Regression. In the top-left panel the discontinuous objective function $L_n(\theta)$ is depicted (one-dimensional case). The true parameter is $\theta_0 = 0$. In the bottom-left panel, a Markov Chain sequence of draws $(\theta^{(1)}, \dots, \theta^{(B)})$ is depicted. The marginal distribution of this sequence is $p_n(\theta) = e^{L_n(\theta)} / \int_\Theta e^{L_n(\theta)}\,d\theta$; see the bottom-right panel. The point estimate, the sample mean $\hat\theta$, is given by the vertical line with the rhomboid root. The two other vertical lines are the 10-th and the 90-th percentiles of the quasi-posterior distribution. The upper-right panel depicts the expected loss function that the LTE minimize.

[Figure 2 panel: VaR(p) for dynamic model]
Figure 2: Recursive VaR Surface in time-probability space.

[Figure 3 panel: VaR(p) for static model]
Figure 3: Non-recursive VaR Surface in time-probability space.

Figure 4: $\theta(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.
Figure 5: $\theta_3(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.
Figure 6: $\theta_1(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.
Figure 7: $q(\tau)$ for $\tau \in [.2, .8]$ and the 90% confidence intervals.