Massachusetts Institute of Technology
Department of Economics
Working Paper Series
ON THE COMPUTATIONAL COMPLEXITY OF
MCMC-BASED ESTIMATORS IN LARGE SAMPLES

Alexandre Belloni
Victor Chernozhukov

Working Paper 07-08
February 23, 2007

Room E52-251
50 Memorial Drive
Cambridge, MA 02142
This paper can be downloaded without charge from the
Social Science Research Network Paper Collection at
http://ssrn.com/abstract=966345
On the Computational Complexity of MCMC-based Estimators in Large Samples

Alexandre Belloni* and Victor Chernozhukov**

Revised February 23, 2007.
Abstract

This paper studies the computational complexity of Bayesian and quasi-Bayesian estimation in large samples carried out using a basic Metropolis random walk. The framework covers cases where the underlying likelihood or extremum criterion function is possibly non-concave, discontinuous, and of increasing dimension. Using a central limit framework to provide structural restrictions for the problem, it is shown that the algorithm is computationally efficient. Specifically, it is shown that the running time of the algorithm in large samples is bounded in probability by a polynomial in the parameter dimension d, and in particular is of stochastic order d^2 in the leading cases after the burn-in period. The reason is that, in large samples, a central limit theorem implies that the posterior or quasi-posterior approaches a normal density, which restricts the deviations from continuity and concavity in a specific manner, so that the computational complexity is polynomial. An application to exponential and curved exponential families of increasing dimension is given.

Key Words: Computational Complexity, Metropolis, Large Samples, Sampling, Integration, Exponential family, Moment restrictions
*T.J. Watson Research Center and MIT Operations Research Center. Email: belloni@mit.edu. Url: http://web.mit.edu/belloni/www. Research support from a National Science Foundation grant is gratefully acknowledged.

**Department of Economics and MIT Operations Research Center. Email: vchern@mit.edu. Url: http://web.mit.edu/vchern/www. Research support from a National Science Foundation grant is gratefully acknowledged.
1. Introduction

Markov Chain Monte Carlo (MCMC) algorithms have dramatically increased the use of Bayesian and quasi-Bayesian methods for practical estimation and inference.² Bayesian methods rely on a likelihood formulation, while quasi-Bayesian methods replace the likelihood by other criterion functions. This paper studies the computational complexity of a basic MCMC algorithm as both the sample size and the parameter dimension grow to infinity at appropriate rates. The paper shows how and when the large sample asymptotics places sufficient restrictions on the likelihood and criterion functions that guarantee the efficient - that is, polynomial time - computational complexity of these algorithms. These results suggest that, at least in large samples, Bayesian and quasi-Bayesian estimators can be computationally efficient alternatives to maximum likelihood and extremum estimators in cases where likelihoods and criterion functions are non-concave and possibly non-smooth in the parameters of interest.

To motivate our analysis, consider the M-estimation problem, which is a common method of estimating various kinds of regression models. The idea behind this approach is to maximize some criterion function:

Q_n(θ) = -(1/n) Σ_{i=1}^n m( Y_i - q_i(X_i, θ) ),   θ ∈ Θ ⊂ R^d,   (1.1)

where Y_i is the response variable, X_i is a vector of regressors, and q_i is a regression function. In many examples, the problem is nonlinear and non-concave, implying that the argmax estimator may be difficult or impossible to obtain. For instance, in risk management a major problem is that of constructing estimates of Conditional Value-at-Risk. In particular, the problem is to predict the α-quantile of a portfolio's return Y_i tomorrow, given today's past available information (X_i, X_{i-1}, ...). This problem fits in the M-estimation framework by taking the function m(u) to be the asymmetric absolute deviation function⁴

m(u) = (α - 1{u < 0}) u,

and by taking the dependence on q_i(·) to reflect all past data and accurately capture GARCH-like dependencies. To this end, leading research in this area⁵ considers recursive models of the form

q_i = f(X_i, q_{i-1}, q_{i-2}, ...; θ) + ρ_1 q_{i-1} + ρ_2 q_{i-2}.

This implies a highly non-linear, recursive specification for the regression function q_i(·; θ), which in turn implies that the criterion function used in M-estimation defined in (1.1) is generally non-concave. Furthermore, in this example, the function Q_n(θ) is also non-smooth. As a consequence, the argmax estimator

θ̂ ∈ argmax_{θ ∈ Θ} Q_n(θ)   (1.2)

may be very hard to obtain. Figure 1 in Section 2 illustrates other kinds of examples where the argmax computation becomes intractable.

²See e.g. the books of Casella and Robert [6], Chib [9], Geweke [15], and Liu [29] for detailed treatments of MCMC methods and their applications in various areas of statistics, econometrics, and biometrics.
⁴Koenker and Bassett (1978).
⁵E.g. Engle and Manganelli (2005).
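To make the shape of the criterion in (1.1) concrete for the Value-at-Risk example above, the following short Python sketch evaluates a simplified CAViaR-style recursive quantile criterion. The specific recursion f, the simulated data, and all parameter names are illustrative assumptions for this sketch, not quantities prescribed by the paper.

```python
import numpy as np

def check_loss(u, alpha):
    # Asymmetric absolute deviation ("check") function of Koenker and Bassett.
    return (alpha - (u < 0)) * u

def Q_n(theta, Y, X, alpha=0.05):
    # Simplified recursive quantile model: q_i = theta0 + theta1*|X_i| + rho*q_{i-1}.
    theta0, theta1, rho = theta
    n = len(Y)
    q = np.zeros(n)
    for i in range(1, n):
        q[i] = theta0 + theta1 * abs(X[i]) + rho * q[i - 1]
    # Negative sample average of the check loss, as in (1.1).
    return -check_loss(Y - q, alpha).mean()

# Example: evaluate the criterion at one parameter value on simulated data.
rng = np.random.default_rng(0)
X = rng.normal(size=500)
Y = 0.1 * X + rng.normal(size=500)
print(Q_n(np.array([-0.1, 0.2, 0.5]), Y, X))
```

Plotting Q_n over a grid of (theta0, rho) values in such a toy model already displays the kinks and multiple local maxima that make the argmax computation hard.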
As an alternative to argmax estimation, consider the quasi-Bayesian estimator obtained by integration in place of optimization:

θ̂ = ∫_Θ θ' exp{Q_n(θ')} dθ' / ∫_Θ exp{Q_n(θ')} dθ'.   (1.3)

This estimator may be recognized as the mean of the quasi-posterior density

π_n(θ) ∝ exp{Q_n(θ)}.

(Of course, when Q_n is a log-likelihood, the term "quasi" becomes redundant.) This estimator is often much easier to compute in practice than the argmax estimator; it is not affected by local discontinuities and non-concavities; see, for example, the discussion in Chernozhukov and Hong [8] and Liu, Tian, and Wei [28].

This paper will show that if the sample size n grows to infinity and the dimension of the problem d does not grow too quickly relative to the sample size, the quasi-posterior

f(θ) = exp{Q_n(θ)} / ∫_Θ exp{Q_n(θ')} dθ'   (1.4)

will be approximately normal. This result in turn leads to the main claim: the estimator (1.3) can be computed using Markov Chain Monte Carlo in polynomial time, provided the starting point is drawn from the approximate support of the quasi-posterior (1.4). As is standard in the literature, we measure running time in the number of evaluations of the numerator of the quasi-posterior function (1.4), since this accounts for most of the computational burden.
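As a concrete illustration, here is a minimal Python sketch of the estimator (1.3): it averages draws from a Gaussian random-walk Metropolis chain whose stationary density is proportional to exp{Q_n(θ)}. The step size, burn-in length, number of draws, and starting point are illustrative tuning choices, not quantities prescribed by the paper (Sections 3 and 4 derive how long such a chain actually needs to be run).

```python
import numpy as np

def quasi_posterior_mean(Q_n, theta0, n_draws=20000, burn_in=5000, step=0.1, seed=0):
    # Approximate (1.3) by averaging draws whose stationary density is
    # proportional to exp{Q_n(theta)}, produced by a Gaussian random-walk
    # Metropolis chain started at theta0.
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = Q_n(theta)
    draws = []
    for t in range(n_draws):
        prop = theta + step * rng.normal(size=theta.shape)
        logp_prop = Q_n(prop)
        # accept with probability min{1, exp(Q_n(prop) - Q_n(theta))}
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = prop, logp_prop
        if t >= burn_in:
            draws.append(theta.copy())
    return np.mean(draws, axis=0)
```

This routine could be applied, for example, to a criterion like the quantile objective sketched earlier; the only evaluations it requires are those of Q_n, which is exactly how running time is measured above.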
In other words, when the central limit theorem (CLT) for the quasi-posterior holds, the above estimator is computationally tractable. The reason is that the CLT, in addition to implying the approximate normality and attractive estimation properties of the estimator θ̂, bounds the non-concavities and discontinuities of Q_n(θ) in a specific manner that implies that the computational time of the algorithm is polynomial in the parameter dimension d. In particular, the bound on the running time is O_p(d^2) in the leading cases after the so-called burn-in period. Thus, our main insight is to bring the structure implied by the CLT into the computational complexity analysis of the MCMC algorithm for computation of (1.3) and sampling from (1.4).
Our analysis of computational complexity builds on several fundamental papers studying the computational complexity of Metropolis procedures, especially Applegate and Kannan [1], Frieze, Kannan and Polson [14], Polson [34], Kannan, Lovasz and Simonovits [24], Kannan and Li [23], Lovasz and Simonovits [30], and Lovasz and Vempala [31, 32, 33]. Many of our results and proofs rely upon and extend the mathematical tools previously developed in these works. We extend the complexity analysis of the previous literature, which has focused on the case of an arbitrary concave log-likelihood function, to nonconcave and nonsmooth cases. The motivation is that, from a statistical point of view, in concave settings it is typically easier to compute a maximum likelihood or extremum estimate than a Bayesian or quasi-Bayesian estimate, so the latter do not necessarily have practical appeal. In contrast, when the log-likelihood or quasi-likelihood is either nonsmooth, nonconcave, or both, Bayesian and quasi-Bayesian estimates defined by integration are relatively attractive computationally, compared to maximum likelihood or extremum estimators defined by optimization.
Our analysis also relies on statistical large sample theory. We invoke limit theorems for posteriors and quasi-posteriors for large samples as n → ∞. These theorems are necessary to support our principal task - the analysis of computational complexity under the restrictions of the CLT. As a preliminary step of our computational analysis, we obtain a new CLT for quasi-posteriors and posteriors which generalizes the CLTs previously derived in the literature for posteriors and quasi-posteriors of fixed dimension. In particular, Laplace c. 1809, Bickel and Yahav [4], Ibragimov and Hasminskii [19], and Bunke and Milhaud (1998) provided CLTs for posteriors; Chernozhukov and Hong [8] and Liu, Tian, and Wei [28] provided CLTs for quasi-posteriors formed using various non-likelihood criterion functions. In contrast to these previous results, we allow for increasing dimensions. Ghosal [17] also previously derived a CLT for posteriors with increasing dimension, but only for concave exponential families. We go beyond such canonical setups and establish the CLT for non-concave and discontinuous cases. We also allow for general criterion functions in place of likelihood functions.
The paper also illustrates the plausibility of the CLT approach using exponential and curved exponential families. The curved families arise, for example, when the data must satisfy additional moment restrictions, as e.g. in Hansen and Singleton [18], Chamberlain [7], and Imbens [20]. The curved families fall outside the log-concave framework.
The rest of the paper is organized as follows. In Section 2, we establish a generalized version of the Central Limit Theorem for Bayesian and Quasi-Bayesian estimators. This result may be seen as a generalization of the classical Bernstein-Von Mises theorem, in that it allows the parameter dimension to grow as the sample size grows, i.e. d → ∞ as n → ∞. In Section 2, we also formulate the main problem, which is to characterize the complexity of MCMC sampling and integration as a function of the key parameters that describe the deviations of the quasi-posterior from the normal density. Section 3 explores the structure set forth in Section 2 to find bounds on the conductance and mixing time of the MCMC algorithm. Section 4 derives bounds on the integration time of the MCMC algorithm. Section 5 considers an application to a broad class of curved exponential families, which are possibly non-concave and discontinuous, and verifies that our results apply to this class of statistical models. We verify that the high-level conditions of Section 2 follow from primitive conditions for these models.
2. The Setup and The Problem

Our analysis is motivated by the problems of estimation and inference in large samples. We consider a "reduced-form" setup formulated in terms of parameters that characterize local deviations from the true statistical parameter. The local parameter λ describes contiguous deviations from the true parameter, and we shift it by a first order approximation of the extremum estimator. That is, denoting by θ a parameter vector, θ_0 the true value, and s the normalized extremum estimator (or a first order approximation to it), we define the local parameter as

λ = √n(θ - θ_0) - s   for θ ∈ Θ.

The parameter space for θ is Θ, and the parameter space for λ is therefore

Λ = √n(Θ - θ_0) - s.

The corresponding localized likelihood (or localized criterion) function is denoted by ℓ(λ). For example, suppose L_n(θ) is the original likelihood function in the likelihood framework or, more generally, L_n(θ) = exp{nQ_n(θ)}, where Q_n(θ) is the criterion function in the extremum framework; then

ℓ(λ) = L_n(θ_0 + (λ + s)/√n) / L_n(θ_0).

Examples in Section 5 further illustrate the connection between the localized set-up and the non-localized set-ups. The assumptions below will be stated directly in terms of ℓ(λ). (Section 5 provides more primitive conditions within the exponential and curved exponential family framework.)

Then, the posterior or quasi-posterior density for λ takes the form (implicitly indexed by the sample size n)

f(λ) = ℓ(λ) / ∫_Λ ℓ(λ') dλ',   (2.5)

and we impose conditions that force the posterior to satisfy a CLT in the sense of approaching the normal density

φ(λ) = 1 / [ (2π)^{d/2} det(J^{-1})^{1/2} ] · exp( -λ'Jλ/2 )

as the sample size n → ∞. More formally, the following conditions are assumed to hold for ℓ(λ) as n → ∞. These conditions, which we will call the "CLT conditions," explicitly allow for an increasing parameter dimension d, with d → ∞ as n → ∞.
C1. The local parameter λ belongs to the local parameter space Λ ⊆ R^d, and there is a closed ball K := B(0, ||K||) ⊆ Λ of size ||K|| = C√d,⁶ such that

∫_K f(λ) dλ ≥ 1 - o_p(1)   and   ∫_K φ(λ) dλ ≥ 1 - o(1),

where φ is the normal density of a zero mean vector with variance matrix J^{-1}, whose eigenvalues are bounded from above.

C2. The lower semi-continuous posterior or quasi-posterior function ℓ(λ) approaches a quadratic form in logs, uniformly in K, i.e., there exist positive approximation errors ε₁ and ε₂ such that for every λ ∈ K,

| ln ℓ(λ) + λ'Jλ/2 | ≤ ε₁ + ε₂ λ'Jλ/2,   (2.6)

where J is a symmetric positive definite matrix with eigenvalues bounded away from zero and from above. Also, we denote the ellipsoidal norm induced by J as ||λ||_J := √(λ'Jλ).

C3. The approximation errors ε₁ and ε₂ satisfy ε₁ = o_p(1) and ε₂ · ||K||²_J = o_p(1).

These conditions imply that over the approximate support set K

ℓ(λ) = g(λ) · m(λ),   (2.7)

where ln g(λ) = -λ'Jλ/2 and

-ε₁ - ε₂ λ'Jλ/2 ≤ ln m(λ) ≤ ε₁ + ε₂ λ'Jλ/2.   (2.8)

Figure 1. This figure illustrates how ln ℓ(λ) can deviate from ln g(λ), including possible discontinuities in ln ℓ(λ).

Figure 1 illustrates the kinds of deviations of ln ℓ(λ) from the quadratic curve captured by the parameters ε₁ and ε₂, and shows the types of discontinuities and non-convexities permitted in our framework. Parameter ε₁ controls the size of local discontinuities, and parameter ε₂ controls the global tilting away from the quadratic shape of the normal log-density.

⁶Note that ||K|| := sup{||a|| : a ∈ K}. The constant C need not grow due to the phenomenon of concentration of measure under d → ∞ asymptotics.
Theorem 1. [Generalized CLT for Quasi-Posteriors] Under the conditions stated above, the density of interest

f(λ) = ℓ(λ) / ∫_Λ ℓ(λ') dλ'   (2.9)

approaches the normal density φ(λ) with variance matrix J^{-1} in the following sense:

∫_Λ | f(λ) - φ(λ) | dλ = ∫_K | f(λ) - φ(λ) | dλ + o_p(1) = o_p(1).   (2.10)

Proof. See Appendix A. □

Comment 2.1. Theorem 1 is a simple preliminary result. However, the result is essential for defining the environment in which the main results of this paper - the computational complexity results - will be developed. The theorem shows that if some regularity conditions hold, Bayesian and Quasi-Bayesian inference has good large sample properties. The main part of the paper, namely Section 3, develops the computational implications of the CLT conditions. In particular, Section 3 shows that polynomial time computing of Bayesian and Quasi-Bayesian estimators by MCMC in large samples is in fact implied by the CLT conditions.
Comment 2.2. By allowing increasing dimension (d → ∞), Theorem 1 extends the CLT previously derived in the literature for posteriors in the likelihood framework (Bickel and Yahav [4], Ibragimov and Hasminskii [19], Bunke and Milhaud [5], Ghosal [17]) and for quasi-posteriors in the general extremum framework, where the likelihood is replaced by general criterion functions (Chernozhukov and Hong [8], Liu, Tian, and Wei [28]). The theorem is more general than the results in Ghosal [17], who also considered increasing dimensions but limited his analysis to the exponential likelihood family framework. In contrast, Theorem 1 allows for non-exponential families and allows quasi-posteriors in place of posteriors. Recall that quasi-posteriors result from using quasi-likelihoods and other criterion functions in place of the likelihood. This expands substantially the scope of the applications of the result. Importantly, Theorem 1 allows for non-smoothness and even discontinuities in the likelihood and criterion functions, which are pertinent in a number of the applications listed in the introduction.
The Problem of the Paper. Our problem is to characterize the complexity of obtaining draws from f(λ) and of Monte Carlo integration ∫ g(λ) f(λ) dλ, where f(λ) is restricted to the approximate support K. The procedure used to obtain the basic draws, as well as to carry out Monte Carlo integration, is a Metropolis (Gaussian) random walk, which is a standard MCMC algorithm used in practice. The tasks are thus:

I. Characterize the complexity of sampling from f(λ) as a function of (d, n, ε₁, ε₂, K);

II. Characterize the complexity of calculating ∫ g(λ) f(λ) dλ as a function of (d, n, ε₁, ε₂, K);

III. Characterize the complexity of sampling from f(λ) and performing integrations with f(λ) in large samples as d, n → ∞ by invoking the bounds on (d, n, ε₁, ε₂, K) imposed by the CLT;

IV. Verify that the CLT conditions are applicable in a variety of statistical problems.

This paper formulates and answers this problem. Thus, the paper brings the CLT restrictions into the complexity analysis and develops complexity bounds for sampling and integrating from f(λ) under these restrictions. These CLT restrictions, arising by using large sample theory and imposing certain regularity conditions, limit the behavior of f(λ) over the approximate support set K in a specific manner that allows us to establish polynomial computing time for sampling and integration. Because the CLT conditions do not provide strong restrictions on the tail behavior of f(λ) outside K (other than C1), our analysis of complexity is limited entirely to the approximate support set K defined in C1-C3.
Comment 2.3. By solving the above problem, this paper contributes to the recent literature on the computational complexity of Metropolis procedures. Early work was primarily concerned with the question of approximating the volume of high dimensional convex sets, where uniform densities play a fundamental role (Lovasz and Simonovits [30], Kannan, Lovasz and Simonovits [24, 25]). Later the approach was generalized to the cases where the log-likelihood is concave (Frieze, Kannan and Polson [14], Polson [34], and Lovasz and Vempala [31, 32, 33]). However, under log-concavity the maximum likelihood estimators are usually preferred over Bayesian or quasi-Bayesian estimators from a computational point of view. In the absence of concavity, exactly the settings where there is a great practical appeal for using Bayesian and quasi-Bayesian estimates, there has been relatively less, if any, analysis. One important exception is the paper of Applegate and Kannan [1], which covers nearly-concave but smooth densities using a discrete Metropolis algorithm. In contrast to Applegate and Kannan [1], our approach allows for both discontinuous and non-concave densities that are permitted to deviate from the normal density (not from an arbitrary log-concave density, as in Applegate and Kannan [1]) in a specific manner. The manner in which they deviate from the normal density is motivated by the CLT and controlled by the parameters ε₁ and ε₂, which are in turn restricted by the CLT conditions. The CLT restrictions provide a congenial analytical framework that allows us to treat a basic non-discrete sampling algorithm frequently used in practice. In fact, it is known that the basic random walk analyzed here does not have good complexity properties (rapid mixing) for arbitrary log-concave density functions, in opposition to other more sophisticated schemes like hit-and-run.⁸ Nonetheless, the CLT conditions imply enough structure that the standard random walk is in fact rapidly mixing. Moreover, only Subsection 3.2 depends on the particular form of the random walk, while all the remaining results are valid in a considerably more general setting. This suggests that the same approach can be used to establish polynomial bounds for more sophisticated random walks. As is standard in the literature, we assume that the starting point of the algorithm is in the approximate support of the posterior. Indeed, the polynomial time bound that we derive applies only in this case because the CLT provides enough structure on the problem in this domain. Our analysis does not apply outside this domain.

⁸Discrete Metropolis algorithms facilitate the analysis, but they are not frequently used in practice. See Lovasz and Vempala [33] for a discussion.
3. Sampling from f using a Random Walk

3.1. Set-Up and Main Result. In this section we bound the computational complexity of obtaining a draw from a random variable approximately distributed according to a density function f as defined in (2.5). (Section 4 builds upon these results to study the associated integration problem.)

By invoking Assumption C1, we restrict our attention entirely to the approximate support set K. Consider a measurable space (K, A). Our task is to draw a random variable according to a measurable density function f restricted to K (this density induces a probability distribution on K denoted by Q, i.e., Q(A) = ∫_A f(x) dx / ∫_K f(x) dx for all A ∈ A). Asymptotically, f will be completely defined over this set.

It is well-known that random walks combined with a Metropolis filter are capable of performing such a task. In order to prove our complexity bounds, we concentrate on a commonly used random walk induced by a Gaussian distribution. Such a random walk is completely characterized by an initial point u_0 and a fixed standard deviation σ > 0, and its one-step move is defined as the procedure of drawing a point y according to a Gaussian distribution centered on the current point u with covariance matrix σ²I and then, with probability min{f(y)/f(u), 1} = min{ℓ(y)/ℓ(u), 1}, moving to y; otherwise staying at u (see Casella and Robert [6] and Vempala [39] for details). In the complexity analysis of this MCMC algorithm we are interested in bounding the number of steps of the random walk required to draw a random variable from f with a given precision. Equivalently, we are interested in bounding the number of evaluations of the local likelihood function ℓ required for this purpose.

Next we review definitions of important concepts relevant for our analysis. The definitions of these concepts follow Lovasz and Simonovits [30] and Vempala [39]. Let q(x|u) denote the density function associated with a random variable N(u, σ²I), and 1_A(·) be the indicator function of the set A. For each u ∈ K, the one-step distribution P_u, the probability distribution after one step of the random walk starting from u, is defined as

P_u(A) = ∫_{K ∩ A} min{ f(x)/f(u), 1 } q(x|u) dx + θ(u) 1_A(u),   (3.11)

where

θ(u) = 1 - ∫_K min{ f(x)/f(u), 1 } q(x|u) dx   (3.12)

is the probability of staying at u after one step of the ball walk from u. A step of the random walk is said to be proper if the next point is different from the current point (which happens with probability 1 - θ(u)).
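To fix ideas, here is a minimal Python sketch of the one-step move just described, restricted to the ball K; it returns the next state together with a flag indicating whether the step was proper. The density f, the radius of K, and the step size σ are illustrative inputs for the sketch, not quantities fixed by the paper.

```python
import numpy as np

def one_step(u, log_f, radius_K, sigma, rng):
    """One Gaussian random-walk Metropolis step for f restricted to K = B(0, radius_K).

    Returns (next_state, proper_move), where proper_move is True exactly when the
    chain leaves the current point u (an event of probability 1 - theta(u))."""
    y = u + sigma * rng.normal(size=u.shape)             # proposal ~ N(u, sigma^2 I)
    if np.linalg.norm(y) > radius_K:                     # proposals outside K are rejected
        return u, False
    accept_prob = min(1.0, np.exp(log_f(y) - log_f(u)))  # min{ f(y)/f(u), 1 }
    if rng.uniform() < accept_prob:
        return y, True
    return u, False
```

In this sketch, the stay probability θ(u) of (3.12) is precisely the probability that the function returns (u, False).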
The triple (K, A, {P_u : u ∈ K}), along with a starting distribution Q_0, defines a Markov chain in K. We denote by Q_t the probability distribution obtained after t steps of the random walk. A distribution Q on (K, A) is called stationary if for any A ∈ A,

∫ P_u(A) dQ(u) = Q(A).   (3.13)

Given the random walk described earlier, the unique stationary probability distribution is the distribution Q induced by the function f, Q(A) = ∫_A f(x) dx / ∫_K f(x) dx for all A ∈ A; see e.g. Casella and Robert [6]. As mentioned before, our goal is to approximate the density of interest. This is the main motivation for most of the MCMC studies found in the literature, since the method provides an asymptotic approximation. To properly quantify this convergence we need to review additional concepts.
The ergodic flow of a set A ∈ A with respect to a distribution Q is defined as

Φ(A) = ∫_A P_u(K\A) dQ(u).

It measures the probability of the event {u ∈ A, u' ∉ A}, where u is distributed according to Q and u' is obtained after one step of the random walk starting from u; it captures the average flow of points leaving A in one step of the random walk. A Markov chain is said to be ergodic if Φ(A) > 0 for every A with 0 < Q(A) < 1, which is the case for the Markov chain induced by the random walk described earlier, due to the assumptions on f. Since Q is a stationary measure, it follows that

Φ(A) = ∫_A P_u(K\A) dQ(u) = ∫_A (1 - P_u(A)) dQ(u) = Q(A) - ∫_A P_u(A) dQ(u)
     = ∫_K P_u(A) dQ(u) - ∫_A P_u(A) dQ(u) = ∫_{K\A} P_u(A) dQ(u) = Φ(K\A),

where the third equality uses the stationarity of Q.
In order to compare two probability distributions P and Q we use the total variation distance¹⁰

||P - Q||_TV = sup_{A ∈ A} | P(A) - Q(A) |.   (3.14)

Moreover, P is said to be an M-warm start with respect to Q if

sup_{A ∈ A : Q(A) > 0} P(A)/Q(A) ≤ M.   (3.15)

The key concepts in the analysis are the conductance of a set A, which is defined as

φ(A) = Φ(A) / min{ Q(A), Q(K\A) },

and the global conductance, defined as

φ = min_{0 < Q(A) ≤ 1/2} Φ(A)/Q(A) = min_{0 < Q(A) ≤ 1/2} [ ∫_A P_u(K\A) dQ(u) ] / Q(A).

Lovasz and Simonovits [30] proved the connection between conductance and convergence for the continuous state space setting. This result extended an earlier result of Jerrum and Sinclair [21, 22], who connected convergence and conductance for discrete state spaces. Lovasz and Simonovits' result can be re-stated as follows.

¹⁰This distance is equivalent to the L₁ distance between the density functions associated with P and Q, since sup_{A ⊆ K} |P(A) - Q(A)| = (1/2) ∫_K |dP - dQ|.
Theorem 2. Let Q_0 be an M-warm start with respect to the stationary distribution Q. Then,

||Q_t - Q||_TV ≤ √M ( 1 - φ²/2 )^t.

Proof. See Lovasz and Simonovits [30]. □

The main result of this paper provides a lower bound for the global conductance of the Markov chain under the CLT conditions. In particular, we show that 1/φ is bounded by a fixed polynomial in the dimension of the parameter space.

Theorem 3. (Main Result) Under Assumptions C1, C2, C3, and setting the parameter σ for the random walk as defined in (3.17), the global conductance of the induced Markov chain satisfies

1/φ = O( d e^{4ε₁ + 4ε₂||K||²_J} ).

In particular, the random walk requires at most

N_ε = O_p( d² e^{8ε₁ + 8ε₂||K||²_J} ln(M/ε) )

steps to achieve ||Q_{N_ε} - Q||_TV ≤ ε. Invoking the CLT restrictions, ε₁ = o(1) and ε₂||K||²_J = o(1), we have that 1/φ = O_p(d) and the number of steps N_ε is bounded by O_p( d² ln(M/ε) ).

Proof. See Section 3.2. □
Comment 3.1. In general, the dependence on ε₁ and ε₂ is exponential and this bound does not imply polynomial time ("efficient") computing. However, the CLT framework implies that ε₁ = o(1) and ε₂||K||²_J = o(1), so that 1/φ = O_p(d), which by Theorem 3 in turn implies polynomial time computing.
Next we discuss and bound the dependence on M, the "distance" of the initial distribution Q_0 from the stationary distribution Q, as defined in (3.15). A natural candidate for a starting distribution Q_0 is the one-step distribution conditional on a proper move from an arbitrary point u ∈ K. We emphasize that, in general, such a choice of Q_0 could lead to values of M that are arbitrarily large. In fact, this could happen even in the case of the stationary density being a uniform distribution on a convex set (see Lovasz and Vempala [33]). Fortunately, this is not the case under the CLT framework, as shown by the following lemma.

Lemma 1. Let u ∈ K and P_u be the associated one-step distribution. With probability at least 1/3 the random walk makes a proper move. Conditioned on performing a proper move, the one-step distribution is an M-warm start for f, where

ln M = O( d ln(√d ||K||) + ε₁ + ε₂||K||²_J ).

Under the CLT restrictions, ε₁ = O(1), ε₂||K||²_J = o(1), and ||K|| = O(√d), so that ln M = O(d ln d).

Proof. See Appendix A. □

Combining this result with Theorem 3 yields the overall (burn-in plus post burn-in) running time of O_p(d³ ln d).
3.2. Proof of the Main Result. The proof of Theorem 3 uses two auxiliary results: the first is an analytical result and the second is a geometric property of the particular random walk. The first result is an iso-perimetric¹¹ inequality tailored to the CLT framework and is of independent mathematical interest. After the connection between the iso-perimetric inequality and the ergodic flow is established, the second result allows us to use the CLT conditions and the properties of the random walk to bound the conductance from below. We provide an outline of the proof, auxiliary results, and, finally, the formal proof.

3.2.1. Outline of the Proof. The proof follows the arguments in Lovasz and Vempala [31]. In order to bound the ergodic flow of an arbitrary set A ∈ A from below, consider the particular disjoint partition K = S₁ ∪ S₂ ∪ S₃, where S₁ ⊂ A, S₂ ⊂ K\A, and S₃ consists of points in A or K\A for which the one-step probability of going to the other set is at least a fixed constant c (to be defined later). Therefore

Φ(A) = (1/2) ∫_A P_u(K\A) dQ(u) + (1/2) ∫_{K\A} P_u(A) dQ(u)
     ≥ (1/2) ∫_{A\S₁} P_u(K\A) dQ(u) + (1/2) ∫_{(K\A)\S₂} P_u(A) dQ(u) + (c/2) Q(S₃),

where the first equality holds because Φ(A) = Φ(K\A). Since the first two terms could be arbitrarily small, the result will follow by bounding the last term from below. Given two points u ∈ S₁ and v ∈ S₂, we have P_u(K\A) < c and P_v(A) < c, so the total variation distance between their one-step distributions is bounded from below: ||P_u - P_v||_TV ≥ |P_u(A) - P_v(A)| > 1 - 2c. One then needs to bound the distance between the points themselves in terms of the distance between their one-step distributions; this is the geometric property of the random walk derived in Section 3.2.3. That result provides a lower bound on ||u - v||, i.e., since u and v were arbitrary points, the sets S₁ and S₂ are "far" apart. Therefore S₃ cannot be arbitrarily small; i.e., Q(S₃) is bounded from below. This will be achieved by an iso-perimetric inequality whose bound is decreasing in the distance between S₁ and S₂. This leads to a lower bound for the global conductance, and Theorem 3 follows by invoking the CLT conditions and Theorem 2.

¹¹The geometric iso-perimetric problem has a long history in mathematics. It consists of finding the set that minimizes a certain criterion within a particular family of sets. A famous and familiar example of an iso-perimetric problem is the following: among all sets with unit volume, find the one that has the smallest surface area.
3.2.2. An Iso-perimetric Inequality. We start by defining a notion of approximate log-concavity. A function f : R^d → R is said to be log-β-concave for some β ∈ (0, 1] if for every x, y ∈ R^d and α ∈ [0, 1], we have

f(αx + (1-α)y) ≥ β f(x)^α f(y)^{1-α}.

f is said to be logconcave if β can be taken equal to one. The class of log-β-concave functions is rather broad, for example, including various non-smooth and discontinuous functions.

Together, the relations (2.7) and (2.8) imply that we can write the functions f and ℓ as the product of a Gaussian function and a log-β-concave function:

Lemma 2. Over the set K, the functions f(λ) := ℓ(λ)/∫_K ℓ(λ) dλ and ℓ(λ) are the product of a Gaussian function, e^{-λ'Jλ/2}, and a log-β-concave function whose parameter β satisfies

ln β ≥ 2 ( -ε₁ - ε₂ ||K||²_J ).

Proof. It follows from (2.8). □

In our case, the larger is the support set K, the larger is the deviation from log-concavity. That is appropriate since the CLT does not impose strong restrictions on the tails of the probability densities. Nonetheless, this gives a convenient structure to prove an iso-perimetric inequality which covers even non-continuous cases permitted in the framework described in the previous sections.

Lemma 3. Consider any measurable partition of the form K = S₁ ∪ S₂ ∪ S₃ such that the distance between S₁ and S₂ is at least t, i.e. d(S₁, S₂) ≥ t. Let Q(S) = ∫_S f dx / ∫_K f dx. Then for any lower semi-continuous function f(x) = e^{-||x||²} m(x), where m is a log-β-concave function, we have

Q(S₃) ≥ β (t/8) e^{-t²/4} min{ Q(S₁), Q(S₂) }.

Proof. See Appendix A. □

Comment 3.2. This new iso-perimetric inequality extends the iso-perimetric inequality in Kannan and Li [23], Theorem 2.1. The proof builds on their proof as well as on the ideas in Applegate and Kannan [1]. Unlike the inequality in [23], Lemma 3 removes smoothness assumptions on f, for example covering both non-log-concave and discontinuous cases.

The iso-perimetric inequality of Lemma 3 states that, under suitable conditions, if two subsets of K are far apart, the measure of the remaining subset should be comparable to the measure of at least one of the original subsets. The following corollary extends the previous result to cover cases with an arbitrary covariance matrix J.

Corollary 1. Consider any measurable partition of the form K = S₁ ∪ S₂ ∪ S₃ such that d(S₁, S₂) ≥ t, and let Q(S) = ∫_S f dx / ∫_K f dx. Then for any lower semi-continuous function f(x) = e^{-x'Jx/2} m(x), where m is a log-β-concave function, we have

Q(S₃) ≥ β ( t√λ_min / (8√2) ) e^{-t² λ_min / 8} min{ Q(S₁), Q(S₂) },

where λ_min denotes the minimum eigenvalue of the positive definite matrix J.

Proof. See Appendix A. □
3.2.3. Bounds on the Difference of One-step Distributions. Next we relate the total variation distance between two one-step distributions with the Euclidean distance between the points that induce them. Although this approach follows the one in Lovasz and Vempala [31, 32, 33], there are two important differences which call for a new proof. First, we no longer rely on log-concavity of f. Second, we use a different random walk. We start with the following auxiliary result.

Lemma 4. Let g : R^d → R be a function such that ln g is Lipschitz with constant L over a compact set K. Then, for every x ∈ K and r > 0,

inf_{y ∈ B(x,r) ∩ K} g(y)/g(x) ≥ e^{-rL}.

Proof. The result is obvious. □

Given a compact set K, we can bound the Lipschitz constant of the concave function ln g defined in (2.7) by

L ≤ sup_{λ ∈ K} ||∇ ln g(λ)|| = sup_{λ ∈ K} ||Jλ|| ≤ λ_max ||K||.   (3.16)

Lemma 5. Let f be as defined in (2.5), K := B(0, ||K||), σ ≤ 1/(4√d L), where L is the Lipschitz constant of ln g on the set K, and suppose that u, v ∈ K with ||u - v|| ≤ σ/8. Under our assumptions on f, we have

||P_u - P_v||_TV ≤ 1 - β/(3e).

Proof. See Appendix A. □

The converse of Lemma 5 states that if two points induce one-step probability distributions that are far apart in the total variation norm, these points must also be far apart in the Euclidean norm. This geometric result provides a key ingredient in the application of the iso-perimetric inequality, as discussed earlier.
3.2.4. Proof of Theorem 3. Consider the compact support K = B(0, ||K||), where ||K|| = O(√d) from Assumption C1. For the random walk, define

σ = min{ 1/(4√d L), ||K||/√(20 d) }.   (3.17)

Therefore the assumptions of Lemma 5 are satisfied. Moreover, under the assumptions of the theorem, using (3.16), it follows that

1/σ ≤ 20 λ_max √d ||K||.   (3.18)

Fix an arbitrary set A ∈ A and denote by A^c = K\A the complement of A with respect to K. We will prove that

Φ(A) = ∫_A P_u(A^c) dQ(u) ≥ [ β² σ √λ_min e^{-λ_min σ²/512} / (1536√2 e) ] min{ Q(A), Q(A^c) },   (3.19)

which implies the desired bound on the global conductance φ. Note that this is equivalent to bounding Φ(A^c), since Q is stationary on (K, A).

Consider the following auxiliary definitions:

S₁ = { u ∈ A : P_u(A^c) < β/(6e) },   S₂ = { v ∈ A^c : P_v(A) < β/(6e) },   and   S₃ = K \ (S₁ ∪ S₂).

First assume that Q(S₁) ≤ Q(A)/2 (a similar argument can be made for S₂ and A^c). In this case, we have

Φ(A) = ∫_A P_u(A^c) dQ(u) ≥ ∫_{A\S₁} P_u(A^c) dQ(u) ≥ (β/(6e)) Q(A\S₁) ≥ (β/(12e)) Q(A),

and the inequality (3.19) follows.

Next assume that Q(S₁) > Q(A)/2 and Q(S₂) > Q(A^c)/2. Since Φ(A) = Φ(A^c), we have that

Φ(A) = (1/2) ∫_A P_u(A^c) dQ(u) + (1/2) ∫_{A^c} P_v(A) dQ(v)
     ≥ (1/2) ∫_{A\S₁} P_u(A^c) dQ(u) + (1/2) ∫_{A^c\S₂} P_v(A) dQ(v)
     ≥ (1/2) ∫_{S₃} (β/(6e)) dQ(u) = (β/(12e)) Q(S₃),

where we used that S₃ = (A\S₁) ∪ (A^c\S₂). Given the definitions of the sets S₁ and S₂, for every u ∈ S₁ and v ∈ S₂ we have

||P_u - P_v||_TV ≥ P_u(A) - P_v(A) = 1 - P_u(A^c) - P_v(A) > 1 - β/(3e).

In such a case, by Lemma 5, ||u - v|| > σ/8 for every u ∈ S₁ and v ∈ S₂. Thus, we can apply the iso-perimetric inequality of Corollary 1, with d(S₁, S₂) ≥ σ/8, to obtain

Q(S₃) ≥ β ( σ√λ_min/(64√2) ) e^{-λ_min σ²/512} min{ Q(S₁), Q(S₂) } ≥ β ( σ√λ_min/(128√2) ) e^{-λ_min σ²/512} min{ Q(A), Q(A^c) },

where the second inequality also used that Q(S₁) > Q(A)/2 and Q(S₂) > Q(A^c)/2. Combining this with the bound Φ(A) ≥ (β/(12e)) Q(S₃) establishes (3.19). Therefore, using relation (3.18) and that λ_min σ² ≤ λ_min ||K||²/(20 d) is bounded from above under our assumptions (the eigenvalues of J are assumed to be uniformly bounded from above and away from zero), we obtain

1/φ = O( (1/β²) · 1/(σ√λ_min) ) = O( d e^{4ε₁ + 4ε₂||K||²_J} ).

The remaining results in Theorem 3 follow by invoking the CLT conditions and applying Theorem 2 with the above bound on the conductance. □
4. Complexity of Monte Carlo Integration

This section considers our second problem of interest - that of computing a high dimensional integral of a bounded real valued function g:

μ_g = ∫_K g(λ) f(λ) dλ.   (4.20)

The integral is computed by simulating a dependent (Markovian) sequence of random points λ¹, λ², ..., λ^N, which has f as the stationary distribution, and taking

μ̂_g = (1/N) Σ_{i=1}^N g(λⁱ)   (4.21)

as an approximation to (4.20). The dependent nature of the draws from f increases the sample size needed to achieve a desired precision compared to the (infeasible) case of independent draws. It turns out that, as in the preceding analysis, the global conductance of the Markov chain will be crucial in determining the appropriate sample size.

The starting point of our analysis is a central limit theorem for reversible Markov chains due to Kipnis and Varadhan [26], which is restated here for convenience. Consider a reversible Markov chain on K with stationary distribution f. The lag k autocovariance of the stationary time series {g(λⁱ)}_{i=1}^N, obtained by starting the Markov chain with the stationary distribution f, is defined as

γ_k = Cov_f( g(λⁱ), g(λ^{i+k}) ).

Let us recall a characterization of γ_k via spectral theory following Kipnis and Varadhan [26]. Let T denote the transition operator of the Markov chain induced by the random walk. In our case, since the chain is reversible, T is a linear bounded self-adjoint operator in the Hilbert space L²(K, A, Q); see [30]. Let E_g denote the measure on the Borel sets of (-1, 1) induced by the spectral measure of T applied to g, as in [16]. With these definitions, one has that for any k

γ_k = ∫ x^k dE_g(x).

We are prepared to restate the central limit theorem of Kipnis and Varadhan [26] needed for our analysis.

Theorem 4. For a stationary, irreducible, reversible Markov chain, with μ̂_g and μ_g defined as in (4.21) and (4.20), μ̂_g converges to μ_g almost surely. If

σ_g² = Σ_{k=-∞}^{+∞} γ_k

is finite, then √N (μ̂_g - μ_g) converges in distribution to N(0, σ_g²).

Proof. See Kipnis and Varadhan [26]. □

In our case, γ₀ is finite since g is bounded. The next result, which builds upon Theorem 2, states that σ_g² can be bounded using the global conductance of the Markov chain.

Corollary 2. Let g be a square integrable function with respect to the stationary measure Q. Under the assumptions of Theorem 4, we have that

γ_k ≤ ( 1 - φ²/2 )^k γ₀   and   σ_g² ≤ (4/φ²) γ₀.

Proof. See Lovasz and Simonovits [30]. □
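To make the quantities appearing in Theorem 4 and Corollary 2 concrete, the following short Python sketch estimates the lag-k autocovariances γ_k and a truncated version of the long-run variance σ_g² from a simulated series; the AR(1) surrogate series and the truncation lag are illustrative stand-ins for the chain {g(λⁱ)}, not objects defined in the paper.

```python
import numpy as np

def autocovariance(vals, k):
    # Sample analogue of gamma_k = Cov( g(lambda^i), g(lambda^{i+k}) ).
    v = vals - vals.mean()
    n = len(v)
    return np.dot(v[: n - k], v[k:]) / n

def long_run_variance(vals, max_lag):
    # Truncated sample analogue of sigma_g^2 = sum_{k=-inf}^{+inf} gamma_k.
    gamma0 = autocovariance(vals, 0)
    return gamma0 + 2 * sum(autocovariance(vals, k) for k in range(1, max_lag + 1))

# Example on an AR(1) surrogate for the stationary series {g(lambda^i)}.
rng = np.random.default_rng(2)
x = np.zeros(10_000)
for i in range(1, len(x)):
    x[i] = 0.9 * x[i - 1] + rng.normal()
print(long_run_variance(x, max_lag=100))  # close to gamma0 * (1 + 0.9) / (1 - 0.9)
```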
We will use the mean square error as the measure of closeness for a consistent estimator:

MSE(μ̂_g) = E[ μ̂_g - μ_g ]².

Many approaches are possible for constructing the sequence of draws in (4.21); we refer to [16] for a detailed discussion. Here, we will analyze three common schemes:

• long run (lr),
• subsample (ss),
• multi-start (ms).

Denote the sample sizes corresponding to each method as N_lr, N_ss, and N_ms. The long run scheme consists of generating the first point using the starting distribution and, after the burn-in period, selecting the N_lr subsequent points to compute the sample average (4.21). The subsample scheme also uses only one sample path, but the N_ss draws used in the sample average (4.21) are spaced out by S steps of the chain. Finally, the multi-start scheme uses N_ms different sample paths, initializing each one independently from the starting distribution f_0 and picking the last draw in each sample path after the burn-in period to be used in (4.21).
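For concreteness, here is a minimal Python sketch of the three schemes, written against a generic chain_step function that advances the Markov chain by one step; the burn-in length B, the spacing S, the sample sizes, and chain_step itself are illustrative placeholders rather than objects prescribed by the paper.

```python
import numpy as np

def long_run(chain_step, x0, g, B, N_lr):
    # One path: discard B burn-in steps, then average g over the next N_lr states.
    x = x0
    for _ in range(B):
        x = chain_step(x)
    vals = []
    for _ in range(N_lr):
        x = chain_step(x)
        vals.append(g(x))
    return np.mean(vals)

def subsample(chain_step, x0, g, B, N_ss, S):
    # One path: after burn-in, keep every S-th state, N_ss of them in total.
    x = x0
    for _ in range(B):
        x = chain_step(x)
    vals = []
    for _ in range(N_ss):
        for _ in range(S):
            x = chain_step(x)
        vals.append(g(x))
    return np.mean(vals)

def multi_start(chain_step, draw_start, g, B, N_ms):
    # N_ms independent paths: run each for B steps and keep only its final state.
    vals = []
    for _ in range(N_ms):
        x = draw_start()
        for _ in range(B):
            x = chain_step(x)
        vals.append(g(x))
    return np.mean(vals)
```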
There is another issue that must be addressed. All schemes require that the initial points be drawn from the stationary distribution f. We therefore need to compute the so-called "burn-in" period B, that is, the number of iterations required to reach the stationary distribution f with a desired precision, beginning at the starting distribution f_0.

Theorem 5. Let f_0 be an M-warm start with respect to f, and ḡ := sup_{λ∈K} |g(λ)|. Using the notation introduced in this section, to obtain MSE(μ̂_g) ≤ ε it is sufficient to use a burn-in length of order B = O( (1/φ²) ln( M ḡ γ₀ / ε ) ) for each scheme, and post-burn-in sample sizes of orders N_lr = O( γ₀/(φ² ε) ), N_ss = O( γ₀/ε ) with subsample spacing S = (2/φ²) ln(6γ₀/ε), and N_ms = O( γ₀/ε ). The overall complexities of the lr, ss, and ms methods are thus B + N_lr, B + S·N_ss, and B × N_ms.

Proof. See Appendix A. □

For convenience, Table 1 tabulates the bounds for the three different schemes. Note that the dependence on M and ḡ is only via log terms.
Table 1. Burn-in and Post Burn-in Complexities

  Method        Quantities     Complexity
  Long Run      B + N_lr       O( (1/φ²) ln(M ḡ γ₀/ε) + γ₀/(φ² ε) )
  Subsample     B + S·N_ss     O( (1/φ²) ln(M ḡ γ₀/ε) + (γ₀/ε)(2/φ²) ln(6γ₀/ε) )
  Multi-start   B × N_ms       O( (1/φ²) ln(M ḡ γ₀/ε) × γ₀/ε )

Table 2. Burn-in and Post Burn-in Complexities under the CLT

  Method        Burn-in Complexity          Post-burn-in Complexity
  Long Run      O_p( d³ ln d · ln ε⁻¹ )     + O_p( d² ε⁻¹ )
  Subsample     O_p( d³ ln d · ln ε⁻¹ )     + O_p( d² ε⁻¹ ln ε⁻¹ )
  Multi-start   O_p( d³ ln d · ln ε⁻¹ )     × O_p( ε⁻¹ )
Although the optimal choice of method depends on the particular values of the constants, when ε ↓ 0 the long-run algorithm has the smallest (best) bound, while the multi-start algorithm has the largest (worst) bound on the number of iterations. Table 2 presents the complexities implied by the CLT conditions, namely ε₁ → 0 and ε₂||K||² → 0, with ||K|| = O(√d). The table assumes that γ₀ and ḡ are constant, though it is straightforward to tabulate the results for the case where γ₀ and ḡ grow at polynomial speed with d. Finally, note that the bounds apply under a slightly weaker condition than the CLT requires, namely that ε₁ = O_p(1) and ε₂||K||² = O_p(1).

5. Application to Exponential and Curved-Exponential Families

In this section we verify that our conditions and analysis apply to a variety of statistical problems. We begin the discussion with the canonical log-concave cases within the exponential family. Then we drop the concavity and smoothness assumptions to illustrate the full applicability of the approach developed in this paper.

5.1. Concave Cases. Exponential families play a very important role in statistical estimation, cf. Lehmann and Casella [27], especially in high-dimensional contexts, cf. Portnoy [35], Ghosal [17], and Stone et al. [37]. For example, high-dimensional situations arise in modern data sets in technometric and econometric applications. Moreover, exponential families have excellent approximation properties and are useful for approximation of densities that are not necessarily of the exponential form, cf. Stone et al. [37]. Our discussion is based on the asymptotic analysis of Ghosal [17]. In order to simplify exposition, we invoke more canonical assumptions similar to those given in Portnoy [35].
E1. Let x₁, ..., x_n be iid observations from a d-dimensional canonical exponential family with density

f(x; θ) = exp( x'θ - ψ(θ) ),

where θ ∈ Θ, an open subset of R^d, and d → ∞ as n → ∞. Fix a sequence of parameter points θ_0 ∈ Θ.
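As a toy instance of E1, here is a short Python sketch with ψ(θ) = ||θ||²/2, in which case f(x; θ) is (up to a carrier measure free of θ) the N(θ, I_d) likelihood; the dimension, sample size, flat prior, and random seed are illustrative choices, not part of the paper's conditions.

```python
import numpy as np

def log_f(x, theta):
    # log f(x; theta) = x'theta - psi(theta), here with psi(theta) = ||theta||^2 / 2;
    # the carrier measure (standard normal in x) is omitted since it is free of theta.
    return x @ theta - 0.5 * theta @ theta

def log_posterior(theta, X):
    # Flat-prior log posterior of theta given the sample X (one row per observation).
    return sum(log_f(x, theta) for x in X)

# Example: d = 5, n = 200, true parameter theta_0 = 0 (so x_i ~ N(0, I_d)).
rng = np.random.default_rng(1)
d, n = 5, 200
X = rng.normal(size=(n, d))
print(log_posterior(0.1 * np.ones(d), X))
```

A Metropolis chain targeting exp of this log posterior is exactly the setting analyzed below after localization.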
Set μ = ψ'(θ_0) and F = ψ''(θ_0), implicitly the mean and covariance of the observations, respectively. Following Portnoy [35], we re-parameterize the problem so that the Fisher information matrix F = ψ''(θ_0) = I. For a given prior π on Θ, the posterior density of θ conditioned on the data takes the form

π_n(θ) ∝ π(θ) Π_{i=1}^n f(x_i; θ) = π(θ) exp( n x̄'θ - n ψ(θ) ),

where x̄ = Σ_{i=1}^n x_i / n. It will be convenient to associate every point θ with an element of Λ, a translation of the local parameter space,

λ = √n(θ - θ_0) - s,   where   s = √n(x̄ - μ)

is a first order approximation to the normalized maximum likelihood/extremum estimate √n(θ̂ - θ_0). By design, we have that E[s] = 0 and E[ss'] = I_d. Moreover, by Chebyshev's inequality, the norm of s can be bounded in probability, ||s|| = O_p(√d). Finally, the posterior density of λ over Λ is given by

f(λ) = ℓ(λ) / ∫_Λ ℓ(λ̄) dλ̄,

where

ℓ(λ) = π( θ_0 + (λ+s)/√n ) exp( √n x̄'(λ+s) + n[ ψ(θ_0) - ψ(θ_0 + (λ+s)/√n) ] ).

We impose the following regularity conditions, following Ghosal [17] and Portnoy [35]:
E2. Consider the following quantities associated with higher moments of the observations in a neighborhood of the true parameter θ_0:

B_{1n}(c) := sup{ E_θ |a'(x - μ)|³ : a ∈ R^d, ||a|| = 1, ||θ - θ_0||² ≤ cd/n },
B_{2n}(c) := sup{ E_θ |a'(x - μ)|⁴ : a ∈ R^d, ||a|| = 1, ||θ - θ_0||² ≤ cd/n }.

For any c > 0 and all n, there are constants c_0 > 0 and p > 0 such that B_{1n}(c) ≤ c_0 + d^p and B_{2n}(c) ≤ c_0 + d^p.

E3. The prior density π is proper and satisfies a positivity requirement at the true parameter θ_0:

sup_{θ∈Θ} ln[ π(θ)/π(θ_0) ] = O(d).

Moreover, the prior π also satisfies the following local Lipschitz condition:

| ln π(θ) - ln π(θ_0) | ≤ V(c) √d ||θ - θ_0||

for all θ such that ||θ - θ_0||² ≤ cd/n, and some V(c) such that V(c) ≤ c_0 + c^p, the latter holding for all c > 0.

E4. The following condition on the growth rate of the dimension of the parameter space is assumed to hold: d³/n → 0.

Comment 5.1. Condition E2 strengthens an analogous assumption of Ghosal [17]. Both conditions of this assumption are implied by the analogous assumption made by Portnoy [35]. Condition E3 is similar to the assumption on the prior in Ghosal [17]. For further discussion of this assumption, see [3]. Condition E4 states that the parameter dimension should not grow too quickly relative to the sample size.

Theorem 6. Conditions E1-E4 imply conditions C1-C3.

Proof. See Appendix A. □
Combining Theorems 1 and 6, we have the asymptotic normality of the posterior:

∫_Λ | f(λ) - φ(λ) | dλ = ∫_K | f(λ) - φ(λ) | dλ + o_p(1) = o_p(1).

Furthermore, we can apply Theorem 3 to the posterior density f to bound the convergence time (number of steps) of the Metropolis walk needed to obtain a draw from f (with a fixed level of accuracy): the convergence time is at most O_p(d²) after the burn-in period; together with the burn-in, the convergence time is O_p(d³ ln d). Finally, the integration bounds stated in the previous section also apply to the posterior f.
consider the case of a d-dimen-
sional curved exponential family. Being a generaHzation of the canonical exponential family,
its
many
analysis has
enough to allow
with the previous example.
similarities
Nonetheless,
is
it
and even various kinds of non-smoothness
for non-concavities
general
in the log-
likelihood function.
NEl.
Let xi,
.
.
,Xn be
.
observations fi'om a d-dimensional curved exponential
iid
family with density
/(.T;e)=exp(x'0(77)-V„(0(r?))),
.
,
where ^ £ 0, an open subset
NE2. The
a convex compact set
=
^0
9{'q)
^f
^(%)- The mapping
some
for
=
c
6q
of IR
parameter of interest
>
c
rj
IR^^
h^
9(7])
,
and d
is
7/,
The
and that
1|0(?7)
-
r/(6'o)||
>
oo as n -^ oo.
takes values from
>
rjo
the interior of
lies in
true value of 6 induced by
Moreover, assume that
0.
—
whose true value
eo!!?/
rjo
-
is
'/oil
IR''^
to
IR''
rjo
is
given by
where c-d <
di
<
d,
the unique solution to the system
for
some
eo
>
and
all
?/
G *.
Thus, the parameter 9 corresponds to a high-dimensional linear parametrization of the
log-density,
and
[12],
describes the lower-dimensional parametrization of the density of interest.
rj
There are many
classical
Lehmann and
examples of curved exponential
Casella
[27],
and Bandorff-Nielsen
puts a curved structure onto an exponential family
is
rn{x,a)f{x,9)dx
This condition restricts 9 to
=
where component
rj
apphcations, often
moment
lie
[2].
a
=
moment
[18],
0.
on a curve that can be parameterized as
(a,/3) contains
Chamberlain
of the condition that
restriction of the type:
a
as well as other parameters 0.
{9{rj),i]
£ ^},
In econometric
restrictions represent Euler equations that result fi-om the data
X being an outcome of an optimization by rational decision-makers;
Singleton
example Efron
families; see for
An example
[7],
curved exponential framework
Imbens
is
[20],
see e.g.
and Donald, Imbens and Newey
Hansen and
[10].
Thus, the
a fundamental complement of the exponential fi-amework,
at least in certain fields of data analysis.
We
require the following additional regularity conditions on the
We require the following additional regularity conditions on the mapping θ(·).

NE3. For every γ ∈ B(0, κ√d), there exists a linear operator G : R^{d_1} → R^d such that G'G has eigenvalues bounded from above and away from zero, and for every n

√n( θ(η_0 + γ/√n) - θ(η_0) ) = (I_d + R_{2n}) G γ + r_{1n},

where ||r_{1n}|| ≤ δ_{1n} and ||R_{2n}|| ≤ δ_{2n}, uniformly in γ. Moreover, these coefficients are such that δ_{1n}√d → 0 and δ_{2n} d → 0.

Thus the mapping η ↦ θ(η) is allowed to be nonlinear and discontinuous. For example, the additional condition δ_{1n} = 0 implies the continuity of the mapping in a neighborhood of η_0. More generally, condition NE3 does impose that the map admits an approximate linearization in the neighborhood of η_0, whose quality is controlled by the errors δ_{1n} and δ_{2n}. An example of a kind of map allowed in this framework is given in Figure 2.

Figure 2. This figure illustrates the mapping θ(·). The dash-dot line represents the linear map induced by G. The solid line is the (discontinuous) mapping, while the dashed lines represent the deviation band controlled by r_{1n} and R_{2n}.

Again, given a prior π on Θ, the posterior of η given the data is denoted by

π_n(η) ∝ π(θ(η)) Π_{i=1}^n f(x_i; θ(η)) = π(θ(η)) exp( n x̄'θ(η) - n ψ(θ(η)) ).
In this framework, we also define the local parameters to describe contiguous deviations from the true parameter as

γ = √n(η - η_0) - s,   s = (G'G)^{-1} G' √n(x̄ - μ),

where s is a first order approximation to the normalized maximum likelihood/extremum estimate √n(η̂ - η_0). Again, similar bounds hold for s: E[s] = 0, E[ss'] = (G'G)^{-1}, and ||s|| = O_p(√d). The posterior density of γ over Γ, where Γ = √n(Ψ - η_0) - s, is

f(γ) ∝ π( θ(η_0 + (γ+s)/√n) ) ℓ(γ),

where

ℓ(γ) = exp( n x̄'[ θ(η_0 + (γ+s)/√n) - θ(η_0) ] + n[ ψ(θ(η_0)) - ψ(θ(η_0 + (γ+s)/√n)) ] ).   (5.22)

The condition on the prior is the following:

NE4. The prior π(η) ∝ π(θ(η)), where π(θ) satisfies condition E3.

Theorem 7. Conditions E2-E4 and NE1-NE4 imply conditions C1-C3.

Proof. See Appendix A. □
As before, Theorems 1 and 7 prove the asymptotic normality of the posterior:

∫_Γ | f(γ) - φ(γ) | dγ = ∫_K | f(γ) - φ(γ) | dγ + o_p(1) = o_p(1),

where

φ(γ) = 1 / [ (2π)^{d_1/2} det((G'G)^{-1})^{1/2} ] · exp( -γ'(G'G)γ/2 ).

Theorem 3 implies further that the main results of the paper on polynomial time sampling and integration apply to this curved exponential family.
6. Conclusion

This paper studies the computational complexity of Bayesian and quasi-Bayesian estimation in large samples carried out using a basic Metropolis random walk. Our framework permits the parameter dimension of the problem to grow to infinity and allows the underlying log-likelihood or extremum criterion function to be discontinuous and/or non-concave. We establish polynomial complexity by exploiting a central limit theorem which provides structural restrictions for the problem, i.e., the posterior or quasi-posterior density approaches a normal density in large samples.

The analysis of this paper focused on a basic random walk. Although it is not the most sophisticated algorithm available, it is widely used in practice for its simplicity. Thus, in principle, further improvements could be obtained by considering different kinds of random walks (or variance reduction schemes). As mentioned before, essentially only one lemma of our analysis relies on the particular choice of the random walk. This suggests that most of the analysis is applicable to a variety of different implementations.
Appendix A. Proofs

Proof of Theorem 1. From C1 it follows that

∫_Λ |f(λ) - φ(λ)| dλ = ∫_K |f(λ) - φ(λ)| dλ + ∫_{Λ\K} ( f(λ) + φ(λ) ) dλ = ∫_K |f(λ) - φ(λ)| dλ + o_p(1),

where the last equality follows from Assumption C1. Now, denote

C_n = ∫_K ℓ(λ) dλ / [ (2π)^{d/2} det(J^{-1})^{1/2} ],

and write

f(λ)/φ(λ) = C_n^{-1} exp( ln ℓ(λ) + λ'Jλ/2 ).

Combining this expansion with the conditions imposed in C2, for λ ∈ K,

∫_K | f(λ) - φ(λ) | dλ = ∫_K | f(λ)/φ(λ) - 1 | φ(λ) dλ
  ≤ ∫_K | C_n^{-1} exp( ε₁ + ε₂ λ'Jλ/2 ) - 1 | φ(λ) dλ + ∫_K | C_n^{-1} exp( -ε₁ - ε₂ λ'Jλ/2 ) - 1 | φ(λ) dλ
  ≤ 2 | C_n^{-1} e^{o_p(1)} - 1 |,

since by C3, ε₁ = o_p(1) and ε₂ λ'Jλ ≤ ε₂ ||K||²_J = o_p(1) uniformly on K. The proof then follows by showing that C_n → 1.

We have that R = ||K|| = O(√d), and by the sandwich implied by C2,

∫_{||λ|| ≤ R} e^{-ε₁ - (1+ε₂) λ'Jλ/2} dλ ≤ ∫_K ℓ(λ) dλ ≤ ∫_{||λ|| ≤ R} e^{ε₁ - (1-ε₂) λ'Jλ/2} dλ.

Since eventually ε₂ ≤ 1/2, we can define W ~ N(0, (1+ε₂)^{-1} J^{-1}) and V ~ N(0, J^{-1}) and rewrite the lower bound as

C_n ≥ (1 + o(1)) (1 + ε₂)^{-d/2} P( ||W|| ≤ R ) ≥ (1 + o(1)) (1 + ε₂)^{-d/2} P( ||V|| ≤ R ),

where the last inequality follows from P(||W|| ≤ R) ≥ P( ||√(1+ε₂) W|| ≤ R ) = P(||V|| ≤ R). Since ε₂ d = o_p(1) under C1 and C3, (1 + ε₂)^{-d/2} → 1, and P(||V|| ≤ R) → 1 by the standard concentration of measure arguments for Gaussian densities (see Lovasz and Vempala [31]) together with Assumption C1. Likewise, C_n ≤ (1 + o(1)) (1 - ε₂)^{-d/2} e^{ε₁} → 1, since ε₁ → 0. Therefore C_n → 1 and the result follows. □
Proof of Lemma 2. The result follows immediately from equations (2.7)-(2.8). □

Proof of Lemma 3. We prove the lemma by contradiction. Let M := β (t/8) e^{-t²/4} and assume that there exists a partition K = S₁ ∪ S₂ ∪ S₃, with d(S₁, S₂) ≥ t, such that

∫ ( M 1_{S_i}(x) - 1_{S₃}(x) ) f(x) dx > 0   for i = 1, 2.

We will use the Localization Lemma of Kannan, Lovasz, and Simonovits [24] in order to reduce a high-dimensional integral to a low-dimensional integral.

Lemma 6 (Localization Lemma). Let g and h be two lower semi-continuous Lebesgue integrable functions on R^d such that

∫_{R^d} g(x) dx > 0   and   ∫_{R^d} h(x) dx > 0.

Then there exist two points a, b ∈ R^d and a linear function γ : [0, 1] → R_+ such that

∫_0^1 γ^{d-1}(t) g( (1-t)a + tb ) dt > 0   and   ∫_0^1 γ^{d-1}(t) h( (1-t)a + tb ) dt > 0,

where ([a, b], γ) is said to form a needle.

Proof. See Kannan, Lovasz, and Simonovits [24]. □

By the Localization Lemma, there exists a needle (a, b, γ) such that

∫_0^1 γ^{d-1}(u) f( (1-u)a + ub ) ( M 1_{S_i}((1-u)a + ub) - 1_{S₃}((1-u)a + ub) ) du > 0,   for i = 1, 2.

Equivalently, using γ̃(u) := γ(u/||b-a||) and v := (b-a)/||b-a||, this can be rewritten as

M ∫_0^{||b-a||} γ̃^{d-1}(u) f(a + uv) 1_{S_i}(a + uv) du > ∫_0^{||b-a||} γ̃^{d-1}(u) f(a + uv) 1_{S₃}(a + uv) du,   for i = 1, 2.   (A.23)

In order for the left hand side of (A.23) to be positive for i = 1, 2, the line segment [a, b] must contain points in both S₁ and S₂. Since d(S₁, S₂) ≥ t, the set S₃ ∩ [a, b] contains an interval (w, w + t) of length at least t separating the portions of the segment that intersect S₁ and S₂. We will prove that

∫_w^{w+t} γ̃^{d-1}(u) f(a + uv) du ≥ M min_{i=1,2} ∫_{[0,||b-a||] ∩ {u : a+uv ∈ S_i}} γ̃^{d-1}(u) f(a + uv) du,   (A.24)

which contradicts relation (A.23) and proves the lemma.

First, note that

f(a + uv) = e^{-||a + uv||²} m(a + uv) = e^{-u² + t₁ u + t₀} m(a + uv),

where t₁ := -2a'v and t₀ := -||a||². Next, recall that u ↦ m(a + uv) γ̃^{d-1}(u) is a unidimensional log-β-concave function. By Lemma 7 presented in Appendix B, there exists a unidimensional logconcave function m̄ such that

β m̄(u) ≤ m(a + uv) γ̃^{d-1}(u) ≤ m̄(u)   for every u.

Moreover, due to the log-concavity of m̄, there exist numbers s₀ and s₁ such that m̄(u) ≥ s₀ e^{s₁ u} for u ∈ (w, w + t) and m̄(u) ≤ s₀ e^{s₁ u} otherwise. Thus, we can replace m(a + uv) γ̃^{d-1}(u) by β s₀ e^{s₁ u} on the left hand side of (A.24) and by s₀ e^{s₁ u} on the right hand side of (A.24). After defining r₁ = s₁ + t₁, it suffices to show that

β ∫_w^{w+t} e^{-u² + r₁ u + t₀} du ≥ M min{ ∫_0^w e^{-u² + r₁ u + t₀} du, ∫_{w+t}^{||b-a||} e^{-u² + r₁ u + t₀} du }.   (A.25)

Completing the square and canceling the term e^{r₁²/4 + t₀} on both sides, this is equivalent to

β ∫_w^{w+t} e^{-(u - r₁/2)²} du ≥ M min{ ∫_0^w e^{-(u - r₁/2)²} du, ∫_{w+t}^{||b-a||} e^{-(u - r₁/2)²} du }.   (A.26)

Since we want the inequality (A.26) to hold for any w, it is implied by

β ∫_w^{w+t} e^{-u²} du ≥ M ∫_{w+t}^{∞} e^{-u²} du   holding for any w ≥ 0.   (A.27)

This inequality follows from the definition of M by applying Lemma 2.2 in Kannan and Li [23]; for brevity, we will not reproduce that proof. □
Proof of Corollary
/(.t)
The
Lemma
<
(where a~
Define
5.
and
Define the direction
H2 = {x G Bu n By
We
:
let
We
a'^I.
— B^jHSyn K. By
V.
satisfies the
by applying
jqJjji),
covariance matrix
and
Lemma
2.2 in
I
(A. 27)
[23].
For brevity, we
"""du
e
/
Kannan and
=
'^
Lemma
3
Consider the change of variables x
2.
e^'^m{\/2J~^/~x)
result follows
Proof of
Au,v
is
e'""'du,
/
<j
Li
reproduce the proof.
will not
=
min
^
assumption of
Lemma
3 with
Z^
Then,
x coordinates,
D
the radius of
is
in
and d{Si,S2) > t\/Xmin/2.
x coordinates.
K := B{0, R), so that R
.
K;
also let r := 4\/da
q{x\u) denote the normal density function centered at u with
use the following notation:
definition of r,
=
Bu
B{u,r), By
we have that /^ q[x\u)dx
=
=
B{v,r), and
f^ q{x\v)dx
>
1
—
\.
— u)/\\v — u\\. Let Hi = {x £ B^nBy w'{x — u) > Hu — w]]/2},
w'{x — u) < \\v — u\\/2}. Consider the one-step distributions fi-om u
w=
{v
:
have that
‖P_u - P_v‖_TV ≤ 1 - ∫_{A_{u,v}} min{ dP_u, dP_v }
= 1 - ∫_{A_{u,v}} min{ q(x|u) min{ f(x)/f(u), 1 }, q(x|v) min{ f(x)/f(v), 1 } } dx
≤ 1 - β e^{-Lr} ∫_{A_{u,v}} min{ q(x|u), q(x|v) } dx
≤ 1 - β e^{-Lr} ( ∫_{H₁∩K} q(x|u) dx + ∫_{H₂∩K} q(x|v) dx ),

where ‖u - v‖ ≤ σ/8. Next we will bound from below the last sum of integrals for an arbitrary u ∈ K. We first bound the integrals over the possibly larger sets, respectively H₁ and H₂. Let h denote the density function of a univariate random variable distributed as N(0, σ²); it is easy to see that h is the marginal density of q(·|u) along the direction w (up to a translation). Note that B_u ⊂ H₁ ∪ (H₂ - ‖u - v‖w) ∪ H₃, where H₃ := {x : -‖u - v‖/2 ≤ w′(x - u) ≤ ‖v - u‖/2} and the union is disjoint. Armed with these observations, we have
∫_{H₁} q(x|u) dx + ∫_{H₂} q(x|v) dx = ∫_{H₁} q(x|u) dx + ∫_{H₂ - ‖u-v‖w} q(x|u) dx
≥ ∫_{B_u} q(x|u) dx - ∫_{H₃} q(x|u) dx
≥ ∫_{B_u} q(x|u) dx - ∫_{-‖u-v‖/2}^{‖u-v‖/2} h(t) dt
≥ 1 - 1/16 - ‖u - v‖ (1/(√(2π) σ))
≥ 1 - 1/16 - 1/(8√(2π)) ≥ 1 - 1/8,  (A.28)

where we used that ‖u - v‖ ≤ σ/8 by the hypothesis of the lemma.
In order to take the support K into account, we can assume that u, v ∈ ∂K, i.e. ‖u‖ = ‖v‖ = R (otherwise the integrals will be larger). Let z = (v + u)/2 and define the half space H_z = {x : z′x ≤ z′z}, whose boundary passes through u and v (using ‖u‖ = ‖v‖ = R it follows that z′v = z′u = z′z). By the symmetry of the normal density, we have

∫_{H₁∩H_z} q(x|u) dx ≥ (1/2) ∫_{H₁} q(x|u) dx.

Although H₁ ∩ H_z does not lie in K in general, simple arithmetic shows that H₁ ∩ (H_z - (r²/R)(z/‖z‖)) ⊂ K. (Indeed, take y ∈ H₁ ∩ (H_z - (r²/R)(z/‖z‖)). We can write y = s + f (z/‖z‖), where s is orthogonal to z. Since y ∈ H₁ ⊂ B_u ∩ B_v, we have ‖y - z‖ ≤ r and hence ‖s‖ ≤ r, while membership in the shifted half space gives f ≤ ‖z‖ - r²/R ≤ R - r²/R. Therefore ‖y‖ = √(f² + ‖s‖²) ≤ √((R - r²/R)² + r²) ≤ R, so that y ∈ K.) Moreover, the q(·|u)-mass of the slab separating H_z from H_z - (r²/R)(z/‖z‖) is at most (r²/R)(1/(√(2π) σ)), since the slab has width r²/R along the direction z/‖z‖. Consequently,

∫_{H₁∩K} q(x|u) dx ≥ ∫_{H₁ ∩ (H_z - (r²/R)(z/‖z‖))} q(x|u) dx ≥ ∫_{H₁∩H_z} q(x|u) dx - (r²/R)(1/(√(2π) σ)).
Therefore,

∫_{H₁∩K} q(x|u) dx ≥ (1/2) ∫_{H₁} q(x|u) dx - 1/(15√(2π)),

where the last step uses r = 4√d σ and the assumed relation between R, d, and σ. By symmetry, the same inequality holds when u and H₁ are replaced by v and H₂, respectively. Adding these inequalities and using (A.28), we have

∫_{H₁∩K} q(x|u) dx + ∫_{H₂∩K} q(x|v) dx ≥ (1/2)(1 - 1/8) - 2/(15√(2π)) ≥ 3/8.

Thus, we have

‖P_u - P_v‖_TV ≤ 1 - (3/8) β e^{-Lr},

and the result follows since Lr ≤ 1. □

Proof of Lemma 1. We will use the notation introduced in the proof of Lemma 5. Starting from an arbitrary point in K, assume that the random walk makes a proper move. If this is the case, note that the one-step transition density can be bounded, uniformly over θ with Q(θ) > 0 and A ∈ 𝒜, in terms of the normal density with covariance matrix J^{-1}; the first claim then follows by invoking the CLT restrictions. Next we show that the probability p of making a proper move, defined in the proof of Lemma 5, is at least a positive constant. Let u be an arbitrary point in K. We have that

p = ∫_K min{ f(x)/f(u), 1 } q(x|u) dx ≥ β e^{-Lr} ∫_{B_u∩K} q(x|u) dx ≥ β e^{-Lr} ( ∫_{B_u∩H_z} q(x|u) dx - ∫₀^{r²/R} h(t) dt ) ≥ (1/4) β e^{-Lr},

where H_z is the half space through u constructed as in the proof of Lemma 5; since Lr ≤ 1, p is bounded below by a positive constant. □
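The one-step distributions P_u analyzed in the proofs of Lemmas 5 and 1 correspond to a random-walk Metropolis move with Gaussian proposal q(·|u) = N(u, σ²I) restricted to K = B(0, R). The following minimal Python sketch is illustrative and not from the paper: the target log_f, the values of σ and R, and the Monte Carlo check of the probability of a "proper move" (the quantity p bounded in the proof of Lemma 1) are all assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def metropolis_step(u, log_f, sigma, R):
        # propose from N(u, sigma^2 I); proposals that leave K = B(0, R) are rejected
        x = u + sigma * rng.standard_normal(u.shape)
        if np.linalg.norm(x) > R:
            return u, False
        # accept with probability min{ f(x)/f(u), 1 }
        if np.log(rng.uniform()) < min(0.0, log_f(x) - log_f(u)):
            return x, True          # a "proper move"
        return u, False

    # illustrative target and parameters (not the paper's)
    d, sigma, R = 5, 0.1, 4.0
    log_f = lambda lam: -0.5 * lam @ lam
    u = np.zeros(d)
    moves = [metropolis_step(u, log_f, sigma, R)[1] for _ in range(10_000)]
    print("estimated probability of a proper move from u:", np.mean(moves))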
Proof of Theorem 5. Consider the sample mean μ̂_{B,N} = (1/N) Σ_{i=1}^N g(λ^{i,N}), with the sample (λ^{1,N}, λ^{2,N}, ..., λ^{N,N}) produced by one of the schemes (lr, ss, ms) defined below, and with the underlying sequence produced by iterating the chain starting with an initial draw λ⁰ from f₀. Define the density after i steps of the chain starting with f₀ by T^i f₀; thus λ^B has the distribution T^B f₀, and λ^{B+i} has the distribution T^{B+i} f₀.

• for lr, λ^{i,N} = λ^{B+i};

• for ss, λ^{i,N} = λ^{B+iS}, where S is the number of draws that are "skipped", i.e. λ^{i,N} and λ^{i+1,N} are spaced by S steps of the chain;

• for ms, λ^{i,N} are i.i.d. draws from T^B f₀, i.e. each i-th draw is obtained by sampling an initial point from f₀ and iterating the chain B times.

We have that

MSE(μ̂_{B,N}) = E_{T^B f₀} [ MSE(μ̂_{B,N} | λ^B = λ) ]
= E_f [ MSE(μ̂_{B,N} | λ^B = λ) ] + ∫ MSE(μ̂_{B,N} | λ^B = λ) ( T^B f₀(λ)/f(λ) - 1 ) f(λ) dλ
≤ σ²_{g,N} + 2 ḡ² ‖T^B f₀ - f‖_TV,
where σ²_{g,N} is the variance of the sample average under the assumption that the draws λ^{i,N} are distributed exactly according to f. (We also used the fact that ‖T^B f₀ - f‖_TV = (1/2) ∫ | T^B f₀(λ) - f(λ) | dλ and that the conditional mean square error is bounded by ḡ².) The bound on σ²_{g,N} will depend on the particular scheme, as discussed below.

We require that the second term 2 ḡ² ‖T^B f₀ - f‖_TV is smaller than ε/3, which is equivalent to imposing that ‖T^B f₀ - f‖_TV ≤ ε/(6ḡ²). Using Theorem 2, since f₀ is an M-warm start for f, ‖T^B f₀ - f‖_TV ≤ √M e^{-Bc}, where c is the rate given there, so that

B ≥ (1/c) ln( 6 √M ḡ² / ε )

suffices to obtain ‖T^B f₀ - f‖_TV ≤ ε/(6ḡ²). Next, we determine the number of post-burn iterations N_lr, N_ss, or N_ms needed to set the overall mean square error below ε.

To bound N_lr, note that

σ²_{g,N_lr} ≤ (1/N_lr) ( γ₀ + 2 Σ_{k=1}^{N_lr} γ_k ),

where γ₀ is the variance and γ_k is the k-th autocovariance of g(λ^{B+i}) under f. By Corollary 2, the autocovariances decay geometrically, so that Σ_{k=1}^{N_lr} γ_k is bounded by a multiple of γ₀ that does not depend on N_lr; hence σ²_{g,N_lr} ≤ 2ε/3, and therefore MSE(μ̂_{B,N}) < ε, for N_lr chosen as in the statement of the theorem.

To bound N_ss, note that λ^{i,N} and λ^{i+1,N} are spaced by S steps of the chain. By Corollary 2, choosing the spacing S large enough that the lag-one autocovariance γ₁ of the skipped sequence satisfies 2 N_ss γ₁ ≤ γ₀ (a spacing S of order ln(N_ss √M) suffices) and using N_ss as in the statement of the theorem, the mean square error for the ss method can be bounded as

MSE(μ̂_{B,N}) ≤ (1/N_ss) ( γ₀ + 2 N_ss γ₁ ) + 2 ḡ² ‖T^B f₀ - f‖_TV ≤ 2γ₀/N_ss + ε/3 < ε.

To bound N_ms, we observe that the draws λ^{i,N} are i.i.d., so that γ_k = 0 for all k ≥ 1 and

MSE(μ̂_{B,N}) ≤ γ₀/N_ms + ε/3 < ε,

provided that N_ms ≥ 3γ₀/(2ε). □

Proof of Theorem 6. Take K = B(0, ‖K‖). Our condition C1 is satisfied with ε₁ = 0 by the argument given in the proof of Ghosal's Lemma 1, and our condition C2 is satisfied by the argument given in the proof of Ghosal's Lemma 4, with

ε₂ = (cd/n) B₁ₙ(0) + (cd/n) B₂ₙ(c)

for some constant c. Further, our condition C3 is satisfied since ε₂ ‖K‖² → 0 by E3 and E4. □
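The three sampling schemes (lr, ss, ms) used in the proof of Theorem 5 above can be summarized schematically as follows. This is an illustrative Python sketch, not the paper's code: step stands for any one-step Markov kernel returning (state, proper-move flag), for instance the metropolis_step sketch above; draw_x0 stands for a draw from the warm start f₀; and B, N, S are the burn-in length, sample size, and spacing.

    def run_chain(x, step, n_steps):
        # advance the chain n_steps steps and return the final state
        for _ in range(n_steps):
            x, _ = step(x)
        return x

    def sample_lr(x0, step, B, N):
        # lr: burn in B steps, then record the next N consecutive states
        x, out = run_chain(x0, step, B), []
        for _ in range(N):
            x = run_chain(x, step, 1)
            out.append(x)
        return out

    def sample_ss(x0, step, B, N, S):
        # ss: burn in B steps, then record every S-th state ("skipping" S steps between draws)
        x, out = run_chain(x0, step, B), []
        for _ in range(N):
            x = run_chain(x, step, S)
            out.append(x)
        return out

    def sample_ms(draw_x0, step, B, N):
        # ms: N independent restarts, each burned in for B steps (i.i.d. draws from T^B f0)
        return [run_chain(draw_x0(), step, B) for _ in range(N)]

The sample average (1/N) Σ g(λ^{i,N}) computed over the output of any of these schemes is the estimator μ̂_{B,N} whose mean square error is bounded in the proof.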
Comment A.1. Ghosal [17] proves his results for the set K′ = B(0, C√d), due to the concentration of the normal measure under d → ∞ asymptotics. His arguments actually go through for the set K = B(0, ‖K‖), with C sufficiently large independent of d (see [3] for details).

Comment A.2. For further details and discussion, see [3].

Proof of Theorem 7. Take K = B(0, C√(d log d)) for some C sufficiently large. Then condition C1 is satisfied with ε₁ = O_p(δ₁ₙ) by NE3, and condition C2 is satisfied with ε₂ = O_p( (d/n) B₁ₙ(0) + (d/n) B₂ₙ(C) ) by the argument given in the proof of Ghosal's Lemma 4 and NE3. Further, condition C3 is satisfied since ε₂ ‖K‖² → 0 by E3, E4, NE3, and NE4. □
Appendix B. Bounding log-β-concave functions

Lemma 7. Let f : ℝ → ℝ be a unidimensional log-β-concave function. Then there exists a logconcave function g : ℝ → ℝ such that

β g(x) ≤ f(x) ≤ g(x)  for every x ∈ ℝ.

Proof. Consider h(x) = ln f(x), a (ln β)-concave function. Recall that the epigraph of a function w is defined as epi_w = {(x, t) : t ≤ w(x)}. Let m be the smallest concave function greater than h(x) for every x, that is,

m(x) := sup { Σ_{i=1}^k λ_i h(y_i) : k ∈ ℕ, λ ∈ ℝ^k, λ_i ≥ 0, Σ_{i=1}^k λ_i = 1, Σ_{i=1}^k λ_i y_i = x }.

Using our definitions, we have that epi_m = conv(epi_h) (the convex hull of epi_h), where both sets are in ℝ². In fact, the values of m are defined only by points in the boundary of conv(epi_h). Consider (x, m(x)) ∈ epi_m; since the epigraph is convex and this point is on the boundary, there exists a supporting hyperplane H on (x, m(x)). Moreover, (x, m(x)) ∈ conv(epi_h ∩ H). Since h is one dimensional, (x, m(x)) can be written as a convex combination of at most 2 points of epi_h. Furthermore, by definition of log-β-concavity, we have that

ln(1/β) ≥ sup_{λ ∈ [0,1], y, z} { λ h(y) + (1 - λ) h(z) - h(λy + (1 - λ)z) }.

Thus, h(x) ≤ m(x) ≤ h(x) + ln(1/β). Exponentiating gives f(x) ≤ g(x) ≤ (1/β) f(x), where g(x) = e^{m(x)} is a logconcave function. □
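Lemma 7 can also be checked numerically. The Python sketch below is illustrative and not from the paper: it assumes NumPy, a grid approximation of the concave majorant, and the toy choice f(x) = e^{-x²}(1 + 0.5 cos 5x), which is log-β-concave with β = 1/3 since the oscillating factor ranges over [0.5, 1.5]. It builds g = e^m, where m is the least concave majorant of ln f on the grid, and verifies the sandwich β g ≤ f ≤ g.

    import numpy as np

    def upper_concave_envelope(x, h):
        # least concave majorant of the points (x[i], h[i]); x must be sorted increasingly
        hull = [0]
        for i in range(1, len(x)):
            while len(hull) >= 2:
                a, b = hull[-2], hull[-1]
                # drop b if it lies on or below the segment joining points a and i
                if (x[b] - x[a]) * (h[i] - h[a]) - (h[b] - h[a]) * (x[i] - x[a]) >= 0:
                    hull.pop()
                else:
                    break
            hull.append(i)
        return np.interp(x, x[hull], h[hull])

    beta = 1.0 / 3.0
    x = np.linspace(-3.0, 3.0, 2001)
    h = -x**2 + np.log(1.0 + 0.5 * np.cos(5.0 * x))   # h = ln f
    f = np.exp(h)
    g = np.exp(upper_concave_envelope(x, h))           # logconcave majorant exp(m)
    assert np.all(f <= g + 1e-9)                       # f <= g
    assert np.all(beta * g <= f + 1e-9)                # beta * g <= f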
References

[1] D. Applegate and R. Kannan, Sampling and Integration of Near Logconcave Functions, Proceedings 23th ACM STOC, 156-163, 1993.

[2] O. Barndorff-Nielsen, Information and Exponential Families in Statistical Theory, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Ltd., Chichester, 1978.

[3] A. Belloni and V. Chernozhukov, On the Asymptotic Normality of Posterior Distributions of Curved Exponential Families under Increasing Dimensions, IBM Research Report, http://web.mit.edu/belloni/www/curvexp.pdf.

[4] P. J. Bickel and J. A. Yahav, Some contributions to the asymptotic theory of Bayes solutions, Z. Wahrsch. Verw. Geb. 11 (1969), 257-276.

[5] O. Bunke and X. Milhaud, Asymptotic behavior of Bayes estimates under possibly incorrect models, The Annals of Statistics 26 (1998), no. 2, 617-644.

[6] G. Casella and C. P. Robert, Monte Carlo Statistical Methods, Springer Texts in Statistics, 1999.

[7] G. Chamberlain, Asymptotic efficiency in estimation with conditional moment restrictions, Journal of Econometrics 34 (1987), no. 3, 305-334.

[8] V. Chernozhukov and H. Hong, An MCMC approach to classical estimation, Journal of Econometrics 115 (2003), 293-346.

[9] S. Chib, Markov Chain Monte Carlo Methods: Computation and Inference, Handbook of Econometrics, Volume 5, ed. by J. J. Heckman and E. Leamer, Elsevier Science, 2001.

[10] S. G. Donald, G. W. Imbens, and W. K. Newey, Empirical likelihood estimation and consistent tests with conditional moment restrictions, Journal of Econometrics 117 (2003), no. 1, 55-93.

[11] R. M. Dudley, Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics, 2000.

[12] B. Efron, The geometry of exponential families, Annals of Statistics 6 (1978), no. 2, 362-376.

[13] G. S. Fishman, Choosing sample path length and number of sample paths when starting at steady state, Operations Research Letters, Vol. 16, No. 4, November 1994, pp. 209-220.

[14] A. Frieze, R. Kannan, and N. Polson, Sampling from log-concave functions, Annals of Applied Probability, 1994, pp. 812-834.

[15] J. Geweke and M. Keane, Computationally Intensive Methods for Integration in Econometrics, Handbook of Econometrics, Volume 5, ed. by J. J. Heckman and E. Leamer, Elsevier Science, 2001.

[16] C. Geyer, Practical Markov Chain Monte Carlo, Statistical Science, 1992, Vol. 7, No. 4, 473-511.

[17] S. Ghosal, Asymptotic normality of posterior distributions for exponential families when the number of parameters tends to infinity, Journal of Multivariate Analysis, vol. 73, 2000, 49-68.

[18] L. P. Hansen and K. J. Singleton, Generalized instrumental variables estimation of nonlinear rational expectations models, Econometrica 50 (1982), no. 5, 1269-1286.

[19] I. Ibragimov and R. Has'minskii, Statistical Estimation: Asymptotic Theory, Springer, Berlin, 1981.

[20] G. W. Imbens, One-step estimators for over-identified generalized method of moments models, Rev. Econom. Stud. 64 (1997), no. 3, 359-383.

[21] M. Jerrum and A. Sinclair, Conductance and the rapid mixing property for Markov chains: the approximation of the permanent resolved, Proceedings of the 20th Annual ACM Symposium on Theory of Computing (1988), 235-244.

[22] M. Jerrum and A. Sinclair, Approximating the permanent, SIAM Journal on Computing, 18:6 (1989), 1149-1178.

[23] R. Kannan and G. Li, Sampling according to the multivariate normal density, 37th Annual Symposium on Foundations of Computer Science (FOCS '96), pp. 204.

[24] R. Kannan, L. Lovász, and M. Simonovits, Isoperimetric Problems for Convex Bodies and a Localization Lemma, Discr. Comput. Geom., volume 13 (1995), pp. 541-559.

[25] R. Kannan, L. Lovász, and M. Simonovits, Random walks and an O*(n^5) volume algorithm for convex bodies, Random Structures and Algorithms, volume 11 (1997), pp. 1-50.

[26] C. Kipnis and S. R. S. Varadhan, Central limit theorem for additive functionals of reversible processes and applications to simple exclusions, Comm. Math. Phys., 104, 1-19.

[27] E. L. Lehmann and G. Casella, Theory of Point Estimation, Second edition, Springer Texts in Statistics, Springer-Verlag, New York, 1998.

[28] J. S. Liu, L. Tian, and L. J. Wei, Implementation of estimating-function based inference procedures with MCMC samplers, Journal of American Statistical Association, to appear.

[29] J. S. Liu, Monte Carlo Strategies in Scientific Computing, Springer-Verlag, New York, 2001.

[30] L. Lovász and M. Simonovits, Random Walks in Convex Bodies and an Improved Volume Algorithm, Random Structures and Algorithms, 4:4 (1993), 359-412.

[31] L. Lovász and S. Vempala, The Geometry of Logconcave Functions and Sampling Algorithms, to appear in Random Structures and Algorithms.

[32] L. Lovász and S. Vempala, Hit-and-Run is Fast and Fun, Technical report MSR-TR-2003-05, 2003. Available at http://www-math.mit.edu/~vempala/papers/logcon-hitrun.ps.

[33] L. Lovász and S. Vempala, Hit-and-Run from a Corner, Proc. of the 36th ACM Symp. on the Theory of Computation (STOC'04) (2004), pp. 310-314. Available at http://www-math.mit.edu/~vempala/papers/start.ps.

[34] N. Polson, Convergence of Markov Chain Monte Carlo Algorithms, Bayesian Statistics 5, pp. 297-321 (1996).

[35] S. Portnoy, Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity, Ann. Statist. 16 (1988), no. 1, 356-366.

[36] G. O. Roberts and R. L. Tweedie, Exponential Convergence of Langevin Diffusions and Their Discrete Approximations, Bernoulli, 2 (1996), pp. 341-364.

[37] C. J. Stone, M. H. Hansen, C. Kooperberg, and Y. K. Truong, Polynomial splines and their tensor products in extended linear modeling (with discussion and a rejoinder by the authors and Jianhua Z. Huang), Annals of Statistics, 25 (1997), no. 4, 1371-1470.

[38] A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes, Springer Series in Statistics, 1996.

[39] S. Vempala, Geometric Random Walks: A Survey, Combinatorial and Computational Geometry, MSRI Publications, Volume 52, 2005.