Digitized by tine Internet Arciiive in 2011 with funding from Boston Library Consortium IVIember Libraries littp://www.archive.org/details/oncomputationalcOObell fimEY HB31 .M415 Massachusetts Institute of Technology Department of Economics Working Paper Series ON THE COMPUTATIONAL COMPLEXITY OF MCMC-based ESTIMATORS IN LARGE SAMPLES Alexandre Belloni Victor Chernozhukov Working Paper 07-08 February 23, 2007 Room E52-251 50 Memorial Drive Cambridge, MA 021 42 This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://ssrn.com/abstract=966345 On the Computational Complexity of MCMC-based Estimators in Large Samples Alexandre Belloni' and Victor Chernozhukov^ Revised February 23, 2007. Abstract This paper studies the computational complexity of Bayesian and quasi-Bayesian mation in large covers cases where the underlying likelihood or extremum concave, discontinuous, and of increasing dimension. provide structural restrictions for the problem, tionally efficient. samples is particular is Specifically, it bounded is esti- samples carried out using a basic Metropolis random walk. The framework is in probability it is criterion function possibly non- shown that the algorithm shown that the running time by a polynomial is Using a central limit framework to in the is parameter dimension of stochastic order d" in the leading cases after the burn-in period. that, in large samples, a central limit computa- of the algorithm in large d, and in The reason theorem implies that the posterior or quasi-posterior approaches a normal density, which restricts the deviations from continuity and concavity in a specific manner, so that the computational complexity to exponential and curved exponential Key Words: polynomial. is An appfication given. Computational Complexity, Metropolis, Large Samples, Sampling, gration, Exponential family, HBM is families of increasing dimension Moment Inte- restrictions TJ Watson Research Center and MIT Operations Research Center. Email: belloni@mit.eclu http://web.mit.edu/belloni/www Research support from a National Science Foundation grant . is Url; gratefully acknowledged. ^Department of Economics and http://web.mit.edu/vchern/www fully . MIT Operations Research Center. Email: vchern@mit.edu Research support from a National Science Foundation grant acknowledged. 1 is Url: grate- 1. Introduction Markov Chain Monte Carlo (MCMC) algorithms have dramatically increased the use and quasi-Bayesian methods of Bayesian methods rely and for practical estimation inference.'^ Bayesian on a likelihood formulation, while queisi-Bayesian methods replace likelihood by other criterion functions. This paper studies the computational complexity of a basic MCMC algorithm as both the sample and parameter dimension grow to infinity at appro- The paper shows how and when priate rates. the large sample asymptotics places sufficient on the likelihood and criterion functions that guarantee the restrictions polj'nomial time - computational complexity of these algorithms. efficient - that These is, results suggest that at least in large samples, Bayesian and Quasi-Bayesian estimators can be computationally efficient alternatives to in cases in where hkelihoods and parameters of maximum all and possibly non-smooth interest. To motivate our analysis, consider the M-estimation problem, which The of estimating various kinds of regression models. maximize some extremum estimators, most of likelihood and criterion functions are non-concave is a common method idea behind this approach to is criterion function: n QnW = -^m(r, -(7,:(X^,e)), eeecM'', (1.1) )=1 where In Yi is the response variable, X, many examples, estimator may be problem is problem is the problem difficult or is is ?n(-) to . dependence on .). This problem fits in for instance, ''See e.g. management a major be the asymmetric absolute deviation function'' all f{Xi, qi-i,qi-2, = {a — l{u < 0))u. past data and accurately capture •••; d) = books of Casella and Robert methods and and the M-estimation framework GARCH-like dependencies, leading research in this area^ considers recursive models of the form ^See a regression function. impossible to obtain. For instance, in risk m{u) MCMC is to predict the a-quantile of a portfolio's return Yi tomorrow, given today's by taking function reflect g,; that of constructing the estimates of Conditional Value-at-Risk. In particular, the past available information [Xi, Xi-i, To a vector of regressors, and nonlinear and non-concave, implying that the argmax X'^j [6], qi = f{Xi, qi^i,qi^2, ',d) + piqi-i + p2qi-2- This implies a highly non-linear, Chib [9], Geweke [15], Liu [29] for detailed treatments of the their apphcations in various areas of statistics, econometrics, and biometrics. Koenker and Bassett (1978). ^E.g. Engle and Maganelli (2005). . , .. , '. :., ,..•, • recursive specification for the regression function qi{-; 9), terion function used in M-estimation defined in (1.1) in this example, the function Qn{S) is e may be very hard to obtain. Figure 1 is which in turn implies that the cri- generally non-concave. Furthermore, non-smooth. As a consequence the argmax estimator & &vgmaxQn{e) (1.2) in Section 2 illustrates other kinds of examples where the argmax computation becomes intractable. As an argmax estimation, consider the Quasi-Bayesian estimator obtained alternative to by integration in place of optimization: 6'exp{Q„( (1-3) eMQn[0')]de' may be This estimator TT-niO) DC recognized as a quasi-posterior (Of course, when expQ„(6'). redundant.) This estimator is often much easier to will show that mean of the quasi-posterior density a log-likelihood, the term "quasi" becomes is not affected by local discontinuities and non-concavities and is compute if than the argmax estimator; in practice Hong the discussion in Chernozhukov and This paper Qn [8] and Liu, Tian, and Wei see, for example, [28]. the sample size n grows to infinity and the dimension of the problem d does not grow too quickly sample relative to the size, the quasi-posterior exp{Q„(g)} ^^'^^ L/e will be approximately normal. This result (1.3) is in turn leads to the main claim: the estimator can be computed using Markov Chain Monte Carlo the starting point As e-XY>{Qn{0')]de' standard is in drawn from the approximate support in tlie literature, we measure running time polynomial time, provided of the quasi-posterior (1.4). in the irumber of evaluations of the numerator of the quasi-posterior function (1.4) since this accounts for most of the computational burden. In other words, above estimator is when the central hmit theorem (CLT) for the quasi-posterior holds, the computationally tractable. The reason is that the CLT, in addition to implying the approximate normahty and attractive estimation properties of the estimator 0, bounds non-concavities and discontinuities that the computational time the is bound on the running time polynomial of Qn{d) in the of the algorithm in a specific manner that imphes parameter dimension is d. In particular, Op{d^) in the leading cases after the so-called burn-in period. Thus, our CLT main insight is to bring the structure implied by the MCMC into the computational complexity analysis of the and sampling from of (1.3) Our algorithm for computation (1.4). analysis of computational complexity builds on several fundamental papers studying the computational complexity of Metropolis procedures, especially Applegate and [1], Kannan and Poison Pi'ieze, nan and Li [23], of our results oped [14], Poison Lovasz and Simonovits and proofs We in these works. rely [34], [30], Kannan, Lovasz and Simonovits and Lovasz and Vempala upon and extend the mathematical Kannan [24], [31, 32, 33]. Kan- Many tools previously devel- extend the complexity analysis of the previous literature, which has focused on the case of an arbitrary concave log-likelihood function, to nonconcave and nonsmooth tings it is cases. The motivation typically easier to is that from a statistical point of view, in concave set- compute a maximum extremum estimate than a likelihood or Bayesian or quasi-Bayesian estimate, so the latter do not necessarily have practical appeal. In contrast, when the log-likelihood or quasi-Hkelihood is either nonsmooth, nonconcave, or both, Bayesian and quasi-Bayesian estimates defined by integration are relatively attractive computationally, compared to maximum likelihood or extremum estimators defined by optimization. Our analysis also relies on statistical large sample theory. posteriors and quasi-posteriors for large samples as n — > oo. We invoke limit theorems for These theorems are necessary to support our principal task - the analysis of computational complexity under the restrictions of the CLT. As a preliminary for quasi-posteriors literature for posteriors 1809, Bickel and Yahav and quasi-posteriors [4], provided CLTs theorems Wei provided [28] step of our computational analysis, and posteriors which generalizes the CLTs for fixed Ibragimov and Hasminskii for posteriors. dimension. In particular, Laplace [19], c. and Bunke and Milhaud (1998) [8] and Liu, Tian, and formed using various non-likehhood criterion we allow for increasing dimensions. Ghosal CLT for posteriors with increasing dimension, but only for families. We go beyond such canonical setup and establish the CLT discontinuous cases. We also allow for general criterion functions in also previously derived a concave exponential for we obtain a new CLT previously derived in the Chernozhukov and Hong for quasi-posteriors functions. In contrast to these previous results, [17] CLT non-concave and place of hkelihood functions. The paper also illustrates the plausibility of the using exponential and curved exponential families. when the data must satisfy additional moment The curved families arise for restrictions, as e.g. in approach example Hansen and Singleton Chamberlain [18], and Imbens [7], [20]. The curved famihes fall outside the log-concave framework. The rest of the paper organized as follows. is Theorem version of the Central Limit may be result that 2, we establish a generahzed seen as a generahzation of the classical Bernstein- Von-Mises theorem, in allows the parameter dimension to grow as the sample size grows, it n — as In Section Bayesian and Quasi-Bayesian estimators. This for > DO. In Section the complexity of 2, MCMC we also formulate the main problem, which is d i.e. — > oo to characterize sampling and integration as a function of the key parameters that describe the deviations of the quasi-posterior from the normal density. Section 3 explores the structure set forth in Section 2 to find bounds on conductance and mixing time MCMC of the algorithm. Section 4 derives bounds on the integration time of the MCMC algorithm. Section 5 considers an application to a broad class of curve.d exponential famihes, which are possibly non-concave and discontinuous, and We verify this class of statistical models. verifies that our results apply to that high-level conditions of Section 2 follow from primitive conditions for these models. The Setup and The Problem 2. Our We analysis is motivated by the problems of estimation and inference in large samples. consider a "reduced-form" setup formulated in terms of parameters that characterize local deviations from the true statistical parameter. ® The contiguous deviations fi-om the true parameter and we shift of the extremum estimator and — it), s ^/n{9 — we have the 6o) the local That s. is, The parameter space - parameter A describes first order approximation denoting a parameter vector, 9o the true value, normalized extremum estimator (or a first order approximation to for 9 is = ^i{9 - 9o) - s. 0, and the parameter space for A is therefore A = \/n(9 — s. The corresponding localized likehhood (or localized criterion) function For example, suppose Ln{9) or, local by a parameter A defined as A ^o) for 9 it more generally, Ln{9) Examples set-ups. in is is is denoted by ^(A). the original Hkelihood function in the hkelihood framework exp{nQ„(6)} where Qn{9) is the criterion function in extremum Section 5 further illustrate the connection between the localized set-up and the non-localized framework, then = e{X) The assumptions below + L„(0o (A + s)/M/Lnm- be stated directly will in terms of ^(A). (Section 5 provides more primitive conditions within the exponential and curved exponential family framework.) Then, the posterior or quasi-posterior density for A takes the form (impUcitly indexed by the sample size n) /(A) and we impose conditions that = (2.5) ryrhrr force the posterior to satisfy a CLT in the sense of approach- ing the normal density = (j){X) — exp ^ More n — > formally, the following conditions are These conditions, which 00. explicitly allow for CI. The s is n local assumed to hold mean fj^ local A = U Jf fiX)dX > where K'=, 1 - Op(l) is' is and £2 such that is := oo): a closed ball 5(0, > I - size conditions," A C ||fs:||) M'^. The vector bounded above with \\K\\ as = CVd o{l)J i.e., there exist positive approximation errors every X G K, for ei 4- £2 A'JA/2, • (2.6) a symmetric positive definite matrix with eigenvalues bounded away from zero and IK'lli > parameter space A e Jj. 4>{X)dX \n^{X)-l^-h'JX^ < where J "CLT or quasi-posterior function i{X) approaches a quadratic form in logs, uniformly in K, and — sample for ^(A) as the will call the vector with variance 0, whose eigenvalues are C2. The lower semi-continuous posterior ei we an increasing parameter dimension d (d -> oo, and such that 2 V in the following parameter A belongs to the a zero -^A'JA I (27r)'i/2det(J-i)^/2 fi'om above. Also, we denote the ellipsoidal norm induced by J as WJ'^M- C3. The approximation errors ei and £2 satisfy £1 = Op{l), and £2 • |lA'||j = Op(l). These conditions imply that i{X)=g{X)-m{X) ''Note that \\K\\ := sup{|ja|| concentration of measure under d ; a G K}. — > The constant 00 asymptotics. C need not grow due to the phenomenon of r?0^. -^K Figure 1. This figure illustrates how ln<'(A) can deviate from ln(;((A) in- cluding possible discontinuities on In^(A). over the approximate support set K where ln5(A) -ei Figure 1 - e2A'JA/2 = --A'JA, < Inm(A) < illustrates the kinds of deviations of lnl{X) the parameters ei and €-2, and es controls + e2A'JA/2. (2.8) from the quadratic curve captured by shows the types of discontinuities and non-convexities Parameter permitted in our framework. parameter also ei (2.7) ei controls the size of local discontinuities and the global tilting away from the cjuadratic shape of the normal log- density. Theorem 1. [Generalized CLT for Quasi-Posteriors] Under the conditions stated above, the density of interest £(A) (2.9) approaches a normal density I 1/(A) - (p{X) 0(A)|dA JA Proof. See Comment tial for with variance matrix = / JK 1/(A) - cj>{X)\d\ J in the following sense: + Op(l) = Op(l). (2.10) D Appendix A. 2.1. Theorem 1 is a simple preliminary result. defining the environment in which main However, the result results of this is essen- paper - the computational complexity results - will be developed. The theorem shows that some regularity conditions hold, Bayesian ple properties. The main CLT implications of the part of the paper, namely Section samples, provided 3, develops the computational In particular, Section 3 shows that polynomial time conditions. MCMC computing of Bayesian and Quasi-Bayesian estimators by CLT in large and Quasi-Bayesian inference has good large sam- is in fact implied by the conditions. Comment 2.2. By allowing increasing dimension (d -^ cx)) Theorem 1 extends the CLT previously derived in the hterature for posteriors in the likelihood framework (Bickel and Yahav [4], Ibragimov and Hasminskii quasi-posteriors in the general Bunker and Milhadu [19], general criterion functions (Chernozhukov and theorem is [5], Ghosal extremum framework, when the likehhood more general than the Hong Ghosal results in [8], [17], Liu, Tian, who [17]) and for replaced by is and Wei [28]). The also considered increasing dimensions but limited his analysis to the exponential likelihood family framework. contrast, Theorem of posteriors. 1 In allows for non-exponential families and allows quasi-posteriors in place Recall that quasi-posteriors result from using quasi-hkehhoods and other This expands substantially the scope of the criterion functions in place of the likelihood. applications of the result. discontinuities in the apphcations listed The Problem Importantly, Ukehhood and Theorem 1 allows for non-smoothness and even criterion functions, which are pertinent in a number of in the introduction. of the Paper. Our problem is to characterize the complexity of obtaining draws from /(A) and of Monte Carlo integration J g{X)f{X)dX, where /(A) is restricted to the approximate support K. The procedure used to obtain the basic draws as well as to carry out standard I. Monte Carlo MCMC integration a Metropolis (Gaussian) random walk, which is algorithm used in practice. The is a tasks are thus: Characterize the complexity of sampling from /(A) as a function of [d, n, ei, £2, K); K] II. Characterize the complexity of calculating J g{X)f{X)dX as a function of III. Characterize the complexity of sampling from /(A) and performing integrations with /(A) in large samples as d, ?z — » oo by invoking the bounds on {d, n, £i, £2, (d, n, ei, e2. A') imposed by the CLT; IV. Verify that the CLT conditions are apphcable in a variety of statistical problems. This paper formulates and answers this problem. strictions into the complexity analysis Thus, the paper brings the and develops complexity bounds integrating from /(A) under these restrictions. These CLT for CLT re- sampling and restrictions, arising by using large sample theory and imposing certain regularity conditions, limit the behavior of /(A) over the approximate support set nomial computing time K do not provide strong restrictions on the our analysis of complexity manner that in a specific allows us to establish poly- sampling and integration. Because the conditions for is tail behavior of /(A) outside K CLT for the other than Cl, limited entirely to the approximate support set K defined in C1-C3. Comment 2.3. By solving the above problem, this paper contributes to the recent liter- ature on the computational complexity of Metropolis procedures. Early work was primary concerned with the question of approximating the volume of high dimensional convex where uniform densities play a fundamental Lovasz and Simonovits the log-likelihood Vempala is [24, 25]). concave [31, 32, 33]). role (Lovasz and Simonovits set's Kannan, [30], Later the approach was generalized for the cases where (Fi'ieze, Kannan and Poison [14], However, under log-concavity the Poison maximum and Lovasz and [34], hkelihood estimators are usually preferred over Bayesian or quasi-Bayesian estimator from a computational point of view. In the absence of concavity, exactly the settings where there is a great practical appeal for using Bayesian and quasi-Bayesian estimates, there has been relatively any, analj'sis. One important exception smooth covers nearly-concave but the paper of Applegate and Kannan less, if [1], densities using a discrete Metropolis algorithm. Kannan contrast to Applegate and is [1], which ® In our approach allows for both discontinuous and non-concave densities that are permitted to deviate fi^om the normal density (not from an arbitrary log-concave density, like in Applegate and Kannan manner motivated by the in which they deviate from the normal by parameters ti and e2, which are is in turn restricted restrictions provide a congenial analytical [1]) by the in a specific CLT CLT conditions. framework that allows us to discrete sampling algorithm frequently used in practice. In fact, manner. The and controlled it is The CLT treat a basic non- known that the basic Metropolis walk analyzed here does not have good complexity properties (rapidlj' mixing) for arbitrary log-concave density functions, in opposition to other and-run^. Nonetheless, the CLT in fact rapidly mixing. Moreover, only Subsection 3.2 walk while all CLT sophisticated schemes. As The is more general setting. approach can be used to establish polynomial bounds standard like hit- random walk is depends on the particular form of the the remaining results are valid in a considerably suggests that the same random walks conditions imply enough structure that the in the literature, we assume that the for This more starting point discrete Metropolis algorithm facilitates the analysis, but they are are not frequently used in practice. See Lovasz and Vempala [33] for a discussion. of the algorithm in the is CLT Indeed, the polynomial approximate support of the posterior. time bound that we derive applies only because this in this case provides enough structure on the problem. Our is the domain where the analysis does not apply outside this domain. Sampling from / using a Random Walk 3. 3.1. Set-Up and Main Result. obtaining a draw from a random In this section By invoking Assumption CI, the approximate support set K measurable density function / restricted to K denoted by Q, i.e., we results to is to will be defined over Q{A) = J^ f{x)dx/ Jj^ f{x)dx for all used random walk induced by a Gaussian distribution. A e A). Asymptotically, will Casella and Robert [6] = and Vempala cr > is completely and 0, its one- procedure of drawing a point y according to a imn{£{y)/i{u), 1} move to [39] commonly concentrate on a Such random walk Gaussian distribution centered on the current point u with covariance matrix with probability mm{f{y)/f{u), 1} it is are capable of performing filter characterized by an initial point uq and a fixed standard deviation latter is defined as the to a K (this density induces a probability distribution such task. In order to prove our complexity bounds, we The this set. draw a random variable according well-known that random walks combined with a Metropolis step move. study the associated our attention entirely to restrict and the accuracy of sampling Consider a measurable space {K,A). Our task on the computational complexity of upon these function / as defined in (2.5). (Section 4 builds integration problem.) we bound variable approximately distributed according to a density for details). y; cr"/ and then, otherwise stay at u (see In the complexity analysis of this algorithm we are interested in bounding the number of steps of the random walk required to draw a random variable from / with a given in bounding the number of evaluations of the precision. Equivalently, we are interested local Ukelihood function ( required for this purpose. Next we review definitions of important concepts relevant for our analysis. tions of these concepts follow Lovasz and Simonovits [30] and Vem.pala [39]. The defini- Let q{x\u) denote the density function associated with a random variable N{u, a"!), and luiA) be the indicator function of the set A. For each u £ distribution after one step of the Pu{A)= I K the one-step distribution P^, the probability random walk starting fi'om u, is defined as m:iiA^4r\A\q{x\u)dx-V0lu{A) (.3.11) where e mm\^,l\ q{x\u)dx =1- f [fW Jk is (3.12) J the probabihty of staying at u after one step of the ball walk from u. random walk said to be proper is if the next point is different A step of the from the current point (which happens with probabihty 1—9). The (K,A, {Pu triple u G K}), along with a starting distribution Qo, defines a Markov We denote by Qt random walk. A distribution Q the probability distribution obtained after chain in K. is called stationary Pu{A)dQ[u) = on {K,A) for if A any t steps of the G A, Q{A). (3.13) Given the random walk described earher, the unique stationary probabihty distribution is induced by the function /, Q{A) and Roberts [6] literature since As mentioned This . is = /^ f{x)dx/ f{x)dx for all before, our goal is method A £ A, see MCMC the main motivation for most of the provides an asymptotic it Jj^ Q Casella e.g. studies found in the to approximate the density of interest. to properly quantify this convergence and for that we need to review additional concepts. The ergodic flow of a set A Q with respect to a distribution ^{A) is defined as = / Pu{K\A)dQ{u) JA It measures the probabihty of the event {u G A,u' ^ A) where u to Q and u' is average flow of points leaving A in and only if ^{A) stationary measure if Pu[K $(A) \ A)dQ{u) one step of the random walk. = = / JA $(A'\A) (1 is distributed according chain is said to be ergodic if /. aU A e A u\ it captures the follows that Q is a since A ^[A) > the case for the A-Iarkov chain induced by the assumptions on for It - Pu{A)) dQ{u) = Q{A) f Pu{A)dQ[i J Pu{A)dQ{u) - f Pu{A)dQ{u) A Markov is obtained after one step of the random walk starting from = ^[K \ A). for every A with < Q{A) < random walk described earlier 1, which due to the P In order to compare two probability distributions Q and we use the total variation distance^° IIP P Moreover, said to be a is - QWtv = M-warm in the analysis are the ^^ , .,,, = mm (p{A) = if ^<M. (3.15) conductance of a set A, which HA) = . 0<QiA)<l/2Q{A) Lovasz and Simonovits Q defined as is as mm . A (3.14) mm{Q{A),Q{K\A)y ' and the global conductance, defined (p Q{A)\. start with respect to sup The key concepts - sup \P{A) ACK UPuiK\A)dQ{u) -— mm -^ . 0<QiA)<l/2 . Q{A) proved the connection between conductance and convergence [30] for the continuous space setting. This result extended an earlier result of Jerome and Sinclair [21, 22], who connected convergence and conductance for discrete state spaces. Lovasz and Simonovits' result can be re-stated as follows. Theorem 2. Let Qq be a M-warm WQt Proof. See Lovasz The main 4> ~ QWtv < and Simonovits result of this Markov chain under the start with respect to the stationary distribution Q. ^1 U - [30]. D , paper provides a lower bound CLT Then, conditions. In particular, for the global conductance of the we show that 1/0 is bounded by a fixed polynomial in the dimension of the parameter space. Theorem a for the 3. (Main Result) Under Assumptions Cl, C2, C3, and random walk setting the parameter as defined in (3.17), the global conductance of the induced Markov chain satisfies l/0 = o(de4^>+^^^ll^'l!'). . In particular, the random, walk requires at most N, ^'^This distance Q since is = Op[ equivalent to the sup^gK \PiA) - Q[A)\ = \ e«^i+8''^ll^'ii5 O (K) (-Y ln(M/e: distance between the density functions associated with J^ \dP - dQ\. P and steps to achieve 0(1), vue WQ^e ~ QWtv ^ £• Invoking the CLT restrictions, ei = o(l), = Op{d) and the number of steps N^ is bounded by £2 • Op {d^ ln(M/£)) D Comment 3.1. In general, the dependence on ei and eo is exponential and this bound does CLT not imply polynomial time ("efficient") computing. However, the ei = . Proof. See Section 3.2. that ||-^'^'||j have that 1/0 = and 0(1) £2 \\I'^\\j = which by Theorem o(l), 3 in turn fi-amework implies imphes polynomial time computing. Next we discuss and bound the dependence on M, the "distance" of the Q Qo from the stationary distribution distribution Qq is M the one-step distribution conditional on a proper We point u e K. A natural as defined in (3.15). emphasize that, initial distribution candidate for a starting move from an arbitrary such choice of Qq could lead to values of in general, that are arbitrary large. In fact, this could happen even in the case of the stationary density being a uniform distribution on a convex set (see case under the Lemma least 1. CLT Let u G [33]). Fortunately, this as K be the associated one-step distribution. and P^ is not the shown by the following lemma. framework With probability at 1/3 the random walk makes a proper move. Conditioned on peiforming a proper m,ove, the one-step distribution is a InM = Under the CLT M-warm 0(dln(\/d||/^||) restrictions, e2\\K\\j = o(l) \nM = Proof. See Appendix A. Combining this result where start start for f, + |jA'||5 and + \\K\\j ei = + e2\\K\\-j). 0[\/d), so that 0{d\nd). ' ' . D with Theorem 3 yields the overall (burn-in plus post burn-in) running time Op{d^\iid). Proof of the Main Result. The 3.2. an analytical result and first is proof of Theorem first result to K= The proof Outline of the Proof. 3.2.1. U 5i 5^2 bound the ergodic U C 53 where Si HA) > X4 A A, 5^ C K \ A, one CLT i /^ P„(/i \ A)dQ{u) + i 4 P„(.4)dQ(a) = ^{K — imphes that \\u — v\\ is i.e., Q(53) is between their one-step distributions also random walk bounded from below for the global conductance from below. Theorem 3 follows 1 /^.^^ is Pu{A)dQ{u) f Q(53). will follow c and Pv{A) < \\Pu - Py\\ > (see Section 3.2.3). by bounding is c. Therefore, the \Pu{A) - Pv{A)\ > Since u and v were Therefore 53 cannot be arbitrarily the following: among all After bounding the global conductance. CLT iso-perimetric problem has a long history in mathematics. surface area. + + are used to ensure that this condition by invoking minimizes a certain criterion within a particular family of perimetric problem for bounded from below. This leads to a lower bound The K \A sets. arbitrary points, the sets Si and 52 are "far" apart. small, or decreasing in the distance between S\ and 52. Therefore properties of the is A in derived in Section 3.2.2. This result will provide a is needs to bound the distance between these The geometric 2c. [31]. disjoint partition \A). Given two points u G Si and v £ 52, we have Pu{K\A) < 1 we This will be achieved by an iso-perimetric inequality which framework and total variation distance follows at least a fixed constant c (to be = A)dQ{u) \ bound on QiS^), which still is P„(A- \ A)dQ(«) the last term from below. lower and S3 consists of points to the other set what Lovasz and Vempala in two terms could be arbitrarily small, the result first tailored to the arguments £ A, consider the particular where the second equality holds because $(A) Since the In the formal proof finally, we have Pu(K 4 i follows the flow of which the one-step probability of going defined later). Therefore established, the second result is bound the conductance from below. provide an outline of the proof, auxiliary results, and, In order to an results: of independent mathematical interest. After the connection is between the iso-perimetric inequality and the ergodic flow allows us to use the two auxihary 3 uses and a geometric property of the particular random walk. The iso-perimetric^' inequality sets. sets with unit conditions and It Theorem 2. consists of finding the set that A famous and familiar example of an iso- volume, find the one that has the smallest 3.2.2. An Iso-perimetric Inequality. A concavity. x,y E IR", we function / R" — > : some /3 £ is by defining a notion of approximate start said to be log-/?-concave if for a € every log- [0,1], liave /(a.T for We IR (0, 1]. / is + (l-a)y)>/?/(xr/(j/)i-" said to be logconcave of log-/?-concave functions is if P can be The talcen equal to one. class rather broad, for example, including various non-smooth and discontinuous functions. Together, the relations (2.7) and (2.8) imply that we can write the functions / and i as the product of e~^^ Lemma Over 2. '^^ and a log-/?-concave function: the set K the functions /(A) := i(X)/ J,^({\)d\ of a Gaussian function, e~'^ , and ({X) are the product and a p -log- concave function whose parameter (3 satisfies ln/3>2-(-ei-e2-||A'||}). D Proof. It foUows from (2.8). In our case, the larger That is is the support set appropriate since the ability densities. CLT K, the larger is the deviation from log-concavity. does not impose strong restrictions on the tail of the prob- Nonetheless, this gives a convenient structure to prove an iso-perimetric inequality which covers even non-continuous cases permitted in the framework described in the previous sections. Lemma K= Consider any measurable partition of the form 3. distance between S\ and 5*2 is at least t, i.e. d{Si,S2) Then for any lower semi-continuous function f{x) = > 5'i U 52 Let Q{S) t. e~ll^ll U Ss such that the = J^ fdx/ m{x), where m is Jj^ fdx. a log-P- concave function, we have Q{S2) > /3^^^^ min {Q(5i), QiS.)} D Proof See Appendix A. Comment 3.2. This Kannan and in Li [23], new Applegate and Kannan assumptions on /, iso-perimetric inequality extends the iso-perimetric inequality in Theorem 2.L The proof for [1]. Unhke the builds on their proof as well as on the ideas inequality in [23], Lemma 3 removes smoothness example, covering both non-log-concave and discontinuous cases. The iso-periraetric inequality of subsets of K Lemma 3 states that, under suitable conditions, the measure of at least one of the original subsets. The Corollary f{x) = Consider any measurable partition of the form 1. > t, two following corollary extends the J previous theorem to cover cases with an arbitrary covariance matrix d{Si, S2) if are far apart, the measure of the remaining subset should be comparable to and let Q{S) = e~2^ '^^m{x), where m >P QiSs) K = Si U SsU S2 such that J^ fdx/ /^ fdx. Then for any lower semi- continuous function is a log-P-concave function, we have y'^^ j^in {q(^Si), ie-^-"''/8 Q{S2)} , where Xmin denotes the minimum, eigenvalue of the positive definite matrix J. Proof. See Appendix A. Bounds on 3.2.3. D ' the Difference of One-step Distributions. Next we relate the total varia- tion distance between two one-step distributions with the Euclidean distance between the Although points that induce them. this approach follows the one there are two important differences which call for a [31, 32, 33] longer rely on log-concavity of /. Second, we use a different in Lova^z and Vempala new proof. random walk. First, We we no start with the following auxiliary result. Lemma 4. Let g compact set , K . : IR" — > IR 6e a function such that Ing Then, for every x e K and r > Lipschitz luith constant L over [g{y)/g[x)\>e-'^\ inf :,, is 0, yeB{x,r)nK Proof The result is Given a compact D obvious. set K, we can bound the Lipschitz constant of the concave function Ing defined in (2.7) by L < sup Lemma v\\ < ^ 5. Let where L f as defined in u,v e is ]|Vln5(A)|| K := B{0, < sup ||A'|1), cr^ ||JA|| < < Xmax\\K\\ jg^j;^, and suppose the Lipschitz constant of Ing on the set (2.5), = (Vd) that (3.16) . jr^ < j^ and \\u — K. Under our assumptions on we have ||Pu-p.||tv<i-^. Proof. See Appendix A. ' D The converse of Lemma 5 states that if two points induce one-step probability distribu- tions that are far apart in the total variation norm, these points must also be far apart in the Euclidean norm. This geometric result provides a key ingredient in the application of the iso-perimetric inequality as discussed earher. Proof of Theorem 3.2.4. where ||ii'|| = 3. Consider the compact support We 0{\/d/Xmin) from Assumption CI. = min {l/4VdL, a Therefore the assumptions of the theorem, using (3.16) it Lemma for f as K — B{0, \\K\\), define \\K\\/V2{)d\ (3.17) . Moreover, under the assumptions of 5 are satisfied. follows that = cj> (3.18) . , l20\raaxVd\\K\\ Fix an arbitrary set to K. We will A e ^ and denote hy A'^ = K\A the complement of A with respect prove that HA) = j Pu{A')dQiu) > ^a^%~mm{Q{A),QiA')}, which implies the desired bound on the global conductance to bounding #(A'^) since Q is (j). Note that (3.19) this is equivalent stationary on {K,A). Consider the following auxiliary definitions: Si^LeA: Pu{A') First S2 assume that Q{Si) < Q{A)/2 this case, $(A) <-^Y = lveA': (a similar P„(A) < ^| , and S^ argument can be made = K\{Si U for 52 and S2). A''). In we have = / P,{A')dQ{u) JA and the inequahty > f P^[A^)dQ{u) > Ja\Si f fdQiu) > ^Q{A\S,) > ^Q{A), JA\Si 6e 6e 12e (3.19) foUows. Next assume that QiSi) > Q{A)/2 and Q{S2) > QiA'')/2. Since $(A) = ^{A') we have that ^A) = j P„{A')dQ{u) = lJ^Pu{A')dQ{u) A\Si > ^ + "^^ ;^^\^J lJ^,P,,{A)dQiv) -r 2 Ja<=\S2 — j^( Us.ldQ{u)^4-Q{S^), 2 J53 G^dQiu) = where we used that ^3 sets Si and 5^2, Lemma we can apply the (5(53). We tire \ ^i) we have 5, U [A" that — ||u 1 \ 52). Given the definitions of the - Pu{A') - Pv{A) > i'|| > 1 - ^- | for every u € Si and v G 52. Thus, iso-perimetric inequahty of CoroUary Pu{A^)dQ{u) 1, with d{Si,S2) > cr/S, to > g|e-i^-"-/64 y2^min{Q(5i),Q(52)} > ^%^ min{Q(A),Q(A^)}. second inequahty also used that tions. Therefore, using relation (3.18) i/4> since the eigenvalues are The remaining Theorem (^ bound obtain J where = e 52 we have t; - PvWtv > Pu{A) - Py{A) = \\Pu In such case, by K \ {S\ U 52) u £ S\ and for every 2 Xmin'^'^ \\K\\ = < A^j„ ,|' ^y^ < '^/d under our to be uniforml}^ Theorem 3 follow bounded from above and away from by invoking the CLT zero. conditions and applying D with the above bound on the conductance. 4. defini- 0{\Jd/\min)^ we obtain = o f/j-^^d) =o(d e'^^+'^-mA assumed results in and Complexity of Monte Carlo Integration This section considers our second problem of interest - that of computing a high dimensional integral of a bounded real valued function g: ixg = g{\)!{\)d\. / (4.20) Jk The A"*, integral A^, . . ., is computed by simulating a dependent (Markovian) sequence of random points which has / as the stationary distribution, and taking 1 /^3 = ^ (4.21) ]^Eff('\') 1=1 as an approximation to (4.20). size draws from the The dependent nature of the sample increases the sample needed to achieve a desired precision compared to the /. It turns (infeasible) case of independent out that as in the preceding analysis, the global conductance of the Markov chain sample will be crucial in determining the appropriate sample size. The starting point of our analysis due to Kipnis and Varadhan versible Markov chain on a central limit theorem for reversible is which [26] restated here for convenience. is K with stationary distribution /. stationary time series {s'(A')}^^j, obtained by starting the distribution / is The Markov chains Consider a re- lag k autocovariance of the Markov chain with the stationary defined as 7fc = Cov/(g(A^),5(V+'^-)). Let us recall a characterization of 7^ via spectral theory following Kipnis and Varadhan [26] : Let T denote the transition operator of the In our case, since the chain Hilbert space Lr{K,A,Q), T reversible, is Markov chain induced by the random walk. a linear bounded self-adjoint operator in the Let Eg denote the measure on Borel sets of (—1,1) see [30]. induced by the spectral measure of is T applied to g, as in With [16]. these definitions, one has that for any k 7fc= f J'^kEa We are prepared to restate the central hmit theorem of Kipnis and Varadhan [26] needed for our analysis. Theorem as (4.21) 4. and For a stationary, irreducible, reversible Markov chain, with 'p.g and fig defined (4.20), +00 k — — oo almost surely. If cr^ Proof. See Kipnis In our case, 70 2, states that Corollary cr^ is finite, then \fN{flg —fig) converges in distribution and Varadhan is finite since g to N{Q,a'l). D [26]. is bounded. The next result, which builds upon Theorem can be bounded using the global conductance of the Markov chain. 2. Let g be a square integrable function with respect to the stationary Q. Under the assumptions of Theorem. 7fc Proof. See Lovaasz < I 1 - and Simonovits y 4, I [30]. we have 70 and that o-g < 70 I -2 measure We will use the mean square error as the measure of closeness for a consistent estimator: MSEiflg) Many to approaches are possible [16] for run E\fig - Mg]2. for constructing the a detailed discussion. Here, • long • = we sequence of draws will analyze three common in (4.21); we refer schemes: (Ir), subsample (ss), • multi-start (ms). Denote the sample corresponding to each method as sizes run scheme consists of generating the first A';,., and Nms- The long A^ss, point using the starting distribution and, after the burn-in period, selecting the Ni^ subsequent points to compute the sample average (4.21). in the start The subsample method sample average scheme uses Nms also uses only (4.21) are spaced out different sample paths, starting distribution /o and picking the last one sample path, but the Nss draws used by S steps of the chain. initializing draw in Finally, the multi- each one independently from the each sample path after the burn-in period to be used in (4.21). There is points are another issue that must be addressed. drawn from the stationary called "burn-in" period B, that is, distribution /. the number All We schemes require that the therefore need to initial compute the so of iterations required to reach the stationary distribution / with a desired precision, beginning at the starting distribution /q. Theorem 5. Let /o be a M-warm and g := start with respect to f, the notation introduced in this section, to obtain MSE{fig) < e sup;^^;^; |5('^)|- it is Using sufficient to use the following lengths of the burn-in sample. B, and post-burn in samples, Nij.,Nss,Mms- Tif In and ^^r The = -:|, A^.. overall complexities of Proof. See ss, — S= {2/<p^) In (670/e)), are thus . For convenience Table method depends on {unth and ms methods Appendix A. that the dependence on the Ir, = 1 M A^^. = |^. B + Ni,., B + SN^s, and B x NmsD . tabulates the bounds for the three different schemes. and g is only via log terms. Note Although the optimal choice of the particular values of the constants, when £ \ 0, the long-run Table Burn-in and Post Burn-in Complexities 1. Method Quantities Complexity Long Run B+ ^ iV,, Subsample B+ N.^ Multi-start B N^^ Table 2. x An ^^vfal^U (in ^^^ j) ^ In ((^^^) S -f f ;^ x ) ^ (lio) ^ (^ In (^)) ^ Burn-in and Post Burn-in Complexities under the CLT. Burn-in Complexity Metliod Long Run Op(dHnd IriE-'^) Subsample Op{dHnd Ine''^) Multi-start Op((i^ Inrf Ine"!) Post-burn-in Complexity +Op{d'^-£-^) + Op{d'^ e''^ Ine-'^) x Op(£"^) algorithm has the smallest (best) bound, while the the multi-start algorithm has the largest bound on the number (worst) CLT conditions, namely |jA'|| and g are constant, though and g grow In this section We nential family. [35], Table 2 presents complexities impUed by the -^ 0, and £2||-ft^P CLT -^ 0. The table assumes 70 straightforward to tabulate the results for the case where 70 requires, namely that ei = bounds apply under a Op(l) and e2||A'|P = slightly Op{l). we verify that our conditions Then we drop Lehmann and Ghosal statistical the concavity and smoothness assumptions to illustrate the of the approach developed in this paper. Concave Cases. Exponential cf. and analysis apply to a variety of begin the discussion with the canonical log-concave cases within the expo- full applicability mation, ei Application to Exponential and Curved-Exponential Families problems. 5.1. it is 0{Vd), at polynomial speed with d. Finally, note that the weaker condition than the 5. of iterations. = [17], Casella and Stone et al. families play a very important role in statistical esti- [27], [37]. especially in high-dimensional contexts, cf. Portnoy For example, the high-dimensional situations arise in modern data tial Moreover, exponen- and econometric applications. sets in technometric familes have excellent approximation properties and are useful for approximation of densities that are not necessarily of the exponential form, Our discussion exposition, based on the asymptotic analysis of Ghosal is we invoke the more canonical assumptions El. Let xi, Stone et cf. . . ,Xn be . iid al. [37]. In order to simplify [17]. similar to those given in Portnoy [35]. observations from a d-dimensional canonical exponential family with density f{x;e)=exp{x'9-i>n{0)), where 6' E an open subset of IR is £ Q. of parameter points 9q Set fx and d —> oo , = as F = and 'fp'i&o) n — Following Portnoy covariance of the observations, respectively. Fix a sequence oo. > [35], we F= re-parameterize the problem, so that the Fisher information matrix For a given prior n on 0, the posterior density of mean and the ip"{Oo), implicitly I. conditioned on the data takes over the form n [|/(.x,;6I) TTniO) oc tt{9) 2 The local in the parameter space parameter space is = = s y/n{x hood/extremum — /i) ^/ri{Q — is a estimate. first By = ^i{9 - ^(A) We = - nij{9)) . be convenient to associate every point 9 9o} - s, meLximum order approximation to the normalized design, we have that by Chebyshev's inequahty, the norm of = It will 9o). Finally, the posterior density of A over where, for x exp {nxO n{e) with an element of A, a translation of the local parameter space, A where = 1 E[s\ — and E [ss'] = can be bounded in probability, s A = x/n(0 — 9q) — s is Id- = = ||s|] given by /(A) likeli- Moreover, Op{'\/d). r }(x)d\ ' X]"=i Xi/n, exp (x'(^/^(A + s)) + n{4>{9o + (A + s)/^i) - ^(^o))) impose the following regularity conditions, following Ghosal ^(^o [17] + (A -I- s)/0I). and Portnoy E2. Consider the following quantities associated with higher moments borhood of the true parameter Bin(c) := sup{Ee|o'(.'c, - 9o: ii)\^ : a G IR^ ||a|| = 1, ||^ - ^ojl^ < cd/n}, e.a B2nic):=sup{Ee\a'{x,~^i)\'^:aeU'^,\\a\\ = l,\\9-9of<cd/n}. [35]: in a neigh- 23 > For any c and all n there are <co + Bin{c) E3. The prior density n p > d' and proper and is cq > such that and S2n(c) < satisfies cq + d". a positivity requirement at the true parameter supln[7r(6i)/7r(6'o)] = 0{d) eee where 9o Moreover, the prior n also the true parameter. is satisfies the following local Lipschitz condition <V{c)^\\e ~ \\mr{e) -lm:{9o)\ for all 6 such that ||^ — latter holding for all c 9q\\" > < eoW cd/n, and some V{c) such that V{c) + c^, the E4 The following condition on the growth rate of the dimension of the parameter assumed Comment is to hold: 5.1. Condition 0. E2 strengthens an analogous assumption of Ghosal [17]. assumption are imphed by the analogous assumption made by Portnoy is cq space d^/n -^ E3 < 0. similar to the assumption on the prior in Ghosal assumption, see [17]. [35]. Both Condition For further discussion of this Condition E4 states that the parameter dimension should not grow too [3]. quickly relative to the sample Theorem 6. Proof. See Appendix A. size. Conditions EI-E4 imply conditions C1-C3. Combining Theorems D 1 and 6, we have the asymptotic normality \f{X)-4>{X)\d\= A / |/(A)-0(A)|<iA + of the posterior, Op(l)=Op(l). Jk Furthermore, we can apply Theorem 3 to the posterior density / to bound the convergence time (number of steps) of the Metropohs walk needed to obtain a draw from / (with a fixed level of accuracy); The convergence time is at most Opid') after the burn-in period; together with the burn-in, the convergence time Opid^nd). is bounds stated Finally, the integration previous section also apply to the posterior in the Non-Concave and Discontinuous Cases. Next we 5.2. /. consider the case of a d-dimen- sional curved exponential family. Being a generaHzation of the canonical exponential family, its many analysis has enough to allow with the previous example. similarities Nonetheless, is it and even various kinds of non-smoothness for non-concavities general in the log- likelihood function. NEl. Let xi, . . ,Xn be . observations fi'om a d-dimensional curved exponential iid family with density /(.T;e)=exp(x'0(77)-V„(0(r?))), . , where ^ £ 0, an open subset NE2. The a convex compact set = ^0 9{'q) ^f ^(%)- The mapping some for = c 6q of IR parameter of interest > c rj IR^^ h^ 9(7]) , and d is 7/, The and that 1|0(?7) - r/(6'o)|| > oo as n -^ oo. takes values from > rjo the interior of lies in true value of 6 induced by Moreover, assume that 0. — whose true value eo!!?/ rjo - is '/oil IR''^ to IR'' rjo is given by where c-d < di < d, the unique solution to the system for some eo > and all ?/ G *. Thus, the parameter 9 corresponds to a high-dimensional linear parametrization of the log-density, and [12], describes the lower-dimensional parametrization of the density of interest. rj There are many classical Lehmann and examples of curved exponential Casella [27], and Bandorff-Nielsen puts a curved structure onto an exponential family is rn{x,a)f{x,9)dx This condition restricts 9 to = where component rj apphcations, often moment lie [2]. a = moment [18], 0. on a curve that can be parameterized as (a,/3) contains Chamberlain of the condition that restriction of the type: a as well as other parameters 0. {9{rj),i] £ ^}, In econometric restrictions represent Euler equations that result fi-om the data X being an outcome of an optimization by rational decision-makers; Singleton example Efron families; see for An example [7], curved exponential framework Imbens is [20], see e.g. and Donald, Imbens and Newey Hansen and [10]. Thus, the a fundamental complement of the exponential fi-amework, at least in certain fields of data analysis. We require the following additional regularity conditions on the NE3. G : For every IR''' -^ IR'^ k, and uniformly such that G'G in 7 £ mapping 9{-). B(0,K.\/(i), there exists a linear operator has eigenvalues bounded from above and away from Figure 2. solid line is This figure illustrates the mapping the mapping while the dash by G. The dash-dot The 9{-). (discontinuous) line represents the linear line represents the deviation map band controlled by induced ri„ and r2n- zero, and every n for v^ (0(770 + where < j|rin|| and 5i„ - 7/\/H) < ||i?2n|j d{vo)) Thus the mapping rj i— 9{i]) of 770- More = An example of a kind of Again, given a prior tt R2n)G-f, and 52nd --*0. implies the continuity of the NE3 generally, condition linearization in the neighborhood of S2n- + allowed to be nonlinear and discontinuous. For example, is > {Id Moreover, those coefficients are such that <52n- ^inVa^O the additional condition of Ji„ = rm + map does impose that the whose quality rjo is rj in a neighborhood admits an approximate controlled by the errors Sin and allowed in this framework on 0, the posterior of mapping map is given in the figure. given the data is denoted by - ni>{9{T]))) n Mv) « 7r(0(r/)) • H fixi; rj) = 77(^(77)) exp {ni'e{ii) . i=l In this framework, we also define the local from the true parameter J = parameters to describe contiguous deviations as y/niv - Vo) - s, s = {G'G)~^G'\/n{x - 11), where s is a first s: = £(7) exp {nx'{e{r,o ^n{d{rio + (7 + — E[s] 0, E[ss'] F = ^{'^ - posterior density of 7 over T, where The maximum likehhood/extremum order approximation to the normalized estimate. Again, similar bounds hold for s)/V^) - 'i]o) = - + n^iOim + e{r]o)) {G'G)~^, and s, is (7 /(7) + = ,- ||s'|| = /Z!^ , Op(Vd). where s)/07)) - niid{^o))) h + s)/V^)). + (5.22) The condition on NE4 The the prior prior n{rj) Theorem 7. Proof. See Appendix A. As before, is the following: <x 77(^(77)), Conditions E2-E4 arid Theorems /" where Tr{9) satisfies NEI-NE4 condition E3. imply conditions C1-C3. ; , lH , 1 1/(7) and 7 prove the asymptotic normality - 0(7)M7 =/ 1/(7)- <A(7)|rf7 + of the posterior, Op(l)=Op(l), ; - " where <t^[l) = ^- 77^ exp ' (27r)^/2det((G'G)-i)i/2 Theorem 3 implies further that the sampling and integration apply to ' 6. ' Tlris mation main this f \ -^7'(G'G)7 2' "^ results of the paper on the polynomial time curved exponential family. ' Conclusion paper studies the computational complexity of Bayesian and quasi-Bayesian in large esti- samples carried out using a basic Metropolis random walk. Our framework permits the parameter dimension of the problem to grow to infinity and allows the underlying log-likelihood or We extremum criterion function to be discontinuous and/or non-concave. establish polynomial complexity by exploiting a central hmit theorem provides structural restrictions for the problem, i.e., approaches a normal density in large samples. The analysis of this paper focused on a basic for its simplicity, it is framework which the posterior or quasi-posterior density ,,' random ..',,.'.,, . ; _ walk. Although not the most sophisticated algorithm available. it is widely used Thus, in principle further improvements could be obtained by considering different kinds of (or variance reduction schemes). As mentioned before, essentially only one random walks lemma of our analysis relies analysis on the particular choice of the random walk. This suggests that most of the apphcable to a variety of different implementations. is Appendix Proof of Theorem Prom CI 1. follows that it < \f{X)-4>{X)\dX f Proofs of Other Results A. f \f{\)-4>W\dX+ f = / |/(A)-^(A)|a!A {f{X) + 4>iX))dX + Op(l) Jk where the .T last equahty follows from Assumption J JNow, denote ^ C^ = m - 1 Cl."^^ (27r)'^/2det(J-i)i/2 ? —-— . and write r-; = ^{X)dX Cn exp[ In e{X)- (~-A'JA l\(p{X)dX <P{X) Combining the expansion /(A) -1 in C2 with conditions imposed in (j){X)dX < Jj^\Cn-exp{ei 0(A) + jj. 2 < 2|C„e°p(i) follows - ezA' JA) - 1| <^(A)dA / |c„-e°p(^)-lU(A)dA Jk The proof then + e2X'JX)-l\4>{X)dX exp (-ei \Cn < C3 ' ' by showing that Cn -^ - 1- 1| We have that R= \\K\\ = 0{\/d), and by assumption Cl .iVJAg-ei-f(A'JA)^;^ e{X)dx |A||<fl C„ g{X)dX (1+0(1))/ Jm<fi (1 + e-2^''^dX 0(1))/ Jm<R -iA'(J+£2J)A dX det(J) (1+0(1)) V det(J + 7iiA|l<fi (2^)'^/2 det((J + eo^J)-')'/^ e2J) dX l|Ai|<fl(27r)^/2det(J-i)V2 For the case of densities, see 4>! it follows from the standard concentration of Lovasz and Vempala [31]. measure arguments for Gaussian Since < £2 bounds 1/2, we can W ~ yV(0, define (1 +62)"^^"^) and V~ N{Q,J-'^) and rewrite our as 1 e-2^1 1 Y/^P{\\W\\<R) f {l+o{l))\l+e2j P{\\V\\<R) > - /||A||<fi^(^)^'^ l+o(l)J^|A||<fl5(A)rfA d/2 1 Vl+e2 (1+0(1)) where the last inequality follows < R) > from P{\\W\\ P{\\./r+l^_W\\ < R) = P{\\V\\ < R). Likewise, Proof of Lemma Proof of Lemma Assume — since ei i c' — * D 0. result follows immediately Let 3. 0, €2 > The 2. Vl-e2, /||;,|]<^fi(A)rfA <^n Therefore Cn —^ M := P ^^^ that there exists a partition of r- — . We will use the Localization Lemma prove the of for 0, i = D (2.7)-(2.8). lemma K = Si U ^2 U S3, with d{Si, S2) {Mls,{x) ~ ls,(x))f{x)dx > We from equations will bj' contradiction. > such that t 1,2. Kannan, Lovasz, and Simonovits [24] in order to reduce a high-dimensional integral to a low-dimensional integral. Lemma 6 (Localization Lemma). Let g and h he two lower semi-continuous Lebesgue integrable functions on IR such that g{x)dx Then there exist two points a,b £ M.^, > and / h{x)dx > and a linear function 7 T~{t)g{{l-t)a + tb)dt>0 and / 0. : [0, 1] f'-^(t)hi{l~t)a -^ IR-i- such that + tb)dt>0, Jo where ([a, 6], Proof. See By 7) is said to form a needle. Kannan, Lovasz, and Simonovits the Localization i''~\-u-)f{{'^ Lemma, -u)a + [24]. there exists a needle (0,6,7) such that ub){Mls.^{{l -u)a + ub) - ls:,{i.l -u)a + ub))du > 0, for i — ||6 = > a\\ = Equivalently, using 7(u) 1,2. - l{u/\\h and v := a\\) {h - a)/\\b - a\\ where we have t, \\b-a\\ 1 for i — {.u)f{a + uv) (Mls^ia /[|6-a|| + uv)ls,{a + uv)du> 'y'^-\u)f{a hand left must contain points as, for and at least is ^'^-'^{u)ls^{a side of (A.23) be positive for in Si an interval whose length Y + > uv)) du i = 0, 1, 2, r In order for the [a,b] Is-^ia can be rewritten 1,2. In turn, this last expression M + uv) - {u)f{a+uv)du> M mm ^2. Since d{Si,S2) We t. > 1 for every Y-\u)f{a + uv)du, <^ and = i 2, uj (A.23) the line segment we have that ^3 n t, prove that will = i + uv)f{a + uv)du. [a, b] contains e IR -f'^-\u)f{a / + uv)du\ (A. 24) which contradicts relation (A.23) and proves the lemma. First, note that f{a + uv) — m{a^uv) ~ e~ll''+™ll e~""+'"i""''''''m(a + ui') where t\ :— 2a'v and tq := -||a|p. m(a + uv)7'^~^(t() Next, recall that By Lemma 7 presented in such that 0m.{u) So and si ?n, this such that . Ti + m{a + / . „ + is rw+t P m{a + by In sq, ^ e-""+''i"+''odu> Jw which ioT soe^'" uv)f'^~''^{u) To := tq r+t /3 sqb^^'^ < m(u) u £ {w.w + uv)'y'^~^{u) /3soe*^" t) every u. Moreover, there exists numbers soe'^'^"'"'^ Due to the log-concavity of and < iri{u) soe^^^ otherwise. by Soe*^" on the right hand side of (A. 24) and on the left hand side of (A. 24). we have After defining ri = ^ /•« f MminM f\\b-a\\ , „ . e-"'+''i "+'''' du, / , ^ _ 1 e~""+'"i"+''"(iu ^ Jw+t \Jo (A.25) J equivalent to f f2 ( fw e-("-^)'+^°+^du>MminM Jw for m(w + t)— and implies that and si exists a unidimensional logconcave function in uv)^'^''^{u) m(w) = Thus, we can replace i a unidimensional log-/3-concave function on u. < m{a + m(u) > replace is still Appendix B, there \^Jo - f2 e^^""^) +^°+^du, p\\b-a\\ / Jw+t - ?- 1 e-^'^-i'^'+^^+^du} J (A.26) . Now, cancel the term for any w, (A. 26) is e^^^"^^!^ on both sides and; since we want the inequahty (A. 26) holding impUed by e'^^'du ^"^ > holding for any w. This inequality Proof of Corollary /(.t) The Lemma < (where a~ Define 5. and Define the direction H2 = {x G Bu n By We : let We a'^I. — B^jHSyn K. By V. satisfies the by applying jqJjji), covariance matrix and Lemma 2.2 in I (A. 27) [23]. For brevity, we """du e / Kannan and = '^ Lemma 3 Consider the change of variables x 2. e^'^m{\/2J~^/~x) result follows Proof of Au,v is e'""'du, / <j Li reproduce the proof. will not = min ^ assumption of Lemma 3 with Z^ Then, x coordinates, D the radius of is in and d{Si,S2) > t\/Xmin/2. x coordinates. K := B{0, R), so that R . K; also let r := 4\/da q{x\u) denote the normal density function centered at u with use the following notation: definition of r, = Bu B{u,r), By we have that /^ q[x\u)dx = = B{v,r), and f^ q{x\v)dx > 1 — \. — u)/\\v — u\\. Let Hi = {x £ B^nBy w'{x — u) > Hu — w]]/2}, w'{x — u) < \\v — u\\/2}. Consider the one-step distributions fi-om u w= {v : have that \\Pu-Pv\\tv 1-/ < mm{dPu,dPy} J Au.v = 1-^^ < 1 - : min|.7(.rl.)min|^,l|,g(x|i>)min|^,lUdx JAu.v < ,.' , , where \\u — v\\ arbitrary u € We mm{q{x\u),q{x\v)}dx Pe~^'' / first < l-/3e-^'"(/ a/S. Next we will q{x\u)dx+ f q{x\v)dx-) bound from below the last sum of integrals for an K. bound the integrals over the possibly larger .sets, respectively H\ and Ho- Let h denote the density function of a univariate random variable distributed as N{0,a~). is easy to see that h{t) the direction w Note that B^ C = J^,,_s_^q{x\u)dx, (up to a translation). Let //s HiU {Ho — \\u — v\vS) U iJa i.e. = {x h : is It the marginal density oi q{.\u) along —\\u — v\\/'2 where the union is < w'{x disjoint. u) < \\v — u\\/2}. Armed with these observations, we have q{x\u)dx + = q{x\v)dx / J > + q{x\u)dx / Jh2 — q{x\u)dx I r\W-v\\/2 r — q{x\u)dx / 1 - h{t)dt I JBu > q{x\u)dx / JH3 J Bu = q{x\u)dx / JH2-\\u-v\\w Hi J-\\u-v\\/2 ^ - / 4 e y_||„_^||/2 1 1 „ „ dt \/2na . \\u — < a/8 by v\\ In order to take the support K 1 8^/2n . /27r cr where we used that 1 e^ l---\]u-v\\^^>l~^ > =>4A.28) 10 the hypothesis of the lemma. into account, we can assume that u,v G dK, i.e. ||m|| = = R (otherwise the integral will be larger). Let z = {v + u)/'2 and define the half space Hz = {x z'x < z'z} whose boundary passes through u and v (Using ||u|| = \\v\\ = R it follows that z'v = z'u = z'z/2). ||w|| : By the symmetry normal density, we have of the = q{x\u)dx Although HidH. does not he in q{x\u)dx. -^ ^ JHi HiHH^ K in general, simple arithmetic shows that HinlH^ — ^jrliT ) fill 13 A'. Using that /„ w„ r^^ q{x\u)dx '?(^|^') , > I > - 1 = q{x\v)dx ^ ||j/- lilii nal to Ibll = z. (m)| < Since y lly 6 /(fST^NP - -II ('^i < = lly > q{x\u)dx I — h{t)dt / .rV«e-*'/2-= f q{x\u)dx / - ^ J Hi "Indeed, take y £ Hi n (if; - we have h{t)dt, Jn dt / \J2i\a Jo 5?" "Fir ) • We can write y - ^11 < \\y - "" a" irfir)' ^"^ ^^^^ \^^ ^{R-ff+r^ = ^JR' - "11 + < ^h ~ 2Hf + = IIIJ' ^ ^+ \\^A + y^. ~ ~ r2 '^11 H^H < R. ^ ~ '') ^ where s, ^"^ - s ||s|| is < also ^ ~ ^' r (since orthogo- Therefore, Q where we used that By symmetry, respectively. j^ < —^ — since r 4^/du and Adding these inequalities q{x\u)dx < j^. and H2 are replaced by v and using (A. 28), we have \ f + 4 —9 - 777^ ^ V3- > q{x\v)dx / JH2nK HiHK -^ when u and Hi the same inequahty holds -^U J 15\/27r Thus, we have \\Pu-PA<\- P-Lr 3 and the result follows since Proof of Lemma Starting from an arbitrary point in 1. walk makes a proper move. max^:Q(^)>o,.4e.4 The result follows D Lr <\. §4! If this is by invoking the CLT We will use the notation K We have that point in random {-iirf- Aet{J-')eh-'J-e^^^+'-^--'^- restrictions. Next we show that the probability p constant. the the case note that mB.^,eK0^ < K, assume that making a proper move of defined in the proof of Lemma is at least a positive Let u be an arbitrary 5. . = j^mmi^^y p ^ l^q{x\u)dx>l3e-^'' ^g^^i^q{x\u)dx - Sb^^h^ q{x\u)dx /^e-^'- /;'/^ h{t)dt > i. D Proof of Theorem 5. Consider the sample i ms) as (A^'-^, A^'-^, defined by ^ 1 with the underlying sequence mean =l produced by one of the schemes ..., A^'-^) is produced by iterating the chain (lr, ss, follows: • for lr, A*'^ = A'"*"^, where starting with an initial A'^-^ draw starting with /o by T^/q. B fi-om /q. Define the density after Thus A^ has the distribution B+ i times steps of the chain T^/o, and A*+^ has the distribution T'+^/o. • for ss, A*''^ • for an = A^^^, ms, A*'^ are initial i.i.d. where 5 is the draws from number T^ fo, i.e. of draws that are "skipped". each i-th draw point from /o and iterating the chain B times. obtained is • . bj' sampling We have that MSE{-[1b,n) = = Etbj^ [M5£(/2b,jv|A^ = Ef [MSE{11b,n\>'^ = A)] Ef [MSE{j1b,n\X'^ = < A)] A)] = Ej MSE{J1b,n\X^ = + Ef = MSE{i2b,n\>^^ r%(A) - + fEf r^/o(A) \) /(A) r^/o(A) A) - 1 /(A) 1 /(A) = {a%/N) + 2g^T''fo-f\\TV, where cr^ yy is the variance of tlie sample average under the assumption that (We exactly according to /. The bound on We cr~ j^ will also used the fact that \\T^ fo — fWrv < «% is smaller than ?/3, which Using Theorem x/Mfl-C 2 2, since /o < VMe~ < In Specifically, cr'^y- lary 2. Thus, To bound j^ < q ^^^ Nir, note that a^ Nij. Nss, = 2a 6 we ^^-^^ first <Jg 70 + where we used Corollary 2 and that choosing the spacing S distributed equivalent to imposing is M-warm start for /, Q\/MgIn mean square < Jo-§i above 2Af7i where the A''-^ 70 + e. last inequality follows suffice to obtain < of post-burn iterations Ni^, error less than must choose a spacing S 71n < a is we determine the number Ngs, or A^ms needed to set the overall To bound is k - /lUi-) QVMg- B > Next we bound X^ \\\T^ depend on the particular scheme, as discussed below. require that the second term that \\T^ fo - fWrv = MSE{j1b,n) < to ensure that the autocovariances 7^ 2iV7o f( 1 and A'+^'^ are spaced by S steps of the chain. By as 2\5 < e"'^^ from Corol- £ < .,..e.S,l,„(^) = and using Nss mean square the , error for the ss method can be bounded as e j^ (70 + 2Ar,37i) + 25^^/0 -/IItk < MSE{j1b,n) To bound Nms, we observed that ^ + £/3 < £ provided that Proof of Theorem Nr,is > 7/,- = for all = B(0, where ||A'f IJA'II) our condition Cl is satisfied by the argument given C2 is satisfied by the argument given £1=0 implying that 7^ A'ISE{'p,B^j\!) < Given 6. A' our condition /c D 2jo/{3e). in = erf, proof of Ghosal's Lemma 4. Further, Lemma 1 with proof of Ghosal's in the and 1 ^2=0 6 and our condition C3 is cd ^ fed ^ ,^. —Bin{0) + —B2n{c) \\\ n n , / , satisfied since by E3 and E4 D 02\\Kf-Q. Comment A.l. Ghosal normal measure under d Proof of Theorem 7. ^ in the [3] and condition C3 = Op £2 = Op is for details). Lemma where \\Kf <52. + satisfied since <5.?„ + ( 1 A. 2. For further and NE3 for some satisfied with , by E3, E4, NE3, and NE4, details Cdi is A&1„(0) + —B2n{C) e2\\Kf^0. Comment = NE3. Further, condition C2 Lemma (C + (1 + 52n)SlnVd?j f B{0,C\/d\ogd). a and discussion, see [3]. His to the concentration of [3]. Then condition Cl 4 and proof of Ghosal's £1 = B(0,CVd) due K = B{0, \\K\\), Take given in the proof of Ghosal's argument given for the set A' 00 asymptotics. For details, see large independent of d (see = proves his results for the set K' [17] arguments actually go through C sufficiently by the argument is satisfied by the Appendix Lemma Let / 7. : — IR ; < = < f[x) g{x) for every a; every x, that for : e fc N,A e IR'',A Recall that the epigraph of a fuirction in IR". In fact, the values of = w is is one dimensional, defined as epi^, = {{x,t) conv(epi^) (the convex hull of Consider {x,m{x)) e epim, since the epigraph is Moreover, : . J < t w{x)}. Using where both epi/j), sets the boundary of conv(epi/i). convex and this point H on (x, m{x)). -^ i=\ m are defined only by points in there exists a supporting hyperplane be the smallest k 1=1 our definitions, we have that epim m let > 0,^A, = l,^A,yi = x\ 1=1 H Now, k ^\rh{yi) points of IR. is, {k Since € lnf{x) a (ln/3)-concave function. concave function greater than h{x) he there exists IR -^ IR such that Pgix) Proof. Consider h{x) Then IR 6e a unidimensional log-P -concave function. > a logconcave function 5 Bounding log-/?-concave functions B. [x, is on the boundary, m{x)) G conv {epijiOH). can be written as convex combination of at most 2 (a;,?7i(a;)) epifi. Furthermore, by definition of log-/?-concavity, lnl//?> sup Xh{y) we have that + {l~X)h{z)-h{Xy + {l-X)z). \e[0,l],y,z Thus, h{x) g{x) = < m{x) < e'"^^) is h{x) +ln(l//?). Exponentiating gives f{x) < < g{x) 4/(a;), where D a logconcave function. References [1] D. Applegate and R. Kannan, Sampling and ACM 23th STOC, Integration of Near Logconcave Functions, Proceedings 156-163, 1993. [2] O. Barndorff-Nielsen, Information and exponential families in statistical theory. Wiley Series in [3] A. Probability and Mathematical Belloni tions of and Curved Statistics. John Wiley On Chernozhukov, V. Exponential Families the under & Sons, Ltd., Chichester, 1978. Asymptotic Increasing Normality Dimensions, of IBM Posterior Research Distribu- Report, http://web.mit.edu/belloni/www/curvexp.pdf. [4] P. J. BiCKEL and [5] 11, BUNKE, O., MiLHAUD, els. [6] A. Yahav, J. Wahrsch. Verw. Geb The Annals G. Casella Some contributions to the asymptotic theory of Bayes solutions, Z. 257-276 (1969). X., 1998. Asymptotic behavior of Bayes estimates under possibly incorrect mod- of Statistics 26 (2), 617-644. AND C. P. Robert, Monte Carlo Statistical Methods, Springer Texts in Statistics, 1999. G. Chamberlain, Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics 34 (1987), no. V. Chernozhukov and 3, 305-334 Hong, An H. MCMC approach to classical estimation, Journal of Econometrics 115 (2003) 293-346. S. Chib, Markov Chain Monte Carlo Methods: Computation and Inference, Handbook of Econometrics, Volume S. 5, by Heckman and J.J. G. Donald, G. moment with conditional E. Learner, 2001 Elsvier Science. W. Imbens, W. Newey, Empirical K. R. Dudley, Uniform Cental Limit Theorems, Cambridge Studies B. Efron, The geometry of exponential G. S. Annals of in Operations Research Letters, Vol. Kannan and Probability 16, No. 4, November 1, tests 55-93. advanced mathematics (2000). Statistics, 6 (1978), no. 2, FiSHMAN, Choosing sample path length and number of sample paths when A. Frieze, R. J. families, and consistent likelihood estimation restrictions. Journal of Econometrics, 117 (2003), no. 362-376. starting at steady state, 1994, pp. 209-220. N. Polson, Sampling from log-concave functions. Annals of Applied pp. 812-834. 4, Geweke and M. Keane, Computationally Intensive Methods for Integration in Econometrics, Hand- book of Econometrics, Volume Geyer, 5, by J. S. Ghosal, Asymptotic normality Practical J.J. Heckman and Markov Chain Monte C. E. Leamer, 2001 Elsvier Science. Carlo, Statistical Science 1992, Vol. 7, of posterior distributions for exponential families No. 4, when 473-511. the number of parameters tends to infinity. Journal of Multivariate Analysis, vol 73, 2000, 49-68. L. P, Hansen and K. J. tional expectations models, Singleton, Generalized instrumental variables estimation of nonlinear Econometrica 50 (1982), no. L Ibragimov and R. Has'minskii, G. W. 5, Statistical Estimation: ra- 1269-1286. Asymptotic Theory, Springer, Berlin (1981). Imbens, One-step estimators for over-identified generalized method of moments models. Rev. Econom. Stud. 64 (1997), no. 3, 359-383. M. Jerrum and A. Sinclair, Conductance and the rapid mixing property for approximation of permanent resolved. Proceedings of the 20th Annual Computing ACM Markov chains: the symposium on Theory of (1988), 235-244. M. Jerrum and A. Sinclair, Approximating the peimanent, SIAM Journal on Computing, 18:6 (1989), 1149-1178. R. Kannan and G. Li, Sampling according on Foundations of Computer Science R. Kannan, ization R. C. KiPNIS J. L. convex bodies, AND L. Discr. Comput. Goem., volume 13, Annual Symposium (1995), pp. 541-559. R. S. S. Varadhan, Central to simple exclusions, Statistics. Springer- Verlag, Tian and MCMC samplers. algorithm, for Structures and Algorithms, volume 11, (1997), pp. 1-50. limit L.J. theorem for additive functionals of reversible processes Comm. Math. Lehmann and G. Gasella, Theory J.S. Liu, L. density, 37th Lovasz, and M. Simonovits, Random walks and an 0'{n^) volume Random and applications E. normal '96), pp. 204. Lovasz, and M. Simonovits, Isoperimetric Problems for Convex Bodies and a Local- L. Lemma, Kannan, to the multivariate (FOGS Phys., 104, 1-19. of point estimation. Second edition. Springer Texts in New York, 1998. Wei Implementation of estimating-function based inference proced.ures with Journal of American Statistical Association, to appear. [29] J.S. Liu, [30] L. Random [31] L. L. Strategies in Scientific Computing, Springer- Verlag, in New York 2001. Convex Bodies and an Improved Volume Algorithm, Structures and Algorithm, 4:4 (1993), 359-412. LoVASZ AND appear [32] Monte Carlo LOVASZ AND M. SiMONOVlTS, Random Walks in LoVASZ and Vempala, The Geometry S. Random S. of Logconcave Functions and Sampling Algorithms, To Structures and Algorithms. Vempala, Hit-and-Run is Fast and Fun, Technical report MSR-TR-2003-05, 2003. Available at http://www-math.mit.edu/ vempala/papers/logcon-hitrun.ps. [33] L. LovASZ and S. Vempala, Hit-and-Run from a Comer, Proc. Theory of Computation (STOC'04)(2004), pp. 310-314.Available at of the 36th ACM Symp. on the http://www-math.mit.edu/ vem- pala/papers/start.ps. [34] N, POLSON, Convergence of Markov Chain Monte Carlo Algorithms, Bayesian Statistics 5, pp. 297-321 (1996). [35] S. PoRTNOY, Asymptotic behavior parameters tends [36] to infinity. Ann. 1, when the number of 356-366. G. O. Roberts and R. L. Tweedie, Exponential Convergence of Langevin Diffusions and Their Discrete Approximations Bernoulli, [37] of likelihood methods for exponential families Statist. 16 (1988), no. C. J. 2, (1996), pp. 341-364. Stone, M, H. Hansen, C, Kooperberg, and Y. K. Truong, Polynomial splines and their tensor products in extended linear modeling. With discussion and a rejoinder by the authors and Jianhua [38] Z. Huang. Annals of A. W. van der Vaart and Statistics, 25 (1997), no. 4, J. 1371-1470. A. Wellner, Weak Convergence and Empirical Processes Spring Series in Statistics (1996). [39] S. Vempala, Geometric Random Walks: A Publications Volume Survey, Combinatorial and Computational Geometry, 52, 2005. 3521 83 MSRI