Recent Progress and Open Problems in Algorithmic Convex Geometry*

Santosh S. Vempala
School of Computer Science, Algorithms and Randomness Center
Georgia Tech, Atlanta, USA 30332
vempala@gatech.edu

Abstract. This article is a survey of developments in algorithmic convex geometry over the past decade. These include algorithms for sampling, optimization, integration, rounding and learning, as well as mathematical tools such as isoperimetric and concentration inequalities. Several open problems and conjectures are discussed on the way.

Contents
1 Introduction
  1.1 Basic definitions
2 Problems
  2.1 Optimization
  2.2 Integration/Counting
  2.3 Learning
  2.4 Sampling
  2.5 Rounding
3 Geometric inequalities and conjectures
  3.1 Rounding
  3.2 Measure and concentration
  3.3 Isoperimetry
  3.4 Localization
4 Algorithms
  4.1 Geometric random walks
  4.2 Annealing
  4.3 PCA

* This work was supported by NSF awards AF-0915903 and AF-0910584.
© Santosh S. Vempala; licensed under Creative Commons License NC-ND. Leibniz International Proceedings in Informatics, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

1 Introduction

Algorithmic convex geometry is the study of the algorithmic aspects of convexity. While convex geometry, closely related to finite-dimensional functional analysis, is a rich, classical field of mathematics, it has enjoyed a resurgence since the late 20th century, coinciding with the development of theoretical computer science. These two fields started coming together in the 1970's with the development of geometric algorithms for convex optimization, notably the Ellipsoid method. The ellipsoid algorithm uses basic notions in convex geometry with profound consequences for computational complexity, including the polynomial-time solution of linear programs. This development heralded the use of more geometric ideas, including Karmarkar's algorithm and interior point methods. It brought to the forefront fundamental geometric questions such as rounding, i.e., applying affine transformations to sets in Euclidean space to make them easier to handle. The ellipsoid algorithm also resulted in the striking application of continuous geometric methods to the solution of discrete optimization problems such as submodular function minimization. A further startling development occurred in 1989 with the discovery of a polynomial method for estimating the volume of a convex body. This was especially surprising in light of an exponential lower bound for any deterministic algorithm for volume estimation.
The crucial ingredient in the volume algorithm was an efficient procedure to sample nearly uniform random points from a convex body. Over the past two decades the algorithmic techniques and analysis tools for sampling and volume estimation have been greatly extended and refined. One noteworthy aspect here is the development of isoperimetric inequalities that are of independent interest as purely geometric properties and lead to new directions in the study of convexity. Efficient sampling has also led to alternative polynomial algorithms for convex optimization.

Besides optimization, integration and sampling, our focus problems in this survey are rounding and learning. Efficient algorithms for the first three problems rely crucially on rounding. Approximate rounding can be done using the ellipsoid method, and with a tighter guarantee using random sampling. Learning is a problem that generalizes integration. For example, while we know how to efficiently compute the volume of a polytope, learning one efficiently from random samples is an open problem. For general convex bodies, volume computation is efficient, while learning requires a superpolynomial number of samples (volume computation uses a membership oracle while learning gets only random samples).

This survey is not intended to be comprehensive. There are numerous interesting developments in convex geometry, related algorithms and their growing list of applications and connections to other areas (e.g., data privacy, quantum computing, statistical estimators, etc.) that we do not cover here, being guided instead by our five focus problems.

1.1 Basic definitions

Recall that a subset S of R^n is convex if for any two points x, y ∈ S, the interval [x, y] ⊆ S. A function f : R^n → R_+ is said to be logconcave if for any two points x, y ∈ R^n and any λ ∈ [0, 1],

  f(λx + (1 − λ)y) ≥ f(x)^λ f(y)^{1−λ}.

The indicator function of a convex set and the density function of a Gaussian distribution are two canonical examples of logconcave functions.

A density function f : R^n → R_+ is said to be isotropic if its centroid is the origin and its covariance matrix is the identity matrix. In terms of the associated random variable X, this means that

  E(X) = 0 and E(XX^T) = I.

This condition is equivalent to saying that for every unit vector v ∈ R^n,

  ∫_{R^n} (v^T x)^2 f(x) dx = 1.

We say that f is C-isotropic if

  1/C ≤ ∫_{R^n} (v^T x)^2 f(x) dx ≤ C

for every unit vector v. A convex body is said to be isotropic if the uniform density over it is isotropic. For any full-dimensional distribution D with bounded second moments, there is an affine transformation of space that puts it in isotropic position, namely, if z = E_D(X) and A = E((X − z)(X − z)^T), then y = A^{−1/2}(X − z) has an isotropic density.

For a density function f : R^n → R_+, we let π_f be the measure associated with it. The following notions of distance between two distributions P and Q will be used. The total variation distance of P and Q is

  d_tv(P, Q) = sup_{A ∈ A} |P(A) − Q(A)|.

The χ-squared distance of P with respect to Q is

  χ²(P, Q) = ∫_{R^n} ( (dP(u) − dQ(u)) / dQ(u) )² dQ(u) = ∫_{R^n} (dP(u)/dQ(u)) dP(u) − 1.

The first term in the last expression is also called the L_2 distance of P w.r.t. Q. Finally, P is said to be M-warm w.r.t. Q if

  M = sup_{A ∈ A} P(A)/Q(A).
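To make these distance notions concrete, the following small Python sketch (an illustration added here, not part of the original survey) computes d_tv, χ² and the warmness M for two distributions on a finite state space, where the definitions above specialize to sums over points.

import numpy as np

def tv_distance(p, q):
    # d_tv(P, Q) = sup_A |P(A) - Q(A)| = (1/2) * sum_x |p(x) - q(x)| for discrete P, Q.
    return 0.5 * np.abs(p - q).sum()

def chi_squared(p, q):
    # chi^2(P, Q) = sum_x (p(x) - q(x))^2 / q(x) = sum_x p(x)^2 / q(x) - 1.
    return np.sum((p - q) ** 2 / q)

def warmness(p, q):
    # M = sup_A P(A)/Q(A) = max_x p(x)/q(x) for discrete P, Q.
    return np.max(p / q)

# Toy example: P is a slightly perturbed version of the uniform distribution Q.
q = np.full(10, 0.1)
p = np.array([0.14, 0.06, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.12, 0.08])
print(tv_distance(p, q), chi_squared(p, q), warmness(p, q))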
A discrete-time Markov chain is defined using a triple (K, A, {P_u : u ∈ K}) along with a starting distribution Q_0, where K is the state space, A is a set of measurable subsets of K and P_u is a measure over K. The chain is a sequence of elements of K, w_0, w_1, ..., where w_0 is chosen from Q_0 and each subsequent w_i is chosen from P_{w_{i−1}}. Thus, the choice of w_{i+1} depends only on w_i and is independent of w_0, ..., w_{i−1}. It is easy to verify that a distribution Q is stationary iff for every A ∈ A,

  ∫_A P_u(K \ A) dQ(u) = ∫_{K\A} P_u(A) dQ(u).

The conductance of a subset A is defined as

  φ(A) = ∫_A P_u(K \ A) dQ(u) / min{Q(A), Q(K \ A)}

and the conductance of the Markov chain is

  φ = min_A φ(A) = min_{0 < Q(A) ≤ 1/2} ∫_A P_u(K \ A) dQ(u) / Q(A).

The following weaker notion of conductance will also be useful. For any 0 ≤ s < 1/2, the s-conductance of a Markov chain is defined as

  φ_s = min_{A : s < Q(A) ≤ 1/2} ∫_A P_u(K \ A) dQ(u) / (Q(A) − s).

We let Var_f(g) denote the variance of a real-valued function g w.r.t. a density function f. The unit ball in R^n is denoted by B_n. We use O*(.) to suppress error parameters and terms that are polylogarithmic in the leading terms.
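As a toy illustration of the conductance definition (an addition, with an illustrative chain chosen by me rather than taken from the survey), the sketch below computes φ exactly for a lazy symmetric random walk on a five-vertex path by enumerating all subsets of the state space; the stationary distribution of this chain is uniform.

import itertools
import numpy as np

def conductance(P, Q):
    """Brute-force conductance of a finite chain with transition matrix P and
    stationary distribution Q: the minimum over nonempty proper subsets A of
    sum_{u in A} Q(u) P(u, A^c)  /  min(Q(A), Q(A^c))."""
    n = len(Q)
    best = np.inf
    for r in range(1, n):
        for A in itertools.combinations(range(n), r):
            A = list(A)
            Ac = [i for i in range(n) if i not in A]
            flow = sum(Q[u] * P[u, Ac].sum() for u in A)
            best = min(best, flow / min(Q[A].sum(), Q[Ac].sum()))
    return best

# Lazy random walk on a path with 5 vertices: stay with prob 1/2, move to each
# neighbor with prob 1/4 (unused probability is added to the holding probability
# at the endpoints).  The matrix is doubly stochastic, so Q is uniform.
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    if i > 0:
        P[i, i - 1] = 0.25
    else:
        P[i, i] += 0.25
    if i < n - 1:
        P[i, i + 1] = 0.25
    else:
        P[i, i] += 0.25
Q = np.full(n, 1.0 / n)
print("conductance:", conductance(P, Q))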
2 Problems

We now state our focus problems more precisely and mention the current bounds on their complexity. Algorithms achieving these bounds are discussed in Section 4. As input to these problems, we typically assume a general function oracle, i.e., access to a real-valued function as a blackbox. The complexity will be measured by both the number of oracle calls and the number of arithmetic operations.

2.1 Optimization

Input: an oracle for a function f : R^n → R_+; a point x_0 with f(x_0) > 0; an error parameter ε.
Output: a point x such that f(x) ≥ max f − ε.

Linear optimization over a convex set K is the special case when f = e^{−c^T x} χ_K(x), where c^T x is a linear function to be minimized and K is the convex set. This is equivalent to a membership oracle for K in this case.

Given only access to the function f, for any logconcave f the complexity of optimization is poly(n, log(1/ε)), by either an extension of the Ellipsoid method to handle membership oracles [21], Vaidya's algorithm [60], or a reduction to sampling [6, 25, 45]. This line of work began with the breakthrough result of Khachiyan [34] showing that the Ellipsoid method proposed by Yudin and Nemirovskii [65] is polynomial for explicit linear programs. The current best time complexity is O*(n^{4.5}), achieved by a reduction to logconcave sampling [45].

For optimization of an explicit function over a convex set, the oracle is simply membership in the convex set. A stronger oracle is typically available, namely a separation oracle that gives a hyperplane separating the query point from the convex set, in the case when the point lies outside. Using a separation oracle, the ellipsoid algorithm has complexity O*(n^2), while Vaidya's algorithm and that of Bertsimas and Vempala have complexity O*(n).

The current frontier for polynomial-time algorithms is when f is a logconcave function in R^n. Slight generalizations, e.g., linear optimization over a star-shaped body, are NP-hard, even to solve approximately [49, 9].

2.2 Integration/Counting

Input: an oracle for a function f : R^n → R_+; a point x_0 with f(x_0) > 0; an error parameter ε.
Output: a real number A such that

  (1 − ε) ∫ f ≤ A ≤ (1 + ε) ∫ f.

Dyer, Frieze and Kannan [15, 16] gave a polynomial-time randomized algorithm for estimating the volume of a convex body (the special case above when f is the indicator function of a convex body) with complexity poly(n, 1/ε, log(1/δ)), where δ is the probability that the algorithm's output is incorrect. This was extended to integration of logconcave functions by Applegate and Kannan [3], with an additional dependence of the complexity on the Lipschitz parameter of f. Following a long line of improvements, the current best complexity is O*(n^4), using a variant of simulated annealing quite similar to the algorithm for optimization [45].

In the discrete setting, the natural general problem is when f is a distribution over the integer points in a convex set, e.g., f is 1 only for integer points in a convex body K. While there are many special cases that are well-studied, e.g., perfect matchings of a graph [23], not much is known in general except under a roundness condition [30].

2.3 Learning

Input: random points drawn from an unknown distribution in R^n and their labels given by an unknown function f : R^n → {0, 1}; an error parameter ε.
Output: a function g : R^n → {0, 1} such that P(g(x) ≠ f(x)) ≤ ε over the unknown distribution.

The special case when f is an unknown linear threshold function can be solved by using any linear classifier that correctly labels a sample of size

  (C/ε) ( n log(1/ε) + log(1/δ) ),

where C is a constant. Such a classifier can be learned by using any efficient algorithm for linear programming. The case when f is a polynomial threshold function of bounded degree can also be learned efficiently, essentially by reducing to the case of linear thresholds via a linearization of the polynomial.
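The linear-programming route just mentioned can be sketched as follows; the data, the margin normalization and the use of scipy's LP solver are illustrative choices of mine, assuming a labeled sample that is consistent with some halfspace.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 5, 200

# Hypothetical data: points from a Gaussian, labeled by an unknown halfspace w*.x >= 0.
w_star = rng.standard_normal(n)
X = rng.standard_normal((m, n))
y = np.where(X @ w_star >= 0, 1.0, -1.0)

# Feasibility LP: find (w, b) with y_i (w . x_i + b) >= 1 for all i, i.e.
# -y_i * (x_i, 1) . (w, b) <= -1, with free variables and a constant objective.
A_ub = -y[:, None] * np.hstack([X, np.ones((m, 1))])
b_ub = -np.ones(m)
res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + 1))
w, b = res.x[:n], res.x[n]
print("consistent on sample:", np.all(y * (X @ w + b) > 0))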
2.4 Sampling

Input: an oracle for a function f : R^n → R_+; a point x_0 with f(x_0) > 0; an error parameter ε.
Output: a random point x whose distribution D is within ε total variation distance of the distribution with density proportional to f.

As the key ingredient of their volume algorithm [16], Dyer, Frieze and Kannan gave a polynomial-time algorithm for the case when f is the indicator function over a convex body. The complexity of the sampler itself was poly(n, 1/ε). Later, Applegate and Kannan [3] generalized this to smooth logconcave functions, with complexity polynomial in n, 1/ε and the Lipschitz parameter of the logconcave function. In [45] the complexity was improved to O*(n^4) and the dependence on the Lipschitz parameter was removed.

2.5 Rounding

We first state this problem for a special case.

Input: a membership oracle for a convex body K; a point x_0 ∈ K; an error parameter ε.
Output: a linear transformation A and a point z such that K' = A(K − z) satisfies one of the following:

Exact sandwiching: B_n ⊆ K' ⊆ R B_n.   (1)

Second moment sandwiching: for every unit vector u, 1 − ε ≤ E_{K'}((u · x)^2) ≤ 1 + ε.

Approximate sandwiching: vol(B_n ∩ K') ≥ (1/2) vol(B_n) and vol(K' ∩ R B_n) ≥ (1/2) vol(K').   (2)

When the input is a general density function, the second moment condition extends readily without any change to the statement.

The classical Löwner-John theorem says that for any convex body K, the ellipsoid E of maximal volume contained in K has the property that K ⊆ nE (and K ⊆ √n E if K is centrally symmetric). Thus, by using the transformation that makes this ellipsoid a ball, one achieves R = n in the first condition above (exact sandwiching). Moreover, this is the best possible, as shown by the simplex. However, computing the ellipsoid of maximal volume is hard. Lovász [41] showed how to compute an ellipsoid that satisfies the containment with R = n^{3/2} using the Ellipsoid algorithm. This remains the best-known deterministic algorithm. A randomized algorithm can achieve R = n(1 + ε) for any ε > 0 using a simple reduction to random sampling [27]. In fact, all that one needs is n · C(ε) random samples [8, 57, 54, 1], and then the transformation to be computed is the one that puts the sample in isotropic position.

3 Geometric inequalities and conjectures

We begin with some fundamental inequalities. For two subsets A, B of R^n, their Minkowski sum is

  A + B = {x + y : x ∈ A, y ∈ B}.

The Brunn-Minkowski inequality says that if A, B and A + B are measurable, compact subsets of R^n, then

  vol(A + B)^{1/n} ≥ vol(A)^{1/n} + vol(B)^{1/n}.   (3)

The following extension is the Prékopa-Leindler inequality: for any three functions f, g, h : R^n → R_+ satisfying

  h(λx + (1 − λ)y) ≥ f(x)^λ g(y)^{1−λ}

for every x, y ∈ R^n and λ ∈ [0, 1],

  ∫_{R^n} h(x) dx ≥ ( ∫_{R^n} f(x) dx )^λ ( ∫_{R^n} g(x) dx )^{1−λ}.   (4)

The product and the minimum of two logconcave functions are also logconcave. We also have the following fundamental properties [12, 39, 55, 56].

Theorem 1. All marginals and the distribution function of a logconcave function are logconcave. The convolution of two logconcave functions is logconcave.

In the rest of this section, we describe inequalities about rounding, concentration of mass and isoperimetry. These inequalities play an important role in the analysis of algorithms for our focus problems. Many of the inequalities apply to logconcave functions. The latter represent the limit to which methods that work for convex bodies can be extended. This is true for all the algorithms we will see, with a few exceptions, e.g., an extension of isoperimetry and sampling to star-shaped bodies. We note here that Milman [50] has shown a general approximate equivalence between concentration and isoperimetry over convex domains.

3.1 Rounding

We next describe a set of properties related to affine transformations. The first one below is from [27].

Theorem 2. [27] Let K be a convex body in R^n in isotropic position. Then,

  √((n+1)/n) B_n ⊆ K ⊆ √(n(n+1)) B_n.

Thus, in terms of the exact sandwiching condition (1), isotropic position achieves a factor of n, and this ratio of the radii of the outer and inner balls, n, is the best possible, as shown by the n-dimensional simplex. A bound of O(n) was earlier established by Milman and Pajor [51]. Sandwiching was extended to logconcave functions in [48] as follows.

Theorem 3. [48] Let f be a logconcave density in isotropic position. Then for any t > 0, the level set L(t) = {x : f(x) ≥ t} contains a ball of radius π_f(L(t))/e. Also, the point z = arg max f satisfies ||z|| ≤ n + 1.

For general densities, it is convenient to define a rounding parameter as follows. For a density function f, let R² = E_f(||X − EX||²) and let r be the radius of the largest ball contained in the level set of f of measure 1/8. Then R/r is the rounding parameter, and we say a function is well-rounded if R/r = O(√n). By the above theorem, any logconcave density in isotropic position is well-rounded.

An isotropic transformation can be estimated for any logconcave function using random samples drawn from the density proportional to f. In fact, following the work of Bourgain [8] and Rudelson [57], Adamczak et al. [1] showed that O(n) random points suffice to achieve, say, 2-isotropic position with high probability.

Theorem 4. [1] Let x_1, ..., x_m be random points drawn from an isotropic logconcave density f. Then for any ε ∈ (0, 1) and t ≥ 1, there exists C(ε, t) such that if m ≥ C(ε, t) n, then

  || (1/m) Σ_{i=1}^m x_i x_i^T − I || ≤ ε

with probability at least 1 − e^{−ct√n}. Moreover, C(ε, t) = C (t^4/ε^2) log^2(2t^2/ε^2) suffices for absolute constants c, C.
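A quick Monte Carlo check in the spirit of Theorem 4 (an illustration under assumptions added here, not a computation from [1]): draw m = 10n points from an isotropic logconcave density, in this case the uniform distribution on the cube [-√3, √3]^n, and measure the operator-norm error of the empirical covariance matrix.

import numpy as np

rng = np.random.default_rng(1)
n = 50
m = 10 * n   # a constant multiple of the dimension, in the spirit of Theorem 4

# Uniform density on the cube [-sqrt(3), sqrt(3)]^n: logconcave and isotropic
# (each coordinate has mean 0 and variance 1).
X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(m, n))

emp_cov = (X.T @ X) / m
error = np.linalg.norm(emp_cov - np.eye(n), ord=2)  # spectral norm of (emp_cov - I)
print("operator-norm error with m = 10n samples:", error)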
The following conjecture was alluded to in [44].

Conjecture 5. There exists a constant C such that for any convex body K in R^n, there exists an ellipsoid E that satisfies approximate sandwiching:

  vol(E ∩ K) ≥ (1/2) vol(E) and vol(K ∩ C log n · E) ≥ (1/2) vol(K).

3.2 Measure and concentration

The next inequality was proved by Grünbaum [22] (for the special case of the uniform density over a convex body).

Lemma 6. [22] Let f : R^n → R_+ be a logconcave density function, and let H be any halfspace containing its centroid. Then

  ∫_H f(x) dx ≥ 1/e.

This was extended to halfspaces whose bounding hyperplanes come close to the centroid in [6, 48].

Lemma 7. [6, 48] Let f : R^n → R_+ be an isotropic logconcave density function, and let H be any halfspace within distance t of its centroid. Then

  ∫_H f(x) dx ≥ 1/e − t.

The above lemma easily implies a useful concentration result.

Theorem 8. Let f be a logconcave density in R^n and let z be the average of m random points from π_f. If H is a halfspace containing z, then

  E(π_f(H)) ≥ 1/e − √(n/m).

A strong radial concentration result was shown by Paouris [54].

Lemma 9. [54] Let K be an isotropic convex body. Then, for any t ≥ 1,

  P(||X|| ≥ c t √n) ≤ e^{−t√n},

where c is an absolute constant.

This implies that all but an exponentially small fraction of the mass of an isotropic convex body lies in a ball of radius O(√n).

The next inequality is a special case of a more general inequality due to Brascamp and Lieb. It also falls in a family called Poincaré-type inequalities. Recall that Var_μ(f) denotes the variance of a real-valued function f w.r.t. a density function μ.

Theorem 10. Let γ be the standard Gaussian density in R^n. Let f : R^n → R be a smooth function. Then

  Var_γ(f) ≤ ∫_{R^n} ||∇f||² dγ.

A smooth function here is one that is locally Lipschitz. An extension of this inequality to logconcave restrictions was developed in [63], eliminating the need for smoothness and deriving a quantitative bound. In particular, it shows how variances go down when the standard Gaussian is restricted to a convex set.

Theorem 11. [63] Let g be the standard Gaussian density function in R^n and let f : R^n → R_+ be any logconcave function. Define the function h to be the density proportional to their product, i.e.,

  h(x) = f(x) g(x) / ∫_{R^n} f(x) g(x) dx.

Then, for any unit vector u ∈ R^n,

  Var_h(u · x) ≤ 1 − e^{−b²}/(2π),

where the support of f along u is [a_0, a_1] and b = min{|a_0|, |a_1|}.

The next conjecture is called the variance hypothesis or thin shell conjecture.

Conjecture 12 (Thin shell). For X drawn from a logconcave isotropic density f in R^n,

  Var_f(||X||²) ≤ C n.

We note here that for an isotropic logconcave density,

  σ_n² = E_f((||X|| − √n)²) ≤ (1/n) Var_f(||X||²) ≤ C σ_n²,

i.e., the deviation of ||X|| from √n and the deviation of ||X||² from E(||X||²) = n are closely related. Thus, the conjecture implies that most of the mass of a logconcave distribution lies in a shell of constant thickness. A bound of σ_n ≤ √n is easy to establish for any isotropic logconcave density. Klartag showed that σ_n = o(√n) [36, 37], a result that can be interpreted as a central limit theorem (since the standard deviation is now a smaller order term than the expectation). The current best bound on σ_n is due to Fleury [19], who showed that σ_n ≤ n^{3/8}. We state his result below.

Lemma 13. [19] For any isotropic logconcave measure μ,

  P( | ||x|| − √n | ≥ t n^{3/8} ) ≤ C e^{−ct},

where c, C are constants.
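The thin-shell phenomenon is easy to observe numerically for product densities (a toy illustration of my own, not evidence for the conjecture in general): for the isotropic cube below, the standard deviation of ||X|| stays bounded as n grows while E||X|| tracks √n.

import numpy as np

rng = np.random.default_rng(2)
for n in (10, 100, 1000):
    # Isotropic logconcave density: uniform on the cube [-sqrt(3), sqrt(3)]^n.
    X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(20000, n))
    norms = np.linalg.norm(X, axis=1)
    print(f"n={n:5d}  mean ||X||={norms.mean():8.3f}  sqrt(n)={np.sqrt(n):8.3f}  "
          f"std of ||X||={norms.std():6.3f}")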
3.3 Isoperimetry

The following theorem is from [14], improving on a theorem in [43] by a factor of 2. For two subsets S_1, S_2 of R^n, we let d(S_1, S_2) denote the minimum Euclidean distance between them.

Theorem 14. [14] Let S_1, S_2, S_3 be a partition into measurable sets of a convex body K of diameter D. Then,

  vol(S_3) ≥ (2 d(S_1, S_2)/D) min{vol(S_1), vol(S_2)}.

A limiting (and equivalent) version of this inequality is the following. For any subset S of a convex body K of diameter D, let ∂S ∩ K denote the interior boundary of S w.r.t. K. Then

  vol_{n−1}(∂S ∩ K) ≥ (2/D) min{vol(S), vol(K \ S)},

i.e., the surface area of S inside K is large compared to the volumes of S and K \ S. This is in direct analogy with the classical isoperimetric inequality, which says that the surface area to volume ratio of any measurable set is at least the ratio for a ball.

Henceforth, for a density function f, we use Φ_f(S) to denote the isoperimetric ratio of a subset S, i.e.,

  Φ_f(S) = π_f(∂S) / min{π_f(S), 1 − π_f(S)}  and  Φ_f = inf_{S ⊆ R^n} Φ_f(S).

Theorem 14 can be generalized to arbitrary logconcave measures.

Theorem 15. Let f : R^n → R_+ be a logconcave function whose support has diameter D and let π_f be the induced measure. Then for any partition of R^n into measurable sets S_1, S_2, S_3,

  π_f(S_3) ≥ (2 d(S_1, S_2)/D) min{π_f(S_1), π_f(S_2)}.

The conclusion can be equivalently stated as Φ_f ≥ 2/D. For Euclidean distance, this latter statement, applied to the subset S_1, essentially recovers the above lower bound on π_f(S_3).

Next we ask if logconcave functions are the limit of such isoperimetry. Theorem 15 can be extended to s-concave functions in R^n for s ≥ −1/(n − 1), with only a loss of a factor of 2 [10]. A nonnegative function f is s-concave if f(x)^s is a concave function of x. Logconcave functions correspond to s → 0. The only change in the conclusion is by a factor of 2 on the RHS.

Theorem 16. [10] Let f : R^n → R_+ be an s-concave function with s ≥ −1/(n − 1) and with support of diameter D, and let π_f be the induced measure. Then for any partition of R^n into measurable sets S_1, S_2, S_3,

  π_f(S_3) ≥ (d(S_1, S_2)/D) min{π_f(S_1), π_f(S_2)}.

This theorem is tight, in that there exist s-concave functions with s ≤ −1/(n − 1 − ε), for any ε > 0, whose isoperimetric ratio is exponentially small in εn.

Logconcave and s-concave functions are natural extensions of convex bodies. It is natural to ask what classes of nonconvex sets/functions might have good isoperimetry. In recent work [9], the isoperimetry of star-shaped bodies was studied. A star-shaped set S is one that contains at least one point x such that for any y ∈ S, the segment [x, y] is also in S. The set of all such points x, for which lines through x have convex intersections with S, is called its kernel K_S. The kernel is always a convex set.

Theorem 17. [9] Let S_1, S_2, S_3 be a partition into measurable sets of a star-shaped body S of diameter D. Let η = vol(K_S)/vol(S). Then,

  vol(S_3) ≥ (η d(S_1, S_2)/(4D)) min{vol(S_1), vol(S_2)}.

In terms of the diameter, Theorem 14 is the best possible, as shown by a cylinder. A more refined inequality is obtained in [27, 48] using the average distance of a point to the center of gravity (in place of the diameter). It is possible for a convex body to have much larger diameter than average distance to its centroid (e.g., a cone). In such cases, the next theorem provides a better bound.
Theorem 18. [27] Let f be a logconcave density in R^n and let π_f be the corresponding measure. Let z_f be the centroid of f and define M(f) = E_f(|x − z_f|). Then, for any partition of R^n into measurable sets S_1, S_2, S_3,

  π_f(S_3) ≥ (ln 2 / M(f)) d(S_1, S_2) π_f(S_1) π_f(S_2).

For an isotropic density, M(f)² ≤ E_f(|x − z_f|²) = n, while the diameter could be unbounded (e.g., an isotropic Gaussian). Thus, if A_f is the covariance matrix of the logconcave density f, the inequality can be re-stated as

  π_f(S_3) ≥ (c / √(Tr(A_f))) d(S_1, S_2) π_f(S_1) π_f(S_2).

A further refinement is called the KLS hyperplane conjecture [27]. Let λ_1(A) be the largest eigenvalue of a symmetric matrix A.

Conjecture 19 (KLS). Let f be a logconcave density in R^n. There is an absolute constant c such that

  Φ_f ≥ c / √(λ_1(A_f)).

The KLS conjecture implies the thin shell conjecture (Conj. 12). Even the thin shell conjecture would improve the known isoperimetry of isotropic logconcave densities, due to the following result of Bobkov [7].

Theorem 20. [7] For any logconcave density function f in R^n,

  Φ_f ≥ c / Var(||X||²)^{1/4}.

Combining this with Fleury's bound (Lemma 13), we get that for an isotropic logconcave function in R^n,

  Φ_f ≥ c n^{−7/16}.   (5)

This is slightly better than the bound of c n^{−1/2} implied by Theorem 18, since E(||X||²) = n for an isotropic density.

The next conjecture is perhaps the most well-known in convex geometry, and is called the slicing problem, the isotropic constant conjecture or simply the hyperplane conjecture.

Conjecture 21 (Slicing). There exists a constant C such that for an isotropic logconcave density f in R^n,

  f(0) ≤ C^n.

In fact, a result of Ball [4] shows that it suffices to prove such a bound for the case of f being the uniform density over an isotropic convex body. In this case, the conjecture says the volume of an isotropic convex body in R^n is at least c^n for some constant c. The current best bound on L_n = sup_f f(0)^{1/n} is O(n^{1/4}) [8, 35]. Recently, Eldan and Klartag [18] have shown that the slicing conjecture is implied by the thin shell conjecture (and therefore by the KLS conjecture as well, a result that was earlier shown by Ball).

Theorem 22. [18] There exists a constant C such that

  L_n = sup_f f(0)^{1/n} ≤ C σ_n = C sup_f √( E_f((||X|| − √n)²) ),

where the suprema range over isotropic logconcave density functions.

We conclude this section with a discussion of another direction in which classical isoperimetry has been extended. The cross-ratio distance in K (also called the Hilbert metric) between two points u, v in a convex body K is computed as follows. Let p, q be the endpoints of the chord of K through u and v, such that the points occur in the order p, u, v, q. Then

  d_K(u, v) = (|u − v| |p − q|) / (|p − u| |v − q|) = (p : v : u : q),

where (p : v : u : q) denotes the classical cross-ratio. We can now define the cross-ratio distance between two sets S_1, S_2 as

  d_K(S_1, S_2) = min{d_K(u, v) : u ∈ S_1, v ∈ S_2}.

The next theorem was proved in [42] for convex bodies and extended to logconcave densities in [47].

Theorem 23. [42] Let f be a logconcave density in R^n whose support is a convex body K and let π_f be the induced measure. Then for any partition of R^n into measurable sets S_1, S_2, S_3,

  π_f(S_3) ≥ d_K(S_1, S_2) π_f(S_1) π_f(S_2).

All the inequalities so far are based on defining the distance between S_1 and S_2 by the minimum of some notion of pairwise distance over pairs. It is reasonable to think that perhaps a much sharper inequality can be obtained by using some average distance between S_1 and S_2.
Such an inequality was proved in [46], leading to a substantial improvement in the analysis of hit-and-run.

Theorem 24. [46] Let K be a convex body in R^n. Let f : K → R_+ be a logconcave density with corresponding measure π_f and let h : K → R_+ be an arbitrary function. Let S_1, S_2, S_3 be any partition of K into measurable sets. Suppose that for any pair of points u ∈ S_1 and v ∈ S_2 and any point x on the chord of K through u and v,

  h(x) ≤ (1/3) min(1, d_K(u, v)).

Then

  π_f(S_3) ≥ E_f(h(x)) min{π_f(S_1), π_f(S_2)}.

The coefficient on the RHS has changed from a minimum to an average. The weight h(x) at a point x is restricted only by the minimum cross-ratio distance between pairs u, v from S_1, S_2 respectively such that x lies on the line between them (previously it was the overall minimum). In general, it can be much higher than the minimum cross-ratio distance between S_1 and S_2.

3.4 Localization

Most of the known isoperimetry theorems can be proved using the localization lemma of Lovász and Simonovits [44, 27]. It reduces inequalities in high dimension to 1-dimensional inequalities by a sequence of bisections and a limit argument. To prove that an inequality is true, one assumes it is false and derives a contradiction in the 1-dimensional case.

Lemma 25. [44] Let g, h : R^n → R be lower semi-continuous integrable functions such that

  ∫_{R^n} g(x) dx > 0 and ∫_{R^n} h(x) dx > 0.

Then there exist two points a, b ∈ R^n and a linear function ℓ : [0, 1] → R_+ such that

  ∫_0^1 ℓ(t)^{n−1} g((1 − t)a + tb) dt > 0 and ∫_0^1 ℓ(t)^{n−1} h((1 − t)a + tb) dt > 0.

The points a, b represent an interval, and one may think of ℓ(t)^{n−1} dA as the cross-sectional area of an infinitesimal cone with base area dA. The lemma says that over this cone truncated at a and b, the integrals of g and h are positive. Also, without loss of generality, we can assume that a, b are in the union of the supports of g and h.

A variant of localization was used in [9], where instead of proving an inequality for every needle, one instead averages over the set of needles produced by the localization procedure. Indeed, each step of localization is a bisection with a hyperplane, and for an inequality to hold for the original set, it often suffices for it to hold "on average" over the partition produced, rather than for both parts separately. This approach can be used to prove isoperimetry for star-shaped sets (Theorem 17) or to recover Bobkov's isoperimetric inequality (Theorem 20).

We state here a strong version of a conjecture related to localization, which if true implies the KLS hyperplane conjecture.

Conjecture 26. Let f be a logconcave density function and let P be any partition of R^n such that each part is convex. Then there exists a constant C such that

  Σ_{P ∈ P} σ_f(P)² π_f(P) ≤ C σ_f²,

where σ_f² is the largest variance of f along any direction and σ_f(P)² is the largest variance of f restricted to the set P.

We have already seen that the conjecture holds when f is a Gaussian, with C = 1 (Theorem 11).

4 Algorithms

In this section, we describe the current best algorithms for the five focus problems and several open questions. These algorithms are related to each other, e.g., the current best algorithms for integration and optimization (in the membership oracle model) are based on sampling, the current best algorithms for sampling and rounding proceed in tandem, etc.
In fact, all five problems have an intimate relationship with random sampling. Integration (volume estimation), optimization and rounding are solved by reductions to random sampling; in the case of integration, this is the only known efficient approach. For rounding, the sampling approach gives a better bound than the current best alternatives. As for the learning problem, its input is a random sample.

The Ellipsoid method has played a central role in the development of algorithmic convex geometry. Following Khachiyan's polynomial bound for linear programs [34], the generalization to convex optimization [21, 53, 32] has been a powerful theoretical tool. Besides optimization, the method also gave an efficient rounding algorithm [41] achieving a sandwiching ratio of n^{1.5}, and a polynomial-time algorithm for PAC-learning linear threshold functions. It is a major open problem to PAC-learn an intersection of two halfspaces.

Several polynomial-time algorithms have been proposed for linear programming and convex optimization, including: Karmarkar's algorithm [31] for linear programming, interior-point methods for convex optimization over sets with self-concordant barriers [52], polynomial implementations of the perceptron algorithm for linear [13] and conic [5] programs, simplex-with-rescaling for linear programs [33], and the random walk method [6] that requires only membership oracle access to the convex set. The field of convex optimization continues to be very active, both because of its many applications and due to the search for more practical algorithms that work on larger inputs. One common aspect of all the known polynomial-time algorithms is that they use some type of repeated rescaling of space to effectively make the convex set more "round", leading to a complexity that depends on the accuracy to which the output is computed. In principle, such a dependence can be avoided for linear programming, and it is a major open problem to find a strongly polynomial algorithm for solving linear programs. We note here that although there have been many developments in continuous optimization algorithms in the decades since the ellipsoid method, and in applications to combinatorial problems, there has been little progress in the past two decades on general integer programming, with the current best complexity being n^{O(n)}, from Kannan's improvement [26] of Lenstra's algorithm [40].

4.1 Geometric random walks

Sampling is achieved by rapidly mixing geometric random walks. For an in-depth survey of this topic, up-to-date till 2005, the reader is referred to [62]. Here we outline the main results, including developments since then.

To sample from a density proportional to a function f : R^n → R_+, we set up a Markov chain whose state space is R^n and whose stationary distribution has density proportional to f. At a point x, one step of the chain picks a neighbor y of x from some distribution P_x, where P_x is defined in some efficiently sampleable manner and depends only on x, and then we move to y with some probability or stay at x. For example, for uniformly sampling a convex body K, the ball walk defines the neighbors of x as all points within a fixed distance δ of x, and a random neighbor y is accepted as long as y is also in K. For sampling a general density, one could use the same neighbor set and modify the probability of transition to min{1, f(y)/f(x)}. This rejection mechanism is called the Metropolis filter, and the Markov chain itself is the ball walk with a Metropolis filter.
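A minimal sketch of the ball walk with a Metropolis filter as just described, for a density proportional to a logconcave f restricted to a convex body given by a membership oracle; the particular body, target f, step size δ and step counts below are illustrative choices of mine, not prescribed by the survey.

import numpy as np

rng = np.random.default_rng(3)

def ball_walk(f, in_body, x0, delta, steps):
    """Ball walk with a Metropolis filter: propose a uniform point y in the
    ball of radius delta around x; reject if y leaves the body, otherwise
    accept with probability min(1, f(y)/f(x))."""
    x = np.array(x0, dtype=float)
    n = len(x)
    for _ in range(steps):
        # Uniform point in the delta-ball: random direction times a radius
        # distributed as delta * U^(1/n).
        direction = rng.standard_normal(n)
        direction /= np.linalg.norm(direction)
        y = x + delta * rng.random() ** (1.0 / n) * direction
        if in_body(y) and rng.random() < min(1.0, f(y) / f(x)):
            x = y
    return x

# Illustrative target: a Gaussian restricted to the cube [-1, 1]^n.
n = 10
f = lambda x: np.exp(-0.5 * np.dot(x, x))
in_body = lambda x: np.all(np.abs(x) <= 1.0)
samples = np.array([ball_walk(f, in_body, np.zeros(n), delta=0.3, steps=500)
                    for _ in range(200)])
print("coordinate variances of the samples:", samples.var(axis=0)[:3])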
Another Markov chain that has been successfully analyzed is called hit-and-run. At a point x, we pick a uniformly random line ℓ through x, and then a random point on ℓ according to the density induced by the target distribution along ℓ. In the case of uniformly sampling a convex body, this latter distribution is simply the uniform distribution on a chord; for a logconcave function it is the one-dimensional density proportional to the target density f along the chosen line ℓ. This process has the advantage of not needing a step-size parameter δ.

For ease of analysis, we typically assume that the Markov chain is lazy, i.e., it stays put with probability 1/2 and attempts a move with probability 1/2. Then, it follows that the distribution of the current point converges to a unique stationary distribution, assuming the conductance of the Markov chain is nonzero. The rate of convergence is approximately determined by the conductance, as given by the following theorem due to Lovász and Simonovits [44], extending earlier work by Diaconis and Stroock [11], Alon [2] and Jerrum and Sinclair [58].

Theorem 27. [44] Let Q be the stationary distribution of a lazy Markov chain, with Q_0 its starting distribution and Q_t the distribution of the current point after t steps.

a. Let M = sup_A Q_0(A)/Q(A). Then,

  d_tv(Q_t, Q) ≤ √M (1 − φ²/2)^t.

b. Let 0 < s ≤ 1/2 and H_s = sup{|Q_0(A) − Q(A)| : Q(A) ≤ s}. Then,

  d_tv(Q_t, Q) ≤ H_s + (H_s/s) (1 − φ_s²/2)^t.

c. For any ε > 0,

  d_tv(Q_t, Q) ≤ ε + √( (χ²(Q_0, Q) + 1)/ε ) (1 − φ²/2)^t.

Thus the main quantity to analyze in order to bound the mixing time is the conductance.

Ball walk. The following bound holds on the conductance of the ball walk [28].

Theorem 28. [28] For any 0 ≤ s ≤ 1, we can choose the step-size δ for the ball walk in a convex body K of diameter D so that

  φ_s ≥ s / (200 n D).

The proof of this inequality is the heart of understanding the convergence. Examining the definitions of the conductance φ and the isoperimetric ratio Φ_Q of the stationary distribution, one sees that they are quite similar. In fact, the main difference is the following: in defining conductance, the weight between two points x and y depends on the probability of stepping from x to y in one step; for the isoperimetric ratio, the distance is a geometric notion such as Euclidean distance. Connecting these two notions (geometric distance and probabilistic distance), along with isoperimetry bounds, leads to the conductance bound above. This is the generic line of proof, with each of these components chosen based on the Markov chain being analyzed. For a detailed description, we refer to [62].

Using Theorem 27(b), we conclude that from an M-warm start, the variation distance of Q_t and Q is smaller than ε after

  t ≥ C (M²/ε²) n² D² ln(2M/ε)   (6)

steps, for some absolute constant C. The bound of O(n² D²) on the mixing rate is the best possible in terms of the diameter, as shown by a cylinder. Here D² can be replaced by E(||X − EX||²) using the isoperimetric inequality given by Theorem 18. For an isotropic convex body this gives a mixing rate of O(n³). The current best isoperimetry for convex bodies (5) gives a bound of O(n^{2.875}). The KLS conjecture (19) implies a mixing rate of O(n²), which matches the best possible for the ball walk, as shown for example by an isotropic cube.
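For comparison, here is a sketch of the hit-and-run walk described at the beginning of this subsection, specialized to uniform sampling from a convex body given by a membership oracle; locating the chord endpoints by bisection is an implementation convenience assumed here, as are the example body and the step counts.

import numpy as np

rng = np.random.default_rng(4)

def chord_end(in_body, x, u, lo=0.0, hi=64.0, iters=40):
    """Largest t (up to hi) with x + t*u in the body, found by bisection;
    assumes the body is convex and x lies inside it."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_body(x + mid * u):
            lo = mid
        else:
            hi = mid
    return lo

def hit_and_run(in_body, x0, steps):
    """Hit-and-run for the uniform distribution over a convex body: pick a
    uniformly random line through the current point and move to a uniformly
    random point on the chord it makes with the body."""
    x = np.array(x0, dtype=float)
    n = len(x)
    for _ in range(steps):
        u = rng.standard_normal(n)
        u /= np.linalg.norm(u)
        t_plus = chord_end(in_body, x, u)
        t_minus = chord_end(in_body, x, -u)
        x = x + rng.uniform(-t_minus, t_plus) * u
    return x

# Illustrative body: an axis-aligned cube, so the output is easy to sanity-check.
n = 10
in_body = lambda x: np.all(np.abs(x) <= 1.0)
samples = np.array([hit_and_run(in_body, np.zeros(n), steps=300) for _ in range(200)])
print("coordinate variances (near 1/3 for the cube [-1,1]^n):",
      samples.var(axis=0)[:3])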
Hit-and-run. For hit-and-run, one obtains a qualitatively better bound in the following sense. In the case of the ball walk, the starting distribution heavily affects the mixing rate. It is possible to start at points close to the boundary with very small local conductance, necessitating many attempts to make a single step. Hit-and-run does not have this problem and manages to escape exponentially fast from any corner. This is captured in the next theorem, using the isoperimetric inequality given by Theorem 24.

Theorem 29. [46] The conductance of hit-and-run in a convex body of diameter D is Ω(1/nD).

Theorem 30. [46] Let K be a convex body that contains a unit ball and has centroid z_K. Suppose that E_K(|x − z_K|²) ≤ R² and χ²(Q_0, Q) ≤ M. Then after

  t ≥ C n² R² ln³(M/ε)

steps, where C is an absolute constant, we have d_tv(Q_t, Q) ≤ ε.

The theorem improves on the bound for the ball walk (6) by reducing the dependence on M and ε from polynomial (which is unavoidable for the ball walk) to logarithmic, while maintaining the (optimal) dependence on R and n. For a body in near-isotropic position, R = O(√n) and so the mixing time is O*(n³). One also gets a polynomial bound starting from any single interior point. If x is at distance d from the boundary, then the distribution obtained after one step from x has χ²(Q_1, Q) ≤ (n/d)^n and so, applying the above theorem, the mixing time is O(n⁴ ln³(n/(dε))).

Theorems 29 and 30 have been extended in [45] to arbitrary logconcave functions.

Theorem 31. [45] Let f be a logconcave density function with support of diameter D, and assume that the level set of measure 1/8 contains a unit ball. Then,

  φ_s ≥ c / (n D ln(nD/s)),

where c is a constant.

This implies a mixing rate that nearly matches the bound for convex bodies.

Affine-invariant walks. In both cases above, the reader will notice the dependence on rounding parameters and the improvement achieved by assuming isotropic position. As we will see in the next section, efficient rounding can be achieved by interlacing with sampling. Here we mention two random walks which achieve the rounding "implicitly" by being affine invariant.

The first is a multi-point variant of hit-and-run that can be applied to sampling any density function. It maintains m points x_1, ..., x_m. For each x_j, it picks a random combination of the current points,

  y = Σ_{i=1}^m α_i (x_i − x̄),

where the α_i are drawn from N(0, 1) and x̄ is the average of the current points; the chosen point x_j is replaced by a random point along the line through x_j in the direction of y. This process is affine invariant and hence one can effectively assume that the underlying distribution is isotropic. The walk was analyzed in [6] assuming a warm start. It is open to analyze the rate of convergence from a general starting distribution.

For polytopes whose facets are given explicitly, Kannan and Narayanan [29] analyzed an affine-invariant process called the Dikin walk. At each step, the Dikin walk computes an ellipsoid based on the current point (and the full polytope) and moves to a random point in this ellipsoid. The Dikin ellipsoid at a point x in a polytope Ax ≤ 1 with m inequalities is defined as

  D_x = { z ∈ R^n : (x − z)^T ( Σ_{i=1}^m a_i a_i^T / (1 − a_i^T x)² ) (x − z) ≤ 1 }.

At x, a new point y is sampled from the Dikin ellipsoid and the walk moves to y with probability

  min{ 1, vol(D_x)/vol(D_y) }.

For polytopes with few facets, this process uses fewer arithmetic operations than the current best bounds for the ball walk or hit-and-run.
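A sketch of one step of the Dikin walk along the lines described above; the radius parameter r, the test that x lies in the proposed point's ellipsoid (included here to keep the step reversible) and the example polytope are choices made for this illustration, and the volume ratio is computed through the determinants of the two Dikin matrices.

import numpy as np

rng = np.random.default_rng(5)

def dikin_hessian(A, x):
    # H(x) = sum_i a_i a_i^T / (1 - a_i^T x)^2 for the polytope {x : Ax <= 1}.
    s = 1.0 - A @ x                      # slacks, positive inside the polytope
    return (A / s[:, None] ** 2).T @ A

def in_ellipsoid(H, center, z, r):
    d = z - center
    return d @ H @ d <= r * r

def dikin_step(A, x, r=0.5):
    """One step of the Dikin walk: propose y uniform in the Dikin ellipsoid of
    radius r around x, then apply a Metropolis filter based on the ratio of
    ellipsoid volumes, vol(D_x)/vol(D_y) = sqrt(det H(y) / det H(x))."""
    n = len(x)
    Hx = dikin_hessian(A, x)
    L = np.linalg.cholesky(Hx)
    u = rng.standard_normal(n)
    u *= rng.random() ** (1.0 / n) / np.linalg.norm(u)   # uniform in the unit ball
    y = x + r * np.linalg.solve(L.T, u)
    if np.any(A @ y >= 1.0):
        return x                                         # proposal left the polytope
    Hy = dikin_hessian(A, y)
    if not in_ellipsoid(Hy, y, x, r):
        return x                                         # kept for reversibility
    ratio = np.sqrt(np.linalg.det(Hy) / np.linalg.det(Hx))
    return y if rng.random() < min(1.0, ratio) else x

# Illustrative polytope: the cube [-1, 1]^3 written as Ax <= 1.
A = np.vstack([np.eye(3), -np.eye(3)])
x = np.zeros(3)
for _ in range(2000):
    x = dikin_step(A, x)
print("a point produced by the Dikin walk in the cube:", x)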
Open problems. One open problem for both hit-and-run and the ball walk is to find a single starting point that is as good a start as a warm start. E.g., does one of these walks started at the centroid converge at the same rate as starting from a random point?

The random walks we have considered so far for general convex bodies and density functions rely only on membership oracles or function oracles. Can one do better using a separation oracle? When presented with a point x, such an oracle either declares that the point is in the convex set K or gives a hyperplane that separates x from K. At least for convex bodies, such an oracle is typically realizable in the same complexity as a membership oracle. One attractive process based on a separation oracle is the following reflection walk with parameter δ: at a point x, we pick a random point y in the ball of radius δ. We move along the straight line from x to y, either reaching y or encountering a hyperplane H that separates y from K; in the latter case, we reflect y about H and continue moving towards y. It is possible that for this process, the number of oracle calls (not the number of Markov chain steps) is only O*(nD) rather than O*(n²D²), and even O*(n) for isotropic bodies. It is a very interesting open problem to analyze the reflection walk.

Our next open problem is to understand classes of distributions that can be efficiently sampled by random walks. A recent extension of the convex setting is to sampling star-shaped bodies [9], and this strongly suggests that the full picture is far from clear. One necessary property is good isoperimetry. Can one provide a sufficient condition that depends on isoperimetry and some local property of the density to be sampled? More concretely, for what manifolds with nonnegative curvature (for which isoperimetry is known) can an efficient sampling process be defined?

Our final question in this section is about sampling a discrete subset of a convex body, namely the set of lattice points that lie in the body. In [30], it is shown that this problem can be solved if the body contains a sufficiently large ball, by a reduction to the continuous sampling problem. The idea is that, under this condition, the volume of the body is roughly the same as the number of lattice points, and thus sampling the body and rounding to a nearby lattice point is effective. Can this condition be improved substantially? E.g., can one sample lattice points of a near-isotropic convex body? Or lattice points of a body that contains at least half of a ball of radius O(√n)?
4.2 Annealing

Rounding, optimization and integration can all be achieved by variants of the same algorithmic technique that one might call annealing. The method starts at an "easy" distribution F_0 and goes through a sequence of distributions F_1, ..., F_m, where the F_i are chosen so that moving from F_{i−1} to F_i is efficient and F_m is a target distribution. This approach can be traced back to [43]. It was analyzed for linear optimization over convex sets in [25], for volume computation and rounding of convex sets in [47], and extended to integration, rounding and maximization of logconcave functions in [45].

For example, in the case of convex optimization, the distribution F_m is chosen so that most of its mass is on points whose objective value is at least (1 − ε) times the maximum. Sampling directly from this distribution could be difficult since one begins at an arbitrary point. For integration of logconcave functions, we define a series of functions, with the final function being the one we wish to integrate and the initial one being a function that is easy to integrate. The distribution in each phase has density proportional to the corresponding function. We use samples from the current distribution to estimate the ratios of integrals of consecutive functions. Multiplying all these ratios and the integral of the initial function gives the estimate for the integral of the target function.

For rounding a logconcave density, as shown by Theorem 4, we need O(n) random samples. Generating these directly from the target density could be expensive, since sampling (using a random walk) takes time that depends heavily on how well-rounded the density function is. Instead, we consider again a sequence of distributions, where the first distribution in the sequence is chosen to be easy to sample, and consecutive distributions have the property that if F_i is isotropic, then F_{i+1} is near-isotropic. We then sample F_{i+1} efficiently, apply an isotropic transformation and proceed to the next phase. For both optimization and integration, this rounding procedure is incorporated into the main algorithm to keep the sampling efficient. All three cases are captured in the generic description below.

Annealing
1. For i = 0, ..., m, define
     a_i = b (1 + 1/√n)^i and f_i(x) = f(x)^{a_i}.
2. Let X_0^1, ..., X_0^k be independent random points with density proportional to f_0.
3. For i = 0, ..., m − 1: starting with X_i^1, ..., X_i^k, generate random points X_{i+1} = {X_{i+1}^1, ..., X_{i+1}^k}; update a running estimate g based on these samples; update the isotropy transformation using the samples.
4. Output the final estimate of g.

For optimization, the function f_m is set to be a sufficiently high power of the function f to be maximized, while g is simply the maximum objective value so far. For integration and rounding, f_m = f, the target function to be integrated or rounded. For integration, the estimate g starts out as the integral of f_0 and is multiplied by the ratio ∫ f_{i+1} / ∫ f_i in each step. For rounding, g is simply the estimate of the isotropic transformation for the current function.
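The integration case of the generic algorithm can be sketched as follows, under strong simplifications added for illustration: the target is a Gaussian-like logconcave function, exact samples from each intermediate density are assumed to be available (in the actual algorithm they come from a random walk together with the rounding step), and the schedule is the one from step 1 with an illustrative starting exponent b. Each ratio ∫ f_{i+1} / ∫ f_i is estimated by averaging f_{i+1}/f_i over samples from the density proportional to f_i.

import numpy as np

rng = np.random.default_rng(6)
n = 10
logf = lambda x: -0.5 * np.sum(x * x, axis=-1)   # log of the target f; true ∫ f = (2π)^{n/2}

# Annealing schedule a_i = b (1 + 1/sqrt(n))^i, capped at 1 so that f_m = f.
b = 0.01
a = [b]
while a[-1] < 1.0:
    a.append(min(1.0, a[-1] * (1.0 + 1.0 / np.sqrt(n))))

k = 2000                                      # samples per phase
estimate = (2.0 * np.pi / a[0]) ** (n / 2.0)  # ∫ f^{a_0} is known in closed form
for i in range(len(a) - 1):
    # Exact sampler for the density ∝ f^{a_i} (a Gaussian with variance 1/a_i);
    # in the general algorithm these points come from a geometric random walk.
    X = rng.standard_normal((k, n)) / np.sqrt(a[i])
    # Ratio estimate: ∫ f^{a_{i+1}} / ∫ f^{a_i} = E_{f^{a_i}}[ exp((a_{i+1} - a_i) log f(X)) ].
    estimate *= np.mean(np.exp((a[i + 1] - a[i]) * logf(X)))

print("annealing estimate:", estimate)
print("true integral     :", (2.0 * np.pi) ** (n / 2.0))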
We now state the known guarantees for this algorithm [45]. The complexity improves on the original O*(n^{10}) algorithm of Applegate and Kannan [3].

Theorem 32. [45] Let f be a well-rounded logconcave function. Given ε, δ > 0, we can compute a number A such that, with probability at least 1 − δ,

  (1 − ε) ∫_{R^n} f(x) dx ≤ A ≤ (1 + ε) ∫_{R^n} f(x) dx,

and the number of function calls is

  O( n⁴ log^c (n/(εδ)) ) = O*(n⁴),

where c is an absolute constant.

Any logconcave function can be put in near-isotropic position in O*(n⁴) steps, given a point that maximizes f. Near-isotropic position guarantees that f is well-rounded, as remarked earlier. When the maximum of f is not known in advance or computable quickly, the complexity is a bit higher and is the same as that of maximization.

Theorem 33. [45] For any well-rounded logconcave function f, given ε, δ > 0 and a point x_0 with f(x_0) ≥ β max f, the annealing algorithm finds a point x in O*(n^{4.5}) oracle calls such that, with probability at least 1 − δ,

  f(x) ≥ (1 − ε) max f,

and the dependence on ε, δ and β is bounded by a polynomial in ln(1/(εδβ)).

This improves significantly on the complexity of the Ellipsoid method in the membership oracle model (the latter being Ω(n^{10})). We observe here that in this general oracle model, the upper bound on the complexity of optimization is higher than that of integration. The gap gets considerably wider when we move to star-shaped sets. Integration remains O*(n⁴/η²), where η is the relative measure of the kernel of the star-shaped set, while linear optimization, even approximately, is NP-hard. The hardness holds for star-shaped sets with η being any constant [9], i.e., with the convex kernel taking up any constant fraction of the set. At one level, this is not so surprising, since finding a maximum could require zooming in to a small hidden portion of the set, while integration is a more global property. On the other hand, historically, efficient integration algorithms came later than optimization algorithms even for convex bodies and were considerably more challenging to analyze.

The known analyses of sampling-based methods for optimization rely on the current point being nearly random from a suitable distribution, with the distribution modified appropriately at each step. It is conceivable that a random walk type method can be substantially faster if its goal is optimization and its analysis directly measures progress on some distance function, rather than relying on the intermediate step of producing a random sample.

The modern era of volume algorithms began when Lovász asked the question of whether the volume of a convex body can be estimated from a random sample (unlike annealing, in which the sampling distribution is modified several times). Eldan [17] has shown that this could require a superpolynomial number of points for general convex bodies. On the other hand, it remains open to efficiently compute the volume from a random sample for polytopes with a bounded number of facets.

Annealing has been used to speed up the reduction from counting to sampling for discrete sets as well [59]. Here the traditional reduction, with overhead linear in the underlying dimension [24], is improved to one with overhead roughly the square root of the dimension for estimating a wide class of partition functions. The annealing algorithm as described above has to be extended to be adaptive, i.e., the exponents used change in an adaptive manner rather than according to a schedule fixed in advance.

4.3 PCA

The method we discuss in this section is classical, namely Principal Component Analysis (PCA). For a given set of points or a distribution in R^n, PCA identifies a sequence of orthonormal vectors such that, for any k ≤ n, the span of the first k vectors in this sequence is the subspace that minimizes the expected squared distance to the given point set or distribution (among all k-dimensional subspaces). The vectors can be found using the Singular Value Decomposition (SVD), which can be viewed as a greedy algorithm that finds one vector at a time. More precisely, given a distribution D in R^n with E_D(X) = 0, the top principal component or singular vector is a unit vector v that maximizes E((v^T x)²). For 2 ≤ i ≤ n, the i'th principal component is a unit vector that maximizes the same function among all unit vectors that are orthogonal to the first i − 1 principal components. For a general Gaussian, the principal components are exactly the directions along which the component 1-dimensional Gaussians are generated. Besides the regression property for subspaces, PCA is attractive because it can be computed efficiently by simple, iterative algorithms. Replacing the expected squared distance by a different power leads to an intractable problem.

We have already seen one application of PCA, namely to rounding via the isotropic position.
Here we take the principal components of the covariance matrix and apply a transformation that makes their corresponding singular values all equal to one (so that the variance in any direction is the same). The principal components are the unit vectors along the axes of the inertial ellipsoid corresponding to the covariance matrix of a distribution.

We next discuss the application of PCA to the problem of learning an intersection of halfspaces in R^n. As remarked earlier, this is an open problem even for k = 2. We make two assumptions. First, k is small compared to n, so algorithms that are exponential in k but not in n might be tolerable. Second, instead of learning such an intersection from an arbitrary distribution on examples, we assume the distribution is an unknown Gaussian. This setting was considered by Vempala [61, 64] and by Klivans et al. [38]. The first algorithm learns an intersection of k halfspaces from any logconcave input distribution and has complexity (n/ε)^{O(k)}. The second learns an intersection of k halfspaces from any Gaussian distribution using a polynomial threshold function of degree O(log k / ε⁴) and therefore has complexity n^{O(log k / ε⁴)}. Neither of these algorithms is polynomial when k grows with n.

In recent work [63], PCA is used to give an algorithm whose complexity is poly(n, k, 1/ε) + C(k, 1/ε), where C(.) is the complexity of learning an intersection of k halfspaces in R^k. Thus one can bound this using prior results by at most

  min{ k^{O(log k/ε⁴)}, (k/ε)^{O(k)} }.

For fixed ε, the algorithm is polynomial for k up to 2^{O(√(log n))}. The algorithm is straightforward: first put the full distribution in isotropic position (using a sample), effectively making the input distribution a standard Gaussian; then compute the smallest k principal components of the examples that lie in the intersection of the unknown halfspaces. The main claim in the analysis is that this latter subspace must be close to the span of the normals to the unknown halfspaces. This is essentially due to Theorem 11, which guarantees that the smallest k principal components are in the subspace spanned by the normals, while orthogonal to this subspace all variances are equal to that of the standard Gaussian. The algorithm and its analysis can be extended to arbitrary convex sets whose normals lie in an unknown k-dimensional subspace. The complexity remains a fixed polynomial in n times an exponential in k.

For general convex bodies, given positive and negative examples from an unknown Gaussian distribution, it is shown in [38] that the complexity is 2^{Õ(√n)} (ignoring the dependence on ε), along with a nearly matching lower bound. A similar lower bound of 2^{Ω(√n)} holds for learning a convex body given only random points from the body [20]. These constructions are similar to Eldan's [17] and use polytopes with an exponential number of facets. Thus an interesting open problem is to learn a polytope in R^n with m facets, given either uniform random points from it or points from a Gaussian. It is conceivable that the complexity is poly(m, n, 1/ε). The general theory of VC-dimension already tells us that O(mn) samples suffice. Such an algorithm would of course also give us an algorithm for estimating the volume of a polytope from a set of random points, and would not require samples from a sequence of distributions.
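A sketch of the PCA step of this approach, under simplifying assumptions of my own: the example distribution is already a standard Gaussian (so the isotropy step is skipped) and the unknown halfspaces pass through the origin. The k smallest principal components of the positive examples are then compared with the span of the true normals.

import numpy as np

rng = np.random.default_rng(7)
n, k, m = 20, 3, 200000

# Unknown intersection of k halfspaces through the origin (an assumption of this toy).
W = np.linalg.qr(rng.standard_normal((n, k)))[0]      # orthonormal normals
X = rng.standard_normal((m, n))                       # examples from N(0, I)
positive = np.all(X @ W >= 0.0, axis=1)               # label: inside all halfspaces
P = X[positive]

# Covariance of the positive examples: in the spirit of Theorem 11, variances
# shrink inside span(W), while orthogonal to it they remain equal to 1.
cov = np.cov(P, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
V = eigvecs[:, :k]                                    # k smallest principal components

# Alignment of the recovered subspace with span(W): the singular values of V^T W
# are the cosines of the principal angles (all near 1 means good recovery).
print("number of positive examples:", P.shape[0])
print("cosines of principal angles:", np.linalg.svd(V.T @ W, compute_uv=False))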
Acknowledgements. The author is grateful to Daniel Dadush, Elena Grigorescu, Ravi Kannan, Jinwoo Shin and Ying Xiao for helpful comments.

References

1 R. Adamczak, A. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. J. Amer. Math. Soc., 23:535-561, 2010.
2 N. Alon. Eigenvalues and expanders. Combinatorica, 6:83-96, 1986.
3 D. Applegate and R. Kannan. Sampling and integration of near log-concave functions. In STOC '91: Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 156-163, New York, NY, USA, 1991. ACM.
4 K. M. Ball. Logarithmically concave functions and sections of convex sets in R^n. Studia Mathematica, 88:69-84, 1988.
5 A. Belloni, R. M. Freund, and S. Vempala. An efficient rescaled perceptron algorithm for conic systems. Math. Oper. Res., 34(3):621-641, 2009.
6 D. Bertsimas and S. Vempala. Solving convex programs by random walks. J. ACM, 51(4):540-556, 2004.
7 S. Bobkov. On isoperimetric constants for log-concave probability distributions. Geometric aspects of functional analysis, Lect. notes in Math., 1910:81-88, 2007.
8 J. Bourgain. Random points in isotropic convex sets. Convex geometric analysis, 34:53-58, 1996.
9 K. Chandrasekaran, D. Dadush, and S. Vempala. Thin partitions: Isoperimetric inequalities and a sampling algorithm for star shaped bodies. In SODA, pages 1630-1645, 2010.
10 K. Chandrasekaran, A. Deshpande, and S. Vempala. Sampling s-concave functions: The limit of convexity based isoperimetry. In APPROX-RANDOM, pages 420-433, 2009.
11 P. Diaconis and D. Stroock. Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab., 1(1):36-61, 1991.
12 A. Dinghas. Über eine Klasse superadditiver Mengenfunktionale von Brunn-Minkowski-Lusternikschem Typus. Math. Zeitschr., 68:111-125, 1957.
13 J. Dunagan and S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Prog., 114(1):101-114, 2008.
14 M. E. Dyer and A. M. Frieze. Computing the volume of a convex body: a case where randomness provably helps. In Proc. of AMS Symposium on Probabilistic Combinatorics and Its Applications, pages 123-170, 1991.
15 M. E. Dyer, A. M. Frieze, and R. Kannan. A random polynomial time algorithm for approximating the volume of convex bodies. In STOC, pages 375-381, 1989.
16 M. E. Dyer, A. M. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. J. ACM, 38(1):1-17, 1991.
17 R. Eldan. A polynomial number of random points does not determine the volume of a convex body. http://arxiv.org/abs/0903.2634, 2009.
18 R. Eldan and B. Klartag. Approximately gaussian marginals and the hyperplane conjecture. http://arxiv.org/abs/1001.0875, 2010.
19 B. Fleury. Concentration in a thin euclidean shell for log-concave measures. J. Funct. Anal., 259(4):832-841, 2010.
20 N. Goyal and L. Rademacher. Learning convex bodies is hard. In COLT, 2009.
21 M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988.
22 B. Grünbaum. Partitions of mass-distributions and convex bodies by hyperplanes. Pacific J. Math., 10:1257-1261, 1960.
23 M. Jerrum, A. Sinclair, and E. Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. J. ACM, 51(4):671-697, 2004.
24 M. Jerrum, L. G. Valiant, and V. V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theor. Comput. Sci., 43:169-188, 1986.
25 A. T. Kalai and S. Vempala. Simulated annealing for convex optimization. Math. Oper. Res., 31(2):253-266, 2006.
26 R. Kannan. Minkowski's convex body theorem and integer programming. Math. Oper. Res., 12(3):415–440, 1987.
27 R. Kannan, L. Lovász, and M. Simonovits. Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry, 13:541–559, 1995.
28 R. Kannan, L. Lovász, and M. Simonovits. Random walks and an O*(n^5) volume algorithm for convex bodies. Random Structures and Algorithms, 11:1–50, 1997.
29 R. Kannan and H. Narayanan. Random walks on polytopes and an affine interior point method for linear programming. In STOC, pages 561–570, 2009.
30 R. Kannan and S. Vempala. Sampling lattice points. In STOC, pages 696–700, 1997.
31 N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373–396, 1984.
32 R. M. Karp and C. H. Papadimitriou. On linear characterization of combinatorial optimization problems. SIAM J. Comp., 11:620–632, 1982.
33 J. A. Kelner and D. A. Spielman. A randomized polynomial-time simplex algorithm for linear programming. In STOC, pages 51–60, 2006.
34 L. G. Khachiyan. Polynomial algorithms in linear programming. USSR Computational Mathematics and Mathematical Physics, 20:53–72, 1980.
35 B. Klartag. On convex perturbations with a bounded isotropic constant. Geom. and Funct. Anal., 16(6):1274–1290, 2006.
36 B. Klartag. A central limit theorem for convex sets. Invent. Math., 168:91–131, 2007.
37 B. Klartag. Power-law estimates for the central limit theorem for convex sets. J. Funct. Anal., 245:284–310, 2007.
38 A. R. Klivans, R. O'Donnell, and R. A. Servedio. Learning geometric concepts via Gaussian surface area. In FOCS, pages 541–550, 2008.
39 L. Leindler. On a certain converse of Hölder's inequality II. Acta Sci. Math. Szeged, 33:217–223, 1972.
40 H. W. Lenstra. Integer programming with a fixed number of variables. Math. of Oper. Res., 8(4):538–548, 1983.
41 L. Lovász. An Algorithmic Theory of Numbers, Graphs and Convexity, volume 50 of CBMS-NSF Conference Series. SIAM, 1986.
42 L. Lovász. Hit-and-run mixes fast. Math. Prog., 86:443–461, 1998.
43 L. Lovász and M. Simonovits. On the randomized complexity of volume and diameter. In Proc. 33rd IEEE Annual Symp. on Found. of Comp. Sci., pages 482–491, 1992.
44 L. Lovász and M. Simonovits. Random walks in a convex body and an improved volume algorithm. Random Structures and Alg., 4:359–412, 1993.
45 L. Lovász and S. Vempala. Fast algorithms for logconcave functions: Sampling, rounding, integration and optimization. In FOCS, pages 57–68, 2006.
46 L. Lovász and S. Vempala. Hit-and-run from a corner. SIAM J. Computing, 35:985–1005, 2006.
47 L. Lovász and S. Vempala. Simulated annealing in convex bodies and an O*(n^4) volume algorithm. J. Comput. Syst. Sci., 72(2):392–417, 2006.
48 L. Lovász and S. Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures and Algorithms, 30(3):307–358, 2007.
49 J. Luedtke, S. Ahmed, and G. L. Nemhauser. An integer programming approach for linear programs with probabilistic constraints. Math. Program., 122(2):247–272, 2010.
50 E. Milman. On the role of convexity in isoperimetry, spectral gap and concentration. Invent. Math., 177(1):1–43, 2009.
51 V. Milman and A. Pajor. Isotropic position and inertia ellipsoids and zonoids of the unit ball of a normed n-dimensional space. Geometric Aspects of Functional Analysis, Lect. Notes in Math., pages 64–104, 1989.
52 Yu. Nesterov and A. Nemirovski. Interior-Point Polynomial Algorithms in Convex Programming. Studies in Applied Mathematics. SIAM, 1994.
53 M. W. Padberg and M. R. Rao. The Russian method for linear programming III: Bounded integer programming. NYU Research Report, 1981.
54 G. Paouris. Concentration of mass on convex bodies. Geometric and Functional Analysis, 16:1021–1049, 2006.
55 A. Prekopa. Logarithmic concave measures and functions. Acta Sci. Math. Szeged, 34:335–343, 1973.
56 A. Prekopa. On logarithmic concave measures with applications to stochastic programming. Acta Sci. Math. Szeged, 32:301–316, 1973.
57 M. Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164:60–72, 1999.
58 A. Sinclair and M. Jerrum. Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82:93–133, 1989.
59 D. Stefankovic, S. Vempala, and E. Vigoda. Adaptive simulated annealing: A near-optimal connection between sampling and counting. In FOCS, pages 183–193, 2007.
60 P. M. Vaidya. A new algorithm for minimizing convex functions over convex sets. Math. Prog., 73:291–341, 1996.
61 S. Vempala. A random sampling based algorithm for learning the intersection of half-spaces. In FOCS, pages 508–513, 1997.
62 S. Vempala. Geometric random walks: A survey. MSRI Combinatorial and Computational Geometry, 52:573–612, 2005.
63 S. Vempala. Learning convex concepts from Gaussian distributions with PCA. In FOCS, 2010.
64 S. Vempala. A random sampling based algorithm for learning the intersection of half-spaces. J. ACM, to appear, 2010.
65 D. B. Yudin and A. S. Nemirovski. Evaluation of the information complexity of mathematical programming problems (in Russian). Ekonomika i Matematicheskie Metody, 13(2):3–45, 1976.