Cool with a Gaussian: An O*(n³) volume algorithm
Ben Cousins and Santosh Vempala

The Volume Problem
Given a measurable, compact set K in n-dimensional space, find a number A such that
  (1 − ε)·vol(K) ≤ A ≤ (1 + ε)·vol(K).
K is given by:
• a point x₀ ∈ K such that x₀ + B_n ⊆ K
• a membership oracle: answers YES/NO to "x ∈ K?"
(A minimal code sketch of this oracle model follows the progress table below.)

Volume: first attempt
• Divide and conquer.
• Difficulty: the number of parts grows exponentially in n.

More generally: Integration
Input: an integrable function f: ℝⁿ → ℝ₊ specified by an oracle, a point x, an error parameter ε.
Output: a number A such that
  (1 − ε)·∫f ≤ A ≤ (1 + ε)·∫f.
Volume is the special case where f is a 0-1 function.

High-dimensional problems
• Integration (volume)
• Optimization
• Learning
• Rounding
• Sampling
All hopelessly intractable in general, even to approximate.

High-dimensional problems
Input:
• a set of points S in n-dimensional space ℝⁿ, or a distribution in ℝⁿ
• a function f that maps points to real values (could be the indicator of a set)
What is the complexity of computational problems as the dimension grows?
• Dimension = number of variables.
• Typically, the size of the input is a function of the dimension.

Structure
Q. What structure makes high-dimensional problems computationally tractable, i.e., solvable with polynomial complexity?
• Convexity and its extensions appear to be the frontier of polynomial-time solvability.

Volume: second attempt: Sandwiching
Thm (John). Any convex body K has an ellipsoid E s.t. E ⊆ K ⊆ nE. Here E is the maximum-volume ellipsoid contained in K.
Thm (KLS95). For a convex body K in isotropic position,
  √((n+1)/n)·B_n ⊆ K ⊆ √(n(n+1))·B_n.
Also a factor-n sandwiching, but with a different ellipsoid.

Isotropic position and sandwiching
• For any convex body K (in fact, any set/distribution with bounded second moments), we can apply an affine transformation so that for a random point x from K: E(x) = 0, E(xxᵀ) = I_n.
• Thus K "looks like a ball" up to second moments.
• How close is it really to a ball? K lies between two balls whose radii are within a factor of n.

Volume via Sandwiching
• The John ellipsoid can be approximated using the Ellipsoid algorithm, s.t. E ⊆ K ⊆ n^(3/2)·E.
• The inertial ellipsoid can be approximated to within any constant factor (we'll see how).
• Using either one, E ⊆ K ⊆ n^(O(1))·E, so vol(E) ≤ vol(K) ≤ n^(O(n))·vol(E), since scaling an ellipsoid by a factor c scales its volume by cⁿ.
• Polytime algorithm, n^(O(n)) approximation. Can we do better?

Complexity of Volume Estimation
Thm [E86, BF87]. For any deterministic algorithm that uses at most nᵃ membership calls to the oracle for a convex body K and computes two numbers A and B such that A ≤ vol(K) ≤ B, there is some convex body for which the ratio B/A is at least
  (c·n / (a·log n))^(n/2),
where c is an absolute constant.
Thm [DF88]. Computing the volume of an explicit polytope Ax ≤ b is #P-hard, even for a totally unimodular matrix A and rational b.

Complexity of Volume Estimation
Thm [BF]. For deterministic algorithms: [table: # oracle calls vs. best achievable approximation factor]
Thm [Dadush-V.13]. Matching upper bound of (1 + ε)ⁿ in time (1/ε)^(O(n))·poly(n).

Randomized Volume/Integration
[DFK89]. Polynomial-time randomized algorithm that estimates the volume to within relative error 1 + ε with probability at least 1 − δ in time poly(n, 1/ε, log(1/δ)).
[Applegate-K91]. Polytime randomized algorithm to estimate the integral of any (Lipschitz) logconcave function.

Volume Computation: an ongoing adventure

  Algorithm                Power   New aspects
  Dyer-Frieze-Kannan 89      23    everything
  Lovász-Simonovits 90       16    localization
  Applegate-K 90             10    logconcave integration
  L 90                       10    ball walk
  DF 91                       8    error analysis
  LS 93                       7    multiple improvements
  KLS 97                      5    speedy walk, isotropy
  LV 03,04                    4    annealing, wt. isoper.
  LV 06                       4    integration, local analysis
  Cousins-V. 13,14            3    Gaussian cooling

(Power = the exponent k in an O*(n^k) bound on the number of oracle calls.)
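As promised above, a minimal Python sketch of the membership-oracle model from the opening slides. All names here (polytope_oracle, ball_oracle, cube) are illustrative, not from the talk; the point is only that a body is a YES/NO predicate, and every algorithm below interacts with K exclusively through such a predicate.

```python
import numpy as np

# Illustrative sketch of the oracle model: a convex body is nothing but a
# YES/NO membership predicate.

def polytope_oracle(A, b):
    """Membership oracle for the polytope {x : Ax <= b}."""
    def contains(x):
        return bool(np.all(A @ x <= b))
    return contains

def ball_oracle(radius=1.0):
    """Membership oracle for the Euclidean ball of the given radius."""
    def contains(x):
        return bool(np.dot(x, x) <= radius ** 2)
    return contains

# Example: the cube [-1, 1]^3, which contains the unit ball as required.
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.ones(6)
cube = polytope_oracle(A, b)
print(cube(np.zeros(3)))                # True: x0 = 0 is inside
print(cube(np.array([2.0, 0.0, 0.0])))  # False: outside
```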
Does it work?
• [Lovász-Deák 2012] implemented the [LV] O*(n⁴) algorithm: it worked for cubes up to dimension 9, but was too slow after that.
• [CV13]: a Matlab implementation of a new algorithm.
• [figure: experimental results on rotated cubes]

Volume: third attempt: Sampling
• Pick random samples from a ball/cube containing K.
• Compute the fraction c of the samples that land in K. Output c·vol(outer ball).
• Needs too many samples: the fraction of hits can be exponentially small in n (e.g., a ball inside its bounding cube).

Volume via Sampling [DFK89]
B ⊆ K ⊆ R·B. Let K_i = K ∩ 2^(i/n)·B, for i = 0, 1, …, m = n·log₂(R), so that K₀ = B and K_m = K.
  vol(K) = vol(B) · vol(K₁)/vol(K₀) · vol(K₂)/vol(K₁) · … · vol(K_m)/vol(K_{m−1})
Estimate each ratio with random samples. (A code sketch of this telescoping estimator follows the Convergence slide below.)

Volume via Sampling
K_i = K ∩ 2^(i/n)·B, i = 0, 1, …, m = n·log₂(R).
  vol(K) = vol(B) · vol(K₁)/vol(K₀) · … · vol(K_m)/vol(K_{m−1})
Claim. vol(K_{i+1}) ≤ 2·vol(K_i).
So each ratio is a constant, and estimating all of them to total relative error ε takes
  total #samples = m · O(m/ε²) = O*(n²).
But how to sample?

Sampling
• Generate a uniform random point from a compact set S, or a point with density proportional to a given function f.
• Numerous applications in diverse areas: statistics, networking, biology, computer vision, privacy, operations research, etc.

Sampling
Input: a function f: ℝⁿ → ℝ₊ specified by an oracle, a point x, an error parameter ε.
Output: a point y from a distribution within distance ε of the distribution with density proportional to f.

Logconcave functions
• f: ℝⁿ → ℝ₊ is logconcave if for any x, y ∈ ℝⁿ and λ ∈ [0, 1]:
    f(λx + (1 − λ)y) ≥ f(x)^λ · f(y)^(1−λ).
• Examples:
  • indicator functions of convex sets
  • the Gaussian density function
  • the exponential function
• Level sets of f, L_t = {x : f(x) ≥ t}, are convex.
• Product, minimum and convolution preserve logconcavity.

Algorithmic Applications
Given a blackbox for sampling logconcave densities, we get efficient algorithms for:
• Rounding
• Convex Optimization
• Volume Computation/Integration
• some Learning problems

Rounding via Sampling
1. Sample m random points from K.
2. Compute the sample mean z = E(x) and the sample covariance matrix A = E((x − z)(x − z)ᵀ).
3. Output B = A^(−1/2).
Then B(K − z) is nearly isotropic.
Thm. C(ε)·n random points suffice to get E(‖A − I‖) ≤ ε [Adamczak et al., improving on Bourgain, Rudelson].
I.e., for any unit vector v, 1 − ε ≤ E((vᵀx)²) ≤ 1 + ε.
(A code sketch follows the Convergence slide below.)

How to Sample?
Ball walk: at x,
• pick a random y from x + δ·B_n;
• if y is in K, go to y.
Hit-and-Run: at x,
• pick a random chord L through x;
• go to a random point y on L.
(Both steps are sketched in code after the Convergence slide below.)

Complexity of Sampling
Thm [KLS97]. For a convex body, the ball walk with an M-warm start reaches an (independent) nearly random point in poly(n, R, M) steps. Here the warmth of the starting distribution Q₀ w.r.t. the stationary distribution Q is
  M = sup_A Q₀(A)/Q(A)   (or, in the L₂ version, M = E_{Q₀}(dQ₀/dQ(x))).
Thm [LV03]. The same holds for arbitrary logconcave density functions. From a warm start, the complexity is O*(n²R²).
• The isotropic transformation makes R = O(√n).
• KLS volume algorithm: #phases × #samples per phase × steps per sample = n × n × n³ = n⁵.

Markov chains
• State space K, with a set of measurable subsets forming a σ-algebra (i.e., closed under complements and countable unions).
• A next-step distribution P_u associated with each point u of the state space.
• A starting point.
• A walk w₀, w₁, …, w_i, … such that
    Pr(w_i ∈ A | w₀, w₁, …, w_{i−1}) = Pr(w_i ∈ A | w_{i−1}).

Convergence
Stationary distribution Q; the ergodic "flow" of a subset A is
  Φ(A) = ∫_A P_u(K∖A) dQ(u).
For any subset A, Φ(A) = Φ(K∖A).
Conductance:
  φ(A) = Φ(A) / min{Q(A), Q(K∖A)},    φ = inf_A φ(A).
The rate of convergence (mixing time) is bounded by O(1/φ²) [LS93, JS86].
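Here is the promised sketch of the [DFK89] telescoping estimator, under stated assumptions: B ⊆ K ⊆ R·B, and (purely for illustration) sampling from K_i is done by naive rejection from the ball, which works in low dimension but is exponentially wasteful in general; replacing that subroutine with a random walk is the subject of the remaining slides. All function names are illustrative.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def sample_ball(n, radius, rng):
    """Uniform random point in radius * B_n."""
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    return radius * rng.random() ** (1.0 / n) * u

def sample_K(contains, n, radius, rng, max_tries=100000):
    """Rejection sampler for K_i = K ∩ (radius * B_n). Placeholder only:
    the acceptance rate degrades exponentially with n."""
    for _ in range(max_tries):
        x = sample_ball(n, radius, rng)
        if contains(x):
            return x
    raise RuntimeError("rejection sampling failed")

def dfk_volume(contains, n, R, samples_per_phase=2000, rng=rng):
    """vol(K) = vol(B) * prod_i vol(K_{i+1})/vol(K_i), K_i = K ∩ 2^(i/n) B."""
    m = math.ceil(n * math.log2(R))
    estimate = math.pi ** (n / 2) / math.gamma(n / 2 + 1)  # vol(K_0) = vol(B_n)
    for i in range(m):
        r_cur, r_next = 2.0 ** (i / n), 2.0 ** ((i + 1) / n)
        # The fraction of K_{i+1}-samples that land in K_i estimates
        # vol(K_i)/vol(K_{i+1}), which is at least 1/2 by the Claim above.
        hits = sum(
            np.linalg.norm(sample_K(contains, n, r_next, rng)) <= r_cur
            for _ in range(samples_per_phase)
        )
        estimate *= samples_per_phase / hits
    return estimate
```

For the cube oracle from the earlier sketch (n = 3, R = √3), `dfk_volume(cube, 3, 3 ** 0.5)` returns roughly 8, the true volume.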
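The rounding procedure above is three lines of linear algebra once the samples are in hand; a minimal sketch, assuming the points have already been drawn from K by some sampler:

```python
import numpy as np

def isotropic_transform(points):
    """Steps 1-3 of the rounding slide: sample mean z, sample covariance A,
    output B = A^(-1/2); then B(K - z) is nearly isotropic."""
    X = np.asarray(points, dtype=float)   # m x n matrix of samples from K
    z = X.mean(axis=0)                    # sample mean
    Y = X - z
    A = Y.T @ Y / len(X)                  # sample covariance matrix
    w, V = np.linalg.eigh(A)              # A is symmetric PSD
    B = V @ np.diag(w ** -0.5) @ V.T      # B = A^(-1/2)
    return z, B

# Usage: map each point x to B @ (x - z); the transformed sample has mean ~0
# and covariance ~I_n, i.e., the body "looks like a ball" up to second moments.
```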
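And minimal sketches of the two walks, using only the membership oracle. The chord endpoints for hit-and-run are found by bisection, which is valid because K is convex; a bound R with K ⊆ R·B_n is assumed known, and all names are illustrative.

```python
import numpy as np

def ball_walk_step(x, contains, delta, rng):
    """One ball-walk step: propose y uniform in x + delta * B_n; move if y ∈ K."""
    n = len(x)
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    y = x + delta * rng.random() ** (1.0 / n) * u
    return y if contains(y) else x

def chord_end(x, d, contains, R, tol=1e-8):
    """Distance from x ∈ K to the boundary of K along direction d, found by
    bisection on membership queries (valid since K is convex, K ⊆ R * B_n)."""
    lo, hi = 0.0, 2.0 * R
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if contains(x + mid * d):
            lo = mid
        else:
            hi = mid
    return lo

def hit_and_run_step(x, contains, R, rng):
    """One hit-and-run step: random chord through x, uniform point on it."""
    d = rng.standard_normal(len(x))
    d /= np.linalg.norm(d)
    t_plus = chord_end(x, d, contains, R)
    t_minus = chord_end(x, -d, contains, R)
    return x + rng.uniform(-t_minus, t_plus) * d
```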
Conductance
Take an arbitrary measurable subset S: how large is the conditional escape probability from S?
The local conductance of the ball walk,
  ℓ(x) = vol((x + δ·B_n) ∩ K) / vol(δ·B_n),
can be arbitrarily small.

Conductance
Need:
• Nearby points have overlapping one-step distributions.
• Large subsets have large boundaries [isoperimetry]: for a partition K = S₁ ∪ S₂ ∪ S₃,
    π(S₃) ≥ (d(S₁, S₂)/R) · min{π(S₁), π(S₂)},  where R² = E_K(|x|²).

Isoperimetry and the KLS conjecture
For the measure π, let A = E((x − x̄)(x − x̄)ᵀ) be its covariance matrix, and R² = E_π(|x − x̄|²) = tr(A).
Thm [KLS95]. π(S₃) ≥ (c/√(tr A)) · d(S₁, S₂) · min{π(S₁), π(S₂)}.
Conj [KLS95]. π(S₃) ≥ (c/√(λ₁(A))) · d(S₁, S₂) · min{π(S₁), π(S₂)},
i.e., the trace of A (the sum of its eigenvalues) can be replaced by its largest eigenvalue λ₁(A) = ‖A‖.

KLS hyperplane conjecture
A = E(xxᵀ).
Conj [KLS95]. π(S₃) ≥ (c/√(λ₁(A))) · d(S₁, S₂) · min{π(S₁), π(S₂)}.
• Could improve the sampling complexity by a factor of n.
• Implies well-known conjectures in convex geometry: the slicing conjecture and the thin-shell conjecture.
• But wide open!

KLS, Slicing, Thin-shell

  conjecture    current bound
  thin shell    n^(1/3)   [Guedon-Milman]
  slicing       n^(1/4)   [Bourgain, Klartag]
  KLS           ~n^(1/3)  [Bobkov; Eldan-Klartag]

All are conjectured to be O(1). The conjectures are equivalent! [Ball, Eldan-Klartag]

Is rapid mixing possible?
• The ball walk can have bad starts, but hit-and-run escapes from corners.
• Minimum-distance isoperimetry is too coarse.

Average distance isoperimetry
• How to average distance? For a partition K = S₁ ∪ S₂ ∪ S₃ and a point x, let
    h(x) = min{ d(u, v) : u ∈ S₁, v ∈ S₂, x ∈ [u, v] },
the length of the shortest chord through x with endpoints in S₁ and S₂.
Thm [LV04]. π(S₃) ≥ E(h(x)) · π(S₁) · π(S₂).

Hit-and-run mixes rapidly
• Thm [LV04]. Hit-and-run mixes in polynomial time from any starting point inside a convex body.
• Conductance = Ω(1/(nD)).
• Along with the isotropic transformation, this gives an O*(n³) sampling algorithm.

Simulated Annealing [LV03, Kalai-V.04]
To estimate ∫f, consider a sequence f₀, f₁, f₂, …, f_m = f with ∫f₀ easy, e.g., a constant function over a ball. Then
  ∫f = ∫f₀ · (∫f₁/∫f₀) · (∫f₂/∫f₁) · … · (∫f_m/∫f_{m−1}).
Each ratio can be estimated by sampling:
1. Sample X with density proportional to f_i.
2. Compute Y = f_{i+1}(X)/f_i(X).
Then
  E(Y) = ∫ (f_{i+1}(x)/f_i(x)) · (f_i(x)/∫f_i) dx = ∫f_{i+1} / ∫f_i.

Annealing [LV06]
• Define f_i(x) = e^(−a_i·|x|).
• a₀ = 2n, a_{i+1} = a_i/(1 + 1/√n), down to a_m ≈ ε/(2R).
• m ~ √n·log(n/ε) phases.
• ∫f = ∫f₀ · (∫f₁/∫f₀) · (∫f₂/∫f₁) · … · (∫f_m/∫f_{m−1})
• The final estimate could be n^(Ω(n)), so a single ratio could be n^(√n) or higher. How can we estimate it with a few samples?!
(A code sketch of this schedule and the ratio estimator follows the table below.)

Annealing [LV03, 06]
• f_i(x) = e^(−a_i·|x|), a₀ = 2n, a_{i+1} = a_i/(1 + 1/√n).
• Lemma. For Y = f_{i+1}(X)/f_i(X), Var(Y) < 4·E(Y)².
• So although the expectation of Y can be large (exponential, even), we need only a few samples to estimate it!
• LoVe algorithm: #phases × #samples per phase × steps per sample = √n × √n × n³ = n⁴.

Variance of ratio estimator
Let f(a, x) = e^(−a·|x|), F(a) = ∫ f(a, x) dx, and
  Y = f(a_{i+1}, X) / f(a_i, X),  with X drawn with density f(a_i, x)/F(a_i).
Then
  E(Y) = ∫ (f(a_{i+1}, x)/f(a_i, x)) · (f(a_i, x)/F(a_i)) dx = F(a_{i+1})/F(a_i),
and
  E(Y²)/E(Y)² = F(2a_{i+1} − a_i)·F(a_i) / F(a_{i+1})²
              = F(a(1 − α))·F(a(1 + α)) / F(a)²,
writing a = a_{i+1} and a_i = a(1 + α).
(This would be at most 1 if F were logconcave…)

Variance of ratio estimator
Lemma. For any logconcave f and a > 0, the function
  Z(a) = aⁿ · ∫ f(a, x) dx = aⁿ·F(a)
is also logconcave.
So aⁿ·F(a) is logconcave, and
  (aⁿ·F(a))² ≥ (a(1 − α))ⁿ·F(a(1 − α)) · (a(1 + α))ⁿ·F(a(1 + α)).
Therefore
  F(a(1 − α))·F(a(1 + α)) / F(a)² ≤ (1/((1 − α)(1 + α)))ⁿ = (1/(1 − α²))ⁿ ≤ 4  for α ≤ 1/√n.
(The elementary last step is written out after the table below.)

Volume Computation: an ongoing adventure

  Algorithm                Power   New aspects
  Dyer-Frieze-Kannan 89      23    everything
  Lovász-Simonovits 90       16    localization
  Applegate-K 90             10    logconcave integration
  L 90                       10    ball walk
  DF 91                       8    error analysis
  LS 93                       7    multiple improvements
  KLS 97                      5    speedy walk, isotropy
  LV 03,04                    4    annealing, wt. isoper.
  LV 06                       4    integration, local analysis
  Cousins-V. 13,14            3    Gaussian cooling
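The promised sketch of the [LV06] annealing loop. The `sampler` argument is an assumed black box (e.g., hit-and-run targeting the density proportional to e^(−a|x|) on K, as in the slides above); the stopping threshold and all names are illustrative.

```python
import numpy as np

def lv_schedule(n, R, eps):
    """[LV06] cooling schedule: a_0 = 2n, a_{i+1} = a_i / (1 + 1/sqrt(n)),
    stopping near eps/(2R); about sqrt(n)*log(n/eps) phases."""
    a = [2.0 * n]
    while a[-1] > eps / (2.0 * R):
        a.append(a[-1] / (1.0 + 1.0 / np.sqrt(n)))
    return a

def estimate_ratio(sampler, contains, a_cur, a_next, k, rng):
    """Estimate E(Y) = ∫f_{i+1} / ∫f_i for f_i(x) = exp(-a_i |x|) by averaging
    Y = f_{i+1}(X)/f_i(X) over k samples X ~ f_i.  `sampler(contains, a, rng)`
    is an assumed black box, e.g. hit-and-run targeting exp(-a|x|) on K."""
    ys = []
    for _ in range(k):
        x = sampler(contains, a_cur, rng)
        ys.append(np.exp((a_cur - a_next) * np.linalg.norm(x)))
    return float(np.mean(ys))

# ∫f is then ∫f_0 times the product of the m ratio estimates; by the Lemma
# above, a constant number of samples per ratio gives constant relative error.
```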
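For completeness, the elementary last step of the variance bound, written out as a LaTeX fragment:

```latex
% Last step of the variance bound: for \alpha \le 1/\sqrt{n},
\left(\frac{1}{1-\alpha^2}\right)^{n}
  \le \left(\frac{1}{1-\frac{1}{n}}\right)^{n}
  = \left(1+\frac{1}{n-1}\right)^{n},
% which is decreasing in n, equals 4 at n = 2, and tends to e < 4.
```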
Gaussian sampling/volume
• Sample from a Gaussian restricted to K.
• Compute the Gaussian measure of K.
• Anneal with a Gaussian:
  • define f_i(x) = e^(−|x|²/2σ_i²), start with σ₀ small (~1/√n), and increase σ in phases;
  • compute the ratios of integrals of consecutive phases: ∫f_{i+1}/∫f_i.

Gaussian sampling
• The KLS conjecture holds for a Gaussian restricted to any convex body (via the Brascamp-Lieb inequality):
  Thm. π(S₃) ≥ (c/σ) · d(S₁, S₂) · min{π(S₁), π(S₂)}.
• Not enough on its own, but it can be used to show:
  Thm [Cousins-V. 13]. For σ² = O(1), the ball walk applied to the Gaussian N(0, σ²·I_n) restricted to any convex body containing the unit ball mixes in O*(n²) time from a warm start.

Speedy walk: a thought experiment
• Take the sequence of points visited by the ball walk: w₀, w₁, w₂, w₃, …, w_i, w_{i+1}, …
• Take the subsequence of "proper" attempts, the ones that stay inside K.
• This subsequence is a Markov chain and is rapidly mixing from any point.
• From a warm start, the total number of steps is only a constant factor higher.

Gaussian volume
• Theorem [Cousins-V.13]. The Gaussian volume of a convex body K containing the unit ball can be estimated in time O*(n³).
• No need to adjust for isotropy!
• Each step samples a 1-d Gaussian from an interval. (A code sketch of such a step follows the Practical slide below.)
• Can we use this to compute the volume?

Gaussian Cooling
• f_i(x) = e^(−|x|²/2σ_i²), σ₀² ~ 1/n.
• Estimate ∫f_{i+1}/∫f_i using samples drawn according to f_i.
• For σ_i² ≤ 1, set σ_i² = σ_{i−1}²·(1 + 1/√n).
• For σ_i² > 1, set σ_i² = σ_{i−1}²·(1 + σ_{i−1}/√n).
(A code sketch of this schedule follows the Practical slide below.)

Gaussian Cooling
• f_i(x) = e^(−|x|²/2σ_i²).
• For σ_i² ≤ 1, we set σ_i² = σ_{i−1}²·(1 + 1/√n):
  • sampling time: n² per sample; #phases and #samples per phase: √n each;
  • so the total time is n² × √n × √n = n³.

Gaussian Cooling
• For σ_i² > 1, we set σ_i² = σ_{i−1}²·(1 + σ_{i−1}/√n):
  • sampling time: σ²·n² per sample (too much??);
  • #phases to double σ is √n/σ;
  • #samples per phase is also √n/σ;
  • so the total time to double σ is (√n/σ) × (√n/σ) × σ²n² = n³ !!!

Variance of ratio estimator
• Why can we set σ_i² as high as σ_{i−1}²·(1 + σ_{i−1}/√n)?
Let f(σ², x) = e^(−|x|²/2σ²) for x ∈ K, and F(σ²) = ∫ f(σ², x) dx.
Lemma. For Y = f(σ²(1 + α), X)/f(σ², X), with X drawn with density proportional to f(σ², ·):
  E(Y) = F(σ²(1 + α))/F(σ²), and
  E(Y²)/E(Y)² = F(σ²/(1 + α))·F(σ²/(1 − α)) / F(σ²)² = e^(O(α²n/σ²)) = O(1)  for α = σ/√n.

Variance of ratio estimator
  E(Y²)/E(Y)² = F(σ²/(1 + α))·F(σ²/(1 − α)) / F(σ²)² = e^(O(α²n/σ²))
• First use localization to reduce to a 1-d inequality for a restricted family of logconcave functions: for K ⊆ R·B_n and −R ≤ ℓ ≤ u ≤ R, define
    G(σ²) = ∫_ℓ^u (t + c)^(n−1) · e^(−t²/2σ²) dt.
• Claim: G(σ²/(1 + α))·G(σ²/(1 − α)) / G(σ²)² = e^(O(α²n/σ²)).

Variance of ratio estimator
  G(σ²/(1 + α))·G(σ²/(1 − α)) / G(σ²)² = e^(O(α²n/σ²)),
which reduces to a bound on the second moment
  E(t²) = ∫_ℓ^u t²·(t + c)^(n−1)·e^(−t²/2σ²) dt / ∫_ℓ^u (t + c)^(n−1)·e^(−t²/2σ²) dt
for this one-dimensional family.

Warm start
• With σ_i² = σ_{i−1}²·(1 + σ_{i−1}/√n), a random point from one phase's distribution gives a warm start for the next.
Let f(σ², x) = e^(−|x|²/2σ²) for x ∈ K, and F(σ²) = ∫ f(σ², x) dx. With σ_{i+1}² = σ_i²·(1 + α) = σ²(1 + α), the warmth is
  M = E_{f_i}( (f_i(x)/∫f_i) / (f_{i+1}(x)/∫f_{i+1}) )
    = (∫f_{i+1}/∫f_i) · ∫ (f_i(x)/f_{i+1}(x)) · (f_i(x)/∫f_i) dx
    = F(σ²(1 + α)) · F(σ²(1 + α)/(1 + 2α)) / F(σ²)²
    = O(1)  for α = σ/√n.
Same lemma!

Gaussian Cooling [CV14]
• Accelerated annealing: 1 + σ/√n is the best possible rate.
• Thm. The volume of any well-rounded convex body K can be estimated using O*(n³) membership queries.
• CV algorithm: (√n/σ) × (√n/σ) × σ²n² × log n = O*(n³).

Practical volume/integration
• Start with a concentrated Gaussian.
• Run the algorithm until the Gaussian is nearly flat.
• In each phase, flatten the Gaussian as much as possible while keeping the variance of the ratio of integrals bounded.
• The variance can be estimated with a small constant number of samples.
• If the covariance is skewed (as seen by an SVD of O(n) points), scale down the high-variance subspace.
• "Adaptive" annealing (also used in [Stefankovic-Vigoda-V.] for discrete problems). (A sketch follows below.)
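The 1-d Gaussian step mentioned above, sketched in Python: a hit-and-run style move where the point on the chord is drawn from the conditional (truncated 1-d) Gaussian rather than uniformly. `chord_end` is the bisection helper from the earlier walk sketch; all names are illustrative.

```python
import numpy as np
from statistics import NormalDist

def truncated_gaussian(lo, hi, sigma, rng):
    """Sample from N(0, sigma^2) conditioned on [lo, hi], by inverse-CDF."""
    nd = NormalDist(0.0, sigma)
    return nd.inv_cdf(rng.uniform(nd.cdf(lo), nd.cdf(hi)))

def gaussian_har_step(x, contains, R, sigma, rng):
    """One step targeting N(0, sigma^2 I_n) restricted to K: pick a random
    line through x, then sample the 1-d conditional Gaussian on the chord."""
    d = rng.standard_normal(len(x))
    d /= np.linalg.norm(d)
    t_plus = chord_end(x, d, contains, R)
    t_minus = chord_end(x, -d, contains, R)
    # Along x + t*d the density is proportional to exp(-|x + t d|^2 / 2 sigma^2),
    # i.e. a 1-d Gaussian in t with mean mu = -<x, d> and variance sigma^2,
    # truncated to the chord [-t_minus, t_plus].
    mu = -float(np.dot(x, d))
    t = mu + truncated_gaussian(-t_minus - mu, t_plus - mu, sigma, rng)
    return x + t * d
```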
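A sketch of the cooling schedule from the Gaussian Cooling slides: multiply σ² by (1 + 1/√n) in the low-variance regime and by (1 + σ/√n) afterwards. The starting value and stopping point are illustrative.

```python
import numpy as np

def cooling_schedule(n, sigma_max):
    """Gaussian cooling: sigma_0^2 ~ 1/n; multiply sigma^2 by (1 + 1/sqrt(n))
    while sigma^2 <= 1, and by (1 + sigma/sqrt(n)) once sigma^2 > 1."""
    s2 = 1.0 / n
    schedule = [s2]
    while s2 < sigma_max ** 2:
        rate = 1.0 / np.sqrt(n) if s2 <= 1.0 else np.sqrt(s2) / np.sqrt(n)
        s2 *= 1.0 + rate
        schedule.append(s2)
    return schedule

# E.g. len(cooling_schedule(100, 10.0)) shows the effect of acceleration:
# far fewer phases are spent doubling sigma than the slow rate would need.
```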
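And a sketch of the adaptive rule from the Practical slide: estimate E(Y²)/E(Y)² from a handful of samples and cool as fast as that estimate allows. The variance threshold 4 and the doubling search are illustrative choices, not the talk's exact procedure.

```python
import numpy as np

def adaptive_cool(norms2, s2_cur, n, var_bound=4.0, max_doublings=30):
    """Given squared norms |X|^2 of a few samples from the current phase,
    return the largest trial sigma^2 whose empirical E(Y^2)/E(Y)^2 stays
    below var_bound, via a doubling search over the cooling rate."""
    norms2 = np.asarray(norms2, dtype=float)
    rate = 1.0 / np.sqrt(n)                    # the always-safe rate
    best = s2_cur * (1.0 + rate)
    for _ in range(max_doublings):
        trial = s2_cur * (1.0 + 2.0 * rate)
        # Y = f_next(X)/f_cur(X) = exp(|X|^2/2 * (1/s2_cur - 1/trial))
        logy = 0.5 * norms2 * (1.0 / s2_cur - 1.0 / trial)
        y = np.exp(logy - logy.max())          # rescale for stability
        if np.mean(y ** 2) / np.mean(y) ** 2 > var_bound:
            break
        best, rate = trial, 2.0 * rate
    return best
```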
Open questions
• How true is the KLS conjecture?

Open questions
• When to stop a random walk? (How do we decide whether the current point is "random"?) How can we get information before reaching stationarity?
• Faster isotropy/rounding? To make a body isotropic: run for N steps; transform using the covariance; repeat.

Open questions
• How efficiently can we learn a polytope given only random points from it?
• With O(mn) points, one cannot "see" the structure, but there is enough information to estimate the polytope! Algorithms?
• For general convex bodies:
  • [KOS][GR]: 2^(Ω(√n)) points are needed;
  • [Eldan]: 2^(√n) points are needed even to estimate the volume!

Open questions
• Can we estimate the volume of an explicit polytope Ax ≤ b in deterministic polynomial time?