Recent Progress and Open Problems in
Algorithmic Convex Geometry∗
Santosh S. Vempala
School of Computer Science
Algorithms and Randomness Center
Georgia Tech, Atlanta, USA 30332
vempala@gatech.edu
Abstract
This article is a survey of developments in algorithmic convex geometry over the past decade. These include algorithms for sampling, optimization, integration, rounding and learning, as well as mathematical tools such as isoperimetric and concentration inequalities. Several open problems and conjectures are discussed along the way.
Contents

1 Introduction
  1.1 Basic definitions
2 Problems
  2.1 Optimization
  2.2 Integration/Counting
  2.3 Learning
  2.4 Sampling
  2.5 Rounding
3 Geometric inequalities and conjectures
  3.1 Rounding
  3.2 Measure and concentration
  3.3 Isoperimetry
  3.4 Localization
4 Algorithms
  4.1 Geometric random walks
  4.2 Annealing
  4.3 PCA
∗ This work was supported by NSF awards AF-0915903 and AF-0910584.
© Santosh S. Vempala; licensed under Creative Commons License NC-ND
Leibniz International Proceedings in Informatics
Schloss Dagstuhl Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
1 Introduction

Algorithmic convex geometry is the study of the algorithmic aspects of convexity. While convex geometry, closely related to finite-dimensional functional analysis, is a rich, classical field of mathematics, it has enjoyed a resurgence since the late 20th century, coinciding with the development of theoretical computer science. These two fields started coming together in the 1970s with the development of geometric algorithms for convex optimization, notably the Ellipsoid method. The ellipsoid algorithm uses basic notions in convex geometry with profound consequences for computational complexity, including the polynomial-time solution of linear programs. This development heralded the use of more geometric ideas, including Karmarkar's algorithm and interior point methods. It brought to the forefront fundamental geometric questions such as rounding, i.e., applying affine transformations to sets in Euclidean space to make them easier to handle. The ellipsoid algorithm also resulted in the striking application of continuous geometric methods to the solution of discrete optimization problems such as submodular function minimization.
A further startling development occurred in 1989 with the discovery of a polynomial method for estimating the volume of a convex body. This was especially surprising in the light of an exponential lower bound for any deterministic algorithm for volume estimation. The crucial ingredient in the volume algorithm was an efficient procedure to sample nearly uniform random points from a convex body. Over the past two decades the algorithmic techniques and analysis tools for sampling and volume estimation have been greatly extended and refined. One noteworthy aspect here is the development of isoperimetric inequalities that are of independent interest as purely geometric properties and lead to new directions in the study of convexity. Efficient sampling has also led to alternative polynomial algorithms for convex optimization.
Besides optimization, integration and sampling, our focus problems in this survey are rounding and learning. Efficient algorithms for the first three problems rely crucially on rounding. Approximate rounding can be done using the ellipsoid method and with a tighter guarantee using random sampling. Learning is a problem that generalizes integration. For example, while we know how to efficiently compute the volume of a polytope, learning one efficiently from random samples is an open problem. For general convex bodies, volume computation is efficient, while learning requires a superpolynomial number of samples (volume computation uses a membership oracle while learning gets only random samples).

This survey is not intended to be comprehensive. There are numerous interesting developments in convex geometry, related algorithms and their growing list of applications and connections to other areas (e.g., data privacy, quantum computing, statistical estimators, etc.) that we do not cover here, being guided instead by our five focus problems.
1.1 Basic definitions

Recall that a subset S of R^n is convex if for any two points x, y ∈ S, the interval [x, y] ⊆ S. A function f : R^n → R_+ is said to be logconcave if for any two points x, y ∈ R^n and any λ ∈ [0, 1],

  f(λx + (1 − λ)y) ≥ f(x)^λ f(y)^{1−λ}.
The indicator function of a convex set and the density function of a Gaussian distribution are two canonical examples of logconcave functions.

A density function f : R^n → R_+ is said to be isotropic if its centroid is the origin and its covariance matrix is the identity matrix. In terms of the associated random variable X, this means that E(X) = 0 and E(XX^T) = I. This condition is equivalent to saying that for every unit vector v ∈ R^n,

  ∫_{R^n} (v^T x)^2 f(x) dx = 1.

We say that f is C-isotropic if

  1/C ≤ ∫_{R^n} (v^T x)^2 f(x) dx ≤ C

for every unit vector v. A convex body is said to be isotropic if the uniform density over it is isotropic. For any full-dimensional distribution D with bounded second moments, there is an affine transformation of space that puts it in isotropic position, namely, if z = E_D(X) and A = E((X − z)(X − z)^T), then y = A^{−1/2}(X − z) has an isotropic density.
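As an illustration of this transformation, the following minimal sketch (in Python, assuming numpy; it is only an illustration, not part of the cited works) estimates the isotropic position map from a finite sample by subtracting the empirical mean and applying the inverse square root of the empirical covariance.

import numpy as np

def isotropic_transform(X):
    """Estimate the affine map putting a sample X (an m x n array of points)
    into approximately isotropic position: subtract the empirical centroid and
    apply the inverse square root of the empirical covariance matrix."""
    z = X.mean(axis=0)                       # empirical centroid
    A = np.cov(X, rowvar=False)              # empirical covariance matrix
    w, V = np.linalg.eigh(A)                 # eigendecomposition of A
    A_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Y = (X - z) @ A_inv_sqrt.T               # y = A^{-1/2}(x - z) for each row
    return Y, A_inv_sqrt, z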
For a density function f : R^n → R_+, we let π_f be the measure associated with it. The following notions of distance between two distributions P and Q will be used. The total variation distance of P and Q is

  d_tv(P, Q) = sup_{A ∈ A} |P(A) − Q(A)|.

The χ-squared distance of P with respect to Q is

  χ²(P, Q) = ∫_{R^n} ((dP(u) − dQ(u)) / dQ(u))² dQ(u) = ∫_{R^n} (dP(u)/dQ(u)) dP(u) − 1.

The first term in the last expression is also called the L_2 distance of P w.r.t. Q. Finally, P is said to be M-warm w.r.t. Q if

  M = sup_{A ∈ A} P(A)/Q(A).
A discrete-time Markov chain is defined using a triple (K, A, {P_u : u ∈ K}) along with a starting distribution Q_0, where K is the state space, A is a set of measurable subsets of K and P_u is a measure over K, as a sequence of elements of K, w_0, w_1, . . ., where w_0 is chosen from Q_0 and each subsequent w_i is chosen from P_{w_{i−1}}. Thus, the choice of w_{i+1} depends only on w_i and is independent of w_0, . . . , w_{i−1}. It is easy to verify that a distribution Q is stationary iff for every A ∈ A,

  ∫_A P_u(K \ A) dQ(u) = ∫_{K \ A} P_u(A) dQ(u).

The conductance of a subset A is defined as

  φ(A) = ∫_A P_u(K \ A) dQ(u) / min{Q(A), Q(K \ A)}

and the conductance of the Markov chain is

  φ = min_A φ(A) = min_{0 < Q(A) ≤ 1/2} ∫_A P_u(K \ A) dQ(u) / Q(A).
The following weaker notion of conductance will also be useful. For any 0 ≤ s < 1/2, the s-conductance of a Markov chain is defined as

  φ_s = min_{A : s < Q(A) ≤ 1/2} ∫_A P_u(K \ A) dQ(u) / (Q(A) − s).

We let Var_f(g) denote the variance of a real-valued function g w.r.t. a density function f. The unit ball in R^n is denoted by B_n. We use O*(·) to suppress error parameters and terms that are polylogarithmic in the leading terms.
2 Problems

We now state our focus problems more precisely and mention the current bounds on their complexity. Algorithms achieving these bounds are discussed in Section 4. As input to these problems, we typically assume a general function oracle, i.e., access to a real-valued function as a black box. The complexity will be measured by both the number of oracle calls and the number of arithmetic operations.
2.1 Optimization

Input: an oracle for a function f : R^n → R_+; a point x_0 with f(x_0) > 0; an error parameter ε.
Output: a point x such that f(x) ≥ max f − ε.
Linear optimization over a convex set is the special case when f = e^{−c^T x} χ_K(x), where c^T x is the linear function to be minimized and K is the convex set. Access to f is equivalent to a membership oracle for K in this case. Given only access to the function f, for any logconcave f, the complexity of optimization is poly(n, log(1/ε)) by either an extension of the Ellipsoid method to handle membership oracles [21], or Vaidya's algorithm [60], or a reduction to sampling [6, 25, 45]. This line of work began with the breakthrough result of Khachiyan [34] showing that the Ellipsoid method proposed by Yudin and Nemirovskii [65] is polynomial for explicit linear programs. The current best time complexity is O*(n^4.5), achieved by a reduction to logconcave sampling [45].

For optimization of an explicit function over a convex set, the oracle is simply membership in the convex set. A stronger oracle is typically available, namely a separation oracle that gives a hyperplane separating the query point from the convex set when the point lies outside. Using a separation oracle, the ellipsoid algorithm has complexity O*(n^2), while Vaidya's algorithm and that of Bertsimas and Vempala have complexity O*(n).

The current frontier for polynomial-time algorithms is when f is a logconcave function in R^n. Slight generalizations, e.g., linear optimization over a star-shaped body, are NP-hard even to solve approximately [49, 9].
2.2 Integration/Counting

Input: an oracle for a function f : R^n → R_+; a point x_0 with f(x_0) > 0; an error parameter ε.
Output: a real number A such that

  (1 − ε) ∫ f ≤ A ≤ (1 + ε) ∫ f.
Dyer, Frieze and Kannan [15, 16] gave a polynomial-time randomized algorithm for estimating the volume of a convex body (the special case above when f is the indicator function of a convex body) with complexity poly(n, 1/ε, log(1/δ)), where δ is the probability that the algorithm's output is incorrect. This was extended to integration of logconcave functions by Applegate and Kannan [3], with an additional dependence of the complexity on the Lipschitz parameter of f. Following a long line of improvements, the current best complexity is O*(n^4), using a variant of simulated annealing quite similar to the algorithm for optimization [45].

In the discrete setting, the natural general problem is when f is a distribution over the integer points in a convex set, e.g., f is 1 only for integer points in a convex body K. While there are many special cases that are well-studied, e.g., perfect matchings of a graph [23], not much is known in general except under a roundness condition [30].
2.3 Learning

Input: random points drawn from an unknown distribution in R^n and their labels given by an unknown function f : R^n → {0, 1}; an error parameter ε.
Output: a function g : R^n → {0, 1} such that P(g(x) ≠ f(x)) ≤ ε over the unknown distribution.

The special case when f is an unknown linear threshold function can be solved by using any linear classifier that correctly labels a sample of size

  (C/ε) (n log(1/ε) + log(1/δ)),

where C is a constant. Such a classifier can be learned by using any efficient algorithm for linear programming. The case when f is a polynomial threshold function of bounded degree can also be learned efficiently, essentially by reducing to the case of linear thresholds via a linearization of the polynomial.
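To illustrate the reduction just mentioned, the sketch below (an illustration, not from the cited works) finds a consistent linear classifier for a labeled sample by solving a feasibility linear program; it assumes numpy and scipy.optimize.linprog are available.

import numpy as np
from scipy.optimize import linprog

def consistent_halfspace(X, y):
    """Find (w, b) with w.x > b on positive examples and w.x < b on negative
    ones (labels y in {0,1}) by linear programming feasibility."""
    m, n = X.shape
    s = np.where(y == 1, 1.0, -1.0)          # +1 for positive, -1 for negative
    # enforce s_i * (w.x_i - b) >= 1, written as an upper-bound constraint
    A_ub = np.hstack([-s[:, None] * X, s[:, None]])
    b_ub = -np.ones(m)
    c = np.zeros(n + 1)                      # pure feasibility: minimize 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
    if not res.success:
        return None                          # sample is not linearly separable
    return res.x[:n], res.x[n]               # (w, b)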
2.4 Sampling

Input: an oracle for a function f : R^n → R_+; a point x_0 with f(x_0) > 0; an error parameter ε.
Output: a random point x whose distribution D is within total variation distance ε of the distribution with density proportional to f.

As the key ingredient of their volume algorithm [16], Dyer, Frieze and Kannan gave a polynomial-time algorithm for the case when f is the indicator function of a convex body. The complexity of the sampler itself was poly(n, 1/ε). Later, Applegate and Kannan [3] generalized this to smooth logconcave functions with complexity polynomial in n, 1/ε and the Lipschitz parameter of the logconcave function. In [45] the complexity was improved to O*(n^4) and the dependence on the Lipschitz parameter was removed.
2.5 Rounding

We first state this problem for a special case.

Input: a membership oracle for a convex body K; a point x_0 ∈ K; an error parameter ε.
Output: a linear transformation A and a point z such that K' = A(K − z) satisfies one of the following:

Exact sandwiching:

  B_n ⊆ K' ⊆ R·B_n.    (1)

Second moment sandwiching: for every unit vector u,

  1 − ε ≤ E_{K'}((u · x)²) ≤ 1 + ε.

Approximate sandwiching:

  vol(B_n ∩ K') ≥ (1/2) vol(B_n)  and  vol(K' ∩ R·B_n) ≥ (1/2) vol(K').    (2)
When the input is a general density function, the second moment condition extends readily without any change to the statement.

The classical Löwner-John theorem says that for any convex body K, the ellipsoid E of maximal volume contained in K has the property that K ⊆ nE (and K ⊆ √n E if K is centrally symmetric). Thus, by using the transformation that makes this ellipsoid a ball, one achieves R = n in the first condition above (exact sandwiching). Moreover, this is the best possible, as shown by the simplex. However, computing the ellipsoid of maximal volume is hard. Lovász [41] showed how to compute an ellipsoid that satisfies the containment with R = n^{3/2} using the Ellipsoid algorithm. This remains the best-known deterministic algorithm. A randomized algorithm can achieve R = n(1 + ε) for any ε > 0 using a simple reduction to random sampling [27]. In fact, all that one needs is n · C(ε) random samples [8, 57, 54, 1], and then the transformation to be computed is the one that puts the sample in isotropic position.
3 Geometric inequalities and conjectures

We begin with some fundamental inequalities. For two subsets A, B of R^n, their Minkowski sum is

  A + B = {x + y : x ∈ A, y ∈ B}.

The Brunn-Minkowski inequality says that if A, B and A + B are measurable, compact subsets of R^n, then

  vol(A + B)^{1/n} ≥ vol(A)^{1/n} + vol(B)^{1/n}.    (3)

The following extension is the Prékopa-Leindler inequality: for any λ ∈ [0, 1] and any three functions f, g, h : R^n → R_+ satisfying

  h(λx + (1 − λ)y) ≥ f(x)^λ g(y)^{1−λ}

for every x, y ∈ R^n,

  ∫_{R^n} h(x) dx ≥ (∫_{R^n} f(x) dx)^λ (∫_{R^n} g(x) dx)^{1−λ}.    (4)

The product and the minimum of two logconcave functions are also logconcave. We also have the following fundamental properties [12, 39, 55, 56].
Theorem 1. All marginals and the distribution function of a logconcave function are logconcave. The convolution of two logconcave functions is logconcave.

In the rest of this section, we describe inequalities about rounding, concentration of mass and isoperimetry. These inequalities play an important role in the analysis of algorithms for our focus problems. Many of the inequalities apply to logconcave functions. The latter represent the limit to which methods that work for convex bodies can be extended. This is true for all the algorithms we will see, with a few exceptions, e.g., an extension of isoperimetry and sampling to star-shaped bodies. We note here that Milman [50] has shown a general approximate equivalence between concentration and isoperimetry over convex domains.
3.1 Rounding

We next describe a set of properties related to affine transformations. The first one below is from [27].

Theorem 2 ([27]). Let K be a convex body in R^n in isotropic position. Then,

  √((n+1)/n) · B_n ⊆ K ⊆ √(n(n+1)) · B_n.

Thus, in terms of the exact sandwiching condition (1), isotropic position achieves a factor of n, and this ratio of the radii of the outer and inner ball, n, is the best possible, as shown by the n-dimensional simplex. A bound of O(n) was earlier established by Milman and Pajor [51].
Sandwiching was extended to logconcave functions in [48] as follows.

Theorem 3 ([48]). Let f be a logconcave density in isotropic position. Then for any t > 0, the level set L(t) = {x : f(x) ≥ t} contains a ball of radius π_f(L(t))/e. Also, the point z = arg max f satisfies ‖z‖ ≤ n + 1.

For general densities, it is convenient to define a rounding parameter as follows. For a density function f, let R² = E_f(‖X − EX‖²) and let r be the radius of the largest ball contained in the level set of f of measure 1/8. Then R/r is the rounding parameter, and we say a function is well-rounded if R/r = O(√n). By the above theorem, any logconcave density in isotropic position is well-rounded.
An isotropic transformation can be estimated for any logconcave function using random samples drawn from the density proportional to f. In fact, following the work of Bourgain [8] and Rudelson [57], Adamczak et al. [1] showed that O(n) random points suffice to achieve, say, 2-isotropic position with high probability.

Theorem 4 ([1]). Let x_1, . . . , x_m be random points drawn from an isotropic logconcave density f. Then for any ε ∈ (0, 1) and t ≥ 1, there exists C(ε, t) such that if m ≥ C(ε, t)·n, then

  ‖(1/m) Σ_{i=1}^m x_i x_i^T − I‖_2 ≤ ε

with probability at least 1 − e^{−ct√n}. Moreover,

  C(ε, t) = C (t^4/ε²) log²(2t²/ε²)

suffices for absolute constants c, C.
The following conjecture was alluded to in [44].

Conjecture 5. There exists a constant C such that for any convex body K in R^n, there exists an ellipsoid E that satisfies approximate sandwiching:

  vol(E ∩ K) ≥ (1/2) vol(E)  and  vol(K ∩ C log n · E) ≥ (1/2) vol(K).
3.2 Measure and concentration

The next inequality was proved by Grünbaum [22] (for the special case of the uniform density over a convex body).

Lemma 6 ([22]). Let f : R^n → R_+ be a logconcave density function, and let H be any halfspace containing its centroid. Then

  ∫_H f(x) dx ≥ 1/e.

This was extended to hyperplanes that come close to the centroid in [6, 48].

Lemma 7 ([6, 48]). Let f : R^n → R_+ be an isotropic logconcave density function, and let H be any halfspace within distance t of its centroid. Then

  ∫_H f(x) dx ≥ 1/e − t.

The above lemma easily implies a useful concentration result.

Theorem 8. Let f be a logconcave density in R^n and let z be the average of m random points drawn from π_f. If H is a halfspace containing z, then

  E(π_f(H)) ≥ 1/e − √(n/m).

A strong radial concentration result was shown by Paouris [54].

Lemma 9 ([54]). Let K be an isotropic convex body. Then, for any t ≥ 1,

  P(‖X‖ ≥ ct√n) ≤ e^{−t√n},

where c is an absolute constant.

This implies that all but an exponentially small fraction of the mass of an isotropic convex body lies in a ball of radius O(√n).
The next inequality is a special case of a more general inequality due to Brascamp and Lieb. It also falls in a family called Poincaré-type inequalities. Recall that Var_μ(f) denotes the variance of a real-valued function f w.r.t. a density function μ.

Theorem 10. Let γ be the standard Gaussian density in R^n and let f : R^n → R be a smooth function. Then

  Var_γ(f) ≤ ∫_{R^n} ‖∇f‖² dγ.

A smooth function here is one that is locally Lipschitz.

An extension of this inequality to logconcave restrictions was developed in [63], eliminating the need for smoothness and deriving a quantitative bound. In particular, the lemma shows how variances go down when the standard Gaussian is restricted to a convex set.
Theorem 11 ([63]). Let g be the standard Gaussian density function in R^n and let f : R^n → R_+ be any logconcave function. Define the function h to be the density proportional to their product, i.e.,

  h(x) = f(x)g(x) / ∫_{R^n} f(x)g(x) dx.

Then, for any unit vector u ∈ R^n,

  Var_h(u · x) ≤ 1 − e^{−b²}/(2π),

where the support of f along u is [a_0, a_1] and b = min{|a_0|, |a_1|}.
The next conjecture is called the variance hypothesis or thin shell conjecture.

Conjecture 12 (Thin shell). For X drawn from a logconcave isotropic density f in R^n,

  Var_f(‖X‖²) ≤ Cn.

We note here that for an isotropic logconcave density,

  σ_n² = E_f((‖X‖ − √n)²) ≤ (1/n) Var_f(‖X‖²) ≤ C σ_n²,

i.e., the deviation of ‖X‖ from √n and the deviation of ‖X‖² from E(‖X‖²) = n are closely related. Thus, the conjecture implies that most of the mass of a logconcave distribution lies in a shell of constant thickness.

A bound of σ_n ≤ √n is easy to establish for any isotropic logconcave density. Klartag showed that σ_n = o(√n) [36, 37], a result that can be interpreted as a central limit theorem (since the standard deviation is now a smaller order term than the expectation). The current best bound on σ_n is due to Fleury [19], who showed that σ_n ≤ n^{3/8}. We state his result below.

Lemma 13 ([19]). For any isotropic logconcave measure μ,

  P(|‖x‖ − √n| ≥ t n^{3/8}) ≤ C e^{−ct},

where c, C are constants.
3.3 Isoperimetry

The following theorem is from [14], improving on a theorem in [43] by a factor of 2. For two subsets S_1, S_2 of R^n, we let d(S_1, S_2) denote the minimum Euclidean distance between them.

Theorem 14 ([14]). Let S_1, S_2, S_3 be a partition into measurable sets of a convex body K of diameter D. Then,

  vol(S_3) ≥ (2 d(S_1, S_2)/D) min{vol(S_1), vol(S_2)}.

A limiting (and equivalent) version of this inequality is the following. Let ∂S ∩ K denote the interior boundary of S w.r.t. K. For any subset S of a convex body K of diameter D,

  vol_{n−1}(∂S ∩ K) ≥ (2/D) min{vol(S), vol(K \ S)},

i.e., the surface area of S inside K is large compared to the volumes of S and K \ S. This is in direct analogy with the classical isoperimetric inequality, which says that the surface area to volume ratio of any measurable set is at least the ratio for a ball. Henceforth, for a density function f, we use Φ_f(S) to denote the isoperimetric ratio of a subset S, i.e.,

  Φ_f(S) = π_f(∂S) / min{π_f(S), 1 − π_f(S)}

and

  Φ_f = inf_{S ⊆ R^n} Φ_f(S).

Theorem 14 can be generalized to arbitrary logconcave measures.

Theorem 15. Let f : R^n → R_+ be a logconcave function whose support has diameter D and let π_f be the induced measure. Then for any partition of R^n into measurable sets S_1, S_2, S_3,

  π_f(S_3) ≥ (2 d(S_1, S_2)/D) min{π_f(S_1), π_f(S_2)}.

The conclusion can be equivalently stated as Φ_f ≥ 2/D. For Euclidean distance this latter statement, applied to the subset S_1, essentially recovers the above lower bound on π_f(S_3).
Next we ask if logconcave functions are the limit of such isoperimetry. Theorem 15 can be extended to s-concave functions in R^n for s ≥ −1/(n − 1) with only a loss of a factor of 2 [10]. A nonnegative function f is s-concave if f(x)^s is a concave function of x. Logconcave functions correspond to s → 0. The only change in the conclusion is by a factor of 2 on the RHS.

Theorem 16 ([10]). Let f : R^n → R_+ be an s-concave function with s ≥ −1/(n − 1) and with support of diameter D, and let π_f be the induced measure. Then for any partition of R^n into measurable sets S_1, S_2, S_3,

  π_f(S_3) ≥ (d(S_1, S_2)/D) min{π_f(S_1), π_f(S_2)}.

This theorem is tight, in that there exist s-concave functions with s ≤ −1/(n − 1 − ε), for any ε > 0, whose isoperimetric ratio is exponentially small in εn.

Logconcave and s-concave functions are natural extensions of convex bodies. It is natural to ask what classes of nonconvex sets/functions might have good isoperimetry. In recent work [9], the isoperimetry of star-shaped bodies was studied. A star-shaped set S is one that contains at least one point x such that for any y ∈ S, the segment [x, y] is also in S. The set of all such points x, for which lines through x have convex intersections with S, is called its kernel K_S. The kernel is always a convex set.

Theorem 17 ([9]). Let S_1, S_2, S_3 be a partition into measurable sets of a star-shaped body S of diameter D. Let η = vol(K_S)/vol(S). Then,

  vol(S_3) ≥ (η d(S_1, S_2)/(4D)) min{vol(S_1), vol(S_2)}.
In terms of diameter, Theorem 14 is the best possible, as shown by a cylinder. A more refined inequality is obtained in [27, 48] using the average distance of a point to the center of gravity (in place of the diameter). It is possible for a convex body to have much larger diameter than average distance to its centroid (e.g., a cone). In such cases, the next theorem provides a better bound.

Theorem 18 ([27]). Let f be a logconcave density in R^n and let π_f be the corresponding measure. Let z_f be the centroid of f and define M(f) = E_f(|x − z_f|). Then, for any partition of R^n into measurable sets S_1, S_2, S_3,

  π_f(S_3) ≥ (ln 2 / M(f)) d(S_1, S_2) π_f(S_1) π_f(S_2).

For an isotropic density, M(f)² ≤ E_f(|x − z_f|²) = n and so M(f) ≤ √n. The diameter could be unbounded (e.g., an isotropic Gaussian). Thus, if A_f is the covariance matrix of the logconcave density f, the inequality can be re-stated as:

  π_f(S_3) ≥ (c/√(Tr(A_f))) d(S_1, S_2) π_f(S_1) π_f(S_2).

A further refinement is called the KLS hyperplane conjecture [27]. Let λ_1(A) denote the largest eigenvalue of a symmetric matrix A.

Conjecture 19 (KLS). Let f be a logconcave density in R^n. There is an absolute constant c such that

  Φ_f ≥ c/√(λ_1(A_f)).
The KLS conjecture implies the thin shell conjecture (Conjecture 12). Even the thin shell conjecture would improve the known isoperimetry of isotropic logconcave densities, due to the following result of Bobkov [7].

Theorem 20 ([7]). For any logconcave density function f in R^n,

  Φ_f ≥ c / Var(‖X‖²)^{1/4}.

Combining this with Fleury's bound (Lemma 13), which gives Var(‖X‖²) = O(n σ_n²) = O(n^{7/4}) for an isotropic logconcave density, we get that for an isotropic logconcave function in R^n,

  Φ_f ≥ c n^{−7/16}.    (5)

This is slightly better than the bound of c n^{−1/2} implied by Theorem 18, since E(‖X‖²) = n for an isotropic density.
The next conjecture is perhaps the most well-known in convex geometry, and is called the slicing problem, the isotropic constant conjecture, or simply the hyperplane conjecture.

Conjecture 21 (Slicing). There exists a constant C such that for an isotropic logconcave density f in R^n,

  f(0) ≤ C^n.

In fact, a result of Ball [4] shows that it suffices to prove such a bound for the case of f being the uniform density over an isotropic convex body. In this case, the conjecture says that the volume of an isotropic convex body in R^n is at least c^n for some constant c. The current best bound on L_n = sup_f f(0)^{1/n} is O(n^{1/4}) [8, 35].

Recently, Eldan and Klartag [18] have shown that the slicing conjecture is implied by the thin shell conjecture (and therefore by the KLS conjecture as well, a result that was earlier shown by Ball).

Theorem 22 ([18]). There exists a constant C such that

  L_n = sup_f f(0)^{1/n} ≤ C σ_n = C sup_f √(E_f((‖X‖ − √n)²)),

where the suprema range over isotropic logconcave density functions.
We conclude this section with a discussion of another direction in which classical isoperimetry has been extended. The cross-ratio distance (also called the Hilbert metric) between two points u, v in a convex body K is computed as follows. Let p, q be the endpoints of the chord in K through u and v, such that the points occur in the order p, u, v, q. Then

  d_K(u, v) = |u − v||p − q| / (|p − u||v − q|) = (p : v : u : q),

where (p : v : u : q) denotes the classical cross-ratio. We can now define the cross-ratio distance between two sets S_1, S_2 as

  d_K(S_1, S_2) = min{d_K(u, v) : u ∈ S_1, v ∈ S_2}.

The next theorem was proved in [42] for convex bodies and extended to logconcave densities in [47].

Theorem 23 ([42, 47]). Let f be a logconcave density in R^n whose support is a convex body K and let π_f be the induced measure. Then for any partition of R^n into measurable sets S_1, S_2, S_3,

  π_f(S_3) ≥ d_K(S_1, S_2) π_f(S_1) π_f(S_2).
All the inequalities so far are based on defining the distance between S_1 and S_2 by the minimum over pairs of some notion of pairwise distance. It is reasonable to think that perhaps a much sharper inequality can be obtained by using some average distance between S_1 and S_2. Such an inequality was proved in [46], leading to a substantial improvement in the analysis of hit-and-run.

Theorem 24 ([46]). Let K be a convex body in R^n. Let f : K → R_+ be a logconcave density with corresponding measure π_f and let h : K → R_+ be an arbitrary function. Let S_1, S_2, S_3 be any partition of K into measurable sets. Suppose that for any pair of points u ∈ S_1 and v ∈ S_2 and any point x on the chord of K through u and v,

  h(x) ≤ (1/3) min(1, d_K(u, v)).

Then

  π_f(S_3) ≥ E_f(h(x)) min{π_f(S_1), π_f(S_2)}.

The coefficient on the RHS has changed from a minimum to an average. The weight h(x) at a point x is restricted only by the minimum cross-ratio distance between pairs u, v from S_1, S_2 respectively, such that x lies on the line between them (previously it was the overall minimum). In general, it can be much higher than the minimum cross-ratio distance between S_1 and S_2.
3.4 Localization

Most of the known isoperimetry theorems can be proved using the localization lemma of Lovász and Simonovits [44, 27]. It reduces inequalities in high dimension to 1-dimensional inequalities by a sequence of bisections and a limit argument. To prove that an inequality is true, one assumes it is false and derives a contradiction in the 1-dimensional case.

Lemma 25 ([44]). Let g, h : R^n → R be lower semi-continuous integrable functions such that

  ∫_{R^n} g(x) dx > 0  and  ∫_{R^n} h(x) dx > 0.

Then there exist two points a, b ∈ R^n and a linear function ℓ : [0, 1] → R_+ such that

  ∫_0^1 ℓ(t)^{n−1} g((1 − t)a + tb) dt > 0  and  ∫_0^1 ℓ(t)^{n−1} h((1 − t)a + tb) dt > 0.

The points a, b represent an interval, and one may think of ℓ(t)^{n−1} dA as the cross-sectional area of an infinitesimal cone with base area dA. The lemma says that over this cone truncated at a and b, the integrals of g and h are positive. Also, without loss of generality, we can assume that a, b are in the union of the supports of g and h.
A variant of localization was used in [9], where instead of proving an inequality for every needle, one instead averages over the set of needles produced by the localization procedure. Indeed, each step of localization is a bisection with a hyperplane, and for an inequality to hold for the original set, it often suffices for it to hold "on average" over the partition produced rather than for both parts separately. This approach can be used to prove isoperimetry for star-shaped sets (Theorem 17) or to recover Bobkov's isoperimetric inequality (Theorem 20).

We state here a strong version of a conjecture related to localization, which, if true, implies the KLS hyperplane conjecture.

Conjecture 26. Let f be a logconcave density function and let P be any partition of R^n such that each part is convex. Then there exists a constant C such that

  Σ_{P ∈ P} σ_f(P)² π_f(P) ≤ C σ_f²,

where σ_f² is the largest variance of f along any direction and σ_f(P)² is the largest variance of f restricted to the set P.

We have already seen that the conjecture holds when f is a Gaussian with C = 1 (Theorem 11).
4 Algorithms

In this section, we describe the current best algorithms for the five focus problems and several open questions. These algorithms are related to each other: e.g., the current best algorithms for integration and optimization (in the membership oracle model) are based on sampling, and the current best algorithms for sampling and rounding proceed in tandem. In fact, all five problems have an intimate relationship with random sampling. Integration (volume estimation), optimization and rounding are solved by reductions to random sampling; in the case of integration, this is the only known efficient approach. For rounding, the sampling approach gives a better bound than the current best alternatives. As for the learning problem, its input is a random sample.
The Ellipsoid method has played a central role in the development of algorithmic convex geometry. Following Khachiyan's polynomial bound for linear programs [34], the generalization to convex optimization [21, 53, 32] has been a powerful theoretical tool. Besides optimization, the method also gave an efficient rounding algorithm [41] achieving a sandwiching ratio of n^{1.5}, and a polynomial-time algorithm for PAC-learning linear threshold functions. It is a major open problem to PAC-learn an intersection of two halfspaces.

Several polynomial-time algorithms have been proposed for linear programming and convex optimization, including: Karmarkar's algorithm [31] for linear programming, interior-point methods for convex optimization over sets with self-concordant barriers [52], polynomial implementations of the perceptron algorithm for linear [13] and conic [5] programs, simplex-with-rescaling for linear programs [33], and the random walk method [6] that requires only membership oracle access to the convex set. The field of convex optimization continues to be very active, both because of its many applications and because of the search for more practical algorithms that work on larger inputs.

One common aspect of all the known polynomial-time algorithms is that they use some type of repeated rescaling of space to effectively make the convex set more "round", leading to a complexity that depends on the accuracy to which the output is computed. In principle, such a dependence can be avoided for linear programming, and it is a major open problem to find a strongly polynomial algorithm for solving linear programs.

We note here that although there have been many developments in continuous optimization algorithms in the decades since the ellipsoid method, and in applications to combinatorial problems, there has been little progress in the past two decades on general integer programming, with the current best complexity being n^{O(n)} from Kannan's improvement [26] of Lenstra's algorithm [40].
4.1 Geometric random walks

Sampling is achieved by rapidly mixing geometric random walks. For an in-depth survey of this topic, up to date till 2005, the reader is referred to [62]. Here we outline the main results, including developments since then.

To sample from a density proportional to a function f : R^n → R_+, we set up a Markov chain whose state space is R^n and whose stationary distribution has density proportional to f. At a point x, one step of the chain picks a neighbor y of x from some distribution P_x that depends only on x, where P_x is defined in some efficiently sampleable manner, and then we move to y with some probability or stay at x. For example, for uniformly sampling a convex body K, the ball walk defines the neighbors of x as all points within a fixed distance δ from x, and a random neighbor y is accepted as long as y is also in K. For sampling a general density, one could use the same neighbor set and modify the probability of transition to min{1, f(y)/f(x)}. This rejection mechanism is called the Metropolis filter, and the Markov chain itself is the ball walk with a Metropolis filter.
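The following minimal sketch (in Python with numpy, purely for illustration and not taken from the cited papers) shows one lazy step of the ball walk with a Metropolis filter, given a black-box function oracle f.

import numpy as np

def ball_walk_step(x, f, delta, rng):
    """One lazy step of the ball walk with a Metropolis filter for a density
    proportional to f; delta is the step-size parameter."""
    if rng.random() < 0.5:                   # lazy: stay put with probability 1/2
        return x
    n = len(x)
    u = rng.standard_normal(n)               # uniform point in the delta-ball:
    u *= delta * rng.random() ** (1.0 / n) / np.linalg.norm(u)
    y = x + u
    # Metropolis filter: accept with probability min{1, f(y)/f(x)}
    if f(y) > 0 and rng.random() < min(1.0, f(y) / f(x)):
        return y
    return x

For the uniform distribution over a convex body, f is the indicator function of the body and the filter simply rejects proposals that leave the body.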
Another Markov chain that has been successfully analyzed is called hit-and-run. At a point x, we pick a uniform random line ℓ passing through x, then a random point on ℓ according to the density induced by the target distribution along ℓ. In the case of uniformly sampling a convex body, this latter distribution is simply the uniform distribution on a chord; for a logconcave function it is the one-dimensional density proportional to the target density f along the chosen line ℓ. This process has the advantage of not needing a step-size parameter δ.
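The sketch below (again only an illustration, with numpy assumed) gives one hit-and-run step for the uniform distribution over a convex body presented by a membership oracle; locating the chord endpoints by binary search is an assumption of this sketch, not a prescription from the cited works.

import numpy as np

def hit_and_run_step(x, in_body, rng, r_max=1e6, tol=1e-9):
    """One hit-and-run step for the uniform distribution over a convex body,
    given a membership oracle in_body; the chord through x along a random
    direction is located by binary search and a uniform point on it is returned."""
    n = len(x)
    d = rng.standard_normal(n)
    d /= np.linalg.norm(d)                   # uniform random direction

    def extent(sign):                        # how far the chord extends along sign*d
        lo, hi = 0.0, r_max
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if in_body(x + sign * mid * d):
                lo = mid
            else:
                hi = mid
        return lo

    t = rng.uniform(-extent(-1.0), extent(1.0))
    return x + t * d

For a general logconcave density, the uniform choice on the chord is replaced by a sample from the one-dimensional density proportional to f along the chord.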
For ease of analysis, we typically assume that the Markov chain is lazy, i.e., it stays put with probability 1/2 and attempts a move with probability 1/2. Then, it follows that the distribution of the current point converges to a unique stationary distribution assuming the conductance of the Markov chain is nonzero. The rate of convergence is approximately determined by the conductance, as given by the following theorem due to Lovász and Simonovits [44], extending earlier work by Diaconis and Stroock [11], Alon [2] and Jerrum and Sinclair [58].
Theorem 27 ([44]). Let Q be the stationary distribution of a lazy Markov chain with Q_0 being its starting distribution and Q_t the distribution of the current point after t steps.

a. Let M = sup_A Q_0(A)/Q(A). Then,

  d_tv(Q_t, Q) ≤ √M (1 − φ²/2)^t.

b. Let 0 < s ≤ 1/2 and H_s = sup{|Q_0(A) − Q(A)| : Q(A) ≤ s}. Then,

  d_tv(Q_t, Q) ≤ H_s + (H_s/s) (1 − φ_s²/2)^t.

c. For any ε > 0,

  d_tv(Q_t, Q) ≤ ε + √((χ²(Q_0, Q) + 1)/ε) (1 − φ²/2)^t.
Thus the main quantity to analyze in order to bound the mixing time is the conductance.
Ball walk. The following bound holds on the conductance of the ball walk [28].

Theorem 28. For any 0 ≤ s ≤ 1, we can choose the step-size δ for the ball walk in a convex body K of diameter D so that

  φ_s ≥ s/(200 n D).
The proof of this inequality is the heart of understanding the convergence. Examining the definitions of the conductance φ and of the isoperimetric ratio Φ_Q of the stationary distribution, one sees that they are quite similar. In fact, the main difference is the following: in defining conductance, the weight between two points x and y depends on the probability density of stepping from x to y in one step; for the isoperimetric ratio, the distance is a geometric notion such as Euclidean distance. Connecting these two notions (geometric distance and probabilistic distance), along with isoperimetry bounds, leads to the conductance bound above. This is the generic line of proof, with each of these components chosen based on the Markov chain being analyzed. For a detailed description, we refer to [62].
Using Theorem 27(b), we conclude that from an M-warm start, the variation distance of Q_t and Q is smaller than ε after

  t ≥ C (M²/ε²) n² D² ln(2M/ε)    (6)

steps, for some absolute constant C.

The bound of O(n²D²) on the mixing rate is the best possible in terms of the diameter, as shown by a cylinder. Here D² can be replaced by E(‖X − EX‖²) using the isoperimetric inequality given by Theorem 18. For an isotropic convex body this gives a mixing rate of O(n³). The current best isoperimetry for convex bodies (5) gives a bound of O(n^{2.875}). The KLS conjecture (Conjecture 19) implies a mixing rate of O(n²), which matches the best possible for the ball walk, as shown for example by an isotropic cube.
Hit-and-run. For hit-and-run, one obtains a qualitatively better bound in the following sense. In the case of the ball walk, the starting distribution heavily affects the mixing rate. It is possible to start at points close to the boundary with very small local conductance, necessitating many attempts to make a single step. Hit-and-run does not have this problem and manages to exponentially accelerate out of any corner. This is captured in the next theorem, using the isoperimetric inequality given by Theorem 24.

Theorem 29 ([46]). The conductance of hit-and-run in a convex body of diameter D is Ω(1/nD).

Theorem 30 ([46]). Let K be a convex body that contains a unit ball and has centroid z_K. Suppose that E_K(|x − z_K|²) ≤ R² and χ²(Q_0, Q) ≤ M. Then after

  t ≥ C n² R² ln³(M/ε)

steps, where C is an absolute constant, we have d_tv(Q_t, Q) ≤ ε.

The theorem improves on the bound for the ball walk (6) by reducing the dependence on M and ε from polynomial (which is unavoidable for the ball walk) to logarithmic, while maintaining the (optimal) dependence on R and n. For a body in near-isotropic position, R = O(√n) and so the mixing time is O*(n³). One also gets a polynomial bound starting from any single interior point: if x is at distance d from the boundary, then the distribution obtained after one step from x has χ²(Q_1, Q) ≤ (n/d)^n, and so, applying the above theorem, the mixing time is O(n³ ln⁴(n/dε)).

Theorems 29 and 30 have been extended in [45] to arbitrary logconcave functions.

Theorem 31 ([45]). Let f be a logconcave density function with support of diameter D, and assume that the level set of measure 1/8 contains a unit ball. Then,

  φ_s ≥ c/(nD ln(nD/s)),

where c is a constant.

This implies a mixing rate that nearly matches the bound for convex bodies.
Affine-invariant walks. In both cases above, the reader will notice the dependence on rounding parameters and the improvement achieved by assuming isotropic position. As we will see in the next section, efficient rounding can be achieved by interlacing with sampling. Here we mention two random walks which achieve the rounding "implicitly" by being affine invariant. The first is a multi-point variant of hit-and-run that can be applied to sampling any density function. It maintains m points x_1, . . . , x_m. For each x_j, it picks a random combination of the current points,

  y = Σ_{i=1}^m α_i (x_i − x̄),

where the α_i are drawn from N(0, 1) and x̄ is the average of the current points; the chosen point x_j is replaced by a random point along the line through x_j in the direction of y. This process is affine invariant and hence one can effectively assume that the underlying distribution is isotropic. The walk was analyzed in [6] assuming a warm start. It is open to analyze the rate of convergence from a general starting distribution.
For polytopes whose facets are given explicitly, Kannan and Narayanan [29] analyzed an affine-invariant process called the Dikin walk. At each step, the Dikin walk computes an ellipsoid based on the current point (and the full polytope) and moves to a random point in this ellipsoid. The Dikin ellipsoid at a point x in a polytope Ax ≤ 1 with m inequalities is defined as:

  D_x = {z ∈ R^n : (x − z)^T [Σ_{i=1}^m a_i a_i^T / (1 − a_i^T x)²] (x − z) ≤ 1}.

At x, a new point y is sampled from the Dikin ellipsoid D_x and the walk moves to y with probability

  min{1, vol(D_x)/vol(D_y)}.

For polytopes with few facets, this process uses fewer arithmetic operations than the current best bounds for the ball walk or hit-and-run.
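A minimal sketch of one Dikin walk step appears below (Python with numpy, as an illustration only). It uses the identity vol(D_x)/vol(D_y) = √(det H(y)/det H(x)), where H(x) is the matrix defining the Dikin ellipsoid, and it additionally checks that x lies in the ellipsoid at y, a standard Metropolis filtering detail assumed by this sketch rather than stated above.

import numpy as np

def dikin_hessian(A, x):
    """H(x) = sum_i a_i a_i^T / (1 - a_i^T x)^2 for the polytope Ax <= 1."""
    s = 1.0 - A @ x                          # slacks; positive for interior x
    return (A / s[:, None]).T @ (A / s[:, None])

def dikin_walk_step(A, x, rng):
    """One Dikin walk step: propose a uniform point in the Dikin ellipsoid at x
    and apply a Metropolis filter based on the ratio of ellipsoid volumes."""
    n = len(x)
    Hx = dikin_hessian(A, x)
    u = rng.standard_normal(n)
    u *= rng.random() ** (1.0 / n) / np.linalg.norm(u)    # uniform in unit ball
    y = x + np.linalg.solve(np.linalg.cholesky(Hx).T, u)  # map ball to ellipsoid
    if np.any(A @ y >= 1.0):                 # safety check: stay inside the polytope
        return x
    Hy = dikin_hessian(A, y)
    if (x - y) @ Hy @ (x - y) > 1.0:         # x must lie in the ellipsoid at y
        return x
    ratio = np.sqrt(np.linalg.det(Hy) / np.linalg.det(Hx))
    return y if rng.random() < min(1.0, ratio) else x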
Open problems. One open problem for both hit-and-run and the ball walk is to find a single starting point that is as good a start as a warm start. E.g., does one of these walks started at the centroid converge at the same rate as starting from a random point?

The random walks we have considered so far for general convex bodies and density functions rely only on membership oracles or function oracles. Can one do better using a separation oracle? When presented with a point x, such an oracle either declares that the point is in the convex set K or gives a hyperplane that separates x from K. At least for convex bodies, such an oracle is typically realizable in the same complexity as a membership oracle. One attractive process based on a separation oracle is the following reflection walk with parameter δ: at a point x, we pick a random point y in the ball of radius δ. We move along the straight line from x to y, either reaching y or encountering a hyperplane H that separates y from K; in the latter case, we reflect y about H and continue moving towards y. It is possible that for this process the number of oracle calls (not the number of Markov chain steps) is only O*(nD) rather than O*(n²D²), and even O*(n) for isotropic bodies. It is a very interesting open problem to analyze the reflection walk.
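The following sketch is a simplified rendering of one step of the process just described (Python with numpy). Instead of tracing the full reflected path from x, it repeatedly reflects the target point about the separating hyperplanes returned by the oracle; this simplification, and the oracle interface, are assumptions of the sketch and not necessarily the variant whose analysis is posed as an open problem.

import numpy as np

def reflection_walk_step(x, delta, separation_oracle, rng, max_reflections=50):
    """Simplified reflection-walk step: pick y uniformly in the delta-ball
    around x; while y lies outside K, reflect y about a separating hyperplane.
    separation_oracle(p) returns None if p is in K, else a pair (h, c) with
    ||h|| = 1 describing a hyperplane h.z = c separating p from K."""
    n = len(x)
    u = rng.standard_normal(n)
    u *= delta * rng.random() ** (1.0 / n) / np.linalg.norm(u)
    y = x + u
    for _ in range(max_reflections):
        sep = separation_oracle(y)
        if sep is None:
            return y                         # y is inside K; move there
        h, c = sep
        y = y - 2.0 * (h @ y - c) * h        # reflect y about the hyperplane
    return x                                 # give up and stay at x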
Our next open problem is to understand which classes of distributions can be efficiently sampled by random walks. A recent extension of the convex setting is to sampling star-shaped bodies [9], and this strongly suggests that the full picture is far from clear. One necessary property is good isoperimetry. Can one provide a sufficient condition that depends on isoperimetry and some local property of the density to be sampled? More concretely, for what manifolds with nonnegative curvature (for which isoperimetry is known) can an efficient sampling process be defined?

Our final question in this section is about sampling a discrete subset of a convex body, namely the set of lattice points that lie in the body. In [30], it is shown that this problem can be solved if the body contains a sufficiently large ball, by a reduction to the continuous sampling problem. The idea is that, under this condition, the volume of the body is roughly the same as the number of lattice points, and thus sampling the body and rounding to a nearby lattice point is effective. Can this condition be improved substantially? E.g., can one sample lattice points of a near-isotropic convex body? Or lattice points of a body that contains at least half of a ball of radius O(√n)?
4.2 Annealing

Rounding, optimization and integration can all be achieved by variants of the same algorithmic technique, which one might call annealing. The method starts at an "easy" distribution F_0 and goes through a sequence of distributions F_1, . . . , F_m, where the F_i are chosen so that moving from F_{i−1} to F_i is efficient and F_m is a target distribution. This approach can be traced back to [43]. It was analyzed for linear optimization over convex sets in [25], for volume computation and rounding convex sets in [47], and extended to integration, rounding and maximization of logconcave functions in [45].

For example, in the case of convex optimization, the distribution F_m is chosen so that most of its mass is on points whose objective value is at least (1 − ε) times the maximum. Sampling directly from this distribution could be difficult since one begins at an arbitrary point.

For integration of logconcave functions, we define a series of functions, with the final function being the one we wish to integrate and the initial one being a function that is easy to integrate. The distribution in each phase has density proportional to the corresponding function. We use samples from the current distribution to estimate the ratios of integrals of consecutive functions. Multiplying all these ratios and the integral of the initial function gives the estimate for the integral of the target function.

For rounding a logconcave density, as shown by Theorem 4, we need O(n) random samples. Generating these directly from the target density could be expensive since sampling (using a random walk) takes time that depends heavily on how well-rounded the density function is. Instead, we consider again a sequence of distributions, where the first distribution in the sequence is chosen to be easy to sample and consecutive distributions have the property that if F_i is isotropic, then F_{i+1} is near-isotropic. We then sample F_{i+1} efficiently, apply an isotropic transformation and proceed to the next phase.

For both optimization and integration, this rounding procedure is incorporated into the main algorithm to keep the sampling efficient. All three cases are captured in the generic description below.
Annealing

1. For i = 0, . . . , m, define a_i = b(1 + 1/√n)^i and f_i(x) = f(x)^{a_i}.
2. Let X_0^1, . . . , X_0^k be independent random points with density proportional to f_0.
3. For i = 0, . . . , m − 1: starting with X_i^1, . . . , X_i^k, generate random points X_{i+1} = {X_{i+1}^1, . . . , X_{i+1}^k}; update a running estimate g based on these samples; update the isotropy transformation using the samples.
4. Output the final estimate of g.
For optimization, the function f_m is set to be a sufficiently high power of the function f to be maximized, while g is simply the maximum objective value so far. For integration and rounding, f_m = f, the target function to be integrated or rounded. For integration, the function g starts out as the integral of f_0 and is multiplied by the ratio ∫f_{i+1}/∫f_i in each step. For rounding, g is simply the estimate of the isotropic transformation for the current function.
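The integration instance of this scheme can be sketched as follows (Python, for illustration only). The black-box sampler sample_from, e.g., hit-and-run for the density proportional to f_i, and the value of ∫f_0 for the easy initial phase are inputs assumed by the sketch rather than specified in the text above.

import numpy as np

def anneal_integral(f, b, n, m, integral_f0, sample_from, x0, k=64, rng=None):
    """Sketch of annealing for integration: with a_i = b*(1 + 1/sqrt(n))**i and
    f_i(x) = f(x)**a_i, the estimate is the integral of f_0 multiplied by the
    sample averages of f_{i+1}(X)/f_i(X) over points X drawn from f_i."""
    rng = rng or np.random.default_rng()
    a = [b * (1.0 + 1.0 / np.sqrt(n)) ** i for i in range(m + 1)]
    estimate = integral_f0
    xs = [np.array(x0, dtype=float) for _ in range(k)]
    for i in range(m):
        fi = lambda x, ai=a[i]: f(x) ** ai
        xs = [sample_from(fi, x, rng) for x in xs]        # samples ~ f_i
        ratios = [f(x) ** (a[i + 1] - a[i]) for x in xs]  # f_{i+1}(x)/f_i(x)
        estimate *= float(np.mean(ratios))
    return estimate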
We now state the known guarantees for this algorithm [45]. The complexity improves on the original O*(n^{10}) algorithm of Applegate and Kannan [3].

Theorem 32 ([45]). Let f be a well-rounded logconcave function. Given ε, δ > 0, we can compute a number A such that with probability at least 1 − δ,

  (1 − ε) ∫_{R^n} f(x) dx ≤ A ≤ (1 + ε) ∫_{R^n} f(x) dx,

and the number of function calls is

  O(n^4 log^c(n/εδ)) = O*(n^4),

where c is an absolute constant.
Any logconcave function can be put in near-isotropic position in O*(n^4) steps, given a point that maximizes f. Near-isotropic position guarantees that f is well-rounded, as remarked earlier. When the maximum of f is not known in advance or computable quickly, the complexity is a bit higher and is the same as that of maximization.

Theorem 33 ([45]). For any well-rounded logconcave function f, given ε, δ > 0 and a point x_0 with f(x_0) ≥ β max f, the annealing algorithm finds a point x in O*(n^{4.5}) oracle calls such that, with probability at least 1 − δ, f(x) ≥ (1 − ε) max f; the dependence on ε, δ and β is bounded by a polynomial in ln(1/εδβ).

This improves significantly on the complexity of the Ellipsoid method in the membership oracle model (the latter being Ω(n^{10})).
We observe here that in this general oracle model, the upper bound on the complexity of optimization is higher than that of integration. The gap gets considerably higher when we move to star-shaped sets. Integration remains O*(n^4/η²), where η is the relative measure of the kernel of the star-shaped set, while linear optimization, even approximate, is NP-hard. The hardness holds for star-shaped sets with η being any constant [9], i.e., even when the convex kernel takes up any constant fraction of the set. At one level, this is not so surprising, since finding a maximum could require zooming in to a small hidden portion of the set, while integration is a more global property. On the other hand, historically, efficient integration algorithms came later than optimization algorithms even for convex bodies and were considerably more challenging to analyze.
The known analyses of sampling-based methods for optimization rely on the current point being nearly random from a suitable distribution, with the distribution modified appropriately at each step. It is conceivable that a random walk type method can be substantially faster if its goal is optimization and its analysis directly measures progress on some distance function, rather than relying on the intermediate step of producing a random sample.

The modern era of volume algorithms began when Lovász asked whether the volume of a convex body can be estimated from a random sample (unlike annealing, in which the sampling distribution is modified several times). Eldan [17] has shown that this could require a superpolynomial number of points for general convex bodies. On the other hand, it remains open to efficiently compute the volume from a random sample for polytopes with a bounded number of facets.

Annealing has been used to speed up the reduction from counting to sampling for discrete sets as well [59]. Here the traditional reduction, with overhead linear in the underlying dimension [24], is improved to one that is roughly the square root of the dimension for estimating a wide class of partition functions. The annealing algorithm as described above has to be extended to be adaptive, i.e., the exponents used change in an adaptive manner rather than according to a schedule fixed in advance.
4.3 PCA

The method we discuss in this section is classical, namely Principal Component Analysis (PCA). For a given set of points or a distribution in R^n, PCA identifies a sequence of orthonormal vectors such that for any k ≤ n, the span of the first k vectors in this sequence is the subspace that minimizes the expected squared distance to the given point set or distribution (among all k-dimensional subspaces). The vectors can be found using the Singular Value Decomposition (SVD), which can be viewed as a greedy algorithm that finds one vector at a time. More precisely, given a distribution D in R^n with E_D(X) = 0, the top principal component or singular vector is a unit vector v that maximizes E((v^T x)²). For 2 ≤ i ≤ n, the i'th principal component is a unit vector that maximizes the same function among all unit vectors that are orthogonal to the first i − 1 principal components. For a general Gaussian, the principal components are exactly the directions along which the component 1-dimensional Gaussians are generated. Besides the regression property for subspaces, PCA is attractive because it can be computed efficiently by simple, iterative algorithms. Replacing the expected squared distance by a different power leads to an intractable problem. We have already seen one application of PCA, namely to rounding via the isotropic position. Here we take the principal components of the covariance matrix and apply a transformation to make their corresponding singular values all equal to one (and therefore the variance in any direction is the same). The principal components are the unit vectors along the axes of the inertial ellipsoid corresponding to the covariance matrix of a distribution.
We next discuss the application of PCA to the problem of learning an intersection of halfspaces in R^n. As remarked earlier, this is an open problem even for k = 2. We make two assumptions. First, k is small compared to n, and so algorithms that are exponential in k but not n might be tolerable. Second, instead of learning such an intersection from an arbitrary distribution on examples, we assume the distribution is an unknown Gaussian. This setting was considered by Vempala [61, 64] and by Klivans et al. [38]. The first algorithm learns an intersection of k halfspaces from any logconcave input distribution using an intersection of O(k) halfspaces and has complexity (n/ε)^{O(k)}. The second learns an intersection of k halfspaces from any Gaussian distribution using a polynomial threshold function of degree O(log k/ε^4) and therefore has complexity n^{O(log k/ε^4)}. Neither of these algorithms is polynomial when k grows with n.
In recent work [63], PCA is used to give an algorithm whose complexity is poly(n, k, 1/ε) + C(k, 1/ε), where C(·) is the complexity of learning an intersection of k halfspaces in R^k. Thus one can bound this using prior results as at most

  min{ k^{O(log k/ε^4)}, (k/ε)^{O(k)} }.

For fixed ε, the algorithm is polynomial for k up to 2^{O(√(log n))}. The algorithm is straightforward: first put the full distribution in isotropic position (using a sample), effectively making the input distribution a standard Gaussian; then compute the smallest k principal components of the examples that lie in the intersection of the unknown halfspaces. The main claim in the analysis is that this latter subspace must be close to the span of the normals to the unknown halfspaces. This is essentially due to Theorem 11, which guarantees that the smallest k principal components are in the subspace spanned by the normals, while orthogonal to this subspace, all variances are equal to that of the standard Gaussian. The algorithm and its analysis can be extended to arbitrary convex sets whose normals lie in an unknown k-dimensional subspace. The complexity remains a fixed polynomial in n times an exponential in k.
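A minimal sketch of the subspace-recovery step described above (Python with numpy, illustrative only): put the full sample in approximately isotropic position and take the k smallest principal components of the positively labeled points as an estimate of the span of the unknown normals.

import numpy as np

def normal_subspace_estimate(X, labels, k):
    """Estimate the span of the k unknown halfspace normals: isotropic
    transformation from the full sample, then the k smallest principal
    components of the positively labeled (inside) points."""
    z = X.mean(axis=0)
    A = np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(A)
    T = V @ np.diag(1.0 / np.sqrt(w)) @ V.T              # A^{-1/2}
    Y = (X - z) @ T.T                                    # near-isotropic sample
    P = Y[labels == 1]                                   # points in the intersection
    wp, Vp = np.linalg.eigh(np.cov(P, rowvar=False))     # eigenvalues ascending
    return Vp[:, :k]                                     # smallest k components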
For general convex bodies, given positive and negative examples from an unknown Gaussian distribution, it is shown in [38] that the complexity is 2^{Õ(√n)} (ignoring the dependence on ε), along with a nearly matching lower bound. A similar lower bound of 2^{Ω(√n)} holds for learning a convex body given only random points from the body [20]. These constructions are similar to Eldan's [17] and use polytopes with an exponential number of facets. Thus an interesting open problem is to learn a polytope in R^n with m facets, given either uniform random points from it or points from a Gaussian. It is conceivable that the complexity is poly(m, n, 1/ε). The general theory of VC-dimension already tells us that O(mn) samples suffice. Such an algorithm would of course also give us an algorithm for estimating the volume of a polytope from a set of random points, without requiring samples from a sequence of distributions.
Acknowledgements.
The author is grateful to Daniel Dadush, Elena Grigorescu, Ravi
Kannan, Jinwoo Shin and Ying Xiao for helpful comments.
References
1  R. Adamczak, A. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. J. Amer. Math. Soc., 23:535-561, 2010.
2  N. Alon. Eigenvalues and expanders. Combinatorica, 6:83-96, 1986.
3  D. Applegate and R. Kannan. Sampling and integration of near log-concave functions. In STOC '91: Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 156-163, New York, NY, USA, 1991. ACM.
4  K. M. Ball. Logarithmically concave functions and sections of convex sets in R^n. Studia Mathematica, 88:69-84, 1988.
5  A. Belloni, R. M. Freund, and S. Vempala. An efficient rescaled perceptron algorithm for conic systems. Math. Oper. Res., 34(3):621-641, 2009.
6  D. Bertsimas and S. Vempala. Solving convex programs by random walks. J. ACM, 51(4):540-556, 2004.
7  S. Bobkov. On isoperimetric constants for log-concave probability distributions. Geometric aspects of functional analysis, Lect. notes in Math., 1910:81-88, 2007.
8  J. Bourgain. Random points in isotropic convex sets. Convex geometric analysis, 34:53-58, 1996.
9  K. Chandrasekaran, D. Dadush, and S. Vempala. Thin partitions: Isoperimetric inequalities and a sampling algorithm for star shaped bodies. In SODA, pages 1630-1645, 2010.
10  K. Chandrasekaran, A. Deshpande, and S. Vempala. Sampling s-concave functions: The limit of convexity based isoperimetry. In APPROX-RANDOM, pages 420-433, 2009.
11  P. Diaconis and D. Stroock. Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab., 1(1):36-61, 1991.
12  A. Dinghas. Über eine Klasse superadditiver Mengenfunktionale von Brunn-Minkowski-Lusternikschem Typus. Math. Zeitschr., 68:111-125, 1957.
13  J. Dunagan and S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Prog., 114(1):101-114, 2008.
14  M. E. Dyer and A. M. Frieze. Computing the volume of a convex body: a case where randomness provably helps. In Proc. of AMS Symposium on Probabilistic Combinatorics and Its Applications, pages 123-170, 1991.
15  M. E. Dyer, A. M. Frieze, and R. Kannan. A random polynomial time algorithm for approximating the volume of convex bodies. In STOC, pages 375-381, 1989.
16  M. E. Dyer, A. M. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. J. ACM, 38(1):1-17, 1991.
17  R. Eldan. A polynomial number of random points does not determine the volume of a convex body. http://arxiv.org/abs/0903.2634, 2009.
18  R. Eldan and B. Klartag. Approximately Gaussian marginals and the hyperplane conjecture. http://arxiv.org/abs/1001.0875, 2010.
19  B. Fleury. Concentration in a thin Euclidean shell for log-concave measures. J. Funct. Anal., 259(4):832-841, 2010.
20  N. Goyal and L. Rademacher. Learning convex bodies is hard. In COLT, 2009.
21  M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988.
22  B. Grunbaum. Partitions of mass-distributions and convex bodies by hyperplanes. Pacific J. Math., 10:1257-1261, 1960.
23  M. Jerrum, A. Sinclair, and E. Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. J. ACM, 51(4):671-697, 2004.
24  M. Jerrum, L. G. Valiant, and V. V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theor. Comput. Sci., 43:169-188, 1986.
25  A. T. Kalai and S. Vempala. Simulated annealing for convex optimization. Math. Oper. Res., 31(2):253-266, 2006.
26  R. Kannan. Minkowski's convex body theorem and integer programming. Math. Oper. Res., 12(3):415-440, 1987.
27  R. Kannan, L. Lovász, and M. Simonovits. Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry, 13:541-559, 1995.
28  R. Kannan, L. Lovász, and M. Simonovits. Random walks and an O*(n^5) volume algorithm for convex bodies. Random Structures and Algorithms, 11:1-50, 1997.
29  R. Kannan and H. Narayanan. Random walks on polytopes and an affine interior point method for linear programming. In STOC, pages 561-570, 2009.
30  R. Kannan and S. Vempala. Sampling lattice points. In STOC, pages 696-700, 1997.
31  N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373-396, 1984.
32  R. M. Karp and C. H. Papadimitriou. On linear characterization of combinatorial optimization problems. SIAM J. Comp., 11:620-632, 1982.
33  J. A. Kelner and D. A. Spielman. A randomized polynomial-time simplex algorithm for linear programming. In STOC, pages 51-60, 2006.
34  L. G. Khachiyan. Polynomial algorithms in linear programming. USSR Computational Mathematics and Mathematical Physics, 20:53-72, 1980.
35  B. Klartag. On convex perturbations with a bounded isotropic constant. Geom. and Funct. Anal., 16(6):1274-1290, 2006.
36  B. Klartag. A central limit theorem for convex sets. Invent. Math., 168:91-131, 2007.
37  B. Klartag. Power-law estimates for the central limit theorem for convex sets. J. Funct. Anal., 245:284-310, 2007.
38  A. R. Klivans, R. O'Donnell, and R. A. Servedio. Learning geometric concepts via Gaussian surface area. In FOCS, pages 541-550, 2008.
39  L. Leindler. On a certain converse of Hölder's inequality II. Acta Sci. Math. Szeged, 33:217-223, 1972.
40  H. W. Lenstra. Integer programming with a fixed number of variables. Math. of Oper. Res., 8(4):538-548, 1983.
41  L. Lovász. An Algorithmic Theory of Numbers, Graphs and Convexity, volume 50 of CBMS-NSF Conference Series. SIAM, 1986.
42  L. Lovász. Hit-and-run mixes fast. Math. Prog., 86:443-461, 1998.
43  L. Lovász and M. Simonovits. On the randomized complexity of volume and diameter. In Proc. 33rd IEEE Annual Symp. on Found. of Comp. Sci., pages 482-491, 1992.
44  L. Lovász and M. Simonovits. Random walks in a convex body and an improved volume algorithm. In Random Structures and Alg., volume 4, pages 359-412, 1993.
45  L. Lovász and S. Vempala. Fast algorithms for logconcave functions: Sampling, rounding, integration and optimization. In FOCS, pages 57-68, 2006.
46  L. Lovász and S. Vempala. Hit-and-run from a corner. SIAM J. Computing, 35:985-1005, 2006.
47  L. Lovász and S. Vempala. Simulated annealing in convex bodies and an O*(n^4) volume algorithm. J. Comput. Syst. Sci., 72(2):392-417, 2006.
48  L. Lovász and S. Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures and Algorithms, 30(3):307-358, 2007.
49  J. Luedtke, S. Ahmed, and G. L. Nemhauser. An integer programming approach for linear programs with probabilistic constraints. Math. Program., 122(2):247-272, 2010.
50  E. Milman. On the role of convexity in isoperimetry, spectral gap and concentration. Invent. Math., 177(1):1-43, 2009.
51  V. Milman and A. Pajor. Isotropic position and inertia ellipsoids and zonoids of the unit ball of a normed n-dimensional space. Geometric aspects of Functional Analysis, Lect. notes in Math., pages 64-104, 1989.
52  Yu. Nesterov and A. Nemirovski. Interior-Point Polynomial Algorithms in Convex Programming. Studies in Applied Mathematics. SIAM, 1994.
53  M. W. Padberg and M. R. Rao. The Russian method for linear programming III: Bounded integer programming. NYU Research Report, 1981.
54  G. Paouris. Concentration of mass on convex bodies. Geometric and Functional Analysis, 16:1021-1049, 2006.
55  A. Prekopa. Logarithmic concave measures and functions. Acta Sci. Math. Szeged, 34:335-343, 1973.
56  A. Prekopa. On logarithmic concave measures with applications to stochastic programming. Acta Sci. Math. Szeged, 32:301-316, 1973.
57  M. Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164:60-72, 1999.
58  A. Sinclair and M. Jerrum. Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82:93-133, 1989.
59  D. Stefankovic, S. Vempala, and E. Vigoda. Adaptive simulated annealing: A near-optimal connection between sampling and counting. In FOCS, pages 183-193, Washington, DC, USA, 2007. IEEE Computer Society.
60  P. M. Vaidya. A new algorithm for minimizing convex functions over convex sets. Math. Prog., 73:291-341, 1996.
61  S. Vempala. A random sampling based algorithm for learning the intersection of half-spaces. In FOCS, pages 508-513, 1997.
62  S. Vempala. Geometric random walks: A survey. MSRI Combinatorial and Computational Geometry, 52:573-612, 2005.
63  S. Vempala. Learning convex concepts from Gaussian distributions with PCA. In FOCS, 2010.
64  S. Vempala. A random sampling based algorithm for learning the intersection of half-spaces. JACM, to appear, 2010.
65  D. B. Yudin and A. S. Nemirovski. Evaluation of the information complexity of mathematical programming problems (in Russian). Ekonomika i Matematicheskie Metody, 13(2):3-45, 1976.