>> Yuval Peres: All right. Good morning everyone. We are very happy to have Raghu Meka
from the IAS who will tell us how to beat the union bound.
>> Raghu Meka: Today I'll talk about beating the union bound by geometric techniques. So what is the union bound? It's a basic fact in probability. It says that if you have n events in some probability space, then the probability of the union of the events is at most the sum of the individual probabilities. In particular, if the sum of the individual probabilities is less than one, then with nonzero probability, none of the events occur. This was popularized by Erdős, and it is one of the most basic techniques in the probabilistic method. But even before Erdős there was one literary figure who made prominent use of this. And I'll give you three guesses.
>>: [inaudible]?
>> Raghu Meka: Literary figure.
>>: [inaudible]?
>> Raghu Meka: Okay. Close, but-
>>: What was the question?
>> Raghu Meka: There was one literary figure who made use of the union bound before?
>>: [inaudible].
>> Raghu Meka: Exactly. "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." It's kind of a paraphrasing of the union bound. So despite its simplicity, it's amazingly effective in the probabilistic method. And some notable examples are the original application of the method by Erdős, showing the existence of very good Ramsey graphs; Shannon, who used it to show the existence of very good error correcting codes; Johnson and Lindenstrauss, who used it to show the existence of very good metric embeddings; and so on. There are many other such applications, as can be seen in this beautiful book by Alon and Spencer. On the other hand, sometimes the union bound is indeed too naïve, and it's not enough to capture the full picture. And one salient example of this is the Lovász Local Lemma. It says that if you have n events as before, but the events are d-dependent, meaning each event depends on at most d other events, and the probability of each event is not too large, then the probability of the union is less than one. And if you use the union bound here, you get nothing if n is much larger than the degree d. So it beats the union bound. And we'll keep things simple: whenever we [inaudible], we say we are beating the union bound.
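In symbols (a standard statement of both facts, with the constant e from the symmetric form of the Local Lemma rather than the original constant): the union bound says

    \Pr\Big[\bigcup_{i=1}^{n} A_i\Big] \;\le\; \sum_{i=1}^{n} \Pr[A_i],

while the symmetric Local Lemma says that if each A_i depends on at most d of the others and \Pr[A_i] \le \frac{1}{e(d+1)} for every i, then \Pr\big[\bigcap_i \overline{A_i}\big] > 0.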
And this result itself, the Local Lemma, is amazingly effective. Spencer says it helps us find needles in haystacks. On the other hand, and we'll come across this theme again, it took a lot of work just to make this result constructive or algorithmic. This required further breakthroughs, starting with the work of Beck and more recently the work of Moser and of Moser and Tardos and so on.
>>: [inaudible]?
>> Raghu Meka: All right. I guess this is the original Erdős and Lovász formulation.
>>: [inaudible].
>> Raghu Meka: I think in seventy-seven it was 4 and it was improved to e later on. The first result had a 4 in it. So here I'll talk about two examples of beating the union bound. The first is constructive discrepancy minimization, and here I'll talk about a new approach for rounding linear programs by Brownian motion. And then I'll talk about a central limit theorem for polytopes, and here I'll just state the result and give some applications to learning theory. In both cases, we will use some nice symmetry and geometric properties of the Gaussian distribution to go past the union bound, and that kind of serves as the underlying or unifying theme between these results. And here's the outline: I'll talk about the two parts, and finally I'll finish by mentioning some of my other results and summarizing the work.
So let's start with the first part. The basic setting here, in discrepancy minimization, is that you have a set system, which is a collection of m sets over some universe of size n. For example, here, let's say the universe has size 5 and S1 is the set {1,3,4}, S2 is {2,3,5}, S3 is everything, and so on. So you have a collection of sets like this, and the goal in discrepancy minimization is to color the elements red or blue so as to minimize the maximum imbalance. For example, let's say we construct this coloring here; then the first set has an imbalance of three, the second set has an imbalance of minus one, and so on. So the maximum imbalance, or the discrepancy, of the coloring is three. And the goal in discrepancy minimization is: you're given a set system and you want to find a coloring chi which minimizes the maximum, over all sets in your set system, of the discrepancy. This is a fundamental combinatorial concept with many applications, and let's look at a few examples to familiarize ourselves with it. One of the first and notable examples of bounds in discrepancy was a result of Roth from '64, which showed that if the sets in your set system correspond to arithmetic progressions in {1,...,n}, then the discrepancy is at least n to the one quarter.
>>: Is m equal to n?
>> Raghu Meka: Here?
>>: Yeah.
>> Raghu Meka: Here m can be much larger; it's usually around n squared.
>>: Is this uniform over m? How does m-
>> Raghu Meka: So m is the size of the set system, and here you are taking all arithmetic progressions, so there will be roughly n squared of them.
>>: Okay.
>> Raghu Meka: Like if you have the set one, three, five, seven, you also get two, four, six and so on. And Matousek and Spencer also gave a matching upper bound. Another example that is well studied is the case of halfspaces. So here you have a bunch of points in, let's say, the plane, and the sets in your set system correspond to indicator functions of halfspaces. So you draw a hyperplane, and all the points on one side will correspond to one set, like S1 here; you draw another hyperplane, and all the points on this side will correspond to S2; and you consider all possible hyperplanes and all possible sets that you get in this form. And once again we have tight upper and lower bounds for the discrepancy of this set system as well. Okay?
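As a small illustration of the definition (a rough sketch with made-up names, not from the talk's slides), here is how one would compute the discrepancy of a coloring; the coloring below is chosen to match the imbalances of three and minus one mentioned above.

    # Discrepancy of a +/-1 coloring of a set system (illustrative sketch).
    import numpy as np

    def discrepancy(incidence, coloring):
        # incidence: m x n 0/1 matrix, one row per set; coloring: +/-1 vector.
        imbalances = incidence @ coloring          # signed imbalance of each set
        return int(np.max(np.abs(imbalances)))

    # The example from the talk: S1 = {1,3,4}, S2 = {2,3,5}, S3 = {1,...,5}.
    A = np.array([[1, 0, 1, 1, 0],
                  [0, 1, 1, 0, 1],
                  [1, 1, 1, 1, 1]])
    chi = np.array([1, -1, 1, 1, -1])              # a coloring consistent with the talk
    print((A @ chi).tolist(), discrepancy(A, chi)) # imbalances [3, -1, 1], discrepancy 3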
So discrepancy minimization and discrepancy theory have many applications, and here I'll talk about one cute application that I like, in computer graphics. Ray tracing is a nice concept in graphics where you give some textual description of some figure: there is a sphere at so and so coordinates, there's a slab at so and so coordinates, there's a triangle somewhere, and so on. And the goal of the ray tracer is to take this text description and render an image that corresponds to it. And this image is actually generated by one such ray tracer, which I was glad to use because it is from an undergrad project I did a long time back. So how does a ray tracer work? It works as follows: you specify the position of the camera and the position of the screen, and there are objects that you're trying to render. You shoot some rays from the camera, find out where the rays intersect the objects, trace them back onto the screen, and then render them according to the properties of the intersection points. So it's a very simple concept, and one important operational question here is how to decide which points to shoot rays from. For instance, you can't shoot rays from all the points on the screen because that would be too computationally intensive. And the choice of the points makes a big difference in the quality of the rendered image. For example, if you choose a narrow grid then you get a lot of [inaudible] errors. And what people do in practice in graphics is to choose a set of points whose discrepancy with respect to simple geometric shapes like halfspaces, triangles, and circles is small. And this is actually what practitioners use; they have some preset configurations which minimize this discrepancy, and they use these in their ray tracing programs. And more generally, discrepancy has many applications.
>>: [inaudible]?
>> Raghu Meka: I think the intuition is that when you do this, every shape gets kind of the same number of points, but that [inaudible] is probably more empirical. I mean, if you run the program, you can clearly see a difference when [inaudible] has been minimized with respect to discrepancy. So it's probably more empirical.
>>: Somehow I’m missing something. I mean, wouldn’t the grid be [inaudible]?
>> Raghu Meka: The grid is not so good.
>>: Why?
>> Raghu Meka: Why? Let’s see. Because it depends on how fine a grid you choose. So if you
fix the number of points then the grid is not as good as some other configurations.
>>: Why?
>> Raghu Meka: So, for instance, let's say I take a square which is smaller than the resolution of the grid, or a rectangle which sits in between two grid lines; then you are kind of in trouble.
>>: [inaudible]?
>> Raghu Meka: So you can do well with a grid too, but then you would require finer and finer resolution, or more points.
>>: I mean for a small square the grid is more or less as good as that.
>> Raghu Meka: Yeah, yeah. The thing is, how small a grid do you need?
>>: Yeah. But for a given set of points-
>>: For a given number of points [inaudible] much better than-
>>: The small square. I mean, it's a question of which collection of sets you are trying to-
>> Raghu Meka: Right. Let's say you take a thin rectangle. You don't want to lose those. Shapes like that might be problematic if you use the grid. And discrepancy theory has many applications: in complexity theory, where discrepancy bounds correspond to good average case lower bounds; in communication complexity, where it is one of the main techniques for proving lower bounds; in computational geometry, where it is useful in building small data structures; in graphics, as we saw before; and also in pseudo-randomness. And there are many more applications, as can be seen in [inaudible] books.
And here I'll talk about one of the cornerstone results in this area, which is Spencer's celebrated "six standard deviations suffice" theorem. It says the following: if you have a set system with n sets in it, then the system has discrepancy at most six times square root n. And it's called the six standard deviations suffice result because there is a six in the theorem statement and square root n corresponds to the standard deviation of a set under a random coloring. And the square root n bound here is actually necessary. We don't know the right constant in front of it, but more or less it's tight. And what's most interesting here is that it beats the union bound. If you take a random coloring and use a union bound across the different sets, you can show that the coloring gets discrepancy square root of n log n. But on the other hand, there are set systems where random colorings do get discrepancy on the order of square root of n log n. So Spencer's result beats the union bound in this sense.
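To spell out the union-bound calculation just mentioned (a standard estimate, not from the slides): for a uniformly random coloring \chi and any fixed set S of size at most n, Hoeffding's inequality gives

    \Pr\big[\,|\chi(S)| > t\sqrt{n}\,\big] \;\le\; 2e^{-t^2/2},

so setting t = \sqrt{2\ln(2n)} and union-bounding over the n sets shows discrepancy O(\sqrt{n\log n}) with positive probability; the \sqrt{\log n} is exactly what Spencer removes.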
>>: [inaudible]?
>> Raghu Meka: And another aspect of this result is that Spencer's original proof had an ingenious use of the pigeonhole principle and hence was nonconstructive. So it gave no algorithm for finding such a coloring short of enumerating all possible colorings. And in fact, Alon and Spencer conjectured twenty years ago that there is no efficient algorithm for finding a coloring which gets discrepancy O of square root n. And like all good conjectures, this was shown to be false. Recently, in a breakthrough work, Bansal showed that there is an efficient randomized algorithm for finding a coloring with discrepancy O of square root n.
And here, I want to talk about a new elementary and geometric proof of Spencer's result which also gives an algorithm to find such a coloring. And I want to say that our result is truly constructive, in the sense that even though Bansal gave an algorithm for the problem, the analysis of his algorithm still appealed to the nonconstructive proof: he had a semidefinite programming relaxation, and to argue about the value of the program he used Spencer's nonconstructive proof, whereas our algorithm itself gives a proof of the result. And more than the specific result, I think the technique we introduce for finding such a coloring, which we call EDGE-WALK and which involves rounding linear programs via Brownian motion, seems to have much broader potential and could be used elsewhere. And very recently, about two or three weeks back, Rothvoss used some of our techniques to get the first improvement for the bin-packing problem in nearly 30 years. So it seems to have other applications which merit further investigation.
>>: Can I ask a question about the [inaudible] algorithm? When you say randomized, in this context, does that mean just that with a fixed probability your algorithm will give you one?
>> Raghu Meka: Right. And you can also make it deterministic, but I think here the punch line
is just finding such an algorithm. And here's the outline of the algorithm. I’ll describe the
algorithm in its full glory and also sketch its analysis. So the algorithm has two steps. One is the partial coloring method, which was used by Spencer and which we'll also use. And next I'll describe the EDGE-WALK algorithm. So let me start with the partial coloring method. The partial coloring method was introduced by Beck, and the philosophy is as follows: in the discrepancy minimization problem we said we wanted to find a red/blue, or plus one/minus one, coloring. And Beck said, instead, let's find a plus one/minus one/zero coloring, where some of the variables might get a value of zero. But if you don't put any restriction here, the problem becomes trivial: you just output the all-zeros vector. So Beck said, instead, let's find a partial coloring where at least half of the coordinates are nonzero. And the recursion is as follows.
Let's say you find a partial coloring where half of the coordinates are nonzero, like the one shown here. Once you find such a coloring, let's fix the coordinates which are nonzero, forget about them, and recurse on the remaining unfixed variables. So you find another partial coloring on these guys, maybe n over four elements, and then repeat the process until you color all the variables. Okay? And if everything works out according to plan, the hope is that the first partial coloring gives you a discrepancy of square root n and the second partial coloring gives you a discrepancy of square root of n over two, because now you are working over a universe of size n over two, and then n over four, and so on. So you get a geometrically decreasing series and the total discrepancy is O of square root n, which is what we want.
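The bookkeeping for the recursion, assuming each partial coloring on k remaining variables costs O(\sqrt{k}) discrepancy, is just a geometric series:

    \text{total discrepancy} \;\le\; C\big(\sqrt{n} + \sqrt{n/2} + \sqrt{n/4} + \cdots\big) \;=\; C\sqrt{n}\sum_{j \ge 0} 2^{-j/2} \;=\; O(\sqrt{n}).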
And more concretely, the first step is: you have a collection of n sets over some universe of size n, and you want to find a 1, 0, minus 1 coloring such that the coloring has small discrepancy, O of root n, and, most importantly, at least half of the coordinates are nonzero.
>>: And is it really that n is the number of sets and the [inaudible]?
>> Raghu Meka: Right. So here I'm simplifying; if you put m there, it will be square root of n times log of m over n. But I think this is the main step. Once you get this, it's not too hard to get the other things.
>>: [inaudible]? Number of sets or number of [inaudible]?
>> Raghu Meka: Oh, here? It's square root of the number of variables, the number of elements, times log of m over n. That's the precise bound. And our main result is that there is an efficient algorithm to find such a coloring. And that's what I'll talk about. And because there is an efficient algorithm to find such a coloring, as a corollary, we get that there exists one such coloring, which was the main point of the original result.
So let me now talk about the EDGE-WALK algorithm. Before getting to the algorithm, let me rephrase the problem in a geometric language. So here's the discrepancy setup: find a coloring which minimizes the maximum imbalance. But there is a different, perhaps more intuitive, way of measuring the discrepancy, which is to look at the matrix-vector product of the incidence matrix of the set system with the coloring vector. Then the discrepancy of the coloring is just the maximum entry in absolute value of the matrix-vector product. So you want to find a plus/minus one vector so that the infinity norm of the matrix-vector product is not too large. And further, if you let v_1 through v_m be the indicator vectors of the sets in your set system, what we want is a coloring chi such that the inner product of chi with each of the indicator vectors is small. And now these constraints, the inner product of chi with the indicator vectors being small, are all linear constraints, except for the constraint that chi must be a plus or minus one vector. And this naturally suggests the approach of looking at a linear programming relaxation of the problem. And if you do that, you end up with the following polytope B, which is the set of all x such that each entry of x is at most one in absolute value and the inner product of x with each of the indicator vectors is small. So this is a linear programming relaxation of the discrepancy minimization problem.
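Written out, with \lambda standing for the discrepancy bound we are shooting for (a constant times \sqrt{n}), the polytope is

    \mathcal{B} \;=\; \big\{\, x \in \mathbb{R}^n \;:\; |x_i| \le 1 \ \text{for all } i, \ \ |\langle v_j, x \rangle| \le \lambda \ \text{for all } j \,\big\}.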
And our goal is to find a nonzero lattice point inside this polytope. Not just a nonzero point, but a nontrivial point which has many nonzero coordinates; that would correspond to a good partial coloring. And in the rest of the talk I'll refer to the constraints of the form absolute value of x_i at most one as color constraints, because that's where they come from, and to the inner product constraints as discrepancy constraints, because they correspond to the discrepancy of the set system. So here's the goal set up. Again, you have this polytope and we'd like to find a highly nonzero lattice point inside it. And for intuition, let's use the distance from the origin as
a proxy for how nonzero a point is. That's just for intuition. So we are starting at the origin, let's say, and our goal is to find a vertex which is as far away from the origin as possible but still inside the polytope. And the way we do this is to use Brownian motion. So we do a random walk in n dimensions until we hit the boundary of the polytope. And once you hit the boundary of the polytope, you need to decide what to do. We still want to keep moving away from the origin, because we want to increase the distance from the origin; on the other hand, we don't want to cross out of the polytope. So we take the greedy approach: we continue the Brownian motion, but now constrain ourselves to lie within the face that we hit. So you do a random walk, a Brownian motion, until you hit another constraint, and then you repeat the process until, lo and behold, you hit the vertex that you wanted to get to at the beginning. So this is the full algorithm, and the claim is that this algorithm will find a good partial coloring.
>>: So this Brownian motion goes through fractional points?
>> Raghu Meka: Right. The Brownian motion goes through fractional points.
>>: And what is your stopping point?
>> Raghu Meka: So [inaudible] the stopping criterion is just: go until you reach a vertex.
>>: The vertex of the polytope?
>> Raghu Meka: A vertex of the polytope. But in reality, we'll just stop after a certain number of iterations, because you want to do a discrete walk.
>>: But this final point is also [inaudible] fractional?
>> Raghu Meka: So most of the coordinates will be integral except some coordinates: half the coordinates will be integral, half the coordinates will be fractional. And we'll then recurse on the fractional coordinates as before. So half the coordinates will be integral. That's what we want to show.
>>: I see.
>>: You can hit either the coloring faces-
>> Raghu Meka: Or the discrepancy faces. Exactly. That's the point. So let me describe the algorithm fully, because continuous Brownian motion is nice for intuition, but when you try to implement it it's a bit problematic. So to make the algorithm
formal, let me describe the random walk that we'll use. The random walk is very natural. You have some subspace V, and given your current position, you just choose a random Gaussian vector in the subspace and take a step of size gamma in that direction, where gamma is some small step size; you want to take tiny steps, not too large. So the walk looks something like this. And another minor issue is that in the description I said you stop the walk once you hit a face. But if you're doing a discrete walk you might overshoot the face, and the solution is to just introduce a small slack near each face: if you get too close to the boundary, then we'll say we hit it.
So let me now describe the algorithm. The algorithm gets as input some vectors, which you should think of as the indicator vectors of your set system, and it runs for a certain number of steps, which depends on the step size gamma, but let's not worry about the parameterization too much. So you start from the origin, and at each step t, let Color_t denote the set of all color constraints, or color faces, that are nearly hit: this is the set of all coordinates i such that the absolute value of the i-th coordinate is very close to one. And let Discrepancy_t denote the set of discrepancy constraints that are nearly hit: this is the set of all vectors v_j such that the inner product of v_j with our current vector is very close to the threshold that we don't want to cross. And as I said, you still want to do the random walk, but without changing any of these constraints which are very close to being violated. So we just walk in the subspace that is orthogonal to all of these constraints. So if you let V_t be that subspace, you just pick a random Gaussian vector in that subspace, take a tiny step in that direction, and repeat this process. And the number of steps you have to do this is roughly one over gamma squared, where gamma is the step size. So that's the algorithm. And the claim is that you'll find a good partial coloring with some non-negligible probability.
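To make the description concrete, here is a rough sketch of the EDGE-WALK idea in code. This is not the authors' implementation; the thresholds, the slack, and the step count below are illustrative guesses based on the description above.

    # Illustrative sketch of EDGE-WALK (not the authors' code).
    import numpy as np

    def edge_walk(V, gamma=0.01, disc_bound=None, slack=None, n_steps=None, seed=0):
        # V: m x n matrix whose rows are the indicator vectors of the sets.
        rng = np.random.default_rng(seed)
        m, n = V.shape
        disc_bound = 100 * np.sqrt(n) if disc_bound is None else disc_bound
        slack = 1.0 / n if slack is None else slack
        n_steps = int(10 / gamma ** 2) if n_steps is None else n_steps
        x = np.zeros(n)                                   # start at the origin
        for _ in range(n_steps):
            # Color faces that are nearly hit: |x_i| close to 1.
            tight_colors = np.abs(x) >= 1 - slack
            # Discrepancy faces that are nearly hit: |<v_j, x>| close to the bound.
            tight_discs = np.abs(V @ x) >= disc_bound - slack
            normals = np.vstack([np.eye(n)[tight_colors], V[tight_discs]])
            g = rng.standard_normal(n)                    # random Gaussian direction
            if normals.shape[0] > 0:                      # project onto the subspace
                Q, _ = np.linalg.qr(normals.T)            # orthogonal to nearly-hit faces
                g = g - Q @ (Q.T @ g)
            x = x + gamma * g                             # take a tiny step
        return x      # roughly half the coordinates should end up near +1 or -1

    # Usage sketch: V = np.random.randint(0, 2, size=(50, 50)); chi = edge_walk(V)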
Now let me give a sketch of the analysis; the actual proof is not too much more complicated. You just have to write down some tail bounds. But it's not too hard.
>>: [inaudible]?
>> Raghu Meka: So you'd have to take very tiny-
>>: [inaudible] But then do you stay at the point, actually, nevertheless [inaudible]?
>> Raghu Meka: So, I mean, it happens with such tiny probability that you can ignore such events. If your step size is much smaller than the slack you introduce, you're never going to overshoot the [inaudible].
>>: But you stay actually off the face. You don't move around [inaudible]?
>> Raghu Meka: No. So you can think of the slack as being polynomially small; if it's, say, one over n, you can just [inaudible] anything you want. It's not going to affect the error in the discrepancy problem.
>>: Could you use Brownian motion if you think you're [inaudible]?
>> Raghu Meka: I think so. I mean, probably. Yeah. You should be able to; Brownian motion is basically the case where gamma goes to zero. So let me describe the analysis. The setup is the same: you have this polytope and we'd like to find a highly nonzero lattice point. And the punch line for the analysis is that the discrepancy faces are much farther from the origin than the color faces. Okay? So let's say we are shooting for a discrepancy of a hundred times root n. Then the discrepancy faces are at a distance of a hundred from the origin, because each of these vectors v_j has norm at most root n, and the color faces are at a distance of one from the origin, because the constraints are absolute value of x_i less than or equal to one. And by the way we designed the random walk, it looks Gaussian in any direction, with some variance. So you do some calculations, some tail bounds, and you get that the probability that the walk hits a discrepancy face is roughly exponential in minus a hundred squared, like a Gaussian tail bound, and the probability that the walk hits a color face is roughly exponential in minus one; it's not too small. And as a consequence of these two bounds, you get that the probability that the walk hits a discrepancy face is much smaller than the probability that it hits a color face.
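Heuristically (a rough version of the calculation, not the precise lemma): over roughly 1/\gamma^2 steps of size \gamma, the walk's displacement along any fixed unit direction behaves like a standard Gaussian, so the chance of ever reaching a face at distance \lambda from the origin is about

    \Pr[\text{hit a face at distance } \lambda] \;\approx\; e^{-\lambda^2/2},

which is astronomically small for a discrepancy face at distance 100 and only a constant for a color face at distance 1.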
>>: [inaudible]?
>> Raghu Meka: So you fix any discrepancy face and any color face; it's more likely to hit the color face, and then you take a kind of expectation over all the discrepancy faces.
>>: [inaudible] when you say is [inaudible] do you mean the probability that it ever hits
[inaudible]?
>> Raghu Meka: Okay. Because of this comparison, what's morally true is that you end up hitting the cube faces more often. And let's say you don't worry about those [inaudible] as an issue and you run the walk until you reach a vertex. When you reach a vertex, you must have hit n constraints. And we know that most of them are color constraints. And if you are satisfying n constraints and most of them are color constraints, half of them must be color constraints. That means that you get a good partial coloring.
>>: So the estimate you have is for [inaudible] one run from the origin until you hit something, right? So the probability that you hit a discrepancy face is smaller [inaudible]?
>> Raghu Meka: So this is over the whole run of the algorithm. Let's fix a discrepancy face. The probability that the walk hits that discrepancy face is much smaller than the probability that it hits an average cube constraint.
>>: [inaudible] first time we hit [inaudible]'s start with-
>> Raghu Meka: That kind of intuition actually works even afterwards. Because, and I cheated a little bit and you're catching me on that, it's not any color face, but most of the color faces. It might not be true for, let's say, x_1, but it will be true for the majority of the color faces.
>>: I don't understand the quantifiers here. You are fixing two faces, one [inaudible] and one color face?
>> Raghu Meka: Right.
>>: So what's the probability that it hits-
>> Raghu Meka: The discrepancy face? Throughout the run of the algorithm, it's roughly this guy.
>>: If that's what's true, then with high probability you would never hit any discrepancy face.
>> Raghu Meka: No, but there are many of them, right?
>>: Yeah. But they're not exponentially [inaudible] trying to-
>> Raghu Meka: A hundred is a constant here. You should think of it, maybe if you put a six here, then it might make sense. It's a constant, and the number of discrepancy constraints may be larger than or comparable to the number of color constraints.
>>: But really when you say a half, you can improve that too.
>>: Because once you hit one of those discrepancy faces, your random walk is no longer-
>> Raghu Meka: Truly Gaussian. But what happens is that the first inequality still holds, because if you look at the projection, it's a Gaussian with smaller variance, so you're doing better. And for the other guys, it won't be true for every color constraint, but it will be true on average. I mean, that's the intuition, and when you write down the calculations it does come out to be true on average. You might decrease the variance for one of the color constraints, but not for all of them simultaneously. Okay?
So that's basically the proof; you just have to write down some tail bounds and use this averaging argument to formally verify it. And the key part here is that we use the symmetry of the Gaussian distribution to argue this fact, and that's where we end up beating the union bound in the analysis. Okay? So let me summarize. We call it the EDGE-WALK algorithm because you're basically trying to walk on the edges of the polytope as much as you can. And it gives an algorithmic form of the partial coloring lemma: you find a good partial coloring, and then you recurse on the unfixed variables as in Spencer's original argument. And you combine these two to get Spencer's theorem.
>>: When you optimize [inaudible]?
>> Raghu Meka: So I'm not completely sure; I did some crude estimates and wrote some MATLAB code to figure out the constant, and it came out to be 13. You can probably tweak it better, but as a first approximation, say 13.
>>: So let me just try to understand. When you choose the amount of this slack [inaudible],
you're saying we can do that in a way that justifies [inaudible]?
>> Raghu Meka: On n.
>>: [inaudible] estimate I mean, somehow it might depend on [inaudible] get to [inaudible]?
>>: [inaudible] Guassian [inaudible]?
>> Raghu Meka: Right. So you choose it, let's say you choose the slack to be one over n squared or something. Okay? And you run this whole algorithm. And then, whatever slack you have, you just round to the nearest one or minus one. And because it's one over n squared and each vector has length n, you don't hurt yourself much. So before moving on to the second part, let me just say that the algorithm I described can be run for any polytope. This is just a random walk on polytopes: you start from the origin, if the polytope contains the origin, or from any point inside the polytope, and then repeat the process as before. And here let's look at the ideal scenario where you do the Brownian motion. It seems to induce some strange distribution on the vertices of this polytope, which is kind of skewed toward vertices which are closer to the origin in terms of the bounding hyperplanes.
>>: So one thing is it's like [inaudible].
>> Raghu Meka: It's kind of an iterative rounding scheme.
>>: [inaudible]?
>> Raghu Meka: It’s really, you’re [inaudible] vertex of the remaining bound and you start with
the vertex of the points.
>>: [multiple speakers][inaudible].
>> Raghu Meka: No. In rounding you solve your LP and optimize a linear function [inaudible]. That's a special case of rounding.
>>: [inaudible] polytope. [inaudible]?
>>: So you think of this as an integral polytope?
>> Raghu Meka: Yeah. It's like you're trying to find an integral solution. So you find this partially integral solution and then you iterate on the remaining things and so on.
>>: The problem is if I have as they say an integral [inaudible] and I don't know what the
hyperplanes are, so how can you do it?
>> Raghu Meka: Say that again?
>>: So if I have, let's say I start with an integral polytope which is the [inaudible] of the integral points, I [inaudible] optimize [inaudible] and I don't know what to iterate on. How do I do this? [inaudible]?
>>: Well, if a polytope is integral anyway-
>> Raghu Meka: For any polytope, here's the rounding algorithm, if you want: you have a fractional solution, you do this walk and you get some vertex, and you look at the coordinates of the vertex which are now integral.
>>: [inaudible]?
>> Raghu Meka: If it's not, then the rounding algorithm has failed. But that's the approach. So you find some coordinates which are integral, okay?
>>: Maybe not afterward.
>> Raghu Meka: Yeah. Then the algorithm fails for those kinds of polytopes; it's not a universal rounding algorithm. So if [inaudible], it will [inaudible]. So this is the approach for finding-
>>: So it'll work if this is a polytope, if it starts to have many integral coordinates-
>> Raghu Meka: [inaudible].
>>: What I'm saying is, if the polytope is guaranteed to have this property to begin with, I should be able to solve this by running [inaudible] and even finding [inaudible].
>> Raghu Meka: No, no, no, no. But think of which vertex: there might be a vertex which has some integral coordinates, but not all vertices have this property. And you want to find one such vertex, just as in the discrepancy minimization problem, where it was not even clear that there existed such a vertex. And in fact, these vertices are much fewer than the total number of vertices. They are an exponentially small fraction of the total number of vertices, and you want to kind of isolate these vertices.
>>: [inaudible]. But on the other hand, some of the vertices [inaudible]. So maybe if you
expand your definition of rounding, my sense of rounding is that you take a fraction [inaudible].
>> Raghu Meka: So you get some integral point. And if it fails, let's say it gives all zeros, which is a trivial rounding; it doesn't give anything meaningful there.
>>: [inaudible]?
>> Raghu Meka: The thing is, it is a rounding algorithm in principle, and the applications also use it as a rounding algorithm. There is a way of rounding using it around discrepancy constraints and so on. And in fact, Rothvoss uses it in this context: he finds an LP solution for the bin-packing problem and uses our algorithm to round the LP solution.
>>: Do you know how exactly [inaudible]?
>> Raghu Meka: Not really, because it's only two or three weeks old and I haven't had time to look at it. Okay. So that's the first part of the talk, and let me now talk about the second half. And here I'll just state our result and mostly talk about some applications. So what are central limit theorems? I'm sure most of us know what limit theorems are, but let me just quickly run through some examples to set up some notation. The central limit theorem says that if you have n independent [inaudible] random variables, then the sum of the variables, after a proper normalization, looks like the standard Gaussian distribution. And it is one of the most basic results in probability and has many applications. Pictorially, here I have the probability density function of one random variable, two random variables, and so on, and you can clearly see the shape of the bell curve emerging. Another example is large deviation bounds in probability. These say that, specifically for independent bounded random variables, the probability that the sum deviates too much from the mean is comparable to what happens in the Gaussian case. And such bounds are crucial for analyzing randomized algorithms; they again fall in the same kind of broad paradigm of limit theorems, because you're comparing the behavior of a sum of independent things to that of the Gaussian distribution.
And why should we care about limit theorems in algorithms or in computer science applications? Mainly because they give us a nice way to translate discrete problems to continuous problems, where we often have many more sophisticated tools like calculus or convex geometry and so on. And one nice example of this philosophy is the method of convex relaxations for combinatorial optimization problems: there you look at relaxations which are now continuous problems, and then you can do some rounding and so on.
So limit theorems have a lot of applications in computer science: to voting theory or social choice theory, to learning theory, complexity theory, communication complexity, and also to pseudo-randomness. And here I'll talk about one limit theorem and its applications to questions in voting theory and learning theory. So my work has led to two generalizations of the central limit theorem, to multidimensional and [inaudible] settings, which are motivated by such applications to problems in learning theory. And here I'll talk about the multidimensional version. So to motivate the multidimensional central limit theorem, let us look at a special case of the classical central limit theorem. It says that if you have n random plus or minus one signs, then their sum looks like the standard Gaussian distribution, which is a very, very special case of the central limit theorem. But there is a geometric way of looking at this result, which is as follows: let's say you have the universe, the n-dimensional space; due to limitations of PowerPoint it's two dimensions here. The red squares you should think of as n-bit Boolean vectors, and the blue circles as random Gaussian vectors. And the central limit theorem says that if you take a hyperplane in n dimensions, slicing the space, then the fraction of Boolean points that lie on one side of the hyperplane is close to the fraction of Gaussian points that lie on the same side of the hyperplane.
And given this geometric interpretation, it's natural to ask what happens if, instead of having one hyperplane, you have two hyperplanes. Is the fraction of Boolean points on one side still close to the fraction of Gaussian points? What if you have three hyperplanes, or more generally, many hyperplanes which now form a polytope? And this leads to the question of a central limit theorem for polytopes. So you have a polytope in n dimensions, and we'd like to know if the Boolean volume of the polytope, which is the probability that a random plus or minus one point lies inside the polytope, is close to the Gaussian volume, which is the probability that a random Gaussian vector lies inside the polytope. And this is what we'll show, a limit theorem in this context, but let me first put some conditions on the polytope. We'll assume that our polytope has a polynomial number of facets, because that's the most interesting case for applications. And, most importantly, we also assume that the polytope is regular, in the sense that no variable is too influential for any of the bounding hyperplanes. Let me explain this. Geometrically, it says that none of the bounding hyperplanes of the polytope are aligned with the coordinate axis vectors. For example, the hyperplane x_1 equals zero will not be regular, whereas the hyperplane given by the sum of all the x_i's is regular. And regular polytopes appear reasonably commonly in [inaudible] programs and so on, but, most importantly, regularity is needed for such a central limit theorem. Without regularity you cannot have a central limit theorem, so it's a reasonable assumption to make.
And in joint work with Harsha and Klivans, we showed that if you have such a regular polytope, then the Boolean volume of the polytope is close to the Gaussian volume up to an error which is little o of one. The little o of one actually involves something like log squared of the number of facets, but most importantly it is little o of one: the error goes to zero as the dimension goes to infinity.
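A small Monte Carlo sanity check of what the statement says (illustrative only; the matrix and parameters below are arbitrary choices, not from the paper): for a regular polytope, the two volume estimates should come out close.

    # Compare Boolean and Gaussian volume of a "regular" polytope (illustrative).
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 400, 20                                 # dimension and number of facets
    A = rng.standard_normal((k, n)) / np.sqrt(n)   # each row has roughly unit norm and
                                                   # no single coordinate dominates (regular)
    b = np.ones(k)                                 # the polytope is {x : Ax <= b}

    def volume(sample, trials=20000):
        pts = sample((trials, n))
        return np.mean(np.all(pts @ A.T <= b, axis=1))

    boolean_volume = volume(lambda shape: rng.choice([-1.0, 1.0], size=shape))
    gaussian_volume = volume(rng.standard_normal)
    print(boolean_volume, gaussian_volume)         # should agree up to a small o(1) error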
>>: Does the little o depend on how far these planes are from the coordinate axis?
>> Raghu Meka: I mean, not in terms of the distance from the origin, but in terms of
orientation.
>>: Yeah.
>> Raghu Meka: Exactly. So here's our central limit theorem restated. And once again, the notable thing here is that it beats the union bound. There's a rich body of literature on multidimensional central limit theorems, but unfortunately, if you apply those results in our setting, you end up with a bound which is at least linear in the number of facets of the polytope. And at its heart, or implicitly, I think this is because when you specialize those theorems to our setting, you end up bounding the error for a single hyperplane and then kind of taking a union bound across the different hyperplanes.
>>: So then why do you need the condition that it has polynomially many facets? [inaudible]?
>> Raghu Meka: Say that again?
>>: So why doesn't the sum of m-
>> Raghu Meka: It applies to the most, like, the error actually [inaudible] polytopes with any number of facets. But this is the easiest to write down. If the number of facets is K, there is like a log squared K, so as long as K is maybe slightly sub-exponential you get something meaningful.
>>: If you know exponential you could shave off all the corners of-
>>: Of course. [inaudible] exponential [inaudible].
>> Raghu Meka: Right. We can handle up to something like two to the root n, or two to the n to some power. Yeah?
>>: But again, it doesn't depend on the distance [inaudible] how it captures the large
deviations case as well?
>> Raghu Meka: No. I think it's too imprecise for large deviation bounds, because-
>>: It's too small?
>> Raghu Meka: It's hard capturing the [inaudible] details. Even from the usual central limit theorem, you can't get large deviation bounds. And our bounds are actually nearly optimal; you get comparable lower bounds as well. So we get something like log squared K, and the lower bound is square root of log K. And before going to the applications, I won't have time to delve too much into the proof, but let me say a few words. The first thing I want to say is what makes us think this is possible, that you can beat the union bound in this context. If you look at the literature in probability, one of the main techniques they have for proving limit theorems are these "radiational" [phonetic] methods, and the core of them is that if you want to bound the error, you need to bound the probability of regions which are close to the boundary. So when you do the analysis, if the things which are close to the boundary on either side have too much volume, then you'll be in trouble. And for the case of polytopes, let's say you start with your polytope and look at all points which are at distance at most epsilon from the boundary of the polytope. So you get K of these strips, like the ones shown here, and there's a beautiful result due to Nazarov which shows that the Gaussian volume of these blue regions is actually square root of log K times epsilon. If you used the union bound, you would get a bound of K times epsilon, because each strip has a volume of epsilon and there are K of these strips. So this is one of the places where you beat the union bound, and knowing this result prompted us to think that such results should be possible.
>>: So what is K?
>> Raghu Meka: K is the number of facets.
>>: [inaudible]?
>> Raghu Meka: This is in n dimensions, but the figure is in two dimensions because of PowerPoint. And finally, the actual proof of the result uses this; this is one place where we beat the union bound. And then we also use some other nontrivial results from convex geometry. But what I find surprising is that the proof actually uses techniques that were developed in the context of designing pseudorandom generators. So the result has nothing to do with pseudo-randomness or designing PRGs, but the proof was at least morally inspired by those techniques. You can de-randomize it, but we probably would not have found the proof without coming across this, at least [inaudible].
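The gap in symbols: writing \gamma_n for the standard Gaussian measure and P for a polytope with K facets, the Nazarov-type bound for the epsilon-neighborhood of the boundary is

    \gamma_n\big(\{x : \operatorname{dist}(x, \partial P) \le \epsilon\}\big) \;\le\; O\big(\sqrt{\log K}\,\cdot\,\epsilon\big),

whereas the naive union bound over the K strips only gives O(K\epsilon).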
So let me now talk about the applications of the limit theorem, and you'll see how to use the limit theorem to beat the union bound in various cases. The first application I want to talk about is noise sensitivity. Let's say you have some election where people cast their votes and you decide the outcome of the election based on the majority. And what we are interested in here is: what if there are errors in registering the votes? So you have these votes again, and there are errors, and what we'd like to know is whether the errors cause the outcome of the election to change. And this can be captured by a nice concept from the analysis of Boolean functions called noise sensitivity. So you have some Boolean function f and some noise parameter epsilon, and the noise sensitivity of f is the probability that the function evaluated at a random point changes its value when you perturb each coordinate of the random point with some probability epsilon. Okay?
So coming back to the voting scenario, the election scheme here is decided by the function f, and the noise sensitivity is the probability that, if you flip each vote with some error probability epsilon, the outcome of the election changes. That is noise sensitivity.
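As a quick illustration of the definition (a sketch with made-up parameters, not from the talk), one can estimate the noise sensitivity of a function such as majority by sampling:

    # Monte Carlo estimate of noise sensitivity NS_eps(f) = Pr[f(x) != f(y)],
    # where x is uniform in {-1,1}^n and y flips each coordinate with prob. eps.
    import numpy as np

    def noise_sensitivity(f, n, eps, trials=20000, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.choice([-1, 1], size=(trials, n))
        flips = rng.random((trials, n)) < eps        # which coordinates get flipped
        y = np.where(flips, -x, x)
        return np.mean(f(x) != f(y))

    majority = lambda z: np.sign(np.sum(z, axis=1))      # n odd, so no ties
    print(noise_sensitivity(majority, n=101, eps=0.01))  # small for small eps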
>>: X is also at random?
>> Raghu Meka: X is also random. So it has many applications in the analysis of Boolean functions, starting with the work of Kahn, Kalai, and Linial, who implicitly used it to show some Fourier concentration properties; Håstad in 1997, who used it to show some optimal inapproximability results; and Benjamini, Kalai, and Schramm, who formally defined it and also used it for some questions in percolation, and so on. And here, I want to talk about the noise sensitivity of majorities, or more generally, weighted majorities. And here, Peres showed that the noise sensitivity of any weighted majority with noise rate epsilon is at most two times square root of epsilon. And as you've been asking before, when you have a question about one majority, it's natural to ask what happens if you have two majorities, or three, or four, or several majorities. And this leads to the problem of bounding the noise sensitivity of polytopes. So you have some polytope in n dimensions, and the noise sensitivity of the polytope geometrically means the following: you generate a random point, and we ask for the probability that when you perturb this point, it ends up crossing the boundary. Right? You take a random point, perturb it, and we want to know how likely it is to cross the boundary of the polytope. And using our limit theorem, we can show that the noise sensitivity of any regular polytope is at most a polylogarithmic factor in K times a factor which is polynomial in the noise rate epsilon.
>>: So there’s [inaudible]?
>> Raghu Meka: Yeah, Boolean points. And once again, this beats the union bound. The best previous result basically said that you have a bound on the noise sensitivity of a single hyperplane, and over K of these things the noise sensitivity can increase by at most a factor of K.
>>: So K is the number of facets?
>> Raghu Meka: K is the number of facets. And how does the proof go? At a high level, the proof basically says the following: you want to compute the Boolean noise sensitivity. We use our limit theorem to translate the Boolean problem to the Gaussian world, where the Gaussian analog of noise sensitivity is the Gaussian surface area of the polytope. And there, there is a beautiful result of Nazarov, which I briefly mentioned before: the Gaussian surface area of a polytope with K facets is at most on the order of square root of log K. So here it's important that our limit theorem itself has very good dependence on the number of facets, because if you lose too much in the limit theorem, even if you have very good bounds in the Gaussian world, you don't get anything. And so we have to beat the union bound in both cases. And one application of the noise sensitivity bound is
to learning intersections of halfspaces, but let me skip this application for now. Let me just summarize by saying that in both of these cases we beat the union bound by using some geometric techniques, especially some symmetry properties of the Gaussian distribution, and there is another example along the same lines in my work on computing the supremum of Gaussian processes.
Let me briefly mention some of my other interests and then I'll conclude. So one of my major focuses has been pseudo-randomness and constructing pseudorandom generators, in particular constructing generators for geometric shapes like halfspaces, polynomial threshold functions, and polytopes. Here, the pseudorandom generators are naturally motivated by questions in complexity theory, but they also have applications in algorithms, especially through dimension reduction, counting solutions to [inaudible] programs, and so on, and in bounded space algorithms, where they're helpful for making some streaming algorithms efficient, simulating random walks efficiently, and so on, and for counting algorithms. So that is one direction. Another direction is hardness of approximation, where the focus of my work has been on the Unique Games Conjecture, which is probably the most important open question in this area and which, if true, would solve many other important open problems like the complexity of Max-Cut, Vertex-Cover, Sparsest-Cut and so on. Here my work gives the best evidence in support of the conjecture and also suggests an approach to proving the conjecture by using alphabet reduction; this is one of the things that I thought about in the past and will probably think about in the future. And in learning theory, my work has focused on learning halfspaces and polynomials, and more recently on adapting the smoothed complexity framework from algorithms to make certain seemingly intractable problems in learning theory tractable.
And I've also done work in data mining, where the focus has been mainly on matrix rank minimization problems. So you have a bunch of linear constraints on matrices and you want to find the minimum rank matrix which satisfies these constraints. This has many applications, in settings such as the Netflix challenge or recommendation systems, computing metric embeddings, kernel learning, and so on. And here the focus has been on giving simple and practical algorithms which scale well for high dimensional data and also have some nice rigorous guarantees.
So in summary, my work has basically been around using the structure of randomness, in the form of limit theorems and the geometry of randomness, for various questions in algorithms, complexity theory, pseudo-randomness, and learning theory, and vice versa. And it is these connections, I think, that form the core of my research, and I'd like to explore them further in the future. Thank you.
>>: Further questions?
>>: Do you have, what's the lower bound [inaudible] polytopes?
>> Raghu Meka: For polytopes you need square root of log K.
>>: Times?
>> Raghu Meka: Times epsilon. Sorry, square root of epsilon. Up to some constants. If you have a polytope with K facets, it seems like the-
>>: [inaudible]?
>> Raghu Meka: Epsilon to the half, n square root of log K.
>>: [inaudible]?
>> Raghu Meka: We have the log squared and also the regularity assumption.
>>: [inaudible]?
>> Raghu Meka: So I think it might be easier to push the K down than to push the epsilon down, because getting square root of epsilon with our techniques might be much harder. We use what's called the Lindeberg method, and if you use that method even for the central limit theorem, you end up getting a bound like epsilon to the one fifth. So that might be harder than improving the K.
>>: For the small k there's nothing better than this? [inaudible]?
>> Raghu Meka: Right. So you have our result, and it beats the union bound, but it has a regularity assumption.
>>: But with the union bound, only the K [inaudible]?
>> Raghu Meka: Right. Yeah. Thank you.