>> Kamal Jain: Hello, it's my pleasure to introduce you to Grant Schoenbeck.
He's visiting us from Berkeley, and he'll be talking about the strength of linear
and semidefinite programs.
>> Grant Schoenbeck: So thanks, everyone, for showing up to my talk. I'm
going to talk about the strength of linear and semidefinite programs. You've all
said you've heard several talks on this topic before, so I changed things around
from the normal talk; hopefully this will help put things in context and won't be
so into the details as some of those talks. So I hope you enjoy it.
The setting is we have these combinatorial optimization problems. And we know
that in general you can't solve them exactly; they're NP-hard. And so you might
say, well, are they all the same? And people will say no, they're not all the same,
because actually some of them you can approximate well and some of them you
can't. And a lot of the recent work in theoretical computer science has been
pinning down how well you can approximate these problems.
And we've pinned this down pretty well for a large swath of these problems, but
for problems that look like the traveling salesman problem, or problems that are
2-CSPs -- which means you have local constraints and each constraint only
depends on two variables -- we've been rather unsuccessful at pinning down
approximation factors.
We do have at least non-trivial approximation algorithms for these problems.
Some examples are the Lovasz theta function, the MAX CUT algorithm by
Goemans and Williamson, and the ARV sparsest cut algorithm.
And these successful algorithms are all based on semidefinite programming. So
semidefinite programming seems to be a useful technique, and as we'll see later,
there are obvious ways to make these semidefinite programs stronger. And one
of the main questions is: do these techniques help? Can we get stronger or
better approximation algorithms, kind of without thinking too hard, for these
problems?
So today I'm going to focus on one particular problem, which is vertex cover.
In vertex cover you're given a graph, and you want to find a minimum-size cover,
where a cover is a set of vertices such that each edge is incident to at least one
vertex in the cover.
This problem is NP-hard, so we're going to approximate it. There's a simple
2-approximation, which I'll show soon. It's been shown NP-hard to approximate
better than 1.36; this is by Dinur and Safra. Actually, it's Unique Games-hard to
do better than 2. And we're interested in how well these linear programs and
semidefinite programs can do. Even in just barely subexponential time, can we
get better than a 2-approximation using these techniques?
So, to define things a little bit: the first thing I'll define is what a linear
programming relaxation is here. You start off with an integer program. We
encode the vertex cover problem as an integer program: for each vertex in the
graph we have a variable, which is going to be 1 if the vertex is in the cover and
0 if it's not. And we're trying to minimize the sum of the variables, which is the
number of vertices in the cover.
And for each edge we add a constraint that says that the sum of the endpoints is
at least 1, which means one of them must be in the cover.
So all I've done here -- I haven't done anything. I've just rewritten vertex cover as
an integer program. Vertex cover was NP-complete, and so is integer
programming. So nothing has happened.
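A quick sketch of this integer program in Python, solved by brute force; the
5-cycle here is an assumed toy example, not a graph from the talk:

```python
from itertools import product

# A small assumed example graph: a 5-cycle.
vertices = range(5)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

def is_cover(x, edges):
    # The IP constraints: for each edge (i, j), x[i] + x[j] >= 1,
    # i.e. at least one endpoint is in the cover.
    return all(x[i] + x[j] >= 1 for i, j in edges)

# Brute-force the 0/1 integer program: minimize sum(x) over feasible x.
opt = min(sum(x) for x in product((0, 1), repeat=len(vertices))
          if is_cover(x, edges))
```

Of course this enumeration takes exponential time, which is exactly the point:
the integer program is just vertex cover restated.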
But the kind of amazing thing is that if we relax this -- we add more possible
solutions and say anything between 0 and 1 is okay -- now I have a linear
program and I can solve it, but now I'm solving a different problem than what I
set out to solve. And this notion of integrality gap captures the relation between
the linear program and the integer program. It's the worst-case ratio of the
combinatorial optimum divided by the linear programming optimum. This ratio
will be large if there's any instance where the linear programming optimum gets
much better, right -- because we're relaxing a minimization, the optimum can
only decrease, so it gets much better than the integer optimum.
And we can see here the integrality gap is actually 2, more or less. It's at most
2 by rounding: if you just solve the linear program, you get these fractional
solutions, and we're just going to round each fraction to 0 or 1. The total weight
of the solution can at most double, because one-half can go to one, and that's
doubling; that's the most it can do.
You notice that if the constraints were satisfied before the rounding, they're
satisfied after, because if the sum of the endpoints was at least 1, one of them
must be at least one-half. That one will round to 1, so the rounded solution is
still a cover.
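The threshold-rounding step can be sketched as follows; the fractional values
here are assumed for illustration rather than computed by an LP solver:

```python
# Threshold rounding of a feasible fractional LP solution for a 5-cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
x_frac = [0.5, 0.5, 0.5, 0.5, 0.5]   # feasible: every edge sums to exactly 1

# Round x_i >= 1/2 up to 1, everything else down to 0.
x_round = [1 if xi >= 0.5 else 0 for xi in x_frac]

# The rounded solution is still a cover...
covers = all(x_round[i] + x_round[j] >= 1 for i, j in edges)
# ...and its cost at most doubles (each 1/2 can become a 1).
cost_ratio = sum(x_round) / sum(x_frac)
```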
So you get an integrality gap of at most 2 by rounding, and in particular this
gives you an algorithm. And you can see it's at least 2 as well. The way to see
that is the complete graph; that's going to be your counterexample here. It has
no small vertex cover -- the smallest vertex cover is N minus 1. But the linear
program gives an optimum of N over 2, and the reason is you can put one-half
everywhere. If you put one-half everywhere, all the constraints are satisfied. So
the LP claims to have found a vertex cover of size half the vertices, but no such
vertex cover exists.
And the ratio between N minus 1 and N over 2 tends to 2, so you get the
integrality gap being essentially 2 here. So let's just review what happened. We
started off with an integer program, and you can look at the convex combinations
of the valid integral solutions. Then we relaxed the program, and when we
relaxed it, this polytope became bigger and admitted bad solutions.
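The complete-graph gap can be checked directly for a small N (a sketch; for
larger N the ratio (N-1)/(N/2) = 2 - 2/N approaches 2):

```python
from itertools import product

n = 8
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]  # complete graph K_n

# Combinatorial optimum: brute-force the smallest vertex cover of K_n.
opt = min(sum(x) for x in product((0, 1), repeat=n)
          if all(x[i] + x[j] >= 1 for i, j in edges))

# The LP optimum is at most n/2: the all-halves vector satisfies every
# edge constraint 1/2 + 1/2 >= 1.
lp_value = n / 2

gap = opt / lp_value   # (n - 1) / (n / 2) = 2 - 2/n
```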
Okay. Now, you might object, saying this complete graph is kind of an obvious
counterexample, and we can get rid of it pretty easily by just noting it has
triangles in it, and in any triangle at least two of the vertices must be in the
cover. So I'm going to add another constraint that says that for any triangle in
the graph, the sum of the variables corresponding to its vertices must be at
least 2. Okay?
And now the previous counterexample doesn't work, right? And the question is,
well, what's the integrality gap here? And the answer is it's still 2. You need to
be more clever to find a counterexample, but it's still 2.
So once we had the convex hull of integral solutions, we relaxed it to fractional
solutions, and then this added constraint essentially makes a cut on the relaxed
polytope, making it smaller. And the question is, can we make enough of these
cuts to actually shrink the integrality gap? And you see that while the complete
graph was removed as a counterexample, equally bad counterexamples still
exist.
So let's just do it one more time, this time with semidefinite programming. This
is almost the exact same thing. We start off with our integer program here, but I
wrote it a little differently: instead of the sum constraint, I require the product
(1 minus XI) times (1 minus XJ) to equal 0, so at least one of them must be 1.
That's all it's saying. And you can rewrite it by homogenizing it.
So now I don't have any degree-one terms; that's all I did here. These are
squared, and that doesn't actually change anything: XI equals XI squared means
that XI is 0 or 1 -- those are the only two solutions. Then I add another variable,
X naught, that's essentially 1. It could be minus 1, but that doesn't change
things, so just think of it as 1. And then I take XI squared, which is the same as
XI, and I'm minimizing the sum over this. And I replaced the edge constraint by
multiplying by 1, which doesn't change anything, and I replaced the 1 by X
naught.
So I haven't changed anything yet; I've just homogenized it. But now from here
we can actually relax and say, well, instead of using these scalars, I'll allow any
vectors. So this is a relaxation, and now I get this goofy-looking program over
here, which again is solvable in polynomial time because it's a semidefinite
program.
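One way to write out the homogenization the speaker describes (the exact
notation is an assumption; x_0 plays the role of the extra variable that is
"essentially 1"):

```latex
% Homogenized integer program, with auxiliary variable x_0:
\min \sum_i x_i^2
\quad \text{s.t.} \quad
(x_0 - x_i)(x_0 - x_j) = 0 \;\; \forall (i,j) \in E,
\qquad x_i^2 = x_0 x_i \;\; \forall i,
\qquad x_0^2 = 1.

% Vector (SDP) relaxation: replace each scalar x_i by a vector v_i:
\min \sum_i \|v_i\|^2
\quad \text{s.t.} \quad
(v_0 - v_i)\cdot(v_0 - v_j) = 0,
\qquad \|v_i\|^2 = v_0 \cdot v_i,
\qquad \|v_0\|^2 = 1.
```

With scalars, the constraints force each x_i into {0, 1} and cover every edge;
allowing arbitrary vectors strictly enlarges the feasible set.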
And now the question is: what's the relation between the solutions of this
semidefinite program and the integer program? It was shown by Kleinberg and
Goemans in '98 that the integrality gap is still essentially 2. They used a special
family of graphs to do this, the Frankl-Rodl graphs, which will come up more in
the future too. What I want to say about this is that an integrality gap result
requires three main components. One is the counterexample itself, which here
is going to be a Frankl-Rodl graph. In a Frankl-Rodl graph the vertices are the
vertices of the hypercube, and each vertex is connected to the points that are
almost antipodal to it.
If you connect each vertex to its antipodal point in the hypercube, you get a
matching, right? And a matching has a vertex cover of size half the vertices --
it's a matching. This graph is geometrically very similar: instead of connecting
to the antipodal point, each point is connected to points that are almost
antipodal to it. So geometrically it looks kind of like a matching, but in fact it's
not, okay?
So the next two components you need to show an integrality gap are: one, that
this counterexample does not have a small vertex cover, which is what Frankl
and Rodl showed -- this kind of graph, where you connect almost-antipodal
points, has no small vertex cover; any vertex cover is almost the entire graph.
And two, you need to exhibit a semidefinite solution showing that the
semidefinite program thinks there is a good solution to this instance.
And here what they did with these Frankl-Rodl graphs is they were able to
create solutions using the intuition I told you before: geometrically this graph
looks very similar to a matching, even though its combinatorial behavior is far
from that. So Kleinberg and Goemans created an SDP solution for the
Frankl-Rodl graphs, which showed an integrality gap of 2. But again you can
introduce very simple additional equations. Here we add one that says XI times
XJ is at least 0, which is always true for integral solutions, and that adds the
corresponding constraint that VI dot VJ is at least 0.
And it turns out this constraint was not satisfied by the Kleinberg and Goemans
solution -- if you strengthen the program just a little bit, it's not satisfied
anymore. Charikar then showed that you could tweak the solution and satisfy it
again.
But what I want to point out here is this cat-and-mouse game: better cuts, then
better integrality gaps, then better cuts, then better integrality gaps, and so on.
And --
>>: [inaudible]. [laughter].
>> Grant Schoenbeck: And so the goal of this line of research is to systematize
this and try to get rid of the cat-and-mouse game -- though perhaps it just
recreates it on a larger scale -- and to say what it means to rule out large
swaths of these linear and semidefinite programs.
So the way we're going to do this is to think of things as distributions. We're
dealing with convex programs, so we can't help but include all convex
combinations of valid integral solutions; otherwise you won't have a convex
program.
But the thought is we want to allow only these things. A convex combination of
integral solutions can be thought of as a distribution over integral solutions,
which is a little strange. So someone comes to you and says, well, I don't have
one integral solution, but I have a whole bunch of them, and here's a distribution
over them. Okay? And we have to allow this. We'll see later why this is maybe
a good idea, but just go with me here for a second.
So a distribution is just a map from the [0, 1] interval to {0, 1} to the N. We'll
think of things as vertex covers: this gives a map from assignments to which
vertices are in the cover and which are outside of it, and I want these to be
vertex covers. And I can encode this in a very inefficient way, which is kind of
like a probabilistic long code, I guess.
For every function from {0, 1} to the N to {0, 1}, I record the probability, under
your distribution, that the function is satisfied. Okay? So if this function were 1
on one particular assignment and 0 on the rest, the nonzero functions would
pick out the support, and you'd read off its probability. So this actually gives you
the distribution, but it gives you a lot more than that.
So examples of functions I could ask about are: what's the probability that
vertex I is in the cover? What's the probability that vertex I is in the cover but
vertex J is not? What's the probability that there are 10 vertices in the cover?
And it turns out that if someone gives you this encoding -- they give you the
probability of F for all Fs, so think of this as a big vector -- you can check that
the vector is a valid probability distribution. You check that the probability of F
plus the probability of G equals the probability of (F and G) plus the probability
of (F or G) for all combinations; that the probability of the always-true function is
1 and of the always-false function is 0; and that the function corresponding to
valid vertex cover assignments has probability 1, or correspondingly, that the
one corresponding to invalid vertex cover assignments has probability 0. So
you put 0 weight on invalid vertex cover assignments, and all the weight goes
on valid ones. Okay?
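These checks can be sketched concretely; the distribution over covers of a
triangle below is an assumed toy example:

```python
# A toy distribution over vertex covers of a triangle. Assignments are
# 0/1 triples; weights sum to 1, and every support point is a valid cover.
edges = [(0, 1), (1, 2), (2, 0)]
D = {(1, 1, 0): 0.5, (1, 0, 1): 0.25, (0, 1, 1): 0.25}

def pr(f):
    # Pr_D[f] for a boolean-valued function f on assignments.
    return sum(w for a, w in D.items() if f(a))

# Consistency identity a linear program could impose on the vector (Pr[f])_f:
# Pr[F] + Pr[G] = Pr[F and G] + Pr[F or G].
f = lambda a: a[0] == 1            # "vertex 0 is in the cover"
g = lambda a: a[1] == 0            # "vertex 1 is not in the cover"
lhs = pr(f) + pr(g)
rhs = pr(lambda a: f(a) and g(a)) + pr(lambda a: f(a) or g(a))

always_true = pr(lambda a: True)   # must be 1
# Weight on invalid assignments (some edge with both endpoints 0): must be 0.
invalid = pr(lambda a: any(a[i] == 0 and a[j] == 0 for i, j in edges))
```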
So my point is these are things that you can check with a linear program: given
this vector, I can write a linear program that checks that you gave me a
legitimate solution.
And we can simplify this a little bit. One simplification is that we actually only
need to check local constraints. If IJ is an edge, we really only need to check
that with probability 0 neither I nor J is in the vertex cover. That's enough -- we
don't have to check the global constraint, just these local ones. The reason is
that if there's any weight on an invalid vertex cover assignment, then in a true
probability distribution there will be some weight on a violated constraint.
Okay? So we can check that the constraints are never violated.
And actually this program here is doubly exponential in size, because the
number of functions is doubly exponential in the number of variables. You don't
need this: you can restrict yourself to conjunctions of literals, and then it's singly
exponential, which is the way we'll want to think about these things. But I think
this notation's just easier, so I'll use it. It doesn't really matter.
>>: [inaudible].
>> Grant Schoenbeck: No. I want D to be a distribution so this is just like the
probability space and then you get a -- an assignment.
>>: [inaudible] as a weight [inaudible].
>> Grant Schoenbeck: Okay. So, yeah -- well, you can think of it that way. I'm
just thinking of D as a random variable. But yeah. Okay. So now, this is similar
to what I said before. We want to minimize this expression here, which is just
the expected number of vertices in the cover. For every function F we have a
variable, which is supposed to be its probability, and we want that to be
between 0 and 1, and we want these consistency constraints -- the ones saying
we have a valid assignment -- to hold. And then I rewrote this using the
simplification: for each conjunction C you have a variable that's a probability
between 0 and 1, and you check that the probability of C is equal to the
probability of (C and I) plus the probability of (C and not I). So we still have
conjunctions.
And the probability of (not I and not J) should be 0 for every edge, so I don't put
any weight on things that are violated. And this gives you a nice linear program
here; I just wrote it out so you can see how we get from a distribution to a linear
program.
And now we're going to do something a little goofy, just to complicate things a
little more. We're going to build this thing called a moment matrix. The rows
and the columns are going to be indexed by these functions F, and the FGth
entry is going to be the probability of (F and G). Okay? So I had this vector of
all these probabilities, and I'm going to create a matrix out of it. What can we
say about this matrix? If we look at the Fth row, every entry in that row has an
"and F" in it: it's the probability that this and F happen, the probability that that
and F happen. So if we divide by the probability of F, assuming it's nonzero, it's
like conditioning on F happening. Right? Because the probability of (F and G)
divided by the probability of F is the probability of G conditioned on F. So these
rows, normalized, are like conditioning on an event. And when you condition,
you should still get a distribution over vertex covers, right?
Because if this is a distribution over vertex covers and I condition on an event,
then I get another distribution over vertex covers.
In particular, in the future we'll condition on dictators, which are just like XI: so,
what's the probability that vertex I is in the cover -- or let's condition on XI being
in the cover. Another property of this matrix is that it's positive semidefinite.
The reason is that for an integral solution, if Y is this encoding vector we had,
the matrix is just the outer product Y-transpose Y, right? Because the
probability of (F and G) is 1 if F and G are both 1, and 0 if either F or G is 0. So
it's just the outer product.
And a convex combination of positive semidefinite matrices is also positive
semidefinite. Okay. So we have this moment matrix: each normalized row is
like conditioning on an event, and the matrix is positive semidefinite.
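A minimal sketch of the moment matrix, reusing an assumed toy distribution
over triangle covers and a small assumed family of events F:

```python
import random

# Toy distribution over vertex covers of a triangle (assumed example).
D = {(1, 1, 0): 0.5, (1, 0, 1): 0.25, (0, 1, 1): 0.25}

# A small family of events F (boolean functions of the assignment).
events = [lambda a: True,
          lambda a: a[0] == 1,
          lambda a: a[1] == 1,
          lambda a: a[0] == 1 and a[2] == 0]
k = len(events)

# M[F][G] = Pr[F and G].  Built as a convex combination of outer products
# y(a) y(a)^T where y(a)_F = F(a), so M is positive semidefinite by construction.
M = [[0.0] * k for _ in range(k)]
for a, w in D.items():
    y = [1.0 if f(a) else 0.0 for f in events]
    for i in range(k):
        for j in range(k):
            M[i][j] += w * y[i] * y[j]

# PSD sanity check: x^T M x >= 0 for random vectors x.
random.seed(0)
psd_ok = True
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(k)]
    Mx = [sum(M[i][j] * x[j] for j in range(k)) for i in range(k)]
    if sum(x[i] * Mx[i] for i in range(k)) < -1e-12:
        psd_ok = False
```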
>>: [inaudible].
>> Grant Schoenbeck: Here Y would be -- yeah, the -- it would be a row vector.
But it would be like the row vector corresponding to the row 1. Okay.
So now let's define some stuff -- let's do something cool. I'm going to
rearrange the order of these functions in a particular way. A function is said to
depend on a variable if it's ever the case that changing the value of that variable
changes the value of the function, and a function is a K-junta if it depends on at
most K variables.
Okay. So we're going to put the 0-juntas first -- these are just the constant
functions that are always 1 or always 0. Then we have the 1-juntas, which are
just dictator functions or anti-dictator functions. Then 2-juntas, 3-juntas, and so
on. Okay.
So now, to get K rounds of Sherali-Adams, what I'm going to do is insist that
you give me a truncated part of the moment matrix. I'll require that you give me
not all the probabilities, but all the probabilities for, say, up to 3-juntas.
>>: It looks like [inaudible].
>> Grant Schoenbeck: Yeah, exactly -- a real lot of them. Okay. So for vertex
cover, for example, I'm going to minimize the sum of the probabilities of the I's,
which is the expected number of vertices in the cover, and the polytope I'm
minimizing over is the one defined here: the assignments that you can lift to this
larger space legally. And these are just the same constraints that we had
before, that we said held for the entire moment matrix; we're just going to insist
that they hold on this small part.
Are people following this at all? Because you should ask questions at this point if
you're not. Because it won't get any better. Actually this is the worst of it. I
should say this is the worst of it. So -- [laughter]. But if you don't understand
this, it won't get much better.
And then Lasserre says that this matrix here is positive semidefinite -- for the
same reason: if the whole matrix is positive semidefinite, any principal
submatrix will be also. Right? So given a solution, I want to lift it to this large
space. When I say that a vertex cover solution survives, what I need to do is
define all the entries on this section of the moment matrix, and then I've shown
that it survives K rounds of Sherali-Adams or K rounds of Lasserre, okay?
So there are four hierarchies, and I've defined two of them now. The other two
are very similar to each other, and they actually use this much smaller box:
they just look at 0-juntas and 1-juntas.
So for Lovasz-Schrijver: you start with a linear program, and you say X belongs
to the lifting of this linear program if there's a protection matrix. This is just the
moment matrix for all the 1-juntas. Remember we said we could look at rows as
conditioning on something? So this entry is the probability of XI being in the
cover, and this row is the probability of XI and X1, XI and X2, and so on. So I
give the moment matrix for this, and it turns out that the consistency constraints
now reduce to simply checking that the matrix is symmetric and that the first
row, the first column, and the diagonal are the same. These are just all the
consistency constraints that there are. A row is conditioning on an event: if you
take the Ith row and renormalize it, the result should still belong to the initial
cone.
So what it's saying is: if I condition on I being in the vertex cover, I should still
get something that looks like a distribution over vertex covers. Right? And so
this is LS here. And if we want to strengthen it, we just iterate: for the Rth level,
we insist that each row belongs to the (R minus 1)th level. Okay?
And the way to think of this is: you give a protection matrix, and then some
adversary says, but what about this row here -- show that this row belongs to
the (R minus 1)th level. So then you take that row, normalize it, give a
protection matrix for that, and so on and so forth.
So you can condition on the Ith variable -- show what happens when the Ith
variable is in the cover -- and then you give another protection matrix that says,
okay, show me now what happens when I additionally condition that the Jth
vertex is not in the cover. So now I've conditioned that I is in the cover and J is
not, and I need a new protection matrix.
And this is similar to what happened before with the juntas, where I can
condition on more than one thing at once -- I can condition that I is in the cover
and J is not. Whereas here in LS, I condition that I is in the cover, then that J is
not, and I can give a protection matrix that depends on the order in which these
variables were conditioned on: I can give you a different matrix if you first
condition that J is not in the cover and then that I is. Okay? And this means
that Lovasz-Schrijver is weaker than Sherali-Adams. Additionally, you can
include a positive semidefinite constraint here, and this gives you LS-plus, in
the same way.
Do people understand that a little bit? Okay. So a way to make it a little more
concrete is this prover-adversary game. The prover says, I have a valid vertex
cover solution that gives these weights. And here's the adversary, and he says,
well, show me what happens when I condition on this vertex here. So the
prover has to give two distributions: what happens when it's not in the cover,
and when it is. Here it's not in the cover; here it is in the cover. These happen
with probability two-thirds and one-third, and you can see that if you multiply
these weights by two-thirds and those weights by one-third, you get the weights
up there.
Then the next round, the adversary says: okay, tell me what happens when this
vertex is in the cover and when it's not. And you say, well, if it's not in the
cover, the graph has to look like this -- that's all there is to do. If it is in the
cover, I'm stuck; there's nothing I can do. And so I don't survive two rounds of
LS here. Of course this is a caricature of it -- you need these protection
matrices -- but this is more or less what's going on, just to make it concrete.
And so the adversary has caught me. Okay.
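The conditioning step of the prover-adversary game can be sketched as
follows, with an assumed toy distribution over triangle covers:

```python
# Conditioning a distribution over covers on "vertex i is in the cover".
D = {(1, 1, 0): 0.5, (1, 0, 1): 0.25, (0, 1, 1): 0.25}
i = 0

p_in = sum(w for a, w in D.items() if a[i] == 1)        # Pr[x_i = 1]
D_in = {a: w / p_in for a, w in D.items() if a[i] == 1}
D_out = {a: w / (1 - p_in) for a, w in D.items() if a[i] == 0}

# Law of total probability: mixing the two conditional distributions with
# weights p_in and 1 - p_in recombines to the original distribution,
# exactly like the two-thirds / one-third picture in the talk.
recombined = {}
for a, w in D_in.items():
    recombined[a] = recombined.get(a, 0.0) + p_in * w
for a, w in D_out.items():
    recombined[a] = recombined.get(a, 0.0) + (1 - p_in) * w
```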
So, to review: we have this hierarchy of hierarchies. There are four hierarchies
that we saw: Sherali-Adams and Lasserre, where Lasserre has the semidefinite
constraint, so Lasserre is strictly stronger than Sherali-Adams. Likewise
LS-plus is strictly stronger than LS because it has the semidefinite constraint.
We saw before that the LS ones proceed in these rounds, which means I can
change my distributions depending on what order things are conditioned in, so
it makes them easier to fool. So LS is weaker than Sherali-Adams, just as
LS-plus is weaker than Lasserre. And we'll see gaps for all of these.
Okay. These hierarchies all have certain things in common. They
systematically add more constraints: after R rounds, you've added all the valid
constraints on any subset of R variables. You've actually added much more,
but you've at least got this.
And this shows that it's tight after N rounds, because then you have all valid
constraints on all variables. It also runs in time N to the order R. So this means
that if we show a lower bound against log N rounds, we've ruled out all
quasi-polynomial algorithms based on these ideas.
And if you rule out order N rounds, you've ruled out all subexponential time
algorithms; with super-constant rounds you rule out all polynomial algorithms.
These hierarchies also capture interesting algorithms in very low rounds: one
round of LS-plus captures the Lovasz theta function for the independent set
problem, which you can also use for vertex cover, and the
Goemans-Williamson relaxation of MAX CUT. Three rounds of LS-plus capture
the ARV sparsest cut algorithm. So the best algorithms that we have fall very
low in these hierarchies; they seem to capture things that people care about.
Okay. So I want to talk a little bit about different results and the proofs behind
them. These hierarchies can be thought of as proof systems, and for such a
result you need three things: a counterexample, a proof that the
counterexample has no good combinatorial solution, and a proof that this proof
system fails to see that. So one of the themes is that you need a proof system
stronger than these hierarchies to prove that the combinatorial example has no
small vertex cover -- because the hierarchy itself fails to see that. And an
interesting way of thinking about these results is: what kinds of graphs and
what kinds of techniques are they using to show that no small vertex cover
exists, so as to outsmart these proof systems? Okay?
The first two examples we saw used this Frankl-Rodl graph, where the proof is
a kind of messy combinatorial argument that when you change things just
slightly geometrically, the vertex cover changes radically. The next result I'll
talk about very briefly is by Arora, Bollobas, Lovasz and Tourlakis. I think this
was the first paper that really showed what you could do with these things --
that you could produce pretty robust lower bounds -- and they showed that the
integrality gap of 2 remains even after log N rounds of Lovasz-Schrijver.
So there's no [inaudible] algorithm that's going to fit in the Lovasz-Schrijver
hierarchy. And what they used was a random graph of large girth.
Next I'll talk very briefly about this result of mine with Luca Trevisan and
Madhur Tulsiani that extends this to order N rounds.
I think this surprised people a bit: these linear programs -- a large class of
linear programs -- actually can't solve vertex cover to an approximation factor
better than 2 in subexponential time. And not only can't they do that, they can't
do it on random graphs. So random graphs are enough to fool linear programs.
And maybe this is interesting because random graphs can't fool semidefinite
programs, right? The Lovasz theta function, which falls in the first round of
LS-plus, is not fooled by random graphs: it kind of sees the graph is random --
it looks at its eigenvalues and says there's no way this thing has a small vertex
cover. But random graphs fool very deep into the LS hierarchy.
So I'm going to go over this proof just a little bit to show you one small part of it.
The graph we're going to use as a counterexample for the LS hierarchy is a
random graph, and we're going to modify it a little bit -- you don't have to, but
we will -- so that it has large girth. You just have to remove a few edges of a
random graph to get a graph with large girth, which means there's no small
cycle in it. And what that means is that if you pick a point and look locally
around it, the graph looks like a tree.
Another property that's used is that small subgraphs are sparse: if you look at
any induced subgraph that contains only a small fraction of the vertices, then
that subgraph doesn't have many cycles. It has just a few more edges than it
has vertices, so it can't have many cycles -- it's almost a tree, just a little more
complicated than a tree.
Also, random graphs do not have small vertex covers.
>>: [inaudible] regular graph or GNP.
>> Grant Schoenbeck: Just GNP.
>>: One P?
>> Grant Schoenbeck: Let's see, I don't -- I'm thinking it's constant, like the
expected degree would be constant, but I don't really recall. I can tell you later,
but I'll have to look it up. I mean, you just need enough so that this doesn't
hold, right?
And then you can show that the solution of putting weight kind of just over
one-half on every vertex survives many, many rounds. And this gives you the
integrality gap of two because almost all the vertices are required to be in the
vertex cover but the linear programming solution says that only half of them are.
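As a sanity check on that gap, here's a tiny, hypothetical experiment: the explicit all-(1/2 + epsilon) LP solution stands in for the linear program, and brute force stands in for the integral optimum. The ratio is well below 2 at this toy size but grows toward 2 on dense random graphs as n grows.

```python
import itertools
import random

def min_vertex_cover(n, edges):
    # brute-force smallest vertex cover (fine for tiny n)
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            S = set(S)
            if all(u in S or v in S for u, v in edges):
                return k

rng = random.Random(1)
n, eps = 14, 0.01
edges = [(u, v) for u in range(n) for v in range(u + 1, n)
         if rng.random() < 0.5]
# the LP solution puts x_i = 1/2 + eps on every vertex; every edge
# constraint x_u + x_v >= 1 holds with room to spare
frac_value = (0.5 + eps) * n
opt = min_vertex_cover(n, edges)
print(frac_value, opt)
```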
And so I'll show you how to survive one round of this. And this is different than
the result before, the Arora, Bollobas, Lovasz and Tourlakis because they did
things very implicitly with kind of duals of linear programs and we did things
explicitly. So go back to that model of the adversary kind of pointing at vertices
and conditioning on them. So the adversary points at some vertex in the graph
and you have to condition on that vertex being in the cover, and you have to say
what happens when the vertex is in the cover and when it's not in the cover.
And what we're going to want is to only change things in a small ball around that
vertex, a constant-size ball around that vertex. And the way we're going to do it
is this. In that constant-size ball, the graph looks like a tree. And we're going to
imagine this process being run on the tree.
The process is this. If your parent is 0, then you're 1 with probability 1, and if
your parent is 1, then most of the time you're 0, but some of the time you're 1.
So this is like noisy transmission of data down a tree. Okay? And these values
are picked so that if you're 1 with probability one-half plus epsilon, then your child
is 1 with probability one-half plus epsilon. Okay?
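The arithmetic behind that stationarity can be checked in a few lines. This is a simplified sketch that tracks only the single-vertex marginal (not the full conditional distribution the proof uses), with eps = 0.1 as an arbitrary choice. It also shows the decay the talk describes: conditioning the root decays back toward one-half plus epsilon within a constant number of levels.

```python
# transition probabilities chosen so 1/2 + eps is a fixed point
eps = 0.1
p = 0.5 + eps                  # target P(vertex = 1)
q = (2 * p - 1) / p            # P(child = 1 | parent = 1)

def child_marginal(parent_p):
    # parent = 0 forces child = 1; parent = 1 makes child = 1 w.p. q
    return (1 - parent_p) * 1.0 + parent_p * q

assert abs(child_marginal(p) - p) < 1e-12   # 1/2 + eps is stationary

m = 1.0                        # condition the root to be in the cover
for _ in range(30):            # walk down 30 levels of the tree
    m = child_marginal(m)
print(m)                       # back very near 1/2 + eps = 0.6
```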
>>: Again, just to recap. How high -- what's the depth of these trees?
>> Grant Schoenbeck: It's constant.
>>: Constant depth.
>> Grant Schoenbeck: Yeah.
>>: Each vertex is about --
>> Grant Schoenbeck: This is the question I failed at that you've all asked earlier.
Say constant for now. But it won't matter.
>>: So this is a very sparse random graph?
>> Grant Schoenbeck: Yeah. It's -- you need it -- it's just -- it's just dense
enough that it has no small vertex covers. That's what you need. You need to
put enough edges so that you get this combinatorial proof that there's no small
vertex cover; it goes via just a normal kind of Chernoff bound argument.
>>: And so the fact that [inaudible] has nothing to do with the [inaudible] it's just
[inaudible].
>> Grant Schoenbeck: No.
>>: [inaudible].
>> Grant Schoenbeck: You need to remove a few edges to make the girth large.
But you can show that you can remove square root of N edges and make the
girth large.
>>: Okay.
>> Grant Schoenbeck: Okay. So on this part of the graph, this little circle that
we cut out, this is actually a distribution of vertex covers. Right? I mean, trees
have small vertex covers, right? You can always cover them with half. And in
particular, this is distribution of vertex covers that includes about one-half plus
epsilon of the vertices.
And so because I gave it a true distribution, if I condition on this, I know what
happens. It's just what happens in this distribution. So when I condition on it,
you know, if this is in the cover, then the second row, these are very unlikely
to be in the cover, these are very likely to be in the cover, but as I go down
further and further, this epsilon bit of noise that I add at each level adds up, and if
you go down a constant number of levels I get very, very close to the levels kind of
being the same and everything being one-half plus epsilon again. Right?
Because it's this noisy transmission.
So what do I do then? Within this circle of the graph, I give this distribution which
is a distribution of vertex covers, so it's definitely allowed. Outside I leave
everything one-half plus epsilon, and across -- because on the edges across, the
one inside is about one-half plus epsilon and the one outside is about one-half
plus epsilon, and the requirement is that the sum be greater than one -- it's
always greater than one. I survive on that boundary too.
But it turns out this kind of crossing the edge and throwing it away, while it
works in LS, gives you trouble later in other models. And I'll talk about that later.
So I'm not going to have time to talk more about this except to say that this
shows you that you can survive one round with this kind of splash, with local
modification. As long as the splashes are far apart you can always locally
modify the part of the graph I care about and condition on another vertex and
survive another round. And it's kind of tricky, but if the splashes get close
together, you can actually use a trick to kind of fix all the vertices that are
involved and make the new splashes far apart. And it turns out you can kind of
survive this game for quite a while.
Okay. All right. So that's all I wanted to say about LS on vertex cover. The
next couple results are these -- and these are kind of interesting in that they all
use different instances. And I'll talk about each of them very briefly. So the first
result is this -- this was Feige and Ofek. And what Feige and Ofek did is they
said let's try to refute a 3SAT formula. And the way we're going to try to refute a
3SAT formula is we're going to use the standard reduction -- I think the next
slide I have here -- yes. Use the standard reduction and we're going to make it
into an independent set problem. And then we're going to run the Lovasz theta
function on the independent set problem. And we're going to see what kind of
random 3SAT formulas we can refute.
And there is the standard reduction called FGLSS. Each 3SAT clause is
replaced by one of these gadgets here. Actually I did it for 3XOR, so you have
XOR clauses. So X1 plus X2 plus X3 equals 1. So you represent it by this little
gadget, this 4-clique, and so here X1 is kind of 0, X2 is 0, and X3 is 1. And
these are the four satisfying assignments here.
So you create one of these cliques for each clause and then you connect the
vertices that contradict each other. So for example, here this one says X3 is 1,
this one says X3 is 0. So there's a line between them. They should be
connected. And then you can show that if the formula is satisfiable then there's
an independent set of size a fourth of the graph, because I just include the
vertices that correspond to the satisfying solution. And if it's not, no such thing
exists. So then if the theta function can show that no independent set of size a
fourth of the graph exists, then I can refute the 3SAT formula.
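The reduction just described can be made concrete at toy scale. Below is a minimal, hypothetical implementation of the FGLSS gadget for 3XOR, with brute force standing in for the theta function: a satisfiable instance yields an independent set containing a fourth of the vertices, one per clause-clique.

```python
import itertools

def fglss_3xor(clauses):
    # one vertex per (clause, satisfying assignment): a 4-clique per
    # XOR clause; edges join contradictory assignments across clauses
    verts = []
    for ci, (xs, rhs) in enumerate(clauses):
        for bits in itertools.product([0, 1], repeat=3):
            if sum(bits) % 2 == rhs:
                verts.append((ci, dict(zip(xs, bits))))
    edges = set()
    for i, j in itertools.combinations(range(len(verts)), 2):
        ai, aj = verts[i][1], verts[j][1]
        if verts[i][0] == verts[j][0] or any(
                ai[x] != aj[x] for x in ai if x in aj):
            edges.add((i, j))
    return verts, edges

# satisfiable toy instance: x0+x1+x2 = 1, x1+x2+x3 = 0 (mod 2)
clauses = [((0, 1, 2), 1), ((1, 2, 3), 0)]
verts, edges = fglss_3xor(clauses)
best = max(                        # brute-force maximum independent set
    len(S) for r in range(len(verts) + 1)
    for S in itertools.combinations(range(len(verts)), r)
    if all(e not in edges for e in itertools.combinations(S, 2)))
print(len(verts), best)            # 8 vertices, independent set of 8/4
```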
And they actually showed that this works about as well as the other good
techniques to refute 3SAT formulas, but no better.
Let's see. Okay. So that's what they're doing. But implicitly what they did was
actually give a seven-sixths minus epsilon integrality gap for vertex cover by
reducing from these 3XOR. And in this paper by again Madhur, Luca and I, we
showed that actually you can amplify this to survive order N rounds of LS plus for
the same integrality gap.
So this shows that kind of, you know, sub -- or no subexponential linear
programming algorithm is going to, you know, exactly compute a vertex cover.
And within LS, these reductions don't work very well. And so you have to kind of
reprove everything. So there is a proof that LS plus doesn't work for 3XOR by
Alekhnovich, Arora and Tourlakis, and we kind of took that and combined it with
the Feige-Ofek result to get this result here. What I'll briefly talk about here is a
result by GMPT -- Georgiou, Magen, Pitassi and
Tourlakis. And they showed that actually on this same Frankl-Rodl graph you
can survive a super constant number of rounds of LS plus and you get an
integrality gap of 2 minus epsilon.
Now, I told you before that I would say something about the errors and what
happens to these errors, kind of why they're hard to deal with. So the
semidefinite constraints that you get from LS plus introduce a kind of global
constraint. You can't just modify things locally anymore.
If you modify a little bit of something, you have to modify something else to keep
the matrix positive semidefinite. And so the way we did it is we showed that with
these 3XOR instances you can actually just modify things locally, and because
you're not so close to the gap, this error dies very soon and there's no error.
In this paper here what they did is they showed that you get these ripples and
they go throughout the entire graph. And what you can do is after each round,
you just kind of round down to where the lowest ripple is. And because of the
monotonicity of the constraints you can always round down without punishing
yourself, but you survive a lot fewer rounds when you do that.
And the next result, which I think you guys saw, by Charikar, Makarychev and
Makarychev -- they show it for Sherali-Adams, the 2 minus epsilon; they also
use the random graph. And they reduce the noise by using this really clever
idea that I guess uses a trick from metric embeddings to kind of show that the
noise doesn't matter.
Again, you lose something in the number of rounds -- it's N to the epsilon -- but
you gain because with Sherali-Adams, right, you can't switch the order. The
order can't matter for you. So it's a much stronger constraint. And they were
able to show that these stronger constraints don't work.
>>: [inaudible] get this straight. About the seven-sixths. So proving anything
that says that it takes -- if you replace this it will enable by anything that's
constant, that's just implied by the [inaudible] result? Right? Just because it's
one [inaudible] something?
>> Grant Schoenbeck: Oh, right. Yeah. Right. So that's --
>>: So the [inaudible] result -- actually an immediately stronger result is implied
just by NP-hardness.
>> Grant Schoenbeck: Okay. So that's a really good question. And the answer
is that that's true if you assume that P does not equal NP. Right? So -- and if
you assume that, right [laughter] yeah. So and if you assume that -- if you
assume that there's no subexponential algorithm for NP-hard problems, then this
result is also implied. Right? So it depends on how strong of assumptions you
want to make about NP.
So kind of one of the neat things about these results is that they are unconditional.
The other neat thing is that you get this stratification of, you know, subexponential
or quasi-polynomial or whatever. But, yeah, that's a great point.
Okay. So the last -- oh, boy. Last result that I'll -- I guess I'm a little low on time.
So I'll just mention that I was able to strengthen this result from LS plus to
Lasserre, and this is the first integrality gap for the Lasserre hierarchy, which is
the strongest of the hierarchies. And it survives order N rounds and gets you
this seven-sixths minus epsilon. And the kind of neat thing about it is that it
comes from a proof that a random XOR formula can't be refuted by Lasserre.
So the theorem is that a random 3XOR instance is not refuted even by a linear
number of rounds of Lasserre, and the proof is that a random 3XOR formula
can't be refuted by width-W resolution. This was shown by Ben-Sasson and
Wigderson. And then I was able to show that if it has no width-W resolution
refutation, then Lasserre is kind of no stronger than width-W resolution.
Now, the other side of it -- how do you prove that a random 3XOR formula is
unsatisfiable -- comes from just Chernoff-type arguments. And I guess it seems
odd, like the other one, that you can fool semidefinite programs using random
things, but the reason seems to be that you're using random hypergraphs
instead of actual random graphs. And the semidefinite programs don't pick
these up very well.
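That Chernoff-type unsatisfiability argument can be illustrated at toy scale, with brute force standing in for the concentration bound: any fixed assignment satisfies about half of a dense random 3XOR instance, so even the best assignment falls well short of satisfying all clauses (and a union bound over the 2^n assignments makes this rigorous for large n).

```python
import itertools
import random

rng = random.Random(0)
n, m = 8, 80          # m = 10n clauses: well above the refutation density
clauses = [(rng.sample(range(n), 3), rng.randint(0, 1)) for _ in range(m)]

def frac_satisfied(x):
    # fraction of XOR clauses a_i ^ b_i ^ c_i = rhs_i that x satisfies
    return sum((x[a] ^ x[b] ^ x[c]) == rhs
               for (a, b, c), rhs in clauses) / m

# best assignment over all 2^n (brute force; Chernoff + union bound
# gives the same conclusion without enumeration)
best = max(frac_satisfied(x) for x in itertools.product([0, 1], repeat=n))
print(best)           # concentrated near 1/2; in particular below 1
```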
Okay. You guys missed out on a beautiful proof. I'm sorry.
But another neat thing about Lasserre is that reductions actually work quite well
on Lasserre. And so the corollary of the result on 3XOR is actually you get this
integrality gap for vertex cover immediately from it. And it's not like LS plus,
where you had to kind of reprove things a little bit.
The end of the story for today on this is that Madhur Tulsiani showed you could
take these vectors from local constraint satisfaction problems and push them
through the Dinur-Safra reduction that showed the NP-hardness result and
actually get the same integrality gap result. Though, because the size of the
graph instance increases, you only get it for N to the epsilon rounds, which
means that actually on this slide there's five, like, incomparable best integrality
gaps. So I think there's still lots of room for improvement.
So kind of from a higher level, SDP hierarchies, why do we care about them?
Well, they're related to approximation algorithms. And some of these problems
are toy problems like vertex cover maybe, but some are problems people care
about, like sparsest cut.
They provide unconditional lower bounds. We don't have to wait for people to
prove that P does not equal NP to show that our techniques are feeble.
They are related to proof complexity in kind of fun ways, and kind of, you know,
you're trying to come up with things that fool these hierarchies and more
imaginative ways to prove that there's no solution.
They're also related to local-global tradeoffs. If you look at, like, Sherali-Adams,
it defines local distributions. So it's kind of saying locally everything looks good,
but globally things are amiss. So this goes into a long mathematical tradition of
looking at local-global tradeoffs, where looking at something locally tells you
something about the global structure of it. And that's definitely present here.
And the last thing is average case hardness. Maybe not as it's traditionally
meant, but you can look and see when the integrality gap is large, like for which
instances it's large. So here -- and a lot of these results we're able to show that
kind of for a random instance, the integrality gap is large.
Whereas sometimes you can show that for, like, a suitably dense instance it's
not large. So we can actually solve dense instances well, or things like that. So
you can start breaking down the NP-hardness results by saying on this subset
we can actually do something substantial.
That's all I want to say about NP-hardness results. I'll talk to some of you later.
So I've done other recent work. Should I -- so question -- should I end now? I
was told about 50 minutes. Or should I go on for a little longer?
>>: [inaudible].
>> Grant Schoenbeck: Okay. Okay. So I'll just make it a little longer. So other
work. We proved a new -- yet another proof of the XOR lemma with Thomas
Holenstein. And it's kind of fun because it's really simple and easy. And it has
applications to taking -- combining cryptographic protocols and rerunning like a
weak bit commitment protocol several times and strengthening it.
Also work on online algorithms for non-monotone submodular maximization.
Can't hardly read all those words. But the kind of the idea there is like so you
have a secretary -- the hiring problem or the secretary problem where you want
to pick the largest element of the set and now instead of picking largest element
of a set, you're trying to hire, I don't know, like post-docs and they all have
different strengths and weaknesses, but you don't get any reward if their
strengths overlap, you only count them once, right? We already have an expert
and blah, blah, blah, we don't need another one. So now your function is
submodular -- it's kind of a set cover problem -- and you can only hire three, and
you hire them kind of sequentially. Once you let one go, you can't hire the next
one, and so you want to maximize that.
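This online submodular hiring setup can be sketched in code. The sketch below is a hypothetical threshold-style rule in the classical-secretary spirit, not the algorithm from the paper: watch an initial sample of candidates, set a threshold from the best single candidate seen, then hire anyone whose marginal coverage gain clears it, up to k hires.

```python
import random

def coverage(skills_of, chosen):
    # submodular objective: number of distinct skills covered
    covered = set()
    for i in chosen:
        covered |= skills_of[i]
    return len(covered)

def submodular_secretary(skills_of, k, seed=0):
    # hypothetical sketch: observe the first ~n/e candidates, then
    # hire while marginal gain beats (best single value seen) / k
    rng = random.Random(seed)
    order = list(range(len(skills_of)))
    rng.shuffle(order)                      # candidates arrive randomly
    n_sample = int(len(order) / 2.718)
    thresh = max((len(skills_of[i]) for i in order[:n_sample]),
                 default=0) / k
    hired = []
    for i in order[n_sample:]:
        if len(hired) < k:
            gain = (coverage(skills_of, hired + [i])
                    - coverage(skills_of, hired))
            if gain >= thresh:              # irrevocable hire
                hired.append(i)
    return hired

rng = random.Random(7)
skills = [set(rng.sample(range(10), 3)) for _ in range(20)]
hired = submodular_secretary(skills, k=3)
print(hired, coverage(skills, hired))
```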
I did work on property testing with Reed-Solomon codes.
And I'll talk very briefly about arriving at consensus in social networks with
Elchanan Mossel.
And then in the -- this is older work but Nash Equilibrium in concisely represented
games. So you have a game and all the payoffs aren't written out explicitly --
maybe it's computed by a circuit, maybe it's a graphical game, maybe
it's another funky kind of a game called an action graph game which kind of
captures local dependencies and you want to solve questions related to Nash
Equilibrium and there's kind of hardness results there.
So I'll just briefly mention this work on arriving at consensus in social
networks. So the story is there's a group of kids and they want to go to a movie.
And these are two movie theaters from my home town. You have the Warren old
town and the Warren east side and you need to pick which movie theater to go
to, but they don't all know each other -- maybe they all bump together in the
hallways -- and they want to pick between these otherwise indistinguishable
things, but it's very important that everyone goes to the same one, otherwise no
one's going to have any fun.
And the question we look at is: what are the kind of computational issues in
doing this? So assuming no game-theoretic problems -- these people are clever
and can run any algorithm they want -- are there any inherent kind of
computational barriers to this?
It was motivated by some experimental work where they -- basically they put
people in front of a computer and you could maybe see what color your neighbor
chose and you're trying to reach a consensus or you're trying to color a graph so
choose something opposite them. And so there's kind of a series of works here
which I won't go into any more, but they're kind of interesting. And they're
definitely fun read if you want some fun in your life.
So this is the model very briefly. You have a weighted network. These are the
people I meet with -- so, I gave this at the ICF, so I was in China. That's my
brother. He's a professor too. Everyone has a state of their own. And each
edge is equipped with a Poisson clock, and the rate of the Poisson clock is equal
to the weight of the edge. So when an edge rings -- when the Poisson clock
rings -- then it's like the two people are talking to each other. Right? So they talk
to each other, they maybe flip some random coins, and they update their states
based on the coin tosses and the previous states. And the other important thing
is that this update function has to treat the two choices, the two consensus
values, the same, right? So a lot of times in distributed computing the way that
you compute consensus is you compute, like, the OR of things; that's not
allowed here because that doesn't treat them the same. So there's no difference
between the two theaters except that it really matters which one we go to. But
otherwise, they're indistinguishable.
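The model just described can be simulated in a few lines. This is a toy sketch under stated assumptions: the update rule here (both endpoints adopt the state of a uniformly random one of the two) is one arbitrary choice of a rule that treats the two colors symmetrically, not the algorithms from the paper; Poisson clocks are simulated Gillespie-style by picking the next ringing edge with probability proportional to its weight.

```python
import random

def consensus_time(n, edges, seed=0):
    # edges: list of (u, v, weight); each edge has a Poisson clock of
    # rate = weight.  Symmetric toy rule: when an edge rings, both
    # endpoints adopt the state of a uniformly random one of the two
    # (unlike an OR, this treats the two colors the same)
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n)]
    total = sum(w for _, _, w in edges)
    t = 0.0
    while len(set(state)) > 1:
        t += rng.expovariate(total)        # wait for the next ring
        r, acc = rng.random() * total, 0.0
        for u, v, w in edges:              # which clock rang?
            acc += w
            if r <= acc:
                winner = u if rng.random() < 0.5 else v
                state[u] = state[v] = state[winner]
                break
    return t

ring = [(i, (i + 1) % 8, 1.0) for i in range(8)]  # 8-cycle, unit weights
print(consensus_time(8, ring, seed=3))
```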
And also, we often parameterize it by the size of the state. So if they have one
bit for a state, or two bits, or log N bits for a state, what can they do?
And the problems we've studied are kind of coordination. So they all want to
arrive at the same solution. So there's some special state, red or blue, and they
need to all arrive at red or all arrive at blue.
Majority coordination, which is simply computing the majority of some original
signal they're given.
And so -- it's kind of interesting that one of the important definitions or
parameters for this problem is what we call the broadcast time. And I think this
differs a little bit from some of the distributed computing stuff because of this.
So the broadcast time is the time for a message to flood the network. So if I
start sending out something really outrageous so that everyone will send it on,
how long does it take for that message to reach the entire network? And this,
because we use these Poisson clocks, which are kind of goofy, seems to
behave more like the expansion than the diameter, in that if there are many
paths for a message to reach somewhere, it will actually get there much faster
than if there's just one short path.
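A small flooding simulation makes this concrete. This is a hypothetical sketch, not code from the work: it measures the Poisson-clock flooding time on a bare 8-vertex path versus the same path with extra chords, illustrating why adding alternative routes speeds up broadcast even when the diameter barely changes.

```python
import random

def broadcast_time(n, edges, src=0, seed=0):
    # time for a rumor started at src to reach all n vertices; each
    # edge's Poisson clock (rate = weight) copies the rumor on a ring
    rng = random.Random(seed)
    informed = {src}
    total = sum(w for _, _, w in edges)
    t = 0.0
    while len(informed) < n:
        t += rng.expovariate(total)
        r, acc = rng.random() * total, 0.0
        for u, v, w in edges:             # which clock rang?
            acc += w
            if r <= acc:
                if u in informed or v in informed:
                    informed |= {u, v}    # rumor crosses the edge
                break
    return t

path = [(i, i + 1, 1.0) for i in range(7)]                  # one route
chords = path + [(i, (i + 3) % 8, 1.0) for i in range(8)]   # many routes

def avg(es):
    # average flooding time over repeated runs
    return sum(broadcast_time(8, es, seed=s) for s in range(50)) / 50

print(avg(path), avg(chords))   # the well-connected graph floods faster
```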
And actually this -- it turns out this broadcast time is the right -- like you can do
consensus with a constant number of bits in the broadcast time. So I don't want
to go over too much, but just briefly some results. So with one bit you can do it in
kind of slowly in N squared time, but you can do it. This coordination. But
actually in constant number of bits you can -- you can compute it in the broadcast
time, which is maybe as fast as you'd expect to be able to compute it.
And I won't show you any of this stuff. Boy, you guys are missing out here. All
right. The majority coordination it turns out is impossible to do with one bit. But
it's kind of fun -- you can actually do it with two bits. And again it's kind of more
mixing time; it's this slow, like, N cubed stuff. But you can actually do it with not
many more bits. You need log -- you need to kind of remember a neighbor,
essentially. But if you kind of remember a neighbor, then you can do it in time
based on the diameter. Not the broadcast time, but the diameter, plus some log
N factors.
>>: [inaudible] assumption on the geometry?
>> Grant Schoenbeck: So there's no assumption on the geometry here. But a
lot of times like -- so this N cubed -- this is like a worst case. So it will be much
faster on most graphs than N cubed, you would think.
>>: Can you say something about specific interesting families of graphs?
>> Grant Schoenbeck: No. So we -- this is kind of the first shot of doing this,
and probably it makes more sense to get the model working better before
spending too much time on the math for specific graphs. But some cases would
be interesting to look at, you're right.
I mean a lot of times like with the -- with the coordination, kind of the broadcast
time seems to capture a large bit of the geometry of a graph, right, like how long
it takes a message to get from one side of the network to the other. It says
something about the way the network is made up.
If there's like one link between them, you've got to wait for that one to go off. It's
like they take a lot longer than if it's well connected.
All right. So kind of -- like I just want to mention this work is -- it's kind of in its
infancy, right, it's just like we proposed this kind of fun model and showed some
stuff on it. In the future, I think it will be interesting. Like, you can try to make
the model fit the experimental results better than what people did. Or actually
you can try to modify the experiments to fit the model a little bit better.
And what I mean by that is that a lot of the experiments, everyone just had one
bit and they were communicating with one bit. And we showed that actually the
strategies that converge quickly -- some of them are very natural but require
more than one bit of communication. So you couldn't employ these in these
networks. And so maybe they're not very good models of the way people
interact. And so the experiments could be tweaked.
A big thing -- well, several big things. One is evolvability. How do the agents
learn what algorithms to run? Here we just say, like, you run this weird
algorithm -- go. Right? And this doesn't seem to be very realistic and could be
improved upon a lot. Put game theory in it. Maybe I only care if my friends are
there, not the friends of their friends, so maybe I'll cheat and run a different
algorithm if it's better for me.
And also, we model bounded rationality by bounding the memory of these
agents. It's a first hack at things, but of course, you know, a two-bit finite state
machine can't exactly replicate a human, and there are many, you know, ways
that that can fail. So what are other ways that we can try to do that better?
Yeah. And also, just to go beyond consensus, because consensus is like the
easiest thing you can think of. It's great because as you add an edge to the
graph, the problem doesn't change -- the problem is always the same, to reach
consensus -- whereas with, like, coloring problems it does. So you have two
moving targets, which makes it a lot harder. But there's lots of work in the future
in this area, and if you all want to talk about it today in the interviews, I'd be
happy to discuss it with you. So that's all I wanted to say.
Thanks.
[applause].
>> Kamal Jain: Okay. No questions. Thank you.
>> Grant Schoenbeck: Thank you, all.
[applause]