>> Kamal Jain: Hello, it's my pleasure to introduce you to Grant Schoenbeck. He's visiting us from Berkeley, and he'll be talking about the strength of linear and semidefinite programs. >> Grant Schoenbeck: So thanks, everyone, for showing up to my talk. I'm going to talk about the strength of linear and semidefinite programs. You've all said you've heard several talks on this topic before, so I changed things around from the normal talk; hopefully it will help put things in context and won't be as deep into the details as some of those talks. So I hope you enjoy it. The setting is that we have these combinatorial optimization problems, and we know that you can't solve them exactly -- they're NP-hard in general to solve exactly. And so you might say, well, are they all the same? And people will say no, they're not all the same, because actually some of them you can approximate well and some of them you can't. A lot of the recent work in theoretical computer science has been pinning down how well you can approximate these problems. And we've pinned this down pretty well for a large swath of these problems, but for problems that look like the traveling salesman problem, or problems that are 2-CSPs -- meaning you have local constraints and each constraint depends on only two variables -- we've been rather unsuccessful at pinning down approximation factors. We do have at least nontrivial approximation algorithms for these problems. Some examples of these are the Lovasz theta function, the MAX CUT algorithm by Goemans and Williamson, and the ARV sparsest cut algorithm. And these successful algorithms are all based on semidefinite programming. So a useful technique seems to be semidefinite programming, and as we'll see later, there are obvious ways to make these semidefinite programs stronger. And one of the main questions is: do these techniques help?
So can we get stronger or better approximation algorithms, kind of without thinking too hard, for these problems? So today I'm going to focus on one particular problem, which is vertex cover. In vertex cover you're given a graph, and you want to find a minimum-size cover, where a cover is a set of vertices such that each edge is incident to at least one vertex in the cover. This problem is NP-hard, so we're going to approximate it. There's a simple 2-approximation, which I'll show soon. It's been shown NP-hard to approximate better than 1.36 -- this is by Dinur and Safra -- and actually it's unique games hard to do better than 2. And we're interested in how well these linear programs and semidefinite programs can do. Even in just barely subexponential time, can we get better than a 2-approximation using these techniques? So to define things a little bit, the first thing I'll define is what a linear program relaxation is here. You start off with an integer program; we encode the vertex cover problem as an integer program. For each vertex in the graph we have a variable, which is going to be 1 if the vertex is in the cover and 0 if it's not. And so we're trying to minimize the sum of the variables, which is the number of vertices in the cover. And for each edge we add a constraint that says that the sum of the endpoints is at least 1, which means one of them must be in the cover. So all I've done here -- I haven't done anything. I've just rewritten vertex cover as an integer program. Vertex cover was NP-complete, and so is integer programming, so nothing has happened. But the kind of amazing thing is that if we relax this -- we add more possible solutions and say anything between 0 and 1 is okay -- now I have a linear program and I can solve it, but now I'm solving a different problem than what I set out to solve.
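Just to make the integer program concrete, here is a toy brute-force sketch (this is not from the talk; the function name and the small example graph are my own illustration). It enumerates all 0/1 assignments, keeps those satisfying the edge constraints x_i + x_j >= 1, and minimizes the sum:

```python
from itertools import product

def min_vertex_cover_ip(n, edges):
    """Brute-force the 0/1 integer program: minimize sum(x) subject to
    x[i] + x[j] >= 1 for every edge (i, j)."""
    best = None
    for x in product([0, 1], repeat=n):              # every 0/1 assignment
        if all(x[i] + x[j] >= 1 for i, j in edges):  # each edge is covered
            if best is None or sum(x) < sum(best):
                best = x
    return best

# A 4-cycle: two opposite corners form a cover of size 2.
cover = min_vertex_cover_ip(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(sum(cover))  # → 2
```

The LP relaxation replaces `product([0, 1], ...)` by the box [0, 1]^n; the whole point of the talk is what happens once you allow those fractional points.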
And this notion of integrality gap captures the relation between the linear program and the integer program. What it is, is the worst-case ratio of the combinatorial optimum divided by the linear programming optimum. So this ratio will be large if there's any instance where the linear program optimum gets much better than the integer optimum -- because it's a relaxation, the minimum is only going to decrease. And we can see here the integrality gap is actually 2, more or less. It's at most 2, by rounding: if you just solve the linear program, you get these fractional solutions, and we're just going to round each fraction to 0 or 1. At most, the weight of the total solution can double, because one-half can go to 1, and that's doubling; that's the most it can do. And notice that if the constraints were satisfied before the rounding, they're satisfied after, because if the sum over an edge was at least 1, one of the endpoints must be at least one-half; that one rounds to 1, so the rounded solution is still a valid cover. So you get an integrality gap of at most 2 by rounding, and in particular this gives you an algorithm. And you can see it's essentially 2 from below as well. The way to do that is with the complete graph -- that's going to be your counterexample here. It has no small vertex cover; the smallest vertex cover is N minus 1. But the linear program gives an optimum of N over 2, and the reason is you can put one-half everywhere. If you put one-half everywhere, all the constraints are satisfied. So the linear program claims to have found a vertex cover of size half the vertices, but no such vertex cover exists. And the ratio between N minus 1 and N over 2 approaches 2, so you get the integrality gap being essentially 2 here. So let's just review what happened. We started off with an integer program, and you can look at the convex combinations of the valid integral solutions.
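The complete-graph gap can be checked directly in a few lines. This is just an illustrative sketch (the helper name and the choice of K_6 are mine): brute-force the integral optimum on K_6 and compare it to the all-halves fractional point, which is feasible since every edge constraint reads 1/2 + 1/2 >= 1.

```python
from itertools import combinations

def min_cover_size(n, edges):
    # smallest k such that some k-subset of vertices touches every edge
    for k in range(n + 1):
        for s in combinations(range(n), k):
            s = set(s)
            if all(i in s or j in s for i, j in edges):
                return k

n = 6
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]  # complete graph K_6

integral_opt = min_cover_size(n, edges)  # n - 1 = 5: leaving two vertices out misses their edge
lp_value = n * 0.5                       # all-halves point is LP-feasible
print(integral_opt, lp_value)            # → 5 3.0
# The ratio (n - 1) / (n / 2) = 2 - 2/n tends to 2 as n grows.
```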
And then we relaxed the program. When we relaxed it, this polytope became bigger and admitted bad solutions. Okay. Now, you might object that this complete graph is kind of an obvious counterexample, and we can get rid of it pretty easily by just noting it has triangles in it, and in any triangle at least two of the vertices must be in the cover. So I'm going to add another constraint that says, for any triangle in the graph, the sum of the corresponding vertices must be at least 2. Okay? And now the previous counterexample doesn't work, right? So the question is, well, what's the integrality gap here? And the answer is it's still 2. You need to be more clever to find a counterexample, but it's still 2. So once we had the convex combinations of integral solutions and relaxed them to fractional solutions, this new constraint essentially makes a cut on the solutions we have, making the polytope smaller. And the question is: can we make enough of these cuts to actually shrink the integrality gap back? And you see that while the complete graph was removed as a counterexample, equally bad counterexamples still exist. So let's just do it one more time, this time with semidefinite programming. Okay. This is almost the exact same thing, with semidefinite programming. We start off with our integer program here, written a little differently: instead of the sum constraint, I have a product equal to 0, so at least one of the variables must be 1 -- that's all it's saying. You can rewrite it by homogenizing it, so there are no degree-one terms; that's all I did here. These are squared, and that doesn't actually change anything: XI equals XI squared means that XI is 0 or 1, those are the only two solutions there. And here I add another variable, X naught, that's essentially 1. So you can think of X naught as being 1.
It could be minus 1, but that doesn't change things, so just think of it as being 1. And then I take XI squared, which is the same as XI, and I'm minimizing the sum over this. And I replaced this constraint here by multiplying by 1, which doesn't change anything, and I replaced the 1 by X naught. So I haven't changed anything yet; I've just homogenized it. But now from here we can actually relax and say, well, instead of using these scalars, I'll allow any vectors. And so this is a relaxation, and now I get this goofy-looking program over here, which again is solvable in polynomial time because it's a semidefinite program. And now the question is: what's the relation between the solutions of this semidefinite program and the integral program? And it was shown by Kleinberg and Goemans in '98 that the integrality gap is still essentially 2. And they used a special family of graphs to do this, these Frankl-Rodl graphs, which will come up more in the future too. But what I want to say about this is that an integrality gap requires three main components. One is the counterexample itself, which here is going to be a Frankl-Rodl graph. In the Frankl-Rodl graph, the vertices are the vertices of the hypercube, and each vertex is connected to the points that are almost antipodal to it. So if you connect each vertex to its antipodal point in a hypercube, you get a matching, right? And a matching actually has a vertex cover of size a half -- it's a matching. Okay? This graph is geometrically very similar: instead of connecting to the antipodal point, each point is connected to points that are almost antipodal to it. And so geometrically it looks kind of like a matching, but in fact it's not, okay?
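The Frankl-Rodl construction just described can be sketched in a few lines (my own toy version; the name `frankl_rodl_graph` and the `gamma` parameterization are mine -- "almost antipodal" here means Hamming distance at least (1 - gamma) * n):

```python
from itertools import product

def frankl_rodl_graph(n, gamma):
    """Vertices are {0,1}^n; x ~ y iff their Hamming distance is at
    least (1 - gamma) * n, i.e. y is 'almost antipodal' to x.
    gamma = 0 gives exactly the antipodal perfect matching."""
    verts = list(product([0, 1], repeat=n))
    dist = lambda x, y: sum(a != b for a, b in zip(x, y))
    edges = [(x, y) for i, x in enumerate(verts) for y in verts[i + 1:]
             if dist(x, y) >= (1 - gamma) * n]
    return verts, edges

verts, edges = frankl_rodl_graph(4, 0.0)
print(len(verts), len(edges))  # → 16 8   (a perfect matching on the 4-cube)
```

With gamma = 0 you literally get the matching, which has a cover of half the vertices; the point of the talk is that for small positive gamma the graph still looks like a matching geometrically but combinatorially has no small cover.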
So the next two components you need to show an integrality gap are: one, that this counterexample does not have a small vertex cover, which is what Frankl and Rodl showed -- that this kind of graph, where you connect almost antipodal points, has no small vertex cover; any vertex cover is almost the entire graph. And two, you need to exhibit a semidefinite solution showing that the semidefinite program thinks there is a good solution. And here, what they did with these Frankl-Rodl graphs is they were able to create solutions using the intuition I told you before, that geometrically this graph looks very similar to a matching even though its combinatorial behavior is far from that. So Kleinberg and Goemans created an SDP solution for the Frankl-Rodl graphs, which showed the integrality gap is 2. But again, you can introduce very simple additional constraints. So here we add one that says XI times XJ is at least 0, which is always true for these integral solutions, and that adds the corresponding constraint that VI dot VJ is at least 0. And it turns out this constraint was not satisfied by the Kleinberg and Goemans solution. So if you just strengthen the program a little bit, it's not satisfied anymore. Charikar showed that you could tweak it and satisfy it again. But what I want to point out here is this cat and mouse game between creating better cuts and then better integrality gaps, and then better cuts and better integrality gaps, and so on. And -- >>: [inaudible]. [laughter].
>> Grant Schoenbeck: And so the goal of this line of research is to systematize this and try to get rid of the cat and mouse game -- though perhaps it just recreates it on a larger scale -- and to say what it means to rule out large swaths of these linear and semidefinite programs. So the way we're going to do this is to think of things as distributions. We're dealing with convex programs, so we can't help but include all convex combinations of valid integral solutions; otherwise we won't have a convex program. But the thought is, we want to allow only these things. And a convex combination of integral solutions can be thought of as a distribution over integral solutions. Which is a little strange. But so someone comes to you and says, well, I don't have one integral solution, but I have a whole bunch of them, and here's a distribution over them. Okay? And we have to allow this. And we'll see later why this is maybe a good idea, but just go with me here for a second. So a distribution is just a map from the [0, 1] interval to {0, 1} to the N. We'll think of things as vertex covers: I have a map to assignments saying which vertices are in the cover and which are outside it, and I want these to be vertex covers. And I can encode this in a very inefficient way, which is kind of like a probabilistic long code, I guess: for every function from {0, 1} to the N to {0, 1}, I say, what's the probability, given your distribution, of that function being satisfied? Okay? So if this function were 1 on one particular assignment and 0 on the rest, the nonzero values would tell you the support, and you'd see the probability. So this actually gives you the distribution, but it gives you a lot more than that. So examples of functions I could ask about are: what's the probability that vertex I is in the cover?
What's the probability that vertex I is in the cover but vertex J is not? What's the probability that there are 10 vertices in the cover, for example? And it turns out that if someone gives you such an encoding -- they give you the probability of F for all Fs, so think of this as a big vector -- you can check that the vector is a valid probability distribution by checking that the identity probability of F plus probability of G equals probability of F-and-G plus probability of F-or-G holds for all pairs, that the probability of the always-true function is 1 and the probability of the always-false function is 0, and you can check that the function corresponding to valid vertex cover assignments has probability 1 -- or correspondingly, that the function corresponding to invalid vertex cover assignments has probability 0. So you put 0 weight on invalid vertex cover assignments and all the weight goes on valid ones. Okay? My point is that these are things you can check with a linear program: if I'm given this vector, I can write a linear program that checks that you gave me a legitimate solution. And so we can simplify this a little bit. One simplification is that we actually only need to check local constraints: if IJ is an edge, we really only need to check that with probability 0, neither I nor J is in the vertex cover. That's enough. So we don't have to check the global constraint, just these local ones. The reason is, if there's any weight on an invalid vertex cover assignment, then in a true probability distribution there will be some weight on a violated constraint. Okay? So we can check that the constraints are never violated. Now, this program here is doubly exponential in size, because the number of functions is doubly exponential in the number of variables.
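The consistency identity just mentioned -- P[F] + P[G] = P[F and G] + P[F or G] -- is just inclusion-exclusion, and it's easy to sanity-check on a toy distribution. This sketch is mine (the particular 3-variable distribution is made up for illustration); it verifies the identity for random Boolean functions given as truth tables:

```python
from itertools import product
import random

n = 3
# A toy distribution over assignments in {0,1}^3 (weights sum to 1).
dist = {(1, 1, 0): 0.5, (0, 1, 1): 0.3, (1, 0, 1): 0.2}

def pr(f):
    """Probability, under the distribution, that Boolean function f is 1."""
    return sum(w for x, w in dist.items() if f(x))

# Check P[F] + P[G] = P[F and G] + P[F or G] for random functions F, G.
random.seed(0)
for _ in range(20):
    table_f = {x: random.randint(0, 1) for x in product([0, 1], repeat=n)}
    table_g = {x: random.randint(0, 1) for x in product([0, 1], repeat=n)}
    f, g = table_f.__getitem__, table_g.__getitem__
    lhs = pr(f) + pr(g)
    rhs = pr(lambda x: f(x) and g(x)) + pr(lambda x: f(x) or g(x))
    assert abs(lhs - rhs) < 1e-9

print(abs(pr(lambda x: True) - 1.0) < 1e-9)  # always-true function has probability 1 → True
```

A true distribution satisfies these checks automatically; the hierarchies in the talk only enforce them on a small family of functions, which is exactly what lets fake solutions through.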
And you don't need all that; you can actually restrict yourself to conjunctions of literals, and then it's exponential, which is the way we'll want to think about these things -- exponential rather than doubly exponential. But I think this notation is just easier, so I'll use it. It doesn't really matter. >>: [inaudible]. >> Grant Schoenbeck: No, I want D to be a distribution, so this is just like the probability space, and then you get an assignment. >>: [inaudible] as a weight [inaudible]. >> Grant Schoenbeck: Okay. So, yeah -- well, you can think of it that way. I'm just thinking of D as being a random variable. But yeah. Okay. So now, this is similar to what I said before. We want to minimize this expression here, which is just the expected number of vertices that are in the cover. For every function F we have a variable, which is supposed to be its probability, and we want that to be between 0 and 1, and we want these consistency constraints -- the ones saying we have a valid assignment -- to hold. And then I rewrote this using the simplified version: for each conjunction C you have a variable, its probability, between 0 and 1, and you check that the probability of C is equal to the probability of C-and-XI plus the probability of C-and-not-XI. So we still have conjunctions. And the probability of not-I-and-not-J should be 0 for every edge, so I don't put any weight on things that are violated. And so this gives you a nice linear program here; I wrote it out so you can see how we get from a distribution to a linear program. And now we're going to do something a little goofy, just to complicate things a little more.
We're going to build this thing that they call a moment matrix. The rows and the columns are going to be indexed by these functions, and the FGth entry is going to be the probability of F and G. Okay? So I had this vector of all these probabilities, and I'm going to create a matrix out of it. What are some properties of this matrix? If we look at the Fth row, everything in that row is "and F": what's the probability this and F happens, what's the probability that and F happens. And so if we divide by the probability of F, assuming it's nonzero, it's like conditioning on F happening. Right? Because the probability of G and F divided by the probability of F is the probability of G conditioned on F. So these rows, normalized, are like conditioning on an event. And when you condition, you should also get a distribution over vertex covers, right? Because if this is a distribution over vertex covers and I condition on an event, then I get another distribution over vertex covers. And in particular, in the future we'll condition on dictators, which are just functions like XI -- so, conditioning on XI being in the cover. Another property of this matrix is that it's positive semidefinite. And the reason is, if you have an integral solution, then the matrix is just an outer product: if Y is this encoding vector we had, the matrix is just Y transpose Y, right? Because the probability of F and G is 1 if F and G are both 1, and 0 if either F or G is 0. So it's just the outer product. And then a convex combination of positive semidefinite matrices is also positive semidefinite. So, okay. We have this moment matrix; each normalized row is like conditioning on an event; and it's positive semidefinite.
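That PSD argument -- the moment matrix of a distribution is a convex combination of rank-1 outer products -- can be verified mechanically on a tiny example. This sketch is my own (two variables, a made-up distribution over covers of a single edge); it builds the matrix entry-wise as Pr[F and G] and again as the convex combination of outer products, and checks the two agree:

```python
from itertools import product

n = 2
assignments = list(product([0, 1], repeat=n))
# Toy distribution over covers of the single edge 0-1 (weights sum to 1).
dist = {(0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.2}

# Index the matrix by all 2^(2^n) Boolean functions, written as truth tables.
functions = list(product([0, 1], repeat=len(assignments)))
ev = lambda f, x: f[assignments.index(x)]   # evaluate truth table f at point x

# Build M two ways: entry-wise as Pr[F and G] ...
M1 = [[sum(w for x, w in dist.items() if ev(f, x) and ev(g, x))
       for g in functions] for f in functions]
# ... and as the convex combination sum_x dist[x] * y_x y_x^T of rank-1
# PSD matrices, where y_x is the 0/1 vector (f(x)) indexed by f.
M2 = [[sum(w * ev(f, x) * ev(g, x) for x, w in dist.items())
       for g in functions] for f in functions]

print(M1 == M2)  # → True: M is PSD, being a convex combination of outer products
```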
>>: [inaudible]. >> Grant Schoenbeck: Here Y would be -- yeah, it would be a row vector, the row vector corresponding to the row 1. Okay. So now let's define some stuff; let's do something cool. I'm going to rearrange the order of these functions in a particular way. A function is said to depend on a variable if it's ever the case that changing the value of that variable changes the value of the function, and a function is a K-junta if it depends on at most K variables. Okay. So we're going to put the 0-juntas first -- these are just the constant functions that are always 1 or always 0. Then we have the 1-juntas, which are the dictator functions and anti-dictator functions. Then the 2-juntas, 3-juntas, and so on. Okay. So now, to get K rounds of Sherali-Adams, what I'm going to do is insist that you give me a truncated part of the moment matrix. So I'll require that you give me not all the probabilities, but all the probabilities for, say, up to 3-juntas. >>: It looks like [inaudible]. >> Grant Schoenbeck: Yeah. Exactly. Yeah. A real lot of them. Okay. So for vertex cover, for example, I'm going to minimize the sum of the probabilities of the XIs -- this is the number of vertices in the cover -- and the polytope I'm minimizing over is the one defined here: the assignments that you can lift legally to this larger space. And these are just the same constraints that we had before, the ones that held for the entire moment matrix; we're just going to insist that they hold on this small part. Are people following this at all? Because you should ask questions at this point if you're not, because it won't get any better. Actually, this is the worst of it. I should say this is the worst of it. So -- [laughter].
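The K-junta definition above is easy to make concrete: just test, for each variable, whether flipping it ever changes the function. This is a toy sketch of mine (the function names are made up for illustration):

```python
from itertools import product

def depends_on(f, i, n):
    """True if flipping variable i ever changes f's value."""
    for x in product([0, 1], repeat=n):
        y = list(x)
        y[i] ^= 1
        if f(x) != f(tuple(y)):
            return True
    return False

def is_k_junta(f, n, k):
    """A function is a k-junta if it depends on at most k variables."""
    return sum(depends_on(f, i, n) for i in range(n)) <= k

n = 4
dictator = lambda x: x[2]        # depends on one variable: a 1-junta
conj = lambda x: x[0] and x[1]   # the AND of two variables: a 2-junta
print(is_k_junta(dictator, n, 1), is_k_junta(conj, n, 1))  # → True False
```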
But if you don't understand this, it won't get much better. And then Lasserre additionally says that this matrix here is positive semidefinite -- for the same reason: if the whole matrix is positive semidefinite, any principal submatrix will be also. Right? So given a solution, I want to lift it to this large space. When I say that a vertex cover solution survives, what I need to do is define all the entries on this section of the moment matrix, and then I've shown that it survives K rounds of Sherali-Adams or K rounds of Lasserre, okay? And actually, there are four hierarchies, and I've defined two of them now. The other two are very similar to each other, and they use this little box here, this much smaller box: they just look at the 0-juntas and 1-juntas. So Lovasz-Schrijver: you start with a linear program, and you say X belongs to the lifting of this linear program if there's a protection matrix. This is just the moment matrix for all the 1-juntas. Remember we said we could look at those rows as conditioning on something? So this row here is like conditioning on XI being in the cover: this is the probability of XI and X1, XI and X2, and so on. So I give the moment matrix for this, and it turns out that the consistency constraints now reduce to simply checking that the matrix is symmetric and that the first row, the first column, and the diagonal are the same. And these are just all the consistency constraints that there are. A row is just conditioning: if you condition on an event -- take the Ith row and renormalize it -- the result should still belong to the initial cone.
So what this is saying is: if I condition on I being in the vertex cover, I should still get something that looks like a distribution over vertex covers. Right? And so this is LS. And if we want to strengthen it, we just do it iteratively. So for the Rth level, we insist that each row belongs to the (R minus 1)th level. Okay? And the way to think of this is that you give a protection matrix, and then some adversary says, what about this row here -- show that this row belongs to the (R minus 1)th level. And so you take that row, normalize it, and give a protection matrix for that, and so on and so forth. So the adversary can condition on the Ith variable and say, show me what happens when the Ith variable is in the cover. And then you give another protection matrix, and the adversary says, okay, show me now what happens when I additionally condition that the Jth vertex is not in the cover. So now I've conditioned that I is in the cover and J is not in the cover, okay? And I need a new protection matrix. And this is similar to what happened before with the juntas, where I can condition on more than one thing happening at once -- I can condition that I is in the cover and J is not in the cover -- whereas here in LS I condition on one variable at a time, and I can give a protection matrix that depends on the order in which the variables were conditioned on. So I can give you a different matrix if you first condition that J is not in the cover and then that I is in the cover. Okay? And this means that Lovasz-Schrijver is weaker than Sherali-Adams. And additionally, you can include a positive semidefinite constraint here, and this gives you LS plus, in the same way. Do people understand that a little bit? Okay. So a way to make this a little more concrete is a prover-adversary game. The prover says: I have a valid vertex cover solution that gives these weights.
And here's a little adversary, and he says, well, show me what happens when I condition on this vertex here. And so the prover has to give two distributions: what happens when it's not in the cover and when it is in the cover. So here it's not in the cover, here it is in the cover; this happens with probability two-thirds, that with one-third; and you can see that if you multiply these weights by two-thirds and those weights by one-third, you get the weights up there. And then in the next round the adversary says, okay, tell me what happens when this vertex is in the cover and not in the cover. And you say, well, if it's not in the cover, the graph has to look like this -- that's all there is to do. But if it is in the cover, I'm stuck. All right? There's nothing I can do. And so I don't survive two rounds of LS here. Of course this is a caricature of it -- you need these protection matrices -- but this is more or less what's going on, just to make it concrete. Okay. So to review, we have this hierarchy of hierarchies. There are four hierarchies that we saw: Sherali-Adams and Lasserre, and the ones with the semidefinite constraint. So Lasserre is strictly stronger than Sherali-Adams because it has the semidefinite constraint, and LS plus is strictly stronger than LS because it has the semidefinite constraint. We saw before that the LS ones proceed in these rounds, which means I can change my distributions depending on the order in which things are conditioned on, so it's easier to fool them. And so LS is weaker than Sherali-Adams, just as LS plus is weaker than Lasserre. And we'll see some gaps for all of these. Okay. These hierarchies all have certain things in common. They systematically add more constraints.
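The conditioning step in that prover-adversary game can be sketched directly on distributions. This toy version is mine (the path graph 0-1-2 and the two-cover distribution are made up): conditioning keeps only the covers consistent with the adversary's demand and renormalizes, and the prover is stuck exactly when no mass survives.

```python
def condition(dist, i, value):
    """Condition a distribution over covers (frozensets of vertices) on
    vertex i being in (value=1) or out of (value=0) the cover."""
    keep = {c: w for c, w in dist.items() if (i in c) == bool(value)}
    total = sum(keep.values())
    if total == 0:
        return None            # the prover is stuck: no mass survives
    return {c: w / total for c, w in keep.items()}

# The two minimal covers of the path 0-1-2, as a toy distribution.
dist = {frozenset({1}): 0.5, frozenset({0, 2}): 0.5}

print(condition(dist, 1, 1))   # → {frozenset({1}): 1.0}
print(condition(dist, 0, 1))   # → {frozenset({0, 2}): 1.0}
print(condition(dist, 1, 0))   # forcing 1 out leaves only the cover {0, 2}
```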
After R rounds, you've added all the valid constraints on any subset of R variables. You've actually added much more, but you've at least got this. And this shows that the hierarchy is tight after N rounds, because then you have all valid constraints on all the variables. It also runs in time N to the order R. So this means that if we show a lower bound against algorithms that run for log N rounds, we've ruled out all quasi-polynomial algorithms based on these ideas; if we go up to order N rounds, we've ruled out all subexponential-time algorithms; and super-constant rounds rule out all polynomial-time algorithms. And these hierarchies capture interesting algorithms at very low rounds. One round of LS plus captures the Lovasz theta function for the independent set problem -- you can also use it for vertex cover -- and the Goemans-Williamson relaxation of MAX CUT. And three rounds of LS plus capture the ARV sparsest cut relaxation. So the best algorithms we have fall very low in these hierarchies; they seem to capture things that people care about. Okay. So I want to talk a little bit about different results and their proofs. These hierarchies can be thought of as proof systems, and for a result like this you need a counterexample, a proof that the counterexample has no good combinatorial solution, and a proof that this proof system fails at seeing that. So one of the themes is that you need a proof system stronger than these hierarchies to prove that the counterexample has no small vertex cover -- because the hierarchy itself fails to see that it has no small vertex cover. Right?
So you need a stronger kind of proof to show that, and an interesting way of thinking about these results is: what techniques are they using, and on what kinds of graphs, to show that no small vertex cover exists -- to outsmart these proof systems. Okay? And so the first two examples we saw used this Frankl-Rodl graph, where the proof is a kind of messy combinatorial argument that when you change things just slightly geometrically, the vertex cover behavior changes radically. The next result I'll talk about very briefly is by Arora, Bollobas, Lovasz and Tourlakis. I think this was the first paper that really showed what you could do with these things and that you could produce pretty robust lower bounds: they showed that the integrality gap of 2 remains even after log N rounds of Lovasz-Schrijver. So there's no [inaudible] algorithm that's going to fit in the Lovasz-Schrijver hierarchy. And what they used was a random graph of large girth. So next I'll talk very briefly about a result of mine with Luca Trevisan and Madhur Tulsiani that extends this to order N rounds. I think this surprised people a bit, saying that these linear programs -- a large class of linear programs -- actually can't solve vertex cover to an approximation factor better than 2 in subexponential time. And not only that, they can't do it on random graphs. So random graphs are enough to fool linear programs. And maybe this is interesting because random graphs can't fool semidefinite programs. Right? The Lovasz theta function, which falls in the first round of LS plus, is not fooled by random graphs. It kind of sees the graph is random -- it looks at its eigenvalues and says there's no way this thing has a small vertex cover. But random graphs fool things very deep into the LS hierarchy.
So I'm going to go over this proof just a little bit to show you one small part of it. The graph we're going to use as a counterexample for the LS hierarchy is a random graph, and we're going to modify it a little bit -- you don't have to, but we will -- so that it has large girth. You just have to remove a few edges of a random graph to get a graph with large girth, which means there's no small cycle in it. And what that means is that if you pick a point and look locally around it, it looks like a tree. Another property that's used is that small subgraphs are sparse: if you look at any induced subgraph containing only a small fraction of the vertices, then that subgraph doesn't have many cycles -- it has just a few more edges than it has vertices, so it can't have many cycles. It's almost a tree, just a little more complicated than a tree. Also, random graphs do not have small vertex covers. >>: [inaudible] regular graph or GNP? >> Grant Schoenbeck: Just GNP. >>: One P? >> Grant Schoenbeck: Let's see. I don't -- I'm thinking it's constant, like the expected degree would be constant, but I don't really recall. Yeah, I can tell you later; I'll have to look it up. I mean, you just need enough edges so that this doesn't hold. Right? And then you can show that the solution putting weight just over one-half on every vertex survives many, many rounds. And this gives you the integrality gap of 2, because almost all the vertices are required to be in the vertex cover, but the linear programming solution says that only half of them are. And so I'll show you how to survive one round of this. And this is different from the result before, the Arora, Bollobas, Lovasz and Tourlakis one, because they did things very implicitly with duals of linear programs, and we did things explicitly.
So go back to that model of the adversary pointing at vertices and conditioning on them. So the adversary points at some vertex in the graph and you have to condition on that vertex being in the cover -- you have to say what happens when the vertex is in the cover and when it's not in the cover. And what we're going to want is to only change things in a small ball around that vertex, a constant size ball around that vertex. And the way we're going to do it is this. In that constant size ball, the graph looks like a tree. And we're going to imagine this process being run on the tree. The process is this: if your parent is 0, then you're 1 with probability 1, and if your parent is 1, then most of the time you're 0, but some of the time you're 1. So this is like noisy transmission of data down a tree. Okay? And these values are picked so that if you're 1 with probability one-half plus epsilon, then your child is 1 with probability one-half plus epsilon. Okay? >>: Again, just to recap -- what's the depth of these trees? >> Grant Schoenbeck: It's constant. >>: Constant depth. >> Grant Schoenbeck: Yeah. >>: Each vertex is about -- >> Grant Schoenbeck: This is the question I failed on that was asked earlier. Say constant for now. But it won't matter. >>: So this is a very sparse random graph? >> Grant Schoenbeck: Yeah. It's just dense enough that it has no small vertex covers. That's what you need. You need to put in enough edges so that you get this combinatorial proof that there's no small vertex cover; it goes via just a normal kind of Chernoff bound argument. >>: And so the fact that [inaudible] has nothing to do with the [inaudible] it's just [inaudible]. >> Grant Schoenbeck: No. >>: [inaudible]. >> Grant Schoenbeck: You need to remove a few edges to make the girth large. But you can show that you remove square root of N edges and the girth is large. >>: Okay.
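The calibration mentioned above can be checked with a few lines of arithmetic. Assuming the process is: a 0-parent forces its child to 1, and a 1-parent has a 1-child with probability q, then q = 4·epsilon/(1 + 2·epsilon) makes one-half plus epsilon a fixed point. The value of q is a reconstruction consistent with the description, not stated in the talk:

```python
def child_marginal(m, eps):
    """P(child = 1) given P(parent = 1) = m, under the noisy tree process:
    parent 0 -> child is 1 surely; parent 1 -> child is 1 w.p. q."""
    q = 4 * eps / (1 + 2 * eps)          # calibrated retransmission noise
    return (1 - m) * 1.0 + m * q

eps = 0.05
m = 0.5 + eps                            # start at probability 1/2 + eps
for _ in range(20):                      # walk 20 levels down the tree
    m = child_marginal(m, eps)
print(abs(m - (0.5 + eps)) < 1e-9)       # the marginal never drifts: True

m = 1.0                                  # condition the root to be in the cover
for _ in range(60):
    m = child_marginal(m, eps)
print(abs(m - (0.5 + eps)) < 1e-3)       # deep levels forget the conditioning: True
```

The second loop is the point of the "splash" argument: conditioning the root is forgotten geometrically fast, so a constant-depth ball around the conditioned vertex suffices.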
>> Grant Schoenbeck: Okay. So on this part of the graph, this little circle that we cut out, this is actually a distribution over vertex covers. Right? I mean, trees have small vertex covers, right? You can always cover them with half the vertices. And in particular, this is a distribution over vertex covers that includes about one-half plus epsilon of the vertices. And so because I gave it a true distribution, if I condition on this, I know what happens -- it's just what happens in this distribution. So when I condition on it, if this vertex is in the cover, then the second row, these are very unlikely to be in the cover, the next are very likely to be in the cover, but as I go down further and further, this epsilon bit of noise that I add at each level adds up, and after a constant number of levels I get very, very close to the levels being the same and everything being one-half plus epsilon again. Right? Because it's this noisy transmission. So what do I do then? Within this circle of the graph, I give this distribution, which is a distribution over vertex covers, so it's definitely allowed. Outside I leave everything one-half plus epsilon. And across the boundary, because the endpoint inside is about one-half plus epsilon and the one outside is about one-half plus epsilon, and the requirement is that the sum be greater than 1, it's always greater than 1 -- I survive on that boundary too. But it turns out that crossing the edge and throwing it away, while it works in LS, gives you trouble later in other models. And I'll talk about that later. So I'm not going to have time to talk more about this except to say that this shows you that you can survive one round with this kind of splash of local modification, and as long as the splashes are far apart you can always locally modify the part of the graph I care about and condition on another vertex and survive another round.
And it's kind of tricky, but if the splashes get close together, you can actually use a trick to fix all the vertices that are involved and make the new splashes far apart. And it turns out you can survive this game for quite a while. Okay. All right. So that's all I wanted to say about LS on vertex cover. The next couple of results are kind of interesting in that they all use different instances. And I'll talk about each of them very briefly. So the first result was Feige and Ofek. And what Feige and Ofek did is they said, let's try to refute a 3SAT formula. And the way we're going to try to refute a 3SAT formula is we're going to use the standard reduction -- I think the next slide I have here -- yes. We use the standard reduction to make it into an independent set problem. And then we're going to run the Lovasz theta function on the independent set problem. And we're going to see what kind of random 3SAT formulas we can refute. And there is this standard reduction called FGLSS, where each clause is replaced by one of these gadgets here. Actually I did it here for 3XOR, so you have XOR clauses: X1 plus X2 plus X3 equals 1. You represent it by this little gadget, this 4-clique, and here X1 is 0, X2 is 0, and X3 is 1. And these are the four satisfying assignments here. So you create one of these cliques for each clause and then you connect the vertices that contradict each other. So for example, here this one says X3 is 1, this one says X3 is 0, so there's a line between them -- they should be connected. And then you can show that if the formula is satisfiable then there's an independent set of size a fourth of the graph, because I just include the vertices that correspond to the satisfying assignment. And if it's not, no such independent set exists.
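To make the gadget concrete, here is a small sketch of the FGLSS construction for 3XOR (hypothetical helper code, not from the talk): one vertex per (clause, satisfying assignment) pair, clique edges within a clause, and an edge between any two assignments that set some shared variable differently.

```python
from itertools import product, combinations

def fglss_3xor(clauses):
    """FGLSS graph for 3XOR: clause (i, j, k, b) demands x_i ^ x_j ^ x_k == b."""
    vertices = []                            # (clause index, partial assignment)
    for c, (i, j, k, b) in enumerate(clauses):
        for vi, vj, vk in product([0, 1], repeat=3):
            if vi ^ vj ^ vk == b:            # one of the 4 satisfying assignments
                vertices.append((c, {i: vi, j: vj, k: vk}))
    edges = set()
    for a, b2 in combinations(range(len(vertices)), 2):
        ca, asg_a = vertices[a]
        cb, asg_b = vertices[b2]
        same_clause = ca == cb               # the 4-clique within a clause
        contradict = any(asg_a[x] != asg_b[x] for x in asg_a.keys() & asg_b.keys())
        if same_clause or contradict:
            edges.add((a, b2))
    return vertices, edges

# one clause gives a 4-clique: 4 vertices, 6 edges
V, E = fglss_3xor([(0, 1, 2, 0)])
print(len(V), len(E))   # 4 6
```

A satisfying assignment picks one vertex per clique with no contradiction edges between them, giving an independent set of size a fourth of the graph, exactly as in the talk.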
So then if the theta function can show that no independent set of size a fourth of the graph exists, then I can refute the formula. And they actually showed that this works about as well as the other good techniques to refute random 3SAT formulas, but no better. Let's see. Okay. So that's what they were doing. But implicitly what they did was actually give a seven-sixths minus epsilon integrality gap for vertex cover by reducing from these 3XOR instances. And in this paper by, again, Madhur, Luca and me, we showed that actually you can amplify this to survive order N rounds of LS plus for the same integrality gap. So this shows that no subexponential algorithm in this hierarchy is going to, you know, exactly compute a vertex cover. And within LS, these reductions don't work very well, so you have to kind of reprove everything. So there was a proof that LS plus doesn't work for 3XOR, by Alekhnovich, Arora and Tourlakis, and we took that and combined it with the Feige-Ofek result to get this result here. The next result I'll briefly talk about is by GMPT -- Georgiou, Magen, Pitassi and Tourlakis. And they showed that actually on this same Frankl-Rodl graph you can survive a super constant number of rounds of LS plus and you get an integrality gap of 2 minus epsilon. Now, I told you before that I would say something about the errors and what happens to these errors, kind of why they're hard to deal with. The semidefinite constraints that you get from LS plus introduce a kind of global constraint. You can't just modify things locally anymore. If you modify a little bit of something, you have to modify something else to keep the matrices positive semidefinite.
And so the way we did it is we showed that with these 3XOR instances you can just modify things locally, and because you're not so close to the gap, this error dies out very quickly. In this paper here, what they did is they showed that you get these ripples that go throughout the entire graph. And what you can do is, after each round, you just round down to where the lowest ripple is. And because of the monotonicity of the constraints you can always round down without punishing yourself, but you survive a lot fewer rounds when you do that. And the next result, which I think you guys saw, by Charikar, Makarychev and Makarychev -- they show the 2 minus epsilon for Sherali-Adams. They also use the random graph. And they reduce the noise by using a really clever way -- I guess it uses a trick from metric embeddings -- to show that the noise doesn't matter. Again, you lose something in the number of rounds, it's N to the epsilon, but you gain because with Sherali-Adams, right, you can't switch the order -- the order can't matter for you. So it's a much stronger constraint. And they were able to show that these stronger constraints don't work. >>: [inaudible] get this straight. About the seven-sixths. So proving anything that says that it takes -- if you replace this it will enable by anything that's constant, that's just implied by the [inaudible] result? Right? Just because it's one [inaudible] something? >> Grant Schoenbeck: Oh, right. Yeah. Right. So that's -- >>: So the [inaudible] result is actually immediately a stronger result, implied just by NP-hardness. >> Grant Schoenbeck: Okay. So that's a really good question. And the answer is that that's true if you assume that P does not equal NP. Right? So -- and if you assume that, right [laughter] yeah.
So if you assume that there's no subexponential algorithm for NP-hard problems, then this result is also implied. Right? So it depends on how strong an assumption you want to make about NP. So one of the neat things about these results is that they are unconditional. The other neat thing is that you get this stratification of, you know, subexponential or quasi-polynomial or whatever. But, yeah, that's a great point. Okay. So the last result -- oh, boy, I guess I'm a little low on time -- that I'll just mention is that I was able to strengthen this result from LS plus to Lasserre, and this is the first integrality gap for the Lasserre hierarchy, so that's the strongest of the hierarchies. And it survives order N rounds and gets you this seven-sixths minus epsilon. And the kind of neat thing about it is that it comes from a proof that a random XOR formula can't be refuted by Lasserre. So the theorem is that a random 3XOR instance is not refuted even by a linear number of rounds of Lasserre. And the proof uses the fact that a random 3XOR formula can't be refuted by width W resolution -- this was shown by Ben-Sasson and Wigderson -- and then I was able to show that if there's no width W resolution refutation, then Lasserre is kind of no stronger than width W resolution. Now, the other side of it, how you prove that a random 3XOR formula is unsatisfiable, comes from just Chernoff type arguments. And I guess it seems odd -- like the other result, it's odd that you can fool semidefinite programs using random things -- but the reason seems to be that you're using random hypergraphs instead of random actual graphs. And the semidefinite programs don't pick these up very well. Okay. You guys missed out on a beautiful proof. I'm sorry. But another neat thing about Lasserre is that reductions actually work quite well on Lasserre.
And so the corollary of the result on 3XOR is that you actually get this integrality gap for vertex cover immediately from it. It's not like LS plus, where you had to kind of reprove things a little bit. The end of the story for today on this is that Madhur Tulsiani showed you could take these vectors from local constraint satisfaction problems and push them through the Dinur-Safra reduction that showed the NP-hardness result, and actually get the same integrality gap result -- though, because the size of the graph instance increases, you only get it for N to the epsilon rounds. Which means that actually on this slide there are five, like, incomparable best integrality gaps. So there's still lots of room for improvement. So from a higher level: SDP hierarchies, why do we care about them? Well, they're related to approximation algorithms. And some of these problems are toy problems, like vertex cover maybe, but some are problems people care about, like sparsest cut. They provide unconditional lower bounds -- we don't have to wait for people to prove that P does not equal NP to show that our techniques are feeble. They are related to proof complexity in kind of fun ways: you're trying to come up with things that fool these hierarchies, and more imaginative ways to prove that there's no solution. They're also related to local-global tradeoffs. If you look at, like, Sherali-Adams, it defines local distributions. So it's kind of saying locally everything looks good, but globally things are amiss. So this goes into a long mathematical tradition of looking at local-global tradeoffs, and when looking at something locally tells you something about the global structure of it. And that's definitely present here. And the last thing is average case hardness.
Maybe not as it's traditionally meant, but you can look and see when the integrality gap is large -- like, for which instances it's large. And for a lot of these results we were able to show that for a random instance, the integrality gap is large. Whereas sometimes you can show that for, like, a suitably dense instance it's not large. So we can actually solve dense instances well, or things like that. So you can start breaking down the NP-hardness results by saying: on this subset we can actually do something substantial. That's all I want to say about these hardness results. I'll talk to some of you later. So I've done other recent work. Should I -- so, question -- should I end now? I was told about 50 minutes. Or should I go on for a little longer? >>: [inaudible]. >> Grant Schoenbeck: Okay. Okay. So I'll just make it a little longer. So, other work. We gave yet another proof of the XOR lemma, with Thomas Holenstein. And it's kind of fun because it's really simple and easy. And it has applications to combining cryptographic protocols -- rerunning, like, a weak bit commitment protocol several times and strengthening it. Also work on online algorithms for non-monotone submodular maximization -- I can hardly read all those words. But the idea there is like the hiring problem or the secretary problem, where you want to pick the largest element of a set, and now instead of picking the largest element of a set, you're trying to hire, I don't know, like post-docs, and they all have different strengths and weaknesses, but you don't get any reward if their strengths overlap -- you only count them once, right? We already have an expert, and blah, blah, blah, we don't need another one. So now your function is submodular, it's kind of a set cover problem, and you can only hire three, and you hire them kind of sequentially.
Once you let one go, you can't go back and hire them, and so you want to maximize that. I did work on property testing with Reed-Solomon codes. And I'll talk very briefly about arriving at consensus in social networks, with Elchanan Mossel. And then there's older work on Nash equilibrium in concisely represented games. So you have a game and all the payoffs aren't written out explicitly -- maybe it's computed by a circuit, maybe it's a graphical game, maybe it's another funky kind of game called an action graph game, which captures local dependencies -- and you want to solve questions related to Nash equilibrium, and there are kind of hardness results there. So I'll just briefly mention this work on arriving at consensus in social networks. So the story is there's a group of kids and they want to go to a movie. And these are two movie theaters from my home town: you have the Warren old town and the Warren east side. And you need to pick which movie theater to go to, but they don't all know each other -- maybe they all bump into each other in the hallways -- and they want to pick between these otherwise indistinguishable things, but it's very important that everyone goes to the same one, otherwise no one's going to have any fun. And the question we look at is: what are the computational issues in doing this? So assuming no game theoretic problems -- these people are clever and can run any algorithm they want -- are there any inherent computational barriers to this? It was motivated by some experimental work where basically they put people in front of a computer, and you could maybe see what color your neighbor chose, and you're trying to reach a consensus -- or you're trying to color a graph, so choose something opposite them. And so there's a series of works here which I won't go into any more, but they're kind of interesting.
And they're definitely a fun read if you want some fun in your life. So this is the model, very briefly. You have a weighted network. These are the people I meet with -- so I gave this at ICS, so I was in China. That's my brother. He's a professor too. Everyone has a state of their own. And each edge is equipped with a Poisson clock, and the rate of the Poisson clock is equal to the weight of the edge. So when an edge's Poisson clock rings, it's like the two people are talking to each other. Right? So they talk to each other, they maybe flip some random coins, and they update their states based on the coin tosses and the previous states. And the other important thing is that this update function has to treat the two consensus choices the same, right? So a lot of times in distributed computing the way that you compute consensus is you compute, like, the OR of things. That's not allowed here, because that doesn't treat them the same. So there's no difference between the two theaters except that it really matters which one we go to. But otherwise, they're indistinguishable. And we often parameterize this by the size of the state: if they have one bit for a state, or two bits for a state, or log N bits for a state, what can they do? And the problems we've studied are coordination -- so they all want to arrive at the same solution, there's some special state, red or blue, and they need to all arrive at red or all arrive at blue -- and majority coordination, which is simply computing the majority of some original signal they're given. And it's kind of interesting that one of the important definitions or parameters for this problem is what we call the broadcast time. And I think this differs a little bit from some of the distributed computing stuff because of this.
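As a toy illustration of the model: sampling an edge with probability proportional to its weight is equivalent to running rate-w Poisson clocks and looking only at the sequence of rings. The update rule below, a symmetric voter step, is a placeholder chosen here because it treats both states the same; it is not one of the protocols from the paper:

```python
import random

def poisson_consensus(edges, weights, states, seed=0, max_steps=100_000):
    """Simulate the meeting process until all states agree.

    Each ring picks an edge with probability proportional to its weight;
    the update is a simple symmetric voter step: a random endpoint copies
    the other, which treats 'red' and 'blue' identically."""
    rng = random.Random(seed)
    total = sum(weights)
    for step in range(max_steps):
        if len(set(states.values())) == 1:
            return step                        # consensus reached
        r, acc = rng.random() * total, 0.0
        for (u, v), w in zip(edges, weights):  # weighted edge sampling
            acc += w
            if r <= acc:
                break
        if rng.random() < 0.5:                 # symmetric: direction by coin flip
            states[v] = states[u]
        else:
            states[u] = states[v]
    return None                                # no consensus within the budget
```

On a small path with states red/blue/red this reaches consensus after a handful of rings; the results below compare what richer update rules, with more bits per state, can achieve.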
So the broadcast time is the time for a message to flood the network. So if I start sending out something really outrageous, so that everyone will repeat it, how long does it take for that message to reach the entire network? And because we use these Poisson clocks, which are kind of goofy, it seems to behave more like the expansion than the diameter, in that if there are many paths for a message to reach somewhere, it will actually get there much faster than if there's just one short path. And it turns out this broadcast time is the right parameter -- like, you can do consensus with a constant number of bits in the broadcast time. So I don't want to go over too much, but just briefly, some results. So with one bit you can do it kind of slowly, in N squared time, but you can do it -- this is coordination. But with a constant number of bits you can compute it in the broadcast time, which is maybe as fast as you'd expect to be able to compute it. And I won't show you any of this stuff. Boy, you guys are missing out here. All right. Majority coordination, it turns out, is impossible to do with one bit. But it's kind of fun -- you can actually do it with two bits. And again it's more like mixing time, this slow, like, N cubed stuff. But you can actually do it with not many more bits. You need log N bits -- you need to remember a neighbor, essentially. But if you remember a neighbor, then you can do it in time based on the diameter -- not the broadcast time, but the diameter -- plus some log N factors. >>: [inaudible] assumption on the geometry? >> Grant Schoenbeck: So there's no assumption on the geometry here. But a lot of times, like -- so this N cubed, this is like a worst case. So it will be much faster on most graphs than N cubed, you would think. >>: Can you say something about specific interesting families of graphs? >> Grant Schoenbeck: No.
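The broadcast time can be measured in the same toy simulation (again a sketch of my own, with unit edge weights so each ring is a uniformly random edge):

```python
import random

def broadcast_steps(edges, n, source=0, seed=1):
    """Rings until a message from `source` has flooded all n vertices."""
    rng = random.Random(seed)
    informed = {source}
    steps = 0
    while len(informed) < n:
        u, v = edges[rng.randrange(len(edges))]   # unit-rate Poisson clocks
        if u in informed or v in informed:
            informed.update((u, v))               # the message spreads
        steps += 1
    return steps

path = [(i, i + 1) for i in range(9)]    # one short route: spreads slowly
star = [(0, i) for i in range(1, 10)]    # many routes out of the hub
print(broadcast_steps(path, 10), broadcast_steps(star, 10))
```

The star typically floods much faster than the path, since the path's single route has to fire edge by edge in order -- the many-paths-beat-one-short-path intuition described above.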
So this is kind of the first shot at doing this, and probably it makes more sense to get the model working better before spending too much time on the math for specific graphs. But some cases would be interesting to look at, you're right. I mean, a lot of times, like with coordination, the broadcast time seems to capture a large bit of the geometry of a graph, right -- like how long it takes a message to get from one side of the network to the other says something about the way the network is made up. If there's, like, one link between two parts, you've got to wait for that one to go off, so it takes a lot longer than if it's well connected. All right. So I just want to mention this work is kind of in its infancy, right -- we proposed this kind of fun model and showed some stuff on it. In the future, I think it will be interesting: you can try to make the model fit the experimental results better than what people did. Actually, you can try to modify the experiments to fit the model a little bit better. And what I mean by that is that in a lot of the experiments, everyone just had one bit and they were communicating with one bit. And we showed that the strategies we show converge quickly -- some of them are very natural but require more than one bit of communication. So you couldn't employ these in those networks. And so maybe they're not very good models of the way people interact. And so the experiments could be tweaked. A big thing -- several big things -- is evolvability. How do the agents learn what algorithms to run? Here, we just say: you run this weird algorithm, go. Right? And this didn't seem to be very realistic and could be improved upon a lot. Put game theory in it.
Maybe I only care if my friends are there, not the friends of their friends, so maybe I'll cheat and run a different algorithm if it's better for me. And also, we use bounded rationality by bounding the memory of these agents. It's a first hack at things, but of course, you know, a two-bit finite state machine can't exactly replicate a human, and there are many ways that that can fail. So what are other ways that we can try to do that better? Yeah. And also, just to go beyond consensus, because consensus is like the easiest thing you can think of. It's great because as you add an edge to the graph, the problem doesn't change -- the problem is always the same, to reach consensus -- whereas with, like, coloring problems it does. So you have two moving targets, and that makes it a lot harder. But there's lots of work in the future in this area, and if you all want to talk about it today in the interviews, I'd be happy to discuss it with you. So that's all I wanted to say. Thanks. [applause]. >> Kamal Jain: Okay. No questions. Thank you. >> Grant Schoenbeck: Thank you, all. [applause]