>> Yuval Peres: Okay. We're happy to have Prasad Raghavendra from across the lake, who will be teaching us about approximating the optimum: efficient algorithms and their limits.

>> Prasad Raghavendra: Thank you. Okay. So let's start right away by looking at the Max 3-SAT problem. Here you're given a 3-CNF formula and you want to find an assignment that satisfies the maximum number of clauses. It's a very well known theorem that this problem is NP-hard. And as it turns out, most optimization problems that we want to solve are actually NP-hard, so we really want to cope with this. One way to do that would be to say: okay, can we find a solution that is, say, at least half as good as the optimum? That's the approach which led to approximation algorithms. An algorithm is an alpha-approximation if, for every instance of the problem, it always outputs a solution whose value is at least alpha times the optimum. So those are approximation algorithms, and there's a vast literature in this area. But despite this vast literature, if you look at the core of most approximation algorithms, there are very few tools available. In fact, until 1994 nearly every approximation algorithm directly or indirectly relied on linear programming relaxations. In 1994, Goemans and Williamson introduced a semidefinite programming based algorithm for Max Cut. Semidefinite programming is a generalization of linear programming, which makes it the strongest tool we have today in approximation algorithms. As of today, if you write a semidefinite programming relaxation and it does not give a good approximation, pretty much there is no way to get around it. So this is the state of the art.

But before I go any further, let me define what I call constraint satisfaction problems. Max 3-SAT was an example of a constraint satisfaction problem. In general, you have a bunch of variables, and these variables take values in some finite domain, in our case the Boolean domain; and there are constraints on these variables, which in Max 3-SAT are just the clauses. Depending on the kind of constraints you permit, you get different CSPs, and the behavior of a CSP depends on the kind of constraints permitted. Since SDPs were introduced there have been lots and lots of algorithms for different CSPs using semidefinite programming. I've listed nearly every such algorithm here. These bars indicate the approximation ratio, a number between 0 and 1; the chart is not completely drawn to scale, but at least the order is right. Okay. So these are all the algorithms. And in some cases we actually know that you can't do any better. For instance, for Max 3-SAT we have an NP-hardness result which says that you cannot do better than a seven-eighths approximation. That basically closes the problem for Max 3-SAT, because we have an approximation algorithm and a matching hardness result saying we can't do any better. But there is still a gap for some very fundamental problems like Max Cut. For Max Cut, the best algorithm achieves an approximation ratio of .878, whereas the hardness factor is .941. So there could be a better algorithm, or it could be that we could show hardness at .878; there's a gap in our understanding of this CSP.

>>: A hardness result, is it [inaudible].

>> Prasad Raghavendra: That's what I'm coming to, yeah.

>>: [Inaudible].

>> Prasad Raghavendra: So this is NP-hardness and this is the algorithm. Yeah. Okay.
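(An editorial aside on the seven-eighths threshold just mentioned: a uniformly random assignment satisfies each 3-clause with probability 7/8, so on satisfiable instances it is a 7/8-approximation in expectation. A minimal sketch; the clause encoding below is an illustrative assumption, not from the talk.)

```python
import random

# A clause is a list of 3 literals: +i means variable i, -i means its negation.
def satisfied_fraction(clauses, assignment):
    """Fraction of clauses satisfied by an assignment (dict: variable -> bool)."""
    def lit_true(lit):
        return assignment[abs(lit)] if lit > 0 else not assignment[abs(lit)]
    return sum(any(lit_true(l) for l in c) for c in clauses) / len(clauses)

def random_assignment(num_vars):
    # A clause on 3 distinct variables is violated with probability (1/2)^3 = 1/8,
    # so the expected satisfied fraction is 7/8.
    return {v: random.random() < 0.5 for v in range(1, num_vars + 1)}

clauses = [[1, -2, 3], [-1, 2, 4], [2, -3, -4]]
print(satisfied_fraction(clauses, random_assignment(4)))
```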
So to deal with this gap, it was suggested that we work with a conjectured hardness assumption called the Unique Games Conjecture. Basically, Unique Games is the following problem: you're given a bunch of linear equations modulo a prime p (or just a number p), and each equation involves exactly two variables, of the form x_i - x_j = c (mod p). What you want to do is satisfy the maximum number of these equations. And the Unique Games Conjecture asserts that, if the prime is large enough, then given a system which is nearly satisfiable, it is NP-hard to satisfy even an epsilon fraction of the equations. So given a 1 minus epsilon satisfiable system, it is NP-hard to satisfy an epsilon fraction of the equations. This is the Unique Games Conjecture.

>>: When you say the system is nearly satisfiable, what does that mean?

>> Prasad Raghavendra: It means that there is a solution which satisfies a 1 minus epsilon fraction of the equations.

>>: The frame for this --

>> Prasad Raghavendra: For this kind of linear system. So this is the Unique Games Conjecture, and it's a notorious open problem today; there's no consensus about its truth or falsity. There have been attempts to disprove the conjecture, and we are not close to either proving or disproving it. In fact, this algorithm of [inaudible] is an algorithm for Unique Games which gets very close to disproving the conjecture, in the sense that any improvement on its approximation ratio would disprove the conjecture. But it's not obvious that you can improve this approximation ratio.

>>: Is N the number of variables or the number of equations?

>> Prasad Raghavendra: N? Here?

>>: Yeah.

>> Prasad Raghavendra: N is the number of variables; the number of equations is polynomially related. Okay.

>>: [Inaudible].

>> Prasad Raghavendra: The table means that, given a Unique Games instance which is 1 minus epsilon satisfiable, this algorithm will output a solution which satisfies a 1 over p to the epsilon fraction of the equations, where p is the prime defining the Unique Games problem.

>>: Times N?

>> Prasad Raghavendra: This is a fraction. Given an instance in which a 1 minus epsilon fraction of the equations is satisfiable, it finds a solution satisfying a 1 over p to the epsilon fraction of the equations.

>>: The other lines, are those close to 1?

>> Prasad Raghavendra: Those are close to 1 only when epsilon is small enough. In the original version of Unique Games, epsilon is a constant. So these two algorithms only work when epsilon is really small, and that doesn't have much --

>>: Last line.

>> Prasad Raghavendra: In the last line, lambda is the spectral gap of the Unique Games constraint graph. If the constraint graph of the Unique Games instance has a large spectral gap, then one can find a good solution. Okay. So now assume the Unique Games Conjecture. There are several hardness results that have been shown assuming the Unique Games Conjecture; the green bars indicate those hardness results. The surprising fact is that if you look at Max Cut or Max 2-SAT --

>>: What are the approximations here? It's some number?

>> Prasad Raghavendra: Yeah.

>>: What's this number after Unique Games [inaudible], approximating a constant, right?

>> Prasad Raghavendra: The blue bars are the approximation ratios of the algorithms that we have, a number between 0 and 1; and these are the NP-hardness results.
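(To make the Unique Games problem concrete: a minimal sketch of an instance, equations x_i - x_j = c (mod p), and of scoring an assignment. The tuple encoding below is an illustrative assumption.)

```python
# A Unique Games instance: two-variable linear equations x_i - x_j = c (mod p).
# An equation is a tuple (i, j, c); an assignment maps each variable to a value in Z_p.
def satisfied_fraction(equations, assignment, p):
    """Fraction of the equations that the assignment satisfies."""
    good = sum((assignment[i] - assignment[j]) % p == c for (i, j, c) in equations)
    return good / len(equations)

p = 17
# 3 + 5 + 9 = 17 = 0 (mod p), so this cycle of equations is fully satisfiable.
equations = [(0, 1, 3), (1, 2, 5), (2, 0, 9)]
assignment = {0: 0, 1: 14, 2: 9}   # e.g. x0 - x1 = -14 = 3 (mod 17)
print(satisfied_fraction(equations, assignment, p))   # 1.0
```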
>>: What's the number specifically for Unique Games?

>> Prasad Raghavendra: For Unique Games?

>>: Yes. Epsilon and 1, what is the number?

>> Prasad Raghavendra: For Unique Games it's 1 over p to the epsilon; the number I showed is 1 over p to the epsilon. That's the best known approximation.

>>: But you're putting a constant bar there on the next chart. That's why.

>> Prasad Raghavendra: That's why I said it's not drawn to scale; it doesn't show the 1 over p to the epsilon. Yeah. So the surprising fact is that if you look at Max Cut or Max 2-SAT, the Unique Games hardness exactly matches the approximation given by the SDP. For Max Cut, it tells you that you cannot approximate better than .878, which is exactly the ratio given by the SDP-based algorithm. So this is a connection which is somewhat mysterious, and it was believed that there is indeed some connection between Unique Games and SDPs. Last year we showed the following result: if you assume the Unique Games Conjecture, then in fact for every constraint satisfaction problem the simplest SDP gives the best efficiently computable approximation; it is NP-hard to compute an approximation better than what the simple SDP gives. So all the CSPs hit exactly one common barrier: the simplest SDP gives the best approximation, assuming the Unique Games Conjecture. And, surprisingly, the same hardness reduction also yields an algorithm which is optimal for every CSP, in the sense that it achieves exactly the approximation ratio which Unique Games tells you is optimal. So if one believes the Unique Games Conjecture, these two results give both an algorithm which is optimal and a hardness which is matching.

>>: So you just have to prove the conjecture?

>>: [Inaudible].

>> Prasad Raghavendra: Luckily we're not done, and we still have work to do.

>>: Does this use specific properties of semidefinite programs, or would it extend if you tried more general programs, like [inaudible]?

>> Prasad Raghavendra: It uses properties of semidefinite programs.

>>: So it's possible for somebody to write a [inaudible] program that goes beyond this?

>> Prasad Raghavendra: If Unique Games is false, yeah. And this actually goes beyond CSPs; there are other problems for which similar things happen. For example, in the 3-way cut problem you're given a graph and three terminals A, B, C, and the goal is to cut the graph into three pieces, one per terminal, so as to minimize the number of crossing edges. So that is the 3-way cut problem; it's basically a generalization of the classic s-t cut problem, where you have two terminals and want the minimum cut separating them. There is a very nice approximation algorithm which embeds the graph into a simplex and cuts the simplex; it achieves a 12/11 approximation for this problem. This is a minimization problem, unlike the CSPs, and there are several generalizations of it, like k-way cut and the class of metric labeling problems. And it turns out that if you assume the Unique Games Conjecture, then for each of these problems there is a single linear program that gives the best approximation; this linear program has actually been used in these analyses.

>>: Which?

>> Prasad Raghavendra: This is [inaudible], so that is graph labeling. And it goes further. Suppose you have a bunch of football teams which play a bunch of games, and you want to rank the teams so that the ranking agrees with the maximum number of games, that is, in the maximum number of games the higher-ranked team wins. Formally, you're given a directed graph and you want to order the vertices so that the maximum number of edges go in the forward direction. This is called the maximum acyclic subgraph problem, and it's a very old problem in approximation. The best approximation algorithm we know today is to output a random order, which achieves a half approximation. A half approximation just because --

>>: [Inaudible].

>> Prasad Raghavendra: Yeah, that's even simpler. Pick an arbitrary order and output it or its reverse: one of the two will satisfy more than half the edges. So this is the best approximation we have today.
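(A minimal sketch of that folklore half-approximation for maximum acyclic subgraph: every edge goes forward in exactly one of an ordering and its reverse, so the better of the two keeps at least half of the edges.)

```python
def best_of_order_and_reverse(n, edges):
    """Half-approximation for maximum acyclic subgraph.

    edges: directed pairs (u, v) with u, v in range(n). Each edge goes forward
    in exactly one of a fixed ordering and its reverse, so the better of the
    two orderings has at least half of the edges going forward.
    """
    order = list(range(n))                       # any fixed ordering gives the guarantee
    pos = {v: i for i, v in enumerate(order)}
    forward = sum(pos[u] < pos[v] for (u, v) in edges)
    return order if 2 * forward >= len(edges) else order[::-1]

print(best_of_order_and_reverse(4, [(0, 1), (2, 1), (3, 0), (1, 3)]))
```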
And if you assume the Unique Games Conjecture, indeed this is the best you can do. In fact, this is more general, in the sense that you can look at other kinds of ordering constraint satisfaction problems, where the output has to be an ordering and there are constraints on that ordering. One can show that the simplest [inaudible] program gives the best approximation.

>>: [Inaudible] for the best algorithm on every instance to achieve one half? It's possible to have an algorithm that does better, better than random, on particular instances.

>> Prasad Raghavendra: Yes, this is worst-case analysis. Yeah. And this actually goes on and on. There's this problem called the [inaudible] problem, which is kind of a special case of quadratic optimization; even there the simplest SDP gives the best approximation. And recently Khot and Naor pointed out that for a class of clustering problems known as kernel clustering problems, the simplest SDP again gives the best approximation.

>>: [Inaudible].

>> Prasad Raghavendra: There are three now? Yes. So really, for a surprising variety of problems (CSPs, labeling problems, some clustering problems, ordering CSPs) we have reached a common barrier, Unique Games, in the sense that all our algorithmic techniques have hit this common barrier on so many different problems. And this barrier in fact identifies exactly how well we can do today, in the sense that it matches the approximation ratios of so many different algorithms. If you were to prove the conjecture, that would settle many of these problems, in the sense that it would give a tight bound for all of them. And if Unique Games is false, then hopefully a new algorithmic technique will come out of the disproof.

>>: You talk about [inaudible], you don't care? The rounding -- you look at [inaudible].

>> Prasad Raghavendra: Yeah, yeah. Right. For CSPs we do get a rounding scheme out of this; but in general I'm talking not about the rounding but just about the integrality gap. And even if Unique Games is false, all this effort at proving results assuming Unique Games is still useful, because, firstly, the algorithm we obtain is a generic algorithm which, for every CSP, is at least as good as all known algorithms: one common algorithm at least as good as all the algorithms we know, irrespective of the truth of the Unique Games Conjecture.
>>: When you say it's at least as good, is that the approximation ratio or also the complexity: not just polynomial, but matching every one of those algorithms?

>> Prasad Raghavendra: All the CSP algorithms I showed first solve a semidefinite program and then do a rounding. The rounding is usually linear time, plus the cost of solving the semidefinite program. Here again, the generic algorithm solves an SDP of the same kind in the same time, and the rounding again takes linear time, except that the constant in it is horrible. That's the difference. Along the way we also get lower bounds against SDPs, that is, SDP integrality gaps for several problems. An interesting consequence of these connections is that one can actually get an algorithm to compute SDP integrality gaps. So I'll give an example of an application. The Grothendieck inequality, proved by Grothendieck in 1953, holds for all matrices A. It says that if you optimize the sum over i, j of A_ij x_i y_j with the x_i, y_j ranging over {-1, 1}, and you also optimize the sum of A_ij <x_i, y_j> with the x_i, y_j ranging over unit vectors, then the two are within a constant factor: the vector optimum is at most a constant K_G times the plus-minus-one optimum. Clearly, even without the constant, the vector side is at least as large as the other side, because when you're optimizing over vectors you can in particular choose plus or minus one.

>>: And it's a sum over -- no graph there?

>> Prasad Raghavendra: There's no graph here. So this is the Grothendieck inequality, and its exact value is not known, in the sense that these are the best upper and lower bounds we have for the Grothendieck constant as of today. And this work actually gives an algorithm to compute the Grothendieck constant to any desired accuracy, in time doubly exponential in the accuracy.

>>: [Inaudible] what's the estimate, the --

>>: [Inaudible].

>> Prasad Raghavendra: It's doubly exponential; take two to the two to the ten, that's the kind of quantity.

>>: Maybe you can start with the SDP or something.

>>: Two to the thousand.

>>: First you need [inaudible] to succeed, then you need all the galaxies. [laughter].

>> Prasad Raghavendra: Okay. So now I'll move on to how one proves this for the case of Max Cut, which is a very simple case that illustrates most of the main ideas. In Max Cut you are given a weighted graph and you want to find the cut that maximizes the total weight of crossing edges. We'll always assume that the total weight of the edges in the graph is 1, so the weights form a probability distribution over the edges, and for every graph we can talk about the fraction of crossing edges rather than the number. One can write a quadratic program for this: for every vertex u there is a variable x_u which takes the value 1 or -1, depending on which side of the cut it is, and the term (x_u - x_v)^2 / 4 is 1 exactly when the endpoints are on different sides of the cut. You can't solve this quadratic program, so you relax it: instead of requiring values in {-1, 1}, you allow the x_u to be unit vectors. So now you have a semidefinite program: maximize, over unit vectors, the expectation over a random edge (u, v) of |x_u - x_v|^2 / 4. In words, we want to embed the graph on n vertices into the n-dimensional unit ball while maximizing the average squared length of the edges, the average taken under the distribution given by the weights; up to the factor of 4, that is the SDP objective. So this is just a restatement of the Max Cut SDP. Okay. So what is our goal? We solve this semidefinite program, and we get a bunch of vectors on the n-dimensional unit ball, and we want to find the maximum cut of this graph.
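(Before going on, here is the classical way to turn such SDP vectors into a cut: the Goemans-Williamson hyperplane rounding mentioned at the start. A minimal sketch, assuming the unit vectors have already been computed by some SDP solver; numpy is used for illustration.)

```python
import numpy as np

def hyperplane_round(vectors, edges, weights, trials=100, seed=0):
    """Goemans-Williamson rounding: cut the SDP unit vectors with a random hyperplane.

    vectors: (n, d) array of unit vectors from the Max Cut SDP (assumed given).
    edges: list of pairs (u, v); weights: matching list of weights summing to 1.
    A random hyperplane cuts edge (u, v) with probability arccos(<x_u, x_v>)/pi,
    which is at least .878 times the edge's SDP contribution |x_u - x_v|^2 / 4.
    """
    rng = np.random.default_rng(seed)
    best_val, best_side = -1.0, None
    for _ in range(trials):
        g = rng.standard_normal(vectors.shape[1])   # normal of a random hyperplane
        side = vectors @ g > 0                      # which side each vertex falls on
        val = sum(w for (u, v), w in zip(edges, weights) if side[u] != side[v])
        if val > best_val:
            best_val, best_side = val, side
    return best_side, best_val
```

The .878 guarantee already holds in expectation for a single random hyperplane; taking the best of several trials only helps.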
And what we want to do is get both the algorithm and the hardness at the same time: we want to get the algorithm and we want to show it's optimal, both at once. For the hardness result we need a gadget, and these gadgets are called dictatorship tests. Here I'll just introduce what I mean by a dictatorship test for the special case of Max Cut. For Max Cut, a dictatorship test is just a graph on the hypercube. The hypercube is {-1, 1}^d; fix some large constant dimension d, say 100 dimensions. A dictatorship test is a graph whose vertices are the points of this hypercube, and it should exhibit a dichotomy of cuts. The cuts parallel to the axes, the dictator cuts, should have high value: they should cut a large fraction of the edges. Whereas cuts which are far from every dictator should cut a small fraction of the edges. The completeness is the fraction of edges cut by the dictator cuts, the cuts parallel to the axes, and the soundness is the maximum fraction of edges cut by anything far from every dictator. So we want to construct graphs on the hypercube such that the dictator cuts have very high value and cuts far from every dictator have very low value. That's the goal. And suppose we do that: suppose we construct a dictatorship test with completeness C and soundness S. Then by a result of Khot, Kindler, Mossel and O'Donnell you immediately get Unique Games hardness with the same parameters. That is, you get hardness saying that on instances with value C it is NP-hard, assuming the Unique Games Conjecture, to find a solution of value more than S.

So now we want to construct these graphs on the hypercube. What we'll actually do is the following. Suppose we are given some arbitrary graph G on n vertices, and we solve the SDP and get the vectors on the unit ball. We'll use them to construct a graph on the hypercube with the following properties. If you look at a dictator cut, its value is equal to the SDP value of the graph G. And if you look at a cut which is far from every dictator on this 100-dimensional cube, then there is an efficient procedure by which you can get back a cut of the original graph with the same value. So if, for instance, the graph had SDP value .9 while the actual maximum cut is at most .7, then the completeness will be .9, because the dictator cuts get the SDP value; whereas anything far from a dictator cannot have value more than .7, because whatever its value is, you can get back a cut of the original graph with the same value.

This already gives an algorithm for Max Cut, in the following sense. We construct this hypercube graph; we can look at all possible cuts which are not dictators, run the conversion on each to get a cut of the original graph, and then output the best one.

>>: [Inaudible] all possible cuts in 100 dimensions?

>> Prasad Raghavendra: Yes, of the hypercube. Actually, that is the reason for the doubly exponential running time. And you might not have to look at all possible cuts; but in general, for an arbitrary CSP, you have to. Maybe for Max Cut you can use half spaces, symmetric functions, and so on.
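(A minimal toy-scale sketch of that brute-force step: enumerate every cut of a small k-dimensional hypercube graph, skip the dictator cuts, and keep the best value. In the real analysis one excludes all cuts close to a dictator, not just the exact dictators; the edge-weight encoding here is an illustrative assumption, and k must stay tiny since there are 2^(2^k) cuts.)

```python
from itertools import product

def best_nondictator_cut(k, edge_weight):
    """Enumerate all cuts of the k-dim hypercube; return the best non-dictator value.

    edge_weight: dict mapping a pair of hypercube points (tuples in {-1,1}^k) to
    the weight of that edge in the test graph, with weights summing to 1.
    The 2^(2^k) cuts are the source of the doubly exponential running time.
    """
    points = list(product((-1, 1), repeat=k))
    # The i-th dictator cut labels each point by its i-th coordinate.
    dictators = {tuple(x[i] for x in points) for i in range(k)}
    best = 0.0
    for labels in product((-1, 1), repeat=len(points)):
        if labels in dictators or tuple(-l for l in labels) in dictators:
            continue                      # skip dictator cuts and their complements
        side = dict(zip(points, labels))
        val = sum(w for (x, y), w in edge_weight.items() if side[x] != side[y])
        best = max(best, val)
    return best

# Toy example on the 2-cube: a single edge between two antipodal points.
print(best_nondictator_cut(2, {((-1, -1), (1, 1)): 1.0}))   # 1.0
```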
So, okay, what we have done is the following: we will construct a dictatorship test whose completeness is the SDP value, and whose soundness is, by construction, the value output by the algorithm, the optimal algorithm which I described. So using that conversion, you get the fact that the algorithm is indeed optimal. Okay. So this is the hardness side, and now our goal is just to do this conversion: from a graph we want to get a graph on the hypercube with these properties.

So let's see how we construct this. We have the SDP solution, and we want to construct a graph on the hypercube. Intuitively, what we're going to do is the following: for every edge e of the graph, you connect every pair of vertices on the hypercube at the corresponding distance. So if the edge has squared length D, then you connect every pair of hypercube vertices at that distance. But this is only rough, because we have a 100-dimensional hypercube and there might not be any pair at exactly that distance. So really what I'll describe is a random procedure to sample an edge of the hypercube; that defines a weighted graph on the hypercube.

What is the random procedure? You pick a random edge (u, v) of the original graph, according to the edge weights, and then you generate edges of the corresponding length on the hypercube. How do you do that? You pick a random point X, and if the squared length of the edge is D = |u - v|^2, you flip every bit of X independently with probability D/4 to get another point Y, and connect these two. The expected squared distance between X and Y, with the hypercube normalized to lie on the unit sphere, is then exactly D, because you flip each bit with probability D/4. So that's exactly what we want. And once you generate this pair X, Y on the hypercube, we additionally perturb each bit of X and of Y with some small probability epsilon; we can ignore this noise for most of the talk.

>>: The edge is random; why do you want to perturb X? It's as random perturbed as unperturbed. Y has to be perturbed, but why X?

>> Prasad Raghavendra: It would give the same graph, yes. But this is just easier for the analysis. And then we output the perturbed edge (X', Y'). So this gives a graph on the hypercube, because I've given you a method for sampling edges of the hypercube.

So let's see how dictator cuts do on this. How do we estimate the value of a cut? A cut is a function F on the hypercube taking values plus or minus 1, and (F(X) - F(Y))^2 / 4 is the indicator that the edge (X, Y) is cut. So the value of the cut is: the expectation over the choice of the edge e in the original graph, and over the choice of the two hypercube points X, Y at the appropriate distance, of (F(X) - F(Y))^2 / 4. Now let F(X) = X_1, the first dictator function. The point is that our choice of the two points X and Y is such that X_1 is not equal to Y_1 with probability exactly |u - v|^2 / 4: if e is the edge (u, v), we generate a hypercube edge of exactly that length, so each coordinate differs with probability |u - v|^2 / 4. Substituting this, you can see that the value is indeed the SDP value of the original graph, because the expectation over the choice of the edge e of |u - v|^2 / 4 is just the SDP value. So dictator cuts get the SDP value.
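(A minimal sketch of this sampling procedure, together with a Monte Carlo check that the first dictator cut recovers the SDP value; numpy, the encoding, and ignoring the epsilon-noise correction in the check are illustrative assumptions.)

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_test_edge(sdp_vectors, edges, weights, k=100, eps=0.01):
    """Sample one edge of the dictatorship-test graph on {-1, 1}^k.

    Pick an edge (u, v) of the original graph with probability equal to its
    weight, pick a uniform X, flip each bit with probability |x_u - x_v|^2 / 4,
    then perturb each bit of both endpoints with small probability eps.
    """
    u, v = edges[rng.choice(len(edges), p=weights)]
    d = np.sum((sdp_vectors[u] - sdp_vectors[v]) ** 2)   # squared length, in [0, 4]
    x = rng.choice((-1, 1), size=k)
    y = x * np.where(rng.random(k) < d / 4, -1, 1)       # flip each bit w.p. d/4
    x = x * np.where(rng.random(k) < eps, -1, 1)         # eps-noise on X
    y = y * np.where(rng.random(k) < eps, -1, 1)         # eps-noise on Y
    return x, y

def dictator_cut_value(sdp_vectors, edges, weights, samples=20000):
    # Estimate the value of the first dictator cut F(X) = X[0]. Up to the
    # eps-noise it should match the SDP value, E over edges of |x_u - x_v|^2 / 4.
    cut = sum(x[0] != y[0]
              for x, y in (sample_test_edge(sdp_vectors, edges, weights)
                           for _ in range(samples)))
    return cut / samples
```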
So now we want to analyze cuts which are not dictators, and to do this we define a graph on Gaussian space. The graph is as follows. First I'll define a random projection of G. To obtain a random projection, you generate 100 random directions in Gaussian space, that is, 100 random Gaussian vectors, and you project your SDP solution along these 100 directions. So each vector u is mapped to (<u, g_1>, <u, g_2>, ..., <u, g_100>). If you start with the graph in n-dimensional space, this gives you a copy of the graph in 100-dimensional space, because you just projected it down. (Again there is a small perturbation, which I'll not dwell on.) So this is a copy of the original graph in 100-dimensional space. And the real Gaussian graph is just the union of all possible projections of G: for different choices of the random vectors you get different projections, and you take the union of all of them. So it's a graph on R^100, on 100-dimensional space. The nice thing about this graph is that if somebody gives you a cut of this Gaussian graph, you can get back a cut of the original graph. How? The Gaussian graph is just the union of several copies of the original graph G, in different places; so you pick a random copy, that is, a random projection, and read off the cut. That gives a cut of the original graph.

So for this Gaussian graph, given any cut we can get back a cut of the original graph with the same value. And what we will do is this: we have a cut on the hypercube which is far from every dictator, and we will use the invariance principle to say that there is a cut of the same value on the Gaussian graph. That will finish the proof. The invariance principle is a generalization of the central limit theorem. The central limit theorem roughly says that the sum of a large number of plus-minus-one random variables has a similar distribution to the sum of a large number of Gaussian random variables. The invariance principle for low-degree polynomials says you can replace the sum by a low-degree polynomial: if a low-degree polynomial is far from every dictator, then its distributions under plus-minus-one inputs and under Gaussian inputs are similar.

So what do we have now? On the hypercube we have a cut which is far from every dictator. It's a function on the hypercube, so express it as a multilinear polynomial, its Fourier expansion, and now start plugging in Gaussian values. Ideally you might expect this to behave like arbitrary real inputs; but by the invariance principle, its behavior when you substitute Gaussians is the same as its behavior on plus-minus-one values, because a polynomial far from every dictator behaves the same on the hypercube as on Gaussian space. So the same polynomial also gives a cut of the Gaussian graph, and using the invariance principle once more, one can show that given a cut far from every dictator on the hypercube, the corresponding polynomial is a cut of the Gaussian graph with the same value. That is, we use the invariance principle to show that the Max Cut value is the same on the Gaussian graph as on the hypercube graph.
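(A minimal sketch of the projection step just described: project the SDP vectors along 100 random Gaussian directions to get one copy of the graph in R^100, and read a cut of the original graph off a cut of the Gaussian graph. Representing a cut of Gaussian space as an arbitrary function from R^100 to {-1, 1} is an illustrative assumption.)

```python
import numpy as np

rng = np.random.default_rng(2)

def random_projection(sdp_vectors, k=100):
    """One random copy of the graph in R^k: project every SDP vector
    along k independent Gaussian directions."""
    g = rng.standard_normal((sdp_vectors.shape[1], k))   # k Gaussian directions
    return sdp_vectors @ g       # row i is (<x_i, g_1>, ..., <x_i, g_k>)

def read_off_cut(sdp_vectors, gaussian_cut, k=100):
    """Given a cut of the Gaussian graph (a function R^k -> {-1, 1}),
    pick a random projected copy of G and read the cut off it."""
    projected = random_projection(sdp_vectors, k)
    return np.array([gaussian_cut(p) for p in projected])

# Example with 4 orthonormal SDP vectors and a halfspace cut of Gaussian space.
print(read_off_cut(np.eye(4), lambda p: 1 if p[0] > 0 else -1))
```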
And that kind of finishes it, because all we've shown is that, given a cut far from every dictator on the hypercube, you can get a cut of the same value on the Gaussian graph; and we already saw that, given a cut of the Gaussian graph, you can get back a cut of the original graph, because the Gaussian graph is just the union of several copies of G. Okay. So that finishes the proof. The same framework works for all CSPs, in fact, and it's a kind of powerful tool to convert the weakness of an algorithm into Unique Games hardness results.

Okay, I'll get to future work now. Of course, the most important question is to actually resolve the Unique Games Conjecture, but there are probably more tractable goals one can work on. One is this: we said that if Unique Games is false, it will hopefully lead to new algorithms. It would be very good to make this precise, in the sense of showing that if Unique Games is false, then this actually yields some new algorithm, a better approximation for a problem we care about, say Max Cut. Another very pressing question is to directly show that stronger semidefinite programming relaxations do not disprove the Unique Games Conjecture; that is, to go beyond this barrier, do we have to look beyond SDPs, or would even stronger SDPs do? Even this is unresolved today. It could be that adding just a few more constraints to the SDP would give a better algorithm for these problems, and we haven't ruled that out yet. Another interesting open direction is that there seems to be some connection between Unique Games and expansion. All known hard instances of Unique Games come from graphs in which every small set expands; in fact the hardness of Unique Games seems to come precisely from the expansion of small sets, as in Gaussian space. But at the same time, recent results show that if your graph is an expander, then Unique Games is easy. These seem contradictory, and it needs to be resolved. And, of course, one needs to look beyond CSPs. All the SDP techniques date from '94, and there have been very few SDP-based algorithms for problems which do not look like CSPs, like metric TSP or Steiner tree; there haven't been applications of SDPs or Unique Games to these. It will be interesting to see which techniques are useful there.

>>: Has it been looked at from -- [inaudible].

>> Prasad Raghavendra: Vertex cover, yes, that's one of the things. But that's still almost a CSP, like Max Cut.

>>: [Inaudible].

>> Prasad Raghavendra: Huh?

>>: [Inaudible].

>> Prasad Raghavendra: Yes, yes. But it's not -- I mean, there the LP still gives the best approximation; it's nearly a CSP. [Inaudible], one can show hardness in this framework in the paper. But in general these are out of reach. There's also a roughly 15-year-old conjecture that every CSP, in its exact version, is either polynomial-time solvable or NP-hard, the way 3-SAT is NP-hard and 2-SAT is polynomial time, with nothing in between. This has been resolved for Boolean CSPs and for domain size 3, but for higher domain sizes it has not. And recently [inaudible] pointed out that some of these techniques from approximation can be useful in resolving it. Okay. So I'll conclude the talk here. I'll be happy to talk about any of the other things I'm interested in. Thank you.

[applause]