>> Mohit Singh: Okay. Hi everyone. It's a great pleasure to introduce Thomas, who just came to [inaudible]. He's at UW, and he'll talk about some very exciting work that he's done on matching polytopes.

>> Thomas Rothvoss: Thanks a lot, and thanks for inviting me. We'll talk about one of my more recent papers, which is on the extension complexity of the matching polytope.

Okay. So what are we actually talking about? Imagine you have a polytope P. Usually the way you represent a polytope is by giving the inequalities, but you could imagine a different way of writing the polytope: you allow some extra variables, and you take all the points x for which there is a choice of those extra variables so that a different constraint system is satisfied. What this means geometrically is that you have a higher-dimensional polytope Q, which is the solution set of that second system, so that if you project it down onto the x variables you get your original polytope P. Now you might wonder why you should do that, and the point is that there are many examples where the original inequality description needs, let's say, exponentially many inequalities, but by adding those extra variables you can reduce the number of constraints to, let's say, a polynomial. And if you count, here we have a polytope P with eight inequalities, and that higher-dimensional polytope actually needs only six.

Okay. Now formally you can define the extension complexity of such a polytope P as the smallest number of inequalities that you need to define a higher-dimensional polytope that projects down to P. So this is a good measure for the complexity of a polytope, and there are tons of examples where the extension complexity of a polytope is bounded by a polynomial even though the number of inequalities of the original polytope is exponentially large. For example, this works for the spanning tree polytope or the permutahedron. In this work we actually care about the other direction: for which polytopes can you show that they are really hard, in the sense that there is no way to significantly reduce the number of inequalities? Some work has been done on this, and the whole story starts with a paper of Yannakakis from the beginning of the 90s, where he showed that there is no symmetric linear programming formulation of polynomial size for the matching polytope and for the TSP polytope. Now, the assumption that he talks only about symmetric LPs really restricted the applications, but essentially what he wanted to do was show that a bunch of papers claiming that NP equals P were wrong, because those approaches were all based on a symmetric LP. So this worked to reject those papers, but then it turned out that this restriction of looking only at symmetric LPs is a really strong restriction. Well, actually, if you look at random 0/1 polytopes, using some [inaudible] arguments it is not so difficult to show that the extension complexity must be large even for nonsymmetric LPs, but that didn't really lead anywhere until the big breakthrough of Fiorini, Massar, Pokutta, Tiwary, and de Wolf, who showed that for the traveling salesman problem any linear program must have exponential extension complexity, for general LPs.
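To make the definition concrete: P is the set of points x for which some choice of the extra variables y makes a second constraint system feasible. Here is a minimal sketch, not from the talk, assuming scipy is available; the example (the cross-polytope, which needs 2^n inequalities directly but only 2n + 1 with extra variables) and the function name are mine.

```python
# A minimal sketch (not from the talk): membership in P, where P is the set
# of x for which some y makes a second constraint system feasible.  The
# standard illustration is the cross-polytope ||x||_1 <= 1, which needs 2^n
# inequalities directly but only 2n + 1 once we add variables y:
#     x_i - y_i <= 0,   -x_i - y_i <= 0,   sum_i y_i <= 1.
import numpy as np
from scipy.optimize import linprog

def in_cross_polytope(x):
    """Check membership in ||x||_1 <= 1 via the extended formulation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # feasibility in the y variables only (x is fixed):
    #   -y <= -x,   -y <= x,   sum(y) <= 1
    A_ub = np.vstack([-np.eye(n), -np.eye(n), np.ones((1, n))])
    b_ub = np.concatenate([-x, x, [1.0]])
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n)
    return res.status == 0        # 0 means a feasible y was found

print(in_cross_polytope([0.3, -0.4, 0.2]))   # True,  since ||x||_1 = 0.9
print(in_cross_polytope([0.8, -0.5, 0.2]))   # False, since ||x||_1 = 1.5
```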
Now these techniques you can extend, and you can also show that there are certain polytopes that you cannot even approximate well, so this approach turned out to be quite flexible. There was then a whole line of work, done very recently by James and others, where they showed that you cannot have a linear program for max cut of polynomial size which gives you an approximation factor better than two. That is somewhat surprising, because you probably remember that the SDP gives you a factor which is much better, roughly 1.14. So this already shows that LPs are much weaker than what you can do with SDPs. But you might argue that those polytopes, the ones for which you see unconditional lower bounds, the underlying polytopes are all NP-hard. Well, it's a little bit debatable; for max cut you have a spectrahedron which gives you better than two minus epsilon [inaudible], but the underlying polytope is NP-hard, yeah. We have a couple of examples like that. So what about a polytope which is actually a nice polytope, in the sense that you can optimize any linear function over it in polynomial time, can you get any kind of lower bound for that? The prime example here is the so-called perfect matching polytope.

Let me just briefly remind you what the perfect matching polytope is. We're looking at a complete graph on n nodes. You probably remember what a perfect matching is, and the perfect matching polytope is the convex hull of all perfect matchings in that complete graph. If you try to write down the linear constraints that are supposed to define your polytope, then you start with the degree constraints: for every node you want to pick exactly one incident edge. But then you quickly realize that this does not yet define the perfect matching polytope, because it does not rule out feasible points of this form: you could have some odd cycles with one half everywhere. So you need additional inequalities that also kill these kinds of solutions. You can do that by looking at every set U of odd cardinality and requiring that, well, if you have a perfect matching then you must have at least one edge leaving that set U. So you write down some more constraints which ask for exactly that. This is very classical work of Jack Edmonds, and he already proved in the 60s that this correctly describes the convex hull of all perfect matchings. Edmonds also gave an algorithm that can optimize any linear function over this polytope in polynomial time, just by finding a maximum weight matching. You could use that already to solve the separation problem, but there is actually also a different, maybe slightly more elegant, way to solve the separation problem which is due to Padberg and Rao. So it is a nice polytope, but if you count the number of inequalities, there are exponentially many; you really have exponentially many facets. Then you wonder: can you significantly reduce the number of inequalities by introducing some of these extra variables? And what we're going to see here is that this is actually not possible. The only thing that you could do is reduce the constant that you have in the exponent, but apart from that this description is actually almost optimal.
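A small sanity check of that point, a sketch of my own rather than anything shown in the talk: on K_6, putting 1/2 on the edges of two disjoint triangles satisfies every degree constraint but violates the odd set inequality for U = {0, 1, 2}.

```python
# A small sanity check (my own, not from the talk): on K_6 the point that
# puts 1/2 on the edges of two disjoint triangles satisfies every degree
# constraint sum_{e in delta(v)} x_e = 1, yet violates the odd-set
# inequality sum_{e in delta(U)} x_e >= 1 for U = {0, 1, 2}.
from itertools import combinations

n = 6
triangles = [(0, 1, 2), (3, 4, 5)]
x = {}
for T in triangles:
    for e in combinations(T, 2):
        x[tuple(sorted(e))] = 0.5

def val(u, v):
    return x.get(tuple(sorted((u, v))), 0.0)

# degree constraints hold
for v in range(n):
    assert abs(sum(val(v, u) for u in range(n) if u != v) - 1.0) < 1e-9

# odd-set constraint for U = {0, 1, 2}: no weight crosses the cut
U = {0, 1, 2}
cut_value = sum(val(u, w) for u in U for w in range(n) if w not in U)
print(cut_value)   # 0.0 < 1, so the degree constraints alone are not enough
```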
So there is no way to be sub-exponential in the number of inequalities. And everything that was previously known, to the best of my knowledge, was essentially only the trivial lower bound that you get from the dimension.

Before I go into detail and show you how to prove that, let's talk a little bit about the theory of extended formulations. In particular, we need to talk about what's called the slack matrix. Let me remind you what the slack matrix of a polytope P is. You can look at a polytope, you can look at the vertices, and you can also look at the inequalities; and the slack matrix is a huge matrix, an exponential-size matrix, with a row for every facet and a column for every vertex, and the entry that you have for a vertex and a facet is essentially the distance, the slack, that the vertex has with respect to the facet. Okay. We need one more definition, which is the so-called nonnegative rank of a matrix: that is the smallest number of columns of a matrix U and rows of a matrix V so that you can find nonnegative matrices U and V that factor your original matrix S. If you dropped the nonnegativity condition then you would recover the usual rank that you know from linear algebra, so this quantity is always at least the usual rank. Why is this nonnegative rank interesting? Because of the theorem of Yannakakis from his classical paper, where he showed that if you take any polytope and you take its slack matrix, then the extension complexity of the polytope equals the nonnegative rank of the slack matrix. This is a very nice and elegant relation. In particular, it is very helpful for proving lower bounds, because instead of arguing about some higher-dimensional polytope that you don't know, you only need to argue about the slack matrix, and you really know the slack matrix. So you have to argue only about one object, which makes things much easier.

If we have two minutes I can just briefly outline the proof, if you like. Okay. One direction is kind of easy. Let's imagine you do have a nonnegative factorization of the slack matrix; then what is the extended formulation? Well, you can just write your original polytope as all x such that there is a nonnegative y with: you take the original inequality system, and the left-hand side plus U times that nonnegative y should equal the right-hand side. And that kind of makes sense, because if you imagine that you have a vertex i and you want to prove to me that it satisfies this, that it lies in the polytope, then you can just take the corresponding column of V, that is your y, and if you multiply things out this gives you precisely the slack vector that you need to make the equation hold. Okay. The other direction uses duality, and it works as follows. Let's imagine this is your polytope, and suppose you do have a higher-dimensional polytope Q that projects onto P, let's say with this description. Now I claim that you can also find a factorization where the size of the factorization is the number of inequalities that you need to describe Q. How does that work? Well, what do we need to do?
We need to find, for every inequality of the original polytope, a nonnegative vector u, and for every vertex a nonnegative vector v, so that their scalar product gives the slack which this vertex has with respect to that inequality. The way you do that is: you have an inequality of the polytope here, and by duality you can take some inequalities of that higher-dimensional polytope and add up nonnegative multiples of them in order to obtain that inequality. You take those nonnegative multiples, and those coefficients give you the vector u. Then you look at the vertex, and you also want to find a nonnegative vector for it. The way you do that is you look up into your higher-dimensional polytope, you take a point there, and then you take the slack vector which this point has with respect to all the inequalities of that higher-dimensional polytope Q. This is again a nonnegative vector, and if you calculate their inner product, it turns out this is precisely the slack of this vertex with respect to that inequality. Okay, good.

>>: [inaudible]? The extended formulation, the first direction?

>> Thomas Rothvoss: Yes.

>>: How do you calculate this [inaudible] facets?

>> Thomas Rothvoss: Okay. So, no. The number of equations is as large as the number of inequalities, so that might be large. But the number of inequalities that you have is, oh, yeah, the number of inequalities is just the number of columns of U. So you have tons of equations, but many of them are redundant. You could essentially throw out everything which is redundant, just take [inaudible]. But you're right, this might still be large, but you can take a subset which is very small.

>>: The Y is not [inaudible]?

>> Thomas Rothvoss: Yes, yes.

Okay. There was this paper of Fiorini et al. that I mentioned, and the technique which they used to show that the extension complexity is large, and actually that the nonnegative rank of the slack matrix is large, was the so-called rectangle covering lower bound. What it says is just that the nonnegative rank of any slack matrix must be at least the rectangle covering number. Let me just give you a quick picture. Imagine that this is any nonnegative matrix, and let's say that this is a nonnegative factorization. Now let's forget the numbers; let's just ask, are the entries zero or are they positive? Then if you look at the i-th column of U and the i-th row of V, this induces a combinatorial rectangle, and that rectangle has only positive entries in the slack matrix S. And if you look at all of those nonnegative-rank-many rectangles, you see that they cover all the positive entries of the slack matrix S, and they don't cover any zero entries.

So the natural question is, if you want to show a lower bound for the perfect matching polytope, we could try to apply this rectangle covering lower bound. Let's try that. Well, it will actually turn out that this doesn't work, but let's see why it doesn't work. Say this is the slack matrix of the perfect matching polytope. The perfect matching polytope has a bunch of different constraints: the degree constraints, the nonnegativity constraints, and these odd set inequalities. But only of the odd set inequalities are there exponentially many; the others are only polynomially many.
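A tiny numerical companion to the last two points, using my own toy example (the unit square) rather than anything from the talk: given a nonnegative factorization S = UV of the slack matrix, the columns of V certify the extended formulation Ax + Uy = b, y >= 0, and the supports of the factors give the combinatorial rectangles that cover exactly the positive entries of S.

```python
# A toy check (my own example, not from the talk) on the unit square with
# facets A x <= b.  Given a nonnegative factorization S = U V of the slack
# matrix: (i) the extended formulation is Q = {(x, y): A x + U y = b, y >= 0}
# with certificate y = V[:, i] for vertex i; (ii) the supports of the columns
# of U and rows of V give rectangles covering exactly the positive entries.
import numpy as np

A = np.array([[-1, 0], [0, -1], [1, 0], [0, 1]], dtype=float)
b = np.array([0, 0, 1, 1], dtype=float)
verts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

# slack matrix: one row per facet, one column per vertex
S = b[:, None] - A @ verts.T
# a (trivial) nonnegative factorization S = U V with four factors
U, V = S.copy(), np.eye(4)

# (i) Yannakakis, easy direction: A x + U y = b holds at every vertex
for i, x in enumerate(verts):
    assert np.allclose(A @ x + U @ V[:, i], b)

# (ii) rectangle covering picture: supports cover exactly the positives
covered = np.zeros_like(S, dtype=bool)
for i in range(U.shape[1]):
    rect = np.outer(U[:, i] > 0, V[i, :] > 0)   # combinatorial rectangle
    assert (S[rect] > 0).all()                  # rectangles avoid the zeros
    covered |= rect
assert (covered == (S > 0)).all()
print("extended formulation and rectangle cover verified")
```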
So actually we only need to care about the part of the slack matrix which comes from the odd set inequalities. Let me remind you what the entries look like: for every odd cut U and for every perfect matching M we have an entry which is of the form: the number of edges which they have in common, minus one, because that is the slack. And I claim that we can cover the positive entries in that partial slack matrix with only n to the fourth many rectangles. How that works is the following: we take every pair of disjoint edges, and for every such pair we get one rectangle. So what I need to tell you is which set of matchings lies in the rectangle and which set of cuts lies in the rectangle. The set of cuts is simply the set of cuts which cut both edges, and the set of matchings is simply the set of matchings which contain both edges. In this way you see that every entry (U, M) which lies in this rectangle must have a positive slack, because they share at least those two edges e1 and e2. And the other way around, if you have any entry with a positive slack then you must have at least two edges in common, well, actually by parity reasons at least three, so you can just take two of those edges and you see that the entry is in at least one rectangle. If you look a little closer, if an entry (U, M) shares K edges, then this entry will be in precisely K choose 2 many rectangles.

Now we could be a little naive and say, well, this lower bound doesn't work, so we have an upper bound and a lower bound which is useless, but this might give us some ideas. In fact, let's try to take this construction and get a factorization out of it. So let's try the following: let's try to write the slack matrix as the sum of all those n to the fourth many rectangles. And what I mean here is that each rectangle gives you a 0/1 rank-one matrix where you have a one entry if that entry lies in the rectangle and a zero otherwise. Now I'm wondering, why is this not a correct factorization? Well, obviously it is not, otherwise the title of the talk would be different. But what's going wrong? Let's look at an entry (U, M) where they share K edges. You know that on the left-hand side for that entry you expect a value of K minus 1. On the right-hand side you know that this entry is in roughly a quadratic number of rectangles, so the value that you get is like this curve. So it doesn't work. Now I do have some freedom: I could put some nonnegative scalars here in front. But you see that there is no way I could choose scalars and make this curve equal to that curve. So that doesn't work.

>>: I have a question. Sorry. Can you say that argument again? Because I was always taught only [inaudible]. I wasn't listening. I was trying to figure out the K.

>> Thomas Rothvoss: That's an excuse.

>>: I was going 20 percent slower than you, so if you say that again I would appreciate it.

>> Thomas Rothvoss: Okay, fine. So we take this construction, and we consider it as a non-negative factorization of our slack matrix, and then we wonder why this is going wrong.

>>: And you're telling us the issue is the multiplicity of the cover. That's where you lost me.
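(As an aside, a quick way to convince yourself of the K choose 2 multiplicity is to enumerate it on K_6; a sketch of my own, not code from the talk.)

```python
# A quick check of the covering construction on K_6 (my own sketch, not code
# from the talk).  Rectangles are indexed by pairs of disjoint edges {e1, e2}:
# the cuts cutting both, times the matchings containing both.  An entry (U, M)
# sharing K edges has slack K - 1 but is covered C(K, 2) times.
from itertools import combinations
from math import comb

n = 6
nodes = list(range(n))

def perfect_matchings(vs):
    if not vs:
        yield frozenset()
        return
    a, rest = vs[0], vs[1:]
    for b in rest:
        remaining = [v for v in rest if v != b]
        for m in perfect_matchings(remaining):
            yield m | {(a, b)}

matchings = list(perfect_matchings(nodes))                  # 15 matchings
odd_sets = [set(U) for k in (1, 3, 5) for U in combinations(nodes, k)]

seen = {}
for U in odd_sets:
    for M in matchings:
        crossing = {e for e in M if (e[0] in U) != (e[1] in U)}
        K = len(crossing)
        multiplicity = sum(1 for pair in combinations(M, 2)
                           if set(pair) <= crossing)
        seen[K] = (K - 1, multiplicity)

for K, (slack, mult) in sorted(seen.items()):
    print(f"K = {K}: slack = {slack}, covered {mult} times")  # mult = C(K, 2)
    assert mult == comb(K, 2)
# On larger graphs the gap grows: slack K - 1 versus roughly K^2 / 2 rectangles.
```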
>> Thomas Rothvoss: So if you look at the entries (U, M), and you look at an entry where they share K edges, then this is the slack that you should get; so this curve gives you the value that you expect on the left-hand side, and this quadratic curve gives you the value on the right-hand side, because every entry is in a quadratic number of rectangles. Okay. And now you-

>>: The [inaudible] is not quite precise. [inaudible] K [inaudible].

>> Thomas Rothvoss: Yeah. Okay, okay.

>>: [inaudible].

>> Thomas Rothvoss: Well, I think this is one and this is two or three or so. Anyway, yeah. The point that I am trying to illustrate here is that if you have entries where you do have a large slack, think of this as being some large constant, then it seems that this kind of entry is just in too many rectangles. So we could very naively ask: maybe every covering of the slack matrix with only, let's say, polynomially many rectangles, every such covering covers the entries that have a large slack too many times, maybe. It's a bit naive. The strange thing is that it turns out the answer is yes, this is precisely the case. This is actually what we are going to prove. But what kind of proof approach should I use, if this rectangle covering lower bound does not work? I need some kind of proof technique that I can work with. And the technique that I'm going to use is the so-called-

>>: So is the, I missed this, is the minimum rectangle covering exactly equal to positive [inaudible]?

>> Thomas Rothvoss: No, no.

>>: It's just-

>> Thomas Rothvoss: They could be very different.

>>: Okay. So you're going to lower bound this and that's going to translate to a lower bound on-

>> Thomas Rothvoss: On the nonnegative rank, yes.

>>: Okay. But in general it could be larger?

>> Thomas Rothvoss: So the rectangle covering lower bound is just a lower bound. It could be very loose. For Fiorini and others, well, here in this case it turns out to be very loose. For the TSP polytope, if you find some tricky inequalities, the rectangle covering lower bound is actually surprisingly good. Here, in this case, it's terribly bad. This is why we need to find something different, something better.

>>: So you mentioned the lower bound of n squared. Could you get even a little better than this one [inaudible] use this?

>> Thomas Rothvoss: I'm not aware of any publication where anything better than that was mentioned. Maybe somebody came up with something polynomially better but didn't publish it.

>>: Do you know what the rectangle covering number is?

>> Thomas Rothvoss: Actually, no. I don't know whether it's really n squared or whether it is n to the four. It's not hard to show that it's at least n squared, and it's not hard to show that it's at most n to the four. It might actually be closer to n to the four; that would be my guess, but that's just speculation.

Okay. The stronger lower bound that we are going to use is the so-called hyperplane separation lower bound, and it was suggested by Samuel Fiorini. It works as follows: imagine you can pick a magic linear function, a linear function on the space of matrices, with the property that for every rectangle that you can pick, this linear function is very small.
And if you want a geometric picture, you are in some high-dimensional space, the linear function gives you some kind of inequality, some hyperplane, and all the rectangles lie on one side. And now let's suppose that you were so lucky to pick the linear function so that the slack matrix itself has a very high value under that linear function.

>>: Rectangles mean zero-one matrices or [inaudible] matrices?

>> Thomas Rothvoss: Rank-one zero-one matrices, yes. [inaudible] switching these notions.

Now, the claim is that the nonnegative rank of the slack matrix is essentially lower bounded by the ratio of both values of that linear function W. The biggest entry in the slack matrix in our case is at most n, so it's not a value that we need to care much about. Now what's the intuition? Imagine your slack matrix were a sum of rectangles, a sum of rank-one zero-one matrices. Then what this claim says is: look, you have a linear function which is small on every one of those rectangles but very large on the slack matrix, so obviously you had to use a lot of rectangles. Now the slack matrix is not necessarily a sum of rectangles, but you can imagine that the slack matrix is a sum of rank-one matrices where all the entries are between zero and one; we only lose a factor of n to go to that view. And it turns out that every rank-one matrix with entries between zero and one lies in the convex hull of the rectangles. I think this is also called the Bell polytope; I think in quantum physics this would be a [inaudible]. So we have this linear function, all those rank-one matrices with entries between zero and one have a very small value, the slack matrix has a very large value, and that just means that you had to use a lot of these rank-one matrices.

>>: [inaudible]?

>> Thomas Rothvoss: Yes, yes. Because you add up that matrix that you have with some other nonnegative matrices, you add it up, and you know that the thing that you get, the slack matrix, doesn't have too large entries. So obviously each of the summands was not large either.

So the tricky question is, how do you come up with this magic linear function? Now let's try to apply this hyperplane separation lower bound to the matching case. Let me introduce a little bit of notation. Let Q_L be the set of entries with an odd cut U and a perfect matching M so that they share L edges. Now this is going to be my magic linear function, where for an entry which has slack zero I put a minus infinity.

>>: Can you tell us how to recover the rectangle covering lower bound from this formulation?

>> Thomas Rothvoss: Yes. For every entry where you have a zero slack you would put a minus infinity, which essentially means that you only need to look at rectangles which don't contain any slack-zero entry, and then you put something positive on the others. In fact, their lower bound you can really see in this framework: essentially they have a measure and they distribute it on a particular set of entries where the slack is positive, I think where the slack is one, actually.

>>: Suppose I just put minus infinity and one or something like that.

>> Thomas Rothvoss: Yeah, yeah, yeah. That is-

>>: Then you will get the entire covering bound?

>> Thomas Rothvoss: More or less, up to some small factor, I think. This is essentially what you would do for the rectangle covering lower bound.
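To have the statement of the hyperplane separation bound in one place before going on, here it is in symbols; this is my hedged phrasing of what was said above, not a slide from the talk:

\[
\mathrm{rank}_+(S) \;\ge\; \frac{\langle W, S\rangle}{\;\|S\|_\infty \cdot \max\{\langle W, R\rangle : R \text{ a rank-one } 0/1 \text{ matrix}\}\;} .
\]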
You would take a measure: you look at all the entries with the smallest positive slack (which is two, for parity reasons), you take a measure of one and you distribute it uniformly over all those entries. Okay, this is essentially the rectangle covering lower bound. Well, we have one row left, and we know that the rectangle covering lower bound does not work, so we should put something there, and here is what we are going to put. We look at some large constant K, in fact 501 works, and we also distribute a measure uniformly on all those entries; it's just that we scale it down a little bit. And the important thing is that we put a negative number there. It's negative. Now imagine what you would do here: imagine you want to find a rectangle which maximizes this inner product, how would you do that? You would like to collect as many of those entries where you have a small slack, but you are completely forbidden to take any entry where you have slack zero, and you should also try to avoid containing too many entries where you have a large slack. Well, I'd say that you cannot get very far with that.

Now, first let's start with the easy things. If you look at the inner product of that linear function with the slack matrix, you get a nice constant value: the slack matrix has a zero entry wherever we put the minus infinity, so that gives us zero; then here we have a measure of one, but all those entries have slack two, so this contributes a two; and here we put a measure again, and the slack that we get is actually K minus one. That K minus one cancels out with this K minus one, so we essentially have a minus one here, and some constant is left. The crucial thing is that we can prove that for every rectangle the inner product with that rectangle is exponentially small. So the inner product with the slack matrix is a constant, the inner product with any rectangle is exponentially small, and then the hyperplane separation lower bound gives us an exponential lower bound. And this results in the lower bound [inaudible].

>>: Can I [inaudible] one more time? What is, a rectangle is a set of cuts and a set of perfect matchings-

>> Thomas Rothvoss: Yes.

>>: And you say that [inaudible].

>>: What are these numbers [inaudible]?

>> Thomas Rothvoss: So these are the entries (U, M) where they share three edges, and these are the entries where they share K edges.

>>: [inaudible] exponential? How big are these numbers?

>> Thomas Rothvoss: [inaudible] exponentially. Shall we go on? Okay. A little bit more notation-

>>: Can we go back? Can we get an intuition for why [inaudible]?

>> Thomas Rothvoss: Yeah. The intuition is that if you look at the construction that we had, these example rectangles, there you contained entries that had a large slack; they were contained kind of too many times. It means that if you look at the individual rectangles, then the measure for this is quadratically larger in K than the measure that you collected for those entries. And this actually means that if you take the rectangles that we saw a couple of slides ago, then the contribution that you get from these guys is much larger than the contribution that you get from those guys, so the value that you get for those rectangles would be very negative.

>>: So how do they do that? [inaudible]? [inaudible] slack of these pairs, but you think that is not possible.
>> Thomas Rothvoss: Because you want to have many pairs. I mean, if you have a tiny rectangle you can do that, but if you have a larger rectangle, at some point you will contain a lot of pairs (U, M) where the slack is precisely K. Tons of them. And then that contribution is going to kill you. I think I have a picture later which tries to show this.

>>: But just by [inaudible] once you have enough matchings and enough cuts you have to get like many cuts, just cutting [inaudible]. Like you have to get enough matchings across these cuts.

>>: I guess it's true also that [inaudible] matching is tied to each of them. Like no cut [inaudible].

>> Thomas Rothvoss: Yeah, yeah, yeah. You also need to use later that the rectangle you're looking at doesn't contain any entry with slack zero. That's important; it would be wrong otherwise.

So let me introduce the measure mu_L, which is essentially just the uniform measure that we had on those entries Q_L. So this is the fraction of entries (U, M) where they share L edges, and it's the fraction of those that I have in my rectangle R. Let's say R is the rectangle that I'm talking about for the rest of this talk. And the technical lemma which takes, let's say, 10 pages to prove is this one. This is the key technical lemma, and it is essentially what I already said, just written a little differently. If you have a rectangle and the mu_1 measure is zero, so your rectangle does not contain any of the entries (U, M) where they have only one edge in common, that is, no slack-zero entries, then you have the following property: if you look at the mu_3 measure, so you look at the entries where they share three edges, then the fraction that falls into your rectangle is quadratically smaller than the fraction that the rectangle gets of the entries that have slack K minus one, because they share K edges. Okay?

>>: For all of them?

>> Thomas Rothvoss: Yeah. For any constant K; actually the constant moves also here into the exponent. And this is modulo some little error term, some exponentially small error term.

>>: So does the same technique give things for like mu_4 versus mu_K?

>> Thomas Rothvoss: Probably. I haven't written it down. You could probably get some general relation.

>>: It's like mu_1 equals zero means that you have some pseudo-randomness property.

>> Thomas Rothvoss: Yes. It's really very important that you have this condition. Without this condition you could just take, let's say, everything; if the rectangle were everything then this is one and this is one, so obviously the inequality [inaudible]. It's very, very important that you have this condition, and that actually changes the whole relation.

Now do you see why this main technical lemma implies the lower bound? Because, [inaudible] this tiny error, this contribution will be quadratically larger than that one. Well, you scale down by a factor of K minus one, but it's still much larger than that contribution, and so the only thing that you could collect is that exponentially small term. Good.

So the technique that we're using to show this inequality, this measure inequality, actually originates in a paper of Sasha Razborov. In the beginning of the 90s he had a paper that showed this kind of measure inequality for disjointness. Well, he didn't talk about matchings and cuts, but essentially we can adapt his technique to also work in this case. [inaudible].
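Before the proof sketch, here is the lemma once more in symbols; a hedged restatement of mine, with the precise constant on the slide left as O(1):

\[
\mu_1(R) = 0 \;\Longrightarrow\; \mu_3(R) \;\le\; \frac{O(1)}{K^2}\,\mu_K(R) \;+\; 2^{-\Omega(n)} .
\]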
The rough idea of Razborov's proof technique is the following. This rectangle R that we have is an arbitrary rectangle; we don't know any structural properties of it. So what we're doing is we look at a more structured rectangle, let's call it T; later this is going to be called a partition. Then we show that for this structured rectangle the inequality holds. In fact, we're going to show that for, say, 99 percent of those randomly taken structured rectangles the inequality holds if we kind of intersect it here.

>>: So what [inaudible] once again? Rectangles, if you take [inaudible] just random?

>> Thomas Rothvoss: You take this kind of rectangle at random, but not uniformly at random; you take it at random in a very structured way.

>>: Okay.

>> Thomas Rothvoss: Good.

>>: You want R to be equal to zero for these rectangles as well, right?

>> Thomas Rothvoss: For this one, not necessarily. It's more that you look at the restriction of that rectangle, and then you want to argue that this inequality still kind of holds.

>>: But you mean the intersection of T and R when you say restriction?

>> Thomas Rothvoss: Yes. I should probably define this kind of rectangle. Essentially, if you look at the picture, we are going to define this rectangle T so that it is also a set of matchings and a set of cuts. I'm going to call this a partition. This partition will be defined as follows: imagine these are the nodes, all the nodes in my graph, and the partition is defined by a set A of nodes, a set B of nodes, a set C of nodes and a set D of nodes. The set A is partitioned into what I'm going to call blocks, and the set B is also partitioned into blocks. And the blocks have very well-chosen sizes; the numbers are really cooked up so that things work out. In particular, much later in the talk we need some symmetry, and the symmetry that we get is the following: if you forget three nodes and take the rest, this actually looks like one of the blocks. If you forget three nodes in C and D and you take this thing, this actually looks like one of those blocks. So we get a lot of symmetry that we need later in the talk.

>>: What would it look like, the number of nodes?

>> Thomas Rothvoss: The number of nodes, yes. Now let me associate a set of edges with this partition, which is, as you can see here: we have edges running inside the blocks, running inside C and D, and also running between C and D. Now the set of matchings that belongs to this partition is the set of matchings that you can build using only those edges that you see here. And the set of cuts is the set of cuts that you can get by taking some nodes of C and then some blocks in A; but the point is that I force you to either take a complete block or not intersect the block at all. Why should I do that? Well, if you look at the cuts and you look at the matchings, then the only way they can intersect is actually here, between C and D. And the sizes of C and D, those are just some constant, say K. So essentially I have intersection going on only in a constant number of nodes and edges, so I can really control how the matching and the cut intersect. Now this is my partition, and now I want to rewrite my measures using these partitions. Now essentially what I'm doing is, well, what is the measure?
Essentially you generate one of those entries (U, M) where U and M share three edges, and instead of generating it uniformly we generate it differently: we first pick the partition at random and then we take a random entry in that partition. You get the same measure if you do it right. Okay. So the first step is we take the partition at random, and this is uniform in the sense that you take a uniform A, then you take a uniform C out of the rest, a uniform D out of the rest, a uniform B out of the rest, and you take uniformly random partitions, so everything is nicely symmetric. Once you are in this partition you do the following: you take three edges between C and D and call them H, you take a random matching which contains those edges and respects the partition, and you take a cut which cuts precisely those three edges and respects the partition. What I know is that the cut and the matching are going to intersect precisely in those three edges that I selected. The good thing is that once I have picked a partition and once I have picked the three edges, the choice of the matching and the choice of the cut are independent, because I know that they cannot intersect anywhere else. In fact, this means that the probability that I care about, I can separate it, because they are independent. We can play the same game for the measure mu_K: again we pick a random partition, now we pick K edges running between C and D, so essentially we pick a perfect bipartite matching between C and D, and then we pick a random matching containing all those edges and a random cut cutting all those edges.

>>: This is a more or less trivial fact [inaudible]? By symmetry or-

>> Thomas Rothvoss: It's by symmetry. I mean, yeah, I think it's pretty clear that every pair (U, M) which shares precisely K edges must have the same probability of appearing here. Good.

Now, for the next, let's say, five minutes, let's forget everything that I told you about cuts and matchings, and let's talk a little bit, well, for James, okay, you don't need to do anything. Let's talk a little bit about the behavior of large sets. Imagine you have a set of vectors, capital X, and the entries of the vectors are between, let's say, one and q; imagine q is some constant that we don't care much about. Now let's look at a little random experiment. Say I take one vector at random from that big set of vectors X, and I look at one of the coordinates of the vector, and I observe the random experiment: how is the i-th coordinate behaving? So this is maybe one choice of little x and this is another choice, and we look at what the coordinate x_i looks like. What I claim is that if the set of vectors is large enough, then for most indices i that you could select, the random variable that you observe here is going to be roughly uniformly distributed. You can formulate it the other way around, and this is how you can prove it more nicely: say you have a linear number of coordinates where the coordinate is biased, in the sense that the distribution that you see there is a little bit away from the uniform distribution. It doesn't really matter whether you look at the statistical distance or the maximum difference in probability, just a little bit away from the uniform distribution. Then you can prove that the density of that set of vectors, that is the size of X divided by q to the n, is actually exponentially small.
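In symbols, a hedged paraphrase of mine of this claim, with the constants left unspecified: for X a subset of {1, ..., q}^n and x drawn uniformly from X,

\[
\#\{\, i : x_i \text{ is } \varepsilon\text{-far from uniform} \,\} \;\ge\; \delta n
\quad\Longrightarrow\quad
\frac{|X|}{q^{\,n}} \;\le\; 2^{-c(\varepsilon,\delta,q)\, n} .
\]

And a tiny numerical illustration of the same point, my own example using only the Python standard library: a set in {0,1}^14 whose first seven coordinates are pinned loses one bit of entropy per biased coordinate, so its density is exponentially small.

```python
# A tiny numerical illustration (my own) of the entropy argument: for x
# uniform in X we have  log|X| = H(x) <= sum_i H(x_i)  by subadditivity, and
# every biased coordinate contributes strictly less than log q, so many
# biased coordinates force the density |X| / q^n to be exponentially small.
import math
from itertools import product

q, n, forced = 2, 14, 7
# a deliberately biased set: the first `forced` coordinates are pinned to 0
X = [tuple([0] * forced + list(rest))
     for rest in product(range(q), repeat=n - forced)]

def coordinate_entropy(X, i):
    counts = {}
    for x in X:
        counts[x[i]] = counts.get(x[i], 0) + 1
    return -sum(c / len(X) * math.log2(c / len(X)) for c in counts.values())

lhs = math.log2(len(X))                                # H(x), x uniform in X
rhs = sum(coordinate_entropy(X, i) for i in range(n))  # subadditivity bound
print(lhs, "<=", rhs, "<", n * math.log2(q))           # 7.0 <= 7.0 < 14.0
# density |X| / q^n = 2^(lhs - n*log2(q)) = 2^-7, exponentially small in the
# number of biased coordinates
```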
This is essentially a standard argument that appears in Razborov's proof. Let me just give you the quick proof of this fact. Essentially you get the result by counting entropy: you count the entropy of that little random vector x. What is its entropy? Well, it's the logarithm of the size of the set you take the element from, and you can bound it using subadditivity; now let's split the indices and look at the biased ones and the unbiased ones separately. You know that the entropy of a random variable taking values between one and q is at most log q, and you know that the unique distribution where this is attained is the uniform distribution. If you're a little bit away from the uniform distribution, then the entropy is also a little bit smaller. I guess I can probably skip the picture for q: this is the entropy if you just have a coin, and if you're a little bit away from the uniform distribution then you're a little bit away from log q. And you can quantify this; that's not a problem. Anyway, if you do have a linear number of biased coordinates, then you know that there is a linear amount of entropy that you're missing, and you rearrange this and you get that inequality.

There's actually a different way of seeing this, essentially the same claim: if you have a sufficiently large set of vectors, let's say the density is at least that large, and by density I mean you take a vector x from the set of all vectors and you check whether it is in my set X, then for most coordinates you're not changing that density if you condition on the coordinate having some particular value j.

>>: For most i, for most j?

>> Thomas Rothvoss: For most i and all j, yeah. This is essentially a reformulation of what we already had before. And this is very useful for us, because that's kind of our business model. We have that rectangle, and we want to look at the density, we want to look at what measure falls into that rectangle, and this says that we can essentially condition locally on everything that we like. So we can say: we want to look at matchings, but we want these edges to be in the matching and those not.

>>: If you have it for all j, then that will make [inaudible] some constant q, right?

>> Thomas Rothvoss: Yeah, this depends on q and on epsilon.

Now, let's again remember everything that I said about matchings and cuts. James, you don't need to do anything. And let's try to formalize this. Let's look at one partition T and one choice of those three edges H. And let's call that pair good with respect to the matching part if the following is true: imagine you take a random perfect matching from your rectangle which contains those edges and respects the partition; then I want the distribution that you see on these nodes to be uniform. In other words, I want the matching to induce a uniformly random matching here. Or, the other way around, this says that if I have a good partition with a good triple H, then I can condition on everything here and this is not going to change the outcome of that probability. So how much time do I have, because we had-

>>: Seven to ten minutes.

>> Thomas Rothvoss: Ten minutes? Okay. We can do something similar for cuts.
So again, I want to call a pair of a partition T and three edges H good with respect to the cuts if, essentially, I have the same number of cuts containing those three edges as the number of cuts containing, like, everything. Okay. Now why is this useful? First of all, this is the measure that I want to bound. Let's express it the way that we already saw, and now let's split it according to whether partitions are good or bad. You get this picture: so we split the measure; this is the good part, and then you can be bad either because the matching part went wrong or because the cut part went bad. And what we are going to do is bound each of those quantities separately. In particular, this is probably the most interesting case, the generic case, where you have a good partition with a good triple of edges H. In this case we see, surprisingly, that this measure, the mu_3 measure, is quadratically smaller than the mu_K measure. And this is probably the surprising part; the rest is a little technical, probably not super surprising. So I'll try to describe in the next maybe five minutes why we get this bound. I think this is the key reason why the matching polytope doesn't have a small LP; the rest are technicalities that you need to work on.

So I want to show that this is a quadratic factor smaller than this, and I want to compare this essentially for every partition. Let's say we fix a partition T and we also fix K edges F, and now we compare the contribution to the one measure with the contribution to the other measure, and then we'll see this difference. What's the contribution to the mu_K measure? That's the probability that this random cut is in my rectangle times the probability that this random matching is in my rectangle. That's easy. Now I want to compare this to the contribution to the mu_3 measure, and I do that as follows: I have these K edges and I take three of them at random; these are my edges H, and then you take a random cut cutting those edges and a random matching containing those edges. The point is that if the partition and that triple H are good, then it actually doesn't matter what I'm conditioning on; it doesn't matter whether I condition on cutting all the edges or on cutting only those edges. So this probability doesn't actually change if I condition on something else. So essentially what I need to bound is: what is the fraction of triples H that are good? And I claim that for every partition T and every set of K edges F this fraction is actually quadratically small. So the point is that you cannot have this kind of nice pseudorandom behavior for, like, all triples, only for a very small fraction. The reason is the following: I claim that H and H star, two such triples, can only both be good if they share at least two edges. I mean, if you remember the beginning of the talk, we had this construction of rectangles, and the way we did it was we took two edges, we took all the cuts cutting those two edges, and we took all the matchings containing those two edges; and this essentially says that this is the only way you can get a decent rectangle. Now, let's say we have a triple H and a triple H star and they are both good. Good means that you have some nice pseudorandom behavior in the rest.
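(A hedged counting note of my own, filling in the arithmetic behind "quadratically small": a family of triples from the K edges of F in which any two triples share at least two edges either consists of triples all containing one fixed pair of edges, at most K - 2 of them, or is contained in a single set of four edges, at most 4 of them. Either way there are only O(K) good triples out of K choose 3, which is on the order of K cubed, so the fraction of good triples is on the order of 1 over K squared.)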
Now, we know that those three edges are good; it means that whatever I condition on here, whatever matching I condition on, that's fine, I'm not going to change the outcome. In particular, I know that there is some matching in my rectangle where I connect two nodes that are in the other triple. But that other triple is also good, and then I know that there must be a cut which cuts precisely the three edges in that red triple. Now let's have a look: we have a matching and we have a cut, and they intersect in only one edge. So we have a slack-zero entry in our rectangle, and that's a contradiction. And that's it. So essentially this pseudorandom behavior kind of forces this result.

Okay. Now, if I have three minutes, maybe I can quickly outline how you prove the technical part. What you're essentially showing is that if you take a random partition T and you pick three edges at random, then the chance that this is bad because the matching part went bad is actually very small; you can make this as small as you like, and that's nice. In fact, something much stronger is true. It's not just that this holds for a random partition T: you can fix a lot of things. You can pick the three edges as you like, you can pick the A part as you like, you can essentially pick the B part as you like, and you can give me any partition of the B blocks into two halves. The only thing that is random is how you pick the remaining C and D part. In fact, let's imagine we fix all of this arbitrarily, and then we pick one of those guys at random, and then we split it, and that's the remaining C and D part. What I'm claiming is that you can fix everything up to this point, and still, with good probability, the resulting T and H is going to be good. The reason is: imagine you have all the matchings that conform with those edges; you have exponentially many of those in the rectangle, and then essentially they must behave almost uniformly randomly in each of those blocks, and if you now pick one at random then you will see an almost uniform behavior there. And that's it.

Finally, I believe that we now have a very good understanding of how strong linear programs are, but I think we have very, very little understanding of how strong SDPs are. I think the next step would be to try to generalize any of those bounds to semidefinite programs, but that seems to be quite a hard problem. So, yeah, that's a very nice problem for the future. Thank you.

>>: [inaudible]?

>> Thomas Rothvoss: Yeah, why not?

>>: So what [inaudible] everything or are [inaudible]? So if you want to [inaudible] some lower bounds how do you [inaudible]?

>> Thomas Rothvoss: Actually, we are already stuck at the very beginning. We are stuck at the very, very beginning; we are stuck here. This lower bound only holds for the LP case, not for the SDP case. In particular, for the nonnegative rank, so for the LP case, you have some kind of atomic view, in the sense that you can write your slack matrix as a sum of rectangles. You can look at the rectangles as atoms, and you select one, and then we show that, look, you cannot even take a single one which is good. But for SDPs this doesn't work. For SDPs you essentially have PSD matrices and they're all interconnected, so you cannot argue that there is not a single good PSD matrix.
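For context, a hedged note of mine on the semidefinite analogue being discussed here: the SDP counterpart of the nonnegative rank is the PSD rank, where the slack entries are written as trace inner products of positive semidefinite matrices assigned to facets and vertices, and what you restrict is their dimension,

\[
\mathrm{rank}_{\mathrm{psd}}(S) \;=\; \min\Big\{\, r : S_{f,v} = \mathrm{Tr}(U_f V_v), \ U_f, V_v \text{ PSD of size } r \times r \,\Big\} .
\]

As the speaker says, the Yannakakis-style correspondence with the size of semidefinite extended formulations carries over; what breaks is the atomic rectangle view used in the LP lower bound.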
>>: So [inaudible] find the slack-matrix [inaudible]?

>> Thomas Rothvoss: No, for the theorem of Yannakakis there is an analog of that, so that's not the problem.

>>: Okay. [inaudible]?

>> Thomas Rothvoss: This one. This essentially works for SDPs as well, yeah.

>>: How do you find the slack-matrix?

>> Thomas Rothvoss: The slack matrix is the same; the factorization is different. Here, this says you look at the inner product of two nonnegative vectors, and for the S-

>>: [inaudible] part of-

>> Thomas Rothvoss: For the SDP case you have two PSD matrices and then you take the [inaudible] product.

>>: And what you restrict is the dimension?

>> Thomas Rothvoss: Yeah, yeah.

>>: Any other questions? No? Okay. Thanks.