>> Mohit Singh: Okay. Hi everyone. It's a great pleasure to introduce Thomas, who just came
to [inaudible]. He's at UW, and he'll talk about some very exciting work that he's done on
matching polytopes.
>> Thomas Rothvoss: Thanks a lot. Yeah, thanks for inviting me. We'll talk about one of
my more recent papers, which is on the extension complexity of the matching polytope. Okay, so
what are we actually talking about? Let's imagine you have a polytope P. Usually the way
you represent a polytope is by giving the inequalities, but you could imagine a different way of
writing the polytope: you allow some extra variables, and then you take all the points x such that
there is a choice of the extra variables for which a different constraint system is satisfied. What
this geometrically means is that you have a higher-dimensional polytope Q, which is essentially
the solution set of that second system, so that if you project it down onto the x variables, that
gives you your original polytope P. Now you might wonder why you should do that, and the
point is that there are many examples where the original inequality description needs, let's say,
exponentially many inequalities, but by adding those extra variables it's possible to reduce the
number of constraints to, let's say, a polynomial. And if you count here, we have a polytope P
with eight inequalities, and the higher-dimensional polytope actually needs only six. Okay. And
now formally you can define the extension complexity of such a polytope P as the smallest
number of inequalities that you need to define such a higher-dimensional polytope that you can
project down to get P.
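For reference, written out in standard notation (not necessarily the notation on the slide):

$$\operatorname{xc}(P)\;=\;\min\Bigl\{\,m \;:\; \exists\,Q=\{(x,y)\in\mathbb{R}^{n}\times\mathbb{R}^{k}: Bx+Cy\le d\}\ \text{with } m \text{ inequalities such that } P=\{x:\exists y,\ (x,y)\in Q\}\Bigr\}.$$

So in the picture, P itself needs eight inequalities, but the lift shows that xc(P) is at most six.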
So this is a good measure for the complexity of a polytope, and there are tons of examples
where the extension complexity of a polytope is bounded by a polynomial even though the
number of inequalities of the original polytope is exponentially large. For example, this works
for the spanning tree polytope or the permutahedron.
In this work we actually care about the other direction: for which polytopes can you show that
they are really hard, in the sense that there's no way to significantly reduce the number of
inequalities? There has also been some work done on this, and the whole story starts with a
paper of Yannakakis at the beginning of the 90s, where he showed that there is no symmetric
linear programming formulation of polynomial size for the matching polytope and for the TSP
polytope. Now, this assumption that he only talks about symmetric LPs really restricted the
applications, but essentially what he wanted to do was show that a bunch of papers claiming
that P equals NP were wrong, because those approaches were actually based on a symmetric
LP. So this worked to reject those papers, but then it actually turned out that this restriction of
looking only at symmetric LPs is really a strong one.
Then, well, if you look at random 0/1 polytopes, it's not so difficult to show by counting
arguments that the extension complexity must be large even for nonsymmetric LPs, but that
didn't really lead anywhere until the big breakthrough of Fiorini, Massar, Pokutta, Tiwary, and
de Wolf, who showed that for the traveling salesman problem the linear program must have
exponential extension complexity, for general LPs.
Now these techniques can be extended, and you can also show that there are certain polytopes
that you cannot even approximate well; this approach turned out to be quite flexible. There was
a line of work done very recently by James Lee and others, and what they showed is that you
cannot have a linear program for max cut of polynomial size which gives you an approximation
factor better than two. That's somewhat surprising, because you probably remember that the
SDP gives you a factor which is much better, roughly 1.14. So this already shows that LPs are
much weaker than SDPs.
But you might argue that for those polytopes where you see unconditional lower bounds, the
underlying problems are all NP-hard. Well, it's a little bit debatable; the max cut polytope is
NP-hard, and you have a spectrahedron which gets better than two minus epsilon [inaudible].
But if the underlying polytope is NP-hard, yeah, we have a couple of examples.
So what about a polytope which is actually a nice polytope, in the sense that you can optimize
any linear function over it in polynomial time; can you get any kind of lower bound for that?
The prime example for this is the so-called perfect matching polytope. Let me just briefly
remind you what the perfect matching polytope is. We're looking at a complete graph on n
nodes. You probably remember what a perfect matching is, so the perfect matching polytope
is the convex hull of all perfect matchings in that complete graph. And if you try to write down
the linear constraints that are supposed to define your polytope, then you start with the degree
constraints: you say, well, for every node you want to pick exactly one incident edge. But then
you quickly realize that this does not really define the perfect matching polytope, because it
does not rule out feasible points of the following form: you could have some odd cycle with one
half everywhere. So you need some additional inequalities that also kill these kinds of solutions.
And you can do that by looking at every set U of odd cardinality and requiring that, well, if you
have a perfect matching then you must have at least one edge leaving that set U.
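For reference, the constraint system being described is Edmonds' description of the perfect matching polytope of the complete graph K_n = (V, E) (standard notation, not the speaker's slide):

$$P_{\mathrm{PM}}\;=\;\Bigl\{\,x\in\mathbb{R}^{E}_{\ge 0}\;:\;\sum_{e\in\delta(v)}x_e=1\ \ \forall v\in V,\qquad \sum_{e\in\delta(U)}x_e\ge 1\ \ \forall U\subseteq V,\ |U|\ \text{odd}\Bigr\},$$

where δ(S) denotes the set of edges with exactly one endpoint in S; the second family are the odd-cut inequalities, and those are the exponentially many ones.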
So you can write down some more constraints which ask for exactly that. And this is very
classical work of Jack Edmonds; he already proved in the 60s that this correctly describes the
convex hull of all perfect matchings. Edmonds also gave an algorithm that can optimize any
linear function over this polytope in polynomial time, just by finding a maximum weight
matching. You could use that already to solve the separation problem, but there's actually also a
different, maybe slightly more elegant, way to solve the separation problem, which is due to
Padberg and Rao. So it's a nice polytope, but if you count the number of inequalities, there are
exponentially many; you really have exponentially many facets. Then you wonder whether you
can significantly reduce the number of inequalities by introducing some of these extra variables.
And what we're going to see here is that this is actually not possible. The only thing that you
could do is reduce the constant that you have in the exponent, but apart from that this
description is actually almost optimal; there's no way to get a subexponential number of
inequalities. And to the best of my knowledge, everything that was previously known was
essentially only the trivial lower bound that you get from the dimension.
Before I go into detail and show you how to prove that, let's talk a little bit about the theory of
extended formulations. In particular, we need to talk about what's called the slack-matrix. Let
me remind you what the slack-matrix of a polytope P is. You can look at a polytope, and you
can look at the vertices and the inequalities; and now the slack-matrix is a huge, exponential-size
matrix that has a row for every facet and a column for every vertex, and the entry for a vertex
and a facet is essentially the distance of the vertex from the facet, that is, the slack that the
vertex has with respect to the facet.
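Concretely, in my notation: if P = {x : a_i^T x ≤ b_i, i = 1, ..., m} and v_1, ..., v_ℓ are its vertices, the slack-matrix is the m-by-ℓ matrix with entries

$$S_{ij}\;=\;b_i-a_i^{\top}v_j\;\ge\;0.$$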
Okay. We need one more definition, which is the so-called nonnegative rank of a matrix: that's
the smallest number r so that you can find nonnegative matrices, U with r columns and V with
r rows, that factor your original matrix S. If you drop this non-negativity condition, you recover
the usual rank that you know from linear algebra, so this quantity is always at least the usual
rank. Why is this nonnegative rank interesting? Because of the theorem of Yannakakis from his
classical paper, where he showed that if you take any polytope and any slack-matrix of it, then
the extension complexity of the polytope equals the nonnegative rank of the slack-matrix. And
this is a very nice, elegant relation. In particular, this is very helpful for proving lower bounds,
because instead of arguing about some kind of higher-dimensional polytope that you don't
know, you only need to argue about the slack-matrix, and you really know the slack-matrix. So
you have to argue only about one object, which makes things much easier.
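Written out (again in standard notation rather than the slide's):

$$\operatorname{rank}_+(S)\;=\;\min\bigl\{\,r\;:\;S=UV,\ U\in\mathbb{R}^{m\times r}_{\ge 0},\ V\in\mathbb{R}^{r\times\ell}_{\ge 0}\,\bigr\},\qquad\text{and Yannakakis' theorem is}\quad \operatorname{xc}(P)\;=\;\operatorname{rank}_+(S).$$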
If we have two minutes I can actually briefly outline the proof, if you like. Okay. One direction
is kind of easy. Let's imagine you do have a nonnegative factorization of the slack-matrix; then
what is the extended formulation? Well, you can just write your original polytope as all x such
that there is a nonnegative y for which the original inequality system, with U times that
nonnegative y added on the left-hand side, equals the right-hand side. And that kind of makes
sense, because if you have a vertex and you want to prove to me that it lies in the polytope, then
you can just take the corresponding column of V; that's your y, and if you multiply things out,
it gives you precisely the slack vector that you need to turn the system into an equation.
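In formulas, the lift just described is the following sketch (assuming P = {x : Ax ≤ b} and a factorization S = UV):

$$Q\;=\;\bigl\{(x,y)\;:\;Ax+Uy=b,\ y\ge 0\bigr\},$$

and for a vertex v_j of P the witness is y = V e_j, the j-th column of V, because A v_j + U V e_j = A v_j + S e_j = A v_j + (b − A v_j) = b. The only inequalities are y ≥ 0, so the lift has r = rank_+(S) of them.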
Okay. The other direction uses duality, and it works as follows. Let's imagine that this is your
polytope P, and suppose you do have a higher-dimensional polytope Q that projects onto P, let's
say with this description. Now I claim that you can also find a factorization whose size is the
number of inequalities needed to describe Q. How does that work? Well, what do we need to
do? For every inequality of the original polytope we need to find a nonnegative vector u, and
for every vertex we need to find a nonnegative vector v, so that their scalar product gives the
slack that the vertex has with respect to that inequality. The way you do that is this: you have
an inequality of the polytope P, and by duality you can take some inequalities of the
higher-dimensional polytope and add up nonnegative multiples of them in order to obtain that
inequality. You take those nonnegative multiples, and those coefficients give you the u vector.
Then you look at a vertex, and you also want to find a nonnegative vector for it. The way you
do that is you look up into your high-dimensional polytope and take a point up there, and then
you take the slack vector which this point has with respect to all the inequalities of the
high-dimensional polytope Q. This is again a nonnegative vector, and if you calculate the inner
product, it turns out this is precisely the slack of the vertex with respect to the inequality.
Okay, good.
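As a formula, a sketch of this direction (my reconstruction of the standard argument): say Q = {(x, y) : Ex + Fy ≤ g} projects onto P. For a facet a_i^T x ≤ b_i of P, maximizing a_i^T x over Q gives exactly b_i, so LP duality provides λ_i ≥ 0 with λ_i^T E = a_i^T, λ_i^T F = 0 and λ_i^T g = b_i. For a vertex v_j of P, pick a preimage (v_j, y_j) in Q and let s_j = g − E v_j − F y_j ≥ 0 be its slack vector in Q. Then

$$\lambda_i^{\top}s_j\;=\;\lambda_i^{\top}g-a_i^{\top}v_j\;=\;b_i-a_i^{\top}v_j\;=\;S_{ij},$$

so the vectors λ_i and s_j form a nonnegative factorization whose size is the number of inequalities of Q.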
>>: [inaudible]? The extended formulation, the first direction?
>> Thomas Rothvoss: Yes.
>>: How do you calculate this [inaudible] facets?
>> Thomas Rothvoss: Okay. So, no. The number of equations is as large as the number of
inequalities of P, so that might be large. But the number of inequalities that you have is just
the number of columns of U. So you have tons of equations, but many of them are redundant;
you could essentially throw out everything which is redundant, just take [inaudible]. But you're
right, this might still be large, but you can take a subset which is very small.
>>: The Y is not [inaudible]?
>> Thomas Rothvoss: Yes, yes. Okay. There was this paper of Fiorini [inaudible] that I
mentioned, and the technique which they used to show that the extension complexity is large,
and actually that the nonnegative rank of the slack-matrix is large, was the so-called rectangle
covering lower bound. What it says is just that the nonnegative rank of any slack-matrix must
be at least the rectangle covering number. So let me just give you a quick picture. Imagine that
this is any nonnegative matrix, and let's say that this is a nonnegative factorization. Now let's
forget the numbers; let's just look at whether the entries are zero or positive. If you look at the
i-th column of U and the i-th row of V, then this induces a combinatorial rectangle, and that
rectangle has only positive entries in the slack-matrix S. And if you look at all of those
nonnegative-rank many rectangles, then you can see that they cover all the positive entries of
the slack-matrix S, and they don't cover any zero entries.
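In symbols, the bound just described is

$$\operatorname{rc}(S)\;\le\;\operatorname{rank}_+(S):\qquad S=\sum_{i=1}^{r}u_i v_i^{\top},\ u_i,v_i\ge 0\ \Longrightarrow\ \operatorname{supp}(S)=\bigcup_{i=1}^{r}\operatorname{supp}(u_i)\times\operatorname{supp}(v_i),$$

and each rectangle supp(u_i) x supp(v_i) avoids the zero entries of S because all the summands are nonnegative.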
So, the natural question is: if you want to show a lower bound for the perfect matching polytope,
can we apply this rectangle covering lower bound? Let's try that. Well, it will actually turn out
that this doesn't work, but let's see why it doesn't work. Say this is the slack-matrix of the
perfect matching polytope. The perfect matching polytope has a bunch of different constraints:
the degree constraints, the non-negativity constraints, and then there are these odd-cut
inequalities. But only of the odd-cut inequalities are there exponentially many; of the others
there are only polynomially many. So actually we only need to care about the part of the
slack-matrix which comes from the odd-cut inequalities. Let me remind you that then we have
entries of the following form: for every odd cut U and for every perfect matching M we have an
entry which is the number of edges they have in common, minus one, because this is the slack.
And I claim that we can cover the positive entries in that partial slack-matrix with only n to the
fourth many rectangles.
How that works is the following: we take every pair of non-adjacent edges, and for every such
pair we get one rectangle. So what I need to tell you is which matchings lie in the rectangle and
which cuts lie in the rectangle. The set of cuts is simply the set of cuts which cut both edges,
and the set of matchings is simply the set of matchings which contain both edges. This way you
see that every entry (U, M) which lies in this rectangle must have a positive slack, because they
share at least those two edges e1 and e2. And the other way around, if you have any entry with
a positive slack, then cut and matching must have at least two edges in common, well, actually
at least three by parity reasons, so you can just take two of those edges and you see that the
entry is in at least one rectangle.
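This counting is easy to sanity-check by brute force on a small complete graph. Here is a little sketch one could write (my own illustration, not code from the talk; all the names in it are made up): it enumerates the odd cuts and perfect matchings of K_8 and checks that an entry (U, M) with k = |M ∩ δ(U)| shared edges is covered by exactly k-choose-2 of these pair-of-edges rectangles, which is the multiplicity discussed next.

```python
# Sanity check (illustrative sketch, not from the paper/talk): the rectangle
# cover of the odd-cut part of the matching slack-matrix.  Each pair of
# non-adjacent edges {e1, e2} gives the rectangle
#   {odd cuts cutting both e1 and e2} x {perfect matchings containing both},
# and an entry (U, M) with k = |M ∩ delta(U)| lies in exactly C(k, 2) of them.
from itertools import combinations

def perfect_matchings(nodes):
    """Enumerate all perfect matchings of the complete graph on `nodes`."""
    if not nodes:
        yield frozenset()
        return
    v, rest = nodes[0], nodes[1:]
    for i, w in enumerate(rest):
        for m in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield m | {frozenset({v, w})}

n = 8
nodes = tuple(range(n))
matchings = list(perfect_matchings(nodes))                  # 7!! = 105 of them
odd_cuts = [frozenset(U) for size in (3, 5)                 # nontrivial odd sets in K_8
            for U in combinations(nodes, size)]
edges = [frozenset(e) for e in combinations(nodes, 2)]
pairs = [(e1, e2) for e1, e2 in combinations(edges, 2) if not (e1 & e2)]

def crossing(U, M):
    """Edges of the matching M that cross the cut delta(U)."""
    return {e for e in M if len(e & U) == 1}

for U in odd_cuts:
    for M in matchings:
        k = len(crossing(U, M))          # slack of the entry (U, M) is k - 1
        covered_by = sum(1 for e1, e2 in pairs
                         if e1 in M and e2 in M
                         and len(e1 & U) == 1 and len(e2 & U) == 1)
        assert covered_by == k * (k - 1) // 2   # covered iff slack > 0, i.e. k >= 3
```

The same assertion also confirms that no slack-zero entry (k = 1) is covered and that every positive-slack entry (k ≥ 3) is covered at least three times.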
If you look a little closer: generally, if you have an entry (U, M) where they share K edges, then
this entry will be in precisely K choose 2 many rectangles. Now we could be a little bit naive
and say: well, this lower bound doesn't work, so we have an upper bound and a lower bound
which is useless, but this might give us some ideas. In fact, let's try to take this construction
and get a factorization out of it. So let's try the following: let's try to write the slack-matrix as
the sum of all those n-to-the-fourth many rectangles. And what I mean here is that each
rectangle gives you a zero-one, rank-one matrix, where you have a one entry if that entry lies in
the rectangle and zero otherwise.
And now I'm wondering: why is this not a correct factorization? Well, obviously it is not,
otherwise the title of the talk would be different. But what's going wrong? Let's look at an
entry (U, M) where they share K edges. You know that on the left-hand side, for that entry, you
expect a value of K minus 1. On the right-hand side you know that this entry is in roughly a
quadratic number of rectangles, so the value that you get is like this quadratic curve. So it
doesn't work. Now I do have some freedom; I could put some nonnegative scalars in front. But
you see that there is no way I could put scalars there and make this curve equal to that curve.
So that doesn't work.
>>: I have a question. Sorry. I was, can you say that argument again? Because I was always
taught only [inaudible]. I wasn’t listening. I was trying to figure out the K.
>> Thomas Rothvoss: That's an excuse.
>>: I was going 20 percent slower than you so if you say that again I would appreciate it.
>> Thomas Rothvoss: Okay, fine. So if we take this construction and we consider it as a
nonnegative factorization of our slack-matrix, then we wonder: why is this going wrong?
>>: And you're saying the issue is the multiplicity of the cover. That's where you lost me.
>> Thomas Rothvoss: So you look at the entries (U, M), and you look at an entry where they
share K edges; then this is the slack that you should get. So this curve gives you the value that
you expect on the left-hand side, and this quadratic curve gives you the value on the right-hand
side, because every entry is in a quadratic number of rectangles. Okay. And now you-
>>: The [inaudible] is not quite precise. [inaudible] K [inaudible].
>> Thomas Rothvoss: Yeah. Okay, okay.
>>: [inaudible].
>> Thomas Rothvoss: Well, I think this is one and this is two or three or so. Anyway, the point
that I try to illustrate here is that if you have entries where you do have a large slack, think of
this as being some large constant, then it seems that this kind of entry is just in too many
rectangles. So we could very naively ask: maybe every covering of the slack-matrix with only,
let's say, polynomially many rectangles covers the entries that have a large slack too many
times? It's a bit naive, but the strange thing is that it turns out the answer is yes; this is
precisely the case.
Now this is actually what we are going to prove. But what kind of proof approach should I use
if this rectangle covering lower bound does not work? I need some kind of proof technique that
I can work with. And the technique that I'm going to use is the so-called-
>>: So is the, I missed this, is the minimum rectangle covering exactly equal to positive
[inaudible]?
>> Thomas Rothvoss: No, no.
>>: It's just-
>> Thomas Rothvoss: They could be very different.
>>: Okay. So you're going to lower bound this, and that's going to translate to a lower bound
on-
>> Thomas Rothvoss: On the nonnegative rank, yes.
>>: Okay. But in general it could be larger?
>> Thomas Rothvoss: The rectangle covering lower bound is just a lower bound; it could be
very loose. For Fiorini and others, for the TSP polytope, if you find some tricky inequalities
the rectangle covering lower bound is actually surprisingly good. Here, in this case, it turns out
to be terribly bad. This is why we kind of need to find something different, something better.
>>: So you mentioned a lower bound of n squared. Could you get even a little better than that
[inaudible] use this?
>> Thomas Rothvoss: I'm not aware of any publication where anything better than that was
mentioned. Maybe somebody came up with something polynomially better but didn't publish it.
>>: Do you know what it is, the rectangle covering number?
>> Thomas Rothvoss: Actually, no. I don't know whether it's really n squared or whether it's
n to the fourth. It's not hard to show that it's at least n squared, and it's not hard to show that
it's at most n to the fourth. It might actually be closer to n to the fourth; that would be my
guess. But that's just speculation.
Okay. This stronger lower bound that we are going to use is the so-called hyperplane separation
lower bound, and this was suggested by Samuel Fiorini. It works as follows: imagine you can
pick a magic linear function, a linear function on the space of matrices, with the property that
for every rectangle that you can pick, this linear function is very small. If you want a geometric
picture: you're in some high-dimensional space, the linear function gives you some kind of
inequality, some hyperplane, and all the rectangles lie on one side. And now let's suppose that
you were so lucky to pick the linear function so that the slack-matrix itself has a very high value
on that linear function.
>>: Rectangles mean zero, one matrices or [inaudible] matrices?
>> Thomas Rothvoss: A rank-one, zero-one matrix, yes. [inaudible] switching these notions.
Now, the claim is that the nonnegative rank of the slack-matrix is essentially lower bounded by
the ratio of the two values of that linear function W. The biggest entry in the slack-matrix in
our case is at most 10, so that's not a value we need to care much about. Now what's the
intuition? Imagine your slack-matrix were the sum of rectangles, the sum of rank-one, zero-one
matrices. Then what this claim says is: look, you have a linear function which is small on every
one of those rectangles but very large on the slack-matrix, so obviously you had to use a lot of
rectangles. Now the slack-matrix is not necessarily the sum of rectangles, but you can imagine
that the slack-matrix is the sum of rank-one matrices where all the entries are between zero and
one; we only lose a small factor, at most the largest entry of the slack-matrix, to go to that view.
And it turns out that if you look at the convex hull of the rectangles, it contains all rank-one
matrices with entries between zero and one. I think this is also called the Bell polytope; in
quantum physics this would be a [inaudible].
So we have this linear function, all those rank-one matrices with entries between zero and one
have a very small value, and the slack-matrix has a very large value, and that just means that
you had to use a lot of these rank-one matrices.
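As a formula, the bound being described is, as far as I recall its usual statement (so take the exact normalization with a grain of salt):

$$\operatorname{rank}_+(S)\;\ge\;\frac{\langle W,S\rangle}{\;\Vert S\Vert_\infty\cdot\max\{\langle W,R\rangle : R\ \text{a rank-one } 0/1\ \text{matrix}\}\;},$$

where ⟨A, B⟩ = Σ_{ij} A_{ij} B_{ij} and ‖S‖_∞ is the largest entry of S, here a constant.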
>>: [inaudible]?
>> Thomas Rothvoss: Yes, yes. Because you add up the matrix that you have with some other
nonnegative matrices, and then you know that the thing that you get, the slack-matrix, doesn't
have too-large entries. So obviously none of the summands was large either. So the tricky
question is: how do you come up with this magic linear function? Now let's try to apply this
hyperplane separation lower bound to the matching case. Let me introduce a little bit of
notation. Let's say Q_ℓ is the set of entries (U, M), with U an odd cut and M a perfect matching,
such that they share ℓ edges. Now, this is going to be my magic linear function: for an entry
which has slack zero I put a minus infinity.
>>: Can you tell us how to recover the rectangle covering lower bounds for this formulation?
>> Thomas Rothvoss: Yes. For every entry where you have zero slack you would put a minus
infinity, which essentially means that you only need to look at rectangles which don't contain
any slack-zero entry. And then you put something positive on the others. In fact, their lower
bound you can really see in this framework: essentially they take a measure and distribute it on
a particular set of entries where the slack is positive; I think where the slack is one, actually.
>>: Suppose I just put minus infinity or one or something like that.
>> Thomas Rothvoss: Yeah, yeah, yeah. That is-
>>: Then you will get the entire covering bound?
>> Thomas Rothvoss: More or less, up to some small effect, I think. This is essentially what you
would do for the rectangle covering lower bound: you take a measure of one, you look at all the
entries with the smallest positive slack, which is two due to parity reasons, and you distribute
that measure uniformly over all those entries. Okay. This is essentially the rectangle covering
lower bound. Well, we have one row left, and we know that the rectangle covering lower bound
does not work. So we should put something there, and that's what we're going to put.
So we look at some large constant K, in fact 501 works, and also there we distribute the measure
uniformly on all those entries; we just scale it down a little bit. And the important thing is that
we put a negative number there. Now imagine what you would do here: imagine you want to
find a rectangle which maximizes this inner product; how would you do that? You would like
to collect as many of those entries where you have a small slack, but you are completely
forbidden to take any entry where you have slack zero, and you also should try to avoid
containing too many entries where you have a large slack. We'll see that you cannot get very
far with that.
Now, first of all, let's start with the easy things. If you look at the inner product of that linear
function with the slack-matrix, you get a nice constant value: the minus-infinity part sits on
entries where the slack is zero, so that gives us zero; then here we have a measure of one, and
all those entries have slack two, so we get a two; and here we put a measure again, and the slack
that we get is K minus one. That K minus one cancels out with the one-over-K-minus-one
scaling, so we essentially have a minus one here. And some constant is left.
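Schematically, with the normalizations as just described (my shorthand for the slide, so the constants are only indicative):

$$\langle W,S\rangle\;=\;\underbrace{\sum_{(U,M)\in Q_3}\tfrac{1}{|Q_3|}\cdot 2}_{=\,2}\;-\;\underbrace{\sum_{(U,M)\in Q_k}\tfrac{1}{(k-1)\,|Q_k|}\cdot(k-1)}_{=\,1}\;=\;1,$$

a fixed positive constant, while the slack-zero entries contribute nothing since the slack there is zero.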
The crucial thing is that we can prove that for every rectangle, the inner product with that
rectangle is exponentially small. So the inner product with the slack-matrix is a constant, the
inner product with any rectangle is exponentially small, and then the hyperplane separation
lower bound gives us an exponential lower bound. And this is where the exponential-in-n lower
bound comes from.
>>: Can I [inaudible] one more time? What is, a rectangle is a set of cuts and a set of perfect
matchings-
>> Thomas Rothvoss: Yes.
>>: And you say that [inaudible].
>>: What are these numbers [inaudible]?
>> Thomas Rothvoss: So these are the entries U, M where they share three edges. These are
the entries where they share K edges.
>>: [inaudible] exponential? How big are these numbers?
>> Thomas Rothvoss: [inaudible] exponentially. Shall we go on? Okay. A little bit more
notation-
>>: Can we go back? Can you give the intuition for why [inaudible]?
>> Thomas Rothvoss: Yeah. The intuition is that if you look at the construction that we had,
these example rectangles, they actually contained entries that had a large slack; they were
contained kind of too many times. It means that if you look at individual rectangles, then the
measure on those large-slack entries is quadratically larger in K than the measure that you
collected from the small-slack entries. This means that if you take the rectangles that we saw a
couple of slides ago, then the contribution that you get from these guys is much larger than the
contribution that you get from those guys. So the value that you get for those rectangles would
be very negative.
>>: So how do they do that? [inaudible]? [inaudible] slack of these pairs, but you think that is
not possible.
>> Thomas Rothvoss: Because you want to have many pairs. I mean, if you have a tiny rectangle
you can do that. But if you have a larger rectangle, at some point you will contain a lot of pairs
(U, M) that have slack precisely K. Tons of them. And then that contribution is going to kill
you. I think I have a picture later which tries to show this.
>>: But just by [inaudible] once you have enough matchings and enough cuts you have to get
like main cuts, just cutting [inaudible]. Like you have to get enough matchings across these
cuts.
>>: I guess it’s true also that [inaudible] matching is tied to each of them. Like no cut
[inaudible].
>> Thomas Rothvoss: Yeah, yeah, yeah. You also need to use later that the rectangle you're
looking at doesn't contain any entry with slack zero. That's important; it would be wrong
otherwise. So let me introduce the measure μ_ℓ, which is essentially the uniform measure that
we had on those entries Q_ℓ. So μ_ℓ(R) is just the fraction of the entries (U, M) where U and M
share ℓ edges that fall into my rectangle R. Let's say R is the rectangle that I'm talking about
for the rest of this talk. And the technical lemma, which takes, let's say, 10 pages to prove, is
this one. This is the key technical lemma, and it is essentially what I already said, just written
a little differently. If you have a rectangle and the μ_1 measure is small, so you essentially don't
contain the slack-zero entries, the entries (U, M) where they have only one edge in common,
then you have the following property: if you look at the μ_3 measure, so you look at the entries
where they share three edges, then the fraction that falls into your rectangle is quadratically
smaller in K than the fraction that the rectangle gets of entries that have slack K minus one,
because they share K edges. Okay?
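Written schematically (my paraphrase; the exact constants and the error term are in the paper, so this is only the shape of the statement): if μ_1(R) is essentially zero, then

$$\mu_3(R)\;\le\;\frac{O(1)}{k^{2}}\,\mu_k(R)\;+\;2^{-\Omega(n)}.$$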
>>: For all of them?
>> Thomas Rothvoss: Yeah, for any constant K; actually the constant also moves into the
exponent here. And this is up to some small error term, some exponentially small error term.
>>: So does the same technique give something for, like, μ_4 versus μ_K?
>> Thomas Rothvoss: Probably. I haven't written it down. You could probably get some more
general statement.
>>: It's like μ_1 equals zero means that you have some pseudo-randomness property.
>> Thomas Rothvoss: Yes. It's really very important that you have this condition. Without this
condition you could just take, let's say, everything: if the rectangle were everything, then this is
one and this is one, so obviously the inequality fails. It's very, very important that you have this
condition, and that actually changes the whole relation.
Now, do you see why this main technical lemma implies the lower bound? Because, up to this
tiny error term, this contribution will be quadratically larger than that one. Well, you divide by
one over K, but it's still much larger than that contribution, and so the only thing that you could
collect is that exponentially small term.
Good. The technique that we're using to show this inequality, this measure inequality, actually
originates in a paper of Sasha Razborov. In the beginning of the 90s he had a paper that showed
some measure inequalities of this kind, for the set disjointness problem. Well, he didn't talk
about matchings and cuts, but essentially we can adapt his technique to also work in this case.
[inaudible]. The rough idea of Razborov's proof technique is the following: this rectangle R that
we have is an arbitrary rectangle, so we don't know any structural properties of it. So what
we're doing is we look at a more structured rectangle, let's call it T; later this is going to be
called a partition. Then we show that for this structured rectangle the inequality holds. In fact,
we're going to show that for 99 percent of those randomly chosen structured rectangles, this
inequality holds if we intersect them with R here.
>>: So what [inaudible] once again? Rectangles, if you take [inaudible] just random?
>> Thomas Rothvoss: You take this kind of rectangle at random, but not uniformly at random;
you take it at random in a very structured way.
>>: Okay.
>> Thomas Rothvoss: Good.
>>: You want R to be equal to zero for these rectangles as well, right?
>> Thomas Rothvoss: For this one, not necessarily. It's more that you look at the restriction of
that rectangle here, and then you want to argue that this inequality still kind of holds.
>>: But you mean the intersection of T and R when you say that restriction?
>> Thomas Rothvoss: Yes. I should probably define this kind of rectangle. Essentially, if you
look at the picture, we are going to define this rectangle T so that it is also a set of matchings
and a set of cuts. I'm going to call it a partition. The partition will be defined as follows:
imagine these are all the nodes in my graph; then the partition is defined by a set A of nodes, a
set B of nodes, a set C of nodes and a set D of nodes. The set A is partitioned into what I'm
going to call blocks, and the set B is also partitioned into blocks. And the blocks have very
carefully picked sizes; the numbers are really cooked up so that things work out. In particular,
much later in the talk we need some symmetry, and the symmetry that we get is the following:
if you forget three nodes in C and D and take the rest, this actually looks like one of those
blocks. So we get a lot of symmetry that we need later in the talk.
>>: What would it look like, the number of nodes?
>> Thomas Rothvoss: The number of nodes, yes. Now let me associate a set of edges with this
partition, which is, as you can see here: we have edges running inside the blocks, inside C and
inside D, and also running between C and D. Now the set of matchings that belongs to this
partition is the set of matchings that you can build using only the edges that you see here. And
the set of cuts is the set of cuts that you can get by taking some nodes of C and then some blocks
in A; but the point is that I force you to either take a complete block or not intersect the block
at all. Why should I do that? If you look at the cuts and you look at the matchings, then the
only place where they can intersect is here, between C and D. And the sizes of C and D are just
the constant K. So essentially the intersection happens only in a constant number of nodes and
edges, and I can really control how the matching and the cut intersect.
Now this is my partition, and now I want to rewrite my measures using these partitions. So what
is the measure? Essentially you generate one of those entries (U, M) where U and M share three
edges, and instead of generating it uniformly, we generate it differently: we first pick the
partition at random and then we take a random entry in that partition. You get the same measure
if you do it right.
Okay. So the first step is we take the partition at random, and this is kind of uniform, in the
sense that you take a uniform A, then a uniform C from the rest, a uniform D from the rest, a
uniform B from the rest, and you take uniformly random partitions into blocks. So everything
is nicely symmetric.
Now once you are in this partition you do the following: you take three edges between C and D
and call them H; you take a random matching that contains those edges and respects the
partition; and you take a cut which cuts precisely those three edges and respects the partition.
What I know is that the cut and the matching are going to intersect precisely in those three edges
that I selected. The good thing is that once I have picked a partition and once I have picked the
three edges, the choice of the matching and the choice of the cut are independent, because I
know that they cannot intersect anywhere else. In fact, it means that the probability that I care
about I can separate, because they are independent.
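In symbols (my shorthand), writing the rectangle as R = R_cuts x R_match, the independence being used is

$$\Pr\bigl[(U,M)\in R\mid T,H\bigr]\;=\;\Pr\bigl[U\in R_{\mathrm{cuts}}\mid T,H\bigr]\cdot\Pr\bigl[M\in R_{\mathrm{match}}\mid T,H\bigr],$$

because, conditioned on the partition T and the three edges H, the cut and the matching are drawn independently of each other.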
We can play the same game for the measure μ_K. Again we pick a random partition; now we
pick K edges running between C and D, so essentially we pick a perfect bipartite matching
between C and D; and then we pick a random matching containing all those edges and a random
cut cutting all those edges.
>>: This is a more or less trivial fact [inaudible]? By symmetry, or-
>> Thomas Rothvoss: It's by symmetry. I mean, yeah, I think it's pretty clear that every pair
(U, M) which shares precisely K edges must have the same probability of appearing here. Good.
Now for the next, let's say, five minutes, let's forget everything that I told you about cuts and
matchings and let's talk a little bit (for James: okay, you don't need to do anything) about the
behavior of large sets. Imagine you have a set of vectors, capital X, where the entries of the
vectors are between, let's say, one and q, and imagine q is some constant that we don't care
much about.
Now let's look at a little random experiment. I take one vector at random from that big set of
vectors X, I look at one of the coordinates of the vector, and I observe the random experiment:
how does the i-th coordinate behave? So this is maybe one choice of little x and this is another
choice, and we look at what the coordinate x_i looks like. What I claim is that if the set of
vectors is large enough, then for most indices i that you could select, the random variable that
you observe here is going to be roughly uniformly distributed. You can also formulate it the
other way around, and this is the nicer way to prove it: let's say you have a linear number of
coordinates where the coordinate is biased, in the sense that the distribution that you see there
is a little bit away from the uniform distribution; it doesn't really matter whether you look at the
statistical distance or the maximum difference in probabilities, just a little bit away from the
uniform distribution. Then you can prove that the density of that set of vectors, that is, its size
divided by q to the n, is exponentially small. These are essentially standard arguments that
appear in Razborov's proof.
Let me just give you the quick proof of this fact. Essentially you get the result by counting
entropy. You count the entropy of that little random vector x. What is the entropy? Well, it's
the logarithm of the size of the set you take the element from, and then you can bound it using
subadditivity. Now let's split the indices and look at the biased ones and the unbiased ones
separately. You know that the entropy of a random variable taking values between one and q
is at most log q, and you know that the unique distribution for which this is attained is the
uniform distribution. And if you're a little bit away from the uniform distribution, then the
entropy is also a little bit smaller. I guess I can probably skip the picture for q equal to two: this
is the entropy if you just have a coin, and if you're a little bit away from the uniform distribution
then you're a little bit away from the maximum. And you can quantify this; that's not a problem.
But anyway, if you do have a linear number of biased coordinates, then you know that you're
missing a linear amount of entropy, and you rearrange this and you get that inequality.
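The computation being sketched, in symbols (a reconstruction of the standard entropy argument): if B is the set of biased coordinates and each biased coordinate loses at least δ = δ(ε) bits of entropy, then

$$\log|X|\;=\;H(x)\;\le\;\sum_{i=1}^{n}H(x_i)\;\le\;(n-|B|)\log q+|B|\,(\log q-\delta)\;\Longrightarrow\;|X|\;\le\;q^{\,n}\,2^{-\delta|B|},$$

which is exponentially small compared to q^n once |B| = Ω(n).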
There's actually a different way of seeing this, essentially the same claim. It's the following: if
you have a sufficiently large set of vectors, let's say the density is at least that large (the density
being: you take a random vector among all vectors and check whether it is in my set X), then
for most coordinates you're not changing that density much if you condition on the coordinate
having some particular value j.
>>: For most I, for most J?
>> Thomas Rothvoss: For most i and all j, yeah. This is essentially a reformulation of what we
already had before. And this is very useful for us, because that's kind of our business model:
we have that rectangle, we want to look at what measure falls into that rectangle, and this says
that we can essentially condition locally on everything that we like. So we can say: we want to
look at matchings, but we want these edges to be in the matching and those not.
>>: If you have it for all j, then that will make [inaudible] some constant q, right?
>> Thomas Rothvoss: Yeah, this depends on q and on epsilon. Now, let's again remember
everything that I said about matchings and cuts; James, you don't need to do anything. And
let's try to formalize this. Let's look at one partition T and one choice of those three edges H.
Let's call that pair good with respect to the matching part if the following is true: imagine you
take a random perfect matching from your rectangle which contains those edges and respects
the partition; then I want the distribution that you see on these nodes to be uniform. In other
words, I want the matching to induce a uniformly random pairing here. Or, the other way
around, this says that if I have a good partition with a good triple H, then I can condition on
everything here, and it is not going to change the outcome of that probability. So how much
time do I have, because we had-
>>: Seven to ten minutes.
>> Thomas Rothvoss: Ten minutes? Okay. We can do something similar for cuts. Again, I want
to call a pair of a partition T and three edges H good with respect to the cuts if, essentially, I
have the same number of cuts containing those three edges as cuts containing, well, everything.
Okay. Now why is this useful? First of all, this is the measure that I want to bound. Let's
express it the way that we already saw, and now let's split this according to whether the
partitions are good or bad. You get this picture: so we split the measure; this is the good part,
and then you can be bad either because the matching part went wrong or because the cut part
went wrong. What we are going to do is bound each of those quantities separately.
Now, in particular, this is probably the most interesting case: this is kind of the generic case
where you have a good partition with a good triple of edges H. And in this case we see,
surprisingly, that the μ_3 measure is quadratically smaller than the μ_K measure. This is
probably the surprising part; the rest is a little technical, probably not super surprising.
So I'll try to describe in the next maybe five minutes why we come up with this bound. I think
this is the key reason why the matching polytope doesn't have a small LP; the rest are
technicalities that you need to work out. So I want to show that this is a quadratic factor smaller
than that, and I want to compare these essentially for every partition. Let's say we fix a partition
T, and let's say we also fix K edges F. Now we compare the contribution that we get to the one
measure with the contribution to the other measure, and then we'll see this divergence. What's
the contribution to the μ_K measure? It's the probability that this random cut is in my rectangle
times the probability that this random matching is in my rectangle. That's easy. Now I want to
compare this to the contribution to the μ_3 measure, and I do that as follows: I have these K
edges and I take three of them at random; these are my edges H; and then you take a random
cut cutting those edges and a random matching containing those edges. The point is that if the
partition and the triple H are good, then it actually doesn't matter what I condition on: it doesn't
matter whether I condition on cutting all the K edges or only those three. So this probability
doesn't actually change if I condition on something else.
So essentially what I need to bound is: what is the fraction of triples H that are good? And I
claim that for every partition T and every set of K edges F, this fraction is actually quadratically
small. So the point is that you cannot have this kind of nice pseudorandom behavior for, like,
all triples, only for a very small fraction. And the reason is the following: I claim that two
triples H and H star can only both be good if they share at least two edges. I mean, if you
remember the beginning of the talk, we had this construction of rectangles: we took two edges,
we took all the cuts cutting those two edges and all the matchings containing them, and this
essentially says that this is the only way you can get a decent rectangle.
Now, let's say we have a triple H and a triple H star, and they are both good. Good means that
you have some nice pseudorandom behavior in the rest. We know that those three edges are
good, which means that whatever matching I condition on here, that's fine; I'm not going to
change the outcome. In particular, I know that there's some matching in my rectangle which
connects two nodes that are in the other triple. But that other triple is also good, and then I know
that there must be a cut in my rectangle which cuts precisely the three edges in that red triple.
And now let's have a look: we have a matching and we have a cut, and they intersect in only
one edge. So we have a slack-zero entry in our rectangle, and that's a contradiction. And that's
it. So essentially this pseudorandom behavior kind of forces this result. Okay.
Now, if I have three minutes, then maybe I quickly outline how you prove the technical part.
What you're essentially showing is that if you take a random partition T and you pick three
random edges, then the chance that this is bad because the matching part went wrong is actually
very small. You can make it as small as you like, and that's nice. In fact, something much
stronger is true. It's not just that this holds for a random partition T; you can fix a lot of things.
You can pick the three edges as you like, you can pick the A part as you like, you can essentially
pick the B part as you like, and you can give me any kind of partition of the blocks of B into
two halves. The only thing that is random is how you pick the remaining C and D part.
In fact, let's imagine we fix all this arbitrarily, then we pick one of those guys at random, and
then we split this, and that's the remaining C and D part. Now what I'm claiming is that you
can fix everything up to this point, and still, with good probability, the resulting T and H are
going to be good. The reason is this: imagine you have all the matchings in the rectangle that
conform with those edges, and you have exponentially many of them; then essentially they must
behave almost uniformly at random in each of those blocks, and if you now pick one at random
then you will see an almost uniform behavior there. And that's it.
Now finally, I believe that we now have a very good understanding of how strong linear
programs are, but I think we have very little understanding of how strong SDPs are. I think the
next step would be to try to generalize any of those bounds to semidefinite programs, but that
seems to be quite a hard problem. So, yeah, that's a very nice problem for the future.
Thank you.
>>: [inaudible]?
>> Thomas Rothvoss: Yeah, why not?
>>: So what [inaudible] everything or are [inaudible]? So if you want to [inaudible] some lower
bounds how do you [inaudible]?
>> Thomas Rothvoss: Actually, we are already stuck at the very beginning. We are stuck at the
very, very beginning; we are stuck here. This lower bound only holds for the LP case, not for
the SDP case. In particular, for the nonnegative rank, so for the LP case, you have some kind
of atomic view, in the sense that you can write your slack-matrix as a sum of rectangles; you can
look at the rectangles as atoms, you select one, and then we show that, look, you cannot even
take a single one which is good. But for SDPs this doesn't work: for SDPs you essentially have
PSD matrices and they're all interconnected, so you cannot argue that there is not a single good
SDP matrix.
>>: So [inaudible] find the slack-matrix [inaudible]?
>> Thomas Rothvoss: No, there is an analogue of the theorem of Yannakakis for that, so that's
not the problem.
>>: Okay. [inaudible]?
>> Thomas Rothvoss: This one. This essentially works for SDPs as well. Yeah.
>>: How do you find the slack-matrix?
>> Thomas Rothvoss: The slack-matrix is the same; the factorization is different. Here this says
you look at the inner product of two nonnegative vectors, and for the S-
>>: [inaudible] part of-
>> Thomas Rothvoss: For the SDP case you have two PSD matrices and then you take the
[inaudible] product.
>>: And what you restrict is the dimension?
>> Thomas Rothvoss: Yeah, yeah.
>>: Any other questions? No? Okay. Thanks.