
>> Yuval Peres: Okay. We're happy to have James Lee tell us about discrete differentiation and local rigidity of cuts.

>> James Lee: So, yeah, I'm going to talk about, as Yuval said. So thanks for coming, even though some people didn't even know there was a talk. So it's even more miraculous that you ended up here.

So here's a brief outline. First I'm going to talk about optimization over cuts and how this is related to some classical combinatorial optimization problems, and then I'll talk about the relationship with metric embeddings, and then I'll introduce sort of discrete differentiation as a technique for analyzing these embeddings.

And, finally, the last part of the talk will be about sort of the main technical issue that arises from a particular construction which involves local rigidity of cuts in the plane. This is the basic outline. But let's start at the beginning.

So the first thing is to understand what the cut cone is. If you have a finite set X and a subset S, then one can define the cut pseudometric associated to S as a function from pairs of points in X to {0, 1}: it is 0 if the two points are on the same side of the set and 1 if they're on different sides. Here 1_S is the characteristic function of S. In case that wasn't clear, here's a picture. Here's X, here's the set S, and if we take a bunch of points, then the pseudometric associated to S assigns 1 to this pair of points and to this pair of points.

Now, the cut cone is just the cone generated by all of these pseudometrics on the finite set X. So the cut cone on X is the set of all nonnegative combinations of these things. You can consider these things as lying in R to the X cross X, and then the cut cone is the cone generated by all of these objects.
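In symbols, a sketch of the definitions just described (writing 1_S for the characteristic function of S):

$$\delta_S(x,y) = |\mathbf{1}_S(x) - \mathbf{1}_S(y)|, \qquad \mathrm{CUT}(X) = \Big\{ \sum_{S \subseteq X} \lambda_S \, \delta_S \; : \; \lambda_S \ge 0 \Big\} \subseteq \mathbb{R}^{X \times X}.$$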

Let me just give a slightly different way of saying that. A mapping D from pairs in X to real numbers is in the cut cone if and only if there exists some measure on subsets of X such that -- okay, I'm thinking about this as a distance, because it eventually is going to be one -- such that the distance between x and y is exactly this non-negative combination here. So I've just said what it means to be in the cone, again.

Okay. And so one thing that's going to be very key to this talk is cut measures -- measures on cuts. For instance, here's a picture of three cuts, each with a non-negative weight. This measure on cuts induces a distance in this way. So, for instance, the distance between these two points is 3.4, because 1.4 plus 2.0 is 3.4, and this one is 5.5.

From every cut measure we actually get a distance in this way.
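As a concrete illustration, here is a minimal sketch (not from the talk; the weights, cuts, and point names are made-up values chosen to reproduce the 1.4 + 2.0 = 3.4 example above):

```python
# Sketch: the distance induced by a cut measure is the total weight of the cuts
# that separate the two points.  Illustrative values only.
def cut_distance(x, y, cut_measure):
    """cut_measure: list of (weight, subset) pairs."""
    return sum(w for w, S in cut_measure if (x in S) != (y in S))

cuts = [(1.4, {"a", "b"}), (2.0, {"a", "c"}), (5.5, {"d"})]
print(cut_distance("b", "c", cuts))  # 3.4: only the first two cuts separate b and c
```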

All right. So if we could optimize over this set, that would be awesome. For instance, if we have a graph G, then we can express the maximum cut in G by optimizing a linear function of these distances subject to D being in this cone.
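One standard way to write this down (a sketch, stated over the cut polytope -- the convex hull of the cut pseudometrics -- which normalizes away the scaling in the cone):

$$\mathrm{maxcut}(G) = \max\Big\{ \sum_{\{u,v\} \in E} d(u,v) \; : \; d \in \mathrm{conv}\{\delta_S : S \subseteq V\} \Big\},$$

since a linear functional over a polytope is maximized at a vertex, i.e. at a single cut pseudometric delta_S.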

So if we could optimize over this cone, that would be awesome; we could solve problems like max cut. But, of course, max cut is NP-hard, so we don't expect to be able to optimize over this cone efficiently. This leads to the notion of relaxation: we take some kind of bigger cone that contains the cut cone, and we would like to be able to optimize over this bigger green cone. And then the idea of an algorithm would be: first optimize over this larger convex set, and then do some kind of rounding where we take a point here and try to map it to a close-by point in the original set.

This is where we recover a cut. Okay. So now I want to talk about what does it mean for two of these points to be close together. Of course, whether two things like in this cone are close depends upon what you care about. So a nice way to think about this is all of these things are trapped inside the metric cone.

So this big pink thing is the metric cone: the set of all symmetric functions satisfying the triangle inequality. In particular, the cut cone produces distances D that satisfy the triangle inequality. And everything we'll think about today is going to sit inside this metric cone. Every time you see a red dot, it's going to be a metric on the finite set X.

All the red dots are metrics; they give different geometries on the set X. Once you think of these as geometries, you can think of geometric ways in which two of these spaces might be similar. The most natural one is an L-infinity-type notion: two spaces are close if all the distances given by one metric are similar to the corresponding distances given by the other metric. This leads to the notion of (bi-Lipschitz) distortion between two metric spaces.

So if we have a map between two metric spaces X and Y, the Lipschitz constant, or expansion, of the map is the maximum factor by which distances are expanded, and the contraction is the maximum factor by which distances are contracted by the map.

The distortion of the map is the product of these two things, the product of the expansion and the contraction. This is just a scale-invariant way to measure how much the map changes distances. Okay.
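In symbols, for a map f from (X, d_X) to (Y, d_Y), a sketch of these quantities:

$$\|f\|_{\mathrm{Lip}} = \sup_{x \ne y} \frac{d_Y(f(x), f(y))}{d_X(x,y)}, \qquad \|f^{-1}\|_{\mathrm{Lip}} = \sup_{x \ne y} \frac{d_X(x,y)}{d_Y(f(x), f(y))}, \qquad \mathrm{dist}(f) = \|f\|_{\mathrm{Lip}} \cdot \|f^{-1}\|_{\mathrm{Lip}}.$$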

And then this notation won't actually be used except in one place, so it's not so important. But C sub Y of X is the smallest distortion with which we can embed X into Y. So we look at all the ways to embed X into Y.

The one which induces the smallest distortion gives us C_Y(X). So this is one kind of notion for comparing two points: we have two metric spaces, and we can ask what the distortion between them is. Okay, I have one more thing. I haven't told you yet what this green cone is going to be. This blue cone is the set of cut metrics, but I haven't said what we're going to take as our efficient relaxation. So here's something well known; it's not very difficult to see.

Cut metrics on a finite set can also be thought of as L1 metrics; they're the same thing. A distance D is in the cut cone if and only if there's an isometric embedding into some finite-dimensional space such that the distance is exactly the L1 distance. So that's a different way of characterizing the cut cone.
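In symbols, a sketch of the characterization just stated:

$$d \in \mathrm{CUT}(X) \iff \exists\, n \text{ and } f : X \to \mathbb{R}^n \text{ such that } d(x,y) = \|f(x) - f(y)\|_1 \text{ for all } x, y \in X.$$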

That's an exercise. You'll see that. And what we're going to talk about as this sort of larger convex body is the set of negative type metrics. And I'll mention later on why this is very natural. I mean, first of all, this set is such that we can optimize over it efficiently. So it's going to be -- that's one reason to care about it.

But there are also lots of other reasons to think about it.

And you can talk about this set as the set of metric spaces which admit an embedding into R^N, but now such that the distance is the square of the Euclidean distance. And, okay, it's not completely clear that the negative type metrics contain the L1 metrics. But, again, I'll leave that to you as an exercise.
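A sketch of that definition in symbols, together with the containment left as an exercise:

$$\mathrm{NEG}(X) = \big\{ d \text{ a metric on } X : \exists\, f : X \to \mathbb{R}^n \text{ with } d(x,y) = \|f(x) - f(y)\|_2^2 \big\}, \qquad L_1\text{ metrics on } X \subseteq \mathrm{NEG}(X).$$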

It's a simple exercise. But, okay, so these negative type metrics are the squares of Euclidean distances. So now, if you set this up in our language from before, we're asking: given some metric space here, how far is it from a metric space here?

Okay? And what I want to say is that this isn't just an interesting structural question; it's actually tightly connected to optimization problems. So let me tell you an optimization problem. The sparsest cut problem on a graph is the following: we have a set of vertices and a set of edges; that's our graph. We have another set of edges, the blue edges, classically called the demands. And in the sparsest cut problem the goal is to find the cut in the graph which minimizes this ratio: the ratio of red edges cut to blue edges cut. This is the general sparsest cut problem. If, for instance, the blue edges were just a complete graph -- if they were everything -- then this objective is exactly the conductance of the cut.
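A sketch of the objective in symbols, writing E for the red graph edges and D for the blue demand edges:

$$\Phi(G, D) = \min_{\emptyset \ne S \subsetneq V} \frac{|\{ \{u,v\} \in E : |S \cap \{u,v\}| = 1 \}|}{|\{ \{u,v\} \in D : |S \cap \{u,v\}| = 1 \}|};$$

when D is the complete graph, the denominator is |S| times |V minus S|, and the objective is, up to scaling, the expansion of the cut.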

>>: [inaudible] the sphere [inaudible].

>> James Lee: No, no, here it's a metric. So it doesn't, it's not going to matter.

So, for instance, this is a complete graph and this is going to be the product of S and the complement -- the product of the size of S and the size of the complement.

So there is no -- yeah. So this is the sparsest cut problem. As I said, it's a generalization of, for instance, conductance or expansion, which is the case when you take D to be V cross V the set of all pairs in the graph.

Okay. So here is something cool. This problem you can write down as an optimization problem over cuts. And if you relax it to an optimization problem over this negative type cone, it turns out that the integrality gap for N-vertex graphs -- in other words, how far apart the objective can be when I optimize over cuts versus optimizing over negative type metrics -- is precisely the worst distortion needed to embed an N-point metric space of negative type into the cut cone.

So the question of studying how far apart these two cones are is actually exactly equivalent to the problem of determining how well this algorithm does for sparsest cut.
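Stated as a formula (my paraphrase of the equivalence, with c_1(X, d) denoting the least distortion of an embedding of (X, d) into L1):

$$\mathrm{gap}(n) = \sup\big\{ c_1(X, d) : |X| = n, \ d \in \mathrm{NEG}(X) \big\},$$

where gap(n) is the worst ratio, over n-vertex instances, between the true sparsest cut value and its negative-type relaxation.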

So that gives one natural reason for studying it. In fact, it was so natural and apparently so compelling that in '94 Goemans and Linial actually conjectured that these two cones are very close together. They conjectured that every negative type metric on a finite set can be embedded into a cut metric with some universal constant distortion, independent of the number of points. They conjectured that for every point here you can go here and only lose some order-one distortion.

Okay. So very well. Let me just -- I'm going to talk mostly about lower bounds.

We mentioned the best upper bound is due to Arora, myself, and Naor, which is that for N-point metrics here you can move into the cut cone, but you pay something which depends on N. And that something is square root of log N times log log N, which is the best known bound. This goes back to work of Bourgain from 20 -- now 25 -- years ago, which gives an order log N bound, and a sequence of works eventually gives you this bound.

I'm not going to talk about this. We actually showed something stronger, which is related to some problems in the nonlinear theory of finite metric spaces. But let's ignore that for now.

Okay. So I'm actually going to introduce one more object. We have the negative type metrics. We have the L1 metrics. And now the last thing we're going to need is something that I call soft negative type metrics, and I'll tell you now why they're interesting. This is the set of metric spaces where you can't necessarily get the distance exactly equal to a squared Euclidean distance, but it is equivalent, up to constant factors, to the square of a Euclidean distance.

Okay. So there's an implicit constant that I haven't listed here. But, say, the distance is at most 10 times the square of the Euclidean distance and at least one tenth of the squared Euclidean distance. It's up to constant factors.
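One way to formalize this (a sketch, with the constant 10 above standing in for a universal constant):

$$d \in \mathrm{NEG}_{\mathrm{soft}}(X) \iff d \text{ is a metric and } \exists\, f : X \to \mathbb{R}^n \text{ with } \tfrac{1}{10}\,\|f(x) - f(y)\|_2^2 \le d(x,y) \le 10\,\|f(x) - f(y)\|_2^2.$$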

The reason this is important: we'll see a structural reason in a moment, but even from an algorithmic perspective, all the known rounding algorithms -- all the algorithms which use optimization over this green cone and then round into this L1 cone -- actually only use the weaker condition that you're inside this soft cone.

And that's actually been crucial for a number of algorithms that try to do things faster. If you don't want to solve a semidefinite program to get your point inside this cone, you can do something faster; for example, the work of Sherman, which gives roughly an N to the four-thirds time algorithm for approximating sparsest cut, uses the fact that the optimization actually occurs over this larger set.

That turns out to be important for designing fast algorithms.

>>: Still [inaudible].

>> James Lee: What's that?

>>: It's not --

>> James Lee: It's still a metric. So everything is inside the metric cone. So every -- but now it doesn't have to be exactly a squared Euclidian distance. It can just be close to one. Doesn't seem like a big difference, by the way.

>>: So especially --

>> James Lee: So here this is really a metric; we don't lose any constant in the triangle inequality. It's just that -- right, so it depends. If you pass to the equivalent Euclidean squared distance, it's no longer a metric, right? But if by pseudometric you mean that you lose a constant in the triangle inequality, that's true. But that's actually a very weak property. You have a much stronger property if you pass to this, which is that even if you apply the triangle inequality along a path of length 100, you still only lose the same constant.

>>: [inaudible] all metrics that can be embedded or constant [inaudible] to embed?

>> James Lee: No.

>>: So this is interesting --

>> James Lee: In fact, you can ask: if I have a point here, how far do I need to go to get it to be here? Well, I'll talk about that at the end of the talk. But it's a nontrivial question.

Okay. Right. So it's possible that this -- that in order to deform this metric, so that it's actually negative type, you need some huge distortion. Even though this is only by a constant, somehow it's a little bit --

>>: [inaudible].

>> James Lee: Well, yes, if you write it the right way. If you write it how I said it, then yes, it's a cone. The distances D just give you a bunch of positive semidefinite constraints.

All right. So now jumping ahead 10, 15 years after the Gomens [phonetic] conjuncture, it was a huge result related to lots of other things including unique games conjecture. When [inaudible] proved in fact it was false and they showed in fact some quantitative bound, code vision combined with the follow-up work, that, okay, so there exists -- there exists N point metrics space of negative type

which in order to move inside here you have to pay something. It's a slowly growing function but it's something.

So it's not a universal constant. And I want to stress something about their proof.

Their proof works in the following way. First, they exhibit one of these soft negative type metrics, and then, via what were -- by the time they proved it -- rather standard techniques from complexity theory, they show that this metric is far from the cut cone.

And then the really sort of interesting magical part of the proof is this part, where they show that this point is actually close to a real negative type metric. So the answer to Nikhil's question, for this particular space, you can actually sort of pay a constant and move inside this negative type cone.

I still claim that this is not very well understood. I don't know if Naor disagrees with me, but this is a rather magical kind of thing where they sort of like, based on faith, they do something and then they have three pages of technical calculations that verify that it was the correct thing to do.

But it's still -- the motivation is still kind of like faith for why you can do this. Okay.

So now back in 2004, when [inaudible] Naor and I saw this -- it's blinking. It's magical.

>>: Does that show any particular?

>> James Lee: No, no, this is their particular metric space and a particular kind of perturbation that applies only to their metric space. Well, it's been generalized slightly. But to sort of families of the same kinds of things. So now -- so Naor and I saw these back in 2004 and we said can we try to -- can we try to do something like let's copy their sort of approach and try to do something better.

So, in other words, let's try to find some soft negative type metric and then show it's far from here and close to here. Okay. So a very sort of general theorem in this regard is that every doubling metric space is actually soft negative type.

So this is a theorem of Assouad, which I have essentially corrupted by writing it this way. A doubling metric space is just one with a kind of finite-dimensionality property: every ball can be covered by an order-1 number of balls of half the radius.
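The connection to Assouad's theorem, roughly (a sketch of the reasoning, not a statement from the slides): Assouad's theorem says that for a doubling space (X, d) and any exponent strictly between 0 and 1, the snowflake (X, d^(1/2)), say, admits a bi-Lipschitz embedding f into some R^N. Squaring that embedding gives

$$\frac{1}{C}\,\|f(x) - f(y)\|_2^2 \le d(x,y) \le C\,\|f(x) - f(y)\|_2^2,$$

which is exactly the soft negative type condition.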

So this says: okay, let's look for a doubling space; it will automatically be soft negative type. So whether there is a doubling space which doesn't embed into L1 is the interesting question.

And for various reasons this is the most natural thing to try. In particular, because Assouad himself had originally asked whether every metric space embeds into a Hilbert space, which is more restrictive than embedding into L1. And this was the first counterexample to that question.

So here --

>>: Doubling.

>> James Lee: Yes. What did I say, every metric space. So that is also false.

Well, it's not so simple. Sorry. Not so simple, right? If you go bi-Lipschitz, it's not so simple.

In any case, all right. So then we started looking at this -- I'll tell you briefly about it in a second, although I'm not going to focus on it -- this three-dimensional Heisenberg group. And before I tell you about it, I'll tell you what we proved. By Assouad's result, this point sits here. Great. We spent a lot of time to prove the difficult part of the Khot-Vishnoi-type step, which is that you can move it into this negative type cone.

Again, this is all magic. I'm not going to tell the story but the story is kind of like magical and then you get like seven equations, one of which might work and you plug them all into Maple and Maple says only one of them can work and you check it and it does like this kind of thing.

And this uses the analysis very strongly -- the fact that you can do Fourier analysis on the Heisenberg group and that it interacts very well with the geometry. So we did this magical step.

But then, although we had reason to expect that it doesn't embed into L1, we didn't know how to prove this. So we did the next best thing: we conjectured it. We conjectured that the Heisenberg group is far from this L1 cone.

Okay. So then it took essentially three years for Cheeger and Kleiner to prove that it's true.

So they proved that this metric -- which I haven't told you about; it's just a Cayley graph metric: if you take, for instance, the group of three-by-three upper triangular matrices with integer entries and take some Cayley graph, that's the metric we're considering -- doesn't bi-Lipschitz embed into L1. One reason that's very interesting is that it was clear the way you should prove it is via differentiation. I'll talk about differentiation in a second.

But no straightforward theory of differentiation can hold for L1-valued mappings. So this is actually a big problem in the nonlinear theory of Banach spaces: what does it mean to differentiate L1-valued maps?

So the fact that sort of they were able to solve this question -- I mean, and give a reasonable answer in the setting of the Heisenberg group was very nice. Okay.

So they solved this, but it doesn't give -- well, this shows that you can't do it with a constant, but it doesn't give a quantitative bound yet.

So let's see we started late. So let's just ignore the geometry of the Heisenberg group.

>>: No.

>> James Lee: No, you don't want to ignore it? [chuckling]. All right. So I've already said what the Heisenberg group is. You can endow this with -- well, if you think instead -- forget the real case. Think about integer entries and think about the Cayley graph, then here's what the structure of the space looks like.

Okay. So in the Cayley graph -- I'll tell you the continuous version in a moment -- you can take four generators. I'm not going to write all the generators down, but you put a 1 here and 0s here, and a 1 here and 0s here, with plus or minus signs; that gives you four generators.

And you can think of those generators, if you embed the group in the natural way into R^3 according to the labels here, as moving in the X direction or the Y direction.

The way you move in the Z direction is by using the fact that the group is not commutative: if you go X, Y, X inverse, Y inverse, then you actually increase by one in the Z direction. And there's a continuous version of this space where, in order to move distance, say, A in the Z direction, you trace out a closed curve of area A in the XY plane; when you trace out the curve, you go up in the Z direction by the area it encloses.
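Here is a minimal sketch (not from the talk; the helper H and the coordinate names are my own illustrative notation) verifying the commutator claim with three-by-three upper triangular integer matrices:

```python
# Sketch: in the discrete Heisenberg group, the commutator X Y X^{-1} Y^{-1} of the
# two "horizontal" generators moves exactly one unit in the z-direction.
import numpy as np

def H(x, y, z):
    """The upper triangular matrix [[1, x, z], [0, 1, y], [0, 0, 1]]."""
    return np.array([[1, x, z],
                     [0, 1, y],
                     [0, 0, 1]])

X, Y = H(1, 0, 0), H(0, 1, 0)
Xinv, Yinv = H(-1, 0, 0), H(0, -1, 0)   # inverses of the two generators
print(X @ Y @ Xinv @ Yinv)              # equals H(0, 0, 1): one step up in z
```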

So if you look at the continuous case, here is a picture of what's going on. At every point you're only allowed to move in a two-dimensional plane. But the orientation of the plane changes as you move, and using that you can connect any two points by a curve that at every point only travels in these two-dimensional planes.

And the reason it doesn't embed into L1 is basically that infinitesimally all movement is horizontal. If you think about what that means, it means that the cuts in the embedding should be perpendicular to the direction of movement. That will make more sense in a few slides -- I'm essentially giving this for Yuval's sake, because he wanted to hear about the Heisenberg group.

So all the cuts should be perpendicular to the direction of movement. And if you look at what that means, it basically means that the cuts have to be vertical in this space. That's not exactly true; they just have to be vertical locally. We'll get to that in a minute.

But that means that points which are vertically separated collapse in any L1 embedding, because most of the cut measure is concentrated on cuts that look like this, which don't separate points like that.

In a second I'm going to give you a construction which does this but in a simpler way, so then it will all make sense. That's the Heisenberg group. It's beautiful.

But the geometry is quite difficult to work with. For instance, to connect pairs of points is nontrivial. Even to see that the continuous space is connected is a theorem.

Okay. So, okay, let me tell you what I'm going to present today, which is --

>>: You said earlier the counterexample to what?

>> James Lee: It was used as a counterexample to whether a doubling space bi-Lipschitz embeds, with order-1 distortion, into a Hilbert space, into L2. Even if Euclidean space was known, because every doubling space admits an order-1 distortion mapping into Euclidean space.

>>: Use the vertical separation property there, too?

>> James Lee: No. No, so there what you do is -- you use -- okay. Yuval likes the -- so, no, no, so there you take a map into L2. And the idea is that so this map is -- so if this map is small distortion, particularly it's going to be Lipschitz.

And then since it's Lipschitz, there's a whole theory of differentiation for maps that take the Heisenberg group into Hilbert space. Differentiation means that since this map is Lipschitz, locally it can be approximated by a linear map. A linear map in this setting means that, for instance, F of GH, where G and H are two matrices, should be -- let's not use F, let's use L for the linear map.

So it would be L(G) plus L(H). So using differentiation, which I'll talk about in a moment, you can say that locally this map is approximated by a linear map. But now, on the one hand, in Hilbert space you're commutative, while on the other hand, over here you're not commutative. So here's something that follows from having a group homomorphism from H3 to L2: it implies in particular that L(GH) equals L(HG). But these two points are different in the Heisenberg group -- for any pair of elements that don't commute, these two points are different.

But locally this map is going to send them to the same point in the Hilbert space. Okay? So that was a bit vague. But the point is that when you're taking maps with values in L2, this differentiation theory -- the fact that you can approximate a Lipschitz map locally by a homomorphism -- works out very nicely. When you change the target space to L1, everything goes away.

So even if you take a map from the real line to L1 which is Lipschitz, this map can be differentiable nowhere. Whereas a Lipschitz map from the real line to Hilbert space is differentiable almost everywhere.

So, yes, this group was used to prove that doubling spaces don't necessarily embed into Hilbert space. But the method by which the proof goes completely breaks down for L1, even at the very start -- the very start is to consider maps from the real line, and even those get killed.

So there are even isometric embeddings of the real line into L1 which are differentiable nowhere. It's easy to produce them. Okay. We'll see an easy differentiation argument in a moment. But here's what I want to prove today.

I want to prove a nearly tight separation between this soft negative type cone and the cut cone. Okay? The upper bound I mentioned before holds: there's a square root log N upper bound for points going from here to here, and I want to prove that this strong quantitative result is actually tight. So these two things are really separated by square root log N.

>>: Some figure, some metric on the work group?

>> James Lee: On this group?

>>: On the distance?

>> James Lee: On here? On the Heisenberg group? Yes. So the metric here is hard to explain. If you want something which for all intents and purposes -- except for the proof -- is the same, you can consider the group of upper triangular three-by-three matrices. So consider the group of these things with integer entries; X, Y, Z are all integers.

And then the generating set is just plus or minus 1 in these entries -- these things and also these things. So this is a graph, the Cayley graph; take the shortest path metric.

That's the metric: the Cayley graph with these generators. You can take any generators you want.

So in fact this embeds isometrically, yes, into this version. This is a continuous version of that space. But it's important for the proof that you're in the continuous version. It's unknown how to do the proof in a discrete version. And it's unknown in a strong way, because certain things would be much easier if you could do it in a discrete setting.

Okay. So this is what I'm going to try to prove today is that we can get sort of an optimal separation here. And then at the end I'll -- which is going to be in 20 minutes -- I'll talk about whether you can get sort of an optimal separation here.

But this is going to be based on a discrete differentiation theory. So let me tell you now about that and then I'll tell you the formal statement of the result. We've already talked about it. But let's just remind people what it means to take a derivative.

So Lebesgue's differentiation theorem, for instance, says that if you take a map from the real line to the real line which is Lipschitz, then almost everywhere you can approximate this map by a linear map: almost everywhere the map is approximated by its tangent line up to sublinear error. This is what it means to take a derivative. Now, in the last 30, 40, 50 years, generalizing this to spaces with less and less structure has been very important in Banach space theory and geometric group theory.
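In symbols, a sketch of the statement being paraphrased: for a Lipschitz f from the real line to the real line, for almost every x,

$$f(x + t) = f(x) + f'(x)\, t + o(|t|) \quad \text{as } t \to 0.$$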

So I'm going to follow an approach taken by Eskin, Fisher and Whyte which is discrete. So here's sort of the -- okay. So let's remember what we need. The two interesting facts here are sort of we get a locally linear approximation and we get it almost everywhere. So we're going to now lose all the structure and have to replace linear by some other property and almost everywhere by some other property.

So let's just look at a path of length N and a metric space X. And now first let's look at what it means to be linear. Here's a very elementary way to say it: maybe this map from the path to X is "linear" if the triangle inequality for this path is tight up to a 1 plus epsilon factor.

If I apply the triangle inequality, that says, of course, that the distance between the endpoints is at most the sum of the distances along the edges -- that's one direction -- and now I'm saying the triangle inequality is actually tight up to a 1 plus epsilon factor. Here's the picture. Here's the path. This map is pretty efficient.

This map is much less efficient. This is the notion of efficiency.
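In symbols, a sketch of this notion: a map f of the path x_0, x_1, ..., x_n into (X, d) is epsilon-efficient if

$$\sum_{i=1}^{n} d\big(f(x_{i-1}), f(x_i)\big) \le (1 + \varepsilon)\, d\big(f(x_0), f(x_n)\big).$$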

And now we need a notion of almost everywhere, but we're in a discrete setting.

Almost everywhere is going to have two components. First of all, there's going to be like a locality but also a granularity. So it's going to be local but also some kind of coarseness. You can't look too closely at the line.

So, okay, so what I mean by that is, okay, so there's some granularity parameter.

Let's say the granularity here is four, because there are four points. I'm just going to show you the localities and hopefully you'll understand. This is one locality; it uses four points. Here are three other localities at the next scale: here's a locality, here's a locality, and here's a locality. It's a smaller scale, but also I'm only allowed to look at these four points.

In other words, if I was considering, for instance, some map on this space, then I couldn't see what the map does in here at this locality. And so on. So now here are nine localities of the next scale.

And now here's a theorem which is fairly easy to prove: if I have a map from the path into a metric space and it has distortion D, then it's epsilon-efficient almost everywhere, as long as the path is long enough -- as long as I have enough scales. The number of times you need to subdivide is about D over epsilon; if M is the granularity, the path should have something like M to the D over epsilon points. By the way, here's the proof.

What does almost everywhere mean? I'll give you a measure on localities: choose a random scale and then a random offset; that's the measure. Almost everywhere involves another parameter delta which I haven't put here, but you can get the property, for instance, for 99 percent of the localities. And here's the proof. I'll give you the proof. Let's just find one locality at which the map is epsilon-efficient; then it generalizes to finding many of them. So here's the proof. Okay.

So if our map is efficient at this red scale, then we're done. We found one locality. If it's not, then it has to be inefficient. It has to look like this.

So now I say, okay, I'll look at the green scale, look at these three localities. If it's efficient on one of these, then I'm done. Otherwise it has to be inefficient on all of those, and I can do the same thing again, and you just see that the more times you go, the longer and longer this curve is going to become in the target. So eventually you're going to violate your distortion condition, because this curve is going to have some fractal structure at all scales. It's going to be too long.

So that's the proof. Very simple. All right. So now: we have our locally linear property, we have almost everywhere, and we have a differentiation theorem. The key is to specialize to the target space being L1 and see what happens.
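To make the locality bookkeeping concrete, here is a minimal sketch (not from the talk; the map, the granularity, and the parameters are illustrative assumptions) that scans the localities of a map from the path into the plane and reports what fraction of them are epsilon-efficient:

```python
# Sketch: "almost everywhere" means most localities (a scale plus an offset, viewed
# at granularity m) are eps-efficient.  Illustrative map with one kink at t = 500.
import math

def is_efficient(f, pts, eps):
    """pts: consecutive sample points of one locality (granularity = len(pts) - 1)."""
    along = sum(math.dist(f(pts[i]), f(pts[i + 1])) for i in range(len(pts) - 1))
    ends = math.dist(f(pts[0]), f(pts[-1]))
    return along <= (1 + eps) * ends

def efficient_fraction(f, n, m, eps):
    good = total = 0
    scale = n
    while scale >= m:                                    # scales n, n/m, n/m^2, ...
        step = scale // m
        for offset in range(0, n - scale + 1, scale):    # disjoint localities at this scale
            pts = [offset + j * step for j in range(m + 1)]
            good += is_efficient(f, pts, eps)
            total += 1
        scale //= m
    return good / total

f = lambda t: (t, abs(t - 500))  # efficient away from the kink, inefficient across it
print(efficient_fraction(f, n=1000, m=4, eps=0.1))
```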

Okay. And so this was done independently in work with [inaudible] and also in a later paper of Cheeger and Kleiner, where they gave a different proof for the Heisenberg group. Here's the idea if we take a path into L1. Remember, this just means that, equivalently, we take a cut measure on the path, and the distance between points is the measure of the cuts that separate them. I claim this map is zero-efficient -- the triangle inequality holds exactly -- if and only if the cut measure is concentrated on cuts which are half-lines.

So if and only if the cuts occur at half lines. So this is a good cut. This is a bad cut. And, in fact, you can see this cut is not efficient, the distance in this cut between the endpoints is one. But if I go along the edges I count three, the distance is three.

So, okay. So that's the exact version. And now sort of the almost version is that which just follows from exactly what I said, is that the map is epsilon efficient if at most the epsilon fraction of the cut measure is supported on these bad cuts. And all the other cut measure has to be on good cuts.

Okay. So then what the differentiation theorem says is that if I give you some map from the path to L1 which has small distortion, then there exists a locality such that when I look at the cut measure restricted to that locality, most of the mass is on these monotone cuts, these cuts which are like half lines. Okay.

Good.

So you can use that to solve a problem of Gupta, Newman, Rabinovich, and Sinclair -- just ignore that. Let's try to use this to give some quantitative bounds.

So the first thing is that two years ago Cheeger, Kleiner, and Naor proved a much stronger bound than log log N. They showed that, for the Heisenberg group, for some delta you can get the distortion to grow like log N to the delta. But delta is something like two to the minus 1,000 and the paper is like 90 pages long.

So you don't have a lot of hope that you can get delta equals one half, at least at the moment, from that kind of analysis. So what I'm going to present now is a doubling space which gets almost the optimal bound: a lower bound of almost square root of log N. So it says that we now have some techniques to get into the right ballpark of the correct asymptotic bound. Okay. So -- I'm going to go to about 4:25; is that okay, Yuval? In order to do that, it turns out we just have to bump up our differentiation theory by one dimension.

Instead of thinking about lines, just think about squares. So, again, I can consider the unit square in the plane, consider some cut measure on the unit square, which comes from an L1 embedding, and now I can ask the same question. If this cut measure gives me a metric on the unit square which is, say, order-1 distortion from the Euclidean metric, what does it imply about the structure of the cuts locally? The result is that when you zoom in locally and look at the cut measure on some tiny copy of this square inside, almost all the cut measure is concentrated on half-space cuts: intersections of half-spaces with the unit square.

And the reason is just that we still differentiate along paths, but instead of differentiating along a single line like we did before, you differentiate along all the Euclidean lines. You differentiate in all directions, and the only cuts which are monotone with respect to all the Euclidean lines are the half-space cuts.

So basically, if I have a cut such that whenever I hit this square with a Euclidean line I cross the boundary of the cut at most one time, that cut has to be a half-space cut.

So the half-space cuts are the only cuts that are stable with respect to differentiating along all the Euclidean lines. Okay. So we're going to use this property now, and here's the construction. Now, that was great, but this square embeds isometrically into L1, so by itself it's not so interesting. We have to do something a little bit more.

So here's the basic building block of the construction. Take two copies of the unit square and identify their boundaries -- just glue them along the boundary. Topologically we get a sphere. Geometrically it's not a round sphere: if I take two points in the top copy, the distance is the Euclidean distance, but I want to take the induced path metric, so if I take a point here and a point here, the way to go is to go to the boundary and then continue.

So it's not the sphere metric. On each copy you have the flat Euclidean metric, and then you move between them in the obvious way. Okay. But let's suppose I have an embedding of this --

>>: [inaudible].

>> James Lee: What's that?

>>: [inaudible].

>> James Lee: Well, right now -- we'll see in a second. You can think about it as a pillow, yeah. Even then it's warped; it's a very flat pillow. In some sense the reason is that you can't represent it, because that's what we're going to -- anyway. Okay. So suppose you take some embedding of this glued sphere into L1, and just suppose for a second that the cut measure resulting from the L1 embedding is supported on cuts which are monotone on both the top square and the bottom square. Okay? So I take an embedding, and when I look at the cuts in the embedding restricted to the top square or the bottom square, I see monotone cuts, which are exactly intersections of half-spaces with these squares.

So what does that mean? Well, take any such cut and look at what it looks like in the top copy; it looks like this. Now, the copies are identified along the boundaries, and it's easy to see that any half-space intersected with the unit square is determined by its intersection with the boundary. So the cut has to be the same in the top and the bottom, and therefore if I consider any two points which are vertically separated -- the same point but in the two different copies of the square -- such a cut cannot separate them, right?

So what it means is that if I take a mapping from this sphere to L1 where all the cut measure is on monotone cuts, then it has infinite distortion, because it collapses vertical pairs. Of course, by the way, there exists a perfectly good embedding -- the standard one is a distortion-2 embedding of this thing. Of course there exists a good embedding; it just doesn't have to be supported on monotone cuts.

The idea is to construct a space where for most of the spheres the embedding has to be supported on monotone cuts and has to collapse them. This is the idea.

These are the diamond folds. So we start with the unit square from before, and now, okay, this is the sphere, but I've drawn it -- there are two copies here, a top copy and a bottom copy -- in a way so that it has eight squares which are, up to scaling, isometric to the original square. So now I have eight more squares. What I'm going to do is apply the same thing to each of those squares: I glue another square onto each of them and I get some picture like this. And I'm going to keep going. So the basic operation is: you take a square and do this pillowing thing, and once you do that you have eight smaller squares; you pillow each of those and you keep going.

You get some recursive construction like this. Okay. So this is like some kind of recursive bubble wrap or something. And the main important property is that no matter where I look in this space, at every scale I see copies of this sphere. There are copies at the big scale, copies at the next scale, and so on. In fact, if you just want to prove that this thing admits no [inaudible] embedding, you can pass to the infinite version. Eventually I want a quantitative result, so we're not going to go to the infinite version.

But passing to the infinite version, you can prove it just by applying the differentiation argument: it says there must be some sphere where almost all the cut measure is supported on cuts that are monotone on the top and the bottom.

And then all that cut measure collapses vertical pairs, because there's only an epsilon fraction left to separate the vertically separated points; now let epsilon go to 0. So there has to be a sequence of vertically separated pairs that get worse and worse and worse. Okay. So now I want to talk about the quantitative bounds. Again, the result is that -- we'll take the infinite space.

The claim is that there are actually N-point subspaces of it such that you get a very strong quantitative lower bound on the distortion of the N-point space.

Eventually you don't need the continuous structure -- the continuous structure is again important for the analysis, but you can take the graph that forms the one-skeleton of these spaces, and that's enough to give the bound. Okay.

So now what's the quantitative problem that arises here. So it's the following.

You take the unit square and you take a set. And now if I tell you that every line crosses the boundary of this set at most once.

So we don't count the boundary of the square as the boundary of the set. So these crossings don't count. But this is the boundary on the inside.

So if I tell you that every line crosses at most one time -- this line doesn't, this line crosses twice -- but if every line crosses the boundary at most one time, we've already said S has to be the intersection of a half-space with the unit square. Okay. So now, if you want some quantitative version of this theory, you have to start asking a slightly more difficult question, like: what if 99 percent of the lines cross the boundary at most one time? So now I have a set where most lines cross at most once, but, of course, any line that goes through this dot crosses twice, and any line at this bad angle crosses twice. So you can ask that question, and the goal would be to say that, for instance, in this case the set has to be close, say in symmetric difference, to a half-space, right?

But in fact for the discrete version it's even harder. We actually only have a finite set of points along these lines, and now you only get to see a crossing if it's witnessed by this finite set of points.

For instance, these points witness the crossing because they're not in the set, not in the set, not in the set, but then in the set. But this finite set of points along the line doesn't witness this spot at all, because it totally misses it.

Okay. So the goal is to -- I mean, and this is dictated -- the fact that we had to take a finite set of points is dictated by the fact that we want a quantitative bound.

So let me just tell you the main quantitative sort of theorem that comes up there.

It's the following. Take the unit square and take some subset S. Then choose a random line L -- from the kinematic measure, say -- conditioned on intersecting the unit square, and choose K random points on the line.

Okay? And now let theta sub S,K be the probability that this random line with its random points sees more than one crossing -- like it crosses into S and back out of S. So the main technical theorem is that there exists a half-space -- I didn't take the intersection with the square here -- such that the symmetric difference between S and the half-space is only square root of theta, plus something that depends on the granularity, on your ability to look closely at the space. So there's some error term that depends on K.
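A sketch of the statement in symbols (my paraphrase; constants and the exact form of the error term are suppressed):

$$\min_{H \text{ a half-space}} \operatorname{area}\big( S \,\triangle\, (H \cap [0,1]^2) \big) \le O\big(\sqrt{\theta_{S,K}}\big) + \operatorname{err}(K),$$

where theta_{S,K} is the probability that a random line, with K random sample points on it, witnesses more than one crossing of the boundary of S, and err(K) is an error term depending only on the granularity K.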

It's actually not clear whether this error term is necessary, but it doesn't matter so much for us. And in fact this is odd; I didn't expect this: this square root is responsible for the square root of log N. And this is tight. As you can see, for instance, if you take this set, then the area of the symmetric difference with the closest half-space is about delta -- you pay the area of this triangle up here -- but the probability that you detect it is only delta squared. The reason is that in order to detect this bump, the line has to be within this shift, which has measure only delta, and also the angle has to be correct, which is another delta.

So the probability of detecting it is delta squared while the symmetric difference is delta. So in fact this square-root dependence is tight. It's very strange that somehow a two-dimensional phenomenon -- this is a two-dimensional phenomenon -- gives the correct bound.

I always thought that the correct bound would come from the differentiation theory and then -- okay. So it's just strange to me that this comes from sort of like two dimensional -- some very seemingly strong two dimensional property. All right. I don't know what it means.

Okay. So now to conclude, let me just say: what we've shown is a gap between soft negative type and L1. And what we really want, to finish the entire question, is a gap between actual negative type and L1. So we need one more of these magical steps. There was the Khot-Vishnoi step; there was the step for the Heisenberg group. Both of these were kind of miraculous, and the proofs are relatively opaque. And usually when you have miracles, there's some reason behind them.

So for a long time I thought it should be the case, as Nikhil asked earlier, that every point in here should be order one from this thing -- there's a special proof for each of these two cases, so there should be some more general property. But it's false. With my student last year, using differentiation, we proved there's actually a substantial gap between soft negative type and negative type.

Which means in particular that sort of closing the -- getting our construction to give the optimal integrality gap for the original semidefinite program is still a nontrivial open problem.

>>: What you do is -- [inaudible] is this a nontrivial in the sandbox?

>> James Lee: Like the upper bound? So there is an improvement to one-third.

Like you can get log N to the one-third by analyzing our construction slightly better.

>>: No, no.

>> James Lee: The upper bound is log N.

>>: Which is --

>> James Lee: Which one? The one-third? Or this? This is up here.

>>: But that's a period --

>> James Lee: I agree. It's unclear whether the one-third will appear. The one-third is like -- okay. It's also not the right answer. The answer is -- the right answer is probably square root log N. So if you would like to encourage my student to write the one-third argument, please feel free to send an e-mail. Okay.

So now, everything I talked about was about one very particular algorithm: how a semidefinite-programming-based algorithm could yield a reasonable approximation to sparsest cut. Via a sequence of works -- the slide isn't even up to date, because it doesn't contain all 30 papers -- there are some profound connections between these semidefinite programming gaps and actual complexity-theoretic obstructions to solving the problems: there's a kind of mechanism to take a semidefinite programming gap and convert it into an actual statement that the problem is difficult to solve by any efficient algorithm.

Assuming the unique games conjecture. But unfortunately this connection is essentially unrealized for sparsest cut, at least in the context that I'm discussing here. For instance, for this problem the best thing that's known, even assuming the unique games conjecture, is that there's no constant factor approximation. So one hope would be that this construction is now simple enough and has enough ingredients that look kind of like PCPs. Instead of a global decoding, it's some kind of local decoding: somehow in PCP theory you say that if some structure has some nice property, then it has to be close to some kind of algebraic structure. Here you get the same thing, but now the closeness is only local: if you have some nice cut measure on this space, then locally it has to look algebraic -- it has to look like half-spaces. So the main open problem is whether you can use this construction to actually say something complexity-theoretic about the sparsest cut problem. And I should mention that one thing I'm working on here is algorithms for sparsest cut, but for small sets -- you want to find sets that don't expand, but you care about small sets -- and the relation to eigenvalues; we can talk about it more offline if anyone's interested.

But I'll end with that.

[applause].

>> Yuval Peres: Further questions?

>>: Sparsest cut, nonuniform, what's the -- [inaudible].

>> James Lee: So the nonuniform version has 1.001 hardness using --

>>: You see this --

>> James Lee: Not anymore, because basically there was some nonconstant hardness but that was before these algorithms that sort of said that it can't work in that regime. So it's not clear -- it's not clear how like what the quantitative results are, like how fast you can let epsilon go to 0 and what the size of the label set should be. In fact, the initial assumption that gave growing hardness used some properties that don't seem to be true given the algorithms. Okay.

>>: I guess all bets are off on everything you said here about [inaudible] -- causing the assumptions?

>>: Yeah, yeah.

>>: Can't understand why --

>>: For the purpose, the [inaudible] symmetry.

>>: I understand why here you care about .0 distance.

>>: But this is a discrete proof. Actually it shows also -- there's a generic way to show it, but this also shows that this space is not quasisymmetric to a subset.

As I said, instead of taking spaces that have any infinitesimal structure, you can just take the one-skeletons, and these distances -- in fact, you can take graphs where the minimum distance is 1, so then quasisymmetry and the distortion are the same.

>>: Right. But oddly the continuous structure is used fundamentally in the proof.

So I mean -- I don't know how to prove it on the skeleton directly; it just follows from a compactness argument. The skeletons are essentially isometric to nets in the continuous space, so by a compactness argument, since the continuous space doesn't embed, the skeletons also don't embed. But I don't know how to prove that directly.

>> Yuval Peres: If there are no more questions, let's thank James again.

[applause]
