>> Yuval Peres: So welcome to the third in this series, so all the little guns you saw in the first
two lectures are all going to be firing today. Please, James.
>> James Lee: Thanks. Okay. So right. So our goal today is actually to prove the main
theorem, and I'll remind you what it is. And of all the -- well, I think this will be pretty
interesting. It's a cute proof, and I finally figured out a way to explain it in a talk.
So first let me remind you -- okay, just the object we're working with again. And I'll try to
write -- tell me if I'm not writing big enough. I see that many people are sitting quite a distance
away. All right.
The objects we're talking about again: a collection of jointly Gaussian random variables, the
Gaussian process. We equip this with a canonical metric, which is the L2 metric here. And our
goal was to understand the quantity, which is the expected supremum of this process.
And remember the philosophy is to understand this quantity in terms of the geometry. Okay. So
the index set here is capital T. The philosophy is to understand this in terms of the geometry of this
sort of -- of this metric space T. Okay.
So now let me remind you very briefly the -- what the upper bound was, because we're going to
prove a matching lower bound today. So we'll take a sequence of -- sequence of partitions of T,
and we'll call this sequence -- so this is the sequence of partitions. It's a sequence of increasing
partitions.
So here AN plus 1 is a refinement of AN. And we'll call this sequence, so this sequence of
partitions is -- let's call it admissible if it satisfies two properties. The first property is just that -- well, we start with the whole set. So we start with a trivial partition into one piece. And the second
property -- we saw this last time -- is that we have some upper bound on the sizes of these
partitions. Okay.
So the first partition is into one piece, and the Nth partition has at most 2 to the 2 to the N pieces.
And we saw -- and we saw before, I mean, in the first talk why this number comes up naturally.
Basically -- well, eventually we're going to be considering sort of the log of the number of points
as the important thing, and if you want the log to double, then you should be
squaring the number of points, which is why I have a growth pattern like this.
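Written out, the two conditions being described are, roughly,

    \mathcal{A}_0 = \{T\}, \qquad |\mathcal{A}_N| \le 2^{2^N} \quad (N \ge 1),

with each \mathcal{A}_{N+1} a refinement of \mathcal{A}_N. Squaring the cardinality bound at each step is exactly what makes the log double: \log\big(2^{2^{N+1}}\big) = 2 \log\big(2^{2^N}\big).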
And the important upper bound we proved was due to Fernique, and sort of in the earlier version
the entropy bound due to Dudley, which says that we can bound -- given any such admissible
sequence of partitions, we can bound the expected [inaudible] of the process in this way.
So I'll remind you what this notation means in a second. But given any admissible -- okay. So
this holds for every admissible sequence.
And just to remind you of this notation, let me write it here in red. For one of these partitions -- so if T -- if little T is a point of big T, then for some partition AN, AN of T is just the set in AN
containing little T.
>> [inaudible]
>> James Lee: All right. Fine. They do this in like -- all right. Okay. Okay. So the -- this -- that's -- so I hope this is -- so this is just the diameter of the set in this partition containing T.
And we proved this; this is the chaining upper bound. Okay. And so what you can do is you can
define -- let's define a functional, which is -- which is called gamma 2 of this metric space, which
is just the best possible upper bound that this Fernique chaining argument proves. So the
functional is just take the infimum over all admissible sequences of the upper bound you get. I'm
just writing the same thing over again.
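In symbols, writing A_N(t) for the piece of \mathcal{A}_N containing t, the chaining bound and the functional it defines are, roughly,

    \mathbb{E} \sup_{t \in T} X_t \;\le\; C \, \sup_{t \in T} \sum_{N \ge 0} 2^{N/2} \, \operatorname{diam}\big(A_N(t)\big)

for every admissible sequence, and

    \gamma_2(T, d) \;=\; \inf_{(\mathcal{A}_N)\ \text{admissible}} \; \sup_{t \in T} \sum_{N \ge 0} 2^{N/2} \, \operatorname{diam}\big(A_N(t)\big).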
So Fernique's bound exactly says that the expected supremum is upper bounded by the
gamma 2 functional. The gamma 2 functional gives you the best [inaudible]. And now we can
state what's called the majorizing measures theorem, which is what we're going to prove today,
that in fact such a sequence of partitions is the only way to upper bound the expected supremum.
So in fact the expected supremum of any Gaussian process is proportional to this gamma 2
functional.
Okay. So proportional just means up to an absolute constant. It's at most C times gamma 2, and
it's at least gamma 2 over C for some constant C. And this is due to -- this was conjectured by
Fernique and then eventually proved by Talagrand that this gamma 2 functional controls the
expected supremum. And this turns out to be a fairly powerful thing.
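So the statement is that, for some universal constant C,

    \frac{1}{C}\,\gamma_2(T, d) \;\le\; \mathbb{E} \sup_{t \in T} X_t \;\le\; C\,\gamma_2(T, d),

the upper bound being Fernique's chaining bound from before, and the lower bound being what gets proved today.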
I claim today that this will be the only proof of the majorizing measures theorem ever given that
it was possible to understand in a talk. That's my claim. It's a bold claim, but...
But the presentation of the proof of this theorem is always kind of disgusting. And okay. So I
think there's a nice way to do it. But this is our goal today. We've already proved the upper
bound. The goal is to prove the lower bound.
And now I just want to remind you what we were talking about last time. We introduced some
tools to prove the lower bound. So the first tool was the Sudakov inequality, which said the
following. Okay. Again there's this -- I'm not going to keep writing down this process.
We have this Gaussian process sitting in the background all the time. This Gaussian process X
of T says that if we take a bunch of points, T1, T2, up to TM, such that the pairwise distances in
our metric are all -- are large, so they're all at least alpha for I not equal to J, then -- and we
proved this last time using Slepian's lemma, this comparison inequality, then if we look at the
expected supremum of say XT1 up to XTM, we said that this is at least -- grows like alpha times
the square root of the log of the number of points here.
This is what we proved last time. Somehow it's a lower bound that matches what the union
bound gives. If we knew all the points were at distance alpha, then sort of the Gaussian tail
inequality says the expected supremum is at most alpha times square root log M, so if we know
they're at least alpha apart, we get some kind of matching lower bound.
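In symbols, the Sudakov inequality just stated is, roughly: if d(T_i, T_j) \ge \alpha for all i \ne j, then

    \mathbb{E} \sup_{1 \le i \le M} X_{T_i} \;\ge\; \frac{\alpha}{C}\,\sqrt{\log M}

for a universal constant C.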
And from this we actually want to get a slightly stronger corollary. So let me make -- let me
make it down here. Make one definition that's going to be useful. Definition. For some subset
A of our process, let's define G of A to be the -- just the expected supremum of the subprocess
when we look at the variables in A.
Okay. So our expected supremum is just G of T. But in general for some -- sort of some subset
A, we can consider G of A. Okay.
Now here's a corollary. I mean, it's a corollary whose proof requires the Sudakov
inequality. So, again, let's say -- okay. So now we have -- let's start it this way. So under the
same assumptions. So, again, we have -- think about M points. All pairwise [inaudible] large. I
want to get a slightly better lower bound than this, which is of the following form.
I claim that the expected supremum, if we consider -- okay. So this is for some R. R is going to
be a fixed constant. So you can think about if you want R equals 20 definitely works for
everything in the talk. Little R is always going to be some fixed constant.
So the claim is that if we look at the expected supremum of not just looking at the points T1 of
the TN, but let's look at small balls around the points -- I'll draw a picture in a second -- then we
can actually get something slightly better.
So what we can get is -- we get this alpha over C times square root log M. C is some universal constant.
Whenever I write C, it's a universal constant. So this hides some universal constant, this C is a
universal constant. I claim we can get this plus something a little bit more, so we get a
contribution coming from the centers of these balls. I claim that we can also get the minimum
contribution coming from one of the balls.
Okay. So the picture is that we've got our whole space T, you give me these separated points T1,
T2, T3, T4 like this, and now they're separated by alpha. But now I come and I look at even a
much smaller ball around each point, an alpha over R ball around each point.
And the claim that we're making here is that not only do we get this large contribution coming
from one of the variables, you know, X of T1, we can also get some contribution from what's
going on inside these balls.
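In symbols, writing B(t, \rho) for the ball of radius \rho in the canonical metric and G as defined above, the claim is, roughly: if d(T_i, T_j) \ge \alpha for all i \ne j, then

    G\Big( \bigcup_{i \le M} B(T_i, \alpha/r) \Big) \;\ge\; \frac{\alpha}{C}\,\sqrt{\log M} \;+\; \min_{1 \le i \le M} G\big( B(T_i, \alpha/r) \big).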
And let me just sort of sketch the proof of that. So what we saw before is that -- I mean, we can
always make a move like this. So let's just fix some point T naught. I don't care what T naught is. We
can always make a move like this where we say the expected supremum, just because all these
variables are centered -- okay. I mean, this is -- the expected value of this is 0, so we can do this
just fine.
And so the idea is that what this Sudakov inequality says is that sort of -- well, we know that -- is
that one of these -- I'm looking at these sort of -- these values here are like XT1 minus XT0, we
know that one of these should be large. The expected supremum of one of these things will be
large. Okay.
And what we'd like to do is say, well, also I should be able to get some credit for, you know, the
supremum. Now think about the -- think about each TI as being the center of its ball. And then
I should be able to get some credit for these little black arrows as well. So I should not just get
one of these, I should get one of these plus one of these. Okay.
And the reason we take the minimum here is because we don't know which one of these we're
going to get, right? I don't know a priori which one of these variables is going to be big. I just
know that one of them should be big. So I get one of them to be big, and then I want to get sort
of this associated black arrow as well.
Now, of course the problem is that if I condition, for instance, on this one being big, I could
screw up the expectation of this ball.
So first of all let me just assume -- let me assume for the moment that essentially all the time, all
of these balls have the property that the value here, so the supremum here -- okay,
again, what is the supremum here? It's the -- we look at XT minus XT4 over all Ts, all Ts in this
ball of radius alpha over R. Look at all these things.
I claim that we can basically [inaudible] the supremum here is always at least the expectation -- the
expectation over the ball B of TI, alpha over R -- minus, okay, something that looks like some constant C
times alpha over R times square root log of M.
So let's assume I can -- I said all the time, no matter what happens, that all these balls achieve at
least [inaudible] which is the expectation minus this. Then we're done in the following way.
Because what we'll get here is instead of this inequality we'll get this minus C alpha over R
square root log M. But then by choosing R to be a large enough constant, this can be absorbed
into here.
In fact, by choosing here R equals, what, twice C squared, we'll, you know -- by choosing R to
be this, which is just some constant, this gets absorbed into here and you would get a 2 here.
Okay.
So if we could guarantee that all these balls -- these small balls -- always achieve at least the
expectation minus a little bit of loss, we get this inequality. And this is --
>> [inaudible] alpha times square root log M?
>> James Lee: Yeah, it could be much larger. Because it could be many, many more points in
there, right? So the diameter went down, but if the number of points went up by a huge amount,
then it could be larger. And, okay, so if we knew this -- now I claim that this is essentially true.
And the reason this is essentially true, we wrote down last time, is because of the following
concentration inequality, which I want to focus on the proof of the main theorem. So I won't
prove the concentration inequality now, but if someone wants to see it at the end, the proof is not
too difficult just from the classical concentration inequality for -- on the Gaussian measure in
RN.
So here's the concentration inequality, though. If we have a Gaussian process XT, then the claim
is that the probability that the supremum of XT differs from its expectation by more than lambda.
All right. It grows like this. It's exponential [inaudible] factor 2 here minus lambda squared over
something. And here's the important point. This something just depends on the maximum
variance. So the maximum expected XT squared value.
So this is a classical concentration inequality, which is somewhat surprising because it doesn't
depend on the number of points in this process. In fact, this T could be an infinite set. Could be
some kind of continuous set. And still the only thing that matters is the maximum variance.
Okay.
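In symbols, with \sigma^2 = \sup_{t \in T} \mathbb{E}[X_t^2] the maximum variance, the concentration inequality being quoted is

    \Pr\Big[ \big| \sup_{t \in T} X_t - \mathbb{E} \sup_{t \in T} X_t \big| > \lambda \Big] \;\le\; 2\, e^{-\lambda^2 / (2\sigma^2)},

with no dependence on the cardinality of T.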
Okay. So it's -- well, we can get to the proof later. But why does this finish it? Because what's
the -- if we look at any variable of the form XT here in this ball, any variable of this form, well,
we know the ball has radius alpha over R in this metric, which means that the variance of this -- the variance of all of these random variables -- is at most alpha
over R, squared.
I mean, the Euclidean distance is at most alpha over R, which means that the variance is at most
alpha over R, squared. So now, I mean, you plug alpha over R squared in here. If we want sort
of -- we want to take a union bound, we have M events, these M balls, we want to take a union
bound, we should try to get this probability to be about 1 over M. Right. Which means we should
take lambda to be about, what, C times square root log M times the maximum variance. But the
maximum variance is alpha over R -- sorry. Not the variance, because we're squaring that;
times the maximum distance, which is alpha over R.
So the point is that if we take lambda to be this, then we can basically be assured that none of the
balls will deviate by more than this, and that's exactly what we said we were getting in the first
place. So if you just make that slightly more rigorous, then you get the actual statement of the
lemma.
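Spelled out, the calculation being sketched is roughly this: inside each ball the centered variables X_t - X_{T_i} have variance at most (\alpha/r)^2, so applying the concentration inequality to each ball with

    \lambda \;=\; C\,\frac{\alpha}{r}\,\sqrt{\log M}

makes each failure probability a small power of 1/M, and a union bound over the M balls shows that, with high probability, every ball achieves at least its expected supremum minus C(\alpha/r)\sqrt{\log M}, which is then absorbed into the (\alpha/C)\sqrt{\log M} term once r is a large enough constant.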
But basically these balls -- the fluctuation of these balls can be absorbed into the main -- into this
term. All right. So that proves our concentration inequality. I mean, that proves our corollary.
And now actually you can forget everything about Gaussian processes because now that we
have -- this is the only thing -- this is what we're going to use about Gaussian processes.
So in fact let's -- now let me state this theorem. Let me restate it slightly differently, and then
we'll just be able to focus on the proof of this theorem. Okay. So let F be any functional, so just
a real-valued function, on subsets of T. Okay. Think about F as measuring the size of this subset.
The F we're going to use is actually just the expected supremum. But somehow we don't want
to -- we can now divorce ourselves from this -- from thinking about random variables because
we're just going to use this fact. Okay. Such that two properties hold. One property is that this is
a measure of size, so if A is a subset of B, then F of A should be at most F of B. Okay. This is
certainly satisfied for our -- for the expected supremum.
And the second property is just that this holds. But let me just restate it here, and then I won't
erase this for the rest of the talk. The second property is that if we have T1, T2, up to TM in our
set, and the distances between these things are pairwise at least some alpha for all I not equal to J,
then this holds.
So then the functional applied to the union of the balls is at least some constant times alpha
square root log M plus the minimum over the balls. Sorry. Of the functional applied to the balls.
Okay. So we're going to take any functional that satisfies these properties. Certainly this little G
function, which is just the expected supremum, satisfies these properties. And then, okay -- so
then I claim there exists an admissible sequence of partitions, A sub-N, such that we get exactly
1. If we apply the functional to T, then up to some constant factor, this is at least this. Okay.
So the point is for any functional on subsets satisfying this kind of growth inequality, I claim that
there exists an admissible sequence such that this lower bounds F of T -- yeah.
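Putting the whole statement in symbols, roughly: F is a functional on subsets of T such that (i) A \subseteq B implies F(A) \le F(B), and (ii) whenever d(T_i, T_j) \ge \alpha for all i \ne j,

    F\Big( \bigcup_{i \le M} B(T_i, \alpha/r) \Big) \;\ge\; c\,\alpha\,\sqrt{\log M} \;+\; \min_{1 \le i \le M} F\big( B(T_i, \alpha/r) \big)

(c a small universal constant). Then the claim is that there is an admissible sequence (\mathcal{A}_N) with

    F(T) \;\ge\; \frac{1}{C}\, \sup_{t \in T} \sum_{N \ge 0} 2^{N/2}\, \operatorname{diam}\big(A_N(t)\big) \;\ge\; \frac{1}{C}\,\gamma_2(T, d).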
>> Why does G satisfy the property number 1? So let's say I take -- like let's say A is part of the
space where things are getting really crazy, and then B is that -- is A plus like basically like a flat
area? Won't the expected supremum be the smaller?
>> James Lee: The expected supremum of a subset is always less than the expected supremum
of the whole set. I mean --
>> What's a subset?
>> James Lee: Even stronger: the supremum over a subset is always less than the supremum over
the whole set, for any realization. Yeah. This holds trivially for the expected supremum. But now does
everybody see that this finishes the proof? Because now if we just instantiate F with the
expected supremum, then we show there exists admissible sequence, and of course now by
definition this is -- I mean [inaudible] this is at least gamma 2. So that finishes the proof.
So now our entire goal is to show that if you give me F, I can give you this sequence of
partitions, such that F of T is at least this. And a lot of the power from this framework comes in
the fact that sort of this is a very general kind of thing that applies to lots of different kinds of
processes or modifications of this condition apply to lots of different kinds of processes.
All right. So this is our goal. And in fact -- so just remember, in an admissible sequence the Nth
thing has size at most 2 to the 2 to the N. That's all you need to remember from any of this. And now
let's erase this and just concentrate on -- so I hope everybody understands everything we're doing
now has nothing do with probability anymore. It's just about something about metric spaces.
So specifying this partition is actually also going to be not very difficult at all. I'm going to
use -- okay. Let's see what's going to happen. I will specify the partition here and then I'll do the
analysis here. Okay.
>> [inaudible]
>> James Lee: What's that?
>> That was already the [inaudible].
>> James Lee: Not quite that simple. At most 2 to the 2 to the N pieces. All right. So let's start the
partition. Awesome. All right. That's a good first step. There's one thing that's going to happen
is that every set in the partition is also going to have a value, which is going to upper bound the
diameter of the set. So the value for this piece will just be -- and the value will be implicit,
because it's not -- I don't need some notation for it. But the value of this piece is just the
diameter of the set.
Okay. So now -- well, let's suppose that you've come -- you've given me A sub-N, looks some
way, you partition the space in A sub-N. Let's just choose -- I'm going to get the next partition
by taking every piece C in your A sub-N. Okay. Let's blow up C just for the sake of -- okay. Here's
C.
Now I'm going to partition C into 2 to the 2 to the N pieces. So I'll take each of these pieces and
partition them further into 2 to the 2 to the N pieces. And of course if I do that, then the size of AN plus
1 is at most 2 to the 2 to the N times 2 to the 2 to the N, which is 2 to the 2 to the N plus 1. So this
will give an admissible sequence.
So how am I going to partition it. This is the whole -- all right. Let me tell you how to choose
the first piece, and then I'll tell you how to choose all the pieces.
So we choose -- okay. And this set C has some value delta. Remember, this value delta is just
an upper bound of the diameter of C. Actually, it's an upper bound on the radius of C, not the
diameter. In other words, C is contained in some ball of radius delta. But this is not a
[inaudible].
Okay. So now choose T1 in C such that the following quantity is maximized. Pick your
functional. Look at the ball around T1 of radius delta over R squared. Okay. R is some
constant, which is bigger than 20, and such that this is satisfied. Just think about R as a constant.
It is a constant. Okay. Such that this intersected with C is maximum.
Okay. In other words, cut out the biggest piece you can where big is defined by looking at this
small ball around the thing. Okay. Okay. Expect -- well, I said choose T1 such this happens.
Here's the whole trick of the proof. And set C1, which is the first piece of our partition, to be -- and this is the -- this is where all the magic happens. I mean, you won't see the magic now. But
you'll see it soon. And set C1 to be the delta over R ball around T1.
So what we do is the following. We first choose some point T1 which maximizes this amount.
Okay. So what am I looking at to maximize [inaudible]. I'm looking at this delta over R squared
ball around T1. But then once I've chosen T1, I actually cut out the delta over R ball. So I
actually cut out this bigger ball [inaudible] delta over R.
Okay. That's how I choose T1. So this is a delta over R squared ball, this is cutting out the delta
over R ball. All right. And the value of this set will be delta over R, which is of course an upper
bound on its radius because it was cut out. All right.
Okay. So now we just keep going. So in general let's let D sub-L be the amount of space that's
remaining after we've gone L steps, so it will be C minus everything we've cut out so far. Okay.
And we'll choose T sub-L, the next point in D sub-L, to again maximize the same sort of
quantity. It maximizes the delta over R squared ball intersected with what's left.
And finally you put C sub-L equals -- again, we maximize according to the delta over R squared ball
but we cut out the delta over R ball.
>> CL plus 1?
>> James Lee: Yeah. Okay. Good call. Yes. CL plus 1. And probably TL plus 1, if we -- this
is 1, this is 1. All right. Okay. Okay.
So let the [inaudible] okay, now we go to -- we select the next point T2 such that this ball is
large, cut this out, maybe the next point T3 looks like this, but this is -- happens to be pretty
large. We cut this out. And we keep going. But now we want -- we only want to cut out 2 to the
2 to the N pieces. And we might get screwed up and we might -- I mean, we might not exhaust
the space before we get 2 to the 2 to the N pieces.
So except -- so let's -- okay. So I should specify here. Let's let -- let's let M here -- I just want
to -- I don't want to write 2 to the 2 to the N over and over again. So let's let M be 2 to the 2 to
the N. So this M is number of pieces. So we keep going except that -- dot, dot, dot, dot, dot,
except that C sub-M -- okay. I'm not going to write down here. Dot, dot, dot, dot, dot. Except
that C sub-M is actually just going to be D sub-M.
So, in other words, when you get to the end, you've got nothing left to do, so we're cutting, we're
cutting, we're cutting, we went, we finally got to the Mth point, again, it was chosen to maximize
this ball. But now what can we do. We just cut out the whole set. So this is T sub-M. I mean,
this is T sub-M. This is the last set. Okay.
>> [inaudible]
>> James Lee: Good.
>> [inaudible]
>> James Lee: I thought I got it all plus 1 though. Okay. Good. Okay. All right. That's the
whole -- okay. That's it. We're done. Except I have to tell you -- okay. Obviously we can't
reduce the -- the value here is now delta. We didn't reduce the diameter, so the value [inaudible]
delta. All these pieces now have value delta over R, and this piece has value delta.
And that specifies the entire partitioning, because I told you how to break up one piece, now you
could just keep going on and on and on. Okay.
The claim is that this partition satisfies this lower bound. All right. So now we need to get to
the -- all right. That's the whole partitioning. It's really quite simple.
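Just to make the cutting step concrete, here is a minimal sketch of it in code -- purely an illustration, not anything from the talk. The names partition_piece and ball are made up; F is assumed to be given as a callable on sets of points, d as the canonical metric, and r is the constant from the talk (think r = 20).

    def ball(center, radius, points, d):
        # points of the current remaining set within the given radius of center
        return {t for t in points if d(center, t) <= radius}

    def partition_piece(C, delta, n, F, d, r=20):
        # Split a piece C of value delta at level n into at most M = 2**(2**n)
        # pieces: repeatedly pick the point whose small (delta/r**2) ball has the
        # largest F-value among what is left, but cut out the larger (delta/r) ball.
        M = 2 ** (2 ** n)
        remaining = set(C)                       # D_l: what is left after l cuts
        pieces = []
        for _ in range(M - 1):
            if not remaining:
                break
            t = max(remaining,
                    key=lambda s: F(ball(s, delta / r**2, remaining, d)))
            piece = ball(t, delta / r, remaining, d)
            pieces.append((piece, delta / r))    # cut-out balls get value delta/r
            remaining -= piece
        if remaining:
            pieces.append((remaining, delta))    # the leftover piece keeps value delta
        return pieces

Running partition_piece on every piece of A sub-N then produces A sub-N plus 1, with at most 2 to the 2 to the N times 2 to the 2 to the N, which is 2 to the 2 to the N plus 1, pieces in total.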
It's not clear right now, but it was actually chosen -- the partitioning is chosen so as to make this
tree as balanced as possible. Okay. You might not think it's balanced because you say, well,
why would you -- if you look -- you're cutting out the biggest pieces, so you might leave behind
very little. But -- well, you'll see what comes up. Actually this is going to tend to be the biggest
piece. Okay. That's not even true, but let's see what happens. Okay.
So now we're going to go on to the analysis of this partition. So Talagrand's analysis involves
defining five quantities -- I'm not going to do this; I'm just telling you -- it involves finding five
quantities that satisfy seven equations. And then verifying that with every possible choice all
these things are -- are remained satisfied and then summing something at the end, which is -- it's
hard to understand. Okay. This proof is going to be understandable hopefully. Okay.
So let's -- here's the -- here's the idea. We can of course think about this whole partition as a tree.
So looks like it's a tree. And when I draw this tree, I just want to use -- I'm going to use one
convention that the leaves of the tree -- I mean the children go from left to right.
So if we indeed cut out M pieces at this level, the last piece, this giant -- it's a giant sucker over
here -- is the -- is going to be the rightmost piece. Okay.
So okay. So that gives us a tree. And on the nodes of the tree we have -- I mean, we have like
values, so, you know, we have values like delta, delta over R, and so on. Okay. Corresponding
to what's going on. All right.
And what the -- the final thing we can do is there's a natural value to associate to every edge in
the tree. The value of this edge is delta times 2 to the N over 2. Okay. So if this is level N,
which means we're using 2 to the 2 to the N points, the value of this edge is going to be delta
times 2 to the N over 2.
If we do that, okay, then here's what I claim. Then I claim that this quantity we care about,
diameter A sub-N of T, all right, I claim that this is at most -- okay. I know this is a factor of 2,
but it doesn't matter. And I hope nobody gets really upset about this.
So first of all it should be -- I didn't say it, but let's assume that here -- again, it doesn't really
matter, but just for simplicity, let's assume that T is finite. The main thing that we're trying to
prove follows from the finite case just by an easy -- well, at least for separable processes, but -- okay. But just assume that T is finite. So eventually the leaves of this tree are just singletons.
We eventually just get singletons at the end, and we stop.
So the claim is that -- okay. I hope this is clear what it means. We've given every edge in this
tree a value. So we can look at a root leaf path in this tree, and it has some value, which is the
sum of the edge length along the path.
The sum of the edge length along the path is essentially this value. Essentially 2 to the N over 2
times the diameter, except for the fact that we said this is not actually diameter, this is just an
upper bound. So it's an -- so this upper bound is this value. Okay. So now here's the whole
game. [inaudible] told me not to use red, although the red got better.
>> Explain the [inaudible].
>> James Lee: Well, it's 2 to the N over 2, but 2 to the N over 2 is the square root of the log of the number
of points. And square root log, as you see, is an important thing for us, right? So that 2 to the N
over 2 is square root log. Okay.
I'm just really reformulating this bound in terms of this tree. If we think about this tree and we
give the edges this length, then the supremum root leaf path is bigger, is at least this.
So our goal now is to show that F of T is at least the value of any root leaf path. In other words,
my goal is this. You give me root leaf path in the tree, I show -- I prove to you that F of T is at
least that value. That will prove that it's at least a sup, which proves it's at least this. Okay. This
is the whole game.
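In symbols, writing \Delta_N(t) for the value attached to the piece of \mathcal{A}_N containing a leaf t (an upper bound on its radius, hence on half its diameter), the edge leaving that piece has length 2^{N/2}\,\Delta_N(t), so the value of the root-leaf path ending at t is roughly

    \sum_{N} 2^{N/2}\,\Delta_N(t) \;\ge\; \frac{1}{2} \sum_{N} 2^{N/2}\, \operatorname{diam}\big(A_N(t)\big),

and the goal is to show that F(T) is at least a constant fraction of the value of every root-leaf path.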
So now we need two properties. I'll stop using red. We need to make two observations. Let's
keep the -- oh, no, we have the -- we don't need the corollary anymore. We just need to make
two observations about this tree. And then -- let's see. Oh, yeah, we're in good shape. Okay.
Observation number one is that only right turns in this tree matter. So if we want to compute the
value, we only have to look at right turns. Okay. So why is that. Let's see. Okay. So right -- and when I say a right turn, I mean a turn like this which corresponds to having chosen this -- having chosen this -- just everything remaining and keeping the parameter value at delta.
Okay. So why is it -- well, look what happens anytime you make a turn that's not a right turn.
Look what happens here. At this level you've got value 2 to the N over 2. I mean, this is times
delta. Right? What's the value of this. Well, since it's not a right turn, we know that -- okay,
so -- so since -- let's say since this was not a right turn. By -- when I say right turn, I'm referring
to -- look at the board -- the rightmost child.
Since this is not a right turn, if this was delta, this goes down to delta over R, which means the
value I get here is only 2 to the N over 2 times -- sorry. 2 to the N plus 1 over 2 times delta over
R.
Okay. Now, suppose I again don't make a right turn. So let's suppose this wasn't a right turn, it
was another thing here. Well, then the value here is delta over R squared which means that at the
next level the value is going to be 2 to the N plus 2 over 2 times delta over R squared.
Now, R is a number that's bigger than 20. So taking these non-right turns is a geometrically
decreasing sequence as we go. So actually basically if we take a right turn like this and then a
sequence of non-right turns, the value you get along here is just comparable to the value you got
here. So we only need to count right turns.
>> [inaudible] look at one path [inaudible].
>> James Lee: No, no, no. Because you might -- you might venture this way in a tree because
you know that later on you're going to get to take a lot of very nice right turns.
Taking the right -- you might -- you might take a right turn and then realize you have nowhere
else to go, whereas you might want to like venture -- so you can -- optimizing all the way down
so that you take the most expensive right turns. I mean -- okay. So -- but the point is that
considering the value of a root leaf path, I claim we only need to consider the value over right
turns.
Okay. So, in other words, I'm going to just think about weight 0 being on everything except
these edges. That's the first reduction. So let's write down only right turns matter. And the
second property is that in fact if you take a sequence of right turns, only the last one matters.
Because, what, let's look what happens in a sequence of right turns. So these are all right turns,
which means that the delta parameter stays the same every time.
But now what's the value? This is delta times 2 to the N over 2. This is delta times 2 to the N
plus 1 over 2. This is delta times 2 to the N plus 2 over 2. It's a geometrically increasing
sequence so that only -- you know, okay, let's say this one is not a right turn. Only the value of
the last right turn matters. So, in other words, when I compute the value of a root leaf path, since
I'm only trying to get things right up to constants, I only need to have -- I only need to add up the
values for the last right -- for every last right turn in that path.
Again, non-right turns are geometrically decreasing, and if I take a sequence of right turns, it's
dominated by the last one. So this -- okay. So -- and in fact only the last right turn in a
sequence -- in a sequence of right turns matters. All right.
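Both observations are just geometric series, roughly like this. Starting from a piece of value \delta at level N, a run of non-right turns contributes edge lengths

    2^{(N+1)/2}\,\frac{\delta}{r} \;+\; 2^{(N+2)/2}\,\frac{\delta}{r^2} \;+\; \dots \;=\; 2^{N/2}\delta \sum_{k \ge 1} \Big( \frac{\sqrt{2}}{r} \Big)^k,

which for r \ge 20 is a small constant times the preceding edge length 2^{N/2}\delta; while a run of right turns contributes 2^{N/2}\delta + 2^{(N+1)/2}\delta + \dots, a geometrically increasing sum dominated, up to a constant, by its last term.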
With these two things set up -- all right. With these two things set up, let's -- okay. Let's just
continue here. We're ready to do the analysis. Okay. And the analysis is not going to be very
difficult, but here's -- so here it is. Okay.
And I apologize for the name of the following thing. If you know of a better name, like if you
think of something better that one could actually [inaudible] let me know. So this is the snake
poop game. You'll see why it's really the only appropriate name for this.
>> [inaudible]
>> James Lee: Okay. So again we want to prove that F of T is at least the value of any root leaf
path, and we know that to calculate this value we only need to look at the values of the last right
turns along the sequence.
Okay. So here's what we're going to do. If you give me this tree, let me define -- okay. So this
tree had values on just the edges. But I want to put values in the nodes as well, and you'll see
why this happens in a second.
So the value in a node is just -- so every node -- this is a partition tree. Every node corresponds
to a set. So the value on a node is just -- if this is set S is just the functional applied to the set.
That's the value ->> [inaudible]
>> James Lee: For the edges. Not for the nodes. We had to -- it doesn't -- I mean, it doesn't -- oh, you mean these values. These values stick around. The diameter value sticks around. But
now let's call it a reward.
>> [inaudible]
>> James Lee: No, no, it's just that we have diameter values. I agree. But these are different.
So these things still have diameter values, but let's give them -- let's call them rewards. You
want to collect these things. All right. I don't know. Some kind of other value.
And the edges, instead of giving the edges value delta times 2 to the N over 2 -- well, look, I'm not
going to get delta, I'm only going to get -- oh, by the way, we -- I mean, you could swallow it into
the R, but we should put R here just to be clear what's going on. The separation -- we didn't have
R before. Okay. So we can put C. Never mind.
So this C will in general be a small number. It could be 1 over 100. Instead of having the edges
have value 2 to the N over 2 times delta, let me put the edges to have value C times 2 to the N
over 2 times delta, where it's this C. Because I know I'm not going to be able to get this much if
I apply in inequality. I'm only going to be able to get C times this much.
So the value of an edge here is -- if this node sort of had diameter delta, then the value of an
outgoing edge is C times 2 to the N over 2 times delta.
Again, if I get a -- I mean, if I can show that F is at least the value of any root leaf path here,
again, the value just means along the edges, then I just lost a factor of C.
So, in other words, changing their edge values didn't affect much, except for the fact that it's
going to be a little bit helpful.
Okay. So now given any tree with values like this, you can, like with rewards, like think about
having a subset of the tree. So like choose some vertices and some edges. Okay. Now I can
sum up these values. So there's some reward associated with this.
My goal now is to show that F of T is at least -- so you give me some root leaf path, so your root
leaf path goes like this. I would like to show that F of T is at least -- is at least this value, which
is the value of all the right turns in the path. If we can do that we're done. What I want to be able
to do is write down inequality on trees. Like one tree with some markings is greater or equal
than some other tree with markings.
So what I'm going to prove is F of T is at least this. This is my goal. Okay. I'm not going to be
able to prove it. In fact, what I'll prove is that three times F of T is at least this. Okay. And this
is where the whole trick is going to come in. All right. So that's it. So we're going to prove that
three times F of T is at least this. You give me your path. This is what I'm going to prove.
Okay. So we need to start. So let's start somewhere. All right. So now we're at the top of the
tree. And suppose you tell me the first two steps in your path. So your path goes like this. All
right. So now I'm going to start -- I'm going to -- I get to start -- I'm going to spend my three
times F of T. I'm going to spend it in the following way. I'm going to mark this node and this
node and this node. These are the first three steps in your path.
Now I can -- now three times F of T is at least this, because this node has value F of T. And
by the subset property this node has value at most F of T and this one has value at most F of T.
So I can -- so I start the game like this. Now okay. And then the whole idea of the game is that
you're going to reveal to me the next step in your path, and I'm going to have to respond -- I'm
going to have to say that sort of I can choose different rewards such that this tree is greater than
the next tree. Okay. So let's look at an example. Okay.
So this is -- okay. So let's look at this example, first of all, which is a simple one. In this
example you can hear -- what I'm going to observe is that since this is not a rightmost -- this is
not a right edge, so I don't need to take -- I don't need to get this. I don't need to take care of this.
So in this case my move will just be the following move. I'll just go like this. By the subset
property I can make this move. I mean, this node is less -- costs less than this node. So I can
make this move. And I -- this move was easy. I didn't need to get anything because this was not
a rightmost turn. Okay.
>> [inaudible] the value of the tree [inaudible].
>> James Lee: Right. Because I'm going to have a sequence of inequalities. This sort of -- maybe I should do it this way just for this one step. This -- three times this is at least this, and
this is at least -- let's draw the same thing. So this step was easy. This was the step I did here.
Okay. And so this is the easy case. If this top edge of the -- at all times I'm going to have three
colored nodes like this. If this top edge was not a right edge, then actually I don't need to do
anything and I can just make this easy move. This move is easy because this -- this node -- the
value of this is greater [inaudible] the value of this. So this move we can make. This was an
easy case.
Let's look at the -- [inaudible] looks skeptical, so let's look at the -- let's look at the -- everybody
remembers the lessons. All right. Okay. So let's -- so this is not the hard case. The hard case is
when we need to poop. All right. Okay. So the hard case is if the path looks like this, it's the
last -- so -- okay. So there's -- there is a rightmost turn. So our current state -- again, we're
somewhere in the tree. Our current state looks like this. We've marked this, we've marked this,
and we've marked this. Okay. So now our -- again, you're specifying the path to me, and I'm
just making sure I can take care of anything.
Now, in this case -- okay, so now you specify to me the next -- the next -- okay. You want to
make this move. I have to make this move.
>> [inaudible] reward always going to be on the path?
>> James Lee: Yeah. The reward is always going to be on the path, and I'm going to -- every
time that I'm about to leave the last rightmost edge in the sequence, I'm going to have to get
credit for it. I'm going to mark that edge as well so that eventually I end up in this situation,
where all the last rightmost edges are marked.
In this step, this was not a rightmost edge, so I didn't care about marking it. I just kept sliding -- I just kept like --
>> [inaudible]
>> James Lee: Yeah, yeah, yeah. But no, no. But I need to get credit now. I need to -- I need
to move the snake so that the head -- that the head is here. And what I need to -- this is the
pooping part.
>> [inaudible]
>> James Lee: Yeah. I need to get the value of this edge.
>> So what is the value of the [inaudible]? How does F play a role in the reward? You just
collect the reward on --
>> James Lee: The -- every subset of vertices and edges has a reward, which is just the sum of the
values. So far in this picture you didn't see any edges getting a reward. Now I'm going to -- at
the end of the proof, I don't care about the vertices anymore. I just care about the edges that I
marked. But these vertices are going to help me pay for edges. So I initially invest three times F
of T in three vertices. And now as these vertices slide down the tree, they're going to help me
pay for edges.
So here's the important -- here's the -- I mean, this is really the heart of the matter. Basically you
can assume that all the rightmost -- all the last rightmost turns have been paid for inductively,
and now the snake is about to slither past this rightmost turn. We need to pay for it. That's the
pooping part. Because it's the end of the snake. Okay. Look, it still seems like the best analogy.
If you don't like it, come up with a better analogy. But here's the -- okay.
So we need to slide the snake down and also mark this edge but still have it that the next
configuration is at most the cost of this configuration. So how do we do it. Well --
>> [inaudible] configuration?
>> James Lee: It's just the sum of the marked edges and vertices.
>> Marked edges and vertices.
>> James Lee: Yeah. That's the value of the configuration. Right? We're moving from our
initial configuration here to this configuration, always decreasing the value. So at the end we
know that three times F of T is at least this.
>> [inaudible] sum of two edges and three vertices?
>> James Lee: No, no, it will be a sum of three vertices and all the edges that we've encountered
that are rightmost --
>> Last rightmost --
>> James Lee: Last rightmost edges.
>> Okay.
>> James Lee: I mean, the questions are great, because it's not -- I mean, it's still -- I mean, it's
a -- but see -- okay. Yeah. Again, in this case nothing interesting is going on. We can just keep
slithering because we don't need to mark anything. This is where all the action is going to
happen. We need to pay for this last rightmost edge.
So the first case I want to do, because it contains all the ideas, is the case when -- is when the
next place you want to go is not a rightmost edge. Okay. You want to go here. So let's say -- let's look at the values of these nodes. This one is delta. This had diameter delta. This was
rightmost edge, so it stayed at delta. This was not a rightmost edge, so it went to delta over R,
and this is not a rightmost edge, so it went to delta over R squared. Okay.
So -- okay. So let's see what happens. So first of all now I want to apply my inequality here on
these balls. So what do I get. I want to apply the inequality. So from the inequality, first of all, I
get this term, which is if you see -- I get C -- this term is C times alpha times square root log M,
square root log M is 2 to the N over 2. So I get this much. So in fact I'm going to --
>> [inaudible]
>> James Lee: What's that?
>> Alpha and delta --
>> James Lee: Oh, sorry. Yeah. This is -- yeah, I should put delta. I mean, alpha equals delta
in this demonstration. Okay. So I know that the value of this set is at least -- basically I can
make this edge and get rid of this, and I also get -- what do I also get. I also get the minimum of
B of TI, delta over R --
>> [inaudible]
>> James Lee: Okay. Delta over R squared. It's the minimum of the F, I guess. Okay. So you
have to say why is the delta over R squared. Because the separation between these points, if I'm
at delta, the separation between these points is delta over R. So that's why. So this alpha over R
is delta over R squared. So I get this edge value plus I get this. Now, the whole idea is I want to
use this to pay for this. If I can prove that this value is at least this value, then I can put the next
thing here and now I -- and I've marked this edge and I can keep going.
So now why is it the case. So the first thing to observe is that the minimum here actually applies
to this vertex T sub-M. Because the order in which we chose these vertices was in terms of these
balls being decreasing. So this little -- this small ball has more weight than this small ball weight
and this small ball has more weight than this small ball. So this minimum actually just applies -is just -- is actually BTM delta over R squared.
>> F of.
>> James Lee: F of that. Yes. Okay. All right. Okay. So that's the first thing. But this T
sub-M was chosen so that among all the pieces in this set here it had the maximum delta over R
squared value.
Since this node -- where is the -- oh, yeah. Since this node is contained in a ball of radius delta
over R squared, this value -- this F value is bigger than this F value. Because this T sub-M was
chosen so that its delta over R squared was the maximum of everything in this set. So that means
that -- that means that this value, F of this, is at least -- is at least the value here.
In fact, it's at least the value of any delta over R squared ball coming up in this tree. Also any of
the other ones. So that's how you move the -- that's how you move the token and pay for the -- I
mean, the snake moved on and left something behind. That's how you pay for this right turn.
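In symbols, and only roughly (ignoring exactly which sets everything gets intersected with): the centers T_1, \dots, T_M chosen while cutting up a piece of value \delta are \delta/r-separated, since each new center lies outside the \delta/r-balls already removed. So the growth inequality with \alpha = \delta/r and M = 2^{2^N} gives a term

    \frac{c\,\delta}{r}\,2^{N/2} \;+\; \min_{i \le M} F\big( B(T_i, \delta/r^2) \big),

where the first part is what the reweighted edge is worth. The minimum is attained by the last ball B(T_M, \delta/r^2), because the greedy choice makes the F-values of these small balls non-increasing in the order they were picked; and T_M maximized F of its \delta/r^2-ball over everything still uncut, so that ball's value dominates the F-value of the node the snake's head moves onto.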
Okay. And then there's only one more case which is I said, you know -- the other case is what if
this is a right turn. So that's not any conceptually -- any more conceptually difficult. Let's just
do the picture now. It's exactly the same thing. Now the picture is -- okay. We had a right turn
like this.
Then there's a non-right turn because we only need to pay for this if it's the last right turn. But
then you chose to go down a right turn the next time instead of not a right turn. So now I let you
keep going and I'll just -- tell me when you stop making right turns. Okay. So you keep making
right turns for a long time. Okay. Eventually you stop. Good.
So now the idea is I do the same thing. So we started in this configuration. Okay. Again, as
before, I use this to pay for this, plus I get a little bit extra, which is this value. And now I'll just
observe that, I mean, how do the delta values go. This was delta, this is delta, this one is delta
over R. Now it stays delta over R for a long time until finally you make a non-right turn and this
one is delta over R squared.
Well, now the same argument applies. This delta over R squared ball must be bigger in value
than this delta over R squared ball. So, again, we can just move this down to -- down to here.
And of course we need to get the things. But now these can be moved for just -- in the tree we
did before. You can always move these things down the tree to be here and here. Because
moving down the tree only decreases value.
>> [inaudible]
>> James Lee: It was constipated.
>> So if things -- I see. So if it actually ends before you make the right turn, then I guess there's
nothing to pay.
>> James Lee: So you're saying -- you're saying what if we -- what if eventually we just stop at
the last right turn. Okay. So the last -- it's true that the last -- the last right turn doesn't -- we can
always use one of these tokens to pay for the last right turn. I mean, if this -- if this was -- all
right. We just need to pay for this thing, how do we pay for it. Well, just move it here, and
then -- I mean, then you can just pay for it automatically. So you can always pay for the last
right turn.
Okay. So I'll draw the little box. But that's the end of the proof, that the functional is at least the
value of this partition.
And the -- and, again, let's see. We're going to finish in an hour. That's good. So we can try
to -- now that we've seen it we can try to figure out why -- you know, the whole idea was this.
We started with a space, some diameter delta, and then we partition it into pieces of diameter
delta over R. Okay. A bunch of these pieces, diameter delta over R. All right.
Now, okay, so this -- of course this partitioning gave us an upper bound at this level, but the
lower bound has a deficit, right? The lower bound has this deficit that it loses this factor of R.
So the balls that we get in our lower bound here, we don't get these -- we would love if the lower
bound was sort of like all these giant things, but the balls we get from a lower bound only look
like this.
Now, this is a really crappy state to be in if all the edges -- if there was like a ton of interesting
stuff here, because the lower bound would completely miss it. Like we could lose all the space
not contained in these blue dotted balls if we just applied a lower bound to this.
So we have to hope that someone was paying more attention at a higher scale so that if we miss
something in here somebody would have caught it beforehand. But how are we going to ensure
that happens? We do this by look -- I mean, if we want somebody at a higher scale to be paying
attention, then we should be paying attention to what's going on at lower scales. So that's
somehow what this delta over R squared versus delta over R thing is doing.
You optimize so that you make sure you're taking care of the lower scales, but of course you
have to partition -- I mean --
>> [inaudible]
>> James Lee: Where was this used in the proof? This was this -- this was this -- the masterful
step of, you know, the proof when we managed to take this to pay for this next thing down here.
This thing only gives us minimums. We got a maximum, right? We said that this value was
greater than anything that came down here, not just the minimum.
So somehow this was because sort of when we chose this vertex we were looking ahead to make
sure -- at this step sort of there could have been a lot of loss, but we made sure that we covered it
at the next step. It's exactly taking care of this situation. There could be lots of stuff -- but lower
bound at this step is only going to see what's inside the green ball. So there's a lot of -- I mean
blue ball. So there's a lot of stuff that it's missing. We need to hope that if we're missing stuff
there then somebody at an earlier level who sort of had a better viewpoint of what's going on in
the space was taking care of it.
And to do that, I mean, yeah, as I said. Sort of we make sure that we're taking care of the next
scale. So yeah. Pretty beautiful proof [inaudible].
[applause]
>> When is the movie coming out?
>> [inaudible] can be applied to any of it, to other functional [inaudible]?
>> James Lee: Okay. So let me say two things. Oh, aside from the supremum. Somehow in
this field the supremum is the most interesting thing people study. But it has been applied to
nonGaussian processes, like P stable processes, or in general sort of any kind of process where
you have some kind of exponential tail with some power. You can do something similar.
Although instead of having one distance a lot of times you get a family of distances that comes
up.
So let me just say -- I told Jeff I would say something about this. So let me just say why my
selfish motivation for understanding this proof, because this proof has the weird property that
maybe the more natural thing to do is: why do you stop at a bounded number of pieces? Just keep
cutting out delta over R balls until you exhaust this space, and then the next step could cut out
the delta over R squared balls and keep going like this.
Okay. So that's how the original proof was done. But this proof has some nice features that -- I
mean, that come up in analyzing. Let me just say this problem that Talagrand worked on for
quite a long time, which is the Bernoulli conjecture.
As we said in the first talk -- I'll stop in five minutes. As we said in the first talk, we can
consider a Gaussian process in a different way. Just take T to be a subset of L2, so just a subset
of the sequences where the sum of the squares is bounded. And then define your process in the
following way. Okay. Also take an infinite family of -- so these are IID normal 0, 1s. And then
your process is just -- does this. Okay. So for a separable Gaussian process, this is a generic
construction. This gets you anything you want. So the index here is T.
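In symbols, the construction on the board is presumably: for T \subseteq \ell^2, take (g_i) i.i.d. N(0,1) and set

    X_t \;=\; \sum_{i \ge 1} g_i\, t_i \qquad (t \in T),

so that \mathbb{E}(X_s - X_t)^2 = \|s - t\|_2^2, and the canonical metric is just the \ell^2 distance.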
So the question is what if you consider instead of Gaussians here something very natural, which
would be the Bernoulli process, where these things are just IID, you know, uniform plus/minus 1
random variables. And instead of -- so instead of trying [inaudible] controlling the expected sup
for these Gaussians, what if you tried to control the expected sup of these random sums of the
signs.
So -- so okay. So there are two observations that come up there. So, I mean, how did we start
in the Gaussian setting. We started, we came up with a natural upper bound, which is this chaining,
and then we tried to match it. So what's a natural upper bound for the Bernoulli process? Well,
one natural upper bound is that -- I mean, I guess I'll leave this as an exercise. That for some
universal constant, which is at most five, I mean, I think it's square root pi over 2. But I have to
think about it for a second.
One thing you can do just by a convexity argument is observe that the expected [inaudible] for the
Bernoulli is always bounded by some constant times the same thing for the Gaussians. It makes
sense. The Gaussians have tails and the Bernoullis don't, so they tend to be bigger.
So this is one way of bounding the process. So in fact if we -- right. Okay. So let's define -- if
we define sort of B of T in the same way we define G of T, so B of T is the expected supremum
of the sum of the epsilon ITIs, this says that -- this just says that B of T is at most a constant
times G of T. That's one way of getting control on the expected supremum. The Bernoulli
supremum is at most a constant times a Gaussian supremum.
All right. But then there's another way of upper bounding a Bernoulli process that doesn't apply
in the Gaussian setting, which is just this second way of upper bounding it, which is that this is at
most the maximum L1 norm of any vector in the set. Of course, the maximum value of this sum
is if all the signs of the epsilon Is coincide with the signs of the TIs. And then you can upper
bound it by the L1 norm.
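Writing b(T) = \mathbb{E} \sup_{t \in T} \sum_i \varepsilon_i t_i and g(T) = \mathbb{E} \sup_{t \in T} \sum_i g_i t_i, the two upper bounds just described are, roughly,

    b(T) \;\le\; \sqrt{\tfrac{\pi}{2}}\; g(T) \qquad \text{and} \qquad b(T) \;\le\; \sup_{t \in T} \|t\|_1,

the first by comparison with the Gaussian process (conditioning on the signs and using Jensen), the second because |\sum_i \varepsilon_i t_i| \le \sum_i |t_i| pointwise.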
But this doesn't -- I mean, so -- you know, okay. So these are two ways to upper bound it. And
then finally you can combine these two ways together in the following sense. If you put T inside
a set T1 plus T2, so this T1 plus T2, this is the Minkowski sum. So this is the set of all A plus B
such that A is in T1 and B is in T2. If you put T inside a set like this, then it's immediately clear
that you have this.
In particular you can mix the two kinds of bounds together. So you can -- okay. So now up to a
constant you can write them like -- okay. So you can mix the Gaussian and the -- sort of the -- oh, good name -- the L1 upper bounds together according to some decomposition like this.
And Talagrand's conjecture, the Bernoulli conjecture is that this is a universal way of upper
bounding the process. So for every Bernoulli process, so for every T there exists T1 and T2 such
that T is contained in T1 plus T2 and in fact the B of T value is precisely -- I mean, constants
given by what's going on in the Gaussian setting for T1 plus the L1 bound for T2.
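In symbols, the conjecture is roughly: for every T \subseteq \ell^2 there exist T_1, T_2 with

    T \subseteq T_1 + T_2 \qquad \text{and} \qquad g(T_1) \;+\; \sup_{t \in T_2} \|t\|_1 \;\le\; C\, b(T),

so that the mixed upper bound coming from the Minkowski-sum decomposition is always tight up to a universal constant.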
>> What's the example that the GT is not a corresponding [inaudible]?
>> James Lee: Take a family of -- I mean, take your set T to be E1, E2, E3 and so on. So now
in the Gaussian case, the expected supremum is infinite. I mean, this is -- because it's the
supremum of an infinite number of IID Gaussians, but in the Bernoulli case of course the
supremum is 1. I mean, if you sum up one term, you get -- yeah. So in fact they can be
arbitrarily different.
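Concretely, with T = \{e_1, e_2, e_3, \dots\} the standard basis vectors,

    g(T) = \mathbb{E} \sup_i g_i = \infty \qquad \text{while} \qquad b(T) = \mathbb{E} \sup_i \varepsilon_i = 1,

since the supremum of infinitely many independent standard Gaussians is almost surely infinite, while every \varepsilon_i is just plus or minus 1.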
And of course -- I mean, of course this describes the whole heart of the problem, which is that
when I take -- when I take vectors T, which is very spread out, the Bernoulli sum, you know, by
the central limit theorem, tends to behave just as does a Gaussian sum. But if I take things that
are concentrated, then it sort of -- it behaves more like this bound or it can behave more like this
bound.
And now the problem is that the process could be a mixture of these behaviors at all scales going
back and forth and being -- you know, a different way of saying is that this process is rotationally
invariant. So if you rotate the set T, the distribution here doesn't change. Whereas of course I
mean this process is crazily aligned with coordinates. It doesn't have this rotational invariance at
all.
So, anyway, when you consider -- when you consider this process, it seems that the most natural
thing to do is instead of considering one distance you consider a family of distances. What's that
family of distances. You sort of think about truncating these vectors T, so they have bounded L
infinity norm. Once these things have bounded L infinity norm, then you can start to see some
kind of comparison with the Gaussian case.
But now you sort of need to consider all the truncations, you know, you truncate, you don't
truncate, you look what happens when you sort of -- when I truncate, I just mean like sort of cap
out the coordinates. You know, like make the coordinates have some maximum value by just
cutting off the tops of them.
And you can consider it sort of -- it seems that to understand this process you have to consider what
happens as this truncation parameter goes from infinity to 0 and you get this family of distances.
And this setting where you index things by the number of points instead of the distance is much
better when you have many different distances.
Because then you're always making progress. You're getting more and more sets as opposed to
like -- if you have a bunch of distances and you have different distances in every cluster, it's not
clear like -- I mean, is your diameter going down with respect to what distance or whatever. So,
anyways, this was my motivation for understanding Talagrand's new way of proving this.
Okay. That's all.
>> Yuval Peres: One more thing. Maybe you want to spend a minute saying how these -- how the gamma 2, the Talagrand functional, serves to replace the log N [inaudible] theorem.
>> James Lee: [inaudible].
>> [inaudible]
>> James Lee: It's -- I mean, we've already seen it. It's -- you combine chaining with the
[inaudible] we already have.
>> Okay. Let's just use that as a hint for anyone that wants to pursue it. And let's thank James.
[applause]