>> Yuval Peres: Okay. So good afternoon, welcome everyone. Today we have a double feature. At 3:30 Geoffrey Grimmett will be speaking; right now it's James Lee, who will tell us about Markov type and the multi-scale geometry of metric spaces. How well can martingales aim?

>> James Lee: Okay. So thanks for coming. The talk today is about a problem at the intersection of probability and the geometry of metric spaces. Let me begin by giving you a conjecture, which is the main object of the talk, or at least a special case of it. This is a conjecture of -- okay, I'll write the conjecture over here, but then I won't write below it, just in case the screen -- of Naor, Peres, Schramm, and Sheffield from 2004. And the conjecture is the following. First of all, one has a planar graph. The graph can be infinite or finite; you'll see that for this particular statement it doesn't really matter. You have a planar graph, and I'll use d_G to denote the path metric, the shortest-path metric on the graph. And then you also have a reversible Markov chain whose state space is a subset of the vertices, okay? So this is some reversible Markov chain on V, supported on a finite subset of the vertices. This is why it doesn't matter whether the graph is finite or infinite; I'll give you the quantifiers in a second. So you have this Markov chain, and it's important that this Markov chain is started according to the stationary measure. In fact, this is going to persist throughout the talk: the Markov chain is started at stationarity. Now the question one asks is: what's the average rate of drift of this Markov chain? So look at the expected distance in the graph between time zero and time T, square it, and compare it to the distance the chain goes in one step -- some inequality like this. So this is the kind of statement one wants: the expected squared distance after T steps is at most a constant times T times the expected squared distance after one step, okay? And the point is that C here should be a universal constant, independent of the chain, independent of the graph. C should be 10 or 100, okay? This is the conjecture: that there exists such a constant, such that all planar graphs and all Markov chains supported on their vertices have the kind of diffusive drift you would get, for instance, from a simple random walk on the path, right?

>>: Does the Markov chain have to be restricted to the edges of the graph?

>> James Lee: No, no. The only way the graph comes in is through the path metric in the inequality. So it doesn't have to walk on --

>>: [inaudible]

>> James Lee: So the metric is certainly given by the edges of the graph. It's the path metric, so you put some weights on the edges -- or you don't have to weight them if you don't want.

>>: But the Markov chain could be on any subset. Couldn't you just have an arbitrary metric?

>> James Lee: No, because you measure the distance in terms of the geometry of the graph. So --

>>: If you have an arbitrary metric space you can just put a path metric on it.

>> James Lee: There's a restriction on the graph. So let me give a couple of examples just to demonstrate some of the subtleties here.
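Before the examples, here is the conjectured inequality in symbols (a reconstruction of the board; Z_t denotes the position of the chain at time t, started at stationarity):

\[
\mathbb{E}\, d_G(Z_T, Z_0)^2 \;\le\; C \cdot T \cdot \mathbb{E}\, d_G(Z_0, Z_1)^2 \qquad \text{for all } T \ge 1,
\]

with C a universal constant, independent of the planar graph and of the chain.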
So if, for instance, you just considered some N-by-N grid and you did simple random walk, then no one is surprised that the inequality holds: after T steps you expect to be about square root of T away from the starting point. But suppose the random walk didn't sit on the edges, so it could do things like larger jumps. If all the jumps were the same length, you still wouldn't be surprised, because just by rescaling it would again be a random walk in the plane and the same thing would hold. It starts to get a little more difficult when sometimes the jumps are small, and every once in a while there is a huge jump. Now you can't rescale to get a simple random walk on the plane again. And this is already -- I mean, there is a fairly easy proof that it holds for this graph, just because it's a subset of the plane. But already it's not so easy to analyze from first principles when the jumps are allowed to vary in size. Okay, so that's one example to keep in mind.

Another example is the complete binary tree. If you weren't paying very careful attention to the definition, you might think the conjecture fails for the binary tree: if you start your random walk at the root, then simple random walk will of course drift at linear speed for about the height of the tree, okay? But the point is that this Markov chain is started at stationarity, and in the complete binary tree almost all of the stationary measure is at the leaves, at which point the random walk doesn't go anywhere. It just sits at the leaves, right? Okay, so these are two examples to keep in mind.

So now, first of all, let me tell you where this notion comes from, and then in a little bit I'll get into the proof. It goes back to a definition of something called Markov type, due to Keith Ball, from 1992. Okay, so let me define what Markov type is. A metric space has Markov type p -- this is some number greater than or equal to one -- if, okay, there are a lot of quantifiers, but here they are: if there exists some constant C such that (and I'll remove these quantifiers for most of the talk) for every finite state space, for every reversible Markov chain on the state space started at stationarity, for every map from the state space into your metric space, and for every time, we have this kind of inequality. So take a finite state space, take a map f from the state space into your metric space, take a reversible Markov chain on the state space, and ask for the same thing I wrote before for planar graphs, except now with a power p instead of the power p = 2. And before, I had the Markov chain walking around on the graph itself, whereas here it walks around on some auxiliary state space and then there's a map from that state space into the metric space. This is the right definition because -- well, it's the one that has the correct applications, and it also composes very nicely. But for most of the talk I'll actually omit this map f and just pretend that the random walk is walking around the metric space itself. I don't know of an example showing the map matters -- it could actually be that you could remove it.
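To pin down the quantifiers, the definition in symbols (a reconstruction; writing the constant as C^p is just a normalization, so that C behaves like a distortion): a metric space (X, d) has Markov type p if there is a constant C such that for every reversible Markov chain (Z_t) on a finite state space Omega, started at the stationary measure, every map f from Omega into X, and every time T,

\[
\mathbb{E}\, d\big(f(Z_T), f(Z_0)\big)^p \;\le\; C^p\, T\; \mathbb{E}\, d\big(f(Z_0), f(Z_1)\big)^p .
\]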
Actually, I suspect removing the map probably doesn't give an equivalent definition, because of some trivial examples. So for most of the talk I'll omit the map f, but this is the proper definition. In other words, the Markov chain has more state than just where it is in the metric space; it can also have some auxiliary state coming from this map f.

>>: Independent of the metric space?

>> James Lee: Say it again?

>>: So C --

>> James Lee: Oh, no, no. So C can depend on the metric space, but C cannot depend on any of this other data, okay? Whether a metric space has Markov type p -- the best p for which this holds is the Markov type of the metric space -- and also this constant: these are properties of the metric space.

And then, okay, so where does this concept come from? I mean, it's a fairly natural way to talk about drift behavior on finite subsets of a metric space, but it's inspired by something in the linear theory, in the geometry of Banach spaces. So suppose I have a normed space X, and then I have a bunch of vectors x_1, x_2, up to x_n in X. I could consider a random sum where the epsilon_i's are i.i.d. random signs. So I take some normed space, take a bunch of vectors, and randomly sum up the vectors with signs. And now -- okay, let's just use the square for the moment -- this is a random walk in the linear space. I can measure the distance I go after n steps of this random walk, and I can compare it to the sum of the squares of the lengths of the individual steps of the walk, okay? If a normed space X satisfies this for some constant C, independent of the choice of vectors and independent of the number n, the space is said to have type 2, okay? And there's a similar notion where you reverse the inequality: you get the notion of cotype. And if you change the exponent two to a power p, you get type p and cotype p. It turns out that these parameters, type and cotype, actually tell you a lot of geometric information about the Banach space you're in, okay?

So let me give you an example, and then I'll tell you what Keith Ball proved. For instance, the following is known as the Maurey extension theorem. Take two Banach spaces, X and Y, and let's also fix some subspace S of X. Then consider the following type of question. I have a bounded linear operator from S into Y -- a linear operator with bounded operator norm -- and I'm curious about when I can actually extend this operator to the whole space X such that it remains bounded. So this one is bounded and I want the extension to be bounded as well, okay? An extension in the sense that if I restrict the new map to the subspace, it's equal to the old map. So you can ask when such extensions exist, and Maurey proved that such an extension always exists given only information about how these random walks behave in X and Y: if X has type 2, so X satisfies an inequality like this, and Y has cotype 2, then for any bounded linear operator the extension always exists. And this kind of extension theorem is a very powerful thing; you can do a lot with it.
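For reference, the type 2 inequality in Maurey's hypothesis, in symbols (a reconstruction; the epsilon_i are i.i.d. uniform random signs):

\[
\mathbb{E}\,\Big\| \sum_{i=1}^n \varepsilon_i x_i \Big\|^2 \;\le\; C^2 \sum_{i=1}^n \|x_i\|^2 ,
\]

and reversing the inequality (with the constant on the other side) gives cotype 2.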
So what Keith was trying to do is, on the face of it, something that seems kind of ridiculous. There is this beautiful history in the non-linear geometry of Banach spaces of linear notions -- and this is a bunch of linear notions: linear spaces, subspaces, and linear maps -- having non-linear analogs that by all accounts one shouldn't expect to have, except for the fact that we happen to live in a world that's nice enough that somehow this kind of thing works out. And so what Keith was trying to do was to give a non-linear version of this extension theorem, purely in the category of metric spaces. So what do these things mean in the category of metric spaces? Suppose we have two metric spaces X and Y, and we'll again have this S, but now S is just a subset of X. Instead of a bounded linear operator, we have a Lipschitz map from the subset into Y. And now the extension question is whether this map extends: does there exist an extension of this map to the whole of X which is Lipschitz? For instance, it's a classical result of Kirszbraun that if X and Y are Hilbert spaces such an extension always exists, and in fact you can take the Lipschitz constant of the extension to be equal to the original Lipschitz constant.

Here's what Keith proved: if X has Markov type 2, and Y has Markov cotype 2 -- Markov cotype is not defined here, and I'm not going to define it for various reasons, but this is already interesting when Y is just a Hilbert space, and it does hold for Y a Hilbert space -- then this kind of Lipschitz extension always exists. I'm just setting this up carefully in case you really want to know what's going on. There always exists an extension such that the new Lipschitz constant is at most some constant times the old Lipschitz constant, and that constant depends only on the constant in the Markov type definition. So the point is that it's a quantitative relationship as well, okay? So you go from looking at the geometry of linear spaces in terms of these linear random walks, to this metrical definition, and there is somehow this beautiful analog of the linear extension theory in the non-linear setting. Okay, so this is just one example of the applications of this thing. Yeah?

>>: Is there an example that shows a metric space is not Markov type 2?

>> James Lee: Good. So let's talk about that -- I haven't yet given any example of a metric space that is Markov type 2 either. So what did Keith -- yeah, there was one problem with Keith's paper. It's a beautiful paper. He proved the following theorem: Hilbert space, so L_2, has Markov type 2, and in fact with constant C = 1 in the definition. I should mention, by the way, that Markov type 1 is trivial. If you take p = 1, the inequality follows with constant 1 just by using the triangle inequality, linearity of expectation, and the fact that stationarity means every step of the walk is distributed the same as the first step, the same as (f(Z_0), f(Z_1)), okay? So p = 1 is trivial; it just follows from the triangle inequality. Right. So this theorem seems great; the only problem was that Hilbert space was the only space for which Keith was able to prove Markov type 2.
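To spell out the p = 1 remark in symbols: by the triangle inequality, linearity of expectation, and stationarity (each pair (Z_t, Z_{t+1}) has the same law as (Z_0, Z_1)),

\[
\mathbb{E}\, d\big(f(Z_T), f(Z_0)\big) \;\le\; \sum_{t=0}^{T-1} \mathbb{E}\, d\big(f(Z_t), f(Z_{t+1})\big) \;=\; T\, \mathbb{E}\, d\big(f(Z_0), f(Z_1)\big),
\]

which is Markov type 1 with constant 1.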
So if this were really a hardcore non-linear generalization of type and cotype, then you would expect there to be other spaces besides just Hilbert space with this property. Okay, so I'm coming to your question. This was the situation until the work of -- I've already written your names elsewhere, so from now on this will be NPSS. Right, so they proved two things. First of all, they proved that L_p for p > 2 has Markov type 2, which is a very comforting thing, because those spaces have linear type 2. So if this is really a non-linear generalization, then you would hope that when you restrict to the linear category you get the linear theory back. So that's great. And they also proved that trees have Markov type 2. And for Yuval it was a very nice kind of -- so in the functional analysis setting, this was the most important open problem after Keith's work, to resolve the Markov type of L_p for p > 2. I believe they actually started just working on the problem for trees, and it turns out that trees are in some sense a bit harder than L_p for p > 2; once they solved the tree problem, this problem fell rather immediately. So there are other spaces with Markov type 2.

What's an example of a space that doesn't have Markov type 2? Okay, so L_1 -- and I'd even say something stronger: L_1 has only the trivial Markov type, Markov type 1. It has nothing better than 1, okay? Why is that? Just take the discrete hypercube sitting inside L_1, and take simple random walk on the cube, right? Now it's straightforward to see that the distance from your starting point to your ending point after, say, N/3 steps is at least N/10 with high probability. This immediately shows that the space has no non-trivial Markov type, because after T steps the walk has gone distance on the order of T -- so it is certainly not like square root of T -- and of course every step goes distance only one. So the right-hand side of the inequality is one per step, while the left-hand side, for T = N/3, is like N: the distance grows linearly in T, not any slower than that. Okay? So this shows that L_1 has only trivial Markov type.

And I guess it's a good point to mention one other thing about Markov type now. Suppose X and Y are metric spaces; let me write down what it means to be bi-Lipschitz. A map from X to Y is bi-Lipschitz if it preserves all pairwise distances up to a factor C -- all right, I'm going to cheat a very small amount here -- and the infimal such C is called the bi-Lipschitz distortion of the mapping. One thing that's fairly apparent is that Markov type is a bi-Lipschitz invariant, which means that if I have a metric space X that embeds bi-Lipschitzly into Y, and Y has Markov type 2, then X also has Markov type 2, okay? And that's just because, if you look at the definition, I can change all the constants and the statements remain true, as long as I change them all to universal constants not depending on the chain.
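In symbols, the inheritance just described (a reconstruction; the normalization of the distortion is an assumption): if f from X to Y satisfies d_X(x, y) <= d_Y(f(x), f(y)) <= D d_X(x, y), and Y has Markov type p with constant C, then for any chain

\[
\mathbb{E}\, d_X(Z_T, Z_0)^p \le \mathbb{E}\, d_Y\big(f(Z_T), f(Z_0)\big)^p \le C^p\, T\, \mathbb{E}\, d_Y\big(f(Z_0), f(Z_1)\big)^p \le (CD)^p\, T\, \mathbb{E}\, d_X(Z_0, Z_1)^p,
\]

so X has Markov type p with constant CD.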
So if I can embed one space into another space in a bi-Lipschitz way, and the new space has Markov type 2, then X inherits it from the new space -- and in fact in a quantitative way, right? If I embed one space into another with distortion D, then you have the same kind of inequality -- okay, this doesn't matter much because it was just a constant raised to the power p, but if I want the constant to change in proportion to the distortion I should put a power p here. The point is that it's quantitative: if the map has distortion D, then X inherits the Markov type of Y, but with a constant that has grown by a factor of D, okay? So in particular, you can conclude from this argument, for instance, that the Euclidean distortion -- the distortion required to bi-Lipschitz embed the N-dimensional cube into Hilbert space -- grows like the square root of the dimension. There are many ways to prove this; it's been known for over 50 years. But it gives you an example of how Markov type comes into play when you think about the bi-Lipschitz geometry of metric spaces. Okay, this is a very simple example, but by studying random walks you can really compare the geometry of two spaces using this kind of notion. Okay. Good.

Okay, so now let me return to the conjecture. I'll prove the theorem to you in a moment, but differently from Keith's proof: I'll follow NPSS, and then you'll see why the subtitle of the talk comes into play -- why martingales become the key object of study here. So ask questions if you have any at the moment. All right. So, coming back to the planar graph question: why hasn't the theory so far answered this question? Well, one reason is that we know there are planar graph metrics which do not bi-Lipschitz embed into Hilbert space. And they don't bi-Lipschitz embed into L_p for p > 2 either, so you can't use any of the embedding machinery we've talked about to solve this question. Let me tell you one thing we do know about planar graphs. Obviously we know a million things about planar graphs, so let me tell you one kind of embedding we know they do have, and then I'll tell you our main result. So, one more notion of embedding, and then we'll move on to martingales.

This is the notion of a threshold embedding. We again have two metric spaces X and Y, and we say that X threshold embeds into Y if there exists a constant K and -- now, this threshold embedding is going to be a collection of mappings, not a single mapping -- a family of mappings from X to Y, indexed by a non-negative real number tau: a family of 1-Lipschitz maps which satisfy the following. If the distance between x and y in X is at least tau -- well, if I had a single bi-Lipschitz map and the distance were at least tau, then the distance in the image would be at least tau divided by some constant. But here, only one of these maps is required to notice this. So in a threshold embedding we have this kind of condition -- these are all supposed to be taus, all right. So X threshold embeds into Y if there exists such a family of 1-Lipschitz maps, basically one for every scale of the space. There is no global map that gets all the geometry right, but at scale tau, the map at that scale -- call it phi_tau -- reflects that scale, up to a constant.
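In symbols (a reconstruction): X threshold embeds into Y with constant K if there is a family of 1-Lipschitz maps phi_tau from X to Y, one for each tau > 0, such that for all x, y in X,

\[
d_X(x, y) \ge \tau \;\Longrightarrow\; d_Y\big(\varphi_\tau(x), \varphi_\tau(y)\big) \ge \tau / K .
\]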
This is for all x and y in X: if the distance in X is at least tau, then the distance between the images should be at least tau up to a constant factor, and K here is a universal constant, okay? So this may look a little strange. There are plenty of examples: for instance, the complete binary tree does not bi-Lipschitz embed into Hilbert space, but it does threshold embed into Hilbert space. Actually, I won't do it now, but if you want, at the end or offline, I have a fairly simple description of a threshold embedding of the complete binary tree into Hilbert space, if you want to see what one of these things looks like. And actually, especially for a number of applications in computer science, understanding the relationship between threshold embeddings and bi-Lipschitz embeddings was very important, okay? There's an ongoing theme in many parts of metric geometry of understanding when the ability to control a space at every scale uniformly implies the ability to control all the scales simultaneously.

Okay, so this is the notion of a threshold embedding. And here is -- I shouldn't call it a fact, it's not obvious, it's a theorem -- planar graph metrics threshold embed into Hilbert space, and I'll add the word "uniformly." Uniformly means you can interpret this theorem in multiple ways: if you take an infinite planar graph metric, it threshold embeds into Hilbert space; if you take a collection of finite planar graph metrics, they all threshold embed with the same constant K. The constant is uniform; it doesn't depend on the graph, okay? And I won't go into it now, but along with this conjecture they asked about other spaces: doubling spaces, and even things like hyperbolic spaces, spaces of bounded [inaudible] dimension -- okay, that wasn't specifically asked there, but the point is that all of those spaces admit threshold embeddings into Hilbert space. So it's not just a planar graph question, although that's what I'm focusing on at the moment.

Okay, so this is one thing we know about planar graphs, and now here is the main theorem, which I'll try to present in the remaining time: if a metric space threshold embeds into Hilbert space, then the space inherits Markov type 2. Okay? For bi-Lipschitz embeddings this inheritance is straightforward, but it turns out that you need much weaker control on the geometry to get Markov type 2: just the existence of a threshold embedding is enough. And for any enthusiasts or junkies, you can generalize this: if you threshold embed into a p-uniformly smooth Banach space, then you have Markov type p. So there is a generalization beyond Hilbert space. But this is the most interesting case, because it has the applications: it shows that planar metrics, and doubling metrics, and hyperbolic spaces and so on, all have Markov type 2. All right. So now I want to give you some idea of how this theorem is proved. So now we'll go to probability.

>>: [inaudible]

>>: I thought that was for a universal C.

>> James Lee: Yeah, so this is -- yes. I guess this is something people do quantitatively. In other words, the Markov type constant only depends on the constant in the threshold embedding. And so the fact that this is uniform does imply that you get uniform constants. Okay, you're right. So it's weird.
It actually does imply it just by itself: you can take all the planar graph metrics and just put them all into one single giant planar metric space -- you don't even care if it's separable or not -- and then the statement for a single graph gives you a uniform constant. Okay. All right. So now let's start with something easy. Let's see, I have twenty minutes? Is that how long I have?

>>: Twenty-eight.

>> James Lee: Okay. Let's prove the fact, first proved by Keith Ball, that the real line has Markov type 2 -- but we'll prove it in a more complicated way. Okay, this is our goal. So recall the setup: we have this Markov chain whose state space is finitely many points on the real line, and this thing hops around, and we want to prove something like this with p = 2. Okay, so how can you do it? Well, there is one situation on the real line where we know we get this kind of behavior, and that's if we had a martingale, right? So suppose we have some real-valued martingale. Then, just by orthogonality of martingale difference sequences, the expected squared distance after T steps is exactly the sum of the expected squares of the individual steps, okay? So at least martingales satisfy the kind of growth we're looking for. Okay. Now, our chain is not a martingale, but maybe it's kind of close -- I mean, you started at stationarity, right? So if you run it for long enough it doesn't go anywhere; the center of mass stays at the same point because you started stationary. Okay. So the way to bring martingales into the fold -- and this is what NPSS did, based on some work of Lyons and Zhang, T. J. Lyons -- is to try to convert this Markov chain into a martingale.

>>: Which Zhang was it?

>> James Lee: T. S. Zhang. [laughter] Okay. Let's try to convert this chain into a martingale. All right, so the first step we can do pretty easily: we'll start the martingale at zero. Good. Now, at the next step we get into trouble. So let's try to define the difference sequence. If the world were as we might hope and expect, we would just define our martingale differences to be the differences of the values of the chain, you know. And if that were a martingale we'd be done, so we'd be really happy, okay? Because then orthogonality would hold and immediately give us our inequality. Of course, it's not a martingale. So, okay, let's just add the appropriate correction term, right? For a martingale you want the conditional expectation of each difference to be zero, so let's just force it to be zero. Okay? So the differences of our martingale are given by how the chain moves, minus the defect of our chain from being a martingale, okay?

Okay. So let me tell you the good thing and the bad thing. The good thing is that we can control the increments of this martingale in terms of the increments of the chain -- by the way, I won't write any lower, but tell me if I start to write too low. The square of a martingale increment is at most twice the expectation of the chain increment squared, plus twice the expectation of the correction term squared; and then, because you are stationary, this is at most four times the expectation of (Z_1 - Z_0) squared, okay? So I've given you a martingale, and at least I can bound its differences in terms of the differences of Z. All right?
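The construction just described, in symbols (a reconstruction): set M_0 = 0 and take the martingale differences to be

\[
M_{t+1} - M_t \;=\; Z_{t+1} - Z_t - \mathbb{E}\big[\, Z_{t+1} - Z_t \mid Z_t \,\big],
\]

which has conditional mean zero by construction. By (a + b)^2 <= 2a^2 + 2b^2, conditional Jensen applied to the correction term, and stationarity,

\[
\mathbb{E}\,(M_{t+1} - M_t)^2 \;\le\; 4\, \mathbb{E}\,(Z_1 - Z_0)^2 .
\]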
The problem, of course, is that what I really want to say now is: if the martingale doesn't go very far after T steps, then the chain also didn't go very far after T steps. This is problematic, because at every step this martingale picks up this extra cruft from the non-martingaleness of Z. So now here's the beautiful step. I mean, we have a reversible Markov chain at stationarity, and we've broken symmetry by only tracking it from the beginning, right? It should look the same run backwards in time. So in one sentence I'll say the idea, and then we'll write it down: the idea is to do the same thing, but with another martingale that tracks the Markov chain backwards in time. And then, by the magic of reversibility, when we take the difference of those two martingales, all the cruft cancels out -- one is going forward in time and one is going backward in time, and when they meet, this extra stuff cancels. Okay? So what do I mean? Here's our backwards-in-time martingale. It starts at time T, and the differences are given by the chain at time T - s - 1: the transformation is just T minus the time index. Okay? So the point is that this is another martingale, with respect to the backwards filtration, and it also satisfies that its increments are bounded by the increments of Z. And now -- okay, I'll state it as a lemma: if I look at Z_{s+1} - Z_{s-1}, this is exactly the forward increment minus the backward increment. This is my claim.

>>: Shouldn't it be divided by two?

>> James Lee: No. No. Okay, it's not divided by two.

>>: Check it.

>> James Lee: Okay, let's check it. We can just plug it in. I mean, you know it's not divided by two --

>>: You have a difference of two on the left.

>> James Lee: Yeah, so there's a gap of two on the left. That's the -- right, so in the discrete case there is a parity issue, so there's a --

>>: Yes, so it's different from the continuous one. In the continuous one you have to divide by two.

>> James Lee: In the discrete case it's actually a little uglier because we have this gap. But okay, we'll just do the calculation. The forward increment is Z_{s+1} - Z_s, minus the expectation of that conditioned on Z_s. And for the backward martingale -- again you take T minus the index and you get s - 1 -- the increment is Z_{s-1} - Z_s, minus the expectation of that conditioned on Z_s. Now we subtract the second from the first. The Z_s terms cancel, and we get the right main term, Z_{s+1} - Z_{s-1}, okay? And for the correction terms we use reversibility: the expected value of Z_{s+1} conditioned on Z_s is exactly the same as the expected value of Z_{s-1} conditioned on Z_s. So those --

>>: [inaudible]

>> James Lee: It cancels as well.

>>: [inaudible]

>> James Lee: Okay. So the differences of Z, up to this annoying parity issue, can be represented not by a single martingale, but by the difference of two martingale difference sequences. And what this implies -- okay, there's a parity issue that I'm now going to gloss over because it's messy -- is that we can write Z_T - Z_0 as a difference of two martingales, A_T and B_T, where the squares of the increments are bounded by what they were for Z. A and B are almost M and N, except that you have to correct for the parity issue. Okay?
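The cancellation, in symbols (a reconstruction, with one consistent choice of indexing): write (P id)(z) for E[Z_{t+1} | Z_t = z], which by reversibility also equals E[Z_{t-1} | Z_t = z]. The forward and backward increments are

\[
M_{s+1} - M_s = Z_{s+1} - (P\,\mathrm{id})(Z_s), \qquad N_{T-s+1} - N_{T-s} = Z_{s-1} - (P\,\mathrm{id})(Z_s),
\]

so subtracting,

\[
(M_{s+1} - M_s) - (N_{T-s+1} - N_{T-s}) \;=\; Z_{s+1} - Z_{s-1},
\]

and summing over every other s telescopes to Z_T - Z_0, up to the parity correction being glossed over.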
So then what does this immediately tell us? That the expected value of (Z_T - Z_0) squared is at most twice the expectation of A_T squared plus twice the expectation of B_T squared. But these are martingales whose difference sequences are bounded as above, so we immediately get some constant times T times the expected squared increment. Okay, all right. So there -- we've proved, and that's the end of the proof, that the real line has Markov type 2. Okay? And it was by taking our Markov chain, decomposing it into a difference of two martingales, and then using just straightforward bounds for the martingales.

So, two things to observe. First of all, the proof didn't have to be on the real line. Actually, one of the beautiful things about the proof is that it only uses addition and subtraction; it doesn't use multiplication at all. If it used multiplication it would still work -- it would still go from the real line to Hilbert space -- but since it uses only addition and subtraction, you can actually generalize it to arbitrary normed spaces. Okay, well, there's one part of the proof that does use multiplication: the fact that martingale difference sequences are orthogonal involves inner products. So this is what fails in a general normed space, but everything else carries through. Okay. And actually, with this machinery, if you can control martingale difference sequences in L_p spaces, then you can prove L_p has Markov type 2 for p > 2, which is what NPSS did.

So now, let me try to tell you the difficulty that arises in proving the main theorem here, okay? All right. So we're going to follow this formula, but then we'll get stuck. Suppose we had a bi-Lipschitz map from our space X into L_2, and I have my Markov chain taking values in X. Okay -- so you have to believe that everything I said here works if you replace the real line by Hilbert space. Hang on, just go through and check. Okay, it does work; I mean, there's only addition and then maybe the Pythagorean theorem. So we can take our bi-Lipschitz map, write the image of the chain as a difference of two martingales in Hilbert space, apply everything as before, and conclude -- what do we conclude? The expected squared displacement of the images is at most some constant times T times the expected squared increment of the images. This is all in Hilbert space; this is the L_2 norm. Now use the fact that the map is bi-Lipschitz: on the right you replace the increment by what's going on in the metric space, and on the left the same thing. There's a reason I'm doing this: I've already claimed that a bi-Lipschitz embedding preserves the property. So you would first map your space into Hilbert space, write the Markov chain under this map as a difference of martingales, and then proceed as before. The problem, now, is that we don't have a bi-Lipschitz embedding. We just have a threshold embedding. We only have control at one scale per map -- for every scale we have to use a different map, which means that for every scale we actually get a different martingale, all right?
So now -- I unfortunately erased the definition of the threshold embedding, I'll recall it in a second -- if I look at the map that controls scale tau, I can again write its image under the chain as the difference of two martingales. But the martingales depend on the scale of the mapping I'm using, okay? All right. So now let's try to use that to prove the theorem, and then you'll see where the main difficulty lies; I'll tell you how to resolve it, and then we'll be done. Okay, in case this didn't make sense, here's our setting. We have a metric space X. We have the Markov chain Z. And we have this family of mappings from X into Hilbert space, all 1-Lipschitz, with the property that if the distance in X is bigger than tau, then the distance in Hilbert space is bigger than tau divided by some constant K. And now let's try to use this to prove a bound on the expected squared displacement. This is what we care about: we're trying to prove that the existence of this embedding gives X Markov type 2, so we should be able to prove an upper bound on this thing.

Okay. So now I'm just going to do the most obvious thing. I want to control this quantity, but I can only control one scale at a time, okay? So let's first write the expected square in terms of the tail, so that at least I'm working with the event that the displacement is big. That's going to allow me to use my mapping at scale lambda to say something in Hilbert space: when the displacement in X is bigger than lambda, the displacement of the images under the scale-lambda map is bigger than lambda over K. So I can bound the probability of the first event by the probability of the second, okay? Now, for every lambda, this is a mapping into Hilbert space, so I can write the image of the chain as a difference of two martingales. For simplicity and for the sake of time, let's just pretend I can write it as a single martingale; I'll just cut the number of terms in half. So let's just call it A. We can now bound the tail probability by the probability that this martingale -- okay, now it's indexed by lambda -- is large. Okay, good; this is where I wanted to get.

So now I can bound our quantity by this kind of weird thing. Now, this is very strange because -- well, first of all it's not even necessarily measurable, but ignore that for the moment -- it's very strange because at every scale I'm considering a different martingale, okay? So, I mean, if these were just arbitrary martingales, or even martingales with bounds on their total L_2 norm or something, I would be out of luck here. The only real benefit I have is that I know how these martingales were constructed: they all live on the filtration that follows the random walk around the metric space. They're all defined with respect to the same filtration, and that brings us to the following. I'll state a theorem now, and then you'll see how this works. Basically, I have the same bound on the increments of all these martingales, but there are a bunch of them, and they could each try to use those increments in different ways. So this is where the part about martingales aiming comes into play. Okay. So let me write down the theorem, just abstracting what we know here.
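For reference, the computation being abstracted, in symbols (a reconstruction; A^lambda denotes the martingale built from the scale-lambda map phi_lambda, pretending as above that one martingale suffices):

\[
\mathbb{E}\, d(Z_T, Z_0)^2 \;=\; \int_0^\infty 2\lambda\; \mathbb{P}\big( d(Z_T, Z_0) > \lambda \big)\, d\lambda \;\le\; \int_0^\infty 2\lambda\; \mathbb{P}\big( \| A^\lambda_T \| \ge \lambda / K \big)\, d\lambda .
\]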
So we have some common filtration of our probability space, and we have some random variables alpha_t adapted to the filtration. And now consider a family of martingales, indexed by some index set I, all martingales with respect to this filtration, and suppose I can bound all of their difference sequences uniformly in terms of alpha. So: a bunch of martingales, all sitting on the same probability space, with the same upper bound -- a random variable -- on all their differences at each time, okay? This random variable is the one coming from the random walk. It's essentially how far the random walk moves in the metric space, and that upper-bounds how far all of these martingales can travel, okay? And now here's what I want to be able to say -- so that was the assumption. The conclusion: the integral -- I'm going to use y instead of lambda -- of y times the supremum over all these martingales of the tail probability; I'll take the worst possible tail. So here's what I want to say. I have this integral, which is the same integral as before; I've just replaced lambda by y, so that the scale here is tied to the martingale as well, and I've taken the supremum over all these possible martingales.

If I took the sup outside the integral, then there would be an obvious bound: with the sup outside, what's inside is just the expected square of a single martingale, and as before I can bound the expected square -- just the L_2 norm of the martingale -- by the expected values of the squared increments, okay? So the novelty is that the supremum comes inside the integral, and the bound is still true with some universal constant C out front. Okay. So this is, in some sense -- at least here I've done it for real-valued martingales -- the main technical step.

And okay, I just want to express the difficulty of what's going on here. Again, if you get rid of the sup and fix a single martingale, this follows immediately from the fact that the expected square is at most the sum of the expected squared differences. But if I allow you to take the sup, it's not clear at all why these martingales can't each try to aim for a different point in the tail, right? Say the increments only allow two jumps, you know: a big jump taken with some small probability, and a little jump taken most of the time. At every step, each martingale in my family can take either of these two jumps. So I can consider, for instance, the martingale that always takes the small jump to the right and the big jump to the left, or the martingale that picks uniformly at random and sometimes takes a small jump to the left and sometimes to the right, or the martingale that goes the other way, all right? And the question is: if I give you some tail value y to aim for, can you conspire so that your martingale manages to use all of its L_2 norm just to hit that particular value of y? Okay? Well, the answer is no.
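The theorem, in symbols (a reconstruction): suppose (alpha_t) is adapted to a common filtration, and every martingale M^i, i in I, with respect to that filtration satisfies |M^i_t - M^i_{t-1}| <= alpha_t for all t. Then

\[
\int_0^\infty y \, \sup_{i \in I} \mathbb{P}\big( |M^i_T| \ge y \big)\, dy \;\le\; C \sum_{t=1}^{T} \mathbb{E}\, \alpha_t^2
\]

for a universal constant C. With the sup outside the integral this is just the orthogonality bound E (M^i_T)^2 <= sum_t E alpha_t^2; the content is that the sup may be brought inside.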
So there is uniform control if you take the sup over all these things, okay? So this is the answer to "how well can martingales aim?" Not that well. A martingale can't conspire to use all of its L_2 norm just to manage to get to y; they're all subject to the same difference constraints. So this is the main theorem. What's the proof? Well, it's essentially due to work of Burkholder and Gundy from the 70s, although that took us quite a while to realize. They have some very beautiful techniques for analyzing martingales, using very clever stopping times, which allow you to prove this kind of result. Okay. So let me know if it's not clear -- I'm going to end in just a second -- but this is the main kind of thing that comes up when you have these threshold embeddings: for every scale there's a different martingale, and you have to somehow control them all uniformly. And it works, but it's perhaps counterintuitive. Yeah?

>>: Is this in Burkholder's book?

>> James Lee: So it's odd, actually. It doesn't appear in his book, but it does appear in a survey paper, around section 11. Okay, what's the statement? So I have a martingale, and in this case I need some lower bound, like this -- some condition that says you make a move often enough, okay? So it's a martingale with some bound like this, and let's define the square function to be the square root of the sum of the squares of the increments, okay? So this is what I know about the martingale. And now here is the claim; I'll write it down and then say what it says. For a sufficiently smooth function phi, one has the following kind of control: phi of the maximal process associated with the martingale -- I take the martingale and look at its maximum value -- is controlled, in expectation, in terms of phi of the square function. Okay? It's a very strong inequality. And what kind of functions can you use? I'll say what smooth means: this holds as long as phi is doubling, in the sense that -- there's a quantitative statement here -- you assume phi is doubling with some constant lambda, so when I double the argument the value goes up by at most a factor of lambda; that gives some constant C_lambda such that the inequality holds, okay? And the interesting thing is that you can't use this to bound the tail directly, because you can't use cutoff functions. But if you look at our integral, you don't actually need to bound the tail precisely; you only need to bound it up to something of third order, because the integral is computing something that's second order. So just by making phi drop off cubically you get something doubling -- if phi is essentially a cubic cutoff, it's doubling -- and that's enough to control the integral. Okay. This is the Burkholder-Gundy theorem. It's beautiful, and somehow it has to cope with all of this, and it does it using magic. I mean, Burkholder has a survey with all these techniques, and there's a really magical stopping-time argument that makes this hold.
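In symbols, the claim just described (a reconstruction; the lower-bound condition on the increments is suppressed here): write M* = max_{t <= T} |M_t| for the maximal process and S(M) = (sum_t (M_t - M_{t-1})^2)^{1/2} for the square function. If phi is doubling with constant lambda, i.e. phi(2x) <= lambda phi(x), then

\[
\mathbb{E}\,\varphi(M^*) \;\le\; C_\lambda\, \mathbb{E}\,\varphi\big(S(M)\big).
\]

An indicator cutoff is not doubling, but taking, say, phi(x) = min(x^3 / y^3, 1), which is doubling, gives tail control up to third order, and that is enough for the second-order integral above.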
Okay, so I have to end now, but I did want to give one open question. So this was a question about whether martingales can aim -- whether, even subject to some bounds, a martingale can aim to be at a particular point. Let me state another question that came up in some other work with Yuval, which is also about how well martingales can aim, and in this case better than we would have liked. It's a way to see the deep richness of even very simple martingales. Consider the following class of martingales. I have a martingale on the real line -- actually, suppose it's a martingale on the integers -- and at every point it can go plus or minus one, or it can go plus or minus two. These are the only options the martingale allows itself, okay? It can go left or right one step, or left or right two steps. And you can even assume it's Markovian: it just makes its decision based on the integer it's at, okay? So at every integer the rule says plus-or-minus one or plus-or-minus two. Now you can ask -- it's a whole family of martingales, since you can choose the rule however you want -- suppose you try to choose the rule so that you land at zero as often as possible, okay? You always have to move, you can't just stay still; but you choose the rule to maximize the probability of being at zero at time N, okay? So does anybody have a guess at the upper bound? I want to say that no matter what rule you choose -- again, your rule can be different at every point, but all you can do is go plus-or-minus one or plus-or-minus two -- here is an upper bound on how well you can hit zero, on how well your martingale can aim for zero. Does anybody have a guess? Well, that's definitely a good upper bound, but -- no, no. So --

>>: You're talking about a fixed N, right?

>> James Lee: For a fixed N, but it's an asymptotic problem, so I'm happy up to a constant.

>>: But you know you're trying to optimize it.

>> James Lee: You do know you're trying to optimize it, yes. Okay, so Kostya said one over square root of N, which is the obvious conjecture to have, and in fact Yuval lived with this belief for probably at least six months -- that this should be the right answer. I mean, look, it's plus-or-minus one or plus-or-minus two; the best you should be able to do is something like that. Now, the truth is we actually don't know what the answer is, but there is reason to suspect that one can actually achieve something significantly better: one over N to the one-half minus epsilon. So there is some rule which does much, much better than standard random walk. Okay, so computer simulations bear this out, and there is a differential equation which --

>>: [inaudible]

>>: [inaudible]

>> James Lee: I know, it's not like a little epsilon. I mean, epsilons come in various sizes, right? [laughter] There is a differential equation which suggests -- yeah?

>>: The rule -- what is it?

>> James Lee: Oh, that I don't know exactly. Right. I can draw a picture of the rule for you: as long as you're within a certain region you just do plus-or-minus one, because you're trying to stay near the origin. And then, once you realize you're getting screwed -- you've somehow gone way further out than you want to be -- you just desperately start plus-or-minus-two-ing to try to get back to the origin.
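A minimal Monte Carlo sketch of the kind of experiment just mentioned, assuming the threshold flavor of rule described above: fair signs (so the process is a martingale) with step size 1 inside a region around the origin and step size 2 outside it. The function names and the particular radii below are hypothetical; this only illustrates the simulation, not the actual rule or the actual rates.

    import random

    def run(n, radius):
        # One trajectory of a martingale on the integers: at each step the
        # sign is fair, and the step size follows a hypothetical threshold
        # rule: +-1 while |position| <= radius, +-2 once the walk has
        # strayed beyond the radius.
        x = 0
        for _ in range(n):
            size = 1 if abs(x) <= radius else 2
            x += size * random.choice((-1, 1))
        return x

    def hit_probability(n, radius, trials=20_000):
        # Monte Carlo estimate of P(M_n = 0) under the rule above.
        return sum(run(n, radius) == 0 for _ in range(trials)) / trials

    if __name__ == "__main__":
        n = 500
        for radius in (5, 20, 80):  # hypothetical thresholds to compare
            print(radius, hit_probability(n, radius))

Comparing a few radii against the plain plus-or-minus-one walk gives a feel for whether aiming helps; of course, nothing at accessible sizes distinguishes a power N^{-1/2} from N^{-1/2+epsilon}, which is why the differential equation enters.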
>>: So if there are K steps left and you're closer than some function of K, then you walk plus-or-minus one, and if you're further you walk plus-or-minus two? And if, instead of plus-or-minus one and plus-or-minus two, you stay at zero with some high probability and otherwise walk plus-or-minus one, then this can be [inaudible].

>> James Lee: If you stay at zero with high probability, I mean, it's --

>>: No. At every step you stay in place with some probability, and otherwise you walk plus-or-minus one --

>> James Lee: You know how to prove it with epsilon equal to zero? Or just with some power?

>>: [inaudible]

>> James Lee: So there is a differential equation that Charles Smart analyzed, but the correspondence between the continuous and discrete cases isn't able to touch events that are as fine as being exactly at zero. So, well, now I guess we should ask Yuval about his proof. So let me stop the talk. Thank you. [applause]

>> James Lee: Is it a combinatorial type argument, or -- I mean, do you have an exact rule by hand? Okay, good.

>>: Any questions or comments?

>>: [inaudible] somehow the differential equation is the best thing for the continuous case, or is it just about --

>> James Lee: Even there, when you analyze it with two different rates, the power doesn't come out very cleanly.

>>: Well, [inaudible] found some old papers that indicated that for the continuous case [inaudible].

>> James Lee: What's the [inaudible]?

>>: [inaudible] [laughter] For some power it gives a bound.

>> James Lee: No, no. Well, as far as I know, and even from what Yuval says, the best known -- okay, there's a question of whether you can prove an upper bound of some power that's bigger; that is possible.

>>: That is easier.

>> James Lee: But I guess you're saying a lower bound that beats this -- that's what you really want. Are you asking if we know any strategy that beats this?

>>: No, no. I'm asking what's --

>> James Lee: You can beat the trivial bound, which is one.

>>: So it does go down by a power, no? There is an upper bound with a power.

>>: [inaudible]

>>: No, no. For instance, there is an upper bound on the probability that's just one over N to the 0.1.

>>: Oh, okay.

>>: So you're trying to --

>>: The most [inaudible] thing is you're trying to prove the lower bound that shows that the upper bound --

>> James Lee: The upper bound cannot be one over root N, right? Which they can do, apparently, for a slightly different model.

>>: [inaudible]

>> James Lee: The point is that martingales, even though they seem very simple -- this one is almost trivial -- somehow there is still very sophisticated behavior going on.

>>: [inaudible] the power of the martingale by the power of the square function, and that doesn't need the [inaudible] condition. So somehow for this application it was important to have something more general than just powers.

>> James Lee: And I guess, yeah. This condition doesn't necessarily hold for a martingale, but what we do is actually give the martingale a little kick in case it doesn't satisfy it, and that can be absorbed into the square function.

>>: Any other comments or questions? Okay, so we adjourn. [applause]