
>> Yuval Peres: Okay. So good afternoon, welcome everyone. Today we have a double feature. At 3:30 Geoffrey Grimmett will be speaking; now it's James Lee, who will tell us about Markov type and the multi-scale geometry of metric spaces. How well can martingales aim?
>> James Lee: Okay. So thanks for coming. The talk today is going to be about a problem at the intersection of probability and the geometry of metric spaces. Let me begin by just giving you a conjecture, which is the main object of the talk -- at least a special case of the main thing.
So this is a conjecture of -- okay, I'll write the conjecture over here, but then I won't write below it, just in case of the screen -- of Naor, Peres, Schramm, and Scott Sheffield from 2004, okay. And the conjecture is the following:
So one has, first of all, a planar graph. The graph can be either infinite or finite; you'll see that for this particular statement of it, it doesn't really matter. You have a planar graph, and I'll use d_G to denote the path metric, the shortest-path metric on the graph.
And then you also have a reversible Markov chain whose state space is a subset of the vertices, okay? So this is some reversible Markov chain on V, and it's supported on a finite subset of the vertices, okay?
This is why it sort of doesn't matter whether the graph is finite or infinite; I'll tell you the quantifiers here in a second. So you have this Markov chain, and it's important that this Markov chain is started according to the stationary measure. In fact, this is going to persist throughout the talk: the Markov chain is started at stationarity. And now the question one asks is: what's the average rate of drift of this Markov chain?
So look at the expected distance in the graph from time zero to time T. Square this, and then compare it -- with some inequality like this -- to the distance that the chain goes in one step, okay? So this is the kind of statement one wants: the expected distance squared after T steps is at most a constant times T times the expected squared distance after one step, okay?
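[Editor's note: reconstructed from the spoken description, the conjectured inequality is that there is a universal constant C such that for every planar graph G, every reversible Markov chain (Z_t) supported on a finite subset of V(G) and started from its stationary distribution, and every time T,
\[ \mathbb{E}\big[d_G(Z_0, Z_T)^2\big] \;\le\; C \cdot T \cdot \mathbb{E}\big[d_G(Z_0, Z_1)^2\big]. \]]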
So the point is that C here should be a universal constant, independent of the chain and independent of the graph. You know, C should be 10 or 100, okay?
This is the conjecture, okay: that there exists such a constant, such that for all planar graphs and all Markov chains supported on the vertices, you have the kind of drift that you would get, for instance, from a simple random walk on the path, right?
>>: [inaudible] have to be restricted to the edges of the graph?
>> James Lee: No, no. So the only way the graph comes in is through the path metric. So the chain doesn't have to walk on --
>>: [inaudible]
>> James Lee: So the metric is certainly given by the edges of the graph. It's the path metric, so you put some weights on the edges -- or you don't have to weight them if you don't want.
>>: [inaudible] the Markov chain could be any subset. Couldn’t you just have an
[inaudible] metric?
>> James Lee: No, because you measure the distance in terms of the geometry of the graph. So --
>>: If you have [inaudible] metric space you can just put a path [inaudible].
>> James Lee: There's a restriction on the graph. So let me give a couple of examples just to demonstrate some of the subtleties here. If, for instance, you just considered some N by N grid and you did simple random walk, then no one is surprised by the fact that the inequality holds. After T steps you expect your distance to be about square root of T away from the starting point.
But suppose the random walk didn't sit on the edges, so it could do things like larger jumps, okay? If all the jumps were the same length, then you also wouldn't be surprised, because just by rescaling it would again be a random walk in the plane, and the same thing would hold.
It starts to get a little more difficult when you think about what happens if sometimes the jumps are small, and then every once in a while there is a huge jump. Now you can't rescale to get a random walk on the plane again; you can't rescale to get a simple random walk.
And this is already -- I mean, there is a simpler proof of this; there's a fairly easy proof that it holds for this graph, just because it's a subset of the plane. But already it's not so easy to analyze this from first principles when the jumps are allowed to vary in size.
Okay. So that's one example to keep in mind. Another example is to look at the complete binary tree. If you weren't paying very careful attention to the definition, you might think that the inequality fails for the binary tree, because if you start your random walk at the root, then simple random walk will of course go at linear speed for the height of the tree, okay?
But the point is that this Markov chain is started stationary and in the complete
binary tree almost all the stationary measure is at the leaves, at which point the
random walk doesn’t go anywhere. It just sits at the leaves, right? So okay, so these
are two examples to keep in mind.
So now, first of all, let me tell you where this notion comes from, and then in a little bit I'll get into the proof. It goes back to the definition of something called Markov type, due to Keith Ball in '92. Okay, so let me define what Markov type is. A metric space has Markov type P, where P is some number greater than or equal to one. Okay, so there are a lot of quantifiers, but it's not so bad -- okay, so here are the quantifiers.
There should exist some constant C such that -- and I'll suppress these quantifiers for most of the talk -- for every finite state space, for every reversible Markov chain on the state space started at stationarity, and for every time, we have this kind of dependence. Okay, now I have this map. Oh, I guess, yes, okay.
For every -- I'll explain this in a second. So take a finite state space, take a map from the state space into your metric space, take a reversible Markov chain on the state space, and then ask for something like this to hold, okay?
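[Editor's note: assembling the quantifiers just listed, the definition reads: a metric space (X, d) has Markov type P if there is a constant C such that for every finite state space S, every reversible Markov chain (Z_t) on S started from stationarity, every map F from S to X, and every time T,
\[ \mathbb{E}\big[d(F(Z_T), F(Z_0))^P\big] \;\le\; C^P \, T \, \mathbb{E}\big[d(F(Z_1), F(Z_0))^P\big]. \]]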
So it's the same thing I wrote before for planar graphs, except now I have put a power P instead of the power P equals two. And there I had the Markov chain walking around on the graph, whereas here it walks around on some auxiliary state space, and then there's a map from that state space into the metric space.
This is the right definition because -- well, I'll explain in a second. It's the one that has the correct applications, and it also composes very nicely. But for most of the talk I'll actually omit this map F and just pretend that the random walk is walking around the metric space.
You know, it could actually be that you could remove the map -- actually, I suspect that's probably not true, because of some trivial examples. So for most of the talk I'll omit the map F, but this is the proper definition.
In other words, the Markov chain has more state than just where it is in the metric
space. It also could have some auxiliary state sort of coming from this map, F.
>>: Independent of the metric space.
>> James Lee: Say it again?
>>: So C--
>> James Lee: Oh, no, no. So C can depend on the metric space, but C cannot depend on any of this other data, okay? So whether a metric space has Markov type P or not -- the best P for which this holds, and the constant -- is a property of the metric space. And then, okay, so where does this concept come from?
I mean, it's a fairly natural way to talk about the drift behavior on finite subsets of a metric space, okay? But it's inspired by something in the linear theory, in the geometry of Banach spaces. So suppose I have a normed space X, and then I have a bunch of vectors x_1, x_2, up to x_n in X.
Okay, so I could consider a random sum that looks like this, where the epsilon_i's are i.i.d. random signs.
So I take some normed space, take a bunch of vectors, and randomly sum up the vectors with signs. And now, okay, let's just use the square for a moment.
I could, for instance, measure -- okay, so this is a random walk in this linear space. I can measure the distance I go after n steps of this random walk, and I could compare it, say, to something like this: the sum of the squared lengths of the individual steps of the walk. Okay?
So if a Banach space X satisfies this for some constant C, independent of the choice of vectors and independent of the number n, the space is said to have type two, okay?
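[Editor's note: the type-two inequality on the board, reconstructed: for i.i.d. uniform random signs epsilon_1, ..., epsilon_n,
\[ \mathbb{E}\,\Big\|\sum_{i=1}^{n} \varepsilon_i x_i\Big\|^2 \;\le\; C^2 \sum_{i=1}^{n} \|x_i\|^2. \]]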
And there's a similar notion where you reverse the inequality and get the notion of cotype. It turns out that these parameters -- type and cotype, and if you change the exponent two to a power P you get type P and cotype P -- actually tell you a lot of geometric information about the Banach space you're in, okay? So let me give you an example, and then I'll tell you what Maurey proved.
So, for instance, the following is known as the Maurey extension theorem. Take two Banach spaces, X and Y, and then consider the following type of question.
Okay, let's actually also fix some subspace S of X, and then consider the following type of question. I have a bounded linear operator from S into Y -- a linear operator with bounded operator norm -- and I'm curious about when I can actually get an extension of this operator to the whole space X, okay, such that it remains bounded.
So this one is bounded and I want the extension to be bounded as well, okay? An extension in the sense that if I restrict the new map to the subspace, then it's equal to the old map, okay?
So you can ask when such extensions exist. And Maurey proved that such an extension always exists given just information about how random walks behave in X and Y.
If X has type two -- so if X satisfies an inequality like this -- and Y has cotype two, then for any bounded linear operator this extension always exists. And this kind of extension theorem is a very powerful thing; you can do a lot with it.
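[Editor's note: the statement being referenced is Maurey's extension theorem: if X has type two, Y has cotype two, S is a linear subspace of X, and T: S -> Y is a bounded linear operator, then there is a linear extension \(\widetilde{T}: X \to Y\) with
\[ \|\widetilde{T}\| \;\le\; C\,\|T\|, \]
where C depends only on the type-two constant of X and the cotype-two constant of Y.]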
So what Keith was trying to do is something that on the face of it seems kind of ridiculous. There is this beautiful history in the non-linear geometry of Banach spaces of linear notions -- and these are a bunch of linear notions: I have linear spaces and subspaces and linear maps -- having non-linear analogs that by all accounts one shouldn't expect to have, except for the fact that we happen to live in a world that's nice enough that somehow this kind of thing works out.
And so what Keith was trying to do was to give a non-linear version of this extension theorem, in the category of metric spaces, okay? So what do these things mean in the category of metric spaces? Let's just suppose we have two metric spaces X and Y, and we'll still have this S, but now S is just a subset of X.
So, a subset of X, and instead of a bounded linear operator, we'll have a Lipschitz map from the subset into Y. And now the extension question is whether this map extends: does there exist an extension of this map to the whole of X which is Lipschitz?
So, for instance, it's a classical result of Kirszbraun that if X and Y are Hilbert spaces, such an extension always exists, and, in fact, you can take the Lipschitz constant of the extension to be equal to the original Lipschitz constant.
Here's what Keith proved: if X has Markov type two and Y has Markov cotype two, then this kind of Lipschitz extension always exists. Markov cotype is not defined here, but this is already interesting when Y is just a Hilbert space. So the theorem holds for X of Markov type two and Y a Hilbert space, or more generally Y of Markov cotype two, which I'm not going to define for various reasons. Okay.
I'm just giving this to set up the context correctly, if you really want to know what's going on. There always exists an extension such that the new Lipschitz constant is at most some constant times the old Lipschitz constant, and that constant depends on just the constant in the Markov type definition.
So the point is that it's a quantitative relationship as well, okay? So you go from looking at the geometry of linear spaces in terms of these linear random walks, to this metrical definition, and there is somehow this beautiful analog of the linear extension theory in the non-linear setting. Okay.
So, I mean, this is sort of just one example of the applications of this thing. Yeah?
>>: [inaudible] example that shows a metric space is not Markov type two?
>> James Lee: Good. So let's talk about that -- though I didn't yet give any examples of metric spaces that are Markov type two. Okay. So what did Keith -- yeah, so there was one problem with Keith's paper. It's a beautiful paper.
He proved the following theorem: Hilbert space, so L2, has Markov type two, and, in fact, with constant C equals one in the definition there. I guess I should mention, by the way, that Markov type one is trivial.
If you take P equals one, then this follows with constant one just by using the triangle inequality, linearity of expectation, and the fact that stationarity means that every step of the walk is distributed the same as the first one, as (F(Z_0), F(Z_1)). Okay?
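[Editor's note: the one-line Markov type one computation, reconstructed: by the triangle inequality, linearity of expectation, and stationarity (each pair (Z_t, Z_{t+1}) has the law of (Z_0, Z_1)),
\[ \mathbb{E}\,d\big(F(Z_0), F(Z_T)\big) \;\le\; \sum_{t=0}^{T-1} \mathbb{E}\,d\big(F(Z_t), F(Z_{t+1})\big) \;=\; T\,\mathbb{E}\,d\big(F(Z_0), F(Z_1)\big). \]]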
So P equals one is trivial; it just follows from the triangle inequality. Right. So yeah, this theorem seems great; the only problem was that Hilbert space was the only space for which Keith was able to prove Markov type two.
If this were really a hardcore non-linear generalization of type and cotype, then you would expect that there would be other spaces besides just Hilbert space that have this property. Okay.
So I'm coming to your question. This was the situation until the work of -- I've already written your names elsewhere, so from now on this will be NPSS. Right, so they proved two things.
First of all, they proved that LP for P bigger than two has Markov type two, which is a very comforting thing, because those spaces have linear type two. So if this were really a non-linear generalization, then you would hope that when you restrict to the linear category you get the linear theory back.
So that's great. And they also proved that trees have Markov type two. And for Yuval it's a very nice kind of -- so in the functional analysis setting, this was kind of the most important open problem after Keith's work: to resolve the Markov type of LP for P bigger than two.
I believe that they actually started just working on the problem for trees, and it turns out that trees are in some sense a bit harder than LP for P bigger than two. So once they solved the tree problem, this problem fell rather immediately.
So there are other spaces of Markov type two. What's an example of a space that doesn't have Markov type two? Okay, so L1 -- I'd even say something stronger: L1 has only trivial Markov type, only Markov type one. It has nothing better than one.
Okay? So why is that? Just take the discrete hypercube sitting inside L1, and take the standard simple random walk on the hypercube, right? Now it's straightforward to see that the distance from the starting point to your position after, say, N over three steps is at least N over ten with high probability.
So this immediately shows that this space has no non-trivial Markov type, because after T equals N over three steps it's gone distance of order N, and of course every step goes distance only one. So the per-step term on the right-hand side is one, while the left-hand side grows like a power of T. The walk drifts linearly; it doesn't go any slower than that.
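[Editor's note: spelling the computation out: for the simple random walk (Z_t) on the hypercube {0,1}^N inside L1, with T = N/3, with high probability \(\|Z_T - Z_0\|_1 \ge N/10\), while every step has length exactly one. Markov type P would then require
\[ (N/10)^P \;\lesssim\; \mathbb{E}\big[\|Z_T - Z_0\|_1^P\big] \;\le\; C^P \, T \, \mathbb{E}\big[\|Z_1 - Z_0\|_1^P\big] \;=\; C^P \, N/3, \]
which fails for every fixed P > 1 as N grows.]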
Okay? So this shows that L1 has only trivial Markov type, okay? And I guess it's a good point to mention one other thing about Markov type now. Okay, so let me do the following.
Suppose X and Y are metric spaces. Let me just write down what it means to be bi-Lipschitz. A map from X to Y is bi-Lipschitz, okay, if it preserves all pairwise distances -- all right, I'm going to cheat a very small amount here -- up to a factor of C, and the infimal such C is called the bi-Lipschitz distortion of the mapping. And one thing that's fairly apparent is that Markov type is a bi-Lipschitz invariant, which means that if I have a metric space X that embeds into Y bi-Lipschitzly, and Y has Markov type two, then X also has Markov type two, okay?
And that's just because, if you look at the definition, I can change all these values of the constants and the statements are still true, okay? As long as I change them all to universal constants, not depending on the other data.
So if I can embed one space into another space in a bi-Lipschitz way, and the new space has Markov type two, then X inherits this from the new space, okay -- and, in fact, in a quantitative way, right? So if I could embed one space into another with distortion K, then you would have the same kind of inequality -- okay, I guess, all right.
Okay. This doesn't matter because it was just a constant raised to the power P. But if I want the constant to change in proportion to the distortion, I guess I should put a power P here. The point is that it's quantitative: if the map has distortion D, then X inherits the Markov type of Y, but now with a constant that has grown by a factor of D, okay?
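[Editor's note: quantitatively, writing M_P for the Markov type P constant, the inheritance just described reads
\[ M_P(X) \;\le\; D \cdot M_P(Y) \]
whenever X embeds into Y with bi-Lipschitz distortion D.]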
So in particular you can conclude from this argument, for instance, that the Euclidean distortion -- the distortion required to bi-Lipschitz embed the N-dimensional cube into Hilbert space -- grows like the square root of the dimension, okay.
There are many ways to prove this; it's been known for decades. But this gives you an example of how Markov type comes into play when you think about the bi-Lipschitz geometry of metric spaces. Okay, this is a very simple example, but by studying random walks you can really compare the geometry of two spaces using this kind of notion.
Okay. Good. Okay, so now let me return to the conjecture. I'll prove this to you in a moment, but differently from Keith's proof: I'll follow NPSS, and then you'll see why the subtitle of the talk comes into play -- why martingales become a key object of study here. So ask questions if you have any at any moment. All right. Okay.
All right. So now, coming back to the planar graph question: why hasn't the theory so far answered this question? Well, one reason is that we know there are planar graph metrics which do not bi-Lipschitz embed into Hilbert space, and they don't bi-Lipschitz embed into LP for P bigger than two either, so you can't use any of the embedding machinery we've talked about to solve this question. Let me tell you one thing we do know about planar graphs.
Obviously we know a million things about planar graphs, so let me tell you sort of
one kind of embedding we know they do have, and then I’ll tell you our main result.
So, one more notion of embedding, and then we'll move on to martingales. This is the notion of a threshold embedding. We have again two metric spaces, X and Y, and we'll say that X threshold-embeds into Y if there exists a constant K and -- now, this threshold embedding is going to be a collection of mappings, not a single mapping.
So it's a family of mappings from X to Y, indexed by a non-negative real number tau -- a family of 1-Lipschitz maps which satisfy the following. If the distance between x and y in X is at least tau -- well, if I had a single bi-Lipschitz map and the distance was at least tau, then the distance in the image would be at least tau divided by some constant.
But here, only one of these maps is required to notice this. So in this threshold embedding we have this kind of condition -- these are all supposed to be taus, all right. Okay?
So X threshold-embeds into Y if there exists such a family of 1-Lipschitz maps, basically one for every scale of the space. There is no single global map that can get all the geometry right, but at scale tau, the map for scale tau reflects that scale up to a constant.
This is for all x and y in X: if the distance in X is at least tau, the distance in Y should be at least tau up to a constant factor. And this K is a universal constant, okay?
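[Editor's note: collecting the definition: X threshold-embeds into Y if there are a constant K and a family of 1-Lipschitz maps \(\varphi_\tau : X \to Y\), one for each scale \(\tau > 0\), such that for all x, y in X,
\[ d_X(x, y) \ge \tau \;\Longrightarrow\; d_Y\big(\varphi_\tau(x), \varphi_\tau(y)\big) \ge \tau / K. \]
The letter \(\varphi\) is the editor's notation for the maps drawn on the board.]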
This may look a little bit strange. There are plenty of examples; for instance, Bourgain's lower bound shows that the complete binary tree does not bi-Lipschitz embed into Hilbert space, but it does threshold-embed into Hilbert space.
Actually, I won't do it now, but at the end, or offline, I found at least a fairly simple description of a threshold embedding of the complete binary tree into Hilbert space, if you want to see what one of these things looks like.
And actually, especially because of a number of applications in computer science, understanding the relationship between these threshold embeddings and bi-Lipschitz embeddings was very important, okay?
So there’s this ongoing theme in many parts of metric geometry of understanding
sort of when the ability to control a space at every scale uniformly sort of implies
the ability to control it somehow all at the same time, all the scales simultaneously.
Okay. So this is the notion of a threshold embedding. And here is -- I shouldn't call it a fact, it's not obvious -- a theorem: planar graph metrics threshold-embed into Hilbert space. And I'll add the word uniformly.
Uniformly here means you can interpret this theorem in multiple ways. If you take an infinite planar graph metric, it threshold-embeds into Hilbert space. If you take a collection of finite planar graph metrics, they all threshold-embed with the same constant K. The constant is uniform; it doesn't depend on the graph, okay?
And I won't go into it now, but actually, along with this conjecture they asked about other spaces, like doubling spaces, and even things like hyperbolic spaces and spaces of bounded [inaudible] dimension. Okay, that wasn't specifically asked there, but the point is that all of those spaces admit threshold embeddings into Hilbert space, okay?
So sort of it’s not just a planar graph question, although that’s what I’m focusing on
at the moment. Okay? So this is one thing we know about planar graphs, and now
here is the main theorem, which I’ll try to present in the remaining time.
If a metric space threshold-embeds into Hilbert space, then the space inherits Markov type two. Okay? So for bi-Lipschitz embeddings this is straightforward, but it turns out that you actually need much weaker control on the geometry to get Markov type two. Okay?
So just the existence of a threshold embedding is enough to get Markov type two. And for any enthusiasts or junkies: you can generalize this to say that if you threshold-embed into a P-uniformly smooth Banach space, then you have Markov type P.
So there is a generalization beyond Hilbert space. But this is the most interesting case, because it has the applications: it shows that planar metrics, and doubling metrics, and hyperbolic spaces, and so on, all have Markov type two.
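[Editor's note: so the main theorem, as stated, is
\[ X \text{ threshold-embeds into Hilbert space} \;\Longrightarrow\; X \text{ has Markov type two}, \]
with the Markov type constant depending only on the threshold-embedding constant K, and with the P-uniformly smooth variant giving Markov type P.]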
Okay. All right. So now I want to give you some idea of how this theorem is proved.
So now we’ll go to probability.
>>: [inaudible]
>>: I thought that was for universal C.
>> James Lee: Yeah, so this is -- yes. I guess this is something one does quantitatively. In other words, the Markov type constant only depends on the constant in the threshold embedding, and so the fact that this is uniform does imply that you get uniform constants.
Okay. You're right. So it's weird: it actually does imply it just by itself. You can actually take all the planar graph metrics and just put them all into one single giant planar metric space -- you don't even care if it's separable or not -- and then the fact that I have it for a single graph means that you get a uniform constant.
Okay. All right. So now let’s start with something easy. Let’s see, I have twenty
minutes? Is that how long I have?
>>: Twenty-eight.
>> James Lee: Okay. Let's prove this fact, first proved by Keith Ball, but we'll prove it in a more complicated way: the real line has Markov type two. Okay. This is our goal.
Okay. So recall what happens. We have this Markov chain whose state space is finitely many points on the real line, and this thing hops around. And we want to prove something like this with P equals two. Okay.
So how can you do it? Well, there is one situation on the real line where we know we get some kind of behavior like this, and that's if we had a martingale, right? So suppose that we have some real-valued martingale. Then, just by orthogonality of martingale difference sequences, the expected distance squared after T steps is exactly the sum of the expected squared distances of the individual steps, okay?
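[Editor's note: the identity being used: for a real-valued martingale (M_t), the cross terms \(\mathbb{E}[(M_{t+1} - M_t)(M_{s+1} - M_s)]\) vanish for s < t by conditioning at time t, so
\[ \mathbb{E}\big[(M_T - M_0)^2\big] \;=\; \sum_{t=0}^{T-1} \mathbb{E}\big[(M_{t+1} - M_t)^2\big]. \]]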
So at least martingales satisfy the kind of growth that we're looking for. Okay. Now, this chain is not a martingale, but maybe it's kind of close -- I mean, you started at stationarity, right? So if you run it for long enough it doesn't go anywhere; the center of mass stays at the same point because you started stationary. Okay.
So here's the way to bring martingales into the fold -- this is what NPSS did, based on some work of Lyons and Zhang -- T. J. Lyons. Okay, so let's try to convert this Markov chain into a martingale.
>>: Which Zhang was it?
>> James Lee: T. S. Zhang.
[laughter]
Okay. Let's try to convert this chain into a martingale. All right, so the first step we can do pretty easily: we'll start the martingale at zero. Good. Now, at the next step we get into trouble. So let's try to define the difference sequence. If the world were as we might hope and expect, we would just define our martingale differences to be the differences of the values of the chain, you know.
And if this was a martingale we’d be done, so we’d be really happy. Okay? Because
then this would hold. That would immediately give us our inequality. Of course, it’s
not. So, okay, let’s just add the appropriate correction term, right?
For this to be a martingale, you want the conditional expectation of this to be zero. So let's just force it to be zero, okay? So the differences here are given by how the chain moves, minus the defect of our chain from being a martingale, okay?
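[Editor's note: in symbols, the construction is M_0 = 0 and
\[ M_{s+1} - M_s \;=\; Z_{s+1} - Z_s \;-\; \mathbb{E}\big[\,Z_{s+1} - Z_s \mid Z_s\,\big], \]
so each increment has conditional mean zero by construction.]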
Okay. So let me tell you: there is a good thing and a bad thing. The good thing is that we can control -- by the way, tell me if I start to write too low -- the good thing is that we can control the increments of this martingale in terms of the increments of the chain.
So this squared is at most twice the expectation of this squared, plus twice the expectation of this squared -- all right, we can write something like this. And then, because you are stationary, this is four times the expected square of Z_0 minus Z_1, okay?
So I've given you a martingale, and at least I can bound its differences in terms of the differences of Z. All right?
The problem, of course, is that what I really want to be able to do now is say, well, if
the martingale doesn’t go very far after T steps, then the chain also didn’t go very far
after T steps.
This is problematic, because in every step this martingale picks up this extra cruft from the non-martingaleness of Z. So now here's the beautiful step.
I mean, we have a reversible Markov chain at stationarity, so we've been a little bit -- we've broken symmetry by only tracking it from the beginning, right? Because it should look the same run backwards in time.
So in one sentence I’ll say the idea, and then we’ll write it down. But the idea is now
to do the same thing but have another martingale that tracks the Markov chain
backwards in time.
And then by the magic of reversibility, when we take the difference of those two
martingales, all the crap will cancel out and we’ll sort of, you know -- when one is
going forward in time and one is going backward in time, when they meet this extra
stuff cancels out. Okay?
So what do I mean? Here's our backwards-in-time martingale. It starts at time T, and then the differences are given by the chain at time T minus s minus one; the time transformation is just T minus s. Okay?
So the point is, here's another martingale, with respect to the backwards filtration, and it satisfies that its increments are bounded by the increments of Z. And now here's the -- okay, I'll state it as a lemma: if I look at Z at s plus one minus Z at s minus one, this is exactly the forward difference minus the backward one -- it should be plus one, okay. So this is my claim: that this is equal to this.
>>: [inaudible]
>> James Lee: No. No. Okay. It’s not divided by two.
>>: Check it.
>> James Lee: Okay, let's check it. So what do we get? We can just plug it in. I mean, you know it's not divided by two --
>>: [inaudible] the difference by two on the left.
>> James Lee: Yeah. So there's a gap of two on the left. That's the -- right, so in the discrete case there is a parity issue, so there's a --
>>: Yes, so it's different from the continuous one. In the continuous one you have to divide by two.
>> James Lee: In the discrete case it's actually a little uglier, because we have this gap. But okay, we'll just do the calculation. So you get Z at s plus one minus Z at s, minus the expectation of this conditioned on this. And here, for the backward one, you get Z at s minus one minus Z at s.
So again, you take T minus the time and you get s minus one. And you get minus the expectation of this, okay? And now we're subtracting this from this, so this cancels, okay?
Here we get the right thing: Z at s plus one minus Z at s minus one, okay? The Z at s terms cancel, and now we use reversibility: the expected value of Z at s plus one conditioned on Z at s is exactly the same as the expected value of Z at s minus one conditioned on Z at s.
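[Editor's note: the board computation, with indices reconstructed: the backward martingale N has increments
\[ N_{T-s+1} - N_{T-s} \;=\; Z_{s-1} - Z_s \;-\; \mathbb{E}\big[\,Z_{s-1} - Z_s \mid Z_s\,\big], \]
so subtracting,
\[ (M_{s+1} - M_s) - (N_{T-s+1} - N_{T-s}) \;=\; Z_{s+1} - Z_{s-1} - \mathbb{E}\big[\,Z_{s+1} \mid Z_s\,\big] + \mathbb{E}\big[\,Z_{s-1} \mid Z_s\,\big] \;=\; Z_{s+1} - Z_{s-1}, \]
the last step because reversibility at stationarity gives \(\mathbb{E}[Z_{s+1} \mid Z_s] = \mathbb{E}[Z_{s-1} \mid Z_s]\).]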
So those --
>>: [inaudible]
>> James Lee: It cancels as well.
>>: [inaudible]
>> James Lee: Okay. So the differences of Z, up to this annoying parity issue, can be represented -- not by a single martingale, but by the difference of two martingale difference sequences.
And so what this implies -- okay, there's a parity issue that I'm now going to gloss over because it's messy -- is that we can write Z_T minus Z_0 as a difference of two martingales, A_T and B_T, where the squares of the increments are bounded by what they were in Z.
A and B are almost M and N, except for the fact that you have to correct for the parity issue. Okay? So then this immediately tells us what? That the expected value of Z_T minus Z_0 squared is at most twice the expected value of A_T squared plus twice the expected value of B_T squared.
But now these are martingales, and their difference sequences are bounded by this, so we immediately get -- okay, some constant times T. All right. So there: we proved it; that's the end of the proof that the real line has Markov type two. Okay? And it was by taking our Markov chain, decomposing it into a difference of two martingales, and then using just straightforward bounds for the martingales, okay?
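[Editor's note: chaining the bounds, and ignoring the parity bookkeeping, the end of the proof reads
\[ \mathbb{E}\big[(Z_T - Z_0)^2\big] \;\le\; 2\,\mathbb{E}\big[A_T^2\big] + 2\,\mathbb{E}\big[B_T^2\big] \;\le\; 16\, T \, \mathbb{E}\big[(Z_1 - Z_0)^2\big], \]
using orthogonality of the martingale differences and the factor-of-four increment bound from earlier; the exact constant is unimportant.]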
So two things to observe. First of all, the proof didn’t have to be on the real line.
Actually, one of the beautiful things about the proof is that it only uses addition and
subtraction, okay? It doesn’t use multiplication at all.
If it used multiplication, it would still work going from the real line to Hilbert spaces; but since it uses only addition and subtraction, you can actually generalize it to arbitrary normed spaces.
Okay, well, there's one part of the proof that does use multiplication, actually: the idea that martingale difference sequences are orthogonal involves inner products.
So this is what fails in a general normed space, but everything else carries through. Okay. And actually, with this machinery, if you can control martingale difference sequences in LP spaces, then you can prove LP has Markov type two for P bigger than two, which is what NPSS did.
So now, let me try to tell you the difficulty that arises in proving the main theorem
here, okay? All right.
So we're going to follow this formula, but then we'll get stuck. Okay. Suppose we have our space X and we had a bi-Lipschitz map from X into L2, and I also have my Markov chain taking values in X. The point is that now I can use this map -- okay, so again, you have to believe that everything I said here works if you replace R by Hilbert space.
Hang on, just go through and check. Okay. It does work. I mean, there’s only
addition and then maybe, like, the Pythagorean theorem, okay. Okay.
So we would take our bi-Lipschitz map, write the image of the chain as a difference of two martingales in Hilbert space, and now just apply everything as before, and we would conclude that -- okay.
So what do we conclude? The expected value of this is at most some constant times T times this, okay? This is all in Hilbert space, all right? This is the L2 norm. And now use the fact that it's bi-Lipschitz: on the right here you just replace this by what's going on in the metric space, and on the left you do the same thing -- replace it by what's going on in the metric space.
There's a reason I'm doing this. Okay, I've already claimed that bi-Lipschitz embeddings preserve the property. So here you would first map your space into Hilbert space, you would write the Markov chain under this map as a difference of martingales, and then proceed as before.
The problem now is that we don't have a bi-Lipschitz embedding; we just have a threshold embedding. So we only have control scale by scale: for every scale we have to use a different map, which means that for every scale we actually get a different martingale, all right?
So now, with my threshold embedding -- I unfortunately, of course, erased the definition; I'll recall it in a second -- if I look at my map that's able to control scale tau, I can again write this as the difference of two martingales. But the martingales depend on the scale of the mapping that I'm using, okay?
All right. So now let's try to use that to prove the theorem, and then you'll see where the main difficulty lies, and I'll tell you how to resolve it, and then we'll be done. Okay, so if this didn't make sense, let's just -- okay. Here's our setting.
We have a metric space X. We have this Markov chain Z. And we have this family of mappings from X into Hilbert space. These are all 1-Lipschitz, and they have the property that if the distance in X is bigger than tau, then the distance in Hilbert space is bigger than tau divided by some constant.
And now let's try to use this to prove some bound on the expected value of this. So this is what we care about: we're trying to prove that the existence of this embedding gives X Markov type two, so we should be able to prove an upper bound on this thing. Okay.
So now I’m just going to do the most obvious thing. I want to control this, but I can
only control one scale at a time, okay? So let’s first just write this in the following
way:
Let's write the expectation of the square in terms of the tail. So now at least I know I can write this in terms of the event that this thing is big, okay? That's going to allow me to use my mapping at scale lambda to say something in Hilbert space.
So now I can just bound this by a probability. So now I'm working at scale lambda, and I have a K here, okay? So here I've just used the property that when this is big, it implies that that is big. Okay? When this is bigger than lambda, it implies that is bigger than lambda over K.
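[Editor's note: the two displayed steps, reconstructed:
\[ \mathbb{E}\big[d_X(Z_0, Z_T)^2\big] \;=\; 2\int_0^\infty \lambda \, \Pr\big[d_X(Z_0, Z_T) > \lambda\big] \, d\lambda \;\le\; 2\int_0^\infty \lambda \, \Pr\big[\|\varphi_\lambda(Z_T) - \varphi_\lambda(Z_0)\| \ge \lambda/K\big] \, d\lambda. \]]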
Okay. Now I know that for every lambda -- all right, I'm using lambda, so let's put lambda -- this is a mapping into Hilbert space. So I can write it as a difference of two martingales. For simplicity, and for the sake of time, let's just pretend I can write it as a single martingale.
I'll just cut down the number of terms by a factor of two. So let's just use A. We can now bound this by the probability that this martingale -- okay, now it's indexed, okay, good. This is where I wanted to get. So now I can bound it by this kind of weird thing. Now, this is very strange because, well, first of all it's not even necessarily measurable, but ignore that for the moment.
This is very strange because at every scale I'm considering a different martingale, okay? So, I mean, if these were just arbitrary martingales, or even martingales with just bounds on their total L2 norm or something, I would be out of luck here.
The only real benefit I have now is that I know the way these martingales were constructed. They all live on the filtration that follows the random walk around the metric space. So they're all defined with respect to the same filtration, and that brings us to the following.
I'll state a theorem now, and then you'll see how this -- basically, I have a bound on the increments of all these martingales, but there are a bunch of them, and they could each use those increments in different ways. So this is where the part about martingales aiming comes into play. Okay. So let me write down the theorem.
Okay. So I'm just abstracting what we know here. We have some common filtration of our probability space, and we have some random variables alpha_t which are adapted to the filtration. And now consider, say, a family of martingales.
Let's index them by some index set I. So this is some family of martingales, and all the martingales are adapted to the same filtration.
I can bound all of their difference sequences uniformly in terms of alpha. So: a bunch of martingales, and what I know about them is that they all sit on the same probability space, and I have the same upper bound -- a random variable -- on all their differences, okay?
This random variable is the one coming from the random walk. This is essentially
how far the random walk goes in the metric space and that upper bounds how far all
of these martingales can travel, okay?
And now what I want to be able to say is the following -- okay, so that was the assumption. Then look at the integral -- I'm going to use y instead of lambda -- of y times the supremum over all these martingales of the tail probability; I'll take the worst possible tail.
Okay. So here's what I want to say. I have this integral, which is the same integral as before; I've just replaced lambda by y, and remember that in the application the scale lambda is tied to the martingale as well. Here I've taken the supremum over all these possible martingales inside the integral.
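[Editor's note: the theorem, reconstructed: if the martingales \((M^i)_{i \in I}\) are all adapted to one common filtration and their increments satisfy \(|M^i_{t+1} - M^i_t| \le \alpha_t\) for every i and t, then
\[ \int_0^\infty y \, \sup_{i \in I} \Pr\big[\,|M^i_T| \ge y\,\big] \, dy \;\le\; C \, \mathbb{E}\Big[\sum_{t=0}^{T-1} \alpha_t^2\Big] \]
for a universal constant C; the supremum sits inside the integral.]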
If I took the sup outside, then there would be an obvious bound here. With the sup outside, what's inside is just the expected square of the martingale: it's the sup of a bunch of martingales, but the integral is just the expected square.
And then, as before, we can just bound it -- okay. So if I took the sup outside, then what I have to bound is just the squared L2 norm of the martingale, the expected value of the square, and I can bound that by the expected values of the squared increments, okay?
So the novelty is that the supremum comes inside, but it's still true with some universal constant C out here. Okay. Here I've stated it for real-valued martingales; this is maybe the main technical step, to analyze the martingales in the real-valued case.
And okay, I just wanted to express the difficulty of what's going on here. Again, if you get rid of the sup and you have a single martingale, then this follows immediately: this is just the expectation of the square, and you get it immediately from the fact that the expectation of the square is at most the sum of the expected squared differences.
But if I allow you to take the sup, it's not clear at all why these martingales can't each try to aim for a different point in the tail, right? Like, say the steps only have two kinds of jumps, you know?
A big jump taken with some small probability, and a little jump taken most of the time. So now, at every step, my martingales have a whole family of choices; they can take either of these two jumps.
So I can consider, for instance, the martingale that always takes the small jump to
the right and the big jump to the left, or I can consider a martingale that picks
uniformly at random and sometimes takes a small jump to the left and sometimes to
the right, or the martingale that, you know, goes the other way, all right?
And the question is: if I give you some tail value y to aim for, can you conspire so that your martingale manages to use all of its L2 norm just to hit that particular value of y?
Okay? Well, the answer is no. Okay.
So there is uniform control if you take the sup over all these things, okay? So this is the "how well can martingales aim?" -- not that well, okay? A martingale can't conspire to use all of its L2 norm just to manage to get to y, okay?
I mean, they're all subject to the same difference constraints. So this is the main theorem. What's the proof? Well, it's essentially due to work of Burkholder and Gundy from the 70s, although that took us quite a while to realize. They have some very beautiful techniques for analyzing martingales, using very clever stopping times, that allow you to prove this kind of result.
Okay. So let me know if it's not clear; I'm going to end in just a second, okay? But this is the main kind of thing that comes up when you have these threshold embeddings: now for every scale there's a different martingale, and you have to somehow control them all uniformly. And it works, but it's perhaps counterintuitive. Yeah?
>>: [inaudible]
>> James Lee: So it's -- okay, it's odd, actually. It doesn't appear in his book; it does appear in a survey paper, around section 11. Okay. What's the -- okay. So here I'll state the theorem. Let's see.
So I have a martingale, and in this case I need some lower bound, like this, okay? I need a lower bound on some kind of thing that says you make a move often enough, okay?
So this is a martingale with some bound like this, and then let's define the square function just to be the sum of the squares of the increments, okay? So this is what I know about the martingale. And now here is the claim.
All right, so let me write it down -- I guess I should do this -- okay, I'll write it down and then I'll say what it says. Okay: for a sufficiently smooth function, one has the following kind of control. Okay.
So this is the maximal process associated with the martingale: I have the martingale, and I look at its maximum value. I'll say what smooth means in a second, but for any sufficiently smooth function, phi of this maximum value is controlled in terms of phi of the square function. Okay?
It's a very strong inequality. And what kind of functions can you use? This holds as long as phi is doubling, in the sense that -- okay, so there's a quantitative statement here.
You assume phi is doubling with some constant lambda -- when you double the argument, the value goes up by at most a factor of lambda -- and that gives me some constant C_lambda such that this holds, okay?
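[Editor's note: in the classical Burkholder-Gundy form, with maximal function \(M^* = \max_{t \le T} |M_t|\) and square function \(S(M) = \big(\sum_t (M_{t+1} - M_t)^2\big)^{1/2}\), if \(\Phi\) is nondecreasing and doubling, \(\Phi(2x) \le \lambda\,\Phi(x)\), then
\[ \mathbb{E}\,\Phi(M^*) \;\le\; C_\lambda \, \mathbb{E}\,\Phi\big(S(M)\big). \]
The extra lower-bound hypothesis mentioned in the talk belongs to the variant actually needed here.]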
And the interesting thing is that you can't necessarily use this to bound the tail, because you can't use cutoff functions. But if you look at this integral, you don't really need to bound the tail precisely; you only need to bound it up to something of third order, because the integral is computing something that's second order.
So just making phi drop off cubically is enough, and a cubic is still doubling: if phi is just a cubic function you get something doubling, and that's enough to control this integral. Okay. This is the Burkholder-Gundy theorem. It's beautiful, and it somehow has to cope with all of this, and it does it using magic. I mean, Burkholder has a survey where he has all these techniques, and there's this really magical stopping-time argument that gets this to hold. Okay.
So I have to end now, but I did want to give one open question. Okay. So this was a question about whether or not martingales can aim -- you know, even subject to some bounds, kind of aim to be at a particular point. Let me just state another question that came up in some other work with Yuval, which is also about how well martingales can aim, and in this case better than we would have liked.
It's a way to see the deep richness of even very simple martingales. So consider the following class of martingales.
So I have a martingale on the real line -- actually, suppose it's a martingale on the integers -- and at every point it can go plus or minus one, or it can go plus or minus two. These are the only options the martingale has, okay?
It can just go left or right one step, or it can go left two steps or right two steps, okay? And you can even assume it's Markovian: it just makes its decision based on the integer it's at, okay?
So at every integer it goes plus or minus one, or plus or minus two. And now you can ask -- it's a whole family of martingales, and you can choose the rule however you want -- suppose you try to choose the rule so that you land at zero at time n as often as possible, okay?
You always have to move -- you can't just stay still, you have to keep moving -- but you choose the rule so that you maximize the probability of being at zero, okay?
So does anybody have a guess at what the upper bound should be? I want to say that no matter what rule you choose -- again, your rule can choose differently at every point, but all you can do is go plus or minus one or plus or minus two -- okay?
And now I want to give you an upper bound on how well you can hit zero -- on how well your martingale can aim for zero. Does anybody have a guess? Well, that's definitely a good upper bound, but -- no, no, no. So --
>>: You're trying to return at a fixed n, right?
>> James Lee: For a fixed n, but it's an asymptotic problem, so up to a constant I'm happy.
>>: But [inaudible] you’re trying to optimize it.
>> James Lee: You do know you're trying to optimize it, yes. Okay, so Kostya said the obvious conjecture to have -- one over square root of n -- and in fact Yuval lived with this belief for probably at least six months, that this should be the right answer. I mean, look, it's plus or minus one or plus or minus two; the best you can do should be something like that.
Now, the truth is we actually don't know what the answer is, but there is reason to suspect that one can actually achieve something significantly better, like n to the minus one-half plus epsilon. So there is some rule which does much, much better than standard random walk.
Okay, so computer simulations are bearing us out, and there is a differential equation which --
>>: [inaudible]
>>: [inaudible]
>> James Lee: I know, it's not like a little epsilon. I mean, epsilons come in various sizes, right? I mean -- [laughter]
There is a differential equation which suggests this. Yeah?
>>: The rule, what is it?
>> James Lee: Oh, that I don't know. Right. I can draw a picture of the rule for you, which is just that as long as you're within a certain area you do plus or minus one, because you're trying to stay near the origin. And then, once you realize you're getting screwed -- you've somehow gone way further than you want to be -- you just desperately start plus-or-minus-two-ing to try to get back to the origin.
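[Editor's note: a quick Monte Carlo sketch, in Python, of the two-regime rule just described. The window radius, horizon, and trial count below are illustrative guesses, not from the talk; the talk stresses that the true optimal rule and exponent are unknown.]

import random

def prob_at_zero(n_steps, radius, trials=50000):
    # Estimate P(M_n = 0) for the two-regime rule: fair +-1 steps
    # while |position| <= radius, fair +-2 steps otherwise.  Each
    # step has conditional mean zero, so this is a martingale.
    hits = 0
    for _ in range(trials):
        pos = 0
        for _ in range(n_steps):
            step = 1 if abs(pos) <= radius else 2
            pos += step if random.random() < 0.5 else -step
        hits += (pos == 0)
    return hits / trials

n = 200  # even horizon, so the pure +-1 walk can end at 0
print("pure +-1 walk  :", prob_at_zero(n, radius=n))           # radius never binds
print("two-regime rule:", prob_at_zero(n, radius=int(n**0.5)))

[For the pure plus-or-minus-one walk, P(M_n = 0) decays like n^(-1/2); the open question is whether some adapted rule does polynomially better.]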
>>: So if there are K units left and you're closer than [inaudible] K, then you walk plus or minus one; if you're further you walk plus or minus two? And [inaudible] values for this: instead of plus or minus one and plus or minus two, you stay at zero with some probability and you walk plus or minus one; then this can be [inaudible].
>> James Lee: If you stay at zero with high probability, I mean, it's uh --
>>: No. Every step you stay in place with some probability, and otherwise you walk plus or minus one --
>> James Lee: You know how to prove it for that rule? Or just up to a power?
>>: [inaudible]
>> James Lee: So there is a differential equation that Charles Smart analyzed, but the correspondence between the continuous and discrete cases isn't able to touch events as fine as being exactly at zero. So, well, now I guess we should ask Yuval about his proof.
So let me stop the talk. Thank you.
[applause]
>> James Lee: Is it a combinatorial type argument, or -- like, I mean, do you have an exact rule by hand? Okay, good.
>>: Any questions or comments?
>>: [inaudible] somehow the differential equation is the best thing for the continuous [inaudible], or is it just about --
>> James Lee: Even there, when you analyze it with two different rates, the power doesn't come out very cleanly.
>>: Well, [inaudible] found some old papers that [inaudible] indicated that for the
continuous case [inaudible].
>> James Lee: What’s the [inaudible]?
>>: [inaudible]
[laughter]
For some power it gives a bound.
>> James Lee: No, no. Well, as far as I know, and even from what Yuval says, for this the best known -- okay, there's a question of whether you can prove an upper bound of some power that's bigger; that is possible.
>>: That is easier.
>> James Lee: But I guess you're saying a lower bound that beats this; that's what you really want. Are you asking if we know any strategy that beats this?
>>: No, no. I'm asking what's --
>> James Lee: You can beat the trivial bound, which is one.
>>: So it does go down by a power, no? There is an upper bound [inaudible] power.
>>: [inaudible]
>>: No, no. For instance, an upper bound on the probability: there is an upper bound that's just one over n to the point one.
>>: Oh, okay.
>>: So you're trying to --
>>: The most [inaudible] thing is you're trying to prove the lower bound that shows that the upper bound --
>> James Lee: The upper bound cannot be one over root n, right? Which they can do apparently for a slightly different model.
>>: [inaudible]
>> James Lee: The point is that martingales, even though they seem very simple -- this one is almost trivial -- somehow there is still very sophisticated behavior going on.
>>: [inaudible] bounding a power of the martingale by a power of the square function, and that doesn't need the [inaudible] condition [inaudible]. Somehow for this application it was important to have something more general than just powers [inaudible].
>> James Lee: And I guess, yeah. This condition doesn't necessarily hold for a martingale, but what we do is actually give the martingale a little kick in case it doesn't satisfy it, and that can be absorbed into the square function.
>>: Any other comments or questions? Okay, so we adjourn.
[applause]