
Yuval Peres: And so thanks, all, for coming. Some of you have seen versions of this
talk before. This is yet another attempt to make the ideas comprehensible.
So this is joint work with Jian Ding from Berkeley and James Lee from UW. Please.
>>: [inaudible]
Yuval Peres: What?
>>: Where is your picture?
Yuval Peres: Well -- [laughter]. Okay. So please let me know if there are questions
along the way. We're going to consider random walks on graphs, and you can
think of having conductances on the edges or just think of simple random walk; that's
general enough.
So here are some motivating pictures from two dimensions. This is drawn by Raissa.
This is random walk on a 2-dimensional lattice, run for time about the square of the
side length, and colored by the number of visits to the different sites. Here we're running it on a
torus until everything is covered, and coloring by the time it took to reach the different vertices.
So the basic quantities for random walks are the hitting time, the expected number of steps
to hit one vertex from another; the commute time, the expected time to go from u to v and back to u;
and the one we'll be focusing on, the cover time, the expected time for the random walk to
visit all the vertices. And, of course, the starting point comes in here, so we're going to
start from the worst vertex, that is, take the maximum of the expectation over all
starting vertices.
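To make these definitions concrete, here is a minimal simulation sketch (the 8-cycle, the trial count, and the function names are illustrative choices, not from the talk) that estimates the cover time from the worst starting vertex:

```python
# Illustrative sketch (not from the talk): estimate the cover time of a small
# graph by direct simulation of simple random walk.
import random
import statistics

def cover_time_once(adj, start):
    """Number of steps a simple random walk started at `start` needs
    to visit every vertex of the connected graph `adj`."""
    unvisited = set(adj) - {start}
    v, steps = start, 0
    while unvisited:
        v = random.choice(adj[v])
        steps += 1
        unvisited.discard(v)
    return steps

n = 8
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # the 8-cycle
# Cover time = max over starting vertices of the expected time to cover.
estimates = {s: statistics.mean(cover_time_once(cycle, s) for _ in range(2000))
             for s in cycle}
print(max(estimates.values()))  # Monte Carlo estimate; exact value is n(n-1)/2 = 28
```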
So just to get acquainted, here are the orders of magnitude of cover times for some
different graphs. If you have a path of length n, it's n squared. Complete graph:
n log n. Expander: still n log n. Two-dimensional grid: n (log n) squared. And, again,
n here is the number of vertices. For some of these examples the constants in front
are known, but I'm not going to focus on the constants here.
For the 3-dimensional grid, and also for higher dimensions, it's n log n. And then for
the complete d-ary tree it's, again, n (log n) squared, up to a log d factor; one could
remove the log d because I'm not worrying about the constant, but you can think of d
growing, and then you want the log d.
For some of these examples, for instance this one, the constant was later found by
David Aldous, but we're just going to focus on the order of magnitude.
So people were interested in cover times for different reasons. One is as a simple
algorithm for determining connectivity in a network: you know the size of the
network, but suppose the operations you can do are very limited, so you just walk
around the vertices, record where you've been, and see when you've visited all vertices.
But I think then this study took on a life of its own, and people in computer science,
combinatorics, and probability got really interested in this, perhaps beyond whatever
practical motivations it had. So several people here worked a lot on cover times, and
I'll come back to that.
Now, one reason for interest is that hitting times are very easy to calculate, explicitly
and deterministically, using linear equations -- I'll come back to this point -- while the
cover time is more elusive.
Here are the general bounds that are available. The cover time of a graph is always
bounded above by 2 times the number of vertices times the number of edges. This is a very nice
argument based on spanning trees due to Aleliunas et al. And there's a
lower bound of order n log n; it's somewhat tricky to get the right
constant here -- this was done by Feige in '95 -- but a variation of
Matthews' older argument gives it up to a constant factor.
And these are hitting times. These are cover times. And we'll connect them to electrical
resistance in a moment.
So a crucial tool in analyzing both hitting times and cover times is the notion of
electrical resistance. You think of the edges as unit resistors, and there
is an effective resistance between one vertex and another. If I want to define it,
think of sending a flow from one vertex to the other, a flow that has to satisfy the constraint that the
incoming flow at every vertex equals the outgoing flow, except at the source and the
sink. I send a unit of flow from the source to the sink and then I calculate the energy of
the flow, the sum of squares of the flow over the edges, and try to minimize that. That's one
definition of the effective resistance: the minimum energy of a unit flow that goes from
one vertex to the other.
And there's a connection between the effective resistance and the commute time.
Remember, the commute time is the expected time to go from u to v and back, and the commute
time is just equal to twice the number of edges times the effective resistance. So a
general and very convenient formula.
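As a sanity check on this formula, here is a small sketch (the toy graph is a placeholder) that computes effective resistances from the Moore-Penrose pseudoinverse of the graph Laplacian and evaluates 2|E| times R_eff(u,v), which by the formula above is the commute time:

```python
# Illustrative sketch (toy graph is a placeholder): effective resistance from the
# Moore-Penrose pseudoinverse of the graph Laplacian, and the commute time
# via the formula C(u, v) = 2 |E| R_eff(u, v).
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # unit resistors on the edges
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1
L = np.diag(A.sum(axis=1)) - A        # graph Laplacian
Lplus = np.linalg.pinv(L)             # Moore-Penrose pseudoinverse

def effective_resistance(u, v):
    e = np.zeros(n)
    e[u], e[v] = 1.0, -1.0
    return e @ Lplus @ e

u, v = 0, 2
print(2 * len(edges) * effective_resistance(u, v))   # = commute time between u and v
```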
>>: [inaudible]
Yuval Peres: Right.
>>: I don't see how that works, because what if the graph is disconnected [inaudible].
Yuval Peres: Okay. So we're looking at connected graphs. So this is for connected
graphs. So that's part of the assumption, that the graph is connected.
>>: [inaudible]
Yuval Peres: What?
>>: [inaudible]
Yuval Peres: So if you --
>>: [inaudible]
Yuval Peres: There are no -- with this formula, there are no weights.
>>: [inaudible]
Yuval Peres: What? Right. The graphs are undirected. So this is for -- these formulas
are for undirected graphs, and everything extends to reversible Markov chains, but let's
just think of simple random walks on undirected graphs. That's a large enough class to
focus on.
And we'll see that this connection between the effective resistance and the commute
time gives both of them some nice properties. So the question of computation of cover
time I think was highlighted by Aldous and Fill in '94. If you look at hitting times,
they are easy to compute explicitly, because they satisfy linear
equations. If I want the hitting time from u to v, well, there's a first step from u, and
then I have to average over the neighbors w of u the time to go from w to v.
So I can write down these linear equations, and it's not hard to check that there are
enough equations and that the system is non-singular, so one can use them to determine all the
hitting times. So there's a very fast algorithm to determine the hitting times explicitly.
Of course, you could run the Markov chain, sample hitting times at random, and
average, but there's no need to do that. Linear equations are better.
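Here is a minimal sketch of that linear system (the toy graph and function name are my choices, not the talk's): for a fixed target v, set H(v,v) = 0 and solve H(u,v) = 1 + (1/deg u) times the sum over neighbors w of H(w,v):

```python
# Illustrative sketch: hitting times from the linear equations
#   H(v, v) = 0,   H(u, v) = 1 + (1/deg u) * sum_{w ~ u} H(w, v)   for u != v.
import numpy as np

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # a 4-cycle, as a toy example

def hitting_times_to(v):
    others = [u for u in adj if u != v]
    idx = {u: i for i, u in enumerate(others)}
    M = np.eye(len(others))
    b = np.ones(len(others))
    for u in others:
        for w in adj[u]:
            if w != v:
                M[idx[u], idx[w]] -= 1.0 / len(adj[u])
    h = np.linalg.solve(M, b)
    return {u: h[idx[u]] for u in others}

print(hitting_times_to(0))   # on the 4-cycle: H(1,0) = H(3,0) = 3, H(2,0) = 4
```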
Now, for cover time, it's more challenging because there are no simple equations like that.
One comment is that, of course, you can think of the cover time problem as a hitting time
problem, but on a much larger space. You can pass to a non-reversible chain which records
the set of vertices visited so far: the set visited so far by the
base chain itself forms a Markov chain, you start with a single
[inaudible], and you want to hit the whole space. So it is a hitting time problem.
This linear equation connection doesn't really depend on the fact that it's reversible.
>>: [inaudible]
Yuval Peres: Right. The state contains the set and where you are now and where the
[inaudible], and that's all the information you need -- the past is irrelevant -- to
determine the progress of this set-valued chain. So you can see when the set hits
the full space; that would be a hitting time. So you reduce it to a hitting time
problem, but on an exponentially sized state space, so it's not very pleasant computationally.
The other option is that we can estimate cover times just by running a random walk.
Since the cover time, both in expectation and in standard deviation, is
always bounded by order n cubed on an n-vertex graph, then just by running many
times and averaging the results you get a polynomial time algorithm to approximate the
cover time to any degree of precision. But, A, there are statistical errors, so you don't
get it exactly, and, B, there's the randomness.
So one question, which, again, as far as we've been able to trace goes
back to Aldous-Fill, '94, though Lovasz was also propounding it later, is to find a
deterministic order-1 approximation of the cover time for general graphs. So no
randomization.
>>: When you say order of one, do you mean difference or the ratio?
Yuval Peres: Ratio.
So here's the strongest result of this type known before. If H max is the maximal
hitting time, then there's a result of Matthews -- I'll remind you why -- that says
the cover time is bounded by H max times a log factor. But there's also a
lower bound of this type, and I'll say that in a minute.
The key thing is that Kahn, Kim, Lovasz, and Vu showed that if you
combine the two lower bounds -- of course the cover time is bigger than the maximal hitting
time, and it's also bigger than this Matthews-type quantity -- you get a pretty good
approximation. Indeed, it's sharp up to (log log n) squared in general.
So let me quickly go over those. First, for the upper bound, if we don't care about the
sharp constant, there's a very easy argument for the upper bound on the cover time.
If I start at a vertex and wait for time twice H max, then for any vertex the probability
that I've hit it must be at least a half. So the expected number of vertices I've hit by
time 2 H max is at least half the graph, and at most half the graph remains.
Now I repeat from where I am, and, again, just looking at these leftover vertices, each
one is hit with probability at least a half, so the expected number of leftovers is at most a
quarter after the second round. If we run a logarithmic number of rounds, the expected number
of vertices that have not been hit is going to be less than 1, so with high probability we've
covered, and this way we can bound the cover time.
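In symbols, the doubling argument: by Markov's inequality, from any starting vertex the walk hits any fixed vertex within time 2 H_max with probability at least 1/2, so after k rounds of length 2 H_max,

```latex
\[
\mathbb{E}\big[\#\{\text{unvisited vertices}\}\big] \;\le\; n\,2^{-k},
\]
```

which drops below 1 once k exceeds log_2 n, giving t_cov = O(H_max log n).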
What Peter Matthews gave is a more elegant argument that gives the right constant: the
cover time is bounded by H max times, approximately, the harmonic series, 1
plus a half plus ... up to 1 over n. And his argument was based on taking the sequence of
vertices, putting them in random order -- so adding randomness to the system -- and then
looking at which vertices in this random sequence you hit. It turns out that each time,
when you go from vertex k to vertex k plus 1, there's some chance that the (k plus 1)st vertex
in your sequence was already hit before. The chance it wasn't hit before, because of the
random permutation, is 1 over k plus 1. So that's the chance with which you'll need to pay that
hitting time.
Making that rough idea precise, Matthews proved this upper bound, and the same
idea proves this lower bound: for any set S you get a lower bound by looking at
the hitting times within the set times the harmonic series whose length is the size of
the set. You take a set you're interested in; in particular, if you're going to cover the graph,
you have to cover this set, so you have to travel between the elements of the set. You
take them in random order; we go from one to the next to the third, and so on. Each time we
lower bound the hitting time by the minimal hitting time within the set, but these hitting times
will only be paid with some probability, because with some chance the vertex has already been
hit before. So you get this kind of lower bound.
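Written out, the two bounds just described read, with H(u,v) the hitting times and H_max their maximum:

```latex
\[
t_{\mathrm{cov}} \;\le\; H_{\max}\Big(1+\tfrac12+\cdots+\tfrac1n\Big),
\qquad
t_{\mathrm{cov}} \;\ge\; \max_{S\subseteq V}\;\Big(\min_{\substack{u,v\in S,\; u\neq v}} H(u,v)\Big)
\Big(1+\tfrac12+\cdots+\tfrac{1}{|S|-1}\Big).
\]
```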
And then it turns out -- it's not obvious, but it was proved in this paper KKLV -- that if you
replace the hitting times here by commute times, it's the same up to a constant. And
this quantity is, again, sharp up to (log log n) squared: you take the maximum
of this quantity and the maximal hitting time, multiply by (log log n) squared, and get an
upper bound on the cover time. This was a non-trivial argument. And that yielded a deterministic
approximation up to this factor.
So the question of a constant approximation remained open.
>>: Your title is polynomial time because of the max over S. This doesn't seem to be
polynomial time.
Yuval Peres: Thank you.
So another thing that was done in this KKLV paper -- again, this is not
obvious -- is that you can estimate this quantity up to a constant in polynomial time. So
there's an algorithm for this. It's related to questions on packing and covering
of spaces: if you look for the best packing of balls in a metric space and you
want to estimate the largest packing of disjoint balls you could do, it's very
hard.
But if you are ready to forgo some constants, or especially some factors of [inaudible]
the log of the size of the set, then a greedy algorithm will give you a sharp enough result,
and that's roughly what is shown. So this part of the argument -- showing that this quantity,
which, as you say, is defined by a max over exponentially many sets, can actually be calculated
efficiently -- is a relatively easy part of their paper, not the hard part, but I won't do it here.
Thanks.
All right. And a very recent result is that for trees one can do a recursive approximation
that gives the cover time up to 1 plus epsilon: for any epsilon you get a polynomial
time algorithm. And this was actually predicted. There was a paper by
David Aldous that worked out the order of the cover time, with some further information,
for random trees.
So you take a random labelled tree on n vertices, picked uniformly among all such labelled
trees, and the cover time of that is of order n to the three halves. David's proof was
based on a recursion, and he wrote that he expected this type of recursion could be
used for an algorithm for general trees.
That turns out to be a correct prediction, although there's a lot of sweat and tears in actually
carrying it out for general trees. This can be found in the paper from September
by Feige and Zeitouni. However, trees still seem very special in that you can do these
kinds of recursions.
So now I want to come to a second motivation: suppose a probabilist is
interested in cover times, but not in algorithmic calculations. What's the relevance of
this quest?
Well, the KKLV paper, besides giving the algorithm, also gave a result on a question
of Winkler and Zuckerman. This is about blanket times. Blanket times are times
when we cover a graph in an approximately uniform way. Let me be more precise.
Look at a graph like the complete graph where the cover time problem is just a coupon
collector problem. So you just walk at random, you have n vertices, you wait until you
cover. We know that time is n log n. Even the constant is 1. And at the cover time
itself you have most vertices visited order log n times, and, of course, the last vertex to
be reached is visited just once. So you have this disparity.
So suppose I want a stronger notion of covering where all vertices will be visited about
the same. So what's the right definition there? Well, if I just run to infinity, I know that
the number of visits to every vertex is going to be according to the stationary
distribution, which is just the degrees.
So the right thing to look at is what we call the local times, which are the number of
visits to a vertex normalized by the degree. And if we look at these quantities, they
should balance out. So we know that if we let time go to infinity, the ratios of
all these local times to each other are going to tend to 1. But one could ask [inaudible]:
suppose I don't want to wait until infinity, I have only finite time, and I want those
ratios to be bounded by some constant -- 10, 2, 1.5, you name it. Let's fix a constant
like 2, and then we want all these ratios to be at most 2. How long after the cover time do we
need to wait?
So they examined this in some examples. Here's the formal definition: the beta
blanket time is the expected first time when all local times are within a factor beta of
each other. So beta is some fixed number bigger than -- I'm sorry, a fixed number less
than 1. You could take it to be a half.
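One way to write this formally (this is my formalization of what was just said, not a quote from the slide): with N^x_t the number of visits to x by time t, L^x_t = N^x_t / deg(x) the local time, and beta in (0,1) fixed,

```latex
\[
\tau_{\mathrm{bl}}(\beta) \;=\; \max_{v}\;\mathbb{E}_v\Big[\min\big\{\,t \;:\; L^{x}_{t} \,\ge\, \beta\, L^{y}_{t}\ \text{ for all } x,y\in V \,\big\}\Big].
\]
```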
And so the conjecture of Winkler and Zuckerman is that the blanket time is actually
equivalent to the cover time. The constants in this equivalence will
depend on beta, but once you fix beta, then you only have to wait a constant
multiple of the cover time for things to even out. That may be a little surprising.
They proved it for some special cases, actually not that many: the complete graph,
where it's an easy classical thing to analyze, but they also did it for the cycle and for tori. And in
all these cases they found that it holds, and then they made the brave conjecture that it's true
in general and that the constant won't depend on anything except beta.
>>: [inaudible]
Yuval Peres: It would what?
>>: [inaudible]
Yuval Peres: Yeah.
>>: [inaudible]
Yuval Peres: The --
>>: [inaudible]
Yuval Peres: The number of visits --
>>: [inaudible] it becomes like Gaussian, but there's got to be some [inaudible] number of them because --
Yuval Peres: So we have to wait for quite a while. On graphs you have to wait more than time -- certainly more than time n log n. But order n log n
could suffice if the cover time is n log n.
So, well, there is something surprising to this. You have to look at a few examples to
even convince yourself this is plausible. But that's what they did. And it turned out that
the estimates that were given by this paper KKLV that I mentioned before work for
blanket time just as they work for cover time. So since they estimated the cover time up
to log log n squared and the same estimation works for blanket times, then you
conclude -- even if you started with a goal of algorithmic calculation, you got a formula,
and this formula is good for blanket times too. So the arguments work there. So it
follows that the Winkler-Zuckerman conjecture is true up to log log n squared.
Okay. So now I'm going to shift gears and start telling you about our work on this. And
so to do that I need to introduce one more object, the Gaussian free field on a graph.
So this is a Gaussian process. So we have a sequence of Gaussian variables indexed
by the vertices. Gaussian process means all linear combinations are also Gaussian, or
you can think of this as obtained from an independent Gaussian vector by multiplying by
some matrix.
So that's a Gaussian process. This is a special Gaussian process, and it has several
equivalent descriptions. Here is one of them. The process is fixed to be 0 at some
vertex v0; that's an anchor vertex where the process is 0. And then, to know the
process everywhere else, it's enough to know these L2 distances; they determine the
process. And the squared L2 distance between the variables at two vertices x and y is
just the effective resistance between x and y.
Here's an equivalent definition, and I'll give you a couple more orally. A Gaussian
process is completely determined by its covariances, and the covariance between g_x
and g_y is the Green function, or Green kernel, of the random walk killed when it hits
the vertex v0: you start at x and count the expected number of visits to y, normalized
by the degree of y. That's a symmetric, positive definite function, and so there always
exists a Gaussian process with that as its covariance.
Let me tell you more intuitively what this is. This is a two-dimensional picture. In one
dimension -- suppose my graph is a long interval and v0 is one endpoint of the
interval -- the Gaussian free field is just a Gaussian random walk going along the
interval. That's all it is in that case.
More generally, if the graph is a tree and v0 is some fixed vertex, say the root of
the tree, then the Gaussian free field is obtained as follows. You give the edges
i.i.d. normal variables, and then to determine the value at a vertex you just sum these
variables along the path from the root to that vertex. So that's the Gaussian free field
on a tree.
Now, on a general graph, we want to do the same thing. So if we have a graph, we fix
some v0, assign the edges labels which are i.i.d. normal, and then define the value at a
point w to be: take some path to w and sum the values along this path, and that gives
me the value at w. That's a nice choice. But what if I choose a different path? I would
get a different value.
So to define the Gaussian free field, one thing we could do is use this definition but
insist that we get the same value no matter which way we go. In other words,
condition on the event that the sum of these Gaussian variables along every directed
cycle is 0. So you start with i.i.d. Gaussian variables, but then I impose this very strong
condition on them.
This is conditioning on a measure 0 event, but with Gaussians it's easy to do such
conditioning: conditioning is just projection for Gaussians. So you project on the
subspace where the sum along every cycle is 0, and once you've done this
conditioning you can do what you did before on the tree. So these definitions, and
several others, are equivalent definitions of the Gaussian free field, and it's been used
a lot in recent years by Dave [inaudible], [inaudible], and others, especially in the
continuum limit.
So in the example I did on the interval, if you take a limit of that, you'll get Brownian
motion. And in the plane -- on the lattice -- you could take a Gaussian free field and
take a limit of that; you'll get the Gaussian free field in the continuum, which is very
important, but not the topic I'll discuss today.
>>: Is there an easier way to sample from it? Maybe just computing these by the
resistances and [inaudible].
Yuval Peres: That's basically it. I mean, the Green function is kind of an inverse of a
Laplacian, so you can use that.
>>: Okay.
Yuval Peres: So you find Moore-Penrose inverses rather than computing the Green function
explicitly, but then [inaudible].
>>: Isn't it also a stationary measure if you just take the average of the neighbors and
then add a Gaussian to each [inaudible] simultaneously?
Yuval Peres: Right.
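Along the lines of this exchange, a minimal sampling sketch (the graph is a placeholder): the covariance of the field pinned at v0 is the inverse of the Laplacian with the row and column of v0 removed, which is the Green-function covariance described above; the last lines check empirically that E[(g_x - g_y)^2] equals the effective resistance.

```python
# Illustrative sketch (toy graph is a placeholder): sample the Gaussian free field
# pinned at v0.  Its covariance is the inverse of the Laplacian with the row and
# column of v0 removed, i.e. the Green function of the walk killed at v0.
import numpy as np

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
n, v0 = 4, 0
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1
L = np.diag(A.sum(axis=1)) - A
keep = [v for v in range(n) if v != v0]
Sigma = np.linalg.inv(L[np.ix_(keep, keep)])     # covariance of (g_x) for x != v0
C = np.linalg.cholesky(Sigma)

def sample_gff():
    g = np.zeros(n)                               # g_{v0} = 0 by construction
    g[keep] = C @ rng.standard_normal(len(keep))
    return g

# Empirical check: E[(g_x - g_y)^2] should equal the effective resistance R_eff(x, y).
samples = np.array([sample_gff() for _ in range(20000)])
x, y = 1, 3
print(np.mean((samples[:, x] - samples[:, y]) ** 2))
e = np.zeros(n); e[x], e[y] = 1.0, -1.0
print(e @ np.linalg.pinv(L) @ e)                  # R_eff(x, y), for comparison
```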
Okay. So here's our main result, or one form of it, which says that the cover time is
equivalent, up to a constant, to the following: you take the maximum of this Gaussian
free field, take the expectation of that, square it, and multiply by the number of edges.
This is also equivalent to the blanket time, but there the constant depends on beta.
So in particular, already stated this way, the Winkler-Zuckerman conjecture is true: the
blanket times are equivalent to the cover times. But we can't see this directly; we have to
make this detour via the Gaussian process.
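Spelled out, with g the Gaussian free field on G = (V, E) pinned at v0, t_cov the cover time and tau_bl(beta) the beta-blanket time, the statement is (as I transcribe it; c is universal, C_beta depends on beta):

```latex
\[
c\,|E|\,\Big(\mathbb{E}\max_{v\in V} g_v\Big)^{2}
\;\le\;
t_{\mathrm{cov}}(G)
\;\le\;
\tau_{\mathrm{bl}}(\beta)
\;\le\;
C_\beta\,|E|\,\Big(\mathbb{E}\max_{v\in V} g_v\Big)^{2}.
\]
```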
Any question on this statement?
>>: Have you determined the [inaudible] for estimating this?
Yuval Peres: Yes. So this is one randomized quantity. This is another randomized
quantity. So what have we gained? Well, what we've gained is now we're ready to step
on the shoulders of giants who have been analyzing Gaussian processes for 50 years.
So before getting to the intuition, let me give you another interpretation of this which I
like. Given a graph, how can we somehow see the cover time in the graph? Given a
graph, I want to give it a metric. The effective resistances form a metric -- they satisfy
the triangle inequality -- and if you have any metric, the square root of it is also a metric,
by concavity.
So let's endow the graph with this metric, the square root of the effective resistance.
It turns out that with this metric the graph embeds in Hilbert space, in Euclidean space:
you can take the vertices and put them in Euclidean space where the distances are
exactly the square roots of the effective resistances. In fact, the Gaussian free field is
such an embedding, because it gives an L2 metric, and the L2 distance between the
variables at two vertices is the square root of the effective resistance.
So think of actually taking the graph and putting it in space with these distances --
think of this graph as spinning in space. The maximal effective resistance is just the
squared diameter of the graph in this metric, but the cover time is also there, up to a
constant, as follows. Take this graph and project it onto a random line: take a line in a
random direction, project the graph onto it, and you get some random diameter D. Take
the expected diameter, square it, multiply by the number of edges, and you've got the
cover time.
So just by scaling by these constants you get from the diameter of this projection to the
cover time. So this graph, as embedded in Hilbert space, geometrically contains the
cover time. This is just a restatement of the previous theorem, because projecting
something in high-dimensional space onto a random line is essentially the same as taking
its inner product with a Gaussian vector: a Gaussian vector of dimension n has almost
constant norm, so multiplying by a Gaussian vector is the same -- up to a small error -- as
multiplying by a uniform element of the sphere, which is projecting onto a random line.
So it's easy to unravel this and see that it's equivalent to the previous one, but I like this
geometric picture.
>>: Can you get the deterministic algorithm in this way, by computing this embedding,
and then maybe some randomization principles applied to --
Yuval Peres: We don't know any way to do a deterministic [inaudible] this way. So that
would be exciting.
>>: So you can get the embedding? That part is okay?
Yuval Peres: Right.
>>: But this [inaudible].
Yuval Peres: The problem is this expected -- right. Right. This expected diameter in a
random direction. That's the trouble.
>>: [inaudible]
Yuval Peres: Okay. Well, first of all, I just claim that the graph with this metric can be
embedded, and you can see it in various ways. The way I like to think of it is that the
Gaussian free field is itself the embedding: to every vertex I attach its Gaussian
variable and I use the L2 metric, and the definition -- let's go back -- the definition of the
Gaussian free field says that the L2 distance between these two variables is the square
root of the effective resistance.
This is in Hilbert space. Now, it's a Hilbert space with n points, so you can always
think of it as R^n.
>>: So it's possible to go from the distances to some kind of location in a Euclidean
space [inaudible].
Yuval Peres: Yes. Yes, it is possible. If you know the metric and you
know it's Euclidean, then you can write down a matrix and make this more explicit.
Okay. So the key in the theorem is to relate the cover time to the maximum of the
Gaussian process, and so I have to tell you something about maxima of Gaussian
processes.
This is a highly studied subject. We're just going to take centered Gaussian
processes, so mean 0, and given any Gaussian process, it turns out the right way to
analyze it is to first look at the metric in which the distance between two indices is the
L2 distance between the corresponding Gaussian variables. So that's an L2 metric.
And then, in terms of that metric, the classical problem is to estimate the expected maximum.
Actually, the way this occurred classically is that people were really interested in processes
in the continuum and wanted to know: is the process continuous, is it bounded? But in
order to do that they took nets and tried to estimate the maximum on the net, and
that's how this problem arose.
So in a way this goes back to [inaudible] and [inaudible]'s proofs of continuity of Brownian
motion, this type of question, in the 1920s.
I'll tell you a little bit about the history of this, but let me jump to the answer. The final
answer was obtained in '87 by Talagrand, following earlier work of Fernique: the
expected maximum can be estimated by a quantity that is a deterministic function of the
metric space, given by a formula. The explicit form involves a quantification over all
partitions, or sequences of partitions, of the space -- certainly an exponential
quantification. But once you have a [inaudible] formula you can play with it. And in fact,
we do find a polynomial time algorithm to calculate this Talagrand gamma 2. I'll give you
the definition of gamma 2 later, but the key thing right now is that there is this formula.
So this formula was first suggested by Fernique, who gave an upper bound, and by far
the harder part was to prove the lower bound, which was done by Talagrand in '87.
So in view of this theorem, our cover time result can be stated as saying that the cover
time is the number of edges times this quantity gamma 2 of Fernique and Talagrand,
applied to the vertex set with the square-root-of-effective-resistance metric, squared.
And there is a deterministic polynomial time algorithm, which we construct in the paper,
to approximate gamma 2. So this answers the Aldous-Fill question.
>>: I don't know if this question makes sense, but is your [inaudible] for a gamma 2 for
any Gaussian process or just ones that arise in this way? Or does every Gaussian
process arise in this way?
Yuval Peres: Not every Gaussian process arises in this way. I think -- what we have is
intermediate. It's more general than this case, but it doesn't cover the completely
general case.
In fact, this work really started from some earlier work that Jian and I did together with
Asaf Nachmias and Martin Barlow at UBC, and in retrospect it struck us that this
connection should have been spotted before.
Let me give you some examples of the parallel between a Gaussian process and its
maximum on the one hand, and hitting times, local times, and cover times on the other
hand. But remember that the parallel is really between the cover time and the squared
maximum of a Gaussian process.
So what are the obvious lower bounds? The cover time is bigger than the maximal hitting
time, and, well, the expected maximum is bigger than the maximum of the expectations.
So what about upper bounds?
So I told you about Matthews' bound, which, if I don't worry about the constant, is really
immediate, and even if I do, it's pretty easy. And if I want to estimate the maximum of a
Gaussian process, I can just use the union bound on the tails of the variables, and if you
use the union bound, you lose a factor of root log n. The Gaussian tail is e to the minus
x squared (over 2), and the inverse of that is a root log. So just the union bound, for a
bunch of Gaussians, to estimate the probability that any of them is bigger than something,
will give you a root log n, or, in terms of the maximum squared, a factor of log n.
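In symbols, the union-bound computation: for n centered Gaussians g_1, ..., g_n with standard deviations at most sigma,

```latex
\[
\mathbb{E}\max_{i\le n} g_i \;\le\; \sigma\sqrt{2\log n},
\qquad\text{hence}\qquad
\Big(\mathbb{E}\max_{i\le n} g_i\Big)^{2} \;\le\; 2\,\sigma^{2}\log n .
\]
```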
Now, I told you about Matthews' lower bound, which lower bounds the cover time in
terms of hitting times within some subset times the log of the size of the subset. Here's
an important inequality for Gaussian processes which says that the maximum squared of
a Gaussian process can be bounded below by the squared distances between the variables
times the log of the size of the set I'm considering. So that's another parallel.
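The inequality referred to here is Sudakov's minoration; in one standard form, if t_1, ..., t_k are indices with pairwise distances at least epsilon, then for a universal constant c > 0,

```latex
\[
\mathbb{E}\max_{i\le k} g_{t_i} \;\ge\; c\,\varepsilon\,\sqrt{\log k},
\qquad\text{i.e.}\qquad
\Big(\mathbb{E}\max_{i\le k} g_{t_i}\Big)^{2} \;\ge\; c^{2}\,\varepsilon^{2}\,\log k .
\]
```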
Let me go further. There is classical Gaussian concentration, and there is a very similar
[inaudible], which I'll state exactly, for concentration of local times, proved by Jeff
Kahn and his collaborators. And then there's an integral to bound the maximum
of a Gaussian process, introduced by Dudley in '67, and there was a related sum, or
integral, in this recent paper with Asaf Nachmias and Martin Barlow,
and this is actually what rang the bell for us and made us see that these two are parallel.
And then we thought, well, in the study of cover times we are where the Gaussian folks
were in '67, so luckily we can use a time machine and ask where they were 20 years
later. Let's try and catch up.
So I'll give these parallels in a minute. Again, Sudakov minoration is what I told you
just a moment ago, and that's the parallel of Matthews' lower bound.
This is just concentration for Gaussian variables: the probability that a difference is bigger than
lambda is bounded by e to the minus lambda squared, normalized by
the variance, and the variance is this squared distance.
Here the corresponding concentration result was proved by KKLV, and, again, this fact is fairly
easy: the probability that the local time at one vertex and the local time at another
vertex differ by alpha can be bounded by e to the minus alpha squared over -- instead of the
squared distance, here you have the effective resistance times this local time.
Now, in this story L is the local time at one vertex, say u. We run the walk
until u has accumulated that local time; so we actually
run to a random time, capital T, which is the global time needed to accumulate local time
L at the vertex u.
And then the probability that these local times differ by more than alpha can be bounded this way.
Just to explain the philosophy here: if I have two vertices u and v, and I'm measuring
local time at u, I look at the excursions from u -- I go from u and wait until I return to u.
The number of visits to v in each such excursion is a random variable, and these random
variables are all independent, because in different excursions I'm adding i.i.d. things. And
the distribution is very simple: starting from u, to visit v I first have to hit v, and the
probability of that is essentially governed by the effective conductance between u and v;
and then, if I do hit v, I make a geometric number of visits to v before coming back to
u. So this is just a large deviation inequality for sums of essentially geometric random
variables, or geometric times [inaudible], and that's what's proved there and then used
to prove their bounds. So you see the parallels.
Finally, here is what kind of rang the bell for us. This is the Dudley 1967 integral. Given
any metric space, an important quantity is its metric entropy: the covering
number is the minimal number of balls of radius epsilon you need to cover the set. You
can also ask for the maximum number of disjoint balls in the set, and if you're
only interested in logarithmic asymptotics, these two things are essentially the same.
So anyway, that's for any metric space. And his bound was: given a Gaussian process,
use the metric that's adapted to the process and then integrate the square root of the log of
the covering number over all scales. It may look scary the first time you see it, but it's actually a very
easy bound to prove, and it's also sharp in many cases -- in all symmetric cases.
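For reference, Dudley's 1967 bound in the notation just introduced, with N(T, d, epsilon) the epsilon-covering number of the index set T in the intrinsic metric d and C a universal constant:

```latex
\[
\mathbb{E}\sup_{t\in T} g_t \;\le\; C\int_{0}^{\infty}\sqrt{\log N(T,d,\varepsilon)}\;d\varepsilon .
\]
```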
In the work I mentioned, B, D, and P, we were actually interested in specific graphs:
the cover time of nearly critical random graphs, which are kind of delicate. So in order
to prove the estimates there, we had to write down a pretty good general estimate, and
after we did, we looked at it again and recognized that it was very close to Dudley's,
and that's what brought the connection to mind. And then we said, well, Dudley's is a
good upper bound, but in the Gaussian setting the sharp answer is known -- maybe the
sharp upper bound from there fits our problem.
Okay. And this is a reminder. This is the sharp result by -- again, for the life of us, we
cannot find a picture of Fernique so you just have to see Talagrand twice.
So that's his theorem that gives the exact estimate, and here is the definition of the
gamma 2 that appears in it. It's rather elaborate if you've never seen it before. You
look at a sequence of partitions, each one a refinement of the previous, with sizes
growing doubly exponentially, 2 to the 2 to the k. Then, given a point x, you look at the
diameter of the partition element A_k(x) that contains x and add those up with the
weights 2 to the k over 2. By the way, it's hard to parse where this comes from, but 2
to the k over 2 is the square root of the log of the size here; that's what's coming in
here.
And so you take sup over all points and inf over partitions. So the way this is defined, it
doesn't look algorithmic at all, and there's quite a bit of effort to prove that this can be
calculated in polynomial time. I won't go into that now.
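In symbols, the definition just described (one of several equivalent standard forms): take partitions A_0, A_1, A_2, ... of T, each refining the previous one, with |A_0| = 1 and |A_k| at most 2^(2^k), write A_k(x) for the element containing x, and set

```latex
\[
\gamma_2(T,d) \;=\; \inf_{\{\mathcal{A}_k\}}\;\sup_{x\in T}\;\sum_{k\ge 0} 2^{k/2}\,\operatorname{diam}\big(A_k(x)\big),
\]
```

so that Talagrand's theorem reads: the expected supremum of the process is comparable to gamma_2(T, d), with universal constants.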
All right. So I'll give you the rough reason these kinds of estimates arise. This is the
method of chaining. If you have the maximum of independent standard Gaussians, you
expect the union bound to be tight: for the maximum over k points you expect order
square root of log k. And, indeed, that's easy to prove in the independent case; in the
independent case all these points are far apart.
More generally, we can prove something like that for processes that are nearly
independent, by combining Gaussian concentration and Sudakov minoration.
Now, once the process is not independent, a union bound is too crude. So
the idea, which really goes back to Kolmogorov, is to use chaining. Suppose the
geometry of my space looks like this, and we want to bound the Gaussian process.
Remember, the distance between two points is the standard deviation of the difference
of the corresponding Gaussians.
So what you do is first bound the process on these gray points using a union bound;
those aren't that many. And then say that at a red point the process is not too far
from the corresponding gray point. So this is two stages of chaining: you bound
on the gray points using a union bound, and there aren't too many of them; and then
for the red ones, if you just look at their distance from your base point, that's
too big and there are too many of them, but if you compare each one to the
corresponding gray one, well, we have lots of red points, but now each is tightly
coupled to its gray one.
And you just have to iterate this again and again. So two levels of chaining in general
won't be enough. You have to do a growing number of levels of chaining.
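A two-level version of the chaining step in symbols, with N the set of gray points and pi(t) the gray point assigned to t:

```latex
\[
\max_{t\in T} g_t \;\le\; \max_{s\in N} g_s \;+\; \max_{t\in T}\big(g_t - g_{\pi(t)}\big),
\]
```

and each term is handled by a union bound: the first costs roughly the square root of log|N| times the diameter, the second roughly the square root of log|T| times the small distances d(t, pi(t)). Iterating over a growing number of scales produces exactly the weighted sums appearing in gamma 2.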
Now, how do we technically connect the Gaussian world with the random walks? Well,
if all you're interested in is upper bounds for the cover time, you can do it directly using this
KKLV concentration I told you about, which is the analog of the Gaussian concentration:
just repeating Talagrand's argument for his upper bound immediately gives you the upper
bound here.
But that method, A, won't be as sharp as the one I'll tell you about, and, B, it doesn't
help with lower bounds. With Gaussians, a key thing that allowed Talagrand to
prove lower bounds -- well, there are several key things, but one is that you can
understand very well what happens under conditioning. If you have a Gaussian
process and you've already conditioned on a bunch of the variables, you understand easily
the distribution of the remaining ones.
Now, with local times, if I know the local time at one vertex, I can still understand pretty
well the distribution of the others, because I can just think of excursions from that vertex.
But once I start conditioning on local times at 17 different vertices, I'm in trouble,
because of the combinatorial intricacy of in what order I visited those vertices; computing
the effect of that on other vertices is hard.
So the key is to make a more direct connection from local times to Gaussian processes.
The first such connection is due to Ray and Knight in the '60s, and then Dynkin in the
'80s -- there was some intermediate related work, but Dynkin realized the general
connection -- and this was continuously refined. And this is the best theorem we know,
due to Eisenbaum, Kaspi, Marcus, Rosen, and Shi; that and Talagrand's theorem are
the most important technical tools.
So here's the statement. Fix a vertex and look at the local time L at that vertex: we're
going to run the chain until we accumulate local time L at that vertex, and capital T is
the time we run it to.
Let g be the Gaussian free field on the graph. Then we have the following identity in
distribution -- this is not an estimate, it's an identity in law of two things. The
right-hand side is very simple: we take this constant L that we fixed, look at g_x minus
root 2L, square it, and divide by 2. So this is essentially the Gaussian free field: you
shift it by a constant and square it. The right-hand side is pretty benign.
Now, the left-hand side is a convolution, a sum of two independent terms: on the
left-hand side this Gaussian is independent of this local time. So you take the Gaussian
free field, square it (and halve it), and add the local times of an independently run
random walk.
So the Gaussian free field is connected to the walk in the sense that the metric defining
it is this square root of effective resistance metric, but note
that the Gaussians here are not affected by this L, the local time we run to. So if
you run for a very long time, then this is going to grow, and, of course, this is going to
grow, but these Gaussians are not going to change.
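In symbols, as I read the statement (the sign in front of g is immaterial, since the field is symmetric): the walk starts at the fixed vertex v0, tau(L) is the first time its local time at v0 reaches L, the L^x_{tau(L)} are the local times at that moment, and g is the Gaussian free field pinned at v0, independent of the walk. Then, jointly in x,

```latex
\[
\Big\{\, L^{x}_{\tau(L)} \;+\; \tfrac12\, g_x^{2} \,\Big\}_{x\in V}
\;\overset{d}{=}\;
\Big\{\, \tfrac12\,\big(g_x - \sqrt{2L}\,\big)^{2} \,\Big\}_{x\in V}.
\]
```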
This is kind of an amazing identity -- yes?
>>: I just got a little bit confused. So this Ray-Knight theorem was actually about, like,
local times of random walks --
Yuval Peres: No. Ray-Knight was about local times of Brownian motion. Ray-Knight
said that if you look at local times of Brownian motion, and you stop the Brownian
motion properly, you can write them as the square of another Gaussian process.
>>: [inaudible]
Yuval Peres: So that was -- Ray-Knight was on Brownian motion.
Now, Dynkin's theorem was for general Markov processes. And the reason people
were interested in this is mostly because they were studying more and more general
processes and more and more general spaces. They were not interested in random
walks on graphs. And they wanted continuity criteria for local times on the general
spaces.
So that was the impetus. But to get the sharp conditions, they really needed very
precise estimates. It turns out that these estimates are very useful in our
problems, which involve random walks on finite graphs.
So this is a key identity. Let me quickly sketch at least some things you can see from it.
Say you want to bound blanket times or cover times from above. What you need to show
is that if the time is much bigger than the number of edges times the expected maximum
squared, then the local times are about uniform.
Rescaling between global time and local times, what you need to show is that if the
local time at the fixed vertex is much bigger than the expected maximum squared, then
the local times are approximately uniform. By scaling, that's basically what we need to
show.
And, remember, we have this identity. Even the people -- Dynkin, Rosen, and others --
who discovered and used these identities (there's a book by Marcus and Rosen basically
centered around this connection) say themselves that they don't really understand why it
works. Basically, the proof of the equality is by calculating joint [inaudible] transforms of
both sides; they turn out to be big determinants which can be identified.
And there are other proofs, too, which are even more intricate. But there's no intuitive
understanding of why this holds.
Okay. So we have this. Now let's just open the square on the right-hand side, so we
can write it this way. And now I'm going to copy this, but using bigger type for the
bigger quantities.
The idea is that we want to run to a time which is large, so this local time L will
be large, and then these local times will also be large. So this is the same
formula you saw before, but emphasizing the fact that we're running to a local time L that's
much bigger than the maximum squared of this Gaussian process.
So this Gaussian process becomes like noise, and we have this identity. L here is a
constant -- the local time at the one vertex we fixed -- while here is the local time field at all
the vertices, which is what we're trying to control. What this identity is telling us is
that this local time field is approximately constant, which is what we want: we want
uniformity. So we get this upper bound quite cheaply from the identity.
So here's the same picture. Here we have this constant local time and we add to it this
noise; here we have this unknown field and we add to it this noise. These two things have
the same law, so this field must be approximately constant.
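A back-of-the-envelope version of this step: expanding the square, the identity reads, jointly in x,

```latex
\[
L^{x}_{\tau(L)} \;+\; \tfrac12\, g_x^{2}
\;\overset{d}{=}\;
L \;-\; \sqrt{2L}\, g_x \;+\; \tfrac12\, g_x^{2},
\]
```

so when L is much larger than the square of the expected maximum of |g|, the terms involving g are of order the square root of L times the maximum of |g|, which is much smaller than L, and the local time field is forced to be close to the constant L, uniformly in x.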
The lower bound on the blanket time follows a similar idea, and then it gets stuck, so I'll
quickly tell you about that -- maybe I'll jump to the picture in a moment.
We start again with this identity, and just throw away these squares. So you have a
bound: the square root of the local time can be bounded by root 2L minus g_x. Now I'm
going to write that down and compare: this is the square root of the local time, and this
is root 2L minus -- you know, compared to g_x.
And now we use the fact that the maximum of the Gaussian is concentrated, so it's close
to its expectation. So --
>>: Finish in five minutes.
Yuval Peres: So it's close to its expectation, so this difference is relatively small. And so
there must be some vertex where the square root of the local time is of smaller order than
this height, and this height is the square root of, well, twice the local time at some
other vertex. Because of that, we can pretty easily lower bound the blanket time.
But lower bounding the cover time is much harder, because that involves finding a place
where the local time actually is 0.
So in the last three minutes let me sketch the issue there. Three quarters of the technical
work starts here, but I won't be able to do it: how to exhibit a vertex where the local time
is actually 0. These types of estimates you just can't get from the isomorphism theorem
directly.
The observation is that the local time can't be positive and
extremely small: if you actually visit a vertex, and let's say it's far away from
your base vertex, then you're going to visit it several times before you come
back to your base vertex. So, given that it's positive, you can bound it below, up to an
exponentially small exceptional probability.
So the idea: in order to give a lower bound for the cover time, we take this time, because
we want to show it is less than the cover time, and we want to find a local time that's 0 there.
What are we going to do? We're going to find a local time that's very small and then argue
that if it's very small, it must actually be 0.
So how do we find a local time that's very small? Before, we used the fact that the maximum
of the Gaussian is close to its expectation, but that's not close enough for us. The
key is to put a threshold that's much lower -- I drew it here at half the height, but
actually you need, say, one millionth of the height; epsilon is some constant. If you look
at mountains that are not the tallest, but much lower than that, then there are lots of
such mountains, so you can find one that's close to your target.
And here's another picture. If you think of the levels of the Gaussian particles as
competing, this is the winner, and it's pretty near its expectation. But if we go below,
then, as the winner pulls ahead, there are lots of stragglers, and we can find some very
close to a precise target.
Actually, doing that involves ideas from percolation on trees, which I don't have time to
go into. The paper is on the arXiv for those interested. We have to use both
specific properties of this Gaussian process, and electrical network theory, and ideas
from percolation on trees to find this kind of point.
So I'm going to end the sketch of the proof here and finish with a couple of open
questions -- really one main open question. What about more precise asymptotics for the
cover time?
These are available in not too many examples. Here is one kind of striking
connection. If you look at the Gaussian free field on a 2-D lattice, the expected sup was
calculated by Bolthausen et al. to be logarithmic, with constant the square root of 1 over 2 pi.
On the other hand, in the paper with Dembo, Rosen, and Zeitouni, we showed that for the
2-D torus the cover time is asymptotic to 1 over pi times n (log n) squared. So the relation
that I mentioned before as holding up to a constant is, in this example, actually an asymptotic
equality: the ratio of these two things is 1 plus little o of 1.
Well, is this general? If I want to talk about asymptotics, I want the cover time to be
concentrated. And here, luckily, a theorem of David Aldous comes to help. He proved
that for any sequence of graphs where the hitting times are negligible compared to the
cover times, the cover times are concentrated: the cover time, seen as a
random variable and divided by its mean, goes to 1 in probability.
So let's only look at sequences of graphs like that, where the cover time is concentrated.
Then there's more meaning to estimating the constant. And we don't have a
counterexample to this relation holding asymptotically. On the other hand, the examples
where we can really calculate both sides are limited, so it's not a lot of evidence. But actually,
using the isomorphism theorem, we can prove it in one direction.
So I said this is the open question. It holds in a few examples. Probably it can be
checked whether it holds for general trees; we didn't do that.
Let me -- you know, ignore this slide, but just look at this squared quantity. We do know
that the cover time is bounded by the right-hand side up to 1 plus little o of 1. So in this
asymptotic conjecture, one side is true and follows from the isomorphism theorem; I
won't belabor the argument at this hour. But let me end with two
questions. Is there a deterministic polynomial-time 1 plus epsilon approximation to the
cover time?
We don't know that. And even if that asymptotic relation is true, it doesn't solve this, because
we don't know a 1 plus epsilon approximation to the right-hand side: Talagrand's theorem,
or our version of it, only gives you an approximation up to a constant.
So these two challenges are both -- are related but are different. Is there a 1 plus
epsilon approximation?
Another question which is related to Aldous's result is can you make the concentration
in David's result more explicit? Is the standard deviation of the time to cover, the cover
time seen as a random variable, is that actually bounded by the maximum hitting time?
That would correspond to things we know about Gaussian processes, but the
connection we have between Gaussians and local times is not strong enough to control
fluctuations, so this is open.
Thanks for your attention.
[applause]
Yes?
>>: So you may have answered this in a language that I don't recognize, but to what
extent can this be applied to [inaudible] compact manifolds and [inaudible].
Yuval Peres: I expect it can be applied, but this hasn't been done. All the tools, the
isomorphism theorem and so on, apply in that setting, but it hasn't been done. The way
the cover time has been estimated exactly, say for squares in Z2, was first
to look at Brownian motion on the torus and then use approximation.
Sometimes the Brownian questions can be easier because of additional symmetry.
>>: That Talagrand gamma, you calculated it exactly or --
Yuval Peres: No, no, approximately. But even Talagrand's gamma is really only defined
up to constants. If you look in different papers of Talagrand, each time gamma is
different -- or on different pages of one paper [laughter] -- because there is a definition using
packing trees, using covering trees; there are many different definitions. They're not
the same, but they're all equivalent up to universal constants. And the algorithm that we
have, again, doesn't calculate any of these gammas at all; it's yet another quantity
that's equivalent to them up to a constant. So it's really loosely defined.
The only things that are kind of rigid are the cover time and the maximum of a Gaussian
process.
>>: I know you have this time machine [inaudible].
Yuval Peres: Yes [laughter]. But the time machine is restricted to the cover time versus
Gaussian connection so far.
>>: [inaudible]
Yuval Peres: What?
>>: [inaudible]
Yuval Peres: Okay. Thanks.
[applause]