>> Yuval Peres: Good afternoon. Really delighted to have this talk here, given that this
conjecture, which I knew as the Aldous-Diaconis conjecture, is something that I was wondering about
for many years. So it's really a pleasure to see the solution. So, Pietro Caputo.
>> Pietro Caputo: Thank you very much. It's very nice to be here, and it's a pleasure to give this
talk here. So I will talk about a theorem we proved around last year with Tom Liggett and
Thomas Richthammer while I was visiting UCLA. This result will be published in Volume 23 of
the Journal of AMS.
So to start with, I'm going to formulate the problem, although it's probably well known to all of you.
But let me just recall what -- let me just go through the outline of this talk first.
So as I said, I will start by reviewing some standard facts about random walks on weighted
graphs, and the interchange process. Then I will formulate what we call the Aldous conjecture; some
people call it the Aldous-Diaconis conjecture. Well, it's clear that both had a lot to do with this
problem, so I guess either way would be correct.
And we will see how this result will have consequences for other processes like exclusion
processes. And if we have time, maybe we can look at other applications for related processes
on weighted graphs such as perfect matchings and other models.
So the talk will be mainly -- hopefully there will be enough time to go through the main ideas.
This involved essentially a new idea to perform some recursion over weighted graphs, which
allows one to prove the result for arbitrary graphs, and this recursion leads naturally to a new
comparison inequality, which required quite a bit of work to prove. And I won't be able to
give the details of the proof of this inequality.
But you will see why it comes up and what is interesting about it. So let's start with the standard
facts. So we start with a graph, a weighted graph. Namely, we consider vertices V and complete
edges. So every vertex has an edge with any other vertex, but we put a weight on the edge XY,
which is CXY. So CXY is a nonnegative weight, and we assume that the resulting skeleton is
irreducible; that is, the edges with positive weight form a connected graph. The random walk is
a standard process where you attach to each edge an independent Poisson clock with a rate
CXY. And when your particle sits at some vertex -- here, in this case, vertex one of this simple five-
vertex graph -- then if the edge 1-2 rings, with its rate C12, then you move the particle
from 1 to 2 and so on.
Of course, if another edge rings where the particle is not present, then nothing happens. This is a
very simple process, which is described by this infinitesimal generator. So it's a continuous time
Markov chain with this infinitesimal generator, which I call L random walk, and it is reversible with
respect to the uniform measure, that is, the counting measure on the vertex set V.
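In symbols (the notation here is supplied for the reader; the talk's slides are not reproduced in this transcript), the generator just described can be written as:

```latex
% Random walk generator on the weighted graph G = (V, c); c_{xy} >= 0 are the
% edge weights. It is symmetric, hence reversible for the counting measure on V.
(\mathcal{L}_{\mathrm{RW}} f)(x) \;=\; \sum_{y \in V} c_{xy}\,\bigl[f(y) - f(x)\bigr],
\qquad f : V \to \mathbb{R},\; x \in V.
```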
So this is a simple Markov chain. The interchange process on the weighted graph G consists of
the Markov chain where at each edge we again have the same collection of Poisson clocks with
the same rates. And now we have a permutation of the labels. Namely, we have N particles if
the vertex set has cardinality N. So at each vertex there is one particle. So the configuration is
given by a permutation. Say for this case here if the edge 1-2 rings, then we swap the label
sitting at the two vertices on that edge.
So this gives us a continuous time Markov chain with this infinitesimal generator, where here the
function F is a function on permutations. And F of eta-XY is just F evaluated at the permutation eta
after a transposition of X and Y has been performed. So this is the infinitesimal generator, which
is reversible, with respect to the uniform measure over all permutations of N labels.
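Again in supplied notation, writing eta for the permutation and eta^{xy} for eta with the labels at x and y swapped, the generator just described reads:

```latex
% Interchange process generator; eta^{xy} denotes the configuration obtained
% from eta after the labels at x and y have been exchanged.
(\mathcal{L}_{\mathrm{IP}} f)(\eta) \;=\; \sum_{x < y} c_{xy}\,\bigl[f(\eta^{xy}) - f(\eta)\bigr],
\qquad f : S_n \to \mathbb{R}.
```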
So this can be seen as a model, as a simple model of interacting random walks, if you want.
Because each label is performing a random walk, and the interaction is that there are no two particles sitting at the same vertex.
Okay. So let me skip some simple details here. So let's look at the eigenvalues. So eigenvalues
of these two matrices -- we have seen that these two processes are described by two symmetric
matrices: L for the random walk and L for the interchange process.
So in both cases the matrix minus L has nonnegative eigenvalues; let's call them lambda 0 for the
trivial eigenvalue 0, then lambda 1 up to lambda M minus 1, where M is just the size of the state
space. The spectral gap is the first eigenvalue lambda 1, which is nonzero because of the irreducibility
assumption.
And this is often called the inverse of the relaxation time, just because if you project a vector F --
so a function F -- along the eigenvectors of the matrix L, then
if you look at the evolution at time T of this function, what you obtain is a
trivial term, which is just the projection along constants,
and then the dominant term is usually given by the projection along the spectral-gap
eigenfunction.
So in the long run it would be the dominant term, if there's a nonzero projection along
the spectral-gap eigenfunction. This is why it's usually called the relaxation time. And it has some relevance
when you analyze convergence to equilibrium of this process.
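A sketch of the spectral decomposition being described, with notation supplied here (phi_i are orthonormal eigenfunctions of minus L with eigenvalues lambda_i, and pi(f) is the projection on constants):

```latex
% Evolution of a function f under the semigroup; for large t the lambda_1 term
% dominates whenever <f, phi_1> is nonzero, which is why 1/lambda_1 is called
% the relaxation time.
e^{t\mathcal{L}} f
\;=\; \pi(f) \;+\; \sum_{i \ge 1} e^{-\lambda_i t}\,\langle f, \varphi_i\rangle\,\varphi_i
\;=\; \pi(f) \;+\; e^{-\lambda_1 t}\,\langle f, \varphi_1\rangle\,\varphi_1 \;+\; \cdots
```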
So the conjecture I want to talk about and describe here is the conjecture formulated around the
early '90s that for every graph -- I think the original formulation refers to unweighted graphs, so only
weights equal to 0 or 1 -- the spectral gaps of the two processes I just described coincide. So one sort of obvious
observation is that the spectral gap of the interchange process should always be at most the
spectral gap of the random walk process on any graph G. This simple observation comes from
the fact that the random walk is really a sub process, if you want, of the interchange process.
Namely, if in the interchange process you only follow one label, what you obtain is the random walk
process.
So it's clear that the slowest mode of the interchange process should be smaller than or equal to the
slowest mode of the random walk process. So there is some work to do on the other hand to
prove that these two are equal. And it's actually not obvious at all at first.
But this was the conjecture, based on several observations and several works of Diaconis and
other co-authors that actually showed that this was the case in several instances.
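To fix notation for what follows (again supplied here), the conjecture and the easy half just explained can be summarized as:

```latex
% The Aldous (Aldous-Diaconis) conjecture, now the theorem of Caputo, Liggett
% and Richthammer, together with the trivial direction coming from the
% projection onto a single label.
\lambda_1^{\mathrm{IP}}(G) \;=\; \lambda_1^{\mathrm{RW}}(G)
\quad \text{for every weighted graph } G,
\qquad\text{while trivially}\quad
\lambda_1^{\mathrm{IP}}(G) \;\le\; \lambda_1^{\mathrm{RW}}(G).
```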
So we will see a little bit of history of this problem. So maybe I will quickly say that this conjecture
has consequences for a large family of other processes that can be constructed as projections of
the interchange process. And one of these is the exclusion process. The exclusion process in this
case would be described as follows.
Take K particles on your graph. So let's take two particles on our five vertex graph. And let's run
again the same Poisson clocks on each edge.
And here, when an edge rings, you exchange the contents of its two end points, just as before.
So it's the same as having unlabeled particles. So in this case this can be obtained from the
interchange process by calling, say, particles two and particle five black and all the remaining
white or something like that.
So this can be obtained trivially from the interchange process. And the fact that you can obtain
this as a projection immediately proves that all the eigenvalues you can see in the exclusion
process are contained in the spectrum of the interchange process.
So you have this inclusion here. On the other hand, you also have the inclusion that all the
eigenvalues you can see in the random walk you can see in the exclusion process.
This is also quite simple. You can't just project trivially here; you have to use symmetry and take
a symmetrized eigenfunction of the random walk. But it's a simple matter.
So you have this double inclusion. So once we prove the conjecture, we will have proved that for any
number of particles the spectral gap of the exclusion process coincides with the spectral
gap of the random walk. So there are some other processes, but I will not go into the details now.
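The double inclusion just described, in the notation above (EX(k) denotes the exclusion process with k particles):

```latex
% Both inclusions hold for every k, so equality of the outer gaps forces
% equality of the exclusion-process gap as well.
\mathrm{spec}\bigl(-\mathcal{L}_{\mathrm{RW}}\bigr)
\;\subseteq\;
\mathrm{spec}\bigl(-\mathcal{L}_{\mathrm{EX}(k)}\bigr)
\;\subseteq\;
\mathrm{spec}\bigl(-\mathcal{L}_{\mathrm{IP}}\bigr)
\quad\Longrightarrow\quad
\lambda_1^{\mathrm{IP}}(G) \;\le\; \lambda_1^{\mathrm{EX}(k)}(G) \;\le\; \lambda_1^{\mathrm{RW}}(G).
```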
So let me say a few things about what was previously known for this problem.
>>: What was the matching process that you went through? Very quickly?
>> Pietro Caputo: Yeah, very quickly. So just as I looked at the exclusion process as a
projection of the interchange, I can look at the other projections. So here we need an even
number of sites. So let's take six, say, and look at --
>>: Swap like this?
>> Pietro Caputo: Look at all the perfect matchings of the complete graph. So think of the
complete graph. But the rules for swapping are inherited from the weighted graph. So what you
do is if this edge 1-2 rings in our Poisson clocks, what you do is you take these two end points
and exchange them.
So what you obtain is this matching configuration. It turns out that this is also a projection of the
Markov chain of the interchange process. So there is again a trivial inclusion of the spectra. And
then, if you can prove that the spectral gap of the interchange process is equal to the spectral gap
of the random walk, what you obtain here is only that the spectral gap of the matching process is larger
than or equal to the one of the random walk. You don't obtain that they're equal. And in fact they're not
equal.
So you can show that for the complete graph, with all weights equal to 1, this one is strictly larger
than the random walk gap.
So at this level you can only get a bound. But, yeah. So there's a whole family -- it's easy to
describe. But this is an example. Other questions? So, for previously known cases, there is a line of
attack of this problem which is purely algebraic, and it starts with a celebrated work of Diaconis
and Shahshahani from '81, where they actually compute the full spectrum of the operator L of the
interchange process in the case of the complete graph. So all weights equal to one.
And this can be done with some detailed knowledge of the characters of the transpositions for the
irreducible representations of the symmetric group. Recently Filippo Cesi pushed
this approach in a nontrivial way to obtain the same conclusion for all complete multi-partite
graphs. This means that if you take the complement of the graph you obtain a bunch of complete
graphs. So weights all equal to 1 again -- either 0 or 1 -- and this is for complete multi-partite graphs. So
you need some special algebraic structure, because this is all based on exact computations for the
irreducible representations of the symmetric group. So it's quite clear that such an approach has
very little chance of getting to a full proof of the problem.
So on the other hand, the first paper that took a recursive approach to this problem,
not based on exact computation, was a paper of Handjani and Jungreis, where they proved the result
for all weighted trees.
So arbitrary weights, but on a tree. Similar things were obtained in a different context for spin
chains by Koma and Nachtergaele. And then, based on this recursive approach, some more recent
papers by Morris and by Conomos and Starr proved asymptotic results in the spirit of the full
conjecture for box-like subsets of ZD.
Let me just explain a little bit about this recursion, because it plays a role in the following: So this
recursion -- it's quite clear that you want to build up some recursive idea, because it's trivially
true that the conjecture holds on, say, two vertices.
So you want to try to build up your graph by an inductive reasoning. So the way they did it was
by taking your graph and simply removing a vertex and all the edges that are adjacent to that
vertex.
So that is not a huge problem when you have a tree, if you start from a leaf. You can do that.
And it turns out that you're not disturbing the graph too much, so that you can go through
successfully in this reasoning.
On the other hand, if you take boxes in ZD, if you start removing your vertex at the boundary, you
do disturb the graph a little bit. But what they proved in these two papers is that this does not disturb it so much in the limit.
So asymptotically you do get through. So that was the key for these results. So as I said, we
have a full proof of the result for all weighted graphs.
And I'm going to describe this theorem now. So the corollary, as I briefly discussed for these other
processes, is that you get equality for all exclusion processes and a bound for other processes
like the matching process or other forms.
So as I said, the way we prove this result is by a refinement of the recursive strategy, which was
mentioned before. Namely, we want to go from a graph G to a graph GX where the vertex X has
been eliminated.
So how do you do this in such a way that you don't disturb the graph too much? One guess could
be to write an equivalent electric network on the remaining vertices, equivalent
meaning that if you apply potentials to two nodes and look at the potential across an edge inside
the network, this stays unchanged in the two networks -- in the graph G and in the graph GX where you
have removed the vertex X. So this is what can be called an electric network reduction. And it
have removed the vertex X. So this is what can be called an electric network reduction. And it
turns out that if you implement this idea, then what you have to show to get the result here is
some inequality comparing two Dirichlet forms for weighted graphs. And this inequality turns out
to be very tricky.
And the reason why this is tricky is that in some sense there is a lot of degeneracy here. There
are a lot of cases where these inequalities are really exact identities.
So to prove a bound is not that easy. But we will see what this new inequality looks like. So
at some point we did understand that this was the right idea -- the electric network reduction -- but
we still couldn't prove this comparison inequality.
So at some point we had almost given up, and we had written a small paper with this idea and
with some speculations on how to prove the rest of the things and so on. And at that point Ton
Dieker, independently of us, had obtained essentially the same idea, coming from a much more
algebraic point of view. So the formulation was not in terms of electric network and so on, but he
had an equivalent proposal for the recursive strategy. And he also conjectured the validity of an
equivalent inequality here.
So right after we had almost decided to give up, we came up with the full proof of the
inequality as well. So that was a happy ending in the end. [laughter].
>>: [inaudible].
>>: Not so much, right?
>>: Okay. I mean -- [laughter].
>> Pietro Caputo: Yeah. I guess. So -- no, I mean, his work is remarkable, and we're all happy
to acknowledge that. It's not a big deal. So what is this recursive strategy? The idea, as I said,
is to remove a vertex X. So let's think of a five vertex graph here. I have a point X which I want
to remove and I have the remaining four points that I want to keep.
And the idea would be that I remove X with all the edges adjacent to it, and I want to look at the
remaining vertices where I redistribute the weights adjacent to X in a suitable way. Namely, the
suitable way will be dictated by this electric network analogy.
So what I have to keep is -- of course I have to keep the usual weights. And I have to add some
new weights here. So Y and Z will be two of these vertices from A to D. And so the new weight
at Y and Z will be the old weight plus this additional weight, which is CXY times CXZ divided by the sum
of the weights coming out of X.
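In formulas (notation supplied here), the redistribution just described defines the reduced graph GX on the remaining vertices with weights:

```latex
% Electric-network reduction of the vertex x: new weight on the edge {y, z}.
c^{x}_{yz} \;=\; c_{yz} \;+\; \frac{c_{xy}\,c_{xz}}{c_x},
\qquad c_x \;:=\; \sum_{w \neq x} c_{xw},
\qquad y, z \in V \setminus \{x\}.
```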
So how do we read this? We can read this from a probabilistic point of view in a simple way.
Namely, consider the Markov chain you obtain by neglecting all the times that -- all the time you
spent at X.
You obtain a new continuous time Markov chain with a new generator, of course. And the new
generator is exactly the random walk on the graph where you remove X with these new weights.
Because if you want to go from Y to Z now you always have the option of jumping directly. But
you also have the option to jump to X and then from X with probability CXZ divided by CX you
jump to Z.
So this is usually called an embedded process or --
>>: Induced Markov chain.
>> Pietro Caputo: Induced Markov chain.
>>: Chain watched.
>> Pietro Caputo: Chain watched on the set -- the vertex set of GX. Exactly.
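A minimal computational sketch of this reduction, assuming a representation of the weighted graph as a dictionary keyed by edges; the function name and data layout are choices made here, not from the talk:

```python
# A sketch of the reduction just described (not code from the talk or paper):
# the graph is a dict mapping frozenset({y, z}) edges to positive weights, and
# reduce_vertex(weights, x) returns the weights of G_x, with
# c^x_{yz} = c_{yz} + c_{xy} * c_{xz} / c_x.

def reduce_vertex(weights, x):
    # weights of the edges incident to x, keyed by the other endpoint
    star = {next(iter(e - {x})): w for e, w in weights.items() if x in e}
    c_x = sum(star.values())                      # total weight out of x
    reduced = {e: w for e, w in weights.items() if x not in e}
    others = sorted(star)
    for i, y in enumerate(others):
        for z in others[i + 1:]:
            e = frozenset({y, z})
            reduced[e] = reduced.get(e, 0.0) + star[y] * star[z] / c_x
    return reduced

# Example: the star on {0, 1, 2, 3} centered at 0, all weights equal to 1;
# every pair {y, z} of remaining vertices receives weight 1/3.
star_graph = {frozenset({0, k}): 1.0 for k in (1, 2, 3)}
print(reduce_vertex(star_graph, 0))
```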
So the theorem here that we will need for the sequel is that the random walk on the new graph GX
has a spectral gap larger than or equal to that of the random walk on the old graph G.
This result is not very difficult to obtain. And once you formulate it in terms of the induced chain,
it turns out to be already written in the literature. Apparently Aldous was probably the first to
prove something like this. And then it's also in Yuval's book with Wilmer and -- I forgot the name --
Levin.
So once -- [laughter] sorry? So once you --
>>: [inaudible].
>> Pietro Caputo: No. [laughter].
I talked to him. But I have a blank moment now. So this is not a huge deal once you formulate
things properly. And it's a natural way also to formulate it in this way, because this is
equivalent -- in some sense any reversible Markov chain, if you remove a state, has this form when
you take the induced Markov chain. So it can be seen as a general formulation of that problem.
In terms of electric networks, on the other hand, this is not something that you find in this form in
standard presentations of electric network reduction.
Because this is a generalized form of the standard resistances-in-series transformation, or star-triangle
transformation, but in this general setting it is maybe new.
So how do we get this problematic comparison estimate? And, well, let me formulate it here as
this theorem, and then we will see why this is exactly what you need.
So the problem can be seen as follows here. You have a vertex X with a bunch of edges coming
out. And you have the remaining edges here. So let's pick two, which are here. And you want to
compare this weighted graph -- let's just look at the star centered at X and ignore the original
weights on the other vertices for the moment. Then you want to compare this to the graph that
you obtain when you remove X, but you place all the edges coming out from the electric network
reduction.
So you have all connections here. Okay? So you want to compare the star to the complete
graph on the remaining vertices that you obtain once you do the electric network reduction.
And the way you want to compare this is in terms of Dirichlet forms, so what you see here on the
left side is the Dirichlet form associated to the star. So B would be an edge coming out of X. CB
is the weight.
And this is a compact way of writing the energy associated to the star. Namely, you take a function
of the permutation. You compute the gradient associated to the transposition at the edge B. So
gradient B of F.
You take the L2 norm of that with respect to the uniform measure. And this gives you in a way
the Dirichlet form associated to the star. Now you want to say this is larger than or equal to the
Dirichlet form associated to the complete graph you obtain out of the star, given by these extra weights that you
have to put in when you do the electric network reduction.
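Written out in the notation above (with nabla_{uv} f(eta) = f(eta^{uv}) - f(eta) and E the expectation under the uniform measure on permutations), the claim being described is, as far as this transcript allows one to reconstruct it:

```latex
% The "octopus inequality" (Theorem 2): the Dirichlet form of the star at x
% dominates that of the reduced complete graph on the remaining vertices.
\sum_{y \neq x} c_{xy}\,\mathbb{E}\bigl[(\nabla_{xy} f)^2\bigr]
\;\ge\;
\sum_{\substack{y < z \\ y, z \neq x}} \frac{c_{xy}\,c_{xz}}{c_x}\,
\mathbb{E}\bigl[(\nabla_{yz} f)^2\bigr]
\qquad \text{for all } f : S_n \to \mathbb{R}.
```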
So this is the claim. This is the theorem you have to prove. And, yeah, due to the sort of
tentacular nature of this object we started calling it the octopus inequality, and it escaped us for
quite a while.
At some point we caught the octopus and it was fine. So let's see, first of all, let's see -- it's pretty
simple to prove that once you have theorem one and theorem two, the story's over. This is very
simple.
So let's see why. So you'll start to appreciate this inequality. So recall that from the standard
variational principle describing the spectral gap, you can write the lambda one for the interchange
process as the infimum, over all functions orthogonal to the constants, of the Dirichlet
form. I put a half here, because the true Dirichlet form has a half in it.
The Dirichlet form associated to the full graph. So now I'm not talking about the star at X. I'm
talking about the full graph here. So CB is all the weights of the graph. And here I normalize with
the L2 norm. This is the standard variational principle.
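The variational principle being quoted, in supplied notation (mu is the uniform measure on permutations and the sum runs over the edges b of G):

```latex
% Variational characterization of the interchange spectral gap; the factor 1/2
% matches the convention mentioned in the talk.
\lambda_1^{\mathrm{IP}}(G)
\;=\;
\inf_{\substack{f \neq 0 \\ \mu(f) = 0}}
\frac{\tfrac12 \sum_{b} c_b\, \mathbb{E}_\mu\bigl[(\nabla_b f)^2\bigr]}
     {\mathbb{E}_\mu\bigl[f^2\bigr]}.
```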
So let's decompose this into two pieces. One associated to the random walk, namely this comes
as we know from functions that are -- functions that depend only on the position of one label. Say
label one.
So if you look at your variational principle, restricting to functions of only the position of label one,
what you get is lambda one of the random walk. Since you have self-adjointness, what you do is look
at the orthogonal complement now. You're on the orthogonal complement now of all functions
that are only dependent on one label. So how do you describe the space here? Well, one way to
describe this space is as this H 0. H 0 is the set of all functions of the permutation such that if
you take the conditional expectation given the label occupying site X, this is equal to 0. This conditional
expectation is a function,
and this function is equal to 0 for all X. So let me say why you have this. Well, maybe I
have a few lines here that explain this in detail.
So one way of formulating the fact that if you follow one label, you obtain the random walk
Markov chain, is this intertwining relation. Namely, take the projection pi I; this projection here means that
I look at the conditional expectation given the position of label I. So if you tell me what the position of label I is, I
take the conditional expectation with respect to uniform choices of all the other positions.
Let's call it pi I. So there's this intertwining relation between the generator of the interchange
process and the generator of the random walk process. And so I don't know if I have to work out
these details.
But well I'll just say a few words. Morally, basically, if you have a -- if you have an eigenfunction
F of the interchange process, and this eigenfunction has some non-0 projection, so it's not in H 0 --
suppose this eigenfunction is not in H 0 -- then the moral is that this relation immediately implies
that when you project it on one label, you also get an eigenfunction of the random walk.
So you get a non-0 vector which satisfies the eigenvalue equation, and you get the same
eigenvalue. So an eigenfunction has to be in H 0 to be associated with an eigenvalue that is not in the random walk spectrum.
So this basically tells you that the lambda one of the interchange is the minimum of these two
quantities, where mu one is associated to functions in H 0.
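In symbols (supplied here), the intertwining relation and the decomposition it yields read:

```latex
% pi_i f = conditional expectation of f given the position of label i; it maps
% functions of permutations to functions on V and intertwines the two generators.
\pi_i\,\mathcal{L}_{\mathrm{IP}} \;=\; \mathcal{L}_{\mathrm{RW}}\,\pi_i,
\qquad
\lambda_1^{\mathrm{IP}}(G) \;=\; \min\bigl\{\lambda_1^{\mathrm{RW}}(G),\ \mu_1(G)\bigr\},
\qquad
\mu_1(G) \;:=\; \inf_{0 \neq f \in H_0}
\frac{\tfrac12 \sum_b c_b\,\mathbb{E}\bigl[(\nabla_b f)^2\bigr]}{\mathbb{E}\bigl[f^2\bigr]}.
```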
So the claim here will be that this mu 1 is larger than or equal to lambda 1 of the interchange process
on the graph GX where you have removed the site X and you have performed the electric
network redistribution of the weights.
So let's prove this. This would be an immediate consequence of the octopus inequality. So the
observation is that if you are in H 0, then the L2 norm of F is the same as the expected value of
the conditional variance given eta at X. Because the variance can be decomposed into the expectation of
the conditional variance plus the variance of the conditional expectation. That second term is 0.
So conditioned on eta at X, your measure becomes uniform over
permutations of the labels which are not occupying X. So it's a reduced system where you have
N minus 1 labels.
So you can use the spectral gap inequality on that system. So on that system, you can place your
weights according to the GX recipe.
Here you're free to do whatever you want, provided you put the right weights. So you put the
weights corresponding to the GX graph, which are
the old weights plus the additional weights. But now these edges do not touch X.
And you have this formulation here, this term here with the gradients. So if you take the expectation
now, you can remove the conditioning on eta at X, just because we had this L2
identity.
So what you get is you have two terms, one involving the old weights and one involving the
new weights. So now you use the octopus inequality.
The new weights are dominated by the old weights coming out of X. So you just plug it in. And
from this inequality, what you get is a piece involving the edges not touching X and a piece
involving the edges touching X. You put them together. You get the old graph.
So when you get the old graph, you're done, because this is exactly the bound you are looking
for, because this holds for any F in H 0, so the mu 1 of G is larger than or equal to lambda 1 of the
interchange on GX.
Okay. So this is a piece of knowledge that now we can use to finally kill the problem.
Namely, assume by induction that on your reduced graph the conjecture is true. Then what we
obtain is that this mu 1 of G is larger than or equal to the interchange process gap on GX which, by
assumption, is equal to the random walk gap on GX. And so going back to our minimum of the
two quantities, we get that the interchange process gap on G is bounded below by this minimum.
And from theorem one, we know that the random walk gap on GX is larger than or equal to the
random walk gap on G. So we're done.
So this is the bound that was missing to prove the theorem.
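Assembling the steps in the notation above, the induction just described can be summarized as follows (a reconstruction from the transcript, not a slide):

```latex
% Induction step: the inequality is the octopus step (Theorem 2), the middle
% equality is the induction hypothesis on G_x, and the final equality uses
% Theorem 1 (lambda_1^RW(G_x) >= lambda_1^RW(G)). Together with the trivial
% bound lambda_1^IP(G) <= lambda_1^RW(G), this closes the induction.
\lambda_1^{\mathrm{IP}}(G)
\;=\; \min\bigl\{\lambda_1^{\mathrm{RW}}(G),\ \mu_1(G)\bigr\}
\;\ge\; \min\bigl\{\lambda_1^{\mathrm{RW}}(G),\ \lambda_1^{\mathrm{IP}}(G_x)\bigr\}
\;=\; \min\bigl\{\lambda_1^{\mathrm{RW}}(G),\ \lambda_1^{\mathrm{RW}}(G_x)\bigr\}
\;=\; \lambda_1^{\mathrm{RW}}(G).
```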
So really you have reduced the problem essentially to one nontrivial piece of information, which is
this octopus inequality. So the work starts here and the talk stops here, I guess. [laughter] But
I don't know how much time I have, actually. Maybe I can give a few --
>>: Five by five --
>> Pietro Caputo: Maybe I can give just a little bit of ideas about this inequality. So
what we said is that we want to bound the energy or the Dirichlet form associated to the star in
terms of the Dirichlet form associated to the complete graph in the complement.
So if you start doing this on, say, three vertices, you want to start easy. And you find that actually
this is not too hard. But it's still not just a one-step Cauchy-Schwarz inequality kind of thing. So okay
it doesn't require so much work for three vertices. But if you go to four, it becomes really
nontrivial. So four really requires some work. But before I tell you that, maybe one observation,
which is sort of interesting: the first thing you try when you have a statement about the interchange
process is whether it's true for functions of one label only. Because for this, things are easy. And it turns
out that this inequality here takes a nice form for functions of one label only. So let's take for our
function F just a function psi of the position of label 1.
So if you plug this in, this bound -- well, let's look at the left-hand side. You obtain that the
left-hand side is equal to the right-hand side plus this quantity, which is essentially the square of the random walk
generator applied to the function psi, evaluated at the site X which we are removing, and normalized by
CX. So this term is 0 every time you have a function psi that is harmonic at the point X that you are
removing.
So this tells you that for functions of one particle, this inequality is always true. Of course, this is
nonnegative, and it becomes saturated by all functions that are harmonic at the point X. You
might expect that there is a way to interpret this in terms of this harmonic property.
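For the record, here is the identity for one-label functions as it can be reconstructed from this description; the overall prefactor 2/n comes from normalizing by the uniform probability measure on permutations and may differ from the slide's convention:

```latex
% For f(eta) = psi(position of label 1), the difference of the two sides of the
% octopus inequality collapses to a single square, which vanishes exactly when
% psi is harmonic at x.
\sum_{y \neq x} c_{xy}\,\mathbb{E}\bigl[(\nabla_{xy} f)^2\bigr]
\;-\;
\sum_{\substack{y < z \\ y, z \neq x}} \frac{c_{xy} c_{xz}}{c_x}\,
\mathbb{E}\bigl[(\nabla_{yz} f)^2\bigr]
\;=\;
\frac{2}{n}\,\frac{\bigl((\mathcal{L}_{\mathrm{RW}}\psi)(x)\bigr)^2}{c_x}
\;\ge\; 0.
```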
So this is what we tried to do for a long time. And we never came up with anything. So this is a
little bit surprising. And it would be nice to have some probabilistic or analytic interpretation or
whatever, anyway, of this inequality, which is not just a proof that -- because in the end we have a
proof, but it's not that nice from an intuitive perspective.
And so, maybe one more thing that should be noticed here is that
from theorem one we have that the spectral gap of the random walk on GX
is larger than or equal to the spectral gap of the random walk on G. So in some sense this
inequality is going in the opposite direction of that statement, because that statement says that
the quadratic form of the reduced graph produces a larger gap.
So there is a subtle point here. And the subtle point comes from the fact that this expectation you
see here does not involve -- does involve all the sites, including X.
Namely, for this Dirichlet form, there is an isolated point at X. So this isolated point in some
sense is shifting down all the eigenvalues, because there's an extra 0 in the spectrum of this
object.
So there is this interplay here between the result of theorem one and the result of theorem two;
they sort of go in opposite directions. That's what makes the whole proof work in the end.
So maybe just a word about the general case. So in the general case you have some matrix A, which is
indexed by permutations, and this matrix can be written -- if you want to prove this inequality,
this is equivalent to proving that this matrix A is nonnegative definite. If you bring all the weights
to the same side, you get some weights with a positive sign, some weights with a negative
sign, and this is the inequality you want to prove, basically.
So, of course, these CB star all depend on the weights CB. So
the only variables are really these weights coming out of X.
And you want this inequality for arbitrary choices of these weights. So even proving
this inequality just for all weights equal to 1 is nontrivial. So it's not a really simple object.
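Since the statement is finite-dimensional, one can at least sanity-check it numerically for small graphs. The following sketch is not the authors' argument; the function names and the numpy-based approach are choices made here. It builds the matrix of the octopus quadratic form for a four-vertex star with random weights and checks that its smallest eigenvalue is nonnegative up to rounding error:

```python
# Numerical sanity check of the octopus inequality on a small example.
# We build the n! x n! matrix of the quadratic form
#   sum_y c_{0y} (I - S_{0y}) - sum_{y<z} (c_{0y} c_{0z} / c_0) (I - S_{yz}),
# where S_{uv} swaps the labels at vertices u and v.  Vertex 0 plays the role of x.
import itertools
import numpy as np

def octopus_matrix(c):                      # c[y-1] = weight of the edge {0, y}
    n = len(c) + 1
    perms = list(itertools.permutations(range(n)))
    index = {p: i for i, p in enumerate(perms)}
    M = np.zeros((len(perms), len(perms)))

    def add_edge(u, v, w):                  # add w * (I - S_uv) to M
        for p in perms:
            q = list(p)
            q[u], q[v] = q[v], q[u]         # swap the labels at u and v
            i, j = index[p], index[tuple(q)]
            M[i, i] += w
            M[i, j] -= w

    c_x = sum(c)
    for y in range(1, n):                   # star edges enter with a plus sign
        add_edge(0, y, c[y - 1])
    for y in range(1, n):                   # reduced edges enter with a minus sign
        for z in range(y + 1, n):
            add_edge(y, z, -c[y - 1] * c[z - 1] / c_x)
    return M

rng = np.random.default_rng(0)
c = rng.uniform(0.1, 1.0, size=3)           # a 4-vertex star with random weights
print(np.linalg.eigvalsh(octopus_matrix(c)).min())   # should be >= 0 up to rounding
```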
So the way we found to prove this result is by observing that if you start playing with these
quantities, you discover that there is a little bit of structure, which is independent of the value of
the weights. So if you try to exploit that and play with the weights, the way that these things
combine is that you manage to write some linear combinations of some deterministic matrix,
deterministic I mean independent of the weights.
And so you start seeing that in the end it is sufficient to have certain bounds on some
deterministic matrices and these are huge because they're N factorial by N factorial, but you can
try to give some block decomposition to them and analyze each block in a suitable way. In the
end, you reduce the problem to some nontrivial inequality, which you can check by computation,
essentially.
So I think the final object one has to look at is a 60-by-60 matrix, whose only entries are 0s, 1s and
2s, and by some fairly simple algebraic manipulation you can check that it is nonnegative definite.
But as I said, it's clear that it would be interesting -- also in light of this electric network reduction
analogy and this harmonic function analogy, it would be really interesting to have some deeper
understanding of this type of relation. So that remains something to do, apparently.
Okay. I guess I'll stop here. [applause].
>> Yuval Peres: Questions or comments?
>>: So the power comes from theorem one and theorem two, which go in different directions; once
you play them against each other you unlock the power [phonetic].
>> Pietro Caputo: This is one way to put it, yeah. I agree.
>>: Simply marvelous.
>>: The theorem [inaudible].
>> Pietro Caputo: Yes.
>> Yuval Peres: Thank you.
>> Pietro Caputo: Thank you. [applause]