>> Yuval Peres: Good afternoon. I'm really delighted to have this talk here, given that this conjecture, which I knew as the Aldous-Diaconis conjecture, is something I wondered about for many years. So it's really a pleasure to see the solution. Pietro Caputo.

>> Pietro Caputo: Thank you very much. It's very nice to be here, and it's a pleasure to give this talk. I will talk about a theorem we proved about a year ago with Tom Liggett and Thomas Richthammer while I was visiting UCLA. The result will be published in Volume 23 of the Journal of the AMS. To start with, I'm going to formulate the problem, although it's probably well known to all of you. But first let me go through the outline of the talk. As I said, I will start by reviewing some standard facts about random walks on weighted graphs and the interchange process, and then formulate what we call the Aldous conjecture; some people call it the Aldous-Diaconis conjecture, and since it's clear that both had a lot to do with this problem, I guess either way would be correct. We will see how this result has consequences for other processes, like exclusion processes. And if we have time, maybe we can look at other applications to related processes on weighted graphs, such as perfect matchings and other models. Hopefully there will be enough time to go through the main ideas. The proof involves essentially a new idea to perform a recursion over weighted graphs, which allows us to prove the result for arbitrary graphs, and this recursion leads naturally to a new comparison inequality, which required quite a bit of work to prove. I won't be able to give the details of the proof of this inequality, but you will see why it comes up and what is interesting about it.

So let's start with the standard facts. We start with a weighted graph. Namely, we consider a vertex set V and the complete set of edges, so every vertex has an edge to every other vertex, but we put a weight c_xy on the edge xy. So c_xy is a nonnegative weight, and we assume that the resulting skeleton is irreducible: the graph of edges with positive weight is connected. The random walk is the standard process where you attach to each edge an independent Poisson clock with rate c_xy. When your particle sits at some vertex -- in this case vertex 1 of this simple five-vertex graph -- and the edge 1-2 rings, with the corresponding rate c_12, then you move the particle from 1 to 2, and so on. Of course, if another edge rings where the particle is not present, then nothing happens. This is a very simple process, described by an infinitesimal generator which I call L random walk; it's a continuous-time Markov chain, and it is reversible with respect to the uniform measure, the counting measure on the vertex set V. So this is a simple Markov chain. The interchange process on the weighted graph G is the Markov chain where on each edge we again have the same collection of Poisson clocks with the same rates, and now we have a permutation of labels. Namely, we have N particles if the vertex set has cardinality N, so at each vertex there is one particle, and the configuration is given by a permutation. In this case, if the edge 1-2 rings, then we swap the labels sitting at the two vertices of that edge.
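In standard notation -- this is a reconstruction of the formulas, not a verbatim copy of the slides -- the two generators just described take the following form, where eta denotes a permutation of the labels and eta^{xy} is eta with the labels at x and y transposed:

```latex
% (amsmath/amssymb notation assumed)
% Random walk on the weighted graph G = (V, c), f a function on vertices:
\[
  (\mathcal{L}^{\mathrm{RW}} f)(x) \;=\; \sum_{y \in V} c_{xy}\,\bigl[f(y) - f(x)\bigr].
\]
% Interchange process, f a function on permutations eta of the N labels,
% eta^{xy} = eta with the labels at x and y swapped:
\[
  (\mathcal{L}^{\mathrm{IP}} f)(\eta) \;=\; \sum_{\{x,y\} \subset V} c_{xy}\,\bigl[f(\eta^{xy}) - f(\eta)\bigr].
\]
```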
This gives us a continuous-time Markov chain with an infinitesimal generator where the function f is a function on permutations, and the value at the swapped configuration is just f evaluated at the permutation eta after the transposition of x and y has been performed. This generator is reversible with respect to the uniform measure over all permutations of the N labels. So this can be seen as a simple model of interacting random walks, if you want: each label is performing a random walk, and the interaction is that no two particles sit at the same vertex.

Okay, let me skip some simple details and look at the eigenvalues of these two matrices -- the two processes are described by two symmetric matrices, L for the random walk and L for the interchange process. In both cases the matrix minus L has nonnegative eigenvalues; call them lambda_0 = 0 for the trivial eigenvalue, then lambda_1 up to lambda_{M-1}, where M is the size of the state space. The spectral gap is the first eigenvalue lambda_1, which is nonzero because of the irreducibility assumption. It is often called the inverse of the relaxation time, because if you project a function f along the eigenfunctions of the matrix L and look at the evolution of this function at time t, what you obtain is a trivial term, the projection along constants, and then the dominant term is usually given by the projection along the spectral gap eigenfunction. In the long run this is the dominant term whenever there is a nonzero projection along the spectral gap eigenfunction. This is why it's usually called the relaxation time, and it is relevant when you analyze convergence to equilibrium of the process.

So the conjecture I want to describe here was formulated around the early '90s: for every graph -- I think the original formulation refers to unweighted graphs, so weights equal to 0 or 1 only -- the spectral gaps of the two processes I just described coincide. One sort of obvious observation is that the spectral gap of the interchange process is always at most the spectral gap of the random walk on any graph G. This simple observation comes from the fact that the random walk is really a subprocess of the interchange process: if in the interchange process you follow only one label, what you obtain is the random walk. So it's clear that the slowest mode of the interchange process should be smaller than or equal to the slowest mode of the random walk. On the other hand, there is some work to do to prove that the two are equal, and it's actually not obvious at all at first. But this was the conjecture, based on several observations and several works of Diaconis and coauthors that showed this was the case in several instances; we will see a little bit of the history of this problem. Let me quickly say that this conjecture has consequences for a large family of other processes that can be constructed as projections of the interchange process. One of these is the exclusion process, which in this case would be described as follows: take K particles on your graph -- say two particles on our five-vertex graph -- and run the same Poisson clocks on each edge.
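(Before continuing with the exclusion process, here is a quick numerical illustration of the conjecture itself. This is a minimal sketch assuming NumPy, not code from the talk; the graph, weights, and function names are arbitrary test data chosen for illustration.)

```python
# Build the two generators on a small weighted graph and compare their spectral gaps.
import itertools
import numpy as np

def rw_generator(C):
    """Random-walk generator: off-diagonal entries c_xy, diagonal = minus the row sums."""
    L = np.array(C, dtype=float)
    np.fill_diagonal(L, 0.0)
    np.fill_diagonal(L, -L.sum(axis=1))
    return L

def ip_generator(C):
    """Interchange-process generator, acting on functions of permutations of the labels."""
    n = C.shape[0]
    perms = list(itertools.permutations(range(n)))
    index = {p: i for i, p in enumerate(perms)}
    L = np.zeros((len(perms), len(perms)))
    for i, p in enumerate(perms):
        for x in range(n):
            for y in range(x + 1, n):
                q = list(p)
                q[x], q[y] = q[y], q[x]          # swap the labels at vertices x and y
                j = index[tuple(q)]
                L[i, j] += C[x, y]
                L[i, i] -= C[x, y]
    return L

def spectral_gap(L):
    """Smallest nonzero eigenvalue of -L (L symmetric with zero row sums)."""
    ev = np.sort(np.linalg.eigvalsh(-L))
    return ev[1]

# A connected 4-vertex weighted graph (weights chosen arbitrarily for the test).
C = np.array([[0., 1., 2., 0.],
              [1., 0., 3., 1.],
              [2., 3., 0., 4.],
              [0., 1., 4., 0.]])
print(spectral_gap(rw_generator(C)), spectral_gap(ip_generator(C)))
# The two printed gaps should agree up to numerical error, as the theorem asserts.
```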
When an edge rings, you move the particles sitting at its endpoints, just as before. So it's the same as having unlabeled particles. In this case the process can be obtained from the interchange process by calling, say, particle two and particle five black and all the remaining particles white, or something like that. So this can be obtained trivially from the interchange process, and the fact that you can obtain it as a projection immediately proves that all the eigenvalues you see in the exclusion process are contained in the spectrum of the interchange process. So you have this inclusion. On the other hand, you also have the inclusion that all the eigenvalues you see in the random walk you see in the exclusion process. This is also quite simple; you can't just project trivially here, you have to use symmetry and take a symmetrized eigenfunction of the random walk, but it's a simple matter. So you have this double inclusion, and once we prove the conjecture we will have proved that, for any number of particles, the spectral gap of the exclusion process coincides with the spectral gap of the random walk. There are some other processes, but I will not go into the details now. So let me say a few things about what was previously known for this problem.

>>: What was the matching process that you went through very quickly?

>> Pietro Caputo: Yeah, very quickly. Just as I looked at the exclusion process as a projection of the interchange process, I can look at other projections. Here we need an even number of sites, so let's take six, say, and look at --

>>: Swap like this?

>> Pietro Caputo: Look at all the perfect matchings of the complete graph. So think of the complete graph, but the rules for swapping are inherited from the weighted graph. What you do is, if the edge 1-2 rings in our Poisson clocks, you take these two endpoints and exchange them, and what you obtain is this matching configuration. It turns out that this is also a projection of the interchange process Markov chain, so there is again a trivial inclusion of the spectra. If you can prove that the spectral gap of the interchange process is equal to the spectral gap of the random walk, what you obtain here is only that the spectral gap of the matching process is larger than or equal to that of the random walk; you don't obtain that they're equal. And in fact they're not equal: you can show that for the complete graph, with all weights equal to 1, the matching gap is strictly larger than the random walk gap. So at this level you can only get a bound. There's a whole family of such processes -- it's easy to describe -- but this is an example. Other questions?

So, previously known cases. There is a line of attack on this problem which is purely algebraic, and it starts with a celebrated work of Diaconis and Shahshahani from '81, where they actually compute the full spectrum of the operator L of the interchange process in the case of the complete graph, all weights equal to one. This can be done with detailed knowledge of the characters of the transpositions for the irreducible representations of the symmetric group. Recently Filippo Cesi pushed this approach in a nontrivial way to obtain the same conclusion for all complete multipartite graphs; this means that if you take the complement of the graph, you obtain a bunch of complete graphs. So again all weights equal to 0 or 1, and this is for complete multipartite graphs.
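To record in symbols the projection structure described a moment ago (a reconstruction in the notation above, with EX(k) denoting the exclusion process with k particles on the same weighted graph):

```latex
\[
  \mathrm{spec}\bigl(-\mathcal{L}^{\mathrm{RW}}\bigr)
  \;\subseteq\;
  \mathrm{spec}\bigl(-\mathcal{L}^{\mathrm{EX}(k)}\bigr)
  \;\subseteq\;
  \mathrm{spec}\bigl(-\mathcal{L}^{\mathrm{IP}}\bigr),
  \qquad 1 \le k \le N-1.
\]
```

So once the conjecture is proved, the spectral gap of the exclusion process equals that of the random walk for every k. For the matching process one only has the inclusion of its spectrum in that of the interchange process, hence only the lower bound on its gap mentioned above.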
Such an algebraic approach needs special structure, because it is based on exact computations with the irreducible representations of the symmetric group, so it's quite clear that it has very little chance of leading to a full proof of the problem. On the other hand, the first paper to use a recursive approach for this problem, not based on exact computation, was a paper of Handjani and Jungreis, where they proved the result for all weighted trees -- arbitrary weights, but on a tree. Similar things were obtained in a different context, for spin chains, by Koma and Nachtergaele. Then, based on this recursive approach, more recent papers by Morris and by Conomos and Starr proved asymptotic results in the spirit of the full conjecture for box-like subsets of Z^d. Let me explain a little bit about this recursion, because it plays a role in what follows. It's quite clear that you want to set up some recursive idea, because the conjecture is trivially true on, say, two vertices, so you want to try to build up your graph by an inductive argument. The way they did it was to take the graph and simply remove a vertex together with all the edges adjacent to it. That is not a huge problem when you have a tree, if you start from a leaf: you can do that, and it turns out you're not disturbing the graph too much, so the induction goes through successfully. On the other hand, if you take boxes in Z^d and start removing a vertex at the boundary, you do disturb the graph a little bit. What they proved in these two papers is that this does not disturb things too much in the limit, so asymptotically you do get through. That was the key to these results.

As I said, we have a full proof of the result for all weighted graphs, and I'm going to describe this theorem now. The corollary, as I briefly discussed for these other processes, is that you get equality for all exclusion processes and a bound for other processes like the matching process. The way we prove this result is by a refinement of the recursive strategy mentioned before. Namely, we want to go from a graph G to a graph G_x in which the vertex x has been eliminated. How do you do this in such a way that you don't disturb the graph too much? One guess would be to write an equivalent electric network on the remaining vertices, equivalent meaning that if you apply potentials to two nodes and look at the potential across an edge inside the network, this stays unchanged in the two networks -- in the graph G and in the graph G_x where you have removed the vertex x. This is what can be called an electric network reduction. It turns out that if you implement this idea, then what you have to show to get the result is an inequality comparing two Dirichlet forms for weighted graphs. And this inequality turns out to be very tricky. The reason it is tricky is that in some sense there is a lot of degeneracy: there are a lot of cases where the inequality is really an exact identity, so proving a bound is not that easy. But we will see what this new inequality looks like. At some point we did understand that this -- the electric network reduction -- was the right idea, but we still couldn't prove this comparison inequality.
So at some point we had almost given up, and we had written a small paper with this idea and with some speculations on how to prove the rest. At that point Ton Dieker, independently of us, had arrived at essentially the same idea coming from a much more algebraic point of view. His formulation was not in terms of electric networks, but he had an equivalent proposal for the recursive strategy, and he also conjectured the validity of an equivalent inequality. Right after we had almost decided to give up, we came up with the full proof of the inequality as well. So it was a happy ending in the end. [laughter]

>>: [inaudible].

>>: Not so much, right?

>>: Okay. I mean -- [laughter].

>> Pietro Caputo: Yeah. I guess. No, I mean, his work is remarkable, and we're all happy to acknowledge that. It's not a big deal.

So what is this recursive strategy? The idea, as I said, is to remove a vertex x. Think of a five-vertex graph: I have a point x that I want to remove, and the remaining four points that I want to keep. The idea is that I remove x with all the edges adjacent to it, and on the remaining vertices I redistribute the weights adjacent to x in a suitable way, dictated by this electric network analogy. Of course I keep the original weights, and I add some new weights: if y and z are two of the remaining vertices, the new weight on the edge yz is the old weight c_yz plus an additional weight c_xy c_xz divided by c_x, the sum of the weights coming out of x. How do we read this? We can read it probabilistically in a simple way. Consider the Markov chain you obtain by neglecting all the time spent at x. You obtain a new continuous-time Markov chain with a new generator, and the new generator is exactly the random walk on the graph with x removed and with these new weights. Because if you want to go from y to z, you always have the option of jumping directly, but you also have the option to jump to x and then, from x, with probability c_xz divided by c_x, jump to z. This is usually called an embedded process, or --

>>: Induced Markov chain.

>> Pietro Caputo: Induced Markov chain.

>>: The chain watched.

>> Pietro Caputo: The chain watched on the vertex set of G_x, exactly. So the theorem we will need for the sequel is that the random walk on the new graph G_x has a spectral gap larger than or equal to that of the random walk on the old graph G. This result is not very difficult to obtain, and once you formulate it in terms of the induced chain, it turns out to be already written in the literature. Apparently Aldous was probably the first to prove something like this, and it's also in Yuval's book with Wilmer and -- I forgot the name -- Levin. So once -- [laughter] sorry? So once you --

>>: [inaudible].

>> Pietro Caputo: No. [laughter]. I talked to him, but I have a blank moment now. So this is not a huge deal once you formulate things properly, and it's a natural way to formulate it, because in some sense any reversible Markov chain, when you remove a state and take the induced Markov chain, has this form. So it can be seen as a general formulation of that problem.
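Concretely, the reduction from G to G_x just described can be sketched as follows. This is a minimal sketch assuming a symmetric NumPy weight matrix with zero diagonal, in the same style as the earlier snippet; the function name is mine, not from the talk.

```python
import numpy as np

def reduce_vertex(C, x):
    """Return the weight matrix of G_x: vertex x removed, its weights redistributed
    according to c'_{yz} = c_{yz} + c_{xy} * c_{xz} / c_x, with c_x = sum_y c_{xy}."""
    cx = C[x].sum()                                   # total weight of edges out of x
    keep = [v for v in range(C.shape[0]) if v != x]   # vertices that survive
    Cx = C[np.ix_(keep, keep)].astype(float)
    for i, y in enumerate(keep):
        for j, z in enumerate(keep):
            if i != j:
                Cx[i, j] += C[x, y] * C[x, z] / cx
    return Cx

# Combined with the earlier routines, one can check numerically that the random walk
# gap does not decrease under this reduction, which is the content of Theorem 1, e.g.
# spectral_gap(rw_generator(reduce_vertex(C, 0))) >= spectral_gap(rw_generator(C)).
```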
In terms of electric networks, on the other hand, this is not something you find in this form in standard presentations of electric network reduction, because it is a generalized form of the standard series-resistance transformation, or star-triangle transformation, and in this general setting it is maybe new. So how do we get this problematic comparison estimate? Let me formulate it as a theorem, and then we will see why it is exactly what you need. The problem can be seen as follows. You have a vertex x with a bunch of edges coming out of it, and you have the remaining vertices; let's pick two of them, y and z. You want to compare this weighted graph -- let's look just at the star centered at x and ignore the original weights among the other vertices for the moment -- to the graph you obtain when you remove x but put in all the edges coming from the electric network reduction, so you have all the connections among the remaining vertices. In other words, you want to compare the star to the complete graph on the remaining vertices that you obtain once you do the electric network reduction. And the comparison is in terms of Dirichlet forms. What you see on the left side is the Dirichlet form associated to the star: b ranges over the edges coming out of x, c_b is the weight, and this is a compact way of writing the energy associated to the star. Namely, you take a function of the permutation, you compute the gradient associated to the transposition at the edge b -- gradient_b of f -- and you take the L2 norm of that with respect to the uniform measure. This gives you the Dirichlet form associated to the star. Now you want to say that this is larger than or equal to the Dirichlet form associated to the complete graph outside the star, given by the extra weights that you put in when you do the electric network reduction. This is the claim; this is the theorem you have to prove. Due to the tentacular nature of this object we started calling it the octopus inequality, and it escaped us for quite a while; at some point we caught the octopus and it was fine.

First of all, it's pretty simple to see that once you have Theorem 1 and Theorem 2, the story is over. Let's see why, so that you start to appreciate this inequality. Recall that, from the standard variational principle describing the spectral gap, you can write lambda_1 for the interchange process as the infimum, over all functions orthogonal to the constant function, of the Dirichlet form divided by the L2 norm. I put a one-half here because the true Dirichlet form has a one-half. This is the Dirichlet form associated to the full graph -- now I'm not talking about the star at x, I'm talking about the full graph, so c_b runs over all the weights of the graph -- and I normalize with the L2 norm. This is the standard variational principle. Let's decompose this into two pieces. One is associated to the random walk: as we know, it comes from functions that depend only on the position of one label, say label one. If you restrict the variational principle to functions of the position of label one only, what you get is lambda_1 of the random walk. Then, by self-adjointness, you look at the orthogonal complement of all functions that depend on one label only. So how do you describe this space?
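In symbols -- a reconstruction in the notation above, not the exact slides -- Theorem 2 and this decomposition of the variational principle read:

```latex
% Octopus inequality (Theorem 2): the Dirichlet form of the star at x dominates the
% Dirichlet form of the redistributed weights, for every function f on permutations:
\[
  \sum_{y \neq x} c_{xy}\,\mathbb{E}\bigl[(\nabla_{xy} f)^2\bigr]
  \;\ge\;
  \sum_{\{y,z\} \subset V \setminus \{x\}} \frac{c_{xy}\,c_{xz}}{c_x}\,\mathbb{E}\bigl[(\nabla_{yz} f)^2\bigr],
  \qquad c_x = \sum_{y \neq x} c_{xy}.
\]
% Variational principle for the interchange gap and its decomposition:
\[
  \lambda_1^{\mathrm{IP}}(G)
  \;=\; \inf_{\mathbb{E}[f] = 0,\ f \neq 0}
  \frac{\tfrac12 \sum_{b} c_b\,\mathbb{E}\bigl[(\nabla_b f)^2\bigr]}{\mathbb{E}[f^2]}
  \;=\; \min\bigl\{\lambda_1^{\mathrm{RW}}(G),\ \mu_1(G)\bigr\},
\]
% where mu_1(G) is the same infimum restricted to the orthogonal complement of the
% one-label functions, the space H_0 described next.
```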
One way to describe this space is as H_0: the set of all functions on permutations such that the conditional expectation, given the label occupying site x, is equal to zero, and this holds for every x. Let me say why. There is a way of formulating the fact that if you follow one label you obtain the random walk Markov chain: it is an intertwining relation. Take the projection pi_i, which means that I condition on the position of label i and average, with uniform weights, over all the other positions. Then there is an intertwining relation between the generator of the interchange process and the generator of the random walk. I don't know if I have to work out the details, but let me just say a few words. Morally, if you have an eigenfunction f of the interchange process and this eigenfunction has some nonzero projection, so it is not in H_0, then this relation immediately implies that when you project it onto one label, you also get an eigenfunction of the random walk: a nonzero vector which satisfies the eigenvalue equation with the same eigenvalue. So an eigenfunction has to be in H_0 to be associated with an eigenvalue that is not in the random walk spectrum. This basically tells you that lambda_1 of the interchange process is the minimum of two quantities, where mu_1 is the corresponding infimum over functions in H_0.

The claim now is that this mu_1 is larger than or equal to lambda_1 of the interchange process on the graph G_x, where you have removed the site x and performed the electric network redistribution of the weights. Let's prove this; it is an immediate consequence of the octopus inequality. The observation is that if f is in H_0, then the L2 norm of f equals the expected value of the conditional variance given eta_x, because the variance decomposes into the expected conditional variance plus the variance of the conditional expectation, and the second term is zero. Now, conditioned on eta_x, your measure becomes uniform over permutations of the labels not occupying x. So it's a reduced system with N minus 1 labels, and you can use a spectral gap inequality on that system. On that system you are free to place whatever weights you want, provided you use the corresponding spectral gap, so you put the weights of the graph G_x: the old weights plus the additional weights, but now on edges that do not touch x, and you get this formulation in terms of the gradients. If you then take expectation, you can remove the conditioning on eta_x, precisely because of the identity for the L2 norm. What you get is two terms, one involving the old weights and one involving the new weights. Now you use the octopus inequality: the contribution of the new weights is dominated by that of the old weights coming out of x, so you just plug it in.
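Condensed into one line (same notation as above; here E_G denotes the full interchange Dirichlet form on G and E_{G_x} the one built from the redistributed weights, both applied to functions on permutations), the argument just described is, roughly:

```latex
% For f in H_0, so that E[f | eta_x] = 0:
\[
  \mathbb{E}[f^2]
  \;=\; \mathbb{E}\bigl[\mathrm{Var}(f \mid \eta_x)\bigr]
  \;\le\; \frac{\mathcal{E}_{G_x}(f)}{\lambda_1^{\mathrm{IP}}(G_x)}
  \;\le\; \frac{\mathcal{E}_{G}(f)}{\lambda_1^{\mathrm{IP}}(G_x)}.
\]
% First inequality: the spectral gap of the interchange process on G_x, applied
% conditionally on the label at x. Second inequality: the octopus inequality, which
% dominates the added weights c_{xy} c_{xz} / c_x by the star weights at x.
% Dividing by E[f^2] and optimizing over f in H_0 gives mu_1(G) >= lambda_1^IP(G_x).
```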
Spelled out, from this inequality what you get is a piece involving the edges not touching x and a piece involving the edges touching x. You put them together and you get the old graph, and once you get the old graph you're done, because this is exactly the bound you were looking for: it holds for every f in H_0, so mu_1 of G is larger than or equal to lambda_1 of the interchange process on G_x. This is the piece of knowledge we can now use to finally kill the problem. Namely, assume by induction that the conjecture is true on the reduced graph. Then mu_1 of G is larger than or equal to the interchange gap on G_x, which by assumption equals the random walk gap on G_x. Going back to our minimum of the two quantities, we get that the interchange gap on G is bounded below by this minimum, and from Theorem 1 we know that the random walk gap on G_x is larger than or equal to the random walk gap on G. So we're done: this was the bound that was missing to prove the theorem. So really the problem has been reduced to one nontrivial piece of information, which is the octopus inequality. The work starts here, and the talk stops here, I guess. [laughter] But I don't know how much time I have, actually. Maybe I can give a few --

>>: Five by five --

>> Pietro Caputo: Maybe I can give just a little bit of the ideas behind this inequality. What we said is that we want to bound the energy, the Dirichlet form associated to the star, in terms of the Dirichlet form associated to the complete graph on the complement. If you start doing this on, say, three vertices -- you want to start easy -- you find that it's actually not too hard, but it's still not just a one-step Cauchy-Schwarz kind of thing. Okay, so it doesn't require much work for three vertices; but if you go to four, it becomes really nontrivial. Four really requires some work. Before I tell you about that, one observation which is sort of interesting: the first thing you try when you have a statement about the interchange process is to see whether it's true for functions of one label only, because for those, things are easy. It turns out that this inequality takes a nice form for functions of one label only. So let's take for our function f just a function psi of the position of label 1. If you plug this into the bound and look at the left-hand side, you find that the left-hand side is equal to the right-hand side plus a quantity which is just the square of the random walk generator applied to psi at the site x we are removing, normalized by c_x. This term is zero whenever psi is harmonic at the point x you are removing. So this tells you that for functions of one label the inequality is always true: the extra term is nonnegative, and the inequality is saturated by functions that are harmonic at the point x. You might expect that there is a way to interpret the inequality in terms of this harmonic property. This is what we tried to do for a long time, and we never came up with anything. So this is a little bit surprising, and it would be nice to have some probabilistic or analytic interpretation of this inequality -- because in the end we have a proof, but it's not that nice from an intuitive perspective.
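For the record, here is what the one-label computation gives with the normalization used above (a reconstruction; the constant in front of the square depends on the convention chosen for the Dirichlet form). Taking f(eta) = psi(position of label 1):

```latex
\[
  \sum_{y \neq x} c_{xy}\,\mathbb{E}\bigl[(\nabla_{xy} f)^2\bigr]
  \;=\;
  \sum_{\{y,z\} \subset V \setminus \{x\}} \frac{c_{xy}\,c_{xz}}{c_x}\,\mathbb{E}\bigl[(\nabla_{yz} f)^2\bigr]
  \;+\; \frac{2}{N c_x}\,\bigl[(\mathcal{L}^{\mathrm{RW}}\psi)(x)\bigr]^2 .
\]
% So for one-label functions the octopus inequality always holds, and it is saturated
% exactly by functions psi that are harmonic at the removed vertex x.
```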
Another thing to notice is that from Theorem 1 we have that the spectral gap of the random walk on G_x is larger than or equal to the spectral gap of the random walk on G. In some sense the octopus inequality goes in the opposite direction of that statement, because that statement says that the quadratic form of the reduced graph produces a larger gap. So there is a subtle point here, and it comes from the fact that the expectation you see here involves all the sites, including x. Namely, for this Dirichlet form, x is an isolated point, and this isolated point in some sense shifts all the eigenvalues down, because there is an extra zero in the spectrum of this object. So there is an interplay between the result of Theorem 1 and the result of Theorem 2; they go in opposite directions, and that is what makes the whole proof work in the end.

Maybe just a word about the general case. In the general case you have some matrix A, indexed by permutations, and proving the inequality is equivalent to proving that this matrix A is nonnegative definite. If you bring all the weights to the same side, you get some terms with a positive sign and some terms with a negative sign, and this is the inequality you want to prove. Of course these coefficients all depend on the weights, so the only variables are really the weights coming out of x, and you want the inequality for an arbitrary choice of these weights. Even proving the inequality with all weights equal to 1 is nontrivial; it's not a really simple object. The way we found to prove this result is by observing that if you start playing with these quantities, you discover a little bit of structure which is independent of the values of the weights. If you exploit that, the way these things combine is that you manage to write everything as a linear combination of deterministic matrices, deterministic meaning independent of the weights. You start seeing that in the end it is sufficient to have certain bounds on some deterministic matrices. These are huge, N factorial by N factorial, but you can give them a block decomposition and analyze each block in a suitable way. In the end you reduce the problem to a nontrivial inequality which you can check by computation, essentially. I think the final object one has to look at is a 60-by-60 matrix whose only entries are 0s, 1s and 2s, and by fairly simple algebraic manipulation you can check that it is nonnegative definite. But as I said, it's clear that it would be interesting -- also in light of the electric network reduction analogy and the harmonic function analogy -- to have some deeper understanding of these relations. So that remains something to do, apparently. Okay, I guess I'll stop here. [applause]

>> Yuval Peres: Questions or comments?

>>: So the power is from Theorem 1 and Theorem 2, which go in different directions; once you play them off each other, you unlock the power.

>> Pietro Caputo: That is one way to put it, yes. I agree.

>>: Simply marvelous.

>>: The theorem [inaudible].

>> Pietro Caputo: Yes.

>> Yuval Peres: Thank you.
>> Pietro Caputo: Thank you. [applause]