>> Yuval Peres: Hi, everyone. It’s my pleasure to have Persi Diaconis here. Persi has been a great influence on many people here; in particular, a lot of my own research in the last fifteen years has been dictated by the directions that Persi initiated thirty years ago. And now he’s a—you know—and as I said in the e-mail, you can find him now in the list of the twenty most influential scientists today. Anyway, without further ado, today, Persi will tell us about random walks on the Heisenberg group. >> Persi Diaconis: Thank you, Yuval. Hello, thanks for coming out. Let’s see… first is: there’s a handout, so this is a no free lunch… you have to make sure somebody gets… each person gets one of these pieces of paper. Okay, that’s the first thing. So this talk is about a homework problem I’m in the middle of, and I know better than to do that to you—you seem like nice people—so I’ll try to explain to you why there’s something in it, I hope, for you. The talk is joint work with Dan Bump, Angela Hicks, Laurent Miclo, and Harold Widom, and it’s about, well, the Heisenberg group, so I’m going to call that group H, and it’s ju… very simple thing—it’s the three by three matrices, upper-triangular, and with entries x, y, z. And I’ll write such a matrix as x, y, z. And so, in particular, if I multiply two of them times x prime, y prime, z prime, well, that’s x plus x prime—they just add above the diagonal—y plus y prime, and here, if you figure it out, it’s z plus z prime plus xy prime—just multiplying matrices. And x, y, and z, well, in physics they are usually in R and C; x, y, and z could be real numbers; they could be—in number theory, where this is a big deal—they could be in the integers. For me, often, they’ll be in the integers mod n, but they could really be in any ring and this all makes sense. The random walk on this group there… the group is generated by this generating set I’ll call S, which is one, zero, zero; minus one, zero, zero; zero, one, zero; and zero, minus one, zero—a symmetric random walk. So the random walk is: you pick one of these two coordinates at random, and you put plus or minus one there; you write down that matrix, and then you multiply by it on the left. And so I’ll try to motivate it in a second, but as math, if I say q of g—g is a group element—is equal to one-quarter if g is contained in this set S and zero otherwise. Okay, that’s one step of the walk, and then q star q of g is convolution—q of h times q of gh inverse, summed over h. So what’s the chance of being at g after two steps? You have to have picked something your first step, and then picked the thing that gets you to g your second step, and similarly, we have q star k of g—the chance of being at g after k steps of the walk—and under no conditions… well, under… so an example of the kinds of theorems that people prove—several people in this room—if the entries are in the integers mod n, then—and if n is odd, ‘cause otherwise, there’s a parity problem—then q star k of g converges to the uniform distribution, u of g, which is equal to one over n cubed. So if you walk around on this group, you get to a random group element—random group element means x, y, and z can be anything—and the rate of convergence is less than or equal to some constant e to the minus—I think it’s—two pi squared times k over n squared, and it’s bigger than or equal to some other constant times the same thing—two pi squared k over n squared. And so this says—we say—order n squared steps are enough. 
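(A minimal sketch, not from the talk: the loop below iterates the distribution q star k exactly for the walk just described, with entries mod n, and prints the total variation distance to uniform at k about n squared over ten and at k equal to ten n squared. The function name and the choice n = 11 are mine, purely for illustration.)

```python
import numpy as np

def heisenberg_tv(n=11, steps=None):
    """Iterate q^{*k} exactly for the Heisenberg walk mod n (n odd) and
    return the total variation distances to the uniform distribution."""
    if steps is None:
        steps = 10 * n * n
    p = np.zeros((n, n, n))        # p[x, y, z] = probability of the matrix (x, y, z)
    p[0, 0, 0] = 1.0               # start at the identity
    u = 1.0 / n**3
    tvs = []
    for _ in range(steps):
        new = np.zeros_like(p)
        # generators (eps, 0, 0): left multiplication sends (x, y, z) to (x + eps, y, z + eps*y)
        for eps in (1, -1):
            shifted = np.roll(p, eps, axis=0)
            for y in range(n):
                new[:, y, :] += 0.25 * np.roll(shifted[:, y, :], eps * y, axis=1)
        # generators (0, +/-1, 0): (x, y, z) goes to (x, y +/- 1, z)
        new += 0.25 * (np.roll(p, 1, axis=1) + np.roll(p, -1, axis=1))
        p = new
        tvs.append(0.5 * np.abs(p - u).sum())
    return tvs

tvs = heisenberg_tv(11)
print(tvs[11**2 // 10 - 1])   # k about n^2 / 10: still far from uniform
print(tvs[-1])                # k = 10 n^2: exponentially small
```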
If you go ten n squared steps—if k is ten n squared—this side is exponentially small; if k is a tenth n squared, this side is big, and this… everything is explicit. And so that’s a typical theorem in this subject—how long do you have to walk—and if we’re over Z, for example, then random… this is an infinite group, and it’s not Abelian, and it’s—I don’t know—it’s… so random walk isn’t recurrent; you don’t come back to zero, but you can ask, “How long do you… how… what does it look like?” And well, q star k of—I’ll write the identity—zero, zero, zero, the chance—you can ask all kinds of questions—but what’s the chance of being back at zero after k steps? This is asymptotic to a constant over k squared, where c is equal to something: gamma of one-quarter squared divided by pi to the five halves times the square root of two, if you wanted to know. Okay, so these are kinds of theorems that people prove. Now, this is a sort of generalish audience, and I want to tell you why I care about these things, and maybe a little bit why you might be interested in some of what’s coming, so some motivation for this study, ‘cause that’s… >>: Is it unclear if it is transient? >> Persi Diaconis: It… no, it’s not hard to see that it’s transient, but it is transient. It just a… let’s just… it’s a theorem, mmhmm, so it’s a little theorem, but it’s… >>: Any group of more than quadratic growth is transient. >> Persi Diaconis: Right, there are many, many ways to see it, but I’m just—you know—but that would be a question you could ask, right? “Is it transient or not?” And you could ask about off-diagonal—you know—and you can ask a lot of questions. And so my motivation for studying this problem—for giving this talk—is the following: the first is—it’s a funny question for you, maybe, but not for me—“Is Fourier analysis good for anything?” [laughter] Actually, useful—I’ll ex… you’ll understand that by the time I get through—and well, just to say a sentence about it, this is a random walk on a group, and you could try to study that using Fourier analysis, and it’s hard to do, as you’ll see. And I think that there are eight proofs of this now, and none of them using Fourier analysis, and so since this is the natural way to try it, well you’ll—I hope—understand, but that was my motivation. I just wanted to try to do one of these problems by Fourier analysis. The second motivation is about the features of random walk, and I think this is a quite important… quite an important topic for a lot of things. So I probably don’t have to tell the people in this audience that Markov chains are used for all kinds of computational tasks, and people like Yuval, and I, and other people in this room often study the question of how long do you have to run a Markov chain until it’s close to its stationary distribution. Well, we have global kinds of results, like this one: if I run the Markov chain n squared steps, then in this very strong total variation distance, it’s close, say… which means for many, many questions the answer you get from the uniform distribution, and the answer you get from the random walk are close together. But you might not care about all aspects; you might only care about the aspect you care about. You’re running this random walk to do a certain simulation problem; you don’t care about all the other questions you might ask. How long do you have to run it to get your feature right? So this is an example where that comes into focus. So let me explain that in this context. 
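(A hedged illustration of the return-probability asymptotics just quoted: the sketch below computes q star k at the identity exactly over Z for small k and compares it with c over k squared, c = Gamma(1/4) squared over pi to the five halves times root two. At these small k the ratio is only roughly 1; the comparison points and names are mine.)

```python
from collections import defaultdict
from math import gamma, pi, sqrt

def return_probs(max_k=20):
    """Exact q^{*k} at the identity for the walk over Z (odd k give zero by parity)."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    dist = {(0, 0, 0): 1.0}
    probs = {}
    for k in range(1, max_k + 1):
        new = defaultdict(float)
        for (x, y, z), prob in dist.items():
            for eps, dlt in moves:
                # left multiplication: (eps, dlt, 0) * (x, y, z) = (eps + x, dlt + y, z + eps*y)
                new[(x + eps, y + dlt, z + eps * y)] += 0.25 * prob
        dist = dict(new)
        probs[k] = dist.get((0, 0, 0), 0.0)
    return probs

c = gamma(0.25) ** 2 / (pi ** 2.5 * sqrt(2))
probs = return_probs()
for k in (8, 14, 20):
    print(k, probs[k], c / k ** 2)   # the ratio drifts slowly toward 1 as k grows
```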
So the steps are: I’m gonna pick elements of this set one at a time—so suppose the elements I pick are epsilon i, delta i, so that is… I, you know, just… so this is either, you know, one, zero; minus one, zero; zero, one; zero, minus one, okay? Those are the steps I pick— I’ll… I won’t write the last zero—and then my walk is—you know—epsilon—well, I guess the way I’m doing it—epsilon k, delta k times epsilon k minus one, delta k minus one, and so on down to epsilon one, delta one, alright? That’s my walk, and I’m gonna multiply out, and I’m gonna say that’s equal to, say, xk, yk, zk. Now, it’s very easy to see from the rules what—you know—so here, xk is equal to epsilon one plus—and so on— plus epsilon k, and these epsilons are—you know—zero, plus or minus one; they’re zero with probability a half, and they’re plus or minus one with probability a quarter. So this is just simple random walk on the integers mod n—if I’m in this context—and we know everything about how this behaves, right? That is, this behaves like the central limit theorem says, and if you’re… if you think of the integers mod n as n points wrapped around a circle, it’s doing random walk, and therefore, it takes n squared steps to get random, and et cetera, but in particular the… well, okay, so and the same for y: yk is equal to delta one plus—and so on—plus delta k. And the joint distribution of these two things follows the bivariate central limit theorem over Z, and so when you multiply these matrices, what’s here and here are like Gaussian random variables, and so we know everything about them. So let’s look at this third coordinate… yeah? >> Sarah: I got a little lost. What are your pairs there in terms … what’s [indiscernible] >> Persi Diaconis: So each time—so I’m using random variable notation—each time I’m picking one of these four things, and each time, the fourth coor… the third coordinate’s zero, so I’m forgetting about it. So this really could’ve been epsilon delta zero, epsilon delta zero—okay, but I’m just forgetting the zero, okay? And they’re random—okay—so epsilon delta is equal to one, zero, or zero, one, or minus one, zero, or zero, minus one with probability a quarter each. So those are just the steps I’m picking each time, and then I’m multiplying them together, so there should really be a zero following each thing, and then I’m multiplying them. So is that okay, Sarah? Okay, thank you, thank you. So what’s the third coordinate? So here, zk, I think it’s this: it’s epsilon two times delta one, plus epsilon three times delta one plus delta two, plus—and so on—plus epsilon k times delta one plus up to delta k minus one. Okay, I think it’s that—it’s very easy to figure out what it is—I think it’s that. And so the— you know—these epsilons and deltas, they’re a little bit dependent; if epsilon i is plus one, delta i is zero, okay? They’re a little bit dependent, but really… okay, so you can ask now if you’re a probabilist—forget about anything else—or any kind of thinker about these kinds of things: “How does this thing behave? What do I have to divide it by so that it has a nontrivial limit distribution?” Here we know that—you know—xk divided by square root of k goes to normal with mean zero and variance a half or whatever it is. And what do I have to divide zk by? 
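(A quick check, not from the talk, of the formula just written for the third coordinate: multiply out s k down to s one directly and compare with the sum of epsilon i times delta one plus up to delta i minus one. Variable names are mine.)

```python
import numpy as np

rng = np.random.default_rng(1)

def check_z_formula(k=50, trials=5):
    """Multiply out s_k * ... * s_1 and compare the z-coordinate with the sum
    of eps_i times (delta_1 + ... + delta_{i-1})."""
    gens = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])
    for _ in range(trials):
        steps = gens[rng.integers(0, 4, size=k)]
        eps, dlt = steps[:, 0], steps[:, 1]
        x = y = z = 0
        for e, d in steps:                  # apply s_1 first, each new step on the left
            x, y, z = e + x, d + y, z + e * y
        prefix = np.concatenate(([0], np.cumsum(dlt)[:-1]))   # delta_1 + ... + delta_{i-1}
        assert z == int(np.sum(eps * prefix))
        assert x == eps.sum() and y == dlt.sum()
    print("the formula for the third coordinate checks out")

check_z_formula()
```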
Well, what you can show is that you have to divide this by k—so here zk over k—that has a limit; this goes to a limiting random variable; this converges, as k gets large, to a nontrivial limit, and—‘cause Yuval wrote a book about it—I’ll just put it down; what’s the limit? This goes to—I’ll call it—z infinity; this has a limit; this has a limit this way—weak limit—and where the z infinity is distributed as the integral from zero to one of B one of s, d B two of s, where these are independent Brownian motions, and that’s not very surprising. If I divide this by k—you know—this… these are going to—if I divide this by root k, sorry—if I divide it by k, I put one of the root k’s under these things, they all go to… this goes to Brownian motion, and then, this is a Riemann sum for this integral. It’s not very hard. If you want to make math out of it, this is a martingale, and the martingale central limit theorem tells you it has a limit, and the limit is identifiable as that. And so that shows that this third coordinate—this z coordinate—you see these coordinates, it takes them… in order to go… they want to get random on the integers mod n. They have to go n squared steps in order to have a good chance of getting down to the bottom. This third coordinate is getting random much faster, so this shows that—you know—taken… so if k is of order a constant times n squared, zk—mod n—is uniform if c is large. So this third coordinate is getting random much faster. If I… so this is an example of features, and I think that it’s an important thing to try to study: how long does it take parts of a random walk or Markov chain to get random. >>: [indiscernible] k is c, and you wanted k to be cn? >> Persi Diaconis: cn… where? I’m going… thank you. That’s right. And c should be large, and so forth. Anyway, those are… so as a conjecture that we don’t know how to prove, if instead of doing this with three by three matrices, if you do it with d by d matrices, and then you… so you pick a coordinate just above the diagonal at random, and then you put plus or minus one there, and everything else is zero— just the analogue of that. Yuval and Allan Sly have beautiful papers showing that this takes… I think it’s… if it’s d squared, p squared steps on the whole group. But if you just look above the diagonal, it gets faster, and if you look on the second diagonal, it’s faster still, and in the corner, it’s very fast. I mean, it… just above the diagonal, one of these entries that’s the same as what I just talked about, gets random. So just above the diagonal, it takes p squared steps to get random; two above the diagonal, it takes p steps to get random; three above the diagonal, it takes p to the two-thirds steps to get random; and j above the diagonal, it takes p to the two over j steps to get random. And that’s a conjecture; I can’t prove that, but I’m pretty sure it’s true. Okay, so that’s a little bit of what I mean by features—I hope that’s okay—and now I want to talk to you a little bit about what I mean by this question. So for a moment let me let G be any group—G is any group, finite group, say—a group like the Heisenberg group—and suppose that q is a probability—q of g is a probability on G—then… and I define convolution just by this recipe, et cetera—so I do the random walk generated by q. And you can ask—you know—suppose that the… it’s… you don’t have parity problems, and that you’re—you know—living on a generating set and stuff like that. 
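(A simulation sketch of the scaling just described: sample zk over k and check that its variance is near one eighth, which is the variance of the limiting integral of B one d B two when each Brownian motion has variance one half per unit time, matching the step distribution. The parameter choices below are mine.)

```python
import numpy as np

rng = np.random.default_rng(2)

def z_over_k_samples(k=1000, trials=2000):
    """Sample z_k / k; its variance should be close to 1/8, the variance of the
    limiting stochastic integral when the Brownian motions have variance 1/2 per unit time."""
    gens = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])
    picks = rng.integers(0, 4, size=(trials, k))
    eps = gens[picks, 0]
    dlt = gens[picks, 1]
    prefix = np.concatenate([np.zeros((trials, 1), dtype=int),
                             np.cumsum(dlt, axis=1)[:, :-1]], axis=1)
    z = np.sum(eps * prefix, axis=1)
    return z / k

samples = z_over_k_samples()
print(samples.var())   # roughly 0.125
```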
So suppose that q star k of g converges to the uniform distribution, and then you can ask how fast it occurs. And you… one way in which I’ve studied those problems is to use Fourier analysis, so a representation of G is a mapping, rho, that assigns group elements to matrices—to GL(V)—with the property that rho of st is equal to rho of s times rho of t. And so you assign matrices to group elements in such a way that products are preserved in that way. And the Fourier transform at a representation q hat at rho is, by definition, the sum over g of q of g times rho of g—the weights times the matrices. So it’s a matrix—the Fourier transform is a matrix—and as usual, Fourier transform takes convolution into product, so q star q hat at rho, is equal to q hat at rho squared. And the uniform distribution has the q—you know, q is uniform—if u hat at rho… the Fourier transform of the uniform distribution is easily seen to be zero if rho is nontrivial irreducible—a representation is called irreducible if you can’t break the vector space up into two disjoint parts such that the representation only takes you into one part—anyway, and is equal to one if rho is one—the trivial representation. And so the way you can study this convergence of… to uniformity is by showing that high powers of this matrix converge to zero. And the upper bound lemma makes that precise, and it goes this way: four times the distance of q star k to uniform, squared—so this is this total variation distance—is less than or equal to the sum over irreducible representations of the dimension of the representation—I’ll try to explain this—times the size of the matrix, q hat rho to the k, and this is squared, and this is the trace norm… trace norm. So okay, that’s… and this V here is d rho dimensional. So that’s a bound… that says if you have your hands on how fast these things go to zero, then you can bound this. So this is a completely general recipe, and—you know—here’s this… this is perhaps the—or almost—perhaps the simplest noncommutative group, so if you like that kind of stuff— it’s very natural—why can’t we do it on this group? What’s wrong with that? What happens—you know—what happens, okay? And it’s a famous hard problem, and I found out why, and you’re gonna hear why, okay? That’s the… but, what kinds of things happen? So in order to talk about that, I have to tell you what the irreducible representations of the Heisenberg group are, and they’re easy. So the representations of the Heisenberg group—and I’m gonna tell you for the integers mod p, just… it’s not hard to say it for any group, if you want to know ask me after the… for any… it’s hard to say for any group; it’s not hard to say for any n, but it’s a little more complicated, and let me tell you what they are when p is a prime. And so there are p squared one-dimensional representations—dimensional representations, characters—and there are p minus one p-dimensional representations. And one of the facts of life is that if you sum the squares of the dimensions of the irreducible representations, that sums to the size of the group, and p squared plus p minus one times p squared— the sum of squares of the irreducible rep—is p cubed, right? p squared plus p minus one times p squared equals p cubed—that’s true. So okay, so what are the representations? The one-dimensional one… representations, they’re indexed by pairs ab, mod p, and so, xyz. This is a… so remember representation’s a linear map; a one-dimensional representation is a linear map of a one-dimensional space into itself. 
That’s just multiplying by a number, and this number here is e to the two pi i a x plus b y over p—that’s familiar—those are the… you can see that if you—you know—multiply this and multiply these, the product is the product, it’s okay. The p-dimensional representations, they’re… they go like this. They’re on a vector space, so the vector space I’ll take… you can take all the same vector space; V is the set of all functions f—or column vectors f, anyway—f from the integers mod p, into C. Okay, so it’s just… that’s a space. And I just have to tell you how x, y, and z act, so here, rho in this… there’s only one parameter that comes up—c—of x, zero, zero. Now, that’s a linear map of this space into itself, so it has to take a function into another function, so it acts on the function f, at the argument j, as f of j plus x, so it just translates, okay? That’s okay. It just shifts; it’s a cyclic shift, if you like; it just shifts the vector around. The next one, rho sub c of zero, y, zero—that’s what underlies the Fourier transform— this acts on f at j as a… it’s e to the two pi i c over n times yj, as a multiplier, times f of j. So it acts diagonally, and… that’s how it acts. And z acts… rho sub c of zero, zero, z at f of j just acts in an even simpler way; it’s e to the two pi i c over n times z times f of j. Okay, so if you put it all together, rho sub c—c’s a nonzero integer mod p—of x, y, z—well, it’s something—you know—they will combine, right—it’s e to the two pi i c over n times y j plus z, times f of x plus j—this is how it acts on f at j. Okay, so that… those are the p-dimensional representations, and there’s one for every nonzero c… n is p—you all knew that, right? n is p—p, p, p—okay. So is that—I mean—that’s… you can just check that they obey this rule, and it’s not hard to check. And so now, what does my Fourier transform become? So here, ‘member my q; q is—you know—I picked one of the two above-diagonal elements—they’re plus or minus one—and I—you know—so here, q hat rho sub ab. The Fourier transform… well, that’s just—you know—that’s the Fourier transform at the one-dimensional representations; that’s just—well—it’s one half cosine two pi a over p plus one half cosine two pi b over p. So it’s just… that’s the Fourier transform. And q hat at the p-dimensional representation, it’s a matrix, and it’s not a bad matrix. It’s got a quarter in front; it’s got ones just below the diagonal; it’s got ones just above the diagonal; it’s got a one here and a one here; and then on the diagonal, is cosine two pi c j over p—zero less than or equal to j less than or equal to p minus one. Okay, so it’s a… it’s that matrix. And one of the jobs that there’s gonna be—which was new to me—is I had to get my hands on the eigenvalues of this matrix. And so if anybody’s seen that matrix before or knows anything interesting about it, I’d be happy to hear about it. And so let’s say what the job is in order to bound the Fourier transform, I… so I… ‘member my job was to try to use this machine to bound the rate of convergence of the walk. So I have to bound; I have this calculus problem. So this is a matrix; it’s a symmetric matrix, so it has real eigenvalues. The eigenvalues are beta one of c, beta two of c, up to—I don’t know—well, beta p of c—the real eigenvalues… some real eigenvalues. So—you know—if you write that thing out, I have to bound this… this should be rho, not the trivial representation. 
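(A sketch of the Fourier transform matrices just described, under my reading that q hat at rho c has one quarter on the sub- and super-diagonal, with wraparound corners, and one half cosine of two pi c j over p on the diagonal, which is what the one-quarter weights on the four generators give. The comparison with 1 minus pi over 2p anticipates the eigenvalue result mentioned near the end of the talk.)

```python
import numpy as np

def qhat_rho(c, p):
    """Fourier transform of q at the p-dimensional representation rho_c:
    1/4 just above and below the diagonal (with wraparound corners) and
    (1/2) cos(2 pi c j / p) on the diagonal."""
    j = np.arange(p)
    M = np.zeros((p, p))
    M[j, (j + 1) % p] = 0.25
    M[j, (j - 1) % p] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / p)
    return M

def qhat_onedim(a, b, p):
    """Fourier transform at the one-dimensional representation indexed by (a, b)."""
    return 0.5 * np.cos(2 * np.pi * a / p) + 0.5 * np.cos(2 * np.pi * b / p)

p = 101
M = qhat_rho(1, p)
beta = np.linalg.eigvalsh(M)          # real eigenvalues, since M is symmetric
print(beta[-1], 1 - np.pi / (2 * p))  # top eigenvalue is close to 1 - pi/(2p)
print(qhat_onedim(0, 1, p))           # one of the one-dimensional terms
```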
So I have to bound this sum—it’s the sum of two parts—the sum over a and b not equal to zero, zero of, well, one half cosine two pi a over p plus one half cosine two pi b over p to the two kth power—that’s the sum over the one-dimensional representations—plus the sum of c equals one up to p minus one of…well, this… the dimension of the representation is p, and then the sum over j equals one to p of beta sub j of c to the two kth power. And that’s… this should have been… yeah, this is a norm squared, so this is the sum of the squares of the eigenvalues. Okay so, just… I’m sorry, but I’m gonna make you look at this for a second. I know it’s not fun to look at anybody else’s calculus, but I’m not gonna do very much of it, and see if you can at least look and see what the job is. So one term in this sum is when a is zero and b is one—that’s a term. So, when a is zero, this is one—cosine of zero is one—this is one half plus one… well, this is one half plus one half cosine two pi over p to the two kth power. Then that’s one term in the sum. Well, cosine near zero is one minus x squared over two, and so this is one—you know—minus two pi over p, squared, over four—something like that—to the two kth power. And so in order to make this one term small, k has to be of order p squared, I mean that’s… if k is ten times p squared, this is like e to the minus ten, and so… okay that… and then in the usual way, that’s all the trouble—that is the… all the other terms are smaller, and they all add up, and you can bound this sum by straightforward analysis as long as… so k has to be of order p squared. So good… I need to know about these numbers, and I need to know about them with some kind of reasonable exactitude. And I will show you coming—a little bit—beta j of c is—you know— certainly less than or equal to one minus an explicit number over p—so something like that is true—and so this sum has p terms in it, and so you have to… and this sum has p terms in it, and so I need to choose k so large so that—p terms, p terms, and then there’s a p—so it’s p cubed times one minus constant over p to the two kth power. Well—you know—if k is of order p squared, which I need to kill the linear terms, then this is tiny, so that doesn’t cause any trouble. So sorry for making you look at this calculus, but that’s what would be involved if you were able to do that. So I was left with the problem of trying to bound the eigenvalues of these matrices. Now, by now, I’m very good at bounding eigenvalues of matrices, and I just thought, “You know, it’s a tri-diagonal matrix, how bad can it be? What’s that gonna do? You know, that’s not gonna cause any trouble.” But then I realized that all the tools and tricks I know are in probability language and about stochastic matrices. Well, it shouldn’t be so bad—you know—this cosine can be negative—sorry, cosine can be negative—I… and then the rows aren’t… don’t have a constant sum, and so I didn’t know what to do, and I tried some things, and they didn’t work, and so we tried some other things. And I want to tell you that, but before doing that, ‘cause that… I want to give you some more motivation. So why would anybody care about these matrices; it turns out they’re famous matrices. So more motivation. So just to make it simple, let me try to take c to be one, so M is equal to M one, and that has cosine two pi over p or n—I’ll make p n now, doesn’t matter—times j down the diagonal. So it’s this matrix with just that down the diagonal, okay? 
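(A numerical version of the calculus just displayed, for a small prime: the sketch evaluates the right-hand side of the upper bound lemma, the one-dimensional terms plus p times the sum of beta j of c to the 2k over the p-dimensional representations, at k = 10 p squared. The prime p = 31 and the matrix convention, one quarter off the diagonal and one half cosine on the diagonal, are my choices.)

```python
import numpy as np

def upper_bound(p=31, k=None):
    """Right-hand side of the upper bound lemma for the Heisenberg walk mod p:
    one-dimensional terms plus p * sum_j beta_j(c)^(2k) over c = 1, ..., p-1."""
    if k is None:
        k = 10 * p * p
    j = np.arange(p)
    cos_term = 0.5 * np.cos(2 * np.pi * j / p)
    lam = cos_term[:, None] + cos_term[None, :]   # (1/2)cos(2 pi a/p) + (1/2)cos(2 pi b/p)
    lam[0, 0] = 0.0                               # drop the trivial representation
    total = np.sum(np.abs(lam) ** (2 * k))
    for c in range(1, p):
        M = np.zeros((p, p))
        M[j, (j + 1) % p] = 0.25
        M[j, (j - 1) % p] = 0.25
        M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / p)
        beta = np.linalg.eigvalsh(M)
        total += p * np.sum(np.abs(beta) ** (2 * k))
    return total                                   # bounds 4 * (TV distance)^2

print(upper_bound(31))                # tiny once k = 10 p^2
print(upper_bound(31, 31**2 // 10))   # not small when k is only about p^2 / 10
```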
So those matrices come up, and here’s one place they come up. Probably most people in this room know what the discrete Fourier transform is. So the discrete Fourier transform matrix, fn, has, say, j, kth entry e to the two pi i j k over n—over square root of n if you want it to be a unitary matrix— so that’s an n by n matrix, zero less than or equal to j and k less than n. So… and there are teams of electrical engineers who want to know about the eigenvalues and eigenvectors of the discrete Fourier transform matrix; there’s a fair-size literature on that. Why do I care about it? Well, it turns out that, because of what the Heisenberg group does for a living, this matrix commutes with M. So fMf inverse—I don’t know—fM is equal to Mf… and… or, right, so I’ll put it here… and what that means is that they’re both symmetric matrices; it means they’re simultaneously diagonalizable, and I thought, “Ah, I’m in luck. These engineers are gonna know all about the eigenvalues of the discrete Fourier transform matrix, and therefore, since they’re simultaneously diagonalizable, I’ll know a lot about the—you know—I’ll know a basis for… I’ll know how to diagonalize M.” And… well, okay, that sounds good—it sounded good to me—but alas—not alas, but anyway, what’s true; I think Gauss showed this—but the Fourier transform matrix, you know it’s essentially its own inverse, right? So f to the fourth is the identity, and what that means is that the eigenvalues of this matrix are plus or minus one, plus or minus i—that is, it’s the… it’s a unitary matrix, so it has eigenvalues which are roots of unity—and the dimensions of the eigenspaces are around n over four—within one. So I do get a reduction of M in—you know, because M preserves the eigenspaces of the Fourier transform matrix—but it’s not helpful. And there is a lot of work, as I said, in the engineering literature about various decompositions—you know—eigenvalue decompositions for f, but because the bases are so non-unique, they weren’t usefully related to M and I couldn’t use it. On the other hand, I now have very, very good approximate bases—eigenvalue decompositions of M—and they do decompose the Fourier transform matrix, and that seems to be interesting. So that’s one motivation—okay—for studying this matrix. A second motivation comes from physics. And so the… so this is Harper, Hofstadter, and martinis—martinis, not Martini; it’s not a person; it’s the alcohol. So there’s a very large literature in solid-state physics about periodic… the Schrödinger equation with periodic potentials. To just say a simple version of it, say on l2 of Z—square-summable sequences indexed by Z—the Schrödinger operator—it’s just this operator with a periodic potential—takes… so I’ll call it—I don’t know—L: L of phi, as an l2 function, at j is equal to phi of j minus one plus phi of j plus one plus cosine of theta j plus eta, times phi of j. So that… this is the analogue of a second derivative operator, and they’d usually put a constant in here—a v. And if you make… if you want to compute anything, you discretize this operator and look at it mod n, and these are exactly my matrices, I mean, so this… these are slight shifts, but that doesn’t change anything. That is, these matrices, if you discretize it or take periodic boundary conditions, are my matrices, and there’s enormous, both applied, numerical, theoretical work on: what’s the spectrum of this operator? 
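(A small check, not from the talk, that the discrete Fourier transform matrix commutes with M when c = 1 and that F to the fourth is the identity; the matrix convention for M is the same one assumed above.)

```python
import numpy as np

def dft(n):
    """Unitary discrete Fourier transform matrix with entries exp(2 pi i j k / n) / sqrt(n)."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(2j * np.pi * j * k / n) / np.sqrt(n)

n = 50
jj = np.arange(n)
M = np.zeros((n, n))
M[jj, (jj + 1) % n] = 0.25
M[jj, (jj - 1) % n] = 0.25
M[jj, jj] = 0.5 * np.cos(2 * np.pi * jj / n)

F = dft(n)
print(np.allclose(F @ M, M @ F))                              # F commutes with M (c = 1)
print(np.allclose(np.linalg.matrix_power(F, 4), np.eye(n)))   # F^4 = I, so eigenvalues in {1, -1, i, -i}
```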
The… if you want to get famous, Artur Avila just won the Fields medal, and one of his accomplishments that’s listed is called the ten martinis problem; that was a problem of Mark Kac, who in a talk of this sort, said, “Well, I’ll give ten martinis to anyone who can solve this problem.” And it was to show things about—this is an infinite operator—to show things about the absolute continuity of the spectral measure, and well, people like Barry Simon, and many other people have written lots and lots of papers. When this is a general parameter—when v is two, which is the case that we’re doing… we’re dealing with—this is called Harper’s operator, and Doug Hofstadter worked on it in his thesis. And if you type in Hofstadter’s butterfly, you’ll see lots and lots of references, and look at what the spectrum of the operator looks like. I won’t try to say that more; I can, but there’s a lot of interest in the eigenvalues of this matrix in the physics community and in the solid-state community. So I won’t say more about it, unless asked. Okay, so I need to know about the eigenvalues of this matrix. So—you know—this is… 2014, so take a look at the handout here… well, this is… this side with the pictures. So this is when m is two hundred and the parameter a is c. So this is: what’s the biggest eigenvalue of that matrix when the parameter— which I was calling c and is here called a—varies in its range? So what you can see is that, for example, when the parameter is—you know—one, the eigenvalue’s very close to one, and I said it’s one minus a constant over… >>: Does the parameter a [indiscernible] the x axis? >> Persi Diaconis: No, the parameter across the x axis is… let me write it down. Thanks. The parameter… the matrix has a cosine of two pi—well, I called it c—times j over p, zero less than or equal to j less than or equal to p minus one down the diagonal, and the parameter that’s across the x axis is c, okay? So for each c, this is a matrix, and it has a top eigenvalue, and that’s what’s pictured here. >>: So what’s p? >> Persi Diaconis: Oh, p is two hundred, which isn’t prime, but that doesn’t matter. p is two hundred. Sorry. p is n. So p is two hundred—so here, it’s called m. It’s the size of the matrix, it’s two hundred. So what you can see is that the eigenvalues are pretty close to one, and then they fall off—they’re not monotone—you know, unfortunately. They are symmetric, and that’s not hard to show. And okay, that’s good. Now, in order to use these bounds, I also need to know the smallest eigenvalue—the one that’s the closest to minus one. And that’s what the smallest eigenvalues look like, and these look like mirror images of one another, but unfortunately, that’s just what they look like. They’re approximately mirror images. Those two dots, for example—no—but they’re not exactly mirror images, and so I also had to bound the smallest eigenvalues, and so forth. And I’ll come back, maybe, and talk about some of the eigenfunctions. This is the first eigenfunction; the eigenfunctions are localized, and they’re very peaked around zero. And this v equals two, which is the case we’re in, is the critical case. If v is bigger than two, then the eigenvectors are localized, and when v is less than two, they’re not localized, and here, they also are localized, but… which you can see; they’re mostly zero, but they’re kind of very peaked around… well, very peaked around one. Okay, so you can look, and then you have to prove something; eventually, you have to prove something. 
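(A sketch reproducing what the handout's pictures show: top and bottom eigenvalues of M(c) as c varies, with n = 200 as in the figure; it also checks the symmetry in c and that the bottom eigenvalues are only approximately the mirror images of the top ones. Matrix convention as assumed above.)

```python
import numpy as np

def extreme_eigs(n=200):
    """Largest and smallest eigenvalue of M(c) for c = 1, ..., n-1 (n = 200 as in the handout)."""
    j = np.arange(n)
    tops, bottoms = [], []
    for c in range(1, n):
        M = np.zeros((n, n))
        M[j, (j + 1) % n] = 0.25
        M[j, (j - 1) % n] = 0.25
        M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / n)
        eigs = np.linalg.eigvalsh(M)
        tops.append(eigs[-1])
        bottoms.append(eigs[0])
    return np.array(tops), np.array(bottoms)

tops, bottoms = extreme_eigs(200)
print(tops[0], tops[99])                      # close to 1 at c = 1, smaller in the middle
print(np.allclose(tops, tops[::-1]))          # the top curve is symmetric in c
print(np.max(np.abs(bottoms + tops)))         # bottoms are only approximately minus the tops
```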
I just looked on the table, and I won’t… but it’s a good time for a minute, so Anther showed me this. I can’t hand it around, because he’ll kill me, but this is the Heisenberg group—a portion of the Heisenberg group—with those generators. This is the Cayley graph of the Heisenberg group, and those of you who are interested, can come up and take a look at it under inspection, so okay. It’s—I think—from a 3D printer, and I never saw such a thing, so I’m thrilled—thank you—but one can look. Okay, so I want to talk to you about the… how you bound eigenvalues of non-stochastic matrices. And I’m afraid there’s a joke about this—you know—there’s a joke about a physicist, and a mathematician, and—I don’t know—somebody else, and—you know—the point is: the mathematician goes back to cases he knows, and proves things by induction, right? So I’m gonna go back to what I know, so I have this matrix—my matrix—and I’m gonna… it’s got cosine down the diagonal, so it’s not positive. So I’m gonna make it positive, and I’m gonna make it sub-stochastic. So I’m gonna let M—I’ll work with M one—I’ll just call it M, but then I’ll just add the identity, now it turns out good to add a third the identity plus two thirds of M, okay? So… but once you add the identity and two thirds of M, that makes everything nonnegative, and the row sums less… between zero and one. So it has nonnegative entries, and the row sums… okay, so you can just easily… easy to check that. And then, I can make it into a stochastic matrix by just making it an absorbing Markov chain. So I’m now gonna make a Markov chain, K, which is like this: here’s infinity, here’s zero, one up to—well—n minus one, and here’s one third the identity plus two thirds M, and then, here are some numbers which I’ll call a zero, a one, up to a n minus one, which just make the rows sum up to one. So let me try to explain that… well, let’s say what they are: aj is equal to one third times one minus cosine two pi j over n. So it’s just what you need to make the rows sum up to—what—one, if I did it properly. So what is this? This is a stochastic matrix—all the rows sum to one; all the entries are nonnegative—I added a site to the space, infinity, and this is a Markov chain, but if it hits infinity, it dies. So this is absorbing at infinity—if you hit infinity, you stay there—and the rest of the time, you could go to infinity; this is the chance of going from j to infinity—that’s what this first row is. So okay, I reduced… I made the problem into something that’s friendlier to me; it’s a stochastic matrix now. Of course, this matrix is a stochastic matrix, so it has one eigenvalue which is one—namely… yes. And so if I can bound the second eigenvalue of this matrix, that will be the top eigenvalue of this matrix, and then I’ll be in business. So the way you do that, I’m gonna call these—this set of states—S, and S bar is S union infinity—in case that comes up—and I’m gonna use… I’m gonna bound this by the minimax-type principle, but I associate a quadratic form with this matrix, and it’s a little bit tricky; it took quite a while to get it right. The quadratic form is the Dirichlet form, Eff—f is a vector, column vector— and it’s just equal to this: it’s one half the sum, over x and y contained in S bar, of f of x minus f of y squared K of xy K of x. 
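(A sketch of the construction just described: build the block one third I plus two thirds M, append an absorbing state with absorption probabilities a j, and check that the result is stochastic and that its second eigenvalue is one third plus two thirds times the top eigenvalue of M. The formula a j = (1/3)(1 minus cos(2 pi j / n)) is my reading of the rows-sum-to-one requirement.)

```python
import numpy as np

def absorbing_chain(n=200, c=1):
    """Build K: the block (1/3) I + (2/3) M on states 0..n-1, plus an absorbing
    state at 'infinity' receiving the leftover mass a_j from each row."""
    j = np.arange(n)
    M = np.zeros((n, n))
    M[j, (j + 1) % n] = 0.25
    M[j, (j - 1) % n] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / n)
    block = np.eye(n) / 3 + 2.0 * M / 3
    a = 1.0 - block.sum(axis=1)              # absorption probabilities
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = block
    K[:n, n] = a
    K[n, n] = 1.0                            # infinity is absorbing
    return K, M, a

K, M, a = absorbing_chain()
n = 200
print((K >= -1e-12).all(), np.allclose(K.sum(axis=1), 1.0))             # stochastic
print(np.allclose(a, (1 - np.cos(2 * np.pi * np.arange(n) / n)) / 3))   # a_j = (1/3)(1 - cos(2 pi j/n))
second = np.sort(np.linalg.eigvals(K).real)[-2]
print(np.isclose(second, 1/3 + (2/3) * np.linalg.eigvalsh(M)[-1]))      # second eigenvalue of K
```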
K is this matrix, and this is a symmetric matrix, and this part is… well, let me write it down: u of x, where… these are Dirichlet eigenfunctions, so f of infinity equals u of infinity is zero, but of course, Kxy—you know—Kj infinity is positive, and so this… anyway, that’s the quadratic form which is useful and needed. And what you can show—the usual characterization of eigenvalues in terms of quadratic forms—shows that if we can find an a bigger than zero such that the l2 norm squared of any function f is less than or equal to a times Eff—that’s the usual way of… you know, the usual thing is the eigenvalue is the quadratic form divided by the length of the vector, and so then, that ratio will be bigger than one over a. That is equivalent to—or implies anyway—that the top eigenvalue of this matrix—which I’ll call beta—beta is less than or equal to one minus one over a. So if I can bound the quadratic form, I can get a bound on the eigenvalue I want. And now we use… I use this path method, but here the… it’s a little bit different, ‘cause of this infinity— and that’s a new thought to me. We need paths gamma x, which take any x and connect it to infinity, so this is x naught, which is x, then x one, x two, up to x d, which is infinity. And they have to be paths in the graph—that is, I need K of x i minus one, x i to be bigger than zero. So I need paths connecting any x to infinity, okay? And the way we use that is we write f of x as, well, f of x minus f of x one, plus f of x one, minus f of x two, et cetera, plus f of x d minus one minus f of x d. The point being, everything cancels out, and this last term is zero. So that… and that’s just an identity. And then use Cauchy–Schwarz; that’s less than or equal to the length of the path—the number of terms in the sum times the sum of squares, f of xi minus one… oh, I don’t know, minus f of x i squared. And then that’s like this, and then you fool around in a way that is standard way of fooling around—read Yuval’s book with David Levin and Elizabeth Wilmer—path arguments—and the bottom line is: using the geometry of the graph, you wind up proving this kind of a bound. You can take a—this a which works here—to be the maximum over edges x y, where x is in S and y is in S bar—could be infinity; that has to be allowed, y’know—of one over K of x y—the chance of this step—times the sum, over z such that x y is contained in the path associated to z, of the length of the path. So that… you can prove that kind of a bound. And then, so you want a to be small—turns out, in order for this bound to be useful, you want a to be small—and so a will be small if you could choose paths in the graph which take you from x to infinity with the following property: none of these things should be too small—so that’s important—and you shouldn’t have too many paths that use a given edge, ‘cause you don’t want this sum to be too big. And so that’s a kind of thing that we do as combinatorialists, and I’ll just say a sentence about that, but then I’ll just stop. So here’s infinity; lots of things connect to infinity. You can go from any j to infinity; so there’s infinity, and here’s—you know—here are these points, one up to n minus one. And if you start here at the beginning, this thing is very close to zero, it’s one minus cosine, right? That’s a… I don’t… I probably have these numbers wrong, but this thing is very close to zero, so you don’t want to do that. So what you do is you connect points to infinity by going from here over into where it’s nonzero and then up to infinity. 
And this point you go to here, then up to infinity; this point you go to here, then up to infinity. You do the same in the opposite direction. You have to choose a break point, and you go here, and then here, and then here, and then up to infinity. Points in the middle you connect directly to infinity, and what you can see if you choose paths that way and do the combinatorics—you know—this isn’t too small, and therefore, one over it isn’t too big, and this isn’t… not—you know—at most four paths use an edge, and the lengths of the paths are not too long, and so you can control this. If you do this carefully, you get that a… you can take a to be of order a constant times—well, for this case—a constant times one over p to the four thirds. Now, that’s more than enough for the eigenvalue bound… >>: Is that one over a or a or…? >> Persi Diaconis: This is a, so that… oh, yes, yes… yes, c times p to the four thirds. And so the eigenvalue bound: you get beta’s less than or equal to one minus some constant over p to the four thirds, and that’s not what I used before, but it’s more than enough to do the job—you know—I… that… so I’m gonna stop here, except to say time flies, and it’s about… it is about time and over time. We actually know for these matrices, I actually know what the top eigenvalues are, at least in the corners, and so actually beta is equal to one minus pi over two p plus big O one over p squared. And actually, if I put in a c—it’s c over p… as long as c is small, c fixed. So I was able—or we—our crew was able to relate the eigenvalues of this matrix to the eigenvalues of the harmonic oscillator, which are very well-known, and use that relationship in order to bound these eigenvalues. Unfortunately, I need to know this, not only for c fixed, I need to know it for all c. And these cruder geometric arguments work great; they don’t give quite as precise results, but they do give good answers. So I want to finish by just saying, I started on this talk for two reasons: I wanted to know is Fourier analysis useful? And if not, why not? And I wanted to know about—you know—the distribution of the center argument. Well, I can’t answer—you know—this is pretty hard work is all I’ll say. We really have seven proofs of the fact that n squared steps are necessary and sufficient for random walk on the Heisenberg group, so I—you know—did I need to do it this way? No is the answer, but still, it shows—you know—what the difficulty is. Having done this work—it is important to say— there are lots of other groups where the Fourier transform has exactly this form, and so more or less, any class-two nilpotent group, the Fourier transform… the high-dimensional Fourier transforms have this form. And so knowing about the eigenvalues of these matrices, there are a bunch of other random walks—some of which, our previous techniques didn’t apply to—that we can do well. It is also worth saying that our sharp… these sharp eigenvalue bounds are much better than what ergodic theorists and Artur Avila got on the problems that they care about. They didn’t actually care so much about the extreme eigenvalues, although they did care about them, and they got… we got much sharper bounds, and so that’s good. So I hope that you know what I was up to, trying to do my work on the Heisenberg group, and I hope that is instructive for you sometime. Thank you. [Applause] >> Yuval Peres: Questions? 
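(A numerical check of the corner eigenvalue asymptotics just quoted, beta = 1 minus pi over 2p plus big O of one over p squared for c fixed: compute the top eigenvalue of M(1) for a few p and watch (1 minus beta) times p approach pi over two. Matrix convention as assumed above; the particular p values are mine.)

```python
import numpy as np

def top_eig(p, c=1):
    """Top eigenvalue of M(c) mod p."""
    j = np.arange(p)
    M = np.zeros((p, p))
    M[j, (j + 1) % p] = 0.25
    M[j, (j - 1) % p] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / p)
    return np.linalg.eigvalsh(M)[-1]

for p in (101, 401, 1601):
    beta = top_eig(p)
    print(p, beta, 1 - np.pi / (2 * p), (1 - beta) * p)   # (1 - beta) * p tends to pi/2
```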
>>: So does… did you try to expand this a little bit to other… the other upper-triangular matrix groups, or ones with integers above the diagonal, or… >> Persi Diaconis: Yeah—Sarah was the editor for our super-character paper—bless you. Yeah, I didn’t try. I just… we… I just decided I was gonna do this one, and I didn’t know… I mean, I will eventually try. But one of the problems is for the group of n by n upper-triangular matrices, nobody knows what the irreducibles are. And in some sense you can prove that nobody will ever know; they’re wild problems. And for this case—you know—I said, “Well, here are the irreducibles; four by four, okay; five by five, okay; six by six, and then it stops.” I mean, nobody knows what the irreducibles are—I think—for seven by seven. And we don’t know the characters, we don’t know the conjugacy classes. So we did manage to make new, and better, and easier-to-use super character theories using these ideas and that, so I would like to do more about that, but I just decided I wanted to do this problem. How hard could it be? It’s a tri-diagonal matrix; it shouldn’t be so bad. Well, okay, I have scars to prove it—wasn’t so much fun. >>: What do you mean that you can prove that you can’t find the counter…? >> Persi Diaconis: That you can’t find the counterexample? That’s a good question. So let’s see if I can explain that. So the claim is that… so these are these groups: U n of F p, which are ones on the diagonal—stars above, over F p—and you know, our… what I claim is that I’m gonna try to convince you that there is a proof that you can’t describe the conjugacy classes. Now, of course, for any finite n, it’s a finite problem, and leave me alone—okay—but I mean, it’s not—you know—like the conjugacy classes of the symmetric group—I’ll explain, as a math thing—but conjugacy classes of the symmetric group are indexed by partitions, and we all… that’s okay, right? For GLn—you know—these are nice groups, they’re the Sylow p-subgroups of GLn—I mean, they’re not bad groups. So okay, so, let’s see, I proved that if you had a nice description of the conjugacy classes by a bijection, you’d have a nice description of what are called wild quivers. So a quiver, is just a collection; it’s a directed graph—I don’t know, okay— and the representation of a quiver is a vector space at each place—at each vertex—and a linear map for each arrow. Nothing has to commute, just… okay. Two representations are called equivalent if you can change bases and make the linear maps actually the same—okay, so if they’re equivalent up to change of basis. And the… the problem is: classify the representations of a given quiver. So these are familiar problems. If you have one dot and an arrow leading into it, that’s classify linear maps of a vector space into itself up to change of basis. Okay, that’s the rational canonical form—that’s good— okay, we know what that is. So here—two dots and an arrow going from one to the other… that’s… you have a matrix—you have a linear map—and you’re allowed to change bases arbitrarily; there’s only one invariant, which is the rank. Okay, but still, that’s nice. Okay, quivers which… in which the representation type is indexed by an integer, like the rank, are called finite type. Quivers in which the representations are indexed by a finite collection of real parameters or complex parameters, those are called—what are they called, not finite type—tame, type tame. 
And there’s a trichotomy theorem which says any quiver is of finite type, tame, or wild, and… just so you see the sub-content—it sounds crazy—but a quiver is of finite type if and only if it’s an orientation of a Dynkin diagram. That’s a nice theorem, that’s Gabriel’s theorem. Okay, so I showed that if you had a description of the conjugacy classes, you would have a nice description of a wild quiver—this one: two arrows. This is: classify pairs of linear maps from a vector space into another vector space up to change of basis. So this here, you have a pair of matrices and… say you have a pair of matrices, and you want to classify them up to change of bases, and there’s no… okay, now there’s a theorem that says if you had a nice description of wild quivers—any wild quiver— then you’d ha… there’s an unsolvable word problem that you could solve. So it’s much worse than P versus NP; it’s—you know—it’s up there, right? It’s… so the complexities… and now, to try to get a finite quantifiable version of that, I’d love to do that. And I keep trying to find a tame model theorist. The theorem I just told you about, that there’s a… this equivalence class, it’s on page three hundred of a book called Modules and Model Theory. And unfortunately, there are two hundred and ninety-nine pages before that theorem, [Laughter] but there is a theorem that says that if you have a nice description of wild quivers, you have a… and here it’s very easy to say what you do: given a matrix, you just… you embed two big blocks in it, and it… conjugation—you know—describing the conjugacy classes of such a matrix you can easily see is equivalent to classifying pairs of maps from one space to another. So I hope that gives a flavor for it, and… but if you can ever make more sense out of it… I did spend a week with Katrin Tent, who’s a model theorist trying to make a quantifiable version of it. I don’t write… but in some sense or other you can prove it. There… we have a… I have a paper with Richard Stanley and Ery Arias-Castro which is called “Random Walk on the Upper-Triangular Matrices Mod p”, and it has all this literature and the proofs and everything—everything’s written down. It was about ten years ago. We’ll never get out of here if you don’t know him. I’ll answer any question. >> Yuval Peres: We have the room for seven more minutes, but the… so this group is obtained from a group on the—you know—on vector z, but then you have all the coordinates from over here, so could use estimates from—you know—the previous space, like every cell of cost and then take that [indiscernible] quantity… >> Persi Diaconis: Right, that’s one of the proofs we… and in order to go from infinite down to finite, we needed Harnack inequalities, but they exist—I mean—and actually, this guy, Alexopoulos, has this for any—you know—any nilpotent—discrete nilpotent—group, he has—you know—the right Harnack inequal… so that’s one way of doing it. Now, that’s—you know—that’s bringing in a lot of hard work. It just seemed to me—you know—here’s this poor little group—Fourier analysis, its tri-diagonal matrices—how bad should it be? I mean, that’s what I was trying to do. It’s an exercise. I said, “It’s an exercise, I’m just trying to do an exercise.” And it got me—you know—wham! I’ll show you. 
So I… there… your proof with Allan Sly—you know—you did in Vienna; I’ve forgotten whether you did general p, but probably did—you know—that is, Yuval has a beautiful proof of the rate of convergence on… for n by n matrices, showing that—you know—p squared is enough, based on a very clever foundational. But, I mean, there are really a lot of different proofs. This is just… it’s a straightforward approach; you could try it on any group. You know, the other cases, there are special tricks for using structure, and I wanted to try it. That… it just… it was really a homework problem which turned into three papers. That’s… what else can I say? [Laughs] >>: Nice. >> Persi Diaconis: Yeah. >> Yuval Peres: Any other comments or questions? If not, let’s thank Persi again. [Applause]