>> Yuval Peres: Hi, everyone. It’s my pleasure to have Persi Diaconis here. Persi has been a great
influence on many people here; in particular, a lot of my own research in the last fifteen years has been
dictated by the directions that Persi initiated thirty years ago. And now he’s a—you know—and as I said
in the e-mail, you can find him now in the list of the twenty most influential scientists today. Anyway,
without further ado, today, Persi will tell us about random walks on the Heisenberg group.
>> Persi Diaconis: Thank you, Yuval. Hello, thanks for coming out. Let’s see… first is: there’s a handout,
so this is a no free lunch… you have to make sure somebody gets… each person gets one of these pieces
of paper. Okay, that’s the first thing. So this talk is about a homework problem I’m in the middle of, and
I know better than to do that to you—you seem like nice people—so I’ll try to explain to you why there’s
something in it, I hope, for you. The talk is joint work with Dan Bump, Angela Hicks, Laurent Miclo, and
Harold Widom, and it’s about, well, the Heisenberg group, so I’m going to call that group H, and it’s ju…
very simple thing—it’s the three by three matrices, upper-triangular, and with entries x, y, z. And I’ll
write such a matrix as x, y, z. And so, in particular, if I multiply two of them times x prime, y prime, z
prime, well, that’s x plus x prime—they just add above the diagonal—y plus y prime, and here, if you
figure it out, it’s z plus z prime plus xy prime—just multiplying matrices. And x, y, and z, well, in physics
they are usually in R and C; x, y, and z could be real numbers; they could be—in number theory, where
this is a big deal—they could be in the integers. For me, often, they’ll be in the integers mod n, but they
could really be in any ring and this all makes sense. The random walk on this group there… the group is
generated by this generating set I’ll call S, which is one, zero, zero minus one, zero, zero symmetric
random walk—zero, one, zero and zero, minus one, zero.
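As a quick aside, here is a minimal Python sketch of that group law and generating set; the encoding of elements as triples and the helper name heis_mul are illustrative choices, not anything from the talk.

```python
# Elements (x, y, z) stand for the upper-triangular matrix
# [[1, x, z], [0, 1, y], [0, 0, 1]] with entries mod n.

def heis_mul(g, h, n):
    """Product (x, y, z) * (x', y', z') = (x + x', y + y', z + z' + x*y') mod n."""
    x, y, z = g
    xp, yp, zp = h
    return ((x + xp) % n, (y + yp) % n, (z + zp + x * yp) % n)

# The symmetric generating set S: plus or minus one in the x slot or the y slot.
S = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
```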
So random walk is: you pick one of these two coordinates at random, and you put plus or minus one
there; you write down that matrix, and then you multiply by it on the left. And so I’ll try to motivate it in
a second, but as math, if I say q of g—g is a group element—is equal to one-quarter if g is contained in
this set S and zero otherwise. Okay, that’s one step of the walk, and then q star q of g is convolution—q
of h times q of gh inverse, summed over h. So what’s the chance of being at g after two steps? You
have to have picked something your first step, and then picked the thing that gets you to g your second
step, and similarly, we have q star k of g—the chance of being at g after k steps of the walk—and under
no conditions… well, under… so example of the kinds of theorems that people prove—several people in
this room—if the entries are in the integers mod n, then—and if n is odd, ‘cause otherwise, there’s a
parity problem—then q star k of g converges to the uniform distribution, u of g, which is equal to one
over n cubed. So if you walk around on this group, you get to a random group element—random group
element means x, y, and z can be anything—and the rate of convergence is less than or equal to some
constant e to the minus—I think it's—two pi squared times k over n squared, and it's bigger than or equal
to some other constant times the same thing—two pi squared k over n squared. And so this says—we
say—order n squared steps are enough. If you go ten n squared steps—if k is ten n squared—this side
is exponentially small; if k is a tenth n squared, this side is big, and this… everything is explicit. And so
that’s a typical theorem in this subject—how long do you have to walk—and if we’re over Z, for
example, then random… this is an infinite group, and it’s not Abelian, and it’s—I don’t know—it’s… so
random walk isn’t recurrent; you don’t come back to zero, but you can ask, “How long do you… how…
what does it look like?” And well, q star k of—I’ll write the identity—zero, zero, zero, the chance—you
can ask all kinds of questions—but what’s the chance of being back at zero after k steps? This is
asymptotic to a constant over k squared, where c is equal to something: gamma of one-quarter squared
divided by pi to the five halves times the square root of two, if you wanted to know. Okay, so these are
kinds of theorems that people prove. Now, this is a sort of generalish audience, and I want to tell you
why I care about these things, and maybe a little bit why you might be interested in some of what’s
coming, so some motivation for this study, ‘cause that’s…
>>: Is it unclear if it is transient?
>> Persi Diaconis: It… no, it’s not hard to see that it’s transient, but it is transient. It just a… let’s just…
it’s a theorem, mmhmm, so it’s a little theorem, but it’s…
>>: Any group of more than quadratic growth is transient.
>> Persi Diaconis: Right, there are many, many ways to see it, but I’m just—you know—but that would
be a question you could ask, right? “Is it transient or not?” And you could ask about off-diagonal—you
know—and you can ask a lot of questions. And so my motivation for studying this problem—for giving
this talk—is the following: the first is—it’s a funny question for you, maybe, but not for me—“Is Fourier
analysis good for anything?” [laughter] Actually, useful—I’ll ex… you’ll understand that by the time I get
through—and well, just to say a sentence about it, this is a random walk on a group, and you could try to
study that using Fourier analysis, and it’s hard to do, as you’ll see. And I think that there are eight proofs
of this now, and none of them using Fourier analysis, and so since this is the natural way to try it, well
you’ll—I hope—understand, but that was my motivation. I just wanted to try to do one of these
problems by Fourier analysis. The second motivation is about the features of random walk, and I think
this is a quite important… quite an important topic for a lot of things. So I probably don’t have to tell the
people in this audience that Markov chains are used for all kinds of computational tasks, and people like
Yuval, and I, and other people in this room often study the question of how long do you have to run a
Markov chain until it’s close to its stationary distribution. Well, we have global kinds of results, like this
one: if I run the Markov chain n squared steps, then in this very strong total variation distance, it’s close,
say… which means for many, many questions the answer you get from the uniform distribution, and the
answer you get from the random walk are close together. But you might not care about all aspects; you
might only care about the aspect you care about. You’re running this random walk to do a certain
simulation problem; you don’t care about all the other questions you might ask. How long do you have
to run it to get your feature right? So this is an example where that comes into focus.
So let me explain that in this context. So the steps are: I’m gonna pick elements of this set one at a
time—so suppose the elements I pick are epsilon i, delta i, so that is… I, you know, just… so this is either,
you know, one, zero; minus one, zero; zero, one; zero, minus one, okay? Those are the steps I pick—
I’ll… I won’t write the last zero—and then my walk is—you know—epsilon—well, I guess the way I’m
doing it—epsilon k, delta k times epsilon k minus one, delta k minus one, epsilon one, delta one, alright?
That’s my walk, and I’m gonna multiply out, and I’m gonna say that’s equal to, say, xk, yk, ck. Now, it’s
very easy to see from the rules what—you know—so here, xk is equal to epsilon one plus—and so on—
plus epsilon k, and these epsilons are—you know—zero, plus or minus one; they’re zero with probability
a half, and they’re plus or minus one with probability a quarter. So this is just simple random walk on
the integers mod n—if I’m in this context—and we know everything about how this behaves, right?
That is, this behaves like the central limit theorem says, and if you’re… if you think of the integers mod n
as n points wrapped around a circle, it’s doing random walk, and therefore, it takes n squared steps to
get random, and et cetera, but in particular the… well, okay, so and the same for y: yk is equal to delta
one plus—and so on—plus delta k. And the joint distribution of these two things follows the bivariate
central limit theorem over Z, and so when you multiply these matrices, what’s here and here are like
Gaussian random variables, and so we know everything about them. So let’s look at this third
coordinate… yeah?
>> Sarah: I got a little lost. What are your pairs there in terms … what’s [indiscernible]
>> Persi Diaconis: So each time—so I’m using random variable notation—each time I’m picking one of
these four things, and each time, the fourth coor… the third coordinate’s zero, so I’m forgetting about it.
So this really could’ve been epsilon delta zero, epsilon delta zero—okay, but I’m just forgetting the zero,
okay? And they’re random—okay—so epsilon delta is equal to one, zero, or zero, one, or minus one,
zero, or zero, minus one with probability a quarter each. So those are just the steps I’m picking each
time, and then I’m multiplying them together, so there should really be a zero following each thing, and
then I’m multiplying them. So is that okay, Sarah? Okay, thank you, thank you.
So what’s the third coordinate? So here, zk, I think it’s this: it’s epsilon two times delta one, plus epsilon
three times delta one plus delta two, plus—and so on—plus epsilon k times delta one plus … plus delta k
minus one. Okay, I think it’s that—it’s very easy to figure out what it is—I think it’s that. And so the—
you know—these epsilons and deltas, they’re a little bit dependent; if epsilon i is plus one, delta i is zero,
okay? They’re a little bit dependent, but really… okay, so you can ask now if you’re a probabilist—forget
about anything else—or any kind of thinker about these kinds of things: “How does this thing behave?
What do I have to divide it by so that it has a nontrivial limit distribution?” Here we know that—you
know—xk divided by square root of k goes to normal with mean zero and variance a half or whatever it
is. And what do I have to divide zk by? Well, what you can show is that you have to divide this by k—so
here zk over k—that has a limit; this goes to a limiting random variable; this converges, as k gets large,
to a nontrivial limit, and—‘cause Yuval wrote a book about it—I’ll just put it down; what’s the limit? This
goes to—I’ll call it—z infinity; this has a limit; this has a limit this way—weak limit—and where the z
infinity is distributed as the integral from zero to one of B one of s, d B two of s, where these are
independent Brownian motions, and that’s not very surprising. If I divide this by k—you know—this…
these are going to—if I divide this by root k, sorry—if I divide it by k, I put one of the root k’s under these
things, they all go to… this goes to Brownian motion, and then, this is a Riemann sum for this integral.
It’s not very hard. If you want to make math out of it, this is a martingale, and the martingale central
limit theorem tells you it has a limit, and the limit is identifiable as that. And so that shows that this
third coordinate—this z coordinate—you see these coordinates, it takes them… in order to go… they
want to get random on the integers mod n. They have to go n squared steps in order to have a good
chance of getting down to the bottom. This third coordinate is getting random much faster, so this
shows that—you know—taken… so if k is of order a constant times n squared, zk is—mod n—is uniform
if c is large. So this third coordinate is getting random much faster. If I… so this is an example of
features, and I think that it’s an important thing to try to study: how long does it take parts of a random
walk or Markov chain to get random.
>>: [indiscernible] k is c, and you wanted k to be cn?
>> Persi Diaconis: cn… where? I’m going… thank you. That’s right. And c should be large, and so forth.
Anyway, those are… so as a conjecture that we don’t know how to prove, if instead of doing this with
three by three matrices, if you do it with d by d matrices, and then you… so you pick a coordinate just
above the diagonal at random, and then you put plus or minus one there, and everything else is zero—
just the analogue of that. Yuval and Allan Sly have beautiful papers showing that this takes… I think it’s…
if it’s d squared, p squared steps on the whole group. But if you just look above the diagonal, it gets
faster, and if you look on the second diagonal, it’s faster still, and in the corner, it’s very fast. I mean, it…
just above the diagonal, one of these entries that’s the same as what I just talked about, gets random.
So just above the diagonal, it takes p squared steps to get random; two above the diagonal, it takes p
steps to get random; three above the diagonal, it takes p to the two-thirds steps to get random; and j
above the diagonal, it takes p to the two over j steps to get random. And that’s a conjecture; I can’t
prove that, but I’m pretty sure it’s true.
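To make the "features" point for the three by three case concrete, here is a small Monte Carlo sketch, assuming numpy; the names walk_coords and tv_from_uniform are just illustrative, the step rule is the left multiplication described above, and the printed total variation numbers are rough empirical estimates from a finite sample, not the theorem's constants.

```python
import numpy as np

rng = np.random.default_rng(0)

def walk_coords(k, n, trials=20000):
    """Run the Heisenberg walk mod n for k steps, `trials` times in parallel.
    Each step left-multiplies by (eps, delta, 0) with (eps, delta) one of
    (1,0), (-1,0), (0,1), (0,-1), each with probability 1/4."""
    xs = np.zeros(trials, dtype=np.int64)
    ys = np.zeros(trials, dtype=np.int64)
    zs = np.zeros(trials, dtype=np.int64)
    for _ in range(k):
        choice = rng.integers(0, 4, size=trials)
        eps = np.where(choice == 0, 1, 0) - np.where(choice == 1, 1, 0)
        dlt = np.where(choice == 2, 1, 0) - np.where(choice == 3, 1, 0)
        # (eps, delta, 0) * (x, y, z) = (eps + x, delta + y, z + eps * y)
        zs = (zs + eps * ys) % n
        xs = (xs + eps) % n
        ys = (ys + dlt) % n
    return xs, zs

def tv_from_uniform(samples, n):
    """Empirical total variation distance of the sample histogram from uniform."""
    counts = np.bincount(samples % n, minlength=n) / len(samples)
    return 0.5 * np.abs(counts - 1.0 / n).sum()

n = 51
for k in (n, 5 * n, n * n):
    xs, zs = walk_coords(k, n)
    print(f"k={k}: TV(x_k) ~ {tv_from_uniform(xs, n):.3f}, TV(z_k) ~ {tv_from_uniform(zs, n):.3f}")
```

One should see the z column flatten out after a small multiple of n steps while the x column needs on the order of n squared, which is the point of the order c times n correction made a moment ago.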
Okay, so that’s a little bit of what I mean by features—I hope that’s okay—and now I want to talk to you
a little bit about what I mean by this question. So for a moment let me let G be any group—G is any
group, finite group, say—a group like the Heisenberg group—and suppose that q is a probability—q of
g is a probability on G—then… and I define convolution just by this recipe, et cetera—so I do the random
walk generated by q. And you can ask—you know—suppose that the… it’s… you don’t have parity
problems, and that you’re—you know—living on a generating set and stuff like that. So suppose that q
star k of g converges to the uniform distribution, and then you can ask how fast it occurs. And you… one
way in which I’ve studied those problems is to use Fourier analysis, so a representation of G is a
mapping that assigns group elements to matrices, rho from G into GL(V), with the property that rho of st is
equal to rho of s times rho of t. And so you assign matrices to group elements in such a way that
products are preserved in that way. And the Fourier transform at a representation q hat at rho is, by
definition, the sum over g in G of q of g times rho of g—the weights times the matrices. So it’s a matrix—the Fourier
transform is a matrix—and as usual, Fourier transform takes convolution into product, so q star q hat at
rho, is equal to q hat at rho squared. And the uniform distribution—you know, if q is uniform—well,
u hat at rho, the Fourier transform of the uniform distribution, is easily seen to be zero if rho is
nontrivial irreducible—a representation is called irreducible if you can’t break the vector space up into
two disjoint parts such that the representation only takes you into one part—anyway, and is equal to
one if rho is one—the trivial representation. And so the way you can study this convergence of… to
uniformity is by showing that high powers of this matrix converge to zero. And the upper bound lemma
makes that precise, and it goes this way: four times the distance of q star k to uniform, squared—so this is this
total variation distance—is less than or equal to the sum over irreducible representations of the
dimension of the representation—I’ll try to explain this—times the size of the matrix, q hat rho to the k,
and this is squared, and this is the trace norm… trace norm. So okay, that’s… and this V here is d rho
dimensional. So that’s a bound… that says if you have your hands on how fast these thing go to zero,
then you can bound this. So this is a completely general recipe, and—you know—here’s this… this is
perhaps the—or almost—perhaps the simplest noncommutative group, so if you like that kind of stuff—
it’s very natural—why can’t we do it on this group? What’s wrong with that? What happens—you
know—what happens, okay? And it’s a famous hard problem, and I found out why, and you’re gonna
hear why, okay? That’s the… but, what kinds of things happen?
So in order to talk about that, I have to tell you what the irreducible representations of the Heisenberg
group are, and they’re easy. So the representations of the Heisenberg group—and I’m gonna tell you
for the integers mod p, just… it’s not hard to say it for any group, if you want to know ask me after the…
for any… it’s hard to say for any group; it’s not hard to say for any n, but it’s a little more complicated,
and let me tell you what they are when p is a prime. And so there are p squared one-dimensional
representations—characters—and there are p minus one p-dimensional
representations. And one of the facts of life is that if you sum the squares of the dimensions of the irreducible
representations, that sums to the size of the group, and p squared plus p minus one times p squared—
the sum of squares of the irreducible rep—is p cubed, right? p squared plus p minus one times p
squared equals p cubed—that’s true. So okay, so what are the representations? The one-dimensional
one… representations, they’re indexed by pairs ab, mod p, and so, xyz. This is a… so remember
representation’s a linear map; a one-dimensional representation is a linear map of a one-dimensional
space into itself. That’s just multiplying by a number, and this number here is e to the two pi i a x plus b
y over p—that’s familiar—those are the… you can see that if you—you know—multiply this and multiply
these, the product is the product, it’s okay. The p-dimensional representations, they’re… they go like
this. They’re on a vector space, so the vector space I’ll take… you can take all the same vector space; V
is the set of all functions f—or column vectors f, anyway—f from the integers mod p, into C. Okay, so
it’s just… that’s a space. And I just have to tell you how x, y, and z act, so here, rho in this… there’s only
one parameter that comes up—c—of x, zero, zero. Now, that’s a linear map of this space into itself, so it
has to take a function into another function, so it acts on the function f, at the argument j, as f of j plus
x, so it just translates, okay? That’s okay. It just shifts; it’s a cyclic shift, if you like; it just shifts the
vector around. The next one, rho sub c of zero, y, zero—that’s what underlies the Fourier transform—
this acts on f at j as a… it’s e to the two pi i c over n times yj, as a multiplier, times f of j. So it acts
diagonally, and… that’s how it acts. And z acts… rho sub c of zero, zero, z at f of j just acts even in
simpler way, it’s e to the two pi i c over n times z times f of j. Okay, so if you put it all together, rho sub
c—c’s a nonzero integer mod p—of x, y, z—well, it’s something—you know—it’s e to the two pi i c over n
times—they will combine, right—y times j plus z, times f of j plus x; that’s how it acts on f at j.
Okay, so that… those are the p-dimensional representations, and there’s one for every nonzero c… and n is
p—you all knew that, right? n is p—p, p, p—okay. So is that—I mean—that’s… you can just check that
they obey this rule, and it’s not hard to check. And so now, what does my Fourier transform become?
So here, ‘member my q; q is—you know—I picked one of the two diagonal elements—they’re plus or
minus one—and I—you know—so here, q hat rho sub ab. The Fourier transform… well, that’s just—you
know—that’s the Fourier transform at the one-dimensional representations; that’s just—well—it’s one
half cosine two pi a over p plus one half cosine two pi b over p. So it’s just… that’s the
Fourier transform. And q hat at the p-dimensional representation, it’s a matrix, and it’s not a bad
matrix. It’s got a quarter in front; it’s got ones just below the diagonal; it’s got ones just above the
diagonal; it’s got a one here and a one here; and then on the diagonal, is cosine two pi c j over p—zero
less than or equal to j less than or equal to p minus one. Okay, so it’s a… it’s that matrix. And one of the jobs that
there’s gonna be—which was new to me—is I had to get my hands on the eigenvalues of this matrix.
And so if anybody’s seen that matrix before or knows anything interesting about it, I’d be happy to hear
about it. And so let’s say what the job is in order to bound the Fourier transform, I… so I… ‘member my
job was to try to use this machine to bound the rate of convergence of the walk. So I have to bound; I
have this calculus problem. So this is a matrix; it’s a symmetric matrix, so it has real eigenvalues. The
eigenvalues are beta one of c, beta two of c, beta—I don’t know—p minus… well, p of c—the real
eigenvalues… some real eigenvalues. So—you know—if you write that thing out, I have to bound this…
this should be rho, not the trivial representation. So I have to bound this sum—it’s the sum of two
parts—the sum over a and b not equal to zero, zero of, well, one half cosine two pi a over p plus one half
cosine two pi b over p to the two kth power—that’s the sum over the one-dimensional
representations—plus the sum of c equals one up to p minus one of…well, this… the dimension of the
representation is p, and then the sum over j equals one to p of beta sub j of c to the two kth power.
And that’s… this should have been… yeah, this is a norm squared, so this is the sum of the squares of the
eigenvalues.
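Writing out q hat at rho sub c from the formula for rho sub c given above, one reading of the matrix is: one quarter just above and just below the diagonal (with wraparound corners) and one half cosine two pi c j over p down the diagonal. Here is a minimal numpy sketch under that reading, with the hypothetical helper name qhat_rho; its eigenvalues are exactly the beta sub j of c appearing in the bound.

```python
import numpy as np

def qhat_rho(c, p):
    """Fourier transform of the step distribution at the p-dimensional rep rho_c:
    (1/4)[rho_c(1,0,0) + rho_c(-1,0,0) + rho_c(0,1,0) + rho_c(0,-1,0)], i.e.
    1/4 on the two off-diagonals (with wraparound) and (1/2)cos(2*pi*c*j/p)
    on the diagonal."""
    j = np.arange(p)
    M = np.zeros((p, p))
    M[j, (j + 1) % p] = 0.25
    M[j, (j - 1) % p] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / p)
    return M

p, c = 101, 1
betas = np.linalg.eigvalsh(qhat_rho(c, p))   # real eigenvalues; the matrix is symmetric
print("largest:", betas[-1], " smallest:", betas[0])
```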
Okay so, just… I’m sorry, but I’m gonna make you look at this for a second. I know it’s not fun to look at
anybody else’s calculus, but I’m not gonna do very much of it, and see if you can at least look and see
what the job is. So one term in this sum is when a is zero and b is one—that’s a term. So, when a is
zero, this is one—cosine of zero is one—this is one half plus one… well, this is one half plus one half
cosine two pi over p to the two kth power. Then that’s one term in the sum. Well, cosine near zero is
one minus x squared over two, and so this is one—you know—minus two pi over p, squared, over four—
something like that—to the two kth power. And so in order to make this one term small, k has to be of
order p squared, I mean that’s… if k is ten times p squared, this is like e to the minus ten, and so… okay
that… and then in the usual way, that’s all the trouble—that is the… all the other terms are smaller, and
they all add up, and you can bound this sum by straightforward analysis as long as… so k has to be of
order p squared. So good… I need to know about these numbers, and I need to know about them with
some kind of reasonable exactitude. And I will show you coming—a little bit—beta j c is—you know—
certainly less than or equal to one minus an explicit number over p—so something like that is true—and
so this sum has p terms in it, and so you have to… and this sum has p terms in it, and so I need to choose
k so large so that—p terms, p terms, and then there’s a p—so it’s p cubed times one minus constant
over p to the two kth power. Well—you know—if k is of order p squared, which I need to kill the linear
terms, then this is tiny, so that doesn’t cause any trouble. So sorry for making you look at this calculus,
but that’s what would be involved if you were able to do that.
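Just to see the shape of that bookkeeping, here is a rough numerical sketch of the right-hand side of the upper bound lemma for a small odd p, assuming numpy and reusing the same matrix as the previous sketch; it is a sanity check of the calculus, not a substitute for the analysis.

```python
import numpy as np

def qhat_rho(c, p):                      # same matrix as in the earlier sketch
    j = np.arange(p)
    M = np.zeros((p, p))
    M[j, (j + 1) % p] = M[j, (j - 1) % p] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / p)
    return M

def upper_bound_rhs(p, k):
    """One-dimensional reps contribute (cos(2 pi a/p)/2 + cos(2 pi b/p)/2)^(2k)
    over (a, b) != (0, 0); each p-dimensional rep contributes p * sum_j beta_j(c)^(2k)."""
    vals = 0.5 * np.cos(2 * np.pi * np.arange(p) / p)
    one_dim = (vals[:, None] + vals[None, :]) ** (2 * k)
    total = one_dim.sum() - 1.0          # drop the trivial rep (a, b) = (0, 0)
    for c in range(1, p):
        betas = np.linalg.eigvalsh(qhat_rho(c, p))
        total += p * np.sum(betas ** (2 * k))
    return total

p = 31
for k in (p * p // 10, p * p, 10 * p * p):
    print(f"k={k}: bound on 4*TV^2 ~ {upper_bound_rhs(p, k):.3e}")
```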
So I was left with the problem of trying to bound the eigenvalues of these matrices. Now, by now, I’m
very good at bounding eigenvalues of matrices, and I just thought, “You know, it’s a tri-diagonal matrix,
how bad can it be? What’s that gonna do? You know, that’s not gonna cause any trouble.” But then I
realized that all the tools and tricks I know are in probability language and about stochastic matrices.
Well, it shouldn’t be so bad—you know—this cosine can be negative—sorry, cosine can be negative—I…
and then the rows aren’t… don’t have a constant sum, and so I didn’t know what to do, and I tried some
things, and they didn’t work, and so we tried some other things. And I want to tell you that, but before
doing that, ‘cause that… I want to give you some more motivation. So why would anybody care about
these matrices; it turns out they’re famous matrices. So more motivation. So just to make it simple, let
me try to take c to be one, so M is equal to M one, and that has cosine two pi over p or n—I’ll make p n
now, doesn’t matter—times j down the diagonal. So it’s this matrix with just that down the diagonal,
okay? So those matrices come up, and here’s one place they come up. Probably most people in this
room know what the discrete Fourier transform is. So the discrete Fourier transform matrix, fn, has,
say, j, kth entry e to the two pi i j k over n—over square root of n if you want it to be a unitary matrix—
so that’s an n by n matrix, zero less than or equal to j and k less than n. So… and there are teams of
electrical engineers who want to know about the eigenvalues and eigenvectors of the discrete Fourier
transform matrix; there’s a fair-size literature on that. Why do I care about it? Well, it turns out that,
because of what the Heisenberg group does for a living, this matrix commutes with M. So fMf inverse—I
don’t know—fM is equal to Mf… and… or, right, so I’ll put it here… and what that means is that they’re
both symmetric matrices; it means they’re simultaneously diagonalizable, and I thought, “Ah, I’m in luck.
These engineers are gonna know all about the eigenvalues of the discrete Fourier transform matrix, and
therefore, since they’re simultaneously diagonalizable, I’ll know a lot about the—you know—I’ll know a
basis for… I’ll know how to diagonalize M.” And… well, okay, that sounds good—it sounded good to
me—but alas—not alas, but anyway, what’s true; I think Gauss showed this—but the discrete Fourier
transform, you know its fourth power is the identity, right? So f to the fourth is the identity, and what that means is
that the eigenvalues of this matrix are plus or minus one, plus or minus i—that is, it’s the… it’s a unitary
matrix, so it has eigenvalues which are roots of unity—and the dimensions of the eigenspaces are around
n over four—within one. So I do get a reduction of M in—you know, because M preserves the
eigenspaces of the Fourier transform matrix—but it’s not helpful. And there is a lot of work, as I said, in
the engineering literature about various decompositions—you know—eigenvalue decompositions for f,
but because the bases are so non-unique, they weren’t usefully related to M, and I couldn’t use it. On the
other hand, I now have very, very good approximate bases—eigenvalue decompositions of M—and they
do decompose the Fourier transform matrix, and that seems to be interesting. So that’s one
motivation—okay—for studying this matrix.
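Here is a quick hedged check of that commutation in numpy; the construction of M follows the reading above (one quarter off the diagonal, one half cosine on the diagonal), and the DFT sign and normalization are just one common convention.

```python
import numpy as np

def dft_matrix(n):
    """Unitary discrete Fourier transform matrix, entries e^{2 pi i jk/n} / sqrt(n)."""
    jj, kk = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(2 * np.pi * 1j * jj * kk / n) / np.sqrt(n)

def M_matrix(c, n):
    j = np.arange(n)
    M = np.zeros((n, n))
    M[j, (j + 1) % n] = M[j, (j - 1) % n] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / n)
    return M

n = 200
F, M = dft_matrix(n), M_matrix(1, n)
print(np.allclose(F @ M, M @ F))                              # F and M commute
print(np.allclose(np.linalg.matrix_power(F, 4), np.eye(n)))   # F^4 = identity
```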
A second motivation comes from physics. And so the… so this is Harper, Hofstadter, and martinis—
martinis, not Martini; it’s not a person; it’s the alcohol. So there’s a very large literature in solid-state
physics about periodic… the Schrödinger equation with periodic potentials. To just say a simple version
of it, say on l2 of Z—square-summable sequences indexed by Z—the Schrödinger operator—it’s just this
operator with a periodic potential—takes… so I’ll call it—I don’t know—l of… phi as an l2 function at j is
equal to phi of j minus one, plus phi of j plus one, plus cosine of theta j plus eta, times phi of j. So that… this is the
analogue of a second derivative operator, and they’d usually put a constant in here—a v. And if you
make… if you want to compute anything, you discretize this operator and look at it mod n, and these are
exactly my matrices, I mean, so this… these are slight shifts, but that doesn’t change anything. That is,
these matrices, if you discretize it or take periodic boundary conditions, are my matrices, and there’s
enormous, both applied, numerical, theoretical work on: what’s the spectrum of this operator? The… if
you want to get famous, Arthur Avila just won the Fields medal, one of his accomplishments that’s listed
is called the ten martinis problem; that was a problem of Mark Kac, who in a talk of this sort, said, “Well,
I’ll give ten martinis to anyone who can solve this problem.” And it was to show things about—this is an
infinite operator—to show things about the absolute continuity of the spectral measure, and well,
people like Barry Simon, and many other people have written lots and lots of papers. When this is a
general parameter—when v is two, which is the case that we’re doing… we’re dealing with—this is
called Harper’s operator, and Doug Hofstadter worked on it in his thesis. And if you type in Hofstadter’s
butterfly, you’ll see lots and lots of references, and you can see what the spectrum of the operator looks like. I
won’t try to say that more; I can, but there’s a lot of interest in the eigenvalues of this matrix in the
physics community and in the solid-state community. So I won’t say more about it, unless asked.
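For concreteness, a sketch of that discretization with periodic boundary conditions, assuming numpy; with v equal to two, eta equal to zero, and theta equal to two pi c over n it is, as stated, the same matrix as before up to an overall factor of four.

```python
import numpy as np

def harper(n, theta, v=2.0, eta=0.0):
    """(L phi)(j) = phi(j-1) + phi(j+1) + v*cos(theta*j + eta)*phi(j), indices mod n."""
    j = np.arange(n)
    L = np.zeros((n, n))
    L[j, (j + 1) % n] = L[j, (j - 1) % n] = 1.0
    L[j, j] = v * np.cos(theta * j + eta)
    return L

def M_matrix(c, n):                      # the matrix from the earlier sketches
    j = np.arange(n)
    M = np.zeros((n, n))
    M[j, (j + 1) % n] = M[j, (j - 1) % n] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / n)
    return M

n, c = 101, 1
print(np.allclose(harper(n, 2 * np.pi * c / n), 4 * M_matrix(c, n)))   # True
```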
Okay, so I need to know about the eigenvalues of this matrix. So—you know—this is… 2014, so take a
look at the handout here… well, this is… this side with the pictures. So this is when m is two hundred
and the parameter a is c. So this is: what’s the biggest eigenvalue of that matrix when the parameter—
which I was calling c and is here called a—varies in its range? So what you can see is that, for example,
when the parameter is—you know—one, the eigenvalue’s very close to one, and I said it’s one minus a
constant over…
>>: Does the parameter a [indiscernible] the x axis?
>> Persi Diaconis: No, the parameter across the x axis is… let me write it down. Thanks. The
parameter… the matrix has a cosine of two pi—well, I called it c—times j over p, zero less than or equal
to j less than p minus one down the diagonal, and the parameter that’s across the x axis is c, okay? So
for each c, this is a matrix, and it has a top eigenvalue, and that’s what’s pictured here.
>>: So what’s p?
>> Persi Diaconis: Oh, p is two hundred, which isn’t prime, but that doesn’t matter. p is two hundred.
Sorry. p is n. So p is two hundred—so here, it’s called m. It’s the size of the matrix, it’s two hundred.
So what you can see is that the eigenvalues are pretty close to one, and then they fall off—they’re not
monotone—you know, unfortunately. They are symmetric, and that’s not hard to show. And okay,
that’s good. Now, in order to use these bounds, I also need to know the smallest eigenvalue—the one
that’s the closest to minus one. And that’s what the smallest eigenvalues look like, and these look like
mirror images of one another, but unfortunately, that’s just what they look like. They’re approximately
mirror images. Those two dots, for example—no—but they’re not exactly mirror images, and so I also
had to bound the smallest eigenvalues, and so forth. And I’ll come back, maybe, and talk about some of
the eigenfunctions. This is the first eigenfunction; the eigenfunctions are localized, and they’re very
peaked around zero. And this v equals two, which is the case we’re in, is the critical case. If v is bigger
than two, then the eigenvectors are not localized, and when v is less than two, they are localized, and
here, they also are localized, but… which you can see; they’re mostly zero, but they’re kind of very
peaked around… well, very peaked around one.
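The pictures on the handout can be reproduced along the following lines, assuming numpy and matplotlib are available; m equals two hundred as in the handout, and the top and bottom eigenvalues are plotted against the parameter a.

```python
import numpy as np
import matplotlib.pyplot as plt

def M_matrix(a, m):
    j = np.arange(m)
    M = np.zeros((m, m))
    M[j, (j + 1) % m] = M[j, (j - 1) % m] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * a * j / m)
    return M

m = 200
a_vals = range(1, m)
spectra = [np.linalg.eigvalsh(M_matrix(a, m)) for a in a_vals]

plt.plot(a_vals, [s[-1] for s in spectra], ".", label="largest eigenvalue")
plt.plot(a_vals, [s[0] for s in spectra], ".", label="smallest eigenvalue")
plt.xlabel("a")
plt.legend()
plt.show()
```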
Okay, so you can look, and then you have to prove something; eventually, you have to prove something.
I just looked on the table, and I won’t… but it’s a good time for a minute, so Anther showed me this. I
can’t hand it around, because he’ll kill me, but this is the Heisenberg group—a portion of the Heisenberg
group—with those generators. This is the Cayley graph of the Heisenberg group, and those of you who
are interested, can come up and take a look at it under inspection, so okay. It’s—I think—from a three-D printer, and I never saw such a thing, so I’m thrilled—thank you—but one can look.
Okay, so I want to talk to you about the… how you bound eigenvalues of non-stochastic matrices. And
I’m afraid there’s a joke about this—you know—there’s a joke about a physicist, and a mathematician,
and—I don’t know—somebody else, and—you know—the point is: the mathematician goes back to
cases he knows, and proves things by induction, right? So I’m gonna go back to what I know, so I have
this matrix—my matrix—and I’m gonna… it’s got cosine down the diagonal, so it’s not positive. So I’m
gonna make it positive, and I’m gonna make it sub-stochastic. So I’m gonna let M—I’ll work with M
one—I’ll just call it M, but then I’ll just add the identity, now it turns out good to add a third the identity
plus two thirds of M, okay? So… but once you add the identity and two thirds of M, that makes
everything nonnegative, and the row sums less… between zero and one. So it has nonnegative entries,
and the row sums… okay, so you can just easily… easy to check that. And then, I can make it into a
stochastic matrix by just making it an absorbing Markov chain. So I’m now gonna make a Markov chain,
K, which is like this: here’s infinity, here’s zero, one up to—well—n minus one, and here’s one third the
identity plus two thirds m, and then, here are some numbers which I’ll call a zero, a one, up to a n minus
one, which just make the rows sum up to one. So let me try to explain that… well, let’s say what they
are: aj is equal to one third times one minus two thirds cosine two pi j over n. So it’s just what you need
to make the rows sum up to—what—the identity, if I did it properly—to make the rows sum up to one—
if I did it properly. So what is this? This is a stochastic matrix—all the rows sum to one; all the entries
are nonnegative—I added a site to the space, infinity, and this is a Markov chain, but if it hits infinity, it
dies. So this is absorbing at infinity—if you hit infinity, you stay there—and the rest of the time, you
could go to infinity; this is the chance of going from j to infinity—that’s what this first row is.
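A small sketch of that construction, assuming numpy; rather than committing to a closed form for the a j, the code sets each one to whatever is needed to make its row sum to one, which is how they were described, and it uses the fact that the spectrum of the absorbing chain is the eigenvalue one together with the spectrum of the sub-stochastic block.

```python
import numpy as np

def absorbing_chain(c, n):
    """State space {infinity} union {0, ..., n-1}.  On {0, ..., n-1} the chain
    moves by (1/3)I + (2/3)M_c; the leftover mass in each row is the chance
    a_j of jumping to the absorbing state infinity (state 0 below)."""
    j = np.arange(n)
    M = np.zeros((n, n))
    M[j, (j + 1) % n] = M[j, (j - 1) % n] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / n)
    B = np.eye(n) / 3 + 2 * M / 3                 # nonnegative entries, row sums <= 1
    K = np.zeros((n + 1, n + 1))
    K[0, 0] = 1.0                                 # infinity is absorbing
    K[1:, 1:] = B
    K[1:, 0] = 1.0 - B.sum(axis=1)                # a_j, the jump-to-infinity chances
    return K, B

n, c = 101, 1
K, B = absorbing_chain(c, n)
assert (K >= -1e-12).all() and np.allclose(K.sum(axis=1), 1.0)
beta = np.linalg.eigvalsh(B)[-1]                  # second-largest eigenvalue of K
print("top eigenvalue of (1/3)I + (2/3)M:", beta,
      "so top eigenvalue of M:", (beta - 1.0 / 3.0) * 1.5)
```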
So okay, I reduced… I made the problem into something that’s friendlier to me; it’s a stochastic matrix
now. Of course, this matrix is a stochastic matrix, so it has one eigenvalue which is one—mainly… yes.
And so if I can bound the second eigenvalue of this matrix, that will be the top eigenvalue of this matrix,
and then I’ll be in business. So the way you do that, I’m gonna call these—this set of states—S, and S
bar is S union infinity—in case that comes up—and I’m gonna use… I’m gonna bound this by the
minimax-type principle, but I associate a quadratic form with this matrix, and it’s a little bit tricky; it took
quite a while to get it right. The quadratic form is the Dirichlet form, Eff—f is a vector, column vector—
and it’s just equal to this: it’s one half the sum, over x and y contained in S bar, of f of x minus f of y,
squared, times K of x, y, times u of x. K is this matrix, and this is a symmetric matrix, and this last part is… well, let me write
it down: u of x, where… these are Dirichlet eigenfunctions, so f of infinity is zero—u of infinity is zero—but
of course, Kxy—you know—Kj infinity is positive, and so this… anyway, that’s the quadratic form which is
useful and needed. And what you can show—the usual characterization of eigenvalues and terms of
quadratic forms—shows that if we can find an a bigger than zero such that the l2 norm squared of any
function f is less than or equal to a times Eff—that’s the usual way of… you know, the usual thing is the
eigenvalue is the quadratic form divided by the length of the vector, and so then, that ratio will be
bigger than one over a. That is equivalent to—or implies anyway—that the top eigenvalue of this
matrix—which I’ll call beta—beta is less than or equal to one minus one over a. So if I can bound the
quadratic form, I can get a bound on the eigenvalue I want.
And now we use… I use this path method, but here the… it’s a little bit different, ‘cause of this infinity—
and that’s a new thought to me. We need paths gamma x, which take any x and connect it to
infinity, so this is x naught, which is x, then x one, x two, up to x d, which is infinity. And they have to be
paths in the graph—that is, I need K of x i minus one x i to be bigger than zero. So I need paths
connecting any x to infinity, okay? And the way we use that is we write f of x as, well, f of x minus f of x
one, plus f of x one, minus f of x two, et cetera, plus f of x d minus one minus f of x d. The point being,
everything cancels out, and this is zero. So that… and that’s just an identity. And then use Cauchy-Schwarz; that’s less than or equal to the length of the path—the number of terms in the sum times the
sum of squares, f of xi minus one… oh, I don’t know, minus f of x i squared. And then that’s like this, and
then you fool around in a way that is the standard way of fooling around—read Yuval’s book with David
Levin and Elizabeth Wilmer, Path Arguments—and the bottom line is: using the geometry of the graph,
you wind up proving this kind of a bound. You can take a—this a which works here—to be the
maximum over edges xy where x is in S and y is in S bar—y could be infinity; that has to be allowed, y’know—
one over K of xy—the chance of this edge—times the sum over z such that xy is contained in the path
associated to z, of the length of that path. So that… you can prove that kind of a bound. And then, so
you want a to be small—turns out, in order for this bound to be useful, you want a to be small—and so a
will be small if you could choose paths in the graph which take you from x to infinity with the following
property: none of these things should be too small—so that’s important—and you shouldn’t have too
many paths that use a given edge, ‘cause you don’t want this sum to be too big. And so that’s a kind of
thing that we do as combinatorialists, and I’ll just say a sentence about that, but then I’ll just stop. So
here’s infinity; lots of things connect to infinity. You can go from any j to infinity; so there’s infinity, and
here’s—you know—here are these points, one up to n minus one. And if you start here at the
beginning, this thing is very close to zero, it’s one minus cosine, right? That’s a… I don’t… I probably
have these numbers wrong, but this thing is very close to zero, so you don’t want to do that. So what
you do is you connect points to infinity by going from here over into where it’s nonzero and then up to
infinity. And this point you go to here, then up to infinity; this point you go to here, then up to infinity.
You do the same in the opposite direction. You have to choose a break point, and you go here, and then
here, and then here, and then up to infinity. Points in the middle you connect directly to infinity, and
what you can see if you choose paths that way and do the combinatorics—you know—this isn’t too
small, and therefore, one over it isn’t too big, and this isn’t… not—you know—at most four paths use an
edge, and the lengths of the paths are not too long, and so you can control this. If you do this carefully,
you get that a… you can take a to be of order a constant times—well, for this case—a constant times
one over p to the four thirds. Now, that’s more than enough for the eigenvalue bound…
>>: Is that one over a or a or…?
>> Persi Diaconis: This is a, so that… oh, yes, yes… yes, c times p to the four thirds. And so the
eigenvalue bound: you get beta’s less than or equal to one minus some constant over p to the four
thirds, and that’s not what I used before, but it’s more than enough to do the job—you know—I… that…
so I’m gonna stop here, except to say time flies, and it’s about… it is about time and over time. We
actually know for these matrices, I actually know what the top eigenvalues are, at least in the corners,
and so actually beta is equal to one minus pi over two p plus big O one over p squared. And actually, if I
put in a c—it’s c over p… as long as c is small, c fixed.
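That asymptotic is easy to check numerically; here is a sketch assuming numpy, with c fixed at one and a few values of p, comparing one minus the top eigenvalue to pi over two p.

```python
import numpy as np

def top_eigenvalue(c, p):
    j = np.arange(p)
    M = np.zeros((p, p))
    M[j, (j + 1) % p] = M[j, (j - 1) % p] = 0.25
    M[j, j] = 0.5 * np.cos(2 * np.pi * c * j / p)
    return np.linalg.eigvalsh(M)[-1]

for p in (101, 401, 1601):
    gap = 1.0 - top_eigenvalue(1, p)
    print(f"p={p}: 1 - beta = {gap:.6f},  pi/(2p) = {np.pi / (2 * p):.6f}")
```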
So I was able—or we—our crew was able to relate the eigenvalues of this matrix to the eigenvalues of
the harmonic oscillator, which are very well-known, and use that relationship in order to bound these
eigenvalues. Unfortunately, I need to know this, not only for c fixed, I need to know it for all c. And
these cruder geometric arguments work great; they don’t give quite as precise results, but they do give
good answers. So I want to finish by just saying, I started on this talk for two reasons: I wanted to know
is Fourier analysis useful? And if not, why not? And I wanted to know about—you know—the
distribution of the center argument. Well, I can’t answer—you know—this is pretty hard work is all I’ll
say. We really have seven proofs of the fact that n squared steps are necessary and sufficient for
random walk on the Heisenberg group, so I—you know—did I need to do it this way? No is the answer,
but still, it shows—you know—what the difficulty is. Having done this work—it is important to say—
there are lots of other groups where the Fourier transform has exactly this form, and so more or less,
any class-two nilpotent group, the Fourier transform… the high-dimensional Fourier transforms have
this form. And so knowing about the eigenvalues of these matrices, there are a bunch of other random
walks—some of which, our previous techniques didn’t apply to—that we can do well. It is also worth
saying that our sharp… these sharp eigenvalue bounds are much better than what ergodic theorists and
Arthur Avila got on the problems that they care about. They didn’t actually care so much about the
extreme eigenvalues, although they did care about them, and they got… we got much sharper bounds,
and so that’s good. So I hope that you know what I was up to, trying to do my work on the Heisenberg
group, and I hope that is instructive for you sometime. Thank you.
[Applause]
>> Yuval Peres: Questions?
>>: So does… did you try to expand this a little bit to other… the other upper-triangular matrix groups
or one with the diagonal integers above or…
>> Persi Diaconis: Yeah—Sarah was the editor for our super-character paper—bless you. Yeah, I didn’t
try. I just… we… I just decided I was gonna do this one, and I didn’t know… I mean, I will eventually try.
But one of the problems is for the group of n by n upper-triangular matrices, nobody knows what the
irreducibles are. And in some sense you can prove that nobody will ever know; they’re wild problems.
And for this case—you know—I said, “Well, here are the irreducibles; four by four, okay; five by five,
okay; six by six,” and then it stops. I mean, nobody knows what the irreducibles are—I think—for seven
by seven. And we don’t know the characters, we don’t know the conjugacy classes. So we did manage
to make new, and better, and easier-to-use super character theories using these ideas and that, so I
would like to do more about that, but I just decided I wanted to do this problem. How hard could it be?
It’s a tri-diagonal matrix; it shouldn’t be so bad. Well, okay, I have scars to prove it—wasn’t so much
fun.
>>: What do you mean that you can prove that you can’t find the counter…?
>> Persi Diaconis: That you can’t find the counterexample? That’s a good question. So let’s see if I can
explain that. So the claim is that… so these are these groups: U n of F p—upper-triangular matrices which are ones on the
diagonal, stars above, entries in F p—and you know, our… what I claim is that I’m gonna try to convince you that there is
a proof that you can’t describe the conjugacy classes. Now, of course, for any finite n, it’s a finite
problem, and leave me alone—okay—but I mean, it’s not—you know—like the conjugacy classes of the
symmetric group—I’ll explain, as a math thing—but conjugacy classes of the symmetric group are
indexed by partitions, and we all… that’s okay, right? For GLn—you know—these are nice groups,
they’re the Sylow p-subgroups of GLn—I mean, they’re not bad groups. So okay, so, let’s see, I proved
that if you had a nice description of the conjugacy classes by a bijection, you’d have a nice description of
what are called wild quivers. So a quiver is just a collection; it’s a directed graph—I don’t know, okay—
and a representation of a quiver is a vector space at each place—at each vertex—and, for each arrow, a linear
map. Nothing has to commute, just… okay. Two representations are called equivalent if you can change
bases and make the linear maps actually the same—okay, so if they’re equivalent up to change of basis.
And the… the problem is: classify the representations of a given quiver. So these
are familiar problems. If you have one dot and an arrow leading into it, that’s classify linear maps of a
vector space into itself up to change of basis. Okay, that’s the rational canonical form—that’s good—
okay, we know what that is. So here—two dots and an arrow going from… that’s… you have two
matrices, and you’re allowed—you have a linear map—and you’re allowed to change bases arbitrarily,
there’s only one invariant, which is the rank. Okay, but still, that’s nice. Okay, quivers which… in which
the representation type is indexed by an integer, like the rank, are called finite type. Quivers in which
the representations are indexed by a finite collection of real parameters or complex parameters, those
are called—what are they called, not finite type—tame, type tame. And there’s a trichotomy theorem
which says any quiver is of finite type, tame, or wild, and… just so you see the sub-content—it sounds
crazy—but a quiver is of finite type if and only if it’s an orientation of a Dynkin diagram. That’s a nice
theorem, that’s Gabriel’s theorem.
Okay, so I showed that if you had a description of the conjugacy classes, you would have a nice
description of a wild quiver—this one: two arrows. This is: classify pairs of linear maps from a vector
space into another vector space up to change of basis. So this here, you have a pair of matrices and…
say you have a pair of matrices, and you want to classify them up to change of bases, and there’s no…
okay, now there’s a theorem that says if you had a nice description of wild quivers—any wild quiver—
then you’d ha… there’s an unsolvable word problem that you could solve. So it’s much worse than P versus NP;
it’s—you know—it’s up there, right? It’s… so the complexities… and now, to try to get a finite
quantifiable version of that, I’d love to do that. And I keep trying to find a tame model theorist. The
theorem I just told you about, that there’s a… this equivalence class, it’s on page three hundred of a
book called Modules and Model Theory. And unfortunately, there are two hundred and ninety-nine
pages before that theorem, [Laughter] but there is a theorem that says that if you have a nice
description of wild quivers, you have a… and here it’s very easy to say what you do: given a matrix, you
just… you embed two big blocks in it, and it… conjugation—you know—describing the conjugacy classes
of such a matrix you can easily see is equivalent to classifying pairs of maps from one space to another.
So I hope that gives a flavor for it, and… but if you can ever make more sense out of it… I did spend a
week with Katrin Tent, who’s a model theorist trying to make a quantifiable version of it. I don’t write…
but in some sense or other you can prove it. There… we have a… I have a paper with Richard Stanley
and Ery Arias-Castro which is called “Random Walk on the Upper-Triangular Matrices Mod p”, and it has
all this literature and the proofs and everything—everything’s written down. It was about ten years ago.
We’ll never get out of here if you don’t know him. I’ll answer any question.
>> Yuval Peres: We have the room for seven more minutes, but the… so this group is obtained from a
group on the—you know—on vector z, but then you have all the coordinates from over here, so could
use estimates from—you know—the previous space, like every cell of cost and then take that
[indiscernible] quantity…
>> Persi Diaconis: Right, that’s one of the proofs we… and in order to go from infinite down to finite, we
needed Harnack inequalities, but they exist—I mean—and actually, this guy, Alexopoulos, has this for
any—you know—any nilpotent—discrete nilpotent—group, he has—you know—the right Harnack
inequal… so that’s one way of doing it. Now, that’s—you know—that’s bringing in a lot of hard work. It
just seemed to me—you know—here’s this poor little group—Fourier analysis, its tri-diagonal
matrices—how bad should it be? I mean, that’s what I was trying to do. It’s an exercise. I said, “It’s an
exercise, I’m just trying to do an exercise.” And it got me—you know—wham! I’ll show you. So I…
there… your proof with Allan Sly—you know—you did in Vienna; I’ve forgotten whether you did general
p, but probably did—you know—that is, Yuval has a beautiful proof of the rate of convergence on… for n
by n matrices, showing that—you know—p squared is enough, based on a very clever foundational argument. But,
I mean, there are really a lot of different proofs. This is just… it’s a straightforward approach; you could
try it on any group. You know, the other cases, there are special tricks for using structure, and I wanted
to try it. That… it just… it was really a homework problem which turned into three papers. That’s… what
else can I say? [Laughs]
>>: Nice.
>> Persi Diaconis: Yeah.
>> Yuval Peres: Any other comments or questions? If not, let’s thank Persi again.
[Applause]