>> Yuval Peres: Okay. So I've been looking forward to seeing a proof of this result for
some years, ever since Durrett posed the question. On Tuesday we had all the buildup of
the tension, and today we're going to see how Durrett's lower bound can be almost
realized as an upper bound for the L-reversal chain. Ben Morris.
>> Ben Morris: Okay. Thank you. And I forgot to say thank you, Yuval, for the
invitation.
Okay. There's my title again. So let me remind you of the main theorem, well, so first
the definition. Okay. We have this card shuffle that has collisions; when two cards
collide, with probability one half you do nothing, and with probability one half you do a
transposition.
And we have a random variable T such that after time T you start matching the cards up
based on what collides with what. So after time T, if two cards collide, they're matched.
Two more cards collide, they're matched, and so on.
>>: I think from the bottom it really makes us focus on the card you're trying to hide.
>> Ben Morris: Okay. And then the theorem said -- that was the definition of the match
of a card. And then the theorem said that if for every card K its match is lambda K
uniform, so roughly uniform, in some sense, over all the cards that were initially above it,
then for any permutation mu -- any random permutation you start with -- if
you run the chain for T additional steps, you're going to bring the entropy down by
roughly the sum over K of lambda K times the entropy that was attributable to position K.
Okay. Does anybody need to be reminded about any of the definitions? Okay. So this
was -- this was on Tuesday. So now --
>>: [inaudible] say the Ent is not entropy but [inaudible].
>> Ben Morris: Okay. So right. The Ent is relative entropy: the difference
between the entropy and the entropy of the uniform distribution. So this goes in the
opposite direction. So when this thing gets smaller, then that actually means that the permutation is
getting more random. So zero means it's completely random.
All right. So now let me -- hopefully I can bring out some ideas by doing an example.
So I will -- I'll get to the L-reversal chain, but let me do the Thorp shuffle first since the
analysis is simpler. So recall in the Thorp shuffle you cut the cards exactly in half and
then you line up the cards in pairs and then the pairs collide.
So it turns out that the card in position Y, if Y is less than N over 2, will collide with the
card at position Y plus N over 2, where now I'm making the positions go from zero up to
N minus 1.
So Y -- the card in position Y and the card in position Y plus N over 2 are sent to 2Y and
2Y plus 1, determined by a coin flip which goes where. Okay. So in this example it's
actually easier to work with the time reversal, so the time reversal is obtained by just --
well, you look at this picture and turn around the direction of the arrows. So for the time
reversal, if X is even, then card X and card X plus 1 will collide on a given step, and
they'll be sent to either X over 2 or X over 2 plus N over 2 where a coin flip decides what
goes where.
Okay. So for general X the position of the card that was at X after one step is going to be
either the floor of X over 2 or the floor of X over 2 plus N over 2 based on the outcome
of the coin flip.
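(A minimal sketch of the time-reversed step just described, for N even; the function name and the list representation of the deck are my own, not from the talk.)

```python
import random

def thorp_reversal_step(perm):
    """One step of the time-reversed Thorp shuffle.

    perm[x] is the card at position x, positions 0..n-1 with n even.
    Cards at positions x and x+1 (x even) collide; a fair coin sends
    one to x//2 and the other to x//2 + n//2.
    """
    n = len(perm)
    new = [None] * n
    for x in range(0, n, 2):
        a, b = perm[x], perm[x + 1]
        if random.random() < 0.5:      # the coin flip that decides who goes where
            a, b = b, a
        new[x // 2] = a
        new[x // 2 + n // 2] = b
    return new
```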
Okay. So I claim that if little T is the ceiling of log base 2 of N, then for any random
permutation mu, if you start with mu and then you run T steps of the chain, you decrease
the entropy by a factor one minus constant over log squared N. Okay. And before I
prove this, let me just say why this gives the claimed mixing time bound of log N to the
4th. So if after log N steps you reduce the entropy by a
factor 1 minus constant over log squared N, that means that the rate of decay of entropy
per step is going to be constant over log cubed of N. And the initial entropy, the entropy
of the identity is going to be log N factorial, which is at most N log N. So how long does
it take to bring the entropy down to something small? It's going to be on the order of log
to the 4th N steps.
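(The arithmetic here, written out; this is just the talk's heuristic in symbols, with c an unspecified constant.)

```latex
\mathrm{Ent}(\mu_t) \;\le\; \Bigl(1 - \tfrac{c}{\log^2 N}\Bigr)\,\mathrm{Ent}(\mu),
\qquad t = \lceil \log_2 N \rceil
\;\;\Longrightarrow\;\; \text{per-step decay rate} \;\asymp\; \frac{c}{\log^3 N},
\\[4pt]
\mathrm{Ent}(\mathrm{id}) \;=\; \log N! \;\le\; N \log N
\;\;\Longrightarrow\;\;
\tau_{\mathrm{mix}} \;=\; O\!\Bigl(\frac{\log^3 N}{c}\,\log(N \log N)\Bigr) \;=\; O(\log^4 N).
```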
Okay. So let me now prove this claim about the decay of entropy. Okay. So for this I'm
going to use the theorem. So recall that you can decompose the entropy of a random
permutation based -- you can decompose it into the contributions of the different
positions. So let's write the entropy of mu is the contribution from position zero plus the
contribution from position 1 and so on.
Okay. Now I'm going to divide these positions up into geometrically growing intervals,
so the -- so this one has 1, this one has 1, now this one has 2, the next one will have 4, the
next one will have 8 and so on.
Now, let M star be the value that maximizes the contribution of the corresponding
interval. So one of these intervals will contribute more entropy than anybody else. Let
M star be the index of that interval.
Okay. So what we're going to do is we're going to choose our random variable T so that
it gives high values of lambda for positions in that interval. So it's enough to show that
there's some random variable T such that for every J the card that's matched with J is
lambda J uniform over the cards that were originally higher in the deck, where lambda J
is constant for J in the good interval and zero otherwise.
Okay. So we're saying that if J -- we have to design this random time T such that if J is in
this special interval and I is less than J, then the probability that J matches with I is at
most -- I mean is at least 1 over 4 times J. So it's roughly -- so the card matched with J is
roughly uniform over all the cards that were initially higher than that.
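(In symbols, writing M(J) for the match of card J and I_{M*} for the good interval -- my notation -- the condition to verify is:)

```latex
\Pr\bigl[\mathcal{M}(J) = I\bigr] \;\ge\; \frac{1}{4J}
\qquad \text{for all } I < J,\ J \in I_{M^*},
```

that is, lambda J is 1/4 on the good interval and zero elsewhere.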
>>: Do you want to say why this is enough?
>> Ben Morris: So this is -- this is -- right. The theorem will tell us that if -- that now we
get a decay in entropy that's equal to at least one quarter times the contribution of the
entropy of the positions in IM star. So --
>>: [inaudible] 1 over log N with the whole thing?
>> Ben Morris: Right. Which is at least 1 over log N of the -- 1 over log N of the overall
entropy.
>>: Right. Wasn't there a log Q [inaudible]?
>> Ben Morris: Well, we're trying to prove log squared. The additional factor of log
comes in the actual theorem.
>>: [inaudible]
>> Ben Morris: Right. So we get one log -- so one of the logs comes from the fact that
you're only doing one interval in those log intervals, and another log comes from the
theorem.
Okay. And then there was a third log which was because you're doing -- I'm counting log
N steps as one step. All right.
So let me just give examples of -- so let's say N equals 32. It's actually easy to work
things out if N is a power of 2, so let me assume that N is 32. So then the card in
position X is sent to either X over 2 or X over 2 plus 16 with probability one half each. So
I'll call this -- so basically X is sent to the floor of X over 2, but then there's this random
offset that you add each step, which is either 16 or zero.
Okay. And the -- so let's assume that J is between 2 to the M minus 1 and 2 to the M.
So M star takes the value M for some M at least 1. Now, notice that if you
have two cards, say the card at position 20 and the card at position 7, now, if the coin
flips that determine the offsets each step are the same for both cards, then those cards are
bound to collide.
So say -- look at 20 and 7. After one step you go to -- let's say the offset is zero at this
step. It becomes 10 and 3. And then divide by 2 again, it becomes 5 and 1, and now let's
say the offset is 16 for both: divide this 5 by 2, you get 2, plus 16 is 18; 1 divided by 2 is zero, plus 16 is 16. What?
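(The board computation as a quick trace, for the reversed chain with N = 32; the shared offset sequence below is one arbitrary choice, not necessarily the one from the board.)

```python
x, y = 20, 7
offsets = [0, 0, 16, 0, 16]            # the same (arbitrary) offsets for both cards
for step, off in enumerate(offsets, 1):
    if x // 2 == y // 2:               # positions form a pair {2k, 2k+1}: collision
        print(f"cards collide at step {step}")
        break
    x, y = x // 2 + off, y // 2 + off
    print(f"after step {step}: positions {x}, {y}")
# prints 10,3 then 5,1 then 18,16 then 9,8 and collides at step 5 (= log2 of 32)
```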
>>: [inaudible]
>> Ben Morris: Huh? You can see what's happening is that after a while you forget
where you started from and all that matters is what the offsets were, because the -- you're
dividing by -- each step you're dividing by 2, so after log of your initial position steps,
you've completely forgotten where you were and you're just --
>>: The offsets are feeding the high order bits of [inaudible].
>> Ben Morris: Exactly. Yeah. Right.
>>: [inaudible]
>>: Why do you assume that they have -- that they receive the same offset?
>>: If. If they receive the same offset.
>> Ben Morris: Right. If it happens that two cards receive the same offsets each step,
they are bound to collide. Okay. In at most M steps, where M is such that the bigger
card is at most 2 to the M minus 1.
>>: [inaudible]
>> Ben Morris: Oh, okay. Uh-huh. Okay. So in particular, if two
cards say potentially could collide after seven steps, assuming that all the offsets are the
same, then the chance that they will collide is one half to the 7th. Okay. Or if they could
collide after 16 steps, then the chance that they will is one half to the 16.
So this tells us how we want to design our random time T. It has to -- the probability that
T equals R has to grow exponentially with R. So we'll start with -- so we'll say the
probability that it stops right when you start is going to be like on the
order of 1 over J. And then with each additional step, we have to multiply the probability
of the time stopping there by a factor of 2 to cancel out with this probability of our two
cards having had the same offsets each step.
So okay. So we're going to -- oops. Okay. So we're going to let our random variable T
satisfy probability T equals R is at least 2 to the R minus M minus 1 for R less than or equal
to M. So it will -- so it starts out on the order of 1 over 2 to the M and then grows
exponentially with a rate 2.
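(One concrete sampler meeting the requirement that the probability of T equals R is at least 2 to the R minus M minus 1 for R up to M; the construction is mine -- any distribution with these lower bounds works.)

```python
import random

def sample_T(M):
    """Sample T with P(T = r) >= 2**(r - M - 1) for each 1 <= r <= M.

    Take T = M - G with G geometric of parameter 1/2, so that
    P(T = M - g) = 2**(-(g + 1)); the leftover mass (g >= M) is
    dumped on r = 1, which only helps a lower-bound requirement.
    """
    g = 0
    while random.random() < 0.5:
        g += 1
    return max(M - g, 1)
```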
So then if I is less than J, the probability that the match of J is I is at least 1 over 2 to the
M plus 1. Okay. It doesn't matter how far away I was initially from J. You know --
from the previous slide, you know if I is less than J that there is a certain number of steps
such that after this many steps I and J are bound to collide if they have the same offsets.
>>: [inaudible] collision or J?
>> Ben Morris: So here's the thing: You know that I and J will collide if they have the
same offsets at a certain point, and that happens with probability one half to the -- I'll
call -- I'll say that if they collide -- if they potentially collide after seven steps, I'll say that
their distance is 7. So if their distance is 7, then the chance that they are going to run into
each other is one half to the 7th. But that doesn't imply that they're matched with each
other. Because the way that a match is defined is you wait until time T and then the next
card that is matched with card J after time T, the next one that collides with it after time T
is defined as its match.
Okay. So for any -- so for any I less than J, it's some distance away from J. The chance
that it will collide with J after this many steps is one half to the distance. The chance that
the random variable T actually takes the value that is the distance between I and J is like
one half to the M times 2 to the distance. Because we're -- this starts out at one half to
the M and grows by a factor of two each step.
So you end up with -- so the probability -- maybe I should have written that -- done more
steps. Probability that M of J equals I: let's say the distance
between I and J is D, so this is -- the probability that the match of J is I is the probability
that they do in fact collide after D steps, so I and J collide after D steps. But then you
also have to have the fact that the time T -- the time after which you start matching things
together -- also has to take the value D. So this is -- so the chance that I --
>>: So you bound this by the probability that they collide exactly at the stopping time?
So this has to be the first collision after T [inaudible]?
>> Ben Morris: Yeah. In this example, "after time T" is just "at time T", because everything
will collide with something at time T. All right. So yeah. I mean, I should say at, right --
not after time, at. Because it happens at time T. Times the probability that T equals D. So the
chance of colliding after D steps is one half to the D. And now we've designed our
random variable T such that the probability that it takes the value D is 2 to the D minus M
minus 1. So then the one half to the D will cancel out with the 2 to the D here. So this is at
least 2 to the minus M minus 1. And J is on the order of 2 to the M. So this is like -- this
is at least 1 over 4J, say. Okay.
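(The whole computation in one line, writing D for the distance between I and J, so D is at most M, and using 2^(M-1) <= J < 2^M:)

```latex
\Pr\bigl[\mathcal{M}(J) = I\bigr]
\;\ge\;
\underbrace{\Bigl(\tfrac{1}{2}\Bigr)^{D}}_{\text{same offsets for } D \text{ steps}}
\cdot
\underbrace{2^{\,D - M - 1}}_{\Pr[T = D]}
\;=\; 2^{-M-1} \;\ge\; \frac{1}{4J}.
```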
>>: [inaudible]
>> Ben Morris: Okay. So we're done. This is it.
>>: Does this happen a lot that cards collide in that way by just following the same
offsets? It seems very crude.
>> Ben Morris: Um --
>>: The only way for them to collide is to have the same offsets.
[multiple people speaking at once]
>>: Eventually [inaudible] they can be different at first, but eventually they have to
[inaudible].
>> Ben Morris: Yeah. That's easy to see if you -- there's an equivalent description of the
Thorp shuffle when N is a power of 2 where you just -- you take the position --
so write the position in binary and then each step you erase the first bit and then put a
random bit at the end, and that's the new position. So what happens is the -- after a while
it doesn't matter what was at the beginning; all that matters is these new bits
that you're writing in.
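(The bit-string description in a line of code, for N = 2^d; which end the fresh bit enters depends on whether you run the chain forward or reversed, which is the direction issue raised next. For the reversed chain used in the analysis, the floor-halving drops the low bit and the 0-or-N/2 offset is a fresh high bit. A sketch in my notation.)

```python
import random

def thorp_reversal_bits(x, d):
    """Reversed Thorp shuffle on positions 0..2**d - 1, in bit form:
    drop the low-order bit (the floor-halving) and prepend a fresh
    random high-order bit (the 0-or-N/2 offset). After d steps the
    original position is completely forgotten."""
    return (x >> 1) | (random.getrandbits(1) << (d - 1))
```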
>>: [inaudible] reverse direction.
>>: Reversing direction all the time to keep us [inaudible].
>> Ben Morris: Oh. I thought it was -- you don't shift to the left, you shift to the right?
Okay.
[multiple people speaking at once]
>> Ben Morris: Uh-huh.
>>: Then you have to --
>> Ben Morris: Yeah, okay.
>>: [inaudible]
>> Ben Morris: Right.
>>: Confusing that the cards on top have nowhere in the [inaudible].
[multiple people speaking at once]
>>: [inaudible]
>> Ben Morris: Okay. So let me -- anyway, so this -- it's kind of
nice because you're just verifying a condition that involves only pairs of cards.
So kind of like when you have some card shuffle, you might
guess an upper bound for the mixing time by just thinking, well, how long does it take
this shuffle to mix up pairs. But what this does is it says, well, I can actually prove an
upper bound on the mixing time by just considering the behavior of pairs.
Okay. So if there are no more questions on this, let me do the other example.
Okay. All right. Here it is. Let me just remind you of the definition of the L-reversal
chain. So -- so when -- so you have N cards arrayed in a circle and then each step you'll
choose an interval of cards of length at most L and reverse it. So if you consider a single
card, it will move -- when it does move, it will move a distance on the
order of L. So it will have to be moved on the order of N squared over L squared
times before its position is random. But it's only going to move with probability on the
order of L over N each step. So you need to -- so to figure out the time to randomize a
single card, you take the N squared over L squared and you multiply by N over L and you
get N cubed over L cubed. And then you have to multiply by an additional factor of log
N, because there are N cards that have to be randomized.
So this can all be made rigorous using David's argument. You know, using
eigenfunctions and so on. But I'm trying to give a heuristic.
So you know the mixing time has to be at least on the order of N cubed over L cubed log
N. But then now consider pairs. So if a pair of cards are initially adjacent, then they will
remain adjacent after one step unless they were separated by the interval that was
reversed at that step. Each step an interval will only separate at most
two bonds between adjacent cards. So by the
coupon collector problem, the time to separate all the bonds is on the order of N log N.
So Durrett argued that the -- well, Durrett proved that the mixing time has to be at least
on the order of the maximum of N and N cubed over L cubed times log N and
conjectured that this is also an upper bound.
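(Assembling the two heuristics, Durrett's lower bound in symbols; c is an unspecified constant.)

```latex
\tau_{\mathrm{mix}}
\;\ge\;
c \,\max\Bigl(\underbrace{N}_{\text{coupon collector on bonds}},\;
\underbrace{\tfrac{N^{3}}{L^{3}}}_{\text{single-card randomization}}\Bigr)\,\log N .
```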
Okay. So basically, since the argument I just gave for the lower bound just has to do
with considering the behavior of pairs, we should expect that maybe our
theorem could also prove something about the upper bound.
Okay. Okay. So indeed we get the -- we get the following corollary of the main
theorem, which says that the mixing time is [inaudible] of log cubed N times the strange
max of N and N cubed over L cubed. So it's the conjecture with an extra factor log
squared N. Okay.
And the proof will proceed in a similar way to the Thorp shuffle. So first we're going to
decompose the -- so we'll start with some arbitrary permutation mu, we'll decompose the
entropy of mu according to the positions. Now we'll break the positions into
geometrically growing intervals starting with -- initially the first interval is of length L.
So here. So I guess it goes L and then of length 3L and then 9L and so on.
So the contribution from one of the intervals must be at least 1 over log N times the
overall entropy. And here I'm using some -- okay. I introduce some new notation here.
I'll write that the entropy -- so you have some random permutation mu. And I'll write
Ent of mu comma AK; this is my notation for the contribution of
the indices in AK to the entropy of mu. So this is the sum over J in AK of EJ -- the
contribution of AK to the overall entropy.
So now we break down according to the value of K. If K is one, so the -- most of the
entropy comes from the -- or the highest amount of entropy comes from the first interval,
then it actually works out we can analyze this using standard techniques. So imagine that
we're just -- okay. So we want to say that we get a nice decay in entropy. So if the -- so
if you just consider the case where a lot of the entropy comes from the first L positions,
then here -- notice that if you reverse an interval and then you
reverse another interval that's exactly the same only one smaller on each side, the net
effect is to keep everything the same except for transposing the two cards that are in
the outer interval but not the inner interval.
And when the length of the interval is comparable to the number of cards, then this is a
well-known card shuffle that's called random transpositions. And like everything is
known about it, so the rate of decay of entropy is known. So I don't have to do anything
using my theorem for this case.
>>: That was when you make these successive pair switches.
>> Ben Morris: Oh, right. So you can use what's called comparison techniques to -- so if
you can simulate a known Markov chain using moves of your Markov chain, then you
can get --
>>: But this is not a move, right, because [inaudible].
>>: No, no, L is the maximum length of interval.
>>: Maximum [inaudible].
>>: Switch a uniform length [inaudible].
>> Ben Morris: Yeah. So since I can simulate random transpositions using two
consecutive moves of my Markov chain, I can use what's called comparison techniques to
compare the log Sobolev constant for random transpositions with the log Sobolev
constant for this -- for the L-reversal chain in the case where the length of reversals is
comparable to the number of cards. So I can get a bound on the decay of entropy in that
case.
But, anyway, my point is in this case it's completely not related to this talk, so I'll just
move on to the case where --
>>: [inaudible]
>>: Then the length of the interval where the action is is the same as the length of the
interval where you [inaudible].
>>: So L, L is fixed, so you say K equals 1 means that most of the entropy --
>>: So at least one of the [inaudible].
>> Ben Morris: So we put down our cards according to mu, and once
we've gotten to the top L cards, it still looks random, you know, up to the factor log N.
So that means that we just need to consider the effect of the
chain on the randomness of the top L cards.
>>: I see. Okay.
>> Ben Morris: Okay. So now let's assume that K is bigger than 1. So in this case we're
going to use our theorem. So we have to -- okay. So we want to come up with a random
time T such that -- such that the distribution of the card matched with J is roughly
uniform over 1, 2, 3, up to J. So how long are we -- how long are we going to want to run
the chain if J is in AK, so suppose we have -- okay. So we have our A1, A2 and so.
Remember these are geometrically growing intervals, dot, dot, dot, so suppose we're in --
we have some J in AK. So since these intervals are growing geometrically, the value of J is
going to be roughly the size of AK, so J is up to -- to a constant factor is just the size of
AK.
So how long do we need to -- how many steps do we need to run the chain such that the
card that will -- that is matched with J is roughly uniform over what was initially higher?
Well, each time you -- well, it's like the analysis from before. Each time you move J it's
going to move on the order of L. So you need to move it on the order of N squared over
L squared times -- sorry. On the order of size of AK squared over L squared times, and
you touch it with probability on the order of L over N each step.
So the number of times you're going to want to move it -- the number of times you're
going to want to do your process is going to be like size of AK squared over L squared
times N over L. Okay. So if we let T be some constant times size of AK squared over L
squared times N over L, and so here it actually works out that we want to -- so I could
probably just let the random variable T equal little T, but I wasn't able to prove that it
works in that way, so I had to just -- I had to -- the only way I could get things to work is
if I let T be uniform over 1, 2, 3, up to little T.
So now what's the chance that -- what's the chance that you -- that I is matched with J if I
is less than J? Well, if I run the process for this many steps, then -- then the distribution
of what's close to J after that many steps is going to be roughly uniform over 1, 2, 3, up to J.
But there's only going to be a collision involving J with probability on the order of T over
N. Because there's only one collision each step. So if I run the process for only little T steps,
there's only -- the chance that there's a collision involving J is only on the order of T over
N.
So the distribution of the match of J is going to be lambda J uniform, where lambda J is
constant times the min of T over N and 1. Right? Okay.
>>: [inaudible] you don't need card J to collide exactly [inaudible].
>> Ben Morris: Right. We need it to collide by time T. So since we only have T
chances for it to collide, then the probability -- you have to multiply by the T over N,
because you only have T chances.
>>: Can you explain that again why you get lambda J uniformity [inaudible]?
>> Ben Morris: Okay. Okay. So by the -- okay. So by the -- so you can see where this
number comes from, right? This is how long it takes to kind of make the distribution of
card J roughly uniform over 1, 2, 3, up to J. But that's not enough to imply that it even has a
match. So if I were to run the process for that many steps and look at some card that's
nearby, it's going to be roughly uniform over 1, 2, 3, up to J. Okay.
>>: Okay. Now I see.
>> Ben Morris: But I also need for card J to have collided with something --
>>: The collisions here you get just by doing these two successive intervals, one
[inaudible] where are -- what's the source of collisions?
>> Ben Morris: So remember I gave an alternative definition of the L-reversal chain
where each step you don't actually know -- you can narrow the reversed interval down to
2, one of which was 1 bigger than the other on each side. So I -- so roughly speaking --
okay. So the original definition of the L-reversal chain is take an interval of length at
most L and reverse it. The new definition is take an interval of length at most L and
reverse it, and then with probability one half, change your mind and decide that you don't
really want to reverse the two cards on the outside of the interval.
So, in other words, do -- in other words, reverse the interval, but then do a collision of --
say if A and B are the ones on the outside, do a collision of A and B after reversing the
whole interval. All right.
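(A minimal sketch of this modified step on a circular deck; the names, the uniform choice of start and length, and the list representation are my assumptions, not from the talk.)

```python
import random

def l_reversal_step(perm, L):
    """One step of the modified L-reversal chain on a circular deck.

    Reverse a uniformly chosen arc of length at most L; then with
    probability 1/2 change your mind about the two outermost cards
    and swap them back.  That coin flip is the collision of A and B.
    """
    n = len(perm)
    i = random.randrange(n)                      # start of the arc
    k = random.randint(1, L)                     # its length
    idx = [(i + j) % n for j in range(k)]
    vals = [perm[p] for p in reversed(idx)]      # reverse the arc
    for p, v in zip(idx, vals):
        perm[p] = v
    if k >= 2 and random.random() < 0.5:         # the collision coin
        perm[idx[0]], perm[idx[-1]] = perm[idx[-1]], perm[idx[0]]
    return perm
```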
>>: Then you have to start with another exact of a uniform distribution [inaudible].
>> Ben Morris: It's -- yeah, I mean, it's basically the uniform distribution except at the
ends there's some issues. But -- okay. So -- but the point is there's only two cards that
collide at any step. So if you want to -- if you're running the process for only little T
steps, you will only -- at most T -- little T of your cards have -- or 2 times little T of your
cards have collided, so that's why you have to multiply by this factor of T over N when T
is less than N. Okay.
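(Putting the last two slides' quantities in one place; C and c are unspecified constants.)

```latex
t \;=\; C\,\frac{|A_K|^{2}}{L^{2}}\cdot\frac{N}{L},
\qquad
T \sim \mathrm{Uniform}\{1,\dots,t\},
\qquad
\lambda_J \;=\; c\,\min\Bigl(\frac{t}{N},\,1\Bigr).
```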
So then -- all right. So where's my next slide.
>>: [inaudible]
>> Ben Morris: Once T is bigger than N, then we don't have to worry about this issue.
So -- but, by the way, if you're wondering where the max comes from in Durrett's
conjecture, it comes from this min, and this min is a lower bound on the decay of entropy.
So when you turn that into an upper bound for the mixing time you get a max. So this
is -- so if we kind of follow this min around we'll -- and in the end it will show us where
the max is.
Okay. So then by the theorem -- so by the theorem, the decay of entropy after little T
steps is going to be -- so let me write lambda for what I call -- okay. So lambda is this
quantity that I wrote on the previous slide, constant T over N min 1. So the decay of
entropy is going to be at least a constant times lambda over log N times the entropy
attributable to the AK
interval. And remember we're assuming that that's at least a fraction 1 over log N times
the overall entropy, so the entropy decays to a factor 1 minus constant lambda over log
squared N times the initial entropy, which is at most E to the minus constant lambda over
log squared N times the initial entropy.
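(The chain of inequalities just stated, with mu sub t the permutation after the little-T steps:)

```latex
\mathrm{Ent}(\mu_{t})
\;\le\;
\Bigl(1 - \frac{c\,\lambda}{\log^{2} N}\Bigr)\,\mathrm{Ent}(\mu)
\;\le\;
e^{-c\lambda/\log^{2} N}\;\mathrm{Ent}(\mu),
\qquad
\lambda = c\,\min\Bigl(\frac{t}{N},1\Bigr).
```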
So but remember this is the decay in entropy after little T steps. To get the rate of decay
of entropy per step, we have to divide this by T, so we get one over -- so now I'm
plugging in the value of lambda -- well, no,
over here I'm just dividing by T, so C lambda over T log squared N. And here I'm
plugging in the value for lambda, which is constant times T over N min 1. Oh, and
there's two Cs. This is -- yeah. This C is --
>>: C squared equals C.
>> Ben Morris: C squared equals C for today. Okay. So take this, divide by T to get the
rate of entropy decay. And, substituting for lambda, it's equal to 1 over C
log squared N times the max of N and T. Because if you have a min and then you do 1
over it, you have to change it to a max.
And now I'm saying -- I'm using the fact that T is at most N cubed over L cubed. So we
can replace the T by N cubed over L cubed. And this gives us a bound on the decay of
entropy per unit of time over the medium term. And, again, the initial entropy is at most N log N,
so to get the mixing time you take 1
over the rate of decay of entropy, multiply that by log of the initial entropy, and then you
get what I claimed it was: log cubed N times the max of N and N cubed over L cubed.
>>: Where was -- can you go into the value for T and how [inaudible]?
>> Ben Morris: Okay. So T -- oh, I erased it. So T was a constant times the size of AK
squared over L squared times N over L, and the size of any interval is less than N.
So this is less than or equal to constant times N cubed over L cubed.
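(So the per-step rate and the final bound, using min(t/N, 1)/t = 1/max(N, t) and t at most C N cubed over L cubed; the extra log N from the log of the initial entropy turns log squared into log cubed.)

```latex
\frac{c\,\lambda}{t\,\log^{2} N}
\;=\;
\frac{c}{\log^{2} N\,\max(N,\,t)}
\;\ge\;
\frac{c}{\log^{2} N\,\max\bigl(N,\;C N^{3}/L^{3}\bigr)}
\;\Longrightarrow\;
\tau_{\mathrm{mix}} \;=\; O\!\Bigl(\log^{3} N\cdot\max\Bigl(N,\,\frac{N^{3}}{L^{3}}\Bigr)\Bigr).
```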
>>: So the only time this is sharp is if the relevant interval happened to be the last one?
>> Ben Morris: Right.
>>: Seems like you can regain something there by making the last interval shorter.
>>: [inaudible]
>> Ben Morris: Well, this --
[multiple people speaking at once]
>>: So making them not grow geometrically, divide into log N [inaudible].
>> Ben Morris: But this size of AK is just an approximation for the value of J. So this is
like J squared over -- this is like J squared over L squared for some value of J in AK.
Okay. So there's my talk.
[applause]