>> Yuval Peres: Okay. So I've been looking forward to seeing a proof of this result for
some years, ever since Durrett posed the question. On Tuesday we had all the buildup of
the tension, and today we're going to see how Durrett's lower bound can be almost
realized as an upper bound for the L-reversal chain. Ben Morris.
>> Ben Morris: Okay. Thank you. And I forgot to say thank you, Yuval, for the
invitation.
Okay. There's my title again. So let me remind you of the main theorem, well, so first
the definition. Okay. We have this card shuffle that has collisions; when two cards
collide, with probability one half you do nothing, and with probability one half you do a
transposition.
And we have a random variable T such that after time T you start matching the cards up
based on what collides with what. So after time T, if two cards collide, they're matched.
Two more cards collide, they're matched, and so on.
>>: I think from the bottom it really makes us focus on the card you're trying to hide.
>> Ben Morris: Okay. And then the theorem said -- that was the definition of the match
of a card. And then the theorem said that if for every card K its match is lambda K
uniform, so roughly uniform, in some sense, over all the cards that were initially above it,
then for any permutation mu -- any random permutation you start with -- if
you run the chain for T additional steps, you're going to bring the entropy down by
roughly the sum over K of lambda K times the entropy that was attributable to position K.
Okay. Does anybody need to be reminded about any of the definitions? Okay. So this
was -- this was on Tuesday. So now --
>>: [inaudible] say the Ent is not entropy but [inaudible].
>> Ben Morris: Okay. So right. The Ent is relative entropy: the difference
between the entropy and the entropy of the uniform distribution. So this goes in the
opposite direction. So when this thing gets smaller, then that actually means that the permutation is
getting more random. So zero means it's completely random.
All right. So now let me -- hopefully I can bring out some ideas by doing an example.
So I will -- I'll get to the L-reversal chain, but let me do the Thorp shuffle first since the
analysis is simpler. So recall in the Thorp shuffle you cut the cards exactly in half and
then you line up the cards in pairs and then the pairs collide.
So it turns out that the card in position Y, if Y is less than N over 2, will collide with the
card at position Y plus N over 2, where now I'm making the positions go from zero up to
N minus 1.
So Y -- the card in position Y and the card in position Y plus N over 2 are sent to 2Y and
2Y plus 1, determined by a coin flip which goes where. Okay. So in this example it's
actually easier to work with the time reversal, so the time reversal is obtained by just --
well, you look at this picture and turn around the direction of the arrows. So for the time
reversal, if X is even, then card X and card X plus 1 will collide on a given step, and
they'll be sent to either X over 2 or X over 2 plus N over 2 where a coin flip decides what
goes where.
Okay. So for general X the position of the card that was at X after one step is going to be
either the floor of X over 2 or the floor of X over 2 plus N over 2 based on the outcome
of the coin flip.
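(A minimal sketch of the time-reversed step just described, for N even; the function name and the list representation of the deck are my own, not from the talk.)

```python
import random

def thorp_reversal_step(perm):
    """One step of the time-reversed Thorp shuffle.

    perm[x] is the card at position x, positions 0..n-1 with n even.
    Cards at positions x and x+1 (x even) collide; a fair coin sends
    one to x//2 and the other to x//2 + n//2.
    """
    n = len(perm)
    new = [None] * n
    for x in range(0, n, 2):
        a, b = perm[x], perm[x + 1]
        if random.random() < 0.5:      # the coin flip that decides who goes where
            a, b = b, a
        new[x // 2] = a
        new[x // 2 + n // 2] = b
    return new
```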
Okay. So I claim that if little T is the ceiling of log base 2 of N, then for any random
permutation mu, if you start with mu and then you run T steps of the chain, you decrease
the entropy by a factor one minus constant over log squared N. Okay. And before I
prove this, let me just say why this gives the claimed mixing time bound of log N to the
4th. So if after log N steps you reduce the entropy by a
factor 1 minus constant over log squared N, that means that the rate of decay of entropy
per step is going to be constant over log cubed of N. And the initial entropy, the entropy
of the identity is going to be log N factorial, which is at most N log N. So how long does
it take to bring the entropy down to something small? It's going to be on the order of log
to the 4th N steps.
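(The arithmetic here, written out; this is just the talk's heuristic in symbols, with c an unspecified constant.)

```latex
\mathrm{Ent}(\mu_t) \;\le\; \Bigl(1 - \tfrac{c}{\log^2 N}\Bigr)\,\mathrm{Ent}(\mu),
\qquad t = \lceil \log_2 N \rceil
\;\;\Longrightarrow\;\; \text{per-step decay rate} \;\asymp\; \frac{c}{\log^3 N},
\\[4pt]
\mathrm{Ent}(\mathrm{id}) \;=\; \log N! \;\le\; N \log N
\;\;\Longrightarrow\;\;
\tau_{\mathrm{mix}} \;=\; O\!\Bigl(\frac{\log^3 N}{c}\,\log(N \log N)\Bigr) \;=\; O(\log^4 N).
```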
Okay. So let me now prove this claim about the decay of entropy. Okay. So for this I'm
going to use the theorem. So recall that you can decompose the entropy of a random
permutation based -- you can decompose it into the contributions of the different
positions. So let's write the entropy of mu is the contribution from position zero plus the
contribution from position 1 and so on.
Okay. Now I'm going to divide these positions up into geometrically growing intervals,
so the -- so this one has 1, this one has 1, now this one has 2, the next one will have 4, the
next one will have 8 and so on.
Now, let M star be the value that maximizes the contribution of the corresponding
interval. So one of these intervals will contribute more entropy than anybody else. Let
M star be the index of that interval.
Okay. So what we're going to do is we're going to choose our random variable T so that
it gives high values of lambda for positions in that interval. So it's enough to show that
there's some random variable T such that for every J the card that's matched with J is
lambda J uniform over the cards that were originally higher in the deck, where lambda J
is constant for J in the good interval and zero otherwise.
Okay. So we're saying that if J -- we have to design this random time T such that if J is in
this special interval and I is less than J, then the probability that J matches with I is at
most -- I mean is at least 1 over 4 times J. So it's roughly -- so the card matched with J is
roughly uniform over all the cards that were initially higher than that.
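(In symbols, writing M(J) for the match of card J and I_{M*} for the good interval -- my notation -- the condition to verify is:)

```latex
\Pr\bigl[\mathcal{M}(J) = I\bigr] \;\ge\; \frac{1}{4J}
\qquad \text{for all } I < J,\ J \in I_{M^*},
```

that is, lambda J is 1/4 on the good interval and zero elsewhere.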
>>: Do you want to say why this is enough?
>> Ben Morris: So this is -- this is -- right. The theorem will tell us that if -- that now we
get a decay in entropy that's equal to at least one quarter times the contribution of the
entropy of the positions in IM star. So --
>>: [inaudible] 1 over log N with the whole thing?
>> Ben Morris: Right. Which is at least 1 over log N of the -- 1 over log N of the overall
entropy.
>>: Right. Wasn't there a log Q [inaudible]?
>> Ben Morris: Well, we're trying to prove log squared. The additional factor of log
comes in the actual theorem.
>>: [inaudible]
>> Ben Morris: Right. So we get one log -- so one of the logs comes from the fact that
you're only doing one interval in those log intervals, and another log comes from the
theorem.
Okay. And then there was a third log which was because you're doing -- I'm counting log
N steps as one step. All right.
So let me just give examples of -- so let's say N equals 32. It's actually easy to work
things out if N is a power of 2, so let me assume that N is 32. So then the card in
position X is sent to either X over 2 or X over 2 plus 16 with probability one half each. So
I'll call this -- so basically X is sent to the floor of X over 2, but then there's this random
offset that you add each step, which is either 16 or zero.
Okay. And the -- so let's assume that J is between 2 to the M minus 1 and 2 to the M.
So M star takes the value M for some M at least 1. Now, notice that if you
have two cards, say the card at position 20 and the card at position 7, now, if the coin
flips that determine the offsets each step are the same for both cards, then those cards are
bound to collide.
So say -- look at 20 and 7. After one step you go to -- let's say the offset is zero at this
step. It becomes 10 and 3. And then divide by 2 again, it becomes 5 and 1, and now let's
say the offset is 16 for both: divide this 5 by 2, you get 2, plus 16 is 18; 1 divided by 2 is zero, plus 16 is 16. What?
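(The board computation as a quick trace, for the reversed chain with N = 32; the shared offset sequence below is one arbitrary choice, not necessarily the one from the board.)

```python
x, y = 20, 7
offsets = [0, 0, 16, 0, 16]            # the same (arbitrary) offsets for both cards
for step, off in enumerate(offsets, 1):
    if x // 2 == y // 2:               # positions form a pair {2k, 2k+1}: collision
        print(f"cards collide at step {step}")
        break
    x, y = x // 2 + off, y // 2 + off
    print(f"after step {step}: positions {x}, {y}")
# prints 10,3 then 5,1 then 18,16 then 9,8 and collides at step 5 (= log2 of 32)
```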
>>: [inaudible]
>> Ben Morris: Huh? You can see what's happening is that after a while you forget
where you started from and all that matters is what the offsets were, because the -- you're
dividing by -- each step you're dividing by 2, so after log of your initial position steps,
you've completely forgotten where you were and you're just --
>>: The offsets are feeding the high order bits of [inaudible].
>> Ben Morris: Exactly. Yeah. Right.
>>: [inaudible]
>>: Why do you assume that they have -- that they receive the same offset?
>>: If. If they receive the same offset.
>> Ben Morris: Right. If it happens that two cards receive the same offsets each step,
they are bound to collide. Okay. In at most M steps, where M is such that the bigger
card is at most 2 to the M minus 1.
>>: [inaudible]
>> Ben Morris: Oh, okay. Uh-huh. Okay. So in particular, if two
cards say potentially could collide after seven steps, assuming that all the offsets are the
same, then the chance that they will collide is one half to the 7th. Okay. Or if they could
collide after 16 steps, then the chance that they will is one half to the 16.
So this tells us how we want to design our random time T. It has to -- the probability that
T equals R has to grow exponentially with R. So we'll start with -- so we'll say the
probability that it stops right when you start is going to be like on the
order of 1 over J. And then with each additional step, we have to multiply the probability
of the time stopping there by a factor of 2 to cancel out with this probability of our two
cards having had the same offsets each step.
So okay. So we're going to -- oops. Okay. So we're going to let our random variable T
satisfy probability T equals R is at least 2 to the R minus M minus 1 for R less than or equal
to M. So it will -- so it starts out on the order of 1 over 2 to the M and then grows
exponentially with a rate 2.
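(One concrete sampler meeting the requirement that the probability of T equals R is at least 2 to the R minus M minus 1 for R up to M; the construction is mine -- any distribution with these lower bounds works.)

```python
import random

def sample_T(M):
    """Sample T with P(T = r) >= 2**(r - M - 1) for each 1 <= r <= M.

    Take T = M - G with G geometric of parameter 1/2, so that
    P(T = M - g) = 2**(-(g + 1)); the leftover mass (g >= M) is
    dumped on r = 1, which only helps a lower-bound requirement.
    """
    g = 0
    while random.random() < 0.5:
        g += 1
    return max(M - g, 1)
```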
So then if I is less than J, the probability that the match of J is I is at least 1 over 2 to the
M plus 1. Okay. It doesn't matter how far away I was initially from J. You know --
from the previous slide, you know if I is less than J that there is a certain number of steps
such that after this many steps I and J are bound to collide if they have the same offsets.
>>: [inaudible] collision or J?
>> Ben Morris: So here's the thing: You know that I and J will collide if they have the
same offsets at a certain point, and that happens with probability one half to the -- I'll
call -- I'll say that if they collide -- if they potentially collide after seven steps, I'll say that
their distance is 7. So if their distance is 7, then the chance that they are going to run into
each other is one half to the 7th. But that doesn't imply that they're matched with each
other. Because the way that a match is defined is you wait until time T and then the next
card that is matched with card J after time T, the next one that collides with it after time T
is defined as its match.
Okay. So for any -- so for any I less than J, it's some distance away from J. The chance
that it will collide with J after this many steps is one half to the distance. The chance that
the random variable T actually takes the value that is the distance between I and J is like
one half to the M times 2 to the distance. Because we're -- this starts out at one half to
the M and grows by a factor of two each step.
So you end up with -- so the probability -- maybe I should have written that -- done more
steps. Probability that M of J equals I: let's say the distance
between I and J is D, so this is -- the probability that the match of J is I is the probability
that they do in fact collide after D steps, so I and J collide after D steps. But then you
also have to have the fact that the time T -- the time after which you start matching things
together -- also has to take the value D. So this is -- so the chance that I --
>>: So you bound this by the probability that they collide exactly at the stopping time?
So this has to be the first collision after T [inaudible]?
>> Ben Morris: Yeah. In this example, "after time T" is just "at time T", because everything
will collide with something at time T. All right. So yeah. I mean, I should say at, right --
not after time, at. Because it happens at time T. Times the probability that T equals D. So the
chance of colliding after D steps is one half to the D. And now we've designed our
random variable T such that the probability that it takes the value D is 2 to the D minus M
minus 1. So then the one half to the D will cancel out with the 2 to the D here. So this is at
least 2 to the minus M minus 1. And J is on the order of 2 to the M. So this is like -- this
is at least 1 over 4J, say. Okay.
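(The whole computation in one line, writing D for the distance between I and J, so D is at most M, and using 2^(M-1) <= J < 2^M:)

```latex
\Pr\bigl[\mathcal{M}(J) = I\bigr]
\;\ge\;
\underbrace{\Bigl(\tfrac{1}{2}\Bigr)^{D}}_{\text{same offsets for } D \text{ steps}}
\cdot
\underbrace{2^{\,D - M - 1}}_{\Pr[T = D]}
\;=\; 2^{-M-1} \;\ge\; \frac{1}{4J}.
```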
>>: [inaudible]
>> Ben Morris: Okay. So we're done. This is it.
>>: Does this happen a lot that cards collide in that way by just following the same
offsets? It seems very crude.
>> Ben Morris: Um --
>>: The only way for them to collide is to have the same offsets.
[multiple people speaking at once]
>>: Eventually [inaudible] they can be different at first, but eventually they have to
[inaudible].
>> Ben Morris: Yeah. That's easy to see if you -- there's an equivalent description of the
Thorp shuffle when N is a power of 2 where you just -- you take the position --
so write the position in binary and then each step you erase the first bit and then put a
random bit at the end, and that's the new position. So what happens is the -- after a while
it doesn't matter what was at the beginning; all that matters is these new bits
that you're writing in.
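(The bit-string description in a line of code, for N = 2^d; which end the fresh bit enters depends on whether you run the chain forward or reversed, which is the direction issue raised next. For the reversed chain used in the analysis, the floor-halving drops the low bit and the 0-or-N/2 offset is a fresh high bit. A sketch in my notation.)

```python
import random

def thorp_reversal_bits(x, d):
    """Reversed Thorp shuffle on positions 0..2**d - 1, in bit form:
    drop the low-order bit (the floor-halving) and prepend a fresh
    random high-order bit (the 0-or-N/2 offset). After d steps the
    original position is completely forgotten."""
    return (x >> 1) | (random.getrandbits(1) << (d - 1))
```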
>>: [inaudible] reverse direction.
>>: Reversing direction all the time to keep us [inaudible].
>> Ben Morris: Oh. I thought it was -- you don't shift to the left, you shift to the right?
Okay.
[multiple people speaking at once]
>> Ben Morris: Uh-huh.
>>: Then you have to --
>> Ben Morris: Yeah, okay.
>>: [inaudible]
>> Ben Morris: Right.
>>: Confusing that the cards on top have nowhere in the [inaudible].
[multiple people speaking at once]
>>: [inaudible]
>> Ben Morris: Okay. So let me -- anyway, so this -- it's kind of
nice because you're just verifying a condition that involves only pairs of cards.
So kind of like when you have some card shuffle, you might
guess an upper bound for the mixing time by just thinking, well, how long does it take
this shuffle to mix up pairs. But what this does is it says, well, I can actually prove an
upper bound on the mixing time by just considering the behavior of pairs.
Okay. So if there are no more questions on this, let me do the other example.
Okay. All right. Here it is. Let me just remind you of the definition of the L-reversal
chain. So -- so when -- so you have N cards arrayed in a circle and then each step you'll
choose an interval of cards of length at most L and reverse it. So if you consider a single
card, it will move -- when it does move, it will move a distance on the
order of L. So it will have to be moved on the order of N squared over L squared
times before its position is random. But it's only going to move with probability on the
order of L over N each step. So you need to -- so to figure out the time to randomize a
single card, you take the N squared over L squared and you multiply by N over L and you
get N cubed over L cubed. And then you have to multiply by an additional factor of log
N, because there are N cards that have to be randomized.
So this can all be made rigorous using David's argument. You know, using
eigenfunctions and so on. But I'm trying to give a heuristic.
So you know the mixing time has to be at least on the order of N cubed over L cubed log
N. But then now consider pairs. So if a pair of cards are initially adjacent, then they will
remain adjacent after one step unless they were separated by the interval that was
reversed at that step. Each step an interval will only separate at most
two bonds between adjacent cards. So by the
coupon collector problem, the time to separate all the bonds is on the order of N log N.
So Durrett argued that the -- well, Durrett proved that the mixing time has to be at least
on the order of the maximum of N and N cubed over L cubed times log N and
conjectured that this is also an upper bound.
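(Assembling the two heuristics, Durrett's lower bound in symbols; c is an unspecified constant.)

```latex
\tau_{\mathrm{mix}}
\;\ge\;
c \,\max\Bigl(\underbrace{N}_{\text{coupon collector on bonds}},\;
\underbrace{\tfrac{N^{3}}{L^{3}}}_{\text{single-card randomization}}\Bigr)\,\log N .
```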
Okay. So basically, since the argument I just gave for the lower bound just has to do
with considering the behavior of pairs, we should expect that maybe our
theorem could also prove something about the upper bound.
Okay. Okay. So indeed we get the -- we get the following corollary of the main
theorem, which says that the mixing time is [inaudible] of log cubed N times the strange
max of N and N cubed over L cubed. So it's the conjecture with an extra factor log
squared N. Okay.
And the proof will proceed in a similar way to the Thorp shuffle. So first we're going to
decompose the -- so we'll start with some arbitrary permutation mu, we'll decompose the
entropy of mu according to the positions. Now we'll break the positions into
geometrically growing intervals starting with -- initially the first interval is of length L.
So here. So I guess it goes L and then of length 3L and then 9L and so on.
So the contribution from one of the intervals must be at least 1 over log N times the
overall entropy. And here I'm using some -- okay. I introduce some new notation here.
I'll write that the entropy -- so you have some random permutation mu. And I'll write
Ent of mu comma AK; this is my notation for the contribution of
the indices in AK to the entropy of mu. So this is the sum over J in AK of EJ -- the
contribution of AK to the overall entropy.
So now we break down according to the value of K. If K is one, so the -- most of the
entropy comes from the -- or the highest amount of entropy comes from the first interval,
then it actually works out we can analyze this using standard techniques. So imagine that
we're just -- okay. So we want to say that we get a nice decay in entropy. So if the -- so
if you just consider the case where a lot of the entropy comes from the first L positions,
then here -- notice that if you reverse an interval and then you
reverse another interval that's exactly the same only one smaller on each side, the net
effect is to keep everything the same except for transposing the two cards that are in
the outer interval but not the inner interval.
And when the length of the interval is comparable to the number of cards, then this is a
well-known card shuffle that's called random transpositions. And like everything is
known about it, so the rate of decay of entropy is known. So I don't have to do anything
using my theorem for this case.
>>: That was when you make these successive pair switches.
>> Ben Morris: Oh, right. So you can use what's called comparison techniques to -- so if
you can simulate a known Markov chain using moves of your Markov chain, then you
can get --
>>: But this is not a move, right, because [inaudible].
>>: No, no, L is the maximum length of interval.
>>: Maximum [inaudible].
>>: Switch a uniform length [inaudible].
>> Ben Morris: Yeah. So since I can simulate random transpositions using two
consecutive moves of my Markov chain, I can use what's called comparison techniques to
compare the log Sobolev constant for random transpositions with the log Sobolev
constant for this -- for the L-reversal chain in the case where the length of reversals is
comparable to the number of cards. So I can get a bound on the decay of entropy in that
case.
But, anyway, my point is in this case it's completely not related to this talk, so I'll just
move on to the case where --
>>: [inaudible]
>>: Then the length of the interval where the action is is the same as the length of the
interval where you [inaudible].
>>: So L, L is fixed, so you say K equals 1 means that most of the entropy --
>>: So at least one of the [inaudible].
>> Ben Morris: So we put down our cards according to mu, and once
we've gotten to the top L cards, it still looks random, you know, up to the factor log N.
So that means that we just need to consider the effect of the
chain on the randomness of the top L cards.
>>: I see. Okay.
>> Ben Morris: Okay. So now let's assume that K is bigger than 1. So in this case we're
going to use our theorem. So we have to -- okay. So we want to come up with a random
time T such that -- such that the distribution of the card matched with J is roughly
uniform over 1, 2, 3, up to J. So how long are we -- how long are we going to want to run
the chain if J is in AK, so suppose we have -- okay. So we have our A1, A2 and so.
Remember these are geometrically growing intervals, dot, dot, dot, so suppose we're in --
we have some J in AK. So since these intervals are growing geometrically, the value of J is
going to be roughly the size of AK, so J is up to -- to a constant factor is just the size of
AK.
So how long do we need to -- how many steps do we need to run the chain such that the
card that will -- that is matched with J is roughly uniform over what was initially higher?
Well, each time you -- well, it's like the analysis from before. Each time you move J it's
going to move on the order of L. So you need to move it on the order of N squared over
L squared times -- sorry. On the order of size of AK squared over L squared times, and
you touch it with probability on the order of L over N each step.
So the number of times you're going to want to move it -- the number of times you're
going to want to do your process is going to be like size of AK squared over L squared
times N over L. Okay. So if we let T be some constant times size of AK squared over L
squared times N over L, and so here it actually works out that we want to -- so I could
probably just let the random variable T equal little T, but I wasn't able to prove that it
works in that way, so I had to just -- I had to -- the only way I could get things to work is
if I let T be uniform over 1, 2, 3, up to little T.
So now what's the chance that -- what's the chance that you -- that I is matched with J if I
is less than J? Well, if I run the process for this many steps, then -- then the distribution
of what's close to J after that many steps is going to be roughly uniform over 1, 2, 3, up to J.
But there's only going to be a collision involving J with probability on the order of T over
N. Because there's only one collision each step. So if I run the process for only little T steps,
there's only -- the chance that there's a collision involving J is only on the order of T over
N.
So the distribution of the match of J is going to be lambda J uniform, where lambda J is
constant times the min of T over N and 1. Right? Okay.
>>: [inaudible] you don't need card J to collide exactly [inaudible].
>> Ben Morris: Right. We need it to collide by time T. So since we only have T
chances for it to collide, then the probability -- you have to multiply by the T over N,
because you only have T chances.
>>: Can you explain that again why you get lambda J uniformity [inaudible]?
>> Ben Morris: Okay. Okay. So by the -- okay. So by the -- so you can see where this
number comes from, right? This is how long it takes to kind of make the distribution of
card J roughly uniform over 1, 2, 3, up to J. But that's not enough to imply that it even has a
match. So if I were to run the process for that many steps and look at some card that's
nearby, it's going to be roughly uniform over 1, 2, 3, up to J. Okay.
>>: Okay. Now I see.
>> Ben Morris: But I also need for card J to have collided with something --
>>: The collisions here you get just by doing these two successive intervals, one
[inaudible] where are -- what's the source of collisions?
>> Ben Morris: So remember I gave an alternative definition of the L-reversal chain
where each step you don't actually know -- you can narrow the reversed interval down to
2, one of which was 1 bigger than the other on each side. So I -- so roughly speaking --
okay. So the original definition of the L-reversal chain is take an interval of length at
most L and reverse it. The new definition is take an interval of length at most L and
reverse it, and then with probability one half, change your mind and decide that you don't
really want to reverse the two cards on the outside of the interval.
So, in other words, do -- in other words, reverse the interval, but then do a collision of --
say if A and B are the ones on the outside, do a collision of A and B after reversing the
whole interval. All right.
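(A minimal sketch of this modified step on a circular deck; the names, the uniform choice of start and length, and the list representation are my assumptions, not from the talk.)

```python
import random

def l_reversal_step(perm, L):
    """One step of the modified L-reversal chain on a circular deck.

    Reverse a uniformly chosen arc of length at most L; then with
    probability 1/2 change your mind about the two outermost cards
    and swap them back.  That coin flip is the collision of A and B.
    """
    n = len(perm)
    i = random.randrange(n)                      # start of the arc
    k = random.randint(1, L)                     # its length
    idx = [(i + j) % n for j in range(k)]
    vals = [perm[p] for p in reversed(idx)]      # reverse the arc
    for p, v in zip(idx, vals):
        perm[p] = v
    if k >= 2 and random.random() < 0.5:         # the collision coin
        perm[idx[0]], perm[idx[-1]] = perm[idx[-1]], perm[idx[0]]
    return perm
```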
>>: Then you have to start with another exact of a uniform distribution [inaudible].
>> Ben Morris: It's -- yeah, I mean, it's basically the uniform distribution except at the
ends there's some issues. But -- okay. So -- but the point is there's only two cards that
collide at any step. So if you want to -- if you're running the process for only little T
steps, you will only -- at most T -- little T of your cards have -- or 2 times little T of your
cards have collided, so that's why you have to multiply by this factor of T over N when T
is less than N. Okay.
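(Putting the last two slides' quantities in one place; C and c are unspecified constants.)

```latex
t \;=\; C\,\frac{|A_K|^{2}}{L^{2}}\cdot\frac{N}{L},
\qquad
T \sim \mathrm{Uniform}\{1,\dots,t\},
\qquad
\lambda_J \;=\; c\,\min\Bigl(\frac{t}{N},\,1\Bigr).
```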
So then -- all right. So where's my next slide.
>>: [inaudible]
>> Ben Morris: Once T is bigger than N, then we don't have to worry about this issue.
So -- but, by the way, if you're wondering where the max comes from in Durrett's
conjecture, it comes from this min, and this min is a lower bound on the decay of entropy.
So when you turn that into an upper bound for the mixing time you get a max. So this
is -- so if we kind of follow this min around we'll -- and in the end it will show us where
the max is.
Okay. So then by the theorem -- so by the theorem, the decay of entropy after little T
steps is going to be -- so let me write lambda for what I call -- okay. So lambda is this
quantity that I wrote on the previous slide, constant T over N min 1. So the decay of
entropy is going to be at least a constant times lambda over log N times the entropy
attributable to the AK
interval. And remember we're assuming that that's at least a fraction 1 over log N times
the overall entropy, so the entropy decays to a factor 1 minus constant lambda over log
squared N times the initial entropy, which is at most E to the minus constant lambda over
log squared N times the initial entropy.
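(The chain of inequalities just stated, with mu sub t the permutation after the little-T steps:)

```latex
\mathrm{Ent}(\mu_{t})
\;\le\;
\Bigl(1 - \frac{c\,\lambda}{\log^{2} N}\Bigr)\,\mathrm{Ent}(\mu)
\;\le\;
e^{-c\lambda/\log^{2} N}\;\mathrm{Ent}(\mu),
\qquad
\lambda = c\,\min\Bigl(\frac{t}{N},1\Bigr).
```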
So but remember this is the decay in entropy after little T steps. To get the rate of decay
of entropy per step, we have to divide this by T, so we get one over -- so now I'm
plugging in the value of lambda -- well, no,
over here I'm just dividing by T, so C lambda over T log squared N. And here I'm
plugging in the value for lambda, which is constant times T over N min 1. Oh, and
there's two Cs. This is -- yeah. This C is --
>>: C squared equals C.
>> Ben Morris: C squared equals C for today. Okay. So take this, divide by T to get the
rate of entropy decay. And, substituting for lambda, it's equal to 1 over C
log squared N times the max of N and T. Because if you have a min and then you do 1
over it, you have to change it to a max.
And now I'm saying -- I'm using the fact that T is at most N cubed over L cubed. So we
can replace the T by N cubed over L cubed. And this gives us a bound on the decay of
entropy per unit of time over the medium term. And, again, the initial entropy is at most N log N,
so to get the mixing time you take 1
over the rate of decay of entropy, multiply that by log of the initial entropy, and then you
get what I claimed it was: log cubed N times the max of N and N cubed over L cubed.
>>: Where was -- can you go into the value for T and how [inaudible]?
>> Ben Morris: Okay. So T -- oh, I erased it. So T was a constant times the size of AK
squared over L squared times N over L, and the size of any interval is less than N.
So this is less than or equal to constant times N cubed over L cubed.
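(So the per-step rate and the final bound, using min(t/N, 1)/t = 1/max(N, t) and t at most C N cubed over L cubed; the extra log N from the log of the initial entropy turns log squared into log cubed.)

```latex
\frac{c\,\lambda}{t\,\log^{2} N}
\;=\;
\frac{c}{\log^{2} N\,\max(N,\,t)}
\;\ge\;
\frac{c}{\log^{2} N\,\max\bigl(N,\;C N^{3}/L^{3}\bigr)}
\;\Longrightarrow\;
\tau_{\mathrm{mix}} \;=\; O\!\Bigl(\log^{3} N\cdot\max\Bigl(N,\,\frac{N^{3}}{L^{3}}\Bigr)\Bigr).
```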
>>: So the only time this is sharp is if the relevant interval happened to be the last one?
>> Ben Morris: Right.
>>: Seems like you can regain something there by making the last interval shorter.
>>: [inaudible]
>> Ben Morris: Well, this --
[multiple people speaking at once]
>>: So making them not grow geometrically, divide into log N [inaudible].
>> Ben Morris: But this size of AK is just an approximation for the value of J. So this is
like J squared over -- this is like J squared over L squared for some value of J in AK.
Okay. So there's my talk.
[applause]