>> Ben Morris: Okay. So I'm going to talk about improved mixing time bounds for the L-reversal chain and Thorp shuffle. And this will be a two-part talk. So today I'm going to focus on -- I'm going to prove a general theorem about card shuffling. And then in part two I'm going to talk about how to apply that theorem to the two shuffles that are in the title.

Okay. So setup: let P_t(x, y) be the transition probabilities of a Markov chain. We're going to be studying card shuffles, so the distribution at time t will converge to the uniform distribution as t goes to infinity, and we're interested in how fast. So I'll quantify that using the mixing time, which I define as the smallest t such that the total variation distance from uniform is at most a quarter at time t.

So I'll be interested in two specific card shuffles, which I'll talk about now. The first one was introduced by Rick Durrett as a model for the evolution of a genome. You have N cards that are arrayed in a circle, and in each step you choose an interval of cards of length at most L and reverse it. So I've drawn here one step of the L-reversal chain, where the 3 4 5 becomes 5 4 3. So there are two parameters: N, the number of cards, and L, the maximum length of an interval that can be reversed. And Durrett has this conjecture that the mixing time is -- it's this funny expression, the maximum of N and N cubed over L cubed, times log N. So I'll say some more in part two about where this comes from. And I'll just say now that this is kind of a hard example, because in this particular Markov chain the L2 mixing time -- so if we define the mixing time not using total variation distance, or L1 distance, but using L2 distance -- then in fact the conjecture would not be true. It's a fractional power of N higher. So it's kind of -- the conjecture is barely true.
>>: [inaudible]
>>: He proved the lower bounds.
>>: He proved the lower bounds.
>> Ben Morris: Right. He proved the lower bound and left the upper bound open.

Okay. And the second example that I'm going to be interested in is what's called the Thorp shuffle. Oops. Well, here. So in a step of the Thorp shuffle you divide the deck into two equal piles and then you line up the cards in pairs. And for each pair of cards you flip a coin. If the coin lands heads, you drop from the left and then from the right. If the coin lands tails, you drop from the right and then from the left. And you do this independently for each pair of cards that gets lined up together. Okay. So here it was an open problem to show that the mixing time was a polynomial function of the log of the number of cards. So actually I had proved that when N is a power of 2 there's a poly log mixing time. But it was still open to show, for arbitrary even N, that there's a poly log mixing time.

Okay. So the theorem that I'll prove today will give new bounds for the L-reversal chain and the Thorp shuffle. So for the L-reversal chain, previously the best bound for the mixing time was a constant times N cubed over L squared times log N, which can be a fractional power of N times the conjecture. But using the theorem, I can get the conjecture to within a logarithmic factor. And I want to point out that the spectral gap was determined to within constant factors by Cancrini, Caputo and Martinelli.
>>: Both cases?
>> Ben Morris: Um --
>>: [inaudible]
>> Ben Morris: Oh, oh, for the L-reversal -- yeah, that was just -- that was for the L-reversal chain. Okay. Now, for the Thorp shuffle, the previous best bound had been log N to the 29th. And as I said, that was only for N a power of 2. This was by Montenegro and Tetali. Now, using the theorem we can improve this to log N to the 4th, and for arbitrary even N. Okay. So I'll say a little bit more about these examples in Thursday's talk.
>>: [inaudible]
>> Ben Morris: No. The only lower bound I know is log N, so there's a trivial lower bound of log N, but some people believe it's log N squared.

Okay. Let me just introduce my notation. So I'll represent an ordering of cards using a permutation. So if the ordering is -- so if it's 4 3 1 2, then I'll represent that by the permutation such that mu of 4 equals 1, mu of 3 equals 2 and so on.
>>: [inaudible]
>> Ben Morris: I'm a maverick. The big word. So then I can -- like if I write mu times pi 1 times pi 2, I'm going to use this arrow notation where I'll draw an arrow from 2 to 1 if the card in position 2 gets sent to position 1, so that I can just follow the trajectory of a card by saying, well, suppose the initial ordering is mu and I want to follow the card with the label 3: I can just follow the arrows and see what happens to it.

Okay. So the general kind of card shuffle that I'm going to be studying will be a generalization of Three Card Monte. So in Three Card Monte there's a table and the cards are laid face down on the table. And the dealer will move the cards around. And occasionally he'll put two cards together and separate them quickly so that you can't tell which is which. So I'm going to model that move mathematically using what I call a collision, which is -- so a collision is a random permutation that with probability a half is a transposition and with probability a half is the identity. Okay. So you can define the Thorp shuffle naturally using collisions. So in the Thorp shuffle you cut the deck into two equal piles, and then the cards that get paired with each other collide. The L-reversal chain -- the way I defined it before wasn't in terms of collisions, but there is a way to define it in terms of collisions, as follows. Suppose that I told you that in a particular step of the L-reversal chain either this interval of length K was reversed, or the interval of length K plus 2 that includes this interval plus one card on each side was reversed. Then you could generate the conditional distribution after one step by first reversing the inner interval and then switching the outer two cards with probability a half.

Okay. So we're going to try to show that after a bunch of shuffles the distribution is close to uniform. And the key operation in these card shuffles is a collision. So what we want to do is we want to analyze how a collision brings a random permutation closer to uniform. And it turns out that a good measure of distance from uniform for doing this is the so-called relative entropy. So for a probability distribution P, with probabilities p_i, define the relative entropy as the sum over i of p_i times log of M p_i, where M is the number of possible values. So another way to say that is if a random variable X is chosen according to P, then the relative entropy is the expected value of log of M times the probability of X. Okay. So what's nice about the relative entropy is that you can decompose it into contributions from the different positions.
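To make the two shuffles and the relative entropy concrete, here is a minimal Python sketch; it is not from the talk, the function names and zero-based indexing are mine, and exactly how the reversed interval is sampled in the L-reversal step (for instance whether a length-one "reversal" counts) is an assumption guessed from the description above.

    import math
    import random

    def thorp_step(deck):
        # One Thorp-shuffle step: cut the deck into two equal piles, pair the
        # piles up card by card, and for each pair drop the two cards in a
        # random order (heads: left then right; tails: right then left).
        n = len(deck)
        assert n % 2 == 0
        left, right = deck[:n // 2], deck[n // 2:]
        out = []
        for a, b in zip(left, right):
            out.extend([a, b] if random.random() < 0.5 else [b, a])
        return out

    def l_reversal_step(deck, L):
        # One L-reversal step on a circular arrangement: pick a starting
        # position uniformly and a length k between 1 and L uniformly, then
        # reverse that arc (wrapping around the circle).
        n = len(deck)
        start = random.randrange(n)
        k = random.randint(1, L)
        idx = [(start + j) % n for j in range(k)]
        vals = [deck[i] for i in idx]
        out = deck[:]
        for i, v in zip(idx, reversed(vals)):
            out[i] = v
        return out

    def relative_entropy(p):
        # Relative entropy of a distribution p (a list of probabilities summing
        # to 1) with respect to the uniform distribution on M = len(p) points:
        # sum over i of p_i * log(M * p_i), with the convention 0 * log 0 = 0.
        M = len(p)
        return sum(x * math.log(M * x) for x in p if x > 0)

Running either step function repeatedly on a deck such as list(range(1, n + 1)) simulates the corresponding chain; relative_entropy([0.5, 0, 0, 0.5]), for example, comes out to log 2.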
So let's suppose we generated our random permutation by first putting down the bottom card and then the next from the bottom card and then the next one and so on. Now, when we're generating the card in a particular position, we're choosing from the probability distribution on what's the next -- what's the card that will go here. And if we let EK be the expected entropy of the distribution of the card that goes in location K given what's below it, then we can write the overall entropy as the sum of the EK. Okay. So now we have the overall entropy, and we can talk about, well, how much of the entropy is attributable to this position or that position or this block of positions.

So okay. So the setup is we have pi, which is our random permutation -- pi is a shuffle. And then pi 1, pi 2, up to pi K are K independent copies of pi, and mu times pi 1 up to pi K -- this is like the distribution of the deck after K shuffles. And the relative entropy will be converging to zero. And it's going to be nonincreasing, so for any card shuffle, after you do one step, the relative entropy is going to be less than or equal to what it was when you started. And so the way we're going to show convergence is to show that there's a guaranteed loss in entropy every time we do a shuffle. So we're going to try to lower bound the entropy loss for one of these shuffles that involve collisions.
>>: [inaudible] variations?
>> Ben Morris: There's some formula that can [inaudible] total variation.
>>: Is that why you use a log perhaps?
>> Ben Morris: No. I'll show you exactly where I use the log.

Okay. All right. So we're talking about relative entropy. I'm going to use the fact that the function F of X equals X times log of M X is strictly convex. And what this means is that if you have two numbers, P and Q, then the average of F of P and F of Q is bigger than F of the average of P and Q. So we can define the distance between P and Q as this difference: the difference between the average of F of P and F of Q, and F of the average. Okay. That was for P and Q numbers. If P and Q are probability distributions, define the distance between P and Q as the sum over I of D of P I and Q I.
>>: [inaudible]
>> Ben Morris: Excuse me?
>>: [inaudible]
>> Ben Morris: I don't know. I don't know. I doubt it. I put distance in quotation marks because I don't care whether it satisfies a triangle inequality or anything like that. So the distance between probability distributions is the sum of D of P I and Q I. Now, since the entropy is the sum of F over all the probabilities, the distance between two probability distributions is the difference between the average of the entropies and the entropy of the average.

Okay. So some useful facts that I will use later on. This function D -- so D of P and Q -- is convex in Q. And then I'm going to use what I call the projection lemma, which is: if you have little p and little q, which are the distributions of random variables X and Y respectively, and big P and big Q are the distributions of G of X and G of Y for some function G, then applying G can only decrease the distance between the distributions. So the distance between little p and little q is at least the distance between big P and big Q. So, in other words, when you apply a function you lose information, so you're going to tend to bring the two distributions closer together. Okay. So another fact I use is a comparison between this D distance and relative entropy.
So relative entropy is a notion of a distance from the uniform distribution. D of P and U is another notion of distance from the uniform distribution. So there's a lemma that relates the two. It says that D of P and U is at least a constant over log N times the relative entropy of P. Okay. So it turns out that the worst case is a point mass, where it's constant for the D distance and log N for the entropy.
>>: How N is decided [inaudible].
>> Ben Morris: Oh, it was N before, wasn't it. Yeah. Sorry. Thanks. Every time I give this talk there's a new issue.
>>: I would make sure your audience has a shorter memory.
>> Ben Morris: Can I get everything in one page?

Okay. So I'm going to write C I J for a collision between I and J. And I'm going to use the abusive notation C I J equals a half times the identity plus a half times the transposition of I and J. Now, the key thing to look at in these shuffles is what is the effect of a collision on the relative entropy. So let's suppose we start with the random permutation mu. We fix a position J. Now we're going to collide J with some position I where I is less than or equal to J. So I think of position J as being J from the top of the deck, so what I'm saying is you choose a position J and then you collide that with some position higher in the deck. Now, what we're interested in is what is the loss of entropy when we do this. So, well, the entropy of mu is the same as the entropy of mu times the transposition of I and J, because this is a deterministic permutation, so up to a difference in the ordering it's the same permutation. Okay. And mu times the collision of I and J has a distribution which is an even mixture of the distribution of mu and the distribution of mu times the transposition of I and J. So the loss in entropy is exactly the distance between mu and mu times the transposition of I and J, by the definition of this D function. So what is the loss of entropy when you collide position J with position I? It's exactly the distance between the distribution of mu and the distribution of mu times the transposition of I and J.

Okay. Now, there's a lemma which says that on the average, so when you average over all positions higher in the deck than J, the loss in entropy is roughly a constant times EJ.
>>: EJ is -- what is EJ?
>> Ben Morris: EJ is the portion of the --
>>: [inaudible]
>> Ben Morris: Excuse me?
>>: [inaudible]
>> Ben Morris: EJ is the portion of the overall entropy that is attributable to position J.
>>: But wouldn't that [inaudible] or do you take the average of what you see [inaudible]?
>> Ben Morris: So the definition of the entropy attributable to position J is -- okay. So here's position J. So I'm going to start generating my random permutation starting from the bottom. Let's say it's 8 and then 6 and then 3. Now, I'm going to put a card in position J with some distribution.
>>: [inaudible] eventually you average over 3, 6, and 8.
>> Ben Morris: Yeah. You average over that. Right. Because the entropy of this distribution is a random variable; it depends on the 3, 6, and 8. So EJ is the expected value of the entropy of the question mark given the stuff below. Okay. Is that -- okay. So that's pretty clear.

Okay. All right. So I can actually prove this lemma. It's pretty simple. So [inaudible] we just prove it in the special case where J is N. Okay.
So in the case where -- so let's say we collide the bottom card with some position higher in the deck chosen randomly. Then the loss of entropy is at least, well, roughly a constant times the entropy attributable to position N on the average, averaging over all I. Okay. So let's say N equals 4. Then if you have a random permutation mu, this induces probability distributions of single cards. So in this table the columns are the probability distributions on the various cards, and the rows -- the rho I's -- are the probability distributions on the various locations. So if you look at location 4, this is saying, well, with probability a half you'll find a 1 there, and with probability a half you'll find a 4 there.

So what happens when you take mu and then you do a collision of, say, 4 and 2, so mu times C 4 2? Well, the loss in entropy is going to be the distance between mu and mu times the transposition 4 2. But by the projection lemma, the distance between mu and mu times 4 2 is at least the distance between the induced distributions on the bottom card. So if we ignore all the information other than what's in the bottom card, that gives us a lower bound on the distance between the two distributions. Okay. So what is the induced distribution of the bottom card? Well, the distribution of the bottom card in mu is with probability a half 1 and with probability a half 4; in mu times 4 2 it's with probability a half 2 and with probability a half 3. So the distance between those two things is the distance between (a half, zero, zero, a half) and (zero, a half, a half, zero), which is just four times the distance between zero and a half. So when we take an average over all I of the distance between rho 4 and rho I, by convexity, this is at least the distance between rho 4 and the average over all the rho I's. But the average over all the rho I's is just the uniform distribution. So this is the distance between rho 4 and the uniform distribution, and then by the lemma I mentioned, this is at least roughly a constant times the relative entropy of the bottom position, which is E4. Okay. So that's a proof in the special case where we're looking at the bottom card, but the general proof is much the same.

Okay. So the general lesson of all this stuff is that if you have a random permutation and you decompose the entropy according to position and you get, say, 1, 1, 10, 2, 3 or something, then a good way to do a shuffle would be to choose the position that's contributing the highest relative entropy and then do a collision between that position and something higher in the deck. Okay. So I talked about the case where -- so if this is J, then I said, well, if I is uniform over everything higher then you get something good. It turns out that you only need I to be roughly uniform over what's higher. So let me define a notion of roughly uniform. I'll say that a random variable X is lambda-uniform over 1, 2, 3, up to J if the probability that X equals I is at least lambda over J for I equals 1, 2, 3, up to J. So if lambda equals 1, this would just be uniform, but we're allowing it to be somewhat less than uniform, as specified by the number lambda.
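As a check on the arithmetic in the four-card example, and on the lambda-uniform condition just defined, here is a small Python sketch; it is not from the talk, and the dictionary-based signature of is_lambda_uniform is an assumption made purely for illustration.

    import math

    def f(x, M):
        # f(x) = x * log(M x), with the convention f(0) = 0.
        return 0.0 if x == 0 else x * math.log(M * x)

    def D(p, q):
        # "Distance" between two distributions p and q of the same length M:
        # the sum over coordinates of (f(p_i) + f(q_i)) / 2 - f((p_i + q_i) / 2),
        # i.e. the average of the entropies minus the entropy of the average.
        M = len(p)
        return sum((f(a, M) + f(b, M)) / 2 - f((a + b) / 2, M)
                   for a, b in zip(p, q))

    # The four-card example: under mu the bottom card is 1 or 4 with
    # probability a half each; under mu times the transposition (4 2) it is
    # 2 or 3 with probability a half each.
    rho_mu = [0.5, 0.0, 0.0, 0.5]
    rho_mu_42 = [0.0, 0.5, 0.5, 0.0]
    print(D(rho_mu, rho_mu_42))                          # 4 * d(0, 1/2) = log 2
    print(4 * ((f(0, 4) + f(0.5, 4)) / 2 - f(0.25, 4)))  # the same number

    def is_lambda_uniform(probs, lam, J):
        # lambda-uniform over 1..J: P(X = i) >= lam / J for every i in 1..J.
        # probs maps values to probabilities; other values may carry mass too.
        return all(probs.get(i, 0.0) >= lam / J for i in range(1, J + 1))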
So there's a lemma, which is proved much the same as what you've seen, which says that if X is lambda-uniform over 1, 2, 3, up to J, then the expected distance between mu and mu times the transposition of J and X is roughly a constant times the entropy attributable to J. Okay. So if we were to choose a position I --
>>: [inaudible]
>> Ben Morris: It's right there. Entropy of -- well, I'll write it more mathematically -- okay. So EJ is the expected entropy of -- I forget what I'm using for permutations, pi or mu or something. Let's say mu. The entropy of mu inverse of J given mu inverse of J plus 1 up to N.
>>: [inaudible] U is a random distribution [inaudible]?
>> Ben Morris: What I'm doing is I'm saying for any distribution mu, so I'm proving inequalities about --
>>: So your mu is a distribution; their mu is a permutation.
>> Ben Morris: No. Mu is --
>>: [inaudible]
>> Ben Morris: -- is a permutation everywhere.
>>: Distribution. Distribution [inaudible].
>> Ben Morris: Well, it's a permutation, but when I write the entropy of a random variable, I'm using the random variable as shorthand for its distribution.
>>: [inaudible].
>> Ben Morris: [inaudible] permutation. So what I mean here is the entropy of the conditional distribution of mu inverse of J -- the conditional distribution of what card you put in position J given what cards you put below position J.
>>: I'm struggling with something. If both mu [inaudible] uniform random permutation --
>> Ben Morris: Okay.
>>: -- then UJX would also --
>> Ben Morris: Right. And EJ would be zero because everything's uniform.
>>: [inaudible]
>> Ben Morris: Because it's relative entropy. Right. It's this unfortunate thing that entropy and relative entropy are like -- it's the minus of entropy in the usual sense of the word entropy.

Okay. So the upshot -- the general philosophy -- is that if you choose a position that's contributing a lot of entropy and collide it with a position that's roughly uniform over what's higher, you're going to reduce the relative entropy by a lot, roughly a constant times whatever this position was contributing. Okay. Wow. Okay. Oh, wow, I can't even do my trick of covering up here. So okay. So the general philosophy is you have a position [inaudible] you collide it with something higher in the deck and this gives you a nice loss of entropy. Now, in the shuffles that I've been discussing, there are only local moves. So in one step you're not colliding a card with something that's going to be roughly uniform over everything that's higher. So what you have to do is you have to look at a bunch of moves. So let's say it's this picture: we have this position J that's contributing a lot of entropy. What we want to do is, say, look at a thousand moves into the future and pay attention to what this card is. So let's say we do a thousand shuffles. Then if the distribution of what this card collides with is roughly uniform over what was higher in the deck when we started, then we're still going to get this nice loss in relative entropy.

So here's how the theorem goes. So the theorem says let big T be a random variable less than or equal to little t. So this is -- big T is going to be like how many shuffles we're going to do before we pay attention to the next collision.
So for each card I we're going to say that the match of I equals J if J is the first card to collide with I after time T, and vice versa. So we do T shuffles. And then after this we start watching the shuffling and we start matching cards up with each other. So we do T shuffles, and then the next two cards that collide are matched with each other, and then the next two cards that collide are matched with each other unless one of them has already been matched with something else, and so on. Okay. And if a card never matches with anything, we say it's matched with itself. Okay. Now suppose that for every K the random variable M of K is lambda K uniform over 1, 2, 3, up to K. Okay. So every card K is matched with something that's roughly uniform over the cards above it, where roughly is quantified with the number lambda K. Then for every permutation mu, the loss in entropy when you multiply by pi 1, pi 2, up to pi T is at least the sum over K of lambda K over log N times the entropy attributable to position K.
>>: So with this you're always assuming that the card is transposed -- this collides with something that's higher in the deck?
>> Ben Morris: No. The definition of the match is just the next card that it collides with after time T. So if 3 and 7 collide, then the match of 3 is 7 and the match of 7 is 3, and for only one of those two will it have collided with something higher in the deck.
>>: Right. But here you're assuming that it's lambda K over [inaudible]. Does that mean that --
>> Ben Morris: Oh, oh, oh. Oh. It does not -- okay. That's a good point. When I say that a distribution is lambda K uniform over 1, 2, 3, up to K, I don't require that the distribution be concentrated on 1, 2, 3, up to K.
>>: But if K is small, then it means that these are getting a lot of -- these are getting a constant fraction of the mass, a total --
>>: [inaudible] lambda K is --
>> Ben Morris: Right. But, yeah, in order for us to be getting something good out of it, it's going to have to be -- right, right. So in particular when it's small, this value of T is going to have to be small. Right. The value of T is tuned according to which positions are contributing the most entropy. If, say, the entropy is -- I'm jumping ahead because I'm now talking about how to use the theorem. But if it was all zeros and, say, you know, 5 5, then we would want the match of this position to be roughly uniform over what's above it. So we would only want to do a small number of shuffles. Whereas if we had a 5 down here and these were zeros, then this guy would have to be almost uniform. So we'd have to do a bunch of shuffles. Okay. But that's how to use the theorem. Right now I'm just talking about the theorem. Okay.
>>: When you're giving us the long sequence of little M over there, it's hard to focus on like where you're going.
>> Ben Morris: Okay. Well, yeah. The level of complexity of the talk -- today it's like this. And then in the next one it's actually like this. So I don't want to scare everybody -- I mean, I hope people come back for Thursday because things get better.
>>: The problem is not the complexity, it's about the division of the steps.
>> Ben Morris: Okay.
>>: Why don't you tell us how you use it, then, or is that the next one?
>> Ben Morris: That's going to be on Thursday. Right. I'm going to --
>>: [inaudible]
>> Ben Morris: I mean, yeah. I just want to get to the proof of this, and then I'll get to the -- okay.
>>: [inaudible] have the cookies and other things after the talk.
>>: Can you go back to the statement of the theorem?
>> Ben Morris: Yes. Okay.
>>: Could you at least tell us what were -- I mean, you will choose these lambda Ks, right?
>> Ben Morris: I will -- pardon?
>>: I agree that this [inaudible] lambda K, but what do you -- like what kind of values should I think about for the lambda Ks?
>> Ben Morris: Constant.
>>: They're all K?
>> Ben Morris: Well, it's going to be constant for the region. So what we want to do is we're going to have this decomposition of relative entropy. And so let me draw it as a graph. So here's position 1, 2, 3, up to N, and say that the relative entropies are contributed according to this graph. Then we're going to want to choose our T -- this random variable T is what we have control over. We're going to choose our random variable T such that the lambdas in this fat part of the distribution are big. So we want to have big lambdas in here if there's a bump in the relative entropies there.
>>: [inaudible] will you just apply this with mu, which is like a constant, a delta, delta distribution?
>> Ben Morris: No. Mu is just some arbitrary random permutation.
>>: Well, you started [inaudible] right and then you do a bunch of things and you get some entropy loss, relative entropy loss like here. Now [inaudible] new mu, then you do it again. Right?
>> Ben Morris: Right. Exactly.
>>: [inaudible]
>> Ben Morris: Oh, well, okay, right. Russell's right. Mu is the distribution after time T of your -- pardon?
>>: After some [inaudible].
>> Ben Morris: Okay. Good.
>>: [inaudible]
>> Ben Morris: Okay. I mean -- all right. I shall go on.
>>: Everything's on a need-to-know basis.
>> Ben Morris: I thought I was doing good.
>>: [inaudible]
>> Ben Morris: Okay. So let me prove the theorem. So let me introduce the notation pi superscript T for the product of pi 1, pi 2, up to pi T. Okay. So think about it again in terms of the dealer doing Three Card Monte. He may move the cards around on the table, but the real action happens when he does these collisions. So we want to kind of ignore all this showy stuff that he's doing with his hands and just focus on the collisions. So what we want to observe is that if I and J collide at some point -- so suppose we know that at some point I will collide with J -- we could just initially do a collision of I and J, and it doesn't change the distribution. So --
>>: [inaudible] well, I mean, it depends on what you do with I and -- other things that you do with I and J.
>>: [inaudible]
>>: Oh, well, yeah. Right.
>> Ben Morris: So if I know that the dealer is going to collide these two cards, I can just step in and do that initially, and it doesn't change the distribution. So that means that pi superscript T has the same distribution as the product over I of the collisions of I and M of I, times pi superscript T. So remember M of I is the card that's matched with I. So if I go in beforehand and for every I I collide it with its match initially --
>>: [inaudible] after some random variable [inaudible].
>> Ben Morris: Um-hmm. Um-hmm.
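Since the matching rule is easy to get wrong, here is a small Python sketch of the greedy version described earlier; it is not from the talk, and representing the post-T collision history as an explicit list of pairs is an assumption made purely for illustration.

    def compute_matches(n, collisions):
        # 'collisions' is the time-ordered list of pairs (i, j) that collide
        # after time T (card labels 1..n). Two colliding cards get matched
        # unless one of them already has a match; a card that never gets
        # matched is matched with itself.
        match = {}
        for i, j in collisions:
            if i not in match and j not in match:
                match[i] = j
                match[j] = i
        for k in range(1, n + 1):
            match.setdefault(k, k)
        return match

    # Example: with collisions [(3, 7), (7, 5), (5, 2)], card 3 matches 7,
    # then 5 matches 2 (the collision of 7 and 5 is skipped because 7 is
    # already matched), and every other card matches itself.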
So in particular we know that at some point I and M of I will collide. So if I just go in ahead of time and do a collision, I don't change the distribution. Okay. Now, the point is, what I really want to do is scratch out this pi T and just focus on the product of collisions.
>>: The pi T on the right is the same pi T as on the left?
>>: In distribution.
>>: Oh, I see, because it's a half, a half.
>> Ben Morris: Okay. So I'm trying to bound the entropy of mu times pi T. That's what this whole thing is about. This is the entropy of mu times the product of collisions times pi T. Okay. So what I would like to do -- loosely, what I want to say is this multiplication by pi T can only help us, so I can just kind of cross it off and end up with mu times this product of collisions. But we have to be careful, because pi T depends on this product of collisions: what is this product of collisions? It's the collisions of I and the match of I, and how do you know what the match of I is? It's based on pi T.

Okay. So the way we get around that is we condition on what gets matched with what. So let M be the collection of what gets matched with what. Then conditioning on M can only increase the relative entropy, because, by Jensen's inequality, the relative entropy of a weighted average is at most the weighted average of the relative entropies. So let's condition on pi T and M. So I end up with the same expression as here, only now I'm conditioning on pi T and M. Okay. Now, since I'm conditioning on pi T, I can cross out this pi T. Because if you condition on something, it's not random anymore; you're just multiplying at the end by a deterministic permutation, so you don't change the entropy. Okay. So I can cross this out. So I now have mu times the product of the collisions -- the entropy of that given pi T and M. But conditional on M, mu and the product of the collisions are independent of pi T. Because the only way that pi T affects the distribution of the product of collisions is that it tells you what gets matched with what. But if we condition on that, then they become independent. So now I can cross out pi T to the right of the conditioning line. So I end up with the expected value of the entropy of mu times this product of collisions given what gets matched with what.

Okay. So I really can just ignore all this stuff that doesn't involve the collisions and ask myself, well, what is the entropy loss when I start with mu and then I multiply by all these collisions. Okay. So you start with mu, you multiply by all these collisions, the collisions of I and its match. So what we're going to do is we're going to let nu K be the product of all the collisions of I and M of I where M of I is less than or equal to I and I is less than or equal to K. So these are the collisions where both cards involved are at most K from the top. Okay. So then we can write the entropy of mu times nu N, which is what we're interested in, because nu N is just the whole collection of collisions. The difference between that and the entropy of mu, we write as a telescoping sum of the differences between the entropy of mu times nu K and the entropy of mu times nu K minus 1. Okay.
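Written out, the telescoping identity just described is the following, where Ent denotes relative entropy; taking nu_0 to be the identity permutation is an assumption about the base case.

    \operatorname{Ent}(\mu \nu_N) - \operatorname{Ent}(\mu)
      = \sum_{K=1}^{N} \bigl[ \operatorname{Ent}(\mu \nu_K) - \operatorname{Ent}(\mu \nu_{K-1}) \bigr],
      \qquad \nu_0 = \mathrm{id}.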
So we want to figure out, well, what is the difference in relative entropy when we go from allowing just stuff at distance K minus 1 from the top to allowing possibly distance K from the top. Okay. Well, if K is matched with something below K in the deck, then nu K minus 1 is equal to nu K, because nu K only allows stuff within distance K from the top. So if you have something below, it's not going to be allowed either way. But if the match of K is less than or equal to K, then nu K is equal to nu K minus 1 times this additional collision of K and its match. So what is the loss in entropy when we take this permutation mu times nu K minus 1 and we multiply it by a collision of K and its match, which remember is roughly uniform over stuff higher in the deck? Well, I'm going to cheat a little bit here, but if nobody notices how I'm cheating, I'm not going to bring it up. Okay. So remember we had this lemma that said if you start with a permutation and you multiply it by a collision between K and something roughly uniform over what's higher in the deck, you get a loss in entropy which is just the contribution to the entropy of that position. So the loss in entropy when we look at the difference between mu times nu K and mu times nu K minus 1 is roughly a constant times the entropy attributable to position K.
>>: Oh, the measure of mu times nu K minus 1.
>> Ben Morris: That was one of the ways in which I cheated. Okay. Right. But it's not cheating, because the entropy attributable to position K of mu times nu K minus 1 is the same as the entropy attributable to position K of mu, because nu K minus 1 -- remember, this only affects stuff that's within distance K minus 1 from the top of the deck. So if we're dealing out cards from the bottom of the deck, we can't tell the difference between those two permutations before we get to the top K minus 1. So this EK is the same as the other EK. And there's another way I'm cheating, but since I'm at the end of the talk --
>>: [inaudible]
>> Ben Morris: Um...
>>: Condition on M [inaudible] it is because --
[multiple people speaking at once]
>>: This EK should be -- it's a conditional probability -- a conditional expectation given M.
>> Ben Morris: Given M, right. [inaudible] right. So that's I guess the other way I'm cheating: this mu times nu K minus 1 has information about what gets matched with what. So when we condition on M, we actually affect this. But it turns out that if, say, K is 10 and it's matched with 2, then this nu K minus 1 does not affect what happens in positions 10 or 2. So it turns out that when you sit down and do the calculation, it's exactly like it was in that lemma. So I was cheating a little bit, but when you write down what is the distance between mu times nu K minus 1 and mu times nu K minus 1 times the collision of K and M of K, you get an expression that's exactly the same as -- you just get a sum over all the different possibilities of the distance between what's in location K and what's in location M of K, which is...
>>: What are the other [inaudible].
>> Ben Morris: No, it's just those two things. It's just -- this is not exactly a -- I said the lemma gives this. It's not exactly a consequence of the lemma.
But I should say it's analogous to the lemma -- by a calculation similar to the one in the lemma, we get this. But I think if I went through it all, I would probably confuse people more than -- all right. So let me end this part of the talk. And then on Thursday I'll show how to use the theorem to get the bounds on those two examples, and that's actually kind of straightforward.
>>: Interesting that you get the max of two expressions in the L-reversal.
>> Ben Morris: Yeah.
>>: Very natural to get in the lower bound, because there are two different arguments in the lower bound.
>> Ben Morris: Right.
>>: [inaudible] an upper bound, it's strange.
>> Ben Morris: It is. Yeah, it is.
>>: You're supposed to say no [inaudible] now, you're not supposed to say it's strange.
>> Ben Morris: Well, I mean, I'm going to talk about that on Thursday, so -- so I promise I'll talk about where the max comes from on Thursday.
>>: And you said you were going to point out where you use the log [inaudible] that's the log there?
>> Ben Morris: Oh, okay. So one of the logs comes from comparing -- so remember there's this comparison between the D distance from uniform and the relative entropy. There was a log there. And then it turns out in both examples the way to do it is to divide the relative entropy up into these geometrically growing intervals. So it's an interval of constant length, and then double that, double that, double that, and so on. And then you choose the interval of positions that contributes the highest amount of entropy, and then choose your random variable T with that interval in mind -- you know, make the lambdas high for that interval. So since you're only kind of --
>>: It would be completely independent [inaudible]?
>> Ben Morris: T will depend on mu. But, anyway, since you're only optimizing for this one interval -- there's going to be like log N intervals and you're only doing something good for one of these intervals -- you get another factor of log N there.
>>: Is the L-reversal like -- so is this conjecture known if L is constant [inaudible]?
>> Ben Morris: Yes.
>>: [inaudible] and for any constant [inaudible].
>> Ben Morris: And for L close to N it's known as well.
>>: [inaudible] power of N. Okay.
[applause]