>> Ben Morris: Okay. So I'm going to talk about improved mixing time bounds for the
L-reversal chain and Thorp shuffle. And this will be a two-part talk. So today I'm going
to focus on -- I'm going to prove a theorem about -- a general theorem about card
shuffling. And then in the part two I'm going to talk about how to apply that theorem to
the two shuffles that are in the title.
Okay. So setup: let P^t(x, y) be the transition probabilities of a Markov chain. And we're going to be studying card shuffles, so the distribution at time t will converge to the uniform distribution as t goes to infinity. And we're interested in how fast. So I'll quantify that using the mixing time, which I define as the smallest t such that the total variation distance from uniform is at most a quarter at time t.
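To make the definition concrete, here is a minimal numerical sketch. The lazy random walk on a 4-cycle is an illustrative chain of my own choosing, not a shuffle from the talk:

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum()

def mixing_time(P, start, eps=0.25):
    """Smallest t such that the TV distance from uniform is at most eps,
    starting from the point mass at `start`."""
    n = P.shape[0]
    uniform = np.full(n, 1.0 / n)
    dist = np.zeros(n)
    dist[start] = 1.0
    t = 0
    while total_variation(dist, uniform) > eps:
        dist = dist @ P   # one step of the chain
        t += 1
    return t

# Lazy random walk on a 4-cycle: stay put w.p. 1/2, else move to a neighbor.
P = np.zeros((4, 4))
for i in range(4):
    P[i, i] = 0.5
    P[i, (i + 1) % 4] += 0.25
    P[i, (i - 1) % 4] += 0.25
print(mixing_time(P, 0))
```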
So I'll be interested in two specific card shuffles, which I'll talk about now. The first one was introduced by Rick Durrett as a model for the evolution of a genome. You have N cards that are arrayed in a circle, and at each step you choose an interval of cards of length at most L and reverse it.
So I've drawn -- here's one step of the L-reversal chain where the 345 becomes 543. So
there are two parameters: N the number of cards and L which is the maximum length of
an interval that can be reversed.
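One step of the chain can be sketched in code. The exact distribution over intervals (uniform starting position, uniform length in 1..L) is my reading of the model, so treat it as an assumption:

```python
import random

def l_reversal_step(deck, L):
    """One step of the L-reversal chain on a circular deck: pick a random
    interval of length at most L and reverse it in place.
    (Uniform start and uniform length are assumptions; the talk only says
    an interval of length at most L is chosen and reversed.)"""
    n = len(deck)
    start = random.randrange(n)       # where the interval begins
    length = random.randint(1, L)     # interval length, at most L
    idx = [(start + k) % n for k in range(length)]
    vals = [deck[i] for i in idx]
    for i, v in zip(idx, reversed(vals)):
        deck[i] = v
    return deck
```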
And Durrett has this conjecture that the mixing time is -- it's this funny expression, the
maximum of N and N cubed over L cubed times log N. So I'll say some more in part two
where this comes from. And I'll just say now that this is kind of a hard example because
in this particular Markov chain the L2 mixing time -- so if we define the mixing time not in terms of total variation, or L1, distance but in terms of L2 distance -- then in fact the conjecture would not be true: it's a fractional power of N higher. So it's kind of -- the conjecture is barely true.
>>: [inaudible]
>>: He proved the lower bounds.
>>: He proved the lower bounds.
>> Ben Morris: Right. He put the -- he proved the lower bound and left the upper bound
open.
Okay. And the second example that I'm going to be interested in is what's called the
Thorp shuffle. Oops. Well, here. So in the Thorp shuffle, in a step of the Thorp shuffle
you divide the deck into two equal piles and then you line up the cards in pairs. And for
each pair of cards you flip a coin. And if the coin lands heads, you drop from the left and
then from the right. If the coin lands tails, you drop from the right and then from the left.
And you do this independently for each pair of cards that gets lined up together.
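A sketch of one step in code, pairing card i with card i + n/2 after the cut, which is my reading of the lining-up described above:

```python
import random

def thorp_step(deck):
    """One Thorp shuffle step: cut the deck into two equal piles, line the
    piles up in pairs, and drop each pair in a coin-flip order."""
    n = len(deck)
    assert n % 2 == 0, "Thorp shuffle needs an even number of cards"
    left, right = deck[:n // 2], deck[n // 2:]
    out = []
    for a, b in zip(left, right):
        if random.random() < 0.5:
            out.extend([a, b])   # heads: left card first
        else:
            out.extend([b, a])   # tails: right card first
    return out
```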
Okay. So here it was an open problem to show that the mixing time was a polynomial
function of the log of the number of cards. So actually I had proved that when N is a power of 2 there's a poly log mixing time. But it was still open to show for arbitrary even N that there's a poly log mixing time.
Okay. So the theorem that I'll prove today has -- will give new bounds for the L-reversal
chain and the Thorp shuffle. So for the L-reversal chain previously the best bound for the
mixing time was constant times N cubed over L squared log N, which can be a fractional
power of N times the conjecture. But using the theorem, I can get the conjecture to
within a log -- to within a logarithmic factor. And I want to point out that it was -- the
spectral gap was determined to within constant factors by Cancrini, Caputo and
Martinelli.
>>: Both cases?
>> Ben Morris: Um --
>>: [inaudible]
>> Ben Morris: Oh, oh, for the L-reversal -- yeah, that was just -- that was for the
L-reversal chain.
Okay. Now, for the Thorp shuffle, the previous best bound had been log N to the 29th. And as I said, that was only for N a power of 2. This was by Montenegro and Tetali. Now, using the theorem, we can improve this to log N to the 4th, and for arbitrary even N.
Okay. So I'll say a little bit more about these examples in Thursday's talk.
>>: [inaudible]
>> Ben Morris: No. The only lower bound I know is log N, so there's a trivial lower
bound of log N, but some people believe it's log N squared.
Okay. Let me just introduce my notation. So I'll represent an ordering of cards using a
permutation. So if the ordering is -- so the -- if it's 4312, then I'll represent that by the
permutation such that mu of 4 equals 1, mu of 3 equals 2 and so on.
>>: [inaudible]
>> Ben Morris: I'm a maverick. The big word. So then I can -- like if I write mu times
pi 1 times pi 2, I'm going to use this arrow notation where I'll draw an arrow from 2 to 1.
If the card in position 2 gets sent to the -- gets sent to position 1, so that I can just follow
the trajectory of a card by saying, well, suppose the initial ordering is mu and I want to
follow the card with a label 3, I can just follow the arrows and see what happens to it.
Okay. Okay. So the general kind of card shuffle that I'm going to be studying will be a
generalization of Three Card Monte. So in Three Card Monte there's a table and the
cards are laid face down on the table. And the dealer will move the cards around. And
occasionally he'll put two cards together and separate them quickly so that you can't tell
which is which.
So I'm going to model that move mathematically using what I call a collision. So a collision is a random permutation that with probability one half is a transposition and with probability one half is the identity.
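As a tiny sketch, a collision of two positions acts like this (positions are 0-indexed here for convenience):

```python
import random

def collide(deck, i, j):
    """Collision of positions i and j: with probability 1/2 transpose the
    cards in those positions, with probability 1/2 do nothing."""
    if random.random() < 0.5:
        deck[i], deck[j] = deck[j], deck[i]
    return deck
```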
Okay. So you can define the Thorp shuffle naturally by using collisions. So in the Thorp
shuffle when you -- so you cut the deck into two equal piles. And then the cards that get
paired with each other collide. The L-reversal chain, it's not -- it's not defined -- the way
I defined it before wasn't in terms of collisions, but there is a way to define it in terms of
collisions, as follows. Suppose that I told you that in a particular step of the L-reversal
chain either this interval of length K was reversed or the interval of length K plus 2 that
includes this interval plus one card on each side is reversed, then you could generate the
conditional distribution after one step by first reversing the inner interval and then
switching the outer two cards with probability of half.
Okay. So we're going to try to show that after a bunch of shuffles the distribution is close
to uniform. And the key operation in these card shuffles is a collision. So it's helpful
to -- what we want to do is we want to analyze how a collision brings a random
permutation closer to uniform. And it turns out that a good measure of distance from
uniform -- to do this -- is the so-called relative entropy. So define -- so for a probability distribution P, I define the relative entropy as the sum over I of P of I times the log of M times P of I. So another way to say that is: if a random variable X is chosen according to P, then the relative entropy is the expected value of the log of M times the probability of X.
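In code, with M the number of points, and the convention 0 log 0 = 0, this is the relative entropy with respect to the uniform distribution:

```python
import math

def relative_entropy(p):
    """ENT(p) = sum_i p(i) * log(M * p(i)), where M = len(p); this is the
    relative entropy of p with respect to uniform on M points."""
    M = len(p)
    return sum(pi * math.log(M * pi) for pi in p if pi > 0)

print(relative_entropy([1.0, 0.0, 0.0, 0.0]))      # point mass: log 4
print(relative_entropy([0.25, 0.25, 0.25, 0.25]))  # uniform: 0.0
```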
Okay. So what's nice about the relative entropy is that you can decompose it into
contributions from the different positions. So let's suppose we generated our random permutation by first putting down the bottom card, and then the next card from the bottom, and then the next one, and so on. Now, when we're generating the card in a particular position, we're choosing from a probability distribution on what card will go there. And if we let EK be the expected entropy of the distribution of the card that goes in location K given what's below it, then we can write the overall entropy as the sum of the EK.
Okay. So now we can talk about -- so we have the overall entropy. We can talk about,
well, how much of the entropy is attributable to this position or that position or this block
of positions.
So okay. So the setup is: we have pi, which is our random permutation -- so pi is a shuffle. And we have pi 1, pi 2, up to pi K, which are K independent copies of pi, so mu times the product pi 1, pi 2, up to pi K is like the distribution of the deck after K shuffles. And the relative
entropy will be converging to zero. And it's going to be nonincreasing, so for any card
shuffle, after you do one step, the relative entropy is going to be less than or equal to
what it was when you started.
And so the way we're going to show convergence is to show that there's a guaranteed loss
in entropy every time we do a shuffle. So we're going to try to lower bound the entropy
loss using one of these shuffles that involve collisions.
>>: [inaudible] variations?
>> Ben Morris: There's some formula that can [inaudible] total variation.
>>: Is that why you use a log perhaps?
>> Ben Morris: No. I'll show you exactly where I use the log. Okay. So -- okay. All
right. So we're talking about relative entropy. I'm going to use the fact that the function
F of X equals X log MX is strictly convex. And what this means is that if you have two
numbers, P and Q, then the average of F of P and F of Q is bigger than F of the average
of P and Q. So we can define the distance between P and Q as this difference. So the
difference between the average of F and F of the average.
Okay. That was for P and Q numbers. If P and Q are probability distributions, define the distance between P and Q as the sum over I of d of P of I, Q of I.
>>: [inaudible]
>> Ben Morris: Excuse me?
>>: [inaudible]
>> Ben Morris: I don't know. I don't know. I doubt it. I put distance in quotation marks
because I'm -- I'm not -- I don't care whether it satisfies a triangle inequality or anything
like that.
So the distance between probability distributions is a sum of d of P of I, Q of I. Now, since the
entropy is the sum of F over all the probabilities, the distance between two probability
distributions is the difference between the average of the entropies and the entropy of the
average.
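A sketch of these definitions. Note the x log M part of f cancels when we take the gap, so d itself does not depend on M:

```python
import math

def f(x):
    """The convex function x log x (the x log M part of x log(Mx) cancels
    when we take the convexity gap below)."""
    return 0.0 if x == 0 else x * math.log(x)

def d(p, q):
    """Distance between numbers p and q: the convexity gap of f."""
    return 0.5 * (f(p) + f(q)) - f(0.5 * (p + q))

def D(P, Q):
    """Distance between distributions: the sum of coordinatewise gaps.
    It equals the average of the entropies minus the entropy of the average."""
    return sum(d(p, q) for p, q in zip(P, Q))

print(D([1.0, 0.0], [0.0, 1.0]))  # the two point masses on 2 points: log 2
```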
Okay. So some useful facts that I will use later on. This function D is -- so D of P, Q is convex in Q. And then I'm going to use what I call the projection lemma, which is: if you have little p and little q, which are distributions of random variables X and Y respectively, and suppose you're looking at the distributions of G of X and G of Y for some function G, this can only decrease the distance between the distributions. So the distance between little p and little q is at least the distance between big P and big Q. So, in other words, applying a function -- when you apply a function you lose information. So you're going to tend to bring the two distributions closer together.
Okay. So another fact I use is a comparison between this D distance and relative entropy. So relative entropy is a notion of distance from the uniform distribution. D of P, U is another notion of distance from the uniform distribution. So there's a lemma that relates the two. It says that D of P, U is at least a constant over log N times the relative entropy of P. Okay. So it turns out that the worst case is a point mass, where it's constant for the D distance and log N for the entropy.
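A quick numerical sanity check of that worst case, using the definitions above: for a point mass on N points, the D-distance to uniform stays bounded while the relative entropy is log N:

```python
import math

def ent(p):
    """Relative entropy with respect to uniform on len(p) points."""
    M = len(p)
    return sum(x * math.log(M * x) for x in p if x > 0)

def D_to_uniform(p):
    """D(p, u) = (ENT(p) + ENT(u))/2 - ENT((p + u)/2), and ENT(u) = 0."""
    M = len(p)
    mix = [0.5 * (x + 1.0 / M) for x in p]
    return 0.5 * ent(p) - ent(mix)

for N in (10, 100, 1000):
    delta = [1.0] + [0.0] * (N - 1)   # point mass on the first point
    print(N, round(ent(delta), 3), round(D_to_uniform(delta), 3))
```

The printed D values stay below a constant (they approach log 2) while the entropy column grows like log N.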
>>: How N is decided [inaudible].
>> Ben Morris: Oh, it was N before, wasn't it. Yeah. Sorry. Thanks. Every time I give
this talk there's a new issue.
>>: I would make sure your audience has a shorter memory.
>> Ben Morris: Can I get everything in one page? Okay. So I'll use this -- okay. So I'm going to write C I J for a collision between I and J. And I'm going to use the abuse of notation C I J equals a half times the identity plus a half times the transposition of I and J.
Now, okay. So the key thing to look at in these shuffles is what is the effect of a collision
on the relative entropy. So let's suppose we start with a random permutation mu. So we
fix a position J. Now we're going to collide J with some value of I where I is less than or
equal to J. So I think of position J as being J from the top of the deck, so what I'm saying
is you start -- so you choose a position J and then you collide that with some position
higher in the deck.
Now, what we're interested in is what is the loss of entropy when we do this. So, well,
the entropy of mu is the same as the entropy of mu times the transposition of I and J,
because, well, this is a deterministic permutation, so we're just -- up to a difference in the
ordering it's the same permutation.
Okay. And mu times collision of I and J is -- has a distribution which is an even mixture
of distribution of mu and the distribution of mu times transposition of I and J. So the loss
in entropy is exactly the distance between mu and mu times transposition of I and J, okay,
by the definition of this D function.
So what is the loss of entropy when you collide position J with position I? It's exactly the
distance between the distribution of mu and the distribution of mu times the transposition of I and J. Okay. Now, there's a lemma which says that on average -- so when you average over all positions higher in the deck than J -- the loss in entropy is roughly a constant times EJ.
>>: EJ is -- what is EJ?
>> Ben Morris: EJ is the portion of the --
>>: [inaudible]
>> Ben Morris: Excuse me?
>>: [inaudible]
>> Ben Morris: EJ is the portion of the overall entropy that is attributable to position J.
>>: But wouldn't that [inaudible] or do you take the average of what you see [inaudible]?
>> Ben Morris: So the definition of the entropy attributable to position J is -- okay.
Suppose -- okay. So here's position J. So let's say -- so I'm going to start generating my
random permutation starting from the bottom. Let's say it's 8 and then 6 and then 3.
Now, there's something -- I'm going to put a card in position J with some distribution.
>>: [inaudible] eventually you average over 3, 6, and 8.
>> Ben Morris: Yeah. You average over that. Right. Because the entropy of this
distribution is a random variable because it depends on the 3, 6, and 8. So EJ is the
expected value of this -- EJ is expected value of the entropy of question mark given the
stuff below. Okay. Okay. Is that -- okay. So that's pretty clear. Okay.
All right. So I can actually prove this lemma. It's pretty simple. So [inaudible] we just
prove it in the special case where J is N. Okay. So in the case where -- so let's say we
collide the card -- the bottom card with some position higher in the deck chosen
randomly. Then the loss of entropy is at least, well, roughly a constant times the entropy
attributable to position N on the average, averaging over all I.
Okay. So let's say N equals 4. Then if you have a random permutation mu, this induces
probability distributions of single cards. So the columns -- so in this table the columns are the probability distributions on the various cards, and the rows are probability distributions on the various locations. So if you look at location 4, this is saying, well, with probability a half you'll find a 1 there, and with probability a half you'll find a 4 there.
So what happens when you take mu and then you do a collision of, say, 4 and 2, so mu
times 4 and 2? Well, the loss in entropy is going to be the distance between mu and mu
times 4 2. But by the projection lambda, the distance between mu and mu times 4 2 is at
least the distance between the induced distributions on the bottom card. So if we ignore
all the information other than what's in the bottom card, that gives us a lower bound on
the distance between the two distributions.
Okay. So what is the induced distribution of the bottom card? Well, the distribution of the bottom card in mu is: with probability a half it's 1 and with probability a half it's 4. In mu times (4 2) it's: with probability a half it's 2 and with probability a half it's 3. So the distance between those two things is the distance between (a half, zero, zero, a half) and (zero, a half, a half, zero), which is just four times the distance between zero and a half.
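That computation can be checked directly, using the convexity-gap distance from before:

```python
import math

def f(x):
    # x log x; the x log M part of x log(Mx) cancels in the gap below.
    return 0.0 if x == 0 else x * math.log(x)

def d(p, q):
    return 0.5 * (f(p) + f(q)) - f(0.5 * (p + q))

P = [0.5, 0.0, 0.0, 0.5]   # bottom card under mu: card 1 or 4, each w.p. 1/2
Q = [0.0, 0.5, 0.5, 0.0]   # bottom card under mu * (4 2): card 2 or 3
dist = sum(d(p, q) for p, q in zip(P, Q))
print(dist, 4 * d(0.0, 0.5))  # the two agree
```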
So when we take an average over all I of the distance between row 4 and row I, by convexity, this is at least the distance between row 4 and the average over all rows. But the average over all rows is just the uniform distribution. So this is the distance between row 4 and the uniform distribution, and then by the lemma I mentioned, this is at least roughly a constant times the relative entropy of the bottom position, which is E4.
Okay. So that's the proof in the special case where we're looking at the bottom card, but the general proof is much the same.
Okay. So that was the case where you -- okay. So what I'm saying is kind of the general
lesson of all this stuff is that if you have -- if you have a random permutation and then
you decompose the entropy according to position and you get, you know, say, 1, 1, 10, 2,
3 or something, then a good way to -- a good way to do a shuffle would be to choose the
position with the -- that's contributing the highest relative entropy and then do a collision
between that position and something higher in the deck. Okay.
So, well, I talked about the case where the position -- so if this is J, then the I -- I said,
well, if I is uniform over everything higher then you get something good. It turns out that
you only need I to be roughly uniform over what's higher.
So let me define a notion of roughly uniform. I'll say that a random variable X is lambda-uniform over 1, 2, 3 up to J if the probability that X equals I is at least lambda over J for I equals 1, 2, 3 up to J. So if lambda equals 1, this would just be uniform, but we're allowing it to be somewhat less than uniform, as specified by the number lambda. So there's a -- there's a lemma, which is proved much the same as what you've seen, which says that if X is lambda-uniform over 1, 2, 3 up to J, then the expected distance between mu and mu times the transposition of J and X is roughly a constant times the entropy attributable to J.
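The lambda-uniform condition is simple to state in code. This is a sketch in which probs[i-1] stands for P(X = i):

```python
def is_lambda_uniform(probs, lam, J):
    """True if P(X = i) >= lam / J for every i in 1..J, where
    probs[i-1] = P(X = i).  Note X may also put mass outside 1..J."""
    return all(probs[i] >= lam / J for i in range(J))

print(is_lambda_uniform([0.25, 0.25, 0.25, 0.25], 1.0, 4))    # uniform is 1-uniform
print(is_lambda_uniform([0.4, 0.3, 0.15, 0.15], 0.5, 4))      # tilted, still 1/2-uniform
```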
Okay. So if we were to -- so if we were to choose a position I ->>: [inaudible]
>> Ben Morris: It's right there. Entropy of -- well, I'll write a more mathematical -- okay. So EJ is -- so it's the expected entropy of -- I forget what I'm using for permutations, pi or mu or something. Let's say mu. The entropy of mu inverse of J given mu inverse of J plus 1 up to N.
>>: [inaudible] U is a random distribution [inaudible]?
>> Ben Morris: What I'm doing is I'm saying for any distribution mu, so I'm proving
inequalities about --
>>: So your mu is a distribution; their mu is a permutation.
>> Ben Morris: No. Mu is --
>>: [inaudible]
>> Ben Morris: -- is a permutation everywhere.
>>: Distribution. Distribution [inaudible].
>> Ben Morris: Well, it's a permutation, but I'm using -- when I write entropy of a
random variable, I'm using the random variable as shorthand for its distribution.
>>: [inaudible].
>> Ben Morris: [inaudible] permutation. So what I mean here is the entropy of the
conditional distribution of mu inverse of J -- the conditional distribution of what card you
put in position J given what cards you put below position J.
>>: I'm struggling with something. If both mu [inaudible] uniform random
permutation --
>> Ben Morris: Okay.
>>: -- then UJX would also --
>> Ben Morris: Right. And EJ would be zero because everything's uniform.
>>: [inaudible]
>> Ben Morris: Because it's relative entropy. Right. It's just -- it's this unfortunate thing that entropy and relative entropy are like -- it's minus the entropy in the usual sense of the word entropy.
Okay. So okay. So the upshot is just a general philosophy is that if you take -- so if you
choose a position that's contributing a lot of entropy, collide it with a card that's -- with a
position where the position is roughly uniform over what's higher, you're going to reduce
the relative entropy by a lot, roughly a constant times whatever this was contributing.
Okay.
Wow. Okay. Okay. Oh, wow, I can't even do my trick of covering up here. So okay.
So the general philosophy is, okay, you take a -- you have a position [inaudible] you
collide it with something higher in the deck and this gives you a nice loss of entropy.
Now, but in the shuffles that I've been discussing, there are only local moves. So in one
step you're not colliding a card with something that's going to be roughly uniform over
everything that's higher. So what you have to do is you have to look at a bunch of moves.
So maybe -- so let's say it's this picture. We have this position J that's contributing a lot
of entropy. What we want to do is we want to, say, look at a thousand moves into the
future and pay attention to what this card is.
So let's say we run -- we do a thousand shuffles. Then if in that case the distribution of
what this cards collides with is roughly uniform over what was higher in the deck when
we started, then we're still going to get this nice loss in relative entropy.
So here's how the theorem goes. So the theorem says: let big T be a random variable less than or equal to little t. So this is -- so big T is going to be like how many shuffles we're going to do before we pay attention to the next collision. So for each -- for each card I we're going to say that the match of I equals J if J is the first card to collide with I after time T, and vice versa. So we run -- so we do T shuffles. And then after this we start watching
the shuffling and we start matching cards up with each other. So we do -- we do T
shuffles. And then the next two cards that collide are matched with each other, and then
the next two cards that collide are matched with each other unless one of them has
already been matched with something else and so on.
Okay. And if a card never matches with anything, we say it's matched with itself. Okay.
Now suppose that for every K the random variable M of K is lambda K uniform over 1, 2, 3 up to K. Okay. So every card K is matched with something that's roughly uniform over the cards above it, where roughly is quantified by the number lambda K. Then for every permutation mu, the loss in entropy when you multiply by pi 1, pi 2, up to pi T is at least the sum over K of lambda K over log N times the entropy attributable to position K.
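Written out, this is my reading of the theorem statement, with an unspecified constant c > 0 hedged in, since the spoken version folds the constant into "at least":

```latex
% If for every k the match M(k) is \lambda_k-uniform over \{1,\dots,k\},
% then for every starting permutation distribution \mu:
\[
  \mathrm{ENT}(\mu) \;-\; \mathbb{E}\,\mathrm{ENT}\bigl(\mu\,\pi_1\pi_2\cdots\pi_T\bigr)
  \;\ge\; \sum_{k} \frac{c\,\lambda_k}{\log n}\, E_k ,
\]
% where E_k is the relative entropy attributable to position k.
```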
>>: So with this you're always assuming that the card is transposed -- this collides with
something that's higher in the deck?
>> Ben Morris: No. The definition of the match is just the next card that it collides with
after time T. So that could be -- so for one of the -- so if 3 and 7 collide, then the match
of 3 is 7 and the match of 7 is 3, and for only one of those two will it have collided with
something higher in the deck.
>>: Right. But here you're assuming that it's lambda K over [inaudible]. Does that mean
that --
>> Ben Morris: Oh, oh, oh. Oh. It does not -- okay. That's a good point. When I say that a distribution is lambda K uniform over 1, 2, 3 up to K, I don't require that the distribution be concentrated on 1, 2, 3 up to K.
>>: But if K is small, then it means that these are getting a lot of -- these are getting
constant fraction of the mass, a total --
>>: [inaudible] lambda K is --
>> Ben Morris: Right. But, yeah, but, you know, in order for us to be getting something
good out of it, it's going to have to be -- right, right. So right. So in particular when it's
small, we're not going to -- this value of T is going to have to be small. Right. The value
of T is tuned according to which positions are contributing the most entropy. If we
have -- if say the entropy is -- say it was this -- I'm jumping ahead because I'm now -- I'm
talking about how to use the theorem. But if it was all zeros and say, you know, 5 5, then
we would want to -- we would want this position, the match of this to be roughly uniform
over what's above it. So we would only want to do a small number of shuffles.
Whereas if we were trying to get -- if we had a 5 down here and these were zeros, then
this guy would have to be almost uniform. So we'd have to do a bunch of shuffles.
Okay. But that's how to use a theorem. Right now I'm just talking about the theorem.
Okay.
>>: When you're giving us the long sequence of little M over there, it's hard to focus on
like where you're going.
>> Ben Morris: Okay. Well, yeah. So the -- it's actually -- okay. The level of
complexity of the talk, it's -- today it's like this. And then but in the next one it's
actually -- it's like this. So I don't want to scare everybody -- I mean, I hope people come
back for Thursday because things get better.
>>: The problem is not the complexity, it's about the division of the steps.
>> Ben Morris: Okay.
>>: Why don't you tell us how you use it, then, or is that the next one?
>> Ben Morris: That's going to be on Thursday. Right. I'm going to --
>>: [inaudible]
>> Ben Morris: I mean, yeah. I just want to get -- I just want to get to the proof of this,
and then I'll get to the -- okay.
>>: [inaudible] have the cookies and other things after the talk.
>>: Can you go back to the statement of the theorem?
>> Ben Morris: Yes. Okay.
>>: Could you at least tell us what were -- I mean, you will choose these lambda Ks,
right?
>> Ben Morris: I will -- pardon?
>>: I agree that this [inaudible] lambda K, but what do you -- like what kind of values
should I think about lambda Ks?
>> Ben Morris: Constant.
>>: They're all K?
>> Ben Morris: Well, it's going to be constant for the region. So what we want to do is
we're going to have this decomposition of relative entropy. And so let me draw it as a
graph. So let's say -- so here's position 1, 2, 3 up to N. And there's going to be some -- say
that the entropies are -- the relative entropies are contributed in this, according to this
graph. Then we're going to want to choose our T. T is -- this random variable T is what
we have control over. We're going to choose our random variable T such that the
lambdas in this fat part of the distribution are big. So we want to -- you know, the
lambda is also to roughly -- you know, we want to sort of have big lambdas in here if
there's a bump in the relative entropies there.
>>: [inaudible] will you just apply this with mu, which is like a constant, a delta, delta
distribution?
>> Ben Morris: No. Mu is -- this is -- mu is just some arbitrary random permutation.
>>: Well, you started [inaudible] right and then you do a bunch of things and you get
some entropy loss, relative entropy loss like here. Now [inaudible] new mu, then you do
it again. Right?
>> Ben Morris: Right. Exactly.
>>: [inaudible]
>> Ben Morris: Oh, well, okay, right. Russell's right. Mu is the distribution after time T
of your -- pardon?
>>: After some [inaudible].
>> Ben Morris: Okay. Good.
>>: [inaudible]
>> Ben Morris: Okay. I mean -- all right. I shall go on.
>>: Everything's on a need-to-know basis.
>> Ben Morris: I thought I was doing good.
>>: [inaudible]
>> Ben Morris: Okay. So let me prove the theorem. So you have -- let me introduce the notation pi superscript T for the product of pi 1, pi 2, up to pi T. Okay. So okay. So
think about again in terms of the dealer doing Three Card Monte, he is -- okay. So he
may move the cards around on the table. But the real action happens when he does these
collisions. So we want to kind of ignore all this showy stuff that he's doing with his
hands and just focus on the collisions.
So what we want to observe is that if I and J collide at some point, then -- so suppose we
know that at some point I will collide with J, we could just initially do a collision of I and
J. And it doesn't change the distribution. So --
>>: [inaudible] well, I mean, it depends on what you do with I and -- other things that
you do with I and J.
>>: [inaudible]
>>: Oh, well, yeah. Right.
>> Ben Morris: So like if I know that the dealer is going to trade -- you know, collide
these two cards, I can just step in and just do that initially and it doesn't change the
distribution.
So that means that pi T has the same distribution as the product of -- of C of I, M of I, times pi T. So remember M of I is the card that's matched with I. So if I go in beforehand and for every I, I collide it with its match initially --
>>: [inaudible] after some random variable [inaudible].
>> Ben Morris: Um-hmm. Um-hmm. So in particular we know that at some point I and M of I will collide. So if I just go in ahead of time and do a collision, I don't change
the distribution.
Okay. Now, the point is I'm trying to just -- what I really want to do is scratch out this pi
T and just focus on the product of collisions.
>>: The pi T on the right is the same pi the as on the left?
>>: In distribution.
>>: Oh, I see, because it's at one half, one half.
>> Ben Morris: Okay. So I'm trying to bound the entropy of mu times pi T. That's what
this whole thing is about. This is the entropy of mu times the product of collisions times
pi T. Okay.
So what I would like to do is just kind of scratch -- what I kind of -- loosely what I want
to say is this multiplication by pi T can only help us, so I can just kind of cross it off and
end up with mu times this product of collisions.
But we have to be careful because pi T depends on this product of collisions because
what is this product of collisions, it's a collision of I and a match of I, and how do you
know what the match of I is? It's based on pi T. Okay. So but the way we get around
that is we condition on what gets matched with what. Okay. So let M be the collection
of what gets matched with what. Then if we -- conditioning on M can only increase the
relative entropy, okay, because, you know, the relative -- by Jensen's inequality, the
relative entropy of a weighted average is at most the weighted average of the relative
entropies.
So let's condition on pi T and M. So I end up with the same expression as here, only
now I'm conditioning on pi T and M. Okay. Now, since I'm conditioning on pi T, now I
can cross out this pi of T. Because if you condition on something, it's not random
anymore. It's just you're just multiplying at the end by some -- a deterministic
permutation, so you don't change the entropy. Okay. So I can cross this out. So I have
now mu times the product of the collisions. Okay. The entropy of that given pi T and M.
But conditional on M, mu and the product of the collisions are independent of pi T.
Because the only way that pi T affects the distribution of the product of collisions is that
it tells you what gets matched with what. But if we condition on that, then they become
independent. So now I can cross out pi T to the right of the conditioning line. So I end
up with the expected value of the entropy of mu times this product of collisions given
what gets matched with what.
Okay. So now -- so I really can just ignore all this stuff that doesn't involve the collisions
and ask myself, well, how much does -- by -- what is the entropy loss when I start with
mu, and then I multiply by all these collisions.
Okay. Okay. So I have -- okay. So you start with mu, you multiply by all these
collisions, you know, the collision of I and its match. So what we're going to do is we're
going to let nu K be the product of all collisions where both -- okay, M of I is less than or equal to I and I is less than or equal to K. So this is collisions where both cards involved are at most K from the top. Okay.
So then we can write the entropy of mu times nu N, which is what we're interested in, because nu N is just the whole collection of collisions. The difference between that and the entropy of mu, we write as a telescoping sum of the differences between the entropy of mu times nu K and the entropy of mu times nu K minus 1. Okay.
So we want to figure out, well, what is the difference in relative entropy when we go
from allowing just stuff at a distance K minus 1 from the top to allowing possibly a
distance K from the top.
Okay. Well, if card -- if K is matched with something below K in the deck, then it's not going to be -- then nu K minus 1 is equal to nu K, because, well, nu K only allows stuff within distance K from the top. So if you have something below, it's not going to be allowed either way. But if the match of K is less than or equal to K, then nu K is equal to nu K minus 1 times this additional collision of K and its match.
So what is the loss in entropy when we take this permutation mu times nu K minus 1 and we multiply it by a collision of K and its match, which, remember, is roughly uniform over stuff higher in the deck?
Well, I'm going to cheat a little bit here, but if nobody notices how I'm cheating, I'm not going to bring it up. Okay. So remember we had this lemma that said if you start with a permutation and you multiply it by a collision between K and something roughly uniform over higher in the deck, you get a loss in entropy which is just the contribution to the entropy of that position. So the loss in entropy, when we look at the difference between mu times nu K and mu times nu K minus 1, is roughly a constant times the entropy attributable to position K.
>>: Oh, the measure of mu times nu K minus 1.
>> Ben Morris: That was one of the ways in which I cheated. Right. But it's not cheating, because the entropy attributable to position K of mu times nu K minus 1 is the same as the entropy attributable to position K of mu. Remember, nu K minus 1 only affects stuff that's within distance K minus 1 of the top of the deck. So if we're dealing out cards from the bottom of the deck, we can't tell the difference between those two permutations before we get to the top K minus 1. So this EK is the same as the other EK. And there's another way I'm cheating, but since I'm at the end of the talk --
>>: [inaudible]
>> Ben Morris: Um...
>>: Condition on M [inaudible] it is because --
[multiple people speaking at once]
>>: This EK should be -- it's a conditional probability -- a conditional expectation given M.
>> Ben Morris: Given M, right. [inaudible] Right. So I guess that's the other way I'm cheating: this mu times nu K minus 1 has information about what gets matched with what. So when we condition on M, we actually affect this.
But it turns out that if K is matched with something -- say K is 10 and it's matched with 2 -- then this nu K minus 1 does not affect what happens in positions 10 or 2. So it turns out that when you sit down and do the calculation, it's exactly like it was in that lemma.
So I was cheating a little bit. But when you write down the distance between mu times nu K minus 1 and mu times nu K minus 1 times the collision of K and M of K, you get an expression that's exactly the same -- you just get a sum over all the different possibilities of the distance between what's in location K and what's in location M of K, which is...
>>: What are the other [inaudible].
>> Ben Morris: No, it's just those two things. I said the lemma gives this, but it's not exactly a consequence of the lemma. I should say instead that it's analogous: by a calculation similar to the one in the lemma, we get this. But if I went through it all, I think I would probably confuse people more than -- all right.
So let me end this part of the talk. And then on Thursday I'll show how to use the theorem to get the bounds on those two examples, and that's actually kind of straightforward.
>>: Interesting that you get the max of two expressions in the L-reversal.
>> Ben Morris: Yeah.
>>: Very natural to get in the lower bound because there are two different arguments in
the lower bound.
>> Ben Morris: Right.
>>: [inaudible] and upper bound, it's strange.
>> Ben Morris: It is. Yeah, it is.
>>: You're supposed to say no [inaudible] now, you're not supposed to say it's strange.
>> Ben Morris: Well, I'm going to talk about that on Thursday -- I promise I'll talk about where the max comes from on Thursday.
>>: And you said you were going to point out where you use the log [inaudible] -- is that the log there?
>> Ben Morris: Oh, okay. So one of the logs comes from the comparison between the distance from uniform and the relative entropy -- remember, there was a log there. And then it turns out in both examples the way to do it is to divide the relative entropy up into these geometrically growing intervals: an interval of constant length, and then double that, double that, double that and so on. And then you choose the interval of positions that contributes the highest amount of entropy, and you choose your random variable T with that interval in mind -- you know, make the lambdas high for that interval. So since you're only kind of --
>>: It would be completely independent [inaudible]?
>> Ben Morris: T will depend on mu. But, anyway, since you're only optimizing for this one interval -- there are going to be something like log N intervals, and you're only doing something good for one of them -- you get another factor of log N there.
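[The pigeonhole step behind that factor of log N can be sketched with made-up numbers. This is a hypothetical illustration: the per-position entropy profile below is invented for the demo, not taken from either example.]

```python
import math

# Hypothetical illustration of one log N factor: split positions 1..N into
# geometrically growing intervals (lengths 1, 2, 4, ...), so there are about
# log2(N) of them; by pigeonhole, the interval contributing the most entropy
# carries at least a 1/(number of intervals) fraction of the total.

def dyadic_intervals(n):
    """Cover 1..n by intervals of lengths 1, 2, 4, ... (last one truncated)."""
    out, lo, length = [], 1, 1
    while lo <= n:
        hi = min(lo + length - 1, n)
        out.append((lo, hi))
        lo, length = hi + 1, 2 * length
    return out

n = 1024
intervals = dyadic_intervals(n)               # about log2(n) + 1 intervals
entropy = [1.0 / k for k in range(1, n + 1)]  # toy per-position contributions
total = sum(entropy)
best = max(sum(entropy[lo - 1:hi]) for lo, hi in intervals)

print(f"{len(intervals)} intervals; best interval holds "
      f"{best / total:.3f} of the entropy (pigeonhole floor: {1 / len(intervals):.3f})")
```

Since only the best interval is optimized for, the other roughly log N intervals' worth of entropy is given up, which is the extra log N.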
>>: Is the L-reversal like -- so is this conjecture known if L is constant [inaudible]?
>> Ben Morris: Yes.
>>: [inaudible] and for any constant [inaudible].
>> Ben Morris: And for L close to N it's known as well.
>>: [inaudible] power of N. Okay.
[applause]