>> Yuval Peres: Okay. So we're very happy to have Paul Cuff back from
Stanford, and he'll tell us about investigating the fundamental network burden of
distributed cooperation.
>> Paul Cuff: All right. Thank you. Well, it's good to be back. I'm going to talk
today about the communication requirements to set up cooperative behavior and
the examples we'll use will be a computer network where you want to distribute
some computation among the computers in the network. Many examples could
apply. Could be talking about UAVs coordinating their flight patterns or
something. But we'll look at a very simple model, just to see how information
distributed tools can be used to analyze the communication requirements in
some of these settings.
And in the first half of my talk, the goal will just be to distribute
these different computations. The second half will be looking at an adversarial
setting and what changes, trying to relate it to encryption if there's an adversary.
And in between we'll do a short break. Since it is the third group, I have to do
something on the white board. Okay.
So this is an artist's rendition of a data center. Let's say this is one of Microsoft's
data centers around the world where they're doing some cloud computing or
something. In fact, let's just -- so these are a bunch of computers in racks
somewhere, all connected together with some cables. Let's say some
question, maybe a search request comes in, into a buffer. And so you buffer up
all these requests and then they're distributed out to the computers in the
network.
So I assume that Microsoft probably does it similar to how other companies do
their search where a small portion of the search is done on various computers so
they can quickly get the results and send them back. And so the search request
would be farmed out to these different computers, and they'd all send back the
results and then you could send it back to the person who searched for it.
So the question is, in a setting like this, what are the requirements, how much
needs to be sent across these network cables in order to get the various parts of
the task done at different computers? Let's look at a -- we'll look at a really
simple model. We have computers and some sort of communication setting, and
each one of these boxes represents a computer.
>>: [inaudible].
>> Paul Cuff: No. Did I say they're assigning? They're assigning tasks. And
what happens is some of the computers will be given -- so the tasks will just be
numbered from some set. They all know what the number corresponds to as far
as what computation they have to do if they're assigned number
three, and so forth. And some of the computers are given their assignments and
other computers get to choose their assignment based on the communication
they receive. The goal is that no one -- no two computers do the same task,
okay?
So we'll look at sort of a cascade network and the network that you might call star
network as examples of how we might analyze this. Okay. So what do I mean
by assigning tasks? Well, let's start with the two node case just so we -- we're on
the same page, okay. Here's a computer who gets a task assigned. In this case
there will only be two tasks, task number one and number two. So two different
parts of the computation. This computer is assigned to do tasks -- one of the two
randomly. This computer then needs to pick the other task, okay. What -- how
many bits must be sent through this network in order to achieve this? This is
not too tricky. Anyone want to venture a guess?
>>: One bit.
>> Paul Cuff: One bit, right. You would say which task you're doing, they'd pick
the other one. Okay. Suppose that there are more tasks. Same problem,
though. This computer just needs to pick a different task. How many bits might
you need now in this network?
>>: [inaudible] one task at a time [inaudible] two tasks.
>> Paul Cuff: What was that?
>>: You mean to assign one task to each computer or [inaudible].
>> Paul Cuff: No, just one to each computer. So they just each need to do one
task different from the other one out of the [inaudible] that's automatic.
>>: And you have just two computers here.
>> Paul Cuff: Just two computers in this one.
>>: [inaudible].
>> Paul Cuff: Okay. Less than one, right? I was trying to trick you into saying
log K, right? If you wanted to say what task this is, you would need about log K,
log base 2 of K, bits, to say what task it is and they could pick a different one. But
you certainly don't need more than one bit.
One way to do that is just divide the tasks in half and use your bit to say whether
you're in the first half or the second half, and they pick from the other half.
But like you've all said, you can do much less than one bit. Let's see what we
mean by doing less than one bit. Okay.
So a common -- we'll use the common information theory assumption that we're
actually buffering a lot of tasks, and at each time [inaudible] we have the same
problem where we want the other computer to pick a different task but from time
to time, it's a new independent problem, we just want an independent assignment
of tasks that are different, okay? So we have a long buffer coming in and we
want to solve the problem, but we're allowed now to -- notice that I said bits per
task. We're allowed now to save up our bits, okay, and use them all at once. So
for example, let's let K equal five for this example. So the tasks are one, two,
three, four, five. There's a sequence of independent task assignments coming
into the first computer. And he might take maybe eight of them. And let's see
what -- suppose he's -- suppose the rate we're trying to use is one-fourth, okay.
Then with eight symbols you now have two bits to work with, okay? So you,
based on this task assignment you will -- the encoder, the first computer will send
two bits to the other computer and that computer will have four possible received
messages. Each message will correspond with a sequence. So it's a function
from these bits to a sequence of tasks for computer two. Right? So this we'll
call the code -- a code word out of this entire thing, which is a code book. And
the idea is if this was the code book, then the encoder would look at this
sequence and say well, there's a four and there's a four, the second sequence
certainly doesn't work, okay, so I'm not going to send the message 01, but we
see if we compare every bit with this first run, they're different, different, different,
different, different, different, different. Okay. It works, right? So you can send
the message 00, the decoder will then know the sequence of actions, and you will have
accomplished the goal.
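(Editor's note: a minimal Python sketch of the codebook search just described. The specific codebook below is randomly generated and purely illustrative, not from the talk; real constructions would pick the codebook to make errors rare.)

```python
import random

K, n = 5, 8                      # 5 tasks, blocks of 8 assignments
num_msgs = 2 ** 2                # rate 1/4: two bits per block of 8, so 4 codewords

random.seed(0)
# Codebook: each 2-bit message indexes a fixed sequence of tasks for computer 2.
codebook = [[random.randrange(1, K + 1) for _ in range(n)]
            for _ in range(num_msgs)]

def encode(x_seq):
    """Return a message whose codeword differs from x_seq in every position,
    or None if this codebook fails on this source sequence (an 'error')."""
    for msg, y_seq in enumerate(codebook):
        if all(x != y for x, y in zip(x_seq, y_seq)):
            return msg
    return None

x = [random.randrange(1, K + 1) for _ in range(n)]   # incoming task assignments
msg = encode(x)
if msg is not None:   # with longer blocks and good codebooks, errors become rare
    assert all(a != b for a, b in zip(x, codebook[msg]))
```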
So for each code book there's a probability of error. There are some sequences
where you won't be able to find a message to send such that it will be different in
every place, right? That's the probability of error for a given code book. What we say
is that a rate is achievable -- here we were looking at rate one-fourth -- a
rate is achievable if, no matter what probability of error is demanded, let's say one
in a billion, you can find a buffer size N and a code book that gives you less than
that probability of error. And it's not achievable if no
matter how long you look you'll always have some error. Okay?
>>: Are you always just assuming independent, uniform assignments?
>> Paul Cuff: Yes. For this problem. These -- I mean, the same analysis would
work for other things but for these problems, we'll always assume that. So the
problem for the two computer case is solved with rate distortion theory --
>>: What?
>> Paul Cuff: Rate distortion theory. And so let's see what the answer is. The
minimum rate needed is the mutual information between X and Y. Now, X and Y
are random variables. Well, where X -- the distribution of X is given by the
problem, uniform one through K. You get to pick the distribution P of Y given X
to minimize this quantity, but the constraint is that X and Y cannot be equal with
probability one, okay. So this optimization problem will tell you the minimum rate
needed in that. And mutual information is the sum of the entropy of X plus the
entropy of Y minus their joint entropy, and entropy is this expected value.
So let's just work out what's going to be the minimum one for this problem. I
think I'll do it over here. Let me propose a P of Y given X. Let's say -- I'll just --
I'll give you the P of Y given X that minimizes this and then we'll show it actually
does, okay? Let's let Y be uniform over all the choices not equal to X, okay? So
then we have I of X and Y equals -- H of X, which is log K, right, plus H of Y, which is
log K, minus H of XY. Now, (X, Y) is a sequence of two non-equal numbers
out of K, so here we have log of K times K minus one, the entropy of XY. Okay. And
so that equals log of K over K minus one. Okay?
Now, is that the lowest -- well, let's write mutual information another way. H of X
minus H of X given Y. This is another valid way to write mutual information. This
is fixed by the problem. We have no choice in this, so this is going to be log K.
>>: [inaudible].
>> Paul Cuff: Yes. Here I was doing it the first way of spanning that's on the
slide.
>>: Why eventually is uniform [inaudible].
>> Paul Cuff: Yes. Yes. It is when you marginalize out X. So then here we
want to minimize this, so we want to maximize this, so what's the most that it can
be? Well, H of X given Y is less than or equal to -- we know that X is not equal to Y,
so this only has positive mass on K minus one entries, so it's at most log of K
minus one, right? So there we go. This choice minimizes the mutual information,
right? Okay. So as you've all said, it becomes less than one bit. If K is two, then
it's one bit, right, and it goes down. This is approximately one over K bits.
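(Editor's note: an illustrative check of the numbers on the board, not from the talk; the large-K comparison against log2(e)/K is an added approximation.)

```python
from math import log2, e

def min_rate(K):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) with X uniform on K tasks and
    Y uniform over the K - 1 tasks not equal to X."""
    return log2(K) + log2(K) - log2(K * (K - 1))   # = log2(K / (K - 1))

assert abs(min_rate(2) - 1.0) < 1e-12              # K = 2: exactly one bit
assert abs(min_rate(5) - log2(5 / 4)) < 1e-12      # K = 5: about 0.32 bits per task
# For large K the rate behaves like log2(e) / K, i.e. on the order of 1/K bits:
assert abs(min_rate(1000) - log2(e) / 1000) < 1e-5
```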
Okay. So now let's look at it in this sort of cascade network. Now there are K
computers total. The first one is given an assignment, okay. The next K minus
one computers all get to choose their tasks based on these communications
done kind of in a daisy chain like this. So now what rates are needed so they
can all choose different tasks? They're going to each -- all the tasks are going to
get assigned in this problem.
In this case, we know the minimum rates. And it's done in the following way.
You first assign the last computer a task using the rate we just talked about, log
K over K minus one. Then every -- since that communication went through all
the links, everyone knows what that task assignment was. So now all the
remaining computers kind of reduce the set of tasks by one. They say oh, the
last computer is doing task five, so none of us are going to do five.
>>: [inaudible]. K task?
>> Paul Cuff: Yes.
>>: [inaudible].
>> Paul Cuff: Yes. Certainly across this link here you're giving all the
information. But not necessarily across these, right, because -- so --
>>: [inaudible]. [brief talking over].
>>: [inaudible].
>> Paul Cuff: Yes.
>>: [inaudible].
>> Paul Cuff: Yes. Exactly. Exactly.
>>: [inaudible].
>> Paul Cuff: Yeah. So optimally what you do is you just send the last guy his
assignment, and since everyone else heard it they'll reduce the problem size by
one and you'll peel off the next one, but now it's changing K to K minus one,
okay. So this, the way you prove that this is optimal -- well, first of all, let's look at
how much rate on each link. This link only has this communication. But the next
one back, RK minus two is the sum of these, right, and so forth. So what we get
is that RI is just -- when you do this summing up, you get log of K over I. Okay?
And what we can do is we can show that this is actually a lower bound
individually for each link just by looking at the mutual information between X and
everything past it, which can now be unordered. We don't care about the order.
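(Editor's note: an illustrative sketch of the per-link rates R_i = log2(K/i) just derived, plus the sum-rate calculation the talk turns to next; the code is an editorial addition.)

```python
from math import log2, e

def cascade_rates(K):
    """Per-link rates in the cascade: link i (i = 1..K-1) carries the peeled-off
    assignments for everyone past it, at rate R_i = log2(K / i)."""
    return [log2(K / i) for i in range(1, K)]

rates = cascade_rates(5)
assert abs(rates[0] - log2(5)) < 1e-12        # first link: log2(K) bits per task
assert abs(rates[-1] - log2(5 / 4)) < 1e-12   # last link: the two-computer rate

# Sum rate = K*log2(K) - log2(K!); Stirling's K! ~ (K/e)^K makes this ~ K*log2(e).
assert abs(sum(cascade_rates(1000)) / 1000 - log2(e)) < 0.01
```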
Okay. Now, let's look at the sum rate in this network. At first glance, okay, I
cheated and gave the answer away already. Okay. At first glance, this -- it's a
pretty bad network in the sense that the message you send to a computer has to
go through many links. So it may look like the communication scales with maybe
the square of the number of computers when you add them all up, because
the communication to any computer passes through a number of links
that's linear in the size of the network, right?
But when you add the log K over I up, take out the K log K, and notice that the
sum of log I is log K factorial, which can be approximated by K over E, to the K.
Now, the K to the K cancels this, and you end up with a K log E. All the logs
we're using are log base two, okay. So we're using log base two so we can call
it bits. You can use any log you want, but no one works -- I don't know a
computer that uses nats for their base arithmetic.
Okay. So it's linear in K, okay? So that's kind of nice. So you know, log two of E
is one point something. Anyone? No?
>>: [inaudible] you can just do the usual one that [inaudible] so far so that it
matches the same bound.
>> Paul Cuff: If they --
>>: I mean, [inaudible] is gone so far [inaudible].
>> Paul Cuff: Oh, I see. So you're saying tell this guy what you -- well, okay, but
you -- no, you don't want to tell him a set. Oh, yeah, you do. Yeah, you tell him
what you have so far, right. But that would suggest that these rates would go up,
right, because then here you would have to tell him these two assignments. If
you did it that way, where you like, you say I've got assignment five and the next
computer picks his task, he picks --
>>: [inaudible] which is typically why he just tells what --
>> Paul Cuff: Yeah, you just tell the set. So you get rid of the ordering?
>>: Yes.
>> Paul Cuff: Yes. Still that would go up. It will still go up because you would
get K choose two, log of K choose two here, right?
>>: But that's the [inaudible].
>>: [inaudible] [brief talking over].
>>: You do actually send because even in the two computer --
>> Paul Cuff: Okay. Yeah. See, the trick with the two computer case is you
can't understand it by looking at it one letter at a time, because there's no
message about just that one symbol. So in -- you just are identifying a sequence
that works.
>>: No, but those two [inaudible] because they are clear -- because [inaudible]
the number of computers in K.
>>: [inaudible].
>>: You still have to know all this first very long.
>> Paul Cuff: Yes, this thing here.
>>: And it [inaudible] is saying it reads something complicated that says you wait
for a long --
>> Paul Cuff: Yeah. And then you just tell them a sequence that works.
>>: But that's the -- I mean, I'm not saying that you should --
>> Paul Cuff: Yeah.
>>: [inaudible] the information you would need to convey on every edge that
makes it every level [inaudible].
>> Paul Cuff: So if we look at our -- I think that the key is you're looking at the
second rate and saying okay, how do we get this smaller. Because this is
smaller than this, right?
>>: I'm not saying smaller, I'm saying this is the attention information [brief
talking over].
>>: Getting something more --
>> Paul Cuff: Yeah. Because what I'm doing is I'm assigning, basically if you
look at the message on this rate, on this link, you're basically assigning all the
rest of the guys unordered without assigning this yet. It's like if you already
assigned this, then you have to convey more information.
>>: [inaudible]. Those jobs will not be assigned further, so and further all the
jobs will be assigned.
>> Paul Cuff: Yeah.
>>: Because these guys are different. So essentially whatever [inaudible] you
are doing, this and exactly this information is passed on this [inaudible].
>>: No. Because even -- because [inaudible] the last one [brief talking over].
>>: The last guy what your sequence is.
>>: Oh, so you are not assigning this guy and then --
>> Paul Cuff: No, they are in reverse order. And it does save the rate
that way. So --
>>: But you can't think of it one letter at a time, right?
>> Paul Cuff: No. You have to think of it -- you have to buffer it up to really get
these rates. Yeah. So this is -- so you get log of -- you get basically log base
two of E for each computer on the network.
Okay. Let's look at a --
>>: [inaudible]. You need to do it sequentially. Or can you do it -- pipeline? I
mean will you need to wait for --
>> Paul Cuff: The idea is you need to wait --
>>: [inaudible].
>> Paul Cuff: Okay. You will always be behind a little bit or something. Like you
would have to know your task assignments ahead of time. The question is how
do you actually like -- well, the question I've made out of this is how do you
actually do a block coding with task assignment? You -- the first computer would
have to know his task assignments ahead of time, and after it's waited for block
of a hundred, it would -- it would send at a lower rate for the next hundred that it's
receiving, it's receiving the next buffer, and it would be using that time to send the
information from the previous block. And then they would all execute.
>>: [inaudible]. I mean the information transferring through the network. Say
you were already --
>> Paul Cuff: Oh, I see. Because you're like -- you now know what the messages
are, but you have to send here and then send here and then send here, with the
delay of going down the network. Yeah. So I haven't considered that yet --
if you count the delay for each link, then that would affect things.
But of course you could pipeline like you were saying. Yeah, you could pipeline
this. Okay. So very similar, but what we're going to do is say the first K minus
one tasks are assigned, okay. Some -- for some reason they know that their task
assignments are rigid and you only want to know the last one. Now, really the idea
would be that some of the tasks are assigned and some
aren't, and you want to know what the answer is. But we're looking at the simple cases first. This
case actually turns out to be not very simple.
So in this network, the last computer, the task that he does is fixed. Like it's the
task that all these guys are not assigned, right? So there's no flexibility here. He
just has to know enough information from the rest of the computers in order to
figure out which task it is that's not being done. Okay? But so we don't know the
actual optimal answer here. But let me propose one idea. It's like Kamal was
saying. Tell -- send here the information about what task you're doing. That has
to be done. We know that that must get across somehow. You save a little bit
because you use the side information that this guy knows one task is eliminated
and you do [inaudible] coding. So you save a little bit of rate. And the rate
needed there is log of K minus one, okay?
Then here, this person, he looks at this computer looks at both these tasks and
gets rid of the ordering because that doesn't matter, so saves a little bit -- saves
one bit by getting rid of the ordering, but basically transmits this set to the next
one and they keep going so forth, like that. And then the rate you would need at
each one is log of K minus one choose I. Okay?
Now, this is kind of a strange scheme because it's totally different than the
network before. Now we get rates, here's a -- consider this a graph of the rates
and the network. They peak in the middle, right? So you need really high rates
in the middle and not very high at the edges. Plus all of these rates are already
very high. I mean, this is -- this is worse than log K for, you know, log K minus
one for every link. Now, this -- so this is one scheme that seems reasonable. It's
not -- we could also go through and minimize each rate, each link individually,
and you'll find that this does not -- this is not necessarily optimal. There's a
whole trade-off. If you allow for higher rates on some links, you can then get
away with small rates on the other links, okay?
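(Editor's note: an illustrative sketch of the set-forwarding rates just described, added editorially; it assumes link i carries the unordered set of the first i assignments, at rate log2 of K minus one choose i.)

```python
from math import comb, log2

def set_forwarding_rates(K):
    """Rates for the scheme above: link i (i = 1..K-2) forwards the unordered
    set of the first i assigned tasks, at rate log2(C(K-1, i))."""
    return [log2(comb(K - 1, i)) for i in range(1, K - 1)]

rates = set_forwarding_rates(9)
assert max(rates) == log2(comb(8, 4))     # rates peak in the middle of the chain
assert abs(rates[0] - log2(8)) < 1e-12    # first link: log2(K - 1) bits
```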
>>: So you could get an equal rate too -- transmit the sum of the numbers? The
last one would just compute K times K plus one over two minus the sum of the
previous numbers, then the rate will be fixed.
>> Paul Cuff: Interesting. Okay. So you do like the mod K sum or even just the
real sum. Just the real sum. Mod K, fine. Either one. Interesting. I wonder
what rates we'd get in that case. That's nice. That's nice.
>>: What's the [inaudible]?
>> Paul Cuff: Is this. This is an achievable point, okay? But it's a whole region,
right? It's a high dimensional region because you could always trade off rate one
for rate five in -- but -- so in the first case, there was an optimal point. The region
was rectangular, and there was one optimal point that summarized everything.
In this case, there's a lot of tradeoffs, and this is one achievable region -- point in
the region.
>>: [inaudible].
>> Paul Cuff: Okay. Then yeah. Exactly. So if you're trying to optimize the
sum, I mean that scheme would be interesting to look at, like what does that
take. So that -- that at worst would be like log K for each one. Right. Okay.
>>: So if all these X1 to XK minus one are assigned to [inaudible] communicate
with each other?
>> Paul Cuff: They just need to communicate with Y. But the thing is the only
way this computer can communicate with Y is through the others, okay?
>> Paul Cuff: So we're looking at saying the connections are fixed. If you
change the connections then, yeah, you would want -- maybe, probably you
would just want them each to go there. Although it would still be a challenging
problem actually.
Okay. So if you minimize each rate individually you find that the rate needed on
each link is just log of I plus one. So that actually suggests that might be an
interesting -- no, but this does a little bit better than log K, right? Because for the
first ones you're much smaller, I is smaller than K.
>>: [inaudible].
>> Paul Cuff: So this is not necessarily -- this is not achievable. In fact, I think
this is far from achievable. This is just saying that the smallest rate you need on
the first one is log two. You don't actually need to send the whole thing; what
you would do basically is you say okay, I'm going to
assume that these rates are really high so the last computer knows all of these,
knows all these assignments. If he knew all those assignments he can use that
side information to greatly reduce the rate with which I need to send this, and I
can get away with only one bit, right?
So using those types of bounds you get a lower bound. But this is still -- the sum
is still K log K. So even with these lower bounds you're getting worse than the
other network, which seems kind of funny considering that -- considering that
there's only one task being assigned in this network and the other ones had a lot
of task assignments, right?
>>: So does zero [inaudible] achieve this [inaudible].
>> Paul Cuff: No. No, mine I think would be more like K squared.
>>: [inaudible].
>> Paul Cuff: This one here?
>>: Up to this --
>> Paul Cuff: No, but this one does.
>>: Up to this.
>> Paul Cuff: Exactly. So yeah. So divide it -- yeah, exactly. So up to the
[inaudible]. So let's look at this last one here. So we have this star-shaped
network. And basically what we have here is the guy in the middle gets a task
assigned, okay, and could communicate directly with all of the remaining
computers. And each of those computers gets to choose its task. So you may
try here to use -- go back to the two node, the two computer limit. We knew that
with two computers it was log of K over K minus one, right?
So what if I just used log of K over K minus one over each -- for each of these?
And the -- that doesn't work, but it will guarantee that, for
example, Y2 doesn't equal X and Y1 doesn't equal X, and each of these will be
different than X, but they won't necessarily be different from each other, if you try
to use that communication scheme, right? So what extra needs to be sent to
make sure they're all different from each other? Okay. Well, one idea that
actually works quite well is to -- is the following. You say let each of these
computers have a default value, okay, so this computer will do task one by
default, and this one task two. So they have K minus one defaults and K is not a default
anywhere, right?
So if X gets assigned K, then good, they're all fine. If X gets assigned something
else, like three, then all he has to do is send a message to this computer to say
move out of my way and this computer knows to then go take assignment K,
right? So now what rates are needed? Well, you just have to know how
frequently you will be telling an individual computer to move out of the way -- that's
one over K of the time, right? So sending a bit stream that's one, one over K of the
time, takes a rate of H of one over K, where H is this -- is the binary entropy
function. It's just negative P log P minus one minus P log one minus P. So it's
the entropy of a Bernoulli random variable. It's approximately log K over K, okay.
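(Editor's note: an illustrative sketch of the binary entropy calculation, added editorially; the comparison uses the fact that H(1/K) matches log2(K)/K up to a log2(e)/K term.)

```python
from math import log2

def h2(p):
    """Binary entropy: H(p) = -p*log2(p) - (1 - p)*log2(1 - p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Rate of the "move out of my way" stream, which is 1 a fraction 1/K of the time:
K = 1000
assert abs(h2(1 / K) - log2(K) / K) < 2 / K   # ~ log2(K)/K, up to a log2(e)/K term
```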
So already we're seeing that this scheme is going to be very good, because when
you multiply it by K, you're going to be less than linear, right, for the sum rate in
the entire network. But I have to mention a scheme that works even better than
this. The reason is because the answer's so nice. But let's look at -- see you've
all probably thought I only had one talk that had the golden ratio in it.
So let's look at just the three node case of this network, all right? So this
computer gets assigned a task, one through three, and then these two need to be
different, right? We can make an improvement over this default scheme. I
mean, he could take a default one and default two and we know how that works.
We could do an improvement by first sending an estimate of X to them. And the
reason this is -- okay. So in other words, at a very low rate you send a
guess of what X is that isn't usually right. It's only right slightly more often than it
should be. Okay?
So these two computers have a guess at what X is to work with. Now, they're
going to pick default values centered around that guess, okay? So in other words,
if the guess was two, then this guy's default will be one and his will be three,
okay? Something like this. Now they go with the same original scheme where
this computer then on the second round, so it's like a two stage communication,
first stage is send the estimate, second stage is tell them if they have to move out
of the way.
Now, because of the estimate, they will have to move out of the way less
frequently, right? So there's a trade-off of how good of an estimate you send
of X and, you know, how much do you save from not having to tell them to move
as often. And you optimize that with calculus and you get that actually the
optimal rates with this scheme are log three, which is what it would take to send
X exactly, minus log of the golden ratio. And the golden ratio is the square root
of five plus one over two. Doesn't come up too often in communication problems
unless I'm working on them. But you know, there are golden ratio fanatics
who see it in pine cones and everywhere, right?
>>: [inaudible].
>> Paul Cuff: Exactly. Exactly.
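(Editor's note: a quick numerical check of the claimed rate, added editorially: log2 of 3 minus log2 of the golden ratio comes out near the ".9" total rate quoted later in the talk.)

```python
from math import log2, sqrt

phi = (1 + sqrt(5)) / 2            # golden ratio, (sqrt(5) + 1) / 2 ~ 1.618
rate = log2(3) - log2(phi)         # claimed optimal rate of the two-stage scheme
assert abs(rate - 0.8907) < 1e-3   # ~0.89 bits: the "around .9" quoted later
assert rate < log2(3)              # beats sending X exactly
```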
>>: So [inaudible] somehow [inaudible] something else. I think it's -- I mean, you
wanted to use less bits --
>> Paul Cuff: Yes. Yes. So it's like sending the estimate. The thing is when
you send an estimate of X, basically what you do is you devise a joint -- you use
the same rate distortion type of results, and the rate
that you're going to have to use to send the estimate is I of X and X hat. And
you get to choose P of X hat given X; P of X is fixed and
the receiver essentially is going to get X hat. I mean, it's going to come in the
form of some long sequence and stuff, but essentially it's as if the decoder knows
X hat and you only needed to send it with this much mutual information.
Now, the cool thing is so X the distribution is going to be something like this.
This will be like so one, two, three, this would be something like P of X hat given
that X equals two, okay, would be like, you know, some probability -- some
conditional probability mass function where it's slightly more likely that X hat
equals X, right? But the mutual information is like, you know, your -- when this is
uniform, you're at the peak and so you're quadratically getting penalized based
on what conditional distribution you use. So you're not paying much. In fact,
when you optimize this, the rate -- so this total rate is like .9, around .9. The rate
of sending the estimate is .04. So you know, you hardly use any rate, yet you
get away with reducing this probability here. And so the savings are kind of
linear. It's like the penalty's quadratic, the savings are linear. So that's why it
works.
So okay. So for this part of the talk we looked at these three networks, and the --
what we noticed anyway in the first network, we could analyze it exactly and the
sum rate was linear in K. When we went to the second one, where a lot of the
assignments were given already and you only had to assign the last one, the rates
got worse. We saw that the minimum is kind of K log K, and we were given a
scheme here during the talk that gets that same scaling.
And then in this last network, we don't know the answer, but we do know a
scheme, so we have an upper bound on the minimum rate because we know a
scheme that works.
>>: [inaudible].
>> Paul Cuff: Yes. It's not -- it's not -- there's no lower bound on that to say that
that's actually optimal.
>>: If you [inaudible] you also got --
>> Paul Cuff: Same. Same.
>>: [inaudible].
>> Paul Cuff: Yes. You also got [inaudible].
>>: [inaudible].
>> Paul Cuff: Yeah. So --
>>: And you know [inaudible].
>> Paul Cuff: I don't know that.
>>: So what lower bound is [inaudible].
>> Paul Cuff: Oh, okay. We know that you at least need the two computer limit.
Like if none of the other things were in the network. So log K over K minus one.
But see, that's going to be like constant. All right? I mean, log K over K minus
one times K, how does that scale?
>>: [inaudible].
>> Paul Cuff: What?
>>: Constant.
>> Paul Cuff: It's constant. So the lower -- the lower bound we know would be
constant scaling. Of course log K is already [inaudible]. Okay. Okay. Great.
Time for intermission and since we're taking too long, we don't have that long,
but I want to -- we're going to move into like an adversarial setting, but I thought
since this is the theory group we might enjoy a little puzzling question before we
move on to this part.
Okay. So suppose you have some random variables, two sequences, X1 to XN,
Y1 to YN, I'm going to abbreviate these as X to the N, Y to the N. Okay? I'm not
going to tell you their joint distribution, but I'll tell you something about their
joint distribution, okay? And the -- what we'll want to -- the question will be if we
have some Markov chain with a variable in the middle that separates these entire
sequences, okay, so conditioned on U, the random variable U, the sequence X1
XN is independent, conditionally independent of Y1 to YN, okay, and I'll want to
know something about the cardinality of U. How big does the cardinality of U
have to be?
And it will tell us something about you know what's the connection between XN
and YN, this minimal cardinality we need on U to separate them, okay? So let's
say that cardinality of U equals two -- it's going to be exponential in N, okay, so
let's say two to the NR, okay. So we want to know what R is necessary as N
gets really large and as I give some constraints on the joint distribution of XN,
YN, what R will be necessary. Okay? So the first question -- there will be three --
and the first thing I'll tell you about XN and YN is the
following. When we define a typical set, okay, can we -- so typical set -- this isn't
-- I'm not going to have room here. I'll do it on the other side. Okay.
We have a P naught of X, Y. Don't be confused into thinking that this tells us the
distribution of XN, YN; it's just a joint distribution on X, Y, okay? And the typical set
with respect to that joint distribution, T epsilon N, is the set of XN, YN sequence
pairs such that the absolute value of one over N times the sum over i of the
indicator function that (Xi, Yi) equals (a, b), minus P naught of (a, b), is less than
epsilon for all (a, b).
Okay. So all I'm saying here is that this quantity is the empirical distribution of
XN, YN. If you count up the pairs and look at their distribution, that is very close
to P naught of (a, b), okay? So this set is all sequence pairs whose empirical
distribution is close to P naught. Okay? And I'll call this the typical set. Now, all I'm
going to tell you about XN and YN first is that with high probability, you know,
greater than one minus epsilon, they are in this set, okay?
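Just to make the definition concrete, here's a small sketch (the names and the toy distribution are mine, not from the talk) that checks the typicality condition by comparing the empirical distribution of the pairs against P naught:

```python
from collections import Counter

def is_jointly_typical(xs, ys, p0, eps):
    # Empirical frequency of each pair (a, b) must be within eps of p0(a, b).
    n = len(xs)
    counts = Counter(zip(xs, ys))
    alphabet = set(p0) | set(counts)
    return all(abs(counts.get(ab, 0) / n - p0.get(ab, 0.0)) < eps
               for ab in alphabet)

# Toy joint distribution: uniform over pairs in {0,1} x {0,1}.
p0 = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
balanced = is_jointly_typical([0, 0, 1, 1], [0, 1, 0, 1], p0, eps=0.05)  # True
skewed = is_jointly_typical([0, 0, 0, 0], [0, 0, 0, 0], p0, eps=0.05)    # False
```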
So the first part is simply that I'm saying the joint distribution of these two
sequences is such that with high probability they're typical according to P naught of
(a, b), or P naught of X, Y. Okay? What cardinality do you need for U? Now, this first
one's sort of a trick question, because XN and YN
could just be a deterministic sequence that is in that set. Okay? So if they
were deterministic, then they're already independent, so you don't need anything to separate them.
>>: [inaudible].
>> Paul Cuff: No, given this information, what's the best as far as lowest rate,
okay? So given that information about their joint distribution, what's the lowest -- you can pick the joint distribution, and you can actually pick the
joint distribution even with U, right, to minimize it. Okay? And then you get
that R equals zero, because you can just have a deterministic sequence. Now,
let me add one more thing, okay? So the first one was that XN,
YN are in T epsilon with probability, you know, one minus epsilon, right? Okay. And
we saw that R equals zero. Okay. Next one. Same thing, but also
XN is distributed IID according to P naught of X, the marginal on X. Okay?
So now I'm telling you a little bit more: this is actually IID, and also they are
jointly typical with high probability. Now, what is the minimum U that separates
them?
>>: [inaudible].
>> Paul Cuff: With the marginal distribution of P naught. P naught is the same P naught
that was used to define the typical set. X is also IID with that distribution. I'll just
give these answers. So now the minimum rate needed is the mutual
information between X and Y. Okay? And this relates to rate distortion theory,
because X is like a source of information, U is like a
description of X, and there's no requirement that the reconstruction is actually
random, but you do require that it's jointly typical.
So you have this Markovity naturally from describing X and then
decoding it. So this is just another way of looking at rate distortion theory. Now,
the last one is that -- can you see this? Okay. That XN and YN are IID
according to P naught of X, Y. Okay. So what I'm telling you about the joint
distribution now is that these are actually IID according to P naught of X, Y. Now, you get
to pick a joint distribution with U that's Markov and try to minimize the
cardinality of U. Okay?
And for this one, the answer is not mutual information, although I've known
many people to guess mutual information here. No, the rate is
greater than what I'm going to refer to as the common information between X and
Y. This term, common information, was defined by Wyner. And
what it means is: the common information between X and Y is the minimum
mutual information between the pair (X, Y) and some other variable U, where U
satisfies the Markovity X-U-Y. So this is just a single
optimization problem. You just look over all U that separate X and Y in the Markov
sense X-U-Y. You may think, well, this looks a lot like the problem we were just
looking at, except that this mutual information is very different from the one you
might have guessed is the answer, right? It's the mutual information between the
pair (X, Y) and the U that's in the middle. Okay?
Anyway, so this common information will come up in the work we're doing so I
thought this was the way to introduce it.
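To make the definition concrete, here's a sketch (my own toy example, not from the talk) that evaluates I(X,Y;U) for one particular U satisfying the Markov chain X-U-Y. Wyner's common information is the minimum of this quantity over all such U; this snippet only evaluates a single candidate. For the perfectly correlated case X = Y uniform on {0,1}, taking U = X gives one bit, which is in fact the common information in that case.

```python
import math

def entropy(dist):
    # Shannon entropy in bits of a finite distribution given as {outcome: prob}.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mi_pair_with_u(p_u, p_xy_given_u):
    # I(X,Y;U) = H(X,Y) - H(X,Y|U) for a finite-alphabet decomposition:
    # p_u maps u -> P(u); p_xy_given_u maps u -> {(x, y): P(x, y | u)}.
    p_xy = {}
    for u, pu in p_u.items():
        for xy, p in p_xy_given_u[u].items():
            p_xy[xy] = p_xy.get(xy, 0.0) + pu * p
    h_cond = sum(pu * entropy(p_xy_given_u[u]) for u, pu in p_u.items())
    return entropy(p_xy) - h_cond

# Toy case: X = Y, uniform on {0, 1}; candidate U = X. Given U, both X and Y
# are deterministic, so they are conditionally independent (X-U-Y holds).
p_u = {0: 0.5, 1: 0.5}
p_xy_given_u = {0: {(0, 0): 1.0}, 1: {(1, 1): 1.0}}
rate = mi_pair_with_u(p_u, p_xy_given_u)  # 1.0 bit
```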
>>: [inaudible].
>> Paul Cuff: What?
>>: What is the [inaudible].
>> Paul Cuff: Oh, yeah, yeah, yeah. So U -- okay. So when you do this
optimization you only have to look over U of size, I think, cardinality of X times
cardinality of Y -- the product of the cardinalities, minus one or something like that. Still pretty big. Okay.
>>: [inaudible].
>> Paul Cuff: Okay. So what's this picture of?
>>: [inaudible].
>> Paul Cuff: [inaudible]. Yes. Okay. Great. So let's -- we're going to now talk
about coordination of behavior in an adversarial setting, okay? So what is -- how
would -- let me describe encryption, the basic problem of encryption as I see it,
anyway. You have a source of information that we'll call X, and you want to send
it with a description to a decoder that will then decode X. Okay? But an enemy
sees this communication, and you don't want the enemy to know anything about
your information. So in order to do this, you're not going to have luck in this
setup unless you have a secret key, okay. Now, these two know the secret key.
The enemy doesn't. And this problem was looked at by Shannon and a very
negative result showed that -- yes?
>>: You are [inaudible]?
>> Paul Cuff: Yes. Well, I am taking away all computational limits,
okay? So we'll just look at it from the information theoretic perspective. So, yeah,
clearly this is not how encryption is done in practice, basically
because this result is very negative. It says that the amount of secret
key you need is as much as the information you're sending, and you can never
reuse the secret key, right? So, the one-time pad. The way you do this encryption is
you have a message here, you have the secret key, and you just XOR the key with
the entire message, right, and you get something that's independent of
the message. No matter how long someone sits there and tries to crack it, they'll
be no better off trying to figure out your message. And then you undo it with the
exclusive-or also.
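The one-time pad is simple enough to sketch in a few lines (an illustration only; the names are mine, and `secrets.token_bytes` supplies a uniform key):

```python
import secrets

def one_time_pad(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the corresponding key byte; applying the same
    # key a second time undoes the operation and recovers the input.
    assert len(key) == len(data), "key must be as long as the message"
    return bytes(d ^ k for d, k in zip(data, key))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))    # fresh uniform key, never reused
ciphertext = one_time_pad(message, key)    # statistically independent of message
recovered = one_time_pad(ciphertext, key)  # XOR with the key again undoes it
```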
Okay. So then for the minimum rates needed: if R1 is the communication rate
and R2 is the secret key rate, then R1 equals R2 equals the entropy of X -- that
should be an equals sign; I need to fix this -- the rate needed to describe X without
loss. Okay. So, yeah, let me emphasize again: we'll look at the
information theoretic limits, assuming there are no computational limits. Far from
practice. Although with the advent of quantum key distribution and, you
know, maybe some realistic ways of actually exchanging long keys, then maybe
this will have some relevance coming up.
So, okay. Now, I want to sort of broaden the mind-set of encryption into a game
theoretic setting. Why do you need to keep a secret that someone else can't
discover? Is there a way of modeling this as a game where the person who
might discover your information would use it against you somehow? And if so,
then let's look at it in a game theoretic setting. So here's a simple repeated
game where you have an enemy who can take one of two actions, and here's me,
and I can take one of these two actions, and based on the actions we take, we
get a score. And, you know, I wouldn't necessarily want to take action zero if it
were known I was going to take action zero, because then the enemy
would choose zero, since he would want me to avoid getting two points. And if
I chose one, and he knew I was taking one, he would choose one and give me the
negative one. So it's better actually if I randomize over zero and one just right
and avoid him being able to slap me with the worst penalty, right? Okay. And
there's a score of this game.
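The payoffs aren't fully specified in the talk, so here's a hedged sketch with a hypothetical 2x2 payoff matrix in the same spirit: if the enemy can predict my action he punishes it, so I play the equalizing mixed strategy (assuming an interior equilibrium, i.e. no pure saddle point):

```python
def maximin_mix_2x2(A):
    # Row player's equalizing mixed strategy in a 2x2 zero-sum game.
    # A[i][j] is the row player's payoff; assumes no pure saddle point.
    # Pick p = P(action 0) so the enemy's two responses pay the same:
    #   p*A[0][0] + (1-p)*A[1][0] == p*A[0][1] + (1-p)*A[1][1]
    p = (A[1][1] - A[1][0]) / ((A[0][0] - A[1][0]) - (A[0][1] - A[1][1]))
    value = p * A[0][0] + (1 - p) * A[1][0]
    return p, value

# Hypothetical payoffs (mine, not from the talk): rows are my actions,
# columns are the enemy's responses.
A = [[0, 2],
     [1, -1]]
p, value = maximin_mix_2x2(A)  # p = 0.5, guaranteed value 0.5
```

Because the mix equalizes the enemy's two responses, he gains nothing from predicting which column I face, which is exactly why randomizing "just right" protects the game value.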
Now, suppose that I have a partner on my team, so our plays are actually a
combination of two actions. Again, we might like to randomize, and the optimal
randomization might be correlated between the partners trying to cooperate,
right? So from this context let's throw it back into a communication
setting, okay? If you have person A and person B separated, acting in a game,
and they each are randomizing by flipping their own coins, okay, then their actions will be independent of each other. Okay? But if you add a
communication link between them, then they can achieve coordinated actions for
competing in this game, okay? So suppose an encryption setting were looked at
in the sense that you have some sensitive information that affects the actions
you're taking in this multiplayer game, and you want to send something about that
information to your partner, who will then take some actions in the game as well.
And you don't want the enemy to know anything about your actions, okay? So
we're basically relaxing the requirements of
encryption by not requiring that you send your information. The only difference
here is that this doesn't equal X, okay? So we're not requiring that you send the
information exactly; we're going to try to relieve the rates needed on the
encryption and on the communication, but at the expense of only decoding
something correlated with X, okay, not equal to X. So now we ask what are the
required rates. We're not compromising secrecy at all, so the enemy can't
know anything about X or Y. What are the required rates of communication and
rates of secret key to achieve a given desired P of Y given X? Okay?
>>: [inaudible] communicate, have a higher correlation.
>> Paul Cuff: And give --
>>: Away a little bit and then might be --
>> Paul Cuff: Good. So you can actually look at, analyze the game. The way we
did it was, we say the enemy sees all of your past actions, the enemy sees this
public communication, of course, and you play a repeated game, and you show that of all
possible ways of using the communication, you can't do
better than doing it this way, okay? So you keep the secrecy perfect.
Because essentially, if you leak some of the secrecy, you're
achieving some new conditional joint distribution, right, rather than doing it
secretly. Okay. So here's the result.
>>: [inaudible] always keeping the secrecy as a measure of them linking
information. [inaudible] information. Communicating more, correlating better.
>> Paul Cuff: So there's -- okay. I guess I do need to explain one thing. When
we actually showed the lower bound -- that this is optimal for games -- it was a
slightly different setting. This guy actually picked his actions; they weren't given
to him randomly. He generated actions and sent something about them to the
partner, who generated the counterpart actions. In that setting, this is optimal
for games, okay? There's a nuance when you do this exact setting, where this is
supplied randomly as sensitive information and you're describing something about
it to your partner. The nuance is that the game might be kind of degenerate, in
the sense that you don't care about keeping a secret --
there's nothing they can do to stop you when you take a certain set of actions,
right? If the game is degenerate in that way, then you don't care
about secrecy, so you just go for correlation. So there is kind of a balance
between leaking the secret and getting higher correlation in this exact setting.
Yeah. So the setting where we showed the optimality in the
repeated game was slightly different from this. Yeah.
>>: [inaudible] because [inaudible] because otherwise maybe you want the
enemy to find that --
>> Paul Cuff: Yes. So actually that's one -- that's one thing I think that would be
-- that I'm interested in learning: taking this exact setting and applying it to a game.
Okay. In this exact setting, X is better viewed as side information about the
game, okay? X is like some side information: you have a random payoff matrix,
and X is correlated to that randomness. And then you say, if you had a partner
who was watching the side information and could send you something about it at
a certain rate, what should they do to get you to do well in the game? Okay?
That's exactly how this would be applied to game theory.
The setting we were able to solve was the one where, rather than side information,
this is another action in the game, and you generate it here. It's not given to you,
okay? So --
>>: [inaudible].
>> Paul Cuff: Yeah, you can generate it with a mixed strategy and describe it to your
partner. But it turns out that actually makes the problem a little easier, that
you're generating it rather than sending it. Yeah. So I mean it's a little bit subtle,
but there is some concrete connection with game theory, and there's also what
you all mentioned, which actually is relevant too.
Okay. So the result is, again, you have this auxiliary U that you get to use in your
optimization. Now, we have a trade-off actually, so it's not just going to be a
minimum communication rate and a minimum encryption rate; there's a
trade-off. You can exchange some encryption rate for some communication rate.
And the rate pairs need to satisfy that R1 is greater than the mutual information
between X and this made-up U that you get to use for optimization, and R2 is
greater than the mutual information between the pair (X, Y) and U. U
again has to satisfy its Markovity: it separates X and Y in the Markov sense.
Okay?
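Written out (my notation, not a slide from the talk), the region just described is:

```latex
\mathcal{R} \;=\; \bigcup_{U \,:\, X - U - Y}
\left\{ (R_1, R_2) \;:\; R_1 \ge I(X;U), \;\; R_2 \ge I(X,Y;U) \right\}
```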
So you consider all U, and each U gives you a set of rate pairs that
are achievable. Okay? So we're going to get a region, some
trade-off region of rate 1 and rate 2. But the nice thing is, what if we looked at the
extremes? What's the minimum communication rate if we don't care about the
encryption rate, okay? Well, by the data processing inequality you can say
that's minimized by letting U equal Y, okay, because of this Markovity. So if
U equals Y, then we have the mutual information between X and Y. Okay. So the minimum
communication rate is the mutual information. What if we want to minimize R2
without caring about R1? Well, that's exactly this Wyner problem
here. That's the common information. So the two extremes of this region are the
mutual information and the common information. It's kind of nice to see these
fundamental quantities showing up.
So okay. Lastly, let's look at an example. Let's go back to the task assignment
and solve it for the encryption setting. So in the task assignment problem, one
computer was given a task randomly and just wanted the other computer to pick
a different task. What we're going to add here is that you want the other computer
to actually pick randomly among the other tasks, okay? So let me try to give you
an example of what we might be trying to do. Say there are a bunch of files, and
you have a virus scanner that's looking through them for viruses, okay, in some
network. And it goes through these files, but you don't want it
to go through in a deterministic pattern, because you might get some clever virus
that follows right behind it and never gets caught, okay? So what it's doing
is randomly picking a file to search and then randomly picking a new one.
Okay? Now, you have another process, another part of the virus scanner, that's
doing the same thing, picking files to scan. But you don't want to be scanning
the same one. So what rate of communication and what encryption is needed so
that this other virus scanner can randomly pick another task, and you're both always
randomly picking two different tasks? Okay? That's the scenario here, right?
Okay. So you're given the task -- you're given a task uniform on one to K -- and you
apply those results, and what you find is that here's one
point that's achievable: communicating at rate one and using encryption at rate
two. Okay. Now, how does this work? Well, essentially what
you're sending is you're telling the decoder a set to randomize
over. So you get assigned task three, and you're telling them to randomize over
tasks one, four, seven, 12 and 15. And then they're randomly picking among
those, and they don't even know what task you have; they just know it's not in
that set, right?
So to achieve this point, basically the sets that you're telling them to
randomize over are about half of the total tasks. So it takes about one bit, when
you construct this code book, to give them a sequence of these sets
that they can randomize from. Okay? But you still need some
encryption. What do you use the encryption for? One bit is used for the one-time
pad, okay? The extra bit is to randomly generate the code
books in a special way, because notice that even when you communicate this
way, they're not really randomly picking from all of the other tasks; they're only
randomly picking from some of the other tasks, right? So if you randomize your
code books enough, you can make it all work out so that they are actually
randomizing over all the other tasks, okay?
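Here's a toy simulation of that idea (my own sketch; it only captures the "randomize over a set that excludes your task" part, not the rate-optimal randomized code book construction from the talk that makes the partner's choice uniform over all other tasks):

```python
import random

def partner_task(K, x, set_size, rng):
    # Encoder with task x names a set S of other tasks (x is excluded);
    # the partner picks uniformly from S, so the two tasks never collide.
    others = [t for t in range(K) if t != x]
    S = rng.sample(others, set_size)
    return rng.choice(S)

rng = random.Random(0)
K = 16
collisions = 0
for _ in range(1000):
    x = rng.randrange(K)                                # my assigned task
    y = partner_task(K, x, set_size=K // 2, rng=rng)    # partner's pick
    collisions += (x == y)
# collisions stays 0: the partner's task never equals the encoder's.
```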
Now, there's a trade-off. If you want, you can get away with a
much smaller communication rate. You can get away with a communication rate
of about one over K, approximately. But it's at the expense of a lot more
common randomness, a lot more secret key. And the way this works is you
reduce the size of the set that you're telling them to randomize over, until
eventually you're just telling them which task to take, right? But because
they're not randomizing very much privately, you need to do much more
randomizing of your code books. Okay. So you require a much larger secret
key rate, of approximately log K, rather than just the constant two, which was
constant for all K. Okay? So, all right. So I mean the general idea here: we
looked at how information theory could solve some simple cases, or at least give
you some insight into the communication for simple cases of coordination.
The idea would be, you know, real networks aren't of this simple structure,
and this probably isn't a very good model of how tasks are distributed in
a network, but it's just a demonstration of how these tools might be
used.
And just a reminder of what we looked at: the nonadversarial case in three
different networks, and then the adversarial case, where we saw a trade-off
between communication rate and secret key, and we also saw that some of
these fundamental quantities came up: the minimum communication rate was
the mutual information, and the minimum secret key rate was the common information. Thanks.
[applause].
>> Yuval Peres: Are there any questions?
>>: And perhaps going back to the [inaudible], what happens if you insist that
there's no buffer and no code book? You just have to do things one
experiment at a time, and then what you're interested in, you know, maybe
use some randomness in [inaudible], what you're interested in is just the entropy
of this random variable that [inaudible]. Seems you can still do some things.
>> Paul Cuff: So you still are -- you're allowing for still coding in the sense that
you're getting away with the entropy, right. So if it's less than a bit, then you're
going to use less than a bit.
>>: So you [inaudible] most of the time --
>> Paul Cuff: I see. So it seems like instead you're saying, rather than
working with mutual informations, which is what you get out of this code book
method, you're using just entropies of random variables. So in other words,
you're constructing a correlated random variable and sending it [inaudible] with
entropy. I think -- well, a couple of things. First of all, you'd end
up with significantly higher rates, I think, if you did that. But also, there's another
way to model this short delay in a network. You can say, suppose you don't want
to have buffers at all, or you want to constrain the buffers to some limited size and
say we don't want more than this much delay. And people have looked at
information theory problems from this perspective. I haven't really done it much,
but there's work on source coding with fixed delay, and that would also be
relevant here, I think. You know. And --
>>: [inaudible] sort of interesting because then it makes it so much simpler, like
here's something you can do maybe if communication doesn't always work or
something like that.
>> Paul Cuff: Yeah. Exactly. So --
>>: But measuring your cost in terms of the entropy, your own [inaudible].
>>: Yeah, but.
>> Paul Cuff: You're already buffered.
>>: But suppose that your communication [inaudible] is smart so you can -- it
can -- it can send the entropy [brief talking over].
>>: But the two people that have --
>> Paul Cuff: Sort of a separation. You want sort of a separation where the
channel coding is done at the entropy limit by some box that you don't have to
deal with, and you --
>>: Just like [inaudible] I mean, that's [inaudible].
>>: I have a stupid question as well, just about the motivation that you gave, the
mathematical problem [inaudible] so I think --
>> Paul Cuff: For this thing here?
>>: No, no, right [inaudible] when you were talking about searching and
[inaudible].
>> Paul Cuff: Yeah.
>>: So it seems to me in that application that there would be a lot of
information content in what the actual [inaudible] is worth, or I'm just wondering
if there's some --
>> Paul Cuff: I actually -- I think a good model for task
assignment, better than the one we've done here, would be to say you have a
huge set of possible tasks, like all these possible searches or something. And
you are given some subset of those to do, like, you know, some set of size K, and
those involve, like, all the key words in your search query or something like that --
say, oh, here are the different parts; we can break this search out
into several different tasks, right? So then the computer in the
network that's the controller for the network, or whatever, would not just be given a
task to do; it would be given a set of K tasks that need to get done, out of the
possible million tasks. And then you say, now he gets to choose who does all
these tasks, but they just want to spread them out so everyone is doing one of
them or something. And it's a really different problem, actually. I mean, it may
sound the same, but it's really different, because he now has a set of K out of a
large number and gets to choose one for himself, whatever he wants, and gets to
have them each do different things.
>>: So it seems that telling the other computer what the whole task is, is a lot more
than just telling it it's task number three?
>> Paul Cuff: Yes. Well, if you're looking at -- if you're looking at all the possible
tasks out of a million, then, yeah, that takes that into account, right, or out of
hundreds of trillions, right? That takes that into account. It's like the entire
description of what you're supposed to search, right?
But, yeah, it's a different problem, although, you know, you can still hack away
at it with some of these same tools and find some
interesting results, actually. For the case I just described, in the network
where you just have -- in this network here -- you can get an upper and lower bound
that are pretty close to each other, actually. So --
>>: You go back to the star networks [inaudible] the original problem [inaudible], I
mean [inaudible], and the rest just need to be different. If you have complete
edges between them [inaudible].
>> Paul Cuff: Yeah, I think it would, if you just look at the three node case. If
you had a third connection like this, yes. Actually it might even be solved in that
case. And it does help, but I mean it's hard to say; I guess you'd compare
the sum rate.
>>: [inaudible] can you [inaudible].
>> Paul Cuff: Oh, beat log K?
>>: [inaudible].
>> Paul Cuff: Yeah.
>>: [inaudible].
>> Paul Cuff: Yeah, that's a good question.
>>: [inaudible].
>> Paul Cuff: Maybe you could. I don't know. So if you just.
>>: I mean certainly monotone. So if you have more [inaudible].
>> Paul Cuff: That's a good question. Like, here's a good question: if you're
allowed any network you want, like you said, the complete graph, what's
the lowest sum rate? Yeah. Is it constant? Maybe it is.
>>: And the [inaudible] is the best [inaudible].
>> Paul Cuff: Yeah. As long as you're allowed bidirectional. Yeah.
[applause]