>> Yuval Peres: Okay. So we're very happy to have Paul Cuff back from Stanford, and he'll tell us about investigating the fundamental network burden of distributed cooperation. >> Paul Cuff: All right. Thank you. Well, it's good to be back. I'm going to talk today about the communication requirements to set up cooperative behavior, and the example we'll use will be a computer network where you want to distribute some computation among the computers in the network. Many examples could apply. We could be talking about UAVs coordinating their flight patterns or something. But we'll look at a very simple model, just to see how information theory tools can be used to analyze the communication requirements in some of these settings. And in the first half of my talk the goal will just be to distribute these different computations. The second half will be looking at an adversarial setting and what changes, trying to relate it to encryption if there's an adversary. And in between we'll do a short break. Since it is the theory group, I have to do something on the white board. Okay. So this is an artist's rendition of a data center. Let's say this is one of Microsoft's data centers around the world where they're doing some cloud computing or something. So these are a bunch of computers in racks somewhere, and they're networked together with some cables. Let's say some question, maybe a search request, comes in, into a buffer. And so you buffer up all these requests and then they're distributed out to the computers in the network. So I assume that Microsoft probably does it similar to how other companies do their search, where a small portion of the search is done on various computers so they can quickly get the results and send them back. And so the search request would be farmed out to these different computers, and they'd all send back the results, and then you could send it back to the person who searched for it. 
So the question is, in a setting like this, what are the requirements? How much needs to be sent across these network cables in order to get the various parts of the task done at different computers? We'll look at a really simple model. We have computers and some sort of communication setting, and each one of these boxes represents a computer. >>: [inaudible]. >> Paul Cuff: No. Did I say they're assigning? They're assigning tasks. And what happens is some of the computers will be given -- so the tasks will just be numbered from some set. They all know what the number corresponds to, as far as what computation they have to do if they're assigned number three, and so forth. And some of the computers are given their assignments, and other computers get to choose their assignment based on the communication they receive. The goal is that no two computers do the same task, okay? So we'll look at sort of a cascade network and a network that you might call a star network as examples of how we might analyze this. Okay. So what do I mean by assigning tasks? Well, let's start with the two node case just so we're on the same page, okay. Here's a computer who gets a task assigned. In this case there will only be two tasks, task number one and task number two. So two different parts of the computation. This computer is assigned to do one of the two tasks randomly. This computer then needs to pick the other task, okay. How many bits must be sent through this network in order to achieve this? This is not too tricky. Anyone want to venture a guess? >>: One bit. >> Paul Cuff: One bit, right. You would say which task you're doing, and they'd pick the other one. Okay. Suppose that there are more tasks. Same problem, though. This computer just needs to pick a different task. How many bits might you need now in this network? >>: [inaudible] one task at a time [inaudible] two tasks. >> Paul Cuff: What was that? 
>>: You mean to assign one task to each computer or [inaudible]. >> Paul Cuff: No, just one to each computer. So they just each need to do one task different from the other one out of the [inaudible] that's automatic. >>: And you have just two computers here. >> Paul Cuff: Just two computers in this one. >>: [inaudible]. >> Paul Cuff: Okay. Less than one, right? I was trying to trick you into saying log K, right? If you wanted to say what task this is, you would need about log K -- log base two of K -- bits to say what task it is, and they could pick a different one. But you certainly don't need more than one bit. One way to do that is to just divide the K tasks in half and use your bit to say whether you're in the first half or the second half, and they pick from the other half. But like you've all said, you can do much less than one bit. Let's see what we mean by doing less than one bit. Okay. So we'll use the common information theory assumption that we're actually buffering a lot of tasks, and at each time [inaudible] we have the same problem where we want the other computer to pick a different task. But from time to time it's a new independent problem; we just want an independent assignment of tasks that are different, okay? So we have a long buffer coming in and we want to solve the problem. Notice that I said bits per task. We're allowed now to save up our bits, okay, and use them all at once. So for example, let's let K equal five for this example. So the tasks are one, two, three, four, five. There's a sequence of independent task assignments coming into the first computer. And he might take maybe eight of them. And suppose the rate we're trying to use is one-fourth, okay. Then with eight symbols you now have two bits to work with, okay? 
So based on this task assignment, the encoder -- the first computer -- will send two bits to the other computer, and that computer will have four possible received messages. Each message will correspond with a sequence. So it's a function from these bits to a sequence of tasks for computer two. Right? So each of these we'll call a code word, and this entire thing is a code book. And the idea is, if this was the code book, then the encoder would look at this sequence and say, well, there's a four and there's a four, so the second sequence certainly doesn't work, okay, so I'm not going to send the message 01. But if we compare every symbol with this first row, they're different, different, different, different, different, different, different. Okay, it works, right? So you can send the message 00, the decoder will then perform that sequence of actions, and you will have accomplished the goal. So for each code book there's a probability of error. There are some sequences where you won't be able to find a message to send such that it will be different in every place, right? So there's a probability of error for a given code book. What we say is that a rate is achievable -- here we were looking at rate one-fourth -- if no matter what probability of error is demanded, let's say one in a billion, you can find a buffer size N and a code book that gives you less than that probability of error. And it's not achievable if no matter how long you look you'll always have some error. Okay? >>: Do you always just assume independent uniform assignments? >> Paul Cuff: Yes. For this problem. I mean, the same analysis would work for other things, but for these problems we'll always assume that. So the problem for the two computer case is solved with rate distortion theory. >>: What? >> Paul Cuff: Rate distortion theory. And so let's see what the answer is. The minimum rate needed is the mutual information between X and Y. 
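The block-coding idea here can be sketched in a few lines of Python. The codebook below is generated at random, which is an assumption made purely for illustration; the talk only needs some fixed mapping from the four messages to length-eight task sequences, and an encoder that searches it for a codeword that disagrees with the source sequence everywhere.

```python
import random

K, n = 5, 8                  # 5 tasks, block length 8
M = 2 ** 2                   # rate 1/4: two bits per 8 symbols, so 4 messages

random.seed(0)
# Codebook: each of the M messages maps to a length-n sequence of tasks
# for computer two (random here, purely for illustration).
codebook = [[random.randrange(1, K + 1) for _ in range(n)] for _ in range(M)]

def encode(x_seq):
    """Return the first message whose codeword differs from x_seq in every
    position, or None if this codebook fails on this source sequence."""
    for m, y_seq in enumerate(codebook):
        if all(x != y for x, y in zip(x_seq, y_seq)):
            return m
    return None

# One random task-assignment sequence for computer one.
x = [random.randrange(1, K + 1) for _ in range(n)]
m = encode(x)
if m is not None:
    # The decoded sequence collides with x nowhere, as required.
    assert all(a != b for a, b in zip(x, codebook[m]))
```

For some source sequences `encode` returns `None`; that is exactly the probability of error of the codebook, which block-coding arguments drive to zero as the buffer grows.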
Now, X and Y are random variables, where the distribution of X is given by the problem: uniform on one through K. You get to pick the distribution P of Y given X to minimize this quantity, but the constraint is that X and Y cannot be equal, with probability one, okay. So this optimization problem will tell you the minimum rate needed. And mutual information is the entropy of X plus the entropy of Y minus their joint entropy, and entropy is this expected value. So let's just work out what's going to be the minimum for this problem. I think I'll do it over here. Let me propose a P of Y given X. I'll give you the P of Y given X that minimizes this, and then we'll show it actually does, okay? Let's let Y be uniform over all the choices not equal to X, okay? So then we have I of X and Y equals -- H of X, which is log K, right, H of Y is log K, and H of X,Y -- now this is a pair of two non-equal numbers out of K, so here we have log of K times K minus one, the joint entropy of X and Y. Okay. And so that equals log of K over K minus one. Okay? Now, is that the lowest? Well, let's write mutual information another way: H of X minus H of X given Y. This is another valid way to write mutual information. The first term is fixed by the problem -- we have no choice in it -- so it's going to be log K. >>: [inaudible]. >> Paul Cuff: Yes. Here I was doing it the first way of expanding it that's on the slide. >>: Why is Y uniform [inaudible]. >> Paul Cuff: Yes. Yes. It is when you marginalize out X. So then here we want to minimize this, so we want to maximize this term. So what's the most that it can be? Well, H of X given Y -- we know that X is not equal to Y, so given Y, X only has positive mass on K minus one entries, so it's less than or equal to log of K minus one, right? So there we go. This choice minimizes the mutual information, right? Okay. So as you've all said, it becomes less than one bit. 
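The derivation on the board -- I(X;Y) = H(X) + H(Y) − H(X,Y) with Y uniform over the K − 1 choices not equal to X -- can be checked numerically. A minimal sketch:

```python
import math

def min_rate(K):
    """Minimum bits per task for the two-computer problem: I(X;Y) with
    Y uniform over the K-1 tasks not equal to X."""
    H_X = math.log2(K)               # X uniform on {1, ..., K}
    H_Y = math.log2(K)               # Y is also marginally uniform
    H_XY = math.log2(K * (K - 1))    # (X, Y) uniform over ordered distinct pairs
    return H_X + H_Y - H_XY          # = log2(K / (K - 1))
```

With K = 2 this gives exactly one bit, matching the first example, and it falls toward zero as K grows.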
If K is two, then it's one bit, right, and it goes down from there. This is approximately one over K bits. Okay. So now let's look at this sort of cascade network. Now there are K computers total. The first one is given an assignment, okay. The next K minus one computers all get to choose their tasks based on these communications, done kind of in a daisy chain like this. So now what rates are needed so they can all choose different tasks? All the tasks are going to get assigned in this problem. In this case, we know the minimum rates. And it's done in the following way. You first assign the last computer a task using the rate we just talked about, log of K over K minus one. Since that communication went through all the links, everyone knows what that task assignment was. So now all the remaining computers reduce the set of tasks by one. They say, oh, the last computer is doing task five, so none of us are going to do five. >>: [inaudible]. K tasks? >> Paul Cuff: Yes. >>: [inaudible]. >> Paul Cuff: Yes. Certainly across this link here you're giving all the information. But not necessarily across these, right, because -- so ->>: [inaudible]. [brief talking over]. >>: [inaudible]. >> Paul Cuff: Yes. >>: [inaudible]. >> Paul Cuff: Yes. Exactly. Exactly. >>: [inaudible]. >> Paul Cuff: Yeah. So optimally what you do is you just send the last guy his assignment, and since everyone else heard it, they'll reduce the problem size by one and you'll peel off the next one, but now K has changed to K minus one, okay. The way you prove that this is optimal -- well, first of all, let's look at how much rate is on each link. The last link only has this one communication. But the next one back, R sub K minus two, is the sum of these, right, and so forth. So what we get is that R sub I, when you do this summing up, is just log of K over I. Okay? 
And what we can do is show that this is actually a lower bound individually for each link, just by looking at the mutual information between X and everything past it, which can now be unordered -- we don't care about the order. Okay. Now, let's look at the sum rate in this network. At first glance -- okay, I cheated and gave the answer away already. At first glance, this is a pretty bad network in the sense that the message you send to a computer has to go through many links. So it may look like the communication scales with maybe the square of the number of computers when you add them all up, because the communication to any computer passes through a number of links that's linear in the size of the network, right? But when you add the log of K over I up, pull out the K log K and notice that the sum of the log I terms is log of K factorial, which can be approximated by log of K over E to the K power. Now, the K to the K cancels the K log K, and you end up with about K log E. All the logs we're using are log base two, okay. So we're using log base two so we can call it bits. You can use any log you want, but I don't know a computer that uses nats for its base arithmetic. Okay. So it's linear in K, okay? So that's kind of nice. So you know, log base two of E is one point something. Anyone? No? >>: [inaudible] you can just do the usual one that [inaudible] so far so that it matches the same bound. >> Paul Cuff: If they ->>: I mean, [inaudible] is gone so far [inaudible]. >> Paul Cuff: Oh, I see. So you're saying tell this guy what you -- well, okay, but you -- no, you don't want to tell him a set. Oh, yeah, you do. Yeah, you tell him what you have so far, right. But that would suggest that these rates would go up, right, because then here you would have to tell him these two assignments. 
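The Stirling step above -- summing log of K over I across the links and getting roughly K times log base two of e -- is easy to sanity-check numerically:

```python
import math

def cascade_rates(K):
    """Link rates for the cascade network: R_i = log2(K / i) for i = 1..K-1,
    where i = K-1 is the last hop (carrying one assignment) and i = 1 is the
    first hop (carrying information about all the later assignments)."""
    return [math.log2(K / i) for i in range(1, K)]

def sum_rate(K):
    return sum(cascade_rates(K))

# By Stirling's approximation, sum_rate(K) is about K * log2(e): linear in K.
ratio = sum_rate(1000) / 1000    # should be close to log2(e) = 1.44...
```

So the total communication is linear in the number of computers even though each message crosses linearly many links.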
If you did it that way, where you say, I've got assignment five, and the next computer picks his task, he picks ->>: [inaudible] which is typically why he just tells what ->> Paul Cuff: Yeah, you just tell the set. So you get rid of the ordering? >>: Yes. >> Paul Cuff: Yes. Still that would go up. It will still go up because you would get K choose two -- log of K choose two -- here, right? >>: But that's the [inaudible]. >>: [inaudible] [brief talking over]. >>: You do actually send, because even in the two computer ->> Paul Cuff: Okay. Yeah. See, the trick with the two computer case is you can't understand it by looking at it one letter at a time, because there's no message about just that one symbol. You're just identifying a sequence that works. >>: No, but those two [inaudible] because they are clear -- because [inaudible] the number of computers in K. >>: [inaudible]. >>: You still have to know all this first, very long. >> Paul Cuff: Yes, this thing here. >>: And it [inaudible] is saying it needs something complicated that says you wait for a long ->> Paul Cuff: Yeah. And then you just tell them a sequence that works. >>: But that's the -- I mean, I'm not saying that you should ->> Paul Cuff: Yeah. >>: [inaudible] the information you would need to convey on every edge that makes it every level [inaudible]. >> Paul Cuff: So if we look at our -- I think the key is you're looking at the second rate and saying, okay, how do we get this smaller. Because this is smaller than this, right? >>: I'm not saying smaller, I'm saying this is the attention information [brief talking over]. >>: Getting something more ->> Paul Cuff: Yeah. Because what I'm doing is, basically, if you look at the message on this link, you're basically assigning all the rest of the guys, unordered, without assigning this one yet. If you had already assigned this one, then you would have to convey more information. >>: [inaudible]. 
Those jobs will not be assigned further, and so eventually all the jobs will be assigned. >> Paul Cuff: Yeah. >>: Because these are all different. So essentially whatever [inaudible] you are doing, exactly this information is passed on this [inaudible]. >>: No. Because even -- because [inaudible] the last one [brief talking over]. >>: The last guy what your sequence is. >>: Oh, so you are not assigning this guy and then ->> Paul Cuff: No, they are in reverse order. And it does save rate that way. So ->>: But you can't think of it one letter at a time, right? >> Paul Cuff: No. You have to buffer it up to really get these rates. Yeah. So you get basically log base two of E bits for each computer in the network. Okay. Let's look at a ->>: [inaudible]. You need to do it sequentially. Or can you do it -- pipeline? I mean, will you need to wait for ->> Paul Cuff: The idea is you need to wait ->>: [inaudible]. >> Paul Cuff: Okay. You will always be behind a little bit or something. Like you would have to know your task assignments ahead of time. The question is how do you actually do block coding with task assignment? The first computer would have to know his task assignments ahead of time, and after it's waited for a block of a hundred, then while it's receiving the next buffer of a hundred it would be using that time to send the information from the previous block. And then they would all execute. >>: [inaudible]. I mean the information transferring through the network. Say you were already ->> Paul Cuff: Oh, I see. Because you're like -- you now know what the messages are, but you have to send here and then send here and then send here, so there's the delay of going down the network. Yeah. 
So I haven't considered yet that if you count the delay for each link, that would affect things. But of course you could pipeline, like you were saying. Yeah, you could pipeline this. Okay. So, very similar setup, but now what we're going to do is say the first K minus one tasks are assigned, okay. For some reason they know that their task assignment is rigid, and you only want to know the last one. Really the idea would be that some of the tasks are assigned and some aren't, and you want to know the answer. But we're looking at the simple cases first. This case actually turns out to be not very simple. So in this network, the last computer -- the task that he does is fixed. It's the task that all these guys are not assigned, right? So there's no flexibility here. He just has to know enough information from the rest of the computers in order to figure out which task it is that's not being done. Okay? But we don't know the actual optimal answer here. Let me propose one idea, though. It's like Kamal was saying. Send here the information about what task you're doing. That has to be done. We know that that must get across somehow. You save a little bit because you use the side information -- this guy knows one task is eliminated -- and you do [inaudible] coding. So you save a little bit of rate. And the rate needed there is log of K minus one, okay? Then here, this computer looks at both these tasks and gets rid of the ordering, because that doesn't matter, so it saves one bit by getting rid of the ordering, but basically transmits this set to the next one, and they keep going, so forth, like that. And then the rate you would need at each link is log of K minus one choose I. Okay? Now, this is kind of a strange scheme because it's totally different from the network before. Now we get rates -- consider this a graph of the rates in the network -- that peak in the middle, right? 
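Tabulating the rates of this set-forwarding scheme -- log of K minus one choose I on link I -- shows the peak-in-the-middle shape. A small sketch:

```python
import math

def set_scheme_rates(K):
    """Link rates for the scheme that forwards the unordered set of the
    first i assigned tasks: link i carries log2(C(K-1, i)) bits."""
    return [math.log2(math.comb(K - 1, i)) for i in range(1, K - 1)]

rates = set_scheme_rates(10)   # K = 10 computers
# rates[0] is log2(9) = 3.17 bits; the middle links need about twice that,
# since binomial coefficients peak in the middle.
```

Every link carries at least log of K minus one bits, which is why the sum rate of this scheme scales like K log K rather than linearly.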
So you need really high rates in the middle and not very high rates at the edges. Plus all of these rates are already very high. I mean, this is worse than log K -- well, at least log of K minus one -- for every link. Now, this is one scheme that seems reasonable. We could also go through and minimize each rate, each link individually, and you'll find that this is not necessarily optimal. There's a whole trade-off. If you allow for higher rates on some links, you can then get away with smaller rates on the other links, okay? >>: So can you get an equal rate -- transmit the sum of the numbers? The last one can just compute K times K plus one over two minus the sum of the previous numbers, and then the rate will be fixed. >> Paul Cuff: Interesting. Okay. So you do like the mod K sum, or even just the real sum. Just the real sum. Mod K, fine. Either one. Interesting. I wonder what rates we'd get in that case. That's nice. That's nice. >>: What's the [inaudible]? >> Paul Cuff: This is an achievable point, okay? But it's a whole region, right? It's a high dimensional region because you could always trade off rate one for rate five. In the first case, there was an optimal point: the region was rectangular, and there was one optimal point that summarized everything. In this case, there are a lot of tradeoffs, and this is one achievable point in the region. >>: [inaudible]. >> Paul Cuff: Okay. Then yeah. Exactly. So if you're trying to optimize the sum, I mean, that scheme would be interesting to look at -- like, what does that take? That at worst would be like log K for each one. Right. Okay. >>: So if all these X1 to XK minus one are assigned, do they [inaudible] communicate with each other? >> Paul Cuff: They just need to communicate with Y. But the thing is, the only way this computer can communicate with Y is through the others, okay? 
>> Paul Cuff: So we're looking at a setting where the connections are fixed. If you change the connections then, yeah, probably you would just want them each to go there directly. Although it would still be a challenging problem, actually. Okay. So if you minimize each rate individually, you find that the rate needed on link I is just log of I plus one. So this does a little bit better than log K, right? Because for the first links you're much smaller -- I is smaller than K. >>: [inaudible]. >> Paul Cuff: So this is not necessarily achievable. In fact, I think this is far from achievable. This is just saying that the smallest rate you need on the first link is log two. You don't need to send the whole thing; what you would do, basically, is say, okay, I'm going to assume that these other rates are really high, so the last computer knows all of these assignments. If he knew all those assignments, he could use that side information to greatly reduce the rate with which I need to send this, and I can get away with only one bit, right? So using those types of bounds you get a lower bound. But the sum is still K log K. So even with these lower bounds you're getting worse than the other network, which seems kind of funny, considering that there's only one task being assigned in this network and the other one had a lot of task assignments, right? >>: So does zero [inaudible] achieve this [inaudible]. >> Paul Cuff: No. No, mine I think would be more like K squared. >>: [inaudible]. >> Paul Cuff: This one here? >>: Up to this ->> Paul Cuff: No, but this one does. >>: Up to this. >> Paul Cuff: Exactly. So yeah. So divide it -- yeah, exactly. So up to the [inaudible]. So let's look at this last one here. So we have this star-shaped network. 
And basically what we have here is the guy in the middle gets a task assigned, okay, and can communicate directly with all of the remaining computers. And each of those computers gets to choose its task. So you may try here to go back to the two computer limit. We knew that with two computers it was log of K over K minus one, right? So what if I just used log of K over K minus one for each of these? That doesn't work. It will guarantee that, for example, Y2 doesn't equal X and Y1 doesn't equal X -- each of these will be different than X -- but they won't necessarily be different from each other, if you try to use that communication scheme, right? So what extra needs to be sent to make sure they're all different from each other? Okay. Well, one idea that actually works quite well is the following. You let each of these computers have a default value, okay, so this computer will do task one by default, this one task two, and so on. So they have K minus one defaults, and task K is not a default anywhere, right? So if X gets assigned K, then good, they're all fine. If X gets assigned something else, like three, then all he has to do is send a message to that computer to say, move out of my way, and that computer knows to then go take assignment K, right? So now what rates are needed? Well, you just have to know how frequently you will be telling an individual computer to move out of the way, and that's one over K, right? So you're sending a bit stream that's a one, one over K of the time, and that takes a rate of H of one over K, where H is the binary entropy function. It's just negative P log P minus one minus P times log of one minus P. So it's the entropy of a Bernoulli random variable. H of one over K is approximately log K over K, okay. So already we're seeing that this scheme is going to be very good, because when you multiply it by K, you're going to be less than linear for the sum rate in the entire network. 
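The rate calculation for this default-value scheme -- H of one over K per leaf, summed over K minus one leaves -- can be sketched directly:

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p): entropy of a Bernoulli(p) bit."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def star_sum_rate(K):
    """Default-value scheme: each of the K-1 leaf computers receives a bit
    stream that says "move out of the way" a fraction 1/K of the time."""
    return (K - 1) * binary_entropy(1 / K)

# H(1/K) is roughly log2(K)/K, so the sum rate grows like log K: sublinear.
```

For K = 1000 the sum rate comes out around 11 bits total, compared with the roughly 1440 bits the cascade network needed.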
But I have to mention a scheme that works even better than this. The reason is because the answer's so nice. You've all probably thought I only had one talk that had the golden ratio in it. So let's look at just the three node case of this network, all right? So this computer gets assigned a task, one through three, and then these two need to be different, right? We can make an improvement over this default scheme. He could take default one and default two, and we know how that works. We can do better by first sending an estimate of X to them. So in other words, at a very low rate you send a guess of what X is that isn't usually right -- it's only right slightly more often than it should be. Okay? So these two computers have a guess at what X is to work with. Now, they're going to pick default values centered around that guess, okay? So in other words, if the guess was two, then this guy's default will be one and his will be three, okay? Something like this. Then they go with the same original scheme, where this computer, on the second round -- so it's like a two stage communication, first stage is send the estimate, second stage is tell them if they have to move out of the way. Now, because of the estimate, they will have to move out of the way less frequently. Right? So there's a trade-off of how good an estimate of X you send versus, you know, how much you save from not having to tell them to move as often. And you optimize that with calculus, and you get that the optimal rate with this scheme is log three -- which is what it would take to send X exactly -- minus log of the golden ratio. And the golden ratio is one plus the square root of five, over two. It doesn't come up too often in communication problems unless I'm working on them. But you know, there are golden ratio fanatics who see it in pine cones and everywhere, right? >>: [inaudible]. 
>> Paul Cuff: Exactly. Exactly. >>: So [inaudible] somehow [inaudible] something else. I think it's -- I mean, you wanted to use less bits ->> Paul Cuff: Yes. Yes. So it's like sending the estimate. The thing is, when you send an estimate of X, basically what you do is you devise a joint distribution -- you use the same rate distortion type of results, and the rate you're going to have to use to send the estimate is I of X and X hat. And you get to choose P of X hat given X; P of X is fixed. The receiver essentially is going to get X hat. I mean, it's going to come in the form of some long sequence and stuff, but essentially it's as if the decoder knows X hat, and you only needed to send it with this much mutual information. Now, the cool thing is, the distribution is going to be something like this. So for one, two, three, this would be something like P of X hat given that X equals two, okay -- some conditional probability mass function where it's slightly more likely that X hat equals X, right? But the mutual information is such that when this is uniform you're at the peak, and so you're only quadratically penalized based on what conditional distribution you use. So you're not paying much. In fact, when you optimize this, the total rate is around .9, and the rate of sending the estimate is .04. So you know, you hardly use any rate, yet you get away with reducing this probability here. And the savings are kind of linear. The penalty's quadratic, the savings are linear. So that's why it works. Okay. So for this part of the talk we looked at these three networks. What we noticed, anyway, is that the first network we could analyze exactly, and the sum rate was linear in K. 
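The numbers quoted here -- total rate around .9 per link with the two-stage scheme -- match the closed form log two of three minus log two of the golden ratio, and they beat the one-stage default scheme's H of one-third. A quick numerical check:

```python
import math

phi = (1 + math.sqrt(5)) / 2            # the golden ratio

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# K = 3 star network. One-stage "move out of the way" scheme: H(1/3) per link.
default_rate = h2(1 / 3)                          # about 0.918 bits

# Two-stage scheme (cheap estimate of X, then "move"): log2(3) - log2(phi).
two_stage_rate = math.log2(3) - math.log2(phi)    # about 0.891 bits
```

So the estimate stage buys roughly 0.03 bits per link over the plain default scheme, consistent with the "quadratic penalty, linear savings" argument.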
When we went to the second one, where a lot of the assignments were given already and you only had to assign the last one, the rates got worse. We saw that our bounds are kind of K log K, and we were given a scheme here during the talk that gets that same scaling. And then in this last network, we don't know the answer, but we do know a scheme, so we have an upper bound on the minimum rate, because we know a scheme that works. >>: [inaudible]. >> Paul Cuff: Yes. There's no lower bound on that to say that that's actually optimal. >>: If you [inaudible] you also got ->> Paul Cuff: Same. Same. >>: [inaudible]. >> Paul Cuff: Yes. You also got [inaudible]. >>: [inaudible]. >> Paul Cuff: Yeah. So ->>: And you know [inaudible]. >> Paul Cuff: I don't know that. >>: So what lower bound is [inaudible]. >> Paul Cuff: Oh, okay. We know that you at least need the two computer limit -- like, if none of the other things were in the network. So log of K over K minus one. But see, that's going to be like constant. All right? I mean, log of K over K minus one times K -- how does that scale? >>: [inaudible]. >> Paul Cuff: What? >>: Constant. >> Paul Cuff: It's constant. So the lower bound we know would be constant scaling. Of course log K is already [inaudible]. Okay. Okay. Great. Time for intermission, and since we're taking too long, we don't have that long, but we're going to move into an adversarial setting, and I thought since this is the theory group we might enjoy a little puzzling question before we move on to that part. Okay. So suppose you have two sequences of random variables, X1 to XN and Y1 to YN. I'm going to abbreviate these as X to the N, Y to the N. Okay? I'm not going to tell you their joint distribution, but I'll tell you something about their joint distribution, okay? 
And the question will be, if we have some Markov chain with a variable in the middle that separates these entire sequences -- so conditioned on the random variable U, the sequence X1 to XN is conditionally independent of Y1 to YN -- I'll want to know something about the cardinality of U. How big does the cardinality of U have to be? And it will tell us something about the connection between X to the N and Y to the N, this minimal cardinality we need on U to separate them, okay? So let's say that the cardinality of U is going to be exponential in N, say two to the N R, okay. So we want to know what R is necessary as N gets really large and as I give some constraints on the joint distribution of X to the N, Y to the N. Okay? So there will be three questions. For the first one, the first thing I'll tell you about X to the N and Y to the N is the following. We define a typical set, okay -- I'm not going to have room here. I'll do it on the other side. Okay. We have a P naught of X, Y. Don't be confused into thinking that this tells us the distribution of X to the N, Y to the N; it's just a joint distribution on X, Y, okay? And the typical set with respect to that joint distribution, T epsilon N, is the set of X to the N, Y to the N sequences such that the absolute value of one over N times the sum over I of the indicator function that X I, Y I equals A, B, minus P naught of A, B, is less than epsilon for all A, B. Okay. So all I'm saying here is that this quantity is the empirical distribution of X to the N, Y to the N: if you count up the pairs and look at their distribution, it is very close to P naught of A, B, okay? So this set is all sequences whose empirical distribution is close to P naught. Okay? And I'll call this the typical set. 
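The typicality condition on the board -- the empirical pair distribution is within epsilon of P naught in every entry -- is straightforward to check in code. The joint distribution below is a hypothetical example, not one from the talk:

```python
import random
from collections import Counter

def is_typical(xs, ys, p0, eps):
    """Typical-set test: every empirical pair probability of (xs, ys) is
    within eps of the reference joint distribution p0."""
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return all(abs(counts.get(ab, 0) / n - prob) < eps for ab, prob in p0.items())

# A hypothetical joint distribution on {0,1} x {0,1}.
p0 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

random.seed(1)
pairs = random.choices(list(p0), weights=list(p0.values()), k=10_000)
xs, ys = zip(*pairs)
# An IID sample from p0 lands in the typical set with high probability.
```

By the law of large numbers, the probability that an IID sample fails this test vanishes as the block length grows, which is exactly the "with high probability typical" hypothesis of the puzzle.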
Now, all I'm going to tell you about XN and YN first is that with high probability, you know, greater than one minus epsilon, they are in that set, okay? So the first case is simply that the joint distribution of these two sequences is such that with high probability they're typical according to P naught of (a, b) -- or P naught of X, Y. Okay? What cardinality do you need for U? Now, this first one's sort of a trick question, because XN and YN could just be a deterministic sequence pair that is in that set. Okay? So if they were deterministic, then they're already independent, so you don't need anything to separate them. >>: [inaudible]. >> Paul Cuff: No, given this information, what's the best as far as lowest rate, okay? So given that information about their joint distribution, what's the lowest -- you can pick the joint distribution, and you can actually pick the joint distribution even with U, right, to minimize this. Okay? And then you get that R equals zero, because you can just have a deterministic sequence. Now, let me add one more thing, okay? So the first case was that XN, YN are in the typical set with probability, you know, one minus epsilon, right? Okay. And we saw that R equals zero. Okay. Next one. Same thing, but also XN is distributed IID according to P naught of X, the marginal on X. Okay? So now I'm telling you a little bit more: this is actually IID. Also, they are jointly typical with high probability. Now, what is the minimum U that separates them? >>: [inaudible]. >> Paul Cuff: With the marginal distribution of P naught. P naught is the same P naught that was used to define the typical set. X is also IID with that distribution. I'll just give these answers. So now the minimum rate needed is the mutual information between X and Y. Okay?
And this relates to rate distortion theory, because X is like a source of information, U is like a description of X, and there's no requirement that the reconstruction is actually random, but you do require that it's jointly typical. So you know, you have this Markovity naturally from describing X and then decoding it. So this is just another way of looking at rate distortion theory. Now, the last one is -- can you see this? Okay -- that XN and YN are IID according to P naught of X, Y. Okay. So that's what I'm telling you about the joint distribution now: these are actually IID according to P naught of X, Y. Now, you get to pick a joint distribution with U that's Markov and try to minimize the cardinality of U. Okay? And for this one, the answer is not mutual information, although I've known many people to guess mutual information here. No, it's that the rate is greater than what I'm going to refer to as the common information between X and Y. This term, common information, was defined by Wyner. And what it means is: the common information between X and Y is the minimum mutual information between the pair (X, Y) and some other variable U, where U satisfies the Markov chain X - U - Y. So this is just an optimization problem. You look over all U that separate X and Y. You may think, well, this looks a lot like the problem we were just looking at, except that this mutual information is very different from what you might have guessed is the answer, right? It's the mutual information between the pair X, Y and the U that's in the middle. Okay? Anyway, this common information will come up in the work we're doing, so I thought this was the way to introduce it. >>: [inaudible]. >> Paul Cuff: What? >>: What is the [inaudible]. >> Paul Cuff: Oh, yeah, yeah, yeah. So U -- okay. So when you do this optimization you only have to look over U of size, I think, the cardinality of X times the cardinality of Y.
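As a numeric sketch of this definition: if we build X and Y by passing a shared variable U through two independent noisy channels, the Markov chain X - U - Y holds by construction, so I(X,Y;U) for that U is an upper bound on Wyner's common information, which in turn is at least I(X;Y). The binary distribution below is an illustrative choice, not one from the talk:

```python
from math import log2

def mutual_information(p_joint):
    """I(A;B) in bits, from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in p_joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in p_joint.items() if p > 0)

# X and Y are a shared fair bit U flipped independently w.p. a each,
# so X - U - Y is Markov by construction.
a = 0.1
p_xyu = {}   # joint pmf of the pair (X, Y) and U
for u in (0, 1):
    for x in (0, 1):
        for y in (0, 1):
            px = 1 - a if x == u else a
            py = 1 - a if y == u else a
            p_xyu[((x, y), u)] = 0.5 * px * py

p_xy = {}    # marginal pmf of (X, Y)
for ((x, y), _), p in p_xyu.items():
    p_xy[(x, y)] = p_xy.get((x, y), 0.0) + p

# I(X;Y) <= common information <= I(X,Y;U) for this Markov U.
print(round(mutual_information(p_xy), 3))   # I(X;Y), the smaller quantity
print(round(mutual_information(p_xyu), 3))  # I(X,Y;U), an upper bound
```

The gap between the two printed numbers is exactly the point of the third question: the rate you need is governed by I(X,Y;U), not by I(X;Y).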
So the product of the cardinalities, minus one or something like that. Still pretty big. Okay. >>: [inaudible]. >> Paul Cuff: Okay. So what's this a picture of? >>: [inaudible]. >> Paul Cuff: [inaudible]. Yes. Okay. Great. So we're going to now talk about coordination of behavior in an adversarial setting, okay? So let me describe encryption -- the basic problem of encryption as I see it, anyway. You have a source of information that we'll call X, and you want to send a description of it to a decoder that will then decode X. Okay? But an enemy sees this communication, and you don't want the enemy to know anything about your information. So in order to do this, you're not going to have luck in this setup unless you have a secret key, okay. Now, these two know the secret key; the enemy doesn't. And this problem was looked at by Shannon, and a very negative result showed that -- yes? >>: You are [inaudible]? >> Paul Cuff: Yes. Well, I am taking away all computational limits, okay? So we'll just look at it from the information theoretic perspective. So, yeah, clearly this is not how encryption is being done, basically because this result is very negative. It says that the amount of secret key you need is as much as the information you're sending, and you can never reuse the secret key, right? So, one-time pad. The way you do this encryption is you have a message here, you have the secret key, and you just XOR the key with the entire message, right, and you get something that's independent of the message. No matter how long someone sits there and tries to crack it, they'll do no better trying to figure out your message. And then you undo it with the exclusive or as well. Okay.
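The one-time pad just described is a one-liner: XOR the message with a uniformly random key of the same length, and the same XOR undoes it. A minimal sketch:

```python
import secrets

def otp(data: bytes, key: bytes) -> bytes:
    """One-time pad: XOR with an equal-length secret key.  Encryption and
    decryption are the same operation, and the ciphertext is statistically
    independent of the message when the key is uniform and used once."""
    assert len(key) == len(data)
    return bytes(d ^ k for d, k in zip(data, key))

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))  # fresh uniform key, never reused
ct = otp(msg, key)
print(otp(ct, key) == msg)  # True: XORing twice recovers the message
```

Note the key is exactly as long as the message, which is Shannon's negative result in miniature: rate R2 equals the entropy of the source.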
So then the minimum rates needed -- if R1 is the communication rate and R2 is the secret key rate, then R1 equals R2 equals the entropy of X (that should be an equals; I need to fix this), right, the rate needed to describe X without loss. Okay. So, yeah, let me emphasize again: we'll look at the information theoretic limits, assuming there are no computational limits. Far from practice. Although with the advent of quantum key distribution and, you know, maybe some realistic ways of actually exchanging long keys, then maybe this will have some relevance coming up. So, okay. Now, I want to sort of broaden the mind-set of encryption into a game theoretic setting. Why do you need to keep a secret that someone else can't discover? Is there a way of modeling this as a game where the person who might discover your information would use it against you somehow? If so, then let's look at it in a game theoretic setting. So here's a simple repeated game where you have an enemy who can take one of two actions, and here's me, and I can take one of my two actions, and based on the actions we take, we get a score. And you know, I wouldn't necessarily want to take action zero if it were known I was going to take action zero, because then the enemy would choose zero, because he would want me to avoid getting two points. And if I chose one, and he knew I was taking one, he would choose one and give me the negative one. So it's better actually if I randomize over zero and one just right and avoid him being able to slap me with the worst penalty, right? Okay. And there's a score of this game. Now, suppose that I have a partner on my team, so our plays are actually a combination of two actions. Again, we might like to randomize, and the optimal randomization might be correlated between the partners trying to cooperate, right? So from this context, let's throw it back into a communication setting, okay?
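For a 2x2 zero-sum game like the one sketched on the board, the optimal mixed strategy makes the enemy indifferent between his two responses. The payoff numbers below are illustrative guesses with the flavor described (a two-point reward and a negative-one penalty), not the exact matrix from the talk:

```python
# Hypothetical 2x2 zero-sum payoffs for the row player ("me"):
# rows = my action (0 or 1), columns = the enemy's action (0 or 1).
A = [[0, 2],
     [1, -1]]

# With no saddle point, the optimal mixed strategy p = P(I play 0)
# equalizes the enemy's two columns:
#   p*A[0][0] + (1-p)*A[1][0] == p*A[0][1] + (1-p)*A[1][1]
p = (A[1][1] - A[1][0]) / ((A[0][0] - A[1][0]) - (A[0][1] - A[1][1]))
value = p * A[0][0] + (1 - p) * A[1][0]  # guaranteed expected score
print(p, value)  # 0.5 0.5
```

Against this strategy the enemy earns the same no matter which column he plays, which is exactly why randomizing "just right" protects you from the worst penalty.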
If you have person A and person B separated, acting in a game, and they each are randomizing by flipping their own coins, okay, then their actions will be independent of each other. Okay? But if you add a communication link between them, then they can achieve coordinated actions for competing in this game, okay? So suppose an encryption setting were looked at in this sense: you have some sensitive information that affects the actions you're taking in this multiplayer game, you want to send something about that information to your partner, who will then take some actions in the game as well. And you don't want the enemy to know anything about your actions, okay? So we're basically relaxing the requirements of encryption by not requiring that you send your information exactly. The only difference here is that Y doesn't equal X, okay? So we're not requiring that you send the information exactly; we're going to try to relieve the rates needed for the encryption and for the communication, but at the expense of only decoding something correlated with X, okay, not equal to X. So now we ask what are the required rates. We're not compromising secrecy at all, so the enemy can't know anything about X or Y. What are the required rates of communication and rates of secret key to achieve a given desired P of Y given X? Okay? >>: [inaudible] communicate, have a higher correlation. >> Paul Cuff: And give -- >>: Away a little bit and then might be -- >> Paul Cuff: Good. So you can actually analyze the game -- the way we did it was we say the enemy sees all of your past actions, the enemy sees this public communication, of course, and you play a repeated game, and of all possible ways of using the communication you actually show that you can't do better than doing it this way, okay? So you keep the secrecy perfect.
Because essentially if you leak some of the secrecy, you're achieving some new conditional joint distribution, right, secretly so. Okay. So here's the result. >>: [inaudible] always keeping the secrecy as a measure of them leaking information. [inaudible] information. Communicating more, correlating better. >> Paul Cuff: So there's -- okay. I guess I do need to explain one thing. When we actually showed the lower bound -- that this is optimal for games -- it was a slightly different setting. This guy actually picked his actions; they weren't given to him randomly. He generated actions, sent something about them to the partner, who generated the counterpart actions. In that setting, this is optimal for games, okay? There's a nuance when you do this exact setting, where this is supplied randomly as sensitive information and you're describing something about it to your partner. The nuance is that the game might be kind of degenerate, in the sense that you don't care about keeping a secret -- there's nothing they can do to stop you when you do a certain set of actions, right? If the game is degenerate in that way, then you don't care about secrecy, so you just go for correlation. So there is kind of a balance between leaking the secret and getting higher correlation in this exact setting. Yeah. So the setting where we showed the optimality in the repeated game was slightly different than this. Yeah. >>: [inaudible] because [inaudible] because otherwise maybe you want the enemy to find that -- >> Paul Cuff: Yes. So actually that's one thing I think I'm interested in learning: taking this exact setting and applying it to a game. Okay. In this exact setting, X is better viewed as side information about the game, okay?
X is like some side information: you have a random payoff matrix, and X is correlated to that randomness. And then you say, if you had a partner who was watching the side information and could send you something about it at a certain rate, what should they do to get you to do well in the game? Okay? That's exactly how this would be applied to the game theory. The setting we were able to solve was the one where, rather than side information, this is another action in the game, and you generate it here. It's not given to you, okay? So -- >>: [inaudible]. >> Paul Cuff: Yeah, you can generate it with a mixed strategy and describe it to your partner. But it turns out that that actually makes the problem a little easier, that you're generating it rather than being handed it. Yeah. So I mean it's a little bit subtle, but there is some concrete connection with game theory, and what Yuval mentioned is actually relevant too. Okay. So the result is, again you have this auxiliary U that you get to use in your optimization. Now, we have a trade-off actually, so it's not just going to be a minimum communication rate and a minimum encryption rate. There's a trade-off: you can exchange some encryption rate for some communication rate. And the rate pairs need to satisfy that R1 is greater than the mutual information between X and this made-up U that you get to use for optimization, and R2 is greater than the mutual information between the pair (X, Y) and U. U again has to satisfy its Markovity: it separates X and Y in the Markov sense. Okay? So you consider all U, and each U gives you a set of rate pairs that are achievable. Okay? But the nice thing about this is, what if we looked at the extremes? We're going to get a region, some trade-off region of rate 1 and rate 2. But what if we looked at just the extremes? What's the minimum communication rate if we don't care about the encryption rate, okay?
Well, by the data processing inequality you can say that's minimized by letting U equal Y, okay, because of this Markovity. So if U equals Y, then we have the mutual information between X and Y. Okay. So the minimum communication rate is the mutual information. What if we want to minimize R2 without caring about R1? Well, that's exactly this Wyner problem here. That's the common information. So the two extremes of this region are the mutual information and the common information. It's kind of nice to see these fundamental quantities showing up. So okay. Lastly let's look at an example. Let's go back to the task assignment and solve it for the encryption setting. So in the task assignment problem, one computer was given a task randomly and just wanted the other computer to pick a different task. What we're going to add here is that you want the other computer to actually pick randomly among the other tasks, okay? So let me try to give you an example of what we might be trying to do. Say there are a bunch of files, and you have a virus scanner that's looking through for viruses, okay, in some network. And it goes through these files, but you don't want it to go through in a deterministic pattern, because you might get some clever virus that follows right behind it and never gets caught, okay? So what it's doing is randomly picking a file to scan and then randomly picking a new one. Okay? Now, you have another process, another part of the virus scanner, that's doing the same thing: picking files to scan. But you don't want to be scanning the same one. So what rate of communication and what encryption is needed so that this other virus scanner can randomly pick another file, so that you're both always randomly picking two different tasks? Okay? That's the scenario here, right? Okay.
So you're given a task uniform from one to K, and you apply those results, and what you find is that here's one point that's achievable: communicating at rate one and using encryption at rate two. Okay. Now, how does this work? Well, essentially what you're sending is you're telling the decoder a set to randomize over. So you get assigned task three, and you're telling them, randomize over tasks one, four, seven, 12 and 15. And then they're randomly picking among those, and they don't even know what task you have; they just know it's not in that set, right? So to achieve this point, the sets that you're telling them to randomize over are about half of the total tasks. So it takes about one bit, when you construct this code book, to give them a sequence of these sets that they can randomize from. Okay? But you still need some encryption. What you use the encryption for: one bit is used for the one-time pad, okay? The extra bit is to randomly generate the code books in a special way, because notice that even when you communicate this way, they're not really randomly picking from all of the other tasks, they're only randomly picking from some of the other tasks, right? So if you randomize your code books enough, you can make it all work out so that they are actually randomizing over all the other tasks, okay? Now, there's a trade-off. If you want, you can get away with a much smaller communication rate, about one over K approximately, but it's at the expense of a lot more common randomness, a lot more secret key. And the way this works is you reduce the size of the set that you're telling them to randomize over, until eventually you're just telling them which task to take, right?
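Here is a toy simulation of the rate-one scheme just described: the first computer sends a set of roughly half the remaining tasks, and the partner randomizes inside it, so the two never collide. (This sketch transmits the set itself rather than an index into a shared codebook, so it illustrates correctness of the idea, not the achievable rate or the secrecy part.)

```python
import random

def assign_disjoint(task, K, set_size, rng):
    """Toy version of the scheme above: the computer assigned `task`
    sends its partner a random subset of the *other* tasks (the
    "message"), and the partner picks uniformly inside that subset,
    so the two computers never choose the same task."""
    others = [t for t in range(K) if t != task]
    subset = rng.sample(others, set_size)  # the transmitted set, excludes ours
    return rng.choice(subset)              # partner randomizes within the set

rng = random.Random(0)
K = 16
for _ in range(1000):
    t1 = rng.randrange(K)                     # our task, uniform on 0..K-1
    t2 = assign_disjoint(t1, K, K // 2, rng)  # sets of about half the tasks
    assert t1 != t2                           # never the same task
```

Shrinking `set_size` toward one is the trade-off described next: the communication rate drops, but the partner's private randomization shrinks, so the codebook randomization (the secret key) has to pick up the slack.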
But because they're not randomizing very much privately, you need to do much more randomizing of your code books. Okay. So you require a much larger secret key rate, of approximately log K, rather than just the constant two, which was constant for all K. Okay? So, all right. The general idea here: we looked at how information theory could solve some simple cases, or at least give you some insight into the communication for simple cases of coordination. The idea would be, you know, real networks aren't of this simple structure, and this probably isn't a very good model of how tasks are distributed in a network, but it's a demonstration of how these tools might be used. And just a reminder of what we looked at: the nonadversarial case in three different networks, and then the adversarial case, where we saw a trade-off between communication rate and secret key, and we also saw that some of these fundamental quantities came up -- the minimum communication rate was mutual information, the minimum secret key rate was common information. Thanks. [applause]. >> Yuval Peres: Are there any questions? >>: Perhaps going back to the [inaudible], what happens if you insist that there's no buffer and no code book, you just have to do things one experiment at a time, and then what you're interested in, you know, maybe using some randomness [inaudible], what you're interested in is just the entropy of this random variable that [inaudible]. Seems you can still do some things. >> Paul Cuff: So you're still allowing for coding in the sense that you're getting away with the entropy, right? So if it's less than a bit, then you're going to use less than a bit. >>: So you [inaudible] most of the time -- >> Paul Cuff: I see. So instead, it seems like you're saying, rather than working with mutual informations, which is what you get out of this code book method, you're using just entropies of random variables.
So in other words, you're constructing a correlated random variable and sending it [inaudible] with entropy. I think you would end up -- well, a couple of things. First of all, you'd end up with significantly higher rates, I think, if you did that. But also, there's another way to model this short delay in a network. You can say, suppose you don't want to have buffers at all, or you want to constrain the buffers to some limited size and say we don't want more than this much delay. And people have looked at information theoretic problems from this perspective. I haven't really done it much, but there's work on source coding with fixed delay, and that would also be relevant here, I think. You know. And -- >>: [inaudible] sort of interesting because then it makes it so much simpler, like here's something you can do, maybe, if communication doesn't always work or something like that. >> Paul Cuff: Yeah. Exactly. So -- >>: But measuring your cost in terms of the entropy, your own [inaudible]. >>: Yeah, but. >> Paul Cuff: You're already buffered. >>: But suppose that your communication [inaudible] is smart, so it can send at the entropy [brief talking over]. >>: But the two people that have -- >> Paul Cuff: Sort of a separation. You want a separation where the channel coding is done at the entropy limit by some box that you don't have to deal with, and you -- >>: Just like [inaudible] I mean, that's [inaudible]. >>: I have a stupid question as well, just about the motivation that you gave for the mathematical problem [inaudible], so I think -- >> Paul Cuff: For this thing here? >>: No, no, right [inaudible] when you were talking about searching and [inaudible]. >> Paul Cuff: Yeah.
>>: So it seems to me in that application that there would be a lot of information content in what the task actually [inaudible] is, or -- I'm just wondering if there's some -- >> Paul Cuff: I think a good model for task assignment, better than the one we've done here, would be to say you have a huge set of possible tasks, like all these possible searches or something. And you are given some subset of those to do, like, you know, some set of size K, and those involve all the key words in your search query or something like that -- say, oh, here are the different parts; we can break this search into several different tasks, right? So then the computer in the network that's the controller for the network, or whatever, would not just be given a task to do; it would be given a set of K tasks that need to get done, out of the possible million tasks. And then you say, now it gets to choose who does all these tasks, but it just wants to spread them out so everyone is doing one of them, or something. And it's a really different problem actually. I mean, it may sound the same, but it's really different, because it now has a set of K out of a large number and gets to choose one for itself, whatever it wants, and gets to have them each do different things. >>: So it seems that telling the other computer what the whole task is, is a lot more than just telling it it's task number three? >> Paul Cuff: Yes. Well, if you're looking at all the possible tasks out of a million, then, yeah, that takes that into account, right -- or out of hundreds of trillions, right? That takes that into account. It's like the entire description of what you're supposed to search, right? But, yeah, it's a different problem, although, you know, you can still hack away at it with some of these same tools and find some interesting results actually.
For the case I just described, in this network here, you can get an upper and lower bound that are pretty close to each other, actually. So -- >>: Going back to the star networks [inaudible], the original problem [inaudible], I mean [inaudible], and the rest just need to be different. If you have complete edges between them [inaudible]. >> Paul Cuff: Yeah. I think it would, if you just look at the three-node case. If you had a third connection like this, yes. Actually it might even be solved in that case -- and it does help, but I mean it's hard to say; I guess you'd compare the sum rate. >>: [inaudible] can you [inaudible]. >> Paul Cuff: Oh, beat log K? >>: [inaudible]. >> Paul Cuff: Yeah. >>: [inaudible]. >> Paul Cuff: Yeah, that's a good question. >>: [inaudible]. >> Paul Cuff: Maybe you could. I don't know. So if you just -- >>: I mean it's certainly monotone. So if you have more [inaudible]. >> Paul Cuff: That's a good question. Here's a good question: if you're just allowed any network you want, like you said, the complete graph, what's the lowest sum rate? Is it constant? Maybe it is. >>: And the [inaudible] is the best [inaudible]. >> Paul Cuff: Yeah. As long as you're allowed bidirectional. Yeah. [applause]