>> Melissa Chase: Today we're very happy to have Leo Reyzin here visiting. He's a professor at BU. And he's done a lot of work on things like extraction and key agreement and leakage resilience and a whole bunch of other things. And today he's going to be telling us about some notions of entropy and applications to key agreement and leakage resilience.
>> Leo Reyzin: Thank you, Melissa. Thanks for inviting me. So entropy is the thread that ties everything in this talk together, but that doesn't mean I don't want to tell you about the parts that it's tying together. So feel free to ask questions about any of this. I want it to be interactive, and there are different pieces I'm trying to fit together; we'll see how well it works. Okay. So there are many, many ways to measure entropy. You may have learned some of them in some of your classes. There's the classic Shannon entropy. That's not the one I'm going to talk about, so if you don't know what it is you're safe. If I'm the bad guy and I want to guess your password, obviously your password has to have high entropy. But from what point of view should it have high entropy? It should be hard to guess. That's really what we mean. We'll define entropy throughout this talk in terms of the probability that the adversary predicts a sample from your distribution. We have some distribution we're talking about, and you look at the probability that the adversary successfully predicts a sample in one attempt. Now, that probability is going to be less than one, so in order to make the log positive, we have to take the negative of the log; that's just the way you always deal with entropy. If the probability is 2 to the negative 80, you have 80 bits of security.
>>: For the min entropy, the max operation, why is the max not here?
>> Leo Reyzin: I'm sorry?
>>: Usually for min entropy, it's minus log of the max.
>> Leo Reyzin: So I'll show you in a minute. I'll write the formula in a moment. But I want you to think of this as the philosophy behind everything I'll talk about. Because there won't be just one definition, there will be many. But they will all be about the probability that the adversary predicts the sample. That's the important part. We want that probability to be small, for obvious reasons like passwords. Okay so far? We want it to be small. Okay. So now let me actually draw a distribution. I'll try to be consistent: capital letters will denote distributions, lower case letters will denote samples from those distributions. If I have a distribution of passwords and you're the adversary, which one will you go for? If you're moderately smart, you're not going to go for this one, you'll go for this one, the most likely password. So I take the maximum probability over all possible events in my distribution, and I take the negative log of that. That's the min entropy. That's the definition of min entropy. It's a very simple definition. And the reason it's called min entropy is because if you took the max outside of the minus sign, you would have a min. Yeah? It's just a silly thing. Okay. So that's min entropy, and that's been around for a very long time. What's it good for? Well, one thing I already talked about: passwords. If you have a password, it better come from a low min entropy distribution. No, sorry, high min entropy distribution.
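In standard notation, the definition just described is:

    H_\infty(W) = -\log_2 \max_w \Pr[W = w]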
If you have a password and you want, say, 80 bits of security, it should come from a distribution with min entropy 80. So the maximum probability should be two to the negative 80. What else? Well, it turns out it's good for message authentication. This was observed some time ago by Renner and Wolf. So let's say I have a key that does not come from a uniform distribution, because for whatever reason I can't sample things uniformly. And I want to use that key as a message authentication key, so that you and I share that key, it's a symmetric setting, and I want to send you a message using that key. The way I'm going to think of this key: it has length N; split it into two parts of length N over 2 each. Each of those halves I will think of as an element of the appropriate finite field. And here's my message M, and I'm going to think of it as belonging to the same finite field. The key is twice as long as the message in this case. I'm going to multiply the message by the first part and add the second part, and that's going to be my MAC, my tag on the message. So it's A times M plus B. Have people seen this? How many people have seen this MAC construction, I'm curious? Not everybody in the audience has seen it. This is a MAC construction that actually goes back to Wegman and Carter, all the way back to 1981, when they talked about universal hashing. There's a first paper of Carter and Wegman on universal hashing for algorithms and data structure purposes, and they realized two years later that the same tool they had already developed actually works for security. And this is a message authentication code. So it turns out this is secure if A and B are chosen completely uniformly at random. This has nothing to do with entropy; this is just a uniform random key. This is going to guarantee that any adversary who sees sigma and M cannot modify the message and come up with a different sigma that verifies correctly. But what if we don't have full entropy, what if we only have some K bits of entropy in here? So think of the entropy as somehow distributed through this A/B; it's not like certain bits have entropy and certain ones don't. I'll draw it like this so we can visualize it: imagine some of the entropy is sitting in B and some is in A. And the gap, the lack of entropy, which I'm going to denote by this sort of brick color, is N minus K. That's what's missing. It turns out that this thing is still secure, and the amount of security you get is what's left of this half after the bricked-off area: the security is N over 2 minus the entropy gap. In other words, what is this really saying? It's saying that the key better be at least half entropic. Right? And essentially there should be some entropy in A and some entropy in B. Because if either component has no entropy, then you clearly didn't need such a complicated function; you could have done without it. But it says if both components have enough entropy, this is going to work. I'm not going to prove this result.
>>: That's for that specific --
>> Leo Reyzin: That's for that specific MAC. And in fact later I'll show you a result that says that this is essentially the best you can do. If you have less than half entropy, then message authentication is not possible. If you and I share a key whose entropy is less than half of its length, we cannot authenticate a message without somehow coming up with more shared randomness. This key alone is not enough.
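As an illustrative sketch (not code from the talk), the A times M plus B MAC might look like this in Python, with a prime field standing in for the GF(2^(N/2)) arithmetic used on the slide:

    import secrets

    P = 2**127 - 1  # a Mersenne prime; keys and messages live in Z_P

    def keygen():
        # ideally uniform; the point above is that partial entropy still helps
        return secrets.randbelow(P), secrets.randbelow(P)

    def tag(key, m):
        # the one-time MAC just described: sigma = a*m + b in the field
        a, b = key
        return (a * m + b) % P

    def verify(key, m, sigma):
        return tag(key, m) == sigma

With a uniform key, seeing one pair (m, sigma) pins (a, b) down only to a line with P points, so a forgery on a different message succeeds with probability 1/P.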
>>: Does that imply that you could somehow distill the entropy and shorten the key length, perhaps?
>> Leo Reyzin: So it would be nicer, right? If the key has entropy K, it would be nice if it also had length K. You're sort of wasting all these extra bits, right? But we don't have a nice generic way to shorten the key. So obviously it's best to have keys with full entropy. That's still the case. It's just that if you're worried about not quite having full entropy, you may still be okay. Does that make sense? And this holds only with respect to min entropy. If you have a different notion of entropy, it's not clear this is going to work. And there's a proof of this in Maurer and Wolf. I'm not going to prove it here. Does it make sense what I mean by security? The game is -- no? So please ask. So the game is: you and I share this key. I want to send you a message. Christine wants to mess with it. So I send you the message, and she sees the message and the tag sigma on the message. She doesn't know the key. She wants to modify the message and the tag so that you still think it's okay. And the way you're going to verify is by computing the same formula and seeing if it matches. And the point is that this is not going to be possible, except with exactly this probability. The probability of success is 2 to the minus this. So this is the number of bits of security you get.
>>: [inaudible].
>> Leo Reyzin: This Maurer and Wolf paper, you know, I'll have to look it up for you. I think it's in the information theoretic -- I think it's in the privacy amplification paper.
>>: This is the [inaudible].
>> Leo Reyzin: Yes, yes. All the results so far are information theoretic; there are no computational assumptions we're making. We're not assuming hardness of anything, just entropy of keys. Other questions? Good. What else is it good for? This is the MAC; I just want to keep it up there. It's also good, it turns out, for secret key extraction, provided -- provided we have something. And I'll tell you exactly what I mean by "have something." But once you extract a good secret key that has full entropy, then you can use it for one-time pad encryption; if you believe in hardness assumptions you can use a pseudorandom generator to expand it out; you can use it for all sorts of things. So secret key extraction seems like a nice thing to have. What I mean by secret key extraction: there's a primitive called an extractor that will take an input that has a reasonable amount of entropy, take a uniform seed, and output something that is uniform. How many people have seen extractors? Yes. Almost everybody. Okay. Good. So this seems like cheating, because I have to give a uniform seed to get a uniform key, so kind of what's the point? If I could do that, then what do I need this thing for? But it's not cheating in the following sense: the seed and the output will be jointly uniform. This is known as a strong extractor in the classic literature. What does that mean? It means that any adversary who sees the seed still doesn't know the key, and vice versa. They are jointly distributed almost uniformly, epsilon close to uniform, whatever epsilon you want to set.
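In standard notation, a (k, epsilon) strong extractor Ext from n-bit inputs to m-bit outputs with a d-bit seed satisfies, for every W with H_\infty(W) \ge k and a uniform seed S:

    \mathrm{SD}\big( (\mathrm{Ext}(W, S),\ S),\ (U_m,\ S) \big) \le \varepsilon

where SD is statistical distance and U_m is uniform on m bits; "jointly uniform" is exactly this statement.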
It essentially means the seed can be reused, like a catalyst in a chemical reaction. What's a catalyst? You throw it into a chemical reaction, the reaction happens, you take it out, throw it into the next chemical reaction. Same thing with a seed. You need to throw a seed in in order to extract a good key, but the seed is reusable. In particular it can be public: as long as it's uniform and independent of this W, you're okay. So that's the important feature of extractors. They do need uniform bits, but those bits can be public and reusable, as long as they're independent of whatever input to the extractor you have.
>>: [inaudible].
>> Leo Reyzin: Right, and, good point: in some constructions the seed can be much shorter than the input. The seed can be logarithmic in the length of this; it can be really, really short. We won't be taking huge advantage of that fact here, but it is a useful feature that you need very little randomness and you can extract. So this is reusable. Extractors go back to the mid '80s, although they were not formally defined until the mid '90s really, but we've known them since then. Okay. So I just want you to remember those three things that min entropy is good for, already, without any extra work. None of it is my work. But then you can ask about privacy amplification. How many people know what privacy amplification is? Good. I'll define it then. So privacy amplification is the following thing: Alice and Bob have a shared W. Think of the scenario of message authentication that I was talking about before. It's the same scenario: you and I share the same secret key. And our adversary knows something about it. Okay. And because the adversary knows something about it, it's no longer fully secret. It's partially secret, perhaps, because the adversary might not know everything about it. There are several classic examples. A really classic example, going back to the '80s, to the original motivation, is quantum key agreement, quantum key distribution. If I send things over a quantum channel, the adversary may eavesdrop on some of the bits I send without modifying them, but after a few of them there's a good probability they get modified and we detect it. That's essentially what's going on. So the adversary knows something about our shared key but not everything. And there are more motivations today. People are trying to do key agreement where we take two cell phones and we shake them together. The cell phones share some common information W about the way they were shaken. But it's hard to argue that the shaking pattern was fully secret. Maybe partially secret, but there's some base pattern that is probably the same for every shaking. So there's some notion of entropy, right, that Alice and Bob share in this key that nobody else has. But it's really hard to pin down exactly what Alice and Bob share. So that's the scenario of privacy amplification. And the goal is to agree on a uniform secret R. Why? Because once we agree on a good uniform secret, we can use it as a one-time pad, a pseudorandom generator, whatever. This is the start of all cryptography, once you have a uniform secret. So that's the goal. The goal goes back to a nonconstructive existence proof by [inaudible], back in '84. Okay.
And the simple solution, given the terminology that we have today, is just to use an extractor. Right? Alice can generate a random seed I and send it over to Bob. We don't mind that Eve also gets I in the process, because we don't mind that the extractor seed is public. That's the beauty of extractors, right? And then Bob can do the same thing. So this makes sense. It's secure against Eve, who eavesdrops on the channel and knows something about W. It seems to work, except that in order for extractors to work, you have to know the entropy of what you're extracting from. You certainly can't extract more than the entropy you have. Extractors are designed to work for specific entropies. How do you know the entropy you have? The entropy you have depends on Eve's knowledge. We'll use Y to denote Eve's knowledge, right? So in order to know which extractor to apply, it would seem we have to know the entropy, and we may not, because it depends on what Eve knows. So it seems like min entropy is not good enough anymore. So let me talk about a different definition, a conditional definition of entropy. Let's think about it for a moment. Imagine that what we have is a uniform key initially, a perfectly good uniform key. It's not like we're unable to generate uniform bits. It just so happened, however, that Eve, through her devious ways, found out the Hamming weight of our key, the number of ones in it. So let's think about the entropy. Let's say that Eve knows that the Hamming weight is exactly N over 2: half the bits are ones. Y is the knowledge of Eve, so Eve's knowledge is this fact. The probability of this fact happens to be at least 1 over 2 root N; just trust me on the bound. That fact is not terribly surprising: 1 over 2 root N is not a small probability. So Eve's knowledge is not a lot. You know a lot when the knowledge is surprising. Because of that, we can prove -- this is a fairly straightforward proof -- that the entropy of the original key conditioned on Eve's knowledge is what it was originally minus the log of that probability, and the negative log of 1 over 2 root N is half log N plus 1.
>>: Is this always half of --
>> Leo Reyzin: If the probability of the fact Eve learns is p, then my entropy reduces by at most log of 1 over p. That's a fairly straightforward thing to prove. Does that make sense? This is the easiest example to imagine: suppose Eve knows that the first bit of the string is one. Then the entropy got reduced by one bit. The probability of that fact is a half.
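In symbols, the bound being discussed: for any particular value y,

    H_\infty(W \mid Y = y) \ \ge\ H_\infty(W) - \log_2 \frac{1}{\Pr[Y = y]}

so in the Hamming weight example, the conditional min entropy is at least N minus (half log N plus 1).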
>>: Is this notion [inaudible]?
>> Leo Reyzin: It is. For min entropy it also works. Actually, for min entropy it's even easier to prove, because you just look at the conditional probability: the max cannot jump by more than a factor of one over the probability of the condition. Min entropy is about the max; that's just the way conditional probabilities behave.
>>: [inaudible].
>> Leo Reyzin: I'm sorry, inequality --
>>: Inequality, like when you say W given Y is greater than N minus whatever, the greater-than-or-equal is because of an inequality in the --
>> Leo Reyzin: I see. So the question is: if this were an equality, would this also be an equality?
>>: Because it sounds from what you said that you can say that [inaudible] W given Y is min entropy of W minus min entropy.
>> Leo Reyzin: No, no, it could be better. The entropy doesn't have to get reduced by that much, because it could be that W wasn't initially uniform. There are examples. So imagine the following example: W wasn't uniform initially; W has the first bit always equal to one. And Eve finds out that the first bit is equal to one. She hasn't learned anything new, so it doesn't reduce the entropy. So you don't necessarily get equality. Okay. So that's if Eve knows that the Hamming weight is exactly half, which is the most likely Hamming weight for a random string. If Eve knows that the Hamming weight is N, well, that's a very, very unlikely event, so she suddenly learned a lot: the string is all 1s, and the entropy has gone to zero. But sometimes you don't want to reason about "what if Eve knows this, what if Eve knows that." You want to say: conditioned on the knowledge of the Hamming weight, do I have entropy or not? So you really want to talk about the average in some sense -- not conditioning on the worst case, not conditioning on the best case, but conditioning on the typical case that happens. How can you even talk about that? You can still try to do the same thing. Remember that min entropy is the negative log of predictability. You can still run the whole experiment and ask: what is the probability that Eve predicts my sample? So what is the experiment? I pick a random W, I give Eve its Hamming weight, and Eve is incredibly smart, all powerful. I ask: given that she knows the Hamming weight, what is the probability that she predicts my W? It's a meaningful thing to ask. And so we want to define the probability of predicting W given Y as our entropy. And it turns out that the right thing to take is the expectation, over all possible Ys, of the maximum probability of any particular W given that Y. Whatever particular value of Y Eve knows, she's going to go for the most likely W. If she knows the Hamming weight is this, she'll go for this W; if she knows the Hamming weight is that, she'll go for that W. But she doesn't control the Hamming weight; I choose W initially. So it's the expectation over all Ys, not the worst case Y; she doesn't control that.
>>: Why would you try to pick a competing sample? Why not give one that's optimally good for you [inaudible]?
>> Leo Reyzin: So often you don't have the luxury of picking repeated samples. Let's say you shook your phone: you don't want to tell users to shake phones a million times, right? And often you don't know that Eve gets exactly the Hamming weight. It's very hard to bound exactly what Eve knows. The assumption is that Eve knows no more than some number of bits, and you hope that you're right. I mean, this is too clean an example for real life.
>>: But [inaudible] make these assumptions, you could sample --
>> Leo Reyzin: Then I would sample -- if I knew Eve gets the Hamming weight, I would sample things with Hamming weight half, and that's it. I would not even bother. Sometimes I don't have that luxury.
>>: [inaudible].
>> Leo Reyzin: So Y is some knowledge of Eve. It's a random variable, correlated with W. W and Y are correlated random variables.
>>: [inaudible] like a random variable with some distribution?
>> Leo Reyzin: Yes, exactly. Y and W are correlated random variables. And you can define this notion.
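Written out, the notion being defined (in standard notation) is:

    \tilde{H}_\infty(W \mid Y) = -\log_2\ \mathbb{E}_{y \leftarrow Y}\left[ \max_w \Pr[W = w \mid Y = y] \right]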
And so this was defined in a paper by Dodis, Ostrovsky, Smith, and myself way back in 2004, as, I think we called it, average conditional min entropy or something like that. And then, of course, at the end you have to take the negative log, as always. When you talk about entropies you take logs; it's just easier, so you don't have to deal with really small numbers. Notice that average min entropy is not the average of min entropy. The expectation lives inside the log, not outside the log. That makes a big difference. A good counterexample to consider: imagine you have a system where for half the cases min entropy is 0 -- Eve knows everything -- and for half the cases min entropy is a thousand. What's the probability that Eve guesses? It's about a half, right? So you don't want to average 0 and a thousand and get 500 bits of entropy. You don't have 500 bits of entropy if half of the time you have zero and half the time a thousand. Shannon entropy doesn't behave this way. In Shannon entropy, for those who know it, if half the time you have 0 bits and half the time you have a thousand bits, then on average Shannon says you have 500 bits of entropy. You don't here. Not from a security point of view. If you look at a lot of older papers doing the kinds of stuff I'm talking about here, they consider Shannon entropy, because they didn't have min entropy to talk about, and they would prove things like "the resulting key has 500 bits of Shannon entropy." Well, that's not interesting. 500 bits of Shannon entropy means it could be 0 half the time. That's not good enough. And so it really took a while for people to converge to better notions of entropy than Shannon. So take the log after you've done everything. That's the moral of this. Does that make sense?
>>: Based on a distribution applied?
>> Leo Reyzin: Right. For any pair of distributions Y, W, I can specify -- I can measure. Of course, if I don't know anything about the distributions, then it's bad. So I'll give you examples. I'll give you examples, I promise. Okay. So now we have this average min entropy. And we can ask the same question we asked about the original min entropy: what's this one good for, the average case? You can still prove, essentially directly from the definition, that if you have an adversary who knows Y and your password is W, then the probability of guessing it is exactly 2 to the minus the entropy. If you think about it, this is a better way to model passwords. The adversary knows something about you: that's the Y. You pick the password: that's the W. It's always conditional. You never live in a vacuum where the adversary has no information. This is the right way of modeling a password. There's a W you know, there's the Y the adversary knows, and they may be correlated in some way. The adversary may know your gender, age, birthdate, whatever; that's the correlated stuff. It turns out that the same message authentication code that I talked about earlier also works for average entropy. The probability that Christine -- I think Christine was our adversary -- will be able to modify the message and the signature so that it still verifies is exactly whatever it was before, except you substitute average entropy for the entropy, which is nice. And the nice thing that was proven by Salil Vadhan two months ago, literally, is that every extractor that works for entropy also works for average entropy, with the loss of three bits or something like that. Essentially, three is the actual number. I'm not exaggerating.
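A quick numeric check of the zero-or-a-thousand counterexample above (an illustrative sketch): take the expectation of the guessing probability first, and the log last.

    from math import log2

    # half the time Eve guesses with probability 1 (0 bits of entropy),
    # half the time with probability 2^-1000 (1000 bits of entropy)
    p_guess = 0.5 * 1.0 + 0.5 * 2.0 ** -1000
    print(-log2(p_guess))  # about 1 bit of average min entropy, not 500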
>>: [inaudible].
>> Leo Reyzin: This one, whatever I just defined, I call average entropy, or people call it conditional entropy. But I think conditional min entropy is probably the right way to call it. So we knew before that some extractors worked, and some worked with a fairly big loss. What was proved is that no big loss is needed: you lose three bits and that's it. So all extractors will work for this notion. And what I mean by "an extractor works": I mean the output of my extractor will look uniform even to the adversary who knows Y. So there's an adversary who has this additional information, and the output will still be statistically close to uniform, even given that extra information. So that implies that privacy amplification works. Remember what motivated us: the question of Alice and Bob and Eve, where Alice and Bob share something W and Eve knows something about it. Now finally we can put a rigorous condition on what we want in order for privacy amplification to work. What we want is that the conditional entropy of W conditioned on Eve's knowledge should be high. And if it is, then we're good; an extractor will work.
>>: You still have to sort of say, I believe Eve only knows Y, and that's --
>> Leo Reyzin: Correct. Yes. Yes. Yes. You have to say that, and I'll show you how to bound that, how to be able to say that, in a moment.
>>: Large number of trials.
>> Leo Reyzin: No, no repeated independent experiments at all. So if you and I share a W and we're worried about who is going to be the next adversary -- David; we're worried that David knows something about our W -- but we believe that there's some entropy left in W conditioned on his knowledge, then we're okay.
>>: This setting reminds me of the work on password authenticated key exchange. So is it exactly the same setting?
>> Leo Reyzin: Same setting, except in password authenticated key exchange, W is the shared password. Think of this as the client, this is the server; they share password W. In that setting they don't mind making computational assumptions. This is information theoretic. So the first part of this talk is information theoretic; we'll see how far we get into computational. So password authenticated key exchange also solves this problem. The nice thing about this approach is that it can be very efficient. Extractors can be incredibly efficient constructions -- an inner product, that kind of thing, no fancy math required.
>>: Vadhan's theorem: is that only when you're talking about Eve knowing the Hamming weight, or does it work for other stuff?
>> Leo Reyzin: No, for any -- the rigorous statement, as rigorous as I'll make it on the slide, is this: if the conditional entropy is at least K, and the extractor was designed for that specific K, then the output will be jointly uniform. So for any Y, not just Hamming weight. For any Y, as long as the conditional entropy of W conditioned on Y remains high. If you look at the extractor literature, it says this extractor is a K epsilon extractor, meaning it works with anything of min entropy K. The theorem proves that if it was a K epsilon extractor for unconditional entropy, then it's also a K epsilon extractor for conditional, or maybe K plus 3, epsilon, something like that.
>>: So this single extractor will work for a large number of distributions [inaudible]?
>> Leo Reyzin: Exactly. No matter what Y is, as long as W has entropy conditioned on Y.
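As a toy illustration (a hypothetical sketch; the hash family and parameters are my own choices, not a construction from the talk), here is the classic leftover-hash-lemma style extractor: a pairwise independent hash, truncated.

    import secrets

    P = 2**521 - 1  # a prime larger than the input space (inputs up to 512 bits)
    M = 64          # output length; must be comfortably below the entropy of w

    def sample_seed():
        # the seed is public and reusable, as long as it is independent of w
        return (secrets.randbelow(P), secrets.randbelow(P))

    def ext(w, seed):
        # h_{a,b}(w) = ((a*w + b) mod P), truncated to M bits
        a, b = seed
        return ((a * w + b) % P) % (1 << M)

For privacy amplification as described above, Alice samples the seed, sends it in the clear, and both sides output ext(w, seed).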
>>: Then, in particular, you don't even need to be able to sample Y, or decide membership?
>> Leo Reyzin: Y doesn't have to be efficient in any sense. It's all purely information theoretic. It's really cool. And, moreover, it's like a two-page, page-and-a-half proof with all the details. It's nice. I can find the pointers if you're curious. So the point is we don't need to know what Eve knows; we just need to know that she doesn't know too much, and then we can apply an extractor. And, by the way, these ideas about Eve knowing something go back to the '80s, and in the first papers people asked: what if Eve knows this thing, and what if Eve knows that thing? And slowly they realized the techniques are general enough, but it took a while. In modern language, it's all trivial. But it took a lot of work to get there, and a lot of definitions of the right notions. Now we'll move to a harder problem: information reconciliation. Information reconciliation is like the previous setting, except life is a little bit worse, in that what Bob has is not actually identical to what Alice has, but somewhat different. Maybe not too different. If you shake two phones, the accelerometers are not going to measure the exact same thing, because of calibration issues, orientation issues, and whatnot. If you do quantum key distribution, some quantum bits could flip along the way because of, I don't know, cosmic rays or whatever causes bits to flip; I'm making stuff up. And the problem also goes back to Bennett, Brassard, and Robert [phonetic]. This is like the third time I have to cite this paper, right? They already considered that. Okay. And what do I mean by "similar"? For now let's think of Hamming distance, the number of bit flips. There are more general definitions, but for this talk I'm going to think about Hamming distance. Okay. So how do we do this? There's a generic technique that people have thought about. I'm going to apply some algorithm called S, for "sketch," to my W. I'm Alice now. I'm going to send some error correcting info, S, to Bob. And then I'm also going to extract. So my W is going into two algorithms: the sketching algorithm and the extractor. Before it was just the extractor; now it's both. And I send both things, the extractor seed and the sketch, over to Bob. Bob is going to recover W using the error correcting information. I'll show you a concrete example of this error correcting information in a moment. And the recovered W will get extracted, and they'll extract the same thing. So that's the big idea of how to do this sort of thing. Well, how long an R can you extract? That's, of course, what you want to know: what kind of extractor you can use. And how long an R you can extract depends, of course, on the entropy of W conditioned on what Eve knew before we even started, which is Y, and what she finds out now, which is also S. So now you've added something to Eve's knowledge by sending that error correcting information. So here's a nice lemma that says: if you condition on a new variable, you reduce the entropy by no more than the bit length of that variable. It intuitively makes sense: if I have 100 bits of information that's secret from Eve and I send three bits, Eve learns no more than those 3 bits, so I should still have 97 bits of secrecy.
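The lemma, as it is usually stated: if S is a string of at most lambda bits, then

    \tilde{H}_\infty(W \mid Y, S) \ \ge\ \tilde{H}_\infty(W \mid Y) - \lambda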
Intuitively, it's the right statement, and the fact that it actually is the right statement -- the fact that you can prove it -- is nice.
>>: Shouldn't we have that S on the right-hand side, inside the conditioning?
>> Leo Reyzin: Yes.
>>: So it's a raised condition on I?
>> Leo Reyzin: In other words, if this information was entirely uncorrelated with W -- imagine S is three bits of information entirely uncorrelated with anything -- then conditioning on it shouldn't reduce the entropy. And, indeed, it won't, because it will increase this side and it will decrease this side. I'll show you an example of how this is used; it's actually very nice. Does it make sense? So the entropy of W, when it's conditioned on an additional thing, gets reduced by at most the bit length of that thing, but not at all if that thing is independent of W to begin with. And that holds even if W was initially conditioned. You can keep conditioning and conditioning and it will work. So why is this useful? I actually want to show you an instantiation of this protocol and how easy the analysis becomes given this lemma. So I'm going to show you how to build this error correcting information. This has nothing to do with entropy -- it's a bit of a detour -- but the analysis will be very nice after it, so I think it's worth it.
>>: Where can I find this lemma?
>> Leo Reyzin: I don't want to scroll all the way back. It's in the same paper of Dodis, Ostrovsky, Smith, and myself, where we define this notion. So we define the notion and prove this lemma. I can send you the link. So, okay, let's build this S. We're going to build it, not surprisingly, out of error correcting codes. The way we'll think of error correcting codes: a code maps M bit messages into N bit code words. So here are all the messages, 2 to the M of them, and they map into a much larger space of 2 to the N possible strings, where the code words are all far apart. That's the way to think of codes. Any two code words will differ in at least D locations. That's the nice feature of error correcting codes: if you have fewer than D over 2 errors, you know exactly where things came from. Okay. So that's just the background on what I mean by error correcting code. So how do we then use an error correcting code to actually transmit this error correcting information? I'm Alice, so I have my W, right? You have W prime. What if you happen to know that my W is a code word? Well, as long as the number of errors in your W prime is not too big, you can just decode to that code word. You don't need any information from me. If I have a code word and you have a corrupted code word: decode, and we're done. Well, that's not always the case in life, of course. But what we're going to do is: if my W is not a code word, I'm going to shift the entire code to make it a code word. What do I mean by shift? I'm going to take a random code word -- the error correcting encoding of a random message -- and I'm going to shift it onto my W; since we're in the binary case, that's just XOR. That offset is what I'm going to send to you. How do you decode? So this is what I send you, right, this shift. Let's forget about the formula and think about the picture. You have a W prime, right? You shift it back by the offset. You decode. And you shift forward by the same offset. Shifting backward and shifting forward in the binary case are the same thing -- it's all XOR -- but logically that's what's happening, because that's what I've told you is happening: under that offset, my W is a code word. You have to offset things and work it out.
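As a toy instantiation (my own sketch, not code from the talk), here is the code-offset construction with a 3x repetition code, which corrects one flipped bit per 3-bit block:

    import secrets

    R = 3  # repetition factor: the code maps M bits to N = 3*M bits

    def encode(msg_bits):
        return [b for b in msg_bits for _ in range(R)]

    def decode(bits):
        # majority vote within each block of R bits
        return [int(sum(bits[i:i + R]) > R // 2) for i in range(0, len(bits), R)]

    def sketch(w):
        # the offset of w from a random codeword; this is what Alice publishes
        r = [secrets.randbits(1) for _ in range(len(w) // R)]
        return [wi ^ ci for wi, ci in zip(w, encode(r))]

    def recover(w_prime, s):
        # shift back by the offset, decode to the nearest codeword, shift forward
        c = encode(decode([wi ^ si for wi, si in zip(w_prime, s)]))
        return [ci ^ si for ci, si in zip(c, s)]

If w_prime differs from w in at most one position per 3-bit block, recover(w_prime, sketch(w)) returns w exactly; the entropy accounting for publishing the sketch is what comes next.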
This construction is due to Juels and Wattenberg in 2002. But there wasn't really a nice analysis of it, because they didn't have this notion of entropy. There was some analysis of it, but let's analyze it the way I want to analyze it, because it all fits into one slide and I can prove a strong thing about it, whereas before we didn't know how to do it. Alice, on her own, extracts the key, and she also sends this information over to Bob: a random code word XOR'd with her W. That's the sketch. And we have the lemma that I already showed you that tells us how much entropy we have. And now, just using this lemma, I can tell you how big the extracted output can be. Think about it. How much entropy do we have? Well, what happened? There was some entropy to begin with. And then there was randomness that went in here: the random message that we encoded.
>>: What is R?
>> Leo Reyzin: R was a random value that we encoded to get a random code word. We took a random R and applied the error correcting encoding function to it. So we have whatever entropy was originally there, plus the bit length of R, because we added that randomness in. And the bit length of S is N, where N is the length of W. Let's do some more bit counting: the code goes from M bits to N bits, so R is M bits and S is N bits. So the entropy of what we have, conditioned on what Eve has, is whatever we had originally, plus M, minus N. If you have a good code, then the entropy loss due to this extra step, N minus M, is small. If people give me better codes, I get a better result, and I can hand off the problem to the error correcting people and tell them: give me good codes. Good codes are the ones where the gap between M and N is not too big for the number of errors corrected. This is the entire analysis, in one slide. That's the nice thing. I know exactly how many bits I can extract: whatever entropy I had initially minus -- Yeah? Sorry?
>>: So the reason you do this is that you don't want your secrecy to depend on --
>> Leo Reyzin: Well, I don't know the distribution of W. I could also shift it to -- it's trickier to analyze, but I could shift it to a fixed -- no, I can't shift to a fixed code word. With a linear code I could do other things.
>>: What if I already know the distribution of -- do I still need to do the shift?
>> Leo Reyzin: Yes. And the distribution is not enough.
>>: Why isn't this shifting thing in the original [inaudible]?
>> Leo Reyzin: It is, implicitly. It's instantiated differently, but it's in there. I'll show you offline. They use systematic codes, a different trick: a systematic code where you send over the tail. But it's all equivalent, because of duality of error correcting codes. It's really the same thing, really. Okay. So let's consider the case of an active adversary. Everything we've done so far still stands, but now Eve can actually modify the messages between us. Right? That seems trickier. So what we want to say is: look, if we catch Eve doing this, then the best thing we can do is give up. This is sort of like the key agreement talk we had one hour ago. And this goes back to Maurer from '97.
And notice, this is easier than it might seem when the two strings are the same, because then the trick is just to get the extractor seed across. If I can get the extractor seed across and make sure Eve does not modify it, then I'm okay. That's really the main trick, because Eve gets to modify stuff. So how do I get it across and make sure I detect modification? The problem is that the only thing we have to begin with is W. And we cannot use it as a MAC key if it's not at least half entropic; that's what I showed you at the very beginning. It can be used if it is half entropic. So what do I do?
>>: [inaudible] when W is equal --
>> Leo Reyzin: So actually I'm only going to do the case where W equals W prime, but essentially you solve the unequal case the same way we solved the previous things, by adding some authentication in. So the idea, due to Maurer and Wolf, is to use interaction in order to do this. Their work was for the case when W is equal to W prime. And Bob and I had follow-up work that made it work when W is not equal to W prime. So I want to show you how they authenticate one bit of the extractor seed, and you will believe me that they can authenticate the entire extractor seed, because it's just bit by bit. So how do they authenticate one bit of the extractor seed? Bob sends a challenge X, and Alice applies an extractor to W, with that challenge as the seed. This is not the ultimate extractor they want to use; this is just an extractor as a tool.
>>: Can Eve modify the challenge?
>> Leo Reyzin: Yes, yes, yes. Correct. We'll see exactly what happens. Okay. And then Alice wants to authenticate a bit B. What she's going to do: if B is one, she sends the extracted result T; otherwise she just sends zero, with no extracted result. And Bob is going to accept if this is correct. So you stare at this a little bit and it makes absolutely no sense; let's try to make it make sense for a moment. Eve can make Bob's view not equal to Alice's view. Eve is perfectly capable of taking the extractor seed that Bob sends and making it into a different extractor seed; nothing stops her from doing that. Alice will extract some T prime -- extractors don't guarantee anything here. Eve could even take this T prime and modify it into a T that Bob will accept. Eve could do that evil thing. But that doesn't concern us. What concerns us is that Eve cannot change a 0 to a 1, where a 0 is when Alice sends nothing and a 1 is when Alice sends something. All we want to prove is that it's hard to change nothing into something, right? Because T has entropy. And there is -- this thing is acting up. Okay. Wait, where were we? Here we were. To prevent changing a 1 into a 0, encode the long seed so it has a fixed number of ones; then turning a 1 into a 0 somewhere forces turning a 0 into a 1 somewhere else. B is a single bit; we authenticate a single bit for now. Eventually I want to authenticate a whole long extractor seed I, but I'm going to do it one bit at a time, and in my big extractor seed I'll make the encoding balanced, so I only need to worry about 0-to-1 changes. So even though Eve can make the two views unequal, she cannot change, logically, the bit that Alice is sending. Alice is sending some bit, logically, over to Bob using this trick. So how do we prove that?
>>: Wait. You said that Bob accepts a 1 if the extraction is correct.
>> Leo Reyzin: Yes, sorry.
>>: Does he always accept -- like if he gets a 0, does --
>> Leo Reyzin: He accepts. If he sees a 0: thank you for sending me 0. If he sees a 1 with a tag, he verifies that the extraction is correct.
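A hypothetical sketch of this one-bit step in Python (my own illustration; ext here is a placeholder marking where a real strong extractor goes, not an actual extractor):

    import hashlib
    import secrets

    def ext(w, x):
        # placeholder only: stands in for Ext(w; x), NOT a real extractor
        return hashlib.sha256(x + w).digest()[:4]

    def bob_challenge():
        return secrets.token_bytes(16)  # fresh random challenge X for each bit

    def alice_respond(w, x, b):
        # send the extracted tag T only for a 1; send nothing for a 0
        return ext(w, x) if b == 1 else None

    def bob_receive(w, x, response):
        # "nothing" is read as the bit 0; a tag must extract correctly to count as 1
        if response is None:
            return 0
        return 1 if response == ext(w, x) else None  # None here means reject and give up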
>>: Eve has no W?
>> Leo Reyzin: Eve doesn't know W. That's the only thing Eve doesn't know. The problem is we're trying to agree on a key from something that Eve doesn't know, and we have to use that something in order to agree on the key. It's a chicken-and-egg sort of problem. Okay. So here's the lemma from a paper by Bob and myself. The entropy of the extracted value -- this value T right here -- conditioned on everything Eve knows, including X, is sufficiently high. How high? Well, there are 2 parameters in extractors: the output length and how close the output is to uniform. If the output is far from uniform, it's not a very good extractor. But if the output is close to uniform, T's entropy is essentially the extractor output length minus 1. So what does this lemma give us? It immediately gives us that if Eve sees X and whatever other knowledge she had, then as long as W had entropy, this T is going to have entropy. And if Alice is sending a 0, she's not going to respond, so the response cannot possibly reduce the entropy. If she's sending a 1, the entropy may get reduced by the response. But if she sends 0, the entropy cannot be reduced, so for Eve to guess T is going to be hard. Interesting, hand waving causes -- [laughter] -- there may be a security question here somewhere. So this is going to work. This lemma holds, of course, as long as there's enough entropy in W to begin with; you can't extract something out of nothing. There's a precondition to the lemma that W has enough entropy. But we know how much entropy W has, because we know how much it has initially, and we know how much each authenticated bit reduces it. Remember, we're authenticating many bits, one at a time, this way, but we know how the entropy gets reduced, from the previous lemma: each time you send a response, the entropy reduces by at most the length of that response. So as long as there's enough total, we'll be okay. So this is a very simple analysis of a protocol that did not previously have such a simple analysis. And at the end, the extracted output has entropy.
>>: Different [inaudible]?
>> Leo Reyzin: Yes. For every bit this thing is repeated -- not a particularly efficient protocol. I can tell you offline about follow-up work that makes it better. But essentially, for every bit I send you a fresh random challenge and you send me a response; fresh challenge, response; and the extractor seed can be logarithmic, so it's not terrible. Right. So it's a multi-round thing.
>>: What was it using to authenticate?
>> Leo Reyzin: So why are we doing this in the first place? We're doing this because we want to do privacy amplification, and the way we do privacy amplification is by me sending a seed over to you, seed I. I send seed I over to you by doing it one bit at a time. Each time, I'm going to come up with this fresh X; I'm going to do challenge-response; you see if it's a 0 or a 1; you go on to the next bit. We actually have an implementation that's not terrible. What was it? Well under a second somewhere, right?
>>: I think so. A couple of seconds.
>> Leo Reyzin: It's doable. It's not the thing you want to do all the time, but if you have no other choice it's doable. Okay. I'm probably going to skip this, because I want to get a little bit to the computational thing. So [inaudible] and I have worked on making this protocol more efficient, essentially, and what I wanted to brag about is how easy the analysis is, but I'm going to skip it because it's not so easy.
So, okay, what about the computational analog? Everything I talked about so far is information theoretic; why not make computational assumptions and see what we can do with them? I'll try to sketch one or two things very quickly; we'll see how far we get. Okay. So what is the computational analog of min entropy? It's known as HILL entropy, defined by Håstad, Impagliazzo, Levin, and Luby, as the following thing: a distribution has HILL entropy if it's indistinguishable from some other distribution that actually has true entropy. What do I mean by indistinguishable? Take a circuit of bounded size, or a Turing machine of bounded time: it cannot tell the two apart. The usual computational notion. So H HILL of W is greater than or equal to K if there exists a Z that truly has K bits. That's HILL entropy. You have to be careful, because it gets messier: there are two parameters that relate to indistinguishability. There's the size of the circuit that you're considering as a distinguisher, because it has to be computationally bounded, and there's the distinguishing advantage, some probability that it distinguishes; that's epsilon. Those are the two parameters. There's quality here, quality of distinguishing. So this entropy has quality and quantity. Before, entropy only had quantity. Now it has quality and quantity, because it's computational. There's no getting away from those annoying parameters. So what is this one good for? That's the repeated question. Well, basically, in any proof where you could use min entropy, if you have a bounded adversary, you could roughly use HILL entropy instead. A bounded adversary can't tell that you substituted one for the other, because they're indistinguishable. That's the very high level idea. It works in a lot of places. So, for example, if you start with a uniform X and apply a pseudorandom generator to expand it out, what is the min entropy of the resulting random variable? It's just the length of X. It couldn't be better; you can't create entropy out of thin air. But the HILL entropy is the length of the output, not the length of the input. So you get a lot more HILL entropy.
>>: So if there's some min entropy previously, and I give you this extra little bit of error correcting stuff --
>> Leo Reyzin: Good, good. You're anticipating two slides from now. I'll tell you what lemmas we can prove about this.
>>: But in your notion, the proof substitutes in for W, right? You can't just substitute in that proof --
>> Leo Reyzin: I have not defined conditional HILL yet; even defining conditional HILL raises some questions. So you're right. You're absolutely right, it cannot. It also turns out you can apply the good old extractors we already have; we don't need to redesign them. If the input only has HILL entropy, not true min entropy, then the result will look close to uniform to any circuit that is bounded. It will not actually be close to uniform, but it will look close to uniform: no bounded circuit will be able to distinguish. That's the expected thing. Okay. What about conditional entropy? In real life, entropy is always conditional. If you think about the Diffie-Hellman secret, G to the AB: the observer sees the exchange, and you still want to say G to the AB has entropy, right? That's intuitively what you want to be able to say. There are lots more examples.
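In symbols, one common way to write the HILL definition from above (parameter conventions vary across papers):

    H^{\mathrm{HILL}}_{\varepsilon, s}(W) \ge k \iff \exists Z \text{ with } H_\infty(Z) \ge k \text{ such that }
    |\Pr[D(W) = 1] - \Pr[D(Z) = 1]| \le \varepsilon \text{ for every circuit } D \text{ of size at most } s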
So how does conditioning reduce entropy? Information theoretically, by the probability of the condition; that we had from before, right? What happens computationally? Okay, so those were the examples. Here's the theorem: the same holds for computational entropy, except you get degradation in two things. So that's David's question. You get degradation in quantity, by exactly what you expect, but also in quality, by the probability of the event. So if you're conditioning on a very surprising event, then things potentially go bad. Degradation happens in two parts. Notice it's really the same degradation: we just measure entropy logarithmically and quality not logarithmically; both things degrade. This formulation is due to Ben Fuller and myself, but it's actually a variant of the dense model theorem. If you have heard of the dense model theorem, this is cool. If you haven't, then it's not cool.
>>: What's the dense model theorem?
>> Leo Reyzin: It's actually this result, but in a very different formulation. I'll tell you offline; it's a long story, and I'm almost out of time anyway. There's one more caveat: we cannot prove this theorem for HILL entropy, which is really unfortunate. We can only prove it for a slightly weaker entropy notion. Given the time constraints I will not define that notion; I'll just let you know it's a weaker notion that can be converted to HILL. Actually, I'll fast forward to the slide with the simple message. The simple message is: the lemma we had about information theoretic entropy also holds for computational entropy -- just for a different notion of it, which can be converted to HILL entropy; you can convert back and forth with some losses. Now, again, this is for a specific value of Y, right? But, again, we want to think about an average: not a specific Hamming weight, but in general, if you know the Hamming weight, then what? And there's a nice statement I will not go into here: if Y is a B-bit string, then this thing holds. It's a very clean statement, but again about a somewhat messier notion of entropy that can be converted to HILL. Now that I've given you all this, let me show you one application, and then stop because of time constraints. So what is this whole thing good for? Imagine that we start with a uniform X and we apply a pseudorandom generator; we get a uniform-looking W. Imagine now that I'm worried about side channel attacks, where the adversary finds out something about this entire computation: maybe something about X, maybe something about the inner workings, maybe something about W. There's some adversary, say with a stethoscope, who gets L bits of leakage. I model the adversary by saying there's no more than L bits of information that the adversary gets. Okay. Well, the output is no longer uniform looking; that's pretty clear. But that's not a problem, because of my previous lemma, which says -- up to some parameter losses that I unfortunately have to hide here because of time -- that W conditioned on what the adversary sees still has entropy; it just gets reduced by the number of bits the adversary sees. And it has HILL entropy. And HILL entropy can be extracted from; we know that from a few slides ago. So we can extract and get a totally uniform looking V. The nice thing -- notice these trapezoids: this one is bigger than that one. We can expand X a lot and then extract, without shrinking it as much. So we got more bits than we started with.
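The bit counting behind those trapezoids, with made-up numbers (an illustrative sketch; the parameter losses from the lemmas are ignored here):

    n = 128         # length of the uniform seed X
    stretch = 1024  # PRG output W: HILL entropy about 1024 to a bounded adversary
    leak = 256      # L bits of leakage seen by the adversary
    slack = 64      # extractor entropy loss / security margin
    extracted = stretch - leak - slack
    assert extracted > n  # 704 > 128: more good-looking bits than we started with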
This idea is actually used in a fairly nice result -- maybe you have heard at least the title -- by Dziembowski and Pietrzak [phonetic] from 2008, one of the very early leakage-resilient crypto results: a stream cipher resilient against a certain class of side channel attacks. The way they model the side channel attacks is by a bounded amount of leakage. This is essentially their proof. There's a lot more going on in their result, because it's trickier than this, but the heart of the proof is this. Except they didn't have these nice notions of entropy to work with, so they don't get the one-slide proof that we have here, right? So this makes it easier. Okay. So that's the moral again: if you have the right notion of conditional entropy and you have the right lemmas, then you get proofs like this, even in the computational sense. So I'm going to skip this part and go to the last slide. Min entropy is often the right measure for security, as opposed to Shannon or something else. Conditional entropy, we've seen, is natural, and there are a lot of very simple bit-counting proofs you can do: if you leak L bits, then this happens; if you condition on a secret of this length, then that happens -- that sort of thing. It's very simple. And even in the computational case you can still use it to simplify proofs. There are lots of open problems; if anybody's interested, talk to me. In the information theoretic case we actually have this kind of result: if you started with conditional entropy and you condition further, you just reduce by the number of bits. In the computational case I have to start with unconditional entropy in order for the result to hold; there cannot already be a Y1. I don't know how to prove it otherwise. It's annoying; we should be able to.
>>: What's the [inaudible]? Do you have a [inaudible]?
>> Leo Reyzin: Yes. Here is a short way to tell you why. I did not pay her this time, really, I didn't. So what is the bad case for us? Where do the proofs break down? Imagine W is a plaintext and Y1 is the public key. On average, the plaintext has some entropy conditioned on the public key, but for any fixed value of the public key there exists an adversary that has the corresponding secret key, and therefore there's no entropy anymore. Does that make sense? If I give you the ciphertext and the public key, the plaintext still has entropy computationally; but if you happen to know the secret key, then it doesn't. If I fix any particular value of the public key, then there exists an adversary with the secret key hardwired, so for every particular value of Y1 I have no entropy. On average I do. This screws things up, because I only know how to deal with specific values of Y1; I don't know how to deal with the average. That's really annoying, actually. That's it in a nutshell. It would be nice not to have to go through this weird notion of entropy that I sort of didn't tell you about -- to condition HILL entropy directly and not have to convert. It would be nice not to get the exponential quality loss that happens here, but only a quantity loss. And maybe we need a totally different notion of computational entropy. There are two more notions I skipped in this talk that I could have told you about computationally. Nobody knows exactly what the right notion is, to make the theorems look nice -- because that's what the right notion is about. So that's also open: there are lots of definitions; do we know anything about them, and which one is the right one? All right. Thank you.
[applause]
>> Melissa Chase: Any other questions? Let's thank Leo again.
[applause]
>> Leo Reyzin: Thank you.