>> Melissa Chase: Today we're very happy to have Leo Reyzin here visiting.
He's a professor at BU. And he's done a lot of work on things like extraction and
key agreement and leakage resilience and a whole bunch of other things. And
today he's going to be telling us about some notions of entropy and applications
to key agreement and leakage resilience.
>> Leo Reyzin: Thank you, Melissa. Thanks for inviting me. So the entropy that I'm
going to be talking about is sort of a thread that ties everything together, but that
doesn't mean I don't want to tell you about the parts that it's tying together.
So feel free to ask questions about any of this. I want it to be interactive; there are
different pieces that I'm trying to fit together, and we'll see how well it works.
Okay. So there are many, many ways to measure entropy. You may have
learned some of them in some of your classes. There's the classic Shannon
entropy. That's not the one I'm going to talk about, so if you don't know what it is
you're safe.
If I'm the bad guy and I want to guess your password, obviously your password
has to have high entropy. But from what point of view should it have high
entropy? It should be hard to guess; that's really what we mean. We'll define
entropy throughout this talk via the probability that the adversary predicts a
sample from your distribution. We have some distribution we're talking about.
And you look at the probability that the adversary successfully predicts a
sample on one attempt. That probability is going to be less than one, so in
order to make the log positive we take the negative of the log; that's just how
you always deal with entropy. If the probability is 2 to the negative 80, you have
80 bits of security.
>>: For the entropy, the max operation, why is the max not here?
>> Leo Reyzin: I'm sorry?
>>: Usually for min-entropy, it's minus log of the max.
>> Leo Reyzin: So I'll show you in a minute. I'll show you in a minute. But this
is -- all right. I'll write the formula in a moment.
But I want you to think of this as the philosophy behind everything I'll talk about.
Because there won't be just one definition, there will be many. But they will all be
about the probability that the adversary predicts the sample. That's the important
part. We want that probability to be small, for obvious reasons like passwords.
Okay so far? We want it to be small. Okay. So now I'll actually draw a
distribution. I'll try to be consistent: capital letters will denote
distributions, lower case letters will denote samples from those distributions. If I
have a distribution of passwords, and you're the adversary, which one will you go for?
If you're moderately smart, you're not going to go for this one; you'll go for this
one, the most likely password. So I take the maximum probability
over all possible events in my distribution, I take the negative log of that,
and that's the min-entropy. That's the definition of min-entropy; it's a very simple
definition. And the reason it's called min-entropy is that if you took the max
outside of the minus sign, you would have a min.
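As a formula -- my rendering of what goes on the slide -- min-entropy is
H_infinity(W) = -log2 max_w Pr[W = w]. Here is a minimal sketch in Python; the toy
password distribution is made up purely for illustration:

    import math

    def min_entropy(dist):
        # dist maps outcomes to probabilities; min-entropy is the negative log
        # of the adversary's best single-guess success probability.
        return -math.log2(max(dist.values()))

    passwords = {"123456": 0.5, "hunter2": 0.25, "tr0ub4dor": 0.25}
    print(min_entropy(passwords))  # 1.0 bit: guessing "123456" succeeds half the time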
Yeah? It's just a silly thing. Okay. So that's min entropy, and that's been around
for a very long time. What's it good for?
Well, one I already talked about: passwords. If you have a password, it had
better come from a high min-entropy distribution. If you have a password and
you want sort of 80 bits of security, it should come from a distribution with
min-entropy 80, so the maximum probability should be two to the negative 80.
What else? Well, it turns out it's good for message authentication. This was
observed some time ago by Renner and Wolf. So let's say I have a key that, for
whatever reason, does not come from a uniform distribution -- I can't sample
things uniformly. And I want to use that key as a message authentication key:
you and I share that key, it's a symmetric setting, and I want to send you an
authenticated message using that key.
So the way I'm going to think of this key: it has length n. Split it into two
parts, A and B, of length n over 2 each. Each of those halves I will think of as an
element of the appropriate finite field.
And here's my message M, and I'm going to think of it as belonging to the same
finite field; the key is twice as long as the message in this case. And I'm going
to multiply the message by the first part and add the second part, and that's going
to be my MAC, my tag on the message. So it's A times M plus B.
Have people seen this? How many people in the audience have seen this MAC
construction? I'm curious -- not everybody has seen it. This MAC construction
actually goes back to Wegman and Carter, all the way back to 1981, when they
talked about universal hashing. There's a first paper of Carter and Wegman on
universal hashing for algorithm and data structure purposes, and they realized
two years later that the same tool that they had already developed actually
works for security.
And this is a message authentication code. So it turns out this is secure if A and
B are chosen completely uniformly at random. This has nothing to do with entropy;
this is just a uniformly random key. It guarantees that any adversary
who sees sigma and M cannot modify the message and come up with a different
sigma that verifies correctly.
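A minimal runnable sketch of this MAC in Python, working over GF(2^64); the
specific field size and irreducible polynomial are my illustrative choices, not
from the talk:

    IRRED = (1 << 64) | (1 << 4) | (1 << 3) | (1 << 1) | 1  # x^64 + x^4 + x^3 + x + 1

    def gf_mul(a, b):
        # carry-less multiplication of two 64-bit field elements,
        # then reduction modulo the irreducible polynomial
        p = 0
        while b:
            if b & 1:
                p ^= a
            a <<= 1
            b >>= 1
        for i in range(p.bit_length() - 1, 63, -1):
            if (p >> i) & 1:
                p ^= IRRED << (i - 64)
        return p

    def mac(key, m):
        a, b = key                 # the two 64-bit halves of the 128-bit key
        return gf_mul(a, m) ^ b    # sigma = A*M + B; addition in GF(2^64) is XOR

Verification just recomputes mac(key, m) on the received message and compares
with the received tag.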
But what if we don't have full entropy? What if we only have some k bits of
entropy in here? Think of the entropy as somehow distributed through this A and B;
it's not like certain bits have entropy and certain ones don't, but I'll draw it like
this so we can visualize it: imagine that some of the entropy is sitting in B and
some in A. And the gap, the lack of entropy, which I'm going to denote by this sort
of brick color, is n minus k. That's what's missing.
It turns out that this thing is still secure, and the amount of security you get is
what's in this sort of bricked-off area.
That's your security. The security is the difference between n over 2 and the
entropy gap -- that is, k minus n over 2 bits. In other words, what is this really
saying? It's saying that you'd better be at least half entropic, and essentially
there should be some entropy in A and some entropy in B.
Because if either component has no entropy, then you clearly didn't need such a
complicated function; you could have done without it. But it says that if both
components have enough entropy, this is going to work. I'm not going to prove
this result.
>>: That's for that specific.
>> Leo Reyzin: That's for that specific MAC. And in fact later I'll show you a
result that says that this is essentially the best you can do. If you have less than
half entropy, then message authentication is not possible. If you and I share a
key whose entropy is less than half of its length, we cannot authenticate a
message without somehow coming up with more shared randomness. This key
alone is not enough.
>>: Does that imply that you could somehow condense the entropy and shorten the
key length, perhaps?
>> Leo Reyzin: It would be nicer, right? If the key has entropy k, it would be
nice if it also had length k; you're sort of wasting all these extra bits. But we
don't have a nice generic way to shorten the key. Obviously it's best to have
keys that have full entropy; that's still the case. It's just that if you don't
quite have full entropy, if you're worried about not having full entropy, you may
still be okay.
Does that make sense? And this holds only with respect to min-entropy; if you
have a different notion of entropy, it's not clear this is going to work.
And there's a proof of this in Maurer and Wolf. I'm not going to prove it here.
Does it make sense what I mean by security? The game is -- no? So please ask.
The game is: you and I share this key. I want to send you a message, and
Christine wants to mess with it.
So I send you the message. She sees the message and the tag sigma on the
message; she doesn't know the key.
She wants to modify the message and the tag so that you still think it's okay.
The way you're going to verify is by computing the same formula and seeing if it
matches. And the point is that this is not going to be possible -- it's going to
be possible only with exactly this probability. The probability of success is 2
to the minus this, so this is the number of bits of security you get.
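As a sketch, the game looks like this in code (a hypothetical harness, reusing
mac() from the sketch above; the fixed message is arbitrary):

    import secrets

    def forgery_game(adversary):
        key = (secrets.randbits(64), secrets.randbits(64))  # uniform A, B
        m = 0x0123456789ABCDEF
        m2, tag2 = adversary(m, mac(key, m))   # Eve sees (m, sigma), outputs a forgery
        return m2 != m and tag2 == mac(key, m2)  # True only if the forgery verifies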
>>: [inaudible].
>> Leo Reyzin: This Maurer and Wolf paper -- you know, I'll have to look it up for
you. I think it's in the information-theoretic -- I think it's in the privacy
amplification paper.
>>: This is the [inaudible].
>> Leo Reyzin: Yes, yes. All the results so far are information-theoretic; there
are no computational assumptions that we're making. We're not assuming hardness
of anything, just the entropy of keys.
Other questions? Good. What else is it good for? This is the MAC; I just want to
keep it up there. It turns out it's also good for secret key extraction, provided
we have something -- and I'll tell you exactly what I mean by "have something."
But once you extract a good secret key that has full entropy, then you can use
it for one-time pad encryption; if you believe in hardness assumptions, you can
use a pseudorandom generator to expand it out; you can use it for all sorts of
things. So secret key extraction seems like a nice thing to have. What do I mean
by secret key extraction? There's a primitive called an extractor that will take
an input that has a reasonable amount of entropy, take a uniform seed, and output
something that is uniform. How many people have seen extractors?
Yes. Almost everybody. Okay. Good. So this seems like cheating, because I
have to give it a uniform seed to get a uniform key -- so what's the point? If I
could do that, then what do I need this thing for?
But it's not cheating, in the following sense: the seed and the output will be
jointly uniform. This is known as a strong extractor in the classic literature.
What does that mean? That means that any adversary who sees the seed still
doesn't know the key.
The seed and the key are jointly distributed almost uniformly -- epsilon-close to
uniform, for whatever epsilon you want to set.
It essentially means the seed can be reused like a catalyst in a chemical
reaction. What's a catalyst? You throw it into a chemical reaction, reaction
happens, you take it out, throw it into the next chemical reaction. Same thing
with a seed. You need a seed. You need to throw it in in order to extract a good
key, but the seed is reusable. In particular it can be public, as long as it's uniform
and independent of this W, you're okay. So that's the important feature of
extractors. They do need uniform bits, but those bits can be public and reusable,
as long as they're independent of whatever input of the extractor you have.
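A minimal sketch of a strong extractor built from a pairwise-independent hash,
via the leftover hash lemma; the Mersenne prime and the parameter names are
illustrative assumptions, not a construction from the talk:

    import secrets

    P = 2**89 - 1  # a Mersenne prime; inputs w are treated as integers below P

    def extract(w, seed, m):
        # h_{a,b}(w) = ((a*w + b) mod P) mod 2^m is (close to) pairwise
        # independent; by the leftover hash lemma, if W has min-entropy k,
        # the output is eps-close to uniform, jointly with the seed, for
        # m <= k - 2*log2(1/eps).
        a, b = seed
        return ((a * w + b) % P) % (2 ** m)

    seed = (secrets.randbelow(P), secrets.randbelow(P))  # uniform; may be public
    key = extract(0xC0FFEE, seed, 16)  # 0xC0FFEE stands in for a high-entropy w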
>>: [inaudible].
>> Leo Reyzin: Right, good point. In some constructions the seed can be much
shorter -- the seed can be logarithmic in the length of the input; it can be
really, really short. We won't be taking huge advantage of that fact here, but it
is a useful feature that you need very little randomness and you can extract.
So this is reusable. Extractors go back to the mid '80s although they were not
formally defined until the mid '90s really, but we've seen them. We know them
from then. Okay. So I just want you to remember those are three things that min
entropy is good for already without any extra work.
None of it is my work here. But then you can ask about privacy amplification.
How many people know what privacy amplification is? Good, I'll define it then.
Privacy amplification is the following thing: Alice and Bob have a shared W.
Think of the scenario of message authentication that I was talking about
before -- it's the same scenario: you and I share the same secret key. And our
adversary knows something about it. And because the adversary knows
something about it, it's no longer fully secret.
It's partially secret, perhaps, because the adversary might not know everything
about it. There are several classic examples. A really classic example, going
back to the '80s, to the original motivation, is quantum key agreement -- quantum
key distribution. If I send things over a quantum channel, the adversary may
eavesdrop on some of them, but not too many.
The adversary may eavesdrop on some bits that I sent without modifying them,
but after a few there's a good probability they get modified and we detect it.
That's essentially what's going on. So the adversary knows something about our
shared key, but not everything.
And there are more motivations today, sort of people are trying to do key
agreement where we take two cell phones and we shake them together. The cell
phones share some common information W about the way they were shaken.
But it's hard to argue that the shaking pattern was fully secret. Maybe partially
secret, but there's some base pattern that is probably the same for every
shaking.
So there's sort of some notion of entropy, right? Alice and Bob share some
entropy in this key that nobody else has. But it's really hard to pin down
exactly what Alice and Bob share. So that's the scenario of privacy amplification.
And the goal is to agree on a uniform secret R. Because why? Because once
we agree on a good uniform secret, then we can use it as a one time pad, a
pseudo random generator, whatever. This is sort of the start of all cryptography,
once you have a uniform secret.
So that's the goal. And the goal goes back to the '80s -- the first construction,
by Bennett, Brassard, and Robert, was sort of nonconstructive, an existence
proof -- with the motivation going back to quantum key distribution in '84.
Okay. And the simple solution, given sort of the terminology that we have today,
is just to use an extractor. Right? Alice can generate a random seed I, send it
over to Bob. We don't mind that Eve also gets I in the process. Because we
don't mind that the extractor seed is public. That's the beauty of extractors,
right?
And then Bob can do the same thing. So this makes sense. It's secure against
Eve, who eavesdrops on the channel and knows something about W. It seems to
work -- except that in order for extractors to work, you have to know the entropy
of what you're extracting from. You certainly can't extract more than the entropy
you have, and extractors are designed to work for specific entropies. How do you
know the entropy you have? The entropy you have depends on Eve's knowledge.
We'll use Y to denote Eve's knowledge. So in order to know which extractor to
apply, it would seem we have to know the entropy, and we may not, because it
depends on what Eve knows.
So it seems like min-entropy is not good enough anymore. So let me talk about a
different definition, a conditional definition of entropy. Let's think about it
for a moment. Imagine that what we have initially is a perfectly good uniform
key; it's not like we're unable to generate uniform bits. It just so happened,
however, that Eve, through her devious ways, found out the Hamming weight of our
key -- the number of ones in it. So let's think about the entropy.
Well, let's say that Eve knows that the Hamming weight is exactly n over 2: half
the bits are ones. Y is the knowledge of Eve, so let's say Eve's knowledge is
this fact. The probability of this fact happens to be at least one over two root
n. There's a bound; just trust me on the bound.
Well, that fact is not terribly surprising -- one over two root n is not a tiny
probability -- so Eve's knowledge is not a lot. You know a lot when the knowledge
is surprising.
So because of that, we can prove -- it's a fairly straightforward proof -- that
the entropy of the original key, conditioned on Eve's knowledge, is what it was
originally minus the log of one over this probability, which is half log n plus 1.
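In symbols, this is (my rendering of the bound on the slide):

    \Pr[\mathrm{wt}(W) = n/2] = \binom{n}{n/2}\, 2^{-n} \;\ge\; \frac{1}{2\sqrt{n}},
    \qquad
    H_\infty\bigl(W \mid \mathrm{wt}(W) = n/2\bigr) \;\ge\; n - \Bigl(\tfrac{1}{2}\log n + 1\Bigr).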
>>: Is this always half of --
>> Leo Reyzin: If I tell you that the probability of Eve's knowledge is p, then my
entropy reduces by at most log of one over p. That's a fairly straightforward
thing to prove. Does that make sense? So here's the easiest condition to imagine:
suppose Eve knows that the first bit of the message is one. The probability of
that fact is a half, so her entropy got reduced by one bit.
>>: Notion or [inaudible].
>> Leo Reyzin: It is. For min-entropy it also works; actually, for min-entropy
it's even easier to prove. You just look at the conditional probability:
min-entropy is about the max, and the max probability cannot increase by more
than a factor of one over the probability of the condition -- that's just how
conditional probabilities behave.
>>: [inaudible].
>> Leo Reyzin: I'm sorry -- inequality?
>>: The inequality. When you say the entropy of W given Y is greater than n minus
whatever, is the greater-than-or-equal because of an inequality in the --
>> Leo Reyzin: I see. So if this were an equality, would this also be an equality?
>>: Because it sounds, from what you said, like you can say that the entropy of W
given Y is the min-entropy of W minus [inaudible].
>> Leo Reyzin: No, no, it could be better. The entropy doesn't have to get
reduced by that much, because it could be that W wasn't initially uniform. There
are examples.
So imagine the following example: W wasn't uniform initially -- W has the first
bit always equal to one. And Eve finds out that the first bit is equal to one.
She hasn't learned anything new, so it doesn't reduce the entropy. So you don't
necessarily get equality.
Okay. So that's if Eve knows that the Hamming weight is exactly half, which is
the most likely Hamming weight. If Eve knows that the Hamming weight is n --
well, that's a very, very unlikely event, so she suddenly learned a lot: the
string is all 1s, and the entropy has gone to zero.
But, you know, sometimes you don't want to reason about what if Eve knows this
or what if Eve knows that. You want to say: conditioned on the knowledge of the
Hamming weight, do I have entropy or not?
So you really want to talk about the average conditioning, in some sense -- not
conditioning on the worst case, not conditioning on the best case, but
conditioning on the typical case that happens. How can you even talk about that?
So you can still try to do the same thing. Remember that min-entropy is the
negative log of predictability.
You can still run the whole experiment and ask: what is the probability that Eve
predicts my sample? So what is the experiment? I pick a random W, I give Eve its
Hamming weight, and Eve is incredibly smart, all powerful.
I ask: given that she knows the Hamming weight, what is the probability that she
predicts my W? It's a meaningful thing to ask. And so we want to define our
entropy via the probability of predicting W given Y.
And it turns out that the right thing to define is the expectation, over all
possible Ys, of the maximum probability of any particular W given that Y. So
whatever particular value of Y Eve knows, she's going to go for the most likely W.
If she knows the Hamming weight is this, she'll go for this W; if she knows the
Hamming weight is that, she'll go for that W. But she doesn't control the
Hamming weight -- I choose W initially. So it's the expectation over all Ys, not
the worst-case Y; she doesn't control that.
>>: Why would you take a random sample? Why not pick one that's optimally good
for you [inaudible]?
>> Leo Reyzin: So often you don't have the luxury of picking repeated samples.
Let's say you shook your phone: you don't want to tell users to shake phones a
million times. And often you don't know that Eve knows the Hamming weight; it's
very hard to bound exactly what Eve knows. It's an assumption: Eve knows no more
than some number of bits, and you hope that you're right.
I mean, this is too clean an example for real life.
>>: But if you [inaudible] make these assumptions, you could sample --
>> Leo Reyzin: Then I would sample -- if I know Eve gets the Hamming weight, I
would sample things with Hamming weight exactly half, and that's it. I would not
even bother. But sometimes I don't have that luxury.
>>: [inaudible].
>> Leo Reyzin: So Y is some knowledge of Eve. It's a random variable,
correlated with W. W and Y are correlated random variables.
>>: [inaudible] like a random variable with some distribution?
>> Leo Reyzin: Yes, exactly. Y and W are correlated random variables. And
you can define this notion. This was defined by Dodis, myself, and Smith way
back in 2004; we called it, I think, average conditional min-entropy or something
like that. And then, of course, at the end you have to take the negative log, as
always. When you talk about entropies you take logs; it's just easier, so you
don't have to deal with really small numbers.
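A small sketch of the definition in Python; the joint distribution (W uniform on
4 bits, Y its Hamming weight) is a toy I chose for illustration:

    import math
    from collections import defaultdict

    def avg_min_entropy(joint):
        # joint[(w, y)] = Pr[W = w, Y = y].  Average min-entropy is
        # -log2( sum_y max_w Pr[W = w, Y = y] ): the negative log of Eve's
        # overall success probability at guessing W after seeing Y.
        best = defaultdict(float)
        for (w, y), p in joint.items():
            best[y] = max(best[y], p)
        return -math.log2(sum(best.values()))

    joint = {(w, bin(w).count("1")): 1 / 16 for w in range(16)}
    print(avg_min_entropy(joint))  # ~1.68 bits, down from 4 without conditioning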
Notice that average min entropy is not the average of min entropy. The
expectation lives inside the log, not outside the log. Makes a big difference. So
a good counterexample to consider. Imagine you have a system where for half
the cases min entropy is 0. Eve knows everything. For half the cases min
entropy is a thousand.
What's the probability that Eve guesses? It's about a half, right? So you don't
want to average 0 and a thousand and get 500 bits of entropy. You don't have
500 bits of entropy if half of the time you have zero and half the time a thousand.
Shannon entropy doesn't behave this way. In Shannon entropy, for those who
know Shannon entropy, if half the time you have 0 bits of Shannon entropy and
half the time you have a thousand bits of Shannon entropy, then on average
Shannon says you have 500 bits of entropy. You don't here. Not from security
point of view. If you sort of look at a lot of older papers doing the kinds of stuff
I'm talking about here, they consider Shannon entropy because they didn't have
min entropy to talk about and they would prove things like the resulting key has
500 bits of Shannon entropy. Well, that's not interesting. 500 bits of Shannon
entropy means it could be 0 half the time.
That's not good enough. And so it really took a while for people to converge to
better notions of entropy than Shannon. So: take the log after you've done
everything. That's the moral of this. Does that make sense?
>>: Based on a distribution applied?
>> Leo Reyzin: Right. For any pair of correlated random variables Y, W, I can
measure this. Of course, if I don't know anything about the distributions, then
it's bad. So I'll give you examples -- I'll give you examples, I promise.
Okay. So now we have this average min-entropy, and we can ask the same
question: what is this one good for? We had the original min-entropy; what's this
one, the average case, good for? You can still prove, essentially directly from
the definition: if you have an adversary who knows Y and your password is W,
then the probability of guessing it is exactly 2 to the minus the entropy. If
you think about it, this is a better way to model passwords. The adversary knows
something about you -- that's the Y. You pick the password -- that's the W.
It's always conditional. Life is never such that you live in a vacuum and the
adversary has no information. This is the right way of modeling a password:
there's the W you know, there's the Y the adversary knows, and they may be
correlated in some way.
The adversary may know your gender, age, birthdate, whatever; that's the
correlated stuff. It turns out that the same message authentication code that I
talked to you about earlier also works for average entropy.
The probability that Christine -- I think Christine was our adversary -- will be
able to modify the message and the tag so that it still verifies is exactly 2 to
the minus whatever it was before, except you substitute the average entropy for
the entropy. Which is nice. And the nice thing that was proven by Salil Vadhan
two months ago, literally, is that every extractor that works for entropy also
works for average entropy, with a loss of three bits or something like that --
three is the actual number, I'm not exaggerating.
>>: [inaudible].
>> Leo Reyzin: This one, whatever I just defined -- I call it average entropy, or
people call it conditional entropy. But I think conditional min-entropy is
probably the right name for it.
So we knew before that some extractors worked, and some worked with a fairly
big loss. This proves that no loss is needed: you lose three bits and that's it.
So all extractors will work for this notion. And what I mean by "extractors
work" is that the output of my extractor will look uniform even to the adversary
who knows Y.
So there's an adversary who has this additional information, the output will still
look uniform even to that adversary. It will be statistically close to uniform, even
given that extra information.
So that implies that privacy amplification works. Remember what motivated us:
the question of Alice and Bob and Eve, where Alice and Bob share some W and Eve
knows something about it. Now finally we can put a rigorous condition on what we
want in order for privacy amplification to work. What we want is that the
conditional entropy of W, conditioned on Eve's knowledge, should be high. And if
it is, then we're good; an extractor will work.
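Putting the pieces together, here is a sketch of the resulting
(passive-adversary) protocol, with the same hash-based extractor as earlier; all
sizes and names are illustrative:

    import secrets

    P = 2**89 - 1

    def extract(w, seed, m):
        a, b = seed
        return ((a * w + b) % P) % (2 ** m)

    def alice(w):
        # Alice picks a uniform seed, publishes it, and keeps the extracted key
        seed = (secrets.randbelow(P), secrets.randbelow(P))
        return seed, extract(w, seed, 32)

    def bob(w, seed):
        # Bob extracts with the public seed; identical w gives the identical key
        return extract(w, seed, 32)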
>>: You still have to sort of say, I believe Eve only knows Y, and that's --
>> Leo Reyzin: Correct. Yes. You have to say that -- and I'll show you how to
bound that, how to be able to say that, in a moment.
>>: Large number of trials.
>> Leo Reyzin: No, no repeated independent experiments at all. So if you and I
share a W and we're worried who is going to be the next adversary, David, we're
worried that David knows something about our W, but we believe that there's
some entropy left in W conditioned on his knowledge, then we're okay.
>>: This setting reminds me of the sort of work on password-authenticated key
exchange. Is it exactly the same setting?
>> Leo Reyzin: Same setting, except in password-authenticated key exchange W is
the shared password. Think of this as the client and this as the server; they
share a password W. In that setting they don't mind making computational
assumptions, whereas this is information-theoretic. The first part of this talk
is information-theoretic; we'll see how far we get into the computational.
So password-authenticated key exchange also solves this problem. The nice
thing about this solution is that it can be very efficient: extractors can be
incredibly efficient constructions -- a dot product, that kind of thing -- no
fancy math required.
>>: Vadhan's theorem -- is that only when you're talking about Eve knowing the
Hamming weight, or does it work for other stuff?
>> Leo Reyzin: No, for any Y. The rigorous statement, as rigorous as I'll make it
on the slide, is this: if the conditional entropy is at least k, and the
extractor was designed for that specific k, then the output will be jointly
uniform.
So for any Y, not just the Hamming weight -- for any Y, as long as the
conditional entropy of W conditioned on Y remains high. If you look at the
extractor literature, it says: this is a (k, epsilon) extractor, meaning it works
with anything of min-entropy k. The theorem proves that if it was a (k, epsilon)
extractor for unconditional entropy, then it's also a (k, epsilon) extractor for
conditional entropy -- or maybe (k plus 3, epsilon), something like that.
>>: So this is -- so this single extractor will work for a large number of
distributions [inaudible].
>> Leo Reyzin: Exactly. No matter what Y is, as long as W has entropy
conditioned on Y.
>>: Then, in particular, you don't need to, given some sample Y, be able to
decide membership.
>> Leo Reyzin: Y doesn't have to be efficient in any sense. It's all purely
information theoretic. It's really cool. And, moreover, it's like a two-page, a page
and a half proof with all the details. It's nice.
I can find the pointers if you're curious. So the point is we don't need to know
what Eve knows, we just need to know that she doesn't know too much, and then
we can apply an extractor.
And, by the way, so these ideas about sort of Eve knowing something, they go
back to the '80s and the first papers people designed were what if Eve knows this
thing and what if Eve knows that thing? And slowly they realize the techniques
are general enough, but it took a while.
In modern language, it's all trivial. But it took a lot of work to get there, and took
a lot of definitions of the right notions.
Now we'll move to a harder problem: information reconciliation. Information
reconciliation is like the previous setting, except life is a little bit worse:
what Bob has is not actually identical to what Alice has, but somewhat different.
Maybe not too different.
So if you shake two phones, right, the accelerometers are not going to measure
the exact same thing because of calibration issues, orientation issues and
whatnot.
If you do quantum key distribution, some quantum bits could flip along the way
because of, I don't know, cosmic rays or whatever causes bits to flip -- I'm
making stuff up.
And the problem also goes back to Bennett, Brassard, and Robert -- this is like
the third time I have to cite this paper, right? They already considered that.
Okay. And what do I mean by "different"? For now, let's think of Hamming
distance, the number of bit flips. There are more general definitions, but for
this talk I'm going to think about Hamming distance. Okay. So how do we do this?
There's a generic technique that people have thought about.
I'm going to apply some algorithm called S -- S stands for sketch -- to my W.
I'm Alice now. I'm going to send some error-correcting info, S, to Bob. And then
I'm also going to extract. So my W is going into two algorithms: the sketching
algorithm and the extractor. Before it was just the extractor; now I'm doing
both.
And I send both things, the extractor seed and the sketch, over to Bob. Bob is
going to recover W using the error correcting information. I'll show you how to do
that in a moment. I'll show you a concrete example of this error correcting
information but not yet. And this thing will get extracted, and they'll extract the
same thing.
So that's the big idea of how to do this sort of thing.
Well, how long an R can you extract? That's, of course, what you want to
know -- what kind of extractor you can use.
And how long an R you can extract depends, of course, on the entropy of W
conditioned on what Eve knew before we even started, which is Y, and on what she
finds out now, which is S.
So you've added something to Eve's knowledge by sending that error-correcting
information.
So here's a nice lemma that says that if you condition on a new variable, you
reduce the entropy by no more than the bit length of that variable.
It intuitively makes sense: if I have 100 bits of information that's secret from
Eve and I send three bits, Eve learns no more than 3 bits, so I should still
have 97 bits of secrecy. Intuitively, it's the right statement. The fact that it
actually is the right statement -- that you can prove it -- is nice.
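Written out in the tilde notation for average min-entropy, the lemma reads (my
transcription):

    \widetilde{H}_\infty(W \mid Y, S) \;\ge\; \widetilde{H}_\infty(W \mid Y) - \lambda
    \qquad \text{whenever } S \in \{0,1\}^{\lambda}.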
>>: I see we have that S on the right-hand side, inside the conditioning?
>> Leo Reyzin: Yes.
>>: So it's also conditioned on Y?
>> Leo Reyzin: In other words: if this information were entirely uncorrelated
with W -- imagine S is three bits entirely uncorrelated with anything -- then
conditioning on it shouldn't reduce the entropy. And indeed it won't, because it
will increase this side and decrease this side. I'll show you an example of how
this is used; it's actually very nice.
Does it make sense? So the entropy of W, when it's conditioned on an additional
thing, gets reduced by at most the bit length of that thing -- and not at all if
that thing is independent of W to begin with.
And this holds even if W was already conditioned: you can keep conditioning and
conditioning and it will work. So why is this useful? I actually want to show
you an instantiation of this protocol, and how easy the analysis becomes given
this lemma.
So I'm going to show you how to build this error-correcting information. This
has nothing to do with entropy, but it's simple enough, and the analysis
afterward will be very nice, so I think it's worth it.
>>: Where can I find this lemma?
>> Leo Reyzin: I don't want to scroll all the way back. It's in the same paper,
by Dodis, Ostrovsky, myself, and Smith, where we define this notion. So we
define the notion and prove this lemma.
I can send you the link. So, okay, let's build this S. We're going to build it,
not surprisingly, out of error-correcting codes. The way we'll think of
error-correcting codes: a code maps k-bit messages into n-bit codewords. That's
the way we're going to think of it.
So here are all the messages -- 2 to the k of them -- and the codewords sit in a
much larger space of 2 to the n. They're all far apart; that's the way to think
of codes.
Any two codewords differ in at least d locations. That's the nice feature of
error-correcting codes: if you have fewer than d over 2 errors, you know exactly
which codeword things came from. Okay. So that's just the background on what I
mean by an error-correcting code.
So how do we use this error-correcting code to actually transmit the
error-correcting information? I'm Alice, so I have my W, and you have W prime.
What if you happened to know that my W is a codeword? Well, as long as the
number of errors in your W prime is not too big, you can just decode to that
codeword. You don't need any information from me: if I have a codeword and you
have a corrupted codeword, you decode, and we're done.
Well, that's not always the case in life, of course. So what we're going to do
is: if my W is not a codeword, I'm going to shift the entire code to make it a
codeword. What do I mean by shift? I'm going to take a random codeword -- the
error-correcting encoding of a random message -- and, since we're in the binary
case, XOR it with my W. That offset is what I'm going to send to you.
How do you decode? So this shift is what I send you. Forget the formula; think
about the picture. You have W prime. You shift it back by the offset, you
decode, and you shift forward by the same offset. Shifting backward and shifting
forward are the same thing in the binary case -- it's all XOR -- but logically
that's what's happening: I've told you that under that offset my W is a codeword,
so you offset, decode, and offset back.
This construction is due to Juels and Wattenberg, back in 1999. But there wasn't
really a nice analysis of it, because they didn't have these notions of entropy.
There was some analysis, but let's analyze it the way I want to analyze it,
because it all fits onto one slide and I can prove a strong thing about it,
whereas before we didn't know how to.
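A toy instantiation in Python, using a 3x repetition code (which corrects one
flip per 3-bit block) purely so the shift / decode / shift-back logic is
runnable; real instantiations need much better codes:

    import secrets

    def encode(bits):                      # k message bits -> 3k codeword bits
        return [b for b in bits for _ in range(3)]

    def decode(bits):                      # majority vote in each 3-bit block
        return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

    def xor(x, y):
        return [a ^ b for a, b in zip(x, y)]

    def sketch(w):                         # Alice: S = C(r) XOR w, r uniform
        r = [secrets.randbits(1) for _ in range(len(w) // 3)]
        return xor(encode(r), w)

    def recover(w_prime, s):               # Bob: shift back, decode, shift forward
        return xor(encode(decode(xor(w_prime, s))), s)

If w_prime differs from w in at most one position per 3-bit block,
recover(w_prime, sketch(w)) returns w.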
Alice, on her own, extracts R, and she also sends this information over to Bob:
a random codeword XOR'd with her W.
And we have the lemma that I already showed you, which tells us the amount of
entropy we have. Now, just using this lemma, I can tell you how big R can be.
Think about it: how much entropy do we have? Well, what happened? There was some
entropy to begin with.
And then there was randomness that went in here. How many bits of randomness
went in? The length of the random thing that we encoded.
>>: What is R?
>> Leo Reyzin: R here is the random value that we encoded to get a random
codeword. We took a random R and applied the error-correcting encoding function
to it.
So this is whatever was originally there, plus the length of R, because we added
that randomness in. And the bit length of S is n, the length of W. Let's do some
more bit counting: the code goes from k bits to n bits, so R has k bits and S
has n bits. So the entropy of what we have, conditioned on what Eve has, is
whatever we had originally, plus k, minus n. If you have a good code, then the
entropy loss due to this extra step, n minus k, is small. If people give me
better codes, I get a better result -- I can hand the problem off to the
error-correcting-code people and tell them: give me good codes. Good codes are
the ones where the gap between n and k is not too big for the number of errors
corrected, so the codeword is not that much longer than the message.
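In symbols, with the code mapping k-bit messages to n-bit codewords (my
reconstruction of the slide's bit counting):

    \widetilde{H}_\infty(W \mid Y, S)
    \;\ge\; \widetilde{H}_\infty(W, S \mid Y) - n
    \;=\; \widetilde{H}_\infty(W, R \mid Y) - n
    \;=\; \widetilde{H}_\infty(W \mid Y) + k - n,

using that (W, S) and (W, R) determine each other, and that R is k uniform bits
independent of (W, Y).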
>> Leo Reyzin: This is the entire analysis, in one slide. That's the nice thing:
I know exactly how many bits I can extract -- whatever entropy I had initially,
minus -- yeah? Sorry?
>>: So the reason you do this is that you don't want your secrecy to depend on --
>> Leo Reyzin: Well, I don't know the distribution of W. I could also shift it
to -- it's trickier to analyze, but I could shift it to a fixed -- no, I can't
shift to a fixed codeword. With a linear code I could do other things.
>>: What if I already know the distribution of -- do I still need to do the shift?
>> Leo Reyzin: Yes. Knowing the distribution is not enough.
>>: Why doesn't this shift exist in the original [inaudible]?
>> Leo Reyzin: It does, implicitly. It's instantiated differently, but it's in
there -- I'll show you offline. They use systematic codes, a different trick:
with a systematic code you send over the tail. But it's all equivalent, because
of the duality of error-correcting codes. It's really the same thing.
Okay. So let's consider the case of an active adversary. Everything is as
we've done so far, but now Eve can actually modify the messages between us.
That seems trickier. So what we want to say is: look, if we catch Eve doing
this, then the best thing we can do is give up. This is sort of like the key
agreement talk we had an hour ago. And this goes back to Maurer, from '97.
And notice, interestingly, that the two settings are essentially the same,
because the trick is to get the extractor seed across. If I can get the
extractor seed across and make sure Eve does not modify it, then I'm okay.
That's really the main trick, because Eve gets to modify stuff. So how do I get
it across and make sure I detect modification? The problem is that the only
thing we have to begin with is W, and we cannot use it as a MAC key if it's not
at least half entropic -- that's what I showed you at the very beginning. It can
be used if it is half entropic. So what do I do?
>>: [inaudible] when W is equal --
>> Leo Reyzin: So I'm going to present the case where W is equal to W prime;
essentially, you solve the other case the same way as we solved the previous
problem, by adding some authentication in. So the idea, due to Maurer and Wolf,
is to use interaction in order to do this. Their work was for the case when W is
equal to W prime, and Bob and I had follow-up work that made it work when W is
not equal to W prime. So I want to show you how they authenticate one bit of the
extractor seed, and you will believe me that they can authenticate the entire
extractor seed, because it's just bit by bit.
So how do they authenticate one bit of the extractor seed? Bob sends a
challenge, and Alice applies an extractor to W with that challenge as the seed.
This is not the ultimate extractor they want to use; this is just an extractor
as a tool.
>>: Can Eve modify the challenge?
>> Leo Reyzin: Yes, yes, yes. Correct. We'll see exactly what happens. Okay.
And then Alice wants to authenticate a bit B. What she's going to do: if B is
one, she sends the extracted result; otherwise she sends nothing, no extracted
result. And Bob is going to accept a one if the extracted value is correct. So
you stare at this a little bit, and it makes absolutely no sense; let's try to
make it make sense for a moment. Eve can make Bob's view not equal to Alice's
view. Eve is perfectly capable of taking the extractor seed that Bob sends and
turning it into a different extractor seed; nothing stops her from doing that.
Alice will extract some T prime -- extractors don't guarantee anything here. Eve
could even take this T prime and modify it into a T that Bob will accept. Eve
could do that evil thing. But that doesn't concern us. What concerns us is that
Eve cannot change a 0 into a 1, where a 0 is when Alice sends nothing and a 1 is
when Alice sends something. And all we want to prove is that it's hard to change
nothing into something, right?
So T has entropy. Okay -- where were we? Here we were. To avoid worrying about
changes from 1 to 0, I'll encode the seed with an equal number of ones and
zeros, same as before. B is a bit; we authenticate a single bit for now.
Eventually I want to authenticate a whole long extractor seed I, but I'm going
to do it one bit at a time, and in my big extractor seed I'll make the encoding
balanced, so I only need to worry about changing a 0 to a 1, not the other
direction. So even though Eve can make the two views unequal, she cannot change,
logically, the bit that Alice is sending. Alice is sending some bit, logically,
over to Bob using this trick. So how do we prove that?
>>: Wait. You said that Bob accepts a 1 if the extraction is correct.
>> Leo Reyzin: Yes, sorry.
>>: Does he always accept -- like, if he gets a 0, does he --
>> Leo Reyzin: He accepts. If he sees a 0: thank you for sending me a 0. If he
sees a 1 bit, he verifies that the extraction is correct.
>>: Eve has no W.
>> Leo Reyzin: Eve doesn't know W; that's the only thing Eve doesn't know. The
problem is that we're trying to agree on a key from something that Eve doesn't
know, and we have to use that very something in order to agree on the key. A
chicken-and-egg sort of problem.
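A sketch of one challenge-response round in Python, with the same mod-p hash
standing in for the extractor used as a tool; the parameter choices are mine:

    import secrets

    P = 2**89 - 1

    def ext(w, seed, m=32):
        a, b = seed
        return ((a * w + b) % P) % (2 ** m)

    def bob_challenge():
        return (secrets.randbelow(P), secrets.randbelow(P))  # fresh seed X

    def alice_respond(w, challenge, bit):
        # to send a 1, reveal the extracted value T; to send a 0, send nothing
        return ext(w, challenge) if bit == 1 else None

    def bob_receive(w, challenge, response):
        if response is None:
            return 0                                  # a 0 is always accepted
        return 1 if response == ext(w, challenge) else None  # None = reject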
Okay. So here's the lemma, from a paper by Bob and myself. The entropy of the
extracted value -- this value T right here -- conditioned on everything Eve
knows, including X, is sufficiently high. How high? Well, it depends on the two
parameters of the extractor: the output length, and how close the output is to
uniform.
So if the output is far from uniform, it's not a very good bound. But if the
output is close to uniform, it's essentially the length of the extractor output,
minus 1.
So what does this lemma give us? It immediately gives us that if Eve sees X, and
whatever other knowledge she had, then as long as W had entropy, this T is going
to have entropy. If Alice is sending a 0, she doesn't respond, so she cannot
possibly reduce the entropy; if she's sending a 1, the entropy may get reduced
by the response. So for Eve to guess T is going to be hard. Interesting hand
waving -- [laughter] -- there may be a security proof in here somewhere.
So this is going to work -- this lemma holds, of course, as long as there's
enough entropy in W to begin with. You can't extract something out of nothing;
there's a precondition to the lemma that W has enough entropy. But we know how
much entropy W has, because we know how much it had initially, and we know how
much each authenticated bit reduces it. Remember, we're authenticating many
bits, one at a time, this way; and we know from the previous lemma how much the
entropy gets reduced: each response reveals at most its own bit length of
information. So as long as there's enough in total, we'll be okay.
So this is a very simple analysis of a protocol that did not previously have
such a simple analysis. And at the end, Alice's extracted output has entropy.
>>: Different [inaudible].
>> Leo Reyzin: Yes. For every bit, this thing is repeated -- not a particularly
efficient protocol. But I can tell you offline about follow-up work that makes
it better. Essentially, for every bit there's a fresh random challenge and a
response: fresh challenge, response. And the extractor seed can be logarithmic,
so it's not terrible. Right. So it's a multi-round thing.
>>: What was it you were authenticating?
>> Leo Reyzin: So why are we doing this in the first place? We're doing this
because we want to do privacy amplification, and the way we do privacy
amplification is by me sending a seed I over to you. I send seed I over to you
one bit at a time. Each time, I come up with this fresh X, we do a
challenge-response, you see if it's a 0 or a 1, and you go on to the next bit.
We actually have an implementation that's not terrible. What was it? Well under
a second somewhere, right?
>>: I think so. Couple of seconds.
>> Leo Reyzin: It's doable. It's not the thing you want to do all the time but it's
doable; if you have no other choice it's doable.
Okay. I'm probably going to skip this if I want to get a little bit into the
computational thing. So [inaudible] and I have worked on making this protocol
more efficient, essentially, and what I wanted to brag about is how easy the
analysis is -- but I'm going to skip it, because it's not so easy.
So, okay, what about the computational analog? Everything I talked about so far
is information-theoretic; why not make computational assumptions and see what we
can do with them? I'll try to sketch one or two things very quickly; we'll see
how far we get. Okay. So what is the computational analog of min-entropy? It's
known as HILL entropy, defined by Hastad, Impagliazzo, Levin, and Luby, as the
following thing: a distribution has HILL entropy if it's indistinguishable from
some other distribution that actually has true entropy. What do I mean by
indistinguishable? Take a circuit of bounded size, or a Turing machine of
bounded time: it cannot tell the two apart. The usual computational notion. So
the HILL entropy of W is greater than or equal to k if there exists a Z that
truly has k bits of min-entropy and is indistinguishable from W. You have to be
careful, because it gets messier: there are two parameters that relate to
indistinguishability -- the size of the circuit that you're considering as a
distinguisher, because you have to be computationally bounded, and the
distinguishing advantage, the probability epsilon that it distinguishes. So
those are the two parameters. There's quality here -- the quality of the
distinguishing. So this entropy has quality and quantity; before, entropy only
had quantity. Because it's computational, there's no getting away from those
annoying parameters.
So what is this one good for? That's the recurring question. Well, basically, in
any proof where you could use min-entropy against a bounded adversary, you can
roughly use HILL entropy: a bounded adversary can't tell that you substituted
one for the other, because they're indistinguishable. That's the very high-level
idea, and it works in a lot of places.
So, for example, if you start with a uniform X and apply a pseudorandom
generator to expand it out, what is the min-entropy of the resulting random
variable? It's just the length of X -- it couldn't be more; you can't create
entropy out of thin air. But the HILL entropy is the length of the output, not
the length of the input. So you get a lot more HILL entropy.
>>: So if there's some min-entropy previously, and I give you this extra little
bit of error-correcting stuff --
>> Leo Reyzin: Good, good. You're anticipating two slides from now. I'll tell
you what lemmas we can prove about this.
>>: But in your notion, the proof substitutes Z in for W, right? You can't just
substitute in that proof --
>> Leo Reyzin: Right -- I have not defined conditional HILL entropy. Even
defining conditional HILL raises some questions. So you're right, it cannot.
You're absolutely right.
It also turns out you can apply the good old extractors we already have; no need
to redesign them. If the input only has HILL entropy, not true min-entropy, then
the result will look close to uniform to any bounded circuit. It will not
actually be close to uniform, but it will look close to uniform: no bounded
circuit will be able to distinguish. That's sort of the expected thing.
Okay. What about conditional entropy? In real life, entropy is always
conditional. Think of the Diffie-Hellman secret g to the ab: the observer knows
g to the a and g to the b, and you still want to say that g to the ab has
entropy. That's intuitively what you want to be able to say, and there are lots
more examples. So how does conditioning reduce entropy?
By the probability of the condition, as we had before, right? So these are the
examples. Here's the theorem: the same holds for computational entropy, except
you get degradation in two things -- that's David's question. You get
degradation in quantity, by exactly what you'd expect, but also in quality, by
the probability of the event.
So if you're conditioning on a very surprising event, things potentially go bad.
Degradation happens in two parts. Notice it's really the same degradation: we
just measure entropy logarithmically and quality not logarithmically. Both
things degrade.
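Very roughly, the statement has this shape (my paraphrase, for the weaker
"metric"-type entropy notion he mentions below; I may be off on the exact
parameters):

    H^{\mathrm{Metric}}_{\varepsilon',\, s'}\bigl(W \mid Y = y\bigr)
    \;\ge\; H^{\mathrm{Metric}}_{\varepsilon,\, s}(W) \;-\; \log\frac{1}{\Pr[Y = y]},
    \qquad \varepsilon' \approx \frac{\varepsilon}{\Pr[Y = y]},\; s' \approx s.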
This formulation of the theorem is due to Ben Fuller and myself, but it's
actually a variant of the dense model theorem. If you have heard of the dense
model theorem, that's cool. If you haven't, then it's not cool.
>>: What's the dense model theorem?
>> Leo Reyzin: It's actually this same result, in a very different formulation.
I'll tell you off-line; it's a long story, and I'm almost out of time anyway.
There's one more caveat: we cannot prove this theorem for HILL entropy, which is
really unfortunate. We can only prove it for a slightly weaker entropy notion.
Given the time constraints I will not define that notion; I'll just let you know
it's a weaker notion, and it can be converted to HILL.
Actually, I'll fast-forward to the slide with the simple message. The simple
message is: the lemma we had about information-theoretic entropy also holds for
computational entropy -- just for a different notion, one that can be converted
to HILL entropy; you can convert back and forth with some losses.
Now, again, this is for a specific value y. But we want to talk about an
average: not a specific Hamming weight but, if you know the Hamming weight in
general, then what? There's a nice statement I will not go through here: if Y
consists of b-bit strings, then this thing holds. It's a very clean statement,
but again about the somewhat messier notion of entropy that can be converted to
HILL.
Now that I've given you all this, let me show you one application, and then stop
because of time constraints. So what is this whole thing good for? Imagine that
we start with a uniform X and apply a pseudorandom generator; we get a
uniform-looking W. Imagine now that I'm worried about side-channel attacks,
where the adversary finds out something about this entire computation: maybe
something about X, maybe something about the inner workings, maybe something
about W. There's some adversary, with a stethoscope, allegedly, who gets L bits
of leakage.
So I model the adversary by saying there are no more than L bits of information
that the adversary gets. Okay. Well, W is no longer uniform-looking; that's
pretty clear.
But that's not a problem, because of my previous lemma, which says -- up to some
parameter losses I have to wave my hands about, unfortunately, because of
time -- that W conditioned on what the adversary sees still has entropy; it just
gets reduced by the number of bits the adversary sees. And it's HILL entropy,
and HILL entropy can be extracted from; we know that from a few slides ago. So
we can extract and get a totally uniform-looking output. And the nice thing --
notice these trapezoids: this one is bigger than that one. We can expand X a
lot and then extract, shrinking by not as much. So we end up with more bits than
we started with.
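A sketch of the whole pipeline in Python, with SHA-256 in counter mode standing
in for the PRG and the same hash extractor as before; every concrete choice here
(hash, lengths, which bits leak) is an illustrative assumption:

    import hashlib
    import secrets

    P = 2**89 - 1

    def prg(x, out_len):
        # stand-in PRG: SHA-256 in counter mode over the seed x
        out = b""
        ctr = 0
        while len(out) < out_len:
            out += hashlib.sha256(x + ctr.to_bytes(4, "big")).digest()
            ctr += 1
        return out[:out_len]

    def extract(w, seed, m):
        a, b = seed
        return ((a * w + b) % P) % (2 ** m)

    x = secrets.token_bytes(8)      # 64 uniform bits
    w = prg(x, 10)                  # 80 bits: HILL entropy ~80 if prg is a real PRG
    leakage = w[:2]                 # the adversary learns L = 16 bits of w
    # Conditioned on the leakage, w retains roughly 80 - 16 bits of entropy
    # (up to quality loss), so an extractor still yields a uniform-looking key:
    seed = (secrets.randbelow(P), secrets.randbelow(P))
    key = extract(int.from_bytes(w, "big"), seed, 32)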
This idea is actually used in a fairly nice result -- maybe you have heard at
least the title -- by Dziembowski and Pietrzak, from 2008: one of the very early
leakage-resilient crypto results, a stream cipher resilient against side-channel
attacks, for a certain class of side-channel attacks. The way they model the
side channel is by a bounded amount of leakage. This is essentially their proof.
There's a lot more going on in their result, because it's trickier than this,
but the heart of the proof is this. Except they didn't have these nice notions
of entropy to work with, so they don't get the one-slide proof that we have
here. So this makes it easier.
Okay. So that's the moral again: if you have the right notion of conditional
entropy and you have the right lemmas, then you get proofs like this, even in
the computational setting. So I'm going to skip this part.
And I will go to the last slide. Min-entropy is often the right measure for
security, as opposed to Shannon entropy or something else. Conditional entropy,
we've seen, is natural, and it gives a lot of very simple bit-counting proofs:
if you leak L bits, then this happens; if you condition on a secret of this
length, then that happens -- that sort of thing, very simple. And even in the
computational case you can still use it to simplify proofs. There are lots of
open problems; if anybody's interested, talk to me.
And in the information-theoretic case we actually have this kind of result: if
you started with conditional entropy and you condition further, you just reduce
by the number of bits. In the computational case, I have to start with
unconditional entropy in order for the result to hold -- there cannot already be
a Y1 there. I don't know how to prove it otherwise. It's annoying; we should be
able to.
>>: What's the [inaudible]? Do you have a [inaudible]?
>> Leo Reyzin: Yes. Here is a short way to tell you why. (I did not pay her this
time -- really, I didn't.) So what is the bad case for us? Where do the proofs
break down? Imagine W is a plaintext and Y1 is the public key. On average, the
plaintext has some entropy conditioned on the public key, but for any fixed
value of the public key there exists an adversary that has the secret key and
therefore leaves no entropy anymore. Does that make sense?
So if I give you the ciphertext and the public key, logically it has entropy;
but if you happen to know the secret key, then it doesn't. If I fix any
particular value of the public key, then there exists an adversary with the
secret key hard-wired in, so for every particular value y1 I have no entropy --
but on average I do. This screws things up, because I only know how to deal
with specific values of Y1; I don't know how to handle the average. That's
really annoying, actually. That's the nutshell.
It would be nice not to have to go through this weird notion of entropy that I
sort of didn't tell you about -- to condition HILL entropy directly, without
having to convert. It would be nice to avoid the exponential quality loss that
happens here, and have only a quantity loss. And maybe we need a totally
different notion of computational entropy; there are two more notions I skipped
in this talk that I could have told you about. Nobody knows exactly what the
right notion is -- the one that makes the theorems look nice, because that's
what the right notion is about. So that's also open: lots of definitions; what
do we know about them, and which one is the right one? All right. Thank you.
[applause]
>> Melissa Chase: Any other questions? Let's thank Leo again.
[applause]
>> Leo Reyzin: Thank you.