16059 - Msecnd.net

>> Kristin Lauter: Okay. So today we're pleased to have Professor Alexandra Boldyreva visiting
us from the Georgia Institute of Technology, and she will speak about deterministic and efficiently
searchable encryption. Thank you.
>> Alexandra Boldyreva: Thank you very much, Kristin. Can you hear me well? Is the mic on?
Okay. Good. Thank you very much for inviting me here. I changed the title and the talk slightly
from what was announced in the abstract; I hope that's okay. I included some newer and more
applied things.
This talk is based on several works I did jointly with Mihir Bellare, Serge Fehr and my student
Adam O'Neill. These are the papers I will mostly cover. If I have time, I will also include some
other work done together with students at Georgia Tech (inaudible), Nate Shenet (phonetic) and
Una Nee (phonetic), who is actually a visiting postdoc. So here is the plan of my talk. I'll tell you
about the focus and motivation for looking at these things; then we'll talk about the problem of
defining security for deterministic encryption and why it's a problem; and then we'll see very
efficient and secure constructions. Some will be in the random oracle model, and then we'll see
other constructions which are somewhat less efficient, some of them considerably less so, but
which are secure without relying on the random oracle model. We'll see some general
constructions and extended general constructions, and we'll discuss their specific
instantiations.
And, finally, we'll talk about a more general primitive, efficiently searchable encryption, which
doesn't have to be deterministic, and we'll discuss the problem of deterministic encryption in the
symmetric setting, because most of the talk will focus on the asymmetric setting. I also want to
talk about more flexible database queries and encryption schemes designed specifically for this
problem.
Okay. So let's start with the main topic: classical security definitions for encryption. If you're
familiar with them, good; if not, that's also okay. The standard notions, indistinguishability
against chosen-plaintext attack and against chosen-ciphertext attack, IND-CPA and IND-CCA,
require the encryption scheme to be randomized. Why is that? These are really strong
properties: they say encryption shouldn't leak any partial information about the message. But
with deterministic encryption, where the encryption algorithm doesn't use any randomness, if you
encrypt the same message twice you get the same ciphertext. Such encryption inherently leaks
information, namely exactly that: you can see when the same message is encrypted twice.
Because of this, in my lectures I teach my students that deterministic encryption is not good.
Remember that. It just cannot be good; it has to be randomized. And, in fact, that's not a
problem, because there are tons of efficient randomized encryption schemes out there. But in
this talk I'm going to look at deterministic encryption, which I usually say is no good.
So why am I doing this? There is a reason: database researchers at Georgia Tech brought a
practical problem to me that I didn't know about before. The problem is fast search on encrypted,
remotely stored data. We'll discuss it in more detail.
And it turns out that deterministic encryption can be very useful there. So there is a reason to
look at this, and we'll talk about it more. Other than that, there are some other reasons why it
may be interesting.
With randomized encryption, the ciphertext has to include randomness, so it's longer than the
message, but deterministic encryption can in principle be length-preserving, which may be useful,
again, in some particular applications.
Later, after we started this, we discovered a relation to another notion, called convergent
encryption, which is used for a totally different problem: secure storage and checking the
consistency of the storage.
It's also interesting from a theoretical point of view and from a historical standpoint, because
when public-key encryption appeared it was first designed in the form of deterministic encryption.
Okay. So let's look at the main application I mentioned: outsourced databases. What's the
problem? Nowadays it's becoming more and more popular for companies and organizations to
outsource data storage and, most importantly, data management to external service
providers.
Storage is cheap, but to manage the data better, you let specialists do it. These external service
providers, however, do not have to be fully trusted. I may trust them to store and manage my
data, but I don't want them to read everything.
And security may be required by law, as, for example, in the case of medical data. So what is the
setting? Mostly in my talk I will look at the public-key setting, but at the end we'll talk about the
symmetric setting.
So what is the setting? We have the external service provider, the database server, and anyone
who knows the public key of some distinguished receiver can submit data. For example, nurses
and pharmacists can submit records about patients to the database. Then the doctor, who has
the secret key, should be able to query the database, ask it for specific records, get them back,
decrypt them and read the data. So this is the setting.
And we want to do it securely. Okay. You can ask me questions any time, I don't
mind.
>>: So would it suffice just to keep the hashed data alongside the randomized encrypted data?
>> Alexandra Boldyreva: Good point, yes. That's one of the solutions which was actually
proposed, mentioned by the database scientists.
Right. So I will talk about it. It's a solution; it doesn't have to be much better, but it's good, and
we'll discuss it. Not at the end, but a bit later, yes.
>>: If really anyone can put data in, it's a nightmare of data corruption and pollution.
>> Alexandra Boldyreva: Of course, in reality we would assume the database server also has
some kind of authorization service on top. But that's an extra layer. By "anyone" I mean anyone
legitimate whom we want to allow.
>>: That's easy.
>> Alexandra Boldyreva: Not quite. We'll see. I'm not saying it's a terribly hard problem, no. But
we'll see some small things you have to take care of. That's a very simple
setting.
And we want to do it securely. So, in principle, when these database people told me the
problem, I said: but there is a good solution, which was recently proposed, and it's better than
the hash solution you told me about, because it provides really strong security. We haven't
discussed it yet, but it's very good security.
It's called public-key encryption with keyword search, done by Dan Boneh and others not long
ago. It could be used, and its security is formally analyzed and really good. In this solution, the
user, the doctor in my example, sends the server a trapdoor, which is generated using the
doctor's secret key, so only the doctor can do it. The server tests this trapdoor against the
encrypted data stored in the database and finds the records matching it.
Really good security. Nothing is leaked, well, almost; really strong security there. And I told
them: here's the solution, really good. But they told me: are you crazy? This solution requires the
database server to go over the whole database. Theoretically it sounds good to me, but they said:
No. We have terabytes and terabytes of data; there's no way we're going to go over the whole
database on each query. That's insane. We want to do it fast, and fast means we want to index
the data and locate it very efficiently.
>>: You said each query. Can't you batch up these queries? Every minute it would start running
and you could --
>> Alexandra Boldyreva: What was that again?
>>: You said each query. A single pass over the database, but doing 100 queries.
>> Alexandra Boldyreva: But doing it in parallel?
>>: Yes. In parallel.
>> Alexandra Boldyreva: Even if it's just one query, it takes a long time. They don't want to do
it. Yeah. I guess it's possible, but they say it's not practical enough.
But I said that if you want to do it faster, it's not possible to have this really good security. And
they said that's okay, fast is important. So then: what's the best we can do under this constraint?
And actually the database researchers had already tried to come up with some solutions
themselves. What you mentioned was one of them. And another was... Maybe it's good to first
ask: what's the problem if we just encrypt all the data with randomized encryption and send it?
What's the problem?
The problem is that the sender uses random coins to create the ciphertext of the message, and
when the client wants to locate this message and encrypts it, he's going to get a new ciphertext,
because the scheme is randomized. That's good for security, but there's no way to point to the
stored record. This is why randomized encryption by itself doesn't work here. So they said: okay,
let's just use deterministic encryption. The sender encrypts the message, and for the query the
user encrypts the same message, so the two ciphertexts are the same; the ciphertext can be used
as the query, and the server can build its index on the deterministic ciphertexts.
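To make the indexing argument concrete, here is a minimal sketch. It does not use a real encryption scheme: a SHA-256 digest of a hypothetical public key and the message stands in for a deterministic ciphertext, and the same digest with fresh random bytes mixed in stands in for a randomized one. Neither stand-in is decryptable; they only illustrate why the server can match one kind of ciphertext and not the other.

```python
import hashlib
import os

PK = b"hypothetical-public-key"  # placeholder, not a real key


def det_tag(message: bytes) -> bytes:
    """Stand-in for a deterministic ciphertext: same input, same output."""
    return hashlib.sha256(PK + message).digest()


def rand_tag(message: bytes) -> bytes:
    """Stand-in for a randomized ciphertext: fresh coins on every call."""
    coins = os.urandom(16)
    return coins + hashlib.sha256(PK + coins + message).digest()


# The sender stores one record under each kind of "ciphertext".
det_index = {det_tag(b"alice"): "record for alice"}
rand_index = {rand_tag(b"alice"): "record for alice"}

# Later the receiver re-encrypts the search term and asks the server for it.
det_hit = det_index.get(det_tag(b"alice"))     # found: same ciphertext again
rand_hit = rand_index.get(rand_tag(b"alice"))  # None: the new ciphertext differs
```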
And so they said they already had these good solutions. But no formal analysis was given, which
is understandable; we wanted to study this problem further, to see whether these solutions are
good and what security they provide. So we want to look at this formally.
So we started to look at this. As I already told you, the deterministic solution indeed works;
works meaning, I don't yet know how secure it is, but the sender encrypts something. Here's an
example: someone sends a review for some paper, encrypted under the public key of the author,
with the title and the review encrypted, and the server stores it. Then the author wants to retrieve
the review, so she encrypts the title, and the ciphertext C coincides with the stored one. We
assume the server has indexed the data beforehand, using some data structure, so when the
query comes it can locate this record very fast, in logarithmic time, return the other part, and the
author can decrypt. So it seems to work.
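The server-side lookup can be sketched with a sorted list and binary search, which gives the logarithmic lookup time mentioned above. As before, `det_encrypt` is a hypothetical stand-in (a SHA-256 tag) for a real deterministic encryption of the title, and the review field is an opaque placeholder string.

```python
import bisect
import hashlib

PK = b"hypothetical-public-key"  # placeholder, not a real key


def det_encrypt(title: str) -> bytes:
    # Hypothetical deterministic "ciphertext" of the title (not decryptable).
    return hashlib.sha256(PK + title.encode()).digest()


class EncryptedIndex:
    """Server-side index of (encrypted title, encrypted review) pairs, sorted by title."""

    def __init__(self):
        self._keys = []
        self._values = []

    def insert(self, enc_title: bytes, enc_review: str) -> None:
        # list.insert is linear; a real database would use a B-tree instead.
        i = bisect.bisect_left(self._keys, enc_title)
        self._keys.insert(i, enc_title)
        self._values.insert(i, enc_review)

    def lookup(self, enc_title: bytes):
        # Binary search: O(log n) comparisons, no scan of the whole table.
        i = bisect.bisect_left(self._keys, enc_title)
        if i < len(self._keys) and self._keys[i] == enc_title:
            return self._values[i]
        return None


server = EncryptedIndex()
for title in ["Paper A", "Paper B", "Paper C"]:
    server.insert(det_encrypt(title), f"<encrypted review of {title}>")

# The author re-encrypts the title of her paper and sends it as the query.
result = server.lookup(det_encrypt("Paper B"))
```

The server never sees a title in the clear; it only compares ciphertexts, which works precisely because equal titles give equal ciphertexts.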
But formally, we want to understand how secure it is, and for that we need to define some notion
of security. I already mentioned that deterministic encryption cannot satisfy the standard notions
of security, IND-CPA and IND-CCA, so we need something else.
In fact, notice that since we're in the public-key setting, if the message space is really small you
cannot have good security at all, intuitively, because everyone can encrypt. You can exhaustively
encrypt all messages in the small space, see which ciphertext matches, and so you would know
the message.
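The exhaustive-encryption attack is easy to sketch. The code below uses textbook RSA with tiny, insecure primes purely as an example of a deterministic public-key scheme (an assumption for illustration; any deterministic public-key scheme behaves the same way). Because anyone can encrypt under the public key, an attacker who knows the message space is small encrypts every candidate and compares.

```python
# Toy textbook RSA, deterministic; parameters are far too small to be secure.
P, Q = 1009, 1013
N = P * Q
E = 65537
PUBLIC_KEY = (N, E)


def encrypt(pk, m: int) -> int:
    n, e = pk
    return pow(m, e, n)


def brute_force(pk, ciphertext: int, message_space):
    """Recover m by encrypting every candidate; no secret key is needed."""
    table = {encrypt(pk, m): m for m in message_space}
    return table.get(ciphertext)


# Suppose the field holds a patient's age, so the space is only 0..120.
ages = range(121)
c = encrypt(PUBLIC_KEY, 42)
recovered = brute_force(PUBLIC_KEY, c, ages)
```

The attack costs one encryption per candidate, which is why the rest of the talk assumes a large message space.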
So whenever we look at deterministic encryption in the public-key setting, we have to assume
we're dealing with a large message space. But still, what is the definition of security?
Cryptographers have known one definition suitable for deterministic encryption for a long time:
one-wayness, which basically says that looking at the ciphertext of a randomly chosen message,
you cannot recover the message.
So you cannot go back if you do not know the secret key. But for these applications it seems too
weak. Maybe I can't recover the whole message, but maybe I can recover half of it. It just doesn't
seem strong enough; it seems possible to do better. So it's good to have a better, stronger
security definition, and we tried to come up with one. So, the first try: given the ciphertext of a
message drawn from a large space, and knowing the public key, of course, no efficient adversary
can compute any function of the message. Okay. That seems like a good idea.
Of course, you want to say it cannot compute the function with good probability. But some
functions, for example the first bit of the message, you could guess with probability one half. So
we will say that no adversary should compute this better than some adversary who is not given
the ciphertext. Even without the ciphertext you can guess the first bit of the message, so that's
not a good attack. But if you can do something non-trivial from the ciphertext, that's what we
want to rule out. Yeah?
>>: It seems impossible if f(m) is (inaudible) the message; you can compute it.
>> Alexandra Boldyreva: Good point. The ciphertext itself, right? Exactly. That's what I was
going to say. Good. So I'll get to that point briefly. That's why I said this is the first try; we'll
have to fix it.
So we're just trying to formalize this for a fixed encryption scheme. Even though we're thinking of
deterministic encryption, the definition doesn't really depend on that; it's just a security
definition, and it will also cover the solution you proposed with the hash. So the definition is still
what we were going to say. We consider an adversary which we divide into two parts, two
cooperating guys, but they don't share state: they can't exchange information, even though they
can know each other's algorithms.
Now think of the experiment. The first guy outputs a message drawn from a large space, so it
should have some entropy in it, and some target string representing the partial information they
have to guess about the message. Later, the other guy is given the ciphertext of this message
and the public key and has to guess this target information.
So this is what is represented here. In one experiment, the first guy outputs the message and
some target string, which can be the first bit, anything, half of the bits. Then we encrypt, give the
ciphertext to the other guy, and he has to guess the target the first guy created. If he guesses it,
we say he wins. That's one experiment. In the other experiment the adversary is the same, but it
is given the ciphertext of some bogus message, and it has to guess information about a totally
unrelated message.
This is just to exclude trivial attacks: this guy is simply not given the relevant ciphertext. So it
seems reasonable, but you caught me. Exactly: under this seemingly natural definition, no
deterministic scheme is secure, regardless of whether the message space is large or not. Just
take the target to be the ciphertext. A deterministic ciphertext is information which is leaked, and
this definition was aiming high: it says no information should be leaked. But this information is
leaked by functionality; it just has to be. Therefore, the definition doesn't work.
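The attack that breaks this first attempt can be sketched as a small simulation. The two experiments follow the description above; the deterministic scheme is again toy textbook RSA with insecure parameters (an illustrative assumption). Because the first adversary sees the public key, it sets its target to the ciphertext itself, so the second adversary wins the real experiment every time but essentially never wins the ideal one.

```python
import random

# Toy deterministic public-key scheme (textbook RSA, insecure parameters).
N, E = 1009 * 1013, 65537
PK = (N, E)


def encrypt(pk, m):
    n, e = pk
    return pow(m, e, n)


def adversary_1(pk, rng):
    """Outputs a high-entropy message and a target string about it.
    The flaw: holding pk, it can choose the target t = Enc(pk, m)."""
    m = rng.randrange(N)
    return m, encrypt(pk, m)


def adversary_2(pk, c):
    """Guesses the target; here the ciphertext already *is* the target."""
    return c


def experiment(real: bool, trials: int = 1000, seed: int = 1) -> float:
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        m, t = adversary_1(PK, rng)
        # Ideal world: encrypt an independent message from the same distribution.
        encrypted = m if real else adversary_1(PK, rng)[0]
        if adversary_2(PK, encrypt(PK, encrypted)) == t:
            wins += 1
    return wins / trials


real_rate = experiment(real=True)    # 1.0
ideal_rate = experiment(real=False)  # close to 0 (wins only if the two messages collide)
```

The fix adopted in the talk, withholding the public key from the first adversary, removes exactly this move: without `pk`, `adversary_1` cannot compute `encrypt(pk, m)`.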
Well, it's just too strong a definition for our primitive. But it seemed good, so we tried to fix it.
Yeah?
>>: So the adversary is like the pair of players in bridge, playing, where one player can't
communicate with the other in this model?
>> Alexandra Boldyreva: Yeah. (Chuckling) Yeah. Before it's laid down, right, when they --
>>: Afterwards.
>> Alexandra Boldyreva: But then they share some information; one reveals one, if it's --
>>: Yeah.
>> Alexandra Boldyreva: But the other player, right. Exactly. Sorry for the detour. So the
definition looked good; there was just one problem, and we tried to fix it. How to fix it? Well,
kind of trivially: we just don't give the public key to this first guy, so it cannot output the
ciphertext.
Well, it's not ideal, but it's the only fix we could see. What it means now is that the definition
only protects data which does not depend on the public key. In practice, that's probably okay.
Public keys are usually hidden in some software? No?
>>: People think that way.
>> Alexandra Boldyreva: Well, our definition is now going to hide information only about
messages not depending on the public key; that's the only thing we could do. I still think it's fine
for most applications, but it's good to realize we cannot do better.
Now the previous attack does not apply, and the definition seems achievable. So that's one
definition; we call it PRIV1, for privacy, where 1 means just one message. For the standard
notions, again, whether you know them or not doesn't really matter, it looks something like this:
the adversary outputs messages, gets ciphertexts, and has to guess something. It doesn't really
matter whether the adversary asks for just one message or for polynomially many messages;
the notions are equivalent. And the question is whether the same holds for our definition, too.
By the way, under this definition we say the scheme is PRIV1-CPA secure, secure against
chosen-plaintext attack, if no efficient adversary can do much better in the first experiment than
in the other.
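In symbols, the condition can be written as follows. The notation ($\mathbf{Exp}^{\mathrm{real}}$, $\mathbf{Exp}^{\mathrm{ideal}}$, the pair $A = (A_1, A_2)$, $\mathcal{E}$ for encryption) is ours, chosen to match the two experiments just described; it is a sketch of the shape of the definition, not a quotation from the papers.

```latex
\mathbf{Adv}^{\mathrm{priv1}}_{\Pi,A}(k)
  = \Pr\!\left[\mathbf{Exp}^{\mathrm{real}}_{\Pi,A}(k) = 1\right]
  - \Pr\!\left[\mathbf{Exp}^{\mathrm{ideal}}_{\Pi,A}(k) = 1\right]
```

In $\mathbf{Exp}^{\mathrm{real}}$, $(m, t) \leftarrow A_1(1^k)$, and $A_2(pk, \mathcal{E}(pk, m))$ wins by outputting $t$; in $\mathbf{Exp}^{\mathrm{ideal}}$, $A_2$ instead receives $\mathcal{E}(pk, m')$ for an independent $m'$ produced by a fresh run of $A_1$. Note that $A_1$ gets no public key, reflecting the fix above. $\Pi$ is PRIV1-CPA secure if this advantage is negligible for every efficient $A$ whose messages have high entropy.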
But we can consider exactly the same experiment where there is not one message but a vector of
messages, and the target represents some information about the whole vector. That's
reasonable. In the standard setting, for randomized encryption, it doesn't matter much; for
simplicity you can consider just one message. Here, we'll see. But first we just define it: now it's
a vector of several messages.
Okay. And what restrictions do we put on the messages? We're still going to work with a large
message space; this is not going to go away. That means each message is hard to guess; it has
high (inaudible) entropy, otherwise we cannot achieve the definition. But still there are several
possibilities. One possibility: we say each message is hard to guess, but other than that the
messages can be arbitrarily dependent. You can have a message X which is random, the second
message can be X plus 1, the third message can be X plus 3. That's allowed; still, each message
is hard to guess. This is the definition which we call PRIV-CPA. We'll see how these definitions
compare.
But you can also consider a weaker definition, with a stronger restriction on the messages, where
we say each message is hard to guess given the others. In my previous example, a random
message and that message plus one do not satisfy this restriction, because once you are given
X, X plus 1 is not hard to guess. Right? Such distributions are called block sources. For example,
if we consider phone numbers, you may know that some numbers share the same prefix,
because they're from the same area.
They're dependent, but even knowing that, given one number it's still hard to guess the others.
So this would still satisfy the weaker definition. So we have these two variations, and the
question is: what are the relations between these definitions?
We worked out which is the strongest and which are equivalent. Our results are that the definition
for one message is strictly weaker than the strongest definition, the one for vectors of messages.
This is different from the case of standard encryption.
It's good to know. The other result concerns the middle one: PRIV-CPA for block sources turns
out to be equivalent to the definition with just one message. So for simplicity we can just work
with this one, for block sources, and this is going to be our strongest one.
Interestingly, the second result is much harder to show than the first one. It uses some ideas
from the work of Dodis and Smith on entropic security and some other recent works, some French
names I forgot, which also looked at entropic security. These are all in the information-theoretic,
not computational, symmetric setting, and the last two works are actually in the quantum setting,
but the techniques are very useful.
But what's important is just how the definitions relate. Now we almost have our definitions. As
usual, it makes sense to also consider stronger definitions, covering not only chosen-plaintext
attacks, where the adversary can choose messages, but also chosen-ciphertext attacks, where
the adversary is allowed to see decryptions of ciphertexts of its choice, in addition to chosen
plaintexts. We can make our definitions take chosen-ciphertext attacks into account in the
standard way, by giving the adversary a decryption oracle. We do this; it's pretty standard. Then
we have all these definitions with CCA, chosen-ciphertext attack, which are just the stronger
versions.
It's always good to aim for security against the stronger chosen-ciphertext attacks, and the same relations hold.
So now we have definitions, and it's time to look at constructions. The first constructions are
going to be in the random oracle model. What is this? It's not real; it's an idealized model which
assumes hash functions are ideal objects, purely random functions, to which all parties just have
oracle access.
In this model, it's not very hard to come up with secure constructions. One of them, which we call
Encrypt-with-Hash, is pretty intuitive. Take any encryption scheme; it has randomness in it, and
we want a deterministic scheme. So we replace the randomness with a deterministic function of
the message, namely the hash of the message: hash the message, then run any randomized
encryption scheme with its random coins replaced by this hash, and encrypt.
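A minimal sketch of Encrypt-with-Hash, assuming toy ElGamal over a small prime-order subgroup as the underlying randomized scheme and SHA-256 of the public key and message as a stand-in for the random oracle. The parameters are far too small for real use; the point is only that the coins are derived from the message, so encryption becomes deterministic while decryption is unchanged.

```python
import hashlib
import secrets

# Toy group: P = 2*Q + 1 is a safe prime; G = 4 generates the order-Q subgroup.
# These sizes are for illustration only and offer no security.
P, Q, G = 2039, 1019, 4


def keygen():
    x = secrets.randbelow(Q - 1) + 1          # secret key in [1, Q-1]
    return pow(G, x, P), x                    # (public key, secret key)


def elgamal_encrypt(pk: int, m: int, r: int):
    """Ordinary randomized ElGamal; r is the random coin."""
    return pow(G, r, P), (m * pow(pk, r, P)) % P


def elgamal_decrypt(sk: int, c):
    c1, c2 = c
    return (c2 * pow(pow(c1, sk, P), -1, P)) % P


def hash_to_coins(pk: int, m: int) -> int:
    # SHA-256 stands in for the random oracle; output mapped into [1, Q-1].
    digest = hashlib.sha256(f"{pk}|{m}".encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1


def ewh_encrypt(pk: int, m: int):
    """Encrypt-with-Hash: replace the random coins with a hash of the message."""
    return elgamal_encrypt(pk, m, hash_to_coins(pk, m))


pk, sk = keygen()
m = pow(G, 123, P)                 # messages are taken to be subgroup elements
c_first = ewh_encrypt(pk, m)
c_second = ewh_encrypt(pk, m)      # identical to c_first: the scheme is deterministic
```

Decryption uses the ordinary ElGamal algorithm; only the source of the coins changed.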
So it may be a bit better than your solution, for example because the ciphertext will be shorter:
the hash is used inside the encryption rather than appended. But it's pretty much the same idea.
So it's a deterministic encryption scheme. In the random oracle model, if we assume the hash is
ideal, it satisfies the strongest notion of security: if the underlying scheme is IND-CPA, you get
PRIV-CPA. And if you take an IND-CCA scheme, and it also satisfies some very minor constraint,
which is satisfied by all practical schemes, so it's not a big deal, you get a PRIV-CCA
deterministic scheme.
So that's one simple construction.
>>: What's the scheme that you're contrasting it with?
>> Alexandra Boldyreva: I'll repeat. His idea was a very easy solution: take your randomized
scheme, so the ciphertext is randomized, encrypt the message, and also hash the message and
append the hash. It's not deterministic encryption, but for the search application you just use the
hash as the tag for locating the record.
>>: So then you can't do searchable --
>> Alexandra Boldyreva: You do. You do.
>>: On each word?
>> Alexandra Boldyreva: No, no. Right. So you would --
>>: You would keep the hash in a separate field and then can search --
>>: So word-by-word indexing, words and hashes of words.
>> Alexandra Boldyreva: In either case, I'm looking at the encryption of a field or a word, one cell,
whatever it is, and indexing by that.
If the ciphertext contains the hash, you just look at that; whether you put it separately or not
doesn't matter. Instead of indexing on the whole ciphertext, you just use the hash part. And to
decrypt you use the ciphertext, which is randomized. Does that make
sense?
>>: Really inefficient if you have to do word --
>>: It would only be for exact match.
>> Alexandra Boldyreva: But it's all for exact match so far. I think this is pretty much the same
efficiency-wise; it's not that one is better. If I want to locate a word, I hash it, and if the server
indexed by the hash, it's the same. Okay?
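The audience's alternative can be sketched the same way: keep a randomized ciphertext for decryption and a deterministic hash of the plaintext as the search tag. Toy ElGamal with insecure parameters and SHA-256 are assumptions for illustration; the sketch only shows the record layout and the lookup path. Like any deterministic tag, the hash still reveals when two cells hold the same value.

```python
import hashlib
import secrets

# Toy ElGamal in the order-Q subgroup of Z_P^* (insecure sizes, illustration only).
P, Q, G = 2039, 1019, 4
sk = secrets.randbelow(Q - 1) + 1
pk = pow(G, sk, P)


def randomized_encrypt(m: int):
    r = secrets.randbelow(Q - 1) + 1          # fresh coins on every call
    return pow(G, r, P), (m * pow(pk, r, P)) % P


def decrypt(c):
    c1, c2 = c
    return (c2 * pow(pow(c1, sk, P), -1, P)) % P


def tag(m: int) -> bytes:
    # Deterministic search tag; SHA-256 stands in for the hash under discussion.
    return hashlib.sha256(f"{pk}|{m}".encode()).digest()


# Each stored cell is (hash tag, randomized ciphertext); the server indexes on the tag.
table = {}
word = pow(G, 77, P)                          # a word encoded as a group element
table[tag(word)] = randomized_encrypt(word)

# Query: send only the tag, get back the randomized ciphertext, decrypt locally.
hit = table.get(tag(word))
plaintext = decrypt(hit) if hit is not None else None
```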
>>: What's the constraint on the scheme?
>> Alexandra Boldyreva: We require that no ciphertext occurs with too high a probability when
encrypting any given message. This is a condition on the randomized scheme. From looking at it,
it seems like it should follow from just the CPA requirement, but actually it does not: you can
have contrived schemes which don't satisfy it. But all practical schemes do, and it's really easy
to check for any scheme. So it's not really a constraint at all.
Another scheme, which we designed, is slightly more complicated, but not much. If you're familiar
with the standardized OAEP encryption scheme, it resembles that, but it uses three rounds instead
of two. These are the hashes, and this is the RSA function applied to the output of this transform.
The other difference is that the RSA function does not have to be applied to the whole output of
the transform, just to part of it, so variable message lengths are okay.
And what we get is a length-preserving scheme. The first one was not length-preserving, because
we used the hash in place of the randomness, so the ciphertext carried some extra length. Here it
can be length-preserving, with nothing on top. We show that it also satisfies the strongest notion,
PRIV-CCA, in the random oracle model, assuming one-wayness of the RSA function.
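The length-preserving idea can be illustrated with a three-round Feistel transform whose round functions are hash-based. This is a hedged sketch of the general shape only, not the exact RSA-DOAEP specification: the round functions (SHA-256 with a round label and a placeholder public key, extended in counter mode) are our assumption, and the final RSA step on part of the output is omitted. It shows that such a transform is deterministic, invertible, and maps n bytes to n bytes.

```python
import hashlib

PK = b"hypothetical-public-key"  # placeholder; a real scheme would hash the actual key


def round_fn(label: bytes, data: bytes, out_len: int) -> bytes:
    """Hash-based round function with arbitrary output length (counter mode)."""
    out = b""
    counter = 0
    while len(out) < out_len:
        out += hashlib.sha256(label + PK + counter.to_bytes(4, "big") + data).digest()
        counter += 1
    return out[:out_len]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel3(msg: bytes) -> bytes:
    half = len(msg) // 2
    left, right = msg[:half], msg[half:]
    left = xor(left, round_fn(b"H1", right, len(left)))
    right = xor(right, round_fn(b"H2", left, len(right)))
    left = xor(left, round_fn(b"H3", right, len(left)))
    return left + right


def feistel3_inverse(out: bytes) -> bytes:
    half = len(out) // 2
    left, right = out[:half], out[half:]
    # Undo the three rounds in reverse order.
    left = xor(left, round_fn(b"H3", right, len(left)))
    right = xor(right, round_fn(b"H2", left, len(right)))
    left = xor(left, round_fn(b"H1", right, len(left)))
    return left + right


message = b"patient-record-0042"
padded = feistel3(message)
```

In the scheme described in the talk, the RSA function is then applied to only part of this output, which is why variable-length messages are fine.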
Okay. Now let's look at constructions without assuming these idealized random oracles. Both of
the previous schemes were designed in the random oracle model. It's very common, whenever you
have a new primitive, to first realize it assuming random oracles; it's convenient.
It's not that bad, but everyone realizes it's an idealized model. It's a good heuristic, but strictly
speaking, proofs of security in this model do not guarantee security in the strong sense. So it's
always good to have something without random oracles.
Even so, schemes without random oracles tend to be less efficient. But it's still good to know you
can do it. And there are several works, more than what's listed here, which raised concerns about
the soundness of the random oracle model: you can have schemes which are secure in it but not
secure at all in practice.
But those schemes are very contrived, so for good schemes we still hope it's okay. So it's
interesting to see whether you can do deterministic encryption in the standard model, without
random oracles.
We managed to get some positive results, but only for the weaker definition, security for block
sources, which puts stronger restrictions on the message space. That's still probably good for
some applications, like Social Security numbers or phone numbers. Achieving the strongest
definition without random oracles is still an open question, and it seems to be very
hard.
We'll see. It will be interesting to see if it's possible. Our constructions use as building blocks
some recently introduced primitives called lossy trapdoor functions, introduced by Peikert and
Waters very recently. Their goal was different: they were looking to construct randomized
IND-CCA encryption schemes, but the primitives turn out to be very useful, and we will
use them.
So let's briefly review them. What is a lossy trapdoor function? It's a one-way trapdoor function:
there is some trapdoor with which you can invert. But it operates in two modes; basically, key
generation can generate two types of keys. One is the normal, invertible mode: hard to invert
without knowing the secret key, but if you know the secret key you can invert. The other mode is
lossy, meaning the function loses some information about the input and you cannot invert it.
The other requirement is that the modes are indistinguishable: given a key, you cannot tell which
of the two modes it yields.
There are several constructions under different assumptions given in that paper. What we
observe is that if you are just concerned about security against chosen-plaintext attacks, it's easy.
Take a lossy trapdoor function and look at its lossy mode. By the way, the lossy mode, the
non-invertible one, is only used in the proofs; in reality you always use the normal one. But the
lossy mode is very useful in the proofs.
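The two modes can be illustrated with a toy version of the matrix idea behind the DDH-based construction of Peikert and Waters mentioned later in the talk: evaluate x ↦ Mx over Z_p, with M invertible in the injective mode and of rank 1 in the lossy mode. This sketch shows only the injective/lossy behaviour. It omits the part that makes the modes indistinguishable (in the real construction the matrix is hidden in the exponent of a DDH group), so here anyone can tell the modes apart by looking at M; all matrices are hypothetical examples.

```python
import itertools

P = 11   # toy modulus
N = 3    # inputs are vectors in Z_P^N


def apply(matrix, x):
    return tuple(sum(a * b for a, b in zip(row, x)) % P for row in matrix)


# Injective mode: an invertible matrix (upper unitriangular, determinant 1).
INJECTIVE = [[1, 2, 0],
             [0, 1, 3],
             [0, 0, 1]]
# The trapdoor: its inverse mod P.
TRAPDOOR = [[1, 9, 6],
            [0, 1, 8],
            [0, 0, 1]]

# Lossy mode: a rank-1 matrix u v^T, so every output is a multiple of u.
u, v = [1, 4, 9], [2, 7, 5]
LOSSY = [[ui * vj % P for vj in v] for ui in u]


def image_size(matrix) -> int:
    domain = itertools.product(range(P), repeat=N)
    return len({apply(matrix, x) for x in domain})


injective_image = image_size(INJECTIVE)  # 1331 = P**3: nothing is lost
lossy_image = image_size(LOSSY)          # 11 = P: most information about x is lost
```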
So look at this lossy mode. If it acts as a universal hash function, which is a non-cryptographic
property, not important to spell out here, then the lossy trapdoor function immediately gives you
deterministic encryption secure in the sense of PRIV for block sources.
In use it runs in the normal mode, so you can decrypt, and the function is deterministic; and you
can show it's indeed secure for block sources, or, equivalently, for the definition with just one
message. The proof is rather easy: it just uses the leftover hash lemma, so it's pretty
straightforward. I'll have a slide later showing that, of the constructions of lossy trapdoor
functions provided by Peikert and Waters, one of them satisfies this
requirement.
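For reference, one standard form of the leftover hash lemma (the notation is ours): if $\mathcal{H}$ is a universal family of functions $h : \{0,1\}^n \to \{0,1\}^{\ell}$ and $X$ has min-entropy at least $k$, then, with $\Delta$ denoting statistical distance and $U_\ell$ uniform on $\{0,1\}^{\ell}$,

```latex
\Delta\big( (h,\, h(X)),\ (h,\, U_\ell) \big) \;\le\; \tfrac{1}{2}\sqrt{2^{\,\ell - k}},
\qquad h \leftarrow \mathcal{H}.
```

Roughly, this is why a universal lossy mode helps: on a high-entropy message its output is close to uniform over its range and so says almost nothing about the message, and since the lossy mode is indistinguishable from the normal one, the real ciphertexts cannot say much either.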
So it has this universal lossy mode, and it can be used; we'll discuss it. But what about the
strongest security, against chosen-ciphertext attacks? That's slightly more involved, and we use
the other primitive these authors used to construct randomized encryption schemes: all-but-one
trapdoor functions, which are a generalization of lossy trapdoor functions.
They have an additional input, a set of branches, or modes, you could also call them. One branch
specifies a lossy function, but all the others specify normal invertible functions, and again, given
a branch, you cannot tell which is which.
So it's just slightly more general. Again, several constructions exist. And this is our construction
of a deterministic encryption scheme. It's different from the construction of Peikert and Waters,
which used a one-time signature scheme that seemed hard to
derandomize.
This is what a ciphertext in our scheme looks like. Here we have a hash function, this is a lossy
trapdoor function, and this is an all-but-one function, and the hash specifies the branch. You can
decrypt, and it's deterministic. We show that it is secure; actually, it's PRIV1, or equivalently
PRIV for block sources, the intermediate notion; there's a typo on the slide. That holds if this is a
lossy trapdoor function with a universal lossy mode, this is an all-but-one function, again with a
universal lossy branch, and the hash function is both universal and target collision-resistant.
Target collision resistance is a weaker notion than standard collision resistance. Under these
assumptions we show that this is PRIV for block sources. Okay. So this is the general
construction. What about instantiations?
As I said, the construction based on DDH, decisional Diffie-Hellman, provided by Peikert and
Waters has a universal lossy mode, so we can just plug it in. But for the CCA construction we
need a hash which is both universal and target collision-resistant. To the best of my knowledge,
this is the first time these two properties are needed together; usually it's one or the other. If we
want to work under the DDH assumption, well, we found some existing constructions of hash
functions and showed that they satisfy both properties.
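The shape of the chosen-ciphertext construction, c = (h(m), f(m), g_{h(m)}(m)), and its decryption check can be sketched with deliberately trivial stand-ins. All three components below are hypothetical toys over Z_p and are not the Peikert-Waters constructions: `f` multiplies by a constant whose inverse plays the trapdoor (injective mode), `g` multiplies by (branch minus b*) and so collapses everything at the single lossy branch b*, and `h` is SHA-256 reduced mod p standing in for the universal, target collision-resistant hash. None of them hides anything; only the structure is meant to carry over.

```python
import hashlib

P = 2**61 - 1                  # a Mersenne prime used as a toy modulus
F_KEY = 123_457                # toy injective-mode key: f(x) = F_KEY * x mod P
F_TRAPDOOR = pow(F_KEY, -1, P)
LOSSY_BRANCH = 42              # b*, the one branch on which g is lossy


def h(m: int) -> int:
    # Stand-in for a universal, target collision-resistant hash into branches.
    return int.from_bytes(hashlib.sha256(str(m).encode()).digest(), "big") % P


def f(m: int) -> int:
    return (F_KEY * m) % P


def g(branch: int, m: int) -> int:
    # All-but-one toy: invertible unless branch == LOSSY_BRANCH, where every m maps to 0.
    return ((branch - LOSSY_BRANCH) * m) % P


def encrypt(m: int):
    b = h(m)
    return b, f(m), g(b, m)


def decrypt(c):
    b, y, z = c
    m = (F_TRAPDOOR * y) % P   # invert f with the trapdoor
    # Recompute every component and reject anything that is not a valid encryption.
    if h(m) != b or g(b, m) != z:
        return None
    return m


c = encrypt(31337)
```

The consistency check in `decrypt` is what lets a deterministic scheme reject malformed ciphertexts, which is the role it plays in the chosen-ciphertext setting.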
We show this for two popular groups where DDH is hard. So under the DDH assumption we have
everything. But the constructions of lossy trapdoor functions and all-but-one trapdoor functions
based on DDH, which were provided in the other paper, are inefficient: they actually work bit by
bit, in matrix form. Not very efficient.
So we wanted something more efficient. But the other constructions of these lossy functions, for
example those based on Paillier encryption and other assumptions, didn't seem to have universal
lossy modes, and that is the additional thing we needed.
So that didn't seem to work. For that we designed a tweak to our general construction which
allows us to avoid this additional restriction. We modify our construction using pairwise-independent
but invertible hash functions. There are constructions of these, and if you process the message
with such a hash function first, then the assumptions can be weakened altogether. You don't need
universal lossy modes of these two primitives anymore.
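The preprocessing step can be sketched with one standard family of pairwise-independent permutations. The affine family h(x) = a*x + b mod p with a != 0, the Mersenne-prime modulus, and the function names are assumptions for illustration; the talk only says such constructions exist. Invertibility is what lets the decryptor undo the hash after inverting the trapdoor function.

```python
import secrets

# Hypothetical choice of domain: integers modulo the Mersenne prime 2^61 - 1.
P = 2**61 - 1


def sample_key():
    """Pick (a, b) with a != 0, so h_{a,b} is a permutation of Z_P."""
    a = 1 + secrets.randbelow(P - 1)
    b = secrets.randbelow(P)
    return a, b


def h(key, x):
    """Affine map x -> a*x + b mod P; pairwise independent over random (a, b)."""
    a, b = key
    return (a * x + b) % P


def h_inverse(key, y):
    """Undo h using the modular inverse of a."""
    a, b = key
    return (y - b) * pow(a, -1, P) % P
```

In the tweaked scheme one would apply `h` to the message before the lossy trapdoor function and `h_inverse` after inverting it; that wiring is our reading of the description, not a detail stated in the talk.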
And you also don't need universality of the hash. So it seems better with this tweak, and then we
show that you can use the Paillier-based construction from Peikert and Waters, but we actually
improve its efficiency, making it even more efficient. Funnily, the exact same efficient
Paillier-based construction was discovered independently by Rosen and Segev very recently. So we
can have something more efficient. It's not as close in efficiency to the random oracle
constructions, but it's actually not that bad.
And we have security in the standard model, without assuming random oracles, but it's for block
sources. Okay. That's about all the main results we have on public-key deterministic encryption.
Now, as you may have noticed, I was tricking you a bit by saying deterministic encryption is what
you need for this search. It doesn't have to be deterministic to give the same functionality for
this particular application.
So this is why we generalize what we need. We call this primitive efficiently searchable
encryption, and deterministic encryption is just one particular case that satisfies it. Yeah?
>>: But when you say encryption, how do you guarantee that you have a length-preserving
mechanism? I see that it's a nice feature for actually storing the data and using the data, if
you're assuming just serial storage of it. If it weren't length preserving, and you talked about
writes of data, (inaudible) then it becomes an issue of observation.
>> Alexandra Boldyreva: Just one construction; it was deterministic and length preserving.
>>: So for the case of the hash --
>> Alexandra Boldyreva: Hash and RSA. No, the length-preserving one, where the hashing was
done kind of -- so maybe I can go back quickly and we'll see. Okay. This is the scheme. So this
is the message, real size, and this is the ciphertext, real size. They're about the same.
That's my proof.
So, okay. If the message is really short, you cannot use a short RSA key, because you need at least
1024 bits. But then it's probably not a big deal if you have a few extra bits. Right. So the
ciphertext cannot get shorter than the RSA key, but if the message is longer, you just do it like this.
This is a length-preserving transformation, a hash applied to part of the message, and it's on the
inside, so it doesn't increase the size of the ciphertext. Then you apply RSA. It's a permutation, so
you stay the same size as long as the message is not shorter than the RSA key. It's just by
construction.
And all the other constructions are not length preserving; this is the only one we had. Efficiently
searchable encryption is just the generalization. We say you don't need deterministic encryption;
we want the same functionality.
So for that we require the encryption scheme to have two additional functions defined, F and G,
which are used for queries and indexing. When someone sends a ciphertext, the server uses one
function, G, on the ciphertext in order to index it, to know where to locate it.
Then, whenever a query comes in, the user applies the other function, F, to the message to
compute the query, and the server takes it and goes to exactly the same index.
So we ask that the results of these two functions be the same: what the sender sends and the
server can use, and what the receiver can use to query.
In the case of deterministic encryption, both of these functions basically give the ciphertext, the
whole thing. In your solution, which we are coming to, it's going to be just the hash: G takes the
ciphertext and extracts the hash part of it. That's one function.
And F just hashes the message. That's the other function, and they coincide. So it doesn't have to
be the whole ciphertext that matches; it can be a part of it, and that part is what gets used.
So deterministic encryption is just a special case. The same security definitions apply, because in
the definitions we didn't say that it has to be deterministic. And your construction, which you saw
immediately, we called encrypt-and-hash: you just encrypt with a randomized scheme, append the
hash, and use the hash for searching.
It was proposed in the database literature, and we confirmed that it is secure in the random oracle
model, if the hash has this ideal property.
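A minimal sketch of encrypt-and-hash viewed as efficiently searchable encryption, under stated assumptions: SHA-256 plays the public hash, and because Python's standard library has no public-key encryption, a randomized HMAC-SHA256 counter-mode stream cipher stands in for the randomized scheme (so the key here is symmetric). All names (`encrypt`, `F`, `G`, `Server`) are hypothetical. The point is only the functionality: F(m) equals G(ciphertext), so the server finds every encryption of m even though the ciphertext bodies are randomized.

```python
import hashlib
import hmac
import os
from collections import defaultdict


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode: a PRF-based stream (stand-in cipher)."""
    blocks = (hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
              for i in range((length + 31) // 32))
    return b"".join(blocks)[:length]


def encrypt(key: bytes, message: bytes):
    """Encrypt-and-hash: randomized ciphertext body plus the hash of the message."""
    nonce = os.urandom(16)                                   # fresh randomness per call
    body = bytes(m ^ k for m, k in zip(message, _keystream(key, nonce, len(message))))
    return nonce, body, hashlib.sha256(message).digest()


def decrypt(key: bytes, ciphertext) -> bytes:
    nonce, body, _ = ciphertext
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


def F(message: bytes) -> bytes:
    """Receiver side: compute the query from a plaintext."""
    return hashlib.sha256(message).digest()


def G(ciphertext) -> bytes:
    """Server side: the index is the hash part of the ciphertext."""
    return ciphertext[2]


class Server:
    def __init__(self):
        self.index = defaultdict(list)

    def store(self, ciphertext):
        self.index[G(ciphertext)].append(ciphertext)

    def query(self, q: bytes):
        return self.index.get(q, [])
```

Storing two encryptions of the same record and querying with `F` returns both, while their bodies differ; deterministic encryption is the special case where F encrypts and G is the identity.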
It also brings me to one question. With this scheme, maybe I overemphasized the importance of
deterministic encryption a little bit, because for the main application we see that it's the hash
that's needed. In the random oracle model, no problem. In the standard model, it makes sense to
design not the whole deterministic encryption scheme but just the hash. So what properties does it
need so that the whole scheme satisfies our definitions?
We didn't state it anywhere, but it seems that for block sources just a universal hash function and
the leftover hash lemma will work. Beyond that it's still a big open question. It seems easier than
encryption, but it still seems very hard to design a hash that is secure in the strongest sense, not
just for block sources. I don't know the solution. I know several people are trying. We'll see what
results come.
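For reference, the standard statement of the leftover hash lemma behind the remark about block sources is below; exactly how it would be applied to the hash in encrypt-and-hash is not spelled out in the talk. Here the family is universal (any two distinct inputs collide with probability at most 2^{-l}), X has min-entropy at least k, and U_l is uniform on l-bit strings.

```latex
h \xleftarrow{\$} \mathcal{H},\quad \mathcal{H} \subseteq \{h : \{0,1\}^n \to \{0,1\}^{\ell}\} \text{ universal},\quad H_\infty(X) \ge k
\;\Longrightarrow\;
\Delta\big((h,\, h(X)),\ (h,\, U_\ell)\big) \;\le\; \tfrac{1}{2}\sqrt{2^{\ell - k}}
```

So the hash output is close to uniform only when its length l is well below the min-entropy k, which is why the message space must carry a lot of entropy.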
Good. Just one small point here. If you use the encrypt-and-hash construction, you can have an
efficiency/security trade-off. If you don't like that the server can see where the equal messages
are in the database, you can decrease the output length of the hash, so that there are some
collisions. Then several plaintexts map to the same hash value, and the server won't be able to
tell them apart. It comes at the price of more false positives on each query: the server returns
more answers. That may be preferable in some applications. So for this trade-off you may want to
limit the output of the hash. I have a few minutes left, and I just want to tell you a little bit
about the symmetric setting and more flexible queries.
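The truncation trade-off can be sketched in a few lines, assuming SHA-256 cut down to `bits` bits plays the searchable tag; the record names and bit lengths are made up for illustration, and `tag`, `build_index`, and `query` are hypothetical names. Fewer bits means fewer buckets, so equal messages hide among colliding ones, and each query returns more false positives for the client to discard after decryption.

```python
import hashlib
from collections import defaultdict


def tag(message: bytes, bits: int) -> int:
    """Searchable tag: the top `bits` bits of SHA-256(message)."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full >> (256 - bits)


def build_index(records, bits):
    """Server-side index from truncated tags to the records sharing them."""
    index = defaultdict(set)
    for record in records:
        index[tag(record, bits)].add(record)
    return index


def query(index, message, bits):
    """Everything the server returns; the client discards the false positives."""
    return index.get(tag(message, bits), set())


records = [f"record-{i}".encode() for i in range(1000)]
for bits in (4, 8, 16):
    index = build_index(records, bits)
    answers = query(index, b"record-42", bits)
    print(f"{bits:2d} bits: {len(index)} buckets, {len(answers) - 1} false positives")
```

In a real deployment the index would hold ciphertexts rather than plaintext records; plaintexts are used here only to make the collision counts visible.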
We treated deterministic encryption in the symmetric setting, too. Things are a bit easier there.
We have a definition, and because things are easier, we set up the definition to provide not only
privacy but also authenticity, and finding the definition is also not that complicated.
So we did this. Our construction, if you're interested, computes a MAC of the message. This is
basically in place of the hash we used in the public-key setting, and most MACs are PRFs,
pseudorandom functions. We use this tag, this MAC, as the IV for CBC encryption mode.
Surprisingly, it's the same construction that Rogaway and Shrimpton used independently, for a
totally different application called key wrap. Okay. So this is very briefly the symmetric-key
setting.
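The MAC-as-IV idea can be sketched as follows, with one swap named plainly: the talk uses the MAC as the IV for CBC mode with a block cipher, but Python's standard library has no block cipher, so this sketch encrypts with a counter-mode stream built from HMAC-SHA256 (closer in spirit to Rogaway and Shrimpton's SIV). The 16-byte IV length, the two-key layout, and the names `det_encrypt`/`det_decrypt` are assumptions. Equal messages give equal ciphertexts, and decryption recomputes the MAC to check authenticity.

```python
import hashlib
import hmac

IV_LEN = 16  # hypothetical: the MAC is truncated to 16 bytes to serve as the IV


def _stream(enc_key: bytes, iv: bytes, length: int) -> bytes:
    """Counter-mode keystream from HMAC-SHA256 (stand-in for CBC with a block cipher)."""
    blocks = (hmac.new(enc_key, iv + i.to_bytes(8, "big"), hashlib.sha256).digest()
              for i in range((length + 31) // 32))
    return b"".join(blocks)[:length]


def det_encrypt(mac_key: bytes, enc_key: bytes, message: bytes) -> bytes:
    """Deterministic: the IV is a MAC (PRF) of the message itself."""
    iv = hmac.new(mac_key, message, hashlib.sha256).digest()[:IV_LEN]
    body = bytes(m ^ s for m, s in zip(message, _stream(enc_key, iv, len(message))))
    return iv + body


def det_decrypt(mac_key: bytes, enc_key: bytes, ciphertext: bytes) -> bytes:
    """Decrypt, then recompute the MAC and reject the ciphertext if it does not match."""
    iv, body = ciphertext[:IV_LEN], ciphertext[IV_LEN:]
    message = bytes(c ^ s for c, s in zip(body, _stream(enc_key, iv, len(body))))
    expected = hmac.new(mac_key, message, hashlib.sha256).digest()[:IV_LEN]
    if not hmac.compare_digest(iv, expected):
        raise ValueError("ciphertext failed authentication")
    return message
```

Because the ciphertext is a deterministic function of the message, the server can answer exact-match queries by comparing ciphertexts, which is precisely the limitation raised in the discussion that follows.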
When we did it, we were kind of happy with ourselves. For example, we presented the symmetric
solution at a database conference; we thought they would like it, and they said, okay, but it's
only exact match. What about all types of queries? We want to do minimum, maximum, range
queries, averages; it's not enough. And it's like, okay, one thing at a time.
So what they're particularly interested in, and again they're trying to do it themselves, is range
queries. You have your data, and you want the server to return the records containing encryptions
of values from 100 to 300, something like that.
And again, they had two ideas, actually. One -- I'm not sure if it's their name -- is (inaudible)
encryption: represent your plaintext numbers in binary form. They encrypt everything bit by bit,
and the scheme has the property -- and it's kind of deterministic, right? -- that if two messages start
with the same prefix, the ciphertexts also start with the same prefix. That's why I call it prefix
preserving. And they say you can express a range, a set of numbers, with just a minimal set of
prefixes.
Right? Each one is a prefix with a star at the end, meaning the rest can be anything. So you take
this prefix and this prefix, and you can express a range like this. They suggest using it to answer
range queries.
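Both halves of the prefix idea can be sketched together. The encryption below is an assumption, one standard prefix-preserving construction rather than necessarily the one in the database papers: ciphertext bit i is plaintext bit i XORed with a PRF bit of the preceding plaintext prefix, with HMAC-SHA256 as the PRF and 8-bit values. `range_to_prefixes` greedily splits a range into the minimal list of aligned "prefix*" blocks. Encrypting each prefix the same way lets the server match ciphertexts by string prefix. All names are hypothetical.

```python
import hashlib
import hmac

WIDTH = 8  # plaintexts are 8-bit unsigned integers in this toy


def _prf_bit(key: bytes, prefix: str) -> int:
    """One pseudorandom bit derived from the plaintext prefix seen so far."""
    return hmac.new(key, prefix.encode(), hashlib.sha256).digest()[0] & 1


def pp_encrypt_bits(key: bytes, bits: str) -> str:
    """Ciphertext bit i = plaintext bit i XOR PRF(first i plaintext bits)."""
    return "".join(str(int(b) ^ _prf_bit(key, bits[:i])) for i, b in enumerate(bits))


def pp_encrypt(key: bytes, value: int) -> str:
    return pp_encrypt_bits(key, format(value, f"0{WIDTH}b"))


def range_to_prefixes(lo: int, hi: int, width: int = WIDTH) -> list:
    """Minimal list of prefixes ("prefix*") whose union is exactly [lo, hi]."""
    prefixes = []
    while lo <= hi:
        size = 1
        # Grow the aligned block starting at lo while it still fits inside [lo, hi].
        while lo % (2 * size) == 0 and lo + 2 * size - 1 <= hi:
            size *= 2
        free_bits = size.bit_length() - 1                    # trailing "*" positions
        prefix_len = width - free_bits
        prefixes.append(format(lo >> free_bits, f"0{prefix_len}b") if prefix_len else "")
        lo += size
    return prefixes


key = b"demo key (hypothetical)"
wanted = range_to_prefixes(3, 12)        # 3, 4-7, 8-11, 12
encrypted_prefixes = [pp_encrypt_bits(key, p) for p in wanted]
matches = [v for v in range(2 ** WIDTH)
           if any(pp_encrypt(key, v).startswith(e) for e in encrypted_prefixes)]
print(wanted, matches)
```

The sketch also shows the leakage the speaker returns to later: the server sees which ciphertexts share prefixes, not just which ones fall in the queried range.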
The other suggestion was to use order-preserving encryption, meaning that if two plaintexts are in
some order, one smaller than the other, the same holds for the ciphertexts. So you can sort the
ciphertexts, and that's all. And since it's deterministic, you just encrypt the endpoints of the range
and the database returns everything in between.
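Order-preserving range search can be sketched with the naive way to sample a uniformly random order-preserving function: choose a random M-element subset of the ciphertext space, sort it, and map message i to the i-th smallest element. This is an assumption for illustration, memory-hungry and seeded with Python's non-cryptographic `random` for reproducibility; efficient sampling is exactly the open question raised at the end of the talk. The names `sample_opf`, `ope_encrypt`, and `range_query` are hypothetical.

```python
import bisect
import random


def sample_opf(message_space: int, ciphertext_space: int, seed: int) -> list:
    """Sorted random M-subset of [0, N): message i maps to its i-th smallest element."""
    rng = random.Random(seed)   # toy only: not a cryptographic source of randomness
    return sorted(rng.sample(range(ciphertext_space), message_space))


def ope_encrypt(table: list, m: int) -> int:
    return table[m]


def range_query(sorted_ciphertexts: list, lo_ct: int, hi_ct: int) -> list:
    """Server side: return every stored ciphertext between the encrypted endpoints."""
    left = bisect.bisect_left(sorted_ciphertexts, lo_ct)
    right = bisect.bisect_right(sorted_ciphertexts, hi_ct)
    return sorted_ciphertexts[left:right]


table = sample_opf(1000, 1 << 20, seed=7)
db = sorted(ope_encrypt(table, m) for m in (5, 150, 220, 301, 999))
hits = range_query(db, ope_encrypt(table, 100), ope_encrypt(table, 300))
```

Here the ciphertext space (2^20) is much larger than the message space (1000), which matters for the discussion that follows about how far apart ciphertexts can be.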
>>: Does such a scheme exist?
>> Alexandra Boldyreva: They designed something. I looked at it; my student looked at it. It
doesn't make any sense. I cannot parse it. Even though I told them it doesn't make sense, they
said it must make sense because it was published in our best database conference. I said no, and
they said it must, it's very good.
>>: Order preserving is identical to one-part code books, which I believe they knew how to crack
in the 19th century.
>> Alexandra Boldyreva: Actually, you may notice that it's easy to crack if you're in the public-key
setting, okay? If it's public key, so you can encrypt everything yourself, and it's deterministic and
order preserving, it doesn't make any sense. If you have some ciphertext, you take some message
and encrypt it. If its ciphertext is larger, you go lower; if it's smaller, you go higher. Encrypt again,
and you will converge to the message. So I don't know how it can make sense there. But they're
talking about the symmetric setting, and there it seems to me that it can make some sense. It's
not very strong security, but something can be there. Right?
So at least not everything is obviously insecure. You can have some security.
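The public-key attack just described is a binary search, sketched below under the assumption that the attacker can call any strictly increasing deterministic `encrypt` function (anyone with the public key can). About log2(M) encryptions recover the plaintext. The function `recover_plaintext` and the toy encryption in the usage are hypothetical.

```python
from typing import Callable


def recover_plaintext(encrypt: Callable[[int], int], target_ct: int, message_space: int) -> int:
    """Binary search over [0, message_space), comparing chosen encryptions to the target."""
    lo, hi = 0, message_space - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if encrypt(mid) < target_ct:
            lo = mid + 1        # the hidden plaintext is larger than mid
        else:
            hi = mid            # the hidden plaintext is at most mid
    return lo


def toy_encrypt(m: int) -> int:
    """Hypothetical strictly increasing 'encryption' used only to exercise the attack."""
    return 1_000_003 * m + (m * m) % 97


secret = 12345
print(recover_plaintext(toy_encrypt, toy_encrypt(secret), 2 ** 20))
```

A 20-bit message space falls in at most 20 encryptions, regardless of how widely the ciphertexts are spread, which is why the symmetric setting is the only one where order preservation can offer any security.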
>>: It's not totally (inaudible), but it's broken in (inaudible).
>> Alexandra Boldyreva: But it worked -- for what scheme?
>>: One-part code books have notoriously been weak since --
>> Alexandra Boldyreva: What are one-part code books?
>>: The plain words in the code book are in the same order as the code words. So if you find a
code word, you know it must come after Ali Baba but before (inaudible), because it goes in
alphabetical order.
>> Alexandra Boldyreva: Well... one thing is, if you think about the ciphertext space, it
definitely has to be way larger than the message space.
That's not the case in your scheme. If it's way larger, these simple things do not necessarily hold.
>>: It's order preserving. If I know two messages, I know which messages are between them and
which are not.
>> Alexandra Boldyreva: Right. But look at the ciphertexts; they can be much farther apart. So I
think it may make sense.
>>: May make sense.
>> Alexandra Boldyreva: So that's what we're looking at now. I don't have all the results, but
we're trying to define security such that you have some security, and then to find schemes.
Maybe we can discuss. I'll tell you our scheme and you try to crack it. That's our first task. It
could be interesting.
For the prefix-preserving scheme, we also had something that may be more efficient than what
they had, but it seems less interesting to me because, when used for range queries, it can actually
reveal some information: for some ranges, just by looking at the ciphertext you can learn something,
for example, with bit-by-bit encryption.
I'm not sure how useful it is. But the order-preserving encryption scheme seems interesting to me.
It seems to make some sense, so we can discuss it. Actually, there are some ideas here, and it's
interesting. It's the first time I've done something where the solution has so little crypto in it; it's
just probability theory. Some distributions, and just the question of how to efficiently sample a
random order-preserving function.
We'll see. And that's it, so on time. So I presented some security definitions for deterministic and
efficiently searchable encryption, showed some relations between them, and provided constructions
in the random oracle model, in the strongest sense, and some less efficient constructions in
the standard model for slightly weaker definitions.
I discussed the more general primitive, efficiently searchable encryption, and one construction
there, and discussed the symmetric setting and more flexible queries. Thank you.
(Applause)
>>: One question.
>> Alexandra Boldyreva: More questions?
>>: I have a question. In efficiently searchable encryption, don't you need (inaudible) F and G
to both be non-invertible without the private key?
>> Alexandra Boldyreva: It is probably implied by the security notion. This is just the minimal
syntax you need to have the functionality. Then you look at the security definition. So most
probably, yes: if that's not satisfied, the security definition won't be satisfied either. But the syntax
has nothing to do with security at that point. It's just what we need in order to do search, and
later you check whether it's secure or not.
>>: This may be an unfair question. But naively, it seems this searchable deterministic encryption
is something we all break in our first two years, when we learn our first language. What is the
difference? Why is this encryption stronger than what a two-year-old baby can break?
>> Alexandra Boldyreva: So what is it that they can break?
>>: The known language. We all manage, just by listening around, to learn one language in about
two years. Those who don't are in deep trouble.
>> Alexandra Boldyreva: Right.
>>: Why is deterministic encryption any stronger than a code that two-year-old babies break?
Not all two-year-olds can get job offers with the NSA.
>> Alexandra Boldyreva: The message space is larger. It's not possible to remember it all. And
you cannot go back; you just don't remember all the mappings.
>>: You can break it if you can remember all the messages; babies maybe can't --
>> Alexandra Boldyreva: Do they learn more than they hear? They actually --
>>: They learn what they hear.
>> Alexandra Boldyreva: But here you --
>>: English or German or Hebrew or French, somehow.
>>: A smaller vocabulary than in this case.
>> Alexandra Boldyreva: Right. Here you hear way less than what you're actually asked to
crack.
>>: Maybe you should say the full message should be longer than a telephone number. I doubt
most services are on telephone numbers, galaxy numbers.
>> Alexandra Boldyreva: What you're saying is that in practice you will not -- so, our requirement
is that the message space is very large, with a lot of entropy. In reality, yes, probably even my
name you can -- Alexandra is not that bad. But it's a good point.
>>: (Inaudible).
>> Alexandra Boldyreva: Exactly. It's a good point. What it says is: look, they just want fast
search, and they ask for the best security possible with it. That's the best security, and it's not very
good. The point is, if you don't have a large message space, and probably you don't, well, then we
cannot help you. Then you have to do more work on searching if you want high security.
Absolutely.
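The low-entropy point can be sketched as a dictionary attack: with public-key deterministic encryption, anyone can encrypt every plausible message and compare. Here SHA-256 over a fixed label stands in for a deterministic public encryption function (it is not an encryption scheme), and the candidate names are made up; `dictionary_attack` and `stand_in` are hypothetical names. The cost is one encryption per guess, so security rests entirely on the message space being too large to enumerate.

```python
import hashlib
from typing import Callable, Iterable, Optional


def dictionary_attack(det_encrypt: Callable[[bytes], bytes], ciphertext: bytes,
                      candidates: Iterable[bytes]) -> Optional[bytes]:
    """Encrypt every plausible message and compare: one encryption per guess."""
    for guess in candidates:
        if det_encrypt(guess) == ciphertext:
            return guess
    return None


def stand_in(m: bytes) -> bytes:
    """Hypothetical deterministic function anyone can compute from public data."""
    return hashlib.sha256(b"public-key:" + m).digest()


first_names = [b"Alice", b"Bob", b"Carol", b"Alexandra"]
print(dictionary_attack(stand_in, stand_in(b"Alexandra"), first_names))
```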
>>: Does it have to do with the fact that, in your definition of security, the first adversary never
sees the public key? Is the baby trying to learn without some critical piece of information that he
never gets to see?
>> Alexandra Boldyreva: I somehow don't think this is the reason. It probably accounts for some of
the difference, but even without it, I'm not sure.
>>: One of the (inaudible).
>> Alexandra Boldyreva: The other one can, yes. Who is asked --
>>: Who is actually --
>> Alexandra Boldyreva: And they can encrypt everything with the public key, so it's not really a
bad thing. You see?
>>: And also no traffic analysis, just knowing when a query is sent? The day Bubba has a
heart attack, you know we'll have a pathway.
>> Alexandra Boldyreva: Good question. We actually didn't internalize that; we only looked at the
security of the primitive, not at how it is used, for example what the server learns from what it finds.
So all these things have to be treated additionally, but that's true. There are things like that,
yeah.
>> Kristin Lauter: More questions? So let's thank her again.
(Applause)