
>> Kristin Lauter: Okay. So today we're pleased to have Professor Alexandra Boldyreva visiting us from Georgia Institute of Technology, and she will speak about deterministic and efficiently searchable encryption. Thank you. >> Alexandra Boldyreva: Thank you very much, Kristin. Can you hear me well? Is the mic on? Okay. Good. Thank you very much for inviting me here. I changed the title and the talk slightly from what was announced in the abstract. I hope it's okay. I included some new and probably more applied things. This talk is based on several works I did jointly with Mihir Bellare, Serge Fehr and my student Adam O'Neill. These are the works I will mostly cover. But if I have time, and I hope I will, I will also include some other things which were done together with students at Georgia Tech (inaudible) Nate Shenet (phonetic) and Una Nee (phonetic), who is actually a visiting postdoc. So the plan of my talk: I'll tell you about the focus and motivation for looking at these things, then we'll talk about the problem of defining security for deterministic encryption and why it's a problem, and then we'll see very efficient and secure constructions. Some will be in the random oracle model, and then we'll see other constructions which are slightly less efficient, well, some of them more or less efficient, but they're secure without relying on the random oracle model. And we'll see some general constructions and extended general constructions, and we'll discuss their specific instantiations. And, finally, we'll talk about a more general primitive, efficiently searchable encryption, which doesn't have to be deterministic, and we'll discuss the problem of deterministic encryption in the symmetric setting, because most of the talk will focus on the asymmetric setting. And I also want to talk about more flexible database queries and encryption schemes designed specifically for this problem. Okay. So let's go with this.
We'll start with the main topic. So, classical security definitions for encryption. If you're familiar, it's good. If not, it's also okay. But if you know the standard notions, indistinguishability against chosen-plaintext attack and chosen-ciphertext attack, IND-CPA and IND-CCA, they require the encryption scheme to be randomized. And so why is that? Because these are really strong properties. They say encryption shouldn't leak any partial information about the message. It should be very, very secure. But with deterministic encryption, and by deterministic I mean the encryption algorithm doesn't use any randomness, if you encrypt the same message twice you'll get the same ciphertext. Such encryption does leak information inherently, just that information: you can see when the same message is encrypted twice. And because of this, in my lectures I teach my students that deterministic encryption is not good. Remember that. It just cannot be good. It has to be randomized. And, in fact, it's not a problem, because there are tons of efficient probabilistic, randomized encryption schemes out there. But in my talk I'm going to look at deterministic encryption, which I usually say is no good. And so why is that? Why am I doing this? There is some reason for doing this, because database researchers at Georgia Tech brought a practical problem to me. I didn't know about it before. And the problem is fast search on encrypted, remotely stored data. We'll discuss it in more detail. And it turns out that deterministic encryption can be very useful there. So there is some reason to look at this. So we'll talk about this more. Other than that, there are some other reasons why it may be interesting. With randomized encryption, the ciphertext has to include randomness, so it's longer than the message, but deterministic encryption in principle can be length-preserving, which may be useful, again, in some particular applications.
Later, after we started this, we discovered a relation to another notion, which is called convergent encryption, used for a totally different problem, for secure storage and checking consistency of the storage. And it's also interesting from a theoretical point of view and from a historical standpoint, because when public-key encryption appeared it was first designed in the form of deterministic encryption. Okay. So let's look at the main application I mentioned. Outsourced databases. So what's the problem? So apparently nowadays it's becoming more and more popular for companies and organizations to outsource data storage and, most importantly, management, to external service providers. So it's cheap to store, but to manage it better, let the specialists do it. But these external service providers do not have to be fully trusted. So I may trust them to store and manage my data, but I don't want them to read everything. And security may be required by law like, for example, in the case of medical data. So what is the setting? Mostly in my talk I will look at the public-key setting, but at the end we'll talk about the symmetric setting. So what is the setting? We have the external service provider, the database server, and anyone who knows the public key of some distinguished receiver can submit data. For example, nurses and pharmacists can submit records about patients to the database. And then the doctor who has the secret key should be able to query the database, ask it for specific records, get them back, and decrypt and read the data. So this is the setting. And we want to do it securely. Okay. So I guess you can ask me questions any time, I don't mind. >>: So would it suffice just to keep the hashed data alongside the randomized encrypted data? >> Alexandra Boldyreva: Good point, yes. And we will -- yes, I will -- it's one of the solutions which was actually, of course, proposed, mentioned by the database scientists. Right.
So I will talk about it. It's a solution -- it will not be -- it doesn't have to be much better, but it's good and we'll discuss it at the end. Not at the end, but later, yes, it's there. >>: If really anyone can put data in, it's a nightmare of data corruption and pollution. >> Alexandra Boldyreva: So, of course, in reality we would assume the database server will also have some kind of authorization service on top. But it's an extra layer. By anyone, I mean anyone good who we want to allow. >>: That's easy. >> Alexandra Boldyreva: Not quite. We'll see. Right. So I'm not saying it's a terribly hard problem, no. But we'll see some small things you have to take care of. That's a very simple setting. And we want to do it securely. So in principle, when these database people told me the problem, I said: But there is a good solution, which was recently proposed. And it's better than the solution with the hash which you told me. Because it provides really strong security, we didn't discuss it, but very good security. And it's called public-key encryption with keyword search, PEKS, done by Dan Boneh and others not long ago. And it could be used, and it has really good, formally analyzed security. In this solution, the user, the doctor in my example, will send the server some trapdoor, and it's generated using the doctor's secret key, so only the doctor can do it. And the server will test this trapdoor against the encrypted data stored in the database and will find the records matching it. Really good security. Nothing is leaked. Well, almost -- really strong security there. And I told them, here's the solution. Really good. But they told me: Are you crazy? This solution requires the database server to go over the whole database. And theoretically, for me, it sounds good, but they said: No. We have terabytes and terabytes of data. On each query, no way we're going to go over the whole database.
That's insane. And no, we want to do it fast. Fast meaning we want to index the data and locate it very efficiently. >>: You said each query. Can't you batch up these queries, so every minute it would start running and you could -- >> Alexandra Boldyreva: What is it again? >>: You said each query. Single, single database, but do 100 queries. >> Alexandra Boldyreva: But doing it in parallel? >>: Yes. In parallel. >> Alexandra Boldyreva: Even if it's just one query, it's a long time to go. They don't want to do it. Yeah. I guess it's possible. But they say it's not practical enough. But I said, if you want to do it faster, then it's not possible to have this really good security. And they said it's okay. Fast is important. And then what's the best we can do with this constraint? And actually the database researchers had already tried to come up with some solutions themselves. And what you mentioned was one of the solutions. And another was -- and so maybe it's good to mention, what's the problem? What's the problem if we just encrypt all data with randomized encryption and send it? The problem is, if the sender uses random coins to create the ciphertext of the message, and the client wants to locate this message and encrypts it, he's going to get a new ciphertext, because it's randomized. It's very good for security, but there's no way to point to the stored record. So this is just why randomized encryption by itself is not good here. And so they say, okay, let's just use deterministic encryption. The sender will encrypt the message, and for the query the user encrypts the message, so the ciphertexts are the same and can be used as a query, and the server can index using this deterministic encryption. And so they say we already have these good solutions. But no formal analysis was given, which is understandable, and we just wanted to study this problem more, to see whether these solutions are good and what security they provide. So we want to look at this formally.
So we started to look at this, and I already told you. So the deterministic solution indeed works. Works meaning, I don't know how secure it is, but the sender would encrypt something. So here's the example: it sends a review for some paper, under the public key of the author. The title and the review are encrypted, the server stores them, and then the author wants to retrieve the review, so he encrypts the title and sends it. So the ciphertexts coincide. And we assume the server indexed that before. So it uses some data structure to index it. When the query comes, it can very fast, in logarithmic time, locate this record and return the other part, and the author can decrypt. So it seems to work. But, formally, we want to formalize: how secure is it? And we just want to define some security. So, formally, just to understand what it is. And so I already mentioned that deterministic encryption cannot satisfy the standard notions of security, IND-CPA and IND-CCA, so we need something else. And so, in fact, you may notice that since we're in the public-key setting, if the message space is really small, you cannot have good security at all, just intuitively, because everyone can encrypt. So you can exhaustively encrypt all messages in the small space and see which ciphertext matches, and so you would know. For this you have to assume your message space is large. Whenever we look at the public-key setting and deterministic encryption, you have to assume you're dealing with a large message space. But, still, what is the definition of security? Cryptographers have known for a long time one definition which is suitable for deterministic encryption. It's just one-wayness, basically saying that looking at the ciphertext of a randomly chosen message you cannot recover the message. So you cannot go back if you do not know the secret key. But for these applications it seems too weak. So maybe I can't recover the whole message, but maybe I can recover half the message.
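The two observations above, that equal ciphertexts let the server index without decrypting and that a small message space can be searched exhaustively by anyone holding the public key, can be sketched in a few lines. This is only an illustration under stated assumptions: textbook RSA stands in for "some deterministic public-key encryption", and the modulus, exponent, function names and record strings are all made up here. None of it is a scheme from the talk, and none of it is secure.

```python
from typing import Iterable, Optional

# Toy public key (assumption for illustration): the modulus is a product of
# two Mersenne primes, so it is trivially factorable and must never be used.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537


def det_encrypt(m: int) -> int:
    """Deterministic public-key encryption: same message, same ciphertext."""
    return pow(m, E, N)


def guess_message(ciphertext: int, candidates: Iterable[int]) -> Optional[int]:
    """Encrypt every candidate under the public key and look for a match."""
    for m in candidates:
        if det_encrypt(m) == ciphertext:
            return m
    return None


# Server side: an ordinary index keyed by ciphertext. A dict is used here;
# a sorted structure would give the logarithmic-time lookup mentioned above.
index = {det_encrypt(m): f"record for {m}" for m in (1234, 4271, 9000)}

# Receiver side: re-encrypt the keyword to form the query.
query = det_encrypt(4271)
record = index[query]  # exact-match lookup, no decryption by the server

# Anyone with the public key: exhaustive search over a small message space.
recovered = guess_message(query, range(10_000))
```

The lookup and the attack use the same fact, namely that encryption is a public deterministic function, which is why the talk insists on a large, hard-to-guess message space.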
It just doesn't seem strong enough. It seems like it's possible to do better. So it's good to have a better security definition. A stronger one. And we tried to come up with this. So the first try: let's just ask that, given the ciphertext of a message drawn from a large space, by looking at it and knowing the public key, of course, no efficient adversary can compute any function of the message. Okay. So that seems like a good idea. Of course, you want to say it cannot compute it with good probability. But some functions, for example the first bit of the message, you could guess with probability one half. So we will say no adversary should compute this better than some adversary who is not given the ciphertext. Even without the ciphertext you can guess the first bit of the message. So it's not a good attack. But if you can do something nontrivial from the ciphertext, this is what -- yeah? >>: It seems impossible if f(m) is message and (inaudible) compute it. >> Alexandra Boldyreva: Good. Meaning the ciphertext itself, right? Exactly. So this is what I was going to say. Yeah. Good. So I'll get to the point briefly, and it's good. So that's why I said the first try. So we'll have to fix it. So we're just trying to formalize that. Fix an encryption scheme. So even though we're thinking of deterministic encryption, in the definition it doesn't really matter. It's just some definition of security, and it will also cover the solution you proposed with the hash. So the definition is still what we were going to say. So we consider the adversary, which we just divide into two parts. Two cooperating guys, but they don't share state, they can't exchange information, even though they can know each other's algorithm. And we think of the experiment. So the first guy outputs a message drawn from a large space, it should have some entropy in it, and some string, the target, representing the partial information they have to guess about the message.
And later, the other guy will be given the ciphertext of this message and the public key and has to guess this target information. So this is what is represented. So, one experiment. The first guy outputs the message and some target, some string, which can be the first bit, anything, half of the bits. Then we encrypt, give the ciphertext to the other guy, and he has to guess this target the first guy created. If he guesses it, we say he wins. This is one experiment. In the other experiment, the adversary is the same, but it is given the ciphertext of some bogus message, and it has to guess the information about a totally unrelated message. So this is just to exclude trivial attacks. So this guy is effectively not given the ciphertext. So it seems reasonable, but you caught me. So, exactly, under this definition, which seems natural, no scheme is secure, regardless of whether the message space is large or not. Just take this target: if it's the ciphertext, the deterministic ciphertext, that is some information which is leaked, and this definition was kind of aiming high. It says no information should be leaked. But it is leaked by functionality. It just has to be leaked. Therefore, it doesn't work. Well, it's just too strong a definition for our primitive. But it seemed good. So we just tried to fix it, yeah. >>: Adversary, is that the pair of players in bridge, playing and one player communicate this model? >> Alexandra Boldyreva: Yeah. (Chuckling) yeah. Before it lays down, right when they -- >>: Afterwards. >> Alexandra Boldyreva: But then they share some information, one reveals one, if it's -- >>: Yeah. >> Alexandra Boldyreva: But the other player. Right. Exactly. Sorry for the detour. But the definition looked good. There was just one problem and we tried to fix it. How to fix it? Well, kind of trivially. We just say we are not going to give the public key to this first guy. So it cannot output the ciphertext. Well, it's not ideal, but it's the only way we can see how to fix it.
What it means now is that the definition only protects data which does not depend on the public key. In practice, it's probably okay. Public keys are usually hidden in some software? No? >>: People think that way. >> Alexandra Boldyreva: Well, our definition is going to hide information about messages not depending on the public key, and that's the only thing we could do. I still think it's fine for most applications, but it's good to realize we cannot do better. Now the previous attack does not apply and the definition seems achievable. So that's one. And in fact this is one of the definitions. We call it PRIV1, for privacy. One means just one message. In the standard notions, again, whether you know them or not, it doesn't really matter, it looks something like this: the adversary outputs messages and gets ciphertexts and has to guess something. It doesn't really matter whether the adversary asks for just one message or many messages. Polynomially many, it's still the same. And the question is the same for our definition, too. By the way, under this definition we say the scheme is PRIV1-CPA, chosen-plaintext attack, secure if no efficient adversary can do much better in the first experiment than in the other. But we can consider the exact same experiment where it's not one message but a vector of messages. And the target information represents some information about the whole vector. It's reasonable. In the standard setting, for randomized encryption, it doesn't matter much. For simplicity you can consider just one message. Here, we'll see. But first we just do that. It's just a vector now, it's several messages. Okay. And are there any restrictions on the messages? We're still going to work with a large message space. This is not going to go away. So each message, which means each message is hard to guess. (Inaudible) entropy, otherwise we cannot achieve the definition. But still there are several possibilities.
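The two experiments described above can be written down compactly. The following is a sketch of the single-message (PRIV1) version in notation assumed here for illustration: A_m is the message-choosing part of the adversary, which does not see the public key, A_g is the guessing part, and E is the encryption algorithm.

```latex
% Notation assumed for illustration: A_m outputs a message x and a target
% string t without seeing pk; A_g sees pk and a ciphertext and outputs a guess g.
\begin{align*}
\textbf{Experiment 1:}\quad & (x_1, t_1) \leftarrow A_m;\;\; c \leftarrow E(pk, x_1);\;\;
   g \leftarrow A_g(pk, c);\;\; \text{win if } g = t_1 \\
\textbf{Experiment 0:}\quad & (x_0, t_0) \leftarrow A_m;\;\; (x_1, t_1) \leftarrow A_m;\;\;
   c \leftarrow E(pk, x_0);\;\; g \leftarrow A_g(pk, c);\;\; \text{win if } g = t_1 \\
\mathrm{Adv}(A) \;=\;& \Pr[\text{win in Experiment 1}] - \Pr[\text{win in Experiment 0}]
\end{align*}
```

In Experiment 0 the guesser sees the encryption of an independently drawn, unrelated message, so its success probability is what could be achieved without the real ciphertext. The scheme is called secure when the advantage is negligible for every efficient adversary whose messages are hard to guess; the vector version replaces x and t by a vector of messages and a target about the whole vector.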
And one possibility, we say that each message is hard to guess. But other than that, they can be arbitrarily dependent. You can have a message X, which is random, and the second message can be X plus one. Another message can be X plus 3. It's allowed. Still each message is hard to guess. And this is the definition which we call PRIV-CPA. We'll see whether they are equivalent or which one is stronger. But you can also consider a weaker definition, with a stronger restriction on the messages, where we say each message is hard to guess given the others. So my previous example, where we had the random message and the random message plus one, does not satisfy this restriction, because once you are given X, then X plus 1 is not hard to guess. Right? And these are called block sources. And as an example, if we consider phone numbers, you can know that some numbers share the same prefix, because they're from the same area. So they're dependent, but even knowing that, given one it's still hard to guess the others. So this would still satisfy this weaker definition. So we have these two variations, and the question is, what are the relations between these definitions? And we studied this, so we know which is the strongest and which are equivalent. And so our results are that the definition for one message is strictly weaker than the strongest definition for a vector of messages. And this is different from the case of standard encryption. It's good to know. The other result is actually about the middle one. PRIV-CPA for block sources turns out to be equivalent to the definition with just one message. So for simplicity we can just work with this one, and for block sources this is going to be our strongest one. It's interesting, the second result is way harder to show than the first one, and it uses some ideas from works of Dodis and Smith on entropic security and some other recent works, some French names I forgot, which also looked at entropic security.
It's all in the information-theoretic, not computational, and symmetric setting, and the last two works are actually for the quantum setting, but the techniques are very useful. But what's important is just how the definitions relate. Now we have the definitions, almost. As usual, if you know, it makes sense to always consider stronger definitions, in terms of not only chosen-plaintext attacks, where the adversary can choose messages, but also chosen-ciphertext attacks, where the adversary is allowed to see decryptions of ciphertexts of its choice, in addition to chosen plaintexts. And we can just make our definition take these chosen ciphertexts into account in the standard way, by giving the adversary a decryption oracle. We do this. It's pretty standard. And then we have all these definitions with CCA, chosen-ciphertext attack. This is just the stronger version. It's always good to target the stronger chosen-ciphertext attacks, and the same relations hold. So now we have definitions. It's time to look at the constructions. And the first constructions are going to be in the random oracle model. So what is this? It's not real. It's an idealized model which assumes the hash functions are actually ideal objects. They're just purely random functions which all parties just have oracle access to. In this model, it's not very hard to come up with secure constructions. And one of the constructions, we'll call it encrypt-with-hash, is pretty intuitive. We have any encryption scheme. It has randomness in it. We want a deterministic scheme. So we take the randomness and replace it with some deterministic function of the message. Namely, the hash of the message. Hash the message, and then use any randomized encryption scheme and substitute the random coins with the hash of the message, and you can decrypt, too.
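As a concrete illustration of the encrypt-with-hash idea just described, here is a minimal sketch. The choices are assumptions made for this example only: toy ElGamal over a small prime field plays the role of "any randomized encryption scheme", SHA-256 over the public key and message plays the role of the hash (hashing the public key along with the message is a design choice of this sketch), and all function names and parameters are hypothetical. It is not secure and is not presented as the construction analyzed in the talk.

```python
import hashlib

# Toy group parameters (assumption for illustration; far too small and
# unstructured for real use).
P = 2**127 - 1  # a Mersenne prime
GEN = 3


def keygen(secret: int):
    """Return (public key, secret key) for toy ElGamal."""
    return pow(GEN, secret, P), secret


def elgamal_encrypt(pk: int, m: int, coins: int):
    """Randomized ElGamal with the random coins passed in explicitly."""
    return pow(GEN, coins, P), (m * pow(pk, coins, P)) % P


def elgamal_decrypt(sk: int, c):
    c1, c2 = c
    # For prime P, c1^(P-1-sk) is the inverse of c1^sk (Fermat's little theorem).
    return (c2 * pow(c1, P - 1 - sk, P)) % P


def hash_to_coins(pk: int, m: int) -> int:
    """Derive the 'random' coins deterministically from the key and message."""
    data = pk.to_bytes(16, "big") + m.to_bytes(16, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1)


def ewh_encrypt(pk: int, m: int):
    """Encrypt-with-hash: the coins are replaced by a hash of the message."""
    return elgamal_encrypt(pk, m, hash_to_coins(pk, m))


def ewh_decrypt(pk: int, sk: int, c):
    """Decrypt, then re-encrypt and reject ciphertexts that do not match."""
    m = elgamal_decrypt(sk, c)
    return m if ewh_encrypt(pk, m) == c else None


pk, sk = keygen(123456789)
c = ewh_encrypt(pk, 424242)
```

Encrypting 424242 twice under the same key gives the same pair `c`, which is what makes the ciphertext usable as an index key, and decryption still goes through the underlying randomized scheme.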
So it may be a bit better than your solution because the ciphertext will be shorter, for example, because there the hash is within the ciphertext, but it's pretty much the same idea. So it's a deterministic encryption scheme. In the random oracle model, if we assume the hash is ideal, it satisfies the stronger notion of security. If the base scheme is IND-CPA, you get PRIV-CPA. And if you take an IND-CCA scheme, and also if it satisfies some very, very minor constraint, which is satisfied by all practical schemes, so it's not a big deal, you get a PRIV-CCA scheme, a deterministic scheme. So, one simple construction. >>: What's the scheme that you're contrasting it with? >> Alexandra Boldyreva: I'll repeat. So his idea was a very easy solution. Take your randomized scheme. It's going to be randomized. Encrypt the message. Plus hash the message and append the hash. And it's not deterministic encryption, but for the application of search you will just use the hash as the target location. >>: So then you can't do searchable -- >> Alexandra Boldyreva: You do. You do. >>: Do each word? >> Alexandra Boldyreva: No, no. Right. So you would -- >>: You would keep the hash in a separate field and then can search -- >>: So word-by-word indexing, words and hashes of words. >> Alexandra Boldyreva: Even in the one case that I was looking at, the encryption of whatever it is, a field or a word, one cell. I'm looking at encryption and indexing by that. So if a ciphertext just contains the hash in it, you just look at that. I don't know whether you put it separately or not. Instead of the whole ciphertext being the tag on which you index, you just use the hash part. And to decrypt you use the ciphertext, which is randomized. Does that make sense? >>: Really inefficient if you have to do word -- >>: It would only be for exact match. >> Alexandra Boldyreva: But it's all for exact match so far. So I don't think -- this is pretty much the same to me efficiency-wise. It's not that it's better.
I'm just looking at, but still, if I want to locate the word, I hash it. If the server indexed by it, it's the same. Okay? >>: What's the constraint on the scheme? >> Alexandra Boldyreva: We say that no ciphertext occurs with too high a probability when encrypting any given message. And this is for randomized encryption. And from looking at it, it seems like it follows from just the CPA requirement. It just seems, of course. Actually, it does not. You can have contrived schemes which do not have it. But all practical schemes have this. And it's really easy to check for any scheme. So it's not a constraint at all. Another scheme which we designed is slightly more complicated, but not much. It resembles, if you're familiar with it, the standardized RSA-OAEP encryption scheme, but it uses three rounds instead of two rounds. So these are the hashes. And this is the RSA function applied to the output of this transform, and the other difference is that the RSA function does not have to be applied to the whole message, the whole transform, just a part. So variable message length is okay. And the result is that what we get is a length-preserving scheme. The first one was not length-preserving, because we used the hash in place of the randomness. There was some extra. Here it can be length-preserving. Nothing on top. And we show that it also satisfies the strongest notion, PRIV-CCA, in the random oracle model, assuming one-wayness of the RSA function. Okay. So now let's look at constructions without assuming these idealistic random oracles. Both of the schemes so far were designed in the random oracle model. It's very common, whenever you have a new primitive, to first realize it assuming random oracles. It's convenient. It's not that bad. But everyone realizes it's an idealistic model. It's a good heuristic, but strictly speaking, proofs of security in this model do not guarantee security in the strict sense. So it's always good to have something without random oracles.
Even though, commonly, without random oracles schemes tend to be less efficient. But still it's good to know you can do it. And there are several works, actually more than what's listed here, which raised concerns about the soundness of the random oracle model. You can have schemes which are secure in it, but in practice, no way. But they're very contrived. So we still hope that for good schemes it's okay. So it's interesting to see whether you can do deterministic encryption in the standard model, the random-oracle-free model. And so we managed to get some positive results. But only for this weaker definition, security for block sources. Higher restrictions on the message space. Still probably good for some applications like Social Security numbers, phone numbers. It's still an open question, and it seems to be very hard to achieve the strongest definition without random oracles. We'll see. It will be interesting to see if it's possible. So our constructions use as building blocks some recently introduced primitives called lossy trapdoor functions and all-but-one trapdoor functions, introduced by Peikert and Waters. Very recently. Their goal was different. They were looking at constructing randomized IND-CCA encryption schemes, but the primitives seem to be very useful and we will use them. So let's briefly review them. So what does it mean, a lossy trapdoor function? It's a trapdoor function, one-way, and there is some trapdoor with which you can invert. But it operates in two modes. Basically, the key generation can generate two types of keys. So one is a normal mode, invertible. Hard to invert without knowing the secret key. But if you know the secret key you can invert. But the other mode is lossy, meaning it just loses some information about the input and you cannot invert. And the other requirement is that the modes are indistinguishable: if you're given one of these two kinds of keys, which yield the two different modes, you don't know which is the case.
And there are several constructions under different assumptions which were given in that paper. And what we observe is that if you are just concerned about security against chosen-plaintext attacks, it's easy. You take a lossy trapdoor function and you look at its lossy mode. By the way, the lossy mode, the non-invertible one, is only used in the proofs. In reality you always use the normal one. But this lossy mode is very useful in the proofs. So, if you look at this lossy mode and you see that it acts as a universal hash function, there is this property, it's not quite important what it is, but it's a property, a noncryptographic property. If the lossy mode has this property, then immediately this lossy trapdoor function gives you deterministic encryption secure in the PRIV sense for block sources. Because it's used in the normal mode, you can decrypt. The function is deterministic, and you can show it's indeed secure for block sources, or for the equivalent definition with just one message. And the proof is rather easy. It just uses the leftover hash lemma, and so it's pretty straightforward. And I guess I'll later have a slide showing that, of the constructions of lossy trapdoor functions which were provided by Peikert and Waters, one of them satisfies this restriction. So it has this universal lossy mode, so it can be used. We'll discuss it. But what about the strongest security, against chosen-ciphertext attacks? So it's slightly more involved, and we use the other primitive used by these people to construct randomized encryption schemes. So an all-but-one trapdoor function is a generalization of a lossy trapdoor function. It has an additional input from a set of branches, or modes, you can also call them. One branch specifies a lossy function but all the others specify normal invertible functions, and again you cannot tell which is which: given the key, you cannot tell which branch is the lossy one. So it's just slightly more general.
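For reference, the leftover hash lemma that the chosen-plaintext proof above relies on can be stated as follows. This is the standard textbook form, written in notation assumed here rather than taken from the talk.

```latex
% H: a universal family of functions from D to R, with h drawn uniformly from H
% X: a random variable on D with min-entropy at least k, independent of h
% U: uniform on R;  \Delta denotes statistical distance
\[
\Delta\bigl((h, h(X)),\,(h, U)\bigr) \;\le\; \tfrac{1}{2}\sqrt{|R| \cdot 2^{-k}}
\]
```

Roughly, when the lossy mode has a small range and acts as a universal hash, its output on a high-entropy message is close to uniform and so says almost nothing about the message. Since lossy keys cannot be told apart from normal keys, the same should then hold computationally for the real scheme.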
And, again, several constructions are there. And so this is our construction of a deterministic encryption scheme. It's different from the construction of Peikert and Waters, which used a one-time signature scheme, which seemed to be hard to derandomize. This is how a ciphertext in our scheme would look. So here we have some hash function. This is a lossy trapdoor function. And this is an all-but-one function. And this hash specifies a branch. And you can decrypt. It's deterministic. And we show that it is secure -- actually, it's either PRIV1 or PRIV for block sources. It's a typo. For this intermediate notion. So this is a lossy trapdoor function with a universal lossy mode. This is an all-but-one function, again with a universal lossy branch, and this is a hash function which is both universal and target-collision resistant. That's a weaker notion than standard collision resistance. Under these assumptions we show that this is PRIV-CCA for block sources. Okay. So this is the general construction. What about instantiations? As I said, the DDH, Decisional Diffie-Hellman, based construction provided by Peikert and Waters has a universal lossy mode, so we can just plug it in. But for the CCA construction we need a hash which is both universal and target-collision resistant. To the best of my knowledge, it's the first time these two properties are needed together. Usually it's one or the other. If we want to work under the DDH assumption, well, we've found some constructions, actually existing hash functions, and we show that they satisfy both these properties. And we show this for two popular groups where DDH is hard. So under the DDH assumption we have everything. But the constructions of lossy trapdoor functions and all-but-one trapdoor functions based on DDH which were provided in the other paper are inefficient. They're actually bit by bit, in matrix form. Not very efficient. So we wanted to do something more efficient.
But the other constructions, for example, they had constructions of these lossy functions based on Paillier's encryption and another assumption, didn't seem to have universal lossy modes, and this is an additional thing we needed. So it didn't seem to work. And for that we designed a tweak to our general construction, which allows us to avoid this additional restriction. So we just modify our construction by using pairwise-independent but invertible hash functions. There are some constructions of these which you can use, and if you process the message first with these hash functions, then we show that the assumptions can be weakened altogether. You don't need universal lossy modes of these two primitives anymore. And also you don't need universality of the hash. So it seems better with this tweak, and then we show that you can use the Paillier-based construction from Peikert and Waters, but we actually improve efficiency over it. We just make it even more efficient. But, funny, the exact same efficient Paillier-based constructions were independently discovered by Rosen and Segev very recently. And so we can have something more efficient. It's not that close in efficiency to the random oracle constructions, but it's actually not that bad. And we have security in the standard model without assuming random oracles. But it's for block sources. Okay. This is about all the main results we have on public-key deterministic encryption. So now, as you mentioned, as you noticed, I was tricking you a bit by saying deterministic encryption is what you need for this search, but it doesn't have to be deterministic. It doesn't have to be, to get the same functionality for this particular application. And so this is why we generalize what we need. We call this primitive efficiently searchable encryption, and deterministic encryption is just one particular case of it that satisfies it. Yeah?
>>: But when you say encryption, how do you guarantee that you have a length preserving mechanism? Because I see that it's a nice feature of actually storing the data. Using the data. But it's actually for if you're assuming just serial storage of it. If you didn't have length preserving, you talked about rights of data, (inaudible) then it becomes an issue of observation. >> Alexandra Boldyreva: Just that construction was deterministic. It was length preserving. >>: So for the case of the hash -- >> Alexandra Boldyreva: Hash and RSA. No, the length preserving one, when the hashing was done kind of -- so maybe I can go back quickly and we'll see. Okay. This is the scheme. So this is the message. Real size. And this is the ciphertext. Real size. They're about the same. That's my proof. So okay. So if the message is really short, you cannot use just short RSA, because you need to have at least 1024 bits. But then probably it's not a big deal if you have a few extra bits. Right. So it cannot get shorter than the RSA key, but if it's longer, you just do it like this. This is a length preserving transformation, hash, on the part of the message, and they're inside. It doesn't increase the size of the ciphertext. And you apply the RSA. It's a permutation. You can stay the same size as long as it's not shorter than the size of the RSA key. It's just by construction. And all the other constructions are not length preserving. This is the only one we had. Efficiently searchable encryption is just the generalization. We say you don't need deterministic encryption. We want the same functionality. So for that we require the encryption scheme to have two additional functions defined, F and G, which are used for queries and indexing. So if someone sends a ciphertext, the server will use one function, G, on the ciphertext in order to index it. To know where to locate it. 
Then whenever the query comes in, the user would use the other function, F, on the message to compute the query. And the server will take it and go to the exact same index. So we ask the results of these functions to be the same. What the server can use and what the user can -- so basically what the sender can send and the server can use, and what the receiver can use to query. In the case of deterministic encryption, both of these functions are just the ciphertext, basically. This whole part. In your solution, which we are coming to, it's going to be just the hash. You take the ciphertext and take the hash part of it. This is your function. And you just hash the message. This is the other function, and they coincide. It doesn't have to be the whole ciphertext which is the same. It can be a part, which can be used. So deterministic encryption is just a special case. The same security definition applies, because in the definition we didn't say it has to be deterministic. The same security definitions are there. And your construction, which you saw immediately, we called encrypt and hash: you just encrypt with a randomized scheme and append the hash, and you use it for searching. And it was proposed in the database literature, but we just confirmed that it is secure in the random oracle model, if the hash has this ideal property. It also brings me to one question about this scheme. Maybe I overemphasized a little bit the importance of deterministic encryption, because for the main application we see that it's the hash that's needed. In the random oracle model, no problem. In the standard model, it makes sense to design not the whole deterministic encryption scheme, but just the hash, which has -- so what properties? So that the whole scheme satisfies our definitions. So we see that -- we didn't state it anywhere, but it seems fine that for block sources just a universal hash function and the leftover hash lemma will work. 
Still a big open question, not much -- well, it seems easier than encryption, but it still seems to be very hard to design a hash secure in the strongest sense. Still an open question, not for block sources. I don't know the solution. I know several people are trying. The results so far, we'll see. Good. So just one small point here. If you use the encrypt and hash construction, then you can have some efficiency-security trade-off, because if you don't like that one can see where the equal messages are in the database, you can try to decrease the output length of the hash. So have some collisions, actually, there. So several plaintexts will map to the same ciphertext, and the server wouldn't be able to tell. It will come with the price of having more false positives on each query. So the server will return more answers. Maybe preferable in some applications. And so you have just this trade-off if you limit the output of the hash. And I have a few minutes and I just want to tell you a little bit about the symmetric setting and more flexible queries. So deterministic encryption in the symmetric setting, we treated it, too. Things are a bit easier there. So we have the definition, and because things are easier we ask the definition to provide not only privacy but also authenticity, and finding the definition is also not that complicated. So we did this. Our construction, what we had, just if you're interested, asks to compute a MAC of the message. This is basically in place of the hash we used in the public key setting, and most MACs are PRFs, good pseudorandom functions. We use this tag, this MAC, as the IV for the CBC encryption mode. Surprisingly, it's the same construction which was used independently for a totally different application, which is called key wrap, by Rogaway and Shrimpton. Okay. So this is very briefly about the symmetric key setting. 
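The encrypt-and-hash idea and the tag-length trade-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scheme from the talk: the names `encrypt_and_hash`, `Server`, `TAG_BYTES`, and `toy_randomized_encrypt` are hypothetical, SHA-256 stands in for the random oracle, and the "randomized encryption" is a toy symmetric keystream used as a placeholder for whatever randomized public key scheme one would actually use.

```python
import hashlib
import hmac
import os
from collections import defaultdict

TAG_BYTES = 32  # assumption: shorten (e.g. to 1) to get more collisions and more false positives


def toy_randomized_encrypt(key: bytes, message: bytes) -> bytes:
    """Toy placeholder for a randomized encryption scheme (not secure, not public key)."""
    nonce = os.urandom(16)
    stream = b""
    counter = 0
    while len(stream) < len(message):
        stream += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    body = bytes(m ^ s for m, s in zip(message, stream))
    return nonce + body


def F(message: bytes, tag_bytes: int = TAG_BYTES) -> bytes:
    """Query function: the user hashes the message (SHA-256 models the random oracle)."""
    return hashlib.sha256(message).digest()[:tag_bytes]


def G(ciphertext: bytes, tag_bytes: int = TAG_BYTES) -> bytes:
    """Index function: the server reads the hash part off the ciphertext."""
    return ciphertext[:tag_bytes]


def encrypt_and_hash(key: bytes, message: bytes, tag_bytes: int = TAG_BYTES) -> bytes:
    """Ciphertext = hash tag followed by a randomized encryption of the message."""
    return F(message, tag_bytes) + toy_randomized_encrypt(key, message)


class Server:
    """Stores ciphertexts in buckets keyed by G(ciphertext); never sees key or plaintext."""

    def __init__(self, tag_bytes: int = TAG_BYTES):
        self.tag_bytes = tag_bytes
        self.index = defaultdict(list)

    def store(self, ciphertext: bytes) -> None:
        self.index[G(ciphertext, self.tag_bytes)].append(ciphertext)

    def search(self, query: bytes) -> list:
        # Returns every ciphertext whose tag equals the query; with short
        # tags this can include false positives the user must filter out.
        return list(self.index.get(query, []))
```

With the full 32-byte tag a query `F(m)` lands in exactly the bucket that holds encryptions of `m`. With a 1-byte tag many different messages share a bucket, so the server learns less about which records are equal and the user has to decrypt and discard the extra answers.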
So when we did it, we were kind of happy with ourselves, and, for example, the symmetric solution we presented at the database conference, we thought they would like it, and they said, okay, but it's only exact match. But we want to do all types of queries. We want to do minimum, maximum, range queries, average -- it's not enough -- and it's like, okay, one thing at a time. So what are they particularly interested in? And, again, they're trying to do it themselves; they want to do range queries. So you ask your database to return the records which contain ciphertexts that are encryptions of values from 100 to 300, something like that. And, again, they had several ideas. Two ideas, actually. One is, I'm not sure if it's their name, but the idea is (inaudible) encryption, and remember your plaintexts are numbers in binary form. So they encrypt everything bit by bit and they have the property that -- and it's kind of deterministic, right? So if two messages start with the same prefix, the ciphertexts will also start with the same prefix. That's why I call it prefix preserving. And they say you can express a range, a set of numbers, with just a minimum set of prefixes. Right? So then basically, how it's called, a star at the end for whatever comes after. So this prefix, and the rest can be anything. And this prefix. And you can express a range like this. They suggest using it to answer range queries. The other suggestion was to use order-preserving encryption. Meaning if two plaintexts are in order, one smaller than the other, the same should hold for the ciphertexts. So you can sort the ciphertexts and that's all. And then if it's deterministic, you just encrypt the ends of the range and the database will return everything in between. >>: Does such a scheme exist? >> Alexandra Boldyreva: They designed something. I looked at it. My student looked at it. It doesn't make any sense. So I cannot parse it. No. 
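The prefix idea mentioned above, expressing a range as a small set of binary prefixes with a star for "anything after", can be illustrated as follows. This is a sketch of the range-to-prefix decomposition only, using a standard greedy cover by aligned blocks; it is not the database researchers' encryption scheme, and the function names `range_to_prefixes` and `matches` are hypothetical.

```python
def range_to_prefixes(lo: int, hi: int, bits: int) -> list[str]:
    """Cover the integer range [lo, hi] with binary prefixes ('*' = any suffix).

    Greedy: repeatedly take the largest power-of-two block that starts at lo,
    is aligned to its own size, and does not run past hi.
    """
    if not (0 <= lo <= hi < 2 ** bits):
        raise ValueError("need 0 <= lo <= hi < 2**bits")
    prefixes = []
    while lo <= hi:
        # Largest block size allowed by the alignment of lo.
        size = (lo & -lo) if lo else 2 ** bits
        # Shrink until the block fits inside the remaining range.
        while size > hi - lo + 1:
            size //= 2
        free_bits = size.bit_length() - 1      # bits left unconstrained
        fixed_bits = bits - free_bits          # bits pinned by the prefix
        prefix = format(lo >> free_bits, f"0{fixed_bits}b") if fixed_bits else ""
        prefixes.append(prefix + ("*" if free_bits else ""))
        lo += size
    return prefixes


def matches(prefix: str, value: int, bits: int) -> bool:
    """True if value (as a bits-wide binary string) is covered by prefix."""
    s = format(value, f"0{bits}b")
    if prefix.endswith("*"):
        return s.startswith(prefix[:-1])
    return s == prefix


if __name__ == "__main__":
    # The range from the talk, 100 to 300, over 9-bit numbers.
    for p in range_to_prefixes(100, 300, 9):
        print(p)
```

With a prefix-preserving scheme, each of these prefixes would be encrypted and sent as a query, and the server would return the records whose ciphertexts begin with one of the encrypted prefixes.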
Even though I told them it doesn't make sense, they said it must make sense because it was published in our best database conference. And I said no; they say it must. It's very good. >>: Order preserving is identical to one-part code books, which I believe in the 19th century they knew how to crack. >> Alexandra Boldyreva: Actually, so maybe you can notice that it's easy to crack. So if you're in the public key setting, okay? If it's public key, if you can encrypt everything yourself and it's deterministic, order preserving, it doesn't make any sense. If you have some ciphertext, you take some message, encrypt it. If it's larger you go here. Encrypt it again and you will converge to the message. So I don't know how it can make sense. But they're talking about symmetric, and there, whatever you're saying, it seems to me that it can make some sense. It's not very strong security. But something can be there. Right? So at least it's not that everything is obviously insecure. You can have some security. >>: It's not totally (inaudible) but it's broken in (inaudible). >> Alexandra Boldyreva: But it worked, what scheme? >>: One-part code books have notoriously been weak since -- >> Alexandra Boldyreva: What are one-part code books? >>: The plain words in the code book are in the same order. So if you know -- so you find a code word, you know this must be before Ali Baba, or after Ali Baba but before (inaudible), because it goes in alphabetical order. >> Alexandra Boldyreva: Well...well, one thing is, if you think about the ciphertext space, it definitely has to be way larger than the message space. It's not the case in your scheme. So if it's way larger, these simple things do not necessarily hold. >>: Order preserving. If I know two messages, I know which message is between them and not between them. >> Alexandra Boldyreva: Right. But you look at the ciphertexts, and they can be way farther apart. So I think it may make sense. >>: May make sense. 
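The public key attack described above, where you encrypt guesses yourself and converge to the message, is just binary search. A minimal sketch follows, assuming a toy order-preserving function given by a random sorted table; `sample_order_preserving_table`, `make_public_encrypt`, and `recover_plaintext` are hypothetical names, and the table sampling is a naive stand-in rather than an efficient sampler.

```python
import random


def sample_order_preserving_table(domain_size: int, range_size: int, seed: int) -> list[int]:
    """Toy random order-preserving function: a sorted random subset of the range.

    table[m] is the ciphertext of plaintext m, so m1 < m2 implies table[m1] < table[m2].
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(range_size), domain_size))


def make_public_encrypt(table: list[int]):
    """Models the public key setting: anyone may call encrypt on any plaintext."""
    def encrypt(m: int) -> int:
        return table[m]
    return encrypt


def recover_plaintext(encrypt, target_ciphertext: int, domain_size: int) -> tuple[int, int]:
    """Binary search over plaintexts, using only the public encryption function.

    Returns (plaintext, number_of_encryption_queries).
    """
    lo, hi = 0, domain_size - 1
    queries = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        c = encrypt(mid)
        queries += 1
        if c == target_ciphertext:
            return mid, queries
        if c < target_ciphertext:
            lo = mid + 1   # the target plaintext is larger than mid
        else:
            hi = mid - 1   # the target plaintext is smaller than mid
    raise ValueError("ciphertext is not an encryption of any plaintext in the domain")


if __name__ == "__main__":
    table = sample_order_preserving_table(1024, 2 ** 20, seed=1)
    encrypt = make_public_encrypt(table)
    secret = 777
    print(recover_plaintext(encrypt, encrypt(secret), 1024))
```

The number of encryptions needed grows only logarithmically in the size of the message space, which is why the public key variant gives no security and the discussion moves to the symmetric setting, where the attacker cannot encrypt on their own.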
>> Alexandra Boldyreva: So that's what we're looking at now. So I don't have all the results, but we're trying to define security such that you have some security, and then trying to find the schemes. Maybe we can discuss. I'll tell you our scheme and you try to crack it. That's our first task. It can be interesting. For the prefix-preserving scheme, we also had something which may be more efficient than what they had, but it seems less interesting to me, because when used for range queries it actually can reveal some information: for some ranges, just looking at the ciphertext, you can know, for example, with bit-by-bit encryption. I'm not sure how useful it is. But the order-preserving encryption scheme seems interesting to me. It seems to make some sense. So we can discuss it. Actually, there are some ideas here, and it's interesting. It's the first time I do something where the solution has so little cryptography; it's just probability theory. Some distributions, and just a question of how to efficiently sample a random order-preserving function. We'll see. And that's it. So on time. So we stated some security definitions for deterministic and efficiently searchable encryption, showed some relations between them, and provided a construction in the random oracle model, in the strongest sense, and some less efficient constructions but in the standard model, for a slightly weaker definition. So we discussed this more general primitive, efficiently searchable encryption. Several constructions -- one construction there, and we discussed the symmetric setting and more flexible queries. Thank you. (Applause) >>: One question. >> Alexandra Boldyreva: More questions. >>: I have a question. In the efficiently searchable encryption don't you need (inaudible) F and G are both noninvertible with private key? >> Alexandra Boldyreva: It is probably implied by the security notion. But this is just the minimum syntax you need just to have functionality. But then you look at the security definition. So most probably, yes. 
If it's not satisfied, it will not be satisfied. But it had nothing to do with the security at the time. It's just what we need in order to do search, and then you look later at whether it is secure or not. >>: This may be an unfair question. But it seems, naively, that this searchable deterministic encryption is something we all broke in our first two years, when we learned our first language. What is the difference? Why is this encryption stronger than what a two-year-old baby can break? >> Alexandra Boldyreva: So what is it that they can break? >>: The known language. And we all manage by listening around. About two years to learn one language. Those who don't are in deep trouble. >> Alexandra Boldyreva: Right. >>: Why is it that deterministic encryption is any stronger than a code two-year-old babies break? Not all two-year-olds can get job offers with NSA. >> Alexandra Boldyreva: The message space is larger. It's not possible to remember it all. And you cannot go back. You just don't remember all the mappings. >>: You can break, can remember all the messages, babies maybe can't -- >> Alexandra Boldyreva: Do they learn more than they hear? They actually -- >>: They learn what they hear. >> Alexandra Boldyreva: But here you -- >>: English or German or Hebrew or French somehow. >>: Smaller vocabulary than this case. >> Alexandra Boldyreva: Right. Here you hear way less than what you're actually asked to crack. >>: Maybe you should say the full message should be longer than a telephone number. I doubt most services are on telephone numbers, galaxy numbers. >> Alexandra Boldyreva: What you're saying is that in practice you will not -- so our requirement is that the message space is very large, a lot of high entropy. In reality, yes, even probably my name you can -- Alexandra is not that bad. But it's a good point. >>: (Inaudible). >> Alexandra Boldyreva: Exactly. It's a good point. So what it says is that, look, if you want to have -- they just want to have fast search. 
And they say, whatever the best security is. What's the best security? It's not very good. The point is, if you don't have a large message space, and probably you don't have it, well, then, we cannot help you. Then you have to do more work on searching if you want high security. Absolutely. >>: Does it have to do with the fact that, in your definition of security, the first one never sees the public key? Is the baby trying to learn with some critical piece of information that he never gets to see? >> Alexandra Boldyreva: I somehow don't think this is the reason. It's probably more to the difference; but even without it, I'm not sure. >>: One of the (inaudible). >> Alexandra Boldyreva: The other one can, yes. Who is asked -- >>: Who is actually -- >> Alexandra Boldyreva: And they can encrypt everything with the public key, so it's not really a bad thing. You see? >>: And also no traffic analysis, just knowing when that query is sent? The day one Bubba has a heart attack, you know we'll have a pathway. >> Alexandra Boldyreva: Good question. We actually did internalize -- like somehow it was the security of the primitive. How it was used, like the server would know what it finds. So all these things would be -- they have to be treated additionally, but that's true. There are things like that, yeah. >> Kristin Lauter: More questions? So let's thank her again. (Applause)
