>>: The first speaker is Emily Shen. She's a grad student at MIT, and she's here
with us as an intern this summer. And she'll tell you about evaluating predicates
privately over encrypted data.
>> Emily Shen: Okay. Thanks. I'm going to talk about evaluating predicates
privately on encrypted data. And this is joint work with Elaine Shi and Brent Waters
which appeared in TCC last year.
Okay. So in a traditional encryption scheme, you can encrypt data under a
public key or a secret key. And then only the owner of the secret key can decrypt
the data. So in a sense decryption is sort of all or nothing. If you have the secret
key, you can decrypt the message, and if you don't have the secret key, you can't
get any information.
But in some applications --
>>: [inaudible].
>> Emily Shen: I don't know. Now it is. But in some applications what we want
is more fine-grained control over who can access the data and what can be
learned about the data.
So for example, you can imagine that a user is receiving e-mail encrypted under
her public key. And depending on certain -- whether certain predicates are true
on the e-mail, she wants to route the e-mail accordingly either to her phone or
her desktop. So if the subject line includes the word urgent, then she wants the
e-mail to be routed to her phone and otherwise just sent to her desktop.
So the user wants to be able to give the server a token corresponding to these
predicates and let the server test these things but not learn any other information
about the e-mails.
As another example, you can imagine that a user is storing her files encrypted on
the cloud and at some later point she wants to retrieve all the files satisfying a
certain predicate; for example the category equals work and the subject line
includes Microsoft.
So predicate encryption is a new encryption paradigm which gives us this more
fine-grained access control. And, okay, what does a predicate encryption
scheme look like? We have four algorithms. A setup algorithm which gives us a
public key and a secret key. And an encryption algorithm which takes a public
key and a message and returns a ciphertext, a generate token algorithm which
takes the secret key and the description of a predicate and gives us a token for
that predicate.
So now we have a query algorithm that takes a token for a predicate F and a
ciphertext for a plaintext X and it will return one if the predicate F evaluated on X
is true and 0 otherwise.
So we can also attach a payload message -- so that you're encrypting
a message along with X, and the query algorithm will return the message if the
predicate is true and nothing otherwise. But for a construction it's simpler to just
consider a predicate-only version of these encryption schemes.
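As a reading aid, here is the four-algorithm interface as a minimal Python sketch. The names are illustrative, and the toy bodies keep everything in the clear, so this models only the functionality, not the security:

```python
# A toy model of the predicate-only interface described above. Hypothetical
# names; nothing here is hidden, so this captures functionality only.
from dataclasses import dataclass
from typing import Callable

Predicate = Callable[[object], bool]

@dataclass
class Token:
    f: Predicate      # a real token would hide f

@dataclass
class Ciphertext:
    x: object         # a real ciphertext would hide x

class PredicateEncryption:
    def setup(self):
        return "pk", "sk"                     # public key, secret key

    def encrypt(self, pk, x) -> Ciphertext:
        return Ciphertext(x)

    def gen_token(self, sk, f: Predicate) -> Token:
        return Token(f)

    def query(self, tok: Token, ct: Ciphertext) -> int:
        return 1 if tok.f(ct.x) else 0        # 1 iff f(x) is true

# The e-mail routing example: route to the phone iff the subject says urgent.
pe = PredicateEncryption()
pk, sk = pe.setup()
tok = pe.gen_token(sk, lambda mail: "urgent" in mail["subject"])
ct = pe.encrypt(pk, {"subject": "urgent: server down"})
assert pe.query(tok, ct) == 1
```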
So predicate encryption is related to a lot of other encryption paradigms which
we all have heard of. For example, identity-based encryption can be seen as
a special case of predicate encryption where the predicates correspond to
equality testing.
So in identity-based encryption each person's public key is a string
corresponding to their identity, for example, their e-mail address. You can
encrypt under that public key and only the person who has the key for that string
can decrypt the message.
In attribute-based encryption, a user can receive a capability for a policy over the
attributes -- over attributes of encrypted data. And so a ciphertext can be
decrypted if a user has a key for a policy that is satisfied by the attributes of that
data.
But one important difference is that in IBE and ABE the identity and the attributes
are not hidden, whereas in predicate encryption, the attributes are encrypted and
hidden.
So previous work on predicate encryption has looked at a security notion which
we'll call plaintext privacy. Roughly, what this says is that ciphertexts should reveal
nothing about the encrypted plaintexts beyond what is revealed by the evaluation
of predicates on them.
So a little bit more formally. If an adversary has tokens for predicates F1 through
FM, then a ciphertext for a plaintext X should reveal nothing about X other than
what the adversary can learn by evaluating F1 of X through FM of X.
And there's a whole series of works that achieve predicate encryption with
plaintext privacy for various predicates, starting with equality testing and then
conjunctions and disjunctions of equalities. And most recently the evaluation of
inner product queries, which was done by Katz, Sahai and Waters.
So one line of research looks at expanding the expressiveness of these schemes
and expanding the class of predicates that we can achieve. But in our work we
look at a different aspect, which is the security definitions. And we ask whether
in addition to plaintext privacy we can also achieve predicate privacy. So what
do I mean by this?
In our example of encrypted e-mail routing, I said that if the subject line includes
the word urgent you want the e-mail to be forwarded to your phone and
otherwise you want it to be forwarded to your desktop.
But probably the user wants to hide these e-mail routing rules from the server.
You don't want the server to know what predicates you're testing.
And in the encrypted file storage example, when you retrieve all files satisfying
some criteria, you want to be able to hide these search criteria from the server.
So it turns out that predicate privacy is actually difficult to capture in the public
key setting. And the reason for this is that in the public key setting, the
adversary can encrypt any messages of his choice.
So for example, in this encrypted file storage case, if the server has a guess that
the predicate you're searching on is that the subject includes Microsoft, then the
server can just test this by encrypting a message where the subject is Microsoft
and testing whether the token successfully decrypts that message.
So for this reason we focus on the symmetric key setting. And we come up with
a scheme for predicate encryption that provides predicate privacy in the
symmetric key setting.
>>: [inaudible].
>> Emily Shen: Yeah?
>>: [inaudible] outputs X and two functions [inaudible] such that F1 of X equals
F2 of X, and then the example Microsoft disappears. So essentially you will
just [inaudible] functions F1, F2 and an input X such that F1 of X is equal to F2
of X. Right? And then you -- you will produce a [inaudible].
>> Emily Shen: So if you imagine an indistinguishability game, the adversary is
going to say here are two predicates that I think I can distinguish tokens for. And
now it's going to get back tokens for those predicates, but you can encrypt
messages that help you distinguish between the two predicates.
Okay. So a symmetric-key predicate encryption scheme looks pretty much like
the public key version, but now our setup algorithm returns just a secret key. You
need the secret key to encrypt a message, and to generate a token you need the
secret key and a description of the predicate. And again, the query algorithm is
going to return one or zero, depending on the result of the predicate on the
plaintext.
Okay. So let's see what the definition of predicate privacy looks like a little bit
more formally. We're going to define this in terms of a game between an
adversary and a challenger. And in the first step the challenger runs the setup
algorithm, generates a secret key and keeps that to himself.
Then the adversary goes through a query phase where it can make queries that
are of one of two types: it can make ciphertext queries and token queries. So in
a ciphertext query, it's going to output a plaintext XI and get back an encryption of
XI. In a token query it will output a predicate FI and get back a token for FI.
So you can run several of these queries and issue them adaptively. And then
when the query -- when the adversary wants to be challenged, it will output two
predicates F star 0 and F star 1 that it thinks it can distinguish between. And this
is subject to the restriction that these two predicates have to have the same
value on all of the plaintexts that were queried so far.
Now, the challenger is going to flip a random bit B and give back a token for F star
B. And the adversary can ask some more queries, subject to the same
restriction as before. And finally the adversary outputs a guess B prime of this bit
B. And it wins if B prime equals B. And we say that our scheme has predicate
privacy if no polynomial time adversary can win this game with more than
negligible probability.
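In pseudocode, one run of this token-challenge game looks roughly like the following; the scheme and adversary objects are hypothetical stand-ins, not notation from the paper:

```python
import secrets

def predicate_privacy_game(scheme, adversary):
    # One run of the token-challenge game described above. `scheme` and
    # `adversary` are hypothetical interfaces used only for illustration.
    sk = scheme.setup()
    queried = []

    def ciphertext_oracle(x):            # adversary asks for Encrypt(sk, x)
        queried.append(x)
        return scheme.encrypt(sk, x)

    def token_oracle(f):                 # adversary asks for GenToken(sk, f)
        return scheme.gen_token(sk, f)

    # Query phase: adaptive ciphertext and token queries, then a challenge.
    f0, f1 = adversary.choose_challenge(ciphertext_oracle, token_oracle)

    # Restriction: the challenge predicates must agree on all queried
    # plaintexts (also enforced on phase-2 queries, omitted here).
    assert all(f0(x) == f1(x) for x in queried)

    b = secrets.randbits(1)
    challenge_token = scheme.gen_token(sk, (f0, f1)[b])

    b_prime = adversary.guess(challenge_token, ciphertext_oracle, token_oracle)
    return int(b_prime == b)             # 1 if the adversary wins
```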
So here I've shown the game in terms of a token challenge where in the
challenge phase the adversary's outputting these two predicates F0 and F1. And
this corresponds to the predicate privacy game.
If we replace the challenge step with a ciphertext challenge where the adversary
is giving two plaintexts and getting back an encryption of one of them, then this
corresponds to the plaintext privacy notion which is already achieved by these
public key predicate encryption schemes.
Okay. So in our work we construct a predicate encryption scheme with predicate
privacy. And we do this by transforming the inner product scheme of Katz, Sahai
and Waters into a symmetric key scheme that has not only plaintext privacy but
also predicate privacy.
So here's what the scheme looks like. The functionality is that we have a
plaintext X, which is a vector X1 through XM, and predicates corresponding to
vectors V1 through VM. And a predicate for vector V evaluates to true on X if the
inner-product of X and V is equal to 0 mod N.
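Stripped of all the cryptography, the functionality being computed is just this check (a plain restatement, not the scheme itself):

```python
# The inner-product predicate in the clear: true iff <x, v> = 0 mod n.
def inner_product_predicate(x, v, n):
    assert len(x) == len(v)
    return sum(xi * vi for xi, vi in zip(x, v)) % n == 0
```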
So the key intuition for how we construct our scheme is that for inner products,
the ciphertexts and the tokens actually play symmetric roles in the functionality
and the security game.
So what I mean by this is, while you can view the ciphertext as an encryption of
the plaintext, you can view the token as an encryption of the predicate. And
in the inner-product functionality these two vectors are treated symmetrically,
and the same thing holds in the security game.
So if we can come up with a scheme where the ciphertext is created from the
plaintext in exactly the same way that the token is created from the predicate,
then if we prove plaintext privacy, this automatically implies predicate privacy.
So our initial scheme looks a lot like the scheme of KSW, which means that in a
relatively straightforward way, we can prove plaintext privacy. And the next step
is that we show through a series of modifications that our scheme is actually
indistinguishable from one where the ciphertexts and the tokens are treated
symmetrically, and so the plaintext privacy that we already proved gives us
predicate privacy for free.
Okay. So before I go into the scheme, you might ask why we're even
considering inner products. And it turns out that inner-product queries actually
allow us to make a lot of other more expressive queries as well. So I'll just give a
simple example of this.
So suppose we have an inner products scheme that has these four algorithms,
setup, encrypt, generate token, and query that gives us the inner-product
functionality. And I'm just going to show how we can use this to actually get
equality testing on elements of ZN. So if our plaintexts are elements of ZN and
our predicates are equality testing, we can use an inner-products scheme of
dimension two. So to encrypt an element J of ZN, we first create the vector
(-J, 1), and we encrypt this vector using our inner-product scheme.
And to generate a token for testing equality to an element K in ZN, we
first create the vector (1, K) and use the inner-product scheme to generate
a token for that. So now the query algorithm works exactly as before, we just run
the inner-product query algorithm, and you can see that the inner product of
these two vectors, (-J, 1) and (1, K), is equal to 0 if and only if J is
equal to K. Yes?
>>: [inaudible].
>> Emily Shen: Oh yes, this is --
>>: [inaudible] very special [inaudible] scheme.
>> Emily Shen: It is mod N, which is the N which is part of our scheme. Right.
Okay. So that's just a simple example. But you can also show that using inner
products you can come up with predicate
encryption schemes to do polynomial evaluation, and using those you can do
conjunctions and disjunctions and exact thresholds and other things.
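As a toy illustration of the dimension-two equality reduction just described, using the inner_product_predicate check from the earlier sketch (a real deployment would call encrypt, generate token, and query on these vectors instead):

```python
# Equality testing via a dimension-2 inner product: encode the plaintext j
# as (-j, 1) and the predicate "equals k" as (1, k); then
# <(-j, 1), (1, k)> = k - j = 0 mod n  iff  j = k mod n.
def encode_plaintext(j, n):
    return [(-j) % n, 1]

def encode_predicate(k, n):
    return [1, k % n]

n = 1009  # a stand-in modulus; the scheme's N is a product of four primes
assert inner_product_predicate(encode_plaintext(5, n), encode_predicate(5, n), n)
assert not inner_product_predicate(encode_plaintext(5, n), encode_predicate(7, n), n)
```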
Okay. So I'll just show sort of at a high level what our scheme looks like. In our
scheme we use bilinear maps, so we have two groups, G and GT which are finite
cyclic groups of order N where here N is a composite which is the product of four
primes, P, Q, R, and S. And we can write G as the product of these four
subgroups, GP, GQ, GR, and GS.
And we have a bilinear map E which takes G cross G to GT. The bilinear
property says that if we paired G to the A and G to the B, it's equal to the pairing
of G, G to the AB. So we can pull the exponents out. And the map has to be
non-degenerate, which says that if G generates the entire group G, then the
pairing of G with itself generates the entire target group.
Okay. So there are a couple of useful mathematical properties that we need in
our schemes. One is that if we pair A with the product of B and C, we can
split this up into the product of the pairing of A with B and the pairing of A with C.
The other useful property, which comes up in groups of composite order, is that
when we pair two elements G and H which are of relatively prime orders,
basically they cancel out, and the result is one. So this means when G and H
come from distinct subgroups -- for example, if G is from the subgroup of order P and
H is from the subgroup of order Q -- when we pair them together, it's just going to
cancel out and we get one.
>>: [inaudible].
>> Emily Shen: Yes. Basically you can write each element as a generator of the
group to some power. So if you look at an element of the subgroup of order P,
you can write it as G to the QRS to some power, and if you do that for both of the
elements, then you can pull out the powers and you have P, Q, R, S all in the
exponent. So it's going to equal one.
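In symbols, the properties just described read roughly as follows, writing g_0 for a generator of the full group G of order N = pqrs (my reconstruction of the spoken math):

```latex
e(g^a, g^b) = e(g,g)^{ab}, \qquad
e(a, b \cdot c) = e(a,b)\, e(a,c).

% Cancellation across subgroups: for g \in G_p and h \in G_q, write
% g = g_0^{qrs\,\alpha} and h = g_0^{prs\,\beta}; then
e(g,h) = e(g_0, g_0)^{(qrs\,\alpha)(prs\,\beta)}
       = \bigl(e(g_0, g_0)^{N}\bigr)^{rs\,\alpha\beta} = 1.
```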
Okay. So again we have a plaintext X, which is a vector X1 through XM, and a
predicate vector V, V1 through VM. And we want the evaluation of a token on a
ciphertext to come out to one if the inner product is equal to zero. So as a first
attempt, we
can say for the ciphertext, first just generate a random element G of the subgroup
GQ and raise G to the powers X1 through XM, and for the token similarly choose
a random value H in GQ and raise H to the V1 through VM.
So now we can easily see that if we just pair these elements together component by
component and multiply all the pairings, we end up with E
of G, H with the inner product of X and V mod N in the exponent.
So if the inner product is equal to zero, then this value is going to be equal to one
in the target group.
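Written out, the component-wise computation just described is roughly:

```latex
\prod_{i=1}^{m} e\!\bigl(g^{x_i}, h^{v_i}\bigr)
  = e(g,h)^{\sum_{i} x_i v_i}
  = e(g,h)^{\langle x, v\rangle \bmod N},
% which equals 1 in G_T exactly when <x, v> = 0 mod N.
```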
So this gives us the functionality we want but there's a lot of problems with it. So
one problem is that we get ciphertext-ciphertext interaction. What I mean by that
is if you look at the ciphertext, which is G to the X1 through G to the XM, we can
learn a lot about the XIs just by doing pairings on these terms.
So, for example, you can learn whether the inner product of X1, X3 and X2, X4 is
equal to zero by pairing the components in this way. And similarly for the tokens.
So the solution to this is that we multiply in masking factors. So in the
ciphertext we can multiply in random values from the subgroup GR. And for the
tokens we can multiply random values from the subgroup GS. So now because
of the cancellation property that I mentioned before, if we pair a ciphertext
component with a token component, the R and the S are going to cancel out
because they're from distinct subgroups.
But if we improperly combine two ciphertext components or two token
components, then the randomization is from the same group, and it's not going to
cancel out, and the result will look random. Okay.
So another problem that we still have is that we can sort of do partial evaluation.
And what I mean by that is you could basically combine just a subset of the
components and learn something about the inner product of only some of the
components. So, for example, you could learn whether
the inner product of X1, X2 and V1, V2 is equal to zero by doing this pairing.
So the solution to that is in the exponent of the GP subgroup we encode an
equation which evaluates to zero if all of the components are combined and
in the right order. So roughly what we do is in the ciphertext we multiply in these
terms F to the C1 through F to the CM, and in the token we multiply in F to the
D1 through F to the DM, and these CIs and DIs are random but subject
to the constraint that the inner product of the vector C and the vector D is equal
to zero.
So now only if you combine all of the components and in the right order will they
cancel out and you get something that does not look random.
>>: [inaudible].
>> Emily Shen: Sorry?
>>: [inaudible] part of the secret key [inaudible].
>> Emily Shen: That's just something that's -- yeah, it's part of the secret key.
So I mean what we do is actually not exactly this. But it's something similar,
which is also an equation which comes out to zero if you pair things correctly. So
part of it is from the secret key and part of it is randomness that you generate in
these algorithms.
Okay. So to summarize, we have these four subgroups which each play different
roles. In the GQ subgroup we encode the plaintext and the predicate in the
exponent and do the inner-product computation. We use the GR subgroup to
mask the ciphertext components. The GS subgroup similarly masks the token
components. And the GP subgroup is used to ensure that all the components
are used and in the right order.
So in conclusion, we achieved a predicate encryption scheme that has predicate
privacy in the symmetric key setting and supports inner-product queries. And
from inner products we can also achieve equality testing, polynomial evaluation,
and conjunctive and disjunctive formulas. And what this means is that in
encrypted file storage scenarios on the cloud, a user can give a server a token to
retrieve encrypted files satisfying a particular predicate P without revealing what
that predicate is.
So that's the end of my talk. And I'll take any questions.
>>: [inaudible].
>> Emily Shen: So I mean to me in the file storage case, that's a case where it
would be important because you're storing your files encrypted and now you
want to retrieve the one satisfying a particular predicate and you don't
necessarily want to reveal to the server what you're searching on.
>>: [inaudible] I don't understand the definition of predicate. If I take a
conjunction [inaudible]. Predicate privacy just [inaudible] conjunction to
[inaudible] it doesn't say what [inaudible].
>> Emily Shen: It's not even going to reveal that you took a conjunction of
equality [inaudible].
>>: I mean what you defined this predicate privacy, the fact that you get
conjunction for [inaudible].
>> Emily Shen: You won't even know that it's a conjunction of [inaudible].
>>: I know.
>> Emily Shen: Okay.
>>: But that's what you tried to achieve by predicate privacy, effectively it is a
conjunction, right?
>> Emily Shen: I'm not sure I understand the question. Okay. Sure.
>>: I don't know the numbers off the top of my head. If you take, say, a 1024-bit
RSA-size modulus and chop it into four equally sized parts, are they safe
[inaudible] of X?
>> Emily Shen: Are they safe against?
>>: [inaudible] P, Q, R, S.
>> Emily Shen: Right. So it means that we [inaudible] larger size.
>>: [inaudible] elliptic curve and [inaudible] because clearly you're nearer than --
>>: That's much smaller.
>>: Much smaller. Okay.
>>: [inaudible].
>>: I'm sorry?
>>: That's about 50 [inaudible].
>>: Okay.
>>: Any more questions?
>>: So I just want to be clear about this public key definition.
>> Emily Shen: Yeah?
>>: I can see the issue, but you still can [inaudible] I mean so what I'm saying is
maybe the -- it seems like if you can do it [inaudible] obfuscation, all right
[inaudible].
>> Emily Shen: Yeah. So that's something that people have mentioned. So it's
possible that you could define it in some other way, not with the same kind of
indistinguishability game, but maybe you could have a sort of simulation based
definition where you want to say that an adversary basically -- that the ciphertext
and tokens give the adversary no more information than he would get by just
having access to a --
>>: Yeah, but this is [inaudible] so what I'm saying is [inaudible], you know,
token from the function F, one way to achieve program obfuscation is you
encrypt X and then, you know, get F of X. So particularly it does give you a
[inaudible] F of X for any input you want. So, you know, if you could say that's the
best you can do for any attacker, whatever you can learn from this, you can
give [inaudible] doesn't exactly [inaudible] obfuscation.
>> Emily Shen: Yeah.
>>: So some [inaudible] but maybe for [inaudible] it's kind of funny. So this
[inaudible] goes the other way around. This thing is stronger than program
obfuscation, right?
>> Emily Shen: Yeah.
>>: So it may be for the [inaudible].
>>: That's [inaudible].
>>: Oh, I see. [inaudible].
>>: Equality function [inaudible].
>>: [inaudible]. I mean, it doesn't give you as [inaudible] it doesn't give you a
[inaudible] functions.
>>: [inaudible].
>>: [inaudible].
>>: But I'm just [inaudible].
>>: [inaudible].
>> Emily Shen: Yes. So we didn't explore it, but I think it's worth exploring the
connection.
>>: [inaudible] questions? Okay. Let's thank the speaker.
[applause].
>>: So the next speaker is our own Seny Kamara and he'll tell you about
structured encryption and controlled disclosure.
>> Seny Kamara: Thanks. So everybody can hear me? Okay. So as Vino said,
I'll be speaking about structured encryption and controlled disclosure. And this is
joint work with Melissa Chase.
So the setting that we consider here is that of cloud storage, which you've heard
a lot about today. So here we have a cloud provider that offers storage as a
service and we have clients that want to pay the cloud provider to store the data,
right. So they just send it over.
And as we all know there's a lot of services based on cloud -- on cloud storage.
So there is Web based e-mail like Gmail or Hotmail, Mozilla Weave, Live Mesh,
Windows Azure has Azure storage and Amazon has S3. And there are a lot
more services beyond these.
Okay. So cloud storage is great. There's a lot of advantages. But really the
main disadvantage is security, right? And the main concern here is basically
what's going to happen to my data, right? And typically this is addressed by
cloud providers in the following way:
They say, well, don't worry too much about it, we're going to encrypt the data,
we're going to authenticate it, it will be backed up, we have access control, our
data centers are very secure, we have guards, we have biometric access control.
So nobody's going to be able to get to your data. And these are all great -- these
are all great measures, but they really only provide you security against either
outsiders, like hackers, for example, or other tenants. So other clients that are
using the same cloud storage infrastructure.
So really the threat model that we consider here, and that you've heard about in
previous talks today, is basically one where we don't trust the cloud operator
himself, right? So the question is instead of providing security against outsiders
and other tenants, can we provide security against the cloud operator himself?
Okay. So we might say, why do we want to protect against the cloud? These
cloud providers are rational entities, they're trying to run a business, they
have a reputation, it's not in their interest to sort of do all these malicious things.
So you might want to protect against the cloud for multiple reasons. One of them
is just plain lack of trust. If you're, for example, a pharmaceutical company that's
investing you know billions in R&D, you're probably not going to entrust a cloud
provider to store your data. Or you could be a government agency who basically
won't trust anybody to store any of its data.
And there's also legal situations where if you're a hospital or publicly-traded
company, by law you might not be able to disclose some of your data to a third
party.
So in this work what we're concerned about is confidentiality, right? So I have
data. I want to store it in the cloud. But I don't want the cloud operator to see my
data. So there are very simple solutions to this problem. One of them is to just
use encryption so we take our data, we encrypt it, we send it over to the cloud,
and whenever we want to perform some operation on this data, we send a
message and the cloud returns the encrypted data back. We decrypt it and then
perform our operations locally. Right?
And obviously the downside to this approach is that there's large communication
complexity, right? It doesn't really scale well. If we're talking about terabytes of
data, we don't really want to be sending this around each time we want to do
some operation.
So another solution is to take the data when we have it in our possession and to
build an index, right? So if I have my e-mail collection, for example, I can build a
keyword index which allows me to do keyword search for this data, then I can
store the index locally, encrypt all my e-mails, send them to the cloud and
whenever I want to retrieve some e-mails I just send -- sorry, I query my -- I
query my index and I figure out which e-mails contain the keyword, and then I
ask the cloud to return those specific encrypted e-mails. Okay?
And this has good communication complexity, but the downside is that it requires
large storage locally. Right? So the size of this index is going to grow as a
function of -- as a function of the e-mail archive.
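For concreteness, a local keyword index of this kind might look like the following toy sketch (hypothetical data, nothing from the talk):

```python
# A minimal local keyword index: maps each word to the ids of the e-mails
# that contain it. The client keeps this; only encrypted e-mails go to the cloud.
from collections import defaultdict

def build_index(emails):
    index = defaultdict(set)
    for i, body in enumerate(emails):
        for word in set(body.lower().split()):
            index[word].add(i)
    return index

emails = ["meet about the microsoft review", "lunch plans friday"]
index = build_index(emails)
assert index["microsoft"] == {0}   # then ask the cloud for encrypted e-mail 0
```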
Okay. So the question is can we achieve the best of both worlds? Can we have
constant storage at the client and small communication complexity?
And this is essentially the problem of searching encrypted data which was first
considered by Song, Wagner and Perrig in 2001 and basically what we want is
we want a system that the client can use to encrypt its data. And whenever it
wants to work on a subset of these documents, it can send a token that encodes
a particular keyword. The cloud can then take this token which hides the
information about the keyword, combine it with the encrypted data and return the
specific documents that contain the keyword. Okay?
All right. So this problem of searchable encryption can be addressed using
general tools. So one of them is two-party computation. But here if you use
two-party computation, the number of rounds that you're going to have to use is
going to be linear in the size of the data. On top of that, the server is going to
have to do work that's polynomial in the size of the data.
So this really doesn't give you any advantage over just sending the data back
and forth. So you could also use techniques from Goldreich and Ostrovsky,
which were called oblivious RAMs. But here sort of the overhead is going to be
fairly large. For every read and write of your RAM, so you have a program that
does keyword search, for every read and write that this program does you're
going to require logarithmic number of rounds and polylog work on the server.
Another thing you could use, as Craig mentioned this morning, is
fully-homomorphic encryption. And here you can get a one round solution. But
the server computation is going to be polynomial in the size of the data. And
again, this is sort of large for the types of parameters that we have in mind.
So people have tried to solve the problem by designing crypto systems or
schemes specifically for searchable encryption. So there's been a lot of work in
the symmetric key setting, in the public key setting and basically most of the
schemes have a one round solution but the work of the server is linear in the
number of data items.
So if N is your number of e-mails, then the server -- the computation for the
server is linear. But in 2006, in work that we did with Curtmola, Garay and
Ostrovsky, we gave a solution that is one round and that had sublinear search time
for the server, and in particular it's optimal, it's basically linear in the number of
documents that contain the keyword.
Okay. So searchable encryption is great. The schemes are fairly efficient. And
it basically allows you to do private keyword search for encrypted data, right, or
encrypted text data. Sorry. You had a question? Sure.
>>: [inaudible].
>> Seny Kamara: So it seems to me, at least all the symmetric key schemes,
including this one, they do leak some information about the keywords. It's sort of
they -- they leak information by access pattern basically. So it's not specifically
about the keyword, but there's a little bit of the statistical information, and you
could, you know, sort of, you know, keep monitoring sort of the access pattern
and eventually get something. So, yeah.
>>: [inaudible].
>> Seny Kamara: Yeah.
>>: [inaudible].
>> Seny Kamara: So it's essentially kind of tricky, right, because you have to
weaken the definitions to basically leak a little bit of information. So if you're
defining things, you know, using the general tools it's fairly easy, you have an
ideal functionality, and you can just -- but here you actually have to weaken it.
>>: [inaudible].
>> Seny Kamara: Yeah. Yeah.
>>: [inaudible]. You can do it sort of optimally. Basically the server computation
time is output sensitive, so it's just --
>>: [inaudible].
>> Seny Kamara: Exactly. Which is --
>>: Which is [inaudible].
[brief talking over].
>> Seny Kamara: It's linear in the number of documents with the keyword.
>>: It's the order of the number of --
>> Seny Kamara: You just use the reverse.
>>: [inaudible].
>> Seny Kamara: You just use the reverse index on your data and you get
optimal search time. In the unencrypted case. In the encrypted case, you have
to use [inaudible] index but apply the crypto to it in order to make it secure.
Okay. So okay. So searchable encryption is great. We can search over
encrypted data. But really it's sort of -- it's actually fairly limited as far as
functionality, right? So we can do keyword searches over text data, but a lot
of the data that's generated isn't text data, right? And in particular, a lot of this
data that's not text data we don't really care about doing keyword searches over.
Right?
So the question is can we privately query other types of encrypted data, and in
particular things like maps or image collections, social networks or Web page
archives. Right?
So this is a problem that we consider in this paper, and so we focus on a
particular type of data, which is graph-structured data. And really this type of
data is ubiquitous and is being generated at a huge rate, right? So basically any
communication between, you know, between people generates graphs, right?
So if you look at e-mails headers, if you look at phone logs, you're going to get
huge graphs. If you look at research papers and citations and bibliographies,
you also get big graphs.
A lot of sort of network topology things generate these huge graphs, you know,
AS level graphs or Internet scale graphs. And of course social networks, right,
are generating huge, huge amounts of graph data.
And Web crawlers for search engines, right, for Bing or Google, they're
generating massive, massive graphs.
And finally you can also think of maps as graphs, where basically the
intersections are the nodes of the graph and the roads between the intersections
are your edges.
So we felt that this was sort of an interesting type of data and we wanted to see if
we could encrypt it in a way that we could actually do graph queries on top of this
data.
Okay. So we introduced this idea of structured encryption, which is basically a
generalization of searchable encryption to other types of data, not just text data
but arbitrarily structured data. We have a formal security definition which is
simulation-based. And we give a few constructions. So we show how to do
adjacency queries on encrypted graphs, which are basically given two nodes
[inaudible]. We also do neighbor queries on encrypted graphs, so given a
node I want to get back all the nodes that are adjacent to that node.
And we also do something called focused subgraph queries on encrypted Web
graphs. And these are sort of more complex types of queries. And I'll explain
those later. Okay.
And finally we consider sort of a new application of structured encryption, and
therefore of searchable encryption, to a problem that we refer to as controlled
disclosure. And we show some applications of -- we sort of mention some
applications of controlled disclosure to cloud-based data brokering, which if I
have some time, I'll explain what that is.
Okay. So for structured encryption, so if we take the example of graph structured
data, right, so the idea is we have our client, he has this graph, he wants to
encrypt and store in the cloud, and later he wants to make some query, some
graph query, on this graph. He's going to send a token, which is going to
encode the graph query he wants to make, but it's going to hide the information
about the query he's making. And then the cloud is going to be able to return the
encrypted -- the encryptions of whatever the answer is. Okay?
So for this particular type of encryption scheme, right, you can see sort of the
message space is a little bit weird, right? So typically, for typical encryption
schemes the message space is either a bit string or some [inaudible] of a group,
right? But now we're encrypting something that has a lot more -- is a lot more
structured.
So the first thing that we have to do is we have to understand what exactly is our
message space. So the way we describe the message space is as follows: So
we say -- so we -- okay. So we view the input as structured data. And
then we say we're going to decompose this structured data into two elements.
One of them is a data structure that encodes the structure of the data and the
other one are the data items, which basically are just bit strings, right? Which
sort of have whatever information we want to associate with the structure
encoded by the data structure.
So as an example, if you have an e-mail archive where you [inaudible] e-mails,
then we're going to view it as a combination of a keyword index and then the text
in the e-mails. Okay? So this index just encodes the keywords associated with
each of the e-mails.
And if I query -- if I query this data structure, if I query this index on the keyword
I'm going to get pointers into the particular e-mails that contain the keyword.
So another example is for social networks. So here we have a social network,
and we're going to view this as a combination of a graph which encodes the
friendships between the people and their profiles for example. And if I do a
graph query on this graph, I'm going to get pointers to the different -- to the
different profiles. Okay.
So such encryption schemes are composed of five algorithms. The first one is a key
generation algorithm that takes a security parameter and outputs a symmetric key. Then
there's an encryption algorithm that takes the key, takes the data structure, delta,
which encodes the structure, and then a vector which consists of the data items.
Okay? And it outputs an encrypted data structure gamma, and a ciphertext C.
And this is where the client is going to run on the structured data in order to
generate the ciphertext, which is composed of the actual encrypted data items
and the encrypted data structure. And that's what he's going to send to the
cloud.
Whenever the client wants to make a query on this data, he's going to use the
token algorithm on his query, which is going to generate a token, which he's
going to send to the cloud and then the cloud is going to take the token and the
encrypted data structure, run this query algorithm on the encrypted data structure
with the token, and this is going to output a set of pointers, right? It's just a set of
numbers, which basically point to the particular data items that satisfy the query.
And then it can just go fetch those encrypted items and send them back. And
then the client can just decrypt those items individually.
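As a type-level sketch, the five algorithms might be written like this (my paraphrase of the interface; the signatures are illustrative, not the paper's notation):

```python
# Type sketch of structured encryption: (keygen, encrypt, token, query, decrypt).
# Toy signatures only; the interesting bodies are left unimplemented.
import secrets
from typing import Any, List, Tuple

class StructuredEncryption:
    def keygen(self, security_parameter: int) -> bytes:
        return secrets.token_bytes(security_parameter // 8)

    def encrypt(self, key: bytes, delta: Any, items: List[bytes]
                ) -> Tuple[Any, List[bytes]]:
        """Client: returns (gamma, c), the encrypted structure and items."""
        raise NotImplementedError

    def token(self, key: bytes, query: Any) -> bytes:
        """Client: a token encoding the query while hiding it."""
        raise NotImplementedError

    def query(self, gamma: Any, tok: bytes) -> List[int]:
        """Server: pointers into the encrypted items that satisfy the query."""
        raise NotImplementedError

    def decrypt(self, key: bytes, item: bytes) -> bytes:
        raise NotImplementedError
```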
So as far as security, so we give the definition which we refer to as security
against adaptive chosen query attacks. So it's simulation based. And essentially
the guarantee is given the ciphertext no adversary can learn any information
about the data or the queries other than what can be learned from the access
and the search patterns. And that's even if the queries are made adaptively.
So by adaptive, by adaptive queries I mean that this holds even if the adversary makes his
queries as a function of answers to previous queries or as a function of the
ciphertext. Okay?
And so there's this caveat in the definition which says that nothing can be
learned other than what can be deduced from the access and the search
patterns, which basically means that the schemes leak the access and the search
patterns. And to be more precise, what I mean by the access and search patterns is the
following. So the access patterns are basically the pointers to the encrypted
documents that satisfy the query, all right, which we're willing to leak in this case
because we want an efficient solution. And in any case the server -- we want the
server to return these encrypted data items. So we might as well just leak it. So
there are techniques to hide this. But they're much more expensive.
And we also leak -- we leak something else, which is the query pattern,
which is basically whether a query is repeated. Okay? And this is pretty
standard, at least in the case of symmetric searchable encryption. All the
schemes leak these things. Some leak more. But this is sort of the best we
can do without interaction. At least so far.
So we have this simulation-based definition, and this definition implies a lower
bound on the size of the tokens. In particular, the lower bound is lambda
times log N, where lambda is the number of data items that satisfy the query and
N is the number of data items, okay? And this lower bound is in the standard
model. In the random oracle model you can do better.
>>: [inaudible].
>> Seny Kamara: It's sort of obvious. I mean, the thing is that this -- so this
definition is adaptive basically so you get into a situation where the simulator
basically has to sort of commit to some encryption and then he's going to get
queries after he's committed to the ciphertext. And he has to answer those
queries correctly. So he has to generate tokens that are actually going to work.
But he only gets the query -- so he only gets information from the query
afterwards. And if you want to be able to satisfy any possible query, basically
you're going to need enough tokens sort of in your token space to satisfy
any possible answer.
So in terms of the size it's going to be basically related to the size of all the
different answers that you have to be able to sort of simulate. And the log N is
really just because what you -- what you encode in terms of the responses are
pointers into the items, and you have N items, so you need log N bits. So
that's pretty much it.
>>: [inaudible].
>> Seny Kamara: I'm sorry?
>>: [inaudible].
>> Seny Kamara: Not in this work. [inaudible] which, you know, which has some
nice properties, right, typically -- you typically prefer these types of definition, they
compose easier, you know, they're sort of more natural also. So -- but, yeah, you
could also formulate a -- you know a [inaudible] based definition.
Okay. So we consider -- we give different constructions. So we
do adjacency queries on encrypted graphs. And we do this -- so we design a
scheme that handles lookup queries on encrypted matrices and then we basically
just look at the adjacency matrix representing the graph and then just use the
scheme on top of it.
We also do neighbor queries on encrypted graphs and we actually show that this
can be built from any structured encryption scheme that handles keyword
searches, so basically any searchable encryption scheme.
And then as I mentioned, we do also focused subgraph queries on encrypted
Web graphs. And for this we need a scheme that handles keyword searches on
encrypted data, so basically an SSE scheme, and a scheme that handles
neighbor queries on encrypted graphs.
So I'll talk a little about how we do neighbor queries on encrypted graphs, and
then I'll talk about how we do focused subgraph queries, depending how much
time I have.
Okay. So how do we do neighbor queries on encrypted graphs? So basically
what we have is structured data which looks as follows: we have
this graph, and then we have the data items, and the yellow data item is
basically whatever data we want to associate with the [inaudible], right, et cetera.
So this is our -- this is our input.
We want to encrypt this, send it to the cloud and then generate a token to do a
neighbor query. So we send a token for the green node and then the cloud is
able to figure out which encrypted documents are connected to the green node.
Okay.
So the [inaudible] we use: we use a dictionary, which is basically just a key-value
store; we use a pseudorandom function; and we use some form of
non-committing symmetric encryption. And this we can build using a
pseudorandom function and XOR -- in this case, we have the lower bound on the token
size -- or we can use a random oracle and XOR, and we don't have to worry about the
lower bound; we get tokens that are as large as the security parameter.
>>: [inaudible].
>> Seny Kamara: No. So everything here is static. Okay. So what does this
scheme look like? So this is our graph. The first thing we do is we generate the
adjacency list representation of this graph, which consists of for every node we
just write down all the nodes that are adjacent to it. So this should be a
familiar data structure.
And then what we do is we use the pseudorandom function on the nodes, right,
so we just turn them into random looking values. And then we use the
non-committing encryption scheme on the list of edges, right, for each node. And
the key is a function of the particular node that we're working with. Okay?
And then we just store this in our dictionary. And that's what we call an
encrypted data structure. And we send this along with the encrypted documents
to the cloud. And whenever we want to -- whenever we want to do a neighbor
query on a particular node, we send a token that has this form. So the first
element is just the evaluation of the pseudorandom function on the particular node
that we're trying to query, and then we generate this key for the non-committing encryption
scheme. We send both of these elements to the server. The server takes the
encrypted data structure, so this dictionary queries it using the first element,
which is this random looking string. The dictionary returns the associated value,
which is basically the non-committing encryption of the nodes that are adjacent to
it, and then it uses the key to decrypt and give back which nodes are connected
to it, right? So it's very simple.
That's essentially how the scheme works. You have to take care of some details,
but this is at a high level how it works.
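A rough sketch of this construction, under stated assumptions: HMAC-SHA256 stands in for both the pseudorandom function and the random oracle behind the XOR-based (non-committing-style) encryption, and details of the real scheme such as padding are omitted:

```python
# Sketch only: HMAC-SHA256 as PRF / random oracle; real scheme differs.
import hmac, hashlib, json

def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

def xor_pad(key: bytes, msg: bytes) -> bytes:
    # Expand `key` into a pseudorandom pad and XOR it with the message;
    # applying it twice decrypts.
    pad = b""
    counter = 0
    while len(pad) < len(msg):
        pad += prf(key, counter.to_bytes(8, "big"))
        counter += 1
    return bytes(m ^ p for m, p in zip(msg, pad))

def encrypt_graph(k1: bytes, k2: bytes, adjacency: dict) -> dict:
    """adjacency maps node name -> list of neighbor names."""
    encrypted = {}
    for node, neighbors in adjacency.items():
        label = prf(k1, node.encode())       # random-looking dictionary key
        enc_key = prf(k2, node.encode())     # per-node encryption key
        encrypted[label] = xor_pad(enc_key, json.dumps(neighbors).encode())
    return encrypted

def neighbor_token(k1: bytes, k2: bytes, node: str):
    return prf(k1, node.encode()), prf(k2, node.encode())

def query(encrypted: dict, token) -> list:
    label, enc_key = token
    return json.loads(xor_pad(enc_key, encrypted[label]))

# The server learns the neighbors of the queried node, and nothing about
# nodes that are never queried.
k1, k2 = b"k" * 32, b"q" * 32
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
enc = encrypt_graph(k1, k2, graph)
assert query(enc, neighbor_token(k1, k2, "a")) == ["b", "c"]
```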
Okay. So that was neighbor queries on encrypted graphs. And now we -- okay. So --
>>: [inaudible].
>> Seny Kamara: Yes. So in this area, right, it's sort of -- we're trying to get
this -- get the right tradeoff between efficiency and security, right? And we know
how to do these things completely securely. We use oblivious RAMs and
we're done. But, you know, here we're really trying to get something that's
actually practical. So we are leaking some information. And to be honest, it's
not clear. Like we don't have a good way of assessing how much information --
you know, how dangerous this is, right? But we're sort of willing to say, you
know, we're going to leak this much information in order to get efficiency, and
hopefully that's -- hopefully it's not too much.
>>: But you feel it's much better than saying this is my graph, I mean here's a
[inaudible] but you are saying the [inaudible] information especially from
[inaudible] by just saying there is some random node connected to this
parameterized [inaudible].
>> Seny Kamara: Yeah. Presumably [inaudible] this information, and it would
take a long time before you could actually reconstruct any meaningful
information. But, you know, we don't have a good way of assessing that, right?
It's sort of -- but, in any case -- but sort of from a practical point of view it's better
than what we have now, which is nothing. Right? So --
>>: [inaudible].
>> Seny Kamara: Yeah, but what I mean is the following. So we know -- so we
know what is being leaked by the scheme. So we -- we can characterize what is
being leaked in some way. We can say it's the access pattern, it's these
pointers and these elements, right? Now, the problem is that we're actually using
this multiple times. Then it's not clear, like if -- so if I look at the information that's
being leaked over multiple, multiple queries, right, I can start to sort of make
inferences about -- so I can start -- I can try to guess what you might be
searching for. Okay. So take the case of keyword search, right? So if I see that,
you know, you're doing a bunch of searches that hit a lot of documents, right, and
I know that you work at a particular company, maybe I can start inferring things
about what you're searching for. And the more of this data I get, the better, you
know, I can infer. So it's not -- I mean, it's not that we don't know what we're
leaking; it's that we can't say statistically what it reveals. Yeah?
>>: And you don't want to [inaudible] public encryption because it's [inaudible].
>> Seny Kamara: Yeah, I mean you could use, you know, these sort of generic
techniques and you could use homomorphic encryption, you could use oblivious
RAMs [inaudible] yes?
>>: [inaudible].
>> Seny Kamara: Yes. Okay. So the example I gave, which was neighbor
queries on graphs, is really sort of a simple type of query on a graph, right? But
in a lot of cases we have more complicated types of graphs. We have sort of
objects that have -- that might mix different types of structure, right? So one
example are Web graphs, which are basically just collections of Web pages, right,
and Web pages have hyperlinks.
So Web graphs essentially consist of text data, which is in the pages, and of
graph data, right, which are the hyperlinks between the pages.
So you can do simple queries on Web graphs. You could ask for all the pages
that are linked, you know, to a -- from a particular page or all the pages that link
to a particular page. So these would just be plain neighbor queries on a Web
graph. But you could also ask more complicated queries on Web graph, right,
because they have this extra structure. So you could ask queries that basically
mix both text, the text and the graph structure of the data. Right?
And one example -- so where this comes up a lot for Web graphs is in
search engine algorithms, right? In particular -- so the more modern search
engine algorithms like PageRank or HITS basically they don't -- when you do a
keyword search in your search engine and they give you back a ranking, they
don't just look at the text data, right, of the Web pages. They look at the text.
But they also do some computation on the link structure. So they mix both. And
so yes. Some of the more well known algorithms are PageRank, obviously, that
Google uses but in particular I'll highlight Kleinberg's HITS algorithm because this
uses actually focused subgraph queries which is what we do, and there's some
derivatives of this algorithm, SALSA, and a bunch of others.
Okay. So what is a focused subgraph query? So the way these search engine
algorithms work is basically the first thing they do is they look at the Web graph,
they compute a focused subgraph, and then they run this iterative algorithm on
this focused subgraph and then they output a ranking, okay? And then they send
back, you know, the hundred best pages.
So if I'm doing a keyword search for crypto, then a focused subgraph is
essentially the following. So first I do a keyword search over all the Web pages
and I figure out which pages contain the word crypto. So in this case, it's these
three pages. And then I add to this subgraph any page that's linked to from these
pages or any page that links to one of those pages, right? So I just add all
these pages. And that's my focused subgraph. Okay? So that's all that means.
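In the clear, the focused-subgraph computation just described is simple; the following sketch (hypothetical data) is only a reference point for what the encrypted protocol has to reproduce:

```python
# Focused subgraph in the clear: pages matching the keyword, plus any page
# linked to or from one of those pages.
def focused_subgraph(pages, links, keyword):
    """pages: name -> text; links: set of (src, dst) hyperlink pairs."""
    root = {name for name, text in pages.items() if keyword in text}
    neighbors = {src for src, dst in links if dst in root}
    neighbors |= {dst for src, dst in links if src in root}
    return root | neighbors

pages = {"a": "crypto talk", "b": "lunch", "c": "misc"}
links = {("b", "a"), ("a", "c")}
assert focused_subgraph(pages, links, "crypto") == {"a", "b", "c"}
```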
Okay. So how could we -- how could we encrypt a graph or Web graph so that
we can do focused subgraph queries on it? So one approach is to just take this
Web graph and encrypt it with a structured encryption scheme that handles
keyword searches, output a ciphertext, then encrypt it with a structured
encryption scheme that handles neighbor queries and output a second
ciphertext, and the final ciphertext is a combination of both, right? But it won't work.
won't work it. Because if I gave you a token for the keyword search, for example,
the server will be able to figure out which
documents or which Web pages contain the keyword. But then I still need to give
you a token so that you can get the neighbor, the neighbors of that particular
Web page, right?
And I don't know, I can't predict what the answer to that query is going to be. So
there's no way for me to do this without interaction. Okay? So in the
paper, we sort of introduced this chaining technique which allows us to combine
structured encryption schemes in order to combine -- in order to generate a
structured encryption scheme that's more complex. And to handle focused
subgraph queries, we combine structured encryption for keyword search with
structured encryption for neighborhood queries.
And the nice thing about this approach is that it preserves the token size of the
first scheme. Okay? So the token sizes don't add up. Even though you use a
second structured encryption scheme, you just send one token, for the
first one. Okay?
So this chaining technique is useful, but it requires an extra property from the
structured encryption scheme which we call associativity. And essentially what
this means is that you have a more complicated message space which includes
the data structure, the data items, and also a vector of what
we call semi-private information, which is information that's private in one way but
public in another. So -- and I'll explain a little bit what that means.
And the sort of the answer spaces are a little bit more complicated. You get
pointers to the data items but you also get this semi-private information. Okay?
And it turns out -- so there is a scheme that handles keyword
searches that is associative, but it's not secure in this adaptive sense, which is
what we need.
So in the paper, we propose a scheme that handles keyword searches, that is
associative, and that's adaptively secure. Okay.
So a little bit more precision on what we mean by associativity. So these are
the algorithms that we had initially for structured encryption. So now, a
structured encryption scheme is associative if it has the following properties:
So on top of the data structure and the data items we also have the semi-private
information. And what this means is for each data item I can associate another
item. All right?
So for M1, I have V1, for M2, I have V2. Okay?
And what's the point of doing this? The point is that when I run my query algorithm
on the encrypted data structure and the token, I'm not only going to get the
pointers into the data items that are relevant for this particular query but I'm also
going to get the associated data. And I'm going to get this in clear text, right? So
it's not encrypted.
So it might seem a little bit weird. Why do I want to release information? But it
turns out that we actually need this in order to handle complex queries. Okay.
So I don't know how much time I have left.
>>: [inaudible].
>> Seny Kamara: Oh, good. All right. So how do we do focused subgraph
queries on Web graphs? Okay. So actually before -- let me go -- okay. So we
view a Web graph as a combination of three things. There's a keyword index, a
graph, and then the data items. Okay? So now our message space looks like
this. Once encrypted, I send some tokens, and I get back the encryptions of the
documents associated with the focused subgraph. So we're going to build our
encryption scheme for focused subgraph queries out of two schemes, one
that handles keyword searches and one that handles neighbor queries.
So given this Web graph, right, so a bunch of Web pages with hyperlinks, the first
thing we do is we use the structured encryption scheme for neighbor queries and
we generate tokens for neighbor queries for each node of this Web graph. So
we start with the first page and we generate a token for neighbor queries; then we go
to the second one, we generate a token for neighbor queries; et cetera.
Okay. Then we use the structured encryption scheme that handles keyword
searches and we encrypt the following things: So as the data items we use the
Web pages, right, so just the plaintext data. And as the associated semi-private
information for each Web page, we use the token, right? The token for
neighbor queries. And so we do this for all the Web pages. And so this
is going to generate a ciphertext, right? Okay.
So then we use our structured encryption scheme for neighbor queries and we
use that to encrypt the graph structure of this Web graph. So we just read this as
a graph, and we encrypt it using the encryption scheme for neighbor
queries. So this is going to generate -- so now we have two ciphertexts, one that
handles keyword searches and one that handles neighbor queries. And our
ciphertext for focused subgraphs is basically just the combination of the two.
Okay. So how do we actually perform the queries on this ciphertext? So this is
what a ciphertext looks like. Our token is basically just a token for keyword
search, right? So we want a query for the word crypto, we generate a token for
crypto using the structured encryption scheme that handles keyword search. We
send this to the cloud and then the server uses that in conjunction with this
encryption scheme, right, that handles keyword searches, and this with this
token, so the ciphertext with this token is going to allow the server to figure out
which encrypted documents contain the keyword crypto, right? And because of
this associativity property it will also enable it to figure out the tokens associated
with those pages. Okay?
So that's why we actually want to release this information in clear text. So
we just -- so okay. So in this case, there's just one file that contains the word
crypto, so we get a pointer to that file and we get this token, and then we use this
token together with the encryption scheme that handles neighbor queries, right.
So now we have the encryption of the graph structure. We use this token here
with this and this is going to allow us to figure out which nodes are neighbors of
the yellow node. So basically these, one and three, okay.
So in this way, we've done focused subgraph queries which are basically more
complex than normal graph queries, right, by combining these two different types
of encryption schemes without using interaction.
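A dataflow sketch of the chaining, building on the neighbor-query sketch above (prf, neighbor_token, encrypt_graph, query). Caveat: a real associative SSE scheme would encrypt the index so that a page's neighbor token is released only on a matching keyword query; this toy index leaks everything and only shows how the pieces connect:

```python
# Toy chaining sketch; the "associative SSE" index below is an unencrypted
# stand-in, illustrative only.
def encrypt_web_graph(k1, k2, k3, pages, adjacency):
    index = {}  # PRF(keyword) -> [(page id, that page's neighbor token)]
    for i, (name, text) in enumerate(pages.items()):
        semi_private = neighbor_token(k1, k2, name)  # semi-private information
        for word in set(text.split()):
            index.setdefault(prf(k3, word.encode()), []).append((i, semi_private))
    return index, encrypt_graph(k1, k2, adjacency)   # encrypted pages omitted

def focused_subgraph_query(index, enc_graph, keyword_token):
    # One keyword token yields the matching pages plus, via the released
    # neighbor tokens, one hop into the encrypted graph -- no interaction.
    return [(page_id, query(enc_graph, tok))
            for page_id, tok in index.get(keyword_token, [])]

pages = {"a": "crypto talk", "b": "lunch notes"}
adjacency = {"a": ["b"], "b": ["a"]}
k1, k2, k3 = b"k" * 32, b"q" * 32, b"w" * 32
idx, eg = encrypt_web_graph(k1, k2, k3, pages, adjacency)
assert focused_subgraph_query(idx, eg, prf(k3, b"crypto")) == [(0, ["b"])]
```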
So if I have a little more time I'll talk about controlled disclosure.
Okay. So the typical application for structured encryption or searchable encryption is
doing private queries on encrypted data. Right? This is sort of what we're all
familiar with. And this is fine. But there may be situations where you actually
want the cloud or the server to do some computations on your data. And maybe
you don't, you know -- you don't care that much that it learns some information
about your data, right?
So I have this social network. I send it off to the cloud. And I want the cloud to
compute some very complicated function of this graph. And my data is so huge
that there's just no way I'm going to be able to use fully homomorphic encryption
or two party computation. It's just not feasible. So maybe I don't mind it learning,
you know, parts of my graph. Right?
Okay. So that could happen. Now, what if the algorithm is a local algorithm,
right? So the algorithm doesn't actually need to see all of the data, right? It only
needs to see part of the data. Right? So again, I have this social network, I want
it to compute something about my network. But really it only needs to see like a
small subset of my graph, right?
So in that case, maybe I don't want it to see the rest of my graph, right? Maybe
part of my social network is related to my family, so I don't really want them to
go snooping around there. But I want them to compute something over
the part of my social network that's related to my work, okay? So I don't mind
them learning information that's related to my work friends, but I don't want them
to learn anything about my family.
So if the algorithm is local, right, this type of security guarantee could make sense.
And so controlled disclosure is basically just a sort of mechanism that would allow
you to disclose pieces of your data, right? So you encrypt your data, you send it
off, you want the server or the cloud to perform some computation on your data.
You don't mind them learning part of the data, but you don't want them to learn
all of the data, right? So you want to be able to just disclose a subset of the data.
So here again if we have the social network, we want to encrypt it, send it, and
then I'm going to send the cloud a token. This token is going to allow the cloud to
recover basically a subgraph in this case, right, just a small subgraph, and then it
can evaluate some function F on this subgraph and send me back the answer.
Okay?
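As a rough illustration -- not the construction from the talk -- one can picture controlled disclosure with per-node keys: each node's data is encrypted under its own derived key, and a token is just the set of keys for the region the owner is willing to disclose. A minimal Python sketch, with a hash keystream standing in for a proper symmetric cipher:

import hashlib

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # stand-in cipher: XOR with a SHA-256 keystream (illustration only)
    out, ctr = b"", 0
    while len(out) < len(data):
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(x ^ y for x, y in zip(data, out))

master = b"client master key"
graph = {"work-1": b"work-2,work-3", "work-2": b"work-1", "family-1": b"family-2"}

# every node is encrypted under its own key, derived from the master key
keys = {v: hashlib.sha256(master + v.encode()).digest() for v in graph}
ct = {v: xor_cipher(keys[v], adj) for v, adj in graph.items()}

# the token disclosing only the work-related part of the network
token = {v: keys[v] for v in graph if v.startswith("work")}

# the cloud recovers exactly that subgraph and can run its function F on it
subgraph = {v: xor_cipher(k, ct[v]) for v, k in token.items()}
print(subgraph)  # the family node stays encrypted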
Okay. And so you can use structured encryption to do this kind of thing. And so,
you know, this could be useful independently in and of itself, but another
application is to these sort of cloud-based data brokerage systems. So if you
haven't heard of these, there are a few examples. One of them is being
developed at Microsoft; it's called Dallas for now. There's another service
called Infochimps.
And so the idea here is that you might have a producer of data, like the
government, for example, that produces massive amounts of data, and it wants to
store it in the cloud, right, because these are really, really massive datasets. And
other people want to use this data. They want to write applications that use
this data. And so they're going to interact with the cloud. And they're going to
pay the cloud in order to use the data, right?
So they pay the cloud a certain fee per query on this data and then they get back
just a piece that they pay for. And then they do their computation and they, you
know -- and they're happy. So that's what I'm calling a data brokerage service.
And this is actually being built. I mean, Microsoft has a service that's available to
do this.
So controlled disclosure could be used in this setting in the following way. So the
producer of the data basically encrypts its data and sends it off. And the
consumer generates a query and sends it to the producer, the owner of the data.
The owner sends back some token, which the consumer sends to the cloud. The
cloud is able to recover a subpiece of the data, whatever is actually needed for
the consumer to perform its computation, and then the consumer can just, you
know, run his app in the cloud on this data, okay?
And if you do things this way, as long as the producer is willing to remain online
and, you know, send tokens around, the producer has a stronger guarantee than
what's available now. In particular, he can actually get an accurate count of
which queries are made, or how many queries are made, on his data. And this is
even if there's collusion between the cloud and the consumer.
Okay. That's pretty much it.
[applause].
>>: The next talk is by Giuseppe Ateniese from Johns Hopkins. And he'll tell us
about cloud cryptography: giving control back to the users.
>> Giuseppe Ateniese: Thanks. So I don't have to spend much time on
motivation, since previous talks already covered where cryptography is
important for the cloud. Definitely the cloud is a big business opportunity, and it
will fly only if users feel in control of their data. So I'm going to focus on two
cryptographic primitives I've worked on that will enable this. But first let me thank
my coauthors. As I mentioned in a previous talk here at Microsoft, I'm lucky
because my last name starts with A, so I'm always listed first on all these papers.
So I will touch on topics related to provable data possession, which is joint work
with Randall Burns, Reza Curtmola, Herring, Kissner, Peterson and Dawn Song.
And I'll talk briefly, very briefly, about some new results with Di Pietro, Mancini and
Gene Tsudik, and some work with Seny Kamara and Jonathan Katz that appeared
at Asiacrypt. And then I'm going to talk briefly about proxy reencryption, which is
another cryptographic tool. And this is joint work with Kevin Fu, Matt Green and
Susan Hohenberger, and some recent work with Karen Benson and Susan
Hohenberger on key privacy.
Of course, I'm not going to focus on the details. The idea is that I'm going
to mention some ideas, and if you want to know more we can talk offline.
So the first part of the talk is about provable data possession. And it's clear that
cloud storage has several benefits. In particular, clients with limited resources can
outsource storage. The cloud provides universal access independent of location:
for instance, I can access my mail, I can access my documents. And it provides
free services like data backup, recovery, or archival. And one may argue that,
at least for the average user, cloud storage also provides more security, because
even though the services are always online and available, they are
usually monitored by professionals. So in case of attack, you know,
intrusions can be detected fairly easily in most cases. At least for the
average user, who usually doesn't choose strong passwords or encrypt
disks, this may be a good solution.
In addition, what we focus on here is mostly archival storage. So there is
legislation that requires data to be retained for several years, sometimes forever.
And the data has to be available. And again, outsourcing data to a third party avoids
the initial setup cost and avoids nightmares like maintenance and scalability.
So in particular, we focus on archives like the Library of Congress in the US.
They are supposed to keep this data available forever. And we are talking about
several hundreds of terabytes of data that have to be kept forever. So suppose,
for instance, that we want to make the content of the Library of Congress
available to every taxpayer. The question is, how can we make sure that
this information is actually stored correctly?
As you might know, the Library of Congress contains books that
people may not even know the existence of. So it might be possible for a storage
provider to, not necessarily delete, but at least put information that is never
requested by users on a secondary storage device like tape, okay, so that it's
not deleted but it's not using premium storage space, okay?
So here we are dealing with a sort of adversary that is storing a very large amount
of data, okay, and is willing to delete a percentage of this data. Again, deletion
doesn't necessarily mean destroying the data; it can mean that this data is moved
to secondary storage devices, okay, for instance.
And unfortunately we cannot necessarily trust third party cloud providers.
There's always the incentive to reduce cost and increase profit, and so it may be
that third party cloud providers discard data that is not accessed or is rarely
accessed. And it could be that there is an incident and some data is
lost, but since nobody accesses that particular information, who cares? So maybe
they will not notify the customers.
But also, you know, if you store for instance financial records and
things like that, I mean, third party providers might intentionally modify data. So
we want to avoid all these problems.
So provable data possession is this area. And it's a set of tools that allow my cell
phone -- I'm highlighting the cell phone here because I'm looking for very efficient
solutions, okay. So I don't want to use anything strange or esoteric.
So can my cell phone verify that the entire content of the Library of Congress, for
instance, is stored and available online? Okay? So let me clarify that this is a kind
of hard problem in some sense, because once I store this information in the
cloud, I don't have a local copy. So it's not like, oh, I have a backup, and then I
can check whether the information in the cloud is the same as the backup I have.
Now, the idea here is I have information on my disk, and I move this information to
the cloud, and I just have a diskless computer. Okay? So I don't have this
information anymore. Okay? So even if I don't have this information, is it
possible to verify that everything is there? So if you store the pictures of
your family, movies and stuff, can you check that the picture you took like 30
years ago is still there, even if you don't have a local copy? Okay. So that's the
question we want to answer.
So the answer is yes. And you can actually do it efficiently. So before we look at
the solution, let's first see some partial solutions that might not really work. So
suppose I store everything in the cloud and I want to check whether everything is
there. Okay? One obvious way of doing it is to retrieve the information. This, of
course, is very inefficient, because I would have to retrieve, for instance, several
terabytes of data. Okay?
So another possibility would be the following: before storing this
information, I compute several MACs on this -- let's say huge -- file.
And then I store these values on my computer. So I only have to store
very short values.
And then later, when I want to check, I go to the cloud provider and say
hey, compute this MAC on my file and give this value back to me, and I will check
whether that MAC is valid.
Now, this solution actually doesn't really work, because here we are talking about
a file of several terabytes. If I ask Google to compute a MAC on 75 terabytes,
for instance, it will take weeks. Okay? So it's not a problem with the MAC -- the
MAC is a very efficient primitive. The problem is that accessing several terabytes
requires a long time. Okay? So that's not good.
So what I could do, though, is use a probabilistic approach. So rather
than asking for the entire file, what I can do is say, okay, look, here is a key
to compute a MAC with, and, you know, compute a MAC on block number one,
block number 10, block number 50 and block number 100. So I pick random file
blocks, and I ask the cloud provider to compute MACs on those, okay. Since we are
considering an adversary that is deleting, you know, a percentage of the file -- like,
for instance, one percent -- I don't have to ask for many blocks to have a good
probability of catching a misbehaving provider. For instance, if the cloud provider
deleted one percent of my file, I can ask for about 500 file blocks if I want to have
a probability of catching the adversary higher than 99 percent.
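A quick back-of-the-envelope check of that figure: if a fraction p of the blocks is gone and the client samples c blocks (with replacement, as an approximation), the provider escapes detection with probability (1 - p)^c.

# detection probability when a fraction p of the file has been dropped
# and the client spot-checks c random blocks: 1 - (1 - p)^c
p = 0.01
for c in (100, 300, 460, 500):
    print(c, "blocks ->", round(1 - (1 - p) ** c, 4))
# about 460 blocks already give > 99%, matching the ~500 figure above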
So what the cloud provider will do is get these file blocks -- or books, in the case
of the Library of Congress -- compute the MACs under the key provided by the
client, and send back these books along with the corresponding MACs.
Now, this solution is definitely better than the previous ones, but still not
satisfactory. First of all, it's linear in the number of file blocks I query the cloud
provider for. And also, why do I need to get the books -- or the file blocks -- in
the first place? I don't need to have those just for checking. I don't need to read
those books. I just want to make sure that the cloud provider is storing those
books. That's it. Or those file blocks. Okay?
So the question will be can I check that those books are stored without
downloading the actual books? Okay? So essentially using constant bandwidth.
>>: [inaudible] a little bit confused. So is it Google that is computing -- you're
giving the MAC key to Google? I mean, if you don't trust Google, you can't use
the security of a MAC you [inaudible].
>> Giuseppe Ateniese: You store those before.
>>: So you compute --
>> Giuseppe Ateniese: Yes.
>>: You [inaudible].
>> Giuseppe Ateniese: Yes.
>>: Why not just [inaudible] you're saying Google refuses to store -- I mean,
usually --
>> Giuseppe Ateniese: You can also store -- yeah, of course. If you don't have
local storage, you can store it using, for instance, authenticated encryption and
say before you compute this, give me my answers that I stored in this safe.
Yeah, I mean.
So the idea here, though, is that we don't want to download books, okay; we
don't want to download files. We don't need to read them in order to check that
they are stored. So our target is to kind of aggregate MACs: if I have several
MACs, I want to find a way to compress them into a single value. And also, I
don't want to send file blocks at all. So I need to be able to check these MACs --
which now are just a single value -- without having the actual messages, which
sounds a little bit crazy.
So since not so many of you are necessarily cryptographers, let me just quickly
refresh your memory on what RSA signatures are, because our solution will be
based on RSA. So I have N equal to the product of two primes; in this particular
case we are using safe primes, so P = 2P' + 1 and Q = 2Q' + 1, where P' and Q'
are also prime. E is the RSA public exponent and D is the secret exponent, such
that E times D is congruent to 1 mod phi(N). And so the public key in RSA
signatures is (E, N), and the secret key is D and the factorization of N.
So to sign a message M in RSA, you use a random oracle H: you hash your
message M and raise it to the D mod N. And to verify the signature, you basically
raise the signature to the public exponent E. If you get back H of M, then the
signature is valid. This is just a standard RSA signature in the random oracle
model. Okay? And indeed, our schemes also work in the random oracle model.
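For reference, here is a toy Python version of that hash-and-sign RSA signature. The textbook parameters p = 61, q = 53 are far too small to be secure and are used only to keep the arithmetic visible.

import hashlib

p, q = 61, 53                        # toy primes (insecure; illustration only)
N, phi = p * q, (p - 1) * (q - 1)
e = 17                               # public exponent
d = pow(e, -1, phi)                  # secret exponent: e*d = 1 mod phi(N)

def H(msg: bytes) -> int:
    # random-oracle stand-in mapping the message into Z_N
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

def sign(msg: bytes) -> int:
    return pow(H(msg), d, N)         # sigma = H(m)^d mod N

def verify(msg: bytes, sig: int) -> bool:
    return pow(sig, e, N) == H(msg)  # valid iff sigma^e = H(m) mod N

sig = sign(b"hello")
assert verify(b"hello", sig)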
So let me look at a first simple solution. So suppose I have a block M_I, okay?
And I want to compute like a MAC or a tag on this block, okay? Now, this MAC or
tag will look like this: T_I = (H(W_I) times G to the M_I), raised to the D mod N.
So H is again a random oracle. I'm simplifying the notation a lot, but consider this
W_I as just the index I, okay? So it represents the position of the file block. For
instance, if I'm considering the second file block, this will be two. Okay? So I'm
hashing two. And G is an element of Z_N with very large order, and it is raised
to the actual file block. Okay? And everything is then kind of signed using RSA:
I'm raising this to D mod N. Okay?
So imagine this is our tag. So how can I design a scheme to verify that M_I is
actually stored at Google without knowing M_I? So what I can do is this
interactive protocol. So I'm asking Google, for instance, hey, I want to know if
you are storing my I-th block. Okay?
So previously, of course, I sent M_I to Google, okay, along with a tag T_I that I
computed as in the previous slide. Okay. So what Google will do is just send
back M_I and the actual tag T_I. Okay? And then I just verify that this tag is valid.
Now, here I'm kind of cheating because, as I mentioned before, I don't want to
send the message in the first place, right? But here we're just sending the
message and a kind of signature on this message. Okay? This is just a
simplification. Let's look at the case where we have many file blocks. So this is
the full solution.
So suppose I have this very long file, okay. So what I do is I store this file at the
cloud provider along with the tags, okay? And then I want to check that this file is
still there. So I use a probabilistic approach: I'm going to ask for random file
blocks. Okay? So I'm going to say to the cloud provider, hey, I want to know if
you are storing my first book, my third book, my sixth book and my seventh book.
Okay? Yup?
>>: [inaudible] storage provider from just choosing, you know, the seventh book
and the eighth book and the ninth book?
>> Giuseppe Ateniese: So you avoid this problem by using this index here, as I
mentioned before in the previous slide. So if you look at this, this is actually --
well, it's not exactly just an index, but you can assume that it specifies that this
is block number one, this is block number three and this is block number six. So
if you use different books, this verification will not pass.
So the position is specified inside these hashes. So the client says, okay, I want
to know M1, M3, M6 and M7, okay, and, by the way, I'm also sending some
random integers, which can be small, like 80 bits each. Okay? And I'm calling
these integers A1, A3, A6, A7. They are randomly generated; for instance, you
can use a seed and the random oracle itself to generate these values. So you
send the random value, a random key, to the cloud provider, and the cloud
provider uses this key to generate these integers.
Okay. So what the cloud provider will do is compute -- so it will pick all the
tags T1, T3, T6, T7 that I stored before, okay, and will compute this mod N:
T1 to the A1, times T3 to the A3, times T6 to the A6, times T7 to the A7, all
multiplied together. So you can see that this is a single value mod N. Okay?
And then, over the integers, it is going to compute a function of the file blocks I
was interested in. Okay? So it's going to compute A1 times M1, plus A3 times
M3, and so on and so forth. This is an integer, not reduced mod anything, because
of course the cloud provider doesn't know the order of the group. Okay? Which is
part of the secret key in RSA.
So the cloud provider will send these two values. As you can see -- suppose the
verification works; after the verification, the idea is that I can check that this
cloud provider is storing these files. I'm sending very short information here,
right? So T is a single value mod N, okay, and this is an integer that actually grows
very slowly, right? It's a sum, so it grows logarithmically with respect to the number
of terms in the sum. Okay? So this is essentially as big as a single block.
Okay?
So rather than sending a linear number of blocks, I'm just sending
essentially a single block. And what I do is I need to check that this message is,
for a technicality, between --
>>: [inaudible].
>> Giuseppe Ateniese: Yeah, as I mentioned before, they can be small; they
are usually 80 bits. So what I do is I verify that these tags -- this kind of
aggregated tag -- were computed correctly, okay? And so what I do is I take
whatever the cloud provider sends me, I raise it to the E, okay, and then I divide
this by -- if you remember how these T_I's were formed, right: H of the index,
times G to the file block. Okay? So what I'm doing essentially is I want to
remove the H part from this value. Okay?
So I'm removing these H parts from this, and what I'm left with is G to the M.
Okay? If this verification passes, then I can claim that those file blocks were
actually stored there at the server, even though I don't have these file blocks in
the first place. Okay? Yup?
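Putting the pieces together, here is a minimal end-to-end sketch of the protocol just described, reusing the toy RSA parameters from above. The element g is just a small stand-in for an element of large order mod N, and the check is written in the multiplied-out form T^E = (product over i of H(i)^{A_i}) times g^M mod N, which is equivalent to dividing out the H parts as described.

import hashlib

p, q = 61, 53                        # toy RSA again (insecure; illustration only)
N, phi = p * q, (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)
g = 2                                # stand-in for an element of large order mod N

def H(i: int) -> int:
    # random-oracle stand-in: hash the block index into Z_N
    return int.from_bytes(hashlib.sha256(str(i).encode()).digest(), "big") % N

def tag(i: int, m: int) -> int:
    # T_i = (H(i) * g^{m_i})^d mod N
    return pow(H(i) * pow(g, m, N) % N, d, N)

# client: tag every block, outsource blocks and tags, keep only (N, e, g)
blocks = [5, 9, 4, 7, 2, 8, 6]       # toy "file blocks"
tags = [tag(i, m) for i, m in enumerate(blocks)]

# challenge: random block indices with small random coefficients A_i
challenge = {0: 3, 2: 8, 5: 5}       # {index: A_i}

# server: aggregate the tags mod N, combine the blocks over the integers
T, M = 1, 0
for i, a in challenge.items():
    T = T * pow(tags[i], a, N) % N
    M += a * blocks[i]               # no modulus: the group order is secret

# client: T^e must equal (prod_i H(i)^{A_i}) * g^M mod N
rhs = pow(g, M, N)
for i, a in challenge.items():
    rhs = rhs * pow(H(i), a, N) % N
assert pow(T, e, N) == rhs           # the challenged blocks are still stored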
>>: [inaudible] M is less than E?
>> Giuseppe Ateniese: Right. I mean, it's important because it actually matters
in the security proof. If M is bigger than E, you can cheat, because you can
pick -- so in order to prove security, what you do is you have this T to the E, so
the simulator, you know, simulates the game with the adversary and will
create like a fake file and tags. And then suppose there is an adversary
that will come up with a forgery, okay, so it will come up with some T prime such
that T prime will pass this verification even though the files are not there, okay?
Then you divide T to the E by this value -- so the idea is that you want to
make sure that you get on one side of the equation this quotient of T and T prime
raised to the E, and on the other side of the equation something that is smaller
than E, so that you can use Shamir's trick. Sorry, this is slightly technical.
>>: [inaudible] because the lengths of M --
>> Giuseppe Ateniese: Yeah, I mean, C is fixed a priori; C, the number of
challenges, is fixed a priori.
>>: [inaudible].
>> Giuseppe Ateniese: Yeah. And E is a very large prime number. Yeah.
And by the way, this works -- sorry, I'm talking [inaudible] again, but this works
even when the RSA exponent E is not less than N. So in this case in
particular, E is much bigger than N -- it can be much bigger than N. So
technically, this is an extension of RSA. But it's a kind of RSA with a big
exponent.
So question?
>>: [inaudible].
>> Giuseppe Ateniese: Oh, no. No. I was talking about the number of challenges.
So here I'm asking for four challenges, right? Depending on the application you
may need 400, for instance, rather than --
>>: [inaudible].
>> Giuseppe Ateniese: You can repeat this as many -- it's unbounded. Yup?
>>: How strong can [inaudible] could be AB [inaudible] so it could be like
[inaudible].
>> Giuseppe Ateniese: No, this has to be like 80 bits.
>>: [inaudible].
>> Giuseppe Ateniese: Well, because in the end what you want to prove is that
if there is a cheating adversary, you can extract the actual file, okay? So what
you do is -- there is an extractor that queries the adversary, without rewinding,
several times until it gets a series of equations that are linearly independent,
and then you solve the equations and you get the messages back.
>>: [inaudible].
>>: [inaudible].
>> Giuseppe Ateniese: Okay. So the features of this scheme: well, first of all,
there is a nice side effect here. I don't know if you noticed it. If the client
can check that these file blocks are actually stored there without having the file
itself, which is great, right, then everybody can do it. I mean, I can do it even if
I'm not the data owner and have no idea of the M_I's -- even if the verifier
doesn't have any idea of the M_I's. Okay?
So actually, this is what we call public verifiability, okay? So everybody can
actually check that, say, Google is storing the Library of Congress, even
though we have no idea what the Library of Congress is storing at Google.
Okay? So everybody can do it. So that's a great feature for many,
many, many applications. I should mention that after this work, I think we
counted like 60 publications in this area. So it's becoming quite
interesting and popular.
Another feature is that there's an unbounded number of challenges. Of course
you can run this challenge as many times as you want. I'm emphasizing this
because previous solutions provided just a limited number of challenges, and also
previous solutions required the data to be encrypted, okay? We don't necessarily
have to encrypt the data; it could be just public data. Yup?
>>: I'm confused by your latest statement. So --
>> Giuseppe Ateniese: Public verifiability?
>>: Yes. Because I could do the same game with another library and [inaudible]
what would [inaudible] so this is the Library of Congress that I'm checking.
>> Giuseppe Ateniese: Well, yeah. Remember that this is public -- the RSA
public parameters are given to you by the Library of Congress. So the Library of
Congress is saying --
>>: [inaudible].
>> Giuseppe Ateniese: Yes. You are the taxpayers. This is E and N. Please,
you know, once every two weeks, check that Google is storing everything for us.
You know. So the idea is that now people are using this in order to outsource
this kind of verification. Because now everybody can check. So even an auditor
can check that those files are stored there, even though the auditor has no idea
what this data is about.
And think about liability and financial records. Sometimes, you know, you
necessarily have to outsource to an auditor who can check that these financial
records are stored correctly and that nobody's modifying them. But for
privacy reasons, before a certain date they cannot be released, so you want to
provide also some form of privacy protection. Yes?
>>: [inaudible] would it be fair to say [inaudible] in the Library of Congress
[inaudible] a public verification key and basically it gives to the Google a signed
copy of whatever [inaudible] then basically what Google does it basically
[inaudible] proofs [inaudible] is.
>> Giuseppe Ateniese: Right.
>>: [inaudible].
>> Giuseppe Ateniese: You can but remember that here we are talking about a
very large file. So when you do these proofs, right.
>>: Right. But it didn't give us [inaudible] so it's [inaudible].
>> Giuseppe Ateniese: Right. But the problem is --
>>: [inaudible].
>>: Software from using PCP and the random oracle [inaudible].
>>: And in a sense it's more efficient [inaudible].
>> Giuseppe Ateniese: Right. Because the problem is you don't want to access
the whole file. Unfortunately -- when you want to prove --
>>: [inaudible].
>> Giuseppe Ateniese: Right. But there you pay like logarithmic, and here it's
essentially constant. Yeah, I mean, also one thing we want to emphasize is that
we wanted to look at something that was, you know, essentially constant in size.
So independent -- yeah?
>>: So if you [inaudible] make sure to detect, let's say, one percent loss of data.
If you are like below the ECC layer, can you do better and make sure that data
cannot be lost, because --
>> Giuseppe Ateniese: Yeah. I mean, again, for efficiency reasons here we are
just focusing on detecting corruption of a percentage of the file. Okay? If you
want to recover your data, that's a different problem. You can apply error
correction codes, of course. But then it becomes less practical for several
reasons; in particular, we are also interested in dynamic data. See, here I was
talking about archival storage. This is kind of static: you can only add data, you
can never delete information, okay? And these protocols using error
correction codes are fine as long as the data is static.
But if you want to change file blocks -- if you want the ability to modify a picture
that you stored three years ago, say you want to apply some filter and then store
the new version of the picture -- then it becomes extremely complicated and
inefficient if you use error correction codes.
>>: [inaudible] underlying assumptions are RSA.
>> Giuseppe Ateniese: Yes, it's RSA, but in the random oracle model. It was
argued before that the random oracle doesn't exist. I think it's debatable,
because you can always think of like a remote server that computes the hash
for you. But yeah, I mean, unfortunately -- we do have some results.
>>: [inaudible].
>> Giuseppe Ateniese: No, no, it's standard RSA. Yes.
>>: [inaudible].
>> Giuseppe Ateniese: Yeah. We have a -- yeah. This is the paper I wanted to
mention here. Well, first I want to mention that if you don't care about an unbounded
number of challenges, and if you don't care about public verifiability, then
together with Gene Tsudik and researchers in Rome, we found a very efficient way
of doing this that is extremely fast and uses only hash functions. Okay?
So the intuition -- I'm not going to go into details, but the intuition there is that you
kind of precompute all the responses that you would expect from Google or any
cloud provider, and you store these at Google itself, encrypted using, for instance,
authenticated encryption. Okay?
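One way to read that intuition as code -- a hypothetical sketch, not the scheme from that paper -- is to precompute a bounded number of MAC challenges over the file, authenticate the expected answers under a master key, park them at the provider, and later reveal one challenge key at a time. Here an HMAC over each stored answer stands in for full authenticated encryption:

import hmac, hashlib, os

file_data = b"the outsourced archival file" * 1000
master = os.urandom(32)                  # the only long-term client secret

# precompute a bounded number of challenge keys and expected answers
chal_keys = [os.urandom(16) for _ in range(5)]
answers = [hmac.new(k, file_data, hashlib.sha256).digest() for k in chal_keys]
# authenticate each answer under the master key; park everything at the provider
stored = [(a, hmac.new(master, a, hashlib.sha256).digest()) for a in answers]

# challenge number i: reveal chal_keys[i]; the provider recomputes the MAC
# over the file it actually holds and returns the stored record
i = 2
response = hmac.new(chal_keys[i], file_data, hashlib.sha256).digest()
answer, mac = stored[i]
assert hmac.compare_digest(mac, hmac.new(master, answer, hashlib.sha256).digest())
assert hmac.compare_digest(response, answer)   # the file is (still) intact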
So we have this very efficient solution. So the complexity of the previous solution
comes from the fact that we are also interested in public verifiability and an
unbounded number of challenges.
But one may argue that you don't really need an unbounded number of challenges
anyway.
In recent work here at Microsoft with Seny Kamara and Jonathan Katz, we
actually found a generic way of transforming what we call homomorphic
sigma protocols -- sigma protocols that have a certain
homomorphic property -- into PDPs. And this transformation doesn't use random
oracles. However, we need the random oracle for the actual construction. But
thanks to this compiler, now we can pick any sigma protocol, as long as it
satisfies this homomorphic property, and build a PDP. For instance, we built a
PDP based on factoring. And it turns out that this is as efficient as the RSA
version. So this shows the benefits of this compiler.
>>: [inaudible].
>> Giuseppe Ateniese: So the transformation doesn't use any random oracle.
>>: Does it [inaudible].
>> Giuseppe Ateniese: Yeah. But then the actual sigma protocol
might require the random oracle. So in this result, for instance, we
started from a sigma protocol that has these properties but that uses the
random oracle. And then we applied our compiler. So in the end, we had to rely
on the random oracle. But it's an interesting open question to start from a
sigma protocol with the homomorphic property that works in the standard model.
Okay? We haven't found any.
>>: [inaudible].
>> Giuseppe Ateniese: No.
>>: What is the technical difference between [inaudible].
>> Giuseppe Ateniese: Yeah. That too is kind of [inaudible]. You can see proofs
of retrievability as PDP -- so this technique -- applied to a file that is encoded using
an error correction code or erasure codes. I mean, this is a very rough equivalence.
But it's not exactly that.
>>: [inaudible].
>> Giuseppe Ateniese: Yeah. I mean, suppose my file is encoded using an error
correction code first, before I apply the PDP scheme to it. Okay? Then of course
if I want to retrieve files, right, I retrieve the good ones and I recover the bad
ones using the error correction code. But that's a rough -- as I call it -- there are
distinctions, very subtle distinctions.
>>: [inaudible].
>> Giuseppe Ateniese: We can talk offline. But more or less it's a PDP using
erasure codes first. So you encode your file and then you apply the [inaudible].
>>: [inaudible].
>> Giuseppe Ateniese: I mean, it's different -- for us, we don't care about
retrievability. So I don't want to prove that my scheme satisfies the retrievability
property. I just care about detection, mostly because of efficiency. We focus on
more efficient schemes.
Okay. So let me focus on the other interesting primitive, which I think is very
important for cloud crypto in general: proxy reencryption. It provides a
way to -- yup?
>>: [inaudible].
>> Giuseppe Ateniese: Yes, this was the question.
>>: [inaudible].
>> Giuseppe Ateniese: So basically -- POR is basically PDP plus erasure codes,
more or less. I mean, just to see it [inaudible].
>>: [inaudible].
>> Giuseppe Ateniese: Yeah. So what is proxy reencryption? And why is it
useful for the cloud? So proxy reencryption is basically the following. So
suppose I have a message encrypted for Alice. Say we have this mail server --
so this encrypted e-mail goes to the mail server, right? The mail server forwards
this to Alice. Now suppose Alice goes on vacation, and she asks Bob
to read her e-mails while she's away. Okay? So ideally we would like to have
a mail server, okay, that gets this encrypted e-mail under Alice's public key,
decrypts it, okay, and reencrypts this information using Bob's public key. Okay?
So that would be a solution.
The problem with this solution, though, is that the mail server first has to know
the secret key of Alice, okay? And so the mail server will see the actual
message. Okay? So that's not a satisfactory solution.
So in proxy reencryption, what we want to do is the following. We want to find a
way to provide some proxy key to the mail server, okay, that is used to translate
between encryptions. So think of proxy reencryption as a way to translate between
different languages. So this is a message encrypted for Alice, and this is the
translator: it can translate it into something that only Bob can understand.
Now, the nice thing about proxy reencryption is that from this information provided
to the proxy there is really no way to recover the message, okay? So the proxy
doesn't have to know the secret key of Alice, okay? And at the same time it will
not be able to actually read the message. So that's the ideal solution.
Now, this concept was introduced at Eurocrypt '98 by Blaze, Bleumer and
Strauss, and they provided a very basic protocol for, let's say, El Gamal.
Okay? So basically the idea is very simple. Suppose Alice has public key G to
the A, okay? And Bob has public key G to the B. So in order to encrypt using El
Gamal, what I can do is pick a message M in the group generated by G and
multiply M by G to the R, where R is random. So G has to generate a group of
prime order, and R is chosen in Z_Q, where Q is this prime order. Okay?
And then I also release G to the RA. This is actually a variant of El Gamal, but it's
equivalent. And I want to come up with something that will allow me to go from
here, right, which is an encryption for Alice, to here, which is an encryption for
Bob. Okay? The way they do it is by releasing B divided by A mod
Q, okay, where Q is this prime number.
So the proxy just has B divided by A, so it doesn't know A or B, okay; it
doesn't know the secret keys. And the proxy can raise this to B over A to actually
get G to the RB. So it will copy the first component entirely, okay, and it computes
this operation on the second component of the El Gamal ciphertext.
So if this was a proper encryption for Alice, this is also a proper encryption for
Bob. Okay?
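A toy Python version of that bidirectional scheme, in the order-11 subgroup of Z_23* (parameters obviously insecure, chosen only so the numbers stay small):

import random

P, q, g = 23, 11, 4          # P = 2q + 1; g generates the subgroup of order q

a = random.randrange(1, q)   # Alice's secret key (public key g^a)
b = random.randrange(1, q)   # Bob's secret key   (public key g^b)

m = pow(g, 7, P)             # message encoded as a subgroup element

# encrypt to Alice: (m * g^r, g^{r*a})
r = random.randrange(1, q)
c1 = m * pow(g, r, P) % P
c2 = pow(g, r * a % q, P)

# proxy key b/a mod q; note it works in both directions (bidirectional)
rk = b * pow(a, -1, q) % q

# the proxy copies c1 and raises c2 to b/a: (g^{ra})^{b/a} = g^{rb}
c2_bob = pow(c2, rk, P)

# Bob strips his key from the exponent and divides out g^r
g_r = pow(c2_bob, pow(b, -1, q), P)
assert c1 * pow(g_r, -1, P) % P == m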
Well, there are several issues with this protocol, though. First of all, it's
bidirectional. That means that if Alice goes on vacation, okay, and the proxy has
this key, of course encryptions for Alice can be transformed into encryptions for
Bob -- but also the other way around: encryptions for Bob can be transformed
into encryptions for Alice. Remember that this is mod Q and Q is prime, so every
element has an inverse.
Another problem is that generating the proxy key requires interaction: in order to
generate B divided by A, I need both Bob and Alice to be online and participate.
And if one of the two colludes with the proxy, the proxy can get the other's secret
key, which is really kind of bad. And there's also transitivity: if the proxy has
several keys, for instance, from Alice to Bob and from Bob to Charles, then the
proxy itself, without ever interacting with the users, can generate a key from Alice
to Charles, even though they may not know each other.
>>: [inaudible] if I have A to B I can transform an A ciphertext to a B ciphertext
and --
>> Giuseppe Ateniese: Oh, you may want something like A to B, and then you
have B to C, but not necessarily a relationship A to C.
>>: [inaudible]. Because A to B and B to C let you transform A ciphertext to B
ciphertext which is exactly the functionality of B to C, right?
>> Giuseppe Ateniese: Yeah, yeah. I mean, in principle, yes. But you can find
versions of this protocol where you actually cannot do this. So you can say, I
want to be able to go from A to B and from B to C, but I do not
necessarily want A to C. I don't want the transitivity property. And you can
actually achieve that. So this property may be good for some applications but
actually bad for others.
>>: Are you saying that the transformation from A to B produces some kind of
special ciphertext, so just if you --
>> Giuseppe Ateniese: Yes. And partially -- yeah. I mean, all these solutions
have what we call first- and second-level encryptions, which are distinct, yeah.
The protocol is kind of asymmetric. So of course it's a big area. We also
defined the security of these schemes. And in order to solve this problem, which
seems hard, we kind of used the magic wand in cryptography, bilinear maps, and
since they were introduced before, I don't have to introduce them here. So we
use a bilinear map E, a symmetric map from G1 times G1 to G2, and we
publish as before, like in El Gamal, G to the A and G to the B as the public keys
of Alice and Bob, and we publish Z, which is the pairing of the generator with
itself -- a generator of G2, which lives in what's called the extension field. And we
encrypt messages in the extension field -- in G2. Sorry.
So here comes the asymmetry we mentioned before. So suppose I want to
encrypt a message for Alice such that the proxy cannot translate it. So only Alice
can read this encryption, okay? Then I just compute like standard El Gamal, but
in G2: so M times Z to the K, and Z to the KA. And this is just standard El Gamal:
Alice just removes the A from the exponent, divides by Z to the K, and gets M back.
Suppose now instead I want to encrypt a message for Alice so that Alice can
read it, of course, but if she is on vacation and has delegated someone else,
okay, then the proxy can translate it. And the idea is that now we send M times
Z to the K and G to the KA. Okay, where G now is a generator of G1, not of the
extension field anymore. And now we provide to the proxy this key, G to the B
divided by A. Now, notice that B divided by A is in the exponent, okay? So
informally it's protected. It's not available anymore to the proxy. Okay?
And also notice that in order to compute this G to the B divided by A, Bob doesn't
need to be online at all. It's not an interactive scheme anymore, because I can
start from the public key of Bob, G to the B, which is available by definition.
Okay? So we solved like two, three problems at the same time.
And then, very simply, in order to transform this encryption for Alice into an
encryption for Bob, the proxy will just compute the pairing of G to the KA and
G to the B divided by A, and you get exactly Z to the KB -- which is, of course, a
properly formed El Gamal encryption for Bob.
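Collecting the values just described, with k as the encryption randomness, the whole pairing-based scheme fits in a few lines (a LaTeX summary of the equations above):

\begin{aligned}
&\text{setup: } e : G_1 \times G_1 \to G_2, \quad Z = e(g, g), \quad pk_A = g^a, \quad pk_B = g^b \\
&\text{first level (only Alice): } (M \cdot Z^k,\ Z^{ka}) \\
&\text{second level (translatable): } (M \cdot Z^k,\ g^{ka}) \\
&\text{proxy key: } rk_{A \to B} = g^{b/a} \\
&\text{translation: } e\big(g^{ka},\ g^{b/a}\big) = Z^{kb}, \ \text{giving } (M \cdot Z^k,\ Z^{kb}) \\
&\text{Bob decrypts: } M = (M \cdot Z^k) \big/ (Z^{kb})^{1/b}
\end{aligned}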
Okay. There are a lot of interesting protocols here. Let me mention why this is
useful in the cloud scenario. These, actually, I think are Ken's slides.
So we have different clients and we want to use a storage provider, right? So
we want to encrypt all these files. As it works right now, suppose these clients
have to share files, okay? Suppose we have Google docs that are encrypted,
right, and we want to share these files. Current solutions to this problem
require what is called a key server, okay? So the key server encrypts all
these files, okay? And then if I want to access a certain file, I will have to ask
the key server for the key and say, hey, can I decrypt this file? Okay?
And the key server will say, well, let me see if you have access rights to this file;
I will ask the owner if I should provide you with this information. Okay?
And this works pretty well. The only problem here is that we have to trust the key
server. The key server in principle could read all the files, because the key server
owns the actual keys. Okay? But these are the current solutions. And also the
key server is always online, so it's always a single point of attack, okay? So if you
compromise the key server, you get all the keys. So how can I use proxy
reencryption here? Well, it's very simple.
You have a question?
>>: Yes. Can we go back to the [inaudible]. So it looks like there are two ways
to encrypt, for -- if I want to encrypt something to [inaudible].
>> Giuseppe Ateniese: Yes.
>>: [inaudible].
>> Giuseppe Ateniese: Well, the message is always the same. The first part of
the encryption doesn't change. What changes is the second part. So it's the
decryption algorithm that is different in one case versus the other. But the
message is always the same.
>>: [inaudible].
>> Giuseppe Ateniese: In this case?
>>: [inaudible].
>> Giuseppe Ateniese: Okay. So in the first case it's just plain El Gamal. So
what you do is you remove your secret A from the exponent and you divide these
two, right, and you get M back. In the second case, you can simply remove the
A, compute the map -- the pairing -- so you get to Z to the K, right, using the
map, and then divide by Z to the K again.
>>: You have to know which [inaudible] in the first group or in the second group?
>> Giuseppe Ateniese: Yes, because it's implicit -- there are two distinct groups,
and you specify whether this is a first-level encryption or a second-level encryption
when you encrypt. So the first-level encryption is just for people who want to send
a message, say, to Alice, and there is no way for anybody else to read this message,
even if there are proxies set up by Alice. Okay? The second one says, well, you
know, if Alice delegated someone, it's fine by me. The important thing is that
Alice gets the message.
>>: [inaudible].
>> Giuseppe Ateniese: Well, unfortunately we have this asymmetry somehow, but
it would be an interesting problem to find unidirectional ones, because, you know,
consider that this scheme is now not bidirectional. For instance, I can only go from
Alice to Bob, I cannot go back. I cannot translate encryptions from Bob to Alice.
Okay. But we can talk offline. I have two minutes.
So I was here. I was mentioning -- so what you do, then, if you don't want to
reveal your key to the key server, is the following: I encrypt my file using, for
instance, AES in CBC mode with a symmetric key, and then I encrypt the
symmetric key using my public key. And then I store this in the cloud
storage, okay?
Now, suppose Bob wants to access this file. Okay? So what I can do is send a
reencryption key from A to B to the cloud provider. Now, all the cloud
provider can do is translate: it will pick the encrypted symmetric
key and convert the encryption from A to B, but by the properties of proxy
reencryption it won't be able to learn the underlying message -- that is, the
underlying key -- and it won't be able to learn any secret key of Alice or Bob. Okay?
And so Bob will be able to decrypt this information and use this key to decrypt the
actual file using AES, symmetric encryption. Okay?
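A sketch of that hybrid flow in Python, reusing the same toy group and BBS-style translation from the earlier snippet; a hash keystream stands in for AES-CBC, and all parameters are illustrative only:

import hashlib, random

P, q, g = 23, 11, 4                       # toy group again (insecure)

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # stand-in for AES in CBC mode: XOR with a SHA-256 keystream
    out, ctr = b"", 0
    while len(out) < len(data):
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(x ^ y for x, y in zip(data, out))

a, b = random.randrange(1, q), random.randrange(1, q)  # Alice's / Bob's keys

# Alice: pick a random group element K and derive the file key from it
K = pow(g, random.randrange(1, q), P)
file_ct = xor_cipher(str(K).encode(), b"the shared document")

# wrap K under Alice's public key: (K * g^r, g^{r*a})
r = random.randrange(1, q)
wrap = (K * pow(g, r, P) % P, pow(g, r * a % q, P))

# the cloud holds rk = b/a and translates only the wrapped key
rk = b * pow(a, -1, q) % q
wrap_b = (wrap[0], pow(wrap[1], rk, P))

# Bob unwraps K and decrypts; the cloud never saw K or the plaintext file
g_r = pow(wrap_b[1], pow(b, -1, q), P)
K_bob = wrap_b[0] * pow(g_r, -1, P) % P
assert xor_cipher(str(K_bob).encode(), file_ct) == b"the shared document"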
And so this is very cool. Actually, a similar system -- I don't know if you
remember, there was this attack on Apple iTunes in the past. The problem was
that they were trying to convert an encryption from a certain user to another user,
and it was attacked because, you know, this key was actually stored inside the
program. If they had used proxy reencryption in this way, they would have
avoided this problem in the first place.
We have some recent results, but I'm not going to talk about them. I just want to
mention that one property we provided recently is what we call key privacy for
proxy reencryption. And the idea here is that the proxy will not learn even the
participants by looking at the proxy key. Okay? It will not learn the identities of
Alice and Bob. And this has several interesting applications.
To conclude: again, cloud cryptography is a great opportunity for research. And
for the first time, I think, cryptography is really essential to the cloud provider. I
mean, cloud providers will fail if they don't provide a way to give users control
back over their own data. People will never really use cloud storage
or cloud providers if they don't feel they have control over their data. So this time
it seems like crypto can make a difference.
I hope I convinced you that provable data possession is cool. I think proxy
reencryption is good in general, and it's a good way to provide access rights in the
cloud. And let me just briefly mention three interesting open problems. One is to
provide PDP with full privacy. So, as I mentioned, I can check that a file is stored
at Google, for instance, without having the file in the first place; but
during these proofs I can actually leak some information about the actual file
blocks. It would be nice to provide complete zero knowledge. I actually should
mention that with Seny Kamara and Jonathan Katz we have some results on this
front, so now we can have full privacy in PDP schemes, which is related to a kind
of leakage-resilient signature scheme. It would also be nice to find efficient PDP
schemes for multiple storage servers. And for the proxy reencryption part, it would
be nice to find efficient unidirectional and multi-hop proxy reencryption schemes.
I should mention that if you use fully homomorphic encryption, you can build
unidirectional multi-hop reencryption schemes. So theoretically a solution is known.
The emphasis is on efficient, and possibly key-private. So it's still open to find
an efficient solution for that. Thanks.
[applause].
>>: [inaudible] time for one question here.
>> Giuseppe Ateniese: You are not supposed to ask anything.
>>: [inaudible] so I [inaudible] file systems, and I talked to Susan and I said, hey
Susan, do you know any constructions in the symmetric-key world that can do
this kind of [inaudible] encryption? And has it been shown now that that's
impossible, or [inaudible] what's the [inaudible].
>> Giuseppe Ateniese: No. The latest result on this is a paper from the folks at
Columbia on this. But it's a very limited protocol for symmetric -- so the
question is, can we do proxy reencryption using symmetric primitives? For instance,
take an encryption under a certain AES key K1 and
transform it into an encryption under AES with a key K2.
>>: What's the role of the server then?
>> Giuseppe Ateniese: It's just a proxy.
>>: This setting changes doesn't it?
>> Giuseppe Ateniese: Yeah. I mean, suppose you have a symmetric
encryption and you want to transform it into an encryption under a
different key, without knowing the actual message. There has been some work
in the past, but nothing really that provides all these features; so far you can find
this only in public-key cryptography.
But there is no result that says this is not possible. It looks unlikely, also
because it's hard to prove anything.
>>: Public key [inaudible].
>> Giuseppe Ateniese: No, no, the question is if I have --
>>: Of course.
>> Giuseppe Ateniese: Yeah. But then I would have to use number theory. So
the question is more like not just symmetric encryption, but efficient symmetric
encryption using AES or DES or symmetric ciphers.
>>: It's not even clear what impossible means, right?
>> Giuseppe Ateniese: Right. Exactly. But, yeah, I mean, not using number
theory to achieve --
>>: [inaudible].
>> Giuseppe Ateniese: So efficient PRFs or PRPs, or block ciphers in general.
So is it possible to do it? We don't know. I mean, there are some partial results
from these guys at Columbia.
>>: Okay. Let's thank the speaker again.
[applause]