>>: The first speaker is Emily Shen. She's a grad student at MIT, and she's here
with us as an intern this summer. And she'll tell you about evaluating predicates
privately over encrypted data.
>> Emily Shen: Okay. Thanks. I'm going to talk about evaluating predicates
privately on encrypted data. And this is joint work with Elaine Shi and Brent Waters
which appeared in TCC last year.
Okay. So in a traditional encryption scheme, you can encrypt data under a
public key or a secret key. And then only the owner of the secret key can decrypt
the data. So in a sense decryption is sort of all or nothing. If you have the secret
key, you can decrypt the message, and if you don't have the secret key, you can't
get any information.
But in some applications --
>>: [inaudible].
>> Emily Shen: I don't know. Now it is. But in some applications what we want
is more fine-grained control over who can access the data and what can be
learned about the data.
So for example, you can imagine that a user is receiving e-mail encrypted under
her public key. And depending on certain -- whether certain predicates are true
on the e-mail, she wants to route the e-mail accordingly either to her phone or
her desktop. So if the subject line includes the word urgent, then she wants the
e-mail to be routed to her phone and otherwise just sent to her desktop.
So the user wants to be able to give the server a token corresponding to these
predicates and let the server test these things but not learn any other information
about the e-mails.
As another example, you can imagine that a user is storing her files encrypted on
the cloud and at some later point she wants to retrieve all the files satisfying a
certain predicate; for example the category equals work and the subject line
includes Microsoft.
So predicate encryption is a new encryption paradigm which gives us this more
fine-grained access control. And, okay, what does a predicate encryption
scheme look like? We have four algorithms. A setup algorithm which gives us a
public key and a secret key. And an encryption algorithm which takes a public
key and a message and returns a ciphertext, a generate token algorithm which
takes the secret key and the description of a predicate and gives us a token for
that predicate.
So now we have a query algorithm that takes a token for a predicate F and a
ciphertext for a plaintext X and it will return one if the predicate F evaluated on X
is true and 0 otherwise.
So we can also attach a payload message -- so that you're encrypting
a message along with X, and the query algorithm will return the message if the
predicate is true and nothing otherwise. But for a construction it's simpler to just
consider a predicate-only version of these encryption schemes.
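As a reading aid, here is the four-algorithm interface as a minimal Python sketch. The names are illustrative, and the toy bodies keep everything in the clear, so this models only the functionality, not the security:

```python
# A toy model of the predicate-only interface described above. Hypothetical
# names; nothing here is hidden, so this captures functionality only.
from dataclasses import dataclass
from typing import Callable

Predicate = Callable[[object], bool]

@dataclass
class Token:
    f: Predicate      # a real token would hide f

@dataclass
class Ciphertext:
    x: object         # a real ciphertext would hide x

class PredicateEncryption:
    def setup(self):
        return "pk", "sk"                     # public key, secret key

    def encrypt(self, pk, x) -> Ciphertext:
        return Ciphertext(x)

    def gen_token(self, sk, f: Predicate) -> Token:
        return Token(f)

    def query(self, tok: Token, ct: Ciphertext) -> int:
        return 1 if tok.f(ct.x) else 0        # 1 iff f(x) is true

# The e-mail routing example: route to the phone iff the subject says urgent.
pe = PredicateEncryption()
pk, sk = pe.setup()
tok = pe.gen_token(sk, lambda mail: "urgent" in mail["subject"])
ct = pe.encrypt(pk, {"subject": "urgent: server down"})
assert pe.query(tok, ct) == 1
```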
So predicate encryption is related to a lot of other encryption paradigms which
we all have heard of. For example, identity-based encryption can be seen as
a special case of predicate encryption where the predicates correspond to
equality testing.
So in identity-based encryption each person's public key is a string
corresponding to their identity, for example, their e-mail address. You can
encrypt under that public key and only the person who has the key for that string
can decrypt the message.
In attribute-based encryption, a user can receive a capability for a policy over the
attributes -- over attributes of encrypted data. And so a ciphertext can be
decrypted if a user has a key for a policy that is satisfied by the attributes of that
data.
But one important difference is that in IBE and ABE the identity and the attributes
are not hidden, whereas in predicate encryption, the attributes are encrypted and
hidden.
So previous work on predicate encryption has looked at a security notion which
we'll call plaintext privacy. Roughly, what this says is that ciphertexts should reveal
nothing about the encrypted plaintexts beyond what is revealed by the evaluation
of predicates on them.
So a little bit more formally. If an adversary has tokens for predicates F1 through
FM, then a ciphertext for a plaintext X should reveal nothing about X other than
what the adversary can learn by evaluating F1 of X through FM of X.
And there's a whole series of works that achieve predicate encryption with
plaintext privacy for various predicates, starting with equality testing and then
conjunctions and disjunctions of equalities. And most recently the evaluation of
inner product queries, which was done by Katz, Sahai and Waters.
So one line of research looks at expanding the expressiveness of these schemes
and expanding the class of predicates that we can achieve. But in our work we
look at a different aspect, which is the security definitions. And we ask whether
in addition to plaintext privacy we can also achieve predicate privacy. So what
do I mean by this?
In our example of encrypted e-mail routing, I said that if the subject line includes
the word urgent you want the e-mail to be forwarded to your phone and
otherwise you want it to be forwarded to your desktop.
But probably the user wants to hide these e-mail routing rules from the server.
You don't want the server to know what predicates you're testing.
And in the encrypted file storage example, when you retrieve all files satisfying
some criteria, you want to be able to hide these search criteria from the server.
So it turns out that predicate privacy is actually difficult to capture in the public
key setting. And the reason for this is that in the public key setting, the
adversary can encrypt any messages of his choice.
So for example, in this encrypted file storage case, if the server has a guess that
the predicate you're searching on is that the subject includes Microsoft, then the
server can just test this by encrypting a message where the subject is Microsoft
and testing whether the token successfully decrypts that message.
So for this reason we focus on the symmetric key setting. And we come up with
a scheme for predicate encryption that provides predicate privacy in the
symmetric key setting.
>>: [inaudible].
>> Emily Shen: Yeah?
>>: [inaudible] outputs X and two functions [inaudible] such that F1 of X equals
F2 of X, and then the example Microsoft disappears. So essentially you will
just [inaudible] functions F1, F2 and an input X such that F1 of X is equal to F2
of X. Right? And then you -- you will produce a [inaudible].
>> Emily Shen: So if you imagine an indistinguishability game, the adversary is
going to say here are two predicates that I think I can distinguish tokens for. And
now it's going to get back tokens for those predicates, but you can encrypt
messages that help you distinguish between the two predicates.
Okay. So a symmetric-key predicate encryption scheme looks pretty much like
the public key version, but now our setup algorithm returns just a secret key. You
need the secret key to encrypt a message, and to generate a token you need the
secret key and a description of the predicate. And again, the query algorithm is
going to return one or zero, depending on the result of the predicate on the
plaintext.
Okay. So let's see what the definition of predicate privacy looks like a little bit
more formally. We're going to define this in terms of a game between an
adversary and a challenger. And in the first step the challenger runs the setup
algorithm, generates a secret key and keeps that to himself.
Then the adversary goes through a query phase where it can make queries that
are of one of two types: it can make ciphertext queries and token queries. So in
a ciphertext query, it's going to output a plaintext XI and get back an encryption of
XI. In a token query it will output a predicate FI and get back a token for FI.
So you can run several of these queries and issue them adaptively. And then
when the query -- when the adversary wants to be challenged, it will output two
predicates F star 0 and F star 1 that it thinks it can distinguish between. And this
is subject to the restriction that these two predicates have to have the same
value on all of the plaintexts that were queried so far.
Now, the challenger is going to flip a random bit B and give back a token for F star
B. And the adversary can ask some more queries, subject to the same
restriction as before. And finally the adversary outputs a guess B prime of this bit
B. And it wins if B prime equals B. And we say that our scheme has predicate
privacy if no polynomial time adversary can win this game with more than
negligible probability.
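In pseudocode, one run of this token-challenge game looks roughly like the following; the scheme and adversary objects are hypothetical stand-ins, not notation from the paper:

```python
import secrets

def predicate_privacy_game(scheme, adversary):
    # One run of the token-challenge game described above. `scheme` and
    # `adversary` are hypothetical interfaces used only for illustration.
    sk = scheme.setup()
    queried = []

    def ciphertext_oracle(x):            # adversary asks for Encrypt(sk, x)
        queried.append(x)
        return scheme.encrypt(sk, x)

    def token_oracle(f):                 # adversary asks for GenToken(sk, f)
        return scheme.gen_token(sk, f)

    # Query phase: adaptive ciphertext and token queries, then a challenge.
    f0, f1 = adversary.choose_challenge(ciphertext_oracle, token_oracle)

    # Restriction: the challenge predicates must agree on all queried
    # plaintexts (also enforced on phase-2 queries, omitted here).
    assert all(f0(x) == f1(x) for x in queried)

    b = secrets.randbits(1)
    challenge_token = scheme.gen_token(sk, (f0, f1)[b])

    b_prime = adversary.guess(challenge_token, ciphertext_oracle, token_oracle)
    return int(b_prime == b)             # 1 if the adversary wins
```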
So here I've shown the game in terms of a token challenge where in the
challenge phase the adversary's outputting these two predicates F0 and F1. And
this corresponds to the predicate privacy game.
If we replace the challenge step with a ciphertext challenge where the adversary
is giving two plaintexts and getting back an encryption of one of them, then this
corresponds to the plaintext privacy notion which is already achieved by these
public key predicate encryption schemes.
Okay. So in our work we construct a predicate encryption scheme with predicate
privacy. And we do this by transforming the inner product scheme of Katz, Sahai
and Waters into a symmetric key scheme that has not only plaintext privacy but
also predicate privacy.
So here's what the scheme looks like. The functionality is that we have a
plaintext X, which is a vector X1 through XM, and predicates corresponding to
vectors V1 through VM. And a predicate for vector V evaluates to true on X if the
inner-product of X and V is equal to 0 mod N.
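Stripped of all the cryptography, the functionality being computed is just this check (a plain restatement, not the scheme itself):

```python
# The inner-product predicate in the clear: true iff <x, v> = 0 mod n.
def inner_product_predicate(x, v, n):
    assert len(x) == len(v)
    return sum(xi * vi for xi, vi in zip(x, v)) % n == 0
```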
So the key intuition for how we construct our scheme is that for inner products,
the ciphertexts and the tokens actually play symmetric roles in the functionality
and the security game.
So what I mean by this is, while you can view the ciphertext as an encryption of
the plaintext, you can view the token as an encryption of the predicate. And
in the inner-product functionality these two vectors are treated symmetrically,
and the same thing holds in the security game.
So if we can come up with a scheme where the ciphertext is created from the
plaintext in exactly the same way that the token is created from the predicate,
then if we prove plaintext privacy, this automatically implies predicate privacy.
So our initial scheme looks a lot like the scheme of KSW, which means that in a
relatively straightforward way, we can prove plaintext privacy. And the next step
is that we show through a series of modifications that our scheme is actually
indistinguishable from one where the ciphertexts and the tokens are treated
symmetrically, and so the plaintext privacy that we already proved gives us
predicate privacy for free.
Okay. So before I go into the scheme, you might ask why we're even
considering inner products. And it turns out that inner-product queries actually
allow us to make a lot of other more expressive queries as well. So I'll just give a
simple example of this.
So suppose we have an inner products scheme that has these four algorithms,
setup, encrypt, generate token, and query that gives us the inner-product
functionality. And I'm just going to show how we can use this to actually get
equality testing on elements of ZN. So if our plaintexts are elements of ZN and
our predicates are equality testing, we can use an inner-products scheme of
dimension two. So to encrypt an element J of ZN, we first create the vector
(-J, 1), and we encrypt this vector using our inner-product scheme.
And to generate a token for testing equality to an element K in ZN, we
first create the vector (1, K) and use the inner-product scheme to generate
a token for that. So now the query algorithm works exactly as before, we just run
the inner-product query algorithm, and you can see that the inner product of
these two vectors, (-J, 1) and (1, K), is equal to 0 if and only if J is
equal to K. Yes?
>>: [inaudible].
>> Emily Shen: Oh yes, this is --
>>: [inaudible] very special [inaudible] scheme.
>> Emily Shen: It is mod N, which is the N which is part of our scheme. Right.
Okay. So that's just a simple example. But you can also show that using inner
products you can come up with predicate
encryption schemes to do polynomial evaluation, and using those you can do
conjunctions and disjunctions and exact thresholds and other things.
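As a toy illustration of the dimension-two equality reduction just described, using the inner_product_predicate check from the earlier sketch (a real deployment would call encrypt, generate token, and query on these vectors instead):

```python
# Equality testing via a dimension-2 inner product: encode the plaintext j
# as (-j, 1) and the predicate "equals k" as (1, k); then
# <(-j, 1), (1, k)> = k - j = 0 mod n  iff  j = k mod n.
def encode_plaintext(j, n):
    return [(-j) % n, 1]

def encode_predicate(k, n):
    return [1, k % n]

n = 1009  # a stand-in modulus; the scheme's N is a product of four primes
assert inner_product_predicate(encode_plaintext(5, n), encode_predicate(5, n), n)
assert not inner_product_predicate(encode_plaintext(5, n), encode_predicate(7, n), n)
```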
Okay. So I'll just show sort of at a high level what our scheme looks like. In our
scheme we use bilinear maps, so we have two groups, G and GT which are finite
cyclic groups of order N where here N is a composite which is the product of four
primes, P, Q, R, and S. And we can write G as the product of these four
subgroups, GP, GQ, GR, and GS.
And we have a bilinear map E which takes G cross G to GT. The bilinear
property says that if we paired G to the A and G to the B, it's equal to the pairing
of G, G to the AB. So we can pull the exponents out. And the map has to be
non-degenerate, which says that if G generates the entire group G, then the
pairing of G with itself generates the entire target group.
Okay. So there are a couple of useful mathematical properties that we need in
our schemes. One is that if we pair A with the product of B and C, we can
split this up into the product of the pairing of A with B and the pairing of A with C.
The other useful property, which comes up in groups of composite order, is that
when we pair two elements G and H which are of relatively prime orders,
basically they cancel out, and the result is one. So this means when G and H
come from distinct subgroups -- for example, if G is from the subgroup of order P and
H is from the subgroup of order Q -- when we pair them together, it's just going to
cancel out and we get one.
>>: [inaudible].
>> Emily Shen: Yes. Basically you can write each element as a generator of the
group to some power. So if you look at an element of the subgroup of order P,
you can write it as G to the QRS to some power, and if you do that for both of the
elements, then you can pull out the powers and you have P, Q, R, S all in the
exponent. So it's going to equal one.
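In symbols, the properties just described read roughly as follows, writing g_0 for a generator of the full group G of order N = pqrs (my reconstruction of the spoken math):

```latex
e(g^a, g^b) = e(g,g)^{ab}, \qquad
e(a, b \cdot c) = e(a,b)\, e(a,c).

% Cancellation across subgroups: for g \in G_p and h \in G_q, write
% g = g_0^{qrs\,\alpha} and h = g_0^{prs\,\beta}; then
e(g,h) = e(g_0, g_0)^{(qrs\,\alpha)(prs\,\beta)}
       = \bigl(e(g_0, g_0)^{N}\bigr)^{rs\,\alpha\beta} = 1.
```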
Okay. So again we have a plaintext X, which is a vector X1 through XM, and a
predicate vector V, V1 through VM. And we want the evaluation of a token on a
ciphertext to come out to one if the inner product is equal to zero. So as a first
attempt, we
can say for the ciphertext, first just generate a random element G of the subgroup
GQ and raise G to the powers X1 through XM, and for the token similarly choose
a random value H in GQ and raise H to the V1 through VM.
So now we can easily see that if we just pair these elements together component by
component and multiply all the pairings, we end up with E
of G, H with the inner product of X and V mod N in the exponent.
So if the inner product is equal to zero, then this value is going to be equal to one
in the target group.
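Written out, the component-wise computation just described is roughly:

```latex
\prod_{i=1}^{m} e\!\bigl(g^{x_i}, h^{v_i}\bigr)
  = e(g,h)^{\sum_{i} x_i v_i}
  = e(g,h)^{\langle x, v\rangle \bmod N},
% which equals 1 in G_T exactly when <x, v> = 0 mod N.
```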
So this gives us the functionality we want but there's a lot of problems with it. So
one problem is that we get ciphertext-ciphertext interaction. What I mean by that
is if you look at the ciphertext, which is G to the X1 through G to the XM, we can
learn a lot about the XIs just by doing pairings on these terms.
So, for example, you can learn whether the inner product of X1, X3 and X2, X4 is
equal to zero by pairing the components in this way. And similarly for the tokens.
So the solution to this is that we multiply in masking factors. So in the
ciphertext we can multiply in random values from the subgroup GR. And for the
tokens we can multiply random values from the subgroup GS. So now because
of the cancellation property that I mentioned before, if we pair a ciphertext
component with a token component, the R and the S are going to cancel out
because they're from distinct subgroups.
But if we improperly combine two ciphertext components or two token
components, then the randomization is from the same group, and it's not going to
cancel out, and the result will look random. Okay.
So another problem that we still have is that we can sort of do partial evaluation.
And what I mean by that is you could basically combine just a subset of the
components and learn something about the inner product of only some of the
components. So, for example, you could learn whether
the inner product of X1, X2 and V1, V2 is equal to zero by doing this pairing.
So the solution to that is in the exponent of the GP subgroup we encode an
equation which evaluates to zero if all of the components are combined and
in the right order. So roughly what we do is in the ciphertext we multiply in these
terms F to the C1 through F to the CM, and in the token we multiply in F to the
D1 through F to the DM, and these CIs and DIs are random but subject
to the constraint that the inner product of the vector C and the vector D is equal
to zero.
So now only if you combine all of the components and in the right order will they
cancel out and you get something that does not look random.
>>: [inaudible].
>> Emily Shen: Sorry?
>>: [inaudible] part of the secret key [inaudible].
>> Emily Shen: That's just something that's -- yeah, it's part of the secret key.
So I mean what we do is actually not exactly this. But it's something similar,
which is also an equation which comes out to zero if you pair things correctly. So
part of it is from the secret key and part of it is randomness that you generate in
these algorithms.
Okay. So to summarize, we have these four subgroups which each play different
roles. In the GQ subgroup we encode the plaintext and the predicate in the
exponent and do the inner-product computation. We use the GR subgroup to
mask the ciphertext components. The GS subgroup similarly masks the token
components. And the GP subgroup is used to ensure that all the components
are used and in the right order.
So in conclusion, we achieved a predicate encryption scheme that has predicate
privacy in the symmetric key setting and supports inner-product queries. And
from inner products we can also achieve equality testing, polynomial evaluation,
and conjunctive and disjunctive formulas. And what this means is that in
encrypted file storage scenarios on the cloud, a user can give a server a token to
retrieve encrypted files satisfying a particular predicate P without revealing what
that predicate is.
So that's the end of my talk. And I'll take any questions.
>>: [inaudible].
>> Emily Shen: So I mean to me in the file storage case, that's a case where it
would be important because you're storing your files encrypted and now you
want to retrieve the one satisfying a particular predicate and you don't
necessarily want to reveal to the server what you're searching on.
>>: [inaudible] I don't understand the definition of predicate. If I take a
conjunction [inaudible]. Predicate privacy just [inaudible] conjunction to
[inaudible] it doesn't say what [inaudible].
>> Emily Shen: It's not even going to reveal that you took a conjunction of
equality [inaudible].
>>: I mean what you defined this predicate privacy, the fact that you get
conjunction for [inaudible].
>> Emily Shen: You won't even know that it's a conjunction of [inaudible].
>>: I know.
>> Emily Shen: Okay.
>>: But that's what you tried to achieve by predicate privacy, effectively it is a
conjunction, right?
>> Emily Shen: I'm not sure I understand the question. Okay. Sure.
>>: I don't know the numbers off the top of my head. If you take, say, a 1024-bit
RSA-size modulus and chop it into four equally sized parts, are they safe
[inaudible] of X?
>> Emily Shen: Are they safe against?
>>: [inaudible] P, Q, R, S.
>> Emily Shen: Right. So it means that we [inaudible] larger size.
>>: [inaudible] elliptic curve and [inaudible] because clearly you're nearer than --
>>: That's much smaller.
>>: Much smaller. Okay.
>>: [inaudible].
>>: I'm sorry?
>>: That's about 50 [inaudible].
>>: Okay.
>>: Any more questions?
>>: So I just want to be clear about this public key definition.
>> Emily Shen: Yeah?
>>: I can see the issue, but you still can [inaudible] I mean so what I'm saying is
maybe the -- it seems like if you can do it [inaudible] obfuscation, all right
[inaudible].
>> Emily Shen: Yeah. So that's something that people have mentioned. So it's
possible that you could define it in some other way, not with the same kind of
indistinguishability game, but maybe you could have a sort of simulation based
definition where you want to say that an adversary basically -- that the ciphertext
and tokens give the adversary no more information than he would get by just
having access to a --
>>: Yeah, but this is [inaudible] so what I'm saying is [inaudible], you know,
token from the function F, one way to achieve program obfuscation is you
encrypt X and then, you know, get F of X. So particularly it does give you a
[inaudible] F of X for any input you want. So, you know, if you could say that's the
best you can do for any attacker, whatever you can learn from this, you can
give [inaudible] doesn't exactly [inaudible] obfuscation.
>> Emily Shen: Yeah.
>>: So some [inaudible] but maybe for [inaudible] it's kind of funny. So this
[inaudible] goes the other way around. This thing is stronger than program
obfuscation, right?
>> Emily Shen: Yeah.
>>: So it may be for the [inaudible].
>>: That's [inaudible].
>>: Oh, I see. [inaudible].
>>: Equality function [inaudible].
>>: [inaudible]. I mean, it doesn't give you as [inaudible] it doesn't give you a
[inaudible] functions.
>>: [inaudible].
>>: [inaudible].
>>: But I'm just [inaudible].
>>: [inaudible].
>> Emily Shen: Yes. So we didn't explore it, but I think it's worth exploring the
connection.
>>: [inaudible] questions? Okay. Let's thank the speaker.
[applause].
>>: So the next speaker is our own Seny Kamara and he'll tell you about
structured encryption and controlled disclosure.
>> Seny Kamara: Thanks. So everybody can hear me? Okay. So as Vino said,
I'll be speaking about structured encryption and controlled disclosure. And this is
joint work with Melissa Chase.
So the setting that we consider here is that of cloud storage, which you've heard
a lot about today. So here we have a cloud provider that offers storage as a
service and we have clients that want to pay the cloud provider to store the data,
right. So they just send it over.
And as we all know there's a lot of services based on cloud -- on cloud storage.
So there is Web based e-mail like Gmail or Hotmail, Mozilla Weave, Live Mesh,
Windows Azure has Azure storage and Amazon has S3. And there are a lot
more services beyond these.
Okay. So cloud storage is great. There's a lot of advantages. But really the
main disadvantage is security, right? And the main concern here is basically
what's going to happen to my data, right? And typically this is addressed by
cloud providers in the following way:
They say, well, don't worry too much about it, we're going to encrypt the data,
we're going to authenticate it, it will be backed up, we have access control, our
data centers are very secure, we have guards, we have biometric access control.
So nobody's going to be able to get to your data. And these are all great -- these
are all great measures, but they really only provide you security against either
outsiders, like hackers, for example, or other tenants. So other clients that are
using the same cloud storage infrastructure.
So really the threat model that we consider here, and that you've heard about in
previous talks today, is basically one where we don't trust the cloud operator
himself, right? So the question is instead of providing security against outsiders
and other tenants, can we provide security against the cloud operator himself?
Okay. So we might say, why do we want to protect against the cloud? These
cloud providers are rational entities, they're trying to run a business, they
have a reputation, it's not in their interest to sort of do all these malicious things.
So you might want to protect against the cloud for multiple reasons. One of them
is just plain lack of trust. If you're, for example, a pharmaceutical company that's
investing you know billions in R&D, you're probably not going to entrust a cloud
provider to store your data. Or you could be a government agency who basically
won't trust anybody to store any of its data.
And there's also legal situations where if you're a hospital or publicly-traded
company, by law you might not be able to disclose some of your data to a third
party.
So in this work what we're concerned about is confidentiality, right? So I have
data. I want to store it in the cloud. But I don't want the cloud operator to see my
data. So there are very simple solutions to this problem. One of them is to just
use encryption so we take our data, we encrypt it, we send it over to the cloud,
and whenever we want to perform some operation on this data, we send a
message and the cloud returns the encrypted data back. We decrypt it and then
perform our operations locally. Right?
And obviously the downside to this approach is that there's large communication
complexity, right? It doesn't really scale well. If we're talking about terabytes of
data, we don't really want to be sending this around each time we want to do
some operation.
So another solution is to take the data when we have it in our possession and to
build an index, right? So if I have my e-mail collection, for example, I can build a
keyword index which allows me to do keyword search for this data, then I can
store the index locally, encrypt all my e-mails, send them to the cloud and
whenever I want to retrieve some e-mails I just send -- sorry, I query my -- I
query my index and I figure out which e-mails contain the keyword, and then I
ask the cloud to return those specific encrypted e-mails. Okay?
And this has good communication complexity, but the downside is that it requires
large storage locally. Right? So the size of this index is going to grow as a
function of -- as a function of the e-mail archive.
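For concreteness, a local keyword index of this kind might look like the following toy sketch (hypothetical data, nothing from the talk):

```python
# A minimal local keyword index: maps each word to the ids of the e-mails
# that contain it. The client keeps this; only encrypted e-mails go to the cloud.
from collections import defaultdict

def build_index(emails):
    index = defaultdict(set)
    for i, body in enumerate(emails):
        for word in set(body.lower().split()):
            index[word].add(i)
    return index

emails = ["meet about the microsoft review", "lunch plans friday"]
index = build_index(emails)
assert index["microsoft"] == {0}   # then ask the cloud for encrypted e-mail 0
```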
Okay. So the question is can we achieve the best of both worlds? Can we have
constant storage at the client and small communication complexity?
And this is essentially the problem of searching encrypted data which was first
considered by Song, Wagner and Perrig in 2001 and basically what we want is
we want a system that the client can use to encrypt its data. And whenever it
wants to work on a subset of these documents, it can send a token that encodes
a particular keyword. The cloud can then take this token which hides the
information about the keyword, combine it with the encrypted data and return the
specific documents that contain the keyword. Okay?
All right. So this problem of searchable encryption can be addressed using
general tools. So one of them is two-party computation. But here if you use
two-party computation, the number of rounds that you're going to have to use is
going to be linear in the size of the data. On top of that, the server is going to
have to do work that's polynomial in the size of the data.
So this really doesn't give you any advantage over just sending the data back
and forth. So you could also use techniques from Goldreich and Ostrovsky,
which were called oblivious RAMs. But here sort of the overhead is going to be
fairly large. For every read and write of your RAM, so you have a program that
does keyword search, for every read and write that this program does you're
going to require logarithmic number of rounds and polylog work on the server.
Another thing you could use, as Craig mentioned this morning, is
fully-homomorphic encryption. And here you can get a one round solution. But
the server computation is going to be polynomial in the size of the data. And
again, this is sort of large for the types of parameters that we have in mind.
So people have tried to solve the problem by designing crypto systems or
schemes specifically for searchable encryption. So there's been a lot of work in
the symmetric key setting, in the public key setting and basically most of the
schemes have a one round solution but the work of the server is linear in the
number of data items.
So if N is your number of e-mails, then the server -- the computation for the
server is linear. But in 2006, in work that we did with Curtmola, Garay and
Ostrovsky, we gave a solution that is one round and that had sublinear search time
for the server, and in particular it's optimal, it's basically linear in the number of
documents that contain the keyword.
Okay. So searchable encryption is great. The schemes are fairly efficient. And
it basically allows you to do private keyword search for encrypted data, right, or
encrypted text data. Sorry. You had a question? Sure.
>>: [inaudible].
>> Seny Kamara: So it seems to me, at least all the symmetric key schemes,
including this one, they do leak some information about the keywords. It's sort of
they -- they leak information by access pattern basically. So it's not specifically
about the keyword, but there's a little bit of the statistical information, and you
could, you know, sort of, you know, keep monitoring sort of the access pattern
and eventually get something. So, yeah.
>>: [inaudible].
>> Seny Kamara: Yeah.
>>: [inaudible].
>> Seny Kamara: So it's essentially kind of tricky, right, because you have to
weaken the definitions to basically leak a little bit of information. So if you're
defining things, you know, using the general tools it's fairly easy, you have an
ideal functionality, and you can just -- but here you actually have to weaken it.
>>: [inaudible].
>> Seny Kamara: Yeah. Yeah.
>>: [inaudible]. You can do it sort of optimally. Basically the server computation
time is output sensitive, so it's just --
>>: [inaudible].
>> Seny Kamara: Exactly. Which is --
>>: Which is [inaudible].
[brief talking over].
>> Seny Kamara: It's linear in the number of documents with the keyword.
>>: It's the order of the number of --
>> Seny Kamara: You just use the reverse.
>>: [inaudible].
>> Seny Kamara: You just use the reverse index on your data and you get
optimal search time. In the unencrypted case. In the encrypted case, you have
to use [inaudible] index but apply the crypto to it in order to make it secure.
Okay. So okay. So searchable encryption is great. We can search over
encrypted data. But really it's sort of -- it's actually fairly limited as far as
functionality, right? So we can do keyword searches over text data, but a lot
of the data that's generated isn't text data, right? And in particular, a lot of this
data that's not text data we don't really care about doing keyword searches over.
Right?
So the question is can we privately query other types of encrypted data, and in
particular things like maps or image collections, social networks or Web page
archives. Right?
So this is a problem that we consider in this paper, and so we focus on a
particular type of data, which is graph-structured data. And really this type of
data is ubiquitous and is being generated at a huge rate, right? So basically any
communication between, you know, between people generates graphs, right?
So if you look at e-mails headers, if you look at phone logs, you're going to get
huge graphs. If you look at research papers and citations and bibliographies,
you also get big graphs.
A lot of sort of network topology things generate these huge graphs, you know,
AS level graphs or Internet scale graphs. And of course social networks, right,
are generating huge, huge amounts of graph data.
And Web crawlers for search engines, right, for Bing or Google, they're
generating massive, massive graphs.
And finally you can also think of maps as graphs, where basically the
intersections are the nodes of the graph and the roads between the intersections
are your edges.
So we felt that this was sort of an interesting type of data and we wanted to see if
we could encrypt it in a way that we could actually do graph queries on top of this
data.
Okay. So we introduced this idea of structured encryption, which is basically a
generalization of searchable encryption to other types of data, not just text data
but arbitrarily structured data. We have a formal security definition which is
simulation-based. And we give a few constructions. So we show how to do
adjacency queries on encrypted graphs, which are basically given two nodes
[inaudible]. We also do neighbor queries on encrypted graphs, so given a
node I want to get back all the nodes that are adjacent to that node.
And we also do something called focused subgraph queries on encrypted Web
graphs. And these are sort of more complex types of queries. And I'll explain
those later. Okay.
And finally we consider sort of a new application of structured encryption, and
therefore of searchable encryption, to a problem that we refer to as controlled
disclosure. And we show some applications of -- we sort of mention some
applications of controlled disclosure to cloud-based data brokering, which if I
have some time, I'll explain what that is.
Okay. So for structured encryption, so if we take the example of graph structured
data, right, so the idea is we have our client, he has this graph, he wants to
encrypt and store in the cloud, and later he wants to make some query, some
graph query, on this graph. He's going to send a token, which is going to
encode the graph query he wants to make, but it's going to hide the information
about the query he's making. And then the cloud is going to be able to return the
encrypted -- the encryptions of whatever the answer is. Okay?
So for this particular type of encryption scheme, right, you can see sort of the
message space is a little bit weird, right? So typically, for typical encryption
schemes the message space is either a bit string or some [inaudible] of a group,
right? But now we're encrypting something that has a lot more -- is a lot more
structured.
So the first thing that we have to do is we have to understand what exactly is our
message space. So the way we describe the message space is as follows: So
we say -- so we -- okay. So we view the input as structured data. And
then we say we're going to decompose this structured data into two elements.
One of them is a data structure that encodes the structure of the data and the
other one are the data items, which basically are just bit strings, right? Which
sort of have whatever information we want to associate with the structure
encoded by the data structure.
So as an example, if you have an e-mail archive where you [inaudible] e-mails,
then we're going to view it as a combination of a keyword index and then the text
in the e-mails. Okay? So this index just encodes the keywords associated with
each of the e-mails.
And if I query -- if I query this data structure, if I query this index on the keyword
I'm going to get pointers into the particular e-mails that contain the keyword.
So another example is for social networks. So here we have a social network,
and we're going to view this as a combination of a graph which encodes the
friendships between the people and their profiles for example. And if I do a
graph query on this graph, I'm going to get pointers to the different -- to the
different profiles. Okay.
So such encryption schemes are composed of five algorithms. The first one is a key
generation algorithm that takes a security parameter and outputs a symmetric key. Then
there's an encryption algorithm that takes the key, takes the data structure, delta,
which encodes the structure, and then a vector which consists of the data items.
Okay? And it outputs an encrypted data structure gamma, and a ciphertext C.
And this is where the client is going to run on the structured data in order to
generate the ciphertext, which is composed of the actual encrypted data items
and the encrypted data structure. And that's what he's going to send to the
cloud.
Whenever the client wants to make a query on this data, he's going to use the
token algorithm on his query, which is going to generate a token, which he's
going to send to the cloud and then the cloud is going to take the token and the
encrypted data structure, run this query algorithm on the encrypted data structure
with the token, and this is going to output a set of pointers, right? It's just a set of
numbers, which basically point to the particular data items that satisfy the query.
And then it can just go fetch those encrypted items and send them back. And
then the client can just decrypt those items individually.
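As a type-level sketch, the five algorithms might be written like this (my paraphrase of the interface; the signatures are illustrative, not the paper's notation):

```python
# Type sketch of structured encryption: (keygen, encrypt, token, query, decrypt).
# Toy signatures only; the interesting bodies are left unimplemented.
import secrets
from typing import Any, List, Tuple

class StructuredEncryption:
    def keygen(self, security_parameter: int) -> bytes:
        return secrets.token_bytes(security_parameter // 8)

    def encrypt(self, key: bytes, delta: Any, items: List[bytes]
                ) -> Tuple[Any, List[bytes]]:
        """Client: returns (gamma, c), the encrypted structure and items."""
        raise NotImplementedError

    def token(self, key: bytes, query: Any) -> bytes:
        """Client: a token encoding the query while hiding it."""
        raise NotImplementedError

    def query(self, gamma: Any, tok: bytes) -> List[int]:
        """Server: pointers into the encrypted items that satisfy the query."""
        raise NotImplementedError

    def decrypt(self, key: bytes, item: bytes) -> bytes:
        raise NotImplementedError
```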
So as far as security, so we give the definition which we refer to as security
against adaptive chosen query attacks. So it's simulation based. And essentially
the guarantee is given the ciphertext no adversary can learn any information
about the data or the queries other than what can be learned from the access
and the search patterns. And that's even if the queries are made adaptively.
So by adaptive, by adaptive queries I mean that this holds even if the adversary makes his
queries as a function of answers to previous queries or as a function of the
ciphertext. Okay?
And so there's this caveat in the definition which says that nothing can be
learned other than what can be deduced from the access and the search
patterns, which basically means that the schemes leak the access and the search
patterns. And to be more precise, what I mean by the access and search patterns is the
following. So the access patterns are basically the pointers to the encrypted
documents that satisfy the query, all right, which we're willing to leak in this case
because we want an efficient solution. And in any case the server -- we want the
server to return these encrypted data items. So we might as well just leak it. So
there are techniques to hide this. But they're much more expensive.
And we also leak -- we leak something else, which is the query pattern,
which is basically whether a query is repeated. Okay? And this is pretty
standard, at least in the case of symmetric searchable encryption. All the
schemes leak these things. Some leak more. But this is sort of the best we
can do without interaction. At least so far.
So we have this simulation-based definition, and this definition implies a lower
bound on the size of the tokens. In particular, the lower bound is lambda
times log N, where lambda is the number of data items that satisfy the query and
N is the number of data items, okay? And this lower bound is in the standard
model. In the random oracle model you can do better.
>>: [inaudible].
>> Seny Kamara: It's sort of obvious. I mean, the thing is that this -- so this
definition is adaptive basically so you get into a situation where the simulator
basically has to sort of commit to some encryption and then he's going to get
queries after he's committed to the ciphertext. And he has to answer those
queries correctly. So he has to generate tokens that are actually going to work.
But he only gets the query -- so he only gets information from the query
afterwards. And if you want to be able to satisfy any possible query, basically
you're going to need enough tokens sort of in your token space to satisfy
any possible answer.
So in terms of the size it's going to be basically related to the size of all the
different answers that you have to be able to sort of simulate. And the log N is
really just because what you -- what you encode in terms of the responses are
pointers into the items, and you have N items, so you need log N bits. So
that's pretty much it.
>>: [inaudible].
>> Seny Kamara: I'm sorry?
>>: [inaudible].
>> Seny Kamara: Not in this work. [inaudible] which, you know, which has some
nice properties, right, typically -- you typically prefer these types of definition, they
compose easier, you know, they're sort of more natural also. So -- but, yeah, you
could also formulate a -- you know a [inaudible] based definition.
Okay. So we consider -- we give different constructions. So we
do adjacency queries on encrypted graphs. And we do this -- so we design a
scheme that handles lookup queries on encrypted matrices and then we basically
just look at the adjacency matrix representing the graph and then just use the
scheme on top of it.
We also do neighbor queries on encrypted graphs and we actually show that this
can be built from any structured encryption scheme that handles keyword
searches, so basically any searchable encryption scheme.
And then as I mentioned, we do also focused subgraph queries on encrypted
Web graphs. And for this we need a scheme that handles keyword searches on
encrypted data, so basically an SSE scheme, and a scheme that handles
neighbor queries on encrypted graphs.
So I'll talk a little about how we do neighbor queries on encrypted graphs, and
then I'll talk about how we do focused subgraph queries, depending how much
time I have.
Okay. So how do we do neighbor queries on encrypted graphs? So basically
what we have is structured data which looks as follows: we have
this graph, and then we have the data items, and the yellow data item is
basically whatever data we want to associate with the [inaudible], right, et cetera.
So this is our -- this is our input.
We want to encrypt this, send it to the cloud and then generate a token to do a
neighbor query. So we send a token for the green node and then the cloud is
able to figure out which encrypted documents are connected to the green node.
Okay.
So the [inaudible] we use: we use a dictionary, which is basically just a key-value
store; we use a pseudorandom function; and we use some form of
non-committing symmetric encryption. And this we can build using a
pseudorandom function and XOR -- in this case, we have the lower bound on the token
size -- or we can use a random oracle and XOR, and we don't have to worry about the
lower bound; we get tokens that are as large as the security parameter.
>>: [inaudible].
>> Seny Kamara: No. So everything here is static. Okay. So what does this
scheme look like? So this is our graph. The first thing we do is we generate the
adjacency list representation of this graph, which consists of for every node we
just write down all the nodes that are adjacent to it. So this should be a
familiar data structure.
And then what we do is we use the pseudorandom function on the nodes, right,
so we just turn them into random looking values. And then we use the
non-committing encryption scheme on the list of edges, right, for each node. And
the key is a function of the particular node that we're working with. Okay?
And then we just store this in our dictionary. And that's what we call an
encrypted data structure. And we send this along with the encrypted documents
to the cloud. And whenever we want to -- whenever we want to do a neighbor
query on a particular node, we send a token that has this form. So the first
element is just the evaluation of the pseudorandom function on the particular node
that we're trying to query, and then we generate this key for the non-committing encryption
scheme. We send both of these elements to the server. The server takes the
encrypted data structure, so this dictionary queries it using the first element,
which is this random looking string. The dictionary returns the associated value,
which is basically the non-committing encryption of the nodes that are adjacent to
it, and then it uses the key to decrypt and give back which nodes are connected
to it, right? So it's very simple.
That's essentially how the scheme works. You have to take care of some details,
but this is at a high level how it works.
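A rough sketch of this construction, under stated assumptions: HMAC-SHA256 stands in for both the pseudorandom function and the random oracle behind the XOR-based (non-committing-style) encryption, and details of the real scheme such as padding are omitted:

```python
# Sketch only: HMAC-SHA256 as PRF / random oracle; real scheme differs.
import hmac, hashlib, json

def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

def xor_pad(key: bytes, msg: bytes) -> bytes:
    # Expand `key` into a pseudorandom pad and XOR it with the message;
    # applying it twice decrypts.
    pad = b""
    counter = 0
    while len(pad) < len(msg):
        pad += prf(key, counter.to_bytes(8, "big"))
        counter += 1
    return bytes(m ^ p for m, p in zip(msg, pad))

def encrypt_graph(k1: bytes, k2: bytes, adjacency: dict) -> dict:
    """adjacency maps node name -> list of neighbor names."""
    encrypted = {}
    for node, neighbors in adjacency.items():
        label = prf(k1, node.encode())       # random-looking dictionary key
        enc_key = prf(k2, node.encode())     # per-node encryption key
        encrypted[label] = xor_pad(enc_key, json.dumps(neighbors).encode())
    return encrypted

def neighbor_token(k1: bytes, k2: bytes, node: str):
    return prf(k1, node.encode()), prf(k2, node.encode())

def query(encrypted: dict, token) -> list:
    label, enc_key = token
    return json.loads(xor_pad(enc_key, encrypted[label]))

# The server learns the neighbors of the queried node, and nothing about
# nodes that are never queried.
k1, k2 = b"k" * 32, b"q" * 32
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
enc = encrypt_graph(k1, k2, graph)
assert query(enc, neighbor_token(k1, k2, "a")) == ["b", "c"]
```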
Okay. So that was neighbor queries on encrypted graphs. And now we -- okay. So --
>>: [inaudible].
>> Seny Kamara: Yes. So in this area, right, it's sort of -- we're trying to get
this -- get the right tradeoff between efficiency and security, right? And we know
how to do these things completely securely. We use oblivious RAMs and
we're done. But, you know, here we're really trying to get something that's
actually practical. So we are leaking some information. And to be honest, it's
not clear. Like we don't have a good way of assessing how much information --
you know, how dangerous this is, right? But we're sort of willing to say, you
know, we're going to leak this much information in order to get efficiency, and
hopefully that's -- hopefully it's not too much.
>>: But you feel it's much better than saying this is my graph, I mean here's a
[inaudible] but you are saying the [inaudible] information especially from
[inaudible] by just saying there is some random node connected to this
parameterized [inaudible].
>> Seny Kamara: Yeah. Presumably [inaudible] this information, and it would
take a long time before you could actually reconstruct any meaningful
information. But, you know, we don't have a good way of assessing that, right?
It's sort of -- but, in any case -- but sort of from a practical point of view it's better
than what we have now, which is nothing. Right? So --
>>: [inaudible].
>> Seny Kamara: Yeah, but what I mean is the following. So we know -- so we
know what is being leaked by the scheme. So we -- we can characterize what is
being leaked in some way. We can say it's the access pattern, it's these
pointers and these elements, right? Now, the problem is that we're actually using
this multiple times. Then it's not clear, like if -- so if I look at the information that's
being leaked over multiple, multiple queries, right, I can start to sort of make
inferences about -- so I can start -- I can try to guess what you might be
searching for. Okay. So take the case of keyword search, right? So if I see that,
you know, you're doing a bunch of searches that hit a lot of documents, right, and
I know that you work at a particular company, maybe I can start inferring things
about what you're searching for. And the more of this data I get, the better, you
know, I can infer. So it's not -- I mean, it's not that we don't know what we're
leaking; it's that we can't say statistically what it reveals. Yeah?
>>: And you don't want to [inaudible] public encryption because it's [inaudible].
>> Seny Kamara: Yeah, I mean you could use, you know, these sort of generic
techniques and you could use homomorphic encryption, you could use oblivious
RAMs [inaudible] yes?
>>: [inaudible].
>> Seny Kamara: Yes. Okay. So the example I gave, which was neighbor
queries on graphs, is really sort of a simple type of query on a graph, right? But
in a lot of cases we have more complicated types of graphs. We have sort of
objects that have -- that might mix different types of structure, right? So one
example are Web graphs, which are basically just collections of Web pages, right,
and Web pages have hyperlinks.
So Web graphs essentially consist of text data, which is in the pages, and of
graph data, right, which are the hyperlinks between the pages.
So you can do simple queries on Web graphs. You could ask for all the pages
that are linked, you know, to a -- from a particular page or all the pages that link
to a particular page. So these would just be plain neighbor queries on a Web
graph. But you could also ask more complicated queries on Web graph, right,
because they have this extra structure. So you could ask queries that basically
mix both text, the text and the graph structure of the data. Right?
And one example -- so where this comes up a lot for Web graphs is in
search engine algorithms, right? In particular -- so the more modern search
engine algorithms like PageRank or HITS basically they don't -- when you do a
keyword search in your search engine and they give you back a ranking, they
don't just look at the text data, right, of the Web pages. They look at the text.
But they also do some computation on the link structure. So they mix both. And
so yes. Some of the more well known algorithms are PageRank, obviously, that
Google uses but in particular I'll highlight Kleinberg's HITS algorithm because this
uses actually focused subgraph queries which is what we do, and there's some
derivatives of this algorithm, SALSA, and a bunch of others.
Okay. So what is a focused subgraph query? So the way these search engine
algorithms work is basically the first thing they do is they look at the Web graph,
they compute a focused subgraph, and then they run this iterative algorithm on
this focused subgraph and then they output a ranking, okay? And then they send
back, you know, the hundred best pages.
So if I'm doing a keyword search for crypto, then a focused subgraph is
essentially the following. So first I do a keyword search over all the Web pages
and I figure out which pages contain the word crypto. So in this case, it's these
three pages. And then I add to this subgraph any page that's linked to from these
pages or any page that links to one of those pages, right? So I just add all
these pages. And that's my focused subgraph. Okay? So that's all that means.
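In the clear, the focused-subgraph computation just described is simple; the following sketch (hypothetical data) is only a reference point for what the encrypted protocol has to reproduce:

```python
# Focused subgraph in the clear: pages matching the keyword, plus any page
# linked to or from one of those pages.
def focused_subgraph(pages, links, keyword):
    """pages: name -> text; links: set of (src, dst) hyperlink pairs."""
    root = {name for name, text in pages.items() if keyword in text}
    neighbors = {src for src, dst in links if dst in root}
    neighbors |= {dst for src, dst in links if src in root}
    return root | neighbors

pages = {"a": "crypto talk", "b": "lunch", "c": "misc"}
links = {("b", "a"), ("a", "c")}
assert focused_subgraph(pages, links, "crypto") == {"a", "b", "c"}
```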
Okay. So how could we -- how could we encrypt a graph or Web graph so that
we can do focused subgraph queries on it? So one approach is to just take this
Web graph and encrypt it with a structured encryption scheme that handles
keyword searches, output a ciphertext, then encrypt it with a structured
encryption scheme that handles neighbor queries and output a second
ciphertext, and the final ciphertext is a combination of both, right? But it won't work.
won't work it. Because if I gave you a token for the keyword search, for example,
the server will be able to figure out which
documents or which Web pages contain the keyword. But then I still need to give
you a token so that you can get the neighbor, the neighbors of that particular
Web page, right?
And I don't know, I can't predict what the answer to that query is going to be. So
there's no way for me to do this without interaction. Okay? So in the
paper, we sort of introduced this chaining technique which allows us to combine
structured encryption schemes in order to combine -- in order to generate a
structured encryption scheme that's more complex. And to handle focused
subgraph queries, we combine structured encryption for keyword search with
structured encryption for neighborhood queries.
And the nice thing about this approach is that it preserves the token size of the
first scheme. Okay? So the token sizes don't add up. Even though you use a
second structured encryption scheme, you just send one token, for the
first one. Okay?
So this chaining technique is useful, but it requires an extra property from the
structured encryption scheme which we call associativity. And essentially what
this means is that you have a more complicated message space which includes
the data structure, the data items, and also a vector of what
we call semi-private information, which is information that's private in one way but
public in another. So -- and I'll explain a little bit what that means.
And the sort of the answer spaces are a little bit more complicated. You get
pointers to the data items but you also get this semi-private information. Okay?
And it turns out -- so there is a scheme that handles keyword
searches that is associative, but it's not secure in this adaptive sense, which is
what we need.
So in the paper, we propose a scheme that handles keyword searches, that is
associative, and that's adaptively secure. Okay.
So a little bit more precision on what we mean by associativity. So these are
the algorithms that we had initially for structured encryption. So now, a
structured encryption scheme is associative if it has the following properties:
So on top of the data structure and the data items we also have the semi-private
information. And what this means is for each data item I can associate another
item. All right?
So for M1, I have V1, for M2, I have V2. Okay?
And what's the point of doing this? The point is that when I run my query algorithm
on the encrypted data structure and the token, I'm not only going to get the
pointers into the data items that are relevant for this particular query but I'm also
going to get the associated data. And I'm going to get this in clear text, right? So
it's not encrypted.
So it might seem a little bit weird. Why do I want to release information? But it
turns out that we actually need this in order to handle complex queries. Okay.
So I don't know how much time I have left.
>>: [inaudible].
>> Seny Kamara: Oh, good. All right. So how do we do focused subgraph
queries on Web graphs? Okay. So actually before -- let me go -- okay. So we
view a Web graph as a combination of three things. There's a keyword index, a
graph, and then the data items. Okay? So now our message space looks like
this. Once encrypted, I send some tokens, and I get back the encryptions of the
documents associated with the focused subgraph. So we're going to build our
encryption scheme for focused subgraph queries out of two schemes, one
that handles keyword searches and one that handles neighbor queries.
So given this Web graph, right, so a bunch of Web pages with hyperlinks, the first
thing we do is we use the structured encryption scheme for neighbor queries and
we generate tokens for neighbor queries for each node of this Web graph. So
we start with the first page and we generate a token for neighbor queries; then we go
to the second one, we generate a token for neighbor queries; et cetera.
Okay. Then we use the structured encryption scheme that handles keyword
searches and we encrypt the following things: So as the data items we use the
Web pages, right, so just the plaintext data. And as the associated semi-private
information for each Web page, we use the token, right? The token for
neighbor queries. And so we do this for all the Web pages. And so this
is going to generate a ciphertext, right? Okay.
So then we use our structured encryption scheme for neighbor queries and we
use that to encrypt the graph structure of this Web graph. So we just read this as
a graph, and we encrypt it using the encryption scheme for neighbor
queries. So this is going to generate -- so now we have two ciphertexts, one that
handles keyword searches and one that handles neighbor queries. And our
ciphertext for focused subgraphs is basically just the combination of the two.
Okay. So how do we actually perform the queries on this ciphertext? So this is
what a ciphertext looks like. Our token is basically just a token for keyword
search, right? So we want a query for the word crypto, we generate a token for
crypto using the structured encryption scheme that handles keyword search. We
send this to the cloud and then the server uses that in conjunction with this
encryption scheme, right, that handles keyword searches, and this with this
token, so the ciphertext with this token is going to allow the server to figure out
which encrypted documents contain the keyword crypto, right? And because of
this associativity property it will also enable it to figure out the tokens associated
with those pages. Okay?
So that's why we actually want to release this information in clear text. So
we just -- so okay. So in this case, there's just one file that contains the word
crypto, so we get a pointer to that file and we get this token, and then we use this
token together with the encryption scheme that handles neighbor queries, right.
So now we have the encryption of the graph structure. We use this token here
with this and this is going to allow us to figure out which nodes are neighbors of
the yellow node. So basically these, one and three, okay.
So in this way, we've done focused subgraph queries which are basically more
complex than normal graph queries, right, by combining these two different types
of encryption schemes without using interaction.
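A dataflow sketch of the chaining, building on the neighbor-query sketch above (prf, neighbor_token, encrypt_graph, query). Caveat: a real associative SSE scheme would encrypt the index so that a page's neighbor token is released only on a matching keyword query; this toy index leaks everything and only shows how the pieces connect:

```python
# Toy chaining sketch; the "associative SSE" index below is an unencrypted
# stand-in, illustrative only.
def encrypt_web_graph(k1, k2, k3, pages, adjacency):
    index = {}  # PRF(keyword) -> [(page id, that page's neighbor token)]
    for i, (name, text) in enumerate(pages.items()):
        semi_private = neighbor_token(k1, k2, name)  # semi-private information
        for word in set(text.split()):
            index.setdefault(prf(k3, word.encode()), []).append((i, semi_private))
    return index, encrypt_graph(k1, k2, adjacency)   # encrypted pages omitted

def focused_subgraph_query(index, enc_graph, keyword_token):
    # One keyword token yields the matching pages plus, via the released
    # neighbor tokens, one hop into the encrypted graph -- no interaction.
    return [(page_id, query(enc_graph, tok))
            for page_id, tok in index.get(keyword_token, [])]

pages = {"a": "crypto talk", "b": "lunch notes"}
adjacency = {"a": ["b"], "b": ["a"]}
k1, k2, k3 = b"k" * 32, b"q" * 32, b"w" * 32
idx, eg = encrypt_web_graph(k1, k2, k3, pages, adjacency)
assert focused_subgraph_query(idx, eg, prf(k3, b"crypto")) == [(0, ["b"])]
```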
So if I have a little more time I'll talk about controlled disclosure.
Okay. So the typical application for structured encryption or searchable encryption is
doing private queries on encrypted data. Right? This is sort of what we're all
familiar with. And this is fine. But there may be situations where you actually
want the cloud or the server to do some computations on your data. And maybe
you don't, you know -- you don't care that much that it learns some information
about your data, right?
So I have this social network. I send it off to the cloud. And I want the cloud to
compute some very complicated function of this graph. And my data is so huge
that there's just no way I'm going to be able to use fully homomorphic encryption
or two party computation. It's just not feasible. So maybe I don't mind it learning,
you know, parts of my graph. Right?
Okay. So that could happen. Now, what if the algorithm is a local algorithm,
right? So the algorithm doesn't actually need to see all of the data, right? It only
needs to see part of the data. Right? So again, I have this social network, I want
it to compute something about my network. But really it only needs to see like a
small subset of my graph, right?
So in that case, maybe I don't want it to see the rest of my graph, right? Maybe
part of my social network is related to my family, so I don't really want them to
go snooping around there. But I want them to compute something over
the part of my social network that's related to my work, okay? So I don't mind
them learning information that's related to my work friends, but I don't want them
to learn anything about my family.
So if the algorithm is local, right, this type of security guarantee could make sense.
And so controlled disclosure is basically just a sort of mechanism that would allow
you to disclose pieces of your data, right? So you encrypt your data, you send it
off, you want the server or the cloud to perform some computation on your data.
You don't mind them learning part of the data, but you don't want them to learn
all of the data, right? So you want to be able to just disclose a subset of the data.
So here again if we have the social network, we want to encrypt it, send it, and
then I'm going to send the cloud a token. This token is going to allow the cloud to
recover basically a subgraph in this case, right, just a small subgraph, and then it
can evaluate some function F on this subgraph and send me back the answer.
Okay?
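As a rough illustration -- not the construction from the talk -- one can picture controlled disclosure with per-node keys: each node's data is encrypted under its own derived key, and a token is just the set of keys for the region the owner is willing to disclose. A minimal Python sketch, with a hash keystream standing in for a proper symmetric cipher:

import hashlib

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # stand-in cipher: XOR with a SHA-256 keystream (illustration only)
    out, ctr = b"", 0
    while len(out) < len(data):
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(x ^ y for x, y in zip(data, out))

master = b"client master key"
graph = {"work-1": b"work-2,work-3", "work-2": b"work-1", "family-1": b"family-2"}

# every node is encrypted under its own key, derived from the master key
keys = {v: hashlib.sha256(master + v.encode()).digest() for v in graph}
ct = {v: xor_cipher(keys[v], adj) for v, adj in graph.items()}

# the token disclosing only the work-related part of the network
token = {v: keys[v] for v in graph if v.startswith("work")}

# the cloud recovers exactly that subgraph and can run its function F on it
subgraph = {v: xor_cipher(k, ct[v]) for v, k in token.items()}
print(subgraph)  # the family node stays encrypted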
Okay. And so you can use structured encryption to do this kind of thing. And so,
you know, this could be useful independently in and of itself, but another
application is to these sort of cloud-based data brokerage systems. So if you
haven't heard of these, there are a few examples. One of them is being
developed at Microsoft; it's called Dallas for now. There's another service
called Infochimps.
And so the idea here is that you might have a producer of data, like the
government, for example, that produces massive amounts of data, and it wants to
store it in the cloud, right, because these are really, really massive datasets. And
other people want to use this data. They want to write applications that use
this data. And so they're going to interact with the cloud. And they're going to
pay the cloud in order to use the data, right?
So they pay the cloud a certain fee per query on this data and then they get back
just a piece that they pay for. And then they do their computation and they, you
know -- and they're happy. So that's what I'm calling a data brokerage service.
And this is actually being built. I mean, Microsoft has a service that's available to
do this.
So controlled disclosure could be used in this setting in the following way. So the
producer of the data basically encrypts its data and sends it off. And the
consumer generates a query and sends it to the producer, the owner of the data.
The owner sends back some token, which the consumer sends to the cloud. The
cloud is able to recover a subpiece of the data, whatever is actually needed for
the consumer to perform its computation, and then the consumer can just, you
know, run his app in the cloud on this data, okay?
And if you do things this way, as long as the producer is willing to remain online
and, you know, send tokens around, the producer has a stronger guarantee than
what's available now. In particular, he can actually get an accurate count of
which queries are made, or how many queries are made, on his data. And this is
even if there's collusion between the cloud and the consumer.
Okay. That's pretty much it.
[applause].
>>: The next talk is by Giuseppe Ateniese from Johns Hopkins. And he'll tell us
about cloud cryptography: giving control back to the users.
>> Giuseppe Ateniese: Thanks. So I don't have to spend much time on
motivation, since previous talks already covered where cryptography is
important for the cloud. Definitely the cloud is a big business opportunity, and it
will fly only if users feel in control of their data. So I'm going to focus on two
cryptographic primitives I've worked on that will enable this. But first let me thank
my coauthors. As I mentioned in a previous talk here at Microsoft, I'm lucky
because my last name starts with A, so I'm always listed first on all these papers.
So I will touch on topics related to provable data possession, which is joint work
with Randall Burns, Reza Curtmola, Herring, Kissner, Peterson and Dawn Song.
And I'll talk briefly, very briefly, about some new results with Di Pietro, Mancini and
Gene Tsudik, and some work with Seny Kamara and Jonathan Katz that appeared
at Asiacrypt. And then I'm going to talk briefly about proxy reencryption, which is
another cryptographic tool. And this is joint work with Kevin Fu, Matt Green and
Susan Hohenberger, and some recent work with Karen Benson and Susan
Hohenberger on key privacy.
Of course, I'm not going to focus on the details. The idea is that I'm going
to mention some ideas, and if you want to know more we can talk offline.
So the first part of the talk is about provable data possession. And it's clear that
cloud storage has several benefits. In particular, clients with limited resources can
outsource storage. The cloud provides universal access independent of location:
for instance, I can access my mail, I can access my documents. And it provides
free services like data backup, recovery, or archival. And one may argue that,
at least for the average user, cloud storage also provides more security, because
even though the services are always online and available, they are
usually monitored by professionals. So in case of attack, you know,
intrusions can be detected fairly easily in most cases. At least for the
average user, who usually doesn't choose strong passwords or encrypt
disks, this may be a good solution.
In addition, what we focus on here is mostly archival storage. So there is
legislation that requires data to be retained for several years, sometimes forever.
And the data has to be available. And again, outsourcing data to a third party avoids
the initial setup cost and avoids nightmares like maintenance and scalability.
So in particular, we focus on archives like the Library of Congress in the US.
They are supposed to keep this data available forever. And we are talking about
several hundreds of terabytes of data that have to be kept forever. So suppose,
for instance, that we want to make the content of the Library of Congress
available to every taxpayer. The question is, how can we make sure that
this information is actually stored correctly?
As you might know, the Library of Congress contains books that
people may not even know the existence of. So it might be possible for a storage
provider to, not necessarily delete, but at least put information that is never
requested by users on a secondary storage device like tape, okay, so that it's
not deleted but it's not using premium storage space, okay?
So here we are dealing with a sort of adversary that is storing a very large amount
of data, okay, and is willing to delete a percentage of this data. Again, deletion
doesn't necessarily mean destroying the data; it can mean that this data is moved
to secondary storage devices, okay, for instance.
And unfortunately we cannot necessarily trust third party cloud providers.
There's always the incentive to reduce cost and increase profit, and so it may be
that third party cloud providers discard data that is not accessed or is rarely
accessed. And it could be that there is an incident and some data is
lost, but since nobody accesses that particular information, who cares? So maybe
they will not notify the customers.
But also, you know, if you store for instance financial records and
things like that, I mean, third party providers might intentionally modify data. So
we want to avoid all these problems.
So provable data possession is this area. And it's a set of tools that allow my cell
phone -- I'm highlighting the cell phone here because I'm looking for very efficient
solutions, okay. So I don't want to use anything strange or esoteric.
So can my cell phone verify that the entire content of the Library of Congress, for
instance, is stored and available online? Okay? So let me clarify that this is a kind
of hard problem in some sense, because once I store this information in the
cloud, I don't have a local copy. So it's not like, oh, I have a backup, and then I
can check whether the information in the cloud is the same as the backup I have.
Now, the idea here is I have information on my disk, and I move this information to
the cloud, and I just have a diskless computer. Okay? So I don't have this
information anymore. Okay? So even if I don't have this information, is it
possible to verify that everything is there? So if you store the pictures of
your family, movies and stuff, can you check that the picture you took like 30
years ago is still there, even if you don't have a local copy? Okay. So that's the
question we want to answer.
So the answer is yes. And you can actually do it efficiently. So before we look at
the solution, let's first see some partial solutions that might not really work. So
suppose I store everything in the cloud and I want to check whether everything is
there. Okay? One obvious way of doing it is to retrieve the information. This, of
course, is very inefficient, because I would have to retrieve, for instance, several
terabytes of data. Okay?
So another possibility would be the following: before storing this
information, I compute several MACs on this -- let's say huge -- file.
And then I store these values on my computer. So I only have to store
very short values.
And then later, when I want to check, I go to the cloud provider and say
hey, compute this MAC on my file and give this value back to me, and I will check
whether that MAC is valid.
Now, this solution actually doesn't really work, because here we are talking about
a file of several terabytes. If I ask Google to compute a MAC on 75 terabytes,
for instance, it will take weeks. Okay? So it's not a problem with the MAC -- the
MAC is a very efficient primitive. The problem is that accessing several terabytes
requires a long time. Okay? So that's not good.
So what I could do, though, is use a probabilistic approach. So rather
than asking for the entire file, what I can do is say, okay, look, here is a key
to compute a MAC with, and, you know, compute a MAC on block number one,
block number 10, block number 50 and block number 100. So I pick random file
blocks, and I ask the cloud provider to compute MACs on those, okay. Since we are
considering an adversary that is deleting, you know, a percentage of the file -- like,
for instance, one percent -- I don't have to ask for many blocks to have a good
probability of catching a misbehaving provider. For instance, if the cloud provider
deleted one percent of my file, I can ask for about 500 file blocks if I want to have
a probability of catching the adversary higher than 99 percent.
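A quick back-of-the-envelope check of that figure: if a fraction p of the blocks is gone and the client samples c blocks (with replacement, as an approximation), the provider escapes detection with probability (1 - p)^c.

# detection probability when a fraction p of the file has been dropped
# and the client spot-checks c random blocks: 1 - (1 - p)^c
p = 0.01
for c in (100, 300, 460, 500):
    print(c, "blocks ->", round(1 - (1 - p) ** c, 4))
# about 460 blocks already give > 99%, matching the ~500 figure above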
So what the cloud provider will do is get these file blocks -- or books, in the case
of the Library of Congress -- compute the MACs under the key provided by the
client, and send back these books along with the corresponding MACs.
Now, this solution is definitely better than the previous ones, but still not
satisfactory. First of all, it's linear in the number of file blocks I query the cloud
provider for. And also, why do I need to get the books -- or the file blocks -- in
the first place? I don't need to have those just for checking. I don't need to read
those books. I just want to make sure that the cloud provider is storing those
books. That's it. Or those file blocks. Okay?
So the question will be can I check that those books are stored without
downloading the actual books? Okay? So essentially using constant bandwidth.
>>: [inaudible] a little bit confused. So is it Google that is computing -- you're
giving the MAC key to Google? I mean, if you don't trust Google, you can't use
the security of a MAC you [inaudible].
>> Giuseppe Ateniese: You store those before.
>>: So you compute --
>> Giuseppe Ateniese: Yes.
>>: You [inaudible].
>> Giuseppe Ateniese: Yes.
>>: Why not just [inaudible] you're saying Google refuses to store -- I mean,
usually --
>> Giuseppe Ateniese: You can also store -- yeah, of course. If you don't have
local storage, you can store it using, for instance, authenticated encryption and
say before you compute this, give me my answers that I stored in this safe.
Yeah, I mean.
So the idea here, though, is that we don't want to download books, okay; we
don't want to download files. We don't need to read them in order to check that
they are stored. So our target is to kind of aggregate MACs: if I have several
MACs, I want to find a way to compress them into a single value. And also, I
don't want to send file blocks at all. So I need to be able to check these MACs --
which now are just a single value -- without having the actual messages, which
sounds a little bit crazy.
So since not so many of you are necessarily cryptographers, let me just quickly
refresh your memory on what RSA signatures are, because our solution will be
based on RSA. So I have N equal to the product of two primes; in this particular
case we are using safe primes, so P = 2P' + 1 and Q = 2Q' + 1, where P' and Q'
are also prime. E is the RSA public exponent and D is the secret exponent, such
that E times D is congruent to 1 mod phi(N). And so the public key in RSA
signatures is (E, N), and the secret key is D and the factorization of N.
So to sign a message M in RSA, you use a random oracle H: you hash your
message M and raise it to the D mod N. And to verify the signature, you basically
raise the signature to the public exponent E. If you get back H of M, then the
signature is valid. This is just a standard RSA signature in the random oracle
model. Okay? And indeed, our schemes also work in the random oracle model.
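For reference, here is a toy Python version of that hash-and-sign RSA signature. The textbook parameters p = 61, q = 53 are far too small to be secure and are used only to keep the arithmetic visible.

import hashlib

p, q = 61, 53                        # toy primes (insecure; illustration only)
N, phi = p * q, (p - 1) * (q - 1)
e = 17                               # public exponent
d = pow(e, -1, phi)                  # secret exponent: e*d = 1 mod phi(N)

def H(msg: bytes) -> int:
    # random-oracle stand-in mapping the message into Z_N
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

def sign(msg: bytes) -> int:
    return pow(H(msg), d, N)         # sigma = H(m)^d mod N

def verify(msg: bytes, sig: int) -> bool:
    return pow(sig, e, N) == H(msg)  # valid iff sigma^e = H(m) mod N

sig = sign(b"hello")
assert verify(b"hello", sig)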
So let me look at a first simple solution. So suppose I have a block M_I, okay?
And I want to compute like a MAC or a tag on this block, okay? Now, this MAC or
tag will look like this: T_I = (H(W_I) times G to the M_I), raised to the D mod N.
So H is again a random oracle. I'm simplifying the notation a lot, but consider this
W_I as just the index I, okay? So it represents the position of the file block. For
instance, if I'm considering the second file block, this will be two. Okay? So I'm
hashing two. And G is an element of Z_N with very large order, and it is raised
to the actual file block. Okay? And everything is then kind of signed using RSA:
I'm raising this to D mod N. Okay?
So imagine this is our tag. So how can I design a scheme to verify that M_I is
actually stored at Google without knowing M_I? So what I can do is this
interactive protocol. So I'm asking Google, for instance, hey, I want to know if
you are storing my I-th block. Okay?
So previously, of course, I sent M_I to Google, okay, along with a tag T_I that I
computed as in the previous slide. Okay. So what Google will do is just send
back M_I and the actual tag T_I. Okay? And then I just verify that this tag is valid.
Now, here I'm kind of cheating because, as I mentioned before, I don't want to
send the message in the first place, right? But here we're just sending the
message and a kind of signature on this message. Okay? This is just a
simplification. Let's look at the case where we have many file blocks. So this is
the full solution.
So suppose I have this very long file, okay. So what I do is I store this file at the
cloud provider along with the tags, okay? And then I want to check that this file is
still there. So I use a probabilistic approach: I'm going to ask for random file
blocks. Okay? So I'm going to say to the cloud provider, hey, I want to know if
you are storing my first book, my third book, my sixth book and my seventh book.
Okay? Yup?
>>: [inaudible] storage provider from just choosing, you know, the seventh book
and the eighth book and the ninth book?
>> Giuseppe Ateniese: So you avoid this problem by using this index here, as I
mentioned before in the previous slide. So if you look at this, this is actually --
well, it's not exactly just an index, but you can assume that it specifies that this
is block number one, this is block number three and this is block number six. So
if you use different books, this verification will not pass.
So the position is specified inside these hashes. So the client says, okay, I want
to know M1, M3, M6 and M7, okay, and, by the way, I'm also sending some
random integers, which can be small, like 80 bits each. Okay? And I'm calling
these integers A1, A3, A6, A7. They are randomly generated; for instance, you
can use a seed and the random oracle itself to generate these values. So you
send the random value, a random key, to the cloud provider, and the cloud
provider uses this key to generate these integers.
Okay. So what the cloud provider will do is compute -- so it will pick all the
tags T1, T3, T6, T7 that I stored before, okay, and will compute this mod N:
T1 to the A1, times T3 to the A3, times T6 to the A6, times T7 to the A7, all
multiplied together. So you can see that this is a single value mod N. Okay?
And then, over the integers, it is going to compute a function of the file blocks I
was interested in. Okay? So it's going to compute A1 times M1, plus A3 times
M3, and so on and so forth. This is an integer, not reduced mod anything, because
of course the cloud provider doesn't know the order of the group. Okay? Which is
part of the secret key in RSA.
So the cloud provider will send these two values. As you can see -- suppose the
verification works; after the verification, the idea is that I can check that this
cloud provider is storing these files. I'm sending very short information here,
right? So T is a single value mod N, okay, and this is an integer that actually grows
very slowly, right? It's a sum, so it grows logarithmically with respect to the number
of terms in the sum. Okay? So this is essentially as big as a single block.
Okay?
So rather than sending a linear number of blocks, I'm just sending
essentially a single block. And what I do is I need to check that this message is,
for a technicality, between --
>>: [inaudible].
>> Giuseppe Ateniese: Yeah, as I mentioned before, they can be small; they
are usually 80 bits. So what I do is I verify that these tags -- this kind of
aggregated tag -- were computed correctly, okay? And so what I do is I take
whatever the cloud provider sends me, I raise it to the E, okay, and then I divide
this by -- if you remember how these T_I's were formed, right: H of the index,
times G to the file block. Okay? So what I'm doing essentially is I want to
remove the H part from this value. Okay?
So I'm removing these H parts from this, and what I'm left with is G to the M.
Okay? If this verification passes, then I can claim that those file blocks were
actually stored there at the server, even though I don't have these file blocks in
the first place. Okay? Yup?
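Putting the pieces together, here is a minimal end-to-end sketch of the protocol just described, reusing the toy RSA parameters from above. The element g is just a small stand-in for an element of large order mod N, and the check is written in the multiplied-out form T^E = (product over i of H(i)^{A_i}) times g^M mod N, which is equivalent to dividing out the H parts as described.

import hashlib

p, q = 61, 53                        # toy RSA again (insecure; illustration only)
N, phi = p * q, (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)
g = 2                                # stand-in for an element of large order mod N

def H(i: int) -> int:
    # random-oracle stand-in: hash the block index into Z_N
    return int.from_bytes(hashlib.sha256(str(i).encode()).digest(), "big") % N

def tag(i: int, m: int) -> int:
    # T_i = (H(i) * g^{m_i})^d mod N
    return pow(H(i) * pow(g, m, N) % N, d, N)

# client: tag every block, outsource blocks and tags, keep only (N, e, g)
blocks = [5, 9, 4, 7, 2, 8, 6]       # toy "file blocks"
tags = [tag(i, m) for i, m in enumerate(blocks)]

# challenge: random block indices with small random coefficients A_i
challenge = {0: 3, 2: 8, 5: 5}       # {index: A_i}

# server: aggregate the tags mod N, combine the blocks over the integers
T, M = 1, 0
for i, a in challenge.items():
    T = T * pow(tags[i], a, N) % N
    M += a * blocks[i]               # no modulus: the group order is secret

# client: T^e must equal (prod_i H(i)^{A_i}) * g^M mod N
rhs = pow(g, M, N)
for i, a in challenge.items():
    rhs = rhs * pow(H(i), a, N) % N
assert pow(T, e, N) == rhs           # the challenged blocks are still stored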
>>: [inaudible] M is less than E?
>> Giuseppe Ateniese: Right. I mean, it's important because it actually matters
in the security proof. If M is bigger than E, you can cheat, because you can
pick -- so in order to prove security, what you do is you have this T to the E, so
the simulator, you know, simulates the game with the adversary and will
create like a fake file and tags. And then suppose there is an adversary
that will come up with a forgery, okay, so it will come up with some T prime such
that T prime will pass this verification even though the files are not there, okay?
Then you divide T to the E by this value -- so the idea is that you want to
make sure that you get on one side of the equation this quotient of T and T prime
raised to the E, and on the other side of the equation something that is smaller
than E, so that you can use Shamir's trick. Sorry, this is slightly technical.
>>: [inaudible] because the lengths of M --
>> Giuseppe Ateniese: Yeah, I mean, C is fixed a priori; C, the number of
challenges, is fixed a priori.
>>: [inaudible].
>> Giuseppe Ateniese: Yeah. And E is a very large prime number. Yeah.
And by the way, this works -- sorry, I'm talking [inaudible] again, but this works
even when the RSA exponent E is not less than N. So in this case in
particular, E is much bigger than N -- it can be much bigger than N. So
technically, this is an extension of RSA. But it's a kind of RSA with a big
exponent.
So question?
>>: [inaudible].
>> Giuseppe Ateniese: Oh, no. No. I was talking about the number of challenges.
So here I'm asking for four challenges, right? Depending on the application you
may need 400, for instance, rather than --
>>: [inaudible].
>> Giuseppe Ateniese: You can repeat this as many -- it's unbounded. Yup?
>>: How strong can [inaudible] could be AB [inaudible] so it could be like
[inaudible].
>> Giuseppe Ateniese: No, this has to be like 80 bits.
>>: [inaudible].
>> Giuseppe Ateniese: Well, because in the end what you want to prove is that
if there is a cheating adversary, you can extract the actual file, okay? So what
you do is -- there is an extractor that queries the adversary, without rewinding,
several times until it gets a series of equations that are linearly independent,
and then you solve the equations and you get the messages back.
>>: [inaudible].
>>: [inaudible].
>> Giuseppe Ateniese: Okay. So the features of this scheme: well, first of all,
there is a nice side effect here. I don't know if you noticed it. If the client
can check that these file blocks are actually stored there without having the file
itself, which is great, right, then everybody can do it. I mean, I can do it even if
I'm not the data owner and have no idea of the M_I's -- even if the verifier
doesn't have any idea of the M_I's. Okay?
So actually, this is what we call public verifiability, okay? So everybody can
actually check that, say, Google is storing the Library of Congress, even
though we have no idea what the Library of Congress is storing at Google.
Okay? So everybody can do it. So that's a great feature for many,
many, many applications. I should mention that after this work, I think we
counted like 60 publications in this area. So it's becoming quite
interesting and popular.
Another feature is that there's an unbounded number of challenges. Of course
you can run this challenge as many times as you want. I'm emphasizing this
because previous solutions provided just a limited number of challenges, and also
previous solutions required the data to be encrypted, okay? We don't necessarily
have to encrypt the data; it could be just public data. Yup?
>>: I'm confused by your latest statement. So --
>> Giuseppe Ateniese: Public verifiability?
>>: Yes. Because I could do the same game with another library and [inaudible]
what would [inaudible] so this is the Library of Congress that I'm checking.
>> Giuseppe Ateniese: Well, yeah. Remember that this is public -- the RSA
public parameters are given to you by the Library of Congress. So the Library of
Congress is saying --
>>: [inaudible].
>> Giuseppe Ateniese: Yes. You are the taxpayers. This is E and N. Please,
you know, once every two weeks, check that Google is storing everything for us.
You know. So the idea is that now people are using this in order to outsource
this kind of verification. Because now everybody can check. So even an auditor
can check that those files are stored there, even though the auditor has no idea
what this data is about.
And think about liability and financial records. Sometimes, you know, you
necessarily have to outsource to an auditor who can check that these financial
records are stored correctly and that nobody's modifying them. But for
privacy reasons, before a certain date they cannot be released, so you want to
provide also some form of privacy protection. Yes?
>>: [inaudible] would it be fair to say [inaudible] in the Library of Congress
[inaudible] a public verification key and basically it gives to the Google a signed
copy of whatever [inaudible] then basically what Google does it basically
[inaudible] proofs [inaudible] is.
>> Giuseppe Ateniese: Right.
>>: [inaudible].
>> Giuseppe Ateniese: You can but remember that here we are talking about a
very large file. So when you do these proofs, right.
>>: Right. But it didn't give us [inaudible] so it's [inaudible].
>> Giuseppe Ateniese: Right. But the problem is --
>>: [inaudible].
>>: Software from using PCP and the random oracle [inaudible].
>>: And in a sense it's more efficient [inaudible].
>> Giuseppe Ateniese: Right. Because the problem is you don't want to access
the whole file. Unfortunately -- when you want to prove --
>>: [inaudible].
>> Giuseppe Ateniese: Right. But there you pay like logarithmic, and here it's
essentially constant. Yeah, I mean, also one thing we want to emphasize is that
we wanted to look at something that was, you know, essentially constant in size.
So independent -- yeah?
>>: So if you [inaudible] make sure to detect, let's say, one percent loss of data.
If you are like below the ECC layer, can you do better and make sure that data
cannot be lost, because --
>> Giuseppe Ateniese: Yeah. I mean, again, for efficiency reasons here we are
just focusing on detecting corruption of a percentage of the file. Okay? If you
want to recover your data, that's a different problem. You can apply error
correction codes, of course. But then it becomes less practical for several
reasons; in particular, we are also interested in dynamic data. See, here I was
talking about archival storage. This is kind of static: you can only add data, you
can never delete information, okay? And these protocols using error
correction codes are fine as long as the data is static.
But if you want to change file blocks -- if you want the ability to modify a picture
that you stored three years ago, say you want to apply some filter and then store
the new version of the picture -- then it becomes extremely complicated and
inefficient if you use error correction codes.
>>: [inaudible] underlying assumptions are RSA.
>> Giuseppe Ateniese: Yes, it's RSA, but in the random oracle model. It was
argued before that the random oracle doesn't exist. I think it's debatable,
because you can always think of like a remote server that computes the hash
for you. But yeah, I mean, unfortunately -- we do have some results.
>>: [inaudible].
>> Giuseppe Ateniese: No, no, it's standard RSA. Yes.
>>: [inaudible].
>> Giuseppe Ateniese: Yeah. We have a -- yeah. This is the paper I wanted to
mention here. Well, first I want to mention that if you don't care about an unbounded
number of challenges, and if you don't care about public verifiability, then
together with Gene Tsudik and researchers in Rome, we found a very efficient way
of doing this that is extremely fast and uses only hash functions. Okay?
So the intuition -- I'm not going to go into details, but the intuition there is that you
kind of precompute all the responses that you would expect from Google or any
cloud provider, and you store these at Google itself, encrypted using, for instance,
authenticated encryption. Okay?
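One way to read that intuition as code -- a hypothetical sketch, not the scheme from that paper -- is to precompute a bounded number of MAC challenges over the file, authenticate the expected answers under a master key, park them at the provider, and later reveal one challenge key at a time. Here an HMAC over each stored answer stands in for full authenticated encryption:

import hmac, hashlib, os

file_data = b"the outsourced archival file" * 1000
master = os.urandom(32)                  # the only long-term client secret

# precompute a bounded number of challenge keys and expected answers
chal_keys = [os.urandom(16) for _ in range(5)]
answers = [hmac.new(k, file_data, hashlib.sha256).digest() for k in chal_keys]
# authenticate each answer under the master key; park everything at the provider
stored = [(a, hmac.new(master, a, hashlib.sha256).digest()) for a in answers]

# challenge number i: reveal chal_keys[i]; the provider recomputes the MAC
# over the file it actually holds and returns the stored record
i = 2
response = hmac.new(chal_keys[i], file_data, hashlib.sha256).digest()
answer, mac = stored[i]
assert hmac.compare_digest(mac, hmac.new(master, answer, hashlib.sha256).digest())
assert hmac.compare_digest(response, answer)   # the file is (still) intact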
So we have this very efficient solution. So the complexity of the previous solution
comes from the fact that we are also interested in public verifiability and an
unbounded number of challenges.
But one may argue that you don't really need an unbounded number of challenges
anyway.
In recent work here at Microsoft with Seny Kamara and Jonathan Katz, we
actually found a generic way of transforming what we call homomorphic
sigma protocols -- sigma protocols that have a certain
homomorphic property -- into PDPs. And this transformation doesn't use random
oracles. However, we need the random oracle for the actual construction. But
thanks to this compiler, now we can pick any sigma protocol, as long as it
satisfies this homomorphic property, and build a PDP. For instance, we built a
PDP based on factoring. And it turns out that this is as efficient as the RSA
version. So this shows the benefits of this compiler.
>>: [inaudible].
>> Giuseppe Ateniese: So the transformation doesn't use any random oracle.
>>: Does it [inaudible].
>> Giuseppe Ateniese: Yeah. But then the actual sigma protocol
might require the random oracle. So in this result, for instance, we
started from a sigma protocol that has these properties but that uses the
random oracle. And then we applied our compiler. So in the end, we had to rely
on the random oracle. But it's an interesting open question to start from a
sigma protocol with the homomorphic property that works in the standard model.
Okay? We haven't found any.
>>: [inaudible].
>> Giuseppe Ateniese: No.
>>: What is the technical difference between [inaudible].
>> Giuseppe Ateniese: Yeah. That too is kind of [inaudible]. You can see proofs
of retrievability as PDP -- so this technique -- applied to a file that is encoded using
an error correction code or erasure codes. I mean, this is a very rough equivalence.
But it's not exactly that.
>>: [inaudible].
>> Giuseppe Ateniese: Yeah. I mean, suppose my file is encoded using an error
correction code first, before I apply the PDP scheme to it. Okay? Then of course
if I want to retrieve files, right, I retrieve the good ones and I recover the bad
ones using the error correction code. But that's a rough -- as I call it -- there are
distinctions, very subtle distinctions.
>>: [inaudible].
>> Giuseppe Ateniese: We can talk offline. But more or less it's a PDP using
erasure codes first. So you encode your file and then you apply the [inaudible].
>>: [inaudible].
>> Giuseppe Ateniese: I mean, it's different -- for us, we don't care about
retrievability. So I don't want to prove that my scheme satisfies the retrievability
property. I just care about detection, mostly because of efficiency. We focus on
more efficient schemes.
Okay. So let me focus on the other interesting primitive, which I think is very
important for cloud crypto in general: proxy reencryption. It provides a
way to -- yup?
>>: [inaudible].
>> Giuseppe Ateniese: Yes, this was the question.
>>: [inaudible].
>> Giuseppe Ateniese: So basically -- POR is basically PDP plus erasure codes,
more or less. I mean, just to see it [inaudible].
>>: [inaudible].
>> Giuseppe Ateniese: Yeah. So what is proxy reencryption? And why is it
useful for the cloud? So proxy reencryption is basically the following. So
suppose I have a message encrypted for Alice. Say we have this mail server --
so this encrypted e-mail goes to the mail server, right? The mail server forwards
this to Alice. Now suppose Alice goes on vacation, and she asks Bob
to read her e-mails while she's away. Okay? So ideally we would like to have
a mail server, okay, that gets this encrypted e-mail under Alice's public key,
decrypts it, okay, and reencrypts this information using Bob's public key. Okay?
So that would be a solution.
The problem with this solution, though, is that the mail server first has to know
the secret key of Alice, okay? And so the mail server will see the actual
message. Okay? So that's not a satisfactory solution.
So in proxy reencryption, what we want to do is the following. We want to find a
way to provide some proxy key to the mail server, okay, that is used to translate
between encryptions. So think of proxy reencryption as a way to translate between
different languages. So this is a message encrypted for Alice, and this is the
translator: it can translate it into something that only Bob can understand.
Now, the nice thing about proxy reencryption is that from this information provided
to the proxy there is really no way to recover the message, okay? So the proxy
doesn't have to know the secret key of Alice, okay? And at the same time it will
not be able to actually read the message. So that's the ideal solution.
Now, this concept was introduced at Eurocrypt '98 by Blaze, Bleumer and
Strauss, and they provided a very basic protocol for, let's say, El Gamal.
Okay? So basically the idea is very simple. Suppose Alice has public key G to
the A, okay? And Bob has public key G to the B. So in order to encrypt using El
Gamal, what I can do is pick a message M in the group generated by G and
multiply M by G to the R, where R is random. So G has to generate a group of
prime order, and R is chosen in Z_Q, where Q is this prime order. Okay?
And then I also release G to the RA. This is actually a variant of El Gamal, but it's
equivalent. And I want to come up with something that will allow me to go from
here, right, which is an encryption for Alice, to here, which is an encryption for
Bob. Okay? The way they do it is by releasing B divided by A mod
Q, okay, where Q is this prime number.
So the proxy just has B divided by A, so it doesn't know A or B, okay; it
doesn't know the secret keys. And the proxy can raise this to B over A to actually
get G to the RB. So it will copy the first component entirely, okay, and it computes
this operation on the second component of the El Gamal ciphertext.
So if this was a proper encryption for Alice, this is also a proper encryption for
Bob. Okay?
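A toy Python version of that bidirectional scheme, in the order-11 subgroup of Z_23* (parameters obviously insecure, chosen only so the numbers stay small):

import random

P, q, g = 23, 11, 4          # P = 2q + 1; g generates the subgroup of order q

a = random.randrange(1, q)   # Alice's secret key (public key g^a)
b = random.randrange(1, q)   # Bob's secret key   (public key g^b)

m = pow(g, 7, P)             # message encoded as a subgroup element

# encrypt to Alice: (m * g^r, g^{r*a})
r = random.randrange(1, q)
c1 = m * pow(g, r, P) % P
c2 = pow(g, r * a % q, P)

# proxy key b/a mod q; note it works in both directions (bidirectional)
rk = b * pow(a, -1, q) % q

# the proxy copies c1 and raises c2 to b/a: (g^{ra})^{b/a} = g^{rb}
c2_bob = pow(c2, rk, P)

# Bob strips his key from the exponent and divides out g^r
g_r = pow(c2_bob, pow(b, -1, q), P)
assert c1 * pow(g_r, -1, P) % P == m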
Well, there are several issues with this protocol, though. First of all, it's
bidirectional. That means that if Alice goes on vacation, okay, and the proxy has
this key, of course encryptions for Alice can be transformed into encryptions for
Bob -- but also the other way around: encryptions for Bob can be transformed
into encryptions for Alice. Remember that this is mod Q and Q is prime, so every
element has an inverse.
Another problem is that generating the proxy key requires interaction: in order to
generate B divided by A, I need both Bob and Alice to be online and participate.
And if one of the two colludes with the proxy, the proxy can get the other's secret
key, which is really kind of bad. And there's also transitivity: if the proxy has
several keys, for instance, from Alice to Bob and from Bob to Charles, then the
proxy itself, without ever interacting with the users, can generate a key from Alice
to Charles, even though they may not know each other.
>>: [inaudible] if I have A to B I can transform an A ciphertext to a B ciphertext
and --
>> Giuseppe Ateniese: Oh, you may want something like A to B, and then you
have B to C, but not necessarily a relationship A to C.
>>: [inaudible]. Because A to B and B to C let you transform A ciphertext to B
ciphertext which is exactly the functionality of B to C, right?
>> Giuseppe Ateniese: Yeah, yeah. I mean, in principle, yes. But you can find
versions of this protocol where you actually cannot do this. So you can say, I
want to be able to go from A to B and from B to C, but I do not
necessarily want A to C. I don't want the transitivity property. And you can
actually achieve that. So this property may be good for some applications but
actually bad for others.
>>: Are you saying that the transformation from A to B produces some kind of
special ciphertext, so just if you --
>> Giuseppe Ateniese: Yes. And partially -- yeah. I mean, all these solutions
have what we call first- and second-level encryptions, which are distinct, yeah.
The protocol is kind of asymmetric. So of course it's a big area. We also
defined the security of these schemes. And in order to solve this problem, which
seems hard, we kind of used the magic wand in cryptography, bilinear maps, and
since they were introduced before, I don't have to introduce them here. So we
use a bilinear map E, a symmetric map from G1 times G1 to G2, and we
publish as before, like in El Gamal, G to the A and G to the B as the public keys
of Alice and Bob, and we publish Z, which is the pairing of the generator with
itself -- a generator of G2, which lives in what's called the extension field. And we
encrypt messages in the extension field -- in G2. Sorry.
So here comes the asymmetry we mentioned before. So suppose I want to
encrypt a message for Alice such that the proxy cannot translate it. So only Alice
can read this encryption, okay? Then I just compute like standard El Gamal, but
in G2: so M times Z to the K, and Z to the KA. And this is just standard El Gamal:
Alice just removes the A from the exponent, divides by Z to the K, and gets M back.
Suppose now instead I want to encrypt a message for Alice so that Alice can
read it, of course, but if she is on vacation and has delegated someone else,
okay, then the proxy can translate it. And the idea is that now we send M times
Z to the K and G to the KA. Okay, where G now is a generator of G1, not of the
extension field anymore. And now we provide to the proxy this key, G to the B
divided by A. Now, notice that B divided by A is in the exponent, okay? So
informally it's protected. It's not available anymore to the proxy. Okay?
And also notice that in order to compute this G to the B divided by A, Bob doesn't
need to be online at all. It's not an interactive scheme anymore, because I can
start from the public key of Bob, G to the B, which is available by definition.
Okay? So we solved like two, three problems at the same time.
And then, very simply, in order to transform this encryption for Alice into an
encryption for Bob, the proxy will just compute the pairing of G to the KA and
G to the B divided by A, and you get exactly Z to the KB -- which is, of course, a
properly formed El Gamal encryption for Bob.
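Collecting the values just described, with k as the encryption randomness, the whole pairing-based scheme fits in a few lines (a LaTeX summary of the equations above):

\begin{aligned}
&\text{setup: } e : G_1 \times G_1 \to G_2, \quad Z = e(g, g), \quad pk_A = g^a, \quad pk_B = g^b \\
&\text{first level (only Alice): } (M \cdot Z^k,\ Z^{ka}) \\
&\text{second level (translatable): } (M \cdot Z^k,\ g^{ka}) \\
&\text{proxy key: } rk_{A \to B} = g^{b/a} \\
&\text{translation: } e\big(g^{ka},\ g^{b/a}\big) = Z^{kb}, \ \text{giving } (M \cdot Z^k,\ Z^{kb}) \\
&\text{Bob decrypts: } M = (M \cdot Z^k) \big/ (Z^{kb})^{1/b}
\end{aligned}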
Okay. There are a lot of interesting protocols here. Let me mention why this is
useful in the cloud scenario. These, actually, I think are Ken's slides.
So we have different clients and we want to use a storage provider, right? So
we want to encrypt all these files. As it works right now, suppose these clients
have to share files, okay? Suppose we have Google docs that are encrypted,
right, and we want to share these files. Current solutions to this problem
require what is called a key server, okay? So the key server encrypts all
these files, okay? And then if I want to access a certain file, I will have to ask
the key server for the key and say, hey, can I decrypt this file? Okay?
And the key server will say, well, let me see if you have access rights to this file;
I will ask the owner if I should provide you with this information. Okay?
And this works pretty well. The only problem here is that we have to trust the key
server. The key server in principle could read all the files, because the key server
owns the actual keys. Okay? But these are the current solutions. And also the
key server is always online, so it's always a single point of attack, okay? So if you
compromise the key server, you get all the keys. So how can I use proxy
reencryption here? Well, it's very simple.
You have a question?
>>: Yes. Can we go back to the [inaudible]. So it looks like there are two ways
to encrypt, for -- if I want to encrypt something to [inaudible].
>> Giuseppe Ateniese: Yes.
>>: [inaudible].
>> Giuseppe Ateniese: Well, the message is always the same. The first part of
the encryption doesn't change. What changes is the second part. So it's the
decryption algorithm that is different in one case versus the other. But the
message is always the same.
>>: [inaudible].
>> Giuseppe Ateniese: In this case?
>>: [inaudible].
>> Giuseppe Ateniese: Okay. So in the first case it's just plain El Gamal. So
what you do is you remove your secret A from the exponent and you divide these
two, right, and you get M back. In the second case, you can simply remove the
A, compute the map -- the pairing -- so you get to Z to the K, right, using the
map, and then divide by Z to the K again.
>>: You have to know which [inaudible] in the first group or in the second group?
>> Giuseppe Ateniese: Yes, because it's implicit -- there are two distinct groups,
and you specify whether this is a first-level encryption or a second-level encryption
when you encrypt. So the first-level encryption is just for people who want to send
a message, say, to Alice, and there is no way for anybody else to read this message,
even if there are proxies set up by Alice. Okay? The second one says, well, you
know, if Alice delegated someone, it's fine by me. The important thing is that
Alice gets the message.
>>: [inaudible].
>> Giuseppe Ateniese: Well, unfortunately we have this asymmetry somehow, but
it would be an interesting problem to find unidirectional ones, because, you know,
consider that this scheme is now not bidirectional. For instance, I can only go from
Alice to Bob, I cannot go back. I cannot translate encryptions from Bob to Alice.
Okay. But we can talk offline. I have two minutes.
So I was here. I was mentioning -- so what you do, then, if you don't want to
reveal your key to the key server, is the following: I encrypt my file using, for
instance, AES in CBC mode with a symmetric key, and then I encrypt the
symmetric key using my public key. And then I store this in the cloud
storage, okay?
Now, suppose Bob wants to access this file. Okay? So what I can do is send a
reencryption key from A to B to the cloud provider. Now, all the cloud
provider can do is translate: it will pick the encrypted symmetric
key and convert the encryption from A to B, but by the properties of proxy
reencryption it won't be able to learn the underlying message -- that is, the
underlying key -- and it won't be able to learn any secret key of Alice or Bob. Okay?
And so Bob will be able to decrypt this information and use this key to decrypt the
actual file using AES, symmetric encryption. Okay?
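A sketch of that hybrid flow in Python, reusing the same toy group and BBS-style translation from the earlier snippet; a hash keystream stands in for AES-CBC, and all parameters are illustrative only:

import hashlib, random

P, q, g = 23, 11, 4                       # toy group again (insecure)

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # stand-in for AES in CBC mode: XOR with a SHA-256 keystream
    out, ctr = b"", 0
    while len(out) < len(data):
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(x ^ y for x, y in zip(data, out))

a, b = random.randrange(1, q), random.randrange(1, q)  # Alice's / Bob's keys

# Alice: pick a random group element K and derive the file key from it
K = pow(g, random.randrange(1, q), P)
file_ct = xor_cipher(str(K).encode(), b"the shared document")

# wrap K under Alice's public key: (K * g^r, g^{r*a})
r = random.randrange(1, q)
wrap = (K * pow(g, r, P) % P, pow(g, r * a % q, P))

# the cloud holds rk = b/a and translates only the wrapped key
rk = b * pow(a, -1, q) % q
wrap_b = (wrap[0], pow(wrap[1], rk, P))

# Bob unwraps K and decrypts; the cloud never saw K or the plaintext file
g_r = pow(wrap_b[1], pow(b, -1, q), P)
K_bob = wrap_b[0] * pow(g_r, -1, P) % P
assert xor_cipher(str(K_bob).encode(), file_ct) == b"the shared document"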
And so this is very cool. Actually, a similar system -- I don't know if you
remember, there was this attack on Apple iTunes in the past. The problem was
that they were trying to convert an encryption from a certain user to another user,
and it was attacked because, you know, this key was actually stored inside the
program. If they had used proxy reencryption in this way, they would have
avoided this problem in the first place.
We have some recent results, but I'm not going to talk about them. I just want to
mention that one property we provided recently is what we call key privacy for
proxy reencryption. And the idea here is that the proxy will not learn even the
participants by looking at the proxy key. Okay? It will not learn the identities of
Alice and Bob. And this has several interesting applications.
To conclude: again, cloud cryptography is a great opportunity for research. And
for the first time, I think, cryptography is really essential to the cloud provider. I
mean, cloud providers will fail if they don't provide a way to give users control
back over their own data. People will never really use cloud storage
or cloud providers if they don't feel they have control over their data. So this time
it seems like crypto can make a difference.
I hope I convinced you that provable data possession is cool. I think proxy
reencryption is good in general, and it's a good way to provide access rights in the
cloud. And let me just briefly mention three interesting open problems. One is to
provide PDP with full privacy. So, as I mentioned, I can check that a file is stored
at Google, for instance, without having the file in the first place; but
during these proofs I can actually leak some information about the actual file
blocks. It would be nice to provide complete zero knowledge. I actually should
mention that with Seny Kamara and Jonathan Katz we have some results on this
front, so now we can have full privacy in PDP schemes, which is related to a kind
of leakage-resilient signature scheme. It would also be nice to find efficient PDP
schemes for multiple storage servers. And for the proxy reencryption part, it would
be nice to find efficient unidirectional and multi-hop proxy reencryption schemes.
I should mention that if you use fully homomorphic encryption, you can build
unidirectional multi-hop reencryption schemes. So theoretically a solution is known.
The emphasis is on efficient, and possibly key-private. So it's still open to find
an efficient solution for that. Thanks.
[applause].
>>: [inaudible] time for one question here.
>> Giuseppe Ateniese: You are not supposed to ask anything.
>>: [inaudible] so I [inaudible] file systems, and I talked to Susan and I said, hey
Susan, do you know any constructions in the symmetric-key world that can do
this kind of [inaudible] encryption? And has it been shown now that that's
impossible, or [inaudible] what's the [inaudible].
>> Giuseppe Ateniese: No. The latest result on this is a paper from the folks at
Columbia on this. But it's a very limited protocol for symmetric -- so the
question is, can we do proxy reencryption using symmetric primitives? For instance,
take an encryption under a certain AES key K1 and
transform it into an encryption under AES with a key K2.
>>: What's the role of the server then?
>> Giuseppe Ateniese: It's just a proxy.
>>: This setting changes doesn't it?
>> Giuseppe Ateniese: Yeah. I mean, suppose you have a symmetric
encryption and you want to transform it into an encryption under a
different key, without knowing the actual message. There has been some work
in the past, but nothing really that provides all these features; so far you can find
this only in public-key cryptography.
But there is no result that says this is not possible. It looks unlikely, also
because it's hard to prove anything.
>>: Public key [inaudible].
>> Giuseppe Ateniese: No, no, the question is if I have --
>>: Of course.
>> Giuseppe Ateniese: Yeah. But then I would have to use number theory. So
the question is more like not just symmetric encryption, but efficient symmetric
encryption using AES or DES or symmetric ciphers.
>>: It's not even clear what impossible means, right?
>> Giuseppe Ateniese: Right. Exactly. But, yeah, I mean, not using number
theory to achieve --
>>: [inaudible].
>> Giuseppe Ateniese: So efficient PRFs or PRPs, or block ciphers in general.
So is it possible to do it? We don't know. I mean, there are some partial results
from these guys at Columbia.
>>: Okay. Let's thank the speaker again.
[applause]