>> Melissa Chase: All right. So we're very happy this week to have Esha Ghosh visiting us from Brown University. Esha is a Ph.D. candidate and she's been working at the intersection of privacy, security, and applied crypto. And today she's going to talk about some of her work on authenticated data structures and adding zero knowledge.
>> Esha Ghosh: Thanks, Melissa. So, today I'm going to talk about our work on efficient zero-knowledge authenticated data structures. This is a brief overview of the talk. First I'm going to tell you why this field of study is important and what this field of study is. And I'll tell you about some traditional authenticated data structures that actually do not give you any support for privacy. Then I'll give you first an informal and then build up to the formal security definition that we propose. And finally, I'll give three different constructions for three different data structures; this section is completely modular. And finally, we will conclude. So with the advent of cloud computing, a large number of corporations and individuals outsource their data and computation to untrusted cloud storage or servers. It is a very common thing now, and since the owner of the data is not in physical possession of the data anymore -- it is stored on an untrusted server -- there is an obvious question of integrity, meaning that the data has not been tampered with and has been maintained properly. But along with that, third parties are often given partial access to this data, which introduces another problem of privacy. So they have partial access and a partial view of this data, but to prove integrity, often more information is leaked about the data than they should learn. This is more than a theoretical risk, because there have been serious attacks here. So
I'm just going to give a few motivations. Let's start with the zone enumeration attack. It is an attack on DNS name resolution queries, where a DNS query basically says: give me the IP address of some host name. And the way zone enumeration works is that an attacker gets to know all the IP addresses and all the host names of a particular zone. So how does it work? The owner of a zone -- let's say there are two kinds of resolvers, primary name servers and secondary name servers. When a query comes in, give me the IP address of this particular website, the resolver gives the answer: this is the IP address. Now, in the early '90s, DNSSEC was proposed to defend against attackers, so that you know the correct IP address has been returned: addresses are signed by the trusted owner. And then there is this issue: queries can come for arbitrary host names that correspond to no IP address at all. So you also have to prove to the querier that the result is not in the database. So there are two kinds of queries: one is positive membership, one is non-membership or negative membership. Proving positive membership is easy. In the offline phase, the authoritative server can sign all the records that are in the zone and store them with an untrusted server. This setting is very important because the authoritative name servers that administer a zone often have secondary resolvers. For example, if you do an NS lookup for Brown [indiscernible], three of the name servers are owned by Brown; one is actually owned by UCSP. So what happens is the secondary resolver can also answer these queries, and it may not be trusted. Secondary resolvers do not administer the zone. So the problem here is how to prove non-membership. The membership proof is sort of easy: the authoritative name server signs it offline, and the secondary just gives the signature, and that proves membership. So there
have been a few ways proposed for non-membership. The most recent one is NSEC3, which basically says: hash all the host names, sort those hash values, and sign every adjacent pair. Then store these records with the secondary resolver. Now, when a query comes, say q.com, which is not in the zone, the proof is: the hash of q.com is computed, and let's say the hash falls between these two records. So it is efficient to return those two signed records, and that proves that the name is not in the zone. Now, what is the problem with this proof? Of course, a dictionary attack -- the zone enumeration attack. What an attacker can do is ask many, many arbitrary queries, collect all these hashes from the zone, and then mount an offline dictionary attack on them. So this is a scenario where privacy is really important, not just integrity. Okay. Another example is where public -- yes?
>> [Indiscernible].
>> Esha Ghosh: Yeah.
>> So why shouldn't someone be able to see the list of all the domains
[indiscernible]?
>> Esha Ghosh: Oh, this is a privacy [indiscernible]. You are going to learn all the host names in a zone. That could be privacy-sensitive in itself. You might learn all those names, the router names and everything. And that is a base to mount more complex attacks.
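The NSEC3-style proof and the enumeration attack just described can be sketched in a few lines. The zone contents, query names, and use of plain SHA-256 here are all illustrative (real NSEC3 uses salted, iterated hashing and actual signatures), but the leakage pattern is the same: each non-membership proof hands the querier two real hashes from the zone.

```python
import hashlib

def h(name: str) -> str:
    return hashlib.sha256(name.encode()).hexdigest()

# Offline phase: the authoritative server hashes the zone's names,
# sorts the hashes, and signs every adjacent pair (the signatures
# themselves are omitted in this sketch).
zone = ["mail.example.com", "www.example.com", "vpn.example.com"]
chain = sorted(h(n) for n in zone)

def nonmembership_proof(query: str):
    """Return the neighbouring pair of zone hashes bracketing h(query)."""
    hq = h(query)
    lo = max((x for x in chain if x < hq), default=chain[-1])
    hi = min((x for x in chain if x > hq), default=chain[0])
    return lo, hi   # each proof leaks two real hashes from the zone

# Zone enumeration: ask arbitrary absent names, collect the leaked
# hashes, then mount an offline dictionary attack against them.
leaked = set()
for q in ["a.example.com", "zz.example.com", "q.example.com"]:
    leaked.update(nonmembership_proof(q))

dictionary = ["www.example.com", "ftp.example.com", "mail.example.com"]
recovered = [n for n in dictionary if h(n) in leaked]
```

After a handful of queries the attacker's `leaked` set contains most of the zone's hashes, and any name in the attacker's dictionary that appears in the zone is recovered offline, without further queries.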
>> Okay. It's not information I [indiscernible].
>> Esha Ghosh: No, no. You have to learn. You have to ask, what is my IP address, and then you learn all the host names in that zone. Okay. This is the next example: personal digital records are often subject to audits, or authorized parties or analysts are given access to them when something malicious goes on. For example, take the e-mail records of a particular company, often stored with a third-party server. In case of some sort of fishy activity between certain dates -- in this example, say, this date period -- an authorized data analyst is given access to the records from within this time period. But because of privacy issues, the requirements here are twofold. First of all, you need to prove to the analyst that the server hasn't tampered with the messages -- it is returning the correct messages that it got from the organization. But also, because of privacy reasons, it's necessary not to reveal anything beyond this time period and its data. And finally, this is another example where XML data is stored as directory trees. For a patient health record, for example, the insurance company might have access to a certain portion of the tree, and to prove integrity, you should not reveal anything beyond that subtree [indiscernible] access. Some more examples. So
these scenarios motivate the following model. Okay, so this is a three-party model where there is an owner, a server, and clients. The owner is a trusted owner of the data set, someone you would trust to generate the data set and give some authentication information on it. Now, this data is stored on an untrusted server. So the adversarial model is that the server is not trusted to store the data faithfully all along. It might delete it, or it might get [indiscernible], might get attacked -- okay, I'll come to that later, but the data is stored with the server. So the owner basically generates some sort of authentication information about this data set and stores it with the server. It also generates a short, succinct digest of this data set and makes it publicly available. Okay. And the owner can periodically update the data, but otherwise the owner goes offline from this scenario. Now, the clients interact with the server to ask queries on the data set and get responses. Here, the first requirement is integrity: the client never accepts incorrect answers. So whatever the owner has generated, the server cannot give some answer which is inconsistent with the data set generated by the owner. And the privacy requirement is that the client does not learn anything beyond the answer to the query -- anything else about the data set. Okay. So if you are familiar with traditional authenticated data structures: they give you the guarantee of integrity, that the data is not getting tampered with, but the proofs generally leak a lot of information. For example, here is a very well-known data structure, the Merkle hash tree. When you prove membership in a Merkle hash tree -- say you want to prove the integrity of element x2 -- you return the authentication path, those red nodes, and that reveals a lot of information. First of all, it reveals the number of records in the database, and if the data elements are stored sorted, it also reveals the rank of the element. So a lot of leakage happens along with the proof that this element is indeed in the database. And the way non-membership is proved is, if the elements are ordered -- like we just discussed with the hashes -- you return the two neighboring elements, and that also reveals a lot of information. So this is not privacy preserving. Okay. Here is another data structure, the authenticated skip list, which again reveals a lot of information. This is to say that the traditional ADSs do not give [indiscernible] privacy. Okay. So now I'll come to the formal security definitions, so if you have any questions at this point, it's good to ask. Okay.
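The Merkle-tree leakage just described can be made concrete with a small sketch (the data and hash choice are illustrative): for a tree over sorted leaves, the direction bits of the authentication path reconstruct the leaf's index, i.e. the element's rank, and the path length reveals the tree size.

```python
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build(leaves):
    """All levels of the tree: leaf hashes first, root level last."""
    level = [H(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, idx):
    """Authentication path: one (sibling hash, sibling-is-right) per level."""
    path = []
    for level in levels[:-1]:
        sib = idx ^ 1
        path.append((level[sib], sib > idx))
        idx //= 2
    return path

def verify(leaf, path, root):
    node = H(leaf)
    for sib, sib_is_right in path:
        node = H(node + sib) if sib_is_right else H(sib + node)
    return node == root

leaves = sorted([b"alice", b"bob", b"carol", b"dave",
                 b"erin", b"frank", b"grace", b"heidi"])
levels = build(leaves)
path = prove(levels, 5)

# The leakage: the direction bits alone reconstruct the leaf's index,
# i.e. the element's rank in the sorted order, and len(path) reveals
# that the tree has 2**len(path) leaves.
rank = sum((0 if sib_is_right else 1) << i
           for i, (_, sib_is_right) in enumerate(path))
```

The proof verifies as intended, yet the verifier has also learned the element's rank (5) and the database size (8) for free -- exactly the leakage a zero-knowledge authenticated data structure is meant to suppress.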
Okay. So in this setting there are three parties: the owner, the server, and the client. And there are three algorithms. One is the key-generation algorithm, which takes the security parameter and generates a secret key and a public key for the owner. In the second phase, setup, the owner has its own secret key and the data set -- let's say at time zero. It generates some succinct authentication information, the little orange box that we saw; that will be publicly posted, anybody can access it. Then there is theta zero, which is the authentication information that it will store with the server to basically facilitate answering queries. And some internal state information that it will use if it ever wants to update the database. In the update phase, it takes all this information -- the database snapshot at some time i -- and an update u i that it wants to apply to the database. For example, if it's a set, maybe it wants to insert a new element or delete an existing element. And it gives you the updated database, the updated succinct digest sigma i plus one, some update string [indiscernible] used to update things at the server's end, and it updates its own state information.
>> [Indiscernible].
>> Esha Ghosh: This one?
>> [Indiscernible].
>> Esha Ghosh: This one?
>> [Indiscernible].
>> Esha Ghosh: Oh, that is the server. So the server, as of now, has theta i. So it will take this update string and update it at its site, which is this. Okay. It takes the update u i, updates the database, and the [indiscernible]. Now, the server is responsible for the perform-update and query algorithms. In the query algorithm, it has this sigma j, which is public information everybody has; it has theta j, corresponding to database j; and it receives a query directly from the client. Okay. And it generates an answer and a proof. So the rest of the information it has from the owner; only the query is the online information that comes from the client. And it generates a proof for that. Yeah?
>> [Indiscernible].
>> Esha Ghosh: This one?
>> [Indiscernible].
>> Esha Ghosh: It depends on the implementation. It could even have it in its state information. So it just needs to know what was the last state that it was in. Okay. And in the verification phase, the client just takes the public key, the sigma, and the query, its answer and proof, and either accepts or rejects. Okay. So like we already described, the security requirements are twofold. Completeness just means that correct answers will always be accepted. Soundness is the game that models the adversarial behavior of the server. So in this game, the adversary initially sees the public key from the challenger. And it comes up with a database of its choice, [indiscernible] zero. It gets the information that an honest owner would generate, [indiscernible] sigma zero and theta zero, and then some polynomial number of times it asks for updates on the database and gets the update strings. Okay. And finally, it outputs some answer which is inconsistent with the database snapshot j, one of those snapshots for which it has queried. And it wins if the answer is accepted even though the answer is not correct with respect to the database. Okay? So you notice that in the original scheme [indiscernible] the server doesn't do the updates, but this is a stronger adversary who can also affect the updates. So even when the updates are adversarial, it should not be able to forge this.
>> [Indiscernible].
>> Esha Ghosh: This one? This is the succinct public information about the database. That's public, yeah. Okay?
>> [Indiscernible]. So the idea is the owner of the database updates the database and then [indiscernible].
>> Esha Ghosh: Yes, inside, yeah. Should not be able to learn from, yes.
>> So I don't understand the role of this sigma. Like what public information --
>> Esha Ghosh: [Indiscernible] succinct commitment to the database by the owner. So the server has to prove correctness with respect to the original database, but the verifier doesn't know the original database, so the answer is shown consistent with the succinct commitment that the owner generated.
>> You need to verify your [indiscernible].
>> Esha Ghosh: Yeah.
>> [Indiscernible]. The verifier doesn't know.
>> Esha Ghosh: The verifier doesn't know, yes. The server knows, of course, yeah. Yes. Okay.
>> So [indiscernible] definition, is that a standard definition?
>> Esha Ghosh: No. These are the definitions we proposed for zero-knowledge authenticated data structures. This was a formal study of this. And then there is the privacy definition, which models the adversarial behavior of the client, the verifier. So here the adversary is modeling the client. The model is that the adversary is either talking to the challenger or it is talking to a simulator, and it will succeed if it can tell [indiscernible] who it's talking to. So the idea is these two [indiscernible] should be indistinguishable to the adversary. So the adversary initially sees the public key, which comes from the challenger or the simulator. Then it comes up with a database of its choice. This database is sent to the challenger, if it were talking to the challenger, but the simulator sees nothing about it. And then it sees the succinct [indiscernible] zero that the owner would generate; the simulator also simulates that and shows it to the adversary. And then the adversary asks two kinds of queries: one is query, one is update. If it asks a query, it gets, from the honest challenger, the answer and the proof; in the case of the simulator, the simulator only has oracle access to the database, meaning the simulator only receives the answer to this query with respect to the most current database from the oracle, and it simulates the proof. So the answer is the only part the simulator gets from the database through its oracle access, and the proof is what it simulates. Then if there's an update query, the honest challenger receives the update and updates the sigma. But the simulator doesn't see anything except a [indiscernible] bit, meaning: is this a valid update on the current database? For example, if it's on a set and it's trying to insert a duplicate, that's not a valid update. So it only receives valid updates, and assuming it's a valid update, the adversary gets sigma i plus one; in both cases of a valid update it gets [indiscernible].
>> [Indiscernible]. The simulator gets to see the query and the answer but not -- it just doesn't get the data [indiscernible].
>> Esha Ghosh: Yeah. It doesn't need to see the query also. It only needs to see the answer to that query. Yeah.
>> [Indiscernible].
>> Esha Ghosh: Well, yeah. Yeah. [Indiscernible]. Yeah. Okay. So that was the model. Now I'm going to move on to the next part of the talk, which is specific constructions for some specific data structures. These are the three data structures I'll try to cover. The first one is very simple: set membership and non-membership. So this I guess I don't have to go over. Yeah?
>> You need to make the definition again?
>> Esha Ghosh: Yeah.
>> [Indiscernible] simulator should have to know the [indiscernible]?
>> Esha Ghosh: Well, the simulator must know the kind of query, yes, but it's already defined by the entity, the actual data type on which --
>> But, I mean, shouldn't it have to know the actual query [indiscernible]?
>> Esha Ghosh: Well, it gets the answer from --
>> It gets the answer [indiscernible].
>> [Indiscernible] proof for a particular --
>> Esha Ghosh: Query?
>> -- statement, right? Because the verifier [indiscernible] particular query?
>> Esha Ghosh: Okay. Yeah. Yeah. That's right. Yeah. It gets the query, right, okay. Yeah. [Indiscernible] query and [indiscernible], just not the database, yeah. Okay. So set membership. It's based on bilinear maps; the construction [indiscernible] notation. So for the notation, let's say chi is a set of elements. This is the database, and the query the client asks is: is an element x in the set chi or not? And the server's response is a bit, which is the answer -- yes, it is in the database, or no, it is not -- and the corresponding proof. So for a set, we first represent it as a formal polynomial, [indiscernible] minus zero, so it doesn't matter. And this polynomial is called the characteristic polynomial of chi. This is the formal polynomial. And when -- sorry.
>> [Indiscernible]?
>> Esha Ghosh: That's a product, yes. And when this is evaluated at a secret point, we just denote it like that. That's a value. Okay. So this is the construction for key generation by the owner; these little red things denote the number of operations [indiscernible]. So first, you generate the bilinear public parameters: you pick this bilinear group -- G, G T; we are doing it in the symmetric setting -- with the bilinear map, a generator of the group G, and the prime order of the group. So you choose a secret s from Z p star randomly -- this notation just says it's random -- and set the secret key to s, and the public key is g to the s and these parameters.
>> What is this G one?
>> Esha Ghosh: That's the group where the --
>> It's a bilinear --
>> Esha Ghosh: Yeah.
>> [Indiscernible].
>> Esha Ghosh: Okay. Now in the setup phase, the owner again takes the set. So the additional thing it does is choose a random r from Z p star, which is the blinding factor, really. And it sets sigma zero to g to the r times the characteristic polynomial evaluated at s, the secret point. Okay? And then theta zero -- remember, this is the auxiliary information that is stored with the server -- so this is this long public key, the tuple g, g to the s, up to g to the s to the n, and the blinding factor and the set size. Anyway, the set is [indiscernible], so this is not a problem. And for this talk, let's say the state is the set that it stores, in case it needs to update the set later. But you can do other things: this can also be outsourced, and you can use a Merkle tree and just keep the root to know the state of the set. But for simplicity, let's say the state is the set that it stores. Okay. Now the query part. So for the query part, the server has some snapshot of the set, chi j, so it has theta j, and this is the query, x. So if the queried element x is in the set, okay, then the answer is one, and the proof is this sigma j to the one over [indiscernible], which is basically this polynomial with the factor for x taken out. Okay. So the server computes this and returns it as the proof.
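The membership witness -- the blinded accumulator with the factor (s + x) removed -- can be sketched in the same toy group. The server never touches s: it expands the coefficients of the degree-reduced characteristic polynomial and combines the published powers g^{s^i}. The final equality is checked below using s directly only because the toy group has no pairing; in the real scheme the client checks it with one pairing evaluation. All parameters are illustrative.

```python
import random

p = 2**127 - 1
g = 3
n = p - 1

def poly_coeffs(xs):
    """Coefficients, lowest degree first, of prod over x in xs of (z + x)."""
    c = [1]
    for x in xs:
        c = [(x * a + b) % n for a, b in zip(c + [0], [0] + c)]
    return c

def commit(coeffs, pk):
    """g^{f(s)} from the powers g^{s^i}: multiply pk[i]^{c_i}."""
    out = 1
    for ci, gi in zip(coeffs, pk):
        out = out * pow(gi, ci, p) % p
    return out

X = [5, 9, 23]
s = random.randrange(2, p - 1)
pk = [pow(g, pow(s, i, n), p) for i in range(len(X) + 1)]
r = random.randrange(2, p - 1)
sigma = pow(commit(poly_coeffs(X), pk), r, p)      # digest g^{r * C(s)}

x = 9                                              # queried element, in X
# Witness: accumulator over X minus {x}, under the same blinding r.
wit = pow(commit(poly_coeffs([y for y in X if y != x]), pk), r, p)

# Toy verification: W^{s+x} == sigma. A real client does this with a
# pairing, e(W, g^s * g^x) == e(sigma, g), never seeing s itself.
ok = pow(wit, (s + x) % n, p) == sigma
```

The design point is that raising the witness to (s + x) restores exactly the removed factor, so the check binds the witness to both the digest and the queried element.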
>> [Indiscernible].
>> Esha Ghosh: I'm not going to talk about it, but this is a polynomial -- it is just the characteristic polynomial with one factor missing. It's the degree n minus one characteristic polynomial. But the server cannot compute s and raise g to that; instead, it has the string g, g to the s, g to the s squared, up to g to the s to the n, so it just raises these to the correct coefficients, and that's how it computes it.
>> [Indiscernible].
>> Esha Ghosh: It computes it. It was not computed earlier. So what was computed earlier was this -- without this factor, s plus x, divided out. Okay? So that is --
>> [Indiscernible].
>> Esha Ghosh: Yeah, yeah. That's why --
>> [Indiscernible].
>> Esha Ghosh: Yeah, yeah. It can be done in [indiscernible] time, n log n. Yeah. Okay. And then if the element is not in the set, the way you prove non-membership is: think of the characteristic polynomial of the query x, which is z plus x. This is a degree-one polynomial, right? If your set were only the query x, then z plus x would be its characteristic polynomial. Now, if x is not in the set, then this polynomial and the set's characteristic polynomial must be coprime to each other, meaning they cannot have a common factor, a common divisor. Okay. So then you run the extended Euclidean algorithm to generate the coefficient polynomials -- [indiscernible] does that -- q1 of z and q2 of z. And what we actually want to give is g to the q1 and g to the q2. That will prove non-membership: you can check in the exponent, using the bilinear map, that these are coprime. But this is not perfectly blinding. So we have to additionally do the step of picking fresh randomness, which the server does, and blinding these to make q1 prime and q2 prime, and return these as the non-membership witness. So the non-membership witness is two elements, W2 and W3. Okay? Yeah.
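Because z + x is linear, the extended Euclidean step collapses to one synthetic division: writing C(z) = Q(z)(z + x) + C(-x), the Bezout pair is q1 = 1/C(-x) (a constant) and q2 = -Q(z)/C(-x), and C(-x) is nonzero exactly when x is outside the set. A sketch over an illustrative prime field, checking the identity q1·C + q2·(z + x) = 1 at an arbitrary point (the real client checks it at the secret point, in the exponent, via pairings):

```python
q = 2**61 - 1   # illustrative prime field modulus

def poly_coeffs(xs):
    """Coefficients, lowest degree first, of prod over x in xs of (z + x)."""
    c = [1]
    for x in xs:
        c = [(x * a + b) % q for a, b in zip(c + [0], [0] + c)]
    return c

def divide_by_linear(c, x):
    """Write f(z) = Q(z)*(z + x) + f(-x); return (coeffs of Q, f(-x))."""
    a = (-x) % q
    quot = [0] * (len(c) - 1)
    acc = c[-1]
    for i in range(len(c) - 2, 0, -1):      # Horner / synthetic division
        quot[i] = acc
        acc = (c[i] + a * acc) % q
    quot[0] = acc
    return quot, (c[0] + a * acc) % q

def bezout_witness(X, x):
    """q1, q2 with q1*C_X + q2*(z + x) = 1, for x outside the set X."""
    C = poly_coeffs(X)
    Q, r0 = divide_by_linear(C, x)
    assert r0 != 0, "x is in the set"
    inv = pow(r0, q - 2, q)                 # 1 / C(-x) by Fermat
    return C, [inv], [(-inv * b) % q for b in Q]

def poly_eval(c, z):
    out = 0
    for ci in reversed(c):
        out = (out * z + ci) % q
    return out

X = [5, 9, 23]
x = 7                                       # not in X
C, q1, q2 = bezout_witness(X, x)
z = 123456789                               # any evaluation point
check = (poly_eval(q1, z) * poly_eval(C, z)
         + poly_eval(q2, z) * ((z + x) % q)) % q
```

Since the identity holds as polynomials, it holds at every point, including the secret s -- which is what the bilinear-map check in the exponent establishes without revealing s.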
>> [Indiscernible].
>> Esha Ghosh: Oh, because there is this blinding factor r in the original accumulator. So you sort of have to cancel it out so the verification goes through.
>> [Indiscernible].
>> Esha Ghosh: So if you -- yeah. So this is the verification, right? When you do the [indiscernible], that takes the r off. Yeah. That's how. And the verification is obviously simple. For the membership witness, you just plug in that witness, right? So this is run by the client. So recall that the client has this and this from the owner; the client gets this from the server. And then it can compute this part on its own, and it plugs in and checks if this equality holds, for membership. And for non-membership, we basically check that the extended Euclidean -- the GCD -- is one, in the exponent. So what this gives us is g to the q1 of s times [indiscernible] plus g to the q2 of s [indiscernible], and we check it equals one in the exponent. Is that clear? So this is the equation that we want to check in the exponent. Right? So yeah. And these gamma factors are set accordingly, so that they cancel out in the exponent and the check verifies. And the update algorithm is really simple; it is run by the owner. So if a new element is added to the set, it just refreshes the sigma i with this new factor, blinded with fresh randomness. So let's say x was not in the set earlier, and the owner is adding this new element to the set: the owner takes the old digest and raises it to s plus x times r prime, the fresh randomness. And if it wants to delete something, it just takes that factor off. That's all. That's the update. Yeah. And perform-update is very simple: it's just refreshing with this new r prime on the server's end. And the sigma.
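The insert update can be sketched in the same illustrative group as before: the owner raises the old digest to r'(s + x), which both adds the factor (s + x) and re-blinds, so the randomness sitting in the exponent becomes r·r'. All parameters below are made up for illustration.

```python
import random

p = 2**127 - 1
g = 3
n = p - 1

s = random.randrange(2, p - 1)
r = random.randrange(2, p - 1)

# Current set {5, 9}: digest g^{r * (s+5)(s+9)}
Cs = (s + 5) * (s + 9) % n
sigma = pow(g, r * Cs % n, p)

# Owner inserts x = 23 with fresh blinding r': sigma' = sigma^{r'(s + x)}
x = 23
r2 = random.randrange(2, p - 1)
sigma_new = pow(sigma, r2 * (s + x) % n, p)

# The effective blinding in the exponent is now r * r'; the server
# updates the randomness it stores accordingly.
r_new = r * r2 % n
```

One exponentiation per update keeps the owner's work constant, independent of the set size.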
>> [Indiscernible]?
>> Esha Ghosh: The server needs r and r prime, yes. Because earlier the blinding was just r; now you have blinded with fresh randomness r prime, so the randomness in the exponent is r r prime. Yeah.
>> [Indiscernible]?
>> Esha Ghosh: Well, yes and -- yeah. If n prime is larger, it will grow; it takes one more factor, right, yes. Exactly.
>> [Indiscernible].
>> Esha Ghosh: Yes. It's in the [indiscernible], yes. Yeah. Yeah. I'm not going to prove it here, but the proof is based on the n-strong [indiscernible] assumption. It basically says, for a [indiscernible] adversary who has access to these public parameters, it is hard to invert in the exponent. And that's what the proof is based on [indiscernible]. Okay. Now we're on to the next -- yeah?
>> [Indiscernible].
>> Esha Ghosh: Oh, why it is [indiscernible] is because every part that the client sees has a proper blinding factor. So what does the client see? It sees the public key, but beyond that, whatever it sees about the database -- the succinct sigma that it sees -- has this randomness in its exponent, and the way we constructed the proofs, every proof also has a -- sorry. Yeah. Every proof also has randomness.
>> [Indiscernible].
>> Esha Ghosh: If you see different proofs from the -- yeah. It will be the same r, but that correlation is already known to the client, right? You're not trying to hide -- there's no [indiscernible] probability. You're not trying to say that it's not from the same database.
>> So is it something like [indiscernible] is exactly the one value that satisfies [indiscernible].
>> Esha Ghosh: It's not that.
>> So the randomness is fixed when you say [indiscernible].
>> Esha Ghosh: No, it's just a randomness thing.
>> [Indiscernible] but here's -- no, [indiscernible] so the sigma j should --
>> Esha Ghosh: Yeah. So, yeah, so the simulator can just pick any randomness and say g to the r is the sigma. And whatever proof comes -- yeah. Just divide it.
>> [Indiscernible].
>> Esha Ghosh: Yes. At the beginning, yes. Yeah. Yeah. Exactly. And the witness is generated with respect to the initial randomness that you picked.
>> [Indiscernible].
>> Esha Ghosh: Just one random, yeah.
>> [Indiscernible].
>> Esha Ghosh: Yeah. And then it just divides [indiscernible], yeah.
>> [Indiscernible].
>> Esha Ghosh: Sorry?
>> That would be the only thing that satisfies the verification, once you pick sigma, there's only one --
>> Esha Ghosh: There's only one degree of freedom, yes, yes, exactly. Yeah. Okay. So for the next part, I'm going to talk about range queries. And here the problem is that the data store is a key-value store. Okay. So the keys are from a totally ordered universe, and the client query is basically: return all the values whose keys [indiscernible] lie within some range a to b. And the server
response is the answer, which is the correct key-value pairs, along with a proof that this answer is complete -- that all the elements it has returned really are in the database and it has not omitted anything. As an example -- this is actually from the Enron e-mail data set, on which I ran experiments for this query -- you can think of this as the date or timestamp, and this is the mail ID. So if you think of the database, there are many possible timestamps, and at some timestamps there are mails; others are empty, nothing is in there. So this is basically your range of the domain, zero to 15, and some of it is the data store, with six records. As you can see, only these keys are present in the data set, and the rest of the keys are not present. So let's try to see how we could think of a simple solution in this case. Do you have any questions? Okay. Okay. So one way to prove this could be the idea we have been talking about: the key-value pairs that are present in the database could be signed by the owner. And to prove that things are not present, the owner could also sign neighboring pairs, right, on the keys that are present. So one signature could be for this pair of present keys, and then this pair, and so on. So to prove that something is not in the database, the server can provide the corresponding signature. But this obviously violates privacy. Okay. Now, there is another attempt: you could think, from the solution we saw in part one, why not accumulate the elements? The owner maybe computes a zero-knowledge accumulator over the elements that are present in the data set, and then when a query comes, the server proves that these are members of the database the owner has accumulated. And to prove that there are no other elements, it proves non-membership in zero-knowledge using the accumulator. But the problem here is that the range could be too long compared to the number of records in the database. For example, the key in this case is a four-bit string, so there are two to the four possible keys, but only very few are present in the data set. So if you want to prove non-membership, you have to do as much work as touching every possible element that is not in the data set. So that's not a viable solution. So let's see what we can do then. The idea that we use is hierarchical identity-based encryption. Okay. So the idea here is that anyone can encrypt messages for users using their public IDs, and a user will be able to decrypt only if it possesses the correct secret key. Okay. And the hierarchy [indiscernible] -- implicitly, you can think of the hierarchy as a binary tree where the root is empty, the left child is marked with a zero, and the right child is marked with a one, like a radix tree. Okay. And an ID at level k can issue secret keys for its descendant IDs, but nothing beyond that. So if somebody has a secret key for this ID, it will be able to generate secret keys for every ID that's below it, in that subtree, but will not be able to generate anything outside that subtree.
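The delegation pattern -- a key for a prefix yields keys for the whole subtree and nothing else -- can be sketched with a GGM-style HMAC tree. This is only a stand-in for a real HIBE (it gives symmetric keys, not public-key encryption, so only the key-delegation structure carries over), and the master secret and IDs below are illustrative.

```python
import hashlib
import hmac

def derive(key: bytes, suffix: str) -> bytes:
    """Extend a node key down the tree, one ID bit at a time."""
    for bit in suffix:
        key = hmac.new(key, bit.encode(), hashlib.sha256).digest()
    return key

root = b"master-secret"              # owner's master key (illustrative)
sk_00 = derive(root, "00")           # key handed out for node 00

# Whoever holds 00's key can extend it to any descendant, e.g. 0010,
# without the master key -- and gets exactly the key the owner would
# derive directly. One-wayness of HMAC blocks going back up or sideways.
sk_0010_via_holder = derive(sk_00, "10")
sk_0010_direct = derive(root, "0010")
```

In the construction this is what lets the server answer for uncovered subtrees: it holds keys for a few forest roots and derives descendant keys on demand, but cannot produce a key for any ID outside those subtrees.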
So the idea is if a message is encrypted with the ID of 00, 00 can itself
decrypt it or it can generate secret keys for any descendent who can decrypt
it but nobody else will be able to decrypt it. Okay. Okay. So now the idea
now we're back to our range queries. So think of the database that they were
initially thinking of. Okay. Now, this key value pairs are present in the
data set and the rest of them are not present in the data set. So think of
this implicit Redis tree and delete all the [indiscernible] to root parts for
these keys that are present in the data set. Okay. So end up with the
forest which has basically this orange note. Okay. So what the owner does
is the owner generates signature for all the key value pairs that are present
in the data set. And sets up a hierarchy identity to [indiscernible]
encryption scape of that height of in this case four-bit of length four and
it generates this secret keys for the roots of this forest. So this is the
setup phase. And this are stored with the server. Now, when a query comes,
let's say the query is 4 to 14. That's the client query. The owner -- the
server first returns this [indiscernible] key value signature page so that
the databases that are present in the data set so it returns this to the
clients. Okay. And for the non-membership, we have to -- first let's look
at this idea of a canonical covering for a range. What does it mean? A
canonical covering for a range, with respect to a tree of a certain height,
is basically a bunch of nodes in the tree that satisfies the following
properties. First, every leaf in the range is a descendant of one of the
nodes in the canonical covering. And second, every covering node's leftmost
and rightmost leaves fall completely within the range. And given a tree and
a range, this covering is unique. So in this example, if 4 to 14 is my query
range, this is the canonical covering -- the roots of this forest, basically:
this one and this one and this one. Okay. Now, since this is unique, the
client can compute it itself. Right? So the client computes it itself, and it
also has the signatures of the elements that are present in the data set.
So it computes this canonical covering and deletes those parts from the
forest -- the red parts -- so it's left with this forest, the blue forest.
Now the client picks random messages, encrypts them under these identities,
the blue IDs, and sends them to the server. The point is that, because of
the way the setup was done, the server either already has all of the
corresponding secret keys or it is able to derive the secret keys. For
example, in this case, you recall that the server did not have the secret
key for this one, but it had the secret key for its prefix, so it can
generate the secret key for this node and then it's able to decrypt. So if
it's able to decrypt, it sends the message back, and if the messages match,
the client accepts. So that's the idea -- and you can replace the encryption
with hierarchical identity-based signatures instead. And that's the
complexity.
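The canonical-covering computation just described can be sketched in code. This is a minimal illustration under my own conventions, not code from the paper: nodes of a complete binary tree over `bits`-bit keys are represented as bit-string prefixes, and the function returns the unique set of maximal subtree roots whose leaves exactly cover the query range.

```python
def canonical_cover(lo, hi, bits):
    """Return the unique minimal set of node prefixes (bit strings) whose
    subtrees' leaves exactly cover the key range [lo, hi]."""
    cover = []

    def visit(prefix):
        depth = len(prefix)
        # smallest and largest leaf under this node
        node_lo = int(prefix + "0" * (bits - depth), 2)
        node_hi = int(prefix + "1" * (bits - depth), 2)
        if node_hi < lo or node_lo > hi:
            return                      # subtree disjoint from the range
        if lo <= node_lo and node_hi <= hi:
            cover.append(prefix)        # maximal subtree fully inside the range
            return
        visit(prefix + "0")             # otherwise split and recurse
        visit(prefix + "1")

    visit("")
    return cover
```

For the query 4 to 14 over 4-bit keys, this toy returns the prefixes `01`, `10`, `110`, `1110`; the client would encrypt one random message under each of these identities and send them to the server. (The exact cover convention on the slide may differ slightly.)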
Yeah. This is a comparison with the previous work. So the only previous
work known for this was from ICALP 2004, which is a long time ago.
[Indiscernible]. And that was a stateful algorithm, it was very inefficient,
and it would generate proofs of knowledge. Okay. So yeah. And the security
is based on the unforgeability of the -- sorry, it's [indiscernible] -- the
signature scheme and the security of the hierarchical scheme. Yeah. And I
have five minutes?
>> Melissa Chase: Yeah.
>> Esha Ghosh: Okay. So the last part of the talk is ordered queries on
lists. So here, L is a linearly ordered list of elements -- a set of
elements with an order. And an order query is a pair (x, y) from the set,
and the answer is x and y rearranged according to their order in the list,
together with a proof of the order. Okay. And we'll use aggregate
signatures for this scheme -- but if you know this, I'm going to skip this.
Okay. So the aggregate signature is this idea of taking signatures on n
distinct messages by n distinct users and generating a short, succinct
signature sigma. Given the signature, the public keys, and the messages, a
client can be convinced that they were indeed signed by the [indiscernible]
users. We will use the special case of a single signer instead of n
different signers: for n messages, a single signer. And the security
requirement is that the aggregate is valid only if it used all the sigmas,
each a valid signature -- the server cannot give you a spurious signature
and aggregate it along with the other signatures. Okay?
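To make the aggregation property concrete, here is a toy single-signer aggregate in the RSA full-domain-hash style, where aggregation is just multiplication modulo n. This only illustrates the interface; the construction in the talk would use a pairing-based aggregate signature, and these tiny fixed primes are of course insecure.

```python
import hashlib

# Toy single-signer "aggregate" signature: RSA full-domain-hash signatures
# combined by multiplication. Illustrative only -- tiny fixed primes,
# no padding, not a secure scheme.
p, q = 1000003, 1000033                 # small primes, for demonstration
n = p * q
e = 65537                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))       # secret key

def H(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg: bytes) -> int:
    return pow(H(msg), d, n)

def aggregate(sigs):
    agg = 1
    for s in sigs:
        agg = agg * s % n
    return agg

def verify_aggregate(agg, msgs) -> bool:
    # agg^e must equal the product of the message hashes
    lhs = pow(agg, e, n)
    rhs = 1
    for m in msgs:
        rhs = rhs * H(m) % n
    return lhs == rhs
```

Note that the same verification equation accepts an aggregate of any subset of the sigmas against the matching subset of messages; that subset-aggregation behavior is the homomorphic property the server will exploit below to answer a query over only the queried elements.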
>> [Indiscernible].
>> Esha Ghosh: No, no, no. It's just -- yeah. It's [indiscernible]. I'm
going to use this -- I just talked about this. Okay. Now for the list
construction. So what is the basic idea of the list construction? It is
that for every element in the list, we're going to associate a member
witness with it. Maybe the picture is better. So if you have this set, x1
to x4, for every element there is a member witness, which is some blinded
version of the information of its rank in the set and the element itself.
Okay. Now, the owner will initially generate this list identifier, which is
unique for the list. It will generate a member witness for every element in
the list, sign them individually, and aggregate them to form the list digest
signature. Okay. And these would be stored with the server. And the sigma
-- the succinct digest that the owner publishes -- is its public key and
this list digest signature.
>> So you sign in [indiscernible].
>> Esha Ghosh: Yes. Okay. So this is simple: [indiscernible] the secret
key s is chosen, and the public key g to the s, and the public parameters.
Okay. I'm going to skip this for the sake of time, but the basic idea is
this. So you pick randomness for every element in the set and then compute
this member witness as g to the s to the i, into r_i. Okay? I mean, the
public key is g to the s, g to the s squared, up to g to the s to the n. So
this index i is encoding the rank. So for the i-th element, the owner
computes the member witness as g to the s to the i, blinds it with the
randomness r_i, then hashes and signs it. Okay. These are all signed
together to form this succinct signature, which is the aggregate signature.
And then the sigma_L is basically all these member witnesses and the
sigma_i's -- this authentication information is stored with the server.
Okay. Now, when a query comes, the order query is to actually
prove that one element precedes another: x precedes y. These order
witnesses are computed by the server online; they are not precomputed by the
owner. The only [indiscernible] that is proportional to n, the list size,
is at setup. Now, here is how these order queries are actually computed.
First, to prove that the queried elements are part of the list, the server
generates a signature for the queried elements only. This is possible
because of the homomorphic nature of the aggregate signature. Okay. And
there is some verification [indiscernible] I'm going to skip for the sake of
time, but this part essentially proves that this signature and these two
together give you the list digest signature -- it proves they are part of
the list. And the order part is more interesting. So the order witness
that is computed is basically a blinded version of the distance between the
two elements. It is computed as follows. Let the ranks of the queried
elements be i-prime and i-double-prime, and let the distance between them be
d, which is i-double-prime minus i-prime. Then the witness is computed as g
to the s to the d -- this d is the distance, right, between element 1 and
element 2. Okay. Notice that the server can compute this from the public
key -- it has this whole g to the s up to g to the s to the n -- and it has
r_1 and r_2, the corresponding randomness that the owner had originally used
to compute the member witnesses.
>> [Indiscernible].
>> Esha Ghosh: Sorry?
>> [Indiscernible]?
>> Esha Ghosh: [Indiscernible] yes, yes. Yes. And then it just generates
those order witnesses. Okay. So the verification first verifies that the
signature is correct, and then there is this order verification. Yeah.
Okay. So this is the idea of the verification. So remember, the member
witness encoded the rank information in blinded form. So the client, in a
blinded fashion, checks this equation, basically: when the queried elements
are x1 and x3, the rank of x1 plus the distance equals the rank of x3. So
that's just the check. This is the order witness that the server had
computed, and these are the two member witnesses that were computed by the
owner, which the client got. And it verifies that this equation holds.
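The rank check can be written out with the blinding and the bilinear pairing stripped away. In this toy (all parameters and names are mine, not from the paper), witnesses are just powers of the trapdoor s modulo a prime, and the client's equation says rank(x1) + d = rank(x3) in the exponent.

```python
# Toy illustration of the rank check (blinding and pairings omitted):
# the member witness for the element of rank i encodes s**i, the order
# witness for two elements at distance d encodes s**d, and verification
# checks s**i1 * s**d == s**i2, i.e. rank(x1) + d == rank(x3).
P = 2**127 - 1          # a prime modulus (toy parameter)
s = 123456789           # the owner's trapdoor

def member_witness_exp(i):
    return pow(s, i, P)

def order_witness_exp(i1, i2):
    return pow(s, i2 - i1, P)       # encodes the distance d = i2 - i1

# Elements at ranks 1 and 3; query their order:
w1, w3 = member_witness_exp(1), member_witness_exp(3)
omega = order_witness_exp(1, 3)
assert w1 * omega % P == w3         # the client's (unblinded) check
```

In the real construction these values sit in the exponent of a bilinear group, each member witness is blinded with the randomness r_i, and the equation is checked with a pairing, so the client never learns the actual ranks.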
>> [Indiscernible].
>> Esha Ghosh: So this is [indiscernible], yes. On this [indiscernible].
So again, the [indiscernible] is one here. Yeah. This is the complexity,
which is really proportional to [indiscernible] -- optimal: whatever the
answer size is, the client's work is proportional to that. [Indiscernible]
work. And yeah. The proof is based on this [indiscernible] inversion
assumption. So the intuition is that if the server wanted to cheat, it
would have to compute s to the one over s [indiscernible] while computing
the order witness -- that's the reason it cannot cheat. And membership
reduces to the security of the signature along with that. And finally --
okay, so this is the static
construction, and we have a dynamic construction on top of this, which comes
from the observation that even though this construction used rank
information, we really did not need the ranks themselves. What we need is
something that respects the rank order. So you could use another mapping to
a larger domain: instead of 1, 2, 3, you could say 5, 8, 12. And there is
an order-maintenance data structure that helps you maintain these tags when
the elements are coming online. So we use this data structure to make the
construction dynamic [indiscernible]. Yeah. And that's it. The conclusion
is that we initiated this study of privacy-preserving authenticated data
structures, and the real takeaway from this work is privacy and security
along with efficiency -- that has been the major motivation of this line of
work. And these are the papers I talked about; portions are from these
papers, and all are available on ePrint. Of course you're welcome to talk
to me about them. And thank you.
[Applause]
>> Melissa Chase: Questions?
>> So, [indiscernible] the construction hides the size of the set.
>> Esha Ghosh: Yes.
>> So if you were to relax it, say [indiscernible] -- the constructions were
super complicated just to hide the set size.
>> Esha Ghosh: Right. There are constructions that reveal the set size,
which are pretty much in the same model. Well, the soundness is not as
strong as ours, because they're really not considering updates, but there
have been results which reveal the set size and do the same kind of
construction. This was published in TCC last year.
>> [Indiscernible]?
>> Esha Ghosh: Yeah. Yeah. It used a form of signature, so it was more
efficient than doing work proportional to the set size, yeah.
>> And then my other question was for the range queries. So the privacy
there, you don't want to [indiscernible]?
>> Esha Ghosh: Well, [indiscernible] disclosure. So yes, if there are
colluding clients who have access to the entire database, then you cannot
hide anything. Right? Of course they will learn [indiscernible] if they
collude. But the idea is to control [indiscernible] so that the proof
doesn't tell you more than that. So if you were only to query the database,
you would not learn more than that.
>> Okay.
>> I was going to ask about the last thing [indiscernible].
>> Esha Ghosh: That one?
>> Yes. The data structure. How much do you have to extend your
[indiscernible]?
>> Esha Ghosh: Yeah. It's double the size. So if you were to allocate for
[indiscernible] size n, you allocate space for about 2n tags, and you
generate them so that you can keep on generating these tags -- which act
like ranks -- as long as the number of elements does not fall below n over 2
or go above 2n. And if that happens, then you basically rebuild the data
structure. So then the bound is guaranteed.
>> [Indiscernible].
>> Esha Ghosh: It does. Yes, yes, of course. It does. Yeah.
>> So you just completely [indiscernible] from scratch every time you have
to change the size.
>> Esha Ghosh: Yes. You have to recompute, that's right. But because you
have the [indiscernible] guarantee, it doesn't happen too often. Yeah.
>> Melissa Chase: [Indiscernible] again.
[Applause]