>> Melissa Chase: All right. So we're very happy this week to have Esha Ghosh visiting us from Brown University. Esha is a Ph.D. candidate and she's been working on the intersection of privacy and security and applied crypto. And today she's going to talk about some of her work on authenticated data structures and adding zero knowledge. >> Esha Ghosh: Thanks, Melissa. So, today I'm going to talk about our work on efficient zero-knowledge authenticated data structures. So this is a brief overview of the talk. First I'm going to tell you what this field of study is and why it is important. And I'll tell you about some traditional authenticated data structures that actually do not give you support for privacy. Then I'll give you first an informal definition and then build up to the formal security definition that we propose. And finally, I'll give three different constructions for three different data structures. And this section is completely modular. And finally, I will conclude. So with the advent of cloud computing, a large number of corporations and individuals outsource their data and computation to this untrusted cloud storage or server. It is a very common thing now, and since the owner who owns the data is not in physical possession of the data anymore, and it is stored on an untrusted server, there is an obvious question of integrity, as in that the data has not been tampered with and it has been maintained properly. But along with that, often third parties are given partial access to this data, which introduces another problem of privacy. So they have partial access and a partial view of this data, but to prove integrity, often more information is leaked about the data than they should learn. So this is more than a theoretical risk because there have been serious attacks on this. So I'm just going to give a few motivations. So let's go to the zone enumeration attack.
It is an attack on DNS name resolution queries, where DNS queries are basically "give me the IP address of some host name." And the way zone enumeration works is an attacker gets to know all the IP addresses of a particular zone, all the host names of a particular zone. So how does it work? So this -- the owner of this zone, it generates -- let's say there are two kinds of resolvers. Primary name resolvers and secondary resolvers. So when a query comes, "give me the IP address of this particular website," this resolver gives the answer, that this is the IP address. Now, in the early '90s, DNSSEC was proposed to defend against attackers. So that you know that the correct IP address has been returned, addresses are usually signed by the trusted owner. And then there is this issue, that the queries can come for arbitrary websites corresponding to [indiscernible] no IP address in the world. So you have to also prove to the querier that that result is not in the database. Now, there are two kinds of queries. One is positive membership, one is non-membership or negative membership. So proving positive membership is easy. In the offline phase, the resolver could sign all the [indiscernible] that are in the log and store them with an untrusted server. So this setting is very important because often the [indiscernible] name servers that administer a zone, they have secondary resolvers. For example, if you do an nslookup for Brown [indiscernible], three of the name resolvers are owned by Brown. One is actually owned by UCSP. So what happens is the secondary resolver can also answer these queries. And they may not be trusted. They're secondary resolvers. They do not administer the zone. And so the problem here is how to prove the non-membership. The membership proof is sort of easy. You just sign it offline. The authoritative name server signs it offline, and the secondary just gives the signature.
And that proves membership. So there have been a few ways proposed for non-membership. So the most recent one was NSEC3, which basically says hash all the website names. Generate these hash values. Then sort those hash values and sign every pair. Okay. And store these records with the secondary resolver. Now, when a query comes, say q.com, which is not in this list of data, the proof is -- the hash of q.com is computed and let's say the hash falls between these two records. So it is sufficient to return these two signed records, and that will prove that that name is not in the database. That was the proof. Now, what is the problem with this proof? Of course, this is open to a dictionary attack. And that is the zone enumeration attack. So what an attacker can do is it can ask many, many arbitrary queries and collect all these hashes of the zone and then mount an offline dictionary attack on them. So, this is a scenario where privacy is really important, not just integrity. Okay. Another example is where public -- yes? >> [Indiscernible]. >> Esha Ghosh: Yeah. >> So why shouldn't someone be able to see the list of all the domains [indiscernible]? >> Esha Ghosh: Oh, this is a privacy [indiscernible]. You are going to learn all the host names in a zone. That could be a privacy issue in itself. You might learn all those names, the router names and everything. And that is a base to mount more complex attacks. >> Okay. It's not information I [indiscernible]. >> Esha Ghosh: No, no. You have to learn. You have to ask, what is my IP address. And then you learn all the host names in that zone. Okay. This is the -- personal digital records are often subject to audits, where authorized parties or analysts are given access to them when something malicious goes on. For example, the database -- the e-mail record of a particular company. It is often stored with a third party server.
And in case of some sort of fishy activity between certain dates, in this example, say, this date period, an authorized data analyst is given access to records from within this time period. But because of privacy issues, here, the requirements are twofold. First of all, you need to prove to the analyst that the server hasn't tampered with the messages. It is returning the correct messages that it got from the organization. But also, because of privacy reasons, it's necessary not to reveal anything beyond this time period and the data. And finally, this is another example where XML data is stored as directory trees. So for a patient health record, for example, the insurance company might have access to a certain portion of the tree, and to prove integrity, you should not reveal anything beyond that subject [indiscernible] access. Some more examples. So this scenario motivates the following model. Okay, so this is a three party model where there is an owner, a server and clients. Okay. The owner is a trusted owner of the data set. Someone you would trust to generate the data set and give some authentication information on it. Now, this data is stored on this untrusted server. So the adversarial model is that the server is not trusted to store the data faithfully all along. It might delete it or it might get [indiscernible], might get attacked, but so there's a -- okay. I'll come to that later, but the data is stored with the server. So the owner basically generates some sort of authentication information about this data set and stores it with the server. It also generates a short succinct digest of this data set and makes it publicly available. Okay. And then the owner can store and periodically update, but otherwise, the owner goes offline from this scenario. Now, the clients interact with the server to ask queries on the data set and get responses. Here, the first requirement is integrity, where the client never accepts incorrect answers.
So whatever the owner has generated, the server cannot give some answer which is inconsistent with the data store generated by the owner. And the privacy requirement is that the client does not learn anything beyond the answer to the query. Anything else about the data set. Okay. So if you are familiar with traditional authenticated data structures, traditional authenticated data structures give you the guarantee of integrity, that data is not getting tampered with. But these proofs generally leak a lot of information. So for example, this is a very well-known data structure, Merkle's Hash Tree. So when you prove [indiscernible] Merkle's Hash Tree, for example, you want to prove the integrity of element X2, so you give the authentication path of those red nodes, which reveals a lot of information. First of all, it reveals the number of records in the database, and if these data elements are stored sorted, it also reveals the rank of the element. A lot of leakage happens with the proof that this is indeed in the database. And the way non-membership is proved is, if they're ordered, like we just discussed with the hashes, it will give two neighboring elements, and that also reveals a lot of information. So this is not privacy preserving. Okay. So this is another data structure, the authenticated skip list, which again reveals a lot of information. This is to say that the traditional ADSs do not give privacy [indiscernible] privacy. Okay. So now I'll come to the formal security definitions, so if you have any questions at this point, good to ask. Okay. Okay. So the algorithms. There are three parties, as we said, the owner, the server and the client. And there are three algorithms. One is the key generation algorithm that takes the security parameter and generates a secret key and a public key for the owner. In the second phase, setup, the owner has its own secret key and the database D zero at, let's say, time zero.
It generates some succinct authentication information, the little orange box that we saw. That will be publicly posted. Anybody can access it. Then there is theta zero, which is the authentication information that it will store with the server to basically facilitate answering queries. And some state information, internal state information, that it will use if it ever wants to update the database. So in the update phase, it just takes all this information, so the database snapshot at some time I, and this is the update, U_I, which it wants to perform on the database. For example, if it's a set, maybe it wants to insert a new element or delete an existing element. And it gives you the updated database. The updated succinct signature, sigma I plus one. Some update string [indiscernible] used to update things at the server's end, and it updates its own state information. >> [Indiscernible]. >> Esha Ghosh: This one? >> [Indiscernible]. >> Esha Ghosh: This one? >> [Indiscernible]. >> Esha Ghosh: Oh, that is the server. So the server, as of now, has theta I. So it will take this update string and update it at its site, which is this. Okay. It takes this update I, updates the database, and the [indiscernible]. Now, so the server is responsible for this perform update and query algorithm. In the query algorithm, it has this sigma J, which is public information. Everybody has it. It has theta J corresponding to the database snapshot J, and it receives a query directly from the client. Okay. And it generates an answer and the proof. So the rest of the information it has from the owner; only this is the online information that comes from the client. And it generates a proof for that. Yeah? >> [Indiscernible]. >> Esha Ghosh: This one? >> [Indiscernible]. >> Esha Ghosh: It depends on the implementation. It could even have it in its state information. So it just needs to know what was the last state that it was in. Okay.
And in the verify phase, the client just takes the public key, the sigma, the query, its answer and the proof, and either accepts or rejects. Okay. So like we already described, the security features are twofold. Completeness just means that correct answers will always be accepted. The soundness is the game that models the adversarial behavior of the server. So in this game, the adversary initially sees the public key from the challenger. And it comes up with the database of its choice. [Indiscernible] zero. It gets the information that an honest challenger would give. An honest owner would [indiscernible] sigma zero and theta zero, and then some polynomial number of times it asks for some update of the database and gets the update strings. Okay. And finally, it outputs some answer which is inconsistent with the database snapshot J. One of those snapshots for which it has queried. And it wins if the answer is accepted even though the answer is not correct with respect to the database. Okay? So you notice that the server actually, in the original scheme [indiscernible], the server doesn't do the updates, but this is a stronger server who can also affect the updates. And so even when the updates are adversarial, it should not be able to forge this. >> [Indiscernible]. >> Esha Ghosh: This one? This is the succinct public information about the database. That's public, yeah. Okay? >> [Indiscernible]. So the idea is the owner of the database updates the database and then [indiscernible]. >> Esha Ghosh: Yes, inside, yeah. Should not be able to learn from, yes. >> So I don't understand the role of this sigma. Like what public information -- >> Esha Ghosh: [Indiscernible] succinct commitment to the database by the owner. So because the server has to prove correctness with respect to the original database, but the client doesn't know the original database, so it is consistent with respect to the succinct commitment that the owner generated.
>> You need to verify your [indiscernible]. >> Esha Ghosh: Yeah. >> [Indiscernible]. >> Esha Ghosh: Yes. Okay. The verifier doesn't know, yes. Server knows, of course, yeah. The verifier doesn't know. >> So [indiscernible] definition, is that a standard definition? >> Esha Ghosh: No. This is the -- these are the definitions we proposed for the original authenticated data structures. This was a formal study of this. And then there is the privacy definition, which models the adversarial behavior of the client or the verifier. So here, the client is adversarial, or the adversary is modeling the client, so the model is that the adversary is either talking to the challenger or it is talking to a simulator, and it will succeed if it can tell [indiscernible] who it's talking to. So the idea is this [indiscernible] should be indistinguishable to the client, to the adversary. So the adversary initially sees the public key, which comes from the challenger or the simulator. Then it comes up with a database of its choice. So it was [indiscernible] database, which is sent to the challenger if it were talking to the challenger, but the simulator sees nothing about the state of this. And then it sees this succinct [indiscernible] zero that the owner would generate. The simulator also simulates that and shows it to the adversary. And again, then the adversary asks two kinds of queries. One is query, one is update. So if it asks a query, it gets from the honest challenger the answer and the proof, and in the case of the simulator, it only has oracle access to the database. Meaning that the simulator only receives the answer to this query with respect to the most current database from the oracle. And it simulates the proof. So the answer is the only part the simulator gets from the database with its oracle access, and this is the proof that it simulates. Then if there's an update query, then the honest challenger receives the update and it updates this sigma.
But for the simulator, it doesn't see anything except a [indiscernible] bit, meaning is this a valid update on the current database. For example, if it's on a set, it's trying to insert a duplicate. That's not a valid update. So it only receives whether it is a valid update, and assuming it's a valid update, the adversary gets sigma I plus one in both cases; for a valid update it gets [indiscernible]. >> [Indiscernible]. The simulator gets to see the query and the answer but not -- it just doesn't get the data [indiscernible]. >> Esha Ghosh: Yeah. It doesn't need to see the query also. It only needs to see the answer to that query. Yeah. >> [Indiscernible]. >> Esha Ghosh: Well, yeah. Yeah. [Indiscernible]. Yeah. Okay. So that was the model. Now I'm going to move on to the next part of the talk, which is specific constructions for some specific data structures. So these are the three data structures I'll try to cover. The first one is very simple. It's set membership and non-membership. So this I guess I don't have to go over. Yeah? >> You need to make the definition again? >> Esha Ghosh: Yeah. >> [Indiscernible] simulator should have to know the [indiscernible]? >> Esha Ghosh: Well, the simulator must know the kind of query, yes, but it's already defined by the ADT. The actual data type on which -- >> But, I mean, shouldn't it have to know the actual query [indiscernible]? >> Esha Ghosh: Well, it gets the answer from -- >> It gets the answer [indiscernible]. [Indiscernible] proof for a particular -- >> Esha Ghosh: Query? >> -- statement, right? Because the verifier [indiscernible] particular query? >> Esha Ghosh: Okay. Yeah. Yeah. That's right. Yeah. Gets the query, right, okay. Yeah. [Indiscernible] query and [indiscernible] just not the database, yeah. Okay. So set membership, it's based on bilinear maps, the constructions [indiscernible] notation. And so for the notation, let's say chi is a set of elements.
This is the database and the ADT, so the client query is: is an element X in this set chi or not? And the server's response is a bit, which is the answer, yes, it is in the database, or no, it is not in the database, and the corresponding proof. So for a set, we first represent it as a formal polynomial, [indiscernible] minus zero, so doesn't matter. And the polynomial is called the characteristic polynomial of chi. This is the formal polynomial. And when -- sorry. >> [Indiscernible]? >> Esha Ghosh: That's a product, yes. And when this is evaluated at a secret point, we just denote it like that. That's a value. Okay. So this is the construction for the key generation of the owner, so these little red things denote the number of operations [indiscernible]. So first, you generate the bilinear public parameters, so we have this bilinear group, G, G1, we are doing it as a symmetric group, a symmetric bilinear pairing. The bilinear map, and this is a generator of the group G. And this is the prime order of the group. So you choose a secret S from Z_p star randomly. This notation is just to say it's random. And set the secret key as S, and the public key is G to the S and these parameters. >> What is this G1? >> Esha Ghosh: That's the group where the -- >> It's a bilinear -- >> Esha Ghosh: Yeah. >> [Indiscernible]. >> Esha Ghosh: Okay. Now the setup phase: the owner again runs this on the set. So the additional thing it does is it chooses a random R from [indiscernible] star, which is the blinding factor, really. And it sets sigma zero as G to the R times the characteristic polynomial evaluated at S, the secret point. Okay? And then theta zero. Remember, this is the auxiliary information that is stored with the server, so this is this long public key -- it's not the public key, it's like this tuple -- and the blinding factor and the set size. Anyway, the set is [indiscernible] so this is not in part.
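As a compact restatement of the key generation and setup steps just described, the following is a sketch in assumed notation rather than a copy of the slides: the symbols $\mathcal{X}_0$ (the initial set), $\mathrm{Ch}_{\mathcal{X}}$ (the characteristic polynomial), and $1^k$ (the security parameter) are assumptions, and both the factor form $(x + z)$ and the tuple of powers of $g$ are inferred from the later remarks about "S plus X" and "G to the S, S squared, up to S to the N".

```latex
\begin{align*}
\mathrm{Ch}_{\mathcal{X}}(z) &= \prod_{x \in \mathcal{X}} (x + z) \\
\textsf{KeyGen}(1^k):\quad
  & s \xleftarrow{\$} \mathbb{Z}_p^{*}, \qquad sk = s, \qquad
    pk = \bigl(g^{s},\ (p, G, G_1, e, g)\bigr) \\
\textsf{Setup}(sk, \mathcal{X}_0):\quad
  & r \xleftarrow{\$} \mathbb{Z}_p^{*}, \qquad
    \sigma_0 = g^{\, r \cdot \mathrm{Ch}_{\mathcal{X}_0}(s)} \\
  & \theta_0 = \bigl( (g, g^{s}, g^{s^{2}}, \dots, g^{s^{n}}),\ r,\ n \bigr),
    \qquad n = |\mathcal{X}_0|
\end{align*}
```

Here $\sigma_0$ is the public digest and $\theta_0$ is what the server keeps; the blinding factor $r$ in the exponent is what keeps the digest from being a deterministic function of the set.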
And for this talk, let's say the state is the set that it stores in case it needs to update the set later. But you can do other things, like this can also be outsourced and you can use a Merkle tree and just keep the root to know the state of the set, but for simplicity, let's say the state is the set that it stores. Okay. Now the query part. So for the query part, the server has some snapshot of the set, chi J. So theta J, and this is the query delta. So if [indiscernible] go to X is in the set, okay, then the answer is one. And the proof is this sigma J to the Y over [indiscernible], which is basically this polynomial. Okay. So the server computes this and returns it as proof. >> [Indiscernible]. >> Esha Ghosh: I'm not going to talk about it, but this is a polynomial, it is just a characteristic polynomial with one factor missing. It's a degree N minus one characteristic polynomial. But the server cannot compute it -- cannot evaluate the polynomial at S and raise G to that, but it has the string G to the S, S squared, up to S to the N, so it's just basic [indiscernible] to the correct coefficients, and that's how it computes it. >> [Indiscernible]. >> Esha Ghosh: It computes it. It was not computed earlier. So what was computed earlier was this -- without this factor S plus X divided. Okay? So that is -- >> [Indiscernible]. >> Esha Ghosh: Yeah, yeah. That's why -- >> [Indiscernible]. >> Esha Ghosh: Yeah, yeah. You -- it can be done with FFT in time N log N. Yeah. Okay. And then if the element is not in the set, the way to prove non-membership is: think of the polynomial X plus Z. This is a degree one polynomial, right? If your set is only the query X, then Z plus X will be the characteristic polynomial. Now, if X is not in the set, then this polynomial and this polynomial must be coprime to each other. Meaning that they cannot have a common factor, a common divisor. Okay. So then you run the extended Euclidean algorithm to generate the coefficient polynomials.
[Indiscernible] does that. Q1 of Z and Q2 of Z. And what we actually want to give is G to the Q1 of S and G to the Q2 of S. That will prove that it's a non-member, because you can then check in the exponent, using the bilinear map, that these are coprime. But this is not perfectly blinding. So we have to additionally do the step of picking fresh randomness, which the server does, and it blinds this Q1 and Q2. It makes them Q1 prime and Q2 prime and returns these as the non-membership witness. So the non-membership witness is two elements, W2 and W3. Okay? Yeah. >> [Indiscernible]. >> Esha Ghosh: Oh, because there is this blinding factor R in the original accumulation. So you sort of have to cancel it out to see the verification go through. >> [Indiscernible]. >> Esha Ghosh: So if you -- yeah. So this is the verification, right? When you do the [indiscernible] take that R off. Yeah. That's how. And the verification is obviously simple. Just for the membership witness, you plug in that witness, right? So this is run by the client. So recall that the client has this and this from the owner. The client gets this from the server. And then it can compute this part on its own. And it plugs it in and checks if this equality holds for membership. And for non-membership, it basically checks that the extended Euclidean -- the GCD is one in the exponent. So what this gives us is G to the Q1 of S into [indiscernible] plus G to the Q2 of S [indiscernible], and it [indiscernible] one in the exponent. Is that clear? So this is the equation that we want to check in the exponent. Right? So yeah. And these gamma factors are set accordingly so that they cancel out in the exponent and it can be verified. And the update algorithm is really simple; that's run by the owner. So if a new element is added to the set, it just refreshes the sigma I with this new factor and blinds it with fresh randomness.
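The polynomial facts behind the two witnesses can be checked in the clear with a small sketch. This is only an illustration under stated assumptions: it works with plain field elements instead of group elements, so it has no pairing, no blinding factor, and no hidden secret point; the modulus `P`, the sample set, and the function names (`char_poly`, `divide_by_linear`, `nonmembership_coefficients`) are all hypothetical. Because the second polynomial has degree one, the extended Euclidean algorithm collapses to a single division step.

```python
P = 2**61 - 1  # toy prime modulus standing in for the group order (assumption)


def char_poly(xs):
    """Coefficients (low degree first) of prod_{x in xs} (z + x) mod P."""
    coeffs = [1]
    for x in xs:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] + c * x) % P       # contribution of c * x
            nxt[i + 1] = (nxt[i + 1] + c) % P   # contribution of c * z
        coeffs = nxt
    return coeffs


def evaluate(coeffs, z):
    """Horner evaluation of a low-degree-first coefficient list at z."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * z + c) % P
    return acc


def divide_by_linear(coeffs, y):
    """Divide by (z + y) using synthetic division; return (quotient, remainder)."""
    root = (-y) % P
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + carry * root) % P
        quotient[i - 1] = carry
    remainder = (coeffs[0] + carry * root) % P
    return quotient, remainder


def nonmembership_coefficients(coeffs, y):
    """Return (q1, q2) with q1 * Ch(z) + q2(z) * (z + y) = 1 when y is not a root."""
    quotient, remainder = divide_by_linear(coeffs, y)
    if remainder == 0:
        raise ValueError("y is in the set, so the polynomials share a factor")
    inv = pow(remainder, P - 2, P)  # inverse via Fermat, valid because P is prime
    q1 = inv
    q2 = [(-c * inv) % P for c in quotient]
    return q1, q2


xs = [3, 5, 11]      # hypothetical set
s = 123456789        # stand-in for the secret evaluation point
ch = char_poly(xs)

# Membership of 5: (z + 5) divides Ch(z), and the quotient plays the witness role.
quotient, remainder = divide_by_linear(ch, 5)
assert remainder == 0
assert evaluate(quotient, s) * (s + 5) % P == evaluate(ch, s)

# Non-membership of 7: the combination evaluates to 1 at every point.
q1, q2 = nonmembership_coefficients(ch, 7)
assert (q1 * evaluate(ch, s) + evaluate(q2, s) * (s + 7)) % P == 1
```

In the scheme from the talk the same two identities are checked in the exponent with the bilinear map, with the blinding factors arranged so that they cancel during verification.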
So it's like S plus -- let's say X was not in the set earlier and the owner is adding this new element to the set, so the owner takes the old digest and raises it to S plus X and R prime, the fresh randomness. And if it wants to delete something, it just takes off that factor. That's all. That's the update. Yeah. And perform update is very simple. It's just refreshing with this new R prime on the server's end. And the sigma. >> [Indiscernible]? >> Esha Ghosh: The server needs R, R and R prime, yes. Because earlier it was just R. Now you have blinded with fresh randomness R prime, so now the randomness in the exponent is R R prime. Yeah. >> [Indiscernible]? >> Esha Ghosh: Well, yes and -- yeah. If N prime is larger, it will grow, it takes one more factor, right, yes. Exactly. >> [Indiscernible]. >> Esha Ghosh: Yes. It's in the Q [indiscernible], yes. Yeah. Yeah. I'm not going to prove it here, but the proof is based on the N-strong [indiscernible] assumption. It basically says for [indiscernible] adversary who has access to these public parameters, it is hard to invert in the exponent. And that's what the proof is based on [indiscernible]. Okay. Now we're on to the next -- yeah? >> [Indiscernible]. >> Esha Ghosh: Oh, why in [indiscernible] is because every part that the client sees has a proper blinding factor. So what does the client see? It sees the public key, but beyond that, whatever it sees about the database, the succinct sigma that it sees, has this blinding in its exponent, and the way we constructed the proof, every proof also has a -- sorry. Every proof also has randomness. Yeah. >> [Indiscernible]. >> Esha Ghosh: If you see different proofs from the -- yeah. It will be the same R, but that correlation is already known to the client, right? You're not trying to hide -- there's no out [indiscernible] probability. You're not trying to say that it's not from the same database.
>> So is it something like [indiscernible] is exactly the one value that satisfies [indiscernible]. >> Esha Ghosh: It's not that. >> So the randomness is fixed when you say [indiscernible]. >> Esha Ghosh: No, it's just a randomness thing. >> [Indiscernible] but here's -- No, [indiscernible] so the sigma J should -- >> Esha Ghosh: Yeah. So, yeah, so the simulator can just pick any randomness and say G to the R is the sigma. And whatever proof comes -- yeah. Just divided. >> [Indiscernible]. >> Esha Ghosh: Yes. At the beginning, yes. Yeah. Yeah. Exactly. And this witness is generated with respect to the initial randomness that you picked. >> [Indiscernible]. >> Esha Ghosh: Just one random, yeah. Yeah. And then it just divided [indiscernible], yeah. >> [Indiscernible]. >> Esha Ghosh: Sorry? >> That would be the only thing that satisfies the verification, once you pick sigma, there's only one -- >> Esha Ghosh: There's only one degree of freedom, yes, yes, exactly. Yeah. Okay. So for the next part, I'm going to talk about range queries. And here the problem is the data store is a key-value pair store. Okay. So the key is from a totally ordered universe. And the client query is basically: return all the values [indiscernible] lie within some range A to B. And the server response is the answer, which is the correct key-value pairs, along with the proof that this answer is complete. That all the elements that it has returned are really in the database and it has not omitted anything. Like an example, this is actually from the Enron e-mail data set on which I ran experiments for this query. So you can think of this as the date or timestamp and this is the mail ID. So if you think of the database, there are many possible timestamps, and in some timestamps there are mails, others are empty. Nothing is in there. So this is basically your range of the domain set, zero to 15. And some of this is the data store, with six records.
As you can see, the six records are present. Only these keys are present in the data set and the rest of the keys are not present in the data set. So let's try to see how we could think of a simple solution in this case. Do you have any questions? Okay. Okay. So one way to prove this could be what we've been talking about, this idea [indiscernible] a bunch of times, so the key-value pairs that are present in the database could be signed by the owner. And to prove that things are not present, the owner could also sign [indiscernible] in pairs, right, on the keys that are present. So one signature could be for 01, the keys that are present, and this pair and then this pair and so on. So to prove that something is not in the database, the server can provide the corresponding signature. But this obviously violates privacy. Okay. Now, there is a temptation to [indiscernible] think of, like, why not use the solution we saw in part one, why not accumulate the elements: the owner maybe computes a zero-knowledge accumulation of the elements that are present in the data set, and then when a query comes, the server proves that these are members of the database the owner has accumulated. And to prove that there are no other elements, it proves non-membership in zero-knowledge using the accumulator. But the problem here is this range could be too long compared to the number of records in the database. So for example, the key in this case is a four-bit string, so there are two to the four possible keys. But only a very few are present in the data set. So if you want to prove non-membership, you have to do so much work as to touch every possible element that is not in the data set. So that's not a viable solution. So let's see what we can do then. The idea that we used was hierarchical identity-based encryption. Okay.
So the idea here is anyone can encrypt messages for users using their public IDs. And a user will be able to decrypt only if it possesses the correct secret key. Okay. And this [indiscernible] implicitly you can think of the hierarchy as a binary tree where the root is empty and the left child is marked with a zero and the right child is marked with a one, like this radix tree. Okay. So an ID at level K can issue secret keys for its descendant IDs but nothing beyond that. So if somebody has a secret key for this ID, it will be able to generate secret keys for every ID that's below it, that is in that subtree, but will not be able to generate anything outside that subtree. So the idea is if a message is encrypted with the ID 00, 00 can itself decrypt it or it can generate secret keys for any descendant who can decrypt it, but nobody else will be able to decrypt it. Okay. Okay. So now we're back to our range queries. So think of the database that we were initially thinking of. Okay. Now, these key-value pairs are present in the data set and the rest of them are not present in the data set. So think of this implicit radix tree and delete all the [indiscernible] to root paths for the keys that are present in the data set. Okay. So you end up with the forest which has basically these orange nodes. Okay. So what the owner does is the owner generates signatures for all the key-value pairs that are present in the data set. And it sets up a hierarchical identity-based encryption scheme of that height, in this case four, for IDs of length four bits, and it generates the secret keys for the roots of this forest. So this is the setup phase. And these are stored with the server. Now, when a query comes, let's say the query is 4 to 14. That's the client query. The owner -- the server first returns these [indiscernible] key, value, signature tuples for the records that are present in the data set, so it returns these to the client. Okay.
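The setup step above (delete the paths of the present keys, then hand the server secret keys for the roots of what remains) can be sketched as follows. This only illustrates which tree nodes receive keys and does not implement any HIBE scheme; the function name `forest_roots` and the six example keys are hypothetical, since the transcript does not record which keys were on the slide.

```python
def forest_roots(present_keys, height):
    """Roots of the forest left after deleting, from the full binary tree of
    the given height, every root-to-leaf path that ends at a present key.

    Nodes are named by their bit-string prefix; the tree root is "".
    """
    leaves = [format(k, "0{}b".format(height)) for k in present_keys]
    # Every prefix of a present key lies on a deleted path.
    on_path = {leaf[:i] for leaf in leaves for i in range(height + 1)}
    if not on_path:
        return [""]  # nothing was deleted, so the whole tree is one subtree
    roots = []
    for node in on_path:
        if len(node) == height:
            continue  # a leaf has no children
        for bit in "01":
            child = node + bit
            if child not in on_path:
                roots.append(child)  # surviving subtree hanging off a deleted path
    return sorted(roots)


# Hypothetical 4-bit data set with six present keys.
present = [2, 5, 7, 9, 12, 13]
for root in forest_roots(present, 4):
    print(root)
```

Each printed prefix names a subtree that contains only absent keys, so one secret key per prefix lets the server derive keys for every absent identity below it and for nothing else.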
And for the non-membership, we have to -- first let's look at this idea of a canonical covering for a range. What does it mean? A canonical covering for a range is with respect to a tree of a certain height, say. So the canonical covering is basically a bunch of nodes in the tree which satisfy the following properties. First is that every leaf in the range is a descendant of one of the nodes in the canonical covering. And the second is that every node's leftmost descendant and rightmost descendant fall completely within that range. And of course, given a tree of a given height and a range, this covering is unique. So in this example, if 4 to 14 is my query range, this is the canonical covering. The roots of this forest basically. This one and this one and this one. Okay. Now, since this is unique, the client can compute it by itself. Right? Now, the client computes it by itself and it also has got the signatures of the elements that are present in the data set. So it computes this canonical covering and deletes those paths from this forest. The red paths it deletes, so it's left with this forest, the blue forest. Now the client takes random messages and encrypts them with these IDs, the blue IDs. And sends them to the server. But because of the way the setup was done, the server either [indiscernible] all of these corresponding secret keys or it is able to [indiscernible] the secret keys. For example, in this case, you recall that the server did not have the secret key for this one but it had the secret key for its prefix, so it can generate the secret key for this and then it's able to decrypt it. So if the server is able to decrypt them, it can send them back, and if they match, the client accepts. So that's the -- and you can replace it with hierarchical identity-based signatures instead of encryption. And that's the complexity. Yeah. This is a comparison with the previous work, so the only previous work known for this was from ICALP 2004. Which is a long time ago.
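The canonical covering and the client's challenge identities described above can be sketched as follows. This is a minimal illustration that assumes keys are `height`-bit integers and names nodes by bit-string prefixes; `canonical_covering`, `challenge_ids`, and the returned keys in the example are hypothetical, and no encryption is performed.

```python
def canonical_covering(lo, hi, height):
    """Minimal set of subtree roots whose leaves are exactly the keys in [lo, hi]."""
    cover = []

    def visit(prefix, node_lo, node_hi):
        if node_hi < lo or node_lo > hi:
            return                      # subtree lies entirely outside the range
        if lo <= node_lo and node_hi <= hi:
            cover.append(prefix)        # subtree lies entirely inside the range
            return
        mid = (node_lo + node_hi) // 2
        visit(prefix + "0", node_lo, mid)
        visit(prefix + "1", mid + 1, node_hi)

    visit("", 0, 2 ** height - 1)
    return cover


def challenge_ids(lo, hi, height, returned_keys):
    """Identities the client would encrypt to: the covering with the paths of
    the returned (present) keys removed."""
    returned = {format(k, "0{}b".format(height)) for k in returned_keys}
    ids = []

    def visit(prefix):
        if not any(key.startswith(prefix) for key in returned):
            ids.append(prefix)          # no returned key below: challenge this node
            return
        if len(prefix) == height:
            return                      # this leaf is itself a returned key
        visit(prefix + "0")
        visit(prefix + "1")

    for node in canonical_covering(lo, hi, height):
        visit(node)
    return ids


# Query range 4 to 14 from the talk, over 4-bit keys.
print(canonical_covering(4, 14, 4))
# Hypothetical keys returned by the server for that range.
print(challenge_ids(4, 14, 4, [5, 7, 9, 12, 13]))
```

With these hypothetical keys, one of the challenge identities is `1110`. A server that was given a key only for the prefix `111` at setup would have to derive the key for `1110` before decrypting, which matches the derivation case mentioned in the talk.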
[Indiscernible] and that was a stateful algorithm; it was very inefficient and would generate proofs of knowledge. Okay. Okay. So yeah. And the security is based on the unforgeability of the -- sorry. It's [indiscernible]. The signature scheme and the HIBE security. Yeah. Yeah. And I have five minutes? >> Melissa Chase: Yeah. >> Esha Ghosh: Okay. So the last part of the talk is order queries on lists. So here, L is a linearly ordered list of elements, a set of elements. And an order query is a pair of elements from the set. And the answer is X and Y rearranged according to their order in the list, along with a proof of the order. Okay. And we'll use aggregate signatures for this scheme, but if you know this, I'm going to skip it. Okay. So the aggregate signature is this idea that, given signatures on n distinct messages by n distinct users, you can generate a short, succinct signature sigma. So given the signature, the public keys and the messages, a client can be convinced that they were indeed signed by the [indiscernible] users, and the secure -- we will use a special case of a single signer instead of n different signers. For M messages, we'll use a special case of a single signer, and it is valid only if the aggregator used all of the sigmas, as a security [indiscernible] signature. It cannot give you a spurious signature and aggregate it along with the other signatures. Okay? >> [Indiscernible]. >> Esha Ghosh: No, no, no. It's just -- yeah. It's [indiscernible]. I'm going to use this. Just talked about this. Okay. Now for the list construction. So what is the basic idea of the list construction? It is that for every element in the list, we're going to associate a member witness with it. Maybe the picture is better. So if you have this set, X1 to X4, for every element there is a member witness, which is some blinded version of the information of its rank in that set and the element itself. Okay.
Now, the owner will initially generate this list identifier, which is unique for the list. It will generate a member witness for every element in the list and sign them individually. Okay. And aggregate them to form the list signature. Okay. And this would be stored with the server. And the succinct sigma that the owner publishes is its public key and this list digest signature. >> So you sign in [indiscernible]. >> Esha Ghosh: Yes. Okay. So this is simple: [indiscernible] two secret keys are chosen, the public key G to the P and the public parameters. Okay. I'm going to skip this for the sake of time, but the basic idea is this. So you pick randomness for every element in the set and then compute this member witness as G to the S to the I into R_I. Okay? I mean, this is the [indiscernible] public key: G to the S, G to the S squared, up to G to the S to the N. So this index I is encoding the rank. So for the i-th element, the owner computes the member witness as G to the S to the I, blinded with this randomness R_I, and then it hashes and signs this. Okay. These are all aggregated together to form this succinct signature, which is the aggregate signature. And then sigma_L is basically all these member witnesses, the sigma_I's, and the -- this information is stored with the server. It is the authentication information that's stored with the server. Okay. Now when a query comes, the order query is to actually prove that some element precedes another: X precedes Y. These order witnesses are computed by the server online. These are not computed by the -- the only [indiscernible] is proportional to N, the list size, at setup. Now, these order queries are actually computed. This is to prove that it is a part of that list, so initially the server generates a signature for the queried elements only. This is because of the homomorphic nature of the aggregate signature. Okay.
And some verification [indiscernible] I'm going to skip for the sake of time, but this part essentially proves that the signature and these two together give you the list digest signature. It proves that they are part of the list. The order is more interesting. So the order witness that is computed is basically a blinded version of the distance between these two elements. It is computed as follows. Let the [indiscernible] of the queried elements be I prime and I double prime, and the distance between them is I prime [indiscernible] I double prime. Then the witness is computed as G to the S to the D, where D is the distance. Right? Between element 1 and 2. Okay. Notice that the server has this from the public key. Right? It has this whole G to the S up to G to the S to the N. And it has R1 and R2, the corresponding randomness that the owner had originally used to compute the member witnesses. >> [Indiscernible]. >> Esha Ghosh: Sorry? >> [Indiscernible]? >> Esha Ghosh: [Indiscernible] yes, yes. Yes. And then it just generates those order witnesses. Okay. So the verification is to first verify that the signature is correct, and then this was the verification. Yeah. Okay. So this is the idea of the verification. So remember, the member witness encoded the rank information in the blinded [indiscernible]. So the client, in a blinded fashion, checks this equation, basically: when the queried elements are X1 and X3, [indiscernible] the rank of X1 plus the distance equals the rank of X3. So that's just the check. So this is the order witness that the server had computed. These are the two member witnesses, computed by the owner, that it had got. And it verifies that this equation holds. >> [Indiscernible]. >> Esha Ghosh: So this is [indiscernible], yes. On this [indiscernible]. So again, the dB [indiscernible] is one here. Yeah. This is the complexity, which is really proportional to [indiscernible] optimal.
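The rank check described above can be illustrated with plain modular arithmetic on the exponents. This is a toy sketch, not the construction from the talk: it works directly with exponents modulo a prime instead of group elements and a pairing, so it has no security at all, and the exact way the two blinding values enter the order witness (multiplying by the second and dividing by the first) is an assumption made here so that the equation balances. The modulus `Q` and all function names are hypothetical. It needs Python 3.8 or later for `pow` with a negative exponent.

```python
Q = 2 ** 61 - 1  # a prime standing in for the group order (toy value)


def member_exponent(s, rank, r):
    """Exponent of a member witness: s^rank blinded by r, mod Q."""
    return pow(s, rank, Q) * r % Q


def order_exponent(s, distance, r_first, r_second):
    """Exponent of an order witness: s^distance with the first element's
    blinding divided out and the second element's multiplied in."""
    return pow(s, distance, Q) * r_second % Q * pow(r_first, -1, Q) % Q


def check_order(member_first, order_witness, member_second):
    """Stand-in for the client's check: multiplying exponents adds the
    rank and the distance, and the blinding factors must line up."""
    return member_first * order_witness % Q == member_second


# Toy run: elements of rank 1 and rank 3, so the distance is 2.
s, r1, r3 = 12345, 777, 4242
m1 = member_exponent(s, 1, r1)
m3 = member_exponent(s, 3, r3)
honest = order_exponent(s, 2, r1, r3)
wrong = order_exponent(s, 1, r1, r3)
```

With these values `check_order(m1, honest, m3)` holds because rank 1 plus distance 2 equals rank 3, while the witness built from the wrong distance fails the same check.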
Whatever the answer size is, the client's work is proportional to that. [Indiscernible] work. And yeah. The proof is based on this and the [indiscernible] inversion assumption. So the intuition is that if the server had to cheat, it would have to compute one over S [indiscernible] S to the one over S while computing the order witness, and that's the reason it cannot cheat. It relies on the unforgeability of the signature along with that. And finally, okay, so this is the static construction. We have a dynamic construction over this, which comes from the observation that even though this construction used rank information, we really did not need the rank. What we need is something that respects the rank order. So you could use another mapping to a larger domain. Instead of 1, 2, 3, you could say 5, 8, 12. So that idea gives the [indiscernible] data structure, the order data structure, that helps you maintain these tags when the tags are coming online. So we use this data structure to make this dynamic [indiscernible]. Yeah. And that's it. In conclusion, we initiated this study of [indiscernible] preserving authenticated data structures, and the real takeaway from this work is privacy and security along with efficiency; that has been the major motivation of this line of work. And these are the papers I talked about. Portions are from these papers. All are available on ePrint. Of course you're welcome to talk to me about them, and thank you. [Applause] >> Melissa Chase: Questions? >> So, [indiscernible] the construction hides the size of the set. >> Esha Ghosh: Yes. >> So if you were to relax it, say [indiscernible], the constructions were super complicated just to hide the set size. >> Esha Ghosh: Right. There are constructions that reveal the set size, which is pretty much in the same model. Well, the soundness is not as strong as that because they're really not considering updates, but there have been results which revealed the set size and could do the same construction.
This was published in TCC last year. >> [Indiscernible]? >> Esha Ghosh: Yeah. Yeah. It used a form of signature, so it was more efficient than doing work proportional to the set size, yeah. Yeah. >> And then my other question was about the range queries. There, you don't want to [indiscernible]? So the privacy -- >> Esha Ghosh: Well, [indiscernible] disclosure. So yes, if there are colluding clients who have access to the entire database, then you cannot hide anything. Right? Of course they will learn [indiscernible] collude. But the idea is to control [indiscernible] so that the proof doesn't tell you more than that. So if you were to only query the database, you will not learn more than that. >> Okay. >> I was going to ask about the last thing [indiscernible]. >> Esha Ghosh: That one? >> Yes. Data structure. [Indiscernible]? How much do you have to extend your -- >> Esha Ghosh: Yeah. It's double the size. So if you were to allocate [indiscernible] size N, you allocate space for about 2N elements and generate that, so that you can keep on generating these tags, which are like ranks, as long as the number does not fall below N by 2 or go above 2N. And then if that happens, you basically rebuild the data structure. So then the model is guaranteed. >> [Indiscernible]. >> Esha Ghosh: It does. Yes, yes, of course. It does. Yeah. >> So you just completely [indiscernible] scratch every time you have to change the size. >> Esha Ghosh: Yes. You have to recompute, that's right. But because you have [indiscernible] guaranteed, it doesn't happen too often. Yeah. >> Melissa Chase: [Applause] [Indiscernible] again.