>>: All right, everyone, this next session is on data-oblivious computation. So it means computations where you are not paying attention to the data. I hope you do not take that as a cause to not pay any attention to the talks, because they do have data in them that's worth paying attention to. The first one is from Marcel Keller, who is a post-doc at the University of Bristol with Nigel Smart and formerly of Ivan Damgård's group back in Aarhus. And he's going to be telling us about some of the work that they've been doing in the SPDZ family. >> Marcel Keller: Okay. Thank you. So my talk is going to be in two parts, as you maybe could tell by the title. First I'm going to talk about some older work which is mainly concerned with the implementation of secret sharing based MPC, a.k.a. the SPDZ protocol. And then I'm going to talk about some pretty new work on using oblivious RAM in secret sharing based MPC and what kind of fun stuff you can do with that. So let me start with the implementation part. So SPDZ is secret sharing based MPC in the preprocessing model, or offline-online model as it's sometimes called. Basically the idea is that you start with a so-called offline phase, meaning a phase where you don't use actual secret data. You only generate correlated randomness, which you're going to use later in an online phase together with your secret inputs to your multiparty computation. And then you compute your outputs, be they public or still secret-shared outputs that you can actually use later. Specific to SPDZ is that you can use fully homomorphic encryption, or actually somewhat homomorphic encryption, to generate the so-called multiplication triples, so Beaver triples. And a property of that, because we don't use secret inputs there, is that this is highly parallelizable. Basically you increase your capacity just by throwing more cores and computers at it. Then, in the online phase, we don't use encryption any more, so the online phase is pretty fast, and it's information-theoretic security in the random oracle model that you achieve. So because this is basically at the core of SPDZ, I'm going to talk about the secret sharing, because in my perception the workshop so far has put a heavy emphasis on garbled circuits. So how do we use secret sharing? We use additive secret sharing. That means, in this case, you have three parties and they all have a share of a secret a. They also have a share of a MAC of a. And they have a share of a MAC key. The MAC simply is the product of the MAC key and the secret information. And all those three things are simply going to be shared additively. So basically I'm later going to use this blue box to say, "Yeah, we have this information secret shared with a MAC." And because it's all nice and linear, of course, you can do linear operations, especially addition, on the shares and on the shares of the MAC. And note that the MAC key stays the same, so throughout the whole computation we have one secret MAC key. And that's actually going to stay secret even after finishing the computation. So much about the theory of the secret sharing. If you're not familiar with it, here's a short slide on Beaver's randomization technique: how do we actually use correlated randomness? If you look at this, basically consider that you want to compute a product of a secret x and a secret y. What you then do is use this Beaver triple that we denote a, b and a times b -- so that's the correlation in there.
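Given such a triple, the multiplication that is about to be described can be sketched in a few lines of Python. This is only a cleartext simulation over a prime field: the helper names such as `share` and `open_` are illustrative, not the SPDZ implementation, and MAC checking is omitted.

```python
import random

P = 2**61 - 1  # a prime; SPDZ works in a large finite field

def share(x, n=3):
    """Additively share x among n parties: shares sum to x mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def open_(shares):
    """Everyone broadcasts a share; the sum reveals the value."""
    return sum(shares) % P

def beaver_mul(x_sh, y_sh, a_sh, b_sh, ab_sh):
    """Multiply secret-shared x and y using a triple (a, b, a*b)."""
    # Mask and open: epsilon = x + a, delta = y + b (one round of IO).
    eps = open_([(xi + ai) % P for xi, ai in zip(x_sh, a_sh)])
    dlt = open_([(yi + bi) % P for yi, bi in zip(y_sh, b_sh)])
    # Local correction: x*y = eps*dlt - eps*b - dlt*a + a*b.
    z = [(ci - eps * bi - dlt * ai) % P
         for ai, bi, ci in zip(a_sh, b_sh, ab_sh)]
    # The public constant eps*dlt is added by one designated party.
    z[0] = (z[0] + eps * dlt) % P
    return z

a, b = 7, 9
z_sh = beaver_mul(share(3), share(5), share(a), share(b), share(a * b))
assert open_(z_sh) == 15
```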
And of course part of the correlation is that they are authenticated with the same MAC key. What you then can do is mask x and y with a and b respectively and then open x plus a and y plus b. And I should add, if you're not familiar: we work in finite fields here, so obviously x plus a for a random a doesn't tell you anything about x if you know neither a nor x. So one can do this multiplication of x plus a and y plus b in the open. And then there is some correction using the random secret triple. In the end this gives you a secret sharing of x times y with fresh randomness derived from the triple. So much about the theory. Now we come to the practical part, the implementation part. We have developed this tool chain which consists of two components: there is a compiler and there is a virtual machine. The idea is that the virtual machine is the fast part, a kind of low-level implementation in C++. It's relatively simple; it just understands about 150 instructions, so we use a kind of bytecode like we would have in compiled Java or whatever your favorite processor is. On the other hand we have this compiler that compiles Python-like code, and I admit it's a bit hackish, but we just did this to get ahead quickly. Basically we just programmed something in Python and then ran it within the scope of the compiler. So maybe compiler is a big word for the whole thing. But there we can do optimizations of the circuit, and I'm going to talk about what kind of optimizations we do. Basically the only, or most important, optimization that we do there is IO parallelization. So if you compare the right side and the left side here, what's the difference? Well, it looks almost the same but not quite, because on the right side we can parallelize, because the two computations are independent; whereas on the left side the second line of computation depends on the first line of computation. So of course on the right side you get an advantage in terms of IO, communication over the network, if you do the mask-and-open at the same time. And then you can compute z and u in one go. Of course with such a simple example it would be easy to do it manually, but think of something more complicated like this. Obviously you don't want to go here and say, "Yeah, okay, which round can I compute x26 in? So that depends on x23 and [inaudible] x22," and so on. It's a bit tedious on one hand. On the other hand, an easy way out here is going for SIMD instructions. And of course, depending on the application, this is very promising. But I would argue not always. And I think with the whole implementation of oblivious RAM that I'm going to talk about later, we saw that the possibility to have an automated compilation of any general circuit or computation really gives you an advantage. So how do we do this optimization? Well, we understand the circuit as a graph. What you can see here on the left side is basically a visualization of one multiplication. You use the inputs x and y, then the triple. There's this addition and sending-receiving; that's the masking and opening part. And then there is some multiplication of public and secret things, so that's linear in the secret information. And then a subtraction; the addition at the end is your output x times y. So I think it's pretty obvious what's what in this graph. Instructions are nodes. An edge means the output of this instruction is an input to that instruction.
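On such a graph, the scheduling pass that is about to be described can be sketched as follows. This is a simplified Python model, not the actual SPDZ compiler, and it assumes the nodes are supplied in topological order.

```python
from collections import defaultdict

def assign_rounds(nodes, edges, is_comm):
    """nodes: instruction ids in topological order.
    edges: (src, dst) dataflow pairs.
    is_comm(v): True if instruction v starts communication.
    Returns a round number per instruction: the longest weighted
    path to it, counting an edge out of a communication op as 1."""
    succ = defaultdict(list)
    for s, d in edges:
        succ[s].append(d)
    rnd = {v: 0 for v in nodes}
    for v in nodes:                    # walk in topological order
        w = 1 if is_comm(v) else 0     # IO costs one round
        for u in succ[v]:
            rnd[u] = max(rnd[u], rnd[v] + w)
    return rnd

# Two independent opens land in the same round and can be merged:
rnd = assign_rounds(["open1", "open2", "z"],
                    [("open1", "z"), ("open2", "z")],
                    lambda v: v.startswith("open"))
assert rnd == {"open1": 0, "open2": 0, "z": 1}
```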
Now, on that graph we do this thing of putting edge weights on it. Basically we say, well, local operations are free, so we assign weight 0. Communication operations cost us IO, so we assign a 1. As you can see, between the send and receive there is one each. And now we want to figure out the optimal number of rounds. To figure out in which round we want to place an instruction, we compute the longest path, because obviously, if you're starting some computation, the longest path with respect to the weights will tell you the round: you have to do all the previous IO before you can actually do what you're at right now. And then we simply merge all communication per round, and this gives us the optimal number of rounds. Yeah, there are some more not-too-complicated graph algorithms involved. So we represent all computation of the whole circuit as a graph and do some merging; of course, we then have to output this in a linear order that can be run by the virtual machine. That is so-called topological ordering; that's pretty standard. Another thing, similar to what a normal C compiler does, is to figure out how much memory we need, or how many registers, as we call them, in our virtual machine. So we want to reduce the memory usage; there's a bit of heuristics there, but it works pretty well. So that's the first part. Now the second part -- yeah, I guess that's why I'm talking in this session -- is how can we use oblivious RAM in MPC? Just to set the stage: what is the goal? The goal of our work is to have oblivious data structures. What do we mean by that? Well, the simplest thing you can think of is an oblivious array; that is, we have an array of secret shared information and we want to access this array with a secret shared index, of course without revealing the index. Very similarly, an oblivious dictionary. Yeah, pretty obviously we're talking about a secret key here; not a cryptographic key, obviously, but a key for the dictionary. Something we will also need later is obscuring how we are actually accessing the array: are we reading or just writing? Also, sometimes we need to kind of pretend, "Well, we might or might not write to the array, to the dictionary." We'll see later in our application why we do that. And building on the oblivious array we have implemented an oblivious priority queue. If you don't know what a priority queue is, I'm going to explain that briefly later, but I guess many of you will know. So then again everything is secret, the priorities and the values. What we also will need, in terms of obscuring the type of access operation, is in particular whether we are decreasing a priority in the queue or whether we're actually inserting a new value. And that is actually somewhat straightforward, because if you think about how this works, both operations involve this -- what is it? -- bubbling up or down, I can't remember at the moment, but it's the same direction in both cases. Of course one could extend this to even obscuring whether we are decreasing the key, or the priority, or even removing the element of lowest priority. But we don't need that in our application. Okay, oblivious RAM: we haven't heard so much about oblivious RAM in these two days so far, so I'm going to briefly set the stage here. Think of a scenario, kind of the scenario of a CPU and the RAM. The CPU obviously is very limited in memory, whereas in RAM you have a lot of memory.
And there are all those accesses going on for addresses in the RAM. But as you can see in this simple example, there is some kind of pattern. So x0 is accessed a lot, actually every second time. And one can see that for the RAM, or the server as it's also called, even if the information is encrypted on the server, the access pattern might reveal something, especially if we assume that there might be other side-channel information available to the adversary sitting on the RAM. So what we actually want is to obscure the accesses so that basically every address being accessed is chosen according to some random distribution. Now how do we translate this setting? Basically at the core of our work is translating this setting to secret sharing based MPC, and this is not necessarily a new idea. [Inaudible] like Ostrovsky and Kushilevitz in their original ORAM paper, they briefly mention this possibility. Also [inaudible]. So basically what we do: the client, a.k.a. the CPU, is just replaced by an MPC circuit. We know we can compute whatever we like in an MPC circuit, so that's okay. On the other hand, the server doesn't really do computation, so the only thing we need to do there, instead of encryption, is to translate this to secret sharing, because obviously secret sharing is secret under the premises of our scheme; in the case of SPDZ it's full-threshold security. And then, basically, the RAM access operation is translated into revealing the address that you're actually accessing. And I would say it's pretty straightforward to see that if, in a classical ORAM scenario, it's okay for the server to see the address, then it's okay for all the players in the multiparty computation to learn the address. And luckily we're in an interactive setting, so we can continuously reveal those addresses. And then the players can use this revealed address to access the right secret share in their big long array of secret shares that forms the encrypted RAM of the ORAM scenario. Okay, so in the classical setting, the non-MPC setting, the easiest solution to ORAM is called trivial ORAM. That's basically: whenever you access the RAM, you access all of it. You just ask the server, "Give me everything," and then you pick the ones you actually want to access. Now how do we translate this to an MPC scenario? Well, we use this thing that we call index vectors. That's basically a vector that has lots of secret shared 0's and exactly one secret shared 1 at the right position, the position that you want to access. And then we do the dot product with the whole RAM -- so a 0 to a n minus 1 here stands for the RAM, all consisting of secret shared values. And obviously what we get out is a i if the secret shared 1 is in the i-th position. So how can we get this index vector? Obviously the easiest solution is to just do a comparison with every possible value, and this gives you the index vector. So here [inaudible] I mean is a secret comparison that returns a secret shared 0 if equality is not given and 1 if equality is given. But what is not so nice, especially now in our secret sharing based scenario, is that comparison is relatively expensive. So what we actually found to be cheaper is the so-called demux operation. This is basically something that comes from hardware engineering, like handling bits. Basically we do what we call a bit decomposition, so we get an array of secret shared bits. And then a demux operation basically gives you this secret shared index vector. And here is an easy example.
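In code, the bit-decomposition-plus-demux step looks roughly like this. It is a cleartext Python sketch: in the protocol, every value and every operation below would run on secret shares.

```python
def bit_decompose(x, nbits):
    """The secret x as a list of bits, least significant first."""
    return [(x >> i) & 1 for i in range(nbits)]

def demux(bits):
    """Expand the bits into an index vector of length 2**len(bits):
    all zeros except a one at the position the bits encode. Each
    step doubles the vector using products, so the cost is
    multiplications rather than full secret comparisons."""
    vec = [1]
    for b in bits:
        vec = [v * (1 - b) for v in vec] + [v * b for v in vec]
    return vec

def trivial_oram_read(ram, index_vector):
    """Trivial-ORAM read: a dot product with the whole memory."""
    return sum(v * a for v, a in zip(index_vector, ram))

assert demux(bit_decompose(2, 2)) == [0, 0, 1, 0]
assert trivial_oram_read([10, 20, 30, 40], demux(bit_decompose(2, 2))) == 30
```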
Basically the information flow is the following: we have a secret shared 2. In bit decomposition this is 0, 1 or 1, 0, depending on the order. Then we see that we basically expand the 0 to 1, 0 and the 1 to 0, 1, and the last line basically indicates products. So there we see that, for the 2, it is the third element in the last column that is 1, and every other element is 0. This is just as an example. How much more time? >>: Four minutes. >> Marcel Keller: Four minutes. Okay, I think, yeah, I'm running very much out of time. So I will not have much time to talk about tree-based ORAM. That's a newer scheme by Elaine Shi and others. We have some access timings here for the whole implementation. And one can see that obviously the simple array is much worse in terms of asymptotic complexity. And then there are those two variants that I don't have time to talk about. Then we implemented the oblivious priority queue, and this uses these heap constructions. I've talked about it before. So, value-priority pairs: we have these operations, minimum-priority removal, inserting a new pair, and updating. And we implement it using two oblivious arrays. It's a very well-known construction, how you put the tree into an array. And then we need another array, which is kind of what we have here at the bottom, too. For the updating procedure we need to find elements in the tree. We [inaudible] the value; we need to find an element in the tree to lower the priority there. So unfortunately I also won't have time to talk in detail about Dijkstra's algorithm, which is an algorithm for shortest paths in a graph and basically works by going over all the vertices and updating the distance from some source node S as you go along, using some priority queue over there. I'm really sorry for this. So now we get to the real applications. So how did we implement Dijkstra's algorithm in MPC? Of course we use oblivious... >>: What is secret -- What is Dijkstra's algorithm that you did? What is the security requirement? [Inaudible]... >> Marcel Keller: So basically the security that we get is that everything is secret apart from the number of vertices and edges. The number of vertices and edges is the only thing that is public, because that is also correlated to the running time of the whole thing. But the graph structure is completely hidden in oblivious arrays. We use two of them, one for the vertices and one for the edges. We also use the priority queue, the oblivious priority queue, because that is what you need for Dijkstra's algorithm. And then, basically, we use this neat trick... >>: [Inaudible] I still don't get -- So there are two parties. One party has the graph and the other party has what? >> Marcel Keller: So we don't really specify that. Basically the graph has to go into the oblivious array somehow. Usually it will be like one party, someone, having the graph, and then this party will do the appropriate secret sharing and one will have to build up the oblivious array, of course. So where am I? Basically what I didn't really get to explain is that we have these two nested loops in Dijkstra's algorithm: basically we walk over all vertices in a particular order, and within that over each neighbor of every vertex. And this is not so nice in an MPC setting because it would reveal how many times each loop runs; that would give information away about the graph structure.
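A cleartext Python sketch previews the fix. The layout and names are hypothetical: the oblivious priority queue is modeled here as a plain minimum scan, and every branch below stands for a multiplexed, possibly dummy, operation in the real protocol.

```python
def single_loop_dijkstra(first_edge, edges, source, n):
    """One public loop of n + |E| iterations replaces the two nested
    loops: each iteration either 'pops' the next vertex or relaxes
    one edge, so the loop structure reveals only the number of
    vertices and edges, not the graph shape. first_edge[u] indexes
    u's first edge in the flat array (with a sentinel at the end);
    edges[i] = (neighbor, weight)."""
    INF = float("inf")
    dist = [INF] * n                  # an oblivious array in MPC
    dist[source] = 0
    done = [False] * n
    u = source
    done[u] = True
    e_idx = first_edge[u]
    remaining = first_edge[u + 1] - e_idx
    for _ in range(n + len(edges)):   # public iteration count
        if remaining == 0:            # time to pop the next vertex
            candidates = [(dist[v], v) for v in range(n) if not done[v]]
            if not candidates:
                break
            _, u = min(candidates)    # oblivious priority queue in MPC
            done[u] = True
            e_idx = first_edge[u]
            remaining = first_edge[u + 1] - e_idx
            continue
        v, w = edges[e_idx]           # a secret-index ORAM read in MPC
        alt = dist[u] + w
        if alt < dist[v]:             # a multiplexed 'maybe-write' in MPC
            dist[v] = alt
        e_idx += 1
        remaining -= 1
    return dist

# A path 0 -(5)- 1 -(2)- 2, stored as a flat per-vertex edge list:
first_edge = [0, 1, 3, 4]
edges = [(1, 5), (0, 5), (2, 2), (1, 2)]
assert single_loop_dijkstra(first_edge, edges, 0, 3) == [0, 5, 7]
```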
Basically what we do is we combine those two loops into one big loop walking over all edges in the right order. For the outer loop body, that's the vertex-related loop body, we will do some computation every time, but actually only dummy computation if it's not relevant. And this is exactly where we need this obscuring of the access operation: are we really writing or are we just pretending to write? And this brings me pretty close to the end. So obviously, because we're using these [inaudible] ORAMs that only have polylog overhead, we get polylog overhead over the classical algorithm in our MPC implementation; whereas our previous work had polynomial overhead. And this is what the timings look like. We have the implementation with no ORAM; that is previous work. And then we have the implementation with our simple array, and then the tree-based array. Obviously in the long term the tree-based array is the fastest. And when I say tree-based array here, in this case we use the original tree-based ORAM from Asiacrypt 2011. Please let me just wrap up because that's my last slide. So the conclusion is: we think we need a dedicated compiler even for secret sharing based MPC. It seems to us that previously there wasn't really a compiler for secret sharing based. [Inaudible]. We're not aware of any compiler that does this scale of IO optimization, whereas obviously for garbled circuits there have been compilers around for a long time. And we think we also need them for secret sharing based MPC. And then, of course, oblivious data structures in MPC are feasible and cool. And this concludes my talk. [Applause] >>: How are you compiling the program into oblivious data structures? For instance, let's say your program has two data structures. One is an array and the other is, let's say, a priority queue, and your program can have a control branch dependent on a sensitive variable. Let's say S is the sensitive variable and it's the last bit of the secret key or something. And maybe your program is like: if S equals 1, you access the array, and if S equals 0, you access the priority queue. And if you put the array and the priority queue in two different oblivious data structures, then you will reveal which data structure is being accessed, and that will leak information. >> Marcel Keller: Yes, you're perfectly right. So that's what we don't do. We basically don't say we can implement any branching program in MPC. What we say is, "We can now have oblivious data structures in MPC." But that is only to the extent that we can access a secret index of an array or that we can do oblivious operations on a priority queue. What we cannot do is branch on a sensitive variable and then do one thing or the other depending on the variable. That is what we don't do here. >>: With Dijkstra's algorithm, is it emitted by the compiler? Or is that a hand-constructed circuit? Because if it's emitted by the compiler... >> Marcel Keller: Sorry? >>: The Dijkstra's algorithm that you did. >> Marcel Keller: Yeah. >>: Is it emitted by the compiler or is it a hand-constructed circuit? Because if it's emitted by the compiler, then if your compiler doesn't ensure the security, it's not guaranteed that the implemented [inaudible] will be secure. >> Marcel Keller: I'm not sure if I understand the question. So basically we program this Dijkstra's algorithm, or our flavor of Dijkstra's algorithm that only has the single loop.
We implemented that in our specific language, which is more or less Python. And this is compiled by the compiler into a circuit -- well, a bit more than a circuit, because in MPC you can obviously loop on public information. >>: [Inaudible] doesn't allow branching on secret variable [inaudible]... >> Marcel Keller: Yes. That's correct. It does not allow branching on secret variables. That's the important thing. But what we show here in our work is that we don't need to do that to implement Dijkstra's algorithm in MPC. >>: Okay, so if anybody else has questions for the speaker, perhaps we can take them offline so we can move on to our next speaker. >> Marcel Keller: Thank you. >>: And actually let's give Marcel one more hand here. [Applause] So our next speaker, Mariana Raykova, did her PhD at Columbia University. During that time I was fortunate enough to have her as an intern here. So she's no stranger to Microsoft. After that she went on to a post-doc at IBM and now has finally come out to the correct side of the country and is working at SRI down in California. >> Mariana Raykova: Thank you, Brian. Thanks for the introduction once again. I'm going to tell you about some work that maybe repeats a lot of the stuff that the previous speaker told you, so you will have a chance to understand him even better from my talk. And hopefully some of the things that he didn't have time to explain, such as oblivious RAM structures, I will explain in my talk. So this is work in collaboration with Craig Gentry, Shai Halevi, Charanjit Jutla and Daniel Wichs. So -- oops, something is not working. We will be considering the question of two-party computation where we have Bob and Alice who want to compute some function that depends on their inputs. And as we heard in many of the previous talks, the goal of this computation will be security, which means that whatever protocol they are running doesn't reveal anything more than the final output. It doesn't reveal anything more about their inputs. In many of the previous talks that we heard, the way Alice and Bob were going to compute was using a circuit, and in particular garbling the circuit and doing the evaluation of this representation gate by gate. So this is one very widely adopted method of computation; however, there are different ways to represent computation. And one of these computational models is random access machines, which fortunately were introduced by the previous speaker. But I will repeat for you once again what a random access machine is. We can think of a random access machine as a CPU that does some small computation on the data that's stored in the registers of the CPU. And then, from time to time, in between these computations, it needs to go off to main memory and read or write some data there. So this is basically the intuitive description of the RAM computation model. And why do we want to use the RAM computation model? Well, because a RAM representation is always at least as efficient as a circuit. In particular we have this result that tells us, if you start from a RAM with a particular running time, what the size of the circuit that computes the same function will be. So what I will be proposing to you is that if you want to do computation, really first convert to a RAM representation, and then we will use this representation for our secure computation protocol from now on.
And the gain that we get from using a RAM representation for our computation is even more pronounced if we are thinking of a scenario where one of the inputs is really large. For example, we have some huge database of social data. And the computation that we'll be performing will actually be running in time sublinear in the size of our input. So if Alice here wants to search something in this huge database, we know that we have many algorithms for search that can run in sublinear time in the size of the database. And if we were to do this with a secure computation protocol, we would need to hide which positions of the database the execution of our query touched, because otherwise we would be revealing information beyond the output of the computation. So inherently, if we want to use a secure computation protocol based on circuits and we want to make sure that we touch each record in the database, we will have to have a circuit that's as big as the memory. So we immediately lose the sublinear efficiency that we were having in the starting algorithm. If we use the RAM representation, however, then we still have hope of preserving the sublinear computation. >>: I have a question. You said that RAM is always more efficient than circuits. >> Mariana Raykova: Yeah. >>: But you're not claiming that secure computation with RAM is always more efficient than secure computation [inaudible]. >> Mariana Raykova: I will be eventually claiming that even the overhead that we'll get from secure computation would still be better than the overhead... >>: In this case [inaudible] data in the sublinear? Or are you saying always? Forget everything we saw so far. It's the RAM? >> Mariana Raykova: Always. I would [inaudible]. Yeah? >>: Consider a scenario where you first load the ORAM once and then you ignore that [inaudible] execution [inaudible]. >> Mariana Raykova: Yeah, so we will be looking at the amortized case where you will be executing many computations once you have the preprocessing state. >>: [Inaudible] is efficient, right? It's just a linear [inaudible]? Or what exactly initialized [inaudible] the overhead? >> Mariana Raykova: So it depends on your setup exactly. You might need to initialize using the secure protocol as well, if this data is shared between the two parties. Or if this data belongs to one party, maybe you can initialize it immediately. So I'll be considering the amortized case where you will be having many executions over that. So how would we do this secure computation starting from the RAM computation model? Well, for the work of the CPU, which will be a relatively small or constant-size circuit, we can use existing secure computation protocols that are based on circuits, because in this case the small circuit representing the CPU is really independent of the running time of our computation. And then, when we need to go off and do our reads and writes into memory, we will use the oblivious RAM structures which were introduced by Goldreich and Ostrovsky. The basic property that oblivious RAM guarantees us is that we have a way to store and access data in memory that hides the access pattern into memory, so you cannot distinguish two sequences of accesses that have the same length. And the second property is the fact that we have efficient access protocols, which means that the overhead of our access into memory has polylogarithmic complexity in the size of the data.
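Schematically, the combination looks like the following sketch. The interfaces are hypothetical: in the protocol, `cpu_step` would be a small secure-computation circuit and `oram` an ORAM shared between the parties.

```python
from collections import namedtuple

# kind is "read", "write" or "halt"; addr/value may be None.
Req = namedtuple("Req", "kind addr value")

def secure_ram_eval(cpu_step, oram, state, t_max):
    """Run a RAM program one step at a time: a small circuit
    advances the CPU state and emits a memory request, and the
    request is served through ORAM, so only the public number of
    steps t_max is revealed, never the access pattern."""
    loaded = None
    for _ in range(t_max):
        state, req = cpu_step(state, loaded)
        if req.kind == "read":
            loaded = oram.read(req.addr)      # polylog-cost access
        elif req.kind == "write":
            oram.write(req.addr, req.value)
            loaded = None
        else:                                  # "halt": pad to t_max
            break
    return state
```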
And when we are talking about the setting of two-party computation, we will be instantiating these ORAM accesses using an ORAM with parameters shared between the two parties that are participating in the protocol. So this has been done. This type of computation has been done: it appeared in the work of Ostrovsky and Shoup in '97, and also, I have at least three co-authors here from this second work, where we did secure computation with this model. We used Yao garbled circuits to instantiate the work of the CPU and then to instantiate each step of the computation of the ORAM accesses. I have to point out that if we want to use secure computation for RAM, we will also be revealing the running time of the RAM. So if you choose, you might want to pad this to a particular length if you want to hide between different inputs. So what is this work? What we did in this work is we primarily considered a two-party computation protocol for private keyword search, where one party has a keyword and it will be searching in a database that is not necessarily owned by the second party but might be coming from a third party; the second party is just acting as the storage for this database. So the point is that neither of the two parties should know or should be able to see what the data in that database is. What is our contribution? Really, we looked at one particular ORAM construction, which was also described in the previous talk. This is the construction of [inaudible] that was the first one that introduced the tree ORAM structure. So we optimized this ORAM in several ways. We modified the underlying tree and the underlying eviction algorithm for this ORAM structure. And we also did an optimization for this ORAM structure that allows us to do binary search using a single ORAM access as opposed to a logarithmic number of ORAM accesses. >>: So this ORAM optimization is regardless of the multiparty computation? >> Mariana Raykova: Yes. >>: [Inaudible]... >> Mariana Raykova: This was an optimization on the ORAM construction that we would use in the secure computation protocol. And what we actually did is that, for the secure computation steps, we didn't use Yao garbled circuits but actually devised protocols using homomorphic encryption for the two-party computation steps. And our goal was to use low-degree polynomials: we didn't want to use the full power of FHE, but we wanted to use low-degree polynomials and explore the benefits that this can provide us in terms of communication. In general, FHE can buy you the most communication gain, but when we went to lower-degree polynomials, we were trading off these gains in communication for better computation. And actually, from a computational point of view, we were guessing that for some particular tailored protocols, these protocols based on homomorphic encryption might be competitive with approaches based on Yao. So I will see how much I manage to cover in this talk, but hopefully I will at least show you how the ORAM works and what our optimizations are. So let me start with describing the ORAM construction that was introduced by Elaine and her co-authors. The main idea of this ORAM structure is that we will store the data in a tree structure. So you have a binary tree, and the size of each of the nodes in this binary tree will be logarithmic in the total size of the database. Each element that's stored in the binary tree will be associated with a particular leaf of the binary tree.
And what this means is that when we have a particular record associated with a leaf, the only locations it is allowed to reside in within the ORAM structure are on the path that connects the root of the tree with the corresponding leaf. So now, if we want to make an ORAM look-up and we want to access a particular virtual address in our database, what we do is we find the leaf identifier for this record and then we trace all the records on the path leading to that leaf. Once we find the record we've been looking for, we have to move it in the ORAM structure. And the way we move it is we insert it back into the root node. But before we insert it in the root, we have to assign it to a different leaf in the tree. This guarantees that the next time we have to look for the same record, we will actually be traversing a different path in the tree, and so we'll be hiding our access pattern. So once we have assigned the new identifier to the record, we are ready to insert it into the tree. Obviously, with this approach for accessing data, we'll have the issue that our root will very soon start to overflow. So the approach that the authors of that work suggested for balancing the load of the tree was, after each ORAM access, to select two nodes in each level of the tree. Then we choose a random record in each of the nodes and we evict that record to the corresponding child. And we need to do this in an oblivious fashion, which means that we have to touch both of the children so that we hide from the server which of the children we are evicting to. So this is how we access something, but let's look a little more carefully. What do we really need in order to access the ORAM, the binary tree ORAM? We need, for each record in the database, to be able to efficiently look up its leaf identifier. And if the client needs to store all of this information, then the client will need to have memory that's proportional to the database. What we are really aiming for is a client that can have only constant memory storage. So in order to manage this increase of the memory for the client, we will store these address-leaf pairs recursively. What this means is that we will pack several of them together in groups, and we will call each of these groups a new record in our database. And we will store this information recursively. So we start with the main tree, which contains all the database records. Then, in the second tree, we store record-leaf pairs. And then we continue in the same way until we get a total of about log N trees. So our access will scan the smallest tree, which is of constant size; then you use what you found in the constant-size tree to scan the next tree, until you reach the main tree containing the data. So this was the construction that we started with. And what are the optimizations that we devised? The first thing that we did was to reduce the depth of the tree. The depth that was suggested by the authors of the construction was to have a tree of depth log N where each node contains K items, where K has to be the security parameter of the scheme. What we showed is that we can actually cut the tree short. We can have depth that's only log of N divided by the security parameter K, and we still get the appropriate negligible overflow probabilities. So this reduces the total height of the tree. In order to do this we need to increase the size of the nodes from K to 2K.
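A cleartext Python model of one access in the basic tree ORAM just described may help. It is heavily simplified: no encryption or secret sharing, the position map is kept locally instead of recursively, and bucket overflow is ignored (buckets are unbounded lists).

```python
import random

class TreeORAM:
    """Toy model of the binary-tree ORAM: each record is pinned to
    the root-to-leaf path of its currently assigned leaf."""
    def __init__(self, depth):
        self.depth = depth
        self.tree = [[] for _ in range(2 ** (depth + 1) - 1)]  # buckets
        self.pos = {}                       # virtual address -> leaf

    def path(self, leaf):
        """Bucket indices from the root down to the given leaf."""
        node = 0
        yield node
        for level in range(self.depth):
            node = 2 * node + 1 + ((leaf >> (self.depth - 1 - level)) & 1)
            yield node

    def access(self, addr, new_value=None):
        leaf = self.pos.setdefault(addr, random.randrange(2 ** self.depth))
        value = None
        for b in self.path(leaf):           # scan the whole path
            for rec in list(self.tree[b]):
                if rec[0] == addr:
                    value = rec[1]
                    self.tree[b].remove(rec)
        self.pos[addr] = random.randrange(2 ** self.depth)  # fresh leaf
        if new_value is not None:
            value = new_value
        self.tree[0].append((addr, value))  # reinsert at the root
        self.evict()
        return value

    def evict(self, per_level=2):
        """Per level, pick nodes and push one random record one step
        toward its leaf (the real scheme touches both children of
        each picked node to stay oblivious)."""
        for level in range(self.depth):
            start = 2 ** level - 1
            picks = random.sample(range(start, 2 * start + 1),
                                  min(per_level, 2 ** level))
            for node in picks:
                if self.tree[node]:
                    i = random.randrange(len(self.tree[node]))
                    rec_addr, rec_val = self.tree[node].pop(i)
                    bit = (self.pos[rec_addr] >> (self.depth - 1 - level)) & 1
                    self.tree[2 * node + 1 + bit].append((rec_addr, rec_val))

oram = TreeORAM(depth=4)
oram.access(7, new_value="hello")       # write
assert oram.access(7) == "hello"        # read via a fresh random path
```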
The result of that depth optimization is that we improve the constant in our storage parameter, and we also improve the computation, because the trees that we have to traverse are now shorter. So this was the first optimization that we did. The next one was to actually increase the branching factor of the tree that's storing the data. This means, again, smaller depth, because we are looking at the logarithm with a larger base. >>: It's the same K? >> Mariana Raykova: Yeah, we are using the same security parameter. And it also affects the computational complexity of the ORAM. So once we do this optimization, there is a question of what happens with the eviction algorithm, because if we were doing the same thing that was proposed before -- select two nodes in each level -- and we wanted to maintain the obliviousness of the scheme, we would have to access each of the K children of a particular node. So this was going to introduce overhead in the eviction procedure. That's why we devised a new eviction algorithm which doesn't choose nodes in a particular level but rather chooses a path on which we will evict. And actually we showed that it's enough to have a deterministic eviction schedule on the paths. This deterministic eviction schedule guarantees that in each level you basically do a round-robin: you cycle through all the nodes in a round-robin fashion. And in order to do this, where you need to look is basically at the reverse order of the digits of the leaves, where you're looking at the digits base K, where K is, again, our security parameter. So this is just guaranteeing that you are touching each of the nodes in one level in the correct order. So we showed that we can do deterministic eviction and we still have the right overflow probabilities. I must point out that there is newer work by a subset of the previous authors and some new ones that is called Path ORAM, and it also uses an eviction that actually evicts on the look-up path. My understanding is that with it you might need a little more storage on the client. I'm not sure whether our optimization would be applicable immediately to that scheme; we have not explored what happens if we use that new scheme. So the next optimization was how to change the ORAM so that we can do binary search using only a single ORAM access as opposed to log N ORAM accesses. So let's see how we do that. We start with sorted data. Of course, if we want to binary search, we should have the data sorted. So what this represents here: these are virtual addresses for the purposes of... >>: One access per RAM operation? One access per what? Binary search is log N even without the ORAM. >> Mariana Raykova: Yeah, you have to have log N accesses into the database. >>: Yeah. >>: But we do it with a single ORAM access. The ORAM access already touches a polylogarithmic number of physical addresses in the memory, right? >>: So [inaudible] binary search [inaudible] a single ORAM access. >> Mariana Raykova: Yeah, a single ORAM access. So these will be the virtual addresses, which we usually use in an ORAM look-up. This will be the field that we use for the binary search, and then this is the data that we are aiming to fetch. So let's remind ourselves what the intermediate record in one of the recursion trees of the ORAM is: it is one of these records that pack together several virtual address-leaf pairs.
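As an aside on that deterministic schedule: the reverse-digit order is easy to write down. A sketch, with k the branching factor and leaves labeled 0 to k^depth - 1:

```python
def eviction_leaf(counter, depth, k):
    """Deterministic eviction schedule: write the step counter in
    base k, reverse its digits, and read the result as a leaf
    label. Consecutive counters then cycle through the children of
    every node in round-robin order at each level of the tree."""
    digits = []
    for _ in range(depth):
        digits.append(counter % k)
        counter //= k
    leaf = 0
    for d in digits:          # digits are now least-significant first
        leaf = leaf * k + d
    return leaf

# With k=2, depth=3 the schedule visits leaves 0,4,2,6,1,5,3,7:
assert [eviction_leaf(i, 3, 2) for i in range(8)] == [0, 4, 2, 6, 1, 5, 3, 7]
```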
Back to the look-up: if we want to look up a particular virtual address, which is what happens in a regular ORAM access, what we do is we look in the intermediate tree Ti, and we fetch the label of the leaf that we have to look at in the next tree. Then we derive the particular virtual address we will look for in that next tree, based on the virtual address that we are looking for overall. So I have a deterministic way to map the virtual address that's the input for the ORAM into a new value that will be looked up in each of the intermediate trees. So once we have identified that virtual address in this record that's packing leaf and virtual address pairs, we proceed recursively in the next tree. So what will be the change if we want to do a look-up directly based on these data fields in our database? Well, let's see. We have these pairs of virtual addresses and leaves, and then, as we were creating these recursive trees, we can see what the subsets of virtual addresses in the original database are that eventually get mapped to this intermediate virtual address and leaf pair that lives in one of the intermediate trees. And we can also see what the ranges of the actual data are. These are the values of the data fields on which we are doing the binary search. And we can just store, together with this virtual address and leaf pair, the range of the data values that eventually get mapped to these virtual addresses in the intermediate tree. As we are shrinking down this range of original records -- they get packed and packed into intermediate records -- we can just store the appropriate range of the original data. So now, when we start to do the search through these recursive binary trees, what we do is we find, in the particular tree, the leaf label telling us on which path we have to look. But then we decide which virtual address in the intermediate tree we will be looking for by doing a check against the ranges that we are storing with our records, and seeing, in a particular record that we are packing with two virtual addresses, whether we should look for Vi1 or for Vi2. And we make this decision during the search based on how the binary search index that we are looking for, d, compares with these stored ranges that we have added to the tree. So this is basically the modification that we introduced, which allows us, in each binary tree, to not look for one fixed virtual address but to decide which virtual address we will look for based on these comparisons with the ranges of the data that we have added to the structure. So this allows us to find our data index in one sweep through the series of log N binary trees. Okay, so these were basically all the modifications that we did to the ORAM. Now I will try to give you just some taste of... >>: Can you use any specific property of oblivious RAM [inaudible]? >> Mariana Raykova: So I think the property -- we are leveraging the fact that it's already in this binary tree structure. That allows you to map a whole consecutive interval from the starting data into a virtual address in an intermediate record. The way you transition from one tree to another tree is to pack a consecutive interval of virtual addresses that are already in the starting tree. Those correspond to a consecutive interval of your data, which is already sorted. So you can just store the value at the beginning and the value at the end, and you propagate this with the intermediate record.
So basically each intermediate record maps back to a consecutive range in the original data, and so you store the two endpoints of the data values and you can check against those as [inaudible]. But I don't see how you can do this with the hierarchical ORAMs. I think this is leveraging the fact that you already have this binary search structure embedded in your ORAM. So thanks, Elaine. Okay, those were the modifications of the ORAM. Let me just give you a flavor of what type of tailored protocols we came up with using homomorphic encryption. I will just give you an example with the equal-to-zero protocol. So we will have some encrypted value that we want to compare to zero. For example, our client will be holding this value X that's encrypted under a key, the server's key. So our two parties will be client and server. And we want to compute, at the end, an encryption of a bit which says whether this value is zero or not. So what we can do is have the client choose a random n-bit number R, and he can apply the addition homomorphically. So we now have the property that this ciphertext C encrypts R if and only if X is equal to zero. So the [inaudible] can send to the server this new ciphertext C, and he can also send encryptions of the bits of the random number that he generated. These bits are going to be in a plaintext space mod 2 to the m, which is bigger than what we need for the original n-bit number R. So now what the server can do is use the fact that you can express XORs of the bits of two numbers with an arithmetic operation when you're working in the appropriate modulus for the FHE. So he can basically compute an XOR on the bits of the number that he decrypted using his key -- he could decrypt X plus R -- and he also uses the encryptions under the client's key of the bits R sub j to compute this XOR. So now we have the property that if X, the original value, was equal to zero, then all the bits S sub j prime that he computed using this XOR operation are also encrypting zero. What we can do now is add up all the encryptions of the bits, and we have the property that the resulting sum is equal to zero if and only if our original X was equal to zero. So what this bought us was to reduce the plaintext space from the original modulus down to the modulus 2 to the m. We can repeat this step, reducing the plaintext space in several steps, where at the end we will have an encryption in a plaintext space with some constant modulus. And then, at this point, you can evaluate the remaining circuit using our low-degree homomorphic encryption. So this is just a flavor of what these tailored protocols look like. And I know that when you hear homomorphic encryption, you are probably thinking that this is, like, super, super inefficient. So what I decided to do is I took a graph from our CCS paper, which implemented what I showed you using Yao garbled circuits. And this is basically the proof that using ORAM eventually gets better than just using plain Yao on the whole database. At that point we computed for which database size we actually outperform basic Yao. And then I decided to see where I got with the evaluation of -- the implementation with the homomorphic encryption is still ongoing, but I got one point, because we are considering only huge databases, so the only thing that I had was actually a database of size 2 to the 22.
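Before the numbers, the equal-to-zero reduction just described can be captured in a short cleartext sketch. This is the logic only: in the protocol, y would be computed homomorphically under the server's key, and the XOR would be evaluated on encrypted bits of R.

```python
import random

def eqz_reduce(x, nbits):
    """One reduction step of the equal-to-zero protocol, in the
    clear. The client adds a random mask R to Enc(x); the server
    decrypts y = x + R and XORs its bits against encrypted bits
    of R, working in a small plaintext space mod 2**m with
    2**m > nbits + 1 so the final sum cannot wrap around."""
    r = random.randrange(2 ** nbits)        # client's random mask
    y = x + r                               # what the server decrypts
    s = 0
    for j in range(nbits + 1):
        yj = (y >> j) & 1                   # known to the server
        rj = (r >> j) & 1                   # an encrypted client bit
        s += yj + rj - 2 * yj * rj          # arithmetic XOR of bits
    return s                                # equals 0 iff x == 0

assert eqz_reduce(0, 16) == 0
assert eqz_reduce(12345, 16) != 0
```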
Now, we didn't have exactly matching information from our previous paper for that point: there the item size is only 112 bits, not 512 bits as we had here. But I'm not sure how much difference this makes, since it affects only the biggest tree in the ORAM. This is where the point was: it was about 1400 seconds for the binary search on a database of size 2 to the 22 using these, our tailored homomorphic encryption protocols. So I don't know how this graph exactly extends here. Also, I don't have information on whether the hardware on which these three evaluations were run is the same. But this is the best comparison I could get. >>: So if I understand correctly, they look comparable. Yeah? >> Mariana Raykova: Yeah, they look comparable. And this is without the optimization using the SIMD operations -- I know I have one minute -- without using the SIMD operations, which basically allow you to pack many homomorphic encryptions in the same ciphertext. So we expect this to improve even further. >>: So it's not always better. For small data sets, Yao is better. >> Mariana Raykova: Right, you have a lot of constants hidden here before you get better than Yao. >>: [Inaudible] back to my question... >> Mariana Raykova: So in conclusion, I think the three things I want to say are: if you want to do secure computation, I think you should think about using RAMs as opposed to just using circuits. Also, devising specialized ORAMs for different computations, just as in the case of binary search, might be interesting -- trying to actually leverage the fact that the ORAM is already doing log N accesses, or polylog N accesses. So maybe you can incorporate a bigger part of your computation into the ORAM. And the third one is: don't immediately discard low-degree homomorphic encryption and SIMD operations. They can be efficient. So thank you. And if you have any questions, or if we have time, I will take them. Yeah? [Applause] >>: It seems that the ORAM improvements were, like, generic to ORAM. It wasn't something -- do you think, if you're looking at using ORAM in secure computation, there would be sort of different optimizations which make it a worse ORAM but better for the use in secure computation? >> Mariana Raykova: I think if you want to do this, you will have to consider specific functionalities, just as we did with binary search, right? We did something that makes the ORAM maybe a bit worse, because you're storing more, but improves the secure-computation binary search. >>: I'm afraid we're running a little short on time here, so we're going to move on to the next speaker. But let's give Mariana another hand. [Applause] So our final oblivious speaker here has a slightly different take on what it means to be oblivious. Samee Zahur is a student of Dave Evans at the University of Virginia who's done a lot of work on MPC and optimizations there too. And I gather that this work grew partly out of his frustrations with the difficulty of implementing those optimizations, so he's going to help everybody else have an easier time. >> Samee Zahur: So thanks for the intro. So, yes, this is joint work with my advisor David Evans. The purpose of this work -- what we do in it is develop a new language for secure computation. And the idea is that we want it to be easier for researchers to develop their own techniques on an existing system. They shouldn't have to be writing compilers and whatnot, so we do all the hard work for them. So just to be clear, this is not what we want.
I understand that we are sort of adding to the mess, but hopefully at the end of my talk I will be able to convince you that we actually add something useful to the system and don't just add to the mess. Okay, since the last two talks were about ORAM, let's just go with that. The first implementation of the hybrid Yao-plus-ORAM protocol was in that CCS paper just mentioned. What they ran their experiments on was, I believe, the binary search benchmark. So let's say, in a hypothetical situation, you already have a language that performs Yao and you have your own ORAM implementation, and you just want to plug that into an existing system and see how it works with the rest of the Yao circuit and whatnot. So everything else happens in Yao; just the random access part happens in ORAM. This is a normal binary search code written in C; there's nothing special in here. You don't have to read through it, but let's step through it little by little to show you what things we have to change here to make it compilable with our compiler. So the first thing we do, obviously, is change the data types. Some of the data types depend on secret data, others don't. [Inaudible] does not depend on secret data. We also do one thing here: we annotate if conditions with the obliv keyword if they depend on secret data. That helps with the coding. And the other thing is that while loop. It's been mentioned several times here that if you have a loop where the termination condition depends on secret data, that's sort of problematic. So we just take the simple approach: we don't allow that. So we rewrite it into something like this, so the loop itself does not depend on secret data. In many cases that seems easy to do when the running time itself is not secret. And that's probably it. This code, as is, can be compiled with our framework. But that's not very interesting by itself. What's really interesting to me is that our compiler has no built-in notion of ORAM. So that ORAM read line over there, that's just a library function that's added later. And that's sort of interesting because, see, it's enclosed in an oblivious structure where even at runtime we're not going to know whether that condition was true or not. So everything done inside that oblivious structure will have to be done conditionally, right? Assignments will be conditional assignments, increments will be conditional increments, and so on and so forth. So somehow we need a language to express these sorts of structures -- where ORAM needs to do network accesses or randomized shuffling, crypto operations -- somehow we need a way to express, "In this function that I'm writing, these things need to be done unconditionally and these things need to be done conditionally." And we need to figure out how that fits into the rest of the language. So that's what this talk is going to be about. So our goal is to have this intermediate language -- not intermediate, a high-level language -- where it sort of combines the best of both worlds, right? Users should be able to write code in an intuitive way, whereas at the same time library writers should be able to go under the hood and do whatever they need to do to present the user with an intuitive abstraction. So that's what we are striving for here.
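The core of compiling those conditional structures can be modeled in a few lines. This is Python standing in for what the generated C does -- the semantics, not Obliv-C syntax:

```python
def mux(cond, if_true, if_false):
    """What an assignment inside `obliv if` compiles to: both
    values are computed, and the secret condition (a 0/1 value
    inside the protocol) selects one arithmetically. No actual
    branch is ever taken on secret data."""
    return if_false + cond * (if_true - if_false)

# x = 10; obliv if (secret_flag) x = 42;  becomes roughly:
x = 10
secret_flag = 1           # a secret-shared bit in the real protocol
x = mux(secret_flag, 42, x)
assert x == 42
```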
So now what I'm going to do in the next few slides is describe the extensions we actually add on top of normal C -- what are the features we add on top of C? The first we've already seen: the conditionals. The way this works is that every time you do conditional operations, the basic ones the compiler already knows about: it just multiplexes between the old value and the new value, and that's it. If you have nested conditionals, the conditions need to get combined together the way you'd expect. So the interesting part starts with functions. What do we do with functions? Well, just an aside: we obviously have a type system that tracks data dependencies and control dependencies, so that non-obliv data does not get modified under the control of obliv data and whatnot, as you'd expect. So about functions, what do we do? We have two families of functions. One is the usual kind, and the other one has a strange suffix at the end of the [inaudible] functioning [inaudible]; those are the ones that can be called from inside an obliv if. Those are the ones for which the compiler somehow has figured out how to execute them safely while inside an obliv if. So what we have are these two families of functions. The first family is the normal ones. They have no restrictions on what can be done inside the body of the function, so they can modify non-obliv global variables and whatnot. Whereas the second family of functions has the same restrictions as you'd have inside an obliv if, but they can be called from anywhere. And what we do is that, during compilation, we make sure that everything done inside the second family of functions is actually done conditionally. And the condition can be specified from the outside by the caller. Right? So this is an example of the things we can do with it. Just to clarify, we also do not have built-in array indexing that is based on secret data. If you have normal array indexing, we require that index to be a known value, because there are just so many ways of doing this that we don't want to enforce a default. But if you do need something like that, you can just write this two-line function: arrayWrite. All we do is run through a loop and go through every single index, as you'd expect. And if you hit the particular index that you want to write, you just write to it. That's it. So what happens when we actually execute this? We get to a condition. Neither party knows whether this condition is true or false. We get to this line, arrayWrite. We need to call a function. So we just call this function anyway, regardless of whether or not the condition was true, because we don't know whether this condition is true. And that's the general trend for all obliv if constructs. We get to it, and then we run through this loop; i goes from 0 to n. Both parties know that i is going from 0 to n. We -- oops, sorry. Yeah. We are modifying this non-obliv variable even though we are dynamically inside the control of an obliv if structure, because we know this is a local variable. This variable will go out of scope; even if we modify it, it's not going to reveal any extra information. So after that we can go on and safely do all the modifications we need to.
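That arrayWrite idea, modeled in Python (a sketch of the semantics only: the loop counter is public, while the comparison, the multiplexing and the written value are secret inside the protocol):

```python
def obliv_array_write(arr, secret_index, secret_value):
    """Write at a secret index by touching every slot: each slot
    keeps its old value unless the secret equality test fires, so
    nothing about the index is revealed. The cost is linear in the
    array length, which is exactly why ORAM becomes attractive
    for large arrays."""
    for i in range(len(arr)):                 # i is public
        hit = int(i == secret_index)          # secret comparison bit
        arr[i] = arr[i] + hit * (secret_value - arr[i])  # multiplexer

a = [0, 0, 0, 0]
obliv_array_write(a, 2, 9)
assert a == [0, 0, 9, 0]
```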
And finally, the other feature that we introduce is that of unconditional segments. Sometimes, as we saw in the case of ORAM, and as we'll see in some more examples later, we need to embed unconditional segments that execute regardless of what the enclosing condition is. So here's another example. Let's say we want to write a library -- this is not even a hybrid [inaudible] -- a library that provides a dynamically resizable vector, right? That's a pretty normal thing to do. We want to be able to write this sort of append function even though we don't know whether or not it's being called -- whether that if condition was true or false. So once again, think about what we need to do to implement this: we will need to resize whatever internal buffer there is. We will need to reallocate memory unconditionally and then do the writing to it. Okay? So that's pretty much it. That's what the function will look like. We have an unconditional block that does the memory allocation and then increments some physical length. And DynVec here is just a struct, a normal C struct. And then we do an arrayWrite. That's it. Okay? So this gives the compiler an easy way of deriving conditional versions of functions, and us an easy way to specify them. So that's nice. All through this we have the usual types of [inaudible] that ensure that we're not doing anything stupid. So I know this is sort of messy, but this formally defines the steps we go through in the compilation process. So convert is just a mathematical function that takes Obliv-C code and produces normal C code, just plain C. And it takes a sort of [inaudible] condition -- so, for example, if we want to do an assignment in the first line there, we can either do a normal copy, if we know that the condition is true, or else do a conditional assignment. In the case of obliv if, we have to execute both segments no matter what, but we do it under different conditions, and things like that. In the case of that unconditional statement, as expected, we just compile it as if it were outside everything else. So that's pretty much it. To implement something like this, we actually have the implementation; it's available online for you to use. The way we do it is in terms of a preprocessor before GCC. So we convert everything to normal C using a custom version of the C Intermediate Language, and then we just let GCC compile it. We have all the runtime libraries already written for you. And that way, what we get at the end of it is a native binary [inaudible] program for performing whatever protocol you want to perform. Right? So there is no intermediate language that is interpreted at runtime or anything; it's just a binary program. So that's nice. This slide shows -- I know it's sort of ugly, but it's the ugliest code I'll show you -- this slide shows a complete example of performing Yao's millionaires' problem in our language. On the left side you have Obliv-C code, and on the right side you have normal C code for launching the Obliv-C function. So just to step through it one by one: you start executing this function. You say that, okay, this is a normal integer, MyInput. Party 1's copy of MyInput needs to be entered into variable a. Party 2's copy of MyInput is fed into variable b. And then you just do a normal comparison and then you reveal it. And when you reveal it -- when the party argument is 0, it's revealed to everybody. And so everybody gets the value of res inside result, essentially.
And on the C side you just say setCurrentParty -- I'm party 1, I'm party 2, and so on -- and then you just say execYaoProtocol, and then you read out the result. That's it. And the nice thing about this is that on the Obliv-C side you just specify the computation in a data-oblivious way; you don't necessarily have to say that it's Yao. So you can decide at runtime which protocol you want to execute. In this case it's semi-honest Yao, but we also have an implementation of the dual-execution version, where you get stronger guarantees: you leak at most one bit, but it's against a malicious adversary. So if you want to have your own protocol, all you have to do is write a function like that. We have runtime hooks that specify how to do AND operations, OR operations, how to do inputs and how to do outputs, and those get called automatically from the Obliv-C code. So you just have to write a function that says "execute the dual-execution protocol" and whatnot, and in there all you have to do is set those runtime hooks, execute the Obliv-C function, and that's it. For dual execution, all we do is first thread out, set the hooks separately, execute the same function twice in two different threads, and then before output we join them back. The other thing we have, just for the sake of development, is an execDebug protocol. Guess what that does. Yes?

>>: So if you wanted to use some Java program that has data access, the ORAM code would be in your language, should be written in your language...

>> Samee Zahur: Yes.

>>: So I understand. Okay.

>> Samee Zahur: Or, I mean, if you already have a program written in C, it's also easy to just wrap it in my language. I'll have a normal Obliv-C function that just makes a call to the C library; that's also easy to do, because it's simple preprocessing before going to C. So it's very easy to link C code together. That's easy to do.

>>: I'm a little confused: is the programmer deciding which variable is the ORAM, or is the compiler deciding that?

>> Samee Zahur: The programmer.

>>: The programmer is deciding that.

>> Samee Zahur: Yes.

>>: And the type system ensures that the decision is correct.

>> Samee Zahur: Yes. The only thing that gets leaked is what you explicitly reveal; other than that, nothing is going to leak. Yes, the type information ensures that. Any other questions?

So, yes, that's what we do. And we have already implemented libraries for these things. We've already mentioned most of them, so I'm not going to go through them again. The other protocol we have is dual execution. We have an ORAM implementation. Dynamically sized vectors. Range-tracked integers are pretty simple: a common optimization is that if both parties know that, regardless of the input, one particular integer at this point in the program will never be more than 5 bits, then we can use shorter circuits for the addition; we don't need a full 32-bit addition circuit. So what we have is a simple struct that carries the normal obliv integer as well as conservative estimates of its bounds -- upper bound and lower bound -- so that you can do normal addition operations with smaller circuits while tracking what the resulting ranges are.
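Put together, the two sides of that millionaire example might look roughly like this. The split into million.oc and million.c, the ProtocolIO struct, and the run wrapper are a reconstruction of the slide rather than a verbatim copy; the library calls (feedOblivInt, revealOblivBool, execYaoProtocol, and so on) are the ones the talk refers to:

```c
/* million.oc -- the Obliv-C side: a protocol-agnostic description.
   In practice ProtocolIO would live in a shared header. */
#include <obliv.oh>

typedef struct { int myInput; bool result; } ProtocolIO;

void millionaire(void *args) {
  ProtocolIO *io = args;
  obliv int a = feedOblivInt(io->myInput, 1); // party 1's input goes into a
  obliv int b = feedOblivInt(io->myInput, 2); // party 2's input goes into b
  obliv bool res = a < b;
  revealOblivBool(&io->result, res, 0);       // party 0 means: reveal to everyone
}

/* million.c -- the plain C side: pick a protocol and run it. */
#include <obliv.h>

void run(int party, int myInput) {
  ProtocolDesc pd;
  ProtocolIO io = { myInput, false };
  protocolUseStdio(&pd);                  // transport; TCP helpers also exist
  setCurrentParty(&pd, party);            // "I'm party 1" or "I'm party 2"
  execYaoProtocol(&pd, millionaire, &io); // could swap in dual execution or debug
  cleanupProtocol(&pd);
  // io.result now holds the revealed comparison for both parties
}
```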
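And a very rough sketch of the dual-execution structure just described. The two-thread skeleton is the point; setMyProtocolHooks is a hypothetical placeholder for installing the per-thread runtime hooks, not a real API call:

```c
#include <pthread.h>

// Run the same obliv function twice with the roles swapped, and only
// release output after the two runs are joined back together.
typedef struct { void (*fn)(void*); void *arg; int role; } DualRun;

static void *runOnce(void *v) {
  DualRun *d = v;
  /* setMyProtocolHooks(d->role);  // placeholder: per-thread hook setup */
  d->fn(d->arg);                   // the unchanged Obliv-C function
  return NULL;
}

void execDualSketch(void (*fn)(void*), void *arg1, void *arg2) {
  pthread_t t1, t2;
  DualRun r1 = { fn, arg1, 1 }, r2 = { fn, arg2, 2 };
  pthread_create(&t1, NULL, runOnce, &r1);
  pthread_create(&t2, NULL, runOnce, &r2);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  /* placeholder: equality-check the two outputs before revealing them */
}
```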
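Finally, a minimal sketch of the range-tracked integer idea just described; the struct layout and the rangeAdd helper are illustrative, and a real implementation would pick the adder circuit's width from the public bounds:

```c
#include <obliv.oh>

// An obliv value bundled with public, conservative bounds.
// Invariant both parties can rely on: lo <= val <= hi.
typedef struct {
  obliv int val;
  int lo, hi;    // public bounds, known to both parties
} RangeInt;

RangeInt rangeAdd(RangeInt a, RangeInt b) {
  RangeInt r;
  r.val = a.val + b.val;  // in principle this addition only needs enough
                          // bits to cover [a.lo + b.lo, a.hi + b.hi]
  r.lo = a.lo + b.lo;     // bounds add conservatively...
  r.hi = a.hi + b.hi;     // ...so the invariant is preserved
  return r;
}
```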
So these are examples of things that would be hard to do in other existing languages today, because this concept of letting programs do some things unconditionally while other things are done conditionally, and making the whole thing work in a seamless manner -- that's what we consider our contribution here to be.

Current status of the project? Performance is competitive. I don't have concrete performance numbers here, but they're competitive with other implementations out there. We do not have floating point implemented yet. There's no fundamental reason why we couldn't do it; we just haven't gotten around to it. Same for the short-circuit logical operators, && and ||, because in normal C you evaluate the left side, see whether it's true, and only then evaluate the right side; that's the C short-circuit behavior. It can be done, but right now we just make do with normal bitwise operations and Boolean types. It works. Error reporting: yeah, we're working on it.

So my suggestion is: if you're coming up with circuit libraries, you should be able to just test them. Using our system you should be able to test them on all existing protocols and not have to worry about making your own framework and whatnot. Similarly, if you have new protocols, you should be able to just test them on all existing benchmarks. I have lost count of how many times I've implemented AES; it shouldn't be necessary. And protocol enhancements like ORAM: if you have particular subprotocols for doing some things very efficiently -- multiplication, say, if you want to use finite fields for that -- then if there is one particular operation you want to target and enhance, this is really good for that. So, yeah, hopefully this will be useful to everyone here. Let me know if it is or isn't, and it's free for download. So that's it.

[Applause]

>>: We have time for some questions.

>>: If I remember correctly, the information flow [inaudible] was kind of [inaudible] that the solutions were either very restricted or not that useful, meaning that they leaked so much that they weren't that useful. And this is about two years ago. So I don't know what you have done or [inaudible]...

>> Samee Zahur: I don't know which paper you are talking about?

>>: No, not a specific paper. My Master's was in this [inaudible] general.

>> Samee Zahur: So you might know more about this than I would. In general, I can tell you that the trade-off we adopt is that, yes, our type system is flexible in the sense that, as you saw, we allow certain local variables to be modified even though they're not obliv but are inside an obliv if. We do have a proof that unless you do reveal operations, you will not leak anything. And even if the proof is too complicated, I can explain it this way: in terms of the implementation, when we're rewriting to C, we never introduce new reveal operations. So the crypto should take care of the rest, really. On the other hand, since we have funny constructs like unconditional segments and whatnot, a programmer who isn't careful can do unintuitive things, like modifying a variable while inside a false condition and whatnot. So the trade-off that we go for is that we assume the programmer knows what they're doing, but we won't accidentally reveal information. That's not going to happen.
>>: Any other questions?

>>: How does this work for different ORAM implementations? I guess if I want to have my own [inaudible]...

>> Samee Zahur: Yeah. So like the one that I showed here: that's not built into the compiler; that's also a library. So it would be pretty much the same, except, yeah, you have to do the network transfers and everything on your own. But if you want to know more about it, I can send you the code for my ORAM library if you want. So, yeah, that works. And all of this is just compiled to native code, so you get a speedup there.

>>: All right. With that, let's have one more round of applause for all the oblivious speakers in this round.

[Applause]