>> Melissa Chase: So we're very happy to have Mike Rosulek visiting us this week. He's worked in a variety of different areas, but particularly on secure computation, both theory and practice, and today he's going to tell us about one of his [indiscernible]. >> Mike Rosulek: Thanks. So thanks, Melissa and Sonny, for letting me come up here and invite myself, and for hosting a talk. So today I'm going to be talking about some joint work with Arash Afshar and Payman Mohassel from Calgary, sort of -- Payman is in transition -- and Zhangxiang Hu, a Ph.D. student of mine. So this is a talk about two-party secure computation, and I realized, like, half an hour ago, that I didn't really have a slide saying what secure computation is. I don't know if that's going to be a problem in this audience. Okay. Awesome. So this is a slide that -- or a visual that David Evans likes to use, where he estimated the cost of doing human genome comparison or human genome sequencing. So the blue line is the cost of sequencing the human genome without any security, just the raw computational cost historically. And you see that's gone down. And then the other lines are the estimated costs of what it would -- how much it would cost to sequence a genome inside of a secure computation achieving semi-honest security or malicious security. And these are just estimates, but I think it's impressive the amount of progress that we've made so far. We've gone from secure computation in the '80s being just feasibility results to, in the last decade, it's actually practical for some domains. Maybe not practical yet for genome comparisons, but we've come a long way. And so I just want to talk about the state-of-the-art -- yeah, Josh. >>: I was thinking, semi-honest [inaudible] semi-honest now. >> Mike Rosulek: I thought that, too, and I was hoping no one would notice, because this is not -- this is just something I shamelessly stole. 
And in fact, I couldn't find the one that he used when he was here a couple months ago for the MPC workshop. The graph, I think it falls off the end. It falls down to zero at some point because circuit garbling is so cheap now and OT extension is so cheap, and so semi-honest computation is like basically free. I think there are other -- probably this is leveling off due to just some economic forces and not due to some limitations of technology. That would be my guess. But yeah. But that's -- what a great win for cryptography. It's cheaper than doing it out in the clear. So come to us. Do things within a secure computation. It's cheaper than without. Yeah. I don't know how seriously we're meant to take this graph, but qualitatively it's -- we've come a long way. The state-of-the-art in 2PC protocols, for general purpose 2PC protocols -- the papers in the area will typically start out like this. So if you want to compute -- if you want to do a secure computation of some F, the first thing you do is express F as a boolean circuit, and then you'll do something with that circuit. That's kind of where we are now. And for certain things like AES, that's not so bad, because AES, in any case, is designed to have, you know, nice-looking circuits, because people want to implement it in hardware. This is the AES S-box. I don't remember exactly how many gates it is, but it's small enough to fit on a slide. This is just one S-box. You need 208 of these to do AES 256. So it's not a terribly large circuit. It's certainly not millions of gates. Like tens of thousands of gates maybe. But for other things like -- >>: [inaudible] 6,000? >> Mike Rosulek: Yeah, it's -- >>: Like the whole AES? >> Mike Rosulek: Small. Yeah, it's -- yeah, so actually it's 208 of these plus the key schedule, so it's between 10,000 and 100,000. Yeah. I don't know the number. 
If you think of a more realistic problem, like the stable marriage problem, it's very iterative, memory intensive. If you wanted to write that as a Boolean circuit, you would not have a good day at the office. So most of the computations that we care about in practice are not really well suited to the circuit model. Like we don't design algorithms for circuits. Another example is something as simple as binary search, right? The binary search is -- well, when you write something as a circuit, the circuit has to be as large as the input, right? So you don't get any sublinear time algorithms, and binary search is the classical sublinear time algorithm, right? And that's because for security in general, the secure computation has to touch every bit of the input, right? If you run a protocol and you notice that you never touch this input, then that could potentially leak information. So we don't have sublinear time 2PC protocols for that reason. So you might wonder why do we use these circuits anyway? Well, there's a reason that we use circuits. It's because we have this garbled circuit technique that goes all the way back to the beginning of MPC. So probably most of the people in the room know this, so I'll go over it pretty fast. The idea is to pick two labels for each wire, just random strings, cryptographic keys, if you like. Randomly assign one of those labels to be true and one to be false, as I've done here. And then think of the truth table of the gate -- say here is an AND gate and here is the truth table of that gate -- and I'm going to replace trues and falses with the corresponding labels, and then I'm going to use the input wire labels as keys to encrypt the output wire label. So these will become ciphertexts, right? So I encrypt, using A1 and B0 as keys, the label C0. That's one entry in the encrypted truth table. That's the basic idea behind garbled circuits. 
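To make the garbling step concrete, here is a minimal Python sketch of garbling a single AND gate. This is a toy version: SHA-256 stands in for the encryption scheme, the function names are mine, and the row permutation (e.g. point-and-permute) that a real garbler would apply is omitted, so the row index leaks the inputs here.

```python
import hashlib, os

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and_gate():
    """Garble one AND gate. Index 0 = false label, 1 = true label."""
    A = [os.urandom(16) for _ in range(2)]   # labels for input wire a
    B = [os.urandom(16) for _ in range(2)]   # labels for input wire b
    C = [os.urandom(16) for _ in range(2)]   # labels for output wire c
    table = []
    for a in (0, 1):
        for b in (0, 1):
            pad = hashlib.sha256(A[a] + B[b]).digest()   # 32-byte one-time pad
            # Encrypt (output label || 16 zero bytes); the zero padding lets
            # the evaluator recognize which row decrypted correctly.
            table.append(xor(pad, C[a & b] + bytes(16)))
    # A real scheme would also permute the rows so their order leaks nothing.
    return A, B, C, table

def evaluate(table, label_a, label_b):
    """Given exactly one label per input wire, recover one output label."""
    pad = hashlib.sha256(label_a + label_b).digest()
    for row in table:
        plain = xor(pad, row)
        if plain.endswith(bytes(16)):        # padding check succeeded
            return plain[:16]
    raise ValueError("no row decrypted cleanly")
```

Holding one label per input wire, the evaluator can open exactly one row and learns exactly one output label, and nothing about whether any label means true or false.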
The reason this works is because, first of all, the association between true and false is random, so the wire labels, themselves, don't leak whether -- I can't look at a wire label and just decide that it's the true wire label or the false wire label. And also, if I give you just one wire label for each wire, then you can only open exactly one of the ciphertexts and you can only learn exactly one of the labels on the output. So kind of by induction you only learn one label on each wire, and at the end we can reveal the true/false association. So for our purposes, it's important that if I look at -- if I just know one of the wire labels, I don't know whether it's the true or the false one. I have no idea what just happened. Okay. I'm back. I hope that doesn't keep happening. If I just have one wire label, I can't tell whether it was the true or false one, and I can't guess the other wire label. Those are the two properties that will be important. I'm foreshadowing right now, so keep that in mind. So that's the garbled circuit technique. We'll keep that, but we want to move beyond circuits as the basis for 2PC protocols because of the inherent limitations of circuits. So let's talk about a different model called the random access machine model, and it really is a realistic model of modern computer architecture. So you imagine there's a CPU with some small internal state -- think of those as the registers, the small constant number of registers -- a very large memory, and in between the CPU and the memory there's a memory bus. And what happens along the memory bus? Well, the CPU will ask to read some data and the memory says okay, here it is. The CPU might ask to write some data and the memory says yeah, okay, I'll do that. This is the basis of a model of computation called the RAM model. Usually in cryptography we care about something called the oblivious RAM. 
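As a toy illustration of this model, here is binary search written as a RAM program, with the bus traffic recorded explicitly (the function name is mine). Note two things: the trace has only logarithmically many reads, which is exactly the sublinearity circuits can't give you, and the addresses themselves depend on the secret target, which is what oblivious RAM will need to hide.

```python
def ram_binary_search(memory, target):
    """One tiny RAM program: each step the 'CPU' emits a read request,
    the 'memory' answers over the bus, and we record the bus traffic."""
    trace = []                           # (op, address) pairs seen on the bus
    lo, hi = 0, len(memory) - 1
    found = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        trace.append(("read", mid))      # CPU -> memory: read request
        val = memory[mid]                # memory -> CPU: the data
        if val == target:
            found = mid
            break
        elif val < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return found, trace
```

On a sorted array of 1,000 entries, the trace never exceeds about 10 reads, but which 10 addresses appear depends on the target.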
So just looking ahead, we'll be trying to simulate the execution of a RAM program, and these messages along the memory bus will be kind of out in the clear. So we need a way to protect those. There's a way to compile any RAM program into what's called an oblivious RAM, where these read/write locations and the data on the memory bus reveal nothing about the internal -- the secret input to the computation. Those constructions are very interesting. They're not really the focus of this talk, so in this talk I'll just assume that I have an oblivious RAM. I won't describe the constructions that get you that. There are some good practical constructions, starting from the work of Elaine Shi and others, that are actually relatively practical compared to the earliest constructions, and they only blow things up by a polylog factor in the size of the memory. Another thing that I'll gloss over is that, to do oblivious RAM, you set up some data structures in memory with some secret keys in the CPU state. To set up those data structures requires an initialization phase where you have to touch all the bits of memory. That's kind of inherent to ORAM, and I won't talk too much about that. For most of this work, we'll imagine there's like a preprocessing phase where you do that. So if you want to think of where RAM programs can really do much better than circuits, it's if you have repeated queries on a big database. So you take a long time, you set up the ORAM structures in the database, and then you can repeatedly use that large memory and run even sublinear time queries on it. And also looking ahead, we will use other -- so along this memory bus, you have whether it's a read or a write, you have memory addresses, and you have data going back and forth. We're going to use some other mechanism to protect the data. We only need the ORAM construction, itself, to protect the read/write and the address. I'll call that metadata obliviousness. 
So we'll use a different mechanism to protect the raw data, itself. Okay. So I'm going to assume that we have one of these ORAM constructions that computes the function that we want, and we want to use this as the basis of the 2PC protocol. So in the semi-honest model, this has been done. This was work by Gordon, et al. This appeared at CCS in 2012. This is kind of our starting point. So they showed how to emulate a RAM program in a 2PC protocol with semi-honest security. So I'll show you this and then the rest of the talk will be about how to achieve malicious security while doing this. Okay. So again, I'll totally gloss over the ORAM initialization phase. Somehow Bob gets -- so ORAM initialization initializes the memory of the RAM program and also initializes the state. The state has some, like, secret key that unlocks, you know, decodes the data structures in the ORAM memory. So we'll start out with some initialized memory and internal state of the CPU, and we'll secret share it between the two parties. And we'll let Bob, in their protocol, Bob is the one who, like, takes care of the memory. He's, like, he's the keeper of the big memory. But they'll both jointly keep the internal state. Then they're going to use a circuit 2PC to repeatedly evaluate the CPU circuit, okay? And it's a slightly augmented CPU circuit, so they both put in their shares of the state, the shares get reassembled, they go into the CPU. If there's data to be read by the CPU in the step, Bob feeds that in. He's the keeper of the memory. The CPU executes. It outputs a new state and it gets secret shared among the two parties, and let's say Bob learns the memory access for the next step. This is the message that goes across the memory bus, like read or write. So the large box in white is something that they execute using like a garbled circuit 2PC. Okay? And if the CPU is the CPU circuit for an ORAM, then it will be safe to have Bob handle all these memory accesses, right? 
Because the memory accesses don't leak anything about the inputs of the computation. Okay? So just as an example, if the CPU says that it wants to read something, then in the next time step Bob will fetch that thing from memory and give it as his input to the CPU. We're in the semi-honest model, so Bob promises to do this. There are no problems. This is a full-fledged ORAM, so the data in the memory has kind of been encrypted by the ORAM, so it's safe for Bob to know the raw contents of memory. If the CPU wants to write, Bob will just do the writing. He's the keeper of the memory. If the previous instruction was a write, then the data coming in is not used in the next step. So they just repeatedly execute this circuit over and over and over again. It spits out commands to talk to memory. So in the semi-honest model, this is nice. For the rest of the talk I want to just discuss how to bring this from the semi-honest model to the malicious model, and I guess the way I approach it is: in the end, the techniques that we use are not, you know, that -- they're not like difficult techniques. We'll just find the best techniques from what we already know about circuit-based MPC. But there's a lot of ways to do it wrong, so I want to focus on, like, there are easy ways to do it wrong, and there are easy ways to do it right but that have really high overhead. So I just want to find the right selection of techniques from circuit-based 2PC that are most appropriate in this setting. So in the malicious model, we're still going to evaluate the CPU circuit over and over again. We have to do that inside a malicious 2PC now, obviously. That's kind of the main thing. So that will protect what's inside this white box, if the white box represents malicious secure 2PC, but that doesn't protect what goes into the box. The parties are still malicious; they might send different stuff into the box, right? 
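The semi-honest loop described above can be sketched as follows. This is a toy version, not the Gordon et al. protocol itself: plain XOR secret sharing stands in for the garbled-circuit 2PC box, the toy CPU just sums the memory words, and all names are mine.

```python
import random

def cpu_step(state, read_data, n):
    """Toy CPU circuit: sums the n memory words, then halts with the total."""
    pc, acc = state
    if pc > 0:
        acc += read_data                       # data requested last step arrives now
    if pc < n:
        return (pc + 1, acc), ("read", pc)     # ask Bob for the next word
    return (pc + 1, acc), ("halt", acc)

def share(x):
    """XOR secret sharing: neither share alone reveals x."""
    r = random.getrandbits(32)
    return r, r ^ x

def semi_honest_ram_2pc(memory):
    n = len(memory)
    pc_a, pc_b = share(0)                      # Alice's and Bob's state shares
    acc_a, acc_b = share(0)
    read_data = 0
    while True:
        # ---- inside the "white box": shares are recombined, CPU runs ----
        state = (pc_a ^ pc_b, acc_a ^ acc_b)
        state, (op, arg) = cpu_step(state, read_data, n)
        # ---- outputs: fresh shares of state; Bob sees the memory access ----
        pc_a, pc_b = share(state[0])
        acc_a, acc_b = share(state[1])
        if op == "halt":
            return arg
        read_data = memory[arg]                # Bob, keeper of the memory, serves it
```

The point of the structure is that only the access (`op`, `arg`) ever leaves the box in the clear, which is safe exactly when the CPU is an ORAM CPU.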
So if we're secret sharing the state, then a party might, you know, lie about what his share of the state is. Bob might lie about what's in memory when the CPU is reading from memory. Okay. So this is kind of the major thing we have to deal with. One way to fix it is to use a MAC. So we can say that the CPU circuit outputs some stuff and it all gets MACed with some secret keys, and then when you put stuff back into the CPU, you verify that it -- you verify the MAC to make sure that it hasn't, you know, it hasn't been changed from last time. You can definitely do this, and this will lead to a malicious secure protocol. But I don't quite like this, because now I'm doing a malicious -- you know, this white box got bigger on the page, and it got bigger as a garbled circuit also, because now I have extra circuitry for all these MACs and all these verifications, right? If the CPU is kind of like noncryptographic, I don't want to, like, add extra crypto on top of it, right? It's already expensive enough to do a malicious secure 2PC of a crypto algorithm. Even a one-time MAC, which is not really a crypto algorithm -- you should think of a one-time MAC as like a multiplication -- is still not a small circuit, by any means. And in any case, the solution that we eventually come up with has us adding nothing to the circuit. So adding anything to the circuit is kind of not fun, yeah. >>: [inaudible] crypto through the ORAM [inaudible] encrypting data [inaudible]. >> Mike Rosulek: So if it's a full ORAM, then it does encrypt, like, the contents of memory. The way an ORAM protects the addresses, though, is purely combinatorial, and so we're actually going to -- we actually require an ORAM that only protects the addresses and doesn't need to even protect the contents of memory. So actually you're not already having -- you may not already have an encryption circuit inside of the ORAM. >>: Is that true for the previous [inaudible]? 
>> Mike Rosulek: The previous one, they had to do some extra tricks to get the -- they're in a semi-honest model, so they could factor out some of the encryption, like let Alice be in charge of the encryption somehow. So with a specific ORAM construction, they were able to move the encryption circuit out of the 2PC, but as their general paradigm they would have the encryption circuit inside of the CPU. >>: [inaudible]. You handle the instruction stuff, [inaudible] encryptions must be a [inaudible] is that also outside of the -- >> Mike Rosulek: So the reshuffling, I mean, so we're not -- I guess in our work, we're not really looking inside the ORAM, but that's all combinatorial. I mean, you're not doing a shuffle based on, like, a pseudorandom function, if you imagine like Path ORAM or tree ORAM. >>: [inaudible]. >> Mike Rosulek: Yeah, if you're using one of the other ORAMs, it does some serious shuffling. >>: The original ones. >> Mike Rosulek: Yeah. >>: Okay. >> Mike Rosulek: Yeah, so if you're thinking of the path ORAMs and you take away the encryption circuit, then it's all just like assign something to a random leaf and just traverse buckets. It's all very combinatorial. There's no crypto circuitry inside that. I guess that's what I have in my mind for the kinds of ORAMs you might use. But certainly some ORAMs would have bad circuits, for sure. Yeah. >>: The things you're writing to ORAM, Bob can cut them, or they're protected in some other way? >> Mike Rosulek: Yeah. So in our protocol we'll protect them in a different way. It's not clear what that way is yet, but I guess the point of this slide is we don't want to add extra crypto circuitry to this malicious secure 2PC. And also, we really don't want to add lots of extra inputs, all right? So we'd be inputting shares of the secret state and we'd be inputting MACs. And in a malicious secure 2PC, there's a high cost to pay for input consistency checks, if you're familiar with those techniques. 
So inputs to a 2PC are expensive. You have to have OTs. You have to have these various checks to make sure that the same input is used in all the cut-and-choose phases. So, you know, raw garbled circuit material is expensive, and inputs are expensive in the malicious world. We want to minimize those as much as possible, so that's kind of the theme. So what is this mysterious way that I'm protecting the data of the ORAM? So if you look at the CPU circuit, it's outputting some internal state. It's outputting some stuff that's being written to memory. All of that stuff needs to be kept private. So no one should know, like, the actual contents. And you need to prevent the bad guys from tampering with it, from saying that it was something that it wasn't. I'm already doing the CPU. I'm already evaluating the CPU inside of a 2PC protocol. It's already a garbled circuit. The garbled circuit is outputting a garbled representation of the state and memory in the form of wire labels. Wire labels have these two properties already, right? So if I have a garbled circuit outputting wire labels for these wires, those wire labels hide the zeros and ones of those values. And if you evaluate that circuit and you just get, like, the false wire label for this wire, you can't then guess what the true wire label was for that wire, right? So there's this authenticity property. You can't guess the other wire label given just one of the wire labels. So that's kind of the theme of -- I'll have more to say about this, but this is kind of our theme. We're already using a garbled circuit mechanism. Garbled circuits already have privacy and authenticity of their garbled values. Those are the properties we need to protect the memory and state of an ORAM as it's executing. So that's our theme. We're going to reuse these garbled values, these wire labels. So that's what we do. We want to get malicious security. 
We don't want to put any additional overhead inside the garbled circuits because they're expensive to begin with. We want to do as well as we can -- basically steal all the best ideas from circuit-based 2PC and steal all the efficiency parameters, right? If you can do security with an overhead of 40 in malicious secure circuits, we want to do overhead 40 for RAM as well. So I'm going to describe two protocols, if I have time, with kind of different tradeoffs of their parameters. And when I talk about efficiency, I really care about how much more expensive is it than semi-honest. So to get 2 to the minus S security, the state-of-the-art is to do -- is to make it S times more expensive, and we match that here, as well, by stealing the appropriate ideas from the circuit-based 2PC world. If we allow an online/offline setup where there's an offline preprocessing phase, which is actually very natural in the RAM setting, then we can even do better than S. We can do 2S over log T, where T is the total running time. So think of T being very large, so log T is greater than two. In fact, 2S over log T can be something like 7 concretely. Okay. So malicious security that only costs like seven times more than semi-honest. >>: [inaudible] preprocessing depend on [inaudible]. >> Mike Rosulek: Yeah. So in general, yeah. So you need to know the function that you're evaluating but not the inputs. And actually, in the RAM setting it becomes a little bit more flexible, because really, you only need to know what RAM construction you're using in the offline phase. In the online phase, you can decide -- so in the offline phase you decide, oh, I'm using this Path ORAM construction. Then in the online phase I decide, okay, now we're going to evaluate F on these inputs, and I could reuse the old preprocessing, because most of the steps are ORAM construction steps. Okay. So hopefully some of this makes sense as we go along. So there's still more to say about -- yeah, go ahead. 
>>: [inaudible] simple that I must be misunderstanding. So doing this RAM model, you only want to be as good as circuit-based 2PC? Don't you want to do better than that? >> Mike Rosulek: So the thing that I am measuring is how much more expensive -- so in the RAM model, how much more expensive is malicious than semi-honest. >>: Okay. >> Mike Rosulek: And that's this S or 2S over log T. So that is the -- that ratio is what we're matching. But yeah, so if you have a RAM program for some F that's much faster than a circuit for F, then -- so this is S times semi-honest RAM computation. So presumably, the RAM model could be better than the circuit. So just comparing the time of a RAM 2PC to a circuit 2PC, it could be better, depending on the RAM and the circuit. But this -- so this is a protocol for, like, general purpose -- like, any RAM program. So it makes more sense for us to just talk about this -- the overhead of malicious to semi-honest. So let me say a little bit more about what I mean when I say, like, reusing wire labels, reusing garbled values. So here is this CPU circuit. It has an input for the state, an input for the data that's being read from memory. It has outputs for the state, for the memory access, like the read versus write and address. It has an output for the data that's being written to memory, if it happens to be a write instruction. Okay? So we're going to be evaluating the circuit over and over, many times, for each time step, and so when I say that we're reusing wire labels, I really mean that, you know, for these wires going out and for these wires coming in, I choose the same wire labels, right? When I garble a circuit, I get to choose the true label and the false label for each wire. I'll choose them to actually coincide right here, right? So whatever I get out of here, I can directly use as an input to this garbled circuit without -- like, none of us need to talk or do anything, right? 
Whatever garbled output you get from the T minus 1 circuit, you can feed right into the T circuit. So that's what I mean by these orange arrows. So conceptually, those wires are like the same wire, because they have the same wire labels in the garbled circuit. So for the state, it's pretty straightforward, right? The state from time T goes in as the input to time T plus 1. With the memory it's a little bit less straightforward. So if I write to some address at this time and then later I want to read from that same address, I need to connect this data output into this data input, right? So whatever was written here needs to be read down here. I need to connect those wires. And by connect, I mean they share common wire labels in the garbled circuit. So this is what I mean by reusing wire labels. Since we have a small audience, I have a quiz. Everyone's been paying attention very nicely. It kind of looks like I just took the RAM computation and unrolled it into a giant circuit, right? But I just claimed that RAMs can do things that circuits can't do, right? A RAM can be sublinear in its time. That would give me a sublinear size circuit. So, like, where is the paradox? There's a paradox that needs to be resolved, right? It can't be true that I just unroll a RAM computation into a circuit and get something that doesn't touch every bit of the input, right? >>: You unroll it, but you still have memory access [inaudible] right? >> Mike Rosulek: Uh-huh. >>: Whereas if it was just a big circuit, all the inputs would be coming in at the beginning. >> Mike Rosulek: Yeah. There's -- it's this issue of the memory addresses. When I was unrolling this circuit, I was doing it in a way that depends on these addresses, and these addresses are determined at runtime. So there's this idea that these outputs come as the RAM program is running. Only then will I know how to connect the parts of this circuit, right? So good. 
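The garbler-side bookkeeping for this label reuse can be sketched as follows. This is an illustrative skeleton, not the actual protocol: each 16-byte value stands in for the whole pair of labels on a bundle of wires, and the class and field names are mine.

```python
import os

class LabelManager:
    """Garbler-side bookkeeping for reusing wire labels across CPU steps."""
    def __init__(self):
        self.last_write = {}        # address -> labels on the last write's data wires
        self.prev_state_out = None  # labels on the previous step's state-output wires

    def fresh(self):
        return os.urandom(16)       # stand-in for a freshly chosen label pair

    def garble_step(self, prev_access):
        # Input wires of this step's circuit:
        state_in = self.prev_state_out if self.prev_state_out is not None else self.fresh()
        if prev_access is not None and prev_access[0] == "read":
            # Reuse the labels from the last write to that address --
            # no MACs, no input-consistency checks needed.
            data_in = self.last_write[prev_access[1]]
        else:
            data_in = self.fresh()  # data input unused this step
        # Output wires get fresh labels:
        labels = {"state_in": state_in, "data_in": data_in,
                  "state_out": self.fresh(), "data_out": self.fresh()}
        self.prev_state_out = labels["state_out"]
        return labels

    def note_access(self, access, labels):
        """Record this step's (cleartext, ORAM-oblivious) memory access."""
        if access[0] == "write":
            self.last_write[access[1]] = labels["data_out"]
```

Because the connections depend on `prev_access`, which is only known at runtime, the garbler really is building the "unrolled circuit" on the fly, which is exactly the resolution of the paradox above.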
We resolved the paradox there. I think it's worth thinking about that. You know, you get to a certain point writing the proof and you go, geez, what have I done? I've proved something impossible. So these connections are easy to predict. They don't depend on any runtime behavior. The state just always gets passed forward, but these, the lines that connect up the memory accesses are determined at runtime. So you can't just unroll the RAM computation before you actually run the RAM. So this is kind of -- I think this is an interesting way of thinking about RAM computation in general. It's like you're sort of building a circuit on the fly somehow. >>: Some programs [inaudible] static and react to this [inaudible]. They're not determined at runtime [inaudible]. >> Mike Rosulek: Yeah. The RAM programs with polylog overhead, though, so it's not just that these depend on runtime, because maybe they could be predictable, but they depend on the preprocessing phase, at least, right? The preprocessing phase puts some secret stuff in the state that tells you what these connections will be and there's a distribution over these addresses that's oblivious, but the actual addresses, themselves, you know, maybe they only depend on this secret key that's in the state or something like that. There's also this idea that you could have a RAM that's like giving outputs. So aside from these outputs, it could just be giving outputs as it goes and taking inputs as it goes. It's not really kind of a circuity thing to do, like giving outputs and taking inputs as it goes. You think of a circuit as like a one-shot deal. Okay. So this is -- I want to do secure evaluation of a RAM program. I really want to use this trick of reusing wire labels, right? Because there's nothing to do. Like I don't have to do any input consistency checks. Like you already have the thing you need that you're going to put into the next CPU. This is awesome. 
So I need to find a way to get everything else to work in a way that doesn't clobber this idea. So that's going to be the challenge. So the first -- yeah. >>: Can you explain how you connect, like, the [inaudible] this curvy line. >> Mike Rosulek: So, yeah. Maybe if I haven't explained it well on this slide, maybe I need to do it better here. So imagine we just executed this, and I saw that the output was a read to this particular address. And I look back in time and say, when was the last time I wrote to that address? Okay. It was time T minus 1. So now when I garble this circuit, I choose the wire labels for these guys to be coinciding here, and I choose the wire labels for these input wires to coincide with these wires. >>: So the address is in the clear? >> Mike Rosulek: Yeah. The address is in the clear. If I didn't know the address, I couldn't hook these up. >>: Yeah, but those addresses may be data dependent. >> Mike Rosulek: They're data dependent, but they're oblivious, right? They don't -- information theoretically they depend on the data, but they don't leak information about the data. >>: So [inaudible]. >> Mike Rosulek: Yeah, because it's an ORAM. So input dependent is maybe not the best term, because it's also dependent on the initialization phase, and maybe that's a better way to think about it. >>: And so do you need to keep the whole history of reads and writes as the program executes? >> Mike Rosulek: You need to keep -- for each address, you need to know when was the last time that you wrote to that address. >>: Oh, okay. >> Mike Rosulek: One person has to keep track of everything. So say that Bob is the keeper of the memory. He has to keep these wire labels around, because that's his representation of what's in memory. So he has to keep the entire memory, and for every bit of memory, there's a wire label now. That's some overhead for sure. 
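One way to sketch this bookkeeping on the garbler's side: if labels are derived from a short seed with a PRF (HMAC-SHA256 stands in for it here; the seed and names are mine), the garbler need only store when each address was last written and can re-derive the matching labels on demand.

```python
import hmac, hashlib

SEED = b"garbler-short-secret"   # hypothetical: the garbler's only stored secret

def derive_label(addr, write_time, bit):
    """Re-derive the wire label carrying truth value `bit` for address `addr`,
    as garbled at time step `write_time`, from the seed alone."""
    msg = f"{addr}|{write_time}|{bit}".encode()
    return hmac.new(SEED, msg, hashlib.sha256).digest()[:16]

last_write_time = {}             # the only per-address state kept around

def garble_write(addr, t):
    last_write_time[addr] = t
    return [derive_label(addr, t, b) for b in (0, 1)]

def garble_read(addr):
    # Look up when the address was last written and re-derive those labels.
    t = last_write_time[addr]
    return [derive_label(addr, t, b) for b in (0, 1)]
```

The evaluator's side is unavoidably heavier: he holds actual labels, not the seed, so he must store one label per bit of memory.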
And the person who generated the garbled circuits has to know when was the last time I wrote to this address, because I have to remember what the wire labels were when I -- now that it's time for me to garble this circuit, I have to fetch those old wire labels and make sure they're the ones I'm using here. Yeah. So people have to keep track of some of the history. Maybe not all of it, but some significant history. Actually, the person who's generating the garbled circuits can derive these wire labels using a pseudorandom function. So he doesn't have to remember the wire labels. He just has to remember the last time they were accessed. There are some optimizations you can do where he can remember a little bit less stuff, but still, one of the parties has to remember, for every address in memory, the corresponding wire label that he has. Times some overhead factor. This is malicious security. It's not cheap. Okay. So this is the reason why the CPU doesn't need to encrypt the data. The garbled circuit is already protecting the data from someone looking at it. So we do at least have a chance of the garbled circuit being, like, noncryptographic in nature. That's always nice. Okay. So how do we, in the circuit world, how do we achieve malicious security? Well, we use a technique called cut-and-choose. So I'll just refresh everyone's memory about it. The idea is, let's just say I'm evaluating one garbled circuit, but the parties involved might be malicious. In particular, the person who's generating the garbled circuit might be lying about it and generating some bad ones. So I'll make him generate lots of garbled circuits. I'll pick some random subset of them to open and check. So he'll show me all the secrets in these garbled circuits. If any of them are bad, I'll give up because I caught him cheating. So let's assume that all of them checked out. Then I'll take the remaining garbled circuits and I'll evaluate them somehow. 
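The statistical guarantee behind this open-half-then-take-majority cut-and-choose can be estimated in a simplified model: count only how the cheater's bad circuits can dodge a uniformly random check set (real analyses handle more attacks, like the selective-failure issue discussed in a moment; the function names are mine).

```python
from math import comb

def evade_prob(n, bad):
    """Probability that none of `bad` corrupted circuits (out of n total)
    lands in a uniformly random check set of size n // 2."""
    return comb(n - bad, n // 2) / comb(n, n // 2)

def best_cheating_prob(n):
    """To corrupt the majority output, the cheater needs more than half of
    the n // 2 evaluated circuits to be bad. More bad circuits are easier
    to catch, so n // 4 + 1 is the cheater's best choice."""
    return evade_prob(n, n // 4 + 1)
```

For example, with 40 circuits the cheater's best strategy already succeeds with probability well below 2 to the minus 13, and the exponent grows linearly in the number of circuits.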
There are more things that can go wrong, but this is the general idea. And I have a statistical guarantee that -- let's say I open half of the circuits, and all of them were good. Then with very, very high probability, at least the majority of the unopened circuits were good. They were correctly generated. So I can take the majority output, and with very high probability it will represent a correct computation. >>: Do you get any difference in the output when you know it's been checked? >> Mike Rosulek: So the beauty of this -- the beauty of malicious security is sometimes you know you're being cheated, but you can't say -- you can't reveal the fact that you know you're being cheated. He can cheat in a way such that you'll only notice if your input was a particular thing. So if you say, oh, I see -- if you tell me, Mike, I see that you're cheating me, then I'll say, aha, you could have only known that if you had used this particular input. Now I know your input. >>: Okay. So I guess the other way that this could be done in theory, at least, is all the unselected circuits are proven to be equivalent to each other. >> Mike Rosulek: Yes. >>: Which is less efficient, but -- >> Mike Rosulek: Yeah. Yeah. So this is kind of the standard paradigm for what you might even consider implementing. But, yeah, there's lots of different ways to prove that you have -- the whole purpose is to prove that some garbled circuit is correct, basically. This is one way to do it. And there are lots of subtleties that can go wrong, like, oh, I see that you were cheating me, but you can't say it. So that's why you have to take the majority. You can't just give up and -- yeah. So I'm glossing over most of the details just so conceptually we can see that this is the cut-and-choose approach. So now let's take this to the RAM setting, okay. So now this garbled circuit is the RAM CPU. We're going to evaluate it many times, over and over again, okay? 
So let's say I did the cut-and-choose for time step T minus 1, and now it's time for me to do a cut-and-choose for time step T. I claim this is going to be a problem -- it's going to conflict with my desire to reuse these garbled wire labels. So let's look at one of the circuits down below. Suppose this circuit ends up being a checked circuit. That means I'm going to open up the circuit and see all of its secrets. So it had better not share any wire labels in common with these evaluation circuits, right? The protocol is really counting on those garbled values being protected, and this would reveal their secrets. On the other hand, if there's a circuit that's going to be used in evaluation, I really want it to share wire labels with some previous one, right? That's the whole trick that I want to use. But at the time that I'm generating these garbled circuits, I don't know whether they're going to be checked or not. That's the whole point of cut-and-choose: you generate them all with the possibility that any of them could be opened, right? So this kind of leads to a contradiction, right? Does that make sense? So the point of this is that a naive application of cut-and-choose is incompatible with this idea of reusing wire labels. We have to be very careful about what we mean by cut-and-choose in this setting. So the thing that saves the day is this idea of blind cut-and-choose, which Sonny knows all about. The idea is, it's basically one cut-and-choose to rule them all. Instead of a cut-and-choose at every time step, we do one cut-and-choose at the beginning. So I want to call the different columns here threads of computation, and each thread is going to be either a check thread or an eval thread. The receiver is going to secretly determine whether each thread is check or eval, and the guy who's generating the garbled circuits won't know whether any given thread is a check thread or an eval thread.
So we generate garbled circuits in each time step, and within each thread we reuse the wire labels. So I'm only drawing, like, one arrow, but, you know, these funky wire labels are hooked up from all over the place -- but only within a single thread. Okay? Within the check threads, the receiver checks that the garbled circuits are good and also checks that the connections are good, right? List off everything that the sender could do wrong; the receiver checks all of that in these threads. So somehow it's arranged so that in a check thread, the receiver will get enough information to check all of these things, including the connections between the circuits. He'll abort if any of those turn out to be wrong, right? So the property of cut-and-choose says that if -- let's say half of the threads are check threads -- if all of them are correct in their garbling and connections, then the majority of the other threads will be correct in all respects as well. So using an oblivious transfer, you can set it up so that the sender doesn't know which is a check thread and which is an eval thread. They use OT so that if this was a check thread, the receiver will pick up the checking information. If it was an eval thread, the receiver only picks up enough to evaluate on one input, which is the sender's input. Right? So there -- yeah, we sort of compartmentalized this sharing of wire labels. Wire labels are only shared check circuit to check circuit, or eval circuit to eval circuit. You don't have things crossing over, which would sort of mess things up. Probably need to move on a little faster from "oh, my goodness, what's going on?" So once you identify this as the cut-and-choose technique to use, that's pretty much most of the work. You have to solve all the other problems that are inherent in cut-and-choose, but those are kind of standard.
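The thread structure just described can be sketched as a toy in Python. Everything here is illustrative: the OT is abstracted to a black-box function, and strings stand in for the actual checking information and evaluation labels.

```python
import random

def mark_threads(num_threads, num_check, seed=None):
    """Receiver's secret choice of which threads are check threads.
    In the real protocol this choice stays hidden from the sender via OT."""
    rng = random.Random(seed)
    check = set(rng.sample(range(num_threads), num_check))
    return ["check" if t in check else "eval" for t in range(num_threads)]

def per_thread_ot(is_check, check_info, eval_labels):
    """Stand-in for a 1-out-of-2 OT: the receiver obtains exactly one of
    the two messages, and the sender learns nothing about which."""
    return check_info if is_check else eval_labels

marks = mark_threads(num_threads=6, num_check=3, seed=1)
received = [per_thread_ot(m == "check", "open-everything", "one-input-labels")
            for m in marks]
```

Because wire labels are reused only within a single thread, opening a check thread end-to-end never reveals a label that an eval thread depends on.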
If you want security 2 to the minus S, you need about 3 times S threads for the statistical argument to work. But using new techniques that Yehuda Lindell presented at Crypto, you only need S threads. He showed that in the setting of circuits; we show how to do it in the RAM setting as well. It's this trick where, at the end, if someone was cheating, then you recover their input and you just compute the function all by yourself. We show how to do that at the end with some small bootstrapping computation whose size is independent of the RAM running time. So it amortizes this trick away very, very nicely. Okay. What else? So there's this problem that I need to remember the wire labels of the previous circuits, like we saw, and I can't -- this is a new PDF viewer that rewinds differently than I'm expecting. I can't garble these circuits until I've garbled those circuits, because I don't know how I'm going to hook these up yet. So I can't preprocess all these garbled circuits. It really has to go sequentially, right? I have to know where I'm going to hook up this other arrow that I didn't draw before I start garbling that circuit.
>>: So you mean you can't garble [inaudible] until you've actually run the previous?
>> Mike Rosulek: Yeah. So I need -- oh, boy. I need to get the outputs of this guy, which say what address I'm reading from, before I garble this one, because I need to know where this other wire comes from. So we can't preprocess the garbled circuits. Why would we want to preprocess the garbled circuits? Because two papers at Crypto last week showed that when you can batch-preprocess a lot of garbled circuits, you get a good savings. So imagine we're in the circuit world again. We want to evaluate the same circuit lots and lots of times, right? So we're going to do a cut-and-choose.
Lots of circuits are going to be generated and thrown away, but we know that we're going to evaluate the same circuit over and over again. Maybe there's a savings to be had. So their idea is: okay, you're going to evaluate a circuit N times, so generate a ton of garbled circuits for that circuit, check some of them all at once, and then every time you want to evaluate the function, just pick a random subset of the remaining ones and evaluate those. We'll call that a bucket, right? So I want to evaluate F the first time. I'll pick these five circuits. That's my bucket. The next time I want to run F, I have this pool of unused circuits. I'll pick a random subset of them and evaluate those. There's a bucket of five. And the cool thing is, this extra step of taking the unused circuits and randomly assigning them to buckets -- you know, the balls-and-bins problem is a little bit different, and it allows these buckets to be significantly smaller, right? So as before, if it's just one cut-and-choose, I really can't do much better than having, like, S circuits. But here I can do, like, S over log N circuits. So in practical terms, when N is in the thousands, the size of these buckets is like five or seven for security of 2 to the minus 40, instead of size-40 buckets. Right? So this is why we'd really love to be able to preprocess garbled circuits. It seems to be incompatible with our idea of reusing wire labels. I just want to talk about that a little bit. The other reason it makes sense to preprocess garbled circuits is that we are evaluating the same circuit over and over again -- it's the same RAM CPU circuit over and over again. We already have a preprocessing phase for the ORAM, so it's not crazy for me to propose that we do this huge cut-and-choose at the beginning of time as part of a preprocessing phase. I'm already doing preprocessing to set up the data structures in the ORAM.
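The pool-and-bucket step can be sketched as follows. A toy sketch in Python, with integer IDs standing in for the garbled circuits that survived the one-time check:

```python
import random

def draw_bucket(pool, bucket_size, rng):
    """Pick a random subset of unused circuits from the pool; those
    circuits are consumed and become the bucket for one evaluation of F."""
    bucket = rng.sample(sorted(pool), bucket_size)
    pool.difference_update(bucket)
    return bucket

rng = random.Random(0)
pool = set(range(100))              # circuits that survived the check phase
first = draw_bucket(pool, 5, rng)   # bucket for the first evaluation of F
second = draw_bucket(pool, 5, rng)  # bucket for the second evaluation of F
```

The security benefit comes precisely from the randomness of each draw: the adversary committed to which circuits are bad before knowing how they would be grouped.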
It would be really great if I could also make these buckets really small and only deal with a couple of garbled circuits at each time step. But again, this seems incompatible with our way of hooking up wire labels, because the connection of the circuits is determined at runtime. If only we had a way to connect wires on the fly. If only we had a way to generate a garbled circuit and then later hook up the wires -- that would be ideal. And fortunately, we do. It's called the LEGO approach, invented, I guess, by Nielsen and Orlandi in the LEGO paper. They had the approach of generating a bunch of individual garbled gates, doing a cut-and-choose on all the gates, and then sort of assembling them together into a circuit by connecting the gates after the fact. So we take that idea and extend it to garbling a bunch of circuits and then kind of connecting circuits up on the fly. So we call this the LEGO RAM approach. I tried to find a picture of, like, an actual ram, you know, the animal, made out of LEGOs, but this is all I could find. It's the battering ram, though. This is the LEGO RAM. I was thinking I may have had that set as a kid. I think it was a different one -- I had a lot of the castle LEGOs, in fact. Okay. So how does the LEGO approach work? Let me describe this idea of generating circuits and then later hooking them up together. So the main ingredient is this thing called an xor-homomorphic commitment. It starts out being a regular commitment scheme. So I commit to A, I commit to B, I commit to C. The stuff in the dotted boxes is a commitment that's sitting there waiting to be opened. So I can open A -- it gets revealed to you -- and I can also open the xor of two of these values, so that you only learn the xor of the two values. You don't learn the individual ones. Okay. So these commitments are homomorphic in that way. How does that help us here? So this is our trick for connecting wire labels after the fact.
So here is a single wire in a garbled circuit. It has two wire labels, A0 and A1, and they can either encode true and false this way or the opposite way. And I'm going to associate a bit, zero or one, depending on how they encode true and false, okay? So imagine I have this for each wire in the garbled circuit: these two wire labels and this parity bit. Okay? So imagine when I generate a garbled circuit, here is one wire in this garbled circuit, and I have committed to these three values in an xor-homomorphic commitment. And then later on I generate another garbled circuit and I commit to those values, and they were totally independent of each other. They have nothing to do with each other, and now later I want to connect these wires up. This is the process that they call soldering. So -- they were dealing with gates, really, like you were building an integrated circuit or something, because I'm sure modern circuit fabrication uses soldering irons. That's my understanding. So I want to connect these two wires. The first thing I do is open the xor of the parity bits, and that tells me whether the true/false mapping is different between these two guys. So, yes, indeed, for these two wires the true/false mapping was opposite. So I want to map A0 to B1. I don't know whether A0 is true or false, or whether B1 is true or false, but I know that they have the same logical value, because the xor of the parity bits was 1. So I'm going to open the xor of A0 and B1, and I'm going to open the xor of A1 and B0. And now later, if you're the one evaluating the circuit, let's suppose you got to this point and you got A0, which might be true or false -- you don't know. All you have to do is xor it with this value and you'll get B1, which is the corresponding wire label down here. Right? So these are like the glue that allows you to transfer wire labels from one wire to another.
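The soldering arithmetic can be sketched in a few lines of Python. A toy sketch only: in the real scheme the labels sit inside xor-homomorphic commitments and only the XORs below are ever opened, whereas here everything is in the clear just to show the bookkeeping.

```python
import secrets

def new_wire():
    """A garbled wire: two random 128-bit labels plus a parity bit.
    Label L_b encodes the truth value b XOR parity."""
    return (secrets.randbits(128), secrets.randbits(128), secrets.randbits(1))

def solder(wire_a, wire_b):
    """Sender's view: open the XORs needed to translate labels on wire A
    into the labels on wire B that carry the same logical value."""
    a0, a1, pa = wire_a
    b0, b1, pb = wire_b
    d = pa ^ pb                       # do the true/false mappings disagree?
    if d == 0:
        return d, (a0 ^ b0, a1 ^ b1)  # A0 <-> B0, A1 <-> B1
    return d, (a0 ^ b1, a1 ^ b0)      # A0 <-> B1, A1 <-> B0

def translate(label, index, soldering):
    """Evaluator's view: given wire A's label at position `index` (0 or 1),
    recover B's corresponding label without learning any truth values."""
    d, glue = soldering
    return label ^ glue[index]

wa, wb = new_wire(), new_wire()
s = solder(wa, wb)
```

The evaluator who holds one label of A learns exactly one label of B and nothing about the other, which is the property the protocol relies on.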
And they don't leak anything else, so giving this to you doesn't leak anything about the other wire label. You still have this property that you can't guess the other wire label. So this is the LEGO approach for soldering wires together. Okay. So let's just apply this to our setting of RAM-based 2PC. In the preprocessing phase we generate lots of garbled circuits for the CPU circuit, and for the input wires and output wires we generate these homomorphic commitments, right? We'll do a cut-and-choose on this huge number of garbled circuits, and the cut-and-choose checks the commitments also, right? So if you tell me to open the circuit, I'll open it, and I'll also open the commitments, and you can check that the commitments correspond to the secrets inside the garbled circuit. Okay? So with high probability, most of the rest of them are good, and they also correspond to their commitments, which is important. So now here is some time step. I want to evaluate the CPU at some time step. I pick a random bucket of CPU circuits from the pool and I solder their input wires together, right? So the first input bit of this one gets soldered to the first input bit of this one and this one and this one. The first output bit of all of them gets soldered together. We do this for all the input and output bits. And now we kind of have -- so this bucket has five circuits, so I kind of have five paths from here to here, right? And with high probability, a majority of these circuits are good. So I just take those five paths and take the majority, and that majority vote will reflect, like, a correct garbled circuit. Okay?
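The evaluator's side of a bucket can be sketched like this (Python; a toy sketch where plain functions stand in for the soldered garbled circuits):

```python
from collections import Counter

def evaluate_bucket(circuits, garbled_input):
    """Run every circuit in the bucket on the same soldered input and
    take the majority output; a minority of bad circuits gets outvoted."""
    outputs = [c(garbled_input) for c in circuits]
    return Counter(outputs).most_common(1)[0][0]

good = lambda x: x + 1          # stands in for a correctly garbled CPU step
bad = lambda x: x ^ 0xFFFF      # stands in for a corrupted circuit
result = evaluate_bucket([good, good, good, good, bad], 7)
```

The majority vote is why the cut-and-choose guarantee only needs *most* of each bucket to be good, not all of it.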
So soldering this bucket together gives you this mega-circuit that acts like a single garbled circuit after you take the majority at the end. And then you just solder in the incoming state data from the previous step, and solder in the incoming memory input from the appropriate step in the past. The soldering can be done on the fly. So when it's time to execute this, I'll know how I need to connect it up to the past. Yeah.
>>: A question as to why you have to sort of lump every bunch together. Why can't you just set up separate threads and then solder -- or I mean, solder together the gates in each thread but then put them all together --
>> Mike Rosulek: Yeah. So in this bucket, any of these could be bad. Imagine, like, one out of every five circuits is bad. That's fine. But if I take a bunch of buckets and one out of every five is bad, that means in every column there's probably one bad one.
>>: Right.
>> Mike Rosulek: So I don't want to evaluate things that way, right? I want to isolate the bad one in here. I know that the majority of these are good, so I'll take the majority of these guys. And I don't know where the bad ones might live in another bucket. Does that make sense? So that was kind of an issue that comes up in the other kind of cut-and-choose as well, right? You need, like, the majority of an entire thread to be good. You need to consider an entire execution sequence to be good to include it in the majority. Okay. So overall we get this protocol. In the offline phase we generate circuits and do this huge cut-and-choose. And in the online phase we solder things together using these homomorphic commitments. The offline phase is expensive, I'm not going to lie, but the online phase is just related to the size of these buckets, which is determined by this combinatorial argument, right? And the bucket sizes are S over log T, where S is the security parameter.
T is the number of times I'm doing this -- the running time of the RAM. I actually wrote a Perl script to figure out the exact probabilities. Some of these previous papers had bounds; I thought, I'm going to figure out the exact bounds. And it takes my desktop computer days and days to figure this out. But the numbers aren't so bad. Even for T equal to 5,000 -- like, a RAM program that talks to memory 5,000 times, which is very reasonable -- the bucket size is seven. That seems pretty reasonable to me. It takes a while to go from seven to five, and I can't get it to -- I don't know where it crosses over from five to three. Probably with many more digits in the number T. Concretely these numbers are pretty good, so if you're wondering how much more expensive malicious security is compared to semi-honest security, if the answer is seven or five, that's not too bad. I think that's pretty good. There are some other things I'll probably have to skip because I'm out of time. There are some things I really lied about and omitted -- mostly handled with standard techniques. Like cut-and-choose: by introducing cut-and-choose, you introduce a lot of new things that can go wrong, okay? Those can really be handled by standard techniques. I'm still glossing over how you initialize the ORAM. That's the elephant in the room for this whole notion of basing 2PC on RAMs to begin with. I didn't talk about how to get inputs into the RAM. I didn't talk about how ORAM constructions need randomness to randomize the location of things in memory. I didn't talk about those sorts of things, and I won't, because I'm out of time. So this is the slide that I showed before. This is just kind of a summary of our results. All right. So if you compare the ratio of the cost of malicious versus the cost of semi-honest, we kind of match the ratios that you can get in the circuit world. And, you know, the big idea is just to reuse wire labels as a representation of the secret data.
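The flavor of computation that script does can be sketched in Python. A rough sketch only, not the exact analysis from the paper: it computes the hypergeometric probability that a single random bucket, drawn from a pool containing some bad circuits, ends up with a bad majority. The pool and bad-circuit counts below are illustrative.

```python
from math import comb

def bad_majority_prob(pool_size, num_bad, bucket_size):
    """Probability that a bucket of `bucket_size` circuits drawn at random
    from a pool with `num_bad` bad ones contains a majority of bad circuits."""
    need = bucket_size // 2 + 1
    total = comb(pool_size, bucket_size)
    return sum(
        comb(num_bad, j) * comb(pool_size - num_bad, bucket_size - j)
        for j in range(need, min(num_bad, bucket_size) + 1)
    ) / total

# e.g. a pool of 35,000 surviving circuits, 25 of them bad, buckets of 7:
p = bad_majority_prob(35_000, 25, 7)
```

The full analysis also has to account for the adversary's choice of how many bad circuits to risk in the check phase and a union bound over all T buckets, which is what makes the exact numbers slow to compute.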
It's a very good representation in a maliciously secure protocol, because the garbled circuit already has, like, protection against malicious behavior. So that's my last slide. Thanks for your time, and thanks for coming to see my talk.
[applause]
>>: Any questions?
>>: So what are some programs that are thought to be better than RAM [inaudible]? I think you had [inaudible] binary search.
>> Mike Rosulek: Yeah. So the paper that we built on, that does RAM computation in the semi-honest model, they do just a simple binary search. So big data -- let's say a sorted list of 10 million items. So one of the things that I didn't mention, and now I'll mention it, is that you can set up the memory once and then run a RAM program many times. So many invocations of the CPU is one execution of the RAM program, and you could do that many times, right? So you can query the memory for this, do a binary search on this, do a binary search on that. So that was one thing that the previous work did: binary search.
>>: So is it generally programs that have really large inputs?
>> Mike Rosulek: So I think that's where RAMs are the biggest win: RAMs can give sublinear-time algorithms, and if you don't have a secure computation protocol that even allows that, then it's not going to be much fun. So binary search is the canonical example, but anything like that -- sublinear-time access to a large data set -- I think that kind of makes sense. One other thing that people seem to mention a lot is the stable marriage, stable matching algorithm, because it's very iterative: you start with a matching and then you, like, kick people out, and people keep proposing to other people, and the matching keeps changing in memory. It's a very random-access kind of computation. I haven't thought hard about what it would look like as a circuit.
I'm also not sure I want to think that hard about it, but that's another example. It's not a sublinear-time algorithm, but it's certainly better suited to a RAM model than a circuit. Cool.
>>: Thank you again.
>> Mike Rosulek: Thanks, everyone.
[applause]