
>> Seny Kamara: Okay, so we're going to have our first invited talk now. So introducing

Yehuda to this crowd is probably a bit redundant but I will say a few words anyways. So we are happy to have Dr. Yehuda Lindell with us. Yehuda is a professor at Bar-Ilan

University. So Yehuda has done lots of influential work in cryptography and especially in secure multiparty computation. A few examples are his work on privacy-preserving data mining, his work on security and efficiency of Yao's garbled circuit protocols, various work on that, and also he has co-authored a book on secure two-party computation.

Besides his work on secure multiparty computation, he also has influential work in other areas of cryptography, and he also has co-authored a widely used cryptographic textbook called Introduction to Modern Cryptography. So I think with that I will let him start his talk and won't make him wait any more.

>> Yehuda Lindell: Thank you very much.

[Applause]

So thank you, everyone. Thank you for the invitation and for the great initiative of this workshop. I'm going to talk about two-party computation, semi-honest and malicious. I'm going to try and keep it at a rather high level and focus on conceptual issues. Since this is the first talk -- I guess if you don't know this then it's not clear why you're here, but since it's the first talk of the workshop I'll just spend a minute putting us in context. So we have a set of parties with private inputs who want to compute some joint function of their inputs while preserving certain security properties like privacy, meaning only the output is revealed, and correctness, meaning that the correct function will be computed. So if we have

DNA and we want to, say, check whether we should be going out because maybe we have genetic disorders that will match together and be really bad then we want to run a secure computation because we don't want to give up our DNA to every girl that we meet at a pub. But on the other hand we also want to know whether it's worthwhile going back after a few drinks.

And correctness will mean that neither of the two -- I'm not quite sure who'd be more suspicious; actually, I do. Neither of the two can make it seem as if it really is okay when it's not okay. And that would be correctness. So we said security, we want privacy, we want correctness, and the main question we have to ask is: what is our adversarial model?

Who is our adversary and what are we trying to protect against? And there are three main models. I'll talk about the first two mainly; the third I'll just mention. Semi-honest means that the adversary follows the protocol specification and does what the code is supposed to do but tries to learn more information than allowed by looking at the transcript. This is a rather weak adversarial model. It doesn't provide any guarantees if the adversary can run malicious code. But that doesn't mean it's useless, and there are a few reasons why it's important to look at semi-honest security. Firstly, it models inadvertent leakage, and I think this is a very important thing that a lot of people miss.

Let's say we have two hospitals who want to collaborate to do privacy-preserving data mining on the union of their databases. They actually trust each other. But for regulatory reasons they're not allowed to release or give their data to each other. And they also want to guarantee that if the other hospital gets broken into, "my data is fine." So this is actually a very good reason to use a semi-honest protocol in that case. "I'm not worried the other hospital is going to cheat; I'm worried that my data may leak to them," also for

regulatory reasons and also for the actual reason that if they're broken into, "this shouldn't be my problem." So that's one thing.

There are also other settings, say where the computers are very well locked down and you can't change the code. But in general semi-honest is not what we want. Still, it has proven in the past, and it continues to prove, to be a very, very good place to learn about efficiency, and efficiency improvements that have been gained by understanding semi-honest protocols have actually made a huge difference in the malicious case as well. So I strongly advocate looking at semi-honest: highly optimizing and getting improvements there helps us for the more general case.

For malicious adversaries, the adversary can run any arbitrary code they want. This is obviously the best, what we'd really want: it doesn't matter what the adversary does, we know that we're being protected. Another model which is somewhat in between is covert. And covert means, yes, the adversary can do whatever they want, they can run malicious code, but we're guaranteed that if they do try to cheat we're going to catch them with some good probability. And this actually makes a lot of sense in a lot of business-type scenarios. And in fact there's even a specific project that

I’m involved in right now which is not an academic project actually where they explicitly said that this level of security is what they need.

So if you have semi-trusting parties and the ramifications of getting caught cheating will be high then this is a very good model to look at. So there are two very fundamental feasibility results that we have for secure computation. These go back to the mid- to late

80's. We know that we can securely compute any function that we want. And we have

Yao, which is a constant-round protocol. It's two parties only. And we have GMW, which is this highly interactive protocol where you interact for every gate of the circuit you're trying to compute. And GMW is easily generalized to multiparty. So if you looked at these two results as a theoretician, which everybody did until very recently, then Yao is good for its constant rounds but we don't have a nice multiparty generalization. We have BMR, but it's not so nice, also in terms of ease of understanding if you want to teach it. And GMW is easily generalized to multiparty. And that's all we have as feasibility results.

And historically it was well understood that if you want to get real efficiency then you need to write, to design a specific protocol for [inaudible] section for voting. These things will never actually be something which is of any good. And maybe it's worthwhile saying

-- Because I think there are enough people that are young enough in the room that for them they grew up understanding that, "No, yeah. Computing on circuits is actually something which is reasonable." But we grew up in prehistoric times in the previous millennium and back then this was like a completely absurd concept. Right? This would be like programming a car production which -- I don't know if anybody has actually done that. And that's what we thought anyway.

And the modern perspective is completely different. Yao is actually really, really efficient and you can achieve, actually, extraordinary efficiency with it. And not only that: for semi-honest it actually lends itself very nicely to generalization to malicious, because we have these cut-and-choose techniques which I'll focus on a lot. Whereas GMW is highly interactive; it needs an oblivious transfer for every gate. So this is something which is just not -- GMW is still a pure feasibility result, and it's not something that we're going to try and actually use in practice. That would be, I would say, the modern perspective until very recently. And I'm going to start my talk by doing the GMW comeback and say,

"Maybe this isn't so nice. I mean Yao was just one person. We have three people on

GMW; we should give them some credit as well."

And maybe also it'd be nice to tell [inaudible], "This is not theory; you've actually designed a really practical protocol." I'm not sure he'd really care. But in case we have --

Actually, I think it's really fascinating, to be serious for a moment, that these two results which were considered and designed to be pure feasibility and theoretical results are actually, in some sense, the best protocols that we have until today. I think it's quite amazing.

So the first obstacle in GMW is that you need an oblivious transfer per gate, and so you need a number of oblivious transfers the size of the circuit. And this is something that, even if you just have a few exponentiations per oblivious transfer, is going to be really expensive. And, therefore, it's not going to be of any use except in tiny circuits. So that's why we thought of GMW as something which would not be practically interesting.

But the solution is something called an OT extension. What's an OT extension? An OT extension is essentially the secure computation analog of hybrid encryption, where you encrypt a small symmetric key using RSA and then you encrypt everything using AES, and then it's really, really efficient. And an OT extension is the same thing. You run a small number of base OTs, say 128 of them, and then you use cheap operations to actually get a million or a billion OTs based on just those 128 OTs. So that's an OT extension.

And Beaver in the 90's showed that this is actually possible. And this is an amazing result. By the way, if anybody doesn't know how Beaver does it then either go and read or come up to me in the break and I'll explain it to you because it's actually a result that everybody should know. This is a beautiful, beautiful, beautiful theoretical feasibility result. Okay? Beaver's construction is not one that you would want to use but it shows you something really amazing: you can actually just do a small number of OTs and effectively get a billion. And that's what Beaver showed us in the 90's.

And in 2003 Ishai, Kilian, Nissim and Petrank came along and said, "You know, again, this OT extension is not just a theoretical concept. We can actually do something which is very efficient." And they do something which is extraordinary. You have 128 base

OTs, for example, and then you can get a billion OTs where every OT will cost you essentially three hash function operations, which is incredible. Three hash function operations: this is not something which we are going to be able to improve on, or at least it doesn't seem like it. We don't have to improve on something which is so extraordinarily efficient.

And that's what they did in 2003.

And this is, I think, the first message I want to get across in this talk: even when something is really, really, really efficient cryptographically, it doesn't mean that we can't actually make very significant improvements beyond that by looking at other issues. So this protocol of IKNP involves sending a massive matrix. Essentially, if you want to do a billion OTs, you construct a matrix which is 128 by a billion and send that. And then you do this matrix transpose. So you have this really, really wide matrix and you turn it into a really, really long matrix. And then you take 128 bits each time for a single OT. Okay, so that's how you can get a billion out of 128.
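To make the matrix concrete, here is a minimal sketch of the IKNP core in Python. It is not the real protocol: the 128 base OTs are idealized (the sender's matrix Q is computed directly from the receiver's matrix T, the receiver's choice bits r, and the sender's random string s, which is exactly the relation the base OTs would establish), rows are stored as 128-bit integers rather than a bit matrix, and SHA-256 stands in for the correlation-robust hash. All names here are illustrative, not from any library.

```python
import hashlib
import secrets

K = 128   # security parameter = number of base OTs
M = 256   # number of extended OTs to produce (a billion in the talk)

def H(index: int, value: int) -> int:
    """Hash a K-bit row (tweaked by its index) down to a K-bit pad."""
    data = index.to_bytes(8, "big") + value.to_bytes(K // 8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[: K // 8], "big")

# --- Receiver: choice bits and the random M x K matrix T -------------
r = [secrets.randbits(1) for _ in range(M)]   # choice bit per extended OT
T = [secrets.randbits(K) for _ in range(M)]   # each row is K random bits

# --- Idealized base OTs ----------------------------------------------
# The sender picks K random bits s; the K base OTs (with roles reversed)
# leave it holding Q where row j satisfies Q[j] = T[j] ^ (r[j] * s).
s_bits = [secrets.randbits(1) for _ in range(K)]
s_int = sum(b << i for i, b in enumerate(s_bits))
Q = [T[j] ^ (s_int if r[j] else 0) for j in range(M)]

# --- Sender: mask its message pairs with one hash per message --------
x0 = [secrets.randbits(K) for _ in range(M)]
x1 = [secrets.randbits(K) for _ in range(M)]
c0 = [x0[j] ^ H(j, Q[j]) for j in range(M)]           # pad for x0
c1 = [x1[j] ^ H(j, Q[j] ^ s_int) for j in range(M)]   # pad for x1

# --- Receiver: decrypt exactly the chosen message --------------------
# T[j] equals Q[j] when r[j]=0 and Q[j]^s when r[j]=1, so H(j, T[j])
# is precisely the pad on x_{r[j]} and nothing else.
out = [(c1[j] if r[j] else c0[j]) ^ H(j, T[j]) for j in range(M)]
```

The "really wide matrix turned into a really long matrix" in the talk is exactly T: the base OTs operate on its K columns, while the extension consumes it row by row, which is why the transpose shows up at all.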

But the problem is you have to send this massive matrix. And this is also not so nice because if I'm building a billion by 128 matrix then the other guy is not doing anything. I

send it. Then when the other guy is receiving it, I'm not doing anything. That's one problem. Second, holding it in memory is not so great. As David will tell you, 128 by a billion is probably not the best thing to hold. I'm a theoretician; I have no idea. But there are at least other sorts of problems. How real are they? This is one of the questions. But when you start looking closer, you see how real they really are. Here's an amazing observation: 42 percent of the time spent on this protocol is matrix transpose. 33 percent of the time is hash function operations. Incredible.

A bigger cost in this protocol is incurred on the matrix transpose than on actually computing the hash operations. Now I grew up -- again, I'm talking about my age -- I grew up in an age when the cost in cryptographic protocols was the cryptographic operations.

Everything else is negligible, not significant. And therefore, when you want to design more efficient protocols, you reduce the number of cryptographic operations. Well, the fact is today two things have happened. One, we have nice hardware people and

[inaudible] people who made cryptographic operations really fast, and second, we have nice cryptographers who made the number of cryptographic operations small. And now guess what? The cryptographic operations are no longer the bottleneck in a lot of protocols. And this is one example, where a stupid matrix transpose is costing more than the hash function operations.

Okay, there's also an issue here that the bandwidth is high. So even if you're only transferring a single bit, for every oblivious transfer of the billion you have to send 128 bits each time.

So the first thing, and the reason why the transpose is so slow, for those theoreticians out there who for some reason have turned up at this workshop: it's slow because there are lots of cache misses. Every time you need to read, you're reading in these long words, and then you're missing every single time. And it actually really, really slows things down. This entire world of cache-aware algorithms is really fascinating stuff; you can do sorting much faster than Quicksort, for any of those who are still stuck in the sorting days of when you did your undergrad in computer science. Sorting using a cache-aware algorithm in practice runs much, much, much faster than Quicksort. And here too you can do the matrix transpose with an algorithm by Eklundh, who wrote this algorithm in the 70's because he wanted to do matrix transpose on disk or on tape; I'm not sure which.
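The cache-locality idea can be illustrated with a tiled transpose. This is a sketch, not Eklundh's actual algorithm (his is a recursive in-place block-swapping scheme for storage with expensive seeks); the point it shares with Eklundh is that walking the matrix in small square tiles keeps both the reads and the writes inside a small working set, instead of striding across a full row of a "really, really wide" matrix for every single output element. Function names are made up for the example.

```python
def transpose_naive(A):
    """Textbook transpose: every write to `out` lands in a different
    row, so for a wide matrix almost every access is a cache miss."""
    n, m = len(A), len(A[0])
    return [[A[i][j] for i in range(n)] for j in range(m)]

def transpose_blocked(A, B=64):
    """Cache-friendly transpose: process the matrix in B x B tiles so
    that each tile of A and each tile of `out` stays hot in cache
    while it is being touched."""
    n, m = len(A), len(A[0])
    out = [[0] * n for _ in range(m)]
    for ii in range(0, n, B):            # tile row of A
        for jj in range(0, m, B):        # tile column of A
            for i in range(ii, min(ii + B, n)):
                for j in range(jj, min(jj + B, m)):
                    out[j][i] = A[i][j]
    return out
```

In pure Python the constant factors hide the effect, but in C on a 128 x billion bit matrix this access pattern is exactly what turned the transpose from 42 percent of the protocol into a tenth of its former cost.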

And by running his algorithm we reduce the cost of the transpose to 10 percent of its previous cost and, therefore, already reduce the cost of the protocol by 25 percent. Just by doing the matrix transpose in an intelligent way, we managed to reduce the cost of the protocol by a quarter. Also, with memory management and parallelization: instead of sending this massive matrix, which will slow things down and also make it really hard to work with, we made very, very small changes that enable you to work in chunks, in small blocks, so that everything becomes parallelizable. You don't have to hold lots of things in memory; you only have to work with small amounts at a time. Also, in the original OT extension you have to know how many OTs you want ahead of time. And we made a slight change to enable you to be dynamic.

So you do the base OTs and every time you want OTs, you can just ask for how many you want and everything is very, very smooth. So these are two changes that I would argue are non-cryptographic. The second one is a little bit cryptographic, but it's like taking an existing protocol and making small changes that you can do because you understand algorithms and systems. For this one you need a little bit of crypto knowledge but not that much. But there's a third change that we made for which you really need to

be a cryptographer. And what we actually realized is: what did we want OT extensions for? Well, what do we want OT for? At least in this workshop we want OT because we're going to use it for secure computation. But if you're doing GMW, actually what you need is OT on random inputs. And if you're doing Yao then you're probably doing free-XOR, until later on this afternoon when you'll be doing flexible XOR. But at least for this morning you want free-XOR. And when you're doing free-XOR it means that you have one value and the second value is always that first value XORed with a fixed offset; it's like a correlated input. But one of the values, the random one, you don't care what it is.

So we can actually change the protocol so that you don't give an input; you get random output. And this actually reduces the bandwidth significantly; we actually reduced the bandwidth to a quarter of its original. And reducing it to a quarter -- when we said that the cryptographic operations are not the bottleneck, and now the transpose isn't the bottleneck, what the bottleneck is, is the bandwidth. So reducing that to a quarter, by understanding where you're going to use it and obviously making protocol changes, this made a huge difference. And this is the graph that we have.

So look at the blue for a second: that's when you're running on a LAN. So the original protocol ran for just over 20 seconds. Now, that's not for 1 OT; that's for 10 million OTs.

So this is fast, right? 20 seconds for 10 million is great. But by doing the matrix transpose intelligently we're below 15 seconds. By then doing the correlated or the random OTs, so actually reducing the bandwidth, we dropped down to below 10 seconds.

But because now we're not working with this massive matrix and we said we can work on blocks, we can parallelize everything nicely, which you can't do with the original protocol. So with two threads you get to a half; with 4 threads you get to a quarter, and so on and so on. It actually improves linearly. So just with 4 threads, which you have, I guess, on any machine now, probably on your iPhone -- maybe upwards of 8 threads; I'm not sure -- you've got down to 10 percent of the original cost.

When you look at what happens when you're running on a wireless network, then actually even the matrix transpose doesn't make much of a difference because the bottleneck is only communication. But once we've improved that and we've dropped it down, then you're down to below half the cost. Okay? Or maybe the overall is half, not a quarter. I thought it was a quarter but I don't remember. So in any sense we've got down to below half the cost by looking at issues of memory. Okay, just as a bottom line: we compute a 1 million-gate circuit using GMW in half a second and a 1.3 billion-gate circuit in 11 minutes, which is something that just 5 years ago -- I guess I would've thought it was impossible. Yeah?

>>: This is the amortized cost for OT [inaudible]?

>> Yehuda Lindell: No, no.

>>: [Inaudible].

>> Yehuda Lindell: No, this is the cost of doing 10 million OTs.

>>: Oh.

>> Yehuda Lindell: In seconds. Yeah, you can do 10 million OTs in 2.5 seconds. Not bad. So what do I want to give you from these few minutes on semi-honest? I'm going to say firstly that

GMW shouldn't be ruled out, and it may actually be quicker than Yao. It depends, of course: if you're very far away geographically and you have to do lots of rounds because you have a deep circuit, then maybe not. But in many cases, GMW should definitely be considered. But that, I guess, is the less important thing as far as I'm concerned. The more important lesson is that cryptographic operations have become fast and may no longer be the bottleneck, so we really need to understand and look at these other things to be informed. I'm not a systems person. I'm not an algorithms person. But I think it's very important for us cryptographers to be informed by the implementations, by the systems people, looking at what the costs are and then coming up with improvements. But of course we all know that these protocols are really subtle, so you have to be a cryptographer to make any change because you can easily mess them up.

And you can make algorithm and system optimizations on existing protocols; that's a good thing to do. But it's never going to take you as far as if you build the protocol from scratch, designing it understanding both the cryptographic aspects and keeping in mind what the bottlenecks are and what the systems and algorithms understanding has given you. So looking at the implementations that people do, understanding what the costs actually are and where things stand -- and these change all the time, by the way. So if you're looking at Yao for malicious, like I'll talk about in a minute: in the last seven years I think that the bottleneck has changed at least five times.

So depending on -- The first protocols had -- My original protocol [inaudible] had many, many commitments and that was the worst thing, so we changed it to exponentiations.

And then afterwards the number of circuits was reduced. And every single time something was improved, the bottleneck actually changed. It's not like you do that once and you understand; it's an ongoing process.

Okay, let's go on now to look at malicious. And for the rest of the talk I'm going to focus on getting Yao's protocol to be secure for malicious adversaries. So this is Yao's protocol: P1 constructs the garbled circuit. P1 sends it to P2, and sends also the keys associated with its own input. Each wire has keys. P2 does an oblivious transfer to get the keys associated with its input. Then P2 has an encrypted garbled circuit and one key for every input wire. It can now compute the circuit and get the output. P2 doesn't learn anything because it just got an encrypted circuit which doesn't reveal any information. P1 doesn't learn anything because P1 didn't compute the circuit. And we're all happy.
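The garbling step P1 performs can be sketched for a single AND gate. This is a teaching toy under stated assumptions, not how modern implementations do it: SHA-256 stands in for the gate cipher, the evaluator recognizes the right row by a hash tag rather than by point-and-permute bits, and there is no free-XOR. All names are invented for the example.

```python
import hashlib
import secrets

KEYLEN = 16  # 128-bit wire keys

def H(ka: bytes, kb: bytes) -> bytes:
    """32-byte digest of two wire keys: the first 16 bytes mask the
    output key, the last 16 serve as a row-recognition tag."""
    return hashlib.sha256(ka + kb).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and():
    """P1's side: pick two keys per wire (one meaning 0, one meaning 1)
    and encrypt the output key for each of the four input combinations
    under the corresponding pair of input keys."""
    wa = [secrets.token_bytes(KEYLEN) for _ in range(2)]  # wire a keys
    wb = [secrets.token_bytes(KEYLEN) for _ in range(2)]  # wire b keys
    wc = [secrets.token_bytes(KEYLEN) for _ in range(2)]  # output keys
    table = []
    for a in (0, 1):
        for b in (0, 1):
            pad = H(wa[a], wb[b])
            # row = masked output key || recognition tag
            table.append(xor(pad[:KEYLEN], wc[a & b]) + pad[KEYLEN:])
    secrets.SystemRandom().shuffle(table)  # row order must leak nothing
    return wa, wb, wc, table

def evaluate(table, ka, kb):
    """P2's side: holding exactly one key per input wire, find the one
    row whose tag matches and recover the output-wire key. The other
    three rows look like random bytes."""
    pad = H(ka, kb)
    for row in table:
        if row[KEYLEN:] == pad[KEYLEN:]:
            return xor(row[:KEYLEN], pad[:KEYLEN])
    raise ValueError("no row decrypted")
```

In the full protocol P2 gets one key per wire of its input via OT, evaluates gate by gate, and decodes the final output keys with a translation table that P1 publishes for the output wires only (here, wc).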

Okay, I'm actually very, very happy because this is really, really, really, really, really, really fast. Did I say "really" too many times? Now what happens when I want to do this maliciously? So the first thing, which is sort of obvious: the OT has to be secure for malicious adversaries. Now again we're in a different time, but in 2007, the first time that, with Benny Pinkas, we worked on this problem, we didn't have an OT that was secure against malicious adversaries and was efficient. It didn't exist. The best OT that we actually had for malicious adversaries that was proved with simulation-based security -- anyone have a guess what the best protocol was? Hmm?

>>: Quantum.

>> Yehuda Lindell: Quantum? I don't know. I'm in the classical world. I don't understand anything about quantum, so I apologize. I can't know if there was a quantum protocol.

But what was the best protocol that was out there in the classic world for secure computation with simulation-based security?

>>: GMW.

>> Yehuda Lindell: GMW, with the general GMW compiler using non-interactive zero -- using regular zero-knowledge proofs with a [inaudible] reduction. We didn't have any protocol that was efficient. So we wrote our protocol using a general OT primitive, and we wrote explicitly in the paper, "Hopefully when we solve this problem that will speed up our protocol." Fortunately a year later that happened. But just to show how quickly things are moving in this space: OT today we do in milliseconds, also for malicious adversaries, when in just 2007 we didn't have any such protocol. I've got to a stage where I'm talking about history; that's really sad, right? So fortunately in our field history is not in the

1950's but in 2007.

So I think everybody here can talk about history. Okay, now we can also use OT extensions, which we have now through TinyOT, and we have more improvements that are coming along, but I don't want to talk about that so much. The second problem -- and this is the problem that I want to talk about mostly -- is that the circuit may not be correctly constructed. We have an encrypted circuit, a garbled circuit; it doesn't reveal what's going on inside. That's the whole point. So if P1 constructs a bad circuit,

P2 won't know that it's a bad circuit. Now you might think, "Okay, this is a problem of correctness because now it might not compute the correct function," but this is not true.

We already know from the late 90's and early 2000's when our definitions were finalized

-- I guess the late 90's -- that privacy and correctness go hand in hand. The examples given in the late 90's were sort of artificial, and today I think we understand that this is not artificial.

If the circuit is not correctly constructed then P1 can embed something in the circuit which reveals information about P2. It could, for example, have a random mask going in and reveal some chunk of P2's input in the output, which P2 won't even notice because it's masked by a random string that only P1 knows. So if you can construct a bad circuit, you can actually learn a lot of information about the other party's input and you can cheat. So these things go very closely together, and we need to somehow solve this problem. And the way to do this is called the cut-and-choose paradigm. This is, again, a very old technique and it's, again, amazing that techniques which were developed in the late 80's are relevant to practice today.

Why the late 80's? Because cut-and-choose is essentially the solution used for zero-knowledge proofs for NP. That's the first place I've seen it. And, again, the same Yao, the same GMW in 86 and GMW in 87 -- so the GMW for zero-knowledge proofs for NP and

GMW for [inaudible] computations -- are very relevant to implementations that we're doing today. And I think that's very interesting. So what happens is P1 constructs many copies of the circuit and P1 sends them to P2. P2 says, "Okay, open half of them. I want to check whether they're any good." And then P1 opens them and if they're all good then

P2 says, "Okay, you look like a nice guy and I can pretty much trust you, so I think the rest of the circuits are also okay." And everyone's happy. And that's the cut-and-choose paradigm.

And this actually solves one problem but creates many, many new ones. I won't deal with them here, but the big problem was that the circuit was incorrect, and once we've managed to solve this we've actually created a whole lot of other problems because we're now computing on many circuits. So the inputs have to be consistent, and there are a whole lot of other attacks that go on. I'm not going to go into it, but just be aware that this is not the only problem and in fact it's not even necessarily the hardest.

Now here comes the really important question when you start to deal with these things.

So we now set all our circuits, opened half; they're all fine. We have half left; we're going to compute on them. What will P2 do if it gets different, inconsistent outputs in the other remaining half of the circuits?

It's a good question, right, because we're computing on half. How can this happen? It's going to happen only if P1 is trying to cheat. Okay, so P1 is sending some bad circuits but they don't get opened. There's no guarantee that we'll open them all. In fact if there's just one bad circuit, with probability one-half it won't be opened and will sneak into the ones that are going to be evaluated. Now P2 knows that P1 is cheating. What do you do when you know someone is cheating on the playground? Depending on -- I'm being recorded so I won't swear, but you tell them to get lost and that you're not going to play anymore because they're not playing nicely.

So that's what P2 should do, right? P2 should abort and say, "Get lost. You're a cheat. I don't play with cheats." Unfortunately, if P2 tells P1 that he's cheating then P2 is going to compromise his privacy. Why is that? Let's look at the following attack. P1 is going to generate one garbled circuit which is incorrect, and the rest are going to be fine. And this garbled circuit will do the following thing: "If the first bit of P2's input is equal to 0 then output complete garbage. And if the first bit equals 1 then compute normally." Now it doesn't have to be the first bit; it can be any predicate you want, it doesn't make a difference. Now with probability one-half this circuit is going to be evaluated and not checked; it's going to sneak in. And if the first bit of P2's input is 0 then what's going to happen? It's going to output garbage. And P2 is now going to get different outputs. It'll get one specific output in a whole lot of the circuits.

And in another one it's going to get something different. So it's going to abort: "You're cheating. I'm not playing with you anymore." If P2's first input bit is 1, it's going to get all the same outputs because it's not going to notice that anything happened. All the outputs are consistent; as far as he's concerned, P1 is honest. So he's going to complete the protocol. And, therefore, just by observing whether P2 aborted, P1 will actually learn this predicate about P2's input. So even when someone's cheating you in the playground, you can't tell them, "Get lost because I don't play with cheats." And not only can't you say, "Get lost; I don't play with cheats," when P1 comes back to you next week and says, "Let's play again," what do you have to say? "Yes, of course let's play. I love playing with you," because if you tell him, "Get lost; you're a cheat," then he knows that last week -- right -- "that was your first input bit." Okay?
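This selective-abort attack can be seen in a few lines of simulation. The model below is a deliberately crude abstraction of cut-and-choose (all names invented): P1 plants one bad circuit that garbles its output exactly when P2's first input bit is 0, P2 opens half the circuits at random and aborts either on catching a bad opened circuit or on seeing inconsistent evaluated outputs. The abort frequencies then differ sharply between the two values of the bit, which is precisely what P1 observes.

```python
import secrets

def runs_abort(first_bit: int, s: int = 8) -> bool:
    """One run of a toy cut-and-choose. P1 plants a single bad circuit
    whose output is garbage iff P2's first input bit is 0. Returns
    True iff P2 aborts (for either reason)."""
    rng = secrets.SystemRandom()
    bad = rng.randrange(s)                  # index of the bad circuit
    opened = rng.sample(range(s), s // 2)   # P2 checks half at random
    if bad in opened:
        return True   # cheating detected during the check phase: abort
    # The bad circuit is among the evaluated ones. Its output is garbage
    # iff first_bit == 0, making the outputs inconsistent: abort.
    return first_bit == 0

trials = 2000
abort_when_0 = sum(runs_abort(0) for _ in range(trials)) / trials
abort_when_1 = sum(runs_abort(1) for _ in range(trials)) / trials
# If the bit is 0, P2 aborts in every run (caught or inconsistent);
# if the bit is 1, P2 aborts only when the bad circuit is opened,
# about half the time. So "did P2 abort?" leaks the bit to P1.
```

In particular, whenever P1 sees the protocol complete without an abort, it knows with certainty that the first bit was 1.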

So we're in this sort of uncomfortable situation. We're in this sort of cheating relationship and we have to keep it up. You know he's cheating on you but you can't do anything about it. But that's the mathematics. So what are we going to do? What are we going to do?

What's P2 going to do if he notices that he has different outputs? Well, the answer is simple: take the majority output, because we can argue that the majority of the remaining circuits are good. We have all these circuits; we opened half. In the other half, the majority

is going to be good, because if not we would've caught P1; and therefore, we can just take the majority output and ignore the minority ones. We know he's cheating but we can just ignore those values. And the question is, what's the bound on this?

And, again, I want to talk a bit of history. So in 2007 when we did this we thought, "Ah,

Chernoff bound," and we wrote that the error probability was 2 to the minus S over 17, where S is the number of circuits you send. Meaning that to get a 2 to the minus 40 cheating probability, you need 680 circuits. Okay, we didn't really think about what we needed; when we were doing this for the first time we didn't really understand the ramifications of our analysis. So the bottom line was: if you wanted to take our analysis as it was, because we used the

Chernoff bound, you need 680 circuits, which obviously would be absurd. But here's a better computation -- which is not exact, by the way. Let's say you send S circuits and you open half of them; then there's the other half remaining. So if a quarter of them are bad and you didn't catch the adversary, you didn't catch P1, that means you lost, because you want the majority to be good.

Now let's assume -- and we don't do this, because in these protocols you're opening exactly half of them -- but let's assume you choose each circuit to be opened with probability one-half independently. Then what happens is that the adversary will catch you -- sorry, the adversary will get away with it: a quarter of the circuits will be bad and you won't notice, with probability 2 to the minus S over 4. So if you want security of 2 to the minus 40 -- by the way, this is not 2 to the minus 40 as in a 40-bit key; that's computation time. This is 2 to the minus 40 in terms of statistical cheating error. So the adversary can succeed in cheating with probability only 2 to the minus 40, which in my mind is reasonable. Of course you can get 2 to the minus 80; it's a parameter.

So you need 160 circuits. Now actually we did a better analysis in 2011 showing it's actually 2 to the minus 0.311s, and a student -- maybe [inaudible] I think, [inaudible] -- showed that if you open 60 percent, not 50 percent, then it's 2 to the minus 0.32s, and that means 125 circuits are enough. So this is good news; you don't need 680. You need 125 circuits. Well, it's good news and it's bad news, right? It's 125 times the cost of semi-honest. That's expensive. 125 times the cost of semi-honest is very, very expensive. But, you know, it's reasonable. We can do it. We can implement it. We can actually run it, and we do. And so this is where we are in 2012.
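The circuit counts quoted here follow from solving c * s >= 40 for each exponent constant c. A sketch, using the constants as stated in the talk (0.311 and 0.32 are the reported bounds, not values I have rederived):

```python
import math

sigma = 40  # statistical security: cheating probability at most 2^-sigma

# Naive analysis (each circuit opened with prob 1/2, majority output):
# cheating probability about 2^(-s/4), so s = 4 * sigma.
s_naive = 4 * sigma

# 2011 analysis: cheating probability about 2^(-0.311 s).
s_2011 = math.ceil(sigma / 0.311)

# Opening 60 percent instead of 50: about 2^(-0.32 s).
s_sixty = math.ceil(sigma / 0.32)

# s_naive = 160, s_2011 = 129, s_sixty = 125 circuits
```

So the "160 circuits" of the rough analysis and the "125 circuits" for the 60-percent variant both fall straight out of the exponent; the original Chernoff-based 2^(-s/17) bound gives the 680 figure the same way.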

And in fact, [inaudible] also showed not only that 60 percent gives you this error probability; they showed that you can't do better than that. This is optimal with respect to this strategy.

You could still optimize all the other parts -- and there are all these other parts as well -- but this is really depressing: [inaudible] came along and said, "Yeah. No, you can't do better."

But you have to realize that they didn't say, "You can't do better"; they said, "You can't do better within the framework of sending circuits, opening some percentage, looking at the remainder and taking the majority." So it's an optimality of a certain strategy, not an optimality overall.

And the main question was: is there a different variant that we can use that will reduce the number of circuits further? And in order to understand why the answer is yes, we have to come back and understand what the problem is that we're trying to solve. Now remember that if P2 aborts when he sees different outputs then P1 learns something. So what we want to do is try to design a strategy so that the only way that P1 can succeed in cheating is if the following thing happens: all of the checked circuits are correct and all of the evaluated circuits are incorrect. Okay, this is sort of "you can't do better than this." If you get to this sort of situation, this is seemingly optimal. It's very dangerous to say you're optimal, especially when you're being videoed and you agreed for everyone in the world to see as well.

So I'm not saying it's optimal and you can't do better. I'm saying that it seems to be -- you're doing cut-and-choose, so if P1 chooses circuits and all of the checked ones are correct, P2 doesn't notice that P1 cheated; and if all of the evaluated ones are incorrect then they can all be consistent with the same wrong function, and there's no way that P2 can detect anything about that either. So that's what we want to try and get to. And if we can succeed in doing this then the cheating probability will be 1 over (s choose s over 2), which means for 2 to the minus 40 we can get down to 44 circuits, which is much, much, much better. And, again, we have to try and think how we can design such a thing based on the fact that the problem is when you get inconsistent outputs. If you don't get inconsistent outputs, everything is fine. Especially if we're doing this strategy: because if you don't get inconsistent outputs and you know that this is the only bad situation, then if you're not in this situation at least one of the evaluated circuits is good. If at least one of the evaluated circuits is good then it gives you the good output. And that means that if everything's consistent then you know that you got the good output, so you're fine.

The problem is what you do with the inconsistency. Now notice that if you're actually doing such a strategy, you don't have to send s circuits and open half of them. You can choose each circuit at random with probability one-half, because there's just a single bad event. So P1 chooses circuits, some of them are good, some of them are bad, and there's only one bad subset: when all the bad ones are evaluated and all the good ones are checked. So you can just do this and you get 2 to the minus 40 with just 40 circuits, which of course is even better. So it's another 10 percent improvement. So this is our aim. How are we going to do this? And again, as we said, the problem is only if P2 observes or receives different outputs. If he doesn't receive different outputs then everything is fine.
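The two counts quoted here are easy to check (a sketch of the counting argument only): with a fixed half opened, the adversary wins only when the single bad subset is exactly the evaluated half, probability 1 over (s choose s/2); with an independent coin flip per circuit, exactly one of the 2^s check patterns lets him win.

```python
from math import comb

def cheat_prob_fixed_half(s):
    # Open exactly half: the adversary wins only if the random choice of
    # which s/2 circuits to check is the single bad subset: 1 / C(s, s/2).
    return 1 / comb(s, s // 2)

def min_circuits_fixed_half(target_bits):
    # Smallest even s with 1 / C(s, s/2) <= 2^-target_bits.
    s = 2
    while cheat_prob_fixed_half(s) > 2 ** -target_bits:
        s += 2
    return s

print(min_circuits_fixed_half(40))  # 44 circuits suffice for 2^-40
# With an independent coin flip per circuit there are 2^s equally likely
# check patterns and exactly one lets the adversary win, so s = 40 suffices.
```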

And the idea is that we're going to have some magic trapdoor. So if P2 gets two different outputs in different circuits, it will somehow learn P1's actual input. We're going to have magic. He'll learn P1's input. And if he learns P1's input, what can he do? Just compute the function. Just compute f of x, y, because he actually knows P1's real input.

And of course he's not going to tell P1. He's not going to tell P1, "You're cheating and I've got your input." He's going to pretend that he's playing nicely, because we still have the same problem as beforehand. But he's just going to compute the output correctly. So either all the outputs you get from the garbled circuits are correct -- I'm sorry, are consistent -- and if they're all consistent and at least one of the circuits is good, everything is fine. Or, you got two different outputs. If you got two different outputs, the magic trapdoor popped out, and when the magic trapdoor popped out you got P1's input and everything is still fine because you can correctly compute the output.

So this is really, really good news. How do you implement this magic trapdoor? We use sort of magic encryptions; it's not such a problem. So we have many garbled circuits, and normally in each garbled circuit all the garbled values are independent. But now we're going to make the garbled values on the output wires be the same across circuits -- just on the output wires; that's fine. And what's going to happen if P2 receives different outputs? It means that it received both garbled values on a single wire. So there's a single wire where P2 got both of the values. Now if P1 was honest, this would never happen. So when P1 is honest, P2 will not be able to [inaudible] this magic trapdoor. The trapdoor is going to be two values -- both output values on a single garbled wire. Okay? That's the magic trapdoor. So P2 will evaluate all the circuits and then P1 and P2 will run a maliciously secure computation for a circuit that does the following: P1 inputs the correct input x, the same input as in the other computation -- we can enforce that; that's fine -- and P2 inputs the magic trapdoor, which is the two values on a single output wire.

And if P2 has these values then the circuit will give him x; otherwise, it will give nothing.
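P2's decision logic in this cheating-recovery idea can be sketched in a few lines. This is purely illustrative: `evaluate_with_recovery`, `recover_x`, and the toy labels are hypothetical stand-ins, not the actual protocol, and the small secure computation that releases x is abstracted into a callback.

```python
def evaluate_with_recovery(evaluated_outputs, label_to_bit, recover_x, f, y):
    """evaluated_outputs: one list of output-wire labels per evaluated circuit."""
    num_wires = len(evaluated_outputs[0])
    for w in range(num_wires):
        labels_seen = {out[w] for out in evaluated_outputs}
        if len(labels_seen) > 1:
            # Both labels on one wire -- the "magic trapdoor" fired.
            # P2 quietly recovers P1's input x and computes f himself,
            # without ever telling P1 he was caught.
            x = recover_x(labels_seen)
            return f(x, y)
    # All circuits agree; at least one evaluated circuit is good, so the
    # common decoded output is the correct one.
    return [label_to_bit[l] for l in evaluated_outputs[0]]

# Toy usage: one output wire whose labels "A0"/"A1" are shared across circuits.
decode = {"A0": 0, "A1": 1}
f = lambda x, y: [x & y]                       # the function being computed
honest = evaluate_with_recovery([["A1"], ["A1"]], decode,
                                lambda _: None, f, 1)
cheating = evaluate_with_recovery([["A0"], ["A1"]], decode,
                                  lambda _: 1, f, 1)  # trapdoor reveals x = 1
print(honest, cheating)  # [1] [1]
```

In both branches P2 ends up with a correct output, which is exactly why the abort channel disappears.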

So if P1 is honest, P2 won't get anything from [inaudible] secure computation. And if P1 cheats and there are inconsistent outputs then P2 gets P1's input. And we can enforce that all of these things are done correctly and efficiently. What's the problem? The problem is we're trying to construct a secure computation that's secure for malicious, but we need a [inaudible] computation that's secure for malicious. It's a bit of a circularity problem.

Fortunately we already know that you can do circular security, and it's called bootstrapping in this case. Yeah, you're right: I'm going to use an older protocol that needs 125 circuits and not my new protocol that needs only 40. But why will I gain anything? Because this bootstrapping circuit is going to be small. So I'm going to have 40 times this big circuit and 125 times this very small circuit. And if the circuit is small enough then everything is going to be fine. And since I don't have that much time, I'll tell you how big the circuit is going to be for the number of inputs equal to the number of outputs equal to the security parameter, because it makes the computations very easy. Then, this has only 32,000 XOR gates and only 16,000 AND gates. Right? That's wonderful. So if I want to compute an 8,000-gate circuit, now instead of 125 times 8,000 I have 40 times 8,000 plus 125 times 16,000.

Okay, that didn't work. But now if you go -- and I'm not going to go through all the details -- if you go and look, you can actually do a much smarter circuit. And this is another important conceptual understanding: just optimizing this circuit can actually make a huge difference. And we, by looking at it more closely, can actually optimize this so it has 50,000 XOR and only 128 AND. Okay, so it's a much, much smaller circuit.
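The cost comparison is easy to verify. A back-of-the-envelope tally, counting only AND gates for the bootstrapping circuit since free-XOR makes XOR gates nearly free:

```python
full_circuit = 8_000           # gates in the example circuit being computed
naive = 125 * full_circuit     # 125 copies, no cheating recovery
with_big_bootstrap = 40 * full_circuit + 125 * 16_000  # naive recovery circuit
with_small_bootstrap = 40 * full_circuit + 125 * 128   # optimized, 128 ANDs

print(naive)                 # 1000000
print(with_big_bootstrap)    # 2320000 -- worse than the naive approach
print(with_small_bootstrap)  # 336000 -- close to just 40 copies of the circuit
```

With the 128-AND recovery circuit, the 125 small copies contribute almost nothing, so the total cost is essentially 40 full circuits rather than 125.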

>>: I didn't understand something about the general idea. What if in the second protocol P2 cheats such that it is -- what will P1 do with...

>> Yehuda Lindell: P2 can't cheat.

>>: Well, if he catches him cheating. If he puts in garbled and P1 catches him...

>> Yehuda Lindell: P1 doesn't catch him. It's a secure computation. It's a black box. You can't cheat. In our protocols there's no such thing as cheating; you just can't cheat. So if P2 puts in garbage, he doesn't get any output -- it doesn't make any difference. It's perfectly safe. It's a fully secure computation. So we can get it down to this. By the way, this is much better because we know we have free-XOR. But 50,000 XOR just means you still have to -- this is so much when you're doing 125 times that. This is actually going to start costing you. And you have to transmit also a whole lot of keys. So if you work even harder -- I don't write it here -- you can actually get it down to be 128 ANDs plus -- I didn't write it, but it's a couple of hundred XORs. So look in the paper if you want, but you can actually optimize the circuit to be tiny, and when it's so small then you have 125 of them but it's completely insignificant when you compare it to the other circuits. So essentially the cost is 40 times the full circuit rather than 125.

Okay, so that's where we got to. Now can you do any better? It doesn't seem like you can do much better within cut-and-choose. Obviously you can do a lot of other things which are completely different from cut-and-choose of Yao. But if you want to do cut-and-choose of Yao -- because you know Yao and you've been doing it for seven years and you don't really know anything else -- then you want to be able to use these things we've done. So it seems like we've got to the end. We can still further optimize other things, but can we do any better? Can we get below s circuits for 2 to the minus s security?

And I would guess if I want to say something -- By the way when I conjecture things in crypto it's always the opposite. So whatever I say you should go and try and research the opposite direction. I actually have some good examples of students that I've managed to really mess up because of that.

But I would conjecture that for a single execution using cut-and-choose, [inaudible] thinks you can't do better than that. But what happens if you want to do many executions? So in very recent joint work with Ben Riva we looked at what happens if you want to do many executions. Say I want to do 1000 executions, so I want to batch Yao. Then P1 will construct some multiple of 1000 circuits, and P2 will ask P1 to open each one with some probability -- maybe a half, maybe something different. And then P2 will check; if all the ones he opened are good then he'll take the remaining ones and -- he won't just take the remaining ones as they are; what he'll do is he'll randomly throw them into different buckets. So he'll mix them all up and throw them into random buckets.

And because of the way our protocols work, security will be maintained as long as there isn't any single bucket where all the circuits are bad. Now this actually makes it much harder for P1, because if P1 [inaudible] many bad circuits then he'll get caught. And if he chooses only a few bad circuits, then once we throw them randomly into 1000 buckets, the probability that there's any single bucket with all bad circuits is going to be very, very small.

And this actually turns out to be -- So when we started talking about this, I sat down at home on the back of an envelope -- it was really on the back of an envelope actually -- and I wrote down some numbers with an incorrect computation that came out to be so low I was like, "Wow, either it's too late at night and my calculations are really bad, or something big is here."

So if I want to do 1000 executions -- and actually I'm not going to open a half; I'm going to open 15 percent only -- then what happens is that I'm going to need an average of only 7.06 circuits per execution. So with 7,060 circuits I'm going to open 1,060 and check them, and then I'm going to compute -- evaluate -- 6 circuits for each execution. And the probability of cheating is 2 to the minus 40.
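A rough union-bound sanity check of these batch numbers (a simplified back-of-the-envelope bound, not the exact analysis from the paper): with 7,060 circuits, 1,060 checked, and the remaining 6,000 thrown into 1,000 buckets of 6, a cheater who made b circuits bad must have none of them checked and some bucket come out entirely bad.

```python
from math import comb

N_CIRCUITS, N_CHECKED, N_BUCKETS, BUCKET = 7_060, 1_060, 1_000, 6
N_EVAL = N_CIRCUITS - N_CHECKED          # 6,000 evaluated circuits, 6 per bucket

best, p_unchecked = 0.0, 1.0             # p_unchecked starts at b = 0
for b in range(1, N_EVAL + 1):           # b = number of bad circuits made
    # P[none of the b bad circuits is checked] = C(N-b, c) / C(N, c);
    # updated incrementally to avoid huge binomials in the loop.
    p_unchecked *= (N_EVAL - b + 1) / (N_CIRCUITS - b + 1)
    if b >= BUCKET:
        # Union bound over buckets: a fixed bucket is a uniformly random
        # 6-subset of the evaluated circuits, so P[all bad] = C(b,6)/C(6000,6).
        p_bad_bucket = min(1.0, N_BUCKETS * comb(b, BUCKET) / comb(N_EVAL, BUCKET))
        best = max(best, p_unchecked * p_bad_bucket)

print(best < 2 ** -40)  # True: no choice of b beats 2^-40 under this bound
```

Checking few circuits leaves many opportunities to be caught in a bucket, and making many circuits bad makes it likely one is opened, so the adversary's best trade-off still stays below 2 to the minus 40 even in this crude model.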

>>: So the inputs for all the 1000 executions...

>> Yehuda Lindell: Are different.

>>: Are different.

>> Yehuda Lindell: Different.

>>: You consider it a failure even if one of these inputs are -- I mean...

>> Yehuda Lindell: There's no failure. It's malicious security.

>>: But I'd be okay as an adversary if some of my inputs are revealed. Maybe there are some inputs that are crucial and the others aren't.

>> Yehuda Lindell: I don't care about the adversaries. I care about the honest guys. It's full security. As far as I'm concerned it makes no difference if I learned your input or if I didn't learn your input. It makes no difference whatsoever. What's important is that as an honest guy, I always get the correct output and privacy is maintained. What's happening underneath is philosophy, and we don't care about philosophy. Now if you go up to a million executions then you can actually get down to just 4.08 circuits per evaluation, which is absurdly small. But if you only want 30 executions, you're still getting down to 13 circuits.

So it's a third of the cost. It's actually really significant. And it's not so simple, by the way, because you have to -- It's not so simple because the way the cut-and-choose works, the inputs are intertwined with the checks so it's not easy to separate things out.

But we can do it and we've done it. What's more interesting in my mind is not batch because, okay, to batch 1000 is hard. I want to do offline-online. I want, at night, to send a whole lot of circuits, check them, do all the work; and in the daytime when I get a query I want to send a little bit of information -- very little bandwidth. I don't want to do any exponentiations, or maybe really few exponentiations, like a constant amount per evaluation, a really small number of exponentiations. And I want to just evaluate a few circuits and that's it. If I can do that then I am going to get really, really, really, really fast computations. And that's what we've done actually. It's, again, a lot of work you have to do to separate out the inputs completely from all the checking. But that's what we've done. And now at night you can do a thousand or a hundred thousand or a million, and you get really, really small online time.

So if you're doing a million then the online time is just four evaluations of a circuit with no checking. And we know evaluating is much quicker than checking; it's a quarter of the time. Four circuit evaluations are your entire online time. If you're using JustGarble or something like that, this is ridiculously fast. And this is, I think, again, essentially another order of magnitude faster than what you can do in the single-execution case. And the online time in many practical scenarios is actually what you need. And so this gives you a really, really much faster result.

So I'm going to spend just a couple of minutes -- I want to talk about one other thing. I want to advertise SCAPI, which is a library we have for doing secure computation implementations. I think everyone here is, or most people here are, interested in doing this. And we aimed to build a library that was not for implementing our protocols but really for the community to be able to use. So it's a general-purpose, open source library. And we have a long-term commitment to the library: I can tell you now we have money for at least six more years that we will support the library. We will help you install it, and if you have any problems, we will give you support. And if there are bugs, we'll fix them. And we'll add functionality. So it's not like, "Yeah, there's a library there and someone --" It's well documented and everything. We've tried at least.

We have comments and a whole Javadoc. The aim is for everybody to use it; it's not for us. That's not why we did it.

It's written in Java, so it's good for large projects, but it's not inefficient because we used JNI to wrap low-level libraries. So you can do pure Java with Bouncy Castle. We have Miracl underneath, or Crypto++. I'm actually working now to have it on top of OpenSSL. So it can be used very efficiently even though you're writing in Java. And the idea of the library is that it's written in terms that cryptographers understand, so you'll implement your protocol using an oblivious transfer and a pseudorandom function, and over a generic DLOG-type group; you don't care what's underneath and you can instantiate it in many, many ways.

I've said that. I'm just going to run through this very quickly. I just want to say: how would you work with this thing if you implement an oblivious transfer protocol that uses a DLOG group and a commitment scheme and a hash function? You would have this theorem, right, in your paper: assume that DDH is hard in the group, that the hash function is collision resistant and the commitment is perfectly binding. Then SCAPI will actually enable you to do that within the implementation, because it has hierarchies of security levels. So every protocol has an interface which says what its security level is.

So if you're a hash function you can be target collision resistant or collision resistant. If you're a group you can be secure against just Discrete Log, or CDH, or DDH. If you're an encryption scheme you have all these different things here.

And then, just to show you how this would work -- I'll skip back to that -- in Cramer-Shoup, for example, when you get a DLOG group, the first thing you do in the constructor is say, "If the DLOG group is not an instance of DDH then just throw an exception saying the DLOG group has to be DDH." And, likewise, if the hash function is not collision resistant then you say you need that. So that's the way you would do these things, and you can actually enforce security levels in your protocols. It has three levels.

It has basic primitives. It has non-interactive schemes. And it has interactive protocols, which for now includes oblivious transfers, garbled circuits, sigma protocols, zero-knowledge, zero-knowledge proofs of knowledge, and a whole lot of different types of commitments. It also has compilations. So if you write a sigma protocol then immediately, automatically, you can do ANDs and ORs of sigma protocols. You can get zero-knowledge and zero-knowledge proofs of knowledge just automatically. All you have to do is implement the three steps of a sigma protocol and everything else comes for free.

So it's actually very -- We're continually adding to it. So the OT extensions that I talked about at the beginning of the talk will be there very, very soon. JustGarble is going to be there very soon. We're writing [inaudible]; OpenSSL will be up soon. So we have all of that. I'm going to conclude my talk: we've made huge progress in efficiency for secure computation. And the huge progress is -- from, I guess, seven, eight years ago there was Fairplay, which showed maybe you can actually implement something, but that was all there was. And now we can do things which are just extraordinary. But I don't think we should ever think that we're there, that we're at the optimal stage, because maybe there's always something different. There's a different type of model we can look at. Maybe we can now look at online-offline or batching and other things, and there are a lot of other improvements to make.

Algorithm and systems optimizations actually make a massive difference today, because cryptographers have done good work in getting the cryptographic cost down. And the small things can make a huge difference. So a matrix transpose, or designing the bootstrapping circuit in a special way, can actually reduce the cost hugely, and these are things that we have to look at as well, and not think just about the cryptographic operations. So thank you.

[Applause]

>> Seny Kamara: So we have time for questions.

>>: I have a weird question maybe, but is it possible to break the circuit into small sub-circuits and then garble them independently and then open them -- open a small sub-circuit? And then, for example, you need to open some of them and then choose the ones that -- after opening them you can choose the ones, then merge them together and build a huge circuit, then compute...

>> Yehuda Lindell: So in general you can easily work on sub-circuits and then merge things together; that's not a problem. But the question is why you would want to do it. So if you're worried about memory management then this can help. But you could also do streaming possibly for that. If you think you can get a better error -- reduce the error down -- now I'm not sure, because as soon as you have one sub-circuit which is incorrect, or one bucket or one sub-circuit where all of them are bad, this may ruin absolutely everything. So I'm not sure whether you can use it to get lower error, but for things like [inaudible] having to work on a massive billion-sized circuit, that definitely is something which has been done before actually. David has work which has done that.

>>: So I didn't understand one thing. So assume I get the output. If I have the two values on an output wire, I get the input of the other guy. How does that help me in not needing to tell him he's cheating?

>> Yehuda Lindell: Well because now I know his input so I can actually compute the output correctly. I just compute f of x, y.

>>: But he still cheated me.

>> Yehuda Lindell: He cheated me but I can't tell him he cheated me. You still have to be in this traitorous relationship.

>>: Okay.

>> Yehuda Lindell: You can't get out of it. We have to play with cheats, and that's an unfortunate effect. Does it bother us? That's a good question, by the way. It's a good question whether these types of things should bother us. If we're running protocols that are maliciously secure, I don't think this should bother us, because we know that even though you try to cheat me, you can't. Right? So you might try to cheat me, but I know you're going to fail every time, so why do I care? If I'm running covert protocols, where you actually have a reasonable probability of succeeding in cheating, then this would be a very, very bad property. You wouldn't want that. But for malicious, I think it's okay. Try. You're going to fail anyway.

>>: [Inaudible] say I think amortizing and running several executions is a nice idea. And just a comment on concurrent and independent work: we are also looking at that, and we took roughly the same approach.

>> Yehuda Lindell: Nice.

>>: We'll talk about our differences.

>> Yehuda Lindell: Great minds think alike.

>>: I was just wondering here -- I was interested in the abort problem that you mentioned. You know, [inaudible] force an abort, or if you abort, [inaudible] something about your input. And it seems to me that even in the [inaudible] oblivious transfer case that can happen. Suppose that Alice's inputs are the x0 word and the x1 word, and she can send some kind of illegal input. And there it's even worse, because whether [inaudible] complains or not, Alice already knows what has been chosen.

>> Yehuda Lindell: Right. So [inaudible] there's a safe abort and there's an unsafe abort. So what I was talking about was an abort that would reveal -- in general we allow aborts, and these aborts can be safe. For oblivious transfer specifically, the reason why as a standard primitive we're not worried about that is because I can always take a default value when there's something which is incorrect.

>>: Okay.

>> Yehuda Lindell: So we can actually solve that problem inside the protocol. And in many protocols, by the way, the standard protocols work in a way that all possible inputs by Alice are valid, and so it doesn't really matter. There's no such thing as an invalid value.

And if we're using it in a higher protocol then we have to take care of that. So in fact within Yao there's this well-known selective [inaudible] which actually works by giving an invalid key.

You have to make sure that the higher protocol that uses that primitive takes care of it. And that's exactly the warning, the general warning: just because you're using secure primitives doesn't mean that you get a secure protocol. I have a perfectly secure OT; it's not enough to do cut-and-choose and use secure OTs. You'll be horribly, horribly wrong, and this is one of the problems that arises. But that's not because of the OT; it's because of the use of the OT.

>>: Someone had asked this question: do you think we're coming -- You have not talked at all about concurrent security of these protocols. Is there a case that -- I mean, can you make a case that you need it? Or when you say that --

>> Yehuda Lindell: In general we know that you need it, right? We know that, and it depends specifically on the protocol, but there are UC versions of these protocols where typically the only additional cost is when you have zero-knowledge proofs; then you need to make them UC secure. Everything else [inaudible] without rewinding. So there's this compilation of sigma protocols to UC secure ones, which is relatively expensive, but because the number of proofs isn't that big, you can live with that. So there are UC secure versions of most of these protocols.

>>: So it seems that for functionalities where you can reduce the evaluation of a function on normal inputs to that of evaluating the function on random inputs, if you could run the -- For example, oblivious transfer you can do that.

>> Yehuda Lindell: Yeah, yeah. So...

>>: So if you catch the adversary during the random phase then you don't really reveal...

>> Yehuda Lindell: So I would suggest you look at it. I actually once tried to look at that and see -- actually I was looking in the context of covert. If you had such a thing then you could do very efficient covert protocols generically. So I once tried to look at: can you characterize the functions [inaudible] random reduction? I didn't succeed, but I think it's a very interesting direction. I mean, you would get this sort of generic construction of covert from semi-honest.

>>: There are plenty of examples for [inaudible].

>> Yehuda Lindell: Yeah, but...

>>: [Inaudible] pre-processing model?

>> Yehuda Lindell: No, that's not random inputs. That's the pre-processing model, where you don't have the inputs inside. If you had a protocol...

>>: [Inaudible] -- Okay [inaudible]. Yeah.

>> Yehuda Lindell: Well, that's -- He's -- I saw the question differently.

>>: Yeah, yeah.

>>: I have a quick question about SCAPI, which you have implemented. Are you just considering MPC based on garbling to get [inaudible]...

>> Yehuda Lindell: No.

>>: ...on your approach [inaudible].

>> Yehuda Lindell: The library -- The aim of the library is to be as generic as possible, so it has garbled circuits inside there but that's just one class. It has oblivious transfers, sigma protocols, any compilation to zero-knowledge. It has a whole lot of commitments.

And we're going to be adding primitives all the time. The idea is just to have a library where the primitives are there and you can take them and use them for anything else. It doesn't have, for example, things like the information-theoretic protocols in there yet; that will take more time. But the aim is a generic library that can be used for crypto in general, though it's more aimed at protocol development, and it's very modular. So it makes it very easy to implement and then to instantiate the primitives in different ways and compare.

>>: Does it implement just simulations or can it be used for real applications? Is it a discrete event simulation, for example? Or, no, you are implementing the actual protocol?

>> Yehuda Lindell: No, you're implementing the actual protocol. So the protocols that we've put in there are all ones that are fully secure, fully correct implementations.

And their efficiency is based on the fact that actually a lot of the [inaudible] primitives are in C but wrapped with JNI. So you work in Java but you get the efficiency of C.

>> Seny Kamara: We have time for one more question.

>>: What are your thoughts about fairness?

>> Yehuda Lindell: So to be honest with you I think the world sucks and it's not fair at all.

[Audience laughter]

But since we said we're not talking philosophy: so we know that you can do much more with fairness than we ever thought. A lot of things are fair. Can we do it efficiently? I actually thought about this recently. Can we take a small function like AND -- can we come up with a really efficient protocol for doing fair AND? [Inaudible] is saying, "Yes, of course we can."

You know how to do it? I don't know.

>>: Take your protocol and --

>> Yehuda Lindell: Yeah, but how concretely efficient is it? I don't know. That's...

>>: It's two rounds, right?

>> Yehuda Lindell: Well, we have to do this secret-sharing preprocessing. Maybe that's easy to do, I don't know. But I think -- or doing a million ANDs, for example -- or let's say a thousand ANDs, because the number of rounds is...

>>: Any party -- a trusted party, where you can use this model where if a person like stops...

>> Yehuda Lindell: Oh, the optimistic model. I really like -- if you're talking in terms of practice -- I thought you were the theoretician. Sorry, I apologize. I think the optimistic model is fantastic. I think it's a really great model. The problem is there's no business incentive for anyone to ever be the optimistic server, because no one's going to pay you -- I mean, let's say they pay you per use. The whole point of the optimistic model is that once the server is there, there's no point in anybody cheating. But let's say you could overcome that and you have someone who really would be this optimistic server and they would do it for the good of mankind -- and there are people who are interested in the good of mankind; there are not many, but there are. And I think in terms of practicality, it makes a lot of sense. And then this is a very good question: can we take all of these protocols that we have, and what will their efficiency be when we actually work in the optimistic model? If I remember, in one of the more generic constructions you essentially have to have this encryption of the value under the public key of the -- this is going to be hard to do within circuits. So I think it's actually a great research direction to try and do concretely efficient protocols, say, based on these circuits, for the optimistic model. It's a nice idea.

>> Seny Kamara: Okay. Let's thank the speaker.

[Applause]
