>> Kristin Lauter: Okay. So this morning, I'm very pleased to welcome you to the cloud cryptography workshop. As most of you know, this workshop was conceived and organized by Seny Kamara and Melissa Chase from the research cryptography group. And they've done a fantastic job of organize. I'm so glad you're all here. We have a great list of speakers and talks relating to cloud cryptography. So to kick off the conference, we're very pleased to have John Manferdelli, distinguished engineer at Microsoft Research to give us a little bit of and overview and background on cloud computing security. Okay. Thank you. Okay. Thank you. >> John Manferdelli: Thanks. Actually it's punishment. My father always wanted me to be an engineer, and I wasn't, so the title was punishment for something I did. So I want to talk -- we actually have two conferences going on at the same time. We have a cloud security conference workshop on focused on systems aspects of cloud security. In fact, I tried to have them on so they didn't overlap, because I wanted to go to both, but because the schedule's today, they're both on. So I'll be bouncing back and forth. So you've heard a lot about clouds. In some ways there's not too much new. Grids were a little bit like clouds for shared infrastructure for computing. They've gotten very popular, partly because of scale and partly because people have this sense that they're going to save a lot of money by using clouds. And I'll explain to you the reasoning behind that, which is a little bit flawed. It's mostly flawed around security. But it has a few other problems too. So let me give you background that you all know. There's a tremendous asymmetry between attackers and commercial operations. Generally even clients -- everybody who's under the misapprehension that we've got things under control, they're just wrong. There are new attacks every day. If somebody wants to have a targeted attack, they can do it. It's not like they stay up at night saying gee, I wonder how I'm going to get this new one to get poor Orr over there if they want something. The clouds have -- do have new -- offer new uses. And probably the biggest one is just as a huge data store, common data store where you can bring computing to them, look at an enormous amount of data, whether it's social networking data or search data and analyze them. The most corporations, if you ask Bill Gates, he thinks a huge part of IT will move to the cloud as cost saving. So that will be a big business. Even for somebody like the government just being able to use the shared infrastructure with something simple as computing -- I'm sorry, communicating, bandwidth, is a good deal because the cloud providers get preferential rates on things like that. Many people, including many of the applications you're going to think about in this conference, really have to do with clouds and clients interacting. It's not just about IT running their payroll or whatever they want to do in the cloud. But some service that has a client side on a mobile phone, on a PC, on whatever. And a huge service in the -- in these clouds somewhere. And so the whole, you know, actually people really do now have to think about distributed security. They had to ever since they plugged in the Internet, but now it's -- you're really going to pay attention to clouds you have to do that. And there are geopolitical considerations. Some are obvious, some are not. Privacy concerns in this countries are different than others. 
If you're in a data center and it's being used as a botnet, you run the risk of being closed down, because somebody has to find the botnet and the cloud provider hasn't isolated things well enough to say it's just that thing over there. And there have certainly been occasions where that's happened. Also, cloud infrastructure is usually pretty homogenous. There is a case where Amazon -- and this wasn't even through a malicious attack -- went down for 18 hours because there was a single XML flaw in a protocol that was used everywhere, and somebody accidentally triggered it. It was called a packet of death. But even with that, there's really quite a strong drive to the cloud because of this alleged -- again, it's not completely crazy -- tenfold cost benefit. The final thing is when you control your own computing infrastructure you're focused on what you're trying to do, the mission, the problem, and so whatever you do, the security is tailored to the thing you want to do. If you're not too worried you do one set of things. If you're very worried, you have a lot of control over what runs where, when, how you audit it, how you look at it, what you encrypt, how you share it with other organizations. But in the cloud, just as in the PC infrastructure, the focus is on cost savings and brand and marketing. So sometimes you're in this whacky situation where you say to somebody, that threat is really bad, and they say to you, it's not so bad, because when they get hit, they won't know about it. A security guy goes crazy at something like that. But again, for brand preservation there's some thinking that that's not quite as bad as having all your data stolen. So there are many interacting things. So here's the economics. This might be interesting just as a very rough guide of why people think clouds are going to be a lot cheaper. So if you have a very large datacenter, let's say a couple hundred thousand machines, there's the cost of buying the hardware -- because you're a big customer, you have preferential rates. And there are the savings you get from power. There's about a fourfold difference among power rates, and you obviously get the best rate at the datacenter. Typically you put the datacenter close to a power source, and the argument for a good rate is they lose less power in transmission. And bandwidth costs, those are low. So all together, the total cost of ownership of a server in a cloud is less than half of, you know, what I would go out, or some small company would go out, and buy a server for. Maybe even a little less. So there's a twofold improvement there. The second improvement has to do with utilization. The argument is that when I run a server in the cloud I can get about four times as much work by sharing loads on it. Two to four times, depending on who you talk to. And two times four is eight. Well, there's the eightfold savings. And in some cases it really does work, for standard services like search or mail, which are pretty undifferentiated services. Everybody kind of does the same thing, and that works out pretty well. It works less well when anybody cares about security. So in data centers now -- this is probably the right crowd; this drives me crazy -- I don't know of a datacenter where, if you want to run a secure service, you don't hand the datacenter operator your private keys. There's something wrong with that. Several things. That's actually the way it is now. This mostly was for the other talk. 
For the non believers, people who think security, you know, basically is under control. I told my wife a couple days before the conferences I could go to the register and find the latest five stories about what bad thing happened and this is prejudiced towards Microsoft or against Microsoft I guess I should say. So again, emergency patch for Windows. It's a serious problem. I think it just got patched yesterday. What day is today. BP, while they're trying to protect their brand actually was infiltrated. So some information they didn't want out actually got out from an attack. The iPhone had a Trojan, which you'd expect. Citigroup, which really is well intentioned, they want to save their own money. They're at risk when people use their application and something bad happens, an attack on the 27th. Everybody of remembers the Google attack. It's infamous. And that was a cloud attack. Windows, you know, if anybody thinks the problem is all solved, Tavis at Google found a bug in Windows, and escalation of privilege bug that was there basically since the start of Windows. Flash is just too easy a target, so we won't talk about that. And I got this graph of attacks from our malware center. I don't even know what the scale s but you could tell it's bad. [laughter] it's very bad. The problem even on a client machine, on a server, one you control, there are two sets of problems. The first set of problems is the hardware. What it can isolate and where? There's some pieces of hardware which are just not well suited to isolation for competing adversarial applications. The screen is one. But that pales in comparison to the software problem. Which mostly has to do with us using OS legacy which grew up in a time when time sharing -- if there was any security at all, it was the timesharing model where there was a -- you know, a single administrator who knew everybody who used the machines who actually personally installed and understand all the important software and knew the people who were going to log in. Gave them their logins. Which is of course not the world we live in at all. PCs made that a little bit worse because it went from several people using the same machine to one person using the same machine and why were you protecting yourself from yourself? And then people plugged their machine into the Internet and things went bad. OSs are huge. And everybody gets everything for some reason. Configuration is very difficult. I personally don't know how my machine is configured. And I try to understand from time to. If you look in your Windows root store, you'd be frightened. But the major problem is vulnerability anywhere in the stack affects everything. Which is very worrisome in a shared environment like computing infrastructure. In most cases I don't worry too much about attacks which involve physical possession. But in the cloud setting that is a problem. This datacenter is run by insiders. You never see them. They don't -- they're not loyal to you, they're loyal to the organization that hired them or possibly somebody else, which is even worse. So insider attacks in these things are major problems. And people can dismiss them in an enterprise setting. You can't walk away from them in a cloud datacenter setting. Still, the Internet is great at collaboration and sharing. We all, you know, get papers that are posted on on the Internet. It's really a wonderful environment to get information. So you sure don't want to give that up. 
The commercial infrastructure with all these problems has moved to clouds, and it's going to take those problems with it. So what are we going to do about it? That's sort of the question. I had a -- I was on a government panel. I think -- and maybe people in the room will agree -- the usual way to deal with security I would call phenomenological, or ad hoc, which is probably a less classy word for the same thing. People sort of try to prevent attacks by noticing what the last bad thing that happened was and fixing that. There's a lot of research in detecting attacks. It's interesting. Against a clever person, it's very hard to detect attacks. So the research is quite interesting, but again, the success rate isn't great. You wouldn't want your life to depend on always finding out when you got attacked. Then they try to mitigate the attack: okay, we can't really fix it, but we're going to do things so it's hard for people to exploit it. And finally, recovery: when a disaster happens, where is the CD that we go and reinitialize everything with? And these are all valuable, actually, mostly for analysis to see what's around. So I won't list all these. But one example is address space layout randomization, which sort of combats a little bit of the homogeneity problem. It will not stop all attacks. But the nice thing about the randomization is that even if you only see 10 percent, one percent of the attacks, because the randomization causes them to fault in some way, you can detect attacks. It's a very sensitive indicator of attacks in the wild. It's not a way to stay safe, though; it's just a way to find out what's going on. So we're losing. In both conferences the question is what could we do in a more fundamental way if this were -- you know, if people treated security as a science and not the sort of ad hoc, phenomenological thing they do. And so you can look at a couple of models -- the physics model, which isn't completely appropriate. It's probably [inaudible] subtle, not malicious; in the world we're in, people are not necessarily subtle, but they are malicious. And they often get away with it. So the role of the adversary enters. But that doesn't mean you should give up. Economics is in that same model. So I went back to my old physics books, and, you know, if you just sort of glance at what's important to scientists: they observe and guess, which I could say we're doing now, but they want nice theories that predict and explain phenomena. You know, you shouldn't be able to make money as a scientist for predicting the past. Theories have to predict phenomena before they happen, which apparently, you know, is not always the case in security. And they look for simple laws that are comprehensible, so you can analyze what's going on and make the prediction. And finally, they're verifiable. I mean, despite all the crabbing among physicists, if you have an experiment that disproves a theory, you probably don't get the kind of crap you'd get if you were in a political debate. It's just, you know, it didn't work. Too bad. So the goal, generally, for security in the cloud is that you want to protect both from the classic external attacks that we see now and also from internal attacks, insider attacks. And the environment has changed. Now you're at the mercy of insiders. You can't deny it. It's not exactly a hardware problem. In some cases it's very hard to protect hardware if somebody has physical possession. 
But, for example, in a datacenter run by somebody, Microsoft who doesn't especially like being sued, physical possession is less of an issue than security. You can have cameras on the machines all the time. There are very few people, and in some data centers you roll up the truck with all the computers, and you never open it. When things break, they're gone. Until you throw away the container can. So physical security can be a problem, but it's not as big a problem as software. So I'd like to go back and say, well, what do we want from these data centers, and these are the sort of the buzz words you'd all look for. But you'd like a simple model for why whatever you're going to do provides them. Not complicated we did this thing here and did this thing there and maybe if everything works out we won't get hurt. So the first conference looks at this systemically. And their basic is what can we do so that when you run your software on a machine you have -- there are three properties? You know when -- what software and hardware you're relying on for security. You can measure it in your software somehow. Cryptographically, actually. The second thing is when you're sharing resources, let's focus on the CPU, they really are isolated. And again, there are still issues with things like side channels. But again many of them can actually be addressed by hardware by assigning bandwidth, that sort of stuff. But really isolated. Not what happens between one process on Windows or Linux, for that matter, and another process, like they're isolated. At thing is the program, not the people, should have secrets that only they see. The guys if the datacenter ought not to see your credits. And finally you ought to be able to do this in a manageable way and you should be able to manage this cryptographically over the Internet. So in the systems setting this is one solution which would make clouds believable for somebody who had sensitive data. And again, once you have this environment, you never, ever, ever, let anything out unencrypted outside your partition. The disks always contain encrypted data. That's it. You always transmit stuff encrypted. So if you mess up, it's your own fault. Correspondingly, and this is an example, I think much of what you'll talk about goes to another problem, let's just give up. Let's say the datacenter's run by Joseph Stalin and we still want to get some useful work out of it. Is there some simple guarantee, some simple way, and are there some simple things we can do to do useful work? And I think it turns out there are. A lot. They'll get better. But even the simple ability to store stuff because the cloud is guide good at communication and storage, in an encrypted manner, get to it from anywhere. With the additional ability to search it, that's actually a huge benefit. And if your datacenter is run by Stalin, you don't worry so much because you're also using a datacenter run by Chairman Mao and somebody else. Whoever the bad American guy is. So there's some redundancy. So that's the goal of this meeting. And I think it's actually quite important. I'm a little bit skeptical about the hype -- in fact, I'm a lot skeptical about the cloud. But the cloud will have a lot of benefits. It will eventually do many of the things people hope they do, but not without being safe. And right now they're not safe. So I look forward to conference. I will be in and out today. But I will mostly be here tomorrow. And I want to thank you all for coming. 
I'm really glad they're being recorded, so the ones I miss today I'll get to see. >>: [inaudible] conference? >> John Manferdelli: Yeah. Maybe. Actually it turns out -- so the first day it was quite different. The second day we're going to talk about TPMs, one of the things that lets you boot these things. And we're going to talk about searchable symmetric encryption. But you're going to hear that talk here in this session, too. We duplicated it. You're welcome to come, but I don't know that -- I think we've arranged it so that you won't have to, generally. Okay. [applause]. >> Ran Canetti: Okay. So when preparing the talk, I wasn't sure what the audience would be, so I [inaudible] the first half to be kind of more for a wider audience and the second half more technical. But I also planned it for a little bit more time, so maybe the second half will come out shorter. Anyway, so what does this code do? Well, if you know C, you can stare at it for a few minutes and you can see this program basically counts the number of primes from one to a hundred. There are two loops, one inside the other. It checks iteratively. It is very easy to see. >>: [inaudible]. >> Ran Canetti: Yeah? >>: [inaudible]. >> Ran Canetti: No, I think it was -- yeah, it is cut off a bit. Scoot to the left, yeah. >>: [inaudible]. It's up to you. >> Ran Canetti: In fact, you know, there is -- maybe we can just put it in this mode. Whoops. >>: [inaudible]. >> Ran Canetti: Nothing is cut off. Okay. So what does this code do? So it's hard to see -- if you stare at it, it's hard to see what this code does, but, in fact, this code does the same thing as the first code. Both programs count the number of primes from one to 100. In fact, the second code was generated from the first code via a mechanical transformation. It's just a number of very simple syntactical operations: changing the names of variables, turning loops into recursion, et cetera, stuff like that. (A small sketch of this kind of transformation appears at the end of this passage.) So this is the process of program obfuscation. And what is program obfuscation? It's very different things to different people. So from one point of view, program obfuscation is an art form. It's the art of writing unintelligible or surprising code. In fact, there are several yearly contests and lots of creative code out there. So here is the winning entry in the 15th International Obfuscated C Code Contest, in 2000. The thing started in '85, I guess. So the author said -- so this is a C program; it's hard to tell from far away -- but instead of making one self-reproducing program, what I made was a program that generates a set of mutually reproducing programs, all of them with cool layout. So if you feed it with itself, you get another one in the set, et cetera. So this is one way to use obfuscation. So here is the winning entry in the same contest four years later. This is an operating system. Maybe a bit smaller than Windows. This is a 32-bit multitasking operating system for x86 computers with a graphical user interface and a file system, support for loading and executing user applications in ELF binary format, with graphics and a command shell and text editor, et cetera. So this is another cool thing. Okay. >>: [inaudible]. >> Ran Canetti: It runs. Okay. So this is one way of looking at program obfuscation. 
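[Editor's note: the slides themselves are not reproduced in the transcript. As a rough, hedged illustration of the kind of program and mechanical source-level transformation being described -- renaming variables, turning loops into recursion -- here is a minimal sketch in Python rather than the C shown on the slide; the names and structure are invented for illustration, not the actual slide or contest code.]

```python
# A minimal sketch (not the slide's actual code): count the primes from 1 to 100
# with two nested loops, checking divisibility iteratively.
def count_primes(limit=100):
    count = 0
    for n in range(2, limit + 1):
        is_prime = True
        for d in range(2, n):
            if n % d == 0:
                is_prime = False
                break
        if is_prime:
            count += 1
    return count

# The same functionality after a purely syntactic transformation: variables
# renamed to meaningless identifiers and the outer loop rewritten as recursion.
# Behavior is identical; readability is not.
def q(z=2, w=0, y=100):
    if z > y:
        return w
    return q(z + 1, w + all(z % j for j in range(2, z)), y)

print(count_primes(), q())  # both print 25
```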
But program obfuscation can also be something else. It's also a useful tool for hackers, right? So if you're a hacker, if you want to attack a system, you want to hide what your program is doing, and, in fact, many viruses, worms, and malicious code try to hide their code by using a number of techniques. Some of the techniques make sure that the code that is running on the computer is different than the code that is written on the disk. The code may even modify itself as it's running, and, you know, maybe even keep only the one piece of code that is running in the clear while the rest is encrypted, and the keys are changed all the time by the program itself. So there are all kinds of different techniques to hide the code. And, in fact, here is a Web page that was blocked by an intrusion prevention system, something that a student at Tel Aviv found. So this is what it looks like if you just read it. If you just look at the Web page, maybe you think it's a picture or whatever, a JPEG. But it turns out that underneath, this is basically a redirection to some malicious website: as soon as you click on some pixel or some frame on the screen, it redirects you, without you realizing, to some malicious website. So that's another thing that obfuscation is being used for. But obfuscation is also a business from the point of view of the good guys. So if you look on the Web and you look for program obfuscation, you will find many vendors that are willing to sell you obfuscation tools. That is, compilers that are going to turn your code into something which hopefully has the same functionality but looks unintelligible to others. And there are very many good reasons to do this if you're a software vendor. You want to put software up on the Web or out in public, and you want to protect your IP; you don't want people to understand what your code is doing. You want to maybe prevent people from modifying your code in intelligent ways. You may want obfuscation as a good way to stop hackers, or at least to slow them down. So there are good reasons to obfuscate code. And people are doing it. So again, there are many techniques for obfuscation out there, in those commercial obfuscation tools. One set of tools obfuscates the source code: you do variable renaming, you change the control structure, as we've seen before, and you may do some higher-level semantic changes in your program. Another set of techniques obfuscates the object code: forget the source code, you just look at the machine code and you add redundant operations, you vary opcodes, you vary addressing modes -- you know, the same operation could be done in many different ways -- then you encrypt unused modules, et cetera, et cetera. But most of the techniques are proprietary, and the name of the game here is security by obscurity. So I won't tell you what I'm doing and you will never be able to understand it. Or maybe not never, but it will slow you down at least. So people are doing it. But let's think for a second what we could do if we really had a good, really secure code obfuscation mechanism. It could be really great. It's a really cool thing to have. So assume we really could take software and obfuscate it and make it look like tamper-proof hardware, right? So you can't really make it look exactly like hardware, because software you can always duplicate, whereas hardware you cannot. But forget that. And let's assume it prevents everything else. Then we could do lots of cool things, right? So as I said before, we could publicize code without fear of misuse. 
So this is great for code distribution, for download; it's also great for cloud computing. I could give my code to the cloud. It would run it for me but would not be able to understand anything from it or how it runs, and it wouldn't be able to meaningfully modify it. And so even if I don't have secrets -- all the [inaudible] is public, I don't have any secrets that the cloud doesn't know -- even then it won't be able to do things in a meaningful way. And furthermore, if I actually do have some secrets, and I write my program in such a way that the output is encrypted and only I have the key, then the server won't even know what it's doing, right? So really, if I could obfuscate everything in an efficient and effective way, security for cloud computing would really not be a problem. >>: So you assume that actually did the [inaudible]. >> Ran Canetti: So fully homomorphic encryption is a technique that doesn't go all the way to obfuscation, but it's very useful. >>: [inaudible] in general. >> Ran Canetti: Then -- right. Exactly. So it's more general than fully homomorphic encryption. I'm saying in general. I take my program and write the input in a -- you know, I don't even have to have fully homomorphic encryption as a technique explicitly. I can just write a program that takes encrypted inputs, has the key in the clear inside the program, decrypts with the key, runs the computation, encrypts again -- so that's a program with the key embedded in it -- and then I obfuscate it. So it's the same functionality, but now the key is obfuscated and there's nothing you can do. So clearly fully homomorphic encryption is a very useful technique here, you know, to do this. But just as an idea of what you are after, that would be great. So that's one set of applications. We could stop here, right, because [inaudible], you know, workshop. But there are other applications. So you could publicize data while putting, you know, curbs on its usage, right? So think of putting medical records online, but you want to allow them to be used in certain ways but not others. So you can just do it, right, because you publish the data together with obfuscated code that does the access control. Obfuscation can also really simplify secure distributed tasks, right? Because if I want to run a protocol -- say I want to do a vote among a number of people, I want to collect their votes and get the result -- I just write a piece of code that is passed from one to the next; it takes each vote, accumulates the tally, and outputs the tally at the end. I obfuscate it, I let the people run it one by one, and then we get the result. So it becomes almost trivial. It also has some nice game-theoretic properties; there is a paper by Micali and Shelat that actually uses exactly this idea without calling it obfuscation. They assume there is a token that is passed [inaudible] and that has nice properties. And if you want to go further afield into cryptography, we can get public-key encryption from symmetric encryption, and signatures from MACs, and [inaudible] functions -- so we can get all those beautiful things. So in general, obfuscation is an immensely powerful tool. It could do many things. But unfortunately, all the techniques that are out there that we know of are all heuristic. And essentially they are all eventually reversible. There's no real security there. It's just an annoyance to the hacker, right? 
And, in fact, the common wisdom is that all the obfuscation methods are doomed to failure at some point or another. That's the common wisdom, and actually it's been articulated very nicely by this tech writer. So: secure obfuscation is unlikely. The computer ultimately has to decipher and follow a software program's true instructions. Each new obfuscation technique has to abide by this requirement and, thus, will be reverse engineered. The computer has to run the program; eventually you just look at what the computer runs, and you'll be able to understand what's going on. So that's the common wisdom. And is it true? Can we have really unbreakable obfuscation? And in fact, since we are cryptographers, we like to think about things mathematically, not just physically and empirically: how do we really define what obfuscation means? So there's been a lot of work in the cryptographic community in the last 10 years or so on obfuscation, in a rigorous way. And here's a definition, Virtual Black Box, that was proposed by Barak et al. So a general obfuscator is a compiler -- essentially an inherently randomized compiler -- that has the following properties. First, it preserves functionality. So for any program P, the output program Q, which is what you get after running the obfuscator on P, has the same functionality, maybe except with negligible probability over the coins of the obfuscator. And it preserves running time, so Q doesn't run too much slower than P. And then there is this other property, that it actually obfuscates. And what does it mean? Very intuitively, at a high level it means that having full access to the code of Q should not give the adversary any computational advantage on top of having oracle access, black box access -- you know, access to secure tamper-proof hardware that runs the program P. And we try to quantify this in a mathematical way. So here is an attempt, in the tradition of cryptography: for any polynomial-time adversary A, there exists a polynomial-time simulator -- another adversary, a machine S -- such that for any program P, whatever the adversary outputs after seeing an obfuscated version of P can be output by S having only oracle access to P. But this is obviously too strong, right, because A could just output its input, which is the obfuscated program, and S could never output any program that has the same functionality; and if this program does anything useful, then of course the two will be distinguishable. So this is too strong. So let's settle for something slightly weaker. Now I will say that the adversary only tries to compute some predicate of the program -- it just outputs one bit. And then we want to say that the simulator does the same thing: it manages to predict this predicate with roughly the same probability. Or equivalently, the probability that the adversary outputs one is the same as the probability that the simulator outputs one. And this is called Virtual Black Box. 
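[Editor's note: since the definition is only stated verbally here, the following is one way to write down the predicate version being described -- my rendering of the Barak et al. formulation with the quantifiers made explicit, not a slide from the talk.]

```latex
% Rough rendering of the Virtual Black Box requirement described above.
% O is the obfuscator; P ranges over programs in the family; negl is negligible.
\[
\forall\,\text{PPT } A \;\;\exists\,\text{PPT } S \;\;\forall P:\qquad
\Bigl|\;\Pr\bigl[\,A\bigl(O(P)\bigr)=1\,\bigr]\;-\;\Pr\bigl[\,S^{P}\bigl(1^{|P|}\bigr)=1\,\bigr]\;\Bigr|
\;\le\; \mathrm{negl}\bigl(|P|\bigr),
\]
% together with functionality preservation (O(P) computes the same function as P,
% except with negligible probability over O's coins) and at most polynomial slowdown.
```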
So the main result of this paper is that even this weakened definition is unachievable, right? You cannot have compilers, obfuscators, that have this property. In the interest of time, I'm not going to go over the proof, although it's rather simple. The proof actually exhibits one specific class of programs that cannot be obfuscated. And essentially the idea there is to do some diagonalization. The fact that the adversary has some code for the program, no matter how obfuscated, allows it to do things that the simulator cannot do, having just oracle access -- in a similar way to the argument against the earlier definition, the one that allowed the adversary to output long strings, but in a slightly more sophisticated way. But anyway, the conclusion is that there exists a family of programs that cannot be obfuscated. And the idea again is to use the difference that the adversary has some code that runs the program, no matter how obfuscated, while the simulator has nothing, just oracle access. And there is a big difference there. So what does this impossibility mean? It definitely resonates with the popular belief, right; that's why it's kind of very convincing. However, if you look at it a bit more closely, it only shows that certain classes of programs cannot be obfuscated, right? So that rules out a generic compiler, an obfuscator that can obfuscate everything, okay, but there may still be other, you know, specific classes of programs that can be obfuscated. It also only considers a relatively strong notion, this Virtual Black Box, this equivalence to tamper-proof hardware, and essentially the impossibility uses this particular feature of the definition in a very strong way. So this leaves open the questions: what about obfuscation of specific classes of programs, and what about weaker notions of obfuscation? And indeed there has been a lot of research in the last nine or ten years on different variants of obfuscation. Just to say it very briefly: one direction of research is work that obfuscates specific program families under different variants of the notion. Another set of works showed connections between obfuscation and other cryptographic tasks: encryption, signatures, and other things. Another set of works extends these impossibility results to stronger impossibility results in other cases, if you require more [inaudible] things like information, et cetera. And then there was another set of works that investigates different notions, relaxations, additional features, et cetera. So that's essentially the work that has been done. So in the rest of this talk, in the remaining time, which is not too much, I'm going to concentrate on a very specific thing. I'm going to concentrate on the original Virtual Black Box definition, although it's stronger [inaudible]. And the salient characteristic of this definition, which actually I didn't say before, is that you need to obfuscate any program in the family. The definition says that the obfuscator should work for any program in the family. And this requirement -- for any program in the family -- although it's very natural in the context of obfuscation, this is what you would really want, actually makes the definition very strong, as you will see. And here we can only do a few things. But there are actually things that we can do. And this is point obfuscation and friends. And I'm going to talk more about this. So here is the motivation for the problem. So for point obfuscation, assume Alice wants to post a puzzle in the newspaper -- one of those puzzles: find the differences between these two pictures. 
And she wants to post it together with a solution, so that those who solve the puzzle can verify that they found the right solution. But the solution should not be in the clear. Only those who actually found it themselves should be able to verify, right? So it's somewhat obfuscated, like here, right: the solution is here, but it's obfuscated because it's written backwards, upside down. But Alice wants something slightly better than that. So again, what you want is that correct solutions will be accepted and incorrect solutions will be rejected. That's the program. And no information will be leaked by the program other than the ability to check different solutions. Right? So how do you do that? Okay. So assume that we had an obfuscator for the following family of programs, which I call point programs. Here is a program in the family: it's a program that has a value A in its belly, and then for each input it just checks whether the input equals the value in the belly and says yes or no. Okay? So that's the program. That's the family of programs: for each A there is a program I sub A. Then if we had an obfuscator for this, Alice could post an obfuscated version of I sub A, where A is the solution, and be done with it, right -- in the newspaper or in the cloud or wherever you want. And functionality preservation implies that it's going to work, and the Virtual Black Box property implies that it's secure, so nobody learns anything beyond what black-box access to this gives. So, in fact, it turns out that this can actually be constructed, okay? In fact, we could construct it before we talked about obfuscation; we just called it differently. But here is the construction. So let G be a group of large prime order. And what I'm going to do, I'm going to say the obfuscated version of I sub A, in shorthand, is going to be the pair R and R to the A, where R is random in the group. Or, more precisely, if you [inaudible] into the program, here is the program: you have two constants; one is R and the other one, call it B, is R to the A. Instead of having A in the clear, you have those two constants. And basically, when you get an input, you check whether R to the input equals B. Okay? So this is an obfuscated version of that point program. Okay? And now the challenge is to show that this is really secure -- to actually prove that this is the case. (A small code sketch of this construction follows.) 
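[Editor's note: to make the construction concrete, here is a minimal Python sketch of the point obfuscator just described. The tiny safe prime, the helper names, and the parameter choices are mine for illustration; a real instantiation needs a cryptographically large prime-order group, and its security rests on the strong DDH variant discussed next.]

```python
import secrets

# Toy sketch of the point obfuscator described above: O(I_a) = (r, r^a) in a
# prime-order group, with the check "accept x iff r^x == r^a".
P = 1019   # toy safe prime: P = 2*Q + 1 (far too small for real security)
Q = 509    # prime order of the subgroup of squares mod P; points live in Z_Q

def obfuscate_point(a):
    """Return the 'obfuscated program' (r, r^a mod P) for the point program I_a."""
    r = 1
    while r == 1:                                   # avoid the degenerate element
        r = pow(secrets.randbelow(P - 2) + 2, 2, P)  # random element of the order-Q subgroup
    return (r, pow(r, a, P))

def run_obfuscated(program, x):
    """Evaluate the obfuscated point program on input x."""
    r, b = program
    return pow(r, x, P) == b

prog = obfuscate_point(123)       # the hidden point
print(run_obfuscated(prog, 123))  # True
print(run_obfuscated(prog, 7))    # False
```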
So actually I was planning to spend a few minutes on the proof, just to get an idea of how these things look. So [inaudible]. Functionality preservation is clear: this program computes the right function. And we show Virtual Black Box under a strong version of the Decision Diffie-Hellman assumption. What's this strong version? The Decision Diffie-Hellman assumption says that R, R to the A, R to the B, and R to the AB are indistinguishable from four random values, all uniform in the group, where A and B are chosen at random. And here I'm going to require that this is the case even when B is chosen at random but A is chosen from any distribution which is well spread, which has enough entropy -- that is, it has super-logarithmic min-entropy in the security parameter. Okay? I'm going out on a limb here: of course, when A has only, say, logarithmic min-entropy, or there are some values with some nonlinearity property, then of course this thing is false. But I am saying that as soon as you have enough entropy, it's okay. So this is true in the generic group model, et cetera, but it's a strong assumption. So how do you prove it? It's two steps. The first step is to show that this obfuscation satisfies some slightly weaker notion of security -- not weaker, a different notion of security -- which I call distributional indistinguishability. What does it mean? That for any well-spread distribution D -- any distribution with enough entropy -- if you get an obfuscation of I sub A where A is taken from D, it looks to you just the same as an obfuscation of I sub A where A is random, uniform. Okay? So no efficient distinguisher can tell the difference between the two things. (Both notions are written out in symbols at the end of this passage.) So if we look again, what it means in our case is that the pair R, R to the A, where A is taken from any well-spread distribution, looks like R and R to the A where A is uniform. Now, this follows directly from the assumption before, so it's immediate. So now the hard part is to show that from this DI you can get Virtual Black Box with its simulation definition. And this is actually generic -- it has nothing to do with the specific scheme; it's a generic statement that any point obfuscation that satisfies DI already satisfies VBB. So what do I need to show? I need to show that for any adversary A there is a simulator S such that for any little a -- any single specific secret a that is fixed -- the probability that A outputs one given an obfuscation of I sub a is roughly the same as the probability that S outputs one with oracle access to I sub a. So how do I do that? What can S do? S has this oracle that outputs one on one point that you have no idea about, and otherwise outputs zero. It hardly seems useful, right, this oracle? So what does S do? What it can do is choose a random point R, compute an obfuscation of I sub R, and run A on this. And hopefully that's going to be a [inaudible]. But that's not something that really works, right, because -- remember there's little a and big A -- this has to work for any fixed little a. But the point is that the adversary, big A, can always have some values in its belly and just check the obfuscated program on these values, right? And if our fixed point happened to be one of those values that the adversary asks about, then the simulation will not work. But intuitively this is not a real attack, you know, because this is really using the program as a black box, so we should be able to get around it. So the claim is that this is all the adversary can do: the only thing the adversary can do is check the program on some fixed polynomial number of values. And the proof is easy given this DI. Assume that there is an adversary and a set T such that A can tell the difference between an obfuscation of values in T versus an obfuscation of values not in T, and suppose T has super-polynomial size. Then we can contradict DI, because we just use the distribution which is uniform over T; since T is of super-polynomial size, this distribution has enough entropy, and then this adversary can tell the difference between something uniform over T versus something totally uniform. So therefore, we know that for any adversary there exists such a set T of polynomial size. 
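[Editor's note: for reference, here is one way to write down in symbols the assumption and the intermediate notion this argument uses -- my rendering of what is said above, not a slide from the talk. G is a group of prime order q, and "well spread" means super-logarithmic min-entropy.]

```latex
% Strong DDH variant: for r, b uniform and a drawn from any well-spread distribution D over Z_q,
\[
\bigl(r,\; r^{a},\; r^{b},\; r^{ab}\bigr) \;\approx_c\; \bigl(g_1,\, g_2,\, g_3,\, g_4\bigr),
\qquad g_i \leftarrow G \ \text{uniform.}
\]
% Distributional indistinguishability (DI) for a point obfuscator O: for every well-spread D,
\[
\bigl\{\, O(I_a) \,\bigr\}_{a \leftarrow D} \;\approx_c\; \bigl\{\, O(I_u) \,\bigr\}_{u \leftarrow \mathbb{Z}_q},
\]
% which for the construction above says exactly that (r, r^a) with a drawn from D
% is indistinguishable from (r, r^u) with u uniform.
```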
Now the simulator is in business, because here's what the simulator will do. Here's this adversary, and with this adversary comes, non-uniformly, this set T. So what the simulator will do is just ask its oracle on all the points in T. If it finds the right point, then simulation is easy: it just gives the adversary an obfuscation of I sub A for that point. Otherwise, it can just give the adversary what we wanted to do before, an obfuscation of a random point. And this will work. Okay? >>: [inaudible] so how did you find T? You [inaudible] if and only if [inaudible]. >> Ran Canetti: So no. There is a value which is the mean value of the adversary's output when it is given an obfuscation of a random value -- so there is a specific value, say it's point four. And I'm saying T is the set of all those values [inaudible] something which is significantly different from point four. And [inaudible] has to be polynomial; otherwise we contradict the other one. Anyway, so that's the proof. Now, essentially -- okay. So this is one construction. And, in fact, there are other constructions for point obfuscators. One by Hoeteck, from very strong one-way permutations. Others from other assumptions, from strong versions of LWE, but those actually don't get all the way to every input; they need some distribution on the inputs. So slightly weaker. But there are constructions. And another question: all of these constructions need strong assumptions; is that inherent? It turns out that it is, to some extent. Again, there are results by Hoeteck that if you have point obfuscators at all, then there does not exist a non-trivial, sub-exponential algorithm for circuit SAT. And if you have such an obfuscator with public randomness -- meaning all the randomness the obfuscator uses is put up in the clear, which is how the one we saw works -- then there exist super-strong one-way functions: one-way functions such that any adversary can invert them only on a polynomial number of points. Okay? So it inherently requires strong assumptions, even for this small obfuscation task. Okay. So this looks like a nice game -- put a puzzle in the newspaper -- but maybe it's not very useful. But it turns out that we can use these ideas to do some other things. For instance, here is one simple thing that we can do. Instead of checking whether your input equals some value, maybe we can check whether a substring of the input equals some secret value: now I have a hidden value in my belly, and I want to check whether there exists a substring of the input which equals my hidden value. So why is that interesting? Think of a security vendor, a firewall vendor or whatever: you want to put on some server on the net an algorithm that tries to find viruses. But you don't want to give the virus signature to the server, because you don't trust the server; you just want to allow the server to find the viruses for you, in case they're there, while having no information about the virus. And of course you can do that, right? The server or the cloud or whatever. So this is a kind of very primitive secure cloud computing application, right: you let the cloud find the viruses for you. (A toy sketch of this substring idea, reusing the point obfuscator from before, appears below.) 
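[Editor's note: a naive toy illustration of the substring idea just described, reusing obfuscate_point / run_obfuscated / Q from the earlier sketch. The actual constructions and their security analysis are more careful than this; the helper names and encoding are mine.]

```python
def obfuscated_substring_checker(signature_bits):
    """Publish an obfuscation of the hidden signature (as a point over fixed-length
    windows); the checker scans every window of the data. Toy only: windows are
    encoded as integers mod Q, so collisions are possible at this tiny size."""
    n = len(signature_bits)
    prog = obfuscate_point(int(signature_bits, 2) % Q)  # from the earlier sketch
    def check(data_bits):
        return any(run_obfuscated(prog, int(data_bits[i:i + n], 2) % Q)
                   for i in range(len(data_bits) - n + 1))
    return check

scan = obfuscated_substring_checker("1011")
print(scan("000010110"))  # True: the window "1011" appears
print(scan("000000000"))  # False
```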
So something else that you can do: you can extend this. Here we were sort of checking whether your input exactly equals the point; maybe you want to check that it equals something close to the point, by having [inaudible] something. And this can also be done. But again, this cannot [inaudible]; that work doesn't handle the definition for every input. It needs some distribution on the inputs. So it's interesting to see if you can actually extend it to every input. But this is something else that you can do. You can extend it to more structured functions. You can do hyperplane membership. So assume that what you keep in your belly is a vector in some D-dimensional space, and you want to accept all those inputs, all those vectors, that are orthogonal to your vector -- all the vectors on the hyperplane that is orthogonal to your vector. So that's something that you can do; there's more structure, and it uses similar techniques. In fact, you can use this to get signatures with some strong properties, [inaudible] against leakage. This isn't one of the best constructions that we have, but it's something -- signatures with weak keys. >>: [inaudible]. >> Ran Canetti: Yeah. Sorry. And [inaudible]. >>: [inaudible]. >> Ran Canetti: Yeah, I didn't mention it. Yeah. Sorry. Yeah. So something else that we could do -- I'm running out of time -- but something else that we can do is, instead of having a program that just keeps one secret in its belly and tells me yes or no, you can have a program that keeps two secrets in its belly, A and B. And if I give it one secret, it gives me the other one, right? Yeah. >>: [inaudible] Easter egg. >> Ran Canetti: Okay. Yeah. Okay. So Easter eggs, okay. Why Easter eggs? >>: On Easter they go search for eggs all over the place. They used to hide it in code all over the place until the customers got mad. >> Ran Canetti: I see. Okay. Okay. So [inaudible] that's actually a good alternative name. So, the Easter egg functionality. Again, I keep this hidden value in my belly, and when I get the key, or trigger, that I expect, then I release my secret. And you can implement this in a kind of modular [inaudible] way from a point obfuscator. So what do I do? Assume that I have a point obfuscator that obfuscates points as before. What I'm going to do is publish N plus one point obfuscations. The first one is going to be a point obfuscation of this point A, the trigger from before. And the other ones are going to be obfuscations of either A, if the corresponding bit of B is one, or of some other random point, if the corresponding bit of B is zero. Okay? So now if I have the right trigger A, I can first run the first one to make sure I have the right trigger, and then I can run the other ones one by one, see if I fail or succeed, and thereby recover the corresponding bit of B, right? And if I don't have A, then I'll learn nothing from this. So that's the way to do it. (A toy sketch of this construction appears below; as we'll see in a moment, proving it secure takes a bit more care.) 
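[Editor's note: a toy sketch of the Easter-egg / digital-locker construction just described, again reusing obfuscate_point / run_obfuscated / Q from the earlier sketch. The function names and encoding are mine; and as the speaker notes next, the naive "just publish N+1 obfuscations" argument needs a composable notion of obfuscation to actually go through.]

```python
import secrets

def lock(trigger, secret_bits):
    """One obfuscation of the trigger, plus one obfuscation per secret bit:
    an obfuscation of the trigger itself for a 1-bit, of a fresh random point
    for a 0-bit (random points may collide with the trigger at this toy size)."""
    header = obfuscate_point(trigger)
    body = [obfuscate_point(trigger if b == '1' else secrets.randbelow(Q))
            for b in secret_bits]
    return header, body

def unlock(locker, candidate):
    """Return the hidden bits if the candidate trigger is correct, else None."""
    header, body = locker
    if not run_obfuscated(header, candidate):
        return None
    return ''.join('1' if run_obfuscated(prog, candidate) else '0' for prog in body)

locker = lock(42, '1011')
print(unlock(locker, 42))  # '1011'
print(unlock(locker, 7))   # None
```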
And the nice application of this is to do something we've known how to do for 30 years already, which is semantically secure encryption, right? Because basically what I'm going to do: the encryption of M under the key K is just a published obfuscation of this Easter egg with trigger K and secret M. So it looks a bit ridiculous to go back to encryption with all this, but it turns out this encryption is not so ridiculous; it gives a very strong encryption. In particular, what it gives you: it gives you security against [inaudible] keys, against weakly distributed keys, and it also gives you security against key-dependent messages -- so even if the message depends on the key. These are all things that have been studied recently. It turns out that you can get all those things from these digital lockers, or obfuscations of Easter eggs, and it turns out that this is actually almost the same thing, so you can go back and forth. But actually you can get more: you can even get encryption which is at the same time secure against weak keys and key-dependent messages, and all the mixes of those things. But it turns out that I was cheating a bit. I have three more minutes. Yeah. So I was cheating a bit in the construction that I showed you before for Easter eggs. This construction doesn't quite work as I described it, because remember what I did: I just put out there a bunch of obfuscations of values, and I kind of made the quick conclusion that if one of them is secure, then many of them put together are also secure. But it turns out that this is not so easy and, in fact, it's not true in general. You can actually construct an obfuscator which satisfies the definition, but if you see several of them, one beside the other, they stop being secure. And, in fact, we don't have any construction of point obfuscation that we can prove is composable in this way under the notion of Virtual Black Box. However, there is actually a way around it. We can relax this Virtual Black Box for a moment to allow for unbounded simulators. So now the simulator is going to be computationally unbounded, but it's still going to be able to ask only polynomially many queries to its oracle, otherwise the whole thing becomes ridiculous. And we call this Virtual Grey Box. So it's a weaker notion of security for obfuscation. However, for this notion we can show that the construction from before is actually composable -- VGB, not VBB. So that's great. And then it turns out that this Virtual Grey Box is [inaudible] because it actually suffices for these applications, for the strong encryption that we had before. And, in fact, this Easter egg functionality also implies encryption which is resilient to related-key attacks, right? So I have an encryption under a key K and under a related key -- K plus 1, [inaudible] 2K, whatever -- and it's still going to be secure. And this is work at this Crypto. Okay. So I'm running out of time. There is this other set of requirements on obfuscation, which is non-malleability: I want to be able to make sure that a program that I have cannot be changed in a meaningful way into another, related program. It seems like Virtual Black Box should guarantee that, because a simulator with only oracle access cannot change the functionality in a meaningful way, so the adversary shouldn't be able to either [inaudible]. But it turns out that it doesn't really; you have to make more requirements, and then you can define it, and then you can construct it -- but only in a very limited way. I think there are lots of interesting questions there about how to do it. And that's almost it. So one set of questions that remains is: what if you cannot obfuscate? Obfuscation is a very hard problem. What happens when you cannot do it just from scratch? But maybe with a little bit of hardware assistance you can do it. 
Maybe you can have some secure hardware -- not everything in hardware, but with some secure hardware maybe you could do things. So can we have hardware-assisted obfuscation? Of course, if we don't make any restriction on the hardware, then we can just do everything in secure hardware and be done with it. But can we minimize the security requirements on the hardware? The hardware should be very small. So there are some partial answers. We can weakly obfuscate any program given only one-out-of-two memory, simple memory. But this doesn't really give us full obfuscation. It would be nice to be able to do something like this. And another question is: can you design hardware that does it? Another question: can you use hardware that already exists widely, like a TPM, right -- every computer has this secure chip, a small chip -- to do secure obfuscation, but without loading the TPM too much? So this is a very good question, and I think it's a great research question. >>: [inaudible]. >> Ran Canetti: Yes? >>: Sorry. [inaudible]. >> Ran Canetti: Yes. >>: [inaudible] number of bits, et cetera, in the TPM [inaudible]. >> Ran Canetti: Right. >>: [inaudible]. >> Ran Canetti: So the chip -- assume the chip is slow, this TPM is slow. So you -- right. You don't want to wait for it too much. Okay. [inaudible]. Okay. So this is one set of questions about [inaudible], and there are some other, naturally more high-level questions. With this Virtual Black Box or Virtual Grey Box we can only obfuscate very simple, very specific functionalities. Can we have some generic obfuscation algorithm that obfuscates everything, or maybe almost everything, or whatever -- a larger class of programs -- with some reasonable notion of security? We know that for everything we can't, but maybe something else. And the sad thing is that we don't even have candidates. It's not that, you know, I have candidates and I don't know how to prove them secure or find the right notion -- I don't even have a candidate that does anything meaningful in a generic way. You know, usually we go over the circuit gate by gate -- you do this, you do that -- and we can get everything, right? But here it doesn't seem to work. Going gate by gate seems inherently doomed to failure somehow, because really what you want to hide is the internal, intermediate values of the computation. So can we do anything? Fully homomorphic encryption sounds like a great thing, and maybe you can use it, but, you know, I don't know. And another direction: can we come up with a set of cryptographic tools that will help with practical obfuscation problems, you know? Those people out there that want to obfuscate code, those vendors -- can we have tools that can help them come up with better products that really can be used? So that's another set of questions. And another one: maybe this lens, this different way of looking at cryptographic problems, can give us some understanding about cryptography in general. So that's it. [applause]. >> Ran Canetti: Yeah? >>: Can you go back to your definition slide, way back. >> Ran Canetti: Okay. >>: And I was confused about some -- the alternate definition you gave. And maybe I'm just misunderstanding the quantification. But just a second. >> Ran Canetti: This one? >>: Yes [inaudible] is that over all possible [inaudible] over all possible input distributions? 
>> Ran Canetti: So for -- it's for any program. So you have -- so the way [inaudible] here there was no family here, which is very program general. But if you want to do something which is achievable, you want to fix the family programs and now you want to say for any program in the family it should happen. So you fix a problem in the family, and these two random variables -these two probabilities, sorry, should be the same. >>: Again, maybe I'm misunderstanding you but isn't it the case you've got a family of programs, each of which produces one 50 percent [inaudible]. >> Ran Canetti: Uh-huh. >>: It's very easy to simulate producing one [inaudible] input but not really signalling the program. >> Ran Canetti: Yeah, but so there's one specific program that's out there that I put, you know -- so let's say in the hyperplane. The functionality, the program has a hyperplane and it gives you input, yes, if it's here or no if it's there, you know. If it's above or below or whatever. So maybe that's [inaudible] so that's not very good. So some function like this that -- so but you have a hyperplane hidden in this program. So somehow this adversary gets access to this obfuscated program, it tries to find some interesting [inaudible] so you know if this hyperplane is the first -- you know, the first coordinate is a different [inaudible] and this what it tries to do. And it seems like they should be able to do the same thing but just oracle access to the problem. So this program has the hyperplane written here somewhere. And they can answer messages yes or no above, below. >>: But the overall probability of one I can match. >> Ran Canetti: No, so ->>: The overall -- but there is one specific adversary and for any adversary that chooses simulators. So there is a program and there's another theory that tries to put one specific property or predicate of this particular problem. So this pinpoints the bit. >>: Okay. >> Ran Canetti: Okay? >>: I think so. >>: Maybe like last question [inaudible]. >>: [inaudible] two definitions, VBB and VGB? >> Ran Canetti: We can separate them for -- so there's things that [inaudible] so we can separate them for the -- so I'm trying to figure out which way it goes. So we can separate them for the circuit model, but we cannot separate them for the Turing machine model. >>: So specifically there's a program which can be obfuscated [inaudible]. >> Ran Canetti: Uh-huh. >>: It cannot be a [inaudible]. >> Ran Canetti: Yeah. In particular this -- this program that -- for the circuit -look at this family of programs from the BGI thing. Right? For the circuit model. This program is actually iteratively obfuscated by the VGB model. Just because he can break the encryption, whatever, by -- just because in the circuit model. In the Turing machine model I don't know. So, in fact, it's a good question but the difference between the two. It could very well be, you know, for all we know that any program can be obfuscated VGB obfuscated in the circuit model. >>: That's where we should leave it then. [inaudible]. [applause]. >> Craig Gentry: So basically I'm going to be -- I'm going to talk about outsourcing computation. I know there's another talk just like a couple after mine of the same words in the title and I hope that's okay since this is a cloud crypto workshop and, you know, there's bound to be some redundancy in that area. But I'm going to talk about outsourcing computation privately, verifiably and perhaps even practically. So first of all privately. 
As you may have guessed, I'd like to talk a little bit about fully homomorphic encryption, but not in as much detail as usual, since I'm a little tired of talking about it. But basically the problem is that you have some client with an input and you have a server. And the client would like to delegate the computation. And furthermore, she would like to delegate it in such a way that the cloud doesn't even see what her input is. Okay? So that means basically that she wants to encrypt her input to make it private from the cloud. And then somehow, despite the fact that it's encrypted, you would hope that for any function F that's chosen either by the cloud or the client or whatever, it's possible for the cloud, without the secret key, to compute something that looks like the encryption of F of X. Okay? And then at that point the cloud just sends it to the client and the client uses her decryption key to recover F of X. Okay? So this is a fully homomorphic encryption scheme. In the past it was called a privacy homomorphism, but nowadays we call it homomorphic encryption. And the fully just means that we would really like it to work for every possible function F that can be expressed as a boolean circuit. You know, in the past there were plenty of schemes that worked for some subsets, some small classes of functions. But the fully -- we would like it to work for all functions. So a bit more formally, a fully homomorphic encryption scheme has the usual algorithms of an encryption scheme -- KeyGen, Encrypt and Decrypt -- and it has this additional algorithm Eval, which is again a public algorithm. It doesn't take the secret key as input; it just takes some ciphertexts and a function, and it outputs another ciphertext which encrypts the function applied to the inputs. And the property we would like of that final ciphertext is compactness. There have been some schemes in the past where if you operate on the ciphertexts, the resulting ciphertext grows and grows and grows until it's really, really huge. What you'd like is that the encryption of F of X there -- let's assume that F of X is a single bit, the output of a boolean circuit -- we would like that ciphertext to be a nice, compact, normal-looking ciphertext, so that it's really fast to decrypt it and recover the output. And the point is we want to delegate processing. And if for some reason it took a really long time to decrypt this output ciphertext, that wouldn't be any good. We want the recovery time to be kind of independent of how much computation was involved in computing F. Okay? And the notion of security is the usual notion of security -- it's just semantic security. It should be hard to tell whether a zero or a one is encrypted. And I just had to throw these slides in here because I love them so much. I like to describe homomorphic encryption in terms of a physical analogy. So you can see that it kind of works in the real world in some sense. So the physical analogy I have is that you have a jewelry store owner, Alice, and she has some raw materials, and what she would like is to delegate to some workers that she has the processing of these raw materials into rings and necklaces. Okay? So this is what we want them to do. But she's worried about theft, right? If she just willy-nilly gives the workers the raw materials, she might lose some of her materials. So she's worried about theft. So in some sense she has an analogous problem to the homomorphic encryption context.
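In symbols, the correctness and compactness conditions just described amount to roughly the following (a standard formulation, not taken verbatim from the slides):

    % Correctness of Eval: for every supported circuit f and inputs x_1, ..., x_t,
    \mathsf{Dec}\bigl(sk,\ \mathsf{Eval}(pk,\ f,\ \mathsf{Enc}(pk,x_1),\ldots,\mathsf{Enc}(pk,x_t))\bigr) \;=\; f(x_1,\ldots,x_t).
    % Compactness: the evaluated ciphertext (and hence the decryption time) is
    % bounded by a fixed polynomial in the security parameter, independent of f:
    \bigl|\mathsf{Eval}(pk,\ f,\ c_1,\ldots,c_t)\bigr| \;\le\; p(\lambda).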
She would like a worker to process the raw materials without giving away complete access to the raw materials. Okay? And the solution here is kind of an encryption glove box, you know. So Alice here creates this glove box with a lock on it. To do this, she puts the raw materials in the box, and she sends that over to the worker. And, you know, the worker sticks his hands inside and assembles the ring while it's inside the box. And, assuming the box is impenetrable, it's kind of useless to him. So he sends it back to Alice, and Alice just uses her decryption key to unlock it and recover the finished piece. So that's basically the delegation-of-computation scenario that we want to get out of our encryption scheme. Essentially the homomorphic property is the gloves. Most encryption schemes don't have gloves to allow you to manipulate what's inside. Okay. So now, suppose you had a fully homomorphic encryption scheme. How would you solve maybe a typical cloud computing problem? So suppose for example that I encrypt all my files, I put them out on the cloud, and at some later point I want to retrieve some particular files, you know, some files that have some combination of keywords. So what I do is I just encrypt the original files with this homomorphic encryption scheme, okay, and then later I have some query I want to make, and that query can be expressed as some boolean circuit, basically. Okay? So you just encode that query. And then the cloud just operates on the encrypted data that's sitting on its servers, combines it with the function F, and runs this evaluation algorithm, essentially processing the data while it's inside the encryption box. And what comes out is a ciphertext that is supposed to, by definition, encrypt F applied to the messages inside those original ciphertexts. And that's what I wanted, right? That's the response to my query. That should be precisely the files that satisfy the particular constraint of having these keywords. Okay. And so there are lots of potential applications of fully homomorphic encryption. I mean, you could do an encrypted Bing search, for example. This is very theoretical. I mean, it would take a really long time to do this. But in principle I could just take the bits of my query and encrypt them, okay, and, you know, if Bing agreed to, then it could just take my encrypted query, combine it with all the data that's sitting there and whatever search function it uses, and come up with a ciphertext that is responsive to my query. Okay? And it would never see what my query was. This would be very slow, so I'm not claiming it's practical or anything. But in principle, it's doable. >>: [inaudible] homomorphic [inaudible], you know, from actual queries like [inaudible] entire -- I mean if you [inaudible] actually don't have to look at the entire data? Because here for example if I'm [inaudible] is encrypted -- >> Craig Gentry: Yeah. So it's kind of doubly inefficient -- not just the fact that everything is encrypted, but also that in practice you would use indices to make the search go blazingly fast, you know, some sort of -- you can imagine a binary-search type thing. >>: [inaudible]. >> Craig Gentry: But if everything is encrypted you can't really do a binary search on the data, because you can't see anything; you're just doing everything kind of blindly.
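As a tiny illustration of the point above that such a query can be written as a boolean circuit: an equality test against a keyword is just an XNOR of each bit pair, ANDed together. The sketch below runs on plaintext bits; under a fully homomorphic scheme each XNOR/AND would instead be one homomorphic Eval step on ciphertexts, so the server never sees the keyword. The encoding, width, and names are purely illustrative.

    # Keyword equality as a boolean circuit over a fixed-width bit encoding.
    def bits(word: str, width: int = 64):
        # Pad/encode the word into `width` bits (words longer than width//8 bytes
        # are out of scope for this toy).
        raw = int.from_bytes(word.encode().ljust(width // 8, b"\0"), "big")
        return [(raw >> i) & 1 for i in range(width)]

    def equals_circuit(a_bits, b_bits) -> int:
        acc = 1
        for a, b in zip(a_bits, b_bits):
            acc &= 1 ^ a ^ b        # XNOR of one bit pair, ANDed into the result
        return acc                  # 1 iff every bit matched

    keyword = bits("cloud")
    print([equals_circuit(bits(w), keyword) for w in ["cloud", "crypto", "cloud"]])
    # -> [1, 0, 1]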
>>: [inaudible] characterize, I don't know, this kind of efficiency limitation [inaudible] privacy. Because I haven't seen that like formally [inaudible] but I haven't seen like a formula for what exactly the price is. >> Craig Gentry: Yeah. Yeah. That's true. That hasn't really been formalized. But basically the idea is, when you try to encode binary search as a circuit, what it ends up looking like is -- well, the circuit has to touch all of the data. I mean, it's some static thing that just has to touch all the data. And that's what the homomorphic encryption scheme is run on -- it's run on a circuit that's of size linear in the data, even though in principle a binary search on unencrypted data could be log time. >>: Homomorphic encryption on e-mail would be great but it's not [inaudible]. >> Craig Gentry: Yeah. Okay. So some other applications. Private information retrieval -- I'll talk about that later. But basically the idea is you just want one little slot of information without telling the database which slot you picked. Searching encrypted data -- that's basically what I was talking about there. There are other approaches using things like pairings, but with fully homomorphic encryption it's a bit more flexible, because you don't have to prespecify some sort of keywords that you're going to search for; you can just do it dynamically. You can also do things like have an access control mechanism that's kind of oblivious. I can encrypt my credentials and offer them to the server, and the server can take these encrypted credentials and offer me back some information only if my credentials satisfy some constraint. But of course it would be operating on these encrypted credentials; it wouldn't even know what exactly they are. And there are various other good things that homomorphic encryption does for two-party and multiparty computation. So it was recognized early on that this would be a very useful thing, this privacy homomorphism, basically right after the invention of RSA, and their motivation was basically similar to ours now, you know, searching on encrypted data. And over time there have been plenty of somewhat homomorphic schemes. Like in RSA -- very, very basic RSA -- if you just multiply two RSA ciphertexts under the same key, what you get is a ciphertext that essentially encrypts the product of the original messages (there's a small worked example of this below). So you can do a multiplication operation inside the encryption box. So it's multiplicatively homomorphic. But there's no way to do an add operation. And really, to do general functions -- basically if you look at AND gates and OR gates and so forth, all of these can be expressed as some combination of additions and multiplications. So if you could just do additions and multiplications sort of indefinitely, then that's really all you need for fully homomorphic encryption. But we didn't have that. We had schemes that did one operation. And here is a scheme that does quadratic formulas. And as I kind of alluded to, there are other homomorphic encryption schemes where, yeah, in principle you could evaluate on the ciphertext forever, but the ciphertext just expands and gets bigger and bigger, even exponentially with the depth of the circuit, so they're not really very practical. Okay. So now we have some fully homomorphic encryption schemes. We have three of them now. All of them basically use the same blueprint, which is a little disappointing. And I'll talk about this blueprint later.
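Backing up to the RSA example mentioned above: a minimal demo of that multiplicative homomorphism, with tiny insecure textbook parameters and no padding, just to show the algebra.

    # Textbook RSA: multiplying two ciphertexts under the same key yields a
    # ciphertext of the product of the messages.  Toy parameters only.
    p, q = 61, 53
    n = p * q                       # 3233
    e = 17
    d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse (Python 3.8+)

    def enc(m): return pow(m, e, n)
    def dec(c): return pow(c, d, n)

    m1, m2 = 7, 9
    c = (enc(m1) * enc(m2)) % n     # multiply the ciphertexts...
    assert dec(c) == (m1 * m2) % n  # ...and you decrypt the product, 63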
And this blueprint is kind of inherently slow. So therein lies the problem. We have the solution, but it's a bit impractical, because they all use this blueprint and we don't have any other way of doing it. So I'll get back to this. But I want to talk about one more application of FHE that I was involved with, and that is this idea of outsourcing computation verifiably -- not just privately but verifiably. And here the setting is, you know, again, you have a client that wants to delegate a computation to the cloud, but she wants a quick way to check the cloud's work. Of course she could just duplicate the cloud's work, if she wanted to take the time, and make sure it's the same. But the point is she wants to spend much, much less time than it would take to do the work herself. Okay? And so here we have a kind of somewhat analogous setup. We have some input X and a function F. And the server puts in a lot of work, and it outputs F of X -- something that looks like F of X -- and kind of a proof that this is indeed the correct value, F of X. Okay? And the client just verifies that the proof is correct. And unfortunately homomorphic encryption doesn't solve this problem immediately, because, I mean, the server can do all sorts of computations on encrypted data, but how do you know at the end that it encrypts the result you want, as opposed to the cloud just creating a fresh ciphertext that encrypts whatever it wants, and you have no idea. So a bit more formally, a verifiable computation scheme has four algorithms, also. There is a KeyGen algorithm, but in this case the KeyGen algorithm depends on the particular function F that the client wants computed. Okay? The function F can be like a universal circuit. So a universal circuit can take some description of a function as well as an input, and then it will output that function applied to that input. So when I say it's constrained to a particular function, that sounds very limiting, but in fact that can be a universal circuit that would allow you to encode any function. Second, we have a problem generation algorithm, which is sort of like encryption, except it uses the secret key that Alice hides. Okay. And then there's a compute algorithm, where basically the worker takes whatever the output of the problem generation was and just kind of manipulates it and outputs some answer. And then finally there's a verification algorithm, which is kind of analogous to decryption in some sense. I mean, she takes the output of the worker and kind of decodes it and sees if it's the right thing or not -- verifies the proof. Okay? And again, we want a compactness property here. We want the output of the worker to be nice and short, so that it's very fast to check it. Okay? Hopefully independent of the size of the function. And security -- what we want is, well, basically no bad proofs: it's impossible for an adversary to output some Y prime and a proof that, you know, Y prime is indeed F of X when it's not the case. And even in this setting it would be nice to have some input privacy, although that's not really part of the definition. You can imagine that Alice just originally would want to keep her input private. But we're not demanding that here. That would just be kind of a feature. So for this there are lots of applications.
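Here is an interface-level sketch of the four algorithms just described -- KeyGen fixed to a function F, problem generation, compute, and verify. It only captures the shape of the API as stated verbally; the names and the split of public versus secret values are assumptions, not any particular published construction.

    from typing import Protocol, Tuple, Any

    class VerifiableComputation(Protocol):
        def keygen(self, F) -> Tuple[Any, Any]:
            """One-time, possibly expensive setup for a fixed function F.
            Returns (pk_F, sk_F)."""
        def probgen(self, sk_F, x) -> Tuple[Any, Any]:
            """Encode an input x for the worker.  Returns (sigma_x, tau_x):
            sigma_x goes to the worker, tau_x stays with the client."""
        def compute(self, pk_F, sigma_x) -> Any:
            """The worker does the heavy lifting and returns an encoded answer
            sigma_y -- an encoding of F(x) together with, implicitly, a proof."""
        def verify(self, sk_F, tau_x, sigma_y):
            """Cheap client-side check: returns F(x) if sigma_y is consistent
            with tau_x, otherwise rejects."""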
So I mean there are many reasons you would want to check that the cloud performed correctly, even to protect against benign errors. If the data is very important -- it's going to be used in some other computation or a financial transaction -- you want to protect even against benign errors. For small devices, maybe a small device would like to outsource its computation, and there you have just the same issues. In large-scale computations like SETI@home and folding@home, you need to weed out the bad results before they're incorporated, obviously. And if the results weren't verifiable, there might be a strong incentive for users to fake them. So there are plenty of kind of ad hoc solutions to this. One is just redundancy. For example, at folding@home you just have multiple people compute the same thing and take a majority. And there are various audit-based solutions -- I mean, the client itself, for something like folding@home, can recalculate some portions itself just to make sure that it's correct. And there are various kinds of secure hardware solutions. But they don't really completely solve the problem. In terms of more theoretical crypto work, there's a long line of work beginning with interactive proofs, where the idea is that you have a powerful prover trying to convince a verifier of something that the verifier can't do on its own. These are interactive; we kind of wanted to avoid interactive solutions. There are probabilistically checkable proofs. So there you can have a computation that's a very long computation, and it has a very long proof of correctness. But what you can do is encode this proof of correctness in a PCP way, which means that the verifier can just take portions of the proof, and as long as the prover doesn't know beforehand which portions of the proof the verifier is going to pick, then it's basically stuck; the verifier just checks these portions and then it's satisfied with some probability. You also have efficient arguments and CS proofs. And there basically you sort of overlay some crypto on top of a PCP. The prover sort of commits cryptographically to its PCP, and in the case of CS proofs the challenge -- kind of the query by the verifier -- is done non-interactively by using a random oracle, okay? So the prover applies a random oracle to its own proof, which is kind of hard for the prover to predict, and that says which bits of the proof the prover has to give to the verifier. Okay. And so in all of these schemes there's really no notion of privacy of the data, which would seem like a pretty important aspect in a lot of scenarios. Okay. So I want to talk, I guess very briefly -- I think I have at least 10 extra minutes -- about a recent solution that we have, using fully homomorphic encryption, to this problem. And what you get is a non-interactive, efficient verifiable computation scheme for every function F, without these PCP-based solutions that we had in the past. And furthermore, there are no random oracles. I mean, random oracles don't really exist, so that's kind of an impediment to their acceptance. So the size of the worker's response grows only linearly with the output of the function F -- that's obviously the best you can possibly hope for. Whereas in the case of the PCP-based proofs, I mean, there's some higher dependence. It's still not very high, but it's a little higher than just plain old linear.
And we get input privacy for free, because we're using fully homomorphic encryption, so that's what you would expect. So there is a drawback of this scheme, and that is -- you have this function F and you're going to be receiving multiple different inputs for it over time, and you're going to want to evaluate it dynamically at that time, but it's going to be the same function F. Okay? So we have this amortized notion of complexity here, where basically for this function F there's a one-time setup procedure, which is kind of expensive for the client -- it takes as long as computing F. And once that is done, for subsequent evaluations of F the client will receive the benefit of the verifiable computation scheme: it will just have to do a short verification of what the worker sends to it. Okay? And there's one other drawback, and that is -- okay, suppose somebody cheats, okay? So I detect -- I receive a proof that doesn't verify. What do I do then? So unfortunately in this scheme, if you detect that someone has cheated, you basically have to stop at that point, and you can't use what you created in this one-time pre-processing phase any more -- you have to create another one. Okay? Okay. So here's the high-level idea. It's a very simple idea. I hope it's not too simple. The idea is basically to take a one-time verifiable computation scheme, okay? In this case, you're only going to compute the function F once. Okay? And it's pretty obvious we have a one-time verification scheme, and that's Yao's garbled circuit, which I'll describe in just a minute. Okay? But, you know, for some reason that I'll get to in a minute, trying to reuse this Yao garbled circuit is not secure at all, okay. But it's secure once. Okay? So you take that one-time scheme and you basically sprinkle some FHE on top. And so what we'll do is, if you want to evaluate F repeated times, each time you generate a new FHE public key, and instead of just sending the Yao input wire labels in the clear, which is what you would do up here, you encrypt them under that fresh homomorphic encryption public key. Okay? And then what happens -- I mean, what would happen in the original scheme is that the worker would kind of evaluate the Yao garbled circuit and get some label for an output wire. Okay? Here, though, the worker does the same thing, but it's all inside the homomorphic encryption scheme, okay? So, I mean, you know, any function that you can evaluate, you can sort of evaluate inside the homomorphic encryption scheme, and that's kind of the point. And so you just evaluate that entire garbled circuit while inside the fully homomorphic encryption scheme. And the worker ends up with an encryption of the bits of a Yao label, instead of the Yao label itself. And the client just decrypts that and checks that it's a good label. Okay. So just to review, in case you don't know, here's Yao's garbled circuit. So you have some function F that you want to compute. You express it as a boolean circuit with AND, OR, NOT gates. And then you have a lot of gates that look like this, that have two inputs and one output. And you associate to each wire in the circuit a couple of labels -- strings, say 100-bit strings, okay, or 128 just to be precise. And then to each gate in the circuit you associate some ciphertexts -- four ciphertexts.
And what this ciphertext means, for example, is that you recover, basically, the output wire label. Let's say the operation is an AND. The AND of zero and zero is zero -- so G00 here is going to be zero -- so you will recover C0 if you have the labels A0 and B0. Okay? So as you percolate through the circuit you're going to have exactly one label at each wire. Okay? And if you happen to have A0 and B0, then you can go here and you can decrypt the ciphertext and you can recover C0. Okay? But if you have A0 and B0 and you try to decrypt anything else, you're going to have a problem, because there's going to be some label that you don't know, for example B1 here. So that means you won't be able to recover C01 -- except that in the case of AND that happens to be the same thing, but you wouldn't know that. Okay? And then you publish these ciphertexts in random order, so it's unclear which row corresponds to which inputs. (There's a minimal code sketch of one such garbled gate below.) Okay? So to use this as a -- well, let me see what my next slide is. So to use this as a one-time scheme, basically the client takes the circuit associated to the function F that it wants to evaluate and creates a garbled circuit for it. And when it has a particular input that it wants evaluated under F, well, that's associated to one label for each input wire. It gives those labels to the worker. The worker just percolates up through the Yao garbled circuit, and it gets the label for some output wire, and that's the answer. Okay? And intuitively, the reason this is unforgeable is that the only way the worker could forge is if it somehow figured out the other output label for the output wire. But how is it going to figure that out if it's not able to decrypt those other ciphertexts in the garbled circuit -- which it shouldn't be able to. So that's basically the intuition. But that's why it can be used as a one-time scheme. Now imagine you try to use it twice. Okay? So I've already given the worker basically half of the input labels for the Yao garbled circuit in the first round. Now, if there's a different input, I give him the associated labels in round two; at this point he usually has like, you know, three-fourths of the labels, and eventually he's going to collect all the labels for the input wires, both for zero and for one. And if he has all of those labels, then he can basically unravel the whole circuit. It's not garbled anymore at that point. Okay? Then it becomes insecure. So that's why we can't reuse it, okay -- it just has to be used once. Otherwise the security is going to break down. Okay? So, you know, again, our solution is really very, very simple. It's so simple it can just be put in a nice little picture like that. It's just to homomorphically encrypt, under a public key that's specific to the round, all of these particular input labels, and, you know, again, the worker just kind of does the same thing he did before in the one-time scheme, except all the action happens inside the fully homomorphic encryption scheme, so that it gets a homomorphically encrypted label at the output, either C0 or C1. >>: [inaudible]. >> Craig Gentry: I like the hands-on approach. And the intuition of the proof is basically -- okay, let's say the worker cheats successfully in the scheme. Okay? Then there must be basically some round for which he outputs a proof that's incorrect with respect to some input, okay?
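Here is the minimal sketch promised above of a single garbled AND gate, assuming a hash-based one-time encryption of the rows and trial decryption against a zero check block. Real constructions use proper symmetric encryption plus tricks like point-and-permute; this is a toy, not a secure implementation.

    import hashlib, secrets, random

    LABEL_LEN = 16   # 128-bit wire labels, as in the talk

    def H(a: bytes, b: bytes) -> bytes:
        # Hash the two input labels into a 32-byte pad that masks
        # (output label || 16-byte zero check block).
        return hashlib.sha256(a + b).digest()

    def garble_and_gate():
        # Two labels per wire: index 0 encodes bit 0, index 1 encodes bit 1.
        A = [secrets.token_bytes(LABEL_LEN) for _ in range(2)]
        B = [secrets.token_bytes(LABEL_LEN) for _ in range(2)]
        C = [secrets.token_bytes(LABEL_LEN) for _ in range(2)]
        table = []
        for a_bit in (0, 1):
            for b_bit in (0, 1):
                out = C[a_bit & b_bit]                 # AND truth table
                plain = out + bytes(LABEL_LEN)         # label || zero check block
                pad = H(A[a_bit], B[b_bit])
                table.append(bytes(x ^ y for x, y in zip(plain, pad)))
        random.shuffle(table)                          # hide which row is which
        return A, B, C, table

    def eval_gate(table, a_label: bytes, b_label: bytes) -> bytes:
        # The worker holds exactly one label per input wire and tries each row;
        # only the matching row decrypts to something ending in the zero block
        # (a wrong row matching by chance is negligibly likely).
        pad = H(a_label, b_label)
        for ct in table:
            plain = bytes(x ^ y for x, y in zip(ct, pad))
            if plain[LABEL_LEN:] == bytes(LABEL_LEN):
                return plain[:LABEL_LEN]               # the output-wire label
        raise ValueError("no row decrypted -- wrong labels")

    A, B, C, table = garble_and_gate()
    out = eval_gate(table, A[1], B[1])                 # evaluate on inputs 1 AND 1
    assert out == C[1]                                 # recovered label encodes bit 1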
So let's just guess which round the worker cheats on, okay? We'll be right with some reasonably good probability, because there are only so many rounds. And in the other rounds we just replace the homomorphically encrypted stuff that we would normally validly create with just kind of random junk. And obviously that's kind of useless to the attacker -- it shouldn't help the attacker. And so once we've done that, basically all the information the attacker is getting is what he correctly got in that one round that we've targeted. So that means that if he can break the scheme for that one round, essentially he's breaking the one-time scheme, that one-time Yao scheme. That's a very hand-wavy description of the proof, obviously, but that's basically the intuition. Okay. But what if the worker tries to cheat? Okay. We know he can't cheat once. But let's suppose he responds with something and it didn't verify properly. Okay? We've caught him. We didn't accept his output. But now what do we do? Okay? Can we continue? The problem is, no, we can't really continue, because here's an actual attack that the worker could do to gradually unravel the entire garbled circuit inside. Okay? So here's what the worker does -- the worker is going to try to figure out what each Yao label is. Okay? So what it does is -- you know, in a round it's given a homomorphically encrypted label. It just zeroizes one of the bits. So it replaces one of those ciphertexts with an encryption of zero and sees if that messes things up. Okay. It does everything normally after that and sends back a response. And if the client says, oh, that's bad, then that means, oh, that must have been an encryption of a one, not a zero. And so in that way, since it kind of gets a bit of information from the client in each round as to whether it verified properly or not, a malicious worker can gradually figure out what the labels are and then just destroy the entire circuit. So what's the countermeasure? Well, basically, after you detect cheating you have to start with a new garbled circuit. It's not quite as bad as it sounds, because you could have a situation like folding@home where it constructs a garbled circuit for that day. And maybe it has millions of users. So it creates one garbled circuit for millions of users to work on on that one day. And then it's just not going to give out any verification information during that day. So it will receive all these responses from users, but it won't say whether they verified or not. So it won't give any information back to the users. And then at the end of the day, it will do some accounting procedure where it says, okay, those responses you gave me were kind of screwed up and I'm not going to pay you for those. And then it starts over the next day with a new garbled circuit. So as long as the verification information that goes out always occurs after all of the client's verification work has been done, then everything is fine. So you can still get some amortization benefit here, even though it doesn't work as well as you would hope. It would be nice to have better countermeasures, though, so that's an interesting open problem. And I'm running low on time. So I want to talk a little bit about practicality with FHE schemes. And this is also an interesting open problem. So we have -- as I mentioned, we have three FHE schemes now.
They all follow the same blueprint, which is the following: you construct a somewhat homomorphic encryption scheme, which means a scheme that can maybe do additions and multiplications on the underlying plaintext for a while, but then maybe it gets stuck for some reason. Okay? So it can compute functions of some complexity. And then in the blueprint the idea is to take this somewhat homomorphic encryption scheme and kind of beat it down until it has a certain property called bootstrappability. What you would like, for a reason that I'll tell you in a moment, is that the encryption scheme -- you know, it has some class of functions that it can evaluate homomorphically -- and what you would like is that the decryption function of the encryption scheme itself is in that class. It's kind of a self-referential property of the encryption scheme. Okay? And this is the property I call bootstrappability. And -- well, I'll just go to the next bullet. It turns out that if a somewhat homomorphic encryption scheme has this property -- that it can kind of homomorphically evaluate its own decryption function and still get a correct result at the end -- then it's easy to transform that scheme into a fully homomorphic encryption scheme; there's a general transformation. So the idea in the second step is that you just try to massage this initial somewhat homomorphic encryption scheme so that its decryption function can be expressed by a flatter and flatter circuit. And once it becomes flat enough, the scheme becomes bootstrappable, because the decryption function is finally in the set of functions that the scheme can evaluate. So, anyway, all the schemes follow this basic blueprint. And as you might imagine, this blueprint inevitably leads to slowness, because it involves homomorphically evaluating the decryption function. So the decryption function on its own in this scheme is kind of slow, okay? But imagine that you have a decryption function that's expressed as a circuit with lots of bits on the wires. What you're doing here is homomorphically evaluating the decryption function. That means you replace each of those bits on the wires with some huge ciphertext, okay? So you're kind of squaring the complexity of the scheme. It's like the decryption function times the size of a ciphertext, in some sense. So it's kind of inherently inefficient. And so one natural question to ask is, is there another blueprint? Another question is, maybe bootstrapping is not really necessary. Maybe if you just looked at the somewhat homomorphic encryption scheme, maybe it does a good job at evaluating most of the functions that we're interested in. So why do we only have this particular blueprint? I don't know. I mean, speaking for myself, I could say I was looking at a lattice-based encryption scheme that was quite homomorphic. You could do, you know, a good number of additions and multiplications. So it was interesting. And each ciphertext has some noise associated to it -- abstractly, there's some noise parameter associated to the ciphertext. And as you add and multiply the ciphertexts, it has the effect of growing this noise, until the noise basically drowns out the signal and the ciphertext becomes indecipherable. Okay.
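As a toy illustration of the noise behavior just described -- this is not the lattice scheme from the talk, just a deliberately tiny and insecure symmetric scheme over the integers in the same spirit -- a bit m is encrypted as m + 2r + p*q for a secret odd p, small random noise r, and random q; decryption is (c mod p) mod 2. Adding ciphertexts adds the noise terms, multiplying multiplies them, and once the accumulated noise term exceeds p, decryption stops being reliable.

    import random

    P = 1000003                          # secret odd modulus (the "key"); toy size

    def enc(m: int) -> int:
        r = random.randrange(1, 20)      # small noise
        q = random.randrange(1, 10**6)
        return m + 2 * r + P * q

    def dec(c: int) -> int:
        return (c % P) % 2

    def noise(c: int) -> int:
        # Distance of c from the nearest multiple of P; decryption is reliable
        # only while the accumulated noise term stays well below P.
        rem = c % P
        return min(rem, P - rem)

    a, b = enc(1), enc(1)
    print(dec(a + b), dec(a * b))        # XOR = 0, AND = 1: still correct
    c = enc(1)
    for i in range(5):                   # keep multiplying: the noise grows fast
        c = c * enc(1)
        print(i, noise(c), dec(c))       # once the noise passes P, the bit is garbage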
So the question arises, how do you reduce the noise of a ciphertext? You'd like to take a ciphertext that has some large noise and somehow refresh it, so that you get another ciphertext that encrypts the same thing as the original one but has smaller noise, so that you can combine it with other ciphertexts for a little while until that noise gets too big, and then you refresh again. And so you continually refresh the ciphertexts. And the way to refresh a ciphertext just happened to be to homomorphically apply the decryption function (this refresh step is written out symbolically below). I mean, if you just applied the decryption function, then you would decrypt it and you could create an entirely new ciphertext that encrypts the same thing and has small noise. Well, that's easy. But obviously we don't want to give away the decryption function. It turns out that if you homomorphically apply the decryption function, it has a lot of the same effect. So this particular scheme had some noise in it, and the bootstrapping just arose out of how to solve this noise problem. But in the past, people had typically investigated schemes that really didn't have any noise associated to them. So there's this line of research started by Koblitz and Fellows where basically -- well, I won't go too much into it, but basically an encryption of zero is like an element of some algebraic ideal, and an encryption of one is like one plus some element of the ideal. And so as you add two things in the ideal you get another thing in the ideal -- that's like adding zero to zero to get zero -- and, you know, all the other operations work out. And so there's no noise associated here. It's just a question of whether the element is in the ideal or not. It's called the ideal membership problem. And if that's hard, then you would get a fully homomorphic encryption scheme, if you could get everything to work efficiently. But unfortunately these schemes have the problem that I mentioned before: the individual ciphertexts are multivariate polynomials, and as you add and multiply them, they have lots more monomials than the original ones, and they just expand and get really large. And so for this second question -- can we avoid bootstrapping; maybe for many functions we don't even need it -- I just want to give you some motivation. We've actually implemented this scheme. So if you just look at the somewhat homomorphic encryption scheme, where we're just doing simple additions and multiplications of ciphertexts -- and thereby additions and multiplications on the underlying plaintext -- without the refreshing step, you can see things aren't so bad. I mean, this is with increasing security parameters. Okay, you can see the public key is kind of large here. Okay. It's not great. I'm not saying it's great. But the running times are, you know, milliseconds and seconds, right? It's not totally ridiculous. But if you go to the fully homomorphic scheme, where we have this refresh procedure, what you see is that for moderate sizes just refreshing a single ciphertext takes a really long time, like three minutes. So if we could avoid that, that would be fabulous. And I think I'm really over time now.
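For reference, the refresh step described above is usually written like this (standard bootstrapping notation, assuming two key pairs; not verbatim from the talk): publish the first secret key encrypted under the second public key, then evaluate the decryption circuit D homomorphically.

    % c_1 encrypts m under pk_1 with large noise; sk_1 is published encrypted under pk_2.
    \mathsf{Recrypt}(c_1) \;=\; \mathsf{Eval}\bigl(pk_2,\ D,\ \mathsf{Enc}_{pk_2}(sk_1),\ \mathsf{Enc}_{pk_2}(c_1)\bigr),
    % which yields a fresh encryption of the same m under pk_2, with the noise reset
    % to whatever one homomorphic evaluation of D introduces.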
But there are a couple of interesting settings where the polynomials being evaluated are interesting -- for example the private information retrieval context, where a client just wants to extract the i-th bit of a database without revealing what i is, and keyword search, where you just want to find out whether a particular string is in a file. These are two examples where the polynomial that's being evaluated is actually really low degree -- well, not really low degree, but low enough that bootstrapping is not going to be needed. And so it would be interesting to try to categorize the types of functions that don't need bootstrapping. >>: [inaudible] homomorphic encryptions [inaudible]. >> Craig Gentry: No, we didn't implement that. Okay. So there are many open questions. I'm sure some of you know there's a DARPA BAA relating to computing on encrypted data, so they're offering 20 million dollars in funding. And they're focusing on speeding up FHE. And I'd like to encourage you not to apply. [laughter]. All right. That's it. [applause]. >>: I guess we still have just a minute before the next session, so if [inaudible] -- seriously, one question maybe. All right. So we have a nonzero break, and I guess, organizers, maybe like 11:20. [inaudible]