>> Kristin Lauter: Okay. So this morning, I'm very pleased to welcome you to
the cloud cryptography workshop. As most of you know, this workshop was
conceived and organized by Seny Kamara and Melissa Chase from the research
cryptography group. And they've done a fantastic job of organizing it. I'm so glad
you're all here.
We have a great list of speakers and talks relating to cloud cryptography.
So to kick off the conference, we're very pleased to have John Manferdelli,
distinguished engineer at Microsoft Research, to give us a little bit of an
overview and background on cloud computing security. Okay. Thank you.
>> John Manferdelli: Thanks. Actually it's punishment. My father always wanted
me to be an engineer, and I wasn't, so the title was punishment for something I
did.
So I want to talk -- we actually have two conferences going on at the same time.
We have a cloud security workshop focused on the systems aspects of cloud
security. In fact, I tried to schedule them so they didn't overlap, because I
wanted to go to both, but the way the schedule worked out today, they're both on.
So I'll be bouncing back and forth.
So you've heard a lot about clouds. In some ways there's not too much new.
Grids were a little bit like clouds for shared infrastructure for computing. They've
gotten very popular, partly because of scale and partly because people have this
sense that they're going to save a lot of money by using clouds. And I'll explain
to you the reasoning behind that, which is a little bit flawed. It's mostly flawed
around security. But it has a few other problems too.
So let me give you some background that you all know. There's a tremendous
asymmetry between attackers and commercial operations. Generally even
clients -- anybody who's under the misapprehension that we've got things
under control is just wrong. There are new attacks every day. If somebody
wants to mount a targeted attack, they can do it. It's not like they have to stay up
at night saying, gee, I wonder how I'm going to get this new one to get poor Orr
over there, if they want something.
Clouds do offer new uses. And probably the biggest one is just as a huge
common data store where you can bring the computation to the data, look at an
enormous amount of data, whether it's social networking data or search data,
and analyze it.
For most corporations -- if you ask Bill Gates, he thinks a huge part of IT will move
to the cloud as a cost saving. So that will be a big business. Even for somebody
like the government, just being able to use the shared infrastructure for
something as simple as computing -- I'm sorry, communicating, bandwidth -- is a
good deal, because the cloud providers get preferential rates on things like that.
Many applications, including many of the ones you're going to think about in
this conference, really have to do with clouds and clients interacting. It's not just
about IT running their payroll or whatever they want to do in the cloud, but some
service that has a client side on a mobile phone, on a PC, on whatever, and a
huge service in these clouds somewhere.
And so, you know, people really do now have to think about
distributed security. They had to ever since they plugged into the Internet, but
now, if you're really going to pay attention to clouds, you have to do that.
And there are geopolitical considerations. Some are obvious, some are not.
Privacy concerns in some countries are different than in others. If you're in a
datacenter and it's being used as a botnet, you run the risk of being closed down,
because somebody has to find the botnet, and the cloud provider hasn't isolated
tenants well enough to say it's just that thing over there. And there have certainly
been occasions where that's happened.
Also, cloud infrastructure is usually pretty homogeneous. There was a case where
Amazon -- and this wasn't even through a malicious attack -- went down for 18
hours because there was a single XML flaw in a protocol that was used
everywhere, and somebody accidentally triggered it. It was called a packet of
death.
But even with that, there's really quite a strong drive to the cloud because of this
alleged -- again, it's not completely crazy -- tenfold cost benefit. The final thing is
that when you control your own computing infrastructure you're focused on what
you're trying to do -- the mission, the problem -- and so whatever you do, the
security is tailored to the thing you want to do. If you're not too worried, you do
one set of things. If you're very worried, you have a lot of control over what runs
where, when, how you audit it, how you look at it, what you encrypt, how you
share it with other organizations.
But in the cloud, just as in the PC infrastructure, the focus is on cost savings
and brand and marketing. So sometimes you're in this wacky situation where
you say to somebody, that threat is really bad, and they say to you, it's not so bad,
because when they get hit, they won't know about it. A security guy goes crazy
hearing something like that. But again, for brand preservation there's some
thinking that that's not quite as bad as having all your data stolen. So there are
many interacting things.
So here's the economics. Just, this might be interesting just as a very rough
guide of why people think clouds are going to be a lot cheaper. So if you have a
very large datacenter, let's say a couple hundred thousand machines, the cost of
buying the hardware is lower, because you have a good -- you know, you're a big
customer, you have preferential rates. And then there are the savings you get
from power. There's about a fourfold spread in power rates, and you obviously
get the best rate at the datacenter. Typically you put the datacenter close to a
power source, and the argument for a good rate is they lose less power in
transmission.
And bandwidth costs, those are low. So all together, the total cost of ownership
of a server in a cloud is less than half of, you know, what I would go out or some
small company would go out and buy a server for. Maybe even a little less. So
there's a twofold improvement there.
The second improvement has to do with utilization. The argument is that when I
run a server in the cloud I can get about four times as much work out of it by
sharing loads on it. Two to four times, depending on who you talk to. And two
times four is eight. Well, there's the eightfold savings.
And in some cases it really does work for standard services like search or mail,
which are pretty undifferentiated services. Everybody kind of does the same
thing and just -- that works out pretty well. It works less well when anybody cares
about security.
So in data centers now -- this is probably the right crowd; this drives me crazy --
I don't know of a datacenter where, if you want to run a secure service, you don't
hand the datacenter operator your private keys. There's something wrong with that.
Several things. But that's actually the way it is now.
This mostly was for the other talk, for the non-believers, people who think
security, you know, basically is under control. I told my wife a couple of days
before the conferences that I could go to The Register and find the latest five
stories about what bad thing happened, and this is prejudiced towards Microsoft --
or against Microsoft, I guess I should say. So again, emergency patch for
Windows. It's a serious problem. I think it just got patched yesterday. What day
is today?
BP, while they're trying to protect their brand actually was infiltrated. So some
information they didn't want out actually got out from an attack.
The iPhone had a Trojan, which you'd expect. Citigroup -- which really is well
intentioned; they want to save their own money, and they're at risk when people
use their application and something bad happens -- had an attack on the 27th.
Everybody remembers the Google attack. It's infamous. And that was a cloud
attack. Windows -- you know, if anybody thinks the problem is all solved, Tavis at
Google found a bug in Windows, an escalation of privilege bug that was there
basically since the start of Windows. Flash is just too easy a target, so we won't
talk about that.
And I got this graph of attacks from our malware center. I don't even know what
the scale is, but you can tell it's bad. [laughter] It's very bad.
The problem is, even on a client machine, on a server, one you control, there are
two sets of problems. The first set of problems is the hardware: what can it
isolate, and where? There are some pieces of hardware which are just not well
suited to isolating competing adversarial applications. The screen is one.
But that pales in comparison to the software problem, which mostly has to do
with us using legacy OSes which grew up in a time when -- if there
was any security at all, it was the timesharing model, where there was, you
know, a single administrator who knew everybody who used the machines, who
actually personally installed and understood all the important software, and knew
the people who were going to log in and gave them their logins. Which is of course
not the world we live in at all.
PCs made that a little bit worse because it went from several people using the
same machine to one person using the same machine and why were you
protecting yourself from yourself? And then people plugged their machine into
the Internet and things went bad.
OSes are huge. And everybody gets everything for some reason. Configuration
is very difficult. I personally don't know how my machine is configured, and I try
to understand it from time to time. If you look in your Windows root store, you'd be
frightened.
But the major problem is that a vulnerability anywhere in the stack affects everything,
which is very worrisome in a shared environment like computing infrastructure.
In most cases I don't worry too much about attacks which involve physical
possession. But in the cloud setting that is a problem. This datacenter is run by
insiders. You never see them. They don't -- they're not loyal to you, they're loyal
to the organization that hired them or possibly somebody else, which is even
worse. So insider attacks in these things are major problems. And people can
dismiss them in an enterprise setting. You can't walk away from them in a cloud
datacenter setting.
Still, the Internet is great at collaboration and sharing. We all, you know, get
papers that are posted on the Internet. It's really a wonderful environment to
get information. So you sure don't want to give that up.
The commercial infrastructure with all these problems has moved to clouds, and
it's going to take those problems with it. So what are we going to do about it?
That's sort of the question. I was on a government panel -- and maybe the
people in this room will agree -- the usual way to deal with security I would
call phenomenological, or ad hoc is probably a less classy word for the same
thing. People sort of try to prevent attacks by noticing what the last bad
thing was that happened and fixing that.
There's a lot of research in detecting attacks. It's interesting. Against a clever person,
it's very hard to detect attacks. So the research is quite interesting. But again,
the success rate isn't great. You wouldn't want your life to depend on always
finding out when you got attacked.
Then they try to mitigate the attack. Okay, we can't really fix it but we're going to
do things so it's hard for people to exploit it. And finally, recovery: so when a
disaster happens, where is the CD that we go reinitialize everything with?
And these are all valuable, actually mostly for analysis to see what's around. So
I won't list all these. But one example is address space layout randomization,
which sort of combats a little bit of the homogeneity problem. It will not stop all
attacks. But the nice thing about the randomization is that even if you only see 10 percent,
one percent of the attacks, because the randomization causes them to fault in some
way, you can detect attacks. They're very sensitive indicators of attacks in the
wild. They're not a way to stay safe, though. They're just a way to find out
what's going on.
So we're losing. In both conferences the question is what could we
do in a more fundamental way if this were -- you know, if people treated security
as a science and not the sort of ad hoc, phenomenological thing they do. And so
you can look at a couple of models -- the physics model, which isn't completely
appropriate: there, nature is subtle but not malicious; in the world we're in,
people are not necessarily subtle, but they are malicious. And they often get
away with it.
So the role of adversary enters. But that doesn't mean you should give up.
Economics is in that same model.
So I went back to my old physics books and, you know, if you just sort of glance
at what's important to scientists: they observe and guess, which I could say we're
doing now, but they want nice theories that predict and explain phenomena.
You know, you shouldn't be able to make money as a scientist by predicting
the past. Theories have to predict phenomena before they happen,
which apparently, you know, is not always true in security.
And they look for simple laws that are comprehensible, so you can analyze what's
going on and make the prediction. And finally, they're verifiable. I mean, despite
all the crabbing among physicists, if you have an experiment that disproves a
theory, you probably don't get the kind of crap you'd get if you were in a political
debate. It's just, you know, it didn't work. Too bad.
So the goal generally, for security in the cloud, is that you want to protect
both against the classic external attacks that we see now and against internal
attacks, insider attacks. And the environment has changed. Now you're at the
mercy of insiders. You can't deny it.
It's not exactly a hardware problem. In some cases it's very hard to protect
hardware if somebody has physical possession. But, for example, in a
datacenter run by somebody like Microsoft, who doesn't especially like being sued,
physical possession is less of an issue than the software is. You can have cameras
on the machines all the time. There are very few people, and in some data centers
you roll up the truck with all the computers and you never open it. When things
break, they're gone, until you throw away the whole container.
So physical security can be a problem, but it's not as big a problem as software.
So I'd like to go back and say, well, what do we want from these data centers?
And these are sort of the buzzwords you'd all look for. But you'd like a
simple model for why whatever you're going to do provides them -- not a
complicated story of, we did this thing here and we did this thing there, and maybe
if everything works out we won't get hurt.
So the first conference looks at this systemically. And their basic question is, what
can we do so that when you run your software on a machine you have three
properties? You know what software and hardware you're relying on for
security -- you can measure it in your software somehow. Cryptographically,
actually.
The second thing is when you're sharing resources, let's focus on the CPU, they
really are isolated. And again, there are still issues with things like side
channels. But again many of them can actually be addressed by hardware by
assigning bandwidth, that sort of stuff. But really isolated. Not what happens
between one process on Windows or Linux, for that matter, and another process,
like they're isolated.
The third thing is the programs, not the people, should have secrets that only they see.
The guys in the datacenter ought not to see your credentials. And finally, you ought to
be able to do this in a manageable way, and you should be able to manage this
cryptographically over the Internet.
So in the systems setting this is one solution which would make clouds
believable for somebody who had sensitive data. And again, once you have this
environment, you never, ever, ever, let anything out unencrypted outside your
partition. The disks always contain encrypted data. That's it. You always
transmit stuff encrypted. So if you mess up, it's your own fault.
Correspondingly, and this is an example, I think much of what you'll talk about
goes to another problem, let's just give up. Let's say the datacenter's run by
Joseph Stalin and we still want to get some useful work out of it. Is there some
simple guarantee, some simple way, and are there some simple things we can
do to do useful work? And I think it turns out there are. A lot. They'll get better.
But even the simple ability to store stuff -- because the cloud is quite good at
communication and storage -- in an encrypted manner and get to it from anywhere,
with the additional ability to search it, that's actually a huge benefit.
And if your datacenter is run by Stalin, you don't worry so much because you're
also using a datacenter run by Chairman Mao and somebody else. Whoever the
bad American guy is. So there's some redundancy.
So that's the goal of this meeting. And I think it's actually quite important. I'm a
little bit skeptical about the hype -- in fact, I'm a lot skeptical about the cloud. But
the cloud will have a lot of benefits. It will eventually do many of the things
people hope it will, but not without being safe. And right now it's not safe.
So I look forward to the conference. I will be in and out today, but I will mostly be
here tomorrow. And I want to thank you all for coming. I'm really glad the talks are
being recorded, so the ones I miss today I'll get to see.
>>: [inaudible] conference?
>> John Manferdelli: Yeah. Maybe. Actually it turns out -- so the first day it was
quite different. The second day we're going to talk about TPMs, one of the things
that let you boot these things. And we're going to talk about searchable
symmetric encryption. But you're going to hear that talk here in this session, too.
We duplicated it. You're welcome to come, but I don't know that -- I think we've
arranged it so that you won't have to generally. Okay.
[applause].
>> Ran Canetti: Okay. So when I prepared the talk, I wasn't sure what the
audience would be, so I planned the first half to be kind of more for a
wider audience and the second half to be more technical. But then I also planned it
for a little bit more time, so maybe the second half will come up short.
Anyway, so what does this code do? Well, if you know C, you can stare at it
for a few minutes and you can see this program basically counts the number of
primes from one to a hundred. There are two loops, one inside the other. It
checks each candidate iteratively. It is very easy to see.
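To make the example concrete: the slide shows C code, but a minimal sketch of the
same two-nested-loop computation (written here in Python for brevity; this is only
an illustration, not the code on the slide) could look like this:

    # Count the primes between 1 and 100 with two nested loops,
    # mirroring the straightforward program described above.
    def count_primes(bound=100):
        count = 0
        for n in range(2, bound + 1):
            is_prime = True
            for d in range(2, n):          # inner loop: trial division
                if n % d == 0:
                    is_prime = False
                    break
            if is_prime:
                count += 1
        return count

    print(count_primes())  # prints 25, the number of primes up to 100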
>>: [inaudible].
>> Ran Canetti: Yeah?
>>: [inaudible].
>> Ran Canetti: No, I think it was -- yeah, it is cut off a bit. Scoot to the left,
yeah.
>>: [inaudible]. It's up to you.
>> Ran Canetti: In fact, you know, there is -- maybe we can just put it in this
mode. Whoops.
>>: [inaudible].
>> Ran Canetti: Nothing is cut off. Okay.
So what does this code do? So it's hard to see -- if you stare at it, it's hard to see
what this code does, but, in fact, this code does the same thing as the first code.
Both codes count the number of primes from one to 100. In fact, the second code
was generated from the first code via a mechanical transformation. It's just a very
simple number of syntactical operations: changing the names of variables,
turning loops into recursion, et cetera, stuff like that.
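To give a flavor of the kind of purely syntactic transformation being described, here
is a hypothetical mechanically rewritten version of the toy prime counter above --
variables renamed, loops turned into recursion. It computes exactly the same value
but is much harder to read at a glance (again just an illustration, not the
transformation used on the slide):

    # Same functionality as count_primes after a mechanical rewrite:
    # meaningless names, loops replaced by recursion.
    def _q(_a, _b):
        # True iff _b has no divisor in the range [_a, _b)
        return _a >= _b or (_b % _a != 0 and _q(_a + 1, _b))

    def _z(_n, _k=2):
        # counts the values in [_k, _n] accepted by _q(2, .)
        if _k > _n:
            return 0
        return (1 if _q(2, _k) else 0) + _z(_n, _k + 1)

    print(_z(100))  # still prints 25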
So this is the process of program obfuscation. And what is program obfuscation?
So it's very different things to different people. So from one point of view
program obfuscation is an art form. It's an art of writing unintelligible or surprising
code. In fact, there are several yearly contests and lots of creative code out
there.
So here is the winning entry in the 15th international obfuscated C code contest.
In 2000. So the thing starts in '85, I guess. So the author said -- so this is a C
program. It's hard to say from far away. But instead of making one
self-reproducing program, what I made was a program that generates a set of
mutually reproducing programs, all of them with cool layout.
So if you feed it with itself, you get the next one in the set, et cetera. So this is one way
to use obfuscation.
So here is the winning entry in the same contest four years later. So this is an
operating system. Maybe a bit smaller than Windows. So this is a 32-bit
multitasking operating system for x86 computers with graphical user interface
and a file system, support for loading and executing user applications in ELF
binary format with graphics and a command shell and text editor, et cetera.
So this is another cool thing. Okay.
>>: [inaudible].
>> Ran Canetti: It runs. Okay. So this is one way of looking at program
obfuscation. But program obfuscation can also be something else. It's
also a useful tool for hackers, right? So if you're a hacker, if you want to attack a
system, you want to hide what your program is doing, and, in fact, many viruses,
worms, and malicious code try to hide their code by using a number of
techniques. So some of the techniques make sure that the code that is running
on the computer is different from the code that is written on the disk. The code
may even modify itself as it's running and, you know, maybe even keep only the
one piece of code that is currently running in the clear, while the rest is encrypted
and the keys are changed all the time by the program itself.
So there are all kinds of different techniques to hide the code.
And, in fact, here is a Web page that was blocked by an intrusion prevention
system, something that a student at Tel Aviv found. So this is what it looks like if
you just read it. If you just look at the Web page, maybe you think it's a
picture or whatever, a JPEG. But it turns out that what's underlying, underneath
here, is basically a redirection to some malicious website: as soon as you click on
some pixel on the screen, or some frame on the screen, it redirects you, without
you realizing it, to some malicious website. So that's another thing that
obfuscation is being used for.
But obfuscation is also a business from the point of view of the good guys. So
if you look on the Web and you look for program obfuscation, you will find many
vendors that are willing to sell you obfuscating code: that is, compilers that are
going to turn your code into something which hopefully has the same functionality
but looks unintelligible to others.
And there are very many good reasons to do this if you're a software vendor. So
you want to put software up on the Web or in the public, so you want to protect
your IP, don't want people to understand what your code is doing. You want to
maybe prevent people from modifying your code in intelligent ways. You may
want this obfuscation to be a good way to stop hackers or to slow hackers. So
good reasons to obfuscate code. And people are doing it.
So there are many techniques, again, for obfuscation out there, in those
commercial obfuscation tools. So one set of tools is to obfuscate the source
code: you do variable renaming and you change the control structure into a
structure that hasn't been seen before, and you may do some higher-level semantical
changes in your program. And another set of techniques is to obfuscate the
object code: forget the source code, you just look at the machine code
and you add redundant operations, you vary opcodes, you vary addressing modes --
you know, the same operation could be done in many different ways -- and then
you encrypt unused modules, et cetera, et cetera.
But most of the techniques are proprietary, and the name of the game here is
security by obscurity. So: I won't tell you what I'm doing, and you will never be
able to understand it. Or maybe not never, but it will slow you down at least.
So people are doing it. But let's think for a second: what could we do if we
really had a good, really secure code obfuscation mechanism? It could be really
great. It's a really cool thing to have. So assume we really could take software
and obfuscate it and make it look like tamper-proof hardware, right? So you can't
really make it look exactly like hardware, because you can always duplicate software,
whereas you cannot duplicate hardware. But forget that, and let's assume that we
can prevent everything else.
So we could do lots of cool things, right? So as I said before, we could publicize
code without fear of misuse. So this is great for code distribution, for download;
it's also great for cloud computing. Great -- I could give my code to the cloud. It
would run it for me but would not be able to understand anything from it, how it
runs, and it won't be able to usefully modify it.
And so even if I don't have secrets -- everything is public, I don't
have any secrets that the cloud doesn't know -- even then it won't be able to do
things in a meaningful way. And furthermore, if I actually do have some secrets,
then, yeah, I can write my program in such a way that the output is encrypted and
only I have the key; then the server won't even know what it's doing, right?
So really if I could obfuscate everything in an efficient and effective way, security
for cloud computing would really be not a problem.
>>: So you assume that actually did the [inaudible].
>> Ran Canetti: So fully homomorphic encryption is a technique that doesn't go
all the way to obfuscation but it's very useful.
>>: [inaudible] in general.
>> Ran Canetti: Then -- right. Exactly. So it's more general than fully
homomorphic encryption. I'm saying in general: I take my program and write the
input in a -- you know, I don't even have to have fully homomorphic encryption as
a technique explicitly. I can just write a program that takes encrypted inputs, has
the key in the clear inside the program, decrypts with the key, runs the computation,
encrypts again -- so that's a program with a key inside it -- and then obfuscate it.
So it's the same functionality, but now the key is obfuscated and there's nothing you
can do. So clearly fully homomorphic encryption is a very useful technique here, you
know, to do this. But just as an idea for what you are after, that would be
great.
So that's one set. So we could stop here, right, because, you know, this is a cloud
workshop. But there are other applications. So you could publicize data while
putting, you know, curbs on its usage, right? So you think of putting medical
records online, but you just want to allow them to be used in certain ways
but not others.
So you can just do it, right, because the data goes together with the obfuscated code
that does the control. The code can really simplify secure
distributed tasks, right? Because if I want to run a protocol -- you know, I want to do,
say, a vote among a number of people; I want to poll their votes and I want to
get the result -- so I just write a piece of code that I pass from one to the next.
Each one feeds it their vote, it just accumulates the tally and gives me the tally as
output, I obfuscate it, and I let the people run it, you know, one by one, and then we get
the result. So it becomes almost trivial.
Also, it has some nice game-theoretic properties; there is a paper by Micali and
Shelat that actually uses exactly this idea without calling it obfuscation -- assume
there is a token that is passed around -- and it has nice properties.
So you could also, if you want to go further afield into cryptography -- we can turn,
you know, we can get public-key encryption from symmetric encryption and
signatures from MACs and so on for other functions -- so we can
get all those beautiful things. So in general obfuscation is an
immensely powerful tool. It could do many things.
But unfortunately all the techniques that are out there that we know of are all
heuristic. And essentially they are all eventually reversible. There's no real
security there. It's just an annoyance to the hacker, right? And, in fact, the common
wisdom is that all the obfuscation methods are doomed to failure at some point or
another. That's the common wisdom, and actually it's been articulated very nicely
by this technical writer.
So secure obfuscation is unlikely. The computer ultimately has to decipher and
follow a software program's true instructions. Each new obfuscation technique
has to abide by this requirement and, thus, will be reverse engineered.
The computer has to run the program. Eventually you just look at what the
computer runs, and you'll be able to understand what's going on. So that's the
common wisdom.
And is it true? So can we have really unbreakable obfuscation? And in fact,
since we are cryptographers, we like to think about things mathematically, not just
physically and empirically: how do we really define what obfuscation means?
So there's been a lot of work in the cryptographic community in the last 10 years
or so on obfuscation in a rigorous way. And here's a definition, Virtual Black Box,
that was proposed by Barak et al. So a general obfuscator is a compiler --
essentially an inherently randomized compiler -- that has the following properties.
First, it preserves functionality: for any program P, the output program Q, which is
the result of running the obfuscator on P, has the same functionality, maybe except
with negligible probability over the choices of the obfuscator. And it preserves
running time, so Q doesn't run too much slower than P.
And then there is this other property, the one that actually obfuscates. And what does
it mean? So very intuitively, at a high level, it means that having full access to the
code of Q should not give the adversary any computational advantage on top of
having oracle access, or black box access -- you know, access to secure
tamper-proof hardware that runs the program P.
And we try to quantify it in a mathematical way. So here is an attempt. For
any -- as is the tradition in cryptography -- for any polynomial-time
adversary A, there exists a polynomial-time simulator, another machine S,
such that for any program P, whatever the adversary outputs after seeing an
obfuscated version of P can be output by S having only oracle access to P.
But this is obviously too strong, right? Because A could just output its input,
which is the obfuscated program, and S could never output a program, any
program, that has the same functionality; and if this program does
anything useful, then of course things will be distinguishable. So this is too
strong.
So let's settle for something slightly weaker. So now I will say that the adversary
only tries to compute some predicate of the program -- so it just outputs one bit.
And then we want to say that the simulator does the same thing: it manages to
predict this predicate with roughly the same probability. Or, equivalently, the
probability that the adversary outputs one is the same as the probability that the
simulator outputs one. And this is called Virtual Black Box.
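For reference, the (predicate) Virtual Black Box condition just described can be
written out as follows, for an obfuscator O and a family of programs P (this is a
paraphrase of the definition, not a quote from the slide):

    \[
    \forall \text{ PPT } A \;\; \exists \text{ PPT } S \;\; \forall P \in \mathcal{P}:\quad
    \Big|\, \Pr\big[ A\big(O(P)\big) = 1 \big] \;-\; \Pr\big[ S^{P}\big(1^{|P|}\big) = 1 \big] \,\Big| \;\le\; \mathrm{negl}(|P|),
    \]

together with functionality preservation (O(P) computes the same function as P,
except with negligible probability over the obfuscator's coins) and at most
polynomial slowdown.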
So the main result of this paper is that even this weakened definition is
unachievable, right? So you cannot have compilers, obfuscators, that have this
property. So in the interest of time, I'm not going to go over the proof, although
it's rather simple.
So what the proof actually shows is one specific class of programs
that cannot be obfuscated. And essentially the idea there is to do some
diagonalization: the fact that the adversary has some code for the program,
no matter how obfuscated, allows it to do things that the simulator cannot do,
having just oracle access -- in a similar way to the argument against the first
definition, where the adversary could just output long strings, but in a slightly more
sophisticated way.
But anyway, the conclusion is that there exists a family of programs that cannot
be obfuscated. And the idea there is to use, again, the difference that the
adversary has some code that runs the program, no matter how obfuscated, and
the simulator has nothing, just oracle access. And there is a big difference there.
So but what does this impossibility mean? So it definitely resonates with the
popular belief, right? So that's why it's kind of very convincing. However, if you
look at it a bit more closely, then it only shows that certain classes of
programs cannot be obfuscated, right? So that rules out a generic
compiler, an obfuscator that can obfuscate everything, okay, but still there may be
other, you know, specific classes of programs that can be obfuscated.
It also only considers a relatively strong notion, this Virtual Black Box, this
equivalence to tamper-proof hardware, and essentially the impossibility uses this
particular feature of the definition in a very strong way.
So this leaves open the questions: what about obfuscation of specific
classes of programs, and what about weaker notions of obfuscation?
And indeed there has been a lot of research in the last nine or ten years on
different variants of obfuscation. So just to say very briefly: one direction of
research is work that obfuscates specific program families under different variants
of the notion. And then another set of works showed connections between
obfuscation and other cryptographic tasks -- between obfuscation and encryption,
signatures, and other things.
And then another set of works extends these impossibility results to get
strong impossibility results in other cases, if you require more -- things
like auxiliary information, et cetera. And then there was another set of works that
investigates different notions: relaxations, additional features, et cetera. So
these are essentially the works that have been done.
So in the rest of this talk, in the remaining time, which is not too much, I'm going
to concentrate on a very specific thing. I'm going to concentrate on the original
Virtual Black Box definition, although it's the stronger one. And the salient
characteristic of this definition, which actually I didn't say before, is that you need
to obfuscate any program in the family. So the definition says that the
obfuscator should work for any program in the family. And this requirement -- for
any program in the family -- although it's very natural in the context of
obfuscation, this is what you would really want, actually makes the definition very
strong, as you will see.
And here we can only do a few things. But there are actually things that we can
do. And this is point obfuscation and its friends. And I'm going to talk more about
this. So here is the motivation for the problem. So for point obfuscation,
assume Alice wants to post a puzzle in the newspaper -- one of those puzzles, find
the differences between these two pictures. And she wants to post it together with
a solution, so that those who solve the puzzle can verify that they found the
right solution. But the solution should not be in the clear: only those
who actually found it themselves should be able to verify, right? So it's somewhat
obfuscated, like here, right? Here, the solution's here, but it's obfuscated
because it's written backwards, upside down. But Alice wants something
slightly better than that.
So again, what you want is that correct solutions will be accepted and incorrect
solutions will be rejected. That's the program. And no information will be leaked
by the program other than the ability to check different solutions. Right?
So how do you do that? So this is really -- okay. So assume that
we had an obfuscator for the following family of programs, which I call point
programs. So here is a program in the family: it's a program that has a value
A in its belly, and then for each input it just checks whether the input equals the
value in the belly and says yes or no. Okay? So that's the program. That's the
family of programs: for each A there is a program I_A.
Then, if we had an obfuscator for this, Alice could post an obfuscated
version of I_A, where A is the solution, and be done with it, right -- in the newspaper
or in the cloud or wherever you want. And the functionality preservation
implies that it's going to work, and the Virtual Black Box property implies that it's
secure. So nobody learns anything other than black box access to this program.
So, in fact, it turns out that this can actually be constructed, okay? In fact, we could
construct it before we talked about obfuscation; we just called it differently.
But here is the construction. So let G be a group of large prime order. And then
what I'm going to do, I'm going to say the obfuscated version of I_A -- in
shorthand it's going to be the pair R and R to the A, where R is random in the
group. Or, more precisely, if you fold it into a program, here is the
program: you have two constants, one is R and the other one is B, which is R to
the A -- instead of having A in the clear you have those two constants -- and
basically when you get an input you check whether R to the input equals B. Okay?
So this is an obfuscated version of that point program. Okay? And now
the challenge is to show that this is really the case -- to actually prove it.
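As a concrete toy sketch of this construction: the code below uses a very small
safe-prime group purely for illustration (a real instantiation would use a
cryptographically large prime-order group), and the function names are made up
for this example:

    import secrets

    # Toy prime-order group: quadratic residues mod the safe prime P = 2Q + 1.
    Q = 1019                  # prime order of the subgroup
    P = 2 * Q + 1             # 2039, also prime

    def random_group_element():
        # squaring a random element lands in the order-Q subgroup
        while True:
            h = secrets.randbelow(P - 1) + 1
            r = pow(h, 2, P)
            if r != 1:
                return r

    def obfuscate_point(a):
        """Obfuscation of the point program I_a: publish the pair (R, R^a)."""
        r = random_group_element()
        return (r, pow(r, a, P))

    def run_obfuscated(obf, x):
        """The obfuscated program: accept x iff R^x equals the second constant B."""
        r, b = obf
        return pow(r, x, P) == b

    secret = secrets.randbelow(Q)
    obf = obfuscate_point(secret)
    assert run_obfuscated(obf, secret)
    assert not run_obfuscated(obf, (secret + 1) % Q)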
So actually I was planning to spend a few minutes on the proof, just to get an
idea of how those things look. So functionality preservation is clear: this program
computes the right function. And we show Virtual Black Box under a strong
version of the Decisional Diffie-Hellman assumption. And what's this strong
version? So Decisional Diffie-Hellman says that the tuple R, R to the A, R to the B
and R to the AB is indistinguishable from four random values in the group, where
A and B are chosen at random. And here I'm going to require that this is the case
even when B is chosen at random but A is chosen from any distribution which is
well spread -- which has enough entropy; namely, it has super-logarithmic
min-entropy in the security parameter. Okay?
So I'm going a bit out on a limb here: of course, when A has only a polynomial number
of possible values, or only logarithmic min-entropy, or there are some values that have
noticeable probability, then of course this thing is false. But I am saying that as
soon as you have enough entropy, it's okay. So this is true in the generic
group model, et cetera, but it's a strong assumption.
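Written out, the strengthened assumption is roughly the following (my paraphrase):
for a group G of prime order q, and for every well-spread distribution D over
exponents, i.e. one with super-logarithmic min-entropy,

    \[
    \big( r,\; r^{a},\; r^{b},\; r^{ab} \big) \;\approx_c\; \big( r,\; r^{a},\; r^{b},\; r^{c} \big),
    \qquad r \leftarrow G,\;\; a \leftarrow D,\;\; b, c \leftarrow \mathbb{Z}_q,
    \]

where \(\approx_c\) denotes computational indistinguishability. Standard Decisional
Diffie-Hellman is the special case where a is uniform.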
So how do you prove it? So there are two steps. So the first step is to show that this
obfuscation satisfies some slightly weaker notion of security -- not weaker, a
different notion of security -- which I call distributional indistinguishability. So what
does it mean? That for any well-spread distribution D -- any distribution with
super-logarithmic min-entropy -- if you get an obfuscation of I_A where A is taken
from D, it looks to you just the same as an obfuscation of I_A where A is random.
Uniform. Okay?
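In symbols, distributional indistinguishability (DI) for a point obfuscator O says,
roughly:

    \[
    \text{for every well-spread } D:\qquad
    \big\{\, O(I_{a}) : a \leftarrow D \,\big\} \;\approx_c\; \big\{\, O(I_{u}) : u \leftarrow U \,\big\},
    \]

that is, an obfuscation of a point drawn from any distribution with
super-logarithmic min-entropy is computationally indistinguishable from an
obfuscation of a uniformly random point.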
So: indistinguishable, I cannot tell the difference between the two
things. So if we look again, what it means in our case is that the pair R, R to the
A, with A taken from any well-spread distribution, looks like R and R to the A
where A is uniform. Now, this follows directly from the assumption before, so it's
immediate. So now the hard part is to show that from this DI you can get
Virtual Black Box with simulation. And this is actually generic; it has
nothing to do with the specific scheme. It's a generic statement that any point
obfuscation that satisfies DI also satisfies VBB.
So what do I need to show? I need to show that for any adversary A there is a
simulator such that for any value a -- any single specific secret a that is fixed -- the
probability that A outputs one given an obfuscation of I_a is roughly the same as
the probability that S outputs one with oracle access to I_a. So how do I do that?
So what can S do? Right? S has this oracle that outputs one on one point that
you have no idea about, and otherwise outputs zero. It doesn't seem like a very
useful oracle, right? So what does S do? What it can do is choose a random
point R, compute an obfuscation of I_R, and run A on this.
And hopefully it's going to be a good simulation. But it's not something that always
works, right? Because -- remember that there's little a and big A. So
this has to work for any little a, for any fixed little a. But the
point is that any adversary big A can always have some values in its belly and
just run the obfuscator on these values, right? And if our fixed
input happened to be one of those values that the adversary checks for, then
the simulation will not work.
But intuitively this is not a real attack, you know, because this is really using the
program as a black box, so we should be able to get around it. So the claim is
that this is all the adversary can do: the only thing the adversary
can do is check the program on some fixed polynomial number of values a.
And the proof is easy given this DI. Assume that there is an adversary, and assume
a set T, of super-polynomial size, such that A can tell the difference
between an obfuscation of a value in T and an obfuscation of a value not in T,
right? Then we can of course contradict DI, because we just use the distribution
which is uniform over T, and since T has super-polynomial size, this distribution has
super-logarithmic min-entropy, and this adversary can tell the difference between
something uniform over T and something totally uniform.
So therefore, we know that for any adversary there exists such a set T of only
polynomial size. Now the simulator is in business, because here's this adversary,
and with this adversary comes, non-uniformly, this set T. So what the simulator
will do is ask its oracle about all the points in T. If it finds the right point,
then simulation is easy: it just feeds the adversary an obfuscation of I_a for that
point. Otherwise it can just give the adversary what it wanted to before: an
obfuscation of a random point. And this will work. Okay?
>>: [inaudible] so how did you find T? You [inaudible] if and only if [inaudible].
>> Ran Canetti: So no. So there is a value which is the mean value of the
adversary's output when it is given an obfuscation of a random value -- so there is
a specific value; say it's point four. So I'm saying T is the set of all those values on
which the adversary outputs something which is significantly different from point
four. And T has to be polynomial; otherwise it contradicts DI.
Anyway, so that's the proof. Okay. So this is one construction. And, in fact, there
are other constructions for point obfuscators: one by Hoeteck Wee from very strong
one-way permutations, and other ones from other assumptions, from strong versions
of LWE. But those two actually don't get all the way to every input; they need
some distribution on the inputs. So slightly weaker. But there are constructions.
And another question is: all of these constructions need strong assumptions --
is that inherent? So it turns out that it is, to some extent. So again, there are
results by Hoeteck that if you have point obfuscators at all, then there does not
exist a non-trivial sub-exponential algorithm for circuit SAT. And if you have such
an obfuscator with public randomness -- I mean, all the randomness the obfuscator
uses is put out in the clear, which is like the ones that we saw -- then there exist
these super-strong one-way functions: one-way functions such that any adversary
can invert them on only a polynomial number of points. Okay?
So it inherently requires strong assumptions, even for this small obfuscation
task. Okay. So this looks like a nice game -- put a puzzle in the newspaper -- but,
you know, maybe it's not very useful. But it turns out that we
can use these ideas to do some other things. For instance, here is
one simple thing that we can do. Instead of checking if your input equals some
value, maybe we can check if a substring of the input equals some secret
value: now I have a hidden value in my belly and I want to check whether there
exists a substring of the input which equals my hidden value.
So why is that important? Why is that interesting? So think of a security
vendor, a firewall vendor or whatever: you want to put, on some server on the net,
an algorithm that tries to find viruses. But you don't want to give the virus signature
to the server, because you don't trust the server; you just want to allow the server
to find the viruses for you and flag them when it finds them, but have no
information about the virus itself.
So of course you can do that with this, right? With the server or the cloud or whatever.
So this is a kind of a very primitive secure cloud computing application, right. So
you let the cloud find the viruses for you.
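A hypothetical sketch of how the substring idea could be used on top of the point
obfuscator from before: the vendor ships an obfuscation of the (hashed) virus
signature, and the untrusted server slides a window over the data and tests each
window, learning only whether some window matched. The window length, the
hashing into the exponent space, and the tiny group are all assumptions of this
illustration, not part of the construction discussed in the talk.

    import hashlib

    SIG_LEN = 16  # assumed fixed signature length in bytes

    def window_to_exponent(window: bytes) -> int:
        # hash each candidate window into the exponent space of the toy group
        return int.from_bytes(hashlib.sha256(window).digest(), "big") % Q

    def obfuscate_signature(signature: bytes):
        # what the vendor publishes: a point obfuscation of the hashed signature
        return obfuscate_point(window_to_exponent(signature))

    def scan(obf, data: bytes):
        # what the untrusted server runs: test every window, learn only hit or miss
        for i in range(len(data) - SIG_LEN + 1):
            if run_obfuscated(obf, window_to_exponent(data[i:i + SIG_LEN])):
                return i  # offset of a match
        return None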
So something else that you can do: you can extend this. So here you're checking
if your input exactly equals the point; maybe you want to check that it's close
to the point, by Hamming distance or something. And this can also be done. But I
think this, again -- this work doesn't do it with the definition for every input; it needs
some distribution on the inputs. So it's interesting to see if you can actually extend
it to every input. But so this is something else that you can do.
You can extend it to more structured functions. You can do hyperplane
membership. So assume that what you keep in your belly is a vector in some
d-dimensional space, and you want to accept all those inputs, all those vectors,
that are orthogonal to your vector -- all the vectors on the hyperplane that's
orthogonal to your vector. So that's something that you can do. There's more
structure, and it uses similar techniques.
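A rough sketch of the hyperplane-membership variant in the same toy group from
before: the published program consists only of group elements, and it accepts
exactly the input vectors whose inner product with the hidden vector is zero modulo
the group order (an illustrative sketch under those assumptions, not the exact scheme
from the talk):

    def obfuscate_hyperplane(a_vec):
        # hide the vector a: publish (R^{a_1}, ..., R^{a_d}) for a random R
        r = random_group_element()
        return [pow(r, ai, P) for ai in a_vec]

    def on_hyperplane(obf, x_vec):
        # accept x iff <a, x> = 0 (mod Q): check that prod_i (R^{a_i})^{x_i} == 1
        acc = 1
        for c, x in zip(obf, x_vec):
            acc = (acc * pow(c, x, P)) % P
        return acc == 1

    # Example: a = (1, 2, 3); x = (1, 1, Q - 1) is (1, 1, -1) mod Q, orthogonal to a.
    h = obfuscate_hyperplane([1, 2, 3])
    assert on_hyperplane(h, [1, 1, Q - 1])
    assert not on_hyperplane(h, [1, 1, 1])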
In fact, you can use this to get signatures with some strong properties, that are
resilient against leakage. This doesn't do as well as the best constructions that
we have, but it's something: signatures with weak keys.
>>: [inaudible].
>> Ran Canetti: Yeah. Sorry. And [inaudible].
>>: [inaudible].
>> Ran Canetti: Yeah, I should have mentioned that. Yeah. Sorry. Yeah. So something
else that we could do -- I'm running out of time -- but something else that we can
do is, instead of having a program that just keeps one secret in its belly and tells me
yes or no, you can have a program that keeps two secrets in its belly, A
and B, and if I give it one secret, it gives me the other one, right? Yeah.
>>: [inaudible] Easter egg.
>> Ran Canetti: Okay. Yeah. Okay. So Easter eggs, okay. Why Easter eggs?
>>: On Easter they go search for eggs all over the place. They used to hide them in
code all over the place until the customers got mad.
>> Ran Canetti: I see. Okay. Okay. So that's actually a good
alternative name. So the Easter egg functionality. So again, I keep this hidden
value in my belly, and when I get the key, or the trigger -- the secret that I expect --
then I release my secret.
And you can implement this in a kind of modular way from a point
obfuscator. So what do I do? So assume that I have a
point obfuscator that obfuscates points as before. So what I'm going to do,
I'm going to publish n plus one point obfuscations. So the first one is going to be a
point obfuscation of this point A -- the trigger from before. And the other ones are
going to be obfuscations of either A, if the corresponding bit of B is one, or of some
other random point, if the corresponding bit of B is zero. Okay?
So now, if I have the right trigger A, I can first run the first one to make sure I have
the right trigger, and then I can just run the other ones one by one, see if I fail or
succeed, and thereby get the corresponding bit of B, right?
And if I don't have A, then I'll learn nothing from this. So that's the way to do it.
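Here is a naive sketch of this "Easter egg" / digital locker construction on top of the
point obfuscator sketch from before. As the speaker points out just below, composing
point obfuscations like this is not automatically secure, and the toy group also makes
accidental collisions possible; this is only meant to illustrate the structure.

    def lock(trigger, payload_bits):
        # one obfuscation of the trigger, then one per payload bit:
        # the trigger again for a 1-bit, a fresh random point for a 0-bit
        check = obfuscate_point(trigger)
        slots = [obfuscate_point(trigger if bit == 1 else secrets.randbelow(Q))
                 for bit in payload_bits]
        return (check, slots)

    def unlock(locker, candidate):
        # with the right trigger, read the payload bit by bit; otherwise learn nothing
        check, slots = locker
        if not run_obfuscated(check, candidate):
            return None
        return [1 if run_obfuscated(s, candidate) else 0 for s in slots]

    key = secrets.randbelow(Q)
    locker = lock(key, [1, 0, 1, 1])
    assert unlock(locker, key) == [1, 0, 1, 1]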
And the nice application for this is to do this nice thing that we've all known how to do
for 30 years already, which is semantically secure encryption, right? So because
basically what I'm going to do: the encryption of M under the key K is just a published
obfuscation of K, M -- the locker with trigger K and payload M. So it looks a bit
ridiculous to go back to encryption with all this, but it turns out this encryption is not
so ridiculous, because it ends up being a very strong encryption. And in particular
what it gives you: it gives you security against weak keys, against weakly distributed
keys, and it also gives you security against key-dependent messages -- so even if
the message depends on the key. Those are all things that have been studied
recently. It turns out that you can get all those things from these digital lockers, or
obfuscations of Easter eggs, and it turns out that this is actually almost the same,
so you can go back and forth.
But you can actually get more. So you can get even encryption which is at the
same time secure against weak keys and key-dependent messages and all the mix
of those things.
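In that spirit, a hypothetical usage of the locker sketch above as symmetric
encryption (again purely illustrative; a real scheme needs a composable obfuscator,
which is exactly the issue raised next):

    def encrypt(key, message_bits):
        # Enc_K(M): publish a digital locker with trigger K and payload M
        return lock(key, message_bits)

    def decrypt(key, ciphertext):
        # Dec_K(C): open the locker with K
        return unlock(ciphertext, key)

    k = secrets.randbelow(Q)
    ct = encrypt(k, [0, 1, 1, 0])
    assert decrypt(k, ct) == [0, 1, 1, 0]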
But it turns out that I was cheating a bit. I have three more minutes. Yeah. So I
was cheating a bit in this construction that I showed you before, the Easter eggs.
This construction doesn't quite work as I described it, because, remember, what I
did: I just put there a bunch of obfuscations of related values, and I kind of
made the quick conclusion that if one of them is secure, then many of them put
together are also secure.
But it turns out that this is not so easy and, in fact, it's not true in general. So
you can actually construct an obfuscator which satisfies the definition, but if you
see several obfuscations, one next to the other, they stop being secure.
And, in fact, we don't have any construction of point obfuscation that we can
prove is composable in this way under the notion of Virtual Black Box.
However, there is actually a way around it. So we can relax this
Virtual Black Box for a moment to allow for unbounded simulators. So now the
simulator is going to be computationally unbounded, but it's still going to be
able to ask only polynomially many queries to its oracle; otherwise the whole thing
becomes ridiculous. And we call this Virtual Grey Box. So it's a weaker notion of
security for obfuscation. However, under this notion we can show that the
construction from before is actually composable -- VGB, not VBB. So that's great.
And then it turns out that this VGB, Virtual Grey Box, is good enough, because it
actually suffices for these applications, for the strong encryption that we had before.
And, in fact, it also implies -- this Easter egg functionality also implies --
encryption which is resilient to related-key attacks. Right?
So if I have an encryption under a key K, then under a key which is related -- K plus 1,
2K, whatever -- it's still going to be secure. And this is work in this
year's Crypto.
Okay. So I'm running out of time. So there is this other set of requirements for
obfuscation, which is non-malleability. So I want to be able to make
sure that a program that I have cannot be changed in a meaningful way into
another related program. So this Virtual Black Box -- it seems that it guarantees
that, because a simulator with only oracle access cannot change the
functionality in a meaningful way, so therefore the adversary shouldn't be able to
either. But it turns out that it doesn't really; you have to make more requirements,
and then you can define it, then you can construct it -- but only in a very limited way.
I think there are lots of interesting questions there about how to do it. And that's
almost it.
So one set of questions that remains is: what if you cannot obfuscate?
Obfuscation is a very hard problem. What happens when you cannot do it just
from scratch? But maybe with a little bit of hardware assistance you can do it.
Maybe you can have some secure hardware -- not everything, but some small
piece of secure hardware -- and maybe then you could do things.
So can we have hardware-assisted obfuscation? So of course, if we don't make
any restriction on the hardware, then of course we can have secure hardware do
everything and be done with it. But can we minimize the
security requirements on the hardware, so that the hardware would be very small?
So there are some partial answers.
So we can weakly obfuscate any program given only one-out-of-two
memory, simple memory. But this doesn't really give us full obfuscation. It would
be nice to be able to do something like this.
And another question is: can you design hardware that does it? Another
question: can you use hardware that exists widely, like a TPM -- right, every
computer has this secure chip, a small chip -- to do secure obfuscation, but
without loading the TPM too much?
So this is a very good question, and I think it's a great research question.
>>: [inaudible].
>> Ran Canetti: Yes?
>>: Sorry. [inaudible].
>> Ran Canetti: Yes.
>>: [inaudible] number of bits, et cetera, in the TPM [inaudible].
>> Ran Canetti: Right.
>>: [inaudible].
>> Ran Canetti: So the chip -- assume the chip is slow, this TPM is slow. So you
-- right, you don't want to wait for it too much. Okay. [inaudible]. Okay. So this
is one set of questions, about hardware, and there are some other questions that
naturally are more high-level questions. So with this Virtual Black Box or
Virtual Grey Box we can only obfuscate very simple functionalities, very specific
functionalities. Can we have some generic obfuscation algorithm that goes generic
and obfuscates everything, or maybe almost everything, or whatever -- some larger
class of programs -- with some reasonable notion of security? So we know that
for everything we can't, but maybe something else. And the sad thing is
that we don't even have candidates. It's not that, you know, I have
candidates and I don't know how to prove them secure or find the right notion; I
don't even have a candidate that does anything meaningful in a generic way. So
we're used to -- you know, we always go over the circuit gate by gate; you do
this, you do that, and we could always get everything, right?
But here it doesn't seem to work. So this going gate by gate seems somehow
inherently doomed to failure, because really what you want to hide is
the internal intermediate values of the computation. So can we do anything?
So fully homomorphic encryption sounds like a great thing, and maybe you can
use it, but, you know, I don't know. And another direction is: can we come up with a
set of cryptographic tools that will help with practical obfuscation problems, you
know? Those people out there that want to obfuscate code, those vendors -- can
we have tools that can help them come up with better products that really can be
used? So that's another set of questions.
And another one is whether this different way of looking at cryptographic
problems can maybe give us some understanding about cryptography in general.
So that's it.
[applause].
>> Ran Canetti: Yeah?
>>: Can you go back to your definition slide, way back.
>> Ran Canetti: Okay.
>>: And I was confused about some -- the alternate definition you gave. And
maybe I'm just misunderstanding the quantification. But just a second.
>> Ran Canetti: This one?
>>: Yes [inaudible] is that over all possible [inaudible] over all possible input
distributions?
>> Ran Canetti: So for -- it's for any program. So the way it was stated here, there
was no family, which is the very general version. But if you want to do something
which is achievable, you want to fix the family of programs, and now you want to say
that for any program in the family it should happen. So you fix a program in the
family, and these two random variables -- these two probabilities, sorry -- should be
the same.
>>: Again, maybe I'm misunderstanding you but isn't it the case you've got a
family of programs, each of which produces one 50 percent [inaudible].
>> Ran Canetti: Uh-huh.
>>: It's very easy to simulate producing one [inaudible] input but not really
simulating the program.
>> Ran Canetti: Yeah, but so there's one specific program that's out there that I
put, you know -- so let's say the hyperplane. The functionality: the program
has a hyperplane, and for a given input it says yes if it's here or no if it's there, you
know -- if it's above or below or whatever. So maybe that's not a great example, so
that's not very good. But some function like this: you have a hyperplane
hidden in this program.
So somehow this adversary gets access to this obfuscated program, it tries to
find some interesting [inaudible] so you know if this hyperplane is the first -- you
know, the first coordinate is a different [inaudible] and this what it tries to do. And
it seems like they should be able to do the same thing but just oracle access to
the problem.
So this program has the hyperplane written here somewhere. And they can
answer messages yes or no above, below.
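For concreteness, the kind of functionality being discussed can be sketched as a tiny program: a hidden hyperplane (here just a hidden normal vector, chosen arbitrarily for illustration) that answers only above or below on each query point.

    # Toy illustration of a hyperplane-membership functionality: the normal
    # vector is the secret the obfuscation is supposed to hide, while oracle
    # access only reveals above/below answers, one query at a time.
    HIDDEN_NORMAL = (3.0, -1.0, 2.0)   # arbitrary secret, for illustration only

    def hyperplane_oracle(point):
        s = sum(w * x for w, x in zip(HIDDEN_NORMAL, point))
        return "above" if s >= 0 else "below"

    print(hyperplane_oracle((1.0, 0.0, 0.0)))   # above
    print(hyperplane_oracle((0.0, 1.0, 0.0)))   # below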
>>: But the overall probability of one I can match.
>> Ran Canetti: No, so -- the overall -- but there is one specific adversary, and for any adversary you choose the simulator. So there is a program, and there's an adversary that tries to compute one specific property or predicate of this particular program. So this pinpoints the bit.
>>: Okay.
>> Ran Canetti: Okay?
>>: I think so.
>>: Maybe like last question [inaudible].
>>: [inaudible] two definitions, VBB and VGB?
>> Ran Canetti: We can separate them for -- so there's things that [inaudible] so
we can separate them for the -- so I'm trying to figure out which way it goes. So
we can separate them for the circuit model, but we cannot separate them for the
Turing machine model.
>>: So specifically there's a program which can be obfuscated [inaudible].
>> Ran Canetti: Uh-huh.
>>: It cannot be a [inaudible].
>> Ran Canetti: Yeah. In particular this program -- look at the family of programs from the BGI result, right, for the circuit model. That program actually is obfuscatable in the VGB model, just because the simulator can break the encryption, whatever -- just because it's the circuit model. In the Turing machine model I don't know.
So, in fact, it's a good question, the difference between the two. It could very well be, you know, for all we know, that any program can be VGB obfuscated in the circuit model.
>>: That's where we should leave it then. [inaudible].
[applause].
>> Craig Gentry: So basically I'm going to be -- I'm going to talk about
outsourcing computation. I know there's another talk just a couple after mine with the same words in the title, and I hope that's okay since this is a cloud crypto workshop and, you know, there's bound to be some redundancy in that area.
But I'm going to talk about outsourcing computation privately, verifiably and perhaps even practically. So first of all privately. As you may have guessed, I'd
like to talk a little bit about fully homomorphic encryption, but not in as much
detail as usual since I'm a little tired of talking about it.
But basically the problem is that you have some client with an input and you have
a server. And the client would like to delegate the computation. And
furthermore, she would like to delegate it in such a way that the cloud doesn't
even see what her input is. Okay?
So that means basically that she wants to encrypt her input to make it private
from the cloud. And then somehow, despite the fact that it's encrypted, you would hope that for any function F that's chosen either by the cloud or the client or whatever, it's possible for the cloud, without the secret key, to compute something that looks like the encryption of F of X.
Okay? And then at that point the cloud just sends it to the client and the client
uses her decryption key to recover F of X. Okay? So this is a fully homomorphic
encryption scheme. In the past it was called a privacy homomorphism. But
nowadays we call it homomorphic encryption. And the "fully" just means that we would really like it to work for every possible function F that can be expressed as a boolean circuit, say. You know, in the past there were plenty of schemes that worked for some subsets, some small class of functions. But with "fully", we would like it to work for all functions.
So a bit more formally, fully homomorphic encryption scheme has the usual
algorithms of an encryption scheme, KeyGen, encrypt and decrypt. And it has
this additional algorithm eval, which is again a public algorithm. It doesn't take
the secret key as input, it just takes some ciphertext and a function and it outputs
another ciphertext which encrypts the function applied to the input.
And a property we would like of that final ciphertext there is compactness. There have been some schemes in the past where if you kind of operate on the ciphertext, the resulting ciphertext grows and grows and grows until it's really, really huge. What you'd like is that the encryption of F of X there -- let's assume that F of X is a single bit, the output of a boolean circuit -- we would like that ciphertext to be a nice compact normal looking ciphertext so that it's really fast to decrypt it and recover the output.
And the point is we want to delegate processing. And if for some reason it took a
really long time to decrypt this output ciphertext that wouldn't be any good. We
want the recovery time to be kind of independent of how much computation was
involved in computing F. Okay?
And the notion of security is the usual notion of security. It's just semantic
security. It should be hard to tell whether a zero or one is encrypted.
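To pin down the shape of such a scheme, here is a minimal interface sketch in Python. The class and method names are hypothetical, invented only to mirror the four algorithms just described; they are not the API of any real library.

    # Hypothetical four-algorithm interface for a (fully) homomorphic
    # encryption scheme, for illustration only.
    from abc import ABC, abstractmethod
    from typing import Any, Callable, List, Tuple

    class HomomorphicScheme(ABC):
        @abstractmethod
        def keygen(self, security_param: int) -> Tuple[Any, Any]:
            """Return (public_key, secret_key)."""

        @abstractmethod
        def encrypt(self, pk: Any, bit: int) -> Any:
            """Encrypt one plaintext bit under the public key."""

        @abstractmethod
        def decrypt(self, sk: Any, ct: Any) -> int:
            """Recover the plaintext bit.  Compactness means this stays fast
            no matter how much evaluation produced ct."""

        @abstractmethod
        def eval(self, pk: Any, f: Callable[..., int], cts: List[Any]) -> Any:
            """Public operation, no secret key: given ciphertexts of bits
            x1..xn and a boolean circuit f, return a ciphertext of f(x1..xn)."""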
And I just had to throw these slides in here because I love them so much. I like
to describe homomorphic encryption in terms of a physical analogy. So you can
see that, you know, kind of it works in the real world in some sense.
So the physical analogy I have is that you have a jewelry store owner, Alice, and she has some raw materials, and what she would like is to delegate to some workers that she has the processing of these raw materials into rings and necklaces. Okay? So this is what she wants them to do. But she's worried about theft, right? If she just willy-nilly gives the workers the raw materials, she might lose some of her materials. So she's worried about theft. So in some sense she has an analogous problem to the homomorphic encryption context. She would like a worker to process the raw materials without giving away complete access to the raw materials. Okay?
And the solution here is kind of an encryption glove box, you know. So Alice here creates this glove box with a lock on it. And to do this, she puts the raw materials in the box, and she sends that over to the worker. And, you know, the worker sticks his hands inside and assembles the ring while it's inside the box. And, you know, assuming the box is impenetrable, it's kind of
useless to him. So he sends it back to Alice, and Alice just uses her decryption
key to unlock it and recover the finished piece.
So that's basically the delegation of computation scenario that we want here to
get out of our encryption scheme.
Essentially the homomorphic property is the gloves. Most encryption schemes
don't have gloves to allow you to manipulate what's inside.
Okay. So now, suppose you had a fully homomorphic encryption scheme. So
how would you solve like maybe a typical cloud computing problem? So
suppose for example that I just -- I encrypt all my files, I put them out on the
cloud and at some later point I want to retrieve some particular files, you know,
some files that have some combination of keywords.
So what I do is I just encrypt the original files with this homomorphic encryption
scheme, okay, and then later I have some query I want to make, and that query
can be expressed as some boolean circuit, basically. Okay? So you just encode
that query.
And then the cloud just operates on the encrypted data that's sitting on its
servers, combines it with the function F and runs this evaluation algorithm
essentially processing the data while it's inside the encryption box. And what
comes out is a ciphertext that is supposed to by definition encrypt F applied to
the messages inside those original ciphertexts there. And that's what I wanted,
right? I wanted basically -- that's the response to my query. That should be
precisely the files that satisfy the particular constraint having these keywords.
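Assuming some concrete scheme implementing the hypothetical interface sketched above, the store-then-query flow just described would look roughly like this; keyword_query is only a placeholder for whatever boolean circuit encodes the search.

    # Rough call-flow sketch of keyword search over encrypted files, in terms of
    # the hypothetical HomomorphicScheme interface above; not a real library API.
    def store_files(scheme, pk, files_as_bits):
        # Client: encrypt every bit of every file before uploading.
        return [[scheme.encrypt(pk, b) for b in f] for f in files_as_bits]

    def keyword_query(*file_bits):
        # Placeholder for the boolean circuit compiled from the client's query
        # ("does this file contain these keywords?"); returns a single bit.
        raise NotImplementedError

    def cloud_answer(scheme, pk, encrypted_files):
        # Cloud: evaluates the query circuit under encryption, file by file,
        # never seeing plaintext.  One encrypted match-bit per file comes back.
        return [scheme.eval(pk, keyword_query, ct_bits) for ct_bits in encrypted_files]

    def read_answer(scheme, sk, encrypted_answers):
        # Client: decrypt the short answers to learn which files matched.
        return [scheme.decrypt(sk, ct) for ct in encrypted_answers]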
Okay. And so there are lots of potential applications of fully homomorphic
encryption. I mean, you could do an encrypted bing search, for example. This is
very theoretical. You wouldn't -- I mean, it would take a really long time to do
this. But in principle you could just -- I could just take the bits of my query and
encrypt them, okay, and, you know, if bing agreed to then it could just take my
encrypted query, combine it with all the data that's sitting here and whatever
search function it uses and come up with a ciphertext that is responsive to my
query. Okay? And it would never see what my query was.
This would be very slow, so I'm not claiming it's practical or anything. But in
principle, it's doable.
>>: [inaudible] homomorphic [inaudible], you know, from actual queries like
[inaudible] entire -- I mean if you [inaudible] actually don't have to look at the
entire data? Because here for example if I'm [inaudible] is encrypted --
>> Craig Gentry: Yeah. So it's kind of doubly inefficient, not just the fact that
everything is encrypted but also that in practice you would use indices to make
the search go blazingly fast, you know, some sort of -- you can imagine a binary search type thing.
>>: [inaudible].
>> Craig Gentry: But if everything is encrypted you can't really do a binary
search on the data because you can't see anything, you're just doing everything
kind of blindly.
>>: [inaudible] characterize, I don't know, this kind of efficiency limitations
[inaudible] privacy. Because I haven't seen that like formally [inaudible] but I
haven't seen like formula what exactly is the price.
>> Craig Gentry: Yeah. Yeah. That's true. That hasn't really been formalized.
But basically the idea is when you try to encode binary search as a circuit, what it
ends up looking like is, well, the circuit has to touch all of -- all of the data. I mean, it's some static thing that just has to touch all the data. And that's what the homomorphic encryption scheme is run on; it's run on a circuit that's of size linear in the data, even though in principle a binary search on unencrypted data could be log time.
>>: Homomorphic encryption on e-mail would be great but it's not [inaudible].
>> Craig Gentry: Yeah. Okay. So some other applications. Private information
retrieval. I'll talk about that later. But basically the idea is you just want one little
slot of information without telling the database which slot you just picked.
Searching encrypted data, that's basically what I was talking about there. There
are other approaches using things like pairings, but with fully homomorphic encryption it's a bit more flexible because you don't have to prespecify some sort
of keywords that you're going to search for, you can just do it dynamically.
You can also do things like have an access control mechanism that's kind of
oblivious. I can encrypt my credentials and offer them to the server and the
server can take these encrypted credentials and offer me back the -- some
information only if my credentials satisfy some constraint. But of course it would
be operating on these encrypted credentials, it wouldn't even know what exactly
they are.
And there are various other good things that homomorphic encryption does for
two party, multiparty computation.
So this is -- it was recognized early on that this would be a very useful thing, this privacy homomorphism, basically right after the invention of RSA, and their motivation was basically similar to ours now, you know, searching on encrypted
data. And over time there have been plenty of somewhat homomorphic
schemes. Like in RSA -- very, very basic RSA -- if you just multiply two RSA ciphertexts under the same key, what you get is a ciphertext that essentially encrypts the product of the original messages. And so you can do a multiplication operation inside the encryption box. So it's multiplicatively homomorphic. But there's no way to do an add operation. And really, to do general functions, you need -- basically if you look at AND gates and OR gates and so forth, all these can be expressed as some combination of additions and multiplications, so if you could just do additions and multiplications sort of indefinitely, then that's really all you need for fully homomorphic encryption. But we didn't have that. We had schemes that did one operation.
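As a quick concrete check of that multiplicative property, here is textbook RSA with toy parameters (no padding, completely insecure, illustration only): multiplying two ciphertexts modulo n yields a valid encryption of the product of the messages.

    # Textbook RSA with tiny primes (insecure; illustration of the
    # multiplicative homomorphism only).
    p, q = 61, 53
    n = p * q                                # public modulus
    e = 17                                   # public exponent
    d = pow(e, -1, (p - 1) * (q - 1))        # private exponent (Python 3.8+)

    def enc(m):
        return pow(m, e, n)

    def dec(c):
        return pow(c, d, n)

    m1, m2 = 7, 11
    c_prod = (enc(m1) * enc(m2)) % n         # multiply the ciphertexts only
    assert dec(c_prod) == (m1 * m2) % n      # decrypts to 77, the product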
And here is a scheme that does quadratic formulas. And as I kind of alluded to,
there are other homomorphic encryption schemes where, yeah, in principle you could evaluate on the ciphertext forever. But the ciphertext just expands and gets
bigger and bigger even exponentially with the depth of the circuits, so they're not
really very practical.
Okay. So now we have some fully homomorphic encryption schemes. We have
three of them now. All of them basically use the same blueprint, which is a little
disappointing. And I'll talk about this blueprint later. And this blueprint is kind of
inherently slow. So therein lies the problem. We have the solution, but it's a bit impractical because they all use this blueprint and we don't have any other way of doing it.
So I'll get back to this. But I want to get -- talk about one more application that I
was involved with of FHE, and that is this idea of outsourcing computation
verifiably, not just privately but verifiably.
And here the setting is you know, again, you have a client that wants to delegate
the computation to the cloud but she wants a quick way to check the cloud's
work. Of course she could, you know, just duplicate the cloud's work, if she
wanted to take the time, make sure it's the same. But the point is she wants to
spend much, much less time than it would take to do the work herself. Okay?
And so here we have, you know, is kind of somewhat analogous setup. We have
some input X and a function F. And the server puts in a lot of work. And it
outputs F of X, something that looks like F of X and kind of a proof that this is
indeed the correct value, F of X. Okay? And the client just verifies that the proof
is correct. And unfortunately homomorphic encryption doesn't solve this problem immediately, because I mean the server can do all sorts of computations on encrypted data, but how do you know at the end that it encrypts the result you want, as opposed to, you know, the cloud just created a fresh ciphertext that encrypts whatever it wants and you have no idea.
So a bit more formally, a verifiable computation scheme has four algorithms, also. There is a KeyGen algorithm. But in this case, the KeyGen algorithm depends on the particular function F that the client wants to be computed. Okay? The function F can be like a universal circuit. So a universal circuit can take some description of a function as well as an input, and then it will output that function applied to that input. So when I say it's constrained to a particular function, that sounds very limiting. But in fact that can be a universal circuit, which would allow you to encode any function.
So second we have a problem generating algorithm, which is sort of like an
encryption scheme, except it uses the secret key that Alice hides. Okay.
And then there's a computing algorithm where basically the worker takes
whatever the output of the problem was and just kind of manipulates it and
outputs some answer.
And then finally there's a verification algorithm which includes -- it's kind of like
analogous to decryption in some sense. I mean, she takes the output of the
worker and kind of decodes it and sees if it's the right thing or not, verifies the
proof. Okay?
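As with the encryption scheme earlier, the shape of these four algorithms can be written down as a hypothetical interface; the names and types below are invented for illustration and are not from the actual paper or any library.

    # Hypothetical interface for a non-interactive verifiable computation
    # scheme with the four algorithms described above; illustration only.
    from abc import ABC, abstractmethod
    from typing import Any, Callable, Optional, Tuple

    class VerifiableComputation(ABC):
        @abstractmethod
        def keygen(self, f: Callable, security_param: int) -> Tuple[Any, Any]:
            """One-time (possibly expensive) setup bound to the function F.
            Returns (evaluation key for the worker, secret key for the client)."""

        @abstractmethod
        def probgen(self, sk: Any, x: Any) -> Tuple[Any, Any]:
            """Client encodes an input x, a bit like encrypting it.
            Returns (encoded input for the worker, per-input decoding state)."""

        @abstractmethod
        def compute(self, ek: Any, encoded_input: Any) -> Any:
            """Worker does the heavy work and returns an encoded answer that
            plays the role of F(x) together with a proof."""

        @abstractmethod
        def verify(self, sk: Any, state: Any, encoded_answer: Any) -> Optional[Any]:
            """Client quickly checks the answer; returns F(x) if the proof
            verifies, or None if the worker cheated."""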
And again, we want a compactness property here. We want the output of the worker nice and short, so that it's very fast to check it. Okay? Hopefully independent of the size of the function.
And security, what we want is, well, basically no bad proofs: it's impossible for an adversary to output some Y prime and a proof that, you know, Y prime is indeed F of X when that's not the case.
And even in the setting it would be nice to have some input privacy although
that's not really part of the definition. You can imagine that Alice just originally
would want to keep her input private. But we're not demanding that here. That
would be just kind of a feature.
So for this there are lots of applications. So I mean there are many reasons you
would want to check that the cloud performs correctly, even to protect against benign errors. If the data is very important -- it's going to be used in some other computation or financial transaction -- you want to protect even against benign
errors.
For small devices maybe a small device would like to outsource this computation.
And there you have just sort of the same issues.
In large scale computations like SETI@home and folding@home, there you need
to weed out the bad results before they're incorporated, obviously. And if the
results weren't verifiable, there might be a strong incentive for users to fake them.
So there are plenty of kind of ad hoc solutions to this. One is just redundancy.
For example, at folding@home you just have multiple people compute the same
thing. Take a majority. And there are various audit based solutions that -- I
mean, the client itself for you know like folding@home can recalculate some
portions itself just to make sure that it's correct.
And there are various kind of secure hardware solutions. But they don't really
completely solve the problem.
In terms of more theoretical crypto work, there's a long line of work beginning with interactive proofs, where the idea is that you have a powerful prover trying to convince a verifier of something the verifier can't compute on its own. These are
interactive. We kind of wanted to avoid interactive solutions.
There are probabilistically checkable proofs. So there you can have a
computation that's, you know, it's a very long computation. And it has a very long
proof of correctness. But what you can do is encode this proof of correctness in
a PCP way, which means that the verifier can just take portions of the proof and
as long as the prover doesn't know beforehand which portions of the proof the verifier is going to pick, then it's basically stuck, and the verifier just checks these portions and then it's satisfied with some probability.
You also have efficient arguments and CS proofs. And there, basically, you sort of overlay some crypto on top of a PCP. The prover sort of commits cryptographically to its PCP, and in the case of CS proofs the challenge, kind of the query by the verifier, is done non-interactively by using a random oracle, okay? So the prover applies a random oracle to its own proof, which makes it kind of hard for the prover to predict what it is, and that says which bits of the
proof the prover has to give to the verifier. Okay.
And so -- yeah, and so all of these schemes there's really no notion of privacy of
the data, which would seem like a pretty important aspect in a lot of scenarios.
Okay. So I want to talk, I guess very briefly -- I think I have at least 10 extra minutes -- about a recent solution that we have to this problem using fully homomorphic encryption. And what you get is a non-interactive efficient verifiable computation scheme for every function F, without these PCP based solutions that we had in the past.
And furthermore, there are no random oracles. I mean random oracles don't
really exist, so that's kind of an impediment to their acceptance.
So the size of the worker's response grows only linearly with the output of the function F, which is obviously the best you can possibly hope for. Whereas in the
case of the PCP based proofs, I mean, there's some higher dependence. It's still
not very high, but it's a little higher than just plain old linear.
And we get input privacy for free because we're using fully homomorphic
encryption, so that's what you would expect. So there is a drawback of this scheme, and that is the setting: you have this function F and you're going to be receiving multiple different inputs for it over time, and you're going to, you know, want to evaluate it dynamically at that time. But it's going to be the
same function F. Okay?
So we have this amortized notion of complexity here where basically for this
function F there's a one time setup procedure, which is kind of expensive for the
client. It takes as long as computing F. And once that is done, the subsequent
evaluations of F, you know, the client will receive this benefit of the verifiable
computation scheme that it will just have to do a short verification work on what
the worker sends to it. Okay?
And there's one other drawback and that is, okay, suppose somebody cheats,
okay? So I detect -- I receive a proof that doesn't verify. What do I do then?
So unfortunately in this scheme, if you detect that someone has cheated, you
basically have to stop at that point and you can't -- can't use what you created in
this one time pre-processing phase any more, you have to create another one.
Okay? Okay.
So here's the high-level idea. It's a very simple idea. I hope it's not too simple.
But it's -- the idea is basically take a one-time verifiable computation scheme,
okay? In this case, you're only going to do -- you're only going to compute the
function F once. Okay? And it's pretty obvious we have a one-time verification
scheme. And that's Yao's garbled circuit, which I'll describe in just a minute.
Okay? But, you know, for some reason that I'll get to in a minute, trying to reuse
this Yao garbled circuit is not secure at all, okay. But it's secure once. Okay?
So you take that one-time scheme and you basically sprinkle some FHE on top
and so what we'll do is like if you want to evaluate F repeated times, each time
you generate a new FHE public key, and instead of just sending the Yao input
wire labels in the clear, which is what you would do up here, you encrypt them
under that fresh homomorphic encryption public key. Okay?
And then what happens -- I mean, what would happen in the original scheme is
that the user would kind of evaluate the Yao garbled circuit and get some label
for an output wire. Okay?
So here though the worker does the same thing, but it's all inside the homomorphic encryption scheme, okay? So, I mean, you know, any function that you can evaluate you can sort of
evaluate inside the homomorphic encryption scheme and that's kind of the point.
And so you just evaluate that entire garbled circuit while inside the fully
homomorphic encryption scheme. And you -- the worker ends up with an
encryption of the bits of a Yao label, instead of the Yao label itself. And the client
just decrypts that, checks that it's a good label.
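One round of this construction can be sketched roughly as below, reusing the loose hypothetical FHE interface from earlier (stretched here to multi-bit values for readability); evaluate_garbled stands in for the ordinary Yao evaluator, and everything about garbling itself is abstracted away.

    # Conceptual sketch of one round: fresh FHE keys, encrypted Yao labels,
    # garbled-circuit evaluation carried out inside the homomorphic encryption.
    # All names are illustrative; this is not the actual construction's code.
    def one_round(scheme, garbled_circuit, chosen_input_labels, evaluate_garbled):
        # Client: a fresh FHE key pair, used for this round only.
        pk, sk = scheme.keygen(security_param=128)

        # Client -> worker: the bits of the chosen input-wire labels are sent
        # encrypted under the fresh key instead of in the clear.
        enc_label_bits = [scheme.encrypt(pk, b)
                          for label in chosen_input_labels
                          for b in label]

        # Worker: the circuit it evaluates homomorphically is "run the Yao
        # evaluator on these label bits against the public garbled circuit".
        def yao_eval(*label_bits):
            return evaluate_garbled(garbled_circuit, label_bits)

        # Result: an FHE encryption of the bits of some output-wire label.
        enc_output_label = scheme.eval(pk, yao_eval, enc_label_bits)

        # Client: decrypt and accept only if the result equals one of the two
        # valid output labels; which one it equals is the verified output bit.
        return scheme.decrypt(sk, enc_output_label)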
Okay. So just to review, in case you don't know it, here's Yao's garbled circuit. So you have some function F that you want to compute. You express it as a boolean circuit with AND, OR, NOT gates. And then you have a lot of gates that look like this, that have two inputs and one output. And you associate to each wire in the circuit a couple of labels. Strings. Say 100-bit strings. Okay? Or 128, just to be precise.
And then to each gate in the circuit you associate some ciphertexts, four ciphertexts. And what these ciphertexts mean is that you recover the label of the output wire from the labels of the input wires -- let's say the operation is an AND.
The AND of zero and zero is zero. So G00 here is going to be 0. So that's -- you
will recover C0 if you have the labels A0 and B0. Okay? If you happen to -- so
as you percolate through the circuit you're going to have exactly one label at
each wire. Okay? And if you happen to have A0 and B0 then you can go here
and you can decrypt the ciphertext and you can recover C0. Okay?
But if you have A0 and B0 and you try to decrypt anything else, you're going to
have a problem, because there's going to be some label that you don't know, for
example B1 here. So that means you won't be able to recover C of 01. Except that for an AND gate that happens to be the same label, C0. But you wouldn't know that. Okay?
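Here is a toy garbled AND gate in Python, using a hash of the two input labels as a one-time pad and an all-zero tag to mark the row that decrypts correctly. Real garbling schemes use point-and-permute and proper double encryption; this is only meant to make the four-ciphertext table concrete.

    # Toy garbled AND gate: four ciphertexts, one per input combination, each
    # hiding the correct output-wire label.  Simplified; illustration of the
    # table structure only.
    import hashlib, os, random

    LABEL_LEN = 16              # 128-bit wire labels
    TAG = b"\x00" * 16          # all-zero tag marks a successful decryption

    def pad(k1, k2):
        return hashlib.sha256(k1 + k2).digest()    # 32-byte one-time pad

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def garble_and_gate():
        A = [os.urandom(LABEL_LEN), os.urandom(LABEL_LEN)]   # labels for wire a = 0, 1
        B = [os.urandom(LABEL_LEN), os.urandom(LABEL_LEN)]   # labels for wire b = 0, 1
        C = [os.urandom(LABEL_LEN), os.urandom(LABEL_LEN)]   # labels for output wire
        # For inputs (a, b), encrypt the label C[a AND b] under (A[a], B[b]).
        table = [xor(pad(A[a], B[b]), C[a & b] + TAG)
                 for a in (0, 1) for b in (0, 1)]
        random.shuffle(table)                                # publish in random order
        return A, B, C, table

    def evaluate(table, label_a, label_b):
        # With exactly one label per input wire, exactly one row decrypts to
        # something ending in the zero tag; its prefix is the output label.
        for ct in table:
            pt = xor(pad(label_a, label_b), ct)
            if pt.endswith(TAG):
                return pt[:LABEL_LEN]
        raise ValueError("wrong labels: nothing decrypted")

    A, B, C, table = garble_and_gate()
    assert evaluate(table, A[1], B[0]) == C[0]   # AND(1, 0) = 0: we learn C[0] only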
And then you publish these ciphertexts in random order. So it's unclear which rows correspond to which input combinations. Okay? So to use this as a -- well, let me see
what my next slide is. So to use this as a one-time scheme, basically the client
takes the circuit associated to the function F that it wants to evaluate and creates
a garbled circuit for it. And when it has a particular input that it wants to be
evaluated for F, well, that's associated to, you know, one label for each wire. It
gives those labels to the worker. The worker just percolates up through the Yao
garbled circuit. And it gets the label for some output wire, and that's the answer.
Okay?
And intuitively the reason this is secure is that, I mean, the only way the worker could forge is if it somehow figured out what the other label for the output wire is. But how is it going to figure that out if it's not able to decrypt those other ciphertexts in the garbled circuit? Which it shouldn't be able to. So that's basically the intuition.
But that's why it can be used as a one-time scheme but now imagine you try to
use it twice. Okay? So I've already given the worker basically half of the input
labels for the Yao garbled circuit in the first round. Now, if there's a different input, I give him the associated labels in round two, and at this point
he usually has like, you know, three-fourths of the labels and, you know,
eventually he's going to sort of collect all the labels for the input wires, both for
zero and for one. And if he has all of those labels, then he can basically unravel
the whole circuit. It's not garbled anymore at that point. Okay? Then it becomes
insecure. So that's why we can't reuse it, okay, it just has to be used once.
Otherwise the security is going to break down.
Okay? So, you know, again our solution is really very, very simple. It's so simple
it can just be put in a nice little picture like that. It's just to homomorphically
encrypt under a public key that's specific to the round all of these particular labels
and, you know, again the user just kind of does the same thing he did before in
the one-time scheme except all the action happens inside the fully homomorphic
encryption scheme so that it gets a homomorphically encrypted label at the
output, either C0 or C1.
>>: [inaudible].
>> Craig Gentry: I like the hands on approach. And the intuition of the proof is
basically okay, basically I'll -- let's say the worker cheats successfully in the
scheme. Okay? There must be -- then there must be basically some round for
which he outputs a proof that's incorrect with respect to some input, okay? So
let's just guess which round the worker cheats on then, okay? We'll be right with
some reasonably good probability, because there are only so many rounds. And
then -- in the other rounds we just replace the homomorphically encrypted stuff that we would normally validly create with just kind of random junk. And obviously that's kind of useless to the attacker. It shouldn't help the attacker.
And so once we've done that basically really all the information the attacker is
getting is what he correctly got in that one round that we've targeted. So that
means that if he can break the scheme, you know, for that one round, essentially
he's breaking the one-time scheme, that one-time Yao scheme. That's a very
hand wavy description of the proof obviously, but that's basically the intuition.
Okay. So but what if the user -- what if the worker tries to cheat? Okay. We know he can't cheat successfully. But let's suppose, you know, he responds with something that didn't verify properly. Okay? We've caught him. We didn't
accept his output. But now what do we do? Okay? Can we continue? The
problem is, no, we can't really continue because here's an actual attack that the
worker could do to gradually unravel the entire garbled circuit inside. Okay?
So here's what the worker -- the worker is going to try to figure out what each
Yao label is. Okay? So what it does is -- you know, in the round it's given a
homomorphically encrypted label. It just zeroizes one of the bits. So it replaces one of those ciphertexts with an encryption of zero and sees if that messes things up. Okay. It does everything normally after that. Sends back a response. And if the client says oh, that's bad, then that means that bit must have been an encryption of a one, not a zero.
And so in that way, you know, since it kind of gets a bit of information from the
client in each round as to whether it verified properly or not, the worker can -- a
malicious worker can gradually figure out what the labels are and then just
destroy the entire circuit.
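The bit-probing attack just described is simple enough to sketch directly, again in terms of the loose hypothetical interface; client_accepts stands for the one bit of feedback the malicious worker is exploiting.

    # Sketch of the label-recovery attack: flip one encrypted label bit to an
    # encryption of zero, behave honestly otherwise, and watch whether the
    # client still accepts.  Names are illustrative only.
    def probe_label_bit(scheme, pk, enc_label_bits, i, finish_round, client_accepts):
        modified = list(enc_label_bits)
        modified[i] = scheme.encrypt(pk, 0)      # force bit i to zero
        response = finish_round(modified)        # otherwise run the round honestly
        # If the client rejects, forcing the bit to zero changed the label,
        # so the true bit i must have been one.
        return 0 if client_accepts(response) else 1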
So what's the countermeasure? Well, basically after you detect cheating you
have to start with a new garbled circuit. It's not quite as bad as it sounds,
because you could have a situation like folding@home where it constructs a garbled circuit for that day. And maybe it has millions of users. So it's just, you know, it creates one garbled circuit for millions of users to work on on that one day. And then it's just not going to give any verification information through that
day. So it will receive all these responses from users but it won't say whether it
verified or not. So it won't give any information back to the users.
And then at the end of the day, it will do some accounting procedure where it
says okay, those responses you gave me were kind of screwed up and I'm not
going to pay you for those. And then it starts over then the next day with a new
garbled circuit. So as long as this verification information that goes out always
occurs after all of the -- all of the client's, you know, verification work has been
done then everything is fine. So you can still get some amortization benefit here.
Even though it doesn't work as well as you would hope.
So it would be nice to have better countermeasures, though, so that's an
interesting open problem.
And I'm running low on time. So I want to talk a little bit about practicality with
FHE schemes. And this is also an interesting open problem.
So we have -- as I mentioned, we have three FHE schemes now. They all follow
the same blueprint which is the following: You construct a somewhat
homomorphic encryption scheme, that means a scheme that maybe can do
additions and multiplications on the underlying plaintext for a while but then
maybe it gets stuck for some reason. Okay? So it can go on for -- it can
compute functions of some complexity.
And then in the blueprint the idea is to take this -- hopefully you can take this
somewhat homomorphic encryption scheme and kind of like beat it down until it
has a certain property called bootstrappability, which is that what you would like
for a reason that I'll tell you in a moment is that the encryption scheme -- you
know, it has some class of functions that it can evaluate homomorphically. And
what you would like is that the decryption function of the encryption scheme itself is in that class. It's kind of like a self-referential property of the encryption
scheme. Okay? And this is a property I call bootstrappability.
And -- well, I'll just go to the next bullet. It turns out that if you have this property,
if a somewhat homomorphic encryption scheme has this property that it can kind
of homomorphically evaluate its own decryption function and still get a correct
result at the end, then it's easy to transform that scheme into a fully homomorphic
encryption scheme as a general transformation.
So, yeah, so the idea in the second step is that you just try to massage this initial
somewhat homomorphic encryption scheme so that its decryption function
becomes flatter and flatter, you know, can be expressed by a flatter and flatter circuit. And once it becomes flat enough, then the scheme becomes bootstrappable, because the decryption function is finally in the set of functions that the scheme can evaluate.
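The resulting transformation, often called recryption, can be sketched abstractly like this, again using the loose hypothetical interface from before; decrypt_as_circuit and bits_of are placeholder names, and the hard part -- making the decryption circuit shallow enough to sit inside the evaluatable class -- is exactly the massaging step just described.

    # Conceptual sketch of the recrypt step behind bootstrapping: evaluate the
    # scheme's own decryption circuit homomorphically under a new key.
    # Placeholder names; shows the shape of the idea only.
    def bits_of(ct):
        # Hypothetical helper: serialize a ciphertext into its bit representation.
        raise NotImplementedError

    def recrypt(scheme, pk_new, enc_old_sk_bits, ct_old):
        # The circuit being evaluated is decryption itself, applied to
        # (old secret key bits, old ciphertext bits).  Bootstrappability means
        # this circuit lies in the class the scheme can evaluate.
        def decryption_circuit(*sk_and_ct_bits):
            return scheme.decrypt_as_circuit(sk_and_ct_bits)   # hypothetical

        # The old ciphertext is public, so its bits can simply be encrypted
        # under the new key; the old secret key is available only encrypted.
        enc_ct_bits = [scheme.encrypt(pk_new, b) for b in bits_of(ct_old)]

        # Output: a ciphertext of the same plaintext bit, under the new key,
        # with fresh (small) noise.
        return scheme.eval(pk_new, decryption_circuit,
                           list(enc_old_sk_bits) + enc_ct_bits)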
So, anyway, all the schemes basically follow this blueprint. And as you might
imagine, this blueprint inevitably leads to slowness. Because it involves
homomorphically evaluating the decryption function. So the decryption function
on its own in this scheme is kind of slow, okay? But let's imagine that you have a
decryption function that's expressed as a circuit with lots of, you know, bits on the
wires. What you're doing here is you're homomorphically evaluating the
decryption function. That means you replace each of these bits on the wires with some huge ciphertext, okay? So you're kind of like squaring the complexity of
the scheme. It's like the decryption function times the size of the ciphertext in
some sense.
So it's kind of inherently inefficient. And so one natural question to ask is there
another blueprint? Another question is maybe -- maybe bootstrapping is not
really necessary. Maybe if you just looked at the somewhat homomorphic
encryption scheme, maybe it -- maybe it does a good job at evaluating most of
the functions that we're interested in.
So why the -- why do we only have this particular blueprint? I don't know. It just
-- I mean, speaking for myself I could say I was looking at a lattice-based
encryption scheme that was quite homomorphic. You could do, you know, a
good number of additions and multiplications. So it was interesting. And each
ciphertext has some noise associated to it. There's just, abstractly, there's some
noise parameter associated to the ciphertext.
And as you add and multiply the ciphertexts, it has the effect of growing this noise until the noise basically drowns out the signal. The ciphertext becomes indecipherable. Okay. So the question arises how to reduce the noise of a ciphertext. So you'd like to take a ciphertext that has some large noise and you'd like to somehow refresh it, so that you get another ciphertext that encrypts the same thing as the original one but has smaller
noise so that you can combine it with other ciphertexts for a little while until that
noise gets too big and then you refresh again. And so you continually refresh the
ciphertext. And the way to refresh a ciphertext just happened to be to kind of homomorphically apply the decryption function. So I mean, if you could just apply the decryption function, you know, then you would just -- you'd decrypt it and you would create an entirely new ciphertext that encrypts the same thing and has small noise. Well, that's easy. But obviously we don't want to give away the decryption key.
But it turns out if you homomorphically apply the decryption function it has a lot of
the same effect.
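Not the lattice scheme being described here, but a well-known toy variant over the integers shows the same noise behavior and is easy to play with: a ciphertext is the plaintext bit plus small even noise plus a multiple of a secret odd number, and additions and multiplications keep working only until the accumulated noise outgrows the secret.

    # Toy somewhat homomorphic scheme over the integers (tiny, insecure
    # parameters; illustration of noise growth only).
    # Ciphertext: c = p*q + 2*r + m, for secret odd p, random q, small noise r.
    # Decryption: (c mod p, centered around 0) mod 2 -- valid while |2r + m| < p/2.
    import random

    p = 1000003                              # secret odd "key" (toy size)

    def enc(m, noise=5):
        q = random.randrange(1, 2**20)
        r = random.randrange(-noise, noise + 1)
        return p * q + 2 * r + m

    def dec(c):
        centered = c % p
        if centered > p // 2:
            centered -= p                    # representative nearest to zero
        return centered % 2

    a, b = enc(1), enc(1)
    assert dec(a + b) == 0                   # addition: noises add
    assert dec(a * b) == 1                   # multiplication: noises multiply

    c = enc(1)
    for _ in range(10):
        c = c * enc(1)                       # noise grows multiplicatively...
    # ...and at some depth dec(c) is no longer reliable: that is what
    # "somewhat" homomorphic means, and what refreshing is meant to fix.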
So this particular scheme had some noise in it, and the bootstrapping just arose
out of how to solve this noise problem. But in the past people have typically
investigated schemes that really didn't have any noise associated to them. So
there's this line of research started by Koblitz and Fellows where basically -- well
I won't go too much into it, but basically an encryption of zero is like an element
of some algebraic ideal. And an encryption of one is like one plus some element in
the ideal. And so as you -- you know, you add, you know, two things in the ideal,
get another thing in the ideal, that's like adding zero to zero to get zero and, you
know, all the other operations work out.
And so there's no noise associated here. It's just a question whether the element
is in the ideal or not. It's called the ideal membership problem. And if that's hard,
then you would get a fully homomorphic encryption scheme if you could get
everything to work efficiently. But unfortunately these schemes have the problem
that I mentioned before: the individual ciphertexts are multivariate polynomials, and as you add and multiply them, you get multivariate polynomials with, you know, lots more monomials than the original ones, and they just expand, get really large.
And so for this second question of can we avoid bootstrapping, maybe many
functions we don't even need it, I just want to give you -- we've actually implemented this scheme. So I want to give you some motivation for this question.
So if you just look at a somewhat homomorphic encryption scheme where we're
just doing kind of simple additions and multiplications of ciphertexts, and thereby additions and multiplications on the underlying plaintext, without the refreshing step, you can see things aren't so bad. I mean, we have -- this is kind of
increasing security parameters. Okay, we have -- you can see the public key is
kind of large here. Okay. It's not great. I'm not saying it's great. But the running
times are, you know, milliseconds and seconds, right?
It's not totally ridiculous. But if you go to the fully homomorphic scheme where
we have this refresh procedure, what you see is that for like moderate sizes just
refreshing a single ciphertext over here takes a really long time, like three
minutes.
So if we could avoid that, that would be fabulous. And I think I'm really over time
now. But there are a couple of interesting settings with some interesting polynomials, for example the private information retrieval context, where a client just wants to extract the Ith bit of a database without telling the server what I is, and like keyword search, where you just want to find whether a particular string is in the file.
These are two examples where the polynomial that's being evaluated is actually
really low degree. Not really low degree, but low enough so that really
bootstrapping is not going to be needed. And so it would be interesting to look
for other -- try to categorize the types of functions that don't need bootstrapping.
>>: [inaudible] homomorphic encryptions [inaudible].
>> Craig Gentry: No, we didn't implement that.
Okay. So there are many open questions. I'm sure some of you know there's
a DARPA BAA relating to computing on encrypted data, so they're offering 20
million dollars in funding. And they're focusing on speeding up FHE. And I'd like
to encourage you not to apply. [laughter].
All right. That's it.
[applause].
>>: I guess we still have just a minute before the next session so if [inaudible]
seriously, one question maybe.
All right. So we have a nonzero break and I guess organizers maybe like 11:20.
[inaudible]