>> Kristin Lauter: Okay. So today, we're very pleased to have Yossef Oren
visiting us from Tel Aviv University. He'll speak to us about the mechanical
cryptographer. Thank you.
>> Yossef Oren: Okay. So I'm going to talk to you about the mechanical
cryptographer. It's a pretty exciting thing we did in our group in the
previous few months, and I think it's really interesting, and I hope to
share the excitement with you, okay?
So first of all, a bit about me. I am part of the Cryptography and Network
Security Lab at Tel Aviv University, and among other things we do in our lab,
we research [indiscernible] systems. We do foundations of cryptography work,
RFID work. Personally, my interests are secure hardware, which means power
analysis, ways to attack using power analysis, both applied work, lab work and
[indiscernible] work. And low-resource cryptography for lightweight computers,
particularly RFID attacks.
And other things I do are cryptography in the real world, which are voter
privacy, web application security and so on.
Specifically, today, I'm going to talk about power analysis and other hardware
attacks and specifically by the type of power analysis attack which is very
unique in its application; specifically, it can be used where previous forms of
power analysis were unusable. It needs much less data and it can be much more
versatile.
Okay. Things which I did which are not related to academia but might be
interesting to you: I did a lot of coding in my life. I was both a technical
leader and a project manager; I know those are different things in Microsoft.
And I also wrote some comedy. That might be apparent in the talk, depending on
you.
Okay. So I just want to calibrate people here that might be from different
backgrounds. Who here knows what a flip-flop is? Wow. Not sure, right? Who
here has ever touched a scope? Okay. Who knows what AES looks like from the
inside?
>>: AES?
>> Yossef Oren: AES. Okay. So here is the AES cipher, okay? The
structure of the AES cipher, the uses of the AES cipher. You get plaintext on
the top. You get ciphertext on the bottom. You get a key. And what's inside?
Bit flipping, bit shifting, permutations and so on. This is a very efficient
algorithm, okay? Modern CPUs can do 2 to the 31 AES operations per second.
Very efficient.
So if I give you the plaintext and the key, it's very easy to do the
encryption. But if I give you the plaintext and the ciphertext without the
key, it's very difficult to find the key. Yes?
>>: What is two billion? Two gigahertz?
>> Yossef Oren: AES [indiscernible] multicore.
>>: Multicore, okay.
>> Yossef Oren: Yeah. You have to do the -- you have to cheat a bit. But in
terms of how many times per second the AES core is invoked.
Okay. So if I give you the plaintext and the key, the ciphertext is easy to
calculate. On the other hand, if I give you the plaintext and the ciphertext
and the key is not given, it's difficult to calculate, okay? Why is this so
hard? Well, this is what cryptography is about; this is why the cipher was
designed this way. It's designed to be difficult to cryptanalyze. Plaintext and
ciphertext do not lead to the key.
And what that means, essentially, is that there is no efficient way to
represent the key as a function of the plaintext and the ciphertext, okay. In
fact, most random functions which have 128-bit or 256-bit inputs and 128-bit
outputs are very difficult to represent compactly.
So if you have a very difficult to represent function, you can do one of two
things. You either spend a lot of space and create a huge lookup table, or you
spend a lot of time: take a known key, run over plaintext-ciphertext pairs, and
just go over them again and again until you find the pair you're looking for.
This takes a lot of time.
So it's either a huge space, a long time or some sort of trade-off between the
two, which is inefficient. But anyway, there is no efficient way to represent
the key as a function of the plaintext and the ciphertext, okay? I just want
to put this to the side. We're going to come back to it later in the talk,
okay?
So let's put AES to the side for a moment and I want to talk to you about a
nice software tool, or a nice software machine, called a solver. Who here has
ever dealt with a solver, tried to use a solver in your work? I know that MSRC
is actually writing a solver. So you can talk to them. A solver, as its name
implies, is designed to solve stuff, okay? How do I use a solver?
I input into the solver a set of statements over variables written in some sort
of logic language. There's no real restriction on what language I can use. I
can use SAT statements. I can use conditional logic statements. I can use
English. I don't know.
And the solver, upon receiving the set of statements, runs for a short while or
a long while, and then it outputs a satisfying assignment for the set of
statements: it outputs a set of values which can be assigned to the variables
such that all of the statements are satisfied.
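A solver's interface can be sketched in a few lines of Python. This toy version simply tries every assignment; real solvers (DPLL/CDCL engines) are vastly smarter, but the contract is the same: statements in, satisfying assignment out. The `solve` helper and the example statements are illustrative, not any particular solver's API.

```python
from itertools import product

def solve(variables, statements):
    """Toy 'solver': try every 0/1 assignment until all statements hold.

    Real solvers (DPLL/CDCL) are far more clever, but the interface is
    the same: statements in, satisfying assignment out (or None).
    """
    for bits in product([0, 1], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(stmt(assignment) for stmt in statements):
            return assignment
    return None

# Example: "x or y must hold" and "x must differ from z".
stmts = [
    lambda a: a["x"] or a["y"],
    lambda a: a["x"] != a["z"],
]
print(solve(["x", "y", "z"], stmts))  # → {'x': 0, 'y': 1, 'z': 1}
```

The doctor-scheduling and netlist-equivalence examples below are exactly this pattern with much larger variable sets and much smarter search.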
This is a very useful tool; it's very general purpose, very versatile. One
example I can give you of what it is used for is scheduling, for example,
doctors' assignments. Okay. Let's assume I have a set of doctors and
a set of assignments and the doctors have constraints. What are the
constraints? For example, a doctor cannot be present at two different stations
at the same time, okay, right? Because it's a classical doctor not a quantum
doctor.
And the doctor would like to sleep six hours per night, okay, perhaps he would
like a day free or this doctor can't do Saturdays or so on. So the set of
statements is given to you, and the variables are statements like "Dr. Cohen
is doing ophthalmology on Thursday" and so on. So the software outputs a
satisfying
assignment, which is the way to assign the doctors to rooms. Similarly, the
way to assign post-docs to universities and so on.
Another thing which is a very common application of solvers is verifying
hardware designs. Hardware design, you start with a high level language and
you end up with actually a picture which is burned on to silicon. And some
parts of this process leading from the hardware design to the silicon are
manual. Some places are prone to errors.
So what you do is you give the solver both the high level definition and the
actual netlist, which is the so-called -- this is a low level [indiscernible],
and you ask the solver: is there an assignment of inputs which leads to a
different output if I feed it to the high level design than if I feed it to
the silicon, okay? These are solvers, okay?
Solvers are very versatile, so one of the things people thought of doing is
let's try to do cryptanalysis with solvers, okay? And I'm just going to tell
you what cryptanalysis is again. I am going to take this crypto algorithm, for
example, AES. I am going to write the set of statements which is as follows,
okay? The plaintext is P. The ciphertext is C. The transformation between
plaintext and ciphertext is this set of logical expressions. And please give
me the cryptographic key; the key is the assignment of variables I want, the
one that satisfies the set of statements. What this means is that if I take
the plaintext and encrypt it with the key, I will arrive at the ciphertext.
This is exactly cryptanalysis, okay?
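As a toy illustration of cryptanalysis-as-solving, here is the idea on a 4-bit XOR "cipher" (the values are made up for illustration). With such a weak cipher, a brute-force search over the "encryption statement" finds the unique key instantly; AES turns this into a 2^128 search over heavily diffused statements, which is the whole problem.

```python
# Toy cipher: c = p XOR k on 4 bits. Cryptanalysis as constraint solving:
# fix p and c, then ask for any key k that makes the statement true.
p, c = 0b1010, 0b0110

def statement(k: int) -> bool:
    """The 'encryption statement': encrypting p with k yields c."""
    return (p ^ k) == c

keys = [k for k in range(16) if statement(k)]
print([bin(k) for k in keys])  # → ['0b1100']  (XOR is weak: one key fits)
```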
And it would be really cute if I could just give this problem to the
computer and come back tomorrow and get the key, right? So this was tried in
the year 2000 by Massacci and Marraro. In 2000, AES was not alive yet; they
used DES. And what they found out was that, not surprisingly, of course,
modern crypto was strong enough to resist solvers.
Obviously.
You would have heard about it if it was successful.
So what does this mean? You give the solver the plaintext and the ciphertext
and ask for the key, and the solver runs for an intractable time, okay, and
takes an unreasonable amount of memory. They just ran it for a while and then
said, okay, we give up. We don't have enough hardware.
Why is this so? Okay. We feel that it should be so, but why is it so
actually? So if any of you have done some symmetric cipher design, you know
that one of the most crucial elements of the symmetric cipher is what's called
the diffusion property or the avalanche property. What this means is that if I
change one bit in one of the inputs, very quickly it's going to affect half of
the outputs, okay. One bit flips on the input, half of the bits flip in the
output. And this happens very quickly. So AES has ten rounds. If I change a
bit in one of the rounds, in two or three rounds, all of the bits are going to
be affected.
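The avalanche effect is easy to demonstrate. AES itself isn't in the Python standard library, so this sketch uses SHA-256, which is designed with the same diffusion goal: flipping a single input bit flips roughly half of the 256 output bits.

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    """Count the number of differing bits between two byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg = b"attack at dawn!!"
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]   # flip one input bit

d1 = hashlib.sha256(msg).digest()
d2 = hashlib.sha256(flipped).digest()

# Out of 256 output bits, roughly half should differ.
print(bit_diff(d1, d2))
```

This is exactly what defeats the solver: guessing one key bit perturbs almost every downstream statement, so partial assignments carry almost no usable information.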
So effectively what the solver is trying to do is trying to find a candidate
assignment so it's guessing, okay, let's say one of the bits of the key is one.
And then it has to propagate this belief and see if it makes all of the rest of
the statements satisfiable.
And as soon as it changes one of the statements, all of these other assignments
become invalid because of this avalanche property. So the solver is reduced to
actually brute forcing all of the keys, trying to find if one of them satisfies
the relationship between the plaintext and ciphertext, okay?
So it's established at this point that classical cryptanalysis, using a solver,
is difficult. And the reason is diffusion. So okay. I'm an engineer. And
when I'm faced with a difficult problem, what I typically do is I solve a
different problem, right? This is what engineers do.
So let's try to change the problem in a way which makes solvers more efficient,
okay? So this is the definition of cryptanalysis in the sense that I'm going
to use today: I'm given a description of an algorithm, plaintext and
ciphertext, and I want to output the key.
So what can I do to make this easier for me, the engineer? So first of all,
I'm going to take this AES algorithm and replace it with a concrete AES device,
okay? This could be a smart card. This could be a software implementation.
This could be a server running somewhere, okay. But this device, of course,
implements AES, but it does so in a way which has -- in a physical way.
And what happens is as it does its work, it keeps giving hints about what it's
doing, about its internal structure. Specifically, in my case, I'm going to
look at the power consumption, the instantaneous power consumption of this
device every micro second and I'm going to output a power trace. A power trace
is a sequence of, let's say, a million points and each one of these points is
instantaneous power consumption of the device while it's doing encryption at a
specific time.
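A power trace can be simulated under the common Hamming-weight leakage model, in which each sample is the Hamming weight of the data being processed plus Gaussian noise. The model and the `noise_sd` parameter are standard textbook assumptions for illustration, not the speaker's measurement setup.

```python
import random

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def simulate_trace(intermediate_values, noise_sd=0.5, seed=0):
    """One simulated power trace: one sample per processed value.

    Assumes the common Hamming-weight leakage model plus Gaussian noise;
    real traces are far messier (alignment, bandwidth, switching noise).
    """
    rng = random.Random(seed)
    return [hamming_weight(v) + rng.gauss(0, noise_sd)
            for v in intermediate_values]

values = [0x00, 0xFF, 0x0F, 0xA5]          # data the device processes
trace = simulate_trace(values)
print([round(s, 2) for s in trace])
```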
And now I have more information, which I can try to use for my attack, okay?
And if I write this as a formal definition of power analysis, as I'm going to
use it in this work, now I have a description of a cryptographic device,
plaintext and ciphertext and power traces, okay? And I'm going to try to find
the cryptographic key.
What's nice about power traces? As I told you previously, what really causes
problems for me, the attacker, when I only have the plaintext and the
ciphertext, is the diffusion property, the avalanche property. What happens if
I'm using traces? If I'm looking at two very adjacent periods in the power
trace, they do not have the avalanche property. Two spots close in time in the
power trace are related closely enough for me to be able to make a hypothesis
at time T and check it at time T plus one without compromising all of the rest
of the assumptions I have about the system.
So power traces are very good for me, the attacker, when I'm trying to attack
this device. Power analysis has been demonstrated in 1996. It's well known
that it works, okay? So what I'm trying to do now is trying to use this power
trace in the scope of solvers. So what am I going to do?
Okay. Oh, before I do that, I'd like to say a few things about power analysis.
I understand that a few of you have touched it? No? Nobody's ever done it
here? So power analysis, in a nutshell. Power analysis builds on the
assumption that power consumption is variable. And not only is it variable,
different instructions cause different power consumptions and different data
cause different power consumption. The logical leap here is that I can run
this backwards.
If I analyze the power consumption, I can make some inferences about the
instructions and the data. What does this mean? Inference about the
instructions means I can do reverse engineering: I can output the list of the
instructions the device is performing. And inference about the data,
specifically in my case, means key recovery.
Okay. So because power consumption depends on instructions and data, I can
look at the power consumption and learn about the instructions and the data.
Now, at this point -- I've been giving variants of this talk for a few years
now, and I would have a few animations and pictures of transistors which I
would use to prove to you that power consumption is something that depends on
the instructions. But now, fortunately, there has been a revolution in the
world and it's much easier for me to prove this to you.
And this is called a smartphone. Probably you have one. And this is living
proof that power analysis works. The smartphone has a CPU. It has a lot of
sensors. It has radios. It has accelerometers and so on. Let's ignore the
sensors and the radios. It has a CPU, okay? And sometimes, we take our
smartphones and perform a CPU-intensive operation, which has no IO and no
sensors.
For example, we play a game. Some of us play games on our smartphones. We can
admit it. And as we are playing this game on our smartphone, we feel that our
smartphone gets hot, right? And the battery runs down, okay? Yes.
>>: I saw that the freebie games use an incredible amount of battery and power
downloading ads.
>> Yossef Oren: Yes, about three times the power they spend on the games, they
spend on the ads, yes. But it's kind of scary to think about that. But the
whole thing only runs out to about a dollar a year; if you charge your phone
all year, it costs you about a dollar in electricity.
So now I can give you a living proof that power consumption is variable, okay?
I just want to show this picture which I found. So power consumption is
variable.
Okay. So I have this fact, it's established and I'm going to do side channel
analysis with solvers. And okay. Now I'm trying to do cryptanalysis. And two
groups, one of them in Belgium, one of them in Princeton, thought of doing it.
They just used a cipher and instead of feeding only the inputs and the outputs
of the cipher, they also fed intermediate data, which is related to the power
consumption.
And, of course, I'm standing here and I'm talking to you so, of course, they
were successful. Okay. The result. The key can be recovered from the side
channel data. Right? Okay. But as you see, there is a small "but". And let's
see if we can find the mine in this statement together.
The key can be recovered from the side channel data if there are no errors in
the side channel trace, okay? So many of you are scope jockeys and have tried
to measure physical phenomena using your scope. And as you know, there are
never no errors in your scope measurement, okay?
So this is the harsh reality of power analysis. Here is a very small device.
Can anybody guess what this device is? Yes, this is an inverter. It takes a
square wave input here and it outputs a square wave output here, which is the
inverse. This is an N-type transistor and this is a P-type transistor and you
can just go through it in your head and see that a logical one here causes the
zero to run out here and a logical zero here causes the one to run out here.
Okay. So now I'm going to feed the square input into here and I'm going to
take my oscilloscope and I'm going to look at the output. And how am I going
to measure the power trace? I'm going to take a very small resistor, a one ohm
resistor, and I'm going to put it in series between the device and the ground
and I'm going to measure the voltage drop on this resistor.
And the voltage drop is related to the power consumption, okay? Now, what do
we see here? When nothing is happening on the circuit, there is no power
consumption at all. This is something which is a property of CMOS transistors,
and it's something to really like: this is why we can use our small battery to
power our devices for a long time, okay?
When this device switches from zero to one, there is practically no power
consumption here, and when it switches from one to zero, there is a peak,
okay. Sometimes it's a big peak, sometimes it's a smaller peak. It's related
to the fact that these parasitic capacitors get charged and discharged, okay?
Okay. So if I was able, if I was somehow tasked with performing power analysis
of this device, this power trace would be beautiful for me, okay. But the
problem is that it never looks like this, right? What happens to the power
trace on the [indiscernible] measurement, okay. So there are several things
which are going on in this device which cause problems for me. First of all,
I'm not measuring only the cryptographic operation, okay. I am attacking a
system, a chip or a decoder or a video set top box. And this system is doing
other stuff other than doing my encryptions.
It might be doing, I don't know, IO. It could be doing all sorts of CPU tasks,
threads and so on. What happens is that other stuff is going on in this device
under test, and I'm obviously measuring not only my calculations, but also
this. This is called switching noise.
Another thing that happens -- oh, sorry. I gave you too many errors here.
I'll bring it up slowly. Every piece of conducting wire on my device and
connecting my device to the world is both a transmitting and a receiving
antenna. And the electromagnetic waves which are moving through the air
modulate themselves onto these cables. This is called electronic noise or
thermal noise, because when you cool the device down, it gets weaker.
I'm obviously measuring this thermal noise as well, because I'm measuring
what's going on on the cable. And the third thing which is causing problems
for me is that I'm using the physical device called an oscilloscope, which is
connected using a physical device called a probe. The scope has its
limitations. It has an analog-to-digital converter, which runs a certain
number of times per second. It has its own impulse function. It has its own
sensitivities. It has its quantization.
What happens at the end of the day is that instead of having this pretty,
exact representation of power consumption, I have this disgusting, error-ridden
thing which we feel is difficult for us to perform power analysis on.
And let's try to give this a rigorous treatment, okay? And this is called the
information robustness trade-off. Let's take a rigorous look at this phenomenon
we just saw.
So let's assume I'm going to perform my solver attack. The solver needs a set
of statements, and each one of these statements is calculated from a
measurement, okay: it takes the power trace and makes a measurement. Let's say
I measure it at 100 different times and I feed these measurements into my
solver system.
Okay. So I have a measurement space, which is 100-dimensional, and each one of
these axes is the value of a certain measurement, okay? Is this okay? So here
is the precise measurement, okay? If there were no errors, no artifacts,
nothing, this is what I would have been able to measure using my test setup.
And it has been shown in the works of the Belgians and the Princeton guys that
this is enough to recover the key with actually very good accuracy and very
high speed, okay?
And here is what I get on my scope, okay? If this was the case, I would be
very happy. Because my actual measurement would be the precise measurement and
I would be able to output the key quickly and efficiently.
But what happens is that there are these errors and things, or noise -- you
don't have to treat it as something bad. Error means that it's bad. Noise
means that it's a fact of life, okay?
So what happens is that the measured point moves aside. Now what happens,
okay? I'm providing this measurement to the solver, and the solver looks at
the equations and looks at this measurement, and what does the solver output?
Unsatisfiable. There is no key which can give out this measurement, okay?
Okay? The measurement space has a code word here; it doesn't have a code word
here.
So now looking at this picture, what do we immediately feel inclined to do to
make this work again? We want to do this, right? We increase the robustness
of the equation system, okay. So we increase the robustness. We now say the
equation set could be okay: at time T1, the leak is three. Now we say it's not
exactly three; it could be two or three or four, but it can't be seven or
something of the sort. And we increase this robustness until the set of valid
measurements includes the precise measurement.
But what happens here is that the measurement space does not look like this,
but rather looks like this. And now, what happens is that inside the circle of
measurements which are valid, which are validly represented by the side channel
measurement, there are exponentially many satisfying keys.
What happens when the solver has exponentially many satisfying keys? We know
what it does. It just brute forces all of these keys. Because all of these
keys are legal. What's happened is that the solver is again reduced to brute
forcing. And this results, again, in an intractable running time. So this is
what we call the robustness information trade-off. It's a very, very cruel
trade-off.
If we use an equation set which has errors in it, then the solver returns
unsatisfiability, because the correct solution is not inside the set of
measurements which I can accommodate. But if I increase the set of
measurements, if I add robustness, then the solver runs for an intractable
time, okay?
So this is where we were stuck in 2009. Either one of these is not good for
us. We can't use solvers for power analysis cryptanalysis in the real world.
Okay. So it's a hard problem, so again, I'm an engineer. If I have a hard
problem, then I solve a different problem.
Okay.
So how can I change the problem to make it more acceptable for me?
Okay. Here is the solver. This is the tool I use previously. A solver
receives a set of statements and it outputs a satisfying assignment. Now,
instead of a solver, I can use a somewhat more elaborate tool called an
optimizer.
Now, an optimizer works more or less the same way as a solver, but it doesn't
receive only a set of statements. It also receives something called a goal
function. And the solver, the optimizer chooses, among all of the possible
satisfying assignments, the optimal assignment. The one that gives us the best
value for the goal function. Let's say I want to minimize the goal function,
give me the minimal value of the goal function.
Again, this is a general purpose tool. It's a bit more elaborate and heavy.
It takes more time to run than a solver, but it's also very useful. One
example where it's used: IBM has a group which rolled their own optimizer, and
to this day they sell usage of it to companies around the world --
transportation companies, shipping companies who want to optimize their routes.
Here is a classical application for optimizers which actually is in the field
today. I want to take the Russian railway system and I want to save fuel. I
want to serve all of the stations in the railway system. I have constraints.
A train cannot be present at two places at once. Two trains cannot be present
at the same railway at the same time, okay. A train cannot travel more than
so-and-so far before needing fuel. But I also have a goal function. The goal
function
is please serve the schedule of the Russian railway system in a way which
minimizes the distance traveled by the trains. Minimizes the gas consumption
of the trains and so on.
And when you save a lot of money using this tool, suddenly, it becomes very,
very important. So this is an optimizer. And similar to SAT solvers, which
are also developed in universities and have a competition every year -- there's
something called the SAT Race, where people run their SAT solvers and see
which one is the fastest -- there's also something called the Pseudo-Boolean
Competition, with a solver track and an optimizer track. Every year, people
compare their optimizers. It's an active area of work. People are actually
researching them, trying to find the best way of doing it.
So okay. How do I use this in my context? So if I go to the previous slide, I
had this big circle, and my assumption was that all of the points inside the
circle are equally eligible to be the correct key. So now, my insight is that
some mistakes are more expensive than others.
What do I mean? I am going to give a price to each mistake I make, okay.
There is the reference measurement, which is not the correct measurement; it's
the measurement I got. Every time I deviate from this measurement, I'm going
to pay a price, and the optimization goal would be: let's try to pay the
minimum price.
And what's really nice is that I gave you this idea that there's a continuum of
points, and the decoder which receives this trace outputs a point on this
continuum. This is not correct. Actually, the decoder is a bit more
elaborate. It's something called a soft decoder.
The decoder, for each one of the points along its axis, outputs an a
posteriori probability, which means how likely it is that this specific point
is the one that was transmitted, conditioned on the fact that this trace was
the one received, okay. If anybody here did some signal processing work, this
is a Bayesian decoder, a naive Bayesian decoder. Other decoders also exist.
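The naive Bayesian decoder can be sketched as follows, under an assumed Hamming-weight-plus-Gaussian leakage model with a uniform prior over byte values; the noise parameter is an assumption for illustration, not the speaker's setup.

```python
import math

def hw(x: int) -> int:
    return bin(x).count("1")

def soft_decode(sample, noise_sd=1.0):
    """Posterior P(byte | sample) under a Hamming-weight + Gaussian model.

    Uniform prior over the 256 byte values: this is a naive Bayesian
    decoder, as mentioned in the talk, under assumed leakage parameters.
    """
    likes = [math.exp(-(sample - hw(b)) ** 2 / (2 * noise_sd ** 2))
             for b in range(256)]
    total = sum(likes)
    return [p / total for p in likes]

post = soft_decode(2.9)                     # noisy sample near HW = 3
best = max(range(256), key=lambda b: post[b])
print(hw(best))                             # most likely bytes have HW 3
```

Note that many bytes share a Hamming weight, which is exactly why a single sample only ranks candidates and the cipher's structure must supply the rest of the information.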
Okay. So I can give a price to every mistake I make, and I want to do the best
I can. So here is my new definition of side channel analysis based on solvers.
I have a description of a device. I have plaintext, I have ciphertext and now
I have -- I need to find the key that minimizes the estimated error.
So once I output a key, I can run this backwards and say that if this is the
key, this is the power trace it should have been if there were no errors. I
want the power trace that should have been to be as close as possible to the
power trace that I did see with my scope. Okay?
So now let's take a look at what really happens if I do this, okay? Again, I'm
going to look at the measurement space. And this is a real measurement
from my lab. Actually, it's based on a simulation of a device.
I'm going to look at two dimensions of this measurement space, and the
brightness of each point is correlated to the likelihood that this is the point
which is really received, okay? So there are lots of points which are
non-zero, okay. And if you would turn down the lights, you would also see the
less likely points.
Where is the precise measurement? Wow. Yeah, you see it? Great. Can I have
lights again, Matt? Wow. Matt, you're awesome. So let's just take a look a
bit closer at the points here. And where is the precise measurement? Here it
is. Okay? This measurement is the 680th most probable of the 65,000 points
here. So it's pretty likely, but it's still exponentially difficult to find
it, okay?
Now, I just gave you two measurements, okay? But I'm not going to give you
just two measurements; I'm going to give you 100 measurements, okay. So if I
would try to brute force over all of these measurements, I would have to do an
exponentially crazy amount of evaluations to see if one is the correct one.
And even with a ranking by probability, I still have to do a crazy,
exponentially large amount of work.
Practically, in my measurements, the average rank of the correct byte out of
256 possible candidates is number 14, okay? So 14 to the power of 100 is still
a lot of work. But the thing is that these points now are not IID. And this
is the really -- this is the real trick here, because I also give to my solver
the description of the cryptographic process which is creating these bytes to
be output.
So if one of these measurements is the input to SubBytes, and one of them is
the output of SubBytes, as soon as I assume one of them, I also have to assume
the other, okay?
So what this means is that this point might be quite unlikely, but it's really,
really powerfully suggested by other information I have. And as I said,
because of the slow diffusion property, what I saw a minute before, what I'm
going to see a minute after is going to really affect my choice at this stage,
okay?
So I'm just going to go a bit into the tool I use, which might be useful for
you in other applications. The specific optimizing system I use is called a
Pseudo-Boolean Optimizer. And this is how it works. The objective of the
Pseudo-Boolean Optimizer is to output the vector of variables X which
minimizes this goal function, okay. There is a cost vector called C; I
multiply it by X, I get a number. I'm going to minimize this number subject to
this matrix of linear constraints: A times X has to be greater than or equal
to B. And why is this called a Pseudo-Boolean Optimizer? Because the variables
themselves are Boolean, but the coefficients of A are signed integers, okay.
So if you delved a bit into logic systems and so on, you know that the
simplest logical systems we use are SAT statements. In SAT, both the variables
and the coefficients are Boolean. And on the other side, there's something
called integer programming, where the variables themselves are integers and
the coefficients are integers. So this is somewhere in between. It's a
compromise which is very good for me, because this simplification makes the
solver quicker and it makes the implementation easier to use.
Okay. So using very simple to describe gadgets, I can turn a linear
Pseudo-Boolean Optimizer into a nonlinear Pseudo-Boolean Optimizer. And this
is using the classical linearization technique.
So if I have a constraint, let's say, Y is X1 times X2. Remember that Y and
X1, X2 are Boolean, okay. So what this actually means is that if X1 and X2 are
both 1, then Y has to be 1, and otherwise Y has to be 0 -- okay, more or less
this. Okay? I'm not sure I wrote this correctly, but there is a linearization
system, and you can use nonlinear terms, you can multiply these variables
together, and you can also use inverses.
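The classical linearization of the AND constraint Y = X1 * X2 is the three linear inequalities Y <= X1, Y <= X2 and Y >= X1 + X2 - 1; a quick brute-force check confirms they admit exactly the AND truth table.

```python
from itertools import product

# Standard linearization of the AND constraint y = x1 * x2:
#   y <= x1,  y <= x2,  y >= x1 + x2 - 1
def satisfies(x1: int, x2: int, y: int) -> bool:
    return y <= x1 and y <= x2 and y >= x1 + x2 - 1

# The feasible (x1, x2, y) triples are exactly those with y = x1 AND x2.
for x1, x2, y in product([0, 1], repeat=3):
    assert satisfies(x1, x2, y) == (y == x1 * x2)
print("linearization matches AND on all 8 cases")
```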
So this language is very expressive. I can very easily describe to you a
Pseudo-Boolean instance; you can look at it and understand what it does. And
this is in contrast to SAT systems, which you really need to read like
assembly language to understand. Here is a sample Pseudo-Boolean instance
written in the open Pseudo-Boolean programming language.
What am I trying to do here? I'm trying to find the vector X1, X2, X3. Which
minimizes this goal function, okay, and it has this constraint, okay. X1 plus
X2 plus X3 is greater than or equal to two, okay? So let's try to see if we
can find in
our heads the optimal assignment. Let's try the all zeroes assignment, okay?
The all zeroes assignment gives us a goal function value of zero, which is
really great. But zero plus zero plus zero does not satisfy the constraint,
okay. So that's not good.
Let's try the all 1s assignment. This gives me four. Four is more than two.
This is great. And the goal function now is going to pay the price of five.
Okay. Can anybody looking at this equation system give me the optimal
assignment?
>>: One, zero, one.
>> Yossef Oren: Precisely. One, zero, one gives me a price to pay of two, and
one, zero, one satisfies the constraint: two is greater than or equal to two.
Great, okay. This language is very expressive and it's very rich, okay?
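An instance this small can be optimized by brute force in a few lines. The coefficients below are assumed for illustration, chosen so the spoken prices come out: the all-ones assignment pays five, and the optimum, one-zero-one, pays two.

```python
from itertools import product

def pb_optimize(n, goal, constraints):
    """Brute-force pseudo-Boolean optimizer: Boolean variables,
    integer coefficients. Returns (best_cost, best_assignment)."""
    best = None
    for x in product([0, 1], repeat=n):
        if all(sum(c * v for c, v in zip(coeffs, x)) >= rhs
               for coeffs, rhs in constraints):
            cost = sum(g * v for g, v in zip(goal, x))
            if best is None or cost < best[0]:
                best = (cost, x)
    return best

# min  x1 + 3*x2 + x3   s.t.  x1 + 2*x2 + x3 >= 2   (assumed coefficients)
print(pb_optimize(3, [1, 3, 1], [([1, 2, 1], 2)]))  # → (2, (1, 0, 1))
```

Real pseudo-Boolean optimizers replace the exhaustive loop with branch-and-bound and learned constraints, but the input/output contract is the same.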
What do I mean, rich? It's really useful for my application. Specifically,
I'm trying to find my variables, which are either the inputs, the outputs or
all sorts of internal state values of my device. They're all flip-flops.
Flip-flops can either be zero or one. So it's really great that the variables
are zeroes and ones.
On the other hand, the measurements, which are the things I provide to my
system, they're sampled on some continuous axis, but I can quantize them into
integers, so it's really great that the constraints take integers, okay? And
the non-linear notation is rich and expressive enough for me to be able to
write very elegantly and succinctly the things which are -- you can see in
crypto devices. For example, here are some very simple Pseudo-Boolean
statements.
Here is a negated OR, you see. Out is equal to not X1 times not X2, okay?
Pretty trivial, okay? You can see this is a NOR gate. Here is an
exclusive OR gate. This is a bit cute, okay? I wanted to put this here,
because you see that I'm actually using here a nonlinear term -- I'm using here the
Pseudo-Boolean, okay, this integer. If you would write the truth table for
this thing, you would see that this is exactly the exclusive OR statement,
okay?
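Both gate encodings can be checked exhaustively. The polynomials below are my reading of what the slide shows, not its exact notation:

```python
# Gate encodings as pseudo-Boolean (integer) polynomials over {0, 1}:
#   NOR: out = (1 - x1) * (1 - x2)
#   XOR: out = x1 + x2 - 2*x1*x2  (the nonlinear term cancels the 1+1 case)
def nor(x1, x2):
    return (1 - x1) * (1 - x2)

def xor(x1, x2):
    return x1 + x2 - 2 * x1 * x2

# Exhaustive truth-table check against the Boolean operators.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert nor(x1, x2) == int(not (x1 or x2))
        assert xor(x1, x2) == (x1 ^ x2)
```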
You can just play with it in your head. Here is a bit of a more disgusting
function. This is the KeeLoq cipher's nonlinear feedback function. It's five
inputs, one bit output. There's no efficient algebraic representation of it.
But as you can see, I just wrote something here which is a Pseudo-Boolean
representation of it, okay?
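For reference, the KeeLoq NLF is often specified simply as a 32-entry truth table. The constant 0x3A5C742E below is the commonly published value; treat it as an assumption rather than something from the slide:

```python
# KeeLoq's 5-input nonlinear feedback function, commonly specified as the
# truth-table constant 0x3A5C742E: the output is bit i of the constant,
# where i is the 5-bit input value.
NLF = 0x3A5C742E

def keeloq_nlf(a, b, c, d, e):
    i = (a << 4) | (b << 3) | (c << 2) | (d << 1) | e
    return (NLF >> i) & 1

# Enumerate all 32 inputs; each yields a single output bit.
outputs = [keeloq_nlf(*((i >> s) & 1 for s in (4, 3, 2, 1, 0)))
           for i in range(32)]
assert set(outputs) <= {0, 1}
```

A pseudo-Boolean encoding of this table is bulkier than the NOR/XOR examples, which is why the speaker calls it "disgusting", but it is still mechanical to write down.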
And also, the goal function is really good for me, okay? The goal function --
let's say I'm trying to determine the cost, the price of my guess. I'm trying
to do a side channel analysis, and I guess that the key is a
certain vector, okay.
Now, choosing the key forces, of course, all of the internal values of the
state to be also strictly defined. So now, I have for each one of these
internal values a chosen value, and I know the a posteriori probability of each
one of these values. I want to find the overall probability that this is the
correct value. What do I do? I multiply all of these together and I get a
single value which is -- yes?
>>: There's a negate missing someplace. A max doesn't become a min unless you
negate something.
>> Yossef Oren: The max is less than one. All of these products, all of the
a posteriori probabilities, are less than one. So the log of a number less than
one is negative. So I want the minimum sum. It's the minimum -- it's like
entropy.
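The exchange above boils down to a standard trick: maximizing a product of probabilities (each below one) is the same as minimizing the sum of their negative logarithms, which is a linear goal function the optimizer can handle. A quick check with made-up probabilities:

```python
import math

# Hypothetical a posteriori probabilities for three internal values.
probs = [0.9, 0.6, 0.8]

# Maximize the product <=> minimize the sum of negative logs.
product = math.prod(probs)
neg_log_cost = sum(-math.log(p) for p in probs)

# The two formulations agree: -log(product) equals the summed cost,
# and every term of the cost is positive because each p < 1.
assert abs(neg_log_cost + math.log(product)) < 1e-12
assert all(-math.log(p) > 0 for p in probs)
```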
>>: The maximum, it's a negative number.
>> Yossef Oren: I want to maximize the negative, yes. Thank you. Right. So
here is another example of a Pseudo-Boolean instance. Here I have a vector X,
okay, you can just look at it. It's so apparent to see what I've written down
here. X can be either zero, one, two or three. This statement means that one
and only one of these variables can be true, okay?
And X is an eight-bit value. Here are the eight values of X, X sub 0 to X
sub 7. And as soon as one of these events happens, it forces all of the other
events to get their value. And these are the constraints and here is the goal
function, okay? The goal function means that, okay, if you're going to choose
zero, you're going to pay a pretty low price. If you're going to go ahead and
say that X is 3, you're going to have to be pretty sure of it based on other
evidence before you can go ahead and choose X is equal to 3.
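A minimal sketch of this encoding, with made-up per-value prices (the real goal coefficients would come from the decoder's probabilities):

```python
from itertools import product

# One-hot encoding of a small value X in {0, 1, 2, 3}: one indicator
# variable per value, with the constraint that exactly one is set.
values = [0, 1, 2, 3]
price = {0: 1, 1: 2, 2: 4, 3: 8}   # hypothetical costs, cheap to expensive

# "One and only one" constraint: keep only assignments with a single 1.
candidates = [a for a in product((0, 1), repeat=len(values)) if sum(a) == 1]
best = min(candidates, key=lambda a: sum(price[v] * a[v] for v in values))
chosen = best.index(1)
print(chosen)  # -> 0, the cheapest value, just as a solver would try first
```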
So as the solver, as the optimizer, works, it would probably try first X is zero
and then move down this list, okay. But again, because X might be determined
by the previous value in the decryption, it will do a bit better than that.
But this is how my equation system looks. Any
questions about this? Because now I'm going to describe my workflow and my
results.
>>: So I'm wondering how you get the expected trace. So do you have a
simulator, or do you run --

>> Yossef Oren: Okay. The way I prepare my decoder, it's a method that's well
known in the art. It's a paper from 2005 called template attacks, which is
very interesting. Basically, you take your device in captivity, and
measure, let's say you want that value number two would be five, okay. You
make it output value -- many traces in which value number two is equal to five
and you find an interesting point in this trace, or a few interesting points,
which are highly correlated with the value we are measuring. There's
statistical tools which assist you in finding this point.
And then for this point, this interesting point, you create a mean and standard
deviation. And the mean and standard deviation are the mean and standard
deviation conditioned on the fact that you were expecting five.
Now, for all possible values, 256 values, you create 256 means and standard
deviations, okay. If you're using, for example, two interesting points, you
also have a covariance matrix for each of these. So now I have, if I get a
trace which I don't know, I can use these variables, these means and variances
to create 256 probabilities. Probability of this trace conditioned on one,
conditioned on two, conditioned on three.
And then using Bayesian inversion, I can flip this around and get the
probability I'm looking for, okay? It's really interesting signal processing,
and the paper describes it much better than I just did, okay?
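A toy version of the decoder just described, with assumed Gaussian templates -- four candidate values instead of 256, a single interesting point, and a uniform prior:

```python
import math

def gaussian(x, mean, std):
    # Gaussian probability density at x.
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# One template (mean, std) per candidate value, as built in profiling.
templates = {v: (float(v), 1.0) for v in range(4)}

def posteriors(sample):
    # Bayesian inversion: P(value | sample) is proportional to
    # P(sample | value) * P(value), with a uniform prior assumed here.
    likes = {v: gaussian(sample, m, s) for v, (m, s) in templates.items()}
    total = sum(likes.values())
    return {v: like / total for v, like in likes.items()}

post = posteriors(2.1)
assert abs(sum(post.values()) - 1.0) < 1e-9
assert max(post, key=post.get) == 2   # the nearest template wins
```

The real attack does this with 256 templates per interesting point and, with several points, a covariance matrix instead of a single standard deviation.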
>>: I get it. Thank you.
>> Yossef Oren: So here is the work flow. I have this device under test. I'm
going to do something novel to it, which will end up in the optimizer
outputting the secret key, okay? So what is my work flow? TASCA, by the way,
is tolerant algebraic side channel attack. Tolerant, because I can tolerate
errors, okay?
So first of all, I'm going to take my device under test and subject it to
reverse engineering. This can be based on prior knowledge. This could be
based on really probing and so on.
And the output of this would be the power model. The power model means that I
am going to write a set of constraints which say: if the device is doing
something at this time, then I expect its power, its precise power, to be
this, okay? This is output from reverse engineering.
I'm also going to take traces or, let's say, one power trace, which is output
from the device, as I am attacking it and I'm going to put it into this
decoder, this Bayesian decoder, which I just described, and this decoder is
going to output the vector of a posteriori probabilities for each one of the
measurements I'm making. Let's take a look at the amount of data I'm using.
The data complexity. I have a single trace, for instance, and in this trace
I'm going to take 100 interesting points, and each one of these interesting
points is going to cause the output of a vector of a posteriori probabilities,
okay?
So from this one trace I'm going to get say 100 times 256 a posteriori
probabilities, and then I'm going to put this into the optimizer, these
together, and it's going to run and it's going to end up in a paper and I'm
very happy, okay? We know how it works right? Our objective function is very
well defined as researchers.
So let's talk about a real attack, okay, and the results. By the way, this is
going to appear in CHES 2012 a month from now. So the solver is SCIP. SCIP is
an open source solver developed at the Zuse Institute Berlin. It keeps
entering these Pseudo-Boolean competitions and winning, so it's pretty
good.
And the cryptosystem I'm attacking is a simulation. It's not a real lab
device, but rather based on measurements which have been performed by my
Belgian colleagues. So it's real data, but it's created by a tool which
outputs simulated data, okay? I just don't want to make any ground-breaking
claims. I didn't break a physical device yet, okay?
And what I did, I took a single power trace and I extracted 100 measurements
from this single trace. So single power trace, data complexity is one. The
signal to noise ratio is reasonable at 10 dB. It means there's ten times the
amount of power in the signal than there is in the noise. This is a reasonable
signal to noise ratio.
I put these instances, I created a lot of these instances. I said I have a
tool that creates them. I took 200 of these instances and, on average, in
about less than ten minutes, the key was recovered with perfect success rate,
okay? One trace, ten minutes, 100 percent success rate, okay?
What does this mean? Before I describe what it means, I want to tell you a bit
about the field of power analysis. Power analysis has been something which
people in the academic world know about since '96, and it's well assumed that
the government agencies and so on and so forth know about it from World War II
and on, okay?
So people know that power analysis attacks work. But what happens is that most
power analysis attacks until this day are based on statistical methods. What
that means is that you need many traces, and you try to make a hypothesis on
these traces which will cause them to behave in a way such that, for example,
you can split them into two bins. If your assignment is correct, if your
hypothesis is correct, then the difference between these two bins will be
statistically significant. Or there will be a correlation between your
hypothesis and the traces at a certain point in time. So another thing which
all of the previous -- not all
of them, but many of the previous attacks assume is that there is a linear
correlation between the power consumption of the device and the data it's
processing, which we know because it's a CMOS device, okay?
So how do you break power analysis? How do you make a device resist power
analysis? You attack both of these avenues. First of all, you try to make your
protocol, you try to make your environment, so that the keys are changed with
high frequency. They're always fresh. So you won't be able to get a lot of
measurements, so your statistical methods will fail. Or you can just inject
noise to make the statistical correlation weaker.
Another thing you do is you use electronic engineering tricks to break
the linear correlation between the power consumption and the number of bits
which are flipping at a certain time. You can do this by, for example, what's
called dual-rail logic. You take your circuit and then you create a mirror
image of the circuit, and they always do the opposite from one another, so you
really don't know what's going on, okay?
But what happens is that both of these assumptions, which make today's
protected devices resistant, do not hold when I'm doing this attack. First of
all, I don't need high data complexity to attack crypto devices with this
thing, okay? The data complexity is one.
And if I really have a lot of noise, I can average two traces together so that
the data complexity is two. Averaging really, really helps the signal to noise
ratio, okay? So I don't really need to average much. If I average 16 traces
together, I can do a great job of improving the signal to noise ratio.
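The averaging argument can be simulated: the mean of N independent noisy measurements has its noise standard deviation shrunk by a factor of sqrt(N). A toy simulation with an assumed signal level and unit-variance noise:

```python
import random

random.seed(0)
SIGNAL = 1.0

def noisy_sample():
    # One measurement: the true signal plus Gaussian noise.
    return SIGNAL + random.gauss(0.0, 1.0)

def averaged(n):
    # Average n independent measurements of the same signal.
    return sum(noisy_sample() for _ in range(n)) / n

# Empirical mean absolute error: single trace vs. 16 averaged traces.
err_single = sum(abs(noisy_sample() - SIGNAL) for _ in range(2000)) / 2000
err_avg16 = sum(abs(averaged(16) - SIGNAL) for _ in range(2000)) / 2000
assert err_avg16 < err_single   # averaging shrinks the noise (~4x here)
```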
So the data complexity is very low, which is something which previous
countermeasures assumed would make them resistant to power analysis.
Another thing is that I don't need anything particular about the leak. As I
told you, I make no assumption about the leak. I leave this problem to the
author of the decoder, okay. The guy who writes the decoder -- as long as he
can output the vector of a posteriori probabilities, that's enough
for me.
So if the leak is linear, that's fine. If it's not linear, okay. The device I
showed you has a nonlinear correlation between the [indiscernible] rate of the
byte and the power consumption, but as long as there is a relation, okay,
anything that I can write a soft decoder for -- and it doesn't even have to be
power consumption. It can be anything exotic, anything else. As long as I can
write a soft decoder, I can do this attack, okay?
So these two facts together call into question the security of previously safe
devices. So if you say a device is resistant to power analysis, you might have
to go and check this claim again, okay?
So I think this is pretty exciting. I still don't have any practical devices
I've attacked, because it's a very fresh result, but I feel there's a lot of
things that will have to be questioned if this really turns out to be
practical.
So where do I go from now? Future work. First of all, I used only power
analysis decoder. So I feel that anything that leaks can be attacked. So a
nice thing to do would be try to do different things that leak and try to feed
them into my solver. What I do know, at least I feel it, that if I throw
garbage at my solver, which means low quality data, it knows to ignore it, more
or less. So if there is a measurement where all of the values in these
measurements have the same probability, the solver will just ignore it.
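The reason the solver can safely ignore such a measurement is that a uniform a posteriori vector charges every key candidate the same -log p penalty, so it cannot change which assignment is cheapest. For example:

```python
import math

# A "garbage" measurement: every one of the 256 values is equally likely.
uniform = [1 / 256] * 256
penalties = [-math.log(p) for p in uniform]

# Every candidate pays the identical penalty, so the argmin is unchanged.
assert max(penalties) == min(penalties)
```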
So it really feels that the more data I throw at the solver, the more
effective it will be. The second thing is different leakage models. Again, I
can try different things. For example, I can try to use a cache timing leak,
electronic leakage and so on and so forth.
And the third thing, I touched on it briefly. The decoders we currently use
have this very elaborate pre-processing phase. We have a captive device and
we do profiling and we create [indiscernible]. And then I have a very,
very small data complexity. I only need one trace, and I use this trace and I
get the key.
So it might be that I have limited time with my device. There might be -- it
might be possible to use less profiling time and more traces for the attack and
still get an attack with low data complexity both in the online and offline
phases, okay?
And the fourth thing, which is really apparent is to get a real smart card or a
car or a computer and attack it using this method, which is really something
interesting to do, but difficult.
Okay. So this time, I thank you. And you can get the paper from this website,
and I'd really welcome any questions or comments. Thank you.
>>: I don't know if this is really within your mandate, what you're interested
in, but what would be the characteristics of the device that could defeat this
attack?
>> Yossef Oren: Let's see. I would say that what I really need is low
diffusion. So a device which has lots of diffusion would defeat the
system, okay. So if, for example, I make a guess, and to check this guess I
would have to go over a whole space of solutions, that would make the device
difficult.
One example is, for example, a device which does a lot of things in parallel.
So I'd have to make a lot of guesses at the same time, okay? Anything else?
>>: How does the attack scale with higher complexity crypto systems?

>> Yossef Oren: What do you mean, complexity?
>>: So say you make the bit size larger or something. So much bigger keys
or --

>> Yossef Oren: AES doesn't use a different round structure for larger keys.
It just does more rounds. So AES with a larger key is not more -- not more
resistant when I attack it using this way.
>>: So it just doesn't change at all?

>> Yossef Oren: Yes.
>>: With the exception of TV decoder cards, who in his right mind puts his
secrets where the other guy can get within two inches of them with a scope?
>> Yossef Oren: Oh, yeah, that's true. You can always say that if you
have physical access to a device, then you're screwed anyway, okay? But first
of all, TV decoders are a very interesting market. The second thing is that
now we are taking our cell phones, with all of our secrets, and touching them
to all sorts of things. Once the software and hardware is in place, places
you wouldn't touch with your hand, suddenly you're going to touch with your
phone. And who knows what's on the other side.
And the third thing I can say is if you're using electromagnetic leakage,
suddenly your range is much larger, okay? Power analysis does always have this
comment you have to make that to do power analysis, you need a power trace.
Yes. Okay? Anything else?
>> Kristin Lauter: Let's thank Yossef.

>> Yossef Oren: Thank you.