>> Kristin Lauter: Okay. So today we're very pleased to have Orr Dunkelman visiting
us. He's going to speak to us on key recovery attacks of practical complexity on AES
variants.
Orr is a postdoc at the Weizmann Institute and has been a postdoc at the Ecole Normale
Superieure and at Leuven, something like that. So thank you for coming.
>> Orr Dunkelman: Thank you for having me. So this is a joint work with Alex
Biryukov, Nathan Keller, Dmitry Khovratovich, and Adi Shamir. And I'm going to
discuss key recovery attacks of practical complexity on AES variants.
Now, AES is the Advanced Encryption Standard. So just to get a rough estimation, how
many of you know what the AES is [inaudible]? Great. Okay. So we can skip some of
the things. It's always a problem when you go to a place and you don't know who you're
going to meet there.
So I will go very briefly over the concepts of the AES, why we thought it was secure, and
then a bit about attacks on the AES.
Now, cryptanalysis in recent years, especially symmetric-key cryptanalysis, went towards
a very theoretical approach. Now, when I say theoretical approach, I don't mean a P versus
NP sort of theoretical approach; I mean a 2-to-the-400 time complexity sort of approach.
Now, that makes it hard for us to discuss things with practical people, who say that 2 to
the 400 is a bit too much, and with theoretical people, who complain that order of 1 is order of
1 and nobody cares about it.
So we're stuck somewhere in the middle. And we're trying actually to [inaudible] this
problem of a very huge gap between what is considered practical and what is considered
theoretical, because whenever I'm going to say in this talk theoretical, I mean practical
theoretical, 2 to the 400. And when I say practical, I mean something which is practical,
you can do it on your nearest -- well, either grids or even your PC.
So let's start with a very quick reminder what a block cipher is. So this is one of the most
basic cryptographic algorithms. There was the data encryption standard. That was
standardized in 1975 or -7, depending how you count it. Somewhere in between.
So there is a symmetric key which is shared between the two sides. And there is a
transformation. Actually, a block cipher is a permutation, a keyed permutation that takes
blocks of N bits and generates usually the same size of output.
This is sort of a bijection, not necessarily into the same space, because some block
ciphers have additional functionality. But today most of the time we deal with block
ciphers which take an N-bit block and return an N-bit block. So if you want to treat it
formally, there is an N-bit plaintext, a K-bit key, and it's just a transformation.
Now, actually when you encrypt data you have more than that. So you have some mode
of operation, a way to wrap around this block cipher, which can treat only a specific
amount of data at each invocation. So there is a system behind the block cipher.
And we have to remember that it's not standing there out of the blue. I mean, it's not like
there is this black box and it does whatever you want. You have to somehow
communicate with it and make it give you the right inputs or the right functionality that
you're looking for.
Okay. So this is a very quick crash course in differential cryptanalysis. Differential
cryptanalysis is a method to attack or cryptanalyze block ciphers. It was introduced in
the '90s by Biham and Shamir. And its basic idea is, instead of looking at a specific
value and trying to determine the key by looking at its decryption, you look at pairs. So you
take two encryptions, P and P star, and then you look at the differences in how things
behave.
Now, the reason this is very good is because the differences have the tendency to
propagate very nicely. For example, if you have a linear operation, the difference of the
output is the linear transformation applied to the difference of the inputs. Now, this is really great
because when I say linear, in cryptography sometimes we mix linear and affine.
It also works for affine transformations, but the nice thing is that when you XOR a key,
the difference before the XOR and after the XOR is the same. So the key disappears from
the encryption algorithm.
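As an aside not from the talk itself, this key-independence of XOR differences is easy to check in a few lines of Python; the byte values here are arbitrary examples:

```python
# A pair's XOR difference is unchanged by XORing in an unknown key:
# (P ^ K) ^ (P_star ^ K) == P ^ P_star, so the key drops out.
P, P_star = 0x3A, 0x5C              # an arbitrary pair of plaintext bytes
diff_in = P ^ P_star

for K in range(256):                # whatever the key byte is...
    assert (P ^ K) ^ (P_star ^ K) == diff_in   # ...the difference is unchanged
print(hex(diff_in))                 # 0x66
```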
Of course other things become a bit harder. For example, nonlinear
operations actually get very annoying when you try to analyze them. Because when
you have a nonlinear operation and you know the input difference, you do not necessarily
know the output difference, except in the trivial case where the input difference is zero.
Assuming that your function is deterministic, which is most of the time -- unless you're
using a Pentium 60 megahertz, the first generation -- when you put in the same input you get
the same output.
Okay. So a quick introduction to the AES. So as I said before -- sorry, as you probably
know, it was designed by Vincent Rijmen and Joan Daemen. Under the name Rijndael it
was submitted to the NIST competition in 1998. There was a very long process of
selecting the AES. Many people tried to break it. And at the end it was selected. It has
an SP network structure, as I will show in a second. Its block size is 128 bits; key size,
128, 192, or 256.
Now, most modern symmetric-key ciphers operate in the concept of the rounds. So we
have some basic transformation, and then you repeat it. You iterate it again and again. In
the case of AES, it's either 10, 12, or 14 rounds depending on the key size.
Questions so far? Great.
So input, this is the state. You take the 16 bytes of the state and put them in a 4-by-4 matrix.
Each round is composed of four simple operations.
The first one is the SubBytes operation, which is a nonlinear transformation; a table
lookup, if you like. A ShiftRow operation, which takes each row and rotates it to the left
by i bytes. So the first row is unrotated, the second one is rotated by one byte to the left,
the third by two bytes, the fourth by three bytes. Then the MixColumn operation, which takes each
column and multiplies it by an MDS matrix, as I will show in a second. And then the
AddRoundKey operation, which adds the subkey.
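To make the ShiftRow step concrete, here is a minimal Python sketch, not from the talk, of rotating row i of the 4-by-4 state left by i bytes; the byte values are made up for illustration:

```python
# ShiftRow sketch: row i of the 4x4 state is rotated left by i bytes.
def shift_rows(state):
    return [row[i:] + row[:i] for i, row in enumerate(state)]

state = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

shifted = shift_rows(state)
print(shifted[0])   # [0, 1, 2, 3] -- first row unrotated
print(shifted[1])   # [5, 6, 7, 4] -- second row rotated by one byte
print(shifted[3])   # [15, 12, 13, 14] -- fourth row rotated by three bytes
```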
Now, usually, as I told you before, there is a key which is used to generate many subkeys.
The reason is that if we have a key of, let's say, 128 bits or 256 bits, then you have to
generate enough keying material, subkey material, to affect the entire encryption. So
there is also a subkey generation algorithm, or key schedule algorithm, which transforms
this 256-bit key into 15 subkeys of 128 bits each.
So here's the MixColumn operation. You take four bytes and you multiply it by this very
nice matrix. It's an MDS, maximal distance --
>> Separable?
>> Orr Dunkelman: Separable. Thank you very much. I keep on calling it maximal
distance something. The S is very useful.
>> [inaudible]
>> Orr Dunkelman: So if you have a change -- if you change one byte in the input to the
MDS multiplication, you will affect at least four bytes, meaning all bytes of the output. If
you change two, you affect at least three. If you change three, you affect at least two.
And if you change four, well, you may affect only one, but it's more probable that you
affected everything.
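This branch-number property can be demonstrated directly. The sketch below, not part of the talk, multiplies a column by the circulant AES MixColumn matrix over GF(2^8), using the AES reduction constant 0x1B, and checks that changing one input byte changes all four output bytes:

```python
# MixColumn sketch: multiply a 4-byte column by the circulant MDS matrix
# [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2] over GF(2^8).
def gmul(a, b):                      # multiplication in GF(2^8), modulus 0x11B
    res = 0
    while b:
        if b & 1:
            res ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return res

def mix_column(col):
    return [gmul(2, col[i]) ^ gmul(3, col[(i + 1) % 4])
            ^ col[(i + 2) % 4] ^ col[(i + 3) % 4] for i in range(4)]

# Branch-number check: flipping one input byte changes all four output bytes.
a = [0xDB, 0x13, 0x53, 0x45]         # a well-known MixColumn test column
b = a[:]
b[0] ^= 0x01
changed = sum(x != y for x, y in zip(mix_column(a), mix_column(b)))
print([hex(x) for x in mix_column(a)])   # ['0x8e', '0x4d', '0xa1', '0xbc']
print(changed)                           # 4
```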
So this is a very nice thing because modern ciphers are built around the concepts of
confusion, which is usually the S-boxes, the nonlinear parts, and diffusion, which takes
this bit of nonlinearity and spreads it all around.
Now, just to give you the field, if you're interested: it's 0x11B, x to the 8th plus x to the
4th plus x to the 3rd plus x plus 1. It doesn't really matter. I'm not going to discuss this
anywhere in the talk; it's just for completeness.
The S-box, SubBytes: you, first of all, invert the input over the same field, where we
define zero to be its own inverse. Then you compute this nice matrix multiplication over GF(2)
and add this nice vector.
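For concreteness, here is a sketch, not from the talk, that rebuilds the SubBytes table exactly as just described: inversion in GF(2^8), then the affine map over GF(2) with constant 0x63. The brute-force inverse search is for clarity, not speed:

```python
# SubBytes sketch: invert in GF(2^8) (0 maps to 0), then apply the affine map.
def gmul(a, b):                      # GF(2^8) multiply, modulus 0x11B
    res = 0
    while b:
        if b & 1:
            res ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return res

def inverse(x):                      # brute-force field inverse; 0 is its own "inverse"
    if x == 0:
        return 0
    return next(y for y in range(256) if gmul(x, y) == 1)

def affine(a):                       # b_i = a_i + a_{i+4} + a_{i+5} + a_{i+6} + a_{i+7} + c_i
    b, c = 0, 0x63
    for i in range(8):
        bit = ((a >> i) ^ (a >> ((i + 4) % 8)) ^ (a >> ((i + 5) % 8))
               ^ (a >> ((i + 6) % 8)) ^ (a >> ((i + 7) % 8)) ^ (c >> i)) & 1
        b |= bit << i
    return b

sbox = [affine(inverse(x)) for x in range(256)]
print(hex(sbox[0x00]), hex(sbox[0x01]))   # 0x63 0x7c
```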
Now, this is a very nice S-box because it has very, very nice differential properties.
Specifically, you get something which is uniform as you can get for differences -- for the
differentials -- okay, sorry.
The difference distribution table is a prediction saying, given some input difference, what is
going to be the output difference. And there are probabilities associated with it, because
when you look at differences you lose some determinism, in the sense that
the difference 1 can be between the pair of values zero and one, but it can also
be between the pair two and three, or -- sorry -- between four and five.
So when you have something like that, you don't know actually what the pair was. And
this is actually how the key affects differential cryptanalysis, because it actually selects
which of the pairs with input difference one was entered. But there are 128 such pairs.
And the difference distribution table, which is a sort of prediction, is just a probability
distribution. If you have an input difference 1, the output difference is going to be 3 with
probability either 0, 2 to the minus 7, or 2 to the minus 6.
This is the case for the S-box that is written here. And specifically this is the best that
you can get with an 8-bit to 8-bit S-box. For each input difference there is one entry of 2 to
the minus 6, 126 entries with probability 2 to the minus 7, and all the rest are 0.
So it's a very nice S-box.
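These counts are easy to reproduce. The sketch below, again not part of the talk, rebuilds the S-box with the same construction as above (repeated so the snippet is self-contained) and tallies one row of the difference distribution table; a count c corresponds to probability c/256:

```python
# DDT sketch: entry [din][dout] counts inputs x with S(x) ^ S(x ^ din) == dout.
def gmul(a, b):                      # GF(2^8) multiply, modulus 0x11B
    res = 0
    while b:
        if b & 1:
            res ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return res

inv = {0: 0}                         # field inverses, 0 mapped to itself
for x in range(1, 256):
    inv[x] = next(y for y in range(256) if gmul(x, y) == 1)

def affine(a):
    b, c = 0, 0x63
    for i in range(8):
        bit = ((a >> i) ^ (a >> ((i + 4) % 8)) ^ (a >> ((i + 5) % 8))
               ^ (a >> ((i + 6) % 8)) ^ (a >> ((i + 7) % 8)) ^ (c >> i)) & 1
        b |= bit << i
    return b

sbox = [affine(inv[x]) for x in range(256)]

def ddt_row(din):                    # counts for one fixed input difference
    row = [0] * 256
    for x in range(256):
        row[sbox[x] ^ sbox[x ^ din]] += 1
    return row

row = ddt_row(1)
# one entry 4 (probability 2^-6), 126 entries 2 (probability 2^-7), rest 0
print(row.count(4), row.count(2), row.count(0))   # 1 126 129
```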
Okay. The key schedule algorithm: you first of all put the key in a 4-by-8 matrix. Now,
you work on words, columns of 32 bits or 4 bytes. So for the first column you take
these two things: first you rotate this column a bit, and you put it through the S-box
and generate this new byte.
The last byte is also affected by some round constants, to prevent this structure from being too
self-similar. We have attacks that are based on such things. So this is how it works.
The next column is generated by a simple XOR of two columns. Simple XOR. Simple
XOR. And then, again, we have an S-box layer, this time without rotation. And an XOR,
and then another three rounds of simple things.
Now, the reason for making this thing only use the S-box every four words is efficiency.
No other reason behind it besides that.
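Here is a structural sketch, not from the talk, of that AES-256 key expansion pattern. To keep it short, the S-box is passed in as a parameter and the example uses an identity table as a placeholder (real AES of course uses SubBytes), so this illustrates only the word pattern, not actual AES subkeys:

```python
# AES-256 key expansion structure: 8 key words expand to 60 words
# (15 subkeys of 128 bits). Every 8th word gets RotWord + SubWord + Rcon;
# the word 4 positions later gets SubWord only; the rest are plain XORs.
def gmul2(x):                        # multiply by 2 in GF(2^8), modulus 0x11B
    x <<= 1
    return (x ^ 0x1B) & 0xFF if x & 0x100 else x

def expand_key_256(key_words, sbox):
    """key_words: 8 words of 4 bytes each; returns all 60 expanded words."""
    rcon = 0x01
    w = [list(word) for word in key_words]
    for i in range(8, 60):
        t = list(w[i - 1])
        if i % 8 == 0:               # RotWord, SubWord, round constant
            t = t[1:] + t[:1]
            t = [sbox[b] for b in t]
            t[0] ^= rcon
            rcon = gmul2(rcon)
        elif i % 8 == 4:             # SubWord only, no rotation
            t = [sbox[b] for b in t]
        w.append([a ^ b for a, b in zip(w[i - 8], t)])
    return w

key = [[i, i, i, i] for i in range(8)]   # a toy 256-bit key, for illustration
identity_sbox = list(range(256))         # placeholder S-box, NOT the AES one
w = expand_key_256(key, identity_sbox)
print(len(w))                            # 60 words = 15 subkeys of 128 bits
```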
Okay. So as I said before, these S-boxes are really, really nice. They have very good
differential properties. The MixColumn operation, because it's an MDS matrix, makes
sure that you have diffusion from one byte to the entire column. And when you take
into consideration the fact that there is a ShiftRow operation, if you change even a single
bit of the plaintext, after two rounds you may affect all the bits of the internal state.
This is a very strong property. It's actually a very good property. It was designed using
some sort of thing called the wide trail strategy, which assures that the number of active
S-boxes -- okay. An active S-box is an S-box with nonzero input difference.
When there is a zero input difference, there is a zero output difference with probability 1.
When you have a nonzero input difference, you have some distribution on the output
difference.
So an active S-box is an S-box with probability associated with the transitions. So we
would prefer to have as few active S-boxes as possible. And specifically in the case of
AES, you can show that if you have two rounds, the minimal number of active S-boxes is
five, for three rounds it's nine, and for four rounds it's 25.
Now, the probability of each transition is at most 2 to the minus 6. Assuming
independence between the transitions, which is not necessarily the case, the probability
of any four-round differential characteristic -- a prediction saying this input
difference became this difference after one round, this one after two rounds, and so on --
is at most 2 to the minus 150 after four rounds.
Now, the state is -- sorry. It's 128 bits. Meaning that you have 2 to the 127 pairs. If
you take 2 to the 127 pairs, each of them satisfies the differential characteristic with
probability 2 to the minus 150, so the number of pairs satisfying all the differential
transitions is going to be less than 1. Meaning the probability of finding a pair which
satisfies this differential characteristic is negligible.
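The counting argument above can be redone in exact arithmetic; this is just a sketch of the calculation:

```python
from fractions import Fraction

# 25 active S-boxes, each transition with probability at most 2^-6,
# against the 2^127 pairs available with a fixed input difference.
p_char = Fraction(1, 2 ** 6) ** 25       # <= 2^-150 for a 4-round characteristic
pairs = 2 ** 127                         # pairs with the chosen input difference
expected = pairs * p_char                # expected number of right pairs
print(expected)                          # 1/8388608, i.e. 2^-23: well below one pair
```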
Okay. There are some other four-round assurances. For example, after four rounds there
are no impossible differentials for impossible differential cryptanalysis, and there are no
four-round square properties. More than four rounds -- five rounds is the -- now, the
thing is that -- well, it was believed that AES is secure because there are no good
differential characteristics. You start with some input difference; for any set of transitions,
the probability of finding something good is very small.
Now, the thing is that actually in differential cryptanalysis we do not care usually about
the exact paths, what were the transition differences. We care only about the input and
the output differences.
And this is done -- this is mostly because of the way the attacks work. You take many
pairs and you find the pairs which satisfy some input and output difference. You don't go
after all the things in the middle.
So there is a concept called differential which is actually saying what's the probability
that a given input difference will become some output difference independent of the
transitions in the middle. So it might be the case that each path has a very low
probability, but the total probability of an input/output difference pair is much higher.
So there is a series of papers that actually show that four-round AES has no good differentials,
meaning there is no input difference/output difference pair
which holds with probability higher than 2 to the minus 110. Meaning that you still need huge
amounts of data just to break four rounds of AES.
And that's why differential cryptanalysis will not work on AES, and the same goes for
linear cryptanalysis, which is another flavor of cryptanalysis based on approximating the
encryption by using some sort of a set of equations with some error margins.
Okay. So hopefully by now you are convinced that AES is secure. Right? AES is
secure. I mean, NIST had to pick a good cipher. And they looked at the five finalists,
and they said, well, we don't like Bruce -- sorry. I'm kidding.
>> Serpent is too slow.
>> Orr Dunkelman: Serpent is too slow. MARS is too complicated. If we pick
Twofish, the hardware people are going to kill us. RC6, well, there is some
multiplication, the hardware people will not really like it. Rivest has too much money
anyway, so we don't have to give him any more honor. Yeah, let's pick Rijndael.
Now, they had to do so very -- I mean, well, they had to do it very delicately, and they
had to make sure that it's secure. And people try to attack AES, and, you know, with the
wide trail strategy, people had very good feeling about the security of AES. So it's
actually secure. Sort of. We'll see in a second.
So -- how did we get here? Sorry. Okay.
So here's a bit of an overview of what happened in cryptanalysis over the years. Now, in
the early '80s, when cryptanalysis had just become an academic research field, people looked at
ciphers and said, here is a cipher. They twiddled bit No. 5 and saw what happens in the
ciphertext. They would do experiments like this. They would take an encryption
function, they flip a bit, and they see how many times -- let's say,
how many times, when you flip bit 5 of the plaintext, bit 6 of the ciphertext is flipped.
Then you had a very nice set of criteria, like the Avalanche Criterion, the Strict Avalanche
Criterion, all these things which were actually statistical properties that nobody really
knew why they were there, but if you [inaudible] to find one of these, you would break the
scheme.
Now, these were statistical tests, so they took a small data set, because this was before the
time that most people had access to computers, and actually exporting code that does
cryptography was a bit hard. So you had to write everything on your own on your VAX,
whatever you had in your university.
>> 780.
>> Orr Dunkelman: Well, depending on your university. You were lucky you had a 780.
And the poor bastards who had to work with punch cards at the beginning -- well, at the
beginning of [inaudible] it was a bit out of place. But, anyway, so people actually
went and tried it and measured and said, okay, 60 percent of the time bit No. 1 is flipped,
so there is some statistical impurity. And people actually verified these results and people
worked very hard.
At some point along the way, we started to understand how things work. So around the
mid-'80s, people started to say: there is an attack, I didn't verify it explicitly, but I can tell
you, I can prove to you that it works. Let's say meet-in-the-middle attacks. We
understood the concept of these attacks. We knew how they work.
So instead of telling you, you know, I actually went, I did the meet-in-the-middle attack
and computed 2 to the 50 steps, there's a very nice paper by Chaum and Evertse from
CRYPTO '85 finding meet-in-the-middle properties, meet-in-the-middle attacks on
reduced-round DES.
The last attack is on 7-round DES, rounds 2 to 8, with complexity 2 to the 52. I can
assure you, I can guarantee 100 percent, that they didn't actually verify this.
But we all know that the attack works. And why do we know that it works? Well, it's very
simple. We understand the theory behind how the attack works. We understand the
property. We came to an understanding of how things work, how it affects the security.
So we don't actually have to do the experiments; you just have to present the algorithm and
convince the reader that the theory behind it works.
Now, at that point in time people had started to assume that 1,000 known plaintexts is
okay. As an adversary, obtaining 1,000 known plaintexts is a legitimate requirement.
And the time complexity had to be much lower than exhaustive search. This was the
breaking point: your cipher is broken if, by taking 1,000 known plaintexts, I can run
something which is faster than exhaustive search.
And then came 1991, when differential cryptanalysis was introduced. Now, besides the
fact that this was the first attack on the Data Encryption Standard which was faster than
exhaustive search in some sense, it had broken one barrier that we had.
First of all, chosen plaintexts are okay. I mean, before that people said, okay, if you have
one, two, three chosen plaintexts, it's okay. But this had a real amount of data complexity, 2
to the 47 data. And I think that when Don Coppersmith had to sort of defend the Data
Encryption Standard, he said, well, doing 2 to the 56 encryptions in your garage is much
faster than obtaining 2 to the 47 data in the chosen plaintext manner.
So people would still prefer to use exhaustive search over differential cryptanalysis to
break ciphers, but suddenly in the academic circles it's okay that you require 2 to the 47
chosen plaintexts.
Okay. In '92, '93, the related-key attacks came into play, which is even crazier. I mean,
let's look at known plaintext: I have some standards, I'm encrypting data, I encrypt
the header according to some standard, so I know what is encrypted and I know the
ciphertext.
Then came chosen plaintext: well, Joan, please encrypt this data so I could break your
scheme. Now came the related-key approach, which says: please encrypt this data under this
key that I don't know; now flip bit 5 of the key -- don't tell me what the bits were -- and
encrypt this data.
Now, this sounds a bit crazy. How dare we tell the person that we're going to attack:
listen, it's very nice of you that you encrypted 2 to the 47 data for me, I just need you to flip these
bits of the key, or rotate the key a bit, and encrypt another 2 to the 47 data. Wow.
We became very, very cheeky people. Yeah.
>> [inaudible]
>> Orr Dunkelman: Yes. I will discuss this a bit later. But there are natural relations.
And specifically the paper from '93 -- there were two papers actually that
introduced related-key attacks, the one by Knudsen from '92, and the independent paper
by Biham from '93.
And they actually show a related-key attack with rotation relations which uses
oracle access to encryption with only one key. Even though it's a related-key attack, it was
sort of defining [inaudible] properties based on rotation relations. But still it's a
related-key attack. It uses rotations, but with access to the oracle under only one key.
The field becomes crazier as time goes by. Now, if you're used to, let's
say, public-key cryptosystems -- I mean, yeah, known data, chosen data, who cares.
Related-key attacks, okay, so one out of five times that you generate the keys you will
have issues -- well, depending on the probabilities and all that stuff -- but even related-key
is something that you can live with in public-key cryptosystems.
And then came the '97 competition, the AES competition, where people started to be very
paranoid. One strike and you're out. There is some statistical impurity with probability 2
to the minus whatever -- 2 to the minus 110 -- and you're out of the game. Now, this is a very
strict approach, and it has very good reasons, which I will discuss in a second.
And then came '99, and then we started to have adaptive chosen plaintext and ciphertext
attacks in symmetric-key cryptosystems.
Again, the models get crazier and crazier. Now, if you're from public-key cryptosystems,
who cares. I mean, if you discuss CCA2 security, this is the main problem
of public-key cryptosystems, because, well, the adversary can generate adaptive queries.
In block ciphers it sounds a bit crazy.
So as you can see, slowly we went from being a field of people doing practical stuff to a
field of people who do the following crazy something. And this is a quote from one of
my papers. Don't take this CV too seriously.
So time complexity of related-key attack, the total time complexity of step 2B, which was
the dominating part of the attack, is 2 to the 423 SHACAL-1 encryptions. Yeah, yeah,
yeah.
The data complexity, by the way -- I didn't write it down here, but it's about 2 to the 160
related-key chosen plaintexts, and there are four related keys. And as you can see, 2 to
the 420 is a very practical thing.
Okay. So I'm cheating, right? The thing is the key size of SHACAL-1 is 512 bits.
Meaning that as an adversary, if I'm limiting your time complexity to only 2 to the 423,
you will use my attack. You will not use exhaustive search because it will take you a bit
more time.
Of course, you will not use my attack because the data complexity is a bit high and just
the time required for transmitting all the data, but this is something else. This is one of
the good things about academia. You can do whatever you want and...
So most of the cryptanalytic papers today actually discuss what we call certificational
attacks. So the data complexity is just slightly less than the entire code book. And
actually I have a paper which uses more than the entire code book, from FSE 2007 or
2008, I think. Very nice paper, by the way. Seriously. The concepts are nice. As for the fact that
you need more data than there is, actually we use several keys there, so it's okay.
The time complexity, just slightly less than exhaustive search. And memory, well,
nobody cares about memory complexity too much. I mean, you just need to store the
data. Of course the memory complexity has to be lower than the time complexity
because you have to initialize the memory. And nobody really cares if it's a fast memory
like RAM or it's let's say hard disk or something.
I had once a discussion with some cryptanalyst who suggested the following model to
store huge amounts of data, you send it in a beam to a repeater in some other galaxy
which sends it back making sure it arrives just in time when you need it. Then you don't
have to store anything.
Okay. So why are these things actually still published? I mean, okay, I ridicule my
field: don't read cryptanalysis papers, they are very boring, very unrealistic.
So one reason is that actually why would you use a primitive which is not optimal?
There are more optimal solutions, so pick the ones that do not suffer from these kind of
attacks.
So if you have to pick between the Rijndael and Serpent, Serpent has more security
margins -- more safety margins, sorry, okay, so pick something which is more secure.
Another thing is that actually if you publish only papers which have practical attacks,
no paper will be published. Well, almost no paper will be published. And the thing is
that attacks only get better. So if I don't give you this hint about where there is a
problem, and you have a new idea that you hadn't thought could help, but somehow
it matches the property that I found -- and attacks only get better.
And you can see it in attacks, for example, on SHACAL-1. Now the best-known attack
on SHACAL-1 in these crazy settings has time complexity of about 2 to the 300. Which
is still a huge amount, but it's much better than 2 to the 400.
So we get better as time goes by.
But actually it doesn't solve our real problems, our core problems, which are answering
questions from users of cryptography: does this attack affect my system? Okay. I mean,
at the end, there is a buffer between academic research and everyday life. And in some
fields the buffer is very large, in some fields it's very small.
In the case of cryptography, there are security engineers and then there is the public, and
the security engineers have to answer these questions and they have no idea.
Should I still use -- yeah.
>> I think the [inaudible] question is does the next attack affect my system, the one that
hasn't been published yet. Or that would be published because the guys in Ukraine are
busy making money.
>> Orr Dunkelman: That's true. But the problem is that my crystal ball is not really good
today. There are a bit of clouds. Sorry. I'm a bit jet lagged. All the English that I know
went. So I'm terribly sorry for all the mistakes. And actually this is something that
happens a lot. People are still using MD5 for certificates.
>> People is us, by the way.
>> Orr Dunkelman: I was trying to avoid saying that out loud. Brian told me yesterday.
Now, I'm not even discussing the problem of mitigating problems. This is a real issue.
Crypto agility is a real issue. It's not something like, you know -- it's very easy to pick at
Microsoft, or for that matter any other company, for taking so long to mitigate this
problem. But there are good reasons for that. I guess that you know it better than me.
>> Don't worry about it.
>> The problem is in six years [inaudible] still working on it.
>> Orr Dunkelman: Six years is a legitimate time spent in --
>> [inaudible]
>> Orr Dunkelman: Of course.
>> [inaudible] fill the position of MD5 removal program management.
>> Orr Dunkelman: Yeah. So as you probably know -- something not related
to Microsoft -- if you're caught spitting in Australia, you can actually ask your fine to be --
well, if you have an automatic camera that takes a picture of you spitting, you can ask the
fine to be discharged, because they use MD5 for computing the -- ensuring the
authenticity of the images that the camera took.
It was a real case in court, and the state, instead of calling a real cryptographer, just said,
well, MD5 is broken and, therefore, the fine is not authentic enough. They discharged
the guy.
So even real -- let's say real-life implications are very hard to predict.
Okay. So what is actually a break? We have this issue inside the cryptanalysis
community as well. And your guess is as good as mine. Actually, it's probably better,
because you can think about it from different points of view. From a theoretical --
theoretical-practical -- point of view, anything which is better than exhaustive search is
okay.
So this is the extreme approach that says the maximum of time, data and memory is less
than exhaustive search. And of course the data has to be less than the entire code book.
Another approach is that the time, data and memory is better than generic attacks. So we
have meet-in-the-middle -- sorry. We have time-memory-data tradeoff attacks first
introduced by Hellman around the '70s, in the beginning of the '80s, and today we have
some other variants of it, like the rainbow tables.
If your attack is better than these generic attacks, then the cipher is broken. Saying that,
you know, at the end you can break any system with a generic attack. That's the whole
point of generic attacks. They work for [inaudible].
Now, there is a new metric, which is promoted by several people, which is time times
memory is less than that required for exhaustive search. And the concept behind that is that
if you have a memory circuit, a memory gate, you take the hardware that was used to
implement it and transform it into hardware that does exhaustive search. So if you take 2 to
the 100 memory, you can transform it into about 2 to the 90, let's say 2 to the 100 order
of magnitude, circuits which do trial encryptions.
Of course this is a bit cheating because it depends if the memory that you need in your
attack is RAM or hard drive or whatever, which makes everything even a bit harder to
compute. And of course the best metric in the real life is money.
I'm limiting your time: I want to find the key within a year. How much money does it cost
using this approach, that approach, the other approach, in the attack? If the attack costs less than
any previous approach, it's a break.
So any guess is good. And it depends on the system you are trying to work with.
So let's put this debate behind us. Let's look at practical attacks.
Now, let's try to upper bound the complexity of an attack so we could consider it to be
practical. And I will discuss only time complexity, because the moment I'm discussing
the time complexity, this is a bound for the data complexity. If you wait 2 to the 56 time
to get the encryptions, this is the time.
So assuming that the time complexity is always as large as the data complexity, I can
discuss only this issue. And of course this also gives a bound on the memory
consumption.
So the DES cracker by the EFF, the Electronic Frontier Foundation, computed 2 to the 55
DES encryptions in about 56 hours. Today you can do it even faster. You have the
COPACOBANA machine that the [inaudible] guys built. A very nice FPGA board: you
put about 10 or 12 FPGAs on it and connect them. It's a machine that used to cost 10,000 Euros,
which is about $50,000 these days. And it found a key of the Data
Encryption Standard in about 17 days. So if you want the key in less time, you just buy
more COPACOBANAs. Assuming the time of [inaudible] is zero [inaudible].
Now, there was a SHA-1 [inaudible] project which tried to do 2 to the 61 evaluations of
SHA-1. Actually, there are debates if they were trying to do 2 to the 63 or 2 to the 61.
But this thing didn't finish. After two years of computation, they just stopped the project.
They didn't really tell why they did so. So we can guess either there was a problem, they
needed more time, or there was a bug in the software. Anything goes.
So I think that if you say 2 to the 64 cycles is a practical time complexity, we will all
agree on that, right? Yeah.
>> The NSA and these guidelines are set to say 2 to the 80 is good enough for this decade
and it's not good enough for next decade.
>> Orr Dunkelman: I agree, but the problem is that -- I mean, if I will try to convince
people, listen, the NSA can read your e-mails. Their assumption is that the NSA can read
their e-mails anyway.
>> No, that's their guidance for civilians so other guys won't read their e-mails.
>> Orr Dunkelman: Still, the Russians read your e-mail. No offense. Well, maybe not
your e-mail, but everybody is reading everybody's e-mails. And especially these days.
There was a report recently about the countries which actually participate in cyber
warfare, and I think that Russia, China, the U.S., and I think another country ranked
very high, having not only defense forces but also offense forces. And I don't have to tell
you that North Korea is probably doing such things. And if you're from Estonia, or you
know somebody from Estonia, you probably know what happened two years, I think, a
year ago, two years ago --
>> Two years.
>> Orr Dunkelman: -- some people in Russia were very pissed off at the Estonians.
So let's say 2 to the 56. I mean, if you want to play with this bound a bit -- 2 to the 64
cycles, which are about 2 to the 56 AES instructions, AES encryptions -- just move the
line. It doesn't change anything much.
So here is a summary of attacks on the AES-256. So to remind you, AES has three variants:
128-bit key which has ten rounds; 192, which has 12 rounds; and 256, which has 14
rounds.
So this is the number of rounds of the attack. This is the time complexity. This is the
practical borderline. This is the exhaustive search, the certificational crazy model. And
this is of course logarithmic scale, because otherwise we will need a few floors extra.
So these are results in the single-key model. You can see that the best known attack on
up to eight rounds of AES-256, with access to an oracle which encrypts under one key, takes
about 2 to the 149 or something like that. It's a paper by Demirci and -- I forgot his
name. I'm terribly sorry. From FSE 2008. And the only practical attack on AES-256 is
on six rounds, by Ferguson et al., so go and ask Niels from across the street to discuss
these results.
In the two-key model, or in the four-key model -- in the related-key model -- the adversary is
allowed to probe the oracle under different keys. The best-known attack is [inaudible]
ten rounds with time complexity of about 2 to the 170 or 2 to the 173, if I remember
correctly.
Now, this was the case up until recently, and then Dmitry and Alex and Ivica in one of
their papers, and Dmitry and Alex, proposed two attacks on AES-256. The first one, from
CRYPTO 2009, takes about 2 to the 135 encryptions with 2 to the 35 related keys.
It's crazy, but it's still better than exhaustive search.
And the recent ASIACRYPT paper, which Dmitry is going to present next week, is 2 to
the 99.5 time complexity and data complexity, and you use four related
keys.
So as you can see, attacks only get better.
>> So somebody was giving a talk about an attack on the full ten rounds in Italy. Anna
[inaudible] I think is her name. Full ten rounds. But I don't know any of the results. Have
you heard any of those?
>> Orr Dunkelman: I tried to contact her. So, first of all, she claims results, statistical
results, on AES-128. As far as I understood from the abstract and from the e-mail that she
answered me, this is in the single-key model.
>> When?
>> Orr Dunkelman: But --
>> She didn't give you a time.
>> Orr Dunkelman: Didn't give anything. She just said that after her talk, which I think
is about Thursday or Friday --
>> This coming Thursday?
>> Orr Dunkelman: Yeah. The 4th of December I think they have -- either the 3rd or the
4th, they're going to have some sort of crypto day at their university, a symmetric-key day
at the university. They're going to discuss the security of block ciphers, she will discuss
her results. And then she will publish an abstract of her results.
>> [inaudible] or something?
>> Orr Dunkelman: Who knows. I just want to -- in that line there was a guy, Claude
Gravel, a student of Gil Segav, who in the CRYPTO 2009 rump session spoke
about the fact that he found statistical problems in AES. And we actually went and redid
all these experiments. And as far as we know, the experiments failed.
So we found some artifacts that are caused by random things. He picked some huge
amount of data, took 4096 chosen plaintexts, and he said that if you encrypt them, the
ciphertexts have some problems in the [inaudible] test. And we found that -- well,
we are currently running a second simulation, but the first simulation showed
that you get such large biases after the number of random trials that you expect. He
claimed that the biases are larger by a factor of 10.
>> [inaudible]
>> Orr Dunkelman: He gave a very quick overview [inaudible] the rump session. He
was speaking just before me, so I remember it very clearly. And then somebody told me --
somebody. Adi told me: implement.
So it's implemented and it's running.
So these are the attacks that we presented in the paper. So the first one, for example, is --
these are the attacks, the yellow ones. And I hope that there is no one here who is color
blind. I'm terribly sorry if you are. I will try to -- so these two results are in the
related-key model.
And you can see that, for example, you can attack up to nine rounds with complexity of 2
to the 39. And most of it is data complexity. Once you gather this amount of data, you
automatically find the things. And the analysis time is less than 2 to the 32. Meaning
that as an adversary you can sit at home, get the 2 to the 39 data, and then do the analysis
in less than a minute.
In the related-subkey model, which is slightly different, and we'll discuss the differences,
you can attack up to 10 rounds with complexity of 2 to the 45. And even 11 rounds in
quasi or semipractical complexity of 2 to the 70. So now it's a bit -- well, it depends if
you want to put the line here or here or if you're NSA or someone else. But these are
roughly the time complexity.
>> [inaudible]
>> Orr Dunkelman: Sorry?
>> They all require oracle, right?
>> Orr Dunkelman: They all require chosen plaintext capabilities or chosen ciphertext
capability. And under the related-key or the related-subkey model.
So of course we require some knowledge about what's going on, but I will discuss a bit
later that it's not as bad as it sounds.
Okay. So as I said before, the related-key model was introduced by Knudsen and Biham.
And around '96, '97, people started to use related-key differentials.
Now, the concept there was to use differences in the key to cancel differences in the
differential characteristic. Because you have a key schedule algorithm where a difference
is propagating some manner, and then they are injected into the encryption process in
some locations, and you can actually use these differences to play with things.
Now, there is a set of good relations, which we all agree are good relations, and
if you will submit a paper it will get accepted and that sort of stuff. So if you have XORs,
rotations, or additions, these are legitimate relations.
Now, if you have "and" or "or", or the use of XORs and additions together, people tend
to be very unsupportive of your paper. And the reason for that is that if I give you a
related-key oracle -- encryption under a key K, okay, and encryption under the key K OR 1, I give you
access to these two oracles -- you can easily find the least significant bit of the key.
Encrypt under the first one, encrypt under the second one; if the encryptions are the same,
then the OR with 1 didn't change anything in the key. So we know that the least significant
bit is 1.
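The LSB-recovery trick just described can be sketched in a few lines. This is a toy illustration, not the real attack: `toy_encrypt` is a hypothetical stand-in for a block cipher oracle, and only the relation between the keys K and K OR 1 matters.

```python
import hashlib

def toy_encrypt(key: int, plaintext: bytes) -> bytes:
    # Hypothetical stand-in for a real block cipher oracle.
    return hashlib.sha256(key.to_bytes(16, "big") + plaintext).digest()

def recover_lsb(oracle_k, oracle_k_or_1, plaintext: bytes) -> int:
    # If E_K(P) == E_{K|1}(P), then K | 1 == K, so the LSB of K is 1.
    return 1 if oracle_k(plaintext) == oracle_k_or_1(plaintext) else 0

key = 0x1234                      # LSB of this key is 0
p = b"any plaintext at all"
lsb = recover_lsb(lambda m: toy_encrypt(key, m),
                  lambda m: toy_encrypt(key | 1, m), p)
assert lsb == (key & 1)
```

One chosen-plaintext query to each oracle leaks one key bit, which is why non-bijective relations like OR are considered illegitimate in the related-key model.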
Now, there is a guy who presented a paper at the Workshop on Coding and Cryptography this
year who claimed that he has an attack on all ciphers using related keys. Now, his
relations are good in the sense that there's a theoretical paper by [inaudible] from [inaudible]
from 2003 or 2004 where they showed that the relation has to be bijective in
order for everything to be okay.
So you can see for example that and/or is not good and XORs and rotations are good
relations.
Now, he showed the generic attack that works using a very crazy but bijective relation.
So this is not something to be afraid of in the practical sense, but from the theoretical
point of view it proves that there is no way to construct a cipher which is secure against
related-key attacks, and then in the end it boils down to whether you have this relation in
the system that you're using.
If you're going to build a hash function and you're going to construct it using a [inaudible]
transformation of the block cipher, which has problems with related-key attacks -- because you
transformed a block cipher into a compression function using the Davies-Meyer mode --
you give a very strong foothold for the adversary to apply related-key attacks.
If you have a [inaudible] system sitting in the middle of nowhere sending data with the key
embedded into the device, the related-key relations are very, very weak. You don't have
related-key attacks on the sensor there; you might have them on the reader at the other end,
the receiver at the other end.
So it's very tricky, but I'm going to discuss relations which are relatively okay, which are
XOR relations.
So, for example, if you discuss the probability of a differential, you can see that you take
the probability when you [inaudible] in plaintext. Encrypt it under the key -- so you give it
to the oracle -- and you take a plaintext with some plaintext difference and you ask what's
the probability of getting some specific ciphertext difference.
So for four-round AES, the probability here is bounded by 2 to the minus 110.
In related-key differentials, we have encryption of P under the key K and encryption of
P XOR delta P under the key K XOR delta K. And these probabilities tend to be very
different when you allow yourself some key differences.
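Written out as formulas, the two notions just contrasted look like this (a standard formulation of what was said, not taken from the slides):

```latex
% Ordinary differential probability (single key K):
\Pr_{P}\left[\, E_K(P) \oplus E_K(P \oplus \Delta P) = \Delta C \,\right]

% Related-key differential probability (key difference \Delta K):
\Pr_{P}\left[\, E_K(P) \oplus E_{K \oplus \Delta K}(P \oplus \Delta P) = \Delta C \,\right]
```

The ordinary case is the special case \(\Delta K = 0\); for four-round AES it is bounded by \(2^{-110}\) as stated above, while allowing \(\Delta K \neq 0\) lets key-schedule differences cancel state differences, so the related-key probability can be far larger.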
Okay. Now, the related-subkey model is a bit crazier. How do I define the related-key,
let's say, relation, which is an XOR? You take the key, you XOR it with some constant, and you
ask for the encryption under the XOR'd key.
Now, you can say listen, this is a bit crazy. But reality shows that there are protocols in
which you can do that. There are protocols where you can flip bit No. 5 of the key. Real
protocols.
Now, in the related-subkey model, we take a key K, we look at all the subkeys it
generates, XOR subkey 2 and 3, for example, run backwards the key schedule algorithm,
find the key that satisfies these subkeys, and then use this key. So this is even a bit
crazier, I mean. We don't XOR the key, we XOR a subkey and then run backwards.
Now, the key that is generated adheres to the key schedule algorithm. It's not a fault
attack where you flip bit No. 5 of subkey 20 and you don't do anything else. You flip the
bit in subkey 20, you roll back the key schedule to get the new key, and then you generate all the
subkeys according to the key schedule algorithm. So it's not a fault attack, and it's
something with which you have to be very careful.
Now, this sounds a bit crazy. And it is. But, on the other hand, when you use it in
systems, in real systems -- for example, if you take AES-256 and you put it as a
compression function in the Davies-Meyer mode, you have this capability. And
AES-256, like any good block cipher, should be usable in the Davies-Meyer transformation
of a block cipher into a compression function.
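The Davies-Meyer construction being referred to is H_{i+1} = E_{M_i}(H_i) XOR H_i: the message block plays the role of the cipher key. That is why a related-key query on the cipher corresponds to an ordinary pair of hash calls with related message blocks. A minimal sketch, with `toy_e` as a hypothetical stand-in for AES-256:

```python
import hashlib

def toy_e(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for a 128-bit block cipher keyed with 32 bytes.
    return hashlib.sha256(key + block).digest()[:16]

def davies_meyer(blocks, iv: bytes) -> bytes:
    h = iv
    for m in blocks:                               # each message block is the key
        e = toy_e(m, h)
        h = bytes(a ^ b for a, b in zip(e, h))     # feed-forward XOR
    return h

digest = davies_meyer([b"\x00" * 32, b"\x01" * 32], b"\x00" * 16)
assert len(digest) == 16
```

Note that an attacker who controls the message blocks controls the cipher key completely, so any related-key or related-subkey weakness in the cipher is directly exposed in this mode.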
And if you do not follow this transformation [inaudible] -- this is a standard
transformation taking a good block cipher and making it into a good compression function.
And this transformation allows for related-key, related-subkey, related-whatever
capabilities you want.
And AES-256, for example, due to these results, cannot be transformed into a secure
compression function, meaning that there is a problem with AES-256. Okay.
>> AES-256 reduced rounds or was it full AES-256?
>> Orr Dunkelman: So what Dmitry I guess is going to talk about it tomorrow is the 2 to
the 99.5 attack is for the full AES. We discuss here only the ten rounds just because,
well, we try to make the attack practical. If you lose the practicality assumption, then
you can do more.
Okay. So let's go over our results. So trying to -- without entering into too many dirty
details, if you take this key difference, now, this is gray, gray, gray, gray, pink, pink.
Again, I'm terribly sorry if any of you is color blind. Gray means that all these four bytes
have the same difference as these four bytes, which is the same as these four, as these
four, and these two pinks are the same.
So if you put this into the key schedule algorithm of AES-256, you get gray, gray, gray,
gray, two pinks.
When you generate this column, you apply the S-box on the shifted or rotated column here.
White, by the way, means zero difference. So no difference in the input, no difference in the
output of the S-boxes. And the [inaudible] so we get gray, and then this column is the
XOR of this column and this column, so when you XOR two things with the same
difference, you get something without a difference, et cetera, et cetera, et cetera, until you
arrive at this round.
These are the same bytes, so this byte is equal to this byte. These are equal. This is
something new. This red thing is something new. And then you have here blue, blue
with green where green is the XOR of blue with pink.
Yeah. This is not an RGB scheme. This is just a nice way to -- you know, otherwise it
will -- it will have alpha zero, alpha one running around here, you will have alpha, beta --
I think we tried to do it with letters. And then at some point we started to look up in the
dictionary for more Greek letters. And at some point we said, okay, colors are okay.
So what do we do with it? And this is the related-key differential that we build with it.
So if you start with an input difference all gray and you XOR gray key difference, you
have no difference, no difference, no difference, two pinks, pinks, these pinks become
blue. I'm terribly sorry. Which after the MixColumn transformation becomes gray. And
then it's canceled.
Now, let's -- trust me, it works. But the interesting part here is the fact that you can have
one round with no active S-boxes. This happens with probability 1. If we had only
differentials or differential characteristics, it would be 2 to the minus 6. Two rounds with
only two active S-boxes rather than five. Three rounds with one, two, and that's it. Two
active S-boxes rather than nine.
And in total this entire path has probability of 2 to the minus 56 for eight rounds. And
you can count the number of active S-boxes, this one, two, three, four, five, six, seven,
eight, nine, for eight rounds.
The wide trail strategy ensures that you will have at least 50 when you do only standard
differential attacks.
Now, in our attack we -- or actually in some of the attacks -- we don't care what happens in
these 12 bytes. This is where this helps us a bit with the probabilities. This increases the
probability to 2 to the minus 36.
Now, these probabilities -- and actually this is something which we always hide, or assume:
that all the transitions are independent, so that the probability of the entire transition is the
product of the probabilities of all the transitions.
Now, usually you cannot verify it, but when you have a practical attack you can. So we
looked at the seven-round differential characteristic, because the eight-round one is a bit
harder to verify. So we look at the first seven rounds, stopping everything here. It
doesn't matter whether you stop it here or here; this transition has no difference and no
probability associated with it. And the differential characteristic is
expected to have probability 2 to the minus 30.
So we took each time 2 to the 30 random plaintext pairs which adhere to the input
difference [inaudible] and checked what happens in the output difference. Now, in
this scenario, because this is a random process and everything, you do not always get
four. Sometimes you get more, sometimes you get less -- the number of right pairs, pairs
which satisfy the characteristic. This entire characteristic behaves like a Poisson distribution with mean value of four.
Trust me, it was [inaudible] value of four. I'm not going to cover this.
So you can see that in theory, [inaudible] if we run 100 experiments, we should have
expected 1.8 tests with no right pairs. We had zero. One right pair is expected in 7.3 of
the tests, but we had 10 of these. And you can see that it's relatively okay. I mean, this is
not exactly [inaudible] four, but if it's [inaudible] 3.5 or 4.5, we're not going to fight over
it.
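The quoted expectations are consistent with a Poisson distribution of mean 4: over 100 experiments, Poisson(4) predicts about 1.8 runs with zero right pairs and about 7.3 with exactly one. A quick check (my own arithmetic, not from the talk):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    # P[X = k] for X ~ Poisson(lam)
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected number of experiments (out of 100) with k right pairs, mean 4.
expected = {k: 100 * poisson_pmf(k, 4.0) for k in range(3)}
assert abs(expected[0] - 1.8) < 0.1   # ~1.83 experiments with 0 right pairs
assert abs(expected[1] - 7.3) < 0.1   # ~7.33 experiments with 1 right pair
```

So the 1.8 and 7.3 figures in the talk are exactly the Poisson(4) predictions scaled to 100 trials.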
But we can't. It's not the point. Yes.
>> What's [inaudible] square [inaudible]?
>> Orr Dunkelman: I think that in this case you shouldn't see a problem. It should be the same. But, you
know, they say "should" is the name of a fish. In Hebrew, the word for should,
amoule [phonetic], is the name of a fish. It's a type of carp fish, so...
We haven't done this experiment. Actually, I will run it afterwards. So in the
related-subkey model, what you do is you shift all the differences one round. Now,
usually when you shift differences one round, it has no effect. But when you have
related keys thrown in, then the key schedule behaves slightly differently.
So that's why we needed the related-subkey assumption, so we push everything one
round forward or two rounds forward depending on the exact attack, and then let the
related-subkey assumption take care of the issues that form because of the slightly
different locations.
But all in all, the attack is the same. I mean, if I show this related-key differential to a
cryptanalyst, he would say, okay, take 2 to the 36 pairs, for example, we start with 2 to
the 36 pairs, you expect to get one right pair with this difference. When you find this pair
you know that something's happened, and you can use this to find the key.
So starting from here, it's a very standard transformation. I mean, I'm omitting some
stuff, but it's not important.
Now, we also looked at another attack scenario where the plaintexts are not generated at
random. Usually chosen plaintext assumes that the adversary chooses a
plaintext at random and then the plaintext XOR something, because we need to get
randomness from somewhere. This is just the reason for picking plaintexts at random.
Now, we decided to look at counter mode. So counter mode is a way to transform a
block cipher into a stream cipher. You initialize a counter with some IV.
You encrypt it under the key and then you XOR the outcome with the plaintext
you're trying to encrypt. You increment the counter and you do it again and again and
again.
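The counter mode just described can be sketched as follows. The block cipher here is a hypothetical stand-in; the structure (encrypt IV, IV+1, IV+2, ... and XOR the keystream into the plaintext) is the standard CTR construction.

```python
import hashlib

def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for a 128-bit block cipher.
    return hashlib.sha256(key + block).digest()[:16]

def ctr_encrypt(key: bytes, iv: int, plaintext: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(plaintext), 16):
        # Keystream block: encryption of the current counter value.
        keystream = toy_block_cipher(key, (iv + i // 16).to_bytes(16, "big"))
        chunk = plaintext[i:i + 16]
        out += bytes(a ^ b for a, b in zip(chunk, keystream))
    return bytes(out)

key, iv = b"k" * 32, 42
msg = b"counter mode turns a block cipher into a stream cipher"
ct = ctr_encrypt(key, iv, msg)
assert ctr_encrypt(key, iv, ct) == msg   # decryption is the same operation
```

Note the plaintext inputs to the cipher are fully deterministic (IV, IV+1, ...), which is exactly the nonrandom setting the experiment below tests.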
So we said let's assume we have such a system. The plaintexts are not generated --
actually, the attack goes not after the plaintext but after the stream that
was generated, in a very nonrandom, very deterministic way. Does the attack still
work? And actually we did the experiment and it works. The same distribution, the
same -- everything is the same.
So if you have a counter mode embedded in some system, you let it run. Now, we take it,
we reinitialize. We have to change the key. This is a related-key attack. We have to
change the IV at some point -- well, and change the counter and let it run as well. And if
you do it correctly, the attack works. We actually verified it.
Now, we also did some attacks when we say, okay, let's assume that we use ECB -- that's
a bad way, but -- and we restrict ourselves to cases where all the plaintexts are ASCII
characters. And that actually works. The differential still holds. The probabilities work.
We also discussed the case where actually the ASCII characters are numeric values. So
we have a database encrypting only numbers. And the differentials, because they are --
well, they are -- still work. So if you're going to have some security engineer saying, I want
to use AES-128 because it's really, really fast, and I want to make it more secure by
extending the key schedule algorithm from AES-128 to AES-256 -- a 256-bit key is better
than a 128-bit one, and the speed of AES-128 is still better.
So if you take AES-128 and you fortify it, you actually completely crash the security in
the related-key model. I have to be honest. It's in the related-key model.
Now, the minimal Hamming weight of the key difference is 24. Now, this is important if
you're hardware people and you take the device and you start hitting it until the key flips
in the right positions.
So the minimal Hamming weight of the key difference [inaudible] is 24. But we didn't
try to optimize this a bit too much, so I guess that you can actually improve this a bit.
But it's not a real issue, I think.
Okay. So this is a summary of the attacks. You can see whether it's a key difference or a
subkey difference or related-key or related-subkey. If it's a distinguisher, or you can
retrieve some bits of the key or the full key. For example, the nine rounds related-key
attack, related-key attack takes 2 to the 39 time, 2 to the 39 data, and 2 to the 32 memory.
Well -- yeah.
>> So going back to the Davies-Meyer application, so I see you get like in seven key bits.
Have you thought about what that translates into Davies-Meyer application for getting
collisions?
>> Orr Dunkelman: Yeah. You can actually -- there is a set of differentials which start
with zero difference and end with zero difference.
>> [inaudible]
>> Orr Dunkelman: Yeah.
>> [inaudible]
>> Orr Dunkelman: Yeah. But you still need to do some -- so what happens is that, for
example, if you skip this round, you start here. So with zero difference here. And you
stop here, for example, you have zero difference.
Now, I'm cheating a bit, because there are more rounds, but then this differential is
optimized for probability, not for finding collisions. Dmitry did the work on finding
collisions. He found collisions. I think in the CRYPTO paper -- in the ASIACRYPT
paper -- there is a collision for 13-round AES-256 in Davies-Meyer mode.
>> Is that [inaudible]?
>> Orr Dunkelman: I think both are on the IACR ePrint.
So you can do really fun stuff when you allow yourself crazy models.
Okay. So can I take a bit more of your time?
>> Yeah, please.
>> Orr Dunkelman: Okay. Great.
So what are the security implications, because at the end you will have to answer
questions from users like does this affect my system.
So, for example, if you extend AES-128 to 256-bit key, actually you lose security. And
the real issue I think is the fact that the security margins of AES-256 are much smaller
than what we expected. AES-256, AES-128, AES-192 were all designed to be secure
against related-key attacks.
Unfortunately, the two latter, the 192 and 256, are not secure enough.
Now, if you ask me if this is going to change anything: the probability of NIST
modifying the AES specification is, well, like the probability that tomorrow I will visit
the moon. And the reason for that is that AES was actually widely deployed. Unlike the
SHA-1, MD5, SHA-2 situation, where people even now are trying not to migrate to something new,
AES was very quickly adopted, and there are many products using AES, there are
low-end devices using AES. AES succeeded so well that if somebody's going to try to
transform AES today into something else, it won't happen.
Now, you can change, for example, the key schedule algorithm.
>> Yeah, that would be the easiest thing to do, right, just change the key schedule?
>> Orr Dunkelman: Yes. But the problem is that when you make such changes, first of all,
you don't know what new problems you introduce. But the real issue is that you cannot
change it because there are so many devices around the world, and I'm not even talking
about the software implementations. There are so many hardware implementations that
you cannot change them. That's it. We're going to be with AES for a very long time.
And it is very unlikely that they'll start any AES-2 competition, saying, okay, AES was not
good, we have to start again. This won't happen. Because currently they have the
SHA-3 competition. And my feeling is that they would rather roll back time and not
have this competition rather than --
>> You think they'd rather not have the SHA competition?
>> Orr Dunkelman: I think it causes them more headache than they expected. They got
51 submissions, proper and complete submissions. And now they have to fight with
everybody, and nobody likes them because his hash function was not selected, and even I
have issues with them because I had two submissions and one of them didn't pass to the
second round. I mean, it was a good one. I'm now discussing the one that didn't -- that
did pass. Of course it's great. But the one that didn't pass, it should have.
>>Is the one that passed the one with the AES [inaudible]?
>> Orr Dunkelman: Yes. Of course, the one based on the AES round -- and now
everybody is a bit -- so now there is a question of how this will affect the NIST
competition. If you're AES-based, do you have greater chances or lower chances?
Now, of course nobody used the key schedule algorithm of AES. AES-based means that
we took the AES round, we stuck it in several locations and said, okay, we use it as a
nonlinear operation, and one of the reasons for that is reuse of
hardware and software.
The fact is that Intel, and soon after AMD, is going to introduce the AES instruction, so you
can do one AES round in six cycles. This is the most diffusion and confusion per cycle that
you can think of. Yes.
>> So going back to key schedule, so you found this brilliant, you know, set of
differences in probability 1. Could you share how you found those and if there's more
lurking? Is this optimal?
>> Orr Dunkelman: So I was looking at the key schedule algorithm of AES-256 since
2005, 2004, actually. So we had some paper about related-key -- related-key differential
boomerang -- never mind. Some sort of [inaudible]. And we were optimizing to have
very good differential characteristics, and we assumed that in order to have that you have
to have minimal number of active S-boxes in the state.
So we did everything we could to make sure that the number of active S-boxes locally
was optimal, and we missed the big picture. And then Dmitry came with his papers about
the full AES, 256.
And then we saw that there is a different way to deal with that, which pays more in the
number of active S-boxes locally, but globally you gain a lot.
So right now I think that he has an automated tool for finding the best differential
characteristics, related-key differential characteristics, of up to X rounds. But we did this
by hand, so, yeah, a very old-fashioned sort of approach. But the thing is that I don't
believe we tried to look -- I know that Dmitry ran his tool, so I know that [inaudible] that
the tool can find.
But we needed some -- you know, a change in the state of mind trying to optimize
globally rather than locally. And there was a change in the state of mind. And when you
try to actually -- we tried other things. We actually tried other things, having pink 1 and
pink 2, having -- it doesn't work.
But I'm not going to commit to the fact that this is the best thing you can find, because I'm
sure, 99.5 percent sure, that if you do something else -- you slightly change the model, you
push the related-subkey model even further, you say I take weak-key classes, so classes of
weak keys -- under the related-subkey model, you can find something which is even stronger.
I'm sure it's lurking around somewhere, but we didn't do it that far because I think that it
would take time for the community to understand that the related-subkey model is
legitimate.
Personally, I'm still not 100 percent confident it's legitimate. It's legitimate in
some senses; it's still weird in others. Okay.
Okay. So let's -- so did we break the full AES with practical complexity? The answer is of
course no. And should users be worried? I think that users should be worried, but I
would advise against panicking. I mean, don't phase out AES-256 yet. But if you're
already running over the code and making sure that you can change the hash function
from MD5 to something else, think about putting the hooks to change AES to something
else in the distant future.
That's it. And the paper is on ePrint.
[applause]
>> So I asked this question a lot a few months ago, and I got a consistent answer. I'm
wondering if it's still consistent now. If you have your choice in practice AES-128 or
256 --
>> Orr Dunkelman: 256.
>> Okay.
>> Orr Dunkelman: Any day. But this is as long as we're not discussing compression
functions. This is only if we discuss encryption: counter mode, or Elephant, or a real
encryption mode. If you're going to use crazy [inaudible] -- if you have tweakable block ciphers,
and actually some of the tweakable block ciphers assume that you can put in a public
tweak which changes the key schedule algorithm, then they just introduced related-key
attacks with [inaudible]. Usually when you start a paper with related-key attacks, you
have to give examples saying related-key attacks are very important because of the S-box,
because of whatever.
Now, if you have tweakable block ciphers where people are allowed to change the key
schedule from the distance with the public parameter, the related-key model seems a bit
more practical.
So as long as you're discussing classic mode of operation, no tweakable block ciphers,
nothing else, AES-256. Compression functions, tweakable block ciphers, anything else,
AES-128.
>>Kristin Lauter: Okay. So Orr is actually here all week until Friday. And if anyone
would like to go to lunch now with Orr and continue the discussion, please let me know.
Let's thank Orr again.
[applause]