>> Kristin Lauter: Okay. So today we're very pleased to have Orr Dunkelman visiting us. He's going to speak to us on key recovery attacks of practical complexity on AES variants. Orr is a postdoc at the Weizmann Institute and has been a postdoc at the Ecole Normale Superieure and at Leuven, something like that. So thank you for coming. >> Orr Dunkelman: Thank you for having me. So this is a joint work with Alex Biryukov, Nathan Keller, Dmitry Khovratovich, and Adi Shamir. And I'm going to discuss key recovery attacks of practical complexity on AES variants. Now, AES is the Advanced Encryption Standard. So just to get a rough estimation, how many of you know what the AES is [inaudible]? Great. Okay. So we can skip some of the things. It's always a problem when you go to a place and you don't know who you're going to meet there. So I will go very briefly over the concepts of the AES, why we thought it was secure, and then a bit about attacks on the AES. Now, cryptanalysis in recent years, especially symmetric-key cryptanalysis, went towards a very theoretical approach. Now, when I say theoretical approach, I don't mean P versus NP sort of theoretical approach; I mean towards 2 to the 400 time complexity approach. Now, that prevents us from discussing things with practical people, who say that 2 to the 400 is a bit too much, and with theoretical people, who complain that order of 1 is order of 1 and nobody cares about it. So we're stuck somewhere in the middle. And we're trying actually to [inaudible] this problem of a very huge gap between what is considered practical and what is considered theoretical, because whenever I say theoretical in this talk, I mean practical theoretical, 2 to the 400. And when I say practical, I mean something which is practical, you can do it on your nearest -- well, either grids or even your PC. So let's start with a very quick reminder of what a block cipher is. So this is one of the most basic cryptographic algorithms. There was the Data Encryption Standard, which was standardized in 1975 or -7, depending how you count it. Somewhere in between. So there is a symmetric key which is shared between the two sides. And there is a transformation. Actually, a block cipher is a permutation, a keyed permutation that takes blocks of N bits and usually generates an output of the same size. This is sort of a bijection, not necessarily into the same space, because some block ciphers have additional functionality. But today most of the time we deal with block ciphers which take an N-bit block and return an N-bit block. So if you want to treat it formally, there is an N-bit plaintext, a K-bit key, and it's just a transformation. Now, actually when you encrypt data you have more than that. So you have some mode of operation, a way to wrap around this block cipher, which can treat only a specific amount of data at each invocation. So there is a system behind the block cipher. And we have to remember that, that it's not standing there out of the blue. I mean, it's not like there is this black box, and that's whatever you want. You have to somehow communicate with it and make it give you the right inputs or the right functionality that you're looking for. Okay. So this is a very quick crash course in differential cryptanalysis. Differential cryptanalysis is a method to attack or cryptanalyze block ciphers. It was introduced around 1990 by Biham and Shamir. And its basic idea is that instead of looking at a specific value and trying to determine the key by looking at its decryption, you look at pairs. So you take two encryptions, P and P star, and then you look at the differences and how things behave. Now, the reason this is very good is because the differences have the tendency to propagate very nicely. For example, if you have a linear operation, the difference of the outputs is the linear transformation applied to the difference of the inputs. Now, this is really great because when I say linear, in cryptography sometimes we mix linear and affine. It also works for affine transformations, but the nice thing is that when you XOR a key, the difference before the XOR and after the XOR is the same. So the key disappears from the encryption algorithm. Of course other things become a bit harder. For example, nonlinear operations actually get very annoying when you try to analyze them. Because when you have a nonlinear operation and you know the input difference, you do not necessarily know the output difference, besides the standard case where the input difference is zero. Assuming that your function is deterministic, which it is most of the time, unless you're using a first-generation 60 megahertz Pentium, when you put in the same input you get the same output.
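To make those propagation rules concrete, here is a minimal Python sketch (toy values of my choosing, not from the talk): an XOR difference is unchanged by key addition, and a GF(2)-linear map sends an input difference to a predictable output difference.

```python
# Toy demonstration of how XOR differences propagate.

def add_key(x, k):
    # Key addition: XOR the (sub)key into the state.
    return x ^ k

def linear(x):
    # A toy GF(2)-linear map on 8-bit values: x -> x XOR (x << 1), truncated.
    return (x ^ (x << 1)) & 0xFF

p, p_star, k = 0x53, 0x3A, 0xC7   # hypothetical plaintext pair and key
diff = p ^ p_star                 # the input difference

# The difference before and after key addition is the same: the key vanishes.
assert add_key(p, k) ^ add_key(p_star, k) == diff

# Through a linear map, the output difference is the map of the input
# difference, independent of the actual values p and p_star.
assert linear(p) ^ linear(p_star) == linear(diff)
```

This is exactly why only the nonlinear parts, the S-boxes, contribute uncertainty to a differential characteristic.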
Okay. So a quick introduction to the AES. So as I said before -- sorry, as you probably know, it was designed by Vincent Rijmen and Joan Daemen. Under the name Rijndael it was submitted to the NIST competition in 1998. There was a very long process of selecting the AES. Many people tried to break it. And at the end it was selected. It has an SP network structure; I will show in a second. Its block size is 128 bits; key size, 128, 192, or 256. Now, most modern symmetric-key ciphers operate in the concept of rounds. So we have some basic transformation, and then you repeat it. You iterate it again and again. In the case of AES, it's either 10, 12, or 14 rounds depending on the key size. Questions so far? Great. So the input, this is the state. You take the 16 bytes of the state and you put them in a 4-by-4 matrix. Each round is composed of four simple operations. The first one is the SubBytes operation, which is a nonlinear transformation; a table lookup, if you like. A ShiftRows operation, which takes each row and rotates it to the left by i bytes. So the first row is unrotated, the second one is rotated by one byte to the left, then two bytes, then three bytes to the left. Then the MixColumns operation, which takes each column and multiplies it by an MDS matrix. I will show in a second. And then the AddRoundKey operation which adds the subkey. Now, usually, as I told you before, there is a key which is used to generate many subkeys. The reason is that if we have a key of let's say 128 bits or 256 bits, then you have to generate enough keying material, subkey material, to affect the entire encryption. So there is also a subkey generation algorithm or key schedule algorithm which transforms this 256-bit key into 15 subkeys of 128 bits each. So here's the MixColumns operation. You take four bytes and you multiply them by this very nice matrix. It's an MDS, maximal distance -- >> Separable? >> Orr Dunkelman: Separable. Thank you very much. I keep on calling it maximal distance something. The S is very useful. >> [inaudible] >> Orr Dunkelman: So if you have a change -- if you change one byte in the input to the MDS multiplication, you will affect at least four bytes, meaning all bytes of the output. If you change two, you affect at least three. If you change three, you affect at least two. And if you change four, well, you may affect only one, but it's more probable that you affect everything. So this is a very nice thing, because modern ciphers are built around the concepts of confusion, which is usually the S-boxes, the nonlinear parts, and diffusion, which takes this bit of nonlinearity and spreads it all around. Now, just to give you the field, if you're interested, it's 11B, x to the 8th, et cetera. It doesn't really matter. I'm not going to discuss this anywhere in the talk; it's just for completeness. The S-box, SubBytes: you, first of all, invert the input over the same field, where we define zero to be its own inverse. Then you multiply by this nice matrix over GF(2) and add this nice vector. Now, this is a very nice S-box because it has very, very nice differential properties. Specifically, you get something which is as uniform as you can get for differences -- for the differentials -- okay, sorry. The difference distribution table is a prediction saying, given some input difference, what is going to be the output difference. And there are probabilities associated with it, because when you look at differences, you lose some determinism, in the sense that the difference 1 can be between the pair of values zero and one, but it can also be between the pair two and three, or -- sorry, between four and five. So when you have something like that, you don't actually know what the pair was. And this is actually how the key affects differential cryptanalysis, because it actually selects which of the pairs with input difference 1 entered. But there are 128 such pairs. And the difference distribution table is a sort of prediction, just a probability distribution. If you have an input difference 1, the output difference is going to be 3 with probability either 0, 2 to the minus 7, or 2 to the minus 6. This is the case for the S-box that is written here. And specifically this is the best that you can get with an 8-bit to 8-bit S-box. There is one entry of 2 to the minus 6, and 126 entries with probability 2 to the minus 7. And all the rest are 0. So it's a very nice S-box.
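Since exact figures are being quoted, here is a short Python sketch (the standard textbook construction of the S-box, not code from the talk or the paper) that rebuilds the AES S-box from GF(2^8) inversion plus the affine map, then computes its difference distribution table and verifies the claim: every nonzero input difference gives one output difference with probability 2 to the minus 6 and 126 with probability 2 to the minus 7.

```python
# Rebuild the AES S-box over GF(2^8) mod x^8 + x^4 + x^3 + x + 1 (0x11B)
# and verify the quoted difference distribution table (DDT) figures.

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    # a^254 by square-and-multiply; 0 maps to 0 by the AES convention.
    r = a
    for _ in range(6):
        r = gf_mul(gf_mul(r, r), a)   # exponent: 1 -> 3 -> 7 -> ... -> 127
    return gf_mul(r, r)               # 127 * 2 = 254

def sbox(x):
    y = gf_inv(x)
    res = 0x63                        # the affine constant
    for i in range(8):
        bit = (y >> i) ^ (y >> ((i + 4) % 8)) ^ (y >> ((i + 5) % 8)) \
            ^ (y >> ((i + 6) % 8)) ^ (y >> ((i + 7) % 8))
        res ^= (bit & 1) << i
    return res

S = [sbox(x) for x in range(256)]
assert S[0x00] == 0x63 and S[0x01] == 0x7C   # matches the published table

ddt = [[0] * 256 for _ in range(256)]
for x in range(256):
    for dx in range(256):
        ddt[dx][S[x] ^ S[x ^ dx]] += 1

for dx in range(1, 256):
    row = sorted(ddt[dx], reverse=True)
    # One output difference hit by 4 pairs (prob. 2^-6), 126 hit by 2 (2^-7):
    assert row[:127] == [4] + [2] * 126 and sum(row) == 256
```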
Okay. The key schedule algorithm: you first of all put the key in a 4-by-8 matrix. Now, you work on words, columns of 32 bits or 4 bytes. So for the first column you take these two things; first you rotate this column a bit, then you put it through the S-box and generate this new byte. The last byte is also affected by some round constants, to prevent this structure from being too self-similar. We have attacks that are based on such things. So this is how it works. The next column is generated by a simple XOR of two columns. A simple XOR. And then, again, we have an S-box layer, this time without rotation, and an XOR, and then another three rounds of simple things. Now, the reason for making this thing only use the S-box every four words is efficiency. No other reason behind it besides that. Okay. So as I said before, these S-boxes are really, really nice. They have very good differential properties. The MixColumns operation, because it's an MDS matrix, makes sure that you have diffusion from one byte to the entire column. And when you take into consideration the fact that there is a ShiftRows operation, if you change even a single bit of the plaintext, after two rounds you may affect all the bits of the internal state. This is a very strong property. It's actually a very good property. It was designed using some sort of thing called the wide trail strategy, which assures a lower bound on the number of active S-boxes -- okay. An active S-box is an S-box with nonzero input difference. When there is a zero input difference, there is a zero output difference with probability 1. When you have a nonzero input difference, you have some distribution on the output difference. So an active S-box is an S-box with a probability associated with its transition. So we would prefer to have as few active S-boxes as possible. And specifically in the case of AES, you can show that if you have two rounds, the minimal number of active S-boxes is five; for three rounds it's nine; and for four rounds it's 25. Now, the probability of each transition is at most 2 to the minus 6. Assuming independence between the transitions, which is not necessarily the case, the probability of any four-round differential characteristic -- a prediction saying this input difference becomes this difference after one round, this difference after two rounds, and so on -- is at most 2 to the minus 150. Now, the state is -- sorry, it's 128 bits, meaning that you have 2 to the 127 pairs. If you take 2 to the 127 pairs, and each of them satisfies the differential characteristic with probability 2 to the minus 150, the expected number of pairs satisfying all the differential transitions is going to be less than 1. Meaning the probability of finding a pair which satisfies this differential characteristic is negligible.
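Spelled out, that counting argument is just this arithmetic (25 active S-boxes over four rounds, at most 2 to the minus 6 each):

```latex
\Pr[\text{four-round characteristic}] \le (2^{-6})^{25} = 2^{-150},
\qquad
\mathbb{E}[\#\text{right pairs}] \le 2^{127} \cdot 2^{-150} = 2^{-23} \ll 1
```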
Okay. There are some other four-round assurances. For example, after four rounds there are no more impossible differentials for impossible differential cryptanalysis, and there are no square properties over more than four rounds. Five rounds is the -- now, the thing is that I'm saying -- well, it was believed that AES is secure because there are no good differential characteristics. You start with some input difference, and for any set of transitions, the probability of finding something good is very small. Now, the thing is that actually in differential cryptanalysis we do not usually care about the exact paths, what the transition differences were. We care only about the input and the output differences. And this is mostly because of the way the attacks work. You take many pairs and you find the pairs which satisfy some input and output difference. You don't go after all the things in the middle. So there is a concept called a differential, which is actually asking what is the probability that a given input difference will become some output difference, independent of the transitions in the middle. So it might be the case that each path has a very low probability, but the total probability of an input/output difference pair is much higher. So there is a series of papers that actually show that four-round AES has no differentials, meaning there is no input difference/output difference pair which holds with probability higher than 2 to the minus 110. Meaning that you still need huge amounts of data just to break four rounds of AES. And that's why differential cryptanalysis will not work on AES, and the same goes for linear cryptanalysis, which is another flavor of cryptanalysis based on approximating the encryption by using some sort of a set of equations with some error margins. Okay. So hopefully by now you are convinced that AES is secure. Right? AES is secure.
I mean, NIST had to pick a good cipher. And they looked at the five finalists, and they said, well, we don't like Bruce -- sorry. I'm kidding. >> Serpent is too slow. >> Orr Dunkelman: Serpent is too slow. MARS is too complicated. If we pick Twofish, the hardware people are going to kill us. RC6, well, there is some multiplication, the hardware people will not really like it. Rivest has too much money anyway, so we don't have to give him any more honor. Yeah, let's pick Rijndael. Now, they had to do so very -- I mean, well, they had to do it very delicately, and they had to make sure that it's secure. And people tried to attack AES, and, you know, with the wide trail strategy, people had a very good feeling about the security of AES. So it's actually secure. Sort of. We'll see in a second. So -- how did we get here? Sorry. Okay. So here's a bit of an overview of what happened in cryptanalysis over the years. Now, in the early '80s, when cryptanalysis had just become an academic research field, people looked at ciphers and said, here is a cipher. They twiddled bit No. 5 and saw what happens in the ciphertext. They would do experiments like this. They would take an encryption function, they flip a bit, and they wait to see how many times the output bit -- let's say how many times, when you flip bit 5, bit 6 of the ciphertext is flipped. Then you had a very nice set of criteria, like the Avalanche Criterion, the Strict Avalanche Criterion, all these things which were actually statistical properties which nobody really knew why they're there, but if you [inaudible] to find one of these, you would break the scheme. Now, they were using statistical tests, so they took a small dataset, because this was before the time that most people had access to computers, and actually exporting code that does cryptography was a bit hard. So you had to write everything on your own on your VAX, whatever you had in your university. >> 780. >> Orr Dunkelman: Well, depending on your university. You were lucky if you had a 780. And the poor souls who had to work with punch cards at the beginning -- well, at the beginning of [inaudible] it was a bit out of place. But, anyway, so people actually went and tried it and measured and said, okay, 60 percent of the time bit No. 1 is flipped, so there is some statistical impurity. And people actually verified these results and people worked very hard. At some point along the way, we started to understand how things work. So around the mid-'80s, people started to say: there is an attack, I didn't verify it explicitly, but I can tell you, I can prove to you that it works. Let's say meet-in-the-middle attacks. We knew -- we understood the concept of these attacks. We knew how they work. So instead of telling you, you know, I actually went and did the meet-in-the-middle attack and computed 2 to the 50 steps, there's a very nice paper by Chaum and Evertse from CRYPTO '85 finding meet-in-the-middle properties, meet-in-the-middle attacks on reduced-round DES. The last attack is on 7-round DES, rounds 2 to 8, with complexity 2 to the 52. I can assure you, I can guarantee 100 percent, that they didn't actually verify this. But we all know that the attack works. And why do we know that it works? Well, it's very simple. We understand the theory behind how the attack works. We understand the property. We came to an understanding of how things work, how they affect the security.
So we don't actually have to do the experiments; you just have to present the algorithm and convince people that the theory behind it works. Now, at that point in time people had started to assume that 1,000 known plaintexts is okay. As an adversary, obtaining 1,000 known plaintexts is a legitimate requirement. And the time complexity had to be much lower than exhaustive search. This was the breaking point: your cipher is broken if, by taking 1,000 known plaintexts, I can run something which is faster than exhaustive search. And then came 1991, when differential cryptanalysis was introduced. Now, besides the fact that this was the first attack on the Data Encryption Standard which was faster than exhaustive search in some sense, it broke one barrier that we had. First of all, chosen plaintexts are okay. I mean, before that people said, okay, if you have one, two, three chosen plaintexts, it's okay. But this required a real amount of data complexity, 2 to the 47 data. And I think that when Don Coppersmith had to sort of defend the Data Encryption Standard, he said, well, doing 2 to the 56 encryptions in your garage is much faster than obtaining 2 to the 47 data in the chosen plaintext manner. So people would still prefer to use exhaustive search over differential cryptanalysis to break ciphers, but suddenly in the academic circles it's okay to require 2 to the 47 chosen plaintexts. Okay. In '92, '93, the related-key attacks came into play, which is even crazier. I mean, let's look at known plaintext. I have some standards. I'm encrypting data. I encrypt the header according to some standard, so I know what is encrypted and I know the ciphertext. Then came chosen plaintext. Well, Joan, please encrypt this data so I can break your scheme. Now came the related-key approach, which says: please encrypt this data under this key that I don't know; now flip bit 5 of the key -- don't tell me what the bits were -- and encrypt this data. Now, this sounds a bit crazy. How dare we tell the person that we're going to attack: listen, it's very nice of you that you encrypted 2 to the 47 data for me, I just need you to flip these bits of the key, or rotate the key a bit, and encrypt another 2 to the 47 data. Wow. We became very, very cheeky people. Yeah. >> [inaudible] >> Orr Dunkelman: Yes. I will discuss this a bit later. But there are natural relations. And specifically the paper from '93 that's -- there were two papers actually that introduced related-key attacks, the one by Knudsen from '92, and the independent paper by Biham from '93. And they actually show a related-key attack with rotation relations which uses oracle access to encryption under only one key. Even though it's a related-key attack. It was a sort of a defining [inaudible] properties based on rotation relations. But still, it's a related-key attack. It uses rotations. But it accesses the oracle with only one key. The field becomes crazier as time goes by. Now, if you're used to, let's say, public-key cryptosystems, I mean, yeah, known data, chosen data, who cares. Related-key attacks, okay, so one out of five times that you generate the keys you will have issues. Well, depending on the probabilities and all that stuff, but even related-key is something that you can live with in public-key cryptosystems. And then came the '97 competition, the AES competition, where people started to be very paranoid. One strike and you're out. There is some statistical impurity with probability 2 to the minus whatever --
2 to the minus 110 -- you're out of the game. Now, this is a very strict approach, and it has very good reasons, which I will discuss in a second. And then came '99, and we started to have adaptive chosen plaintext and ciphertext attacks on symmetric-key cryptosystems. Again, the models get crazier and crazier. Now, if you're from public-key cryptosystems, who cares. I have this -- I mean, if you discuss CCA2 security, this is the main problem of public-key cryptosystems, because, well, the adversary can generate adaptive queries. In block ciphers it sounds a bit crazy. So as you can see, slowly we went from a field of people doing practical stuff to a field of people who do the following crazy thing. And this is a quote from one of my papers. Don't take this CV too seriously. So, time complexity of a related-key attack: the total time complexity of step 2B, which was the dominating part of the attack, is 2 to the 423 SHACAL-1 encryptions. Yeah, yeah, yeah. The data complexity, by the way, I didn't write it down here, but it's about 2 to the 160 related-key chosen plaintexts, and there are four related keys. And as you can see, 2 to the 420 is a very practical thing. Okay. So I'm cheating, right? The thing is, the key size of SHACAL-1 is 512 bits. Meaning that as an adversary, if I'm limiting your time complexity to only 2 to the 423, you will use my attack. You will not use exhaustive search because it will take you a bit more time. Of course, you will not use my attack either, because the data complexity is a bit high, and just the time required for transmitting all the data -- but this is something else. This is one of the good things about academia. You can do whatever you want and... So most of the cryptanalytic papers today actually discuss what we call certificational attacks. So the data complexity is just slightly less than the entire code book. And actually I have a paper which uses more than the entire code book, FSE 2007 or 2008, I think. A very nice paper, by the way. Seriously. The concepts are nice. The fact that you need more data than there is -- actually we used several keys there, so it's okay. The time complexity, just slightly less than exhaustive search. And memory, well, nobody cares about memory complexity too much. I mean, you just need to store the data. Of course the memory complexity has to be lower than the time complexity because you have to initialize the memory. And nobody really cares if it's a fast memory like RAM or, let's say, hard disk or something. I once had a discussion with some cryptanalyst who suggested the following model to store huge amounts of data: you send it in a beam to a repeater in some other galaxy which sends it back, making sure it arrives just in time when you need it. Then you don't have to store anything. Okay. So why are these things actually still published? I mean, okay, I ridicule my field; don't read cryptanalysis papers, they are very boring, very unrealistic. So one reason is: why would you use a primitive which is not optimal? There are more optimal solutions, so pick the ones that do not suffer from these kinds of attacks. So if you have to pick between Rijndael and Serpent, Serpent has more security margins -- more safety margins, sorry -- okay, so pick something which is more secure. Another thing is that if you published only papers which have practical attacks, no paper would be published. Well, almost no paper would be published. And the thing is that attacks only get better.
So if I don't give you this hint about where there is a problem, you may have a new idea that I didn't think about that can help, and somehow it matches the property that I found -- attacks only get better. And you can see it in attacks, for example, on SHACAL-1. Now the best-known attack on SHACAL-1 in these crazy settings has time complexity of about 2 to the 300. Which is still a huge amount, but it's much better than 2 to the 400. So we get better as time goes by. But actually it doesn't solve our real problems, our core problems, which are answering questions from users of cryptography: does this attack affect my system? Okay. I mean, at the end, there is a buffer between academic research and everyday life. And in some fields the buffer is very large, in some fields it's very small. In the case of cryptography, there are security engineers and then there is the public, and the security engineers have to answer these questions, and they have no idea. Should I still use -- yeah. >> I think the [inaudible] question is: does the next attack affect my system, the one that hasn't been published yet. Or that won't be published because the guys in Ukraine are busy making money. >> Orr Dunkelman: That's true. But the problem is that my crystal ball is not really good today. There are a few clouds. Sorry. I'm a bit jet lagged. All the English that I know went. So I'm terribly sorry for all the mistakes. And actually this is something that happens a lot. People are still using MD5 for certificates. >> People is us, by the way. >> Orr Dunkelman: I was trying to avoid saying that out loud. Brian told me yesterday. Now, I'm not even discussing the problem of mitigating problems. This is a real issue. Crypto agility is a real issue. It's not something like, you know -- it's very easy to pick on Microsoft, or for that matter any other company, because it takes so long to mitigate this problem. But there are good reasons for that. I guess that you know it better than me. >> Don't worry about it. >> The problem is in six years [inaudible] still working on it. >> Orr Dunkelman: Six years is a legitimate time spent in -- >> [inaudible] >> Orr Dunkelman: Of course. >> [inaudible] fill the position of MD5 removal program management. >> Orr Dunkelman: Yeah. So as you probably know -- this is something not related to Microsoft -- if you're caught speeding in Australia, you can actually ask for your fine to be -- well, if there is an automatic camera that takes a picture of you speeding, you can ask for the fine to be discharged, because they use MD5 for computing the -- for ensuring the authenticity of the images that the camera took. It was a real case in court, and the state, instead of calling a real cryptographer, just said, well, MD5 is broken and, therefore, the fine is not authentic enough. They discharged the guy. So even real -- let's say real-life implications are very hard to predict. Okay. So what is actually a break? And we have this issue inside the cryptanalysis community as well. And your guess is as good as mine. Actually, it's probably better, because you can think about it from different points of view. From a theoretical point of view, anything which is better than exhaustive search is okay. So this is the extreme approach, which says the maximum of time, data, and memory is less than exhaustive search. And of course the data has to be less than the entire code book. Another approach is that the time, data, and memory have to be better than generic attacks. So we have meet-in-the-middle -- sorry, we have time-memory-data tradeoff attacks, first introduced by Hellman around the end of the '70s, the beginning of the '80s, and today we have some other variants of it, like the rainbow tables. If your attack is better than these generic attacks, then the cipher is broken. Saying that, you know, at the end you can break any system with a generic attack. That's the whole point of generic attacks. They work for [inaudible].
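For reference, the generic attack just mentioned, Hellman's time-memory tradeoff, follows a standard curve (textbook material, not a figure from the talk): for a key space of size N, after roughly N precomputation steps, online time T and memory M can be traded along

```latex
T \cdot M^2 = N^2, \qquad \text{e.g. } T = M = N^{2/3}
```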
Now, there is a new metric, which is promoted by several people, which is time times memory is less than that required for exhaustive search. And the concept behind that is that if you have a memory circuit, a memory gate, take the hardware that was used to implement it and transform it into hardware that does exhaustive search. So if you take 2 to the 100 memory, you can transform it into about 2 to the 90, let's say 2 to the 100 order of magnitude, circuits which do trial encryptions. Of course this is a bit of cheating, because it depends whether the memory that you need in your attack is RAM or hard drive or whatever, which makes everything even a bit harder to compute. And of course the best metric in real life is money. I'm limiting your time: I want to find the key within a year. How much money does it cost using this approach, this approach, this approach in the attack? If the attack costs less than any previous approach, it's a break. So any guess is good. And it depends on the system you are trying to work with. So let's put this debate behind us. Let's look at practical attacks. Now, let's try to upper bound the complexity of an attack so we could consider it to be practical. And I will discuss only time complexity, because the moment I'm discussing the time complexity, this is a bound for the data complexity. If you wait 2 to the 56 time to get the encryptions, this is the time. So assuming that the time complexity is always at least as large as the data complexity, I can discuss only this issue. And of course this also gives a bound on the memory consumption. So the DES cracker by the EFF, the Electronic Frontier Foundation, computed 2 to the 55 DES encryptions in about 56 hours. Today you can do it even faster. You have the COPACOBANA machine that the [inaudible] guys built. A very nice FPGA board: you put on about 10 or 12 FPGAs and connect them. It's a machine that used to cost 10,000 euros, which is about $15,000 these days. And it found a key of the Data Encryption Standard in about 17 days. So if you want the key in less time, you just buy more COPACOBANAs. Assuming the time of [inaudible] is zero [inaudible]. Now, there was a SHA-1 [inaudible] project which tried to do 2 to the 61 evaluations of SHA-1. Actually, there are debates whether they were trying to do 2 to the 63 or 2 to the 61. But this thing didn't finish. After two years of computation, they just stopped the project. They didn't really tell why they did so. So we can guess: either there was a problem, they needed more time, or there was a bug in the software. Anything goes. So I think that if we say 2 to the 64 cycles is a practical time complexity, we will all agree on that, right? Yeah. >> The NSA and those guidelines are set to say 2 to the 80 is good enough for this decade and it's not good enough for the next decade. >> Orr Dunkelman: I agree, but the problem is that -- I mean, if I try to convince people, listen, the NSA can read your e-mails -- their assumption is that the NSA can read their e-mails anyway.
>> No, that's their guidance for civilians so other guys won't read their e-mails. >> Orr Dunkelman: Still, the Russians read your e-mail. No offense. Well, maybe not your e-mail, but everybody is reading everybody's e-mails. And especially these days. There was a report recently about the countries which actually participate in cyber warfare, and I think that Russia, China, the U.S., and I think one other country ranked very high, having not only defense forces but also offense forces. And I don't have to tell you that North Korea is probably doing such things. And if you're from Estonia or you know somebody from Estonia, you probably know what happened -- two years I think, a year ago, two years ago -- >> Two years. >> Orr Dunkelman: -- when some people in Russia were very pissed off at the Estonians. So let's say 2 to the 56. I mean, if you want to play with this bound a bit -- 2 to the 64 cycles, which is about 2 to the 56 AES encryptions -- just move the line. It doesn't change anything too much. So here is a summary of attacks on AES-256. To remind you, AES has three variants: 128-bit key, which has ten rounds; 192, which has 12 rounds; and 256, which has 14 rounds. So this is the number of rounds of the attack. This is the time complexity. This is the practical borderline. This is exhaustive search, the certificational crazy model. And this is of course a logarithmic scale, because otherwise we would need a few extra floors. So these are results in the single-key model. You can see that the best known attack on up to eight rounds of AES-256, with access to an oracle which encrypts under one key, takes about 2 to the 149 or something like that. It's a paper by Demirci and -- I forgot his name. I'm terribly sorry. From FSE 2008. And the only practical attack on AES-256 is on six rounds, by Ferguson et al., so go and ask Niels from across the street to discuss these results. In the two-key model, or in the four-key model -- in the related-key model -- the adversary is allowed to probe the oracle under different keys. The best-known attack is [inaudible] ten rounds with time complexity of about 2 to the 170, or 2 to the 173, if I remember correctly. Now, this was the case up until recently, and then Dmitry and Alex and Ivica in one of their papers, and Dmitry and Alex, proposed two attacks on AES-256. The first one, from CRYPTO 2009, takes about 2 to the 135 encryptions with 2 to the 35 related keys. It's crazy, but it's still better than exhaustive search. And the recent ASIACRYPT paper, which Dmitry is going to present next week, is 2 to the 99.5 time complexity and data complexity, and it uses four related keys. So as you can see, attacks only get better. >> So somebody was giving a talk about an attack on the full ten rounds in Italy. Anna [inaudible] I think is her name. Full ten rounds. But I don't know any of the results. Have you heard any of those? >> Orr Dunkelman: I tried to contact her. So, first of all, she claims results, statistical results, on AES-128. As far as I understood from the abstract and from the e-mail that she answered me, this is in the single-key model. >> When? >> Orr Dunkelman: But -- >> She didn't give you a time. >> Orr Dunkelman: Didn't give anything. She just said that after her talk, which I think is about Thursday or Friday -- >> This coming Thursday? >> Orr Dunkelman: Yeah.
The 4th of December, I think -- either the 3rd or the 4th -- they're going to have some sort of crypto day at their university, a symmetric-key day at the university. They're going to discuss the security of block ciphers, she will discuss her results, and then she will publish an abstract of her results. >> [inaudible] or something? >> Orr Dunkelman: Who knows. I just want to -- in that line there was a guy, Claude Gravel, a student of Gil Segav, who in CRYPTO 2009 -- in the rump session -- spoke about the fact that he found statistical problems in AES. And we actually went and redid all these experiments. And as far as we know, the experiments failed. So we found some artifacts that are caused by random things. He picked some huge amount of data -- took 4096 chosen plaintexts -- and he said that if you encrypt them, the ciphertexts have some problems in the [inaudible] test. And we found -- we are currently running a second simulation, but the first simulation showed that you get such large biases after the number of random trials that you would expect. He claimed that the biases are larger by a factor of 10. >> [inaudible] >> Orr Dunkelman: He gave a very quick overview [inaudible] the rump session. He was speaking just before me, so I remember it very clearly. And then somebody told me -- somebody. Adi told me: implement. So it's implemented and it's running. So these are the attacks that we presented in the paper. So the first one, for example, is -- these are attacks, the yellow ones. And I hope that there is no one here who is color blind. I'm terribly sorry if you are. I will try to -- so these two results are in the related-key model. And you can see that, for example, you can attack up to nine rounds with complexity of 2 to the 39. And most of it is data complexity. Once you gather this amount of data, you automatically find the things. And the analysis time is less than 2 to the 32. Meaning that as an adversary you can sit at home, get the 2 to the 39 data, and then do the analysis in less than a minute. In the related-subkey model, which is slightly different -- and we'll discuss the differences -- you can attack up to 10 rounds with complexity of 2 to the 45, and even 11 rounds in quasi- or semipractical complexity of 2 to the 70. So now it's a bit -- well, it depends if you want to put the line here or here, or if you're the NSA or someone else. But these are roughly the time complexities. >> [inaudible] >> Orr Dunkelman: Sorry? >> They all require an oracle, right? >> Orr Dunkelman: They all require chosen plaintext capabilities or chosen ciphertext capabilities, and under the related-key or the related-subkey model. So of course we require some knowledge about what's going on, but I will discuss a bit later why it's not as bad as it sounds. Okay. So as I said before, the related-key model was introduced by Knudsen and Biham. And around '96, '97, people started to use related-key differentials. Now, the concept there was to use differences in the key to cancel differences in the differential characteristic. Because you have a key schedule algorithm where a difference propagates in some manner, and then the differences are injected into the encryption process in some locations, and you can actually use these differences to play with things. Now, there is a set of good relations, which we all agree are good relations -- and if you submit a paper using them it will get accepted, and that sort of stuff -- so if you have XORs, rotations, or additions, these are legitimate requirements. Legitimate relations, I mean. Now, if you have "and" or "or", or the use of XORs and additions together, people tend to be very unsupportive of your paper. And the reason for that is that if I give you a related key -- encryption under a key K, and encryption under the key K OR 1, I give you access to these two oracles -- you can easily find the least significant bit of the key. Encrypt under the first one, encrypt under the second one; if the encryptions are the same, then the OR 1 didn't change anything in the key. So we know that the least significant bit is 1.
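A minimal sketch of that OR-relation leak, with a hypothetical 16-bit toy cipher standing in for the real oracle (any deterministic block cipher behaves the same way):

```python
import os

# Comparing encryptions under K and under K OR 1 reveals the least
# significant key bit. toy_encrypt is a hypothetical stand-in cipher.

def toy_encrypt(key, pt):
    x = (pt ^ key) & 0xFFFF
    for _ in range(4):
        x = (x * 40507) & 0xFFFF   # odd multiplier: a bijection mod 2^16
        x ^= x >> 8                # xorshift: also a bijection
    return x

key = int.from_bytes(os.urandom(2), "big")   # the unknown key
pt = 0x1234

c0 = toy_encrypt(key, pt)       # oracle under K
c1 = toy_encrypt(key | 1, pt)   # oracle under the related key K OR 1

# If the LSB of K is already 1, then K OR 1 == K and the ciphertexts match;
# otherwise the keys differ and, the cipher being a permutation, they don't.
recovered_lsb = 1 if c0 == c1 else 0
assert recovered_lsb == (key & 1)
```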
Now, there is a guy who presented a paper at the Workshop on Coding and Cryptography this year who claimed that he had an attack on all ciphers using related keys. Now, his relations are good in the sense that there's a theoretical paper by [inaudible] from [inaudible], from 2003 or 2004, where he -- where they showed that the relation has to be bijective in order for everything to be okay. So you can see, for example, that and/or is not good, and XORs and rotations are good relations. Now, he showed a generic attack that works using a very crazy but bijective relation. So this is not something to be afraid of in the practical sense, but from the theoretical point of view it proves that there is no way to construct a cipher which is secure against related-key attacks, and in the end it boils down to whether you have this relation in the system that you're using. If you're going to build a hash function and you're going to define it using a [inaudible] transformation of the cipher, which has problems with related-key attacks -- because you transformed a block cipher into a compression function using the Davies-Meyer mode -- you give a very strong hold for the adversary to apply related-key attacks. If you have a sensor system sitting in the middle of nowhere, sending data with a key which was embedded into the device, the related-key relations are very, very weak. You don't have related-key attacks on the sensor there; you might have them on the reader, the receiver, at the other end. So it's very tricky, but I'm going to discuss relations which are relatively okay, which are XOR relations. So, for example, if you discuss the probability of a differential, you can see that you take the probability when you [inaudible] in plaintext. Encrypt it under the key -- so you give it to the oracle -- and you take a plaintext with some plaintext difference and you ask what's the probability of getting some specific ciphertext difference. So for four-round AES, the probability here is bounded by 2 to the minus 110. In related-key differentials, we have the encryption of P under the key K and the encryption of P XOR delta P under the key K XOR delta K. And these probabilities tend to be very different when you allow yourself some key differences. Okay. Now, the related-subkey model is a bit crazier. How do I define the related key? Let's say the relation is an XOR: you take the key, you XOR it with some constant, and you ask for the encryption under the XORed key. Now, you can say, listen, this is a bit crazy. But reality shows that there are protocols in which you can do that. There are protocols where you can flip bit No. 5 of the key. Real protocols. Now, in the related-subkey model, we take a key K, we look at all the subkeys it generates, XOR a difference into subkey 2 or 3, for example, run the key schedule algorithm backwards, find the key that satisfies these subkeys, and then use this key. So this is even a bit crazier, I mean. We don't XOR the key, we XOR a subkey and then run backwards. Now, the key that is generated adheres to the key schedule algorithm. It's not a fault attack where you flip bit No. 5 of subkey 20 and you don't do anything else. You flip the bit, you roll back the key schedule to get the new key, and then you generate all the subkeys according to the key schedule algorithm. So it's not a fault attack, and it's something you have to be very careful about. Now, this sounds a bit crazy. And it is. But, on the other hand, when you use it in systems, in real systems -- for example, if you take AES-256 and you put it as a compression function in the Davies-Meyer mode, you have this capability. And AES-256, like any good block cipher, should be usable in the Davies-Meyer transformation of a block cipher into a compression function. And if you do not follow this transformation [inaudible] -- there is a standard transformation taking a good block cipher and making it into a good compression function. And this transformation allows for related-key, related-subkey, related-whatever capabilities you want. And AES-256, for example, due to these results, cannot be transformed into a secure compression function, meaning that there is a problem with AES-256. Okay. >> AES-256 reduced rounds or was it full AES-256? >> Orr Dunkelman: So the 2 to the 99.5 attack, which I guess Dmitry is going to talk about, is for the full AES. We discuss here only the ten rounds just because, well, we try to make the attack practical. If you drop the practicality requirement, then you can do more. Okay.
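For context, here is a sketch of the Davies-Meyer mode being referred to (the block_encrypt stand-in and toy_cipher below are hypothetical, not the paper's code). The point is that the message block feeds the cipher's key input, so whoever chooses the messages chooses the keys -- the related-key setting for free:

```python
# Davies-Meyer: compress(H, M) = E_M(H) XOR H, where the MESSAGE block M is
# used as the cipher's KEY. An attacker hashing chosen messages therefore
# chooses the cipher's keys.

def davies_meyer_compress(block_encrypt, h, m):
    # block_encrypt(key, plaintext) stands in for e.g. AES-256: a 256-bit
    # message block as key, a 128-bit chaining value as plaintext.
    return block_encrypt(m, h) ^ h

def davies_meyer_hash(block_encrypt, iv, message_blocks):
    h = iv
    for m in message_blocks:
        h = davies_meyer_compress(block_encrypt, h, m)
    return h

# Hypothetical toy cipher on 128-bit integers (NOT AES), just to run the mode:
toy_cipher = lambda k, p: ((p ^ k) * 0x9E3779B97F4A7C15 + 1) % 2**128

print(hex(davies_meyer_hash(toy_cipher, 0, [0x01, 0x02, 0x03])))
```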
So let's go over our results. So trying to -- without entering into too many dirty details, you take this key difference. Now, this is gray, gray, gray, gray, pink, pink. Again, I'm terribly sorry if any of you is color blind. Gray means that all these four bytes have the same difference as these four bytes, which is the same as these four, as these four, and these two pinks are the same. So if you put this into the key schedule algorithm of AES-256, you get gray, gray, gray, gray, two pinks. When you generate this column, you apply the S-box on the shifted, or rotated, column here. White, by the way, means zero difference. So no difference in the input, no difference at the output of the S-boxes. And the [inaudible], so we get gray, and then this column is the XOR of this column and this column, so when you XOR two things with the same difference, you get something without a difference, et cetera, et cetera, et cetera, until you arrive at this round. These are the same bytes, so this byte is equal to this byte. These are equal. This is something new. This red thing is something new. And then you have here blue, blue with green, where green is the XOR of blue with pink. Yeah. This is not an RGB scheme. This is just a nice way to -- you know, otherwise you would have alpha zero, alpha one running around here, you would have alpha, beta -- I think we tried to do it with letters, and at some point we started to look up more Greek letters in the dictionary. And at some point we said, okay, colors are okay. So what do we do with it? This is the related-key differential that we build with it. So if you start with an input difference all gray and you XOR a gray key difference, you have no difference, no difference, no difference, two pinks; the pinks become blue -- I'm terribly sorry -- which after the MixColumns transformation becomes gray. And then it's canceled. Now, let's -- trust me, it works. But the interesting part here is the fact that you can have one round with no active S-boxes.
This happens with probability 1. If we had only differentials, or differential characteristics, it would be 2 to the minus 6. Two rounds with only two active S-boxes rather than five. Three rounds with one, two, and that's it -- two active S-boxes rather than nine. And in total this entire path has probability of 2 to the minus 56 for eight rounds. And you can count the number of active S-boxes: one, two, three, four, five, six, seven, eight, nine, for eight rounds. The wide trail strategy ensures that you would have at least 50 if you did only standard differential attacks. Now, in our attack we -- or actually in some of the attacks -- don't care what happens in these 12 bytes. This is where this helps us a bit with the probabilities; it improves the probability from 2 to the minus 56 to about 2 to the minus 36. Now, these probabilities -- and actually this is something which we always hide -- assume that all the transitions are independent, that the probability of the entire transition is the product of the probabilities of all the transitions. Now, usually you cannot verify it, but when you have a practical attack you can. So we looked at the seven-round differential characteristic, because the eight-round one is a bit harder to verify. So we look at the first seven rounds, stopping everything here. It doesn't matter whether you stop it here or here. This transition has no difference and no probability associated with it. And the differential characteristic is expected to have probability 2 to the minus 30. So we took, each time, 2 to the 32 random plaintext pairs which adhere to the input difference, [inaudible] checked what happens in the output difference. Now, in this scenario, because this is a random process and everything, you do not always get four. Sometimes you get more, sometimes you get less: the number of right pairs, pairs which satisfy this entire characteristic, behaves like a Poisson random variable with mean value of four. Trust me, it was [inaudible] value of four. I'm not going to cover this. So you can see that in theory, [inaudible] if we run 100 experiments, we should have expected 1.8 tests with no right pairs. We had zero. One right pair is expected in 7.3 of the tests, but we had 10 of these. And you can see that it's relatively okay. I mean, this is not exactly [inaudible] four, but if it's [inaudible] 3.5 or 4.5, we're not going to fight over it. But we can't. It's not the point. Yes. >> What's [inaudible] square [inaudible]? >> Orr Dunkelman: I think that in this case you shouldn't see a problem. It should be the same. But, you know, they say should is the name of a fish. In Hebrew -- the word for should, amoule [phonetic], is the name of a fish. It's a type of carp, so... We haven't done this experiment. Actually, I will run it afterwards.
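The quoted figures can be checked directly; a small Python sketch assuming, as stated above, that the right-pair count is Poisson with mean 4 (2 to the 32 pairs times a 2 to the minus 30 characteristic):

```python
import math

# Expected number of experiments (out of 100) seeing k right pairs,
# if the count of right pairs is Poisson with mean 4.
mean, trials = 4.0, 100
for k in range(5):
    expected = trials * math.exp(-mean) * mean**k / math.factorial(k)
    print(f"{k} right pairs: expected in about {expected:.1f} of {trials} tests")
# Prints roughly 1.8 tests with 0 right pairs and 7.3 tests with exactly 1,
# matching the figures quoted above.
```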
So, in the related-subkey model, what you do is you shift all the differences one round. Now, usually when you shift differences one round, it has no effect. But when you have related keys thrown in, then the key schedule behaves slightly differently. So that's why we needed the related-subkey assumption: we push everything one round forward, or two rounds forward, depending on the exact attack, and then let the related-subkey assumption take care of the issues that form because of the slightly different locations. But all in all, the attack is the same. I mean, if I show this related-key differential to a cryptanalyst, he would say: okay, take 2 to the 36 pairs -- for example, we start with 2 to the 36 pairs -- and you expect to get one right pair with this difference. When you find this pair you know that something happened, and you can use this to find the key. So starting from here, it's a very standard transformation. I mean, I'm omitting some stuff, but it's not important. Now, we also looked at another attack scenario where the plaintexts are not generated as random plaintexts. Usually chosen-plaintext attacks assume that the adversary chooses a plaintext at random, and then the plaintext XOR something, because we need to get randomness from somewhere. These are just the reasons for picking plaintexts at random. Now, we decided to look at counter mode. So counter mode is a way to transform a block cipher into a stream cipher. You initialize a counter with a plaintext, which is some IV. You encrypt it under the key and then you XOR the outcome with the plaintext you're trying to encrypt. You increment the counter and you do it again and again and again. So we said: let's assume we have such a system. The plaintexts are not generated -- actually, the attack goes after the stream, not after the plaintext; and the stream was generated in a very nonrandom, very deterministic way. Does the attack still work? And actually we did the experiment and it works. The same distribution, the same -- everything is the same. So if you have counter mode embedded in some system, you let it run. Now, we take it, we reinitialize. We have to change the key -- this is a related-key attack. We have to change the IV at some point -- well, and change the counter -- and let it run as well. And if you do it correctly, the attack works. We actually verified it.
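A minimal sketch of the counter-mode setup just described (block_encrypt is a hypothetical stand-in; a real system would plug in AES here). Note that the cipher's inputs are the highly structured values IV, IV+1, IV+2, ..., which is exactly the deterministic, non-random setting the experiment checked:

```python
# Counter (CTR) mode: encrypt IV, IV+1, IV+2, ... under the key and XOR the
# resulting keystream into the plaintext blocks (here, 128-bit integers).

def ctr_encrypt(block_encrypt, key, iv, plaintext_blocks):
    ciphertext_blocks = []
    for i, p in enumerate(plaintext_blocks):
        keystream = block_encrypt(key, (iv + i) % 2**128)  # structured inputs
        ciphertext_blocks.append(p ^ keystream)
    return ciphertext_blocks

# Decryption is identical: XORing the same keystream again recovers the data.
```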
Now, we also did some attacks where we said, okay, let's assume that we use ECB -- that's a bad way, but -- and we restrict ourselves to cases where all the plaintexts are ASCII characters. And that actually works. The differential still holds. The probabilities work. We also discussed the case where the ASCII characters are actually numeric values. So we have a database encrypting only numbers. And the differentials, because of what they are -- well, they still work. So if you're going to have some security engineer saying: I want to use AES-128 because it's really, really fast, and I want to make it more secure by extending the key schedule algorithm from AES-128 to AES-256 -- a 256-bit key is better than 128, and the speed of AES-128 is still better -- so if you take AES-128 and you fortify it this way, you actually completely crash the security in the related-key model. I have to be honest: it's in the related-key model. Now, the minimal Hamming weight of the key difference is 24. Now, this is important if you're hardware people and you take the device and you start hitting it until the key flips in the right positions. So the minimal Hamming weight of the key difference [inaudible] is 24. But we didn't try to optimize this too much, so I guess that you can actually improve it a bit. But it's not a real issue, I think. Okay. So this is a summary of the attacks. You can see whether it's a key difference or a subkey difference, related-key or related-subkey; whether it's a distinguisher, or you can retrieve some bits of the key, or the full key. For example, the nine-round related-key attack takes 2 to the 39 time, 2 to the 39 data, and 2 to the 32 memory. Well -- yeah. >> So going back to the Davies-Meyer application, so I see you get like seven key bits. Have you thought about what that translates into in the Davies-Meyer application for getting collisions? >> Orr Dunkelman: Yeah. You can actually -- there is a set of differentials which start with zero difference and end with zero difference. >> [inaudible] >> Orr Dunkelman: Yeah. >> [inaudible] >> Orr Dunkelman: Yeah. But you still need to do some -- so what happens is that, for example, if you skip this round, you start here, so with zero difference here. And you stop here, for example; you have zero difference. Now, I'm cheating a bit, because there are more rounds, but this differential is optimized for probability, not for finding collisions. Dmitry did the work with finding collisions. He found collisions. I think in the CRYPTO paper -- in the ASIACRYPT paper -- there is a collision for 13-round AES-256 in Davies-Meyer mode. >> Is that [inaudible]? >> Orr Dunkelman: I think both are on the IACR ePrint. So you can do really fun stuff when you allow yourself crazy models. Okay. So can I take a bit more of your time? >> Yeah, please. >> Orr Dunkelman: Okay. Great. So what are the security implications? Because at the end you will have to answer questions from users, like: does this affect my system? So, for example, if you extend AES-128 to a 256-bit key, you actually lose security. And the real issue, I think, is the fact that the security margins of AES-256 are much smaller than what we expected. AES-256, AES-128, AES-192 were all designed to be secure against related-key attacks. Unfortunately, the latter two, the 192 and 256, are not secure enough. Now, if you ask me if this is going to change anything: the probability of NIST modifying the AES specifications is, well, like the probability that tomorrow I will visit the moon. And the reason for that is that AES was actually widely deployed. Unlike the SHA-1, MD5, SHA-2 situation, where people are even now trying to migrate to something new, AES was very quickly adopted, and there are many products using AES, there are low-end devices using AES. AES succeeded so well that if somebody wanted to transform AES today into something else, it wouldn't happen. Now, you can change, for example, the key schedule algorithm. >> Yeah, that would be the easiest thing to do, right, just change the key schedule? >> Orr Dunkelman: Yes. But the problem is that when you make such changes, first of all, you don't know what new problems you introduce. But the real issue is that you cannot change it, because there are so many devices around the world -- and I'm not even talking about the software implementations. There are so many hardware implementations that you cannot take them back. That's it. We're going to be with AES for a very long time. And it is very unlikely that anyone will start an AES-2 competition, saying, okay, AES was not good, we have to start again. This won't happen. Because currently they have the SHA-3 competition. And my feeling is that they would rather roll back time and not have this competition rather than -- >> You think they'd rather not have the SHA competition? >> Orr Dunkelman: I think it causes them more headache than they expected. They got 51 submissions, proper and complete submissions. And now they have to fight with everybody, and nobody likes them, because somebody's hash function was not selected. And even I have issues with them, because I had two submissions and one of them didn't pass to the second round. I mean, it was a good one.
I'm now discussing the one that didn't -- that did pass. Of course it's great. But the one that didn't pass, it should have. >> Is the one that passed the one with the AES [inaudible]? >> Orr Dunkelman: Yes. Of course, the one based on the AES round -- and now everybody is a bit -- so now there is a question of how this will affect the NIST competition. If you're AES-based, do you have greater chances or lower chances? Now, of course nobody used the key schedule algorithm of AES. AES-based means that we took the AES round, we stuck it in several locations and said, okay, we use it as a nonlinear operation. And the reason for that, one of the reasons for that, is reuse of hardware and software -- the fact that Intel, and soon after AMD, is going to introduce the AES instruction, so you can do one AES round in six cycles. This is the most diffusion and confusion per cycle that you can think of. Yes. >> So going back to the key schedule, so you found this brilliant, you know, set of differences with probability 1. Could you share how you found those and if there are more lurking? Is this optimal? >> Orr Dunkelman: So I had been looking at the key schedule algorithm of AES-256 since 2005 -- 2004, actually. So we had some paper about related-key -- related-key differential boomerang -- never mind. Some sort of [inaudible]. And we were optimizing to have very good differential characteristics, and we assumed that in order to have that you have to have a minimal number of active S-boxes in the state. So we did everything we could to make sure that the number of active S-boxes locally was optimal, and we missed the big picture. And then Dmitry came with his papers about the full AES-256. And then we saw that there is a different way to deal with that, which pays more in the number of active S-boxes locally, but globally you gain a lot. So right now I think that he has an automated tool for finding the best differential characteristics, related-key differential characteristics, of up to X rounds. But we did this by hand -- so, yeah, a very old-fashioned sort of approach. But the thing is that I don't believe we tried to look -- I know that Dmitry ran his tool, so I know that [inaudible] that the tool can find. But we needed some -- you know, a change in the state of mind, trying to optimize globally rather than locally. And there was a change in the state of mind. And when you try to actually -- we tried other things. We actually tried other things, having pink 1 and pink 2, having -- it doesn't work. But I'm not going to commit to the fact that this is the best thing you can find, because I'm sure, 99.5 percent sure, that if you do something else -- you slightly change the model, you push the related-subkey model even further, you say I take key classes, so classes of keys -- under the related-subkey model, you can find something which is even stronger. I'm sure it's lurking around somewhere, but we didn't push it that far, because I think that it will take time for the community to understand that the related-subkey model is legitimate. Personally, I'm still not 100 percent confident it's legitimate. It's legitimate in some senses; it's still weird in some others. Okay. Okay. So let's -- so did we break the full AES with practical complexity? The answer is of course no. And should users be worried? I think that users should be worried, but I would advise against panicking. I mean, don't phase out AES-256 yet.
But if you're already going over the code and making sure that you can change the hash function from MD5 to something else, think about putting in the hooks to change AES to something else in the distant future. That's it. And the paper is on ePrint. [applause] >> So I asked this question a lot a few months ago, and I got a consistent answer. I'm wondering if it's still consistent now. If you have your choice in practice, AES-128 or 256 -- >> Orr Dunkelman: 256. >> Okay. >> Orr Dunkelman: Any day. But this is as long as we're not discussing compression functions. This is only if we discuss encryption -- counter mode, or Elephant, or a real encryption. If you're going to use crazy [inaudible] -- if you have tweakable block ciphers, and actually some of the tweakable block ciphers assume that you can put in a public tweak, which changes the key schedule algorithm, and that just introduces related-key attacks with [inaudible]. Usually when you start a paper about related-key attacks, you have to give examples saying related-key attacks are very important -- because of the S-box, because of whatever. Now, if you have tweakable block ciphers where people are allowed to change the key schedule from a distance with a public parameter, the related-key model seems a bit more practical. So as long as you're discussing classic modes of operation -- no tweakable block ciphers, nothing else -- AES-256. Compression functions, tweakable block ciphers, anything else: AES-128. >> Kristin Lauter: Okay. So Orr is actually here all week until Friday. And if anyone would like to go to lunch now with Orr and continue the discussion, please let me know. Let's thank Orr again. [applause]