>> Unknown Speaker: Well, introducers for time immemorial have said the next speaker needs no introduction. And that speaker needs no introduction. Tanja Lange will speak to us about attacking elliptic curve challenges. >> Tanja Lange: Well, thank you very much. Also thanks to all of you for sticking around in spite of there being no rain or whatever outside. So very pleased to see you here. The beginning of my talk has already been taken care of, kind of, by Scott Vanstone, who in his acceptance speech for the plaque mentioned that they had put out some challenges. So in 1997 Certicom had started marketing elliptic curve cryptography, and were trying to get the community to help build confidence in elliptic curves. I mean, that's the normal thing if you propose a new cryptosystem. People are skeptical. They don't really see advantages of the new system over the old one. And maybe, maybe there are disadvantages. I mean, there had been more research on RSA at that point than there was on breaking elliptic curve cryptography. And hence came the challenges, and Scott already mentioned that some of those were exercises, and he mentioned a little bit of questioning of how hard the exercises would be -- that he thought they were hard and I thought they were easier. Anyway, the exercises were solved. And with the background of building confidence in elliptic curves, it's very nice to see the Elliptic Curve Conference being now in its 14th year, us celebrating 25 years of elliptic curves in everything including elliptic curve cryptography. So apparently there is enough confidence or interest in the community. So the marketing of Certicom certainly worked out. So here are the easy challenges. So 79 bit, that's the group size. So if you use the general square-root-of-the-group-order estimate, then this is a computation of 2 to the 40, which nowadays is very conveniently done on a laptop, and even thinking back to what computers were like in 1997, that's not a hard computation. So they were estimating 146 machine days, and they were willing to give away the Handbook of Applied Cryptography, which I guess they had a few of on the shelves. Similar prizes were offered for the next challenges. To introduce the naming: P up here means that's a curve over a prime field. The 2 means it's over a binary field and the curve is defined over the extension field, and K, which appears down here, stands for a Koblitz curve. So that's a curve defined over F2 and then considered over the big extension field. Most of the time the number here indicates the field size, so this is F2 to the 89, or F2 to the 79, or here it's a 79-bit prime. There are some deviations from that on the next page. So these were the easy ones, and they're all done -- so, different levels of how easy it was, but, yeah, they're done. Then the next ones are more serious challenges. You see suddenly there is real money, like 10,000. And sometimes the naming is a bit hard to understand. So, for instance, 108 does not mean the field is 2 to the 108 -- sorry, it's still a field of 2 to the 109, but because it's a Koblitz curve, you lose a bit. So apparently that gave us 2K-108. And the statement coming with this was saying the 109-bit level 1 challenges are feasible using a very large network of computers. The 131-bit level 1 challenges are expected to be infeasible against realistic software and hardware attacks unless of course a new algorithm for the ECDLP is discovered. 
And then they were going on with challenges at the level where they actually were thinking about using them, so in Junfeng's talk he was saying, well, Tanja is needing a team to break ECC2K-130, so I guess your boss will not be comfortable with this, and your boss will ask you why we weren't going for overkill if you can be just comfortably fine here. Comfortably fine in Certicom money just means 30,000. So at prices that we have right now, I'm very sure you cannot buy hardware for 30,000 that will break this, which means it's secure. Well, seriously, we did some estimates of how hard it would be to break those, and it's certainly not within that budget. Then for the other ones it gets harder, the estimates go up to quite a long time to break it. And the prize money would be quite noticeable. Now, the history of this: I mentioned all the exercises were broken quickly. So in particular Harley was -- Rob Harley was very visible in doing that, so he must now have a few copies of the Handbook of Applied Cryptography on his shelf. But then he also went for the more serious challenges to get a bit of money. And it says Harley, et al., so he usually had a team of people around him led by him, or the first one led by him and Baisley. And then [inaudible] 2000 he lost interest in this and Chris Monico started to do the remaining somewhat doable challenges. So as of 2004 all the 109-bit level challenges are done, are solved. And since then not much has happened. And when we looked at this, the last update to the document from Certicom was in 2003, and this still said that all the 131-bit challenges are infeasible. So there wasn't a point where these challenges had been solved or were about to be solved. Okay. And so we had a good time. We had an ECRYPT meeting in Lausanne and we were hanging around with a bunch of people doing, well, a [inaudible] research retreat. So we meet with people who have similar interests, and then we say, hmm, with our knowledge, with our abilities around here, what could we do that would be interesting? So we brainstorm a little bit. And it turns out that we actually have around the table quite a few people with expertise in different levels of implementations, going all the way from ASIC design, FPGA implementation -- so, for instance, Junfeng was around there, and so we had the FPGA expertise -- going through -- well, here you see a few people; that was actually the next meeting, so it's not entirely the same team. But we thought, yeah, with this group of people we can at least analyze for modern architectures and for hardware-like architectures, like ASICs, how much it costs to build the ASIC to attack those. And so we wrote a biggish paper analyzing -- giving estimates of -- how expensive it would be to attack the next target, the ECC2K-130, then the non-Koblitz binary curve and other binary curves. We wanted to focus on F2 to the N as a base field just because it's simpler in arithmetic, so it's also nice for the hardware implementations. And admittedly we were kind of tempted by this 'infeasible'. Just somebody telling us you can't do this -- that's still, well, feeling a little bit challenged, I admit it, but it was this feeling of yes, we're going to show them we can do this. Oh, well, the first effect was that the Certicom document now reads: May be within reach. So, well, I guess we had success. We managed to update the Certicom document. We're still hoping for the updated Certicom document where it says: Broken by the VAMPIRE team, but that will come eventually. 
Another outcome of this is that a bunch of people now have a few more papers on their CV, which as an academic is also very nice. So the first paper that we wrote was a direct output of this research retreat, and appeared at SHARCS. SHARCS stands for Special-purpose Hardware for Attacking Cryptographic -- Dan? >>: Systems. >> Tanja Lange: Systems. Okay. Good. So the SHARCS workshop has been going for a few years. It's something that Christof Paar and I started initially. And it's a workshop where we look at actual attacks, like how much does it cost you to break blah, where blah is either something which is a downscaled version, like 130 bits, or something where we don't yet see how we can build it, but it would be interesting to know how much it costs you to break RSA-1024. So that type of paper appears at SHARCS. And so our analysis of the elliptic curve challenges fit nicely in there. Then afterwards we thought, yeah, maybe within reach; even Certicom now says maybe within reach. Good. So the target, and that's what the rest of this talk focuses on, is this particular curve, which is a Koblitz curve. It looks like this. So it has no X squared term here. It's defined over a field of 131 bits. The challenge states the prime group order. So the whole curve has order 4 times this big prime. And then certain -- Certicom says, well, we randomly picked a point by randomly picking an X coordinate, checking that there's a Y coordinate, and then multiplying by the cofactor; that gave us the P. We know that P has order L. And then it's the same with getting another random point Q. So Certicom says we don't know what the result of the challenge is. So we cannot solve the challenge by breaking into Certicom headquarters. They don't know either. So our challenge now is: find the K so that Q equals K times P. So we thought this was a worthy target, not only because of the 20,000 Canadian dollars, but actually for scientific reasons. There are people who do propose using very small finite fields for applications such as, well, RFID. You saw how tempting it would be to save a little bit more area and go down. So there's a proposal for TinyTate, this is a pairings application, and in general RFID. You might say okay, if it's just on a box of milk, then the lifetime of this chip is so short that we don't worry about an academic attack if it still takes more than a year to break it. But still they should at least know how weak or how strong it is. Now, going into the details of what this means: to attack this thing we first should understand how it is used in a cryptosystem. So this is the binary curve, and this is the addition on this Weierstrass curve -- so we did not move it to Edwards form. Edwards forms are very nice for implementations; for the attack we stuck with the Weierstrass form here. If you add two points that have nothing special to do with each other, then the sum is given by the usual slope formula, and if you have a doubling, then it looks like this. So each of these operations, whether it's an addition or a doubling, costs you one inversion, two multiplications and one squaring. If you wanted to use this curve for cryptography in constructive applications, you would be interested in avoiding the inversion, going for projective formulas and so on. And [inaudible] mentioned in his talk that there's an explicit-formulas database. So if you're interested in finding out how to constructively work on these curves, go there for other formulas. For attacking those curves, we need to have unique representatives of each point. 
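[Editorial note: for reference, these are the standard textbook affine formulas being described, for a binary Weierstrass curve y^2 + xy = x^3 + ax^2 + b over F_2^m; the slide itself is not reproduced in this transcript.]

```latex
% Addition, P_3 = P_1 + P_2 with P_1 \ne \pm P_2:
\lambda = \frac{y_1 + y_2}{x_1 + x_2}, \qquad
x_3 = \lambda^2 + \lambda + x_1 + x_2 + a, \qquad
y_3 = \lambda(x_1 + x_3) + x_3 + y_1 .

% Doubling, P_3 = 2P_1 with x_1 \ne 0:
\lambda = x_1 + \frac{y_1}{x_1}, \qquad
x_3 = \lambda^2 + \lambda + a, \qquad
y_3 = x_1^2 + (\lambda + 1)\,x_3 .
```

[Each operation costs one inversion, two multiplications and one or two squarings; squarings are cheap in characteristic 2. The Koblitz curve of the challenge has a = 0 and b = 1.]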
So we cannot work with a projective representation. At every moment we want to have one point represented in one way on the computer. Then the reason why Certicom is putting in binary curves and Koblitz curves separately is that Koblitz curves are very interesting for implementations. They have certain features that make them faster. So what Koblitz observed in 1991, and which is why we now call them Koblitz curves -- except for Neal; when he gave his talk he was talking about ABC curves, you know, anomalous binary curves -- he observed that we have an extra structure on these. Usually if I give you a point and tell you find me another point, you go and double the point, triple the point, quadruple the point. This is your way of jumping around. And then you can add those. And you stay in the cyclic group and just move by computing multiples of the point. On a Koblitz curve, there is a further operation; namely, if you have a point (X, Y), which is defined over the big field, then because all the coefficients are 0 and 1, so they live in F2, the subfield, if you do the math and raise both sides of the curve equation to the power of 2, then nothing happens to the coefficients because they live over the base field, and suddenly you have that your point raised coordinate-wise to the power of 2 is also on the curve. So here we have another way of jumping around on the curve. You are given an (X, Y) and then the next point is (X squared, Y squared). That's also on the curve. If you're in a prime order group, this will be some multiple of your point. So it's nothing that really gets you out of your system. It will be again some Kth multiple. But K is a big number usually. It's not like one, two, three, four, five. And even though it's a big number, this is a very cheap computation. And then to use this cheap computation, well, Koblitz in his initial proposal said it would be good to use this for scalar multiplication. And he had some good idea there. His own version wasn't particularly fast, but then Meier and Staffelbach came along and shortened it, and then Jerry Solinas had the very nice idea of coming up with sparse representations that are short. They have the same bit length as a usual binary representation, but instead of doing log-of-the-scalar many doublings, you replace each doubling by one of those sigmas. And sigma is just two squarings. That's a whole lot cheaper. Now, I didn't mention what this was. Sigma is called the Frobenius endomorphism of the curve. You know the Frobenius automorphism of a field: if you're in FP to the N, then this computes the Pth power of your element. And this just extends to the curve, and therefore it's also called the Frobenius. So addition still stays the same, but you can always reduce the number of additions by going for sparse representations. Sorry. So that's why people are interested in using Koblitz curves, because they're cheaper when you use them in scalar multiplication. They make your protocols run faster. And quite a lot faster because, well, I just made the big, expensive doublings very cheap. Usually you have to pay a price, and there is some drawback in the security of those. And then once you know how much that is, you can balance those. So Certicom says, well, let's just put prime fields, general binary fields and Koblitz curves as the examples here, and then people can attack all of those and can invent dedicated attacks. 
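[Editorial note: in standard notation (not from the slides), the extra structure being described is the Frobenius endomorphism and Solinas's tau-adic expansions.]

```latex
% Frobenius endomorphism on a Koblitz curve  E_a : y^2 + xy = x^3 + ax^2 + 1,  a \in \{0,1\}:
\sigma(x, y) = (x^2,\, y^2), \qquad \sigma(\infty) = \infty .
% sigma maps E(F_{2^m}) to itself because the curve coefficients lie in F_2;
% sigma^m is the identity, and on the prime-order subgroup sigma(P) = \lambda P
% for one fixed scalar \lambda.  For the curve without the x^2 term (a = 0),
% sigma satisfies \sigma^2 + \sigma + 2 = 0.
% Solinas's expansions write the scalar in base sigma with digits k_i \in \{-1, 0, 1\}:
k\,P \;=\; \sum_i k_i\, \sigma^i(P),
% so every doubling of the usual double-and-add ladder is replaced by one
% application of sigma, i.e. two cheap squarings.
```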
The general attacks against the discrete logarithm problem -- so you want to find this K which links the P and the Q, the points that you're given. The best we know is [inaudible]. We do not know anything like index calculus. So Neal was referring in his talk to the golden shield, which he thought was maybe a bit too flowery, too much -- the golden shield was maybe too much pathos -- but it is still true that we don't know how to get any [inaudible] algorithm done. So here we are with the best known attacks running in about the square root of the group order. So in our case, the group order -- it was a curve over a 131-bit field, we have a cofactor of 4, so the group order has 129 bits. So take the square root of that, and you're down to 64.5 bits. That means about 2 to the 65 operations, where 'operations' still need to be defined. To get there -- well, I was skipping over this factor of 4 because, well, we work in a prime group, and, yes, Certicom is fully aware that there is something called the Pohlig-Hellman attack which breaks down any big discrete logarithm problem into small discrete logarithm problems in the subgroups. So, yes, we focus on prime order groups. Then Baby-Step Giant-Step is the easiest if you want to explain it to somebody. You make one big table by jumping around in big steps, and then you check by jumping around in small steps, and at every moment you check whether you have a match in the table. And you set up these tables so that there must be a match. The drawback is, well, you have to store a table of square-root-of-the-group-order size. Pollard's rho method has the same running time up to small constants but avoids the storage. So what Pollard's rho does is it grabs an element, grabs another element, and checks whether they're the same. And you know, when you have a bag with L balls, then the birthday paradox tells you after about square root of L draws you'll get the same one twice. So that's why there's always this square root of L up here in the attack estimates. There are things like multiple-target attacks. For instance, you want to attack one curve and you know that many people are working on that curve -- yes, you can kind of [inaudible], but Certicom just gave us a single challenge. So we do not have to worry about multiple-target attacks. That doesn't work here. Then what does Pollard's rho method do? This [inaudible], it actually does a walk. So if you want to have any knowledge of what it means to draw the same ball twice, you want to relate this to the discrete log problem. So we make a walk where at every moment we know where we are in terms of multiples of P and Q. And if we jump around and we hit the same point again, then we have two expressions of the scalars for how we got there. So for this jumping around there is a function, which is called the iteration function F here, which jumps from one point PI to another point PI plus 1. And then it's up to us to find a way to make this function look random so that we can use the birthday paradox, so that -- well, for the usual estimates it's square root of the group order. Here's one way of doing this. Assume I know how I got here. I'm at PI and I know I have AI copies of P and BI copies of Q. And then there's a way of getting on from this PI -- whenever I hit this PI I'll do the same step, I go here. And I know how many additional copies of Q and of P got me here. And then I move like a figure on a board game. At every moment I go somewhere and I look at what's on this field, and that tells me go that way, go that way, go this way. 
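[Editorial sketch: a minimal toy version, in Python, of the adding walk being described -- not the project's code. A small prime-order subgroup of (Z/pZ)*, written multiplicatively, stands in for the elliptic-curve group; cycle detection is done naively by storing every visited element rather than with the rho or distinguished-point machinery discussed later; and the collision step described just below (dividing by BI minus BJ modulo the group order) is spelled out at the end. All concrete numbers are illustrative only.]

```python
# Toy adding-walk "board game": walk deterministically, record how each
# position was reached as a multiple of P and Q, stop on the first repeat.
import random

p   = 1000003            # small prime; p - 1 = 2 * 3 * 166667
ell = 166667             # prime order of the subgroup we walk in
P   = pow(5, 6, p)       # element of order ell
k_secret = random.randrange(1, ell)
Q   = pow(P, k_secret, p)            # the "challenge": find k with Q = P^k

r = 16                               # precomputed steps R[j] = P^a[j] * Q^b[j]
a = [random.randrange(ell) for _ in range(r)]
b = [random.randrange(ell) for _ in range(r)]
R = [pow(P, a[j], p) * pow(Q, b[j], p) % p for j in range(r)]

def step(x, ai, bi):
    j = x % r                        # "what is written on this spot of the board"
    return (x * R[j]) % p, (ai + a[j]) % ell, (bi + b[j]) % ell

seen = {}                            # element -> (a, b) of the first visit
x, ai, bi = P, 1, 0                  # start at P = P^1 * Q^0
while x not in seen:
    seen[x] = (ai, bi)
    x, ai, bi = step(x, ai, bi)

aj, bj = seen[x]                     # first visit:  x = P^aj * Q^bj
                                     # second visit: x = P^ai * Q^bi
if (bj - bi) % ell:                  # so P^(ai-aj) = Q^(bj-bi); divide by (bj - bi)
    k = (ai - aj) * pow(bj - bi, -1, ell) % ell
    print("recovered k:", k, "correct:", k == k_secret)
else:
    print("unlucky collision, rerun with fresh random steps")
```

[On this toy size the repeat shows up after roughly square root of pi*ell/2, about 500 steps; the real attack replaces the "store everything" detection by the rho shape and distinguished points explained in what follows.]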
And on the side I keep my counters of how many Ps and how many Qs I have. And then after I walk around the whole thing, I come back here. Now I'm here at PJ. So I have my scalars AJ and BJ. But I'm at the same point. Now, assuming that my walk went through the whole thing, it's fairly unlikely that BI and BJ are the same. I will have added something. Well, there's still the group order; they might be the same modulo the group order. But usually they are not. So I have this expression there, and if they're not the same modulo the group order, I can divide by BI minus BJ. And suddenly I know the relation between P and Q. Now, for a while I'm going to talk to you about how I make this board game, how I lay out the -- pull a card and now we go this way, now we go this way. If you do this, then this is called an adding walk, because at every point you add something to it. So this point here will tell me go forward by one, go left by three, meaning add P, one copy of P, and add three copies of Q. That's the easiest way to make such a board game plan. And it should somehow depend on the spot where I am. Now, I don't want to draw the whole board game -- and I don't want to draw the whole board game because it's going to have 2 to the 130 little spots to stand on -- so I have to get something, some property of the spot where I'm standing. So I take the representation of this point, take a hash function and use the output of this hash function to get my new instruction. So this point will have some binary representation. I take this binary representation, run it through a hash function, and the hash function then says, ah, this output, this short output, now tells me one forward, three to the left. But also this point over there might tell me the same. So I will reuse instructions. But still, if I have enough different instructions, this will look relatively random. So here's an example of what it does look like when I print out the whole board game plan. So every little dot here is a spot you could stand on, and then, well, there's some arrow that sends you this way, this way. So assume you're standing here, you go up, up, up, up, up, and here, well, then I have to get closer to see, oh, it sends you over here. Now, what happened here is this thing was sending the little figure up, up, up, up, up and down, then didn't go straight here but took this way in the crossing, then took this way in the crossing. Now, if the little board game figure comes around again here, it is on the same spot. It has the same binary representation. So it will get the same instruction. Each time the little board figure comes here, it is told to turn around and go this way. Each time it comes to this crossing, it's turned around here. So it will keep circling forever there. That might be kind of disappointing for the board game figure, but it's actually what I want to have. If I see my little figure circling, and there are ways to figure this out, then I am in a situation where I am here with PI and I'm again here with PJ. That was there on the previous slide. That's jackpot. Then I know how I got here. And I know how I got here in two different ways. I know how I got here in the first round and I know how I got here again on the second round. This method is called Pollard's rho method because, well, Pollard suggested it. And if you look at this thing and use a bit of imagination, it looks like a Greek rho. So hence the name of the method. >>: Tanja? >> Tanja Lange: Yeah. >>: Looks like you have something that isn't a [inaudible]. 
>> Tanja Lange: Here? >>: Upper left one here. >> Tanja Lange: Here? Well, okay, there's one thing: you don't see how these things are oriented. And honestly I don't see it either from here. You're worried about this clump here? >>: Yeah. >> Tanja Lange: So I'm not sure whether this one would be the loop or this one would be the loop. There are also sometimes stray components, like down here. Sorry. There is none there. No, this is because it wasn't a group; this was just laying out the floor plan. >>: Oh, okay. >> Tanja Lange: So this is not that we solved a discrete log problem; this was just taking 1024 elements and giving each of them a random direction, and then you can still observe the same effect. Now, if I want to run this on my computer, then, well, I told you already that we had this ECRYPT meeting with lots of different experts on different platforms. So if I get somebody who knows ASICs, somebody who knows FPGAs, and then, well, us knowing quite a bit of software, should we then all start the same computation? Should we then all say, okay, well, now all forces on this curve, let's start computing? Well, if we all start our own personal Pollard's rho computation on, say, N different computers, then with good probability you get a square root of N improvement, somewhat like the -- to find this collision sooner. But it's a bad tradeoff. You do N times the work. You have N computers running. But you only get square root of N times the chance. That's where another idea comes in -- that's -- I mean, what I'm reporting so far is history. That's nothing that we came up with. This is called the parallel rho method. And the idea behind this is: instead of having each person find this whole rho shape, let's get another way of noticing that we hit the same point. Like, I walk a little bit and whenever I hit one of those power boxes here, then I phone home and say, oh, I've reached a power box here and, by the way, my current position in As and Bs is the following. And then some other computer -- now, there's another board game figure coming around, and it hits, say, this power box, it phones home and says I'm standing here. And after a while, from the third or fourth or whatever computer, somebody ends up on this power box, the same one that I was at before, phones home, and then home says, oh, huh, I now have two from different computers who have been at this spot, and then we have a collision and can compute things, assuming that we started from different positions. If we started from the same position, then we didn't gain anything. But if we started from different positions, then we have found a collision. So that's the background of the parallel rho method. I have N different computers; at each computer I send off my little board figures from different positions, I remember where they come from, I know where they're going, and they always phone home. Well, we don't call them power boxes anymore, we call them distinguished points. And the way I recognize a distinguished point is that I look at the binary representation and I say, okay, I take those which have the last 15 bits equal to zero, for instance. I need to look at this representation anyway because that was the way I decide on my next step. So I now look at this, (a), to decide where to go next and, (b), to decide whether I'm on a special point. This, if I do the math, gives me a factor of N speedup. It changes what the picture looks like. I now don't have this huge walk ending up in a rho. But I have lots of little blocks. 
For instance, here, this red guy walks a little bit and then finds something which I drew in bigger dots. Those are the distinguished points. So you see lots of short walks. For instance, somebody walking here gets to here. And then sometimes there are a bit longer walks. This guy will go and go and go until it gets down here. Now, up here we have the collision that we are hoping for. The blue guy was going and hit this black dot. At that point it reported its position. And the orange one from up here was coming down, and as of here, they're walking on the same path. They get the same instructions because they're on the same spot. And as soon as they hit a distinguished point, they actually inform the home base of this. And at that point I can stop the computation because -- yeah? >>: On average how many distinguished points do you need? >> Tanja Lange: How many distinguished points I need? Well, there is no single answer to this. For instance, if I -- so it is my choice what I use for the distinguished point criterion. If I use a very restrictive criterion, like I have 130 bits in my representation and I want to distinguish a point only if 65 bits are zero, that's very rare to happen. So it will take many, many, many steps to get there. So we'll have long walks and very few distinguished points. Alternatively, I can have short walks and many distinguished points. It is still the -- well, square root of the group order -- number of iterations, of steps, that I need to do in total. So that number doesn't change. But there's a little bit of a tradeoff. Each time I hit a distinguished point, we have extra costs for phoning home. So we have networking costs. So we shouldn't have too many of those. Also, the home base has to store all of this. So to give you an example: in the attack that we're currently running, we estimated that we will store 850 terabytes. And we checked and, yes, our cluster's big enough, we can do this. It is nice not to have too long walks because, well, at the point where we are done, if we have like a long walk going here and not finding anything, that will not help much for the other ones. So we'll have some final wasted computations going on. But in the end we will have -- this is the total computation; if I divide this many iterations by this many distinguished points, then that's the average walk length. So there's a tradeoff. So -- yeah. >>: [inaudible] how do you draw this picture? >> Tanja Lange: I asked them to draw the picture. >>: [inaudible] every dot is? I don't know -- >> Tanja Lange: Every dot -- well, this is not really a group. This is just assigning to 1024 elements a direction. So these are the elements, and then the arrow coming out of them gives you the direction. But that's randomly assigned. And then what did you use -- >>: There's a package called [inaudible] which -- it tries to shorten the arrows. So starting from the random graph it tries to shorten [inaudible] dots close to the [inaudible]. >>: So you have a dot for every element [inaudible]. >>: There are little arrows. It's a bit hard to see from back here. But each dot has little arrows. >> Tanja Lange: I mean, each element gives you another element, like gives you a connection, and it's directed. And then you lay this out and you use that graph-drawing program to combine those. Okay. So far for the general parallel rho method; now comes the point where Koblitz curves are different from other curves, [inaudible] the curves are different from generic groups. 
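[Editorial note: the tradeoff being discussed follows the usual back-of-the-envelope formulas for the parallel rho method. A rough sketch in standard notation, where L is the size of the set being walked over and theta is simply the fraction of points counted as distinguished, e.g. theta = 2^-15 for "last 15 bits zero":]

```latex
\text{total iterations, summed over all machines} \;\approx\; \sqrt{\pi L / 2}, \qquad
\text{expected walk length to the next distinguished point} \;\approx\; 1/\theta, \qquad
\text{distinguished points to store and compare} \;\approx\; \theta \,\sqrt{\pi L / 2} .
```

[So a more restrictive criterion (smaller theta) means longer walks, less phoning home and less storage, but more work wasted at the end on walks that are still running after the colliding distinguished point has been produced; a looser criterion means more network traffic and more storage. The total iteration count itself stays the same.]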
Namely, if you look at P and minus P, then they have the same X coordinate. And those of you who sat through the Rump session on Monday and saw Peter Schwabe's [inaudible] presentation have already seen that this can be used as a speedup. If I -- instead of saying, oh, I have L points, I now consider L over 2 pairs of two points, which I identify by their having the same X coordinate. Then the probability of drawing the same element twice has improved by a square root of 2 because I have fewer elements in my set. So if I'm able to work on pairs of points -- which, well, looks very easy because the X coordinates are the same, so it's very easy to identify two points by just looking at the X coordinates instead of looking at X and Y -- I need to pay a bit of attention so that the walk from P is the same as the walk from minus P. Otherwise I screw up my pairs. So the whole thing has to be organized on pairs of points. The usual way of doing this is to say we define something like an absolute value of the point. We use the X coordinate, and of the two Y coordinates we use the one where a certain bit is zero. For instance, on a Koblitz curve you have Y and you have Y plus X. And you know that the X has [inaudible], so there's always some way of getting a unique point out of the two. Sometimes that's quite a bit of computation. In the case that Peter was presenting, in the odd characteristic prime field, it's very easy. You just want the last bit of Y to be zero. In the [inaudible] case, a little bit more work is necessary. Well, you can take, say, whichever is the minimum, for instance. And then we add to this point the next step. Then, what Peter was reporting: it might happen that you run into so-called fruitless cycles. So you start at some point PI, you add one of those steps. That gives you PI plus 1. And then PI plus 1 will grab another point. I mean, I do one step. And from here I have a certain possible set of jumps, of steps. And I might take the same one. And if in between, by taking this absolute value, I happen to have a negation in there, then I've just gotten back -- hmm? You're negating [inaudible]. So I will keep walking like this for a while, forwards and then backwards and so on. So at this point the little board game figure is again in a cycle. But this is not a happy cycle. This is an annoying one. It's just one step forward, one step backwards. And I don't learn anything from this. I don't get any knowledge of the discrete log problem. So this is why it's called fruitless. It doesn't give me any knowledge about this discrete log problem. You can also do this with four steps around or six steps around and so on. So the two-step one is the easiest to write on the slide. Otherwise it gets a bit longer to find out what the conditions are. But it does happen. So I said before we can detect cycles and we can work out of them. And Peter's -- the whole point of Peter's talk was to say, yes, we know how to handle those guys. At the same time, it's nice if you don't have to. The next part is what else can we do if we have a Koblitz curve, and how is it that Koblitz curves make it easier to avoid such trouble. On the Koblitz curves, we, first of all, have another thing. I told you that we have this jumping around by squaring both X and Y. So instead of saying, oh, we have the same X and then plus-minus Y, we now have another function which is very easy to compute, just computing squares, and we can identify all the points which are in the same cycle under Frobenius. 
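[Editorial note: written out in standard notation, the two-step fruitless cycle goes like this. Let |P| denote the chosen canonical representative of the pair {P, -P}, let R_1, ..., R_r be the precomputed steps of an adding walk, and let the walk be f(P) = |P + R_j| with j = h(P), where h is chosen so that h(P) = h(-P).]

```latex
\text{If } f(P) = -(P + R_j) \quad \text{(the representative happens to be the negative; probability about } 1/2\text{)} \\
\text{and the next index repeats, } h(f(P)) = j \quad \text{(probability about } 1/r\text{)}, \text{ then} \\
f(f(P)) \;=\; \bigl|\, -(P + R_j) + R_j \,\bigr| \;=\; \bigl|\, -P \,\bigr| \;=\; |P| ,
```

[so after two steps the walk is back in the class it started from and will repeat those two steps forever. Heuristically this happens with probability about 1/(2r) per step, which is why such cycles have to be detected and escaped -- or, as with the Frobenius-based walk described next, avoided altogether by the choice of iteration function.]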
I'm standing here and I'm squaring X and Y, I'm squaring X and Y. Once I've done this 131 times, and I'm [inaudible] over F2 to the 131, I'm back to where I started. So instead of doing pairs like two points plus and minus P, I can now do classes which contain 2N points: plus and minus all of those different Frobenius powers. It's reasonably fast to find out who else is in my cycle, so I can find a unique representative, and I've now divided the number of elements to look at by the 2 from the plus-minus and by the N. So the schoolbook speedup that I get for attacking a Koblitz curve as opposed to a random curve is a square root of N speedup, where N is the extension degree. So in this case N was 131, so expect that to save about square root of 131, which is why we settled for the Koblitz curve rather than for the general binary curve. The Koblitz curve is about 2 to the 7 easier. Now, how do I get this -- [inaudible] -- 3.5, not 2 to the 7. Thanks. Yeah. I was saying 131 is about 128, which is 2 to the 7, but take the square root of that. Now, how do I build [inaudible] function? I could define a canonical representative. I said before for the plus-minus that's easy. I say, well, least significant bit. In this case I have to do all the squarings, look at them, store all those points and say, ah, that one is the smallest, I use that. I could do this. It has been proposed. And in the world of square root speedups, this is still a speedup of a square root of N. You won't quite see the speedup of square root of N because it could cost you a lot of effort to compute all those points, to sort those points, to compare those points, but it still would work. And, yeah, squarings are not so bad in [inaudible]. You could do all of that. But there's a nicer one. So Harley observed this in the -- well, used it in an implementation about the same time Gallant, Lambert and Vanstone were writing a paper and pointing it out; namely, if you do this whole thing in a normal basis representation. So far I've not talked about how I represent this field. But if it is a normal basis representation, then the Hamming weight of X and of X squared and of X to the 4 are the same: in normal basis all I do is shift, I do a cyclic shift of the coordinates. It doesn't change the Hamming weight. It doesn't change the number of 1s in the representation. So here's an easy measure that grabs all the points -- [inaudible] there are more points of Hamming weight 3 -- but all of those points in the Frobenius cycle will have the same Hamming weight. So their suggestion was to use this Hamming weight as a way to determine where to go. So what they suggested was we take the Hamming weight -- well, or do some J which depends on the Hamming weight -- and then, instead of finding a unique point, we take the Jth Frobenius power of the point added to the point. Here's a short calculation to show you that this is compatible with plus and minus. If instead of PI I put in minus PI, well, [inaudible] minus PI here, [inaudible] minus PI here, which I can pull out of the parentheses, so if I have minus PI, I'll also have minus PI plus 1. So I stay in the same class. Similarly, if I have a sigma to the I -- well, it doesn't quite -- so a sigma to some I here and some other number there, let's call this I prime; so this is not the same, there's some power of sigma, and I can pull it out of here. So if I take this as the definition of the walk, it's well defined on classes. 
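[Editorial note: the calculation on the slide, written out in symbols. Here j(P) is some function of the Hamming weight of the normal-basis representation of the x-coordinate, so j(-P) = j(P) and j(sigma^{i'}(P)) = j(P), since negation leaves x unchanged and sigma only cyclically shifts its bits.]

```latex
P_{i+1} \;=\; f(P_i) \;=\; P_i + \sigma^{\,j(P_i)}(P_i), \\[2pt]
f(-P_i) \;=\; -P_i + \sigma^{\,j}(-P_i) \;=\; -\bigl(P_i + \sigma^{\,j}(P_i)\bigr) \;=\; -f(P_i), \\[2pt]
f\bigl(\sigma^{i'}(P_i)\bigr) \;=\; \sigma^{i'}(P_i) + \sigma^{\,j}\bigl(\sigma^{i'}(P_i)\bigr)
   \;=\; \sigma^{i'}\bigl(P_i + \sigma^{\,j}(P_i)\bigr) \;=\; \sigma^{i'}\bigl(f(P_i)\bigr),
```

[so applying f to any element of the class {±sigma^{i'}(P_i)} lands in one and the same class -- exactly the well-definedness being claimed.]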
Gallant, Lambert and Vanstone suggest using a hash of the Hamming weight, where the hash takes all values between 1 and N. So in our example [inaudible] between 1 and 131. Harley -- he didn't write it up as a general strategy. He used this in his attack on the ECC2K-108. And he's good at [inaudible], so he was very -- he's aware that it costs you a lot to do a full-size 108, or in our case 131, squarings at once. And he reduced it to a small set. You might notice that 3 is missing here. So actually the way that he computes the J is he takes the Hamming weight, he reduces it mod 7, and then he adds 2. Which means that you're in the set 2, 3, 4, 5, 6, 7, 8, and then he also replaces 3 by 1. So his software is online. You can see that this is exactly the set of steps that he is doing. We looked at this for our thing now; first of all, 131 is somewhat bigger than 108. It makes much more sense to do this approach than to do an adding walk where we have to find a unique representative. So, yes, very quick decision, this is the right approach. But then we don't want to go with the GLV idea of taking all powers. It's nicer to have a restricted set. And there are many ways in which this is nicer. It is nicer for the implementation if you're worried about code size. If you only have to implement seven, as in Harley's case, or eight, which will be our suggestion, different iterated squarings, it's easier than if you have to implement 131 different ones. Not so much a problem in a general Core 2 implementation. Somewhat of a hassle if you're on a Cell architecture where the code size actually competes with your registers, with your memory. So there are reasons even in software architectures that you don't want to have so much choice. And then when you talk to your friend and he's laying out what an FPGA would look like, or the RFID, then, yes, every single different choice costs you area. So you do not want to have 131 different choices. 8 choices, maybe 16 if I'm generous today. But you don't want to have that many. And then also in software, what we settled on is an approach called bitslicing. And that is something where you compute many elements at once. So we're actually doing this whole walk on 131 -- 128 points at the same time. Now, all these 128 points have to somehow follow the same instruction path. There are no branches. Well, I just told you that each point decides on its own. Well, so that means you do the same computation for everybody, so everybody costs as much as the sum of all the computations, and then you pick the right results. You mask somehow so that you get the right ones. So it's not that only some would have a huge squaring to do; everybody would have to do the huge squaring in this implementation setting. There's also a little bit of a worry that we end up in the situation where we go one step forward, one step back, one step forward, one step back, like having a cycle, a fruitless one. If I allow a huge set of coefficients, it might be that I find something which adds up to zero. If I have only very few coefficients, then I can actually do the computation to see that it's not going to happen. So what we did when we were designing the iteration function was we settled on having a small set of different powers. We said we want to have 2 to the 3, well, 8 different choices, say, and then we looked at eight different choices. We want them in a consecutive interval. And we looked: if we start with squaring once, squaring twice, up to squaring seven times, it is possible to get such a loop. That's not good. 
But if we shift them by 1, no, shift them by 2, then the shortest combination of all of those powers is very large. We just used a lattice algorithm to compute the length of the shortest vector. Now, the shortest vector will have negative coefficients. So it doesn't really tell us exactly what we want to have. But it gives us a lower bound. I mean, if I allow you to use negative coefficients, you already have to use these huge ones. Well, if I then say, oh, you're only allowed to use positive ones, it can only be larger. So that's why we say we have a way of avoiding fruitless cycles by doing this. So instead of avoiding them in the sense of taking care of them, which is what Peter was reporting for the odd characteristic, we just don't run into those by this choice of the iteration function. And then having a consecutive interval is better -- that's just an implementation issue: if I have to do all of them, well, if I have the square and I have the fourth power, then it's convenient to continue like this. So our definition of the iteration function is: we take that Hamming weight -- technicality on the side, we notice it's always even, and if we just want to have eight numbers and we compute mod 8, then we would only get four different values. Well, so we divide the Hamming weight by 2 and then compute mod 8, and then, to avoid such short combinations, we shift the interval by 3. So here are the Js that we use for computing -- the Jth powers of Frobenius [inaudible] steps. So what it costs us is to compute the Hamming weight, to do the normal basis representation -- actually, we compute in normal basis representation throughout, so it doesn't cost much -- then we check for distinguished points, where our criterion was that the Hamming weight is less than or equal to 34. So that is the number which gives us the 850 terabyte of storage. If we had been a bit more generous, collected more points, it would have overflowed the capacity of my cluster. If we had been less generous, so more restrictive on finding those points, then we would end up having very long runs on the machines, and that is sometimes wasteful. Well, and then for the computation, we have to find our J and we have to compute sigma to the Jth power, so X to the 2 to the J and Y to the 2 to the J, and add the points. Then we sat down and analyzed what it means to have this iteration [inaudible]. I started by saying: are we pulling out of our bag of L elements totally randomly? And then I was talking about this board game which still had a fairly random layout, and then said, okay, well, now if we restrict things we have to make sure that this assignment of steps is still sufficiently random. Because you can only hope for the birthday attack, a birthday bound of square root of L, if you have a random walk or if you have a random way of taking elements out of your bag. I mean, I could be doing a very nonrandom walk just in a circle and it wouldn't tell me anything. So we have to analyze how close our walk is to a random walk. So, how much can we hope that we get this square root of the group order? Now, our group order, I mentioned before -- there's a cofactor of 4, get that away -- then we hope to get a speedup from using the negation, the pairs of points, and from the Frobenius, so having 131 times 2 points regarded as one element. We do lose a bit of randomness by restricting to only eight choices for the J. So in some sense the GLV approach of taking a hash function on this gives you more randomness. 
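[Editorial sketch: what one client iteration then looks like, in illustrative Python rather than the project's bitsliced code. The field and curve arithmetic is hidden behind a hypothetical point type with x_bits() returning the normal-basis bits of the x-coordinate, frobenius(j) computing sigma^j (j squarings of both coordinates) and + for point addition; only the control flow and the constants just described -- halve the even weight, reduce mod 8, shift by 3, distinguished if the weight is at most 34 -- are meant to be taken literally.]

```python
def hamming_weight(bits):
    # number of 1s in the normal-basis representation of x
    return sum(bits)

def is_distinguished(P):
    # distinguished-point criterion from the talk: HW(x) <= 34
    return hamming_weight(P.x_bits()) <= 34

def step(P):
    # j in {3, ..., 10}: the weight is always even, so halve it,
    # reduce mod 8 and shift the interval by 3
    w = hamming_weight(P.x_bits())
    j = ((w // 2) % 8) + 3
    return P.frobenius(j) + P          # P_{i+1} = sigma^j(P_i) + P_i

def walk(P, report):
    # iterate until a distinguished point is reached, then "phone home";
    # the server remembers the starting point, not any a/b counters
    while not is_distinguished(P):
        P = step(P)
    report(P)
```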
But we're talking about a few percent here. The reason for this: well, if you draw this, there are many, many elements which have about as many 0s as 1s. So Hamming weight 66, 65 is very, very frequent. Hamming weight 0, okay, in this case doesn't happen at all. Okay. Hamming weight 2 happens very rarely. 4 happens a little bit more frequently. So you draw this and you get this huge bump in the middle. There are many, many more elements having a Hamming weight around the average than at the extremes. And also when you compute mod 8, you still see this distribution. Sure, we fold in the extreme values, but it still shows. So it's not totally random. But when we analyzed this, the heuristic is it's less than 7 percent. Well, that's what's in the paper right now. We just did something where we looked at -- well, like second-order collisions or anti-collisions -- every way that we do something where we're sure we're not going to have a good collision. So, okay, then we get 7 percent instead of 6.993 percent. You see, it's actually pretty close to 7. So even the more sophisticated analysis doesn't get us much further away from randomness. That means we have the expected number of iterations as in the textbook version times 1-point-something, where the something comes from the percentage that we have. So we'll need about 2 to the 60.9 iterations. And then, well, we make iterations cheap. So the highlights of what we've been doing: we looked at the randomness of the iteration function. We went through this by saying, okay, we could have chosen more Js, we could have said, okay, instead of doing mod 8 we do mod 16 and have 16 choices, but that would make it more expensive to compute a single iteration and would only slightly increase the randomness. Increased randomness means fewer iterations needed. So if we want to have -- if we have an eye on the total computation, like the number of iterations needed to break it and the time it takes for each iteration, that was what we optimized for, and that's where we ended up choosing eight different selections instead of 16 -- or smaller would be 4, but that would be very nonrandom. Then we have a few more things, like: we do not actually remember how we got here. The usual textbook version tells you your little board figure has to keep a count of how many AIs and BIs it has collected by now and keep those and -- well, either have a little scorecard saying I used so many of these steps, or actually compute the integers. We just let our figure run around. We only note where the figure came from. All of the steps are totally deterministic. So the moment that the little figure reaches the power box and phones home, we record where it is. If this power box ever finds a match, we know where those two figures came from. And at that point it's worth doing the computation again and remembering how we got here. But we don't have to bother the FPGA implementation with remembering how they walked here. So he's actually quite happy that we don't tell him, oh, by the way, aside from computing here on the elliptic curve we also have to remember a little bit of how many Js of [inaudible] 8 we've used or, even worse, have to implement some arithmetic modulo the group order. You remember this nice big prime, it looks beautiful, and then we would know exactly which discrete log we have for each of the points in our database. But we don't need this. In the end we need this only for two of them. Realistically, we do not store everything. 
We do store exactly where this little figure came from, and then we store a hash of the result. Otherwise it wouldn't fit in the 850 terabyte. By doing a hash of this, we might have a pseudo-collision, so to say. We might have two board figures apparently arriving at the same spot even though it's just the hash values that match. Okay. At that point the server will do some cycles and find out: nope, sorry, it wasn't a match. And we're not telling our clients to shut down at that point. They will continue producing numbers. So there was a bit of a tradeoff; we also have a nice protocol to send those points, and there was this moment where we were thinking, huh, yeah, so what's the bandwidth of Eindhoven University for incoming traffic and would they actually notice if I use 50 percent of it. Yeah, we decided the answer was yes, and then we reduced the amount of traffic that it costs. And we've so far been happily under the radar. We've had some issues with broken power supplies or my university deciding it's time to do a test and then they shut us down. Not so nice because we have -- that is absolutely right: Dan is holding up a sign saying 850 gigabytes, not terabytes. Thank you, Dan. I apologize. Yeah, otherwise we would not still be under the radar for the incoming traffic. So these highlights give you some idea of what's behind this. So we set off in teams, and there's now a bunch of different spin-off projects. So I've been pointing to Junfeng a few times because he is also a coauthor on this paper, and they have a very nice FPGA implementation which got published this year at the FPL conference. The same Peter who was reporting on the negation map in prime fields was one of the authors on the paper about doing [inaudible] on the Cell for this. So we have a bunch of PlayStations. And as Peter Montgomery mentioned, PlayStations are very convenient, relatively cheap hardware, and we have been lucky enough to get some bits of the big 200-PlayStation cluster sitting in Lausanne. Then we have sat down with GPUs, so Dan's presentation yesterday ended with saying, oh, actually in number theory you should notice there's a multithreaded world, there's a world where you have lots of tiny little processors that all want to do the same thing. Well, it's a great platform for this attack. We can feed them as many computations as they want. I mean, we -- hey, we have a lot to give. We have about 2 to the 60 computations. We can slice them in chunks of a hundred something, we can -- ah, if you want a thousand at the same time, no problem. So this is a great application for highly parallel platforms. A little side result was that those platforms don't actually come in an implementation-friendly way. So if you're a C programmer, then you should be happy with using CUDA, which is what the GPUs want to be programmed in. If you're a good C programmer and are used to writing the speed-critical routines in assembly and you look for the assembly for GPUs, it turns out there is none. Okay. So put that project to the side for a moment -- hey, can't be so hard to design an assembly language for GPUs. Actually, we didn't have to go all the way to designing the assembly language. There was already somebody who had been sitting down with a disassembler to find out what the GPU is doing under certain instructions. That thing was buggy, we fixed it, and we then also wrote a [inaudible] -- that stands for a human-friendly assembler version -- which does part of the [inaudible] allocation for you. 
So we had a few side projects coming out of this. On the math side we improved how we work with normal bases and we came up with something we called optimal polynomial bases. So it's a polynomial basis: if you have a normal basis for your field, you can still speed up multiplication in that field. If you want to know more, then we have some Web pages. There's ecc-challenge.info, which is still anonymous, but you can trust me, it's us. If you want to give me a challenge, like can you make the fifth line say the following, yes, I can do this. We have submitted the paper to a conference which wanted anonymous submissions, and therefore we don't have the names on there yet. But I think it's about time to put names on there and give credit to everybody. We also have a Twitter page. If you go there you can see how often my university screws up and how often the components break down. Like, we just had to replace a Western Digital hard drive and got a broken one back. And there's a bunch of papers that came out of this project. Thank you for your attention. [applause] >> Tanja Lange: Rich? >>: How far along are you? When will you finish? >> Tanja Lange: Well, you can go to the page and find out -- no, you can go to that page. Here is what [inaudible] we have right now. If you look at this, we're like 10 percent through, which is not so great. The main problem is that we are hoping for the FPGA thing, and that one FPGA cluster is worth about four to five Lausannes. >>: Okay. Is that capital L? >> Tanja Lange: Lausanne is capital L, little L is [inaudible] and E is Eindhoven, D is Dublin, J is [inaudible]. We actually got some -- even got some [inaudible] supercomputers, so the graph that we have online is pretty bumpy. I have a few more graphs in the presentation that I didn't get around to showing. That one. So here you can see that we actually put in some effort to make things faster. And the biggest improvement we got, well, they got, is on the FPGAs. So the initial estimates, like October 2009, were 2,000 -- I mean, you would need 2,000 FPGAs to break this -- and now the current estimate is around 600. >>: And is that number of years or something? >> Tanja Lange: No, that many copies of the hardware to run at that speed, which would break it in a year. Then one of those RIVYERAs is 128 FPGAs. So on this scale that is a sizeable fraction. And then we have a few applications in to supercomputing centers, which have lots of idle GPUs, as it turns out, and sometimes actually need to justify their existence. >>: [inaudible]. >> Tanja Lange: Well, we can -- so they could just let us run them and we'd be happy. Further questions? >>: If I recall correctly, there was an estimate earlier that you would actually finish spring of this -- this past spring. Was there an interesting story there? >> Tanja Lange: Yeah. That's related to the FPGA story. That was scheduled to be ready and running last September. Now, in some sense, the FPGA implementation wasn't ready at that point either. If you look at where it was at that point, it was definitely worth waiting till March. March was when Junfeng visited us, and I think that's when he did a lot of intensive work on this and got this huge speedup. So it was definitely worth waiting until then. But since then we're sitting there with the code, and, well, the interesting stories are like: okay, they have the FPGA boards, but they don't have the power supply, because the power supply is somewhere and it's stuck because of the ash cloud. That tells you it was around April. 
They finally had the FPGA boards and had the power supply and were starting to power the thing on, and realized that all but two boards had a little crack because somebody stepped on them. Then apparently they got the boards fixed or got fresh boards and put them in. And this is a small startup company in Kiel, and I'm not sure how far they get with their startup if they have a similar attitude each time. But then they had to come from Kiel down to Bornholm, and it's now sitting in Bornholm and they're now figuring out how to talk to the machine. And at the beginning not even the test programs from the company worked on the machine. But, well, yes -- >>: The FPGA [inaudible] they actually arrived at Bornholm seven days ago. So maybe in a couple of weeks we can have it running and then start something [inaudible]. >> Tanja Lange: But, yeah, I think next spring is a more realistic estimate. But also, if you look at the graphs, there was not much happening at the beginning. Victor. >>: So -- >> Tanja Lange: Do you have computers for us? >>: Hmm? >> Tanja Lange: Do you have some computer time for us? >>: [inaudible] >> Tanja Lange: I should have asked Rich about this too. >>: We're taking donations. >>: That's difficult. No, I was going to say: so once you find the answer, what next? >> Tanja Lange: Well, we are a government-sponsored project, so we can't actually take the 20,000. But we are enough people that if all of us buy airplane tickets to pick up the prize in person and have a beer over in Toronto, then the money will be gone. And then over that beer we can discuss whether it's worth going for that next challenge or not. But I think it's not so much reaching the goal but the things that we learned on the side. >>: Sure. I mean, so have you thought about the things that you've learned here and nice things, how they might be applied to other things? >> Tanja Lange: Yeah. So, I mean, some things are very specific to the Koblitz curve attack, so they could be used for 163, and we're certainly going to sit down and rewrite our estimates. But some things won't help much -- like, okay, the specific iteration function -- but then the analysis of the iteration function is much more accurate than what we had before. Also some of the implementations -- like, I mean, the assembly language for GPUs we have, and we're certainly going to use it for the 131-bit non-Koblitz curve challenge. And, well, maybe it's more interesting to run the prime fields, now that we have a better negation map. But if you have any interesting projects, we're listening with open ears, because, well, once the machines are running, we have capacity to think again. Toni. >>: How did you come up with the 7 percent for a penalty for not having a random walk? >> Tanja Lange: Okay. So the question was about the analysis slide. You mean the 7 percent compared to the 6.993, or in general how that happened? >>: In general. >> Tanja Lange: In general. When you look at where you're sending things and you have a certain out-degree, then at every point -- and you know what the probability is of going each way -- you can compute this over the -- well, it ends up being something like a sum of 1 over 1 minus P squared, where P is the probability for each of the steps. And that gives you like the first-level deviation from randomness, and then what we have now is kind of a second-order one where you have P to the 4th coming in. So there is an appendix of one and a half pages in the paper to explain that formula. But it's a general formula where you can plug in whatever step function you have. 
So we had pretty good understanding of, like, additive walks. Now this is more like a multiplicative walk, because we had 1 plus sigma to the J, where sigma acts like a scalar. And we now also have that one in there. So it's fairly generic: if you have your random walk, or not-so-random walk, you can plug in the probabilities and get the numbers out. >> Unknown Speaker: All right. Well, I couldn't help noticing that Tanja didn't thank the organizers, so maybe we should thank Tanja for the talk and the organizers. [applause]