>> Unknown Speaker: Well, introducers for time immemorial have said the next speaker needs
no introduction. And that speaker needs no introduction. Tanja Lange will speak to us about
attacking elliptic curve challenges.
>> Tanja Lange: Well, thank you very much. Also thanks to all of you for sticking around in spite of no rain or whatever outside. So very pleased to see you here.
The beginning of my talk has already been taken care of kind of by Scott Vanstone who was -- in
his acceptance speech for the plaque mentioned that they had put out some challenges.
So in 1997 Certicom had started marketing elliptic curve cryptography, and was trying to get the community to help build confidence in elliptic curves.
I mean, that's the normal thing if you propose a new cryptosystem. People are skeptical. They
don't really see advantages of the new system over the old one. And maybe, maybe there are
disadvantages. I mean, there has been more research on RSA at that point than there was on
breaking elliptic curve cryptography.
And hence came the challenges, and Scott already mentioned that some of those were exercises, and he mentioned a little bit of questioning of how hard the exercises would be -- some that he thought were hard and I thought were easier.
Anyway, the exercises were solved. And the background, while building confidence in elliptic
curves, it's very nice to see the Elliptic Curve Conference being now in its 14th year, us
celebrating 25 years of elliptic curves in everything including elliptic curve cryptography.
So apparently there is enough confidence or interest in the community. So the marketing of
Certicom certainly worked out.
So here are the easy challenges. So 79 bits, that's the group size. So if you use the general square-root-of-the-group-order estimate, then this is a computation of 2 to the 40, which now is very conveniently done on a laptop, and even thinking back at what computers were like in 1997, that's not a hard computation.
So they're estimating 146 machine days, and they were willing to give away the Handbook of
Applied Cryptography, which I guess they had a few on the shelves.
There were similar prizes for the next challenges. To introduce the naming: P up here means that's a curve over a prime field. The 2 means it's over a binary field, where the curve is defined over the extension field, and K, which appears down here, stands for a Koblitz curve.
So that's a curve defined over F2 and then considered over the extension field. Most of the time the number here indicates the field size, so this is 2 to the 89, F2 to the 89, or F2 to the 79, or here it's a 79-bit prime.
There's some deviations from that on the next page. So these were the easy ones, and they're all
done, so different levels of how easy it was, but, yeah, they're done.
Then the next ones are more serious challenges. You see suddenly there is real money, like
10,000. And sometimes the naming is a bit hard to understand. So, for instance, 108 does not mean it's 2 to the 108 -- sorry, it's still a field of 2 to the 109, but because it's a Koblitz curve, you lose a bit. So apparently that gave us ECC2K-108.
And the statement coming with this was saying the 109-bit level 1 challenges are feasible using a
very large network of computers. The 131-bit level 1 challenges are expected to be infeasible
against realistic software and hardware attacks unless of course a new algorithm for the ECDLP
is discovered.
And then they were going on with challenges at the level where they actually were thinking
about, so in Junfeng's talk he was saying, well, Tanja's needing a team to break the ECC2K-130,
so I guess your boss will not be comfortable with this and your boss will ask you why weren't we
going for overkill, if you can be just comfortably fine here.
Comfortably fine in Certicom money just means 30,000.
So at prices that we have right now, I'm very sure you cannot buy hardware for 30,000 that will break this, which means it's secure.
Well, seriously, we did some estimates of how hard it would be to break those, and it's certainly not within that budget. Then for the other ones it gets harder, the estimates go up to quite a long
time to break it. And the prize money would be quite noticeable.
Now, the history of this, I mentioned all the exercises were broken quickly. So in particular
Harley was -- Rob Harley was very visible in doing that, so he must now have a few copies of
the Handbook of Applied Cryptography on his shelf.
But then he also went for the more serious challenges to get a bit of money. And it says Harley,
et al., so he usually had a team of people around him led by him, or the first one led by him and
Baisley. And then [inaudible] 2000 he lost interest in this and Chris Monico was starting to do
the remaining somewhat doable challenges.
So as of 2004 all the 109-bit level challenges are done, are solved. And since then not much has
happened.
And when we looked at this, the last update to the document from Certicom was in 2003, and
this still said that all the 131-bit challenges are infeasible. So there wasn't a point when these
challenges had been solved or were about to be solved.
Okay. And so we had a good time. We had an ECRYPT meeting in Lausanne and we were
hanging around with a bunch of people doing, well, [inaudible] research retreat. So we meet
with people who have similar interests, and then we say, hmm, with our knowledge, with our
abilities around here, what could we do that would be interesting. So we brainstorm a little bit.
And it turns out that we have actually around the table quite a few people with expertise and
different levels of implementations going all the way from ASIC design, FPGA implementation,
so, for instance, Junfeng was around there, and so we had the FPGA expertise, going through -- well, here you see a few people, that was actually the next meeting, so it's not entirely the same
team, but we thought, yeah, with this group of people we can at least analyze from modern
architectures and for hardware-like architectures, like ASIC, how much does it cost to build the
ASIC to attack those.
And so we wrote a biggish paper analyzing the -- giving estimates of how expensive it would be to attack the next target, the ECC2K-130, and then the non-Koblitz binary curve and other binary curves. We wanted to focus on F2 to the N as a base field just because it's similar in arithmetic, so it's also nice for the hardware implementations.
And admittedly we were kind of tempted by this "infeasible". Just somebody tell us you can't do this. We felt, well, a little bit challenged, I admit it, but it was this feeling of yes, we're going to show them we can do this.
Oh, well, the first effect was that the Certicom document now reads: May be within reach. So,
well, I guess we had success. We managed to update the Certicom document. We're still hoping
for the updated Certicom document where it says: Broken by the VAMPIRE team, but that will
come eventually.
Another outcome of this is that a bunch of people now have a few more papers on their CV,
which as an academic is also very nice. So the first paper that we wrote was as a direct output of
this research retreat, appeared at SHARCS. SHARCS stands for Special Hardware Attacks on
Cryptographic -- Dan?
>>: Systems.
>> Tanja Lange: Systems. Okay. Good.
So the SHARCS workshop has been going for a few years. It's something that Christof Paar and
I started initially. And it's a workshop where we look at actual attacks, like how much does it
cost you to break blah, where blah is either something which is a downscaled version, like 130
bit, or something where we don't see we can build it but it would be interesting to know how
much it costs you to break RSA 1024. So that type of paper appears at SHARCS. And so our
analysis of the elliptic curve challenges was fitting nicely in there.
Then after we thought, yeah, maybe within reach, even Certicom now says maybe within reach.
Good.
So the target, and that's what the rest of this talk focuses on, is this particular curve, which is a
Koblitz curve. Looks like this. So it has no X square term here. It's defined over a field of 131
bits. The challenge states the prime group order. So the whole curve has order 4 times this big
prime.
And then certain -- Certicom says, well, we randomly picked a point by randomly picking X
coordinate, checking that there's a Y coordinate, and then multiplying by the cofactor that gave
us the P. We know that P has order L. And then it's the same with getting another random point
Q.
So Certicom says we don't know what the result of the challenge is. So we cannot solve the
challenge by breaking into Certicom headquarters. They don't know either.
So our challenge now is find the K so that Q equals K times P. So we thought this was a worthy
target, not only because of the 20,000 Canadian dollars, but actually for scientific reasons. There
are people who do propose using very small finite fields for applications such as, well, RFID.
You saw how tempting it would be to save a little bit more area and go down.
So there's a proposal for TinyTate, this is a pairings application, and in general RFID. You
might say okay if it's just on a box of milk, then the lifetime of this chip is so short that we don't
worry about an academic attack and it still takes more than a year to break it. But still they
should at least know how weak or how strong it is.
Now, going into the details of what this means, to attack this thing we first should understand
how it is used in a cryptosystem.
So this is the binary curve. So the addition on this Weierstrass curve -- we did not move it to Edwards form. Edwards forms are very nice for implementations, but for the attack we stuck with the Weierstrass form here.
If you add two points that have nothing special to do with each other, then the sum is given by the usual slope formula, and if you have a doubling, then it looks like this.
So each of these operations, whether it's an addition or a doubling, costs you one inversion, two multiplications and one squaring.
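For readers following along, here is a minimal sketch (not the team's code) of these affine formulas in Python, over a toy field F2 to the 8 with the AES reduction polynomial instead of the 131-bit challenge field; the curve coefficients A and B below are illustrative only.

```python
# Minimal sketch of affine addition/doubling on a binary Weierstrass curve
#   y^2 + x*y = x^3 + A*x^2 + B
# over a toy field F_{2^8}.  Toy parameters only; not the challenge curve.

M = 8
RED = 0x11B            # x^8 + x^4 + x^3 + x + 1 (the AES reduction polynomial)
A, B = 0, 1            # Koblitz-shaped coefficients, chosen for illustration

def fmul(a, b):
    """Multiplication in F_{2^M}: carry-less multiply, then reduce modulo RED."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    for i in range(2 * M - 2, M - 1, -1):   # bring the degree back below M
        if (r >> i) & 1:
            r ^= RED << (i - M)
    return r

def finv(a):
    """Field inversion as a^(2^M - 2); slow but fine for a sketch."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = fmul(r, a)
        a, e = fmul(a, a), e >> 1
    return r

def ec_add(P, Q):
    """One affine addition or doubling: 1 inversion, 2 multiplications, 1 squaring.
    Special cases (P = -Q, the point at infinity, x = 0) are ignored here."""
    (x1, y1), (x2, y2) = P, Q
    if P != Q:                                # generic addition
        lam = fmul(y1 ^ y2, finv(x1 ^ x2))
        x3 = fmul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    else:                                     # doubling
        lam = x1 ^ fmul(y1, finv(x1))
        x3 = fmul(lam, lam) ^ lam ^ A
    y3 = fmul(lam, x1 ^ x3) ^ x3 ^ y1         # same chord/tangent expression both times
    return (x3, y3)
```

The one inversion per step is the dominant cost in this affine form, which is exactly the count given above.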
If you wanted to use this curve for cryptography, so for constructive applications, you would be interested in avoiding the inversion, going for projective formulas and so on. And [inaudible] mentioned in his talk that there's the Explicit-Formulas Database. So if you're interested in finding out how to constructively work on these curves, go there for other formulas.
For attacking those curves, we need to have unique representatives of each point. So we cannot
work with a projective representation. At every moment we want to have one point represented
in one way on the computer.
Then the reason why Certicom is putting in binary curves and Koblitz curves separately is that
Koblitz curves are very interesting for implementations. They have certain features that make
them faster.
So what Koblitz observed in 1991, and which is why we now call them Koblitz curves -- except for Neal; when he gave his talk he was talking about ABC curves, well, it's kind of nicer, you know, anomalous binary curves -- he observed that we have an extra structure on these.
Usually if I give you a point and tell you find me another point, you go and double the point,
triple the point, quadruple the point. This is your way of jumping around. And then you can add
those. But you also stay in the cyclic group and just move by computing multiples of the point.
On a Koblitz curve, there is a further operation; namely, if you have a point (X, Y), which is defined over the big field, then because these -- all the coefficients of the curve are 0 and 1, so they are in F2, the subfield -- if you do the math and raise both sides to the power of 2, then nothing happens to the coefficients because they live over the base field, and suddenly you have that your point raised coordinate-wise to the power of 2 is also on the curve.
So here we have another way of jumping around on the curve. You have an (X, Y) and then the next point is (X square, Y square). That's also on the curve. If you're in a prime order group,
this will be some multiple of your point. So it's nothing that really gets you out of your system.
It will be again some Kth multiple. But K is a big number usually. It's not like one, two, three,
four, five. And even though it's a big number, this is a very cheap computation.
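Spelled out, the step of putting a power of 2 on both sides is just the following, assuming the Koblitz shape with coefficients a, b in F2:

```latex
y^2 + x\,y = x^3 + a\,x^2 + b
\;\Longrightarrow\;
\left(y^2\right)^2 + x^2 y^2 = \left(x^2\right)^3 + a\,\left(x^2\right)^2 + b ,
```

because squaring is a field homomorphism in characteristic 2 and fixes a and b. So sigma(x, y) = (x^2, y^2) lies on the same curve.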
And then to use this cheap computation, well, Koblitz in his initial proposal said it would be
good to use this for scalar multiplication. And he had some good idea there. His own version
wasn't particularly fast, but then Meier and Staffelbach came along and shortened it, and then Jerry Solinas had the very nice idea of coming up with sparse representations that are short. They have the same bit length as a usual binary representation, but instead of doing log of the scalar many doublings, you replace each doubling by one of those sigmas. And sigma is just two squarings. That's a whole lot cheaper.
Now, I didn't mention what this was. Sigma is called the Frobenius endomorphism of the curve.
You know the Frobenius automorphism of a field if you're in FP to the N. Then this computes
the Pth power of your element. And this just extends to the curve, and therefore it's also called
the Frobenius.
So addition still stays the same, but you can always reduce the number of additions by going for sparse representations.
Sorry. So that's why people are interested in using Koblitz curves, because they're cheaper when
you use them in scalar multiplication. They make your protocols run faster. And quite a lot
faster because, well, I just made the big expensive doublings very cheap. Usually you have to
pay a price, and there is some drawback in the security of those. And then once you know how
much that is, you can balance those.
So Certicom says, well, let's just put prime fields, general binary fields and Koblitz curves as the examples here, and then people can attack all of those, and this invites dedicated attacks.
The general attacks against the discrete logarithm problem, so you want to find this K which
links the P and the Q, the point that you're given. The best we know is [inaudible]. We do not
know anything like index calculus.
So Neal was referring to his talk on the golden shield which he thought was maybe a bit too
flowery, too much -- the golden shield was maybe too much pathos, but it is still we don't know
how to get any [inaudible] algorithm done.
So here we are with the best known attacks running in about the square root of the group order. So in our case, the group order -- it was a curve over a 131-bit field, we have a cofactor of 4, so the group order has 129 bits. So take the square root of that, so you're down to 64.5 bits. That means about 2 to the 65 operations, where operations still need to be defined.
To get there, well, I was skipping over this factor of 4 because, well, we work in a prime group
and, yes, Certicom is fully aware that there is something called the Pohlig-Hellman attack which
breaks down any big discrete logarithm problem into small discrete logarithm problems in the
subgroups. So, yes, we focus on prime order groups.
Then Baby-Step Giant-Step is the easiest if you want to explain it to somebody. You make one big table by jumping around in big steps, and then you check by jumping around in small steps, and at every moment you check whether you have a match. And you set up these tables so that there must be a match. The drawback is, well, you have to store a table of square root of the group order size.
Pollard's rho method has the same square-root running time, up to constants, but avoids the storage. So what Pollard's rho does is it grabs an element, grabs another element, and checks whether they're the same. And you know when you have a bag with L balls, then the birthday paradox tells you after about square root of L draws you'll get the same one twice.
So that's why there's always this square root of L up here in the attack estimates. There is something like multiple-target attacks. For instance, you want to attack a standard curve and you know that many people are using it, yes, you can kind of [inaudible], but Certicom just gave us a single challenge. So we do not have to worry about multiple-target attacks. It doesn't work here.
Then what does Pollard's rho method do. This join [inaudible], it actually does a walk. So if you
want to have any knowledge of what it means to draw the same ball twice, you want to relate this
to the discrete log problem.
So they say we make a walk where at every moment we know where we are in terms of multiples of P and Q. And if we jump around and we hit the same point again, then we have two expressions for the scalars, of how we got there. So for this jumping around there is a function, which is called the iteration function F here, which jumps from one point PI to another point PI plus 1.
And then it's up to us to find a way to make this function look random so that we can use the
birthday paradox then to assume that it's -- well, for the usual estimates it's square root of the
group order.
Here's one way of doing this. Assume I know how I got here. I'm at PI and I know I have AI
copies of P and BI copies of Q. And then there's a way of getting from this PI, whenever I hit
this PI I'll do the same step, I go here. And I know how many additional copies of Q and of P
got me here.
And then I move like a figure on the board game. At every moment I go somewhere and I look
what's on this field that tells me I'll go that way, go that way, go this way. And on the side I keep
my counters of how many Ps and how many Qs I have. And then after I walk around the whole
thing, I come back here. Now I'm here at PJ. So I have my scalars AJ and BJ.
But I'm at the same point. Now, assuming that my walk went through the whole thing, it's fairly unlikely that BI and BJ are the same. I will have added something. Well, there's still the group order; they might be the same modulo the group order, but usually they are not. So I have this expression here, and if they're not the same modulo the group order, I can divide by BI minus BJ.
And suddenly I know the relation between P and Q.
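Written out, with l the prime group order, the relation she is describing is the standard one:

```latex
a_i P + b_i Q = a_j P + b_j Q
\;\Longrightarrow\;
(a_i - a_j)\,P = (b_j - b_i)\,Q = (b_j - b_i)\,k\,P
\;\Longrightarrow\;
k \equiv (a_i - a_j)\,(b_j - b_i)^{-1} \pmod{\ell},
```

provided b_i and b_j are not the same modulo l.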
Now, for a while I'm going to talk to you about how I make this board game, how I lay out the -- pull a card and now we go this way, now we go this way. If you do this, then this is called an
adding walk, because at every point you add something to it. So this point here will tell me go
forward by one, go left by three, meaning add P, one copy of P and add three copies of Q.
That's the easiest way to make such a board game plan. And it should somehow depend on the
spot where I am. Now, I don't want to draw the whole board game -- and I don't want to draw the whole board game because it's going to have 2 to the 130 little spots to stand on -- so I have to get something, some property of the spot where I'm standing. So I take the representation of this
point, take a hash function and use the output of this hash function to get my new instruction.
So this point will have some binary representation. I take this binary representation, run it
through a hash function, the hash function then says, ah, this output, this short output now tells
me one forward, three to the left.
But also this point over there might tell me the same. So I will reuse instructions. But still if I
have enough different instructions, this will work relatively random.
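Here is a toy sketch of such an adding walk and the resulting rho collision, in Python, using a small multiplicative group instead of an elliptic curve so that it runs standalone; the group parameters, the eight precomputed steps and the use of Floyd cycle finding are illustrative choices, not the setup of the actual attack.

```python
# Toy adding-walk rho: the "hash" x % R picks one of R precomputed steps,
# and the (a, b) counters record how many copies of g and h we have collected.
import random

p, l = 227, 113          # l = (p-1)/2 is the prime order of the subgroup generated by g
g = 4                    # generator of the order-l subgroup
k_secret = 77            # the discrete log we pretend not to know
h = pow(g, k_secret, p)  # "Q = k*P" in the talk's additive language

R = 8
random.seed(1)
steps = [(random.randrange(l), random.randrange(l)) for _ in range(R)]

def walk(x, a, b):
    """One adding-walk step, chosen by the hash of the current element."""
    u, v = steps[x % R]
    return (x * pow(g, u, p) * pow(h, v, p)) % p, (a + u) % l, (b + v) % l

def rho():
    while True:
        a0, b0 = random.randrange(l), random.randrange(l)
        start = pow(g, a0, p) * pow(h, b0, p) % p
        tort = hare = (start, a0, b0)
        while True:                            # Floyd cycle finding: hare goes twice as fast
            tort = walk(*tort)
            hare = walk(*walk(*hare))
            if tort[0] == hare[0]:
                break
        (_, ai, bi), (_, aj, bj) = tort, hare
        if (bi - bj) % l:                      # otherwise the relation is degenerate; retry
            return (aj - ai) * pow((bi - bj) % l, -1, l) % l

print(rho(), k_secret)                         # both values should agree
```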
So here's an example of what it does look like when I print out the whole board game plan.
So every little dot here is a spot you could stand, and then, well, there's some drawing that gets
you this way, this way. So assume you're standing here, you go up, up, up, up, up, and here,
well, then I have to get closer to see, oh, it sends you over here.
Now, what happened here is this thing was sending the little figure up, up, up, up, up and down,
then didn't go straight here but took this way in the crossing, then took this way in the crossing.
Now, if the little board game figure comes around again here, it is on the same spot. It has the
same binary representation. So it will get the same instruction. Each time the little board figure
comes here, it is told to turn around and go this way. Each time it comes to this crossing, it's
turned around here.
So it will keep circling forever there. That might be kind of disappointing for the board game
figure, but it's actually what I want to have.
If I see my little figure circling, and there are ways to figure this out, then I am in a situation
where I am here with PI and I'm again here with PJ. That was there on the previous slide. That's
jackpot. Then I know how I got here. And I know how I got here in two different ways. I know
how I got here in the first round and I know how got here again on the second round.
This method is called Pollard's rho method because, well, Pollard suggested it. And if you look at this thing and use a bit of imagination, it looks like a Greek rho. So hence the name of the method.
>>: Tanja?
>> Tanja Lange: Yeah.
>>: Looks like you have something that isn't a [inaudible].
>> Tanja Lange: Here?
>>: Upper left one here.
>> Tanja Lange: Here? Well, okay, there's one thing that you don't see how these things are
oriented. And honestly I don't see it either from here. You're worried about this clump here?
>>: Yeah.
>> Tanja Lange: So I'm not sure whether this one would be the loop or this one would be the
loop. There's also sometimes stray components, like down here. Sorry. There is none. No, this
is because it wasn't a group, this was just laying out the floor plan.
>>: Oh, okay.
>> Tanja Lange: So this is not that we solved a discrete log problem, this was just taking 1024
elements and giving each of them a random direction and then you can still observe the same
effect.
Now, if I want to run this on my computer, then, well, I told you already that we had this
ECRYPT meeting with lots of different experts with different platforms. So if I get somebody
who knows ASIC, somebody who knows FPGA, and then, well, us knowing quite a bit of
software, should we then all start the same computation. Should we then all say, okay, well, now
all forces on this curve, let's start computing.
Well, if we all start our own personal Pollard's rho computation on, say, N different computers,
then good probability you get a square root of N improvement, somewhat like the -- to find this
collision sooner. But it's a bad tradeoff. You do N times the work. You have N computers
running. But you only get a factor of square root of N.
That's where another idea comes in -- that's -- I mean, what I'm reporting so far is history. That's
nothing that we came up with. This is called the parallel rho method. And the idea behind this is
instead of having each person find this whole rho shape, let's get another way of how we notice that we hit the same spot.
Like I walk a little bit and whenever I hit one of those power boxes here, then I phone home and
say, oh, I've reached a power box here and, by the way, my current position in As and Bs is the
following.
And then if some other computer -- now, there's another board game figure coming around, and it hits, say, this power box, it phones home and says I'm standing here. And after a while from
the third or fourth or whatever computer, somebody ends up on this power box, the same that I
was before, phones home and then home says, oh, huh, I now have two from different computers
who have been at this spot, and then we have another collision and can compute things, assuming
that we start from different positions. If we start from the same position, then we didn't gain
anything. But if we started from different positions, then we have found a collision.
So that's the background of the parallel rho method. I have N different computers; at each computer I send off my little board figures at different positions, I remember where they come from, I know where they're going, and they always phone home.
Well, we don't call them power boxes anymore, we call them distinguished points. And the way
I recognize a distinguished point is that I look at the binary representation and I say, okay, I take
those which have the last 15 bits equal to zero, for instance. I need to look at this anyway
because that was the way I decide on my next walk. So I now look at this A to decide where to
go next and B to decide whether I'm on a special point.
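As a sketch of this phone-home protocol, with a made-up pseudorandom map standing in for the walk on the curve and an illustrative distinguished-point criterion, the client and server bookkeeping looks roughly like this:

```python
# Toy distinguished-point protocol: clients walk until the "power box" criterion
# holds, report (distinguished point, start, length), and the server watches for
# two different starts reaching the same distinguished point.

MODULUS = 1000003                      # a small prime; illustrative only
DIST_BITS = 6                          # "last 6 bits equal to zero" criterion, also illustrative

def step(x):
    return (x * x + 1) % MODULUS       # stand-in for the iteration function on the curve

def is_distinguished(x):
    return x & ((1 << DIST_BITS) - 1) == 0

def client_walk(start, max_steps=100_000):
    """Walk from `start` until a distinguished point is hit, then report it."""
    x, n = start, 0
    while not is_distinguished(x):
        if n > max_steps:              # give up on pathological walks
            return None
        x, n = step(x), n + 1
    return x, start, n

server = {}                            # distinguished point -> (start, walk length)
for start in range(0, MODULUS, 9973):  # many clients with spread-out starting points
    report = client_walk(start)
    if report is None:
        continue
    dp, st, n = report
    if dp in server and server[dp][0] != st:
        print("collision at", dp, "reached from", server[dp][0], "and", st)
        break
    server[dp] = (st, n)
```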
This, if I do the math, gives me a factor of N speedup. It changes what the picture looks like. I
now don't have this huge walk ending up in a rho. But I have lots of little blocks. For instance,
here, this red guy walks a little bit and then finds something which I draw in bigger dots. Those
are the distinguished points.
So you see lots of short walks. For instance, somebody walking here gets to here. And then
sometimes there are a bit longer walks. This guy will go and go and go until it gets down here.
Now, up here we have the collision that we are hoping for. The blue guy was going and hit this
black dot. At that point it reported its position. And the orange one from up here was coming
down, and as of here, they're working on the same path. They get the same instructions because
they're on the same spot. And as soon as they hit a distinguished point, they actually inform the
home base of this. And at that point I can stop the computation because -- yeah?
>>: On average how many distinguished points do you need?
>> Tanja Lange: How many distinguished points I need? Well, there is no single answer to this.
For instance, if I -- so it is my choice what I use for the distinguished point criterion. If I use a
very restrictive criterion, like I have 130 bits in my representation and I want to distinguish a
point only if 65 bits are zero, that's very rare to happen.
So we'll take many, many, many steps to get there. So we'll have long walks and very few
distinguished points.
Alternatively, I can have short walks and many distinguished points. It is still the same number of -- well, square root of the group order -- iterations, of steps, that I need to do. So that number
doesn't change.
But there's a little bit of a tradeoff. Each time I hit a distinguished point, we have extra costs for
phoning home. So we have networking costs. So we shouldn't have too many of those.
Also, the home base has to store all of this. So to give you an example in the attack that we're
currently running, we estimated that we will store 850 terabytes. And we checked and, yes, our
cluster's big enough we can do this.
It is nice to have not too long walks because, well, at the point where we are done, if we have like a long walk going here and not finding anything, this will not help much for the other one. So we'll have some wasted computations going on at the end.
But in the end, we will have -- this is the total computation. If I divide this many iterations by this many distinguished points, then that's the walk length. So there's a tradeoff.
So -- yeah.
>>: [inaudible] how do you draw this picture?
>> Tanja Lange: I asked them to draw the picture.
>>: [inaudible] every dot is? I don't know --
>> Tanja Lange: Every dot -- well, this is not really a group. This is just assigning to 1024
elements a direction. So these are the elements, and then the arrow coming out of them gives you the direction. But that's randomly assigned. And then what did you use --
>>: There's a package called [inaudible] which brings -- or it tries to shorten the arrows. So starting
from the random graph it tries to shorten [inaudible] dots close to the [inaudible].
>>: So you have a dot for every element [inaudible].
>>: There are little arrows. It's a bit hard to see from back here. But each dot has little arrows.
>> Tanja Lange: I mean, each element gives you another element, like gives you a connection,
and it's directed. And then you lay this out and you use that graph-drawing program to combine
those.
Okay. So much for the general parallel rho method; now comes the point where Koblitz curves are different from other curves [inaudible] the curves are different from generic groups. Namely, if
you look at P and minus P, then they have the same X coordinate.
And those of you who sat through the Rump session on Monday and saw Peter Schwabe
[inaudible] presentation have already seen that this can be used as a speedup.
If I -- instead of saying, oh, I have L points, I now consider L over 2 classes of two points, which I identify by having the same X coordinate. Then the probability of drawing the same element twice has improved by a factor of square root of 2 because I have fewer elements in my set.
So if I'm able to work on these pairs of points, which, well, looks very easy because the X coordinates are the same, so it's very easy to identify two points by just looking at the X coordinate instead of looking at X and Y, I need to pay a bit of attention so that the walk from P is the same as the walk from minus P. Otherwise I screw up my pairs. So the whole thing has to be organized on pairs of points.
The usual way of doing this is to say we define something like an absolute value of the point. We use the X coordinate, and of the two Y coordinates we use the one where a certain bit is zero. For instance, on a Koblitz curve you have Y and you have Y plus X. And you know that the X has [inaudible] there's always some way of getting a unique P out of the two.
Sometimes that's quite a bit of computation. In the case that Peter was presenting, in the odd characteristic prime field, it's very easy. You just want the last bit of Y to be zero. In the [inaudible] case, a little bit more work is necessary. Well, you can say take a minimum, for instance. And then we add to this point the next step.
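A minimal sketch of this absolute-value idea for a binary curve, where the negative of (x, y) is (x, x + y): both points share the x-coordinate, so one fixed rule on y picks the representative. The particular rule below is just one possible convention, not necessarily the one used in the attack.

```python
def negate(P):
    """The other point with the same x-coordinate on a binary curve."""
    x, y = P
    return (x, x ^ y)

def canonical(P):
    """Pick one fixed representative out of {P, -P}, here by the smaller y value."""
    return min(P, negate(P), key=lambda point: point[1])

# The walk is then defined on canonical(P) only, so that starting from P or from -P
# produces exactly the same sequence of steps.
```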
Then, what Peter was reporting, it might happen that you run into so-called fruitless cycles. So you start at some point PI, you add one of those steps. That gives you PI plus 1. And then PI plus 1 will grab another point.
I mean, I do one step. And from here I have a certain possible set of jumps, of steps. And I might take the same one. And if in between, by taking this absolute value, I happen to have negated this, then I've just gotten back -- hmm? You're negating [inaudible]. So I will keep walking like this for a while, forward and then backwards and so on.
So at this point the little board game figure is again in a cycle. But this is not a happy cycle. This
is an annoying one. It's just one step forward, one step backwards. And I don't learn anything
from this. I don't get any knowledge of the discrete log problem. So this is why it's called
fruitless. It doesn't give me any knowledge about this discrete log problem.
You can also do this with four steps around or six steps around and so on. The two-step one is the easiest to write on the slide; otherwise it gets a bit longer to work out what the conditions are. But it does happen.
So I said before we can detect cycles and we can work out of them. And Peter's -- the whole
point of Peter's talk was to say, yes, we know how to handle those guys. At the same time, it's
nice if you don't have to.
The next part is what else can we do if we have a Koblitz curve and how is it that Koblitz curves
make it easier to avoid such trouble.
On the Koblitz curves, we, first of all, have another thing. I told you that we have this jumping around by squaring both X and Y. So instead of saying, oh, we have the same X and then plus-minus Y, we now have another function which is very easy to compute, just computing squares, and we can identify all the points which are in the same cycle under Frobenius.
I'm standing here and I'm squaring X and Y, I'm squaring X and Y. Once I've done this 131
times, and I'm [inaudible] over F2 to the 131, I'm back to where I started.
So instead of doing pairs like the two points plus and minus P, I can now do classes which contain 2N points: plus and minus all of those different Frobenius powers. It's reasonably fast to find out who else is in my class, so I can find a unique representative, and I've now divided the number of elements to look at by the two from the plus-minus and by the N.
So the schoolbook speedup that I get for attacking a Koblitz curve as opposed to a random curve is a square root of N speedup, where N is the extension degree of the field.
So in this case N was 131, so expect that to save about a square root of 131, which is why we settled for the Koblitz curve rather than for the general binary curve. The Koblitz curve is about 2 to the 7 easier.
Now, how do I get this -- [inaudible] 3.5, not 2 to the 7. Thanks. Yeah. I was saying 131 -- well, 128 is 2 to the 7, but it's the square root of that. Thanks.
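For the record, the schoolbook arithmetic behind these numbers is roughly the following, with n = 131 the extension degree and classes of size 2n under negation and Frobenius:

```latex
\sqrt{\frac{\pi \ell}{2\cdot 2n}} \;=\; \frac{1}{\sqrt{2n}}\sqrt{\frac{\pi \ell}{2}},
\qquad
\sqrt{2n} = \sqrt{262} \approx 2^{4},
\qquad
\sqrt{n} = \sqrt{131} \approx 11.4 \approx 2^{3.5},
```

where the last factor is the advantage over a binary curve of the same size that already uses the negation map.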
Now, how do I build [inaudible] function. I could define a canonical representative. I said
before for the plus-minus that's easy. I say, well, least significant bit. In this case I have to do all
the squarings, look at them, store all those endpoints and say, ah, that one is the smallest, I use
that.
I could do this. It has been proposed. And in the world of square root speedups, this still is a speedup of square root of N. You won't quite see the speedup of square root of N because it could cost you a lot of effort to compute all those points, to sort those points, to compare those points, but it still would work.
And, yeah, squarings are not so bad in [inaudible]. You could do all of that. But there's a nicer
one. So Harley observed this in the -- well, used it in an implementation about the same time Gallant, Lambert and Vanstone were writing a paper and pointing it out; namely, what happens if you do this whole thing in a normal basis representation.
So far I've not talked about how I represent this field. But if it is a normal basis representation, then the Hamming weight of X and of X squared and of X to the 4 are all the same: in normal basis all I do is a shift, a cyclic shift of the coordinates. It doesn't change the Hamming weight. It doesn't change the number of 1s in the representation.
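A tiny sketch of that observation in Python; the bit pattern is arbitrary, and whether squaring corresponds to a left or a right rotation is a convention that does not matter for the weight:

```python
# In a normal-basis representation, squaring is a cyclic shift of the 131
# coefficient bits, so the Hamming weight never changes.

N = 131
MASK = (1 << N) - 1

def frob(x):
    """Squaring in normal basis: rotate the coefficient vector by one position."""
    return ((x << 1) | (x >> (N - 1))) & MASK

x = 0b1011001 << 60                    # some arbitrary 131-bit pattern
for _ in range(5):
    assert bin(x).count("1") == bin(frob(x)).count("1")   # weight is preserved
    x = frob(x)

y = x
for _ in range(N):                     # applying it N times returns to the start
    y = frob(y)
assert y == x
```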
So here's an easy measure that grabs all the points -- [inaudible] there's more points of Hamming weight 3. But all of those points in the Frobenius cycle will have the same Hamming weight. So their suggestion was to use this Hamming weight as a way to determine where to go.
So what they suggested was we take the Hamming weight -- well, or some J which depends on the Hamming weight -- and then instead of finding a unique point, we take the Jth power of the Frobenius of the point, added to the point.
Here's a short calculation to show you that this is compatible with plus and minus. If instead of
PI I put in minus PI, well, [inaudible] minus PI here [inaudible] minus PI here, which I can pull
out of the parenthesis, so if I have minus PI, I'll also have minus PI plus 1. So I stay in the same
class.
Similarly, if I have a sigma to the I, this is the J even -- well, doesn't quite -- so in some I here
and some other number there, let's call this I prime. So this is not the same -- there's some
random power of sigma. I can pull it out of here. So if I take this as the definition of the walk, it's
well defined on classes.
Gallant, Lambert and Vanstone suggest to use a hash of the Hamming weight, where the hash maps into all the values between 1 and N. So in our example [inaudible] between 1 and 131.
Harley, he didn't write it up as a general strategy. He used this in his attack on the ECC2K-108.
And he's good at [inaudible], so he was very -- he's aware that it costs you a lot to do up to 108, or in our case 131, squarings at once. And he reduced it to a small set.
You might notice that 3 was missing here. So actually the way that he computes the J is he takes
the Hamming weight, he reduces it mod 7, and then he adds 2, which means that you're in the
set 2, 3, 4, 5, 6, 7, 8, and then he also replaces 3 by 1. So his software is online. You can see
that this is exactly the set of steps that he is doing.
We looked at this for our thing now; first of all, 131 is somewhat bigger than 108. It's much more attractive to do this approach than to do an adding walk where we have to find a unique representative.
So, yes, very quick decision, this is the right approach. But then we don't want to go with the
GLV idea of taking all powers. It's nicer to have a restricted set. And there's many ways in
which this is nicer.
It is nicer for the implementation if you're worried about code size. If you only have to
implement seven, as in Harley's case, or eight, as will be our suggestion, different squarings, it's easier
than if you have to implement 131 different squarings.
Not so much a problem in the general Core 2 implementation. Somewhat of a hassle if you're in a Cell architecture where the code size actually competes with your registers, with your
memory. So there are reasons even in software architecture that you don't want to have so much
choice.
And then when you talk to your friend and he's laying out what an FPGA would look like, or the RFID chip, then, yes, every single different choice costs you area. So you do not want to have 131 different choices. 8 choices, maybe 16 if I'm generous today. But you don't want to have that
many.
And then also in software, what we settled on is an approach called bitslicing. And that is something where you compute many elements at once. So we're actually doing this whole walk on 131 -- 128 points at the same time. Now, all these 128 points have to somehow follow the same path. There are no branches. Well, but I just told you that each point decides on its own.
Well, so that means you do the same computation for everybody, so everybody costs as much as
the sum of all the computations, and then you pick the right results. You mask somehow so that
you get the right ones.
So it's not that only some would have a huge squaring to do, everybody would have to do the
huge squaring in this implementation setting.
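A toy sketch of the bitslicing idea (not the actual Core 2 or Cell code): bit i of all the parallel instances is stored in one machine word, so a field addition of all instances is one XOR per coefficient, with no branches and no per-point decisions.

```python
W = 64     # how many walks run in parallel in this toy; illustrative only
N = 131    # bits per field element

def transpose(elements):
    """Turn W integers of N bits each into N words of W bits, one word per bit position."""
    assert len(elements) == W
    return [sum(((e >> i) & 1) << j for j, e in enumerate(elements)) for i in range(N)]

def bitsliced_add(a_sliced, b_sliced):
    """Field addition of all W pairs at once: N word-wide XORs in total."""
    return [wa ^ wb for wa, wb in zip(a_sliced, b_sliced)]

# A data-dependent choice (such as which power of sigma to apply) cannot branch in
# this form: every candidate is computed for all instances and the right result is
# selected with masks, which is why, in the real attack, all 128 points pay for the
# same instructions.
```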
There's also a little bit of a worry that we are in the situation where we go one step forward, one
step back, one step forward, one step back, like having a circle, a fruitless one.
If I settle on a huge set of coefficients, it might be that I find something which adds up to zero. If I
have only very few coefficients, then I can actually do the computation to see that it's not going
to happen.
So what we did when we were designing the iteration function was we settled on having a small set of different powers. We said we want to have 2 to the 3, well, 8 different choices, say, and then we looked at eight different choices. We want them in a consecutive interval. And we looked: if we start with squaring once, squaring twice, till squaring seven times, it is possible to get such a loop. That's not good.
But if we do shifting then by 1, no, shifting then by 2, then the shortest combination of all of those powers is very large. We just use a lattice algorithm to compute the length of the shortest vector.
Now, the shortest vector will have negative coefficients. So it doesn't really tell us exactly what we want to have. But it gives us a lower bound. I mean, if I allow you to use negative coefficients, you have to use these huge ones. Well, if I then say, oh, you're only allowed to use positive ones, it can only be larger. So that's why we say we have a way of avoiding fruitless cycles by doing this.
So instead of dealing with them in the sense of detecting and taking care of them, which is what Peter was reporting for the odd characteristic, we just don't run into those by this choice of the iteration function.
And then having a consecutive interval is better -- that's just an implementation issue: if I have to do all of them, well, if I have the square and I have the fourth power, then it's convenient to continue like this.
So our definition of the iteration function is we take that Hamming weight -- a technicality on the side: we notice it's always even, so if we just want to have eight numbers and we compute mod 8, then we would only get four different values. Well, so we divide the Hamming weight by 2 and then compute mod 8, and then to avoid such short combinations, we shift the interval by 3.
So here are the Js that we use for computing -- the Jth powers of Frobenius [inaudible] steps. So what it costs us is to compute the Hamming weight -- in the normal basis representation, actually; we actually compute in normal basis representation, so it doesn't cost much -- then we check for distinguished points, where our criterion was that the Hamming weight is less than or equal to 34. So that is the number which gives us the 850 terabytes of storage.
If we would have been a bit more generous, collected more points, it would have overflowed the
capacity of my cluster. If we would have been less generous, so more restrictive on finding
those points, then we would end up having very long runs on the machines, and that is
sometimes wasteful.
Well, and then for the computation, we have to find our J and we have to compute sigma to the Jth power, so X to the 2 to the J and Y to the 2 to the J, and add the points.
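Putting those pieces together, the skeleton of this iteration function looks roughly as follows; the curve arithmetic (frobenius and point_add) is left abstract, standing in for the real bitsliced field code, and only the constants from the talk are used.

```python
def hamming_weight(x):
    return bin(x).count("1")

def is_distinguished(point):
    x, _ = point
    return hamming_weight(x) <= 34            # the distinguished-point criterion from the talk

def next_point(point, frobenius, point_add):
    """One step: P_{i+1} = sigma^j(P_i) + P_i, with j derived from the weight of x."""
    x, _ = point
    j = (hamming_weight(x) // 2) % 8 + 3      # the weight is even; j lands in {3, ..., 10}
    sig = point
    for _ in range(j):                        # sigma^j means squaring x and y j times
        sig = frobenius(sig)
    return point_add(sig, point)
```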
Then we sat down and analyzed what it means to have this iteration [inaudible]. I started by
saying are we pulling out of our bag with L elements totally randomly.
And then I was talking about this board game which still had fairly random layout and then said,
okay, well, now if we restrict things we have to make sure that this assignment of steps is still
sufficiently random. Because you can only hope for the birthday attack, a birthday bound of the
square root of L if you have a random walk or if you have a random way of taking elements out
of your bag.
I mean, I could be doing a very nonrandom walk just in a circle and it wouldn't tell me anything.
So we have to analyze how close our walk is to a random walk. So how much we can hope that
we get this square root of the group order.
Now, our group order, I mentioned before -- there's a cofactor of 4, get that away -- then we hope to get a speedup from using the negation, the pairs of points, and from the Frobenius, so having 131 times 2 points regarded as one element.
We do lose a bit of randomness by restricting to only eight choices for the J. So in some sense
the GLV approach of taking a hash function on this gives you more randomness.
But we're talking about a few percent here. The reason for this: well, if you draw this, like, there are many, many elements which have about as many 0s as 1s. So Hamming weight 66 or 65 is very, very frequent.
Hamming weight 0, okay, in this case, doesn't happen at all. Okay. Hamming weight 2 happens
very rarely. 4 happens a little bit more frequently.
So you draw this and you get this huge bump in the middle. There are many, many more elements having an average Hamming weight -- well, having a Hamming weight around the average -- than at the extremes.
And also when you compute mod 8, you still see this distribution. Sure, we fold in the extreme
values, but it still shows. So it's not totally random. But when we analyzed this, the heuristic is
it's less than 7 percent.
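To see the non-uniformity she is describing, one can tabulate, for 131-bit strings of even Hamming weight, how often each of the eight step choices comes up. This only illustrates that the eight probabilities are not all equal to 1/8; the sub-7-percent penalty itself comes from the more careful heuristic in the paper.

```python
# Distribution of j = (HW(x)/2 mod 8) + 3 over 131-bit x with even Hamming weight.
from math import comb

N = 131
total = sum(comb(N, w) for w in range(0, N + 1, 2))
prob = {j: 0.0 for j in range(3, 11)}
for w in range(0, N + 1, 2):
    prob[(w // 2) % 8 + 3] += comb(N, w) / total

for j, p in sorted(prob.items()):
    print(f"j = {j:2d}: probability {p:.4f}")   # not all equal to 1/8 = 0.125
```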
Well, that's what's in the paper right now. We just did something where we looked at -- well, like second-order collisions or anticollisions, ways where we do something and we're sure we're not going to have a good collision. So, okay, then we get 7 percent instead of 6.993 percent. You see, it's actually pretty close to 7. So even the more sophisticated analysis doesn't get us much further away from randomness. That means we have the expected number of iterations as in the textbook version times 1.0-something -- and then comes the penalty that we have. So we'll need about 2 to the 60.9 iterations. And then, well, we make iterations cheap.
So the highlights of what we've been doing is we looked at the randomness of the iteration
function. We went through this by saying, okay, we could have chosen more Js, we could have
said, okay, instead of doing mod 8 we do mod 16 and have 16 choices, but that will make it more
expensive to compute a single iteration and would only slightly increase the randomness.
Increase randomness means fewer iterations needed.
So if we want to have -- if we have an eye on the total computation, like the number of iterations
needed to break it and the time it takes for each iteration, that was what we optimized for,
and that's where we ended up choosing eight different selections instead of 16, or smaller would
be 4, but that would be very nonrandom.
Then we have a few more things, like we do not actually remember how we got here. The usual textbook version tells you you have to have your little board figure keep a count of how many AIs and BIs it has collected by now and keep those -- well, either I just have a little scorecard, I used so many of these steps, or I actually compute the integers. We just let our figure run around.
We only remember where the figure came from. All of the steps are totally deterministic. So the
moment that the little figure reaches the power box and phones home, we record where it is. If
this power box ever finds a match, we know where those two figures came from. And at that
point it's worth doing the computation again and remembering how we got here.
But we don't have to bother the FPGA implementation with remembering how they walked here.
So he's actually quite happy that we don't tell him, oh, by the way, aside from computing here on
the elliptic curve we also have to remember a little bit of how many Js of [inaudible] 8 we've used or, even worse, have to implement some arithmetic modulo the group order.
You remember this nice big prime; it looks beautiful. Then we would know exactly which discrete log relation we have for each of the points in our database. But we don't need this.
Really we only need this for the two colliding walks. Realistically, we do not store everything. We do store exactly where this little figure came from, and then we store a hash of the result. Otherwise it wouldn't fit in the 850 terabytes.
By doing a hash of this, we might have a pseudocollision, so to say. We might think two board figures arrived at the same spot even though it's just the hash values that match.
Okay. At that point the server will do some cycles and find out nope, sorry, it wasn't a match.
And we're not telling our clients to shut down at that point. They will continue producing
numbers. So there was a bit of a tradeoff, so we also have a nice protocol to send those points, and there was this moment where we're thinking, huh, yeah, so what's the bandwidth of Eindhoven University for incoming traffic, and would they actually notice if I use 50 percent of their traffic.
Yeah, we decided the answer was yes, and then we reduced the amount of traffic that it costs. And
we've so far been happily under the radar. We've had some issues with broken power supplies or
my university deciding it's time to do a test and then they shut us down. Not so nice because we
have -- that is absolutely right. Dan is holding up a sign saying 850 gigabyte, not terabyte.
Thank you, Dan. I apologize. Yeah, otherwise we would still not be under the radar for the
incoming traffic.
So this highlight gives you some ideas of what's behind this. So we set off in teams, so there's now a bunch of different spin-off projects. So I've been pointing at Junfeng a few times because he also is a coauthor on this paper, and they have a very nice FPGA implementation which got
published this year at the FPL conference.
The same Peter who was reporting on the negation and prime fields was one of the authors on the
paper of doing [inaudible] on the Cell for this. So we have a bunch of PlayStations. And as
Peter Montgomery mentioned, PlayStations are a very convenient or a relatively cheap hardware,
and we have been lucky enough to get some bits of the big 200 PlayStation cluster sitting in
Lausanne.
Then we have sat down on GPUs, so Dan's presentation yesterday ended with saying, oh, actually in number theory you should notice there's a multithreaded world, there's a world where you have lots of tiny little processors that all want to do the same thing. Well, it's a great platform for this attack. We can feed them as many computations as they need.
I mean, we -- hey, we have a lot to give. We have about 2 to the 60 computations. We can slice
them in chunks of a hundred something, we can, ah, if you want a thousand at the same time, no
problem. So this is a great application for highly parallel platforms.
Little side result was that those platforms don't actually come in an implementation-friendly way.
So if you're a C programmer, then you should be happy with using CUDA, which is what the
GPUs want to be programmed in. If you're a good C programmer and are used to writing the speed-critical routines in assembly, you look at where the assembly for GPUs is, and it turns out there is none. Okay. So put that project to the side; quickly -- hey, it can't be so hard to design an assembly language for GPUs.
Actually, we didn't have to go all the way to designing the assembly language. There was
already somebody who has been sitting down with the disassembler to find out what the GPU is
doing under certain instructions. That thing was buggy, we fixed it, and we then also wrote a [inaudible], so that stands for a human-friendly assembler version, which does part of the [inaudible] allocation for you.
So we had a few side projects coming out of this. On the math side we improved how we work with normal bases, and we came up with something we called optimal polynomial basis. So it's a polynomial basis: if you have a normal basis for your field, you can still speed up multiplication in that field.
If you want to know more, then we have some Web pages. There's the site ecc-challenge.info, which is still anonymous, but you can trust me, it's us. If you want to give me a challenge, like can you make the fifth line say the following, yes, I can do this.
We have submitted the paper for a conference which wanted anonymous submissions, and
therefore we don't have the names on there yet. But I think it's about time to put names on there
and give credit to everybody.
We also have a Twitter page. If you go there you can see how often my university screws up and
how often the components break down. Like we just had to replace a Western Digital hard drive
and got a broken one back, and there's a bunch of papers that came out of this project.
Thank you for your attention.
[applause]
>> Tanja Lange: Rich?
>>: How far along are you? When will you finish?
>> Tanja Lange: Well, you can go to the page and find out, no, you can go to that page. Here is
what [inaudible] we have right now. If you look at this, we're like 10 percent through, which is
not so great. Main problem is that we are hoping for the FPGA thing and that one FPGA cluster
is worth about four to five Lausannes.
>>: Okay. Is that capital L?
>> Tanja Lange: Lausanne is capital L, little L is [inaudible] and E is Eindhoven, D is Dublin, J
is [inaudible]. We actually got some -- even got some [inaudible] supercomputers, so the graph
that we have online is pretty bumpy.
I have a few more graphs in the presentation that I didn't get around to doing. That one. So here
you can see that we actually put in some effort to make things faster. And the biggest
improvement we got, well, they got, is on the FPGAs.
So initial estimates, like October 2009, were 2,000 -- I mean, you would need 2,000 FPGAs to break this, and now the current estimate is around 600.
>>: And is that number of years or something?
>> Tanja Lange: No, that many copies of the hardware to run it at that speed which would break
it in a year. Then one of those RIVYERAs is 128 FPGAs.
So on this scale that is a sizable fraction. And then we have a few applications to
supercomputing centers who have lots of idle GPUs, as it turns out, and sometimes actually need
to justify their existence.
>>: [inaudible].
>> Tanja Lange: Well, we can, so they could just let us run them and we'd be happy.
Further questions?
>>: If I recall correctly, there was an estimate earlier that you would actually finish spring of this -- this past spring. Was there an interesting story there?
>> Tanja Lange: Yeah. That's related to the FPGA story. That was scheduled to be ready and
running last September.
Now, in some sense, the FPGA implementation wasn't ready at that point either. If you look at
where it was at that point, it was definitely worth waiting till March. March was when Junfeng
visited us and I think that's when he did a lot of intensive work on this and got this huge speedup.
So it was definitely worth waiting until then.
But since then we're sitting down with the code, and, well, the interesting stories are like, okay,
they have the FPGA boards, but they don't have the power supply. Because the power supply is
somewhere and it's stuck with the ash cloud. That tells you it was around April.
They finally had the FPGA boards and had the power supply and were starting to power the
thing on and realized that all but two boards had a little crack because somebody stepped on
them.
Then apparently they got the boards fixed or got fresh boards and put them in. And this is a
small startup company in Kiel and I'm not sure how far they get with their startup if they have a
similar attitude each time.
But then they had to come from Kiel down to Bochum and it's now sitting in Bochum and they're now figuring out how to talk to the machine. And at the beginning not even the test programs from the company worked on the machine. But, well, yes --
>>: The FPGA [inaudible] they actually arrived at Bochum seven days ago. So maybe in a couple of weeks we can have it running and then start something [inaudible].
>> Tanja Lange: But, yeah, I think next spring is a more realistic estimate. But also, if you look
at the graphs, there was not much happening at the beginning.
Victor.
>>: So --
>> Tanja Lange: Do you have computers for us?
>>: Hmm?
>> Tanja Lange: Do you have some computer time for us?
>>: [inaudible]
>> Tanja Lange: I should have asked Rich about this too.
>>: We're taking donations.
>>: That's difficult. No, I was going to say so once you find the answer, what next?
>> Tanja Lange: Well, we are a government-sponsored project, so we can't actually take the
20,000. But we are enough people that if all of us buy airplane tickets to pick up the prize in
person and have a beer over in Toronto, then the money will be gone. And then over that beer
we can discuss whether it's worth going for that next challenge or not.
But I think it's not so much reaching the goal but the things that we learned on the side.
>>: Sure. I mean, so have you thought about the things that you've learned here and nice things,
how they might be applied to other things?
>> Tanja Lange: Yeah. So, I mean, some things are very specific to the Koblitz curve attack, so
they could be used for 163, and we're certainly going to sit down and rewrite our estimates.
But they won't help much with, like, okay, the specific iteration function; but then the analysis of the iteration function is much more accurate than what we had before.
Also some of the implementations, like, I mean, the assembly language for GPUs we have and we're certainly going to use for the 131 non-Koblitz curve challenge. And, well, maybe it's more interesting to run the prime fields, now that we have a better negation map.
But if you have any interesting projects, we're listening with open ears, because, well, once
machines are running, we have capacity to think again. Toni.
>>: How did you come up with the 7 percent for a penalty for not having a random walk?
>> Tanja Lange: Okay. So the question was about the analysis slide. You mean the 7 percent
compared to the 6.9993, or in general how that happened?
>>: In general.
>> Tanja Lange: In general. When you look at where you're sending things and you have a certain out-degree, then at every point you know what the probability is of going that way, and then you can compute this over the -- well, it ends up being something like a sum with 1 over 1 minus P squared, where P is the probability for each of the steps.
And that gives you like the first-order deviation from randomness, and then what we have now is kind of a second order where you have P to the 4th coming in. So there is an appendix of over one and a half pages in the paper to explain that formula.
But it's a general formula where you can plug in whatever step function you have. So we had
pretty good understandings of, like, additive walks. Now this is more like a multiplicative walk, because we have 1 plus sigma to the J, where sigma acts like a scalar. And we now also have that one in there.
So it's fairly generic if you have your random walk or not so random walk, you can plug in the
probabilities and get the numbers out.
>> Unknown Speaker: All right. Well, I couldn't help noticing that Tanja didn't thank the
organizers, so maybe we should thank Tanja for the talk and the organizers.
[applause]