>>: This is the first speaker of the session. This is Michael Naehrig, whom we know in the
crypto community as the co-discoverer of the BN elliptic curves, and we're talking about
implementing pairings today.
>>Michael Naehrig: Thank you.
So good morning, everybody. It's nice that everybody came back this morning.
Yeah, first of all, I'd like to thank the organizers for giving me the opportunity to speak
here. That's really great.
So my talk will be about pairing computation. I'm not going to go so much into the
details of the implementation -- a little bit in the middle part -- it's more about
parameter selection and algorithms to compute pairings.
So the talk has three main parts. I'll give some general stuff in the beginning about
pairings and pairing-friendly curves. And, of course, we'll have a look at BN curves, in
particular a subfamily of BN curves which are very, very nice for implementation. In
the second part I'm going to describe an implementation, which actually Peter
Schwabe did, of the optimal ate pairing on a BN curve, using the polynomial
parameterization of the primes to implement the field arithmetic. And in the third part we'll
take a look at using affine coordinates for pairing computation.
Okay. So let me start. We've seen most of this yesterday already, so this is just to fix
notation. We'll talk about elliptic curves over some finite field Fq, and I'm going
to denote by n the number of Fq-rational points on that curve. That's q + 1
minus t, where t is the trace of Frobenius. That's all known. And we'll take a large prime
divisor r of n. That's the size of the group we usually do crypto with. And then the
embedding degree of E with respect to this r is the smallest positive integer k such
that r divides q^k - 1. That's a very important parameter for pairings, as we'll
see soon.
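To make the definition concrete, here is a minimal Python sketch of how you would find the embedding degree; the function name and the toy numbers are just illustrative:

```python
def embedding_degree(q, r):
    # smallest k >= 1 with r | q^k - 1, i.e. the multiplicative order of q mod r;
    # assumes r is a prime not dividing q, so the loop terminates (k divides r - 1)
    k, acc = 1, q % r
    while acc != 1:
        acc = acc * q % r
        k += 1
    return k

print(embedding_degree(7, 5))  # 4, since 5 divides 7^4 - 1 but no smaller power
```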
Yeah, we have these three properties down here. So, of course, k is the order of q
modulo r. And that's also the reason why, if you choose an arbitrary, like a random,
elliptic curve, this k usually is very large, of the order of r actually.
And furthermore we have, of course, that the rth roots of unity are contained in the finite
field extension F_{q^k}, and also, if k is larger than 1, we have all the r-torsion
points on the curve already defined over F_{q^k}.
So that's basically the reason why I fix our universe today to be F_{q^k}, or the curve
E over F_{q^k}. We don't need any larger fields here.
So for practical applications of pairings we usually use, yeah, variants of the
Tate-Lichtenbaum pairing, which is the explicit version, described by Lichtenbaum, of the
Tate pairing, and you can see why the embedding degree is so important: basically
everything it works on is defined over F_{q^k}. Yeah, the pairing takes two points. The
first argument is from the r-torsion group and the second gives some class in this quotient,
and this is mapped over to the group F_{q^k}^* modulo rth powers.
So the pairing is actually given by a function f_{r,P}, whose divisor is r(P) minus r times
the point at infinity, evaluated at some divisor given by the point Q.
For people who implement this, this is already too complicated, so we're going to try to
make things easier, by first restricting the first argument to points defined over Fq. And
under suitable circumstances we can also take the second argument just as an r-torsion
point, this time defined over the larger field.
And if k is larger than one, we can replace the divisor by the point Q. So it's
basically just a function given by some point P, evaluated at some point Q, and then
raised to this exponent to give a unique value in the rth roots of unity, so we get rid of
these classes over here. That's basically the thing we start from.
We've seen this yesterday already, but I have much nicer pictures with colors [laughter].
This is the standard group law on an elliptic curve, and actually these lines that occur here,
like the line through P1 and P2 and then the vertical line here, these are the things you
can use to build up the function. That's what Victor Miller described yesterday already.
So you have these formulas over here. So if you want to compute f_{m,P}(Q) for some
integer m, you can build that up by doing some square-and-multiply-like loop using these
formulas. So you start with f_0, which is 1, and then you build it up from there. And you
always evaluate directly at the point Q. Think of how large the degree of that function
would be if you don't plug in the point Q -- I mean, the degree is about m, so you would
have a really huge function -- so you want to plug in the point Q directly and then only
compute with the values.
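Here is a schematic Python version of that Miller loop; double_step and add_step are hypothetical placeholders that have to return the new point together with the line function value already evaluated at Q, and all the field arithmetic is left abstract:

```python
def miller_loop(m, P, Q, double_step, add_step):
    # square-and-multiply build-up of f_{m,P}(Q): f starts at 1 and is squared
    # and multiplied by line values at every step, always evaluated at Q
    f, T = 1, P
    for bit in bin(m)[3:]:               # binary digits of m after the leading 1
        T, line = double_step(T, Q)      # T <- [2]T, line = l_{T,T} evaluated at Q
        f = f * f * line
        if bit == '1':
            T, line = add_step(T, P, Q)  # T <- T + P, line = l_{T,P} evaluated at Q
            f = f * line
    return f
```

The vertical-line denominators are omitted here; as mentioned later in the talk, for even embedding degree they lie in subfields and disappear in the final exponentiation.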
I've also put an Edwards curve down here, because there you can't work with line
functions. For Edwards curves, which are quartic curves, it doesn't work with line
functions. So in that case you just replace the line function in the numerator by a quadratic
function, this blue one here. So if you want to add P1 and P2 on the Edwards curve,
you need to find that blue function, which is a quadratic that goes through the points P1,
P2, this point O' down here, and the points at infinity.
And then Bezout's theorem gives you an eighth intersection point. Because the
points at infinity have multiplicity 2, you get 4 of them; then you have these three points,
so in total 7, and then you get one more intersection point. So that's the group law on
Edwards curves. And you can just replace that function by the quadratic, and that
function by the product of that line and the line that goes through the y axis. And then
the algorithm works just the same.
So what I just want to point out here is what you have to do: you have to do
curve arithmetic to compute these points, so you have to keep track of the points you
come along, and you have to compute in the field F_{q^k}, namely, by squaring this value
and multiplying by the fraction of these line functions.
And that's the reason you need to have a small embedding degree k, because if that's
too large, you just can't do the computations.
So we're going to make things even easier. We're going to restrict to some
common choices for the groups we compute the pairings on. The first, that's what we
already did for the Tate pairing: we'll just restrict to a point defined over the base field.
And the second group we're going to choose as the q-eigenspace of the Frobenius
endomorphism, also known as the trace zero group, and -- yeah, we'll see why we're
going to do this.
And then we have basically two variants of pairings we compute. Namely, something
like the reduced Tate pairing, where the first point is small, defined over the base field,
and the second is taken from G2, so defined over the large field. And then we have a
second possibility, namely, the ate pairing, where we just swap the order of these
groups. So the first point is the larger point, the second one is the smaller.
The advantage of this is that, as you see here, it's not f_r anymore, it's f_{t-1}, where t is
the trace of Frobenius. So that value is much smaller than r. So if you think of the
square-and-multiply Miller loop, then this can be computed in half the number of iterations
of the Tate pairing.
There are even more efficient variants, namely so-called optimal ate pairings, and
they have even smaller values here. So the t - 1 can be replaced by some m, and
the notion of optimal ate pairing means that the length of m is actually the length
of r divided by the Euler phi function of the embedding degree.
We'll see an example of an optimal ate pairing later. So these are basically the
versions -- I'm not talking about any special types, so I want to restrict to just these
choices for this talk.
Okay. So we have this group G2 here, which consists of r-torsion points
defined over the large field F_{q^k}, and it's annoying to have such large
elements. And we can actually use a twist of E to represent that group nicely. That's
the main reason, actually, we choose this group: because it can be represented very
nicely.
So for this talk we'll define a twist E' of E as a curve which is
isomorphic to E over F_{q^k}. That's what I said in the beginning: our universe today
is F_{q^k}, so nothing larger than that. So a twist E' of E is a curve isomorphic to E
over that universe.
And as almost all of you might know, these twists look like this, and the isomorphism
looks very simple. It's just multiplying the coordinates by omega squared and omega
cubed for some element omega in F_{q^k}^*.
Okay. So the isomorphism is defined over F_{q^k}. But if the twist is
defined over some smaller field -- if we have a divisor d of k such that E'
is defined over F_{q^{k/d}} and no smaller field, and the isomorphism is defined over
F_{q^k} and no smaller field -- we say that the twist has
degree d.
And there are not so many possibilities for the twist degrees. Yeah, for an arbitrary
curve in general it's just d equal to 2. If we have special j-invariants: for j-invariant 12
cubed, the coefficient b of the curve is zero, and then we can have twists of
degree 4 and 2; and for j-invariant zero we can have twists of degree 2, 3, and 6. That's all
that is possible.
And -- yeah, so the good thing is that we can say that there's always a unique twist E'
of the degree given by that gcd over here. So if we take d to be that divisor of k,
then there's always a twist of degree d. So the best thing we could hope for is a twist of
degree 6. And there's a unique twist such that our prime r divides the group order of
that twist. There's a nice treatment of this in the original ate pairing paper; that explains
all that.
>>: [inaudible].
>>Michael Naehrig: You can write down the possible group orders of twists. That's
what they do in the ate pairing paper. Then you can see that there's only one of
these group orders that can be divisible by r.
>>: [inaudible].
>>Michael Naehrig: Yes. So we're going to fix one of these group orders -- I'm just
saying that -- yeah, so I should say there's exactly one such group order. Yeah, right.
So now we take this twist and we define the group G2', which is just the group of
r-torsion points on E' defined over that smaller field F_{q^{k/d}}. And then
our twisting isomorphism defines a group isomorphism from G2' to G2. So here
are some pictures, and I've put that here so that G2' is sitting here and has points
defined over the smaller field, and the twist isomorphism maps it over to G2.
And if we now, in addition, also represent the field extension we work in, F_{q^k},
with this same element omega, then that's very convenient, because elements of
F_{q^k} can be written as polynomials in omega with coefficients in F_{q^{k/d}}. So
that twist isomorphism just takes the coefficients and puts them at the right places in the
new elements. So that's very convenient: there's nothing to compute. And you can
see that these group elements of G2 are actually very special. They are kind of
sparse, because they all come from elements of G2'.
If you want, you could also think of what happens if I go back from G1. Then you will
end up over here in G1', and that will look very similar; it's just coefficients put at
the right places. But in this talk we're only going to use G2'.
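As a small sketch of what "putting coefficients at the right places" means, assume the common BN-style setup F_{p^12} = F_{p^2}[w]/(w^6 - xi) with a degree-6 twist; then untwisting a point of G2' looks like this (illustrative Python, where the coefficients are F_{p^2} elements):

```python
# an element of F_{p^12} is a list [a0, ..., a5] meaning a0 + a1*w + ... + a5*w^5,
# with coefficients in F_{p^2} and w^6 = xi
def untwist(xp, yp):
    # (x', y') on the twist maps to (w^2 * x', w^3 * y') on E over F_{p^12};
    # each coordinate has a single nonzero F_{p^2} coefficient -- nothing to compute
    x = [0, 0, xp, 0, 0, 0]
    y = [0, 0, 0, yp, 0, 0]
    return x, y
```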
Okay. So now we've got all the stuff we need to talk about pairing computation, but now
we need curves to compute pairings. And we need to fulfill some security requirements.
So this table I have taken from recommendations by NIST and
ECRYPT II, and what it says is actually it gives a level of security, and it says there are
equivalent sizes. So there are certain sizes for the parameters that give equivalent
levels of security. So, for example, if you want to have 128-bit security, you need to
take an elliptic curve with 256 bits for the prime R. And the extension field should be
something like that, 3000 bits.
And -- yeah. So I'm actually not talking about any special assumptions. That's basically
the minimum requirement that at least the DLPs should be difficult enough.
And now the curve we compute the pairing on has this embedding degree, and that
embedding degree links the sizes of the prime r and the finite field
F_{q^k}. That's very nicely represented down here. That value rho is basically a
measure of how far the curve is away from having a prime number of rational points
over the base field: the size of q is actually rho times the size of r, and then you multiply
by k to get the size of the extension field. So the factor between the size of the
prime r and the size of the extension field is rho times k.
And from these minimal requirements here you can deduce kind of an optimal value for
that factor. At the 128-bit level it comes out to be around 12.
So for efficiency reasons you might want to balance the security. For example, first
of all, you want to have the rho value as close to 1 as possible. I mean, it's good to have
a prime order curve so you don't need to compute with a too-large prime q or prime
power q here.
Also, if this factor rho times k is too large, then fulfilling the minimal size for r
makes your finite field way too large for that same security level. The other way
around, if it's too small, you will have to increase r to reach the minimal size for your
finite field, which makes the group on the elliptic curve too large for that security level. So
if you believe in these recommendations, then a good thing to do would be to choose rho
times k as close as possible to these optimal values in the table.
So we need a small k, we need rho times k to be close to these values, and we would
like to have a twist degree as large as possible. So there are quite some constraints on
the choice of curves we can make. As Professor Koblitz said yesterday already,
supersingular curves always have small embedding degree. In large characteristic it's
at most 2, and in characteristics 3 and 2 we can have k up to 6 and 4.
So they were the natural first choice. That's when they came back from the grave and
people were computing pairings on them. But you can see rho times k is quite small, so
you would have to make the fields very large to get a certain security level.
So I'm going to focus on ordinary curves in this talk. We'll hear something about
pairings on supersingular curves tomorrow, I guess in both talks
on pairings. So to get a larger embedding degree and larger rho times k we might want
to go to ordinary curves.
So what do we need? We need curves with the following properties. That's the
standard thing for the group order. We need a large prime divisor of that group order. We
also need the embedding degree condition, and we need to actually find that curve, and
usually that's done by using the CM method. I mean, if you just choose some random
curve, as already mentioned several times, then the embedding degree is just enormous;
you can't compute anything. So you have to fix that in advance. So fix the embedding
degree and then try to fulfill these equations -- that should look familiar from Francois
[inaudible]'s talk yesterday. If you have that CM equation fulfilled, then you can use the
CM method to construct the curve. The only restriction is that you have to be able to
compute the Hilbert class polynomial, so the discriminant D should be suitable, or small
enough, such that you can actually compute it and find the curve.
Okay. So these are the restrictions. And usually -- for these examples at
the security levels I showed you earlier -- it's done by selecting polynomials
that parameterize these parameters n, p, and t and that already fulfill these conditions
as polynomials. And the only thing you need to do is plug in values until you find
p and r to be prime, and that's all nicely described in the famous
taxonomy paper that has basically all the constructions that are out there.
And in this table I have put some examples that kind of fulfill these balanced security
requirements. So for the 128-bit level, of course, BN curves: they have an embedding
degree of 12 and rho equal to 1, so that exactly fits these requirements, although one
might have to adjust the security a bit by increasing the bit size a few bits, because you
have these endomorphisms of degree 6 and a very special parameterization.
So, anyway, these are the names of these constructions in the taxonomy paper. So if
you want to look at that later, you can use the slides to find that in the paper.
So just to give you an idea of what we can have here: the upper entries are always the
ones with the higher degree twists. And rho equal to 1 is only possible for a few
families: the BN curves, Freeman curves for k equals 10, and MNT curves, which are
ordinary curves that have embedding degree less than or equal to 6. For all the
other examples you can't find prime order curves.
Okay. And then you might want to look at the last column here. So that gives you the
degree of the field you have to do the computations in. For the ate pairing you will
replace G2 by G2 prime and that's defined over this degree extension of Fq.
A very nice example are these BN curves. They are given by these two polynomials.
So p and n are given by degree 4 polynomials. And the good thing is that, if you look
down here, the CM equation by accident comes out to be minus 3 times a square all
the time, so they all have CM by the field Q(sqrt(-3)), and therefore they
always have j-invariant zero. That's the reason for the curve shape here.
And it's possible to find curves of prime order, so that n is equal to r and is prime.
So what you need to do to construct such curves is just plug in values for u until
you find those two polynomials to give you prime numbers. That's all. Once you have
that, you can try certain different values for b and check the order of the group to see if
you have the right twist, or you can actually immediately compute that order, because for
these curves you can just write it down.
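That search is easy to write down. Here is a Python sketch using the BN polynomials; the starting value and the naive u += 1 scan are just for illustration (a real search would also enforce a sparse u, p congruent to 3 mod 4, and so on):

```python
import random

def is_probable_prime(n, rounds=20):
    # standard Miller-Rabin probable-prime test
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def bn_params(u):
    # the BN family: p and n are degree-4 polynomials in u, t is the trace
    p = 36*u**4 + 36*u**3 + 24*u**2 + 6*u + 1
    n = 36*u**4 + 36*u**3 + 18*u**2 + 6*u + 1
    t = 6*u**2 + 1                      # so that n = p + 1 - t
    return p, n, t

u = 1 << 62                             # gives p of roughly 256 bits
while True:
    p, n, t = bn_params(u)
    if is_probable_prime(p) and is_probable_prime(n):
        break                           # prime order curve: r = n
    u += 1
```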
Okay. The embedding degree is 12, as already said. And the twist of degree 6 -- for
that curve shape we have such a twist -- is, of course, given somehow by that
coefficient b over xi. So to construct this twist you just try certain values for
xi until you find the right one that gives you the right group order.
So what does it mean for our computations? The group G2' is defined over Fp
squared, so we basically only have to do computations over Fp squared. So for the ate
pairing, curve arithmetic is in E'(Fp^2). We'll just always replace the
points in the large group by the small ones on the twist.
And then you can also represent the curve and the field extensions using that same value
xi from the twist, so you have this nice convenient representation that goes
along with the twist isomorphism.
So that's the algorithm for the optimal ate pairing on BN curves. It looks a bit confusing,
but it's basically just a few parts. You can look here. The parameter m is just 6u plus 2,
and that's of size one fourth the size of r, because n was given by a degree 4
polynomial and now we have a linear polynomial in u. And then we have this -- that's
the Miller loop, the square-and-multiply-like part here, where you have these doubling
steps. You double the point and then you do the squaring times the line
function. I've already thrown out all the denominator line functions, because that's
what you can do for even embedding degree: by choosing this G2 in
the way we did, they all lie in subfields and get mapped to 1 by the final exponentiation.
You have to do an adjustment if the parameter u is negative. And then it's not only
that -- for the Tate pairing you just have this Miller loop and then you're done, also for the
ate pairing, but for the optimal pairings you have to pay a little bit to get this small order of
the function. So you have to do some Frobenius computations and two more of these
addition steps. And then steps 14 to 16 are the final exponentiation. You can split that
up into some Frobenius powers and multiplications, and you are left with a
so-called hard part of the final exponentiation, which is kind of a real exponentiation,
down here.
So these are actually the things that are computationally intensive. So here all these
doubling steps, you have multiplications in the large field. You have to compute the
coefficients of the line functions, evaluate at P, do these multiplications after the
squaring and the point computations. And then this is all not so expensive, and then
you get this large exponentiation down here.
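Structurally, the algorithm on the slide looks like the following Python skeleton. Every helper here (dbl_step, add_step, frobenius_twist, neg, conj, final_exp, and the F_{p^12} identity "one") is a hypothetical placeholder for real curve and field arithmetic; this only shows the shape of the computation:

```python
def optimal_ate_bn(u, Q, P):
    m = abs(6 * u + 2)                   # quarter-length loop parameter
    f, T = one, Q
    for bit in bin(m)[3:]:               # the Miller loop
        T, l = dbl_step(T, P)            # doubling step with line value at P
        f = f * f * l
        if bit == '1':
            T, l = add_step(T, Q, P)     # addition step with line value at P
            f = f * l
    if 6 * u + 2 < 0:
        f, T = conj(f), neg(T)           # adjustment for negative u
    Q1 = frobenius_twist(Q, 1)           # pi(Q)
    Q2 = frobenius_twist(Q, 2)           # pi^2(Q)
    T, l = add_step(T, Q1, P); f = f * l        # the two extra
    T, l = add_step(T, neg(Q2), P); f = f * l   # addition steps
    return final_exp(f)                  # exponent (p^12 - 1)/r, easy + hard part
```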
Okay. So let's have a look at how to choose suitable curves that make this algorithm
more efficient. So, for example, if you have that parameter m 6u plus 2 very sparse,
you don't need to go into that if statement so often. That's good. That saves you
computation.
Also, if the value u itself is sparse, has a lot of zeros in its binary representation, then
you can do that hard part of the final exponentiation down here very efficiently.
So it's really essential to choose these things in that way, as we will see later.
And then you might want to choose p congruent to 3 mod 4 to get that nice
representation of the quadratic field. You have a lot of computations in Fp squared, and
that will make them very efficient, although maybe Francisco will have a different view on
that tomorrow. So I would say that this is the best choice.
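For p congruent to 3 mod 4 one can take Fp^2 = Fp(i) with i^2 = -1, and the basic operations are as simple as this sketch (plain Python integers standing in for a real implementation):

```python
# F_{p^2} elements are pairs (a0, a1) meaning a0 + a1*i with i^2 = -1
def fp2_mul(a, b, p):
    a0, a1 = a
    b0, b1 = b
    # schoolbook: four base field multiplications -- an even number, which is
    # what makes the two-way SIMD pairing in part two of the talk work out
    return ((a0*b0 - a1*b1) % p, (a0*b1 + a1*b0) % p)

def fp2_sqr(a, p):
    a0, a1 = a
    # complex squaring: (a0 + a1*i)^2 = (a0+a1)(a0-a1) + 2*a0*a1*i
    return ((a0 + a1) * (a0 - a1) % p, 2 * a0 * a1 % p)
```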
And then you might want to choose xi as small as possible. We'll see what that means
in a few seconds.
So there have been some very fast pairing implementations out there which just look at
the efficiency of the actual pairing algorithm. But if you do protocols, you want to also
have efficient scalar multiplication on the curve, and you might need to hash to the
curve. That means you currently have to compute square roots or cube roots. So you
might think about that as well.
Also, there are some protocols designed for constrained devices where actually these
devices don't compute any pairings at all. The pairings are computed somewhere
where you have the power to compute them. And then, if you increase pairing efficiency
by paying with a less efficient scalar multiplication, that's also not a good thing to do.
So you could even use Edwards curves to get a very fast scalar multiplication. You can
compute pairings; you're not as flexible; but, still, I mean, it could be the best choice.
All right. So the next two slides are actually just about a nice subfamily of these BN
curves that we actually just came up with, to show that you can choose BN curves as
nice as you want, because there are a lot of them; there are enough curves. And, for
example, if you choose that element xi in a way such that its norm down to Fp
gives you the parameter b, then you already know which is the right twist to use, so you
don't need to check for the order and stuff like that. That's just very basic, some
basic calculations using the fact that these curves don't have points of order 2 and 3.
So we suggest the following. You should choose your BN curves as follows:
choose a low-weight u such that you get a low-weight 6u + 2, as already
mentioned earlier; p congruent to 3 mod 4 to get the nice quadratic field extension; and
then you choose a small xi of this shape. And that gives you b like this: c to the 4
plus d to the 6. And you plug in some small integers to try -- I mean, you still have to try
to find the right b value. That's the thing you need to do. So you need to plug
in some values for c and d that give you the correct b, and then everything else is
fixed.
The advantage of this is you get an obvious point on the curve, which is (-d^2, c^2).
And you also get an obvious point on the twist, which is
this point.
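That first claim is a one-line calculation: (c^2)^2 = (-d^2)^3 + c^4 + d^6. A tiny Python check, with c, d and a BN prime p as inputs:

```python
def check_obvious_point(c, d, p):
    # verifies that (-d^2, c^2) lies on y^2 = x^3 + b for b = c^4 + d^6
    b = (c**4 + d**6) % p
    x, y = -d**2 % p, c**2 % p
    return (y * y - (x**3 + b)) % p == 0
```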
So once you fix c and d, you can compute all that very easily. You get all these
generators. I mean, that's not something you need for efficient computation, but it's a
very nice representation of these curves, and you can choose it all in a way that gives a
very efficient pairing algorithm.
One example curve is this. The u is very sparse, just three minus-one values in its
signed binary representation. And you can choose c equals 1 and d equals 1. So this
information is all you need to describe all the parameters you need for pairing
computation. It might be useful for putting curves in certificates or something like that.
That's really not much information. And from that you can compute everything; you get
these nice generators. If you want to send them around, that's not much information.
You have a very small xi here, so multiplication by that thing is just two additions in Fp
squared. And the twist looks like this.
Also, b and b', so b over xi, are kind of small, which is good because the current really
fast formulas for computing the pairings use these values b. And if they're small, that's
better.
Okay. So these are some suggestions for BN curves. Okay. So we'll come to the
second part of my talk, which is about a pairing implementation. And, I mean, these
pairing-friendly curves are very special. They have this special embedding degree
condition, and also most of them are given by polynomials. So the primes are given by
polynomials, so that's a special structure. So the question is could we use that to make
arithmetic more efficient.
And, actually, Fan, Vercauteren, and Verbauwhede demonstrated this in a hardware
setting, where they showed that you can use this polynomial representation to get very
fast algorithms. And here they choose a u that's almost a power of 2. It's just a
power of 2 plus some small thing that makes the polynomials give you prime
values.
The problem is that this essentially uses the fact that you can build specially sized
multipliers in hardware. You can build multipliers that multiply
certain small numbers with larger ones, and that's more efficient than a general
multiplication. In software you don't have that; you can just use what you have.
So their approach is not exactly what we can do in software. So the question is does it
work in software.
So if you look at Dan Bernstein's paper, Curve25519, then you see that he does some
similar things there. He represents the elements in some strange rings. So if we do it
like here, we can just write down that polynomial p: we introduce another variable and
write it as a polynomial in some variable x, and then you can write that p in these
two different shapes. And if you plug in 1, you get back the value of the prime p.
So what we can do now: we can represent our elements of Fp by just using the same
representation. We put coefficients here and kind of have a representation with certain
different sizes for the coefficients. I mean, it's not unique -- you can have several
different representations of a number. We just need that F evaluated at 1 is equal to the
element F it represents. So we will do arithmetic now by multiplying polynomials and
then plugging in 1 at the end.
So if you have two such elements and multiply them, you get an element of degree 6.
That gives you 7 coefficients, and then you want to get back something with 4
coefficients, of course, and then you can use the polynomial representation of p to
reduce the degree of the polynomial. And that actually looks very simple. So you get
these formulas. That's the degree reduction. So you do something like a schoolbook
multiplication on polynomials and then reduce the degree again.
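Here is a schematic Python version of that multiply-then-reduce step. One caveat: the talk's representation folds the size weights into the coefficients and evaluates at 1, while this sketch keeps the weights in the variable and evaluates at x = u, which is the same idea; also, the rational polynomial division below stands in for the handful of fixed shift-and-add formulas a real implementation would use (p(x) is not monic, its leading coefficient is 36):

```python
from fractions import Fraction

P_POLY = [1, 6, 24, 36, 36]            # p(x) = 36x^4 + 36x^3 + 24x^2 + 6x + 1

def poly_eval(f, x):
    v = 0
    for c in reversed(f):
        v = v * x + c
    return v

def poly_mul(f, g):                     # schoolbook product, 4 x 4 -> 7 coefficients
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

def poly_reduce(h):                     # remainder of h(x) modulo p(x)
    h = [Fraction(c) for c in h]
    for k in range(len(h) - 1, 3, -1):
        q = h[k] / P_POLY[4]
        for i in range(5):
            h[k - 4 + i] -= q * P_POLY[i]
    return h[:4]

u = (1 << 62) + (1 << 55) + 1           # a hypothetical sparse u
p = poly_eval(P_POLY, u)
f, g = [3, 1, 4, 1], [2, 7, 1, 8]       # toy coefficient vectors
r = poly_reduce(poly_mul(f, g))
val = poly_eval(r, u)                   # a fraction with a power of 36 below
assert poly_eval(f, u) * poly_eval(g, u) % p == \
       val.numerator * pow(val.denominator, -1, p) % p   # Python 3.8+
```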
The problem is that this doesn't really help. It's not really more efficient than, for
example, Montgomery multiplication. The reason is you have to do reductions all the
time, just as in Montgomery, and probably Montgomery representation is the more
efficient thing to do. So, again, it's the same problem as with the hardware. I mean, if
you multiply these things with 64-bit coefficients, the product has
128-bit coefficients, and then you add something during the algorithms that makes
these things grow, so you have to do reductions all the time. In a hardware
realization you could make slightly larger registers and then it would work.
So the idea we had is: okay, let's take more coefficients and put them in some
variables such that there's space.
The problem is we need to restrict u to third powers, u = v^3, and that's a really strong
restriction. Okay. Anyway, so we take 12 coefficients like that, and then a 256-bit
number we represent by these 12 coefficients, and this v thing is 21 bits: u was about
one quarter of the size of r, which is 256 bits, so u is about 63 bits and then v is 21.
So what happens if you multiply two such elements? You get double the size of the
coefficients. And if you put all these things in double-precision floats then there's still
space for doing some additions without having to do coefficient reductions. So we use
double-precision floating points and can do some computations without doing the
reductions.
But, of course, at some point you reach the limit of the size of these variables, so
you have to do some coefficient reductions. And we do that with rounding. So
that's just computing modulo 6v and computing modulo v, and then you get,
like, a balanced representation where these things are between minus 3v and so
on.
And, again, if you get a carry from the last coefficient, you use the polynomial
representation to reduce back to the other coefficients.
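That rounding-based coefficient reduction is the usual carry chain; a minimal sketch of one step (exact integer rounding here, where the real code uses cheap floating-point rounding):

```python
def carry(c, m):
    # write c = q*m + r with |r| <= m/2, so reduced coefficients stay balanced
    q = (2 * c + m) // (2 * m)          # integer round(c / m)
    return q, c - q * m
```

Applied alternately with the moduli 6v and v across the coefficient vector, and with the final carry folded back in via the polynomial representation of p, this keeps all coefficients small enough for the double-precision lanes.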
So this is how Peter Schwabe and Ruben Niederhagen implemented this. They used
vector instructions, mulpd and addpd, with which you actually do two operations in one
instruction. So we have these SIMD registers and instructions, and you can do two
multiplications or two additions at once.
The problem is, if you do Fp arithmetic with that, you always have to make sure that the
right things are in the right places to actually use these instructions. So you have to do
shuffling and combining and things like that.
The solution to this problem is that you directly implement Fp squared arithmetic,
because that naturally gives you some kind of parallelism. If you write
down two Fp squared elements interleaved -- you take the zero coefficient from the
first and then the zero coefficient from the second and so on -- you more often already
have the right things at the right
places.
So then we used schoolbook multiplication, so no Karatsuba, because schoolbook gives
an even number of multiplications you can do in pairs, and so on and so forth. And then
per multiplication you have to do at least one polynomial reduction and two coefficient
reductions.
There's one nice thing I should mention. Of course it becomes more efficient if you
take out reductions. That was actually the plan when we chose these variables: that
you have some space to take out reductions. And so Peter wrote a class that always
keeps track of the maximal values the coefficients can reach. So we just plug that in,
and he tried out how far it works before we need to do a reduction again, and so on and
so forth.
And in the end it was reimplemented in qhasm.
So the results. Currently we have performance like this, so about 4 million cycles.
To compare with what was published before: there was a paper by Hankerson,
Menezes, and Scott from 2008. They gave, for the same security level, BN curves, an
optimal ate pairing -- it was the R-ate pairing, but it's almost the same algorithm -- and
they gave something like 10 million cycles. And we asked Mike Scott, and he sent us back
something like that for a newer version of the parameters, and we were quite happy that
ours was so fast.
And then a few weeks later some people came along and made a very fast
implementation of the optimal ate pairing. I'm not going to talk about that very much,
because that's tomorrow at the same time. Francisco will explain to you what they did
to get this performance.
So if you have a look at this table down here, where we also give the cycle counts for
multiplication and squaring in Fp squared, then you see that ours is actually currently a
bit faster, just looking at the multiplications. But the total pairing is slower.
And the reason for this -- the main one, actually -- is that we have to restrict to this
condition here, u being a third power. That restricts the choice of curves so much that
we can't use these nice curves, and the other guys actually use a very nice curve, with
very sparse parameters and everything.
So that's the main reason it doesn't work so nicely.
Also, it wasn't possible to remove as many reductions as we thought. So there are still
too many reductions in there, and that takes time. And multiplication is not really much
faster. So unlike in hardware, in software it seems it doesn't really pay off to use the
polynomial representation.
The reason for multiplication not being faster is that we use schoolbook multiplication to
have this parallel structure, so an even number of multiplications, and still we have to
put things at the right places, so that also takes some time here.
But, still, it's not that bad. It seems that Montgomery multiplication, which is what the
others use, is still the best thing to do in software. But maybe if we try to compute
pairings on other architectures, where floating-point operations are faster, then maybe
our approach will be better.
Okay. That's the second part.
So if you're bored and you want to do something else, you can think about why our
software is called dclxvi, and talk to me after the talk.
So the third part actually deals with the choice of coordinate system for pairings. You
have to do curve arithmetic, so you have to decide which coordinate system you use.
And I wanted to show you this quotation from Steven Galbraith from the year 2005. He
said one can use projective coordinates for the operations in E(Fq). The performance
analysis depends on the relative costs of inversion to multiplication in Fq, and
experiments show that affine coordinates are faster.
So that was 2005. That was before BN curves and that was with supersingular curves.
In the meantime, the picture has changed. I mean, for ECC it's clear that you will use
projective coordinates because finite field inversions in prime fields are very expensive
compared to multiplications, and you want to avoid these inversions by doing some
more multiplications.
And also for pairings, I mean, the current speed records, our implementation as well as
the one by Francisco and the others, use projective formulas. That seems to be the best
thing to do there.
But, still, maybe one should think again about using affine coordinates. The reason for
this is that in some cases we can have quite efficient inversions. For example, if you
work in extension fields. So here is the example of a quadratic extension given by that
polynomial; yeah, you can compute the inverse by basically computing the inverse of
the norm of that element, which is an inversion in the base field.
So to invert in Fq squared, you invert in the base field and do some additional
multiplications. You have to compute these squares: you compute the norm first, then
invert that, and then multiply by the coefficients.
So an inversion in Fq squared costs one inversion in the base field plus a few more
operations: two multiplications, two squarings, a multiplication by the non-residue, and so on.
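Concretely, with i^2 = -1 as in the BN setting from before, the norm trick reads like this minimal Python sketch:

```python
def fp2_inv(a, p):
    # (a0 + a1*i)^(-1) = (a0 - a1*i) / (a0^2 + a1^2): two squarings, an addition,
    # one base field inversion of the norm, and two base field multiplications
    a0, a1 = a
    norm = (a0 * a0 + a1 * a1) % p
    ninv = pow(norm, -1, p)             # the single F_p inversion
    return (a0 * ninv % p, -a1 * ninv % p)
```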
This means that the ratio of inversion to multiplication becomes smaller. If you assume
that a multiplication in the quadratic extension, with Karatsuba, takes at least three
base field multiplications, then this ratio will be roughly one third of the base field ratio
plus some constant. So going up in extension fields improves your inversions by a
certain factor. In this case it's roughly one third.
So in general, for a degree-l extension -- what I'm telling you is all old stuff that's well
known; it's basically a generalization of the Itoh-Tsujii inversion algorithm, which is the
standard way to compute inverses in optimal extension fields -- you can do that in
general. You can invert an element in some extension field by computing the norm,
inverting that, and then multiplying by that element raised to this exponent
here.
And if you sit down and think about the exact costs of these algorithms, you end up
with something like that. So in a degree-3 extension, for example, you get roughly one
sixth the ratio plus some constant here, and so on.
So you can see: the larger the field extension gets, the better your inversion with respect
to multiplication will be in the end.
Also, another trick, also very well known -- I don't know which one of Peter's tricks it is,
but it's one of the tricks that came up in the papers on the ECM method, so very well
known. Of course, if you want to invert two elements in parallel, then you compute the
product, invert the product, and do two more multiplications to get the inverses.
So you replace two inversions by one inversion plus three multiplications. In general
that works just as for two elements: you compute all these partial products, you invert
the last one, and then you can get back the individual inverses with two multiplications
each. So in total you replace s inversions by one inversion and 3(s - 1) multiplications.
And that gives you an average inversion-to-multiplication ratio of roughly 3 if s is large
enough. So we have two ways of improving the inversion-to-multiplication ratio. One is
this trick for when you are able to do several inversions at once, and this has nothing to
do with field extensions. On the other hand, we have the property that in extension fields
you get a better ratio.
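Montgomery's simultaneous inversion trick in full, as a small Python sketch; inv is whatever single-element inversion you have, e.g. lambda x: pow(x, -1, p) in F_p (in a real field implementation you would also reduce mod p after each product):

```python
def batch_invert(xs, inv):
    # replaces s inversions by 1 inversion plus 3(s - 1) multiplications
    partial = [xs[0]]
    for x in xs[1:]:
        partial.append(partial[-1] * x)      # s - 1 mults: prefix products
    acc = inv(partial[-1])                   # the only real inversion
    out = [0] * len(xs)
    for i in range(len(xs) - 1, 0, -1):
        out[i] = acc * partial[i - 1]        # 1/xs[i]
        acc = acc * xs[i]                    # strip xs[i] off the accumulator
    out[0] = acc
    return out
```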
So if you think back to the ate pairing algorithm and the first table with the examples of
the curve construction methods, the last column gave you the extension degrees.
And for the ate pairing, or variants of the ate pairing, which are the most efficient pairing
algorithms, you will end up doing computations exactly in that extension field. So if you
compute the line function values, you compute the slope of the line function and you
have to do an inversion in that extension field. That's the case if you use affine
coordinates.
So it might actually be a good thing to use affine coordinates: if your inversions are
cheap already, which is usually not the case; or, for ate pairings, if you go to very high
security levels where you have large extension degrees, then you can use these
extension field tricks to improve your I-to-M ratio and get a better algorithm.
And there might be reasons not to use these very special curves with a fixed j-invariant
of 1728 or 0, so people might want to use more general curves that only have a degree
2 twist. And then this field extension is larger also. So in that case it might help to use
affine coordinates. It might be worth considering.
Or, to come back to the second trick, if you compute several pairings. There are a lot of
protocols that actually compute many pairings, or, for the Groth-Sahai proof systems,
you compute products of pairings, and then you can do that in parallel -- well, not really
in parallel; you do it at once. You wait until you get to the inversions, and then you do
the inversions all at once and get a ratio of roughly 3.
So there are some possible scenarios where it might be good to think about affine
coordinates again.
The reason we stumbled upon this was when I was working on Microsoft's pairing
library, which is based on the bignum library, which is basically written by Peter
Montgomery. So it has finite field arithmetic, field extensions, polynomials, elliptic
curves, and all that stuff.
So I was working on pairings using that library. What we did is we used the base
field arithmetic from bignum, so Peter's Montgomery multiplication. And, as usual,
256-bit numbers are split into four pieces and you do the usual Montgomery stuff.
We have the extension fields, and I put in these inversion tricks I described before. And
the whole thing is a C implementation. For the special case of 256-bit integers we have
a little bit of assembly, but it's basically completely C. It's not restricted to any specific
security level, curves, or processors: you can plug in any BN curve. You can even plug
in other curves, but currently we have the most stuff for BN curves. So you can plug in
any curve and it will work. It works on 32-bit and 64-bit Windows. And that's what we
got for the field arithmetic.
If you compare the multiplications, it's really much slower than the special
implementations I talked about earlier. But I wanted to show you this last column here,
which is the inversion-to-multiplication ratio. So given an implementation with these
properties, you see here that in the degree-12 extension an inversion is almost as
cheap as a multiplication.
So if you were to work over such large extension fields, you might want to do the
inversions instead of using projective coordinates.
And then the pairing algorithm: it's basically an ate pairing as shown before, and here
you can see some numbers -- I mean, that's the optimal ate pairing. It's about 14, about
15 million cycles, and if you use the simultaneous inversion trick, it doesn't show too
much here, because the ratio in Fp squared is already 5.3; if you reduce that to almost
3, that's not a big deal here. So it becomes a little bit better. Products of pairings can
use more optimizations: you can just multiply everything into one Miller variable and
save a lot of time.
Good. If you want details and a thorough analysis of how much more the fastest
projective formulas cost compared to affine coordinates and so on, you can look at the
paper here.
Thank you.
[applause]
>>: Any questions?
>>: The obvious question, putting together parts two and three: who is going to do
it? When is it going to happen?
>> Michael Naehrig: Pardon?
>>: You mentioned the possibility, maybe probability, of speeding things up with affine
coordinates, and you've been mentioning better multiprecision arithmetic. So when are
we going to see an implementation that puts these things together and gets even better?
>> Michael Naehrig: For BN curves, I mean, it usually wouldn't pay off to use affine
coordinates, because if you look at the speeds people get for these fastest
implementations, then the projective formulas are much better. So it would only pay off
for higher security levels.
So, yeah, it's a good thing to try that, yeah.
>>: It seems to me that the recommendations you have [inaudible] limit the supply of
curves.
>> Michael Naehrig: Yes.
>>: Well, you oversell, no matter how much --
>> Michael Naehrig: Oh, okay.
>>: That you expect the same proportion, or the operation that takes it to be a power of
2 [inaudible].
>> Michael Naehrig: Yes.
>>: You can get lucky a few times, but then it will run out.
>> Michael Naehrig: Yes.
>>: If no more questions, let's thank Michael for his talk.
[applause]