>>: This is the first speaker of the session. This is Michael Naehrig, who we know in the crypto community as the co-discoverer of the BN elliptic curves, and we're talking about implementing pairings today. >>Michael Naehrig: Thank you. So good morning, everybody. It's nice that everybody came back this morning. Yeah, first of all, I'd like to thank the organizers for giving me the opportunity to speak here. That's really great. So my talk will be about pairing computation. Although I'm not going to go so much into the detail of the implementation, a little bit in the middle part, but it's more about parameter selection and algorithms to compute pairings. So the talk has three main parts. I'll give some general stuff in the beginning about pairings and pairing-friendly curves. And, of course, we'll have a look at BN curves, in particular a subfamily of BN curves which are very, very nice for implementation. Then in the second part I'm going to describe an implementation, which actually Peter [inaudible] did, of the optimal ate pairing on a BN curve using the polynomial parameterization of the primes to implement the field arithmetic. And in the third part we'll take a look at using affine coordinates for pairing computation. Okay. So let me start. We've seen most of this yesterday already, so it's just to fix notation. We'll talk about elliptic curves over some finite field Fq, and I'm going to denote by n the number of Fq-rational points on that curve. And that's q plus 1 minus t, where t is the trace of Frobenius. That's all known. And we'll take a large prime divisor r of n. That's the size of the group we usually do crypto with. And then the embedding degree of E with respect to this r is the smallest positive integer k such that r divides q to the k minus 1. That's a very important parameter for pairings, as we'll see soon. Yeah, we have these three properties down here. So, of course, k is the order of q modulo r. And that's also the reason why, if you choose an arbitrary, like a random elliptic curve, this k usually is very large, of the order of r actually. And furthermore we have, of course, that the rth roots of unity are contained in the finite field extension Fq to the k, and also, if that k is larger than 1, we have all r-torsion points on the curve already defined over Fq to the k. So that's basically the reason why I will fix our universe today as Fq to the k, or the curve E over Fq to the k. We don't need any larger fields here. So for practical applications of pairings we usually use, yeah, variants of the Tate-Lichtenbaum pairing, which is the explicit version described by Lichtenbaum of the Tate pairing, and you can see why the embedding degree is so important. Basically everything works over things defined over Fq to the k. Yeah, the pairing takes two points. The first point is from the r-torsion group and the second gives some class in this quotient, and this is mapped over to the field Fq to the k star modulo rth powers. So the pairing is actually a function f_{r,P}, which is given by this divisor, r times (P) minus r times the point at infinity, and it's evaluated at some divisor given by the point Q. So for people who implement this, this is already too complicated, so we're going to try to make things easier by first restricting the first argument to points defined over Fq. And in suitable circumstances we can also take the second argument just as an r-torsion point, this time defined over the larger field.
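(As an aside on the embedding degree condition: it is just the multiplicative order of q modulo r, which a minimal Python sketch makes concrete. The numbers below are toy values for illustration, not parameters from the talk.)

```python
# Sketch: the embedding degree k is the smallest positive integer with
# r | q^k - 1, i.e. the order of q modulo r (assumes r does not divide q).
def embedding_degree(q, r):
    k, power = 1, q % r
    while power != 1:
        k += 1
        power = (power * q) % r
    return k

# Toy example: 67 mod 5 = 2, and the order of 2 modulo 5 is 4, so k = 4.
assert embedding_degree(67, 5) == 4
```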
And if we have k larger than one, we can replace the divisor by the point Q. So it's basically just a function given by some point P, evaluated at some point Q, and then raised to this exponent to give a unique value in the rth roots of unity, so we get rid of these classes over here. That's basically the thing we start from. We've seen this yesterday already, but I have much nicer pictures with colors [laughter]. This is the standard group law on an elliptic curve, and actually these lines that occur here, like the line through P1 and P2 and then the vertical line here, these are the things you can use to build up the function. That's what Victor Miller described yesterday already. So you have these formulas over here. So if you want to compute f_{m,P} of Q for some integer m, you can build that up by doing some square-and-multiply-like loop using these formulas. So you start with f_0, which is 1, and then you build up from that. And you always evaluate directly at the point Q. If you think of how large the degree of that function would be if you don't plug in the point Q -- I mean, the degree is n, so you have a really huge function -- you want to plug in the point Q directly and then only compute with the values. So I've also put an Edwards curve down here, because there you can't work with line functions. For Edwards curves, which are quartic curves, it doesn't work with line functions. So in that case you just replace the line function by a quadratic function, this blue one here. So if you want to add P1 and P2 on the Edwards curve, you need to find that blue function, which is a quadratic that goes through the points P1, P2, this point O prime down here, and the points at infinity. And then [inaudible] theorem gives you an eighth intersection point. So because the points at infinity have multiplicity 2, you get 4 of them; together with these three points that's 7 in total, and then you get one more intersection point. So that's the group law on Edwards curves. And you can just replace that function by the quadratic and that function by the product of that line and the line that goes through the y axis, and then the algorithm works just the same. So what I want to point out here is what you have to do: you have to do computations, and these computations are basically curve arithmetic to compute these points, so you have to keep track of the points you come along, and you have to compute in the field Fq to the k, namely, by squaring this value and multiplying by the fraction of these line functions. And that's the reason you need to have a small embedding degree k, because if that's too large, you just can't do the computations. So we're going to make things even easier. We're going to restrict to some common choices for the groups we compute the pairings on. So the first, that's what we already did for the Tate pairing: we'll just restrict to a point defined over the base field. And the second group we're going to choose as the q-eigenspace of the Frobenius endomorphism, also known as the trace zero group, and -- yeah, we'll see why we're going to do this. And then we have basically two variants of pairings we compute. Namely, something like the reduced Tate pairing, where the first point is small, defined over the base field, and the second is taken from G2, so defined over the large field, and then we have a second possibility, namely, the ate pairing, where we just change the order of these groups.
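(The square-and-multiply structure of the Miller loop just described can be sketched as follows; `dbl`, `add`, `line` and `vert` are placeholders for real curve and line-function arithmetic, so this is a structural sketch, not an implementation from the talk.)

```python
# Schematic Miller loop computing f_{m,P}(Q). The callables are assumptions:
# dbl/add implement the curve group law, line(T1, T2, Q) evaluates the line
# through T1 and T2 (the tangent if T1 == T2) at Q, and vert(T, Q) evaluates
# the vertical line through T at Q.
def miller(m, P, Q, dbl, add, line, vert, one):
    f, T = one, P
    for bit in bin(m)[3:]:            # binary digits of m after the leading 1
        f = f * f * line(T, T, Q)     # squaring times the tangent line
        T = dbl(T)
        f = f / vert(T, Q)            # denominator; dropped later for even k
        if bit == '1':
            f = f * line(T, P, Q)     # addition step
            T = add(T, P)
            f = f / vert(T, Q)
    return f                          # f_{m,P}(Q), before any final exponentiation
```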
So the first point is a larger point, the second one is the smaller. The advantage of this is that, as you see here, it's not f_r anymore, it's f_t, and t is basically the trace of the Frobenius. So that value is much smaller than r. So if you think of the square-and-multiply Miller loop, then this can be computed in half the number of iterations of the Tate pairing. So there are even more efficient variants, namely, so-called optimal ate pairings, and they have even smaller values here. So the f_t can be replaced by f_m for some m, and the notion of optimal ate pairings means that the length of m is actually the length of r divided by the Euler phi function of the embedding degree. We'll see an example of an optimal ate pairing later. So these are basically the versions -- I'm not talking about any special types, so I want to restrict to just these choices for this talk. Okay. So we have this group G2 here, which consists of r-torsion points defined over the large field Fq to the k, and that's annoying if we have such large elements. And we can actually use a twist of E to represent that group nicely. That's the main reason, actually, we choose this group: because it can be represented very nicely. So for this talk we will -- yeah, we'll define a twist E prime of E as a curve which is isomorphic to E over Fq to the k. That's what I said in the beginning. Our universe today is Fq to the k, so nothing larger than that. So a twist E prime of E is a curve isomorphic over that universe. And as you might almost all know, these twists look like this, and the isomorphism looks very simple. It's just multiplying the coefficients by omega squared and omega cubed for some element omega in Fq to the k star. Okay. So if -- I mean, the isomorphism is defined over Fq to the k. But if the curve is defined over some smaller field and we have a divisor d of k such that E prime is defined over Fq to the k over d and no smaller field, and the isomorphism is defined over Fq to the k and no smaller field, we say that the twist has degree d. And there are not so many possibilities for the twist degrees. Yeah, for an arbitrary curve in general it's just d equal to 2. If we have special j-invariants like 12 cubed, then the coefficient b of the curve is zero, and then we can have twists of degree 4 and 2, and for j-invariant zero we can have twists of degree 2, 3, and 6. That's all that is possible. And -- yeah, so the good thing is that we can say that for the degree d given by that gcd over here, where d is a divisor of k, there's always a unique twist E prime of degree d such that our prime r divides the group order of that twist. There's a nice treatment of this in the original ate pairing paper that explains all that. >>: [inaudible]. >>Michael Naehrig: And you can write down possible group orders of twists. That's what they do in the ate pairing paper. Then you can see that there's only one possibility of these group orders that can be divisible by r. >>: [inaudible]. >>Michael Naehrig: Yes. So we're going to fix one of these equations -- I'm just saying that -- yeah, so I should say there's exactly one group order. Yeah, right. So now we take this twist and we define the group G2 prime, which is just the group of r-torsion points on E prime defined over that smaller field Fq to the k over d.
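(As an aside on the loop lengths above: they are easy to compare concretely with the BN parameterization that shows up later in the talk. The u below is only an illustrative sparse value.)

```python
# Miller loop lengths for Tate, ate and optimal ate on a BN-style parameter.
u = 2**62 + 2**55 + 1                        # illustrative sparse value
p = 36*u**4 + 36*u**3 + 24*u**2 + 6*u + 1    # BN prime
r = 36*u**4 + 36*u**3 + 18*u**2 + 6*u + 1    # BN group order (prime-order case)
t = 6*u**2 + 1                               # trace of Frobenius

print(r.bit_length())           # Tate: loop over r, roughly 256 bits
print((t - 1).bit_length())     # ate: loop over t - 1, about half of that
print((6*u + 2).bit_length())   # optimal ate: about log2(r)/phi(12), a quarter
```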
And then our twisting isomorphism defines a group isomorphism from G2 prime to G2. So here are some pictures, and I've put that here so that G2 prime is sitting here and has points defined over the smaller field and [inaudible] maps over to G2. And if we now, in addition, also represent the field extensions we work in, so Fq to the k, with this same element omega, then that's very convenient, because elements in Fq to the k can be written as polynomials in omega with coefficients in Fq to the k over d. So that twist isomorphism just takes the coefficients and puts them at the right places in the new elements. So that's very convenient. There's nothing to compute. And you can see that these group elements from G2 are actually very special. They are kind of sparse, because they all come from elements in G2 prime. If you want, you could also think of what happens if I go back from G1. Then you will end up over here in G1 prime, and that will look very similar. It's just coefficients put at the right places. But in this talk we're only going to use G2 prime. Okay. So now we've got all the stuff we need to talk about pairing computation, but now we need curves to compute pairings on. And we need to fulfill some security requirements. So this table I have taken from recommendations by NIST and ECRYPT II, and what it says is, it gives a level of security, and it says there are certain sizes for the parameters that give equivalent levels of security. So, for example, if you want to have 128-bit security, you need to take an elliptic curve with 256 bits for the prime r. And the extension field should be something like that, 3000 bits. And -- yeah. So I'm actually not talking about any special assumptions. That's basically the minimum requirement, that at least the DLPs should be difficult enough. And now the pairing has this embedding degree of the curve we compute the pairing on, and that embedding degree links the sizes of the prime r and the finite field Fq to the k. So that's very nicely represented down here. That value rho is basically a measure for how far the curve is away from having a prime number of rational points over the base field, so the size of q is actually rho times the size of r, and then you take k times the same thing to get the size of the extension field. So the factor between the prime r and the extension field is rho times k. And from these minimal requirements here you can deduce kind of an optimal factor for that. At the 128-bit level it comes out to be around 12. So for efficiency reasons you might want to balance the security. So, for example, first of all, you want to have the rho value as close to 1 as possible. I mean, it's good to have a prime order curve, so you don't need to compute with too large primes q or prime powers q here. And, also, if this factor rho times k is too large, then fulfilling the minimal size for r makes your finite field way too large for that same security level. The other way around, if it's too small, you will have to increase the size of r to make the finite field large enough, which makes the group on the elliptic curve actually too large for that security level. So if you believe in these recommendations, then a good thing to do would be to choose rho times k as close as possible to these optimal values in the table. So we need a small k, and we need rho times k to be close to these values, and we would like to have a twist degree as large as possible. So there are quite some constraints on the choice of curves we can make.
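(Coming back to the twist representation for a moment: the "nothing to compute" can be made concrete with a tiny sketch, assuming a degree-6 twist with Fq to the 12 written over Fq squared in the basis 1, w, ..., w^5 for the same w as in the twist, and the isomorphism (x', y') to (w^2 x', w^3 y'). Untwisting is then pure coefficient placement; this is where the sparseness of the G2 elements comes from.)

```python
# Sketch: untwisting a G2' point is just placing its two Fq^2 coordinates
# into otherwise-zero coefficient vectors of F_{q^12} elements.
def untwist(xp, yp, zero):
    x = [zero, zero, xp, zero, zero, zero]   # x = x' * w^2: one nonzero slot
    y = [zero, zero, zero, yp, zero, zero]   # y = y' * w^3: one nonzero slot
    return x, y
```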
So as Professor Koblitz said yesterday already, supersingular curves always have small embedding degree. In large characteristic it's at most 2, and for characteristic 3 and 2 we can have k up to 6. So they were the natural first choice. That's when they came back from the grave and people were computing pairings on them. But you can see rho times k is quite small, so you would have to make the fields very large to get a certain security level. So I'm going to focus on ordinary curves in this talk. We'll hear something about pairings on supersingular curves tomorrow, I guess in both talks on pairings. So to get a larger embedding degree and larger rho times k we might want to go to ordinary curves. So what do we need? We need curves with the following properties. That's the standard thing for the group order. We need a prime divisor of that group order. We also need the embedding degree condition, and we need to find that curve, and usually that is done by using the CM method. I mean, if you choose some random curve, as already mentioned several times, then the embedding degree is just enormous. You can't compute with it. So you have to fix that in advance. So fix the embedding degree and then try to fulfill these equations, and that should look familiar from Francois [inaudible]'s talk yesterday. If you have that formula fulfilled, then you can use the CM method to construct the curve. The only restriction is that you must be able to compute the Hilbert class polynomial, so the discriminant D should be suitable, or small enough, such that you can actually compute that and find the curve. Okay. So these are the restrictions. And usually it's done -- so for these examples for the security levels I showed you earlier, it's usually done by selecting polynomials that parameterize these parameters n, p, and t and that already fulfill these conditions as polynomials. And the only thing you need to do is to plug in values until you find these things to give you primes p and r, and that's all nicely described in the famous taxonomy paper that has basically all the constructions that are out there. And in this table I have put some examples that kind of fulfill these balanced security requirements. So for the 128-bit level, of course, BN curves: they have an embedding degree of 12 and rho equals 1, so that exactly fits these requirements, although one might have to adjust the security a bit by increasing the bit size a few bits, because you have these endomorphisms of degree 6 and a very special parameterization. So, anyway, these are the names of these constructions in the taxonomy paper. So if you want to look at that later, you can use the slides to find them in the paper. So just to give you an idea of what we can have here, the upper entries are always the ones with the higher degree twists. And rho equal to 1 is just possible for a few families: the BN curves, Freeman curves for k equals 10, and MNT curves, which are ordinary curves that have embedding degree less than or equal to 6. For all the other examples you can't find prime order curves. Okay. And then you might want to look at the last column here. That gives you the degree of the field you have to do the computations in. For the ate pairing you will replace G2 by G2 prime, and that's defined over this degree extension of Fq. So a very nice example is these BN curves. They are given by these two polynomials. So p and n are given by degree 4 polynomials.
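(Written out, those two degree-4 polynomials are the standard BN parameterization, and the curve search is exactly the "plug in values for u" loop described next; a toy sketch, using sympy's primality test for convenience.)

```python
from sympy import isprime  # any primality test works; sympy is just convenient

# The BN parameterization: search u until both p(u) and n(u) are prime.
def bn_search(start):
    u = start
    while True:
        p = 36*u**4 + 36*u**3 + 24*u**2 + 6*u + 1
        n = 36*u**4 + 36*u**3 + 18*u**2 + 6*u + 1
        if isprime(p) and isprime(n):
            return u, p, n, 6*u**2 + 1       # also return the trace t(u)
        u += 1

u, p, n, t = bn_search(1000)
assert n == p + 1 - t                        # sanity check: n = p + 1 - t
```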
And the good thing is that if you look down here, the CM equation by accident comes out to be minus 3 times a square all the time, so they all have CM by the field Q(sqrt(-3)), and therefore they always have j-invariant zero. That's the reason for the curve shape here. And it's possible to find curves with a prime order, so that n is equal to r and is prime. So what you need to do to construct such curves is just plug in values for u until those two polynomials give you prime numbers. That's all. Once you have that, you can try certain different values for b and check the order of the group to see if you have the right twist, or you can actually immediately compute that order, because for these curves you can just write it down. Okay. The embedding degree is 12. That's already said. And the twist of degree 6 -- in that curve shape we have such a twist -- is, of course, given somehow by that coefficient b over psi. So to construct this twist you just try certain values for psi until you find the right one that gives you the right group order. So what does it mean for our computations? The group G2 prime is defined over Fp squared, so we basically only have to do computations over Fp squared. So for the ate pairing, curve arithmetic is in E prime of Fp squared. We'll just always replace the points in the large group by the small ones on the twist. And then you can also represent the curve and the field extensions using that same value psi from the twist, so you have this nice convenient representation that goes along with the twist isomorphism. So that's the algorithm for the optimal ate pairing on BN curves. It looks a bit confusing, but it's basically just a few parts. You can look here. The parameter m is just 6u plus 2, and that's of size one fourth the size of r, because n was given by a degree 4 polynomial and now we have a linear polynomial in u. And then we have this -- that's the Miller loop, the square-and-multiply-like part here where you have these doubling steps. You compute the double of the point and then you do the squaring times the line function. So I've already thrown out all the denominator line functions, because that's what you can do for even embedding degree. They all lie -- so by choosing this G2 in the way we did, they all lie in subfields and get mapped to 1 by the final exponentiation. You have to do an adjustment if the parameter u is negative, and then it's not only -- so for the Tate pairing you just have this Miller loop and then you're done. Also for the ate pairing. But for the optimal pairings you have to pay a little bit for getting this small length of the loop. So you have to do some Frobenius computations and two more of these addition steps, and then steps 14 to 16 are the final exponentiation. You can split that up into some Frobenius powers and multiplications, and you are left with a so-called hard part of the final exponentiation, which is kind of a real exponentiation down here. So these are actually the things that are computationally intensive. So here, all these doubling steps: you have multiplications in the large field, you have to compute the coefficients of the line functions, evaluate at P, do these multiplications after the squaring and the point computations. And then this is all not so expensive, and then you get this large exponentiation down here. Okay. So let's have a look at how to choose suitable curves that make this algorithm more efficient.
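(A note on the final exponentiation in steps 14 to 16: the split rests on a factorization of the exponent that is easy to verify; a toy check, with u again only illustrative.)

```python
# The final exponentiation raises to (p^12 - 1)/r, which factors as
# (p^6 - 1) * (p^2 + 1) * Phi_12(p) / r with Phi_12(p) = p^4 - p^2 + 1.
# The first two factors are the easy part (Frobenius powers, one inversion);
# Phi_12(p)/r is the hard part.
u = 2**62 + 2**55 + 1
p = 36*u**4 + 36*u**3 + 24*u**2 + 6*u + 1
r = 36*u**4 + 36*u**3 + 18*u**2 + 6*u + 1

phi12 = p**4 - p**2 + 1
assert p**12 - 1 == (p**6 - 1) * (p**2 + 1) * phi12
assert phi12 % r == 0            # embedding degree 12: r divides Phi_12(p)
```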
So, for example, if you have that parameter m equals 6u plus 2 very sparse, you don't need to go into that if statement so often. That's good. That saves you computation. Also, if the value u itself is sparse, has a lot of zeros in its binary representation, then you can do that part of the final exponentiation down here very efficiently. So it's really essential to choose these things in that way, as we will see later. And then you might want to choose p congruent to 3 mod 4 to get that nice representation for the quadratic field. You have a lot of computations in Fp squared, and that will make those very efficient, although Francisco will maybe have a different view on that tomorrow. So I would say that this is the best choice. And then you might want to choose psi as small as possible. We'll see what that means in a few seconds. So there have been some very fast pairing implementations out there which just look at the efficiency of the actual pairing algorithm. But if you do protocols, you also want to have efficient curve scalar multiplication, and you might need to hash to the curve. That means you currently have to compute square roots or cube roots. So you might think about that as well. Also, there are some protocols designed for constrained devices where these devices don't actually compute any pairings at all. The pairings are computed somewhere where you have the power to compute them. And then if you increase pairing efficiency by paying with a less efficient scalar multiplication, that's also not a good thing to do. So you could even use Edwards curves to get very fast scalar multiplication. You can compute pairings there. You're not as flexible. But, still, I mean, it could be the best choice. All right. So the next two slides are actually just about a nice subfamily of these BN curves that we just came up with to show that you can choose BN curves as nice as you want, because there are a lot of them, there are enough curves. And, for example, if you choose that element psi in a way such that its norm gives you the parameter p, then you already know which is the right twist to use, so you don't need to check the order and stuff like that. That's just some basic calculations using the fact that these curves don't have points of order 2 and 3. So we suggest the following. You should choose your BN curves as follows. You should choose a low-weight u such that you get a low-weight 6u plus 2, as already mentioned earlier, p congruent to 3 mod 4 to get the nice quadratic field extension, and then you choose a small psi of this shape. And that gives you b like this, so c to the 4 plus d to the 6. And you plug in some small integers to try -- I mean, you still have to try to find the right b value. That's the thing you need to do. So you need to plug in some values for c and d that give you the correct b, and then everything else is fixed. The advantage of this is that you get an obvious point on the curve, which is (minus d squared, c squared). And you also get an obvious point on the twist, which is this point. So once you fix c and d, you can compute all that very easily. You get all these generators. I mean, that's not something you need for efficient computation, but it's a very nice representation of these curves, and you can choose it all in a way that gives a very efficient pairing algorithm. So one example curve is this. The u is very sparse, just three minus-1 values in the NAF representation.
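(Sparseness in this signed-digit sense can be checked mechanically; a small sketch for weighing candidate values of u. The NAF computation is standard; the example value is illustrative.)

```python
# Number of nonzero digits in the non-adjacent form (NAF) of m: fewer
# nonzero digits mean fewer addition steps in the Miller loop and a
# cheaper exponentiation by u in the hard part of the final exponentiation.
def naf_weight(m):
    w = 0
    while m:
        if m & 1:
            w += 1
            m -= 2 - (m & 3)      # digit is +1 if m = 1 mod 4, else -1
        m >>= 1
    return w

u = 2**62 + 2**55 + 1             # illustrative sparse candidate
print(naf_weight(u), naf_weight(6*u + 2))
```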
And you can choose c equals 1 and d equals 1. So this information is all you need to describe all the parameters for pairing computation. It might be useful for putting curves in certificates or something like that. That's really not much information. And from that you can compute everything; you get these nice generators. If you want to send them around, that's not much information. You have a very small psi here, so multiplication by that thing is just two additions in Fp squared. And the twist looks like this. Also, b and b prime, so b over psi, are kind of small, because the current really fast formulas for computing pairings use these values b, and if they're small, that's better. Okay. So these are some suggestions for BN curves. Okay. So we come to the second part of my talk, which is about a pairing implementation. And, I mean, these pairing-friendly curves are very special. They have this special embedding degree condition, and also most of them are given by polynomials. So the primes are given by polynomials, and that's a special structure. So the question is, could we use that to make the arithmetic more efficient? And, actually, Fan, Vercauteren and Verbauwhede demonstrated this in a hardware setting, where they showed that you can use this polynomial representation to get very fast algorithms. And here they choose a u that's almost a power of 2. It's just a power of 2 plus some small thing that makes the polynomials give you prime values. The problem is that this essentially uses the fact that you can build specially sized multipliers in hardware. You can build multipliers that multiply certain small numbers with larger ones, and that's more efficient than a general multiplication. You don't have that in software; you can just use what you have. So their approach is not exactly what we can do in software. So the question is, does it work in software? If you look at Dan Bernstein's paper, Curve25519, then you see that he does some similar things there. He represents the elements in some strange rings. So if we do it like here, we can just write down that polynomial p: we introduce another variable, we write it as a polynomial in some variable x, and then you can write that p in these two different shapes. And if you plug in 1, you get back the value of the prime p. So what we can do now, we can represent our elements in Fp by just using the same representation here. So we put coefficients here and have a representation with certain different sizes for the coefficients. I mean, it's not unique. You can have several different representations of a number. We just need that f of 1 is equal to F. So we will do arithmetic now by multiplying polynomials and then plugging in 1 at the end. So if you have two such elements and multiply them, you get an element of degree 6. That gives you 7 coefficients, and then you want to get back something with 4 coefficients, of course, and then you can use the polynomial representation, this p(x) thing, to reduce the degree of the polynomial. And that actually looks very simple. So you get these formulas. That's the degree reduction. So you do something like a schoolbook multiplication on polynomials and then reduce the degree again. The problem is that doesn't really help. That's not really more efficient than, for example, Montgomery multiplication. The reason is you will have to do reductions all the time, just as in Montgomery.
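(The representation idea itself fits in a few lines; a toy sketch of "multiply as polynomials, plug in 1 at the end". The degree reduction via p(x) and the coefficient reductions discussed here are what a real implementation adds on top.)

```python
# Toy sketch: a value F is stored as a coefficient list f with f(1) = F.
# The representation is not unique; products are schoolbook polynomial
# products, and evaluating at x = 1 recovers the integer result.
def poly_mul(f, g):
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

def at_one(f):
    return sum(f)

f = [9, -2, 5, 1]      # one representation of 13
g = [4, 4, 4, 1]       # another shape, also representing 13
h = poly_mul(f, g)     # degree 6, so 7 coefficients, as in the talk
assert at_one(h) == at_one(f) * at_one(g) == 169
```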
And probably Montgomery representation is the more efficient thing to do. So, again, it's the problem compared with the hardware. I mean, if you multiply these things with 64-bit coefficients, the product has 128-bit coefficients, and then you add something during the algorithm that makes these things grow, so you have to do reductions all the time. If you had a hardware realization, you could make slightly larger registers and then it would work. So the idea we had is, okay, let's take more coefficients and put them in variables such that there's space. The problem is we need to restrict u to third powers, and that's a really strong restriction. Okay. Anyway, so we take 12 coefficients like that, and a 256-bit number we represent by these 12 coefficients, and then this v thing is 21 bits. u was about one quarter the size of r, which is 256 bits, so u is about 63 bits and then v is 21. So what happens if you multiply two such elements? You get double the size of the coefficients. And if you put all these things in double-precision floats, then there's still space for doing some additions without having to do coefficient reductions. So we use double-precision floating point and can do some computations without doing the reductions. But, of course, at some point you reach the limit of the size of these variables, so you have to do some coefficient reductions. And we do that with rounding functions. So that's just computing modulo 6v and computing modulo v, and then you get, like, a balanced representation where these things are between minus 3v and 3v and so on. And, again, if you get a carry from the last coefficient, you use the polynomial representation to reduce back into the other coefficients. So how did Peter and [inaudible] implement this? They used vector instructions, mulpd and addpd, with which you actually do two operations in one instruction. So we have these SIMD registers and instructions, and you can do two multiplications and two additions. The problem is, if you do Fp arithmetic with that, you'll always have to make sure that the right things are in the right places to actually use these instructions. So you have to do shuffling and combining and things like that. The solution to this problem is that you directly implement Fp squared arithmetic, because that naturally gives you some kind of parallelism: if you write down the Fp squared elements interleaved -- you take the zero coefficient from the first and then the zero coefficient from the second and so on -- you more often already have the right things at the right places. So then we used schoolbook multiplication, no Karatsuba, because that gives an even number of multiplications you can do in parallel, and so on and so forth. And then per multiplication you have to do at least one polynomial reduction and two coefficient reductions. There's one nice thing I should mention. Of course it will become more efficient if you take out reductions. That was actually the plan when we took these variables, that you have some space to take out reductions. And so Peter wrote a class that always keeps track of the maximal values that you can get. So we just plug that in, and he tried how far it works until we need to do a reduction again, and so on and so forth. And then in the end it was reimplemented in [inaudible]. So the results. So currently we have performance like this, so 4 million cycles.
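(The rounding-based coefficient reduction can be sketched like this, simplified to a single modulus v; the actual scheme alternates moduli 6v and v and folds the final carry back via the polynomial relation.)

```python
# Balanced coefficient reduction by rounding: each coefficient is replaced
# by a remainder in [-v/2, v/2] and the carry moves to the next coefficient.
def balance(c, v):
    carry = (c + v // 2) // v          # round(c / v)
    return c - carry * v, carry

def reduce_coeffs(coeffs, v=2**21):    # v of 21 bits, mirroring the talk
    out, carry = [], 0
    for c in coeffs:
        rem, carry = balance(c + carry, v)
        out.append(rem)
    return out, carry                  # final carry: reduced polynomially

rem, carry = balance(5_000_000, 2**21)
assert rem + carry * 2**21 == 5_000_000
```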
To compare with what was published before: there was a paper by Hankerson, Menezes and Scott from 2008. They gave, for the same security level, BN curves, an optimal ate pairing -- it was the R-ate pairing, but it's almost the same algorithm -- and they gave something like 10 million cycles. And we asked Mike Scott, and he sent us back something like that for a newer version of the parameters, and we were quite happy that ours was so fast. And then a few weeks later some people came along and made a very fast implementation of the optimal ate pairing. I'm not going to talk about that very much, because that's tomorrow at the same time. Francisco will explain to you what they did to get that performance. So if you have a look at this table down here, where we also give the cycle counts for multiplication and squaring in Fp squared, then you see that actually our paper is currently a bit faster, just looking at the multiplications. But the total pairing is slower. And the reason for this -- the main one, actually -- is that we have to restrict to this condition here, u being a third power. That restricts the choice of curves so much that we can't use these nice curves, and the other guys actually use a very nice curve, with very sparse parameters and everything. So that's the main reason it doesn't work so nicely. Also, it wasn't possible to remove as many reductions as we thought. So there are still too many reductions in there, and that takes time. And multiplication is not really much faster. So unlike in hardware, it seems it doesn't really pay off to use the polynomial representation. The reason multiplication is not faster is that we use schoolbook multiplication to get this parallel structure, an even number of multiplications, and we still have to put things at the right places, so that also takes some time here. But, still, it's not that bad. It seems that Montgomery multiplication, which is what the others use, is still the best thing to do in software. But maybe if we try to compute pairings on other architectures where floating point is faster, then maybe our approach will be better. Okay. That's the second part. So if you're bored and you want to do something else, you can think of why our paper is called DCLXVI and talk to me after the talk. So the third part actually deals with the coordinate system choice for pairings. You have to do curve arithmetic, so you have to decide which coordinate system you use. And I wanted to show you this quote from Steven from the year 2005. He said one can use projective coordinates for the operations in E of Fq; the performance analysis depends on the relative costs of inversion to multiplication in Fq, and experiments show that affine coordinates are faster. So that was 2005. That was before BN curves, and that was with supersingular curves. In the meantime, the picture has changed. I mean, for ECC it's clear that you will use projective coordinates, because finite field inversions in prime fields are very expensive compared to multiplications, and you want to avoid these inversions by doing some more multiplications. And also for pairings -- I mean, the current speed records, our implementation as well as the one by Francisco and the others, use projective formulas. That seems to be the best thing to do there. But, still, maybe one should rethink using affine coordinates. The reason is that in some cases we can have quite efficient inversions. So, for example, if you work in extension fields.
So here is the example of a quadratic extension given by that polynomial. Yeah, you can compute the inverse by basically computing the inverse of the norm of that element, which is an inversion in the base field. So to invert in Fq squared, you invert in the base field and do some additional multiplications. You have to compute these squares: so you compute the norm first, then invert that, and then multiply by the coefficients. So an inverse in Fq squared should cost no more than one inversion in the base field plus these operations: two multiplications, two squarings, a multiplication by omega and so on. This means that the ratio of inversion to multiplication becomes smaller. So if you assume that a multiplication in the large field, with Karatsuba, takes at least three multiplications in the base field, then this ratio will be roughly one-third of the base-field ratio plus some constant. So going up in extension fields makes your inversion-to-multiplication ratio better by a certain factor. In this case it's roughly one-third. So, in general, for a degree l extension -- and what I'm telling you is all old stuff that's well known, it's basically a generalization of the Itoh-Tsujii inversion algorithm, the standard way to compute inverses in optimal extension fields -- you can do that in general. You can invert an element in some extension field by computing the norm, inverting that, and then multiplying by that element raised to this exponent here. And if you sit down and think about the exact costs of these algorithms, you end up with something like that. So in a degree 3 extension, for example, you get roughly one sixth the ratio plus some constant here, and so on. So you can see, the larger the field extension gets, the better your inversion with respect to multiplication will be in the end. Also, another trick, also very well known -- I don't know which one of Peter's tricks it is, but it's one of the tricks that came up in these papers on the ECM method. Very well known. Of course, if you want to invert two elements in parallel, then you compute the product, invert the product, and do two more multiplications to get the inverses. So you replace two inversions by one inversion plus three multiplications. In general, that works just as for two elements: you compute all these partial products, you invert the last one, and then you can get back the individual inverses with two multiplications each. So in total, you replace s inversions by one inversion and three times (s minus 1) multiplications. And that gives you an average inversion-to-multiplication ratio of roughly 3 if s is large enough. So we have two ways of improving the inversion-to-multiplication ratio. One is this trick where you are able to do several inversions at once, and this has nothing to do with field extensions. On the other hand, we have the property that in extension fields you get a better ratio. So if you think back to the ate pairing algorithm and the first table with the examples for the curve construction methods, the last column gave you the extension degrees. And for the ate pairing, or variants of the ate pairing, which are the most efficient pairing algorithms, you will end up doing computations exactly in that extension field. So when you compute the line function values, you compute the slope of the line function, and you have to do an inversion in that extension field. That's the case if you use affine coordinates. So it might actually be a good thing to use affine coordinates.
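(Both tricks are short enough to sketch; here they are over a toy prime field, with the quadratic extension written as Fq[i] with i squared equal to w as above. This is illustrative code, not the library's API.)

```python
# Toy prime field for illustration.
p = 2**31 - 1
inv = lambda a: pow(a, p - 2, p)               # one base-field inversion

# 1) Inversion in Fq^2 = Fq[i], i^2 = w, via the norm:
#    1/(a0 + a1*i) = (a0 - a1*i) / (a0^2 - w*a1^2).
def inv_fq2(a0, a1, w):
    norm = (a0 * a0 - w * a1 * a1) % p         # two squarings, one mult by w
    ninv = inv(norm)                           # the single base-field inversion
    return (a0 * ninv) % p, (-a1 * ninv) % p   # two more multiplications

# 2) Montgomery's simultaneous inversion: s inversions for the price of
#    one inversion plus 3(s - 1) multiplications.
def batch_inv(xs):
    prods = [xs[0]]
    for x in xs[1:]:
        prods.append(prods[-1] * x % p)        # prefix products
    acc = inv(prods[-1])                       # the only inversion
    out = [0] * len(xs)
    for k in range(len(xs) - 1, 0, -1):
        out[k] = acc * prods[k - 1] % p
        acc = acc * xs[k] % p
    out[0] = acc
    return out

assert all(x * y % p == 1 for x, y in zip([3, 7, 12345], batch_inv([3, 7, 12345])))
```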
If your inversions are cheap already, which is usually not the case, or, for ate pairings, if you go to very high security levels where you have large extension degrees, then you can use these extension field tricks to improve your I-to-M ratio and get a better algorithm. And there might be reasons not to use these very special curves with j-invariant zero or 1728, so people might want to use more general curves that only have a degree 2 twist. And then this field extension is larger as well. So in that case it might help to use affine coordinates. It might be worth considering. Or, to come back to the second trick, if you compute several pairings: there are a lot of protocols that actually compute many pairings, or for the [inaudible] proof systems you compute products of pairings, and then you can do that in parallel -- well, not really in parallel, you do it at once. You wait until you get to the inversions and then you do the inversions all at once and get a ratio of roughly 3. So there are some possible scenarios where it might be good to think about affine coordinates again. The reason we stumbled upon this was when I was working on Microsoft's pairing library, and this is based on the bignum library, which is basically written by Peter Montgomery. So that has finite field arithmetic, field extensions, polynomials, elliptic curves and all that stuff. So I was working on pairings using that library. What we did is we used the base field arithmetic from bignum, so Peter's Montgomery multiplication. And as usual, 256-bit numbers are split into four pieces and you do the usual Montgomery stuff. We have the extension fields, and I put in these inversion tricks I described before. And the whole thing is a C implementation. For the special case of 256-bit integers we have a little bit of assembly, but it's basically completely C. It's not restricted to any specific security level, curves, or processors; you can plug in any BN curve. You can even plug in other curves, but currently we have the most stuff for BN curves. So you can plug in any curve and it will work. It works on 32-bit and 64-bit Windows. And that's what we got for the field arithmetic. If you compare the multiplications, it's really much slower than these special implementations I talked about earlier. But I wanted to show you this last column here, which is the inversion-to-multiplication ratio. So given an implementation with these properties, you see here that in the degree 12 extension an inversion is almost as cheap as a multiplication. So if you work over such large extension fields, you might want to do the inversions instead of using projective coordinates. And then the pairing algorithm, it's basically the ate pairing as shown before, and here you can see some numbers -- I mean, that's the optimal ate pairing. It's about 14, about 15 million cycles, and if you use the simultaneous inversion trick, it doesn't show too much here, because the ratio in Fp squared is already 5.3. If you reduce that to almost 3, that's not a big deal here. So it becomes a little bit better. A product of pairings can use more optimizations. You can just [inaudible] everything into one Miller variable and save a lot of time. Good. If you want details and a thorough analysis of how much more the fastest projective formulas cost compared to affine coordinates and so on, you can look at the paper here.
Thank you. [applause] >>: Any questions? >>: The obvious question, putting together parts two and three: who is going to do it? When is it going to happen? >> Michael Naehrig: Pardon? >>: You mentioned the possibility, maybe probability, of speeding things up with affine coordinates, and you've been mentioning better multiprecision arithmetic. So when are we going to see an implementation that puts these things together and gets even better? >> Michael Naehrig: For BN curves, I mean, usually it wouldn't pay off to use the affine coordinates, because if you look at the speeds people get for these fastest implementations, then the projective formulas are much better. So it would only pay off for higher security levels. So, yeah, it's a good thing to try that, yeah. >>: It seems to me that the recommendation you've taken [inaudible] find a supply of curves. >> Michael Naehrig: Yes. >>: Well, you oversell, no matter how much -- >> Michael Naehrig: Oh, okay. >>: That you expect for the same proportion, or the operation takes it to be a power of 2 [inaudible]. >> Michael Naehrig: Yes. >>: You can get lucky a few times but then it will run out. >> Michael Naehrig: Yes. >>: If there are no more questions, let's thank Michael for his talk. [applause]