>>: This afternoon's session is, as I say, a tribute in honor of Oliver Atkin. For those of
you who never had the good fortune to meet Oliver, he was sort of a singular point in
mathematics, I can say. Oliver was quite entertaining, and he was very obsessed, shall I
say, with the numerology of number theory. In fact, I was thinking during Kristin's talk
that he would have really loved looking at those factorizations of the coefficients that
she gave. He probably -- undoubtedly he would have gone off muttering to himself and
come up with a good modular explanation of what's going on, but that's only my
speculation.
Oliver was one of the real pioneers in computing in number theory. I mean, from the
time when computers were laughably primitive by today's standards, Oliver was doing
some rather non-trivial computation. And he didn't have all that many students, but I
think he did have actually a great influence on the field of elliptic curves and modular
forms, each of which were his personal friends, I think.
He was interested in lots of other, related things. In particular, he was
interested in primality testing, which the shy and retiring Dan Bernstein will speak about
[laughter].
>>Daniel J. Bernstein: All right. Thanks.
Let's do a microphone test. Am I audible from the back? I see a thumbs up. Okay.
My first encounter with Oliver Atkin was in '95. I was applying for a job at the University
of Illinois at Chicago, and I was nervous. I was 23. I was giving a talk and realizing as I
was giving the talk, oh, my God, there's all these people in the audience who don't do
number theory, and maybe I should define a number field. And so I quickly give a
definition of a number field instead of just saying some things about it, that, for instance,
q adjoin square root of minus 1 has degree 2 and q adjoin zeta 19 has degree 19.
Now, as soon as that second 19 came out of my mouth, instantly, very loudly from the
back of the room, in some sort of British accent, there was a [inaudible] [laughter]. I
said, 18, excuse me, and I continued with my talk, and I got the job.
A few months later Oliver had his retirement conference, and as Victor mentioned, he
was very funny. He stood up at some point and explained that retirement -- of course
he would continue working, retirement simply meant that he would no longer have to
talk with students about anything less than cubic reciprocity [laughter].
I have a few of his papers mentioned here. I actually thought I might spend the hour
just quoting things he said. I will resist that, but I will give one entertaining quote here.
The whole subject of primality and factorization has had an extraordinary fascination for
me since the late 1960s when John Brillhart, John Selfridge, Dan Shanks, Dick Lehmer
and others had introduced me to it, both in person and in print. I was no stranger, he
wrote, to primes in computation, but these had previously arisen only as the
eigenvalues of Hecke operators, and were certainly all less than 1 million.
He goes on to say how things stood in primality and factorization -- again, he was writing
this in '95; this is from the 'Intelligent primality test offer' paper. The major influences on
the subject in the last two decades have been the use of elliptic curves by Lenstra, and
the increasing number of applications, in particular to cryptography. And then he says
these influences increased
the audience for the subject and so necessarily decreased the level of judgment and
professionalism [laughter]. For some peripheral observers this fact has obscured the
novelty, beauty, and often simplicity of the ideas.
I figured that I would spend the hour giving a few examples of things that -- maybe in a
few cases I'll go a bit beyond what he said in his papers, and I hope that the things that I
say that are beyond the papers are things that if he were here, he would have enjoyed.
So, first of all, recognizing primes. Oops. Skip the Carmichael for the moment.
If you want to prove, for instance, that 314159265358979323 is composite, you can
just apply the contrapositive of Fermat's little theorem: compute 2 to the n -- this
number is n -- compute 2 to the n modulo n, subtract 2 from it, and you end up with,
well, some number which is visibly not zero modulo n. And Fermat tells you that if n
were prime, then 2 to the n minus 2 or, in general, w to the n minus w for any integer w
would have to be zero modulo n.
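Just to make that concrete, here is a minimal Python sketch of that Fermat check; the function name is only for illustration, and the printed result simply reflects the claim above that this particular number is composite.

```python
# A minimal sketch of the Fermat compositeness test described above.
# If w^n - w is nonzero mod n for some integer w, then n is certainly composite.
def fermat_witness(n, w=2):
    """Return True if w proves n composite via Fermat's little theorem."""
    return (pow(w, n, n) - w) % n != 0

n = 314159265358979323
print(fermat_witness(n))   # True: 2^n - 2 is nonzero mod n, so n is composite
```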
This is a great way of proving most composite numbers to be composite. But there are
some numbers which seem prime from the perspective of Fermat's little theorem, and
these are the Carmichael numbers, which, thanks to Alford, Granville and Pomerance
we know there are infinitely many of these Carmichael numbers: numbers where w to
the n minus w is zero modulo n for every w, even though n is in fact not prime.
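For reference, here is a small Python sketch of the Korselt-style characterization behind that statement; the lists of prime factors passed in are assumed known, not computed.

```python
# A small sketch of Korselt's criterion: w^n - w is 0 mod n for every integer w
# exactly when n is squarefree and p - 1 divides n - 1 for every prime p dividing n.
from math import prod

def korselt(n, prime_factors):
    """prime_factors must be the complete list of distinct primes dividing n."""
    squarefree = prod(prime_factors) == n
    return squarefree and all((n - 1) % (p - 1) == 0 for p in prime_factors)

print(korselt(561, [3, 11, 17]))    # True: 561 is the smallest Carmichael number
```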
So what do you do to get a more reliable test? Well, you start factoring w to the n minus
w. For instance, if w is any integer and n is, let's say, an odd prime -- pretty easy to tell
whether an even number is prime -- if n is an odd prime, then either w, or w to the n
minus 1 over 2 plus 1, or w to the n minus 1 over 2 minus 1, or more than
one of those has to be zero modulo n. And the proof is simply, well, Fermat's little
theorem says w to the n minus w is zero mod n, and w to the n minus w is a product of
the three factors that I just mentioned. So at least one of the factors has to be zero
modulo n, and, well, that's the conclusion.
Now, this is more reliable than Fermat. You can keep going. For instance, if n is 1 mod
4, then you can factor w to the n minus 1 over 4 minus -- sorry, factor w to the n minus 1
over 2 minus 1, which is where I got to a moment ago, factor that into 2 pieces. It's,
again, a difference of squares if n is congruent to 1 mod 4, and then you get a more
reliable test.
The end result of continuing in this way is from Artjuhov in 1966, who said, in general, if u
is the number of powers of 2 in n minus 1, then you keep factoring -- well, you factor
w to the n minus 1, minus 1, as far as you can go with the powers of 2 in n minus 1.
For instance, here's a proof that 2821 is not prime. If you take 2 to the 1410 plus 1, 2 to
the 705 plus 1, 2 to the 705 minus 1 modulo 2821, then they're all not zero. But, wait a
minute, the product of those is 2 to the 2820 minus 1, which would have to be zero if
2821 were prime. So 2821 is not prime.
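Here is a short Python sketch of Artjuhov's test in the form it is usually implemented (the strong probable-prime test), together with the 2821 example; the function name is just for illustration.

```python
# A sketch of Artjuhov's test: write n - 1 = 2^u * d with d odd and look at
# w^d and its repeated squares, i.e. the factorization of w^(n-1) - 1 described above.
def is_strong_probable_prime(n, w):
    u, d = 0, n - 1
    while d % 2 == 0:
        u += 1
        d //= 2
    x = pow(w, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(u - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False      # w is a witness: n is definitely composite

# The 2821 example: 2^1410 + 1, 2^705 + 1 and 2^705 - 1 are all nonzero
# mod 2821, yet their product is 2^2820 - 1, so 2821 cannot be prime.
print(is_strong_probable_prime(2821, 2))   # False
```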
All right. That's Artjuhov's test, and it's actually very, very reliable. The standard
theorem is that if n is an odd prime and you apply Artjuhov's test for a random choice of
w between 1 and n minus 1, it's got at least a 75 percent chance of proving that n is not
prime.
Of course, if you apply it to a prime, it will never prove that n is not prime.
Try a bunch of choices of w, enough that the 75 percent chance keeps piling up. If you
try, say, log base 2 of n or ceiling of log base 2 of n choices of w, then the standard
conjecture is that this reliably recognizes primes.
If you try all these choices of w in any reasonable pattern and that fails to prove that n is
composite, the only way that can happen is if n is prime.
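Here is a hedged sketch of that whole procedure, reusing the is_strong_probable_prime function from the sketch above; the number of trials and the random choice of bases are just one reasonable reading of what was described.

```python
# A sketch of the standard procedure: run the strong test for about log2(n)
# randomly chosen bases w. Any failing base proves n composite; otherwise,
# conjecturally, n is prime.
import random

def looks_prime(n):
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in range(n.bit_length()):             # about ceil(log2 n) trials
        w = random.randrange(2, n - 1)
        if not is_strong_probable_prime(n, w):  # from the sketch above
            return False                        # proven composite
    return True                                 # conjecturally prime
```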
There's all sorts of people -- I'm not going to try to trace who exactly is responsible for
the pieces of this. This is the current typical way of checking that a number is prime or
proving that it's composite.
How long does this take? Well, I've told you to try log n, log base 2 of n choices of w,
choices of potential witnesses to n being composite. How long does each of these w's
take? Well, you have to do some exponentiation modulo n. You have to do something
like log n bit operations to multiply mod n, and then you have to do log n
multiplications to do an nth power, and then you have to do that log n times for log n
values of w. So that's log n cubed time to do all of these exponentiations.
You can try to speed that up. Maybe log n cubed is not the fastest way to reliably
distinguish prime numbers from composite numbers. For instance, you could try doing
only square root of log n choices of w, and that would reduce the time to log n to the
2.5. Quite a lot faster than log n cubed. Except it doesn't work. There are certainly
composite numbers that pass this test with only square root of log n choices of w.
The reason is that you can easily write down lots and lots of numbers where that
75 percent is actually quite realistic. For instance, here's one of the Atkin-Larson
examples. And I think they were the first to write this down, although the whole paper
was about three pages long essentially saying that all the previous papers on the topic
were stupid, but one of the points that they made in this paper was that if you have any
n of the form 4k plus 3 times 8k plus 5 where those are both primes, then you will have
about a quarter of the possible w's in fact making n seem to be prime when in fact it is
clearly composite.
If you look at how many n's there are and you think about how many w's you'd have to
try to get rid of all of these n's to make the test succeed for all these n's, you see you
have to have something at least close to linear in log n for the number of w's to try to
exclude all of these composites.
So what do you do instead if you want to try to improve on log n cubed? Well, you
could try a quadratic extension of z mod n. Instead of looking at the multiplicative group
of z mod n, let's look at, for instance, z mod n adjoin t -- in the middle here, z mod n adjoin t
where t is a root of t squared minus wt plus 1.
Now, I've put a hypothesis on w here to force this to be a field, namely w squared minus
4 having Jacobi symbol minus 1 modulo n. If n is prime, then, as the [inaudible] symbol says,
w squared minus 4 is not a square, and then, well, that polynomial, t squared minus wt
plus 1, has discriminant w squared minus 4, which is not a square mod n, so this is in
fact a field extension. The test you do is, well, if n is an odd prime and you compute t to
the n plus 1 over 2 in this field, then you will get 1 or minus 1, again, assuming w has
the right symbol -- w squared minus 4 has the right symbol.
And the proof, to be complete about it, first, well, as I just said, from w you know that
that extension is in fact the quadratic field extension of z mod n, and now what does that
tell you about t to the n? Well, t is certainly a root of this polynomial u squared minus
wu plus 1 -- by construction t squared minus wt plus 1 is zero in this field. Now taking nth
powers -- the nth-power map is a ring homomorphism when n is prime -- t to the n is also
a root, but certainly t to the n is different from t, because in this
field we know all the numbers whose nth powers are themselves, and t is not one of them.
So this polynomial has two roots, t and t to the n. Therefore it factors as u minus t times u
minus t to the n, and then looking at the constant coefficient you see t to the n plus 1 is
1. And therefore t to the n plus 1 over 2 is 1 or minus 1. And that's exactly this test,
which is a typical Lucas-style test.
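As a concrete illustration, here is a Python sketch of that computation, assuming elements of (Z/n)[t]/(t^2 - wt + 1) are stored as pairs; the function names are only for illustration.

```python
# Arithmetic in (Z/n)[t] / (t^2 - w*t + 1), with a + b*t stored as the pair (a, b).
def mul(p, q, w, n):
    a, b = p
    c, d = q
    # (a + b t)(c + d t) = ac + (ad + bc) t + bd t^2, and t^2 = w t - 1
    return ((a * c - b * d) % n, (a * d + b * c + b * d * w) % n)

def power_of_t(e, w, n):
    """Compute t^e in (Z/n)[t]/(t^2 - w t + 1) by square-and-multiply."""
    result, base = (1, 0), (0, 1)          # the elements 1 and t
    while e:
        if e & 1:
            result = mul(result, base, w, n)
        base = mul(base, base, w, n)
        e >>= 1
    return result

# If n is an odd prime and w^2 - 4 has Jacobi symbol -1 mod n, then
# power_of_t((n + 1) // 2, w, n) should be (1, 0) or (n - 1, 0), i.e. 1 or -1.
```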
Reinterpreting the computation here, this is counting the number of points on a certain
curve. I'm working with powers of t, which has norm 1. Well, let's look at all the
elements of norm 1 in this extension. Let's look at all the y plus xt's that have norm 1, in
other words, that have x squared minus wxy plus y squared equals 1. That's some
curve. It's a shifted, twisted circle, clock, if you like. On this curve, well, the
computation I just did is counting the number of points on this curve. It's exactly n plus
1 under the same assumption about w.
The number of points on this group scheme evaluated at z mod n is n plus 1 by this
hypothesis on w. So if you multiply n plus 1 by any point, for instance (1, 0) -- this is if you
take t to the n plus 1 power -- you will get the identity, which is (0, 1). And now that -- well,
okay, aside from dividing by 2, getting the neutral element or the obvious point of order
2, this is exactly the same test that I wrote down here which -- okay, it's fun to have
curves running around, but what's the point? Is this actually better than the original
test? It's not, actually. It's certainly not faster. It's somewhat slower. It's maybe more
reliable? Well, if you look at it, no, it's not more reliable. There are just as many failure
cases for this test as there are for the usual tests.
So then you say, well, okay that attempt to apply mathematics was not improving the
situation. Let's put in some more. Let's have an elliptic curve. For instance, let's take x
squared plus y squared equals 1 minus 30 x squared y squared. There's a nice genus
1 curve, and, hey, genus 1 must be better than genus 0.
Then assuming you know the number of points, which was -- the critical calculation here
was figuring out the number of points on this x squared plus y squared minus wxy
equals 1. Assuming you can figure out the number of points on this new curve, modulo n,
then you can do the same kind of test and take some random element of this group, this
group scheme at z mod n, and multiply it by the known number of points, the known
number assuming that n is prime, and then, well, if n were composite, it would have an
awfully difficult time having the presumed number of points times some point here
coming out to be the identity element.
This is what the Chudnovsky brothers and Gordon proposed in the mid '80s, building
the elliptic curve E with complex multiplication, only in the class number 1 case. Of course,
we now know how to do this very efficiently for higher class numbers. I'll come back to
exactly how fast that is.
But, again, there's no point in doing this. This is not better than z mod n star, it's not
more reliable -- well, if you look at how reliable it is, then you see that these elliptic
pseudoprimes for doing an elliptic curve primality test are just as frequent as regular
pseudoprimes or quadratic pseudoprimes, so there's no point.
What do you do to make a better, faster primality test? Well, this is the subject of Atkin's
'95 paper. You try to combine different tests. You try to say instead of doing a lot of w's
for z mod n star or doing a lot of w's for this x squared minus wxy plus y squared equals
1 or for some elliptic curve, you start varying which groups you're working with.
The first proposal along these lines was from Baillie, and then Pomerance, Selfridge and
Wagstaff, who said take one quadratic test and one linear -- well, one z mod n star
test. The total time to do those two tests, each one of them takes quadratic time, so
doing two of them takes quadratic time essentially. If you compare that to doing two w's
in the original test, it's much, much, much more reliable. Even now there are no
counterexamples known. There are no examples known of numbers n which are
composite and which are not proven to be composite by their tests, filling in the details
of exactly which w's they take.
If you can find an example, you get $620 of which I believe $20 are from Pomerance
because he thinks that there are lots and lots of counterexamples -- I'll come back to
that -- and then Atkin said well, okay, okay, linear and quadratic is not enough. Here's
a really confidence-inspiring test. Do a linear test, a quadratic extension of z mod n and
a cubic extension of z mod n. And he goes to some effort to make a cubic extension
which allows really fast computations, and he offered $2,500 -- this is no longer open -- for a
counterexample. I mean, we don't know any counterexamples, but if you find one, you
don't get $2,500.
Pomerance's argument about the linear and quadratic test was published in '84.
Actually, it was at, I believe, Arjen Lenstra's Ph.D. defense. He wrote a little paper
saying here's how Arjen can make some money, can make $620 -- at the time it was a
slightly smaller amount -- but to get Arjen off on a good financial footing, he could try to
construct counterexamples to this test. And Pomerance explained how to do this, and
the same explanation also gives lots and lots of counterexamples to Atkin's test. So
there should be lots and lots of counterexamples. But, actually, the obvious thing to
do is keep going just a little bit, where the little bit grows really, really slowly with n.
I think if you take something much smaller than log n -- I'll quantify this a little more
precisely in a moment -- if you take something far smaller than log n tests, well, log n to
the epsilon tests where epsilon converges to zero with n, then I believe that this
sequence of tests becomes perfectly reliable. So if you take Atkin's intelligent primality
test and keep going to a super-intelligent test and a quartic, quintic, et cetera, then you will get
something which is a perfectly reliable test for primality that takes only essentially
quadratic time instead of essentially cubic time.
Now, I'm not sure if this analysis, the analysis I'll show you in a moment, has been done
before. I've put new in question marks for this conjecture. It's a pretty easy analysis to
do. At the same time, I've seen people who are speculating that the best possible
primality recognition algorithm takes essentially cubic time, so the quadratic time
conjecture does seem to be new at least to a bunch of people writing papers in this
area.
Further comment, which I'll also come back to, is that if you want to make this run as
quickly as possible, not just get the exponent down to 2 but get the little o of 1 as small
as possible, then you certainly should not be doing degree 20, degree 21, et cetera,
extensions, you should be doing a bunch of those elliptic curves, being careful not to
combine a bunch of curves which all have the same number of points. Gordon's test
always had n plus 1 points.
No point in combining those. You want to have a lot of orders which have a large
least-common multiple. But that's easy to do.
Where does this conjecture come from? Well, Erdos in 1956 -- this was the basis for
Pomerance's analysis -- Erdos said there should be infinitely many Carmichael numbers
because there should be infinitely many numbers n for which n minus 1 is a multiple of p
minus 1 for every prime p dividing n. The way you force w to the n minus 1 to be 1
modulo p is to force p minus 1 to divide n minus 1. So unless w is a multiple of p,
certainly w to the n minus 1 will be, well, w to the p minus 1 raised to some power, which is 1
modulo p. And if you manage to do that for every p dividing n, then, well, you've made
an n for which w to the n minus 1 has a very good chance of being 1 and then a good
chance of passing all the tests you might do with z mod n star.
What's the chance that n actually gets through this? Well, suppose I've got
n where I know it's got a p times some other stuff: fix a p and then say I've got n as p
times some q times whatever. Then what's the chance that n minus 1 will be a multiple
of p minus 1, or at least close enough that you'll have a good chance of w to the n
minus 1 being 1? Well, basically you want the event of n being 1 modulo p minus 1,
which has chance 1 over p minus 1, maybe a little bit more if you allow, say, p minus 1
over 2.
What if you allow p to vary? Well, these aren't independent chances, because if you
look at the 1 over p minus 1 chance for each p, the chance of all of those happening is
not 1 over the product of the p minus 1's, it's 1 over the least common multiple of the p minus 1's.
This was Erdos's central insight into why there are going to be infinitely many Carmichael
numbers: the least common multiple of the p minus 1's does not have to be very big. You
can have a whole lot of primes p where p minus 1 is a product of very small primes.
If you start with a set q1, which is all the primes up to 100, 1,000, a million -- pick some
number which grows slowly -- and then take all the primes p up to some bound such
that p minus 1 is a product of a subset of those small primes, say primes up to 1,000. Now
you've got a bunch of primes p for which the least common multiple of the p minus 1's is
actually -- I'll guarantee it to be at most the product of all of the elements of q1.
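Here is a toy Python sketch of that construction; the bounds 100 and 100000 are arbitrary illustration choices, not anything from the talk.

```python
# Collect primes p for which p - 1 is a product of a subset of a fixed set Q1 of
# small primes, so that the lcm of all the (p - 1)'s is at most the product of Q1.
def small_primes(bound):
    sieve = [True] * (bound + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(bound ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, bound + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

def is_subset_product(m, Q):
    """True if m is a product of distinct primes, all taken from Q."""
    for q in Q:
        if m % q == 0:
            m //= q
            if m % q == 0:        # repeated factor: not a subset product
                return False
    return m == 1

Q1 = small_primes(100)
good_p = [p for p in small_primes(100000) if p > 2 and is_subset_product(p - 1, Q1)]
print(len(good_p), good_p[:10])
```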
Now, that's not very big, and that actually gives a good chance that if you form a lot of
different n's from these lots of different p's and then ask for each of those n's whether n minus
1 is divisible by the least common multiple of the p minus 1's -- is it divisible by the
product of all the primes in q1 -- there's actually a very good chance of that happening, at
least enough of a chance that when you look at all the n's over all the p's, then it
actually does happen very frequently. So Erdos conjectured there are h to the 1 minus
epsilon Carmichael numbers up to h. And that still hasn't been proven, but at least we
know there's h to some constant power.
Pomerance attacked the linear and quadratic test by saying, well, let's -- instead of just
having one set of small primes q1, let's have one set of small primes q1, say, every
prime that's 3 mod 4 will be in q1 up to, say, 1,000, and then every prime that's 1 mod 4
up to 1,000, we'll put those into q2, and we'll have p minus 1 be a product of any subset
of q1 and p plus 1 be a product of any subset of q2. And there's actually quite a few
primes that satisfy both of these conditions.
And then if you take n to be a product of a lot of these different p's, then there's a pretty
good chance that n minus 1 will have -- will be divisible by all of the elements of q1
and that n plus 1 will be divisible by all of the elements of q2, which guarantees that n
will pass at least the simplest forms of the linear and quadratic test and has a good
chance of passing even the fancier linear and quadratic tests you might write down.
If you look at Atkin's test -- three tests, a linear and a quadratic and a cubic -- then you
can, of course, apply Pomerance's argument, but if you quantify -- as the number of
tests goes up, if you quantify how big the numbers are that you have to write down for
Pomerance's argument to kill this test, to exhibit a composite number that passes the
test, then you see -- well, at least I did a pretty solid job, I think, for Pomerance's original
analysis, but he wasn't trying for a lower bound, he was trying for an upper bound. Still,
I think this is going to be pretty close to the truth, that t is going to be bounded by
something times log log n. So the answers you get from Pomerance's argument are something like
doubly exponential in t.
If you have two tests, then already it's so big nobody's found it yet. If you have three
tests, it should be ridiculously large. As t goes up, the size of n you need to fool this test
becomes, well, doubly exponential in t. I wouldn't be surprised if it's actually, say, log
log n times log log log n or something else that makes analytic number theorists happy,
but I'm certainly very comfortable conjecturing that I haven't missed so much in the
analysis that t is certainly less than log n to the epsilon. That's actually a very weak
conjecture compared to what seems to be the case.
Yes?
>>: [inaudible].
>>Daniel J. Bernstein: Well, this is coming from n. N is going to be a product of p's
where the p minus 1's all have just -- each p is exploring a bunch of different primes
from the same set q1, which is only, say, the primes up to 1,000.
Now, there's a lot of different p's that have p minus 1 being a product of various subsets
of those primes, but then the least common multiple of all the p minus 1's is not very big.
It's just the product of the primes up to 1,000. So it's just e to the 1,000. So what's the
chance that n minus 1 is divisible by that particular product of all the q1's? It's 1 over a huge
number. It's like e to the minus 1,000, which on the scale of everything else happening
here means you only have to look at e to the thousand different numbers n before you
get one that passes that, and only e to the thousand for passing that.
So that was Erdos's argument. And this is maybe not the most computationally
effective way to construct an n which passes these tests, but it does convince analytic
number theorists that there should be infinitely many counterexamples.
At the same time, there are quantitative limits on how far this can go, so I do believe
that there is an essentially quadratic time primality test.
What if you don't believe these conjectures? Well, I'll get to that in a moment. I first
promised that I would get back to constructing elliptic curves, because certainly you
don't want to use very high degree extensions. They're much slower to do
computations in than working with elliptic curves.
So let's say you want to do t tests with t different elliptic curves, or maybe t minus 5
tests with elliptic curves and five tests with degree 1, 2, 3, 4, 5 extensions.
Let me contrast this with what happens in ECPP. In ECPP we're trying to construct
something like log n different curves so that we can find 1 that has its order being prime
or 2 times the prime, 4 times the prime, something like that.
In this context, we don't need that condition. We don't need orders which are
essentially prime. That's important for proving primality of n, but that also slows things
down dramatically by having such a big t. Here t -- well, I think it's log n to the epsilon.
Let's assume it's log n to the .3 at most.
Then you can easily generalize the standard Shallit ideas for making ECPP construct a
curve quickly. You start with a bunch of square roots of small numbers, say numbers
up to t to the one half -- anything that's substantially less than t will make the
asymptotics work; that's kind of the reasoning for the time to do that -- then there's a good chance, if
you look at discriminants up to t squared or 10t squared, they have a good chance of
being t to the one half smooth, that is, factoring into integers up to square root of t, which
means that -- well, the square roots are relatively slow. Writing down a
single square root already takes log n squared time.
Okay. Doing t to the one half square roots, that's t to the one half times log n squared
time, once you have some square roots, square roots of all the numbers up to t to the 1
half, you can multiply them together much more efficiently to get square roots of
discriminants up to t squared, or I should say negative discriminants down to minus t
squared.
Now, the time to do all those multiplications, instead of the t to the one half times log n
squared, is more like t squared times log n, which -- well, for the range of t that I'm
talking about -- is much, much smaller than the something times log n squared. That's the
bottleneck.
What do you do next? Well, do some lattice basis reduction to figure out which of your
discriminants is actually happy with your prime, which of your primes is happy with a
class group, and then that gives you something like t discriminants -- discriminants up to t squared,
roughly, give you something like t discriminants that are good for n. Maybe it will be
only t over 10 or t over log t or some such, so instead of t squared I should be saying t
to the 2 plus epsilon for some suitable epsilon, but it's about -- t squared is about the
right number.
And then fast CM -- I think Drew has left, but let me point to his very recent paper on
speeding up CM. I believe that the run time that he gets heuristically under various
assumptions, applied to this situation, looks like t squared times log n plus t times log n
squared. In other words, the time per curve that we're writing down is something like
log n squared. For this range of t, the dominant part is the last part of this fast CM
algorithm, which is kind of merging the class polynomial construction with writing down
the smallest possible part of the class polynomial and then finding roots of it.
Maybe there's something better here. I don't know how far this is going to go. Certainly
this result from Sutherland is faster than the previous results. It seems to me that this
will be the bottleneck in actually running this primality test for very large numbers, so it's
actually a legitimate excuse for doing class polynomial computations; figuring out
better class polynomial computations, or moving from j to [inaudible], for instance,
should actually seriously speed up this primality recognizing algorithm. I don't think I
can say the same about ECPP as an application of class polynomial computations, but I
think this -- it really is the bottleneck. I think the most important step in this
algorithm really is doing interesting elliptic curve computations.
All right. Suppose you're not happy with all these conjectures and you actually want to
prove something. Well, then you have to increase t. You have to look around more and
find a curve for which the number of points on the curve is something that you can
factor, so that you don't just check that some point has the order you expect in this
group -- you want to check that the point has not just order dividing what you expect, but
you want to verify that the order is exactly what you expect, so that tells you that the group has
to be at least a certain size. And that's what ECPP does. The fast ECPP takes time log
n to the fourth, verifying an ECPP certificate takes time log n cubed, and the current project
is getting that time down. I don't think it will be possible to do better than cubic, but at
least you can look at little o of 1, things like log log n factors and try to get those out, try
to improve the constants.
So what actually takes the time here? Well, in ECPP, you've heard something already,
but just to briefly review, an ECPP proof looks like an elliptic curve modulo n together with
some point -- so w is now a point on this curve mod n -- which has prime order q. And part
of this proof is recursively verifying that q is itself prime. Q can't be too small. The proof
breaks down if q is too small, but the q's that we actually find are pretty close to n. So
this is not a serious restriction.
What does a verifier do with this proof? Well, the verifier checks, first of all, that w looks
like it has order q: checks q times w, sees that it's the neutral element on this curve.
Because elliptic curve computations are compatible with base change, you get to
reduce this modulo p and you've done a computation on the elliptic curve modulo p. For
any prime p dividing n, you know that q times w is zero. So the order of w in e of z mod
p is either 1 or q once you've checked that q recursively is prime.
You check that w is non-zero and also non-zero after base change, so w is -- for
instance, for Weierstrass coordinates you check it's an affine point; for other coordinate
systems, you check that each of the coordinates is different -- that the difference of
coordinates is invertible modulo n. That's what this boils down to.
So you check w in each E of z mod p, even without knowing what p is -- you do some
very fast tests to see that w is going to be non-zero, doesn't have order 1, in
the elliptic curve modulo p -- and so now you know for every p dividing n, every prime p
dividing n, that the order of w is exactly q. But that means that the size of the elliptic
curve group is, well, at least q. And now knowing that q is pretty big, that tells you that p
has to be pretty big, by [inaudible]. Specifically, every p dividing n has to be bigger than
the square root of n, which immediately implies n is prime.
What slows this algorithm down is, first of all, the recursion. You've got this recursive
proof that q is prime. Q is pretty close to n. You can put more work into trying to find
q's and slightly decrease the q's that you find, but you still have to go through something
close to log n, maybe log n over log log n levels of recursion to actually prove that n is
prime. Just because there's all these subproofs involved in it, you have to know that q
is prime.
The other thing that makes this algorithm slow is that doing arithmetic in the elliptic
curve modulo n is slow. For instance, if you take the Goldwasser-Kilian definition,
which I've written here as the engineer's definition, of e of z mod n, this is follow your
nose and say, well, I've got points on the elliptic curve mod n. I don't even know if n is
prime. I'll just go ahead and use the formulas, use the addition formulas. X1 is
different from x2. Well, I'll compute lambda equals y2 minus y1 over x2 minus x1, and,
whoops, I just divided by something which was not invertible. That's the GCD
computation: doing that inverse -- that inversion.
And, hey, I've just found a factor of n. And if that never happens, if nothing goes wrong,
then you know that the computation you've done reduces modulo p, so that this
computation -- sort of looking at the algorithm you're doing -- is retroactively defining some
piece of e of z mod n which is compatible with e
of z mod p for every p dividing n, and then you know something about e of z mod p.
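Here is a Python sketch of that engineer's definition for affine Weierstrass points; it follows the usual formulas and simply reports the gcd when a denominator fails to be invertible, which is one reasonable reading of what was described.

```python
# Affine addition on y^2 = x^3 + a*x + b over Z/n, following the usual formulas;
# if a denominator is not invertible mod n, report the gcd you stumbled on.
from math import gcd

def ec_add(P, Q, a, n):
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if (x1 - x2) % n == 0 and (y1 + y2) % n == 0:
        return None                                     # P + (-P) = infinity
    if (x1 - x2) % n == 0 and (y1 - y2) % n == 0:
        num, den = (3 * x1 * x1 + a) % n, (2 * y1) % n  # doubling
    else:
        num, den = (y2 - y1) % n, (x2 - x1) % n         # generic addition
    g = gcd(den, n)
    if g != 1:
        # the division failed mod some prime dividing n; unless g == n,
        # g is a nontrivial factor of n, so n is proven composite
        raise ArithmeticError(f"non-invertible denominator, gcd with n is {g}")
    lam = num * pow(den, -1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)
```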
It actually is a legitimate proof, and you don't have to think what e of z mod n actually is.
You could alternatively come along and say oh, this is so ridiculous having e of z mod n
defined implicitly by what some algorithm is doing. Let's give a proper definition
compatible with how algebraic geometers would think about group schemes of what e of
z mod n actually is.
I'd like to show you the definition. I'll do that in a few minutes because I think it really is
a very nice definition and fun to work with, but it still requires that GCD for every
computation that you're doing.
You could try other things. For instance, in one of Francois's [phonetic] papers there's a
way of using division polynomials which, as written, don't involve any inversions, but it's
something like 20 multiplications per bit of n to do that computation, and that's kind of
ridiculous compared to what you've heard for even Jacobian coordinates. You can do an
nth multiple in 9 plus some -- something that converges to zero, 9 plus little o of 1 -- times log n
multiplications, where some of these are the time you have to spend for batching a bunch of
GCDs, checking that everything that you were implicitly dividing by in the Jacobian
coordinate formulas is actually invertible mod n. The fastest way to do that is to multiply them
all together, and you have to do that something like log n times, so doing a
multi-inversion modulo n costs a significant chunk of this computation.
I thought a few years ago that I could do better than this with a Montgomery
Ladder-type computation which almost kills the little o of 1, gets rid of the multi GCD,
reduces the 9 to 8 at the end of the day, essentially by killing that multi GCD, but I
wouldn't trust this proof, and if somebody came along to me and said that's a proof of
primality, I'd be kind of skeptical.
So fortunately we know more now. In particular, we know how to do curve
computations without exceptional cases and with incredible speed.
So instead of using old-fashioned curve shapes, let's use x squared plus y squared
equals 1 plus d x squared y squared where d is not a square and then we know that the
addition law, the very fast addition law on this curve, always works. You never end up
dividing by anything zero modulo p. So you don't have to bother checking anything.
You do the fastest computation you can think of and it just -- it always works. You've
heard what the fastest computations are, and it's only 7 times log n multiplications plus
some little o of 1 for the occasional additions you have to do. That's much, much faster
than certainly division polynomials or doing GCDs all the time. Well, this is -- this
seems to be the state of the art. Maybe somebody will come up with something better,
but this is already pretty good except you might object, wait a minute, how do I know
that this d is not a square?
We're trying to do computations in e of z mod n in order to do computations in e of z mod p
for every prime p that divides n. Now, how do I know that d is not a square in z mod p?
I don't know anything about p. I mean, I think p equals n. We're going to prove p
equals n. But that proof can't assume that p equals n. We don't know in advance. We
can't assume anything about p. How do you know d is not a square?
Well, the easy fix is to say, okay, at least there will be some p that works, because you
take a d whose symbol mod n is minus 1, and that means there's some p dividing n
where d is not a square modulo p, and that tells you that for some p -- all of your elliptic
curve computations have worked, so for some p dividing n -- p is at least, well, what you
get from Hasse, assuming you've verified that something has order q.
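For concreteness, here is a Python sketch of the complete Edwards addition being discussed, written in affine form; the inversion-free projective formulas mentioned in the talk are what you would use for speed, but this shows the shape of the computation.

```python
# Complete Edwards addition on x^2 + y^2 = 1 + d x^2 y^2 over Z/n. When d is a
# non-square mod p, the denominators 1 +- d x1 x2 y1 y2 are nonzero mod p, so
# the one formula handles every pair of points; over Z/n, a failed inversion
# can only happen if n is composite.
def edwards_add(P, Q, d, n):
    x1, y1 = P
    x2, y2 = Q
    t = d * x1 * x2 * y1 * y2 % n
    x3 = (x1 * y2 + y1 * x2) * pow(1 + t, -1, n) % n
    y3 = (y1 * y2 - x1 * x2) * pow(1 - t, -1, n) % n
    return (x3, y3)

# The neutral element is (0, 1), and negation is (x, y) -> (-x, y).
```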
Now, depending exactly how close q is to n, you have to do a little bit more work to
check whether n has any small prime factors. If you know that some prime dividing n is, say,
bigger than n over a million and you know that n has no prime factors up to a million, n
has to be prime.
You can try to balance, okay, what you will allow q to be, whether you want to do more order
verification versus doing less trial division, use better methods in trial division. There's
all sorts of things to make this run even faster. But the basic idea certainly works, and
that's what we're exploring right now.
I promised I would tell you the mathematician's definition of e of z mod n. I'll only do
this for -- I'll do e of r in some generality, but only for r's with class number 1, but z mod n
has class number 1.
This goes back effectively to a [inaudible] paper by Lange and Ruppert -- different
Lange -- saying for any abelian variety over any algebraically closed field --
Were you writing papers in '85?
>>: [inaudible].
>>Daniel J. Bernstein: -- for any abelian variety over any algebraically closed field,
there's a low degree complete system of addition laws. So addition laws are polynomial
expressions which are compatible with addition except they're allowed to sometimes
give all zeros, which is not a projective point. They specifically showed that if you have a
symmetric elliptic curve embedding, then you get a degree 2 in each variable system of
addition laws. I'll say what this means quite concretely in a moment.
They commented that this proof does not let you write down the addition laws. To
determine explicitly a complete system of addition laws requires tedious computations
already in the easiest case of an elliptic curve in Weierstrass normal form.
But, okay, they were not deterred by tedious computations. In the same paper they
actually did it. They wrote down a complete system of three addition laws for short
Weierstrass curves and then a couple years later did it for long Weierstrass curves. I'm
not going to show you the formulas. Just to give you an idea of how complicated they
are, if you give names to some cross products, then you end up with only 53 monomials
in the complete system of addition laws -- quite a mess -- until Bosma and Lenstra
came along and made things much simpler.
So what they did -- first of all, they reduced the three addition laws to two. So they
wrote down six polynomials, x3, y3, z3, x3 prime, y3 prime, z3 prime -- the primes are
just different polynomials, no derivatives -- in this generic polynomial ring with variables
x1, y1, z1, x2, y2, z2 and generic curve coefficients a1 through a6 in Weierstrass form.
What I've shown you on the previous slide in a ridiculous font here is not the system of
two addition laws. This is two of the six polynomials. So they had y3 prime and z3
prime. Actually, this is the result -- so what I'm showing you is a scan from the printed
publication, and the printed publication is not the same as what Bosma had inside his
computer from an early version of magma. This is what happens when you feed
magma output through the publisher and you get published y3 prime and published z3
prime. These are incorrect formulas as they actually appeared in print.
I said I would say concretely what this means. Well, these polynomials have the
following very explicit addition property. If you take any Weierstrass curve, any point p1
on this Weierstrass curve over whatever field, any point p2 on the same curve over the
same field, then the first three polynomials from Bosma and Lenstra, evaluated there, will be
either the sum of the points or 000, and the second system of polynomials they wrote
down will also give you either p1 plus p2 or 000, and they won't both give 000. So
between the two of them, at least one of them will add any particular pair of points that
you feed in as input.
Okay. Here's a similar theorem for Edwards curves. Instead of in p2, this is in p1 times
p1. This is also geometric, outside characteristic 2. This is for arbitrary elliptic curves.
So the same level of generality as this theorem. The formal expression is -- well, it's the
same thing except it's all p1 times p1. Instead of p2, there's some explicit formulas,
some explicit polynomials that we wrote down x3, z3, y3, t3 and x3 prime, z3 prime, y3
prime, t3 prime which always add any pair of points. There might be some occasional
zero divided by zeros, but that will be made up for by something else not being zero
divided by zero.
The difference between these kinds of formulas and what you get from a, well,
engineering approach to adding points on an elliptic curve is that these formulas will
never give you anything bad other than 0 divided by 0, or 000 in more variables.
For the normal formulas, if you try applying, say, the doubling formulas in textbooks,
then those don't work for adding most pairs of points. They give you actual wrong
answers. These are all valid on some open subsets of e times e or e of k times e of k.
Here are the formulas for Edwards curves. That's the complete system of addition laws
for addition on Edwards curves with all the extra variables to put it in p1 times p1 and
show it to undergraduates. For comparison here, again, is the Bosma-Lenstra complete
system of addition laws which -- I mean, these are both, you know, finite computations.
In principle, there's no difference. Just big o of 1, right?
Okay. This is your brain on Edwards curves. This is your brain on Weierstrass curves.
And what does this have to do with defining e of r? Well, here's the general setup again
for rings of class number 1. You take projective space over r to be the set of lines through
the origin in 3-dimensional space. So you take all -- for any xyz define x, colon, y,
colon, z as all the multiples of xyz, same multiple of x, y, and z, and that's some line
through the origin in 3-dimensional space. And then the set of those lines is -- well, I
should say this is supposed to be a non-trivial line in the sense that x, y and z are
supposed to generate the whole ring. Take all of the non-trivial lines through the origin
and that is projective 2 space over r. And now define e of r for, say, Weierstrass form.
This is what Lenstra did in '87. Define e of r as, well, the set of xyz in this projective
space that satisfy the curve equation for, say, a short Weierstrass curve.
How do you add points? How do you add these elements of e of r? Well, this is where
the complete system of addition laws comes in. And you really need it to define e of r in
this generality.
You take the complete addition laws from -- well, back in '87 Lenstra's paper only had
Lange-Ruppert to refer to. He said take those three addition laws for Weierstrass
curves, add the points that you're trying to add, the x1, y1, z1 and x2, y2, z2 with those
formulas, and that gives you three different choices for lines through the origin which are
supposed to be the sum of your points projectively.
And now they're all supposed to be the same point in some sense or maybe 0, 0, 0, but
if r is -- say you have z mod n being z mod p times z mod q for two different primes p
and q. Then it might be that one of the formulas is working mod p and another one is
working mod q. To get a general formula that always works you add these three lines
through the origin, and that always gives you a proper line through the origin which is
some xyz.
This is the GCD computation. This is the inversion. You have to do something mod n.
You have to find one generator for this module. I mean, you start out seeing it as a
projective module, you know it's rank 1, and because r is assumed to have trivial class
group, you know that there's a single generator for it, and that computation is exactly a
GCD computation.
Okay. So that's the right definition of e of r. Of course, if you allow r to have a bigger
class group, then you need to allow more terms in this, not just a single xyz.
Okay. Next mini talk, factoring integers into primes.
Here's a quote from Atkin-Morain in '93, finding suitable curves for the elliptic curve
method of factorization. They said for practical application -- they constructed a whole
bunch of curves. The elliptic curve method of factorization we'll hear much, much more
about tomorrow in Peter Montgomery's talk. Plus you've heard a bit about it before. In
the context of ECM, well, it's good to start with a curve over q. You need to have the
curve over q with a known non-torsion point over q, and then you'd like to have the
curve having a big torsion curve. And that's what this whole paper is about is
constructing curves over q with rank known shown explicitly to be at least 1 by
exhibiting a point and with big torsion groups all the way up to the maximum you can
have over q, namely, z mod 8 times z mod 2. And they say you may as well use this
16-torsion-point curve, family of curves.
Giving a prescribed factor of 16, well, inside the context of the elliptic curve method,
whatever groups you write down, if you know that they have, say, four torsion points for
the clock or if you know they have two torsion points for z mod n star, if you know
something about your group, then that effectively divides the size of your group by that
little torsion. It improves the chances of the elliptic curve method factoring, and so they
say, yeah, this is the biggest groups we can give you. Use those curves. Except it's
actually not true. These are not the best curves to use for ECM.
Together with Tanya and Peter Birkner, this paper here is called Starfish on Strike.
You'll have to look at the paper to understand the title. The result is there's sort of an
expected part of this and then there was a surprising part of this.
We were looking at all the results from Huseyin Hisil [phonetic] that you've heard
about, of how fast Edwards curves are, and in particular this kind of twist of Edwards curves,
minus x squared plus y squared equals 1 plus d x squared y squared, how fast these
curves are. It's certainly much faster than anything you can do with Jacobian
coordinates or the other coordinates that have been considered for ECM. We were
certainly expecting these curves to be very, very fast. There's a little problem that these
curves are incompatible with having 12 or 16 torsion points. With this particular shape,
the minus x squared prohibits having 10 or 12 or 16 torsion points. The best you can do
is, say, z mod 6 or z mod 8. And we constructed all these -- I use the word we
loosely -- my coauthors constructed all these. I just did the computer experiment.
We were expecting that the z mod 6 and z mod 8 cases would be very fast but would
lose some effectiveness inside ECM. They don't have the maximum torsion. And
everybody working with ECM knows we want big torsion except it's -- again, it's just not
true. These curves actually find more primes than the previous curves do even though
the torsion is smaller.
Now, there's reasons -- there's easy reasons to explain why they might find the same
number of primes inside ECM, why they might be as good in terms of effectiveness,
how many primes they find, and then better for speed. And that's what we were hoping
for, that they would be at least as good or maybe not so much worse for effectiveness
and then so fast that they would be worthwhile.
But actually they're very fast, faster than anything else, and they find more primes. The
combined effect of that is illustrated in the following diagrams which are -- this is for
1,000 curves in five different families for finding different sizes of primes using
parameters that were optimized for the now known to be not best possible families.
What you see here is taking, for example, 7200 modular multiplications to find an
average 25-bit prime. So feed all 25-bit primes to your ECM program and then see how
many of the primes you find, compare that to the number of multiplications that you took
which was maybe 4,000 for finding half the primes comes out to 8,000 per prime
actually found. The curves are then sorted in order of the lowest cost curves on the left,
the highest on the right. The red curve at the bottom here, the lowest cost curve, is the
z mod 6 which actually has 2 pieces because there are two different families that we
were looking at and they apparently have different performance despite having the
same torsion and all other obvious algebraic features being the same. Up here is the
12 and 16 and then some other curve shapes that we tried. Very stable. As we
increased the size of primes we're looking for, it is very clear that ECM is happier with
these z mod 6 curves than with the previous z mod 12 and z mod 2 times z mod 8
curves.
>>: [inaudible].
>>Daniel J. Bernstein: I'm sorry?
>>: [inaudible].
>>Daniel J. Bernstein: The order does slightly change between 12 and 2 times 8. The
parameters, again, were optimized for a certain slice of ECM parameter space and
then -- if you're trying to optimize ECM parameters then you end up being kind of limited
by -- you're looking at small numbers and you have a limit to how many different cutoffs
you can use for, say, b1 in ECM and the stage 2 parameters. And so you often get kind
of discontinuities, and so for some of those parameters actually the 12 and 16 reversed
slightly.
But those were the best parameters found in a fairly comprehensive search for those
curves, and this -- just using the parameters optimized for those is quite a lot better.
So what's going on here? We don't know, but maybe somebody can figure it out.
All right. Last little section of my talk. I have officially 10 minutes, but I only have a few
slides. Sometimes you've been going for a while doing some serious mathematics and
then you kind of degenerate and you end up saying, all right, what are all the primes up
to 1,000? I'm really bored. 2, 3, 5. And you would think a problem like this, there's
nothing to say about it because Eratosthenes figured it all out thousands of years ago.
And what the Sieve of Eratosthenes does is, well, evaluate -- enumerate systematically
all the small values of some quadratic forms. In the traditional expression, you're
enumerating products. You take, say, all the multiples of 2, all the multiples of 3, you can
skip all the multiples of 4, all the multiples of 5, skip all the multiples of 6. You're
enumerating products, i times j. But for lots of reasons I prefer to re-express those
products. Let's ignore what happens at 2. Just look at odd products i times j. You
can think of i as x plus y and j as y minus x and then, well, i times j is minus x squared
plus y squared. Y squared minus x squared is a generic way, again ignoring even
numbers, to write an odd product, which --
I remember a wonderful book called Category Theory Made Difficult. This is the Sieve
of Eratosthenes made difficult, and there's kind of a limit to how far you can go compared
to category theory, but, okay.
Y squared minus x squared, that is the norm of the same kind of thing I was writing
down before, y plus xt from the ring z adjoined t mod t squared minus 1. Hey, that's not
irreducible. Don't worry about it. You can take norms from z adjoin t by t squared
minus 1 down to z, and in particular the norm of y plus xt is y squared minus x squared.
You take y plus xt times y minus xt, you get y squared minus x squared t squared, which is y
squared minus x squared.
So that's what the Sieve of Eratosthenes is doing: systematically enumerating norms
from this reducible ring. And then somehow, by knowing something about the number
of ways to write n as a value of these norms, it figures out whether n is prime. If you
can write n in several different ways as a product, then it's not prime.
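Here is a toy Python sketch of that way of looking at the Sieve of Eratosthenes; it is deliberately naive and only meant to illustrate the counting of representations.

```python
# An odd n > 1 is prime exactly when it has a single representation
# n = y^2 - x^2 with y > x >= 0, namely the trivial one with
# y = (n+1)/2 and x = (n-1)/2. The prime 2 is handled separately.
def odd_primes_up_to(h):
    reps = [0] * (h + 1)
    y = 1
    while 2 * y - 1 <= h:                 # smallest value for this y is 2y - 1
        x = y - 1
        while x >= 0 and y * y - x * x <= h:
            reps[y * y - x * x] += 1
            x -= 1
        y += 1
    return [n for n in range(3, h + 1, 2) if reps[n] == 1]

print(odd_primes_up_to(50))   # [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```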
Like I said, you kind of degenerate after a while, but it's still kind of fun to look at this.
All right. If you do this computation for all small numbers n, say all numbers n up to h,
then it's actually very, very fast, because enumerating all the values of y squared
minus x squared that are up to h, well, if you just take some y in some range and x in
some range, then you're not going to get any particular n with a very good chance, but if
you're looking at all the n's, just zoom through all the x's and all the y's that could
possibly be relevant and you make a table of all the n's that you care about, and that's
what the Sieve of Eratosthenes does, and it's very, very efficient.
But you can actually do better. I was on the way to a conference with Oliver, and he
mentioned that he actually uses something different to check whether a number is
prime, whether a small number is prime. Namely, evaluating -- well, enumerating values
of x squared plus y squared or 4x squared plus y squared, 3x squared plus y squared.
Okay, the complete Sieve of Atkin -- there's, of course, many choices you can make
here, but what is now widely known as the Sieve of Atkin is enumerating -- instead of y
squared minus x squared values you enumerate y squared plus 4x squared values for
some n's, y squared plus 3x squared values for some other n's and y squared -- well, 3x
squared minus y squared values for some other n's. And this covers all possible n's.
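Here is a compact Python sketch of the Sieve of Atkin in its usual textbook form, which is one concrete instance of the choices just described.

```python
# Toggle n according to the parity of its number of representations by the
# three forms, then strike out numbers divisible by the square of a prime.
def atkin_primes_up_to(h):
    flags = [False] * (h + 1)
    x = 1
    while x * x <= h:
        y = 1
        while y * y <= h:
            n = 4 * x * x + y * y
            if n <= h and n % 12 in (1, 5):
                flags[n] = not flags[n]
            n = 3 * x * x + y * y
            if n <= h and n % 12 == 7:
                flags[n] = not flags[n]
            n = 3 * x * x - y * y
            if x > y and n <= h and n % 12 == 11:
                flags[n] = not flags[n]
            y += 1
        x += 1
    r = 5
    while r * r <= h:                       # remove non-squarefree survivors
        if flags[r]:
            for k in range(r * r, h + 1, r * r):
                flags[k] = False
        r += 1
    return [p for p in (2, 3) if p <= h] + [n for n in range(5, h + 1) if flags[n]]

print(atkin_primes_up_to(60))   # prints the primes up to 60
```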
Now, this is better than the Sieve of Eratosthenes. There are fewer values of these
forms than there are of y squared minus x squared because there are fewer elements of
these number fields than there are of q times q. If you take q adjoin -- what people
sometimes call q adjoin square root of 1, q adjoin t mod t squared minus 1. That's q
adjoin square root of 1. It's not a number field, it's a product of two number fields. Its
zeta function is a product of zeta functions, so it has a double pole at 1. You've got more
ideals, you've got more elements of this -- well, of a product of
two number fields -- than you do of an actual authentic number field like the ones that are
showing up in the Sieve of Atkin, q adjoin square root of 3, square root of minus 3 or
whatever square root you want to put in except for square root of a square like square
root of 1.
Now, as a result of this, if you ask how long does it take to write down all the values of x
squared plus y squared, say, it's just less time than writing down all the values of y
squared minus x squared.
I have a parenthetical note that I don't know the answer to, namely, can you do
something similar enumerating points on elliptic curves. I heard him mentioning this
and said, well, that's funny. That actually answers an open question in prime
enumeration -- you'd be surprised how many papers there actually are on this topic --
namely, can you enumerate primes in what seems to be the best time possible: not quite
h over log h time for all the primes up to h, but h over log log h is the best anybody's
been able to do. That's the number of additions of numbers that are up to about h or h
squared or so. Can you enumerate primes with that minimum amount of time using a lot
less than h space? It was previously known how to do this enumeration using
something like h or h over log h space, but can you do it, some previous papers asked,
in only, say, square root of h space? That was previously known to be doable only
with much, much more time, like h times log log h.
And, well, Atkin's sieve immediately answers that question. And so we wrote a little
paper, and then Will Galway came along and said actually you can push the same kinds of
techniques further and allow a better space reduction, down to h to the one-third, and you
should be able to get this down to h to the one fourth.
But more recently I've been looking back at this and wondering is this actually a
sensible kind of optimization to do? This is saying -- it's saying go for the absolute
minimum time, paying attention to, like, log log h factors, not willing to do h or h times
log h or any such thing and then asking can we make these huge memory reductions,
but still not willing to compromise at all in the amount of time.
I don't actually think this is a meaningful gain. I think that meaningful gains are
ones you get on current state-of-the-art graphics cards like the Radeon 5970 graphics
card which has 3200 parallel multipliers, all running at 725 megahertz. It can do 2.3
times 10 to the 12th multiplications per second, draws 300 watts, costs about $600.
This is a picture of something running at even higher speed doing more multiplications
per second but needs more cooling, and you have to make sure to plug it into an even
better power supply.
Now, this is the future of computation. This is the fastest computational machinery you can buy today. Considering its price, it's the fastest -- it's the best price-performance ratio you can get for computation, by far. And it's not what we're optimizing for. If you think
about the 3,000 parallel multipliers here and you try to put your typical number theoretic
algorithm onto this graphics card, then you see it's actually incredibly slow because
those 3,000 parallel multipliers, they can all operate at once with very small amounts of
memory. They can't talk to huge amounts of memory, they can't sieve very quickly. If
you tell them oh, we're going to have a large amount of memory and just access that,
it's incredibly slow. And physically it wouldn't make sense for it to be faster.
So to take advantage of this, we should be willing to trade some time for reducing
memory consumption much, much further. And it's actually --

Yeah, go ahead.
>>: [inaudible].
>>Daniel J. Bernstein: Yeah.
>>: [inaudible].
>>Daniel J. Bernstein: Yeah, this is not counting the printer's paper. So the bits of memory are --

Sorry?
>>: [inaudible].
>>Daniel J. Bernstein: Yeah, you keep spitting out of this machine the primes in order:
2, 3, 5.
Somebody else might write them down --

>>: [inaudible].
>>Daniel J. Bernstein: Sorry. Say again.
>>: [inaudible].
>>Daniel J. Bernstein: Yeah, something like h bits. This is -- the operations are h over
log log h operations, each of which is working on integers of log h bits. So the total
number of bit operations is h log h divided by log log h.
Now, you're talking about a much, much smaller number, namely, h, of bits that you
have to print. It's an important question -- yeah, it's important that these are operations
on -- you can count bit operations as well. It takes just a bit more work. Okay.
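Spelling that count out:

\[
\frac{h}{\log\log h}\ \text{operations} \ \times\ \log h\ \text{bits per integer} \ =\ \frac{h\log h}{\log\log h}\ \text{bit operations},
\qquad
\text{output} \approx \pi(h)\cdot\log h = \Theta(h)\ \text{bits}.
\]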
All right. So back to here. And this is actually my last slide.
If we want to optimize number theoretic algorithms for real computers, then we really
have to reduce the amount of memory we're using and even do that if it means
increasing the time somewhat. A great example of this is a paper by John Sorenson a
few years ago on the pseudosquares prime sieve which always prints the primes 1
through h in order and is conjectured to take, well, h times log h operations on the log h
bit integers and uses only log h squared bits of memory. And, okay, nobody's put it onto
this machine yet, but it will run much, much, much faster on modern computer
architecture and computers of the future than any of the other algorithms that I've
mentioned possibly will.
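To illustrate the shape of that trade in the simplest possible way -- this is not Sorenson's pseudosquares sieve, just a toy sketch that spends a primality test on every candidate instead of sieving, so the working memory is a handful of machine words; the Miller-Rabin bases are a standard deterministic choice for n below about 3.4 times 10 to the 14, not anything from the talk:

```python
def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17)):
    """Miller-Rabin; with these bases it is deterministic for n < 3.4 * 10**14."""
    if n < 2:
        return False
    for p in bases:                      # trial division by the base primes
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:                    # write n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a witnesses that n is composite
    return True

def primes_low_memory(h):
    """Yield 2, 3, 5, ... up to h in order; memory stays O(1) integers,
    at the cost of one primality test per candidate instead of a sieve."""
    for n in range(2, h + 1):
        if is_probable_prime(n):
            yield n

if __name__ == "__main__":
    print(list(primes_low_memory(50)))   # [2, 3, 5, 7, 11, ..., 47]
```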
I think it might be possible to improve this a little bit, maybe get rid of almost a log h factor, using elliptic curve primality tests, using class polynomials and so on, putting together everything that I've said. So maybe the problem of enumerating primes is not actually as simple as I once thought.
In any case, even if this is the best possible, I think Oliver really would have enjoyed
playing with these computers, and I hope that you will enjoy in the future playing with
them too.
Thanks for your attention.
[applause]
>>: Are there any questions? Dan's blown you away, hasn't he?
>>: I just wanted to hear conjecture on putting primality the log n squared [inaudible]
are there any quadratic [inaudible] lurking in that prediction?
>>Daniel J. Bernstein: There are tons of conjectures that are -- I think that would be
harder to prove than that. And, yeah, I guess that in particular is one of the pieces that's
needed. So it's certainly relying on being able to construct all sorts of stuff that we have
no way to prove can actually be constructed. So I think you might actually be one of the
culprits in conjecturing that the best is log n cubed, and this log n squared relies on
many, many more conjectures than would go into previous stuff. It relies on quite a bit
of stuff that's way beyond what anybody can prove, but nevertheless I believe this is
correct. I think I could even write down an explicit algorithm which I conjecture to
reliably determine the primality of n and which takes this amount of time. But, of course,
it's way beyond current technology.
>>: [inaudible].
>>Daniel J. Bernstein: The actual run time -- that's the easy part to analyze. The actual
run time is log n squared times a certain lower order. The hard part is convincing
somebody that it actually reliably determines the primality. So that's what Oliver went on about for 10 pages in his paper on the linear, quadratic and cubic tests being put together, and then when you try to put together more and more tests, doing the analysis of when they should first fail is actually -- again, I think I've done a reasonable
first job, but I wouldn't be surprised if I'm off by some noticeable factor, even log
squared log n, I would believe. But I think that the final t, the number of tests that I
need, is much, much smaller than log n, so log n to the epsilon.
>>: Any other questions?
>>: I have a question. I'm not sure if I should be asking one of the other speakers
[inaudible] on that machine that you showed us if anyone does highly parallelized
[inaudible].
>>Daniel J. Bernstein: So the question was whether anybody's tried a parallel pairing
on a machine with 3,000 multipliers, and I'm trying to get forward to the picture of it. I
have no idea how to actually use this PDF viewer. Flip, flip, flip. This is one of the
things that reminds me that I have too many slides.
I guess it doesn't help answer your question to show the machine. I just think it's a
really -- it's really, really cool, I think -- I've never actually seen this in operation, but all
the fans are supposed to be going [inaudible].
>>: You don't actually have one?
>>Daniel J. Bernstein: I have one of the lower-clocked ones. I don't have -- this is one of the more expensive limited editions -- it's so cool. I mean, [laughter] they only produced like a thousand of them, and they run at something like 900 megahertz instead of 725 megahertz and --

>>: It's only cool if you have enough fans.
>>Daniel J. Bernstein: Yeah. To answer the question, I --

>>: [inaudible].
>>Daniel J. Bernstein: I do believe that there is a group that has started looking at this
question. I don't know if they're public about it, and I don't know if they're happy to
cooperate with other people about it, but speaking for myself, I'm mostly looking at much
simpler things. I see all the interesting activity in pairings, and it's very cool to watch
and interesting to see how fast things are going. And, yeah, if you believe, as I do, that
these are the computers of the future, then it makes perfect sense to optimize pairings
for these. But if you're looking for people who might have done work in this direction and
would be willing to talk about it, then you might have to look around for whether they're
willing to say anything.
>>: What are the multipliers?
>>Daniel J. Bernstein: They're single-precision floating point, approximately IEEE.
>>: [inaudible].
>>Daniel J. Bernstein: About 24-bit [inaudible].
>>: Does that machine double as also a cook top range [laughter]?
>>Daniel J. Bernstein: Sorry? Double as?
>>: Could you cook on that machine [laughter]?
>>Daniel J. Bernstein: I assume you could, yeah. You just have to take the fans off.
>>: So if you can flip back to the graph you had [laughter] --

>>Daniel J. Bernstein: This is what I get for promising a series of mini talks.
>>: I was just curious. So I guess the Edwards curves with lesser torsion, you expect
fewer bits because the torsion is -- it's not as large, but you're being more efficient
because the divisions are faster, and it averages out that you're doing better. But how
much -- how do the effects cancel? I mean, how much worse is the torsion factor and
then how much better is the [inaudible].
>>Daniel J. Bernstein: Okay. So this is what I was trying to address. So what you said
is exactly what we expected at the beginning, namely, that these curves would be faster
but find fewer primes. And the actual effect is that they're faster and find more primes.
Each curve --

>>: But you're counting [inaudible].
>>Daniel J. Bernstein: That's true. So this is the combined effect. So out of this, some
part of the distance here is the speedup which is some percentage -- I mean the gap
between, like, the 7250 or so and the 7500 -- I don't know offhand what the answer is
for this. This is showing the overall effect of switching to the -- from x squared plus y
squared equals 1 plus dx squared y squared with torsion say z mod 12, this is -- these
guys, I think this one and this one, is minus x squared plus y squared equals 1 plus dx
squared y squared in a particular family with torsion z mod 6. Part of the gap here -- so
I do know for each of these that it's not a cancellation. So for each of these it is an
improvement in the number of primes found and an improvement in [inaudible]. So
each curve is running faster than the previous curve. You can go to the paper. It's online -- it appeared at [inaudible] just recently -- and you'll get tables which say what
the number of primes actually found is in each of these cases. That's not what this
graph is showing, so you might think that, yeah, the graph is saying oh, well there's a
speedup, but then a loss in the -- from the primes found. But, no, it's actually there's
some speedup and there's an improvement in the number of primes found. So the surprising part of looking at this was that these curves are actually better, for reasons that are still not known.
You might think -- if you start looking at what happens with the 2 torsion and 4 torsion
over appropriate quadratic extensions, then you can argue why it should be as good to
do these as the previous z mod 12s. But that still doesn't say why they're better. And
they really are better. It's just -- it's a certain percentage improvement, not a huge
improvement, but it's quite a surprise. Certainly something that should be explored.
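The group law in question, as a minimal affine sketch (both families mentioned here are twisted Edwards curves a x squared plus y squared equals 1 plus d x squared y squared, with a = 1 for the z mod 12 family and a = -1 for the z mod 6 family; the modulus, constants, and base point below are placeholders, and real EECM code works in projective coordinates to avoid the inversions):

```python
# Affine twisted Edwards arithmetic a*x^2 + y^2 = 1 + d*x^2*y^2 mod n.
# In ECM the interesting event is an inversion that fails mod n: the gcd of
# the denominator with n then reveals a factor (not handled in this sketch).

def edwards_add(P, Q, a, d, n):
    x1, y1 = P
    x2, y2 = Q
    t = d * x1 * x2 * y1 * y2 % n
    x3 = (x1 * y2 + y1 * x2) * pow(1 + t, -1, n) % n   # pow(., -1, n): Python 3.8+
    y3 = (y1 * y2 - a * x1 * x2) * pow(1 - t, -1, n) % n
    return (x3, y3)

def scalar_mul(k, P, a, d, n):
    """Double-and-add; ECM stage 1 multiplies the starting point by a number
    with many small prime factors, e.g. lcm(1, ..., B1)."""
    R = (0, 1)                    # neutral element of the Edwards addition law
    while k:
        if k & 1:
            R = edwards_add(R, P, a, d, n)
        P = edwards_add(P, P, a, d, n)
        k >>= 1
    return R
```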
>>: A small addition to this screen, the 2 times 4 and the 8 are [inaudible].
>>Daniel J. Bernstein: Yeah, that's right. So up at the top, the -- so these are, you can
see again, multiple levels for multiple different families that we were actually trying.
Except for some sporadic curves that do substantially better, most of the curves in the 2
times 4 -- excuse me, most of the curves in the 2 times 4 and 8 families are worse in the
number of primes found, I mean much, much worse than the z mod 12 or z mod 2 times
z mod.
In z mod 6 some interesting things are happening, some of which we understand, and those could, again, explain why this gets to be a little better than the previous ones, just by being faster and about as effective; but why they're more effective, no idea.
>>: [inaudible].
>>Daniel J. Bernstein: The x axis is an inverse error function distribution. So if you
want to turn a normal distribution into a straight line, then you use this distribution.
>>: [inaudible].
>>Daniel J. Bernstein: Oh, sorry. This is 1,000 curves sorted in performance order,
and the scale is chosen as inverse so that if it were a normal distribution, you would get
a straight line.
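A sketch of the plotting convention being described (not the authors' actual code; the normal quantile used below differs from the inverse error function only by rescaling, and the random data is a stand-in for the per-curve measurements):

```python
# Sort one measurement per curve, then place the i-th of N points at the
# normal quantile of (i + 0.5)/N, so normally distributed data falls on a line.
import random
from statistics import NormalDist
import matplotlib.pyplot as plt

measurements = sorted(random.gauss(0.0, 1.0) for _ in range(1000))  # placeholder data
N = len(measurements)
quantiles = [NormalDist().inv_cdf((i + 0.5) / N) for i in range(N)]

plt.plot(quantiles, measurements, ".")
plt.xlabel("normal quantile (inverse-error-function scale)")
plt.ylabel("per-curve measurement (placeholder)")
plt.show()
```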
>>: And then you said that [inaudible].
>>Daniel J. Bernstein: [inaudible].
>>: The best price performance ratio currently, does this include the cost of powering
the unit to run it?
>>Daniel J. Bernstein: That's a good question, yeah, how much do you pay for the
power of these things. I think given the amount of computation you're getting out, even
if the power ends up doubling your budget, then it's certainly worth it. In fact, depending
where you live, the power can cost quite a bit. So in the typical, say, five-year lifetime of
a machine -- these things don't have five-year warranties -- but if you imagine this going
for five years, then you could easily spend as much on power as you do on actually
buying equipment. If you buy two of these along with a regular CPU and disc and so on
in a case and you end up spending, say, $2,000 on a PC with 4.6 times 10 to the 12th
multiplications per second, then you could easily spend $2,000 on power. It depends on where you are and how much your power bill is. Of course, with solar energy, in principle you should be able to power something like this -- sustainably -- without very much area, because the sun is delivering a huge amount of energy per square meter of the earth's surface. Think green [laughter].
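A rough back-of-the-envelope version of that estimate, where the 700-watt total draw and the 10-cents-per-kilowatt-hour rate are assumed round numbers rather than figures from the talk:

\[
0.7\ \text{kW} \times 24 \times 365 \times 5\ \text{hours} \approx 30{,}700\ \text{kWh},
\qquad
30{,}700\ \text{kWh} \times \$0.10/\text{kWh} \approx \$3{,}000,
\]

which is the same order of magnitude as the roughly \$2{,}000 spent on the hardware itself.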
>>: [inaudible].
>>: Can you come back to the picture again [laughter]?
>>: [inaudible].
>>Daniel J. Bernstein: No. There are some sporadic curves. So actually -- okay,
something we were expecting is that there would be, of course, random variation
between curves. So if you're willing to say, okay, actually I'm going to maliciously
think -- you know, all the numbers I care about factoring, I care about finding, say, 25-bit
primes. I'm going to precompute curves which are really good at finding 25-bit primes,
and so you would expect some random variation, and that's what you see the straight
lines for.
Now, when you see deviations, some of them are -- here there are actually two different families which are separated by this type of experiment. But some of them are curves
which are sporadic curves which are better. And now we know that those curves are
better because the same curves are showing up as much, much better -- well, first of all,
it's kind of implausible that it would be like this, that you would have such a jump, for
instance, for the green curve randomly. But beyond that, the same curves are good for
26-bit primes, 25-bit primes, 24-bit primes and so on. And that's something that can't happen at random.
>>: [inaudible].
>>Daniel J. Bernstein: Well, the right edge -- those are the slow ones. Those are the
ones to throw away. The interesting ones are the fast ones over on the left here. And
these are good families. This is a z mod 6, not so good family, although still better than
anything we can do with 12 or 2 times 8. So there are multiple families which are
stratified somehow.
It's possible that that is something easily explainable from looking at torsion over small
extensions of q, but then the sporadic curves here don't -- we've looked at some of
those and don't have any idea what's making them so fast. And in general the z mod 6,
the red line down at the bottom, no idea what's making it so fast.
>>: [inaudible].
>>Daniel J. Bernstein: Yeah, the sporadic ones are great. Something like this is almost
as good as the worst z mod 6 ones. And this is not a bad curve to use. Just by
randomly looking around and seeing which curves are the best ones, some of them are
surprisingly good.
>>: Any more questions? Any more reluctant questions?
Okay. Well, let's thank Dan again and --

[applause]