>> Reinier Broker: We're very happy to have a visitor from MIT, Drew. He just flew in
yesterday and flying out tonight, so it's a bit of a heroic visit. He's going to talk to us
about computing class polynomials with the Chinese Remainder Theorem.
>> Drew Sutherland: So to set the stage for this talk, I'd just like to take you back to the
Algorithmic Number Theory Symposium that took place this past May in Banff. Some
of you I know were there.
The Selfridge prize for the conference was awarded to a paper by
Belding-Broker-Enge-Lauter, which tells a story of three algorithms, all for computing
Hilbert class polynomials.
The first of the three, the complex analytic method, is the oldest, it's the most well known
and the most widely used. It's sort of the reigning champion.
The second is a p-adic algorithm which is actually very nicely described by Reinier
Broker in the most recent issue of Mathematics of Computation.
Now, the third algorithm is a bit of a mongrel. The idea of using the Chinese Remainder
Theorem for computing Hilbert class polynomials has been kicking around for quite a
while, but until very recently it wasn't obvious that this could be made practical.
The surprising result that was presented at the conference is that heuristically, anyway, all
three of these algorithms appear to have the same asymptotic complexity, something like
roughly quasi-linear in D, the discriminant.
This is surprising, mostly because prior to this paper, this CRT method was known to
have complexity something like, oh, D to the three halves. So most of the paper and most
of Reinier's talk was focused on ways to improve the CRT method.
Now, if you're like me and you've got sort of a soft spot for algorithms, you can't help but
root for the underdog. And as the talk went on and Reinier talked about all the neat ideas
they'd come up with for improving the CRT method, I got more and more excited, and I
couldn't wait for the end of the talk when I knew the climactic showdown would come
between the underdog and the reigning champion.
But sadly this was not a Hollywood ending. The underdog gets crushed. It's not even
close. Even with all the improvements that they came up with. And, in fact, they
found in their initial implementation the complex analytic method still seemed to be
about 50 times faster. And, in fact, the real situation is a lot worse than that. Because in
practice people don't actually compute Hilbert class polynomials with the complex
analytic method; they compute other class polynomials, ones with smaller coefficients
that can be computed more quickly. And we don't necessarily know how to do that with
the CRT method, so the real difference that we're looking at here is a factor of more than
a thousand.
Now, by the end of this talk I'm going to hopefully convince you that the CRT method
for large enough discriminants is easily a hundred times faster than the complex analytic
method. So we've got five orders of magnitude we're going to cover in the next hour, or
little less than an hour, so let's get started.
All right. Just a little bit of background. One of the main reasons people are interested
in computing Hilbert class polynomials, or any class polynomial for that matter, is to
construct elliptic curves of known order. So the idea is you have some finite field, let's
suppose it's a prime field that we like, and we have some number of points we wish our
elliptic curve had, and that tells us what the trace of the curve T should be.
And we can write down an equation 4P equals T squared minus V squared D, where D is
some square-free negative discriminant. And if we happen to know, if we can pull out of
our pocket, the Hilbert class polynomial for that discriminant, reduce it mod P, find a
root, that will tell us the J-invariant of the curve we want. And then all we've got to do is
figure out what the right sign is of the trace, and we can take a twist if we need to. So the
only hard part in all of this is figuring out that class polynomial, that Hilbert class
polynomial.
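To make that setup concrete, here's a toy sketch of the bookkeeping, in Python. The helper name is made up, it uses sympy's factorint to pull the square part out of t squared minus 4p, and brute-force factorization like this is only sensible for toy sizes:

```python
from sympy import factorint

def cm_discriminant(p, N):
    """Toy sketch: given a prime p and a desired curve order N, recover
    (t, v, D) with 4p = t^2 - v^2 D, where D < 0 is a discriminant
    (0 or 1 mod 4) and v^2 is the largest square dividing t^2 - 4p."""
    t = p + 1 - N                    # trace determined by the desired order
    m = t * t - 4 * p                # this is v^2 * D, a negative integer
    v = 1
    for q, e in factorint(-m).items():
        v *= q ** (e // 2)           # largest square divisor
    D = m // (v * v)                 # squarefree and negative
    if D % 4 != 1:                   # discriminants are 0 or 1 mod 4;
        D, v = 4 * D, v // 2         # v is necessarily even in this case
    return t, v, D

# e.g. p = 23 with desired order N = 21 gives t = 3, v = 1, D = -83,
# and indeed 4 * 23 = 92 = 3^2 + 83.
print(cm_discriminant(23, 21))
```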
Now, the -- sort of we can define Hilbert class polynomial in terms of elliptic curves if
we imagine we start with a discriminant D that uniquely determines some imaginary
quadratic order which we can think of as just a lattice of points in the complex plane, we
can take a quotient of the plane with that lattice, we're going to get a torus, some elliptic
curve. It has some J-invariant -- that's an algebraic integer -- and the minimal polynomial of that J-invariant is the Hilbert class polynomial.
Now, if D is square-free -- if it's a fundamental discriminant -- that's going to give us the Hilbert class field. But in general we're going to get the ring class field associated with
the imaginary quadratic order.
Now, the interesting thing for what we're going to be using today -- that we're going to be
talking about today is the case where we have a prime that splits completely in the Hilbert
class field. Equivalently, that just means we can write it in this form. Then the Hilbert
class polynomial is going to split completely mod P into linear factors, and its roots are
just going to be a list of all the J-invariants of elliptic curves whose endomorphism ring is
isomorphic to this quadratic order O sub D, and so I'll just indicate that by saying O sub E
is equal to O sub D.
Now, in principle, the CM method can be applied to construct any ordinary elliptic curve
as long as the trace is nonzero. But in practice we only know how to do this when the
discriminate is fairly small. And if we pick a curve at random, that's not likely to be true
if we have some cryptographic-size prime.
Now, why do we need the discriminant to be small? Well, the Hilbert class polynomial is
big, really big. It takes more than D bits just to write it down. And if we're talking about
some cryptographic size prime P, that could be a really big polynomial.
So coming down to earth a little bit, here's just some examples of the estimates of the size
of the Hilbert class polynomial for various discriminants. These are neither the largest
nor the smallest cases, just fundamental discriminants close to powers of 10. The column H here is the class number of D, and it tells us the degree of the Hilbert class polynomial.
This is a bound on the size of the coefficients, and if we take the class number times log
B we get a rough estimate within a few percent of the size of the Hilbert class
polynomial.
And you can see already at around 10 to the 10th we're talking about gigabytes, just to
write down this polynomial, potentially more than would fit comfortably in our machines'
memory. And if we want to go up to 10 to the 12th or even, say, 10 to the 14th, which is
where we're headed, we're needing -- we're going to need to deal with terabytes of data.
Okay. So at this point you might ask why on earth would you ever want to compute a
polynomial that big. Well, one motivating reason for using large discriminants is
pairing-based cryptography. The idea here is -- and this will work on any elliptic curve -- we take a pair of points on an elliptic curve and we look at some pairing, the Weil pairing or the Tate pairing, and that's going to map the two points on the elliptic curve into some extension of the base field we're working over.
The interesting case is where the degree of that extension is small but not super small, so
roughly we'd like the embedding degree to be somewhere between 6 and 24. And we
want to choose the embedding degree and the size of our prime so that we just balance
the difficulty of the discrete logarithm problem on the elliptic curve and in our extension
field.
And that means we have a fairly narrow range of parameters that are useful. So say we
take embedding degree 6, we should really pick our prime somewhere between 170 and
192 bits. So if we have embedding degree 10, then maybe we want a slightly larger prime.
Now, these are very tight constraints; there are not a lot of curves that meet these criteria.
In fact, you might ask are there any. Well, there are. There are infinitely many. But if
we insist on keeping the discriminants small, they're going to be very hard to find.
So this table counts the number of prime order pairing-friendly curves with embedding
degree 6 or 10 and prime of the size indicated with discriminant less than various powers
of 10. So you can see if we wanted an embedding degree 10 curve that would be useful
in cryptography, the first discriminant we find is bigger than 10 to the 9th, and we've only
got eight to choose from if we want it to be less than 10 to the 10th. But if we're willing
to go a bit further, we can find a lot more of these curves.
Another thing to keep in mind is these might not be the only constraints we want to put
on our curves. There might be other criteria we'd like our curve to satisfy, and it's going
to make it even harder to find these curves unless we can handle big discriminants.
Okay. So the basic idea behind the CRT method is very simple, as with any Chinese
Remainder Theorem application, we start by picking a bunch of little primes. Although
here the primes aren't going to be so little; our P sub I's are going to be roughly the same size as our discriminant D. We're going to work entirely with primes that split
completely in the Hilbert class field, so they can all be written in this form, 4P equals T
squared minus V squared D. We're going to pick enough of them so that we can uniquely
determine the coefficients of the Hilbert class polynomial over the integers.
And our next step is for each of our little primes to compute, figure out what the roots of
the Hilbert class polynomial are, and then multiply together a bunch of linear factors to
get the coefficients of the polynomial mod P, mod our little P sub I's here. And then we can
apply the Chinese Remainder Theorem to compute the Hilbert class polynomial which is
a polynomial with integer coefficients. Or, alternatively, we might want to compute the
Hilbert class polynomial mod some cryptographic-size prime, big P, because typically
that's what we want to do in practice. We don't really care about the coefficients over the
integers. The first thing we're going to do when we find out what they are is we're going
to reduce mod P.
And there's a way to do this directly without necessarily ever computing it over the
integers. And so this was -- this idea uses the Explicit Chinese Remainder Theorem as
suggested in a paper by Agashe, Lauter, and Venkatesan.
Now, as originally proposed, the way we find the roots of the Hilbert class polynomial is
total brute force, just try every possibility, we run through all the J-invariants in FP and
see if they give us a curve with the right endomorphism ring. Remember the roots of the
Hilbert class polynomial are just a list of curves with endomorphism ring O sub D.
Now, even if we knew how to compute the endomorphism ring really quickly, like say in
unit time, that's still going to take too long; there are too many curves to check.
Okay. The big improvement in the ANTS paper was to realize that you only need to find
one root of the Hilbert class polynomial; you can then find all the others using the action of the class group, which can be computed explicitly via isogenies, and I'm going to talk
about how that works.
And so when you apply this idea, now you only need to find one root, not all of them.
The complexity is now quasi-linear in D. So it's potentially competitive with the other
methods that are out there. But as indicated in the beginning, the preliminary results are
very disappointing.
Okay. So we need to figure out how we're going to make it faster, but before we do that,
I want to talk a little bit about the Explicit Chinese Remainder Theorem. So if I tell you
I'm thinking of a number, say a positive integer less than 105,
and I tell you that it's 2 mod 3, 3 mod 5 and 4 mod 7, if you sat down and thought about it
for a while you could figure out what my number was. But suppose I don't need you to tell me what my number is; I just want you to tell me what it is mod 11. Can you do that any more efficiently than computing what my number is as an integer and then reducing mod 11?
And it turns out there is actually a way to do that directly. This was first suggested in a
paper by Montgomery and Silverman. And it uses a similar approach to the traditional
Chinese Remainder Theorem. There are these coefficients that we can precompute. But
then we also need to compute an approximation to a certain integer. I'm not going to go
into the details of the algorithm other than to note that depending on the parameters, it can save you a little bit of time, maybe a logarithmic factor, but it's not necessarily a
whole lot better if you're just computing one coefficient C.
However, in our situation, we need to apply the Chinese Remainder Theorem to every
coefficient of the Hilbert class polynomial. There might be thousands, hundreds of
thousands, even a million of those. And the CRT primes are the same every time. So
these coefficients A sub I, M sub I, and this big M, which was the product of all of our
CRT primes, these are always the same. We can precompute them once, and if we're
careful on how we do that, we can do that in root D time and space.
And the amount of information we need to maintain as we go along, if we imagine we're
just going to be told the C sub I's -- these are the values of the coefficient mod P sub I -- one
at a time, all we have to do is maintain these sums, we can just sort of add C sub I in and
be reducing mod P as we go. We also need to maintain this approximation R, but that
only takes log P bits.
So even though we need to compute a lot of data, we need enough C sub I's to be able to
determine the value of the coefficient over the integers. We never need to write that
number down. We only need to maintain log P bits for each coefficient. And that brings
the space down to O of root D log P.
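To make the Explicit CRT concrete, here's a minimal Python sketch run on the toy example above. The function name and the use of exact fractions are mine; in the real algorithm the A sub I's, the M sub I's mod P, and M mod P are all precomputed and shared across coefficients, and only a low-precision approximation of the fractional sum is kept:

```python
from fractions import Fraction
from math import prod

def explicit_crt_mod(residues, moduli, P):
    """Given c_i = x mod p_i with 0 <= x < M = prod(p_i), compute x mod P
    without ever writing x down as a big integer."""
    M = prod(moduli)
    acc = 0                              # running sum of t_i * (M_i mod P)
    r = Fraction(0)                      # approximation of sum t_i / p_i
    for c, p in zip(residues, moduli):
        Mi = M // p
        ai = pow(Mi, -1, p)              # a_i = M_i^{-1} mod p_i
        ti = c * ai % p
        acc = (acc + ti * (Mi % P)) % P
        r += Fraction(ti, p)             # only O(log) bits really needed
    return (acc - int(r) * (M % P)) % P  # x = sum t_i M_i - floor(r) * M

# The number that is 2 mod 3, 3 mod 5, 4 mod 7 is 53; its value mod 11:
print(explicit_crt_mod([2, 3, 4], [3, 5, 7], 11))   # prints 9 == 53 % 11
```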
Now, this is kind of exciting. I mean, when I first realized this -- I mean, it seems
obvious, but when I first realized this I got pretty excited because it made me realize that
you could potentially apply the Chinese Remainder Theorem to much larger
discriminants. But only if you can figure out how to make the algorithm faster. I mean,
running in less space isn't much use if it takes a thousand times as long.
>>: So the other two methods that you mentioned at the very beginning don't have this
sort of possible advantage, like they always take more memory.
>> Drew Sutherland: Certainly for the complex analytic that's true, and I believe that
would be true for p-adic, but Reinier could answer. Yeah. Yes. Okay. So that's
exciting. Yes. And I'm a big fan of the Chinese Remainder Theorem just in general
principle. You have an army of primes, lots of little primes nibbling away at the
problem. They can do magic.
>>: [inaudible] easier to parallelize.
>> Drew Sutherland: Absolutely. Yes. I mean, good point. We'll come back to that
one.
Okay. So now I'm going to go into the algorithm in a little more detail. I'm not going to
belabor every step of this algorithm. I'm just going to focus on two of them, where all the
interesting stuff happens. But just to give you sort of a sketch of what's going on here.
We imagine we're starting with some big P is our cryptographic-size prime. We're trying
to construct a curve over that prime field. We know what T is, the trace of that curve,
because we know what order we'd like the curve to have. And that allows us to
determine this discriminant D.
And now we're going to find out the J-invariants of every curve whose endomorphism
ring is O sub D over FP. And we could -- we may only want one, but it actually takes no
more work to find them all.
The first thing we do in step one is a bunch of precomputation. We do some stuff with
the class group, we pick a bunch of primes, and we do this CRT precomputation. Then
we spend most of our time in a loop in step two. For each of our CRT primes our first
step is to find a root of the Hilbert class polynomial and we do that by finding an elliptic
curve mod little P that has the endomorphism ring O sub D.
Once we've done that, we know one root of the Hilbert class polynomial, and then we're
going to use the class group which we precomputed, we're going to use the action of the
class group on that root to compute all the other roots. And I'm going to explain how that
works. But once -- we're going to get H roots where H is the class number, then we need
to put the linear factors together, update the CRT sums, and keep on trucking.
And then at the very end, we've got a little bit of post computation to do to get the value
of the Hilbert class polynomial mod big P, and then the very last step is to find a root and
once we know one root of the Hilbert class polynomial mod big P, we can do the same
thing we did here in step 2B over big P to get all the other roots very efficiently. In fact,
it takes more time to find the first root than it does to find all the rest of them. Question.
>>: The main difference is instead of using isogenies it's using the Galois action; is that
right?
>> Drew Sutherland: We're using -- we're using the Galois action computed via
isogenies. So we are using isogenies.
>>: Okay.
>> Drew Sutherland: Yeah.
>>: You're using it twice, though, that's the main idea?
>> Drew Sutherland: Yes. I'm using it in two different ways. I think it will be clear in a
moment. I'm going to get into both of these steps.
>>: [inaudible] or maybe you just [inaudible] before meaning what was at the conference
at ANTS.
>> Drew Sutherland: So...
>>: How does this differ from the algorithm at ANTS? Or is this just the same?
>> Drew Sutherland: So this is the algorithm at ANTS. Okay. The only -- I'll maybe
highlight some of the differences. The differences are using the Explicit Chinese Remainder Theorem; the way we're going to represent the class group is slightly different.
And the -- there's some optimizations that we're going to make to both of these steps.
And I'll talk about those. But the basic structure is the same. The Explicit Chinese
Remainder Theorem is maybe the biggest difference. Okay.
>>: [inaudible]
>> Drew Sutherland: Why is step four there? Because we're trying to construct an
elliptic curve at the end. If all you care about is knowing the polynomial, we're done at
step three.
>>: [inaudible]
>> Drew Sutherland: You don't necessarily. I'd throw it in there simply because you get
them for free. Maybe -- but, I mean, this gives you -- well, I'll give you one reason. Let's
suppose you'd like to be able to write down your curve, elliptic curve as Y squared equals
X cubed minus 3X plus B. Well, this -- you go through all of the isogenous curves, you'll
have a very easy time finding one that you can put in that form. But if you just picked
one, maybe it wouldn't.
Okay. In terms of the complexity, every step on this page takes on the order of root D
time or less. But this loop here has to be repeated roughly root D times, assuming the
Generalized Riemann Hypothesis. So all of the action is here.
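Putting the outline together, here's a structural skeleton of steps 1 through 3. This is purely illustrative: the helpers are passed in as callables because each piece of machinery is discussed separately in the talk, and none of these names come from the actual implementation:

```python
def class_poly_mod_P(P, crt_primes, find_root, enumerate_action,
                     poly_from_roots, crt):
    """Skeleton of the CRT method: compute H_D mod the big prime P."""
    sums = crt.init(P)                       # step 1: CRT precomputation
    for p in crt_primes:                     # step 2: loop over CRT primes
        j0 = find_root(p)                    # 2a: one root of H_D mod p
        roots = enumerate_action(j0, p)      # 2b: the other h(D) - 1 roots
        coeffs = poly_from_roots(roots, p)   # 2c: H_D mod p
        sums = crt.update(sums, coeffs, p)   # explicit-CRT bookkeeping
    return crt.finalize(sums)                # step 3: postcomputation mod P
```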
All right. So the two steps I'm going to focus on are first finding a curve with the
endomorphism ring we want and, second, computing the conjugates, computing the
action of the class group.
So to find a curve with the right endomorphism ring we're going to start by solving an
easier problem. We're going to find a curve that has the right trace. Okay. All the
curves with endomorphism ring [inaudible] O sub D have the same trace, but there may
be other curves with that trace.
And the simplest way to do that is just to pick a curve at random, pick a point on that
curve at random, and see, when we do a scalar multiplication by what we think the curve order should be, whether we get the identity element; and if we don't, we know that that's not the curve we want. If we do, we've got a little more work to do, but that extra bit of work is very small. We compute the [inaudible] of the group and we can easily determine the group order in O of log to the 1 plus epsilon additional steps.
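Here's a minimal sketch of that test in toy Python, with schoolbook affine arithmetic and restricted to primes congruent to 3 mod 4 so square roots are a single exponentiation. Real code uses Montgomery arithmetic and the torsion tricks described next; the function names are mine:

```python
import random

def ec_add(P, Q, a, p):
    """Affine addition on y^2 = x^3 + a*x + b over F_p; None is infinity."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def ec_mul(n, P, a, p):
    """Double-and-add scalar multiplication."""
    R = None
    while n:
        if n & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        n >>= 1
    return R

def passes_trace_test(a, b, t, p):
    """Does a random point on y^2 = x^3 + a*x + b die under [p + 1 - t]?
    A failure rules the curve out (up to the sign of t, fixed by a twist);
    a pass still needs the small extra group-structure check."""
    assert p % 4 == 3                     # toy square roots only
    while True:
        x = random.randrange(p)
        rhs = (x * x * x + a * x + b) % p
        y = pow(rhs, (p + 1) // 4, p)
        if y * y % p == rhs:              # x is on the curve
            return ec_mul(p + 1 - t, (x, y), a, p) is None
```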
But our odds of finding the curve we want are pretty low; we may have to test a bunch of
curves. And so this poses a problem. On average we might expect roughly we're going
to have to try two times root P curves before we find a curve with a particular trace we're
looking for. I mean, suppose traces are uniformly distributed over the Hasse interval, which of course they're not, but close enough.
So to speed this up, we don't want to use random curves. So the idea here is instead of
picking a curve at random, we're going to use a parameterized family of curves that has
certain prescribed torsion requirements baked in from the beginning. So, for example, if
we know we're looking for a curve whose order is divisible by 12, we can use a
parameterization of curves over Q that all have 12 torsion to just enumerate a long list of
curves over FP that all have order divisible by 12. That reduces the number of potential
curves we need to check.
To take this further, we don't want to just necessarily use parameterizations over Q; we
can use the modular curve X1 of N which parameterizes elliptic curves with a point of
order N on them. We find a point on X1N over FP, and then we can use that to construct
an elliptic curve over FP that has N torsion, nontrivial N torsion.
Additionally, we can play some standard tricks to very quickly filter out curves that don't
have the right order mod 3 and mod 4. And so you put all these things together, you can
easily narrow down the search by a factor of a hundred. In fact, I've successfully used N up to 29; that's about the largest N that I find useful. I mean, it takes a certain
amount of time to find points in X1N, you've got to trade that off against how much time
you save by narrowing the search.
But in practice you can easily get a factor of 20 or 30 here.
All right. That gives us a curve with the right trace. But how do we know whether it has the right endomorphism ring? The first question we might ask is how likely it is to have the
right endomorphism ring. And we can figure out exactly what that probability is by
computing the Hurwitz class number.
So we know this value 4P, now we're dealing with little P here, 4P minus T squared, if
we compute the Hurwitz class number, which is a sum over the divisors of the conductor
of these class numbers, it's going to count exactly how many elliptic curves or how many
distinct J-invariants there are of elliptic curves who -- which have trace plus or minus T.
Okay. These guys are all going to have endomorphism rings that are contained within the
maximal order [inaudible] D.
Now, if V is 1, the sum is very short, only has one term, and life is simple: every curve
with trace plus or minus T has the endomorphism ring we want. And so some early
versions of this algorithm actually insisted on making V equal to 1 precisely to make life easier
here. Because when V is bigger than 1, we've got more work to do, how are we going
to -- we could throw the curve away and pick another one, but that's not such a great idea,
or we could do some work to try and find a curve with a right endomorphism ring.
But I'm going to hope to convince you that actually we should be very happy when V is
bigger than 1. Because it actually is going to allow us to speed up the algorithm
substantially.
One of the biggest problems you find if you just -- the first thing I did when I was
working with this algorithm is I sat down and implemented it and thought, ah, I don't want to mess around with computing isogenies and dealing with calculating endomorphism rings, I'll just fix V as 1, I'll only use primes where I have a conductor of 1. And if you
then run the algorithm, you find it starts off at a sprint for the first few primes, but as the
primes are getting bigger, it gets slower and slower. And the reason is there's only H of
D curves out there with the endomorphism ring I want, and H doesn't change as P gets
bigger. So as P grows, this gets slower and slower, especially if the class number
happens to be small, which is kind of counterintuitive because you might think that that
would be the easy case since the Hilbert class polynomial's then smaller.
So the fix here is to -- once we know a curve with trace plus or minus T, we can find a
curve with the endomorphism ring we want as long as we're prepared to go climbing an
isogeny volcano, or actually maybe several isogeny volcanoes, and I'll explain what that is
in a moment.
But before I do, I'd like to observe that assuming that we know how to do this, if we can
get from a curve with the right trace to a curve with the right endomorphism ring, it
changes our view of which primes we like. Because it means we should be looking for
primes where it's easy to find a curve with the right trace, which means we want this big
H here, this Hurwitz class number to be big. Which means we want V to be big.
And so when we think about picking which CRT primes to use, we don't necessarily want
to use the smallest ones; we want to use the ones that optimize this ratio, because every
other part of the algorithm only depends on P logarithmically. This is the only place
where there's an exponential dependence.
Okay. All right. So now I want to just talk real quickly about isogeny volcanoes. So recall that the classical modular polynomials parameterize L-isogenies between a pair of
elliptic curves. And if I know the J-invariant of some elliptic curve, I can find all the L
isogenous curves; I just plug the J-invariant into this modular polynomial, it's got integer
coefficients in two variables, I plug it in for one of them, and I'm going to get a univariate
polynomial out of degree L plus 1, and the roots of that polynomial are all the L-isogenous curves.
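As a tiny illustration, here's that root-finding step for L equal to 2, using the classical modular polynomial Phi 2 (whose coefficients are in the standard tables) and brute force over FP. That's only sensible for toy primes; real code uses fast polynomial root-finding:

```python
def phi2(x, y):
    """The classical modular polynomial Phi_2(X, Y)."""
    return (x**3 + y**3 - x**2 * y**2 + 1488 * (x**2 * y + x * y**2)
            - 162000 * (x**2 + y**2) + 40773375 * x * y
            + 8748000000 * (x + y) - 157464000000000)

def neighbors(j, p, phi=phi2):
    """Distinct j-invariants 2-isogenous to j over F_p (brute force)."""
    return [y for y in range(p) if phi(j, y) % p == 0]
```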
Now, this allows us to define a graph of L-isogenies on all the J-invariants over FP, and if I restrict my attention just to J-invariants of curves that have trace plus or minus T, that's a connected graph. And it looks like a
volcano. It's got a cycle up at the top that's called the -- this terminology comes from
David Kohel. It's called the crater of the volcano. And then hanging down from each
node on that cycle is a roughly L-ary tree of height K, where L to the K is the largest power of L dividing the conductor V. I don't know if people can see my volcano
here. But it's a cycle with a bunch of trees hanging down. The trees are all the same
height and they all have the same fan out.
And the key point is that the nodes at the top of the volcano, on the crater of the volcano,
L squared doesn't divide their conductor. So this allows us to kill off any divisors of our
conductor that we don't want so that we can get a curve with the right endomorphism
ring.
So when we start with a random J-invariant, we're going to be somewhere on this
volcano, we don't know where, but we're quite likely to be somewhere near the bottom
because there are more curves down there. But by computing isogenies, we can figure out
where we are on the volcano, mostly because we can always tell when we're at the
bottom, because those nodes only have one edge coming out of them.
That means that the modular polynomial when you plug in the J-invariant you only find
one root, whereas all the nodes above the bottom you get L plus 1 roots. And so you can
use that by walking through various paths in this graph to figure out how high you are in
the volcano and how to get higher.
So that sounds like a lot of work, but as long as L is small, it doesn't take very long. And
we only have to do this once. Once we find a curve with the right trace, we just climb a
few volcanoes, one for each L dividing V, and then we've got a curve with the right
endomorphism ring.
And that's what it looks like. So, yeah, here's the cycle of the -- up top, and then we've
got trees hanging down. Okay.
All right. So we finished the first part of step two. We found a curve with
endomorphism ring O sub D, we know one root of the Hilbert class polynomial. Now we
want to find all the others. And so to do this, we're going to use the action of the class
group. Up to now I've been sort of identifying the endomorphism ring with this
imaginary quadratic order, but we can take that isomorphism in the other direction.
Suppose we pick an ideal in our imaginary quadratic order that corresponds to some
endomorphism. That endomorphism has some kernel, there's some subgroup of our
elliptic curve that gets killed by it, and that in turn defines an isogeny. And if you work
through the definitions, that isogenous curve is going to have the same endomorphism
ring.
So for each ideal in our order, we get an isogeny, and this gives us a group action on the set of J-invariants of elliptic curves with endomorphism ring O sub D, and it factors through the class group; and provided P is a totally split
prime, everything that I've said up here works mod P as well.
It doesn't matter which ideal representative we choose for the class group. We're going to
get the same E prime here. But the degree of the isogeny does depend on which representative we choose. It's going to be the norm L of that ideal alpha, and we want L
to be small.
Okay. So to compute this action explicitly, let's suppose for the simplest case our
volcano's flat, it's just a cycle of height zero, and our class group is cyclic, generated by a single element of norm L, a single ideal of norm L. To walk the isogeny
cycle, we just plug our J-invariant in that we know, the one root that we know, into the
modular polynomial, we're going to get a univariate polynomial, it's going to have exactly
two roots.
Those two roots are going to correspond to the two directions we could walk along our
cycle. We pick one of them. Okay. That gives us a new J-invariant. We plug that in.
We factor out the term X minus J naught, which gets rid of the root we already know,
the one we came from, and there's only going to be one root left and it's going to tell us
the next step to take. And if we go all the way around -- whatever the order of that ideal is in the class group -- we'll get all of the J-invariants, all of the roots.
We can even do this when L does divide the conductor V, but it's a little more difficult
because we might fall off the rim of the volcano. We only are interested in these guys up
on top, they're the ones with the right endomorphism ring, but if we take a wrong step we
might find ourselves down at the bottom and have to climb back up again. But as long as
L is small, it's still reasonably efficient to do that.
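A sketch of that walk in the simplest flat-volcano situation, reusing the neighbors helper from the earlier sketch. This ignores the edge cases of very short cycles and of falling off the rim, both of which real code has to handle:

```python
def walk_cycle(j0, p, phi=phi2):
    """Enumerate the cycle of roots starting from a known root j0, assuming
    a flat volcano so every node has exactly two neighbors on the crater:
    at each step, discard the root we came from and follow the one left."""
    cycle = [j0]
    nbrs = neighbors(j0, p, phi)
    assert len(nbrs) == 2                 # flat-volcano assumption
    prev, cur = j0, nbrs[0]               # pick one of the two directions
    while cur != j0:
        cycle.append(cur)
        onward = [y for y in neighbors(cur, p, phi) if y != prev]
        if not onward:                    # degenerate short cycle
            break
        prev, cur = cur, onward[0]
    return cycle                          # the h(D) roots of H_D mod p
```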
Any questions? I just realized -- the reason I'm pausing here, this should be curves with
trace T, not O sub D [inaudible]. Any questions before I go on?
All right. Now, in general the situation is more complicated than what I just said. We
only have one -- the ideal class group isn't necessarily cyclic, and even if it is, we don't
necessarily just want to use one representative to generate it. And so we may need to
walk a bunch of cycles, cycles within cycles.
Now, in the ANTS paper, the way they suggest to do this is to take a basis of the class
group, we can then represent any element of the class group in terms of that basis. And if
we put a lexicographic ordering on the exponent vectors that correspond to that representation, we can enumerate those elements using just one isogeny per step.
Okay. So that sounds good. The only problem with it is that each step requires O of L
squared operations in F sub P, where L here is the norm of our basis element. The
problem is that if we insist on using a basis, there's no reason that those norms have to be
small. Okay.
There's a slightly subtle point here. We know under the Generalized Riemann
Hypothesis that the class group is generated by ideal representatives of small norm, but a
set of generators is not the same thing as a basis. Those generators might be dependent.
And when you go to form a basis, you've got to multiply them together, and when you do
that, the norms can blow up.
And it's not hard to find examples of class groups for which every basis contains an
element with large norm, like close to the square root of D. And that would drive the running time right back up to O of D to the three halves if we did that.
So what do we do instead? We can solve this in a very general fashion. Suppose you give me a list of generators for some finite group; I can then write down this
composition series where I just knock out one generator at a time. And if that composition series happens to be cyclic, which it will be if G is abelian, which
is the case here, then I can define these numbers N sub I, which are just the sizes of these
quotients.
And each of these N sub I's is going to divide the order of the corresponding generator,
and the product of the N sub I's is going to be equal to the order of the group. But N sub
I might not necessarily be equal to the order of alpha sub I. It will be if it was a basis to
begin with.
But this still has the property that we can now uniquely represent every element in the
class group, and we can enumerate the action of all the elements in the class group using just one isogeny at a time.
Any questions? I see at least one puzzled face.
>>: Just the tradeoff between making [inaudible] unique [inaudible].
>> Drew Sutherland: I only care about the -- all I care about is the uniqueness. I don't
really care about how big these N sub I's are. So I'd be perfectly happy -- if there's an
ideal with norm 3 that generates the entire class group, and so N sub I is equal to the order
of the class group, great, that's fine. It's just the unique representation that we need.
But the key point is that this allows us to enumerate the class group using only elements
that were on our original list of generators. We don't make any new -- we don't create
any new generators.
So to put this into practice, we represent the class group explicitly as usual using binary
quadratic forms, the norm here is just the value of A, and so we're going to focus on
forms with prime norm. We can write down a list of all of the forms with prime norm up
to whatever bound -- if we believe the GRH, take 6 log squared D; if we don't, we can go all the way up to root D over 3. It doesn't matter. Either way this approach will work.
The nice thing about the sequence of N sub I's is they'll actually identify redundant
generators because we'll get an N sub I equal to 1. And if we take what's left, we're going to get
what I call the norm minimal representation of the class group.
And we can compute this using a generic algorithm quite quickly, either in the order of
the size of the group or even in the square root of the size of the group. The group here has order about the square root of D. And we only have to do this once at the very beginning.
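Here's a toy generic-group version of that computation, with my own naming. It uses brute force where real code would use baby-step giant-step, and group elements here must be hashable:

```python
def relative_orders(gens, identity, op):
    """For generators g_1, ..., g_k of a finite abelian group, compute
    n_i = least n with g_i^n in <g_1, ..., g_{i-1}>.  Every group element
    is then uniquely g_1^e_1 * ... * g_k^e_k with 0 <= e_i < n_i, and an
    n_i equal to 1 flags a redundant generator."""
    def power(g, e):
        x = identity
        for _ in range(e):
            x = op(x, g)
        return x

    subgroup = {identity}
    ns = []
    for g in gens:
        n, x = 1, g
        while x not in subgroup:
            n, x = n + 1, op(x, g)
        ns.append(n)
        subgroup = {op(a, power(g, e)) for a in subgroup for e in range(n)}
    return ns

# Toy example in Z/12 under addition: <2> has order 6, and 3 only adds
# index 2 on top of it -- even though 3 itself has order 4, illustrating
# that the n_i need not equal the orders of the generators.
print(relative_orders([2, 3], 0, lambda a, b: (a + b) % 12))   # [6, 2]
```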
All right. So I'm doing okay on time. So how does this all shake out asymptotically? I
don't want to get into any sort of rigorous analysis, but let's just do sort of a back of the
envelope calculation here to see if we can figure out where we're going to be spending the
most time.
So just a few heuristics to have in mind: the class number -- we actually know this rigorously -- is asymptotically about a quarter of the square root of the discriminant, and
there's actually a well-known constant there that's exact.
The primes that we need to use are all going to be roughly on the order of D log D. And
in practice they're never going to be bigger than 2 to the 64th I predict. Maybe someday,
but by the time they are, we'll be running on 128-bit computers. The biggest primes I've
had to use so far are about 50 bits.
And L, which is perhaps the most interesting case, we know under the GRH that we
never need to use L bigger than O of log squared D. Conjecturally, it's O of log to the 1 plus epsilon, but for most discriminants it's a lot better than that, it's more like log
log. And that's a bound on the largest L of all of the elements in our generating set. But
we really only care about the L for the smallest one, because that's where we're going to
be spending all of our time, we're going to be walking that interior cycle a lot; we're only
going to be walking in the bigger cycles occasionally.
So, in fact, we actually expect L to be bounded by a constant for most discriminants,
because we've basically got a 50/50 chance, for each L, that D will be a quadratic residue mod L.
>>: [inaudible] over epsilon D, that's not [inaudible]?
>> Drew Sutherland: [inaudible] it's D. Yeah.
>>: I think it's root D.
>> Drew Sutherland: You think it's root D. I will check. I could be wrong.
>>: [inaudible]
>> Drew Sutherland: Thank you.
>>: If you just look at the bound on the size of the coefficients, so the primes you need to
go up to, the size of the coefficients of the Hilbert class polynomial's integers are like E
to the root D log [inaudible].
>> Drew Sutherland: No, no, you need root D. The bound on I is root D. You need root
D of the primes. But the size of the individual primes is O of D, or O of D log D.
>>: [inaudible] the product of all the primes only needs to be as big as E to the root D
log D. That's the product of [inaudible].
>> Drew Sutherland: Right. Right. So we're going to get --
>>: Oh, sorry, so [inaudible] get the [inaudible].
>> Drew Sutherland: Yeah. Yeah.
>>: Okay. Well, I'll [inaudible].
>> Drew Sutherland: Yeah. Let me -- let me -- I think it's right, but I could be wrong.
Let me check. Let me come back to it at the end. I'll double-check [inaudible] right. But
I can --
>>: Are you writing the actual [inaudible]?
>> Drew Sutherland: I'm actually writing not the log of the prime but the prime itself. I
can tell you just from knowing from having run it, the primes are typically bigger than D.
And if you look in your paper, actually, the primes that you give in the example at the end of your ANTS paper are bigger than 108,708.
>>: So the number of primes is more like [inaudible] --
>> Drew Sutherland: Yeah. The number of primes is root D, but the primes, the actual
value of the primes is slightly bigger than D. And the log of the prime is like log D, log P
is like log D. And so you get root D. Yeah. I think that's what you were getting at. You get root D log D. Does that make sense? Okay. Sorry about
that.
Okay. So how do things shake out? So the first step which in the original algorithm and
certainly the algorithms presented in the ANTS paper was by far the most
time-consuming one. This is finding a curve with the right endomorphism ring; it looks to be something like O of root D log to the 3/2 D FP operations.
Now, I'm counting FP operations here because these primes fit inside the word size of our
computer, and so they can be done, it makes sense to treat them as a unit cost because
these are quite quick. Whereas -- and then step 2B, if we suppose L is bounded by log
log D or a constant, we're going to get something like O of root D log D.
And then, surprisingly, the step that you might have thought was the easiest, which is just
multiplying a bunch of linear factors together to get the coefficients of a big polynomial,
is actually asymptotically the dominant step, O of root D log squared D.
Now, that's not always true. You can find discriminants where the smallest norm is big,
in which case step 2B will have the same complexity as this and the constant factors will
mean that step 2B dominates. It will be a lot bigger.
But for most discriminants, it's going to be this last step which I haven't talked about at
all, and I just want to give a quick plug for a library developed by David Harvey for
doing polynomial multiplication over a word-size modulus. It's really fast.
Because the algorithms here are well known and have been heavily optimized, so we
want to try and use the best one we can.
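The shape of that step is a product tree; here's a toy sketch with schoolbook multiplication, illustrative only -- the real speedup comes from fast FFT-based multiplication over the word-size modulus, which is what the library provides:

```python
def poly_from_roots(roots, p):
    """Multiply the linear factors (X - j) mod p up a binary tree; returns
    the coefficient list of prod(X - j), constant term first."""
    if not roots:
        return [1]
    def mul(f, g):
        h = [0] * (len(f) + len(g) - 1)
        for i, a in enumerate(f):
            for k, b in enumerate(g):
                h[i + k] = (h[i + k] + a * b) % p
        return h
    level = [[-j % p, 1] for j in roots]          # the linear factors X - j
    while len(level) > 1:
        nxt = [mul(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:                        # odd one out, carry it up
            nxt.append(level[-1])
        level = nxt
    return level[0]
```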
Okay. All right. So I just want to run through just an example here of how the
computation comes out for three different discriminants. So up top I've picked three
discriminants that are all roughly around 10 to the 10th.
And the first section, the first set of rows of the table show information about the class
groups. So the first number, H of D, is the class number; that's the degree of the Hilbert class polynomial. Log B is the number of bits that we potentially need in each coefficient. That's our upper bound.
L1 is the size of the smallest -- the norm of the smallest element in the class group and L2
is the second smallest, and we only needed to use two representatives for any of these
cases.
And then the time there is the number of seconds spent figuring all that out. And you'll
see it's negligible, fractions of a second.
So just to give the size of the class polynomial here in the largest case would be 5 million
bits times 54,000, so something like -- I don't know what that works out to be, but big.
Okay.
N here is the number of primes we're using, and that's going to be determined both by the
bound B and also the class number, and then this is how big the primes are. And you can
see here we've got about 42-bit primes, so that's just slightly bigger than 10 to the 10th.
And this is the amount of time that was spent fiddling around with the primes; i.e.,
figuring out which primes are totally split primes and deciding which ones to use, which
are the best primes. And so you can see it's just a few seconds, not a lot of time.
And then we do some CRT precomputation, and I also wrote down the post-computation
time here, and that's also just a few seconds.
Then we spend a good chunk of the rest of the day in step 2. And I've split up the time in
step 2 into three parts by percentages. So the first number listed here says we spent 56 percent of these 70,000 seconds finding curves that had the right endomorphism ring over F sub little P; 14 percent of the time enumerating the action of the class group to find the other roots; and 30 percent of the time multiplying linear factors together
to build up the class polynomial mod P.
And so in this example, the class number is relatively big for that discriminant, and so
you're seeing it spend more time over here than it is in these two examples. In the middle
column we have a case where the class number is pretty small relative to that
discriminant, and so it's spending most of its time finding curves with the right
endomorphism ring. The class number being small actually makes that harder. Okay.
But it makes these other two steps easier.
And then the last column is a case where we got unlucky; we have a norm that's pretty
big, 11. That means when we're enumerating the action of the class group we're factoring
polynomials of degree 12 or, well, really degree 11 after we remove one root. And that's
a lot more painful than factoring cubics or quadratics. So this -- you spend more time in
this step.
But still overall this is faster than this because at the end of the day B is a lot smaller and
the class number is a lot smaller.
And I'm going to show you in a minute how to make these numbers a lot smaller, by a factor of about 20 or 30. So the absolute values don't matter so much, but it's
the relative numbers that are interesting.
The last line here is the amount of time spent finding one root of the Hilbert class
polynomial mod big P, our cryptographic-size prime, which here was about 200 bits.
And this number is interesting because in theory this is the ultimate limiting factor
actually. If I have 141,155 computers all running in parallel, I can have them each do one
prime, okay, so this could be made fractions of a second potentially.
Now, I didn't have that many computers, but I did actually run some tests with 14
computers, each with two processors, so I had 28 threads going. So you can -- but even
on just a dual processor, quad processor, you can make these numbers a lot smaller. This
is probably the bottleneck.
This last line here is just -- as a point of interest, you can find all the other roots in less
time than it took to find the first root, as I mentioned.
Okay.
>>: So the -- sorry. The second to the last line [inaudible].
>> Drew Sutherland: That's the time to find the J-invariant you want. One root of the
Hilbert class polynomial, mod big P, mod your big prime. Okay.
All right. Okay. So in the last ten minutes, I want to come back to -- so maybe just to
say what we've gotten to at this stage, actually, before I go on. With what I've shown you
up to now, the underdog has now sort of caught up to the world champion. Okay. We've
sped things up quite significantly for large discriminants -- we're maybe a thousand times
faster. Not quite that much. Probably more like a hundred times faster.
We're still not quite -- but we're still not competing with the complex analytic method on
a level playing field because this is computing the Hilbert class polynomial, the class
polynomial of the J-invariant, but the other methods for computing class polynomials can
take advantage of other, more advantageous invariants -- not necessarily smaller themselves, but ones with smaller class polynomials. And I want to talk about how we can do
that with the CRT method.
So there's two key properties of the J-invariant that we've been relying on implicitly. The
first is that it generates the ring class field. But the second is that it doesn't matter which Tau we pick -- if we pick any complex number in the upper half plane and look at its associated J-invariant, it has the same minimal polynomial. That's not
true of every class invariant. And in order for us to be able to work with things mod P,
we need that to be true.
In addition, we'd also like it to be a case that whatever class invariant we use has a nice
algebraic relationship with the J-invariant, so we can easily go back and forth from the
J-invariant to our alternative class invariant.
And the motivation for all of this -- I mean, some of you may have heard of Weber
polynomials and there are a number of other class polynomials, the game we're trying to
play here is just find a class polynomial that has smaller coefficients than the Hilbert
class polynomial because it means we don't need as many primes to compute them.
So this is maybe best demonstrated by example. So the simplest example is gamma 2 is
defined as the cube root of the J-invariant. I mean, I could write down an expression in
terms of the Dedekind eta function, but this is a nice, easy definition.
And if we want to use the algorithm I just described to compute the class polynomial for
gamma 2, we only need to make a few modifications. The first one is we can reduce our
height estimate, our log B, by a factor of three. That means we need only a third as many
primes, so everything gets faster.
To make our lives easy we'll restrict to primes that are congruent to 2 mod 3 so that cube
roots are unique in FP. Makes life simple. Then we're going to still enumerate the
J-invariants, but for each J-invariant we compute we'll take its cube root, and then we'll
form the class polynomial by multiplying together all of the associated linear factors, so
now we're going to get the class polynomial of gamma 2, its coefficients mod little P for
each of our little Ps, and then at the end we'll apply the Chinese Remainder Theorem to
get the class polynomial gamma 2 mod big P. Then we take a root of that and we cube it
to get a J-invariant which is going to give us the curve we want.
So this is very straightforward and you'll instantly find it makes the algorithm more than
three times as fast.
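The cube root step really is just one exponentiation. A sketch: for P congruent to 2 mod 3, cubing is a bijection on FP, and the inverse exponent works out to (2P - 1)/3:

```python
def cube_root(x, p):
    """Unique cube root in F_p when p = 2 mod 3."""
    assert p % 3 == 2
    return pow(x, (2 * p - 1) // 3, p)

# gamma_2 = cube_root(j, p) for each enumerated j-invariant, e.g.:
p, j = 11, 5
g2 = cube_root(j, p)          # gives 3, and 3^3 = 27 = 5 mod 11
assert pow(g2, 3, p) == j
```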
Now, we can get a little fancier here. It's actually possible to use primes that are
congruent to 1 mod 3, even though the cube roots aren't unique, only one of the cube
roots is actually the right invariant, and you can figure out which one it is, and Reinier has some ideas on that.
It's also possible rather than enumerating the J-invariants we could enumerate the gamma
2s using a modular polynomial for gamma 2.
However, in practice, neither of these makes things substantially faster than what I've
described here, just because computing cube roots is very easy. It doesn't take any time.
It's just an exponentiation.
Okay. But why stop at a factor of 3? Now, when -- I didn't mention a constraint there,
by the way. We do assume that 3 doesn't divide the discriminant. If we make a stronger constraint on the discriminant, that it's congruent to 7 mod 8, then we can use not quite
the Weber invariant but the square of the Weber invariant, F here. This is another
function that I could write down in terms of the Dedekind eta function [inaudible]
analytic expression for, but I'll just define it with this algebraic relationship it has with
gamma 2. And so this also defines its relationship with J.
And I can play the same game. I can compute J, I can compute gamma 2, and when P is
congruent to 7 mod 8, I can uniquely compute F squared from this equation.
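For reference, the classical Weber relation (presumably the one on the slide) is

$$\gamma_2 = \frac{\mathfrak{f}^{24} - 16}{\mathfrak{f}^8},$$

so writing $u = \mathfrak{f}^2$, the value we want is a root of $u^{12} - \gamma_2\,u^4 - 16 = 0$; the point of the congruence condition is that the right root is uniquely determined.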
Now I get to reduce my height bound by a factor of 36, which is very nice. The only
downside to this is that if you're looking for prime order curves, they never have discriminant with absolute value congruent to 7 mod 8. The absolute value of the discriminant's always 3 mod 8, and if you just stare at this equation for a while, you'll see why that's true.
So for prime order curves, we might instead use a class invariant originally coming from
Ramanujan, and again we need to use the square of the invariant to be able to compute it
exactly. But for D with absolute value congruent to 11 mod 24, which
is in particular 3 mod 8, we can compute G squared using this relationship. Again, we
compute gamma 2 by taking a cube root, and we can uniquely determine G squared by
solving that equation. And this gives us a factor of 18 improvement.
Now, when we actually apply this, we find that we get a speedup of more than a factor of
3 and more than a factor of 18. And the reason for that is that not only are we using
fewer primes, but we can pick the best ones, so our average time per prime is also going down. Even when you throw in the fact that I'm going to force you to use
primes that are just 2 mod 3, that means we have to go out a little bit further, but we're
still winning.
So you can see the benefit here is this is the height bound, the number of bits we need to
compute goes down substantially. That means the number of primes goes down. The
running time then of course goes down as well. But what's also interesting is how the
split of the running times splits up. So with the J-invariant, we were spending most of
our time finding curves with the right endomorphism ring, but now when we're using the
Ramanujan invariant, most of that time is shifting over to the other steps. And in
particular this third step is looking like it's going to become the most expensive, which it
will when we make D big enough.
>>: [inaudible]
>> Drew Sutherland: [inaudible] step is just multiplying a bunch of linear factors to get a
big polynomial. You know, building the polynomial up from its roots, building that
product tree.
>>: [inaudible] right?
>> Drew Sutherland: Yes.
>>: So I know -- I mean, I saw David a week ago and he commented that you -- I mean,
he said that you had some [inaudible] small polynomial [inaudible].
>> Drew Sutherland: Yeah.
>>: [inaudible] taking advantage of that?
>> Drew Sutherland: This is -- these numbers are taking advantage of that. Although not
as much as I could, yes. So David and I -- so I have something that's really fast for low
degree, he has something that's really fast for high degree, and we haven't quite figured
out where we should overlap in the middle. So right now I think I'm switching over to
him too soon, but we're -- that goes back and forth. He makes his code faster, I make my
code faster, so that keeps shifting.
Okay. All right. But now the underdog's ready to stand face-to-face, toe-to-toe with the world champion.
So these times are taken from a paper by Andreas Enge. And I should mention all of the
stuff I've been talking about in class invariants, that is joint work with Andreas Enge.
These are taken from his paper on floating point approximations, these running times.
And I've converted everything to 2.4 gigahertz AMD because that's where these were run.
And the complex analytic method here is using the double eta quotient; we're using the
square of the Weber invariant. And you can see that across the board the CRT method is
faster, but perhaps most notably its advantage grows; it actually has a logarithmic
advantage here.
And this is because the step of building the polynomial up from its roots, that's also the
dominant step for the complex analytic method, but it has to do that over Z. So whereas
we're doing that mod lots of little Ps, so we gain a logarithmic factor.
Already at 100,000, with class number 100,000 here, the complex analytic method is
starting to run out of gas because this polynomial is getting too big to fit in memory, and
so the constant factors are going down. But even if you step back to the earlier stages
here where the class number is doubling in each row, you can see that the ratios are
improving almost by doubling at each step.
>>: [inaudible]
>> Drew Sutherland: So here the CRT method is getting a factor of 36. Here we're only
getting a factor of 28. Okay. But I would say -- and I actually -- but the complex
analytic method could use F squared also. It doesn't because there are, I guess, practical reasons -- I don't know the details -- that the eta quotient is actually faster. Okay. So I would call it -- if I said the problem is do a CM construction using whatever method you think is the fastest, go -- then this is a fair comparison.
And as William suggested, another big advantage of the CRT method is how easy it is to
parallelize it. And so for the larger examples that I've computed, I've been running on a
small cluster of machines, and so here's one with a discriminant on the order of 10 to the
13th, class number about 700,000, took 11 hours. Scaling up to 10 to 14th and class
number over 2 million took about 4 1/2 days. And if I had more computers, I could make
that smaller.
And there's a lot of headroom here, because what I find most remarkable about this is the
amount of memory is still -- even at 10 to the 14th is still very small. I mean, the class
polynomial mod big P -- our big P here is about 256 bits -- is only about 64 megabytes.
Okay. So even with everything, all of the infrastructure surrounding it, we're only using 200 or 300 megabytes per thread here that's running.
And the total disk storage, the total amount of stuff we ever needed to write down is less
than 2 gigabytes. It's 28 times 64 megabytes.
Now, you might say, well, what about the tens of terabytes you were telling me about
earlier? Or in this example I think the biggest one here is -- so with the Ramanujan
invariant, we get it down to 4 terabytes. Okay. We're still computing 4 terabytes. There
is 4 terabytes of data cycling through the memory of the machine. We just never need to
save it all in one place. So larger computations are certainly feasible.
And that's where I'll end. That's all I have.
[applause]
>> Drew Sutherland: Oh, is there another slide? Oh, I lied. That's not all I have. Sorry.
So why did I put this last slide here? Ah, yes. This was to just give you an idea of how
things scale. So here I've just picked discriminants close to powers of ten. And the only
thing that's perhaps really worth noticing is that modulo fluctuations in the class number,
it's roughly linear, which you'd expect.
But the point that I wanted to focus on is the splits. You look at how much time is spent
in part C, the building the polynomial from its roots, and it's just getting bigger and
bigger and bigger. And so as you allow it, that's going to be the limiting factor. And
that's really all I have. Okay. Sorry about that. Any questions?
>>: [inaudible] asymptotic complexity at all or you just changed the constants of the
algorithm overall, that you've dramatically changed [inaudible].
>> Drew Sutherland: Yeah. So -- well, obviously the space. Yeah.
>>: [inaudible] but I mean as far as the [inaudible].
>> Drew Sutherland: So there are certainly some log log factors.
>>: Right.
>> Drew Sutherland: Okay. On the -- well, so there's two separate issues. One is there
is what you can prove, okay, I think it's possible to prove a better bound than the -- I
think it was D log to the 7 D. Yeah. It might be possible to improve that by one, but I'm not
sure. I need to think about that.
>>: [inaudible]
>> Drew Sutherland: Yeah. I don't know about that. Then there's the heuristic bound in their paper, which is D log cubed D.
I think under pessimistic conditions I don't improve that. I would argue that one could come up with some more pragmatic heuristics that would say that it's better than that.
Part of it I guess depends really on how you view word size multiplications on your
computer, if you think those take log squared, you know, bit operations, then I'm not
going to get you anything. But...
Yeah, I think that's the answer.
>>: A question about running things in parallel, of course, I'm a big fan of the CRT
approach, but [inaudible] analytic approach, right [inaudible] points [inaudible] parallel.
>> Drew Sutherland: So you've still got to -- you've still got the problem of building the
polynomial up at the end. And there you're going to have -- I think that -- I mean, it's not
that you can't do it in parallel, it can be done in parallel. But the amount of effort you
would have to put into engineering that is much more substantial.
I mean, the last few steps of that FFT that you would need to compute to do just -- just
think about the very last step of multiplying the last two polynomials together, which is
the most time-consuming step. So you've got two polynomials of order of degree H over
2 with huge coefficients and you've got to multiply them together. Doing that across a
hundred machines is not a fun project. I'm not saying it can't be done, but you'd have to
think hard. Whereas here there's no thought involved, and so, okay, you do the first
prime, you do the second prime. It's not hard. Any other questions?
>>: Can you say anything about [inaudible]?
>> Drew Sutherland: So this is all code written in C. I use Montgomery representation
for all the finite fields. I take advantage of a couple of other sort of standard tricks. So
the elliptic curve implementation I use an affine group law computed in parallel across a
bunch of curves, so then I'm effectively getting six or seven multiplications per squaring
because I can do the -- again, this is another Montgomery trick doing the field inversions
in parallel.
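The Montgomery trick referred to here is simultaneous inversion. A minimal sketch: n field inversions for the price of one inversion plus about 3(n - 1) multiplications, which is what makes the parallel affine group law pay off:

```python
def batch_invert(xs, p):
    """Invert every element of xs mod p using a single field inversion."""
    prefix = [1]
    for x in xs:                          # prefix[i] = x_0 * ... * x_{i-1}
        prefix.append(prefix[-1] * x % p)
    inv = pow(prefix[-1], -1, p)          # the one real inversion
    out = [0] * len(xs)
    for i in range(len(xs) - 1, -1, -1):
        out[i] = inv * prefix[i] % p      # equals x_i^{-1}
        inv = inv * xs[i] % p             # strip x_i from the running inverse
    return out
```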
I need to be able to handle any elliptic curve, so picking a special curve like an Edwards
curve or a Montgomery curve isn't really an option in general, and even if I did, I don't
believe that it would be faster than the parallel affine trick.
The other big implementation detail that gives a huge constant factor is one that I used
earlier in the paper that I presented at ANTS, which is specialized optimized code for
factoring cubics and quartics. So that helps when L is 2 or 3 here -- or when L is 3, I guess.
But also for testing 3 torsion and 4 torsion, being able to solve a -- find a root of a quartic
really fast is the difference between that being useful and not being useful. If you just
apply a standard root-finding algorithm, you know, compute X to the P mod, whatever,
it's too slow to help. But if you can solve by radicals and be clever about how you do
that, it gets to be fast enough to be useful.
And then I tried all sorts of clever things on building the polynomial up from its roots,
and then I gave up when I saw David's code. So...
Okay. That's it.
[applause]