>> Reinier Broker: We're very happy to have a visitor from MIT, Drew. He just flew in yesterday and is flying out tonight, so it's a bit of a heroic visit. He's going to talk to us about computing class polynomials with the Chinese Remainder Theorem. >> Drew Sutherland: So to set the stage for this talk, I'd just like to take you back to the Algorithmic Number Theory Symposium that took place this past May in Banff. Some of you I know were there. The Selfridge prize for the conference was awarded to a paper by Belding-Broker-Enge-Lauter, which tells a story of three algorithms, all for computing Hilbert class polynomials. The first of the three, the complex analytic method, is the oldest; it's the most well known and the most widely used. It's sort of the reigning champion. The second is a p-adic algorithm which is actually very nicely described by Reinier Broker in the most recent issue of Mathematics of Computation. Now, the third algorithm is a bit of a mongrel. The idea of using the Chinese Remainder Theorem for computing Hilbert class polynomials has been kicking around for quite a while, but until very recently it wasn't obvious that this could be made practical. The surprising result that was presented at the conference is that heuristically, anyway, all three of these algorithms appear to have the same asymptotic complexity, something like roughly quasi-linear in D, the discriminant. This is surprising, mostly because prior to this paper, this CRT method was known to have complexity something like, oh, D to the three halves. So most of the paper and most of Reinier's talk was focused on ways to improve the CRT method. Now, if you're like me and you've got sort of a soft spot for algorithms, you can't help but root for the underdog. And as the talk went on and Reinier talked about all the neat ideas they'd come up with for improving the CRT method, I got more and more excited, and I couldn't wait for the end of the talk when I knew the climactic showdown would come between the underdog and the reigning champion. But sadly this was not a Hollywood ending. The underdog gets crushed. It's not even close. Even with all the improvements that they came up with. In fact, they found in their initial implementation the complex analytic method still seemed to be about 50 times faster. And the real situation is a lot worse than that. Because in practice people don't actually compute Hilbert class polynomials with the complex analytic method; they compute other class polynomials, ones with smaller coefficients that can be computed more quickly. And we don't necessarily know how to do that with the CRT method, so the real difference that we're looking at here is a factor of more than a thousand. Now, by the end of this talk I'm going to hopefully convince you that the CRT method for large enough discriminants is easily a hundred times faster than the complex analytic method. So we've got five orders of magnitude we're going to cover in the next hour, or a little less than an hour, so let's get started. All right. Just a little bit of background. One of the main reasons people are interested in computing Hilbert class polynomials, or any class polynomial for that matter, is to construct elliptic curves of known order. So the idea is you have some finite field, let's suppose it's a prime field that we like, and we have some number of points we wish our elliptic curve had, and that tells us what the trace T of the curve should be.
And we can write down an equation 4P equals T squared minus V squared D, where D is some negative discriminant. And if we happen to know, if we can pull out of our pocket, the Hilbert class polynomial for that discriminant, reduce it mod P, find a root, that will tell us the J-invariant of the curve we want. And then all we've got to do is figure out what the right sign of the trace is, and we can take a twist if we need to. So the only hard part in all of this is figuring out that class polynomial, that Hilbert class polynomial. Now, we can define the Hilbert class polynomial in terms of elliptic curves: imagine we start with a discriminant D that uniquely determines some imaginary quadratic order, which we can think of as just a lattice of points in the complex plane. We can take a quotient of the plane by that lattice, and we're going to get a torus, some elliptic curve. It has some J-invariant -- that's an algebraic integer -- and the minimal polynomial of that J-invariant is the Hilbert class polynomial. Now, if D is a fundamental discriminant, that's going to give us the Hilbert class field. But in general we're going to get the ring class field associated with the imaginary quadratic order. Now, the interesting thing for what we're going to be using today -- that we're going to be talking about today -- is the case where we have a prime that splits completely in the Hilbert class field. Equivalently, that just means we can write it in this form. Then the Hilbert class polynomial is going to split completely mod P into linear factors, and its roots are just going to be a list of all the J-invariants of elliptic curves whose endomorphism ring is isomorphic to this quadratic order O sub D, and so I'll just indicate that by saying O sub E is equal to O sub D. Now, in principle, the CM method can be applied to construct any ordinary elliptic curve as long as the trace is nonzero. But in practice we only know how to do this when the discriminant is fairly small. And if we pick a curve at random, that's not likely to be true if we have some cryptographic-size prime. Now, why do we need the discriminant to be small? Well, the Hilbert class polynomial is big, really big. It takes more than D bits just to write it down. And if we're talking about some cryptographic-size prime P, that could be a really big polynomial. So coming down to earth a little bit, here are just some examples of the estimates of the size of the Hilbert class polynomial for various discriminants. These are neither the largest nor smallest cases, just fundamental discriminants close to powers of 10. The column H here is the class number of D, and it tells us the degree of the Hilbert class polynomial. This is a bound on the size of the coefficients, and if we take the class number times log B we get a rough estimate, within a few percent, of the size of the Hilbert class polynomial. And you can see already at around 10 to the 10th we're talking about gigabytes, just to write down this polynomial, potentially more than would fit comfortably in our machines' memory. And if we want to go up to 10 to the 12th or even, say, 10 to the 14th, which is where we're headed, we're going to need to deal with terabytes of data. Okay. So at this point you might ask why on earth would you ever want to compute a polynomial that big. Well, one motivating reason for using large discriminants is pairing-based cryptography.
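To make the CM construction just described concrete, here is a minimal sketch in Python for a toy case: D = -23 and p = 59, which satisfies 4p = t^2 - v^2 D with t = 12, v = 2, so p splits completely and H_D splits into linear factors mod p. The Hilbert class polynomial for D = -23 is the classical x^3 + 3491750x^2 - 5151296875x + 12771880859375; the brute-force root finding and point counting are purely illustrative, nothing like a real implementation.

```python
# A toy sketch of the CM construction: root of H_D mod p -> j-invariant
# -> curve of known order. Brute force throughout, tiny sizes only.
def poly_roots_mod_p(coeffs, p):
    """Roots mod p of a polynomial given by coefficients, high degree first."""
    def ev(x):
        acc = 0
        for c in coeffs:
            acc = (acc * x + c) % p
        return acc
    return [r for r in range(p) if ev(r) == 0]

def curve_from_j(j, p):
    """y^2 = x^3 + ax + b with j-invariant j (assumes j != 0, 1728 mod p)."""
    k = j * pow((1728 - j) % p, -1, p) % p
    return 3 * k % p, 2 * k % p

def point_count(a, b, p):
    """#E(F_p) by summing Legendre symbols (toy sizes only)."""
    n = p + 1
    for x in range(p):
        chi = pow((x * x * x + a * x + b) % p, (p - 1) // 2, p)
        n += 1 if chi == 1 else (-1 if chi == p - 1 else 0)
    return n

# H_{-23}(x) = x^3 + 3491750x^2 - 5151296875x + 12771880859375
HD = [1, 3491750, -5151296875, 12771880859375]
p, t = 59, 12                       # 4*59 = 12^2 + 23*2^2, so p splits
for j in poly_roots_mod_p(HD, p):   # H_{-23} splits completely mod 59
    a, b = curve_from_j(j, p)
    print(j, point_count(a, b, p))  # order is p + 1 - t, or p + 1 + t (twist)
```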
The idea here is we have -- and this will work on any elliptic curve, but we take a pair of points on an elliptic curve and we look at some pairing, the Weil pairing, the Tate pairing, and that's going to give us a map from the two points on the elliptic curve into some extension of the finite field, the base field that we're working over. The interesting case is where the degree of that extension is small but not super small, so roughly we'd like the embedding degree to be somewhere between 6 and 24. And we want to choose the embedding degree and the size of our prime so that we just balance the difficulty of the discrete logarithm problem on the elliptic curve and in our extension field. And that means we have a fairly narrow range of parameters that are useful. So say we take embedding degree 6, we should really pick our prime somewhere between 170 and 192 bits. If we have an embedding degree 10, then maybe we want a slightly larger prime. Now, these are very tight constraints; there are not a lot of curves that meet these criteria. In fact, you might ask are there any. Well, there are. There are infinitely many. But if we insist on keeping the discriminants small, they're going to be very hard to find. So this table counts the number of prime order pairing-friendly curves with embedding degree 6 or 10 and prime of the size indicated with discriminant less than various powers of 10. So you can see if we wanted an embedding degree 10 curve that would be useful in cryptography, the first discriminant we find is bigger than 10 to the 9th, and we've only got eight to choose from if we want it to be less than 10 to the 10th. But if we're willing to go a bit further, we can find a lot more of these curves. Another thing to keep in mind is these might not be the only constraints we want to put on our curves. There might be other criteria we'd like our curve to satisfy, and it's going to make it even harder to find these curves unless we can handle big discriminants. Okay. So the basic idea behind the CRT method is very simple. As with any Chinese Remainder Theorem application, we start by picking a bunch of little primes. Although here the primes aren't going to be so little; our P sub I's are going to be roughly the same size as our discriminant D. We're going to work entirely with primes that split completely in the Hilbert class field, so they can all be written in this form, 4P equals T squared minus V squared D. We're going to pick enough of them so that we can uniquely determine the coefficients of the Hilbert class polynomial over the integers. And our next step is, for each of our little primes, to figure out what the roots of the Hilbert class polynomial are, and then multiply together a bunch of linear factors to get the coefficients of the polynomial mod our little P sub I's here. And then we can apply the Chinese Remainder Theorem to compute the Hilbert class polynomial, which is a polynomial with integer coefficients. Or, alternatively, we might want to compute the Hilbert class polynomial mod some cryptographic-size prime, big P, because typically that's what we want to do in practice. We don't really care about the coefficients over the integers. The first thing we're going to do when we find out what they are is we're going to reduce mod P. And there's a way to do this directly without necessarily ever computing it over the integers. And so this idea uses the Explicit Chinese Remainder Theorem as suggested in a paper by Agashe, Lauter, and Venkatesan.
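As a concrete illustration of the combining step, here is a sketch of the generic textbook CRT (not the paper's optimized version): recover one integer coefficient from its images modulo the small primes, with a signed lift since class polynomial coefficients can be negative. It assumes the product of the primes exceeds twice the coefficient bound B.

```python
# A minimal sketch of the plain CRT combine for one coefficient.
from math import prod

def crt(residues, primes):
    M = prod(primes)
    c = 0
    for r, p in zip(residues, primes):
        Mi = M // p
        c += r * Mi * pow(Mi, -1, p)    # pow(Mi, -1, p) is the a_i of the talk
    c %= M
    return c - M if c > M // 2 else c   # signed lift: coefficients can be < 0

print(crt([10 % 3, 10 % 5, 10 % 7], [3, 5, 7]))   # -> 10
```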
Now, as originally proposed, the way we find the roots of the Hilbert class polynomial is total brute force, just try every possibility: we run through all the J-invariants in FP and see if they give us a curve with the right endomorphism ring. Remember the roots of the Hilbert class polynomial are just a list of curves with endomorphism ring O sub D. Now, even if we knew how to compute the endomorphism ring really quickly, like say in unit time, that's still going to take too long; there are too many curves to check. Okay. The big improvement in the ANTS paper was to realize that you only need to find one root of the Hilbert class polynomial; you can then find all the others using the action of the class group, which can be computed explicitly via isogenies, and I'm going to talk about how that works. And so when you apply this idea, now you only need to find one root, not all of them. The complexity is now quasi-linear in D. So it's potentially competitive with the other methods that are out there. But as indicated in the beginning, the preliminary results were very disappointing. Okay. So we need to figure out how we're going to make it faster, but before we do that, I want to talk a little bit about the Explicit Chinese Remainder Theorem. So if I tell you I'm thinking of a positive integer, say, that's less than 105, and I tell you that it's 2 mod 3, 3 mod 5 and 4 mod 7, if you sat down and thought about it for a while you could figure out what my number was. But suppose I don't want you to tell me what my number is; I just want you to tell me what it is mod 11. Can you do that any more efficiently than computing what my number is as an integer and then reducing mod 11? And it turns out there is actually a way to do that directly. This was first suggested in a paper by Montgomery and Silverman. And it uses a similar approach to the traditional Chinese Remainder Theorem. There are these coefficients that we can precompute. But then we also need to compute an approximation to a certain integer. I'm not going to go into the details of the algorithm other than to note that depending on the parameters, it can save you a little bit of time, maybe a logarithmic factor, but it's not necessarily a whole lot better if you're just computing one coefficient C. However, in our situation, we need to apply the Chinese Remainder Theorem to every coefficient of the Hilbert class polynomial. There might be thousands, hundreds of thousands, even a million of those. And the CRT primes are the same every time. So these coefficients A sub I, M sub I, and this big M, which is the product of all of our CRT primes, these are always the same. We can precompute them once, and if we're careful about how we do that, we can do it in root D time and space. And the amount of information we need to maintain as we go along, if we imagine we're just going to be told the C sub I's -- these are the values of the coefficients mod P sub I -- one at a time, all we have to do is maintain these sums; we can just sort of add each C sub I in and reduce mod P as we go. We also need to maintain this approximation R, but that only takes log P bits. So even though we need to compute a lot of data -- we need enough C sub I's to be able to determine the value of the coefficient over the integers -- we never need to write that number down. We only need to maintain log P bits for each coefficient. And that brings the space down to O of root D log P. Now, this is kind of exciting.
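Here is a sketch of the Explicit CRT idea just described, with the names a_i, M_i, M, r matching the talk. This toy version carries r as an exact rational for clarity; the whole point of the real algorithm is that r only ever needs about log P bits of precision, and M mod P, the M_i mod P, and the a_i are all precomputed once since the CRT primes never change.

```python
# A sketch of the Explicit CRT: compute C mod P directly from the residues
# c_i, never writing down the big integer C. Assumes C lies in (-M/2, M/2).
from fractions import Fraction
from math import prod

def explicit_crt_mod_P(residues, primes, P):
    M = prod(primes)                  # in practice precomputed, with M mod P kept
    acc, r = 0, Fraction(0)
    for c, p in zip(residues, primes):
        Mi = M // p
        ai = pow(Mi, -1, p)           # the precomputable coefficient a_i
        acc = (acc + c * ai * (Mi % P)) % P
        r += Fraction(c * ai, p)      # real code: low-precision approximation
    return (acc - round(r) * (M % P)) % P

# Check against direct reconstruction for a toy coefficient:
primes, P, C = [3, 5, 7], 11, -38
res = [C % p for p in primes]
print(explicit_crt_mod_P(res, primes, P), C % P)   # both print 6
```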
I mean, when I first realized this -- I mean, it seems obvious, but when I first realized this I got pretty excited because it made me realize that you could potentially apply the Chinese Remainder Theorem to much larger discriminants. But only if you can figure out how to make the algorithm faster. I mean, running in less space isn't much use if it takes a thousand times as long. >>: So the other two methods that you mentioned at the very beginning don't have this sort of possible advantage, like they always take more memory. >> Drew Sutherland: Certainly for the complex analytic that's true, and I believe that would be true for p-adic, but Reinier could answer. Yeah. Yes. Okay. So that's exciting. Yes. And I'm a big fan of the Chinese Remainder Theorem just on general principle. You have an army of primes, lots of little primes nibbling away at the problem. They can do magic. >>: [inaudible] easier to parallelize. >> Drew Sutherland: Absolutely. Yes. I mean, good point. We'll come back to that one. Okay. So now I'm going to go into the algorithm in a little more detail. I'm not going to belabor every step of this algorithm. I'm just going to focus on two of them, where all the interesting stuff happens. But just to give you sort of a sketch of what's going on here. We imagine we're starting with some big P, our cryptographic-size prime. We're trying to construct a curve over that prime field. We know what T is, the trace of that curve, because we know what order we'd like the curve to have. And that allows us to determine this discriminant D. And now we're going to find the J-invariants of every curve over FP whose endomorphism ring is O sub D. We may only want one, but it actually takes no more work to find them all. The first thing we do in step one is a bunch of precomputation. We do some stuff with the class group, we pick a bunch of primes, and we do this CRT precomputation. Then we spend most of our time in a loop in step two. For each of our CRT primes our first step is to find a root of the Hilbert class polynomial, and we do that by finding an elliptic curve mod little P that has the endomorphism ring O sub D. Once we've done that, we know one root of the Hilbert class polynomial, and then we're going to use the class group which we precomputed -- we're going to use the action of the class group on that root to compute all the other roots. And I'm going to explain how that works. We're going to get H roots, where H is the class number; then we need to put the linear factors together, update the CRT sums, and keep on trucking. And then at the very end, we've got a little bit of post-computation to do to get the value of the Hilbert class polynomial mod big P, and then the very last step is to find a root, and once we know one root of the Hilbert class polynomial mod big P, we can do the same thing we did here in step 2B over big P to get all the other roots very efficiently. In fact, it takes more time to find the first root than it does to find all the rest of them. Question. >>: The main difference is instead of using isogenies it's using the Galois action; is that right? >> Drew Sutherland: We're using -- we're using the Galois action computed via isogenies. So we are using isogenies. >>: Okay. >> Drew Sutherland: Yeah. >>: You're using it twice, though, that's the main idea? >> Drew Sutherland: Yes. I'm using it in two different ways. I think it will be clear in a moment. I'm going to get into both of these steps.
>>: [inaudible] or maybe you just [inaudible] before, meaning what was at the conference at ANTS. >> Drew Sutherland: So... >>: How does this differ from the algorithm at ANTS? Or is this just the same? >> Drew Sutherland: So this is the algorithm at ANTS. Okay. I'll maybe highlight some of the differences. The differences are using the Explicit Chinese Remainder Theorem; the way we're going to represent the class group is slightly different; and there are some optimizations that we're going to make to both of these steps. And I'll talk about those. But the basic structure is the same. The Explicit Chinese Remainder Theorem is maybe the biggest difference. Okay. >>: [inaudible] >> Drew Sutherland: Why is step four there? Because we're trying to construct an elliptic curve at the end. If all you care about is knowing the polynomial, we're done at step three. >>: [inaudible] >> Drew Sutherland: You don't necessarily. I threw it in there simply because you get them for free. Maybe -- but, I mean, this gives you -- well, I'll give you one reason. Let's suppose you'd like to be able to write down your elliptic curve as Y squared equals X cubed minus 3X plus B. Well, if you go through all of the isogenous curves, you'll have a very easy time finding one that you can put in that form. But if you just picked one, maybe it wouldn't. Okay. In terms of the complexity, every step on this page takes on the order of root D time or less. But this loop here has to be repeated roughly root D times, assuming the Generalized Riemann Hypothesis. So all of the action is here. All right. So the two steps I'm going to focus on are first finding a curve with the endomorphism ring we want and, second, computing the conjugates, computing the action of the class group. So to find a curve with the right endomorphism ring we're going to start by solving an easier problem. We're going to find a curve that has the right trace. Okay. All the curves with endomorphism ring [inaudible] O sub D have the same trace, but there may be other curves with that trace. And the simplest way to do that is just to pick a curve at random, pick a point on that curve at random, and see, when we do a scalar multiplication by what we think the curve order should be, whether we get the identity element. And if we don't, we know that that's not the curve we want. If we do, we've got a little more work to do, but that extra bit of work is very small. We compute the [inaudible] of the group, and we can easily determine the group order in O of log to the 1 plus epsilon additional steps. But our odds of finding the curve we want are pretty low; we may have to test a bunch of curves. And so this poses a problem. On average we might expect we're going to have to try roughly two times root P curves before we find a curve with the particular trace we're looking for. I mean, suppose traces are uniformly distributed over the Hasse interval, which of course they're not, but close enough. So to speed this up, we don't want to use random curves. The idea here is instead of picking a curve at random, we're going to use a parameterized family of curves that has certain prescribed torsion requirements baked in from the beginning. So, for example, if we know we're looking for a curve whose order is divisible by 12, we can use a parameterization of curves over Q that all have 12-torsion to just enumerate a long list of curves over FP that all have order divisible by 12.
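Here is a toy sketch of the trace test just described: a random point on a random curve is multiplied by the two candidate orders p + 1 - t and p + 1 + t, and a curve is rejected if neither gives the identity. Everything here is illustrative (plain affine arithmetic, p assumed to be 3 mod 4 for the square-root formula), and passing the test is only a necessary condition, which is why the talk mentions a little extra work afterwards.

```python
# Toy sketch of the "right trace" test; no torsion parameterizations here.
import random

def ec_add(P1, P2, a, p):
    if P1 is None: return P2
    if P2 is None: return P1
    x1, y1 = P1; x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # identity element
    if P1 == P2:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(n, Q, a, p):
    R = None
    while n:
        if n & 1:
            R = ec_add(R, Q, a, p)
        Q = ec_add(Q, Q, a, p)
        n >>= 1
    return R

def random_point(a, b, p):
    while True:                                       # assumes p = 3 mod 4
        x = random.randrange(p)
        rhs = (x**3 + a * x + b) % p
        if pow(rhs, (p - 1) // 2, p) <= 1:            # rhs is 0 or a square
            y = pow(rhs, (p + 1) // 4, p)
            if y * y % p == rhs:
                return (x, y)

def passes_trace_test(a, b, p, t):
    """Necessary test: some point is killed by p+1-t, or p+1+t for the twist."""
    Q = random_point(a, b, p)
    return (ec_mul(p + 1 - t, Q, a, p) is None
            or ec_mul(p + 1 + t, Q, a, p) is None)

print(passes_trace_test(1, 1, 59, 12))   # usually False for a random curve
```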
That reduces the number of potential curves we need to check. To take this further, we don't want to just use parameterizations over Q; we can use the modular curve X1 of N, which parameterizes elliptic curves with a point of order N on them. We find a point on X1 of N over FP, and then we can use that to construct an elliptic curve over FP that has nontrivial N-torsion. Additionally, we can play some standard tricks to very quickly filter out curves that don't have the right order mod 3 and mod 4. And so you put all these things together, you can easily narrow down the search by a factor of a hundred. In fact, I've successfully used N up to 29; that's about the largest N that I find useful. I mean, it takes a certain amount of time to find points on X1 of N, and you've got to trade that off against how much time you save by narrowing the search. But in practice you can easily get a factor of 20 or 30 here. All right. That gives us a curve with the right trace. But how do we know whether it has the right endomorphism ring? The first question we might ask is how likely it is to have the right endomorphism ring. And we can figure out exactly what that probability is by computing the Hurwitz class number. So we know this value 4P -- now we're dealing with little P here -- minus T squared; if we compute the Hurwitz class number, which is a sum over the divisors of the conductor of these class numbers, it's going to count exactly how many elliptic curves, or how many distinct J-invariants of elliptic curves, there are which have trace plus or minus T. Okay. These guys are all going to have endomorphism rings that are contained within the maximal order [inaudible] D. Now, if V is 1, the sum is very short, only has one term, and life is simple: every curve with trace plus or minus T has the endomorphism ring we want. And so some early versions of this algorithm actually insisted on making V equal to 1 precisely to make life easier here. Because when V is bigger than 1, we've got more work to do. We could throw the curve away and pick another one, but that's not such a great idea, or we could do some work to try and find a curve with the right endomorphism ring. But I'm going to hope to convince you that actually we should be very happy when V is bigger than 1. Because it's actually going to allow us to speed up the algorithm substantially. The first thing I did when I was working with this algorithm is I sat down and implemented it and thought, ah, I don't want to mess around with computing isogenies and dealing with calculating endomorphism rings, I'll just fix V as 1, I'll only use primes where I have a conductor of 1. And if you then run the algorithm, you find it starts off at a sprint for the first few primes, but as the primes are getting bigger, it gets slower and slower. And the reason is there are only H of D curves out there with the endomorphism ring I want, and H doesn't change as P gets bigger. So as P grows, this gets slower and slower, especially if the class number happens to be small, which is kind of counterintuitive because you might think that that would be the easy case since the Hilbert class polynomial's then smaller. So the fix here is, once we know a curve with trace plus or minus T, we can find a curve with the endomorphism ring we want as long as we're prepared to go climbing an isogeny volcano, or actually maybe several isogeny volcanoes, and I'll explain what that is in a moment.
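The success-probability count above can be made concrete: h(D) is the number of reduced primitive binary quadratic forms of discriminant D, and the Hurwitz-style count of j-invariants with trace plus or minus t is the sum of h(u^2 D) over the divisors u of the conductor v. A toy sketch (it omits the weight adjustments the Hurwitz class number makes for discriminants -3 and -4):

```python
# Toy sketch: h(D) by counting reduced forms, then the count of
# j-invariants over F_p with trace +-t as a sum over divisors of v.
from math import isqrt, gcd

def class_number(D):
    """h(D): reduced primitive forms ax^2 + bxy + cy^2 with b^2 - 4ac = D < 0."""
    h = 0
    for a in range(1, isqrt(-D // 3) + 1):
        for b in range(-a + 1, a + 1):
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if c < a or gcd(gcd(a, abs(b)), c) > 1:
                continue
            if a == c and b < 0:
                continue
            h += 1
    return h

def trace_count(p, t, D):
    """Number of j-invariants over F_p with trace +-t, where 4p - t^2 = v^2 |D|."""
    v = isqrt((4 * p - t * t) // -D)
    assert v * v * -D == 4 * p - t * t
    return sum(class_number(u * u * D) for u in range(1, v + 1) if v % u == 0)

print(class_number(-23))        # -> 3
print(trace_count(59, 12, -23)) # -> 6: h(-23) + h(-92) = 3 + 3
```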
But before I do, I'd like to observe that assuming we know how to do this -- if we can get from a curve with the right trace to a curve with the right endomorphism ring -- it changes our view of which primes we like. Because it means we should be looking for primes where it's easy to find a curve with the right trace, which means we want this big H here, this Hurwitz class number, to be big. Which means we want V to be big. And so when we think about picking which CRT primes to use, we don't necessarily want to use the smallest ones; we want to use the ones that optimize this ratio, because every other part of the algorithm only depends on P logarithmically. This is the only place where there's an exponential dependence. Okay. All right. So now I want to just talk real quickly about isogeny volcanoes. So recall that the classical modular polynomials parameterize L-isogenies between pairs of elliptic curves. And if I know the J-invariant of some elliptic curve, I can find all the L-isogenous curves; I just plug the J-invariant into this modular polynomial -- it's got integer coefficients in two variables, I plug it in for one of them -- and I'm going to get a univariate polynomial out, of degree L plus 1, and the roots of that polynomial are all the L-isogenous curves. Now, this allows us to define a graph of L-isogenies on all the J-invariants over FP, and if I restrict my attention just to J-invariants of curves that have trace plus or minus T, that's a connected graph. And it looks like a volcano. It's got a cycle up at the top -- this terminology comes from David Kohel -- that's called the crater of the volcano. And then hanging down from each node on that cycle is a roughly L [inaudible] tree of height K, where L to the K is the largest power of L dividing the conductor V. I don't know if people can see my volcano here. But it's a cycle with a bunch of trees hanging down. The trees are all the same height and they all have the same fan-out. And the key point is that for the nodes at the top of the volcano, on the crater of the volcano, L squared doesn't divide their conductor. So this allows us to kill off any divisors of our conductor that we don't want so that we can get a curve with the right endomorphism ring. So when we start with a random J-invariant, we're going to be somewhere on this volcano. We don't know where, but we're quite likely to be somewhere near the bottom because there are more curves down there. But by computing isogenies, we can figure out where we are on the volcano, mostly because we can always tell when we're at the bottom, because those nodes only have one edge coming out of them. That means that when you plug the J-invariant into the modular polynomial you only find one root, whereas for all the nodes above the bottom you get L plus 1 roots. And so you can use that, by walking through various paths in this graph, to figure out how high you are on the volcano and how to get higher. So that sounds like a lot of work, but as long as L is small, it doesn't take very long. And we only have to do this once. Once we find a curve with the right trace, we just climb a few volcanoes, one for each L dividing V, and then we've got a curve with the right endomorphism ring. And that's what it looks like. So, yeah, here's the cycle up top, and then we've got trees hanging down. Okay. All right. So we finished the first part of step two.
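Here is a toy sketch of reading the volcano through a modular polynomial, using the classical Phi_2 (whose integer coefficients are small enough to quote from the standard tables) and brute-force root finding. A real implementation factors Phi_l(X, j) over F_p properly and has to treat multiple roots with care, which this crude version ignores.

```python
# Toy sketch: 2-isogeny neighbors via the classical modular polynomial
# Phi_2, plus the one-root test that identifies the floor of the volcano.
def phi2(x, y):
    return (x**3 + y**3 - x**2 * y**2
            + 1488 * (x**2 * y + x * y**2)
            - 162000 * (x**2 + y**2)
            + 40773375 * x * y
            + 8748000000 * (x + y)
            - 157464000000000)

def neighbors(j, p):
    """All j' that are 2-isogenous to j over F_p (brute force, toy p only)."""
    return [x for x in range(p) if phi2(x, j) % p == 0]

def on_floor(j, p):
    # Floor nodes see a single root; nodes above the floor see l + 1 = 3
    # roots (counted with multiplicity, which this sketch does not track).
    return len(neighbors(j, p)) == 1
```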
We found a curve with endomorphism ring O sub D; we know one root of the Hilbert class polynomial. Now we want to find all the others. And to do this, we're going to use the action of the class group. Up to now I've been sort of identifying the endomorphism ring with this imaginary quadratic order, but we can take that isomorphism in the other direction. Suppose we pick an ideal in our imaginary quadratic order that corresponds to some endomorphism. That endomorphism has some kernel -- there's some subgroup of our elliptic curve that gets killed by it -- and that in turn defines an isogeny. And if you work through the definitions, that isogenous curve is going to have the same endomorphism ring. So for each ideal in our order, we get an isogeny, and this gives us a group action on the [inaudible] of J-invariants of elliptic curves with endomorphism ring O sub D, and it factors through the class group; and provided P is a totally split prime, everything that I've said up here works mod P as well. It doesn't matter which ideal representative we choose for the class group. We're going to get the same E prime here. But the degree of the isogeny does depend on which representative we choose. It's going to be the norm L of that ideal alpha, and we want L to be small. Okay. So to compute this action explicitly, let's suppose for the simplest case our volcano's flat -- it's just a cycle of [inaudible] height zero -- and our class group is cyclic, generated by a single element of norm L, a single ideal of norm L. To walk the isogeny cycle, we just plug the J-invariant that we know, the one root that we know, into the modular polynomial; we're going to get a univariate polynomial; it's going to have exactly two roots. Those two roots are going to correspond to the two directions we could walk along our cycle. We pick one of them. Okay. That gives us a new J-invariant. We plug that in. We factor out the term X minus J naught, which is getting rid of the root we already know, the one we came from, and there's only going to be one root left, and it's going to tell us the next step to take. And if we go all the way around -- whatever the order of that ideal is in the class group -- we'll get all of the J-invariants, all of the roots. We can even do this when L does divide the conductor V, but it's a little more difficult because we might fall off the rim of the volcano. We're only interested in these guys up on top -- they're the ones with the right endomorphism ring -- but if we take a wrong step we might find ourselves down at the bottom and have to climb back up again. But as long as L is small, it's still reasonably efficient to do that. Any questions? I just realized -- the reason I'm pausing here, this should be curves with trace T, not O sub D [inaudible]. Any questions before I go on? All right. Now, in general the situation is more complicated than what I just said. The ideal class group isn't necessarily cyclic, and even if it is, we don't necessarily just want to use one representative to generate it. And so we may need to walk a bunch of cycles, cycles within cycles. Now, in the ANTS paper, the way they suggest to do this is to take a basis of the class group; we can then represent any element of the class group in terms of that basis. And if we put a lexicographic ordering on the exponent vectors that correspond to that representation, we can enumerate those elements using just one isogeny per step. Okay. So that sounds good.
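A toy sketch of the cycle walk just described, reusing neighbors from the previous sketch (so l = 2 here). It assumes the simplest situation from the talk: l does not divide the conductor, so the volcano is flat at l and there is no rim to fall off, and the ideal of norm l generates the class group.

```python
# Walk the flat isogeny cycle from a known root j0, never stepping back to
# the node we came from, until we return to j0. Under the assumptions in
# the lead-in, the list returned is all h(D) roots of H_D mod p.
def walk_cycle(j0, p):
    nbrs = neighbors(j0, p)                  # exactly two on a flat volcano
    roots, prev, cur = [j0], j0, nbrs[0]     # pick one of the two directions
    while cur != j0:
        roots.append(cur)
        onward = [x for x in neighbors(cur, p) if x != prev]
        prev, cur = cur, onward[0]           # "factor out X - j_prev": one root left
    return roots
```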
The only problem with it is that each step requires O of L squared operations in F sub P, where L here is the norm of our basis element. The problem is that if we insist on using a basis, there's no reason that those norms have to be small. Okay. There's a slightly subtle point here. We know under the Generalized Riemann Hypothesis that the class group is generated by ideal representatives of small norm, but a set of generators is not the same thing as a basis. Those generators might be dependent. And when you go to form a basis, you've got to multiply them together, and when you do that, the norms can blow up. And it's not hard to find examples of class groups for which every basis contains an element with large norm, like close to the square root of D. And that would drive the running time right back up to O of D to the three halves if we did that. So what do we do instead? We can solve this in a very general fashion. Suppose you give me a list of generators for some finite group. I can then write down this composition series where I just knock out one generator at a time. And if that composition series happens to be cyclic, which it will be if G is abelian, which is the case here, then I can define these numbers N sub I, which are just the sizes of these quotients. And each of these N sub I's is going to divide the order of the corresponding generator, and the product of the N sub I's is going to be equal to the order of the group. But N sub I might not necessarily be equal to the order of alpha sub I. It will be if it was a basis to begin with. But this still has the property that we can now uniquely represent every element in the class group, and we can enumerate the action of all the elements in the class group using just one isogeny at a time. Any questions? I see at least one puzzled face. >>: Just the tradeoff between making [inaudible] unique [inaudible]. >> Drew Sutherland: All I care about is the uniqueness. I don't really care about how big these N sub I's are. So I'd be perfectly happy if there's an ideal with norm 3 that generates the entire class group, so that N sub I is equal to the order of the class group -- great, that's fine. It's just the unique representation that we need. But the key point is that this allows us to enumerate the class group using only elements that were on our original list of generators. We don't create any new generators. So to put this into practice, we represent the class group explicitly as usual using binary quadratic forms -- the norm here is just the value of A -- and so we're going to focus on forms with prime norm. We can write down a list of all of the forms with prime norm up to whatever bound we like: if we believe the GRH, take 6 log squared D; if we don't, we can go all the way up to root D over 3. It doesn't matter. Either way this approach will work. The nice thing about the sequence of N sub I's is they'll actually identify redundant generators, because we'll get an N sub I equal to 1. And if we take what's left, we're going to get what I call the norm-minimal representation of the class group. And we can compute this using a generic algorithm quite quickly, either in time on the order of the size of the group or even the square root of the size of the group. The group here has order on the order of root D. And we only have to do this once at the very beginning. All right. So I'm doing okay on time.
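Here is a generic-group sketch of this idea, with multiplication mod m standing in for class-group composition (in the real algorithm each operation is an l_i-isogeny step). relative_orders computes the n_i, flagging redundant generators with n_i = 1, and enumerate_group then yields every element exactly once using one group operation per element, exactly the property the talk needs.

```python
# Generic sketch: relative orders n_i along the chain of subgroups
# <g_1> <= <g_1,g_2> <= ..., then odometer-style enumeration of the group.
def relative_orders(gens, identity, op):
    subgroup, orders = {identity}, []
    for g in gens:
        n, x = 1, g
        while x not in subgroup:       # least n with g^n in the subgroup so far
            n, x = n + 1, op(x, g)
        orders.append(n)               # n = 1 flags a redundant generator
        powers, y = [identity], identity
        for _ in range(n - 1):
            y = op(y, g)
            powers.append(y)
        subgroup = {op(s, q) for s in subgroup for q in powers}
    return orders

def enumerate_group(gens, orders, identity, op):
    """Yield each group element exactly once, one op per element."""
    if not gens:
        yield identity
        return
    g, n = gens[0], orders[0]
    for base in enumerate_group(gens[1:], orders[1:], identity, op):
        x = base
        for _ in range(n):
            yield x
            x = op(x, g)

# Toy check in the multiplicative group mod 63 (order 36): <2> has order 6,
# and 5 has relative order 6, so together they enumerate all 36 elements.
op = lambda a, b: a * b % 63
orders = relative_orders([2, 5], 1, op)
elems = list(enumerate_group([2, 5], orders, 1, op))
assert orders == [6, 6] and len(set(elems)) == 36
```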
So how does this all shake out asymptotically? I don't want to get into any sort of rigorous analysis, but let's just do sort of a back-of-the-envelope calculation here to see if we can figure out where we're going to be spending the most time. So just a few heuristics to have in mind. The class number -- we actually know this rigorously -- is asymptotically about a quarter of the square root of the discriminant, and there's actually a well-known constant there that's exact. The primes that we need to use are all going to be roughly on the order of D log D. And in practice they're never going to be bigger than 2 to the 64th, I predict. Maybe someday, but by the time they are, we'll be running on 128-bit computers. The biggest primes I've had to use so far are about 50 bits. And L, which is perhaps the most interesting case: we know under the GRH that we never need to use L bigger than O of log squared. Conjecturally, it's O of log to the 1 plus epsilon, but for most discriminants it's a lot better than that, it's more like log log. And that's a bound on the largest L of all of the elements in our generating set. But we really only care about the L for the smallest one, because that's where we're going to be spending all of our time; we're going to be walking that interior cycle a lot, and we're only going to be walking the bigger cycles occasionally. So, in fact, we actually expect L to be bounded by a constant for most discriminants, because we've basically got a 50/50 chance at each L, you know, that D will be a quadratic residue mod L. >>: [inaudible] over epsilon D, that's not [inaudible]? >> Drew Sutherland: [inaudible] it's D. Yeah. >>: I think it's root D. >> Drew Sutherland: You think it's root D. I will check. I could be wrong. >>: [inaudible] >> Drew Sutherland: Thank you. >>: If you just look at the bound on the size of the coefficients, so the primes you need to go up to, the sizes of the coefficients of the Hilbert class polynomial as integers are like E to the root D log [inaudible]. >> Drew Sutherland: No, no, you need root D. The bound on I is root D. You need root D of the primes. But the size of the individual primes is O of D, or O of D log D. >>: [inaudible] the product of all the primes only needs to be as big as E to the root D log D. That's the product of [inaudible]. >> Drew Sutherland: Right. Right. So we're going to get -- >>: Oh, sorry, so [inaudible] get the [inaudible]. >> Drew Sutherland: Yeah. Yeah. >>: Okay. Well, I'll [inaudible]. >> Drew Sutherland: Yeah. Let me -- I think it's right, but I could be wrong. Let me check. Let me come back to it at the end. I'll double-check [inaudible] right. But I can -- >>: Are you writing the actual [inaudible]? >> Drew Sutherland: I'm actually writing not the log of the prime but the prime itself. I can tell you just from having run it, the primes are typically bigger than D. And if you look in your paper, actually, the primes that you give in the example in your ANTS paper at the end are bigger than 108,708. >>: So the number of primes is more like [inaudible] -- >> Drew Sutherland: Yeah. The number of primes is root D, but the actual value of the primes is slightly bigger than D. And the log of the prime is like log D, log P is like log D. And so you get root D log D. I think that's what you were getting at. Does that make sense? Okay. Sorry about that. Okay. So how do things shake out?
So the first step, which in the original algorithm and certainly the algorithms presented in the ANTS paper was by far the most time-consuming one -- this is finding a curve with the right endomorphism ring -- looks to be something like O of root D log to the 1 1/2 D FP operations. Now, I'm counting FP operations here because these primes fit inside the word size of our computer, so it makes sense to treat these operations as unit cost; they're quite quick. Then step 2B, if we suppose L is bounded by log log D or a constant, is going to be something like O of root D log D. And then, surprisingly, the step that you might have thought was the easiest, which is just multiplying a bunch of linear factors together to get the coefficients of a big polynomial, is actually asymptotically the dominant step, O of root D log squared D. Now, that's not always true. You can find discriminants where the smallest norm is big, in which case step 2B will have the same complexity as this, and the constant factors will mean that step 2B dominates. It will be a lot bigger. But for most discriminants, it's going to be this last step, which I haven't talked about at all, and I just want to give a quick plug for a library developed by David Harvey for doing polynomial multiplication over a word-size modulus. It's really fast. The algorithms here are well known and have been heavily optimized, so we want to try and use the best one we can. Okay. All right. So I just want to run through an example here of how the computation comes out for three different discriminants. So up top I've picked three discriminants that are all roughly around 10 to the 10th. And the first section, the first set of rows of the table, shows information about the class groups. The first number, H of D, is the class number; that's the degree of the Hilbert class polynomial. Log B is the number of bits that we potentially need in each coefficient. That's our upper bound. L1 is the norm of the smallest element in the class group and L2 is the second smallest, and we only needed to use two representatives in any of these cases. And then the time there is the number of seconds spent figuring all that out. And you'll see it's negligible, fractions of a second. So just to give the size of the class polynomial: here in the largest case it would be 5 million bits times 54,000, so something like -- I don't know what that works out to be, but big. Okay. N here is the number of primes we're using, and that's going to be determined both by the bound B and also the class number, and then this is how big the primes are. And you can see here we've got about 42-bit primes, so that's just slightly bigger than 10 to the 10th. And this is the amount of time that was spent fiddling around with the primes; i.e., figuring out which primes are totally split primes and deciding which ones to use, which are the best primes. And you can see it's just a few seconds, not a lot of time. And then we do some CRT precomputation, and I also wrote down the post-computation time here, and that's also just a few seconds. Then we spend a good chunk of the rest of the day in step 2. And I've split up the time in step 2 into three parts by percentages.
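Since this product step dominates, here is a sketch of the standard balanced product tree it refers to, with schoolbook polynomial multiplication standing in for the fast word-size multiplication of a library like David Harvey's zn_poly.

```python
# Sketch of step 2c: build H_D mod p from its roots with a product tree.
def poly_mul(f, g, p):
    """Schoolbook product of two polynomials, coefficients low degree first."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return h

def from_roots(roots, p):
    """Coefficients of prod (x - r) mod p, built by pairing level by level."""
    level = [[-r % p, 1] for r in roots]
    while len(level) > 1:
        nxt = [poly_mul(level[i], level[i + 1], p)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else [1]

print(from_roots([1, 2, 3], 7))   # -> [1, 4, 1, 1], i.e. x^3 + x^2 + 4x + 1
```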
So the first number listed here says we spent 56 percent of these 70,000 seconds finding curves that had the right endomorphism ring over F sub little P; 14 percent of the time enumerating the action of the class group to find the other roots; and 30 percent of the time multiplying linear factors together to build up the class polynomial mod P. And so in this example, the class number is relatively big for that discriminant, and so you're seeing it spend more time over here than it is in these two examples. In the middle column we have a case where the class number is pretty small relative to the discriminant, and so it's spending most of its time finding curves with the right endomorphism ring. The class number being small actually makes that harder. Okay. But it makes these other two steps easier. And then the last column is a case where we got unlucky; we have a norm that's pretty big, 11. That means when we're enumerating the action of the class group we're factoring polynomials of degree 12 or, well, really degree 11 after we remove one root. And that's a lot more painful than factoring cubics or quadratics. So you spend more time in this step. But still overall this is faster than this, because at the end of the day B is a lot smaller and the class number is a lot smaller. And I'm going to show you in a minute how to make these numbers smaller by a factor of about 20 or 30. So the absolute values don't matter so much; it's the relative numbers that are interesting. The last line here is the amount of time spent finding one root of the Hilbert class polynomial mod big P, our cryptographic-size prime, which here was about 200 bits. And this number is interesting because in theory this is the ultimate limiting factor. If I have 141,155 computers all running in parallel, I can have them each do one prime, okay, so this could be made fractions of a second potentially. Now, I didn't have that many computers, but I did actually run some tests with 14 computers, each with two processors, so I had 28 threads going. But even on just a dual-processor or quad-processor machine, you can make these numbers a lot smaller. This is probably the bottleneck. This last line here is just a point of interest: you can find all the other roots in less time than it took to find the first root, as I mentioned. Okay. >>: So the -- sorry. The second to the last line [inaudible]. >> Drew Sutherland: That's the time to find the J-invariant you want. One root of the Hilbert class polynomial, mod big P, mod your big prime. Okay. All right. Okay. So in the last ten minutes, I want to come back to -- so maybe just to say what we've gotten to at this stage, actually, before I go on. With what I've shown you up to now, the underdog has now sort of caught up to the world champion. Okay. We've sped things up quite significantly for large discriminants -- maybe a thousand times faster. Not quite that much. Probably more like a hundred times faster. But we're still not competing with the complex analytic method on a level playing field, because this is computing the Hilbert class polynomial, the class polynomial of the J-invariant, but the other methods for computing class polynomials can take advantage of more advantageous invariants, ones that have smaller class polynomials. And I want to talk about how we can do that with the CRT method.
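Since each CRT prime in step 2 is an independent task (the "141,155 computers" remark above), here is a minimal multiprocessing sketch of that shape. process_prime is a hypothetical stand-in that just reduces the known H_{-23} coefficients; the real per-prime work is finding a curve, walking the class-group action, and building the polynomial, with only O(log P) bits per coefficient flowing back into the explicit-CRT sums.

```python
# Sketch of the embarrassingly parallel structure: one task per CRT prime.
from multiprocessing import Pool

HD = [12771880859375, -5151296875, 3491750, 1]   # H_{-23}, low degree first

def process_prime(p_i):
    # Stand-in for step 2; in reality this is the expensive per-prime work.
    return p_i, [c % p_i for c in HD]

if __name__ == "__main__":
    with Pool(4) as pool:
        for p_i, coeffs in pool.imap_unordered(process_prime, [59, 101, 103]):
            # Real code would fold coeffs into the running explicit-CRT sums.
            print(p_i, coeffs)
```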
So there are two key properties of the J-invariant that we've been relying on implicitly. The first is that it generates the ring class field. The second is that if we pick any complex number Tau in the upper half plane and look at its associated J-invariant, it has the same minimal polynomial, no matter which Tau we pick. That's not true of every class invariant. And in order for us to be able to work with things mod P, we need that to be true. In addition, we'd also like it to be the case that whatever class invariant we use has a nice algebraic relationship with the J-invariant, so we can easily go back and forth from the J-invariant to our alternative class invariant. And the motivation for all of this -- I mean, some of you may have heard of Weber polynomials, and there are a number of other class polynomials -- the game we're trying to play here is just to find a class polynomial that has smaller coefficients than the Hilbert class polynomial, because it means we don't need as many primes to compute them. So this is maybe best demonstrated by example. The simplest example is gamma 2, defined as the cube root of the J-invariant. I mean, I could write down an expression in terms of the Dedekind eta function, but this is a nice, easy definition. And if we want to use the algorithm I just described to compute the class polynomial for gamma 2, we only need to make a few modifications. The first one is we can reduce our height estimate, our log B, by a factor of three. That means we need only a third as many primes, so everything gets faster. To make our lives easy we'll restrict to primes that are congruent to 2 mod 3, so that cube roots are unique in FP. Makes life simple. Then we're going to still enumerate the J-invariants, but for each J-invariant we compute we'll take its cube root, and then we'll form the class polynomial by multiplying together all of the associated linear factors. So now we're going to get the class polynomial of gamma 2, its coefficients mod little P for each of our little Ps, and then at the end we'll apply the Chinese Remainder Theorem to get the class polynomial of gamma 2 mod big P. Then we take a root of that and we cube it to get a J-invariant, which is going to give us the curve we want. So this is very straightforward, and you'll instantly find it makes the algorithm more than three times as fast. Now, we can get a little fancier here. It's actually possible to use primes that are congruent to 1 mod 3; even though the cube roots aren't unique, only one of the cube roots is actually the right invariant, and you can figure out which one it is -- Reinier has some ideas on that. It's also possible, rather than enumerating the J-invariants, to enumerate the gamma 2s using a modular polynomial for gamma 2. However, in practice, neither of these makes things substantially faster than what I've described here, just because computing cube roots is very easy. It doesn't take any time. It's just an exponentiation. Okay. But why stop at a factor of 3? I didn't mention a constraint there, by the way: we do assume that 3 doesn't divide the discriminant. If we make a stronger constraint on the discriminant, that its absolute value is congruent to 7 mod 8, then we can use not quite the Weber invariant but the square of the Weber invariant, so F here.
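The "just an exponentiation" remark can be made explicit: when p = 2 mod 3, cubing is a bijection on F_p, and the inverse map is a single modular exponentiation.

```python
# Cube roots mod p when p = 2 (mod 3): 3 * (2p-1)/3 = 2(p-1) + 1, so the
# exponent (2p-1)/3 inverts cubing. Each root j of H_D mod p then yields
# the root cube_root(j, p) of the class polynomial for gamma_2.
def cube_root(a, p):
    assert p % 3 == 2
    return pow(a, (2 * p - 1) // 3, p)

print(cube_root(pow(5, 3, 11), 11))   # -> 5
```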
This is another function that I could write down in terms of the Dedekind eta function -- [inaudible] analytic expression for -- but I'll just define it with this algebraic relationship it has with gamma 2. And so this also defines its relationship with J. And I can play the same game. I can compute J, I can compute gamma 2, and when P is congruent to 7 mod 8, I can uniquely compute F squared from this equation. Now I get to reduce my height bound by a factor of 36, which is very nice. The only downside to this is that if you're looking for prime order curves, they never have a discriminant whose absolute value is congruent to 7 mod 8; the absolute value of the discriminant is always 3 mod 8, and if you just stare at this equation for a while, you'll see why that's true. So for prime order curves, we might instead use a class invariant originally coming from Ramanujan, and again we need to use the square of the invariant to be able to compute it exactly. But for a discriminant with absolute value congruent to 11 mod 24, which is in particular 3 mod 8, we can compute G squared using this relationship. Again, we compute gamma 2 by taking a cube root, and we can uniquely determine G squared by solving that equation. And this gives us a factor of 18 improvement. Now, when we actually apply this, we find that we get a speedup of more than a factor of 3 and more than a factor of 18. And the reason for that is that not only are we using fewer primes, but we can pick the best few, meaning our average time per prime is also going down. Even when you throw in the fact that I'm going to force you to use primes that are just 2 mod 3, which means we have to go out a little bit further, we're still winning. So you can see the benefit here: the height bound, the number of bits we need to compute, goes down substantially. That means the number of primes goes down. The running time then of course goes down as well. But what's also interesting is how the running time splits up. With the J-invariant, we were spending most of our time finding curves with the right endomorphism ring, but now when we're using the Ramanujan invariant, most of that time is shifting over to the other steps. And in particular this third step is looking like it's going to become the most expensive, which it will when we make D big enough. >>: [inaudible] >> Drew Sutherland: [inaudible] step is just multiplying a bunch of linear factors to get a big polynomial. You know, building the polynomial up from its roots, building that product tree. >>: [inaudible] right? >> Drew Sutherland: Yes. >>: So I know -- I mean, I saw David a week ago and he commented that you -- I mean, he said that you had some [inaudible] small polynomial [inaudible]. >> Drew Sutherland: Yeah. >>: [inaudible] taking advantage of that? >> Drew Sutherland: These numbers are taking advantage of that, although not as much as I could, yes. So David and I -- I have something that's really fast for low degree, he has something that's really fast for high degree, and we haven't quite figured out where we should cross over in the middle. So right now I think I'm switching over to him too soon, but that goes back and forth. He makes his code faster, I make my code faster, so that keeps shifting. Okay. All right. But now I want to come back: the underdog's ready to stand face-to-face, toe-to-toe with the world champion. So these times are taken from a paper by Andreas Enge.
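To make the inversion step concrete, here is a toy sketch. It assumes the classical identity gamma_2 = (f^24 - 16)/f^8, so with w = f^2 the candidates are the roots of w^12 - gamma_2 * w^4 - 16 over F_p; the talk's point is that under the stated congruence conditions exactly one candidate is the right invariant. The selection rules are the delicate part, which this sketch deliberately ignores.

```python
# Toy sketch: candidate values of the squared Weber invariant f^2 from
# gamma_2, via w^12 - gamma_2 * w^4 - 16 = 0 (brute force, small p only).
def f_squared_candidates(gamma2, p):
    return [w for w in range(1, p)
            if (pow(w, 12, p) - gamma2 * pow(w, 4, p) - 16) % p == 0]
```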
And I should mention all of the stuff I've been talking about on class invariants, that is joint work with Andreas Enge. These running times are taken from his paper on floating point approximations. And I've converted everything to a 2.4 gigahertz AMD because that's where these were run. And the complex analytic method here is using the double eta quotient; we're using the square of the Weber invariant. And you can see that across the board the CRT method is faster, but perhaps most notably its advantage grows; it actually has a logarithmic advantage here. And this is because the step of building the polynomial up from its roots, that's also the dominant step for the complex analytic method, but it has to do that over Z, whereas we're doing it mod lots of little Ps, so we gain a logarithmic factor. Already at class number 100,000 here, the complex analytic method is starting to run out of gas because this polynomial is getting too big to fit in memory, and so the constant factors are going down. But even if you step back to the earlier stages here, where the class number is doubling in each row, you can see that the ratios are improving, almost doubling at each step. >>: [inaudible] >> Drew Sutherland: So here the CRT method is getting a factor of 36. Here we're only getting a factor of 28. Okay. But I would say -- and actually the complex analytic method could use F squared also. It doesn't because there are, I guess, practical reasons -- I don't know the details -- that the eta quotient is actually faster. Okay. So I would call it -- if I said the problem is to do a CM construction using whatever method you think is the fastest, go, then this is a fair comparison. And as William suggested, another big advantage of the CRT method is how easy it is to parallelize. And so for the larger examples that I've computed, I've been running on a small cluster of machines, and so here's one with a discriminant on the order of 10 to the 13th, class number about 700,000; it took 11 hours. Scaling up to 10 to the 14th and class number over 2 million took about 4 1/2 days. And if I had more computers, I could make that smaller. And there's a lot of headroom here, because what I find most remarkable about this is that the amount of memory, even at 10 to the 14th, is still very small. I mean, the class polynomial mod big P -- our big P here is about 256 bits -- is only about 64 megabytes. Okay. So even with everything, all of the infrastructure surrounding it, we're only using 200 or 300 megabytes per thread here. And the total disk storage, the total amount of stuff we ever needed to write down, is less than 2 gigabytes. It's 28 times 64 megabytes. Now, you might say, well, what about the tens of terabytes you were telling me about earlier? In this example I think the biggest one here is -- so with the Ramanujan invariant, we get it down to 4 terabytes. Okay. We're still computing 4 terabytes. There are 4 terabytes of data cycling through the memory of the machine. We just never need to save it all in one place. So larger computations are certainly feasible. And that's where I'll end. That's all I have. [applause] >> Drew Sutherland: Oh, is there another slide? Oh, I lied. That's not all I have. Sorry. So why did I put this last slide here? Ah, yes. This was to just give you an idea of how things scale. So here I've just picked discriminants close to powers of ten.
And the only thing that's perhaps really worth noticing is that modulo fluctuations in the class number, it's roughly linear, which you'd expect. But the point that I wanted to focus on is the splits. You look at how much time is spent in part C, building the polynomial from its roots, and it's just getting bigger and bigger and bigger. And so as you scale up, that's going to be the limiting factor. And that's really all I have. Okay. Sorry about that. Any questions? >>: [inaudible] asymptotic complexity at all or you just changed the constants of the algorithm overall, that you've dramatically changed [inaudible]. >> Drew Sutherland: Yeah. So -- well, obviously the space. Yeah. >>: [inaudible] but I mean as far as the [inaudible]. >> Drew Sutherland: So there are certainly some log log factors. >>: Right. >> Drew Sutherland: Okay. Well, there are two separate issues. One is what you can prove. Okay, I think it's possible to prove a better bound than the -- I think it was D log to the 7th D. Yeah. It might be possible to improve that by one, but I'm not sure. I need to think about that. >>: [inaudible] >> Drew Sutherland: Yeah. I don't know about that. Then there's the heuristic bound in their paper, which is D log cubed D. I think under pessimistic conditions I don't improve that. For that one I would argue you can come up with some more pragmatic heuristics that say it's better than that. Part of it I guess depends really on how you view word-size multiplications on your computer; if you think those take log squared, you know, bit operations, then I'm not going to get you anything. But... Yeah, I think that's the answer. >>: A question about running things in parallel. Of course, I'm a big fan of the CRT approach, but [inaudible] analytic approach, right [inaudible] points [inaudible] parallel. >> Drew Sutherland: So you've still got the problem of building the polynomial up at the end. And there you're going to have -- I mean, it's not that you can't do it in parallel; it can be done in parallel. But the amount of effort you would have to put into engineering that is much more substantial. I mean, the last few steps of that FFT that you would need to compute -- just think about the very last step of multiplying the last two polynomials together, which is the most time-consuming step. So you've got two polynomials of degree on the order of H over 2 with huge coefficients and you've got to multiply them together. Doing that across a hundred machines is not a fun project. I'm not saying it can't be done, but you'd have to think hard. Whereas here there's no thought involved: okay, you do the first prime, you do the second prime. It's not hard. Any other questions? >>: Can you say anything about [inaudible]? >> Drew Sutherland: So this is all code written in C. I use Montgomery representation for all the finite fields. I take advantage of a couple of other sort of standard tricks. For the elliptic curve implementation I use an affine group law computed in parallel across a bunch of curves, so I'm effectively getting six multiplications per addition, or seven per doubling, because -- again, this is another Montgomery trick -- I'm doing the field inversions in parallel. I need to be able to handle any elliptic curve, so picking a special curve like an Edwards curve or a Montgomery curve isn't really an option in general, and even if I did, I don't believe that it would be faster than the parallel affine trick.
The other big implementation detail that gives a huge constant factor is one that I used earlier, in the paper that I presented at ANTS, which is specialized optimized code for factoring cubics and quartics. So that helps when L is 2 or 3 here -- or when L is 3, I guess -- but also for testing 3-torsion and 4-torsion: being able to find a root of a quartic really fast is the difference between that being useful and not being useful. If you just apply a standard root-finding algorithm -- you know, compute X to the P mod whatever -- it's too slow to help. But if you can solve by radicals and be clever about how you do that, it gets to be fast enough to be useful. And then I tried all sorts of clever things on building the polynomial up from its roots, and then I gave up when I saw David's code. So... Okay. That's it. [applause]
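For reference, here is a sketch of Montgomery's simultaneous-inversion trick mentioned above: n field inversions for the price of one inversion plus about 3n multiplications, which is what makes the batched affine group law pay off.

```python
# Montgomery's trick: invert every element of vals mod p with a single
# modular inversion, via prefix products and a backward sweep.
def batch_inverse(vals, p):
    n = len(vals)
    prefix = [1] * (n + 1)
    for i, v in enumerate(vals):
        prefix[i + 1] = prefix[i] * v % p
    inv = pow(prefix[n], -1, p)          # the one real inversion
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = prefix[i] * inv % p     # (v_0...v_{i-1}) / (v_0...v_i) = 1/v_i
        inv = inv * vals[i] % p
    return out

print(batch_inverse([2, 3, 4], 59))      # -> [30, 20, 15]
```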