Math 5330 Spring 2013
Elementary factoring algorithms

The RSA cryptosystem is founded on the idea that, in general, factoring is hard. Whereas with Fermat's Little Theorem and some related ideas one can usually tell very quickly whether a number is composite, actually producing a factorization of a composite number is a very different thing. Currently, the only method at our disposal is trial division.

For small numbers, trial division is the method of choice. If you wish to factor a number $n < 10^{10}$, you should probably use trial division. But what if you want to factor a large number? Trial division still has a part to play. If you have a number of size roughly $10^{30}$, then you would need to be very lucky to factor it with trial division. If the number were the product of two nearly equal primes (or if the number itself were prime), then you would have to perform trial division up to about $10^{15}$ to see this. To put this in perspective, there are roughly 29,000,000,000,000 primes up to $10^{15}$, and even if we could perform $10^6$ multiprecision divisions a second, it would take 29,000,000 seconds to try them all. That is, trial division could take about a year.

So what to do with a 30-digit or larger number? First, one usually uses trial division for a while. After all, we know how to factor any even number. At some point, it is useful to know that the number actually is composite, so after some trial division, if $m$ is the current unfactored part, calculate $2^{m-1} \pmod m$. If it is not 1, then $m$ is composite. Usually one does some more trial division (try, say, all primes $p < 10^6$). But after that, switch to some other factoring method.

What other factoring methods are there? Here I will present several other fairly simple factoring methods. The first dates back to Fermat; the rest are less than 50 years old.

Fermat's Factoring Method

Our first method is based on the idea that if $n = x^2 - y^2$, then $n = (x - y)(x + y)$.
That is, we will try to represent $n$ as the difference of two squares, and use that representation to factor $n$. To do this, we start with the number $x_0 = \lceil \sqrt{n} \rceil$ and calculate $(x_0 + k)^2 - n$ for $k = 0, 1, 2, \ldots$, stopping when a square is returned. There is a trick to speed up the calculation of $(x_0 + k)^2 - n$: two successive values are related. That is,

$$(x_0 + k + 1)^2 - n = [(x_0 + k)^2 - n] + 2(x_0 + k) + 1,$$

so we only have to calculate one square. For example, if $n = 3977$, then $x_0 = 64$, and we need to calculate $x_0^2 - n = 64^2 - 3977 = 119$. To calculate $65^2 - 3977$ we don't even have to square 65; we just add $2 \times 64 + 1$ to 119 to get 248. Moreover, the numbers $2(x_0 + k) + 1$ grow by 2 each time, so we don't need to recalculate them either; we just add 2 to the previous value. Here is a table for these calculations.

   k    x_0 + k    2(x_0 + k) + 1    (x_0 + k)^2 - n
   0    64         129               119
   1    65         131               248
   2    66         133               379
   3    67         135               512
   4    68         137               647
   5    69         139               784 = 28^2

What this tells us is that $69^2 - 3977 = 28^2$, which we rearrange as $3977 = 69^2 - 28^2 = (69 - 28)(69 + 28) = 41 \times 97$. Each iteration in the table goes very fast on a computer; the most difficult step is determining whether $(x_0 + k)^2 - n$ is a square.

Fermat's factoring method works reasonably well for small numbers $n$ and for numbers $n = pq$ where $p$ and $q$ are nearly equal. An example I've come across is in trying to factor $n = 10^{22} + 1$. If you use trial division for a while, you find the factors 89 and 101, leaving a 19-digit number, 1,112,470,797,641,561,909. If you try Fermat's method on this number, you fairly quickly find $1{,}112{,}470{,}797{,}641{,}561{,}909 = 1056689261 \times 1052788969$.

How good is Fermat's method? For small numbers, it is a reasonable thing to try. But in fact, it is worse than trial division in general! The worst case of Fermat's method is where $n$ is prime. In this case, $n$ factors only as $n \cdot 1$, so we need $x + y = n$ and $x - y = 1$. This means $x = \frac{n+1}{2}$ and $y = \frac{n-1}{2}$. Now the $x$ here is $x_0 + k$, where $x_0$ is roughly $\sqrt{n}$. That is, we need $\sqrt{n} + k = \frac{n+1}{2}$, so it takes $k \approx \frac{n+1}{2} - \sqrt{n}$ steps before concluding that $n$ is prime.

To see what this means, suppose we have an $n$ around $10^{10}$. This is a very small number, as factoring goes. If $n$ is prime, it will take about $\sqrt{n}$ steps, or $10^5$ steps, to show this by trial division. With Fermat's method, it will take $\frac{1}{2} 10^{10} - 10^5$ steps. Thus trial division takes about 100,000 steps, while Fermat's method takes 4,999,900,000 steps.

On average, one expects a composite number $n$ to have a prime divisor of size about $n^{0.63}$ and a coprime part of size about $n^{0.37}$. If the coprime part is actually prime, then trial division will find the factorization of $n$ in about $n^{0.37}$ steps. Fermat's method will take something like $\frac{1}{2} n^{0.63}$ steps, so again trial division wins. Thus, in general, one should never run Fermat's method to completion. You can try several million steps, maybe, hoping to get lucky, but then switch to something else.

Before moving on to the next method, I should mention that many approaches can be improved, or are more advantageous in some situations than in others. We already know, for example, that if $n = 2^p - 1$ with $p$ prime, then the only possible prime divisors of $n$ are primes $q \equiv 1 \pmod p$, so we can skip most numbers when using trial division on such numbers.

With Fermat's method, there is another way to speed things up. Paradoxically, it is to try to factor a number larger than $n$ rather than factoring $n$. Pick some appropriate number $m$, and try to factor $mn$ rather than $n$. The idea is that $mn$ might factor into two nearly equal parts. Here is a simple example. If we wish to factor 1207 with Fermat's method, then $x_0 = 35$, and after 10 steps we get $x_0 + 9 = 44$, with $1207 = 44^2 - 27^2 = (44 + 27)(44 - 27) = 71 \times 17$. If, on the other hand, we first multiply $n$ by 3 and use Fermat's method on 3621, then $x_0 = 61$ and already we have $61^2 - 3621 = 100 = 10^2$. Here, we have $3621 = 61^2 - 10^2 = 71 \times 51$, and removing the factor of 3 from 51, we recover $1207 = 71 \times 17$.
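The worked examples above can be reproduced with a short sketch. This is a minimal Python illustration rather than a tuned implementation: the function names `fermat_factor` and `fermat_with_multiplier`, the `max_steps` cutoff, and the choice to return the step count alongside the factors are assumptions made here for illustration. It assumes $n$ is odd, and it uses the "add $2(x_0 + k) + 1$" trick instead of squaring at every step.

```python
from math import gcd, isqrt

def fermat_factor(n, max_steps=1_000_000):
    """Search for x with x^2 - n a perfect square (n assumed odd).

    Returns (x - y, x + y, steps) or None if max_steps is exhausted.
    """
    x = isqrt(n)
    if x * x < n:
        x += 1                      # x0 = ceil(sqrt(n))
    r = x * x - n                   # current value of x^2 - n
    step = 2 * x + 1                # what to add to get (x + 1)^2 - n
    for tried in range(1, max_steps + 1):
        y = isqrt(r)
        if y * y == r:              # r is a perfect square
            return x - y, x + y, tried
        r += step
        step += 2                   # successive increments grow by 2
        x += 1
    return None

def fermat_with_multiplier(n, m, max_steps=1_000_000):
    """Run Fermat's method on m*n, then strip the multiplier with a gcd."""
    result = fermat_factor(m * n, max_steps)
    if result is None:
        return None
    a, _, tried = result
    g = gcd(a, n)                   # part of the factor a that belongs to n
    return g, n // g, tried

print(fermat_factor(3977))              # factors 41 and 97
print(fermat_factor(1207))              # factors 17 and 71, found at x = 44
print(fermat_with_multiplier(1207, 3))  # same factors, found at the first x
```

If the gcd comes back as 1 or $n$, the multiplier did not help and another $m$ (or another method) would be needed; the sketch does not handle that case.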
In general, one multiplies $n$ by some number with lots of factors, like $315 = 3^2 \times 5 \times 7$, in the hope that some of its factors multiplying $p$ and the others multiplying $q$ produce nearly equal numbers. For example, suppose we wish to use Fermat's method to factor 7421. This would require 25 steps with Fermat's method: $x_0 = 87$, $87^2 - 7421 = 148$, $88^2 - 7421 = 323$, $\ldots$, $(87 + 24)^2 - 7421 = 4900 = 70^2$. If, instead, we multiply $n$ by 315 and try to factor 2337615, then four steps are required: $x_0 = 1529$, $1529^2 - 2337615 = 226$, $1530 \to 3285$, $1531 \to 6346$, $1532 \to 9409 = 97^2$. The reason: $7421 = 41 \times 181$, and these primes are far apart. However, multiplying by 315 gave the factorization $315 \times 7421 = 1532^2 - 97^2 = 1629 \times 1435 = (9 \times 181)(35 \times 41)$.

Multiplying by a number $m$ CAN make Fermat's method worse. I believe there is an algorithm for picking a sequence of numbers $m$ to multiply by $n$. One tries Fermat's method on each $mn$ for some prescribed period of time, and in the end, you can factor $n$ in something under $\sqrt[3]{n}$ steps rather than the $\sqrt{n}$ steps required by trial division. I do not know the details.

The next two methods were both devised by a mathematician by the name of John Pollard. They are both considerably better than trial division. However, before using them, one should check that $2^n \not\equiv 2 \pmod n$, so one knows $n$ is composite.

Pollard's rho method (1975)

This method uses an "iterated functions approach." Let $f(x) = x^2 + 1$ (lots of other functions could be used instead of this one), and consider the sequence $f(1), f(f(1)), f(f(f(1))), \ldots \pmod p$. This sequence will be eventually periodic. This means that after a while, a periodic pattern will present itself. For example, if $p = 23$, the sequence (starting from the initial value 1) is $1, 2, 5, 3, 10, 9, 13, 9, 13, 9, \ldots$. We call $1, 2, 5, 3, 10$ the tail of this eventually periodic pattern. If we let $f^m(x)$ represent the $m$-fold composition $f(f(\cdots f(x) \cdots))$, then for any prime $p$ there are integers $k \ne m$ for which $f^k(1) \equiv f^m(1) \pmod p$.
This is because there are only $p$ possible remainders when a number is divided by $p$, but there are infinitely many $m$. Once we have an $m$ and a $k$, then $f^{k+1}(1) \equiv f^{m+1}(1)$, $f^{k+2}(1) \equiv f^{m+2}(1)$, and so on. This means that if $p$ is some unknown divisor of $n$, and if we could find the right $m$ and $k$, then we might be able to find $p$, because $p$ would be a divisor of $\gcd(f^m(1) - f^k(1), n)$.

How do we find $m$ and $k$ when we don't even know $p$? We use a method called Floyd's Cycle Finding Algorithm. The algorithm works like this: Suppose we have a sequence $a_0, a_1, a_2, \ldots$ which is eventually periodic. Then $a_m = a_{2m}$ for some positive integer $m$. We can use this to form a factoring algorithm: To factor $n$, for $k = 1, 2, 3, \ldots$, calculate $\gcd(f^{2k}(1) - f^k(1), n)$. In fact, what we do is calculate the sequence $f^k(1) \pmod n$, to keep the numbers from getting too large, and for even values of $k$, we calculate $\gcd(f^k(1) - f^{k/2}(1), n)$. As an example, let $n = 1357$. We have

   k     f^k     f^{k/2}   difference   gcd
   1     2
   2     5       2         3            1
   3     26
   4     677     5         672          1
   5     1021
   6     266     26        240          1
   7     193
   8     611     677       -66          1
   9     147
   10    1255    1021      234          1
   11    906
   12    1209    266       943          23

and so 23 is a divisor of 1357. The reason this works should be clear if we just do things modulo 23:

   k     f^k     f^{k/2}   difference
   1     2
   2     5       2         3
   3     3
   4     10      5         5
   5     9
   6     13      3         10
   7     9
   8     13      10        3
   9     9
   10    13      9         4
   11    9
   12    13      13        0

That is, $f^{12}(1) - f^6(1)$ is divisible by 23, so it is at the stage $k = 12$ that the prime 23 is discovered by Pollard's rho algorithm.

How fast is the rho method? Certainly it has to find a prime $p$ in at most $p$ steps. This does not sound very good: trial division will find $p$ in exactly $p$ steps. However, there is reason to believe the rho method finds $p$ much faster than $p$ steps. Suppose, instead of the numbers $f^m(1)$, we just produced random numbers. How long would it take before two of our random numbers agreed modulo $p$?
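Before turning to that question, here is a minimal Python sketch of the procedure just described, using $f(x) = x^2 + 1$ and Floyd's cycle finding. The function name `pollard_rho`, the `start` and `max_steps` parameters, and the choice to return the index $2k$ together with the divisor are assumptions made for illustration; the sketch also simply gives up (returns `None`) when the gcd equals $n$, where a fuller implementation would retry with a different starting value or function.

```python
from math import gcd

def pollard_rho(n, start=1, max_steps=1_000_000):
    """Look for a nontrivial divisor of n via Floyd cycle finding.

    Returns (divisor, index) where index = 2k is the position of the
    faster sequence when the divisor appeared, or None on failure.
    """
    def f(x):
        return (x * x + 1) % n      # iterate modulo n to keep numbers small

    slow = fast = start
    for k in range(1, max_steps + 1):
        slow = f(slow)              # f^k(start) mod n
        fast = f(f(fast))           # f^(2k)(start) mod n
        d = gcd(fast - slow, n)
        if d == n:                  # both sequences collided modulo n itself
            return None
        if d > 1:
            return d, 2 * k
    return None

print(pollard_rho(1357))            # divisor 23, found at index 12
```

The two variables `slow` and `fast` play the roles of $f^{k/2}(1)$ and $f^k(1)$ in the tables, so only the even-indexed rows are ever compared.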
This is a variation of the birthday problem in probability: If you pick $k$ things (with replacement) from $n$ types of things, what is the probability of getting two of the same thing? The probability that they are all different is

$$\frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k} = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right).$$

Let's ask a different question: When is the probability of finding a match $\frac{1}{2}$? To approximate the probability, take the logarithm. We want

$$-\ln 2 = \sum_{j=1}^{k-1} \ln\left(1 - \frac{j}{n}\right).$$

Using the approximation $\ln(1 - x) \approx -x$, we want

$$\ln 2 \approx \frac{1}{n} + \frac{2}{n} + \cdots + \frac{k-1}{n} = \frac{k(k-1)}{2n} \approx \frac{k^2}{2n}.$$

This means we want $k \approx \sqrt{2n \ln 2} \approx 1.177\sqrt{n}$. For example, with the birthday problem (how many people do you need in a room to have a 50-50 chance that two have a birthday in common?), this says you would need about $1.177\sqrt{365} \approx 22.5$ people.

What this means for the rho method: If the numbers $f^m(1)$ "act" random enough, then we expect to find a prime $p$ not in $p$ steps, but more like $1.177\sqrt{p}$ steps. Numerical evidence supports this, so for simplicity, we say the rho method probably finds a factor $p$ in about $\sqrt{p}$ steps, and since the smallest prime factor of a composite $n$ is at most $\sqrt{n}$, this is at most $n^{1/4}$ steps. More is known. If we used a simpler function for $f(x)$, say $f(x) = ax + b$, a linear function rather than a quadratic, then the iterates do not seem random enough, and we get something more like $p$ steps again. But using most quadratic or higher degree polynomials, the iterates do appear to act like random numbers.

Pollard's p − 1 method (1974)

Recall Fermat's Little Theorem yet again: For any prime $p$ and any number $a$ with $p \nmid a$, we have $a^{p-1} \equiv 1 \pmod p$. In particular, if $p > 2$, then $2^{p-1} \equiv 1 \pmod p$. If $m$ is a multiple of $p - 1$, say $m = k(p - 1)$, then $2^m = (2^{p-1})^k \equiv 1^k \equiv 1 \pmod p$. This means that $p \mid 2^m - 1$ for any $m$ where $(p - 1) \mid m$. For example, if $p = 7$, then $p - 1 = 6$, so $7 \mid 2^m - 1$ for any $m$ divisible by 6. For instance, $2^{12} - 1 = 4095 = 7 \times 585$.
We can turn this into a factoring algorithm as follows: take a sequence of $m$'s with lots of small factors (we will use the sequence $m_k = k!$, but other sequences would work as well). For each term in the sequence, we calculate $\gcd(n, 2^{m_k} - 1)$, and stop when the gcd returns a number larger than 1. This method will find a prime divisor $p$ of $n$ if $(p - 1) \mid m_k$. This method works very well if $p - 1$ has only small prime divisors.

The Maple command "ifactor(n, easy)" does the following: It uses trial division up to some limit, and then uses some fixed number of iterations of the $p - 1$ method. For example, ifactor($10^{37} - 1$, easy) returns $(3)^2$ c28 $(247629013)$. What this means is that it found 9 and 247,629,013 as factors of $10^{37} - 1$, leaving a 28-digit number that it knew to be composite (the meaning of the c). The factor 247629013 was found by the $p - 1$ method. It was successful because $p - 1 = 2^2 \times 3 \times 37 \times 41 \times 61 \times 223$ has only small prime divisors. In particular, it did NOT find the smaller prime divisor $q = 2028119$, because $q - 1 = 2 \times 37 \times 27407$, and it did not do enough iterations to reach an $m$ with $27407 \mid m$.

As a simple example of the $p - 1$ method, let's factor $n = 3811$. As with the rho method, we form a table:

   k    2^{k!} (mod 3811)    gcd(2^{k!} - 1, 3811)
   2    4                    1
   3    64                   1
   4    1194                 1
   5    2172                 1
   6    3257                 37

and $3811 = 37 \times 103$. We found 37 at step $k = 6$ because $37 - 1 = 36$ is a divisor of $6! = 720$.

Some notes on this table: We did not calculate $2^{k!}$, but $2^{k!} \pmod n$. Also, one can calculate $2^{(k+1)!}$ by using the formula $2^{(k+1)!} = (2^{k!})^{k+1}$, together with the binary squaring algorithm. That is, once we know $2^{5!} \equiv 2172 \pmod{3811}$, we calculate $2^{6!} \pmod{3811}$ by calculating instead $2172^6 \pmod{3811}$.

In real life, back in the late 70's, the $p - 1$ method was used to show that $10^{53} - 1$ is divisible by $p = 1325815267337711173$. In fact, this prime was found fairly quickly because $p - 1 = 2^2 \times 3^2 \times 11 \times 53 \times 1279 \times 1553 \times 3557 \times 8941$, which has all of its prime divisors less than 10,000.
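The table for $n = 3811$ can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `pollard_p_minus_1` and the `max_k` cutoff are chosen here for illustration, the base is fixed at 2 and the exponents are $k!$ as in the notes, and the sketch returns `None` when the gcd jumps straight to $n$ instead of retrying with another base. Python's built-in three-argument `pow` performs the modular exponentiation by repeated squaring.

```python
from math import gcd

def pollard_p_minus_1(n, max_k=10_000):
    """Pollard's p - 1 method with base 2 and exponents k!.

    Returns (divisor, k) for the first k where gcd(2^(k!) - 1, n) > 1,
    or None if nothing nontrivial is found.
    """
    a = 2                           # a holds 2^(k!) mod n, starting at k = 1
    for k in range(2, max_k + 1):
        a = pow(a, k, n)            # 2^(k!) = (2^((k-1)!))^k, reduced mod n
        d = gcd(a - 1, n)
        if d == n:                  # every prime factor appeared at once
            return None
        if d > 1:
            return d, k
    return None

print(pollard_p_minus_1(3811))      # divisor 37, found at k = 6
```

Raising `max_k` corresponds to "doing more iterations" in the discussion of the Maple example: a prime $q$ is only found once every prime power dividing the order of 2 modulo $q$ divides $k!$.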