CSEP 590TU – Practical Aspects of Modern Cryptography, Winter 2006
Final Project Report
Due: March 7, 2006
Author: J. Jeffry Howbert

FACTORING RSA MODULI: CURRENT STATE OF THE ART

The security of the widely used RSA (Rivest-Shamir-Adleman) public key cryptosystem rests on the presumed difficulty of deriving the two prime factors of the chosen modulus. This report examines the status of that security in two main sections. The first describes the mathematics of the current best methods for factoring very large integers. The second reviews recent progress in applying these methods to factoring RSA moduli, along with the implications for the security of RSA at various modulus sizes.

Algorithms for factoring large integers

Several useful surveys of integer factoring methods exist; see [1] – [4]. [1] is the best overall introduction for someone unfamiliar with the topic, followed by [2] and [3]. The treatment in [4] is more in-depth and comprehensive, but assumes a higher degree of mathematical sophistication.

Known factoring algorithms may be classified in several ways. They can be distinguished, for example, according to their computational complexity, as running in exponential time, subexponential time, or polynomial time. A particularly important division is between special purpose and general purpose factoring algorithms. In special purpose algorithms, running time depends on the size of the integer being factored, the size and number of its factors, and whether the integer has a special form. The running time of general purpose algorithms, by contrast, depends solely on the size of the integer being factored. In practice, only the most advanced general purpose algorithms have been useful for attacking large RSA moduli. With this in mind, the special purpose algorithms will be mentioned only in passing, and emphasis placed instead on the historical and conceptual development of the general purpose algorithms.

Special purpose algorithms of note include:

Trial division by all primes up to √n.
Fermat factorization (see below).
Pollard’s rho algorithm. Invented in 1975.
Pollard’s p - 1 algorithm. Invented in 1974.
Williams’ p + 1 algorithm. Invented in 1982.
Elliptic curve factorization. Invented by H. Lenstra in 1987. (See the end of this section for limited detail.)

The general purpose algorithms are all based on some elaboration of the congruence of squares method. The original version of the method has been improved through several important conceptual advances, leading to remarkable increases in the size of integers which can be factored. The main stages in the evolution of general purpose methods can be summarized as:

1) difference of squares (Fermat’s method)
2) congruence of squares (Kraitchik)
3) filling a matrix with smooth relations, and processing the matrix with linear algebra (Morrison and Brillhart; Dixon)
4) sieving to find smooth relations more efficiently (Pomerance and others)

The cornerstone of the congruence of squares method is Fermat’s method of factorization, discovered by him in the 1600s. He observed that any odd integer n > 1 can be written as the product of the sum and difference of two integers, and therefore as the difference of two integer squares:

n = ( a + b )( a – b ) = a^2 – b^2

Thus if n is a composite number with unknown factors, a factorization of n can be achieved by finding two integer squares whose difference equals n.
The simplest algorithm for searching for appropriate integer squares is to evaluate the expression:

x = n + i^2

for successive i = 0, 1, 2, .... If x is an integer square, then a factorization has been found. This approach often works well in practice for modest size n, but has the disadvantage that testing for integral square roots must be done on numbers x which are at least as large as n. A more efficient approach is to keep x small by evaluating the expression:

x = ( ⌈√n⌉ + i )^2 – n

for successive i = 0, 1, 2, .... As before, finding some i for which x is an integer square generates a factorization:

n = ( ⌈√n⌉ + i )^2 – x = ( ⌈√n⌉ + i – √x )( ⌈√n⌉ + i + √x )

Fermat’s factorization method is especially effective when the factors are similar in size, i.e. are close to √n, but it can be even slower than trial division if the factors are significantly different in size. The first point serves as a caution that the primes selected to form an RSA modulus should not be too close together.

A generalization of Fermat’s method was developed in the 1920s by Kraitchik [5], wherein one searches for integers a and b such that a^2 – b^2 is a multiple of n, that is, a congruence of squares in which:

b^2 ≡ a^2 mod n

If a congruence is found, then n divides a^2 – b^2 = ( a + b )( a – b ). The uninteresting solutions of the congruence can be eliminated in advance by imposing the constraint:

b ≢ ± a mod n

In the remaining cases, the factors of n must be split in some fashion between ( a + b ) and ( a – b ). The factors of n dividing ( a + b ) can be extracted by calculating gcd( n, a + b ), and the factors dividing ( a – b ) by calculating gcd( n, a – b ). If n is indeed composite (and satisfies the added condition that n is not a prime power), then there is at least a 50% chance that gcd( n, a + b ) and gcd( n, a – b ) are non-trivial factors of n (see the short code sketch below).

The true importance of Kraitchik’s generalization is that it allows the factorization to exploit congruences where only one side of the congruence is an integer square. For example, if two congruences (henceforth called relations) can be found such that:

b1 ≡ a1^2 mod n
b2 ≡ a2^2 mod n

where b1 and b2 are not integer squares, but b1 b2 is an integer square, then:

b1 b2 ≡ a1^2 a2^2 mod n

is a congruence of squares, and a factorization has likely been obtained. The approach is extensible, in that any number of relations can be multiplied together to produce one in which the product of the bi on the left-hand side is an integer square. It is easy to generate an arbitrary number of relations bi ≡ ai^2 mod n, but generally non-trivial to find a subset of the bi whose product forms an integer square. Kraitchik approached the problem by collecting bi which are easily and fully factored into small primes. One then looks for some combination of these bi such that each individual prime factor appears in the combined product to an even power. The overall product of the bi must then be an integer square.

With the advent of digital computers, it became feasible to systematically process large numbers of bi. In 1975 Morrison and Brillhart [6] introduced a method called CFRAC, which applied linear algebra and continued fractions to extract congruent squares from large numbers of relations. They used it to factor the seventh Fermat number, a famous result at the time. During the same period, Dixon [7] developed a similar approach which did not involve continued fractions. Although less efficient, Dixon’s algorithm is conceptually much simpler, and will be used for purposes of explanation here.
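Before turning to the details of Dixon’s algorithm, the two ideas above can be made concrete in a few lines of code. The Python sketch below is purely illustrative (the function names fermat_factor and split_from_congruence are invented for this report, and no attention is paid to efficiency): the first function carries out Fermat’s difference of squares search, and the second performs the gcd step that splits n once a congruence of squares with b ≢ ± a mod n is in hand.

    from math import gcd, isqrt

    def fermat_factor(n):
        """Fermat's method: find i such that (ceil(sqrt(n)) + i)^2 - n is a
        perfect square x, then n = (a - sqrt(x)) * (a + sqrt(x))."""
        a = isqrt(n)
        if a * a < n:
            a += 1                        # a = ceiling of sqrt(n)
        while True:
            x = a * a - n                 # candidate for an integer square
            r = isqrt(x)
            if r * r == x:
                return a - r, a + r       # the two factors of n
            a += 1

    def split_from_congruence(n, a, b):
        """Kraitchik's step: given a^2 = b^2 (mod n) with b != +/- a (mod n),
        gcd(n, a - b) and gcd(n, a + b) split the factors of n."""
        return gcd(n, a - b), gcd(n, a + b)

    if __name__ == "__main__":
        print(fermat_factor(5959))                # (59, 101)
        # 10^2 = 100 = 91 + 9, so 10^2 is congruent to 3^2 mod 91, and 10 is not congruent to +/- 3
        print(split_from_congruence(91, 10, 3))   # (7, 13)

The second call also illustrates why the constraint b ≢ ± a mod n matters: with b ≡ ± a mod n, at least one of the two gcds degenerates to n itself.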
The first step in Dixon’s algorithm is to choose a factor base of small primes p1, p2, p3, ..., pk. The largest prime in the set, pk, is called the smoothness bound B, and any integer which factors completely over the factor base is referred to as B-smooth. A set of ai near √n (or near √( kn ), where k is any of various small positive integers) is then chosen at random, and used to generate bi according to:

bi = ai^2 mod n

The bi are trial factored over the factor base. The chances of successful factoring are enhanced by the fact that the ai are near √n and the bi therefore relatively small. Any relation ( ai, bi ) for which bi is smooth is saved.

In the next step, each smooth bi is converted to a vector representation vi of the exponents of its factors. For example, if the factor base = { 2, 3, 5, 7 } and bi = 378 = 2^1 3^3 5^0 7^1, then vi = [ 1, 3, 0, 1 ]. The goal now is as with Kraitchik’s method: find a subset of the bi whose product is an integer square. The vectors vi simplify this in several ways:

The multiplications of bi are replaced by additions of vi.
The squareness of any resulting vector sum is easily tested by reducing it mod 2. (This operation can be illustrated with the example above: vi mod 2 = [ 1, 3, 0, 1 ] mod 2 = [ 1, 1, 0, 1 ].) In the desired result, all the powers of the prime factors in the product of the bi are even; this is equivalent to a vector sum which, reduced mod 2, equals [ 0, 0, 0, ..., 0 ], i.e. has all components zero.
The vi can be placed as rows in a matrix and manipulated with the standard tools of linear algebra.

Of greatest importance, it is now possible to guarantee a solution. This requires only that the number of smooth relations collected be greater than the number of primes in the factor base. In that event, the number of rows in the vector matrix is greater than the number of columns, and the basic theorems of linear algebra assert that a linear dependence exists. The matrix can be reduced using standard methods, which are readily adapted to work with mod 2 arithmetic. In addition to structured Gaussian elimination, the block Lanczos and block Wiedemann methods are popular.

In practice the choice of B is critical. If B is small, testing for smoothness is easy, but it may be difficult or impossible to find any relations that are B-smooth. If B is large, the work involved in smoothness testing goes up, and more relations must be gathered to fill the matrix adequately. An analysis by Pomerance [4] suggests that the optimal choice of B is:

B ~ exp( (1/2) ( ln n )^(1/2) ( ln ln n )^(1/2) )

The linear algebra processing of the matrix, described above, is often referred to as the matrix step.

The final innovations to the basic congruence of squares method involve faster ways to find the B-smooth integers needed to populate the matrix. These replace the random generation of relations ( ai, bi ) tested by trial division with processes that create series of candidate smooth integers separated by multiples of the primes in the factor base. Because this is reminiscent of the sieve of Eratosthenes, it is commonly called the sieving step. The sieving and matrix steps of the most advanced general purpose algorithms (below) are also referred to as data collection and data processing.

The first sieving method to enjoy widespread practical application was the quadratic sieve (QS), invented by Pomerance in 1981. First a continuum of bi is generated over a range of ai near √n, using the formula bi = ai^2 – n. The sieving then proceeds by following these steps for each prime p in the factor base:

1) Determine whether n is a nonzero square modulo p.
If not, sieving with this p is not possible; skip to the next p. (In practice, this is a disqualification for including p in the factor base.)

2) Extract the square roots x1, x2 of n modulo p by solving the congruence x^2 ≡ n mod p. (The Shanks-Tonelli algorithm is efficient for this.)

3) Find the smallest ai in the sieving range such that ai ≡ x1 mod p or ai ≡ x2 mod p, and flag them. Note that the corresponding bi ≡ ai^2 – n ≡ 0 mod p, and so are necessarily divisible by p.

4) Flag all further ai such that:

ai = x1 + kp or ai = x2 + kp, for k = 1, 2, 3, ...

Because ( xi + kp )^2 ≡ xi^2 mod p, we again have the corresponding bi ≡ 0 mod p and therefore divisible by p (this is the essence of the sieve).

5) For all ai which are flagged, divide the corresponding bi by p.

6) Repeat steps 1) – 5) for the integer powers p^r of p, up to some cutoff on r.

When sieving is complete for all p in the factor base, the bi which have been reduced to 1 by repeated division by primes in the factor base are exactly those which are smooth with respect to the factor base. The overall process is radically faster than trial division because only those bi which are divisible are actually divided.

One drawback to the basic QS algorithm is that as ai deviates further and further from √n, the size of bi grows, reducing the probability of it being smooth. This was addressed by an important enhancement developed in 1983 by Davis. Use of multiple quadratic polynomials of the form:

bi = A ai^2 + 2B ai + C

where A, B, and C satisfy certain constraints that ensure bi is a square mod n, gives a continued yield of smooth bi while keeping ai – √n small. This enhancement, known as the multiple polynomial quadratic sieve (MPQS), is widely used in practice.

The yield of smooth bi can also be enhanced by combining partial relations. In a partial relation, bi is smooth except for one (usually large) prime factor px which is not in the factor base. If two partial relations exist which have the same non-smooth factor px:

b1 = p1^e11 p2^e21 ... pk^ek1 px ≡ a1^2 mod n
b2 = p1^e12 p2^e22 ... pk^ek2 px ≡ a2^2 mod n

they can be multiplied together and the non-smooth factor eliminated by multiplying by the square of its inverse mod n, to give a relation fully within the factor base:

b1 b2 = p1^(e11+e12) p2^(e21+e22) ... pk^(ek1+ek2) px^2 ≡ a1^2 a2^2 mod n
b1 b2 (px^-1)^2 ≡ p1^(e11+e12) p2^(e21+e22) ... pk^(ek1+ek2) px^2 (px^-1)^2 ≡ a1^2 a2^2 (px^-1)^2 mod n
b1 b2 (px^-1)^2 ≡ p1^(e11+e12) p2^(e21+e22) ... pk^(ek1+ek2) ≡ a1^2 a2^2 (px^-1)^2 mod n

The harvest of partial relations is vital to factoring truly large integers such as RSA moduli, where the number of full relations derived from partial and double partial relations typically exceeds the number of simple full relations by several-fold. In terms of speed, the sieving step of QS is about 2.5 times faster when partial relations are exploited, and another 2 times faster when double partial relations are also included. There are modest penalties incurred from the greater amount of data generated and stored, but they are more than repaid by the time saved on sieving per se.

Another computational efficiency can be realized by modifying step 5) in the basic QS algorithm. Rather than dividing bi by a prime which divides it, the “prime hit” is recorded by adding log_r p to an accessory storage location (r is an appropriately chosen base). After sieving all the primes in the factor base, there will be some bi for which the sum in the accessory storage location is close to log_r bi; the smoothness of these is confirmed by trial division over the factor base.

QS has been largely superseded by the more powerful and faster general number field sieve (GNFS) method.
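Before moving on to GNFS, it may help to see the whole congruence of squares pipeline in one place: choose a factor base, collect smooth relations bi = ai^2 – n, reduce the exponent vectors mod 2, find a linear dependency, and take gcds. The Python sketch below does exactly this for a toy modulus (n = 1649 = 17 × 97, the small example used in [2]). It is a sketch only: for brevity it trial divides each candidate bi over the factor base instead of sieving, and it uses naive Gaussian elimination over GF(2) instead of block Lanczos or block Wiedemann; all names are invented for this report.

    from math import gcd, isqrt

    def small_primes(limit):
        """Primes up to limit, by the sieve of Eratosthenes."""
        flags = [True] * (limit + 1)
        flags[0] = flags[1] = False
        for p in range(2, isqrt(limit) + 1):
            if flags[p]:
                flags[p * p::p] = [False] * len(flags[p * p::p])
        return [p for p, is_prime in enumerate(flags) if is_prime]

    def exponent_vector(b, base):
        """Exponent vector of b over the factor base, or None if b is not smooth."""
        exps = [0] * len(base)
        for j, p in enumerate(base):
            while b % p == 0:
                b //= p
                exps[j] += 1
        return exps if b == 1 else None

    def toy_congruence_factor(n, bound=100):
        # Factor base: 2 plus the odd primes p <= bound for which n is a
        # nonzero square mod p (other primes can never divide a^2 - n).
        base = [p for p in small_primes(bound)
                if p == 2 or pow(n, (p - 1) // 2, p) == 1]
        # Collect a few more smooth relations than there are primes in the base,
        # so that a linear dependency mod 2 is guaranteed.  A real implementation
        # would find these by sieving, not by trial division.
        relations = []                     # entries are (a, exponent vector of a^2 - n)
        a = isqrt(n) + 1
        while len(relations) < len(base) + 5:
            exps = exponent_vector(a * a - n, base)
            if exps is not None:
                relations.append((a, exps))
            a += 1
        # Gaussian elimination mod 2.  Each row also remembers which original
        # relations have been folded into it.
        rows = [([e % 2 for e in exps], {i}) for i, (_, exps) in enumerate(relations)]
        for col in range(len(base)):
            pivot = next((row for row in rows if row[0][col] == 1), None)
            if pivot is None:
                continue
            rows.remove(pivot)
            for vec, members in rows:
                if vec[col] == 1:
                    for k in range(len(base)):
                        vec[k] ^= pivot[0][k]
                    members ^= pivot[1]    # symmetric difference of relation sets
        # Every surviving all-zero row is a dependency: the product of its b_i is
        # a perfect square, giving a congruence of squares and (usually) a split.
        for vec, members in rows:
            if any(vec):
                continue
            a_prod, b_root = 1, 1
            total = [0] * len(base)
            for i in members:
                a_prod = (a_prod * relations[i][0]) % n
                for k, e in enumerate(relations[i][1]):
                    total[k] += e
            for k, p in enumerate(base):
                b_root = (b_root * pow(p, total[k] // 2, n)) % n
            for g in (gcd((a_prod - b_root) % n, n), gcd((a_prod + b_root) % n, n)):
                if 1 < g < n:
                    return g, n // g
        return None                        # unlucky; retry with a larger bound

    if __name__ == "__main__":
        print(toy_congruence_factor(1649)) # 1649 = 17 * 97

Collecting a handful of relations beyond the size of the factor base guarantees several independent dependencies, so that a trivial gcd from one of them is not fatal.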
The mathematics of GNFS is beyond the scope of this report (and, at present, largely beyond the comprehension of this author). The efficiency of GNFS lies in restricting the search for smooth numbers to those of order n^(1/d), where d is a small integer such as 5 or 6. To achieve this focus on small numbers, however, the computations of both the sieving and matrix steps must be performed in algebraic number fields, making them much more complex than in QS.

The concepts behind GNFS originated with Pollard’s proposal in 1988 to factor numbers of the special form x^3 + k with what subsequently became known as the special number field sieve (SNFS). Over the next five years, it underwent intensive theoretical development into a form of GNFS which proved practical to implement on computers. Major contributors to this progress included the Lenstras, Pomerance, Buhler, and Adleman [2].

To understand the speed advantage of GNFS over QS, it is useful to first examine the resource demands of the various steps in the algorithms. It turns out the sieving steps of both QS and GNFS are extremely CPU-intensive, but require only moderate amounts of memory. Sieving is, however, highly parallelizable, so it can be partitioned across large numbers of fairly ordinary workstations. With QS, for example, a processor can work in isolation for weeks once it has been given the number to factor, the factor base, and a set of polynomials to use in sieving. The factoring of RSA-129 involved over 600 volunteers and their computers; sieving was parceled out and results reported back via email. The matrix step of QS and GNFS is just the opposite – there are huge computational advantages to keeping the matrix in memory while it is processed, but it otherwise takes only a small fraction of the time required for sieving. Historically, the matrix step has been performed on a supercomputer at a central location, although some examples of distributing the matrix step of GNFS have been reported recently. Some statistics on resource usage during the factoring of RSA moduli with QS and GNFS are given in the second section of this report.

The speed advantage of GNFS over QS does not appear until the number being factored exceeds around 110 digits in size. Below that size, QS is still the fastest general purpose factoring algorithm known. However, when working at the limit of what can be factored today (around 200 digits), GNFS is many-fold faster than QS.

The theoretical complexity of all the congruence of squares methods is inherently subexponential. Dixon’s algorithm, which does not use sieving, can have a run time as favorable as:

L( n )Dixon ~ exp( ( √2 + o( 1 ) ) ( ln n )^(1/2) ( ln ln n )^(1/2) )

if the elliptic curve method is used in place of trial division to test smoothness. For QS, the run time is:

L( n )QS ~ exp( ( 1 + o( 1 ) ) ( ln n )^(1/2) ( ln ln n )^(1/2) )

Although the change in the leading constant relative to Dixon’s algorithm might seem trivial, it is in fact of major significance – it leads to a doubling in the size (in digits) of numbers which can be practically factored. The theoretical advantage of GNFS is much more obvious, as the exponent on the ln n term is smaller:

L( n )GNFS ~ exp( ( ( 64/9 )^(1/3) + o( 1 ) ) ( ln n )^(1/3) ( ln ln n )^(2/3) )

With one exception, the special purpose algorithms mentioned in the first part of this section have exponential running times, yet another reason they are not competitive with general purpose algorithms for factoring large integers. The exception is the elliptic curve method (ECM), which is subexponential.
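As a brief aside, these run-time expressions are easy to compare numerically. The Python sketch below evaluates all three with the o( 1 ) terms dropped, so the absolute values are meaningless; only the ratios, and the way they grow with the size of n, are instructive.

    from math import exp, log

    def L_dixon(n):   # exp( sqrt(2) * ( ln n )^(1/2) ( ln ln n )^(1/2) ), o(1) dropped
        return exp(2 ** 0.5 * log(n) ** 0.5 * log(log(n)) ** 0.5)

    def L_qs(n):      # exp( 1 * ( ln n )^(1/2) ( ln ln n )^(1/2) )
        return exp(log(n) ** 0.5 * log(log(n)) ** 0.5)

    def L_gnfs(n):    # exp( (64/9)^(1/3) * ( ln n )^(1/3) ( ln ln n )^(2/3) )
        return exp((64 / 9) ** (1 / 3) * log(n) ** (1 / 3) * log(log(n)) ** (2 / 3))

    if __name__ == "__main__":
        for bits in (512, 768, 1024):
            n = 2 ** bits
            print("%4d bits:  Dixon/QS = %.1e   QS/GNFS = %.1e"
                  % (bits, L_dixon(n) / L_qs(n), L_qs(n) / L_gnfs(n)))

Even in this crude model, the QS-to-GNFS ratio grows steadily with the size of n, consistent with the crossover behavior described above.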
Unlike the congruence of squares methods, the run time of ECM is dominated by the size of the smallest prime factor p of n, rather than the size of n itself. In the worst case, when the smallest prime factor p ~ √n, ECM has the same run time as QS. In favorable cases, where the smallest prime factor is around 20 to 25 digits, ECM is faster than QS or GNFS, and is the method of choice.

As a final note on complexity, there is one known factoring algorithm with polynomial run time, Shor’s algorithm for quantum computers. At present it poses no threat to RSA, due to lack of suitable computing hardware. Its greatest achievement to date was factoring 15 into 3 and 5 on a quantum computer with 7 qubits in 2001.

Application of factoring methods to RSA moduli

Information in this section is drawn primarily from various online documents at [8] and [9].

Historical progress in factoring RSA moduli is best understood in the context of the RSA Factoring Challenge. This is a public contest created by RSA Laboratories in 1991 as a means to understand the practical difficulties involved in factoring large integers of the sort used as RSA moduli. A set of challenge numbers was published, ranging in size from 100 to 500 decimal digits, with each number being composed of exactly two prime factors, similar in size. The numbers were created in such a way that no one, not even RSA Laboratories, knew their factors. This original set of challenge numbers was named RSA-100 through RSA-500, where the number in the name indicates the number of decimal digits in the challenge number. Nominal cash prizes (c. $1000) were offered for successful factorizations. In 2001, the original series of challenge numbers was superseded by a new challenge series, RSA-576 through RSA-2048, where the name indicates the size of the number in bits. These carry substantial cash prizes, ranging from $10,000 for RSA-576 to $200,000 for RSA-2048, but even these amounts pale by comparison with the investment of manpower and computer time required to factor any of the challenge numbers thus far.

The dates of successful factoring and other details are given in the table below for all RSA challenge numbers of 120 or more decimal digits.

Challenge number      Decimal digits   Year factored   Factoring team      Method   Compute time
RSA-120               120              1993            Lenstra, et al      MPQS     830 MIPS-years
RSA-129               129              1994            Atkins, et al       MPQS     5000 MIPS-years
RSA-130               130              1996            Lenstra, et al      GNFS     1000 MIPS-years
RSA-140               140              1999            Montgomery, et al   GNFS     2000 MIPS-years
RSA-155 (512 bits)    155              1999            Montgomery, et al   GNFS     8000 MIPS-years
RSA-160               160              2003            Franke, et al       GNFS     2.7 1-GHz Pentium-years
RSA-576               174              2003            Franke, et al       GNFS     13 1-GHz Pentium-years
RSA-640               193              2005            Bahr, et al         GNFS     30 2.2-GHz Opteron-years
RSA-200               200              2005            Kleinjung, et al    GNFS     75 2.2-GHz Opteron-years
RSA-704               212              not factored
RSA-768               232              not factored
RSA-896               270              not factored
RSA-1024              309              not factored
RSA-1536              463              not factored
RSA-2048              617              not factored

MPQS = multiple polynomial quadratic sieve
GNFS = general number field sieve

To better grasp the magnitude of these efforts, we can look at more detailed statistics for RSA-129 and RSA-200, the largest RSA challenge numbers factored by MPQS and GNFS, respectively (data taken from [1] and [10]).
RSA-129:
year completed: 1994
size of factor base: 524339
large prime bound: 2^30
full relations: 1.1 X 10^5
additional full relations derived from partial and double partial relations: 4.6 X 10^5
amount of data: 2 GB
time for sieving step: 5000 MIPS-years
time for matrix step: 45 hrs

RSA-200:
year completed: 2005
factor base bound (algebraic side): 3 X 10^8
factor base bound (rational side): 18 X 10^7
large prime bound: 2^35
relations from lattice sieving: 26 X 10^8
relations from line sieving: 5 X 10^7
total relations (after removing duplicates): 22.6 X 10^8
matrix size (rows and columns): 64 X 10^6
non-zero entries in matrix: 11 X 10^9
time for sieving step: 55 2.2-GHz Opteron-years
time for matrix step (solved by block Wiedemann): 20 2.2-GHz Opteron-years

The hardware used for the matrix step of RSA-200 was a cluster of 80 2.2-GHz Opterons connected via a Gigabit network. As expected for a problem that pushes the envelope of computational feasibility, the quantities and sizes of everything involved are staggering. It also seems evident that further scaling to attack larger RSA moduli will not be easily achieved.

This brings us to the million-dollar question for the consumer of RSA cryptosystems: Are the RSA keys in use at my organization secure? Or more precisely, at what point in the future will advances in factoring methods put those keys at risk? (The term RSA key is equivalent to RSA modulus for the purposes of the remaining discussion.) No one any longer advocates use of 512 bit RSA keys; with today’s hardware and algorithms, such a key could be factored by a cluster of a few dozen PCs in a month. What about the currently recommended standards of 768 bit, 1024 bit, and 2048 bit keys? The answer must consider the value of the information being protected by the key, in addition to the purely technical issues around the difficulty of factoring. Extrapolating from the recent pace of advances in factoring, we might expect a massive effort to succeed in factoring a 768 bit key sometime in the next 5 to 7 years, whereas the 1024 bit benchmark should stand for decades (details below). To factor a chosen 768 bit key, an adversary would have to fully replicate that effort, spending millions of dollars and months of time. For information which has modest value (say a typical online consumer purchase), and/or where the data has a transient lifetime (SSL sessions), there is simply no economic incentive for the adversary, and a 768 bit key is probably adequate. For data with higher value (e.g. large bank transactions), and/or where longevity of decades is required (signatures on contracts), a minimum key size of 1024 bits is advisable, and 2048 bits could be considered.

The above extrapolation assumes:

no fundamental breakthroughs in factoring algorithms
factoring efforts will continue to use general-purpose computer hardware
the capability of general-purpose computer hardware will improve at traditional rates

With these conservative assumptions, it is straightforward to estimate the resource requirements for factoring untried larger integers, relative to the resources required for known successful factorings. Here n’ is the untried larger integer, and n is an integer which has been successfully factored:

time required for n’ relative to n = L( n’ )GNFS / L( n )GNFS
memory required for n’ relative to n = ( L( n’ )GNFS / L( n )GNFS )^(1/2)

The current best estimate is that GNFS sieving on a 768-bit integer would require 18,000 PCs, each with 5 GB of memory, working for a year [11].
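To illustrate, the short Python sketch below evaluates these two ratios for 768-bit and 1024-bit moduli relative to a 200-decimal-digit reference (the size of RSA-200), again with the o( 1 ) term dropped, so the results are order-of-magnitude guides rather than predictions.

    from math import exp, log

    def L_gnfs(n):
        # exp( (64/9)^(1/3) * ( ln n )^(1/3) ( ln ln n )^(2/3) ), o(1) dropped
        return exp((64 / 9) ** (1 / 3) * log(n) ** (1 / 3) * log(log(n)) ** (2 / 3))

    def relative_cost(bits, n_reference):
        """Time and memory for a bits-bit modulus relative to n_reference,
        using time ~ L(n')/L(n) and memory ~ ( L(n')/L(n) )^(1/2)."""
        ratio = L_gnfs(2 ** bits) / L_gnfs(n_reference)
        return ratio, ratio ** 0.5

    if __name__ == "__main__":
        rsa200 = 10 ** 200               # a 200-decimal-digit reference integer
        for bits in (768, 1024):
            time_x, memory_x = relative_cost(bits, rsa200)
            print("%4d bits: time x %.0e, memory x %.0e" % (bits, time_x, memory_x))

By this crude measure a 1024-bit modulus costs roughly three orders of magnitude more sieving time than a 768-bit one, which is broadly consistent with the PC-based estimates quoted in the surrounding text.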
To sieve a 1024-bit integer in a year would take on the order of 50,000,000 PCs, each with 10 GB of main memory, plus additional DRAM. The cost to acquire the latter hardware would exceed US$ 10^11!

Will these conservative assumptions hold? It is impossible to know whether and when a major algorithmic advance over GNFS might occur. There are no well-identified theoretical avenues to such an advance. But it is worth noting that in the heyday of QS, numerous efforts to produce better algorithms gave results with theoretical run times no better than QS, and there was speculation that this might represent a fundamental lower limit on run times. There is, of course, the dark horse of quantum computing. If quantum hardware ever scales, RSA cryptosystems could quickly become worthless.

The real threat to overturning past trends, however, probably lies with proposals to perform GNFS sieving using specially designed hardware [11]. The past five years have seen the emergence of designs known as TWINKLE (based on electro-optics) and TWIRL (based on parallel processing pipelines), and one using mesh circuits (based on two-dimensional systolic arrays). The designs appear to have matured beyond the conceptual stage, and might be ready for serious attempts at reduction to practice. TWIRL seems to offer the greatest overall advantage in terms of cost and speed. It is estimated that c. 200 independent 1 GHz TWIRL clusters could complete GNFS sieving on a 1024 bit integer in one year. Building the clusters would require a one-time R&D investment of US$ 10-20M, but only around US$ 1.1M for the actual manufacture. This reduces the cost of factoring by 5 to 6 orders of magnitude, and brings it easily within the reach of large organizations, especially governments.

References

[1] A. K. Lenstra, “Integer Factoring”, Designs, Codes, and Cryptography, 19: 101-128 (2000).
[2] C. Pomerance, “A Tale of Two Sieves”, Notices Amer. Math. Soc., 43: 1473-1485 (1996).
[3] Wikipedia contributors, “Integer factorization”, Wikipedia, The Free Encyclopedia. Retrieved March 5, 2006 from http://en.wikipedia.org/w/index.php?title=Integer_factorization&oldid=40925252.
[4] R. Crandall and C. Pomerance, Prime Numbers: A Computational Perspective, 2nd Ed., Chaps. 5 and 6, Springer, New York (2005).
[5] M. Kraitchik, Théorie des Nombres, II, pp. 195-208, Gauthier-Villars, Paris (1926).
[6] M. A. Morrison and J. Brillhart, “A method of factoring and the factorization of F7”, Math. Comp., 29: 183-205 (1975).
[7] J. Dixon, “Asymptotically fast factorization of integers”, Math. Comp., 36: 255-260 (1981).
[8] http://www.rsasecurity.com/rsalabs/
[9] http://www.crypto-world.com/FactorWorld.html
[10] http://www.loria.fr/~zimmerma/records/rsa200
[11] A. Shamir and E. Tromer, “Special-Purpose Hardware for Factoring: the NFS Sieving Step”, Proc. Workshop on Special Purpose Hardware for Attacking Cryptographic Systems (SHARCS) (2005). Available at: http://www.wisdom.weizmann.ac.il/~tromer/papers/hwsieve.pdf