CSEP 590TU – Practical Aspects of Modern Cryptography
Winter 2006
Final Project Report
Due: March 7, 2006
Author: J. Jeffry Howbert
FACTORING RSA MODULI: CURRENT STATE OF THE ART
The security of the widely used RSA (Rivest-Shamir-Adleman) public key cryptosystem rests on
the presumed difficulty of deriving the two prime factors of the chosen modulus. This report
examines the status of that security in two main sections. In the first, the mathematics of the
current best methods for factoring very large integers is described. The second section reviews
recent progress in the application of these methods to factoring RSA moduli, along with the
implications for the security of RSA when using various size moduli.
Algorithms for factoring large integers
Several useful surveys of integer factoring methods exist; see [1] – [4]. [1] is the best overall
introduction for someone unfamiliar with the topic, followed by [2] and [3]. The treatment in [4] is
more in-depth and comprehensive, but assumes a higher degree of mathematical sophistication.
Known factoring algorithms may be classified in several ways. They can be distinguished, for
example, according to their computational complexity, as running in either exponential time,
subexponential time, or polynomial time. A particularly important division is between special
purpose and general purpose factoring algorithms. In special purpose algorithms, running time
depends on the size of the integer being factored, the size and number of the factors, and
whether the integer has a special form. The running time of general purpose algorithms, by
contrast, depends solely on the size of the integer being factored. In practice, only the most
advanced general purpose algorithms have been useful for attacking large RSA moduli. With this
in mind, the special purpose algorithms will be mentioned only in passing, and emphasis placed
instead on the historical and conceptual development of the general purpose algorithms.
Special purpose algorithms of note include:
• Trial division by all primes up to √n.
• Fermat factorization (see below).
• Pollard’s rho algorithm. Invented in 1975.
• Pollard’s p – 1 algorithm. Invented in 1974.
• Williams’ p + 1 algorithm. Invented in 1982.
• Elliptic curve factorization. Invented by H. Lenstra in 1987. (See the end of this section for
  limited detail.)
The general purpose algorithms are all based on some elaboration of the congruence of squares
method. The original version of the method has been improved through several important
conceptual advances, leading to remarkable increases in the size of integers which can be
factored. The main stages in the evolution of general purpose methods can be summarized as:
1) difference of squares (Fermat’s method)
2) congruence of squares (Kraitchik)
3) filling a matrix with smooth relations, and processing the matrix with linear algebra (Morrison
and Brillhart; Dixon)
4) sieving to find smooth relations more efficiently (Pomerance and others)
The cornerstone of the congruence of squares method is Fermat’s method of factorization,
discovered by him in the 1600s. He observed that any odd integer n > 1 can be written as the
product of the sum and difference of two integers, and therefore as the difference of two integer
squares:
n = ( a + b )( a – b ) = a^2 – b^2
Thus if n is a composite number with unknown factors, a factorization of n can be achieved by
finding two integer squares whose difference equals n. The simplest algorithm for searching for
appropriate integer squares is to evaluate the expression:
x = n + i^2
for successive i = 0, 1, 2, ...
If x is an integer square, then a factorization has been found. This approach often works well in
practice for modest size n, but has the disadvantage that testing for integral square roots must be
done on numbers x which are at least as large as n. A more efficient approach is to keep x small
by evaluating the expression:
x = ( n + i )2 – n
for successive i = 0, 1, 2, ....
As before, finding some i for which x is an integer square generates a factorization:
n = ( n + i )2 – x = ( n + i – x )( n + i + x )
Fermat’s factorization method is especially effective when the factors are similar in size, i.e. are
close to √n, but it can be even slower than trial division if the factors are significantly different in size.
The first point serves as a caution that the primes selected to form an RSA modulus should not
be too close together.
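The search just described is simple enough to write down directly. Below is a minimal Python sketch of Fermat’s method (my own illustration, not taken from the cited sources), using the second, more efficient form x = ( ⌈√n⌉ + i )^2 – n and the integer square root math.isqrt:

import math

def fermat_factor(n):
    """Minimal sketch of Fermat's method for an odd n > 1:
    search x = a^2 - n for a perfect square, starting at a = ceil(sqrt(n))."""
    assert n > 1 and n % 2 == 1
    a = math.isqrt(n)
    if a * a < n:
        a += 1                          # a = ceil(sqrt(n))
    while True:
        x = a * a - n                   # candidate difference of squares
        b = math.isqrt(x)
        if b * b == x:                  # x is a perfect square
            return a - b, a + b         # n = ( a - b )( a + b )
        a += 1

# Example: 5959 = 59 * 101 is found after only a few steps, because the
# two factors are close to sqrt(5959) ~ 77.
print(fermat_factor(5959))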
A generalization of Fermat’s method was developed in the 1920s by Kraitchik [5], wherein one
searches for integers a and b such that a^2 – b^2 is a multiple of n, that is, a congruence of squares
in which:
b^2 ≡ a^2 mod n
If a congruence is found, then n divides a^2 – b^2 = ( a + b )( a – b ). The uninteresting solutions of
the congruence can be eliminated in advance by imposing the constraint:
b ≢ ±a mod n
In the remaining cases, the factors of n must be split in some fashion between ( a + b ) and ( a –
b ). The factors of n dividing ( a + b ) can be extracted by calculating gcd( n, a + b ), and the
factors dividing ( a – b ) by calculating gcd( n, a – b ). If n is indeed composite (and satisfies the
added condition that n is not a prime power), then there is at least a 50% chance that gcd( n, a +
b ) and gcd( n, a – b ) are non-trivial factors of n.
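As a concrete illustration of the gcd step, here is a minimal Python sketch (the function name is mine) applied to a small example along the lines of the one in [2]: for n = 1649 one can find 114^2 ≡ 80^2 mod 1649, and the two gcds then deliver the factors 97 and 17.

import math

def split_from_congruence(n, a, b):
    """Given a^2 ≡ b^2 (mod n) with a not congruent to ±b, try to split n."""
    assert (a * a - b * b) % n == 0
    return math.gcd(n, a + b), math.gcd(n, a - b)   # non-trivial at least half the time

print(split_from_congruence(1649, 114, 80))         # prints (97, 17)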
The true importance of Kraitchik’s generalization is that it allows the factorization to exploit
congruences where only one side of the congruence is an integer square. For example, if two
congruences (henceforth called relations) can be found such that:
b1 ≡ a1^2 mod n
b2 ≡ a2^2 mod n
where b1 and b2 are not integer squares, but b1 · b2 is an integer square, then:
b1 · b2 ≡ a1^2 · a2^2 mod n
is a congruence of squares, and a factorization has likely been obtained. The approach is
extensible, in that any number of relations can be multiplied together to produce one in which the
product of the bi on the left-hand side is an integer square.
It is easy to generate an arbitrary number of relations bi ≡ ai^2 mod n, but generally non-trivial to
find a subset of bi whose product forms an integer square. Kraitchik approached the problem by
collecting bi which are easily and fully factored into small primes. One then looks for some
combination of these bi in whose product every individual prime factor appears to an even
power. The overall product of the bi must then be an integer square.
With the advent of digital computers, it became feasible to systematically process large numbers
of bi. In 1975 Morrison and Brillhart [6] introduced a method called CFRAC, which applied linear
algebra and continued fractions to extract congruent squares from large numbers of relations.
They used it to factor the seventh Fermat number, a famous result at the time. During the same
period, Dixon [7] developed a similar approach which did not involve continued fractions.
Although less efficient, Dixon’s algorithm is conceptually much simpler, and will be used for
purposes of explanation here.
The first step in the algorithm is to choose a factor base of small primes p1, p2, p3, ... pk. The
largest prime in the set, pk, is called the smoothness bound B, and any integer which factors
completely over the factor base is referred to as B-smooth. A set of ai near √n (or near √(k·n),
where k ranges over small positive integers) is then chosen at random, and used to generate bi
according to:
bi = ai^2 mod n
The bi are trial factored over the factor base. The chances of successful factoring are enhanced
by the fact that the ai are near √n and the bi therefore relatively small. Any relation ( ai, bi ) for which bi
is smooth is saved. In the next step, each smooth bi is converted to a vector representation vi of
the exponents of its factors. For example, if the factor base = { 2, 3, 5, 7 } and bi = 378 = 2^1 · 3^3 ·
5^0 · 7^1, then vi = [ 1, 3, 0, 1 ]. The goal now is as with Kraitchik’s method: find a subset of bi
whose product ∏ bi is an integer square. The vectors vi simplify this in several ways:
• The multiplications of bi are replaced by additions of vi.
• The squareness of any resulting ∑ vi is easily tested by reducing it mod 2. (This operation
  can be illustrated with the example above: vi mod 2 = [ 1, 3, 0, 1 ] mod 2 = [ 1, 1, 0, 1 ].) In
  the desired result, all the powers of the prime factors in ∏ bi are even; this is equivalent to
  ∑ vi ≡ [ 0, 0, 0, ..., 0 ] mod 2, i.e. a vector sum all of whose components are even.
• The vi can be placed as rows in a matrix and manipulated with the standard tools of linear
  algebra.
• Of greatest importance, it is now possible to guarantee a solution. This requires only that the
  number of smooth relations collected be greater than the number of primes in the factor base.
  In that event, the number of rows in the matrix is greater than the number of columns, and
  the basic theorems of linear algebra assert that a linear dependence exists. The matrix can
  be reduced using standard methods, which are readily adapted to work with mod 2 arithmetic.
  In addition to structured Gaussian elimination, the block Lanczos and block Wiedemann
  methods are popular.
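To make the pipeline concrete, here is a toy sketch of Dixon’s algorithm in Python, following the description above: collect random relations, keep the smooth ones, reduce the mod-2 exponent vectors (packed here into integer bitmasks) to find a linear dependence, and finish with a gcd. The function names, parameter choices, and the tiny example modulus are all my own, and the dependence is found with simple Gaussian-style elimination rather than block Lanczos or block Wiedemann; a real implementation differs in every practical respect.

import math, random

def smooth_exponents(m, factor_base):
    """Trial-factor m over the factor base; return its exponent vector,
    or None if m is not smooth over the base."""
    exps = [0] * len(factor_base)
    for i, p in enumerate(factor_base):
        while m % p == 0:
            m //= p
            exps[i] += 1
    return exps if m == 1 else None

def find_dependence(parity_vectors):
    """Return indices of a subset of vectors whose XOR is zero (a linear
    dependence over GF(2)), or None if no dependence is found."""
    pivots = {}                                # lowest set bit -> (vector, history)
    for idx, v in enumerate(parity_vectors):
        h = 1 << idx                           # records which inputs built this vector
        while v:
            low = v & -v                       # lowest set bit of v
            if low not in pivots:
                pivots[low] = (v, h)           # v becomes a new pivot row
                break
            pv, ph = pivots[low]
            v, h = v ^ pv, h ^ ph              # eliminate that bit
        else:
            # v reduced to zero: h marks a dependent subset of the inputs
            return [i for i in range(len(parity_vectors)) if (h >> i) & 1]
    return None

def dixon_factor(n, factor_base, max_tries=20):
    """Toy version of Dixon's random-squares method (clarity over speed)."""
    rng = random.Random(0)
    k = len(factor_base)
    relations = []                                   # list of (a, exponent vector)
    while len(relations) < k + 10:                   # more rows than columns
        a = rng.randrange(math.isqrt(n) + 1, n - 1)
        b = a * a % n
        exps = smooth_exponents(b, factor_base) if b > 1 else None
        if exps is not None:
            relations.append((a, exps))              # b is smooth: keep the relation
    for _ in range(max_tries):
        rng.shuffle(relations)                       # look for a different dependence
        parities = [sum((e & 1) << i for i, e in enumerate(exps))
                    for _, exps in relations]
        subset = find_dependence(parities)
        if subset is None:
            continue
        a_prod, exp_sum = 1, [0] * k
        for idx in subset:
            a, exps = relations[idx]
            a_prod = a_prod * a % n
            exp_sum = [s + e for s, e in zip(exp_sum, exps)]
        b_sqrt = 1                                   # square root (mod n) of the smooth product
        for p, e in zip(factor_base, exp_sum):       # every e is even by construction
            b_sqrt = b_sqrt * pow(p, e // 2, n) % n
        for cand in (a_prod - b_sqrt, a_prod + b_sqrt):
            f = math.gcd(n, cand)
            if 1 < f < n:
                return f                             # non-trivial factor found
    return None

# Toy example: should print 163 or 521, the prime factors of 84923.
print(dixon_factor(84923, [2, 3, 5, 7, 11, 13]))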
In practice the choice of B is critical. If B is small, testing for smoothness is easy, but it may be
difficult or impossible to find any relations that are B-smooth. If B is large, the work involved in
smoothness testing goes up, and more relations must be gathered to fill the matrix adequately.
An analysis by Pomerance [4] suggests that the optimal choice of B is:
B ~ exp( (1/2) · ( ln n )^(1/2) · ( ln ln n )^(1/2) )
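Plugging numbers into this expression gives a feel for how quickly the optimal smoothness bound grows with the size of n (a rough illustration only; the lower-order corrections hidden in such formulas are ignored, so only the order of magnitude is meaningful):

import math

def optimal_B(digits):
    """Rough value of B ~ exp( (1/2) sqrt( ln n * ln ln n ) ) for an n with
    the given number of decimal digits."""
    ln_n = digits * math.log(10)
    return math.exp(0.5 * math.sqrt(ln_n * math.log(ln_n)))

for d in (50, 100, 150, 200):
    print(f"{d} digits -> B on the order of {optimal_B(d):.1e}")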
The linear algebra processing of the matrix, described above, is often referred to as the matrix
step.
The final innovations to the basic congruence of squares method involve faster ways to find the
B-smooth integers needed to populate the matrix. These replace the random generation of
relations ( ai, bi ) using trial division with processes that create series of candidate smooth
integers separated by multiples of the primes in the factor base. Because this is reminiscent of
the sieve of Eratosthenes, it is commonly called the sieving step. The sieving and matrix steps of
the most advanced general purpose algorithms (below) are also referred to as data collection and
data processing.
The first sieving method to enjoy widespread practical application was quadratic sieving (QS),
invented by Pomerance in 1981. First, a sequence of bi is generated over a range of ai near √n,
using the formula bi = ai^2 – n. The sieving then proceeds by following these steps for each prime
p in the factor base:
1) Determine whether n is a nonzero square modulo p. If not, sieving with this p is not possible;
skip to the next p. (In practice, this is a disqualification for including p in the factor base.)
2) Extract the square roots x1, x2 of n modulo p by solving the congruence x^2 ≡ n mod p. (The
Shanks-Tonelli algorithm is efficient for this.)
3) Find the smallest ai with ai ≡ x1 mod p and the smallest with ai ≡ x2 mod p, and flag them. Note
that the corresponding bi = ai^2 – n ≡ 0 mod p, and so are necessarily divisible by p.
4) Flag all ai such that ai = x1 + kp or ai = x2 + kp, for k = 1, 2, 3, .... Because
( xi + kp )^2 ≡ xi^2 mod p, we again have the corresponding bi ≡ 0 mod p and therefore
divisible by p (this is the essence of the sieve).
5) For all ai which are flagged, divide the corresponding bi by p.
6) Repeat steps 1) – 5) for the integer powers p^r of p, up to some cutoff on r.
When sieving is complete for all p in the factor base, the bi which have been reduced to 1 by
repeated division by primes in the factor base are exactly those which are smooth with respect to
the factor base. The overall process is radically faster than trial division because only those bi
which are divisible are actually divided.
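The following toy Python sketch carries out steps 1) – 5) above for a small composite n of my own choosing (87463 = 149 · 587). For brevity, the square roots of n modulo p are found by brute force rather than by Shanks-Tonelli, and higher powers of p (step 6) are absorbed by repeated division at each flagged position, so this shows only the structure of the sieve, not its efficiency.

import math

def qs_sieve_smooth(n, factor_base, interval):
    """Toy QS sieving step over a = ceil(sqrt(n)), ..., ceil(sqrt(n)) + interval - 1."""
    a0 = math.isqrt(n) + 1                      # smallest a with a^2 - n > 0
    b = [(a0 + i) ** 2 - n for i in range(interval)]   # candidate values bi
    residue = b[:]                              # copies that get divided down
    for p in factor_base:
        roots = [x for x in range(p) if (x * x - n) % p == 0]
        # an empty root list means n is not a square mod p, so p contributes nothing
        for x in roots:
            start = (x - a0) % p                # first index i with a0 + i ≡ x (mod p)
            for i in range(start, interval, p): # every p-th position is divisible by p
                while residue[i] % p == 0:
                    residue[i] //= p
    # positions reduced to 1 are exactly the B-smooth candidates
    return [(a0 + i, b[i]) for i in range(interval) if residue[i] == 1]

# Example: B-smooth values of a^2 - n for n = 87463 over a short interval.
for a, bi in qs_sieve_smooth(87463, [2, 3, 5, 7, 11, 13, 17, 19, 23], 500):
    print(a, "->", bi)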
One drawback to the basic QS algorithm is that as ai deviates further and further from √n, the size
of bi grows, reducing the probability of it being smooth. This was addressed by an important
enhancement developed in 1983 by Davis. Use of multiple quadratic polynomials of the form:
bi = A·ai^2 + 2B·ai + C,    with A, B, C integers
where A, B, and C satisfy certain constraints that ensure bi is a square mod n, gives a continued
yield of smooth bi while keeping the bi small. This enhancement, known as the multiple polynomial
quadratic sieve (MPQS), is widely used in practice.
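To see why such polynomials still produce squares modulo n, one standard way the constraints are arranged (a common textbook parameterization, not necessarily the exact one used by Davis) is to require B^2 – AC = n with A itself a perfect square, say A = q^2. Completing the square then gives:
A·bi = A^2·ai^2 + 2AB·ai + AC = ( A·ai + B )^2 – ( B^2 – AC ) = ( A·ai + B )^2 – n
so A·bi ≡ ( A·ai + B )^2 mod n, and dividing out A = q^2 gives bi ≡ ( ( A·ai + B ) · q^-1 )^2 mod n. Each admissible pair ( A, B ) supplies a fresh polynomial whose values remain small over its own short sieving interval.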
The yield of smooth bi can also be enhanced by combining partial relations. In a partial relation,
bi is smooth except for one (usually large) prime factor which is not in the factor base. If two
partial relations exist which have the same non-smooth factor px:
b1 = p1^e11 · p2^e21 · ... · pk^ek1 · px ≡ a1^2 mod n
b2 = p1^e12 · p2^e22 · ... · pk^ek2 · px ≡ a2^2 mod n
they can be multiplied together and the non-smooth factor eliminated by multiplying both sides by
the square of its modular inverse, to give a relation fully within the factor base:
b1 · b2 = p1^(e11+e12) · p2^(e21+e22) · ... · pk^(ek1+ek2) · px^2 ≡ a1^2 · a2^2 mod n
b1 · b2 · (px^-1)^2 ≡ p1^(e11+e12) · p2^(e21+e22) · ... · pk^(ek1+ek2) · px^2 · (px^-1)^2 ≡ a1^2 · a2^2 · (px^-1)^2 mod n
b1 · b2 · (px^-1)^2 ≡ p1^(e11+e12) · p2^(e21+e22) · ... · pk^(ek1+ek2) ≡ ( a1 · a2 · px^-1 )^2 mod n
The harvest of partial relations is vital to factoring truly large integers such as RSA moduli, where
the number of full relations derived from partial and double partial relations typically exceeds the
number of simple full relations by several-fold. In terms of speed, the sieving step of QS is about
2.5 times faster when partial relations are exploited, and another 2 times faster when double
partial relations are also included. There are modest penalties incurred from the greater amount
of data generated and stored, but they are more than repaid by the time saved on sieving per se.
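In code, combining two partial relations that share a large prime px amounts to one modular inverse. The sketch below assumes each partial relation is stored as ( a, s, px ) with s · px ≡ a^2 mod n, where s is the smooth part; the representation and names are my own, and pow( px, -1, n ) requires Python 3.8 or later.

import math

def combine_partials(n, rel1, rel2):
    """Combine two partial relations sharing the same large prime px.
    Each relation is (a, s, px) with s * px ≡ a^2 (mod n), s smooth.
    Returns (a_new, b_new) with b_new ≡ a_new^2 (mod n), where b_new = s1 * s2
    factors entirely over the factor base."""
    a1, s1, px = rel1
    a2, s2, px2 = rel2
    assert px == px2, "relations must share the same large prime"
    a_new = a1 * a2 * pow(px, -1, n) % n      # multiply and cancel px by its inverse
    b_new = s1 * s2                           # left unreduced so it can be refactored
    assert (a_new * a_new - b_new) % n == 0   # the combined congruence holds
    return a_new, b_new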
Another computational efficiency can be realized by modifying Step 5) in the basic QS algorithm.
Rather than dividing bi by a prime which divides it, the “prime hit” is recorded by adding log_r p to
an accessory storage location (r is an appropriately chosen base). After sieving all the primes in
the factor base, there will be some bi for which the sum in the accessory storage location is close
to log_r bi; the smoothness of these is confirmed by trial division over the factor base.
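Here is the earlier toy sieve rewritten with this logarithm trick (natural logarithms, since the choice of base only rescales the threshold; the slack allowance and the names are my own). Survivors still need a confirming trial division, exactly as described above.

import math

def log_sieve_candidates(n, factor_base, interval, slack=1.5):
    """Flag likely-smooth bi by accumulating log p at every position hit by p,
    instead of performing divisions during the sieve."""
    a0 = math.isqrt(n) + 1
    logsum = [0.0] * interval
    for p in factor_base:
        for x in (x for x in range(p) if (x * x - n) % p == 0):
            for i in range((x - a0) % p, interval, p):
                logsum[i] += math.log(p)                 # record the "prime hit"
    candidates = []
    for i in range(interval):
        b = (a0 + i) ** 2 - n
        if logsum[i] >= math.log(b) - slack:             # sum close to log(bi)
            candidates.append(a0 + i)                    # likely smooth; confirm by trial division
    return candidates

print(log_sieve_candidates(87463, [2, 3, 5, 7, 11, 13, 17, 19, 23], 500))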
QS has been largely superseded by the more powerful and faster general number field sieve
(GNFS) method. The mathematics of GNFS are beyond the scope of this report (and, at present,
largely beyond the comprehension of this author). The efficiency of GNFS lies in restricting the
search for smooth numbers to those of order n^(1/d), where d is a small integer such as 5 or 6. To
achieve this focus on small numbers, however, the computations of both the sieving and matrix
steps must be performed in algebraic number fields, making them much more complex than QS.
The concepts behind GNFS originated with Pollard’s proposal in 1988 to factor numbers of the
special form x^3 + k with what subsequently became known as the special number field sieve
(SNFS). Over the next five years, it underwent intensive theoretical development into a form of
GNFS which proved practical to implement on computers. Major contributors to this progress
included the Lenstras, Pomerance, Buhler, and Adleman [2].
To understand the speed advantage of GNFS over QS, it is useful to first examine the resource
demands of the various steps in the algorithms. It turns out the sieving steps of both QS and
GNFS are extremely CPU-intensive, but require only moderate amounts of memory. Sieving is,
however, highly parallelizable, so it can be partitioned across large numbers of fairly ordinary
workstations. With QS, for example, a processor can work in isolation for weeks once it has been
given the number to factor, the factor base, and a set of polynomials to use in sieving. The
factoring of RSA-129 involved over 600 volunteers and their computers; sieving was parceled out
and results reported back via email.
The matrix step of QS and GNFS is just the opposite – there are huge computational advantages
to keeping the matrix in memory while it is processed, but it otherwise takes only a small fraction
of the time required for sieving. Historically, the matrix step has been performed on a
supercomputer at a central location, although some examples of distributing the matrix step of
GNFS have been reported recently. Some examples of statistics on resource usage during
factoring of RSA moduli with QS and GNFS are given in the second section of the report.
The speed advantage of GNFS over QS does not appear until the number being factored
exceeds around 110 digits in size. Below that size, QS is still the fastest general purpose
factoring algorithm known. However, when working at the limit of what can be factored today
(around 200 digits), GNFS is many-fold faster than QS.
The theoretical complexity of all the congruence of squares methods is inherently subexponential.
Dixon’s algorithm, which does not use sieving, can have a run time as favorable as:
L( n )Dixon ~ exp( ( √2 + o(1) ) · ( ln n )^(1/2) · ( ln ln n )^(1/2) )
if the elliptic curve method is used in place of trial division for the smoothness tests. For QS, the run time is:
L( n )QS ~ exp( ( 1 + o(1) ) · ( ln n )^(1/2) · ( ln ln n )^(1/2) )
Although the change in the leading constant relative to Dixon’s algorithm might seem trivial, it is in
fact of major significance – it leads to a doubling in the size of numbers which can be practically
factored. The theoretical advantage of GNFS is much more obvious, as the exponent on the ln n
term is smaller:
L( n )GNFS ~ exp( ( ( 64/9 )^(1/3) + o(1) ) · ( ln n )^(1/3) · ( ln ln n )^(2/3) )
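Evaluating these expressions numerically gives a feel for the growing GNFS advantage. The o(1) terms are simply dropped below, so absolute values mean little and only rough ratios are informative; with those terms omitted the formulas place the crossover somewhat above the observed ~110 digits, but the many-fold advantage at 200 digits is clear.

import math

def L_QS(digits):
    """Heuristic QS run-time scale for an n with the given decimal digits (o(1) dropped)."""
    ln_n = digits * math.log(10)
    return math.exp(math.sqrt(ln_n * math.log(ln_n)))

def L_GNFS(digits):
    """Heuristic GNFS run-time scale (o(1) dropped)."""
    ln_n = digits * math.log(10)
    return math.exp((64 / 9) ** (1 / 3) * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3))

for d in (110, 155, 200, 250):
    print(f"{d} digits: QS cost / GNFS cost ~ {L_QS(d) / L_GNFS(d):.1e}")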
With one exception, the special purpose algorithms mentioned in the first part of this section have
exponential running times, yet another reason they are not competitive with general purpose
algorithms for factoring large integers. The exception is the elliptic curve method (ECM), which is
subexponential. Unlike the congruence of squares methods, its run time is dominated by the size
of the prime factor p, rather than the size of n. In the worst case, when the smallest prime factor
p ~ √n, ECM has run time the same as QS. In favorable cases, where the smallest prime factor
is around 20 to 25 digits, ECM is faster than QS or GNFS, and is the method of choice.
As a final note on complexity, there is one known factoring algorithm with polynomial run time,
Shor’s algorithm for quantum computers. At present it poses no threat to RSA, due to lack of
suitable computing hardware. Its greatest achievement to date was factoring 15 into 3 and 5 on a
quantum computer with 7 qubits in 2001.
Application of factoring methods to RSA moduli
Information in this section is drawn primarily from various online documents at [8] and [9].
Historical progress in factoring RSA moduli is best understood in context of the RSA Factoring
Challenge. This is a public contest created by RSA Laboratories in 1991 as a means to
understand the practical difficulties involved in factoring large integers of the sort used in RSA
moduli. A set of challenge numbers was published, ranging in size from 100 to 500 decimal digits,
with each number being composed of exactly two factors, similar in size. The numbers were
created in such a way that no one, not even RSA Laboratories, knew their factors. This original set
of challenge numbers was named RSA-100 through RSA-500, where the number in the name
indicates the number of decimal digits in the challenge number. Nominal cash prizes (c. $1000)
were offered for successful factorizations. In 2001, the original series of challenge numbers was
superseded by a new challenge series, RSA-576 through RSA-2048, where the name indicates
the size of the number in bits. These carry substantial cash prizes, ranging from $10,000 for
RSA-576 to $200,000 for RSA-2048, but even these amounts pale by comparison with the
investment of manpower and computer time required to factor any of the challenge numbers thus
far.
The dates of successful factoring and other details are given in the table below for all RSA
challenge numbers of 120 or more decimal digits.
Challenge number     Decimal digits   Year factored   Factoring team      Method   Compute time
RSA-120              120              1993            Lenstra, et al      MPQS     830 MIPS-years
RSA-129              129              1994            Atkins, et al       MPQS     5000 MIPS-years
RSA-130              130              1996            Lenstra, et al      GNFS     1000 MIPS-years
RSA-140              140              1999            Montgomery, et al   GNFS     2000 MIPS-years
RSA-155 (512 bits)   155              1999            Montgomery, et al   GNFS     8000 MIPS-years
RSA-160              160              2003            Franke, et al       GNFS     2.7 1-GHz Pentium-years
RSA-576              174              2003            Franke, et al       GNFS     13 1-GHz Pentium-years
RSA-640              193              2005            Bahr, et al         GNFS     30 2.2-GHz Opteron-years
RSA-200              200              2005            Kleinjung, et al    GNFS     75 2.2-GHz Opteron-years
RSA-704              212              not factored
RSA-768              232              not factored
RSA-896              270              not factored
RSA-1024             309              not factored
RSA-1536             463              not factored
RSA-2048             617              not factored

MPQS = multiple polynomial quadratic sieve
GNFS = general number field sieve
To better grasp the magnitude of these efforts, we can look at more detailed statistics for RSA-129 and RSA-200, the largest RSA challenge numbers factored by MPQS and GNFS,
respectively (data taken from [1] and [10]).
RSA-129:
  year completed                                 1994
  size of factor base                            524,339
  large prime bound                              2^30
  full relations                                 1.1 × 10^5
  additional full relations derived from
    partial and double partial relations         4.6 × 10^5
  amount of data                                 2 GB
  time for sieving step                          5000 MIPS-years
  time for matrix step                           45 hrs
RSA-200:
  year completed                                 2005
  factor base bound (algebraic side)             3 × 10^8
  factor base bound (rational side)              18 × 10^7
  large prime bound                              2^35
  relations from lattice sieving                 26 × 10^8
  relations from line sieving                    5 × 10^7
  total relations (after removing duplicates)    22.6 × 10^8
  matrix size (rows and columns)                 64 × 10^6 (solved by block Wiedemann)
  non-zero entries in matrix                     11 × 10^9
  time for sieving step                          55 2.2-GHz Opteron-years
  time for matrix step                           20 2.2-GHz Opteron-years
The hardware used for the matrix step of RSA-200 was a cluster of 80 2.2-GHz Opterons
connected via a Gigabit network.
As expected for a problem that pushes the envelope of computational feasibility, the quantities
and sizes of everything involved are staggering. It also seems evident that further scaling to
attack larger RSA moduli will not be easily achieved.
This brings us to the million-dollar question for the consumer of RSA cryptosystems: Are the RSA
keys in use at my organization secure? Or more precisely, at what point in the future will
advances in factoring methods put those keys at risk? (The term RSA key is equivalent to RSA
modulus for the purposes of the remaining discussion.) No one any longer advocates the use of 512
bit RSA keys; with today’s hardware and algorithms, such a key could be factored by a cluster of
a few dozen PCs in a month. What about the currently recommended standards of 768 bit, 1024
bit, and 2048 bit keys? The answer must consider the value of the information being protected by
the key, in addition to the purely technical issues around the difficulty of factoring.
Extrapolating from recent trends in advances in factoring, we might expect a massive effort to
succeed in factoring a 768 bit key sometime in the next 5 to 7 years, whereas the 1024 bit
benchmark should stand for decades (details below). To factor a chosen 768 bit key, an
adversary would have to fully replicate that effort, spending millions of dollars and months of time.
For information which has modest value (say a typical online consumer purchase), and/or where
the data has a transient lifetime (SSL sessions), there is simply no economic incentive for the
adversary, and a 768 bit key is probably adequate. For data with higher value (e.g. large bank
transactions), and/or where longevity of decades is required (signatures on contracts), a minimum
key size of 1024 bits is advisable, and 2048 bits could be considered.
The above extrapolation assumes:
• no fundamental breakthroughs in factoring algorithms
• factoring efforts will continue to use general-purpose computer hardware
• the capability of general-purpose computer hardware will improve at traditional rates
With these conservative assumptions, it is straightforward to estimate resource requirements for
factoring untried larger integers, relative to the resources required for known successful factorings.
Here n’ is the size of the untried larger integer, and n is the size of an integer which has been
successfully factored:
time required for n’ relative to n  =  L( n’ )GNFS / L( n )GNFS
memory required for n’ relative to n  =  ( L( n’ )GNFS / L( n )GNFS )^(1/2)
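As a rough worked example of this kind of scaling (again with the o(1) term dropped, and taking the 663-bit RSA-200 as the baseline, so the results are order-of-magnitude guides rather than predictions):

import math

def L_GNFS_bits(bits):
    """Heuristic GNFS run-time scale for a modulus of the given bit length (o(1) dropped)."""
    ln_n = bits * math.log(2)
    return math.exp((64 / 9) ** (1 / 3) * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3))

baseline = L_GNFS_bits(663)               # RSA-200 is a 663-bit number
for bits in (768, 1024, 2048):
    print(f"{bits}-bit modulus: ~{L_GNFS_bits(bits) / baseline:.0e} times the RSA-200 effort")

The resulting ratios, roughly thirty-fold for 768 bits and tens of thousands-fold for 1024 bits, are crude, but they are consistent with the qualitative picture in the estimates that follow.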
The current best estimate is that GNFS sieving on a 768-bit integer would require 18,000 PCs, each
with 5 GB of memory, working for a year [11]. To sieve a 1024-bit integer in a year would take on
the order of 50,000,000 PCs, each with 10 GB main memory, plus additional DRAM. The cost to
acquire the latter hardware would exceed US$ 10^11!
Will these conservative assumptions hold? It is impossible to know whether and when a major
algorithmic advance over GNFS might occur. There are no well-identified theoretical avenues to
such an advance. But it is worth noting that in the heyday of QS, numerous efforts to produce
better algorithms gave results with theoretical run times no better than QS, and there was
speculation this might represent a fundamental lower limit on run times. There is, of course, the
dark horse of quantum computing. If quantum hardware ever scales, RSA cryptosystems could
quickly become worthless.
The real threat to these assumptions, however, probably lies with proposals to perform GNFS
sieving using specially designed hardware [11]. The past five years have seen the emergence of
designs known as TWINKLE (based on electro-optics) and TWIRL (based on parallel processing
pipelines), and one using mesh circuits (based on two-dimensional systolic arrays). The designs
appear to have matured beyond the conceptual stage, and might be ready for serious attempts at
reduction to practice. TWIRL seems to offer the greatest overall advantage in terms of cost and
speed. It is estimated that c. 200 independent 1 GHz TWIRL clusters could complete GNFS
sieving on a 1024 bit integer in one year. To build the clusters would incur a one-time R&D cost
of US$ 10-20M, but only around US$ 1.1M for the actual manufacture. This reduces the cost of
factoring by 5 to 6 orders of magnitude, and brings it easily within the reach of large organizations,
especially governments.
References
[1] A. K. Lenstra, “Integer Factoring,” Designs, Codes, and Cryptography, 19: 101-128 (2000).
[2] C. Pomerance, “A Tale of Two Sieves”, Notices Amer. Math. Soc., 43: 1473-1485 (1996).
[3] Wikipedia contributors (2006). Integer factorization. Wikipedia, The Free Encyclopedia.
Retrieved March 5, 2006 from
http://en.wikipedia.org/w/index.php?title=Integer_factorization&oldid=40925252.
[4] R. Crandall and C. Pomerance, Prime Numbers, A Computational Perspective, 2nd Ed.,
Chaps. 5 and 6, Springer, New York (2005).
[5] M. Kraitchik, Théorie des Nombres, II, pp. 195-208, Gauthier-Villars, Paris (1926).
[6] M. A. Morrison and J. Brillhart, “A method of factoring and the factorization of F7”, Math.
Comp., 29: 183-205 (1975).
[7] J. Dixon, “Asymptotically fast factorization of integers”, Math. Comp., 36: 255-260 (1981).
[8] http://www.rsasecurity.com/rsalabs/
[9] http://www.crypto-world.com/FactorWorld.html
[10] http://www.loria.fr/~zimmerma/records/rsa200
[11] A. Shamir and E. Tromer, “Special-Purpose Hardware for Factoring:
the NFS Sieving Step”, Proc. Workshop on Special Purpose Hardware for Attacking
Cryptographic Systems (SHARCS) (2005). Available at:
http://www.wisdom.weizmann.ac.il/~tromer/papers/hwsieve.pdf