Actual and proposed
special purpose hardware devices
for integer factorization
(a historical perspective)
Arjen K. Lenstra
Lucent Technologies’ Bell Labs
Integer factorization
Given a composite n, find a non-trivial factor of n
Example: given n = 15, find 3 or 5
Why?
• until 1977: mostly for recreational purposes
• since then, a somewhat better excuse:
to figure out secure RSA key sizes
Special purpose hardware for
integer factorization
Why?
• Before the 1950s: no choice
• 1950 - 1970s: access to computers too limited
• Later: computers believed to be too general or too slow
• Now:
• software approaches stuck at around 600-bit integers
• like to find out how hard factoring 1024-bit integers would be
Actual and proposed special purpose
hardware devices for integer factorization
• 1919, Carissan: Machine à Congruences
• 1930s, Lehmer: Bicycle Chain Sieve,
Photo-Electric Number Sieve, and Movie Film Sieve
• 1970s, Smith/Wagstaff: Georgia Cracker
• 1980s, Pomerance/Smith/Tuler: Quasimodo
• 1999, Shamir: Twinkle
• 2002, Bernstein: Factoring Circuits
• 2003, Shamir/Tromer: Twirl
• 2004, Geiselmann/Steinwandt: YASD
• 2005, Franke/Kleinjung/Paar/Pelzl/Priplata/Stahlke: SHARK
(and several other matrix step proposals)
The early machines (Carissan, Lehmer)
To factor n, try to solve $n = x^2 - y^2 = (x + y)(x - y)$:
look for $x = i + [\sqrt{n}]$ such that $x^2 - n$ is a square ($y^2$):
• for a small set of small primes p:
• manually find the x’s for which
$x^2 - n$ modulo p is a square
• mark those x’s on a ‘wheel’ with p positions
• turn all wheels simultaneously (i = 0,1,2,…) until there is
a set of conditions (one per wheel) that ‘lines up’
• hope that it leads to the desired solution y
• if there is a solution, it will show up, but it may take a while
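The wheel idea can be sketched in software. This is only an illustration, not a model of any actual machine: the function names, the choice of small primes, and the turn limit are assumptions made here, and the search starts at the smallest x with $x^2 \ge n$.

```python
from math import isqrt

def build_wheel(n, p):
    """Wheel with p positions: position x mod p is marked
    when x^2 - n is a square modulo p."""
    squares = {(y * y) % p for y in range(p)}
    return [((x * x - n) % p) in squares for x in range(p)]

def wheel_fermat(n, primes=(3, 5, 7, 11, 13), max_turns=10**6):
    """Search x = x0 + i (i = 0, 1, 2, ...) with x^2 - n a perfect square.
    Intended for odd n; for a prime n only the trivial (1, n) shows up."""
    wheels = [(p, build_wheel(n, p)) for p in primes]
    x0 = isqrt(n)
    if x0 * x0 < n:
        x0 += 1
    for i in range(max_turns):
        x = x0 + i
        # All wheels must 'line up' before the expensive check is done.
        if all(wheel[x % p] for p, wheel in wheels):
            t = x * x - n
            y = isqrt(t)
            if y * y == t:
                return x - y, x + y
    return None

print(wheel_fermat(5959))  # 5959 = 80^2 - 21^2
```

A true solution passes every wheel, so the wheels never discard it; they only save work on the positions that cannot be squares.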
Carissan’s Machine à Congruences
• 14 concentric brass rings with $p \le 59$ studs per ring
• conditions ‘$x^2 - n$ square mod p’ represented by caps on studs
• a cap under the arm triggers a switch
• 14 switches in series: alarm sounds if all 14 switches triggered
Some results
• primality proof of 708 158 977 in 10 minutes (of manual cranking)
• factorization, in 18 minutes:
7 141 075 053 842 = 2 × 841 249 × 4 244 329
• around 1920: the only prototype disappeared in a drawer,
not to be seen again until March 1992
• see: Jeff Shallit, Hugh Williams, François Morain,
Discovery of a lost factoring machine,
Mathematical Intelligencer 17 (1995) 41-47
Lehmer’s Bicycle Chain Sieve
• cruder (but faster: motorized!) version of same idea
• found that
9 999 000 099 990 001 = 1 676 321 × 5 964 848 081
Lehmer’s Photo-Electric Number Sieve
• ‘condition’ corresponds to a hole in a sprocket-wheel
• if holes line up: a (weak) light beam passes through, caught by
photo-electric detector (‘the fair Rebecca’) & stops the machine
(unless nearby ham radio operator was active)
• much faster than Carissan’s machine
Some results
• factorization, in 12 seconds:
$2^{79} - 1$ = 2 687 × 202 029 703 × 1 113 491 139 767
• factors of $2^{93} + 1$, in ‘a few’ seconds:
529 510 939 and 2 903 110 321
• see: D.N. Lehmer,
Hunting big game in the theory of numbers,
Scripta Mathematica 1 (1932-33) 229-235
D.H. Lehmer,
A photo-electric number sieve,
Amer. Math. Monthly 40 (1933) 401-406
The later machines
All based on the 1970s Morrison-Brillhart approach:
to factor n, try to solve $x^2 \equiv y^2 \bmod n$ as follows
1. Collect set V of integers v with $v^2 \equiv \prod_{p \in P} p^{e(v,p)} \bmod n$
for some fixed set P and |V| > |P|
Relation collection step, ‘hard’: Georgia Cracker, Quasimodo, Twinkle, Twirl, YASD, SHARK
2. Find $|V| - |P|$ linear dependencies mod 2 among the
|P|-dimensional vectors $(e(v,p))_{p \in P}$, $v \in V$
Matrix step, ‘easy’: Factoring Circuits
3. Each dependency leads to pair x, y with $x^2 \equiv y^2 \bmod n$
and thus to a chance to factor n by computing $\gcd(n, x - y)$
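The three steps can be illustrated on a toy scale. The sketch below is an assumption-laden simplification, not CFRAC or QS: it takes v = [√n] + 1, [√n] + 2, … and tests $v^2 \bmod n$ for smoothness by trial division, it processes each relation as soon as it is found instead of collecting all of V first, and the names `smooth_exponents` and `factor_by_relations` are made up for this example.

```python
from math import gcd, isqrt

def smooth_exponents(t, primes):
    """Exponent vector of t over primes, or None if t is not smooth."""
    if t == 0:
        return None
    exps = []
    for p in primes:
        e = 0
        while t % p == 0:
            t //= p
            e += 1
        exps.append(e)
    return exps if t == 1 else None

def factor_by_relations(n, primes=(2, 3, 5, 7, 11, 13)):
    relations = []  # step 1: pairs (v, exponent vector e(v, .))
    basis = {}      # step 2: pivot bit -> (parity mask, relations combined)
    for v in range(isqrt(n) + 1, n):
        exps = smooth_exponents(v * v % n, primes)
        if exps is None:
            continue
        relations.append((v, exps))
        mask = sum((e & 1) << j for j, e in enumerate(exps))
        hist = 1 << (len(relations) - 1)
        while mask:  # Gaussian elimination mod 2 on bit masks
            pivot = mask.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (mask, hist)
                break
            mask, hist = mask ^ basis[pivot][0], hist ^ basis[pivot][1]
        else:
            # Step 3: a dependency; all summed exponents are even.
            x, total = 1, [0] * len(primes)
            for k, (vk, ek) in enumerate(relations):
                if hist >> k & 1:
                    x = x * vk % n
                    total = [a + b for a, b in zip(total, ek)]
            y = 1
            for p, e in zip(primes, total):
                y = y * pow(p, e // 2, n) % n
            d = gcd(n, x - y)
            if 1 < d < n:
                return d
    return None

print(factor_by_relations(1649))  # uses 41^2 = 2^5 and 43^2 = 2^3 * 5^2 mod 1649
```

For n = 1649 the relations from v = 41 and v = 43 combine to $114^2 \equiv 80^2 \bmod 1649$, and gcd(1649, 114 − 80) = 17.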
The Georgia Cracker
• special purpose hardware to collect relations
using CFRAC (continued fraction factoring method)
• no striking or particularly interesting features (no picture either)
• used to factor numbers from Cunningham tables,
largest: a 62-digit factor of $3^{204} + 1$, January 1986
• sitting on a shelf in Jeff Smith’s office:
‘it could be working again <1wk’
Quasimodo
• stands for Quadratic Sieve Motor
• special purpose hardware to collect relations
using QS (quadratic sieve factoring method)
• interesting pipelined architecture
• supposedly very fast, when it was designed
• no longer so when it was actually built
• never properly debugged, never used to factor anything
• parts of only existing prototype used for other purposes
• never seen it, no pictures, unclear what survives, if anything
Intermezzo
Since the late 1980s:
• PCs become ubiquitous
• computing power for relation collection step can
relatively easily be ‘arranged’
• as a result:
• special purpose devices no longer worth the trouble,
unless they offer something new or special
(or lead to interesting funding possibilities)
• relation collection step easiest
(just sit back and relax until done, progress can be monitored)
• matrix more cumbersome
(get your hands on a big machine, worry about bits)
Twinkle, 1999
• The first special purpose hardware factoring device since
internet factoring became popular
• stands for The Weizmann INstitute Key Locating Engine
• special purpose optical sieving device to collect
relations using QS or NFS (number field sieve)
• short history:
• spring 1999: wild claims in press that
512-bit RSA moduli can be broken very quickly
• May 1999: Twinkle announced at EC99 rump session
• August 1999: 512-bit RSA actually broken
(but not using Twinkle)
• May 2000: Twinkle buried at EC2000
Regular sieving
• initialize s[i] = 0 for all i in some large interval I
• for all $p \in P$:
• compute starting point $r_p$
• for all $r_p + kp \in I$ with $k \in \mathbb{Z}$:
replace $s[r_p + kp]$ by $s[r_p + kp] + \log p$
• further process all $i \in I$ for which s[i] is large enough
Regular sieving:
• sieve s represented by space
• $p \in P$ processed in time
Twinkle:
• sieve represented by time
• $p \in P$ processed in space
(just like Carissan and the Lehmer sieves)
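A minimal software version of the regular sieve, as a sketch only. The interval, the primes, and the rule for the starting points (here: i is hit by p when p divides start + i) are assumptions chosen to give a small checkable example; a real QS or NFS siever gets its starting points from polynomial roots.

```python
from math import log

def regular_sieve(length, progressions):
    """progressions: pairs (p, r_p). Adds log p to s[i] for every
    i = r_p + k*p inside the interval [0, length)."""
    s = [0.0] * length              # the sieve lives in memory (space)
    for p, r in progressions:       # the primes are handled one after another (time)
        for i in range(r % p, length, p):
            s[i] += log(p)
    return s

def sieve_reports(s, threshold):
    """Positions whose accumulated logarithms are large enough."""
    return [i for i, value in enumerate(s) if value >= threshold]

# Toy use: which numbers in [1000, 1100) are divisible by 2, 3, 5 and 7?
start, length = 1000, 100
progressions = [(p, (-start) % p) for p in (2, 3, 5, 7)]
s = regular_sieve(length, progressions)
reports = sieve_reports(s, log(210) - 1e-9)
print([start + i for i in reports])  # only 1050 = 2 * 3 * 5^2 * 7 qualifies
```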
Twinkle sieving
1. Build a wafer with, for all $p \in P$:
• a cell with:
• a counter c starting at 0
• a register a containing $r_p$, the starting point for p
• an LED of strength proportional to $\log p$
2. Put a photo-electric cell opposite the wafer
for i = 0, 1, 2, … in succession:
• on all cells simultaneously:
• if c = a: flash the LED and replace a by a + p
• replace c by c + 1
(for cell p, light of intensity $\log p$ flashes at $i = r_p + kp$)
• if light intensity at photo-electric cell strong enough:
many p’s flash at i, thus further process i
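The same procedure can be simulated in software, with one loop iteration per clock tick. This is a sketch under obvious assumptions: the `Cell` class, the threshold, and the toy starting points are invented for illustration, and a sequential simulation of course loses the parallelism that was the point of the optical design.

```python
from math import log

class Cell:
    """One wafer cell: counter c, register a, LED of intensity log p."""
    def __init__(self, p, r):
        self.p = p
        self.c = 0              # counter, starts at 0
        self.a = r              # register, starts at r_p
        self.intensity = log(p)

    def tick(self):
        """One clock cycle; returns the light emitted in this cycle."""
        light = 0.0
        if self.c == self.a:
            light = self.intensity
            self.a += self.p    # next flash is p cycles later
        self.c += 1
        return light

def twinkle_sieve(length, progressions, threshold):
    """Sieve positions are clock cycles (time); all cells act per cycle (space)."""
    cells = [Cell(p, r % p) for p, r in progressions]
    reports = []
    for i in range(length):
        total_light = sum(cell.tick() for cell in cells)  # the photo-electric cell
        if total_light >= threshold:
            reports.append(i)
    return reports

# Same toy problem as a regular sieve over [1000, 1100) with primes 2, 3, 5, 7.
start = 1000
progressions = [(p, (-start) % p) for p in (2, 3, 5, 7)]
flashes = twinkle_sieve(100, progressions, log(210) - 1e-9)
print([start + i for i in flashes])
```

No sieve array is stored at all: a position is either reported in its own clock cycle or forgotten.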
Analysis of Twinkle
• for 384-bit QS factorization: not clearly infeasible
• for anything interesting (such as, back then, 512-bit moduli):
• wafer too large to be practical → multi-wafer designs
• wafer may melt (part of audience did) → run it at lower speed
• processing of reports too expensive → add hardware
• idea in the meantime abandoned
• except for a rather crude prototype, device never built
Factoring circuits, 2002
At least two interesting aspects:
1. Claim that 3d-digit integers can be factored
at the cost of d-digit integers using old method
2. A new method to do the matrix step
Influential, because:
• It caused confusion (almost panic), thus got a lot of attention
• Triggered lots of new activity in this field
(possibly even culminating in the present workshop)
• Pushed a new, better cost function: time × equipment cost
Matrix step
Find dependency mod 2 among columns of sparse A:
compute $A^i v$ for some vector v and $1 \le i \le m = \dim(A)$
(plus additional fiddling around)
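The core operation, repeated sparse matrix-by-vector multiplication mod 2, can be sketched as follows. The row-list representation and the function names are assumptions for illustration, and the "additional fiddling" that turns the sequence into an actual dependency is not shown.

```python
def matvec_gf2(rows, v):
    """rows[r] lists the column indices of the ones in row r of A;
    v is a list of bits. Cost is proportional to the weight w(A)."""
    return [sum(v[c] for c in row) & 1 for row in rows]

def krylov_sequence(rows, v):
    """Return [A v, A^2 v, ..., A^m v] over GF(2), with m = dim(A)."""
    m = len(rows)
    sequence = []
    for _ in range(m):
        v = matvec_gf2(rows, v)
        sequence.append(v)
    return sequence

# Toy 3 x 3 matrix with rows 110, 011, 100 and start vector (1, 0, 0).
A = [[0, 1], [1, 2], [0]]
seq = krylov_sequence(A, [1, 0, 0])
print(seq)
```

Each multiply touches every nonzero entry once, so m multiplies on a matrix of weight w(A) = O(m) take $O(m^2)$ time on a single processor.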
Traditional:
• Matrix-by-vector multiply in time $w(A) = O(m)$
• Repeat m times: total time $O(m^2)$
• But: needs $O(m)$ memory
⇒ full cost $O(m^3)$
Bernstein’s sorting method (or Shamir/Tromer routing variant):
• Store matrix in square mesh of $\approx w(A) = O(m)$ processors
• Matrix-by-vector multiply in time $O(m^{1/2})$ on mesh
• m times: total time $O(m^{3/2})$ (but $O(m^{5/2})$ operations)
⇒ full cost $O(m^{5/2})$
Matrix step hardware proposals and claims
This workshop, and earlier:
• several mesh proposals
• systolic architecture(s)
Results and claims for 1024-bit moduli
• strongly depend on dimension and density of the matrix
• results of mostly speculative nature
• matrix step still seems not as hard as relation collection
• known: factor base sizes that will most likely work
• no real clue yet about dimension and density of the matrix
Relation collection in the NFS
A common version of the problem:
• integer m, polynomial f of degree d, smoothness bounds $B_1$, $B_2$
• find many coprime integers a and b > 0 such that
$|a - mb|$ is $B_1$-smooth and $|b^d f(a/b)|$ is $B_2$-smooth
Software approaches:
• line sieving: for b = 1, 2, … in succession process line of a’s
• special q: for many q’s, look at a,b with $q \mid b^d f(a/b)$:
• do line sieving in index q sublattice (insane but common)
• do lattice sieving as suggested by
• Pollard’s ancient paper (not so bad)
• Franke and Kleinjung’s SHARCS paper (looks promising)
• combine with or replace by non-sieving methods such as
elliptic curve factoring or FFT&gcd based
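To make the problem statement concrete, here is a brute-force stand-in that walks the lines b = 1, 2, … and tests both values by trial division instead of sieving. Everything specific is an assumption for illustration: the function names, the search ranges, and the toy parameters m = 31, $f(x) = x^3 + 15x^2 + 29x + 8$ (so that f(m) = 45113).

```python
from math import gcd

def is_smooth(t, bound):
    """True if every prime factor of |t| is at most bound (t != 0)."""
    t = abs(t)
    if t == 0:
        return False
    p = 2
    while p <= bound and t > 1:
        while t % p == 0:   # composite p never divides here: its primes are gone
            t //= p
        p += 1
    return t == 1

def homogeneous_value(coeffs, a, b):
    """b^d f(a/b) for f(x) = sum coeffs[k] x^k, computed without fractions."""
    d = len(coeffs) - 1
    return sum(c * a**k * b**(d - k) for k, c in enumerate(coeffs))

def collect_relations(m, coeffs, B1, B2, a_range, b_max):
    """All coprime (a, b), |a| <= a_range, 1 <= b <= b_max, smooth on both sides."""
    found = []
    for b in range(1, b_max + 1):              # one 'line' per b
        for a in range(-a_range, a_range + 1):
            if gcd(a, b) != 1:
                continue
            if is_smooth(a - m * b, B1) and \
               is_smooth(homogeneous_value(coeffs, a, b), B2):
                found.append((a, b))
    return found

f = [8, 29, 15, 1]  # coefficients of f, lowest degree first
rels = collect_relations(31, f, 30, 30, 50, 5)
print(len(rels), rels[:5])
```

For example (a, b) = (−1, 1) qualifies: a − mb = −32 = −2^5 and $b^d f(a/b)$ = −7.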
Special purpose hardware to collect relations
My limited understanding of the situation (as of Feb 23):
• TWIRL: line siever (or KF lattice siever?) with ‘priority queues’,
‘challenging’ pipelined design
1024-bit in about 1M$yr
• YASD: traditional line siever,
mesh based, no inter-chip connects,
6.3 times slower than TWIRL
• SHARK: KF lattice siever with special cofactor hardware,
modular, realizable ASIC design
1024-bit in < 200M$yr
• ECCITY: replace all sieving by Elliptic Curve Factoring,
‘fills entire country with multicomputers, each of which
has the size of a major city’ (at break-even point)
Putting 1-200M$yr into context
SHA-1 random collision attack:
• fewer than $2^{69}$ SHA-1 applications
• SHA-1 application takes fewer than 900 cycles
• playstation 3 VGA card (16 vector 4.5GHz PE’s) costs US$50
• attacking SHA-1 on a single COTS card takes 2K centuries
⇒ Attacking SHA-1 costs 10M$yr
• same ballpark as 1024-bit RSA
• same cards: crack DES in a day for about 200K
• SHA-1 attack cost: down from 20B$yr a few weeks ago…
• at least a factor 200 gap between
1024-bit RSA and 80-bit security
• what about running ECM on those cards?
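The SHA-1 estimate can be checked with a few lines of arithmetic. The sketch takes the slide's figures as upper bounds and assumes that all 16 processing elements are kept fully busy and that a year has 365.25 days; the function name is made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def sha1_attack_cost(applications=2**69, cycles_per_application=900,
                     elements=16, clock_hz=4.5e9, card_cost_usd=50):
    """Return (card-years, dollar-years) for running the attack on one card."""
    cycles_per_second = elements * clock_hz
    seconds = applications * cycles_per_application / cycles_per_second
    card_years = seconds / SECONDS_PER_YEAR
    return card_years, card_years * card_cost_usd

card_years, dollar_years = sha1_attack_cost()
print(f"{card_years / 100:.0f} centuries on one card, "
      f"about {dollar_years / 1e6:.1f}M$yr")
```

With these upper bounds the result is a little over 2K centuries and a little over 10M$yr, in line with the rounded figures above.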
Conclusion
• design and evaluation of current special purpose hardware
factoring devices still mostly in the mud slinging phase
• listen to the talks here and make up your own mind
• my pessimistic guesses:
• none of the currently proposed devices will collect relations
for an actual 1024-bit factorization anytime soon
• special purpose factoring hardware will not have
much impact on the security of RSA moduli until
quantum computers are built
(I hope I will be proved wrong)