Actual and proposed special purpose hardware devices for integer factorization
(a historical perspective)

Arjen K. Lenstra
Lucent Technologies’ Bell Labs

Integer factorization
Given a composite n, find a non-trivial factor of n
Example: given n = 15, find 3 or 5
Why?
• until 1977: mostly for recreational purposes
• since then, a somewhat better excuse: to figure out secure RSA key sizes

Special purpose hardware for integer factorization
Why?
• Before the 1950s: no choice
• 1950s to 1970s: access to computers too limited
• Later: computers believed to be too general or too slow
• Now:
  • software approaches stuck at around 600-bit integers
  • we would like to find out how hard factoring 1024-bit integers would be

Actual and proposed special purpose hardware devices for integer factorization
• 1919, Carissan: Machine à Congruences
• 1930s, Lehmer: Bicycle Chain Sieve, Photo-Electric Number Sieve, and Movie Film Sieve
• 1970s, Smith/Wagstaff: Georgia Cracker
• 1980s, Pomerance/Smith/Tuler: Quasimodo
• 1999, Shamir: Twinkle
• 2002, Bernstein: Factoring Circuits
• 2003, Shamir/Tromer: Twirl
• 2004, Geiselmann/Steinwandt: YASD
• 2005, Franke/Kleinjung/Paar/Pelzl/Priplata/Stahlke: SHARK
(and several other matrix step proposals)

The early machines (Carissan, Lehmer)
To factor n, try to solve n = x^2 − y^2 = (x + y)(x − y):
look for x = i + [√n] such that x^2 − n is a square (y^2):
• for a small set of small primes p:
  • manually find the x’s for which x^2 − n modulo p is a square
  • mark those x’s on a ‘wheel’ with p positions
• turn all wheels simultaneously (i = 0, 1, 2, …) until there is a set of conditions (one per wheel) that ‘lines up’
• hope that it leads to the desired solution y
• if there is a solution, it will show up, but it may take a while

Carissan’s Machine à Congruences
• 14 concentric brass rings with p ≤ 59 studs per ring
• conditions ‘x^2 − n square mod p’ represented by caps on studs
• a cap under the arm triggers a switch
• 14 switches in series: alarm sounds if all 14 switches are triggered
Some results
• primality proof of 708 158 977 in 10 minutes (of manual cranking)
• factorization, in 18 minutes: 7 141 075 053 842 = 2 × 841 249 × 4 244 329
• around 1920: the only prototype disappeared in a drawer, not to be seen again until March 1992
• see: Jeff Shallit, Hugh Williams, François Morain, Discovery of a lost factoring machine, Mathematical Intelligencer 17 (1995) 41-47

Lehmer’s Bicycle Chain Sieve
• cruder (but faster: motorized!) version of the same idea
• found that 9 999 000 099 990 001 = 1 676 321 × 5 964 848 081

Lehmer’s Photo-Electric Number Sieve
• ‘condition’ corresponds to a hole in a sprocket-wheel
• if holes line up: a (weak) light beam passes through, is caught by a photo-electric detector (‘the fair Rebecca’) and stops the machine (unless a nearby ham radio operator was active)
• much faster than Carissan’s machine
Some results
• factorization, in 12 seconds: 2^79 − 1 = 2 687 × 202 029 703 × 1 113 491 139 767
• factors of 2^93 + 1, in ‘a few’ seconds: 529 510 939 and 2 903 110 321
• see:
  D.N. Lehmer, Hunting big game in the theory of numbers, Scripta Mathematica 1 (1932-33) 229-235
  D.H. Lehmer, A photo-electric number sieve, Amer. Math. Monthly 40 (1933) 401-406

The later machines
All based on the 1970s Morrison-Brillhart approach: to factor n, try to solve x^2 ≡ y^2 mod n as follows
1. Collect a set V of integers v with v^2 ≡ ∏_{p∈P} p^e(v,p) mod n for some fixed set P and |V| > |P|
   Relation collection step, ‘hard’: Georgia Cracker, Quasimodo, Twinkle, Twirl, YASD, SHARK
2. Find |V| − |P| linear dependencies mod 2 among the |P|-dimensional vectors (e(v,p))_{p∈P}, v ∈ V
   Matrix step, ‘easy’: Factoring Circuits
3. Each dependency leads to a pair x, y with x^2 ≡ y^2 mod n and thus to a chance to factor n by computing gcd(n, x − y)

The Georgia Cracker
• special purpose hardware to collect relations using CFRAC (continued fraction factoring method)
• no striking or particularly interesting features (no picture either)
• used to factor numbers from the Cunningham tables, largest: a 62-digit factor of 3^204 + 1, January 1986
• sitting on a shelf in Jeff Smith’s office: ‘it could be working again <1wk’

Quasimodo
• stands for Quadratic Sieve Motor
• special purpose hardware to collect relations using QS (quadratic sieve factoring method)
• interesting pipelined architecture
• supposedly very fast, when it was designed
• no longer so when it was actually built
• never properly debugged, never used to factor anything
• parts of the only existing prototype used for other purposes
• never seen it, no pictures, unclear what survives, if anything

Intermezzo
Since the late 1980s:
• PCs become ubiquitous
• computing power for the relation collection step can relatively easily be ‘arranged’
• as a result:
  • special purpose devices no longer worth the trouble, unless they offer something new or special (or lead to interesting funding possibilities)
  • relation collection step easiest (just sit back and relax until done, progress can be monitored)
  • matrix step more cumbersome (get your hands on a big machine, worry about bits)

Twinkle, 1999
• the first special purpose hardware factoring device since internet factoring became popular
• stands for The Weizmann INstitute Key Locating Engine
• special purpose optical sieving device to collect relations using QS or NFS (number field sieve)
• short history:
  • spring 1999: wild claims in the press that 512-bit RSA moduli can be broken very quickly
  • May 1999: Twinkle announced at the EC99 rump session
  • August 1999: 512-bit RSA actually broken (but not using Twinkle)
  • May 2000: Twinkle buried at EC2000

Regular sieving
• initialize s[i] = 0 for all i in some large interval I
• for all p ∈ P:
  • compute starting point r_p
  • for all r_p + kp ∈ I with k ∈ Z: replace s[r_p + kp] by s[r_p + kp] + log p
• further process all i ∈ I for which s[i] is large enough

Regular: sieve s represented by space, p ∈ P processed in time
Twinkle: sieve represented by time, p ∈ P processed in space (just like Carissan and the Lehmer sieves)

Twinkle sieving
1. Build a wafer with, for all p ∈ P:
   • a cell with:
     • a counter c starting at 0
     • a register a containing r_p, the starting point for p
     • an LED of strength proportional to log p
2. Put a photo-electric cell opposite the wafer
for i = 0, 1, 2, … in succession:
• on all cells simultaneously:
  • if c = a: flash the LED and replace a by a + p
  • replace c by c + 1
  (so for cell p, light of intensity log p flashes at i = r_p + kp)
• if the light intensity at the photo-electric cell is strong enough: many p’s flash at i, thus further process i

Analysis of Twinkle
• for 384-bit QS factorization: not clearly infeasible
• for anything interesting (such as, back then, 512-bit moduli):
  • wafer too large to be practical → multi-wafer designs
  • wafer may melt (part of audience did) → run it at lower speed
  • processing of reports too expensive → add hardware
• idea in the meantime abandoned
• except for a rather crude prototype, device never built

Factoring circuits, 2002
At least two interesting aspects:
1. Claim that 3d-digit integers can be factored at the cost of d-digit integers using the old method
2. A new method to do the matrix step
Influential, because:
• it caused confusion (almost panic), thus got a lot of attention
• triggered lots of new activity in this field (possibly even culminating in the present workshop)
• pushed a new, better cost function: time × equipment cost

Matrix step
Find a dependency mod 2 among the columns of sparse A: compute A^i v for some vector v and 1 ≤ i ≤ m = dim(A) (plus additional fiddling around)
Traditional:
• matrix-by-vector multiply in time w(A) = O(m)
• repeat m times: total time O(m^2)
• but: needs O(m) memory
⇒ full cost O(m^3) (time O(m^2) × equipment O(m))
Bernstein’s sorting method (or Shamir/Tromer routing variant):
• store matrix in square mesh of w(A) = O(m) processors
• matrix-by-vector multiply in time O(m^(1/2)) on mesh
• m times: total time O(m^(3/2)) (but O(m^(5/2)) operations)
⇒ full cost O(m^(5/2)) (time O(m^(3/2)) × equipment O(m))

Matrix step hardware proposals and claims
This workshop, and earlier:
• several mesh proposals
• systolic architecture(s)
Results and claims for 1024-bit moduli
• strongly depend on dimension and density of the matrix
• results of mostly speculative nature
• matrix step still seems not as hard as relation collection
• known: factor base sizes that will most likely work
• no real clue yet about dimension and density of the matrix

Relation collection in the NFS
A common version of the problem:
• integer m, polynomial f of degree d, smoothness bounds B1, B2
• find many coprime integers a and b > 0 such that |a − mb| is B1-smooth and |b^d f(a/b)| is B2-smooth
Software approaches:
• line sieving: for b = 1, 2, … in succession process a line of a’s
• special q: for many q’s, look at a, b with q | b^d f(a/b):
  • do line sieving in the index q sublattice (insane but common)
  • do lattice sieving as suggested by
    • Pollard’s ancient paper (not so bad)
    • Franke and Kleinjung’s SHARCS paper (looks promising)
• combine with or replace by non-sieving methods such as elliptic curve factoring or FFT & gcd based methods

Special purpose hardware to collect relations
My limited understanding of the situation (as of Feb 23):
• TWIRL: line siever (or KF lattice siever?) with ‘priority queues’, ‘challenging’ pipelined design
  ⇒ 1024-bit in about 1M$yr
• YASD: traditional line siever, mesh based, no inter-chip connects, 6.3 times slower than TWIRL
• SHARK: KF lattice siever with special cofactor hardware, modular, realizable ASIC design
  ⇒ 1024-bit in < 200M$yr
• ECCITY: replace all sieving by Elliptic Curve Factoring, ‘fills entire country with multicomputers, each of which has the size of a major city’ (at break-even point)

Putting 1-200M$yr into context
SHA-1 random collision attack:
• fewer than 2^69 SHA-1 applications
• SHA-1 application takes fewer than 900 cycles
• PlayStation 3 VGA card (16 vector 4.5GHz PE’s) costs US$50
• attacking SHA-1 on a single COTS card takes 2K centuries
⇒ attacking SHA-1 costs 10M$yr (200 000 card-years × US$50)
• same ballpark as 1024-bit RSA
• same cards: crack DES in a day for about 200K
• SHA-1 attack cost: down from 20B$yr a few weeks ago…
• at least a factor 200 gap between 1024-bit RSA and 80-bit security
• what about running ECM on those cards?

Conclusion
• design and evaluation of current special purpose hardware factoring devices still mostly in the mud-slinging phase
• listen to the talks here and make up your own mind
• my pessimistic guesses:
  • none of the currently proposed devices will collect relations for an actual 1024-bit factorization anytime soon
  • special purpose factoring hardware will not have much impact on the security of RSA moduli until quantum computers are built
(I hope I will be proved wrong)
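
Sketch: the early wheel sieves in software
The wheel idea from ‘The early machines’ can be mimicked in a few lines of Python. This is only a rough sketch under stated assumptions, not a model of any actual machine: the function names, the default set of small primes and the max_turns cut-off are made up for illustration, and x starts at the smallest integer with x^2 ≥ n. Each ‘wheel’ stores the positions i mod p at which (x0 + i)^2 − n is a square mod p; the ‘alarm’ goes off when all wheels show a marked position, after which the candidate still has to be checked, since lining up is necessary but not sufficient.

```python
from math import isqrt

def square_residues(p):
    """Set of squares modulo p (including 0)."""
    return {(y * y) % p for y in range(p)}

def build_wheels(n, primes, x0):
    """One wheel per prime p: the offsets i mod p where (x0 + i)^2 - n is a square mod p."""
    wheels = []
    for p in primes:
        squares = square_residues(p)
        marked = {i for i in range(p) if ((x0 + i) ** 2 - n) % p in squares}
        wheels.append((p, marked))
    return wheels

def wheel_sieve_factor(n, primes=(3, 5, 7, 11, 13, 17, 19), max_turns=10**6):
    """Return (x - y, x + y) with n = x^2 - y^2, or None if nothing lines up in max_turns turns."""
    x0 = isqrt(n)
    if x0 * x0 < n:
        x0 += 1                      # smallest x with x^2 >= n (illustrative choice)
    wheels = build_wheels(n, primes, x0)
    for i in range(max_turns):       # turn all wheels one position at a time
        # the 'alarm' sounds only if every wheel shows a marked position
        if all(i % p in marked for p, marked in wheels):
            x = x0 + i
            y2 = x * x - n
            y = isqrt(y2)
            if y * y == y2:          # alarms can be false, so verify the square
                return x - y, x + y
    return None

print(wheel_sieve_factor(8051))      # (83, 97)
```

For n = 8051 the wheels line up immediately at x = 90, since 90^2 − 8051 = 49. For the odd part of Carissan’s example, 3 570 537 526 921, the same loop needs several hundred thousand turns before it stops at a non-trivial factor pair.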
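
Sketch: regular sieving versus Twinkle sieving
The space/time swap between ‘Regular sieving’ and ‘Twinkle sieving’ may be easier to see side by side. The following Python sketch is a toy under stated assumptions: the function names, the starts dictionary (mapping each p to a starting point r_p with 0 ≤ r_p < p), the interval [0, length) and the threshold are all illustrative, and a software loop over the cells stands in for what the wafer would do on all cells simultaneously.

```python
from math import log

def regular_sieve(starts, length, threshold):
    """Sieve array s lives in memory (space); the primes are handled one after another (time)."""
    s = [0.0] * length
    for p, r in starts.items():
        for i in range(r, length, p):     # positions r_p + k*p inside the interval
            s[i] += log(p)
    return [i for i in range(length) if s[i] >= threshold]

def twinkle_sieve(starts, length, threshold):
    """Positions i are visited one after another (time); one cell per prime (space)."""
    next_flash = dict(starts)             # register a of each cell, initially r_p
    reports = []
    for c in range(length):               # counter c, identical in every cell
        light = 0.0                       # total intensity seen by the photo-electric cell
        for p in next_flash:
            if next_flash[p] == c:        # cell p flashes with intensity log p
                light += log(p)
                next_flash[p] = c + p     # replace a by a + p
        if light >= threshold:
            reports.append(c)             # many p's flash at c: process it further
    return reports

starts = {2: 1, 3: 0, 5: 2, 7: 3}         # hypothetical starting points r_p
print(regular_sieve(starts, 60, 3.4) == twinkle_sieve(starts, 60, 3.4))   # True
```

Both functions report the same positions; the only difference is which loop is outermost, which is the point of the comparison on the slide.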