Extending the Generalized Fermat Prime Number Search Beyond One Million Digits Using GPUs

Iain Bethune¹ and Michael Goetz²

¹ EPCC, The University of Edinburgh, James Clerk Maxwell Building, The King’s Buildings, Mayfield Road, Edinburgh, EH9 3JZ, UK, ibethune@epcc.ed.ac.uk, http://www.epcc.ed.ac.uk/~ibethune
² PrimeGrid, mgoetz@primegrid.com, http://www.primegrid.com

Abstract. Great strides have been made in recent years in the search for ever larger prime Generalized Fermat Numbers (GFN). We briefly review the history of the GFN prime search, and describe new implementations of the ‘Genefer’ software (now available as open source) using CUDA and optimised CPU assembler which have underpinned this unprecedented progress. The results of the ongoing search are used to extend Gallot and Dubner’s published tables comparing the theoretical predictions with actual distributions of primes, and we report on recent discoveries of GFN primes with over one million digits.

Keywords: Generalized Fermat Numbers, Primality Testing, Volunteer Computing, Computational Mathematics, GPU Computing, CUDA

1 Background

Computational number theory, and in particular the search for large prime numbers, has grown steadily in popularity over the last two decades. Following the lead of projects such as the Great Internet Mersenne Prime Search (GIMPS), tens of thousands of volunteers now contribute computer time to projects such as “Seventeen or Bust”, which attempts to solve the Sierpiński Problem [9], and to searches for primes of particular types including Proth ($k \cdot 2^n + 1$, $k < 2^n$), Riesel ($k \cdot 2^n - 1$, $k < 2^n$), Cullen ($n \cdot 2^n + 1$) and Woodall ($n \cdot 2^n - 1$) primes. Many of these prime searches are coordinated by the PrimeGrid [10] project, which uses the Berkeley Open Infrastructure for Network Computing (BOINC) [1] to allow client computers to download, process, and return work units consisting of primality tests or sieving.

The Generalized Fermat Numbers (GFN) are defined as having the form $F_{b,n} = b^{2^n} + 1$. Starting in 2000, Yves Gallot led a very active and well-organised distributed search for GFN primes using his ‘Proth’ and ‘Genefer’ programs. Many GFN primes were found with over 100,000 digits and preliminary results were published in a seminal paper by Gallot and Dubner [5] in 2002. However, by 2004 the project had drawn to a gradual conclusion, with the exception of a few individual searchers.

In 2009 PrimeGrid restarted the GFN search from where the previous effort left off, searching n ≥ 15. Due in part to increased CPU power, a very large user base and improvements to Gallot’s software, exceptional progress has been made to date, which we report hereafter.

2 Software for GFN Searching

2.1 PRP Testing

During the early years of the GFN search Gallot’s original C program ‘Genefer’ was used to perform probable primality (PRP) tests on GFNs. An overview of the implementation of the PRP test is given by Gallot and Dubner [5], and details of FFT multiplication modulo Fermat numbers are given by Crandall and Fagin [4]. The program was later modified by Gallot and David Underbakke, who rewrote the critical numerical routines (FFT and modular reduction) in Intel assembly language. One variant, ‘Genefer80’, made use of the Intel x87 instruction set, which allows use of the extended 80-bit precision of the x87 Floating Point Unit compared to the standard 64-bit ‘double precision’ of the x86 FPU. By taking care to ensure all intermediate values are stored at this higher precision, much larger values of b can be tested for a given n before encountering round-off errors in the conversion from floating-point back into integer representation. Although slightly slower than the C implementation, the ability to test larger b values has been invaluable as the search for n ≤ 16 has now passed the b limit of ‘Genefer’ (see Table 2 for the current search limits).

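As a small-scale illustration of the probable primality test discussed above, the following Python sketch applies a Fermat PRP test to tiny GFNs using built-in modular exponentiation. It is not the Genefer implementation: the function names `gfn` and `is_fermat_prp` are hypothetical, the base a = 3 is an assumption made for the example, and production codes replace the modular squarings performed inside `pow` with FFT-based multiplication.

```python
def gfn(b: int, n: int) -> int:
    """Return the generalized Fermat number F_{b,n} = b^(2^n) + 1."""
    return b ** (2 ** n) + 1


def is_fermat_prp(N: int, a: int = 3) -> bool:
    """Fermat probable-prime test: check whether a^(N-1) == 1 (mod N).

    The base a = 3 is an illustrative assumption, not necessarily the
    base used by Genefer. A True result means 'probably prime' only.
    """
    return pow(a, N - 1, N) == 1


if __name__ == "__main__":
    # F_{2,4} = 65537 is prime, F_{2,5} = 641 * 6700417 is composite,
    # F_{6,2} = 1297 is prime.
    for b, n in [(2, 4), (2, 5), (6, 2)]:
        print(b, n, is_fermat_prp(gfn(b, n)))
```

The cost of the test is dominated by roughly $2^n \log_2 b$ modular squarings of a number the size of $F_{b,n}$, which is why the speed and accuracy of the multiplication routine matter so much for large n.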
A further variant, ‘Genefx64’, uses the SSE2 vector instruction set, allowing modern CPUs to compute the FFT at nearly twice the speed of ‘Genefer’ with similar accuracy. Since all Intel 64-bit processors support SSE2, the original C implementation is now essentially obsolete, and is only used by the few remaining 32-bit processors participating in the search. The speeds and b limits of each of these variants are compared in Table 1.

Table 1. b limits and performance (ms per multiplication) of Genefer variants for selected n. Tests performed on an Intel Core 2 Quad 2.4 GHz running Windows 7 Pro 64-bit, with an Nvidia GTX460 1350 MHz (Driver 285.86).

      Genefer80             Genefer              Genefx64             GeneferCUDA
n     b limit      t (ms)   b limit     t (ms)   b limit     t (ms)   b limit     t (ms)
15    67,210,000   2.34     1,630,000   1.67     1,575,000   0.912    1,840,000   0.212
17    45,450,000   11.2     1,095,000   7.54     1,060,000   4.05     1,270,000   0.601
19    30,020,000   57.4     695,000     35.3     735,000     19.3     815,000     1.98
21    20,250,000   277      490,000     175      515,000     102      580,000     8.23
22    -            -        -           -        -           -        480,000     16.5

When PrimeGrid restarted the GFN prime search in 2009, the Genefer applications were extended with a checkpoint/restart capability and integrated with Mark Rodenkirch’s PRPNet software, which coordinated the distribution of PRP tests to client computers and the recording and reporting of results. Initially, the ‘Genefer80’ and ‘Genefx64’ applications were only available for the MS Windows platform, and testing began for n = 15, 16, 18, 19 (n = 17 continues to be searched independently by participants in the original GFN prime search).

The authors’ contributions to the development of these programs began with the porting of the ‘Genefer80’ and ‘Genefx64’ assembly codes from Intel syntax to AT&T/GNU syntax, allowing them to be compiled using the GNU GCC compiler and made available for Mac OS X and Linux platforms. At the same time, an initial port of ‘Genefer’ was developed by Shoichiro Yamada using Nvidia’s ‘Compute Unified Device Architecture’ (CUDA) programming model, and subsequently optimised and extended by the authors. For a comprehensive overview of CUDA and Graphics Processing Units (GPUs), we refer the reader to Nickolls et al. [8]. For our purposes it suffices to say that many modern computers contain GPUs providing performance of 100 to 1000 GFLOPS (billion floating point operations per second), compared with around 10 GFLOPS from a typical CPU core. The FFT operation in ‘GeneferCUDA’ is performed by Nvidia’s CUFFT library, and in order to minimise the cost of repeatedly transferring data to and from the GPU, the remaining steps in the calculation loop have been ported to CUDA kernels that run on the GPU. As shown in Table 1 this results in significant speedups (4.3x faster than Genefx64 for n = 15, and 9.7x for n = 19). More importantly, however, the advent of ‘GeneferCUDA’ has allowed larger values of n to be tackled that would take prohibitively long on a CPU. For example, a typical test at n = 22 that takes around a week on a GPU would take over 3 months on a CPU! Testing of GFN for n = 22 has already begun, and results of the search so far are reported in Section 3. The introduction of the CUDA code in the n = 19 search has increased the rate of progress so much that we have also been able to start searching the n = 20 range.

Most recently, in early 2012, the authors added support for BOINC directly in our code, allowing the GFN prime search to be offered via the PrimeGrid BOINC project rather than requiring participants to install the PRPNet client. All the ‘Genefer’ variants have been unified into a single program, providing a single consistent interface independent of the actual calculation method employed. In addition, this will make the development of any additional FFT implementations much easier, and will facilitate future maintainability of the software.

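To relate the per-multiplication timings of Table 1 to the duration of a complete test, the following Python sketch gives a rough estimate. It assumes that a PRP test costs about one modular squaring per bit of $F_{b,n}$, i.e. $2^n \log_2 b$ squarings, and ignores all other overheads; the function names and the choice b = 10,000 are hypothetical and used only for illustration.

```python
import math


def estimated_test_seconds(b: int, n: int, ms_per_mul: float) -> float:
    """Rough PRP test time, assuming one squaring per bit of F_{b,n}."""
    squarings = (2 ** n) * math.log2(b)  # log2(b^(2^n)) = 2^n * log2(b)
    return squarings * ms_per_mul / 1000.0


def speedup(t_slow_ms: float, t_fast_ms: float) -> float:
    """Ratio of two per-multiplication timings."""
    return t_slow_ms / t_fast_ms


if __name__ == "__main__":
    # GeneferCUDA at n = 22 takes 16.5 ms per multiplication (Table 1).
    days = estimated_test_seconds(10_000, 22, 16.5) / 86_400
    print(f"estimated GPU time at n = 22, b = 10,000: {days:.1f} days")
    # Genefx64 versus GeneferCUDA timings from Table 1.
    print(f"n = 15 speedup: {speedup(0.912, 0.212):.1f}x")
    print(f"n = 19 speedup: {speedup(19.3, 1.98):.1f}x")
```

Under these assumptions the estimate for n = 22 comes out at roughly ten days on the GPU of Table 1, the same order of magnitude as the "around a week" quoted in the text.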
We have also made our programs freely available in both source and binary forms from https://www.assembla.com/spaces/genefer, which we believe is a significant contribution to the community.

2.2 Sieving

Despite the excellent performance obtained with recent versions of ‘Genefer’, searching a large range of candidates efficiently (here, the b values to be tested for each n in $F_{b,n}$) requires, as in other prime searches, a sieve to remove candidates which have ‘small’ prime factors. The sieving algorithm used was developed by Phil Carmody [3]. Deciding exactly when to stop sieving (the depth of the sieve) depends on the relative speed at which the sieving program can find factors compared to the rate at which the primality testing program can test the remaining candidates. Initially, we carried out sieving using the ‘AthGFNSv’ program developed by Underbakke, Gallot and Carmody. However, in May 2012 a CUDA sieving program, ‘GFNSvCUDA’, was implemented by Anand Nair, which was dramatically faster than the existing CPU sieve. For example, at n = 19, several years of sieving on CPUs had reached a depth of 3070P (i.e. trial factors up to $3.07 \times 10^{18}$ had been checked). Within the first 6 months of sieving on GPUs, a depth of 19100P was reached (including a re-check of the original 3070P), and the sieving effort was stopped as it is now more efficient to PRP test the remaining candidates directly.

Table 2. Contiguous search limits and largest known primes for each n.

n    b limit (Sep 2013)   Largest prime             Date       Decimal digits
15   6,961,316            $15547296^{32768} + 1$    Jul 2011   235,657
16   3,196,780            $19502212^{65536} + 1$    Jan 2005   477,763
17   1,166,000            $1372930^{131072} + 1$    Sep 2003   804,474
18   1,024,466            $773620^{262144} + 1$     Apr 2012   1,543,643
19   750,244              $475856^{524288} + 1$     Aug 2012   2,976,633
20   201,460              -                         -          -
22   10,428               -                         -          -

3 Distribution of Large GFN Primes

At present, PrimeGrid is actively searching 15 ≤ n ≤ 22, with the exception of n = 17, which is reserved by independent searchers.
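The idea behind the sieve of Section 2.2 can be illustrated with a naive Python sketch. It relies on the standard fact that any odd prime factor p of $F_{b,n}$ satisfies $p \equiv 1 \bmod 2^{n+1}$, and that p divides $F_{b,n}$ exactly when $b^{2^n} \equiv -1 \pmod p$. This is not Carmody's algorithm nor the ‘AthGFNSv’ or ‘GFNSvCUDA’ implementation; the function names are hypothetical and the approach is only practical for tiny parameters.

```python
def is_prime(m: int) -> bool:
    """Trial-division primality check, adequate for the tiny p used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def candidate_factors(n: int, p_max: int) -> list[int]:
    """Primes p < p_max of the form k * 2^(n+1) + 1."""
    step = 2 ** (n + 1)
    return [p for p in range(step + 1, p_max, step) if is_prime(p)]


def sieve(n: int, b_max: int, p_max: int) -> list[int]:
    """Even b in [2, b_max] whose F_{b,n} has no proper prime factor < p_max.

    Odd b are skipped because F_{b,n} is then even.
    """
    exponent = 2 ** n
    primes = candidate_factors(n, p_max)
    survivors = []
    for b in range(2, b_max + 1, 2):
        f = b ** exponent + 1
        # p divides F_{b,n} exactly when b^(2^n) == -1 (mod p); the check
        # p != f keeps candidates that are themselves a small prime.
        if not any(p != f and pow(b, exponent, p) == p - 1 for p in primes):
            survivors.append(b)
    return survivors


if __name__ == "__main__":
    print(candidate_factors(2, 100))
    print(sieve(2, 20, 100))
```

A production sieve works the other way round: for each prime p it computes the $2^n$ residues b modulo p that are eliminated and strikes them out of the candidate range, which is far cheaper than testing every (b, p) pair.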
The n = 21 case is still in the process of sieving, but good progress has been made in primality testing the other n, which we summarise in Table 2. Note that for n = 15, 16, 17 the largest known GFN prime is significantly beyond the current b limit reported. This reflects the fact that while every b below the reported values is known to have been tested, individual searchers have tested small ranges far in advance of the current organised search limit.

In their 2002 paper [5] Gallot and Dubner presented a method for calculating the expected number of GFN primes for each n up to a particular limit of b. They showed excellent agreement between the predicted and the actual numbers of primes found for n ≤ 12, $b \le 10^6$ and n = 13, 14, $b \le 10^4$, based on the then current search limits. We have calculated the expected numbers of GFN primes for each n up to our new search limits using Gallot’s method and compared them with the actual numbers of primes found to date in Table 3. For ease of comparison with Gallot and Dubner’s tables, we also report the difference between estimated and actual numbers of primes in terms of standard deviations. In addition to PrimeGrid’s database, the Largest Known Primes Database [2] was used to provide data for smaller b and n values.

Table 3. Comparison of predicted and actual number of GFN primes for 13 ≤ n ≤ 22 up to current search limits.

           $b \le 10^5$         $b \le 10^6$         Search limit
$2^n$      Est.  Act.  Err.     Est.  Act.  Err.     b            Est.  Act.  Err.
8192       10    3     -2.2     81    74    -0.8     13,000,000   764   730   -1.2
16384      5     1     -1.7     38    33    -0.9     4,560,000    156   137   -1.5
32768      2     1     -0.5     14    16    0.6      6,961,000    84    91    0.8
65536      2     1     -0.5     13    14    0.2      3,196,000    35    38    0.5
131072     1     1     0.2      7     5     -0.6     1,166,000    8     7     -0.4
262144     0     2     2.2      4     7     1.5      1,024,000    4     7     1.5
524288     0     1     1.6      2                    750,000      2     4     2.0
1048576    0                    1                    201,460      0     0     0.0
...
4194304    0                    0           -        10,428       0     0     0.0

We observe that while most of the findings are broadly in line with the predicted values (indeed, over 50% of the errors are less than one standard deviation), there appear to be significant excesses of GFN primes for n = 18, 19, particularly for small b. Unfortunately, with the current b limits, the number of primes is too low to assess the probability that the predicted distribution of primes is correct via the Chi Squared Test. Nevertheless, it is still possible to check the validity of the prediction: if Gallot’s expression for the number of GFN primes for given b, n were too small, then we should see more candidates remaining after sieving than expected. Dubner and Keller [6] showed that a given prime $p = k \cdot 2^{n+1} + 1$ divides $F_{b,n}$ with probability $2^n/p$ (averaged over all b). Thus if we sieve R GFNs with all potential divisors $p < p_{max}$, the expected number of candidates remaining is

$$\prod_{\substack{p < p_{max} \\ p \equiv 1 \bmod 2^{n+1}}} \left(1 - \frac{2^n}{p}\right) \cdot R \qquad (1)$$

Applying Mertens’ 3rd theorem [7] we have

$$\prod_{\substack{p < p_{max} \\ p \equiv 1 \bmod 2^{n+1}}} \left(1 - \frac{2^n}{p}\right) = \frac{2 C_n}{e^{\gamma} \log(p_{max})} \qquad (2)$$

where

$$C_n = \prod_{p \neq 2} \frac{1 - a_n(p)/p}{1 - 1/p}, \qquad a_n(p) = \begin{cases} 2^n & \text{if } p \equiv 1 \bmod 2^{n+1}, \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$

So sieving the GFNs $F_{b,n}$, $b \in [2, B_{max}]$, of which only the $R = B_{max}/2$ with even b need be considered (since $F_{b,n}$ is even for odd b), we expect the number of candidates remaining to be

$$e^{-\gamma} C_n B_{max} / \log(p_{max}) \qquad (4)$$

Table 4. Expected and actual candidates remaining after sieving to a depth of $p_{max}$.

                              Candidates remaining
n    $p_{max}$                Expected      Actual
18   $2.510 \cdot 10^{18}$    17,228,044    17,300,322
19   $1.855 \cdot 10^{19}$    16,577,985    16,546,522
20   $1.985 \cdot 10^{19}$    18,321,722    18,342,741
21   $1.935 \cdot 10^{19}$    20,355,000    20,378,158
22   $2.120 \cdot 10^{19}$    21,953,527    21,952,320

As shown in Table 4, we find excellent agreement between the expected and actual number of candidates remaining after sieving. As a result, we assert that the excess of primes for n = 18, 19 is no more than a statistical anomaly. Further searching at these n, as well as n = 20, 21, 22 for which we currently have little data, will be needed to confirm or refute this.

4 GFN Mega-Primes

As a result of the aforementioned extensions to the ‘Genefer’ program and wide participation in the search since it was made available through the BOINC platform, we have made rapid progress to high b values, particularly for n ≥ 18 where the CUDA implementation has been used. Consequently we have discovered a number of GFN mega-primes (primes with over 1 million decimal digits), which are listed in Table 5. Note that the two primes marked with an asterisk were found by PrimeGrid’s Proth prime search, rather than the GFN search, but since they can be expressed as GFNs with n = 1 they are included for completeness. Prior to our search efforts, only one GFN mega-prime was known: $24518^{262144} + 1$, with 1,150,678 digits, found in March 2008 by Stephen Scott, searching independently.

Table 5. GFN mega-primes found by PrimeGrid.

GFN                         Digits      Finder            Date       Software
$475856^{524288} + 1$       2,976,633   Masashi Kumagai   Aug 2012   GeneferCUDA
$356926^{524288} + 1$       2,911,151   Tim McArdle       Jul 2012   Genefx64
$341112^{524288} + 1$       2,900,832   Peyton Hayslette  Jun 2012   GeneferCUDA
$75898^{524288} + 1$        2,558,647   Michael Goetz     Nov 2011   GeneferCUDA
$773620^{262144} + 1$       1,543,643   Senji Yamashita   Apr 2012   GeneferCUDA
$676754^{262144} + 1$       1,528,413   Carlos Loureiro   Feb 2012   GeneferCUDA
$525094^{262144} + 1$       1,499,526   David Tomecko     Jan 2012   GeneferCUDA
$361658^{262144} + 1$       1,457,075   Michel Johnson    Nov 2011   GeneferCUDA
$145310^{262144} + 1$       1,353,265   Ricky L Hubbard   Feb 2011   Genefx64
$40734^{262144} + 1$        1,208,473   Senji Yamashita   Mar 2011   Genefx64
$9 \cdot 2^{3497442} + 1$*  1,052,836   Heinz Ming        Oct 2012   LLR
$81 \cdot 2^{3352924} + 1$* 1,009,333   Michal Gasewicz   Jan 2012   LLR

5 Continuing the Search

The results reported above are only a snapshot in time from an ongoing, popular prime search project. We intend to continue the search for large GFN primes for all n ≥ 15, including n = 21 which is currently unsearched.
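The decimal lengths quoted for the primes in Table 5 can be checked without constructing the numbers themselves, since $b^{2^n} + 1$ has $\lfloor 2^n \log_{10} b \rfloor + 1$ decimal digits. The Python sketch below does this in double precision; the function name `gfn_decimal_digits` is hypothetical, and the floating-point evaluation is assumed to be accurate enough, which holds unless $2^n \log_{10} b$ lies extremely close to an integer.

```python
import math


def gfn_decimal_digits(b: int, n: int) -> int:
    """Decimal digit count of F_{b,n} = b^(2^n) + 1, for b >= 2.

    Uses floor(2^n * log10(b)) + 1; adding 1 to b^(2^n) never changes
    the digit count because b^(2^n) is never of the form 99...9.
    """
    return math.floor((2 ** n) * math.log10(b)) + 1


if __name__ == "__main__":
    # Largest PrimeGrid GFN prime in Table 5, and the earlier mega-prime
    # found by Stephen Scott.
    print(gfn_decimal_digits(475856, 19))
    print(gfn_decimal_digits(24518, 18))
```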
Of particular interest to many participants is the search at n = 22, where the GFNs currently being tested have decimal lengths of over 17.1 million digits, close to the size of the largest known prime $2^{57885161} - 1$ (17.4 million digits). The b limit of ‘GeneferCUDA’ for n = 22 corresponds to GFNs of 23 million digits, meaning that a prime found during this search has a chance of becoming the largest known prime of any kind, a position that has been held solely by Mersenne primes since the discovery of $M_{756839}$ in 1992.

In order to support the ongoing search we will continue to develop ‘Genefer’ to take advantage of the latest computing hardware. In particular, versions able to exploit non-Nvidia GPU hardware (for example using the OpenCL library) and Intel’s Advanced Vector Extensions (AVX) may prove invaluable in the search for a new world record GFN prime.

Acknowledgements

The first author acknowledges the support of NAIS, the Centre for Numerical Algorithms and Intelligent Software (EPSRC grant EP/G036136/1). We also wish to thank several people who have contributed to the GFN prime search. First, we thank Yves Gallot for popularising the search, developing the initial Genefer code upon which the entire project is based, and also for useful discussions concerning the purported excess of primes at large n (see Section 3). Second, we thank David Underbakke, Mark Rodenkirch, Ken Brazier, Shoichiro Yamada, Ronald Schneider and Anand Nair, who have all contributed to the ongoing development of the PRP testing and sieving software. Third, thanks go to the PrimeGrid team, Rytis Slatkevicius, Lennart Vogel and John Blazek, without whom the search would not have reached such a wide audience. Finally, we are grateful to all the ‘crunchers’ who have dedicated their computer resources and made possible the ongoing success of the search.

References

1. Anderson, D.: BOINC: A system for public-resource computing and storage. In: Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing. pp. 4–10. GRID ’04, IEEE Computer Society, Washington, DC, USA (2004), http://dx.doi.org/10.1109/GRID.2004.14
2. Caldwell, C.: The Prime Pages: the largest known primes database. http://primes.utm.edu
3. Carmody, P.: GFN filters. http://fatphil.org/maths/GFN/maths.html
4. Crandall, R., Fagin, B.: Discrete weighted transforms and large-integer arithmetic. Math. Comp. 62, 305–324 (1994)
5. Dubner, H., Gallot, Y.: Distribution of generalized Fermat prime numbers. Math. Comp. 71, 825–832 (2002)
6. Dubner, H., Keller, W.: Factors of generalized Fermat numbers. Math. Comp. 64, 397–405 (1995)
7. Mertens, F.: Ein Beitrag zur analytischen Zahlentheorie. J. reine angew. Math. 78, 46–62 (1874)
8. Nickolls, J., Buck, I., Garland, M., Skadron, K.: Scalable Parallel Programming with CUDA. Queue 6(2), 40–53 (Mar 2008), http://doi.acm.org/10.1145/1365490.1365500
9. Sierpiński, W.: Sur un problème concernant les nombres $k \cdot 2^n + 1$. Elem. Math. 115, 73–74 (1960)
10. Slatkevicius, R., Vogel, L., Blazek, J.: PrimeGrid website. http://www.primegrid.com