Extending the Generalized Fermat Prime
Number Search Beyond One Million Digits
Using GPUs
Iain Bethune (1) and Michael Goetz (2)
(1) EPCC, The University of Edinburgh, James Clerk Maxwell Building, The King’s
Buildings, Mayfield Road, Edinburgh, EH9 3JZ, UK, ibethune@epcc.ed.ac.uk,
http://www.epcc.ed.ac.uk/~ibethune
(2) PrimeGrid, mgoetz@primegrid.com, http://www.primegrid.com
Abstract. Great strides have been made in recent years in the search for
ever larger prime Generalized Fermat Numbers (GFN). We briefly review
the history of the GFN prime search, and describe new implementations
of the ‘Genefer’ software (now available as open source) using CUDA and
optimised CPU assembler which have underpinned this unprecedented
progress. The results of the ongoing search are used to extend Gallot and
Dubner’s published tables comparing the theoretical predictions with
actual distributions of primes, and we report on recent discoveries of
GFN primes with over one million digits.
Keywords: Generalized Fermat Numbers, Primality Testing, Volunteer
Computing, Computational Mathematics, GPU Computing, CUDA
1 Background
Computational number theory and in particular the search for large prime numbers has grown steadily in popularity over the last two decades. Led by projects
like the Great Internet Mersenne Prime Search (GIMPS), tens of thousands
of volunteers now contribute computer time in support of projects such as
“Seventeen or Bust” (attempting to solve the Sierpiński Problem [9]) and
searches for primes of particular types including Proth ($k \cdot 2^n + 1$, $k < 2^n$), Riesel
($k \cdot 2^n - 1$, $k < 2^n$), Cullen ($n \cdot 2^n + 1$) and Woodall ($n \cdot 2^n - 1$) primes. Many
of these prime searches are coordinated by the PrimeGrid [10] project, which
uses the Berkeley Open Infrastructure for Network Computing (BOINC) [1] to
allow client computers to download, process, and return work units consisting
of primality tests or sieving.
The Generalized Fermat Numbers (GFN) are defined as having the form
$F_{b,n} = b^{2^n} + 1$. Starting in 2000 Yves Gallot led a very active and well-organised
distributed search for GFN primes using his ‘Proth’ and ‘Genefer’ programs.
Many GFN primes were found with over 100,000 digits and preliminary results
were published in a seminal paper by Gallot and Dubner [5] in 2002. However,
by 2004 the project drew to a gradual conclusion with the exception of a few
individual searchers.
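As a small illustration of the definition above, the following Python sketch evaluates $F_{b,n}$ exactly for tiny inputs and estimates its decimal length from logarithms. The function names `gfn` and `gfn_digits` are ours and are not part of any Genefer code; the digit estimate assumes double-precision logarithms are accurate enough, which holds comfortably for the sizes discussed in this paper.

```python
import math

def gfn(b, n):
    """Return F_{b,n} = b^(2^n) + 1 exactly (only practical for small b and n)."""
    return b ** (2 ** n) + 1

def gfn_digits(b, n):
    """Estimate the number of decimal digits of F_{b,n}.

    floor(2^n * log10(b)) + 1 is the digit count of b^(2^n). Adding 1 can only
    change the count if b^(2^n) consists solely of nines, which is impossible
    for even b, so the same value applies to F_{b,n}.
    """
    return math.floor((2 ** n) * math.log10(b)) + 1

print(gfn(2, 4))                 # the Fermat prime 65537
print(gfn_digits(475856, 19))    # size of a GFN with n = 19
```

The logarithmic form is what makes it possible to quote sizes for numbers that are far too large to write out.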
In 2009 PrimeGrid restarted the GFN search beginning from where the previous effort left off, searching n ≥ 15. Due in part to increased CPU power, a
very large user base and improvements to Gallot’s software, exceptional progress
has been made to date, which we report hereafter.
2 Software for GFN Searching
2.1 PRP Testing
During the early years of the GFN search Gallot’s original C program ‘Genefer’
was used to perform probable primality (PRP) tests on GFNs. An overview of
the implementation of the PRP test is given by Gallot and Dubner [5] and details of FFT multiplication modulo Fermat numbers are given by
Crandall and Fagin [4]. The program was later modified by Gallot and David
Underbakke, who rewrote the critical numerical routines (FFT and modular reduction) in Intel assembly language. One variant, ‘Genefer80’, makes use of the
Intel x87 instruction set, which provides the extended 80-bit precision of
the x87 Floating Point Unit, compared to standard 64-bit ‘double precision’
arithmetic. By taking care to ensure all intermediate values are stored at
this higher precision, much larger values of b can be tested for a given n before
encountering round-off errors in the conversion from floating-point back into integer representation. Although slightly slower than the C implementation, the
ability to test larger b values has been invaluable as the search for n ≤ 16 has now
passed the b limit of ‘Genefer’ (see Table 2 for the current search limits). Similarly, ‘Genefx64’ uses the SSE2 vector instruction set, allowing modern CPUs to
compute the FFT at nearly twice the speed of ‘Genefer’ with similar accuracy.
Since all Intel 64-bit processors support SSE2, the original C implementation
is now essentially obsolete, only used by the few remaining 32-bit processors
participating in the search. The speeds and b limits of each of these variants are
compared in Table 1.
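To make the idea of a PRP test concrete, here is a minimal Python sketch of a Fermat probable-prime test applied to $F_{b,n}$. It is not the Genefer implementation: Genefer performs the modular squarings with floating-point FFTs, whereas this sketch relies on Python's built-in arbitrary-precision `pow`. The function name `gfn_is_prp` and the choice of base 3 are assumptions made purely for illustration.

```python
def gfn_is_prp(b, n, base=3):
    """Fermat probable-prime test for N = b^(2^n) + 1.

    Returns True when base^(N-1) ≡ 1 (mod N). A True result means N is a
    probable prime to this base; False proves that N is composite.
    """
    N = b ** (2 ** n) + 1
    # N - 1 = b^(2^n), so the exponent has roughly 2^n * log2(b) bits and the
    # test costs about that many modular squarings.
    return pow(base, N - 1, N) == 1

print(gfn_is_prp(2, 4))   # 65537 is prime, so this prints True
print(gfn_is_prp(8, 1))   # 65 = 5 * 13, so this prints False
```

The cost of each squaring grows with the size of N, which is why the speed and accuracy of the FFT multiplication dominate the performance of the real programs.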
When PrimeGrid restarted the GFN prime search in 2009, the Genefer applications were extended with a checkpoint/restart capability and integrated
with Mark Rodenkirch’s PRPNet software which coordinated the distribution of
PRP tests to client computers and the recording and reporting of results. Initially, the ‘Genefer80’ and ‘Genefx64’ applications were only available for the MS
Windows platform, and testing began for n = 15, 16, 18, 19 (n = 17 continues to
be searched independently by participants in the original GFN prime search).
The authors’ contributions to the development of these programs began with
the porting of the ‘Genefer80’ and ‘Genefx64’ assembly codes from Intel-syntax
to AT&T/GNU syntax, allowing these to be compiled using the GNU GCC Compiler and made available for Mac OS X and Linux platforms. At the same time,
an initial port of ‘Genefer’ was developed by Shoichiro Yamada using Nvidia’s
‘Compute Unified Device Architecture’ (CUDA) programming model, and subsequently optimised and extended by the authors. For a comprehensive overview
of CUDA and Graphics Processing Units (GPUs), we refer the reader to Nickolls et al. [8]. For our purposes it suffices to say that many modern computers
Table 1. b limits and performance (ms per multiplication) of Genefer variants for
selected n. Tests performed on Intel Core 2 Quad 2.4 GHz running Windows 7 Pro 64
bit, with an Nvidia GTX460 1350MHz (Driver 285.86).

      Genefer80             Genefer               Genefx64              GeneferCUDA
n     b limit      t (ms)   b limit     t (ms)    b limit     t (ms)    b limit     t (ms)
15    67,210,000   2.34     1,630,000   1.67      1,575,000   0.912     1,840,000   0.212
17    45,450,000   11.2     1,095,000   7.54      1,060,000   4.05      1,270,000   0.601
19    30,020,000   57.4     695,000     35.3      735,000     19.3      815,000     1.98
21    20,250,000   277      490,000     175       515,000     102       580,000     8.23
22    -            -        -           -         -           -         480,000     16.5
contain GPUs providing performance of 100 to 1000 GFLOPS (billion floating-point
operations per second), compared with around 10 GFLOPS from a typical CPU core. The FFT operation in ‘GeneferCUDA’ is performed by Nvidia’s
CUFFT library, and in order to minimise the cost of repeatedly transferring
data to and from the GPU, the remaining steps in the calculation loop have
been ported to CUDA kernels able to run on the GPU. As shown in Table 1
this results in significant speedups (4.3x faster than Genefx64 for n = 15, and
9.7x for n = 19). More importantly, however, the advent of ‘GeneferCUDA’ has
allowed larger values of n to be tackled that would take prohibitively long on a
CPU. For example, a typical test at n = 22 that takes around a week on a GPU
would take over 3 months on a CPU! Testing of GFN for n = 22 has already
begun, and results of the search so far are reported in section 3. The introduction
of the CUDA code in the n = 19 search has increased the rate of progress so
much that we have also been able to start searching the n = 20 range.
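The speedups quoted above follow directly from the timings in Table 1, and the same timings give a rough feel for how long a single test takes. The sketch below assumes one modular multiplication per bit of the exponent $b^{2^n}$ and a constant time per multiplication; both are simplifications, and the value b = 10,000 in the usage line is a hypothetical candidate chosen only for illustration.

```python
import math

# Per-multiplication timings in ms, copied from Table 1.
T_GENEFX64 = {15: 0.912, 17: 4.05, 19: 19.3, 21: 102.0}
T_CUDA = {15: 0.212, 17: 0.601, 19: 1.98, 21: 8.23, 22: 16.5}

def speedup(n):
    """Ratio of Genefx64 to GeneferCUDA time per multiplication at a given n."""
    return T_GENEFX64[n] / T_CUDA[n]

def estimated_test_days(b, n, ms_per_mult):
    """Rough duration of one PRP test of F_{b,n}, in days.

    Assumes about 2^n * log2(b) multiplications (one per bit of b^(2^n)),
    each taking ms_per_mult milliseconds.
    """
    multiplications = (2 ** n) * math.log2(b)
    return multiplications * ms_per_mult / 1000.0 / 86400.0

print(round(speedup(15), 1), round(speedup(19), 1))
print(estimated_test_days(10000, 22, T_CUDA[22]))
```

Under these assumptions an n = 22 test on the GPU is a matter of days, consistent in order of magnitude with the figure quoted in the text.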
Most recently, in early 2012, the authors added support for BOINC directly in
our code, allowing the GFN prime search to be offered via the PrimeGrid BOINC
project rather than requiring participants to install the PRPNet client. All the
‘Genefer’ variants have been unified into a single program, allowing a single
consistent interface independent of the actual calculation method employed. In
addition, this will make the development of any additional FFT implementations
much easier, and will facilitate future maintainability of the software. Finally, we
have made our programs freely available in both source and binary forms from
https://www.assembla.com/spaces/genefer, which we believe is a significant
contribution to the community.
2.2 Sieving
Despite the excellent performance obtained with recent versions of ‘Genefer’, in
common with other prime searches we employ a sieve to search a large range of candidates efficiently (here the b values to be tested for each n in $F_{b,n}$), removing
candidates which have ‘small’ prime factors. The sieving algorithm used
was developed by Phil Carmody [3]. Deciding exactly when to stop sieving (the depth of the sieve) is a function of the relative speed at which the sieving
program can find factors compared to the rate at which the primality testing
Table 2. Contiguous search limits and largest known primes for each n.

n    b limit (Sep 2013)   Largest Prime              Date       Decimal digits
15   6,961,316            $15547296^{32768} + 1$     Jul 2011   235,657
16   3,196,780            $19502212^{65536} + 1$     Jan 2005   477,763
17   1,166,000            $1372930^{131072} + 1$     Sep 2003   804,474
18   1,024,466            $773620^{262144} + 1$      Apr 2012   1,543,643
19   750,244              $475856^{524288} + 1$      Aug 2012   2,976,633
20   201,460              -                          -          -
22   10,428               -                          -          -
program can test the remaining candidates. Initially, we carried out sieving using
the ‘AthGFNSv’ program developed by Underbakke, Gallot and Carmody. However, in May 2012 a CUDA sieving program ‘GFNSvCUDA’ was implemented
by Anand Nair, which was dramatically faster than the existing CPU sieve. For
example, at n = 19, several years of sieving on CPUs had reached a depth of
3070P (i.e. trial factors up to $3.07 \times 10^{18}$ had been checked). Within the first 6
months of sieving on GPUs, a depth of 19100P had been reached (including a
re-check of the original 3070P), and the sieving effort was stopped as it is now more
efficient to PRP test the remaining candidates directly.
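The principle of the sieve can be shown with a toy Python sketch. This is a brute-force illustration only, not Carmody's algorithm nor the method used by ‘AthGFNSv’ or ‘GFNSvCUDA’: it relies on the facts that odd b give even $F_{b,n}$, and that any odd prime factor p of $F_{b,n}$ has the form $k \cdot 2^{n+1} + 1$. The function names are ours.

```python
def is_prime(m):
    """Trial-division primality check, adequate for small sieve primes."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True

def sieve_gfn(n, b_max, p_max):
    """Return the even b in [2, b_max] for which F_{b,n} has no prime
    factor p < p_max other than F_{b,n} itself."""
    exponent = 1 << n          # 2^n
    step = 1 << (n + 1)        # candidate factors are p = k * 2^(n+1) + 1
    survivors = set(range(2, b_max + 1, 2))   # odd b give even F_{b,n}
    p = step + 1
    while p < p_max:
        if is_prime(p):
            for b in list(survivors):
                # p divides F_{b,n} exactly when b^(2^n) ≡ -1 (mod p).
                # The second check keeps b when F_{b,n} equals p (toy sizes only).
                if pow(b, exponent, p) == p - 1 and b ** exponent + 1 != p:
                    survivors.discard(b)
        p += step
    return sorted(survivors)

print(sieve_gfn(1, 20, 50))   # [2, 4, 6, 10, 14, 16, 20]
```

A production sieve instead solves for the roots of $b^{2^n} \equiv -1 \pmod p$ directly, so that the work per prime does not depend on the number of surviving candidates.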
3 Distribution of Large GFN Primes
To date, PrimeGrid is actively searching 15 ≤ n ≤ 22, with the exception of
n = 17 which is reserved by independent searchers. The n = 21 case is still in
the process of sieving, but good progress has been made in primality testing
the other n, which we summarise in Table 2. Note that for n = 15, 16, 17 the
largest known GFN prime is significantly beyond the current b limit reported. This
reflects the fact that while every b below the reported values is known to have
been tested, individual searchers have tested small ranges far in advance of the
current organised search limit.
In their 2002 paper [5] Gallot and Dubner presented a method for calculating
the expected number of GFN primes for each n up to a particular limit of b.
They showed excellent agreement between the predicted and the actual numbers of primes found for n ≤ 12, b ≤ $10^6$ and n = 13, 14, b ≤ $10^4$ based on the
then current search limits. We have calculated the expected numbers of GFN
primes for each n up to our new search limits using Gallot’s method and compared with the actual numbers of primes found to date in Table 3. For ease of
comparison with Gallot and Dubner’s tables, we also report the difference between estimated and actual numbers of primes in terms of standard deviations.
In addition to PrimeGrid’s database, the Largest Known Primes Database [2]
was used to provide data for smaller b and n values.
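The ‘Err.’ values in Table 3 can be reproduced approximately with the short sketch below. It assumes that the number of primes found behaves like a Poisson variable, so that the standard deviation is the square root of the estimate; this is our reading of the tabulated values rather than a formula stated here. Because the estimates in the table are rounded to integers, rows with very small estimates will not reproduce exactly.

```python
import math

def err_in_std_devs(estimated, actual):
    """Difference between actual and estimated counts, in units of
    sqrt(estimated), the standard deviation of a Poisson variable."""
    return (actual - estimated) / math.sqrt(estimated)

# Search-limit figures for the first and fourth rows of Table 3.
print(round(err_in_std_devs(764, 730), 1))   # -1.2
print(round(err_in_std_devs(35, 38), 1))     # 0.5
```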
We observe that while most of the findings are broadly in line with the
predicted values (indeed, over 50% of the errors are less than one standard
deviation), there appear to be significant excesses of GFN primes for n = 18, 19,
Table 3. Comparison of predicted and actual number of GFN primes for 13 ≤ n ≤ 22
up to current search limits

            b ≤ $10^5$          b ≤ $10^6$          Search Limit
$2^n$       Est.  Act.  Err.    Est.  Act.  Err.    b            Est.  Act.  Err.
8192        10    3     -2.2    81    74    -0.8    13,000,000   764   730   -1.2
16384       5     1     -1.7    38    33    -0.9    4,560,000    156   137   -1.5
32768       2     1     -0.5    14    16    0.6     6,961,000    84    91    0.8
65536       2     1     -0.5    13    14    0.2     3,196,000    35    38    0.5
131072      1     1     0.2     7     5     -0.6    1,166,000    8     7     -0.4
262144      0     2     2.2     4     7     1.5     1,024,000    4     7     1.5
524288      0     1     1.6     2     -     -       750,000      2     4     2.0
1048576     0     -     -       1     -     -       201,460      0     0     0.0
...
4194304     0     -     -       0     -     -       10,428       0     0     0.0
particularly for small b. Unfortunately, with the current b limits, the number
of primes is too low to assess the probability that the predicted distribution of
primes is correct via the Chi Squared Test. Nevertheless, it is still possible to
check the validity of the prediction, since if Gallot’s expression for the number of
GFN primes for given b, n was too small then we should see that more candidates
remain after sieving than expected.
Dubner and Keller [6] showed that a given prime $p = k \cdot 2^{n+1} + 1$ divides
$F_{b,n}$ with probability $2^n/p$ (averaged over all b). Thus if we sieve R GFNs with
all potential divisors $p < p_{max}$, the number of expected candidates is
\[
\prod_{\substack{p < p_{max} \\ p \equiv 1 \bmod 2^{n+1}}} \left(1 - \frac{2^n}{p}\right) \cdot R \tag{1}
\]
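Expression (1) can be evaluated directly for a toy case and compared with an actual count. In the sketch below (function names are ours), taking n = 1 and sieve primes below 20 gives the primes 5, 13 and 17; sieving the 1105 even values b = 2, 4, ..., 2210 covers every residue class modulo 5 · 13 · 17 = 1105 exactly once, so in this special case the expected and actual counts agree exactly. Note that the count here treats $F_{b,n} = p$ as divisible by p, matching the probability model rather than a practical sieve.

```python
def is_prime(m):
    """Trial-division primality check for small m."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True

def sieve_primes(n, p_max):
    """Primes p < p_max with p ≡ 1 (mod 2^(n+1))."""
    step = 1 << (n + 1)
    return [p for p in range(step + 1, p_max, step) if is_prime(p)]

def expected_remaining(n, R, p_max):
    """Evaluate expression (1): R times the product of (1 - 2^n / p)."""
    fraction = 1.0
    for p in sieve_primes(n, p_max):
        fraction *= 1.0 - (2 ** n) / p
    return fraction * R

def count_remaining(n, b_values, p_max):
    """Count the b for which no sieve prime divides F_{b,n}."""
    primes = sieve_primes(n, p_max)
    exponent = 2 ** n
    return sum(1 for b in b_values
               if all(pow(b, exponent, p) != p - 1 for p in primes))

print(expected_remaining(1, 1105, 20))
print(count_remaining(1, range(2, 2211, 2), 20))   # 495
```

For realistic n and $p_{max}$ the range of b never covers a full period, so the agreement is only statistical, which is what Table 4 tests.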
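Applying Mertens’ 3rd theorem [7] we have
\[
\prod_{\substack{p < p_{max} \\ p \equiv 1 \bmod 2^{n+1}}} \left(1 - \frac{2^n}{p}\right) = \frac{2 C_n}{e^{\gamma} \log(p_{max})} \tag{2}
\]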
where
\[
C_n = \prod_{p \neq 2} \frac{1 - a_n(p)/p}{1 - 1/p}, \qquad
a_n(p) = \begin{cases} 2^n & \text{if } p \equiv 1 \bmod 2^{n+1}, \\ 0 & \text{otherwise.} \end{cases} \tag{3}
\]
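One way to see where the factor $2 C_n / e^{\gamma}$ comes from is sketched below. This is our own outline rather than a derivation given by the authors: it rewrites the restricted product using $a_n(p)$, multiplies and divides by $(1 - 1/p)$, and applies Mertens' theorem $\prod_{p < x} (1 - 1/p) \sim e^{-\gamma} / \log x$ with the factor for p = 2 (which equals 1/2) removed.

```latex
\begin{align*}
\prod_{\substack{p < p_{max} \\ p \equiv 1 \bmod 2^{n+1}}} \left(1 - \frac{2^n}{p}\right)
  &= \prod_{2 < p < p_{max}} \left(1 - \frac{a_n(p)}{p}\right) \\
  &= \prod_{2 < p < p_{max}} \frac{1 - a_n(p)/p}{1 - 1/p}
     \cdot \prod_{2 < p < p_{max}} \left(1 - \frac{1}{p}\right) \\
  &\approx C_n \cdot \frac{2 e^{-\gamma}}{\log(p_{max})}
\end{align*}
```

The approximation in the last line treats the truncated product over $p < p_{max}$ as equal to its limit $C_n$, which is reasonable for the sieve depths considered here.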
So sieving the GFNs $F_{b,n}$, $b \in [2, B_{max}]$ we expect the number of candidates
remaining to be
\[
e^{-\gamma} C_n B_{max} / \log(p_{max}) \tag{4}
\]
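The step from (1) and (2) to (4) can be made explicit under one assumption, which is ours: only even b are sieved, since odd b give even $F_{b,n}$, so that $R \approx B_{max}/2$. The factor of 2 in (2) then cancels:

```latex
\[
R \cdot \frac{2 C_n}{e^{\gamma} \log(p_{max})}
  \approx \frac{B_{max}}{2} \cdot \frac{2 C_n}{e^{\gamma} \log(p_{max})}
  = \frac{e^{-\gamma} C_n B_{max}}{\log(p_{max})}
\]
```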
As shown in Table 4, we find excellent agreement between the expected and
actual number of candidates remaining after sieving. As a result, we assert that
the excess of primes for n = 18, 19 is no more than a statistical anomaly. Further
Table 4. Expected and actual candidates remaining after sieving to a depth of $p_{max}$

                              Candidates remaining
n    $p_{max}$                Expected      Actual
18   $2.510 \cdot 10^{18}$    17,228,044    17,300,322
19   $1.855 \cdot 10^{19}$    16,577,985    16,546,522
20   $1.985 \cdot 10^{19}$    18,321,722    18,342,741
21   $1.935 \cdot 10^{19}$    20,355,000    20,378,158
22   $2.120 \cdot 10^{19}$    21,953,527    21,952,320
Table 5. GFN mega-primes found by PrimeGrid

GFN                           Digits      Finder             Date       Software
$475856^{524288} + 1$         2,976,633   Masashi Kumagai    Aug 2012   GeneferCUDA
$356926^{524288} + 1$         2,911,151   Tim McArdle        Jul 2012   Genefx64
$341112^{524288} + 1$         2,900,832   Peyton Hayslette   Jun 2012   GeneferCUDA
$75898^{524288} + 1$          2,558,647   Michael Goetz      Nov 2011   GeneferCUDA
$773620^{262144} + 1$         1,543,643   Senji Yamashita    Apr 2012   GeneferCUDA
$676754^{262144} + 1$         1,528,413   Carlos Loureiro    Feb 2012   GeneferCUDA
$525094^{262144} + 1$         1,499,526   David Tomecko      Jan 2012   GeneferCUDA
$361658^{262144} + 1$         1,457,075   Michel Johnson     Nov 2011   GeneferCUDA
$145310^{262144} + 1$         1,353,265   Ricky L Hubbard    Feb 2011   Genefx64
$40734^{262144} + 1$          1,208,473   Senji Yamashita    Mar 2011   Genefx64
$9 \cdot 2^{3497442} + 1$*    1,052,836   Heinz Ming         Oct 2012   LLR
$81 \cdot 2^{3352924} + 1$*   1,009,333   Michal Gasewicz    Jan 2012   LLR
searching at these n, as well as n = 20, 21, 22 for which we currently have little
data, will be needed to confirm or refute this.
4 GFN Mega-Primes
As a result of the aforementioned extensions to the ‘Genefer’ program and wide
participation in the search since it was made available through the BOINC platform, we have made rapid progress to high b values, particularly for n ≥ 18 where
the CUDA implementation has been used. Consequently we have discovered a
number of GFN mega-primes (primes with over 1 million decimal digits), and
they are listed in Table 5. Note that the two primes marked with an asterisk
were found by PrimeGrid’s Proth prime search, rather than the GFN search,
but since they can be expressed as GFNs with n = 1 they are included for completeness. Prior to our search efforts, only one GFN mega-prime was known: $24518^{262144} + 1$, with 1,150,678 digits, found in March 2008 by Stephen Scott,
searching independently.
5 Continuing the Search
The results reported above are only a snapshot in time from an ongoing, popular
prime search project. We intend to continue the search for large GFN primes
for all n ≥ 15, including n = 21 which is currently unsearched. Of particular
interest to many participants is the search at n = 22, where the current GFNs
being tested have decimal lengths of over 17.1 million digits, close to the size
of the largest known prime $2^{57885161} - 1$ (17.4 million digits). The b limit of
‘GeneferCUDA’ for n = 22 corresponds to GFNs of 23 million digits, meaning
that a prime found during this search has a chance of becoming the largest known
prime of any kind, a position that has been held solely by Mersenne primes since
the discovery of $M_{756839}$ in 1992.
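The comparison with the largest known prime can be made concrete by asking how large b must be at n = 22 before $F_{b,22}$ has more decimal digits than $2^{57885161} - 1$. The sketch below is ours and compares digit counts only, which is a slightly coarser criterion than comparing the numbers themselves; the function names are hypothetical.

```python
import math

def mersenne_digits(p):
    """Decimal digits of 2^p - 1 (the same as the digits of 2^p)."""
    return math.floor(p * math.log10(2)) + 1

def gfn_digits(b, n):
    """Estimated decimal digits of F_{b,n} = b^(2^n) + 1 for even b."""
    return math.floor((2 ** n) * math.log10(b)) + 1

def smallest_even_b_exceeding(n, target_digits):
    """Smallest even b for which F_{b,n} has more than target_digits digits."""
    b = int(10 ** (target_digits / 2 ** n))   # floating-point starting guess
    b = max(2, b - b % 2)
    while gfn_digits(b, n) <= target_digits:
        b += 2
    while b > 2 and gfn_digits(b - 2, n) > target_digits:
        b -= 2
    return b

record = mersenne_digits(57885161)
print(record)                                  # 17425170
print(smallest_even_b_exceeding(22, record))
```

Any b above this threshold but below the ‘GeneferCUDA’ limit for n = 22 would, if $F_{b,22}$ were prime, yield a number longer than the current record.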
In order to support the ongoing search we will continue to develop ‘Genefer’
to take advantage of the latest computing hardware. In particular, versions able
to exploit non-Nvidia GPU hardware (for example using
OpenCL) and Intel’s Advanced Vector Extensions (AVX) may prove
invaluable in the search for a new world record GFN prime.
Acknowledgements The first author acknowledges the support of NAIS, the
Centre for Numerical Algorithms and Intelligent Software (EPSRC grant EP/G036136/1).
We also wish to thank several people who have contributed to the GFN prime
search. First, we thank Yves Gallot for popularising the search, developing the
initial Genefer code upon which the entire project is based, and also for useful
discussions concerning the purported excess of primes at large n (see Section 3).
Second, we thank David Underbakke, Mark Rodenkirch, Ken Brazier, Shoichiro
Yamada, Ronald Schneider and Anand Nair, who have all contributed to the
ongoing development of the PRP testing and sieving software. Third, thanks
go to the PrimeGrid team Rytis Slatkevicius, Lennart Vogel and John Blazek
without whom the search would not have reached such a wide audience. Finally,
we are grateful to all the ‘crunchers’ who have dedicated their computer resources
and made possible the ongoing success of the search.
References
1. Anderson, D.: BOINC: A system for public-resource computing and storage. In:
Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing.
pp. 4–10. GRID ’04, IEEE Computer Society, Washington, DC, USA (2004),
http://dx.doi.org/10.1109/GRID.2004.14
2. Caldwell, C.: The prime pages - the largest known primes database.
http://primes.utm.edu
3. Carmody, P.: GFN filters. http://fatphil.org/maths/GFN/maths.html
4. Crandall, R., Fagin, B.: Discrete weighted transforms and large-integer arithmetic.
Math. Comp. 62, 305–324 (1994)
5. Dubner, H., Gallot, Y.: Distribution of generalized Fermat prime numbers. Math.
Comp. 71, 825–832 (2002)
6. Dubner, H., Keller, W.: Factors of generalized Fermat numbers. Math. Comp. 64,
397–405 (1995)
7. Mertens, F.: Ein Beitrag zur analytischen Zahlentheorie. J. reine angew. Math. 78,
46–62 (1874)
8. Nickolls, J., Buck, I., Garland, M., Skadron, K.: Scalable Parallel Programming with CUDA. Queue 6(2), 40–53 (Mar 2008),
http://doi.acm.org/10.1145/1365490.1365500
9. Sierpiński, W.: Sur un problème concernant les nombres $k \cdot 2^n + 1$. Elem. Math.
15, 73–74 (1960)
10. Slatkevicius, R., Vogel, L., Blazek, J.: PrimeGrid website.
http://www.primegrid.com