Generating RSA Primes Jim Townsend CSE633

Final Results
Fall 2010
Importance
• Encryption is harder to secure than ever
• RSA is an important standard in Public Key Encryption
• Developed in 1977, it began with relatively small keys: 128- and 256-bit
• Current standard: 1024-bit keys (about 310 decimal digits)
• Math on these numbers is very CPU intensive
How Keys are Generated
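• Use the Miller-Rabin algorithm
• Tests the candidate against a few specific numbers (bases)
• Only a probabilistic method
• Probability that a single pass catches a composite number: at least .75
• Repeated passes are used to eliminate false positives
• 16 repetitions: false-positive probability of at most (1-.75)^16, about 2.3 × 10^-10
• Runtime: O(ln(N)^4)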
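The Miller-Rabin test described above can be sketched in a few lines of Python. This is a minimal illustration, not the program from these slides: the function name `is_probable_prime` is hypothetical, and picking random bases is an assumption made here (the slides only say the test uses "a specific few numbers").

```python
import random

def is_probable_prime(n, rounds=16, rng=random):
    """Miller-Rabin: False means n is composite, True means n is probably prime."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)   # assumed: random base in [2, n - 2]
        x = pow(a, d, n)              # modular exponentiation
        if x in (1, n - 1):
            continue                  # this base does not expose n as composite
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False              # base a is a witness: n is composite
    return True                       # survived every round
```

Each round costs one modular exponentiation, which is why the work grows quickly with the number of digits in the candidate.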
Sieve of Eratosthenes
• Decided to implement a small sieve on the numbers before
using the Miller-Rabin algorithm
• Using all the prime numbers less than 1000 (168 numbers),
see if any of those evenly divide the number first
• Decreased serial runtime by more than half
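The pre-filter above might look roughly like the following sketch. The names `primes_below`, `SMALL_PRIMES`, and `passes_small_prime_filter` are hypothetical; the only details taken from the slides are the bound of 1000 and the count of 168 primes.

```python
def primes_below(limit):
    """Classic Sieve of Eratosthenes: all primes p < limit."""
    flags = [True] * limit
    flags[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            for multiple in range(p * p, limit, p):
                flags[multiple] = False
    return [i for i, is_p in enumerate(flags) if is_p]

SMALL_PRIMES = primes_below(1000)     # the 168 primes below 1000

def passes_small_prime_filter(n):
    """False if a prime below 1000 divides n, unless n is that prime itself."""
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    return True                       # no small factor: still needs Miller-Rabin
```

A candidate that fails this filter is certainly composite, so the more expensive probabilistic test can be skipped for it.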
Current Program
• The program takes in two strings: a starting value and a range
• Runs a sieve on the range with the first 168 primes
• Tests each remaining number with the Miller-Rabin algorithm, up to 16 times
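As a rough sketch of the first two steps, the function below takes the start value and range as strings and crosses out every multiple of a prime below 1000 inside that window. The name `sieve_window` and its exact behaviour are assumptions for illustration; the survivors it returns are the numbers that would then go on to the Miller-Rabin test.

```python
def sieve_window(start_str, length_str, limit=1000):
    """Return the numbers in [start, start + length) with no prime factor below limit."""
    start, length = int(start_str), int(length_str)

    # Small sieve: primes below `limit` (168 of them for limit = 1000).
    flags = [True] * limit
    flags[0] = flags[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            for m in range(p * p, limit, p):
                flags[m] = False
    primes = [p for p in range(limit) if flags[p]]

    # alive[i] refers to the number start + i.
    alive = [True] * length
    for p in primes:
        # First multiple of p that is >= start, never p itself.
        first = max(2 * p, -(-start // p) * p)
        for m in range(first, start + length, p):
            alive[m - start] = False
    return [start + i for i in range(length) if alive[i] and start + i >= 2]
```

For example, `sieve_window(str(10**30), "1000")` leaves only candidates that share no factor with the primes below 1000, which removes most of the window before any modular exponentiation is done.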
Serial Results
• Finding small primes was relatively fast
• Found 2263 primes 20 digits long in just 0.68 seconds
• Large numbers are a different story:
• At 310 digits (the current RSA standard), it took 27.01 seconds to find only 118 primes
Parallel Algorithm
• Divided the range among the processors
• Each node checked its set and reported the number of primes
it found
• Final reduction to sum up the count
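The divide, count, and reduce pattern above can be sketched as follows. This is only an illustration of the structure: it uses Python threads as a stand-in for processors (so it will not show real speedup), a simple trial-division `is_prime` as a stand-in for the sieve plus Miller-Rabin, and hypothetical names throughout.

```python
from concurrent.futures import ThreadPoolExecutor

def is_prime(n):
    """Stand-in primality check by trial division (small inputs only)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def split_range(start, length, workers):
    """Cut [start, start + length) into `workers` contiguous, near-equal chunks."""
    base, extra = divmod(length, workers)
    chunks, lo = [], start
    for rank in range(workers):
        size = base + (1 if rank < extra else 0)
        chunks.append((lo, lo + size))
        lo += size
    return chunks

def count_chunk(bounds):
    """Work done by one worker: count the primes in its own chunk."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))

def parallel_count(start, length, workers=4):
    chunks = split_range(start, length, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_counts = list(pool.map(count_chunk, chunks))
    return sum(local_counts)          # final reduction: sum the per-worker counts
```

The only communication is each worker's chunk bounds going out and a single integer coming back.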
Gains
• Saw strong speedup, since the nodes need very little communication
• Most of the real gains came from tweaking the serial algorithm
• Using the sieve and only checking odd numbers
• Load balancing with OpenMP would likely yield further gains
Single Parallel vs Serial Algorithm
[Chart: time (s) against number of decimal digits, 10 to 310, for the Single and Serial runs]
All Parallel Test Runs
[Chart: time (s) against number of decimal digits, one curve each for 8, 16, 24, 32, 40, 48, 56, and 64 cores]
Total Speedup
[Chart: speedup factor against number of cores, 1 to 64, one curve each for 60, 120, 180, 240, and 310 digits]
Efficiency: Ts/(P*Tp)
[Chart: efficiency, on a scale of 0 to 1, against number of cores, 1 to 64, one curve each for 60, 120, 180, 240, and 310 digits]
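The efficiency formula Ts/(P*Tp) and the speedup factor Ts/Tp are simple to compute from measured times. The sketch below uses hypothetical function names, and the timings in the usage note are made-up round numbers for illustration, not measurements from these slides.

```python
def speedup(t_serial, t_parallel):
    """Speedup factor: serial time Ts divided by parallel time Tp."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cores):
    """Efficiency Ts / (P * Tp): speedup divided by the number of cores P."""
    return t_serial / (cores * t_parallel)
```

With an illustrative serial time of 32.0 s and a parallel time of 1.0 s on 64 cores, `speedup(32.0, 1.0)` is 32.0 and `efficiency(32.0, 1.0, 64)` is 0.5, meaning half of the ideal linear speedup.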
Future Work
• Could be improved further by load balancing the tests with OpenMP
• Exit on first failed test
• Much better synchronization would be possible
• The test itself could also be divided into smaller pieces
• Implementation in CUDA using GPGPUs
Any Questions?