Introduction - Montclair State University

GENERATING PRIME NUMBERS FOR
CRYPTOGRAPHY
By Ayesha Mohiuddin and Ramazan Burus
Abstract
The prime numbers used in most cryptographic algorithms are large, often
hundreds of digits long or more. Finding such prime numbers takes a long time,
potentially years, even on modern machines. Grid computing, in which many
processors share the workload, can speed up the search for prime numbers within
very large ranges.
For example, the most basic algorithm for finding prime numbers, which checks
whether any number smaller than the candidate is a divisor of the candidate, is
very slow. However, having many nodes run these calculations on different
subsets of the range speeds up the process considerably, and the speedup grows
as the number of participating nodes increases.
Introduction
Prime numbers, the natural numbers greater than 1 that are divisible only by
themselves and by one, have been the focus of many great mathematicians since
Euclid’s time. Their use in cryptographic algorithms that secure data on the
internet has made prime numbers even more important. Examples of such
cryptographic systems include Diffie-Hellman key exchange, the RSA algorithm,
and PKI.
Prime numbers play an essential role in public key cryptography, but generating
these huge prime numbers is a very time-consuming task for computers. However,
if multiple processors share the workload instead of just one, we can obtain
the results much sooner. Grid computing projects of this kind invite users to
participate by downloading client code that is responsible for the calculation
on one part of the range.
Ideally, whenever the participating user’s computer is idle, this program starts
calculating primes within its allotted range and sends the results back to the
master server.
This is not a new technique. In fact, grid computing is already being used for
many calculation-intensive projects, such as analyzing data obtained from outer
space, creating simulation models, and even finding the largest known prime
numbers.
In our project we wanted to implement a simple prime number generator using
the most basic algorithm, which checks whether a number is divisible by any
number smaller than itself, but running on a grid structure, in order to see
how much the grid speeds up this slow process.
Prime Numbers and Cryptography
Any natural number greater than 1 that is divisible only by 1 and itself is
called a prime number; examples are 2, 3, 5, 7 and 11. These numbers have
exactly two divisors: themselves and 1. The number 1 is a special case which is
considered neither prime nor composite. The nth prime number is commonly
denoted p_n, so p_1 = 2, p_2 = 3, and so on (Weisstein).
In a 1975 lecture, D. Zagier commented "There are two facts about the
distribution of prime numbers of which I hope to convince you so
overwhelmingly that they will be permanently engraved in your hearts. The first
is that, despite their simple definition and role as the building blocks of the
natural numbers, the prime numbers grow like weeds among the natural numbers,
seeming to obey no other law than that of chance, and nobody can predict where
the next one will sprout. The second fact is even more astonishing, for it states
just the opposite: that the prime numbers exhibit stunning regularity, that there
are laws governing their behavior, and that they obey these laws with almost
military precision" (Havil 2003, p. 171).
In today’s world, where people rely heavily on the internet, data security has
become a very important issue, and computer professionals are always searching
for secure methods of online data transfer. Cryptography is used to transfer
information securely and secretly.
There are two main types of cryptography: secret key and public key
cryptography.
Secret key cryptography is a very old method used by the ancient Romans and
Greeks. It requires both parties to agree on, or exchange, a secret key that is
used to encrypt the data transferred between them. The number of users is
usually small, since exchanging the key secretly is a difficult task.
Public key cryptography, first proposed by Diffie and Hellman in 1976, is a
modern way to transfer encrypted data securely even among a large number of
users. The keys come in pairs and no prior exchange of secret keys is required.
Therefore, one key can be exchanged publicly without compromising the other key
of the pair (A. Languasco, A. Perelli). Each participant has two keys, public (E)
and private (D), where the public key (enciphering key) is published to all
users and the private key (deciphering key) is kept secret. It should be
computationally infeasible to derive D from E. The original message (P) can
then be encrypted into the cipher text (C), and recovered from it, by applying
some formula using the keys.
C = S_E(P)
P = H_D(C)
One of the most common public key cryptosystems is RSA (Rivest-Shamir-Adleman)
encryption, introduced in 1978.
RSA uses two keys, say e for encryption and d for decryption, one of which is
kept private. The keys define two complementary functions, say E and D, that
undo each other. The plaintext T is encoded as T^e mod n; recovering T from
this value without knowing d is very difficult. A person who knows d can
decrypt it by computing (T^e)^d mod n = T
(Pfleeger and Pfleeger, p. 75).
To construct n, we choose two large prime numbers p and q (typically 256 bits
each) and keep them secret, so that n = p·q and φ(n) = (p-1)(q-1).
Then e and d are chosen such that e·d ≡ 1 (mod φ(n)).
The public key is <e, n> and the private key is <d, n>.
Since n is the product of p and q, two very large prime numbers, it is hard to
find these factors, and this is what gives the algorithm its security.
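The key construction above can be traced with a toy example. The following is a minimal sketch only: the class name RsaToyExample and the values p = 61, q = 53, e = 17 are our own illustrative assumptions, and primes this small offer no security at all.

```java
import java.math.BigInteger;

// Toy RSA walk-through. The primes are deliberately tiny and purely
// illustrative (an assumption of this sketch); real keys use very large primes.
class RsaToyExample {

    // d is the inverse of e modulo phi(n) = (p-1)(q-1), i.e. e*d = 1 mod phi(n)
    static BigInteger privateExponent(BigInteger p, BigInteger q, BigInteger e) {
        BigInteger phi = p.subtract(BigInteger.ONE)
                          .multiply(q.subtract(BigInteger.ONE));
        return e.modInverse(phi);
    }

    // Encrypt: C = T^e mod n
    static BigInteger encrypt(BigInteger t, BigInteger e, BigInteger n) {
        return t.modPow(e, n);
    }

    // Decrypt: T = C^d mod n
    static BigInteger decrypt(BigInteger c, BigInteger d, BigInteger n) {
        return c.modPow(d, n);
    }

    public static void main(String[] args) {
        BigInteger p = BigInteger.valueOf(61);
        BigInteger q = BigInteger.valueOf(53);
        BigInteger n = p.multiply(q);              // n = 3233
        BigInteger e = BigInteger.valueOf(17);     // public exponent, coprime to phi(n)
        BigInteger d = privateExponent(p, q, e);   // d = 2753
        BigInteger t = BigInteger.valueOf(65);     // plaintext
        BigInteger c = encrypt(t, e, n);           // cipher text
        System.out.println("n=" + n + " d=" + d + " c=" + c
                + " decrypted=" + decrypt(c, d, n));
    }
}
```

Anyone who could factor n = 3233 back into 61 and 53 could recompute d in the same way, which is why p and q must be large enough to make factoring infeasible.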
Cryptographic methods other than RSA, such as the Diffie-Hellman algorithm,
also rely on prime numbers. Therefore, generating such huge prime numbers is an
important issue for cryptographers.
Finding Primes
The task of proving that a particular number is prime has taken on practical
importance as the use of public-key cryptography has become widespread. As
numbers get larger and larger, primes become less frequent.
Since a prime number has only two divisors, itself and 1, one very basic way of
checking whether a number n is prime is to see whether any number smaller than
n (other than 1) divides it exactly. If such a divisor is found, then n is not
prime. This method takes a long time to compute when n becomes very large.
“There is no known method for rapidly and conclusively testing a given
number for primality. Until just recently, the algorithms available,
particularly those that could be executed in a reasonable amount of time,
could only conclusively exclude a number i.e. prove it is composite, or show
that a given integer might be a prime” (Crow).
Some of the methods to check for primality are mentioned below.
Fermat’s little theorem provides a fast method for proving that a number p is
not prime. For any integer m and a possibly prime number p, if
m^p mod p ≠ m mod p
then p is not prime. If the remainders are equal, p may be prime.
By repeatedly testing p with different values of m, we can increase our
confidence that p is prime. For most numbers, enough passed tests bring this
probability to almost 100%, although rare composites known as Carmichael
numbers pass the test for every m.
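The Fermat check described above can be sketched in a few lines. This is an illustration under our own assumptions: the class name FermatCheck, the method name mayBePrime, and the idea of passing the bases m as an array are not part of the original project.

```java
import java.math.BigInteger;

// Sketch of the Fermat check: p is certainly composite if m^p mod p != m mod p
// for some base m; otherwise p only "may be prime".
class FermatCheck {

    // Returns false as soon as one base proves p composite,
    // true if p passed the check for every base supplied.
    static boolean mayBePrime(BigInteger p, int[] bases) {
        if (p.compareTo(BigInteger.valueOf(2)) < 0) {
            return false;                       // 0 and 1 are not prime
        }
        for (int b : bases) {
            BigInteger m = BigInteger.valueOf(b);
            if (!m.modPow(p, p).equals(m.mod(p))) {
                return false;                   // m is a witness that p is composite
            }
        }
        return true;                            // no witness found among the bases
    }

    public static void main(String[] args) {
        int[] bases = {2, 3, 5, 7};
        System.out.println(mayBePrime(BigInteger.valueOf(97), bases));
        System.out.println(mayBePrime(BigInteger.valueOf(91), bases));
    }
}
```

A single base can be fooled: 341 = 11 · 31 passes for m = 2 but is caught by m = 3, which is why several bases are tried. The Carmichael number 561 passes for every base, so a true result is never a proof of primality.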
The Sieve of Eratosthenes is another method to test a number for primality.
This method, named for a Greek mathematician of the third century BCE, is based
on trial division. We check whether any prime number up to the square root of n
is a factor of n; that is, if no prime p with 2 ≤ p ≤ √n divides n exactly,
then n is prime. “The sieve is slow and becomes computationally
expensive and time consuming as the magnitude of the number being tested
increases.” (Crow)
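For comparison, the sieve in its classical form produces every prime up to a limit by crossing out multiples, rather than testing one number at a time. The sketch below is a minimal illustration, assuming a modest limit that fits in memory; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Classical Sieve of Eratosthenes: cross out the multiples of each prime
// up to sqrt(limit); whatever is left uncrossed is prime.
class SieveOfEratosthenes {

    // Returns all primes p with 2 <= p <= limit (limit assumed to be modest,
    // since one boolean per number is kept in memory).
    static List<Integer> primesUpTo(int limit) {
        List<Integer> primes = new ArrayList<>();
        if (limit < 2) {
            return primes;
        }
        boolean[] composite = new boolean[limit + 1];
        for (int i = 2; (long) i * i <= limit; i++) {
            if (!composite[i]) {
                // Smaller multiples of i were already crossed out by smaller primes.
                for (long j = (long) i * i; j <= limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        for (int i = 2; i <= limit; i++) {
            if (!composite[i]) {
                primes.add(i);
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(primesUpTo(30));
    }
}
```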
Using Grid Computing
Grid computing uses multiple computers to carry out a very intensive
computation, which makes it a powerful tool. Any application that involves
intensive processing or heavy mathematical computation can use a grid of client
nodes to share the processing and speed up the job considerably. In general,
the more clients there are, the faster the job becomes.
Computers today are much more powerful than before, but they are still not
powerful enough for algorithms whose running time is exponential. Therefore,
instead of using one computer, we can use several that are coordinated and work
together on one project. One such project is SETI@home, launched in 1999, a
grid computing application that divides signals received from space into tiny
segments and sends them to millions of computers worldwide for processing.
Other examples where grid computing can be used include cracking a password,
where a job that takes weeks on one machine could become a matter of minutes on
a grid, and Distributed Denial of Service attacks, in which many coordinated
computers attack a target. Grids can also be used to combat denial of service
(DoS) attacks.
One grid project related to prime numbers is GIMPS (the Great Internet Mersenne
Prime Search). Here grid computing is used to find very large prime numbers and
to find prime factors of large numbers. Many users participate by downloading
the project's code and joining the search for the largest known prime numbers.
The latest discovery by one of its clients was the 41st known Mersenne prime,
2^24,036,583 - 1. The number is nearly a million digits larger than the
project's previous find and is, at the time of writing, the largest known prime
number. These calculations took just over two weeks on a client’s 2.4 GHz
Pentium 4 computer.
(http://www.mersenne.org)
A Simple Grid Project to Find Primes
We can implement the most basic algorithm for generating prime numbers, which
checks whether any number smaller than n is a divisor of n, but run it on a
grid structure in order to generate the primes within large ranges in a much
shorter time.
Here we are not really interested in finding an efficient algorithm that
finds prime numbers very quickly; we are interested in improving the
running time of any algorithm for finding prime numbers. If a more
efficient algorithm is used here, the run time will be shorter still.
For this purpose let’s use the following Java code for checking for a prime
number:
    //------------------------------------------------------------------
    // isPrime
    // Returns true if the given number n is a prime
    // else return false
    //------------------------------------------------------------------
    public static boolean isPrime(long n)
    {
        if (n <= 1)
            return false;
        //Only divisors up to the square root of n need to be checked
        double limit = Math.sqrt(n);
        for (long i = 2; i <= limit; i++)
        {
            if (n % i == 0) return false;
        }
        return true;
    }
Here the input n is generated by a loop that runs from the start of the
assigned range to its end. The project should therefore have the following
characteristics.
Client side program responsibilities:
- Connect to the database through the internet.
- Take a range of numbers to work on, communicate that the range
  has been taken, and start calculating primes within that range.
- Connect to the database again for each prime number found and put
  it into its corresponding table.
- When done, communicate completion of the task and take another
  range for new calculations.
Administration side responsibilities:
- Assign different ranges to different clients and receive results in
  tables.
- Keep track of jobs: if a taken job is not finished by a node within
  a certain time, consider the node dead and re-assign the same
  range to another client node. The new node should start
  from where the old one left off.
Therefore, we are going to have the following three components in this simple
project:
Client
- Is allotted a unique id.
- Gets the range of numbers within which it will generate the primes.
Master
- Keeps monitoring the activity.
- Re-assigns a range to another client if the original client does not
  complete it within its allotted time (the limit is 1 day for our experiment).
Database
- Stores the client information and the resulting prime numbers sent by the
  clients. We are going to use an Oracle database in this example.
The grid structure that we are using is shown in the following diagram, which
has multiple client nodes, a master, and a database to store the resulting primes.
[Figure: Grid Architecture. Client 1, Client 2 and Client 3, the Master, and the Oracle Database.]
The range of numbers that we assign to each incoming client node is not of a
fixed size. As the numbers increase, the calculation takes longer, since each
candidate is checked for divisibility by smaller numbers: the larger the
number, the more divisors there are to check. Considering this, we make the
ranges progressively smaller as the numbers increase.
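One way such shrinking ranges could be computed is sketched below. This is not the rule used in our experiment; it is a hypothetical scheme that assumes the cost of testing n grows like √n (as in the isPrime method), so that the total cost of all numbers up to n grows like n^1.5, and it cuts the interval into pieces of roughly equal estimated cost. The class name RangePlanner and its parameters are illustrative assumptions.

```java
// Hypothetical range planner: splits [lo, hi] into k contiguous sub-ranges of
// roughly equal estimated work, ASSUMING the cost of testing n is about sqrt(n).
// It also assumes hi - lo is much larger than k, so no sub-range comes out empty.
class RangePlanner {

    // Returns k rows of {start, end}, both inclusive.
    static long[][] plan(long lo, long hi, int k) {
        long[][] ranges = new long[k][2];
        double workLo = Math.pow(lo, 1.5);       // estimated work below lo
        double workHi = Math.pow(hi + 1, 1.5);   // estimated work up to hi
        long start = lo;
        for (int i = 1; i <= k; i++) {
            // Boundary where a fraction i/k of the estimated work has been done.
            long next = (i == k)
                    ? hi + 1
                    : Math.round(Math.pow(workLo + (workHi - workLo) * i / k, 2.0 / 3.0));
            ranges[i - 1][0] = start;
            ranges[i - 1][1] = next - 1;
            start = next;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : plan(0L, 29_999_999L, 6)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

With this estimate the first sub-range is the widest and each later one is narrower, which matches the intent described above. A different cost model would move the boundaries.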
Now consider the tables that we are going to use in the database. We decided on
two simple tables: one called clients, for keeping track of the clients and
their ranges, and another called primeResult, for storing the resulting prime
numbers.
The table clients has the following structure:
- ClientId (unique primary key; stores the id of each client node)
- StartRange (starting number of the range for this client)
- EndRange (ending number of the range for this client)
- LastPrime (last prime found by this client so far; when the range is
  reassigned to another node, this is where the new node will start its
  calculation)
- TakenFlag (0 means this range has not been assigned, 1 means it has
  been assigned, 2 means it has been completed)
- StartTime (stores the time when this client started working on this range)
An example row from this table is as follows:
ClientId  StartRange  EndRange  LastPrime  TakenFlag  StartTime
--------  ----------  --------  ---------  ---------  ---------
12        833333      1666665   1666657    2          25-NOV-04
The other table, primeResult, simply stores the client id (CID) and a prime
number found by that client (Primes). Here is a snapshot of a row from this table.
CID  Primes
---  ------
1    7
[Figure: ER-Diagram for database tables. Clients (ClientId, StartRange, EndRange, LastPrime, TakenFlag, StartTime); PrimeResult (CID, Primes).]
The client code that we implemented uses Oracle's JDBC driver (ojdbc) to connect
to the Oracle database, both to get its ranges and to store the results. Of
course, TCP or UDP packets could also have been used for transferring data.
    //Load the Oracle JDBC driver
    Class.forName("oracle.jdbc.driver.OracleDriver");
    //Attempt to connect to the database
    Connection con = DriverManager.getConnection(URL, "Username", "password");
    //Create an updatable Statement object for submitting
    //SQL statements to the driver
    Statement stmt = con.createStatement(
        ResultSet.TYPE_SCROLL_SENSITIVE,
        ResultSet.CONCUR_UPDATABLE);
The code is very simple, with just one class that connects to Oracle and reads
a row in the table clients where the takenFlag is 0 (not taken by any other
node). Once it has its range, it updates the flag to 1 and starts the
calculation on its assigned range.
    //Get rows with non-assigned ranges
    query = "SELECT * FROM clients WHERE takenFlag = 0";
    result = stmt.executeQuery(query);
    if (result.next())
    {
        clientId = result.getString(1);
        cid = Integer.parseInt(clientId);
        startRange = result.getString(2);
        start = Integer.parseInt(startRange);
        endRange = result.getString(3);
        end = Integer.parseInt(endRange);
        lastPrime = result.getString(4);
        lprime = Integer.parseInt(lastPrime);
        takenFlag = result.getString(5);
        flag = Integer.parseInt(takenFlag);
    }
    //Set flag as taken (1) and enter current system time
    //as start time for this client into the database.
    query = "UPDATE clients SET takenFlag = 1, "
          + "starttime = (select sysdate from dual) WHERE "
          + "clientId=" + clientId;
    updateResult = stmt.executeUpdate(query);
    //Close the statement
    stmt.close();
    //Close the connection
    con.close();
    //Start from last prime that was calculated for this range
    if (lprime == 0)
    {
        startcal = start;
    } else {
        startcal = lprime + 1;
    }
    //Finding primes within the given range
    for (long i = startcal; i <= end; i++) {
        if (isPrime(i)) {
            //Connect to the database again
            con = DriverManager.getConnection(URL, "Username", "password");
            stmt = con.createStatement();
            //Store the prime in this client's result table
            query = "INSERT INTO primeResult_" + clientId
                  + " VALUES (" + clientId + "," + i + ")";
            updateResult = stmt.executeUpdate(query);
            //Update last found prime in clients
            query = "UPDATE clients SET lastprime = " + i
                  + " WHERE clientId=" + clientId;
            updateResult = stmt.executeUpdate(query);
            stmt.close();
            con.close();
        }
    }
    //When range is done, connect once more and
    //update flag to 2 (completed)
    con = DriverManager.getConnection(URL, "Username", "password");
    stmt = con.createStatement();
    query = "UPDATE clients SET takenFlag = 2 WHERE "
          + "clientId=" + clientId;
    updateResult = stmt.executeUpdate(query);
    stmt.close();
    con.close();
When this code is run from many different computers, each client connects to
the database to get its range and then connects again to store its results.
Because of the way this has been implemented, the same node, when done with its
range, can come back and become a new client to work on a new range of numbers.
The master code, which checks whether any node has run out of its allotted
time, runs on the same server that contains the database.
This master program keeps checking all the rows with TakenFlag = 1 to see
whether their StartTime is more than a day old. If it is, the master resets
the TakenFlag back to 0 so that another client can pick the range up. When all
the rows containing the sub-ranges that we wanted to calculate are done, the
master updates a single entry called done, in a separate table called permit,
from its initial value 0 to 1. This way all the clients know that the project
is completed and that they can stop.
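The master's bookkeeping can be sketched without a database. The following is an in-memory illustration only: the Job class stands in for a row of the clients table, and the names MasterSketch, releaseStaleJobs and allDone are hypothetical, not taken from our implementation.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of the master's checks. A real master would run the
// equivalent queries against the clients and permit tables.
class MasterSketch {
    static final int FREE = 0, TAKEN = 1, DONE = 2;   // TakenFlag values
    static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;

    // Stand-in for one row of the clients table.
    static class Job {
        int takenFlag;
        long startTimeMs;

        Job(int takenFlag, long startTimeMs) {
            this.takenFlag = takenFlag;
            this.startTimeMs = startTimeMs;
        }
    }

    // Resets every TAKEN job that started more than limitMs ago back to FREE
    // so that another client can pick it up; returns how many were reset.
    static int releaseStaleJobs(List<Job> jobs, long nowMs, long limitMs) {
        int released = 0;
        for (Job job : jobs) {
            if (job.takenFlag == TAKEN && nowMs - job.startTimeMs > limitMs) {
                job.takenFlag = FREE;
                released++;
            }
        }
        return released;
    }

    // True once every job is DONE; this is the point at which the master
    // would set the done entry in the permit table to 1.
    static boolean allDone(List<Job> jobs) {
        for (Job job : jobs) {
            if (job.takenFlag != DONE) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        long now = System.currentTimeMillis();
        jobs.add(new Job(TAKEN, now - 2 * ONE_DAY_MS));   // stale
        jobs.add(new Job(TAKEN, now));                     // still in time
        System.out.println("released: " + releaseStaleJobs(jobs, now, ONE_DAY_MS));
    }
}
```

Because the LastPrime column is left untouched when a job is released, the next client to take the range resumes after the last prime already stored.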
Conclusion
In conclusion, finding prime numbers is still a slow process, but not as slow
as before. Using a single computer with 4 CPUs at 700 MHz to find the prime
numbers between 0 and 30,000,000, we were able to reduce the running time to
17 minutes by using threads, compared to the 2 hours required without threads.
Using the grid structure, the time is reduced considerably further.
We obtained the following results with the simple grid structure described
above. Our client-side executable was about 1.29 MB, its memory usage was
about 10 MB, and its CPU usage was 7 to 10%.
In 12 hours, using only 6 nodes, which is a really small number, we were able
to find the primes in a range extending up to 461 million. This speed can be
increased further by using more nodes, by using a more efficient algorithm for
finding prime numbers, and by using threads when a client has more than one
CPU in his or her system.
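As a rough illustration of the threading idea, a client with several CPUs could spread its range over all of them. The sketch below uses Java's parallel streams instead of hand-written threads, which is our own substitution and not the threaded code measured above; the class name ParallelPrimeCount is hypothetical.

```java
import java.util.stream.LongStream;

// Sketch: count the primes in a range using all available CPUs via a
// parallel stream (a substitution for explicit threads).
class ParallelPrimeCount {

    // Trial division up to the square root of n, written without floating point.
    static boolean isPrime(long n) {
        if (n <= 1) {
            return false;
        }
        for (long i = 2; i <= n / i; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

    // Counts the primes in [start, end], both inclusive.
    static long countPrimes(long start, long end) {
        return LongStream.rangeClosed(start, end)
                .parallel()
                .filter(ParallelPrimeCount::isPrime)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(0, 1_000_000));
    }
}
```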
Many modifications and improvements can be made to this project, for example
using a better algorithm, or using UDP packets to communicate with the master
instead of dealing directly with the database. It would also be better if the
client-side code were wrapped in a screen saver, so that it only executes
when the client's computer is idle and does not obstruct the user's own work.
Bibliography
Crow, Jerry. “Prime Numbers in Public Key Cryptography”, GSEC Practical
Assignment. SANS Institute 2003.
http://www.giac.org/practical/GSEC/Gerald_Crow_GSEC.pdf
GIMPS (The Great Internet Mersenne Prime Search), 2004,
http://www.mersenne.org
Havil, J. “Gamma: Exploring Euler's Constant”. Princeton, NJ: Princeton
University Press, 2003.
A. Languasco, and A. Perelli. “Prime Numbers and Cryptography”. 2003
http://www.math.unipd.it/~languasc/lavoripdf/R8eng.pdf
Lewis, John and Loftus, William. “Java Software Solutions”. 2nd edition,
Addison Wesley Longman, 2001
Pfleeger, Charles and Pfleeger, Shari. “Security in Computing”. Prentice Hall
2003, 3rd Edition
Weisstein, Eric W. "Prime Number." From MathWorld--A Wolfram Web
Resource. http://mathworld.wolfram.com/PrimeNumber.html