GENERATING PRIME NUMBERS FOR CRYPTOGRAPHY
By Ayesha Mohiuddin and Ramazan Burus

Abstract

The prime numbers used in most cryptographic algorithms are large, often hundreds of digits long or longer. Finding such primes with naive methods can take a very long time, even on modern machines. Grid computing, in which many processors share the workload, can speed up the search for primes within very large ranges. For example, the most basic algorithm, which checks whether any number smaller than the candidate divides it, is a very slow way of finding primes. Running these calculations on many nodes, each working on a different subset of the range, speeds the process up considerably, and more so as the number of participating nodes grows.

Introduction

Prime numbers, the natural numbers greater than 1 that are divisible only by themselves and by one, have been the focus of many great mathematicians since Euclid's time. The use of prime numbers in cryptographic algorithms that secure data on the internet has made them even more important. Examples of such algorithms and systems are Diffie-Hellman key exchange, RSA, and PKI. Prime numbers play an essential role in public key cryptography, but generating these huge primes is a very time-consuming task for computers. If we use multiple processors to share the workload instead of just one, we can obtain the results much earlier. Grid computing projects of this kind invite users to participate by downloading client code that performs the calculation on one part of the range. Ideally, whenever the participating user's computer is idle, this program starts calculating primes within its allotted range and sends the results back to the master server.
This is not a new technique; in fact, grid computing is already used for many calculation-intensive projects, such as analyzing data obtained from outer space, creating simulation models, and searching for the largest known prime numbers. In our project we wanted to implement a simple prime number generator using the most basic algorithm, that is, checking whether a number is divisible by any number smaller than itself, but to run it on a grid structure and see how much this speeds up the slow process.

Prime Numbers and Cryptography

Any natural number greater than 1 that is divisible only by 1 and itself is called a prime number; examples are 2, 3, 5, 7 and 11. These numbers have exactly two divisors: themselves and 1. The number 1 is a special case that is considered neither prime nor composite. The nth prime number is commonly denoted $P_n$, so $P_1 = 2$, $P_2 = 3$, and so on (Weisstein).

In a 1975 lecture, D. Zagier commented "There are two facts about the distribution of prime numbers of which I hope to convince you so overwhelmingly that they will be permanently engraved in your hearts. The first is that, despite their simple definition and role as the building blocks of the natural numbers, the prime numbers grow like weeds among the natural numbers, seeming to obey no other law than that of chance, and nobody can predict where the next one will sprout. The second fact is even more astonishing, for it states just the opposite: that the prime numbers exhibit stunning regularity, that there are laws governing their behavior, and that they obey these laws with almost military precision" (Havil 2003, p. 171).

In today's world, where people rely heavily on the internet, data security has become a very important issue, and computer professionals are always searching for secure methods of online data transfer. Cryptography, for example, is used to transfer information securely and secretly.
There are two main types of cryptography: secret key and public key cryptography.

Secret key cryptography is a very old method, used by the ancient Romans and Greeks. It requires both parties to agree on, or exchange, a secret key that is used to encrypt the data transferred between them. The number of users is usually small, since exchanging the key secretly is a difficult task.

Public key cryptography, first proposed by Diffie and Hellman in 1976, is a modern way to transfer encrypted data securely even among a large number of users. The keys come in pairs, and no prior exchange of secret keys is required. One key of a pair can therefore be exchanged publicly without compromising the other (A. Languasco, A. Perelli). Each participant has two keys, public (E) and private (D): the public key (enciphering key) is published to all users and the private key (deciphering key) is kept secret. It should be computationally infeasible to derive D from E. The original message (P) is then encrypted into the cipher text (C), and recovered from it, by applying functions that use these keys:

$C = S_E(P)$
$P = H_D(C)$

One of the most common public key cryptosystems is RSA (Rivest-Shamir-Adleman) encryption, introduced in 1978. RSA uses two keys, say e for encryption and d for decryption, one of which is kept private. The keys come from two complementary functions, say E and D, chosen so that each undoes the other. The plaintext T is encoded as $T^e \bmod n$; because of the modular reduction, recovering T from this value is very difficult. A person who knows d can decrypt it by computing $(T^e)^d \bmod n = T$ (Pfleeger and Pfleeger, p. 75).

To find n, we choose two large prime numbers p and q (typically 256 bits each) and keep them secret, so that $n = p \cdot q$ and $\phi(n) = (p-1)(q-1)$. Then e and d are chosen such that $e \cdot d \equiv 1 \pmod{\phi(n)}$.
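The RSA relations above can be checked with a toy example. The following Java sketch is only an illustration under stated assumptions: the primes p = 61 and q = 53, the exponent e = 17, the message T = 65 and the class name RsaToyExample are all hypothetical choices made here for readability, not values from the project, and real keys use primes hundreds of bits long. It derives d from $e \cdot d \equiv 1 \pmod{\phi(n)}$ and shows that decryption undoes encryption.

```java
import java.math.BigInteger;

// Toy RSA round trip with tiny, insecure parameters (illustrative assumption).
final class RsaToyExample {
    static final BigInteger P = BigInteger.valueOf(61);   // secret prime p
    static final BigInteger Q = BigInteger.valueOf(53);   // secret prime q
    static final BigInteger N = P.multiply(Q);            // n = p * q = 3233
    static final BigInteger PHI =                         // phi(n) = (p-1)(q-1) = 3120
            P.subtract(BigInteger.ONE).multiply(Q.subtract(BigInteger.ONE));
    static final BigInteger E = BigInteger.valueOf(17);   // public exponent, coprime to phi(n)
    static final BigInteger D = E.modInverse(PHI);        // private exponent: e * d = 1 mod phi(n)

    // C = T^e mod n
    static BigInteger encrypt(BigInteger t) {
        return t.modPow(E, N);
    }

    // T = C^d mod n
    static BigInteger decrypt(BigInteger c) {
        return c.modPow(D, N);
    }

    public static void main(String[] args) {
        BigInteger t = BigInteger.valueOf(65);            // plaintext, must be smaller than n
        BigInteger c = encrypt(t);
        System.out.println("n = " + N + ", phi(n) = " + PHI + ", d = " + D);
        System.out.println("T = " + t + " -> C = " + c + " -> T = " + decrypt(c));
    }
}
```

Anyone who could factor n = 3233 back into 61 and 53 could recompute $\phi(n)$ and d in the same way, which is why p and q have to be large in practice.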
So the public key is $\langle e, n \rangle$ and the private key is $\langle d, n \rangle$. Since n is the product of p and q, two very large prime numbers, it is hard to find these factors, which is what gives the algorithm its security. Cryptographic methods other than RSA, such as the Diffie-Hellman algorithm, also rely on prime numbers. Generating such huge primes is therefore an important issue for cryptographers.

Finding Primes

The task of proving that a particular number is prime has taken on practical importance as the use of public key cryptography has become widespread. As numbers get larger, primes become sparser. Since a prime number has only two divisors, itself and 1, one very basic way of checking whether a number n is prime is to see whether any smaller number greater than 1 divides it exactly. If such a divisor is found, n is not prime. This method takes a long time to compute when n becomes very large.

“There is no known method for rapidly and conclusively testing a given number for primality. Until just recently, the algorithms available, particularly those that could be executed in a reasonable amount of time, could only conclusively exclude a number i.e. prove it is composite, or show that a given integer might be a prime” (Crow). Some of the methods used to check for primality are described below.

Fermat's little theorem provides a fast method for proving that a number p is not prime. For any integer m and a possibly prime number p, if $m^p \bmod p \neq m \bmod p$ then p is not prime. If the remainders are equal, p may be prime. By repeatedly testing p with different values of m, we increase the probability that p is prime, and after enough tests this probability becomes almost 100%, although a few rare composites (the Carmichael numbers) pass the test for every m.

The Sieve of Eratosthenes is another method used to test a number for primality. This method, named for a Greek mathematician of the third century BCE, is based on trial and error.
We check whether any prime number up to the square root of the number is a factor of it; that is, if no prime p with $2 \le p \le \sqrt{n}$ divides n exactly, then n is prime. “The sieve is slow and becomes computationally expensive and time consuming as the magnitude of the number being tested increases.” (Crow)

Using Grid Computing

Grid computing, which uses multiple computers to carry out one very intensive computation, is a powerful tool. Any application that involves intensive processing or heavy mathematical computation can be spread over a grid of client nodes that share the processing, which speeds up the job considerably. The more clients there are, the faster it becomes.

Computers today are much more powerful than before, but they are still not powerful enough for algorithms whose running time is exponential. Instead of using one computer, we can therefore use several, coordinated and working together on one project. One such project is SETI@home, launched in 1999. SETI@home is a grid computing application that divides signals received from space into tiny segments and sends them to millions of computers worldwide for processing. Other possible uses of grid computing include cracking a password, where a job that takes weeks on one machine could be reduced to minutes; Distributed Denial of Service attacks, in which many coordinated computers attack together; and, conversely, defending against denial of service (DoS) attacks.

One grid project related to prime numbers is GIMPS (The Great Internet Mersenne Prime Search). Here grid computing is used to find very large prime numbers and to find prime factors of large numbers. Many users participate by downloading the GIMPS code and joining the search for the largest known primes. The latest discovery by one of their clients was the 41st known Mersenne prime, $2^{24{,}036{,}583} - 1$. According to GIMPS, the number is nearly a million digits larger than their previous find and is now the largest known prime number.
These calculations took just over two weeks on a client's 2.4 GHz Pentium 4 computer (http://www.mersenne.org).

A Simple Grid Project to Find Primes

We can implement the most basic algorithm for generating prime numbers, checking whether any number smaller than n is a divisor of n, but run it on a grid structure so that primes within large ranges are generated in much less time. We are not interested here in an efficient algorithm that finds primes very quickly; we are interested in improving the running time of any algorithm for finding primes. If a more efficient algorithm is used, the run time will be shorter still.

For this purpose let us use the following Java code to check whether a number is prime:

    //------------------------------------------------------------------
    // isPrime
    // Returns true if the given number n is a prime
    // else returns false
    //------------------------------------------------------------------
    public static boolean isPrime(long n) {
        if (n <= 1) return false;
        // A composite n always has a divisor no larger than sqrt(n)
        double limit = Math.sqrt(n);
        for (long i = 2; i <= limit; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

Here the input n is generated by a loop that runs from the start of the range to the end of the range. The project should therefore have the following characteristics.

Client side program responsibilities:
- Connect to the database through the internet.
- Take a range of numbers to work on, communicate that the range has been taken, and start calculating primes within that range.
- Connect to the database again for each prime found and put it into its corresponding table.
- When done, communicate completion of the task and take another range for new calculations.

Administration side responsibilities:
- Assign different ranges to different clients and receive results in tables.
- Keep track of jobs: if a taken job is not finished by a node within a certain time, consider the node dead and re-assign the same range to another client node. The new node should start from where the old one left off.

We are therefore going to have the following three components in this simple project:

Client
- Is allotted a unique id.
- Gets the range of numbers within which it will generate primes.

Master
- Keeps monitoring the activity.
- Re-assigns a range to another client if the original client does not complete it within its allotted time (the limit is 1 day in our experiment).

Database
- Stores the client information and the prime numbers sent back by the clients. We use an Oracle database in this example.

The grid structure that we use is shown in the following diagram, with multiple client nodes, a master, and a database that stores the resulting primes.

[Figure: Grid Architecture. Client 1, Client 2 and Client 3, the Master, and the Oracle Database.]

The range of numbers that we assign to each incoming client node is not of a fixed size. As the numbers increase, the calculation takes longer, since we check all the smaller numbers for division into each candidate: the larger the number, the more divisors there are to check. For this reason the ranges get smaller as the numbers increase.

[Figure: sub-ranges of decreasing size; surviving labels: 227 and 0.]

Now consider the tables that we use in the database. We decided on two simple tables: one called clients, for keeping track of the clients and their ranges, and another called primeResult, for storing the resulting prime numbers.
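The ranges tracked in the clients table have to be computed before they can be handed out, and the text says only that they shrink as the numbers grow, not how their sizes were chosen. The following Java sketch shows one possible rule. It is an assumption made here, not the scheme used in the project: it supposes that testing a number n costs about $\sqrt{n}$ divisions, so the total cost up to x grows like $x^{3/2}$, and it places the k-th of K boundaries at $N (k/K)^{2/3}$ so that every sub-range carries roughly the same amount of work. The class name RangePlanner and the method plan are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Splits [0, upper] into sub-ranges that get narrower as the numbers grow.
// Assumption: per-number cost ~ sqrt(n), so equal-work boundaries sit at
// upper * (k / parts)^(2/3). This rule is illustrative, not from the project.
final class RangePlanner {

    // Returns `parts` contiguous {start, end} pairs (both inclusive) covering [0, upper].
    static List<long[]> plan(long upper, int parts) {
        // Requiring upper >= parts^2 keeps every sub-range non-empty after rounding.
        if (parts < 1 || upper < (long) parts * parts) {
            throw new IllegalArgumentException("need parts >= 1 and upper >= parts * parts");
        }
        List<long[]> ranges = new ArrayList<>();
        long start = 0;
        for (int k = 1; k <= parts; k++) {
            long end = (k == parts)
                    ? upper
                    : Math.round(upper * Math.pow((double) k / parts, 2.0 / 3.0));
            ranges.add(new long[] {start, end});
            start = end + 1;  // the next sub-range begins right after this one
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: six sub-ranges for the interval used in the single-machine test
        for (long[] r : plan(30_000_000L, 6)) {
            System.out.println(r[0] + " .. " + r[1] + "  (size " + (r[1] - r[0] + 1) + ")");
        }
    }
}
```

Each pair returned by plan could be stored as one row of the clients table, giving every client a job of comparable duration under the stated cost assumption.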
The table clients has the following structure:
- ClientId (unique primary key; stores the id of each client node)
- StartRange (starting number of the range for this client)
- EndRange (ending number of the range for this client)
- LastPrime (last prime found by this client so far; when the range is re-assigned to another node, this is where the new node starts its calculation)
- TakenFlag (0 means the range has not been assigned, 1 means it has been assigned, 2 means it has been completed)
- StartTime (stores the time when this client started working on the range)

An example row from this table is as follows:

ClientId  StartRange  EndRange  LastPrime  TakenFlag  StartTime
--------  ----------  --------  ---------  ---------  ---------
12        833333      1666665   1666657    2          25-NOV-04

The other table, primeResult, simply stores the client id (CID) and the prime number found by that client (Primes). Here is a snapshot of a row from this table.

CID  Primes
---  ------
1    7

[Figure: ER diagram for the database tables. Clients (ClientId, StartRange, EndRange, LastPrime, TakenFlag, StartTime); PrimeResult (CID, Primes).]

The client code that we implemented uses the Oracle JDBC driver (ojdbc) to connect to the Oracle database, both to get its ranges and to store the results. TCP or UDP packets could of course also have been used to transfer the data.

    // Load the Oracle JDBC driver
    Class.forName("oracle.jdbc.driver.OracleDriver");
    // Attempt to connect to the database
    Connection con = DriverManager.getConnection(URL, "Username", "password");
    // Create an updatable Statement object for submitting
    // SQL statements to the driver
    Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE,
                                         ResultSet.CONCUR_UPDATABLE);

The code is very simple, with just one class that connects to Oracle and reads a row of the table clients in which TakenFlag is 0 (not taken by any other node). Once it has its range, it updates the flag to 1 and starts the calculation on the assigned range.
    // Get a row whose range has not been assigned yet.
    // Note: this SELECT and the UPDATE below are separate statements,
    // so two clients arriving together could read the same row.
    query = "SELECT * FROM clients WHERE takenFlag = 0";
    result = stmt.executeQuery(query);
    if (result.next()) {
        clientId = result.getString(1);
        cid = Integer.parseInt(clientId);
        startRange = result.getString(2);
        start = Long.parseLong(startRange);
        endRange = result.getString(3);
        end = Long.parseLong(endRange);
        lastPrime = result.getString(4);
        lprime = Long.parseLong(lastPrime);
        takenFlag = result.getString(5);
        flag = Integer.parseInt(takenFlag);
    }

    // Set the flag as taken (1) and enter the current system time
    // as the start time for this client in the database.
    query = "UPDATE clients SET takenFlag = 1, starttime = (SELECT sysdate FROM dual)"
          + " WHERE clientId = " + clientId;
    updateResult = stmt.executeUpdate(query);

    // Close the statement and the connection
    stmt.close();
    con.close();

    // Start from the last prime that was calculated for this range
    if (lprime == 0) {
        startcal = start;
    } else {
        startcal = lprime + 1;
    }

    // Find primes within the given range
    for (long i = startcal; i <= end; i++) {
        if (isPrime(i)) {
            // Connect to the database again for each prime found
            con = DriverManager.getConnection(URL, "Username", "password");
            stmt = con.createStatement();
            query = "INSERT INTO primeResult_" + clientId
                  + " VALUES (" + clientId + "," + i + ")";
            updateResult = stmt.executeUpdate(query);
            // Update the last found prime in clients
            query = "UPDATE clients SET lastprime = " + i
                  + " WHERE clientId = " + clientId;
            updateResult = stmt.executeUpdate(query);
            stmt.close();
            con.close();
        }
    }

    // When the range is done, reconnect and update the flag to 2 (completed)
    con = DriverManager.getConnection(URL, "Username", "password");
    stmt = con.createStatement();
    query = "UPDATE clients SET takenFlag = 2 WHERE clientId = " + clientId;
    updateResult = stmt.executeUpdate(query);
    stmt.close();
    con.close();

When this code runs on many different computers, each client connects to the database to get its range and then connects again to store its results. Because of the way this is implemented, the same node, once it has finished its range, can come back as a new client and take a new range of numbers.

The master code, which checks whether any node has run out of its allotted time, runs on the same server that contains the database.
This master program keeps checking all the rows in which TakenFlag = 1 to see whether their StartTime is more than a day old. If it is, the master resets TakenFlag to 0 so that another client can pick the range up. When all the rows, containing all the sub-ranges that we wanted to calculate, are done, the master updates a separate table called permit, which contains a single entry called done, from its initial value 0 to 1. This way all the clients know that the project is complete and that they can stop.

Conclusion

Finding prime numbers is still a slow process, but not as slow as before. By using threads on a single computer with four 700 MHz CPUs, we reduced the running time for finding the primes between 0 and 30,000,000 to 17 minutes, compared with the 2 hours that were required without threads. Using the grid structure reduces the time considerably further.

We obtained the following results with the simple grid structure described above. Our client side executable was about 1.29 MB, its memory usage was about 10 MB, and its CPU usage was 7 to 10%. In 12 hours, using only 6 nodes, which is a really small number, we were able to find the primes up to a maximum of 461 million. This speed can be increased further by using more nodes, by using a more efficient algorithm for finding primes, and by using threads on clients that have more than one CPU.

Many modifications and improvements can be made to this project, for example using a better algorithm, or using UDP packets to communicate with the master instead of dealing directly with the database. It would also be better to wrap the client side code in a screen saver, so that it only starts executing when the client's computer is idle and does not obstruct the user's own work.

Bibliography

Crow, Jerry. “Prime Numbers in Public Key Cryptography”, GSEC Practical Assignment.
SANS Institute, 2003. http://www.giac.org/practical/GSEC/Gerald_Crow_GSEC.pdf

GIMPS (The Great Internet Mersenne Prime Search), 2004. http://www.mersenne.org

Havil, J. “Gamma: Exploring Euler's Constant”. Princeton, NJ: Princeton University Press, 2003.

Languasco, A. and Perelli, A. “Prime Numbers and Cryptography”. 2003. http://www.math.unipd.it/~languasc/lavoripdf/R8eng.pdf

Lewis, John and Loftus, William. “Java Software Solutions”. 2nd edition, Addison Wesley Longman, 2001.

Pfleeger, Charles and Pfleeger, Shari. “Security in Computing”. 3rd edition, Prentice Hall, 2003.

Weisstein, Eric W. “Prime Number.” From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PrimeNumber.html