San Jose State University
Department of Computer Science
CS 265

RSA Timing Attack

Submitted By: Ramya Venkataramu
SID: 004395639
Date Submitted: 03/24/2006
Section: 01

TABLE OF CONTENTS

ABSTRACT
INTRODUCTION
    The RSA Cryptosystem
    Repeated Squaring Algorithm
    The Timing Attack
METHODOLOGY
    Attack #1: Is a practical timing attack against OpenSSL
    Attack #2: Attack using Paul C. Kocher’s Method
IMPLEMENTATION
    Implementation Setup
    Implementation of RSA Cryptosystem
        Demonstration
    Attack #1: Implementation of the Practical Timing Attack over OpenSSL 0.9.7d
        Difficulties Encountered
        Difficulties Solved
        Demonstration
    Attack #2: Implementation of an Attack using Paul C. Kocher’s Method
        Difficulties Encountered
        Difficulties Solved
        Demonstration
CONCLUSION
REFERENCES
APPENDIX
    A-1 Implementation of RSA Cryptosystem
    A-2 Brumley and Boneh’s Approach
    A-3 Kocher’s Method on Repeated Squaring Algorithm

ABSTRACT

It was long believed that the only way to attack the RSA cryptosystem was to solve the “hard” problem of factoring an integer N (the modulus) into its two prime factors (p and q). However, innovative side-channel attacks called timing attacks were able to break the RSA cryptosystem through a different approach. In this project, two timing attacks are studied in detail and implemented in order to gain in-depth knowledge of this area.

INTRODUCTION

The RSA Cryptosystem

The RSA cryptosystem, invented by Rivest, Shamir, and Adleman, is a public-key cryptosystem built on a “one-way” mathematical function and used to securely encrypt and decrypt messages.
Its security is based on the idea that factoring an integer into its prime divisors is a hard problem.

Messages are encrypted using:

    C = M^e mod N

Ciphertexts are decrypted using:

    M = C^d mod N

- M is the message
- C is the ciphertext
- e is the public exponent
- d is the private exponent (the private key)
- N is the modulus, with N = p*q and p > q

Repeated Squaring Algorithm

The exponentiation in encryption and decryption is an expensive operation. The repeated squaring algorithm is an efficient method for computing modular exponentiation.

    x = M
    for j = 1 to n
        x = mod(x^2, m)
        if dj == 1 then
            x = mod(x*M, m)
        endif
    next j
    return x

Figure 1: Repeated Squaring Algorithm, Source: [2]

Figure 1 illustrates the repeated squaring method of performing modular exponentiation. Here d0 d1 ... dn are the bits of the exponent d from most significant (d0 = 1) to least significant, and m is the modulus.

The Timing Attack

The RSA algorithm itself is considered secure. However, there are surprising indirect attacks that can be carried out against an RSA implementation to recover bits of the private key. The timing attack is one such side-channel attack. It relies on the time taken to perform a particular cryptographic operation for a given set of inputs. This timing information can then be used to determine part of the “secret information”.

METHODOLOGY

There are two well-known timing attacks:

- Brumley & Boneh’s attack over OpenSSL [1].
- The timing attack based on Kocher’s idea [2].

This project implements these two attacks.

Attack #1: Is a practical timing attack against OpenSSL 0.9.7d, using the concepts in [1].
Attack #2: Is a timing attack over the repeated squaring algorithm, using ideas in [2].

Before describing the attacks in detail, here are some basic concepts and terms.

OpenSSL

“The OpenSSL Project is a collaborative effort to develop a robust, commercial-grade, full-featured, and Open Source toolkit implementing the Secure Sockets Layer (SSL v2/v3) and Transport Layer Security (TLS v1) protocols as well as a full-strength general purpose cryptography library. The project is managed by a worldwide community of volunteers that use the Internet to communicate, plan, and develop the OpenSSL toolkit and its related documentation.” [3]

To optimize the encryption/decryption process OpenSSL uses:

- Chinese Remainder Theorem
- Sliding Window Exponentiation
- Montgomery Multiplication
- Karatsuba’s Algorithm

Chinese Remainder Theorem (CRT)

CRT is a mathematical technique that can speed up the exponentiation operation. With Chinese Remaindering, the function m = c^d mod N is computed in two steps. First, evaluate m1 = c^d1 mod p and m2 = c^d2 mod q, where p and q are the prime factors of the modulus N, and d1 and d2 are pre-computed from d (d1 = d mod (p-1), d2 = d mod (q-1)). Then m1 and m2 are combined into m using CRT.

Sliding Window Exponentiation

Sliding Window Exponentiation is an optimization of the ‘square and multiply’ method, which performs modular multiplications one exponent bit at a time. Sliding windows require a precomputed multiplication table, which is then used in successive computations. As a result, a block of bits can be processed in each iteration. For a 1024-bit modulus OpenSSL uses a window size of five [1].

Montgomery Reduction

Montgomery reduction is a method of implementing the modular reduction operation using a series of efficient operations. It transforms a reduction modulo q into a reduction modulo some power of 2 (denoted by R). However, in order to use Montgomery reduction, all variables must first be put into Montgomery form. The Montgomery form of a number x is x*R mod q [1]. Since RSA deals with huge numbers, Montgomery reduction speeds up the process, even though there is an initial overhead in putting the numbers into Montgomery form.

Attack #1: Is a practical timing attack against OpenSSL

The attack depends on the time variation of two operations in OpenSSL RSA decryption:

- Schindler’s observation on the number of extra reductions in Montgomery’s multiplication.
- The choice of multiplication routine: Karatsuba vs. normal multiplication.
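To make the Montgomery form described above concrete, the following is a minimal Python sketch of Montgomery reduction for a toy modulus. It is not OpenSSL’s code: the function names (`montgomery_reduce`, `to_montgomery`, `montgomery_multiply`), the modulus q = 53, and R = 2^6 are assumptions chosen purely for illustration. The final conditional subtraction in `montgomery_reduce` is the data-dependent step whose timing Attack #1 relies on.

```python
# Illustrative sketch only: names and toy parameters are hypothetical,
# not taken from OpenSSL.

R_BITS = 6              # R = 2**6 = 64, a power of 2 larger than Q
R = 1 << R_BITS
Q = 53                  # toy odd modulus (assumption for this example)

# -Q^{-1} mod R, precomputed once (pow(x, -1, m) needs Python 3.8+)
Q_NEG_INV = (-pow(Q, -1, R)) % R


def montgomery_reduce(t):
    """Return (t * R^-1 mod Q, extra) for 0 <= t < R*Q.

    `extra` is True when the final conditional subtraction was needed.
    """
    m = ((t & (R - 1)) * Q_NEG_INV) & (R - 1)   # m = t * (-Q^-1) mod R
    u = (t + m * Q) >> R_BITS                   # t + m*Q is divisible by R
    extra = u >= Q
    if extra:                                   # the "extra reduction"
        u -= Q
    return u, extra


def to_montgomery(x):
    """Montgomery form of x, i.e. x*R mod Q."""
    return (x * R) % Q


def montgomery_multiply(a_bar, b_bar):
    """Multiply two values in Montgomery form; result stays in that form."""
    return montgomery_reduce(a_bar * b_bar)


if __name__ == "__main__":
    a, b = 17, 42
    prod_bar, extra = montgomery_multiply(to_montgomery(a), to_montgomery(b))
    result, _ = montgomery_reduce(prod_bar)     # leave Montgomery form
    print(result == (a * b) % Q, "extra reduction:", extra)
```

Because the reduction only uses masking and shifting by R, no division by q is ever performed; the cost is the one-time conversion into Montgomery form.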
Extra reduction in Montgomery reduction

At the end of the Montgomery reduction, a check is made to see whether the output is greater than the modulus q. If so, q is subtracted from the output to ensure that the output lies in the range 0 to q. This step is called the extra reduction. The number of extra reductions causes a timing difference, which helps us deduce how close the input g is to a multiple of one of the factors of N.

Timing in multiplicative methods

OpenSSL uses 2 different multiplication methods:

- Karatsuba/recursive multiplication: for multiplying two numbers with an equal number of words.
- Normal multiplication: for multiplying two numbers with an unequal number of words.

These two multiplication routines reveal some timing information. Karatsuba is faster than normal multiplication. Hence, multiplying numbers with an equal number of words takes less time than multiplying numbers with an unequal number of words.

Attack #2: Attack using Paul C. Kocher’s Method

The actual time (Ti) taken to sign each of a large number of random messages Mi is measured. The attacker can compute (on a machine similar to the system under attack) ti, the time taken to compute Mi * Mi^2 (mod m) for each message. If d1 in the private key is 1, then Mi * Mi^2 (mod m) is performed for each i; otherwise it is not. By looking at the sets Ti and ti and the correlation between them, the attacker can identify the value of bit d1. The attacker can proceed in a similar manner to identify the other bits. To identify the correlation between Ti and ti, the values were normalized and compared.

IMPLEMENTATION

Implementation Setup

All the implementations below were carried out on an IBM Thinkpad T23 notebook running the SuSE Linux 10 operating system.

Implementation of RSA Cryptosystem

A simple RSA cryptosystem is implemented. It performs encryption of a message and decryption of a ciphertext. Both methods use the repeated squaring algorithm for efficiency.

- Encryption: The message from an input file is parsed. Every character is encrypted into a ciphertext, which is stored in an output file.
- Decryption: The input file contains the encrypted message. This message is decrypted and the resulting deciphered text is written to an output file.

This sample implementation can handle key sizes of up to 32 bits.

Demonstration

Refer to Appendix A-1.

Attack #1: Implementation of the Practical Timing Attack over OpenSSL 0.9.7d

OpenSSL 0.9.7d is downloaded from [6] and built to create the executables. The “blinding” feature of OpenSSL is turned off (see Difficulties Solved below).

OpenSSL performs 4 types of operations:

1. Signing (RSA_eay_private_encrypt function)
2. Decryption (RSA_eay_private_decrypt function)
3. Signature Verification (RSA_eay_public_decrypt function)
4. Encryption (RSA_eay_public_encrypt function)

This implementation attacks the signing function (RSA_eay_private_encrypt), which can be found in the file rsa_eay.c in directory /crypto/rsa.

As described in the Methodology section, OpenSSL optimizes the encryption/decryption process using the Chinese Remainder Theorem, Sliding Window Exponentiation, Montgomery Multiplication, and Karatsuba’s Algorithm. In the source code these are found as follows:

- The Chinese Remainder Theorem is implemented in the function RSA_mod_exp, which can be found in the file rsa_eay.c in directory /crypto/rsa. Here the ‘rdtsc’ instruction [7] is used to measure the time the exponentiation BN_mod_exp takes to execute.
- Karatsuba and normal multiplication are executed by the function BN_mul, which is found in the file bn_mul.c in directory /crypto/bn.
- Montgomery reduction is executed by the function BN_mod_exp, which can be found in the file bn_exp.c in directory /crypto/bn.

The BN (big number) data structure is a key data structure in this code. It is needed because the normal C data types ‘int’ and ‘long long’ are only 32 and 64 bits in size respectively (on the Intel-386 architecture). The BN data structure, on the other hand, maintains a pointer to a large data area and can handle numbers of the order of 1024 or more bits.
This is needed since the modulus, the p and q values, and the private key used in practice are of the order of 1024 bits.

Particulars of the Attack:

The attack guesses the value of q (where N = p * q and q < p) one bit at a time, using decryption timing information for chosen inputs. The initial guess for q lies between 2^511 and 2^512. Decryption times for the different possible combinations of the first few bits are measured, and an initial guess is arrived at by finding the peaks in these decryption times.

Now suppose the top i-1 bits of q have already been recovered. Let g have the same top i-1 bits as q, with the remaining bits set to 0. Let ghi be equal to g, but with the ith bit set to 1. This implies

    g < ghi < q   or   g < q < ghi

Let ug = g * R^-1 mod N
Let ughi = ghi * R^-1 mod N

The times to decrypt ug and ughi are measured. Note that ug and ughi are used instead of g and ghi because RSA decryption converts its input to Montgomery form (multiplying it by R) before exponentiation, so the exponentiation actually operates on g and ghi.

The difference between DecryptionTime(ug) and DecryptionTime(ughi) is used to determine bit i of q. If this difference is ‘large’, then bit i of q is 0 and g < q < ghi. If this difference is ‘small’, then bit i of q is 1 and g < ghi < q. The large and small differences are due to the time variations in OpenSSL (extra reductions and the multiplication algorithm used) that were described before.

For any particular bit of q, the number of queries for a guess g is determined by two parameters [1]:

- Neighborhood Size: For every bit of q, measure the decryption time for a neighborhood of values g, g+1, g+2, ..., g+n. [1]
- Sample Size: For each value g+k, sample the decryption time multiple times and compute the median decryption time. This is required to overcome the effects of a multiuser environment. Repeatedly decrypting g+k and using the median value as the effective decryption time is more reliable than measuring it once. [1]

The neighborhood and sample sizes must be large enough to obtain delta values that are a strong indicator of the private key bit. I have chosen a sample size of 7 and a neighborhood size of 3200.

The attack program uses some functions from OpenSSL’s libcrypto library. These functions are part of OpenSSL and were used to handle the big numbers g, ghi, etc. and to do arithmetic on them. The functions used are as follows:

- BN_init: initializes a big number data type.
- BN_bin2bn: converts a binary value to big number form.
- BN_uadd: adds two big numbers and stores the result in a third big number.
- BN_mul: multiplies R^-1 (mod N) with the input ciphertext (the neighbor value in this attack).
- BN_print_fp: prints the input big number data structure to a file.
- BN_clear_bit: sets the given bit to 0.
- BN_set_bit: sets the given bit to 1.

The algorithm to recover the private key bits makes use of the time variances that occur in OpenSSL’s implementation of RSA and is shown in Figure 2:

    Initialize g with the top i-1 bits of q. ghi is made equal to g.
    Determine R^-1 mod N (this can be obtained from the OpenSSL source code and is not a secret).
    while there are more bits to be found
        use BN_set_bit to set bit i of ghi to 1
        T1 = 0, T2 = 0
        for k = 0 to Neighborhood Size
            use BN_add to add k to g
            use BN_mul to multiply g+k by R^-1 mod N
            store the multiply result in ug
            /* determine the time used to decrypt ug */
            for j = 0 to Sample Size
                call the OpenSSL signing/decryption function with ug
                note the difference between the start and end times
            end for
            let t1 be the median of the decryption times of ug over the Sample Size
            /* the same process is repeated with ughi */
            use BN_add to add k to ghi
            use BN_mul to multiply the add result by R^-1 mod N
            store the multiply result in ughi
            for j = 0 to Sample Size
                call the OpenSSL signing/decryption function with ughi
                note the difference between the start and end times
            end for
            let t2 be the median of the decryption times of ughi over the Sample Size
            T1 = T1 + t1, T2 = T2 + t2
        end for /* Neighborhood Size */
        let delta = | T1 - T2 |
        if delta is “large” then
            /* bit i of q is 0 */
            use BN_clear_bit to clear bit i in ghi
        else
            /* bit i of q is 1 */
            use BN_set_bit to set bit i of g
            /* bit i is already set in ghi */
        end if
        move on to the next bit i
    end while

Figure 2: Algorithm to Recover Private Key Bits

The decryption time in the above algorithm is measured as the time taken by the BN_mod_exp function with q in OpenSSL. This is done by changing the OpenSSL code to add a timing measurement around the call to this function in the file crypto/rsa/rsa_eay.c. The time is calculated using the ‘rdtsc’ instruction, which gives a high-resolution cycle count.

The attack takes a considerably long time to run due to the large neighborhood and sample sizes. The attack was run overnight and successfully recovered bits of q, with the first few bits treated as known.

Difficulties Encountered

1. Measuring the decryption times of ug and ughi with a timing function inside the attack program did not yield consistent timing differences between the cases where bit i was 0 and where it was 1.
2. Understanding the OpenSSL code and integrating my attack function with it.
3. Measuring the time of an operation with high resolution.
4. Blinding is used in OpenSSL 0.9.7d, which hides the timing variations that the timing attack relies on.

Difficulties Solved

1. To overcome this issue, the time is recorded inside the OpenSSL code around the BN_mod_exp function.
2. Integration is achieved by launching openssl using the “system” call. Also, the OpenSSL functions for handling big numbers were used in the attack code by linking the libcrypto library.
3. rdtsc was used to get the timestamp counts before and after the BN_mod_exp() function. Care was also taken not to run any other load, such as screensavers, while the attack was running, and the laptop’s power management function, which can affect rdtsc, was turned off while the attack was in progress.
4. The function RSA_blinding_off is used to turn off blinding.

Demonstration

A sample run is demonstrated in Appendix A-2.

Attack #2: Implementation of an Attack using Paul C. Kocher’s Method

The repeated squaring algorithm implemented as part of my RSA implementation was targeted in this attack. A large number of queries are fed to the repeated squaring method, which performs the decryption of each query. For each of these queries, an integer variable keeps a count of the number of modular reductions that occur. After the signing of each random message, the actual count values are recorded in an array. Note that a modular reduction is executed only if the value being reduced is larger than the modulus N. Additionally, in iteration i, if the bit of the private key is 1, then an extra modular reduction is executed. Hence, it can be deduced that a private key bit of value 1 will result in a higher actual count value than a bit value of 0.

    for j = 0 till count
        Generate random message Mj
        Measure the actual decryption time Tj for Mj
    end for
    private_key[MSB] = 1
    for i = 1 till numOfDigits
        private_key[MSB+i] = 1
        for j = 0 till count
            Measure tj, the time taken by the repeated squaring algorithm with the current guess of the private key
            /* this implementation uses the count of modulo operations as an indicator of time */
        end for
        Find the correlation between Tj and tj
        /* tj is normalized to Tj by adding the most common difference between the Tj and tj values */
        if there is a correlation between Tj and tj then
            /* bit i of private key is 1 */
            /* bit i is already set to 1 */
        else
            /* bit i of private key is 0 */
            private_key[MSB+i] = 0
        end if
    end for

Figure 3: Kocher’s Attack Algorithm

This method has a high success rate and worked on different keys. Note that the correlation is found by imitating the plotting of a graph of the actual time and the time obtained when using the guessed bit. The two graphs are normalized to the same level by adding the “most common difference between Tj and tj”. The value of dj is 1 if there is a correlation between Tj and tj. Tj and tj are considered correlated if the normalized tj is never higher than Tj. Using this, successive bits of d can be found.

Difficulties Encountered

1. Determining the correlation between the actual time and the computed time without manually plotting a graph for every iteration.
2. Timing the modular reductions in the repeated squaring algorithm.

Difficulties Solved

1. This was solved by calculating an increment in time and using this value during subsequent calculations. This way, the graphs of the actual and computed times overlap and an actual correlation can be determined.
2. This was solved by keeping a count inside the algorithm that tracks the number of modular reductions performed.

Demonstration

A sample run is demonstrated in Appendix A-3.

CONCLUSION

This project investigates two well-known timing attacks: the Brumley & Boneh attack on OpenSSL and Kocher’s method on the repeated squaring algorithm. Both attacks were successfully implemented.

REFERENCES

[1] David Brumley and Dan Boneh, “Remote Timing Attacks Are Practical”, http://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf
[2] Mark Stamp and Richard M. Low, “Applied Cryptanalysis”
[3] http://www.openssl.org/
[4] Mark Stamp, “Information Security: Principles and Practice”, John Wiley & Sons
[5] Paul C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, http://www.cryptography.com/resources/whitepapers/TimingAttacks.pdf
[6] OpenSSL source code download website, http://rpmfind.net/
[7] TSC description, http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf

APPENDIX

A-1 Implementation of RSA Cryptosystem

Note that all the files mentioned are available in the directory containing the code.

Case (1): Encryption

The file in1.txt contains the string “RSA TIMING PROJECT IS #1!”. This is the message to be encrypted. The encryption is run as follows:

    :~/x_ramya/CS_265/cryptanalysis> ./rsa
    Enter modulus value (N): 3233
    Press 1 to perform encrytion or
    Press 2 to perform decrytion/signing
    1
    Enter Input file name: in1.txt
    Enter output file name: out1.txt
    Enter exponent value (e): 17
    :~/x_ramya/CS_265/cryptanalysis>

To verify the encryption, the output file out1.txt is decrypted as shown below.

Case (2): Decryption

    :~/x_ramya/CS_265/cryptanalysis> ./rsa
    Enter modulus value (N): 3233
    Press 1 to perform encrytion or
    Press 2 to perform decrytion/signing
    2
    Enter Input file name: out1.txt
    Enter output file name: final1.txt
    Please enter private key (d): 2753
    :~/x_ramya/CS_265/cryptanalysis> cat final1.txt
    RSA TIMING PROJECT IS #1!:~/x_ramya/CS_265/cryptanalysis>

A-2 Brumley and Boneh’s Approach

The simulation is run on an IBM T23 Thinkpad. The attack was against the OpenSSL signing routine, where signing was carried out with a 1024-bit key generated by ssh-keygen. A certain number of initial bits of q were used for the initial guess of q. With the attack I was able to successfully recover bits of q, as shown below.

Note that the ‘limit’ value used in the attack program to distinguish ‘small’ from ‘large’ delta values (which in turn leads to a guessed bit value of 1 or 0 respectively) depends on many factors, such as neighborhood size and CPU speed.
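The role of the ‘limit’ value can be illustrated with a short Python sketch that classifies delta values as ‘large’ (bit 0) or ‘small’ (bit 1). The delta values are copied from the sample run in this appendix; the threshold of 2,000,000 cycles and the names `classify_bits` and `bits_to_hex` are assumptions made for this illustration and are not taken from the attack program.

```python
# Illustrative sketch: LIMIT is a hypothetical threshold, not the value
# used in the original attack program.

# Delta values (in rdtsc cycles) for i = 0..11, from the A-2 sample run.
DELTAS = [
    7433964, 523240, 307274, 7878395,
    467383, 162069, 7433387, 151777,
    7119539, 426753, 7758743, 8094405,
]

LIMIT = 2_000_000  # assumed cut-off between "small" and "large" deltas


def classify_bits(deltas, limit):
    """Large delta means bit 0 of q; small delta means bit 1."""
    return [0 if delta > limit else 1 for delta in deltas]


def bits_to_hex(bits):
    """Render a bit list (most significant bit first) as hex digits."""
    value = int("".join(str(b) for b in bits), 2)
    width = (len(bits) + 3) // 4
    return format(value, "0{}X".format(width))


if __name__ == "__main__":
    bits = classify_bits(DELTAS, LIMIT)
    print(bits, bits_to_hex(bits))
```

In this run the two groups of deltas are more than an order of magnitude apart, so any threshold between them gives the same bits; on a noisier machine the choice of limit matters far more.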
The initial hexadecimal digits of q used in the sample run below were CFB4DE0ACABC98616D42EFF… The attack assumed that the initial 16 digits were known. The sample result below shows the next 3 digits, which were recovered correctly, along with the timing differences:

    0110 1101 0100 => 6D4

Note that the attack is not limited to 12 bits; it was able to recover more bits as well. Only 12 bits are shown here due to space restrictions.

    :~/x_ramya/CS_265/cryptanalysis/openssl> make
    gcc -o openssl_attack openssl_attack.c -I /usr/src/packages/BUILD/openssl-0.9.7d/crypto -I /usr/src/packages/BUILD/openssl-0.9.7d/include -L /usr/src/packages/BUILD/openssl-0.9.7d -lcrypto -Wall -g
    :~/x_ramya/CS_265/cryptanalysis/openssl> ./openssl_attack
    i = 0: Time reqd to decrypt ug 12209827175 Time reqd to decrypt ughi 12217261139 delta 7433964 Bit is 0
    i = 1: Time reqd to decrypt ug 12208888409 Time reqd to decrypt ughi 12208365169 delta 523240 Bit is 1
    i = 2: Time reqd to decrypt ug 12210930466 Time reqd to decrypt ughi 12211237740 delta 307274 sum of diff 78664210 Bit is 1
    i = 3: Time reqd to decrypt ug 12207957153 Time reqd to decrypt ughi 12215835548 delta 7878395 Bit is 0
    i = 4: Time reqd to decrypt ug 12210401404 Time reqd to decrypt ughi 12209934021 delta 467383 Bit is 1
    i = 5: Time reqd to decrypt ug 12212973694 Time reqd to decrypt ughi 12213135763 delta 162069 Bit is 1
    i = 6: Time reqd to decrypt ug 12219014351 Time reqd to decrypt ughi 12226447738 delta 7433387 Bit is 0
    i = 7: Time reqd to decrypt ug 12219186192 Time reqd to decrypt ughi 12219337969 delta 151777 Bit is 1
    i = 8: Time reqd to decrypt ug 12213442217 Time reqd to decrypt ughi 12220561756 delta 7119539 Bit is 0
    i = 9: Time reqd to decrypt ug 12210885116 Time reqd to decrypt ughi 12210458363 delta 426753 Bit is 1
    i = 10: Time reqd to decrypt ug 12209842086 Time reqd to decrypt ughi 12217600829 delta 7758743 Bit is 0
    i = 11: Time reqd to decrypt ug 12210432774 Time reqd to decrypt ughi 12218527179 delta 8094405 Bit is 0

A-3 Kocher’s Method on Repeated Squaring Algorithm

The attack is run using known private and public key combinations.

Simulation 1

The private key is recovered except for 2 incorrect bits.

    :~/x_ramya/CS_265/cryptanalysis> ./attack
    Enter modulus (N): 3233
    Enter private key (d): 2753
    The binary equivalent of the private key is: 101011000001
    ********After Performing the Kocher attack***********
    The private key is determined to be (in binary) : 101011100000
    The private key in decimal is 2784
    :~/x_ramya/CS_265/cryptanalysis>

Simulation 2

The private key is recovered almost completely. The recovered value differs from the true key by 1; only the last three bits are incorrect.

    :~/x_ramya/CS_265/cryptanalysis> ./attack
    Enter modulus (N): 36355783
    Enter private key (d): 24229147
    The binary equivalent of the private key is: 1011100011011010100011011
    ********After Performing the Kocher attack***********
    The private key is determined to be (in binary) : 1011100011011010100011100
    The private key in decimal is 24229148
    :~/x_ramya/CS_265/cryptanalysis>
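The reduction-counting idea behind these simulations can be sketched in Python as follows. This is a reconstruction under stated assumptions, not the project’s code: the function name `mod_exp_counting` is hypothetical, the exponent bits are processed from most significant to least significant as in Figure 1, and a reduction is counted only when the intermediate value is at least the modulus. The key values N = 3233, e = 17, d = 2753 are the ones used in Appendix A-1.

```python
# Illustrative sketch of Figure 1 with a reduction counter; the function
# name and the counting rule are assumptions, not the original program.

def mod_exp_counting(message, exponent, modulus):
    """Compute message**exponent mod modulus by repeated squaring.

    Returns (result, reductions), where `reductions` counts how many
    times an intermediate value actually had to be reduced.
    Requires exponent >= 1.
    """
    bits = bin(exponent)[2:]          # most significant bit first, bits[0] == '1'
    base = message % modulus
    x = base
    reductions = 0
    for bit in bits[1:]:
        x = x * x                     # square on every iteration
        if x >= modulus:
            x %= modulus
            reductions += 1
        if bit == "1":                # extra multiply only for 1 bits
            x = x * base
            if x >= modulus:
                x %= modulus
                reductions += 1
    return x, reductions


if __name__ == "__main__":
    n, e, d = 3233, 17, 2753          # key values from Appendix A-1
    message = 65
    cipher, enc_count = mod_exp_counting(message, e, n)
    plain, dec_count = mod_exp_counting(cipher, d, n)
    print(plain == message, enc_count, dec_count)
```

The reduction count stands in for running time: a key bit of 1 adds a multiplication, and usually a reduction, that a bit of 0 does not, which is the signal the attack correlates against.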