San Jose State University
Department of Computer Science
CS 265
RSA Timing Attack
Submitted By:
Ramya Venkataramu
SID: 004395639
Date Submitted : 03/24/2006
Section: 01
TABLE OF CONTENTS
ABSTRACT
INTRODUCTION
The RSA Cryptosystem
Repeated Squaring Algorithm
The Timing Attack
METHODOLOGY
Attack #1: Is a practical timing attack against OpenSSL
Attack #2: Attack using Paul C. Kocher’s Method
IMPLEMENTATION
Implementation Setup
Implementation of RSA Cryptosystem
Demonstration
Attack #1: Implementation of the Practical Timing Attack over OpenSSL 0.9.7d
Difficulties Encountered
Difficulties Solved
Demonstration
Attack #2: Implementation of an Attack using Paul C. Kocher’s Method
Difficulties Encountered
Difficulties Solved
Demonstration
CONCLUSION
REFERENCES
APPENDIX
A-1 Implementation of RSA Cryptosystem
A-2 Brumley and Boneh’s Approach
A-3 Kocher’s Method on Repeated Squaring Algorithm
ABSTRACT
It was long believed that the only way to attack the RSA cryptosystem was to solve the
“hard” problem of factoring an integer N (the modulus) into its two prime factors
(p and q). However, side-channel attacks known as timing attacks are able to break RSA
implementations through a different approach. In this project, several timing attacks
are studied in detail and implemented in order to gain in-depth knowledge of this area.
INTRODUCTION
The RSA Cryptosystem
The RSA cryptosystem, invented by Rivest, Shamir, and Adleman, is built on a “one-way”
mathematical function and is used to securely encrypt and decrypt messages. Its security
is based on the idea that factoring an integer into its prime divisors is a hard problem.
Messages are encrypted using: C = M^e mod N
Ciphertexts are decrypted using: M = C^d mod N
- M is the message
- C is the ciphertext
- e is the public (encryption) exponent
- d is the private key (decryption exponent)
- N is the modulus and N = p*q, p > q
Repeated Squaring Algorithm
The exponentiation in encryption and decryption is an expensive operation. The repeated
squaring algorithm is an efficient method for computing modular exponentiation. Here the
exponent d has bits d0 d1 ... dn, where d0 = 1 is the most significant bit, and m is the
modulus.
x = M
for j = 1 to n
    x = mod(x^2, m)
    if dj == 1 then
        x = mod(x*M, m)
    endif
next j
return x
Figure 1: Repeated Squaring Algorithm, Source: [2]
Figure 1 illustrates the repeated squaring method of performing modular exponentiation.
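The algorithm in Figure 1 can be sketched in Python as follows. This is only an illustrative sketch, not the project's actual code; the function name mod_exp is made up for this example, and the exponent is assumed to be a positive integer.

```python
def mod_exp(M, d, m):
    """Compute M**d mod m by repeated squaring, scanning d from its top bit.

    Illustrative sketch only; assumes d >= 1.
    """
    bits = bin(d)[2:]          # d0 d1 ... dn, with d0 == '1'
    x = M % m                  # accounts for the leading bit d0 = 1
    for bit in bits[1:]:       # j = 1 .. n
        x = (x * x) % m        # always square
        if bit == "1":
            x = (x * M) % m    # multiply only when dj == 1
    return x


if __name__ == "__main__":
    N, e, d = 3233, 17, 2753   # toy key used in Appendix A-1
    C = mod_exp(65, e, N)
    print(C, mod_exp(C, d, N))
```

The multiplication step runs only for the 1 bits of the exponent, which is the data-dependent behaviour that the timing attacks in this report exploit.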
The Timing Attack
RSA is considered secure against direct mathematical attack. However, there are
surprising indirect attacks that can be carried out against an RSA implementation to
recover bits of the private key. The timing attack is one such side-channel attack. It
relies on the time taken to perform certain cryptographic operations on a set of chosen
inputs. This timing information can then be used to recover part or all of the “secret
information”.
METHODOLOGY
There are two well-known timing attacks:
- Brumley & Boneh’s attack on OpenSSL [1].
- The timing attack based on Kocher’s idea [2].
This project implements these two attacks.
Attack #1: A practical timing attack against OpenSSL 0.9.7d, using the concepts in [1].
Attack #2: A timing attack on the repeated squaring algorithm, using the ideas in [2].
Before describing the attacks in detail, here are some basic concepts and terms.
OpenSSL
“The OpenSSL Project is a collaborative effort to develop a robust, commercial- grade,
full-featured, and Open Source toolkit implementing the Secure Sockets Layer (SSL
v2/v3) and Transport Layer Security (TLS v1) protocols as well as a full-strength general
purpose cryptography library. The project is managed by a worldwide community of
volunteers that use the Internet to communicate, plan, and develop the OpenSSL toolkit
and its related documentation.” [3]
To optimize the encryption/decryption process, OpenSSL uses:
- Chinese Remainder Theorem
- Sliding Window Exponentiation
- Montgomery Multiplication
- Karatsuba’s Algorithm
Chinese Remainder Theorem (CRT)
CRT is a mathematical technique that can speed up the exponentiation operation. With
Chinese remaindering, the function m = c^d mod N is computed in two steps. First,
evaluate m1 = c^d1 mod p and m2 = c^d2 mod q, where d1 = d mod (p-1) and
d2 = d mod (q-1) are precomputed from d, and p and q are the prime factors of the
modulus N. Then m1 and m2 are combined into m using CRT.
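A minimal Python sketch of the two-step computation is given below. It is an illustration, not OpenSSL's code: the function name crt_decrypt is hypothetical, and the recombination step uses Garner's formula, which is one common way to combine m1 and m2.

```python
def crt_decrypt(c, d, p, q):
    """RSA decryption via CRT (illustrative sketch; assumes p and q are prime)."""
    d1 = d % (p - 1)                 # exponent used modulo p
    d2 = d % (q - 1)                 # exponent used modulo q
    m1 = pow(c, d1, p)               # half-size exponentiation mod p
    m2 = pow(c, d2, q)               # half-size exponentiation mod q
    q_inv = pow(q, -1, p)            # q^-1 mod p (needs Python 3.8+)
    h = (q_inv * (m1 - m2)) % p      # Garner's recombination
    return m2 + h * q                # the unique result in [0, p*q)


if __name__ == "__main__":
    # Toy key from Appendix A-1: N = 3233 = 61 * 53, d = 2753
    print(crt_decrypt(pow(65, 17, 3233), 2753, 61, 53))
```

Because the exponentiations are done modulo p and modulo q separately, the time of the exponentiation modulo q depends on q itself, which is why Attack #1 targets q rather than d.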
Sliding Window Exponentiation
Sliding window exponentiation is an optimization of the ‘square and multiply’ method.
Whereas square and multiply processes one bit of the exponent per iteration, sliding
windows process a block of bits per iteration, which reduces the number of modular
multiplications. This requires precomputing a multiplication table that is then used in
successive computations. For a 1024-bit modulus, OpenSSL uses a window size of five [1].
Montgomery Reduction
Montgomery reduction is a method of implementing modular reduction using a series of
efficient operations. It transforms a reduction modulo q into a reduction modulo some
power of 2 (denoted by R), which can be carried out with shifts and masks. However, in
order to use Montgomery reduction, all variables must first be put into Montgomery form.
The Montgomery form of a number x is x*R mod q [1]. Since RSA deals with huge numbers,
Montgomery reduction speeds up the overall computation, even though there is an initial
overhead in converting the numbers to Montgomery form.
Attack #1: Is a practical timing attack against OpenSSL
The attack depends on the timing variation of two operations in OpenSSL RSA decryption:
- Schindler’s observation on the number of extra reductions in Montgomery multiplication.
- The choice of multiplication routine: Karatsuba vs. normal multiplication.
Extra reduction in Montgomery reduction
At the end of the Montgomery reduction, a check is made to see whether the output is
greater than or equal to the modulus q. If so, q is subtracted from the output to ensure
that the output is in the range 0 to q-1. This step is called an extra reduction. The
number of extra reductions causes a timing difference that helps us deduce how close the
input g is to a multiple of one of the factors of N (p or q).
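The extra reduction can be made visible with a small Python sketch of textbook Montgomery reduction (REDC). This is not OpenSSL's implementation; the function names and the toy modulus are assumptions chosen for illustration, and the second return value simply reports whether the final subtraction happened.

```python
def montgomery_setup(n, r_bits):
    """Return (R, n_prime) with R = 2**r_bits and n * n_prime == -1 mod R."""
    R = 1 << r_bits
    n_prime = (-pow(n, -1, R)) % R   # requires odd n (Python 3.8+)
    return R, n_prime


def redc(T, n, R, n_prime):
    """Montgomery reduction: return (T * R^-1 mod n, extra_reduction_flag).

    Assumes 0 <= T < n * R, so the intermediate t is below 2n.
    """
    m = ((T % R) * n_prime) % R      # reductions mod R are just bit masks
    t = (T + m * n) // R             # exact division: T + m*n is 0 mod R
    if t >= n:                       # the data-dependent "extra reduction"
        return t - n, True
    return t, False


def mont_mul(a_bar, b_bar, n, R, n_prime):
    """Multiply two values that are already in Montgomery form."""
    return redc(a_bar * b_bar, n, R, n_prime)


if __name__ == "__main__":
    n = 53                                   # toy odd modulus
    R, n_prime = montgomery_setup(n, 6)      # R = 64 > n
    a_bar, b_bar = (17 * R) % n, (29 * R) % n
    prod_bar, extra = mont_mul(a_bar, b_bar, n, R, n_prime)
    print(redc(prod_bar, n, R, n_prime)[0], (17 * 29) % n, extra)
```

Counting how often the flag is True over an exponentiation gives a rough stand-in for the timing signal described above.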
Timing in multiplicative methods
OpenSSL uses two different multiplication methods:
- Karatsuba (recursive) multiplication: used for multiplying two numbers with an equal
number of words.
- Normal multiplication: used for multiplying two numbers with an unequal number of
words.
These two multiplication routines also reveal timing information. Karatsuba is faster
than normal multiplication. Hence, multiplying numbers with an equal number of words
takes less time than multiplying numbers with an unequal number of words.
Attack #2: Attack using Paul C. Kocher’s Method
The actual time Ti taken to sign each of a large number of random messages Mi is measured.
The attacker can compute (on a machine similar to the system under attack) ti, the time
taken to compute Mi * Mi^2 (mod m) for each message. If bit d1 of the private key is 1,
then Mi * Mi^2 (mod m) is performed for each i; otherwise it is not.
By looking at the sets Ti and ti and the correlation between them, the attacker can
identify the value of bit d1.
The attacker can proceed in a similar manner to identify the other bits.
To identify the correlation between Ti and ti, the values were normalized and compared.
IMPLEMENTATION
Implementation Setup
All of the implementations below were carried out on an IBM ThinkPad T23 notebook
running the SuSE Linux 10 operating system.
Implementation of RSA Cryptosystem
A simple RSA cryptosystem is implemented. It performs encryption of a message and
decryption of a ciphertext. Both operations use the repeated squaring algorithm for
efficiency.
Encryption - The message is read from an input file. Every character is encrypted into
a ciphertext value, which is stored in an output file.
Decryption - The input file contains the encrypted message. This message is decrypted
and the resulting plaintext is written to an output file.
This sample implementation can handle key sizes of up to 32 bits.
Demonstration
Refer to Appendix A-1.
Attack #1: Implementation of the Practical Timing Attack over OpenSSL 0.9.7d
OpenSSL 0.9.7d is downloaded from [6] and built to create the executables. The
“blinding” function of OpenSSL is turned off using “”
OpenSSL performs four types of RSA operations:
1. Signing (RSA_eay_private_encrypt function)
2. Decryption (RSA_eay_private_decrypt function)
3. Signature Verification (RSA_eay_public_decrypt function)
4. Encryption (RSA_eay_public_encrypt function)
This implementation attacks the signing function (RSA_eay_private_encrypt) which can
be found in the file rsa_eay.c in directory /crypto/rsa.
To optimize the encryption/decryption process, OpenSSL uses:
- Chinese Remainder Theorem
- Sliding Window Exponentiation
- Montgomery Multiplication
- Karatsuba’s Algorithm
The Chinese Remainder Theorem is implemented in the function RSA_mod_exp, which can be
found in the file rsa_eay.c in directory /crypto/rsa. Here the ‘rdtsc’ instruction [7]
is used to measure the time taken by the exponentiation bn_mod_exp.
Karatsuba and normal multiplication are performed by the function BN_mul, which is
found in the file bn_mul.c in directory /crypto/bn.
Montgomery reduction is performed by the function BN_mod_exp, which can be found in
the file bn_exp.c in directory /crypto/bn.
BN, or the big number data structure, is a key data structure used. It is needed because
the built-in C data types such as ‘int’ and ‘long long’ are only 32 and 64 bits wide,
respectively (on the Intel 386 architecture). The BN data structure, on the other hand,
maintains a pointer to an array of words and can handle numbers of the order of 1024
bits or more. This is needed because the modulus, the p and q values, and the private
key used for practical purposes are of the order of 1024 bits.
Particulars of the Attack:
The attack guesses the value of q (where N = p * q and q < p) one bit at a time, using
the decryption timing information for chosen inputs.
For a 1024-bit modulus, q lies between 2^511 and 2^512. Decryption times for the
different possible combinations of the first few bits are measured, and an initial guess
is arrived at by finding the peaks in these decryption times.
Now suppose we have already found the top i-1 bits.
- g has the top i-1 bits of q (assuming these bits are already recovered), and the
remaining bits are 0.
- Let ghi be equal to g, but with the ith bit set to 1. This implies g < ghi < q or
g < q < ghi.
- Let ug = g * R^-1 mod N
- Let ughi = ghi * R^-1 mod N
- The times to decrypt ug and ughi are measured.
- Note that ug and ughi are used instead of g and ghi because RSA decryption converts
its input to Montgomery form (multiplying it by R) before exponentiation, so the
exponentiation will actually operate on g and ghi.
- The difference between DecryptionTime(ug) and DecryptionTime(ughi) is used to
determine bit i of q. If this difference is ‘large’, then bit i of q is 0 and
g < q < ghi. If this difference is ‘small’, then bit i of q is 1 and g < ghi < q. The
large and small differences are due to the time variations in OpenSSL (extra reductions
and the multiplication algorithm used) that were described earlier.
For any particular bit of q, the number of queries for a guess g is determined by two
parameters [1]:
Neighborhood Size – For every bit of q, measure the decryption time for a neighborhood
of values g, g+1, g+2, ..., g+n. [1]
Sample Size – For each value g+k in the neighborhood, sample the decryption time
multiple times and compute the median decryption time. This is required to overcome the
effects of a multi-user environment. Repeatedly decrypting g+k and using the median
value as the effective decryption time is more reliable than measuring it once. [1]
The neighborhood size and sample size must be large enough to obtain delta values that
give a strong indicator of the private key bit. I have chosen a sample size of 7 and a
neighborhood size of 3200.
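The way the two parameters combine can be sketched in Python as below. This is a hedged illustration rather than the attack program itself: the function names are hypothetical, time.perf_counter_ns stands in for the rdtsc instruction used in the project, and summing the per-value medians over the neighborhood is an assumption based on the description in [1].

```python
import statistics
import time


def median_time(operation, arg, sample_size=7):
    """Median wall-clock time (ns) of operation(arg) over sample_size runs."""
    samples = []
    for _ in range(sample_size):
        start = time.perf_counter_ns()
        operation(arg)
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)


def neighborhood_time(operation, g, neighborhood_size, sample_size=7):
    """Sum of the median times for g, g+1, ..., g+neighborhood_size."""
    return sum(
        median_time(operation, g + k, sample_size)
        for k in range(neighborhood_size + 1)
    )


if __name__ == "__main__":
    # Toy stand-in for a decryption call; not an OpenSSL function.
    toy_decrypt = lambda c: pow(c, 2753, 3233)
    print(neighborhood_time(toy_decrypt, 1000, neighborhood_size=10))
```

With a sample size of 7 and a neighborhood size of 3200, each guess costs on the order of 7 * 3200 decryptions, which is consistent with the long running time reported for the attack.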
The attack program uses some functions from OpenSSL’s libcrypto library. These
functions are part of OpenSSL and were used to handle the big numbers g, ghi, etc. and
to do arithmetic on them. The functions used are as follows:
- BN_init - initializes a big number data structure.
- BN_bin2bn - converts a binary value to big number form.
- BN_uadd - adds two big numbers and stores the result in a third big number.
- BN_mul - multiplies R^-1 (mod N) by the input ciphertext (the neighbor value in this
attack).
- BN_print_fp - prints the input big number data structure to a file.
- BN_clear_bit - sets the specified bit to 0.
- BN_set_bit - sets the specified bit to 1.
The algorithm to recover the private key bits (the bits of q) makes use of the time
variances that occur in OpenSSL’s implementation of RSA and is shown in Figure 2:
Initialize g with top i-1 bits of q.
ghi is made equal to g.
Determine R^-1 mod N (R can be obtained by looking at the OpenSSL source code and is
not a secret).
While there are more bits to be found
    BN_set_bit function is used to set bit i of ghi to 1.
    let t1 = 0 and t2 = 0
    for k = 0 to Neighborhood size
        BN_add function is used to add g to k.
        BN_mul is used to multiply g+k with R^-1 mod N
        Store the multiply result in ug
        /* determine the time used to decrypt ug */
        for j = 0 to Sample Size
            Call the OpenSSL signing/decryption function with ug.
            Note the difference in start and end times for ug
        end for
        add the median of the decryption times of ug over the Sample Size to t1.
        /* the same process is repeated with ughi */
        BN_add function is used to add ghi to k.
        Use BN_mul to multiply the add result with R^-1 mod N
        Store the multiply result in ughi
        for j = 0 to Sample Size
            Call the OpenSSL signing/decryption function with ughi.
            Note the difference in start and end times for ughi
        end for
        add the median of the decryption times of ughi over the Sample Size to t2.
    end for /* Neighborhood size */
    let delta = | t1 – t2 |
    If delta is “large” then
        /* bit i of q is 0 */
        BN_clear_bit function is used to clear bit i in ghi
    else
        /* bit i of q is 1 */
        BN_set_bit is used to set bit i of g
        /* Bit i is already set in ghi */
    end if
    move on to the next bit (i = i + 1)
end while
Figure 2: Algorithm to Recover Private Key Bits
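The decision logic of Figure 2 can be tried out against a toy model, as in the Python sketch below. Everything here is an assumption made for illustration: the timing oracle is simulated (its "time" simply grows with g mod q, loosely mimicking the extra-reduction effect), the primes are tiny, the neighborhood is reduced to a single value, and the threshold for a “large” delta is chosen to suit this toy model. It does not measure OpenSSL and says nothing about real-world thresholds.

```python
def make_toy_oracle(p, q, R):
    """Hypothetical stand-in for a timed decryption; NOT a real measurement.

    The returned 'time' grows with (g mod q), where g is the value the
    exponentiation would operate on after conversion to Montgomery form.
    """
    N = p * q

    def decryption_time(u):
        g = (u * R) % N            # decryption multiplies its input by R
        return 1000 + (g % q)
    return decryption_time


def recover_q(N, R, q_bits, known_top, known_bits, decryption_time):
    """Recover q from the top down, given its known_bits leading bits."""
    r_inv = pow(R, -1, N)                      # R^-1 mod N (Python 3.8+)
    g = known_top << (q_bits - known_bits)     # known prefix, rest zero
    large = 1 << (q_bits - 2)                  # toy threshold for "large"
    for pos in range(q_bits - known_bits - 1, 0, -1):
        ghi = g | (1 << pos)                   # candidate with this bit set
        t1 = decryption_time((g * r_inv) % N)      # time for ug
        t2 = decryption_time((ghi * r_inv) % N)    # time for ughi
        if abs(t1 - t2) <= large:              # small delta: g < ghi < q
            g = ghi                            # so this bit of q is 1
        # large delta: g < q < ghi, so the bit stays 0
    return g | 1                               # q is odd, so the last bit is 1


if __name__ == "__main__":
    p, q, R = 251, 241, 1 << 8                 # toy primes with q < p
    oracle = make_toy_oracle(p, q, R)
    print(recover_q(p * q, R, q_bits=8, known_top=0b11, known_bits=2,
                    decryption_time=oracle))
```

In the real attack the single oracle call per guess is replaced by the neighborhood and sample-size measurements, and the threshold has to be found empirically.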
The decryption time in the above algorithm is measured as the time taken by the
BN_mod_exp function with modulus q in OpenSSL. This is done by modifying the OpenSSL
code to add timing measurements around the call to this function in the file
crypto/rsa/rsa_eay.c. The time is calculated using the ‘rdtsc’ instruction, which gives
a high-resolution cycle count.
The attack takes a considerably long time to run due to the large neighborhood and
sample sizes. The attack was run overnight and successfully recovered bits of q, after
the first few bits were taken as known.
Difficulties Encountered
1. Measuring the time to decrypt ug and ughi using a timing function in the attack
program did not yield consistent timing differences between the cases where bit i was
0 and 1.
2. Understanding the OpenSSL code and integrating my attack function with the
OpenSSL code.
3. Measuring the time for an operation with high resolution.
4. Blinding is used in OpenSSL 0.9.7d, which hides the timing variations that are
useful in the timing attack.
Difficulties Solved
1. To overcome this issue, the time is recorded inside the OpenSSL code, around the
call to the BN_mod_exp function.
2. Integration is achieved by launching openssl using the “system” call. Also, the
OpenSSL functions for handling big numbers were used in the attack code by linking
against the libcrypto library.
3. rdtsc was used to get the timestamp counts before and after the BN_mod_exp()
function. Care was also taken not to run any other load, such as screensavers, while
the attack was being run. The laptop’s power management function, which can affect
rdtsc, was turned off while the attack was in progress.
4. The function RSA_blinding_off is used to turn off blinding.
Demonstration
A sample run is demonstrated in Appendix A-2.
Attack #2: Implementation of an Attack using Paul C. Kocher’s Method
The repeated squaring algorithm implemented as part of my RSA implementation was
targeted in this attack. A large number of queries are fed to the repeated squaring
method, which performs the decryption of each query. For each of these queries, an
integer variable keeps a count of the number of modular reductions that occur. After the
signing of each random message, the actual count values are recorded in an array.
Note that a modular reduction is executed only if the value being reduced is larger than
the modulus (N). Additionally, in iteration i, if the corresponding bit of the private
key is 1, then an extra multiplication step with its own modular reduction is executed.
Hence, it can be deduced that a private key bit of value 1 will result in a higher
actual count value than a bit value of 0.
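A repeated squaring routine instrumented with such a counter might look like the following Python sketch. It is an assumption-laden illustration, not the project's code: the function name is made up, and a reduction is counted whenever the value is greater than or equal to N.

```python
def mod_exp_counting(M, d, N):
    """Repeated squaring that also counts the modular reductions performed.

    Illustrative sketch; assumes d >= 1. Returns (M**d mod N, reduction_count).
    """
    reductions = 0

    def reduce(value):
        nonlocal reductions
        if value >= N:             # reduce only when actually needed
            reductions += 1
            return value % N
        return value

    M = reduce(M)
    x = M                          # accounts for the leading 1 bit of d
    for bit in bin(d)[2:][1:]:     # remaining bits, most significant first
        x = reduce(x * x)
        if bit == "1":
            x = reduce(x * M)      # extra step only for 1 bits
    return x, reductions


if __name__ == "__main__":
    print(mod_exp_counting(65, 2753, 3233))
```

The count returned here plays the role of the measured time in the attack, which avoids the noise of real clock measurements.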
Figure 3: Kocher’s Attack Algorithm
for j=0 till count
    Generate random messages Mj
    Measure the actual decryption time Tj for each Mj
end for
private_key[MSB] = 1
for i=1 till numOfDigits
    private_key[MSB+i] = 1
    for j=0 till count
        Measure tj, the time taken by the repeated squaring algorithm with the current
        guess of the private key. This implementation uses the count of modulo
        operations as an indicator of time.
    end for
    Find the correlation between Tj and tj. tj is normalized to Tj by adding the most
    common difference between the Tj and tj values.
    If there is a correlation between Tj and tj, then
        /* bit i of private key is 1 */
        /* bit i is already set to 1 */
    Else
        /* bit i of private key is 0 */
        private_key[MSB+i] = 0
    end if
end for
This method recovered most bits of the private key in the sample runs (see Appendix A-3)
and worked on different keys.
Note that the correlation test imitates plotting graphs of the actual time and of the
time obtained when using the guessed bit. The two graphs are normalized to the same
level by adding the “most common difference between Tj and tj” to tj. Bit i of d is 1
if there is a correlation between Tj and tj. Tj and tj are considered correlated if the
normalized tj is never higher than Tj. Using this, successive bits of d can be found.
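The correlation rule described above can be written as a short Python function. This is a sketch of the rule as stated in this report, with a hypothetical function name; if several differences are equally common, which one is picked is left to the library and is not something the report specifies.

```python
from collections import Counter


def is_correlated(actual, guessed):
    """Apply the 'never higher after normalization' rule to two timing lists.

    actual  -- the measured values Tj
    guessed -- the values tj computed with the current key guess
    """
    diffs = [T - t for T, t in zip(actual, guessed)]
    shift = Counter(diffs).most_common(1)[0][0]   # most common difference
    normalized = [t + shift for t in guessed]     # lift tj to the level of Tj
    return all(t <= T for t, T in zip(normalized, actual))


if __name__ == "__main__":
    print(is_correlated([10, 12, 11], [7, 9, 8]))
    print(is_correlated([10, 12, 11, 9], [7, 9, 8, 8]))
```

In effect the rule accepts a guess only when the most common difference between Tj and tj is also the smallest one.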
Difficulties Encountered
1. Determining the correlation between actual time and the computed time without
manually plotting a graph for every iteration.
2. Timing the modular reductions in the repeated squaring algorithm.
Difficulties Solved
1. This was solved by calculating an increment in time and using this value during
subsequent calculations. This way, the graphs of the actual time and the computed
times overlap and an actual correlation can be determined.
2. This was solved by keeping a count internally in the algorithm that keeps track of
the number of modular reductions performed.
Demonstration
A sample run is demonstrated in Appendix A-3.
CONCLUSION
This project investigates the two well-known methods of timing attack: the Brumley &
Boneh attack on OpenSSL and Kocher’s method on the repeated squaring algorithm.
Both attacks were successfully implemented.
REFERENCES
[1] David Brumley and Dan Boneh, “Remote Timing Attacks Are Practical” at
http://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf
[2] Mark Stamp and Richard M. Low, “Applied Cryptanalysis”
[3] http://www.openssl.org/
[4] Mark Stamp, “Information Security: Principles and Practice”, John Wiley & Sons
[5] Paul C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS,
and Other Systems” at
http://www.cryptography.com/resources/whitepapers/TimingAttacks.pdf
[6] OpenSSL source code download website at http://rpmfind.net/
[7] tsc description http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf
APPENDIX
A-1 Implementation of RSA Cryptosystem
Note all the mentioned files are available in the directory containing the code.
Case(1): Encryption
The file in1.txt contains the string “RSA TIMING PROJECT IS #1!” This is the message
to be encrypted. The encryption is run as follows:
:~/x_ramya/CS_265/cryptanalysis> ./rsa
Enter modulus value (N): 3233
Press 1 to perform encrytion or Press 2 to perform decrytion/signing 1
Enter Input file name: in1.txt
Enter output file name: out1.txt
Enter exponent value (e): 17
:~/x_ramya/CS_265/cryptanalysis>
To verify the encryption, the output file out1.txt is decrypted as shown below:
Case(2): Decryption
:~/x_ramya/CS_265/cryptanalysis> ./rsa
Enter modulus value (N): 3233
Press 1 to perform encrytion or Press 2 to perform decrytion/signing 2
Enter Input file name: out1.txt
Enter output file name: final1.txt
Please enter private key (d): 2753
:~/x_ramya/CS_265/cryptanalysis> cat final1.txt
RSA TIMING PROJECT IS #1!:~/x_ramya/CS_265/cryptanalysis>
A-2 Brumley and Boneh’s Approach
The simulation is run on an IBM ThinkPad T23. The attack was against the OpenSSL signing
routine, where signing was carried out with a 1024-bit key generated by ssh-keygen.
A certain number of the leading bits of q were used as the initial guess of q. With the
attack I was able to successfully recover bits of q as shown below.
Note that the ‘limit’ value used in the attack program to distinguish between ‘small’
and ‘large’ delta values (which in turn leads to a guess of the bit value as 1 or 0,
respectively) depends on a lot of factors such as neighborhood size, CPU speed, etc.
The first few hex digits of q used in the sample run below were
CFB4DE0ACABC98616D42EFF…
The attack assumed that the first 16 hex digits are known.
The sample result below shows the next 3 hex digits, which were recovered correctly,
along with the timing differences. 0110 1101 0100 => 6D4
Note that the attack is not limited to 12 bits and was able to recover more bits as well.
Only 12 bits are shown here due to space restrictions.
:~/x_ramya/CS_265/cryptanalysis/openssl> make
gcc -o openssl_attack openssl_attack.c -I
/usr/src/packages/BUILD/openssl-0.9.7d/crypto -I
/usr/src/packages/BUILD/openssl-0.9.7d/include -L
/usr/src/packages/BUILD/openssl-0.9.7d -lcrypto -Wall -g
:~/x_ramya/CS_265/cryptanalysis/openssl> ./openssl_attack
i = 0: Time reqd to decrypt ug 12209827175
Time reqd to decrypt ughi 12217261139
delta 7433964
Bit is 0
i = 1: Time reqd to decrypt ug 12208888409
Time reqd to decrypt ughi 12208365169
delta 523240
Bit is 1
i = 2: Time reqd to decrypt ug 12210930466
Time reqd to decrypt ughi 12211237740
delta 307274
sum of diff 78664210
Bit is 1
i = 3: Time reqd to decrypt ug 12207957153
Time reqd to decrypt ughi 12215835548
delta 7878395
Bit is 0
i = 4: Time reqd to decrypt ug 12210401404
Time reqd to decrypt ughi 12209934021
delta 467383
Bit is 1
i = 5: Time reqd to decrypt ug 12212973694
Time reqd to decrypt ughi 12213135763
delta 162069
Bit is 1
i = 6: Time reqd to decrypt ug 12219014351
Time reqd to decrypt ughi 12226447738
delta 7433387
Bit is 0
i = 7: Time reqd to decrypt ug 12219186192
Time reqd to decrypt ughi 12219337969
delta 151777
Bit is 1
i = 8: Time reqd to decrypt ug 12213442217
Time reqd to decrypt ughi 12220561756
delta 7119539
Bit is 0
i = 9: Time reqd to decrypt ug 12210885116
Time reqd to decrypt ughi 12210458363
delta 426753
Bit is 1
i = 10: Time reqd to decrypt ug 12209842086
Time reqd to decrypt ughi 12217600829
delta 7758743
Bit is 0
i = 11: Time reqd to decrypt ug 12210432774
Time reqd to decrypt ughi 12218527179
delta 8094405
Bit is 0
A-3 Kocher’s Method on Repeated Squaring Algorithm
The attack is run using known private and public key combinations.
Simulation 1
The private key is recovered except for 2 incorrect bits.
:~/x_ramya/CS_265/cryptanalysis> ./attack
Enter modulus (N): 3233
Enter private key (d): 2753
The binary equivalent of the private key is: 101011000001
********After Performing the Kocher attack***********
The private key is determined to be (in binary) : 101011100000
The private key in decimal is 2784
:~/x_ramya/CS_265/cryptanalysis>
Simulation 2
The private key is recovered except for its last 3 bits; the recovered value is off by 1 in decimal.
:~/x_ramya/CS_265/cryptanalysis> ./attack
Enter modulus (N): 36355783
Enter private key (d): 24229147
The binary equivalent of the private key is: 1011100011011010100011011
********After Performing the Kocher attack***********
The private key is determined to be (in binary) :
1011100011011010100011100
The private key in decimal is 24229148
:~/x_ramya/CS_265/cryptanalysis>