Physical Security and SideChannel Attacks Rice ELEC 528/ COMP 538 Farinaz Koushanfar Fall 2007 Outline • • • • • • • Introduction Hardware targets Attack classification Power attacks Timing attacks Electromagnetic attacks Fault injection attacks Introduction • Classic cryptography views the securing problem using mathematical abstractions • The classic cryptoanalysis has had a great success and promise – Analysis and quantifying crypto algorithms’s resilience against attacks) • Recently, many of the security protocols have been attacked using physical attacks – Take advantage of the implementation specific to recover the secret parameters Physical attacks • Traditional cryptography is centered around the concepts of one-way and trapdoor functions • A one-way function can be rapidly calculated, but is computationally difficult to invert • Polynomial time algorithms rarely find a pre-image of the one-way security functions for a random set of inputs • A trapdoor one-way function is a function that is easy to invert if and only if a certain secret (key) is available • Physical attacks usually have two phases: – Interaction phase: the attacker exploits some physical characteristics of the device – Exploitation phase: analyzing the gathered information to recover the secret Model • Consider a device capable of doing cryptographic function • The key is usually stored in the device and protected • Modern crypto based on Kerckhoff’s assumptions all of the data required to operate a chip is entirely hidden in the secret • Attacker only needs to extract the keys Principle of divide-and-conquer attack • The divide and conquer (D&C) attacks attempt at recovering the key by parts • The idea is that an observable characteristic can be correlated with a partial key – The partial key should be small enough to enable exhaustive search • Once a partial key is validated, the process is repeated for finding other keys • D&C attacks may be iterative (some parts of the key dependent on others) or independent Outline • • • • • • • Introduction Hardware targets Attack classification Power attacks Timing attacks Electromagnetic attacks Fault injection attacks Hardware targets • The most common victim of hardware cryptoanalysis are the smart cards (SC) – Attacks on SCs are applicable to any general purpose processor with a fixed bus length – Attacks on FPGAs are also reported. FPGAs represent application specific devices with parallel computing opportunity Smart Cards • It has a small processor (8bit or 32bit) long with ROM, EEPROM and a small RAM • There are eight wires connecting the processor to the outside world • Power supply: SCs have no internal batteries, the current provided by the reader • Clock: SCs do not have an internal clock • SCs are typically equipped with a shield that destroys the chip if a tampering happens FPGAs • The first difference with SCs is in the applications of the two processor. • FPGAs and ASICs allow parallel computing • Multiple programmable configuration bits Outline • • • • • • • Introduction Hardware targets Attack classification Power attacks Timing attacks Electromagnetic attacks Fault injection attacks Attack classification • Many possible attacks, the attacks are often not mutually exclusive • Invasive vs. noninvasive attacks • Active vs. passive – Active attacks tamper with device’s proper functionality, either temporary or permanently Five major attack groups • Probing attack (invasive) • Fault injection attacks – active attacks , maybe invasive or noninvasive • Timing attacks exploit device’s running time • Power analysis attack • Electromagnetic analysis attacks Outline • • • • • • • Introduction Hardware targets Attack classification Power attacks Timing attacks Electromagnetic attacks Fault injection attacks Power attacks Measuring phase • This task is usually straightforward – Easy for smart cards: the energy is provided by the terminal and the current can be read • Relatively inexpensive (<$1000) equipment can digitally sample voltage differences at high rates (1GHz++) with less than 1% error • Device’s power consumption depends on many things, including its structure and data Simple power analysis (SPA) • Monitoring the device’s power consumption to deduce information about data/operation • Example: SPA on DES – smart card – The internal structure is shown in the next slide • Summary DES - a block cipher – a product cipher – 16 rounds (iterations) on the input bits (of P) • substitutions (for confusion) and • permutations (for diffusion) – Each round with a round key • Generated from the user-supplied key * DES Basic Structure [Fig. – cf. J. Leiwo] Input • Input: 64 bits (a block) Input Permutation • Li/Ri– left/right half of the input block for iteration i (32 bits) – subject to L0 R0 substitution S and permutation P (cf. Fig 2S 8– text) • K - user-supplied key • Ki - round key: – 56 bits used +8 unused (unused for E but often used for error checking) • Output: 64 bits (a block) • Note: Ri becomes L(i+1) • All basic op’s are simple logical ops – Left shift / XOR K P L1 R1 L16 R16 Final Permutation Output K1 K16 Example 1 - SPA on DES (cont’d) • The upper trace – entire encryption, including the initial phase, 16 DES rounds, and the initial permutation • The lower trace – detailed view of the second and third rounds SPA on DES (cont’d) • The DES structure and 16 rounds are known • Instruction flow depends on data power signature • Example: Modular exponentiation in DES is often implemented by square and multiply algorithm • Typically the square operation is implemented differently compared with the multiply (for speed purposes) • Then, the power trace of the exponentiation can directly yields the corresponding value • All programs involving conditional branching based on the key values are at risk! square and multiply algorithm Example 2: SPA on RSA SPA example (cont’d) SPA example (cont’d) • Unprotected modular exponentiation – square and multiply algorithm • The pick values reveal the key values Possible counter measure – randomizing RSA exponentiation Differential power analysis (DPA) • SPA targets variable instruction flow • DPA targets data-dependence • Difference b/w smart cards (SCs) and FPGAs • In SCs, one operation running at a time – Simple power tracing is possible • In FPGAs, typically parallel computations prevents visual SPA inspection DPA Example: DPA on DES • • • • • • • Divide-and-conquer strategy, comparing powers for different inputs Record large number of inputs and record the corresponding power consumption We have access to R15, that entered the last round operation, since it is equal to L16 Take this output bit (called M’i) at the last round and classify the curves based on the bit 6 specific bits of R15 will be XOR’d with 6 bits of the key, before entering the S-box By guessing the 6-bit key value, we can predict the bit b, or an arbitrary output bit of an arbitrary S-box output Thus, with 26 partitions, one for each possible key, we can break the cipher much faster A closer look at HW Implementation Of DES DPA (cont’d) • DPA can be performed in any algorithm that has the operation =S(K), – is known and K is the segment key The waveforms are captured by a scope and Sent to a computer for analysis What is available after acquisition? DPA (cont’d) The bit will classify the wave wi – Hypothesis 1: bit is zero – Hypothesis 2: bit is one – A differential trace will be calculated for each bit! DPA (cont’d) DPA (cont’d) DPA -- testing DPA -- testing DPA – the wrong guess DPA (cont’d) • The DPA waveform with the highest peak will validate the hypothesis DPA curve example DPA by correlations Attacking a secret key algorithm Typical DPA Target Example -- DPA Example – hypothesis testing DPA on DES algorithm DPA on other algorithms DPA (Cont’d) Improvements over DPA • Correlation power analysis (CPA) - attacker steps – Predict the power usage of the device at one specific instant, as a function of certain key bits • E.g., for DES, it is assumed to be function of the Hamming weight of the data – Prediction matrix stores the predicted values – Consumption vector Stores the measured power – The attacker compared the actual and the predicted values, using correlation coefficient – E.g., correlation b/w all the columns of the prediction vector and the consumption matrix Modeling the power consumption • Hamming weight model – Typically measured on a bus, Y=aH(X)+b – Y: power consumption; X: data value; H: Hamming weight • The Hamming distance model – Y=aH(PX)+b – Accounting for the previous value on the bus (P) Correlation power analysis (CPA) • The equation for generating differential waveforms replaced with correlations • Rather than attacking one bit, the attacker tries prediction of the Hamming weight of a word (H) • The correlation is computed by: More about PA (cont’d) • Data-dependent attacks require power consumption model – Can be measured and learned • Synchronization of the measurements needs to be addressed • The attack is affected by parallel computing which lowers observability • The described attack is not the best achieved to date, e.g., techniques based on maximum likelihood often offer better results Statistical PA -- countermeasures Anti-DPA countermeasures Anti-DPA • Internal clock phase shift Further Readings • Differential power analysis, by kocher, Jaffe, and Jun • Power analysis tutorial, by Aigner and Oswald • A tutorial on physical security and side-channel attacks, by Koeune and Standaert • Michael Tunstall has some good material – a few of the charts are his courtesy • Side channel attacks: countermeasures, by Verbauwhede Outline • • • • • • • Introduction Hardware targets Attack classification Power attacks Timing attacks Electromagnetic attacks Fault injection attacks Timing attacks • Running time of a crypto processor can be used as an information channel • The idea was proposed by Kocher, Crypto’96 Timing attacks (cont’d) RSA Cryptosystem • Key generation: – Generate large (say, 2048-bit) primes p, q – Compute n=pq and (n)=(p-1)(q-1) – Choose small e, relatively prime to (n) • Typically, e=3 (may be vulnerable) or e=216+1=65537 (why?) – Compute unique d such that ed = 1 mod (n) – Public key = (e,n); private key = d • Security relies on the assumption that it is difficult to factor n into p and q • Encryption of m: c = me mod n How Does RSA Decryption Work? • RSA decryption: compute yx mod n – This is a modular exponentiation operation • Naïve algorithm: square and multiply Kocher’s Observation Whether iteration takes a long time depends on the kth bit of secret exponent This takes a while to compute This is instantaneous Outline of Kocher’s Attack • Idea: guess some bits of the exponent and predict how long decryption will take • If guess is correct, will observe correlation; if incorrect, then prediction will look random – This is a signal detection problem, where signal is timing variation due to guessed exponent bits – The more bits you already know, the stronger the signal, thus easier to detect (error-correction property) • Start by guessing a few top bits, look at correlations for each guess, pick the most promising candidate and continue RSA in OpenSSL • OpenSSL is a popular open-source toolkit – mod_SSL (in Apache = 28% of HTTPS market) – stunnel (secure TCP/IP servers) – sNFS (secure NFS) – Many more applications • Kocher’s attack doesn’t work against OpenSSL – Instead of square-and-multiply, OpenSSL uses CRT, sliding windows and two different multiplication algorithms for modular exponentiation • CRT = Chinese Remainder Theorem • Secret exponent is processed in chunks, not bit-by-bit Chinese Remainder Theorem • n = n1n2…nk where gcd(ni,nj)=1 when i j • The system of congruences x = x1 mod n1 = … = xk mod nk – Has a simultaneous solution x to all congruences – There exists exactly one solution x between 0 and n-1 • For RSA modulus n=pq, to compute x mod n it’s enough to know x mod p and x mod q RSA Decryption With CRT • To decrypt c, need to compute m=cd mod n • Use Chinese Remainder Theorem (why?) – d1 = d mod (p-1) these are precomputed – d2 = d mod (q-1) – qinv = (1/q) mod p – Compute m1 = cd1 mod p; m2 = cd2 mod q Attack this computation order top)*q learn q. – Compute m = m2+(qinv*(m 1-m2)inmod This is enough to learn private key (why?) RSA Decryption With CRT • To decrypt c, need to compute m=cd mod n • Use Chinese Remainder Theorem (why?) – d1 = d mod (p-1) these are precomputed – d2 = d mod (q-1) – qinv = (1/q) mod p – Compute m1 = cd1 mod p; m2 = cd2 mod q Attack this computation order top)*q learn q. – Compute m = m2+(qinv*(m 1-m2)inmod This is enough to learn private key (why?) Operations Involved in Decryption What is needed to compute cd mod q and xy mod q? • Exponentiation – Sliding windows • Multiplication routines – Normal (when operands have unequal length) – Karatsuba (when operands have equal length): faster • Modular reduction – Montgomery reduction Montgomery Reduction • Decryption requires computing m2 = cd2 mod q • This is done by repeated multiplication – Simple: square and multiply (process d2 1 bit at a time) – More clever: sliding windows (process d2 in 5-bit blocks) • In either case, many multiplications modulo q • Multiplications use Montgomery reduction – Pick some R = 2k – To compute x*y mod q, convert x and y into their Montgomery form xR mod q and yR mod q – Compute (xR * yR) * R-1 = zR mod q • Multiplication by R-1 can be done very efficiently Schindler’s Observation • At the end of Montgomery reduction, if zR > q, then need to subtract q – Probability of this extra step is proportional to c mod q • If c is close to q, a lot of subtractions will be done • If c mod q = 0, very few subtractions – Decryption will take longer as c gets closer to q, then become fast as c passes a multiple of q • By playing with different values of c and observing how long decryption takes, attacker can guess q! – Doesn’t work directly against OpenSSL because of sliding windows and two multiplication algorithms Reduction Timing Dependency Decryption time q 2q p Value of ciphertext c Summary of Time Dependencies g<q g>q Montgomer Longer y effect Shorter Multiplicati on effect Longer Shorter g is the decryption value (same as c) Different effects… but one will always dominate! Attack Is Binary Search Decryption time #Reductions Mult routine 0-1 Gap q Value of ciphertext Attack Overview • Initial guess g for q between 2512 and 2511 (why?) • Try all possible guesses for the top few bits • Suppose we know i-1 top bits of q. Goal: ith bit – Set g =…known i-1 bits of q…000000 – Set ghi=…known i-1 bits of q…100000 (note: g<ghi) • If g<q<ghi then the ith bit of q is 0 • If g<ghi<q then the ith bit of q is 1 • Goal: decide whether g<q<ghi or g<ghi<q Two Possibilities for ghi Decryption time g #Reductions Mult routine ghi? Difference in decryption times between g and ghi will be small q Value of ciphertext ghi? Difference in decryption times between g and ghi will be large Timing Attack Details • What is “large” and “small”? – Know from attacking previous bits • Decrypting just g does not work because of sliding windows – Decrypt a neighborhood of values near g – Will increase difference between large and small values, resulting in larger 0-1 gap • Attack requires only 2 hours, about 1.4 million queries to recover the private key – Only need to recover most significant half bits of q The 0-1 Gap Zero-one gap Extracting RSA Private Key Montgomery reduction dominates zero-one gap Multiplication routine dominates Outline • • • • • • • Introduction Hardware targets Attack classification Power attacks Timing attacks Electromagnetic attacks Fault injection attacks Electromagnetic power analysis EMA – probe design EMA signal Spatial positioning Spatial positioning Example: SEMA on RSA EMA (cont’d) Counter measures Outline • • • • • • • Introduction Hardware targets Attack classification Power attacks Timing attacks Electromagnetic attacks Fault injection attacks Fault injection attacks Fault attacks Fault injection techniques • Transient (provisional) and permanent (destructive) faults – Variations to supply voltage – Variations in the external clock – Temperature – White light – Laser light – X-rays and ion beams – Electromagnetic flux Need some (maybe expensive equipment) – eg, laser Fault injection steps Provisional faults • Single event upsets – Temporary flips in a cell’s logical state to a complementary state • Multiple event faults – Several simultaneous SEUs • Dose rate faults – The individual effects are negligible, but cumulative effect causes fault • Provisional faults are used more in fault injection Permanent faults • Single-event burnout faults – Caused by a parasitic thyristor being formed in the MOS power transistors • Single-event snap back faults – Caused by self-sustained current by parasitic bipolar transistors in MOS • Single-event latch-up faults – Creates a self sustained current in parasitics • Total dose rate faults – Progressive degradation of the electronic circuit Fault impacts (model) • Resetting data • Data randomization – could be misleading, no control over! • Modifying op-code – implementation dependent Example – fault attack on DES 15-th round DPA 15-th round DPA 15-th round DES Fault attacks – counter measures Fault attacks – counter measures