SideChannelAttacks

advertisement
Physical Security and SideChannel Attacks
Rice ELEC 528/ COMP 538
Farinaz Koushanfar
Fall 2007
Outline
•
•
•
•
•
•
•
Introduction
Hardware targets
Attack classification
Power attacks
Timing attacks
Electromagnetic attacks
Fault injection attacks
Introduction
• Classic cryptography views the securing
problem using mathematical abstractions
• The classic cryptoanalysis has had a great
success and promise
– Analysis and quantifying crypto algorithms’s resilience
against attacks)
• Recently, many of the security protocols have
been attacked using physical attacks
– Take advantage of the implementation specific to
recover the secret parameters
Physical attacks
• Traditional cryptography is centered around the concepts
of one-way and trapdoor functions
• A one-way function can be rapidly calculated, but is
computationally difficult to invert
• Polynomial time algorithms rarely find a pre-image of the
one-way security functions for a random set of inputs
• A trapdoor one-way function is a function that is easy to
invert if and only if a certain secret (key) is available
• Physical attacks usually have two phases:
– Interaction phase: the attacker exploits some physical
characteristics of the device
– Exploitation phase: analyzing the gathered information to
recover the secret
Model
• Consider a device capable of doing
cryptographic function
• The key is usually stored in the device and
protected
• Modern crypto based on Kerckhoff’s
assumptions  all of the data required to
operate a chip is entirely hidden in the secret
• Attacker only needs to extract the keys
Principle of divide-and-conquer
attack
• The divide and conquer (D&C) attacks attempt
at recovering the key by parts
• The idea is that an observable characteristic can
be correlated with a partial key
– The partial key should be small enough to enable
exhaustive search
• Once a partial key is validated, the process is
repeated for finding other keys
• D&C attacks may be iterative (some parts of the
key dependent on others) or independent
Outline
•
•
•
•
•
•
•
Introduction
Hardware targets
Attack classification
Power attacks
Timing attacks
Electromagnetic attacks
Fault injection attacks
Hardware targets
• The most common victim of hardware
cryptoanalysis are the smart cards (SC)
– Attacks on SCs are applicable to any general
purpose processor with a fixed bus length
– Attacks on FPGAs are also reported. FPGAs
represent application specific devices with
parallel computing opportunity
Smart Cards
• It has a small processor (8bit or 32bit) long
with ROM, EEPROM and a small RAM
• There are eight wires connecting the
processor to the outside world
• Power supply: SCs have no internal
batteries, the current provided by the reader
• Clock: SCs do not have an internal clock
• SCs are typically equipped with a shield that
destroys the chip if a tampering happens
FPGAs
• The first difference with
SCs is in the applications
of the two processor.
• FPGAs and ASICs allow
parallel computing
• Multiple programmable
configuration bits
Outline
•
•
•
•
•
•
•
Introduction
Hardware targets
Attack classification
Power attacks
Timing attacks
Electromagnetic attacks
Fault injection attacks
Attack classification
• Many possible attacks, the attacks are
often not mutually exclusive
• Invasive vs. noninvasive attacks
• Active vs. passive
– Active attacks tamper with device’s proper
functionality, either temporary or permanently
Five major attack groups
• Probing attack (invasive)
• Fault injection attacks – active attacks ,
maybe invasive or noninvasive
• Timing attacks exploit device’s running
time
• Power analysis attack
• Electromagnetic analysis attacks
Outline
•
•
•
•
•
•
•
Introduction
Hardware targets
Attack classification
Power attacks
Timing attacks
Electromagnetic attacks
Fault injection attacks
Power attacks
Measuring phase
• This task is usually straightforward
– Easy for smart cards: the energy is provided by the
terminal and the current can be read
• Relatively inexpensive (<$1000) equipment can
digitally sample voltage differences at high rates
(1GHz++) with less than 1% error
• Device’s power consumption depends on many
things, including its structure and data
Simple power analysis (SPA)
• Monitoring the device’s power consumption to
deduce information about data/operation
• Example: SPA on DES – smart card
– The internal structure is shown in the next slide
• Summary DES - a block cipher
– a product cipher
– 16 rounds (iterations) on the input bits (of P)
• substitutions (for confusion) and
• permutations (for diffusion)
– Each round with a round key
• Generated from the user-supplied key
* DES Basic Structure
[Fig. – cf. J. Leiwo]
Input
• Input:
64 bits (a block)
Input Permutation
• Li/Ri– left/right half of the input block
for iteration i (32 bits) – subject to
L0
R0
substitution S and permutation P (cf. Fig 2S
8– text)
• K - user-supplied key
• Ki - round key:
– 56 bits used +8 unused
(unused for E but often used for error
checking)
• Output:
64 bits (a block)
• Note: Ri becomes L(i+1)
• All basic op’s are simple logical ops
– Left shift / XOR
K
P
L1
R1
L16
R16
Final Permutation
Output
K1
K16
Example 1 - SPA on DES (cont’d)
• The upper trace – entire encryption, including
the initial phase, 16 DES rounds, and the initial
permutation
• The lower trace – detailed view of the second
and third rounds
SPA on DES (cont’d)
• The DES structure and 16 rounds are known
• Instruction flow depends on data  power signature
• Example: Modular exponentiation in DES is often
implemented by square and multiply algorithm
• Typically the square operation is implemented differently
compared with the multiply (for speed purposes)
• Then, the power trace of the exponentiation can directly
yields the corresponding value
• All programs involving conditional branching based on
the key values are at risk!
square and multiply
algorithm
Example 2: SPA on RSA
SPA example (cont’d)
SPA example (cont’d)
• Unprotected modular exponentiation – square and
multiply algorithm
• The pick values reveal the key values
Possible counter measure –
randomizing RSA exponentiation
Differential power analysis (DPA)
• SPA targets variable instruction flow
• DPA targets data-dependence
• Difference b/w smart cards (SCs) and
FPGAs
• In SCs, one operation running at a time
–  Simple power tracing is possible
• In FPGAs, typically parallel
computations prevents visual SPA
inspection  DPA
Example: DPA on DES
•
•
•
•
•
•
•
Divide-and-conquer strategy, comparing powers for different inputs
Record large number of inputs and record the corresponding power
consumption
We have access to R15, that entered the last round operation, since it is
equal to L16
Take this output bit (called M’i) at the last round and classify the curves
based on the bit
6 specific bits of R15 will be XOR’d with 6 bits of the key, before entering
the S-box
By guessing the 6-bit key value, we can predict the bit b, or an arbitrary
output bit of an arbitrary S-box output
Thus, with 26 partitions, one for each possible key, we can break the
cipher much faster
A closer look at HW
Implementation Of DES
DPA (cont’d)
• DPA can be performed in any algorithm
that has the operation =S(K),
–  is known and K is the segment key
The waveforms are captured by a scope and
Sent to a computer for analysis
What is available after acquisition?
DPA (cont’d)
The bit will classify the wave wi
– Hypothesis 1: bit is zero
– Hypothesis 2: bit is one
– A differential trace will be calculated for each bit!
DPA (cont’d)
DPA (cont’d)
DPA -- testing
DPA -- testing
DPA – the wrong guess
DPA (cont’d)
• The DPA waveform with the highest peak
will validate the hypothesis
DPA curve example
DPA by correlations
Attacking a secret key algorithm
Typical DPA Target
Example -- DPA
Example – hypothesis testing
DPA on DES algorithm
DPA on other algorithms
DPA (Cont’d)
Improvements over DPA
• Correlation power analysis (CPA) - attacker steps
– Predict the power usage of the device at one specific
instant, as a function of certain key bits
• E.g., for DES, it is assumed to be function of the Hamming weight
of the data
– Prediction matrix stores the predicted values
– Consumption vector Stores the measured power
– The attacker compared the actual and the predicted
values, using correlation coefficient
– E.g., correlation b/w all the columns of the prediction
vector and the consumption matrix
Modeling the power consumption
• Hamming weight model
– Typically measured on a bus, Y=aH(X)+b
– Y: power consumption; X: data value; H:
Hamming weight
• The Hamming distance model
– Y=aH(PX)+b
– Accounting for the previous value on the bus
(P)
Correlation power analysis (CPA)
• The equation for generating differential
waveforms replaced with correlations
• Rather than attacking one bit, the attacker
tries prediction of the Hamming weight of a
word (H)
• The correlation is computed by:
More about PA (cont’d)
• Data-dependent attacks require power
consumption model
– Can be measured and learned
• Synchronization of the measurements needs to
be addressed
• The attack is affected by parallel computing
which lowers observability
• The described attack is not the best achieved to
date, e.g., techniques based on maximum
likelihood often offer better results
Statistical PA -- countermeasures
Anti-DPA countermeasures
Anti-DPA
• Internal clock phase shift
Further Readings
• Differential power analysis, by kocher, Jaffe, and
Jun
• Power analysis tutorial, by Aigner and Oswald
• A tutorial on physical security and side-channel
attacks, by Koeune and Standaert
• Michael Tunstall has some good material – a
few of the charts are his courtesy
• Side channel attacks: countermeasures, by
Verbauwhede
Outline
•
•
•
•
•
•
•
Introduction
Hardware targets
Attack classification
Power attacks
Timing attacks
Electromagnetic attacks
Fault injection attacks
Timing attacks
• Running time of a crypto processor can be used
as an information channel
• The idea was proposed by Kocher, Crypto’96
Timing attacks (cont’d)
RSA Cryptosystem
• Key generation:
– Generate large (say, 2048-bit) primes p, q
– Compute n=pq and (n)=(p-1)(q-1)
– Choose small e, relatively prime to (n)
• Typically, e=3 (may be vulnerable) or
e=216+1=65537 (why?)
– Compute unique d such that ed = 1 mod (n)
– Public key = (e,n); private key = d
• Security relies on the assumption that it is difficult to
factor n into p and q
• Encryption of m: c = me mod n
How Does RSA Decryption
Work?
• RSA decryption: compute yx mod n
– This is a modular exponentiation operation
• Naïve algorithm: square and multiply
Kocher’s Observation
Whether iteration takes a long time
depends on the kth bit of secret exponent
This takes a while
to compute
This is instantaneous
Outline of Kocher’s Attack
• Idea: guess some bits of the exponent and predict how
long decryption will take
• If guess is correct, will observe correlation; if incorrect,
then prediction will look random
– This is a signal detection problem, where signal is timing variation
due to guessed exponent bits
– The more bits you already know, the stronger the signal, thus
easier to detect (error-correction property)
• Start by guessing a few top bits, look at correlations for
each guess, pick the most promising candidate and
continue
RSA in OpenSSL
• OpenSSL is a popular open-source toolkit
– mod_SSL (in Apache = 28% of HTTPS market)
– stunnel (secure TCP/IP servers)
– sNFS (secure NFS)
– Many more applications
• Kocher’s attack doesn’t work against OpenSSL
– Instead of square-and-multiply, OpenSSL uses CRT,
sliding windows and two different multiplication
algorithms for modular exponentiation
• CRT = Chinese Remainder Theorem
• Secret exponent is processed in chunks, not bit-by-bit
Chinese Remainder Theorem
• n = n1n2…nk
where gcd(ni,nj)=1 when i  j
• The system of congruences
x = x1 mod n1 = … = xk mod nk
– Has a simultaneous solution x to all congruences
– There exists exactly one solution x between 0 and n-1
• For RSA modulus n=pq, to compute x mod n it’s enough
to know x mod p and x mod q
RSA Decryption With CRT
• To decrypt c, need to compute m=cd mod n
• Use Chinese Remainder Theorem (why?)
– d1 = d mod (p-1)
these are precomputed
– d2 = d mod (q-1)
– qinv = (1/q) mod p
– Compute m1 = cd1 mod p; m2 = cd2 mod q
Attack this computation
order top)*q
learn q.
– Compute m = m2+(qinv*(m
1-m2)inmod
This is enough to learn private key (why?)
RSA Decryption With CRT
• To decrypt c, need to compute m=cd mod n
• Use Chinese Remainder Theorem (why?)
– d1 = d mod (p-1)
these are precomputed
– d2 = d mod (q-1)
– qinv = (1/q) mod p
– Compute m1 = cd1 mod p; m2 = cd2 mod q
Attack this computation
order top)*q
learn q.
– Compute m = m2+(qinv*(m
1-m2)inmod
This is enough to learn private key (why?)
Operations Involved in
Decryption
What is needed to compute cd mod q and xy mod q?
• Exponentiation
– Sliding windows
• Multiplication routines
– Normal (when operands have unequal length)
– Karatsuba (when operands have equal length):
faster
• Modular reduction
– Montgomery reduction
Montgomery Reduction
• Decryption requires computing m2 = cd2 mod q
• This is done by repeated multiplication
– Simple: square and multiply (process d2 1 bit at a time)
– More clever: sliding windows (process d2 in 5-bit blocks)
• In either case, many multiplications modulo q
• Multiplications use Montgomery reduction
– Pick some R = 2k
– To compute x*y mod q, convert x and y into their Montgomery form
xR mod q and yR mod q
– Compute (xR * yR) * R-1 = zR mod q
• Multiplication by R-1 can be done very efficiently
Schindler’s Observation
• At the end of Montgomery reduction, if zR > q, then need to
subtract q
– Probability of this extra step is proportional to c mod q
• If c is close to q, a lot of subtractions will be done
• If c mod q = 0, very few subtractions
– Decryption will take longer as c gets closer to q, then
become fast as c passes a multiple of q
• By playing with different values of c and observing how
long decryption takes, attacker can guess q!
– Doesn’t work directly against OpenSSL because of
sliding windows and two multiplication algorithms
Reduction Timing Dependency
Decryption
time
q
2q
p
Value of ciphertext c
Summary of Time
Dependencies
g<q
g>q
Montgomer Longer
y effect
Shorter
Multiplicati
on effect
Longer
Shorter
g is the decryption value (same as c)
Different effects… but one will always
dominate!
Attack Is Binary Search
Decryption time
#Reductions
Mult routine
0-1 Gap
q
Value of ciphertext
Attack Overview
• Initial guess g for q between 2512 and 2511 (why?)
• Try all possible guesses for the top few bits
• Suppose we know i-1 top bits of q. Goal: ith bit
– Set g =…known i-1 bits of q…000000
– Set ghi=…known i-1 bits of q…100000 (note: g<ghi)
• If g<q<ghi then the ith bit of q is 0
• If g<ghi<q then the ith bit of q is 1
• Goal: decide whether g<q<ghi or g<ghi<q
Two Possibilities for ghi
Decryption time
g
#Reductions
Mult routine
ghi?
Difference in decryption times
between g and ghi will be small
q
Value of ciphertext
ghi?
Difference in decryption times
between g and ghi will be large
Timing Attack Details
• What is “large” and “small”?
– Know from attacking previous bits
• Decrypting just g does not work because of sliding
windows
– Decrypt a neighborhood of values near g
– Will increase difference between large and small
values, resulting in larger 0-1 gap
• Attack requires only 2 hours, about 1.4 million queries to
recover the private key
– Only need to recover most significant half bits of q
The 0-1 Gap
Zero-one gap
Extracting RSA Private Key
Montgomery reduction
dominates
zero-one gap
Multiplication routine dominates
Outline
•
•
•
•
•
•
•
Introduction
Hardware targets
Attack classification
Power attacks
Timing attacks
Electromagnetic attacks
Fault injection attacks
Electromagnetic power analysis
EMA – probe design
EMA signal
Spatial positioning
Spatial positioning
Example: SEMA on RSA
EMA (cont’d)
Counter measures
Outline
•
•
•
•
•
•
•
Introduction
Hardware targets
Attack classification
Power attacks
Timing attacks
Electromagnetic attacks
Fault injection attacks
Fault injection attacks
Fault attacks
Fault injection techniques
• Transient (provisional) and permanent
(destructive) faults
– Variations to supply voltage
– Variations in the external clock
– Temperature
– White light
– Laser light
– X-rays and ion beams
– Electromagnetic flux
Need some (maybe expensive
equipment) – eg, laser
Fault injection steps
Provisional faults
• Single event upsets
– Temporary flips in a cell’s logical state to a
complementary state
• Multiple event faults
– Several simultaneous SEUs
• Dose rate faults
– The individual effects are negligible, but cumulative
effect causes fault
• Provisional faults are used more in fault injection
Permanent faults
• Single-event burnout faults
– Caused by a parasitic thyristor being formed in the MOS power
transistors
• Single-event snap back faults
– Caused by self-sustained current by parasitic bipolar transistors
in MOS
• Single-event latch-up faults
– Creates a self sustained current in parasitics
• Total dose rate faults
– Progressive degradation of the electronic circuit
Fault impacts (model)
• Resetting data
• Data randomization – could be misleading, no control
over!
• Modifying op-code – implementation dependent
Example – fault attack on DES
15-th round DPA
15-th round DPA
15-th round DES
Fault attacks – counter measures
Fault attacks – counter measures
Download