Side-Channel Attack on OpenSSL ECDSA Naomi Benger Joop van de Pol Yuval Yarom Nigel Smart 1 Outline • Background • ECDSA • wNAF scalar multiplication • Hidden Number Problem • The FLUSH+RELOAD Technique • Attacking OpenSSL ECDSA – Improved lattice technique – Using information from the double and add chain 2 ECDSA Signer has a private key 1<α<q-1 and a public key Q=[α]G 1. 2. 3. 4. 5. 6. Compute h=Hash(m) Randomly select an ephemeral key 1<k<q Compute (x,y)=[k]G Take r=x mod q; If r=0 repeat from 2 Take s=(h+r·α)/k mod q; if s=0 repeat from 2 (r,s) is the signature Note that k = ( r / s) × a + ( h / s) mod q 3 wNAF Form To compute [d]G, first write d in wNAF form: n-1 d = å di 2 for di Î {0, ±1,±3,...,±(2 -1)} i w i=0 Such that if di≠0 then di+1=…=di+w+1=0. 4 Scalar Multiplication with wNAF form Precompute {±G, ±[3]G,…, ±[2w-1]G} x=0 for i=n-1 downto 0 x = Double(x) if (di≠0) then x = Add(x, [di]G) end end return x 5 Notation x q = min ( x - aq) aÎ Note that x mod q < y implies x-y/2q < y/2 6 The Hidden Number Problem Suppose we know numbers ti, ui such that ati - ui q < q / 2 z 7 The Hidden Number Problem Suppose we know numbers ti, ui such that ati - ui q < q / 2 z We can construct a lattice æ z ç 2 ×q ç ç ç ç 2z × t 1 è And a vector (2 ö ÷ ÷ ÷ 2z × q ÷ 2 z × td 1 ÷ø × u1,… , 2 × ud , 0) Which is very close to a lattice vector that depends on α. z z 8 HNP and ECDSA k = ( r / s) × a + ( h / s) mod q Recall that z We want ati - ui q < q / 2 In terms of k: k 0 n Or in terms of ti and ui: ( r / s) × a + ( h / s) q = k 9 HNP and ECDSA k = ( r / s) × a + ( h / s) mod q Recall that z We want ati - ui q < q / 2 In terms of k: k a n l 0 Or in terms of ti and ui: ( r / s) × a - (- ( h / s )) q < q 10 HNP and ECDSA k = ( r / s) × a + ( h / s) mod q Recall that z We want ati - ui q < q / 2 In terms of k: k-a 0 n l 0 Or in terms of ti and ui: ( r / s) × a - ( a - ( h / s)) q < q 11 HNP and ECDSA k = ( r / s) × a + ( h / s) mod q Recall that z We want ati - ui q < q / 2 In terms of k: l k a / 2 ( ) 0 n-l Or in terms of ti and ui: (( r / s) × a - ( a - ( h / s))) / 2 l q <q/2 l 12 HNP and ECDSA k = ( r / s) × a + ( h / s) mod q Recall that z We want ati - ui q < q / 2 In terms of k: ( k - a - q / 2) / 2l q < q / 2l+1 Or in terms of ti and ui: ( 0 n-(l+1) ) l l+1 r / s × a a h / s + q / 2 / 2 < q / 2 ( ) ( ( ) ) q 13 HNP and ECDSA – State of the Art • Useful information: – l bits for l known LSBs – Between l-1 and l bits for l known MSBs – l/2 bits for arbitrary l consecutive bits • Liu and Nguyen 2013 – 160 bit key, 100 signatures, 2 known bits – Expected 200 observed signatures 14 The X86 Cache • Memory is slower than the processor • The cache utilises locality to bridge the gap Processor – Divides memory into lines – Stores recently used lines Cache • Shared caches improve performance for multi-core processors Memory 15 Cache Consistency • Memory and cache can be in inconsistent states Processor – Rare, but possible • Solution: Flushing the cache contents Cache – Ensures that the next load is served from the memory Memory 16 The FLUSH+RELOAD Technique • Exploits cache behaviour to leak information on victim access to shared memory. – Shared text segments – Shared libraries – Memory de-duplication • Spy monitors victim’s access to shared code – Spy can determine what victim does – Spy can infer the data the victim operates on 17 FLUSH+RELOAD • FLUSH memory line • Wait a bit • Measure time to RELOAD line Processor Cache – slow-> no access – fast-> access • Repeat Memory 18 Uses • OpenSSL AES (Gullasch et al. 2011) • GnuPG RSA (CVE 2013-4242) [Yarom & Falkner 2014] • OpenSSL ECDSA over binary fields (CVE 20140076) [Yarom & Benger 2014] • OpenSSL ECDSA over prime fields – this work. • OpenSSL AES (cross-VM) [Irazoqui et al. 2014] • PaaS clouds [Zhang et al. 2014] 19 Attacking OpenSSL wNAF • Achieve sharing with the victim code • Use FLUSH+RELOAD to recover the double and add chain of the wNAF calculation • Divide time into slots of 1200 cycles (about 0.4μs) • In each slot, probe a memory line in the code of the Double and Add functions. 20 Problem I - Speculative Execution x=0 for i=n-1 downto 0 x = Double(x) if (di≠0) then x = Add(x, [di]G) end end return x Solution: Place probe here 21 Problem II - Overlap (A) Victim Attacker (B) Victim Attacker (C) Victim Attacker (D) Victim Attacker Attacker Flush Wait Victim Reload Access Something else 22 Problem II - Overlap Victim if (A) (!BN_mod_sub_quick(n0, n2, &r->X, p)) goto err; Attacker if (!field_mul(group, n0, n1, n0, ctx)) goto err; if (!BN_mod_sub_quick(&r->Y, n0, n3, p)) goto err; (B) Victim Attacker (C) Victim Attacker (D) 0x59d62a: 0x59d62f: 0x59d632: 0x59d637: 0x59d63a: 0x59d63d: 0x59d641: mov 0x8(%rsp),%rcx mov %rbx,%r8 mov 0x18(%rsp),%rdx mov %rbp,%rdi mov %rcx,%rsi callq *0x38(%rsp) test %eax,%eax Victim Attacker Attacker Flush Wait Victim Reload Access Something else 23 Demo 24 Sample Trace Raw: D||||D|D|||D||||A||||D|||D||||D|||D|||A|A|||D|||D||| |D|||D||||D|||D|||A||||D|||D|D|||D|||D||||D||A|A|||D |||D|D|||D|||D|||A||||D|||D|||D|||D|||D|||A|A||||D|| |D|||D||||D||A|A|||D||||D|||D|||D|||A|||D|||D|||D|D| ||D|||D|||A||||D|||D|||D|||D|D|||D|||D|||D|||A|||D|D |||D||… Processed: DDDADDDDADDDDDDADDDDDADDDDADDDDDADDDDADDDDADDDDDADDD DDDDADDDDADDDDADDDDDADDDDADDDDDDDADDDDDDADDDDADDDDAD DDDADDDDADDDDADDDDDADDDDDADDDDADDDDADDDDADDDDADDDDAD DDDADDDDDDDADDDDDADDDDADDDDDDADDDDADDDDDDADDDDDADDDD ADDDDDADDDDDADDDDDADDDDDADDDDXDDDADDDDADDDDADDDDADDD DDADDDDADDDDDDADDDDDADDDDADDDDDDADDDDDDADDDDADD 25 Using the LSBs The trace: DDDADDDDADDDDD…DDDDDDADDDDADD Reveals 3 LSBs (100). A different trace might reveal fewer bits. How do we deal with that? æ z ç 2 ×q ç ç ç ç 2z × t 1 è ö ÷ ÷ ÷ z 2 ×q ÷ 2 z × td 1 ÷ø z z 2 × u ,… , 2 × ud , 0) ( 1 26 Using the LSBs The trace: DDDADDDDADDDDD…DDDDDDADDDDADD Reveals 3 LSBs (100). A different trace might reveal fewer bits. How do we deal with that? æ z1 ç 2 ×q ç ç ç ç 2 z1 × t 1 è ö ÷ ÷ ÷ zd 2 ×q ÷ 2 zd × td 1 ÷ø z1 zd 2 × u ,… , 2 × ud , 0) ( 1 We vary the z per (ti, ui) tuple. 27 Results • For secp256k1 Expected # Sigs Time (s) d Success Prob. Time / Prob. 200 220 240 100 110 60 611.13 79.67 2.68 .035 .020 .005 17460 3933 536 260 280 300 65 70 75 2.26 4.46 13.54 .055 .295 .530 41 15 26 28 Using More Information DDDADDDDADDDDDDADDDDDADDDDADDDDDADDDDADDDDADDDDDADDD DDDDADDDDADDDDADDDDDADDDDADDDDDDDADDDDDDADDDDADDDDAD DDDADDDDADDDDADDDDDADDDDDADDDDADDDDADDDDADDDDADDDDAD DDDADDDDDDDADDDDDADDDDADDDDDDADDDDADDDDDDADDDDDADDDD ADDDDDADDDDDADDDDDADDDDDADDDDXDDDADDDDADDDDADDDDADDD DDADDDDADDDDDDADDDDDADDDDADDDDDDADDDDDDADDDDADD We know how to use the revealed LSBs But these give an average of 2 bits per observed signature. 29 Using More Information DDDADDDDADDDDDDADDDDDADDDDADDDDDADDDDADDDDADDDDDADDD DDDDADDDDADDDDADDDDDADDDDADDDDDDDADDDDDDADDDDADDDDAD DDDADDDDADDDDADDDDDADDDDDADDDDADDDDADDDDADDDDADDDDAD DDDADDDDDDDADDDDDADDDDADDDDDDADDDDADDDDDDADDDDDADDDD ADDDDDADDDDDADDDDDADDDDDADDDDXDDDADDDDADDDDADDDDADDD DDADDDDADDDDDDADDDDDADDDDADDDDDDADDDDDDADDDDADD We know how to use the revealed LSBs But these give an average of 2 bits per observed signature. Can we use the information about the MSBs? 30 Using the MSBs Assume dm+l,dm≠0 Before adding [dm]G, x is: 1000…000 0 l+1 x=0 for i=n-1 downto 0 x = Double(x) if (di≠0) then x = Add(x, [di]G) end end return x 31 Using the MSBs Assume dm+l,dm≠0 Before adding [dm]G, x is: 1000…000 0 l+1 After adding [dm]G, for dm>0 it is And for dm<0 100…00 w 0 l+1 011…11 w 0 l+1 x=0 for i=n-1 downto 0 x = Double(x) if (di≠0) then x = Add(x, [di]G) end end return x 32 Using the MSBs Assume dm+l,dm≠0 Before adding [dm]G, x is: 1000…000 0 l+1 After adding [dm]G, for dm>0 it is And for dm<0 011…11 w 0 l+1 Either way, k+2m+w n 100…00 w 0 l+1 100…00 m+l+1 m+w+1 x=0 for i=n-1 downto 0 x = Double(x) if (di≠0) then x = Add(x, [di]G) end end return x 0 33 Observation For many “standard” curves, q is close to a power of two. That is, q=2n-ε such that |ε|<2p for p≪n. For example for secp256k1 q=FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF EBAAEDCE6AF48A03BBFD25E8CD0364141 Adding or subtracting q to an n bit number is unlikely to change the MSBs 34 Using all the Information For m>p k + 2 m+w a n (k + 2 ) × 2 m+w (k + 2 ) × 2 m+w (k(+k 2+ 2 ) × )2× 2 m+w m+w n-m-l-1 n-m-l-1 100…00 n+w-l n a - 2n-1 - 2- 2 - a2 - a2+ ae n-1n-1 n n 0 0 { + 0 n+w-l 0 n-m-l-1 0 n 0 n-m-l-1 0 a n n-m-l-1 n-m-l-1 100…00 m+l+1 m+w+1 0 n+w-l n+p-m-l-1 n-m-l-1 aε 0 0 } 35 Using all the Information For m>p k + 2 m+w a n (k + 2 ) × 2 m+w (k + 2 ) × 2 m+w (k + 2 ) × 2 m+w n-m-l-1 n-m-l-1 n-m-l-1 100…00 n+w-l n a - 2n-1 -2 - a2 + ae n m+w n-m-l-1 0 { + n+w-l - 2n-1 - aq 0 n-m-l-1 0 0 n+w-l n 0 m+l+1 0 n+p-m-l-1 (k + 2 ) × 2 0 0 a n n-1 100…00 m+l+1 m+w+1 n-m-l-1 aε 0 0 } 0 n n+w-l 0 36 Using all the Information For m>p (k + 2 ) × 2 m+w (k + 2 ) × 2 m+w n-m-l-1 n-m-l-1 - 2n-1 0 a n+w-l n -2 n-1 - a2 + ae n { + 0 0 0 n+w-l n n+p-m-l-1 (k + 2 ) × 2 m+w n-m-l-1 (k + 2 ) × 2 n-m-l-1 (k + 2 ) × 2 n-m-l-1 m+w m+w - 2n-1 - aq 0 n-m-l-1 n-m-l-1 aε 0 0 } 0 n+w-l n - 2n-1 - aq - 2n+w-l-1 0 0 n n+w-l-1 0 - 2n-1 - 2 n+w-l-1 < 2n+w-l-1 » q / 2l-w+1 q n+w-l-1 0 37 Results # perfect traces 10 11 12 13 Time (s) Success probability 2.25 4.66 7.68 0.07 0.25 0.38 11.3 0.54 With a very high probability, observing 25 signatures yields more than 13 perfect traces. 38 Summary • FLUSH+RELOAD provides a nearly perfect crosscore, cross-VM side channel • No need for a fixed number of bits in HNP • Can handle the negative digits in wNAF • Can handle non-consecutive bits • Curve choice allows using almost half of the information in each perfect trace • We can break a 256 bit curve after observing as little as 25 signatures. 39