Kleptography: The outsider inside your crypto devices (and its trust implications)
MOTI YUNG, Columbia U. / RSA Labs. Joint work with Adam Young

Approach to Trustworthy Computing
- Have the O.S. / HW control sensitive actions (even at the browser level)
- Employ strong crypto
- Hopefully, tamper-proof crypto (hardware)
- Have worthwhile certification of public keys (not everyone with access to the browser is a CA – a common commercial practice that ruins the name of PKI)
Now we have strong assurance in credentials (server side as well as client side and proxy side). This is (maybe costly, however it is) wise… but…

What is Kleptography?
Kleptography is the study of stealing information securely and subliminally out of your most trusted system component: a tamper-proof crypto-device or un-scrutinized crypto-software.
Types of information that we want to steal:
- Private decryption keys / signing keys
- Symmetric decryption keys
- Confidential data (industrial secrets, military secrets, national secrets)
Kleptography is dedicated to (re)searching ways of obtaining such data in an undetectable fashion with high security guarantees. It is a formal cryptographic study of backdoor designs, going beyond the naïve common attacks that are detectable (e.g., weak random generation).

What is the goal of kleptography?
To develop a robust backdoor within a cryptosystem that:
1) Provides the attacker with the desired secret information (e.g., the private key of the unwary user)
2) Cannot be detected in black-box implementations (I/O access only to a hardware box / software) except by the attacker
3) If a reverse-engineer (i.e., not the attacker) breaches the black-box, then the previously stolen information remains confidential (secure against reverse-engineering). Ideally, confidentiality holds going forward as well. The successful reverse-engineer will learn that the attack is carried out, BUT will be unable to use the backdoor.
It is the design of cryptographic Trojan horses that are robust against reverse-engineering.

Asymmetric Klepto Talk Road Map
- Kleptographic attack on RSA key generation (motivates the notion of a SETUP)
- Definition of a Secretly Embedded Trapdoor with Universal Protection (SETUP)
- Kleptographic attack on the Diffie-Hellman key exchange
- Kleptographic attack on the Digital Signature Algorithm (DSA)

Kleptographic Theft of RSA Private Key
Problem: To devise a backdoor (i.e., a way to covertly obtain the RSA private keys of users) that can be deployed in an RSA [RSA78] key generation program such that:
- The backdoor can only be utilized by the attacker, even if the code is obtained and scrutinized (confidentiality).
- The resulting RSA key pair must “look like” a normal RSA key pair (indistinguishability).
- The same copy of the key generation program is obtained by everyone (it may be code-signed, for instance).
Observation: A pseudorandom bit generator that uses a fixed secret seed does not accomplish this. The seed (or seeds) will be revealed to the reverse-engineer, and with it the resulting pseudorandom bit sequences (statistical tests may also reveal such an attack). An asymmetric backdoor, built on the attacker's public key, instead provides the attacker with an exclusive advantage that is maintained even after reverse-engineering.

Algorithms that can be attacked
By compromising RSA key generation using a SETUP, we can compromise:
- RSA [RSA78]
- Rabin [Ra79]
- Properly padded RSA: RSA-Optimal Asymmetric Encryption Padding (OAEP) [BR95,FIPS01,Sh01]
- And others…
Certification / Validation: Simple zero-knowledge protocols will not:
- reveal that a SETUP attack has taken place
- inhibit the operation of the SETUP attack in any way.
Normal RSA Key Generation
Let e be the public RSA exponent that is shared by all the users (e.g., e is often taken to be $2^{16}+1$).
1) choose a large number p randomly (e.g., p is 512 bits long)
2) if p is composite or $\gcd(e, p-1) \neq 1$ then goto step 1
3) choose a large number q randomly
4) if q is composite or $\gcd(e, q-1) \neq 1$ then goto step 3
5) output the public key (n = pq, e) and the private key p
Note that the private exponent d is found by solving for (d, k) in $ed + k\phi(n) = 1$ (using the extended Euclidean algorithm).

RSA Encryption/Decryption
Let d be the private exponent, where $ed \equiv 1 \pmod{(p-1)(q-1)}$.
Let $\mathbb{Z}_n^*$ denote the set of numbers in $\{1, 2, 3, \ldots, n-1\}$ that are relatively prime to n.
To encrypt $m \in \mathbb{Z}_n^*$ compute: $c = m^e \bmod n$
To decrypt the ciphertext c compute: $m = c^d \bmod n$

Kleptographic RSA Key Generation
The key generation algorithm is modified to contain a cryptotrojan. The cryptotrojan contains the attacker's public key Y. This is an earlier version of the attack [YY96,YY97]; more mature versions exist [YY04,YY05].
1) choose a large value s randomly (e.g., 512 bits)
2) compute p = H(s), where H is a cryptographic one-way function
3) if p is composite or p - 1 is not relatively prime to e then goto step 1
4) choose a large value RND randomly
5) compute c to be the asymmetric encryption of s under Y (the attacker's public key)
6) solve for (q, r) in (c || RND) = pq + r (i.e., q and r are the quotient and remainder of dividing c || RND by p)
7) if q is composite or q-1 not …..
then goto step 1
8) output the public key (n = pq, e) and the private key p
Note that n is about 1024 bits in length.

Recovering the RSA Private Key
The private key is recovered as follows:
- The attacker obtains the public key (n, e) of the user
- Let u be the 512 uppermost bits of n
- The attacker sets $c_1 = u$ and $c_2 = u + 1$ ($c_2$ accounts for a potential borrow bit having been taken in the computation $n = pq = (c \,\|\, \mathit{RND}) - r$)
- The attacker decrypts $c_1$ and $c_2$ to get $s_1$ and $s_2$, respectively
- Either $p_1 = H(s_1)$ or $p_2 = H(s_2)$ will divide n
Only the attacker can perform this operation, since only the attacker knows the private decryption key corresponding to Y.

Definition of a SETUP
A SETUP attack is an algorithmic modification C' of a cryptosystem C with the following properties:
1) Halting Correctness: C and C' are efficient algorithms.
2) Output Indistinguishability: The outputs of C and C' are computationally indistinguishable to all efficient algorithms except for the attacker A.
3) Confidentiality of C: The outputs of C do not compromise the security of the cryptosystem that C implements.
4) Confidentiality of C': The outputs of C' only compromise the security of the cryptosystem that C' implements with respect to the attacker A (and not against the traditional adversary).
5) Ability to compromise C': With overwhelming probability the attacker A (a new shadow party) can efficiently decrypt, forge, or otherwise cryptanalyze at least one private output of C', given a sufficient number of public outputs of C'.

Formal Aspects
- In the CT-RSA-05 proceedings there is a formal security model and definitions.
- There are two types of attackers (traditional and insider).
- The design employs tools of modern cryptography: indistinguishability, the random oracle assumption regarding strong one-way hash functions, etc.
- There is a proof of security of the design (in the model).
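The key generation and recovery steps above can be exercised end to end in a toy Python sketch. It rests on loudly stated assumptions that are not in the slides: sizes are shrunk (a 128-bit p instead of 512 bits), H is a hypothetical SHA-256-based stand-in for the one-way function, and the attacker's asymmetric encryption under Y is textbook RSA with a roughly 128-bit modulus, chosen only so that c fits in the upper half of n. All helper names (is_probable_prime, klepto_keygen, attacker_recover, ...) are hypothetical. It needs Python 3.8+ for pow(x, -1, m).

```python
import hashlib
import secrets

E = 65537     # public RSA exponent 2^16 + 1, shared by all users
HALF = 128    # toy bit size of p and of the field holding c (the slides use 512)

def is_probable_prime(n, rounds=32):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % sp == 0:
            return n == sp
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)   # random base in [2, n-2]
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                               # witness found: composite
    return True

def random_prime(bits):
    """Random prime with exactly `bits` bits."""
    while True:
        cand = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(cand):
            return cand

def H(s, bits=HALF):
    """Hypothetical stand-in for H: SHA-256 over (counter || s) blocks,
    truncated to `bits` bits, with the top and bottom bits forced to 1."""
    data = s.to_bytes((s.bit_length() + 7) // 8 or 1, "big")
    out = b"".join(hashlib.sha256(i.to_bytes(4, "big") + data).digest()
                   for i in range((bits + 255) // 256))
    return (int.from_bytes(out, "big") >> (len(out) * 8 - bits)) | (1 << (bits - 1)) | 1

def attacker_keygen(bits=HALF):
    """Attacker's toy textbook-RSA key pair (Y, private); an assumed stand-in
    for the asymmetric encryption of step 5."""
    while True:
        a, b = random_prime(bits // 2), random_prime(bits // 2)
        phi = (a - 1) * (b - 1)
        if a != b and phi % E != 0:                    # E is prime: gcd(E, phi) == 1
            return (a * b, E), (a * b, pow(E, -1, phi))

def klepto_keygen(Y):
    """Steps 1-8 of the kleptographic RSA key generation, at toy sizes."""
    N_m, e_m = Y
    while True:
        s = secrets.randbelow(N_m - 2) + 2             # 1) random s (kept below N_m)
        p = H(s)                                       # 2) p = H(s)
        if not is_probable_prime(p) or (p - 1) % E == 0:
            continue                                   # 3) goto step 1
        rnd = secrets.randbits(HALF)                   # 4) random RND
        c = pow(s, e_m, N_m)                           # 5) c = Enc_Y(s)
        q, r = divmod((c << HALF) | rnd, p)            # 6) (c || RND) = pq + r
        if not is_probable_prime(q) or (q - 1) % E == 0:
            continue                                   # 7) goto step 1
        return (p * q, E), p                           # 8) public (n, e), private p

def attacker_recover(pub, priv_m):
    """Recover p from the public key alone, using the attacker's private key."""
    n, _ = pub
    N_m, d_m = priv_m
    u = n >> HALF                                      # uppermost bits of n
    for c in (u, u + 1):                               # u + 1 undoes a possible borrow
        p = H(pow(c, d_m, N_m))                        # decrypt c, then re-derive p
        if n % p == 0:
            return p
    return None

Y, X = attacker_keygen()
pub, p = klepto_keygen(Y)
n, e = pub
q = n // p
d = pow(e, -1, (p - 1) * (q - 1))                      # ed = 1 mod (p-1)(q-1)
assert pow(pow(42, e, n), d, n) == 42                  # still a working RSA key pair
assert attacker_recover(pub, X) == p
```

The usage lines only check that the backdoored key pair still works as an ordinary RSA key pair and that the attacker recovers p from (n, e); they say nothing about output indistinguishability, which depends on the real encryption scheme placed in the cryptotrojan.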
It is “fun” to use formal methodology and proof techniques to prove the “security of klepto”, which gives us a new notion in modern cryptography: that of “provable insecurity”.

Diffie-Hellman Key Exchange Parameters
Concrete parameters: Let p be a large prime such that:
- p is 768 bits long or larger
- p - 1 is divisible by a large prime q (e.g., p = 2q + 1)
- q is at least 160 bits long (with p = 2q + 1, q is only one bit shorter than p)
Let g < p be an element of $\mathbb{Z}_p^*$ with order q. (p, q) must provide a suitable setting for the discrete-logarithm problem. The parameters p, q and g are public.

The Key Exchange Problem
Alice and Bob want to establish a shared secret key over an insecure network. The network is insecure in the sense that it is vulnerable to a passive eavesdropper. Once the shared secret key is derived, they can use it to send symmetrically encrypted plaintext to each other over the network. Diffie-Hellman solves this problem.

The Diffie-Hellman Key Exchange
1) Alice chooses a < q randomly
2) Alice sends $A = g^a \bmod p$ to Bob
3) Bob chooses b < q randomly
4) Bob sends $B = g^b \bmod p$ to Alice
5) Alice computes $k = B^a \bmod p$
6) Bob computes $k = A^b \bmod p$
Observe that $k = B^a = A^b \bmod p$ since $g^{ba} = g^{ab} \bmod p$.

The Diffie-Hellman Assumption
The classic Diffie-Hellman key exchange relies on the presumed intractability of the computational Diffie-Hellman problem (indistinguishability of k from random is assured by the decisional version). A simplified version of the Diffie-Hellman problem can be stated as follows: Given (p, g, A, B), compute $k = A^b \bmod p$, where $b = \log_g B$ (the discrete logarithm of B in $\mathbb{Z}_p^*$).
The RSA key generation has a subliminal channel (half of the bits of n can be fixed and we still get a composite n). The DH problem does not have one (under the decisional assumption all bits are equally random and useful as “key material”)… So... is a subliminal channel needed? The computer science answer: if there ain't one, create one! There are many ways to establish “secure communication channels while crypto is involved”….
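A minimal Python sketch of the exchange, assuming toy parameters generated on the fly: q is a random 160-bit prime, p = jq + 1 is a 768-bit prime for a random even j (a Schnorr-style group, swapped in for the p = 2q + 1 example above), and g is obtained as $h^{(p-1)/q} \bmod p$ for a random h. The function names are hypothetical; this illustrates the arithmetic of steps 1 to 6 only and is not a vetted implementation.

```python
import secrets

def is_probable_prime(n, rounds=32):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % sp == 0:
            return n == sp
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)   # random base in [2, n-2]
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                               # witness found: composite
    return True

def random_prime(bits):
    """Random prime with exactly `bits` bits."""
    while True:
        cand = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(cand):
            return cand

def dh_params(p_bits=768, q_bits=160):
    """Toy (p, q, g): p and q prime, q | p - 1, g of order q modulo p."""
    q = random_prime(q_bits)
    j_bits = p_bits - q_bits
    while True:
        j = (secrets.randbits(j_bits) | (1 << (j_bits - 1))) & ~1   # even j, so p is odd
        p = j * q + 1
        if p.bit_length() == p_bits and is_probable_prime(p):
            break
    while True:
        g = pow(secrets.randbelow(p - 3) + 2, (p - 1) // q, p)
        if g != 1:          # g^q = 1 mod p and q is prime, so g has order exactly q
            return p, q, g

def dh_exchange(p, q, g):
    a = secrets.randbelow(q - 1) + 1       # 1) Alice chooses 0 < a < q
    A = pow(g, a, p)                       # 2) Alice sends A = g^a mod p
    b = secrets.randbelow(q - 1) + 1       # 3) Bob chooses 0 < b < q
    B = pow(g, b, p)                       # 4) Bob sends B = g^b mod p
    k_alice = pow(B, a, p)                 # 5) Alice computes k = B^a mod p
    k_bob = pow(A, b, p)                   # 6) Bob computes k = A^b mod p
    return A, B, k_alice, k_bob

p, q, g = dh_params()
A, B, k_alice, k_bob = dh_exchange(p, q, g)
assert k_alice == k_bob                    # g^(ba) = g^(ab) mod p
```

An eavesdropper sees (p, g, A, B) but, under the Diffie-Hellman assumption, cannot compute k; the later SETUP attack works by giving one extra party (the attacker) a second, hidden Diffie-Hellman relation with Alice.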
Assumptions for the DH SETUP attack
The assumptions are as follows:
1) The attacker can deploy the SETUP attack in a tamper-resistant black-box that Alice will use (Bob can use a black-box as well).
2) The black-box can store state information across invocations of the Diffie-Hellman algorithm (non-volatile memory).
3) The malicious designer can act as a passive eavesdropper on all of Alice and Bob's key exchanges.

Goal of the SETUP attack against DH
The goals of the simplified SETUP attack are:
1) To permit the malicious manufacturer to learn every other (or all but one) Diffie-Hellman shared secret k that Alice and Bob compute.
2) To prevent Alice and Bob (and everyone else) from knowing that the attack is taking place.
3) Robustness against reverse-engineering:
- If only the code for the SETUP attack is disclosed, then all shared secrets, past and future, will remain confidential.
- A single DH shared secret may be compromised if the non-volatile state information is disclosed.

Parameters for the DH SETUP attack
Parameters for the attack:
- $x_m$: private key generated by the malicious attacker for the attack. $x_m$ is randomly chosen such that $x_m < q$. $x_m$ is kept secret by the attacker (e.g., in the attacker's smart card).
- $y_m$: public key corresponding to $x_m$. Hence, $y_m = g^{x_m} \bmod p$. $y_m$ is not certified, but is placed inside the black-box that Alice uses.
- ID: a random and secret bit string in Alice's device (identifier). It should be at least 160 bits in length.
- H: a public cryptographic one-way hash function such that $H: \{0,1\}^* \to \mathbb{Z}_q$.

Intuition behind the DH SETUP attack
The idea is to have the attacker:
1) Generate a private key $x_m$ and public key $y_m = g^{x_m} \bmod p$
2) Place the public key $y_m$ in the black-box
3) Design the black-box to compute a shared secret k between Alice and the attacker during the first DH key exchange between Alice and Bob: $k = y_m^a \bmod p$
• Use pseudorandomness derived from k instead of a random exponent a in Alice's next key exchange.
This allows the attacker to learn the second Diffie-Hellman shared secret. In two DH exchanges there are three shared secrets (one extra, shadow one)!

The Diffie-Hellman SETUP Attack
First exchange:
• Alice's device sends $A_1 = g^{a_1} \bmod p$ to Bob, where $a_1 \in_R \mathbb{Z}_q$
• Alice's device stores $a_1$ in non-volatile memory
• Bob's device sends $B_1 = g^{b_1} \bmod p$ to Alice, where $b_1 \in_R \mathbb{Z}_q$
• Alice and Bob's devices compute $k_1 = g^{a_1 b_1} \bmod p$
Second exchange:
• Alice's device computes $a_2 = H(ID \,\|\, (y_m^{a_1} \bmod p))$
• Alice's device sends $A_2 = g^{a_2} \bmod p$ to Bob
• Bob's device sends $B_2 = g^{b_2} \bmod p$ to Alice, where $b_2 \in_R \mathbb{Z}_q$
• Alice and Bob's devices compute $k_2 = g^{a_2 b_2} \bmod p$

Recovering the 2nd DH Shared Secret
The attacker:
1) Obtains $A_1$ and $B_2$ via passive eavesdropping.
2) Computes $a_2 = H(ID \,\|\, (A_1^{x_m} \bmod p))$
3) Computes $k_2 = B_2^{a_2} \bmod p$
Note that: $A_1^{x_m} = g^{a_1 x_m} = y_m^{a_1} = g^{x_m a_1} \bmod p$

Security of the DH SETUP attack
Device Indistinguishability:
- Since ID is a large randomly chosen string and is secret within Alice's device, $a_2$ appears random to Alice even if the device gives $(a_1, a_2)$ to her (H acts like a random oracle).
Confidentiality w.r.t. the reverse-engineer:
- The reverse-engineer learns ID and $y_m$ (we may assume that $a_1$ is learned, and so at most $a_2$ is compromised).
- The reverse-engineer still must solve instances of the Diffie-Hellman problem to learn past DH shared secrets $k_2$.

Chaining the DH SETUP attack
The attack generalizes to reveal t out of t+1 Diffie-Hellman shared secrets. This is accomplished by chaining the use of the DH pseudorandom exponent. For example, Alice's device stores $a_2$ in non-volatile memory and computes $a_3 = H(ID \,\|\, (y_m^{a_2} \bmod p))$ instead of choosing $a_3$ uniformly at random… This is called a (t, t+1)-SETUP attack.

Parameters for the DSA
- The primes p and q are as before (but q is much smaller: 160 bits).
- g is the same as before ($g^q = 1 \bmod p$).
- The DSA signing private key is x, where x is chosen randomly modulo q.
- The DSA signature verification public key is y, where $y = g^x \bmod p$.
- SHA1 denotes the “somewhat-secure hash algorithm” [NIST180]: $\mathrm{SHA1}: \{0,1\}^* \to \{0,1\}^{160}$

DSA Signing Algorithm
Algorithm [NIST186] to sign m given (m, x, g, p, q):
• Choose k < q randomly
• Compute $r = (g^k \bmod p) \bmod q$
• Compute $s = k^{-1}(\mathrm{SHA1}(m) + xr) \bmod q$
• Output (r, s) as the digital signature on m.
• A DSA signature is only 320 bits long (r and s are 160 bits each); can those bits be exploited fast?

DSA Signature Verification Algorithm
Algorithm to verify the signature (r, s) given (m, r, s, y, g, p, q):
1) Make sure all values are contained in the correct sets
2) Compute $t = s^{-1} \bmod q$
3) The signature is valid if and only if: $r = ((g^t)^{\mathrm{SHA1}(m)} y^{tr} \bmod p) \bmod q$
Note that $r = (g^k \bmod p) \bmod q$ and $k = s^{-1}\mathrm{SHA1}(m) + x s^{-1} r \bmod q$.
Also, $(g^t)^{\mathrm{SHA1}(m)} y^{tr} \bmod p = g^k \bmod p$ [Very Important!]

Goal of DSA SETUP Attack
The goal of the SETUP attack is to leak Alice's private key x securely and subliminally to the malicious manufacturer of the DSA black-box. The malicious manufacturer uses the key pair $(y_m, x_m)$ as before. The attack seeks to leak one exponent k in every two successive digital signatures that are output. This securely transmits x to the manufacturer since (known fact): $x = r^{-1}(sk - \mathrm{SHA1}(m)) \bmod q$

Discrete Log Kleptogram
The value computed as an intermediate one during verification, $(g^t)^{\mathrm{SHA1}(m)} y^{tr} \bmod p = g^k \bmod p$, enables a SETUP attack against DSA. We call this a discrete-log kleptogram: once we have something that looks like a communicated DH value, we can exploit it (reduction to the attack on DH).

SETUP Attack Against DSA
First DSA signature computation:
1) Alice's device computes $(r_1, s_1)$ on message $m_1$, where $r_1 = (g^{k_1} \bmod p) \bmod q$ and $s_1 = k_1^{-1}(\mathrm{SHA1}(m_1) + x r_1) \bmod q$
2) Alice's device stores exponent $k_1$ in non-volatile memory.
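The verification identity marked [Very Important!] and the “known fact” for x can be checked numerically. The toy Python sketch below assumes a 768-bit p and a 160-bit q generated as a Schnorr-style group (p = jq + 1), uses hashlib's SHA-1 for SHA1, and has the hypothetical dsa_sign helper deliberately return the per-signature exponent k so that the leak can be demonstrated; a real DSA implementation must never expose k. All function names are illustrative assumptions, and Python 3.8+ is needed for pow(x, -1, m).

```python
import hashlib
import secrets

def is_probable_prime(n, rounds=32):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % sp == 0:
            return n == sp
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)   # random base in [2, n-2]
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                               # witness found: composite
    return True

def dsa_params(p_bits=768, q_bits=160):
    """Toy (p, q, g) with q | p - 1 and g of order q modulo p."""
    while True:
        q = secrets.randbits(q_bits) | (1 << (q_bits - 1)) | 1
        if is_probable_prime(q):
            break
    j_bits = p_bits - q_bits
    while True:
        j = (secrets.randbits(j_bits) | (1 << (j_bits - 1))) & ~1   # even j, so p is odd
        p = j * q + 1
        if p.bit_length() == p_bits and is_probable_prime(p):
            break
    while True:
        g = pow(secrets.randbelow(p - 3) + 2, (p - 1) // q, p)
        if g != 1:                                     # order of g is exactly q
            return p, q, g

def sha1_int(m):
    """SHA1(m) read as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(m).digest(), "big")

def dsa_sign(m, x, p, q, g):
    """Returns (r, s) and, for illustration only, the secret exponent k."""
    while True:
        k = secrets.randbelow(q - 1) + 1                  # choose k < q randomly
        r = pow(g, k, p) % q                              # r = (g^k mod p) mod q
        s = pow(k, -1, q) * (sha1_int(m) + x * r) % q     # s = k^-1 (SHA1(m) + x r) mod q
        if r != 0 and s != 0:
            return (r, s), k

def kleptogram(m, sig, y, p, q, g):
    """(g^t)^SHA1(m) * y^(t r) mod p with t = s^-1 mod q; equals g^k mod p."""
    r, s = sig
    t = pow(s, -1, q)
    return pow(g, t * sha1_int(m) % q, p) * pow(y, t * r % q, p) % p

def dsa_verify(m, sig, y, p, q, g):
    r, s = sig
    if not (0 < r < q and 0 < s < q):                     # values in the correct sets
        return False
    return kleptogram(m, sig, y, p, q, g) % q == r

def x_from_k(m, sig, k, q):
    """Known fact: x = r^-1 (s k - SHA1(m)) mod q."""
    r, s = sig
    return pow(r, -1, q) * (s * k - sha1_int(m)) % q

p, q, g = dsa_params()
x = secrets.randbelow(q - 1) + 1                          # signing private key
y = pow(g, x, p)                                          # verification public key
sig, k = dsa_sign(b"m1", x, p, q, g)
assert dsa_verify(b"m1", sig, y, p, q, g)
assert kleptogram(b"m1", sig, y, p, q, g) == pow(g, k, p)
assert x_from_k(b"m1", sig, k, q) == x
```

Here kleptogram() is simply the verification value before the final reduction mod q, which is why anyone can compute $g^k \bmod p$ from a public signature; it plays the role of Alice's DH value $A_1$ in the earlier attack.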
Second DSA signature computation:
1) Alice's device computes $k_2 = H(ID \,\|\, (y_m^{k_1} \bmod p))$
2) Alice's device computes $(r_2, s_2)$ on message $m_2$, where $r_2 = (g^{k_2} \bmod p) \bmod q$ and $s_2 = k_2^{-1}(\mathrm{SHA1}(m_2) + x r_2) \bmod q$

Recovering the 2nd signature exponent
The attacker:
1) Obtains $(m_1, r_1, s_1)$ and $(m_2, r_2, s_2)$ via passive eavesdropping.
2) Computes $t = s_1^{-1} \bmod q$
3) Computes $T = (g^t)^{\mathrm{SHA1}(m_1)} y^{t r_1} \bmod p$. Note that $T = g^{k_1} \bmod p$.
4) Computes $k_2 = H(ID \,\|\, (T^{x_m} \bmod p))$. Note that $T^{x_m} = g^{k_1 x_m} = y_m^{k_1} \bmod p$.
5) Computes $x = r_2^{-1}(s_2 k_2 - \mathrm{SHA1}(m_2)) \bmod q$

Security of the DSA SETUP attack
Device Indistinguishability:
- Note that Alice can always recover $k_1$ and $k_2$ using her private key x.
- Since ID is a large randomly chosen string and is secret within Alice's device, $k_2$ appears random to Alice.
Confidentiality w.r.t. the reverse-engineer:
- The reverse-engineer learns ID and $y_m$.
- The reverse-engineer still must solve instances of the Diffie-Hellman problem to learn past signature exponents $k_2$ (and hence x).

Conclusion
- The notion of a cryptographic backdoor that is robust against reverse-engineering was introduced (SETUP).
- A SETUP attack against RSA key generation was presented.
- A SETUP attack against Diffie-Hellman was presented.
- A SETUP attack against DSA was presented.
- These SETUP attacks naturally extend to other cryptosystems as well, e.g., Cramer-Shoup, etc. (ongoing research…)

Conclusions
In all these schemes we have a proof of security of the system (against all but the attacker) and a second security (exclusivity) proof for the attacker [Two systems in one!], and the proofs are according to modern standards….
Cryptography is about security (we know...), it is about solving seemingly paradoxical schemes (we know...), and it is also about looking for things that no one will ever look at (thus it is also about non-trivial scrutiny, namely: hacking with a purpose)….

Conclusions
- Trust relationships: the manufacturer has to be trusted and implementations scrutinized as much as possible.
- Some black-box tests will not work (under cryptographic assumptions).
- Trust within and about cryptographic systems is “tricky” (also true in dealing with other systems, but not everyone thinks about it seriously! So cryptographers ought to look at these other things…).
- Since cryptographic thinking is important in analyzing security issues, cryptographic training is important! [To catch a thief you have to think like one.]

THANK YOU!