Authentication Methods: From Digital Signatures to Hashes

Lecture Motivation
We have looked at confidentiality services and examined the information-theoretic framework for security.
Confidentiality between Alice and Bob only guarantees that Eve cannot read the message. It does not address:
– Is Alice really talking to Bob?
– Is Bob really talking to Alice?
In this lecture, we will look at the following problems:
– Entity Authentication: Proof of the identity of an individual
– Message Authentication (data origin authentication): Proof that the source of information really is what it claims to be
– Message Signing: Binding information to a particular entity
– Data Integrity: Ensuring that information has not been altered by unknown entities

Lecture Outline
Discrete Logarithms and ElGamal:
– Primitive elements and some more number theory (quickly)
– DLOG
– ElGamal, another public key algorithm…
Digital Signatures:
– The basic idea
– RSA Signatures and ElGamal Signatures
– Inefficiencies: Hashing and Signing
Hash Functions:
– Definitions and terminology
– CHP Hash
– SHA-1
Message Authentication Codes
Note: Some attacks will be discussed. More attacks and cryptanalysis will come later in the semester.

Primitive Roots
Consider the following powers of 3 (mod 7):
$3^1 \equiv 3,\ 3^2 \equiv 2,\ 3^3 \equiv 6,\ 3^4 \equiv 4,\ 3^5 \equiv 5,\ 3^6 \equiv 1 \pmod 7$
Note that we obtain all nonzero numbers mod 7. When this happens, we call 3 a primitive root (or generator) mod 7.
Is every number a primitive root? No. For example, the powers of 2 (mod 7) only take the values 2, 4, 1.
If p is prime, there are $\varphi(p-1)$ primitive roots mod p.
How to find them? Good homework problem…
Proposition: Let g be a primitive root for the prime p.
1. If n is an integer, then $g^n \equiv 1 \pmod p$ if and only if $n \equiv 0 \pmod{p-1}$.
2. If j and k are integers, then $g^j \equiv g^k \pmod p$ if and only if $j \equiv k \pmod{p-1}$.
Proof: We sketch (1) on the board.
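The primitive root example can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the helper names (prime_factors, is_primitive_root, euler_phi) are hypothetical, and the test used (g is a primitive root exactly when g^((p-1)/q) is not 1 mod p for every prime q dividing p-1) is a standard criterion that the slides do not state.

```python
from math import gcd

def prime_factors(n):
    """Return the set of distinct prime factors of n (simple trial division)."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def is_primitive_root(g, p):
    """True if the powers of g give every nonzero residue mod the prime p."""
    if g % p == 0:
        return False
    # Assumed criterion (not from the slides): g is a primitive root
    # iff g^((p-1)/q) != 1 (mod p) for every prime q dividing p-1.
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))

def euler_phi(n):
    """Euler's phi function by direct counting (fine for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

p = 7
powers_of_3 = [pow(3, i, p) for i in range(1, p)]            # 3^1 .. 3^6 mod 7
roots = [g for g in range(1, p) if is_primitive_root(g, p)]  # all primitive roots mod 7

print(powers_of_3)
print(roots, euler_phi(p - 1))
```

For p = 7 the list of primitive roots has the same length as the value of phi(p-1), which matches the count stated on the slide.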

Discrete Logarithms
Let p be a prime, and let α and β be nonzero integers (mod p) with
$\beta \equiv \alpha^x \pmod p$
The problem of finding x is called the discrete logarithm problem, and is written:
$x = L_\alpha(\beta)$
Often α will be a primitive root mod p.
The discrete log behaves like the normal log in many ways. For a primitive root α:
$L_\alpha(\beta_1 \beta_2) \equiv L_\alpha(\beta_1) + L_\alpha(\beta_2) \pmod{p-1}$
Generally, finding the discrete log is a hard problem.
$f(x) = \alpha^x \pmod p$ is an example of a one-way function.

ElGamal Public Key Cryptosystem
One-way functions are often used to construct public key cryptosystems. We saw one in RSA; we now show an example using the DLOG problem.
Alice wants to send m to Bob.
Bob chooses a large prime p and a primitive root α. We assume 0 < m < p.
Bob also chooses a secret integer a and computes $\beta \equiv \alpha^a \pmod p$.
Bob’s public key is (p, α, β).
Alice does:
1. Chooses a secret random integer k and computes $r \equiv \alpha^k \pmod p$.
2. Computes $t \equiv \beta^k m \pmod p$.
3. Sends (r, t) to Bob.
Bob decrypts by:
$t r^{-a} \equiv m \pmod p$
This works because $t r^{-a} \equiv \beta^k m \, \alpha^{-ak} \equiv \alpha^{ak} \alpha^{-ak} m \equiv m \pmod p$.

ElGamal Public Key Cryptosystem, pg. 2
Important issues:
– a must be kept secret, else Eve can decrypt.
– Eve sees (r, t): t is m multiplied by the random-looking value $\beta^k$ and hence looks random. Knowing r does not really help, as Eve would need to be able to solve DLOG in order to get k.
Very important: A different random k must be used for each message!
– If we have $m_1$ and $m_2$ and use the same k, then the ciphertexts will be $(r, t_1)$ and $(r, t_2)$.
– If Eve ever finds $m_1$, then she has $m_2$ also:
$t_1 / m_1 \equiv \beta^k \equiv t_2 / m_2 \pmod p \quad\Rightarrow\quad m_2 \equiv t_2 m_1 / t_1 \pmod p$

Overview of Digital Signatures
Suppose you have an electronic document (e.g. a Word file). How do you sign the document to prove to someone that it belongs to you?
You can’t use a scanned signature at the end: it is easy to forge and to reuse elsewhere.
Conventional signing can’t work in the digital world.
We require a digital signature to satisfy:
1. Digital signatures can’t be separated from the message and attached to another message.
2. The signature needs to be verifiable by others.
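
Returning to the ElGamal cryptosystem from earlier in this lecture, the encryption, decryption and reused-k attack can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameters p = 2579 and α = 2 are toy values chosen here (far too small for real use), the function names are hypothetical, and pow with a negative exponent requires Python 3.8 or later.

```python
import secrets

# Toy parameters (assumptions for illustration only, not from the lecture).
p = 2579      # a small prime; p - 1 = 2 * 1289
alpha = 2     # a primitive root mod 2579

def keygen():
    """Bob picks a secret a and publishes beta = alpha^a mod p."""
    a = secrets.randbelow(p - 2) + 1          # 1 <= a <= p-2
    return a, pow(alpha, a, p)

def encrypt(m, beta, k):
    """Alice computes r = alpha^k and t = beta^k * m (mod p)."""
    r = pow(alpha, k, p)
    t = (pow(beta, k, p) * m) % p
    return r, t

def decrypt(r, t, a):
    """Bob recovers m = t * r^(-a) (mod p)."""
    return (t * pow(r, -a, p)) % p

def recover_m2(t1, t2, m1):
    """Eve's attack when k is reused: m2 = t2 * m1 / t1 (mod p)."""
    return (t2 * m1 * pow(t1, -1, p)) % p

a, beta = keygen()
m1, m2 = 1299, 2024                           # two messages with 0 < m < p
k = secrets.randbelow(p - 2) + 1              # the SAME k used twice (the mistake)
r1, t1 = encrypt(m1, beta, k)
r2, t2 = encrypt(m2, beta, k)

print(decrypt(r1, t1, a) == m1)               # Bob decrypts correctly
print(recover_m2(t1, t2, m1) == m2)           # Eve gets m2 from m1 without knowing a or k
```

Note that Eve's function never touches the secret a or the value k; knowing one plaintext is enough once k has been reused.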

An Application for Digital Signatures
Suppose we have two countries, A and B, that have agreed not to test any nuclear bombs (which produce seismic waves when detonated). How can A monitor B by using seismic sensors?
1. The sensors need to be in country B, but A needs to access them. There is a conflict here.
2. Country B wants to make sure that the message sent by the seismic sensor does not contain “other” data (espionage).
3. Country A, however, wants to make sure that the data has not been altered by country B. (Assumption: the sensor itself is tamper-proof.)
How can we solve this problem?

Treaty Verification Example
RSA provides a solution:
1. Country A makes an RSA public/private key pair. (n, e) are given to B, but (p, q, d) are kept private in the tamper-proof sensor.
2. The sensor collects data x, uses d to encrypt it as $y \equiv x^d \pmod n$, and sends x and y to country B.
3. Country B takes x and y and calculates $z \equiv y^e \pmod n$.
4. If z = x, then B can be sure that the encrypted message corresponds to x. B then forwards (x, y) to A.
5. Country A checks that $y^e \equiv x \pmod n$. If so, then A is sure that x has not been modified, and A can trust x as being authentic.
In this example, it is hard for B to forge (x, y), and hence if (x, y) verifies, A can be sure that the data came unaltered from the sensor.

RSA Signatures
The treaty example is an example of RSA signatures. We now formalize it with Alice and Bob.
Alice publishes $(n, e_A)$ and keeps $(p, q, d_A)$ private.
Alice signs m by calculating $y \equiv m^{d_A} \pmod n$. The pair (m, y) is the signed document.
Bob can check that Alice signed m by:
1. Downloading Alice’s $(n, e_A)$ from a trusted third party. Guaranteeing that he gets the right $(n, e_A)$ is another problem (we’ll talk about this in a later lecture).
2. Calculating $z \equiv y^{e_A} \pmod n$. If z = m, then Bob (or anyone else) can be sure that Alice signed m.

RSA Signatures, pg. 2
Suppose Eve wants to attach Alice’s signature to another message $m_1$.
She cannot simply use $(m_1, y)$, since $y^{e_A} \not\equiv m_1 \pmod n$.
Therefore, she needs $y_1$ with $y_1^{e_A} \equiv m_1 \pmod n$.
$m_1$ looks like a ciphertext and $y_1$ like a plaintext. For Eve to make a fake $y_1$, she needs to be able to decrypt $m_1$ to get $y_1$! She can’t, due to the hardness of RSA.
Existential Forgery: Eve could choose $y_1$ first and then calculate an $m_1$ using $(n, e_A)$ via $m_1 \equiv y_1^{e_A} \pmod n$.
Now $(m_1, y_1)$ will look like a valid message and signature that Alice created, since $m_1 \equiv y_1^{e_A} \pmod n$.
Problem with existential forgery: Eve has made an $m_1$ that has a signature, but $m_1$ might be gibberish!
The usefulness of existential forgery depends on whether there is an underlying “language” structure.

Blind RSA Signatures
Sometimes we might want Alice to sign a document without knowing its contents (e.g. privacy concerns: a purchaser does not want the Bank to know what is being purchased, but wants the Bank to authorize the purchase).
We can accomplish this with RSA signatures (Bob wants Alice to sign a document m):
1. Alice generates an RSA public and private key pair.
2. Bob generates a random k mod n with gcd(k, n) = 1.
3. Bob computes $t \equiv k^{e_A} m \pmod n$, and sends t to Alice.
4. Alice signs t following the normal RSA signature procedure by calculating $s \equiv t^{d_A} \pmod n$. Alice sends Bob s.
5. Bob computes $k^{-1} s \pmod n$. This is the signed message $m^{d_A} \pmod n$.
Verification:
$k^{-1} s \equiv k^{-1} t^{d_A} \equiv k^{-1} (k^{e_A} m)^{d_A} \equiv k^{-1} k^{e_A d_A} m^{d_A} \equiv m^{d_A} \pmod n$
Does Alice learn anything about m from t?

ElGamal Signatures
We may modify the ElGamal public key procedure to become a signature scheme.
Alice wants to sign m.
Alice chooses a large prime p and a primitive root α. Alice also chooses a secret integer a and computes $\beta \equiv \alpha^a \pmod p$.
Alice’s public key is (p, α, β). The security of the signature depends on a being kept private.
Alice does:
1. Chooses a secret random integer k with gcd(k, p−1) = 1, and computes $r \equiv \alpha^k \pmod p$.
2. Computes $s \equiv k^{-1}(m - ar) \pmod{p-1}$.
3. The signed message is the triple (m, r, s).

ElGamal Signatures, pg. 2
Bob can verify by:
1. Downloading Alice’s public key (p, α, β).
2. Computing $v_1 \equiv \beta^r r^s \pmod p$ and $v_2 \equiv \alpha^m \pmod p$.
3. Accepting the signature as valid if and only if $v_1 \equiv v_2 \pmod p$.
Verification: We have
$sk \equiv m - ar \pmod{p-1} \quad\Rightarrow\quad m \equiv sk + ar \pmod{p-1}$
Therefore
$v_2 \equiv \alpha^m \equiv \alpha^{sk + ar} \equiv (\alpha^a)^r (\alpha^k)^s \equiv \beta^r r^s \equiv v_1 \pmod p$
This scheme is believed to be secure, as long as DLOG is hard to solve.
Don’t: Choose a p with (p−1) the product of small primes, and don’t reuse k.

Wastefulness of plain signatures
In signature schemes with appendix, where we attach the signature to the end of the document, we increase the communication overhead.
If we have a long message $m = [m_1, m_2, \ldots, m_N]$, then our signed document is $\{[m_1, m_2, \ldots, m_N], [\mathrm{sig}_A(m_1), \ldots, \mathrm{sig}_A(m_N)]\}$.
This doubles the amount of data sent! We don’t want to do this when communication resources are precious (which is always!).
Solution: We need to shrink the message into a smaller representation and sign that.
Enter: Hash functions

Hash Functions
Straightforward application of digital signatures can be expensive when the message is large.
In general, many security protocols benefit from using a “digested” or “compressed” representative of a message.
– We typically need additional cryptographic properties in order for the compression operation to be useful.
This “compression function” is a hash function: h(m)
[Figure: h maps a large domain onto a smaller range.]

Hash Functions, pg. 2
Formally, a cryptographic hash function h takes an input message of arbitrary length and produces a message digest of fixed length, and satisfies:
1. Given a message m, h(m) is quick to calculate.
2. One-Way (preimage resistance): Given a digest y, it is computationally infeasible to find an m with h(m) = y.
3. Strongly Collision-Free: It is computationally infeasible to find messages $m_1 \neq m_2$ with $h(m_1) = h(m_2)$.
Can we ever have $h(m_1) = h(m_2)$? Yes. Why?
We will look at a couple of examples.

Chaum, van Heijst, Pfitzmann Hash
We may use the DLOG problem to construct a hash function.
Choose a prime p such that q = (p−1)/2 is also prime.
(There’s an algorithm for doing this, but that’s not our goal today.)
Choose two primitive roots α and β.
The hash function h(m) will take integers (mod $q^2$) to integers (mod p), so the output has roughly half as many bits as the input.
Write $m = x_0 + x_1 q$ with $0 \le x_0, x_1 \le q - 1$.
Define the hash by:
$h(m) \equiv \alpha^{x_0} \beta^{x_1} \pmod p$

CHP Hash is strongly collision-free
Proposition: If we know $m \neq m'$ with $h(m) = h(m')$, then we can solve the discrete logarithm $a = L_\alpha(\beta)$.
Proof: Will be given on the board after we cover all of the slides.

SHA-1
In order to get fast hash functions, we need to operate at the bit level. SHA-1 is one such algorithm.
Many of the popular hash functions (e.g. MD5, SHA-1) use an iterative design:
– Start with a message m of arbitrary length and break it into n-bit blocks, $m = [m_1, m_2, \ldots, m_l]$. The last block is padded to fill out a full block size.
– Message blocks are processed via a sequence of rounds using a compression function h’ which combines the current block and the result of the previous round:
$X_j = h'(X_{j-1}, m_j)$
– $X_0$ is an initial value, and $X_l$ is the message digest.

SHA-1, pg. 2
In SHA-1, we pad according to the rule:
– Start with a message m of arbitrary length and break it into n-bit blocks.
– The last block is padded with a 1 followed by enough 0 bits to make the new message 64 bits short of a multiple of 512 bits in length.
– Into the 64 unfilled bits of the last block, we append the 64-bit representation of the length T of the message.
– Overall, we have $L = \lceil (T + 65)/512 \rceil$ blocks of 512 bits (roughly $T/512 + 1$).
– The appended message becomes $m = [m_1, m_2, \ldots, m_L]$.

SHA-1, pg. 3 (Basic Operations)
We will need the following bit operations:
[Table of bit operations not recovered from the slide.]

SHA-1, pg. 4 (Basic Algorithm)
[Figure not recovered from the slide.]

SHA-1, pg. 5 (Inside the Alg.)
Initial 160-bit register $X_0 = [H_0, H_1, H_2, H_3, H_4]$

SHA-1, pg. 6 (Subregister Operations)
• The operations done by $f_t(B, C, D)$ depend on the round number t
• The word $W_t$ depends on the round number t
• The constant $K_t$ depends on the round number t

Message Authentication Codes
A message authentication code (MAC) is a function that is used to detect alteration of messages:
– MACs use a shared key K between Alice and Bob.
– Alice will send not only the message m, but also $\mathrm{MAC}_K(m)$.
– Bob checks whether the attached MAC matches what he calculates.
– Eve cannot alter the message undetected because she does not have K.
The MAC takes two inputs: the key K and a message m of arbitrary size.
Ideally, a MAC should be a random mapping from all possible inputs to n bits of output.
The uncertainty (and security) of the MAC is directly associated with the size of the key K.
– Remember: to Eve, the message is known, so it’s the key that contains the security.

CBC-MAC
CBC-MAC is a method for turning a block cipher into a MAC:
– Idea: encrypt m using CBC mode and throw away all but the last block of ciphertext.
– For message blocks $P_1, P_2, \ldots, P_k$, the MAC is calculated by
$H_0 = IV, \qquad H_i = E_K(P_i \oplus H_{i-1}), \qquad \mathrm{MAC} = H_k$
Do not use the same key for encryption (confidentiality) and authentication!

CBC-MAC, pg. 2
Be careful when using CBC-MAC. Here’s a possible protocol failure:
Observe: Fix K. If MAC(a) = MAC(b), then MAC(a||c) = MAC(b||c), where c is a single block in length:
$\mathrm{MAC}(a \| c) = E_K(c \oplus \mathrm{MAC}(a)) = E_K(c \oplus \mathrm{MAC}(b)) = \mathrm{MAC}(b \| c)$
1. Now, suppose the attacker collects many MAC values and finds a collision. This gives a and b for which MAC(a) = MAC(b).
2. If the attacker can get the sender to authenticate (a||c) (how is another matter…), then the attacker can replace the message being sent to the receiver with (b||c).
Comment: It’s not an easy attack to do, but it is a possible weakness!

CBC-MAC, pg. 3
Practical Implementation Details:
1. Generally, if your message is m, do not just calculate MAC(m). Rather, make an intermediate message s = (l||m), where l is the length of m in a fixed-length format.
2. Pad s to be a multiple of the block size.
3. Apply CBC-MAC to the padded string s.
4. Output the last ciphertext block. Do not output any intermediate block values!
CBC-MAC can reuse the same code as the confidentiality (encryption) functions.
CBC-MAC is generally tough to use correctly, though.

HMAC
We may also use hash functions to build MACs.
We cannot simply use $\mathrm{MAC}_K(m) = h(K \| m)$ or $h(m \| K)$:
– Having the key at the front allows for length extension attacks.
– Having the key at the end allows for key-recovery attacks.
The designers of HMAC considered these issues.
HMAC computes
$\mathrm{MAC}_K(m) = h\big( (K \oplus a) \,\|\, h( (K \oplus b) \,\|\, m ) \big)$
where a and b are specified constants.
HMAC has been around for a while and has been cryptanalyzed. It’s the preferred MAC to use.

Using MACs
We must be careful using MACs.
If Alice sends Bob $[m \| \mathrm{MAC}_K(m)]$ and Eve records this, she may send it again at a later time (the replay attack!).
Generally, you want to authenticate not just the message, but the context. That is, you want to authenticate m and additional data d (such as message number, source, destination, protocol identifier, sizes for different fields, etc.).
Why all these possibilities? If you tie the message to the specific context, then it is harder for an adversary to manipulate context fields to forge.
Make certain, though, that you have clear rules on how to split concatenations (d||m) back into d and m.

Problems with Hashes
We must be careful when using hash functions, as they are subject to some “attacks”.
Length Extension Attack: Consider a block-based hash like SHA-1, with input blocks $m = (m_1, m_2, \ldots, m_k)$ and hash h(m).
A new message $m' = (m_1, m_2, \ldots, m_k, m_{k+1})$ will have hash $h(m') = h'(h(m), m_{k+1})$, where h’ is the compression sub-function.
In systems, such as authentication applications, where we calculate h(X||m), Eve can append extra text to m and also update the hash, without knowing X.
Partial Message Collision Attack: Suppose we are able to find m and m’ such that h(m) = h(m’).
If a system uses h(m||X) as an authentication parameter, then due to the iterative nature of the hash, h(m||X) = h(m’||X). An adversary can replace m with m’ during authentication.
To avoid these problems, in general hashing practice we really use f(m) = h(h(m)||m) or f(m) = h(h(m)) as the hash.
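
The two hash-based constructions from this lecture, the doubled hash f(m) = h(h(m)||m) and HMAC, can be sketched with Python's standard hashlib and hmac modules. This is a minimal sketch under assumptions: SHA-1 is used only because it is the hash discussed in these slides, the function names are hypothetical, the constants a = 0x5C and b = 0x36 (repeated over one block) and the key-padding rule follow the published HMAC specification rather than anything stated on the slides, and the "|"-separated context fields are purely illustrative.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-1 processes 512-bit (64-byte) blocks

def h(data: bytes) -> bytes:
    """The underlying hash h (SHA-1 here, as in the lecture)."""
    return hashlib.sha1(data).digest()

def double_hash(m: bytes) -> bytes:
    """f(m) = h(h(m) || m)."""
    return h(h(m) + m)

def hmac_sha1(key: bytes, m: bytes) -> bytes:
    """HMAC_K(m) = h((K xor a) || h((K xor b) || m))."""
    if len(key) > BLOCK_SIZE:
        key = h(key)                        # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")    # then zero-padded to one block
    outer = bytes(x ^ 0x5C for x in key)    # K xor a
    inner = bytes(x ^ 0x36 for x in key)    # K xor b
    return h(outer + h(inner + m))

key = b"shared secret K"
# Authenticate context d together with the message m (illustrative encoding).
msg = b"seq=42|from=alice|to=bob|" + b"pay Bob 10 dollars"
tag = hmac_sha1(key, msg)

# Cross-check the hand-written construction against the standard library.
print(hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha1).digest()))
```

In practice one would call the library's hmac functions directly; the hand-written version is shown only to make the nesting of the two hash calls visible.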