Hash Functions
Ramki Thurimella

What is a hash function?
• Also known as a message digest or fingerprint
• Compression: a function that maps arbitrarily long binary strings to fixed-length binary strings
• Ease of computation: given a hash function and an input, it should be easy to calculate the output
• Cryptographic hash functions have more security requirements than the ones used to build hash tables

Applications
• Data integrity
– Downloads of large files often include an MD5 digest so that the integrity of the file can be verified
• Proof of ownership
– Publish the digest of a document describing a patent idea in the New York Times
– The newspaper charges by the number of characters, so a short digest is cheap to publish
• Digital signatures
– Digitally sign the digest of a message instead of the message itself to save computation time

Properties
• Collision resistance
– Ideally, for two distinct messages m1 and m2, h(m1) ≠ h(m2)
– By the pigeonhole principle this cannot always hold, since the domain is infinite and the range is finite
– The security requirement is therefore only that, given h and m1, it is computationally infeasible to find some other m2 such that both messages hash to the same value
• One-way property
– Hash functions are used in pseudorandom number generators, e.g. to generate several keys from one shared secret key
– Learning one value should not make learning other values easy

Attacks
• Ideally a hash function should be a random mapping
• Attack: a non-generic method of distinguishing the hash function from an ideal hash function
• Non-generic means that the attack exploits specific weaknesses of the hash function
• Since there is no key, exhaustive key search cannot be performed
• Generic attack: birthday attack

Birthday Attack
• Expected number of collisions after picking t values (which form $\binom{t}{2}$ pairs) with an n-bit hash: $\binom{t}{2} \cdot \frac{1}{2^n} \approx \frac{t^2}{2^{n+1}}$
• To expect about 1 collision, choose t on the order of $2^{n/2}$
• This attack is useful only for collisions
• Sometimes a pre-image (given h(m), finding m) is required, e.g. in a privacy scenario
• The generic attack for a pre-image requires $2^n$ work

A Non-Generic Attack
• An insecure hash function
1. Pad m (specifics are not important)
2. Create 128-bit blocks $m_1, m_2, \ldots, m_k$
3. $H_0$ = 128-bit block of all zeros
4. $H_i = \mathrm{AES}(H_{i-1} \oplus m_i)$
5. $H_k$ is the result
• Attack
– Pick m and assume it splits into $m_1$ and $m_2$
– Let $m_1' = m_2 \oplus H_1$ and $m_2' = H_2 \oplus m_2 \oplus H_1$
– Assume that $m_1'$ and $m_2'$ are the result of splitting m'
– Show that m and m' map to the same value (Exercise 5.5, p. 88)

The MD5 Function
• Specifies the compression function to use
• Processes 512-bit message blocks
• Produces a 128-bit digest
• Is derived from MD4 to address security concerns
• Created by Ronald Rivest and described in RFC 1321

MD5
• There are four possible functions F; a different one is used in each round
• Figure: http://en.wikipedia.org/wiki/File:MD5.svg

In the Previous Figure
• MD5 consists of 64 of these operations, grouped in four rounds of 16 operations
• F is a nonlinear function; one function is used in each round
• $M_i$ denotes a 32-bit block of the message input
• $K_i$ denotes a 32-bit constant, different for each operation
• <<<s denotes a left bit rotation by s places; s varies for each operation
• ⊞ denotes addition modulo $2^{32}$

The SHA-1 Function
• Processes 512-bit message blocks
• Produces a 160-bit digest
• Is also derived from MD4
• A slight modification of SHA-0 to fix a “technical error”
• Created by NIST with the aid of the NSA
• Defined in FIPS PUB 180-2

SHA-1
• Figure: http://en.wikipedia.org/wiki/File:SHA-1.svg
• The figure shows one iteration within the SHA-1 compression function
• A, B, C, D and E are 32-bit words of the state
• F is a nonlinear function that varies
• $W_t$ is the expanded message word of round t
• $K_t$ is the round constant of round t

Examples
% echo "0000 0000" | openssl dgst -sha512 | wc
MD5(M) = 0x93b885adfe0da089cdf634904fd59f71
MD5(M’) = 0x55a54008ad1ba589aa210d2629c1df41
SHA-1(M) = 0x5ba93c9db0cff93f52b521d7420e43f6eda2784f
SHA-1(M’) = 0xbf8b4530d8d246dd74ac53a13471bba17941dff7
• Avalanche condition: on average, changing 1 input bit changes 50% of the output bits

Weaknesses of Hash Functions
• Length extension
– Let m = $m_1, m_2, \ldots, m_k$
– Choose m’ = $m_1, m_2, \ldots, m_k, m_{k+1}$
– Notice that h(m’) = h’(h(m), $m_{k+1}$), where h’ is the compression function
• Why is this a problem?
– Alice sends Bob m
– She authenticates it by sending h(X || m), where X is a secret shared only by Alice and Bob
– Eve can append to m and update the authentication code without knowing X

Current State of Hash Functions
• Researchers try to break hash functions in one of two ways
– Finding two messages (usually single message blocks) that hash to the same digest
– Finding a message that produces a digest of all zeros or all ones (essentially finding a preimage)
• Collisions and preimages can be found for MD4
• Collisions can be constructed for MD5 and SHA-0
• Theoretical collisions have been discovered for SHA-1
• SHA-256 and SHA-512 have no known attacks
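The birthday attack from the earlier slide can be tried on a deliberately small hash. The following is a minimal sketch, assuming a toy n-bit hash obtained by truncating SHA-256; the names `truncated_hash` and `birthday_collision` and the choice n = 24 are illustrative assumptions, not part of the slides.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bits: int) -> int:
    """Toy n-bit hash (assumption for illustration): top n_bits of SHA-256."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)


def birthday_collision(n_bits: int):
    """Hash distinct inputs until two share a digest.

    Returns (first_message, second_message, number_of_hashes_computed).
    """
    seen = {}  # digest -> message that produced it
    for i in count():
        msg = i.to_bytes(8, "big")
        h = truncated_hash(msg, n_bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


n = 24
m1, m2, tries = birthday_collision(n)
print(f"collision after {tries} hashes; 2^(n/2) = {2 ** (n // 2)}")
```

The number of hashes needed should be within a small factor of 2^(n/2) = 4096, far below the 2^n = 16,777,216 work that a generic pre-image search on the same toy hash would need.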
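The collision in the non-generic attack slide can be checked numerically. This is a sketch under stated assumptions: it takes the second message to be m1' = m2 ⊕ H1 and m2' = H2 ⊕ m2 ⊕ H1, it skips padding (the slide says the specifics are not important), and it replaces AES with a hypothetical stand-in function `E` built from SHA-256 so that only the standard library is needed. The collision relies only on the chaining structure, not on any property of AES.

```python
import hashlib

BLOCK = 16  # 128-bit blocks, as on the slide


def E(block: bytes) -> bytes:
    """Stand-in for AES under a fixed key (assumption, NOT real AES)."""
    return hashlib.sha256(b"fixed-key" + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def chained_hash(blocks) -> bytes:
    """H_0 = all zeros, H_i = E(H_{i-1} XOR m_i); the last H_i is the result."""
    h = bytes(BLOCK)
    for m in blocks:
        h = E(xor(h, m))
    return h


# Original two-block message m = m1 || m2 (arbitrary example blocks).
m1 = b"A" * BLOCK
m2 = b"B" * BLOCK
H1 = chained_hash([m1])
H2 = chained_hash([m1, m2])

# Second message built from the intermediate values of the first.
m1_prime = xor(m2, H1)
m2_prime = xor(xor(H2, m2), H1)

print(chained_hash([m1, m2]).hex())
print(chained_hash([m1_prime, m2_prime]).hex())
```

Both printed digests are equal: the first block of m' drives the state to H2, and the second block cancels that state so the last call sees the same input, m2 ⊕ H1, as in the original computation.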
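The avalanche condition from the Examples slide can be measured empirically. A minimal sketch, assuming SHA-1 from Python's `hashlib` and 1000 short counter-based inputs chosen only for reproducibility; `flip_bit` and `changed_fraction` are hypothetical helper names.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def changed_fraction(message: bytes, bit_index: int) -> float:
    """Fraction of SHA-1 output bits that change when one input bit flips."""
    a = hashlib.sha1(message).digest()
    b = hashlib.sha1(flip_bit(message, bit_index)).digest()
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (len(a) * 8)


# 1000 four-byte inputs, each with a different bit position flipped.
samples = [changed_fraction(i.to_bytes(4, "big"), i % 32) for i in range(1000)]
average = sum(samples) / len(samples)
print(f"average fraction of output bits changed: {average:.3f}")
```

The printed average should sit close to 0.5, matching the 50% figure on the slide.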
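The length-extension weakness can be illustrated with a toy iterated hash. This sketch assumes a hypothetical compression function `compress` (built from truncated SHA-256) and a naive hash `toy_hash` with no padding or length encoding; real functions such as MD5 and SHA-1 add padding, which an attacker must also account for, so this shows only the structure of the problem.

```python
import hashlib

BLOCK = 16  # toy block and state size in bytes (assumption)


def compress(state: bytes, block: bytes) -> bytes:
    """Hypothetical compression function h'(state, block) -> new state."""
    return hashlib.sha256(state + block).digest()[:BLOCK]


def toy_hash(message: bytes) -> bytes:
    """Iterate compress over the blocks; deliberately has no padding."""
    assert len(message) % BLOCK == 0, "toy hash needs whole blocks"
    state = bytes(BLOCK)
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i + BLOCK])
    return state


# Alice and Bob share the secret X; Alice authenticates m with h(X || m).
secret = b"shared secret X".ljust(BLOCK)
message = b"pay Bob 10".ljust(BLOCK)
tag = toy_hash(secret + message)

# Eve sees message and tag but not the secret. She appends one block and
# computes the new tag directly from the old one.
extension = b"and Eve 1000".ljust(BLOCK)
forged_tag = compress(tag, extension)

# Bob's check on the extended message accepts Eve's forged tag.
print(forged_tag == toy_hash(secret + message + extension))
```

The final comparison holds because the published tag is exactly the internal state after the last block, so anyone can resume the iteration from it.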