Chapter 6
Hash Functions
February 15, 2010

Hash functions have been used in computing from the earliest days, and have a particular relevance to cryptography, in particular to digital signatures. Hash functions are also sometimes known as "message digest functions" or "message compression functions".

A hash function takes a long string of data and maps it into a pseudorandom m-bit value. Setting $2^m = N$, we have N possible such values. These can be treated as integer values, as addresses of records in memory, or as addresses of 1-bit records in a bit map.

6.1 Uses of hash functions

1. Traditional

For spreading records (often fixed length) over a file in an "even" fashion. Example:

H(Person's Name and Address) → Points to record.

This avoids allocating space to each letter of the alphabet. Obviously we may get a clash, handled as follows:

Writing: H(N + A) = address of record slot → Is the address slot full? If no, store the data there. If yes, increment the address by 1, look at the next record slot, and ask again.

Reading: H(N + A) = address of slot → Is the item in the slot the required one? If yes, retrieve the data. If no, increment the address by 1, look at the next record slot, and ask again.

2. Footprints

To test if some data (example: an RSA public key) has been used before, use bit-maps. H(Data) = address of bit. In practice use several distinct $H_1(), H_2(), \dots, H_k()$.

If the bits at $H_1(Data), H_2(Data), \dots, H_k(Data)$ are all set, then assume the data has been used before and (depending on the application) discard the data and generate some new data (example: RSA moduli).

If the bits at $H_1(Data), H_2(Data), \dots, H_k(Data)$ are not all set, place "ones" in all the addressed bits and accept the data. It can't have been used before.

3. Unique Digest of Messages

N = H(Message) = Digest. Here we trust (unless we keep a bit map of hashes) that no "hits" occur. A hit is when $H(m_1) = H(m_2)$ for distinct messages $m_1, m_2$. We want H() to be such that:

(a) Given $m_1$ and $H(m_1)$ we can't find $m_2$ such that $H(m_1) = H(m_2)$.
(b) We cannot find a pair $m_1, m_2$ such that $H(m_1) = H(m_2)$.

(a) would allow an attacker to alter or replace an existing signed message $m_1$ by $m_2$, and append the signature of $m_1$ to $m_2$.

(b) would allow an attacker to submit an innocent message $m_1$ for hashing and signature, and then swap $m_2$ for $m_1$ and append the signature of $m_1$ to $m_2$.

Clearly a good H() randomises, so that changing a bit in the data changes, on average, about fifty percent of the bits in its hash, i.e. it is similar to an encryption algorithm.

Question: Could we use an encryption algorithm E(K, m) as a hash?
- With fixed, public K?
- With secret K? (i.e. a MAC)

6.2 Probability of hits

This depends on N, the number of slots available, and t, the number of tries (which have filled slots). The expected number of hits per slot is $t/N = \mu$. We treat the number of hits k per slot as having a Poisson distribution, i.e.

Prob(k hits per slot) $= \frac{\mu^k}{k!} e^{-\mu}$.

So

Prob(slot is empty) $= e^{-\mu} \approx 1 - \mu + \frac{\mu^2}{2} \dots$
Prob(slot is not empty) $= 1 - e^{-\mu} \approx \mu - \frac{\mu^2}{2} \dots$
Prob(two or more hits per slot) $= 1 - (e^{-\mu} + \mu e^{-\mu}) \approx 1 - (1 - \mu + \frac{\mu^2}{2} + \mu - \mu^2) = \frac{\mu^2}{2}$

For (1), traditional use, we can use a relatively large $\mu$ since hits are not a disaster, but merely slow down performance.

For (2), footprints, k hashes go into one bit-map (example: RSA modulus). So after t objects have been footprinted we have kt bits set (many several times), with $\mu = t/N$.

Prob(1 bit is set) $= 1 - e^{-k\mu}$
Prob(k arbitrarily picked bits are all set) $= (1 - e^{-k\mu})^k$
P = Prob(really new data not giving a footprint which suggests it's a repeat) $= 1 - (1 - e^{-k\mu})^k$

Example:

  k    P (µ = 0.1)    P (µ = 0.5)    P (µ = 0.2)
  1    0.905          0.607          0.819
  2    0.967          0.600          0.891
  3    0.983          0.531          0.908
  4    0.988          0.441          0.908
  5    0.991          0.348          0.899

For µ = 0.1, P goes up as k increases. For µ = 0.5, P goes down as k increases. For µ = 0.2, P goes up until k = 3 and goes down from k = 4. (The values in the third column correspond to µ = 0.2; for instance $e^{-0.2} = 0.819$ at k = 1.)

For (3), unique hashes for digital signatures:

Prob(slot has two or more hits) $\approx \frac{\mu^2}{2}$
Expected number of slots with two or more hits $\approx \frac{N\mu^2}{2}$

For security (that there aren't such slots) we want

Expected number $\ll 1$
i.e. $\frac{N\mu^2}{2} \ll 1$
i.e. $\mu \ll (2/N)^{1/2}$
i.e. $t \ll (2N)^{1/2}$

(Expected number of tries before a repeat $= \sqrt{\pi N / 2}$, from the birthday paradox.)

Say $t \sim N^{1/2} \sim (2^m)^{1/2}$, so $\log_2 t \sim m/2$. Suppose m = 160 (bits); then $t \sim 2^{80}$ hashes before a hit. $2^{80}$ is infeasible ($\sim 10^{24}$ operations; one year is about $3 \times 10^7$ seconds, i.e. $3 \times 10^{16}$ nanoseconds).

6.3 The Structure of Hash Functions

CBC-MAC Structure

This is a basic structure:

[Figure: CBC-MAC structure]

The function f() could be an encryptor $E(k; x_i)$ where k is a key, but for a hash there is no secrecy and so k is a known constant. We have $y_i = f(m_i \oplus y_{i-1})$. (Clearly we need an initial value (IV), $y_0$, and the final Hash(message) $= y_n$.)

This scheme is not secure. A fraudster can introduce a spurious $m'_i$, and then choose an $m'_{i+1}$ so that we return to the original $y_{i+1} = f(m_{i+1} + y_i)$ (and the same final resulting hash from a modified message). Thus:

Change $m_i$ to $m'_i$ to give $y'_i = f(m'_i + y_{i-1})$
Change $m_{i+1}$ to $m'_{i+1}$ to give $y'_{i+1} = y_{i+1}$
or $m'_{i+1} + y'_i = m_{i+1} + y_i$
so set $m'_{i+1} = m_{i+1} - y'_i + f(m_i + y_{i-1})$

($y'_i$ and $y_{i-1}$ are observed by the fraudster and f() is supposedly known.)

RIPE-MAC Structure

An improved structure is as follows:

[Figure: RIPE-MAC structure]

Output $z_i = m_i + f(x_i) = m_i + f(m_i + z_{i-1})$

To attack this the fraudster changes $m_i$ to $m'_i$, giving

$y'_i = f(m'_i + z_{i-1})$
$z'_i = y'_i + m'_i$

He wants $z'_{i+1} = z_{i+1}$, or

$m'_{i+1} + f(m'_{i+1} + z'_i) = m_{i+1} + f(m_{i+1} + m_i + f(m_i + z_{i-1}))$

where $z_i = m_i + f(m_i + z_{i-1})$. This cannot be solved for $m'_{i+1}$ in terms of the other known quantities (including f()).

A further possible chosen plain-text attack is like this. The fraudster finds

$H_1 = H(m_1)$ from message $m_1$,
$H_2 = H(m_2)$ from message $m_2$,
and $H_3 = H(m_1|m_3)$,

where $(m_1|m_3)$ means $m_1$ concatenated with $m_3$, a single feedback item. Now $H_3 = f(m_3 + H_1) + m_3$.

Consider $H_4 = H(m_2|m_4)$, where $(m_2|m_4)$ is $m_2$ concatenated with a new $m_4$, a single feedback item.
Then $H_4 = f(m_4 + H_2) + m_4$.

The attacker sets $m_4 = m_3 + H_1 - H_2$, so

$H_4 = f(m_3 + H_1) + m_3 + H_1 - H_2$

or

$H_4 = H_3 + (H_1 - H_2)$

The attacker has formed a message $(m_2|m_4)$ and its hash $H_4$ from known $(m_1, H_1)$, $(m_2, H_2)$ and $(m_3, H_3)$ without even knowing f()! The attacker has subverted a hash whose f() is secret.

To counter this we add a further f() before the final production of the hash:

[Figure: structure with a further f() applied before the final hash output]

Giving $H(m) = f(m_n + f(m_n + z_{n-1}))$. (Or, in the above notation, $H_4 = f(m_4 + f(m_4 + H_2))$, and the attacker cannot find $H_4$ without knowing f().)

6.4 Keyed Hash

Instead of trying to build hash functions from encryption MAC/CBC structures, we could build MACs (message authentication codes) from hash functions, by introducing a key to the hash. For example MAC(m) = H(key : message : key). This has certain weaknesses, and the recommended keyed-hash structure is:

HMAC(K, message) = H(K ⊕ opad, H(K ⊕ ipad, message))

where
K = Key
ipad = Hex(36) repeated B times (B = length of block processed, in bytes; B = 64 → 512-bit blocks)
opad = Hex(5C) repeated B times

Obviously any hash function needs conventions for the message itself:

1. Does it need padding to reach a certain length?
2. Should its real length be included with the message?
3. Should the date be included?
4. Should the originator's ID/Certificate be included? (cf KDSA Chapter 5)

6.5 The Original Hash recommended for Digital Signatures was Square-Mod-n

$Z_i = (Z_{i-1} + X_i)^2 \bmod n$

But

$X_i = 1111\,x_{i1L} \mid 1111\,x_{i1R} \mid 1111\,x_{i2L} \mid 1111\,x_{i2R} \mid \dots \mid 1111\,x_{ikL} \mid 1111\,x_{ikR}$

i.e. $X_i$ is 2k bytes wide but handles only k input bytes at a time.

An attacker wants $X'_i = Z_{i-1} + X_i - Z'_{i-1}$, or

$X'_i = (Z_{i-1} - Z'_{i-1}) + X_i$   (0.1)

But the $x'_i$ we put in will be expanded to $X'_i = 1111\,x'_{i1L} \mid \dots$ etc. So equation (0.1) above will only hold if $(Z_{i-1} - Z'_{i-1})$ has the form 0000xxxx0000xxxx..., which has probability $2^{-8k}$.

However there are clearly problems with all-zero input etc.

MDC is another proposed hash, using DES encryption F(): 64-bit in; 128-bit out.
$E^*_1(x) = E^*(K_1, x)$
$E^*_2(x) = E^*(K_2, x)$
$K_1$ = K but with the first 2 bits set to 0 1
$K_2$ = K but with the first 2 bits set to 1 0

6.6 RIPE-MD

RIPE-MD is another EU-developed hash function, handling 512-bit input blocks and yielding 128-bit output. It was later extended to RIPEMD-160 to give 160-bit output.

6.7 SHA-1 (Secure Hash Algorithm)

This is the most used hash function in cryptography. 512 input bits are processed at a time. Output is 160 bits.

Each round has as inputs H0, H1, H2, H3, H4 (five 32-bit words of feedback = hash to date) and 512 bits of message $M_i$. A round contains a loop, iterated 80 times. The output of the loop is added to the input H0, H1, H2, H3, H4 to give the new hash-to-date.

1. Divide the 512-bit $M_i$ into sixteen 32-bit words W(0) to W(15)
2. Construct W(16) to W(79) from them
3. Set A = H0, B = H1, C = H2, D = H3, E = H4
4. Loop for t = 0 to 79:
   Temp = $S^5(A)$ + f(t, B, C, D) + E + W(t) + K(t)
   E = D, D = C, C = $S^{30}(B)$, B = A, A = Temp
5. H0 = H0 + A, H1 = H1 + B, H2 = H2 + C, H3 = H3 + D, H4 = H4 + E
6. i = i + 1, next block of input

Notes on SHA-1

1. "+" = addition (drop overflow), i.e. addition modulo $2^{32}$
2. $S^n(x)$ = rotate the 32-bit word x by n positions left
3. The function f:
    0 ≤ t ≤ 19    f(t, B, C, D) = (B & C) OR ($\bar{B}$ & D)
   20 ≤ t ≤ 39    f(t, B, C, D) = B XOR C XOR D
   40 ≤ t ≤ 59    f(t, B, C, D) = (B & C) OR (B & D) OR (C & D)
   60 ≤ t ≤ 79    f(t, B, C, D) = B XOR C XOR D
   (& = logical AND, $\bar{B}$ = complement of B)
4. The constants K (hex):
    0 ≤ t ≤ 19    K(t) = 5A827999
   20 ≤ t ≤ 39    K(t) = 6ED9EBA1
   40 ≤ t ≤ 59    K(t) = 8F1BBCDC
   60 ≤ t ≤ 79    K(t) = CA62C1D6
5. Initial values for the H's are (hex):
   H0 = 67452301
   H1 = EFCDAB89
   H2 = 98BADCFE
   H3 = 10325476
   H4 = C3D2E1F0
6. Expansion procedure for t = 16 to 79:
   W(t) = $S^1$(W(t − 3) XOR W(t − 8) XOR W(t − 14) XOR W(t − 16))
7. Padding of the input message to produce n ∗ 512 bits:

   | message (101...0011) | padding (100...00) | message length (64 bits) |
   |<------------------------- n ∗ 512 bits ------------------------->|

   The message length field of 64 bits gives the bit length of the message ($< 2^{64}$).
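The steps and notes above can be sketched in Python as follows. This is a minimal illustration, not a production implementation. The names `rotl`, `f_t`, `k_t`, `pad` and `sha1` are illustrative choices that do not come from the notes, and the input is assumed to be a whole number of bytes. The initial value H4 = C3D2E1F0 is the one given in the published SHA-1 standard. Python's standard `hashlib` module is imported only to cross-check the result.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # "+" in the notes means addition with overflow dropped


def rotl(x, n):
    """S^n(x): rotate the 32-bit word x left by n positions."""
    return ((x << n) | (x >> (32 - n))) & MASK


def f_t(t, b, c, d):
    """Round function f(t, B, C, D) from note 3."""
    if t < 20:
        return (b & c) | (~b & d)
    if t < 40:
        return b ^ c ^ d
    if t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d


def k_t(t):
    """Round constant K(t) from note 4."""
    return (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)[t // 20]


def pad(message):
    """Note 7: append a 1 bit, zero bits, then the 64-bit bit length."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"                      # bits 1000 0000
    padded += b"\x00" * ((56 - len(padded)) % 64)   # zeros up to 448 mod 512
    return padded + struct.pack(">Q", bit_len)      # big-endian length field


def sha1(message):
    """Return the SHA-1 digest of a bytes object as a hex string."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    data = pad(message)
    for i in range(0, len(data), 64):
        # Steps 1 and 2: sixteen message words, expanded to eighty.
        w = list(struct.unpack(">16I", data[i:i + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        # Step 3: load the hash-to-date.
        a, b, c, d, e = h
        # Step 4: the 80-iteration loop.
        for t in range(80):
            temp = (rotl(a, 5) + f_t(t, b, c, d) + e + w[t] + k_t(t)) & MASK
            e, d, c, b, a = d, c, rotl(b, 30), a, temp
        # Step 5: add the loop output to the hash-to-date.
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return "".join(format(x, "08x") for x in h)


if __name__ == "__main__":
    msg = b"abc"
    print(sha1(msg))
    print(sha1(msg) == hashlib.sha1(msg).hexdigest())
```

Comparing against `hashlib.sha1` on a few inputs of different lengths (shorter than, equal to, and longer than one 64-byte block) is a quick way to check the padding and block loop.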
6.8 Further Developments

SHA-1, although widely used, has certain weaknesses, probably more theoretical than practical. Proposals for a better hash function have been invited (in the manner of the AES project, see Chapter 2), but as yet no selection of the short-listed or winning candidates has been made.