Chapter 6 Hash Functions 6 February 15, 2010

Hash functions have been used in computing from the earliest days, and have
a particular relevance to cryptography - in particular to digital signatures.
Hash functions are also sometimes known as “message digest functions”
or “message compression functions”.
A hash function takes a long string of data and maps it into a pseudorandom m-bit value. Setting 2^m = N we have N possible such values. These
can be treated as integer values or as addresses of records in memory or of
1-bit records in a bit map.
6.1 Uses of hash functions
1. Traditional
For spreading records (often fixed length) over a file in an "even" fashion.
Example: H(Person's Name and Address) → points to record.
This avoids allocating space to each letter of the alphabet. Obviously we may
get a clash, handled as follows:
Writing: H(N + A) = address of record slot.
   (i) Is the address slot full?
   (ii) No: store the data there.
   (iii) Yes: increment the address by 1, look at the next record slot, and go back to (i).
Reading: H(N + A) = address of slot.
   (i) Is the item in the slot the required one?
   (ii) Yes: retrieve the data.
   (iii) No: increment the address by 1, look at the next record slot, and go back to (i).
2. Footprints
To test if some data (example: an RSA public key) has been used
before, use bit-maps. H(Data) = Address of bit.
In practice use several distinct H1(), H2(), ..., Hk(). If the bits at
H1(Data), H2(Data), ..., Hk(Data) are all set then assume the data has
been used before and (depending on the application) discard the data and
generate some new data - example: RSA moduli.
If the bits at H1(Data), H2(Data), ..., Hk(Data) are not all set, place "ones"
in all the addressed bits and accept the data. It can't have been used before.
3. Unique Digest of Messages
N = H(Message) = Digest. Here we trust (unless we keep a bit map
of hashes) that no "hits" occur. A hit is when H(m1) = H(m2) for
distinct messages m1, m2. We want H() to be such that:
(a) given m1 and H(m1) we can't find a different m2 such that H(m1) = H(m2),
(b) and we cannot find any pair of distinct messages m1, m2 such that H(m1) = H(m2).
If (a) failed, an attacker could alter or replace an existing signed
message m1 with m2, and append the signature of m1 to m2.
If (b) failed, an attacker could submit an innocent message m1 for
hashing and signature, and then swap m2 for m1 and append the
signature of m1 to m2.
Clearly a good H() randomises, so that changing a bit in the data
changes about fifty percent of the bits in its hash, i.e. it is similar to an encryption
algorithm.
Question: Could we use an encryption algorithm E(K, m) as a hash?
-With fixed, public K?
-With secret K? - i.e. a MAC
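The "fifty percent" property above can be checked empirically. The following is a minimal sketch, assuming SHA-1 from Python's hashlib as the hash under test; the helper names and the sample message are illustrative only. It flips each input bit in turn and averages the fraction of digest bits that change.

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def flip_bit(data: bytes, pos: int) -> bytes:
    """Return a copy of data with bit number pos inverted."""
    out = bytearray(data)
    out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)

def average_avalanche(message: bytes) -> float:
    """Average fraction of the 160 SHA-1 output bits changed per flipped input bit."""
    base = hashlib.sha1(message).digest()
    n_bits = len(message) * 8
    total = sum(
        bit_diff(base, hashlib.sha1(flip_bit(message, p)).digest())
        for p in range(n_bits)
    )
    return total / (n_bits * 160)

# Illustrative message; any input should give a value close to one half.
fraction = average_avalanche(b"Pay Bob 100 pounds")
print(f"average fraction of digest bits changed: {fraction:.3f}")
```

A value well away from one half would suggest that the function does not randomise its input properly.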
6.2 Probability of hits
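This depends on N, the number of slots available, and t, the number of tries
(which have filled slots). The expected number of hits per slot is $\mu = t/N$.
We treat the number of hits k per slot as having a Poisson distribution, i.e.
$$\mathrm{Prob}(k \text{ hits per slot}) = \frac{\mu^k}{k!}\, e^{-\mu}.$$
So, to second order in $\mu$,
$$\mathrm{Prob}(\text{slot is empty}) = e^{-\mu} \approx 1 - \mu + \frac{\mu^2}{2}$$
$$\mathrm{Prob}(\text{slot is not empty}) = 1 - e^{-\mu} \approx \mu - \frac{\mu^2}{2}$$
$$\mathrm{Prob}(\text{two or more hits per slot}) = 1 - (e^{-\mu} + \mu e^{-\mu}) \approx 1 - \left(1 - \mu + \frac{\mu^2}{2} + \mu - \mu^2\right) = \frac{\mu^2}{2}$$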
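The Poisson estimates can be compared with a direct simulation. This is a rough sketch under stated assumptions: slots are chosen with Python's seeded pseudorandom generator rather than a real hash, and the values N = 20000 and t = 4000 (so µ = 0.2) are arbitrary illustrative choices.

```python
import math
import random

def slot_statistics(n_slots: int, tries: int, seed: int = 1):
    """Throw `tries` items into `n_slots` slots at random.

    Returns (fraction of empty slots, fraction of slots with two or more hits).
    """
    rng = random.Random(seed)
    counts = [0] * n_slots
    for _ in range(tries):
        counts[rng.randrange(n_slots)] += 1
    empty = sum(1 for c in counts if c == 0) / n_slots
    two_plus = sum(1 for c in counts if c >= 2) / n_slots
    return empty, two_plus

def poisson_predictions(mu: float):
    """Poisson values for the same two fractions."""
    empty = math.exp(-mu)
    two_plus = 1 - (math.exp(-mu) + mu * math.exp(-mu))
    return empty, two_plus

N, t = 20000, 4000            # illustrative sizes, mu = t/N = 0.2
mu = t / N
sim_empty, sim_two = slot_statistics(N, t)
pred_empty, pred_two = poisson_predictions(mu)
print(f"empty slots: simulated {sim_empty:.4f}, Poisson {pred_empty:.4f}")
print(f"two or more: simulated {sim_two:.4f}, Poisson {pred_two:.4f}, mu^2/2 {mu * mu / 2:.4f}")
```

The gap between the exact Poisson value and µ²/2 comes from the neglected higher-order terms, so it shrinks as µ decreases.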
For (1) Traditional use we can use relatively large µ since hits are not a
disaster, but merely slow down performance.
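The traditional write and read procedures of Section 6.1 can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name ProbingFile is hypothetical, the slot address is taken from SHA-256 purely for convenience, and wrapping round at the end of the file and overwriting an existing entry for the same key are assumptions not stated in the notes.

```python
import hashlib

class ProbingFile:
    """Fixed number of record slots; clashes are resolved by stepping to the next slot."""

    def __init__(self, n_slots: int):
        self.slots = [None] * n_slots

    def _address(self, key: str) -> int:
        # H(Name + Address) reduced to a slot number (SHA-256 is an arbitrary choice here).
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def write(self, key: str, data: str) -> int:
        """Store data and return the number of slots examined."""
        addr = self._address(key)
        for probes in range(1, len(self.slots) + 1):
            slot = self.slots[addr]
            if slot is None or slot[0] == key:      # slot free (or same key): store here
                self.slots[addr] = (key, data)
                return probes
            addr = (addr + 1) % len(self.slots)     # slot full: look at the next one
        raise RuntimeError("file is full")

    def read(self, key: str) -> str:
        addr = self._address(key)
        for _ in range(len(self.slots)):
            slot = self.slots[addr]
            if slot is None:                        # empty slot reached: key was never stored
                raise KeyError(key)
            if slot[0] == key:                      # required item found
                return slot[1]
            addr = (addr + 1) % len(self.slots)     # not the required item: next slot
        raise KeyError(key)

store = ProbingFile(16)
probe_counts = [store.write(f"person {i}", f"record {i}") for i in range(12)]
print("slots examined per write:", probe_counts)
print(store.read("person 7"))
```

With 12 of 16 slots in use (µ = 0.75) some writes examine more than one slot, but every record is still found, which is the sense in which hits only cost performance.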
For (2) footprints, k hashes go into one bit-map (example: RSA modulus). So after t objects have been footprinted we have kt bits set (many of them several times), with $\mu = t/N$:
$$\mathrm{Prob}(\text{1 bit is set}) = 1 - e^{-k\mu}$$
$$\mathrm{Prob}(\text{all } k \text{ arbitrarily picked bits are set}) = (1 - e^{-k\mu})^k$$
$$P = \mathrm{Prob}(\text{really new data not giving a footprint which suggests it is a repeat}) = 1 - (1 - e^{-k\mu})^k$$
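Example: values of P for k = 1 to 5 hash functions.

k    µ = 0.1    µ = 0.5    µ = 0.2
1    0.905      0.607      0.819
2    0.967      0.600      0.891
3    0.983      0.531      0.908
4    0.988      0.441      0.908
5    0.991      0.348      0.899

For µ = 0.1, P goes up as k increases; for µ = 0.5 it goes down; for µ = 0.2 it goes up until k = 3 and then down (the values for k = 3 and k = 4 agree to three places).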
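The footprint test and the formula for P can be sketched together. This is an illustrative sketch only: the class name FootprintMap is hypothetical, and the k hash functions are assumed to be derived from SHA-256 by prefixing an index byte, which is one convenient way of getting several distinct hashes and not something the notes prescribe.

```python
import hashlib
import math

def p_new_data_accepted(mu: float, k: int) -> float:
    """P = 1 - (1 - e^(-k*mu))^k, the chance that genuinely new data is not rejected."""
    return 1 - (1 - math.exp(-k * mu)) ** k

class FootprintMap:
    """Bit-map of n_bits bits addressed by k hash functions."""

    def __init__(self, n_bits: int, k: int):
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray((n_bits + 7) // 8)

    def _addresses(self, data: bytes):
        # H_i(Data): SHA-256 of an index byte followed by the data (an assumed construction).
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def accept(self, data: bytes) -> bool:
        """Return False if all k bits are already set (assume used before); else set them."""
        addrs = list(self._addresses(data))
        if all((self.bits[a // 8] >> (a % 8)) & 1 for a in addrs):
            return False
        for a in addrs:
            self.bits[a // 8] |= 1 << (a % 8)
        return True

fmap = FootprintMap(n_bits=1 << 16, k=3)
first = fmap.accept(b"candidate RSA modulus")
second = fmap.accept(b"candidate RSA modulus")
print(first, second)
for mu in (0.1, 0.5, 0.2):
    print(mu, [round(p_new_data_accepted(mu, k), 3) for k in range(1, 6)])
```

The first call accepts the data because the bit-map is empty; the second call sees all three bits set and rejects it as a repeat.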
For (3) unique hashes for digital signatures:
$$\mathrm{Prob}(\text{slot has two or more hits}) \approx \frac{\mu^2}{2}$$
$$\text{Expected number of slots with two or more hits} \approx \frac{N\mu^2}{2}$$
For security we want there to be no such slots, so that
$$\text{Expected number} \ll 1, \quad \text{i.e.} \quad \frac{N\mu^2}{2} \ll 1, \quad \text{i.e.} \quad \mu \ll (2/N)^{1/2}, \quad \text{i.e.} \quad t \ll (2N)^{1/2}.$$
(Expected number of tries before a repeat = $\sqrt{\pi N/2}$ from the birthday
paradox.) Say $t \sim N^{1/2} \sim (2^m)^{1/2}$, so $\log_2 t \sim m/2$. Suppose m = 160 (bits);
then $t \sim 2^{80}$ hashes before a hit. $2^{80}$ is infeasible ($\sim 10^{24}$ operations; one year
is about $3 \times 10^{16}$ nanoseconds).
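The birthday estimate can be tried out on a deliberately shortened hash. This is a sketch under stated assumptions: the "m-bit hash" is simply the top m bits of SHA-1, the messages are an arbitrary counter sequence, and m = 20 is chosen only so that a hit turns up quickly.

```python
import hashlib
import math

def expected_tries(m_bits: int) -> float:
    """Birthday estimate sqrt(pi * N / 2) with N = 2^m."""
    return math.sqrt(math.pi * 2 ** m_bits / 2)

def truncated_hash(data: bytes, m_bits: int) -> int:
    """Top m_bits bits of the 160-bit SHA-1 digest."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - m_bits)

def find_hit(m_bits: int):
    """Hash distinct messages until two share an m-bit value; return both and the try count."""
    seen = {}
    tries = 0
    while True:
        msg = b"message %d" % tries
        tries += 1
        h = truncated_hash(msg, m_bits)
        if h in seen:
            return seen[h], msg, tries
        seen[h] = msg

m = 20
msg_a, msg_b, tries = find_hit(m)
print(f"hit after {tries} tries; estimate {expected_tries(m):.0f}")
print(f"log2 of estimate for m = 160: {math.log2(expected_tries(160)):.2f}")
```

A single run gives one sample of the number of tries, so it will only be of the same order as the estimate, not equal to it.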
6.3 The Structure of Hash Functions
CBC-MAC Structure
This is a basic structure: each message block $m_i$ is combined with the previous output $y_{i-1}$ and passed through a function f().
The function f() could be an encryptor $E(k; x_i)$ where k is a key - but
for a Hash there is no secrecy and so k is a known constant. We have
$y_i = f(m_i \oplus y_{i-1})$. (Clearly we need an initial value (IV), $y_0$, and the final
Hash(message) = $y_n$.)
This scheme is not secure. A fraudster can introduce a spurious $m'_i$, and
then choose an $m'_{i+1}$ so that we return to the original $y_{i+1} = f(m_{i+1} + y_i)$
(and the same final resulting Hash from a modified message). Thus:
Change $m_i$ to $m'_i$ to give $y'_i = f(m'_i + y_{i-1})$
Change $m_{i+1}$ to $m'_{i+1}$ to give $y'_{i+1} = y_{i+1}$
or $m'_{i+1} + y'_i = m_{i+1} + y_i$
so set $m'_{i+1} = m_{i+1} - y'_i + f(m_i + y_{i-1})$
($y'_i$ and $y_{i-1}$ are observed by the fraudster and f() is supposedly known. Here + stands for the combining operation; when it is ⊕, addition and subtraction coincide.)
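The attack can be demonstrated on a toy version of the chain. This is a minimal sketch, assuming 8-byte blocks, ⊕ as the combining operation, an all-zero IV, and a stand-in f() built from SHA-256 with a public constant in place of E(k; x); all of these choices, and the function names, are illustrative.

```python
import hashlib

BLOCK = 8                     # block size in bytes (toy value)
IV = bytes(BLOCK)             # y_0, assumed to be all zeros

def f(x: bytes) -> bytes:
    """Public stand-in for E(k; x) with a known constant key."""
    return hashlib.sha256(b"known-constant-key" + x).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def chain_hash(blocks):
    """y_i = f(m_i xor y_{i-1}); the hash is the final y_n."""
    y = IV
    for m in blocks:
        y = f(xor(m, y))
    return y

def forge(blocks, i, new_block):
    """Replace block i and repair block i+1 so that the final hash is unchanged."""
    y_prev = IV
    for m in blocks[:i]:
        y_prev = f(xor(m, y_prev))                       # y_{i-1}
    y_i = f(xor(blocks[i], y_prev))                      # original y_i
    y_i_new = f(xor(new_block, y_prev))                  # y'_i
    fixed_next = xor(xor(blocks[i + 1], y_i_new), y_i)   # m'_{i+1} = m_{i+1} - y'_i + y_i
    return blocks[:i] + [new_block, fixed_next] + blocks[i + 2:]

original = [b"PAY  BOB", b"10 POUND", b"S TODAY.", b"SIGNED:A"]
forged = forge(original, 1, b"99 POUND")
print(chain_hash(original).hex())
print(chain_hash(forged).hex())
```

The repaired block $m'_{i+1}$ will generally look like random bytes, so the attack is most useful where part of the message is not inspected by a human.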
RIPE-MAC Structure
An improved structure is as follows:
Output $z_i = m_i + f(x_i) = m_i + f(m_i + z_{i-1})$
To attack this the fraudster changes $m_i$ to $m'_i$, giving
$$y'_i = f(m'_i + z_{i-1}), \qquad z'_i = y'_i + m'_i.$$
He wants $z'_{i+1} = z_{i+1}$, or
$$m'_{i+1} + f(m'_{i+1} + z'_i) = m_{i+1} + f(m_{i+1} + m_i + f(m_i + z_{i-1})),$$
where $z_i = m_i + f(m_i + z_{i-1})$.
This cannot be solved for $m'_{i+1}$ in terms of the other known quantities
(including f()).
A further possible chosen plain-text attack is like this. The fraudster finds
H1 = H(m1) from message m1,
H2 = H(m2) from message m2,
and H3 = H(m1|m3),
where (m1|m3) means m1 concatenated with m3, a single feedback item.
Now H3 = f(m3 + H1) + m3. Consider H4 = H(m2|m4) where (m2|m4)
is m2 concatenated with a new m4, a single feedback item.
H4 = f(m4 + H2) + m4
The attacker sets m4 = m3 + H1 − H2,
so H4 = f(m3 + H1) + m3 + H1 − H2,
or H4 = H3 + (H1 − H2).
The attacker has formed a message (m2|m4) and its Hash H4 from known
(m1, H1), (m2, H2) and (m3, H3) without even knowing f()! The attacker
has subverted a hash whose function f() is secret.
To counter this we add a further f() before the final production of the
hash, giving $H(m) = f(m_n + f(m_n + z_{n-1}))$. (Or in the above notation H4 =
f(m4 + f(m4 + H2)), and the attacker cannot find H4 without knowing f().)
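Both the chosen plain-text attack and the effect of the extra f() can be checked on a toy model. The sketch below assumes 8-byte blocks, ⊕ as the combining operation, an all-zero initial value, and a secret f() simulated by SHA-256 over a key the attacker never uses; the names chain, chain_final and forge are hypothetical.

```python
import hashlib

BLOCK = 8
IV = bytes(BLOCK)                           # z_0, assumed to be all zeros
SECRET = b"key unknown to the attacker"     # makes f() secret in this toy model

def f(x: bytes) -> bytes:
    return hashlib.sha256(SECRET + x).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def chain(blocks):
    """z_i = m_i xor f(m_i xor z_{i-1}); the hash is z_n."""
    z = IV
    for m in blocks:
        z = xor(m, f(xor(m, z)))
    return z

def chain_final(blocks):
    """Same chain with one further f() applied before output."""
    return f(chain(blocks))

def forge(h1: bytes, h2: bytes, h3: bytes, m3: bytes):
    """Attacker's computation: uses only observed hashes and m3, never f()."""
    m4 = xor(xor(m3, h1), h2)               # m4 = m3 + H1 - H2
    h4 = xor(xor(h3, h1), h2)               # H4 = H3 + (H1 - H2)
    return m4, h4

m1, m2, m3 = b"message1", b"message2", b"appended"

# Structure without the final f(): the forgery succeeds.
m4, h4 = forge(chain([m1]), chain([m2]), chain([m1, m3]), m3)
plain_forgery_ok = (h4 == chain([m2, m4]))

# Structure with the final f(): the same computation no longer matches.
m4f, h4f = forge(chain_final([m1]), chain_final([m2]), chain_final([m1, m3]), m3)
final_forgery_ok = (h4f == chain_final([m2, m4f]))

print(plain_forgery_ok, final_forgery_ok)
```

In the second case the attacker sees only f() of the chaining values, so the linear relation between H1, H2, H3 and H4 is no longer available.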
6.4 Keyed Hash
Instead of trying to build Hash functions from encryption MAC/CBC structures we could build MACs (message authentication codes) from Hash functions, by introducing a key to the Hash. For example MAC(m) = H(key : message : key). This has certain weaknesses, and the recommended Keyed-Hash structure is:
HMAC(K, message) = H(K ⊕ opad, H(K ⊕ ipad, message))
where K = Key
ipad = Hex(36) repeated B times
(B = length of block processed (in bytes))
(B = 64 → 512 bit blocks)
opad = Hex(5C) repeated B times
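The HMAC structure can be written out directly and compared with a library implementation. This sketch assumes SHA-1 as H, reads the comma in the definition as concatenation, and follows the usual HMAC convention (RFC 2104) that a key shorter than B bytes is padded with zero bytes and a longer key is first hashed; those key-handling rules are not stated in the notes above.

```python
import hashlib
import hmac

B = 64    # block length of SHA-1 in bytes (512-bit blocks)

def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """H(K xor opad, H(K xor ipad, message)) with H = SHA-1."""
    if len(key) > B:                       # over-long keys are hashed first (RFC 2104)
        key = hashlib.sha1(key).digest()
    key = key.ljust(B, b"\x00")            # short keys are zero-padded to B bytes
    k_ipad = bytes(b ^ 0x36 for b in key)  # K xor ipad
    k_opad = bytes(b ^ 0x5C for b in key)  # K xor opad
    inner = hashlib.sha1(k_ipad + message).digest()
    return hashlib.sha1(k_opad + inner).digest()

tag = hmac_sha1(b"secret key", b"message to authenticate")
reference = hmac.new(b"secret key", b"message to authenticate", hashlib.sha1).digest()
print(tag.hex())
print(tag == reference)
```

In practice the library routine would be used directly; the hand-written version is only there to show where ipad and opad enter.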
Obviously any Hash function needs conventions for the message itself:
1. Does it need padding to reach a certain length?
2. Should its real length be included with the message?
3. Should the date be included?
4. Should the originator’s ID/Certificate be included? (cf KDSA Chapter
5)
6.5 The Original Hash recommended for Digital Signatures was Square-Mod-n
$$Z_i = (Z_{i-1} + X_i)^2 \bmod n$$
But $X_i = 1111x_{i1L} \,|\, 1111x_{i1R} \,|\, 1111x_{i2L} \,|\, 1111x_{i2R} \,|\, \dots \,|\, 1111x_{ikL} \,|\, 1111x_{ikR}$,
i.e. $X_i$ is 2k bytes wide but handles k input bytes at a time.
An attacker wants $X'_i = Z_{i-1} + X_i - Z'_{i-1}$, or
$$X'_i = (Z_{i-1} - Z'_{i-1}) + X_i. \qquad (0.1)$$
But the $x'_i$ we put in will be expanded to $X'_i = 1111x'_{i1L} \,|\, \dots$ etc. So equation
(0.1) above will only hold if $(Z_{i-1} - Z'_{i-1}) = 0000xxxx\,0000xxxx \dots$, which has
probability $2^{-8k}$.
However there are clearly problems with all-zero input etc.
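The expansion and squaring steps can be sketched as follows. This is only an illustration under several assumptions: L and R are read as the high and low 4-bit halves of each input byte, a short final block is padded with zero bytes, the starting value is 0, and N_TOY is a made-up modulus (a product of two Mersenne primes) far too small for real use.

```python
def expand(block: bytes) -> int:
    """Map k input bytes to a 2k-byte X: each half-byte is prefixed with the bits 1111."""
    out = bytearray()
    for byte in block:
        out.append(0xF0 | (byte >> 4))      # 1111 followed by the left (high) half
        out.append(0xF0 | (byte & 0x0F))    # 1111 followed by the right (low) half
    return int.from_bytes(out, "big")

def square_mod_n(message: bytes, n: int, k: int, z0: int = 0) -> int:
    """Z_i = (Z_{i-1} + X_i)^2 mod n, taking k message bytes per step."""
    z = z0
    for off in range(0, len(message), k):
        block = message[off:off + k].ljust(k, b"\x00")   # zero padding is an assumption
        z = (z + expand(block)) ** 2 % n
    return z

N_TOY = (2 ** 61 - 1) * (2 ** 89 - 1)   # hypothetical toy modulus, for illustration only
digest = square_mod_n(b"hash me please", N_TOY, k=4)
print(hex(expand(b"\xab")))
print(hex(digest))
```

Because every byte of X begins with 1111, an attacker who needs a particular difference between two X values can only succeed when that difference has zeros in all the forced positions.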
MDC is another proposed Hash, using DES encryption F()
64-bit in; 128-bit out.
E*1(x) = E*(K1, x)
E*2(x) = E*(K2, x)
K1 = K but set first 2 bits to 0 1
K2 = K but set first 2 bits to 1 0
6.6
RIPE-MD is another EU-developed Hash function, handling 512-bit input
blocks and yielding 128-bit output. It was later extended to RIPEMD-160 to
give 160-bit output.
6.7 SHA-1 (Secure Hash Algorithm)
This is the most used hash function in cryptography. 512 input bits are processed at a time. Output is 160 bits. Each round has as inputs H0, H1, H2, H3, H4
(five 32-bit words of feedback = hash to date) and 512 bits of message M_i. A
round contains a loop, iterated 80 times. The output of the loop is added to
the input H0, H1, H2, H3, H4 to give the new hash-to-date.
1. Divide the 512-bit M_i into sixteen 32-bit words W(0) to W(15)
2. Construct W(16) to W(79) from them
3. Set A = H0, B = H1, C = H2, D = H3, E = H4
4. Loop for t = 0 to 79:
   Temp = S^5(A) + f(t, B, C, D) + E + W(t) + K(t)
   E = D, D = C, C = S^30(B), B = A, A = Temp
5. H0 = H0 + A, H1 = H1 + B, H2 = H2 + C, H3 = H3 + D,
   H4 = H4 + E
6. i = i + 1, next block of input
Notes on SHA-1
1. "+" = addition modulo 2^32 (drop overflow)
2. S^n(X) = rotate the 32-bit word X n positions left
3. The function f:
   0 ≤ t ≤ 19:    f(t, B, C, D) = (B & C) OR ((NOT B) & D)
   20 ≤ t ≤ 39:   f(t, B, C, D) = B XOR C XOR D
   40 ≤ t ≤ 59:   f(t, B, C, D) = (B & C) OR (B & D) OR (C & D)
   60 ≤ t ≤ 79:   f(t, B, C, D) = B XOR C XOR D
   (& = logical AND, NOT B = complement of B)
4. The constants K (hex):
   0 ≤ t ≤ 19:    K(t) = 5A827999
   20 ≤ t ≤ 39:   K(t) = 6ED9EBA1
   40 ≤ t ≤ 59:   K(t) = 8F1BBCDC
   60 ≤ t ≤ 79:   K(t) = CA62C1D6
5. Initial values for the H's are (hex):
   H0 = 67452301
   H1 = EFCDAB89
   H2 = 98BADCFE
   H3 = 10325476
   H4 = C3D2E1F0
6. Expansion procedure for t = 16 to 79:
   W(t) = S^1(W(t − 3) XOR W(t − 8) XOR W(t − 14) XOR W(t − 16))
7. Padding of the input message to produce n ∗ 512 bits:
   | message: 101...0011 | padding: 100...00 | message length (64 bits) |
   |<-------------------------- n ∗ 512 bits -------------------------->|
   The padding is a single 1 bit followed by 0 bits. The message length field of 64 bits gives the bit length of the message
   (< 2^64).
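The steps and notes above are enough to write SHA-1 out in full. The following is a compact sketch for whole-byte messages, assuming big-endian packing of words and of the length field (as in the published standard); it is meant for study and is checked against Python's hashlib, which should be used for any real work.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF     # keeps results to 32 bits ("drop overflow")

def rotl(x: int, n: int) -> int:
    """S^n(x): rotate a 32-bit word n positions left."""
    return ((x << n) | (x >> (32 - n))) & MASK

def sha1(message: bytes) -> bytes:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Padding: a 1 bit, then 0 bits, then the 64-bit message length in bits.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    for off in range(0, len(padded), 64):
        # Steps 1 and 2: sixteen words from the block, expanded to eighty.
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h                                  # step 3
        for t in range(80):                                # step 4
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + w[t] + k) & MASK
            e, d, c, b, a = d, c, rotl(b, 30), a, temp
        # Step 5: add the loop output to the hash-to-date.
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)

digest = sha1(b"abc")
print(digest.hex())
print(digest == hashlib.sha1(b"abc").digest())
```

Comparing against the library for message lengths on either side of the 55 and 56 byte boundary is a useful check that the padding rule has been applied correctly.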
6.8 Further Developments
SHA-1, although widely used, has certain weaknesses - probably more
theoretical than practical. Proposals for a better hash function have
been invited (in the manner of the AES project, see Chapter 2), but
as yet no selection of the short-listed or winning candidates has been
made.