Hash Function Design
Overview of the basic components in SHA-3 competition
Daniel Joščák
daniel.joscak@i.cz
S.ICZ a.s.
Hvězdova 1689/2a, 140 00 Prague 4;
Faculty of Mathematics and Physics,
Charles University, Prague
Abstract
In this article we give an overview of the basic building blocks used in the design of new hash functions
submitted to the SHA-3 competition. We briefly present the currently widely used hash functions MD5,
SHA-1, SHA-2 and RIPEMD-160. At the end we consider several properties of the candidates and give
examples of candidates in the SHA-3 competition.
Keywords: SHA-3 competition, hash functions.
1 Introduction
In 2004 a group of researchers led by Xiaoyun Wang (Shandong University, China) presented real
collisions in MD5 and other hash functions at the rump session of the Crypto conference; they explained
the method in [10]. In 2005 the same group presented a collision attack on SHA-1 in [9], and since then a
lot of progress has been made in collision-finding algorithms. Although there is no specific reason to
believe that a practical attack on any of the SHA-2 family of hash functions is imminent, a successful
collision attack on an algorithm in the SHA-2 family could have catastrophic effects for digital signatures.
In reaction to this situation the National Institute of Standards and Technology (NIST) created a public
competition for a new hash algorithm standard, SHA-3 [1]. In addition to the obvious requirements on a
hash function (i.e. collision resistance, first and second preimage resistance, …) NIST expects SHA-3 to
have a security strength at least as good as that of the hash algorithms in the SHA-2 family, and expects
this security strength to be achieved with significantly improved efficiency. NIST also desires that the
SHA-3 hash functions be designed so that a possibly successful attack on the SHA-2 hash functions is
unlikely to be applicable to SHA-3.
The submission deadline for new designs was October 31, 2008, and 51 algorithms were submitted to the
competition. Many new ideas appeared in the submissions, but the candidates also share several
common properties. We try to summarize the common building blocks that appeared and to categorize the
submissions according to them. The information about NIST’s organization of the SHA-3 competition,
algorithm speed and the current state of attacks is taken from, and can be found at, the NIST web page [1],
the eBASH project [5] and the SHA-3 Zoo [4]. A very good comparison and categorization of the candidates
can be found in [7].
Security and Protection of Information 2009
2 Desired properties
In this section we briefly present definitions of the properties that good hash functions, and therefore
candidates for the SHA-3 algorithm, must have.
Collision resistant: a hash function H is collision resistant if it is hard to find two distinct inputs that
hash to the same output (that is, two distinct inputs m1 and m2, such that H(m1) = H(m2)).
Every hash function with more inputs than outputs will necessarily have collisions. Consider a hash
function SHA256 that produces 256 bits of output from an arbitrarily large input. Since it must generate
one of 2^256 outputs for each member of a much larger set of inputs, the pigeonhole principle guarantees
that some inputs will hash to the same output. Collision resistance doesn't mean that no collisions exist;
simply that they are hard to find.
The birthday paradox sets an upper bound on collision resistance: if a hash function produces N bits of
output, an attacker can expect to find a collision after computing only about 2^(N/2) hashes of distinct
inputs, i.e. by hashing until two outputs happen to match. If there is an easier method than this brute
force attack, it is considered a flaw in the hash function.
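The birthday bound can be observed on a deliberately weakened hash. The following is a minimal sketch, assuming a toy N-bit hash obtained by truncating SHA-256; the helper names truncated_hash and find_collision are hypothetical and chosen only for this illustration.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bits: int) -> int:
    """Return the top n_bits of SHA-256(data) as an integer (a toy N-bit hash)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)


def find_collision(n_bits: int):
    """Hash distinct inputs until two of them share an output.

    Returns (m1, m2, attempts). The expected number of attempts is on the
    order of 2^(n_bits/2), far fewer than the 2^n_bits possible outputs.
    """
    seen = {}  # maps hash value -> first message that produced it
    for i in count():
        message = str(i).encode()
        h = truncated_hash(message, n_bits)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message


m1, m2, attempts = find_collision(24)
print(m1, m2, attempts)
```

For a 24-bit output the search typically ends after a few thousand hashes, which is in line with 2^12 rather than 2^24.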
First preimage resistant: a hash function H is said to be first preimage resistant (sometimes only preimage
resistant) if given h it is hard to find any m such that h = H(m).
Second preimage resistant: a hash function H is said to be second preimage resistant if, given an input
m1, it is hard to find another input m2 (not equal to m1) such that H(m1) = H(m2).
A preimage attack differs from a collision attack in that a fixed hash value or message is being
attacked, and in its complexity. Ideally, a preimage attack on an n-bit hash function takes on the order
of 2^n operations to succeed.
Resistant to length-extension attacks: given H(m) and the length of m, but not m itself, an attacker must
not be able to calculate H(m || m') for a suitably chosen m', where || denotes concatenation.
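To illustrate why this property matters, the following sketch mounts a length-extension attack on a toy iterated hash. Everything here is an assumption made for the illustration: the 32-byte block size, the all-zero initial value, the compression function built from truncated SHA-256, and the names compress, pad and toy_hash. The suitable m' consists of the padding of m followed by data chosen by the attacker.

```python
import hashlib

BLOCK = 32      # toy block size in bytes (assumption)
IV = bytes(16)  # toy 128-bit initial value (assumption)


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:16]


def pad(length: int) -> bytes:
    """Padding for a message of `length` bytes: 0x80, zeros, 8-byte bit length."""
    zeros = (-(length + 1 + 8)) % BLOCK
    return b"\x80" + bytes(zeros) + (8 * length).to_bytes(8, "big")


def toy_hash(message: bytes, state: bytes = IV, prefix_len: int = 0) -> bytes:
    """Plain iterated hash; prefix_len counts bytes absorbed before `state`."""
    data = message + pad(prefix_len + len(message))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state  # the full chaining value is output, which enables the attack


secret = b"unknown to the attacker"
digest = toy_hash(secret)  # the attacker sees this value ...
known_len = len(secret)    # ... and the length, but not `secret` itself

suffix = b"appended by attacker"
glue = pad(known_len)      # the padding the hash applied internally
forged = toy_hash(suffix, state=digest, prefix_len=known_len + len(glue))

# The forged value equals the genuine hash of secret || glue || suffix.
assert forged == toy_hash(secret + glue + suffix)
```

The attack works because the output is the complete internal state; an output transformation or a wider internal state removes that shortcut.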
Efficiency: computation of a hash function must be efficient, i.e. speed matters. Hash functions are widely
deployed in many applications and it is important to have fast implementations on different architectures.
During the first SHA-3 conference the NIST organizers announced that they would initially focus on Intel
Architecture 32-bit (IA-32) and Advanced Micro Devices 64-bit (AMD64), but that performance on other
platforms would not be overlooked. They also asked whether the algorithms remain secure when submitters
adjust the tunable parameters of their candidates to run as fast as SHA-256 and SHA-512 on IA-32 and AMD64.
A candidate that does not has lower chances in the competition.
Memory requirements and code size are very important for implementation on various embedded systems
such as smart cards.
HMAC construction: the hash function must have at least one construction to support HMAC (or an
alternative MAC construction) as a pseudorandom function (PRF), i.e. it must be hard to distinguish HMAC_K
based on H from a random function.
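For reference, HMAC wraps the hash function H in two nested calls keyed through an inner and an outer pad. The sketch below writes the construction out over SHA-256 (chosen here only as an example of H) and cross-checks it against Python's standard hmac module; the function name hmac_sha256 is hypothetical.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC written out step by step, with SHA-256 playing the role of H."""
    block_size = 64  # SHA-256 processes 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(block_size, b"\x00")    # short keys are zero-padded
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()


key, message = b"secret key", b"message to authenticate"
tag = hmac_sha256(key, message)

# Cross-check against the standard library implementation.
assert hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).digest())
```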
3 Current hash functions
We briefly describe the four best-known and most widely used hash algorithms to show the evolution of hash
functions. All of the functions use the same message padding (appending a “1” bit, then zeroes and the
length of the message, so that the padded message is a multiple of the block size of the compression
function). All of the functions use the Merkle-Damgård construction built from a compression function,
which is shown in Figure 1. All but RIPEMD-160 use the Davies-Meyer construction of a compression function
from a block cipher. And all of the functions use very simple register instructions: the logical operators
or, and, xor in a simple nonlinear function, modular addition, shift and rotation. The functions mainly
differ (apart from the obvious lengths of the registers, message blocks and outputs) in the complexity of
the message expansion function and the step function, which are parts of the compression function. The
newer the function, the more complex its message expansion and step function.
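[Figure: the initial value IV and the message blocks M1, M2, ..., Mn pass through successive applications of the compression function f to produce the output.]
Figure 1: Merkle-Damgård construction.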
3.1 MD5
MD5 was designed by Ron Rivest in 1991 as a successor to MD4; its output is 128 bits long. The message
expansion is very simple: the identity and permutations of the message-block registers. The step function
is shown in Figure 2. The first cryptanalysis appeared in 1993 [6]. Real collisions have been known since
2004 [10]. It is no longer recommended to use this function for cryptographic purposes.
Figure 2: MD5 step function, F is simple nonlinear function (taken from wikipedia).
3.2 SHA-1
The specification was published in 1995 as the Secure Hash Standard, FIPS PUB 180-1, by NIST. The output
of the function has a length of 160 bits. It was a successor of SHA-0, which was withdrawn by NSA shortly
after its publication and superseded by the revised version. SHA-1 differs from SHA-0 only by a single
bitwise rotation in the message schedule of its compression function; this was done, according to NSA, to
correct a flaw in the original algorithm which reduced its cryptographic security. It is the most common
hash function used today.
In 2005 a collision attack on SHA-1 was presented in [9]. No real collisions have been found to date, but
the complexity of the attack is claimed to be roughly 2^61. It is not recommended to use this function for
new applications.
Figure 3: SHA-1 step function, F is simple nonlinear function (taken from wikipedia).
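The components described in this section (padding, Merkle-Damgård iteration, message expansion, step function and feed-forward) can be seen together in a compact SHA-1 sketch. It is written for readability rather than speed, is checked against hashlib in the usage note, and is not meant for production use; the helper name rotl is an assumption of this sketch.

```python
import hashlib
import struct


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1(message: bytes) -> bytes:
    # Padding: a single 1 bit, zeros, then the 64-bit message length in bits.
    padded = message + b"\x80"
    padded += bytes((-len(padded) - 8) % 64)
    padded += struct.pack(">Q", 8 * len(message))

    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    for offset in range(0, len(padded), 64):
        # Message expansion: the 16 block words are stretched to 80 words.
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for t in range(16, 80):
            # SHA-0 omits this one-bit rotation.
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            # Step function: rotation, nonlinear function F, modular addition.
            temp = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, rotl(b, 30), c, d

        # Feed-forward: add the block result to the previous chaining value.
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    return struct.pack(">5I", *h)


assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
```

The outer loop over 64-byte blocks is the Merkle-Damgård iteration of Figure 1, with h as the chaining value.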
3.3 SHA-2
SHA-2 is a family of four hash functions: SHA-224, SHA-256, SHA-384 and SHA-512. The algorithms
were first published in the draft FIPS PUB 180-2 in 2001. The 384- and 512-bit versions use different
constants, 64-bit registers and 1024-bit message blocks in their compression functions. Otherwise
they are the same. SHA-2 functions have the same construction properties as SHA-1, but no successful
application of the previous attacks on SHA-1 or MD5 to them has been published. This is believed to be
due to their more complex message expansion and step function. Nowadays users are strongly encouraged
to move to these functions.
Figure 4: SHA-2 step function; Ch, Ma, Σ0 and Σ1 are less trivial functions (taken from Wikipedia).
3.4 RIPEMD-160
RIPEMD-160 is a 160-bit cryptographic hash function designed by H. Dobbertin, A. Bosselaers and B.
Preneel. It is intended to be used as a secure replacement for the 128-bit hash functions MD4 and MD5.
The speed of the algorithm is similar to that of SHA-1, but the structure of the algorithm is different, as
shown in Figure 5. It uses a balanced Feistel network known from the theory of block ciphers. There are
no known successful attacks on RIPEMD-160, and the function is, together with the SHA-2 family,
recommended by ETSI 102176-1.
Figure 5: RIPEMD compression function.
4 Building blocks
In this section we provide a list of common building blocks that appeared in the SHA-3 competition. The
list may not be complete and the candidates may share other common properties. For each building block
we tried to summarize its pros and cons and to give some examples of candidates that use the design
strategy. Links to the documentation of the candidates can be found at the NIST web site [1].
4.1 Feedback Shift Register (FSR)
Linear and nonlinear feedback shift registers are often used in stream ciphers. Because of their good
pseudorandom properties, easy implementation in hardware and well-known theory, they are good
candidates for use as building blocks in a compression function.
Pros: efficiency in HW, known theory from stream ciphers, easy to implement.
Cons: implementation in SW may be slow; possible drawbacks of stream ciphers such as long initialization.
Examples: MD6, Shabal, Essence, NaSHA.
4.2 Feistel Network
A Feistel network is a general method for transforming any function into a permutation. The strategy has
been used in the design of many block ciphers and because hash functions are often based on a block
cipher it is used there as well. A Feistel network works as follows: take a block of n bits and divide
it into two parts, called L and R. A round of the cipher is calculated from the previous round by
setting L_i = R_(i-1) and R_i = L_(i-1) XOR f(R_(i-1), K_i), where K_i is the subkey used in the i-th round
and f is an arbitrary round function. If L and R are of the same size, the Feistel network is said to be
balanced; if they are not, the Feistel network is said to be unbalanced.
Pros: theory and proofs from block ciphers.
Cons: cannot be generalized.
Examples: ARIRANG, BLAKE, Chi, CRUNCH, DynamicSHA2, JH, Lesamnta, Sarmal, SIMD, Skein,
TIB3.
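The round rule of a balanced Feistel network, and the fact that it yields a permutation even when f itself is not invertible, can be sketched as follows. The round function built from truncated SHA-256 and the names round_function, feistel_encrypt and feistel_decrypt are assumptions made only for this illustration and are not taken from any candidate.

```python
import hashlib


def round_function(right: bytes, subkey: bytes) -> bytes:
    """Hypothetical round function f; it does not need to be invertible."""
    return hashlib.sha256(subkey + right).digest()[:len(right)]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel_encrypt(block: bytes, subkeys) -> bytes:
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for k in subkeys:
        # L_i = R_(i-1),  R_i = L_(i-1) XOR f(R_(i-1), K_i)
        left, right = right, xor(left, round_function(right, k))
    return left + right


def feistel_decrypt(block: bytes, subkeys) -> bytes:
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for k in reversed(subkeys):
        # Undo one round: R_(i-1) = L_i,  L_(i-1) = R_i XOR f(L_i, K_i)
        left, right = xor(right, round_function(left, k)), left
    return left + right


subkeys = [b"k1", b"k2", b"k3", b"k4"]
block = b"sixteen byte blk"  # 16 bytes, split into two 8-byte halves
ciphertext = feistel_encrypt(block, subkeys)
assert feistel_decrypt(ciphertext, subkeys) == block
```

Decryption only re-evaluates f in the forward direction, which is why any function can be turned into a permutation this way.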
4.3 Final Output Transformation
A method used in some of the hash functions to prevent length-extension attacks.
Pros: helps to prove security properties and counters the length-extension attack.
Cons: two different transformations (compression function and output transformation).
Examples: Cheetah, Chi, Crunch, ECHO, ECOH, Grostl, Keccak, Lane, Luffa, Lux, Skein, Vortex.
4.4 Message expansion
A method for preparing the message blocks as input for the steps of the compression function, similar to
key expansion in block ciphers.
Pros: theory from block ciphers, where it is known as key expansion.
Cons: cannot be generalized.
Examples: ARIRANG, BLAKE, Cheetah, Chi, CRUNCH, ECOH, Edon-R, Hamsi, Khichidi-1, LANE,
Lesamnta, SANDstorm, Shabal, SHAvite-3, SIMD, Skein, TIB3.
4.5 S-box
Used for substitution to obscure the relationship between the key (message block) and the ciphertext
(value of the intermediate chaining variable). Because AES is widespread and its properties are well
known, the majority of hash functions submitted to the first round used S-boxes from AES.
Pros: theory from block ciphers (key expansion), speed in HW.
Cons: often implemented as look-up tables, which can open the door to side-channel attacks.
Examples: Cheetah, Chi, CRUNCH, ECHO, ECOH, Grostl, Hamsi, JH, Khichidi-1, LANE, Lesamnta,
Luffa, Lux, SANDstorm, Sarmal, SHAvite-3, SWIFFTX, TIB3. (33 out of 51 candidates use S-boxes.)
4.6 Wide Pipes
A countermeasure against multi-collisions and multi-preimages of the Joux type [8]. A wide-pipe design
keeps the intermediate chaining variable longer than the hash output, e.g. 512 bits for a
256-bit hash.
Pros: prevents multi-collisions.
Cons: it is more complex and less efficient to produce a chaining variable of double length that keeps
the good properties of a chaining variable.
Examples: ARIRANG, BMW, Chi, ECHO, Edon-R, Grostl, JH, Keccak, Lux, MD6, SIMD.
4.7 MDS Matrixes
Good diffusion properties in the theory of block ciphers are often achieved by using Maximum
Distance Separable (MDS) matrixes. These matrixes might also be helpful in the design of hash functions.
Pros: mathematical background and proven diffusion properties
Cons: memory requirements
Examples: ARIRANG, Cheetah, ECHO, Fugue, Grostl, JH, LANE, Lux, Sarmal, Vortex.
4.8 Tree structure
A tree structure of hashing is an intuitive approach which takes advantage of parallelism from independent
compression function threads and counters current attacks on the Merkle-Damgård construction.
Pros: parallelism, resistant against current attacks on SHA-1 and MD5
Cons: memory requirements and “modes” of operation
Example: MD6.
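The idea of tree hashing can be sketched with a simple binary tree. This is only an illustration of the principle and not the MD6 mode of operation; the 64-byte leaf size, the use of SHA-256 for leaves and inner nodes, the one-byte prefixes that separate leaves from inner nodes, and the names leaf, node and tree_hash are all assumptions of this sketch.

```python
import hashlib

CHUNK = 64  # leaf size in bytes (assumption)


def leaf(chunk: bytes) -> bytes:
    """Hash one message chunk; the 0x00 prefix marks a leaf."""
    return hashlib.sha256(b"\x00" + chunk).digest()


def node(left: bytes, right: bytes) -> bytes:
    """Combine two child hashes; the 0x01 prefix marks an inner node."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def tree_hash(message: bytes) -> bytes:
    chunks = [message[i:i + CHUNK] for i in range(0, len(message), CHUNK)] or [b""]
    # Leaves do not depend on each other, so this step could run in parallel.
    level = [leaf(c) for c in chunks]
    while len(level) > 1:
        paired = [node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # an unpaired node moves up unchanged
        level = paired
    return level[0]


digest = tree_hash(b"a" * 200)  # four leaves, two inner nodes, one root
print(digest.hex())
```

Each level of the tree must be kept in memory until it is combined, which reflects the memory cost listed under Cons.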
4.9 Sponge structure
A sponge works in two phases: “absorbing” the message into the state and then “squeezing” the state to
produce an output. Absorbing works as follows:
• Initialize state
• XOR some of the message to the state
• Apply compression function
• XOR some more of the message into the state
• Apply compression function …
Squeezing works as follows:
• Apply compression function
• Extract some output
• Apply compression function
• Extract some output
• Apply compression function …
Examples: Keccak, Luffa.
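The absorbing and squeezing steps above can be sketched with a toy sponge. The sizes RATE and CAPACITY, the simple padding rule and the state transformation built from SHA-256 are assumptions made for this illustration; real sponge designs such as Keccak use a dedicated fixed permutation and their own padding.

```python
import hashlib

RATE = 8       # bytes of state that the message is XORed into (assumption)
CAPACITY = 24  # bytes of state that are never output directly (assumption)


def transform(state: bytes) -> bytes:
    """Stand-in for the state transformation (not a real permutation)."""
    return hashlib.sha256(state).digest()  # 32 bytes = RATE + CAPACITY


def sponge_hash(message: bytes, out_len: int) -> bytes:
    # Simple padding (assumption): 0x80, then zeros up to a multiple of RATE.
    padded = message + b"\x80"
    padded += bytes(-len(padded) % RATE)

    state = bytes(RATE + CAPACITY)  # initialize state
    # Absorbing: XOR a message chunk into the state, then transform.
    for i in range(0, len(padded), RATE):
        chunk = padded[i:i + RATE]
        mixed = bytes(s ^ m for s, m in zip(state[:RATE], chunk))
        state = transform(mixed + state[RATE:])

    # Squeezing: extract RATE bytes of output, transform, and repeat.
    output = b""
    while len(output) < out_len:
        output += state[:RATE]
        state = transform(state)
    return output[:out_len]


print(sponge_hash(b"abc", 20).hex())
```

Because squeezing can continue for as long as needed, the same structure yields outputs of any requested length.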
4.10 Merkle-Damgård-like structure
Structures very similar to the Merkle-Damgård construction of hash functions are still very popular. The
Merkle-Damgård construction is shown in Figure 1; the proposed techniques use various ways of chaining
intermediate variables or context.
Pros: known structure, speed
Cons: previous attacks, multi-collisions and length-extension attacks still have to be prevented.
Examples: ARIRANG, CRUNCH, Cheetah, Chi, LANE, Sarmal.
5 Conclusion
We have tried to present an up-to-date overview of the design of hash functions. We showed the traditional
design techniques and presented some of the building blocks of the algorithms submitted to the SHA-3
competition along with their pros and cons.
References
[1] National Institute of Standards and Technology: Cryptographic Hash Project.
http://csrc.nist.gov/groups/ST/hash/index.html
[2] National Institute of Standards and Technology: SHA-3 First Round Candidates.
http://csrc.nist.gov/groups/ST/hash/sha-3/Round1/submissions_rnd1.html
[3] Souradyuti Paul: First SHA-3 conference organized by NIST.
http://csrc.nist.gov/groups/ST/hash/sha3/Round1/Feb2009/documents/Soura_TunableParameters.pdf
[4] IAIK Graz: SHA-3 ZOO. http://ehash.iaik.tugraz.at/index.php?title=The_SHA3_Zoo&oldid=3035
[5] Daniel J. Bernstein and Tanja Lange (editors): eBACS: ECRYPT Benchmarking of Cryptographic
Systems. http://bench.cr.yp.to, accessed 27 March 2009.
[6] Bert den Boer and Antoon Bosselaers: Collisions for the Compression Function of MD5. pp. 293–304.
ISBN 3-540-57600-2.
[7] Ewan Fleischmann, Christian Forler and Michael Gorski: Classification of the SHA-3
Candidates. Cryptology ePrint Archive: Report 511/2008, http://eprint.iacr.org/, version 0.81, 16
February 2009.
[8] A. Joux: Multicollisions in iterated hash functions. Application to cascaded constructions.
Proceedings of Crypto 2004, LNCS 3152, pages 306-316.
[9] Wang X., Yin Y. L., and Yu H.: Finding collisions in the full SHA-1. In Victor Shoup, editor,
Advances in Cryptology - CRYPTO ’05, volume 3621 of Lecture Notes in Computer Science,
pages 17-36. Springer, 2005, 14-18 August 2005.
[10] Wang X. and Yu H.: How to Break MD5 and Other Hash Functions. In Ronald Cramer, editor,
Advances in Cryptology - EUROCRYPT 2005, volume 3494 of Lecture Notes in Computer
Science, pages 19-35. Springer, 2005.