CSE P 590 TU: Practical Aspects of Modern Cryptography
Winter 2006 Project
Bing Wu (9735592)

“Chinese” Attacks on Hashes

1. Background

A hash function is a function that takes a variable-size input and returns a fixed-size string, which is called the hash value. The hash value is relatively easy to compute for any given input. A cryptographic hash function is also expected to be one-way and collision-resistant, which are the key properties for our topic.

MD4 is a hash function developed in 1990. It is built from basic arithmetic and logical operations. Since its publication, several other hash functions have been designed using MD4 as the basis, including MD5, HAVAL, RIPEMD, SHA-0, SHA-1, and SHA-256. These hash functions all follow the same design principle as MD4 and have similar structures.

These hash functions play very important roles in digital signatures, data integrity, and many other cryptographic protocols. They not only support information security but also improve efficiency. Among them, MD5 and SHA-1 are the most widely used today.

Ever since these hash functions appeared, people have been trying to develop collision search attacks on them. Developing new hash functions and breaking them are two sides of the same coin: both help people develop better hash functions for cryptographic use.

There have been significant developments in the area in the past several years. However, the real breakthroughs came in 2004 and 2005, when Dr. Xiaoyun Wang and her Chinese team published a series of papers breaking MD4, MD5, HAVAL, RIPEMD, and SHA-1 [2-7]. A. Joux [8] had already broken SHA-0, finding a collision with a time complexity of about $2^{51}$ SHA-0 operations, but the results of Wang's team were significantly better. Below are the best results of the “Chinese” attacks on those hash functions:

The time complexity for finding a collision for MD4 is about $2^{8}$ MD4 operations [5].
The time complexity for finding a collision for MD5 is about $2^{39}$ MD5 operations to find the first blocks and about $2^{32}$ MD5 operations to find the second blocks [6].
The time complexity for finding a collision for HAVAL-128 is about $2^{7}$ HAVAL-128 operations [7].
The time complexity for finding a collision for RIPEMD is about $2^{18}$ RIPEMD operations [5].
The time complexity for finding a collision for SHA-0 is about $2^{39}$ SHA-0 operations [1].
The time complexity for finding a collision for SHA-1 is about $2^{69}$ SHA-1 operations, as announced in [2]; the result was later improved to about $2^{63}$ SHA-1 operations [9].

2. “Chinese” Collision attacks

The “Chinese” collision attacks on hash functions are precise differential attacks in which the differential path is more restrictive, since it depends on the message difference as well as on the specific values of the message bits involved. They do not use exclusive-or as the measure of difference, but modular integer subtraction. Wang's team calls this a modular differential.

The attacks share some principles that apply to all MD4-family hash functions. Such an attack first picks a favorable message differential between two messages M and M’, such that the two messages have a higher probability of having the same hash value. Depending on the message differential, some number of probabilistic conditions must be met. The attack uses a technique called “message modification” to eliminate those conditions that appear in the early stages of the hash compression function. This reduction leads to a better overall complexity for the collision attack.

Basically, the attacks include three steps:

1) Find a collision differential for which M and M’ probably produce a collision. Taking MD4 as an example,

    $\Delta M = M' - M = (\Delta m_0, \Delta m_1, \ldots, \Delta m_{15})$
    $\Delta m_1 = 2^{31}, \quad \Delta m_2 = 2^{31} - 2^{28}, \quad \Delta m_{12} = -2^{16}$
    $\Delta m_i = 0, \quad 0 \le i \le 15, \ i \ne 1, 2, 12$

where each $m_i$ is a 32-bit message word and the differences are taken modulo $2^{32}$. The reason for selecting this collision differential is not clearly specified in [5], but [7] states that, based on a significant amount of analysis, it is believed to be fairly easy for M and M’ to produce a collision with the chosen differential.

2) Derive a set of sufficient conditions which ensure that the collision differential holds. For example, the MD4 compression function has three rounds. Each round uses a different nonlinear Boolean function, defined as follows:

    $F(X, Y, Z) = (X \wedge Y) \vee (\neg X \wedge Z)$
    $G(X, Y, Z) = (X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z)$
    $H(X, Y, Z) = X \oplus Y \oplus Z$

Some properties of these three nonlinear Boolean functions are very helpful for determining sufficient conditions for the differential paths used in the collision search attack on MD4. The sufficient conditions that ensure all the characteristics hold can be verified using these properties. Refer to [5] for details.

3) For any random message M, modify M so that almost all the sufficient conditions hold. This is done with two types of message modification techniques, termed “single-step modification” and “multi-step modification”. Taking MD4 as an example, it is believed that the first two rounds of MD4 are not one-way. Single-step modification modifies M so that all the conditions in round 1 hold. Multi-step modification changes some bits of M so that more conditions in round 2 are fulfilled while all the conditions in round 1 continue to hold. This greatly improves the probability that M and M’ produce a collision. Refer to [5] for details.

3. Results for MD4 and MD5 attacks

In practice, it is computationally feasible to apply the “Chinese” attacks to the MD4 and MD5 hash functions using the computational power of a personal computer.
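As a side check, the message differential of Section 2 can be tested mechanically against a reported pair. The C sketch below is illustrative only and is not taken from the attack programs: the names `md4_f`, `md4_g`, `md4_h`, `expected_delta`, `satisfies_md4_differential`, `run1_m` and `run1_m_prime` are hypothetical, and the two arrays simply restate the first MD4 pair listed in this section. It encodes the three round functions and verifies that each word difference, taken modulo $2^{32}$, matches the differential. It does not compute MD4 itself.

```c
#include <stdint.h>

/* MD4 round functions from Section 2, applied bitwise to 32-bit words. */
uint32_t md4_f(uint32_t x, uint32_t y, uint32_t z) { return (x & y) | (~x & z); }
uint32_t md4_g(uint32_t x, uint32_t y, uint32_t z) { return (x & y) | (x & z) | (y & z); }
uint32_t md4_h(uint32_t x, uint32_t y, uint32_t z) { return x ^ y ^ z; }

/* Expected difference m'_i - m_i (mod 2^32) for word i, per Section 2. */
uint32_t expected_delta(int i)
{
    switch (i) {
    case 1:  return UINT32_C(1) << 31;                          /*  2^31        */
    case 2:  return (UINT32_C(1) << 31) - (UINT32_C(1) << 28);  /*  2^31 - 2^28 */
    case 12: return (uint32_t)0 - (UINT32_C(1) << 16);          /* -2^16        */
    default: return 0;                                          /* no change    */
    }
}

/* Returns 1 if (m, m_prime) has exactly the MD4 message differential, else 0. */
int satisfies_md4_differential(const uint32_t m[16], const uint32_t m_prime[16])
{
    for (int i = 0; i < 16; i++) {
        /* Unsigned subtraction wraps around, i.e. it is taken modulo 2^32. */
        if ((uint32_t)(m_prime[i] - m[i]) != expected_delta(i))
            return 0;
    }
    return 1;
}

/* First MD4 pair reported in this section (copied from the program output). */
const uint32_t run1_m[16] = {
    0x45051308, 0x81730a12, 0xd56ad03c, 0x71628a68,
    0xa54f00e7, 0xb32a6311, 0x0e13c786, 0xb48eae4b,
    0x4656581e, 0x18a6deab, 0x9b50d7b2, 0x0cfc6be7,
    0xb42bdf1e, 0x814dcfbb, 0xb776931d, 0xb27bcba6
};
const uint32_t run1_m_prime[16] = {
    0x45051308, 0x01730a12, 0x456ad03c, 0x71628a68,
    0xa54f00e7, 0xb32a6311, 0x0e13c786, 0xb48eae4b,
    0x4656581e, 0x18a6deab, 0x9b50d7b2, 0x0cfc6be7,
    0xb42adf1e, 0x814dcfbb, 0xb776931d, 0xb27bcba6
};
```

For this pair, `satisfies_md4_differential(run1_m, run1_m_prime)` returns 1: words 1, 2 and 12 differ by exactly the stated amounts and all other words are equal. Using subtraction rather than exclusive-or in the comparison mirrors the modular differential on which the attack is based.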
I have C programs implementing the MD4 attack of [5] and the MD5 attack of [6]. They run under Unix/Linux. I used Cygwin to mimic the Linux environment on my Windows XP system on a 3.40 GHz Pentium 4 machine. I compiled the programs with gcc, using -Os and the appropriate -mcpu and -mtune flags for my machine to get the best performance. Below are two runs of each of the MD4 and MD5 programs. As they show, it takes about 5 seconds to find a collision for MD4 and about 1 hour for MD5. These results clearly demonstrate that both MD4 and MD5 are severely broken and thus should not be used any longer.

    $ time ./md4.exe
    unsigned int m0[16] = {
    0x45051308, 0x81730a12, 0xd56ad03c, 0x71628a68,
    0xa54f00e7, 0xb32a6311, 0x0e13c786, 0xb48eae4b,
    0x4656581e, 0x18a6deab, 0x9b50d7b2, 0x0cfc6be7,
    0xb42bdf1e, 0x814dcfbb, 0xb776931d, 0xb27bcba6 };
    unsigned int m1[16] = {
    0x45051308, 0x01730a12, 0x456ad03c, 0x71628a68,
    0xa54f00e7, 0xb32a6311, 0x0e13c786, 0xb48eae4b,
    0x4656581e, 0x18a6deab, 0x9b50d7b2, 0x0cfc6be7,
    0xb42adf1e, 0x814dcfbb, 0xb776931d, 0xb27bcba6 };

    real 0m4.506s
    user 0m4.496s
    sys  0m0.020s

    $ time ./md4.exe
    unsigned int m0[16] = {
    0xcbe53b38, 0x5032eef7, 0xb019844f, 0xebd1e372,
    0x98d286ff, 0x76430bc9, 0xd2a8b026, 0xc2c5c353,
    0x6d4d2c65, 0xa011e2ac, 0x61eec40e, 0x434b154e,
    0xebafa851, 0xa9601efa, 0xc48a9a59, 0x578bbb57 };
    unsigned int m1[16] = {
    0xcbe53b38, 0xd032eef7, 0x2019844f, 0xebd1e372,
    0x98d286ff, 0x76430bc9, 0xd2a8b026, 0xc2c5c353,
    0x6d4d2c65, 0xa011e2ac, 0x61eec40e, 0x434b154e,
    0xebaea851, 0xa9601efa, 0xc48a9a59, 0x578bbb57 };

    real 0m5.467s
    user 0m5.477s
    sys  0m0.010s

    $ time ./md5.exe
    block #1 done
    block #2 done
    unsigned int m0[32] = {
    0x1929aa4b, 0xd8844d17, 0x8e5e1527, 0x34b458d0,
    0xacc2f035, 0xdbe6e5f1, 0x234533f1, 0x6c716baf,
    0x05352d45, 0x61f48bfd, 0x769ae8c3, 0x8fda4316,
    0x754098e2, 0x8c4005d8, 0xc26ca7b4, 0x22f12708,
    0x06dea041, 0xe664ec4e, 0x1d72b3a0, 0x03bdc431,
    0x47d0fc1c, 0x4c7bdc4e, 0x76648928, 0xbea20bd6,
    0x5079c739, 0x4dae799f, 0xbbd34dfa, 0xdc019c7f,
    0x9d18d4e3, 0x253ba683, 0xed5a754e, 0x5a8cd41a, };
    unsigned int m1[32] = {
    0x1929aa4b, 0xd8844d17, 0x8e5e1527, 0x34b458d0,
    0x2cc2f035, 0xdbe6e5f1, 0x234533f1, 0x6c716baf,
    0x05352d45, 0x61f48bfd, 0x769ae8c3, 0x8fdac316,
    0x754098e2, 0x8c4005d8, 0x426ca7b4, 0x22f12708,
    0x06dea041, 0xe664ec4e, 0x1d72b3a0, 0x03bdc431,
    0xc7d0fc1c, 0x4c7bdc4e, 0x76648928, 0xbea20bd6,
    0x5079c739, 0x4dae799f, 0xbbd34dfa, 0xdc011c7f,
    0x9d18d4e3, 0x253ba683, 0x6d5a754e, 0x5a8cd41a, };

    real 68m56.807s
    user 68m42.968s
    sys  0m0.070s

    $ time ./md5.exe
    block #1 done
    block #2 done
    unsigned int m0[32] = {
    0x5c625d50, 0x7c27ef85, 0x24835757, 0x9b4b0d82,
    0xff777f20, 0x0a0777b0, 0xe2ff34b4, 0xd3302c20,
    0x0534ad31, 0x2033e5f7, 0x7fbadc53, 0x12c25f81,
    0x5edab51a, 0x746c590d, 0xad958bb0, 0xfbe53434,
    0x70fd8d2f, 0x2e073782, 0x22c1af9c, 0xe4b8fb96,
    0xc53a1137, 0x0e0f9f6a, 0x66dd690e, 0x8f422950,
    0x9e36a841, 0x05f2ddb9, 0x6aa211be, 0x9c429c6b,
    0x5a3c7cea, 0x661fa395, 0x1668028a, 0x7c61522d, };
    unsigned int m1[32] = {
    0x5c625d50, 0x7c27ef85, 0x24835757, 0x9b4b0d82,
    0x7f777f20, 0x0a0777b0, 0xe2ff34b4, 0xd3302c20,
    0x0534ad31, 0x2033e5f7, 0x7fbadc53, 0x12c2df81,
    0x5edab51a, 0x746c590d, 0x2d958bb0, 0xfbe53434,
    0x70fd8d2f, 0x2e073782, 0x22c1af9c, 0xe4b8fb96,
    0x453a1137, 0x0e0f9f6a, 0x66dd690e, 0x8f422950,
    0x9e36a841, 0x05f2ddb9, 0x6aa211be, 0x9c421c6b,
    0x5a3c7cea, 0x661fa395, 0x9668028a, 0x7c61522d, };

    real 56m53.298s
    user 56m52.757s
    sys  0m0.030s

4. What does it mean and what to do about it?

It means that hash functions such as MD5 are no longer suitable for digital signatures. You can no longer assume that a document a person signed is identical to your version of that document, even if the hash values match.

Among the widely deployed hash functions, SHA-1 currently stands up best against collision attacks, with the best known attack taking on the order of $2^{63}$ operations. While this seems like a huge number, distributed searches using many computers across the Internet have already solved problems about twice as large (on the order of $2^{64}$ operations). In other words, there exist large collections of computers distributed over the Internet that are capable of finding SHA-1 collisions.

To exploit a collision attack, an adversary would typically begin by constructing two messages with the same hash, one of which appears legitimate. The attacker could then try to get you to digitally sign the legitimate message. He would then claim that you actually signed the malicious message, and prove this claim by showing that your signature matches the malicious message. There is a similar concern for systems that rely on signed code and certificates: an adversary might be able to construct a valid signature or certificate request whose hash collides with that of a malicious signature or certificate request. In terms of practical security, the major concern about these new attacks is that they might lead to more efficient attacks, and thus a migration to stronger hashes is believed to be mandatory.

These attacks broke strong collision resistance, but not pre-image (one-way) resistance. That means that it is still infeasible for an attacker to generate an input to a hash function that is guaranteed to produce a particular given output. Because of this, many of the applications that use cryptographic hashes, such as HMAC-related protocols, password storage or document signing, are only minimally affected by the collision attacks.
In the case of document signing, for example, an attacker could not simply fake a signature for an existing document; the attacker would have to fool the private key holder into signing a pre-selected document. The attacks do not make it possible to reverse password hashing either: constructing a password that works for a given account requires a pre-image attack.

In practice, these collision attacks suggest that systems using hash functions should be upgraded sooner. Three viable approaches for improving application security are:

1) Replace the hash function with a stronger one. The most commonly suggested approach is simply to employ SHA-2 hash functions, possibly truncating the output to 160 bits for backward compatibility with SHA-1. In that case, special care must be taken to prevent a “man in the middle” attack from downgrading a SHA-2 session into a more vulnerable SHA-1 session. Moreover, all new applications and protocols must be designed to have better collision resistance, and they need to be able to accommodate new hash standards as they are developed.

2) Alter the protocol so that it no longer requires the hash function to be collision resistant. A recent proposal suggests adding randomness to hash functions [11]. To implement this, the application must have a good source of randomness and must alter the protocol.

3) Implement simple message pre-processing to convert plaintext messages into a form that makes all existing collision attacks inapplicable. This approach can be accomplished with minimal code change [12]. This practical alternative is appealing for applications that want to extend the secure life of SHA-1.

The bottom line: Don’t use MD4. Don’t use MD5. Don’t use HAVAL. Don’t use RIPEMD. Don’t use SHA-0. While SHA-1 is showing signs of significant problems in some areas, it remains stronger than the others.
However, avoid using SHA-1 if possible, because it is likely to be the next one broken in practice. Use SHA-2 hash functions for now and wait for more collision-resistant hash functions. The SHA-2 family currently resists the known SHA-1 attacks, and theoretical attacks against SHA-2 hash functions may take a few years to turn into practical attacks. However, the SHA-2 functions are all potentially vulnerable as well, because they share the design principle of the other MD4-family hash functions. VSH is perhaps the best recently published alternative [10], but it needs more peer review before it can be seriously considered. Alternatively, use the approaches described in [11-12].

5. Conclusion

The “Chinese” attacks on hashes are a remarkable result in cryptography. They are pushing people to upgrade their systems to better hash functions and to develop new, more collision-resistant hash functions for cryptographic use. This will greatly help us achieve a more secure digital world.

References

1) Xiaoyun Wang, Hongbo Yu, Yiqun Lisa Yin, Efficient Collision Search Attacks on SHA-0, Crypto'05.
2) Xiaoyun Wang, Yiqun Lisa Yin, Hongbo Yu, Finding Collisions in the Full SHA-1, Crypto'05.
3) Xiaoyun Wang, Yiqun Lisa Yin, Hongbo Yu, Collision Search Attacks on SHA1, 2005.
4) Xiaoyun Wang, Dengguo Feng, Xuejia Lai, Hongbo Yu, Collisions for Hash Functions MD4, MD5, HAVAL-128, RIPEMD, Crypto'04.
5) Xiaoyun Wang, Xuejia Lai, Dengguo Feng, Hui Chen, Xiuyuan Yu, Cryptanalysis of the Hash Functions MD4 and RIPEMD, Eurocrypt'05.
6) Xiaoyun Wang, Hongbo Yu, How to Break MD5 and Other Hash Functions, Eurocrypt'05.
7) Xiaoyun Wang, Dengguo Feng, Xiuyuan Yu, An Attack on Hash Function HAVAL-128, Science in China Series E.
8) A. Joux, Collisions for SHA-0, Rump session of Crypto'04, August 2004.
9) Xiaoyun Wang, Andrew Yao, Frances Yao, New Collision Search for SHA-1, Rump session of Crypto'05, August 2005.
10) Scott Contini, Arjen K. Lenstra, Ron Steinfeld, VSH, an Efficient and Provable Collision Resistant Hash Function, Rump session of Crypto'05, August 2005.
11) S. Halevi, H. Krawczyk, Strengthening Digital Signatures via Randomized Hashing, Internet Draft, 2005.
12) M. Szydlo, Y. Yin, Collision-Resistant Usage of MD5 and SHA-1 via Message Preprocessing, IACR ePrint Archive 2005 #248.