THE SHA-1 ALGORITHM
Amit Keswani and Vaibhav Khadilkar
Lamar University Computer Science Department, Beaumont, TX 77710, USA

Abstract

This paper discusses the Secure Hash Algorithm (SHA-1), originally developed by the National Security Agency (NSA) as SHA-0 and later handed over to the National Institute of Standards and Technology (NIST). To correct a flaw in the original algorithm, the NSA later presented a revised version of SHA-0 and referred to it as SHA-1 [1]. SHA-1 is a hash function that takes a variable length input message and produces a fixed length output message called the hash or the message digest of the original message. The paper also presents the results of an implementation of the SHA-1 algorithm. The SHA-1 algorithm is of particular importance because of its use with the Digital Signature Algorithm (DSA) for digital signatures.

I. Introduction

A hash function takes a variable length message and produces a fixed length message as its output. This output message is called the hash or message digest of the original input message. The trick behind building a good, secure cryptographic hash function is to devise a good compression function in which each input bit affects as many output bits as possible [2]. The SHA-1 algorithm belongs to a set of cryptographic hash functions similar to the MD family of hash functions. The main difference between SHA-1 and the MD family is that SHA-1 uses the input bits more frequently during the course of the hash function than MD4 or MD5 do [2]. As a result, SHA-1 is more secure than MD4 [3] or MD5 [4], but at the expense of slower execution. The original specification of the algorithm was published in May 1993, whereas the revised version was published in 1995 [1]. The algorithm was based on principles similar to those in the design of the MD4 and MD5 algorithms [1].
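As a quick illustration of the fixed-length property, the following sketch hashes inputs of different lengths and always obtains 40 hex digits (160 bits). It relies on the SHA-1 implementation that ships with the Java standard library (java.security.MessageDigest), not on the implementation described later in this paper. The class name DigestLengthDemo and the helper sha1Hex are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class: shows that SHA-1 output length does not depend on input length.
final class DigestLengthDemo {

    // Returns the SHA-1 digest of an ASCII string as 40 lowercase hex digits.
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.US_ASCII));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex(""));    // da39a3ee5e6b4b0d3255bfef95601890afd80709
        System.out.println(sha1Hex("abc")); // a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```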
For a message of fewer than 2^64 bits, the algorithm computes a 160-bit condensed output called a message digest [5]. The SHA-1 algorithm is designed so that it is practically infeasible to find two input messages that hash to the same output message. It is also practically infeasible to deduce the original input message given only the output hash message.

This paper is organized as follows. Section II describes the alphabet that is used with this algorithm along with the operations that can be performed on it; it is usually advisable to use hex digits. Section III discusses how the given input message is first padded to make its length a multiple of 512 bits. Section IV describes the functions and the constants that are used in this algorithm. Section V presents the actual SHA-1 algorithm, followed by Section VI, which explains the implementation of the SHA-1 algorithm. Section VII gives the results of applying the SHA-1 algorithm to five different input messages. Section VIII surveys the known attacks on SHA-0 and SHA-1, Section IX describes applications, including the algorithm's use with the Digital Signature Algorithm, and Section X concludes.

II. Alphabet and Operators

This section outlines the alphabet that is used with this algorithm. It is better to employ hex digits as they are easier to use than binary digits. A hex digit ranges from 0 to F and represents a 4-bit binary string. In this algorithm we use words, which are 32-bit binary strings or, equivalently, 8 hex digits. An integer in the range 0 to 2^32 - 1 can be represented as a word, since its binary representation has at most 32 digits; the largest value, 2^32 - 1, has all 32 digits equal to 1. Once the given message has been padded, the padded message is divided into 512-bit blocks. Each block is represented as a sequence of 16 words. The logical operations that are applied to words are as follows [5]:

A. Bitwise logical operators

The bitwise logical “and” operator is represented as “X ^ Y”, where X and Y are words. The bitwise logical “or” operator is represented as “X || Y”, where again X and Y are words. The bitwise logical “exclusive-or” operator is represented as “X XOR Y”. The bitwise logical “complement” operator is represented as “~ X”.

B. Addition Operator

The “word addition” operator is represented by “Z = X + Y”, where X, Y and Z are words representing integers x, y and z respectively such that 0 <= x, y, z < 2^32 and z = (x + y) mod 2^32.

C. Circular Shift Operator

The “circular shift” operator is represented by “S^n(X)”, where X is a word and n is an integer with 0 <= n < 32, and S^n(X) = (X << n) OR (X >> (32-n)). Here (X << n) means discarding the leftmost n bits of X and padding the result with n zeroes on the right. Similarly, (X >> (32-n)) means discarding the rightmost (32-n) bits of X and padding the result with (32-n) zeroes on the left. The result is a circular shift of X by n positions to the left. For example, consider the integer 9 represented as a 4-bit number, i.e. 1001. With a word length of 4 in place of 32, a circular shift by 1 is carried out as follows:

(X << n) -> (1001 << 1) -> 0010
(X >> (4-n)) -> (1001 >> 3) -> 0001
S^n(X) -> 0010 OR 0001 -> 0011

III. Message Padding

As stated before, the SHA-1 algorithm produces a condensed representation of the given input message or data file. The input message is considered as a bit string, and the length of the message is the number of bits in that string. The purpose of message padding is to produce a padded message whose length is a multiple of 512 bits. The reason is that the SHA-1 algorithm processes a message as n 512-bit blocks when computing the message digest [2]. Padding is done as follows: a ‘1’ bit is first appended to the original message, followed by as many ‘0’ bits as are needed for the resulting length to be 64 bits short of the next multiple of 512 bits.
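The padding rule can be sketched as follows for messages that are a whole number of bytes, including the 64-bit length field that fills the remaining 64 bits of the last block. This is a minimal sketch under that byte-alignment assumption, not the implementation discussed in Section VI; the class name Sha1PaddingSketch and the method pad are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating SHA-1 message padding for byte-aligned input.
final class Sha1PaddingSketch {

    // Returns the message followed by a '1' bit, '0' bits, and the 64-bit bit length,
    // so that the total length is a multiple of 64 bytes (512 bits).
    static byte[] pad(byte[] message) {
        long bitLength = (long) message.length * 8;
        // 1 byte for 0x80 (the '1' bit plus seven '0' bits) and 8 bytes for the length field.
        int paddedLength = ((message.length + 1 + 8 + 63) / 64) * 64;
        byte[] padded = new byte[paddedLength]; // new arrays are zero-filled in Java
        System.arraycopy(message, 0, padded, 0, message.length);
        padded[message.length] = (byte) 0x80;
        // Store the original length in bits, most significant byte first, in the last 8 bytes.
        for (int i = 0; i < 8; i++) {
            padded[paddedLength - 1 - i] = (byte) (bitLength >>> (8 * i));
        }
        return padded;
    }

    public static void main(String[] args) {
        byte[] block = pad("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(block.length); // 64
    }
}
```

For the 24-bit message “abc” this yields a single 64-byte block that begins with the bytes 61 62 63 80 and ends with 00 00 00 18, which agrees with the words W[0] and W[15] of Example 1 in Section VII.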
The last 64 bits of the last 512-bit block are reserved for representing the length of the original unpadded message. The result of message padding is a padded message containing 16*n words for some n > 0 [5].

IV. Functions and Constants

A sequence of logical functions f_0, f_1, f_2, ..., f_79 and constant words K_0, K_1, K_2, ..., K_79 is used in the SHA-1 algorithm. Each function f_t, 0 <= t <= 79, takes three 32-bit words and produces a 32-bit word as output. The functions and the constants used are listed below [5]:

i) 0 <= t <= 19: f_t(B,C,D) = (B AND C) OR ((NOT B) AND D), K_t = 5A827999
ii) 20 <= t <= 39: f_t(B,C,D) = B XOR C XOR D, K_t = 6ED9EBA1
iii) 40 <= t <= 59: f_t(B,C,D) = (B AND C) OR (B AND D) OR (C AND D), K_t = 8F1BBCDC
iv) 60 <= t <= 79: f_t(B,C,D) = B XOR C XOR D, K_t = CA62C1D6

Note that the constants are given in hex only for readability. The actual algorithm uses them as bit strings during computations.

V. SHA-1 Algorithm [2]

The message digest is calculated using the final padded message as n 512-bit blocks. The algorithm makes use of two 160-bit registers, each consisting of five 32-bit sub-registers. In addition, there is a sequence of eighty 32-bit words W[0], W[1], W[2], ..., W[79] that is used for computational purposes. The basic SHA-1 algorithm is presented as follows:

1) The algorithm starts by initializing the five sub-registers of the first 160-bit register X, labeled H0, H1, H2, H3, H4, as follows:
   H0 = 67452301; H1 = EFCDAB89; H2 = 98BADCFE; H3 = 10325476; H4 = C3D2E1F0;
2) From here onwards, SHA-1 iterates through each of the 512-bit message blocks m_0, m_1, m_2, ..., m_(n-1). For each message block m_j, do the following:
   a. Write m_j as a sequence of sixteen 32-bit words, m_j = W[0] || W[1] || W[2] || ... || W[15], where || here denotes concatenation.
   b. Compute the remaining sixty-four 32-bit words as follows: for 16 <= t <= 79,
      W[t] = S^1(W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16]),
      that is, the XOR of the four earlier words circularly shifted left by 1.
   c. Copy the first 160-bit register into the second register as follows:
      A = H0; B = H1; C = H2; D = H3; E = H4;
   d. This step involves a sequence of four rounds, corresponding to the four intervals 0 <= t <= 19, 20 <= t <= 39, 40 <= t <= 59 and 60 <= t <= 79. Each round takes as input the current register values (copied from X in step c) and the words W[t] for that interval, and operates upon them for 20 iterations as follows:
      For t = 0 to 79,
         T = S^5(A) + f_t(B,C,D) + E + W[t] + K_t
         E = D; D = C; C = S^30(B); B = A; A = T
   e. Once all four rounds of operations are completed, the second 160-bit register (A, B, C, D, E) is added to the first 160-bit register (H0, H1, H2, H3, H4) as follows:
      H0 = H0 + A; H1 = H1 + B; H2 = H2 + C; H3 = H3 + D; H4 = H4 + E;
3) Once the algorithm has processed all of the 512-bit blocks, the final value of X becomes the 160-bit message digest.

The basic building block comprises the rotations and XOR operations that are carried out in step (2d).

VI. Implementation Issues

This section briefly discusses the issues that were handled in the implementation of the SHA-1 algorithm. The implementation language was Java.

A. BigInteger Class

The Java math package (java.math) contains an implementation of a BigInteger class. The BigInteger class was employed to make use of the four logical operators (NOT, AND, OR, XOR) as well as the addition (ADD) operator. This package was imported and used directly.

B. Conversion to Binary

The SHA-1 algorithm works on bits, hence the given input string must be converted to its binary representation. Each character is converted to its binary representation using its ASCII value. This conversion may produce fewer than 8 bits. In that case, leading zeros are added so that the binary representation of the character has length 8.

C. Addition mod 2^32

To implement addition mod 2^32 we simply discard the leftmost carry bit that is generated by the addition operation. No other special code is needed to handle this.

E. Special Functions

There are two special functions written in the BitOperation class. The ComputeHexString function first calculates the integer representation of a binary string and then converts that integer to a hexadecimal string. The GetBinaryString function first calculates the integer representation of a hexadecimal string and then converts that integer to a binary string.

VII. Results

This section presents the output of the SHA-1 algorithm on five different inputs. These inputs were taken from [1] and [5].

A. Example 1 [5]

This example consists of the input string “abc”. The binary equivalent of this string is “01100001 01100010 01100011”. The length of this string is 24 bits. We first append a “1” to the binary representation of “abc”. We then append the appropriate number of 0’s followed by the 64-bit binary representation of the length of the string. In this case we have only 1 block of length 512 bits.

The different words of this block are:
W[0] = 61626380 W[1] = 00000000 W[2] = 00000000 W[3] = 00000000
W[4] = 00000000 W[5] = 00000000 W[6] = 00000000 W[7] = 00000000
W[8] = 00000000 W[9] = 00000000 W[10] = 00000000 W[11] = 00000000
W[12] = 00000000 W[13] = 00000000 W[14] = 00000000 W[15] = 00000018

After processing the block, we get the values of Hi as,
H0 = 67452301 + 42541b35 = a9993e36
H1 = efcdab89 + 5738d5e1 = 4706816a
H2 = 98badcfe + 21834873 = ba3e2571
H3 = 10325476 + 681e6df6 = 7850c26c
H4 = c3d2e1f0 + d8fdf6ad = 9cd0d89d

The digest is: a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d

B. Example 2 [5]

This example consists of the string “abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq”. The length of this string is 448 bits. We first append a “1” to the binary representation of the string. We then append the appropriate number of 0’s followed by the 64-bit binary representation of the length of the string. In this case we have 2 blocks of length 512 bits.

The different words of block 1 are:
W[0] = 61626364 W[1] = 62636465 W[2] = 63646566 W[3] = 64656667
W[4] = 65666768 W[5] = 66676869 W[6] = 6768696a W[7] = 68696a6b
W[8] = 696a6b6c W[9] = 6a6b6c6d W[10] = 6b6c6d6e W[11] = 6c6d6e6f
W[12] = 6d6e6f70 W[13] = 6e6f7071 W[14] = 80000000 W[15] = 00000000

After processing block 1, we get the values of Hi as,
H0 = 67452301 + 8ce34517 = f4286818
H1 = efcdab89 + d3ad7c25 = c37b27ae
H2 = 98badcfe + 6b4e1883 = 0408f581
H3 = 10325476 + 74351cd2 = 84677148
H4 = c3d2e1f0 + 86838382 = 4a566572

The words of block 2 are:
W[0] = 00000000 W[1] = 00000000 W[2] = 00000000 W[3] = 00000000
W[4] = 00000000 W[5] = 00000000 W[6] = 00000000 W[7] = 00000000
W[8] = 00000000 W[9] = 00000000 W[10] = 00000000 W[11] = 00000000
W[12] = 00000000 W[13] = 00000000 W[14] = 00000000 W[15] = 000001C0

After processing block 2, the values of Hi are obtained by adding to the values left by block 1:
H0 = f4286818 + 906fd62c = 84983e44
H1 = c37b27ae + 58c0aac0 = 1c3bd26e
H2 = 0408f581 + b6a55520 = baae4aa1
H3 = 84677148 + 74e9b89d = f95129e5
H4 = 4a566572 + 9af00b7f = e54670f1

The digest is: 84983e44 1c3bd26e baae4aa1 f95129e5 e54670f1

C. Example 3 [1]

This example deals with the string “The quick brown fox jumps over the lazy dog”.

The digest is: 2fd4e1c6 7a2d28fc ed849ee1 bb76e739 1b93eb12

D. Example 4 [1]

This example deals with the string “The quick brown fox jumps over the lazy cog”.

The digest is: de9f2c7f d25e1b3a fad3e85a 0bd17d9b 100db4b3

E. Example 5 [1]

This example deals with the empty string “”.

The digest is: da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709

The running speed of the implemented algorithm was found to be acceptable for the above examples, mainly because of the small size of the input messages. However, the program became slower as the input size increased. For an input message of 10000 characters, it took around half a minute to generate the final output. For larger inputs the program becomes unacceptably slow: for an input of around 50000 characters, it generated the message digest in approximately 20 minutes.

VIII. SHA-1 Security

In addition to the first two members of the SHA family, SHA-0 and SHA-1, four more variants have been issued with increased output ranges and slightly differing designs. These variants, SHA-224, SHA-256, SHA-384 and SHA-512, are often collectively termed SHA-2 [1]. The evolution of these variants is the result of constant attacks on the earlier versions of the SHA family of algorithms. Attacks have been found on both the SHA-0 and the SHA-1 algorithms, and the sequence of these attacks is listed below:

A. Cryptanalysis of SHA-0

In August 1998, an attack on SHA-0 was presented that found collisions with complexity 2^61 [6]. Later, in 2004, near collisions were found for SHA-0 along with full collisions on the reduced SHA-0 algorithm [7]. Finally, in August 2004, a collision attack on the full SHA-0 algorithm was announced with complexity 2^40, which was further improved to a complexity of 2^39 by an attack presented in February 2005 [1].

B. Cryptanalysis of SHA-1

In early 2005, Rijmen and Oswald published an attack on a reduced version of SHA-1 that found collisions with fewer than 2^80 operations [1]. In February 2005, an attack was announced that found collisions in the full version of SHA-1 with fewer than 2^69 operations [1].
Finally, in August 2005, the above attack was improved, lowering the complexity to 2^63 operations [1].

IX. Applications

SHA-1 is one of the required secure hash algorithms for use in U.S. Federal applications for the purpose of protecting highly sensitive data [1]. One of the most important applications of the SHA-1 algorithm is its incorporation in the Digital Signature Standard. It is commonly used with the Digital Signature Algorithm in electronic mail, electronic funds transfer, software distribution and various other applications that demand data integrity and authentication [5]. Signing a hashed message rather than the message itself provides many advantages, among them faster signature creation and fewer resources for storage or transmission [2]. Other applications include the SHACAL block ciphers, the copy prevention system of Microsoft’s Xbox game console and many file sharing applications [1].

X. Conclusion

The Secure Hash Algorithm (SHA-1) is used for computing a compressed representation of a message or a data file. Given an input message of arbitrary length less than 2^64 bits, it produces a 160-bit output called the message digest. The SHA-1 algorithm is claimed to be secure because it is practically infeasible to compute the message corresponding to a given message digest. It is also extremely improbable to find two messages that hash to the same value. The basic SHA-1 algorithm was studied with a detailed explanation of the alphabet used along with the operators, functions and constants employed by the algorithm. The important implementation issues that influenced how the various classes and their members were defined were discussed. The implemented algorithm was checked and tested with a number of benchmark input messages taken from authoritative sources. Last but not least, the attacks on the SHA-1 algorithm were mentioned, followed by a section on the most important applications of the SHA-1 algorithm.
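The steps of Section V, together with the functions and constants of Section IV and the padding of Section III, can be put together roughly as follows. This is a minimal sketch under stated assumptions, not the implementation described in Section VI: it assumes byte-aligned input and uses Java's 32-bit int type, whose addition wraps modulo 2^32, together with Integer.rotateLeft for the circular shift, in place of BigInteger. The names Sha1Sketch, f, k and digestHex are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical compact SHA-1 sketch following Sections III to V (byte-aligned input only).
final class Sha1Sketch {

    // Round function f_t of Section IV.
    static int f(int t, int b, int c, int d) {
        if (t < 20) return (b & c) | (~b & d);
        if (t < 40) return b ^ c ^ d;
        if (t < 60) return (b & c) | (b & d) | (c & d);
        return b ^ c ^ d;
    }

    // Round constant K_t of Section IV.
    static int k(int t) {
        if (t < 20) return 0x5A827999;
        if (t < 40) return 0x6ED9EBA1;
        if (t < 60) return 0x8F1BBCDC;
        return 0xCA62C1D6;
    }

    // Returns the 160-bit digest of the message as 40 lowercase hex digits.
    static String digestHex(byte[] message) {
        // Section III: append 0x80, zero bytes, then the 64-bit big-endian bit length.
        int paddedLength = ((message.length + 9 + 63) / 64) * 64;
        byte[] m = new byte[paddedLength];
        System.arraycopy(message, 0, m, 0, message.length);
        m[message.length] = (byte) 0x80;
        long bits = (long) message.length * 8;
        for (int i = 0; i < 8; i++) {
            m[paddedLength - 1 - i] = (byte) (bits >>> (8 * i));
        }

        // Step 1: initialize H0..H4.
        int[] h = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
        int[] w = new int[80];

        // Step 2: process each 512-bit (64-byte) block.
        for (int block = 0; block < paddedLength; block += 64) {
            // 2a: split the block into sixteen big-endian 32-bit words.
            for (int t = 0; t < 16; t++) {
                int p = block + 4 * t;
                w[t] = ((m[p] & 0xff) << 24) | ((m[p + 1] & 0xff) << 16)
                        | ((m[p + 2] & 0xff) << 8) | (m[p + 3] & 0xff);
            }
            // 2b: expand to eighty words.
            for (int t = 16; t < 80; t++) {
                w[t] = Integer.rotateLeft(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
            }
            // 2c: copy the first register into the second.
            int a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
            // 2d: eighty iterations; int addition wraps modulo 2^32.
            for (int t = 0; t < 80; t++) {
                int temp = Integer.rotateLeft(a, 5) + f(t, b, c, d) + e + w[t] + k(t);
                e = d;
                d = c;
                c = Integer.rotateLeft(b, 30);
                b = a;
                a = temp;
            }
            // 2e: add the second register back into the first.
            h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
        }

        // Step 3: the concatenation H0..H4 is the digest.
        StringBuilder sb = new StringBuilder();
        for (int word : h) {
            sb.append(String.format("%08x", word));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(digestHex("abc".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

For the inputs of Section VII this sketch reproduces the digests listed there, for example a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d for “abc”.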
References

[1] SHA hash functions - Wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/SHA1#Description_of_the_algorithms
[2] Wade Trappe and Lawrence C. Washington. 2006. Introduction to Cryptography with Coding Theory. New Jersey: Pearson Prentice Hall.
[3] R. Rivest, MIT Laboratory for Computer Science and RSA Data Security, Inc. Internet RFC 1320, April 1992. ftp://ftp.rfc-editor.org/in-notes/rfc1320.txt
[4] R. Rivest, MIT Laboratory for Computer Science and RSA Data Security, Inc. Internet RFC 1321, April 1992. ftp://ftp.rfc-editor.org/in-notes/rfc1321.txt
[5] FIPS 180-1 - Secure Hash Standard. http://www.itl.nist.gov/fipspubs/fip180-1.htm
[6] Florent Chabaud and Antoine Joux, Differential Collisions in SHA-0. Centre d'Électronique de l'Armement, CASSI/SCY/EC, F-35998 Rennes Armées, France. {chabaud,joux}@celar.fr
[7] Eli Biham and Rafi Chen, Near-Collisions of SHA-0. Computer Science Department, Technion - Israel Institute of Technology, Haifa 32000, Israel.