the sha-1 algorithm - Computer Science

THE SHA-1 ALGORITHM
Amit Keswani and Vaibhav Khadilkar
Lamar University Computer Science Department, Beaumont, TX 77710, USA
Abstract
This paper discusses the secure hash
algorithm (SHA-1) originally developed
by the National Security Agency (NSA)
as SHA-0 and later handed over to the
National Institute of Standards and
Technology (NIST). However, in order
to correct a flaw in the original
algorithm, the NSA later presented the
revised version of SHA-0 and referred it
as SHA-1[1]. SHA-1 is a hash function
that takes a variable length input
message and produces a fixed length
output message called the hash or the
message digest of the original message.
The paper also produces the results of
implementation of the SHA-1 algorithm.
The SHA-1 algorithm is of particular
importance because of its use with the
Digital Signature Algorithm (DSA) for
digital signatures.
I. Introduction
A hash function takes a variable length
message and produces a fixed length
message as its output. This output
message is called the hash or message
digest of the original input message. The
trick behind building a good, secured
cryptographic hash function is to devise
a good compression function in which
each input bit affects as many output bits
as possible [2].
The SHA-1 algorithm belongs to a set
of cryptographic hash functions similar
to the MD family of hash functions. But
the main difference between the SHA-1
and the MD family is the more frequent
use of input bits during the course of the
hash function in the SHA-1 algorithm
than in MD4 or MD5 [2]. This fact
results in SHA-1 being more secured
compared to MD4 [3] or MD5 [4] but at
the expense of slower execution.
The original specification of the
algorithm was published in May 1993
whereas the revised version was
published in 1995 [1]. The algorithm
was based on principles similar to those
in the design of the MD4 and MD5
algorithms [1].
The way this algorithm works is that
for a message of size < 264 bits it
computes a 160-bit condensed output
called a message digest [5]. The SHA-1
algorithm is designed so that it is
practically infeasible to find two input
messages that hash to the same output
message. It is also practically impossible
to deduce the original input message
given only the output hash message.
This paper is organized as follows.
Section II describes the alphabet that is
used with this algorithm along with the
operations that can be done on them; it is
usually advisable to use hex digits.
Section III discusses how the given input
message is first padded to make it a
multiple of 512. Section IV describes the
functions and the constants that are used
in this algorithm. Section V presents the
actual SHA-1 algorithm followed by
section
VI
that
explains
the
implementation of the SHA-1 algorithm.
Section VII gives the results of two
different input messages when the SHA-
1 algorithm is applied to them. Section
VIII describes the algorithm’s use with
the Digital Signature Algorithm.
II. Alphabet and Operators
This section outlines the alphabet that
is used with this algorithm. It is better to
employ hex digits as they are easier to
use as compared to binary digits. A hex
digit ranges from 0 to F. A hex digit is a
representation of a 4-bit binary string. In
this algorithm we use words, which are
32-bit binary strings or 8 hex digits
equivalently. An integer in the range 0 to
232 – 1 can be represented as a word, as
it would have at most 32 digits, in which
case all of them would be 1. Also, once
the process of padding the given
message is done the padded message is
divided into 512 bit blocks. These blocks
are represented as a sequence of 16
words.
The logical operations that are applied
to words are as follows [5]:
y and z respectively such that
0<=x,y,z<232 and z=(x+y) mod 232
C. Circular Shift Operator
The
“circular
shift”
operator
represented by “Sn(X)”, where X is a
word and n is an integer 0<=n32 and
Sn(X)= (X << n) OR (X >> 32-n). The
overall circular shift is explained as
follows:
(X << n) means discarding the leftmost n bits of X and padding the result
with n zeroes on the right. Similarly (X
>> 32-n) implies discarding the rightmost (32-n) bits of X and padding the
result with (32-n) zeroes on the left. This
results in the circular shift of X by ‘n’
positions to the left. For example,
consider an integer ‘9’ represented as a
4-bit number i.e. 1001. Now a circular
shift of ‘1’ can be carried out using the
above procedure as follows:
(X << n) -> (1001 << 1) -> 0010
(X >> 4-n) -> (1001 >> 3) -> 0001
Sn(X) -> 0010 OR 0001 -> 0011
A. Bitwise logical operators
III. Message Padding
The bitwise logical “and” operator
represented as “X ^ Y” where X and Y
are words.
The bitwise logical “or” operator
represented as “X || Y”, again X and Y
are words.
The bitwise logical “exclusive-or”
operator is represented as “X XOR Y”.
The bitwise logical “complement”
operator is represented as “~ X”.
B. Addition Operator
The “word addition” operator
represented by “Z=X+Y”, where X,Y
and Z are words representing integers x,
As stated before the SHA-1 algorithm
produces a condensed representation of
the given input message or data file. This
input message is considered as a bit
string where the length of the message is
the number of bits in the input string.
The purpose of message padding is to
produce a padded message of length
equal to a multiple of 512 bits. The
reason behind this is that the SHA-1
algorithm processes messages as ‘n’
number of 512-bit blocks when
computing the message digest [2].
The way padding is done is explained
as follows: The original message is
initially appended with a ‘1’ followed by
a number of ‘0’ bits such that the
resultant length is 64 bits short of the
next highest multiple of 512 bits. The
last 64 bits of the last 512-bit block are
reserved for representing the length of
the original unpadded message. The
result of message padding is a padded
message containing 16*n words for
some n>0 [5].
IV. Functions and Constants
A sequence of logical functions f0, f1, f2,
….f79 and constant words K(0), K(1),
K(2), … K(79) is used in the SHA-1
algorithm. Each of the functions ft for
0<=t<=79 processes three words
producing a 32-bit output. The functions
and the constants used are mentioned
below [5]:
i) 0 <= t <= 19
ft(B,C,D) = (B AND C) OR
((NOT B) AND
D)
Kt = 5A827999
ii) 20 <= t <= 39
ft(B,C,D) = B XOR C XOR D
Kt = 6ED9EBA1
iii) 40 <= t <= 59
ft(B,C,D) = (B AND C) OR (B
AND D) OR (C
AND D)
Kt = 8F1BBCDC
iv) 60 <= t <= 79
ft(B,C,D) = B XOR C XOR D
Kt = CA62C1D6
Note that the constants are given in hex
only for reading purposes. The actual
algorithm uses them as bit strings during
computations.
V. SHA-1 Algorithm [2]
The message digest output is calculated
using the final padded message as ‘n’
512-bit blocks. The algorithm makes use
of two 160-bit registers, each consisting
of five 32-bit sub-registers. In addition,
there also exists a sequence of eighty 32bit words viz. W0, W1, W2,…, W79 that
will be used for computational purposes.
The basic SHA-1 algorithm is presented
as follows:
1) The algorithm starts off by initializing
the five sub-registers of the first 160-bit
register X labeled H0, H1, H2, H3, H4 as
follows:
H0=67452301;
H1=EFCDAB89;
H2=98BADCFE;
H3=10325476;
H4=C3D2E1F0;
2) From here onwards, SHA-1 iterates
through each of the 512-bit message
blocks viz. m0, m1, m2, … , mn-1. For
each of the message block, do the
following:
a. Write mj as a sequence of sixteen
32-bit words,
mj = W0 || W1 || W2 || … || W15
b. Compute the remaining sixty
four 2-bit words as follows:
 Wt = (Wt-3 xor Wt-8 xor
Wt-14 xor Wt-16)
 Cyclic shift of Wt by 1
i.e. S1(Wt)
c. Copy the first 160 bit register
into the second register as
follows:
A= H0; B= H1; C=H2; D=H3;
E= H4;
d. This step involves a sequence of
four rounds, corresponding to
four
intervals
0<=t<=19,
20<=t<=39,
40<=t<=59,
60<=t<=79. Each round takes as
input the current value of register
X and the blocks Wt for that
interval and operates upon them
for 20 iterations as follows:
 For t = 0 to 79,
 T=S5(A)+ft(B,C,D)+E
+Wt+Kt
 E=D;D=C; C= S30(B);
 B=A; A=T
e. Once all four rounds of
operations are completed, the
second 160-bit register (A, B, C,
D, E) is added to the first 160-bit
register (H0, H1, H2, H3, H4) as
follows:
 H0 = H0 + A;
 H1 = H1 + B;
 H2 = H2 + C;
 H3 = H3 + D;
 H4 = H4 + E;
3) Once the algorithm has processed all
of the 512-bit blocks, the final output of
X becomes the 160-bit message digest.
The basic building block comprises of
the rotations and XOR operations that
are carried out in step (3d).
VI. Implementation Issues
This section briefly discusses various
issues that were handled in the
implementation of the SHA-1 algorithm.
They are listed one after another. The
implementation language was Java.
A. BigInteger Class
The Java programming language has a
math package that contains an
implementation of a BigInteger class.
The BigInteger class was employed to
make use of the four logical operators
(NOT, AND, OR, XOR) as well as the
addition (ADD) operator. This package
was imported and used directly.
B. Conversion to Binary
The SHA-1 algorithm works on bits
hence the given input string must be
converted to its binary representation.
The given character is converted to its
binary representation using its ASCII
value. This value may not produce an 8bit representation of the character. In
that case, extra zeros need to be padded
to make the binary representation of a
character of length 8.
C. Addition mod 232
To implement addition mod 232 we just
discard the leftmost carry bit that is
generated after the addition operation.
We do not need to have any other special
code to solve this issue.
E. Special Functions
There are two special functions written
in the BitOperation class. The
ComputeHexString
function
first
calculates the integer representation of a
binary string and then converts that
integer to a hexadecimal string. The
GetBinaryString first calculates the
integer representation of a hexadecimal
string and then converts that integer to a
binary string.
VII. Results
H3 = 10325476 + 681e6df6 = 7850c26c
H4 = c3d2e1f0 + d8fdf6ad = 9cd0d89d
This section presents the output of the
SHA-1 algorithm on five different
inputs. These inputs were taken from [1]
and [5].
The digest is: a9993e36 4706816a
ba3e2571 7850c26c 9cd0d89d
A. Example 1 [5]
B. Example 2 [5]
This example consists of the input
string
“abc”.
The
hexadecimal
equivalent of this string is “01100001
01100010 01100011”. The length of this
string is 24. We first append a “1” to the
hexadecimal representation of “abc”. We
then append the appropriate number of
0’s followed by the 64-bit binary
representation of the length of the string.
In this case we have only 1 block of
length 512 bits.
This example consists of the string
“abcdbcdecdefdefgefghfghighijhijkijkljk
lmklmnlmnomnopnopq”. The length of
this string is 448. We first append a “1”
to the hexadecimal representation of
“abc”. We then append the appropriate
number of 0’s followed by the 64-bit
binary representation of the length of the
string. In this case we have only 2 blocks
of length 512 bits.
The different words of block 1 are:
The different words of this block are:
W[0] = 61626380
W[1] = 00000000
W[2] = 00000000
W[3] = 00000000
W[4] = 00000000
W[5] = 00000000
W[6] = 00000000
W[7] = 00000000
W[8] = 00000000
W[9] = 00000000
W[10] = 00000000
W[11] = 00000000
W[12] = 00000000
W[13] = 00000000
W[14] = 00000000
W[15] = 00000018
After processing the block, we get the
values of Hi as,
H0 = 67452301 + 42541b35 = a9993e36
H1 = efcdab89 + 5738d5e1 = 4706816a
H2 = 98badcfe + 21834873 = ba3e2571
W[0] = 61626364
W[1] = 62636465
W[2] = 63646566
W[3] = 64656667
W[4] = 65666768
W[5] = 66676869
W[6] = 6768696a
W[7] = 68696a6b
W[8] = 696a6b6c
W[9] = 6a6b6c6d
W[10] = 6b6c6d6e
W[11] = 6c6d6e6f
W[12] = 6d6e6f70
W[13] = 6e6f7071
W[14] = 80000000
W[15] = 00000000
After processing block 1, we get the
values of Hi as,
H0 = 67452301 + 8ce34517 = f4286818
H1 = efcdab89 + d3ad7c25 = c37b27ae
H2 = 98badcfe + 6b4e1883 = 0408f581
H3 = 10325476 + 74351cd2 = 84677148
H4 = c3d2e1f0 + 86838382 = 4a566572
D. Example 4 [1]
The words of block 2 are:
This example deals with a string “The
quick brown fox jumps over the lazy
cog”.
W[0] = 00000000
W[1] = 00000000
W[2] = 00000000
W[3] = 00000000
W[4] = 00000000
W[5] = 00000000
W[6] = 00000000
W[7] = 00000000
W[8] = 00000000
W[9] = 00000000
W[10] = 00000000
W[11] = 00000000
W[12] = 00000000
W[13] = 00000000
W[14] = 00000000
W[15] = 000001C0
After processing block 2, the values of
Hi are,
H0 = 67452301 + 906fd62c = 84983e44
H1 = efcdab89 + 58c0aac0 = 1c3bd26e
H2 = 98badcfe + b6a55520 = baae4aa1
H3 = 10325476 + 74e9b89d = f95129e5
H4 = c3d2e1f0 + 9af00b7f = e54670f1
The digest is:
84983e44 1c3bd26e baae4aa1 f95129e5
e54670f1
C. Example 3 [1]
This example deals with a string “The
quick brown fox jumps over the lazy
dog”.
The digest is:
2fd4e1c6 7a2d28fc ed849ee1 bb76e739
1b93eb12
The digest is:
de9f2c7f d25e1b3a fad3e85a 0bd17d9b
100db4b3
E. Example 5 [1]
This example deals with an empty
string “”.
The digest is:
da39a3ee 5e6b4b0d 3255bfef 95601890
afd80709
The running speed of the implemented
algorithm was found to be acceptable for
the above mentioned examples. This was
mainly because of the small size of the
input messages. However, it was
observed that as the input size went on
increasing the program became slower.
For recording purposes, we learnt that
for an input message of 10000
characters, it consumed around half a
minute to generate the final output. But
as we go on higher, the program
becomes unacceptably slow. For
example, for an input size of around
50000 characters, it generated the
message digest in approximately 20
minutes.
VIII. SHA-1 Security
In addition to the first two members of
the SHA family viz. SHA-0 and SHA-1,
four more variants have been issued with
increased output ranges and slightly
differing designs. These variants viz.
SHA-224, SHA-256, SHA-384 and
SHA-512 are often collectively termed
as SHA-2 [1]. The evolution of these
variants is the result of constant attacks
to the earlier versions of the SHA family
of algorithms.
Attacks have been found on both the
SHA-0 and the SHA-1 algorithms and
the sequence of these attacks are listed
below:
A. Cryptanalysis of SHA-0



In August 1998, an attack was
presented on SHA-0 that detected
collisions with complexity 261
[6].
Later in 2004, near collisions
were found for SHA-0 along
with full collisions on the
reduced SHA-0 algorithm [7].
Finally in August 2004, a
collision attack for the full SHA0 algorithm was announced with
complexity 240 that was further
improved to a complexity of 239
by an attack presented in
February 2005 [1].
B. Cryptanalysis of SHA-1
 In early 2005, Rijmen and
Oswald published an attack on
the reduced version of SHA-1
that found collisions with
complexity fewer than 280
operations [1].
 In February 2005, an attack was
announced that found collisions
in the full version of SHA-1
needing less than 269 operations
[1].
 Finally in August 2005, the
above attack was improved
lowering the complexity to 263
operations [1].
IX. Applications
SHA-1 is one of the required secure hash
algorithms for use in U.S. Federal
applications for the motive of protecting
highly sensitive data [1].
One of the most important applications
of the SHA-1 algorithm is its
incorporation in the Digital Signature
Standard. It is used commonly with the
Digital Signature Algorithm in electronic
mail, electronic funds transfer, software
distribution
and
various
other
applications that demand data integrity
and authentication [5]. The idea of
signing hashed messages provides many
advantages, one of them being faster
creation and less resources for storage or
transmission [2].
Few other applications include the
SHACAL
block
ciphers,
copy
prevention system of Microsoft’s Xbox
game console and many file sharing
applications [1].
X. Conclusion
The Secure Hash Algorithm (SHA-1) is
used for computing a compressed
representation of a message or a data
file. Given an input message of arbitrary
length < 264 bits, it produces a 160-bit
output called the message digest. The
SHA-1 algorithm is claimed to be secure
because it is practically infeasible to
compute the message corresponding to a
given message digest. Also it is
extremely improbable to detect two
messages hashing to the same value.
The basic SHA-1 algorithm was
studied with detailed explanation of the
alphabet structure used along with
various different operators, functions
and constants employed by the
algorithm.
The important implementation issues
were discussed that influenced the
manner in which various different
classes and its members were defined.
The implemented algorithm was checked
and tested with a number of benchmark
input messages supplied by authorized
sites.
Last but not least, the attacks on the
SHA-1 algorithm were mentioned
followed by a section on the most
important applications of the SHA-1
algorithm.
References
[1] SHA hash functions - Wikipedia, the
free encyclopedia
http://en.wikipedia.org/wiki/SHA1#Desc
ription_of_the_algorithms
[2] Wade Trappe, Lawrence C.
Washington. 2006. Introduction to
Cryptography with Coding Theory. New
Jersey: Pearson Prentice Hall.
[3] R. Rivest MIT Laboratory for
Computer Science and RSA Data
Security, Inc. Internet RFC(1320) April
1992
ftp://ftp.rfc-editor.org/innotes/rfc1320.txt
[4] R. Rivest MIT Laboratory for
Computer Science and RSA Data
Security, Inc. Internet RFC(1321) April
1992
ftp://ftp.rfc-editor.org/innotes/rfc1321.txt
[5] FIPS 180-1 - Secure Hash Standard
http://www.itl.nist.gov/fipspubs/fip1801.htm
[6] Florent Chabaud and Antonie Joux,
Differential Collisions in SHA-0,
Centre d' Électronique de l'Armement
CASSI/SCY/EC F-35998 Rennes
ArmÉes, France
{chabaud,joux}@celar.fr
[7] Eli Biham Rafi Chen,Near-collisions
of sha-0, Computer Science Department,
Technion - Israel Institute of Technology
Haifa 32000, Israel