Using Cryptographic Hashes Correctly and Effectively

Introduction
Many people have heard, in some vague way, that an obscure technology called
cryptographic hash values can be used to verify whether a software component is or is
not the correct version of that software component. This is true. Moreover, you do not
need to have a significant understanding of how hashing works in order to use hash
values correctly and effectively. This article is intended for election officials, elected
officials, citizen activists and attorneys.
First, I want to assure you it is possible to use complex technology without a significant
understanding of the technology. Most people do it every day. If you are reading this
article on the World Wide Web you are currently using complex technology without a
significant understanding of that technology. How many readers of this article are
familiar with the fine details of any of the following?
Internet protocol (IP), which includes
   • internet protocol packets (IP packets),
   • domain name servers,
   • IP routing,
Hypertext transfer protocol (the HTTP of the URL to this article),
Lossless data compression,
Error detection, or
Error correcting codes?
The honest answer for all of us is none of the above. Yet, all of the technologies listed
above (and more) are used right now as you read this online article. I firmly believe any
reader of this article could master the fine details of the above technologies if they chose.
But, life is too short and there are only so many hours in a day. Who wants to learn the
details of IP packet routing in order to read an online document? Or learn the intricacies
of fuel injection in order to drive an automobile? Both may be interesting to some, but
deep knowledge of either is unnecessary to use the technology they represent: the world
wide web and automobiles.
Unfortunately, in the area of cryptography and cryptographic hashing in particular, many
familiar with the technology insist on explaining how hashing works rather than how to
use hashes and hash values correctly. Or, worse in some ways, the proper use of
cryptographic hashes and hash values is buried within a chapter in a book covering how
hashes work. I will attempt to excerpt and distill such chapters to explain how to use
hashes and hash values without explaining how hashing works. What little technical
exposition there is in this article will be hidden behind links to more detailed information
or placed at the end of this article in endnotes.
What do I need to know?
For a car with an automatic transmission you need to know how only 4 controls work in
order to drive the automobile. Those controls would be:
• The steering wheel,
• The accelerator,
• The brake, and
• The turn indicators.
With cryptographic hashes you need to know only 5 things in order to use cryptographic
hashes properly. These 5 items are:
1. The name of the algorithm(s) used,
2. The name of the software component for which the hash value was calculated
(optional),
3. The size of the software component for which the hash value was calculated,
4. The hash value generated by the selected algorithm, and
5. The error rate for the algorithm.
Much to the chagrin of some technologists I would argue the fifth item (error rate) is also
optional.
So what are the meanings of some of these terms, which I have used without definition so
far? I think the definitions are easier to follow with some context. The following sentences
are typical statements which involve cryptographic hashes and hash values.
• The file _SETUP.EXE, found on the installation disk for the Sequoia WinEDS
Election Database System 2.6 Build 220, had a size of 8192 bytes, an MD5 hash
value of 1F9BBFAAB8DEC9AC4416E5BE2D22E315, and a SHA-1 hash value
of E0A61765653F18ED0777DF975B10D46D586541E6.
• The scanner firmware for the M650 ES&S optical scanner in Columbia
County, WI had a SHA1 hash value of
8585D0B734F85A37E0C6AA35391E66F873AD3064.
• The SHA1 hash value for the firmware of the M650 scanner version 1.2.0.0 (as
recorded by the National Software Reference Library) is
8585D0B734F85A37E0C6AA35391E66F873AD3064.
• The file KERNEL.DLL found on the Hart InterCivic central tabulator of Tarrant
County, TX had a file size of 983,552 bytes, an MD5 hash value of
775191A31455FAD793312F8D087146EB and a SHA1 hash value of
888190F293016D9541DDD6AEF5AC94EE3886849A.
• The file KERNEL.DLL from the NSRL is 983,552 bytes and has reference hash
values of 888190E31455FAD793312F8D087146EB and
775191D293016D9541DDD6AEF5AC94AB3776849A for MD5 and SHA1,
respectively.
• The hash values of the software component KERNEL.DLL are different.
Thus, the software component KERNEL.DLL is not from the COTS operating
system Microsoft XP Professional Version 2002 Service Pack 2.
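The comparison made in the last three statements can be sketched in a few lines of Python. This is only an illustration: the record layout and the names HashRecord, normalize and same_component are assumptions of this sketch, not part of any tool mentioned in this article. The values are copied from the KERNEL.DLL statements above, and the error rate (item 5) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashRecord:
    """Hypothetical record holding items 1 to 4 of the list above."""
    algorithm: str    # name of the algorithm, e.g. "MD5" or "SHA1"
    component: str    # name of the software component (optional item)
    size_bytes: int   # size of the software component
    hash_value: str   # hash value written as a hexadecimal string


def normalize(hex_value: str) -> str:
    """Remove whitespace and ignore letter case so 'ab' and 'AB' compare equal."""
    return "".join(hex_value.split()).upper()


def same_component(observed: HashRecord, reference: HashRecord) -> bool:
    """Compare size and hash value; both records must use the same algorithm."""
    if observed.algorithm != reference.algorithm:
        raise ValueError("hash values from different algorithms cannot be compared")
    return (observed.size_bytes == reference.size_bytes
            and normalize(observed.hash_value) == normalize(reference.hash_value))


# MD5 values taken from the two KERNEL.DLL statements above.
observed_md5 = HashRecord("MD5", "KERNEL.DLL", 983_552,
                          "775191A31455FAD793312F8D087146EB")
reference_md5 = HashRecord("MD5", "KERNEL.DLL", 983_552,
                           "888190E31455FAD793312F8D087146EB")

print(same_component(observed_md5, reference_md5))  # False: the values differ
```

The sizes match but the hash values do not, which is exactly the situation the final statement describes.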
Definitions
MD5: The name of a hash algorithm; short for Message Digest 5.
SHA1: The name of a hash algorithm; short for Secure Hash Algorithm one.
SHA-256: The name of one of the hash algorithms which belong to the family of hash
algorithms known as SHA2 or Secure Hash Algorithm 2.
SHA-384: The name of one of the hash algorithms which belong to the family of hash
algorithms known as SHA2 or Secure Hash Algorithm 2.
SHA-512: The name of one of the hash algorithms which belong to the family of hash
algorithms known as SHA2 or Secure Hash Algorithm 2.
Software Component: A collection of digital data which is under some form of version
control. A software component can be a file, an OCX control, a document, a text file of
configuration parameters, the contents of the MS Windows registry, an executable image,
the contents of a memory card, the contents of a portion of a memory card, or the contents
of a memory chip (e.g. one containing firmware).
Hash Value: The large binary number (between 128 and 512 bits in length) which is
produced by applying a hash algorithm to a software component. This value is usually
written down as a hexadecimal number.
Hexadecimal: A numbering system in base 16 instead of 10, the common base. The 16
digits are 0-9 followed by A-F.
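To make the definitions of hash value and hexadecimal concrete, here is a small sketch using Python's standard hashlib module. The function name describe and the sample bytes are assumptions made for illustration; the sample is not a real software component.

```python
import hashlib


def describe(algorithm, data):
    """Return (length in bits, hexadecimal text) of the hash value of data."""
    digest = hashlib.new(algorithm, data)
    bits = digest.digest_size * 8            # digest_size is given in bytes
    return bits, digest.hexdigest().upper()  # each hex digit encodes 4 bits


sample = b"stand-in for the bytes of a software component"

for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
    bits, hex_value = describe(name, sample)
    print(f"{name:7} {bits:3} bits, written as {len(hex_value):3} hexadecimal digits")
```

Because each hexadecimal digit carries 4 bits, a 128-bit MD5 value is written with 32 digits and a 160-bit SHA1 value with 40, which matches the lengths of the values quoted earlier in this article.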
And I use Hash Values How?
Now with some of the definitions out of the way we can discuss using hash algorithms
and hash values properly and effectively. The basic idea is to compare the hash values of
2 software components instead of performing a laborious byte for byte binary compare of
the 2 software components themselves. Aside from being excruciatingly slow, a byte for
byte comparison may be illegal because of trade secret, copyright or other intellectual
property concerns. Using hash values allows one to verify software components without
impinging on the intellectual property rights of the developer of the software component.
With cryptographic hash algorithms, if the hash values are different, the 2 software
components are different. If the hash values are the same, the software components are
the same (see end note 1).
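As a rough sketch of this idea, the Python below computes the hash value of each of 2 files and compares the 2 short values instead of the files themselves. The function names file_hash and same_by_hash, the default algorithm, and the chunk size are assumptions of this sketch.

```python
import hashlib


def file_hash(path, algorithm="sha256", chunk_size=65536):
    """Return the hash value of a file as an upper-case hexadecimal string.

    The file is read in chunks so that large components need not fit in memory.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()


def same_by_hash(path_a, path_b, algorithm="sha256"):
    """Compare 2 software components by comparing their hash values only."""
    return file_hash(path_a, algorithm) == file_hash(path_b, algorithm)
```

In practice only one side usually needs to be hashed on the spot; the other hash value can be a published number, so the person doing the check never needs a copy of the original component.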
This sounds simple and it is, but there is the issue of the chain of custody and trust.
Consider the following statements:
• The hash value A does not equal the hash value B. Therefore, software
component A is not the same as software component B.
• The hash value A does equal the hash value B. Therefore, software component A
is the same as software component B.
In order to make either statement, 3 pieces of infrastructure need to be in place.
1. The hash value for the chosen algorithm of a known "good" version of the
software component. This is called the reference hash value.
2. A hash calculator which is trusted to produce the correct hash value when
applying the chosen hash algorithm to a software component.
3. A means to calculate the hash value of a second, suspect software component with
this trusted hash calculator.
On the first item of infrastructure there are 2 generally accepted sources of reference hash
values: the National Software Reference Library (NSRL) or a reference installation. The
NSRL is a collection of both SHA1 and MD5 hash values of the software components of
many commercial computer applications, commercially-available operating systems, and
voting systems. Some entries in the NSRL are hash values for the contents of an
installation CD-ROM instead of the hash values of all the software components installed
by the installation CD-ROM. An example of this would be the installation file,
BallotStation.ins, found in the portion of the NSRL devoted to voting equipment. The
version of the installation file which installs the WinCE application BallotStation 4.5.2
(which runs on a Diebold TSx touch screen DRE) has a file size of 4,505,149 bytes, an
MD5 hash value of 663B473011996898B65C3F3B74CD8DB4, and a SHA1 hash value of
3FC23B448EC036C5CABC2220C1989F07974A2B1B. Unfortunately, this information
does not provide a reference hash which allows you to know whether your particular TSx
has version 4.5.2 or version 4.4.5 of the WinCE application BallotStation.
For this you will need a reference installation of the WinCE application BallotStation
4.5.2 installed on some trusted TSx in the state capital. This is where the chain of custody
issues come into play. Generating the reference hash values is a multi-step process.
1. Using the NSRL, compare the hash values of the installation program and
verify it will install the correct version of BallotStation.
2. Using the NSRL-verified installation CD-ROM, install the WinCE application
onto an empty TSx.
3. Once the installation is complete, you have a reference installation.
4. From this reference installation, it is now possible to calculate the hash values of
some or all of the software components on the reference installation with a hash
calculator which supports the selected hash algorithms. This list of hash values
then becomes the reference hash values.
5. This collection of reference hash values is published for use in verifying whether
the version of BallotStation found on a particular TSx DRE is or is not version
4.5.2. For example, the list of reference hash values might include an MD5 and a
SHA1 hash value for the software component BallotStation.EXE.
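Step 4 above can be sketched in Python as follows. This is a minimal illustration, assuming the reference installation's files can be read from an ordinary directory; the function name reference_hashes and the layout of the resulting list are hypothetical and do not describe how the NSRL or any election authority actually publishes its values.

```python
import hashlib
from pathlib import Path


def reference_hashes(root, algorithms=("md5", "sha1")):
    """Build a list of reference hash values for every file under root.

    Returns a dict keyed by the file's path relative to root. Each entry
    holds the file size and one upper-case hexadecimal value per algorithm.
    """
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # directories have no contents to hash
        data = path.read_bytes()
        entry = {"size": len(data)}
        for name in algorithms:
            entry[name] = hashlib.new(name, data).hexdigest().upper()
        manifest[path.relative_to(root).as_posix()] = entry
    return manifest
```

Recording the size and 2 hash values per component mirrors the way the example statements earlier in this article are written.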
The second portion of required infrastructure is a hash calculator trusted to give the
correct hash values for all of the desired hash algorithms. One such hash calculator is
called HashCalc.
The third portion of the required infrastructure is to be able to apply the trusted hash
calculator to the suspect software component. Continuing our specific example, this
would mean there must be a way to execute the program HashCalc against the software
component, BallotStation.EXE, as found on the specific TSx DRE number 6354 used
last Tuesday in precinct 47.
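What such a check might look like, once the suspect component can be read as a file, is sketched below in Python. HashCalc itself is a separate program; this sketch uses Python's hashlib as a stand-in calculator, and the function name check_component and the layout of the reference entry (a size plus one hexadecimal value per algorithm) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def check_component(path, reference, algorithms=("md5", "sha1")):
    """Compare a suspect component against its published reference values.

    reference is a dict with a "size" entry and one hexadecimal hash value
    per algorithm name. Returns a list of differences; an empty list means
    the size and every hash value matched.
    """
    data = Path(path).read_bytes()
    problems = []
    if len(data) != reference["size"]:
        problems.append(f"size {len(data)} differs from reference {reference['size']}")
    for name in algorithms:
        actual = hashlib.new(name, data).hexdigest().upper()
        if actual != reference[name].upper():
            problems.append(f"{name} hash value differs from reference")
    return problems
```

An empty result supports the statement that the component matches the reference installation; any entry in the list supports the statement that it does not.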
Go Forth and Hash
With this paper you are now ready to use cryptographic hash algorithms properly and
effectively.
Remember, the 4 essential things you need in order to use cryptographic hash algorithms
properly and effectively are:
1. The name of the algorithm(s) used.
2. The location of the reference hash values.
3. The trusted hash calculator used.
4. The hash value generated by the trusted hash calculator against the suspicious
software component using the selected algorithm.
Because of the known defects in both the MD5 and SHA1 algorithms, the author
recommends you use both algorithms together and concatenate the 2 hash values into a
single composite hash value. Never use MD5 or SHA1 alone. Since there are no known
defects in the SHA-2 family of hash algorithms (SHA-256, SHA-384, or SHA-512), you
can use any of the SHA-2 algorithms singly.
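A composite hash value of the kind recommended here could be produced as in the following Python sketch. The function name composite_hash and the order of concatenation (MD5 first, then SHA1) are assumptions; whatever order is chosen, the reference values and the suspect values must use the same one.

```python
import hashlib


def composite_hash(data):
    """Concatenate the MD5 and SHA1 hash values of data into one hex string."""
    md5_value = hashlib.md5(data).hexdigest()    # 32 hexadecimal digits
    sha1_value = hashlib.sha1(data).hexdigest()  # 40 hexadecimal digits
    return (md5_value + sha1_value).upper()      # 72 hexadecimal digits in total
```

Two components then count as the same only if all 72 digits agree, that is, only if both the MD5 and the SHA1 values match.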
How were the reference hash values created? Where are the reference hash values
published? Is the source from the NSRL or from a reference installation?
What is the program or application generating the hash values of the software
components to be tested? Is the hash calculator trusted or able to give the correct hash
values for the suspicious software components?
End Notes
1. While it is true that if the hash values are different it is a mathematical certainty the software components
are different, this is not true if the hash values are the same. There is a very small probability that 2 different
files of the same size could have the same hash value for a given algorithm. This is called a collision. For
SHA-1 the probability of a collision is effectively 1 part in 2^63 or 1 part in 10^19. For MD5 the
probability of a collision is effectively 1 part in 2^24 or roughly 1 part in 17 million. For SHA-256 the
probability of a collision is effectively 1 part in 2^256 or 1 part in 10^77. SHA-384 and SHA-512 are even
stronger. If you use any one of the SHA-2 algorithms, or MD5 together with SHA1, it is literally more likely
that a cosmic ray hitting the CPU of your computer has induced an error in the calculation of the hash value
than that a collision of the hashing algorithm(s) has occurred between the 2 files.
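The conversions between powers of 2 and powers of 10 quoted in this end note can be checked with a few lines of Python. The function name power_of_ten_exponent is an assumption of this sketch; the only fact used is that 2^n equals 10^(n x log10(2)).

```python
import math


def power_of_ten_exponent(bits):
    """Return x such that 2**bits equals 10**x."""
    return bits * math.log10(2)


for bits in (24, 63, 256):
    print(f"2^{bits} is about 10^{power_of_ten_exponent(bits):.1f}")

print(f"2^24 is exactly {2 ** 24:,}")
```

The exponents for 2^63 and 2^256 round to 19 and 77, and 2^24 works out to a little under 17 million.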