Cyclic Redundancy Check

advertisement
Cyclic Redundancy Check
Error detection is important whenever there is a non-zero chance of your data getting corrupted.
Whether it's an Ethernet packet or a file under the control of your application, you can add a
piece of redundant information to validate it.
The simplest example is a parity bit. Many computers use one parity bit per byte of memory.
Every time the byte gets written, the computer counts the number of non-zero bits in it. If the
number is even, it sets the ninth parity bit, otherwise it clears it. When reading the byte, the
computer counts the number of non-zero bits in the byte, plus the parity bit. If any of the nine bits
is flipped, the sum will be odd and the computer will halt with a memory error. (Of course, if two
bits are flipped--a much rarer occurrence--this system will not detect it.)
For messages longer than one byte, you'd like to store more than one bit of redundant
information. You might, for instance, calculate a checksum. Just add together all the bytes in the
message and append (or store somewhere else) the sum. Usually the sum is truncated to, say, 32
bits. This system will detect many types of corruption with a reasonable probability. It will,
however, fail badly when the message is modified by inverting or swapping groups of bytes.
Also, it will fail when you add or remove null bytes.
Calculating a Cyclic Redundancy Check is a much more robust error checking algorithm. In this
article I will sketch the mathematical foundations of the CRC calculation and describe two C++
implementations--first the slow but simple one, then the more optimized one.
Polynomials
Here's a simple polynomial, 2x2 - 3x + 7. It is a function of some variable x, which depends only
on powers of x. The degree of a polynomial is equal to the highest power of x in it; here it is 2
because of the x2 term. A polynomial is fully specified by listing its coefficients, in this case (2, 3, 7). Notice that to define a degree-d polynomial you have to specify d + 1 coefficients.
It's easy to multiply polynomials. For instance,
(2x2 - 3x + 7) * (x + 2)
= 2x3 + 4x2 - 3x2 - 6x + 7x + 14
= 2x3 + x2 + x + 14
Conversely, it is also possible to divide polynomials. For instance, the above equation can be
rewritten as a division:
(2x3 + x2 + x + 14) / (x + 2) = 2x2 - 3x + 7
Just like in integer arithmetic, one polynomial doesn't have to be divisible by another. But you
can always divide out the "whole" part and be left with the remainder. For instance x2 - 2x is not
divisible by x + 1, but you can calculate the quotient to be x - 3 and the remainder to be 3:
(x2 - 2x) = (x + 1) * (x - 3) + 3
In fact you can use a version of long division to perform such calculations
Arithmetic Modulo Two
Most of us are familiar with polynomials whose coefficients are real numbers. In general,
however, you can define polynomials with coefficients taken from arbitrary sets. One such set (in
fact a field) consists of the numbers 0 and 1 with arithmetic defined modulo 2. It means that you
perform arithmetic as usual, but if you get something greater than 1 you keep only its remainder
after division by 2. In particular, if you get 2, you keep 0. Here's the addition table:
0+0=0
0+1=1+0=1
1 + 1 = 0 (because 2 has remainder 0 after dividing by 2)
The multiplication table is equally simple:
0*0=0
0*1=1*0=0
1*1=1
What's more, subtraction is also well defined (in fact the subtraction table is identical to the
addition table) and so is division (except for division by zero). What is nice, from the point of
view of computer programming, is that both addition and subtraction modulo 2 are equivalent to
bitwise exclusive or (XOR).
Now imagine a polynomial whose coefficients are zeros and ones, with the rule that all arithmetic
on these coefficients is performed modulo 2. You can add, subtract, multiply and divide such
polynomials (they form a ring). For instance, let's do some easy multiplication:
(1x2 + 0x + 1) * (1x + 1)
= 1x3 + 1x2 + 0x2 + 0x + 1x + 1
= 1x3 + 1x2 + 1x + 1
Let's now simplify our notation by representing a polynomial as a series of coefficients. For
instance, 1x2 + 0x + 1 has coefficients (1, 0, 1), 1x + 1 (1, 1), and 1x3 + 1x2 + 1x + 1 (1, 1, 1, 1).
Do you see what I am driving at? A polynomial with coefficients modulo 2 can be represented as
a series of bits. Conversely, any series of bits can be looked upon as a polynomial. In particular
any binary message, which is nothing but a series of bits, is equivalent to a polynomial.
CRC
Take a binary message and convert it to a polynomial then divide it by another predefined
polynomial called the key. The remainder from this division is the CRC. Now transmit both the
message and the CRC. The recipient of the transmission does the same operation (divides the
message by the same key) and compares his CRC with yours. If they differ, the message must
have been mangled. If, on the other hand, they are equal, the odds are pretty good that the
message went through uncorrupted. Most localized corruptions (burst of errors) will be caught
using this scheme.
Not all keys are equally good. The longer the key, the better error checking. On the other hand,
the calculations with long keys can get pretty involved. Ethernet packets use a 32-bit CRC
corresponding to degree-31 remainder (remember, you need d + 1 coefficients for a degree-d
polynomial). Since the degree of the remainder is always less than the degree of the divisor, the
Ethernet key must be a polynomial of degree 32. A polynomial of degree 32 has 33 coefficients
requiring a 33-bit number to store it. However, since we know that the highest coefficient (in
front of x32) is 1, we don't have to store it. The key used by the Ethernet is 0x04c11db7. It
corresponds to the polynomial:
x32 + x26 + ... + x2 + x + 1
There is one more trick used in packaging CRCs. First calculate the CRC for a message to which
you have appended 32 zero bits. Suppose that the message had N bits, thus corresponding to
degree N-1 polynomial. After appending 32 bits, it will correspond to a degree N + 31
polynomial. The top-level bit that was multiplying xN-1 will be now multiplying xN+31 and so on.
In all, this operation is equivalent to multiplying the message polynomial by x32. If we denote the
original message polynomial by M (x), the key polynomial by K (x) and the CRC by R (x)
(remainder) we have:
M(x) * x32 = Q (x) * K (x) + R (x)
Now add the CRC to the augmented message and send it away. When the recipient calculates the
CRC for this sum, and there was no transmission error, he will get zero. That's because:
M(x) * x32 + R (x) = Q (x) * K (x) (no remainder!)
You might think I made a sign mistake--it should be -R (x) on the left. Remember, however, that
in arithmetic modulo 2 addition and subtraction are the same!
We'll use this property of the CRC to test our implementation of the algorithm.
Naive CRC Calculation
The CRC algorithm requires the division of the message polynomial by the key polynomial. The
straightforward implementation follows the idea of long division, except that it's much simpler.
The coefficients of our polynomials are ones and zeros. We start with the leftmost coefficient
(leftmost bit of the message). If it's zero, we move to the next coefficient. If it's one, we subtract
the divisor. Except that subtraction modulo 2 is equivalent to exclusive or, so it's very simple.
Let's do a simple example, dividing a message 100110 by the key 101. Remember that the
corresponding polynomials are x5 + x2 + x and x2 + 1. Since the degree of the key is 2, we start
by appending two zeros to our message.
10011000 / 101
101
100
101
01
We don't even bother calculating the quotient, all we need is the remainder (the CRC), which is
01 in this case. The original message with the CRC attached reads 10011001. You can easily
convince itself that it is divisible by the key, 101, with no remainder.
In practice we don't write the top bit of the key--it is implicit. In this particular example, we
would only store bits 01 as our key.
The calculation above could be implemented using a 2-bit register for storing intermediate results
(again, the top bit is always one, so we don't store it). Let's rewrite the above example and
highlight the bits that are stored in the register at each step. The significant bits of the key are
marked in red.
0010011000 / 101
001
010
100
101
010
100
101
001
Notice that we subtract (or XOR, since this is arithmetic modulo 2) the key from the register
every time a 1 is shifted out of it.
Implementation
Let's start with the basic class, Crc. It defines the type Crc::Type as a 32-bit unsigned long
(corresponding to a 33-bit key). The constructor takes the key and stores it, and it zeroes the
register. The method Done returns the result of the CRC calculation. It also zeroes the register, so
that it can be used to calculate another CRC.
#include <cassert>
#include <iostream>
#include <string>
class Crc
{
public:
typedef unsigned long Type;
Crc (Type key)
: _key (key), _register (0)
{}
Type Done ()
{
Type tmp = _register;
_register = 0;
return tmp;
}
protected:
Type _key; // really 33-bit key, counting implicit 1 top-bit
Type _register;
};
The straightforward implementation of the CRC algorithm is not very efficient, but it will serve
as our starting point. The class SlowCrc implements a public method PutByte as well as a private
helper, PutBit. PutByte simply splits the byte into bits and sends them one-by-one to PutBit. The
value of the bit is encoded as a bool (true or false).
class SlowCrc: public Crc
{
public:
SlowCrc (Crc::Type key)
: Crc (key)
{}
void PutByte (unsigned char byte);
private:
void PutBit (bool bit);
};
void SlowCrc::PutByte (unsigned char byte)
{
unsigned char mask = 0x80; // leftmost bit
for (int j = 0; j < 8; ++j)
{
PutBit ((byte & mask) != 0);
mask >>= 1;
}
}
Here is the heart of the algorithm, PutBit. We pick the top bit from the register, shift the register
left by one, inserting a new message bit from the right. If the top bit was one, we XOR the key
into the register.
void SlowCrc::PutBit (bool bit)
{
std::cout << bit? "1": "0";
bool topBit = (_register & 0x80000000) != 0;
// shift bit into register from the right
_register <<= 1;
_register ^= (bit? 0x1: 0x0); // OR or XOR, same result
if (topBit)
{
// XOR the 32-bits of the key.
// The implicit high bit of the 33-bit key conceptually
// clears the topBit shifted out of the register
_register ^= _key;
}
}
Here is the main routine for testing the algorithm. First we extend the original message by 32 bits.
This corresponds to multiplying the original polynomial by x32. Next we calculate the CRC for
such an augmented message and add it to it. That corresponds to calculating M (x) * x32 + R (x),
where M (x) is the message polynomial and R (x) is the remainder from our division--in other
words the CRC.
As a consistency test, we divide this new polynomial (the message with the CRC appended to it)
by the key polynomial and make sure we get zero.
int main ()
{
Crc::Type const ethernetKey = 0x04c11db7;
SlowCrc slowCrc (ethernetKey);
// calculate R in: M (x) * x^32 = Q (x) * K (x) + R (x)
std::string msg ("Harry had a little lamp");
size_t origLen = msg.length ();
// "Multiply" message by x^32
msg.resize (origLen + sizeof (Crc::Type));
for (size_t i = 0; i < msg.length (); ++i)
{
std::cout << " ";
slowCrc.PutByte (msg [i]);
}
std::cout << "\n<- ";
Crc::Type crc = slowCrc.Done ();
std::cout << "\n0x" << std::hex << crc << std::endl;
// Now test consistency
// M (x) * x^32 + R (x)
int shift = 32;
for (int j = 0; j < sizeof (SlowCrc::Type); ++j)
{
shift -= 8;
msg [origLen + j]
= static_cast<unsigned char> (crc >> shift);
}
// Divide it by K (x) --> should be divisible
for (i = 0; i < msg.length (); ++i)
{
std::cout << " ";
slowCrc.PutByte (msg [i]);
}
std::cout << "\n<- ";
crc = slowCrc.Done ();
std::cout << "\n0x" << std::hex << crc << std::endl;
assert (crc == 0);
return 0;
}
Next, we'll implement the optimized version.
Optimized CRC Calculation
The main trick in optimizing the CRC calculation is to operate on bytes rather than bits. Look at
the top 8 bits of the CRC register. Scan this byte from left to right. Every time you find a nonzero bit, you XOR the key into the register at a bit offset such that this bit is turned off by the top
invisible bit of the key. (Notice that this operation can modify the remaining bits in the byte you
are scanning.) XORing is like addition, it is associative and you can do it in any order. So you
can XOR together a bunch of shifted keys corresponding to a given byte and store it for later.
This is simply a section of the slow CRC algorithm applied to our pre-calculation:
// for all possible byte values
for (unsigned i = 0; i < 256; ++i)
{
// put this byte at the top of the register
Crc::Type reg = i << 24;
// for all bits in a byte
for (int j = 0; j < 8; ++j)
{
bool topBit = (reg & 0x80000000) != 0;
reg <<= 1;
if (topBit)
reg ^= _key;
}
_table [i] = reg;
}
Once the table is pre-calculated, we can use it to evaluate the CRC of any message byte-by-byte.
At each step, we shift one byte out of the register and shift in the next message byte. We use the
shifted byte as an index to our pre-computed table. The value from the table contains the
accumulated XOR of appropriately shifted keys. We XOR this value into the register and
continue the process until the whole message is consumed.
In fact we can do even better. We can keep XORing the accumulated keys into the register, but
postpone the XORing of the message byte until its time comes to be shifted out of the register.
Imagine data flowing into the register and out of it.
As a bonus, we won't be wasting the first four cycles on pushing the null bytes through the
register and we won't need to append four null bytes to the end of the message.
Optimized Implementation
The class FastCrc contains a static table with pre-computed values. This table is actually stored
inside a class Table that is private to the class FastCrc. Notice the overloaded operator [] which
makes Table behave like an array.
class FastCrc: public Crc
{
public:
FastCrc (Crc::Type key)
: Crc (key)
{
_table.Init (key);
}
void PutByte (unsigned byte);
private:
class Table
{
public:
Table () : _key (0) {}
void Init (Crc::Type key);
Crc::Type operator [] (unsigned i)
{
return _table [i];
}
private:
Crc::Type _table [256];
Crc::Type _key;
};
private:
static Table _table;
};
The implementation file must contain the definition of the static member _table
FastCrc::Table FastCrc::_table;
The table is initialized when the first FastCrc object is created and then reinitialized only if the
key changes.
void FastCrc::Table::Init (Crc::Type key)
{
assert (key != 0);
if (key == _key)
return;
_key = key;
// for all possible byte values
for (unsigned i = 0; i < 256; ++i)
{
Crc::Type reg = i << 24;
// for all bits in a byte
for (int j = 0; j < 8; ++j)
{
bool topBit = (reg & 0x80000000) != 0;
reg <<= 1;
if (topBit)
reg ^= _key;
}
_table [i] = reg;
}
}
Look at the simplicity of the PutByte method:
void FastCrc::PutByte (unsigned byte)
{
unsigned top = _register >> 24;
top ^= byte;
_register = (_register << 8) ^ _table [top];
}
The message byte is XORed only after the top byte is shifted out of the register. A new stack of
keys obtained from the table is then XORed into the register.
To test our implementation, we will calculate the CRC using both the slow and the fast
algorithms.
int main ()
{
std::string msg ("Harry had a little lamp");
std::string exMsg (msg);
exMsg.resize (msg.length () + 4);
Crc::Type const ethernetKey = 0x04c11db7;
SlowCrc slowCrc (ethernetKey);
for (size_t i = 0; i < exMsg.length (); ++i)
{
slowCrc.PutByte (exMsg [i]);
}
Crc::Type crc = slowCrc.Done ();
std::cout << "\n0x" << std::hex << crc << std::endl;
FastCrc fastCrc (ethernetKey);
for (size_t j = 0; j < msg.length (); ++j)
{
fastCrc.PutByte (msg [j]);
}
Crc::Type newCrc = fastCrc.Done ();
std::cout << "\n0x" << std::hex << newCrc << std::endl;
return 0;
}
Download