Cyclic Redundancy Check

http://www.relisoft.com/Science/CrcMath.html
Error detection is important whenever there is a non-zero chance of your data getting corrupted.
Whether it's an Ethernet packet or a file under the control of your application, you can add a piece of
redundant information to validate it.
The simplest example is a parity bit. Many computers use one parity bit per byte of memory.
Every time the byte gets written, the computer counts the number of non-zero bits in it. If the number
is even, it sets the ninth parity bit, otherwise it clears it, so the nine bits together always contain an
odd number of ones. When reading the byte, the computer counts the number of non-zero bits in the
byte, plus the parity bit. If any one of the nine bits has flipped, the sum will be even and the computer
will halt with a memory error. (Of course, if two bits are flipped, a much rarer occurrence, this system
will not detect it.)
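The parity scheme just described can be sketched in a few lines of C++. The function names parityBit and isConsistent are invented for this illustration; real memory hardware does this in circuitry, not in software.

```cpp
#include <bitset>
#include <cstdint>

// Parity bit as described in the text: set when the byte holds an even
// number of 1 bits, so the nine bits together always hold an odd number.
bool parityBit(std::uint8_t byte)
{
    return std::bitset<8>(byte).count() % 2 == 0;
}

// A stored (byte, parity) pair is consistent when the total number of
// 1 bits across all nine bits is odd.
bool isConsistent(std::uint8_t byte, bool parity)
{
    return (std::bitset<8>(byte).count() + (parity ? 1 : 0)) % 2 == 1;
}
```

Flipping a single bit of the byte makes isConsistent return false, while flipping two bits leaves it true, which is exactly the blind spot mentioned above.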
For messages longer than one byte, you'd like to store more than one bit of redundant
information. You might, for instance, calculate a checksum. Just add together all the bytes in the
message and append (or store somewhere else) the sum. Usually the sum is truncated to, say, 32
bits. This system will detect many types of corruption with a reasonable probability. It will, however,
fail badly when the message is modified by inverting or swapping groups of bytes. Also, it will fail
when you add or remove null bytes.
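To make the weakness concrete, here is a minimal sketch of such an additive checksum. The name checksum and the choice of a byte vector as the message type are assumptions made for this example.

```cpp
#include <cstdint>
#include <vector>

// Additive checksum: the sum of all bytes, truncated to 32 bits by the
// wrap-around of unsigned arithmetic on std::uint32_t.
std::uint32_t checksum(const std::vector<std::uint8_t>& message)
{
    std::uint32_t sum = 0;
    for (std::uint8_t byte : message)
        sum += byte;
    return sum;
}
```

The messages {1, 2, 3}, {3, 2, 1} and {1, 2, 3, 0} all produce the same checksum, 6, so neither the reordering nor the extra null byte would be noticed.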
A Cyclic Redundancy Check is a much more robust error-checking algorithm. In this
article I will sketch the mathematical foundations of the CRC calculation and describe two C++
implementations: first the slow but simple one, then the more optimized one.
Polynomials
Here's a simple polynomial, 2x^2 - 3x + 7. It is a function of some variable x, which depends only on
powers of x. The degree of a polynomial is equal to the highest power of x in it; here it is 2 because of
the x^2 term. A polynomial is fully specified by listing its coefficients, in this case (2, -3, 7). Notice that
to define a degree-d polynomial you have to specify d + 1 coefficients.
It's easy to multiply polynomials. For instance,
(2x^2 - 3x + 7) * (x + 2)
= 2x^3 + 4x^2 - 3x^2 - 6x + 7x + 14
= 2x^3 + x^2 + x + 14
Conversely, it is also possible to divide polynomials. For instance, the above equation can be
rewritten as a division:
(2x^3 + x^2 + x + 14) / (x + 2) = 2x^2 - 3x + 7
Just like in integer arithmetic, one polynomial doesn't have to be divisible by another. But you
can always divide out the "whole" part and be left with the remainder. For instance x^2 - 2x is not
divisible by x + 1, but you can calculate the quotient to be x - 3 and the remainder to be 3:
(x^2 - 2x) = (x + 1) * (x - 3) + 3
In fact you can use a version of long division to perform such calculations.
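As an illustration, long division on coefficient lists might look like the sketch below. The names Poly and divide are made up for this example, and it assumes the divisor's leading coefficient is 1 (true of both divisors used above) so that all arithmetic stays in integers. It also assumes the dividend has at least as many coefficients as the divisor.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Coefficients are listed from the highest power down, as in the text:
// x^2 - 2x is {1, -2, 0}.
using Poly = std::vector<long>;

// Long division by a divisor whose leading coefficient is 1.
// Returns {quotient, remainder}.
std::pair<Poly, Poly> divide(Poly dividend, const Poly& divisor)
{
    const std::size_t steps = dividend.size() - divisor.size() + 1;
    Poly quotient(steps);
    for (std::size_t i = 0; i < steps; ++i)
    {
        const long factor = dividend[i];   // next quotient coefficient
        quotient[i] = factor;
        // Subtract factor * divisor, lined up at position i.
        for (std::size_t j = 0; j < divisor.size(); ++j)
            dividend[i + j] -= factor * divisor[j];
    }
    // Whatever is left in the low-order positions is the remainder.
    Poly remainder(dividend.begin() + steps, dividend.end());
    return {quotient, remainder};
}
```

Dividing {1, -2, 0} by {1, 1} gives the quotient {1, -3} and the remainder {3}, matching the x - 3 and 3 found above.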
Arithmetic Modulo Two
Most of us are familiar with polynomials whose coefficients are real numbers. In general, however, you
can define polynomials with coefficients taken from arbitrary sets. One such set (in fact a field)
consists of the numbers 0 and 1 with arithmetic defined modulo 2. It means that you perform
arithmetic as usual, but if you get something greater than 1 you keep only its remainder after division
by 2. In particular, if you get 2, you keep 0. Here's the addition table:
0 + 0 = 0
0 + 1 = 1 + 0 = 1
1 + 1 = 0 (because 2 has remainder 0 after dividing by 2)
The multiplication table is equally simple:
0 * 0 = 0
0 * 1 = 1 * 0 = 0
1 * 1 = 1
What's more, subtraction is also well defined (in fact the subtraction table is identical to the
addition table) and so is division (except for division by zero). What is nice, from the point of view of
computer programming, is that both addition and subtraction modulo 2 are equivalent to bitwise
exclusive or (XOR).
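A quick way to convince yourself of this is to write the modulo 2 operations out the long way and compare them with the bitwise operators. The helper names below are invented for the illustration, and the arguments are assumed to be single bits (0 or 1).

```cpp
// Arithmetic modulo 2 on single bits (values 0 or 1), spelled out.
unsigned addMod2(unsigned a, unsigned b) { return (a + b) % 2; }
unsigned subMod2(unsigned a, unsigned b) { return (a + 2 - b) % 2; }
unsigned mulMod2(unsigned a, unsigned b) { return (a * b) % 2; }
```

For every combination of bits a and b, addMod2 (a, b) and subMod2 (a, b) both equal a ^ b, and mulMod2 (a, b) equals a & b.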
Now imagine a polynomial whose coefficients are zeros and ones, with the rule that all arithmetic
on these coefficients is performed modulo 2. You can add, subtract, multiply and divide such
polynomials (they form a ring). For instance, let's do some easy multiplication:
(1x^2 + 0x + 1) * (1x + 1)
= 1x^3 + 1x^2 + 0x^2 + 0x + 1x + 1
= 1x^3 + 1x^2 + 1x + 1
Let's now simplify our notation by representing a polynomial as a series of coefficients. For
instance, 1x^2 + 0x + 1 has coefficients (1, 0, 1), 1x + 1 (1, 1), and 1x^3 + 1x^2 + 1x + 1 (1, 1, 1, 1).
Do you see what I am driving at? A polynomial with coefficients modulo 2 can be represented as
a series of bits. Conversely, any series of bits can be looked upon as a polynomial. In particular any
binary message, which is nothing but a series of bits, is equivalent to a polynomial.
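With polynomials stored as bits, the multiplication above turns into shifts and XORs. Here is a small sketch. The name multiplyMod2 is an assumption of this example, as is the convention that bit i of an integer holds the coefficient of x^i.

```cpp
#include <cstdint>

// Multiply two polynomials whose coefficients are taken modulo 2.
// Bit i of each argument is the coefficient of x^i.
std::uint64_t multiplyMod2(std::uint32_t a, std::uint32_t b)
{
    std::uint64_t product = 0;
    for (int i = 0; i < 32; ++i)
        if ((b >> i) & 1u)
            // Multiplying by x^i is a shift; adding modulo 2 is XOR.
            product ^= static_cast<std::uint64_t>(a) << i;
    return product;
}
```

Multiplying 101 (binary) by 11 (binary) gives 1111 (binary), the same (1, 1, 1, 1) as in the example above.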
CRC
Take a binary message and convert it to a polynomial, then divide it by another predefined polynomial
called the key. The remainder from this division is the CRC. Now transmit both the message and the
CRC. The recipient of the transmission does the same operation (divides the message by the same
key) and compares his CRC with yours. If they differ, the message must have been mangled. If, on
the other hand, they are equal, the odds are pretty good that the message went through uncorrupted.
Most localized corruptions (bursts of errors) will be caught using this scheme.
Not all keys are equally good. The longer the key, the better the error checking. On the other hand,
the calculations with long keys can get pretty involved. Ethernet packets use a 32-bit CRC
corresponding to a degree-31 remainder (remember, you need d + 1 coefficients for a degree-d
polynomial). Since the degree of the remainder is always less than the degree of the divisor, the
Ethernet key must be a polynomial of degree 32. A polynomial of degree 32 has 33 coefficients
requiring a 33-bit number to store it. However, since we know that the highest coefficient (in front of
x^32) is 1, we don't have to store it. The key used by Ethernet is 0x04c11db7. It corresponds to the
polynomial:
x^32 + x^26 + ... + x^2 + x + 1
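If you want to check the correspondence between the hexadecimal key and the polynomial yourself, the following sketch lists the powers of x that have a non-zero coefficient. The function name exponents is made up for this example.

```cpp
#include <cstdint>
#include <vector>

// Exponents of the non-zero terms of a degree-32 key. The low 32
// coefficients are stored in `key`; the x^32 term is implicit.
std::vector<int> exponents(std::uint32_t key)
{
    std::vector<int> result{32};   // the implicit leading term
    for (int i = 31; i >= 0; --i)
        if ((key >> i) & 1u)
            result.push_back(i);
    return result;
}
```

For 0x04c11db7 this yields 32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0.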
There is one more trick used in packaging CRCs. First calculate the CRC for a message to which
you have appended 32 zero bits. Suppose that the message had N bits, thus corresponding to a degree
N-1 polynomial. After appending 32 bits, it will correspond to a degree N+31 polynomial. The top-level
bit that was multiplying x^(N-1) will now be multiplying x^(N+31), and so on. In all, this operation is
equivalent to multiplying the message polynomial by x^32. If we denote the original message
polynomial by M (x), the key polynomial by K (x) and the CRC by R (x) (remainder) we have:
M (x) * x^32 = Q (x) * K (x) + R (x)
Now add the CRC to the augmented message and send it away. When the recipient calculates the
CRC for this sum, and there was no transmission error, he will get zero. That's because:
M (x) * x^32 + R (x) = Q (x) * K (x) (no remainder!)
You might think I made a sign mistake--it should be -R (x) on the left. Remember, however,
that in arithmetic modulo 2 addition and subtraction are the same!
We'll use this property of the CRC to test our implementation of the algorithm.
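Here is one way the zero-remainder property could be checked in code. It is only a sketch under stated assumptions: the names remainderMod, crcOf and withCrc are invented here, bits are fed most significant first, and the CRC is appended high byte first. It follows the plain math above and makes no attempt to match the bit ordering or initial register values of real Ethernet hardware.

```cpp
#include <cstdint>
#include <vector>

// Remainder of the message polynomial modulo a degree-32 key.
// The x^32 term of the key is implicit, as described in the text.
std::uint32_t remainderMod(const std::vector<std::uint8_t>& bytes,
                           std::uint32_t key = 0x04c11db7u)
{
    std::uint32_t reg = 0;
    for (std::uint8_t byte : bytes)
        for (int bit = 7; bit >= 0; --bit)
        {
            const bool top = (reg & 0x80000000u) != 0;
            reg = (reg << 1) | ((byte >> bit) & 1u);   // multiply by x, add next bit
            if (top)
                reg ^= key;   // an x^32 term appeared: subtract the key
        }
    return reg;
}

// CRC as defined above: the remainder of M (x) * x^32, obtained by
// appending 32 zero bits (four zero bytes) to the message.
std::uint32_t crcOf(std::vector<std::uint8_t> message)
{
    message.resize(message.size() + 4, 0);
    return remainderMod(message);
}

// The message followed by its CRC, that is M (x) * x^32 + R (x).
std::vector<std::uint8_t> withCrc(std::vector<std::uint8_t> message)
{
    const std::uint32_t crc = crcOf(message);
    for (int shift = 24; shift >= 0; shift -= 8)
        message.push_back(static_cast<std::uint8_t>(crc >> shift));
    return message;
}
```

Calling remainderMod on the result of withCrc returns zero for any message, and flipping a single bit anywhere in that result makes the remainder non-zero.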
Naive CRC Calculation
The CRC algorithm requires the division of the message polynomial by the key polynomial. The
straightforward implementation follows the idea of long division, except that it's much simpler. The
coefficients of our polynomials are ones and zeros. We start with the leftmost coefficient (leftmost bit
of the message). If it's zero, we move to the next coefficient. If it's one, we subtract the divisor,
lined up under that coefficient. Since subtraction modulo 2 is equivalent to exclusive or, this is very simple.
Let's do a simple example, dividing a message 100110 by the key 101. Remember that the
corresponding polynomials are x^5 + x^2 + x and x^2 + 1. Since the degree of the key is 2, we start by
appending two zeros to our message.
10011000 / 101
101
  111
  101
   100
   101
     100
     101
      01
We don't even bother calculating the quotient; all we need is the remainder (the CRC), which is
01 in this case. The original message with the CRC attached reads 10011001. You can easily convince
yourself that it is divisible by the key, 101, with no remainder.
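The hand calculation above can be mirrored directly in code. The sketch below is meant only to reproduce this small example: the names degree and remainderBits are made up, both polynomials are stored with all their coefficients explicit (bit i is the coefficient of x^i), and the key is assumed to be non-zero.

```cpp
#include <cstdint>

// Position of the highest set bit, i.e. the degree of the polynomial.
// Returns -1 for the zero polynomial.
int degree(std::uint32_t poly)
{
    int d = -1;
    while (poly != 0)
    {
        poly >>= 1;
        ++d;
    }
    return d;
}

// Remainder of dividend / key using long division modulo 2.
std::uint32_t remainderBits(std::uint32_t dividend, std::uint32_t key)
{
    const int keyDegree = degree(key);
    for (int d = degree(dividend); d >= keyDegree; d = degree(dividend))
        // Line the key up under the leading 1 and subtract (XOR).
        dividend ^= key << (d - keyDegree);
    return dividend;
}
```

With the augmented message 10011000 (0x98) and the key 101 (0x5), remainderBits returns 1, the CRC 01 found above. For 10011001 (0x99) it returns 0.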