Additional Explanation of the Luhn Algorithm

If you've had trouble following the discussion of the Luhn algorithm in the book, this
alternate, lengthier explanation is for you. Big thanks to reader Aaron Mirandon, who
reviewed this document and made numerous helpful suggestions.
The Luhn algorithm is used to guard against data entry errors in multidigit numbers for
things like product codes. It does so through the creation of a check digit, which is an
extra digit that is computed from the digits of the original product code and is appended
to the end of the product code. Because only one value of the check digit "matches" a
particular product code, mistyping digits usually results in a product code that doesn't
match its check digit.
There are two related processes: 1. generation of the check digit for the original number,
and 2. validation of a number (with check digit appended).
The algorithm is based on the idea that the sum of the digits in the product code, check
digit included, should be a multiple of 10. That is, for the first process – picking a check
digit – the algorithm will choose a value that, when appended to the original number,
makes all of the digits sum to a multiple of 10. Likewise, in the second process –
validation – the algorithm will sum all the digits to see if the total is a multiple of 10.
What makes these processes tricky is that every other digit is "doubled." The word
"doubled" is in quotes because it's not a strict doubling if the result is a two-digit number.
For example, if we "double" the digit 7, instead of simply multiplying 7 by 2 to get 14
and adding 14 into the sum, we instead put the two digits 1 and 4 into the sum. Thus the
"doubling" of 7 results in adding 1 and 4, or a total of 5, into the overall sum.
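The "doubling" rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above; the function name double_digit is an assumption made for this sketch and does not come from the book.

```python
def double_digit(d):
    """Return the Luhn "doubled" value of a single digit 0-9.

    double_digit is a hypothetical helper name used only for this sketch.
    """
    doubled = d * 2
    # A two-digit result contributes the sum of its two digits, e.g. 14 -> 1 + 4.
    if doubled >= 10:
        return doubled // 10 + doubled % 10
    return doubled


# The "doubled" value of every digit from 0 through 9.
print([double_digit(d) for d in range(10)])  # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
```

Note that each digit maps to a different "doubled" value, which is part of why a single mistyped digit changes the sum.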
To make the check digit generation process clearer, here's an expansion of the example
used in the book. We'll start with the number 176248 and determine the correct check
digit.
The dotted outline shows the original number. The first thing to note is that every other
digit is "doubled," starting with the rightmost digit of the original number and proceeding
leftward. In this case, with six digits in the original number, the second, fourth, and sixth
digit (shown as shaded boxes) are doubled – the digits with the values 7, 2, and 8. While
we determine which digits to double starting with the rightmost digit of the original
number, in this case, the 8 of 176248, we don't have to add the digits right-to-left; we can
add them in any order. If we know how many digits are in the original number, for
example, we'll know which digits have to be doubled.
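As a rough illustration of that last point, here is a small Python sketch that, given the whole original number, reports which positions (counted from the left, starting at 1) are doubled. The helper name doubled_positions is an assumption made for this example.

```python
def doubled_positions(original):
    """Return the 1-based positions, counted from the left, of the doubled digits.

    doubled_positions is a hypothetical helper name used only for this sketch.
    """
    n = len(original)
    # The rightmost digit (index n - 1) is doubled, then every other digit
    # leftward, so index i is doubled exactly when its distance from the
    # right end is even.
    return [i + 1 for i in range(n) if (n - 1 - i) % 2 == 0]


print(doubled_positions("176248"))  # [2, 4, 6]
```

Once the length is known, the digits at those positions can be doubled and added in any order.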
Remember the rule about doubling the digits. The 7 is doubled to become 1 and 4; the 2
simply becomes 4; the 8 becomes 1 and 6.
All of the digits, doubled or not, are summed to get 27. The only value for the check digit
that results in a number that is a multiple of 10 is 3. With a check digit of 3, the sum of
the entire number including the check digit is 30.
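The generation process just described might be sketched in Python as follows. This version assumes the whole original number is available as a string, so it is not the array-free solution the book works toward; the names luhn_sum_of_original and check_digit are hypothetical.

```python
def luhn_sum_of_original(original):
    """Sum the digits of an original number (no check digit), "doubling"
    every other digit starting with the rightmost one.

    Hypothetical helper name; assumes `original` contains only digits.
    """
    total = 0
    for offset, ch in enumerate(reversed(original)):
        d = int(ch)
        if offset % 2 == 0:  # rightmost digit, then every other digit leftward
            d *= 2
            if d >= 10:  # two-digit result: add its digits instead
                d = d // 10 + d % 10
        total += d
    return total


def check_digit(original):
    """Return the digit that brings the total up to a multiple of 10."""
    return (10 - luhn_sum_of_original(original) % 10) % 10


print(luhn_sum_of_original("176248"))  # 27
print(check_digit("176248"))           # 3
```

The final `% 10` in check_digit covers the case where the sum is already a multiple of 10, in which case the check digit is 0 rather than 10.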
Here's the validation process on the resulting number, 1762483.
Because this is validation, we assume the rightmost digit of this is a check digit, so it's the
second digit from the right that is the rightmost digit of the original number. Therefore
it's this digit, the 8, that is the first digit to be doubled. We follow the same route as
before, except that now the check digit is also included in the sum. This sum is 30,
which is a multiple of 10, so this number, 1762483, is valid.
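A minimal Python sketch of validation, again assuming the complete number (check digit included) is available as a string of digits. The function name is_valid is an assumption made for this example.

```python
def is_valid(number):
    """Return True if `number` (with its check digit appended) passes the check.

    is_valid is a hypothetical name; assumes `number` contains only digits.
    """
    total = 0
    for offset, ch in enumerate(reversed(number)):
        d = int(ch)
        # Offset 0 is the check digit and is not doubled; the rightmost digit
        # of the original number sits at offset 1 and is the first one doubled.
        if offset % 2 == 1:
            d *= 2
            if d >= 10:  # two-digit result: add its digits instead
                d = d // 10 + d % 10
        total += d
    return total % 10 == 0


print(is_valid("1762483"))  # True
```

The only difference from generation is which offsets are doubled: the check digit shifts every original digit one place further from the right end.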
Here's another example. Our original number here is 34941. Let's generate a check digit:
Again, the rightmost digit of the original number, the 1, is doubled, along with every
other digit proceeding leftward. So the first digit (3), third digit (9), and fifth digit (1)
are doubled. Note that the original number in the first example had an even count of
digits, so the first digit (leftmost digit) was not doubled. The original number in this case,
though, has an odd count of digits, so the first digit is doubled. This is what leads to the
problem explained in the book; if we don't know how many digits are in a number, and
we are trying to process the digits as they are being read (that is, left-to-right), we won't
know which digits are doubled. If we did know how many digits were in the number, we
wouldn't have this problem, and mathematically nothing prevents us from processing the
digits in any order.
This fact – that to know which digits are doubled, you have to know how many digits
there are – is the main source of difficulty in the problem as described in the book. If we
allowed the program to read all the digits first – into an array, for example – and only
then process the digits, we wouldn't have had to go through all the problem-solving steps
outlined in the chapter. We would have known how many digits were in the number and
therefore would know which digits were doubled and which were not. But because the
problem precludes using an array and requires processing the individual digits as they are
read, we have to do more work. In effect, we have to process the digits left-to-right even
though, to know which digits to double, we have to picture the number right-to-left.
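One possible way to handle this, offered as a sketch and not necessarily the solution developed in the book, is to keep two running sums while reading left-to-right: one that assumes the most recently read digit is doubled, and one that assumes it is not. When the input ends, the last digit read is the check digit, which is never doubled, so the second sum is the one to test. The names validate_stream and double_digit are assumptions made for this example, and a Python iterator stands in for reading one character at a time.

```python
from collections.abc import Iterable


def double_digit(d):
    """Luhn "doubling": a two-digit result contributes the sum of its digits."""
    doubled = d * 2
    return doubled // 10 + doubled % 10 if doubled >= 10 else doubled


def validate_stream(digits: Iterable[str]) -> bool:
    """Validate a number whose digits arrive one at a time, left to right.

    Hypothetical function name. No digits are stored and the length is
    never needed; only the two running sums are kept.
    """
    sum_if_last_doubled = 0  # total if the most recent digit is doubled
    sum_if_last_plain = 0    # total if the most recent digit is not doubled
    for ch in digits:
        d = int(ch)
        # If this digit is doubled, the previous one was not, and vice versa,
        # so each new sum builds on the other sum from the previous step.
        sum_if_last_doubled, sum_if_last_plain = (
            sum_if_last_plain + double_digit(d),
            sum_if_last_doubled + d,
        )
    # The final digit is the check digit, which is not doubled.
    return sum_if_last_plain % 10 == 0


print(validate_stream(iter("1762483")))  # True
```

The design choice here is to carry both possibilities forward and pick one at the end, instead of deciding up front which digits are doubled.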
Anyway, in this second example, the sum of the digits in the original number is 25.
Therefore 5 is the correct value for the check digit, the value that will make the sum of all
digits a multiple of 10. Now let's take the number with its appended check digit of 5,
349415, and validate:
Again, we double every second digit, starting with the rightmost digit of the original
number. Summing everything including the check digit, we get 30, a multiple of 10, so
349415 is valid.
Lastly, here's an example of a failed validation. Let's try to validate 95433:
As before, we double every second digit, starting with the rightmost digit of the original
number (we assume during validation that the rightmost digit overall is a check digit). In
this case, the result is 23, which is not a multiple of 10. This number is therefore invalid.
The Luhn algorithm is good at picking up the kinds of errors commonly made during data
entry. If a single wrong digit is typed, the checksum will no longer be a multiple of 10
and therefore will fail to validate. Also, if two adjacent digits are transposed, like typing
1764283 instead of 1762483, because of the alternate-digit doubling, the checksum will
almost always be different. (The one exception is an adjacent 0 and 9, which add the same
amount to the sum in either order.) Of course, if multiple mistakes are made, there's a
chance they will cancel each other out and the checksum will be valid.
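To see these cases concretely, here is a short Python sketch that runs a validation check on the valid number 1762483 and on three mistyped versions of it. The function name is_valid and the particular mistyped numbers (other than 1764283) are assumptions chosen for this illustration.

```python
def is_valid(number):
    """Return True if `number` (check digit included) sums to a multiple of 10.

    is_valid is a hypothetical name; assumes `number` contains only digits.
    """
    total = 0
    for offset, ch in enumerate(reversed(number)):
        d = int(ch)
        if offset % 2 == 1:  # every other digit, starting left of the check digit
            d *= 2
            if d >= 10:
                d = d // 10 + d % 10
        total += d
    return total % 10 == 0


print(is_valid("1762483"))  # True:  the correct number
print(is_valid("1762493"))  # False: one wrong digit (8 typed as 9)
print(is_valid("1764283"))  # False: the 2 and 4 transposed
print(is_valid("2762482"))  # True:  two wrong digits that cancel out
```

In the last case the first digit is one too high and the check digit is one too low, so the sum is still 30 and the errors go undetected.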