Identification Numbers and Error Detection Meredith Wachs Where do you find identification numbers? Checks Credit cards Driver’s licenses VIN Numbers Zip Codes SSN ISBN Numbers Bar Codes/UPC’s (Universal Product Codes) Check Digits Used to reduce error “Grocery items, credit cards, overnight mail, magazines, personal checks, traveler’s checks, soft drink cans, and automobiles” have check digits (COMAP) Congruence mod n “a is congruent to b, mod n, if n divides a-b” (Stillwell, Elements of Number Theory) For example, 27 is congruent to 6 mod 3 because 27-6=21, which is divisible by three. Similarly, a number is congruent (mod n) to its remainder after being divided by n; for example, 20/3=6 R2, so 20=2 (mod 3) Money Order COMAP example- The identification number on a money order is 63024383845. The last digit is the check digit, so the first 10 digits should be 5 mod 9. How do we divide such a large number by 9? How do we know this? Calculating Divisibility Rules Take a number z=abcdefg This could also be written as z = g*10^0 + f*10^1 + e*10^2 + d*10^3 + c*10^4 + b*10^5 + a*10^6 Let’s replace each 10^n by its congruence mod 9 Divisibility by 9 continued 10^0=1=1 (mod 9) 10^1=10=1 (mod 9) 10^2=1*1=1 (mod 9) By analogy, we see that every place will have a weight of 1, so a(1)+b(1)+c(1)+d(1)+e(1)+f(1)+g(1) leaves the same remainder (mod 9) as abcdefg, so this is our divisibility rule. Divisibility by 11 We’ll proceed in the same way: 10^0=1 mod 11 10^1=-1 mod 11 10^2= -1*-1=1 mod 11 10^3= 10^2*10^1=1*-1=-1 mod 11 In this way, we see that abcdefg=(g+e+c+a)(f+d+b) (mod 11) Ex. 3458679=(9+6+5+3)-(7+8+4)=4 (mod 11), so is not divisible by 11. Money Order Revisited If the check digit only checks the divisibility of the sum, what can be overlooked? Potential Problems with the Check Digit Any number can be replaced with its congruence mod n (so 0 can be replaced with 9 in this example) Also, any digits can be easily transposed. Other Uses of the Check Digit Traveler’s checks and Euro banknotes use the check digit, but the entire number, including the check digit, should be divisible by 9. If you know the divisor rule, you can easily figure out what the check digit should be. For example, if a traveler’s check ID number is 3487956321 and it has a divisor rule of 9, what should the check digit be? Answer The sum of the digits is 48, and 48+6=54, which is divisible by 9, so the check digit should be 6. More Sophisticated- UPC’s A 12-digit number that is found along the bottom of a barcode A BBBBB CCCCC D “A” is the type of good, “B” is the manufacturer’s code, “C” is the product code, “D” is the check digit (COMAP) UPC Error Prevention Take the code A BCDEF GHIJK L. Going from left to right, calculate 3A+B+3C+D+3E+F+3G+H+3I+J+3K+L. If it isn’t divisible by 10, it is an incorrect UPC. This “detects all single-position errors and about 89% of other errors” (COMAP 326). Try this on your own! The U.S. Banking System To identify a certain bank, it has an identification number (the first string on the bottom of the check). The string is ABCDEFGH, and the check digit is I. “I” must be the last digit of the resulting sum of 7A+3B+9C+7D+3E+9F+7G+3H http://blog.wellsfargo.com/GuidedByHistory/images/Earth_Check_large.jpg Here, we can see that 7(1)+3(2)+9(1)+0+0+0+7(2)+3(4)= 48. Since 8 is the check digit, this is a valid number. Credit Card Numbers A more effective algorithm, which is used for credit cards, is called Codabar, and is also used by “libraries, blood banks, photofinishing companies, German banks, and the South Dakota driver’s license department” (COMAP 327). Codabar “allows computers to detect 100% of singleposition errors and about 98% of other common errors” (COMAP 328). The Codabar algorithm was developed by Hans Peter Luhn (1896-1964) and was patented in 1960 (http://www.merriampark.com/anatomycc.htm). The Algorithm Assume a 16-digit card number, with the final number being the check digit. Add every digit in the oddnumbered spaces (going from left to right) and multiply by two. Then add the number of digits in odd-numbered spaces that >4. Finally, add all the digits in the even-numbered spaces (except for the check digit). The check digit will need to be whatever will make the total divisible by 10. Ex. What is the check digit for card number 124785943967210? Solution Ex. What is the check digit for card number 124785943967210? 1+4+8+9+3+6+2+0=33*2=66 66+3=69 69+2+7+5+4+9+7+1= 104 104+6=110, so the check digit is 6. ISBN Numbers The 10-digit International Standard Book Number detects “100% of single errors and 100% of transposition errors” (COMAP 328) An ISBN of A-BCDE-FGHI-J, with J as the check digit, is valid if 10A+9B+8C+7D+6E+5F+4G+3H+2I+J is divisible by 11. ISBN’s continued Example: 0-387-95587-9 10(0)+9(3)+8(8)+7(7)+6(9)+5(5)+4(5)+3(8) +2(7)+9=286 286 is divisible by 11 because (6+2)-8=0, which is divisible by 11. Why does this work every time? COMAP proof: Let there be an error in the B slot called B’. Then both calculations (with and without errors) must be divisible by 11 in order for B’ to not be detected. Then the difference between the calculations must be divisible by 11. Proof continued… So (10A+9B+8C+7D+6E+5F+4G+3H+2I+J) – (10A+9B’+8C+7D+6E+5F+4G+3H+2I+J)= 9(B-B’). Since B,B’ <=9, B-B’ cannot be 11, and so 9(B-B’) cannot be divisible by 11 unless B=B’. Another Example What is the check digit for the ISBN 07167-1910? Solution 10(0)+9(7)+8(1)+7(6)+6(7)+5(1)+4(9)+3(1) +2(0)= 199 The next number divisible by 11 is 199+10=209. To represent 10, ISBN numbers have an X. Code 39 This uses the digits 0-9 and letters A-Z (which correspond to 10-35) Code 39 is used by the DoD, automotive companies, and the health industry (COMAP 329). A 15-character string is validated by whether 15a+14b+13c+….1*o is divisible by 36 (with o as the check digit). The VIN system is a more complicated alphanumeric system. Bar Codes “To decode the information in a bar code, a beam of light is passed over the bars and spaces via a scanning device, such as a handheld wand of a fixed-beam device. The dark bars reflect very little light back to the scanner, whereas the light spaces reflect much light. The differences in reflection intensities are detected by the scanner and converted to strings of 0’s and 1’s that represent specific numbers and letters.” (COMAP 334) Postnet Codes A bar code for a ZIP+4 code (+1 check digit) There are 52 long or short bars, with one “guard bar” on either side and the remaining 50 bars grouped into 10 groups of 5, with 2 long bars and 3 short bars each. The check digit makes the sum of all ten numbers divisible by 10. http://www-math.cudenver.edu/~wcherowi/jcorner/barcodes.html UPC Bar Codes Each digit is made up of seven modules There are guard bars, a center division, and a difference between manufacturer and product numbers to make reading the codes as accurate as possible. Bar codes for UPC’s have been in use since a pack of Wrigley Juicy Fruit gum was scanned on June 26, 1974 in Marsh’s Supermarket in Troy, Ohio. The first barcodes were made by National Cash Register, but smearing ink problems soon made IBM the top contender in the market. (http://en.wikipedia.org/wiki/Barcode) Digit Manufacturer Product 0 0001101 1110010 1 0011001 1100110 2 0010011 1101100 3 0111101 1000010 4 0100011 1011100 5 0110001 1001110 6 0101111 1010000 7 0111011 1000100 8 0110111 1001000 9 0001011 1110100 http://en.wikipedia.org/wiki/Universal_Product_Code Is the UPC above valid? (3*0+1*3+3*6+…3*5+1*2 = 60 = 0 mod 10, so yes, it is valid.) Illinois Driver’s License Numbers In contrast to Social Security numbers, an Illinois driver’s license number can help reconstruct a person’s surname (by sound, not by spelling), first and middle initials, date of birth, and gender. These forms of ID numbers are also used in the National Archives, the Library of Congress, and in genealogy research. (COMAP 341-2). Soundex Coding System for Surnames 1. 2. 3. Delete all occurrences of h and w. (so “Wachs” becomes “acs”) Assign number as follows: a,e,i,o,u,y = 0; b,f,p,v = 1; c,g,j,k,q,s,x,z = 2; d,t = 3; l = 4; m,n = 5; r = 6 (so “acs” = 022) If two or more letters with the same numeric value are adjacent, omit all but the first (so we’re left with “ac”) Soundex continued… 4. Delete the first character of the original name if still present. 5. Delete all occurrences of a,e,i,o,u,y. (so we’re left with “c”) 6. Retain only the first three digits corresponding to the remaining letters; append trailing 0’s if fewer than three letters remain; precede the three digits with the first letter of the surname (so we have W200) Because of the way this is coded, many errors in spelling are taken into account. *taken directly from COMAP 342 The Middle Digits-First Initial InitialCodeInitialCodeInitialCodeInitialCode A B C D E F G 0 60 100 160 200 240 280 H I J K L M N 320 400 420 500 520 540 620 O P Q R S T U 640 660 700 720 780 800 840 V W X Y Z 860 880 940 960 980 The Middle Digits-Middle Initial InitialCodeInitialCodeInitialCodeInitialCode A B C D E F G 1 2 3 4 5 6 7 H I J K L M N 8 9 10 11 12 13 14 O P Q R S T U 14 V 15 W 15 X 16 Y 17 Z 18 18 18 19 19 19 19 Calculating the Middle Digits Add the code for the first initial to the code for the middle initial. For my initials, MJ, you have 540+10=550. So far, my number is W200-550. All middle digits information taken from http://www.highprogrammer.com/alan/num bers/dl_us_shared.html The Last Five Digits In Illinois, the last five digits retain the birth date and sex of the person. Each month is assumed to have 31 days (starting with January 1 as 001). My birthday, March 2, is therefore 2*31=62+2=064. If male, you are done. If female, add 600 to this number (so I am 664). Put the last two digits of your birth year before this number. Thus, the last five digits of my Illinois driver’s license is 8-8664. My complete number is W200-55088664. As the website author points out, IL not having overflow numbers makes it likely to have multiple people with the same number printed on their license. *COMAP 343 Conclusions We have seen several ways of identifying objects or people with numbers and the likelihood of error and the ways error is caught with each one. As COMAP points out on p. 329, “Like many practices in the ‘real world,’ historical accident and lack of knowledge about existing methods seem to be the explanation [for having so many means of identification].” We see this especially in the difference between driver’s license numbers and SSN, which were assigned prior to computers and the development of many of the systems used (like Soundex).