Fault Tolerance CDA 5140 Spring 06 Everyday FT Background • Use of check digits for error detection on everyday applications used extensively but most people unaware of it • Examples: airline tickets, credit cards, bank accounts, library books, grocery products, drivers’ licenses, passports, rental cars, UPS, express mail, UPC bar codes, etc. • Not used for SSNs, telephone numbers & some serial numbers on currency - why not? Universal Product Codes (UPC) First used on groceries, now on most products; some products now have Radio Frequency Identification (RFID) which is much more complicated UPC has 12 digits, a1, a2, … a12 each of which is from the set {0, 1, … , 9} and a1 is the product type a2 to a6 identifies the manufacturer a7 to a11 identifies the product a12 is the check digit In some products a12 is not printed but is on bar code UPC continued a12 chosen such that: (a1a2a3…a12)(3 1 3 1 3 1 3 1 3 1 3 1) = 3a1 + a2 + 3a3 + … + a12 = 0 (mod 10) Example: Kellogg product 2nd Kellogg Product Nutri Grain Product Different manufacturer UPC Examples Note for all 3 products that first digit is “0” indicating a household product First 2 products from same manufacturer, so next 5 digits same (38000) but 3rd is different manufacturer so different digits (16000) First 2 products are different, so different next 5 digits distinct (66330 and 01302) UPC examples To determine if first example is a valid code, calculate 3(0) + 1(3) + 3(8) + 1(0) + 3(0) + 1(0) + 3(6) + 1(6) + 3(3) + 1(3) + 3(0) + 1(7) = 70 = 0(mod 10) This code is able to detect any single error, for example, typed in incorrectly. Assume a digit ai is incorrectly entered as ai’ and if this were not detected, then would have: UPC example continued (a1a2a3 …ai… a12)(3 1 3 1 3 1 3 1 3 1 3 1) = 0 (mod 10) (a1a2a3 …ai’…a12)(3 1 3 1 3 1 3 1 3 1 3 1)=0 (mod 10) Thus ((a1a2a3 …ai…a12) - (a1a2a3…ai’…a12))(3 1 3 1 3 1 3 1 3 1 3 1) = 0 which implies either (ai - ai’) = 0 or 3 (ai - ai’) = 0 (mod 10) which is a contradiction, and hence a single error must be detected Version E UPC • Used for special products such as round soda cans, small items, magazines • Only 8 digits long • Uses one of 4 formulae depending on last non-check digit – If a7 is {0, 1, 2} then a8 is such that (3 1 3 3 1 3 1 1)(a1a2a3…a8) = 0 (mod 10) Version E UPC cont’d • If a7 is {3} then a8 is chosen such that (3 1 3 1 1 3 0 1) )(a1a2a3…a8) = 0 (mod 10) If a7 is {4} then a8 is chosen such that (3 1 3 1 3 3 0 1) (a1a2a3…a8) = 0 (mod 10) If a7 is {5,6,7,8,9} then a8 is chosen such that (3 1 3 1 3 1 3 1)(a1a2a3…a8) = 0 (mod 10) Version E UPC Example New Yorker Magazine Since a7 is 9, use (3 1 3 1 3 1 3 1) Other UPC Versions • Shipping container version has 2 check digits, one comparable to standard, other using modulus 103 • European countries have 13-digit version, weight vector (1 3 1 3 . . . 1 3 1) with 2 digits ahead of the manufacturer’s number to identify the country Credit Card Methods IBM developed more complex system for use with credit cards, libraries, blood banks, some vehicle agencies, etc. Use permutation that maps as: (0) = 0 (1) = 2 (2) = 4 (4) = 8 (5) = 1 (6) = 3 (8) = 7 (9) = 9 (3) = 6 (7) = 5 Credit Card Method For any string of digits (a1a2a3 . . . an-1), n even, check digit an is set such that (a1) + a2 + (a3) + a4 + ... + (an-1) + an = 0 (mod 10) For n odd, apply to even numbered positions. Consider a VISA ad with card number 4417 1234 5678 9122 - is this valid? Error Detection for UPC & Credit Card Schemes • Both schemes can detect all single digit errors. • Common error of transpositions of two digits where …ab… becomes …ba… is not detected by UPC code if |a-b| = 5, and not by IBM code if |a-b| = 9 • For UPC code, why is this true? • So, for the 90 such possible transpositions for UPC code, all except 05, 16, 27, 38, 49 and reversals are detected, & for IBM not 90 or 90 Increased Detection Ability • Next to 2-digit transformation, most common transformation is jump transformation which changes …abc… into …cba…, e.g. ...6163… to ...6361... • Neither of UPC or IBM schemes would detect these • A weight vector of (7 3 9 7 3 9 7 3) is used with 8-digit number mod 10 by banks & many western countries for passport to detect such errors, others use (7 3 1 7 3 1 …) • Both schemes detect jumps if |a-c| is not 5 Increased Detection cont’d • Other classes of errors detected depending on the 3 weights used – for example …aca… to …bcb…, or, …aa… to …bb… can be detected with the proper choice of weights • 4-weight schemes increase percentage of errors detected but order of weights becomes very important – for example, if weights are 1, 3, 9, 7, then we have 1a + 3c + 9a = 10a + 3c = 3c (mod 10) and 1b +3c + 9b = 10b + 3c = 3c (mod 10) So error not detected. What if change weights to 1, 3, 7, 9? Other Modulus Schemes All previous schemes were modulus 10. Scheme for ISBN (book) numbers is mod 11 & detects all single errors and transposition errors. 9-digit number and one check digit, (a1a2a3…a10)with weight vector (10 9 8 7 6 5 4 3 2 1) gives: (a1a2a3…a10)(10 9 8 7 6 5 4 3 2 1) = 0 (mod 11) if a10 is 10 then it is written as “X” Our text has ISBN number 0-471-29342-3 and sums to 187 = ? Other Modulus Schemes Various schemes developed to avoid use of X but not as well known as ISBN. Most common error is single digit but many schemes don’t cover such as choice is based on simplicity, e.g. USPS money orders, FedEx, UPS packages simply assign check digits based on mod 9 or 7 Other Modulus Schemes Airline companies, FedEx, UPS divide number by 7 & take remainder as check digit For example, UPS number 601 399 957 divided by 7 gives a remainder of 4 so on the package is: 601 399 957 4 division by 7 not as effective at detecting single errors but is fairly effective at detecting transpositions Alphanumeric Schemes • Include alphanumeric characters, for example the “3-out-of-9” code or 39 Code, allows digits {0, 1, … 9} and letters {A, B, …, Z} and characters - and . and space, used by DoD, automotive and health industries • Letters assigned numbers 10 through 35, and other 3 characters, 36,37 and 38 respectively • Multiply (a1a2a3 …an)(n n-1 … 3 2 1) mod 39 and take remainder as check digit, converted to alphanumeric if necessary • Can add $,/,+,% as 39 through 42 to detect all single digit errors & all transposition errors Error Correcting Approaches Some schemes add second check digit to give more ‘power’ Example: Norwegian citizen registration numbers with 9 digits plus 2 check digits calculated as: (a1a2a3…a10)(3 7 6 1 8 9 4 5 2 1) = 0 (mod 11) (a1a2a3…a11)(5 4 3 2 7 6 5 4 3 2 1) = 0 (mod 11) Detects all single digit errors and all double errors except those where difference between correct version & incorrect one is of form (0 0 0 a 0 0 0 0 0 11-a 0) Why? Error Correcting Numbers such that check digits would be “10” not used More powerful version detects all double errors and corrects all single errors The information is 8 digits and check is 2 digits: (a1a2a3…a10)(1 1 1 1 1 1 1 1 1 1) = 0 (mod 11) (a1a2a3…a10)(1 2 3 4 5 6 7 8 9 10) = 0 (mod 11) Note that omitting numbers with check digit ‘10’ still gives over 82 million numbers. Error Correcting First calculation determines magnitude m of any single error, since if there were no errors, sum would be 0 modulus 11 Second calculation, assuming single error, will then identify position i of error since if sum s is non-zero, have mi = s, so determine that position i is too large by m (use modulus properties to get an even multiple of i) Thus, knowing magnitude & position, single error can be corrected. Error Correcting Given 8 information digits 73245018 can calculate a9 and a10 (7 3 2 4 5 0 1 8 a9 a10)(1 1 1 1 1 1 1 1 1 1) = 7 + 3 + 2 + 4 + 5 + 1 + 8 + a9 + a10 = 30 + a9 + a10 = 8 + a9 + a10 = 0 (mod 11) (7 3 2 4 5 0 1 8 a9 a10)(1 2 3 4 5 6 7 8 9 10) = 7 + 6 + 6 + 16 + 25 + 7 + 64 + 9a9 + 10a10 = 131 + 9a9 + 10a10 = 10 + 9a9 + 10a10 = 0 (mod 11) 2 equations in 2 unknowns - what are values of a9 and a10? Summary • Various schemes discussed have many uses when used to detect errors • Techniques based on Number Theory properties and used sum of zero but any digit could be used • Techniques likely to be extended to much more powerful but simple-to-calculate schemes