Exam 3 Review
Identification Numbers
Information
Fall 2014
Mathematics in Management Science

Identification Numbers
Modern identification numbers serve at least two functions:
• The number should unambiguously identify the person or thing with which it is associated. (codes)
• The number should have a “self-checking” aspect. (check digits)

Check Digits
• A check digit is an extra digit appended to a number for purposes of detecting errors when copying or transmitting the number.
• The check digit is calculated from the rest of the number and transmitted with the number.
• When an error occurs, a recalculation of the check digit won’t match.

Check Digit Examples
• USPS money orders; traveler's checks
• Credit cards; car rentals
• UPC: Universal Product Code
• ISBN: International Standard Book No
• BIN: Bank Identification No
• VIN: Vehicle Identification No

Check Digits
• Division Schemes
• Weighted Schemes
• Codabar Scheme
• Direct vs Indirect Methods
• Detecting/Correcting Errors

ZIP Codes

Bar Codes
• UPC Bar Codes
• Airline Bar Codes
• ZIP Code Bar Codes
• Intelligent Mail Bar Codes

Bits, Bytes, & Binary Strings
• A binary number is one written in base 2, so the digits are all either 0 or 1.
• A single binary digit (a 0 or a 1) is a bit.
• A byte is a group of binary digits or bits (usually eight) operated on as a unit; bytes are considered as a unit of memory size.
• A binary string (or word) is a list of bits.

Binary Codes
• A system for coding data using two states (or symbols): “0” or “1”.
• Postnet code, UPC code, Morse code, Braille, etc.
• DVDs, Blu-ray, faxes, high-definition TVs, and cell phones all use binary codes, with data represented as strings of 0’s and 1’s rather than the usual digits 0 through 9 and letters A through Z.

ASCII Code
American Standard Code for Information Interchange

Parity
• A bit string has odd parity if the number of 1s in the string is odd.
• A bit string has even parity if the number of 1s in the string is even.
• 01100, 000, 11001001: even parity.
• 1000011, 1, 00010: odd parity.
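The parity definitions above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function names `parity` and `append_even_parity_bit` are hypothetical, and appending a bit to force even parity is only one illustrative convention for a check bit.

```python
def parity(bits: str) -> str:
    """Return 'even' or 'odd' according to the number of 1s in a bit string."""
    if set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 0s and 1s")
    return "even" if bits.count("1") % 2 == 0 else "odd"


def append_even_parity_bit(bits: str) -> str:
    """Append one bit so the result has even parity (an assumed convention)."""
    return bits + ("0" if parity(bits) == "even" else "1")


# The six example strings from the Parity slide.
for word in ["01100", "000", "11001001", "1000011", "1", "00010"]:
    print(word, parity(word))
```

The first three strings print as even and the last three as odd, matching the slide.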
Reliable Data Transmission
How to decode?

Binary Linear Code
• Strings (words) of 0’s and 1’s with extra digits for error correction, used to send full-text messages.
• Words are composed of all possible messages of a given length plus parity-check sum digits appended to the messages; the resulting strings are the code words.
• A binary linear code is a set of binary digit strings where each string has two parts: the message part and the check-digit part.

Binary Codewords
• A binary codeword is a string of binary digits: e.g. 00110111 is an 8-bit codeword.
• A binary code is a collection of codewords all with the same length.
• A binary code C of length n is a collection of binary codewords all of length n, and it is called linear if it is a subspace of {0, 1}^n. In other words, 0…0 is in C and the sum of two codewords is also a codeword.

Hamming (7,4) Code
Easy to write formulas for parity bits. Given message a1a2a3a4, calculate parity check-sum digits c1c2c3 via:
c1 = (a1 + a2 + a3) mod 2
c2 = (a1 + a3 + a4) mod 2
c3 = (a2 + a3 + a4) mod 2
These give the same result as using circles! The right-hand sides of the equations are the parity check-sums.

Detecting & Correcting Errors
• Valid code words must satisfy the parity check-sums; if not, there is an error.
• But with enough errors, a code word could get transformed into some other code word.
• How many 1-bit errors does it take to change a legal code word into a different legal code word?

Hamming Distance
The Hamming distance between two binary strings is the number of bits in which the two strings differ.
• dist btwn 10 and 01 is 2
• dist btwn 10001 and 11001 is 1
• dist btwn 00000 and 01101 is 3
• dist btwn pixd words is 3

Weight of a Binary Code
• Suppose the weight of some binary code is t; so, it takes at least t 1-bit changes to convert any code word into another.
• Therefore, we can detect up to t-1 single bit errors.
• The Hamming (7,4) code has weight t=3. Thus using it we can detect 1 or 2 single bit errors.

Weight of a Binary Code
Suppose the weight of some binary code is t.
Then, can detect up to t-1 single bit errors, or, we can correct up to (t-1)/2 errors (if t is odd), (t-2)/2 errors (if t is even). Cannot do both.

Nearest Neighbor Decoding
• Suppose the parity-check sums detect an error.
• Compute the distances between the received word and all codewords.
• The codeword that differs in the fewest bits is used in place of the received word.
• Thus get automatic error correction by choosing the “closest” permissible answer.

Types of Codes
• Error Detection/Correction Codes for accuracy of data
• Data Compression Codes for efficiency
• Cryptography for security

Data Compression
• Here want to use less space to express (approximately) the same info.
• Data compression is a process of encoding data so that the most frequently occurring data are represented by the fewest symbols.

Compression Algorithms
Can be lossless, meaning that the original data can be reconstructed exactly, or lossy, meaning only an approximate reconstruction of the data is possible.
Examples
• ZIP and GIF are lossless
• JPEG and MPEG are lossy

Run-Length Encoding (RLE)
• Simple form of data compression (introduced very early, but still in use).
• Only useful where there are long runs of the same data (e.g., black and white images).
• Repeated symbols (runs) are replaced by a single symbol and a count.

Huffman Encoding
• Code created using a so-called code tree by arranging chars from top to bottom according to increasing probabilities.
• Uses the code tree to both encode and decode.
• Must know: how to create the code tree; how to use the code tree to encode/decode.

Using Huffman Tree: Assigning Labels
The label that gets assigned to a letter is the sequence of binary digits along the path connecting the top to the desired letter.

Creating a Huffman Code Tree
• Constructed from a frequency table.
• Freq table shows the number of times (as a fraction of total) that each char occurs in the document.
• Freq table is specific to the document being compressed, so every doc has its own code tree.

Cryptography
The study of methods to make, and break, secret codes.
Process of coding information to prevent unauthorized use is called encryption.
Encryption has been used for thousands of years.

Caesar Cipher or Shift Cipher
• Identify letters with numbers mod 26: A → 0, B → 1, C → 2, etc.
• Each char (A to Z) is “shifted” by a fixed amount, d, known to both parties.
• To encrypt: shift d letters to the “right”: <letter> → (<letter> + d) mod 26.
• To decrypt: shift d letters to the “left”: <letter> → (<letter> − d) mod 26.

Decimation Cipher
• Caesar cipher works by rearranging letters in a simple way: add a fixed number to each letter and use mod arithmetic.
• Decimation cipher permutes letters in a more complicated way: multiply each letter by a fixed number and use mod arithmetic.
• Again, identify the letters with Z26, so (A = 0, B = 1, … , Z = 25).

Linear Cipher
• Let n, m, d be as before: d = shift; (n · m) ≡ 1 mod 26.
• Example: n = 3 and d = 5, get m = 9.
• Encrypt: x → (n · x + d) mod 26
• Decrypt: y → (m · (y − d)) mod 26

Vigenère Cipher
• Starts with a key: a word, phrase, or random letters.
• Letters in the key indicate the amount to shift the corresponding letters in the message (as in Caesar cipher).
• Line up letters of the key with letters of the message; repeat the key as necessary.
• “Add” message and key letter by letter (mod 26).
• To decrypt, repeat, but subtract the key from the encrypted message.

Online Data Security
One method uses a bit-by-bit Vigenère cipher:
• Data is represented as a binary number.
• Key is a (long, randomly generated) binary number.
• Data is encrypted by adding the key bit-by-bit (mod 2).
• Data is decrypted by adding the key a second time.

Public-Key Cryptography
• Algorithms are defined by keys: if you know the key, you know the algorithm.
• One key is published (the public key); the other key is kept secret (the private key).
• This means one algorithm is public and the other is secret/private.

Using Public-Key Cryptography
• To send a message, encrypt it with the recipient’s public key.
• To read a received message, decrypt it with your private key.
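As a review aid for the letter-based ciphers above, here is a minimal sketch of the linear cipher (which reduces to the Caesar cipher when n = m = 1) and the Vigenère cipher. It is not from the original slides: the function names are hypothetical, and the code assumes messages and keys consist only of uppercase letters A to Z.

```python
A = ord("A")  # letters are identified with 0..25 (A = 0, ..., Z = 25)


def linear_encrypt(text: str, n: int, d: int) -> str:
    """Encrypt each letter x as (n*x + d) mod 26."""
    return "".join(chr((n * (ord(ch) - A) + d) % 26 + A) for ch in text)


def linear_decrypt(text: str, m: int, d: int) -> str:
    """Decrypt each letter y as m*(y - d) mod 26, where n*m = 1 mod 26."""
    return "".join(chr((m * (ord(ch) - A - d)) % 26 + A) for ch in text)


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Add (or, when decrypting, subtract) the repeated key letter by letter mod 26."""
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - A  # repeat the key as necessary
        out.append(chr((ord(ch) - A + sign * shift) % 26 + A))
    return "".join(out)


# Slide example: n = 3, d = 5, so m = 9 because 3 * 9 = 27 = 1 mod 26.
secret = linear_encrypt("HELLO", n=3, d=5)
print(secret)                              # ARMMV
print(linear_decrypt(secret, m=9, d=5))    # HELLO
print(vigenere("ATTACK", "KEY"))           # KXRKGI
```

Python's `%` operator always returns a value in 0..25 here, which is why the subtraction in the decrypt steps needs no special handling of negative numbers.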
RSA Algorithm
• Two keys: public key and private key.
• Either key can encrypt a message: if one key encrypts a message, the other key will decrypt it.
• Knowing one key does not allow finding the other key!

RSA Algorithm
To encode/decode messages:
• sending a message: encrypt with recipient’s public key
• reading a received message: decrypt with your private key

The Keys
• Each key consists of two numbers: an exponent and a modulus.
• Public key: r and n, respectively.
• Private key: s and n, respectively.
• The modulus n is the same in both.

RSA Cryptography Algorithm
Given key (r,n), encrypt msg as follows:
1. Convert message to a string of digits.
2. Break message into uniformly sized blocks, padding the last block with 0’s if needed. Call the blocks M1, M2, ... , Mk.
3. Check that each Mi has no common divisors with n besides 1.
4. Calculate and send Ri = (Mi)^r mod n.

Key Generation Algorithm
1. Pick distinct primes p, q; put n = pq.
2. Let m be the LCM of p − 1 and q − 1.
3. Pick r so it has no common divisors with m except 1. (That is, r and m are relatively prime.)
4. Find s so r · s ≡ 1 mod m; i.e., s is the multiplicative inverse of r (mod m).
5. Public/private keys are (r,n)/(s,n).
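The key generation and encryption steps above can be sketched with toy numbers. This is an illustration only, not part of the original slides: the function names `make_keys` and `apply_key` are hypothetical, the primes p = 7 and q = 11 are far too small for real security, and `pow(r, -1, m)` for the modular inverse assumes Python 3.8 or later.

```python
from math import gcd


def make_keys(p: int, q: int, r: int):
    """Follow the key generation steps: return (public, private) = ((r, n), (s, n))."""
    n = p * q
    m = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # LCM of p-1 and q-1
    if gcd(r, m) != 1:
        raise ValueError("r must be relatively prime to m")
    s = pow(r, -1, m)  # multiplicative inverse of r mod m (Python 3.8+)
    return (r, n), (s, n)


def apply_key(block: int, key) -> int:
    """Raise a message block to the key's exponent, mod the key's modulus."""
    exponent, modulus = key
    if gcd(block, modulus) != 1:
        raise ValueError("block must share no divisor with n besides 1")
    return pow(block, exponent, modulus)


# Toy example (assumed values): p = 7, q = 11, r = 7 gives n = 77, m = 30, s = 13.
public, private = make_keys(7, 11, 7)
cipher = apply_key(2, public)       # 2^7 mod 77
print(public, private, cipher)      # (7, 77) (13, 77) 51
print(apply_key(cipher, private))   # 2
```

Because either key undoes the other, calling `apply_key` with the private key first and the public key second also recovers the original block.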