Introduction to Coding Theory

Outline
[1] Introduction
[2] Basic assumptions
[3] Correcting and detecting error patterns
[4] Information rate
[5] The effects of error correction and detection
[6] Finding the most likely codeword transmitted
[7] Some basic algebra
[8] Weight and distance
[9] Maximum likelihood decoding
[10] Reliability of MLD
[11] Error-detecting codes
[12] Error-correcting codes

[1] Introduction

Coding theory is the study of methods for the efficient and accurate transfer of information, in particular for detecting and correcting transmission errors.

Information transmission system:
information source → k-digit message → transmitter (encoder) → n-digit codeword → communication channel (subject to noise) → received n-digit word → receiver (decoder) → k-digit message → information sink

[2] Basic assumptions

Definitions:
- Digit: 0 or 1 (binary digit).
- Word: a sequence of digits. Example: 0110101.
- Binary code: a set of words. Examples: {00, 01, 10, 11} and {0, 01, 001}.
- Block code: a code having all of its words of the same length; this common length is the length of the code. Example: {00, 01, 10, 11} has length 2.
- Codewords: the words belonging to a given code.
- |C|: the size of a code C, i.e., the number of codewords in C.

Assumptions about the channel:
1. Words over {0,1}^n are received word by word; e.g., 011011001 is received as 011, 011, 001.
2. The receiver can identify the beginning of the first word.
3. Each digit is equally likely to be affected by noise in transmission, independently of the other digits.

Binary symmetric channel (BSC):
A transmitted digit arrives unchanged with probability p (the reliability) and flipped with probability 1 - p. In many books p denotes the crossover probability; here the crossover (error) probability is 1 - p.

[3] Correcting and detecting error patterns

Any received word should be corrected to a codeword that requires as few changes as possible.
- C1 = {00, 01, 10, 11}: every word of length 2 is a codeword, so C1 cannot detect any errors.
- C2 = {000000, 010101, 101010, 111111}: if the source sends 010101 and the channel delivers 110101, the received word is not a codeword; the closest codeword is 010101, so we correct to 010101.
- C3 = {000, 011, 101, 110} (the third digit is a parity-check digit): if 010 is received, the codewords 110, 000, and 011 are all equally close, so the error is detected but cannot be corrected unambiguously.

[4] Information rate

Definition: the information rate of a code C of length n is (1/n) log2 |C|.
Examples:
- C1: (1/2) log2 4 = 1
- C2: (1/6) log2 4 = 1/3
- C3: (1/3) log2 4 = 2/3

[5] The effects of error correction and detection

1. No error detection or correction.
Let C = {0,1}^11 = {00000000000, ..., 11111111111}, the set of all binary words of length 11. Take reliability p = 1 - 10^-8 and a transmission rate of 10^7 digits/sec. Then
Pr(a word is transmitted incorrectly) = 1 - p^11 ≈ 11 × 10^-8.
At 10^7/11 words/sec this gives 11 × 10^-8 × 10^7/11 = 0.1 wrong words/sec, i.e., 1 wrong word every 10 sec, 6 wrong words/min, 360 wrong words/hr, 8640 wrong words/day.

2. A parity-check digit is added (code length becomes 12).
Any single error can be detected (indeed any odd number of errors — 3, 5, 7, ... — is detected too). An undetected error requires at least two flipped digits:
Pr(at least 2 errors in a word) = 1 - p^12 - 12 p^11 (1 - p) ≈ 66 × 10^-16.
So 66 × 10^-16 × 10^7/12 ≈ 5.5 × 10^-9 wrong words/sec — about one undetected word error every 2000 days! The cost we pay is a slightly reduced information rate, plus retransmission after each detected error.

3. 3-repetition code.
Sending each 11-digit word three times allows any single error to be corrected, but the code length becomes 33 and the information rate drops to 1/3.

Task: design codes with reasonable information rates, low encoding and decoding costs, and some error-correcting capability.

[6] Finding the most likely codeword transmitted

Over a BSC with reliability p, let n be the code length and d the number of digits transmitted incorrectly, i.e., d = d(v, w) is the number of positions in which the sent codeword v and the received word w disagree. The probability that w is received when v is sent is
φ_p(v, w) = p^(n-d) (1 - p)^d.
Example: with code length n = 5 and p = 0.9, φ_p(v, v) = p^5, and φ_p(10101, 01101) = (0.9)^3 (0.1)^2 = 0.00729.
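The formula is easy to check numerically. Below is a minimal Python sketch (the helper names hamming_distance and phi are mine, not from the text) that reproduces the example above.

```python
def hamming_distance(v: str, w: str) -> int:
    """Number of positions in which the binary words v and w disagree."""
    assert len(v) == len(w)
    return sum(a != b for a, b in zip(v, w))


def phi(p: float, v: str, w: str) -> float:
    """phi_p(v, w) = p^(n-d) * (1-p)^d: the probability that w is received
    when the codeword v is sent over a BSC with reliability p."""
    n, d = len(v), hamming_distance(v, w)
    return p ** (n - d) * (1 - p) ** d


# Reproduces the example above: (0.9)^3 * (0.1)^2 = 0.00729
print(phi(0.9, "10101", "01101"))
```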
Assume v is sent when w is received if φ_p(v, w) = max{φ_p(u, w) : u ∈ C}.

Theorem 1.6.3. Suppose we have a BSC with 1/2 < p < 1. Let v1 and v2 be codewords and w a word, each of length n. Suppose that v1 and w disagree in d1 positions and v2 and w disagree in d2 positions. Then φ_p(v1, w) ≥ φ_p(v2, w) iff d1 ≤ d2.

Example: p = 0.98, and w = 00110 is received from the channel.

codeword   d (number of disagreements with 00110)
01101      3
01001      4
10100      2   ← smallest d
10101      3

By Theorem 1.6.3, the most likely codeword sent is 10100.

[7] Some basic algebra

K = {0, 1}.
Addition: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0.
Multiplication: 0 · 0 = 0, 0 · 1 = 0, 1 · 0 = 0, 1 · 1 = 1.

K^n: the set of all binary words of length n.
Addition (componentwise): 01101 + 11001 = 10100.
Scalar multiplication: 0v = 0 (the zero word 0...0) and 1v = v.

K^n is a vector space. For words u, v, w of length n and scalars a, b:
1. v + w ∈ K^n
2. (u + v) + w = u + (v + w)
3. v + 0 = 0 + v = v
4. for each v there is a v' ∈ K^n with v + v' = v' + v = 0
5. v + w = w + v
6. av ∈ K^n
7. a(v + w) = av + aw
8. (a + b)v = av + bv
9. (ab)v = a(bv)
10. 1v = v

[8] Weight and distance

Hamming weight wt(v): the number of times the digit 1 occurs in v.
Example: wt(110101) = 4, wt(000000) = 0.
Hamming distance d(v, w): the number of positions in which v and w disagree.
Example: d(01011, 00111) = 2, d(10110, 10110) = 0.

Some facts. For words u, v, w of length n and a digit a:
1. 0 ≤ wt(v) ≤ n
2. wt(v) = 0 iff v = 0
3. 0 ≤ d(v, w) ≤ n
4. d(v, w) = 0 iff v = w
5. d(v, w) = d(w, v)
6. wt(v + w) ≤ wt(v) + wt(w)
7. d(v, w) ≤ d(v, u) + d(u, w)
8. wt(av) = a · wt(v)
9. d(av, aw) = a · d(v, w)

[9] Maximum likelihood decoding

A source string x of length k is encoded as a codeword v of length n and sent over the channel; the received word is w = v + u, where u is the error pattern. The decoder must recover v from w.

CMLD (complete maximum likelihood decoding): if exactly one codeword v in C is closest to w, decode w to v; if several codewords are equally close to w, arbitrarily select one of them.
IMLD (incomplete maximum likelihood decoding): if exactly one codeword v in C is closest to w, decode w to v; if several codewords are equally close to w, ask for retransmission.

Since d(v, w) = wt(v + w), the error pattern is u = v + w, and
φ_p(v1, w) ≥ φ_p(v2, w) iff wt(v1 + w) ≤ wt(v2 + w).
The most likely codeword sent is the one with the error pattern of smallest weight.

Example: construct the IMLD table for C = {0000, 1010, 0111} (so |C| = 3). For each received word w, compute the error pattern v + w for each codeword v and decode to the codeword whose error pattern has the unique smallest weight; "-" means retransmit.

Received w   0000 + w   1010 + w   0111 + w   Decode v
0000         0000       1010       0111       0000
1000         1000       0010       1111       -
0100         0100       1110       0011       0000
0010         0010       1000       0101       -
0001         0001       1011       0110       0000
1100         1100       0110       1011       -
1010         1010       0000       1101       1010
1001         1001       0011       1110       -
0110         0110       1100       0001       0111
0101         0101       1111       0010       0111

[10] Reliability of MLD

The probability that, if v is sent over a BSC of reliability p, IMLD correctly concludes that v was sent is

θ_p(C, v) = Σ_{w ∈ L(v)} φ_p(v, w),

where L(v) is the set of all words that are closer to v than to any other codeword. The higher this probability, the more reliably the word is decoded.

[11] Error-detecting codes

The received word is w = v + u for a codeword v and an error pattern u. The code C detects the error pattern u if v + u is not a codeword for any v in C; if v + u ∈ C for some codeword v, the error pattern goes undetected.

Example: C = {000, 111}.

Error pattern u   u + 000   u + 111
000               000       111       Can't detect
100               100       011       Can detect
010               010       101       Can detect
001               001       110       Can detect
110               110       001       Can detect
101               101       010       Can detect
011               011       100       Can detect
111               111       000       Can't detect

The distance of a code C is the smallest value of d(v, w) over all pairs of distinct codewords v, w in C.

Theorem 1.11.14. A code C of distance d will detect all non-zero error patterns of weight at most d - 1. Moreover, there is at least one error pattern of weight d which C will not detect.
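To make the theorem concrete, here is a small Python sketch (the names code_distance, xor, and detects are my own) that computes the distance of a code and checks which error patterns it detects, using C = {000, 111} from the example above.

```python
from itertools import combinations, product


def hamming_distance(v: str, w: str) -> int:
    return sum(a != b for a, b in zip(v, w))


def code_distance(C: list[str]) -> int:
    """Smallest d(v, w) over all pairs of distinct codewords."""
    return min(hamming_distance(v, w) for v, w in combinations(C, 2))


def xor(v: str, u: str) -> str:
    """Componentwise addition in K^n."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(v, u))


def detects(C: list[str], u: str) -> bool:
    """C detects u iff v + u is not a codeword, for every codeword v."""
    return all(xor(v, u) not in C for v in C)


C = ["000", "111"]
print("distance =", code_distance(C))  # 3
for u in ("".join(bits) for bits in product("01", repeat=3)):
    if u != "000":
        print(u, "detected" if detects(C, u) else "NOT detected")
# All non-zero patterns of weight <= d - 1 = 2 are detected;
# 111 (weight d = 3) is not, as Theorem 1.11.14 predicts.
```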
t-error-detecting code: a code that detects all error patterns of weight at most t and fails to detect at least one error pattern of weight t + 1. By Theorem 1.11.14, a code with distance d is a (d - 1)-error-detecting code.

[12] Error-correcting codes

Again the received word is w = v + u, where v is the codeword sent and u is the error pattern. A code C corrects the error pattern u if, for every v in C, the word v + u is closer to v than to any other codeword.

Theorem 1.12.9. A code of distance d will correct all error patterns of weight at most ⌊(d - 1)/2⌋. Moreover, there is at least one error pattern of weight 1 + ⌊(d - 1)/2⌋ which C will not correct.

t-error-correcting code: a code that corrects all error patterns of weight at most t and does not correct at least one error pattern of weight t + 1. So a code of distance d is a ⌊(d - 1)/2⌋-error-correcting code.

Example: C = {000, 111}, d = 3.

Received w   000 + w   111 + w   Decode v
000          000*      111       000
100          100*      011       000
010          010*      101       000
001          001*      110       000
110          110       001*      111
101          101       010*      111
011          011       100*      111
111          111       000*      111

(* marks the error pattern of smaller weight.)
Since ⌊(3 - 1)/2⌋ = 1, C corrects the error patterns 000, 100, 010, 001.
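The decoding table above can be reproduced mechanically. Below is a minimal Python sketch of IMLD (the names imld, xor, and hamming_weight are mine, not from the text): it decodes each received word to the codeword whose error pattern has the unique smallest weight, and returns None when retransmission is required.

```python
def hamming_weight(v: str) -> int:
    """Number of times the digit 1 occurs in v."""
    return v.count("1")


def xor(v: str, w: str) -> str:
    """Componentwise addition in K^n."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(v, w))


def imld(C: list[str], w: str) -> str | None:
    """IMLD: decode w to the codeword v whose error pattern v + w has the
    unique smallest weight; return None (ask for retransmission) on a tie."""
    ranked = sorted((hamming_weight(xor(v, w)), v) for v in C)
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return None  # several codewords equally close to w
    return ranked[0][1]


C = ["000", "111"]
# Reproduces the table above: every error pattern of weight <= 1 is corrected.
for w in ("000", "100", "010", "001", "110", "101", "011", "111"):
    print(w, "->", imld(C, w))
```

With the code from section [9], imld(["0000", "1010", "0111"], "1000") returns None, matching the "-" (retransmit) rows of that IMLD table.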