Csci1802 Computer Systems
Matrices Lecture 3

Learning outcomes
Knowledge of the use of matrices in data transmission

Matrices 3 - matrix encoding

1 Introduction
We are interested in binary codes such as ASCII. We want to be able to send messages in these codes and to take steps to pick up any errors in transmission. We are not interested in secret messages, only in coding that helps ensure the message is transmitted without error.

2 Error detecting codes
We send a vector x of bits, e.g. an 8-bit ASCII word such as 11010101. If there is noise on the line, we might receive a vector y ≠ x at the other end. If one bit is in error, we could get 11010001. It would be useful to know that there was a problem, and we can do this if we use an error detecting code.

2.1 Example
Don't just send the 8-bit word; add a parity check digit, so that we send 9 bits. The final digit is chosen so that the number of ones in the transmitted word is even. The word above, 11010101, contains five ones, so we transmit 110101011, the final 1 making the parity even. If an error in transmission results in a wrong bit being received, so that for example we get 110100011, we detect the error because the parity is now odd.

Of course this scheme is only 1-error detecting. Two errors will be missed: 110000011 has even parity, so we don't realise there is a problem. However, the chance of two errors is small compared with the chance of one error, so maybe we can live with this; if we can't, we could try a more complex form of coding. There is always a trade-off between how much error detection you can do and how much effort you are willing to put into the coding/decoding process.

2.2 Exercise
We could have used a repetition code. In this case you send 1101010111010101, i.e. the word repeated twice. This method is also error detecting. Is it 1-error or 2-error detecting?

3 Error correction
Even better than error detection is error correction.
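The even-parity scheme of Example 2.1 is small enough to sketch directly. This is a minimal illustration, assuming words are held as Python strings of '0' and '1' characters; the function names `add_even_parity` and `parity_ok` are hypothetical, chosen only for this example. It reproduces the three cases in the text: the correct word, a single error (caught) and a double error (missed).

```python
def add_even_parity(word: str) -> str:
    """Append one parity bit so that the total number of 1s is even."""
    parity = word.count("1") % 2  # 1 if the word has an odd number of 1s
    return word + str(parity)


def parity_ok(received: str) -> bool:
    """Return True when the received word has an even number of 1s."""
    return received.count("1") % 2 == 0


sent = add_even_parity("11010101")
print(sent)                    # 110101011
print(parity_ok(sent))         # True
print(parity_ok("110100011"))  # False: one bit flipped, error detected
print(parity_ok("110000011"))  # True: two bits flipped, error missed
```

The check only counts ones, which is exactly why any even number of flipped bits slips through.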
There are codes that allow you to correct the received word. Obviously you can only correct up to a point: if there are too many errors you cannot cope.

3.1 Example
Triple repetition. Send 110101011101010111010101, i.e. the word three times. This gives both error detection and error correction. If one mistake occurs, then in every bit position at least two of the three copies agree, so a majority vote tells us what was sent. Hence this is a 1-error correcting code.

4 Hamming distance
Notice that in the examples above the word length increases when we add detection and correction. One of the aims of coding is to keep the word length down while still getting good correction, so we have to compromise between the two.

The basic idea is that we start with some original messages. These are short, but in a sense the words are very close together: changing a single bit can turn one message into another. To detect and correct errors we need to code the messages as words that are further apart. Unfortunately we can only do this by making the code words longer.

4.1 Example
The (extended) ASCII code is 8 bit, so it has 2^8 = 256 messages. There are no spare bytes: every byte stands for a character, so any error gives a word that still decodes, and the error goes unnoticed. ASCII with a parity check (i.e. a 9-bit code) has 2^9 = 512 available words but uses only 256 of them, so some errors will be found. This is because an error in one bit results in a received word that doesn't stand for anything. The 9-bit codewords are further apart: a single mistake can't take you to another meaningful word.

The Hamming distance between two vectors is the number of places in which the two vectors differ.

4.2 Example
d(110,101)=2, d(000,111)=3, d(100,101)=1, d(100,100)=0.

The minimum distance d of a coding scheme is the shortest distance between any two distinct codewords. For example, if we use extended ASCII then d=1, since there are codewords at distance 1 from each other. However, if we use ASCII with an added parity check digit then we get d=2 (why?).
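The definitions in Sections 3 and 4 can be checked mechanically. The sketch below is illustrative only: it assumes codewords are Python strings of '0'/'1', and the helper names (`hamming_distance`, `minimum_distance`, `decode_triple_repetition`) are hypothetical. It computes the distances of Example 4.2, finds the minimum distance of all 256 bytes with and without a parity bit by brute force over every pair, and decodes the triple repetition code of Example 3.1 by a bit-by-bit majority vote.

```python
from itertools import combinations


def hamming_distance(u: str, v: str) -> int:
    """Number of positions where two equal-length bit strings differ."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(u, v))


def minimum_distance(code: list[str]) -> int:
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(u, v) for u, v in combinations(code, 2))


def decode_triple_repetition(received: str) -> str:
    """Split into three copies and take a majority vote in each position."""
    n = len(received) // 3
    copies = [received[i * n:(i + 1) * n] for i in range(3)]
    return "".join(
        "1" if sum(c[i] == "1" for c in copies) >= 2 else "0"
        for i in range(n)
    )


print(hamming_distance("110", "101"))  # 2
print(hamming_distance("000", "111"))  # 3

# Every byte is a codeword, so d = 1; adding an even-parity bit gives d = 2.
bytes8 = [format(i, "08b") for i in range(256)]
with_parity = [w + str(w.count("1") % 2) for w in bytes8]
print(minimum_distance(bytes8), minimum_distance(with_parity))  # 1 2

# One error in the first copy of 11010101 is outvoted by the other two.
print(decode_triple_repetition("010101011101010111010101"))  # 11010101
```

Brute force over all pairs is fine for a few hundred codewords; it is a way to confirm the definition, not an efficient algorithm for large codes.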
The importance of the minimum distance d is given by the following two facts: up to $d-1$ errors can be detected, and up to $t$ errors can be corrected if $d = 2t+1$ (in general $t = \lfloor (d-1)/2 \rfloor$). So the bigger we make d the better, provided we don't increase the word length too much.

5 Matrix coding
It turns out that we can generate nice codes with matrices. These codes also have quick decoding techniques (which is important), but the mathematics of that is too sophisticated for an introductory course.

Since we are using binary codes, we do all our matrix arithmetic mod 2. That means that 1+1=0. Apart from that, matrix multiplication is as before.

The words we want to transmit are vectors; in our case we could take all our messages to be 8-bit ASCII words, so every 8-bit vector stands for something. We code the words using an $8 \times n$ matrix $G$ (where $n > 8$). The word $x$ is coded as $xG$ (i.e. the row vector $x$ multiplied by the matrix $G$). This gives a longer vector, of length $n$.

A real example would involve messy and tedious calculations, so we will illustrate the ideas with a restricted two-bit code. In the real world you just use bigger matrices and get a machine to do the calculations.

5.1 Example
We will have a two-bit binary code, so we can only have four symbols in our word set. Let our messages be: "a" 00, "b" 01, "c" 10, "d" 11.

We use the coding matrix
$$G = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$$
Then "a" codes as
$$(0\ 0)\begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix} = (0\ 0\ 0\ 0),$$
and "c" codes as
$$(1\ 0)\begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix} = (1\ 0\ 1\ 1).$$

5.2 Exercise
Find the codes for "b" and "d". You should notice that a message codes as itself followed by some extra bits, the 'parity check' bits. Don't forget that the arithmetic is mod 2.

When using matrix codes we perform 'parity checking' with a matrix as well. Recall that $I$ is the identity matrix and that $A^T$ is the transpose of $A$. The coding matrices always have a special form, and the recipe for error detection uses it.
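Before the detection recipe, here is a minimal sketch of the encoding step $xG$ with mod 2 arithmetic, using the coding matrix of Example 5.1. It assumes messages and matrices are plain Python lists of 0/1 integers; `encode` and `capabilities` are hypothetical helper names for this illustration. The second helper just restates the two facts about $d$ from the start of this section.

```python
def encode(message: list[int], G: list[list[int]]) -> list[int]:
    """Return the codeword xG, with every sum reduced mod 2."""
    n = len(G[0])  # codeword length = number of columns of G
    return [sum(x * row[j] for x, row in zip(message, G)) % 2 for j in range(n)]


def capabilities(d: int) -> tuple[int, int]:
    """(errors detectable, errors correctable) for minimum distance d."""
    return d - 1, (d - 1) // 2


# Coding matrix from Example 5.1
G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]

print(encode([0, 0], G))  # [0, 0, 0, 0]  ("a")
print(encode([1, 0], G))  # [1, 0, 1, 1]  ("c")
print(capabilities(3))    # (2, 1), e.g. the triple repetition code
```

Each output entry is one column of $G$ dotted with the message, so the `% 2` is where "1+1=0" enters.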
The coding matrix has the form $G = (I \mid A)$; in the example case
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$
We put $H = (A^T \mid I)$, and since
$$A^T = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$$
we get
$$H = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}.$$
$H$ is called the parity check matrix. We use $H$ to check whether the received word is in the code: $x$ is in the code if $Hx^T = 0$; otherwise there has been an error.

5.3 Example
(a) 1011 is sent. The vector received at the other end turns out to be 1011. The person receiving does the calculation
$$H\begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
so everything is OK.

(b) 1011 is sent. This time the vector received is 1010. The person receiving does the calculation
$$H\begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Since this is not the zero vector, the word is not in the code and there has been an error.

We can use these vector methods for error correcting codes as well, but you need to know more mathematics for that.

Exercises for Matrices Lecture 3
Error detecting codes

1. (a) Find the code generated by the matrix
$$G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$
Take the original messages to be 00, 10, 01, 11, standing for the restricted alphabet 'a', 'b', 'c', 'd'.
(b) Work out the distance value d for the original code and for the new code. In each case calculate how many errors can be detected and corrected.

2. Find the parity check matrix H for the code you get when you code the restricted alphabet [a,b,c,d] using
$$G = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}.$$

3. The probability of an error in a single byte transmission is 0.001. The triple repetition code is used to send a single ASCII character. What is the probability that it will be transmitted and correctly decoded?

4. Take the following binary coding for the letters a, b, c, d: a is 0011, b is 1001, c is 0000, d is 1010.
Find the d value for this code (i.e. the minimum distance between codewords).
Use G to form a new code, where G is the 4 × 7 matrix 1 0 G= 0 0 0 0 0 0 1 1 1 0 0 1 0 1 0 1 0 1 1 0 0 0 1 1 1 1
Find the d value for this new code.
How many errors can this new code detect? How many errors can it correct?
Find the parity check matrix H and use it to decide whether any of the following are in the code: 0101010, 1111110, 1001001, 1110100.

5. Write down two 2 × 5 coding matrices and their corresponding parity check matrices.
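As a companion to Section 5, the parity-check recipe can be sketched in a few lines: build $H = (A^T \mid I)$ from the $A$ in $G = (I \mid A)$, then accept a received word $x$ exactly when $Hx^T = 0$ mod 2. This is a minimal illustration, assuming matrices and words are Python lists of 0/1 integers; `parity_check_matrix`, `syndrome` and `in_code` are hypothetical names. It repeats the two checks of Example 5.3.

```python
def parity_check_matrix(A: list[list[int]]) -> list[list[int]]:
    """Build H = (A^T | I) from the A in G = (I | A)."""
    k, r = len(A), len(A[0])  # A has k rows and r columns
    At = [[A[i][j] for i in range(k)] for j in range(r)]  # transpose: r x k
    I = [[1 if i == j else 0 for j in range(r)] for i in range(r)]
    return [At[i] + I[i] for i in range(r)]  # join A^T and I side by side


def syndrome(H: list[list[int]], word: list[int]) -> list[int]:
    """Compute H x^T mod 2 for the received word x."""
    return [sum(h * x for h, x in zip(row, word)) % 2 for row in H]


def in_code(H: list[list[int]], word: list[int]) -> bool:
    """A word is in the code exactly when its syndrome is the zero vector."""
    return not any(syndrome(H, word))


# A from the example G = (I | A) of Section 5
A = [[1, 1],
     [0, 1]]
H = parity_check_matrix(A)
print(H)                          # [[1, 0, 1, 0], [1, 1, 0, 1]]
print(syndrome(H, [1, 0, 1, 1]))  # [0, 0]  -> in the code
print(syndrome(H, [1, 0, 1, 0]))  # [0, 1]  -> error detected
```

A non-zero syndrome only signals that something went wrong; using it to locate and correct the error is the further mathematics the notes mention.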