TLT-5400/5406 DIGITAL TRANSMISSION, Exercise 1, Spring 2016
(© Mikko Valkama / TUT)

Problem 1.

In general, the entropy H(X) of a discrete random variable X is defined as

H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log_2 P_X(x).

So now we can write the solution directly as

H(X) = \tfrac{1}{2}\log_2(2) + \tfrac{1}{4}\log_2(4) + \tfrac{1}{8}\log_2(8) + \tfrac{1}{8}\log_2(8)
     = \tfrac{1}{2} + \tfrac{2}{4} + \tfrac{3}{8} + \tfrac{3}{8}
     = \tfrac{4}{8} + \tfrac{4}{8} + \tfrac{3}{8} + \tfrac{3}{8}
     = \tfrac{14}{8} = 1.75 bits,

where we have used the facts that \log_y(y^x) = x and \log_y(1/x) = -\log_y(x). Then, the rate of the source is given by

R = r H(X) = 175 bits/s,

where r is the symbol rate of the source (here 100 symbols/s).

The following encoding "rule" achieves the entropy limit (and is also feasible to decode, since no codeword is a prefix of any other codeword):

Outcome   Probability   Codeword   Codeword length
a1        1/2           0          1 bit
a2        1/4           10         2 bits
a3        1/8           110        3 bits
a4        1/8           111        3 bits

The average number of bits produced by the coder is now

\bar{L} = \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3 + \tfrac{1}{8}\cdot 3 = 1.75 bits = H(X),

meaning that in this example we indeed achieve the entropy limit. It is, however, not necessarily obvious when it is possible to reach the entropy limit in general. This issue is considered in Problem 2.

Problem 2.

Kraft's inequality for codeword lengths is given by

\sum_{i=1}^{K} M^{-l_i} \le 1,

where M is the number of code symbols (M = 2 for binary codes), K is the number of source words (and hence also the number of codewords), and l_i is the length of the i-th codeword. It can be shown that every code (prefix or not) has to satisfy this inequality in order to be uniquely decodable.

In general, the average codeword length is then defined as (this is the natural definition that we already used in Problem 1)

\bar{L} = \sum_{i=1}^{K} l_i P_i,

where P_i is the probability of the i-th source word. The target is then to minimize \bar{L} given that Kraft's inequality is fulfilled (and given, of course, that the l_i are positive integers).

Clearly, if we are able to choose

l_i = \log_M \frac{1}{P_i}

(verify that these codeword lengths l_i fulfill Kraft's inequality!), it follows that the average codeword length \bar{L} is given by

\bar{L} = \sum_{i=1}^{K} l_i P_i = \sum_{i=1}^{K} P_i \log_M \frac{1}{P_i} = -\sum_{i=1}^{K} P_i \log_M P_i = H_M(X).

So it really is possible to obtain an average codeword length equal to the source word entropy. Of course, in practice this is only possible if the source word probabilities are of a certain kind (i.e., -\log_M(P_i) is an integer for all i).

More generally, we can "round up the above idea" by choosing

l_i = \left\lceil \log_M \frac{1}{P_i} \right\rceil,

where \lceil x \rceil stands for the smallest integer \ge x. Then, it follows that also these lengths l_i satisfy Kraft's inequality, since

\sum_{i=1}^{K} M^{-l_i} = \sum_{i=1}^{K} M^{-\lceil \log_M(1/P_i) \rceil} \le \sum_{i=1}^{K} M^{-\log_M(1/P_i)} = \sum_{i=1}^{K} P_i = 1.

Based on this, we really can choose the codeword lengths in this way (that is, by Kraft's inequality, there really is a code with these codeword lengths l_i), and we can write

\log_M \frac{1}{P_i} \le \left\lceil \log_M \frac{1}{P_i} \right\rceil < \log_M \frac{1}{P_i} + 1   (by the definition of \lceil x \rceil),

or simply

\log_M \frac{1}{P_i} \le l_i < \log_M \frac{1}{P_i} + 1.

Then, multiplying by P_i (\ge 0) and summing over i, we get

H_M(X) \le \bar{L} < H_M(X) + 1.   (q.e.d.)

Once more, the lower bound can be reached iff we are able to set l_i = \log_M(1/P_i).
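As a quick numerical sanity check of Problems 1 and 2 (not part of the original solution), the short Python sketch below computes the entropy of the four-symbol source, the average length of the prefix code given above, and the rounded-up lengths l_i = ⌈log2(1/P_i)⌉ together with their Kraft sum. The variable names are, of course, just illustrative.

```python
import math

# Source of Problem 1: outcomes a1..a4 with the given probabilities and codewords
probs = [1/2, 1/4, 1/8, 1/8]
code_lengths = [1, 2, 3, 3]          # lengths of the codewords 0, 10, 110, 111

# Entropy H(X) in bits
H = -sum(p * math.log2(p) for p in probs)

# Average codeword length of the prefix code above
L_avg = sum(p * l for p, l in zip(probs, code_lengths))

# "Rounded-up" lengths l_i = ceil(log2(1/P_i)) from Problem 2 and their Kraft sum
shannon_lengths = [math.ceil(math.log2(1 / p)) for p in probs]
kraft_sum = sum(2 ** (-l) for l in shannon_lengths)

print(f"H(X)        = {H:.2f} bits")        # 1.75
print(f"average L   = {L_avg:.2f} bits")    # 1.75, equal to H(X)
print(f"Shannon l_i = {shannon_lengths}")   # [1, 2, 3, 3] here, since -log2(P_i) are integers
print(f"Kraft sum   = {kraft_sum:.2f}")     # <= 1, so a prefix code with these lengths exists
```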
Problem 3.

Source word probabilities: 0.62, 0.23, 0.10, 0.02, 0.02, 0.01.

a) Binary Huffman code, M = 2, code symbols 0 and 1. Combining the two least probable symbols at each stage gives the merging sequence

0.02 + 0.01 = 0.03,   0.02 + 0.03 = 0.05,   0.10 + 0.05 = 0.15,   0.23 + 0.15 = 0.38,   0.62 + 0.38 = 1.0,

with the label 0 assigned to the more probable branch and 1 to the less probable branch of each node. So the codewords are: w1 = 0, w2 = 10, w3 = 110, w4 = 1111, w5 = 11100, w6 = 11101. The average code length is then

\bar{L} = 0.62\cdot 1 + 0.23\cdot 2 + 0.10\cdot 3 + 0.02\cdot 4 + 0.02\cdot 5 + 0.01\cdot 5 = 1.61 bits.

The source word entropy in turn is

H(X) = -\sum_{i=1}^{6} P_i \log_2 P_i \approx 1.54 bits < \bar{L} (of course).

b) Ternary Huffman code, M = 3, so s = 2 symbols are combined in the first stage; code symbols a, b and c (as an example). The merging sequence is

0.02 + 0.01 = 0.03,   0.10 + 0.02 + 0.03 = 0.15,   0.62 + 0.23 + 0.15 = 1.0.

So the codewords are: w1 = c, w2 = b, w3 = ac, w4 = aa, w5 = abb, w6 = aba. The average code length is now

\bar{L} = 0.62\cdot 1 + 0.23\cdot 1 + 0.10\cdot 2 + 0.02\cdot 2 + 0.02\cdot 3 + 0.01\cdot 3 = 1.18 ternary units.

The source word entropy is

H_3(X) = -\sum_{i=1}^{6} P_i \log_3 P_i \approx 0.97 ternary units < \bar{L}

(of course; notice the base-3 logarithm due to the ternary symbols). A small programmatic check of the binary Huffman construction of part a) is sketched after Problem 4 below.

Problem 4.

The following discrete memoryless channel is considered: inputs x1 = 0 and x2 = 1, outputs y1 = 0, y2 = 1 and y3 = 2, with transition probabilities

P_{Y|X}(y_1|x_1) = 1-p,  P_{Y|X}(y_2|x_1) = p,  P_{Y|X}(y_3|x_1) = 0,
P_{Y|X}(y_1|x_2) = 0,    P_{Y|X}(y_2|x_2) = p,  P_{Y|X}(y_3|x_2) = 1-p

(an erasure-type channel, with y2 playing the role of the erasure symbol). Let's denote the input probabilities by P_X(x_1) = q and P_X(x_2) = 1 - q (why?). Then, by definition, we can write the conditional entropy of Y given X as

H(Y|X) = -\sum_{x \in \mathcal{X}} P_X(x) \sum_{y \in \mathcal{Y}} P_{Y|X}(y|x) \log_2 P_{Y|X}(y|x)

or (using the conditional probabilities of the channel, with the convention 0\cdot\log_2 0 = 0)

H(Y|X) = -q[(1-p)\log_2(1-p) + p\log_2 p + 0\cdot\log_2 0] - (1-q)[0\cdot\log_2 0 + p\log_2 p + (1-p)\log_2(1-p)]
       = -(1-p)\log_2(1-p) - p\log_2 p.

Then, the actual mutual information is given by

I(X;Y) = H(Y) - H(Y|X) = H(Y) + (1-p)\log_2(1-p) + p\log_2 p.

In general, in order to evaluate H(Y), we need to find the probabilities of the y_i. According to the total probability formula, we can first write

P_Y(y_1) = P_{Y|X}(y_1|x_1)P_X(x_1) + P_{Y|X}(y_1|x_2)P_X(x_2) = (1-p)q + 0 = (1-p)q
P_Y(y_2) = P_{Y|X}(y_2|x_1)P_X(x_1) + P_{Y|X}(y_2|x_2)P_X(x_2) = pq + p(1-q) = p
P_Y(y_3) = P_{Y|X}(y_3|x_1)P_X(x_1) + P_{Y|X}(y_3|x_2)P_X(x_2) = 0 + (1-p)(1-q) = (1-p)(1-q).

Then, the entropy H(Y) is given by

H(Y) = -\sum_{y \in \mathcal{Y}} P_Y(y) \log_2 P_Y(y)
     = -(1-p)q \log_2[(1-p)q] - p\log_2 p - (1-p)(1-q)\log_2[(1-p)(1-q)]
     = -(1-p)q[\log_2(1-p) + \log_2 q] - p\log_2 p - (1-p)(1-q)[\log_2(1-p) + \log_2(1-q)].

This can be simplified further to

H(Y) = -p\log_2 p - (1-p)\log_2(1-p) - q(1-p)\log_2 q - (1-q)(1-p)\log_2(1-q).

Then we can substitute this into the expression for I(X;Y), which yields

I(X;Y) = H(Y) + (1-p)\log_2(1-p) + p\log_2 p = -(1-p)[q\log_2 q + (1-q)\log_2(1-q)].

The actual channel capacity C_S is in turn obtained when the above mutual information I(X;Y) is maximized over the input distribution. So we should choose the value of q in such a way that the expression for I(X;Y) above is maximized. In this case, the maximization is trivial, since

C_S = \max_q I(X;Y) = \max_q \{-(1-p)[q\log_2 q + (1-q)\log_2(1-q)]\} = (1-p)\max_q \{-q\log_2 q - (1-q)\log_2(1-q)\}.

What is left is the binary entropy function, which is maximized when q = 1/2. So the channel capacity is then

C_S = (1-p)\left[-\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2}\right] = 1 - p.

To once more summarize the main results (for this problem):

H(Y|X) = -(1-p)\log_2(1-p) - p\log_2 p
I(X;Y) = -(1-p)[q\log_2 q + (1-q)\log_2(1-q)]
C_S = 1 - p

(e.g., if p = 0 then C_S = 1, and if p = 1 then C_S = 0, which conforms with our intuition; see the channel model).
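As a quick numerical cross-check of this capacity result (not part of the original solution), the sketch below evaluates I(X;Y) = -(1-p)[q log2 q + (1-q) log2(1-q)] on a grid of q values for an arbitrarily chosen example value p = 0.3, and confirms that the maximum is attained at q = 1/2 and equals 1 - p.

```python
import math

def mutual_information(p: float, q: float) -> float:
    """I(X;Y) = -(1 - p) * [q*log2(q) + (1-q)*log2(1-q)] for the channel of Problem 4."""
    def h(t):  # -t*log2(t), with the convention 0*log2(0) = 0
        return 0.0 if t == 0 else -t * math.log2(t)
    return (1 - p) * (h(q) + h(1 - q))

p = 0.3                                   # example value of the transition probability p
qs = [k / 1000 for k in range(1001)]      # grid of input probabilities q = P_X(x1)
best_q = max(qs, key=lambda q: mutual_information(p, q))

print(f"argmax_q I(X;Y) = {best_q:.3f}")                         # 0.500
print(f"max_q   I(X;Y)  = {mutual_information(p, best_q):.4f}")  # 0.7000 = 1 - p
```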
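Returning to Problem 3 a), the Huffman codeword lengths are also easy to verify programmatically. The following is only an illustrative sketch based on Python's heapq (not the course's reference implementation); the tie between the two 0.02-probability symbols may swap the two longest codewords, but the multiset of lengths and the average of 1.61 bits agree with the hand construction above.

```python
import heapq

def huffman_lengths(probs):
    """Return binary Huffman codeword lengths for the given probabilities."""
    # heap entries: (probability, unique id, list of leaf indices in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:         # each merge adds one bit to every leaf below it
            lengths[leaf] += 1
        heapq.heappush(heap, (p1 + p2, len(probs) + len(heap), leaves1 + leaves2))
    return lengths

probs = [0.62, 0.23, 0.10, 0.02, 0.02, 0.01]
lengths = huffman_lengths(probs)
avg = sum(p * l for p, l in zip(probs, lengths))
print(lengths)                            # [1, 2, 3, 5, 4, 5]: same lengths as in a),
                                          # with the two 0.02-symbols swapped by tie-breaking
print(f"average length = {avg:.2f} bits") # 1.61 bits
```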
Problem 5.

In order to determine the channel capacity, we usually find an expression for the average mutual information and maximize it with respect to the source probabilities. However, there is an easier way to determine the channel capacity if the channel has a certain symmetry (this usually makes the actual calculations much simpler to carry out!).

To check the symmetry condition, it is convenient to collect the conditional probabilities into a conditional probability matrix P = \{p_{ij}\} = \{P_{Y|X}(y_i|x_j)\}. Then, if each row (column) of P is just a permutation of any other row (column), equally probable input symbols will maximize the mutual information. "Physically" this means that under such a symmetry condition, H(Y|X) turns out to be independent of the source probabilities, and equally probable input symbols also make the output values equally probable. Then, clearly,

I(X;Y) = H(Y) - H(Y|X)

is maximized.

For our channel, the conditional probability matrix is given by

P = \begin{bmatrix} P_{Y|X}(y_1|x_1) & P_{Y|X}(y_1|x_2) \\ P_{Y|X}(y_2|x_1) & P_{Y|X}(y_2|x_2) \\ P_{Y|X}(y_3|x_1) & P_{Y|X}(y_3|x_2) \\ P_{Y|X}(y_4|x_1) & P_{Y|X}(y_4|x_2) \end{bmatrix}
  = \begin{bmatrix} 1/3 & 1/6 \\ 1/3 & 1/6 \\ 1/6 & 1/3 \\ 1/6 & 1/3 \end{bmatrix},

and the symmetry conditions are fulfilled. As a consequence, P_X(x_1) = P_X(x_2) = 1/2 will maximize the average mutual information.

The output symbol probabilities are then (the total probability formula again; it is actually quite an important result from basic probability theory)

P_Y(y_i) = P_{Y|X}(y_i|x_1)P_X(x_1) + P_{Y|X}(y_i|x_2)P_X(x_2) = \frac{1}{3}\cdot\frac{1}{2} + \frac{1}{6}\cdot\frac{1}{2} = \frac{1}{4}

for all i. So the mutual information (and, in this case, the capacity) is given by

I(X;Y) = H(Y) - H(Y|X)
       = \log_2(4) + \sum_{x \in \mathcal{X}} P_X(x) \sum_{y \in \mathcal{Y}} P_{Y|X}(y|x)\log_2 P_{Y|X}(y|x)
       = 2 + \tfrac{1}{2}[p_1\log_2 p_1 + p_1\log_2 p_1 + p_2\log_2 p_2 + p_2\log_2 p_2]
           + \tfrac{1}{2}[p_2\log_2 p_2 + p_2\log_2 p_2 + p_1\log_2 p_1 + p_1\log_2 p_1]
       = 2 + 2p_1\log_2 p_1 + 2p_2\log_2 p_2
       = 2 + \tfrac{2}{3}\log_2\tfrac{1}{3} + \tfrac{2}{6}\log_2\tfrac{1}{6}
       \approx 0.0817 bits / channel use = C_S,

where we have denoted the two distinct transition probabilities by p_1 = 1/3 and p_2 = 1/6.

In this example, the capacity is quite small (when compared to the "maximum capacity" of 1 bit / channel use that could in general be achieved with binary inputs). Can you explain that intuitively?

In the original figure (not reproduced in this text version), the previous capacity

C_S = 2 + 2p_1\log_2 p_1 + 2p_2\log_2 p_2

is depicted as a function of p_1. Notice that p_1 + p_2 = 0.5, or p_2 = 0.5 - p_1 (why?), so there really is only one "free" variable describing the channel quality in this case.
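The dependence of C_S on p_1 referred to above is easy to reproduce numerically. The sketch below (illustrative only; the chosen sample points are arbitrary) evaluates C_S(p_1) = 2 + 2 p_1 log2(p_1) + 2 p_2 log2(p_2) with p_2 = 0.5 - p_1 for a few values of p_1, reproducing C_S ≈ 0.0817 bits at p_1 = 1/3.

```python
import math

def capacity(p1: float) -> float:
    """C_S = 2 + 2*p1*log2(p1) + 2*p2*log2(p2), with p2 = 0.5 - p1 (Problem 5)."""
    p2 = 0.5 - p1
    def term(p):                      # p*log2(p), with the convention 0*log2(0) = 0
        return 0.0 if p == 0 else p * math.log2(p)
    return 2 + 2 * term(p1) + 2 * term(p2)

for p1 in (0.0, 0.1, 0.25, 1/3, 0.4, 0.5):
    print(f"p1 = {p1:.3f}  ->  CS = {capacity(p1):.4f} bits / channel use")
# p1 = 1/3 reproduces CS ~ 0.0817; p1 = p2 = 0.25 gives CS = 0 (output independent of input),
# while p1 -> 0 or p1 -> 0.5 gives CS -> 1 bit / channel use.
```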