Answer the following questions:

Question # 1: (40 points)

For the following 8x8 image, where the numbers (0, 20, 50, 99) denote the gray-level intensities:

99 99 99 99 99 99 99 99
20 20 20 20 20 20 20 20
 0  0  0  0  0  0  0  0
 0  0 50 50 50 50  0  0
 0  0 50 50 50 50  0  0
 0  0 50 50 50 50  0  0
 0  0 50 50 50 50  0  0
 0  0  0  0  0  0  0  0

a. (10 points) What is the entropy of the image?
b. (20 points) Show step by step how to construct the Huffman tree to encode the intensity values in this image. Show the resulting code for each intensity value.
c. (10 points) What is the average number of bits needed for each pixel, using your Huffman code? How does it compare to the entropy?

a) The entropy is:

H(S) = -sum_{i=1..n} p_i log2 p_i = sum_{i=1..n} p_i log2 (1/p_i)

We have S = <s1, s2, s3, s4> = <0, 20, 50, 99>, and

P(s1) = 32/64 = 1/2
P(s2) = 8/64 = 1/8
P(s3) = 16/64 = 1/4
P(s4) = 8/64 = 1/8
(6 points)

So the entropy is

H = -[1/2 log2(1/2) + 1/8 log2(1/8) + 1/4 log2(1/4) + 1/8 log2(1/8)]
  = -[-1/2 - 3/8 - 1/2 - 3/8]
  = 14/8 = 1.75 bits/pixel
(4 points)

b) We take the probabilities as the frequencies of the values:

Value | Frequency
20    | 1/8
99    | 1/8
50    | 1/4
0     | 1/2

First we create a separate leaf node for each value with its frequency:

(1/8) 20    (1/8) 99    (1/4) 50    (1/2) 0
(2 points)

Then we combine the two least frequent nodes, 20 and 99, into result(1) and insert it back into the forest of nodes above:

result(1) (1/4)
├── 20 (1/8)
└── 99 (1/8)
(2 points)

Then we combine node 50 and result(1) into result(2) and insert it back into the forest, in which only value 0 remains:

result(2) (1/2)
├── 50 (1/4)
└── result(1) (1/4)
    ├── 20 (1/8)
    └── 99 (1/8)
(4 points)

Then we combine result(2) and value 0 into result(3), which completes the tree with all values combined:

result(3) (1)
├── result(2) (1/2)
│   ├── 50 (1/4)
│   └── result(1) (1/4)
│       ├── 20 (1/8)
│       └── 99 (1/8)
└── 0 (1/2)
(4 points)

Finally we label the edges with codes 0 and 1: every edge linking a parent to its left child is labeled 0, and every edge linking a parent to its right child is labeled 1.
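The entropy calculation of part (a) and the Huffman construction above can be checked with a short Python sketch using the standard min-heap Huffman algorithm. The variable names are mine; note that the 0/1 edge labels depend on tie-breaking and may come out as the complement of the hand-drawn tree, but the code lengths (and hence the average) are the same.

```python
import heapq
import math
from collections import Counter
from itertools import count

# Pixel values of the 8x8 image: 8 of 99, 8 of 20, 16 of 50, 32 of 0.
pixels = [99] * 8 + [20] * 8 + [50] * 16 + [0] * 32
freq = Counter(pixels)

# Part (a): entropy of the image.
entropy = -sum((f / 64) * math.log2(f / 64) for f in freq.values())

# Part (b): min-heap Huffman construction. Each heap entry carries the
# partial code table for its subtree; the counter breaks frequency ties.
tie = count()
heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
    f1, _, left = heapq.heappop(heap)   # least frequent subtree -> 0 branch
    f2, _, right = heapq.heappop(heap)  # next least frequent    -> 1 branch
    merged = {s: "0" + c for s, c in left.items()}
    merged.update({s: "1" + c for s, c in right.items()})
    heapq.heappush(heap, (f1 + f2, next(tie), merged))
codes = heap[0][2]

# Part (c): average code length in bits per pixel.
avg_bits = sum(freq[s] * len(c) for s, c in codes.items()) / 64
print(entropy, avg_bits)  # both 1.75
```

The code lengths come out as 1 bit for value 0, 2 bits for 50, and 3 bits each for 20 and 99, giving the same 1.75-bit average as the hand-built tree.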
result(3) (1)
├─0─ result(2) (1/2)
│    ├─0─ 50 (1/4)
│    └─1─ result(1) (1/4)
│         ├─0─ 20 (1/8)
│         └─1─ 99 (1/8)
└─1─ 0 (1/2)
(4 points)

So the code for each value is: 0 = 1, 50 = 00, 20 = 010, 99 = 011.
(4 points)

c) The average number of bits needed per pixel can be calculated as follows:

sum over all values of (number of bits for the value) x (probability of the value)
= 1(1/2) + 2(1/4) + 3(1/8) + 3(1/8)
= 1/2 + 1/2 + 3/8 + 3/8
= 14/8 = 1.75 bits/pixel

This is the same as the entropy.
(8+2 points)

Question # 2: (20 points)

Consider an alphabet with two symbols A, B, with probabilities Pr(A) = x and Pr(B) = 1 - x.

a. (10 points) Plot the entropy as a function of x. You might want to use log2 3 ≈ 1.6 and log2 7 ≈ 2.8.
b. (10 points) Discuss why it must be the case that if the probabilities of the two symbols are 0.5 + ε and 0.5 - ε, with small ε, the entropy is less than the maximum.

a) The entropy is H(S) = -sum_i p_i log2 p_i. With S = <s1, s2> = <A, B>, P(A) = x, and P(B) = 1 - x, the entropy is:

Entropy = -[x log2 x + (1 - x) log2(1 - x)]

The table below shows some plot values:

x    | Entropy
0.05 | 0.2864
0.10 | 0.4690
0.15 | 0.6098
0.20 | 0.7219
0.25 | 0.8113
0.30 | 0.8813
0.35 | 0.9341
0.40 | 0.9710
0.45 | 0.9928
0.50 | 1.0000
0.55 | 0.9928
0.60 | 0.9710
0.65 | 0.9341
0.70 | 0.8813
0.75 | 0.8113
0.80 | 0.7219
0.85 | 0.6098
0.90 | 0.4690
0.95 | 0.2864

[Plot: the binary entropy curve, symmetric about x = 0.5 and peaking at 1 bit there.]
(10 points)

b) With only two symbols, the entropy H(S) = -sum_i p_i log2 p_i is maximum when both symbols have probability exactly 0.5, and the entropy is then exactly 1 bit (the source is totally random). The more probability we add to one of the symbols (and subtract the same amount from the other), the lower the entropy becomes; so with probabilities 0.5 + ε and 0.5 - ε for any ε > 0, the entropy must be below the maximum.
(10 points)

Question # 3: (40 points)

Suppose the alphabet is {A, B, C}, and the known probability distribution is Pr(A) = 0.5, Pr(B) = 0.4, and Pr(C) = 0.1.
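Before moving on: the entropy table in part (a) of Question 2 and the claim in part (b) can be reproduced with a short Python sketch of the binary entropy function (the function name H is mine):

```python
import math

def H(x: float) -> float:
    """Binary entropy H(x) = -x*log2(x) - (1-x)*log2(1-x), in bits."""
    if x in (0.0, 1.0):
        return 0.0  # lim p*log2(p) = 0 as p -> 0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

# A few of the table's values.
for x in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"H({x:.2f}) = {H(x):.4f}")  # 0.2864, 0.8113, 1.0000, 0.8113, 0.2864

# Part (b): the maximum is exactly 1 bit at x = 0.5, and any symmetric
# deviation eps pushes the entropy strictly below that maximum.
eps = 0.125
assert H(0.5) == 1.0
assert H(0.5 + eps) < 1.0 and H(0.5 - eps) < 1.0
```

The printed values match the table, and the symmetry H(x) = H(1 - x) explains why the table reads the same from both ends.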
For simplicity, let us also assume that both encoder and decoder know that the length of the messages is always 3, so there is no need for a terminator.

a. (15 points) How many bits are needed to encode the message BBB by Huffman coding?
b. (25 points) How many bits are needed to encode the message BBB by arithmetic coding?

a) Using Huffman coding. We first list the frequencies (here the frequencies are the same as the probabilities):

Value | Frequency
C     | 0.1
B     | 0.4
A     | 0.5

Then we make them separate nodes:

(0.1) C    (0.4) B    (0.5) A

Then we combine the two least frequent nodes, C and B, into result(1), delete nodes C and B, and insert result(1) back into the forest:

result(1) (0.5)
├── C (0.1)
└── B (0.4)

Then we combine node A and result(1) to form the entire tree, which is now the only item in the forest:

result(2) (1)
├── A (0.5)
└── result(1) (0.5)
    ├── C (0.1)
    └── B (0.4)

Then we label the edges with codes 0 and 1: every edge linking a parent to its left child is labeled 0, and every edge linking a parent to its right child is labeled 1:

result(2) (1)
├─0─ A (0.5)
└─1─ result(1) (0.5)
     ├─0─ C (0.1)
     └─1─ B (0.4)
(10 points)

So the code assignment is: A = 0, C = 10, B = 11. This means that B needs two bits, so the message BBB takes 2 + 2 + 2 = 6 bits.
(5 points)

b) Using arithmetic coding. We first list the values with their probabilities and the resulting ranges:

Value | Probability | Range
A     | 0.5         | [0, 0.5)
B     | 0.4         | [0.5, 0.9)
C     | 0.1         | [0.9, 1)

Then, applying the algorithm for BBB (each step maps the current [low, high) interval onto the symbol's range), we get:

Symbol | low  | high  | range
(init) | 0    | 1     | 1
B      | 0.5  | 0.9   | 0.4
B      | 0.7  | 0.86  | 0.16
B      | 0.78 | 0.844 | 0.064
(6 points + 12 points)

Then we derive the codeword from low = 0.78 and high = 0.844. We apply the algorithm that finds a binary fraction between low and high: at each iteration k we tentatively append a 1 bit; if the resulting value is not below high we change that bit to 0, and we stop as soon as the value falls inside [low, high):

k | trial code | value  | final code
1 | 0.1        | 0.5    | 0.1
2 | 0.11       | 0.75   | 0.11
3 | 0.111      | 0.875  | 0.110
4 | 0.1101     | 0.8125 | 0.1101
(7 points)

After four iterations we arrive at the codeword 1101, since (0.1101)2 = 0.8125 lies in [0.78, 0.844). This means that we need 4 bits to encode this message using arithmetic coding.
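The two steps of part (b), interval narrowing and bit emission, can be sketched in Python as follows (variable names are mine, not from the original algorithm statement):

```python
# Symbol ranges from the table: A [0, 0.5), B [0.5, 0.9), C [0.9, 1).
ranges = {"A": (0.0, 0.5), "B": (0.5, 0.9), "C": (0.9, 1.0)}

# Step 1: narrow [low, high) once per symbol of the message.
low, high = 0.0, 1.0
for sym in "BBB":
    span = high - low
    low, high = low + span * ranges[sym][0], low + span * ranges[sym][1]
print(round(low, 6), round(high, 6))  # 0.78 0.844

# Step 2: emit bits of a binary fraction 0.b1b2... until it lands in
# [low, high). A tentative 1 bit is kept only if it stays below high.
bits, value, k = "", 0.0, 1
while value < low:
    candidate = value + 2.0 ** -k
    if candidate < high:
        value, bits = candidate, bits + "1"  # keep the 1 bit
    else:
        bits += "0"                          # a 1 would overshoot high
    k += 1
print(bits, value)  # 1101 0.8125
```

So the 3-symbol message BBB costs 4 bits here, against the 6 bits of the Huffman code in part (a).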