CM613 lossless sem

1. Suppose eight characters have a distribution A:(1), B:(1), C:(1), D:(2), E:(3), F:(5), G:(5),
H:(10). Draw a Huffman tree for this distribution. (Because the algorithm may group
subtrees with equal probability in a different order, your answer is not strictly unique.)
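A hand-drawn tree can be checked with a short script. The following is a minimal sketch, assuming the bracketed numbers are relative frequencies; the function name `huffman_code_lengths` and the tie-breaking counter are illustrative choices, not part of the question. It reports only code lengths, since the exact tree shape depends on how ties are broken, but the total encoded length is the same for every valid Huffman tree.

```python
import heapq
from itertools import count

def huffman_code_lengths(weights):
    """Return {symbol: code length} for a dict of symbol weights."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(w, next(tie), {s: 0}) for s, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every leaf inside moves one level down.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

weights = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 3, "F": 5, "G": 5, "H": 10}
lengths = huffman_code_lengths(weights)
total_bits = sum(weights[s] * lengths[s] for s in weights)

for s in sorted(weights):
    print(s, weights[s], lengths[s])
print("total bits:", total_bits)
```

If the depths in your own tree give a different total, the tree is not a Huffman tree for this distribution.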
2. Consider an alphabet with two symbols A, B, with probability P(A) = x and P(B) = 1 - x.
a. Plot the entropy as a function of x. You might want to use the approximations
log2(3) ≈ 1.6 and log2(7) ≈ 2.8.
b. Discuss why it must be the case that if the probabilities of the two symbols are
1/2 + e and 1/2 - e, with small nonzero e, the entropy is less than the maximum.
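To tabulate points for the plot in question 2, a minimal sketch along the following lines may help. It assumes the usual binary entropy H(x) = -x log2(x) - (1 - x) log2(1 - x) in bits, with H(0) = H(1) = 0 by convention; the function name `binary_entropy` and the sample points are illustrative choices.

```python
import math

def binary_entropy(x):
    """Entropy in bits of a two-symbol source with P(A) = x, P(B) = 1 - x."""
    if x in (0.0, 1.0):
        return 0.0  # convention: 0 * log2(0) is taken as 0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

# Sample points chosen so that the log2(3) and log2(7) hints apply by hand.
for x in (0.0, 0.125, 0.25, 0.5, 0.75, 0.875, 1.0):
    print(f"x = {x:.3f}  H = {binary_entropy(x):.3f} bits")
```

The table is symmetric about x = 1/2, which is a useful sanity check on a hand-drawn curve.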
3. Arithmetic Coding and Huffman Coding are two popular lossless compression methods.
a. What are the advantages and disadvantages of Arithmetic Coding as compared
to Huffman Coding?
b. Suppose the alphabet is [A, B, C], and the known probability distribution is P(A) =
0.5, P(B) = 0.4, P(C) = 0.1. For simplicity, let’s also assume that both encoder and
decoder know that the length of the messages is always 3, so there is no need
for a terminator.
i. How many bits are needed to encode the message BBB by Huffman coding?
ii. How many bits are needed to encode the message BBB by arithmetic coding?
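For the arithmetic-coding part of question 3b, the final interval can be computed exactly with rational arithmetic. The sketch below rests on two assumptions that the question does not fix: the symbols are laid out on [0, 1) in the order A, B, C, and the bit count is taken as the length of the shortest binary fraction that falls inside the final interval (a common textbook convention; practical coders handle termination differently). The function names are illustrative.

```python
import math
from fractions import Fraction

def message_interval(message, probs):
    """Return the [low, high) interval that arithmetic coding assigns to message."""
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        cum = Fraction(0)  # cumulative probability of the symbols before sym
        for s, p in probs.items():
            if s == sym:
                break
            cum += p
        else:
            raise ValueError(f"unknown symbol: {sym!r}")
        low += width * cum
        width *= probs[sym]
    return low, low + width

def shortest_binary_fraction(low, high):
    """Return (k, n) with the smallest k >= 1 such that low <= n / 2**k < high."""
    k = 1
    while True:
        n = math.ceil(low * 2**k)
        if Fraction(n, 2**k) < high:
            return k, n
        k += 1

# Assumed symbol order on the unit interval: A, then B, then C.
probs = {"A": Fraction(1, 2), "B": Fraction(2, 5), "C": Fraction(1, 10)}
low, high = message_interval("BBB", probs)
k, n = shortest_binary_fraction(low, high)
print("interval:", float(low), float(high))
print("codeword:", format(n, f"0{k}b"), "bits:", k)
```

Comparing the printed bit count with the Huffman codeword length for BBB gives a concrete data point for part a.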
4. Consider the dictionary-based LZW compression algorithm. Suppose the alphabet is the
set of symbols {0,1}. Show the dictionary (symbol sets plus associated codes) and output
for LZW compression of the input 0 1 1 0 0 1 1
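
A hand trace for question 4 can be checked against a small encoder. This is a minimal sketch of the standard LZW encoding loop, assuming the initial dictionary assigns code 0 to the symbol 0 and code 1 to the symbol 1, with new entries numbered from 2; a course that starts codes at 1 would shift every code by one. The function name `lzw_encode` is illustrative.

```python
def lzw_encode(symbols, alphabet):
    """Return (output codes, final dictionary) for LZW compression of symbols."""
    # Assumed initial codes: position of each symbol in the alphabet string.
    dictionary = {sym: code for code, sym in enumerate(alphabet)}
    next_code = len(dictionary)
    output = []
    current = ""
    for sym in symbols:
        candidate = current + sym
        if candidate in dictionary:
            current = candidate  # keep extending the longest known match
        else:
            output.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new string
            next_code += 1
            current = sym
    if current:
        output.append(dictionary[current])  # flush the final match
    return output, dictionary

codes, dictionary = lzw_encode("0110011", "01")
print("output:", codes)
for string, code in sorted(dictionary.items(), key=lambda item: item[1]):
    print(code, string)
```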