EECE 577: Information and Coding Theory
Midterm Exam #1 (October 2012)
Time allowed: 4.5 hours
Name:
(Return this test booklet with your answer sheets.)

(True/False answer grid for Problem 1, items (a)-(t).)

Problem 1: / 40
Problem 2: / 40
Problem 3: / 25
Problem 4: / 10
Problem 5: / 20
Problem 6: / 20
Total: / 155

Problem 1. (40 points: correct answer = +2, no answer = 0, incorrect answer = -1.) Say True or False to each of the following statements. You do not need to justify your answers.

(a) A uniquely decodable (UD) code is a prefix-free code.

(b) A UD code can be represented by a tree, where codewords cannot be assigned to interior nodes.

(c) In the Huffman procedure for $D$-ary ($D > 2$) zero-error data compression, all the siblings of the node associated with the smallest probability mass are codewords.

(d) Given a DMS, a binary Huffman code has exactly twice the expected length of a quaternary (4-ary) Huffman code.

(e) If the $X_i$ are i.i.d. discrete random variables with pmf $p_X(x)$ and sample space $\mathcal{X}$, then $\frac{1}{n}\sum_{i=1}^{n} \log_D(1/p_X(X_i))$ converges to $H_D(X)$ in probability.

(f) Given a discrete source with $|\mathcal{X}|$ outcomes and a $D$-ary code with $|\mathcal{X}|$ codewords, the minimum expected codeword length is achieved by assigning longer codewords to less probable outcomes.

(g) Fixed-length source coding that relies on the Asymptotic Equipartition Property (AEP) is in general classified as lossless source coding.

(h) If there exists a $D$-ary UD code with codeword-length profile $[l_1, l_2, \ldots, l_{|\mathcal{X}|}]$, then there always exists a $D$-ary prefix-free (PF) code with exactly the same codeword-length profile as the UD code, even without rearranging the codeword lengths.

(i) Huffman codes always make the equality hold in Kraft's inequality.

(j) Rate-distortion theory is about source coding.

(k) Given a discrete source with $|\mathcal{X}|$ outcomes of non-zero probability, a Huffman procedure that considers up to level $K$ of a $D$-ary tree can assign at least $D + (K-1)(D-1)$ distinct codewords to the outcomes.

(l) If our interest is in finding an optimal solution, we need only search the subset of the search space that is guaranteed to contain an optimal solution.

(m) Almost sure convergence of a sequence of random variables always implies convergence in distribution.

(n) The weak law of large numbers is a special case of convergence of a sequence of random variables with probability one.

(o) Convergence of a sequence of random variables is essentially convergence of a sequence of functions.

(p) Huffman codes are not only optimal but also asymptotically achieve the entropy lower bound for any distribution of $X$.

(q) To achieve the entropy lower bound using variable-length PF codes constructed through Huffman procedures, we always need to encode an asymptotically large number of data symbols together.

(r) The central limit theorem is a special case of convergence of a sequence of random variables in probability.

(s) Given a discrete random variable with sample space $\mathcal{X}$, if $x \in A_\epsilon^{(n)}$, where $x_i \in \mathcal{X}$ and $A_\epsilon^{(n)}$ is the set of $\epsilon$-typical sequences of length $n$, then $|A_\epsilon^{(n)}| \ge (1-\epsilon) D^{n(H_D(X)-\epsilon)}$ for any integers $D \,(\ge 2)$ and $n$.

(t) According to the converse of the source coding theorem, the probability of message-symbol error converges to one if the code rate is greater than the entropy.
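Aside: statement (e) above is the AEP for i.i.d. sources. The following is a minimal numerical sketch in Python, assuming a toy binary source with $p_X(0) = 0.8$ and $p_X(1) = 0.2$ (an illustrative pmf, not taken from the exam); the sample average of $\log_2(1/p_X(X_i))$ settles near $H_2(X)$ as $n$ grows.

    import math, random

    # Assumed toy binary DMS: p_X(0) = 0.8, p_X(1) = 0.2 (illustrative only).
    p = {0: 0.8, 1: 0.2}
    H = sum(q * math.log2(1 / q) for q in p.values())  # entropy H_2(X) in bits

    random.seed(0)
    for n in (10, 100, 10000):
        xs = random.choices(list(p), weights=list(p.values()), k=n)
        # (1/n) * sum_i log2(1 / p_X(X_i)), cf. statement (e)
        avg = sum(math.log2(1 / p[x]) for x in xs) / n
        print(f"n = {n:5d}: sample average = {avg:.4f}, H_2(X) = {H:.4f}")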
Problem 2. (40 points) Answer the following questions.

(a) (5 points) A $D$-ary source code $C$ for a source random variable $X$ is defined as a mapping from $\mathcal{X}$ to $\mathcal{D}^*$, i.e., $C : \mathcal{X} \to \mathcal{D}^*$, where $\mathcal{X}$ is the range space of $X$ and $\mathcal{D}^*$ is the set of all finite-length sequences of symbols taken from a $D$-ary code alphabet $\mathcal{D}$. What is the cardinality of $\mathcal{D}^*$?

(b) (5 points) Draw a Venn diagram that shows the relationship among the following sets: (i) the set of variable-length source codes, (ii) the set of prefix-free codes, (iii) the set of uniquely decodable codes, and (iv) the set of nonsingular codes.

(c) (5 points) Let $X$ be a discrete random variable with pmf $p_X(x)$ and sample space $\mathcal{X}$. Show that the entropy of $X$ is less than or equal to $\log |\mathcal{X}|$ for all $p_X(x)$ with $|\mathcal{X}| < \infty$.

(d) (5 points) Sketch the binary entropy function $h_b(p)$, and find its maximum and minimum points.

(e) (5 points) Mathematical induction is a method of proof typically used to establish that a given statement is true for all natural numbers. The proof consists of three steps. Write down the name of each step and explain each step briefly.

(f) (5 points) What is the definition of a discrete memoryless source (DMS)?

(g) (5 points) Formulate the variable-length and the fixed-length source coding problems, and point out the major differences between them.

(h) (5 points) Given a source random variable $X$ that generates a data sequence $Z = (X_1, X_2, \cdots, X_n)$, we can construct a $D$-ary Huffman code for $Z$. Prove that the expected length per source symbol of the Huffman code converges to the entropy $H_D(X)$ of the source distribution as $n$ goes to infinity.

Problem 3. (25 points: 1 point each) Fill in the blanks below.

• An ( (a) ) of a code $\mathcal{X}$ is a mapping from the set of length-$N$ strings of letters in $\mathcal{X}$ to the set $\mathcal{D}'$ of finite-length strings.

• The optimization problem $\min_{x \in \Omega} f(x)$ is equivalent to $\min_i \min_{x \in \Omega_i} f(x)$ with the sufficient condition ( (b) ).

• Theorem (the ( (c) ) inequality): Let $C$ be a $D$-ary source code, and let $l_1, l_2, \cdots, l_{|\mathcal{X}|}$ be the lengths of the codewords. If $C$ is uniquely decodable, then $\sum_{k=1}^{|\mathcal{X}|} D^{-l_k} \le 1$.

(Proof) Consider the identity
$$\left( \sum_{k=1}^{|\mathcal{X}|} D^{-l_k} \right)^{N} = \sum_{k_1=1}^{|\mathcal{X}|} \sum_{k_2=1}^{|\mathcal{X}|} \cdots \sum_{k_N=1}^{|\mathcal{X}|} (\text{(d)}).$$
By collecting terms on the right side, we write
$$\left( \sum_{k=1}^{|\mathcal{X}|} D^{-l_k} \right)^{N} = \sum_{i=1}^{N l_{\max}} A_i^{(N)} D^{-i},$$
where $l_{\max} = \max_{1 \le k \le |\mathcal{X}|} l_k$ and $A_i^{(N)}$ is the coefficient of $D^{-i}$ in $\bigl( \sum_{k=1}^{|\mathcal{X}|} D^{-l_k} \bigr)^{N}$.

If $C$ is uniquely decodable, then ( (e) ) holds for all $N$ and all $i$. By using ( (e) ), we have
$$\left( \sum_{k=1}^{|\mathcal{X}|} D^{-l_k} \right)^{N} = \sum_{i=1}^{N l_{\max}} A_i^{(N)} D^{-i} \le (\text{(f)}) = (\text{(g)}),$$
which leads to
$$\sum_{k=1}^{|\mathcal{X}|} D^{-l_k} \le (\text{(g)})^{1/N}, \quad \forall N.$$
Now,
$$\lim_{N \to \infty} (\text{(g)})^{1/N} = \lim_{N \to \infty} \exp\!\left( \frac{1}{N} \ln (\text{(g)}) \right) = (\text{(h)}).$$
Therefore, $\sum_{k=1}^{|\mathcal{X}|} D^{-l_k} \le 1$.

• Theorem (the entropy lower bound on $L(C)$ for a $D$-ary UD code): Let $L(C)$ be the expectation of the codeword length of $C$, and let $H_D(X)$ be the entropy of $X$ defined as $E\bigl[\log_D \frac{1}{p(X)}\bigr]$. Then $L(C) \ge H_D(X)$.

(Proof)
$$L(C) - H_D(X) = (\text{(i)}) - (\text{(j)}) = \sum_{i=1}^{|\mathcal{X}|} p_i \log_D (\text{(k)}) = \sum_{i=1}^{|\mathcal{X}|} \frac{p_i \ln (\text{(k)})}{\ln D} \ge \sum_{i=1}^{|\mathcal{X}|} \frac{p_i \ln (\text{(l)})}{\ln D} \quad \text{by ( (m) )}$$
$$= \frac{1}{\ln D} \left( \sum_{i=1}^{|\mathcal{X}|} (\text{(n)}) - \sum_{i=1}^{|\mathcal{X}|} (\text{(o)}) \right) \ge (\text{(p)}) \quad \text{by ( (q) )}.$$

• Theorem [Yeung, p. 52]: The expected length of a Huffman code, denoted by $L_{\mathrm{Huff}}$, satisfies $L_{\mathrm{Huff}} < (\text{(w)}) + (\text{(x)})$.

(Proof) Consider constructing a code with codeword lengths $\{l_1, l_2, \cdots, l_{|\mathcal{X}|}\}$, where $l_i = (\text{(r)})$. Then
$$-\log_D p_i \le l_i < -\log_D p_i + 1.$$
Therefore, $D^{-l_i} \le (\text{(s)})$, which implies
$$\sum_{i=1}^{|\mathcal{X}|} D^{-l_i} \le \sum_{i=1}^{|\mathcal{X}|} (\text{(s)}) = 1.$$
It satisfies the ( (c) ) inequality, which implies that there exists a ( (t) ) having codeword-length profile $\{l_1, l_2, \cdots, l_{|\mathcal{X}|}\}$. Let us compute the expected length of this PF code:
$$L_{\mathrm{PF}} = (\text{(u)}) < \sum_{i=1}^{|\mathcal{X}|} p_i (\text{(v)}) = (\text{(w)}) + (\text{(x)}).$$
Therefore,
$$(\text{(y)}) \le L_{\mathrm{Huff}} \le L_{\mathrm{PF}} < (\text{(w)}) + (\text{(x)}).$$
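Aside: the last proof in Problem 3 constructs codeword lengths satisfying $-\log_D p_i \le l_i < -\log_D p_i + 1$; the ceiling $l_i = \lceil -\log_D p_i \rceil$ is one choice consistent with these bounds. The following is a minimal sketch in Python, assuming an example pmf $[0.5, 0.25, 0.15, 0.1]$ and $D = 2$ (illustrative values, not taken from the exam); it checks that such lengths satisfy Kraft's inequality and that the expected length lands between $H_D(X)$ and $H_D(X) + 1$.

    import math

    # Assumed example pmf and code alphabet size (illustrative only).
    p = [0.5, 0.25, 0.15, 0.1]
    D = 2

    # l_i = ceil(-log_D p_i): one choice satisfying -log_D p_i <= l_i < -log_D p_i + 1
    lengths = [math.ceil(-math.log(pi, D)) for pi in p]
    kraft = sum(D ** (-l) for l in lengths)              # Kraft sum; must be <= 1
    H = sum(pi * -math.log(pi, D) for pi in p)           # entropy H_D(X)
    L = sum(pi * l for pi, l in zip(p, lengths))         # expected codeword length

    print(f"lengths = {lengths}, Kraft sum = {kraft:.4f} (<= 1)")
    print(f"H_D(X) = {H:.4f} <= L = {L:.4f} < H_D(X) + 1 = {H + 1:.4f}")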
Problem 4. (10 points)

(a) (5 points) Derive the Markov and the Chebyshev inequalities.

(b) (5 points) Let $X$ be any random variable with mean $\mu$ and standard deviation $\sigma$. Find a nontrivial lower bound, in decimal representation, on $\Pr(|X - \mu| < k\sigma)$ for $k = 1$, $2$, and $3$.

Problem 5. (20 points)

(a) (5 points) Illustrate the source coding theorem.

(b) (5 points) Illustrate the converse part of the source coding theorem.

(c) (10 points: 1 point each) Fill in the blanks below.

Theorem: If we have a sequence of block codes with block length $n$ and coding rate less than $H(X) - \zeta$, where $\zeta > 0$ does not change with $n$, then $\Pr(X^{(n)} \ne \hat{X}^{(n)}) \to 1$ as ( (10) ).

(Proof) Consider any block code with block length $n$ and coding rate less than $H(X) - \zeta$, so that the total number of distinct codewords is at most $2^{n(H(X)-\zeta)}$. Among these distinct codewords, some are assigned to some of the typical sequences and the others are assigned to some of the atypical sequences. The rest of the sequences in $\mathcal{X}^n$ have arbitrary codewords assigned, and all of them result in decoding error.

Now, the probability of correct decision can be written as $P_c = \Pr( (1) )$. Then, by the total probability theorem, we have
$$\Pr( (1) ) = \sum_{x^{(n)} \in \mathcal{X}^n} \Pr\bigl( (1) \mid X^{(n)} = x^{(n)} \bigr) \Pr\bigl( X^{(n)} = x^{(n)} \bigr) = \underbrace{\sum_{x^{(n)} \in A_\epsilon^{(n)}} \cdots}_{(\star)} + \underbrace{\sum_{x^{(n)} \notin A_\epsilon^{(n)}} \cdots}_{(\star\star)}.$$

The first term $(\star)$ can be written as
$$(\star) \le \sum_{x^{(n)} \in A_\epsilon^{(n)}} 1_{( (1) )}\bigl( x^{(n)} \bigr) \cdot \Pr\bigl( X^{(n)} = x^{(n)} \bigr) = \sum_{(\text{(2)})} \Pr\bigl( X^{(n)} = x^{(n)} \bigr) \le (\text{(3)}) \cdot (\text{(4)}) = 2^{-n(\zeta - \epsilon)},$$
because ( (5) ) and ( (6) ).

The second part $(\star\star)$ can be rewritten as
$$(\star\star) \le \sum_{x^{(n)} \notin A_\epsilon^{(n)}} 1 \cdot \Pr\bigl( X^{(n)} = x^{(n)} \bigr) = \Pr( (7) ) < \epsilon$$
for sufficiently large $n$.

Therefore, $P_c < 2^{-n(\zeta - \epsilon)} + \epsilon$ for sufficiently large $n$, for any block code with rate less than $H(X) - \zeta$, with $\zeta > 0$ and for any $\epsilon > 0$. This probability is equal to $1 - \Pr(X^{(n)} \ne \hat{X}^{(n)})$. Thus, $\Pr(X^{(n)} \ne \hat{X}^{(n)}) > (\text{(8)})$ holds for any $\epsilon$; taking ( (9) ) (where ( (9) ) means ( (10) )), we obtain $\Pr(X^{(n)} \ne \hat{X}^{(n)}) \to 1$ as ( (10) ).

Problem 6. (20 points) Suppose that $X$ is a discrete random variable with pmf $p_X(x)$ and sample space $\mathcal{X}$, from which a set $\Omega^{(n)} \triangleq \{(x_1, x_2, \cdots, x_n) : x_i \in \mathcal{X}, \forall i \in \{1, \ldots, n\}\}$ of data sequences is generated. Answer the following questions.

(a) (5 points) We want to construct a binary fixed-length source code. How many bits are needed per source output to make no decoding error?

(b) (5 points) We want to construct a binary variable-length source code. How many bits on average are needed per source output to make no decoding error?

(c) (5 points) $\Omega^{(n)\prime}$ is the subset of $\Omega^{(n)}$ such that
$$\Omega^{(n)\prime} \triangleq \left\{ x \in \Omega^{(n)} : H(X) - \epsilon \le -\frac{1}{n} \log P_{X_1, X_2, \cdots, X_n}(x_1, x_2, \cdots, x_n) \le H(X) + \epsilon \right\}.$$
Find a nontrivial upper bound on the cardinality of $\Omega^{(n)\prime}$.

(d) (5 points) We want to construct a binary fixed-length source code. How many bits are needed per source output to make the decoding error less than $\epsilon$ for sufficiently large $n$?
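Aside: Problem 6(c) involves counting the sequences in $\Omega^{(n)\prime}$. The following is a minimal brute-force sketch in Python, assuming a toy binary source with $p_X(0) = 0.8$, $p_X(1) = 0.2$, $\epsilon = 0.1$, and $n = 12$ (all illustrative choices, not taken from the exam); it counts the sequences whose empirical value of $-\frac{1}{n}\log_2 P(x_1,\ldots,x_n)$ falls within $\epsilon$ of $H(X)$ and compares the count against $2^{n(H(X)+\epsilon)}$.

    import itertools, math

    # Assumed toy binary DMS and parameters (illustrative only).
    p = {0: 0.8, 1: 0.2}
    H = sum(q * math.log2(1 / q) for q in p.values())  # entropy H(X) in bits
    eps, n = 0.1, 12

    count = 0
    for x in itertools.product(p, repeat=n):
        # -(1/n) log2 P_{X_1,...,X_n}(x) for an i.i.d. source
        val = sum(-math.log2(p[xi]) for xi in x) / n
        if H - eps <= val <= H + eps:
            count += 1

    print(f"|Omega^(n)'| = {count} of {len(p) ** n} sequences")
    print(f"2^(n(H(X)+eps)) = {2 ** (n * (H + eps)):.1f}")  # upper bound on the count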