Chapter 6: Entropy and Shannon's First Theorem
Information

A quantitative measure of the amount of information an event represents:
I(p) = the amount of information in the occurrence of an event of probability p
(a single symbol from the source).

Axioms:
A. I(p) ≥ 0 for any event of probability p
B. I(p1 ∙ p2) = I(p1) + I(p2) when p1 and p2 are independent events
   (the Cauchy functional equation)
C. I(p) is a continuous function of p

Existence: I(p) = log(1/p) satisfies the axioms.

Units of information:
• in base 2 = a bit
• in base e = a nat
• in base 10 = a Hartley
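A minimal numeric sketch (Python; the helper name `information` and the probabilities are mine, for illustration) of I(p) = log(1/p) in the three units above, plus a check of axiom B for two independent events:

```python
import math

def information(p, base=2):
    """I(p) = log_base(1/p): the information in an event of probability p."""
    return math.log(1 / p, base)

p = 0.25
print(information(p, 2))           # 2.0    bits
print(information(p, math.e))      # ~1.386 nats
print(information(p, 10))          # ~0.602 Hartleys

# Axiom B: for independent events the information adds up.
p1, p2 = 0.5, 0.25
assert math.isclose(information(p1 * p2), information(p1) + information(p2))
```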
Uniqueness:
Suppose I′(p) satisfies the axioms. Since I′(p) ≥ 0, take any 0 < p0 < 1 and
let the base be k = (1/p0)^(1/I′(p0)), so that k^I′(p0) = 1/p0 and hence
logk(1/p0) = I′(p0). Now, any z ∈ (0,1) can be written as p0^r with r ∈ R+
(r = log_p0 z). The Cauchy functional equation implies that I′(p0^n) = n·I′(p0)
and, for m ∈ Z+, I′(p0^(1/m)) = (1/m)·I′(p0), which gives I′(p0^(n/m)) =
(n/m)·I′(p0); hence, by continuity, I′(p0^r) = r·I′(p0).
Hence I′(z) = r·logk(1/p0) = logk(1/p0^r) = logk(1/z).
Note: In this proof, we introduce an arbitrary p0, show how any z relates to
it, and then eliminate the dependency on that particular p0.
Entropy
The average amount of information received on a per-symbol basis from a
source S = {s1, …, sq} of symbols, where si has probability pi. It measures
the information rate.
In radix r, when all the probabilities are independent:

    Hr(S) = Σ_{i=1}^{q} pi·logr(1/pi)          (weighted arithmetic mean of the information)
          = Σ_{i=1}^{q} logr (1/pi)^pi
          = logr ∏_{i=1}^{q} (1/pi)^pi         (information of the weighted geometric mean)
• Entropy is the amount of information in the probability distribution.
Alternative approach: consider a long message of N symbols from S = {s1, …, sq}
with probabilities p1, …, pq. You expect si to appear N·pi times, and the
probability of this typical message is

    P = ∏_{i=1}^{q} pi^(N·pi)

whose information is

    log(1/P) = N·Σ_{i=1}^{q} pi·log(1/pi) = N·H(S)
Consider f(p) = p·ln(1/p)  (the analysis works for any base, not just e):

    f′(p) = (−p·ln p)′ = −p·(1/p) − ln p = −1 + ln(1/p)
    f″(p) = (ln(1/p))′ = p·(−p^−2) = −1/p < 0 for p ∈ (0,1)  ⇒  f is concave down
    f′(0+) = ∞,  f′(1/e) = 0,  f′(1) = −1
    f(1/e) = 1/e,  f(1) = 0

    lim_{p→0+} f(p) = lim_{p→0+} ln(1/p) / (1/p)
                    = lim_{p→0+} (−ln p)′ / (p^−1)′
                    = lim_{p→0+} (−p^−1) / (−p^−2)
                    = lim_{p→0+} p = 0

[Figure: graph of f(p) = p·ln(1/p) on (0,1), rising from 0 to its maximum
value 1/e at p = 1/e and falling back to 0 at p = 1.]
Basic information about logarithm function
Tangent line to y = ln x at x = 1:
    (y − ln 1) = (ln)′|_{x=1}·(x − 1)  ⇒  y = x − 1
    (ln x)″ = (1/x)′ = −1/x² < 0 for all x > 0  ⇒  ln x is concave down.

[Figure: graphs of y = x − 1 and y = ln x; the line lies above the curve and
touches it at x = 1.]

Conclusion: ln x ≤ x − 1, with equality only at x = 1.
Fundamental Gibbs inequality
Let Σ_{i=1}^{q} xi = 1 and Σ_{i=1}^{q} yi = 1 be two probability distributions,
and consider

    Σ_{i=1}^{q} xi·log(yi/xi) ≤ 0,   with equality only when xi = yi.

Proof:  Σ_{i=1}^{q} xi·log(yi/xi) ≤ Σ_{i=1}^{q} xi·(yi/xi − 1)
        = Σ_{i=1}^{q} (yi − xi) = Σ_{i=1}^{q} yi − Σ_{i=1}^{q} xi = 1 − 1 = 0
• Minimum Entropy occurs when one pi = 1 and all others are 0.
• Maximum Entropy occurs when? Consider Gibbs with the uniform distribution yi = 1/q:

    H(S) − log q = Σ_{i=1}^{q} pi·log(1/pi) − log q·Σ_{i=1}^{q} pi
                 = Σ_{i=1}^{q} pi·log(1/(q·pi)) ≤ 0

• Hence H(S) ≤ log q, and equality occurs only when pi = 1/q.
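A quick numerical check (Python; the helper names and the random test distributions are mine) of the Gibbs inequality and of the bound H(S) ≤ log q:

```python
import math
import random

def gibbs_gap(xs, ys):
    """sum_i x_i * log2(y_i / x_i); the Gibbs inequality says this is <= 0."""
    return sum(x * math.log2(y / x) for x, y in zip(xs, ys) if x > 0)

def random_distribution(q):
    w = [random.random() for _ in range(q)]
    total = sum(w)
    return [v / total for v in w]

random.seed(1)
q = 5
xs = random_distribution(q)
ys = random_distribution(q)

assert gibbs_gap(xs, ys) <= 1e-12          # Gibbs inequality
assert abs(gibbs_gap(xs, xs)) < 1e-12      # equality when x_i = y_i

# Maximum entropy: taking y_i = 1/q gives H(S) - log2 q <= 0.
H = sum(p * math.log2(1 / p) for p in xs)
assert abs(gibbs_gap(xs, [1 / q] * q) - (H - math.log2(q))) < 1e-9
assert H <= math.log2(q) + 1e-12
```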
1

Entropy Examples
S = {s1}               S = {s1, s2}            S = {s1, …, sr}
p1 = 1                 p1 = p2 = ½             p1 = … = pr = 1/r
H(S) = 0               H2(S) = 1               Hr(S) = 1,
(no information)       (1 bit per symbol)      but H2(S) = log2 r.
• Run-length coding (for instance, in binary predictive coding):
p = 1 − q is the probability of a 0.  H2(S) = p·log2(1/p) + q·log2(1/q).
As q → 0 the term q·log2(1/q) dominates (compare slopes). Cf. the average
run length = 1/q and the average # of bits needed = log2(1/q):
so q·log2(1/q) = the avg. amount of information per bit of the original code.
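A small table of values (Python sketch; the q values are illustrative) showing how q·log2(1/q) dominates H2(S) as q → 0 and equals log2(1/q) bits per run of average length 1/q:

```python
import math

# As q -> 0 the q*log2(1/q) term dominates H2(S) = p*log2(1/p) + q*log2(1/q),
# and it equals (bits per run) / (average run length) = log2(1/q) / (1/q).
for q in (0.1, 0.01, 0.001, 0.0001):
    p = 1 - q
    term_p = p * math.log2(1 / p)          # vanishes like q/ln 2
    term_q = q * math.log2(1 / q)          # the dominant term
    info_per_bit = math.log2(1 / q) / (1 / q)
    print(f"q={q}: p*log2(1/p)={term_p:.6f}  q*log2(1/q)={term_q:.6f}  "
          f"log2(1/q)/(1/q)={info_per_bit:.6f}")
```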
Entropy as a Lower Bound for Average Code Length
Given an instantaneous code with lengths li in radix r, let

    K = Σ_{i=1}^{q} 1/r^li ≤ 1 ;   Qi = r^(−li)/K ;   Σ_{i=1}^{q} Qi = 1

So by Gibbs,  Σ_{i=1}^{q} pi·logr(Qi/pi) ≤ 0.  Applying
log(Qi/pi) = log(1/pi) − log(1/Qi):

    Hr(S) = Σ_{i=1}^{q} pi·logr(1/pi) ≤ Σ_{i=1}^{q} pi·logr(1/Qi)
          = Σ_{i=1}^{q} pi·(logr K + li·logr r)
          = logr K + Σ_{i=1}^{q} pi·li .

Since K ≤ 1, logr K ≤ 0, and hence Hr(S) ≤ L.
By the McMillan inequality, this holds for all uniquely decodable codes.
Equality occurs when K = 1 (the decoding tree is complete) and pi = r^(−li).
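A hedged sketch (Python; the code lengths and probabilities are illustrative, not from the slides) that computes K, the auxiliary distribution Qi, and checks Hr(S) ≤ L for a binary prefix code:

```python
import math

def kraft_sum(lengths, r=2):
    """K = sum_i r^(-l_i); K <= 1 for any instantaneous (prefix) code."""
    return sum(r ** (-l) for l in lengths)

def entropy(probs, r=2):
    return sum(p * math.log(1 / p, r) for p in probs if p > 0)

# A sample binary prefix code {0, 10, 110, 111} for a 4-symbol source.
probs = [0.4, 0.3, 0.2, 0.1]
lengths = [1, 2, 3, 3]

K = kraft_sum(lengths)
Q = [2 ** (-l) / K for l in lengths]       # the auxiliary distribution Q_i
assert K <= 1 and math.isclose(sum(Q), 1.0)

H = entropy(probs)
L = sum(p * l for p, l in zip(probs, lengths))
print(H, L)                                # H_2(S) ~ 1.846 <= L = 1.9
assert H <= L
```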
Shannon-Fano Coding
Simplest variable length method. Less efficient than Huffman, but
allows one to code symbol si with length li directly from
probability pi.
li = logr(1/pi)


pi
1
1
1
r
li
 li
  logr   li   logr   1   r   pi  r  .
pi 
pi 
pi
pi
r


K

q
Summing this inequality over i:
p
i 1
i
q
1 r
i 1
 li
q

i 1
pi 1

r r
The Kraft inequality is satisfied; therefore there is an instantaneous code
with these lengths.
Also,

    Hr(S) = Σ_{i=1}^{q} pi·logr(1/pi) ≤ Σ_{i=1}^{q} pi·li = L < Hr(S) + 1

(by summing the bracketing inequality for li, multiplied by pi).

Example:  p's: ¼, ¼, ⅛, ⅛, ⅛, ⅛    l's: 2, 2, 3, 3, 3, 3    K = 1
          H2(S) = 2.5    L = 5/2

[Figure: the complete binary decoding tree for this code, with branches
labeled 0 and 1.]
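A short sketch (Python; `sf_length` is my own helper, which finds the ceiling by search rather than a floating-point log) reproducing the example above: the Shannon-Fano lengths, the Kraft sum, and H2(S) ≤ L < H2(S) + 1:

```python
import math

def sf_length(p, r=2):
    """Shannon-Fano length l = ceil(log_r(1/p)): the smallest l with r**(-l) <= p."""
    l = 0
    while r ** (-l) > p:
        l += 1
    return l

# The slide's example: probabilities 1/4, 1/4, 1/8, 1/8, 1/8, 1/8
probs = [1/4, 1/4, 1/8, 1/8, 1/8, 1/8]
lengths = [sf_length(p) for p in probs]                 # [2, 2, 3, 3, 3, 3]
K = sum(2 ** (-l) for l in lengths)                     # Kraft sum = 1 (complete tree)
H = sum(p * math.log2(1 / p) for p in probs)            # H_2(S) = 2.5
L = sum(p * l for p, l in zip(probs, lengths))          # average length = 5/2

print(lengths, K, H, L)
assert K <= 1
assert H - 1e-9 <= L < H + 1
```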
The Entropy of Code Extensions
Recall: The nth extension of a source S = {s1, …, sq} with probabilities
p1, …, pq is the set of symbols

    T = S^n = { si1 ∙∙∙ sin : sij ∈ S, 1 ≤ j ≤ n }

where concatenation of symbols corresponds to multiplication of probabilities:
ti = si1 ∙∙∙ sin has probability pi1 ∙∙∙ pin = Qi, assuming independent
probabilities. Let i = (i1−1, …, in−1)q + 1, an n-digit number base q.
The entropy is:

    H(S^n) = H(T) = Σ_{i=1}^{q^n} Qi·log(1/Qi)
           = Σ_{i=1}^{q^n} Qi·log(1/(pi1 ∙∙∙ pin))
           = Σ_{i=1}^{q^n} Qi·[ log(1/pi1) + ∙∙∙ + log(1/pin) ]
           = Σ_{i=1}^{q^n} Qi·log(1/pi1) + ∙∙∙ + Σ_{i=1}^{q^n} Qi·log(1/pin) .
Consider the kth term:

    Σ_{i=1}^{q^n} Qi·log(1/pik)
      = Σ_{i1=1}^{q} ∙∙∙ Σ_{in=1}^{q} pi1 ∙∙∙ pin·log(1/pik)
      = [ Σ_{ik=1}^{q} pik·log(1/pik) ]·[ Σ_{i1=1}^{q} ∙∙∙ Σ̂_{ik} ∙∙∙ Σ_{in=1}^{q} pi1 ∙∙∙ p̂ik ∙∙∙ pin ]
      = H(S)·1 = H(S) ,

since pi1 ∙∙∙ p̂ik ∙∙∙ pin is just a probability in the (n−1)st extension, and
adding them all up gives 1 (the hat ˆ marks the omitted factor).

    ⇒  H(S^n) = n·H(S)

Hence the average S-F code length Ln for T satisfies:

    H(T) ≤ Ln < H(T) + 1  ⇒  n·H(S) ≤ Ln < n·H(S) + 1
    ⇒  H(S) ≤ Ln/n < H(S) + 1/n     [now let n go to infinity]
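A numerical check (Python; the source probabilities and helper names are illustrative) that H(S^n) = n·H(S) and that the per-symbol Shannon-Fano length of the nth extension is squeezed into [H(S), H(S) + 1/n):

```python
import math
from itertools import product

def entropy(probs, r=2):
    """H_r(S) = sum_i p_i * log_r(1/p_i)."""
    return sum(p * math.log(1 / p, r) for p in probs if p > 0)

def nth_extension(probs, n):
    """Probabilities Q_i of the n-th extension S^n (independent symbols)."""
    return [math.prod(tup) for tup in product(probs, repeat=n)]

probs = [0.7, 0.2, 0.1]
H = entropy(probs)

for n in (1, 2, 3):
    Q = nth_extension(probs, n)
    assert math.isclose(entropy(Q), n * H)     # H(S^n) = n * H(S)

# Shannon-Fano coding of S^n: the per-source-symbol length lands in [H, H + 1/n).
for n in (1, 2, 4, 8):
    Q = nth_extension(probs, n)
    Ln = sum(q * math.ceil(math.log2(1 / q)) for q in Q)
    print(n, Ln / n)                           # approaches H(S) ~ 1.1568
    assert H - 1e-9 <= Ln / n < H + 1 / n
```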
Extension Example
S = {s1, s2},  p1 = 2/3,  p2 = 1/3
H2(S) = (2/3)·log2(3/2) + (1/3)·log2(3/1) ≈ 0.9182958…
Huffman:       s1 = 0, s2 = 1       Avg. coded length = (2/3)·1 + (1/3)·1 = 1
Shannon-Fano:  l1 = 1, l2 = 2       Avg. length = (2/3)·1 + (1/3)·2 = 4/3
2nd extension: p11 = 4/9, p12 = p21 = 2/9, p22 = 1/9.  S-F lengths:
l11 = ⌈log2(9/4)⌉ = 2,  l12 = l21 = ⌈log2(9/2)⌉ = 3,  l22 = ⌈log2(9/1)⌉ = 4
LSF(2) = avg. coded length = (4/9)·2 + (2/9)·3·2 + (1/9)·4 = 24/9 = 2.666…
S^n = (s1 + s2)^n; the probabilities are the corresponding terms in (p1 + p2)^n:

    Σ_{i=0}^{n} C(n,i)·(2/3)^i·(1/3)^(n−i) = 1 ,

so there are C(n,i) (the binomial coefficient) symbols with probability
(2/3)^i·(1/3)^(n−i) = 2^i/3^n. The corresponding S-F length is

    ⌈log2(3^n/2^i)⌉ = ⌈n·log2 3 − i⌉ = ⌈n·log2 3⌉ − i .
Extension cont.
    LSF(n) = Σ_{i=0}^{n} C(n,i)·(2^i/3^n)·(⌈n·log2 3⌉ − i)
           = (1/3^n)·[ ⌈n·log2 3⌉·Σ_{i=0}^{n} C(n,i)·2^i − Σ_{i=0}^{n} C(n,i)·i·2^i ]
           = (1/3^n)·[ ⌈n·log2 3⌉·3^n − 2n·3^(n−1) ]      (using (2+1)^n = 3^n and * below)
           = ⌈n·log2 3⌉ − 2n/3 .

Hence

    LSF(n)/n = ⌈n·log2 3⌉/n − 2/3  →  log2 3 − 2/3 = H2(S)   as n → ∞.

* Differentiating (2 + x)^n = Σ_{i=0}^{n} C(n,i)·2^i·x^(n−i) with respect to x gives
  n·(2 + x)^(n−1) = Σ_{i=0}^{n} C(n,i)·2^i·(n−i)·x^(n−i−1); setting x = 1,
  n·3^(n−1) = Σ_{i=0}^{n} C(n,i)·2^i·(n−i) = n·3^n − Σ_{i=0}^{n} C(n,i)·i·2^i ,
  so Σ_{i=0}^{n} C(n,i)·i·2^i = n·3^n − n·3^(n−1) = 2n·3^(n−1).
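A sketch (Python; exact arithmetic via `Fraction`, function names mine) evaluating LSF(n)/n for this source, confirming the closed form ⌈n·log2 3⌉ − 2n/3 and the limit H2(S):

```python
import math
from fractions import Fraction
from math import comb

H = (2/3) * math.log2(3/2) + (1/3) * math.log2(3)      # H2(S) ~ 0.9182958

def ceil_n_log2_3(n):
    """ceil(n * log2 3) computed exactly: the smallest c with 2^c >= 3^n."""
    c = 0
    while 2 ** c < 3 ** n:
        c += 1
    return c

def sf_avg_length(n):
    """Exact L_SF^(n) = sum_i C(n,i) * (2^i/3^n) * (ceil(n*log2 3) - i)."""
    c = ceil_n_log2_3(n)
    return sum(comb(n, i) * Fraction(2 ** i, 3 ** n) * (c - i) for i in range(n + 1))

for n in (1, 2, 4, 8, 16, 64):
    Ln = sf_avg_length(n)
    # Closed form from the derivation above: L_SF^(n) = ceil(n*log2 3) - 2n/3.
    assert Ln == ceil_n_log2_3(n) - Fraction(2 * n, 3)
    print(n, float(Ln / n))                            # tends to H2(S)

assert math.isclose(math.log2(3) - 2/3, H)             # the limit equals H2(S)
```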
Markov Process Entropy
p(si | si1 ∙∙∙ sim) = the conditional probability that si follows si1 ∙∙∙ sim.
For an mth-order process, think of letting the state be s = (si1, …, sim).
Hence

    I(si | s) = log(1/p(si | s)) ,   and so
    H(S | s) = Σ_{si ∈ S} p(si | s)·I(si | s) .

Now, let p(s) = the probability of being in state s. Then

    H(S) = Σ_{s ∈ S^m} p(s)·H(S | s)
         = Σ_{s ∈ S^m} Σ_{si ∈ S} p(s)·p(si | s)·I(si | s)
         = Σ_{s,si ∈ S^(m+1)} p(s, si)·I(si | s)
         = Σ_{s,si ∈ S^(m+1)} p(s, si)·log(1/p(si | s)) .
Example

[Figure: state diagram of a second-order binary Markov process. From the
previous-state pairs (0,0) and (1,1) the same bit repeats with probability 0.8
and flips with probability 0.2; from (0,1) and (1,0) the next bit is 0 or 1
with probability 0.5 each.]

Equilibrium probabilities:  p(0,0) = p(1,1) = 5/14,  p(0,1) = p(1,0) = 2/14.

    si1  si2  si   p(si | si1, si2)   p(si1, si2)   p(si1, si2, si)
     0    0   0          0.8             5/14            4/14
     0    0   1          0.2             5/14            1/14
     0    1   0          0.5             2/14            1/14
     0    1   1          0.5             2/14            1/14
     1    0   0          0.5             2/14            1/14
     1    0   1          0.5             2/14            1/14
     1    1   0          0.2             5/14            1/14
     1    1   1          0.8             5/14            4/14

    H2(S) = Σ_{{0,1}^3} p(si1, si2, si)·log2(1/p(si | si1, si2))
          = 2·(4/14)·log2(1/0.8) + 2·(1/14)·log2(1/0.2) + 4·(1/14)·log2(1/0.5)
          ≈ 0.801377
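A short verification (Python; the dictionary layout is mine) of this example: it checks that the stated equilibrium probabilities are stationary and reproduces H2(S) ≈ 0.801377:

```python
import math

# Second-order binary Markov source from the example above:
# from states (0,0) and (1,1) the previous bit repeats with probability 0.8;
# from (0,1) and (1,0) the next bit is 0 or 1 with probability 0.5 each.
cond = {
    (0, 0): {0: 0.8, 1: 0.2},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.2, 1: 0.8},
}
state_prob = {(0, 0): 5/14, (0, 1): 2/14, (1, 0): 2/14, (1, 1): 5/14}

# Sanity check: these are equilibrium probabilities of the state chain,
# where emitting bit c moves state (a, b) to (b, c).
for (b, c) in state_prob:
    inflow = sum(state_prob[(a, b)] * cond[(a, b)][c] for a in (0, 1))
    assert abs(inflow - state_prob[(b, c)]) < 1e-12

# H(S) = sum over (state, next bit) of p(state, next) * log2(1/p(next | state)).
H = sum(state_prob[s] * p * math.log2(1 / p)
        for s, dist in cond.items() for p in dist.values())
print(H)                       # ~0.801377 bits per symbol
assert abs(H - 0.801377) < 1e-5
```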
The Fibonacci numbers
Let f0 = 1, f1 = 2, f2 = 3, f3 = 5, f4 = 8, … be defined by fn+1 = fn + fn−1.
Then

    lim_{n→∞} fn+1/fn = (1 + √5)/2 = φ ,

the golden ratio, a root of the equation x² = x + 1. Use these as the weights
for a system of number representation with digits 0 and 1, without adjacent
1's (because (100)φ = (11)φ).
Base Fibonacci
Representation Theorem: every number from 0 to fn − 1 can be uniquely written
as an n-bit number with no adjacent ones.
Existence: Basis: n = 0, so 0 ≤ i ≤ 0, and 0 = ( )φ = ε, the empty string.
Induction: Let 0 ≤ i < fn+1. If i < fn, we are done by the induction
hypothesis (pad with a leading 0). Otherwise fn ≤ i < fn+1 = fn−1 + fn, so
0 ≤ i − fn < fn−1, and i − fn is uniquely representable as
i − fn = (bn−2 … b0)φ with bj ∈ {0, 1} and ¬(bj = bj+1 = 1). Hence
i = (1 0 bn−2 … b0)φ, which also has no adjacent ones.
Uniqueness: Let i be the smallest number ≥ 0 with two distinct
representations (no leading zeros): i = (bn−1 … b0)φ = (b′n−1 … b′0)φ.
By minimality of i, bn−1 ≠ b′n−1, so without loss of generality let bn−1 = 1
and b′n−1 = 0. This implies (b′n−2 … b′0)φ = i ≥ fn−1, which can't be true
for an (n−1)-bit representation.
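A small sketch (Python; it uses the greedy largest-weight-first construction, which matches the existence argument above) checking the representation theorem for small n:

```python
from itertools import product

def fibs(n):
    """f_0 = 1, f_1 = 2, ..., f_k = f_{k-1} + f_{k-2} (the slide's indexing)."""
    f = [1, 2]
    while len(f) < n:
        f.append(f[-1] + f[-2])
    return f[:n]

def to_fib(i, n):
    """Greedy n-bit base-Fibonacci representation (b_{n-1} ... b_0) of i."""
    weights = fibs(n)
    bits = [0] * n
    for k in range(n - 1, -1, -1):
        if weights[k] <= i:
            bits[n - 1 - k] = 1
            i -= weights[k]
    assert i == 0
    return tuple(bits)

def no_adjacent_ones(bits):
    return all(not (a == 1 and b == 1) for a, b in zip(bits, bits[1:]))

# Check the theorem for small n: i -> to_fib(i, n) is a bijection from
# {0, ..., f_n - 1} onto the n-bit strings with no adjacent ones.
for n in range(1, 12):
    f_n = fibs(n + 1)[n]
    reps = [to_fib(i, n) for i in range(f_n)]
    valid = [b for b in product((0, 1), repeat=n) if no_adjacent_ones(b)]
    assert len(valid) == f_n and set(reps) == set(valid)
```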
Base Fibonacci
The golden ratio φ = (1 + √5)/2 is a solution to x² − x − 1 = 0 and is equal
to the limit of the ratio of adjacent Fibonacci numbers.

[Figure: for comparison, a memoryless source with r equally likely symbols
0, 1, …, r−1, each of probability 1/r, has H2 = log2 r.]

1st-order Markov process: from state 0, emit 0 with probability 1/φ or emit 1
with probability 1/φ² (after which a 0 must follow);  1/φ + 1/φ² = 1.
Think of the source as emitting the variable-length symbols 0 and 10, with
probabilities 1/φ and 1/φ² respectively.

    Entropy = (1/φ)·log φ + ½·(1/φ²)·log φ² = log φ ,

which is maximal (taking the variable-length symbols into account).
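A quick check (Python; variable names are mine) that the first-order Markov source described above has per-bit entropy log2 φ ≈ 0.694, consistent with the conclusion above:

```python
import math

phi = (1 + math.sqrt(5)) / 2       # the golden ratio

# First-order Markov source for strings with no adjacent 1s:
# from state 0 emit 0 w.p. 1/phi or 1 w.p. 1/phi^2; from state 1 always emit 0.
p0_zero, p0_one = 1 / phi, 1 / phi ** 2
assert math.isclose(p0_zero + p0_one, 1.0)

# Equilibrium of the two states: pi1 = pi0 * p0_one and pi0 + pi1 = 1.
pi0 = 1 / (1 + p0_one)
pi1 = 1 - pi0

# Per-bit entropy from the Markov entropy formula (state 1 contributes nothing,
# since its next bit is forced to be 0).
H = pi0 * (p0_zero * math.log2(1 / p0_zero) + p0_one * math.log2(1 / p0_one))
print(H, math.log2(phi))           # both ~0.6942
assert math.isclose(H, math.log2(phi))
```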