Lecture 4

ENTROPY
Entropy measures the uncertainty in a random experiment.
Let X be a discrete random variable with range S_X = {1, 2, 3, ..., K}
and pmf p_k = P(X = k).
Let A ≡ {X = k}. The uncertainty of A is

I(X = k) = \ln \frac{1}{p_k}

Thus p_k → 1 ⇒ uncertainty = 0, and p_k → 0 ⇒ uncertainty → ∞.
Entropy of X ≡ expected uncertainty of outcomes
H_X = E[I_X] = -\sum_{k=1}^{K} p_k \ln p_k
• If log_2 is used, the units are bits; with ln, the units are nats.
• By convention, a term with P(X = x) = 0 contributes nothing to the sum: -0 log(0) ≡ 0 (just as a term with P(X = x) = 1 contributes -1 log(1) = 0).
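For illustration (a minimal Python sketch, not part of the lecture; the helper name entropy is my own), the definition can be computed directly, with the 0·log 0 convention handled by skipping zero-probability terms:

```python
import math

def entropy(pmf, base=2):
    """Entropy of a pmf (sequence of probabilities summing to 1).
    base=2 gives bits, base=math.e gives nats; terms with p = 0 are skipped,
    implementing the convention 0*log(0) = 0."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))           # 1.0 bit (fair coin)
print(entropy([0.5, 0.5], math.e))   # ~0.693 nats = ln 2
print(entropy([1.0, 0.0]))           # 0.0: a certain outcome carries no uncertainty
```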
For a binary random variable X ∈ {0, 1}, let p ≡ P(X = 1). Then

H_X = -(1 - p) \log(1 - p) - p \log p

H_X is maximum when p = 0.5 ↔ 0 and 1 are equally probable ↔ maximum uncertainty.
If p = 1 or p = 0, there is no uncertainty → H_X = 0.
[Figure: binary entropy function H(p) vs. p; image from Wikipedia, http://en.wikipedia.org/wiki/GNU_Free_Documentation_License]
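A short self-contained sketch (not part of the lecture) confirming the endpoints and the maximum of the binary entropy:

```python
import math

# Binary entropy H(p) = -(1-p)*log2(1-p) - p*log2(p), with 0*log2(0) taken as 0.
def h_binary(p):
    return -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, h_binary(p))
# H(0) = H(1) = 0 (no uncertainty); H(0.5) = 1 bit (the maximum).
```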
Example: let S_X = {00, 01, 11, 10}, all outcomes equally probable. Then

H_X = -\sum_{k=0}^{3} p_k \log_2 p_k = -\sum_{k=0}^{3} \frac{1}{4} \log_2 \frac{1}{4} = 2 \text{ bits}
If it is given that the first bit is 1, two equally probable outcomes remain:

H_{X \mid \text{1st bit} = 1} = -\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1 \text{ bit}

(one term for each of the two remaining outcomes, 01 and 11).
In general, H_X of 2^n equally probable outcomes = n bits (e.g., n-bit equiprobable numbers → n bits).
As each bit is specified, H_X decreases by 1 bit. When all n bits are specified, H_X = 0.
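A minimal Python sketch (not from the lecture) illustrating both statements:

```python
import math

def entropy_bits(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# n-bit equiprobable numbers: 2**n outcomes, each with probability 2**-n -> n bits.
n = 4
print(entropy_bits([2**-n] * 2**n))              # 4.0
# Specifying the first bit leaves 2**(n-1) equally likely outcomes -> one bit less.
print(entropy_bits([2**-(n - 1)] * 2**(n - 1)))  # 3.0
```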
Relative Entropy:
Let p = (p_1, p_2, ..., p_K) and q = (q_1, q_2, ..., q_K) be two pmf's, with X ~ p and Y ~ q (K outcomes for both X and Y).
H(p; q) ≡ relative entropy of q with respect to p

H(p; q) = \sum_{k=1}^{K} p_k \ln \frac{p_k}{q_k} = -\sum_{k=1}^{K} p_k \ln q_k + \sum_{k=1}^{K} p_k \ln p_k = -\sum_{k=1}^{K} p_k \ln q_k - H_{X \sim p}
H  p; q   0
H  p ; q   0  pk  qk  k  1,.. . , K
H(p; q) is often used as a distance measure between probability distributions and is called the Kullback–Leibler distance.
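A minimal Python sketch (not from the lecture; the function name kl_divergence is my own) computing this quantity in nats:

```python
import math

def kl_divergence(p, q):
    """Relative entropy H(p;q) = sum_k p_k ln(p_k / q_k), in nats.
    Assumes q_k > 0 wherever p_k > 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p = [0.5, 0.3, 0.2]
q = [1/3, 1/3, 1/3]
print(kl_divergence(p, q))  # > 0
print(kl_divergence(p, p))  # 0.0: zero iff the two pmfs coincide
```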
To prove these assertions, use the inequality \ln x \le x - 1 (equivalently, \ln(1 + x) \le x), with equality iff x = 1. Applying it with x = q_k / p_k:

H(p; q) = \sum_{k=1}^{K} p_k \ln \frac{p_k}{q_k} \ge \sum_{k=1}^{K} p_k \left( 1 - \frac{q_k}{p_k} \right) = \sum_{k=1}^{K} p_k - \sum_{k=1}^{K} q_k = 0

⇒ H(p; q) ≥ 0

To get H(p; q) = 0, equality must hold in every term, i.e. \frac{q_k}{p_k} = 1 for k = 1, ..., K, i.e. q ≡ p.
If q_k = 1/K for all k, then

H(p; q) = \sum_{k=1}^{K} p_k \ln \frac{p_k}{1/K} = \ln K - H_{X \sim p} \ge 0

⇒ H_X ≤ \ln K, and H_X = \ln K iff p_k = 1/K for all k.
This is called maximum entropy (ME) or the minimum relative
entropy (MRE) situation.
Thus:

0 \le H_X \le \ln K

where H_X = 0 ⟺ only one possible outcome, and H_X = \ln K ⟺ K equally probable outcomes.
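A quick numerical check (a Python sketch, not from the lecture) that random pmfs respect these bounds and that the uniform pmf attains the upper one:

```python
import math, random

def entropy_nats(pmf):
    return -sum(p * math.log(p) for p in pmf if p > 0)

# Check 0 <= H_X <= ln K for a few random pmfs on K outcomes.
K = 5
for _ in range(3):
    w = [random.random() for _ in range(K)]
    p = [x / sum(w) for x in w]
    print(0.0 <= entropy_nats(p) <= math.log(K))   # True

# The uniform pmf attains the upper bound ln K exactly.
print(entropy_nats([1 / K] * K), math.log(K))
```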
Differential Entropy:
For a continuous random variable every individual outcome has probability zero, so each outcome is maximally uncertain
⇒ entropy cannot be defined as for discrete random variables.
Instead, differential entropy is used:

H_X = -\int_{-\infty}^{\infty} f_X(x) \ln f_X(x) \, dx = E[-\ln f_X(x)]

In fact, the integral extends only over the region where f_X(x) > 0, since f_X(x) \ln f_X(x) = 0 where f_X(x) = 0.
e.g. If

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

then

E[-\ln f_X(x)] = \ln \sqrt{2\pi\sigma^2} + E\left[ \frac{(x-\mu)^2}{2\sigma^2} \right]
= \ln \sqrt{2\pi\sigma^2} + \frac{\sigma^2}{2\sigma^2}
= \ln \sqrt{2\pi\sigma^2} + \frac{1}{2}
= \ln \sqrt{2\pi\sigma^2} + \ln \sqrt{e}
= \ln \sqrt{2\pi e \sigma^2}

since E[(x-\mu)^2] = \sigma^2. Hence

H_{X \sim \text{Gaussian}} = \frac{1}{2} \ln (2\pi e \sigma^2)

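A sketch (not from the lecture) comparing this closed form with a Monte Carlo estimate of E[-ln f_X(X)]; the sample size and parameters are arbitrary choices:

```python
import math, random

# Differential entropy of a Gaussian: closed form 0.5*ln(2*pi*e*sigma^2)
# versus a Monte Carlo estimate of E[-ln f_X(X)].
mu, sigma = 1.0, 2.0
closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma**2)

def neg_log_pdf(x):
    return 0.5 * math.log(2 * math.pi * sigma**2) + (x - mu)**2 / (2 * sigma**2)

samples = [random.gauss(mu, sigma) for _ in range(200_000)]
mc_estimate = sum(neg_log_pdf(x) for x in samples) / len(samples)
print(closed_form, mc_estimate)  # the two values agree closely
```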
The relative entropy for continuous random variables X and Y is

H(f_X; f_Y) = \int_{-\infty}^{\infty} f_X(x) \ln \frac{f_X(x)}{f_Y(x)} \, dx
Information Theory
Let X be a random variable with S_X = {x_1, ..., x_K}.
Information about outcomes of X is to be sent over a channel.
[Block diagram: Source (X) → Channel → Receiver → Destination]
How can the outcomes {x_1, ..., x_K} be coded so that all the information is carried with maximal efficiency?
Best code → minimum expected codeword length.
The code must be instantaneously decodable, i.e. no codeword is a prefix of any other
→ construct a code tree.
e.g. S = {x_1, x_2, x_3, x_4, x_5} with a binary code tree (0/1 branch labels) giving:
x_1 = 00, x_2 = 01, x_3 = 10, x_4 = 110, x_5 = 111
If l_k = length of the codeword for x_k, the expected codeword length is

E[l_k] = \sum_{k=1}^{K} p(x_k) \, l_k
For instantaneous binary codes

\sum_{k=1}^{K} 2^{-l_k} \le 1

and for D-ary codes

\sum_{k=1}^{K} D^{-l_k} \le 1

This is the Kraft inequality.
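A quick check (a tiny Python sketch, not from the lecture) using the lengths of the example code above:

```python
# Codeword lengths of the example code x1=00, x2=01, x3=10, x4=110, x5=111.
lengths = [2, 2, 2, 3, 3]
print(sum(2 ** -l for l in lengths))  # 1.0, which satisfies the Kraft inequality (<= 1)
```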
Consider E lk   H X

K
p
k 1

p
lk 
 p log p
k 1
K
k
k
 pk log
pk
2 lk
k 1

k
K
K
k 1
k
log pk
 log 2 lk

0
 Relative Entropy of pk and qk  2 lk which is  0

E lk   H X
E lk   H X
iff
pk  2
lk
 1 
i .e. lk  log 2    k
 pk 
Shannon' s source coding theorem
i.e.
1. the minimum average codeword length = the entropy of X
2. the most efficient code is obtained when length(x_k) = -\log p_k, i.e. codeword lengths grow as \log_2(1/p_k): less probable outcomes get longer codewords.

In other words:
1) the bits of information in X = the entropy of X
2) a maximally efficient code can always be found when all p_k are powers of 2; otherwise

H_X \le E[l_k]_{\text{best}} \le H_X + 1
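A sketch (not from the lecture) illustrating the bound: using Shannon-style lengths l_k = ⌈-log_2 p_k⌉ (my choice of construction, not specified above) already gives a Kraft-feasible code with H_X ≤ E[l] < H_X + 1, with equality when all p_k are powers of 2:

```python
import math

# Shannon-style code lengths l_k = ceil(-log2 p_k): they satisfy the Kraft
# inequality and give H_X <= E[l] < H_X + 1 (equality when all p_k are powers of 2).
for pmf in ([0.5, 0.25, 0.125, 0.125], [0.4, 0.3, 0.2, 0.1]):
    H = -sum(p * math.log2(p) for p in pmf)
    lengths = [math.ceil(-math.log2(p)) for p in pmf]
    E_l = sum(p * l for p, l in zip(pmf, lengths))
    print(sum(2**-l for l in lengths) <= 1, H <= E_l < H + 1)  # True True
```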
One such optimal code is the Huffman code, constructed using a Huffman tree.
e.g. Let S_X = {A, B, C, D, E} with pmf = {0.1, 0.3, 0.25, 0.2, 0.15}.
At every step, combine the two nodes with minimal probabilities:
1) A (0.1) + E (0.15) → 0.25
2) D (0.2) + C (0.25) → 0.45
3) {A, E} (0.25) + B (0.3) → 0.55
4) {A, E, B} (0.55) + {C, D} (0.45) → 1
Labeling the two branches of each merge with 0 and 1 gives the codewords:
A = 000, B = 01, C = 10, D = 11, E = 001
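A compact Python sketch of the same construction (not the lecture's exact tree; tie-breaking and branch labels may differ, but the codeword lengths come out the same):

```python
import heapq
from itertools import count

def huffman_code(pmf):
    """Build a binary Huffman code for a dict {symbol: probability}.
    At every step the two nodes with smallest total probability are merged."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

pmf = {"A": 0.1, "B": 0.3, "C": 0.25, "D": 0.2, "E": 0.15}
code = huffman_code(pmf)
print(code)                                           # prefix-free codewords
print(sum(pmf[s] * len(w) for s, w in code.items()))  # expected length, close to H_X
```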
To prove the Kraft inequality

\sum_{k} 2^{-l_k} \le 1

for any binary tree code (each leaf a codeword):
Let l_{\max} = the longest codeword length, so there is a leaf at level l_{\max} (root = level 0).
If all leaves were at level l_{\max}, the number of leaves would be 2^{l_{\max}}.
A leaf at level l_k < l_{\max} eliminates 2^{l_{\max} - l_k} leaves from the full tree.

⇒ \sum_{k} 2^{l_{\max} - l_k} \le 2^{l_{\max}}

(remember, each leaf of the full tree is eliminated by exactly one codeword). Dividing by 2^{l_{\max}}:

\sum_{k} 2^{-l_k} \le 1
1
A
B
2
3
C
A eliminates
D
23 - 1  4 leaves
B eliminates 23 - 2  2 leaves
C, D do not eliminate any leaves
2
k
lk
 2 1  2  2  2 3  2 3  1
In general, if the tree is complete,

\sum_{k} 2^{-l_k} = 1

If not, the sum is < 1, e.g. for a tree with A at level 1 and B, C at level 3:

\sum_{k} 2^{-l_k} = 2^{-1} + 2^{-3} + 2^{-3} = \frac{3}{4}
Maximum Entropy Method
Given a random variable X with S_X = {x_1, ..., x_K}, unknown pmf p_k = p(x_k), and the constraint

E[g(X)] = r        (1)

estimate p_k.
Hypothesis: p_k = c \, e^{-\lambda g(x_k)} is the maximum entropy pmf.
Proof: Suppose a pmf q ≠ p also satisfies (1). Then

0 \le H(q; p) = \sum_{k} q_k \ln \frac{q_k}{p_k}
= \sum_{k} q_k \ln q_k - \sum_{k} q_k (\ln c - \lambda g(x_k))
= -\ln c + \lambda \sum_{k} q_k g(x_k) - H_{X \sim q}
= -\ln c + \lambda r - H_{X \sim q}
= H_{X \sim p} - H_{X \sim q}

(the last step uses H_{X \sim p} = -\sum_k p_k (\ln c - \lambda g(x_k)) = -\ln c + \lambda r, since p also satisfies (1))

⇒ H_{X \sim p} \ge H_{X \sim q}
In general, given n constraints

E[g_1(X)] = r_1        (1-1)
...
E[g_n(X)] = r_n        (1-n)

the ME pmf has the form

p_k = c \, e^{-\lambda_1 g_1(x_k) - \dots - \lambda_n g_n(x_k)}

where c and the \lambda_i are chosen to satisfy (1-1), ..., (1-n) and \sum_k p_k = 1.
If X is continuous, the ME pdf is of the form

f_X(x) = c \, e^{-\lambda_1 g_1(x) - \dots - \lambda_n g_n(x)}

Note that the g_i(x) may be moments
⇒ the ME method allows pmf/pdf estimates when some moments are known.
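As a numerical illustration of the ME hypothesis (a Python sketch, not from the lecture; the support {1,...,6}, the mean constraint r = 2.5, and the bisection on λ are my own choices):

```python
import math

# Find the maximum-entropy pmf on x_k = 1..K subject to E[X] = r.  By the
# hypothesis above it has the form p_k = c * exp(-lam * x_k); lam is found by
# bisection, since the mean of this family decreases monotonically in lam.
K, r = 6, 2.5
xs = list(range(1, K + 1))

def pmf(lam):
    w = [math.exp(-lam * x) for x in xs]
    Z = sum(w)
    return [wk / Z for wk in w]

lo, hi = -20.0, 20.0
for _ in range(100):
    mid = (lo + hi) / 2
    mean = sum(x * p for x, p in zip(xs, pmf(mid)))
    lo, hi = (lo, mid) if mean < r else (mid, hi)

p = pmf((lo + hi) / 2)
H = lambda q: -sum(qk * math.log(qk) for qk in q if qk > 0)
print(sum(x * pk for x, pk in zip(xs, p)), H(p))   # mean ~ 2.5, entropy of the ME pmf

# Any other pmf with the same mean has lower entropy, e.g. uniform on {1,2,3,4}:
q = [0.25, 0.25, 0.25, 0.25, 0.0, 0.0]
print(sum(x * qk for x, qk in zip(xs, q)), H(q))   # mean 2.5, smaller entropy
```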