Chapter 6: Information Theory

6.1 Mathematical models for information source

• Discrete source: $X \in \{x_1, x_2, \ldots, x_L\}$ with
  $p_k = P[X = x_k], \qquad \sum_{k=1}^{L} p_k = 1$

• Discrete memoryless source (DMS): the source outputs are independent random variables $\{X_i\},\ i = 1, 2, \ldots, N$

• Discrete stationary source
  – Source outputs are statistically dependent
  – Stationary: the joint probabilities of $(x_1, x_2, \ldots, x_n)$ and $(x_{1+m}, x_{2+m}, \ldots, x_{n+m})$ are identical for all shifts $m$
  – Characterized by the joint PDF $p(x_1, x_2, \ldots, x_m)$

6.2 Measure of information

• Entropy of a random variable $X$
  – A measure of the uncertainty or ambiguity in $X$. For $X \in \{x_1, x_2, \ldots, x_L\}$,
    $H(X) = -\sum_{k=1}^{L} P[X = x_k] \log P[X = x_k]$
  – A measure of the information required to know $X$, i.e. the information content of $X$ per symbol
  – Unit: bits ($\log_2$) or nats ($\log_e$) per symbol
  – We define $0 \log 0 = 0$
  – Entropy depends on the probabilities of $X$, not on the values of $X$

Shannon's fundamental paper of 1948, "A Mathematical Theory of Communication": can we define a quantity that measures how much information is "produced" by a process? Shannon requires this measure $H(p_1, p_2, \ldots, p_n)$ to satisfy:
1) $H$ should be continuous in the $p_i$
2) If all $p_i$ are equal, $H$ should be monotonically increasing with $n$
3) If a choice can be broken down into two successive choices, the original $H$ should be the weighted sum of the individual values of $H$

Example of assumption 3):
$H\!\left(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right) = H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2}\, H\!\left(\tfrac{2}{3}, \tfrac{1}{3}\right)$

The only $H$ satisfying the three assumptions is of the form
$H = -K \sum_{i=1}^{n} p_i \log p_i$
where $K$ is a positive constant.

Binary entropy function:
$H(p) = -p \log p - (1 - p) \log(1 - p)$
(Plot: $H(p)$ versus the probability $p$.) $H = 0$: no uncertainty; $H = 1$: maximum uncertainty, 1 bit for binary information.

Mutual information

• Two discrete random variables $X$ and $Y$:
  $I(X; Y) = \sum_{x,y} P[X = x, Y = y]\, I(x; y) = \sum_{x,y} P[X = x, Y = y] \log \frac{P[x \mid y]}{P[x]} = \sum_{x,y} P[X = x, Y = y] \log \frac{P[x, y]}{P[x]\, P[y]}$
• Measures the information that knowing either variable provides about the other
• What if $X$ and $Y$ are fully independent, or fully dependent? (See the numerical sketch below.)
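A minimal numerical sketch (the helper names `entropy` and `mutual_information` are illustrative, not from the slides) that evaluates the definitions above and checks the two extreme cases just raised: $I(X;Y) = 0$ when $X$ and $Y$ are independent, and $I(X;Y) = H(X)$ when $Y$ fully determines $X$.

```python
from math import log2

def entropy(p):
    """H = -sum p_k log2 p_k, with the convention 0 * log 0 = 0."""
    return -sum(pk * log2(pk) for pk in p if pk > 0)

def mutual_information(pxy):
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ) for a joint pmf table."""
    px = [sum(row) for row in pxy]              # marginal of X
    py = [sum(col) for col in zip(*pxy)]        # marginal of Y
    return sum(pxy[i][j] * log2(pxy[i][j] / (px[i] * py[j]))
               for i in range(len(px)) for j in range(len(py))
               if pxy[i][j] > 0)

# Binary entropy at p = 0.5 is 1 bit (maximum uncertainty).
print(entropy([0.5, 0.5]))                                  # 1.0

# Shannon's decomposition example: H(1/2,1/3,1/6) = H(1/2,1/2) + (1/2) H(2/3,1/3).
print(entropy([1/2, 1/3, 1/6]),
      entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3]))      # both ~1.459

# Independent X, Y  ->  I(X;Y) = 0
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))     # 0.0

# Fully dependent (Y = X)  ->  I(X;Y) = H(X) = 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))         # 1.0
```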
Relations between mutual information and entropy:
$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y)$

Some properties:
• $I(X; Y) = I(Y; X)$
• $I(X; Y) \ge 0$
• $I(X; X) = H(X)$
• $I(X; Y) \le \min\{H(X), H(Y)\}$
• $0 \le H(X) \le \log L$; entropy is maximized when the probabilities are equal
• If $Y = g(X)$, then $H(Y) \le H(X)$

Joint and conditional entropy

• Joint entropy:
  $H(X, Y) = -\sum_{x,y} P[X = x, Y = y] \log P[X = x, Y = y]$
• Conditional entropy of $Y$ given $X$:
  $H(Y \mid X) = \sum_{x} P[X = x]\, H(Y \mid X = x) = -\sum_{x,y} P[X = x, Y = y] \log P[Y = y \mid X = x]$
• Chain rule for entropies:
  $H(X_1, X_2, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1, X_2) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1})$
• Therefore,
  $H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)$
• If the $X_i$ are i.i.d.,
  $H(X_1, X_2, \ldots, X_n) = n H(X)$

6.3 Lossless coding of information source

• Source sequence of length $n$ ($n$ is assumed to be large):
  $\mathbf{x} = [X_1, X_2, \ldots, X_n], \quad X_i \in \{x_1, x_2, \ldots, x_L\}, \quad p_i = P[X = x_i]$
• Without any source coding we need $\log L$ bits per symbol

Lossless source coding

• Typical sequence
  – The number of occurrences of $x_i$ is roughly $n p_i$
  – When $n \to \infty$, any $\mathbf{x}$ will be "typical":
    $\log P[\mathbf{x}] \approx \sum_{i=1}^{L} n p_i \log p_i = -n H(X), \qquad P[\mathbf{x}] \approx 2^{-n H(X)}$
  – All typical sequences have the same probability as $n \to \infty$
• Number of typical sequences $\approx \dfrac{1}{P[\mathbf{x}]} = 2^{n H(X)}$
• Since typical sequences are almost certain to occur, it is sufficient to consider only these typical sequences when encoding the source output
• How many bits per symbol do we need now?
  $R = \dfrac{n H(X)}{n} = H(X) \le \log L$

Shannon's First Theorem (lossless source coding): let $X$ denote a discrete memoryless source. There exists a lossless source code at rate $R$ if $R \ge H(X)$ bits per transmission.

For a discrete stationary source,
$R \ge H_\infty(X) = \lim_{k \to \infty} \frac{1}{k} H(X_1, X_2, \ldots, X_k) = \lim_{k \to \infty} H(X_k \mid X_1, X_2, \ldots, X_{k-1})$

Lossless source coding algorithms

• Variable-length coding
  – Symbols with higher probability are assigned shorter code words:
    $\min_{\{n_k\}} \bar{R} = \sum_{k=1}^{L} n_k P(x_k)$
  – E.g. Huffman coding (a small coding sketch follows below)
• Fixed-length coding
  – E.g. Lempel-Ziv coding
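As a companion to the Huffman example that follows, here is a short illustrative sketch that builds a Huffman code with a binary heap and compares the average codeword length with the entropy. The seven probabilities below are made up for illustration; they are not the ones behind the $H(X) = 2.11$ example from the course figure.

```python
import heapq
from math import log2

def huffman_code(probs):
    """Return a {symbol: codeword} prefix code for a dict of symbol probabilities."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)          # two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}      # prepend 0 on one branch
        merged.update({s: "1" + w for s, w in c1.items()})  # prepend 1 on the other
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"x1": 0.35, "x2": 0.20, "x3": 0.15, "x4": 0.12,
         "x5": 0.10, "x6": 0.05, "x7": 0.03}
code = huffman_code(probs)
H = -sum(p * log2(p) for p in probs.values())
avg_len = sum(probs[s] * len(w) for s, w in code.items())   # R_bar = sum n_k P(x_k)
print(code)
print(f"H(X) = {H:.2f} bits, average length = {avg_len:.2f} bits/symbol")  # avg_len >= H
```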
Huffman coding example with seven symbols (the probabilities $P(x_1), \ldots, P(x_7)$ and the code tree were shown in a figure):

  x1: 00
  x2: 01
  x3: 10
  x4: 110
  x5: 1110
  x6: 11110
  x7: 11111

For this source, $H(X) = 2.11$ bits and the average codeword length is $\bar{R} = 2.21$ bits per symbol.

6.5 Channel models and channel capacity

• Channel models: input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, output sequence $\mathbf{y} = (y_1, y_2, \ldots, y_n)$. A channel is memoryless if
  $P[\mathbf{y} \mid \mathbf{x}] = \prod_{i=1}^{n} P[y_i \mid x_i]$

Binary symmetric channel (BSC) model

(Block diagram: source data → channel encoder → binary modulator → channel → demodulator and detector → channel decoder → output data; the blocks between encoder and decoder form a composite discrete-input, discrete-output channel.)

The BSC flips each input bit with crossover probability $p$:
$P[Y = 0 \mid X = 1] = P[Y = 1 \mid X = 0] = p, \qquad P[Y = 1 \mid X = 1] = P[Y = 0 \mid X = 0] = 1 - p$

Discrete memoryless channel (DMC)

• Input alphabet $\{X\} = \{x_0, x_1, \ldots, x_{M-1}\}$, output alphabet $\{Y\} = \{y_0, y_1, \ldots, y_{Q-1}\}$
• The transition probabilities $P[y \mid x]$ can be arranged in a matrix

Discrete-input, continuous-output channel

$Y = X + N$. If $N$ is additive white Gaussian noise,
$p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y - x)^2 / 2\sigma^2}, \qquad p(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} p(y_i \mid x_i)$

Discrete-time AWGN channel

$y_i = x_i + n_i$
• Power constraint: $E[X^2] \le P$
• For an input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ with large $n$,
  $\frac{1}{n} \sum_{i=1}^{n} x_i^2 = \frac{1}{n} \|\mathbf{x}\|^2 \le P$

AWGN waveform channel

(Block diagram: source data → channel encoder → modulator → physical channel → demodulator and detector → channel decoder → output data; the modulator output is the input waveform and the demodulator input is the output waveform.)

• Assume the channel has bandwidth $W$, with frequency response $C(f) = 1$ for $f \in [-W, +W]$:
  $y(t) = x(t) + n(t)$
• Power constraint: $E[X^2(t)] \le P$, i.e.
  $\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x^2(t)\, dt \le P$
• How do we define probabilities that characterize the channel? Expand the waveforms onto an orthonormal basis $\{\phi_j(t),\ j = 1, 2, \ldots, 2WT\}$:
  $x(t) = \sum_j x_j \phi_j(t), \qquad n(t) = \sum_j n_j \phi_j(t), \qquad y(t) = \sum_j y_j \phi_j(t), \qquad y_j = x_j + n_j$
  This is equivalent to $2W$ uses per second of a discrete-time channel.
• The power constraint becomes
  $\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x^2(t)\, dt = \lim_{T \to \infty} \frac{1}{T} \sum_{j=1}^{2WT} x_j^2 = \lim_{T \to \infty} \frac{1}{2WT} \sum_{j=1}^{2WT} x_j^2 \cdot 2W = 2W\, E[X^2] \le P$
  Hence $E[X^2] \le \dfrac{P}{2W}$.

Channel capacity

• After source coding we have a binary sequence of length $n$
• The channel causes bit errors with probability $p$
• As $n \to \infty$, the number of sequences with $np$ errors is (checked numerically in the sketch below)
  $\binom{n}{np} = \frac{n!}{(np)!\, (n(1-p))!} \approx 2^{n H_b(p)}$
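A quick numerical check of this approximation (an illustrative sketch using exact binomial coefficients; `math.comb` needs Python 3.8+): the ratio of $\log_2 \binom{n}{np}$ to $n H_b(p)$ tends to 1 as $n$ grows.

```python
from math import comb, log2

def Hb(p):
    """Binary entropy function in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p) if 0 < p < 1 else 0.0

p = 0.1
for n in (100, 1_000, 10_000):
    k = int(n * p)                       # number of bit errors, np
    exact = log2(comb(n, k))             # log2 of the number of error patterns
    approx = n * Hb(k / n)               # exponent predicted by 2^{n Hb(p)}
    print(n, round(exact, 1), round(approx, 1), round(exact / approx, 4))
```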
• To reduce errors, we use only a subset of all possible sequences:
  $M = \frac{2^n}{2^{n H_b(p)}} = 2^{n(1 - H_b(p))}$
• Information rate [bits per transmission]:
  $R = \frac{1}{n} \log_2 M = 1 - H_b(p)$
  This is the capacity of the binary channel.
• $0 \le R = 1 - H_b(p) \le 1$: we cannot transmit more than 1 bit per channel use
• The channel encoder adds redundancy: the $2^n$ different binary sequences of length $n$ carry the information, and we use $2^m$ different binary sequences of length $m$ ($m > n$) for transmission

• Capacity of an arbitrary discrete memoryless channel:
  $C = \max_{\mathbf{p}} I(X; Y)$
  i.e. maximize the mutual information between input and output over all input distributions $\mathbf{p} = (p_1, p_2, \ldots, p_{|X|})$
• Shannon's Second Theorem (noisy channel coding):
  – If $R < C$, reliable communication is possible
  – If $R > C$, reliable communication is impossible

For the binary symmetric channel, the capacity is achieved by $P[X = 1] = P[X = 0] = \tfrac{1}{2}$, and
$C = 1 + p \log_2 p + (1 - p) \log_2 (1 - p) = 1 - H_b(p)$

Discrete-time AWGN channel with input power constraint $E[X^2] \le P$, $Y = X + N$:
• For large $n$,
  $\frac{1}{n} \|\mathbf{y}\|^2 \approx E[X^2] + E[N^2] = P + \sigma^2, \qquad \frac{1}{n} \|\mathbf{y} - \mathbf{x}\|^2 = \frac{1}{n} \|\mathbf{n}\|^2 \approx \sigma^2$
• Maximum number of codewords that can be distinguished:
  $M = \left( \frac{\sqrt{n(P + \sigma^2)}}{\sqrt{n \sigma^2}} \right)^{\!n} = \left( 1 + \frac{P}{\sigma^2} \right)^{\!n/2}$
• Transmission rate:
  $R = \frac{1}{n} \log_2 M = \frac{1}{2} \log_2 \left( 1 + \frac{P}{\sigma^2} \right)$
  The same result can be obtained by directly maximizing $I(X; Y)$ subject to the power constraint.

Band-limited waveform AWGN channel with input power constraint:
• Equivalent to $2W$ uses per second of the discrete-time channel, so
  $C = \frac{1}{2} \log_2 \left( 1 + \frac{P/2W}{N_0/2} \right) = \frac{1}{2} \log_2 \left( 1 + \frac{P}{N_0 W} \right)$ bits/channel use
  $C = 2W \cdot \frac{1}{2} \log_2 \left( 1 + \frac{P}{N_0 W} \right) = W \log_2 \left( 1 + \frac{P}{N_0 W} \right)$ bits/s
• (Plot: $C$ versus the bandwidth $W$.) As $W \to \infty$,
  $C \to \frac{P}{N_0} \log_2 e \approx 1.44\, \frac{P}{N_0}$

• Bandwidth efficiency:
  $r = \frac{R}{W} \le \log_2 \left( 1 + \frac{P}{N_0 W} \right)$
  With energy per bit $E_b = \dfrac{P T_s}{\log_2 M} = \dfrac{P}{R}$ (where $T_s$ is the symbol duration), i.e. $P = E_b R$,
  $r \le \log_2 \left( 1 + \frac{E_b R}{N_0 W} \right) = \log_2 \left( 1 + r\, \frac{E_b}{N_0} \right)$
• Relation between bandwidth efficiency and power efficiency:
  $\frac{E_b}{N_0} \ge \frac{2^r - 1}{r}; \qquad \text{as } r \to 0, \quad \frac{E_b}{N_0} \to \ln 2 \approx -1.6 \text{ dB}$
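To close the section, a small sketch (function names and parameter values are illustrative) that evaluates the three capacity results above: the BSC capacity $1 - H_b(p)$, the band-limited AWGN capacity $W \log_2(1 + P/(N_0 W))$, and the $E_b/N_0$ versus spectral-efficiency trade-off, whose $r \to 0$ limit is the $-1.6$ dB Shannon limit.

```python
from math import log2, log10, log

def bsc_capacity(p):
    """C = 1 - Hb(p) bits per channel use for the binary symmetric channel."""
    Hb = -p * log2(p) - (1 - p) * log2(1 - p) if 0 < p < 1 else 0.0
    return 1 - Hb

def awgn_capacity(P, N0, W):
    """C = W log2(1 + P/(N0 W)) bits/s for the band-limited AWGN channel."""
    return W * log2(1 + P / (N0 * W))

def ebn0_required(r):
    """Minimum Eb/N0 (linear) for spectral efficiency r = R/W: (2^r - 1)/r."""
    return (2 ** r - 1) / r

print(bsc_capacity(0.11))                      # ~0.5 bit per channel use
print(awgn_capacity(P=1.0, N0=1e-3, W=1e3))    # SNR = 1, so 1 bit/s/Hz -> ~1000 bits/s

# As r -> 0 the required Eb/N0 approaches ln 2, i.e. about -1.6 dB:
for r in (1.0, 0.1, 0.01):
    print(r, 10 * log10(ebn0_required(r)))     # 0 dB, -1.44 dB, -1.58 dB
print(10 * log10(log(2)))                      # the Shannon limit, ~ -1.59 dB
```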