Chapter 6: Information Theory

6.1 Mathematical models for information source

• Discrete source: $X \in \{x_1, x_2, \ldots, x_L\}$ with
  $p_k = P[X = x_k], \qquad \sum_{k=1}^{L} p_k = 1$

• Discrete memoryless source (DMS): the source outputs are independent random variables $\{X_i\},\ i = 1, 2, \ldots, N$

• Discrete stationary source
  – Source outputs are statistically dependent
  – Stationary: the joint probabilities of $(x_1, x_2, \ldots, x_n)$ and $(x_{1+m}, x_{2+m}, \ldots, x_{n+m})$ are identical for all shifts $m$
  – Characterized by the joint PDF $p(x_1, x_2, \ldots, x_m)$

6.2 Measure of information

• Entropy of a random variable $X$
  – A measure of the uncertainty or ambiguity in $X$. For $X \in \{x_1, x_2, \ldots, x_L\}$,
    $H(X) = -\sum_{k=1}^{L} P[X = x_k] \log P[X = x_k]$
  – A measure of the information required to know $X$, i.e. the information content of $X$ per symbol
  – Unit: bits ($\log_2$) or nats ($\log_e$) per symbol
  – We define $0 \log 0 = 0$
  – Entropy depends on the probabilities of $X$, not on the values of $X$

Shannon's fundamental paper of 1948, "A Mathematical Theory of Communication": can we define a quantity that measures how much information is "produced" by a process? Shannon requires this measure $H(p_1, p_2, \ldots, p_n)$ to satisfy:
1) $H$ should be continuous in the $p_i$
2) If all $p_i$ are equal, $H$ should be monotonically increasing with $n$
3) If a choice can be broken down into two successive choices, the original $H$ should be the weighted sum of the individual values of $H$

Example of assumption 3):
$H\!\left(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right) = H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2}\, H\!\left(\tfrac{2}{3}, \tfrac{1}{3}\right)$

The only $H$ satisfying the three assumptions is of the form
$H = -K \sum_{i=1}^{n} p_i \log p_i$
where $K$ is a positive constant.

Binary entropy function:
$H(p) = -p \log p - (1 - p) \log(1 - p)$
(Plot: $H(p)$ versus the probability $p$.) $H = 0$: no uncertainty; $H = 1$: maximum uncertainty, 1 bit for binary information.

Mutual information

• Two discrete random variables $X$ and $Y$:
  $I(X; Y) = \sum_{x,y} P[X = x, Y = y]\, I(x; y) = \sum_{x,y} P[X = x, Y = y] \log \frac{P[x \mid y]}{P[x]} = \sum_{x,y} P[X = x, Y = y] \log \frac{P[x, y]}{P[x]\, P[y]}$
• Measures the information that knowing either variable provides about the other
• What if $X$ and $Y$ are fully independent, or fully dependent? (See the numerical sketch below.)
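A minimal numerical sketch (the helper names `entropy` and `mutual_information` are illustrative, not from the slides) that evaluates the definitions above and checks the two extreme cases just raised: $I(X;Y) = 0$ when $X$ and $Y$ are independent, and $I(X;Y) = H(X)$ when $Y$ fully determines $X$.

```python
from math import log2

def entropy(p):
    """H = -sum p_k log2 p_k, with the convention 0 * log 0 = 0."""
    return -sum(pk * log2(pk) for pk in p if pk > 0)

def mutual_information(pxy):
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ) for a joint pmf table."""
    px = [sum(row) for row in pxy]              # marginal of X
    py = [sum(col) for col in zip(*pxy)]        # marginal of Y
    return sum(pxy[i][j] * log2(pxy[i][j] / (px[i] * py[j]))
               for i in range(len(px)) for j in range(len(py))
               if pxy[i][j] > 0)

# Binary entropy at p = 0.5 is 1 bit (maximum uncertainty).
print(entropy([0.5, 0.5]))                                  # 1.0

# Shannon's decomposition example: H(1/2,1/3,1/6) = H(1/2,1/2) + (1/2) H(2/3,1/3).
print(entropy([1/2, 1/3, 1/6]),
      entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3]))      # both ~1.459

# Independent X, Y  ->  I(X;Y) = 0
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))     # 0.0

# Fully dependent (Y = X)  ->  I(X;Y) = H(X) = 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))         # 1.0
```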
Relations between mutual information and entropy:
$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y)$

Some properties:
• $I(X; Y) = I(Y; X)$
• $I(X; Y) \ge 0$
• $I(X; X) = H(X)$
• $I(X; Y) \le \min\{H(X), H(Y)\}$
• $0 \le H(X) \le \log L$; entropy is maximized when the probabilities are equal
• If $Y = g(X)$, then $H(Y) \le H(X)$

Joint and conditional entropy

• Joint entropy:
  $H(X, Y) = -\sum_{x,y} P[X = x, Y = y] \log P[X = x, Y = y]$
• Conditional entropy of $Y$ given $X$:
  $H(Y \mid X) = \sum_{x} P[X = x]\, H(Y \mid X = x) = -\sum_{x,y} P[X = x, Y = y] \log P[Y = y \mid X = x]$
• Chain rule for entropies:
  $H(X_1, X_2, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1, X_2) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1})$
• Therefore,
  $H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)$
• If the $X_i$ are i.i.d.,
  $H(X_1, X_2, \ldots, X_n) = n H(X)$

6.3 Lossless coding of information source

• Source sequence of length $n$ ($n$ is assumed to be large):
  $\mathbf{x} = [X_1, X_2, \ldots, X_n], \quad X_i \in \{x_1, x_2, \ldots, x_L\}, \quad p_i = P[X = x_i]$
• Without any source coding we need $\log L$ bits per symbol

Lossless source coding

• Typical sequence
  – The number of occurrences of $x_i$ is roughly $n p_i$
  – When $n \to \infty$, any $\mathbf{x}$ will be "typical":
    $\log P[\mathbf{x}] \approx \sum_{i=1}^{L} n p_i \log p_i = -n H(X), \qquad P[\mathbf{x}] \approx 2^{-n H(X)}$
  – All typical sequences have the same probability as $n \to \infty$
• Number of typical sequences $\approx \dfrac{1}{P[\mathbf{x}]} = 2^{n H(X)}$
• Since typical sequences are almost certain to occur, it is sufficient to consider only these typical sequences when encoding the source output
• How many bits per symbol do we need now?
  $R = \dfrac{n H(X)}{n} = H(X) \le \log L$

Shannon's First Theorem (lossless source coding): let $X$ denote a discrete memoryless source. There exists a lossless source code at rate $R$ if $R \ge H(X)$ bits per transmission.

For a discrete stationary source,
$R \ge H_\infty(X) = \lim_{k \to \infty} \frac{1}{k} H(X_1, X_2, \ldots, X_k) = \lim_{k \to \infty} H(X_k \mid X_1, X_2, \ldots, X_{k-1})$

Lossless source coding algorithms

• Variable-length coding
  – Symbols with higher probability are assigned shorter code words:
    $\min_{\{n_k\}} \bar{R} = \sum_{k=1}^{L} n_k P(x_k)$
  – E.g. Huffman coding (a small coding sketch follows below)
• Fixed-length coding
  – E.g. Lempel-Ziv coding
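As a companion to the Huffman example that follows, here is a short illustrative sketch that builds a Huffman code with a binary heap and compares the average codeword length with the entropy. The seven probabilities below are made up for illustration; they are not the ones behind the $H(X) = 2.11$ example from the course figure.

```python
import heapq
from math import log2

def huffman_code(probs):
    """Return a {symbol: codeword} prefix code for a dict of symbol probabilities."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)          # two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}      # prepend 0 on one branch
        merged.update({s: "1" + w for s, w in c1.items()})  # prepend 1 on the other
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"x1": 0.35, "x2": 0.20, "x3": 0.15, "x4": 0.12,
         "x5": 0.10, "x6": 0.05, "x7": 0.03}
code = huffman_code(probs)
H = -sum(p * log2(p) for p in probs.values())
avg_len = sum(probs[s] * len(w) for s, w in code.items())   # R_bar = sum n_k P(x_k)
print(code)
print(f"H(X) = {H:.2f} bits, average length = {avg_len:.2f} bits/symbol")  # avg_len >= H
```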
Huffman coding example with seven symbols (the probabilities $P(x_1), \ldots, P(x_7)$ and the code tree were shown in a figure):

  x1: 00
  x2: 01
  x3: 10
  x4: 110
  x5: 1110
  x6: 11110
  x7: 11111

For this source, $H(X) = 2.11$ bits and the average codeword length is $\bar{R} = 2.21$ bits per symbol.

6.5 Channel models and channel capacity

• Channel models: input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, output sequence $\mathbf{y} = (y_1, y_2, \ldots, y_n)$. A channel is memoryless if
  $P[\mathbf{y} \mid \mathbf{x}] = \prod_{i=1}^{n} P[y_i \mid x_i]$

Binary symmetric channel (BSC) model

(Block diagram: source data → channel encoder → binary modulator → channel → demodulator and detector → channel decoder → output data; the blocks between encoder and decoder form a composite discrete-input, discrete-output channel.)

The BSC flips each input bit with crossover probability $p$:
$P[Y = 0 \mid X = 1] = P[Y = 1 \mid X = 0] = p, \qquad P[Y = 1 \mid X = 1] = P[Y = 0 \mid X = 0] = 1 - p$

Discrete memoryless channel (DMC)

• Input alphabet $\{X\} = \{x_0, x_1, \ldots, x_{M-1}\}$, output alphabet $\{Y\} = \{y_0, y_1, \ldots, y_{Q-1}\}$
• The transition probabilities $P[y \mid x]$ can be arranged in a matrix

Discrete-input, continuous-output channel

$Y = X + N$. If $N$ is additive white Gaussian noise,
$p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y - x)^2 / 2\sigma^2}, \qquad p(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} p(y_i \mid x_i)$

Discrete-time AWGN channel

$y_i = x_i + n_i$
• Power constraint: $E[X^2] \le P$
• For an input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ with large $n$,
  $\frac{1}{n} \sum_{i=1}^{n} x_i^2 = \frac{1}{n} \|\mathbf{x}\|^2 \le P$

AWGN waveform channel

(Block diagram: source data → channel encoder → modulator → physical channel → demodulator and detector → channel decoder → output data; the modulator output is the input waveform and the demodulator input is the output waveform.)

• Assume the channel has bandwidth $W$, with frequency response $C(f) = 1$ for $f \in [-W, +W]$:
  $y(t) = x(t) + n(t)$
• Power constraint: $E[X^2(t)] \le P$, i.e.
  $\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x^2(t)\, dt \le P$
• How do we define probabilities that characterize the channel? Expand the waveforms onto an orthonormal basis $\{\phi_j(t),\ j = 1, 2, \ldots, 2WT\}$:
  $x(t) = \sum_j x_j \phi_j(t), \qquad n(t) = \sum_j n_j \phi_j(t), \qquad y(t) = \sum_j y_j \phi_j(t), \qquad y_j = x_j + n_j$
  This is equivalent to $2W$ uses per second of a discrete-time channel.
• The power constraint becomes
  $\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x^2(t)\, dt = \lim_{T \to \infty} \frac{1}{T} \sum_{j=1}^{2WT} x_j^2 = \lim_{T \to \infty} \frac{1}{2WT} \sum_{j=1}^{2WT} x_j^2 \cdot 2W = 2W\, E[X^2] \le P$
  Hence $E[X^2] \le \dfrac{P}{2W}$.

Channel capacity

• After source coding we have a binary sequence of length $n$
• The channel causes bit errors with probability $p$
• As $n \to \infty$, the number of sequences with $np$ errors is (checked numerically in the sketch below)
  $\binom{n}{np} = \frac{n!}{(np)!\, (n(1-p))!} \approx 2^{n H_b(p)}$
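A quick numerical check of this approximation (an illustrative sketch using exact binomial coefficients; `math.comb` needs Python 3.8+): the ratio of $\log_2 \binom{n}{np}$ to $n H_b(p)$ tends to 1 as $n$ grows.

```python
from math import comb, log2

def Hb(p):
    """Binary entropy function in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p) if 0 < p < 1 else 0.0

p = 0.1
for n in (100, 1_000, 10_000):
    k = int(n * p)                       # number of bit errors, np
    exact = log2(comb(n, k))             # log2 of the number of error patterns
    approx = n * Hb(k / n)               # exponent predicted by 2^{n Hb(p)}
    print(n, round(exact, 1), round(approx, 1), round(exact / approx, 4))
```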
• To reduce errors, we use only a subset of all possible sequences:
  $M = \frac{2^n}{2^{n H_b(p)}} = 2^{n(1 - H_b(p))}$
• Information rate [bits per transmission]:
  $R = \frac{1}{n} \log_2 M = 1 - H_b(p)$
  This is the capacity of the binary channel.
• $0 \le R = 1 - H_b(p) \le 1$: we cannot transmit more than 1 bit per channel use
• The channel encoder adds redundancy: the $2^n$ different binary sequences of length $n$ carry the information, and we use $2^m$ different binary sequences of length $m$ ($m > n$) for transmission

• Capacity of an arbitrary discrete memoryless channel:
  $C = \max_{\mathbf{p}} I(X; Y)$
  i.e. maximize the mutual information between input and output over all input distributions $\mathbf{p} = (p_1, p_2, \ldots, p_{|X|})$
• Shannon's Second Theorem (noisy channel coding):
  – If $R < C$, reliable communication is possible
  – If $R > C$, reliable communication is impossible

For the binary symmetric channel, the capacity is achieved by $P[X = 1] = P[X = 0] = \tfrac{1}{2}$, and
$C = 1 + p \log_2 p + (1 - p) \log_2 (1 - p) = 1 - H_b(p)$

Discrete-time AWGN channel with input power constraint $E[X^2] \le P$, $Y = X + N$:
• For large $n$,
  $\frac{1}{n} \|\mathbf{y}\|^2 \approx E[X^2] + E[N^2] = P + \sigma^2, \qquad \frac{1}{n} \|\mathbf{y} - \mathbf{x}\|^2 = \frac{1}{n} \|\mathbf{n}\|^2 \approx \sigma^2$
• Maximum number of codewords that can be distinguished:
  $M = \left( \frac{\sqrt{n(P + \sigma^2)}}{\sqrt{n \sigma^2}} \right)^{\!n} = \left( 1 + \frac{P}{\sigma^2} \right)^{\!n/2}$
• Transmission rate:
  $R = \frac{1}{n} \log_2 M = \frac{1}{2} \log_2 \left( 1 + \frac{P}{\sigma^2} \right)$
  The same result can be obtained by directly maximizing $I(X; Y)$ subject to the power constraint.

Band-limited waveform AWGN channel with input power constraint:
• Equivalent to $2W$ uses per second of the discrete-time channel, so
  $C = \frac{1}{2} \log_2 \left( 1 + \frac{P/2W}{N_0/2} \right) = \frac{1}{2} \log_2 \left( 1 + \frac{P}{N_0 W} \right)$ bits/channel use
  $C = 2W \cdot \frac{1}{2} \log_2 \left( 1 + \frac{P}{N_0 W} \right) = W \log_2 \left( 1 + \frac{P}{N_0 W} \right)$ bits/s
• (Plot: $C$ versus the bandwidth $W$.) As $W \to \infty$,
  $C \to \frac{P}{N_0} \log_2 e \approx 1.44\, \frac{P}{N_0}$

• Bandwidth efficiency:
  $r = \frac{R}{W} \le \log_2 \left( 1 + \frac{P}{N_0 W} \right)$
  With energy per bit $E_b = \dfrac{P T_s}{\log_2 M} = \dfrac{P}{R}$ (where $T_s$ is the symbol duration), i.e. $P = E_b R$,
  $r \le \log_2 \left( 1 + \frac{E_b R}{N_0 W} \right) = \log_2 \left( 1 + r\, \frac{E_b}{N_0} \right)$
• Relation between bandwidth efficiency and power efficiency:
  $\frac{E_b}{N_0} \ge \frac{2^r - 1}{r}; \qquad \text{as } r \to 0, \quad \frac{E_b}{N_0} \to \ln 2 \approx -1.6 \text{ dB}$
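To close the section, a small sketch (function names and parameter values are illustrative) that evaluates the three capacity results above: the BSC capacity $1 - H_b(p)$, the band-limited AWGN capacity $W \log_2(1 + P/(N_0 W))$, and the $E_b/N_0$ versus spectral-efficiency trade-off, whose $r \to 0$ limit is the $-1.6$ dB Shannon limit.

```python
from math import log2, log10, log

def bsc_capacity(p):
    """C = 1 - Hb(p) bits per channel use for the binary symmetric channel."""
    Hb = -p * log2(p) - (1 - p) * log2(1 - p) if 0 < p < 1 else 0.0
    return 1 - Hb

def awgn_capacity(P, N0, W):
    """C = W log2(1 + P/(N0 W)) bits/s for the band-limited AWGN channel."""
    return W * log2(1 + P / (N0 * W))

def ebn0_required(r):
    """Minimum Eb/N0 (linear) for spectral efficiency r = R/W: (2^r - 1)/r."""
    return (2 ** r - 1) / r

print(bsc_capacity(0.11))                      # ~0.5 bit per channel use
print(awgn_capacity(P=1.0, N0=1e-3, W=1e3))    # SNR = 1, so 1 bit/s/Hz -> ~1000 bits/s

# As r -> 0 the required Eb/N0 approaches ln 2, i.e. about -1.6 dB:
for r in (1.0, 0.1, 0.01):
    print(r, 10 * log10(ebn0_required(r)))     # 0 dB, -1.44 dB, -1.58 dB
print(10 * log10(log(2)))                      # the Shannon limit, ~ -1.59 dB
```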