TLT-5400/5406 DIGITAL TRANSMISSION, Exercise 1, Spring 2016
Problem 1.
In general, the entropy H(X) of a discrete random variable X is defined as
H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log_2 P_X(x)
So, for the given source symbol probabilities 1/2, 1/4, 1/8 and 1/8, we can write the solution directly as
H(X) = \tfrac{1}{2}\log_2(2) + \tfrac{1}{4}\log_2(4) + \tfrac{1}{8}\log_2(8) + \tfrac{1}{8}\log_2(8)
     = \tfrac{1}{2} + \tfrac{2}{4} + \tfrac{3}{8} + \tfrac{3}{8}
     = \tfrac{4}{8} + \tfrac{4}{8} + \tfrac{3}{8} + \tfrac{3}{8}
     = \tfrac{14}{8} = 1.75 \text{ bits}
where we have used the facts that \log_y(y^x) = x and \log_y(1/x) = -\log_y(x).
Then, the rate of the source is given by
R = r \cdot H(X) = 175 \text{ bits/s}
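As a quick numerical cross-check, here is a minimal Python sketch (not part of the original solution; the symbol rate r = 100 symbols/s is inferred from R = r H(X) = 175 bits/s):

import math

probs = [1/2, 1/4, 1/8, 1/8]                 # source symbol probabilities
H = -sum(p * math.log2(p) for p in probs)    # entropy in bits/symbol
r = 100                                      # symbol rate in symbols/s (inferred, assumption)
R = r * H                                    # source information rate in bits/s
print(H, R)                                  # -> 1.75 175.0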
The following encoding "rule" achieves the entropy limit (and is also
feasible to decode, since no codeword is a prefix of any other codeword):
Outcome   Probability   Codeword   Codeword length
a1        1/2           0          1 bit
a2        1/4           10         2 bits
a3        1/8           110        3 bits
a4        1/8           111        3 bits
The average number of bits produced by the coder is now
\tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3 + \tfrac{1}{8}\cdot 3 = 1.75 \text{ bits} = H(X)
meaning that in this example we indeed achieved the entropy limit. It is,
however, not necessarily obvious when it is possible to reach the entropy
limit in general. This issue will be considered in Problem 2.
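Before moving on, the prefix property and the average length can also be checked mechanically. Below is a minimal Python sketch (illustrative only; the sample bit string and the decode helper are not part of the original exercise):

code = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}   # prefix code from the table above
probs = {"a1": 1/2, "a2": 1/4, "a3": 1/8, "a4": 1/8}

# average codeword length in bits/symbol
L_avg = sum(probs[s] * len(w) for s, w in code.items())
print(L_avg)   # -> 1.75, i.e. equal to H(X)

# decode a bit stream symbol by symbol; works because no codeword is a prefix of another
def decode(bits, code):
    inv = {w: s for s, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:          # a full codeword is recognized as soon as it is complete
            out.append(inv[buf])
            buf = ""
    return out

print(decode("010110111", code))   # -> ['a1', 'a2', 'a3', 'a4']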
Problem 2.
Kraft's inequality for code word lengths is given by
\sum_{i=1}^{K} M^{-l_i} \le 1
where M is the number of code symbols (=2 for binary codes), K is the
number of source words (and so also the number of code words) and li is
the length of the i-th code word. It can be shown that every code (prefix or
not) has to satisfy the above inequality to be uniquely decodable.
In general, the average code word length is then defined as (this is the
natural definition that we already used in Problem 1)
L = \sum_{i=1}^{K} l_i P_i
where P_i is the probability of the i-th source word. Then, the target is to
minimize L given that Kraft's inequality is fulfilled (and given, of
course, that the l_i are integers and > 0 for all i).
Clearly, if we are able to choose
l_i = \log_M \frac{1}{P_i}
(verify that these codeword lengths l_i fulfill Kraft's inequality!)
it follows that the average code word length L is given by
L = \sum_{i=1}^{K} l_i P_i = \sum_{i=1}^{K} P_i \log_M \frac{1}{P_i} = -\sum_{i=1}^{K} P_i \log_M P_i = H_M(X).
So it really is possible to obtain an average codeword length equal to the
source word entropy. Of course, in practice, this is only possible if the
source word probabilities are of a certain kind (i.e., -\log_M(P_i) is an integer for all i).
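For instance, a small illustrative Python check (using the dyadic probabilities of Problem 1, where this condition holds) shows that the lengths l_i = log_2(1/P_i) are integers and satisfy Kraft's inequality with equality:

import math

probs = [1/2, 1/4, 1/8, 1/8]                  # dyadic probabilities from Problem 1
lengths = [-math.log2(p) for p in probs]      # l_i = log2(1/P_i) = 1, 2, 3, 3 (all integers)
kraft = sum(2 ** -l for l in lengths)         # Kraft sum with M = 2
print(lengths, kraft)                         # -> [1.0, 2.0, 3.0, 3.0] 1.0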
More generally, we can "round up the above idea" by choosing
l_i = \left\lceil \log_M \frac{1}{P_i} \right\rceil
where x  stands for the smallest integer number  x. Then, it follows that
also these lengths li will satisfy the Kraft's inequality since
\sum_{i=1}^{K} M^{-l_i} = \sum_{i=1}^{K} M^{-\left\lceil \log_M \frac{1}{P_i} \right\rceil} \le \sum_{i=1}^{K} M^{-\log_M \frac{1}{P_i}} = \sum_{i=1}^{K} M^{\log_M P_i} = \sum_{i=1}^{K} P_i = 1.
Based on this, we really can choose the code word lengths in this way (that
is, based on Kraft's inequality, there really is a code with these code
word lengths l_i) and we can write
\log_M \frac{1}{P_i} \le \left\lceil \log_M \frac{1}{P_i} \right\rceil < \log_M \frac{1}{P_i} + 1 \quad \text{(by definition of } \lceil x \rceil \text{)}
or simply
\log_M \frac{1}{P_i} \le l_i < \log_M \frac{1}{P_i} + 1
Then, multiplying by P_i (\ge 0) and summing over i, we get
H_M(X) \le L < H_M(X) + 1. \quad \text{(q.e.d.)}
Once more, the lower bound can be reached iff we are able to set
l_i = \log_M \frac{1}{P_i}.
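To make the construction concrete, here is a short Python sketch (illustrative only; the probabilities are arbitrary example values, not from the exercise) that computes the rounded-up lengths, checks Kraft's inequality, and verifies H_M(X) <= L < H_M(X) + 1:

import math

M = 2                                   # binary code symbols
probs = [0.4, 0.3, 0.2, 0.1]            # example source word probabilities (assumed)

lengths = [math.ceil(math.log(1/p, M)) for p in probs]    # l_i = ceil(log_M(1/P_i))
kraft = sum(M ** -l for l in lengths)                     # must be <= 1
L_avg = sum(l * p for l, p in zip(lengths, probs))        # average code word length
H_M = -sum(p * math.log(p, M) for p in probs)             # entropy in base-M units

print(lengths, kraft)                  # -> [2, 2, 3, 4], Kraft sum 0.6875 <= 1
print(H_M, L_avg, H_M + 1)             # H_M(X) <= L < H_M(X) + 1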
Problem 3.
The source word probabilities are 0.62, 0.23, 0.10, 0.02, 0.02 and 0.01.
a) binary Huffman code, M = 2, code symbols 0 and 1:
[Figure: binary Huffman tree construction. At each stage the two smallest probabilities are combined: 0.02 + 0.01 = 0.03, 0.03 + 0.02 = 0.05, 0.10 + 0.05 = 0.15, 0.23 + 0.15 = 0.38 and 0.62 + 0.38 = 1.0; each merge assigns the code symbols 0 and 1 to its two branches.]
So the code words are: w1 = 0, w2 = 10, w3 = 110, w4 = 1111, w5 = 11100, w6 = 11101. The average code length is then
L = 0.62\cdot 1 + 0.23\cdot 2 + 0.10\cdot 3 + 0.02\cdot 4 + 0.02\cdot 5 + 0.01\cdot 5 = 1.61 \text{ bits.}
The source word entropy in turn is
H(X) = -\sum_{i=1}^{6} P_i \log_2 P_i = 1.54 \text{ bits} < L \text{ (of course).}
b) ternary Huffman code, M = 3 => s = 2, code symbols a, b and c (as an example):
[Figure: ternary Huffman tree construction. Since only s = 2 symbols are combined in the first stage, the merges are 0.02 + 0.01 = 0.03, then 0.10 + 0.03 + 0.02 = 0.15, and finally 0.62 + 0.23 + 0.15 = 1.0; each merge assigns the code symbols a, b and c to its branches.]
So the code words are: w1 = c, w2 = b, w3 = ac, w4 = aa, w5 = abb, w6 = aba. The average code length is now
L = 0.62\cdot 1 + 0.23\cdot 1 + 0.10\cdot 2 + 0.02\cdot 2 + 0.02\cdot 3 + 0.01\cdot 3 = 1.18 \text{ ternary units.}
The source word entropy is
H(X) = -\sum_{i=1}^{6} P_i \log_3 P_i = 0.97 \text{ ternary units} < L \text{ (of course; notice the base-3 logarithm due to the ternary code symbols).}
Problem 4.
The following discrete memoryless channel is considered:
[Figure: discrete memoryless channel with inputs x1 = 0, x2 = 1 and outputs y1 = 0, y2 = 1, y3 = 2. Input x1 goes to y1 with probability 1 - p and to y2 with probability p; input x2 goes to y2 with probability p and to y3 with probability 1 - p.]
Let's denote the input probabilities as PX(x1) = q and PX(x2) = 1 – q (why?).
Then, by definition, we can write the conditional entropy of Y given X as
H(Y|X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_X(x) P_{Y|X}(y|x) \log_2\left( P_{Y|X}(y|x) \right)
or (use the conditional probabilities in the figure!)
H(Y|X) = -\Big[ q\big( (1-p)\log_2(1-p) + p\log_2(p) + 0\cdot\log_2(0) \big) + (1-q)\big( 0\cdot\log_2(0) + p\log_2(p) + (1-p)\log_2(1-p) \big) \Big]
= \ldots = -(1-p)\log_2(1-p) - p\log_2(p)
Then, the actual mutual information is given by
I(X,Y) = H(Y) - H(Y|X) = H(Y) + (1-p)\log_2(1-p) + p\log_2(p)
In general, in order to evaluate H(Y), we need to find the probabilities for
yi. So, according to the total probability formula, we can first write
P_Y(y_1) = P_{Y|X}(y_1|x_1)P_X(x_1) + P_{Y|X}(y_1|x_2)P_X(x_2) = (1-p)q + 0 = (1-p)q
P_Y(y_2) = P_{Y|X}(y_2|x_1)P_X(x_1) + P_{Y|X}(y_2|x_2)P_X(x_2) = pq + p(1-q) = p
P_Y(y_3) = P_{Y|X}(y_3|x_1)P_X(x_1) + P_{Y|X}(y_3|x_2)P_X(x_2) = 0 + (1-p)(1-q) = (1-p)(1-q)
Then, the entropy H(Y) is given by
H(Y) = -\sum_{y \in \mathcal{Y}} P_Y(y)\log_2 P_Y(y)
= -(1-p)q\log_2\big[(1-p)q\big] - p\log_2(p) - (1-p)(1-q)\log_2\big[(1-p)(1-q)\big]
= -(1-p)q\big[\log_2(1-p) + \log_2(q)\big] - p\log_2(p) - (1-p)(1-q)\big[\log_2(1-p) + \log_2(1-q)\big]
This can be simplified further to
H(Y) = -p\log_2(p) - (1-p)\log_2(1-p) - q(1-p)\log_2(q) - (1-q)(1-p)\log_2(1-q)
Then we can substitute this into the expression for I(X,Y) which yields
I(X,Y) = H(Y) + (1-p)\log_2(1-p) + p\log_2(p)
= -q(1-p)\log_2(q) - (1-q)(1-p)\log_2(1-q)
= -(1-p)\big[ q\log_2(q) + (1-q)\log_2(1-q) \big]
The actual channel capacity CS is in turn obtained when the above mutual
information I(X,Y) is maximized over the input distribution. So we should
choose the value of q in such a way that the expression for I(X,Y) above is
maximized. In this case, the maximization is trivial since
C_S = \max_q I(X,Y)
= \max_q \Big( -(1-p)\big[ q\log_2(q) + (1-q)\log_2(1-q) \big] \Big)
= (1-p)\max_q \Big( -q\log_2(q) - (1-q)\log_2(1-q) \Big)
What is left is the binary entropy function which is maximized when q =
1/2. So the channel capacity is then
C_S = (1-p)\left[ -\tfrac{1}{2}\log_2\!\left(\tfrac{1}{2}\right) - \left(1-\tfrac{1}{2}\right)\log_2\!\left(1-\tfrac{1}{2}\right) \right] = 1 - p
To once more summarize the main results (for this problem):
- H(Y|X) = -(1-p)\log_2(1-p) - p\log_2(p)
- I(X,Y) = -(1-p)\big[ q\log_2(q) + (1-q)\log_2(1-q) \big]
- C_S = 1 - p (e.g., if p = 0 then C_S = 1, and if p = 1 then C_S = 0, which conforms with our intuition; see the channel model)
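A brief numerical cross-check (an illustrative Python sketch only; the grid search over q is just for verification, not part of the solution) confirms that q = 1/2 maximizes I(X,Y) and that the maximum equals 1 - p:

import math

def mutual_info(q, p):
    """I(X,Y) = -(1 - p) * (q*log2(q) + (1 - q)*log2(1 - q)) for the channel above."""
    if q in (0.0, 1.0):
        return 0.0
    return -(1 - p) * (q * math.log2(q) + (1 - q) * math.log2(1 - q))

p = 0.2                                         # example transition probability (assumed)
qs = [k / 1000 for k in range(1001)]            # grid of input probabilities q
best_q = max(qs, key=lambda q: mutual_info(q, p))
print(best_q, mutual_info(best_q, p), 1 - p)    # -> 0.5, 0.8, 0.8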
Problem 5.
In order to determine the channel capacity, we usually derive an
expression for the average mutual information and maximize it with
respect to the source probabilities. However, there is an easier way to
determine the channel capacity if the channel has a certain symmetry (this
usually makes the actual calculations much simpler to carry out!). To
check the symmetry condition, it is convenient to write the conditional
probabilities into a conditional probability matrix P = {pij} = {PY|X(yi|xj)}.
Then, if each row (column) of P is just a permutation of any other row
(column), equally probable input symbols will maximize the mutual
information!
"Physically" this means that under such a symmetry condition, H(Y|X) turns out
to be independent of the source probabilities, and equally probable input
symbols also make the output values equally probable. Then, clearly,
I(X,Y) = H(Y) - H(Y|X) is maximized.
For our channel, the conditional probability matrix is given by
P = \begin{bmatrix} P_{Y|X}(y_1|x_1) & P_{Y|X}(y_1|x_2) \\ P_{Y|X}(y_2|x_1) & P_{Y|X}(y_2|x_2) \\ P_{Y|X}(y_3|x_1) & P_{Y|X}(y_3|x_2) \\ P_{Y|X}(y_4|x_1) & P_{Y|X}(y_4|x_2) \end{bmatrix} = \begin{bmatrix} 1/3 & 1/6 \\ 1/3 & 1/6 \\ 1/6 & 1/3 \\ 1/6 & 1/3 \end{bmatrix}
and the symmetry conditions are fulfilled. As a consequence, PX(x1) =
PX(x2) = 1/2 will maximize the average mutual information.
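A small Python check of the column-permutation property and of the resulting uniform output distribution (illustrative only, not part of the original solution):

from fractions import Fraction as F

# conditional probability matrix P[i][j] = P(y_i | x_j)
P = [[F(1, 3), F(1, 6)],
     [F(1, 3), F(1, 6)],
     [F(1, 6), F(1, 3)],
     [F(1, 6), F(1, 3)]]

cols = [sorted(row[j] for row in P) for j in range(2)]
print(cols[0] == cols[1])                     # True: each column is a permutation of the other

px = [F(1, 2), F(1, 2)]                       # equally probable inputs
py = [sum(P[i][j] * px[j] for j in range(2)) for i in range(4)]
print(py)                                     # -> [1/4, 1/4, 1/4, 1/4]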
The output symbol probabilities are then (the total probability formula again;
it is actually quite an important result from basic probability theory)
P_Y(y_i) = P_{Y|X}(y_i|x_1)P_X(x_1) + P_{Y|X}(y_i|x_2)P_X(x_2) = \frac{1}{3}\cdot\frac{1}{2} + \frac{1}{6}\cdot\frac{1}{2} = \frac{1}{4}
for all i. So the mutual information (and, in this case, the capacity) is then given by
I(X,Y) = H(Y) - H(Y|X)
= \log_2(4) + \sum_{x \in \mathcal{X}} P_X(x) \sum_{y \in \mathcal{Y}} P_{Y|X}(y|x)\log_2\!\left( P_{Y|X}(y|x) \right)
= 2 + \frac{1}{2}\big[ p_1\log_2(p_1) + p_1\log_2(p_1) + p_2\log_2(p_2) + p_2\log_2(p_2) \big] + \frac{1}{2}\big[ p_2\log_2(p_2) + p_2\log_2(p_2) + p_1\log_2(p_1) + p_1\log_2(p_1) \big]
= 2 + 2p_1\log_2(p_1) + 2p_2\log_2(p_2)
= 2 + 2\cdot\tfrac{1}{3}\log_2\!\left(\tfrac{1}{3}\right) + 2\cdot\tfrac{1}{6}\log_2\!\left(\tfrac{1}{6}\right)
\approx 0.0817 \text{ bits / channel use}
= C_S
where p_1 = 1/3 and p_2 = 1/6 denote the two distinct conditional probability values in the matrix P above.
In this example, the capacity is quite small (when compared to the
"maximum capacity" of 1 bit / channel use that could in general be
achieved with binary inputs). Can you explain that intuitively?
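For reference, a compact Python sketch (illustrative only) that evaluates the same capacity directly from the transition matrix, assuming equally probable inputs:

import math

# P[i][j] = P(y_i | x_j) for the symmetric channel of Problem 5
P = [[1/3, 1/6],
     [1/3, 1/6],
     [1/6, 1/3],
     [1/6, 1/3]]
px = [0.5, 0.5]                                            # capacity-achieving input distribution

py = [sum(P[i][j] * px[j] for j in range(2)) for i in range(4)]
H_Y = -sum(p * math.log2(p) for p in py)                   # = log2(4) = 2 bits
H_Y_given_X = -sum(px[j] * sum(P[i][j] * math.log2(P[i][j]) for i in range(4))
                   for j in range(2))
C = H_Y - H_Y_given_X
print(C)                                                   # -> about 0.0817 bits / channel use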
In the figure below, the previous capacity
C_S = 2 + 2p_1\log_2(p_1) + 2p_2\log_2(p_2)
is depicted as a function of p_1. Notice that p_1 + p_2 = 0.5, or p_2 = 0.5 - p_1
(why?), so there really is only one "free" variable describing the channel
quality in this case.
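The curve can be reproduced with a short matplotlib sketch (illustrative only; the grid resolution and plot styling are arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt

p1 = np.linspace(0.001, 0.499, 500)                      # p1 ranges over (0, 0.5)
p2 = 0.5 - p1                                            # p2 = 0.5 - p1
CS = 2 + 2 * p1 * np.log2(p1) + 2 * p2 * np.log2(p2)     # capacity in bits / channel use

plt.plot(p1, CS)
plt.xlabel("p1")
plt.ylabel("C_S  [bits / channel use]")
plt.title("Capacity of the symmetric channel as a function of p1")
plt.grid(True)
plt.show()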