Coding Techniques for Multicasting

by

Ashish Khisti

B.A.Sc., Engineering Science, University of Toronto, 2002

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of

Master of Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 2004

© Massachusetts Institute of Technology 2004. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 21, 2004

Certified by: Gregory Wornell, Professor, Thesis Supervisor

Certified by: Uri Erez, Postdoctoral Scholar, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students
Coding Techniques for Multicasting
by
Ashish Khisti
B.A.Sc., Engineering Science
University of Toronto, 2002
Submitted to the Department of Electrical Engineering and Computer Science
on May 21, 2004, in partial fulfillment of the
requirements for the degree of
Master of Science
Abstract
We study some fundamental limits of multicasting in wireless systems and propose practical
architectures that perform close to these limits. In Chapter 2, we study the scenario in which
one transmitter with multiple antennas distributes a common message to a large number
of users. For a system with a fixed number (L) of transmit antennas, we show that, as
the number of users (K) becomes large, the rate of the worst user decreases as O(K^{-1/L}).
Thus, multiple antennas provide significant gains in the performance of a multicasting
system with slow fading. We propose a robust architecture for multicasting over block fading
channels, using rateless erasure codes at the application layer. This architecture provides
new insights into the cross layer interaction between the physical layer and the application
layer. For systems with rich time diversity, we observe that it is better to exploit the time
diversity using erasure codes at the application layer rather than be conservative and aim
for high reliability at the physical layer. It is known that the spatial diversity gains are not
significantly high in systems with rich time diversity. We take a step further and show that
to realize these marginal gains one has to operate very close to the optimal operating point.
Next, we study the problem of multicasting to multiple groups with a multiple antenna
transmitter. The solution to this problem motivates us to study a multiuser generalization
of the dirty paper coding problem. This generalization is interesting in its own right and
is studied in detail in Chapter 3. The scenario we study is that of one sender and many
receivers, all interested in a common message. There is additive interference on the channel
of each receiver, which is known only to the sender. The sender has to encode the message
in such a way that it is simultaneously 'good' for all the receivers. This scenario is a
non-trivial generalization of the dirty paper coding result, since it requires that the sender
deal with multiple interferences simultaneously. We prove a capacity theorem for the special
case of the two-user binary channel and derive achievable rates for many other channel models,
including the Gaussian channel and the memory with defects model. Our results are rather
pessimistic, since the value of side information diminishes as the number of users increases.
Thesis Supervisor: Gregory Wornell
Title: Professor
Thesis Supervisor: Uri Erez
Title: Post Doctoral Scholar
Acknowledgments
First and foremost I would like to thank my two wonderful advisors, Greg Wornell and Uri
Erez. I am truly privileged to have the opportunity to work with the two of you. Greg
took me under his wings when I joined MIT and has ever since been a tremendous source of
wisdom and inspiration. My interactions with Uri have been simply remarkable. Any time
I had the slightest idea on my research problem, I could drop by his office and develop a
much clearer picture on what I was thinking about. I will always remember the summer of
2003 when we spent several hours together each day trying to crack the binary dirty paper
coding problem. I have learned so many things from the two of you in such a short while
that it is difficult to imagine myself two years earlier.
Special thanks to my lab mates, Albert Chan, Vijay Divi, Everest Huang, Hiroyuki Ishii,
Emin Martinian, Charles Swannack and Elif Uysal. Particular thanks to Emin Martinian
for several interesting research discussions. It was great to have you, Vijay, as my office
mate, with whom I could freely talk about almost anything that came to
mind. Special thanks to Giovanni Aliberti, our systems administrator for providing us with
delicious homemade sandwiches during the Thursday lunch meetings and introducing me
to the wonderful world of Mac OSX. My deepest regards to our admin, Tricia Mulcahy for
making all the complicated things at MIT look so simple. Also thanks to Shashi Borade
and Shan-Yuan Ho for being wonderful travel companions in my trip to Europe last year.
Thank you Watcharapan Suwansantisuk for inviting me to the Thai festival.
I would like to thank the wonderful researchers at HP Labs, Palo Alto for inviting me to
visit their labs several times during the last two years. I would especially like to acknowledge
some fruitful interactions with Mitch Trott, John Apostolopoulos and Susie Wee at the HP
labs and hope that they continue in the future.
Finally, I would like to thank my grandmother, my mother and my sister for being extremely supportive during my course of studies. It is nice to have three generations of ladies
supporting my upbringing! I could never have gone this far without your support.
Contents

1 Introduction

2 Multicasting from a Multi-Antenna Transmitter
   2.1 Channel Model for Multicasting
   2.2 Serving all Users - No Transmitter CSI
       2.2.1 Single User Outage
       2.2.2 Outage in Multicasting
   2.3 Serving all Users - Perfect Transmitter CSI
       2.3.1 Naive Time-Sharing Alternatives
   2.4 Serving a fraction of Users
   2.5 Multicasting Using Erasure Codes
       2.5.1 Architecture Using Erasure Codes
       2.5.2 Analysis of the Achievable Rate
       2.5.3 Layered Approach and Erasure Codes
   2.6 Multicasting to Multiple Groups
       2.6.1 Transmitting Independent Messages to each user
       2.6.2 Multiple Groups

3 Multicasting with Known Interference as Side Information
   3.1 Point to Point channels
   3.2 Multicasting channels
   3.3 Binary Channels - 2 User noiseless case
       3.3.1 Coding Theorem
       3.3.2 Converse
       3.3.3 Discussion of the Coding Scheme
       3.3.4 Random Linear Codes are Sufficient
       3.3.5 Alternative Proof of the Coding Theorem
       3.3.6 Practical Capacity Approaching Schemes
   3.4 Binary Channels - More than two receivers
       3.4.1 Improved Inner Bound for K > 2 users
       3.4.2 Binary Channel with Noise
   3.5 Gaussian Channels
   3.6 Writing on many memories with defects
       3.6.1 Two Memories with Defects

4 Conclusion and Future Work

A Coding over multiple blocks in SISO Block Fading Channels

B Proof of Proposition 2
List of Figures

2-1 Outage Probability for a Single User in MISO system
2-2 K · E[R] vs. K for L = 1, 2, ..., 6 antennas. We need to take the expectation over R, since we are considering a finite number of users.
2-3 The pdf of F with L = 2, 8. The mean of both distributions is the same, but the pdf for L = 8 has much shorter tails. This helps when we aim for low values of F, but hurts when we aim for very high values of F.
2-4 A time division protocol for serving 4 groups of users. Each group has multiple users that want the same content. The channel coherence time is large enough that the channel gain is constant in a given block, but small enough that successive blocks to a given group have independent gains.
2-5 Low SNR analysis of erasure-code-based multicasting. (a) The optimal value of η (normalized throughput) as a function of the number of antennas and the corresponding optimizing value of R/C_erg (normalized target rate). (b) The optimal outage probability as a function of the number of antennas. (c) η as a function of R/C_erg for L = 1, 4, 10 transmit antennas.
2-6 Analysis of the erasure-code-based multicasting at SNR = 50 dB. (a) Optimal value of C as a function of the number of antennas and the corresponding optimizing value of R. (b) The optimal outage as a function of the number of antennas. (c) η as a function of R − C_erg for L = 1, 4, 10 transmit antennas.
3-1 Point to Point channel with state parameters known to the encoder
3-2 Dirty Paper Coding Channel
3-3 Two User Multicasting Channel with Additive Interference
3-4 Achievable rates for the two user multicasting channel when S_1 and S_2 are i.i.d. The x-axis is Pr(S_i = 1).
3-5 Coding for two user multicasting channel
3-6 Architecture for 2 user Binary Multicasting Channel
3-7 K user multicasting channel
3-8 Upper bound for K user multicasting channel
3-9 Improved Architecture for the 3 user channel
3-10 Optimal p, I(U; S_1, S_2, S_3) vs. q
3-11 Inner Bound, Improved Inner Bound and Outer Bound
3-12 Two User Multicasting Gaussian Dirty Paper Channel
3-13 Achievable Rates and Outer Bound for writing on two memories with defects
List of Tables

3.1 The probability distribution p(U|S_1, S_2, S_3) of the coding scheme. Here p is a parameter to be optimized.
Chapter 1
Introduction
The problem of distributing common content to several receivers is known as multicasting. It
arises in many popular applications such as multimedia streaming and software distribution.
It is pertinent to both wireless and wireline networks. Some early work [25] in multicasting
has focused on developing efficient, scalable protocols for multi-hop networks such as the
Internet and mobile ad hoc networks (MANETs). These protocols provide on-demand service
so that the entire network does not get flooded with data packets.
More recent work [18] on wireline multicasting has shown that the fundamental rate can be
increased if the intermediate nodes perform coding. Polynomial time algorithms for efficient
network coding have also been suggested [27, 15]. Yet another direction [21, 29] has been to
model the end-to-end links as erasure channels and develop codes that can be efficiently
decoded by several users experiencing different channel qualities. These codes are suitable
for multicasting over the Internet, where packet loss is the main source of error.
In this thesis, we establish some fundamental limits and propose robust architectural
solutions for multicasting in wireless networks. A signal transmitted from one node in a
wireless network is received by all other nodes surrounding it.
While this effect causes
interference if different users want different messages, it is beneficial in the multicasting
scenario. Despite this inherent advantage, the problem has not been widely explored in the
wireless setting. We are only aware of the work in [20] that studies some array processing
algorithms and fundamental limits on multicasting in wireless systems.
Multicasting is not a new application in wireless. Radio and TV broadcasting systems
are among the earliest examples of analog wireless multicasting. However, the
engineering principles that govern the design of these systems are very different from the
modern view. These networks have a powerful transmitter designed to maximize the range of
signal transmission. The transmitter is mounted on tall towers so that many receivers are in
direct line-of-sight and receive good quality signal. Unfortunately, the data rate available in
such systems is limited, and modern digital television systems use wireline solutions such as
cable TV for high bandwidth transmission. The enormous growth of wireless telephony over
the last decade has been made possible in part through the new digital wireless networks.
These networks, also known as cellular networks, divide the geographical area into smaller
cells and reuse the available spectrum in different cells so that they can support a large
number of users. Furthermore these systems do not rely on line of sight communication, but
rather exploit the structure of wireless fading channel to develop new algorithms for reliable
communication. These networks have been primarily designed for providing individualized
content to each user. However, the next generation cellular networks are expected to support
wireless data and streaming media applications where several users are interested in the
same content. As these applications are deployed, it is important to develop new efficient
architectural solutions. Clearly, if all the receivers want the same content, a system that
creates a separate copy for each receiver is far from efficient. Consider for example an indoor
sports stadium where the audience have personalized screens on which they can watch the
game more closely and listen to the commentary. A central access point provides content to
all the receivers over wireless links. Unlike the analog TV/radio multicasting systems, these
digital systems cannot rely on line of sight communications and have to combat fading. At
the same time they are not efficient if they do not exploit the fact that all the receivers
want a common message.
In Chapter 2 of this thesis we will study some architectural issues for digital multicasting
over wireless systems. The performance of these systems is often limited by the worst user
in the network. We observe that multiple transmit antennas provide substantial gains in
these systems as they greatly improve the quality of the worst channel. We also propose
a robust architecture that uses rateless erasure codes. This study provides new insights
into the cross layer design of the application and physical layers. One conclusion we draw
is that in wireless systems with rich time diversity, it is not necessarily a good idea to be
conservative in the design of the physical layer and aim for small outage probability if there
is an ARQ type protocol at the application layer.
Another scenario we study is multicasting to multiple groups of users. It is quite natural
to conceive examples where instead of a single group, there are many groups of users and
different groups of users want a different message. This scenario is a generalization of the
conventional unicasting systems and the single group multicasting system. In Chapter 2
of this thesis, we study some coding schemes for the scenario where the base station has
multiple antennas and wishes to send different messages to different groups of users. This
particular problem leads us to study a multiuser generalization of the dirty paper coding
scheme.
The classical dirty paper coding scenario has one sender, one receiver and an
additive interference known only to the sender.
It has been used to solve the problem
of unicasting from a multi-antenna transmitter [1, 36]. We are interested in the scenario
where a transmitter wishes to send a common message to many receivers and each of the
channels experiences an additive interference that is known only to the sender but not to the
receivers. The transmitter has to use the knowledge of the interference sequences so that
the resulting code is good for all the receivers simultaneously. A solution to this particular
abstraction provides an efficient solution to the problem of multicasting to multiple groups
with multiple antennas.
We devote Chapter 3 of this thesis to consider this problem in
detail. We view this problem as a multiuser generalization of the single user link studied in
[13] where the transmitter has to deal with only one interfering sequence. We refer to this
generalization as "Writing on many pieces of Dirty Paper at once". We obtain achievable
rates for a variety of channel models and prove capacity theorems for some special cases.
This scenario is rich with many open problems and we describe some of them in Chapter 4.
Chapter 2
Multicasting from a Multi-Antenna
Transmitter
In this chapter we focus on the problem of sending a common stream of information from
a base station to a large number of users in a wireless network. The scenario where one
transmitter communicates to several receivers is known as the broadcast channel problem.
A technique known as superposition coding is optimal for the Gaussian
broadcast channel when the transmitter and all the receivers have a single antenna
[5]. Recently, this result has been generalized to the case where the transmitter and receivers
have multiple antennas and each receiver wants an independent message. The solution uses
a technique known as dirty paper coding [4]. This technique was first used for the MIMO
broadcast channel in [1]. Subsequent work in [34, 35, 39] shows that this technique achieves
the sum capacity point and very recently Weingarten et al. [36] show that this scheme is in
fact optimal. However, little work has been done when there is common information to be
sent in the network. We are only aware of the work in Lopez et al. [23, 20] which studies
several schemes for multicasting common information in wireless networks.
Before designing a system for disseminating common content in a wireless network, we
must estimate the gains we expect from such architectures over existing systems that encode
several copies of the same message for different users. It is also important to understand
how these gains are affected by our design decisions. For example how do these gains change
if we decide to serve only a fraction of the best users, instead of all the users? How does
the fading environment and the number of users affect such systems? In this chapter we
seek answers to some of these questions.
Another consideration is whether the fading coefficients are known to the sender and/or
the receiver. Pilot assisted training sequences are often used so that the receivers can learn
the fading coefficients. If the receivers have perfect knowledge of the fading coefficients, the
system is called a coherent communication system; systems where neither the transmitter
nor the receiver has this knowledge are called non-coherent communication systems.
In this chapter we focus only on coherent systems. Providing the knowledge of
fading coefficients to the transmitter typically requires explicit feedback from the receivers
in FDD (frequency division duplex) systems or uplink measurements in TDD (time division
duplex) systems. This knowledge of fading coefficients is known as channel state information (CSI) at the transmitter. It is necessary to assess the value of providing CSI to the
transmitter. In this chapter we study the systems with and without transmitter CSI.
2.1 Channel Model for Multicasting
We consider a scenario in which one base station wishes to send the same message to K
different users. The base station is equipped with L antennas while each user has a single
antenna.
Thus, each receiver experiences a superposition of L different paths from the
base station. We focus on an environment that has rich enough scattering and no line-of-sight between the base-station and the receivers. Furthermore, we consider narrow-band
communication systems so that the time interval between two symbols is much larger than
the delay spread due to multipath. Under these conditions, the channel gains on each of
the L paths can be modeled as i.i.d. complex Gaussian CN(0, 1) [33]. Furthermore, it is
noted in [33] that the typical coherence time for such environments is on the order of a few
hundred symbols: the channel gains stay constant for a relatively large number of symbols.
The relative magnitude of the coherence time compared to the communication delay tolerated by the application has important implications for the fundamental data rates achieved
by the system as well as the coding schemes that have to be used to achieve these rates.
We present three different scenarios of interest. In our model we consider transmission of
codewords consisting of n symbols. We denote symbol i ∈ {1, 2, ..., n} in codeword m by
x[i; m]. We use subscript k to denote the channel model of the kth user.
(i) Slow Fading: In some applications, the delay requirements are stringent. Several
different codewords have to be transmitted within one coherence time interval. In such
situations, we assume that the channel gains are fixed once they are drawn randomly.
The overall channel model is given by
y_k[i; m] = h_k^† x[i; m] + w_k[i; m]    (2.1)
The channel gains are each drawn i.i.d. CN(0, 1), but they are fixed for subsequent
transmissions.
(ii) Block Fading: If the delay requirements are not too stringent, a block fading model
is suitable. Here the channel remains constant over a codeword transmission, but
changes independently when the next codeword is transmitted. Accordingly, the channel model is given by
y_k[i; m] = h_k^†[m] x[i; m] + w_k[i; m]    (2.2)
The channel gains are drawn i.i.d. before the transmission of each codeword and
assumed to remain fixed during the transmission of the entire codeword.
A slight generalization of the block fading channel model is one in which a codeword
spans several blocks (say T blocks). Equivalently, in this model, the channel is
constant over n/T consecutive symbols and then changes independently for the next
block of symbols.
(iii) Fast Fading: If the delay requirements are extremely relaxed, the transmitter can
perform interleaving of different codeword symbols. Accordingly, each symbol experiences a different channel gain and the channel model is given by
y_k[i; m] = h_k^†[i; m] x[i; m] + w_k[i; m]    (2.3)
In this model, the channel gains are assumed to be drawn i.i.d. before the transmission
of each symbol in each codeword.
In (2.1)-(2.3), w_k[i; m] is additive Gaussian noise CN(0, σ²) at receiver k when symbol i
of codeword m is received. These equations explicitly reveal how the channel gain changes
from symbol to symbol and codeword to codeword.
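To make the three models concrete, the following sketch (illustrative only, not from the thesis) draws the channel gains at the three time scales and forms the received symbols according to (2.1)-(2.3). The dimensions L, n, M and the noise variance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

L, n, M = 4, 100, 50           # antennas, symbols per codeword, number of codewords
sigma2 = 1.0                   # noise variance (arbitrary)

def cn(shape):
    """i.i.d. circularly symmetric complex Gaussian CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

x = cn((M, n, L))                      # transmitted symbols across the L antennas
w = np.sqrt(sigma2) * cn((M, n))       # additive noise w_k[i; m]

# (i) Slow fading, eq. (2.1): one draw of h, fixed for every codeword.
h_slow = cn(L)
y_slow = x @ h_slow.conj() + w

# (ii) Block fading, eq. (2.2): fresh draw per codeword m, constant within it.
h_block = cn((M, L))
y_block = np.einsum('ml,mil->mi', h_block.conj(), x) + w

# (iii) Fast fading, eq. (2.3): fresh draw for every symbol i of every codeword m.
h_fast = cn((M, n, L))
y_fast = np.einsum('mil,mil->mi', h_fast.conj(), x) + w
```

With unit-power antennas, the average received power in the fast fading case is E|h^† x|² + σ² = L + σ², which the simulation reproduces.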
The choice of a good performance criterion for the multicasting system is intimately
related to the fading channel model. In the slow fading scenario we can order the users
based on their channel gains. However such an ordering cannot be done in the fast fading
model since the channel changes independently with each transmitted symbol. All users
have statistically identical channels. In such a scenario, it is most appropriate to focus on
the ergodic capacity of the channel. Since each user has the same ergodic capacity, it is
clear that this rate is achievable for multicasting. The ergodic capacity of such a channel is
given by [32]
C = E[log(1 + (ρ/L) h^† h)]    (2.4)

where ρ is the input SNR.
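Since h^† h is a sum of L i.i.d. unit-mean exponentials (a Gamma(L, 1) variable), the ergodic capacity (2.4) is easy to estimate by Monte Carlo. The sketch below is illustrative only; the SNR value and trial count are arbitrary, and rates are computed in bits (log base 2).

```python
import numpy as np

rng = np.random.default_rng(1)

def ergodic_capacity(L, snr, trials=200_000):
    """Monte Carlo estimate of E[log2(1 + (snr/L) * h^†h)] for i.i.d. CN(0,1) gains."""
    g = rng.gamma(shape=L, scale=1.0, size=trials)   # h^†h ~ Gamma(L, 1)
    return np.mean(np.log2(1.0 + (snr / L) * g))

# Every user sees the same ergodic capacity, so it is a multicast-achievable rate.
c1 = ergodic_capacity(L=1, snr=10.0)
c4 = ergodic_capacity(L=4, snr=10.0)
```

At the same total SNR, more antennas concentrate the effective gain around its mean, which raises E[log(·)] since log is concave; hence c4 exceeds c1.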
In the slow fading scenario, one obvious choice is to serve all the users. In this case
the system is limited by the worst user in the network. Such a choice may be necessary
if all the users have subscribed to a service.
On the other hand in some applications it
might be worthwhile not to serve some of the weakest users. This may be necessary in
applications, such as streaming, that require the stream to be transmitted at a fixed
rate.
In the block fading scenario, it is not advisable to serve the weakest user in each
block, since the ordering of users changes from block to block. Instead it is attractive to
exploit the time diversity available in these systems to improve the overall throughput. In
the following sections we discuss achievable rates based on these choices.
2.2 Serving all Users - No Transmitter CSI
As discussed before, a key factor that affects system design in slow fading is the availability
of channel state information (CSI) at the transmitter. In this section we are concerned with
the scenario where the transmitter has no knowledge of the channel. This happens if there
is no reverse path from the receivers to the transmitter or if the delay constraints prohibit
the transmitter from waiting for measurements from the receivers. Since the transmitter does
not know the channel gains, we typically design a code for some fixed rate R. A common
figure of merit is the so-called outage probability, which we discuss next.
2.2.1 Single User Outage
We begin by defining the ε-outage probability for a single user link as in [26].
Definition 1: The ε-outage probability for a fixed rate R is given by

ε = min_{p(x): E[x^† x] ≤ P} Pr(I(x; y, h) < R)    (2.5)
In the above definition, P is the power constraint and I(·; ·) is the standard mutual
information between two random variables. Note that h is explicitly included along with
y to stress that the receiver, but not the transmitter, has knowledge of the channel vector.
For Gaussian channels, we have

max_{p(x): E[x^† x] ≤ P} I(x; y, h) = max_{Λ_x: tr(Λ_x) ≤ P} log(1 + h^† Λ_x h / N_0)

It is conjectured in [32] that for low outage the optimal Λ_x = (P/L) I_L, and accordingly (2.5) reduces to

ε = Pr(log(1 + (ρ/L) h^† h) < R)

Here ρ = P/σ² is the input SNR of the system. Since h^† h ~ (1/2) χ²_{2L}, under the assumption of small values of ε and high SNR it can be shown that the outage probability is given by [38, 24]

ε ≈ (2^R − 1)^L L^L / (L! ρ^L)    (2.6)

2.2.2 Outage in Multicasting
In multicasting, we declare an outage if any user fails to decode. This is clearly an extremely
stringent requirement. Nevertheless, even in this case, significant gains can be achieved
by using a modest number of transmit antennas. If the transmitter fixes a data rate R, the
resulting outage probability is given by:
ε_M = Pr(min{I_1, I_2, ..., I_K} < R)

Here I_j = I(x; y_j, h_j) is the mutual information between the transmitter and user j.
Since we assume i.i.d. channel gains and ε ≪ 1, we have

ε_M = 1 − (1 − ε)^K ≈ Kε    (2.7)
[Figure 2-1 plot: outage probability as a function of input SNR (dB) for R = 0.1 b/sym and R = 0.5 b/sym, with L = 2 and L = 4 antennas.]
Figure 2-1: Outage Probability for a Single User in MISO system
Combining equations (2.6) and (2.7) we get
ε_M ≈ K (2^R − 1)^L L^L / (L! ρ^L)    (2.8)

Note that if we hold R and ε_M constant, then K/ρ^L remains constant. Thus, doubling the
SNR increases the number of users the system can support at the same ε_M by a factor of
2^L. This leads us to the following conclusion.

Claim 1: For a system with L transmitter antennas, operating at a fixed rate R bits/symbol
in the high SNR regime and with small outage, every 3 dB increase in SNR increases the
number of multicasting users that can be supported by a factor of 2^L.
This result essentially says that if we are designing a system that operates with a low
outage probability, then we can serve many more users simultaneously by a modest increase
in the SNR. It is a consequence of the well known fact that the outage probability of a
multi-input single-output (MISO) system decays as SNR^{−L} in the high SNR regime. From
(2.7), it is clear that if we want to achieve a certain ε_M for the system, the outage probability
for each user must satisfy ε = ε_M/K. Accordingly, the additional SNR should drive the
outage for each user from ε_M to ε_M/K. This reduction in outage can be achieved by a
modest increase in SNR, thanks to the steepness of the waterfall curve. See Figure 2-1.
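As an illustrative numerical check (a sketch, not part of the thesis), the code below estimates the single-user outage by Monte Carlo, compares it with the small-outage approximation (2.6), and then verifies the multicast relation (2.7) and the 2^L-users-per-3-dB rule of Claim 1. All parameter values are arbitrary.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(2)

L, R = 2, 0.5                         # antennas, target rate (b/sym); arbitrary
snr = 10 ** (15 / 10)                 # 15 dB

# Single-user outage by Monte Carlo: Pr(log2(1 + (snr/L) h^†h) < R), h^†h ~ Gamma(L, 1)
g = rng.gamma(shape=L, scale=1.0, size=2_000_000)
eps_mc = np.mean(np.log2(1.0 + (snr / L) * g) < R)

# Small-outage / high-SNR approximation (2.6): eps ≈ (2^R - 1)^L L^L / (L! snr^L)
eps_ap = (2.0**R - 1.0) ** L * L**L / (factorial(L) * snr**L)

# Multicast outage (2.7): with K i.i.d. users, eps_M = 1 - (1 - eps)^K ≈ K * eps
K = 100
eps_m_exact = 1.0 - (1.0 - eps_ap) ** K
eps_m_approx = K * eps_ap

# Claim 1: eps scales as snr^{-L}, so at a fixed rate and outage target,
# doubling the SNR (+3 dB) multiplies the number of supportable users by 2^L.
def users_supported(snr, eps_target):
    per_user = (2.0**R - 1.0) ** L * L**L / (factorial(L) * snr**L)
    return eps_target / per_user

ratio = users_supported(2 * snr, 1e-2) / users_supported(snr, 1e-2)   # ≈ 2^L
```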
2.3 Serving all Users - Perfect Transmitter CSI
In this section, we examine the other extreme when the transmitter has perfect knowledge
of each channel. There are two possible uses of this channel knowledge:
(i) Smart array processing to maximize the throughput.
(ii) Selecting a rate which all the users can decode.
The use of channel knowledge in array processing for multicasting is studied in [20]. It
is shown that beamforming is optimal for the special case when there are either one or two
receivers. However, beamforming is not optimal in general, and a technique called space-time
multicast coding is proposed and shown to be optimal for any number of users. The
capacity of the multicast network with transmitter knowledge derived in [20] is
C = max_{Λ_x: tr(Λ_x) ≤ P} min_{i∈{1,2,...,K}} log(1 + h_i^† Λ_x h_i / N_0)    (2.9)
The numerical optimization of Λ_x can be performed efficiently, since the problem is a
convex optimization problem. However, the answer depends on the specific realization of
the channel vectors and in general is not amenable to analysis. As K → ∞, it is natural
to expect that Λ_x → (P/L) I_L, where I_L is the L × L identity matrix. The intuition is that as
the number of users becomes large, there is no preferred direction and hence it is favorable
to use an isotropic code. We primarily consider the scenario of a large number of users
and use Λ_x = (P/L) I_L. In this limit, the channel knowledge is not helpful in performing array
processing. We have
R = min_{i∈{1,2,...,K}} log(1 + (ρ/L) h_i^† h_i) = log(1 + (ρ/L) min_{i∈{1,2,...,K}} h_i^† h_i)    (2.10)

We set g = min_{i∈{1,2,...,K}} h_i^† h_i. Its distribution can be easily calculated from the distribution of h and is stated in the following proposition.
Proposition 1: The probability distribution function of g = min_{i∈{1,2,...,K}} h_i^† h_i is given by

f_g(x) = K f_{h†h}(x) (1 − F_{h†h}(x))^{K−1}

where f_{h†h}(·) and F_{h†h}(·) are the probability density and cumulative distribution functions
of h^† h, respectively.
Proof:

Pr(g ≥ G) = Pr(‖h_1‖² ≥ G, ‖h_2‖² ≥ G, ..., ‖h_K‖² ≥ G)
          = Pr(‖h_1‖² ≥ G)^K    (since all the channels are i.i.d.)
          = (1 − F_{h†h}(G))^K

Differentiating both sides, we get the desired result. ∎
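Proposition 1 is easy to check numerically in its survivor form, Pr(g ≥ G) = (1 − F_{h†h}(G))^K. For L = 2 antennas, h^† h ~ Gamma(2, 1) with CDF F(x) = 1 − e^{−x}(1 + x). The sketch below (illustrative; K, G and the trial count are arbitrary) compares the formula against simulation.

```python
import numpy as np

rng = np.random.default_rng(3)

L, K, trials = 2, 50, 100_000

# Each user's gain h_i^†h_i ~ Gamma(L, 1); g is the minimum over the K users.
g = rng.gamma(shape=L, scale=1.0, size=(trials, K)).min(axis=1)

def F_hth(x):
    """CDF of h^†h for L = 2 antennas: 1 - e^{-x}(1 + x)."""
    return 1.0 - np.exp(-x) * (1.0 + x)

G = 0.1
emp = np.mean(g >= G)              # empirical Pr(g >= G)
thy = (1.0 - F_hth(G)) ** K        # Proposition 1 in survivor form
```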
Using results from order statistics, we show in Appendix B that as K → ∞, g =
min_i h_i^† h_i decays as O(K^{−1/L}), where L is the number of antennas. This leads to the following
proposition.

Proposition 2: For fixed ρ, as the number of users K → ∞, the common rate R which
all the receivers can decode decays as O(K^{−1/L+δ}), where L is the number of antennas and
δ > 0 can be made arbitrarily small.
Proof: From (2.10), we have

R = min_{i∈{1,2,...,K}} log(1 + (ρ/L) h_i^† h_i)
  ≈ (ρ/L) min_{i∈{1,2,...,K}} h_i^† h_i    (for large K, we use the linear approximation)
  = O(K^{−1/L+δ})    (see Appendix B)
∎
In Figure 2-2, we numerically calculate the quantity E[R] using Proposition 1 and plot
the product K · E[R] as a function of K. Note that Proposition 2 suggests that as the
number of users K becomes large, the common rate R approaches a non-random quantity.
However, for any finite K, R is still a random variable, and hence we use the average value
of R to observe its decay rate. We see qualitatively from the graph that for L = 1 the
average value of R decays as 1/K (since K · E[R] approaches a constant as K increases),
but for L ≥ 2 the decay rate of R is much slower (since K · E[R] increases with K). Hence,
19
KE[R] for M=1,2,3,4 antennas for k=5-200 users
150
100
-
-e- M=1
-*-+------
M=2
M=3
M=4
M=5
M=6
500
E
-
10
50
Number of Users (K)
100
150
200
Figure 2-2: K - E[R] vs. K for L=1,2 ... 6 antennas. We need to take the expectation over
R, since we are considering a finite number of users.
multiple antennas provide substantial gains in the achievable rate, analogous to the gains
we observed in the outage probability in the previous section.
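The scaling behavior behind Figure 2-2 can be reproduced with a small simulation. The following sketch is illustrative (it is not the thesis's code, and ρ = 1 is an arbitrary choice); it draws ||h_i||² ~ Gamma(L, 1) and shows that K · E[R] is roughly flat for L = 1 but grows with K for L = 2.

```python
import math, random

random.seed(2)
rho, trials = 1.0, 4000

def avg_rate(K, L):
    """Monte Carlo estimate of E[R] with R = log(1 + (rho/L) min_i ||h_i||^2)."""
    total = 0.0
    for _ in range(trials):
        g_min = min(random.gammavariate(L, 1.0) for _ in range(K))
        total += math.log(1.0 + rho / L * g_min)
    return total / trials

scaled = {L: [K * avg_rate(K, L) for K in (50, 200)] for L in (1, 2)}
print(scaled)   # L=1: roughly constant in K; L=2: grows with K
```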
2.3.1 Naive Time-Sharing Alternatives
We now compare the achievable rate of the previous section to naive schemes that do not exploit the fact that all the users want a common message. The achievable rates from these schemes, in comparison to Proposition 2, will indicate the value of exploiting common information in the system. The scheme described in Proposition 2 requires that all users simultaneously listen to the transmission and is limited by the worst user. Suppose we "pretend" that each user wants a different message and then perform resource allocation to ensure that all users get the same amount of data. We consider two such schemes and show that both these schemes incur a significant loss in the achievable rate compared to Proposition 2.¹

¹Note that time-sharing is in general not optimal for broadcast channels with independent messages. Ideally we should do dirty paper coding. However, we limit ourselves to time-sharing type strategies, since they are easy to analyze and it is known that TDMA is not too bad compared to dirty paper coding [16].
(i) Allocation based on time-slots: Users are divided up into time-slots, so that only one user is active in each time-slot. The rate of the active user in a particular time-slot is given by R_i = log(1 + ρ h_i† h_i), where h_i is the received channel vector of the active user. Since all the users need to get the same message eventually, we need to allocate the time-slots inversely proportional to their rates. This can be expressed mathematically as:

    α_1 R_1 = α_2 R_2 = ... = α_K R_K

such that Σ_{i=1}^K α_i = 1. This gives

    R_ts,1 = 1 / (1/R_1 + 1/R_2 + ... + 1/R_K) = (1/K) H(R_1, ..., R_K) ≤ (1/K²) Σ_{i=1}^K R_i    (2.11)

In (2.11), H denotes the harmonic mean of R_1 ... R_K. Using the fact that as K → ∞, (1/K) Σ_{i=1}^K R_i → E[log(1 + ρ h†h)], it follows that R_ts,1 is at most (1/K) E[log(1 + ρ h†h)] for large K. Note that the term inside the logarithm increases with the number of antennas L, but this is only a secondary effect. The rate decreases as 1/K for any number of transmit antennas, in contrast to Proposition 2 where the rate decreases as (1/K)^{1/L}.
(ii) Allocation based on power-levels: Users are each assigned an equal number of time-slots but the power level assigned to each user is inversely proportional to the channel strength. Accordingly, we have that

    α_1 ||h_1||² = α_2 ||h_2||² = ... = α_K ||h_K||²

where the average power constraint requires (1/K) Σ_{i=1}^K α_i = ρ. This gives

    R_ts,2 = (1/K) log(1 + Kρ / (1/||h_1||² + 1/||h_2||² + ... + 1/||h_K||²))

Since log(1 + 1/x) is a convex function in x, it follows that

    R_ts,2 ≤ (1/K²) Σ_{i=1}^K log(1 + ρ h_i† h_i)

As K → ∞, it is clear that R_ts,2 ≤ (1/K) E[log(1 + ρ h†h)]. Again the rate achieved through this scheme decreases as 1/K for any number of transmit antennas.
Note that in these schemes, we do not have to transmit at the rate of the worst user
all the time. Despite this seeming advantage, these schemes perform poorly. The intuition
behind the poor performance of the time-sharing based schemes is that we have to allocate
more resources to the worst user (in terms of time-slots or power levels). It hurts if the
users do not listen in all time-slots. Thus significant gains can be achieved by designing
appropriate protocols for multicasting.
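The gap between the common-message rate of (2.10) and the two time-sharing schemes can be made concrete with a quick Monte Carlo comparison. This sketch is ours (K = 100, L = 2, ρ = 1 are arbitrary illustrative choices); the channel gains ||h_i||² are drawn as Gamma(L, 1).

```python
import math, random

random.seed(3)
K, L, rho, trials = 100, 2, 1.0, 1000

r_common = r_ts1 = r_ts2 = 0.0
for _ in range(trials):
    g = [random.gammavariate(L, 1.0) for _ in range(K)]   # ||h_i||^2 per user

    # common message with an isotropic input (2.10): worst user limits the rate
    r_common += math.log(1.0 + rho / L * min(g))

    # scheme (i): time-slots inversely proportional to individual rates, (2.11)
    rates = [math.log(1.0 + rho * gi) for gi in g]
    r_ts1 += 1.0 / sum(1.0 / r for r in rates)

    # scheme (ii): equal slots, power inversely proportional to channel strength
    common_snr = K * rho / sum(1.0 / gi for gi in g)
    r_ts2 += math.log(1.0 + common_snr) / K

r_common, r_ts1, r_ts2 = (v / trials for v in (r_common, r_ts1, r_ts2))
print(round(r_common, 4), round(r_ts1, 4), round(r_ts2, 4))
```

Even at moderate K the common-message rate is several times larger than either time-sharing rate, consistent with the 1/K versus (1/K)^{1/L} scaling above.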
2.4 Serving a Fraction of Users
In this section, we relax the requirement that all the users have to be served and find the achievable rate when a fraction α of the weakest users is not served. We observed in the previous section that the rate decreases as O(K^{-1/L}) if all the users have to be served. How does the rate improve if we decide not to serve a certain fraction of the weakest users? In order to perform analysis and develop some insights, we again consider the limit of a large number of users. Let Γ_eff be the effective target SNR with the corresponding rate R = log(1 + ρ Γ_eff). An outage event occurs with probability

    α = Pr(||h||²/L < Γ_eff)    (2.12)
The usual notion of outage on a single user link is the fraction of time the channel is not strong enough to support a given rate. If the channel is in outage, the receiver cannot decode at all, and hence we would like this probability to be as small as possible. In the multicasting setting, outage has an alternative interpretation: it is the fraction of the weakest users who are not served. Thus in multicast there is no reason to restrict attention to small values of outage; instead, the outage probability lets us study the tradeoff between the fraction of weakest users that are not served and the rate that can be given to the other users.
Since E[||h||²] = L, the relation between the effective channel gain Γ_eff and the outage probability α is given by the following expression:

    1 - α = 1 - F_{||h||²}(L Γ_eff) = e^{-L Γ_eff} (1 + L Γ_eff + (L Γ_eff)²/2 + ... + (L Γ_eff)^{L-1}/(L-1)!)    (2.13)
In general, the above equation does not have an explicit solution for Γ_eff; however, we can consider the extreme cases:

(i) α → 0: In this case, we expect Γ_eff to be small since we are serving almost all users. Using L Γ_eff ≪ 1, we have from (2.13), Γ_eff ≈ (α L!)^{1/L} / L.

(ii) α → 1: In this case, we expect Γ_eff to be large, since we are serving only the best users. Accordingly, we have from (2.13) that 1 - α decays as e^{-L Γ_eff} up to polynomial factors in Γ_eff, and hence²

    Γ_eff ≈ -log(1 - α) / L    (2.14)
Equation (2.13) suggests that a large number of antennas actually hurts the performance when we aim to serve only the strongest users (α → 1). The intuition here is that when we are serving the strongest users, the more favorable distribution of the channel gains must have long tails at the right extreme. By having many antennas we average the path gains to each receiver and hence the extreme tails decay faster in both directions (see Figure 2-3). This feature helps if we decide to serve most of the users but is not desirable if we decide to serve only the best users. Note however that this observation comes with a caveat. We assume that the transmitter uses a spatially isotropic code for serving (1 - α)K users in the network. For values of α close to 1, in practice we are serving only a finite number of users. In this case a spatially isotropic code (i.e. Λ_x = (1/L) I_L) should not be used. It is worthwhile for the transmitter to learn the channel gains of the best users and do beamforming or the space-time multicast coding proposed by Lopez [20]. In other words, if we want to serve a fixed number of the best users then multiple antennas can again be useful, albeit for different reasons.
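Equation (2.13) is easy to invert numerically. The bisection sketch below is ours (illustrative values of α and L); it computes the effective gain Γ_eff supported when a fraction α of the weakest users is dropped, and reproduces the trend just discussed: more antennas help when α is small but hurt when α → 1.

```python
import math

def outage(gamma_eff, L):
    """alpha in (2.13): Pr(||h||^2 < L * Gamma_eff) with ||h||^2 ~ Gamma(L, 1)."""
    x = L * gamma_eff
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(L))

def gamma_for_outage(alpha, L):
    """Invert (2.13) for Gamma_eff by bisection (outage is increasing in gamma)."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if outage(mid, L) < alpha else (lo, mid)
    return (lo + hi) / 2

for alpha in (0.01, 0.99):
    print(alpha, {L: round(gamma_for_outage(alpha, L), 3) for L in (1, 2, 8)})
```

For L = 1, (2.13) reduces to 1 - α = e^{-Γ_eff}, so the solver can be checked exactly against Γ_eff = -log(1 - α).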
2.5 Multicasting Using Erasure Codes
Our treatment so far has not addressed the important question of how large an outage the system must tolerate. An obvious tendency is to keep the outage probability small. Such

²Here the approximation f(b) ≈ g(b) is in the sense that lim_{b→∞} f(b)/g(b) = 1.
Figure 2-3: The pdf of Γ for L = 2, 8. The mean of both distributions is the same, but the pdf for L = 8 has much shorter tails. This helps when we aim for low values of Γ, but hurts when we aim for very high values of Γ.
Figure 2-4: A time division protocol for serving 4 groups of users. Each group has multiple users that want the same content. The channel coherence time is large enough that the channel gain is constant in a given block, but small enough that successive blocks to a given group have independent gains.
a design is however conservative, as the corresponding target rate can be very small. When the channel is good, we can decode much more information than the conservative rate. If a system designer has the choice to pick the outage probability, is it better to pick 1% or 10%? Such a choice is important, as different outages imply different achievable rates. In order to get some insights into this particular problem, we need to pose the question in a broader context and consider the overall system architecture. A bigger picture that involves cross-layer design is necessary, as it lends some important insights on the interaction between the physical layer and higher layer protocols.

The application we consider is distributing common content to a group of users from a wireless access point. The access point serves many such groups in a round-robin fashion as shown in Figure 2-4. Due to the delay between successive periods in which a particular group is served, the block fading scenario (2.2) is a suitable model. Each user experiences
an i.i.d. Rayleigh fading channel which stays constant within a given block and changes independently in the next block. It is well known in the information theory literature [33] that the optimal encoding technique over a block fading channel is to jointly code over a very large number of blocks. The receiver knows the fading coefficients and incorporates this knowledge in maximum likelihood decoding, and this scheme achieves the ergodic capacity (2.4). Even though this scheme is information theoretically optimal, there are many issues that limit its use in practice. The physical layer implementation requires practical error correction codes that can be decoded at variable SNR. Unlike the AWGN case, practical capacity-approaching codes over these channels suffer from complexity constraints. Perhaps more serious is the fact that the code we use is a fixed block length code and there is a finite error exponent associated with the code even if we code over a very large number of blocks. In order to deal with an error event at the physical layer, a higher layer protocol has to be implemented. This can take one of the following forms:
" Automatic Repeat Request (ARQ): If there is only a single receiver a particularly
simple feedback based scheme can be used to deal with the physical layer outage. If the
receiver is not able to decode at the end of the transmission, it sends a retransmission
request to the sender. While such a feedback cannot increase the information theoretic
rate, it is known to improve the error exponent [813.
"
Forward Error Correction(FEC): When there are many receivers, the ARQ protocol cannot be used because different users lose different symbols. One approach is
to use an erasure code as an outer code. The original source file is first converted into
erasure symbols each of which is then encoded at the physical layer using a suitable
channel code.
The receivers can recover the original source file if they are able to
receive a sufficient number of the erasure code symbols.
While a significant amount of literature has been devoted to ARQ based schemes on block fading channels (see [22] and references therein), relatively little work has been done, to our knowledge, on FEC based schemes. The main problem is that traditional erasure codes are block codes and need to be designed for a specific rate which has to be chosen a priori [8]. Hence the problem of outage is not completely solved, since there is still a chance that some users with weak channels cannot decode even after the FEC code is used. Fortunately,

³This particular analysis assumes perfect feedback with no errors.
an elegant solution to this problem is to use an incremental redundancy rateless erasure code. These codes, also known as fountain codes, were suggested in [21], [29] for multicasting over the internet. Erasure symbols are generated from the source file in real time as long as any of the receivers are still listening. Unlike the traditional block erasure codes, we do not need knowledge of the erasure probability to generate these symbols. A receiver can recover the original file after it has collected a sufficient number of symbols. Thus, rateless codes allow variable rate decoding. A receiver is "online" until it collects a specific number of erasure symbols and then recovers the original file. Using the rateless erasure code as an outer code preserves the advantages of the ARQ based system without requiring feedback of which specific symbols were lost.
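As a toy illustration of the rateless idea (this is our sketch, not the LT/Raptor constructions of [21], [29], which use sparse degree distributions to get near-linear decoding), each output symbol below is the XOR of a random subset of source bits, and a receiver decodes by Gaussian elimination over GF(2) once it has collected slightly more than the source size, regardless of which particular symbols were erased.

```python
import random

def encode_symbol(source_bits, rng):
    """One rateless symbol: XOR of a uniformly random subset of the source bits.
    The mask stands in for the seed a real fountain code shares with receivers."""
    n = len(source_bits)
    mask = rng.getrandbits(n) or 1          # avoid the useless all-zero mask
    value = 0
    for i in range(n):
        if (mask >> i) & 1:
            value ^= source_bits[i]
    return mask, value

def decode(symbols, n):
    """Gaussian elimination over GF(2); returns the source bits once the
    collected symbols have rank n, else None (keep listening)."""
    pivots = {}                              # leading bit position -> (mask, value)
    for mask, value in symbols:
        while mask:                          # reduce against existing pivot rows
            pos = mask.bit_length() - 1
            if pos not in pivots:
                pivots[pos] = (mask, value)
                break
            pmask, pval = pivots[pos]
            mask, value = mask ^ pmask, value ^ pval
    if len(pivots) < n:
        return None
    bits = [0] * n
    for pos in sorted(pivots):               # back-substitute from the lowest pivot up
        mask, value = pivots[pos]
        for i in range(pos):
            if (mask >> i) & 1:
                value ^= bits[i]
        bits[pos] = value
    return bits

rng = random.Random(4)
source = [rng.randrange(2) for _ in range(32)]
collected, out = [], None
while out is None:
    sym = encode_symbol(source, rng)
    if rng.random() < 0.3:                    # this receiver loses 30% of symbols
        continue
    collected.append(sym)
    out = decode(collected, 32)
print(out == source, len(collected))          # True, with only a few extra symbols
```

The key property is visible in the loop: the transmitter never needs to know the erasure probability, and the receiver simply stays online until the collected symbols have full rank.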
2.5.1 Architecture Using Erasure Codes
We now describe an architecture that uses a rateless erasure code as an outer code. Each erasure symbol is sent over the channel using a standard channel code. In order to allow successful implementation of the channel code, the erasure symbol must be over a large alphabet so that it provides a large number of information bits. An important consideration is how many channel blocks (K) should be used for transmitting each symbol. The case of large K is analogous to the ergodic capacity achieving scheme we discussed earlier. It is however of practical interest to consider the case of K = 1. In this case we can use existing good codes for the AWGN channel for channel coding. The idea is to fix a data rate R which depends on the alphabet size of the erasure code. This would in turn result in an outage probability ε. The average throughput achieved in this scheme is (1 - ε)R. Optimization of the average throughput yields the best possible tradeoff between using a large data rate per erasure symbol and producing a small outage probability at the physical layer. The optimal value of ε that maximizes this average throughput would answer the question we posed in the first paragraph of this section as to what is a reasonable value for the outage probability in designing systems. We now describe this particular architecture in detail and calculate the optimal value of ε. Our focus is initially on the case when K = 1. In each block we send one erasure symbol encoded using a standard AWGN code. The case of large K will be dealt with subsequently.
(i) The transmitter converts the file into a very large number of erasure symbols, each consisting of nR information bits (n is the size of a block), using a rateless erasure code. The erasure code symbols⁴ can also be generated in real time. The encoding and decoding complexity of these codes is near linear.

(ii) In each block, the transmitter attempts to transmit one erasure symbol. It encodes this symbol using a suitable AWGN channel code at rate R.

(iii) Each receiver then tries to decode the packet received in each block. If its instantaneous mutual information in the block is higher than nR, it succeeds and the packet is decoded and stored. Otherwise an error is declared and the packet is discarded.

(iv) When a receiver obtains enough packets, it is able to decode the original file using the decoding algorithm for the rateless erasure code. If the original source file has nT bits then ⌈T/R⌉ + δ symbols are sufficient to recover the original file, where δ is a small constant depending on the choice of a particular code.
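Steps (i)-(iv) are straightforward to simulate. The sketch below is ours (single transmit antenna; ρ, R and T are arbitrary illustrative values, and the small constant δ is ignored): it counts fading blocks until a receiver has collected ⌈T/R⌉ packets and checks that the long-run throughput matches (1 - ε)R, with ε the per-block outage probability of (2.15).

```python
import math, random

random.seed(5)
rho, R, T = 10.0, 2.0, 1000.0     # SNR, nats per block per use, source size
need = math.ceil(T / R)           # packets required to recover the file

def blocks_until_done():
    """Steps (iii)-(iv): a block delivers its symbol iff log(1 + rho|h|^2) >= R."""
    got = blocks = 0
    while got < need:
        blocks += 1
        h2 = random.expovariate(1.0)          # |h|^2 for Rayleigh fading, L = 1
        if math.log(1.0 + rho * h2) >= R:
            got += 1
    return blocks

trials = 200
avg_blocks = sum(blocks_until_done() for _ in range(trials)) / trials
throughput = T / avg_blocks

eps = 1.0 - math.exp(-(math.exp(R) - 1.0) / rho)  # (2.15) specialized to L = 1
print(round(throughput, 3), round((1.0 - eps) * R, 3))   # close agreement
```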
Note that this particular architecture assumes feedback from the receiver: the transmitter has to know whether any receiver is listening to its packets. In practice there is always a handshaking protocol and session establishment between the receiver and the transmitter, so this requirement is naturally satisfied in most systems. There are many practical observations that favor such an architecture.
" Robustness to Channel Modeling: The architecture we presented is adaptive to
varying channel conditions. If the channel is weak, the user has to wait longer until it
receives enough erasure symbols. Conversely if the channel is strong the waiting time
is short. By overcoming the problem of outage experienced in fixed block codes, this
scheme is far less sensitive to channel modeling. The choice of channel model does
affect the optimal outage probability we aim in each block, but this mismatch is not
detrimental to the performance of the system.
* Computational Complexity: There is a useful separation between generating the
erasure code symbols and AWGN codewords. The time as well as complexity requirements for encoding and decoding of erasure symbols are near linear. Moreover efficient
4
Note that the erasure code symbols has to be over a very large alphabet since each such symbol must
provide nR information bits to the inner AWGN code. From now on, we will refer to these as erasure
symbols with the understanding that they are over large alphabets.
27
iterative algorithms for decoding the inner AWGN code are now widely available as
well. On the other hand if one were to directly encode the source file for a block fading
channel code the practical implementations are not as efficient, as discussed earlier.
* Better Error Exponent: In a fixed block length erasure code was used there are
two sources error (i) error event in decoding the inner AWGN code (ii) error event in
the outer erasure code when sufficient erasure symbols are not received. By using a
rateless code, and assuming that the receiver has a perfect feedback channel to indicate
when it is done, we eliminate the second event and improve the error exponent 5. This
type of improvement is analogous to the improvement in the ARQ based schemes [9],
[6](pg. 201) for single user links.
2.5.2 Analysis of the Achievable Rate
We now present some analysis of the achievable rates using the architecture we just described. We model the channel of each user as an independent block fading Rayleigh channel. The transmitter has L antennas while each receiver has one antenna. Suppose we decide to send an erasure symbol with nR information bits in each block. The probability of erasure depends on the choice of R. Let ε denote the probability that a packet gets lost for any given user. The average throughput for each user is then given by C = (1 - ε)R. We choose the optimal R that maximizes C. The probability of outage is given by:⁶

    ε = Pr(log(1 + ||h||² ρ/L) < R)
      = Pr(||h||² < G_eff)
      = 1 - e^{-G_eff} (1 + G_eff + G_eff²/2 + ... + G_eff^{L-1}/(L-1)!)    (2.15)

Here G_eff = L(e^R - 1)/ρ refers to the effective channel gain that we aim for based on our choice of R. If we aim for the ergodic capacity, for example, then G_eff = 1. The overall throughput is given

⁵Note that in practice it is not true that the backward channel is perfect. There is always going to be some error. In both slow fading and fast fading, we can show that the error exponent still improves with feedback.

⁶In this section we take logarithms to the base e for simplicity of calculations.
by

    C(G_eff) = e^{-G_eff} (1 + G_eff + G_eff²/2 + ... + G_eff^{L-1}/(L-1)!) log(1 + ρ G_eff / L)    (2.16)

To develop insight into the effect of multiple antennas on the throughput C, we consider the cases of low SNR and high SNR systems.
Low SNR

As the SNR ρ → 0, we can make the following simplifications:

(i) C_erg ≈ ρ

(ii) G_eff ≈ LR/ρ = L R/C_erg

(iii) log(1 + ρ G_eff / L) ≈ ρ G_eff / L

Accordingly, (2.16) simplifies to

    C(G_eff)/C_erg ≈ e^{-G_eff} (1 + G_eff + G_eff²/2 + ... + G_eff^{L-1}/(L-1)!) · G_eff/L
Figure 2-5(a) shows the optimal value of the normalized throughput C/C_erg as a function of the number of antennas, while Figure 2-5(b) shows the corresponding outage probability. For L = 1, we see that the optimal G_eff = 1, C/C_erg = 1/e and the corresponding outage probability is ε = 1 - 1/e. Thus the optimal system aims for a large outage probability. As the number of transmit antennas is increased, Figure 2-5(a) shows that the optimal value of R decreases initially, reaches a minimum around L = 6 and then slowly increases. The intuition behind this behavior is that the tails of the distribution function of ||h||² become sharper (cf. Figure 2-3). Consequently, by decreasing R we can decrease the outage substantially, and the overall effect is a net increase in the throughput as seen in Figure 2-5(a).
Figure 2-5: Low SNR analysis of erasure code based multicasting. (a) The optimal value of C/C_erg (normalized throughput) as a function of the number of antennas and the corresponding optimizing value of R/C_erg (normalized target rate). (b) The optimal outage probability as a function of the number of antennas. (c) C/C_erg as a function of R/C_erg for L = 1, 4, 10 transmit antennas.

Even though the optimal throughput increases with the number of antennas, one has to be careful in interpreting these results. We plot the function C as a function of R/C_erg in Figure 2-5(c) for L = 1, 4, 10 antennas; note that Figure 2-5(a) only plots the peak values. These plots reveal that even though higher overall gains are achievable with more antennas, the function C(·) becomes more peaky as L increases. One has to operate very close to the optimal R/C_erg as L increases. Since we are operating in the low SNR regime, this requires us to design and use strong AWGN codes for specific low rates. This may not be possible in practice, and Figure 2-5(c) shows that there is a large penalty in the throughput of a multiple antenna transmitter if one operates at rates not close to the optimal. Given a code that operates at a certain rate in the low SNR regime, one has to select the matching number of antennas to optimize the throughput. From Figure 2-5(c) we see that for R > 1.5 C_erg, it is better to have a single transmit antenna rather than 4 or more antennas. Thus one has to be careful in interpreting the performance gains from using multiple antennas.
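The low-SNR optimization can be reproduced in a few lines. The following illustrative sketch (ours, with an arbitrary grid search) maximizes the normalized throughput e^{-G}(Σ_{k<L} G^k/k!)·G/L over G. For L = 1 it recovers G_eff = 1, C/C_erg = 1/e and ε = 1 - 1/e, and the optimal normalized throughput grows with L, as in Figure 2-5(a).

```python
import math

def norm_throughput(G, L):
    """Low-SNR limit of (2.16) divided by C_erg = rho."""
    return math.exp(-G) * sum(G ** k / math.factorial(k) for k in range(L)) * G / L

def optimize(L, grid=20000):
    """Grid search over G in (0, 20]; returns (G_opt, C_opt/C_erg, outage)."""
    g_best = max((i * 1e-3 for i in range(1, grid + 1)),
                 key=lambda g: norm_throughput(g, L))
    eps = 1.0 - math.exp(-g_best) * sum(g_best ** k / math.factorial(k)
                                        for k in range(L))
    return g_best, norm_throughput(g_best, L), eps

for L in (1, 2, 6, 10):
    G, c, eps = optimize(L)
    print(L, round(G, 3), round(c, 4), round(eps, 4))
```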
High SNR

In the high SNR regime, we can make the following approximations:

(i) G_eff ≈ L e^R / ρ ≪ 1

(ii) ε = 1 - e^{-G_eff} (1 + G_eff + G_eff²/2 + ... + G_eff^{L-1}/(L-1)!) ≈ G_eff^L / L!

(iii) C(G_eff) ≈ (1 - ε) log(1 + ρ G_eff / L)

Using the above approximations, we obtain the following expressions for the optimal parameters:

    ε_opt ≈ 1 / (L log ρ)    (2.17)

    R_opt ≈ log ρ - (1/L) log log ρ + O(1/L)    (2.18)

    C_opt ≈ log ρ - (1/L) log log ρ - O(1/L)    (2.19)
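The prediction (2.17) can be checked directly by optimizing (1 - ε)·log(1 + ρG/L) numerically at ρ = 50 dB. The sketch below is ours (the grid and search range are arbitrary choices); the optimal outage decreases with L and agrees in order of magnitude with 1/(L log ρ).

```python
import math

rho = 10 ** 5.0                              # 50 dB

def eps(G, L):
    """Outage (2.15) as a function of the effective gain G."""
    return 1.0 - math.exp(-G) * sum(G ** k / math.factorial(k) for k in range(L))

def optimal_outage(L):
    grid = (i * 1e-3 for i in range(1, 10001))   # G in (0, 10]
    g_best = max(grid, key=lambda g: (1.0 - eps(g, L)) * math.log(1.0 + rho * g / L))
    return eps(g_best, L)

for L in (1, 4, 10):
    print(L, round(optimal_outage(L), 4), round(1.0 / (L * math.log(rho)), 4))
```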
In Figure 2-6, we numerically plot the optimal achievable rates at SNR = 50 dB. Figure 2-6(a) shows the optimal throughput and the corresponding target rate. Unlike the low SNR case, we observe here that the target rate increases with the number of antennas. The intuition behind this fact is that in the high SNR case the distribution of log(||h||²) is what matters, and this distribution has short tails on the right side of the mean even for L = 1. Accordingly, as seen in Figures 2-6(a),(c), the optimal value of R is chosen to the left of the ergodic capacity. In this regime, the outage is small. As the number of antennas increases, the target rate and the average throughput both appear to approach the ergodic capacity according to (1/L) log log ρ, as predicted by (2.18) and (2.19). Figure 2-6(b) shows the corresponding outage probability, which decreases according to 1/L as predicted by (2.17). Figure 2-6(c) plots the achievable throughput as a function of R - C_erg for L = 1, 4, 10 antennas. Analogous to the low SNR regime, we observe that the gains from having multiple antennas are prominent only if we operate close to the optimal R.
Figure 2-6: Analysis of the erasure code based multicasting at SNR = 50 dB. (a) Optimal value of C as a function of the number of antennas and the corresponding optimizing value of R. (b) The optimal outage as a function of the number of antennas. (c) C as a function of R - C_erg for L = 1, 4, 10 transmit antennas.
Coding over Multiple Blocks: So far we have considered using one erasure symbol per block and an AWGN channel code. As discussed earlier, instead of using an AWGN code we could have the inner channel code span multiple blocks and exploit the time diversity. The transmitter could use an erasure symbol over a larger alphabet with nKR information bits. If the total mutual information over the K blocks is less than nKR bits then the entire packet is discarded. The ergodic capacity approaching channel code requires K → ∞. However, as discussed in the earlier part of this section, practical schemes should not aim for a very large K. As we increase the number of blocks, we expect to approach the ergodic capacity. We analyze the performance improvements for a single antenna transmitter in Appendix A. We argue that for large K and high SNR, the achievable rate by coding over K blocks is C_opt = C_erg - O(K^{-(1-δ)} log log ρ) (where δ > 0 can be made arbitrarily small). This shows that even with a single antenna, we can perform reasonably well by coding over a small number of blocks.
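The benefit of coding over a few blocks can be illustrated with a short simulation of our own (L = 1 and ρ = 20 dB are arbitrary choices): the best throughput max_R (1 - ε_K(R))·R grows with the number K of jointly coded blocks and approaches the ergodic capacity.

```python
import math, random

random.seed(6)
rho, trials = 100.0, 4000                  # 20 dB SNR

def mi_per_block(K):
    """Mutual information averaged over K independent fading blocks (L = 1)."""
    return sum(math.log(1.0 + rho * random.expovariate(1.0))
               for _ in range(K)) / K

def best_throughput(K):
    samples = sorted(mi_per_block(K) for _ in range(trials))
    # setting R = samples[i] succeeds on a (trials - i)/trials fraction of draws
    return max(s * (trials - i) / trials for i, s in enumerate(samples))

c_erg = sum(mi_per_block(1) for _ in range(20000)) / 20000
print([round(best_throughput(K), 2) for K in (1, 4, 16)], round(c_erg, 2))
```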
2.5.3 Layered Approach and Erasure Codes
We now discuss another generalization of the communication technique using erasure codes, which uses the layered approach suggested in [31]. In the schemes that we have considered so far, we send at most one erasure symbol per block. If an outage occurs in a given block, then the receiver misses the corresponding symbol. The main issue with such a construction is that if the receiver is in outage, it misses the entire symbol. Ideally the receiver should be able to decode as much information as possible depending on its channel strength. The layered approach is a scheme where the users can recover a certain fraction of the information depending on their channel strength. However, we argue in this section that the gains from the layered approach are not substantial.
The layered approach over slow fading channels for MISO links is suggested by Steiner and Shamai [31]. The authors compute the average throughput for their scheme but do not explicitly consider the use of erasure codes to achieve it. The main idea in [31] is to send many different codewords in each block. For simplicity let us consider sending two different packets at rates R_1 and R_2 over each block. Note that the channel with multiple transmit antennas is a non-degraded broadcast channel and hence successive cancellation cannot be performed at the decoder. If the channel is very strong so that we can decode both the packets, we get a total rate of R_1 + R_2. This is accomplished by performing joint decoding between the two codewords. If the channel is not strong enough to perform the joint decoding, an attempt is made to decode each codeword, treating the other as noise. If this succeeds then at least one packet can be recovered. If the channel is extremely weak, none of the codewords can be decoded. Thus a receiver can receive either 0, 1 or 2 packets in each block depending on the strength of the channel. By collecting more packets, we reduce the average time necessary to collect a sufficient number of packets to decode the source. An optimization is performed over the possible rates R_1 and R_2 in [31] to maximize the average throughput. This is precisely the rate we achieve using an outer erasure code. As we add more and more layers, we expect to achieve higher gains at the cost of higher complexity, since more layers enable the receiver to get on average more information in each block. However, it is shown in [31] that the gains diminish quickly for more than two layers over a wide range of SNR. Two layers provide some noticeable gains over a single layer for moderately large SNR > 20 dB (see Figure 1, [31]). Adding multiple antennas does not provide a dramatic gain in capacity, as observed in [31]. The largest gains are achieved when going from one to two antennas and the gains quickly diminish, similar to our observation from the single layer scheme.
One open problem in implementing the multilayered broadcast approach is the design of compatible rateless erasure codes. One has to generate erasure symbols with different alphabet sizes for the two layers. The decoder has to be able to use both types of symbols to efficiently reconstruct the original file. The current implementations of rateless codes all generate symbols over a single alphabet. In the absence of such codes, one could use current rateless codes over very small alphabet sizes. Erasure code packets on each layer can be generated by using a collection of these small symbols. The number of symbols in each packet is proportional to the rate assigned to that layer. However, having erasure symbols over a small alphabet is not efficient from an implementation point of view, since it incurs large overheads. Another problem with the broadcast approach is that the receiver has to perform joint decoding of the codewords, which is prohibitively complex.
2.6 Multicasting to Multiple Groups
So far, we have focussed on the case in which all the users want a common message. In this
section, we consider a generalization to the case when different groups of users want different
messages. More specifically, there is one sender and many receivers. Each receiver belongs
to a specific group and all users within a given group want the same message. If there is
only one group, then the scenario degenerates to the case we have studied in the previous
sections. On the other hand, if the size of each group is one then the problem reduces to that
of sending independent messages to different users. In this section we discuss the general
problem when there is more than one group each with more than one user. The problem
is still open. We discuss some problems that need to be solved for this generalization and
provide some motivation for the next chapter of this thesis.
We begin by reviewing the known schemes for transmitting independent messages to
different users and then consider the more general case of multiple groups.
2.6.1 Transmitting Independent Messages to Each User
In this section, we review the scheme for sending a separate message to each user. This scheme was suggested by Caire and Shamai [1]. For simplicity, we consider the case of a 2 x 2 system where the transmitter has 2 antennas and each of the two receivers has one antenna. The channel model is given by

    y = Hx + w    (2.20)

We assume a slow fading model. Each entry of H is independent and CN(0, 1). The transmitter knows the realization of H. Also we have w ~ CN(0, I) and E[x†x] ≤ P. The main idea is to perform an LU decomposition of H, so that H = LU, where L is a lower triangular matrix and U is an orthogonal matrix. Such a scheme is optimal at high SNR. For general SNR, an MMSE generalization of this factorization has to be performed. If m is the message to be transmitted, we set x = U†m. Accordingly, we have:

    [y_1]   [l_11    0 ] [m_1]   [w_1]
    [y_2] = [l_21  l_22] [m_2] + [w_2]    (2.21)

The transformation x = U†m involves rotation by an orthogonal matrix, hence the power constraint and the noise variance are preserved. The result of this transformation is that user 1 receives an interference free AWGN channel.
On the other hand, user 2 experiences interference from user 1's message: the channel y_2 = l_21 m_1 + l_22 m_2 + w_2 has l_21 m_1 as additive interference. Since the transmitter knows the message of user 1, this interference is known at the transmitter, and a well known scheme called dirty paper coding [4] is used to code for this user. The capacity of the channel is the same as if the interference did not exist. This scheme can be easily generalized to the K user channel and was shown to be optimal recently [36].
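The triangularization step can be sketched in a few lines (an illustration of ours; the MMSE generalization and the dirty paper code itself are not shown). An LQ factorization, obtained here from numpy's QR applied to H†, gives H = LQ with Q unitary; precoding x = Q†m leaves user 1 interference free and user 2 with only the known interference term l_21 m_1.

```python
import numpy as np

rng = np.random.default_rng(7)

# 2x2 channel with i.i.d. CN(0,1) entries; row i is receiver i's channel
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)

# LQ factorization via QR of H^H:  H = L Q,  L lower triangular, Q unitary
Q0, R0 = np.linalg.qr(H.conj().T)
L, Q = R0.conj().T, Q0.conj().T

# precode x = Q^H m: each receiver sees y = H Q^H m + w = L m + w
effective = H @ Q.conj().T

print(np.round(np.abs(effective), 3))
# the (1,2) entry of the effective channel is zero: user 1 is interference
# free, while user 2 sees the known term l_21 m_1, which dirty paper coding
# removes at the transmitter
```

Because Q is unitary, the precoding preserves both the transmit power constraint and the noise statistics, exactly as stated in the text above.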
2.6.2 Multiple Groups
In this section we consider the case when there are multiple groups of users and each group
wants a different message. To simplify the exposition, we consider the case of a 4 x 4 system
with two groups and two users in each group. The input x is a 4 x 1 vector but there are
only two messages. How should we map the input message symbols to x? Inspired by
the independent message case in the previous section, we consider a linear transformation
x = Vm, where V is a 4 x 2 matrix 7. In order to satisfy the power constraint, we require
tr(V†V) ≤ 1. Also we select V such that L = HV is block triangular, as shown in the
following equation:
[y1]   [l11   0 ]        [w1]
[y2] = [l21   0 ] [m1] + [w2]        (2.22)
[y3]   [l31  l32] [m2]   [w3]
[y4]   [l41  l42]        [w4]
By performing the above transformation, receivers 1 and 2 have interference-free channels,
so message 1 can be decoded perfectly. On the other hand, receivers 3 and 4 have known
interference on their channels. How can the transmitter cope with this known interference?
The transmitter now has to deal with two interference sequences simultaneously. In the
special case that l31 = l41 and l32 = l42, both channels would be equivalent and
one could perform dirty paper coding. However, this choice of V is restrictive. Nonetheless,
we argue that even this choice of V is attractive for some special cases. We make use of the
following rule of thumb:
7 We do not use the same notation U as before because the matrix V is not orthonormal. We explicitly
need to ensure that the power is preserved by imposing the trace constraint.
Rule of Thumb: If an N antenna transmitter is used to null out in the direction of t < N
users (assuming i.i.d. Rayleigh fading), then effectively we have N - t degrees of freedom.

The above rule is motivated from the QR factorization of the i.i.d. Gaussian matrix H.
The ith diagonal entry of R satisfies |r_ii|^2 ~ chi-square with 2(N - i + 1) degrees of
freedom; we lose i degrees of freedom by nulling out i random directions. Now consider
the 4 x 4 system with two groups that we considered previously. For group 2 we impose
the constraint l31 = l41. This is equivalent to nulling out one direction for group 1. Hence
group 1 has three degrees of freedom. For group 2 we only have two degrees of freedom
since l12 = l22 = 0. The constraint l31 = l41 ensures that single user dirty paper coding
can be done to remove the interference seen by group 2. With this scheme group 1 enjoys
an extra degree of freedom. If there are N groups each with 2 users and 2N transmit
antennas, then the ith, i ∈ {1, 2, ..., N}, group has (N - i + 2)
degrees of freedom. On the other hand if we performed zero forcing to all the other users
in each group, we only have 2 degrees of freedom for each group. Thus a simple scheme
that employs dirty paper coding and takes into account that users in each group want a
common message can do better.
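As a sanity check on this rule of thumb, the following is a small Monte Carlo sketch of my own (not from the thesis), using real-valued Gaussians for simplicity: projecting a random channel vector onto the orthogonal complement of t random directions leaves a residual squared gain that is chi-square with N - t degrees of freedom (the complex-valued case in the text has 2(N - t) real degrees of freedom).

```python
import numpy as np

rng = np.random.default_rng(1)
N, t, trials = 4, 1, 20000

gains = []
for _ in range(trials):
    h = rng.standard_normal(N)          # channel vector of the served user
    A = rng.standard_normal((N, t))     # t random directions to null out
    Q, _ = np.linalg.qr(A)              # orthonormal basis for the nulled space
    h_perp = h - Q @ (Q.T @ h)          # project h onto the complement
    gains.append(h_perp @ h_perp)

# Residual squared gain is chi-square with N - t degrees of freedom,
# so its empirical mean should be close to N - t (here 3).
mean_gain = float(np.mean(gains))
```

The empirical mean of the residual gain concentrates around N - t, matching the "lose one degree of freedom per nulled direction" intuition.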
The choice of V used in the above argument is somewhat arbitrary. It allows us to
apply the single user dirty paper coding technique to the second group of users. We are not
restricted to this particular value of V if there is a technique that allows us to deal with
more than one interference simultaneously. More specifically, the channels for users 3 and 4
are given by y3 = l31 m1 + l32 m2 + w3 and y4 = l41 m1 + l42 m2 + w4, respectively. We would
like to encode the message m2 in such a way that both users see as little interference
from m1 as possible 8. This problem is a multiuser generalization of the single user dirty paper
coding result, which deals with only one user. In the next chapter we study this problem in
detail. We consider a broader class of channels, channels with state parameters known to
the transmitter, and develop some multiuser generalizations for these channels. By analogy
to [4], we view this generalization as "Writing on many pieces of Dirty Paper at once".
We obtain some achievable rates for the special case of Gaussian channels with Gaussian
interference, which we are interested in for the present application. However the capacity
of this channel remains an open problem.
8
For the 4 x 4 system we only need to deal with the case when the two interfering sequences for users 3 and
4 are scalar multiples of each other. But for larger systems, we need to consider more arbitrary interferences
Chapter 3
Multicasting with Known
Interference as Side Information
In this chapter, we study some coding techniques for multicasting channels which experience
an additive interference known to the transmitter.
Channels with interference known to
the transmitter model many different applications. In digital watermarking for example,
we wish to embed a watermark onto a host signal. The host signal is treated as known
interference in encoding the watermark [2]. In the last chapter (see also [1],[36]) we observed
that when the transmitter has different messages to send to different users, it can encode one
message treating the other as known interference. In this chapter we consider the problem
of coding for channels with known interference in detail. This particular scenario is a special
case of a broader class of channels, known as channels with state parameters known to the
transmitter. We examine this class of channels initially and then specialize to the known
interference case.
Communication channels controlled by state parameters known to the transmitter were
first introduced by Shannon [28]. Shannon studies a point to point channel, whose transition
probability depends on a state variable which is known to the transmitter but not to the
receiver. In Shannon's model the transmitter becomes aware of the state parameters during
the course of transmission (i.e. the transmitter has causal knowledge of the state sequence).
Gel'fand and Pinsker [13] study a similar channel model where the transmitter has knowledge of the entire state sequence before the transmission begins (i.e. the transmitter has
non-causal knowledge of the state sequence). The knowledge of state variables is commonly
referred to as side information. See Figure 3-1 for the model of such a channel.
Perhaps surprisingly, the problem of non-causal side information has found several important applications. In this chapter, unless otherwise stated, we focus on this scenario
and refer to it simply as side information. One of the early applications in this area was
coding for defective memory cells [19]. An array of cells, some of which are defective, has
to be used to store a file. The write-head scans through the array and locates the defective cells. It uses this side information to encode the file in such a way that a read-head
which subsequently reads the memory cells can retrieve the file without explicitly knowing
which cells were defective. In [19] the authors propose a class of algebraic codes that are
optimal in that they achieve the same rate as a scheme where both the read and write-head
have knowledge of the defective cells. The pioneering work in [13] establishes the capacity
of channels with non-causal side information at the transmitter in terms of an auxiliary
random variable and shows that the result in [19] is a special case of this capacity. The
result in [13] was subsequently specialized by Costa [4] for the case where the noise and the
state variable are additive Gaussian random variables. Rather surprisingly there is no loss
in capacity if the additive interference is known only to the sender. This special result by
Costa has been further generalized in point to point settings when the interference is an
arbitrary stationary sequence [7] and is now popularly known as coding for channels with
known interference.
Most of the prior work on channels with state parameters has focused on point to point
links. In this chapter we take the first steps towards a point to multipoint generalization of
channels with state parameters. We are interested in the case where there is one transmitter
and many receivers. Each receiver wants the same message. We refer to this as a multicasting
scenario. Throughout this chapter, we assume that the state sequence is known non-causally
to the transmitter but not to any of the receivers. The main challenge in this
generalization is that the transmitter has to deal with the state sequence in such a way
that the resulting code is 'good' for all the receivers simultaneously. Otherwise the weakest
receiver determines the common rate of the system. In many situations of interest, this is
in fact a non-trivial generalization of the schemes for point to point links. In section 3.1
we summarize the results of point to point links and in section 3.2 we formally introduce
the multicasting channel with side information. We then consider several special cases of
the multicasting channel in sections 3.3-3.6. We propose novel coding techniques for these
channels and also derive the capacity expressions for some special cases.

Figure 3-1: Point to point channel with state parameters known to the encoder. The
encoder maps the message W and the state sequence S^n to the input X^n = f(W, S^n); the
channel p(y|x, s) produces Y^n, from which the decoder recovers the message.
3.1
Point to Point channels
In this section we review the literature on point to point channels with side information at
the transmitter. As discussed before, we restrict ourselves to the case of non-causal side
information and refer to it simply as side information. We begin by providing a formal
definition of such a channel [13], shown in Figure 3-1.
Definition 2 A point to point channel with random parameters consists of an input
alphabet X, an output alphabet Y and a set of states S (which are all discrete alphabets).
It is specified by a transition probability matrix p(y|x, s) and the distribution p(s), for all
y ∈ Y, x ∈ X, s ∈ S. The channel is stationary and memoryless if p(y^n|x^n, s^n) =
Π_i p(y_i|x_i, s_i) and p(s^n) = Π_i p(s_i). We assume that the sender knows the particular
realization of s^n before using the channel.

A (2^{nR}, n) code for the channel defined above consists of an encoder
f : {1, 2, ..., 2^{nR}} x S^n → X^n and a decoder function g : Y^n → {1, 2, ..., 2^{nR}}.
Here n is the number of channel uses to convey the message and R is the rate. The
probability of error, averaged over all state sequences and all messages, is defined as
P_e = Pr(g(Y^n) ≠ W). An expression for the capacity of this channel was developed
in [13] in terms of an auxiliary random variable.
Theorem 1 The capacity of the point to point channel with random parameters is given
by

C = max_{p(u|s), p(x|u,s)} [ I(U; Y) - I(U; S) ]

where U is an auxiliary random variable that satisfies U → (X, S) → Y and
|U| ≤ |X| + |S| - 1. Furthermore, it suffices to restrict X to be a deterministic function
of U and S, i.e. X = f(U, S).

Figure 3-2: Dirty paper coding channel: Y = X + S + Z, with S ~ CN(0, Q), Z ~ CN(0, N),
and input power constraint E[X^2] ≤ P.
The achievability argument uses a random binning technique. Fix the distributions p(U|S)
and p(X|U, S). Generate 2^{n(I(U;Y)-ε)} sequences u^n i.i.d. according to p(U). This
guarantees that there is at most one sequence u^n jointly typical with the received sequence
y^n. Now, randomly partition the sequences u^n into 2^{nR} bins such that there are at
least 2^{n(I(U;S)+ε)} sequences in each bin. This guarantees that for any sequence s^n,
there is at least one u^n in each bin jointly typical with it. To transmit message w, find a
sequence u^n in bin w that is jointly typical with s^n. Then transmit the x^n which is
jointly typical with u^n and s^n according to p(X|U, S). If the decoder finds a unique
sequence typical with y^n, it declares the corresponding bin index to be the message. It
declares an error otherwise. The number of sequences and the size of each bin are carefully
chosen so that the probability of error goes to zero as n → ∞. In the above construction
R < I(U; Y) - I(U; S) guarantees successful decoding and this proves the achievability.
The proof of the converse is non-trivial and is derived through a novel chain of inequalities
in [13]. See [12] for a simpler proof.
Finding the optimal auxiliary variable is a non-trivial problem in general and depends
on the choice of the specific channel model. The above expression has been specialized to
an additive interference channel model in [4], which considers the channel model
Y = X + S + Z, where Z ~ CN(0, N) is additive Gaussian noise and S ~ CN(0, Q) is
additive interference known to the sender. See Figure 3-2. The transmitter also has a power
constraint E[X^2] ≤ P. The surprising result shown in [4] is that there is no loss in the
capacity compared to a channel where both the receiver and the transmitter have knowledge
of the interference. The resulting capacity is log(1 + P/N). This rate is achieved through
the random binning scheme used in the achievability proof of the Gel'fand-Pinsker result,
and the specific choice of U is given by U = X + αS, where X ~ CN(0, P) is chosen
independent of S and α = P/(P + N). This particular scheme is referred to as dirty paper
coding, due to the title of [4]. Recently this result has been generalized to the case of
arbitrary interference [7] and the case of non-stationary Gaussian interference [40],[3].
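The dirty paper rate can be checked numerically. The sketch below is my own illustration (not from the thesis) under real-valued Gaussian assumptions, so rates carry a factor 1/2 rather than the complex-valued log(1 + P/N) above: it evaluates I(U; Y) - I(U; S) from the Gaussian covariances and confirms that α = P/(P + N) recovers the interference-free capacity.

```python
import numpy as np

P, Q, N = 1.0, 5.0, 1.0   # power, interference variance, noise variance

def costa_rate(alpha):
    """I(U;Y) - I(U;S) for U = X + alpha*S, with X ~ N(0,P) independent
    of S ~ N(0,Q), and Y = X + S + Z, Z ~ N(0,N).  Real-valued Gaussians,
    so all rates carry a factor 1/2 (in nats)."""
    vU = P + alpha**2 * Q
    vY = P + Q + N
    cUY = P + alpha * Q                  # E[UY]
    cUS = alpha * Q                      # E[US]
    i_uy = 0.5 * np.log(vU * vY / (vU * vY - cUY**2))
    i_us = 0.5 * np.log(vU * Q / (vU * Q - cUS**2))
    return i_uy - i_us

alpha_star = P / (P + N)                 # Costa's choice of alpha
awgn = 0.5 * np.log(1 + P / N)           # interference-free capacity
rate_star = costa_rate(alpha_star)
```

At α = P/(P + N) the achievable rate equals the interference-free AWGN capacity, and no other α in [0, 1] does better, which is exactly Costa's result.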
Most of the work in the literature has focused on point to point channels with random
parameters. However, multiuser generalizations of these results have received much less
attention. Recently Steinberg [30] considered a two user broadcast channel with random
parameters, where the users want separate messages, and [17] have shown that the
achievable region is optimal for the case of Gaussian broadcast channels. However, the
case in [17] is a very special case that avoids the main difficulty often encountered in
multiuser generalizations of dirty paper coding, where the sender has to deal with multiple
state sequences simultaneously. Their coding scheme applies the single user dirty paper
coding result to a broadcast channel setting and shows an optimistic result. We are
interested in non-trivial multiuser generalizations where the sender has to deal with
multiple state sequences simultaneously. There is typically a loss due to lack of receiver
knowledge in such settings. This case is captured when all the users want a common
message, and it will be studied in the remaining sections of this chapter.
3.2
Multicasting channels
In this section we develop a model for multicasting over channels with random parameters.
We consider the scenario of one sender and many receivers.
All the receivers want the
same message. There is no coordination among the receivers. For the sake of clarity, we
define the channel for the case of two receivers. The generalization to the k receiver case is
straightforward.
Definition 3 A two user multicasting channel with random parameters consists of an
input alphabet X, output alphabets Y1 and Y2 for users 1 and 2, and a set of states S.
(All alphabets are discrete.) It is specified by the transition probability p(y1, y2|x, s) for
all y1 ∈ Y1, y2 ∈ Y2, x ∈ X, s ∈ S. The channel is stationary and memoryless if
p(s^n) = Π_i p(s_i) and p(y1^n, y2^n|x^n, s^n) = Π_i p(y1i, y2i|xi, si). Furthermore, we
assume that the sender knows the particular realization of s^n non-causally before using
the channel.

Note that the above definition includes the case where the channel of user 1 is controlled
by a state S1 and that of user 2 by another state S2. We take the joint state S = (S1, S2)
and set p(s) = p(s1, s2). We now define the achievable common rate for this channel.
Definition 4 A (2^{nR}, n) code consists of a message set I = {1, 2, ..., 2^{nR}}, an
encoder f : I x S^n → X^n, and two decoders g1 : Y1^n → I and g2 : Y2^n → I. The rate
R is achievable if the probability of error P_e = Pr{g1(Y1^n) ≠ W or g2(Y2^n) ≠ W} → 0
as n → ∞. Here W is the transmitted message and has a uniform probability distribution
on I. The probability of error P_e is averaged over all state sequences in addition to the
usual averaging over all codebooks and messages.
An obvious generalization of the random binning scheme for the single user channel
yields the following achievable rate:
Theorem 2 An achievable rate for the multicasting channel with random parameters is
given by:

R = max_{p(u|s), p(x|u,s)} { min(I(U; Y1), I(U; Y2)) - I(U; S) }

where U is an auxiliary random variable that satisfies U → (X, S) → Y1 and
U → (X, S) → Y2. Furthermore, the size of the alphabet of U is at most |X| + |S| + 1.

The achievability argument is analogous to that in the single receiver case in Theorem 1.
The two decoders, upon receiving y1^n and y2^n respectively, attempt to find the same
sequence u^n. Accordingly we can only generate 2^{n min(I(U;Y1), I(U;Y2))} sequences
u^n, and hence the min(·) appears in the above expression. Is the above achievable rate
optimal? Since we are restricting ourselves to schemes where both receivers are required
to decode the same u^n sequence, one would expect that such a restriction leads to
sub-optimality of the achievable rate. We show some examples in the subsequent sections
where this particular scheme is superseded by schemes that use separate codebooks for
different receivers. In the following sections we study specific channel models and derive
some new achievable rates.
3.3
Binary Channels -
2 User noiseless case
In this section we consider the case of one sender and two receivers over binary channels.
Each receiver experiences additive (modulo 2) interference. Both the sequences are known
43
to the sender but not to any of the receivers 1. There is no additional noise on either of
the channels. The scenario is shown in Figure 3-3.

Figure 3-3: Two user multicasting channel with additive interference: Y1 = X ⊕ S1 and
Y2 = X ⊕ S2, with S1 ~ B(q) and S2 ~ B(q).
The two user binary channel model is the simplest non-trivial starting point for
multicasting channels. The communication scheme is trivial if there is only one sender
and one receiver: the sender simply XORs the codeword with the interference sequence
and transmits it. The receiver does not have to know the interference sequence, as it
receives a clean codeword. For the two user case, however, the sender has to deal with
two interferences simultaneously. It cannot 'clean' both interferences at once, and it is
not obvious what the best scheme is in such a situation. Motivated by the simplicity of
the single user scheme, the sender could time-share between the two users. This yields a
rate of 1/2 bits/symbol for each user for any S1 and S2. Another simple scheme is to
ignore the two interference sequences. In this case, each of the channels is a binary
symmetric channel and this yields a rate of 1 - max{H(S1), H(S2)} bits/symbol, where
H(·) is the binary entropy function. It turns out that both these schemes are sub-optimal.
The following theorem presents the capacity of this channel:
Theorem 3 The capacity of the two user binary channel with additive interference, shown
in Figure 3-3, is given by

C = 1 - (1/2) H(S1 ⊕ S2)        (3.1)
1 For this special case, we observe that there is no loss if the sender learns S1 and S2 causally. The
non-causal knowledge does not buy us anything.
Figure 3-4: Achievable rates for the two user multicasting channel when S1 and S2 are
i.i.d. The x-axis is Pr(Si = 1); the curves show time sharing, ignoring the side information,
and the capacity.
We note the achievable rate is superior to what can be achieved by time sharing and
ignoring side information. Figure 3-4 compares the achievable rates of timesharing, ignoring
side information and the optimal scheme when Si and S 2 are independent and identically
distributed.
On the x-axis we plot the probability that Si = 1.
The y-axis plots the
achievable rates of the three schemes. We note that substantial gains are achieved using
the optimal scheme for intermediate values of Pr(Si = 1).
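The three curves of Figure 3-4 are straightforward to evaluate. A small sketch (my own, assuming i.i.d. Bernoulli(q) interference, so S1 ⊕ S2 ~ Bernoulli(2q(1 - q))):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits (h2(0) = h2(1) = 0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

q = 0.2                                  # Pr(S_i = 1), with S1, S2 i.i.d.
p_xor = 2 * q * (1 - q)                  # Pr(S1 xor S2 = 1)

rate_timeshare = 0.5                     # each user served half the time
rate_ignore = 1 - h2(q)                  # treat S_i as BSC noise
capacity = 1 - 0.5 * h2(p_xor)           # Theorem 3
```

For q = 0.2 the capacity is about 0.55 bits/symbol, strictly above both the time-sharing rate (0.5) and the rate obtained by ignoring the side information, matching the gap visible in the figure at intermediate q.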
In order to prove the above theorem we describe a coding scheme that achieves the
capacity and establish a converse.
3.3.1
Coding Theorem
We first explain the basic idea of the coding scheme informally. The sender selects a
codeword c according to the message to be transmitted. The channel input x is generated
from this codeword c and the interference sequences s1 and s2: we XOR the first half of
the symbols in c with the corresponding symbols of s1 and the remaining half of the
symbols in c with those of s2, as shown in Figure 3-5.

Figure 3-5: Coding for the two user multicasting channel. The first half of the block is
clean for receiver 1 and noisy for receiver 2; the roles are reversed in the second half.
The main steps of the coding scheme can be described as follows:

- Generate 2^{nR} codewords of length n. Each symbol is generated i.i.d. Bernoulli(1/2).

- To send message w ∈ {1, 2, ..., 2^{nR}}, select codeword c_w.

- Using the knowledge of the interference sequences s1 and s2, generate the transmit
  sequence x according to the following relation:

  x(i) = c_w(i) ⊕ s1(i)   for i = 1, 2, ..., n/2
  x(i) = c_w(i) ⊕ s2(i)   for i = n/2 + 1, ..., n        (3.2)

- The first half block of the received sequence y1 at receiver 1 is clean, while the second
  half block is corrupted by the noise S1 ⊕ S2. Receiver 1 uses this knowledge of the
  reliability of the two blocks in decoding. (The conclusions at the second receiver are
  reversed.) We can express y1 as follows:

  y1(i) = c_w(i)                       for i = 1, 2, ..., n/2
  y1(i) = c_w(i) ⊕ s1(i) ⊕ s2(i)       for i = n/2 + 1, ..., n        (3.3)

- Receiver 1 finds a codeword ĉ that is jointly typical with y1 according to (3.3). The
  corresponding jointly typical set of sequences c and y1 is:

  A_ε = {(c, y1) : c and y1 are individually typical according to Pr(C) and Pr(Y1),
         and (c, y1) is jointly typical according to (3.3)}
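The encoding rule (3.2) can be simulated directly. The sketch below (my own illustration, with hypothetical parameters) confirms that receiver 1 sees a clean first half and residual noise s1 ⊕ s2 on the second half, and symmetrically for receiver 2:

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 1000, 0.1

c = rng.integers(0, 2, n)                 # codeword c_w for the message
s1 = (rng.random(n) < q).astype(int)      # interference on channel 1
s2 = (rng.random(n) < q).astype(int)      # interference on channel 2

# Encoding rule (3.2): cancel s1 on the first half, s2 on the second.
x = np.concatenate([c[:n//2] ^ s1[:n//2], c[n//2:] ^ s2[n//2:]])

y1 = x ^ s1                               # receiver 1 observes x xor s1
y2 = x ^ s2                               # receiver 2 observes x xor s2

# Receiver 1: first half clean; second half corrupted by s1 xor s2.
errs1_first = int(np.sum(y1[:n//2] ^ c[:n//2]))
noise1_second = y1[n//2:] ^ c[n//2:]
errs2_second = int(np.sum(y2[n//2:] ^ c[n//2:]))
```

As (3.3) predicts, the clean halves match the codeword exactly, and the noise on the corrupted halves is exactly s1 ⊕ s2.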
Analysis of Probability of Error. We now compute the probability of error under the
above coding scheme. In what follows, we use the symbol ~ to denote two sequences that
are jointly typical.

Lemma 1 If c and y1 are two independently selected sequences with the same marginals
as a jointly typical pair (c, y1), then

Pr((c, y1) ∈ A_ε) ≤ 2^{-n(1 - (1/2) H(S1 ⊕ S2))}

Proof:

Pr((c, y1) ∈ A_ε) = Pr(c(1 : n/2) ~ y1(1 : n/2), c(n/2 + 1 : n) ~ y1(n/2 + 1 : n))
                  = Pr(c(1 : n/2) ~ y1(1 : n/2)) Pr(c(n/2 + 1 : n) ~ y1(n/2 + 1 : n))
                  ≤ 2^{-(n/2) I(C; C)} 2^{-(n/2) I(C; C ⊕ S1 ⊕ S2)}        (cf. (3.3))
                  = 2^{-n(1 - (1/2) H(S1 ⊕ S2))}
Without loss of generality assume that message 1 is transmitted. Let E_i denote the
event that message i is decoded by receiver 1. Now we calculate the probability of error as

Pr(Ŵ1 ≠ W) = Pr(E1^c ∪ E2 ∪ E3 ∪ ... ∪ E_{2^{nR}})
           ≤ ε + Σ_{i=2}^{2^{nR}} Pr(E_i)                        (union bound)
           ≤ ε + (2^{nR} - 1) 2^{-n(1 - (1/2) H(S1 ⊕ S2))}        (from Lemma 1)
           ≤ ε + 2^{-nδ}   for some δ > 0 whenever R < 1 - (1/2) H(S1 ⊕ S2)

Finally, from symmetry, we note that the overall probability of error
Pr(Ŵ1 ≠ W or Ŵ2 ≠ W) ≤ 2 Pr(Ŵ1 ≠ W) approaches 0 as n → ∞. This proves that the
rate claimed in Theorem 3 is achievable.
3.3.2
Converse
To complete the proof of Theorem 3, we need to show that any reliable scheme cannot
exceed the capacity expression.

nR = H(W) ≤ I(W; Y1^n) + nε_n        (Fano's inequality)
          ≤ n - H(Y1^n|W) + nε_n

Similarly, we have nR ≤ n - H(Y2^n|W) + nε_n. Combining the two inequalities, we have:

nR ≤ n - max{H(Y1^n|W), H(Y2^n|W)} + nε_n
   ≤ n - (1/2){H(Y1^n|W) + H(Y2^n|W)} + nε_n
   ≤ n - (1/2) H(Y1^n, Y2^n|W) + nε_n          (conditioning reduces entropy)
   ≤ n - (1/2) H(Y1^n ⊕ Y2^n|W) + nε_n          (data processing inequality)
   = n - (1/2) H(S1^n ⊕ S2^n) + nε_n            (Y1 ⊕ Y2 = S1 ⊕ S2)
   = n(1 - (1/2) H(S1 ⊕ S2) + ε_n)

This establishes the upper bound and proves the capacity theorem.
3.3.3
Discussion of the Coding Scheme
In this section we provide some additional insights into the coding theorem of Section 3.3.1.
From each receiver's point of view, the channel can be thought of as a channel with block
interference. Half the time the channel is noiseless, and half the time it is a BSC with
crossover probability Pr(S1 ⊕ S2 = 1). Furthermore, the receiver knows a priori the
reliability of each symbol. The achievable rate with this structure is given by [37]:

R ≤ (1/2) I(C; Y | no noise) + (1/2) I(C; Y | noisy)
  = 1/2 + (1/2)(1 - H(S1 ⊕ S2))
  = 1 - (1/2) H(S1 ⊕ S2)
Since the sender cannot cancel the interference completely, it is in effect providing side
information to the receivers about the reliability of different blocks by deciding beforehand
how to clean up the two channels. This knowledge is then exploited by the receivers
to achieve higher rates. However, such receiver side information comes at a cost. While
cleaning up one receiver we are actually injecting interference into the channel of the other
receiver: the noise in the noisy block of receiver 1 is S1 ⊕ S2, instead of S1, for example.
It is not obvious whether this is a good idea. For the two user binary case, we have shown
that this is indeed an optimal scheme. For more than two users, we show in Section 3.4
that we can actually improve the achievable rate by choosing a cleanup schedule after
observing the interference sequences.
3.3.4
Random Linear Codes are Sufficient
The argument that random linear codes are sufficient to prove the coding theorem is
analogous to that given by Gallager for the BSC (Sec. 6.2) [11]. To prove the random
coding theorem, we used the following properties:

- Two codewords c_i and c_j are mutually independent for all i ≠ j.

- Each codeword c_i is a sequence of i.i.d. random variables.

A random linear (n, k) code specified by a k x n generator matrix G is the set of 2^k
codewords x = uG for all u ∈ {0, 1}^k. The entries of G are i.i.d. Bernoulli(1/2). It is
clear that the resulting codewords are i.i.d. sequences of Bernoulli(1/2) random variables.
It can also be shown that any two codewords are mutually independent [11]. This argument
leads to the following corollary:

Corollary 1 A random linear code is sufficient to achieve the capacity of Theorem 3.
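A minimal sketch of the random linear code construction (my own illustration; the i.i.d.-marginal and pairwise-independence properties hold over the random choice of G and are not checked here). It enumerates the 2^k codewords x = uG and verifies that the code is closed under mod-2 addition:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 4, 8

G = rng.integers(0, 2, (k, n))                     # random generator matrix
msgs = np.array([[(u >> i) & 1 for i in range(k)] for u in range(2**k)])
codebook = msgs @ G % 2                            # all 2^k codewords x = uG

# Linearity: the XOR (mod-2 sum) of any two codewords is again a codeword.
rows = {tuple(cw) for cw in codebook}
closed = all(tuple(codebook[i] ^ codebook[j]) in rows
             for i in range(2**k) for j in range(2**k))
```

Closure under XOR is what makes the scheme compatible with the additive (mod-2) interference model: adding a known sequence to a linear codeword simply shifts the code.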
3.3.5
Alternative Proof of the Coding Theorem
In this section we give an alternative proof of the coding theorem. From Theorem 2, we
know that the following rate is achievable:

R = max_{p(u|s), p(x|u,s1,s2)} { min(I(U; Y1), I(U; Y2)) - I(U; S1, S2) }        (3.4)
Since we are free to choose the auxiliary random variable, we set U to a specific choice
that achieves the desired rate 2. Let A ~ Bernoulli(1/2) indicate which interference is
cleaned up, and take

U = (A, V), where V = X ⊕ S1 if A = 1 and V = X ⊕ S2 if A = 0        (3.5)

Substituting (3.5) into (3.4) and some algebra yields the desired achievable rate. Note
that with this choice of U, it follows that I(U; S1, S2) = 0. This means that causal
side information alone is sufficient to achieve the rate. This is consistent with (3.2), where
each x(i) is only a function of s1(i) and s2(i) and not of their future values. In this case
the achievable rate of Theorem 2 is tight.
3.3.6
Practical Capacity Approaching Schemes
Figure 3-6: Architecture for the two user binary multicasting channel. A good code for the
BSC maps the message to the codeword c; a scheduler combines c with the side information
s1, s2 to form x; each receiver runs a Viterbi decoder for block symbols with variable
reliabilities.
The coding theorem involves a separation result. The choice of codeword c is simply
a function of the message to be transmitted; the knowledge of s1 and s2 is not used
in this step. Thereafter, the transmitted sequence x is generated from c, s1, s2 according to
(3.2). Hence there is a separation between the codeword selection and the processing of the
side information. This separation can be exploited in designing practical systems that
approach capacity. Good codes for the BSC already exist and can be directly used for this
application. The only additional requirement is to generate the sequence x from the side
information according to (3.2). This processing is done by the scheduler block in Figure 3-6.

Standard decoding algorithms such as the Viterbi algorithm or iterative decoding for
block interference channels can be used at the receivers. Each decoder knows the statistical
reliability of the symbols it receives. Accordingly, it can adjust the cost function in its
decision making to take into account the variable reliability of the symbols in different
blocks.
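The reliability-aware decoding step can be illustrated with a brute-force ML decoder on a toy codebook (my own sketch, not the thesis's Viterbi implementation): clean symbols get an effectively infinite weight, noisy symbols the usual BSC log-likelihood weight.

```python
import numpy as np

n, q = 8, 0.1
p = 2 * q * (1 - q)                       # noisy-block crossover probability

# Tiny illustrative codebook (4 codewords of length 8, chosen by hand).
codebook = np.array([[0]*8, [1]*8, [0, 1]*4, [1, 0]*4])

def decode(y, clean_slice, noisy_slice, p):
    """Brute-force ML decoding when the receiver knows the per-block
    reliabilities: clean symbols get (effectively) infinite weight,
    noisy ones the standard BSC log-likelihood weight."""
    w_clean = 1e9
    w_noisy = np.log((1 - p) / p)
    costs = [w_clean * np.sum(cw[clean_slice] ^ y[clean_slice])
             + w_noisy * np.sum(cw[noisy_slice] ^ y[noisy_slice])
             for cw in codebook]
    return int(np.argmin(costs))

# Receiver 1: first half clean, second half noisy (noise = s1 xor s2).
w = 2
y1 = codebook[w].copy()
y1[5] ^= 1                                # one flip in the noisy block
w_hat = decode(y1, slice(0, n//2), slice(n//2, n), p)
```

The flip in the noisy block is absorbed by its small weight, while any mismatch in the clean block is ruled out immediately; this is the same per-block weighting a Viterbi or iterative decoder would apply to its branch metrics.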
2 This expression for U was discovered through numerical optimization.
3.4
Binary Channels -
More than two receivers
We now generalize to the case when there are more than two receivers, as shown in
Figure 3-7. We consider the case when there is no additional noise, i.e. Z1 = Z2 = ... =
ZK = 0, and when S1, S2, ..., SK are all identically distributed (i.e. the joint distribution
Pr(S_{k1}, S_{k2}, ..., S_{km}) is independent of the specific choice of k1, k2, ..., km ∈
{1, 2, ..., K}). We now state the following results for the K user channel:
Figure 3-7: K user multicasting channel. Receiver k observes Yk = X + Sk + Zk and
produces the estimate Ŵk.
Theorem 4 (Outer Bound) An outer bound on the maximum common rate for the K
user binary channel is:

R ≤ 1 - (1/K) H(S1 ⊕ S2, S1 ⊕ S3, ..., S1 ⊕ SK)

Theorem 5 (Inner Bound) An achievable rate R for the K user binary channel is given
by

R = 1 - (1 - 1/K) H(Si ⊕ Sj)

The proof of Theorem 4 follows along the lines of the outer bound for the two user case in
Section 3.3.2 and is omitted. The proof of Theorem 5 follows through a direct generalization
of the two user case in Section 3.3.1. The main idea is to divide the codeword into K blocks
and clean up each user in one of the blocks. Each user has one clean block and K - 1 blocks
over a noisy BSC with crossover probability q̃ = Pr(S1 ⊕ S2 = 1).
In general our inner and outer bounds are not equal. However, they do agree in the
following cases:

Corollary 2 If {S1, S2, ..., SK} are mutually independent Bernoulli(1/2) random
variables, then the capacity is C = 1/K and is achieved through time-sharing.

Proof: The proof follows by observing that when S1, ..., SK are independent
Bernoulli(1/2), the outer bound simplifies to 1/K, which is clearly achieved through
time-sharing between the users.

Corollary 3 If {S1, S2, ..., SK} are mutually independent Bernoulli(q) random variables
(q ≠ 1/2), then as K → ∞ we have C → 1 - H(q), and this is achieved by ignoring the
side information at the transmitter.

Proof: The proof follows by observing that if S1, ..., SK are i.i.d. then

(K - 1) H(S1) ≤ H(S1 ⊕ S2, S1 ⊕ S3, ..., S1 ⊕ SK) ≤ K H(S1)

As K → ∞, the lower and upper bounds agree and we achieve the desired result.
Note that if the side information is available at the receivers then the capacity is C = 1.
There is a loss in capacity when there are K ≥ 2 users. This is in contrast to the single
user case, where we do not experience any loss in capacity due to the additive interference.
Corollary 3 makes a negative statement: side information is not very useful in multicasting
to a large group of users. How severe is this effect? The answer in general depends
on the distribution of the interference sequences. In Figure 3-8, we plot the upper bound of
Theorem 4 as a function of the number of users for Pr(Si = 1) = 0.1 and 0.2, with all the
Si independent. We note that in either case a modest number of users is sufficient for the
upper bound to approach the limit of Corollary 3. The value of side information decreases
rapidly when the number of users is reasonably large.
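The upper bound of Theorem 4 can be computed exactly for i.i.d. Bernoulli(q) interference by enumerating the joint distribution of (S1 ⊕ S2, ..., S1 ⊕ SK). The sketch below (my own, reproducing the behavior of Figure 3-8 for q = 0.1):

```python
import numpy as np
from itertools import product

def h_joint_bits(K, q):
    """Joint entropy (bits) of (S1^S2, ..., S1^SK), Si i.i.d. Bernoulli(q)."""
    pr = lambda b: q if b == 1 else 1 - q
    H = 0.0
    for d in product([0, 1], repeat=K - 1):
        # Sum out S1: given S1 = s1, each XOR S1^Si is Bernoulli(q).
        p = sum(pr(s1) * np.prod([pr(s1 ^ di) for di in d]) for s1 in (0, 1))
        if p > 0:
            H -= p * np.log2(p)
    return H

def upper_bound(K, q):
    return 1 - h_joint_bits(K, q) / K     # Theorem 4

def h2(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

q = 0.1
bounds = [upper_bound(K, q) for K in range(2, 11)]
limit = 1 - h2(q)                          # Corollary 3 limit, 1 - H(q)
```

For K = 2 the bound coincides with the capacity of Theorem 3, and as K grows it decreases monotonically toward 1 - H(q), matching the curves in the figure.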
3.4.1
Improved Inner Bound for K > 2 users
The inner bound proposed in Theorem 5 is not optimal for K > 2 users. We propose an
architecture to improve this inner bound. The intuition behind the sub-optimality of the
Figure 3-8: Upper bound on capacity for binary noiseless multicasting channels with
q = 0.1 and 0.2, as a function of the number of users K. The bounds approach the limits
1 - H(0.1) and 1 - H(0.2), respectively.
inner bound of Theorem 5 is that the scheme pre-assigns the blocks to the users. While cleaning up the block for one user, it injects additional noise into all other users. While this scheme is optimal in the two user case, it is in general sub-optimal. A scheme that decides the cleanup positions after observing the interference sequences does better. For example, consider a particular channel use i, 0 < i < n/3, for which S_1(i) = 1, S_2(i) = 0 and S_3(i) = 0. If the slots were pre-assigned, user 1's channel is cleaned up while users 2 and 3 are hurt. On the other hand, if the decision were made solely based on the observed values, it may be better to serve the majority of the users. In the long run, serving the majority of the users should help, since it translates to better rates for each individual user. Thus, one would expect that higher gains could be achieved if the cleanup blocks were decided after observing the interference sequences when there are more than 2 users in the system. Throughout this section we consider the case when the interferences S_i are mutually independent.
To develop further insight into the architecture, consider a naive scheme which, in each block, cleans up the channel of the majority of the users. For instance, in our earlier example
Figure 3-9: Improved Architecture for the 3 user channel (the message W selects a codeword C_W; a scheduler observes (S_1, S_2, S_3) and picks a schedule U from {U_1, ..., U_M}; the transmitted symbol is X(i) = C_W(i) ⊕ S_j(i), where S_j is chosen by U; each decoder recovers the pair (C_W, U) from its output Y_i).
of three users, if S_1(i) = 1, S_2(i) = 0 and S_3(i) = 0, then we select x(i) = c(i). The channel for each user is transformed into a BSC with cross-over probability equal to Pr(the user is in the minority). If there are three users in the system and the interferences are independent, then such a scheme yields a rate of 1 - H(q(1 - q)), where q = Pr(S_i = 1). For small values of q, this rate is better than the rate 1 - (2/3)H(2q(1 - q)) suggested by the inner bound in Theorem 5; for larger values of q, however, the latter is better. Is it possible to do uniformly better than the inner bound of Theorem 5? The answer is yes, and we propose an architecture that achieves a higher rate for all values of q. For simplicity we focus on the three user architecture.
Theorem 5 involves two steps: selecting a suitable codeword and scheduling the cleanup (see Figure 3-6). We maintain the same architecture, since it is amenable to practical systems. However, instead of using pre-assigned schedules as in the two user case, we make the scheduler block more complex (see Figure 3-9). We maintain a list of many cleanup schedules and pick the best schedule after observing the interference sequences. The decoders try out all the schedules; if the number of schedules is small, they get a decoding failure for all the schedules except the correct one. We now describe the coding scheme in detail.
Coding Scheme
" Generate C 1 , C2 ..
C2
according to i.i.d B(!) distribution. Given message W pick
the codeword C,.
" Fix p(UIS1, S2, 83) where U E {U 1 , U2 , U3 }. Generate M =
2 n1(U;siS 2 ,S 3 )
schedule
sequences i.i.d according to p(U). If U(i) = Uj then for symbol i, user j's channel is
54
cleaned according to X(i) = C.(i)
e Sj(i) as shown in the figure above. Both the
codeword and schedule sequences are revealed to all the receivers.
" Given the sequence triplet (S1, S2, S 3 ) find a schedule U jointly typical with it according to p(U Si, S2, S3) and select this U as the scheduling scheme.
" Generate X = C,
e Sj where Sj is determined by the selected U as explained above.
" Each receiver upon receiving Yi searches for a pair (CW, U) that is jointly typical
according to Pr(Y, C, U) = Pr(C) Pr(U) Pr(Y
IC, U), since C and U are generated
independently. If it finds a unique such pair it declares the corresponding message,
otherwise it declares a decoding failure.
Probability of Error Analysis
There are two main sources of error:
- Encoding Error: The encoder does not find a sequence u jointly typical with s_1, s_2, s_3. The probability of this event vanishes as n → ∞ if the number of U sequences exceeds 2^{nI(U; S_1, S_2, S_3)}.
- Decoding Error: More than one (C, U) pair is jointly typical with the received sequence. This probability vanishes as n → ∞ if the number of (C, U) pairs is less than 2^{nI(C, U; Y_i)}.
It follows that R + I(U; S_1, S_2, S_3) < I(C, U; Y_i) is sufficient to guarantee vanishing probability of error.
Theorem 6 An achievable rate for the 3 user binary channel is given by

R < max_{p(C), p(U|S_1,S_2,S_3)} [ min_i I(C, U; Y_i) - I(U; S_1, S_2, S_3) ]

where |U| = 3 and U and C are mutually independent.
To simplify the expression of Theorem 6, we introduce a new variable N_i = C ⊕ Y_i. If N_1(i) = 1, then the received sequence has an error at symbol i. Clearly, if U = U_1 then N_1 = 0 and there is no error. If U = U_2 or U = U_3, then N_1 is either 0 or 1. N_1 has the interpretation of the effective noise injected into channel 1 by the choice of a particular sequence u^n. In addition, we impose the distribution p(U | S_1, S_2, S_3) shown in Table 3.1.
Table 3.1: The probability distribution p(U | S_1, S_2, S_3) of cleanup scheduling. Here p is a parameter to be optimized.

S   | p(U=U_1|S) | p(U=U_2|S) | p(U=U_3|S) | H(U|S)       | p(S)
000 | 1/3        | 1/3        | 1/3        | log_2(3)     | q_0^3
001 | 1/2-p      | 1/2-p      | 2p         | 1-2p+H(2p)   | q_0^2 q_1
010 | 1/2-p      | 2p         | 1/2-p      | 1-2p+H(2p)   | q_0^2 q_1
011 | 2p         | 1/2-p      | 1/2-p      | 1-2p+H(2p)   | q_0 q_1^2
100 | 2p         | 1/2-p      | 1/2-p      | 1-2p+H(2p)   | q_0^2 q_1
101 | 1/2-p      | 2p         | 1/2-p      | 1-2p+H(2p)   | q_0 q_1^2
110 | 1/2-p      | 1/2-p      | 2p         | 1-2p+H(2p)   | q_0 q_1^2
111 | 1/3        | 1/3        | 1/3        | log_2(3)     | q_1^3

(Here q_1 = q = Pr(S_i = 1) and q_0 = 1 - q; in each mixed state, the minority user is cleaned with probability 2p.)
For the distribution of U in Table 3.1, we can calculate Pr(N_1 = 1 | U = U_2):

Pr(N_1 = 1 | U = U_2) = Pr(N_1 = 1, U = U_2) / Pr(U = U_2)
                      = Pr({S = 010, U = U_2} or {S = 011, U = U_2} or {S = 100, U = U_2} or {S = 101, U = U_2}) / (1/3)
                      = 3 q_0 q_1 (1/2 + p)

Note that if p = 1/6 we have I(U; S_1, S_2, S_3) = 0, and the scheme reduces to choosing the blocks ahead of time, without looking at the specific realization of the interference sequences. Thus the inner bound described in Theorem 5 falls out as a special case of this scheme. After some algebra, we arrive at the following:
H(N_i | U) = (2/3) H(3 q_0 q_1 (1/2 + p))                          (3.6)

I(U; S_1, S_2, S_3) = 3 q_0 q_1 (2p - H(2p) + log_2(3/2))          (3.7)
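The quantities H(N_i | U) and I(U; S_1, S_2, S_3) can also be checked by direct enumeration over the eight interference states. The sketch below (Python) encodes the scheduling distribution of Table 3.1 with q_1 = q and q_0 = 1 - q, and compares the enumerated values against the closed forms used in this section.

```python
import math
from itertools import product

def H(x):
    # binary entropy in bits
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def table(q, p):
    """Return {s: (Pr(S=s), [p(U_1|s), p(U_2|s), p(U_3|s)])} per Table 3.1."""
    q0, q1 = 1 - q, q
    dist = {}
    for s in product((0, 1), repeat=3):
        prob_s = math.prod(q1 if b else q0 for b in s)
        ones = sum(s)
        if ones in (0, 3):
            pu = [1 / 3] * 3
        else:
            minority_bit = 1 if ones == 1 else 0  # minority user is cleaned w.p. 2p
            pu = [2 * p if s[j] == minority_bit else 0.5 - p for j in range(3)]
        dist[s] = (prob_s, pu)
    return dist

q, p = 0.3, 0.1
q0, q1 = 1 - q, q
dist = table(q, p)

# I(U; S1,S2,S3) = H(U) - sum_s Pr(s) H(U|s); by symmetry U is uniform.
HU_given_S = sum(ps * sum(-u * math.log2(u) for u in pu if u > 0)
                 for ps, pu in dist.values())
I_enum = math.log2(3) - HU_given_S
I_closed = 3 * q0 * q1 * (2 * p - H(2 * p) + math.log2(1.5))

# Pr(N_1 = 1 | U = U_2): user 1 sees noise iff S1 != S2 when U = U_2.
pr_enum = sum(ps * pu[1] for s, (ps, pu) in dist.items() if s[0] != s[1]) / (1 / 3)
pr_closed = 3 * q0 * q1 * (0.5 + p)

print(I_enum, I_closed, pr_enum, pr_closed)
```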
Figure 3-10: Optimal p and I(U; S_1, S_2, S_3) versus q (top panel: optimal I(U; S) as a function of q; bottom panel: optimal p as a function of q).
Figure 3-11: Inner Bound, Improved Inner Bound and Outer Bound (rate achievable by the multiple-schedule scheme, rate achievable by a single schedule, and the outer bound, plotted against q).
We now simplify the achievable rate of Theorem 6 as follows:

R < I(C, U; Y_i) - I(U; S_1, S_2, S_3) - ε
  = H(Y_i) - H(Y_i | U, C) - I(U; S_1, S_2, S_3) - ε
  = H(Y_i) - H(C ⊕ N_i | U, C) - I(U; S_1, S_2, S_3) - ε
  = H(Y_i) - H(N_i | U) - I(U; S_1, S_2, S_3) - ε          (N_i is noise, independent of C)
  = 1 - {H(N_i | U) + I(U; S_1, S_2, S_3)} - ε              (H(Y_i) = 1 since C is equiprobable)

The sum {H(N_i | U) + I(U; S_1, S_2, S_3)} above captures the tradeoff between complexity at the encoder and the cleanup ability at the receiver. If I(U; S_1, S_2, S_3) = 0, i.e., we have only one schedule, the scheduling block in Figure 3-9 is simple, but the effective noise injected into the channels is high. As we increase the number of schedules, we are able to decrease the effective noise on the symbols. However, by increasing the number of schedules, we also increase the possibility of selecting a wrong schedule. The tradeoff is to minimize the sum {H(N_i | U) + I(U; S_1, S_2, S_3)} over the distribution p(U | S_1, S_2, S_3).
Substituting (3.6), (3.7) in the above expression, we have

R(p) = 1 - 3 q_0 q_1 {2p - H(2p) + log_2(3/2)} - (2/3) H(3 q_0 q_1 (1/2 + p))
Maximizing with respect to p yields the maximum rate achievable by this scheme. The optimal p is shown in Figure 3-10 and the corresponding rate in Figure 3-11. We note that the gain from using the optimal p is minimal, and the added complexity of this scheme may not be necessary. In fact, the results of the previous section suggest that even though the inner bound of Theorem 5 is not optimal, it is quite close to the outer bound, so that the further gains may not be worthwhile.
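The one-dimensional maximization over p is easy to carry out numerically. The sketch below uses the rate expression R(p) above (as reconstructed in this section) and compares the optimized rate with the single-schedule rate R(1/6); the grid and sample values of q are illustrative.

```python
import math

def H(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def rate(q, p):
    # R(p) = 1 - I(U;S1,S2,S3) - H(N_i|U), using (3.6) and (3.7)
    q0, q1 = 1 - q, q
    I = 3 * q0 * q1 * (2 * p - H(2 * p) + math.log2(1.5))
    HN = (2 / 3) * H(3 * q0 * q1 * (0.5 + p))
    return 1 - I - HN

results = {}
for q in (0.1, 0.25, 0.4):
    # p must satisfy 0 <= p <= 1/2 so the table entries are probabilities
    p_opt = max((i / 2000 for i in range(1001)), key=lambda p: rate(q, p))
    results[q] = (p_opt, rate(q, p_opt))
    print(q, p_opt, rate(q, p_opt), rate(q, 1 / 6))  # optimized vs single schedule
```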
3.4.2 Binary Channel with Noise

In this section we consider binary channels with additional noise that is not known to either the encoder or the decoder. Suppose there is an additive noise Z_i on each user's channel and let p = Pr(Z_i = 1). The coding theorem of Section 3.3.1 is still applicable, and the corresponding rate for the 2 user case can easily be shown to be R < 1 - (1/2)H(p) - (1/2)H(p * Pr(S_1 ≠ S_2)),
Figure 3-12: Two User Multicasting Gaussian Dirty Paper Channel (the encoder observes (S_1, S_2); user i receives Y_i = X + S_i + Z_i, with S_i ~ N(0, Q) and Z_i ~ N(0, N); decoder i outputs an estimate of W).
where * denotes the binary convolution operator (p * q = p + q - 2pq). Unfortunately, the converse presented in Section 3.3.2 does not generalize. A loose upper bound may be obtained by assuming that the noise is known to the sender; however, this is clearly not tight, and it is still an open problem to develop a tighter upper bound in this case.
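As a quick sanity check, this noisy-channel rate reduces to the noiseless two user rate when p = 0. A minimal sketch (taking Pr(S_1 ≠ S_2) = 2q(1 - q) for independent Bernoulli(q) interferences, and base-2 logs):

```python
import math

def H(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def conv(a, b):
    # binary convolution: crossover probability of the XOR of two
    # independent Bernoulli noises
    return a + b - 2 * a * b

def rate_noisy(p, q):
    d = 2 * q * (1 - q)   # Pr(S1 != S2) for independent Bernoulli(q) S1, S2
    return 1 - 0.5 * H(p) - 0.5 * H(conv(p, d))

print(rate_noisy(0.0, 0.1), rate_noisy(0.05, 0.1))
```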
3.5 Gaussian Channels

In this section we turn our attention to the Gaussian channel. We focus on the case of two users. Each user receives Y_i = X + S_i + Z_i. We suppose that S_1 and S_2 are independent N(0, Q) random variables, while Z_1 and Z_2 are independent N(0, N) random variables. The scenario is shown in Figure 3-12.

We observed that in the binary case, a novel two user coding scheme outperforms time-sharing as well as ignoring side information. Are there similar gains in the Gaussian case? The answer is yes, for at least a certain regime of P and Q, and we describe some achievable rates in this section. Unfortunately, obtaining an upper bound is difficult, so we do not know the optimality of these schemes. Before we proceed to the achievable rates, note that time-sharing between the two users yields a rate of R = (1/4) log(1 + P/N), while ignoring side information yields R = (1/2) log(1 + P/(N + Q)).
Corollary 4 R = (1/2) log(1 + P/(N + Q/2)) is achievable.

Proof: Write S_1 = (1/2)(S_1 + S_2) + (1/2)(S_1 - S_2) and S_2 = (1/2)(S_1 + S_2) - (1/2)(S_1 - S_2), and note that (S_1 + S_2) and (S_1 - S_2) are uncorrelated. We cancel the common term (1/2)(S_1 + S_2) using the single user dirty paper coding idea [4], while absorbing the other term ±(1/2)(S_1 - S_2) into the noise. Thus the effective noise on each channel is N(0, N + Q/2) and the rate R is achievable.
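A short numerical sketch comparing this common-part-cancellation rate with the two baselines above (all rates written with base-2 logarithms; the parameter values are illustrative):

```python
import math

def rate_common_cancel(P, N, Q):
    # Corollary 4: cancel (S1+S2)/2 via dirty paper coding; (S1-S2)/2
    # (variance Q/2) is absorbed into the noise.
    return 0.5 * math.log2(1 + P / (N + Q / 2))

def rate_time_share(P, N):
    # each user is served half the time at the full dirty paper rate
    return 0.25 * math.log2(1 + P / N)

def rate_ignore(P, N, Q):
    # treat the interference entirely as noise
    return 0.5 * math.log2(1 + P / (N + Q))

P, N = 10.0, 1.0
for Q in (0.5, 5.0, 50.0):
    print(Q, rate_common_cancel(P, N, Q), rate_time_share(P, N), rate_ignore(P, N, Q))
```

As expected, cancelling the common part always beats ignoring the interference, and it beats time-sharing when Q is small.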
Corollary 5 An achievable rate for the Gaussian two user channel is

R = (1/2) [ (1/2) log(1 + P/N) + max{ 0, (1/2) log( P / ( P^2 Q/(P+N)^2 + P(N+Q)/(P+N+Q) ) ) } ]

Proof: The codeword is divided into two blocks. In the first block, the sender finds a sequence u jointly typical with the particular realization of s = s_1 and sends x = u - αs_1, where α = P/(P + N). With this choice of U and using the result of [13] (R < I(U; Y) - I(U; S)), we calculate the achievable rates at receiver 1 for the two blocks as R_b1 ≤ (1/2) log(1 + P/N) and R_b2 ≤ max{0, (1/2) log( P / ( P^2 Q/(P+N)^2 + P(N+Q)/(P+N+Q) ) )}, respectively. (The rates at receiver 2 are reversed.) By using arguments similar to the ones used for the binary case, we can show that the rate R ≤ (1/2)(R_b1 + R_b2) is achievable under joint typical decoding, and this completes the proof.
The achievable rate in the above corollary is motivated by the coding scheme for the two user binary channel. To make this clear, we provide an alternate scheme based on lattice coding. Lattice coding for dirty paper channels was first suggested in [7].

Interpretation using lattice decoding: An alternative interpretation of the above argument using lattice decoding is as follows. The codeword is divided into two blocks. In the first block, dirty paper coding is performed for user 1. Accordingly, the transmitter sends X = (V - αS_1 + D) mod Λ. Here V is the desired codeword and D is a dither sequence [7]. We next calculate the performance of receivers 1 and 2 for this signalling:

- Receiver 1: Y_1 = X + S_1 + Z_1. The receiver generates the MMSE estimate αY_1 = X + E + αS_1, where E is a sequence of i.i.d. N(0, PN/(P+N)) random variables. Taking modulo Λ and subtracting the dither sequence, the effective channel is (V + E) mod Λ and a rate of (1/2) log(1 + P/N) is achievable.

- Receiver 2: Y_2 = X + S_2 + Z_2. Since S_2 is not presubtracted by the transmitter, it is treated as additional noise, and the receiver generates the MMSE estimate α'Y_2 = X + E', where α' = P/(P+N+Q) and E' is a sequence of i.i.d. N(0, P(N+Q)/(P+N+Q)) random variables. The effective channel after the modulo Λ operation and subtraction of the dither signal is (V - αS_1 + E') mod Λ, and the achievable rate is max{0, (1/2) log( P / ( P^2 Q/(P+N)^2 + P(N+Q)/(P+N+Q) ) )}.

In the subsequent block, the achievable rates at the two receivers reverse, and the average rate in the corollary is achieved.
While the analogous scheme is strictly better than time-sharing in the binary case, the same is not true in the Gaussian case. The main difficulty is that if, in the first block, we decide to clean up user 1's channel, the noise injected into user 2's channel is significant, and a positive rate cannot be achieved for user 2 in this block for large (but finite) values of Q. One may decide not to completely clean up user 1 in the first block; one way to do this is to choose a value of α different from the optimal. Unfortunately, such a strategy does not yield large gains, and it remains an open problem to find a non-trivial rate that beats time-sharing over the whole range of interference powers.
3.6 Writing on many memories with defects
As noted earlier, the problem of writing on a memory with defects [19],[14] was one of the earliest applications that led to the general problem of coding for channels with non-causal side information at the transmitter [13] for point to point links. In this section we briefly summarize the classical memory with defects problem and then suggest some achievable rates for multiuser channels.
Suppose we have an array of n cells on which we wish to store an encoded file. Each cell can store either a 0 or a 1. A cell is defective with probability p and clear with probability 1 - p. A defective cell is either stuck at a 0 or a 1 (both events are equally likely), no matter what we write on it. A clear cell stores the bit that is written on it. A write-head scans the entire array to determine which of the cells are defective and notes the values they are stuck at. It uses this knowledge to encode the file into a binary codeword, so that when the file is subsequently read, it can be correctly decoded without knowing the positions of the defects. If the decoder knew this information, it is clear that the capacity of the memory would be 1 - p. Is there a loss in capacity if the decoder does not know the positions of the defects?
The surprising result is that there is no loss even if the receiver does not have this information. This is analogous to the result of dirty paper coding [4]. The coding scheme uses the idea of random binning. The main idea is that for large values of n, approximately np cells are stuck at either a 0 or a 1. If in each bin we can find a codeword that agrees with the pattern of defects, then we can use this codeword to store the information of the bin index without incurring any errors. There are n(1 - p) cells which are clear, and hence there are a total of 2^{n(1-p)} codewords which agree with any specific sequence of defects. If all these codewords are in different bins, then we can communicate at a rate of 1 - p. If the codewords are generated i.i.d. Bernoulli(1/2) and randomly partitioned into 2^{n(1-p-ε)} bins, then the codewords that agree with a given defect pattern are in fact in different bins, and we achieve the maximum possible rate. More practical schemes based on erasure correcting codes have been suggested for this and related applications [19],[41].

A slightly more general version of the single memory with defects problem is given in the following lemma:

Lemma 2 Consider a memory with defects whose cells can be in one of the following states (independent of all other cells):
- stuck at 0 with probability p/2,
- stuck at 1 with probability p/2,
- clear with probability 1 - p, but a clear cell flips the stored symbol with probability ε.

The capacity of this memory is given by [14] (1 - p)(1 - H(ε)).
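The random binning idea described above can be demonstrated at a toy scale. The sketch below is a hypothetical illustration (not the construction of [19]): every n-bit string is assigned to a bin by a fixed random partition; to store a message, the writer searches the strings that agree with the stuck cells for one that lands in the desired bin, and the reader simply looks up the bin of the stored string without knowing the defect locations.

```python
import random

n = 16                       # number of cells
bin_bits = 8                 # file rate 0.5, safely below 1 - p for p = 4/16
rng = random.Random(1)

# A random binning of all 2^n strings into 2^bin_bits bins.
bins = [rng.randrange(1 << bin_bits) for _ in range(1 << n)]

# Defect pattern: exactly 4 cells stuck at random values (only the writer sees this).
stuck = {cell: rng.randint(0, 1) for cell in rng.sample(range(n), 4)}
free = [c for c in range(n) if c not in stuck]

def write(message):
    """Find a string in bin `message` agreeing with every stuck cell."""
    base = sum(v << c for c, v in stuck.items())
    for trial in range(1 << len(free)):      # 2^12 candidate strings
        word = base
        for pos, c in enumerate(free):
            word |= ((trial >> pos) & 1) << c
        if bins[word] == message:
            return word
    raise RuntimeError("no agreeing codeword in this bin")

def read(word):
    return bins[word]        # decoding never needs the defect locations

stored = write(173)
print(read(stored))
```

With 2^12 agreeing strings and only 2^8 bins, an agreeing string exists in the target bin with overwhelming probability, mirroring the counting argument above.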
3.6.1 Two Memories with Defects
We are interested in the following multi-user generalization of the writing on memory with defects problem. There are two memories, each with n cells. Each cell is defective with probability p, independent of all other cells. There is only one write-head, which simultaneously writes on both memories. The write-head can scan the memories and learn the positions of the defects. A read-head that subsequently scans one of the two memories does not know these locations. How should one encode the file so that either memory alone can be used to decode the file at the maximum possible rate? The problem is still open. We propose the following achievable rates.
Corollary 6 An achievable rate for writing on two memories is R = (1 - p^2)/2. This scheme is asymptotically optimal as p → 1.

Proof: The main idea is to use different codebooks for the two memories. We first serve memory 1 at a rate 1 - p and write on all the clear cells of this memory. In the process, we treat the resulting symbols on memory 2 as additional stuck-at defects. Memory 2 then achieves a rate of p(1 - p), corresponding to the fraction of indices that have a clear cell in memory 2 and a defective cell in memory 1. For the next message, the order of the two memories is reversed. This yields a rate of ((1 - p) + p(1 - p))/2 = (1 - p^2)/2, as claimed. As p → 1, this approaches 1 - p, which is an obvious upper bound on the achievable rate.
The above coding scheme performs well as p → 1, but for small values of p it performs poorly. In fact, for p = 0 we achieve only 1/2 of the total rate. The use of independent codebooks for the two memories is suboptimal for small values of p. We now consider the other extreme, where we use the same codebook for the two memories.
Corollary 7 An achievable rate for writing on two memories is max(R_1, R_2), where

R_1 = (1 - 2p(1-p) - p^2/2) ( 1 - H( (p^2/4) / ((1-p)^2 + p^2/2) ) )

R_2 = (1 - p^2/2) ( 1 - H( (p/2 - p^2/4) / (1 - p^2/2) ) )
Proof:
Our proof is based on constructing a common codebook for the two memories. Let us define the event E_i^{jk}, i ∈ {1, ..., n}, j, k ∈ {c, 0, 1}, to be the event that at index i the cell of memory 1 is in state j and the cell of memory 2 is in state k. Here the state 'c' refers to a clear cell, the state '0' to a cell stuck at 0, and the state '1' to a cell stuck at 1. Since the cells are independent, we have Pr(E_i^{jk}) = Pr(j) Pr(k).

We argue that R_1 is achievable as follows. Consider a fictitious memory F whose ith cell F_i is marked as clear if E_i^{cc}, E_i^{01} or E_i^{10} occurs; F_i is stuck at 0 if E_i^{c0}, E_i^{0c} or E_i^{00} occurs; and F_i is stuck at 1 if E_i^{1c}, E_i^{c1} or E_i^{11} occurs. By the law of large numbers, F has approximately a (1-p)^2 + p^2/2 fraction of clear cells and a 2p(1-p) + p^2/2 fraction of stuck cells. If the event E_i^{01} or E_i^{10} occurs, there is an error in one of the memories, since the common codeword cannot agree with both stuck values. The fraction of these errors in either memory is ε = (p^2/4) / ((1-p)^2 + p^2/2). Accordingly, we let each cell marked clear on F be a BSC with crossover probability ε. It is clear that any rate achievable for F is also achievable for the constituent memories. We note that F is equivalent to the memory in Lemma 2 with ε_eff = ε and p_eff = 2p(1-p) + p^2/2; thus R_1 is achievable.

To show that R_2 is achievable, we consider another fictitious memory G, whose ith cell is labeled as clear if any of the events E_i^{cc}, E_i^{0c}, E_i^{1c}, E_i^{c0}, E_i^{c1}, E_i^{01} and E_i^{10} occurs. G_i is marked as stuck at 1 if E_i^{11} occurs and as stuck at 0 if E_i^{00} occurs. To ensure that any rate achievable for G is also achievable for the constituent memories, we require that each clear cell in G be a BSC with error probability ε' = (p/2 - p^2/4) / (1 - p^2/2). We now note that G is equivalent to a memory in Lemma 2 with ε_eff = ε' and p_eff = p^2/2, and thus the achievability of R_2 follows.
Figure 3-13 plots the achievable rates for the different schemes described in this section. We observe that for large p, it is better to code the two memories separately using the scheme described in Corollary 6. On the other hand, for small values of p, it is better to use the coding schemes described in Corollary 7 to achieve higher rates. It is still an open problem to find a strategy that is uniformly optimal.
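These comparisons can be reproduced numerically. The sketch below uses the rate expressions of Corollaries 6 and 7 as reconstructed above, together with the obvious outer bound 1 - p:

```python
import math

def H(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def r_independent(p):
    return (1 - p**2) / 2                          # Corollary 6

def r1_common(p):
    clear = (1 - p)**2 + p**2 / 2                  # Corollary 7, memory F
    return clear * (1 - H((p**2 / 4) / clear))

def r2_common(p):
    clear = 1 - p**2 / 2                           # Corollary 7, memory G
    return clear * (1 - H((p / 2 - p**2 / 4) / clear))

for p in (0.1, 0.5, 0.9):
    best_common = max(r1_common(p), r2_common(p))
    print(p, r_independent(p), best_common, 1 - p)  # rates vs the outer bound
```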
Figure 3-13: Achievable Rates and Outer Bound for writing on two memories with defects (R_1 and R_2 of the common-code scheme, the independent-code rate, and the outer bound, plotted against p).
Chapter 4
Conclusion and Future Work
The main focus of this thesis has been to understand fundamental limits and propose
practical architectures for multicasting in wireless networks. While the problem of sending independent messages to different users has been studied extensively in the literature,
relatively little work has been done for the case of multicasting common information.
In Chapter 2, we study the problem of multicasting from a base station having multiple transmit antennas. The fading environment plays an important role in the design of such systems. The problem is most interesting in the slow fading case, where different users experience different channel gains. Our focus in this study has been on the limit of a large number of users. We show that multiple antennas provide huge gains under many different performance criteria. The main intuition is that multiple antennas average out the fading effects and hence drastically improve the performance of the worst user. We also propose an architecture that uses erasure codes for communicating over block fading channels. This architecture sheds new light on the cross layer design of the application and physical layers. One important observation is that if the application layer exploits the time diversity of the wireless channel using an ARQ type protocol, it is not necessarily a good idea to design a physical layer that aims for a small outage probability. Furthermore, one should be careful in
interpreting the gains achieved from using multiple antennas in these systems. It is known
that spatial diversity does not provide substantial gains in systems with rich time diversity
[24]. Our study takes a further step and shows that to achieve these marginal gains one has
to operate very close to the optimal operating point. These gains diminish quickly as we
deviate from the optimal operating point. Finally we consider the problem of multicasting
to multiple groups with multiple antennas. We discuss the difficulty in generalizing the
dirty paper coding solution to these problems and motivate our work in Chapter 3.
In Chapter 3, we study the problem of multicasting over channels with interference known at the transmitter. We develop a model for a class of multicasting channels where the transmitter has to deal with more than one state sequence simultaneously, and observe that it is a non-trivial generalization of the single user problem studied previously. We derive the capacity of the two user binary case and propose several achievable rates for other models, such as the Gaussian channel and the memory with defects case. Our main conclusion for the binary case is that there is a fundamental loss if only the transmitter knows the interference. Furthermore, the value of side information diminishes as the number of users grows. These results are somewhat negative, since the point to point result states that there is no loss in capacity if only the transmitter knows the interference. We expect similar results to hold for channels other than binary channels, but we have not proved this in this thesis.
Future work: Understanding fundamental limits for multicasting in wireless networks is still a rich area, and a lot remains to be done. One open problem for future work is adapting rateless erasure codes to fading channels, as discussed in Section 2.5. The broadcast approach in Section 2.5.1 is a multilayered generalization of the single layer system that we consider and is a particularly attractive way of communicating with low delay on block fading channels. The main difficulty in using erasure codes with the broadcast approach is that the rates of the different layers are different. Hence we need erasure codes which can provide different rates for different layers. It is an open problem to design rateless erasure codes over different alphabets which can simultaneously be used for efficient decoding.
The problem of multicasting over channels with known interference has many open problems. The most general answer will be an extension of the Gel'fand-Pinsker result [13] to multicasting channels. In this thesis, we proposed a novel coding scheme for the binary channel, which is perhaps the simplest non-trivial channel in this class. The Gaussian channel still remains to be completely solved. The main difficulty in these problems is that, in contrast to the single user case, there is a fundamental loss if only the transmitter knows the additive interference. The solution to these problems will require novel coding techniques to achieve non-trivial inner bounds, while deriving sharp upper bounds for these channels
may require some tricks not yet known.
Appendix A
Coding over multiple blocks in SISO Block Fading Channels
In this appendix we analyze the performance improvement obtained by using erasure codes that span K blocks. We assume a SISO system with Rayleigh fading in the high SNR (ρ → ∞) regime. For such a system, the ergodic capacity is given by C_erg = log ρ - γ nats [38], where γ ≈ 0.577 is Euler's constant. We now consider the rate achieved by coding over K blocks. We state the following two lemmas:
Lemma 3 (Chernoff Bound) If X = Σ_{i=1}^K X_i is a sum of K i.i.d. random variables, then Pr(X ≤ a) ≤ (E[e^{-sX_i}])^K e^{sa} for all s > 0.
Lemma 4 [42] If X ~ χ²_{2n}, then

E[(1 + ρX)^{-s}] = (1/ρ^n) Ψ(n, n + 1 - s, 1/ρ)

where Ψ(·) is a confluent hypergeometric function of the second kind with the following property as ρ → ∞:

Ψ(n, n + 1 - s, 1/ρ) ≈  (Γ(n-s)/Γ(n)) (1/ρ)^{s-n}       if s < n
                        Γ(s-n)/Γ(s)                      if s > n
                        (log ρ - ψ(n) + 2ψ(1))/Γ(n)      if s = n

Here Γ(·) is the regular Gamma function and ψ(·) is the digamma function.
Let I_K = Σ_{i=1}^K log(1 + ρ|h_i|²), and let X_i = log(1 + ρ|h_i|²). We compute ε = Pr(I_K < KR):

ε = Pr( Σ_{i=1}^K X_i < KR )
  ≤ (E[e^{-sX_i}])^K e^{sKR}                         (a)
  = (E[(1 + ρ|h_i|²)^{-s}])^K e^{sKR}
  = ((1/ρ) Ψ(1, 2 - s, 1/ρ))^K e^{sKR}               (b)
  ≈ (Γ(1-s)/ρ^s)^K e^{sKR}                           (A.1)

where (a) is due to the Chernoff bound and (b) follows from Lemma 4 with n = 1, since each |h_i|² is χ²_2.
Thus the effective rate R̄ = (1 - ε)R satisfies R̄ ≥ R(1 - (Γ(1-s)/ρ^s)^K e^{sKR}). We now maximize this lower bound over s and R. Differentiating with respect to R and rearranging terms, we get¹:

(Γ(1-s)/ρ^s)^K e^{sRK} (1 + sRK) - 1 = 0
e^{sRK} (1 + sRK) = ρ^{sK} / Γ(1-s)^K
sRK + log(1 + sRK) = sK log ρ - K log Γ(1-s)
R = log ρ - (log Γ(1-s))/s - (log(1 + sKR))/(sK)          (A.2)
Since we are free to choose any s > 0 and we have K → ∞, we select s → 0 such that sK → ∞. One such choice is s = 1/log K. With this choice, we have

R = log ρ - lim_{s→0} (log Γ(1-s))/s - lim_{sK→∞} (log(1 + sKR))/(sK)
  → log ρ - γ - (log R)/(sK)
  → log ρ - γ - (log log ρ)/(sK)
¹This step is heuristic since we are differentiating an approximation. To be rigorous, one should differentiate the entire expression and then make the high SNR approximation.
Finally, we need to calculate the outage probability ε, since R̄ = (1 - ε)R. From (A.1), we have

ε ≤ (Γ(1-s)/ρ^s)^K e^{sKR}
  = exp( sK ( R + (log Γ(1-s))/s - log ρ ) )
  = exp( -log(1 + sKR) )                          (from (A.2))
  = 1/(1 + sKR) → 0 as sK → ∞

Thus, as K → ∞, R̄ = (1 - ε)R → log ρ - γ - O((log log ρ)/(sK)) nats; for high SNR and large K, we approach the ergodic capacity according to O((log log ρ)/(sK)). By choosing s = K^{-β} (where β > 0 can be arbitrarily small), we satisfy both sK → ∞ and s → 0, and thus we approach the capacity as O(K^{-(1-β)}).
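The conclusion of this appendix can be illustrated with a small Monte Carlo sketch (illustrative parameters; |h_i|² drawn as unit-mean exponentials for Rayleigh fading):

```python
import math
import random

random.seed(0)
rho = 1e4                                  # SNR (40 dB)
C_erg = math.log(rho) - 0.5772156649       # log(rho) - Euler gamma, in nats

def effective_rate(K, trials=4000):
    """max over R of (1 - outage(R)) * R when coding over K fading blocks."""
    samples = []
    for _ in range(trials):
        s = sum(math.log(1 + rho * random.expovariate(1.0)) for _ in range(K))
        samples.append(s / K)              # per-block mutual information
    samples.sort()
    best = 0.0
    for i, R in enumerate(samples):        # empirical outage at threshold R is i/trials
        best = max(best, (1 - i / len(samples)) * R)
    return best

for K in (1, 4, 16):
    print(K, effective_rate(K), C_erg)
```

The effective rate increases with K and approaches C_erg, as the analysis above predicts.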
Appendix B

Proof of Proposition 2

We prove the following result used in Proposition 2: the minimum of n i.i.d. χ²_{2L} random variables decreases as O(n^{-1/L+ε}) for large n.

Let X_1, X_2, ..., X_n be i.i.d. random variables with cumulative distribution function F(·), and suppose W_n = min_{i ∈ {1,2,...,n}} X_i is the minimum of the n random variables. Let α(F) = inf{x : F(x) > 0} be the lower end point of the distribution. The following results are known [10] (Sec. 2.1, pg. 58).
Proposition 3 Let α(F) = -∞. Assume there is a constant γ > 0 such that for all x > 0,

lim_{t→-∞} F(tx)/F(t) = x^{-γ}                                (B.1)

Then there is a sequence d_n > 0 such that

lim_{n→∞} Pr(W_n < d_n x) = L_{1,γ}(x)

where

L_{1,γ}(x) = 1 - exp(-(-x)^{-γ})   if x < 0
           = 1                      if x ≥ 0

The normalizing constant d_n can be chosen as¹

d_n = sup{ x : F(x) ≤ 1/n }

¹There is a certain amount of freedom in choosing the normalizing constant, but for our purpose this choice of d_n is sufficient.
Proposition 4 Let α(F) be finite. Suppose that the distribution function F*(t) = F(α(F) - 1/t) for t < 0 satisfies (B.1). Then there are sequences c_n and d_n > 0 such that

lim_{n→∞} Pr(W_n < c_n + d_n x) = L_{2,γ}(x)                  (B.2)

where

L_{2,γ}(x) = 1 - exp(-x^γ)   if x > 0
           = 0                if x ≤ 0

The normalizing constants c_n and d_n can be chosen to be

c_n = α(F)                                                     (B.3)
d_n = sup{ x : F(x) ≤ 1/n } - α(F)                             (B.4)

We apply Proposition 4 to the χ²_{2L} distribution. The distribution function is given by

F(x) = 1 - e^{-x} ( 1 + x + x²/2! + ... + x^{L-1}/(L-1)! )   if x ≥ 0
     = 0                                                      if x < 0      (B.5)
Clearly, α(F) = 0, and F*(t) in Proposition 4 is given by

F*(t) = 1 - e^{1/t} ( 1 - 1/t + 1/(2! t²) - ... + (-1)^{L-1}/((L-1)! t^{L-1}) )   if t < 0
      = 0                                                                          if t ≥ 0      (B.6)
Now we verify that F*(·) satisfies (B.1). By l'Hôpital's rule,

lim_{t→-∞} F*(tx)/F*(t) = lim_{t→-∞} (dF*(tx)/dt) / (dF*(t)/dt)
                        = lim_{t→-∞} [ (-1/(tx))^{L-1} e^{1/(tx)} / (x t²) ] / [ (-1/t)^{L-1} e^{1/t} / t² ]
                        = x^{-L}

Thus Proposition 4 applies to F*(·) with γ = L. Accordingly, using Proposition 4, we have c_n = 0. To solve for d_n, we use the approximation that for small x, (B.5) is approximately given by F(x) ≈ x^L / L!. Hence we have from (B.4) that d_n = O(n^{-1/L}), and d_n → 0 as n → ∞.
From (B.2), it follows that

lim_{n→∞} Pr(W_n < d_n x) = 1 - exp(-x^L)   for x > 0          (B.7)

Thus, for a χ²_{2L} distribution, as n → ∞, the smallest element decreases as O(n^{-1/L+ε}).
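This n^{-1/L} scaling is easy to observe empirically. The sketch below (with the χ²_{2L} variable realized, in the unit-rate normalization of (B.5), as a sum of L unit-mean exponentials) estimates the median of W_n for two values of n and checks that it shrinks by roughly (n_2/n_1)^{1/L}; the sample sizes are illustrative.

```python
import random
import statistics

random.seed(0)
L = 2                                   # chi-square with 2L degrees of freedom

def chi2_2L():
    # unit-rate normalization used in (B.5): a sum of L Exp(1) variables
    return sum(random.expovariate(1.0) for _ in range(L))

def median_min(n, trials=400):
    return statistics.median(min(chi2_2L() for _ in range(n))
                             for _ in range(trials))

m_small, m_large = median_min(50), median_min(5000)
ratio = m_small / m_large
print(ratio)   # should be near (5000/50)**(1/L) = 10
```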
Bibliography
[1] G. Caire and S. Shamai. On the achievable throughput of a multiantenna Gaussian broadcast channel. IEEE Trans. Inform. Theory, 49(7):1691-1706, July 2003.
[2] B. Chen and G. W. Wornell. Quantization index modulation: A class of provably good methods for digital watermarking and information embedding. IEEE Trans. Inform. Theory, IT-47:1423-1443, May 2001.
[3] A. S. Cohen and A. Lapidoth. Generalized writing on dirty paper. In Proc. Int. Symp. Inform. Theory (ISIT), Lausanne, Switzerland, page 227, July 2002.
[4] M. H. M. Costa. Writing on dirty paper. IEEE Trans. Inform. Theory, IT-29:439-441, May 1983.
[5] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 1991.
[6] I. Csiszar and J. Korner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, 1981.
[7] U. Erez, S. Shamai (Shitz), and R. Zamir. Capacity and lattice strategies for cancelling known interference. IEEE Trans. Inform. Theory, submitted, June 2002.
[8] G. D. Forney, Jr. Digital Communications II (6.451), course notes. MIT, Spring 2003.
[9] G. D. Forney, Jr. Exponential error bounds for erasure, list, and decision feedback schemes. IEEE Trans. Inform. Theory, 1974.
[10] Janos Galambos. The Asymptotic Theory of Extreme Order Statistics. Robert E. Krieger Publishing Company, 1986.
[11] R. G. Gallager. Information Theory and Reliable Communication. Wiley, New York,
N.Y., 1968.
[12] A. El Gamal. Network information theory - summer Course notes. EPFL, 2003.
[13] S. I. Gelfand and M. S. Pinsker. Coding for channel with random parameters. Problemy
Pered. Inform. (Problems of Inform. Trans.), 9, No. 1:19-31, 1980.
[14] C. Heegard and A. El Gamal. On the capacity of computer memory with defects. IEEE
Trans. Information Theory, IT-29:731-739, Sept. 1983.
[15] Sidharth Jaggi, Philip A. Chou, and Kamal Jain. Low complexity algebraic multicast
network codes. In InternationalSymposium on Information Theory, 2003.
[16] N. Jindal and A. Goldsmith.
Dirty paper coding vs. TDMA for MIMO broadcast
channels. In IEEE International Conf. on Communications (ICC), June, 2004.
[17] Young-Han Kim, Arak Sutivong, and Styrmir Sigurjonsson. Multiple user writing on
dirty paper. In InternationalSymposium on Information Theory, 2004.
[18] R. Koetter and M. Medard. An algebraic approach to network coding. IEEE/A CM
Transactions on Networking, 11:782-795, 2002.
[19] A. V. Kuznetsov and B. S. Tsybakov. Coding in a memory with defective cells. translated from Prob. Peredach. Inform., 10:52-60, April-June, 1974.
[20] M. Lopez.
Multiplexing, Scheduling, and Multicasting Strategies for Antenna Arrays
in Wireless Networks. PhD dissertation, MIT, EECS Dept, June-August 2002.
[21] M. Luby. LT codes. In The 43rd Annual IEEE Symposium on Foundationsof Computer
Science, 2002.
[22] E. Malkamaki and H. Leib. Evaluating the performance of convolutional codes over block fading channels. IEEE Trans. Inform. Theory, 45(5):1643-46, July 1999.
[23] A. Narula, M. Lopez, M. Trott, and G. Wornell. Efficient use of side information in multiple-antenna data transmission over fading channels. IEEE Journal on Selected Areas in Communications, 16(8):1423-36, October 1998.
[24] A. Narula, M. Trott, and G. Wornell. Performance limits of coded diversity methods
for transmitter antenna arrays. IEEE Trans. Inform. Theory, 45(7):2418-33, November
1999.
[25] J. Nonnenmacher. Reliable Multicast Transport to Large Groups. PhD thesis, EPFL
Lausanne, Switzerland, 1998.
[26] L. Ozarow, S. Shamai, and A. Wyner. Information theoretic considerations for cellular mobile radio. IEEE Transactions on Vehicular Technology, 43(2):359-78, May 1994.
[27] P. Sanders, S. Egner, and L. Tolhuizen. Polynomial time algorithms for network information flow. In Proc. of the 15th ACM Symposium on Parallelism in Algorithms and Architectures, 2003.
[28] C. E. Shannon. Channels with side information at the transmitter. IBM Journal of
Research and Development, 2:289-293, Oct. 1958.
[29] A. Shokrollahi. Raptor codes. Pre-print, 2003.
[30] Y. Steinberg. On the broadcast channel with random parameters. In International
Symposium on Information Theory, 2002.
[31] A. Steiner and S. Shamai. Multi-layer broadcasting in a MIMO channel. In Conference
on Information Science and Systems, March 2004.
[32] E. Telatar. Capacity of multi-antenna Gaussian channels. European Transactions on Telecommunications (ETT), 10:585-596, November/December 1999.
[33] D. N. C. Tse and P. Viswanath. Fundamentals of Wireless Communications. Working
Draft, 2003.
[34] P. Viswanath and D. N. C. Tse. Sum capacity of the vector Gaussian broadcast channel and uplink-downlink duality. IEEE Trans. Inform. Theory, 49(8):1912-21, August 2003.
[35] S. Viswanath, N. Jindal, and A. Goldsmith. On the capacity of multiple input multiple output broadcast channels. In Proc. of ICC 2002, New York, NY, USA, 28 April-2 May 2002.
[36] Hanan Weingarten, Yossef Steinberg, and Shlomo Shamai (Shitz). The capacity region
of the Gaussian MIMO broadcast channel. In CISS, 2004.
[37] J. Wolfowitz. Coding Theorems of Information Theory. Springer-Verlag, New York, 1964.
[38] G. W. Wornell and L. Zheng. Principles of wireless communication - Course notes. Massachusetts Institute of Technology, 2004.
[39] W. Yu and J. M. Cioffi. Trellis precoding for the broadcast channel. Preprint, Globecom 2001. See also: The sum capacity of a Gaussian vector broadcast channel, submitted to IEEE Trans. Inform. Theory.
[40] W. Yu, A. Sutivong, D. Julian, T. Cover, and M. Chiang. Writing on colored paper. In
Proc. Int. Symp. Inform. Theory (ISIT2001), page 302, Washington, DC., USA, June
24-29, 2001.
[41] R. Zamir, S. Shamai (Shitz), and U. Erez. Nested linear/lattice codes for structured multiterminal binning. IEEE Trans. Inform. Theory, IT-48:1250-1276.
[42] Hao Zhang and Tommy Guess. Asymptotical analysis of the outage capacity of rate-tailored BLAST. In Globecom, 2003.