Coding Techniques for Multicasting

by

Ashish Khisti

B.A.Sc., Engineering Science, University of Toronto, 2002

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of

Master of Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 2004

© Massachusetts Institute of Technology 2004. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 21, 2004

Certified by: Gregory Wornell, Professor, Thesis Supervisor

Certified by: Uri Erez, Postdoctoral Scholar, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students
Coding Techniques for Multicasting
by
Ashish Khisti
B.A.Sc., Engineering Science
University of Toronto, 2002
Submitted to the Department of Electrical Engineering and Computer Science
on May 21, 2004, in partial fulfillment of the
requirements for the degree of
Master of Science
Abstract
We study some fundamental limits of multicasting in wireless systems and propose practical
architectures that perform close to these limits. In Chapter 2, we study the scenario in which
one transmitter with multiple antennas distributes a common message to a large number
of users. For a system with a fixed number (L) of transmit antennas, we show that, as
the number of users (K) becomes large, the rate of the worst user decreases as O(K^{-1/L}).
Thus, multiple antennas provide significant gains in the performance of a multicasting
system with slow fading. We propose a robust architecture for multicasting over block fading
channels, using rateless erasure codes at the application layer. This architecture provides
new insights into the cross layer interaction between the physical layer and the application
layer. For systems with rich time diversity, we observe that it is better to exploit the time
diversity using erasure codes at the application layer rather than be conservative and aim
for high reliability at the physical layer. It is known that the spatial diversity gains are not
significantly high in systems with rich time diversity. We take a step further and show that
to realize these marginal gains one has to operate very close to the optimal operating point.
Next, we study the problem of multicasting to multiple groups with a multiple antenna
transmitter. The solution to this problem motivates us to study a multiuser generalization
of the dirty paper coding problem. This generalization is interesting in its own right and
is studied in detail in Chapter 3. The scenario we study is that of one sender and many
receivers, all interested in a common message. There is additive interference on the channel
of each receiver, which is known only to the sender. The sender has to encode the message
in such a way that it is simultaneously 'good' for all the receivers. This scenario is a
non-trivial generalization of the dirty paper coding result, since it requires that the sender
deal with multiple interferences simultaneously. We prove a capacity theorem for the special
case of the two-user binary channel and derive achievable rates for many other channel models,
including the Gaussian channel and the memory with defects model. Our results are rather
pessimistic, since the value of side information diminishes as the number of users increases.
Thesis Supervisor: Gregory Wornell
Title: Professor
Thesis Supervisor: Uri Erez
Title: Post Doctoral Scholar
Acknowledgments
First and foremost I would like to thank my two wonderful advisors, Greg Wornell and Uri
Erez. I am truly privileged to have the opportunity to work with the two of you. Greg
took me under his wings when I joined MIT and has ever since been a tremendous source of
wisdom and inspiration. My interactions with Uri have been simply remarkable. Any time
I had the slightest idea on my research problem, I could drop by his office and develop a
much clearer picture on what I was thinking about. I will always remember the summer of
2003 when we spent several hours together each day trying to crack the binary dirty paper
coding problem. I have learned so many things from the two of you in such a short while
that it is difficult to imagine myself two years earlier.
Special thanks to my lab mates, Albert Chan, Vijay Divi, Everest Huang, Hiroyuki Ishii,
Emin Martinian, Charles Swannack and Elif Uysal. Particular thanks to Emin Martinian
for several interesting research discussions. It was great to have you, Vijay, as my office
mate, with whom I could freely talk about almost anything that came to
mind. Special thanks to Giovanni Aliberti, our systems administrator for providing us with
delicious homemade sandwiches during the Thursday lunch meetings and introducing me
to the wonderful world of Mac OSX. My deepest regards to our admin, Tricia Mulcahy for
making all the complicated things at MIT look so simple. Also thanks to Shashi Borade
and Shan-Yuan Ho for being wonderful travel companions in my trip to Europe last year.
Thank you Watcharapan Suwansantisuk for inviting me to the Thai festival.
I would like to thank the wonderful researchers at HP Labs, Palo Alto for inviting me to
visit their labs several times during the last two years. I would especially like to acknowledge
some fruitful interactions with Mitch Trott, John Apostolopoulos and Susie Wee at the HP
labs and hope that they continue in the future.
Finally, I would like to thank my grandmother, my mother and my sister for being extremely supportive during my course of studies. It is nice to have three generations of ladies
supporting my upbringing! I could never have gone this far without your support.
Contents

1 Introduction

2 Multicasting from a Multi-Antenna Transmitter
   2.1 Channel Model for Multicasting
   2.2 Serving all Users - No Transmitter CSI
       2.2.1 Single User Outage
       2.2.2 Outage in Multicasting
   2.3 Serving all Users - Perfect Transmitter CSI
       2.3.1 Naive Time-Sharing Alternatives
   2.4 Serving a fraction of Users
   2.5 Multicasting Using Erasure Codes
       2.5.1 Architecture Using Erasure Codes
       2.5.2 Analysis of the Achievable Rate
       2.5.3 Layered Approach and Erasure Codes
   2.6 Multicasting to Multiple Groups
       2.6.1 Transmitting Independent Messages to each user
       2.6.2 Multiple Groups

3 Multicasting with Known Interference as Side Information
   3.1 Point to Point channels
   3.2 Multicasting channels
   3.3 Binary Channels - 2 User noiseless case
       3.3.1 Coding Theorem
       3.3.2 Converse
       3.3.3 Discussion of the Coding Scheme
       3.3.4 Random Linear Codes are Sufficient
       3.3.5 Alternative Proof of the Coding Theorem
       3.3.6 Practical Capacity Approaching Schemes
   3.4 Binary Channels - More than two receivers
       3.4.1 Improved Inner Bound for K > 2 users
       3.4.2 Binary Channel with Noise
   3.5 Gaussian Channels
   3.6 Writing on many memories with defects
       3.6.1 Two Memories with Defects

4 Conclusion and Future Work

A Coding over multiple blocks in SISO Block Fading Channels

B Proof of Proposition 2
List of Figures

2-1 Outage Probability for a Single User in MISO system
2-2 K · E[R] vs. K for L = 1, 2, ..., 6 antennas. We need to take the expectation over R, since we are considering a finite number of users.
2-3 The pdf of F with L = 2, 8. The mean of both distributions is the same, but the pdf for L = 8 has much shorter tails. This helps when we aim for low values of F, but hurts when we aim for very high values of F.
2-4 A time division protocol for serving 4 groups of users. Each group has multiple users that want the same content. The channel coherence time is large enough that the channel gain is constant in a given block, but small enough that successive blocks to a given group have independent gains.
2-5 Low SNR analysis of erasure-code-based multicasting. (a) The optimal value of η (normalized throughput) as a function of the number of antennas and the corresponding optimizing value of R/C_erg (normalized target rate). (b) The optimal outage probability as a function of the number of antennas. (c) η as a function of R/C_erg for L = 1, 4, 10 transmit antennas.
2-6 Analysis of the erasure-code-based multicasting at SNR = 50 dB. (a) Optimal value of C as a function of the number of antennas and the corresponding optimizing value of R. (b) The optimal outage as a function of the number of antennas. (c) η as a function of R − C_erg for L = 1, 4, 10 transmit antennas.
3-1 Point to Point channel with state parameters known to the encoder
3-2 Dirty Paper Coding Channel
3-3 Two User Multicasting Channel with Additive Interference
3-4 Achievable rates for the two user multicasting channel when S_1 and S_2 are i.i.d. The x-axis is Pr(S_i = 1).
3-5 Coding for two user multicasting channel
3-6 Architecture for 2 user Binary Multicasting Channel
3-7 K user multicasting channel
3-8 Upper bound for K user multicasting channel
3-9 Improved Architecture for the 3 user channel
3-10 Optimal p, I(U; S_1, S_2, S_3) vs. q
3-11 Inner Bound, Improved Inner Bound and Outer Bound
3-12 Two User Multicasting Gaussian Dirty Paper Channel
3-13 Achievable Rates and Outer Bound for writing on two memories with defects
List of Tables

3.1 The probability distribution p(U|S_1, S_2, S_3) of the coding scheme. Here p is a parameter to be optimized.
Chapter 1
Introduction
The problem of distributing common content to several receivers is known as multicasting. It
arises in many popular applications such as multimedia streaming and software distribution.
It is pertinent to both wireless and wireline networks. Some early work [25] in multicasting
has focused on developing efficient, scalable protocols for multi-hop networks such as the
Internet and mobile ad hoc networks (MANETs). These protocols provide on-demand service
so that the entire network does not get flooded with data packets.
More recent work [18] on wireline multicasting has shown that the fundamental rate can be
increased if the intermediate nodes perform coding. Polynomial time algorithms for efficient
network coding have also been suggested [27, 15]. Yet another direction [21, 29] has been to
model the end-to-end links as erasure channels and develop codes that can be efficiently
decoded by several users experiencing different channel qualities. These codes are suitable
for multicasting over the Internet, where packet loss is the main source of error.
In this thesis, we establish some fundamental limits and propose robust architectural
solutions for multicasting in wireless networks. A signal transmitted from one node in a
wireless network is received by all other nodes surrounding it.
While this effect causes
interference if different users want different messages, it is beneficial in the multicasting
scenario. Despite this inherent advantage, the problem has not been widely explored in the
wireless setting. We are only aware of the work in [20] that studies some array processing
algorithms and fundamental limits on multicasting in wireless systems.
Multicasting is not a new application in wireless. Radio and TV broadcasting systems
are among the earliest examples of analog wireless multicasting. However, the
engineering principles that govern the design of these systems are very different from the
modern view. These networks have a powerful transmitter designed to maximize the range of
signal transmission. The transmitter is mounted on tall towers so that many receivers are in
direct line-of-sight and receive good quality signal. Unfortunately, the data rate available in
such systems is limited, and modern digital television systems use wireline solutions such as
cable TV for high bandwidth transmission. The enormous growth of wireless telephony over
the last decade has been made possible in part through the new digital wireless networks.
These networks, also known as cellular networks, divide the geographical area into smaller
cells and reuse the available spectrum in different cells so that they can support a large
number of users. Furthermore these systems do not rely on line of sight communication, but
rather exploit the structure of wireless fading channel to develop new algorithms for reliable
communication. These networks have been primarily designed for providing individualized
content to each user. However, the next generation cellular networks are expected to support
wireless data and streaming media applications where several users are interested in the
same content. As these applications are deployed, it is important to develop new efficient
architectural solutions. Clearly, if all the receivers want the same content, a system that
creates a separate copy for each receiver is far from efficient. Consider for example an indoor
sports stadium where the audience have personalized screens on which they can watch the
game more closely and listen to the commentary. A central access point provides content to
all the receivers over wireless links. Unlike the analog TV/radio multicasting systems, these
digital systems cannot rely on line of sight communications and have to combat fading. At
the same time they are not efficient if they do not exploit the fact that all the receivers
want a common message.
In Chapter 2 of this thesis we will study some architectural issues for digital multicasting
over wireless systems. The performance of these systems is often limited by the worst user
in the network. We observe that multiple transmit antennas provide substantial gains in
these systems as they greatly improve the quality of the worst channel. We also propose
a robust architecture that uses rateless erasure codes. This study provides new insights
into the cross layer design of the application and physical layers. One conclusion we draw
is that in wireless systems with rich time diversity, it is not necessarily a good idea to be
conservative in the design of the physical layer and aim for small outage probability if there
is an ARQ type protocol at the application layer.
Another scenario we study is multicasting to multiple groups of users. It is quite natural
to conceive examples where instead of a single group, there are many groups of users and
different groups of users want a different message. This scenario is a generalization of the
conventional unicasting systems and the single group multicasting system. In Chapter 2
of this thesis, we study some coding schemes for the scenario where the base station has
multiple antennas and wishes to send different messages to different groups of users. This
particular problem leads us to study a multiuser generalization of the dirty paper coding
scheme.
The classical dirty paper coding scenario has one sender, one receiver and an
additive interference known only to the sender.
It has been used to solve the problem
of unicasting from a multi-antenna transmitter [1, 36]. We are interested in the scenario
where a transmitter wishes to send a common message to many receivers and each of the
channels experiences an additive interference that is known only to the sender but not to the
receivers. The transmitter has to use the knowledge of the interference sequences so that
the resulting code is good for all the receivers simultaneously. A solution to this particular
abstraction provides an efficient solution to the problem of multicasting to multiple groups
with multiple antennas.
We devote Chapter 3 of this thesis to consider this problem in
detail. We view this problem as a multiuser generalization of the single user link studied in
[13] where the transmitter has to deal with only one interfering sequence. We refer to this
generalization as "Writing on many pieces of Dirty Paper at once". We obtain achievable
rates for a variety of channel models and prove capacity theorems for some special cases.
This scenario is rich with many open problems and we describe some of them in Chapter 4.
Chapter 2
Multicasting from a Multi-Antenna
Transmitter
In this chapter we focus on the problem of sending a common stream of information from
a base station to a large number of users in a wireless network. The scenario where one
transmitter communicates to several receivers is known as the broadcast channel problem.
A technique known as superposition coding is optimal for the Gaussian
broadcast channel when the transmitter and all the receivers have a single antenna
[5]. Recently, this result has been generalized to the case where the transmitter and receivers
have multiple antennas and each receiver wants an independent message. The solution uses
a technique known as dirty paper coding [4]. This technique was first used for the MIMO
broadcast channel in [1]. Subsequent work in [34, 35, 39] shows that this technique achieves
the sum capacity point and very recently Weingarten et al. [36] show that this scheme is in
fact optimal. However, little work has been done when there is common information to be
sent in the network. We are only aware of the work in Lopez et al. [23, 20] which studies
several schemes for multicasting common information in wireless networks.
Before designing a system for disseminating common content in a wireless network, we
must estimate the gains we expect from such architectures over existing systems that encode
several copies of the same message for different users. It is also important to understand
how these gains are affected by our design decisions. For example how do these gains change
if we decide to serve only a fraction of the best users, instead of all the users? How does
the fading environment and the number of users affect such systems? In this chapter we
seek answers to some of these questions.
Another consideration is whether the fading coefficients are known to the sender and/or
the receiver. Pilot assisted training sequences are often used so that the receivers can learn
the fading coefficients. If the receivers have perfect knowledge of the fading coefficients, the
system is called a coherent communication system; systems where neither the transmitter
nor the receiver has this knowledge are called non-coherent communication systems.
In this chapter we focus only on coherent systems. Providing the knowledge of
fading coefficients to the transmitter typically requires explicit feedback from the receivers
in FDD (frequency division duplex) systems or uplink measurements in TDD (time division
duplex) systems. This knowledge of fading coefficients is known as channel state information (CSI) at the transmitter. It is necessary to assess the value of providing CSI to the
transmitter. In this chapter we study the systems with and without transmitter CSI.
2.1 Channel Model for Multicasting
We consider a scenario in which one base station wishes to send the same message to K
different users. The base station is equipped with L antennas while each user has a single
antenna.
Thus, each receiver experiences a superposition of L different paths from the
base station. We focus on an environment that has rich enough scattering and no line-of-sight between the base-station and the receivers. Furthermore, we consider narrow-band
communication systems so that the time interval between two symbols is much larger than
the delay spread due to multipath. Under these conditions, the channel gains on each of
the L paths can be modeled as i.i.d. complex Gaussian CN(0, 1) [33]. Furthermore, it is
noted in [33] that the typical coherence time for such environments is on the order of a few
hundred symbols: the channel gains stay constant for a relatively large number of symbols.
The relative magnitude of the coherence time compared to the communication delay tolerated by the application has important implications for the fundamental data rates achieved
by the system as well as the coding schemes that have to be used to achieve these rates.
We present three different scenarios of interest. In our model we consider transmission of
codewords consisting of n symbols. We denote symbol i ∈ {1, 2, ..., n} in codeword m by
x[i; m]. We use subscript k to denote the channel model of the kth user.
(i) Slow Fading: In some applications, the delay requirements are stringent. Several
different codewords have to be transmitted within one coherence time interval. In such
situations, we assume that the channel gains are fixed once they are drawn randomly.
The overall channel model is given by
y_k[i; m] = h_k^† x[i; m] + w_k[i; m]    (2.1)
The channel gains are each drawn i.i.d. CN(0, 1), but they are fixed for subsequent
transmissions.
(ii) Block Fading: If the delay requirements are not too stringent, a block fading model
is suitable. Here the channel remains constant over a codeword transmission, but
changes independently when the next codeword is transmitted. Accordingly, the channel model is given by
y_k[i; m] = h_k^†[m] x[i; m] + w_k[i; m]    (2.2)
The channel gains are drawn i.i.d. before the transmission of each codeword and
assumed to remain fixed during the transmission of the entire codeword.
A slight generalization of the block fading channel model is one in which a codeword
spans several blocks (say T blocks). Equivalently, in this model, the channel is
constant over n/T consecutive symbols and then changes independently for the next
block of symbols.
(iii) Fast Fading: If the delay requirements are extremely relaxed, the transmitter can
perform interleaving of different codeword symbols. Accordingly, each symbol experiences a different channel gain and the channel model is given by
y_k[i; m] = h_k^†[i; m] x[i; m] + w_k[i; m]    (2.3)
In this model, the channel gains are assumed to be drawn i.i.d. before the transmission
of each symbol in each codeword.
In (2.1)-(2.3), w_k[i; m] is additive Gaussian noise CN(0, σ²) at receiver k when symbol i
of codeword m is received. These equations explicitly reveal how the channel gain changes
from symbol to symbol and codeword to codeword.
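To make the three models concrete, the following sketch (illustrative only, not from the thesis) draws the channel gains at the three time scales and forms the received symbols according to (2.1)-(2.3). The dimensions L, n, M and the noise variance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

L, n, M = 4, 100, 50           # antennas, symbols per codeword, number of codewords
sigma2 = 1.0                   # noise variance (arbitrary)

def cn(shape):
    """i.i.d. circularly symmetric complex Gaussian CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

x = cn((M, n, L))                      # transmitted symbols across the L antennas
w = np.sqrt(sigma2) * cn((M, n))       # additive noise w_k[i; m]

# (i) Slow fading, eq. (2.1): one draw of h, fixed for every codeword.
h_slow = cn(L)
y_slow = x @ h_slow.conj() + w

# (ii) Block fading, eq. (2.2): fresh draw per codeword m, constant within it.
h_block = cn((M, L))
y_block = np.einsum('ml,mil->mi', h_block.conj(), x) + w

# (iii) Fast fading, eq. (2.3): fresh draw for every symbol i of every codeword m.
h_fast = cn((M, n, L))
y_fast = np.einsum('mil,mil->mi', h_fast.conj(), x) + w
```

With unit-power antennas, the average received power in the fast fading case is E|h^† x|² + σ² = L + σ², which the simulation reproduces.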
The choice of a good performance criterion for the multicasting system is intimately
related to the fading channel model. In the slow fading scenario we can order the users
based on their channel gains. However such an ordering cannot be done in the fast fading
model since the channel changes independently with each transmitted symbol. All users
have statistically identical channels. In such a scenario, it is most appropriate to focus on
the ergodic capacity of the channel. Since each user has the same ergodic capacity, it is
clear that this rate is achievable for multicasting. The ergodic capacity of such a channel is
given by [32]
C = E[log(1 + (ρ/L) h^† h)]    (2.4)

where ρ is the input SNR.
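Since h^† h is a sum of L i.i.d. unit-mean exponentials (a Gamma(L, 1) variable), the ergodic capacity (2.4) is easy to estimate by Monte Carlo. The sketch below is illustrative only; the SNR value and trial count are arbitrary, and rates are computed in bits (log base 2).

```python
import numpy as np

rng = np.random.default_rng(1)

def ergodic_capacity(L, snr, trials=200_000):
    """Monte Carlo estimate of E[log2(1 + (snr/L) * h^†h)] for i.i.d. CN(0,1) gains."""
    g = rng.gamma(shape=L, scale=1.0, size=trials)   # h^†h ~ Gamma(L, 1)
    return np.mean(np.log2(1.0 + (snr / L) * g))

# Every user sees the same ergodic capacity, so it is a multicast-achievable rate.
c1 = ergodic_capacity(L=1, snr=10.0)
c4 = ergodic_capacity(L=4, snr=10.0)
```

At the same total SNR, more antennas concentrate the effective gain around its mean, which raises E[log(·)] since log is concave; hence c4 exceeds c1.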
In the slow fading scenario, one obvious choice is to serve all the users. In this case
the system is limited by the worst user in the network. Such a choice may be necessary
if all the users have subscribed to a service.
On the other hand in some applications it
might be worthwhile not to serve some of the weakest users. This may be necessary in
applications, such as streaming, that require the stream to be transmitted at a fixed
rate.
In the block fading scenario, it is not advisable to serve the weakest user in each
block, since the ordering of users changes from block to block. Instead it is attractive to
exploit the time diversity available in these systems to improve the overall throughput. In
the following sections we discuss achievable rates based on these choices.
2.2 Serving all Users - No Transmitter CSI
As discussed before, a key factor that affects system design in slow fading is the availability
of channel state information (CSI) at the transmitter. In this section we are concerned with
the scenario where the transmitter has no knowledge of the channel. This happens if there
is no reverse path from the receivers to the transmitter or if the delay constraints prohibit
the transmitter from waiting for measurements from the receivers. Since the transmitter does
not know the channel gains, we typically design a code for some fixed rate R. A common
figure of merit is the so-called outage probability, which we discuss next.
2.2.1 Single User Outage
We begin by defining the ε-outage probability for a single user link as in [26].
Definition 1: The ε-outage probability for a fixed rate R is given by

ε = min_{p(x): E[x^† x] ≤ P} Pr(I(x; y, h) < R)    (2.5)
In the above definition, P is the power constraint and I(·; ·) is the standard mutual
information between two random variables. Note that h is explicitly included along with
y to stress that the receiver, but not the transmitter, has knowledge of the channel vector.
For Gaussian channels, we have

max_{p(x): E[x^† x] ≤ P} I(x; y, h) = max_{Λ_x: tr(Λ_x) ≤ P} log(1 + h^† Λ_x h / N_0)

It is conjectured in [32] that for low outage the optimal Λ_x = (P/L) I_L, and accordingly (2.5) reduces to

ε = Pr(log(1 + (ρ/L) h^† h) < R)

Here ρ = P/σ² is the input SNR of the system. Since h^† h ~ (1/2) χ²_{2L}, under the assumption of small values of ε and high SNR it can be shown that the outage probability is given by [38, 24]

ε ≈ (2^R − 1)^L L^L / (L! ρ^L)    (2.6)

2.2.2 Outage in Multicasting
In multicasting, we declare an outage if any user fails to decode. This is clearly an extremely
stringent requirement. Nevertheless, even in this case, significant gains can be achieved
by using a modest number of transmit antennas. If the transmitter fixes a data rate R, the
resulting outage probability is given by:
ε_M = Pr(min{I_1, I_2, ..., I_K} < R)

Here I_j = I(x; y_j, h_j) is the mutual information between the transmitter and user j.
Since we assume i.i.d. channel gains and ε ≪ 1, we have

ε_M = 1 − (1 − ε)^K ≈ Kε    (2.7)
[Figure 2-1 plot: outage probability as a function of input SNR (dB) for R = 0.1 b/sym and R = 0.5 b/sym, with L = 2 and L = 4 antennas.]
Figure 2-1: Outage Probability for a Single User in MISO system
Combining equations (2.6) and (2.7) we get
ε_M ≈ K (2^R − 1)^L L^L / (L! ρ^L)    (2.8)

Note that if we hold R and ε_M constant, then K/ρ^L remains constant. Thus, doubling the
SNR increases the number of users the system can support at the same ε_M by a factor of
2^L. This leads us to the following conclusion.

Claim 1: For a system with L transmitter antennas, operating at a fixed rate R bits/symbol
in the high SNR regime and with small outage, every 3 dB increase in SNR increases the
number of multicasting users that can be supported by a factor of 2^L.
This result essentially says that if we are designing a system that operates with a low
outage probability, then we can serve many more users simultaneously by a modest increase
in the SNR. It is a consequence of the well known fact that the outage probability of a
multi-input single-output (MISO) system decays as SNR^{−L} in the high SNR regime. From
(2.7), it is clear that if we want to achieve a certain ε_M for the system, the outage probability
for each user must satisfy ε = ε_M/K. Accordingly, the additional SNR should drive the
outage for each user from ε_M to ε_M/K. This reduction in outage can be achieved by a
modest increase in SNR, thanks to the steepness of the waterfall curve. See Figure 2-1.
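As an illustrative numerical check (a sketch, not part of the thesis), the code below estimates the single-user outage by Monte Carlo, compares it with the small-outage approximation (2.6), and then verifies the multicast relation (2.7) and the 2^L-users-per-3-dB rule of Claim 1. All parameter values are arbitrary.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(2)

L, R = 2, 0.5                         # antennas, target rate (b/sym); arbitrary
snr = 10 ** (15 / 10)                 # 15 dB

# Single-user outage by Monte Carlo: Pr(log2(1 + (snr/L) h^†h) < R), h^†h ~ Gamma(L, 1)
g = rng.gamma(shape=L, scale=1.0, size=2_000_000)
eps_mc = np.mean(np.log2(1.0 + (snr / L) * g) < R)

# Small-outage / high-SNR approximation (2.6): eps ≈ (2^R - 1)^L L^L / (L! snr^L)
eps_ap = (2.0**R - 1.0) ** L * L**L / (factorial(L) * snr**L)

# Multicast outage (2.7): with K i.i.d. users, eps_M = 1 - (1 - eps)^K ≈ K * eps
K = 100
eps_m_exact = 1.0 - (1.0 - eps_ap) ** K
eps_m_approx = K * eps_ap

# Claim 1: eps scales as snr^{-L}, so at a fixed rate and outage target,
# doubling the SNR (+3 dB) multiplies the number of supportable users by 2^L.
def users_supported(snr, eps_target):
    per_user = (2.0**R - 1.0) ** L * L**L / (factorial(L) * snr**L)
    return eps_target / per_user

ratio = users_supported(2 * snr, 1e-2) / users_supported(snr, 1e-2)   # ≈ 2^L
```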
2.3 Serving all Users - Perfect Transmitter CSI
In this section, we examine the other extreme when the transmitter has perfect knowledge
of each channel. There are two possible uses of this channel knowledge:
(i) Smart array processing to maximize the throughput.
(ii) Selecting a rate which all the users can decode.
The use of channel knowledge in array processing for multicasting is studied in [20]. It
is shown that beamforming is optimal for the special case when there are either one or two
receivers. However, beamforming is not optimal in general, and a technique called space-time
multicast coding is proposed and shown to be optimal for any number of users. The
capacity of the multicast network with transmitter knowledge derived in [20] is
C = max_{Λ_x: tr(Λ_x) ≤ P} min_{i∈{1,2,...,K}} log(1 + h_i^† Λ_x h_i / N_0)    (2.9)
The numerical optimization of Λ_x can be performed efficiently, since the problem is a
convex optimization problem. However, the answer depends on the specific realization of
the channel vectors and in general is not amenable to analysis. As K → ∞, it is natural
to expect that Λ_x → (P/L) I_L, where I_L is the L × L identity matrix. The intuition is that as
the number of users becomes large, there is no preferred direction and hence it is favorable
to use an isotropic code. We primarily consider the scenario of a large number of users
and use Λ_x = (P/L) I_L. In this limit, the channel knowledge is not helpful in performing array
processing. We have
R = min_{i∈{1,2,...,K}} log(1 + (ρ/L) h_i^† h_i) = log(1 + (ρ/L) min_{i∈{1,2,...,K}} h_i^† h_i)    (2.10)

We set g = min_{i∈{1,2,...,K}} h_i^† h_i. Its distribution can be easily calculated from the distribution of h and is stated in the following proposition.
Proposition 1: The probability distribution function of g = min_{i∈{1,2,...,K}} h_i^† h_i is given by

f_g(x) = K f_{h†h}(x) (1 − F_{h†h}(x))^{K−1}

where f_{h†h}(·) and F_{h†h}(·) are the probability density and cumulative distribution functions
of h^† h, respectively.
Proof:

Pr(g ≥ G) = Pr(‖h_1‖² ≥ G, ‖h_2‖² ≥ G, ..., ‖h_K‖² ≥ G)
          = Pr(‖h_1‖² ≥ G)^K    (since all the channels are i.i.d.)
          = (1 − F_{h†h}(G))^K

Differentiating both sides, we get the desired result. ∎
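Proposition 1 is easy to check numerically in its survivor form, Pr(g ≥ G) = (1 − F_{h†h}(G))^K. For L = 2 antennas, h^† h ~ Gamma(2, 1) with CDF F(x) = 1 − e^{−x}(1 + x). The sketch below (illustrative; K, G and the trial count are arbitrary) compares the formula against simulation.

```python
import numpy as np

rng = np.random.default_rng(3)

L, K, trials = 2, 50, 100_000

# Each user's gain h_i^†h_i ~ Gamma(L, 1); g is the minimum over the K users.
g = rng.gamma(shape=L, scale=1.0, size=(trials, K)).min(axis=1)

def F_hth(x):
    """CDF of h^†h for L = 2 antennas: 1 - e^{-x}(1 + x)."""
    return 1.0 - np.exp(-x) * (1.0 + x)

G = 0.1
emp = np.mean(g >= G)              # empirical Pr(g >= G)
thy = (1.0 - F_hth(G)) ** K        # Proposition 1 in survivor form
```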
Using results from order statistics, we show in Appendix B that as K → ∞, g =
min_i h_i^† h_i decays as O(K^{−1/L}), where L is the number of antennas. This leads to the following
proposition.

Proposition 2: For fixed ρ, as the number of users K → ∞, the common rate R which
all the receivers can decode decays as O(K^{−1/L+δ}), where L is the number of antennas and
δ > 0 can be made arbitrarily small.
Proof: From (2.10), we have

R = min_{i∈{1,2,...,K}} log(1 + (ρ/L) h_i^† h_i)
  ≈ (ρ/L) min_{i∈{1,2,...,K}} h_i^† h_i    (for large K, we use the linear approximation)
  = O(K^{−1/L+δ})    (see Appendix B)
∎
In Figure 2-2, we numerically calculate the quantity E[R] using Proposition 1 and plot
the product K · E[R] as a function of K. Note that Proposition 2 suggests that as the
number of users K becomes large, the common rate R approaches a non-random quantity.
However, for any finite K, R is still a random variable, and hence we use the average value
of R to observe its decay rate. We see qualitatively from the graph that for L = 1 the
average value of R decays as 1/K (since K · E[R] approaches a constant as K increases),
but for L ≥ 2 the decay rate of R is much slower (since K · E[R] increases with K). Hence,
19
KE[R] for M=1,2,3,4 antennas for k=5-200 users
150
100
-
-e- M=1
-*-+------
M=2
M=3
M=4
M=5
M=6
500
E
-
10
50
Number of Users (K)
100
150
200
Figure 2-2: K - E[R] vs. K for L=1,2 ... 6 antennas. We need to take the expectation over
R, since we are considering a finite number of users.
multiple antennas provide substantial gains in the achievable rate, analogous to the gains
we observed in the outage probability in the previous section.
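The scaling behavior behind Figure 2-2 can be reproduced with a small simulation. The following sketch is illustrative (it is not the thesis's code, and ρ = 1 is an arbitrary choice); it draws ||h_i||² ~ Gamma(L, 1) and shows that K · E[R] is roughly flat for L = 1 but grows with K for L = 2.

```python
import math, random

random.seed(2)
rho, trials = 1.0, 4000

def avg_rate(K, L):
    """Monte Carlo estimate of E[R] with R = log(1 + (rho/L) min_i ||h_i||^2)."""
    total = 0.0
    for _ in range(trials):
        g_min = min(random.gammavariate(L, 1.0) for _ in range(K))
        total += math.log(1.0 + rho / L * g_min)
    return total / trials

scaled = {L: [K * avg_rate(K, L) for K in (50, 200)] for L in (1, 2)}
print(scaled)   # L=1: roughly constant in K; L=2: grows with K
```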
2.3.1 Naive Time-Sharing Alternatives
We now compare the achievable rate of the previous section to naive schemes that do not exploit the fact that all the users want a common message. The achievable rates from these schemes, in comparison to Proposition 2, will indicate the value of exploiting common information in the system. The scheme described in Proposition 2 requires that all users simultaneously listen to the transmission and is limited by the worst user. Suppose we "pretend" that each user wants a different message and then perform resource allocation to ensure that all users get the same amount of data. We consider two such schemes and show that both these schemes incur a significant loss in the achievable rate compared to Proposition 2.¹

¹Note that time-sharing is in general not optimal for broadcast channels with independent messages. Ideally we should do dirty paper coding. However, we limit ourselves to time-sharing type strategies, since they are easy to analyze and it is known that TDMA is not too bad compared to dirty paper coding [16].
(i) Allocation based on time-slots: Users are divided up into time-slots, so that only one user is active in each time-slot. The rate of the active user in a particular time-slot is given by R_i = log(1 + ρ h_i† h_i), where h_i is the received channel vector of the active user. Since all the users need to get the same message eventually, we need to allocate the time-slots inversely proportional to their rates. This can be expressed mathematically as:

    α_1 R_1 = α_2 R_2 = ... = α_K R_K

such that Σ_{i=1}^K α_i = 1. This gives

    R_ts,1 = 1 / (1/R_1 + 1/R_2 + ... + 1/R_K) = (1/K) H(R_1, ..., R_K) ≤ (1/K²) Σ_{i=1}^K R_i    (2.11)

In (2.11), H denotes the harmonic mean of R_1 ... R_K. Using the fact that as K → ∞, (1/K) Σ_{i=1}^K R_i → E[log(1 + ρ h†h)], it follows that R_ts,1 is at most (1/K) E[log(1 + ρ h†h)] for large K. Note that the term inside the logarithm increases with the number of antennas L, but this is only a secondary effect. The rate decreases as 1/K for any number of transmit antennas, in contrast to Proposition 2 where the rate decreases as (1/K)^{1/L}.
(ii) Allocation based on power-levels: Users are each assigned an equal number of time-slots but the power level assigned to each user is inversely proportional to the channel strength. Accordingly, we have that

    α_1 ||h_1||² = α_2 ||h_2||² = ... = α_K ||h_K||²

where the average power constraint requires (1/K) Σ_{i=1}^K α_i = ρ. This gives

    R_ts,2 = (1/K) log(1 + Kρ / (1/||h_1||² + 1/||h_2||² + ... + 1/||h_K||²))

Since log(1 + 1/x) is a convex function in x, it follows that

    R_ts,2 ≤ (1/K²) Σ_{i=1}^K log(1 + ρ h_i† h_i)

As K → ∞, it is clear that R_ts,2 ≤ (1/K) E[log(1 + ρ h†h)]. Again the rate achieved through this scheme decreases as 1/K for any number of transmit antennas.
Note that in these schemes, we do not have to transmit at the rate of the worst user
all the time. Despite this seeming advantage, these schemes perform poorly. The intuition
behind the poor performance of the time-sharing based schemes is that we have to allocate
more resources to the worst user (in terms of time-slots or power levels). It hurts if the
users do not listen in all time-slots. Thus significant gains can be achieved by designing
appropriate protocols for multicasting.
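The gap between the common-message rate of (2.10) and the two time-sharing schemes can be made concrete with a quick Monte Carlo comparison. This sketch is ours (K = 100, L = 2, ρ = 1 are arbitrary illustrative choices); the channel gains ||h_i||² are drawn as Gamma(L, 1).

```python
import math, random

random.seed(3)
K, L, rho, trials = 100, 2, 1.0, 1000

r_common = r_ts1 = r_ts2 = 0.0
for _ in range(trials):
    g = [random.gammavariate(L, 1.0) for _ in range(K)]   # ||h_i||^2 per user

    # common message with an isotropic input (2.10): worst user limits the rate
    r_common += math.log(1.0 + rho / L * min(g))

    # scheme (i): time-slots inversely proportional to individual rates, (2.11)
    rates = [math.log(1.0 + rho * gi) for gi in g]
    r_ts1 += 1.0 / sum(1.0 / r for r in rates)

    # scheme (ii): equal slots, power inversely proportional to channel strength
    common_snr = K * rho / sum(1.0 / gi for gi in g)
    r_ts2 += math.log(1.0 + common_snr) / K

r_common, r_ts1, r_ts2 = (v / trials for v in (r_common, r_ts1, r_ts2))
print(round(r_common, 4), round(r_ts1, 4), round(r_ts2, 4))
```

Even at moderate K the common-message rate is several times larger than either time-sharing rate, consistent with the 1/K versus (1/K)^{1/L} scaling above.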
2.4 Serving a Fraction of Users
In this section, we relax the requirement that all the users have to be served and find the achievable rate when a fraction α of the weakest users is not served. We observed in the previous section that the rate decreases as O(K^{-1/L}) if all the users have to be served. How does the rate improve if we decide not to serve a certain fraction of the weakest users? In order to perform analysis and develop some insights, we again consider the limit of a large number of users. Let Γ_eff be the effective target SNR with the corresponding rate R = log(1 + ρ Γ_eff). An outage event occurs with probability

    α = Pr(||h||²/L < Γ_eff)    (2.12)
The usual notion of outage on a single user link is the fraction of time the channel is not strong enough to support a given rate. If the channel is in outage, the receiver cannot decode at all, and hence we would like this probability to be as small as possible. In the multicasting setting, outage has an alternative interpretation: it is the fraction of the weakest users who are not served. Thus in multicast there is no reason to restrict attention to small values of outage; instead, the outage probability lets us study the tradeoff between the fraction of weakest users that are not served and the rate that can be given to the other users.
Since E[||h||²] = L, the relation between the effective channel gain Γ_eff and the outage probability α is given by the following expression:

    1 - α = 1 - F_{||h||²}(L Γ_eff) = e^{-L Γ_eff} (1 + L Γ_eff + (L Γ_eff)²/2 + ... + (L Γ_eff)^{L-1}/(L-1)!)    (2.13)
In general, the above equation does not have an explicit solution for Γ_eff; however, we can consider the extreme cases:

(i) α → 0: In this case, we expect Γ_eff to be small since we are serving almost all users. Using L Γ_eff ≪ 1, we have from (2.13), Γ_eff ≈ (α L!)^{1/L} / L.

(ii) α → 1: In this case, we expect Γ_eff to be large, since we are serving only the best users. Accordingly, we have from (2.13) that 1 - α decays as e^{-L Γ_eff} up to polynomial factors in Γ_eff, and hence²

    Γ_eff ≈ -log(1 - α) / L    (2.14)
Equation (2.13) suggests that a large number of antennas actually hurts the performance when we aim to serve only the strongest users (α → 1). The intuition here is that when we are serving the strongest users, the more favorable distribution of the channel gains must have long tails at the right extreme. By having many antennas we average the path gains to each receiver and hence the extreme tails decay faster in both directions (see Figure 2-3). This feature helps if we decide to serve most of the users but is not desirable if we decide to serve only the best users. Note however that this observation comes with a caveat. We assume that the transmitter uses a spatially isotropic code for serving (1 - α)K users in the network. For values of α close to 1, in practice we are serving only a finite number of users. In this case a spatially isotropic code (i.e. Λ_x = (1/L) I_L) should not be used. It is worthwhile for the transmitter to learn the channel gains of the best users and do beamforming or the space-time multicast coding proposed by Lopez [20]. In other words, if we want to serve a fixed number of the best users then multiple antennas can again be useful, albeit for different reasons.
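Equation (2.13) is easy to invert numerically. The bisection sketch below is ours (illustrative values of α and L); it computes the effective gain Γ_eff supported when a fraction α of the weakest users is dropped, and reproduces the trend just discussed: more antennas help when α is small but hurt when α → 1.

```python
import math

def outage(gamma_eff, L):
    """alpha in (2.13): Pr(||h||^2 < L * Gamma_eff) with ||h||^2 ~ Gamma(L, 1)."""
    x = L * gamma_eff
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(L))

def gamma_for_outage(alpha, L):
    """Invert (2.13) for Gamma_eff by bisection (outage is increasing in gamma)."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if outage(mid, L) < alpha else (lo, mid)
    return (lo + hi) / 2

for alpha in (0.01, 0.99):
    print(alpha, {L: round(gamma_for_outage(alpha, L), 3) for L in (1, 2, 8)})
```

For L = 1, (2.13) reduces to 1 - α = e^{-Γ_eff}, so the solver can be checked exactly against Γ_eff = -log(1 - α).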
2.5 Multicasting Using Erasure Codes
Our treatment so far has not addressed the important question of how large an outage the system must tolerate. An obvious tendency is to keep the outage probability small. Such

²Here the approximation f(b) ≈ g(b) is in the sense that lim_{b→∞} f(b)/g(b) = 1.
Figure 2-3: The pdf of Γ for L = 2, 8. The mean of both distributions is the same, but the pdf for L = 8 has much shorter tails. This helps when we aim for low values of Γ, but hurts when we aim for very high values of Γ.
Figure 2-4: A time division protocol for serving 4 groups of users. Each group has multiple users that want the same content. The channel coherence time is large enough that the channel gain is constant in a given block, but small enough that successive blocks to a given group have independent gains.
a design is however conservative, as the corresponding target rate can be very small. When the channel is good, we can decode much more information than the conservative rate. If a system designer has the choice to pick the outage probability, is it better to pick 1% or 10%? Such a choice is important, as different outages imply different achievable rates. In order to get some insights into this particular problem, we need to pose the question in a broader context and consider the overall system architecture. A bigger picture that involves cross-layer design is necessary, as it lends some important insights on the interaction between the physical layer and higher layer protocols.

The application we consider is distributing common content to a group of users from a wireless access point. The access point serves many such groups in a round-robin fashion as shown in Figure 2-4. Due to the delay between successive periods in which a particular group is served, the block fading scenario (2.2) is a suitable model. Each user experiences
an i.i.d. Rayleigh fading channel which stays constant within a given block and changes independently in the next block. It is well known in the information theory literature [33] that the optimal encoding technique over a block fading channel is to jointly code over a very large number of blocks. The receiver knows the fading coefficients and incorporates this knowledge in maximum likelihood decoding, and this scheme achieves the ergodic capacity (2.4). Even though this scheme is information theoretically optimal, there are many issues that limit its use in practice. The physical layer implementation requires practical error correction codes that can be decoded at variable SNR. Unlike the AWGN case, practical capacity-approaching codes over these channels suffer from complexity constraints. Perhaps more serious is the fact that the code we use is a fixed block length code and there is a finite error exponent associated with the code even if we code over a very large number of blocks. In order to deal with an error event at the physical layer, a higher layer protocol has to be implemented. This can take one of the following forms:
" Automatic Repeat Request (ARQ): If there is only a single receiver a particularly
simple feedback based scheme can be used to deal with the physical layer outage. If the
receiver is not able to decode at the end of the transmission, it sends a retransmission
request to the sender. While such a feedback cannot increase the information theoretic
rate, it is known to improve the error exponent [813.
"
Forward Error Correction(FEC): When there are many receivers, the ARQ protocol cannot be used because different users lose different symbols. One approach is
to use an erasure code as an outer code. The original source file is first converted into
erasure symbols each of which is then encoded at the physical layer using a suitable
channel code.
The receivers can recover the original source file if they are able to
receive a sufficient number of the erasure code symbols.
While a significant amount of literature has been devoted to ARQ based schemes on block fading channels (see [22] and references therein), relatively little work has been done, to our knowledge, on FEC based schemes. The main problem is that traditional erasure codes are block codes and need to be designed for a specific rate which has to be chosen a priori [8]. Hence the problem of outage is not completely solved, since there is still a chance that some users with weak channels cannot decode even after the FEC code is used. Fortunately,

³This particular analysis assumes perfect feedback with no errors.
an elegant solution to this problem is to use an incremental redundancy rateless erasure code. These codes, also known as fountain codes, were suggested in [21], [29] for multicasting over the internet. Erasure symbols are generated from the source file in real time as long as any of the receivers are still listening. Unlike the traditional block erasure codes, we do not need knowledge of the erasure probability to generate these symbols. A receiver can recover the original file after it has collected a sufficient number of symbols. Thus, rateless codes allow variable rate decoding. A receiver is "online" until it collects a specific number of erasure symbols and then recovers the original file. Using the rateless erasure code as an outer code preserves the advantages of the ARQ based system without requiring feedback of which specific symbols were lost.
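As a toy illustration of the rateless idea (this is our sketch, not the LT/Raptor constructions of [21], [29], which use sparse degree distributions to get near-linear decoding), each output symbol below is the XOR of a random subset of source bits, and a receiver decodes by Gaussian elimination over GF(2) once it has collected slightly more than the source size, regardless of which particular symbols were erased.

```python
import random

def encode_symbol(source_bits, rng):
    """One rateless symbol: XOR of a uniformly random subset of the source bits.
    The mask stands in for the seed a real fountain code shares with receivers."""
    n = len(source_bits)
    mask = rng.getrandbits(n) or 1          # avoid the useless all-zero mask
    value = 0
    for i in range(n):
        if (mask >> i) & 1:
            value ^= source_bits[i]
    return mask, value

def decode(symbols, n):
    """Gaussian elimination over GF(2); returns the source bits once the
    collected symbols have rank n, else None (keep listening)."""
    pivots = {}                              # leading bit position -> (mask, value)
    for mask, value in symbols:
        while mask:                          # reduce against existing pivot rows
            pos = mask.bit_length() - 1
            if pos not in pivots:
                pivots[pos] = (mask, value)
                break
            pmask, pval = pivots[pos]
            mask, value = mask ^ pmask, value ^ pval
    if len(pivots) < n:
        return None
    bits = [0] * n
    for pos in sorted(pivots):               # back-substitute from the lowest pivot up
        mask, value = pivots[pos]
        for i in range(pos):
            if (mask >> i) & 1:
                value ^= bits[i]
        bits[pos] = value
    return bits

rng = random.Random(4)
source = [rng.randrange(2) for _ in range(32)]
collected, out = [], None
while out is None:
    sym = encode_symbol(source, rng)
    if rng.random() < 0.3:                    # this receiver loses 30% of symbols
        continue
    collected.append(sym)
    out = decode(collected, 32)
print(out == source, len(collected))          # True, with only a few extra symbols
```

The key property is visible in the loop: the transmitter never needs to know the erasure probability, and the receiver simply stays online until the collected symbols have full rank.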
2.5.1 Architecture Using Erasure Codes
We now describe an architecture that uses a rateless erasure code as an outer code. Each erasure symbol is sent over the channel using a standard channel code. In order to allow successful implementation of the channel code, the erasure symbol must be over a large alphabet so that it provides a large number of information bits. An important consideration is how many channel blocks (K) should be used for transmitting each symbol. The case of large K is analogous to the ergodic capacity achieving scheme we discussed earlier. It is however of practical interest to consider the case of K = 1. In this case we can use existing good codes for the AWGN channel for channel coding. The idea is to fix a data rate R which depends on the alphabet size of the erasure code. This would in turn result in an outage probability ε. The average throughput achieved in this scheme is (1 - ε)R. Optimization of the average throughput yields the best possible tradeoff between using a large data rate per erasure symbol and producing a small outage probability at the physical layer. The optimal value of ε that maximizes this average throughput would answer the question we posed in the first paragraph of this section as to what is a reasonable value for the outage probability in designing systems. We now describe this particular architecture in detail and calculate the optimal value of ε. Our focus is initially on the case when K = 1. In each block we send one erasure symbol encoded using a standard AWGN code. The case of large K will be dealt with subsequently.
(i) The transmitter converts the file into a very large number of erasure symbols, each consisting of nR information bits (n is the size of a block), using a rateless erasure code. The erasure code symbols⁴ can also be generated in real time. The encoding and decoding complexity of these codes is near linear.

(ii) In each block, the transmitter attempts to transmit one erasure symbol. It encodes this symbol using a suitable AWGN channel code at rate R.

(iii) Each receiver then tries to decode the packet received in each block. If its instantaneous mutual information in the block is higher than nR, it succeeds and the packet is decoded and stored. Otherwise an error is declared and the packet is discarded.

(iv) When a receiver obtains enough packets, it is able to decode the original file using the decoding algorithm for the rateless erasure code. If the original source file has nT bits then ⌈T/R⌉ + δ symbols are sufficient to recover the original file, where δ is a small constant depending on the choice of a particular code.
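Steps (i)-(iv) are straightforward to simulate. The sketch below is ours (single transmit antenna; ρ, R and T are arbitrary illustrative values, and the small constant δ is ignored): it counts fading blocks until a receiver has collected ⌈T/R⌉ packets and checks that the long-run throughput matches (1 - ε)R, with ε the per-block outage probability of (2.15).

```python
import math, random

random.seed(5)
rho, R, T = 10.0, 2.0, 1000.0     # SNR, nats per block per use, source size
need = math.ceil(T / R)           # packets required to recover the file

def blocks_until_done():
    """Steps (iii)-(iv): a block delivers its symbol iff log(1 + rho|h|^2) >= R."""
    got = blocks = 0
    while got < need:
        blocks += 1
        h2 = random.expovariate(1.0)          # |h|^2 for Rayleigh fading, L = 1
        if math.log(1.0 + rho * h2) >= R:
            got += 1
    return blocks

trials = 200
avg_blocks = sum(blocks_until_done() for _ in range(trials)) / trials
throughput = T / avg_blocks

eps = 1.0 - math.exp(-(math.exp(R) - 1.0) / rho)  # (2.15) specialized to L = 1
print(round(throughput, 3), round((1.0 - eps) * R, 3))   # close agreement
```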
Note that this particular architecture assumes feedback from the receiver: the transmitter has to know whether any receiver is listening to its packets. In practice there is always a handshaking protocol and session establishment between the receiver and the transmitter, so this requirement is naturally satisfied in most systems. There are many practical observations that favor such an architecture.
" Robustness to Channel Modeling: The architecture we presented is adaptive to
varying channel conditions. If the channel is weak, the user has to wait longer until it
receives enough erasure symbols. Conversely if the channel is strong the waiting time
is short. By overcoming the problem of outage experienced in fixed block codes, this
scheme is far less sensitive to channel modeling. The choice of channel model does
affect the optimal outage probability we aim in each block, but this mismatch is not
detrimental to the performance of the system.
* Computational Complexity: There is a useful separation between generating the
erasure code symbols and AWGN codewords. The time as well as complexity requirements for encoding and decoding of erasure symbols are near linear. Moreover efficient
4
Note that the erasure code symbols has to be over a very large alphabet since each such symbol must
provide nR information bits to the inner AWGN code. From now on, we will refer to these as erasure
symbols with the understanding that they are over large alphabets.
27
iterative algorithms for decoding the inner AWGN code are now widely available as
well. On the other hand if one were to directly encode the source file for a block fading
channel code the practical implementations are not as efficient, as discussed earlier.
* Better Error Exponent: In a fixed block length erasure code was used there are
two sources error (i) error event in decoding the inner AWGN code (ii) error event in
the outer erasure code when sufficient erasure symbols are not received. By using a
rateless code, and assuming that the receiver has a perfect feedback channel to indicate
when it is done, we eliminate the second event and improve the error exponent 5. This
type of improvement is analogous to the improvement in the ARQ based schemes [9],
[6](pg. 201) for single user links.
2.5.2 Analysis of the Achievable Rate
We now present some analysis of the achievable rates using the architecture we just described. We model the channel of each user as an independent block fading Rayleigh channel. The transmitter has L antennas while each receiver has one antenna. Suppose we decide to send an erasure symbol with nR information bits in each block. The probability of erasure depends on the choice of R. Let ε denote the probability that a packet gets lost for any given user. The average throughput for each user is then given by C = (1 - ε)R. We choose the optimal R that maximizes C. The probability of outage is given by:⁶

    ε = Pr(log(1 + ||h||² ρ/L) < R)
      = Pr(||h||² < G_eff)
      = 1 - e^{-G_eff} (1 + G_eff + G_eff²/2 + ... + G_eff^{L-1}/(L-1)!)    (2.15)

Here G_eff = L(e^R - 1)/ρ refers to the effective channel gain that we aim for based on our choice of R. If we aim for the ergodic capacity, for example, then G_eff = 1. The overall throughput is given

⁵Note that in practice it is not true that the backward channel is perfect. There is always going to be some error. In both slow fading and fast fading, we can show that the error exponent still improves with feedback.

⁶In this section we take logarithms to the base e for simplicity of calculations.
by

    C(G_eff) = e^{-G_eff} (1 + G_eff + G_eff²/2 + ... + G_eff^{L-1}/(L-1)!) log(1 + ρ G_eff / L)    (2.16)

To develop insight into the effect of multiple antennas on the throughput C, we consider the cases of low SNR and high SNR systems.
Low SNR

As the SNR ρ → 0, we can make the following simplifications:

(i) C_erg ≈ ρ

(ii) G_eff ≈ LR/ρ = L R/C_erg

(iii) log(1 + ρ G_eff / L) ≈ ρ G_eff / L

Accordingly, (2.16) simplifies to

    C(G_eff)/C_erg ≈ e^{-G_eff} (1 + G_eff + G_eff²/2 + ... + G_eff^{L-1}/(L-1)!) · G_eff/L
Figure 2-5(a) shows the optimal value of the normalized throughput C/C_erg as a function of the number of antennas, while Figure 2-5(b) shows the corresponding outage probability. For L = 1, we see that the optimal G_eff = 1, C/C_erg = 1/e and the corresponding outage probability is ε = 1 - 1/e. Thus the optimal system aims for a large outage probability. As the number of transmit antennas is increased, Figure 2-5(a) shows that the optimal value of R decreases initially, reaches a minimum around L = 6 and then slowly increases. The intuition behind this behavior is that the tails of the distribution function of ||h||² become sharper (cf. Figure 2-3). Consequently, by decreasing R we can decrease the outage substantially, and the overall effect is a net increase in the throughput as seen in Figure 2-5(a).
Figure 2-5: Low SNR analysis of erasure code based multicasting. (a) The optimal value of C/C_erg (normalized throughput) as a function of the number of antennas and the corresponding optimizing value of R/C_erg (normalized target rate). (b) The optimal outage probability as a function of the number of antennas. (c) C/C_erg as a function of R/C_erg for L = 1, 4, 10 transmit antennas.

Even though the optimal throughput increases with the number of antennas, one has to be careful in interpreting these results. We plot the function C as a function of R/C_erg in Figure 2-5(c) for L = 1, 4, 10 antennas; note that Figure 2-5(a) only plots the peak values. These plots reveal that even though higher overall gains are achievable with more antennas, the function C(·) becomes more peaky as L increases. One has to operate very close to the optimal R/C_erg as L increases. Since we are operating in the low SNR regime, this requires us to design and use strong AWGN codes for specific low rates. This may not be possible in practice, and Figure 2-5(c) shows that there is a large penalty in the throughput of a multiple antenna transmitter if one operates at rates not close to the optimal. Given a code that operates at a certain rate in the low SNR regime, one has to select the matching number of antennas to optimize the throughput. From Figure 2-5(c) we see that for R > 1.5 C_erg, it is better to have a single transmit antenna rather than 4 or more antennas. Thus one has to be careful in interpreting the performance gains from using multiple antennas.
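The low-SNR optimization can be reproduced in a few lines. The following illustrative sketch (ours, with an arbitrary grid search) maximizes the normalized throughput e^{-G}(Σ_{k<L} G^k/k!)·G/L over G. For L = 1 it recovers G_eff = 1, C/C_erg = 1/e and ε = 1 - 1/e, and the optimal normalized throughput grows with L, as in Figure 2-5(a).

```python
import math

def norm_throughput(G, L):
    """Low-SNR limit of (2.16) divided by C_erg = rho."""
    return math.exp(-G) * sum(G ** k / math.factorial(k) for k in range(L)) * G / L

def optimize(L, grid=20000):
    """Grid search over G in (0, 20]; returns (G_opt, C_opt/C_erg, outage)."""
    g_best = max((i * 1e-3 for i in range(1, grid + 1)),
                 key=lambda g: norm_throughput(g, L))
    eps = 1.0 - math.exp(-g_best) * sum(g_best ** k / math.factorial(k)
                                        for k in range(L))
    return g_best, norm_throughput(g_best, L), eps

for L in (1, 2, 6, 10):
    G, c, eps = optimize(L)
    print(L, round(G, 3), round(c, 4), round(eps, 4))
```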
High SNR

In the high SNR regime, we can make the following approximations:

(i) G_eff ≈ L e^R / ρ ≪ 1

(ii) ε = 1 - e^{-G_eff} (1 + G_eff + G_eff²/2 + ... + G_eff^{L-1}/(L-1)!) ≈ G_eff^L / L!

(iii) C(G_eff) ≈ (1 - ε) log(1 + ρ G_eff / L)

Using the above approximations, we obtain the following expressions for the optimal parameters:

    ε_opt ≈ 1 / (L log ρ)    (2.17)

    R_opt ≈ log ρ - (1/L) log log ρ + O(1/L)    (2.18)

    C_opt ≈ log ρ - (1/L) log log ρ - O(1/L)    (2.19)
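The prediction (2.17) can be checked directly by optimizing (1 - ε)·log(1 + ρG/L) numerically at ρ = 50 dB. The sketch below is ours (the grid and search range are arbitrary choices); the optimal outage decreases with L and agrees in order of magnitude with 1/(L log ρ).

```python
import math

rho = 10 ** 5.0                              # 50 dB

def eps(G, L):
    """Outage (2.15) as a function of the effective gain G."""
    return 1.0 - math.exp(-G) * sum(G ** k / math.factorial(k) for k in range(L))

def optimal_outage(L):
    grid = (i * 1e-3 for i in range(1, 10001))   # G in (0, 10]
    g_best = max(grid, key=lambda g: (1.0 - eps(g, L)) * math.log(1.0 + rho * g / L))
    return eps(g_best, L)

for L in (1, 4, 10):
    print(L, round(optimal_outage(L), 4), round(1.0 / (L * math.log(rho)), 4))
```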
In Figure 2-6, we numerically plot the optimal achievable rates at SNR = 50 dB. Figure 2-6(a) shows the optimal throughput and the corresponding target rate. Unlike the low SNR case, we observe here that the target rate increases with the number of antennas. The intuition behind this fact is that in the high SNR case the distribution of log(||h||²) is what matters, and this distribution has short tails on the right side of the mean even for L = 1. Accordingly, as seen in Figures 2-6(a),(c), the optimal value of R is chosen to the left of the ergodic capacity. In this regime, the outage is small. As the number of antennas increases, the target rate and the average throughput both appear to approach the ergodic capacity according to (1/L) log log ρ, as predicted by (2.18) and (2.19). Figure 2-6(b) shows the corresponding outage probability, which decreases according to 1/L as predicted by (2.17). Figure 2-6(c) plots the achievable throughput as a function of R - C_erg for L = 1, 4, 10 antennas. Analogous to the low SNR regime, we observe that the gains from having multiple antennas are prominent only if we operate close to the optimal R.
Figure 2-6: Analysis of the erasure code based multicasting at SNR = 50 dB. (a) Optimal value of C as a function of the number of antennas and the corresponding optimizing value of R. (b) The optimal outage as a function of the number of antennas. (c) C as a function of R - C_erg for L = 1, 4, 10 transmit antennas.
Coding over Multiple Blocks: So far we have considered using one erasure symbol per block and an AWGN channel code. As discussed earlier, instead of using an AWGN code we could have the inner channel code span multiple blocks and exploit the time diversity. The transmitter could use an erasure symbol over a larger alphabet with nKR information bits. If the total mutual information over the K blocks is less than nKR bits then the entire packet is discarded. The ergodic capacity approaching channel code requires K → ∞. However, as discussed in the earlier part of this section, practical schemes should not aim for a very large K. As we increase the number of blocks, we expect to approach the ergodic capacity. We analyze the performance improvements for a single antenna transmitter in Appendix A. We argue that for large K and high SNR, the achievable rate by coding over K blocks is C_opt = C_erg - O(K^{-(1-δ)} log log ρ) (where δ > 0 can be made arbitrarily small). This shows that even with a single antenna, we can perform reasonably well by coding over a small number of blocks.
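The benefit of coding over a few blocks can be illustrated with a short simulation of our own (L = 1 and ρ = 20 dB are arbitrary choices): the best throughput max_R (1 - ε_K(R))·R grows with the number K of jointly coded blocks and approaches the ergodic capacity.

```python
import math, random

random.seed(6)
rho, trials = 100.0, 4000                  # 20 dB SNR

def mi_per_block(K):
    """Mutual information averaged over K independent fading blocks (L = 1)."""
    return sum(math.log(1.0 + rho * random.expovariate(1.0))
               for _ in range(K)) / K

def best_throughput(K):
    samples = sorted(mi_per_block(K) for _ in range(trials))
    # setting R = samples[i] succeeds on a (trials - i)/trials fraction of draws
    return max(s * (trials - i) / trials for i, s in enumerate(samples))

c_erg = sum(mi_per_block(1) for _ in range(20000)) / 20000
print([round(best_throughput(K), 2) for K in (1, 4, 16)], round(c_erg, 2))
```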
2.5.3 Layered Approach and Erasure Codes
We now discuss another generalization of the communication technique using erasure codes, which uses the layered approach suggested in [31]. In the schemes that we have considered so far, we send at most one erasure symbol per block. If an outage occurs in a given block, then the receiver misses the corresponding symbol. The main issue with such a construction is that if the receiver is in outage, it misses the entire symbol. Ideally the receiver should be able to decode as much information as possible depending on its channel strength. The layered approach is a scheme where the users can recover a certain fraction of the information depending on their channel strength. However, we argue in this section that the gains from the layered approach are not substantial.
The layered approach over slow fading channels for MISO links is suggested by Steiner and Shamai [31]. The authors compute the average throughput for their scheme but do not explicitly consider the use of erasure codes to achieve it. The main idea in [31] is to send many different codewords in each block. For simplicity let us consider sending two different packets at rates R_1 and R_2 over each block. Note that the channel with multiple transmit antennas is a non-degraded broadcast channel and hence successive cancellation cannot be performed at the decoder. If the channel is very strong so that we can decode both the packets, we get a total rate of R_1 + R_2. This is accomplished by performing joint decoding between the two codewords. If the channel is not strong enough to perform the joint decoding, an attempt is made to decode each codeword, treating the other as noise. If this succeeds then at least one packet can be recovered. If the channel is extremely weak, none of the codewords can be decoded. Thus a receiver can receive either 0, 1 or 2 packets in each block depending on the strength of the channel. By collecting more packets, we reduce the average time necessary to collect a sufficient number of packets to decode the source. An optimization is performed over the possible rates R_1 and R_2 in [31] to maximize the average throughput. This is precisely the rate we achieve using an outer erasure code. As we add more and more layers, we expect to achieve higher gains at the cost of higher complexity, since more layers enable the receiver to get on average more information in each block. However, it is shown in [31] that the gains diminish quickly for more than two layers over a wide range of SNR. Two layers provide some noticeable gains over a single layer for moderately large SNR > 20 dB (see Figure 1, [31]). Adding multiple antennas does not provide a dramatic gain in capacity, as observed in [31]. The largest gains are achieved when going from one to two antennas and the gains quickly diminish, similar to our observation from the single layer scheme.
One open problem in implementing the multilayered broadcast approach is the design of compatible rateless erasure codes. One has to generate erasure symbols with different alphabet sizes for the two layers. The decoder has to be able to use both types of symbols to efficiently reconstruct the original file. The current implementations of rateless codes all generate symbols over a single alphabet. In the absence of such codes, one could use current rateless codes over very small alphabet sizes. Erasure code packets on each layer can be generated by using a collection of these small symbols. The number of symbols in each packet is proportional to the rate assigned to that layer. However, having erasure symbols over a small alphabet is not efficient from an implementation point of view, since it incurs large overheads. Another problem with the broadcast approach is that the receiver has to perform joint decoding of the codewords, which is prohibitively complex.
2.6 Multicasting to Multiple Groups
So far, we have focussed on the case in which all the users want a common message. In this
section, we consider a generalization to the case when different groups of users want different
messages. More specifically, there is one sender and many receivers. Each receiver belongs
to a specific group and all users within a given group want the same message. If there is
only one group, then the scenario degenerates to the case we have studied in the previous
sections. On the other hand, if the size of each group is one then the problem reduces to that
of sending independent messages to different users. In this section we discuss the general
problem when there is more than one group each with more than one user. The problem
is still open. We discuss some problems that need to be solved for this generalization and
provide some motivation for the next chapter of this thesis.
We begin by reviewing the known schemes for transmitting independent messages to
different users and then consider the more general case of multiple groups.
2.6.1 Transmitting Independent Messages to Each User
In this section, we review the scheme for sending a separate message to each user. This scheme was suggested by Caire and Shamai [1]. For simplicity, we consider the case of a 2 x 2 system where the transmitter has 2 antennas and each of the two receivers has one antenna. The channel model is given by

    y = Hx + w    (2.20)

We assume a slow fading model. Each entry of H is independent and CN(0, 1). The transmitter knows the realization of H. Also we have w ~ CN(0, I) and E[x†x] ≤ P. The main idea is to perform an LU decomposition of H, so that H = LU, where L is a lower triangular matrix and U is an orthogonal matrix. Such a scheme is optimal at high SNR. For general SNR, an MMSE generalization of this factorization has to be performed. If m is the message to be transmitted, we set x = U†m. Accordingly, we have:

    [y_1]   [l_11    0 ] [m_1]   [w_1]
    [y_2] = [l_21  l_22] [m_2] + [w_2]    (2.21)

The transformation x = U†m involves rotation by an orthogonal matrix, hence the power constraint and the noise variance are preserved. The result of this transformation is that user 1 receives an interference free AWGN channel.
On the other hand, user 2 experiences interference from user 1's message: the channel y_2 = l_21 m_1 + l_22 m_2 + w_2 has l_21 m_1 as additive interference. Since the transmitter knows the message of user 1, this interference is known at the transmitter, and a well known scheme called dirty paper coding [4] is used to code for this user. The capacity of the channel is the same as if the interference did not exist. This scheme can be easily generalized to the K user channel and was shown to be optimal recently [36].
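The triangularization step can be sketched in a few lines (an illustration of ours; the MMSE generalization and the dirty paper code itself are not shown). An LQ factorization, obtained here from numpy's QR applied to H†, gives H = LQ with Q unitary; precoding x = Q†m leaves user 1 interference free and user 2 with only the known interference term l_21 m_1.

```python
import numpy as np

rng = np.random.default_rng(7)

# 2x2 channel with i.i.d. CN(0,1) entries; row i is receiver i's channel
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)

# LQ factorization via QR of H^H:  H = L Q,  L lower triangular, Q unitary
Q0, R0 = np.linalg.qr(H.conj().T)
L, Q = R0.conj().T, Q0.conj().T

# precode x = Q^H m: each receiver sees y = H Q^H m + w = L m + w
effective = H @ Q.conj().T

print(np.round(np.abs(effective), 3))
# the (1,2) entry of the effective channel is zero: user 1 is interference
# free, while user 2 sees the known term l_21 m_1, which dirty paper coding
# removes at the transmitter
```

Because Q is unitary, the precoding preserves both the transmit power constraint and the noise statistics, exactly as stated in the text above.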
2.6.2 Multiple Groups
In this section we consider the case when there are multiple groups of users and each group
wants a different message. To simplify the exposition, we consider the case of a 4 x 4 system
with two groups and two users in each group. The input x is a 4 x 1 vector but there are
only two messages. How should we map the input message symbols to x? Inspired by
the independent message case in the previous section, we consider a linear transformation
x = Vm, where V is a 4 x 2 matrix 7. In order to satisfy the power constraint, we require
tr(V†V) ≤ 1. Also we select V such that L = HV is block triangular, as shown in the
following equation:
[y1]   [l11   0 ]        [w1]
[y2] = [l21   0 ] [m1] + [w2]        (2.22)
[y3]   [l31  l32] [m2]   [w3]
[y4]   [l41  l42]        [w4]
By performing the above transformation, receivers 1 and 2 have interference-free channels,
so message 1 can be decoded perfectly. On the other hand, receivers 3 and 4 have known
interference on their channels. How can the transmitter cope with this known interference?
The transmitter now has to deal with two interference sequences simultaneously. In the
special case that l31 = l41 and l32 = l42, both channels would be equivalent and
one could perform dirty paper coding. However, this choice of V is restrictive. Nonetheless,
we argue that even this choice of V is attractive for some special cases. We make use of the
following rule of thumb:
7 We do not use the same notation U as before because the matrix V is not orthonormal. We explicitly
need to ensure that the power is preserved by imposing the trace constraint.
Rule of Thumb: If an N antenna transmitter is used to null out in the direction of t < N
users (assuming i.i.d. Rayleigh fading), then effectively we have N - t degrees of freedom.

The above rule is motivated from the QR factorization of the i.i.d. Gaussian matrix H.
The ith diagonal entry of R satisfies |r_ii|^2 ~ chi-square with 2(N - i + 1) degrees of
freedom; we lose i degrees of freedom by nulling out i random directions. Now consider
the 4 x 4 system with two groups that we considered previously. For group 2 we impose
the constraint l31 = l41. This is equivalent to nulling out one direction for group 1. Hence
group 1 has three degrees of freedom. For group 2 we only have two degrees of freedom
since l12 = l22 = 0. The constraint l31 = l41 ensures that single user dirty paper coding
can be done to remove the interference seen by group 2. With this scheme group 1 enjoys
an extra degree of freedom. If there are N groups each with 2 users and 2N transmit
antennas, then the ith, i ∈ {1, 2, ..., N}, group has (N - i + 2)
degrees of freedom. On the other hand if we performed zero forcing to all the other users
in each group, we only have 2 degrees of freedom for each group. Thus a simple scheme
that employs dirty paper coding and takes into account that users in each group want a
common message can do better.
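As a sanity check on this rule of thumb, the following is a small Monte Carlo sketch of my own (not from the thesis), using real-valued Gaussians for simplicity: projecting a random channel vector onto the orthogonal complement of t random directions leaves a residual squared gain that is chi-square with N - t degrees of freedom (the complex-valued case in the text has 2(N - t) real degrees of freedom).

```python
import numpy as np

rng = np.random.default_rng(1)
N, t, trials = 4, 1, 20000

gains = []
for _ in range(trials):
    h = rng.standard_normal(N)          # channel vector of the served user
    A = rng.standard_normal((N, t))     # t random directions to null out
    Q, _ = np.linalg.qr(A)              # orthonormal basis for the nulled space
    h_perp = h - Q @ (Q.T @ h)          # project h onto the complement
    gains.append(h_perp @ h_perp)

# Residual squared gain is chi-square with N - t degrees of freedom,
# so its empirical mean should be close to N - t (here 3).
mean_gain = float(np.mean(gains))
```

The empirical mean of the residual gain concentrates around N - t, matching the "lose one degree of freedom per nulled direction" intuition.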
The choice of V used in the above argument is somewhat arbitrary. It allows us to
apply the single user dirty paper coding technique to the second group of users. We are not
restricted to this particular value of V if there is a technique that allows us to deal with
more than one interference simultaneously. More specifically, the channels for users 3 and 4
are given by y3 = l31 m1 + l32 m2 + w3 and y4 = l41 m1 + l42 m2 + w4, respectively. We would
like to encode the message m2 in such a way that both users see as little interference
from m1 as possible 8. This problem is a multiuser generalization of the single user dirty paper
coding result, which deals with only one user. In the next chapter we study this problem in
detail. We consider a broader class of channels, channels with state parameters known to
the transmitter, and develop some multiuser generalizations for these channels. By analogy
to [4], we view this generalization as "Writing on many pieces of Dirty Paper at once".
We obtain some achievable rates for the special case of Gaussian channels with Gaussian
interference, which we are interested in for the present application. However the capacity
of this channel remains an open problem.
8
For the 4 x 4 system we only need to deal with the case when the two interfering sequences for users 3 and
4 are scalar multiples of each other. But for larger systems, we need to consider more arbitrary interferences
Chapter 3
Multicasting with Known
Interference as Side Information
In this chapter, we study some coding techniques for multicasting channels which experience
an additive interference known to the transmitter.
Channels with interference known to
the transmitter model many different applications. In digital watermarking for example,
we wish to embed a watermark onto a host signal. The host signal is treated as known
interference in encoding the watermark [2]. In the last chapter (see also [1],[36]) we observed
that when the transmitter has different messages to send to different users, it can encode one
message treating the other as known interference. In this chapter we consider the problem
of coding for channels with known interference in detail. This particular scenario is a special
case of a broader class of channels, known as channels with state parameters known to the
transmitter. We examine this class of channels initially and then specialize to the known
interference case.
Communication channels controlled by state parameters known to the transmitter were
first introduced by Shannon [28]. Shannon studies a point to point channel, whose transition
probability depends on a state variable which is known to the transmitter but not to the
receiver. In Shannon's model the transmitter becomes aware of the state parameters during
the course of transmission (i.e. the transmitter has causal knowledge of the state sequence).
Gel'fand and Pinsker [13] study a similar channel model where the transmitter has knowledge of the entire state sequence before the transmission begins (i.e. the transmitter has
non-causal knowledge of the state sequence). The knowledge of state variables is commonly
referred to as side information. See Figure 3-1 for the model of such a channel.
Perhaps surprisingly, the problem of non-causal side information has found several important applications. In this chapter, unless otherwise stated, we focus on this scenario
and refer to it simply as side information. One of the early applications in this area was
coding for defective memory cells [19]. An array of cells, some of which are defective, has
to be used to store a file. The write-head scans through the array and locates the defective cells. It uses this side information to encode the file in such a way that a read-head
which subsequently reads the memory cells can retrieve the file without explicitly knowing
which cells were defective. In [19] the authors propose a class of algebraic codes that are
optimal in that they achieve the same rate as a scheme where both the read and write-head
have knowledge of the defective cells. The pioneering work in [13] establishes the capacity
of channels with non-causal side information at the transmitter in terms of an auxiliary
random variable and shows that the result in [19] is a special case of this capacity. The
result in [13] was subsequently specialized by Costa [4] for the case where the noise and the
state variable are additive Gaussian random variables. Rather surprisingly there is no loss
in capacity if the additive interference is known only to the sender. This special result by
Costa has been further generalized in point to point settings when the interference is an
arbitrary stationary sequence [7] and is now popularly known as coding for channels with
known interference.
Most of the prior work on channels with state parameters has focused on point to point
links. In this chapter we take the first steps towards a point to multipoint generalization of
channels with state parameters. We are interested in the case where there is one transmitter
and many receivers. Each receiver wants the same message. We refer to this as a multicasting
scenario. Throughout this chapter, we assume that the state sequence is known non-causally
to the transmitter but not to any of the receivers. The main challenge in this
generalization is that the transmitter has to deal with the state sequence in such a way
that the resulting code is 'good' for all the receivers simultaneously. Otherwise the weakest
receiver determines the common rate of the system. In many situations of interest, this is
in fact a non-trivial generalization of the schemes for point to point links. In section 3.1
we summarize the results of point to point links and in section 3.2 we formally introduce
the multicasting channel with side information. We then consider several special cases of
the multicasting channel in sections 3.3-3.6. We propose novel coding techniques for these
channels and also derive the capacity expressions for some special cases.

Figure 3-1: Point to point channel with state parameters known to the encoder. The
encoder maps the message W and the state sequence S^n to the input X^n = f(W, S^n); the
channel p(y|x, s) produces Y^n, from which the decoder recovers the message.
3.1
Point to Point channels
In this section we review the literature on point to point channels with side information at
the transmitter. As discussed before, we restrict ourselves to the case of non-causal side
information and refer to it simply as side information. We begin by providing a formal
definition of such a channel [13], shown in Figure 3-1.
Definition 2 A point to point channel with random parameters consists of an input
alphabet X, an output alphabet Y and a set of states S (which are all discrete alphabets).
It is specified by a transition probability matrix p(y|x, s) and the distribution p(s), for all
y ∈ Y, x ∈ X, s ∈ S. The channel is stationary and memoryless if p(y^n|x^n, s^n) =
Π_i p(y_i|x_i, s_i) and p(s^n) = Π_i p(s_i). We assume that the sender knows the particular
realization of s^n before using the channel.

A (2^{nR}, n) code for the channel defined above consists of an encoder
f : {1, 2, ..., 2^{nR}} x S^n → X^n and a decoder function g : Y^n → {1, 2, ..., 2^{nR}}.
Here n is the number of channel uses to convey the message and R is the rate. The
probability of error, averaged over all state sequences and all messages, is defined as
P_e = Pr(g(Y^n) ≠ W). An expression for the capacity of this channel was developed
in [13] in terms of an auxiliary random variable.
Theorem 1 The capacity of the point to point channel with random parameters is given
by

C = max_{p(u|s), p(x|u,s)} [ I(U; Y) - I(U; S) ]

where U is an auxiliary random variable that satisfies U → (X, S) → Y and
|U| ≤ |X| + |S| - 1. Furthermore, it suffices to restrict X to be a deterministic function
of U and S, i.e. X = f(U, S).

Figure 3-2: Dirty paper coding channel: Y = X + S + Z, with S ~ CN(0, Q), Z ~ CN(0, N),
and input power constraint E[X^2] ≤ P.
The achievability argument uses a random binning technique. Fix the distributions p(U|S)
and p(X|U, S). Generate 2^{n(I(U;Y)-ε)} sequences u^n i.i.d. according to p(U). This
guarantees that there is at most one sequence u^n jointly typical with the received sequence
y^n. Now, randomly partition the sequences u^n into 2^{nR} bins such that there are at
least 2^{n(I(U;S)+ε)} sequences in each bin. This guarantees that for any sequence s^n,
there is at least one u^n in each bin jointly typical with it. To transmit message w, find a
sequence u^n in bin w that is jointly typical with s^n. Then transmit the x^n which is
jointly typical with u^n and s^n according to p(X|U, S). If the decoder finds a unique
sequence typical with y^n, it declares the corresponding bin index to be the message. It
declares an error otherwise. The number of sequences and the size of each bin are carefully
chosen so that the probability of error goes to zero as n → ∞. In the above construction
R < I(U; Y) - I(U; S) guarantees successful decoding and this proves the achievability.
The proof of the converse is non-trivial and is derived through a novel chain of inequalities
in [13]. See [12] for a simpler proof.
Finding the optimal auxiliary variable is a non-trivial problem in general and depends
on the choice of the specific channel model. The above expression has been specialized to
an additive interference channel model in [4], which considers the channel model
Y = X + S + Z, where Z ~ CN(0, N) is additive Gaussian noise and S ~ CN(0, Q) is
additive interference known to the sender. See Figure 3-2. The transmitter also has a power
constraint E[X^2] ≤ P. The surprising result shown in [4] is that there is no loss in the
capacity compared to a channel where both the receiver and the transmitter have knowledge
of the interference. The resulting capacity is log(1 + P/N). This rate is achieved through
the random binning scheme used in the achievability proof of the Gel'fand-Pinsker result,
and the specific choice of U is given by U = X + αS, where X ~ CN(0, P) is chosen
independent of S and α = P/(P + N). This particular scheme is referred to as dirty paper
coding, due to the title of [4]. Recently this result has been generalized to the case of
arbitrary interference [7] and the case of non-stationary Gaussian interference [40],[3].
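The dirty paper rate can be checked numerically. The sketch below is my own illustration (not from the thesis) under real-valued Gaussian assumptions, so rates carry a factor 1/2 rather than the complex-valued log(1 + P/N) above: it evaluates I(U; Y) - I(U; S) from the Gaussian covariances and confirms that α = P/(P + N) recovers the interference-free capacity.

```python
import numpy as np

P, Q, N = 1.0, 5.0, 1.0   # power, interference variance, noise variance

def costa_rate(alpha):
    """I(U;Y) - I(U;S) for U = X + alpha*S, with X ~ N(0,P) independent
    of S ~ N(0,Q), and Y = X + S + Z, Z ~ N(0,N).  Real-valued Gaussians,
    so all rates carry a factor 1/2 (in nats)."""
    vU = P + alpha**2 * Q
    vY = P + Q + N
    cUY = P + alpha * Q                  # E[UY]
    cUS = alpha * Q                      # E[US]
    i_uy = 0.5 * np.log(vU * vY / (vU * vY - cUY**2))
    i_us = 0.5 * np.log(vU * Q / (vU * Q - cUS**2))
    return i_uy - i_us

alpha_star = P / (P + N)                 # Costa's choice of alpha
awgn = 0.5 * np.log(1 + P / N)           # interference-free capacity
rate_star = costa_rate(alpha_star)
```

At α = P/(P + N) the achievable rate equals the interference-free AWGN capacity, and no other α in [0, 1] does better, which is exactly Costa's result.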
Most of the work in the literature has focused on point to point channels with random
parameters. However, multiuser generalizations of these results have received much less
attention. Recently Steinberg [30] considered a two user broadcast channel with random
parameters, where the users want separate messages, and [17] have shown that the
achievable region is optimal for the case of Gaussian broadcast channels. However, the
case in [17] is a very special case that avoids the main difficulty often encountered in
multiuser generalizations of dirty paper coding, where the sender has to deal with multiple
state sequences simultaneously. Their coding scheme applies the single user dirty paper
coding result to a broadcast channel setting and shows an optimistic result. We are
interested in non-trivial multiuser generalizations where the sender has to deal with
multiple state sequences simultaneously. There is typically a loss due to lack of receiver
knowledge in such settings. This case is captured when all the users want a common
message, and it will be studied in the remaining sections of this chapter.
3.2
Multicasting channels
In this section we develop a model for multicasting over channels with random parameters.
We consider the scenario of one sender and many receivers.
All the receivers want the
same message. There is no coordination among the receivers. For the sake of clarity, we
define the channel for the case of two receivers. The generalization to the k receiver case is
straightforward.
Definition 3 A two user multicasting channel with random parameters consists of an
input alphabet X, output alphabets Y1 and Y2 for users 1 and 2, and a set of states S.
(All alphabets are discrete.) It is specified by the transition probability p(y1, y2|x, s) for
all y1 ∈ Y1, y2 ∈ Y2, x ∈ X, s ∈ S. The channel is stationary and memoryless if
p(s^n) = Π_i p(s_i) and p(y1^n, y2^n|x^n, s^n) = Π_i p(y1i, y2i|xi, si). Furthermore, we
assume that the sender knows the particular realization of s^n non-causally before using
the channel.

Note that the above definition includes the case where the channel of user 1 is controlled
by a state S1 and that of user 2 by another state S2. We take the joint state S = (S1, S2)
and set p(s) = p(s1, s2). We now define the achievable common rate for this channel.
Definition 4 A (2^{nR}, n) code consists of a message set I = {1, 2, ..., 2^{nR}}, an
encoder f : I x S^n → X^n, and two decoders g1 : Y1^n → I and g2 : Y2^n → I. The rate
R is achievable if the probability of error P_e = Pr{g1(Y1^n) ≠ W or g2(Y2^n) ≠ W} → 0
as n → ∞. Here W is the transmitted message and has a uniform probability distribution
on I. The probability of error P_e is averaged over all state sequences in addition to the
usual averaging over all codebooks and messages.
An obvious generalization of the random binning scheme for the single user channel
yields the following achievable rate:
Theorem 2 An achievable rate for the multicasting channel with random parameters is
given by:

R = max_{p(u|s), p(x|u,s)} { min(I(U; Y1), I(U; Y2)) - I(U; S) }

where U is an auxiliary random variable that satisfies U → (X, S) → Y1 and
U → (X, S) → Y2. Furthermore, the size of the alphabet of U is at most |X| + |S| + 1.

The achievability argument is analogous to that in the single receiver case in Theorem 1.
The two decoders, upon receiving y1^n and y2^n respectively, attempt to find the same
sequence u^n. Accordingly we can only generate 2^{n min(I(U;Y1), I(U;Y2))} sequences
u^n, and hence the min(·) appears in the above expression. Is the above achievable rate
optimal? Since we are restricting ourselves to schemes where both receivers are required
to decode the same u^n sequence, one would expect that such a restriction leads to
sub-optimality of the achievable rate. We show some examples in the subsequent sections
where this particular scheme is superseded by schemes that use separate codebooks for
different receivers. In the following sections we study specific channel models and derive
some new achievable rates.
3.3
Binary Channels -
2 User noiseless case
In this section we consider the case of one sender and two receivers over binary channels.
Each receiver experiences additive (modulo 2) interference. Both the sequences are known
43
to the sender but not to any of the receivers 1. There is no additional noise on either of
the channels. The scenario is shown in Figure 3-3.

Figure 3-3: Two user multicasting channel with additive interference: Y1 = X ⊕ S1 and
Y2 = X ⊕ S2, with S1 ~ B(q) and S2 ~ B(q).
The two user binary channel model is the simplest non-trivial starting point for
multicasting channels. The communication scheme is trivial if there is only one sender
and one receiver: the sender simply XORs the codeword with the interference sequence
and transmits it. The receiver does not have to know the interference sequence, as it
receives a clean codeword. For the two user case, however, the sender has to deal with
two interferences simultaneously. It cannot 'clean' both interferences at once, and it is
not obvious what the best scheme is in such a situation. Motivated by the simplicity of
the single user scheme, the sender could time-share between the two users. This yields a
rate of 1/2 bits/symbol for each user for any S1 and S2. Another simple scheme is to
ignore the two interference sequences. In this case, each of the channels is a binary
symmetric channel and this yields a rate of 1 - max{H(S1), H(S2)} bits/symbol, where
H(·) is the binary entropy function. It turns out that both these schemes are sub-optimal.
The following theorem presents the capacity of this channel:
Theorem 3 The capacity of the two user binary channel with additive interference, shown
in Figure 3-3, is given by

C = 1 - (1/2) H(S1 ⊕ S2)        (3.1)
1 For this special case, we observe that there is no loss if the sender learns S1 and S2 causally. The
non-causal knowledge does not buy us anything.
Figure 3-4: Achievable rates for the two user multicasting channel when S1 and S2 are
i.i.d. The x-axis is Pr(Si = 1); the curves show time sharing, ignoring the side information,
and the capacity.
We note the achievable rate is superior to what can be achieved by time sharing and
ignoring side information. Figure 3-4 compares the achievable rates of timesharing, ignoring
side information and the optimal scheme when Si and S 2 are independent and identically
distributed.
On the x-axis we plot the probability that Si = 1.
The y-axis plots the
achievable rates of the three schemes. We note that substantial gains are achieved using
the optimal scheme for intermediate values of Pr(Si = 1).
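The three curves of Figure 3-4 are straightforward to evaluate. A small sketch (my own, assuming i.i.d. Bernoulli(q) interference, so S1 ⊕ S2 ~ Bernoulli(2q(1 - q))):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits (h2(0) = h2(1) = 0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

q = 0.2                                  # Pr(S_i = 1), with S1, S2 i.i.d.
p_xor = 2 * q * (1 - q)                  # Pr(S1 xor S2 = 1)

rate_timeshare = 0.5                     # each user served half the time
rate_ignore = 1 - h2(q)                  # treat S_i as BSC noise
capacity = 1 - 0.5 * h2(p_xor)           # Theorem 3
```

For q = 0.2 the capacity is about 0.55 bits/symbol, strictly above both the time-sharing rate (0.5) and the rate obtained by ignoring the side information, matching the gap visible in the figure at intermediate q.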
In order to prove the above theorem we describe a coding scheme that achieves the
capacity and establish a converse.
3.3.1
Coding Theorem
We first explain the basic idea of the coding scheme informally. The sender selects a
codeword c according to the message to be transmitted. The channel input x is generated
from this codeword c and the interference sequences s1 and s2: we XOR the first half of
the symbols in c with the corresponding symbols of s1 and the remaining half of the
symbols in c with those of s2, as shown in Figure 3-5.

Figure 3-5: Coding for the two user multicasting channel. The first half of the block is
clean for receiver 1 and noisy for receiver 2; the roles are reversed in the second half.
The main steps of the coding scheme can be described as follows:

- Generate 2^{nR} codewords of length n. Each symbol is generated i.i.d. Bernoulli(1/2).

- To send message w ∈ {1, 2, ..., 2^{nR}}, select codeword c_w.

- Using the knowledge of the interference sequences s1 and s2, generate the transmit
  sequence x according to the following relation:

  x(i) = c_w(i) ⊕ s1(i)   for i = 1, 2, ..., n/2
  x(i) = c_w(i) ⊕ s2(i)   for i = n/2 + 1, ..., n        (3.2)

- The first half block of the received sequence y1 at receiver 1 is clean, while the second
  half block is corrupted by the noise S1 ⊕ S2. Receiver 1 uses this knowledge of the
  reliability of the two blocks in decoding. (The conclusions at the second receiver are
  reversed.) We can express y1 as follows:

  y1(i) = c_w(i)                       for i = 1, 2, ..., n/2
  y1(i) = c_w(i) ⊕ s1(i) ⊕ s2(i)       for i = n/2 + 1, ..., n        (3.3)

- Receiver 1 finds a codeword ĉ that is jointly typical with y1 according to (3.3). The
  corresponding jointly typical set of sequences c and y1 is:

  A_ε = {(c, y1) : c and y1 are individually typical according to Pr(C) and Pr(Y1),
         and (c, y1) is jointly typical according to (3.3)}
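The encoding rule (3.2) can be simulated directly. The sketch below (my own illustration, with hypothetical parameters) confirms that receiver 1 sees a clean first half and residual noise s1 ⊕ s2 on the second half, and symmetrically for receiver 2:

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 1000, 0.1

c = rng.integers(0, 2, n)                 # codeword c_w for the message
s1 = (rng.random(n) < q).astype(int)      # interference on channel 1
s2 = (rng.random(n) < q).astype(int)      # interference on channel 2

# Encoding rule (3.2): cancel s1 on the first half, s2 on the second.
x = np.concatenate([c[:n//2] ^ s1[:n//2], c[n//2:] ^ s2[n//2:]])

y1 = x ^ s1                               # receiver 1 observes x xor s1
y2 = x ^ s2                               # receiver 2 observes x xor s2

# Receiver 1: first half clean; second half corrupted by s1 xor s2.
errs1_first = int(np.sum(y1[:n//2] ^ c[:n//2]))
noise1_second = y1[n//2:] ^ c[n//2:]
errs2_second = int(np.sum(y2[n//2:] ^ c[n//2:]))
```

As (3.3) predicts, the clean halves match the codeword exactly, and the noise on the corrupted halves is exactly s1 ⊕ s2.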
Analysis of Probability of Error. We now compute the probability of error under the
above coding scheme. In what follows, we use the symbol ~ to denote two sequences that
are jointly typical.

Lemma 1 If c and y1 are two independently selected sequences with the same marginals
as a jointly typical pair (c, y1), then

Pr((c, y1) ∈ A_ε) ≤ 2^{-n(1 - (1/2) H(S1 ⊕ S2))}

Proof:

Pr((c, y1) ∈ A_ε) = Pr(c(1 : n/2) ~ y1(1 : n/2), c(n/2 + 1 : n) ~ y1(n/2 + 1 : n))
                  = Pr(c(1 : n/2) ~ y1(1 : n/2)) Pr(c(n/2 + 1 : n) ~ y1(n/2 + 1 : n))
                  ≤ 2^{-(n/2) I(C; C)} 2^{-(n/2) I(C; C ⊕ S1 ⊕ S2)}        (cf. (3.3))
                  = 2^{-n(1 - (1/2) H(S1 ⊕ S2))}
Without loss of generality assume that message 1 is transmitted. Let E_i denote the
event that message i is decoded by receiver 1. Now we calculate the probability of error as

Pr(Ŵ1 ≠ W) = Pr(E1^c ∪ E2 ∪ E3 ∪ ... ∪ E_{2^{nR}})
           ≤ ε + Σ_{i=2}^{2^{nR}} Pr(E_i)                        (union bound)
           ≤ ε + (2^{nR} - 1) 2^{-n(1 - (1/2) H(S1 ⊕ S2))}        (from Lemma 1)
           ≤ ε + 2^{-nδ}   for some δ > 0 whenever R < 1 - (1/2) H(S1 ⊕ S2)

Finally, from symmetry, we note that the overall probability of error
Pr(Ŵ1 ≠ W or Ŵ2 ≠ W) ≤ 2 Pr(Ŵ1 ≠ W) approaches 0 as n → ∞. This proves that the
rate claimed in Theorem 3 is achievable.
3.3.2
Converse
To complete the proof of Theorem 3, we need to show that any reliable scheme cannot
exceed the capacity expression.

nR = H(W) ≤ I(W; Y1^n) + nε_n        (Fano's inequality)
          ≤ n - H(Y1^n|W) + nε_n

Similarly, we have nR ≤ n - H(Y2^n|W) + nε_n. Combining the two inequalities, we have:

nR ≤ n - max{H(Y1^n|W), H(Y2^n|W)} + nε_n
   ≤ n - (1/2){H(Y1^n|W) + H(Y2^n|W)} + nε_n
   ≤ n - (1/2) H(Y1^n, Y2^n|W) + nε_n          (conditioning reduces entropy)
   ≤ n - (1/2) H(Y1^n ⊕ Y2^n|W) + nε_n          (data processing inequality)
   = n - (1/2) H(S1^n ⊕ S2^n) + nε_n            (Y1 ⊕ Y2 = S1 ⊕ S2)
   = n(1 - (1/2) H(S1 ⊕ S2) + ε_n)

This establishes the upper bound and proves the capacity theorem.
3.3.3
Discussion of the Coding Scheme
In this section we provide some additional insights into the coding theorem of Section 3.3.1.
From each receiver's point of view, the channel can be thought of as a channel with block
interference. Half the time the channel is noiseless, and half the time it is a BSC with
crossover probability Pr(S1 ⊕ S2 = 1). Furthermore, the receiver knows a priori the
reliability of each symbol. The achievable rate with this structure is given by [37]:

R ≤ (1/2) I(C; Y | no noise) + (1/2) I(C; Y | noisy)
  = 1/2 + (1/2)(1 - H(S1 ⊕ S2))
  = 1 - (1/2) H(S1 ⊕ S2)
Since the sender cannot cancel the interference completely, it is in effect providing side
information to the receivers about the reliability of different blocks by deciding beforehand
how to clean up the two channels. This knowledge is then exploited by the receivers
to achieve higher rates. However, such receiver side information comes at a cost. While
cleaning up one receiver we are actually injecting interference into the channel of the other
receiver: the noise in the noisy block of receiver 1 is S1 ⊕ S2, instead of S1, for example.
It is not obvious whether this is a good idea. For the two user binary case, we have shown
that this is indeed an optimal scheme. For more than two users, we show in Section 3.4
that we can actually improve the achievable rate by choosing a cleanup schedule after
observing the interference sequences.
3.3.4
Random Linear Codes are Sufficient
The argument that random linear codes are sufficient to prove the coding theorem is
analogous to that given by Gallager for the BSC (Sec. 6.2) [11]. To prove the random
coding theorem, we used the following properties:

- Two codewords c_i and c_j are mutually independent for all i ≠ j.

- Each codeword c_i is a sequence of i.i.d. random variables.

A random linear (n, k) code specified by a k x n generator matrix G is the set of 2^k
codewords x = uG for all u ∈ {0, 1}^k. The entries of G are i.i.d. Bernoulli(1/2). It is
clear that the resulting codewords are i.i.d. sequences of Bernoulli(1/2) random variables.
It can also be shown that any two codewords are mutually independent [11]. This argument
leads to the following corollary:

Corollary 1 A random linear code is sufficient to achieve the capacity of Theorem 3.
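A minimal sketch of the random linear code construction (my own illustration; the i.i.d.-marginal and pairwise-independence properties hold over the random choice of G and are not checked here). It enumerates the 2^k codewords x = uG and verifies that the code is closed under mod-2 addition:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 4, 8

G = rng.integers(0, 2, (k, n))                     # random generator matrix
msgs = np.array([[(u >> i) & 1 for i in range(k)] for u in range(2**k)])
codebook = msgs @ G % 2                            # all 2^k codewords x = uG

# Linearity: the XOR (mod-2 sum) of any two codewords is again a codeword.
rows = {tuple(cw) for cw in codebook}
closed = all(tuple(codebook[i] ^ codebook[j]) in rows
             for i in range(2**k) for j in range(2**k))
```

Closure under XOR is what makes the scheme compatible with the additive (mod-2) interference model: adding a known sequence to a linear codeword simply shifts the code.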
3.3.5
Alternative Proof of the Coding Theorem
In this section we give an alternative proof of the coding theorem. From Theorem 2, we
know that the following rate is achievable:

R = max_{p(u|s), p(x|u,s1,s2)} { min(I(U; Y1), I(U; Y2)) - I(U; S1, S2) }        (3.4)
Since we are free to choose the auxiliary random variable, we set U to a specific choice
that achieves the desired rate 2. Let A ~ Bernoulli(1/2) indicate which interference is
cleaned up, and take

U = (A, V), where V = X ⊕ S1 if A = 1 and V = X ⊕ S2 if A = 0        (3.5)

Substituting (3.5) into (3.4) and some algebra yields the desired achievable rate. Note
that with this choice of U, it follows that I(U; S1, S2) = 0. This means that causal
side information alone is sufficient to achieve the rate. This is consistent with (3.2), where
each x(i) is only a function of s1(i) and s2(i) and not of their future values. In this case
the achievable rate of Theorem 2 is tight.
3.3.6
Practical Capacity Approaching Schemes
Figure 3-6: Architecture for the two user binary multicasting channel. A good code for the
BSC maps the message to the codeword c; a scheduler combines c with the side information
s1, s2 to form x; each receiver runs a Viterbi decoder for block symbols with variable
reliabilities.
The coding theorem involves a separation result. The choice of codeword c is simply
a function of the message to be transmitted; the knowledge of s1 and s2 is not used
in this step. Thereafter, the transmitted sequence x is generated from c, s1, s2 according to
(3.2). Hence there is a separation between the codeword selection and the processing of the
side information. This separation can be exploited in designing practical systems that
approach capacity. Good codes for the BSC already exist and can be directly used for this
application. The only additional requirement is to generate the sequence x from the side
information according to (3.2). This processing is done by the scheduler block in Figure 3-6.

Standard decoding algorithms such as the Viterbi algorithm or iterative decoding for
block interference channels can be used at the receivers. Each decoder knows the statistical
reliability of the symbols it receives. Accordingly, it can adjust the cost function in its
decision making to take into account the variable reliability of the symbols in different
blocks.
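The reliability-aware decoding step can be illustrated with a brute-force ML decoder on a toy codebook (my own sketch, not the thesis's Viterbi implementation): clean symbols get an effectively infinite weight, noisy symbols the usual BSC log-likelihood weight.

```python
import numpy as np

n, q = 8, 0.1
p = 2 * q * (1 - q)                       # noisy-block crossover probability

# Tiny illustrative codebook (4 codewords of length 8, chosen by hand).
codebook = np.array([[0]*8, [1]*8, [0, 1]*4, [1, 0]*4])

def decode(y, clean_slice, noisy_slice, p):
    """Brute-force ML decoding when the receiver knows the per-block
    reliabilities: clean symbols get (effectively) infinite weight,
    noisy ones the standard BSC log-likelihood weight."""
    w_clean = 1e9
    w_noisy = np.log((1 - p) / p)
    costs = [w_clean * np.sum(cw[clean_slice] ^ y[clean_slice])
             + w_noisy * np.sum(cw[noisy_slice] ^ y[noisy_slice])
             for cw in codebook]
    return int(np.argmin(costs))

# Receiver 1: first half clean, second half noisy (noise = s1 xor s2).
w = 2
y1 = codebook[w].copy()
y1[5] ^= 1                                # one flip in the noisy block
w_hat = decode(y1, slice(0, n//2), slice(n//2, n), p)
```

The flip in the noisy block is absorbed by its small weight, while any mismatch in the clean block is ruled out immediately; this is the same per-block weighting a Viterbi or iterative decoder would apply to its branch metrics.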
2 This expression for U was discovered through numerical optimization.
3.4
Binary Channels -
More than two receivers
We now generalize to the case when there are more than two receivers, as shown in
Figure 3-7. We consider the case when there is no additional noise, i.e. Z1 = Z2 = ... =
ZK = 0, and when S1, S2, ..., SK are all identically distributed (i.e. the joint distribution
Pr(S_{k1}, S_{k2}, ..., S_{km}) is independent of the specific choice of k1, k2, ..., km ∈
{1, 2, ..., K}). We now state the following results for the K user channel:
Figure 3-7: K user multicasting channel. Receiver k observes Yk = X + Sk + Zk and
produces the estimate Ŵk.
Theorem 4 (Outer Bound) An outer bound on the maximum common rate for the K
user binary channel is:

R ≤ 1 - (1/K) H(S1 ⊕ S2, S1 ⊕ S3, ..., S1 ⊕ SK)

Theorem 5 (Inner Bound) An achievable rate R for the K user binary channel is given
by

R = 1 - (1 - 1/K) H(Si ⊕ Sj)

The proof of Theorem 4 follows along the lines of the outer bound for the two user case in
Section 3.3.2 and is omitted. The proof of Theorem 5 follows through a direct generalization
of the two user case in Section 3.3.1. The main idea is to divide the codeword into K blocks
and clean up each user in one of the blocks. Each user has one clean block and K - 1 blocks
over a noisy BSC with crossover probability q̃ = Pr(S1 ⊕ S2 = 1).
In general our inner and outer bounds are not equal. However, they do agree in the
following cases:

Corollary 2 If {S1, S2, ..., SK} are mutually independent Bernoulli(1/2) random
variables, then the capacity is C = 1/K and is achieved through time-sharing.

Proof: The proof follows by observing that when S1, ..., SK are independent
Bernoulli(1/2), the outer bound simplifies to 1/K, which is clearly achieved through
time-sharing between the users.

Corollary 3 If {S1, S2, ..., SK} are mutually independent Bernoulli(q) random variables
(q ≠ 1/2), then as K → ∞ we have C → 1 - H(q), and this is achieved by ignoring the
side information at the transmitter.

Proof: The proof follows by observing that if S1, ..., SK are i.i.d. then

(K - 1) H(S1) ≤ H(S1 ⊕ S2, S1 ⊕ S3, ..., S1 ⊕ SK) ≤ K H(S1)

As K → ∞, the lower and upper bounds agree and we achieve the desired result.
Note that if the side information is available at the receivers then the capacity is C = 1.
There is a loss in capacity when there are K ≥ 2 users. This is in contrast to the single
user case, where we do not experience any loss in capacity due to the additive interference.
Corollary 3 makes a negative statement: side information is not very useful in multicasting
to a large group of users. How severe is this effect? The answer in general depends
on the distribution of the interference sequences. In Figure 3-8, we plot the upper bound of
Theorem 4 as a function of the number of users for Pr(Si = 1) = 0.1 and 0.2, with all the
Si independent. We note that in either case a modest number of users is sufficient for the
upper bound to approach the limit of Corollary 3. The value of side information decreases
rapidly when the number of users is reasonably large.
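The upper bound of Theorem 4 can be computed exactly for i.i.d. Bernoulli(q) interference by enumerating the joint distribution of (S1 ⊕ S2, ..., S1 ⊕ SK). The sketch below (my own, reproducing the behavior of Figure 3-8 for q = 0.1):

```python
import numpy as np
from itertools import product

def h_joint_bits(K, q):
    """Joint entropy (bits) of (S1^S2, ..., S1^SK), Si i.i.d. Bernoulli(q)."""
    pr = lambda b: q if b == 1 else 1 - q
    H = 0.0
    for d in product([0, 1], repeat=K - 1):
        # Sum out S1: given S1 = s1, each XOR S1^Si is Bernoulli(q).
        p = sum(pr(s1) * np.prod([pr(s1 ^ di) for di in d]) for s1 in (0, 1))
        if p > 0:
            H -= p * np.log2(p)
    return H

def upper_bound(K, q):
    return 1 - h_joint_bits(K, q) / K     # Theorem 4

def h2(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

q = 0.1
bounds = [upper_bound(K, q) for K in range(2, 11)]
limit = 1 - h2(q)                          # Corollary 3 limit, 1 - H(q)
```

For K = 2 the bound coincides with the capacity of Theorem 3, and as K grows it decreases monotonically toward 1 - H(q), matching the curves in the figure.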
3.4.1
Improved Inner Bound for K > 2 users
The inner bound proposed in Theorem 5 is not optimal for K > 2 users. We propose an
architecture to improve this inner bound. The intuition behind the sub-optimality of the
Figure 3-8: Upper bound on capacity for binary noiseless multicasting channels with
q = 0.1 and 0.2, as a function of the number of users K. The bounds approach the limits
1 - H(0.1) and 1 - H(0.2), respectively.
inner bound of Theorem 5 is that the scheme pre-assigns the blocks to the users. While cleaning up the block for one user, it injects additional noise into all other users. While this scheme is optimal in the two user case, it is in general sub-optimal. A scheme that decides the cleanup positions after observing the interference sequences does better. For example, consider a particular channel use i, 0 < i < n/3, for which S_1(i) = 1, S_2(i) = 0 and S_3(i) = 0. If the slots were pre-assigned, user 1's channel is cleaned up while users 2 and 3 are hurt. On the other hand, if the decision were made solely based on the observed values, it may be better to serve the majority of the users. In the long run, serving the majority of the users should help, since it translates to better rates for each individual user. Thus, one would expect that higher gains could be achieved if the cleanup blocks were decided after observing the interference sequences when there are more than 2 users in the system. Throughout this section we consider the case when the interferences S_i are mutually independent.
To develop further insight into the architecture, consider a naive scheme which, in each block, cleans up the channel of the majority of the users. For instance, in our earlier example
Figure 3-9: Improved Architecture for the 3 user channel (the message W selects a codeword C_W; a scheduler observes (S_1, S_2, S_3) and picks a schedule U from {U_1, ..., U_M}; the transmitted symbol is X(i) = C_W(i) ⊕ S_j(i), where S_j is chosen by U; each decoder recovers the pair (C_W, U) from its output Y_i).
of three users, if S_1(i) = 1, S_2(i) = 0 and S_3(i) = 0, then we select x(i) = c(i). The channel for each user is transformed into a BSC with cross-over probability equal to Pr(the user is in the minority). If there are three users in the system and the interferences are independent, then such a scheme yields a rate of 1 - H(q(1 - q)), where q = Pr(S_i = 1). For small values of q, this rate is better than the rate 1 - (2/3)H(2q(1 - q)) suggested by the inner bound in Theorem 5; for larger values of q, however, the latter is better. Is it possible to do uniformly better than the inner bound of Theorem 5? The answer is yes, and we propose an architecture that achieves a higher rate for all values of q. For simplicity we focus on the three user architecture.
Theorem 5 involves two steps: selecting a suitable codeword and scheduling the cleanup (see Figure 3-6). We maintain the same architecture, since it is amenable to practical systems. However, instead of using pre-assigned schedules as in the two user case, we make the scheduler block more complex (see Figure 3-9). We maintain a list of many cleanup schedules and pick the best schedule after observing the interference sequences. The decoders try out all the schedules; if the number of schedules is small, they get a decoding failure for all the schedules except the correct one. We now describe the coding scheme in detail.
Coding Scheme
" Generate C 1 , C2 ..
C2
according to i.i.d B(!) distribution. Given message W pick
the codeword C,.
" Fix p(UIS1, S2, 83) where U E {U 1 , U2 , U3 }. Generate M =
2 n1(U;siS 2 ,S 3 )
schedule
sequences i.i.d according to p(U). If U(i) = Uj then for symbol i, user j's channel is
54
cleaned according to X(i) = C.(i)
e Sj(i) as shown in the figure above. Both the
codeword and schedule sequences are revealed to all the receivers.
" Given the sequence triplet (S1, S2, S 3 ) find a schedule U jointly typical with it according to p(U Si, S2, S3) and select this U as the scheduling scheme.
" Generate X = C,
e Sj where Sj is determined by the selected U as explained above.
" Each receiver upon receiving Yi searches for a pair (CW, U) that is jointly typical
according to Pr(Y, C, U) = Pr(C) Pr(U) Pr(Y
IC, U), since C and U are generated
independently. If it finds a unique such pair it declares the corresponding message,
otherwise it declares a decoding failure.
Probability of Error Analysis
There are two main sources of error:
- Encoding Error: The encoder does not find a sequence u jointly typical with s_1, s_2, s_3. The probability of this event vanishes as n → ∞ if the number of U sequences exceeds 2^{nI(U; S_1, S_2, S_3)}.
- Decoding Error: More than one (C, U) pair is jointly typical with the received sequence. This probability vanishes as n → ∞ if the number of (C, U) pairs is less than 2^{nI(C, U; Y_i)}.
It follows that R + I(U; S_1, S_2, S_3) < I(C, U; Y_i) is sufficient to guarantee vanishing probability of error.
Theorem 6 An achievable rate for the 3 user binary channel is given by

R < max_{p(C), p(U|S_1,S_2,S_3)} [ min_i I(C, U; Y_i) - I(U; S_1, S_2, S_3) ]

where |U| = 3 and U and C are mutually independent.
To simplify the expression of Theorem 6, we introduce a new variable N_i = C ⊕ Y_i. If N_1(i) = 1, then the received sequence has an error at symbol i. Clearly, if U = U_1 then N_1 = 0 and there is no error. If U = U_2 or U = U_3, then N_1 is either 0 or 1. N_1 has the interpretation of the effective noise injected into channel 1 by the choice of a particular sequence u^n. In addition, we impose the distribution p(U | S_1, S_2, S_3) shown in Table 3.1.
Table 3.1: The probability distribution p(U | S_1, S_2, S_3) of cleanup scheduling. Here p is a parameter to be optimized.

S   | p(U=U_1|S) | p(U=U_2|S) | p(U=U_3|S) | H(U|S)       | p(S)
000 | 1/3        | 1/3        | 1/3        | log_2(3)     | q_0^3
001 | 1/2-p      | 1/2-p      | 2p         | 1-2p+H(2p)   | q_0^2 q_1
010 | 1/2-p      | 2p         | 1/2-p      | 1-2p+H(2p)   | q_0^2 q_1
011 | 2p         | 1/2-p      | 1/2-p      | 1-2p+H(2p)   | q_0 q_1^2
100 | 2p         | 1/2-p      | 1/2-p      | 1-2p+H(2p)   | q_0^2 q_1
101 | 1/2-p      | 2p         | 1/2-p      | 1-2p+H(2p)   | q_0 q_1^2
110 | 1/2-p      | 1/2-p      | 2p         | 1-2p+H(2p)   | q_0 q_1^2
111 | 1/3        | 1/3        | 1/3        | log_2(3)     | q_1^3

(Here q_1 = q = Pr(S_i = 1) and q_0 = 1 - q; in each mixed state, the minority user is cleaned with probability 2p.)
For the distribution of U in Table 3.1, we can calculate Pr(N_1 = 1 | U = U_2):

Pr(N_1 = 1 | U = U_2) = Pr(N_1 = 1, U = U_2) / Pr(U = U_2)
                      = Pr({S = 010, U = U_2} or {S = 011, U = U_2} or {S = 100, U = U_2} or {S = 101, U = U_2}) / (1/3)
                      = 3 q_0 q_1 (1/2 + p)

Note that if p = 1/6 we have I(U; S_1, S_2, S_3) = 0, and the scheme reduces to choosing the blocks ahead of time, without looking at the specific realization of the interference sequences. Thus the inner bound described in Theorem 5 falls out as a special case of this scheme. After some algebra, we arrive at the following:
H(N_i | U) = (2/3) H(3 q_0 q_1 (1/2 + p))                          (3.6)

I(U; S_1, S_2, S_3) = 3 q_0 q_1 (2p - H(2p) + log_2(3/2))          (3.7)
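The quantities H(N_i | U) and I(U; S_1, S_2, S_3) can also be checked by direct enumeration over the eight interference states. The sketch below (Python) encodes the scheduling distribution of Table 3.1 with q_1 = q and q_0 = 1 - q, and compares the enumerated values against the closed forms used in this section.

```python
import math
from itertools import product

def H(x):
    # binary entropy in bits
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def table(q, p):
    """Return {s: (Pr(S=s), [p(U_1|s), p(U_2|s), p(U_3|s)])} per Table 3.1."""
    q0, q1 = 1 - q, q
    dist = {}
    for s in product((0, 1), repeat=3):
        prob_s = math.prod(q1 if b else q0 for b in s)
        ones = sum(s)
        if ones in (0, 3):
            pu = [1 / 3] * 3
        else:
            minority_bit = 1 if ones == 1 else 0  # minority user is cleaned w.p. 2p
            pu = [2 * p if s[j] == minority_bit else 0.5 - p for j in range(3)]
        dist[s] = (prob_s, pu)
    return dist

q, p = 0.3, 0.1
q0, q1 = 1 - q, q
dist = table(q, p)

# I(U; S1,S2,S3) = H(U) - sum_s Pr(s) H(U|s); by symmetry U is uniform.
HU_given_S = sum(ps * sum(-u * math.log2(u) for u in pu if u > 0)
                 for ps, pu in dist.values())
I_enum = math.log2(3) - HU_given_S
I_closed = 3 * q0 * q1 * (2 * p - H(2 * p) + math.log2(1.5))

# Pr(N_1 = 1 | U = U_2): user 1 sees noise iff S1 != S2 when U = U_2.
pr_enum = sum(ps * pu[1] for s, (ps, pu) in dist.items() if s[0] != s[1]) / (1 / 3)
pr_closed = 3 * q0 * q1 * (0.5 + p)

print(I_enum, I_closed, pr_enum, pr_closed)
```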
Figure 3-10: Optimal p and I(U; S_1, S_2, S_3) versus q (top panel: optimal I(U; S) as a function of q; bottom panel: optimal p as a function of q).
Figure 3-11: Inner Bound, Improved Inner Bound and Outer Bound (rate achievable by the multiple-schedule scheme, rate achievable by a single schedule, and the outer bound, plotted against q).
We now simplify the achievable rate of Theorem 6 as follows:

R < I(C, U; Y_i) - I(U; S_1, S_2, S_3) - ε
  = H(Y_i) - H(Y_i | U, C) - I(U; S_1, S_2, S_3) - ε
  = H(Y_i) - H(C ⊕ N_i | U, C) - I(U; S_1, S_2, S_3) - ε
  = H(Y_i) - H(N_i | U) - I(U; S_1, S_2, S_3) - ε          (N_i is noise, independent of C)
  = 1 - {H(N_i | U) + I(U; S_1, S_2, S_3)} - ε              (H(Y_i) = 1 since C is equiprobable)

The sum {H(N_i | U) + I(U; S_1, S_2, S_3)} above captures the tradeoff between complexity at the encoder and the cleanup ability at the receiver. If I(U; S_1, S_2, S_3) = 0, i.e., we have only one schedule, the scheduling block in Figure 3-9 is simple, but the effective noise injected into the channels is high. As we increase the number of schedules, we are able to decrease the effective noise on the symbols. However, by increasing the number of schedules, we also increase the possibility of selecting a wrong schedule. The tradeoff is to minimize the sum {H(N_i | U) + I(U; S_1, S_2, S_3)} over the distribution p(U | S_1, S_2, S_3).
Substituting (3.6), (3.7) in the above expression, we have

R(p) = 1 - 3 q_0 q_1 {2p - H(2p) + log_2(3/2)} - (2/3) H(3 q_0 q_1 (1/2 + p))
Maximizing with respect to p yields the maximum rate achievable by this scheme. The optimal p is shown in Figure 3-10 and the corresponding rate in Figure 3-11. We note that the gain from using the optimal p is minimal, and the added complexity of this scheme may not be necessary. In fact, the results of the previous section suggest that even though the inner bound of Theorem 5 is not optimal, it is quite close to the outer bound, so that the further gains may not be worthwhile.
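The one-dimensional maximization over p is easy to carry out numerically. The sketch below uses the rate expression R(p) above (as reconstructed in this section) and compares the optimized rate with the single-schedule rate R(1/6); the grid and sample values of q are illustrative.

```python
import math

def H(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def rate(q, p):
    # R(p) = 1 - I(U;S1,S2,S3) - H(N_i|U), using (3.6) and (3.7)
    q0, q1 = 1 - q, q
    I = 3 * q0 * q1 * (2 * p - H(2 * p) + math.log2(1.5))
    HN = (2 / 3) * H(3 * q0 * q1 * (0.5 + p))
    return 1 - I - HN

results = {}
for q in (0.1, 0.25, 0.4):
    # p must satisfy 0 <= p <= 1/2 so the table entries are probabilities
    p_opt = max((i / 2000 for i in range(1001)), key=lambda p: rate(q, p))
    results[q] = (p_opt, rate(q, p_opt))
    print(q, p_opt, rate(q, p_opt), rate(q, 1 / 6))  # optimized vs single schedule
```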
3.4.2 Binary Channel with Noise

In this section we consider binary channels with additional noise that is not known to either the encoder or the decoder. Suppose there is an additive noise Z_i on each user's channel and let p = Pr(Z_i = 1). The coding theorem of Section 3.3.1 is still applicable, and the corresponding rate for the 2 user case can easily be shown to be R < 1 - (1/2)H(p) - (1/2)H(p * Pr(S_1 ≠ S_2)),
Figure 3-12: Two User Multicasting Gaussian Dirty Paper Channel (the encoder observes (S_1, S_2); user i receives Y_i = X + S_i + Z_i, with S_i ~ N(0, Q) and Z_i ~ N(0, N); decoder i outputs an estimate of W).
where * denotes the binary convolution operator (p * q = p + q - 2pq). Unfortunately, the converse presented in Section 3.3.2 does not generalize. A loose upper bound may be obtained by assuming that the noise is known to the sender; however, this is clearly not tight, and it is still an open problem to develop a tighter upper bound in this case.
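As a quick sanity check, this noisy-channel rate reduces to the noiseless two user rate when p = 0. A minimal sketch (taking Pr(S_1 ≠ S_2) = 2q(1 - q) for independent Bernoulli(q) interferences, and base-2 logs):

```python
import math

def H(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def conv(a, b):
    # binary convolution: crossover probability of the XOR of two
    # independent Bernoulli noises
    return a + b - 2 * a * b

def rate_noisy(p, q):
    d = 2 * q * (1 - q)   # Pr(S1 != S2) for independent Bernoulli(q) S1, S2
    return 1 - 0.5 * H(p) - 0.5 * H(conv(p, d))

print(rate_noisy(0.0, 0.1), rate_noisy(0.05, 0.1))
```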
3.5 Gaussian Channels

In this section we turn our attention to the Gaussian channel. We focus on the case of two users. Each user receives Y_i = X + S_i + Z_i. We suppose that S_1 and S_2 are independent N(0, Q) random variables, while Z_1 and Z_2 are independent N(0, N) random variables. The scenario is shown in Figure 3-12.

We observed that in the binary case, a novel two user coding scheme outperforms time-sharing as well as ignoring side information. Are there similar gains in the Gaussian case? The answer is yes, for at least a certain regime of P and Q, and we describe some achievable rates in this section. Unfortunately, obtaining an upper bound is difficult, so we do not know the optimality of these schemes. Before we proceed to the achievable rates, note that time-sharing between the two users yields a rate of R = (1/4) log(1 + P/N), while ignoring side information yields R = (1/2) log(1 + P/(N + Q)).
Corollary 4 R = (1/2) log(1 + P/(N + Q/2)) is achievable.

Proof: Write S_1 = (1/2)(S_1 + S_2) + (1/2)(S_1 - S_2) and S_2 = (1/2)(S_1 + S_2) - (1/2)(S_1 - S_2), and note that (S_1 + S_2) and (S_1 - S_2) are uncorrelated. We cancel the common term (1/2)(S_1 + S_2) using the single user dirty paper coding idea [4], while absorbing the other term ±(1/2)(S_1 - S_2) into the noise. Thus the effective noise on each channel is N(0, N + Q/2) and the rate R is achievable.
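A short numerical sketch comparing this common-part-cancellation rate with the two baselines above (all rates written with base-2 logarithms; the parameter values are illustrative):

```python
import math

def rate_common_cancel(P, N, Q):
    # Corollary 4: cancel (S1+S2)/2 via dirty paper coding; (S1-S2)/2
    # (variance Q/2) is absorbed into the noise.
    return 0.5 * math.log2(1 + P / (N + Q / 2))

def rate_time_share(P, N):
    # each user is served half the time at the full dirty paper rate
    return 0.25 * math.log2(1 + P / N)

def rate_ignore(P, N, Q):
    # treat the interference entirely as noise
    return 0.5 * math.log2(1 + P / (N + Q))

P, N = 10.0, 1.0
for Q in (0.5, 5.0, 50.0):
    print(Q, rate_common_cancel(P, N, Q), rate_time_share(P, N), rate_ignore(P, N, Q))
```

As expected, cancelling the common part always beats ignoring the interference, and it beats time-sharing when Q is small.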
Corollary 5 An achievable rate for the Gaussian two user channel is

R = (1/2) [ (1/2) log(1 + P/N) + max{ 0, (1/2) log( P / ( P^2 Q/(P+N)^2 + P(N+Q)/(P+N+Q) ) ) } ]

Proof: The codeword is divided into two blocks. In the first block, the sender finds a sequence u jointly typical with the particular realization of s = s_1 and sends x = u - αs_1, where α = P/(P + N). With this choice of U and using the result of [13] (R < I(U; Y) - I(U; S)), we calculate the achievable rates at receiver 1 for the two blocks as R_b1 ≤ (1/2) log(1 + P/N) and R_b2 ≤ max{0, (1/2) log( P / ( P^2 Q/(P+N)^2 + P(N+Q)/(P+N+Q) ) )}, respectively. (The rates at receiver 2 are reversed.) By using arguments similar to the ones used for the binary case, we can show that the rate R ≤ (1/2)(R_b1 + R_b2) is achievable under joint typical decoding, and this completes the proof.
The achievable rate in the above corollary is motivated by the coding scheme for the two user binary channel. To make this clear, we provide an alternate scheme based on lattice coding. Lattice coding for dirty paper channels was first suggested in [7].

Interpretation using lattice decoding: An alternative interpretation of the above argument using lattice decoding is as follows. The codeword is divided into two blocks. In the first block, dirty paper coding is performed for user 1. Accordingly, the transmitter sends X = (V - αS_1 + D) mod Λ. Here V is the desired codeword and D is a dither sequence [7]. We next calculate the performance of receivers 1 and 2 for this signalling:

- Receiver 1: Y_1 = X + S_1 + Z_1. The receiver generates the MMSE estimate αY_1 = X + E + αS_1, where E is a sequence of i.i.d. N(0, PN/(P+N)) random variables. Taking modulo Λ and subtracting the dither sequence, the effective channel is (V + E) mod Λ and a rate of (1/2) log(1 + P/N) is achievable.

- Receiver 2: Y_2 = X + S_2 + Z_2. Since S_2 is not presubtracted by the transmitter, it is treated as additional noise, and the receiver generates the MMSE estimate α'Y_2 = X + E', where α' = P/(P+N+Q) and E' is a sequence of i.i.d. N(0, P(N+Q)/(P+N+Q)) random variables. The effective channel after the modulo Λ operation and subtraction of the dither signal is (V - αS_1 + E') mod Λ, and the achievable rate is max{0, (1/2) log( P / ( P^2 Q/(P+N)^2 + P(N+Q)/(P+N+Q) ) )}.

In the subsequent block, the achievable rates at the two receivers reverse, and the average rate in the corollary is achieved.
While the analogous scheme is strictly better than time-sharing in the binary case, the same is not true in the Gaussian case. The main difficulty is that if, in the first block, we decide to clean up user 1's channel, the noise injected into user 2's channel is significant, and a positive rate cannot be achieved for user 2 in this block for large (but finite) values of Q. One may decide not to completely clean up user 1 in the first block; one way to do this is to choose a value of α different from the optimal. Unfortunately, such a strategy does not yield large gains, and it remains an open problem to find a non-trivial rate that beats time-sharing over the whole range of interference powers.
3.6 Writing on many memories with defects
As noted earlier, the problem of writing on a memory with defects [19],[14] was one of the earliest applications that led to the general problem of coding for channels with non-causal side information at the transmitter [13] for point to point links. In this section we briefly summarize the classical memory with defects problem and then suggest some achievable rates for multiuser channels.
Suppose we have an array of n cells on which we wish to store an encoded file. Each cell can store either a 0 or a 1. A cell is defective with probability p and clear with probability 1 - p. A defective cell is either stuck at a 0 or a 1 (both events are equally likely), no matter what we write on it. A clear cell stores the bit that is written on it. A write-head scans the entire array to determine which of the cells are defective and notes the values they are stuck at. It uses this knowledge to encode the file into a binary codeword, so that when the file is subsequently read, it can be correctly decoded without knowing the positions of the defects. If the decoder knew this information, it is clear that the capacity of the memory would be 1 - p. Is there a loss in capacity if the decoder does not know the positions of the defects?
The surprising result is that there is no loss even if the receiver does not have this information. This is analogous to the result of dirty paper coding [4]. The coding scheme uses the idea of random binning. The main idea is that for large values of n, approximately np cells are stuck at either a 0 or a 1. If in each bin we can find a codeword that agrees with the pattern of defects, then we can use this codeword to store the information of the bin index without incurring any errors. There are n(1 - p) cells which are clear, and hence there are a total of 2^{n(1-p)} codewords which agree with any specific sequence of defects. If all these codewords are in different bins, then we can communicate at a rate of 1 - p. If the codewords are generated i.i.d. Bernoulli(1/2) and randomly partitioned into 2^{n(1-p-ε)} bins, then the codewords that agree with a given defect pattern are in fact in different bins, and we achieve the maximum possible rate. More practical schemes based on erasure correcting codes have been suggested for this and related applications [19],[41].

A slightly more general version of the single memory with defects problem is given in the following lemma:

Lemma 2 Consider a memory with defects whose cells can be in one of the following states (independent of all other cells):
- stuck at 0 with probability p/2,
- stuck at 1 with probability p/2,
- clear with probability 1 - p, but a clear cell flips the stored symbol with probability ε.

The capacity of this memory is given by [14] (1 - p)(1 - H(ε)).
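The random binning idea described above can be demonstrated at a toy scale. The sketch below is a hypothetical illustration (not the construction of [19]): every n-bit string is assigned to a bin by a fixed random partition; to store a message, the writer searches the strings that agree with the stuck cells for one that lands in the desired bin, and the reader simply looks up the bin of the stored string without knowing the defect locations.

```python
import random

n = 16                       # number of cells
bin_bits = 8                 # file rate 0.5, safely below 1 - p for p = 4/16
rng = random.Random(1)

# A random binning of all 2^n strings into 2^bin_bits bins.
bins = [rng.randrange(1 << bin_bits) for _ in range(1 << n)]

# Defect pattern: exactly 4 cells stuck at random values (only the writer sees this).
stuck = {cell: rng.randint(0, 1) for cell in rng.sample(range(n), 4)}
free = [c for c in range(n) if c not in stuck]

def write(message):
    """Find a string in bin `message` agreeing with every stuck cell."""
    base = sum(v << c for c, v in stuck.items())
    for trial in range(1 << len(free)):      # 2^12 candidate strings
        word = base
        for pos, c in enumerate(free):
            word |= ((trial >> pos) & 1) << c
        if bins[word] == message:
            return word
    raise RuntimeError("no agreeing codeword in this bin")

def read(word):
    return bins[word]        # decoding never needs the defect locations

stored = write(173)
print(read(stored))
```

With 2^12 agreeing strings and only 2^8 bins, an agreeing string exists in the target bin with overwhelming probability, mirroring the counting argument above.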
3.6.1 Two Memories with Defects
We are interested in the following multi-user generalization of the writing on memory with defects problem. There are two memories, each with n cells. Each cell is defective with probability p, independent of all other cells. There is only one write-head, which simultaneously writes on both memories. The write-head can scan the memories and learn the positions of the defects. A read-head that subsequently scans one of the two memories does not know these locations. How should one encode the file so that either memory alone can be used to decode the file at the maximum possible rate? The problem is still open. We propose the following achievable rates.
Corollary 6 An achievable rate for writing on two memories is R = (1 - p^2)/2. This scheme is asymptotically optimal as p → 1.

Proof: The main idea is to use different codebooks for the two memories. We first serve memory 1 at a rate 1 - p and write on all the clear cells of this memory. In the process, we treat the resulting symbols on memory 2 as additional stuck-at defects. Memory 2 then achieves a rate of p(1 - p), corresponding to the fraction of indices that have a clear cell in memory 2 and a defective cell in memory 1. For the next message, the order of the two memories is reversed. This yields a rate of ((1 - p) + p(1 - p))/2 = (1 - p^2)/2, as claimed. As p → 1, this approaches 1 - p, which is an obvious upper bound on the achievable rate.
The above coding scheme performs well as p → 1, but for small values of p it performs poorly. In fact, for p = 0 we achieve only 1/2 of the total rate. The use of independent codebooks for the two memories is suboptimal for small values of p. We now consider the other extreme, where we use the same codebook for the two memories.
Corollary 7 An achievable rate for writing on two memories is max(R_1, R_2), where

R_1 = (1 - 2p(1-p) - p^2/2) ( 1 - H( (p^2/4) / ((1-p)^2 + p^2/2) ) )

R_2 = (1 - p^2/2) ( 1 - H( (p/2 - p^2/4) / (1 - p^2/2) ) )
Proof:
Our proof is based on constructing a common codebook for the two memories. Let us define the event E_i^{jk}, i ∈ {1, ..., n}, j, k ∈ {c, 0, 1}, to be the event that at index i the cell of memory 1 is in state j and the cell of memory 2 is in state k. Here the state 'c' refers to a clear cell, the state '0' to a cell stuck at 0, and the state '1' to a cell stuck at 1. Since the cells are independent, we have Pr(E_i^{jk}) = Pr(j) Pr(k).

We argue that R_1 is achievable as follows. Consider a fictitious memory F whose ith cell F_i is marked as clear if E_i^{cc}, E_i^{01} or E_i^{10} occurs; F_i is stuck at 0 if E_i^{c0}, E_i^{0c} or E_i^{00} occurs; and F_i is stuck at 1 if E_i^{1c}, E_i^{c1} or E_i^{11} occurs. By the law of large numbers, F has approximately a (1-p)^2 + p^2/2 fraction of clear cells and a 2p(1-p) + p^2/2 fraction of stuck cells. If the event E_i^{01} or E_i^{10} occurs, there is an error in one of the memories, since the common codeword cannot agree with both stuck values. The fraction of these errors in either memory is ε = (p^2/4) / ((1-p)^2 + p^2/2). Accordingly, we let each cell marked clear on F be a BSC with crossover probability ε. It is clear that any rate achievable for F is also achievable for the constituent memories. We note that F is equivalent to the memory in Lemma 2 with ε_eff = ε and p_eff = 2p(1-p) + p^2/2; thus R_1 is achievable.

To show that R_2 is achievable, we consider another fictitious memory G, whose ith cell is labeled as clear if any of the events E_i^{cc}, E_i^{0c}, E_i^{1c}, E_i^{c0}, E_i^{c1}, E_i^{01} and E_i^{10} occurs. G_i is marked as stuck at 1 if E_i^{11} occurs and as stuck at 0 if E_i^{00} occurs. To ensure that any rate achievable for G is also achievable for the constituent memories, we require that each clear cell in G be a BSC with error probability ε' = (p/2 - p^2/4) / (1 - p^2/2). We now note that G is equivalent to a memory in Lemma 2 with ε_eff = ε' and p_eff = p^2/2, and thus the achievability of R_2 follows.
Figure 3-13 plots the achievable rates for the different schemes described in this section. We observe that for large p, it is better to code the two memories separately using the scheme described in Corollary 6. On the other hand, for small values of p, it is better to use the coding schemes described in Corollary 7 to achieve higher rates. It is still an open problem to find a strategy that is uniformly optimal.
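These comparisons can be reproduced numerically. The sketch below uses the rate expressions of Corollaries 6 and 7 as reconstructed above, together with the obvious outer bound 1 - p:

```python
import math

def H(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def r_independent(p):
    return (1 - p**2) / 2                          # Corollary 6

def r1_common(p):
    clear = (1 - p)**2 + p**2 / 2                  # Corollary 7, memory F
    return clear * (1 - H((p**2 / 4) / clear))

def r2_common(p):
    clear = 1 - p**2 / 2                           # Corollary 7, memory G
    return clear * (1 - H((p / 2 - p**2 / 4) / clear))

for p in (0.1, 0.5, 0.9):
    best_common = max(r1_common(p), r2_common(p))
    print(p, r_independent(p), best_common, 1 - p)  # rates vs the outer bound
```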
Figure 3-13: Achievable Rates and Outer Bound for writing on two memories with defects (R_1 and R_2 of the common-code scheme, the independent-code rate, and the outer bound, plotted against p).
Chapter 4
Conclusion and Future Work
The main focus of this thesis has been to understand fundamental limits and propose
practical architectures for multicasting in wireless networks. While the problem of sending independent messages to different users has been studied extensively in the literature,
relatively little work has been done for the case of multicasting common information.
In Chapter 2, we study the problem of multicasting from a base station having multiple transmit antennas. The fading environment plays an important role in the design of such systems. The problem is most interesting in the slow fading case, where different users experience different channel gains. Our focus in this study has been on the limit of a large number of users. We show that multiple antennas provide huge gains under many different performance criteria. The main intuition is that multiple antennas average out the fading effects and hence drastically improve the performance of the worst user. We also propose an architecture that uses erasure codes for communicating over block fading channels. This architecture sheds new light on the cross layer design of the application and physical layers. One important observation is that if the application layer exploits the time diversity of the wireless channel using an ARQ type protocol, it is not necessarily a good idea to design a physical layer that aims for a small outage probability. Furthermore, one should be careful in
interpreting the gains achieved from using multiple antennas in these systems. It is known
that spatial diversity does not provide substantial gains in systems with rich time diversity
[24]. Our study takes a further step and shows that to achieve these marginal gains one has
to operate very close to the optimal operating point. These gains diminish quickly as we
deviate from the optimal operating point. Finally we consider the problem of multicasting
to multiple groups with multiple antennas. We discuss the difficulty in generalizing the
dirty paper coding solution to these problems and motivate our work in Chapter 3.
In Chapter 3, we study the problem of multicasting over channels with interference known at the transmitter. We develop a model for a class of multicasting channels where the transmitter has to deal with more than one state sequence simultaneously, and observe that it is a non-trivial generalization of the single user problem studied previously. We derive the capacity of the two user binary case and propose several achievable rates for other models, such as the Gaussian channel and the memory with defects case. Our main conclusion for the binary case is that there is a fundamental loss if only the transmitter knows the interference. Furthermore, the value of side information diminishes as the number of users grows. These results are somewhat negative, since the point to point result states that there is no loss in capacity if only the transmitter knows the interference. We expect similar results to hold for channels other than binary channels, but we have not proved this in this thesis.
Future work: Understanding fundamental limits for multicasting in wireless networks is still a rich area, and a lot remains to be done. One open problem for future work is adapting rateless erasure codes to fading channels, as discussed in Section 2.5. The broadcast approach in Section 2.5.1 is a multilayered generalization of the single layer system that we consider and is a particularly attractive way of communicating with low delay on block fading channels. The main difficulty in using erasure codes with the broadcast approach is that the rates of the different layers are different. Hence we need erasure codes which can provide different rates for different layers. It is an open problem to design rateless erasure codes over different alphabets which can simultaneously be used for efficient decoding.
The problem of multicasting over channels with known interference has many open problems. The most general answer will be an extension of the Gel'fand-Pinsker result [13] to multicasting channels. In this thesis, we proposed a novel coding scheme for the binary channel, which is perhaps the simplest non-trivial channel in this class. The Gaussian channel still remains to be completely solved. The main difficulty in these problems is that, in contrast to the single user case, there is a fundamental loss if only the transmitter knows the additive interference. The solution to these problems will require novel coding techniques to achieve non-trivial inner bounds, while deriving sharp upper bounds for these channels
may require some tricks not yet known.
Appendix A
Coding over multiple blocks in SISO Block Fading Channels
In this appendix we analyze the performance improvement obtained by using erasure codes that span K blocks. We assume a SISO system with Rayleigh fading in the high SNR (ρ → ∞) regime. For such a system, the ergodic capacity is given by C_erg = log ρ - γ nats [38], where γ ≈ 0.577 is Euler's constant. We now consider the rate achieved by coding over K blocks. We state the following two lemmas:
Lemma 3 (Chernoff Bound) If X = Σ_{i=1}^K X_i is a sum of K i.i.d. random variables, then Pr(X ≤ a) ≤ (E[e^{-sX_i}])^K e^{sa} for all s > 0.
Lemma 4 [42] If X ~ χ²_{2n}, then

E[(1 + ρX)^{-s}] = (1/ρ^n) Ψ(n, n + 1 - s, 1/ρ)

where Ψ(·) is a confluent hypergeometric function of the second kind with the following property as ρ → ∞:

Ψ(n, n + 1 - s, 1/ρ) ≈  (Γ(n-s)/Γ(n)) (1/ρ)^{s-n}       if s < n
                        Γ(s-n)/Γ(s)                      if s > n
                        (log ρ - ψ(n) + 2ψ(1))/Γ(n)      if s = n

Here Γ(·) is the regular Gamma function and ψ(·) is the digamma function.
Let I_K = Σ_{i=1}^K log(1 + ρ|h_i|²), and let X_i = log(1 + ρ|h_i|²). We compute ε = Pr(I_K < KR):

ε = Pr( Σ_{i=1}^K X_i < KR )
  ≤ (E[e^{-sX_i}])^K e^{sKR}                         (a)
  = (E[(1 + ρ|h_i|²)^{-s}])^K e^{sKR}
  = ((1/ρ) Ψ(1, 2 - s, 1/ρ))^K e^{sKR}               (b)
  ≈ (Γ(1-s)/ρ^s)^K e^{sKR}                           (A.1)

where (a) is due to the Chernoff bound and (b) follows from Lemma 4 with n = 1, since each |h_i|² is χ²_2.
Thus the effective rate R̄ = (1 - ε)R satisfies R̄ ≥ R(1 - (Γ(1-s)/ρ^s)^K e^{sKR}). We now maximize this lower bound over s and R. Differentiating with respect to R and rearranging terms, we get¹:

(Γ(1-s)/ρ^s)^K e^{sRK} (1 + sRK) - 1 = 0
e^{sRK} (1 + sRK) = ρ^{sK} / Γ(1-s)^K
sRK + log(1 + sRK) = sK log ρ - K log Γ(1-s)
R = log ρ - (log Γ(1-s))/s - (log(1 + sKR))/(sK)          (A.2)
Since we are free to choose any s > 0 and we have K → ∞, we select s → 0 such that sK → ∞. One such choice is s = 1/log K. With this choice, we have

R = log ρ - lim_{s→0} (log Γ(1-s))/s - lim_{sK→∞} (log(1 + sKR))/(sK)
  → log ρ - γ - (log R)/(sK)
  → log ρ - γ - (log log ρ)/(sK)
¹This step is heuristic since we are differentiating an approximation. To be rigorous, one should differentiate the entire expression and then make the high SNR approximation.
Finally, we need to calculate the outage probability ε, since R̄ = (1 - ε)R. From (A.1), we have

ε ≤ (Γ(1-s)/ρ^s)^K e^{sKR}
  = exp( sK ( R + (log Γ(1-s))/s - log ρ ) )
  = exp( -log(1 + sKR) )                          (from (A.2))
  = 1/(1 + sKR) → 0 as sK → ∞

Thus, as K → ∞, R̄ = (1 - ε)R → log ρ - γ - O((log log ρ)/(sK)) nats; for high SNR and large K, we approach the ergodic capacity according to O((log log ρ)/(sK)). By choosing s = K^{-β} (where β > 0 can be arbitrarily small), we satisfy both sK → ∞ and s → 0, and thus we approach the capacity as O(K^{-(1-β)}).
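The conclusion of this appendix can be illustrated with a small Monte Carlo sketch (illustrative parameters; |h_i|² drawn as unit-mean exponentials for Rayleigh fading):

```python
import math
import random

random.seed(0)
rho = 1e4                                  # SNR (40 dB)
C_erg = math.log(rho) - 0.5772156649       # log(rho) - Euler gamma, in nats

def effective_rate(K, trials=4000):
    """max over R of (1 - outage(R)) * R when coding over K fading blocks."""
    samples = []
    for _ in range(trials):
        s = sum(math.log(1 + rho * random.expovariate(1.0)) for _ in range(K))
        samples.append(s / K)              # per-block mutual information
    samples.sort()
    best = 0.0
    for i, R in enumerate(samples):        # empirical outage at threshold R is i/trials
        best = max(best, (1 - i / len(samples)) * R)
    return best

for K in (1, 4, 16):
    print(K, effective_rate(K), C_erg)
```

The effective rate increases with K and approaches C_erg, as the analysis above predicts.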
Appendix B

Proof of Proposition 2

We prove the following result used in Proposition 2: the minimum of n i.i.d. χ²_{2L} random variables decreases as O(n^{-1/L+ε}) for large n.

Let X_1, X_2, ..., X_n be i.i.d. random variables with cumulative distribution function F(·), and suppose W_n = min_{i ∈ {1,2,...,n}} X_i is the minimum of the n random variables. Let α(F) = inf{x : F(x) > 0} be the lower end point of the distribution. The following results are known [10] (Sec. 2.1, pg. 58).
Proposition 3 Let α(F) = -∞. Assume there is a constant γ > 0 such that for all x > 0,

lim_{t→-∞} F(tx)/F(t) = x^{-γ}                                (B.1)

Then there is a sequence d_n > 0 such that

lim_{n→∞} Pr(W_n < d_n x) = L_{1,γ}(x)

where

L_{1,γ}(x) = 1 - exp(-(-x)^{-γ})   if x < 0
           = 1                      if x ≥ 0

The normalizing constant d_n can be chosen as¹

d_n = sup{ x : F(x) ≤ 1/n }

¹There is a certain amount of freedom in choosing the normalizing constant, but for our purpose this choice of d_n is sufficient.
Proposition 4 Let α(F) be finite. Suppose that the distribution function F*(t) = F(α(F) - 1/t) for t < 0 satisfies (B.1). Then there are sequences c_n and d_n > 0 such that

lim_{n→∞} Pr(W_n < c_n + d_n x) = L_{2,γ}(x)                  (B.2)

where

L_{2,γ}(x) = 1 - exp(-x^γ)   if x > 0
           = 0                if x ≤ 0

The normalizing constants c_n and d_n can be chosen to be

c_n = α(F)                                                     (B.3)
d_n = sup{ x : F(x) ≤ 1/n } - α(F)                             (B.4)

We apply Proposition 4 to the χ²_{2L} distribution. The distribution function is given by

F(x) = 1 - e^{-x} ( 1 + x + x²/2! + ... + x^{L-1}/(L-1)! )   if x ≥ 0
     = 0                                                      if x < 0      (B.5)
Clearly, α(F) = 0, and F*(t) in Proposition 4 is given by

F*(t) = 1 - e^{1/t} ( 1 - 1/t + 1/(2! t²) - ... + (-1)^{L-1}/((L-1)! t^{L-1}) )   if t < 0
      = 0                                                                          if t ≥ 0      (B.6)
Now we verify that F*(·) satisfies (B.1). By l'Hôpital's rule,

lim_{t→-∞} F*(tx)/F*(t) = lim_{t→-∞} (dF*(tx)/dt) / (dF*(t)/dt)
                        = lim_{t→-∞} [ (-1/(tx))^{L-1} e^{1/(tx)} / (x t²) ] / [ (-1/t)^{L-1} e^{1/t} / t² ]
                        = x^{-L}

Thus Proposition 4 applies to F*(·) with γ = L. Accordingly, using Proposition 4, we have c_n = 0. To solve for d_n, we use the approximation that for small x, (B.5) is approximately given by F(x) ≈ x^L / L!. Hence we have from (B.4) that d_n = O(n^{-1/L}), and d_n → 0 as n → ∞.
From (B.2), it follows that

lim_{n→∞} Pr(W_n < d_n x) = 1 - exp(-x^L)   for x > 0          (B.7)

Thus, for a χ²_{2L} distribution, as n → ∞, the smallest element decreases as O(n^{-1/L+ε}).
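This n^{-1/L} scaling is easy to observe empirically. The sketch below (with the χ²_{2L} variable realized, in the unit-rate normalization of (B.5), as a sum of L unit-mean exponentials) estimates the median of W_n for two values of n and checks that it shrinks by roughly (n_2/n_1)^{1/L}; the sample sizes are illustrative.

```python
import random
import statistics

random.seed(0)
L = 2                                   # chi-square with 2L degrees of freedom

def chi2_2L():
    # unit-rate normalization used in (B.5): a sum of L Exp(1) variables
    return sum(random.expovariate(1.0) for _ in range(L))

def median_min(n, trials=400):
    return statistics.median(min(chi2_2L() for _ in range(n))
                             for _ in range(trials))

m_small, m_large = median_min(50), median_min(5000)
ratio = m_small / m_large
print(ratio)   # should be near (5000/50)**(1/L) = 10
```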
Bibliography
[1] G. Caire and S. Shamai. On the achievable throughput of a multiantenna Gaussian broadcast channel. IEEE Trans. Inform. Theory, 49(7):1691-1706, July 2003.
[2] B. Chen and G. W. Wornell. Quantization index modulation: A class of provably good methods for digital watermarking and information embedding. IEEE Trans. Inform. Theory, IT-47:1423-1443, May 2001.
[3] A. S. Cohen and A. Lapidoth. Generalized writing on dirty paper. In Proc. Int. Symp. Inform. Theory (ISIT), Lausanne, Switzerland, page 227, July 2002.
[4] M. H. M. Costa. Writing on dirty paper. IEEE Trans. Inform. Theory, IT-29:439-441, May 1983.
[5] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 1991.
[6] I. Csiszar and J. Korner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, 1981.
[7] U. Erez, S. Shamai (Shitz), and R. Zamir. Capacity and lattice strategies for cancelling known interference. IEEE Trans. Inform. Theory, submitted, June 2002.
[8] G. D. Forney, Jr. Digital Communications II (6.451), course notes. MIT, Spring 2003.
[9] G. D. Forney, Jr. Exponential error bounds for erasure, list, and decision feedback schemes. IEEE Trans. Inform. Theory, 1974.
[10] Janos Galambos. The Asymptotic Theory of Extreme Order Statistics. Robert E. Krieger Publishing Company, 1986.
[11] R. G. Gallager. Information Theory and Reliable Communication. Wiley, New York,
N.Y., 1968.
[12] A. El Gamal. Network information theory - summer Course notes. EPFL, 2003.
[13] S. I. Gelfand and M. S. Pinsker. Coding for channel with random parameters. Problemy
Pered. Inform. (Problems of Inform. Trans.), 9, No. 1:19-31, 1980.
[14] C. Heegard and A. El Gamal. On the capacity of computer memory with defects. IEEE
Trans. Information Theory, IT-29:731-739, Sept. 1983.
[15] Sidharth Jaggi, Philip A. Chou, and Kamal Jain. Low complexity algebraic multicast
network codes. In InternationalSymposium on Information Theory, 2003.
[16] N. Jindal and A. Goldsmith.
Dirty paper coding vs. TDMA for MIMO broadcast
channels. In IEEE International Conf. on Communications (ICC), June, 2004.
[17] Young-Han Kim, Arak Sutivong, and Styrmir Sigurjonsson. Multiple user writing on
dirty paper. In InternationalSymposium on Information Theory, 2004.
[18] R. Koetter and M. Medard. An algebraic approach to network coding. IEEE/A CM
Transactions on Networking, 11:782-795, 2002.
[19] A. V. Kuznetsov and B. S. Tsybakov. Coding in a memory with defective cells. translated from Prob. Peredach. Inform., 10:52-60, April-June, 1974.
[20] M. Lopez.
Multiplexing, Scheduling, and Multicasting Strategies for Antenna Arrays
in Wireless Networks. PhD dissertation, MIT, EECS Dept, June-August 2002.
[21] M. Luby. LT codes. In The 43rd Annual IEEE Symposium on Foundationsof Computer
Science, 2002.
[22] E. Malkamaki and H. Leib. Evaluating the performance of convolutional codes over block fading channels. IEEE Trans. Inform. Theory, 45(5):1643-46, July 1999.
[23] A. Narula, M. Lopez, M. Trott, and G. Wornell. Efficient use of side information in multiple-antenna data transmission over fading channels. IEEE Journal on Selected Areas in Communications, 16(8):1423-36, October 1998.
[24] A. Narula, M. Trott, and G. Wornell. Performance limits of coded diversity methods
for transmitter antenna arrays. IEEE Trans. Inform. Theory, 45(7):2418-33, November
1999.
[25] J. Nonnenmacher. Reliable Multicast Transport to Large Groups. PhD thesis, EPFL
Lausanne, Switzerland, 1998.
[26] L. Ozarow, S. Shamai, and A. Wyner. Information theoretic considerations for cellular mobile radio. IEEE Transactions on Vehicular Technology, 43(2):359-78, May 1994.
[27] P. Sanders, S. Egner, and L. Tolhuizen. Polynomial time algorithms for network information flow. In Proc. of the 15th ACM Symposium on Parallelism in Algorithms and Architectures, 2003.
[28] C. E. Shannon. Channels with side information at the transmitter. IBM Journal of
Research and Development, 2:289-293, Oct. 1958.
[29] A. Shokrollahi. Raptor codes. Pre-print, 2003.
[30] Y. Steinberg. On the broadcast channel with random parameters. In International
Symposium on Information Theory, 2002.
[31] A. Steiner and S. Shamai. Multi-layer broadcasting in a MIMO channel. In Conference
on Information Science and Systems, March 2004.
[32] E. Telatar. Capacity of multi-antenna Gaussian channels. European Transactions on Telecommunications (ETT), 10:585-596, November/December 1999.
[33] D. N. C. Tse and P. Viswanath. Fundamentals of Wireless Communications. Working
Draft, 2003.
[34] P. Viswanath and D. N. C. Tse. Sum capacity of the vector Gaussian broadcast channel and uplink-downlink duality. IEEE Trans. Inform. Theory, 49(8):1912-21, August 2003.
[35] S. Viswanath, N. Jindal, and A. Goldsmith. On the capacity of multiple input multiple output broadcast channels. In Proc. of ICC 2002, New York, NY, USA, 28 April-2 May 2002.
[36] Hanan Weingarten, Yossef Steinberg, and Shlomo Shamai (Shitz). The capacity region
of the Gaussian MIMO broadcast channel. In CISS, 2004.
[37] J. Wolfowitz. Coding Theorems of Information Theory. Springer-Verlag, New York, 1964.
[38] G. W. Wornell and L. Zheng. Principles of wireless communication - Course notes. Massachusetts Institute of Technology, 2004.
[39] W. Yu and J. M. Cioffi. Trellis precoding for the broadcast channel. Preprint, Globecom 2001. See also: The sum capacity of a Gaussian vector broadcast channel, submitted to IEEE Trans. Inform. Theory.
[40] W. Yu, A. Sutivong, D. Julian, T. Cover, and M. Chiang. Writing on colored paper. In
Proc. Int. Symp. Inform. Theory (ISIT2001), page 302, Washington, DC., USA, June
24-29, 2001.
[41] R. Zamir, S. Shamai (Shitz), and U. Erez. Nested linear/lattice codes for structured multiterminal binning. IEEE Trans. Inform. Theory, IT-48:1250-1276.
[42] Hao Zhang and Tommy Guess. Asymptotical analysis of the outage capacity of rate-tailored BLAST. In Globecom, 2003.