Rateless and Rateless Unequal Error Protection Codes for Gaussian Channels

by

Kevin P. Boyle

B.S. E.E., University of Notre Dame (2005)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2007

© Kevin P. Boyle, MMVII. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Department of Electrical Engineering and Computer Science, August 10, 2007

Certified by: Gregory W. Wornell, Professor of Electrical Engineering and Computer Science, Thesis Supervisor

Certified by: Dr. Christopher Yu, Charles Stark Draper Laboratory, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Rateless and Rateless Unequal Error Protection Codes for Gaussian Channels

by Kevin P. Boyle

Submitted to the Department of Electrical Engineering and Computer Science on August 10, 2007, in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science

Abstract

In this thesis we examine two different rateless codes and create a rateless unequal error protection code, all for the additive white Gaussian noise (AWGN) channel. The two rateless codes are examined through both analysis and simulation with the hope of developing a better understanding of how the codes will perform and pushing the codes further toward implementation.
After analyzing and simulating the rateless codes by themselves, we compare using a rateless code to two different forms of hybrid automatic repeat request (HARQ), namely Chase combining HARQ and incremental redundancy HARQ. We find that the rateless codes compare favorably to both forms of HARQ.

In addition, we develop a code that has both rateless and unequal error protection (UEP) properties. A rateless code adapts to the quality of the channel and achieves capacity, but all of the information bits are decoded at the same time and thus the bitstream cannot be prioritized. In addition, if only a finite number of retransmissions is allowed, the range of available rates that a rateless code can provide is limited. In contrast, a UEP code provides a prioritization of the bitstream and an arbitrary range of rates, but does not achieve capacity. The rateless UEP code, or RUEP code for short, provides the prioritization of a bitstream that UEP provides, and also adapts to the quality of the channel as a rateless code does. The RUEP code provides bitstream prioritization while being more efficient than a traditional UEP code and is capacity achieving for some channel realizations. In addition, the RUEP code provides a larger range of available rates than a rateless code when only a finite number of retransmissions is allowed.

Thesis Supervisor: Gregory W. Wornell
Title: Professor of Electrical Engineering and Computer Science

Thesis Supervisor: Dr. Christopher Yu
Title: Charles Stark Draper Laboratory

Acknowledgments

This thesis would not be possible without the guidance and support of many people. First, I would like to thank the EECS department and Professor Jae Lim for supporting me as a TA for my first and second semesters at MIT, respectively. Next, I am truly grateful for the opportunity to be a Draper Fellow. As my Draper advisor, Chris Yu has provided great guidance, feedback, encouragement, and support for my work.
Chris has done a great job of providing me freedom to explore the problem and yet guidance at the same time. Also at Draper, I consider Phillip Lin an unofficial co-advisor. The numerous technical discussions and thesis draft feedback that Phil has provided have been invaluable. This work would also not be possible without Greg Wornell, my MIT advisor. I am thankful for the advice, insight, and encouragement to press on with promising aspects of the problem that Greg has provided along the way. I am also fortunate to have been a part of Greg's research group, and to have gotten to know the members of the Signals, Information, and Algorithms Laboratory. Within the SIA Lab, I owe a special thanks to Urs Niesen and Maryam Shanechi for discussions regarding their work on rateless codes, and Charles Swannack for last minute LaTeX help. While completely unrelated to my research, I would like to thank Fred Chen for help with our analog circuits class, and for numerous pickup basketball games. In addition, thanks to Ballard Blair for discussions regarding classes, TQE preparation, coding theory, and whatever other topics happened to come up during our lunches. Also, thanks to Stefan Campbell, Kiel Martin, and John Miller. To all of you, among other side activities, for football tosses on campus during nice Summer and Fall days when, oddly, no one else was out there throwing a football. To Stefan, for numerous Notre Dame related conversations that others grew tired of. To Kiel, for time well spent as roommates. And to John, for our weekly adventure to Central Square for dinner. A special thank you to Molly. For your constant care, support, and love throughout my time here at MIT. Finally, to my family - Mike, Kati, mom, and dad. For your overwhelming love and support both at MIT and before. I am blessed to have had such a wonderful family behind me. 
This thesis was prepared at The Charles Stark Draper Laboratory, Inc., under Internal Company Sponsored Research and Development, number 21171-001, Communication and Networking. Publication of this thesis does not constitute approval by Draper or the sponsoring agency of the findings or conclusions contained herein. It is published for the exchange and stimulation of ideas.

Kevin P. Boyle

Contents

1 Introduction
  1.1 Problem Statement
  1.2 Background
    1.2.1 Single User Rateless Codes
    1.2.2 Rateless Unequal Error Protection Codes
    1.2.3 Rateless Networking
  1.3 Outline of the Dissertation

2 Single User Rateless Codes
  2.1 Rateless Codes from a Communication System Viewpoint
  2.2 A Rateless Code with a Time-Varying Power Distribution
    2.2.1 Effect of Imperfect Base Code
    2.2.2 Simulation Results with an LDPC Base Code
    2.2.3 Finite Precision Digital to Analog Converter Effects
  2.3 A Rateless Code with a Constant Power Distribution
    2.3.1 Code Design for a Time-Varying Interference Channel
    2.3.2 Performance of Rateless Code with a Constant Power Distribution
  2.4 Mitigating the Effects of Acknowledgment Delay
    2.4.1 Modifying the Acknowledgment
    2.4.2 Interleaving Different Data Streams

3 Rateless Unequal Error Protection Codes
  3.1 Traditional UEP
    3.1.1 Description of Superposition UEP
    3.1.2 Two Sources of Inefficiency in Superposition UEP
  3.2 Rateless UEP vs. Traditional Rateless
  3.3 Rateless UEP
    3.3.1 Achievable Rates of RUEP in Different Channel SNR Regimes
    3.3.2 RUEP Scenario Number One - Cannot Repeat High Priority Bits
    3.3.3 RUEP Scenario Number Two - Can Repeat Both Sets of Bits
    3.3.4 Additional Considerations for RUEP
    3.3.5 Example of RUEP
    3.3.6 Asymptotic Analysis of Achievable RUEP Rates
    3.3.7 Summary - RUEP vs. UEP and Rateless

4 How Rateless Fits in to Point-to-Point Communication and Networking
  4.1 Rateless vs. Hybrid ARQ for Point-to-Point Communication
    4.1.1 Rateless versus Chase Combining HARQ
    4.1.2 Rateless versus Incremental Redundancy HARQ
    4.1.3 Rateless versus HARQ in Time-Varying Channels
  4.2 Future Direction - Robust Networking with Rateless Codes
    4.2.1 The Potential of Rateless Codes in Networks
    4.2.2 Issues with Rateless Codes in Networks

5 Conclusions

A Proofs

B Code Simulation Details

List of Figures

2-1 Example of a rateless transmission. The transmitter sends three blocks, lowering the rate and increasing the Eb/N0 with each block. The receiver can finally decode the information bits at a bit-error rate of 10^-4 after the third block.

2-2 E_gap vs. P_max for different values of A. Curves with a higher A are below curves with a lower A.

2-3 Bit-error rate versus SNR_norm of the layered, dithered repetition rateless code from [7]. Here we do not take into account the inefficiencies in the rateless code, and the base code spectral efficiency is ρ = C*/L.

2-4 Bit-error rate versus SNR_norm of the layered, dithered repetition rateless code from [7]. Here we do take into account the inefficiencies in the rateless code, and the base code has a spectral efficiency of ρ = E_gap · log(1 + ·).
2-5 Efficiency of the rateless code versus the number of received blocks when ρ = E_gap · log(1 + ·). The efficiency of the code from the simulations is greater than the lower bound on E_tot. Thus, the simulated results agree with the theoretical bound.

2-6 Gap to capacity of the rateless code versus the number of received blocks when ρ = E_gap · log(1 + ·). Given the lower bound on efficiency, we can calculate a corresponding upper bound on the gap to capacity, which is plotted in the top curve. The actual gap to capacity from the simulations is below this bound.

2-7 Efficiency of the rateless code versus the number of received blocks when ρ = C*/L. Here we do not take into account the inefficiencies in the code when choosing C*. The assumptions made in deriving (2.6) are no longer valid. It is clear that the bound (2.6) does not hold for this code and this code performs worse than the code in Fig. 2-5.

2-8 Gap to capacity of the rateless code versus the received number of blocks when ρ = C*/L. Here we do not take into account the inefficiencies in the code when choosing C*. The assumptions made in deriving (2.6) are no longer valid. It is clear that the gap to capacity upper bound corresponding to the efficiency lower bound (2.6) does not hold for this code and this code performs worse than the code in Fig. 2-6.

2-9 Performance comparison of a rateless code that is sent through two different digital to analog converters with 5-bit precision. The code with higher D/A mean-squared error actually performs better.

2-10 Effect of quantization on performance of the rateless code with one block (M = 1). The bit-error rate vs. SNR_norm for one rateless block is plotted for four different scenarios - no quantization, 5-bit quantization, 6-bit quantization, and 8-bit quantization. In each scenario, M_Max = 20.
2-11 Effect of quantization on performance of the rateless code for four blocks (M = 4). The bit-error rate vs. SNR_norm is plotted for two different scenarios - no quantization and 5-bit quantization. The loss in performance due to quantization at a bit-error rate of 10^-4 is 0.16 dB.

2-12 Illustration of one block of a rateless code from [19] with four layers and one sublayer per layer. Dashed lines show the boundaries between different sub-blocks in the top sublayer. The labels r_i are the rate of the code used on the ith sub-block if we design the base code with the divide and conquer approach.

2-13 Comparison of the efficiency of the rateless code if we take the average mutual information in a sublayer versus if we code each sub-block within a sublayer separately. Here we use a rateless code with sixteen layers and the maximum number of rateless blocks that we could transmit is only two.

2-14 Comparison of the efficiency of the rateless code if we take the average mutual information in a sublayer versus if we code each sub-block within a sublayer separately. Here we use a rateless code with sixteen layers and the maximum number of rateless blocks that we could transmit is fifty.

2-15 Bit-error rate (BER) versus efficiency of a rate-1/5 LDPC code in a time-varying channel. We simulated 1-4 constant SINR sections, which is equivalent to the number of constant SINR sections that the LDPC code would have if the rateless code has 1-4 layers. The channel was simulated to approximate the time-varying channel that the code would see if it is used as a base code for the sublayers in the rateless code from [19].

2-16 Bit-error rate (BER) versus efficiency of a rate-1/5 LDPC code in a time-varying channel.
We simulated 1, 5, 6, and 8 constant SINR sections, which is equivalent to the number of constant SINR sections that the LDPC code would have if the rateless code has 1, 5, 6, or 8 layers, respectively. The channel was simulated to approximate the time-varying channel that the code would see if it is used as a base code for the sublayers in the rateless code from [19].

2-17 Frame-error rate (FER) versus efficiency of a rate-1/5 LDPC code in a time-varying channel. We simulated 1, 3, 5, 6, and 8 constant SINR sections, which is equivalent to the number of constant SINR sections that the LDPC code would have if the rateless code has 1, 3, 5, 6, or 8 layers, respectively. The channel was simulated to approximate the time-varying channel that the code would see if it is used as a base code for the sublayers in the rateless code from [19].

2-18 Bit-error rate versus efficiency of a rate-1/5 LDPC code in two different time-varying channels. The "4 Section" curve approximates what the code would see if it is used as a base code for the sublayers in the rateless code from [19]. We also reversed the interference pattern that the code would see, and plotted the results in the "4 Sections with Interference Pattern Reversed" curve.

2-19 Bit-error rate versus SNR_norm of the rateless code from [19]. Each block consists of four layers and two sublayers. The same rate-1/5 LDPC code that has been used throughout was used on each sublayer. Note that the code is only a few dB from capacity and that the gap to capacity decreases as the number of transmitted blocks increases.

2-20 Bit-error rate versus efficiency of the rateless code from [19]. Each block consists of four layers and two sublayers. The same rate-1/5 LDPC code that has been used throughout was used on each sublayer. The efficiency of the code at a bit-error rate of 10^-4 is approximately 0.66.
3-1 Achievable rates with a fixed rate code. The point labeled A corresponds to SNR_FR. The channel capacity and the achievable rates with a fixed rate code are plotted as a function of the channel SNR, SNR_C.

3-2 Illustration of UEP coding scheme.

3-3 Achievable rates with a UEP code. The points labeled A and B correspond to SNR_HP and SNR_LP. The channel capacity and the achievable rates with an unequal error protection code are plotted as a function of the channel SNR, SNR_C.

3-4 Achievable rates with a rateless code. The points labeled A and B correspond to SNR_RLMin and SNR_RLMax. The channel capacity and the achievable rates with a rateless code are plotted as a function of the channel SNR, SNR_C.

3-5 Achievable rates with RUEP. The points labeled A, B, C, and D correspond to SNR_HPMin, SNR_LPMin, SNR_HPMax, and SNR_LPMax. The channel capacity and the achievable rates with RUEP are plotted as a function of the channel SNR, SNR_C. The five channel SNR regimes are also labeled with the numbers 1-5.

3-6 Illustration of RUEP when predictive acknowledgments (PACKs) are used.

3-7 Illustration of RUEP when predictive acknowledgments (PACKs) cannot be used.

3-8 RUEP when constant streams of high and low priority bits are transmitted. Here the receiver can decode the high priority bits after one block, and the low priority bits after two blocks. Upon receiving an acknowledgment from the receiver for the first set of high priority bits, the transmitter sends a new set of high priority bits in the second block.

4-1 Comparison of CC HARQ to using a rateless code. Here we set the efficiencies to E_Fixed = 0.9 and E_Rateless = 0.8, and allow the transmission of up to four rateless blocks.
We plot the sum of the received SNR needed to decode versus the initial rate of the code after the first transmission.

4-2 Comparison of CC HARQ to using a rateless code. Here we set the efficiencies to E_Fixed = 0.9 and E_Rateless = 0.7, and allow the transmission of up to four rateless blocks. We plot the sum of the received SNR needed to decode versus the initial rate of the code after the first transmission.

4-3 A zoomed in comparison of CC HARQ to using a rateless code with efficiencies E_Fixed = 0.9 and E_Rateless = 0.7. We allow the transmission of up to four rateless blocks. We plot the sum of the received SNR needed to decode versus the initial rate of the code after the first transmission.

4-4 Comparison of CC HARQ to using a rateless code when the two efficiencies are E_Fixed = 0.9 and E_Rateless = 0.8. SNR_CMax = 63. Achievable rates are plotted versus the channel SNR.

4-5 Comparison of CC HARQ to using a rateless code when the two efficiencies are E_Fixed = 0.9 and E_Rateless = 0.7. SNR_CMax = 63. Achievable rates are plotted versus the channel SNR.

List of Tables

2.1 FER Corresponding to Particular BER for rate-1/5 LDPC Code in Time-Varying Channel

3.1 Performance of Rateless UEP Scheme with M_Max = N_Max = 20

Chapter 1

Introduction

1.1 Problem Statement

The ideal rateless code has the following property: the code is capacity-approaching for any given channel condition, and without any prior channel knowledge at the transmitter. The term "rateless" describes a channel code that does not have an a priori fixed rate. Instead, a good rateless code will adapt its rate in such a way as to be close to the channel capacity. The problem of rateless coding is to create a reliable and efficient method of communicating over an AWGN channel of unknown signal-to-noise ratio (SNR).
This is highly desirable in many applications. In particular, rateless codes are very useful in wireless applications where it is not practical to track the channel at the transmitter.

In addition to rateless codes, it is often desirable to have a code that has both rateless and unequal error protection properties. With an unequal error protection (UEP) code, a subset of the bits is given greater error protection than the rest of the bits. In current UEP schemes, this prioritization of the bitstream results in a rate loss with respect to capacity. It is desirable to have a code that provides unequal error protection but incurs a smaller rate loss than current UEP codes.

In developing a rateless code, it is beneficial to compare it to existing schemes that attempt to adapt to the quality of the channel without needing a priori channel knowledge at the transmitter. Hybrid automatic repeat request (HARQ) attempts this, and therefore comparisons to HARQ will be made.

Finally, ad hoc networks require an efficient and robust method of communication between nodes. We will describe how the work on point-to-point rateless codes might be extended to solve throughput and fairness issues in mobile ad hoc networks (MANETs).

1.2 Background

1.2.1 Single User Rateless Codes

The rateless codes that are simulated and analyzed in this dissertation were proposed in [7] and [19]. These codes adapt to the channel by sending additional coded symbols if the receiver cannot decode the information from the symbols that it already has. Codes that adapt to the channel quality in this way have been researched under the name rateless and also by other names, including hybrid ARQ. There are various types of HARQ. Two common HARQ schemes include Chase combining HARQ [2] and incremental redundancy HARQ. There has been a considerable amount of work focused on creating good incremental redundancy codes from rate-compatible punctured codes, including [10].
The performance of both Chase combining and incremental redundancy HARQ is examined in [26] and [4].

Rateless codes for the binary erasure channel are well-known. These include Luby Transform (LT) codes [13] and more recently Raptor codes [21], which are based on LT codes. However, it has been shown that these codes are not optimal for the binary symmetric or AWGN channels [8], [16]. This research will focus on furthering the development of rateless codes for the AWGN channel. Under certain conditions, the codes proposed in [7] and [19] are capacity-achieving. The performance of codes from [7] and [19] will be simulated and analyzed further, and comparisons to HARQ will be made.

Throughout this thesis, there are several parameters of the rateless code that we will refer to often. The rateless codes that we examine repeat coded blocks until the receiver can decode the initial group of information bits. Each block is a superposition of L layers of symbols. In addition, M is the number of blocks that have been transmitted and received. The spectral efficiency of the code used on each layer is ρ. We will refer to the initial target spectral efficiency of the rateless code after the first block as C*. C* is the spectral efficiency if only one rateless block is needed to decode and the rateless code is capacity-achieving. The actual spectral efficiency of the rateless code after one block is less than or equal to C*, with equality only when the rateless code is one hundred percent efficient and has no gap to capacity. Unless otherwise noted, ρ and C* will be given per complex symbol, or two dimensions. In addition, both ρ and C* may be given either as bits/2D or nats/2D.¹

1.2.2 Rateless Unequal Error Protection Codes

A motivation for rateless codes is to provide good performance in the case of a highly dynamic channel and/or a channel where the SNR is difficult to estimate at the transmitter.
Another approach to communicating robustly in these conditions is unequal error protection (UEP). An unequal error protection code protects one subset of the bitstream more than another. In UEP schemes, a bitstream is often divided into high priority and low priority bits. If the channel is good, all of the bits are received. However, if the channel is bad, the more important bits are still decoded, but the less important bits are lost. The details of UEP will be more precisely explained in Chapter 3. In this thesis we will propose a rateless UEP (RUEP) scheme.

Various UEP codes have been studied in the literature. An information theoretic perspective on UEP is related to work by Cover on broadcast channels in [6]. In practice, various types of UEP codes exist. For the AWGN channel, these include codes based on superposition coded modulation (SCM) [1], [23], [24], time-division coded modulation (TDCM) [25], and codes using orthogonal frequency division multiplexing (OFDM) [11]. The latter two approaches share the degrees of freedom between the two sets of bits and transmit the two sets of bits so that they are orthogonal to each other.

¹A "nat" is a logarithmic unit of information when the natural log is used in calculating entropy and mutual information. It is analogous to a "bit," with "bit" being the logarithmic unit of information when log base 2 is used in calculating entropy and mutual information. In [7], natural log and units of nats are used. Therefore, it will sometimes be convenient to use ρ and C* in units of nats, especially when using efficiency equations and bounds. However, codes are more commonly quoted in units of bits, so it will also be convenient at times to use rates in bits, especially when describing simulation parameters and results. The units will be stated at times and will be clear from the context when they are not explicitly stated.

It is often desirable to have a code that has both rateless and UEP properties.
We will refer to such a code as an RUEP code. An RUEP code that is analyzed for the binary erasure channel has been proposed in [17]. The RUEP code proposed in this thesis is a superposition code and is analyzed for the AWGN channel.

1.2.3 Rateless Networking

With the ubiquity of wireless networks and mobile ad hoc networks (MANETs), it is increasingly important to have a network that is robust while achieving a high total throughput and a fair throughput to each user. Rateless networking aims to achieve these goals simultaneously. One rateless scheme for the multiple access channel (MAC) has been described in [14]. While promising, this scheme assumes the base station centered topology of the multiple access channel, which MANETs do not have. We will briefly discuss how single user rateless codes could improve performance in MANETs.

1.3 Outline of the Dissertation

In Chapter 2, we provide simulation and further analysis to assess the performance of rateless codes in the AWGN channel. Specifically, we look at the layered, dithered repetition code with a time-varying power distribution described in [7] and the sub-block structured code from Chapter 3 of [19]. The code in [7] assumes that the base code used on each layer is perfectly capacity-achieving, and the efficiency results rely heavily on information theoretic arguments. In practice, while good codes such as turbo codes and low-density parity-check (LDPC) codes exist that are close to capacity, there remains a nonzero gap to capacity. The gap to capacity of the base code results in a loss in performance of the overall rateless code. The gap to capacity effect is examined for the code proposed in [7] through simulation and analysis. In addition, the rateless code in [7] assumes that signal points can be transmitted with arbitrary precision. While high performance digital to analog (D/A) converters exist, they still cannot achieve perfect precision.
The effect on the code in [7] of having to send points from a discrete grid with uniformly spaced points is examined in Chapter 2.

Next, the construction in [19], while originally proposed for parallel Gaussian channels, lends itself to a single Gaussian channel as well. This construction requires less synchronization between the transmitter and receiver, and yields the "fountain" property that is highly desirable in many applications. The fountain analogy is that from one group of information bits the transmitter will encode an endless stream of blocks, acting as a fountain of encoded blocks. The receiver listens for those blocks and attempts to decode. If the receiver can successfully decode, then it sends an acknowledgment to the transmitter. The fountain property is that it does not matter which blocks the decoder receives. If the channel is such that it will take three blocks to decode, then it does not matter if the decoder receives the first three blocks that were transmitted, or the seventh, eighth, and ninth blocks that were transmitted. The decoder only needs a certain number of encoded blocks. This is similar to filling up a glass of water from a fountain - it does not matter when you start filling up, you only need a certain volume of water.

The key to this rateless code performing well is having a base code that performs well in a time-varying channel with a very specific structure. We simulate an LDPC code on this specific time-varying channel. It is shown that, in certain cases, the efficiency that the LDPC code achieves in our time-varying channel is the same as it is for the AWGN channel. Then, we incorporate the LDPC code as the base code for each rateless code block and simulate the rateless code. These simulations show that we can accurately predict the rateless code's performance based on the efficiency of the base code and the amount of mutual information accumulated at the receiver.
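The prediction just described can be sketched as a simple decode-condition model: each received block contributes mutual information, and decoding succeeds once the accumulated information, discounted by the base code's efficiency, covers the initial target C*. This is an illustrative simplification under our own assumptions, not the exact analysis from [7] or [19], and the function names are hypothetical.

```python
from math import log2, ceil

def blocks_needed(c_star, snr, efficiency=1.0):
    """Rough decode condition for a repetition-based rateless code.

    Each received block contributes log2(1 + SNR) bits/2D of mutual
    information; with a base code of efficiency E, decoding succeeds
    roughly when E * M * log2(1 + SNR) >= C*. (Simplified model,
    hypothetical helper -- not the thesis's actual analysis.)
    """
    info_per_block = efficiency * log2(1 + snr)
    return ceil(c_star / info_per_block)

# Fountain property: only the *number* of received blocks matters, so
# receiving blocks {7, 8, 9} is as good as receiving blocks {1, 2, 3}.
# Example: C* = 6 bits/2D at SNR = 3 (capacity 2 bits/2D) needs 3 blocks
# with a perfect base code, and 5 blocks with efficiency 0.66.
```

Under this model, a lower base-code efficiency simply stretches out the number of blocks the receiver must collect, which is why knowing the base code's efficiency lets us predict the rateless code's performance.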
Finally, other approaches to code design for the time-varying interference channel are considered.

In Chapter 3, we look at developing codes that have both rateless and unequal error protection (UEP) properties for the AWGN channel. RUEP codes are important because they allow the prioritization of the information bits and can accommodate bits with two different delay constraints. The prioritization of the bitstream is present in UEP, but RUEP will allow this prioritization while achieving a higher rate than UEP. Accommodating two delay constraints is an improvement on traditional block-based rateless codes where all of the bits are decoded at the same time. In addition, given a finite delay constraint on the high priority bits, RUEP codes will allow for a larger range of available rates than a rateless code.

In Chapter 4, we compare rateless codes to HARQ and also examine ways of using rateless codes to improve mobile ad hoc network (MANET) performance. First, we compare rateless coding to two different HARQ schemes, Chase combining HARQ and incremental redundancy HARQ, and we will see that rateless coding compares favorably. Then, we discuss using rateless codes in wireless networks, particularly in MANETs. Issues with using existing standards such as 802.11 in MANETs have been well documented [3]. We present a framework to show how rateless codes could potentially improve upon 802.11 performance in MANETs.

Chapter 2

Single User Rateless Codes

In this chapter, we examine two different rateless codes for point-to-point communication in the AWGN channel - the layered, dithered repetition code from [7] and the sub-block structured code from [19]. We first motivate the need for rateless codes by showing how a good rateless code works from a communication system viewpoint. Then, we will modify an existing lower bound on the efficiency of the code from [7] to more accurately model the gap to capacity of the base code.
We will show that this lower bound is still maximized as the rate per layer goes to zero. Our simulation results will show a good rateless code that meets the updated lower bound. The last aspect of the code from [7] that we will examine is the effect of having a finite precision digital to analog (D/A) converter at the transmitter. We will show a four-layer code that has a negligible decrease in performance with modest 5-bit resolution. More generally, we will show that decreasing the D/A mean-squared error does not monotonically increase the performance of the code.

For the code in [19] to be capacity-approaching, we need a base code that is good in a very specific time-varying channel. We will discuss several approaches to code design for this time-varying channel. We will see that, in some regimes, an LDPC code that is good for the AWGN channel is also good for this time-varying channel. In addition, we will show that we can accurately predict the performance of the rateless code if we have knowledge of what we will call the mutual information efficiency, and the efficiency of the base code.

We end the chapter by describing two approaches to mitigate rate loss due to the acknowledgment protocol. In the derivation of both codes, it was assumed that a positive acknowledgment is instantly received at the transmitter once the receiver is finished decoding a message. Because there might be a round trip delay in the system, this might not be the case in practice. We propose two methods that will help mitigate any rate loss due to round trip delay in the acknowledgment.

2.1 Rateless Codes from a Communication System Viewpoint

In the sequel we will discuss different rateless code constructions in more detail. Here we want to motivate the problem from a communication system viewpoint and describe the advantage of using a good rateless code rather than a fixed rate code.
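As a concrete reference for the efficiency and gap-to-capacity measures used throughout this chapter, the following sketch computes both for a code operating at a given rate and SNR on the AWGN channel. The helper names are ours, not from [7]; this is an illustrative definition, not the thesis's bound.

```python
from math import log2, log10

def efficiency(rate_bits_2d, snr):
    """Efficiency of a code achieving `rate_bits_2d` at the given SNR,
    measured against AWGN capacity log2(1 + SNR) in bits/2D.
    (Hypothetical helper for illustration.)"""
    return rate_bits_2d / log2(1 + snr)

def gap_to_capacity_db(rate_bits_2d, snr):
    """Gap to capacity in dB: the ratio of the operating SNR to the
    minimum SNR at which capacity equals the code rate."""
    snr_min = 2 ** rate_bits_2d - 1  # solves log2(1 + snr_min) = rate
    return 10 * log10(snr / snr_min)

# Example: a rate-1 bit/2D code operating at SNR = 2 has efficiency
# 1/log2(3) ~ 0.63 and a gap to capacity of 10*log10(2/1) ~ 3.01 dB.
```

A perfectly capacity-achieving base code has efficiency 1 and a 0 dB gap; any nonzero gap in the base code propagates into a loss for the overall rateless code, which is what the modified lower bound in this chapter accounts for.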
Our scenario of interest is one where the channel is constant for the length of time over which a user transmits a message, or group of information bits, but changes from one message to the next. This assumption is made to model a channel that can be viewed as AWGN for the transmission of one set of information bits, but changes quickly enough so that reliable feedback is impossible or costly to obtain. Thus, the transmitter does not know the channel quality. In this case, while the transmitter does not have reliable channel state information (CSI), it is reasonable to assume that both the transmitter and receiver might have good empirical knowledge of the channel statistics. With some knowledge of the channel statistics, we can choose an operating point that balances the throughput requirements of the system with the error levels that we can tolerate. For example, if our system must be robust for a targeted signal-to-noise ratio (SNR) level, then we would choose to code at a particular rate based on that targeted SNR so that we can ensure the information is received reliably. However, if the channel realization has a relatively high SNR compared to our targeted SNR, we have wasted channel capacity and sacrificed throughput for reliability. At the other extreme, we might set our targeted SNR high and code at a higher rate in an attempt to obtain a higher throughput when the channel SNR is high. If we code at a higher rate and the channel SNR is too low, then the message will not be received reliably. Rateless coding removes the tradeoff between reliability and throughput. The rate automatically adapts to the quality of the channel so that a high throughput is achieved when the channel has a high SNR, and the message is still decoded reliably at a lower rate when the channel has a low SNR. A general overview of how a rateless code works is as follows. The transmitter encodes a group of information bits into an infinite stream of blocks. 
The transmitter sends the first block of coded symbols. If the receiver can decode the information bits using only the first block, it sends an acknowledgment to the transmitter. When the transmitter receives the acknowledgment, it encodes the next group of information bits and begins sending blocks corresponding to that next group of information bits. If, however, the receiver cannot decode the information bits after the first block, then it does not send an acknowledgment and the transmitter simply sends the second block of coded symbols corresponding to the current group of information bits. The transmitter sends blocks of coded symbols corresponding to the same group of information bits until an acknowledgment is received at the transmitter. By sending additional blocks of coded symbols that correspond to the same set of information bits, the rate of the code is decreased and the energy per information bit is increased. Figure 2-1 will help us visualize how a rateless code works from a system perspective.[1] The transmitter does not know the channel SNR, and starts by transmitting one block of coded information at a relatively high rate. In this example, the rate after the first block is 4/5 b/1D. Let us suppose that the noise level in the channel is too high, and we are at the 'X' furthest to the left in Fig. 2-1. That is, our value of Eb/No is too low and the receiver cannot decode the information. The transmitter does not receive an acknowledgment from the receiver, so it sends the second rateless block. When the second block is sent, two things happen - the rate of the code is decreased by a factor of two, and the energy per bit (Eb) is increased by a factor of two, which is approximately 3 dB.

[1] Later in this thesis, we will show results for a rateless code that performs better than the one illustrated. This rateless code is used here to provide a general illustration of how a rateless code works from a communication system viewpoint.
Now we are at the second 'X' in Fig. 2-1. We have moved by a factor of two in Eb/No, and also jumped to the curve corresponding to two rateless blocks. When the transmitter sends the second block and decreases the rate by a factor of two, the Shannon limit on Eb/No decreases, or moves to the left on the Eb/No axis. Therefore, it is desirable that the waterfall curve corresponding to the lower rate also moves to the left. We see that this indeed happens. Finally, we send a third rateless block. Now our original rate has been decreased by a factor of three, and our original Eb/No has been increased by a factor of three. We are now at the 'O' on the plot. The 'O' represents the fact that we can successfully decode at a low bit-error rate. Notice that when we sent the third block, in addition to increasing our Eb/No, we jumped to the lower rate curve corresponding to three rateless blocks being transmitted. With any code we can increase our Eb/No through repetition. However, a good rateless code will provide a more efficient method of repeating the information bits. The advantage of a rateless code is that when the transmitter sends another block, not only is Eb/No increased, but we also jump to a curve for a lower rate code. If we choose a fixed rate code for transmission, then we are stuck on one curve. Consider the scenario described previously. If we are conservative and choose a low rate code for the initial transmission, then we could potentially lose throughput. We would lose throughput if the channel SNR was in fact high enough to reliably transmit information at a higher rate. If this is the case, we could have achieved a low bit-error rate with a higher rate code that has a curve further to the right. In the opposite scenario, we might choose a high rate code but the Eb/No is too low to decode at that rate.
We can increase our Eb/No at the receiver through repetition, but when we repeat a fixed rate codeword we must continue to follow the curve corresponding to the code we chose. This will require more repetitions than rateless coding. We will have used more energy and also ended up with a lower throughput. Choosing a fixed rate code and repeating the initial codeword is known as Chase combining HARQ [2]. A good rateless code will provide a way to retransmit blocks that are coded from the same information bits that is much more efficient than simply repeating the same initial codeword. A more thorough comparison between rateless coding and Chase combining HARQ is provided in Sec. 4.1.1.

Figure 2-1: Example of using a rateless code. The transmitter first sends one block of coded information at rate 4/5, but the Eb/No is too low at that rate so the receiver cannot decode. The transmitter sends a second coded block containing the same information bits, which lowers the rate to 2/5 and also increases Eb/No by 3.01 dB. At the curve corresponding to the transmission of two blocks and the new increased Eb/No, it is still not possible to decode. Finally, the transmitter sends a third block containing the same information bits, which lowers the rate to 4/15 and increases Eb/No by an additional 1.76 dB, or by a total of 4.77 dB (a factor of three) with respect to the first block. At this lower rate and higher Eb/No, the decoder can successfully decode at a bit-error rate of 10^-4 and sends a one bit acknowledgment to the transmitter.
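The rate and energy bookkeeping in this example can be restated numerically; a quick sketch, assuming the initial rate of 4/5 b/1D from Fig. 2-1:

```python
import math

R1 = 4 / 5  # initial rate after one block (b/1D), as in the Fig. 2-1 example

for m in range(1, 5):                  # number of rateless blocks received so far
    rate = R1 / m                      # the rate falls as 1/m with each repeated block
    eb_gain_db = 10 * math.log10(m)    # Eb grows by a factor of m: 0, 3.01, 4.77, 6.02 dB
    print(f"{m} block(s): rate = {rate:.4f} b/1D, Eb gain = {eb_gain_db:.2f} dB")
```

After three blocks this reproduces the caption's numbers: rate 4/15 b/1D and a total Eb/No gain of 4.77 dB over the first block.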
The more efficient retransmission method of a good rateless code allows the code to achieve a high fraction of capacity (and therefore a small gap to capacity) at a wide range of rates without a priori channel knowledge at the transmitter. In the remainder of the chapter, the performance of two different rateless codes is discussed. Now that we have developed an understanding of the power and behavior of a good rateless code from a system perspective, we will look at the efficiency of two different rateless codes.

2.2 A Rateless Code with a Time-Varying Power Distribution

A block based rateless code which is based on superposition coding and dithered repetition transmission is proposed in [7]. Each block consists of a superposition of L codewords that are multiplied by different scaling factors and pseudorandom dithering sequences. Each scaled and dithered codeword can be viewed as one layer, with each block consisting of L layers. For a given group of information bits, the same L codewords are repeated in each block; however, the scaling factor and dithering sequence for each of the L codewords changes from block to block. The scaling factors are selected to keep the mutual information per layer balanced and the dithering sequences differ so that the interference a particular codeword sees from other codewords is uncorrelated from block to block. At the receiver, all of the received blocks are used. The codewords are decoded in a particular order. The receiver first decodes the first codeword, then subtracts that codeword's contribution from the received symbols in each block. The next codeword is then decoded and stripped off in the same way. Each of the L codewords is decoded and its contribution to the received symbols stripped off successively. From information theoretic arguments in [7], this code achieves the channel capacity as the number of layers per block (which is also the number of superimposed codewords), L, goes to infinity.
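A minimal sketch of one block of this construction may help fix ideas. The equal-power scaling and random dithers below are placeholders; the actual per-layer, per-block power allocation and the pseudorandom dithering sequences are specified in [7]:

```python
import numpy as np

rng = np.random.default_rng(0)
L, n = 4, 16                     # layers per block, symbols per block (toy sizes)

# The same L codewords (random BPSK placeholders here) are repeated in every block.
codewords = rng.choice([-1.0, 1.0], size=(L, n))

def make_block(codewords, rng):
    """Superimpose L scaled, dithered copies of the codewords to form one block."""
    L, n = codewords.shape
    # Placeholder: equal power per layer. In [7] the scalings are chosen to keep
    # the mutual information per layer balanced, and they change from block to block.
    g = np.full(L, 1.0 / np.sqrt(L))
    # A fresh +/-1 dither per layer per block makes the interference a codeword
    # sees from the other codewords uncorrelated from block to block.
    dither = rng.choice([-1.0, 1.0], size=(L, n))
    return (g[:, None] * dither * codewords).sum(axis=0)

blocks = [make_block(codewords, rng) for _ in range(3)]  # three transmitted blocks
```

The receiver would then maximal-ratio combine the blocks and decode the layers successively, stripping each decoded codeword's contribution from every block.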
For the code to be efficient, the number of layers must grow large so that each of the L codewords is in the low signal-to-interference-plus-noise ratio (SINR) regime, where repetition does not cause a large loss in mutual information. Further details of the code construction are found in [7]. While the number of layers per block must go to infinity for the code to achieve full capacity, a lower bound on performance with a finite number of layers is given in [7]. However, the derivation of this bound assumes that the code used on each layer is perfectly capacity-achieving (i.e., has a gap to capacity of zero). The effect of the gap to capacity of the base code will be quantified in the sequel. We will incorporate the gap to capacity of the base code into the lower bound on efficiency and show that this lower bound is still maximized by having the number of layers go to infinity, or equivalently by having the rate per layer go to zero. In addition, simulation results show that the performance of the rateless code improves when the inefficiencies are taken into account in the power allocation. Finally, we look at the performance of this rateless code when we cannot send signal points with arbitrary precision and must instead use a finite precision D/A converter at the transmitter. We will see that decreasing the D/A mean-squared error does not monotonically increase the code performance. We will also show that with a four layer code, the performance loss can be made small with modest D/A resolution.

2.2.1 Effect of Imperfect Base Code

In analyzing the performance of this rateless code, it was assumed in [7] that the code used on each layer was capacity-achieving. We will call this code used on each layer the "base code." In practice, while good codes exist, there is some gap to capacity of that code. This leads to a lower efficiency of the overall rateless code.
We can use the bound developed in [7] and incorporate the gap to capacity of the base code to show a bound on the overall efficiency of the rateless code. The derivation is shown below, where E_MRC is the efficiency of the rateless code from [7] taking into account only the efficiency loss from having to combine via maximal ratio combining (MRC) at the receiver. Note that MRC is the optimal way to combine; however, there is still an inefficiency due to the fact that mutual information does not add linearly with signal-to-noise ratio (SNR). We will use E_gap to denote the efficiency of the base code. The overall efficiency of the rateless code is called E_tot. We will also show that our lower bound on E_tot, which we modify from the bound in [7], is maximized by having the rate per layer of the rateless code go to zero.

Bound on the Overall Efficiency of the Rateless Code

If SNR_t is the SNR threshold needed to decode a base code with spectral efficiency ρ, then a perfectly capacity-achieving code at the same SNR has a spectral efficiency of:[2]

    ρ_max = log(1 + SNR_t) = log(1 + Δ(e^ρ - 1))    (2.1)

where Δ is the gap to capacity (as a linear multiplicative term) of the base code. We also know from [7] that:

    E_MRC ≥ ρ_max / (e^{ρ_max} - 1)    (2.2)

In addition, we define:

    E_gap = ρ / ρ_max    (2.3)

    E_tot = E_gap · E_MRC    (2.4)

Substituting (2.2) and (2.3) into (2.4), we have:

    E_tot ≥ ρ / (e^{ρ_max} - 1)    (2.5)

After algebraic manipulation of (2.1) and substituting into (2.5), we obtain:

    E_tot ≥ log((e^{ρ_max} - 1)/Δ + 1) / (e^{ρ_max} - 1)    (2.6)

[2] Throughout this work, "log" will mean natural log, or log base e, also commonly written as "ln". Log base two will be written "log2" and log base ten will be written "log10".

We now have a lower bound on the total efficiency of the rateless code. This lower bound includes both the MRC inefficiency and the inefficiency of the base code.

Optimality of Increasing the Number of Layers per Block

Given the lower bound (2.6), what ρ_max will maximize this lower bound?
We will call this value the optimal ρ_max for a rateless code. From [7], as ρ_max goes to zero, the bound on E_MRC goes to one. In [7], E_gap = 1, therefore E_tot = E_MRC, so having ρ_max → 0 will make the rateless code capacity-achieving. However, for Δ > 1, E_gap < 1 and E_gap decreases as the number of layers per block increases. Although it is not immediately clear that having ρ_max → 0 is still optimal, we will show that this is in fact the case. We take the limit of E_tot as ρ_max goes to zero:

    lim_{ρ_max → 0} log((e^{ρ_max} - 1)/Δ + 1) / (e^{ρ_max} - 1)    (2.7)

Using L'Hopital's Rule:

    lim_{ρ_max → 0} [ (e^{ρ_max}/Δ) / ((e^{ρ_max} - 1)/Δ + 1) ] / e^{ρ_max} = 1/Δ    (2.8)

We now see that the lower bound on E_tot, the overall efficiency of the rateless code, goes to 1/Δ as the spectral efficiency per layer, ρ_max, goes to zero. However, it is not clear that this lower bound is maximized as ρ_max goes to zero. In fact, the gap to capacity efficiency, E_gap, increases as ρ_max increases. This is plotted in Fig. 2-2 for different values of Δ. In spite of the fact that E_gap is maximized by having ρ_max → ∞, we show below that the lower bound on E_tot from (2.6) is maximized as ρ_max → 0. We call the right side of (2.6) E_TotLowerBound:

    E_TotLowerBound = log((e^{ρ_max} - 1)/Δ + 1) / (e^{ρ_max} - 1)    (2.9)

Figure 2-2: E_gap vs. ρ_max (nats/2D) for different values of Δ. Curves with a higher Δ are below curves with a lower Δ.

Multiplying both sides of (2.9) by Δ we have:

    E_TotLowerBound · Δ = Δ · log((e^{ρ_max} - 1)/Δ + 1) / (e^{ρ_max} - 1)    (2.10)

Substituting x = (e^{ρ_max} - 1)/Δ into (2.10) we have:

    E_TotLowerBound · Δ = log(1 + x) / x    (2.11)

Using the relationship x ≥ log(1 + x), we see that:

    E_TotLowerBound · Δ ≤ 1    (2.12)

    E_TotLowerBound ≤ 1/Δ    (2.13)

with equality when x → 0 and therefore when ρ_max → 0. Thus, we do in fact obtain the greatest possible lower bound on the efficiency of the rateless code by letting ρ_max → 0.
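As a numerical sanity check, the right side of (2.6) can be evaluated directly; a minimal sketch, with Δ = 1.5 as an arbitrary assumed gap:

```python
import math

def etot_lower_bound(rho_max, delta):
    """Right side of (2.6): lower bound on the overall efficiency E_tot."""
    x = (math.exp(rho_max) - 1.0) / delta
    return math.log(1.0 + x) / (math.exp(rho_max) - 1.0)

delta = 1.5
# The bound grows monotonically as rho_max shrinks ...
bounds = [etot_lower_bound(r, delta) for r in (2.0, 1.0, 0.5, 0.1, 1e-6)]
assert all(a < b for a, b in zip(bounds, bounds[1:]))
# ... and approaches 1/delta in the limit rho_max -> 0, in agreement with (2.13).
assert abs(bounds[-1] - 1.0 / delta) < 1e-3
```

This illustrates both conclusions of the derivation: the bound never exceeds 1/Δ, and it is approached by driving the rate per layer to zero.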
To be fair, while the lower bound (2.6) is valid for any number of received blocks, the exact efficiency of the code after a particular number of blocks might decrease as ρ_max decreases. In other words, having ρ_max → 0 does not guarantee that the efficiency after a particular number of blocks is maximized, only that the lower bound (2.6) on the efficiency is maximized. In particular, the achievable efficiency after the first block actually decreases as ρ_max → 0. This occurs because there is no loss due to MRC with only one block, so E_MRC = 1, and therefore E_tot = E_gap after one block. As shown in Fig. 2-2, E_gap decreases as ρ_max decreases, so the efficiency of the code in the first block decreases as ρ_max decreases. The lower bound is still valid, but since E_tot = E_gap, the efficiency in the first block actually decreases as ρ_max decreases. In [20], the power allocation for each layer is modified to improve upon the performance of the code from [7]. Although the code in [20] has a power allocation that is different from the one that we have analyzed and used in our simulations, the code is similar and similar results regarding the efficiency of the code are reported. In [20] the gap to capacity of the first block increases if the total rate in the block is fixed and ρ_max decreases. Since the gap to capacity of the first block increases as ρ_max decreases, the efficiency of the code, E_tot, after the first block also decreases as ρ_max decreases. However, for all other numbers of received blocks, in [20] the efficiency of the code increases as ρ_max decreases. This behavior makes sense in light of our analysis, because for more than one block E_MRC < 1 and decreasing ρ_max will increase E_MRC. Although ρ_max → 0 maximizes the lower bound (2.6), it makes the code less efficient in the first block.
However, for all other numbers of received blocks, E_MRC is increased as ρ_max decreases, and it was found empirically for a similar rateless code [20] that the efficiency of the rateless code does in fact increase as ρ_max decreases. To summarize, we have shown that the lower bound on efficiency increases for all numbers of blocks as ρ_max → 0, but for our code and for a similar code [20] the actual efficiency decreases for the first block. While we have not verified that the efficiency increases in our code for more than one block, this is the case for the code in [20]. As suggested in [20], the design of a particular rateless code could be done through a numerical optimization of a weighted sum of the gap to capacity, or equivalently the efficiency, after each number of received blocks. This could be especially useful if we have knowledge of the channel statistics and incorporate that information into the code design. For example, we could numerically take an expectation of the performance of the code over the possible realizations of the channel.

2.2.2 Simulation Results with an LDPC Base Code

We simulated two different versions of the layered, dithered repetition rateless code with a time-varying power distribution from [7]. In each code, we have L = 4 layers, and use an LDPC code from the Digital Video Broadcasting Second Generation (DVB-S2) standard. Specifically, we use what is referred to in the DVB-S2 standard as the "short rate-1/4" code. "Short" indicates a codeword length of n = 16200 bits, where "long" would mean a codeword length of n = 64800 bits. The actual rate of this code is 1/5 b/1D, so, despite the label in the DVB-S2 standard, we will refer to the code as a rate-1/5 code. Information on this standard can be found in [9]. More information on how we simulated the rateless code can be found in Appendix B.
With four layers and a rate-1/5 (or, equivalently, a rate-0.4 b/2D) base code used on each layer, the overall initial rate of the rateless code after one block is R1 = R_base · L = 0.4 · 4 = 1.6 b/2D. The base code achieves a bit-error rate of 10^-4 when the SNR is equal to SNR_t, the SNR threshold for decoding at that bit-error rate:

    SNR = Es/No = SNR_t = 0.42 = -3.7679 dB    (2.14)

where Es is the average signal energy per two dimensions and the noise variance per one dimension is σ² = No/2. The spectral efficiency of the base code is ρ = 0.4 bits per two dimensions (or b/2D), but with a code that achieved capacity at the same SNR, one could achieve:

    ρ_max = log2(1 + SNR_t) = log2(1 + 0.42) = 0.5059 b/2D    (2.15)

The power allocation of the rateless code depends on our choice of C*, the maximum target spectral efficiency. Once C* is selected, the power is allocated as described in [7]. In the first code that we simulated, we did not take into account the MRC or gap to capacity inefficiency of the code in our selection of C*. C* (given in units of information per two dimensions) was selected in such a way that we tried to achieve 100% of the channel capacity. That is, we chose C* = ρL, where ρ is the spectral efficiency of the base code. Without reiterating all of the details of the code construction in [7], suffice it to say that the power is allocated to maintain a balanced mutual information per layer. However, as noted in [7], when maximal ratio combining is performed at the receiver, mutual information does not add linearly with SNR. Therefore, instead of being able to decode with an initial rate of C*/L on each layer, from [7] at certain channel noise breakpoints the SNR per layer is lower bounded by C*/L, if we give C* in nats/2D. With a capacity-achieving code on each layer, the achievable rate per layer is lower bounded by log(1 + C*/L), which is less than C*/L (where C* in both formulas is in nats/2D). This leads to the MRC inefficiency in the rateless code.
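The threshold numbers in (2.14) and (2.15) are easy to reproduce; a quick check:

```python
import math

snr_t = 0.42                        # linear SNR threshold of the rate-1/5 base code
snr_t_db = 10 * math.log10(snr_t)   # approx -3.77 dB, as in (2.14)
rho = 0.4                           # spectral efficiency of the base code (b/2D)
rho_max = math.log2(1 + snr_t)      # approx 0.5059 b/2D, as in (2.15)
e_gap = rho / rho_max               # base-code efficiency E_gap, per (2.3)
print(f"SNR_t = {snr_t_db:.2f} dB, rho_max = {rho_max:.4f} b/2D, E_gap = {e_gap:.3f}")
```

The resulting E_gap ≈ 0.79 is the factor by which the imperfect base code discounts the efficiency of each layer.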
Taking into account the gap to capacity of the base code, with the power allocation from [7], the actual rate that we can achieve on each layer is lower bounded by E_gap · log(1 + C*/L) nats/2D. In the second code that we simulated, we took into account both the MRC and the gap to capacity inefficiencies in the code and chose a higher C* so that ρ = E_gap · log(1 + C*/L) nats/2D, where we have again used C* in nats/2D in the formula. When we choose C* = ρL, then the receiver attempts to decode a base code with spectral efficiency ρ = C*/L, which we have seen cannot be done at the set of channel noise breakpoints assumed in [7]. In contrast, when we choose a higher C* so that each layer is coded at a spectral efficiency ρ = E_gap · log(1 + C*/L), decoding is possible at the set of channel noise breakpoints assumed in [7]. The derivation of the lower bound (2.6) assumes that the code is constructed with ρ = E_gap · log(1 + C*/L), ensuring that each layer can be decoded. Thus, the lower bound (2.6) is valid when we choose C* so that ρ = E_gap · log(1 + C*/L). If we choose C* so that ρ = C*/L, then the spectral efficiency of the base code is higher than we assumed in the derivation of (2.6) and we will see from our simulations that the lower bound no longer holds.

Figure 2-3: Bit-error rate versus SNR_norm of layered, dithered repetition rateless code from [7]. Here we do not take into account the inefficiencies in the rateless code and the base code spectral efficiency is ρ = C*/L.

Fig. 2-3 shows the performance of the layered, dithered repetition rateless code when the inefficiencies in the code are not considered and we attempt to obtain 100% of capacity with ρ = C*/L. When we take the inefficiencies of the rateless code into account and choose a higher C* so that each layer is coded at the appropriate rate, ρ = E_gap · log(1 + C*/L), performance improves, and the lower bound (2.6) on efficiency is met. Fig.
2-4 shows the performance of the layered, dithered repetition rateless code when both the MRC inefficiency and the gap to capacity of the base code are taken into account. It is clear that this code performs better than the code that does not take these factors into account. When we take into account the inefficiencies in the code, the bound (2.6) on E_tot is valid. It is important to check whether or not the simulation results agree with the theoretical bound.

Figure 2-4: Bit-error rate versus SNR_norm of layered, dithered repetition rateless code from [7]. Here we do take into account the inefficiencies in the rateless code and the base code has a spectral efficiency of ρ = E_gap · log(1 + C*/L).

For up to four blocks, the simulated code indeed meets the lower bound on the efficiency of the rateless code. The efficiency of the code is shown in Fig. 2-5. It is also interesting to plot the gap to capacity of the rateless code. The gap to capacity vs. the number of received blocks is shown in Fig. 2-6. We can translate the lower bound on efficiency to a corresponding upper bound on the gap to capacity. This upper bound on gap to capacity is also included in Fig. 2-6. Note that the code meets the upper bound on gap to capacity. If we do not take into account the inefficiencies in the code, the assumptions that were made in deriving the lower bound (2.6) are no longer valid, so it is not clear that the lower bound on efficiency still holds. The efficiency of this code is plotted in Fig. 2-7.

Figure 2-5: Efficiency of the rateless code versus the number of received blocks when ρ = E_gap · log(1 + C*/L).

Note that for one block, the code clearly does not meet the efficiency
bound. The efficiency of the code that accounts for the inefficiencies, by contrast, is greater than the lower bound on E_tot in the simulations; thus, for that code, the simulated results agree with the theoretical bound.

Figure 2-6: Gap to capacity of the rateless code versus the number of received blocks when ρ = E_gap · log(1 + C*/L). Given the lower bound on efficiency, we can calculate a corresponding upper bound on the gap to capacity, which is plotted in the top curve. The actual gap to capacity from the simulations is below this bound.

Figure 2-7: Efficiency of the rateless code versus the number of received blocks when ρ = C*/L. Here we do not take into account the inefficiencies in the code when choosing C*. The assumptions made in deriving (2.6) are no longer valid. It is clear that the bound (2.6) does not hold for this code and this code performs worse than the code in Fig. 2-5.

The simulations show that the code with ρ = C*/L does not meet this bound, so the bound does not apply to the rateless code where ρ = C*/L. In addition, the gap to capacity of the rateless code with ρ = C*/L exceeds the corresponding upper bound on gap to capacity, which is shown in Fig. 2-8. Our simulations show that the performance of the rateless code improves when the inefficiencies in the code are taken into account in its construction. Specifically, taking the inefficiencies into account means choosing C* so that ρ = E_gap · log(1 + C*/L), with C* given in nats/2D. When we do not take the inefficiencies into account the lower bound (2.6) is not valid. However, when we do take the inefficiencies into account (2.6) is valid and our simulation results up to four rateless blocks meet this bound.
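The translation between an efficiency lower bound and a gap-to-capacity upper bound can be sketched with a hypothetical helper, assuming efficiency is defined as the achieved rate over capacity at the operating SNR (consistent with the linear gap Δ used in (2.1)), with rates in b/2D:

```python
import math

def gap_to_capacity_db(rate_b2d, efficiency):
    """SNR gap (dB) between the SNR a code of the given efficiency requires
    and the Shannon-limit SNR for the same rate.

    efficiency = rate / log2(1 + SNR_needed), so
    SNR_needed = 2**(rate / efficiency) - 1, versus the ideal 2**rate - 1."""
    snr_needed = 2 ** (rate_b2d / efficiency) - 1
    snr_ideal = 2 ** rate_b2d - 1
    return 10 * math.log10(snr_needed / snr_ideal)

# A perfectly efficient code has zero gap; efficiency 0.74 at the initial
# rate of 1.6 b/2D maps to a gap of roughly 2.3 dB.
print(gap_to_capacity_db(1.6, 1.0), gap_to_capacity_db(1.6, 0.74))
```

The mapping is monotone, which is why a lower bound on efficiency translates directly into an upper bound on the gap to capacity, as in Figs. 2-6 and 2-8.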
Choosing C* so that ρ = E_gap · log(1 + C*/L), we have a four layer rateless code with initial rate of 1.6 b/2D that has a gap to capacity for up to four blocks of approximately 2 to 2.5 dB, and an efficiency between approximately 0.69 and 0.74.

Figure 2-8: Gap to capacity of the rateless code versus the received number of blocks when ρ = C*/L. Here we do not take into account the inefficiencies in the code when choosing C*. The assumptions made in deriving (2.6) are no longer valid. It is clear that the gap to capacity upper bound corresponding to the efficiency lower bound (2.6) does not hold for this code and this code performs worse than the code in Fig. 2-6.

2.2.3 Finite Precision Digital to Analog Converter Effects

In the code construction, the amount of power allocated to each layer varies from block to block. Since the signal points that are transmitted are a superposition of all of the layers in one block, the time-varying power distribution corresponds to a signal set that also varies with time. Implemented hardware cannot send an arbitrary signal point with infinite precision. We must consider the effects of a finite resolution digital to analog (D/A) converter at the transmitter on the performance of the rateless code. In this section, we will see that for our four layer code, the performance loss is small with an appropriate choice of moderate resolution D/A converter. Interestingly, we will find in choosing the D/A converter that reducing the mean-squared error between the points we want to send and the points that we do send does not necessarily improve the performance of the code. Reducing the mean-squared error is a common goal in analog to digital and digital to analog converters. Therefore, one might initially conjecture that reducing the mean-squared error should be the goal of the D/A converter at the transmitter for a rateless code.
However, the goal is to reduce the bit-error rate, and we will show that reducing the mean-squared error does not monotonically reduce the bit-error rate of the rateless code. The signal points that we want to send are calculated digitally as a superposition of all of the layers in one particular block. We call the vector of points that we want to send X. We pass the points through a D/A converter and send the analog points X̂. The process of sending X through a D/A converter can equivalently be viewed as quantizing each symbol in the vector X to a certain number of bits, B, then being able to send the exact points with a B-bit D/A converter. With these modifications, the equivalent channel model changes from:

    Y = X + N    (2.16)

to the new model:

    Y = Q(X) + N = X̂ + N    (2.17)

where Q(·) is a B-bit quantizer. Note that Q(·) quantizes each symbol in the vector separately and we have been keeping track of the fact that X is a vector as a reminder that we are sending a block of symbols. Although X is a vector we are not performing any kind of vector quantization - the quantization is done on a symbol-by-symbol basis. If we are free to choose an arbitrary set of 2^B points that we can send with the D/A converter, and the number of points in our signal set is less than or equal to 2^B, then the problem becomes trivial. Assuming that each block consists of a different set of points that can be transmitted, each block has 2^L points that can be sent, where L is the number of layers per block. If we want to have the ability to send up to M_Max blocks, then the total number of points that we would need to be able to send is M_Max · 2^L. The total number of bits that we would need is B = ⌈log2(M_Max · 2^L)⌉. For example, L = 4 and M_Max = 20 corresponds to B = ⌈log2 320⌉ = 9 bits, which is a moderate amount of resolution.
With this moderate amount of resolution, one could use the 4-layer rateless code discussed earlier down to a rate of R1/M_Max = 1.6/20 = 0.08 b/2D, which would allow reliable communication down to low channel SNR with no loss in performance due to the D/A conversion. In addition, it turns out that as the block number, M, grows large, each layer in the rateless code is allocated equal power. This happens fairly quickly, and how quickly depends on the particular rateless code. This behavior is advantageous because it allows us to make M_Max moderate, but still send more than M_Max blocks with little or no quantization error. In contrast to the above scenario where the 2^B D/A levels can be selected arbitrarily, here we consider being restricted to using a D/A converter with uniformly spaced levels, which is more realistic. In this case the D/A converter will add distortion to the received signal. To determine the effect of the D/A converter on the overall code performance, we must understand how this distortion is added. Specifically, it is not clear that we will be able to use the same assumptions for quantization noise in this scenario as one might for certain analog signals. In [15], it is explained that, heuristically, "... when the signal is a complicated signal, such as speech or music, where the signal fluctuates rapidly in an unpredictable manner, the assumptions are more realistic." Those assumptions, from [15] but changing the notation to our notation, are:

1. The error sequence e = X - X̂ is a sample sequence of a stationary random process.
2. The error sequence is uncorrelated with the sequence X.
3. The random variables of the error process are uncorrelated; i.e., the error is a white-noise process.
4. The probability distribution of the error process is uniform over the range of quantization error.

One assumption that almost certainly will not hold is the first one - that the error sequence is a sample of a stationary random process.
Since the signal set varies from block to block, it is very likely that the quantization error sequence will be allowed to take on different values from block to block. We have found empirically that this is the case. Thus, the probability distribution changes with time. In addition, we have found empirically that the variance of the quantization error changes from block to block, so the error sequence does not even meet the criteria to be wide-sense stationary. In addition, the fourth assumption will not hold - that the probability distribution of the error process is uniform (and it is implied that the error process is continuous since we typically quantize a continuous signal) over the range of quantization error. This is obvious because we are quantizing variables that take on discrete values. Therefore, the error signal must have a discrete probability distribution. It is not immediately clear whether or not the quantization error will have a discrete uniform probability distribution. Finally, even if there were some conditions under which these assumptions were correct or almost correct, then the quantization noise would be white but with a uniform probability distribution. It is not clear how adding uniform white noise to the additive white Gaussian noise of the channel would affect performance. The quantity that we wish to minimize is the bit-error rate at the decoder. When quantizing an analog signal, the mean-squared error is often the quantity that is to be minimized. In our scenario, we would intuitively guess that lower mean-squared errors might correspond to better performance, but we cannot be sure of this. As we have already stated, it will be shown in the sequel that this is not necessarily the case. Fortunately, even with a uniform D/A converter, the system designer has control over the range of the converter and the number of bits. While the number of bits can be increased, more sophisticated hardware is necessary to obtain a higher resolution. 
Therefore, it is desirable to understand how a D/A converter affects performance.

Description of Digital to Analog Converter at the Transmitter

We want a D/A converter that has levels at the minimum possible signal point, the maximum possible signal point, and zero. Thus, we develop the following simple algorithm for setting the levels of the D/A converter:

1. For a particular rateless code, determine MMax, the maximum number of rateless code blocks that could be transmitted.
2. Find the minimum and maximum signal points that could be transmitted in those MMax blocks.
3. Set the uniformly spaced digital to analog converter levels so that the minimum level corresponds to the minimum possible signal point, and the second-highest D/A level corresponds to the maximum possible signal point.

Note that step three automatically places a level at zero, and the maximum possible D/A level is not used.

Description of Receiver

We have not made any modifications to the receiver. Specifically, the receiver still receives the signal points from the channel with arbitrary precision. In addition, the receiver does not take the D/A converter into account. The maximal-ratio combining is done assuming the signal points were transmitted with infinite precision. Also, the successive cancellation structure of the receiver is not changed. The receiver strips off each layer assuming that the signal points were sent with infinite precision. Since the receiver assumes that the signal points were sent with infinite (or arbitrarily high) precision, it could in fact strip off the wrong amount of power for a particular layer. However, implementing a more sophisticated stripping algorithm would require additional information and a more complex receiver. For example, it might require an estimate of exactly which signal point was sent, which requires a joint estimate of the bit values in all of the layers corresponding to that particular symbol.
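The transmitter-side level-setting algorithm described above can be sketched as follows. This is a minimal sketch with our own function names; the symbol-by-symbol nearest-level mapping stands in for the D/A conversion. Note that for a symmetric signal set (x_min = -x_max), placing the lowest level at x_min and the second-highest at x_max puts one level exactly at zero, as claimed.

```python
import numpy as np

def dac_levels(x_min, x_max, b):
    # Uniformly spaced levels: lowest level at x_min, second-highest at x_max
    # (steps 2-3 of the algorithm).  The top level is left unused.
    step = (x_max - x_min) / (2 ** b - 2)
    return x_min + step * np.arange(2 ** b)

def quantize(x, levels):
    # Map each symbol to the nearest D/A level, symbol by symbol.
    idx = np.abs(np.asarray(x)[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]
```

Usage: with a symmetric set, `dac_levels(-1.0, 1.0, 5)` produces 32 levels with the lowest at -1, the 31st at +1, and one level at zero.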
We will see that the performance loss can be made small with an appropriate choice of D/A converter at the transmitter. Therefore, we believe that it is more fruitful at this time to look at how to choose the D/A converter instead of modifying the receiver.

Performance versus Mean-Squared Error

As noted previously, mean-squared error (MSE) is often the quantity of interest when a signal is quantized. Minimizing the MSE is a common goal. Short of minimizing the MSE, any quantizer that results in a lower MSE than another quantizer is deemed better. Thus, an initial conjecture might be that reducing the MSE from the D/A converter for a particular set of rateless blocks will monotonically improve the performance over that set of blocks. We show in this section that this is not true. While we do attempt to minimize the MSE, we show that reducing the MSE does not necessarily lead to better performance. In this example we change MMax, the maximum number of blocks that can be sent by the transmitter. This changes the maximum and minimum points that can be sent, and sets our levels for the D/A converter as described earlier. We use MMax = 1 and MMax = 20. Each D/A converter has a different mean-squared error for the first block. We compare the performance of the first block of the rateless code after passing it through each of the two D/A converters. Each D/A converter has 5-bit precision (i.e., B = 5).

Figure 2-9: Performance comparison of a rateless code that is sent through two different digital to analog converters with 5-bit precision. The quantization levels are chosen differently, and the two codes perform differently. The maximum number of blocks that can be transmitted is denoted by MMax. With MMax = 1 the mean-squared error of the first block due to the D/A is 5.6 × 10^-3.
If we quantize with MMax = 20, then the MSE of the first block is 7.7 × 10^-3. Although the mean-squared error increases for MMax = 20, the bit-error rate decreases. The difference in SNR to achieve a bit-error rate of 2 × 10^-4 is about 0.6 dB.

If we quantize with MMax = 1, B = 5, then the MSE of the first block due to the D/A converter is 5.6 × 10^-3. If we quantize with MMax = 20, B = 5, then the MSE of the first block is 7.7 × 10^-3, an increase of about 38%. However, at a bit-error rate of 2 × 10^-4, the code actually performs about 0.6 dB better when MMax = 20 and the MSE is higher, as illustrated by Fig. 2-9. Thus, the performance of the rateless code does not monotonically improve as the MSE decreases.

Figure 2-10: Effect of quantization on performance of rateless code with one block (M = 1). The bit-error rate vs. SNRnorm for one rateless block is plotted for four different scenarios - no quantization, 5-bit quantization, 6-bit quantization, and 8-bit quantization. In each scenario, MMax = 20.

Performance with Increasing D/A Resolution

The assumption that decreasing the mean-squared error would always improve the performance of the code was proven false in an example where we used two different 5-bit D/A converters with different level placement. Here we show that increasing the D/A resolution does not monotonically improve the performance of the rateless code either, which is again contrary to what one would naively expect. The performance with no quantization, 5-bit quantization, 6-bit quantization, and 8-bit quantization is illustrated in Fig. 2-10. In Fig. 2-10 only one rateless block is sent and MMax, the maximum number of blocks that can be sent, is set to twenty. In Fig. 2-10 we see that the rateless code with a 5-bit D/A converter performs as well as the code with no quantization.
We would expect that increasing the D/A resolution could not yield any improvement in performance. However, the code does indeed perform better when the resolution is increased to 6 bits. When we change the D/A converter, we are changing the codewords of the rateless code, and thus the code itself. Therefore, the fact that the performance with a 6-bit D/A converter is better than the performance of our rateless code with no quantization shows that for one block there is a code that performs slightly better than our rateless code. This is a very interesting result by itself, although not unprecedented. In fact, the code in [20], which is also based on [7] and is similar to our code, but with an optimized power distribution, should perform better given the same base code. Next, it might be assumed that increasing the resolution further would improve the performance yet again, or at least not decrease the performance. However, when the resolution is increased to 8 bits, the performance returns to that of the code with no quantization. The loss in performance when increasing the resolution from 6 bits to 8 bits shows that merely increasing the resolution of the D/A converter, while maintaining uniformly spaced levels that are calculated as previously described, does not always increase the performance of the code. Although increasing the resolution does not always increase the performance of our code, there might exist systematic methods for increasing the resolution that provide performance that is monotonically non-decreasing with respect to the resolution. One possible method is the following:

1. Simulate the performance of the rateless code with a B-bit D/A converter.
2. Increase the D/A precision to B+1 bits by ensuring that all levels that are represented by the B-bit D/A can also be represented by the (B+1)-bit D/A; simply add new levels in between the existing levels from the B-bit D/A converter.
3.
Identify new D/A levels that increase the performance of the rateless code. Keep these levels and discard the ones that do not increase the performance of the rateless code. The best way to identify these levels is not immediately clear.

The above recommendation is based on the idea that when the resolution is increased, more resolution is available but none of the new D/A levels actually have to be used. At the very least, it is desirable that the performance is not worse than it was with a smaller resolution. Adding new levels one-by-one, or perhaps two-by-two since the signal set is symmetric, would ensure that the performance does not decrease when the resolution is increased.

D/A Effects with More Than One Block

The previous results were simulated by sending only one rateless block. In this section we show that for a larger number of rateless blocks, the loss in performance with our 5-bit D/A is no longer zero, but it is still fairly small. Specifically, with four blocks the loss in performance at a bit-error rate of 10^-4 with respect to no quantization is 0.16 dB. This is shown in Fig. 2-11. In the fourth block, the difference in power between any two layers is less than 2.1 percent. As the number of blocks goes to infinity, the power allocated to each layer becomes equal. Since the power in each layer is almost equal after four blocks, the quantization error that occurs in the fourth block is very similar to the error that will occur for all ensuing blocks.

D/A Effects with a Large Number of Layers

The results of the previous sections are promising. The D/A converter causes only a small loss in performance with even modest resolution. However, in the code we simulated, we only use four layers. What if the number of layers increases while the amount of power in each block remains constant? This situation is not straightforward to simulate. As the number of layers increases, the number of signal points also increases, but the power per block stays the same.
We would expect the signal points to become more densely packed into a particular region. Because the signal points become more densely packed, this might have a mitigating effect on the performance loss due to the D/A converter. In other words, even though we have more signal points, they are more densely packed, so the number of bits of precision necessary to obtain a certain level of performance might not increase with the number of layers. It is difficult to test this hypothesis because as we increase the number of layers, we need a code with a lower rate. Thus we cannot use the same base code, and furthermore it is not straightforward to find a lower-rate base code with the same gap to capacity. For these reasons, the performance of rateless codes with a larger number of layers under a finite-precision D/A constraint is an area for further investigation.

Figure 2-11: Effect of quantization on performance of rateless code for four blocks (M = 4). The bit-error rate vs. SNRnorm is plotted for two different scenarios - no quantization and 5-bit quantization. The loss in performance due to quantization at a bit-error rate of 10^-4 is 0.16 dB.

Conclusion of D/A Effects

The most important point is that, at this time, there are no clear rules that relate D/A performance with bit-error rate. It was shown that decreasing the mean-squared error after a particular number of blocks does not necessarily improve the performance of the code after that number of blocks. Similarly, increasing the D/A resolution also does not necessarily improve the performance of the code after a particular number of blocks. It is promising that in our simulations the amount of performance loss was small for even a modest-resolution D/A converter.
Also, the fact that the power distribution approaches a constant value for each layer fairly quickly means that the signal points that are transmitted in later blocks are approximately equal. If we regard the signal points in later blocks as exactly equal, then past a certain block number the size of the signal set does not grow with the number of blocks. However, if a code with many layers is designed and the number of bits in the D/A converter needs to be greater than the number of layers, then that code would require more complex hardware than a code with fewer layers. We are not sure at this time if the number of bits needed grows with the number of layers. Clearly, further investigation is necessary to understand how to optimize the performance of a rateless code under a certain D/A converter constraint. It is recommended that before implementation, the performance of a particular rateless code under the constraint of a finite-precision D/A converter be simulated. In light of the counterintuitive results of this section, simulation will provide accurate feedback on the performance of a particular rateless code with a certain D/A converter.

2.3 A Rateless Code with a Constant Power Distribution

The previous code has a time-varying power distribution. The total amount of power in each block remains constant; however, the amount of power allocated to each layer varies from block to block. In Chapter 3 of [19], a sub-block structured code with a constant power allocation was proposed. This code was proposed for parallel Gaussian channels, but here we discuss using it for the single-input single-output (SISO) AWGN channel. One advantage that this code has over the previous code is that it will have the so-called "fountain" property.
This allows less synchronization between the transmitter and receiver compared to the code in [7], and is also beneficial in environments where the receiver does not receive certain blocks, or receives a very noisy version of certain blocks. The first fountain codes were LT codes, invented in 1998 and described in [13]. As previously mentioned, these codes and later modifications are capacity-approaching for the binary erasure channel. However, they are bounded away from capacity for the AWGN channel [8], [16].

2.3.1 Code Design for a Time-Varying Interference Channel

We will call one layer from a "basic unit" in [19] a sublayer. We will refer to the concatenation of multiple sublayers as one layer of the block. Due to the structure of the rateless code in [19], each sublayer sees a time-varying amount of interference, plus channel noise. The signal energy and the variance of the channel noise stay constant, but the overall interference plus noise varies within each sublayer. Therefore, the signal-to-interference-plus-noise ratio (SINR) varies within the sublayer. The SINR within a sublayer has a very specific structure. Note that a sublayer is a block of n coded bits. We can divide each sublayer into L mutually exclusive subsets of bits, each of length n/L. As in [19] we will call each of these L sets of bits a sub-block. The first sub-block is the first n/L bits of the codeword, the second sub-block is the next n/L bits of the codeword, and so on until the Lth sub-block, which is the last n/L bits of the codeword. The first sub-block sees only channel noise. The second sub-block sees channel noise plus one layer of interference. Each layer of interference has energy per symbol that is equal to the signal energy per symbol in one layer. The next sub-block sees an additional layer of interference. This continues until the final sub-block in a sublayer sees channel noise plus L - 1 layers of interference.
While the SINR varies across a sublayer, within a sub-block the SINR is constant. Therefore, each sublayer is essentially divided into L constant-SINR sub-blocks. One block of a rateless code with four layers and one sublayer is illustrated in Fig. 2-12. In Fig. 2-12, the rateless block is shown on the left, and on the right we zoom in on the top sublayer. The dashed lines represent the boundaries between the different sub-blocks in the sublayer. The labels r1, r2, r3, and r4 in the different sub-blocks will be explained later when we discuss the divide and conquer approach to constructing a code for the time-varying interference channel that a sublayer sees.

Figure 2-12: One block of a rateless code from [19] with four layers and one sublayer per layer. On the right, we take a closer look at the top sublayer, which is n bits long. The dashed lines indicate the boundaries between sub-blocks in the sublayer. The SINR is constant within a sub-block, but is different for each sub-block. If we choose to design a base code with the divide and conquer approach, we use a different code with codeword length n/L on each sub-block. The labels ri indicate the rate of the code used on the ith sub-block if we take the divide and conquer approach to base code design. Another approach to base code design is to use one code with codeword length n across the entire sublayer.

The base code for the sublayer of n bits must be capacity-achieving in a time-varying interference channel for the rateless code to achieve the efficiency described in [19]. Designing such a base code is a difficult problem.
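The staircase interference pattern just described can be written down directly. In this sketch (our own notation; channel noise is normalized to unit variance, and every layer carries an equal share of the total SNR, consistent with the equal energy per layer stated above), sub-block i sees i interfering layers:

```python
def subblock_sinrs(snr, L):
    # SINR of each of the L constant-SINR sub-blocks within one sublayer.
    # Unit-variance channel noise; each of the L layers carries power snr/L.
    # Sub-block i (0-indexed) sees channel noise plus i interfering layers.
    per_layer = snr / L
    return [per_layer / (1.0 + i * per_layer) for i in range(L)]
```

For example, with snr = 4 and L = 4 the first sub-block sees SINR 1 (noise only) and the last sees 1/4 (noise plus three interfering layers).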
One approach is to "divide and conquer" - that is, take advantage of the division of a sublayer into L constant-SINR sub-blocks and use a different-rate code on each of the sub-blocks that matches the SINR in that particular sub-block. The divide and conquer approach takes advantage of the fact that the SINR is constant within each sub-block, and the problem of code design is then reduced to creating good AWGN codes at different rates, which has been studied extensively. However, it is shown in the sequel that this approach is suboptimal. The basic reason is that instead of averaging the mutual information over the different sub-blocks and then taking the worst-case channel realization, the worst case for each of the L sub-blocks is considered separately, and then the average mutual information is taken. A second approach is to use a code that works well in the AWGN channel and see if the performance is still good for this specific time-varying channel. This approach has been simulated with an LDPC code and is detailed in the sequel. We found that, for a modest number of sub-blocks per layer, the code performs as well in this time-varying channel as it does in the AWGN channel.

Divide and Conquer

One approach to coding for the time-varying interference that each sublayer sees is to use the division of each sublayer into constant-SINR sub-blocks and code at a rate appropriate for each sub-block. The division of a sublayer into constant-SINR sub-blocks, each with a different-rate code, is illustrated in Fig. 2-12, where ri is the rate of the code used for the ith sub-block. This transforms a time-varying interference channel into several constant-SINR channels. Because the SINR decreases as the sub-block index increases, we must choose r1 > r2 > r3 > r4. However, taking this approach is suboptimal and results in a loss in the achievable rate of the overall rateless code. Fig.
2-13 shows that the efficiency of the rateless code is greatly reduced with this divide and conquer approach, even if we only plan on sending a maximum of two rateless blocks. In Fig. 2-13, we set MMax, the maximum number of blocks that can be transmitted, to two (MMax = 2). The loss in efficiency due to the divide and conquer approach becomes worse if we want to be able to transmit more than two blocks. This is illustrated in Fig. 2-14, where we look at the efficiency if we want to send up to fifty rateless code blocks (MMax = 50). The efficiency of both approaches decreases as we increase MMax. Comparing Fig. 2-14 to Fig. 2-13, we see that as MMax increases, the efficiency of the divide and conquer approach decreases more than the efficiency when we do not divide and conquer. The results illustrated in Figs. 2-13 and 2-14 are typical as we vary the maximum number of blocks. One might initially posit that a possible redeeming quality of the divide and conquer approach is that the loss in efficiency is not as large for lower values of C*, the maximum target rate. While rateless codes in the low-C* regime are often useful, the potential of rateless codes is exciting because we can extend the code to be capacity-achieving for a wide range of rates that includes higher values of C*.

Figure 2-13: Comparison of the efficiency of the rateless code if we take the average mutual information in a sublayer versus if we code each sub-block within a sublayer separately. Here we use a rateless code with sixteen layers, and the maximum number of rateless blocks that we could transmit is only two. Efficiency is plotted against the maximum achievable rate, C*, which is the Shannon limit on the spectral efficiency at which one could communicate over the AWGN channel if one can decode after one rateless block.

In Figs.
2-13 and 2-14 we assumed for both curves in each figure that the codes used on each layer, as well as the interference plus noise that a layer sees, are Gaussian. Thus, the mutual information is given by the familiar formula log₂(1 + SNR). In practice, the codes used on each layer will most likely be binary, and therefore the interference that a layer sees would be a combination of independent binary random variables. An efficiency analysis in [19] that compares the efficiency when approximating the interference plus noise as Gaussian to the efficiency using exact noise analysis for a Gaussian codebook on each layer shows that often the two are very close. The exceptions are with a low number of blocks and at high C*. As the number of blocks combined at the receiver increases, the interference plus noise will become more Gaussian by the central limit theorem. Given the huge discrepancy in efficiency in both Fig. 2-13 and Fig. 2-14, the fact that we approximated the code and interference plus noise as Gaussian is a relatively insignificant source of error.

Figure 2-14: Comparison of the efficiency of the rateless code if we take the average mutual information in a sublayer versus if we code each sub-block within a sublayer separately. Here we use a rateless code with sixteen layers, and the maximum number of rateless blocks that we could transmit is fifty. Efficiency is plotted against the maximum achievable rate, C*, which is the Shannon limit on the spectral efficiency at which one could communicate over the AWGN channel if one can decode after one rateless block.

The reason for the loss in efficiency when we code each sub-block separately is that we code at the minimum mutual information that each sub-block could see. That amount of mutual information depends on the actual channel realization.
If we are given that our channel SNR will fall in a certain range, then we know that the Shannon upper limit on the rate that our channel can support will be in an interval [C*/MMax, C*] corresponding to the given range on the channel SNR. Therefore, the number of blocks the receiver needs to decode falls in the interval M = [1, MMax], where M is an integer. For simplicity of analysis, we have assumed that the rate that the channel can support is exactly C*/M for some integer M in the range M = [1, MMax]. If it takes M blocks to decode, then we assume that the channel signal-to-noise ratio is:

SNR(M) = 2^(C*/M) - 1    (2.18)

To find the efficiency of the rateless code, we search over every number of blocks in the interval [1, MMax] and find the channel noise corresponding to needing that number of blocks to decode. Using that channel noise value, we calculate the amount of mutual information that a sublayer contains and also the amount of mutual information that each sub-block contains. We want the rateless code to work for any number of blocks in the interval [1, MMax]. Therefore, if we do not divide and conquer, we must code each sublayer at a rate lower than the minimum possible amount of mutual information that each sublayer could contain. If we do divide and conquer, we must code each sub-block at a rate lower than the minimum possible amount of mutual information that each sub-block could contain. The divide and conquer approach is less efficient because we take the minimum mutual information that each sub-block could contain, and then take the average of those minimum values. In general, each sub-block could have its mutual information minimum occurring at a different value of M within the interval [1, MMax]. In contrast, when we do not divide and conquer, we take the average mutual information that a sublayer sees, then take the minimum of those averages, which results in a significantly higher efficiency.
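The min-of-averages versus average-of-minimums distinction can be checked numerically. The sketch below is a simplification of the efficiency analysis: it compares per-block mutual informations under the Gaussian approximation log₂(1 + SINR) with equal power per layer, and does not model the combining of multiple received blocks; the function names are ours.

```python
import math

def subblock_mi(snr, L):
    # Per-block mutual information of each constant-SINR sub-block, under the
    # Gaussian approximation (unit-variance channel noise, power snr/L per
    # layer, sub-block i seeing i interfering layers).
    per_layer = snr / L
    return [math.log2(1.0 + per_layer / (1.0 + i * per_layer)) for i in range(L)]

def compare_orderings(c_star, L, m_max):
    # Channel realizations for which decoding takes exactly M blocks (Eq. 2.18).
    mi = [subblock_mi(2.0 ** (c_star / M) - 1.0, L) for M in range(1, m_max + 1)]
    # No divide and conquer: average over sub-blocks, then worst case over M.
    min_of_avg = min(sum(row) / L for row in mi)
    # Divide and conquer: worst case of each sub-block over M, then average.
    avg_of_min = sum(min(row[i] for row in mi) for i in range(L)) / L
    return min_of_avg, avg_of_min
```

By the max-min inequality, min_of_avg can never be smaller than avg_of_min; the gap between the two is the source of the efficiency loss of the divide and conquer approach in this simplified model, and it widens as m_max grows.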
The divide and conquer approach is not suitable for moderate to high values of C*, so we must look at other ways of coding near the capacity of our time-varying interference channel.

AWGN Codes in Our Time-Varying Channel

Unfortunately, the simple divide and conquer approach is highly inefficient for most useful scenarios. In this section we examine using a single-rate AWGN code for the entire sublayer. That is, we will use a code that has codeword length n equal to the length of the sublayer, instead of using the division of the sublayer into L sub-blocks with L different-rate codes of codeword length n/L. As previously discussed, the code block will see a time-varying SINR. The amount of interference that the code sees has a particular structure: the amount of interference is constant within a sub-block but increases by one layer as the sub-block index increases. We approach this problem by simulating the same rate-1/5 low-density parity-check (LDPC) code from the DVB-S2 standard that we used previously. In this section, we simulate the code with a time-varying interference channel similar to the one that it will see when it is used as a base code for each sublayer of the rateless code from [19]. The only difference is that when used as a base code for the rateless code, the interference will be a combination of binary random variables while the channel noise will be Gaussian. Our simulations treat the interference plus channel noise as Gaussian, which is a more general and also worst-case scenario. Since the interference-plus-noise variance changes in different sections of the code block, we must take this into account at the decoder. The variable interference-plus-noise variance is taken into account when the log-likelihoods are calculated. The log-likelihood is a function of the interference-plus-noise variance, so it is straightforward to incorporate the changing variance when calculating the received bit log-likelihoods.
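A minimal sketch of this per-sub-block LLR computation (assuming BPSK signaling on the layer and treating interference plus noise as Gaussian, as in our simulations; the function and argument names are ours):

```python
import numpy as np

def subblock_llrs(y, L, es, noise_var, layer_power):
    # LLRs for a received sublayer y of n symbols, computed one sub-block at
    # a time.  Sub-block i (0-indexed) sees i interfering layers, so its
    # Gaussian interference-plus-noise variance is noise_var + i*layer_power.
    # For BPSK (+/- sqrt(es)) in Gaussian noise, LLR = 2*sqrt(es)*y / variance.
    n = len(y)
    step = n // L
    llr = np.empty(n)
    for i in range(L):
        var = noise_var + i * layer_power
        seg = slice(i * step, (i + 1) * step)
        llr[seg] = 2.0 * np.sqrt(es) * y[seg] / var
    return llr
```

The resulting LLR vector is then fed to the iterative LDPC decoder unchanged; only this front-end computation is aware of the time-varying variance.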
Specifically, we calculate the log-likelihoods one sub-block at a time, using the appropriate interference-plus-noise variance for each different sub-block. This is done before the log-likelihoods are fed into the iterative decoder. In Fig. 2-15 we plot the bit-error rate vs. the efficiency of the code for one to four sub-blocks, or constant-SINR sections, per codeword. For each different curve, as we varied the channel noise we changed the SINR in each section. We can calculate the total capacity across the block as the average mutual information. Each channel noise value corresponds to a certain amount of capacity in the block. The efficiency of the code is defined as the code rate divided by the available capacity. For a meaningful comparison of the different curves, we plot BER vs. efficiency instead of BER vs. channel noise level.

Figure 2-15: Bit-error rate (BER) versus efficiency of a rate-1/5 LDPC code in a time-varying channel. We simulated 1-4 constant-SINR sections, which is equivalent to the number of constant-SINR sections that the LDPC code would have if the rateless code has 1-4 layers. The channel was simulated to approximate the time-varying channel that the code would see if it is used as a base code for the sublayers in the rateless code from [19].

Fig. 2-15 shows promising results because the bit-error rate performance of the LDPC code is not changed for up to four sections. This provides evidence that, under certain conditions, an LDPC code that is good for the AWGN channel is also good for this particular time-varying channel. The fact that there is no loss in efficiency for up to four sections is quite remarkable. For one to four sections, the bit-error rate performance of the LDPC code changes negligibly. The BER performance then degrades with each increase in the number of sub-blocks per codeword past four. This is illustrated in Fig. 2-16, where we show one-, five-, six-, and eight-section simulations on the same axes. The efficiency of the code decreases slightly at low bit-error rates when we have five sections. As we increase the number of sections past five, the loss in performance at all bit-error rates becomes larger. In addition to the curves moving further to the right, the curve corresponding to eight constant-SINR sections is flattening out at a BER of about 10^-5, so there appears to be an error floor.

Figure 2-16: Bit-error rate (BER) versus efficiency of a rate-1/5 LDPC code in a time-varying channel. We simulated 1, 5, 6, and 8 constant-SINR sections, which is equivalent to the number of constant-SINR sections that the LDPC code would have if the rateless code has 1, 5, 6, or 8 layers, respectively. The channel was simulated to approximate the time-varying channel that the code would see if it is used as a base code for the sublayers in the rateless code from [19].

For the code proposed in [19] to be efficient there must be a large number of layers, and the number of layers necessary increases with the value of C*. The number of constant-SINR sections (i.e., sub-blocks per sublayer) is equal to the number of layers by the construction of the rateless code. Therefore, to achieve very high efficiencies at high values of C*, we ultimately need codes that work well for a large number of constant-SINR sections. We have seen that the BER performance of this LDPC code does not degrade with up to four sections, but the performance suffers slightly with five sections and further degrades as we increase the number of sections further.
In addition, with eight sections there appears to be an error floor at a BER of 10^-5. We have not run enough iterations of our simulation to investigate possible error floors for the other numbers of sections. In order to get accurate results at lower bit-error rates, we would need to run more iterations for those curves. At this point we can only say that if they do have an error floor, it occurs below a BER of 10^-5. More insight can be developed by looking at the frame-error rate (FER) as we vary the number of sections in our simulation. We have defined a frame error as an error on any one of the k information bits after attempting to decode. Figure 2-17 shows the FER vs. efficiency. Using FER as a measure of performance, the performance degrades as the number of sections increases. More importantly, the relationship between bit-error rate and frame-error rate changes as the number of sections increases. Table 2.1 quantifies the changing relationship between FER and BER as we vary L, which is equal to the number of sub-blocks per codeword (and the number of layers that we would have in the rateless code). Specifically, Table 2.1 shows, at three different bit-error rates, the corresponding frame-error rate as we vary the number of sub-blocks per codeword. From Table 2.1, and comparing Fig. 2-16 to Fig. 2-17, we see that at a given bit-error rate, the frame-error rate is higher when the code sees more sections of interference. The interpretation of this result is that frame errors happen more frequently with more sections, but when a frame error does occur, it contains fewer bit errors. In an attempt to find out whether particular aspects of the structure of the LDPC code contribute to the loss in performance as the number of constant SINR sections increases, we ran the following simulation. We used the same code and generated a codeword in the same way, but reversed the interference pattern.
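The BER and FER measures used in this comparison can be stated compactly in code. The sketch below is our own illustration (hypothetical function name and list-of-lists inputs), making explicit the "error on any one of the k information bits" frame-error rule:

```python
def ber_and_fer(decoded_frames, true_frames):
    """decoded_frames/true_frames: lists of equal-length 0/1 bit lists,
    one frame of k information bits per entry."""
    total_bits = bit_errors = frame_errors = 0
    for dec, ref in zip(decoded_frames, true_frames):
        errs = sum(d != r for d, r in zip(dec, ref))
        bit_errors += errs
        total_bits += len(ref)
        # A frame error is an error on ANY of the k information bits.
        frame_errors += 1 if errs > 0 else 0
    return bit_errors / total_bits, frame_errors / len(true_frames)
```

Note that two simulations can share the same FER yet have very different BERs, which is exactly the effect Table 2.1 quantifies.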
Figure 2-17: Frame-error rate (FER) versus efficiency of a rate-1/5 LDPC code in a time-varying channel. We simulated 1, 3, 5, 6, and 8 constant SINR sections, which is equivalent to the number of constant SINR sections that the LDPC code would have if the rateless code has 1, 3, 5, 6, or 8 layers, respectively. The channel was simulated to approximate the time-varying channel that the code would see if it is used as a base code for the sublayers in the rateless code from [19].

Now the first sub-block in the code will see the most interference, and the last sub-block will see the least amount of interference. The reason for this simulation is that the code is systematic, which means the first k bits of the n-bit codeword are simply the information bits. For this code k = 3240 and n = 16200. In addition to being systematic, of the 3240 information bits, the first 1440 of them have degree 12, and the last 1800 of them have degree 3. Of the remaining n - k = 12960 bits, called the parity bits, one bit has degree 1 and the rest have degree 2. The details of the code can be found in [9]. In the simulations with five or fewer sections, the first k bits have the same SINR. Also, with any number of sections, those k information (and high-degree) bits have a higher SINR compared to bits at the end of the codeword.

Table 2.1: FER corresponding to particular BER for the rate-1/5 LDPC code in the time-varying channel

  L | FER at BER = 10^-4 | FER at BER = 10^-3 | FER at BER = 10^-2
  1 |      10^-2.75      |      10^-1.83      |      10^-0.90
  3 |      10^-2.31      |      10^-1.55      |      10^-0.70
  5 |      10^-1.69      |      10^-1.44      |      10^-1.23
  6 |      10^-1.08      |      10^-0.94      |      10^-0.57
  8 |      10^-0.47      |      10^-0.40      |      10^-0.13
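The degree profile quoted above from [9] can be sanity-checked with a few lines of arithmetic (our own sketch; the function name is hypothetical and the numbers are exactly those stated in the text):

```python
def edge_count():
    """Edges in the Tanner graph, counted from the variable-node side,
    for the rate-1/5 code: k = 3240 systematic bits (1440 of degree 12,
    1800 of degree 3) and n - k = 12960 parity bits (one of degree 1,
    the rest of degree 2)."""
    k, n = 3240, 16200
    assert 1440 + 1800 == k
    info_edges = 1440 * 12 + 1800 * 3
    parity_edges = 1 * 1 + (n - k - 1) * 2
    return info_edges + parity_edges
```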
It is possible that because the first k bits are the information bits, and they also include all of the bits in the codeword with high degree (greater than 2), it is advantageous to have these bits see a higher SINR than the bits at the end of the codeword with lower degree. In the simulation shown in Fig. 2-18, the first k bits still see a constant SINR, but they see the lowest SINR of any group of bits in the block. It is clear that the performance in this case actually improves. Thus, this LDPC code's good performance up to five sections does not depend on the first k bits seeing a relatively high SINR. We are not sure at this time why, as we move toward a higher number of constant SINR sections, the frame-error rate is higher for a given bit-error rate, or why the efficiency of the code decreases. It is possible that the block length of the code needs to be longer to accommodate the more rapidly changing interference. In addition, as the number of sections increases, the bits in the last section see more interference. It is possible that this makes it more difficult to decode certain information bits that have a particular relationship in the LDPC code's graph to these very noisy bits. At this point, these reasons are speculation and explaining this behavior is an area for further investigation. A relevant work that discusses LDPC codes in time-varying channels is [12]. In [12], LDPC codes are optimized for an uncorrelated Rayleigh fading channel with channel state information (CSI) at the receiver, where each bit has an independent, identically distributed fading coefficient.

Figure 2-18: Bit-error rate versus efficiency of a rate-1/5 LDPC code in two different time-varying channels. The "4 Section" curve approximates what the code would see if it is used as a base code for the sublayers in the rateless code from [19]. We also reversed the interference pattern that the code would see, and plotted the results in the "4 Sections with Interference Pattern Reversed" curve. For both curves we used the same code and 4 constant SINR sections. The difference in the simulations is that in the "4 Section" curve the SINR decreases as you move from the beginning of the code block to the end, and in the "4 Sections with Interference Pattern Reversed" curve the SINR increases as you move from the beginning to the end of the code block.

Among the observations in [12] is the fact that, when an uncorrelated fading channel is simulated, LDPC codes optimized for the AWGN channel often have decoding thresholds that are only about 0.1 to 0.2 dB away from the decoding threshold of an LDPC code that is optimized for the uncorrelated fading channel with CSI at the receiver. In addition to our results, the results in [12] are another example of LDPC codes that are good for the AWGN channel being good for a particular time-varying channel. Another interesting result in [12] is that irregular LDPC codes designed for an uncorrelated fading channel work well on a correlated fading channel without the use of an interleaver. It is argued that the sparseness and random construction of the parity-check matrix of an LDPC code provide a built-in "interleaver" effect. Essentially, if one bit is in a deep fade, it is likely that the bits connected to it in the LDPC code's graph are not. This is advantageous for the base code that needs to be used in this rateless construction, because the time-varying interference channel that the base code sees could alternatively be viewed as a correlated fading channel. The "interleaver" effect of a good LDPC code would make it likely that bits in one sub-block with a low SINR are close on the graph to bits that see a higher SINR. Finally, it might be possible to optimize an irregular LDPC code for a particular number of constant SINR sections.
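As an aside, the normal and reversed interference patterns of Fig. 2-18 are easy to sketch in code (our own illustration with a hypothetical function name; the true per-section SINRs follow from the layered construction in [19], not from this list):

```python
def section_pattern(sinrs_high_to_low, reverse=False):
    """One SINR per constant-SINR section. In the normal pattern the SINR
    decreases from the start of the block to the end, so the systematic
    (first k) bits see the highest SINR; reverse=True flips this so the
    systematic bits see the LOWEST SINR, as in the reversed experiment."""
    pattern = list(sinrs_high_to_low)
    return pattern[::-1] if reverse else pattern
```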
We have shown results using an irregular LDPC code that is good for the AWGN channel and applying it to our specific time-varying channel while varying the number of constant SINR sections across the block. Other work has successfully optimized LDPC codes for the erasure channel [22], for the AWGN channel [18],[5], and, as previously mentioned, for the uncorrelated Rayleigh fading channel [12]. Optimizing an LDPC code for the time-varying channel that it will see as part of this rateless code is a problem for further investigation.

2.3.2 Performance of Rateless Code with a Constant Power Distribution

We have discussed our simulations of a rate-1/5 LDPC code in a time-varying channel that approximates the channel that it will see when used as a base code for each sublayer in the rateless code designed in [19]. We have also simulated the overall rateless code using the same rate-1/5 LDPC code as the base code for each sublayer. Our rateless code consists of four layers (L = 4), and each layer has two sublayers (Lsub = 2). Recall that the number of layers in the rateless code dictates the number of sub-blocks (or constant SINR sections) in each sublayer. Therefore, we would expect the LDPC base code to perform as it did in the four-section simulation from Sec. 2.3.1. If the layers of the rateless code were not staggered, then the initial rate of the code after one block would simply be equal to the number of layers multiplied by the rate per layer. For our rateless code with four layers and a rate-0.4 b/2D base code per layer, the overall initial rate would be R1 = Rbase * L = 0.4 * 4 = 1.6 b/2D. However, staggering the layers increases the length of the rateless block by L - 1 sub-blocks. The number of sub-blocks without staggering the layers is Lsub * L, and the number of sub-blocks with staggering is Lsub * L + (L - 1).
Thus, to calculate the actual initial rate of the code we use the formula:

R1 = Rbase * L * (Lsub * L) / (Lsub * L + L - 1)    (2.19)

and for our code the initial rate is R1 = 0.4 * 4 * (8/11) = 1.1636 b/2D. Figure 2-19 shows the bit-error rate versus SNRnorm for the rateless code that we simulated. The performance is within a few dB of capacity, and the gap to capacity decreases as the number of transmitted blocks increases. We also plot the BER versus efficiency. Figure 2-20 shows the bit-error rate vs. the efficiency of the code. Fig. 2-20 is the same as Fig. 2-19 except that the horizontal axis is now efficiency instead of SNRnorm. In Fig. 2-20 we see that the efficiency of the rateless code at a bit-error rate of 10^-4 is approximately 0.66. From [19], due to the construction of the code, there is a certain loss in mutual information with respect to the channel capacity. In addition, mutual information only provides an upper limit on the achievable rate, and any practical code will not achieve this upper limit.

Figure 2-19: Bit-error rate versus SNRnorm of the rateless code from [19]. Each block consists of four layers and two sublayers. The same rate-1/5 LDPC code that has been used throughout was used on each sublayer. Note that the code is only a few dB from capacity and that the gap to capacity decreases as the number of transmitted blocks increases.

Specifically, we have shown that using the rate-1/5 LDPC code we can achieve about 79% of the channel capacity at a BER of 10^-4. Thus, the overall efficiency of the rateless code depends on the loss in mutual information from the rateless code construction and the efficiency of the base code. In the following, we will show that the achieved efficiency of the rateless code can be very closely predicted by the mutual information and base code efficiency.
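The initial-rate formula (2.19) is simple to check numerically. This sketch is ours (hypothetical function name), using only quantities defined in the text:

```python
def initial_rate(r_base, num_layers, num_sublayers):
    """Initial rate of the staggered rateless code, Eq. (2.19): the same
    information is spread over L*Lsub + (L-1) sub-blocks instead of the
    L*Lsub sub-blocks an unstaggered code would use."""
    L, Lsub = num_layers, num_sublayers
    return r_base * L * (Lsub * L) / (Lsub * L + L - 1)
```

With L = 4, Lsub = 2, and a 0.4 b/2D base code, this recovers the 1.1636 b/2D quoted above; with L = 1 there is no staggering penalty and the rate is just Rbase.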
To begin, let us define E_MI, E_gap, and E_tot,2 to be the mutual information efficiency, the efficiency of the base code, and the overall efficiency of the rateless code, respectively. The subscript "2" in E_tot,2 denotes that we are talking about the constant-power rateless code from [19], which is the second rateless code analyzed in this thesis. If we assume that E_MI and E_gap are the two major sources of inefficiency in this rateless code construction, then we would expect that:

E_tot,2 ~ E_MI * E_gap    (2.20)

Figure 2-20: Bit-error rate versus efficiency of the rateless code from [19]. Each block consists of four layers and two sublayers. The same rate-1/5 LDPC code that has been used throughout was used on each sublayer. The efficiency of the code at a bit-error rate of 10^-4 is approximately 0.66.

We have already stated that the efficiency of the base code with four constant SINR sections is 0.79. We must make a modification to the calculation of E_MI used in the plot in Fig. 2-20 in order to obtain a more precise estimate of E_MI, the mutual information efficiency. In Fig. 2-20 we assumed that the number of sublayers per layer goes to infinity, thus making any loss in rate at the edges of the block due to the staggered structure of the layers negligible. The loss in rate due to a finite number of sublayers is quantified in (2.19). If we calculate E_MI of the code for two sublayers (i.e., Lsub = 2), it is lower than if we assume that Lsub goes to infinity. When we use Lsub = 2, E_MI does not change much with M for the values of M that we simulated in Fig. 2-20 (i.e., E_MI remains fairly constant for M = 1, 2, 3, 4, 10). Thus, we would expect all of the curves in Fig. 2-20 to be close together, and we see that this is indeed the case. With M = 10, the mutual information efficiency is E_MI = 0.834.
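The multiplicative loss model above can be checked numerically with the values quoted in the text (a one-function sketch of ours, not code from the thesis):

```python
def predicted_efficiency(e_mi, e_gap):
    """Model of Eq. (2.20): the two dominant inefficiencies, the mutual
    information loss and the base-code gap, simply multiply."""
    return e_mi * e_gap
```

Plugging in E_MI = 0.834 and E_gap = 0.79 gives approximately 0.659, which is compared to the measured efficiency below.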
Looking at the overall efficiency for M = 10, we predict that:

E_tot,2 = E_MI * E_gap = 0.834 * 0.79 = 0.659    (2.21)

That is, we predict that with ten blocks the rateless code will have an efficiency of 0.659. We can see in Fig. 2-20 that the efficiency at a bit-error rate of 10^-4 is approximately 0.66. It is very promising that for this rateless code our model of the major sources of inefficiency is accurate.

2.4 Mitigating the Effects of Acknowledgment Delay

With the rateless codes discussed in the previous sections, there is a one-bit acknowledgment (ACK) sent from the receiver to the transmitter when the message is reliably decoded. When the transmitter receives this acknowledgment, it begins sending the next set of data bits. In the information-theoretic derivation of the performance of the rateless codes, it made sense to assume that the acknowledgment is instantly received at the transmitter. This ensures that the transmitter does not send any additional and unnecessary blocks due to a delay in receiving the acknowledgment. In practice, there might be a delay between the time the receiver sends the ACK and the time the transmitter receives that ACK. Thus, the transmitter might unnecessarily send additional blocks before it receives the ACK and begins sending the next message, which would result in a rate loss in the rateless coding scheme. We briefly describe two methods for mitigating this rate loss.

2.4.1 Modifying the Acknowledgment

The rate loss due to a delay in the acknowledgment reception at the transmitter can be mitigated if the receiver has accurate channel state information (CSI). The receiver should be able to predict when it will be able to decode the message. This knowledge can be used to send a predictive ACK, which we will call a PACK, back to the transmitter. If we want the PACK to remain only one bit, as the standard ACK is, then the receiver should send the PACK an appropriate amount of time before it can actually decode.
Then, when the receiver is actually finished decoding, the transmitter will have received the PACK and will have started transmitting the next message, eliminating any rate loss due to the acknowledgment protocol. The usefulness of the PACK depends on the quality of the channel state information at the receiver. If the receiver sends a PACK but in fact cannot decode after the predicted number of blocks, then the message is lost with the protocol previously described. For this reason, it might be desirable to have a more sophisticated ACK system. For example, the receiver can send a soft PACK letting the transmitter know that it is likely that it can decode after a certain number of blocks. This soft PACK can still be a one-bit message. If the transmitter starts sending the next message, but the receiver cannot decode the previous message, then the receiver has the option of sending a negative acknowledgment (NACK) to the transmitter. This would be interpreted at the transmitter as meaning that the receiver thought it would be able to decode the previous message, but it cannot, so it requires one or more additional blocks corresponding to the previous message. It would be helpful if this NACK were more than one bit, to provide the transmitter with precise directions, such as how many more rateless blocks to send corresponding to the previous message before continuing on to the next message. It might be possible to devise a scheme where even this NACK is one bit; however, that is beyond the scope of this section. The purpose of this section is to show that the rate loss due to a delay in the system can be mitigated by modifying the acknowledgment protocol.

2.4.2 Interleaving Different Data Streams

Another approach to attenuating or even eliminating the rate loss, one that does not depend on accurate channel state information at the receiver, is interleaving the blocks of different data streams.
We assume that the transmitter has a large amount of data divided into X different streams, numbered 1, 2, ..., X. The rateless code that we have described up to this point sends the first block of data stream one, then the second block of data stream one, and so on, until data stream one is decoded and an acknowledgment is received. That is, the rateless code sends the following sequence of blocks, where we use the notation i_M to mean that we have sent block number M from the i-th stream:

1_1, 1_2, 1_3, ...    (2.22)

and when the first stream is finally decoded, the transmitter sends the second stream:

2_1, 2_2, 2_3, ...    (2.23)

Sending the blocks in this order will result in a rate loss if there is a delay in the reception of the acknowledgment at the transmitter. If we know at the transmitter that there is a delay of DACK blocks from the time the transmitter is finished sending a certain block to the time an acknowledgment corresponding to that block would be received at the transmitter, then after sending a block from the i-th stream, the transmitter should send DACK blocks from the other streams. The transmission order should be:

1_1, 2_1, 3_1, ..., (DACK + 1)_1    (2.24)

After block (DACK + 1)_1 is transmitted, the transmitter should receive an acknowledgment corresponding to block 1_1 if block 1_1 was successfully decoded at the receiver. If an acknowledgment is received, then the transmitter should send block (DACK + 2)_1. If the acknowledgment is not received, then the transmitter should send block 1_2. This modification to the rateless code does not depend on accurate CSI at the receiver, but it does depend on accurate knowledge of DACK at the transmitter.

Chapter 3
Rateless Unequal Error Protection Codes

Unequal error protection (UEP) and rateless coding are both methods used to combat channel uncertainty. With a rateless code, the code rate automatically adapts to the channel quality through the use of incremental redundancy.
In contrast, UEP allocates resources in a way that ensures that bits with higher priority are received reliably after only one transmission, even under poor channel conditions. One advantage of UEP over both fixed rate and rateless coding is that UEP allows this prioritization of a bitstream. In addition, it allows for an arbitrary range of rates. While UEP is often highly desirable, it is limited in the same way that traditional fixed rate communication links are limited. That is, without the use of rateless codes, the link margins, or extra signal energy allocated to both the high and low priority bits, must be specified a priori. The need for link margins and the prioritization of the bitstream that UEP provides both lead to a loss in throughput. We create a coding scheme that has both rateless and unequal error protection properties. We will use the abbreviation RUEP to refer to a rateless unequal error protection code. An RUEP code provides the bitstream prioritization that UEP provides, but attains a higher efficiency than UEP by using rateless codes to adapt to the channel. The ideal RUEP code would have the following property: whenever the receiver decides to decode, it can successfully decode and will receive an amount of information that is arbitrarily close to the upper limit set by the channel capacity. The basic premise for an ideal RUEP code is that the receiver can wait and decode all of the information, or, if it has a delay constraint, it can decode earlier and still receive some of the information. If only a subset of the information can be decoded, it is the high priority bits that are decoded. If all of the information can be decoded, the high priority bits are decoded first, which is beneficial when a subset of the information has a more stringent delay constraint.
Regardless of when the receiver decides to decode, the ideal RUEP code would have the property that the amount of information received is the maximum possible amount of information that could have been received within that time interval, and bits with a higher priority are received before bits with a lower priority. Our RUEP code will differ from the ideal RUEP code in two ways. First, the receiver cannot choose an arbitrary decoding time. Instead, the channel SNR will dictate M and N, with M <= N, which are the numbers of blocks that the receiver needs to decode the high and the low priority bits, respectively. The second difference is that if M < N then after only M blocks the receiver will be receiving a rate that is less than the channel capacity. This is an unavoidable effect of superposition coding. In our RUEP code the two bitstreams will be superimposed, and the low priority bits cause interference for the high priority bits. Thus, unless the low priority bits are also decoded after M blocks, the rate achieved by decoding only the high priority bits is less than capacity. Therefore, in order to achieve the channel capacity in this case, a new set of high priority bits must be sent while the receiver listens for and eventually decodes the low priority bits. It is important to understand both that the ideal RUEP code is not achievable and how our code will differ from it. Depending on the parameters, our RUEP code will use one or two rateless codes as a building block. In our analysis, we assume that the codes used as building blocks for the RUEP code are perfectly capacity-achieving. Similarly, when comparing RUEP to traditional UEP, we will assume that traditional UEP uses fixed rate codes that are capacity-achieving as building blocks.
This will ensure that the analysis makes a fundamental comparison between the RUEP and UEP code constructions, and that the differences are not merely due to using a better fixed rate or rateless code as a building block. Finally, with both UEP and RUEP, we will consider only two bit priorities, although both coding schemes can be extended to more than two bit priorities.

3.1 Traditional UEP

In this section we describe several aspects of traditional unequal error protection. First, we describe a common coding scheme known as superposition UEP. Other coding schemes exist, such as time-division UEP, but we will focus on superposition codes. Next, we will describe two sources of inefficiency in superposition UEP that lead to a rate loss with respect to capacity. It is important to understand where the inefficiencies in UEP come from so that we can properly address these aspects with RUEP.

3.1.1 Description of Superposition UEP

In this section we briefly explain a form of traditional UEP called superposition UEP. The transmitter is allocated an amount of signal energy, STotal. Without UEP, and simply using a fixed rate code, there is no prioritization of the bitstream and all of the signal energy is used to code all of the bits into a codeword at a fixed rate, RFR. We assume in our analysis that the fixed rate code is capacity-achieving at that rate. That is, the code can be decoded if the channel SNR, SNRc, is at least as large as SNRFR, where SNRFR = 2^RFR - 1. With a fixed rate code we choose one SNR breakpoint, SNRFR, and we can only achieve one nonzero rate, RFR. If SNRc < SNRFR then we cannot decode any of the bits and the achieved rate is zero. If SNRc = SNRFR, then all of the bits are decoded and we achieve capacity; however, if SNRc > SNRFR then all of the bits are decoded but we have achieved a rate strictly less than capacity. The achieved rate with a fixed rate code is plotted against the channel SNR in Fig. 3-1.
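The single-breakpoint behavior described above can be sketched in a few lines (our own illustration, with a hypothetical function name; rates are in b/2D so that capacity is log2(1 + SNR)):

```python
def fixed_rate_achieved(snr_c, r_fr):
    """Rate achieved by a capacity-achieving fixed-rate code: the code
    decodes if and only if SNR_c >= SNR_FR = 2**R_FR - 1, and when it
    decodes, it always delivers exactly R_FR (cf. Fig. 3-1)."""
    snr_fr = 2 ** r_fr - 1
    return r_fr if snr_c >= snr_fr else 0.0
```

The step shape of this function is exactly why the achieved rate equals capacity only at the single point SNRc = SNRFR.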
Figure 3-1: Achievable rates with a fixed rate code. The point labeled A corresponds to SNRFR. The channel capacity and the achievable rates with a fixed rate code are plotted as a function of the channel SNR, SNRc.

In contrast to a code with one fixed rate, to provide a prioritization of the bitstream and two achievable rates, two-level UEP splits the total energy, STotal, into two parts, SHigh and SLow, under the constraint:

SHigh + SLow = STotal    (3.1)

The bitstream is divided into high priority and low priority bits, also referred to as more important and less important bits. The two bitstreams can have different signal constellations. The different amounts of energy SHigh and SLow are the energy constraints for the signal constellations of the high priority and low priority bits, respectively. Once the two different bitstreams are coded and mapped to signal points in their respective signal constellations, the symbols are superimposed and then transmitted.

Figure 3-2: Illustration of the UEP coding scheme.

An example coding scheme for traditional superposition UEP is illustrated in Fig. 3-2, with the high priority bits mapped to the points on the left, the low priority bits mapped to the points in the middle, and the resulting signal set after superimposing the two bitstreams shown on the right. In Fig. 3-2 the more important bits are allocated more power and the less important bits are allocated less power. The idea is that even if the channel is bad it is still very likely that the decoder will be able to determine which four-point cluster the signal point came from. Therefore, the receiver will be able to decode the high priority bits.
If the channel SNR is good enough, the decoder will be able to detect which of the sixteen signal points was transmitted, and therefore decode both the high priority and low priority bits. Because of the energy constraint (3.1) on the two signal constellations and the fact that the high priority and low priority signal points are independent of each other, the superimposed symbols meet the total energy constraint, STotal. The receiver first decodes the more important bits and subtracts the symbols corresponding to the more important bits from the received symbols. The receiver then decodes the less important bits. We will show that there is no loss in capacity by dividing the total available energy into SHigh and SLow, superimposing the independent symbols at the transmitter, and decoding the symbols successively. Under the total power constraint STotal and with channel noise variance sN^2, the capacity of the channel is:

CTotal = log(1 + STotal/sN^2)    (3.2)

When we split the power between the two different bitstreams, the high priority bitstream sees both channel noise and interference from the low priority bits. Once the high priority bits are stripped off, the low priority bitstream sees only channel noise. Thus, the capacities of the high priority and low priority bitstreams are given by:

CHigh = log(1 + SHigh/(sN^2 + SLow))    (3.3)

CLow = log(1 + SLow/sN^2)    (3.4)

We now add the capacities of the two different bitstreams:

CHigh + CLow = log(1 + SHigh/(sN^2 + SLow)) + log(1 + SLow/sN^2)    (3.5)
            = log((1 + SHigh/(sN^2 + SLow)) * (1 + SLow/sN^2))    (3.6)
            = log(1 + SLow/sN^2 + SHigh/(sN^2 + SLow) + SHigh*SLow/(sN^2*(sN^2 + SLow)))    (3.7)
            = log(1 + (SLow*(sN^2 + SLow) + SHigh*sN^2 + SHigh*SLow)/(sN^2*(sN^2 + SLow)))    (3.8)
            = log(1 + (sN^2 + SLow)*(SHigh + SLow)/(sN^2*(sN^2 + SLow)))    (3.9)
            = log(1 + (SHigh + SLow)/sN^2)    (3.10)

Using the constraint (3.1), we see that:

CHigh + CLow = log(1 + (SHigh + SLow)/sN^2)    (3.11)
            = log(1 + STotal/sN^2)    (3.12)
            = CTotal    (3.13)

Thus, there is no loss in capacity by using a superposition code with two layers.
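The additivity shown in (3.5)-(3.13) can be verified numerically. The sketch below is ours (hypothetical function name; log base 2 is assumed so the rates are in b/2D):

```python
import math

def superposition_capacities(s_high, s_low, noise_var):
    """Eqs. (3.2)-(3.4): the high-priority layer is decoded first and
    sees the low-priority layer as additional interference; after it is
    stripped off, the low-priority layer sees only the channel noise."""
    c_total = math.log2(1 + (s_high + s_low) / noise_var)
    c_high = math.log2(1 + s_high / (noise_var + s_low))
    c_low = math.log2(1 + s_low / noise_var)
    return c_total, c_high, c_low
```

For any power split, `c_high + c_low` equals `c_total`, which is the no-loss property the derivation establishes.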
If the channel noise is known, then a fixed rate code can be selected to achieve the channel capacity. However, with an unknown channel noise, UEP prioritizes the bits by providing a different link margin for each set of bits. The rate loss comes not from a loss in capacity due to superposition coding, but from the unknown channel noise and the prioritization that UEP provides. Two sources of rate loss with UEP will be further explained in the following.

3.1.2 Two Sources of Inefficiency in Superposition UEP

There is no loss in capacity due to the use of a superposition code with two layers. In UEP, two SNR breakpoints are selected, SNRHP and SNRLP. SNRHP is the minimum SNR necessary to decode the high priority bits, and SNRLP is the minimum SNR required to also decode the low priority bits. By construction, SNRHP < SNRLP. In the case where the channel SNR, SNRc, is actually below SNRHP, nothing is decoded. In traditional communication link design for one fixed rate, SNRFR is selected with some margin so that the channel SNR is less than SNRFR with an acceptably small probability. In UEP, SNRHP is usually selected with a large margin so that SNRc < SNRHP with a very small probability, which allows us to ignore this case in our analysis. In the case where SNRHP <= SNRc < SNRLP, only the high priority bits can be decoded. Finally, if SNRc >= SNRLP, then both the high priority and low priority bits can be decoded. The channel capacity and the achieved rates in UEP are plotted versus the channel SNR, SNRc, in Figure 3-3. Note that at all values of SNRc the achieved rate is less than capacity. Focusing only on the region where SNRc >= SNRHP, we will now describe two sources of inefficiency in UEP. One inefficiency causes a rate loss with respect to capacity on the high priority bits, and the other causes a rate loss on the low priority bits. First, we will describe the rate loss on the high priority bits, which we will call DeltaHigh.
Rate Loss on High Priority Bits

One source of rate loss occurs because the high priority bits are coded at a rate assuming a worst-case channel. Recall that the transmitter does not know the channel SNR a priori, and the high priority bits must be decoded for any channel SNR satisfying SNRc >= SNRHP.

Figure 3-3: Achievable rates with a UEP code. The points labeled A and B correspond to SNRHP and SNRLP. The channel capacity and the achievable rates with an unequal error protection code are plotted as a function of the channel SNR, SNRc.

A channel SNR of SNRHP corresponds to a channel noise variance of sNMax^2 = STotal/SNRHP. Therefore, the transmitter codes the high priority bits assuming that the capacity for the high priority bits is:

CHighMin = log(1 + SHigh/(sNMax^2 + SLow)) <= CHigh = log(1 + SHigh/(sN^2 + SLow))    (3.14)

where CHigh is the actual channel capacity for the high priority bits. The loss in rate will be equal to the difference between the actual channel capacity, CHigh, and the worst-case capacity that the transmitter has assumed, CHighMin. We will refer to this loss in rate on the high priority bitstream as DeltaHigh. Specifically,

DeltaHigh = CHigh - CHighMin = log(1 + SHigh/(sN^2 + SLow)) - log(1 + SHigh/(sNMax^2 + SLow))    (3.15)

In all communication link design where fixed rate codes are used, a link margin must be allocated. For UEP, the loss in rate, DeltaHigh, is due to the conservative channel assumption and the corresponding large link margin allocated to the high priority bits.

Rate Loss on Low Priority Bits

In addition to a loss in rate on the high priority bits, there can also be a loss in rate on the low priority bits. If the channel SNR is greater than the low priority bit threshold that is selected (i.e., SNRc > SNRLP), then the low priority bits could have been coded at a higher rate and still have been decoded successfully.
In contrast, if the rate of the code is chosen too optimistically, corresponding to SNR_C < SNR_LP, then the low priority bits cannot be decoded at the receiver. The only time there is no loss on the low priority bits is when SNR_C = SNR_LP. If the low priority (LP) bits are encoded under the assumption that the channel noise is sigma^2_LP = S_Total / SNR_LP, then the rate loss on the low priority bits in the two different situations is:

    Delta_Low = log(1 + S_Low / sigma^2_N) - log(1 + S_Low / sigma^2_LP),  if sigma^2_N < sigma^2_LP
    Delta_Low = log(1 + S_Low / sigma^2_N),                                if sigma^2_N > sigma^2_LP    (3.16)

where the latter case corresponds to the channel noise being worse than assumed and the low priority bits not being decoded. We ignored this situation for the high priority bits because, as discussed previously, the channel assumption is made so that SNR_C >= SNR_HP with an arbitrarily high probability. We will later discuss RUEP codes. Depending on the choice of parameters for an RUEP code, the code could address only the inefficiency Delta_Low and not Delta_High, or it could mitigate both Delta_Low and Delta_High. This will be explained in more detail when we introduce RUEP.

Summary of UEP Rate Loss

The two sources of rate loss in UEP have been described in the previous sections. UEP provides a prioritization of the bitstream and creates two different achievable rates by selecting two different SNR breakpoints, SNR_HP and SNR_LP. The exact amount of rate loss depends not only on SNR_HP and SNR_LP, but also on the actual channel realization. With a UEP code, for any channel realization there is a nonzero rate loss. The only way to avoid this rate loss is to know the channel and make the two SNR breakpoints both equal to the channel SNR, setting SNR_HP = SNR_LP = SNR_C, but then the code is a fixed rate code at a single rate with no prioritization and is no longer considered a UEP code. Using a fixed rate code that is capacity-achieving at a single rate is depicted in Fig. 3-1.

3.2 Rateless UEP vs.
Traditional Rateless

The RUEP code differs from a traditional rateless code in that the receiver does not have to wait the same amount of time to decode all of the bits, and the range of available rates can be made larger with RUEP. With a rateless code, the receiver has to wait for incremental redundancy and can then decode all of the bits at the same time. Often it is desirable to decode one group of delay sensitive bits before another group of bits that can tolerate longer delays (and hence can tolerate a larger number of retransmissions). The number of retransmissions necessary to decode these delay sensitive bits is smaller for RUEP than it is for rateless coding. The bits that can tolerate a larger number of retransmissions are also no longer constrained by the small maximum number of retransmissions that the delay sensitive bits can tolerate. Due to the prioritization of the bits and the fact that different bitstreams can be decoded at different times and be allowed different numbers of retransmissions, RUEP also allows for a larger range of available rates under a finite delay constraint. A rateless code subject to a constraint of at most M_Max transmitted blocks, with a minimum rate requirement R_Min, can only support rates in the range [R_Min, R_Min * M_Max]. A rateless code matches its rate to any channel SNR in the interval [SNR_RLMin, SNR_RLMax], where:

    SNR_RLMin = 2^(R_Min) - 1    (3.17)
    SNR_RLMax = 2^(R_Min * M_Max) - 1    (3.18)

Figure 3-4: Achievable rates with a rateless code. The points labeled A and B correspond to SNR_RLMin and SNR_RLMax. The channel capacity and the achievable rates with a rateless code are plotted as a function of the channel SNR, SNR_C.

The achieved rates for a rateless code are plotted against the channel SNR in Fig. 3-4.
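Equations (3.17)-(3.18) can be checked numerically. This is our own minimal sketch, with rates in b/2D so that capacity is log2(1 + SNR):

```python
from math import log2

def rateless_snr_range(r_min, m_max):
    """SNR interval over which a rateless code can match the channel,
    given a minimum rate R_Min (b/2D) and at most M_Max blocks
    (Eqs. 3.17-3.18); the rate after m blocks is R_Min * M_Max / m."""
    snr_rl_min = 2 ** r_min - 1               # channel just supports R_Min
    snr_rl_max = 2 ** (r_min * m_max) - 1     # channel just supports R_Min * M_Max
    return snr_rl_min, snr_rl_max

lo, hi = rateless_snr_range(r_min=0.0757, m_max=20)
# Sanity check: capacity at the interval endpoints recovers the rate limits.
assert abs(log2(1 + lo) - 0.0757) < 1e-12
assert abs(log2(1 + hi) - 0.0757 * 20) < 1e-12
print(lo, hi)
```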
Although the rates are constrained to be the initial rate divided by the number of blocks needed to decode, for simplicity of analysis and illustration we assume that the rateless code can achieve capacity at any rate in the range [R_Min, R_Min * M_Max]. In our block based scheme this would, of course, require non-integer numbers of blocks, which we will allow when discussing rateless and RUEP codes in this chapter. In contrast to a rateless code, we will show how RUEP can still meet the desired lower bound on the achievable rate while also providing a larger range of rates with an arbitrary upper limit, by setting the initial rate on the low priority bits as high as desired.

3.3 Rateless UEP

RUEP provides the bitstream prioritization that is characteristic of UEP while mitigating the inefficiencies of UEP through the use of rateless codes. RUEP uses one rateless code on the high priority bits and a different rateless code on the low priority bits. After being ratelessly encoded, the two coded bitstreams are superimposed to create RUEP blocks. In contrast to UEP, where only two SNR breakpoints, SNR_HP and SNR_LP, are selected, an RUEP code allows for a range on both SNR_HP and SNR_LP. There are now four SNR parameters: SNR_HPMin, SNR_HPMax, SNR_LPMin, and SNR_LPMax. If the realized channel SNR, SNR_C, falls in the interval [SNR_HPMin, SNR_HPMax], then the rateless code used for the high priority bits automatically adapts to the channel so that capacity for the high priority bits is achieved. Similarly, if the realized channel SNR falls in the interval [SNR_LPMin, SNR_LPMax], then the rateless code used for the low priority bits also automatically adapts to the channel so that the low priority bits are received at the maximum possible rate.
While it is not necessary for the two intervals [SNR_HPMin, SNR_HPMax] and [SNR_LPMin, SNR_LPMax] to overlap, we will consider cases where the intervals do overlap. In addition, the two intervals can be selected separately from each other, with the only constraint being that SNR_LPMin >= SNR_HPMin, since decoding the low priority bits assumes that the high priority bits can be decoded and stripped off of the received symbols. Because the two intervals can be selected separately, the maximum range of available rates can be chosen arbitrarily. This allows for an increase in the range of rates over rateless coding, where the available rates are determined by the minimum rate and M_Max. Also, with RUEP there are separate constraints for each set of bits on the allowable number of transmitted blocks. For the high priority bits, the maximum number of transmitted blocks is M_Max. For the low priority bits, the maximum number of transmitted blocks is N_Max. In addition, we constrain M_Max <= N_Max. M_Max provides a constraint between SNR_HPMin and SNR_HPMax, and N_Max provides a constraint between SNR_LPMin and SNR_LPMax. The achievable rates with our RUEP code for different SNR regimes are depicted in Figure 3-5. In Fig. 3-5, we assume that the rateless codes used are capacity-achieving and also, for clarity of illustration, that the rateless codes are not constrained to take on a discrete set of rates consisting of some maximum rate divided by the number of blocks received. We will now describe the achievable rates of RUEP for different ranges of the realized channel SNR, SNR_C.

3.3.1 Achievable Rates of RUEP in Different Channel SNR Regimes

Here we will discuss the achievable rates and the rate loss with respect to capacity, if any, of the RUEP scheme. These rates, along with the channel capacity curve, are plotted in Fig. 3-5.
It is clear that SNR_LPMin >= SNR_HPMin, since decoding of the low priority bits depends on the fact that the high priority bits can be decoded and stripped off of the received symbols. In addition, as shown in Fig. 3-5, we will assume that SNR_LPMin < SNR_HPMax, although this does not have to be imposed. If, in fact, SNR_LPMin > SNR_HPMax, then the achievable rate remains constant between those two SNR points instead of increasing monotonically with SNR_C. The achievable rates of the RUEP code will be denoted R_RUEPi, meaning the rate for RUEP in the ith region of SNR_C. There are five mutually exclusive regions of SNR_C that span all possible values of SNR_C. The boundaries of the regions are set by zero, the four different SNR interval limits (SNR_HPMin, SNR_HPMax, SNR_LPMin, SNR_LPMax), and infinity. We number the different regions from left to right with respect to Fig. 3-5, and we will discuss them sequentially. In addition to the RUEP rates, we will give the loss in rate for the ith region, DeltaR_RUEPi, with respect to the channel capacity.

Figure 3-5: Achievable rates with RUEP. The points labeled A, B, C, and D correspond to SNR_HPMin, SNR_LPMin, SNR_HPMax, and SNR_LPMax. The channel capacity and the achievable rates with RUEP are plotted as a function of the channel SNR, SNR_C. The five channel SNR regimes are also labeled with the numbers 1-5.

RUEP Rate for SNR_C < SNR_HPMin

In region one, where SNR_C < SNR_HPMin, none of the bits can be decoded and the achieved rate is zero. Therefore, we have:

    R_RUEP1 = 0    (3.19)

and the loss in rate with respect to capacity is simply equal to the channel capacity:

    DeltaR_RUEP1 = log(1 + SNR_C) = log(1 + S_Total / sigma^2_N)    (3.20)

Note that both UEP and a rateless code also have a region where nothing is decoded and the rate is zero.
RUEP Rate for SNR_HPMin <= SNR_C < SNR_LPMin

In region two the low priority bits cannot be decoded, but the high priority bits are decoded at the maximum rate available for the high priority bits. Recall that the high priority bits see both the channel noise and interference from the energy allocated to the low priority bits, S_Low. Therefore the achievable rate is equal to:

    R_RUEP2 = log(1 + S_High / (sigma^2_N + S_Low))    (3.21)

and the loss in rate with respect to capacity is due to not decoding the low priority bits:

    DeltaR_RUEP2 = log(1 + SNR_C) - log(1 + S_High / (sigma^2_N + S_Low)) = log(1 + S_Low / sigma^2_N)    (3.22)

RUEP Rate for SNR_LPMin <= SNR_C < SNR_HPMax

In region three both the low priority and high priority bits are decoded at the highest possible rates. The achieved rate is equal to the channel capacity:

    R_RUEP3 = log(1 + SNR_C)    (3.23)

and there is no rate loss with respect to capacity:

    DeltaR_RUEP3 = 0    (3.24)

RUEP Rate for SNR_HPMax <= SNR_C < SNR_LPMax

In region four the high priority bits are decoded after one block because the channel SNR exceeds SNR_HPMax. There is a rate loss on the high priority bits because SNR_HPMax has been exceeded. Making the natural assumption that SNR_LPMax > SNR_HPMax, the low priority bits are still decoded at the highest possible rate because SNR_LPMin <= SNR_C < SNR_LPMax. The overall rate is:

    R_RUEP4 = log(1 + S_Low / sigma^2_N) + log(1 + S_High / (S_Low + S_Total / SNR_HPMax))    (3.25)

and the loss in capacity due to the low rate decoding of the high priority bits is:

    DeltaR_RUEP4 = log(1 + S_High / (S_Low + sigma^2_N)) - log(1 + S_High / (S_Low + S_Total / SNR_HPMax))    (3.26)

The overall throughput is limited by the fact that SNR_HPMax < SNR_C. In practice, it is common that SNR_HPMin is a specified system parameter. In addition, the constraint M_Max on the maximum number of transmissions of the high priority bits is usually a given system parameter. With these two constraints, SNR_HPMax is specified completely by SNR_HPMin and M_Max, so it is not possible to increase SNR_HPMax any further in an attempt to mitigate the rate loss DeltaR_RUEP4.
RUEP Rate for SNR_C >= SNR_LPMax

When SNR_C >= SNR_LPMax, the achievable rate flattens out and no longer increases with increasing SNR_C. Thus, the achieved rate is:

    R_RUEP5 = log(1 + S_Low / (S_Total / SNR_LPMax)) + log(1 + S_High / (S_Low + S_Total / SNR_HPMax))    (3.27)

and the rate loss with respect to capacity is:

    DeltaR_RUEP5 = log(1 + SNR_C) - log(1 + S_Low / (S_Total / SNR_LPMax)) - log(1 + S_High / (S_Low + S_Total / SNR_HPMax))

RUEP Achievable Rates Summary

The achievable rates of RUEP for the different channel SNR regimes, and the loss in rate with respect to the channel capacity, have been quantified. When one or more bitstreams are allowed to be ratelessly repeated, the rate loss that is present in UEP can be mitigated by RUEP. RUEP reduces to UEP in the limiting case when SNR_HPMin = SNR_HPMax and SNR_LPMin = SNR_LPMax. In all other cases, RUEP can provide a higher rate than UEP by using a rateless code for each set of bits. In addition, compared to a rateless code, RUEP can provide a larger range of available rates and also provides bitstream prioritization, which rateless coding does not. Next we will describe two different scenarios with RUEP. The first is when the high priority bits have a small delay constraint that prohibits using more than one block to decode those bits. In this scenario, it is necessary to set SNR_HPMin = SNR_HPMax. In this case, the rate loss on the high priority bits, Delta_High, is the same as with regular UEP. However, the low priority bits can still be repeated ratelessly, allowing SNR_LPMin < SNR_LPMax, and the loss on the low priority bits, Delta_Low, is reduced with respect to UEP. In the second scenario, the delay constraint on the high priority bits is large enough that we are also allowed to ratelessly repeat the high priority bits. Then SNR_HPMin < SNR_HPMax, and Delta_High is reduced with respect to UEP, in addition to Delta_Low being reduced.
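The five regimes of Sec. 3.3.1 can be collected into one piecewise function. The sketch below is our own illustration, assuming capacity-achieving component codes, non-integer block counts, and the regime ordering SNR_HPMin <= SNR_LPMin <= SNR_HPMax <= SNR_LPMax of Fig. 3-5; rates are in b/2D.

```python
from math import log2

def ruep_rate(snr_c, s_high, s_low,
              snr_hp_min, snr_lp_min, snr_hp_max, snr_lp_max):
    """Achievable RUEP rate in the five channel SNR regimes of Sec. 3.3.1."""
    s_total = s_high + s_low
    sigma2_n = s_total / snr_c                    # realized noise variance
    if snr_c < snr_hp_min:                        # region 1: nothing decodes
        return 0.0
    if snr_c < snr_lp_min:                        # region 2: HP only, Eq. (3.21)
        return log2(1 + s_high / (sigma2_n + s_low))
    if snr_c < snr_hp_max:                        # region 3: capacity, Eq. (3.23)
        return log2(1 + snr_c)
    if snr_c < snr_lp_max:                        # region 4, Eq. (3.25)
        return (log2(1 + s_low / sigma2_n)
                + log2(1 + s_high / (s_low + s_total / snr_hp_max)))
    # region 5: rate saturates, Eq. (3.27)
    return (log2(1 + s_low / (s_total / snr_lp_max))
            + log2(1 + s_high / (s_low + s_total / snr_hp_max)))
```

With the parameters of the example in Sec. 3.3.5 (S_High = 2, S_Low = 1, and SNR breakpoints 0.0538, 0.2154, 3, and 9), the rate equals capacity throughout region three and saturates at 3 b/2D in region five.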
3.3.2 RUEP Scenario Number One - Cannot Repeat High Priority Bits

In the two-level UEP construction that we have considered, there are two nonzero rates available. One rate is achieved when the channel is "bad," meaning the channel SNR is in the range SNR_HP <= SNR_C < SNR_LP. The other rate is achieved when the channel is "good," meaning the channel SNR is in the range SNR_C >= SNR_LP. These two rates, respectively, are:

    R_Good = R_High + R_Low    (3.28)
    R_Bad = R_High    (3.29)

where R_High and R_Low are the rates at which we encode the high priority and low priority bits. If we can only transmit the high priority bits once, but can transmit the low priority bits more than once, then we can reduce Delta_Low from (3.16) by using a rateless code on the low priority bits. In this case, we can think of the high and low priority bits as delay intolerant and delay tolerant bits. Because the delay intolerant bits cannot be ratelessly repeated, the range on the channel SNR for decoding those bits changes from an interval to a single breakpoint, as it is for UEP. That is, SNR_HPMin = SNR_HPMax. We still assume that there is some maximum number of transmissions allowed, N_Max, for the delay tolerant bits. In this scenario, RUEP differs from traditional UEP because if the delay tolerant bits are not received in the first transmission, the receiver has the option to listen to as many as N_Max blocks to decode the low priority bits. Because the transmitter is allowed to ratelessly repeat the low priority bits, SNR_LPMin < SNR_LPMax. RUEP with SNR_HPMin = SNR_HPMax and SNR_LPMin < SNR_LPMax creates a more reliable way of transmitting the low priority bits than traditional UEP. The delay sensitive bits are transmitted the same way that high priority bits are transmitted in traditional UEP. However, the delay tolerant bits are sent ratelessly.
In each block, a new set of delay sensitive bits is transmitted, but the same set of delay tolerant bits is ratelessly repeated until an acknowledgment is relayed from the receiver to the transmitter, or until the maximum number of transmissions, N_Max, has been reached.

3.3.3 RUEP Scenario Number Two - Can Repeat Both Sets of Bits

In this section we consider the scenario where the delay constraint on the high priority bits can be relaxed, enabling the retransmission of those bits. There is still a maximum number of retransmissions, M_Max, allowed for the high priority bits; however, they do not necessarily need to be decoded after one transmission. This allows us to code the high priority bits at a higher initial rate, instead of having to assume the worst-case channel noise. We can choose SNR_HPMin < SNR_HPMax, and the rateless code will automatically adapt the rate of the high priority bits to the channel for any SNR_HPMin <= SNR_C <= SNR_HPMax. In this scenario, M_Max is the maximum allowable number of transmissions for the high priority bits and N_Max is the maximum number of transmissions for the low priority bits. Note that once the RUEP code is constructed, the actual channel SNR still dictates the achieved rate and after how many transmissions of each bitstream the receiver can decode the different bitstreams. We denote by M and N the numbers of blocks needed to decode the high and low priority bits, respectively.[1] It is straightforward to construct an RUEP code so that M <= N. When M < N the high priority bits are decoded after fewer blocks than the low priority bits, which can be very advantageous in certain applications. Again, in this scenario we still code the less important bits with a rateless code, exactly as we did in Section 3.3.2. The difference is that now we also code the more important bits with a rateless code. The rateless codes used on the less important and more important bits are allowed to have different maximum target rates.
[1] In solving for N, we always assume that the high priority bits have been decoded and stripped off.

The maximum target rate for the high priority bits is constrained to be R_Min * M_Max, where R_Min is the minimum acceptable rate that we must achieve at the receiver. Thus, if SNR_HPMin <= SNR_C <= SNR_HPMax and the receiver has to wait for more than one transmission to decode the high priority bits, it still achieves a rate greater than or equal to R_Min. In regular UEP, the rate of the high priority bits is set to be R_Min. Therefore, the achieved rate at the receiver on the high priority bits in RUEP is at least as good as it is in regular UEP. When M < N and the receiver is waiting for additional blocks to be able to decode the less important bits, the transmitter can send a new group of more important bits. Sending a new group of more important bits might cause a longer delay in decoding the low priority bits, but it avoids a rate loss on the high priority bits. One might initially ask whether sending the high priority bits ratelessly could cause a loss in rate on the low priority bits with respect to a fixed rate transmission of the high priority bits where M_Max = 1. This seems plausible because if the high priority bits are not decoded, then they cannot be stripped off, and the low priority bits cannot be decoded either. This situation is avoided for the initial groups of bits because the code is constructed so that, at the beginning of a transmission, the first group of high priority bits is always decoded before the first group of low priority bits. If we then send another group of high priority bits, it is conceivable that the second group cannot be decoded before the first group of low priority bits. However, with accurate channel state information at the receiver, this problem can be overcome through the use of predictive ACKs, which are described in Sec. 2.4.1.
The receiver will know that it should be able to decode the low priority bits after a certain block once the high priority bits are decoded and stripped off. Thus, the receiver can ask for the next set of low priority bits even though the current set has not yet been decoded. This avoids a rate loss on the low priority bits while waiting to decode a set of high priority bits. If there is not accurate enough channel state information at the receiver to implement predictive ACKs, then the message interleaving strategy described in Sec. 2.4.2 could be modified to work with RUEP. Fig. 3-6 illustrates the use of predictive ACKs (PACKs) in RUEP. Fig. 3-7 illustrates the same example when PACKs cannot be used and the receiver must wait until after a group of bits is decoded to send an ACK. Arrows indicate the direction of transmission. The notation HP_{S,M} denotes the Mth block of the Sth group of high priority bits. The notation LP_{S,M} denotes the Mth block of the Sth group of low priority bits. In Fig. 3-6 and Fig. 3-7, there is no delay between transmission and reception. Thus, the PACKs are not used to combat delay in the system, but rather to eliminate the need to decode the second group of high priority bits before the first group of low priority bits can be decoded. In RUEP, the transmitter will transmit a particular group of high priority bits and a particular group of low priority bits until the receiver sends back an ACK or PACK corresponding to that group of bits. In the example in Fig. 3-6 and Fig. 3-7, the channel SNR is such that the receiver can decode the high priority bits after two blocks. Thus, the receiver sends an ACK to the transmitter after the second block, and the transmitter sends the next group of high priority bits. The receiver will need three blocks to decode the low priority bits.
However, when the second group of high priority bits is transmitted, the receiver will need two more blocks to decode the second group of high priority bits. Therefore, the receiver cannot strip off the second group of high priority bits and decode the first group of low priority bits until after the fourth block is sent. Without PACKs, the receiver would need to wait until after the fourth block and then send an ACK for the high and low priority bits. The case where PACKs cannot be used is illustrated in Fig. 3-7. When PACKs can be used, the receiver sends a PACK after the third block so that the transmitter sends the next group of low priority bits in the fourth block. PACKs enable the transmission of a new group of low priority bits even though the receiver has not yet decoded the current group of low priority bits.

Figure 3-6: Illustration of RUEP when predictive acknowledgments (PACKs) are used. The receiver has to wait until after the fourth block to strip off the second group of high priority bits and decode the first group of low priority bits. However, the receiver knows that it has enough information after three blocks to decode the first group of low priority bits once the high priority bits are stripped off. The receiver sends a PACK after the third block and the transmitter sends a new group of low priority bits in the fourth block.

Figure 3-7: Illustration of RUEP when predictive acknowledgments (PACKs) cannot be used. The receiver has to wait until after the fourth block to strip off the second group of high priority bits and decode the first group of low priority bits. In contrast to Fig.
3-6, the receiver does not know that it has enough information after three blocks to decode the first group of low priority bits once the high priority bits are stripped off. The receiver must wait until the second set of high priority bits is decoded in the fourth block to send an ACK for the first set of low priority bits. Thus, there is a rate loss on the low priority bits while waiting to decode the high priority bits.

3.3.4 Additional Considerations for RUEP

Receiver Can Choose to Wait or Not Wait for Low Priority Bits

We have highlighted the fact that the channel dictates M and N, the numbers of blocks necessary to decode the high priority and low priority bits, respectively. Because the channel SNR determines M and N, the receiver cannot choose an arbitrary decoding time, and our RUEP code differs from the ideal RUEP code. However, the receiver does have some flexibility because it can decide whether it wants to wait for M and N blocks to decode the high and low priority bits. We will assume that the receiver will always wait M blocks to decode the high priority bits, and it will then choose whether it wants to wait an additional N - M blocks to also decode the low priority bits. With RUEP, the transmitter will send as many as M_Max blocks to allow the receiver to decode one set of high priority bits and as many as N_Max blocks to allow the receiver to decode one set of low priority bits, where M_Max <= N_Max. Until now we have assumed that the receiver would always wait for M_Max and N_Max blocks in an attempt to decode the high and low priority bits. However, at any point, if the receiver does not wish to wait any longer for the current set of low priority bits, it can send back a positive acknowledgment so that the transmitter will move on and begin sending the next group of low priority bits. The receiver can wait for the low priority bits for a given amount of time and then give up if they cannot be decoded.
Or, if the receiver has accurate channel knowledge, it will know how long it would have to wait for the low priority bits and can decide whether it wants to wait that long. If the receiver does not want to wait, then it can send back an acknowledgment to the transmitter. This flexibility is advantageous because, while the maximum number of blocks the receiver would ever want to receive for the low priority bits is N_Max, it might not always be advantageous to wait the full N_Max blocks for a particular group of low priority bits.

Rateless UEP for Broadcast

We have discussed the use of RUEP codes in transmitting to one user where the channel is time-varying. In this situation, when the receiver has successfully decoded one group of high priority or low priority bits, it sends an acknowledgment to the transmitter. The transmitter then sends the next group of high or low priority bits. Because a new group of bits is transmitted when the previous group is decoded, two constant streams of bits are transmitted - one constant stream of high priority bits and one constant stream of low priority bits. In addition to transmitting to one user with a time-varying channel, it is common to transmit to multiple users. Each user has an SNR that is independent of the other users, but every user has the same channel statistics. In other words, the SNRs of the different users are independent, identically distributed random variables. Here the SNR is spatially-varying instead of time-varying. When there are multiple receivers, the transmitter cannot simply send the next group of bits upon receiving an acknowledgment from one user. Because there are multiple users, the performance of each user depends on the other users. Consider two users.
If one user is finished decoding a set of high or low priority bits, then that user either has to wait for the other user to decode before a new set of bits can be transmitted, or the new set of bits is transmitted at the expense of the user with the worse channel not being able to decode the current group of bits. To avoid this complication in our analysis, when broadcasting to multiple users we will consider the case where the transmitter has only one set of high priority bits and one set of low priority bits. All of the high and low priority bits are transmitted in the first block. When the transmitter sends more blocks, all of the bits are included in every block. Therefore, when a particular receiver can decode, there is no need for the transmitter to send additional information bits because there are no remaining information bits. This enables each user to finish decoding according to its own channel quality without incurring a rate loss while it waits for other users to decode. While one receiver can decode independently of the other users, when a receiver waits for the low priority bits, there is a rate loss on the high priority bits. After M blocks the receiver can decode the high priority bits, but if it also waits to receive N blocks to decode the low priority bits, then the rate achieved on the high priority bits is decreased by a factor of N/M. In our analysis of the achievable rates with RUEP in the preceding sections, summarized in Fig. 3-5, we were considering the one-user scenario where this problem does not arise, and therefore we did not include this source of rate loss for RUEP.
In the sequel we will illustrate RUEP with an example and will consider both the single user case, where a new set of high or low priority bits can be transmitted when the current set is decoded, and the multiple user case, where there is only one set of high priority and one set of low priority bits and waiting to decode the low priority bits causes a rate loss on the high priority bits.

3.3.5 Example of RUEP

In this section, we illustrate RUEP with an example. All rates and capacities are given in bits per two dimensions (b/2D) unless otherwise noted. We assume that the rateless code used for each set of bits is capacity-achieving. We will consider two cases. The first case, which we will discuss briefly, is when M_Max = 1 and N_Max = 20, corresponding to not being able to retransmit the high priority bits. The second case is when M_Max = N_Max = 20. In the case where M_Max = N_Max, it is still desirable to receive the high priority bits before the low priority bits. The other parameters of our example are as follows, where S_Total is the constraint on the total signal energy and sigma^2_Max is the maximum channel noise variance (with an acceptably high level of certainty):

    S_Total = 3    (3.30)
    sigma^2_Max = 55.7136    (3.31)
    SNR_HPMin = S_Total / sigma^2_Max = -12.688 dB    (3.32)

In addition, suppose that the minimum rate of information on the high priority bits that we need to be able to reliably attain is:

    R_Min = 1/20    (3.33)

That is, within M_Max blocks, we need to be able to decode the high priority bits at a rate of at least 1/20 b/2D, assuming that the channel noise is no worse than sigma^2_Max = 55.7136.

Example of RUEP with M_Max = 1 and N_Max = 20

Note that in the above scenario, with the worst-case channel SNR of SNR_HPMin, the channel can support a rate of R_WorstCase = log2(1 + 3/55.7136) = 0.0757 b/2D.
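As a quick numeric check of the example parameters (3.30)-(3.33) and the worst-case rate, our own sketch:

```python
from math import log2, log10

S_TOTAL = 3.0            # total signal energy, Eq. (3.30)
SIGMA2_MAX = 55.7136     # worst-case channel noise variance, Eq. (3.31)
R_MIN = 1 / 20           # required high priority rate, b/2D, Eq. (3.33)

snr_hp_min = S_TOTAL / SIGMA2_MAX
snr_hp_min_db = 10 * log10(snr_hp_min)   # ~ -12.688 dB, Eq. (3.32)
r_worst_case = log2(1 + snr_hp_min)      # ~ 0.0757 b/2D
assert r_worst_case > R_MIN              # the worst-case channel meets the rate target
print(round(snr_hp_min_db, 3), round(r_worst_case, 4))
```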
Using a fixed rate code with M_Max = 1 and N_Max = 20, a code of rate 0.0757 must be used, and the low priority bits are constrained by the stringent retransmission constraint on the high priority bits. Depending on the channel SNR, either all of the bits are decoded after one block or none of the bits are decoded. Using an RUEP code in the scenario where the high priority bits cannot be retransmitted, it is necessary to receive a rate R_Min = 1/20 b/2D on the high priority bits with each transmitted block for all SNR_C >= SNR_HPMin. This can be achieved by allocating energy S_High = 2 to the high priority bits. Thus, even if the channel noise is equal to sigma^2_Max, the high priority bits can achieve a rate of:

    R_High = log2(1 + 2 / (1 + 55.7136)) = 1/20    (3.34)

With RUEP in this scenario, we cannot ratelessly send the high priority bits, and therefore Delta_High is the same as it is with UEP. However, we can ratelessly send the low priority bits, and therefore Delta_Low can be reduced with respect to UEP.

Example of RUEP with M_Max = 20 and N_Max = 20

With M_Max = 20 and N_Max = 20, using a capacity-achieving rateless code we would set the initial rate after one block to C* = 20 * 0.0757 = 1.514 b/2D. If the worst-case channel is realized, then we must wait for twenty blocks, and we achieve a rate of 0.0757 b/2D. Using a rateless code we must make C* = R_WorstCase * M_Max. If the channel is very good and could support a rate greater than 1.514 b/2D, we would achieve a rate less than capacity and lose out on potential throughput. This corresponds to SNR_C > SNR_RLMax in Fig. 3-4. Thus, a rateless code can be used to accommodate a finite M_Max constraint, but the highest achievable rate is constrained by SNR_RLMin and M_Max. In addition, all of the bits are decoded at the same time, instead of the high priority bits being decoded earlier, as they are with RUEP. With an RUEP code we set the maximum rate for the high priority bits to be R_HighMax = R_Min * M_Max = (1/20) * 20 = 1 b/2D.
After setting S_High = 2 to meet our minimum rate requirement for the high priority bits when SNR_C = SNR_HPMin, we must set S_Low = 1. We send the low priority bits ratelessly and can set the initial low priority rate, R_LowMax, to be any value we want. In this example, we set R_LowMax = 2 b/2D. Thus, the total maximum rate that the code can achieve is:

    R_TotalMax = R_LowMax + R_HighMax = 2 + 1 = 3    (3.35)

The throughput that we can achieve is higher than the throughput using regular rateless coding, and we could have increased the maximum throughput further by increasing R_LowMax. The receiver always receives a rate of at least 1/20 b/2D, and decodes the high priority bits before or at the same time as the low priority bits. Because the high and low priority bits can be retransmitted, both Delta_High and Delta_Low are reduced with respect to traditional UEP. If the worst-case channel is realized, that is, SNR_C = SNR_HPMin, the receiver simply waits for twenty blocks and decodes the high priority bits at a rate of 1/20. To examine cases other than the worst case, we will denote by sigma^2_HP(M) and sigma^2_LP(N) the maximum noise levels that still allow decoding of the high priority and low priority bits after M and N blocks, respectively. Note that our RUEP code has the following four SNR parameters:

    SNR_HPMin = S_Total / sigma^2_HP(M_Max) = 3 / 55.7136 = 0.0538    (3.36)
    SNR_LPMin = S_Total / sigma^2_LP(N_Max) = 3 / 13.93 = 0.2154    (3.37)
    SNR_HPMax = S_Total / sigma^2_HP(1) = 3 / 1 = 3    (3.38)
    SNR_LPMax = S_Total / sigma^2_LP(1) = 3 / (1/3) = 9    (3.39)

A general illustration of the achievable rates for RUEP given four SNR parameters is found in Sec. 3.3.1, Fig. 3-5. In this example:

    sigma^2_HP(M) >= sigma^2_LP(M), for all M    (3.40)

which enables the receiver to always decode the high priority bits as soon as or before the low priority bits.
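The four SNR parameters in (3.36)-(3.39) follow from inverting the per-block rate equations. The sketch below is our own, using the example's S_High = 2, S_Low = 1, R_HighMax = 1, and R_LowMax = 2 b/2D, and reproduces the tabulated values.

```python
S_HIGH, S_LOW = 2.0, 1.0
S_TOTAL = S_HIGH + S_LOW
R_HIGH_MAX, R_LOW_MAX = 1.0, 2.0   # initial (one-block) rates, b/2D

def sigma2_hp(m):
    """Largest noise variance that lets the HP bits decode after m blocks:
    solve log2(1 + S_High / (sigma2 + S_Low)) = R_HighMax / m for sigma2."""
    return S_HIGH / (2 ** (R_HIGH_MAX / m) - 1) - S_LOW

def sigma2_lp(n):
    """Largest noise variance that lets the LP bits decode after n blocks,
    once the HP bits are stripped off: log2(1 + S_Low / sigma2) = R_LowMax / n."""
    return S_LOW / (2 ** (R_LOW_MAX / n) - 1)

print(S_TOTAL / sigma2_hp(20))   # SNR_HPMin, ~ 0.0538
print(S_TOTAL / sigma2_lp(20))   # SNR_LPMin, ~ 0.215
print(S_TOTAL / sigma2_hp(1))    # SNR_HPMax, = 3
print(S_TOTAL / sigma2_lp(1))    # SNR_LPMax, = 9
```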
In addition, for each integer N in the interval [1, N_Max], we find the smallest possible M such that:

sigma^2_LP(N) <= sigma^2_HP(M)    (3.41)

The reason for doing this is so we know that if the receiver can decode the low priority bits after N blocks, then the high priority bits can be decoded after M blocks, where M <= N. If there is one user and the transmitter sends steady streams of high and low priority bits, then the rates on the high priority and low priority bits are equal to R_HighMax/M and R_LowMax/N, respectively. If only one set of high priority and low priority bits is transmitted and not a constant stream, then the receiver has the choice after M blocks of decoding only the high priority bits, or waiting an additional N - M blocks and decoding both the high priority bits and the low priority bits. Of course, with a one-time transmission, if the receiver chooses to wait and decode all of the bits, then the rate of the high priority bits decreases by a factor of N/M. Therefore, in order for the total rate including both sets of bits to increase, the decrease in the high priority rate must be offset by decoding the additional set of bits, the low priority bits. Even if the overall rate is not increased, it might still be desirable for the receiver to wait and decode all of the bits, because it still receives a greater number of bits. Table 3.1 summarizes the achievable rates using RUEP with the power allocation and rate allocation described earlier. As noted earlier, sigma^2_LP(N) is the maximum channel noise variance that allows decoding of the low priority bits after N blocks. For channel noise variance equal to sigma^2_LP(N), N and M are the number of blocks needed to decode the low and high priority bits. R_HP(M) and R_LP(N) are the high priority rate if we decode the high priority bits after M blocks and the low priority rate if we decode the low priority bits after N blocks, respectively.
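The search in (3.41) is a one-liner once the two noise-ceiling functions are available. A sketch under the same example allocation (names ours):

```python
S_HIGH, S_LOW = 2.0, 1.0

def sigma2_hp(m):
    # Noise ceiling for decoding the HP bits after m blocks (rate 1/m b/2D).
    return S_HIGH / (2 ** (1.0 / m) - 1) - S_LOW

def sigma2_lp(n):
    # Noise ceiling for decoding the LP bits after n blocks (rate 2/n b/2D).
    return S_LOW / (2 ** (2.0 / n) - 1)

def hp_blocks(n):
    # Smallest M with sigma2_lp(n) <= sigma2_hp(M): whenever the channel is
    # just good enough to decode the LP bits after n blocks, the HP bits
    # are already decodable after M <= n blocks.
    return next(m for m in range(1, n + 1) if sigma2_lp(n) <= sigma2_hp(m))

print([hp_blocks(n) for n in (2, 4, 10, 20)])  # [1, 2, 4, 6], as in Table 3.1
```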
Thus, R_HP(M) + R_LP(N) is the total rate achieved when a constant stream of high and low priority bits is transmitted. If a constant stream of bits is not transmitted, R_HP(N) = R_HighMax/N is the rate achieved on the high priority bits if new high priority bits are not transmitted while waiting N blocks to decode the low priority bits. In this case, the total rate is equal to R_HP(N) + R_LP(N). For the case when a constant stream of bits is not transmitted, in the last column of Table 3.1 we define Delta_R(M, N) = R_HP(N) + R_LP(N) - R_HP(M), which is the rate that the receiver achieves when waiting N blocks for the low priority bits but not receiving new high priority bits, minus the rate that could have been achieved had the receiver only decoded the high priority bits after M blocks.

We will now take a closer look at the code for one realization of the channel. Let us suppose that the channel noise is sigma_N^2 = 1. From Table 3.1, we see that this is the upper limit on the noise level that will allow the receiver to decode the low priority bits after N = 2 blocks. Also, we see that M = 1, which means that the decoder can decode the high priority bits after only one block. If the transmitter sends a steady stream of bits to only one user with this channel, then the receiver can decode the first set of high priority bits after one block at R_HP(1) = 1 b/2D. The receiver will send an acknowledgment for the high priority bits when they are decoded, and the transmitter can then send another set of high priority bits in the second block while ratelessly repeating the initial set of low priority bits in the second block. After the second block, the receiver will decode this new set of high priority bits in addition to the first set of low priority bits. After decoding both sets of bits, the receiver will send an ACK for both sets back to the transmitter, and new sets of high and low priority bits will be transmitted in the next block. The first two blocks are illustrated in Fig. 3-8.
The receiver maintains the same rate R_HP(1) = 1 b/2D of high priority bits, but also receives rate R_LP(2) = 2 b/2D / 2 blocks = 1 b/2D of low priority bits. The total rate is equal to R_HP(1) + R_LP(2) = 2 b/2D, which is shown in the row corresponding to N = 2 and the fifth column of Table 3.1. In this example, when sigma_N^2 = 1, SNR_LPMin <= SNR_C <= SNR_HPMax. Therefore, the channel SNR is in the third region in Fig. 3-5, described in Sec. 3.3.1, where the RUEP code is capacity-achieving. In practice, for the code to actually achieve capacity, sigma_N^2 must correspond to the highest possible channel noise variance for both M and N, the number of blocks needed to decode the high and low priority bits. Also, for the code to achieve capacity, the transmitter must be allowed to send constant streams of high and low priority bits. Both of these practical conditions are met for sigma_N^2 = 1, in addition to the channel SNR being in the proper regime, so the RUEP code achieves capacity. In addition to achieving capacity, the high priority bits are decoded after fewer blocks than the low priority bits. In contrast, the channel capacity cannot be achieved with traditional UEP, as seen in Fig. 3-3. Also, in contrast, with a rateless code there is no prioritization of the bitstream, so all of the bits are decoded at the same time rather than the high priority bits being decoded first.

In a broadcast scenario, referring still to the situation where the channel noise is sigma_N^2 = 1, if the receiver waits for the low priority bits in the second block while not receiving new high priority bits, then the rates that are achieved on the high and low priority bits are R_HP(2) = 1/2 b/2D and R_LP(2) = 1 b/2D. The rate on the high priority bits is decreased by a factor of two, but the overall rate increases by waiting for the low priority bits.
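The capacity-achieving claim for this channel realization can be verified directly from the superposition rates. A sketch assuming the example's numbers (names ours):

```python
import math

S_HIGH, S_LOW, SIGMA2 = 2.0, 1.0, 1.0  # channel realization sigma_N^2 = 1

r_hp = math.log2(1 + S_HIGH / (SIGMA2 + S_LOW))  # R_HP(1): LP signal seen as noise
r_lp = math.log2(1 + S_LOW / SIGMA2)             # R_LP(2): 2 b/2D over 2 blocks
capacity = math.log2(1 + (S_HIGH + S_LOW) / SIGMA2)

print(r_hp + r_lp, capacity)  # 2.0 2.0 -- the superposition rates sum to capacity
```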
If the receiver had decided to stop listening after only one block, then it would have received R_HP(1) + R_LP(1) = 1 + 0 = 1 b/2D, because it would have been unable to decode the low priority bits. By listening for a longer period of time, the total rate is R_HP(2) + R_LP(2) = 1/2 + 1 = 1.5 b/2D, and the overall rate increase is Delta_R(1, 2) = R_HP(2) + R_LP(2) - R_HP(1) = 1/2 + 1 - 1 = 1/2 b/2D.

Figure 3-8: RUEP when constant streams of high and low priority bits are transmitted. Here the receiver can decode the high priority bits after one block, and the low priority bits after two blocks. Upon receiving an acknowledgment from the receiver for the first set of high priority bits, the transmitter sends a new set of high priority bits in the second block. The notation HP_{S,M} denotes the Mth block of the Sth group of high priority bits and LP_{S,M} denotes the Mth block of the Sth group of low priority bits.

In addition to the increase in rate, the receiver is able to decode more bits, which is beneficial because receiving a greater number of bits allows for a more faithful reconstruction of the transmitted data at the receiver. Thus, even if the rate remained the same after listening for a longer period of time, or decreased (as it does for larger N in Table 3.1), depending on the application it might still be advantageous for the receiver to listen for a longer period of time to receive a greater number of bits. For large N, N - M increases and the receiver must wait a long time to decode the low priority bits. The rate loss on the high priority bits in the broadcast scenario becomes large, and the rate on the low priority bits becomes smaller as N increases. Therefore, Delta_R(M, N) becomes small and even negative for large values of N.
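The Delta_R(M, N) column of Table 3.1, including where it goes negative, can be regenerated in a few lines. A sketch under the example allocation (names ours):

```python
S_HIGH, S_LOW = 2.0, 1.0
R_HP_MAX, R_LP_MAX = 1.0, 2.0

def sigma2_hp(m):
    # Noise ceiling for decoding the HP bits after m blocks.
    return S_HIGH / (2 ** (R_HP_MAX / m) - 1) - S_LOW

def sigma2_lp(n):
    # Noise ceiling for decoding the LP bits after n blocks.
    return S_LOW / (2 ** (R_LP_MAX / n) - 1)

def delta_r(n):
    # M: fewest blocks needed for the HP bits when the channel is only just
    # good enough to decode the LP bits after n blocks.
    m = next(k for k in range(1, n + 1) if sigma2_lp(n) <= sigma2_hp(k))
    # Rate gained by waiting n blocks for the LP bits (no fresh HP bits)
    # versus stopping after m blocks with the HP bits alone.
    return (R_HP_MAX + R_LP_MAX) / n - R_HP_MAX / m

print({n: round(delta_r(n), 2) for n in (2, 4, 9, 20)})
# {2: 0.5, 4: 0.25, 9: 0.0, 20: -0.02}
```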
Finally, note that for all sigma_N^2 such that N - M > 0, there is a rate loss with respect to capacity if the transmitter does not send a new set of high priority bits when the original set is decoded. In this scenario, although there is a rate loss on the high priority bits while waiting for the low priority bits, the rate on the high priority bits is at least as large as it is with UEP when the high priority bits are not sent ratelessly and SNR_HPMax = SNR_HPMin = SNR_HP.

Table 3.1: Performance of the rateless UEP scheme with S_High = 2, S_Low = 1, R_HighMax = 1 b/2D, and R_LowMax = 2 b/2D.

  N    M   sigma^2_LP(N)   R_HP(N)+R_LP(N)   R_HP(M)+R_LP(N)   Delta_R(M,N)
  1    1       0.33             3.00              3.00             N/A
  2    1       1.00             1.50              2.00             0.50
  3    2       1.70             1.00              1.17             0.50
  4    2       2.41             0.75              1.00             0.25
  5    2       3.13             0.60              0.90             0.10
  6    3       3.85             0.50              0.67             0.17
  7    3       4.57             0.43              0.62             0.10
  8    3       5.29             0.38              0.58             0.04
  9    3       6.00             0.33              0.56             0.00
 10    4       6.73             0.30              0.45             0.05
 11    4       7.45             0.27              0.43             0.02
 12    4       8.17             0.25              0.42             0.00
 13    4       8.89             0.23              0.40            -0.02
 14    5       9.61             0.21              0.34             0.01
 15    5      10.33             0.20              0.33             0.00
 16    5      11.05             0.19              0.33            -0.01
 17    5      11.77             0.18              0.32            -0.02
 18    6      12.49             0.17              0.28             0.00
 19    6      13.21             0.16              0.27            -0.01
 20    6      13.93             0.15              0.27            -0.02

3.3.6 Asymptotic Analysis of Achievable RUEP Rates

In this section we analyze the efficiency of RUEP in the low SNR regime. Specifically, we look at the fraction of capacity that RUEP achieves in the second region in Fig. 3-5. In this region, SNR_HPMin <= SNR_C < SNR_LPMin, and we make the assumption that SNR_C is small. The low priority bits are not decoded, but the high priority bits are decoded successfully. This is the lowest SNR region at which RUEP will provide a nonzero rate. We will show that the fraction of capacity achieved in this region is approximately equal to, but strictly less than, the fraction of the total signal energy that the high priority bits are allocated. This knowledge is useful when designing a system to achieve a certain fraction of capacity in the low SNR regime. First, we show an approximation of the efficiency of RUEP when the channel is poor and the receiver is only able to decode the high priority bits. We then show that this approximation is also an upper bound on the efficiency.
Therefore, the efficiency becomes very close to this upper bound when the channel has a low SNR. To begin, we note again that with a superposition code the capacity on the high priority bits is given, as it was in (3.3), by:

C_High = log(1 + S_High / (sigma_N^2 + S_Low))    (3.42)

where S_High and S_Low are the amounts of power allocated to the high priority and low priority bits, respectively. For low channel SNR, SNR_C, the channel noise sigma_N^2 becomes large. When sigma_N^2 >> S_Low we make the approximation:

C_High ~= log(1 + S_High / sigma_N^2)    (3.43)

Next, since both the high priority and low priority bits are allocated a nonzero fraction of the total available power, we note that S_High = K_HP * S_Total, where 0 < K_HP < 1. We will use this to relate the high priority bit capacity back to the overall channel capacity. Using a first order Taylor series expansion of log(1 + x) around x = 0, we have log(1 + x) ~= x. Applying this Taylor series approximation to (3.43), we have:

C_High ~= S_High / sigma_N^2    (3.44)

Substituting S_High = K_HP * S_Total into (3.44), we see that:

C_High ~= K_HP * S_Total / sigma_N^2    (3.45)

The overall channel capacity in the low SNR regime can also be approximated using the Taylor series expansion by:

C_Total = log(1 + S_Total / sigma_N^2) ~= S_Total / sigma_N^2    (3.46)

Taking the ratio of (3.45) to (3.46), it becomes clear that:

C_High / C_Total ~= K_HP    (3.47)

In addition, we will show that C_High / C_Total < K_HP. In order to do this, we look at the fraction of the total capacity that the low priority bits are allocated. Note that:

S_Low = K_LP * S_Total    (3.48)

and K_LP + K_HP = 1. Using a Taylor series expansion for C_Low, we see that:

C_Low = log(1 + S_Low / sigma_N^2) ~= S_Low / sigma_N^2    (3.49)

However, for any x > 0 it is well known that:

log(1 + x) < x    (3.50)

and therefore, applying this bound to both C_Low and C_Total:

C_Low = log(1 + S_Low / sigma_N^2) < S_Low / sigma_N^2    (3.51)

C_Total = log(1 + S_Total / sigma_N^2) < S_Total / sigma_N^2    (3.52)

It is clear that S_Low / sigma_N^2 < S_Total / sigma_N^2, since the low priority bits are not allocated all of the total power. Now we use the fact that log(1 + x)/x is less than one and monotonically decreasing for x > 0.
The fact that log(1 + x)/x is monotonically decreasing for x > 0 is shown in Appendix A, and the fact that log(1 + x)/x < 1 for all x > 0 is obvious from the inequality (3.50). We can therefore rewrite (3.46) and (3.49) as:

C_Low = alpha_Low * S_Low / sigma_N^2    (3.53)

C_Total = alpha_Total * S_Total / sigma_N^2    (3.54)

where alpha_Low, alpha_Total < 1. Furthermore, the Taylor series approximation holds more tightly for C_Low than it does for C_Total, since the SNR for the low priority bits is less than the SNR given the total amount of power and log(1 + x)/x is monotonically decreasing for positive values of x. Therefore, alpha_Total < alpha_Low. Now we can precisely find the fraction of the total capacity that is allocated to the low priority bits by substituting (3.48) into (3.53) and then dividing (3.53) by (3.54):

C_Low / C_Total = (alpha_Low / alpha_Total) * K_LP > K_LP    (3.55)

Since the low priority bits have a fraction of capacity greater than K_LP, and K_LP + K_HP = 1, clearly the high priority bits must have a fraction of capacity less than K_HP, or else the channel capacity formula would be violated. Thus, while the approximation in (3.47) holds, the following upper bound also holds:

C_High / C_Total < K_HP    (3.56)

3.3.7 Summary - RUEP vs. UEP and Rateless

We now summarize the differences between RUEP and rateless, and also between RUEP and UEP. With a finite number of transmissions, M_Max, allowed on all of the information bits (including both high priority and low priority bits) and a minimum rate requirement, R_Min, a rateless code is capacity-achieving for a set of M_Max different rates in the interval [R_Min, R_Min * M_Max]. However, if the channel capacity is greater than R_Min * M_Max, then the rateless code does not achieve the full channel capacity. Also, a rateless code does not separate the bitstream into high and low priority bits. In contrast, RUEP allows for an arbitrarily high upper limit on the achievable rate while still supporting the minimum rate, R_Min.
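The low-SNR approximation (3.47) and the bound (3.56) from Sec. 3.3.6 are easy to check numerically. A sketch with an assumed power split (names ours):

```python
import math

S_TOTAL = 3.0
K_HP = 2.0 / 3.0  # fraction of power on the HP bits (S_High = 2 here)

for sigma2 in (10.0, 100.0, 1000.0):
    s_high = K_HP * S_TOTAL
    s_low = S_TOTAL - s_high
    c_high = math.log2(1 + s_high / (sigma2 + s_low))
    c_total = math.log2(1 + S_TOTAL / sigma2)
    frac = c_high / c_total
    assert frac < K_HP  # the bound (3.56) holds at every noise level
    print(round(frac, 4))  # climbs toward K_HP ~ 0.6667 as the SNR falls
```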
In addition, RUEP supports a prioritization of the bitstream. Because it supports bitstream prioritization, RUEP can accommodate two different constraints on the maximum number of transmissions, M_Max and N_Max, for two different sets of bits. Finally, the bitstream prioritization with RUEP allows the high priority bits to be decoded after fewer, or at most the same number of, blocks as the low priority bits. An unequal error protection code by itself supports bitstream prioritization and two different nonzero rates through the selection of SNR_HP and SNR_LP. However, like all traditional communication link design based on fixed rate codes, UEP does not adapt to the quality of the channel. RUEP builds on UEP, allowing bitstream prioritization while adapting to the channel to provide higher rates. With an RUEP code, four parameters are selected, SNR_HPMin, SNR_HPMax, SNR_LPMin, and SNR_LPMax, providing for each set of bits a range of channel SNRs where the code for that set of bits matches the supported rate. By incorporating rateless codes into UEP, RUEP decreases the rate loss that traditional UEP incurs.

Chapter 4

How Rateless Fits into Point-to-Point Communication and Networking

In this chapter, we first compare rateless codes to more established forms of incremental redundancy codes, namely hybrid automatic repeat request (HARQ). We compare rateless and HARQ for the AWGN channel in the context of point-to-point communication. After understanding the relative advantages of rateless for point-to-point communication, we will discuss how rateless codes could improve wireless network performance. In particular, we will focus on using rateless codes in mobile ad hoc networks (MANETs).

4.1 Rateless vs. Hybrid ARQ for Point-to-Point Communication

In this section we will compare rateless codes to hybrid automatic repeat request (HARQ).
There are many different coding schemes used in HARQ, and they fall broadly into three types, simply named type-I, type-II, and type-III HARQ. Instead of focusing on comparing rateless to certain "types" of HARQ, we will compare rateless to two general methods that are commonly used. The first method is to use a good code and simply repeat the entire codeword when the receiver requests another transmission. The receiver will use maximal ratio combining (MRC) to combine all of the received versions of the codeword. This was described by Chase in [2], and it is commonly referred to as "Chase combining" HARQ. We will use the abbreviation CC HARQ to refer to this method. The other HARQ method that we will consider is incremental redundancy (IR) codes, where instead of repeating the entire codeword, additional parity bits are transmitted. Often these codes are designed from a low-rate mother code that is punctured in a specific way to form several higher rate codes. With each new repeat request from the receiver, a different subset of the punctured parity bits is transmitted. Eventually, after several retransmissions, the receiver has the original codeword from the low-rate mother code. An example of a practical IR code with good performance is given in [10]. We will use the abbreviation IR HARQ when referring to incremental redundancy hybrid ARQ.

4.1.1 Rateless versus Chase Combining HARQ

One natural question in comparing rateless to CC HARQ is the following. Suppose we have a choice between a fixed rate code with rate R_FRInitial after one transmission and efficiency E_Fixed at rate R_FRInitial, and a rateless code with efficiency E_Rateless that remains constant for rates in the interval [R_RLInitial / M_Max, R_RLInitial], where M_Max is the maximum number of rateless blocks that can be transmitted and R_RLInitial is the initial rate after the first transmitted block. At what point does it become advantageous to use rateless instead of repeating the good fixed rate code?
Note that at any channel SNR there is a lower bound on the efficiency of the rateless code. Since we have seen in Fig. 2-5 that this lower bound is reasonably tight, in this chapter we will assume that the rateless code has a constant efficiency, E_Rateless, that is equal to the lower bound on efficiency. Whether it is better to repeat a good fixed rate code or to use a rateless code depends on the parameters of each code and also on the channel SNR, SNR_C. In general, we will see that the greater the number of repetitions that are needed, the better rateless will do compared to CC HARQ, because the efficiency of rateless does not change with the number of repetitions while the efficiency of the fixed rate code decreases. Furthermore, repetition is less efficient in the high spectral efficiency regime, so the performance will depend on the initial rate after the first transmission. We will now make this comparison assuming that the channel is static. In the trivial case where E_Rateless = E_Fixed, rateless will always perform the same as or better than CC HARQ, so we will only compare the two methods when E_Rateless < E_Fixed. With CC HARQ, the receiver must accumulate a fixed SNR, which depends on the code being used. Assuming a fixed rate code with an initial rate of R_FRInitial (b/2D) and efficiency 0 < E_Fixed < 1, the accumulated SNR required is:

Sum SNR_Fixed = 2^(R_FRInitial / E_Fixed) - 1    (4.1)

Note that for CC HARQ the accumulated SNR needed to decode remains constant once a fixed rate code is selected. It does not depend on the number of repetitions. In contrast to CC HARQ, because the spectral efficiency decreases as the number of repetitions increases, we will see that a good rateless code has the property that the accumulated SNR needed to decode decreases as the number of repetitions increases.
The necessary accumulated SNR to decode a rateless code with an initial rate R_RLInitial (b/2D) is given by:

Sum SNR_Rateless = (2^(R_RLInitial / (E_Rateless * M)) - 1) * M    (4.2)

where M is the number of rateless blocks transmitted and combined at the receiver. Because the efficiency of a rateless code remains constant (recall that we are assuming the rateless efficiency is equal to the lower bound on efficiency) for all rates, the channel SNR necessary to decode decreases as the rate decreases. We now set the initial rates for CC HARQ and rateless to R_FRInitial = R_RLInitial = R_1. We then plot the necessary accumulated SNR at the receiver for each code versus the initial rate in the first transmission, R_1. For both codes we allow up to four repetitions. These curves are similar to the curves in [4], Figure 3, except that we approximate mutual information by log2(1 + SNR) instead of using an exact formula, and we do not assume that the codes are 100% efficient given a particular amount of mutual information.

Figure 4-1: Comparison of CC HARQ to using a rateless code. Here we set the efficiencies to E_Fixed = 0.9 and E_Rateless = 0.8, and allow the transmission of up to four rateless blocks. We plot the sum of the received SNR needed to decode versus the initial rate of the code after the first transmission.

Figure 4-1 compares the two codes with E_Fixed = 0.9 and E_Rateless = 0.8, and Figure 4-2 compares the two codes with an even larger efficiency discrepancy, E_Fixed = 0.9 and E_Rateless = 0.7. Note that there is only one curve for the fixed rate code, regardless of the number of repetitions. For the rateless code, the necessary accumulated SNR decreases with the number of transmissions. With the assumption that E_Rateless < E_Fixed, the fixed rate code always outperforms the rateless code after the first transmission.
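Equations (4.1) and (4.2) carry the whole comparison behind Fig. 4-1. A sketch using the figure's efficiencies (function names ours):

```python
E_FIXED, E_RATELESS = 0.9, 0.8  # efficiencies used for Fig. 4-1

def snr_needed_cc(r1):
    # (4.1): accumulated SNR for CC HARQ; independent of the repeat count.
    return 2 ** (r1 / E_FIXED) - 1

def snr_needed_rl(r1, m):
    # (4.2): accumulated SNR after m combined rateless blocks.
    return (2 ** (r1 / (E_RATELESS * m)) - 1) * m

r1 = 4.0  # initial rate, b/2D
print(round(snr_needed_cc(r1), 1))                             # 20.8, one fixed target
print([round(snr_needed_rl(r1, m), 1) for m in (1, 2, 3, 4)])  # [31.0, 9.3, 6.5, 5.5]
```

At R_1 = 4 b/2D the rateless target is higher for M = 1 but drops below the CC HARQ target for every M >= 2, which is exactly the behavior described in the text.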
Beyond the first transmission, the fixed rate code only performs better in the low rate, low SNR regime. We will develop more insight into Figs. 4-1 and 4-2 with an example. The transmitter sends the first block at some initial rate. Let us suppose that the initial rate is R_1 = 4 b/2D. The total received SNR for a static channel is simply:

Sum SNR_i = SNR_C * M    (4.3)

where SNR_C is the channel SNR and M is the number of received blocks. Therefore, at a given SNR_C the only way to accumulate additional SNR at the receiver is by sending additional blocks. Sending additional blocks decreases the achieved rate of the code, so it is desirable to have a lower necessary total received SNR. In our example the rateless curve for M = 1 is above the CC HARQ curve, which means that the CC HARQ code can be decoded at a lower channel SNR than a rateless code after the first block. However, with R_1 = 4 b/2D, the rateless curves are lower than the CC HARQ curve for all M >= 2, which means that the rateless code can be decoded for M >= 2 with a lower value of Sum SNR_i. If the channel SNR is high enough so that the CC HARQ code can be decoded after one block, then CC HARQ performs better than rateless. However, if the channel SNR is not high enough to decode the CC HARQ code after one block, since the necessary accumulated SNR at the receiver is less for rateless, the rateless code can be decoded with fewer blocks than the CC HARQ code,[1] resulting in a higher rate than CC HARQ. It is clear that in the high rate, high SNR regime, the rateless code performs better than CC HARQ for all M >= 2. In the low rate regime, we zoom in on Fig. 4-2 to look at the curves more closely. The zoomed-in view of Fig. 4-2 is shown in Fig. 4-3. In the low rate regime, a lower SNR is necessary to decode, and CC HARQ outperforms rateless for some values of initial rate. In Fig.
4-3, with E_Fixed = 0.9 and E_Rateless = 0.7, the rateless curves intersect the CC HARQ curve at different points, depending on the number of rateless blocks. As the number of rateless blocks increases, the achieved rate decreases, and the rateless curves intersect the CC HARQ curve at a lower accumulated SNR, or equivalently at a lower initial rate. The one block rateless curve never intersects the CC HARQ curve, because E_Rateless < E_Fixed. The two block rateless curve intersects at R_1 ~= 1.572 b/2D, the three block rateless curve intersects at R_1 ~= 1.042 b/2D, and the four block rateless curve intersects with CC HARQ at R_1 ~= 0.894 b/2D. If the initial rate is less than 0.894 b/2D, then for up to four blocks CC HARQ will perform better than rateless, because the accumulated SNR needed to decode is always less for CC HARQ. With other initial rates, if the rateless curves for some numbers of blocks are below the CC HARQ curve while other rateless curves are above it, then which scheme performs better depends on the channel realization, which dictates the number of blocks needed to decode.

[1] Here we assume that M is not constrained to be an integer. If we constrain M to be an integer, then since the accumulated SNR increases in discrete steps as we increase M, we can say that the rateless code will need at most the same number of blocks as CC HARQ.

Figure 4-2: Comparison of CC HARQ to using a rateless code. Here we set the efficiencies to E_Fixed = 0.9 and E_Rateless = 0.7, and allow the transmission of up to four rateless blocks. We plot the sum of the received SNR needed to decode versus the initial rate of the code after the first transmission.
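The intersection rates quoted above can be recovered by bisecting on the difference of (4.2) and (4.1). A sketch with E_Fixed = 0.9 and E_Rateless = 0.7 (names ours):

```python
def snr_cc(r, e_fixed=0.9):
    # (4.1): accumulated SNR target for CC HARQ.
    return 2 ** (r / e_fixed) - 1

def snr_rl(r, m, e_rateless=0.7):
    # (4.2): accumulated SNR target for a rateless code after m blocks.
    return (2 ** (r / (e_rateless * m)) - 1) * m

def crossover(m):
    # Near r = 0 rateless needs more accumulated SNR; for large r the CC
    # target grows faster. Bisect for the sign change in between.
    f = lambda r: snr_rl(r, m) - snr_cc(r)
    lo, hi = 1e-6, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print([round(crossover(m), 2) for m in (2, 3, 4)])  # close to the quoted 1.572, 1.042, 0.894
```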
Figure 4-3: A zoomed-in comparison of CC HARQ to using a rateless code with efficiencies E_Fixed = 0.9 and E_Rateless = 0.7. We allow the transmission of up to four rateless blocks. We plot the sum of the received SNR needed to decode versus the initial rate of the code after the first transmission.

The reason that CC HARQ is not as good as rateless when more than one block is transmitted in the high SNR regime is the following. In each received block, there is a certain amount of mutual information, which is approximately equal to log2(1 + SNR) b/2D. A rateless code enables decoding of information at a rate that is a constant fraction of the amount of mutual information present between a transmitted signal and a received signal. In contrast, when performing Chase combining, the SNRs of the different blocks add. From the concavity of the capacity (mutual information) curve, we know that mutual information only increases linearly with SNR in the low SNR regime. If a high initial rate is selected, then enough blocks must be combined to achieve a high SNR. However, in the high SNR regime, when performing Chase combining of blocks, the SNRs add linearly but the increase in mutual information with SNR is less than linear. Therefore, when adding SNRs, more blocks must be combined than if we had been able to add the mutual information from each block. Of course, adding the mutual information from each block is what determines the channel capacity, and a rateless code can be decoded at a rate that is a fixed fraction of the channel capacity.
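The concavity argument can be seen with two numbers: combining M equal-SNR blocks by adding SNRs (as MRC does) yields far less mutual information than adding the per-block mutual informations, except at low SNR. A sketch (the SNR values are arbitrary):

```python
import math

def mi_chase(snr, m):
    # MRC adds the SNRs of the m blocks, then converts to information once.
    return math.log2(1 + m * snr)

def mi_sum(snr, m):
    # An ideal incremental scheme adds the per-block mutual informations.
    return m * math.log2(1 + snr)

print(round(mi_chase(8.0, 4), 2), round(mi_sum(8.0, 4), 2))    # high SNR: 5.04 vs 12.68
print(round(mi_chase(0.01, 4), 3), round(mi_sum(0.01, 4), 3))  # low SNR: nearly equal
```

This is why Chase combining loses little in the low rate, low SNR regime but is penalized heavily at high spectral efficiency.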
If we are in the low rate and therefore low SNR regime, then the rate that the channel can support is approximately linear with respect to the SNR, so the rate that the channel can support is approximately 3 dB, a factor of two, less than the initial rate that was selected. Therefore, to achieve the channel capacity we could have either coded at a rate that was half as large to begin with, or we could code at the same initial rate, but repeat twice. For this reason, CC HARQ works well in the low rate, low SNR regime. It is shown in [7] that the rate loss due to repetition goes to zero as the channel SNR goes to zero. To gain more insight, we also plot the achieved rate versus the channel SNR for both rateless and CC HARQ. We still assume that the channel is static for the duration of transmitting one set of message bits, but can change from message to message. The maximum channel SNR, SNRc,,M, is a given parameter and both codes will send blocks until the message is decoded. We ignore the fact that the rates achieved are a discrete set of rates consisting of the initial rate divided by the number of blocks received. To ease the comparison we assume instead that the rates achieved form a continuous curve. The rate achieved with a rateless code when the channel SNR is SNRc is given by: RRateless = ERateless log 2 (1 + SNRc) (4.4) Therefore, the curve for the achieved rate using a rateless code is a scaled version of the capacity curve, as seen in Fig. 4-4. For CC HARQ, once a fixed rate code is selected the receiver must accumulate a certain SNR, which is given by (4.1), to decode successfully. The fixed rate code 118 has an efficiency equal to EFixed at the initial rate, RFRIitaj. Note that the achieved rate is equal to where M is the number of repetitions, and the efficiency decreases with increasing M. If the required received SNR is an integer multiple of the channel SNR, that is if E SNRFixed = M * SNRc, then the fixed rate code will need to be repeated M times. 
The achieved rate as a function of SNR_C for CC HARQ is:

R(SNR_C) = (R_FRInitial / Sum SNR_Fixed) * SNR_C,  if 0 < SNR_C <= Sum SNR_Fixed
R(SNR_C) = R_FRInitial,  if SNR_C > Sum SNR_Fixed    (4.5)

We will consider two choices of initial rate for CC HARQ. We will not have to make this choice for a rateless code, because given our assumptions and the fact that a rateless code maintains its efficiency at all achieved rates, it is always best to set the initial rate to the highest possible rate, corresponding to SNR_CMax. That is, for a rateless code, given SNR_CMax, the achievable rate versus SNR_C curve will be monotonically increasing in the channel SNR interval 0 < SNR_C <= SNR_CMax, and the achievable rate will be equal to the rate given in (4.4). We will now describe the two choices of initial rate for CC HARQ.

Chase Combining with Highest Maximum Rate

One choice of the initial rate for CC HARQ is to have the rate correspond to the highest possible channel SNR, SNR_CMax. This involves setting:

R_FRInitial = E_Fixed * log2(1 + SNR_CMax)    (4.6)

For this value of R_FRInitial, the necessary accumulated SNR at the receiver is:

Sum SNR_Fixed = SNR_CMax    (4.7)

The achieved rate will depend on the channel SNR, SNR_C. The curve corresponding to the achievable rates is found by substituting (4.6) into (4.5) and using the identity Sum SNR_Fixed = SNR_CMax. The equation for achievable rates as a function of the channel SNR with this choice of R_FRInitial for CC HARQ is:

R(SNR_C) = (E_Fixed * log2(1 + SNR_CMax) / SNR_CMax) * SNR_C,  if 0 < SNR_C <= SNR_CMax
R(SNR_C) = E_Fixed * log2(1 + SNR_CMax),  if SNR_C > SNR_CMax    (4.8)

In designing a CC HARQ system, we assume that E_Fixed is a fixed parameter and SNR_CMax is given to the system designer. Therefore, after choosing the initial rate, the actual achieved rate is simply a function of the channel SNR.
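The max-rate design point (4.6)-(4.8) reduces to a two-piece curve. A sketch with the chapter's example numbers (names ours):

```python
import math

E_FIXED, SNR_CMAX = 0.9, 63.0

R_INITIAL = E_FIXED * math.log2(1 + SNR_CMAX)  # (4.6): 0.9 * 6 = 5.4 b/2D

def rate_cc_max(snr_c):
    # (4.8): by (4.7) the accumulated-SNR target equals SNR_CMax, so the
    # achieved rate rises linearly up to SNR_CMax, then saturates.
    return min(R_INITIAL, R_INITIAL * snr_c / SNR_CMAX)

print(round(rate_cc_max(63.0), 2), round(rate_cc_max(31.5), 2))  # 5.4 2.7
```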
Chase Combining with Initial Rate Matched to the Rateless Code

Instead of trying to maximize the maximum achievable rate with CC HARQ, another option is to set the initial rate of the code equal to the maximum rate that a rateless code can achieve. This will result in a lower initial rate than in (4.6), so the necessary accumulated SNR will be less. Also, the achievable rate line has a steeper slope and is greater than the rateless achievable rate curve for a larger range of SNR_C. In the design of CC HARQ, there is a tradeoff between the initial rate and the slope of the achievable rate versus SNR_C line. Now the initial rate will be:

R_FRInitial = E_Rateless * log2(1 + SNR_CMax)    (4.9)

Recall that the only non-trivial scenario is when E_Rateless < E_Fixed, which means that the initial rate selected in (4.9) is less than the initial rate selected in (4.6). Again, the achieved rate depends on the channel SNR. The achieved rate for 0 < SNR_C <= Sum SNR_Fixed is found by substituting (4.9) into (4.5), and for this choice of initial rate it is:

R(SNR_C) = (E_Rateless * log2(1 + SNR_CMax) / Sum SNR_Fixed) * SNR_C    (4.10)

Substituting (4.9) into (4.1), we have that Sum SNR_Fixed = 2^(E_Rateless * log2(1 + SNR_CMax) / E_Fixed) - 1. Plugging this expression for Sum SNR_Fixed into (4.10), we obtain:

R(SNR_C) = (E_Rateless * log2(1 + SNR_CMax) / (2^(E_Rateless * log2(1 + SNR_CMax) / E_Fixed) - 1)) * SNR_C    (4.11)

for 0 < SNR_C <= Sum SNR_Fixed, and R(SNR_C) = E_Rateless * log2(1 + SNR_CMax) for SNR_C > Sum SNR_Fixed.

We will now plot the achieved rate versus the channel SNR for a rateless code with E_Rateless = 0.8 and for both choices of initial rate for CC HARQ, where the efficiency of the initial code is E_Fixed = 0.9. The maximum channel SNR, SNR_CMax, is 63. Since we plot continuous curves instead of discrete (SNR_C, R) pairs, we are essentially allowing for decoding after a non-integer number of blocks. These rates are plotted along with the channel capacity in Figure 4-4.
We see that the rateless code is better than CC HARQ when the channel SNR is low, and the CC HARQ schemes perform better than rateless when the channel SNR is high. Recall that if E_Rateless = E_Fixed, then the rateless code would perform the same as CC HARQ after one block and better everywhere else. The reason that CC HARQ performs better than rateless when SNR_C is high is that in the high-SNR_C regime, CC HARQ codewords can be decoded after M < 2 blocks, where M is no longer constrained to be an integer for our curves. We have already seen that CC HARQ performs better than rateless for M = 1 and E_Rateless < E_Fixed. At the points where R = R_FR,Initial/2, the rateless code achieves a higher rate than either CC HARQ code. Thus, we can view the better performance of a rateless code after two or more blocks for these relatively high initial rates as being a decrease in the required SNR to decode, or an increase in the achievable rate.

Figure 4-4: Comparison of CC HARQ to using a rateless code when the two efficiencies are E_Fixed = 0.9 and E_Rateless = 0.8. SNR_C,Max = 63. Achievable rates are plotted versus the channel SNR. We look at two options for choosing the initial rate for CC HARQ given SNR_C,Max, the maximum possible channel SNR. The first option is transmitting at the maximum rate that SNR_C,Max will allow, and the second option is transmitting at a rate that is matched to the rate that the rateless code obtains at SNR_C,Max.

Next we plot the achieved rate versus the channel SNR for a rateless code with a lower rateless efficiency, E_Rateless = 0.7, and the same initial efficiency for the fixed rate code, E_Fixed = 0.9. The same trend holds: with the initial rates selected for CC HARQ, the rateless code performs better if two or more blocks are needed to decode. However, with E_Rateless = 0.7, the fraction of SNR_C where CC HARQ can achieve a higher rate than the rateless code is larger than it was for E_Rateless = 0.8.

Figure 4-5: Comparison of CC HARQ to using a rateless code when the two efficiencies are E_Fixed = 0.9 and E_Rateless = 0.7. SNR_C,Max = 63. Achievable rates are plotted versus the channel SNR. We look at two options for choosing the initial rate for CC HARQ given SNR_C,Max, the maximum possible channel SNR. The first option is transmitting at the maximum rate that SNR_C,Max will allow, and the second option is transmitting at a rate that is matched to the rate that the rateless code obtains at SNR_C,Max.

From Figs. 4-4 and 4-5, we see that for two reasonable pairs of E_Rateless and E_Fixed, the rateless code gives a higher rate than CC HARQ for all M ≥ 2 for the two choices of initial code rate that we made. However, lowering the initial code rate for CC HARQ increases the range of SNR_C where CC HARQ provides a higher rate than a rateless code. From Figs. 4-1 and 4-2 we saw that lowering the initial code rate can enable CC HARQ to perform better for M ≥ 2, instead of being limited to only performing better when M = 1. The tradeoff with lowering the initial code rate for CC HARQ is that the achieved rate becomes saturated at a lower channel SNR. With CC HARQ there is a tradeoff between wanting a low initial rate for efficiency and a high initial rate to take advantage of when the channel SNR is high. In CC HARQ, to balance this tradeoff a design point is usually selected based on a channel estimate.
In contrast, a rateless code can easily be extended to a higher initial rate without any loss in efficiency. If the channel realization is in fact poor, there is no loss in efficiency with a rateless code when several blocks need to be transmitted. Thus, with a rateless code there is no tradeoff, and a channel estimate is not needed. Finally, recall that if E_Rateless = E_Fixed, rateless performs as well as CC HARQ after the first transmission and better than CC HARQ for all M > 1. We have noted that whether or not a rateless code performs better than CC HARQ depends on the channel realization. In practice, to decide if a rateless code will perform better or not, one could compare rateless to CC HARQ given some criteria and a probability distribution on the channel SNR. For example, if the criterion of importance is to maximize the channel throughput, then the expected channel throughput for rateless and CC HARQ can be calculated numerically, and the scheme with the highest throughput can be selected. Figures 4-4 and 4-5 can provide insight into this process. One can take the difference between a particular CC HARQ rate curve and a rateless curve, then take the expectation of that difference given the prior distribution of the channel SNR. This should be straightforward to implement for a particular channel. One difficulty might be in optimizing CC HARQ for that channel, in other words, deciding which CC HARQ curve to use to compare to rateless. The optimal CC HARQ might be one that we have not considered. It might also be useful to take into account the fact that the achievable rates in both schemes are constrained to be the initial rate divided by the number of blocks received. Our analysis should provide useful intuition into this process.

4.1.2 Rateless versus Incremental Redundancy HARQ

It has been shown in [4] that the theoretical performance of incremental redundancy (IR) HARQ codes is better than that of CC HARQ when the channel is static.
However, it is not trivial to design good incremental redundancy codes. We will compare our rateless code to one good incremental redundancy code that has been described in [4]. Specifically, we will compare the gap to capacity of our rateless code after one and two blocks to the results in Figure 12 in [4]. We will provide enough information from [4] so that the comparison will be self-contained. In [4], Figure 12, the codeword-error rate of various HARQ schemes is shown after two transmissions for a static channel. The smallest error rate shown for all of the schemes occurs where the waterfall curves intersect the plot legend, and is slightly greater than 10^-2. The initial rate of the code is 1.6 b/2D because a rate-0.4 base code is used and then those bits are bit-interleaved and mapped to 16-QAM. This corresponds exactly to our 4-layer code with a time-varying power distribution that had an initial rate of 1.6 b/2D. In [4], various demodulation methods are used, and we will refer to the curves corresponding to log-MAP demodulation. For log-MAP demodulation, the accumulated SNR needed to decode with Chase combining is approximately 4.75 dB. Since this is the accumulated SNR needed in a static channel, it is also equal to the SNR needed to decode after one transmission. The lower limit on the SNR to decode at that spectral efficiency is 10·log10(2^1.6 − 1) = 3.078 dB. Therefore, the gap to capacity is 4.75 − 3.078 = 1.672 dB. With a rate-1/4 turbo code as the mother code, the accumulated SNR needed to decode after two transmissions is approximately 3.5 dB. Since there were two transmissions, the actual channel SNR is a factor of two lower than the accumulated SNR, so the channel SNR is 0.49 dB. After two transmissions, the spectral efficiency drops to 0.8 b/2D, and the lower limit on SNR to decode at that spectral efficiency is 10·log10(2^0.8 − 1) = −1.301 dB. Thus, the gap to capacity after two transmissions is 0.49 + 1.301 = 1.791 dB.
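The dB arithmetic above can be checked with a few lines of Python; `min_snr_db` is our name for the AWGN capacity formula inverted for SNR:

```python
import math

def min_snr_db(spectral_eff):
    """Minimum channel SNR in dB needed to decode at a spectral
    efficiency in b/2D: the AWGN capacity formula solved for SNR."""
    return 10 * math.log10(2 ** spectral_eff - 1)

# 1.6 b/2D after one transmission, 0.8 b/2D after two:
print(round(min_snr_db(1.6), 3))               # 3.078
print(round(min_snr_db(0.8), 3))               # -1.301
print(round(4.75 - min_snr_db(1.6), 3))        # gap after one: 1.672
print(round(3.5 - 3.01 - min_snr_db(0.8), 3))  # gap after two: 1.791
```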
Here the gap to capacity actually increases with the second transmission. In contrast, in our Figure 2-6, the gap to capacity after one transmission is 2.3 dB, but it drops to 1.94 dB with the second transmission. For a rateless or HARQ code to maintain a constant efficiency, it is necessary for the gap to capacity to decrease as the rate decreases. We will not show a rigorous proof of this, but this is why the gap to capacity bound from Fig. 2-6 decreases with the number of blocks even though the corresponding efficiency bound remains constant in Fig. 2-5. To be fair, after the second block, the gap to capacity of our code decreases at a smaller rate that is not enough to maintain a constant efficiency. Still, the efficiency of our code is bounded by (2.6), and our simulation results up to four blocks, summarized in Fig. 2-5, support the fact that our rateless code meets this bound. Although our rateless code has a slightly higher gap to capacity at the same spectral efficiency than the IR HARQ code based on a rate-1/4 mother code in [4], we defined our gap to capacity at a bit-error rate of 10^-4, and we are comparing to IR HARQ at a codeword-error rate of approximately 1.5·10^-2. Because we are comparing bit-error rate to codeword-error rate, we have tried to choose the BER and CER to be somewhat comparable, but it is not an exact comparison. In addition, our code could be made to have a smaller gap to capacity by using a better base code, and also by using a lower rate code on each layer with more layers. In this light, the important result for relative performance is that the performance of our code is similar to the performance of the IR HARQ code. Since the gaps to capacity are close, what is more important is that the gap to capacity of our rateless code decreases with decreasing spectral efficiency while the gap of the IR HARQ code does not.
It is necessary for the gap to capacity to decrease as the spectral efficiency decreases in order to maintain a constant efficiency. While our code does not quite maintain the same efficiency, we have a lower bound on the efficiency of our code, and our simulation results meet that lower bound. Looking beyond performance to how the two codes behave as we send the second block provides more insight than looking at performance alone. The fact that our rateless code behaves properly, with the gap to capacity shrinking as we send the second block, while the IR HARQ code does not behave in this way, is important. We have compared one specific rateless code to a specific IR HARQ code in terms of performance and behavior. Making a more general comparison of the two types of codes, we can say that the initial rate of a rateless code can easily be changed by varying the number of layers per block. In addition, the layered, dithered repetition code from [7] takes advantage of breakthroughs in AWGN codes, since an AWGN code is used as the base code on each layer. In contrast, the design of a new IR HARQ code is a non-trivial problem, and it is not as straightforward to incorporate advances in AWGN code design into the design of IR HARQ codes. One advantage of IR HARQ over rateless codes is that IR HARQ can provide a finer granularity of available rates without needing a large delay. That is, with a rateless code, since the blocklengths are always equal, the transmission of the second rateless block drops the rate by a factor of two. With IR HARQ, the blocklengths of successive transmissions do not have to be equal. Instead, it is possible to transmit only a few parity bits when the initial transmission cannot be decoded, providing a smaller decrease in rate.
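The granularity difference can be illustrated numerically. In the sketch below (Python; the block and increment lengths are purely illustrative, not from [4]), equal-length rateless blocks give the rates R0/M, while short IR parity increments give much finer steps:

```python
def rateless_rates(r_initial, max_blocks):
    """Available rates with equal-length rateless blocks: R0 / M."""
    return [r_initial / m for m in range(1, max_blocks + 1)]

def ir_harq_rates(r_initial, block_len, increment_lens):
    """Available rates when each IR HARQ retransmission may be a short
    parity increment; block_len and increment_lens are illustrative."""
    rates, total = [r_initial], block_len
    for inc in increment_lens:
        total += inc
        rates.append(r_initial * block_len / total)
    return rates

# Equal-length blocks halve the rate on the second transmission...
print(rateless_rates(1.6, 4))                # 1.6, 0.8, 0.53..., 0.4
# ...while 128-symbol parity increments on a 1024-symbol block give
# much smaller rate decreases.
print(ir_harq_rates(1.6, 1024, [128, 128]))  # 1.6, ~1.42, ~1.28
```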
4.1.3 Rateless versus HARQ in Time-Varying Channels

Until this point in this section we have compared rateless codes to HARQ when the channel is static, or quasi-static, meaning that the channel remains the same until all of the information bits from the first block can be decoded. The channel is allowed to change when we send a new group of information bits, but it remains constant for all retransmissions corresponding to a particular group of information bits. We compared rateless to CC HARQ and noted that at high initial rates a rateless code will perform better than CC HARQ for M ≥ 2. Since IR HARQ is better than CC HARQ in theory, we also compared rateless to a good IR HARQ code. We found that the performance of rateless was comparable to IR HARQ, and the behavior was in fact more desirable because the gap to capacity of the rateless code decreased as we transmitted more blocks. It has been shown in [4] that in block fading channels, where the channel SNR changes from block to block, CC HARQ can actually outperform IR HARQ. One advantage of CC HARQ in this situation is that each transmitted codeword is self-decodable. With IR HARQ, each transmission does not have to be self-decodable. Therefore, if the first transmission containing the information bits is badly corrupted, it might be hard to decode the information bits when only parity bits are transmitted in subsequent transmissions. Due to the time-varying nature of fading channels, it is desirable to have codes that are self-decodable. The rateless code with a time-varying power distribution of Section 2.2, originally from [7], has this self-decodable property because all of the information bits are transmitted in every transmission. However, because the power allocated to each layer varies with time and is allocated assuming reception of all of the blocks with the same channel noise, it is not clear that it is well suited for the fading channel.
A code that is well suited for the fading channel is the one simulated in Section 2.3, originally from [19]. This code was designed for fading channels: each block is self-decodable, and the symmetric power distribution was designed assuming a different channel quality for each block. We have already simulated this code in a static channel, and its performance is comparable to the code with a time-varying power distribution. From Fig. 2-19 we see that the gap to capacity decreases as we transmit more blocks, which we have previously stated is necessary to maintain a constant efficiency. While we do not have a bound on the efficiency of this code, in Fig. 2-20 we see that the efficiency remains fairly constant as we transmit up to ten blocks.

4.2 Future Direction - Robust Networking with Rateless Codes

We have discussed the use of rateless codes by single users transmitting to a single receiver. Here we discuss how rateless codes might be used in a mobile ad hoc network (MANET) to provide a robust method of communication. Fundamentally, a rateless code provides an efficient and robust way to communicate over a channel of unknown quality. In addition, rateless codes have the potential of reducing the amount of control overhead and the overhead of tracking the channel. These are highly useful attributes for MANETs.

4.2.1 The Potential of Rateless Codes in Networks

Mobile ad hoc networks have been studied extensively. Often, the approach is to take traditional network protocols and try to adapt them to work in a MANET. One common approach is to take 802.11 and use that for a MANET. This approach has several shortcomings, which are highlighted in [3]. In [3], there are six configurations that lead to performance issues. The performance issues include long-term fairness, short-term fairness, and low aggregate throughput. "Fairness" refers to the situation where one or more users achieve a high throughput at the expense of other users' throughput.
These three issues are caused by two fundamental problems:

1. In 802.11, when two or more users try to access the channel at the same time, a collision occurs. In the case of a collision, nothing is decoded at the receiver and what is received is discarded.

2. Because collisions are very detrimental to the system, the medium access control (MAC) protocol is designed to avoid collisions. This is accomplished through a sophisticated protocol involving random backoffs, where each user waits to use the channel and does not transmit if it knows another user is currently using the channel. The MAC helps avoid collisions but can lead to large backoffs and waiting times for transmitters who want to use the channel.

Interference from other users at the physical (PHY) layer causes collisions. A sophisticated MAC protocol is implemented to try to avoid collisions, which leads to further problems. Rateless codes address all of these problems. With a rateless code, if two or more nodes transmit at the same time, each user sees additional interference, but the packet is not lost. The users do not have a backoff period; instead, they transmit the next block of their rateless code if they have not received an acknowledgment from the destination. Clearly, this cuts down on retransmit time. In addition, the receiver saves what it receives, rather than discarding it. The received information is combined with past receptions, and the receiver attempts to decode using all of the received information. If we adapt a single user rateless code to be used in a network, it would be advantageous to have a code that is designed for time-varying channels and also requires less synchronization. For those reasons, we recommend using the construction from [19] instead of the rateless code with a time-varying power distribution from [7]. A rateless code provides a more robust method for interference management at the PHY layer. This, in turn, simplifies the MAC protocol.
It has been shown that a rateless code for the multiple access channel exists [14]. The multiple access channel is a channel where multiple users attempt to communicate with one base station. In [14], the rateless coding scheme simplifies the MAC protocol: instead of each user waiting its turn to use the channel, all users transmit at the same time after a synchronization beacon from the base station. MANETs are not constrained to have the same base-station-centered topology as the multiple access channel; however, [14] provides one concrete example of a rateless code providing a robust method of communication at the PHY layer, which in turn simplifies the MAC protocol, with the overall result being a fair, high throughput system.

4.2.2 Issues with Rateless Codes in Networks

While a promising start, the work in [14] focuses on the multiple access channel and not MANETs, and also allows for the use of retransmissions with varying blocklengths. As future work, if a particular rateless code calls for retransmissions with varying blocklengths, it would be fruitful to consider the loss due to the use of constant blocklength retransmissions. Another issue with using a rateless code in a multiple user environment is the problem of per-layer dithering. In order for all of the interference that a particular layer of a particular user sees to add independently when maximal ratio combining is used, every layer of every user must have a different dithering sequence. If multiple users have the same set of dithering sequences, then the interference from other users will be correlated when combining is done across blocks. If a more elegant solution besides giving every layer of every user a different dither sequence is not found, then the number of dither sequences will grow linearly with the number of users. One user has M_Max · L dither sequences, where M_Max is the maximum number of transmitted blocks for one set of information bits, and L is the number of layers per block.
M_Max · L could already be a large number. If the number of dither sequences grows with the number of users, U, to be M_Max · L · U, then this could pose a problem, as U could be quite large. If, instead, we take advantage of spatial reuse and make U the number of users in a certain area, or cell, of the system, then the number of dithering sequences will be smaller and more manageable. However, the spatial reuse approach increases the amount of control overhead needed to coordinate and allocate dither sequences among the users in the cell. If the dither sequences are not reused in different cells, then each user can simply be allocated a certain set of dither sequences, different from the dither sequences of other users, that it will always use. In addition, with rateless networking, the receiver needs to know who is sending information in order to decode that information. The receiver will need to know which dither sequences to use to decode what it is receiving. The MAC protocol for 802.11 provides this knowledge by letting the destination node know to expect a message from a particular emitter node. With a rateless code, a mechanism must be in place to provide this information to the receiver, or to remove the need for this information.

Chapter 5

Conclusions

In this thesis we analyze and simulate rateless codes for the single-input single-output additive white Gaussian noise (AWGN) channel from [7] and [19]. Both rateless codes take advantage of and depend on using good low rate codes, and therefore for our simulations we use a good, rate-1/5 low-density parity-check (LDPC) code as the base code. In addition to rateless codes, we construct a code that provides both rateless and unequal error protection properties. Finally, we compare rateless codes to two common forms of hybrid automatic repeat request (HARQ) and explain how rateless codes can be used to improve performance in mobile ad hoc networks (MANETs).
We begin Chapter 2 by looking at the layered, dithered repetition code from [7]. We incorporate the gap to capacity of the base code into the lower bound on efficiency for the code from [7] and show that the revised lower bound is still maximized by having the rate per layer go to zero. Our simulation results for up to four blocks meet the information theoretic lower bound, and for rates between 1/5 and 4/5 (b/1D) the efficiency of the rateless code is approximately 0.69 to 0.74. Since the code from [7] has a time-varying power distribution, we look at the effect of having a finite precision digital-to-analog (D/A) converter at the transmitter, where the signal points cannot be transmitted with arbitrary precision. We found that decreasing the mean-squared error of the D/A conversion does not monotonically increase the rateless code's performance. Although we were unable to discern any clear rules that relate D/A performance to the bit-error rate performance of the code, we saw that our four layer code performs well with modest D/A resolution. Specifically, comparing 5-bit D/A precision to no D/A quantization, after one block there is no loss in performance, and with four blocks there is an increase of 0.16 dB in the gap to capacity of the code. We continue Chapter 2 by looking at the sub-block structured rateless code from [19]. We examine the design of a base code for each sublayer that is efficient in the time-varying interference channel that the code will see. We found that coding each sub-block within a sublayer with a different constant rate code results in a large loss in the achievable rate. We also simulated a rate-1/5 LDPC code that is good for the AWGN channel and found that, in some regimes, the code also performs well in this specific time-varying channel. For up to four sub-blocks per sublayer, the bit-error rate performance of the code is not affected down to a bit-error rate of 10^-5.
However, the bit-error rate performance is affected for more than four sub-blocks, and there is an error floor at about 10^-5 for eight sub-blocks. Also, the frame-error rate performance decreases for more than one sub-block. For more than one sub-block, the relationship between the frame-error rate and the bit-error rate changes, indicating that frame errors are more common with more sub-blocks, but when a frame error occurs the number of bit errors decreases with more sub-blocks. This behavior suggests that a subset of the information bits is decoded in error more often than the other information bits. Finally, when we construct and simulate a four layer rateless code using the LDPC base code, we obtain a code with an efficiency of approximately 0.66 at a bit-error rate of 10^-4. More importantly, we can accurately predict the rateless code efficiency using the mutual information efficiency and the base code efficiency. To conclude Chapter 2, we describe two modifications to the rateless protocol to mitigate or even eliminate rate loss due to inefficiencies in the acknowledgment protocol. The first relies on accurate channel state information at the receiver, and involves using predictive acknowledgments (PACKs) at the receiver. The predictive acknowledgments can be made more robust if they are modified to be soft predictive acknowledgments. The second method is to interleave the rateless blocks from different groups of information bits. This relies on the transmitter having knowledge of the acknowledgment delay time, so that it can send blocks corresponding to other groups of information bits while it waits to receive an acknowledgment on a particular group of information bits. In Chapter 3, we create and analyze rateless unequal error protection (RUEP) codes. RUEP codes allow for the prioritization of a bitstream, and if the repetition of one or more sets of bits is allowed, then RUEP is more efficient than traditional unequal error protection (UEP).
In addition, in contrast to rateless codes, RUEP can accommodate two different constraints on the number of transmissions allowed for different sets of bits, and one set of bits can be decoded before, or at least no later than, another set of bits. Finally, with one or more delay constraints on the bitstream, RUEP can allow for a larger range of available rates than a rateless code can. Chapter 4 compares rateless codes to two forms of HARQ and also explains how rateless codes can address various issues in mobile ad hoc networks. First, we compare rateless codes to Chase combining HARQ (CC HARQ) and then to incremental redundancy HARQ (IR HARQ). It is seen that, even with the assumption that the initial efficiency after one block of the rateless code is less than the efficiency of a fixed rate code, for more than one block CC HARQ is only better than rateless if the initial rates are low. In comparing rateless to IR HARQ, we see that for one and two blocks the performance of the rateless code is similar to the performance of an IR HARQ scheme from [4]. While the performance is similar, the gap to capacity of the rateless code decreases as the number of blocks increases, while the gap to capacity of the IR HARQ code actually increases as the number of blocks increases. To maintain a constant efficiency, it is desirable for the gap to capacity to decrease as the number of blocks increases. Therefore, while the performance of rateless and IR HARQ is similar, the rateless code behaves better than the IR HARQ code because its gap to capacity decreases as the spectral efficiency decreases. One direction for future work is laid out in the conclusion of Chapter 4, where we discuss using rateless codes in mobile ad hoc networks. We note that rateless codes are efficient and robust in channels with an unknown amount of interference. This provides a reliable method of communicating at the physical layer, which in turn could simplify the medium access control (MAC) layer.
However, there are issues, including assigning pseudorandom dither sequences to different users and knowing which users are transmitting so that the receiver can decode. In addition to using rateless codes in networks, there are other directions for future work. These include further development of low rate base codes for the rateless code in [19]; the base code must be efficient in a time-varying channel. Also, simulations of RUEP codes would be worthwhile, especially if the use of predictive acknowledgments to maintain two constant streams of bits is tested under practical system constraints.

Appendix A

Proofs

Here we prove that log(1+x)/x is monotonically decreasing in x for all x > 0. This is used in proving the bound (3.56) in Sec. 3.3.6. We begin by taking the derivative:

    d/dx [log(1+x)/x] = (x/(1+x) − log(1+x)) / x²      (A.1)

The derivative will be negative for all x > 0 if:

    x / (x²(1+x)) < log(1+x) / x²      (A.2)

which can be rearranged to:

    1/(x+1) < log(1+x)/x      (A.3)
    x/(x+1) < log(1+x)      (A.4)

It is well known that the bound (A.4) is true for x > 0. Therefore, the derivative of log(1+x)/x is negative for all x > 0, and the expression log(1+x)/x is monotonically decreasing for all x > 0.

Appendix B

Code Simulation Details

To make the exposition in the main body more compact, we have left the details of the code simulation to this appendix. The simulations were run in MATLAB¹ and used the Iterative Solutions Coded Modulation Library (ISCML), version cml.1.6.2, which can be downloaded from the Iterative Solutions website.² The ISCML is free software that is licensed under the GNU Lesser General Public License. Our simulation scripts use several functions from the ISCML, but were created to construct and decode rateless codes, whereas the ISCML includes scripts to simulate other codes.
We have two different scripts to simulate the codes from [7] and [19] separately, but we will discuss in parallel how the two codes are simulated, since they are simulated in a similar way. In our simulations, we varied the channel SNR and, at each channel SNR, ran tens of thousands of coding/decoding iterations. One coding/decoding iteration at an SNR point consists of the following. For a fixed rateless code power distribution, and at a certain channel noise level, we generate L codewords from randomly generated data bits for the code in [7], and L · L_Sub codewords for the rateless code in [19], where L is the number of layers in the rateless code, and L_Sub is the number of sublayers in the rateless code from [19]. The codewords are modulated to binary phase shift keying (BPSK) symbols using the mapping {0, 1} → {+1, −1}.

¹MATLAB is a registered trademark of The MathWorks, Inc. For MATLAB product information, please contact: The MathWorks, Inc., 24 Prime Park Way, Natick, MA, 01760-1500, USA. Tel.: 508-647-7000; Fax: 508-647-7001; E-mail: info@mathworks.com; Web: http://www.mathworks.com.
²The Iterative Solutions website is: http://www.iterativesolutions.com/

For the code in [7], we also generate L · M pseudorandom sequences (Bernoulli with parameter 1/2) that are the same length as the codeword, where M is the number of blocks that we will send and use in decoding the information bits. For the code in [19], we generate L · M · L_Sub pseudorandom sequences (Bernoulli with parameter 1/2) that are the same length as the codeword. The pseudorandom sequences are the dithering sequences and are also mapped to BPSK symbols with the same mapping as the codewords. For each rateless code, we multiply each of the L or L · L_Sub codewords by M different dithering sequences. Each codeword is multiplied by a unique set of dithering sequences. We then scale and superimpose the dithered codewords as described in [7] and [19].
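A minimal sketch of the dithering step just described, in Python rather than our MATLAB scripts (the power values, the seed, and the helper names are illustrative):

```python
import random

def bpsk(bits):
    """Map {0, 1} -> {+1, -1}, the mapping used for codewords and dithers."""
    return [1 - 2 * b for b in bits]

def dithered_blocks(codeword_bits, num_blocks, powers, seed=0):
    """One codeword repeated over M blocks, each block multiplied by its
    own Bernoulli(1/2) dither sequence and scaled by that block's power.
    The power values and the seed are illustrative, not from the thesis."""
    rng = random.Random(seed)
    symbols = bpsk(codeword_bits)
    blocks = []
    for m in range(num_blocks):
        # A fresh dither sequence per block, same length as the codeword.
        dither = bpsk([rng.randint(0, 1) for _ in symbols])
        blocks.append([powers[m] ** 0.5 * s * d
                       for s, d in zip(symbols, dither)])
    return blocks

# Each layer gets its own unique set of dither sequences; the layers'
# blocks are then superimposed symbol-by-symbol before noise is added.
```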
We add randomly generated Gaussian noise to each symbol, with a variance corresponding to the SNR that we are simulating. At the receiver, we use maximal ratio combining (MRC). After MRC, we attempt to decode a particular codeword by entering the log-likelihood of each bit in the codeword into an iterative decoder. At the output of the decoder, we check the information bits for errors. If there is an error on one or more of the information bits, we declare and count a frame error. We also keep track of the number of bits that are decoded in error. We use the frame and bit error counters to calculate the frame-error and bit-error rates. We have described one coding/decoding iteration for a particular number of blocks, M, at a particular SNR. We run more coding/decoding iterations with the same M and SNR until a maximum number of frame errors is reached, or a maximum number of iterations has been run. For example, to simulate the rateless code from [7], the maximum number of frame errors is typically around 10, and the number of iterations around 10,000. We found that these parameters give consistent results and smooth waterfall curves in the region of interest. We then increment the channel SNR, and run coding/decoding iterations at the next specified channel SNR. After we are done with all of the specified channel SNRs, we can change M and specify a new range of channel SNR to simulate.

Bibliography

[1] R. Calderbank and N. Seshadri. Multilevel codes for unequal error protection. IEEE Transactions on Information Theory, 39:1234-1248, July 1993.

[2] D. Chase. A combined coding and modulation approach for communication over dispersive channels. IEEE Transactions on Communications, 21:159-174, Mar. 1973.

[3] Claude Chaudet. Performance issues with IEEE 802.11 in ad hoc networking. IEEE Communications Magazine, pages 110-116, July 2005.

[4] Jung-Fu (Thomas) Cheng. Coding performance of hybrid ARQ schemes. IEEE Transactions on Communications, 54:1007-1029, June 2006.
[5] S.-Y. Chung, G. D. Forney, T. Richardson, and R. Urbanke. On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit. IEEE Communications Letters, 5:58-60, Feb. 2001.

[6] Thomas M. Cover. Broadcast channels. IEEE Transactions on Information Theory, IT-18:2-14, Jan. 1972.

[7] U. Erez, G. W. Wornell, and M. D. Trott. Faster-than-Nyquist coding: The merits of a regime change. In Proceedings of the 42nd Annual Allerton Conference, October 2004.

[8] O. Etesami, M. Molkaraie, and A. Shokrollahi. Rateless codes on symmetric channels. In Proceedings of IEEE International Symposium on Information Theory (ISIT 2004), page 38, Chicago, Illinois, June 2004.

[9] European Telecommunications Standards Institute. ETSI EN 302 307 v1.1.2 (2006-06): Digital Video Broadcasting (DVB); Second generation framing structure, channel coding, and modulation systems for Broadcasting, Interactive Services, News Gathering and other broadband satellite applications.

[10] Jeongseok Ha, Jaehong Kim, Demijan Klinc, and Steven W. McLaughlin. Rate-compatible punctured low-density parity-check codes with short block lengths. IEEE Transactions on Information Theory, 52:728-738, Feb. 2006.

[11] K. P. Ho. Unequal error protection based on OFDM and its application in digital audio transmission. In Proceedings of IEEE Global Telecommunications Conference 1998 (GLOBECOM '98), volume 3, pages 1320-1324, Sydney, Australia, Nov. 1998.

[12] J. Hou, Paul H. Siegel, and Laurence B. Milstein. Performance analysis and code optimization of low density parity-check codes on Rayleigh fading channels. IEEE Journal on Selected Areas in Communications, 19:924-934, May 2001.

[13] M. Luby. LT codes. In Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 271-280, Vancouver, BC, Canada, Nov. 2002.

[14] U. Niesen, U. Erez, D. Shah, and G. W. Wornell. Rateless codes for the Gaussian multiple access channel.
In Proceedings of IEEE Global Telecommunications Conference (GLOBECOM), 2006.

[15] Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck. Discrete-Time Signal Processing. Prentice-Hall, 2nd edition, 1999.

[16] R. Palanki and J. S. Yedidia. Rateless codes on noisy channels. In Proceedings of IEEE International Symposium on Information Theory (ISIT 2004), page 37, Chicago, Illinois, June 2004.

[17] N. Rahnavard and F. Fekri. Finite-length unequal error protection rateless codes: design and analysis. In Proceedings of IEEE Global Telecommunications Conference 2005 (GLOBECOM '05), volume 3, 2005.

[18] T. J. Richardson, A. Shokrollahi, and R. Urbanke. Design of capacity-approaching irregular low-density parity-check codes. IEEE Transactions on Information Theory, 47:619-637, Feb. 2001.

[19] M. Shanechi. Universal codes for parallel Gaussian channels. Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, June 2006.

[20] Jerome M. Shapiro, Richard J. Barron, and Gregory W. Wornell. Practical layered rateless codes for the Gaussian channel: Power allocation and implementation. In Proceedings of IEEE Signal Processing Advances in Wireless Communications Conference, Helsinki, Finland, June 2007.

[21] A. Shokrollahi. Raptor codes. In Proceedings of IEEE International Symposium on Information Theory 2004 (ISIT 2004), page 36, 2004.

[22] A. Shokrollahi and R. Storn. Design of efficient erasure codes with differential evolution. In Proceedings of IEEE International Symposium on Information Theory, page 5, June 2000.

[23] T. W. Sun, R. D. Wesel, M. R. Shane, and K. Jarett. Superposition turbo TCM for multirate broadcast. IEEE Transactions on Communications, 53(3):368-371, Mar. 2004.

[24] X. Wang and M. T. Orchard. Design of superposition coded modulation for unequal error protection. In Proceedings of IEEE International Conference on Communications, pages 412-416, June 2001.

[25] L. Wei. Coded modulation with unequal error protection.
IEEE Transactions on Communications, 41:1439-1449, Oct. 1993.

[26] Wang Yafeng, Zhang Lei, and Yang Dacheng. Performance analysis of Type III HARQ with turbo codes. In Proceedings of IEEE Vehicular Technology Conference, pages 2740-2744, April 2003.