LDPC Options for Next Generation Wireless Systems

T. Lestable* and E. Zimmermann#

*Advanced Technology Group, Samsung Electronics Research Institute, UK
#Technische Universität Dresden, Vodafone Chair Mobile Communications Systems

Abstract—Low-Density Parity-Check (LDPC) codes have recently drawn much attention due to their near-capacity error correction performance, and are currently in the focus of many standardization activities, e.g., IEEE 802.11n, IEEE 802.16e, and ETSI DVB-S2. In this contribution, we discuss several aspects related to the practical application of such codes to wireless communications systems. We consider flexibility, memory requirements, encoding and decoding complexity, and different variants of decoding algorithms for LDPC codes that make it possible to effectively trade error correction performance for implementation simplicity. We conclude that many of what have been considered significant disadvantages of LDPC codes (inflexibility, high encoding complexity, etc.) can be overcome by appropriate use of recently developed algorithms and strategies – making LDPC codes a highly attractive option for forward error correction in B3G/4G systems.

Index Terms—LDPC, Belief Propagation, Bit Flipping, Scheduling, Complexity, TGnSync, 4G.

INTRODUCTION

Recently, many standards proposals, namely TGnSync [15][27] and WWiSE [28] for IEEE 802.11n, together with IEEE 802.16e [14], have considered LDPC coding schemes as a key component of their system features. The adoption by such standardization activities demonstrates the increasing maturity of LDPC-related technology, in particular the affordable joint complexity of encoder and decoder implementation. From sub-optimal lower-complexity decoding algorithms [16] to completely flexible architecture designs [6][7][8], pragmatic and realistic implementation solutions make LDPC codes increasingly attractive as an enhancement of current (B3G) or next generation wireless systems (4G) [29].

The aim of this paper is thus to present and evaluate a non-exhaustive set of solutions that reduce the overall encoding/decoding complexity. The first part presents basic properties of LDPC codes, together with the message-passing principle. We then tackle the encoder complexity issue, assessing the hardware (HW) requirements in terms of dimensioning, relying on the Block-LDPC approach. The decoder side is kept for the final part, as it represents the most resource-hungry element of the joint design. We review and evaluate the performance/complexity trade-offs of sub-optimal low-complexity decoding algorithms, and highlight common trends in the architecture of such LDPC codes by evaluating their HW requirements.

LDPC Codes: Fundamentals

In this part, we briefly introduce the basic principles and notation of LDPC codes; for further reading see [1][16][17]. LDPC codes are linear block codes whose parity-check matrix H has the favourable property of being sparse, i.e., it contains only a low number of non-zero elements. Tanner graphs of such codes are bipartite graphs containing two different kinds of nodes, code (bit, or variable) nodes and check nodes. An (n, k) LDPC code is thus represented by an m x n parity-check matrix H, where m = n - k is the number of redundancy (parity) bits of the coding scheme. We can then distinguish regular from irregular LDPC codes, depending on the degree distributions of the code nodes (column weights) and the check nodes (row weights). A regular scheme means that these distributions are constant along columns and rows, and is usually represented by the notation (dv, dc). For such a code, the number of non-zero elements is given either by n*dv or m*dc, leading to the code rate relation Rc = 1 - (m/n) = 1 - (dv/dc).
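As a quick illustration of this relation, the following sketch builds a small (dv, dc) = (2, 4) regular parity-check matrix (a toy example of our own, not a code from any of the cited proposals) and verifies the edge count and the resulting code rate:

```python
import numpy as np

# Toy (dv, dc) = (2, 4) regular parity-check matrix of an (n, k) = (8, 4) code.
H = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 1, 1],
])

m, n = H.shape
dv = H.sum(axis=0)   # column weights (bit node degrees)
dc = H.sum(axis=1)   # row weights (check node degrees)

assert np.all(dv == 2) and np.all(dc == 4)   # regular (2, 4) code
assert H.sum() == n * 2 == m * 4             # number of edges counted both ways
print("Rc =", 1 - m / n, "= 1 - dv/dc =", 1 - 2 / 4)   # both evaluate to 0.5
```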
The decoding of LDPC codes relies on the Belief-Propagation Algorithm (BPA) framework extensively discussed in the literature [19]. It involves two major steps, the check node update and the bit node update (Fig. 1): intrinsic values from the channel first feed the bit nodes (parents), whose extrinsic information is processed and forwarded to the check nodes (children); these in turn produce new extrinsic information based on the parity-check constraints and feed it back to their connected bit nodes.

Figure 1: Message-Passing Illustration – Check Node Update followed by Bit Node Update

The way of switching between bit and check node updates is referred to as scheduling, and will be discussed later on, as it can have a significant impact on decoder complexity.
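To make the message-passing principle concrete, the following is a minimal, unoptimized sketch of sum-product decoding under the flooding schedule, operating on a dense 0/1 parity-check matrix and channel log-likelihood ratios (LLRs). Function and variable names are ours for illustration; a practical decoder works edge-wise on a sparse representation and in fixed point.

```python
import numpy as np

def bp_decode(H, llr_ch, max_iter=50):
    """Flooding-schedule sum-product decoding (sketch).
    H: (m, n) binary parity-check matrix, llr_ch: channel LLRs (positive = bit 0)."""
    m, n = H.shape
    edges = (H == 1)
    msg_vc = np.zeros((m, n))                                 # bit-to-check messages
    msg_cv = np.zeros((m, n))                                 # check-to-bit messages
    msg_vc[edges] = np.broadcast_to(llr_ch, (m, n))[edges]    # initialize with intrinsic LLRs

    for _ in range(max_iter):
        # check node update ("box-plus" via the tanh rule)
        for c in range(m):
            vs = np.flatnonzero(edges[c])
            t = np.tanh(0.5 * msg_vc[c, vs])
            for i, v in enumerate(vs):
                prod = np.clip(np.prod(np.delete(t, i)), -0.999999, 0.999999)
                msg_cv[c, v] = 2.0 * np.arctanh(prod)

        # bit node update and tentative hard decision
        llr_post = llr_ch + msg_cv.sum(axis=0)
        hard = (llr_post < 0).astype(int)
        if not np.any(H @ hard % 2):                          # all parity checks satisfied
            break
        for v in range(n):
            cs = np.flatnonzero(edges[:, v])
            for c in cs:
                msg_vc[c, v] = llr_ch[v] + msg_cv[cs, v].sum() - msg_cv[c, v]
    return hard
```

The decoder stops as soon as the syndrome is zero, which is also what keeps the average number of iterations well below the maximum in practice.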
Joint Design Methodology

With parallelization holding the promise of keeping delays low while data rates continue to increase, the major attraction from a design point of view is that LDPC codes are inherently well suited to fully parallel processing. Nevertheless, a fully parallel implementation is prohibitive for large block lengths. Consequently, a strong trend goes towards semi-parallel architectures [6][8], with Block-LDPC being the centrepiece of this approach. Another important practical issue when dealing with coding schemes for adaptive air interfaces is flexibility in terms of block sizes and code rates. When designing LDPC codes we therefore have to keep in mind the direct and strong relation between the structure of the parity-check matrix and the total encoding, decoding and implementation complexity. Indeed, a completely random LDPC code might achieve better performance, but at the expense of a very complex interconnection (shuffle) network that may be prohibitive for large block lengths in terms of HW wiring, together with potentially high encoding complexity, a low achievable parallelization level, and – most importantly – low flexibility in terms of block sizes and code rates. The sequel therefore intends to highlight and assess the most relevant performance/HW-requirement trade-offs.

Random-Like LDPC

One typical way of constructing good LDPC codes is to take a degree distribution that promises good error correction performance [17] (e.g., obtained by EXIT chart curve matching of the variable and check node decoder transfer curves) as a starting point, and then use, e.g., the progressive edge growth (PEG) algorithm [18] to ensure a good distance spectrum, i.e., low error floors. Codes constructed following this framework usually come very close to the bounds of what is achievable in terms of error correction performance [17]. They are hence considered as the baseline for performance assessment. The disadvantage of this approach for practical implementation is that a new code needs to be designed for each block length and code rate, leading to the above-mentioned low flexibility. Moreover, the obtained codes are often non-systematic, thus requiring appropriate preprocessing to enable near linear-time encoding [2].

Structured LDPC (Block-LDPC)

Structured (Block-)LDPC codes, on the other hand, such as the Pi-Construction codes [15] or the LDPC array codes [14] proposed in the framework of IEEE 802, have been shown to offer good performance and, at the same time, high flexibility in terms of code rates and block sizes. The parity-check matrix H of such codes can be seen as an array of square sub-block matrices. These sub-block matrices are obtained by circular shift and/or rotation of the identity matrix, or are all-zero matrices. The parity-check matrix is hence fully determined by these circular shifts and the sub-block dimension p. LDPC codes defined by such standards rely on the concept of a base model matrix introduced by Zhong and Zhang [7], which contains the circular shift values. Figure 2 shows the base model matrix Hb with (Mb, Nb) = (12, 24) for the Rc=1/2 LDPC code defined in TGnSync [27].

Figure 2: Base model matrix for TGnSync Rc=1/2 [27]

Adaptation to different block lengths can, e.g., be done by expanding the elements of the base matrix (e.g., by replacing each "1" in the parity-check matrix by an identity matrix and each "0" by an all-zero matrix). Different code rates are obtained by appending more elements to the matrix in only one dimension (i.e., adding more variable nodes, but no check nodes). Note that the decoder must be flexible enough to support such changes of the code structure. Using such base matrices hence adds flexibility in terms of packet length while maintaining the degree distributions of H. Indeed, for a given block length N, the expansion factor p (denoted Zf in the standards) is obtained through p = N/Nb. As N must therefore be a multiple of Nb, the maximum achievable granularity is bounded by the size of the base model matrix. In the case of the TGnSync LDPC code the block length is hence scalable in steps of 24 bits – it is not very probable that a higher granularity could be required. Another interesting aspect is that the whole expansion process is independent of the circular shift values, leading to many different possible LDPC code designs [10][11][12] with different performance, but all capable of being mapped onto the same semi-parallel decoding architecture.
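The expansion step itself is simple enough to sketch: every non-negative entry of the base model matrix is replaced by a cyclically shifted Zf x Zf identity matrix, every -1 entry by an all-zero block. The toy base matrix below is hypothetical and only illustrates the mechanics; it is not the TGnSync matrix of Figure 2.

```python
import numpy as np

def expand_base_matrix(Hb, Zf):
    """Expand a base model matrix of cyclic-shift values into the full parity-check
    matrix H: entry -1 -> Zf x Zf all-zero block, entry s >= 0 -> identity matrix
    cyclically shifted by s columns."""
    Mb, Nb = Hb.shape
    H = np.zeros((Mb * Zf, Nb * Zf), dtype=np.uint8)
    I = np.eye(Zf, dtype=np.uint8)
    for i in range(Mb):
        for j in range(Nb):
            if Hb[i, j] >= 0:
                H[i*Zf:(i+1)*Zf, j*Zf:(j+1)*Zf] = np.roll(I, Hb[i, j], axis=1)
    return H

Hb = np.array([[0, 2, -1, 1],      # hypothetical 2 x 4 base matrix
               [3, -1, 0, 2]])
H = expand_base_matrix(Hb, Zf=4)   # block length N = Nb * Zf = 16
print(H.shape)                     # (8, 16)
```

Changing Zf changes the block length without touching the degree distribution, which is exactly the flexibility argument made above.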
Encoding Complexity

One considerable challenge for the application of LDPC codes in practical wireless systems has long been encoding complexity (it is in fact a still quite widespread misconception that this remains an open issue). If the parity-check matrix is in non-systematic form, straightforward encoding methods destroy (or do not exploit) the sparse nature of the matrix, leading to an encoding complexity that is quadratic in the block length. However, in their well-known paper [2], Richardson and Urbanke presented several pre-processing techniques to transform H into an approximate lower triangular (and thus approximately systematic) form, so that the quadratic part of the encoding complexity applies only to a small fraction of the block length n. The resulting format of H is given below (Fig. 3) for the Block-LDPC after expansion by the factor p (=Zf):

H = [ A  B  T ]
    [ C  D  E ]

where A is of size (M-g) x (N-M), B is (M-g) x g, T is an (M-g) x (M-g) lower triangular matrix, the blocks C, D and E have g rows, and M = m*p, g = gamma*p, N = n*p.

Figure 3: Approximate lower triangular H to facilitate near linear-time encoding

The Block-LDPC codes considered in this paper [27] have exactly the requested format, which first enables an accurate estimate of the encoding complexity, and then allows the pipelined processing described hereafter (Fig. 4) to be exploited. Writing a codeword as (s, p1, p2), with s the information bits and p1, p2 the two parity parts, the encoder computes, following [2] (all operations over GF(2)):

(1) T^-1 A s^T (by back-substitution),
(2) p1^T = Phi^-1 (E T^-1 A s^T + C s^T), with Phi = E T^-1 B + D,
(3) p2^T = T^-1 (A s^T + B p1^T) (by back-substitution).

Figure 4: Pipelined encoder structure

The remaining complexity comes from the inversion of two matrices. Fortunately, due to the triangular nature of the matrices involved in steps (1) and (3), these steps can be solved by back-substitution, considerably decreasing the number of operations. Step (2) then involves only matrix-vector multiplications, enabling the use of dedicated techniques (cf. the colouring problem described in [7]) for efficient architectures. Nevertheless, the complexity of step (2) is still O(g^2), where g is proportional to the expansion factor; it is thus recommended to fix g = p = Zf. Alternatively, one may construct the parity-check matrix in systematic form right away [15] and achieve fully linear-time encoding.
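The sketch below spells out steps (1)-(3) over GF(2), assuming the blocks A, B, T, C, D, E of Figure 3 and a precomputed dense inverse of Phi are available as 0/1 NumPy arrays (the names and the dense representation are ours; a real encoder would exploit the block-circulant structure and sparsity instead):

```python
import numpy as np

def solve_lower_gf2(T, b):
    """Solve T x = b over GF(2) by back-substitution
    (T lower triangular with all-ones diagonal)."""
    x = np.zeros_like(b)
    for i in range(len(b)):
        x[i] = (b[i] + T[i, :i] @ x[:i]) % 2
    return x

def ru_encode(A, B, T, C, D, E, phi_inv, s):
    """Richardson-Urbanke style encoding of the information bits s, assuming H
    is already in the approximate lower triangular form of Figure 3."""
    As = A @ s % 2
    y  = solve_lower_gf2(T, As)                       # step (1): T^-1 A s^T
    p1 = phi_inv @ ((E @ y + C @ s) % 2) % 2          # step (2): dense g x g part
    p2 = solve_lower_gf2(T, (As + B @ p1) % 2)        # step (3): back-substitution
    return np.concatenate([s, p1, p2])                # codeword (s, p1, p2)
```

Only the multiplication by Phi^-1 is dense, which is why its dimension g is kept as small as possible (g = Zf above).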
Encoder HW requirements

Taking all these HW requirements into account, we follow the estimations given in [7] and apply them to the TGnSync LDPC proposal. The total number of logical gates (NAND) is given in Fig. 5, as a function of both the block length (indexed by its expansion factor Zf) and the code rate Rc. Dimensioning therefore requires around 11K gates for the worst case, here the rate Rc=1/2 code of codeword length 2304 bits (Zf=96).

Figure 5: Number of Logical Gates for TGnSync Encoder w.r.t. Block length (Zf) and Code Rate (Rc)

Figure 6 depicts the memory requirements (ROM, in bits) for storing the counter initialization values. This amount is directly related to the number of non-zero blocks in H, underlining the sparsity differences among the available coding schemes. Here, Rc=3/4 is the worst case.

Figure 6: ROM width (bits) for TGnSync Encoder w.r.t. Block length (Zf) and Code Rate (Rc)

The register storage is related to the pipelined structure (Fig. 4) used for encoding. The code rate ranking is respected here (Fig. 7), and Rc=0.5 is once again the worst case, requiring around 15.6 Kbits of memory (1 Kbit = 1024 bits).

Figure 7: Register width (bits) for TGnSync Encoder w.r.t. Block length (Zf) and Code Rate (Rc)

As a result, to implement a fully pipelined encoder working with the whole range of coding schemes available in TGnSync, we need 11K gates together with around 16.2 Kbits of memory. Note that for a random-like LDPC code the required amount of memory would be significantly higher, even if the sparsity of the check matrix is retained for encoding, as will be highlighted later when discussing interleaver complexity.

Decoding Complexity

On the receiver side, there are mainly three different topics to investigate: how to decrease the decoder complexity, how to increase efficiency by means of generic architectures, and how to achieve high throughput. We start by considering decoding complexity. As is obvious from the structure of the message-passing process, the average complexity of LDPC decoding is the product of three factors:

• the node complexity,
• the average number of iterations, and
• the number of nodes active in each iteration.

In the following, we discuss how these three factors can be minimized using state-of-the-art algorithms.

Node Complexity – Sub-Optimal Decoding

The standard algorithm for decoding LDPC codes is the so-called belief propagation algorithm (BPA), also known as the sum-product algorithm [1][19]. For implementation simplicity, it is convenient to execute the algorithm in the log-domain, turning the typically required multiplications into simple additions (e.g., at the bit nodes). Following this path, however, has a significant disadvantage: calculating the check node messages then requires the non-linear "box-plus" operation. This drawback can be overcome by applying the max-Log approximation known from Turbo decoding, resulting in the well-known MinSum algorithm [16]. Unfortunately, introducing this approximation results in a typical performance loss of 0.5-1 dB. Several proposals aim at reducing this offset by introducing correction terms in the calculation of the check node messages [20][21]. However, the most efficient method by far is a simple scaling of the check node messages [16], which recovers close-to-optimal error correction performance. We will refer to this algorithm as the corrected MinSum algorithm. Calculating variable and check node messages then only involves simple sum and minimum operations, respectively, plus a scaling of the check node messages. It is conjectured that no further substantial reduction in node complexity is possible without accepting quite significant performance losses.
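For illustration, the corrected (normalized) MinSum check-node rule can be sketched as below; it would replace the tanh-based update in the earlier belief propagation sketch. The scaling factor alpha is an assumption on our part (values in the range 0.7-0.9 are typical in the literature; the optimum depends on the code).

```python
import numpy as np

def corrected_minsum_check_update(incoming, alpha=0.8):
    """Corrected (normalized) MinSum check node update: each outgoing
    check-to-bit message is the product of the signs and the minimum magnitude
    of all *other* incoming bit-to-check messages, scaled by alpha."""
    incoming = np.asarray(incoming, dtype=float)
    out = np.empty_like(incoming)
    for i in range(len(incoming)):
        others = np.delete(incoming, i)
        out[i] = alpha * np.prod(np.sign(others)) * np.abs(others).min()
    return out

print(corrected_minsum_check_update([2.1, -0.4, 1.3, -3.0]))   # [0.32, -1.04, 0.32, -0.32]
```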
In this context, bit-flipping algorithms can be considered a viable solution for low-end terminals, where some loss in performance is acceptable if large savings in decoding complexity can be achieved. Such algorithms, based on majority-logic decoding, have recently received increased interest from the research community [3][4][5]. Proposals for weighted bit-flipping [5][22] show reasonable performance at extremely low implementation complexity. A drawback of such approaches is the absence of reliability information (soft output) at the decoder output. However, since low-end terminals will most probably not use iterative equalisation and/or channel estimation techniques anyway, this appears to be no significant limitation.

Convergence Speed – Scheduling

It has been shown that the classical method of message passing (called flooding), which consists in updating all the nodes on one side of the Tanner graph before going into the next half-iteration, leads to the highest memory requirements, together with a higher number of iterations (delay). Alternative schedulings of the message passing [23][24][25][13], also known as "shuffled" or "layered" belief propagation, in fact yield faster convergence, thus reducing the average and maximum number of decoder iterations while retaining performance. The methods proposed by Mansour [30] and by Zhang and Fossorier [13] update all bit nodes connected to each check node (horizontal scheduling), or all check nodes connected to each bit node (vertical scheduling), respectively. A speed-up factor of 2 in the average number of iterations is usually achieved [13]. Moreover, if we denote by Γ the number of non-zero elements of H and by q the number of quantization bits, we obtain the following memory requirements (in bits):

• Flooding: Γ·q + 3·n·q
• Horizontal: Γ·q + n·q
• Vertical: Γ·q + (2-Rc)·n·q

If we consider the case of 4 quantization bits (q=4), we can compare the sensitivity of the memory requirement w.r.t. the scheduling process in TGnSync (Fig. 8). The amount of memory saved by applying alternative scheduling emphasizes that it is important to investigate such design alternatives.

Figure 8: Memory Requirement w.r.t. Scheduling algorithm (e.g. q=4 bits)

Number of Nodes – Forced Convergence

The "forced convergence" [26] or "sleeping node" [9] approach aims at reducing the third factor – the number of active nodes. The basic idea is to exploit the fact that a large number of variable nodes converges to a strong belief after very few iterations, i.e., these bits have already been reliably decoded and one may skip updating their messages in subsequent iterations. In order to identify such nodes, "aggregate messages" at the variable and check nodes are compared with threshold values. Choosing the thresholds appropriately allows computational complexity to be traded against decoding performance. Decoding complexity can be lowered by around 20% at a target FER of 1%, for both message-passing and MinSum decoding, at negligible losses in error correction performance [26].

Decoder Architecture

Many recent architecture proposals [6][8] rely more or less on the same trend, namely a generic semi-parallel architecture in which the degree of parallelism can be tuned depending on throughput and HW requirements. When using the Block-LDPC codes from TGnSync, these architectures converge towards the scheme shown in Fig. 9.

Figure 9: Generic Semi-Parallel Architecture for LDPC Decoder (VNUs and CNUs with their memory banks, connected through shuffle/inverse-shuffle networks and a control unit)

The decoder thus consists of two sets of processing units, the Variable Node Units (VNU) and the Check Node Units (CNU), which are connected through an interleaving (shuffle) network. Each processing unit takes care either of a subset of code nodes (columns) or of a subgroup of check nodes (rows). In our case (TGnSync), the number of VNUs and CNUs is given by the dimensions of the respective base model matrices. That means the number of VNUs is fixed to Nb=24, while the number of CNUs varies with the code rate Rc as m = Nb·(1-Rc):

• Rc=1/2: #CNUs=12
• Rc=2/3: #CNUs=8
• Rc=3/4: #CNUs=6
• Rc=5/6: #CNUs=4

The size of the memory banks (MB) attached to the processing units (avoiding memory access conflicts), which store the messages of each edge within each column or row slice, is given by the number of edges in each slice – due to the Block-LDPC construction this is exactly the expansion factor Zf – and their total number is simply the number of non-zero elements within the base model matrix (often denoted as P). We then only need to add some memory to store the channel messages, together with the hard decisions: this requires 3·Nb more memory banks of size Zf.
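The sizing rules above are easy to turn into a small helper; the number of quantization bits q and the number of non-zero base-matrix blocks P used in the example call are placeholders of ours, not figures from the TGnSync proposal:

```python
NB = 24   # number of columns of the TGnSync base model matrices

def num_cnus(Rc, Nb=NB):
    """Number of check node units = number of base model matrix rows, Nb*(1-Rc)."""
    return round(Nb * (1 - Rc))

def message_memory_bits(P, Zf, q, Nb=NB):
    """Rough message-memory estimate: one bank of Zf messages per non-zero base
    matrix block, plus 3*Nb banks of size Zf for channel values and hard
    decisions, assuming q bits per stored value."""
    return (P + 3 * Nb) * Zf * q

for Rc in (1/2, 2/3, 3/4, 5/6):
    print(f"Rc={Rc:.2f}: #CNUs={num_cnus(Rc)}")      # 12, 8, 6, 4

print(message_memory_bits(P=88, Zf=96, q=4))         # P and q are hypothetical values
```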
Decoder HW requirements

Applying the memory estimation technique used above to all the coding schemes available in the TGnSync proposal results in the requirements shown in Fig. 10.

Figure 10: Memory Requirement for TGnSync decoder w.r.t. Block length (Zf) and Code Rate (Rc)

It is worth noticing that the rate-3/4 code requires the largest amount of memory, around 53.7 Kbits. Considering the logic complexity itself, the authors of [7] evaluate the code node and check node complexity as equivalent to 250 and 320 NAND gates, respectively (this can nevertheless vary depending on the position of the shuffle network, which impacts the complexity of the node units). In our case we apply the shuffle positioning proposed by Zhong in his thesis [6], leading to the complexity evaluation shown in Fig. 11.

Figure 11: Number of Logical Gates for TGnSync Decoder w.r.t. Block length (Zf) and Code Rate (Rc)

Obviously, this figure is totally independent of the block length, and in order to be fully compliant with the TGnSync proposal the decoder requires around 40K gates, which is quite affordable for an FPGA.

Interleaver Complexity

This last point can lead to prohibitive complexity for the implementation of LDPC codes, in terms of wiring (routing problems) but also in terms of memory consumption. To serve as a reference, a randomly generated LDPC code requires log2(N·dv)·N·dv bits for storing addresses. For instance, in our case with Zf=96 (2304 bits), code rate Rc=1/2 and an average bit node degree of 3.4, we would need around 6.13*10^7 bits, i.e., around 10 Mbytes of memory! Fortunately, using Block-LDPC codes allows the interleaver relation to be determined directly from the base model matrix and the circular shift values. For the current case, we need to store 12*24 cyclic shift values ranging from -1 to 46, leading to a memory requirement of 1728 bits, which represents 0.28% of the memory consumed in the random case.

Decoding Performance

The following two items can be regarded as central when considering the application of LDPC codes to wireless communications systems, i.e., their implementation in a real-time system:

• How much error correction performance is lost by replacing true belief propagation with a reduced-complexity variant?
• How much is performance degraded if we prefer structured over random-like LDPC codes?

To answer these two questions, we evaluate the performance of the above-mentioned Block-LDPC codes, as well as several random-like codes of comparable block length from [31], for different decoding algorithms and block lengths. All of the random-like LDPC codes were constructed using the PEG algorithm [18], and many of them are in fact considered to be the best available codes for the considered block lengths [31].

Figure 12: Comparison between different decoding algorithms, for a Block-LDPC of length N=288 (FER vs. Eb/N0; BPA, MinSum and corrected MinSum, 20-100 iterations)

Figure 13: Comparison between different decoding algorithms, for a random-like LDPC of length N=2048 (FER vs. Eb/N0; BPA, MinSum and corrected MinSum, 20-100 iterations)

As can be observed from Figures 12 and 13, the difference between standard belief propagation and MinSum decoding is in the order of 0.5-1 dB; the losses tend to be higher at larger block lengths. Increasing the (maximum) number of iterations for MinSum decoding obviously reduces this gap (especially for short codes), at the expense of lower savings in decoding complexity. Using the corrected MinSum algorithm is clearly the most promising option, especially for larger block lengths (cf. the results in Figure 13). The SNR loss w.r.t. true BPA decoding is usually below 0.2 dB – which is quite acceptable when considering a practical implementation of LDPC codes.

Figure 14: Comparison between performance of structured and random-like LDPC codes (BPA, 50 it.; random-like codes of length N=504, 1024, 2048 vs. structured codes of length N=576, 1152, 1728)

Figure 14 illustrates that the loss in performance incurred by going from random-like LDPC to Block-LDPC codes is not substantial (below 0.2 dB at a target FER of 5%). Note that the exhibited differences are partly due to the slightly different block sizes. These results clearly motivate the use of Block-LDPC instead of random-like LDPC codes.

Figure 15: Performance Comparison between BPA and WBFA (FER vs. Eb/N0 for block lengths 288, 576, 1152 and 1728)

From Figure 15 it can be seen that using the low-complexity Weighted Bit-Flipping Algorithm (WBFA) [5] instead of "standard" algorithms for LDPC decoding results in a performance loss of 5-6 dB, which is quite significant. Nevertheless, this degradation should be weighed against the fact that bit-flipping algorithms require neither message-passing storage nor complex processing units. They might thus be suitable for very low-end terminals.
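For completeness, the basic weighted bit-flipping rule can be sketched as follows (this is the plain weighted bit-flipping scheme; the modified algorithm of [5] additionally weights each bit's own channel reliability, which we omit here). The function name and the simple termination rule are ours:

```python
import numpy as np

def wbf_decode(H, y, max_iter=50):
    """Weighted bit-flipping sketch. H: (m, n) binary parity-check matrix,
    y: received real-valued samples (BPSK mapping 0 -> +1, 1 -> -1)."""
    m, n = H.shape
    hard = (y < 0).astype(int)                       # initial hard decisions
    rel = np.abs(y)                                  # channel reliabilities
    # weight of each check = smallest reliability among its connected bits
    w = np.array([rel[H[i] == 1].min() for i in range(m)])
    for _ in range(max_iter):
        syndrome = H @ hard % 2
        if not syndrome.any():                       # all parity checks satisfied
            break
        # flipping metric: failed checks contribute +w, satisfied checks -w
        E = (((2 * syndrome - 1) * w)[:, None] * H).sum(axis=0)
        hard[np.argmax(E)] ^= 1                      # flip the most suspicious bit
    return hard
```

Only hard decisions and one real weight per check are stored, which is where the complexity advantage over message passing comes from.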
Conclusions

After their invention in the early sixties by Gallager, LDPC codes sank into oblivion until they were rediscovered in the mid-nineties. Their recent introduction into standards shows that many of what have been considered serious challenges for the practical implementation of LDPC codes have been efficiently tackled by recent research advances. Their adoption by the implementation-oriented community (and eventually their industry take-off) is facilitated by increasing research activity aiming to formalize the joint framework of the LDPC encoding/decoding architecture. In this paper, we reviewed the keystone elements of such a formal process. We are convinced that the combination of implementation-oriented code design (Block-LDPC) and sub-optimal decoding algorithms (corrected MinSum decoding) will make LDPC codes a viable option for next generation wireless systems.

Acknowledgements

This work has been performed in the framework of the IST project IST-2003-507581 WINNER, which is partly funded by the European Union. The authors would like to acknowledge the contributions of their colleagues, although the views expressed are those of the authors and do not necessarily represent the project.

REFERENCES

[1] R. G. Gallager, "Low-Density Parity-Check Codes," Cambridge, MA: MIT Press, 1963.
[2] T. Richardson and R. Urbanke, "Efficient Encoding of Low-Density Parity-Check Codes," IEEE Trans. on Info. Theory, Vol. 47, No. 2, Feb. 2001.
[3] L. Bazzi, T. J. Richardson, and R. Urbanke, "Exact Thresholds and Optimal Codes for the Binary-Symmetric Channel and Gallager's Decoding Algorithm A," IEEE Trans. on Info. Theory, Vol. 50, No. 9, Sept. 2004.
[4] P. Zarrinkhat and A. H. Banihashemi, "Threshold Values and Convergence Properties of Majority-Based Algorithms for Decoding Regular Low-Density Parity-Check Codes," IEEE Trans. on Comm., Vol. 52, No. 12, Dec. 2004.
[5] J. Zhang and M. P. C. Fossorier, "A Modified Weighted Bit-Flipping Decoding of Low-Density Parity-Check Codes," IEEE Comm. Letters, Vol. 8, No. 3, March 2004.
[6] T. Zhang, "Efficient VLSI Architectures for Error-Correcting Coding," Ph.D. Thesis, University of Minnesota, July 2002.
[7] H. Zhong and T. Zhang, "Block-LDPC: A Practical LDPC Coding System Design Approach," IEEE Trans. on Circ. and Syst. I: Reg. Papers, Vol. 52, No. 4, April 2005.
[8] F. Guilloud, "Generic Architecture for LDPC Codes Decoding," Ph.D. Thesis, ENST Paris, July 2004.
[9] R. Bresnan, "Novel Code Construction and Decoding Techniques for LDPC Codes," M.Eng.Sc. Thesis, National University of Ireland, September 2004.
[10] B. Vasic, E. M. Kurtas, and A. V. Kuznetsov, "LDPC Codes Based on Mutually Orthogonal Latin Rectangles and Their Application in Perpendicular Magnetic Recording," IEEE Trans. on Magn., Vol. 38, No. 5, Sept. 2002.
[11] O. Milenkovic, I. B. Djordjevic, and B. Vasic, "Block-Circulant Low-Density Parity-Check Codes for Optical Communication Systems," IEEE Journal of Selected Topics in Quantum Electronics, Vol. 10, No. 2, March/April 2004.
[12] B. Vasic and O. Milenkovic, "Combinatorial Constructions of Low-Density Parity-Check Codes for Iterative Decoding," IEEE Trans. on Info. Theory, Vol. 50, No. 6, June 2004.
[13] J. Zhang and M. Fossorier, "Shuffled Iterative Decoding," IEEE Trans. on Comm., Vol. 53, No. 2, pp. 209-213, Feb. 2005.
[14] IEEE 802.16e, "LDPC coding for OFDMA PHY," IEEE Doc. C802-16e-05/066r3, January 2005.
[15] IEEE 802.11n, "Structured LDPC Codes as an Advanced Coding Scheme for 802.11n," IEEE Doc. 802.11-04/884r0, August 2004.
[16] J. Chen and M. Fossorier, "Near Optimum Universal Belief Propagation Based Decoding of Low-Density Parity Check Codes," IEEE Trans. on Comm., Vol. 50, No. 3, pp. 406-414, Mar. 2002.
[17] T. J. Richardson, M. A. Shokrollahi, and R. Urbanke, "Design of Capacity-Approaching Irregular Low-Density Parity-Check Codes," IEEE Trans. on Info. Theory, Vol. 47, No. 2, pp. 617-637, Feb. 2001.
[18] X.-Y. Hu, E. Eleftheriou, and D. M. Arnold, "Regular and Irregular Progressive Edge-Growth Tanner Graphs," submitted to IEEE Trans. on Info. Theory, 2003.
[19] D. J. C. MacKay and R. M. Neal, "Near Shannon Limit Performance of Low-Density Parity-Check Codes," Electron. Lett., Vol. 32, pp. 1645-1646, August 1996.
[20] T. Clevorn and P. Vary, "Low-Complexity Belief Propagation Decoding by Approximations with Lookup-Tables," in Proc. 5th International ITG Conference on Source and Channel Coding, Erlangen, Germany, pp. 211-215, January 2004.
[21] G. Richter, G. Schmidt, M. Bossert, and E. Costa, "Optimization of a Reduced-Complexity Decoding Algorithm for LDPC Codes by Density Evolution," in Proc. IEEE International Conference on Communications (ICC 2005), Seoul, Korea, March 2005.
[22] Y. Kou, S. Lin, and M. Fossorier, "Low-Density Parity-Check Codes Based on Finite Geometries: A Rediscovery and New Results," IEEE Trans. on Info. Theory, Vol. 47, pp. 2711-2736, Nov. 2001.
[23] F. R. Kschischang and B. J. Frey, "Iterative Decoding of Compound Codes by Probabilistic Propagation in Graphical Models," IEEE Journal on Select. Areas in Comm., pp. 219-230, 1998.
[24] Y. Mao and A. H. Banihashemi, "Decoding Low-Density Parity-Check Codes with Probabilistic Scheduling," IEEE Comm. Letters, Vol. 5, pp. 415-416, Oct. 2001.
[25] E. Yeo, B. Nikolic, and V. Anantharam, "High Throughput Low-Density Parity-Check Decoder Architectures," in Proc. IEEE Globecom 2001.
[26] E. Zimmermann, W. Rave, and G. Fettweis, "Forced Convergence Decoding of LDPC Codes – EXIT Chart Analysis and Combination with Node Complexity Reduction Techniques," in Proc. 11th European Wireless Conference (EW 2005), Nicosia, Cyprus, Apr. 2005.
[27] TGnSync proposal: http://www.tgnsync.org/home
[28] WWiSE proposal: http://www.wwise.org/
[29] WINNER (IST FP6): https://www.ist-winner.org/
[30] M. Mansour, "Turbo-Decoder Architectures for Low-Density Parity-Check Codes," in Proc. IEEE Globecom 2002.
[31] D. J. C. MacKay, "Encyclopedia of Sparse Graph Codes." Available online at http://wol.ra.phy.cam.ac.uk/mackay/codes/data.html