© 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. doi: http://dx.doi.org/10.1109/ICC.2009.5199493

Stochastic Decoding of LDPC Codes over GF(q)

Gabi Sarkis, Shie Mannor and Warren J. Gross
Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada H3A 2A7
Email: gabi.sarkis@mail.mcgill.ca, shie.mannor@mcgill.ca, warren.gross@mcgill.ca

Abstract—Nonbinary LDPC codes have been shown to outperform currently used codes for magnetic recording and several other channels. Currently proposed nonbinary decoder architectures have very high complexity for high-throughput implementations and sacrifice error-correction performance to maintain realizable complexity. In this paper, we present an alternative decoding algorithm, based on stochastic computation, that has a very simple implementation and minimal performance loss compared to the sum-product algorithm. We demonstrate the performance of the algorithm applied to a GF(16) code and provide details of the hardware resources required for an implementation.

I. INTRODUCTION

Low-Density Parity-Check (LDPC) codes are linear block codes that can achieve performance close to the Shannon limit under iterative decoding. Binary LDPC codes have received much interest and are specified in recent wireless and wireline communications standards, for example digital video broadcast (DVB-S2), WiMAX wireless (IEEE 802.16e) and 10 gigabit Ethernet (IEEE 802.3an). Nonbinary LDPC codes defined over q-ary Galois fields (GF(q)) were introduced in [1] and were shown to perform better than binary codes of equivalent bit length for additive white Gaussian noise (AWGN) channels.
In [2], Song et al. showed that GF(q) LDPC codes significantly outperform binary LDPC and Reed-Solomon (RS) codes for the magnetic recording channel. Chen et al. [3] demonstrated that LDPC codes over GF(16) perform better than RS codes for general channels with noise bursts, making GF(q) LDPC codes a candidate to replace RS coding in many storage systems. Djordjevic et al. concluded that nonbinary LDPC codes achieve lower BER than other codes while allowing higher transmission rates when used with the fiber-optic channel [4].

LDPC codes over GF(q) are defined such that the elements of the parity-check matrix H are elements of GF(q). As in the binary case, these codes are decoded by the sum-product algorithm (SPA) applied to the Tanner graph representation of H. Unfortunately, the nonbinary values of H result in very high complexity for the check node updates in the graph, presenting a significant barrier to practical realization. The only hardware implementation in the literature is fully serial, consisting of only one variable node and one check node [5].

There have been a number of approaches in the literature to reduce the complexity of the check node update. MacKay et al. proposed using the fast Fourier transform (FFT) to convert convolution to multiplication in the check nodes [6]. Song et al. use the log domain to replace multiplication with addition [2]. Declercq et al. introduced the extended min-sum (EMS) algorithm as an approximation to the SPA that computes likelihood values for only a subset of the field elements, thus reducing the number of computations performed [7]. While these approaches are simpler than a direct implementation of the SPA, there is a need to further reduce the complexity for practical decoder implementations.

Recently, a new approach to decoding binary LDPC codes based on stochastic computation ([8], [9]) was introduced in [10].
Stochastic decoders use random bit-streams to represent probability messages and result in simple node hardware and reduced wiring complexity. Subsequently, area-efficient fully-parallel high-throughput decoders with performance close to the SPA were demonstrated in field-programmable gate arrays (FPGAs) [11], [12]. We realized that the complexity benefits of stochastic decoding might be even greater for nonbinary LDPC codes and could result in a practical decoder implementation. In this paper, we present a generalization of stochastic decoding to LDPC codes over GF(q). The algorithm has significantly lower hardware complexity than other nonbinary decoding algorithms in the literature.

II. SUM-PRODUCT DECODING

A. Notation

Since most digital systems transmit data using 2^p symbols, the focus in current research is on codes defined over GF(2^p). In this section, we describe the SPA for decoding LDPC codes over GF(2^p); it should be noted, however, that the SPA works on any field GF(q) with minor modifications to notation and channel likelihood calculations.

The elements of GF(2^p) can be represented as powers of the primitive element \alpha, or as polynomials; the latter form is used in this section, so that the polynomial i(x) = \sum_{l=1}^{p} i_l x^{l-1}, where the i_l are binary coefficients, represents an element of GF(2^p).

The notation used in this section for representing internode messages is similar to that of [7]; namely, U and V represent messages heading in the direction of check nodes and variable nodes, respectively, and the subscripts denote the source and destination nodes. For example, U_{xy} is a message from node x to node y. All messages are probability mass function (PMF) vectors indexed by GF(2^p) elements. Fig. 1a shows this notation applied to a Tanner graph.

B. Algorithm

While nonbinary codes can also be decoded using the SPA on Tanner graphs, the check node update is modified because the elements of H are nonbinary.
Fig. 1: Stochastic decoder graphs with X and X^{-1} denoting forward and inverse permutation operations. (a) message labels, (b) message propagation with EMs added to the decoder.

Therefore, the check constraint for a check node of degree d_c is:

\sum_{k=1}^{d_c} h_k i_k(x) = 0, (1)

where h_k is the element of H with indices corresponding to the check and variable nodes of interest. This is different from the binary case, where the check constraint is \sum_{k=1}^{d_c} i_k(x) = 0. To accommodate this change, Davey et al. [1] assigned values from H as labels to the edges connecting variable and check nodes and integrated the multiplication into the check node functionality. Declercq et al. [7] introduced a third node type, called the permutation node, which connects variable and check nodes and performs the multiplication, as shown in Fig. 1a; this reverts the check node constraint to \sum_{k=1}^{d_c} j_k(x) = 0, where j_k(x) = h_k i_k(x) is the permuted symbol. While the two approaches are functionally equivalent, the one in [7] results in simpler equations and implementation, since all check nodes of the same degree are identical.

The first step in the SPA is computing the channel likelihood vector L_v[i(x)] for each variable node v, based on the channel model and modulation scheme. The outgoing message from variable node v to permutation node z is given by:

U_{vz} = L_v \times \prod_{p=1, p \neq z}^{d_v} V_{pv}, (2)

where \times is the term-by-term product of vectors and d_v is the variable node degree. Normalization is needed so that \sum_{a \in GF(2^p)} U_{vz}[a] = 1.

Permutation nodes implement multiplication by an element of H when passing messages from variable to check nodes, and multiplication by the inverse of that element in the other direction. As shown in [7], both the multiplication and the multiplication by inverse can be performed using cyclic shifts of the positions of the values in a message vector, excluding the value indexed by 0.
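As a concrete illustration of the index mapping a permutation node performs, the following minimal Python sketch (not from the paper, which targets hardware; `gf_mul` and `permute` are illustrative names) multiplies message indices by a constant field element in GF(16), built from the primitive polynomial x^4 + x + 1:

```python
# Illustrative sketch only: GF(16) arithmetic in the polynomial basis,
# with the primitive polynomial x^4 + x + 1 (0b10011).

def gf_mul(a, b, poly=0b10011, p=4):
    """Multiply two GF(2^p) elements in the polynomial basis."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # add (XOR) the current shifted copy of a
        b >>= 1
        a <<= 1
        if a & (1 << p):    # reduce modulo the primitive polynomial
            a ^= poly
    return r

def permute(pmf, h):
    """Permutation node acting on a PMF vector: out[a] = in[a * h]."""
    return [pmf[gf_mul(a, h)] for a in range(len(pmf))]
```

Applying `permute` with h and then with h^{-1} restores the original vector, mirroring the forward and inverse permutation nodes; the entry at index 0 never moves, matching the cyclic-shift description above.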
The parity-check constraint no longer includes multiplication by elements of H; therefore, the check node update equation is the convolution of the incoming messages, as shown in [7]:

V_{ct} = \circledast_{p=1, p \neq t}^{d_c} U_{pc}. (3)

This convolution represents a significant computational challenge in implementing nonbinary LDPC decoders.

III. STOCHASTIC DECODING

A message in the SPA for LDPC codes over GF(q) is a vector containing the probabilities of each of the q possible symbols. Stochastic decoding uses streams of symbols chosen from GF(q) to represent these messages; the number of occurrences of a symbol in a stream divided by the total number of symbols observed in the stream gives the probability of that symbol. The advantage of this method of message passing lies in the simple circuitry required to manipulate the stochastic streams to reflect likelihood changes, as presented in Section III-D. Stochastic decoding of binary LDPC codes results in simple hardware structures; the reader is referred to [8], [9], [10], [11], [12] for details on binary stochastic decoding algorithms and their implementation.

Notation similar to that of the SPA is used when describing the stochastic decoding message updates, the difference being that messages are serial stochastic streams instead of vectors; thus, an index t denotes the location of a symbol within a stream, and the stream name is overlined, e.g. \bar{U}_{vp}(t).

A. Node Equations

Winstead et al. [13] presented a stochastic decoding algorithm that uses streams of integers instead of the conventional binary streams. In that work, an integer stream encodes the probabilities of the states in a trellis, leading to a demonstration of trellis decoding of a (16,11) Hamming code and a turbo product decoder built from the Hamming component decoders. However, that work did not interpret the integers as finite field symbols and did not utilize GF(q) arithmetic. In this section, we present the node equations for a stochastic decoder for LDPC codes over GF(q).
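Before moving to the node equations, the stream-probability correspondence just described can be sketched in software (a hypothetical Python model, not part of the paper's hardware; both function names are ours):

```python
# Hypothetical software model of stochastic streams over GF(q): a PMF is
# encoded as the empirical symbol frequencies of a long random stream.
import random
from collections import Counter

def stream_from_pmf(pmf, n, rng):
    """Draw a stochastic stream of n GF(q) symbols distributed per pmf."""
    return rng.choices(range(len(pmf)), weights=pmf, k=n)

def pmf_from_stream(stream, q):
    """Estimate the PMF: occurrences of each symbol / stream length."""
    counts = Counter(stream)
    return [counts[a] / len(stream) for a in range(q)]
```

Estimation precision grows with stream length, which is the basic accuracy/latency trade-off of stochastic decoding.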
Taking the view that the nonbinary streams are composed of finite field elements, we present message update rules that are much simpler than those derived from a straightforward application of the rules in [13]. In particular, the trellis representation of the convolution in the check node reduces to Galois field addition. Section III-E demonstrates the performance of the stochastic algorithm when decoding a (256,128)-symbol LDPC code over GF(16).

Variable Node: A stochastic variable node of degree d_v takes as input d_v stochastic streams from permutation nodes in addition to one generated from the channel likelihood values. In [13], the output of a node is updated if its inputs satisfy some constraint; otherwise, the output remains unchanged from the previous iteration. To implement this constraint on an output message stream at time t, we copy the input symbol to the output symbol if the input symbols on all the other incoming edges are equal at time t. For a stochastic variable node with output \bar{U}_{vp} and inputs \bar{V}_{iv}, we propose the following update rule:

\bar{U}_{vp}(t) = a, if \bar{V}_{iv}(t) = a \; \forall i \neq p; otherwise \bar{U}_{vp}(t) = \bar{U}_{vp}(t-1). (4)

Using equation (4) and assuming the inputs are independent, the PMF of the output is:

P[\bar{U}_{vp}(t) = c] = \prod_i P[\bar{V}_{iv}(t) = c] + \left(1 - \sum_{a \in GF(q)} \prod_i P[\bar{V}_{iv}(t) = a]\right) P[\bar{U}_{vp}(t-1) = c]. (5)

As in [13], if the stochastic streams are assumed to be stationary, then P[\bar{U}_{vp}(t) = c] = P[\bar{U}_{vp}(t-1) = c], and the PMF of \bar{U}_{vp}(t) becomes:

P[\bar{U}_{vp}(t) = c] = \frac{\prod_i P[\bar{V}_{iv}(t) = c]}{\sum_{a \in GF(q)} \prod_i P[\bar{V}_{iv}(t) = a]}. (6)

Equation (6) is identical to the normalized output of a sum-product variable node; therefore, equation (4) is a valid update rule for the stochastic variable node.
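Equation (4) amounts to a q-ary equality check with memory, and can be sketched in a few lines of Python (our illustrative model, not the paper's circuit):

```python
# Sketch of the variable node update of equation (4): the output edge
# copies the common input symbol when all other inputs agree at time t,
# and otherwise repeats its previous output symbol.

def variable_node_update(prev_out, other_inputs):
    """other_inputs: symbols at time t on all edges except the target
    edge (including the channel stream). Returns the new output symbol."""
    first = other_inputs[0]
    if all(v == first for v in other_inputs):
        return first        # all inputs agree: forward the symbol
    return prev_out         # disagreement: hold the previous output
```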
Permutation Node: The function of the permutation node is to remove the multiplication by elements of H from the check node constraint. In the sum-product algorithm, this is achieved by a cyclic shift of the message vector elements, as in Section II-B. Here, we demonstrate that multiplying the stochastic stream from a variable node to a check node by an element of H accomplishes the same result. Assuming a permutation node p corresponding to h = \alpha^i, the permutation node output message in an SPA decoder is defined such that each element of the message vector is given by: U_{pc}[a] = U_{vp}[a \cdot \alpha^i], \forall a \in GF(q). When, in a stochastic decoder, the permutation node multiplies all elements of the input stream by h, the output PMF becomes: P[\bar{U}_{pc}(t) = a] = P[\bar{U}_{vp}(t) = a \cdot \alpha^i]. The SPA and stochastic output PMFs are identical, and since multiplication is closed on GF(q) and the multiplicative group of GF(q) is cyclic, the stochastic permutation node operation is equivalent to that of the SPA. Similarly, it can be shown that for messages passed from check nodes to variable nodes, the inverse permutation node operation is multiplication by h^{-1}. It should be noted that h \neq 0, since a value of 0 in H signifies the lack of a connection between a variable node and a check node; therefore, there are no permutation nodes with multiplier h = 0.

Check Node: When deriving the stochastic update message for a check node, a degree-three node is considered and the result is then generalized to a check node of any degree. Let \bar{U}_{1c} and \bar{U}_{2c} be the node inputs, which are assumed to be independent, and \bar{V}_{cp} its output. From equation (3), the output of such a node when using the SPA is given by:

P[V_{cp} = z | U_{1c}, U_{2c}] = \sum_{x \oplus y = z} P[U_{1c} = x] P[U_{2c} = y], (7)

where \oplus is GF(q) addition. In the stochastic node, we define the output as the GF(q) addition of the inputs, i.e. \bar{V}_{cp}(t) = \bar{U}_{1c}(t) \oplus \bar{U}_{2c}(t). The PMF of the output is computed as:

P[\bar{V}_{cp}(t) = z] = P[\bar{U}_{1c}(t) \oplus \bar{U}_{2c}(t) = z] = \sum_{x \oplus y = z} P[\bar{U}_{1c}(t) = x] P[\bar{U}_{2c}(t) = y]. (8)

The PMFs (7) and (8) are identical; therefore, GF(q) addition is a valid update for a degree-three stochastic check node.
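The equality of (7) and (8) can be checked numerically: XOR-ing two independent streams symbol by symbol reproduces the GF(q) convolution of their PMFs. A small Python sketch under our own naming (the paper itself gives only the derivation):

```python
# check_node_out: stochastic check node for GF(2^p) in the polynomial
# basis, where field addition is bitwise XOR. conv_pmf: the SPA
# convolution of equation (7), for comparison.

def check_node_out(u1, u2):
    return u1 ^ u2  # GF(2^p) addition of two stream symbols

def conv_pmf(p1, p2, q=16):
    out = [0.0] * q
    for x in range(q):
        for y in range(q):
            out[x ^ y] += p1[x] * p2[y]  # sum over pairs with x + y = z
    return out
```

Feeding two long independent streams with PMFs p1 and p2 through `check_node_out` yields an output stream whose empirical PMF converges to `conv_pmf(p1, p2)`, which is exactly the claim made by equations (7) and (8).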
Since the output of a check node can be computed recursively [7], this conclusion generalizes to a check node of any degree, whose output messages are given by:

\bar{V}_{cp}(t) = \sum_{i=1, i \neq p}^{d_c} \bar{U}_{ic}(t), (9)

where the summation is GF(q) addition. It can readily be shown that the node equations above reduce to the binary ones presented in [10] for GF(2).

B. Noise-Dependent Scaling and Edge Memories

In binary stochastic decoding, the switching activity can become very low, resulting in poor bit-error-rate performance. This phenomenon, called latch-up, is caused by cycles in the graph that cause the stochastic streams to become correlated, invalidating the independent-stream assumption used to derive equations (4) and (9). Two solutions were proposed in [10]: noise-dependent scaling and edge memories. Both methods are used to improve the performance of the GF(q) decoder.

Noise-dependent scaling increases switching activity by scaling down the channel likelihood values. For example, when transmitting data using BPSK modulation over an AWGN channel, the scaled likelihood of each received bit, l'(i), is calculated as:

l'(i) = [l(i)]^{2\alpha\sigma_n^2 / Y},

where l(i) is the unscaled bit likelihood, \sigma_n^2 is the noise variance, and the ratio \alpha/Y is determined offline to yield the best performance in the SNR range of interest. Accordingly, the equation for computing the channel likelihood values becomes:

L[i(x)] = \prod_{k=1}^{p} [l(i_k)]^{2\alpha\sigma_n^2 / Y}. (10)

Edge memories (EMs) are finite-depth buffers inserted between variable nodes and permutation nodes; they randomly reorder symbols in the output streams of variable nodes and thus break correlation between streams without affecting the overall stream statistics. The EM contents are updated with the variable node output when the node update condition is satisfied, and remain intact otherwise.
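The edge-memory mechanism can be modeled in a few lines (an illustrative Python sketch; the `EdgeMemory` class and its interface are our own, not the paper's hardware):

```python
import random

class EdgeMemory:
    """Finite-depth buffer between a variable node and a permutation
    node, used to decorrelate streams (sketch of the mechanism above)."""

    def __init__(self, length, init_symbols, rng):
        self.length = length
        self.buf = list(init_symbols)  # seeded from channel likelihoods
        self.rng = rng

    def push(self, symbol):
        """Node update condition satisfied: store and emit the new
        symbol, discarding the oldest when the buffer is full."""
        self.buf.append(symbol)
        if len(self.buf) > self.length:
            self.buf.pop(0)
        return symbol

    def sample(self):
        """Condition not satisfied: emit a randomly chosen symbol from
        the buffer, leaving its contents intact."""
        return self.rng.choice(self.buf)
```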
The output of the EM is that of the variable node in the first case, or a randomly selected symbol from its contents in the second. Due to the memory's finite length, older symbols are discarded when new ones are added. Figure 1b illustrates the message-passing mechanism and the location of edge memories within a stochastic decoder.

TABLE I: The number of operations needed by the FFT-SPA, Log-FFT-SPA, and stochastic decoders to compute a single check node output message, including the permutation node operations.

Algorithm | Multiplication | Addition | LUT
FFT-SPA [2] | 2^p(d_c^2 + 4d_c) | p2^{p+1}d_c + 2^p | 0
Log-FFT-SPA [2] | 0 | (p2^{p+1} + 2^{p+2})d_c | p2^{p+1}d_c
Stoc. | d_c - 1 | d_c - 1 | 0
Stoc.-LUT | 0 | d_c - 1 | d_c - 1

For complexity comparison, Table I lists the number of operations needed to compute a single check node output message in the FFT-SPA and Log-FFT-SPA algorithms, as presented in [2]. It should be noted that the SPA operations are on real numbers, and quantization will degrade the decoder performance, while the stochastic decoder's operations are over the finite field GF(2^p).

C. Algorithm Description

At the beginning of the algorithm, the edge memories are initialized using scaled channel likelihood values as the PMFs of their content distribution. The following steps describe the stochastic decoding algorithm for each decoding cycle.
1: Variable node messages are computed using equation (4), edge memories are updated where appropriate, and messages are sent from the edge memories to the permutation nodes.
2: Permutation nodes perform GF(q) multiplication on incoming messages and send the results to the check nodes.
3: Check node messages are computed as in equation (9) and sent to the permutation nodes.
4: Permutation nodes perform GF(q) multiplication by the inverse and send the resulting messages to the variable nodes.
5: Each variable node contains counters C[a] corresponding to the GF(q) elements; these counters are incremented based on the incoming messages and the channel message L(t).
A variable node's belief is defined as arg max_a C[a].
6: Variable node beliefs are updated accordingly.
The streams are processed on a symbol-by-symbol basis, one symbol per decoding cycle (steps 1-5), until the algorithm converges (the variable node beliefs satisfy the check constraints) or a maximum number of decoding cycles is reached. As in the binary algorithm presented in [10], the processing is not packetized.

D. Implementation

While the stochastic decoding algorithm is defined for any finite field, the implementation presented in this section is limited to GF(2^p), as these are the most utilized fields and yield the simplest implementation. The polynomial representation of GF(2^p) is used when implementing the algorithm; this choice greatly simplifies the circuitry needed to perform GF(2^p) addition. All gate-count estimates assume 2-input logic gates in a tree configuration.

Fig. 2: GF(8) stochastic elements. (a) d_v = 2 variable node, (b) d_c = 4 check node.

Variable Node: To implement the operation specified by equation (4), a GF(2^p) equality check is needed. XNOR gates and an AND gate perform the check and provide an enable (latch) signal to an edge memory, as shown in Fig. 2a. To extend the circuit to a higher-order field, more XNOR gates are used and connected to a larger AND gate, accommodating the increase in the number of bits required to represent each GF(2^p) symbol in the stochastic streams. For higher-degree nodes, the number of inputs to each XNOR gate is increased. The total number of gates, without counters, required by a variable node is:

[p(d_v - 1) XNOR + (p - 1) AND] d_v. (11)

Each variable node requires a maximum of 2^p counters to track occurrences of each symbol and determine the node belief. The size of the EMs associated with a variable node of degree d_v is d_v l p bits, where l is the EM length.

Permutation Node: Permutation nodes can be implemented using GF(2^p) multipliers.
For a particular code, the symbols arriving at a permutation node are always multiplied by the same element of H. As a result, the multiplier can be designed to multiply by a specific (constant) element of GF(2^p) instead of being a generic GF(2^p) multiplier, significantly reducing circuit complexity. Alternatively, look-up tables (LUTs) can be used, since their size would not be large. The multiplication by the inverse for messages passed in the other direction is implemented in a similar manner. If LUTs are used to implement the multiplication, each node requires two of them: one for multiplication by h and one for multiplication by h^{-1}. Each LUT contains 2^p - 1 entries, each p bits wide.

Check Node: The outgoing messages from check nodes are GF(2^p) summations of incoming messages. Since the GF(2^p) symbols are represented in polynomial form, this operation can be realized using XOR operations between the corresponding bit lines of the messages. The circuit in Fig. 2b is an example of a degree-4 check node in GF(8). To implement a higher-degree check node, the number of inputs to each XOR gate is increased to account for the extra incoming messages; extending the circuit to a higher-order field is done by adding more XOR gates. The total number of gates required by a check node is:

[p(d_c - 1) XOR] d_c. (12)

Fig. 3: FER for a (256,128)-symbol (2,4)-regular LDPC code over GF(16). EM length = 50, \alpha/Y = 0.5.

Fig. 4: BER for a (256,128)-symbol (2,4)-regular LDPC code over GF(16). EM length = 50, \alpha/Y = 0.5.

TABLE II: Average number of decoding cycles.

SNR (dB) | 2.0 | 2.5 | 3.0 | 3.5 | 4.0
DC_avg (DC_max = 10^6) | 22599 | 8888 | 4243 | 2329 | 1433
DC_avg (DC_max = 10^5) | 17958 | 8511 | 4209 | 2326 | 1433
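For quick sizing estimates, equations (11) and (12) can be wrapped in two small helper functions (our convenience sketch; the counts are exactly the paper's formulas, excluding counters and EMs):

```python
# Gate counts per node from equations (11) and (12), assuming 2-input
# gates in a tree configuration; counters and edge memories excluded.

def variable_node_gates(p, dv):
    """Eq. (11): [p(dv-1) XNOR + (p-1) AND] per output edge, times dv."""
    return {"XNOR": p * (dv - 1) * dv, "AND": (p - 1) * dv}

def check_node_gates(p, dc):
    """Eq. (12): [p(dc-1) XOR] per output edge, times dc."""
    return {"XOR": p * (dc - 1) * dc}
```

For the GF(16) (2,4)-regular code of Section III-E (p = 4, d_v = 2, d_c = 4), this gives 8 XNOR and 6 AND gates per variable node and 48 XOR gates per check node.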
E. Performance

Figures 3 and 4 compare the performance of the stochastic decoder with that of an SPA decoder when decoding a (256,128)-symbol LDPC code over GF(16) [14], using an AWGN channel, BPSK modulation, and random codewords. The SPA decoder runs a maximum of 1000 iterations, while the stochastic decoder's maximum is 10^6 decoding cycles (DCs). The performance of the two decoders is very similar, and the two perform identically at higher SNR values. The change in the slope of the error-rate curves was also observed in [14]. We note that the maximum number of decoding cycles is much greater than the average number, as shown in Table II, and it is DC_avg that determines the decoder throughput. Figures 3 and 4 demonstrate that, at higher SNRs, DC_max can be reduced with a small performance loss.

It should be noted that the number of iterations in the SPA decoder and the number of decoding cycles in the stochastic decoder are not directly comparable. SPA iterations involve complex operations; for example, the node operations in EMS [15] involve sorting and iterating over incoming message elements, requiring many clock cycles. In a stochastic decoder, a decoding cycle is very simple and can be completed within a single clock cycle. Also, due to the nature of stochastic computation, the proposed implementation lends itself to pipelining (the random order of the messages breaks the feedback loop in the graph, allowing pipelining [12]), enabling clock rates faster than those possible with the SPA.

IV. CONCLUSION

In this paper, we presented a stochastic decoding algorithm that we expect to enable practical high-throughput decoding of LDPC codes over GF(2^p).

ACKNOWLEDGEMENT

The authors would like to thank Prof. D. Declercq from ENSEA for helpful discussions.

REFERENCES

[1] M. Davey and D. MacKay, “Low-density parity check codes over GF(q),” IEEE Commun. Lett., vol. 2, no. 6, pp. 165–167, 1998.
[2] H. Song and J.
Cruz, “Reduced-complexity decoding of Q-ary LDPC codes for magnetic recording,” IEEE Trans. Magn., vol. 39, no. 2, pp. 1081–1087, 2003.
[3] J. Chen, L. Wang, and Y. Li, “Performance comparison between non-binary LDPC codes and Reed-Solomon codes over noise burst channels,” in Proc. International Conference on Communications, Circuits and Systems, vol. 1, 2005, pp. 1–4.
[4] I. Djordjevic and B. Vasic, “Nonbinary LDPC codes for optical communication systems,” IEEE Photonics Technology Letters, vol. 17, no. 10, pp. 2224–2226, 2005.
[5] C. Spagnol, W. Marnane, and E. Popovici, “FPGA implementations of LDPC over GF(2^m) decoders,” in Proc. IEEE Workshop on Signal Processing Systems, 2007, pp. 273–278.
[6] D. MacKay and M. Davey, “Evaluation of Gallager codes for short block length and high rate applications,” in Codes, Systems and Graphical Models. Springer-Verlag, 2000, pp. 113–130.
[7] D. Declercq and M. Fossorier, “Decoding algorithms for nonbinary LDPC codes over GF(q),” IEEE Trans. Commun., vol. 55, no. 4, pp. 633–643, 2007.
[8] B. Gaines, Advances in Information Systems Science. New York: Plenum, 1969, ch. 2, pp. 37–172.
[9] V. Gaudet and A. Rapley, “Iterative decoding using stochastic computation,” Electronics Letters, vol. 39, no. 3, pp. 299–301, Feb. 2003.
[10] S. Sharifi Tehrani, W. Gross, and S. Mannor, “Stochastic decoding of LDPC codes,” IEEE Commun. Lett., vol. 10, no. 10, pp. 716–718, 2006.
[11] S. Sharifi Tehrani, S. Mannor, and W. J. Gross, “An area-efficient FPGA-based architecture for fully-parallel stochastic LDPC decoding,” in Proc. IEEE Workshop on Signal Processing Systems, 17–19 Oct. 2007, pp. 255–260.
[12] ——, “Fully parallel stochastic LDPC decoders,” IEEE Trans. Signal Process., vol. 56, no. 11, pp. 5692–5703, Nov. 2008.
[13] C. Winstead, V. Gaudet, A. Rapley, and C. Schlegel, “Stochastic iterative decoders,” in Proc. International Symposium on Information Theory (ISIT), 2005, pp. 1116–1120.
[14] C.
Poulliat, M. Fossorier, and D. Declercq, “Design of regular (2, d_c)-LDPC codes over GF(q) using their binary images,” IEEE Trans. Commun., vol. 56, no. 10, pp. 1626–1635, Oct. 2008.
[15] A. Voicila, F. Verdier, D. Declercq, M. Fossorier, and P. Urard, “Architecture of a low-complexity non-binary LDPC decoder for high order fields,” in Proc. International Symposium on Communications and Information Technologies (ISCIT ’07), 2007, pp. 1201–1206.