© 2009 IEEE. Personal use of this material is permitted. Permission from
IEEE must be obtained for all other uses, in any current or future media,
including reprinting/republishing this material for advertising or promotional
purposes, creating new collective works, for resale or redistribution to servers
or lists, or reuse of any copyrighted component of this work in other works.
doi: http://dx.doi.org/10.1109/ICC.2009.5199493
Stochastic Decoding of LDPC Codes over GF(q)
Gabi Sarkis, Shie Mannor and Warren J. Gross
Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada H3A 2A7
Email: gabi.sarkis@mail.mcgill.ca, shie.mannor@mcgill.ca, warren.gross@mcgill.ca
Abstract—Nonbinary LDPC codes have been shown to outperform currently used codes for magnetic recording and several
other channels. Currently proposed nonbinary decoder architectures have very high complexity for high-throughput implementations and sacrifice error-correction performance to maintain
realizable complexity. In this paper, we present an alternative
decoding algorithm based on stochastic computation that has a
very simple implementation and minimal performance loss when
compared to the sum-product algorithm. We demonstrate the
performance of the algorithm when applied to a GF(16) code
and provide details of the hardware resources required for an
implementation.
I. INTRODUCTION
Low-Density Parity Check (LDPC) codes are linear block
codes that can achieve performance close to the Shannon limit
under iterative decoding. Binary LDPC codes have received
much interest and are specified in recent wireless and wireline
communications standards, for example digital video broadcast
(DVB-S2), WiMAX wireless (IEEE 802.16e) and 10 gigabit
Ethernet (IEEE 802.3an). Nonbinary LDPC codes defined over
q-ary Galois fields (GF(q)) were introduced in [1] and were
shown to perform better than equivalent bit-length binary
codes for additive-white Gaussian noise (AWGN) channels. In
[2], Song et al. showed that GF(q) LDPC codes significantly
outperform binary LDPC and Reed-Solomon (RS) codes for
the magnetic recording channel. Chen et al. [3] demonstrated
that LDPC codes over GF(16) perform better than RS codes
for general channels with bursts of noise, thus making GF(q)
LDPC a candidate to replace RS coding in many storage
systems. Djordjevic et al. concluded that nonbinary LDPC
codes achieve lower BER than other codes while allowing
for higher transmission rates when used with the fiber-optic
channel [4].
LDPC codes over GF(q) are defined such that elements
of the parity check matrix H are elements of GF(q). As in
the binary case, these codes are decoded by the sum-product
algorithm (SPA) applied to the Tanner graph representation
of the parity-check matrix H. Unfortunately, the nonbinary
values of H result in very high complexity for the check
node updates in the graph, presenting a significant barrier
to practical realization. The only hardware implementation in
the literature is fully serial, consisting of only one variable
node and one check node [5]. There have been a number of
approaches to reduce the complexity of the check node update
in the literature. MacKay et al. proposed using the fast Fourier
transform (FFT) to convert convolution to multiplication in
the check nodes [6]. Song et al. use the log-domain to replace
multiplication with addition [2]. Declercq et al. introduced the
extended min-sum (EMS) algorithm as an approximation to
the SPA, computing likelihood values for only a subset of
the field elements; thus reducing the number of computations
performed [7]. While these approaches are simpler than a
direct implementation of the SPA, there is a need to further
reduce the complexity for practical decoder implementations.
Recently, a new approach to decoding binary LDPC codes
based on stochastic computation ([8], [9]) was introduced in
[10]. Stochastic decoders use random bit-streams to represent
probability messages and result in simple node hardware and
reduced wiring complexity. Subsequently, area-efficient fully-parallel high-throughput decoders with performance close to
the SPA were demonstrated in field-programmable gate arrays
(FPGAs) [11], [12].
We realized that the complexity benefits of stochastic decoding might be even greater for nonbinary LDPC codes and could
result in a practical decoder implementation. In this paper, we
present a generalization of stochastic decoding to LDPC codes
over GF(q). The algorithm has significantly lower hardware
complexity than other nonbinary decoding algorithms in the
literature.
II. SUM-PRODUCT DECODING
A. Notation
Since most digital systems transmit data using 2^p symbols,
the focus in current research is on codes defined over GF(2^p).
In this section, we describe the SPA for decoding LDPC codes
over GF(2^p). However, it should be noted that the SPA works
on any field GF(q) with minor modifications to notation and
channel likelihood calculations.
The elements of GF(2^p) can be represented as powers
of the primitive element α, or using polynomials; the latter
form is used in this section, so that the polynomial
i(x) = Σ_{l=1}^{p} i_l x^{l−1}, where the i_l are binary
coefficients, represents an element of GF(2^p).
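As a concrete illustration of this polynomial representation, the sketch below implements GF(2^p) addition and multiplication for p = 4, i.e. GF(16); note that the primitive polynomial x^4 + x + 1 is an assumption (a common choice for GF(16)) — the paper does not fix one.

```python
# Sketch of GF(2^p) arithmetic in the polynomial representation for
# p = 4 (GF(16)). Elements are 4-bit integers whose bits are the
# binary coefficients i_l. The primitive polynomial x^4 + x + 1 is an
# assumed (common) choice, not one specified in the paper.

P = 4
PRIM_POLY = 0b10011  # x^4 + x + 1 (assumed)

def gf_add(a, b):
    # Polynomial addition over GF(2) is bitwise XOR.
    return a ^ b

def gf_mul(a, b):
    # Carry-less multiply, then reduce modulo the primitive polynomial.
    result = 0
    for bit in range(P):
        if (b >> bit) & 1:
            result ^= a << bit
    for deg in range(2 * P - 2, P - 1, -1):
        if (result >> deg) & 1:
            result ^= PRIM_POLY << (deg - P)
    return result
```

For example, with α = x (the element 0b0010), gf_mul(0b1000, 0b0010) computes x^3 · x = x^4, which reduces to x + 1 (0b0011) under the assumed polynomial.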
The notation used in this section for representing inter-node
messages is similar to that of [7]; namely, U and
V represent messages heading in the direction of check and
variable nodes respectively. The subscripts represent the source
and destination nodes; for example, U_xy is a message from
node x to node y. All messages are probability mass
function (PMF) vectors indexed by GF(2^p) elements. Fig.
1a shows this notation applied to a Tanner graph.
B. Algorithm
While nonbinary codes can also be decoded using the SPA on
Tanner graphs, the check node update is modified because the
elements of H are nonbinary. Therefore, the check constraint
for a check node of degree dc is:

Σ_{k=1}^{dc} h_k i_k(x) = 0,    (1)

where h_k is the element of H with indices corresponding to
the check and variable nodes of interest. This is different from
the binary case, where the check constraint is Σ_{k=1}^{dc} i_k(x) = 0.
To accommodate this change, Davey et al. [1] assigned values
from H as labels to the edges connecting variable and check
nodes and integrated the multiplication into the check node
functionality. Declercq et al. [7] introduced a third node type
called the permutation node, which connects variable and check
nodes and performs the multiplication as shown in Fig. 1a;
therefore, the check node constraint reverts to Σ_{k=1}^{dc} j_k(x) = 0.
While the two approaches are functionally equivalent, the one
in [7] results in simpler equations and implementation since
all check nodes of the same degree are identical.

Fig. 1: Stochastic decoder graphs with X and X^{−1} denoting
forward and inverse permutation operations. (a) message labels,
(b) message propagation with EMs added to the decoder.

The first step in the SPA is computing the channel likelihood
vector L_v[i(x)] for each variable node v, based on the channel
model and modulation scheme. The outgoing message from
variable node v to permutation node z is given by:

U_vz = L_v × Π_{p=1, p≠z}^{dv} V_pv,    (2)

where × is the term-by-term product of vectors and dv is
the variable node degree. Normalization is needed so that
Σ_{a∈GF(2^p)} U_vz[a] = 1.

Permutation nodes implement multiplication by an element
of H when passing messages from the variable to check
nodes, and multiplication by the inverse of that element in the
other direction. As shown in [7], the multiplication and
multiplication by inverse can be performed using cyclic shifts
of the positions of the values in a message vector, except those
values indexed by 0.

The parity check constraint no longer includes multiplication
by elements of H; therefore, the check node update
equation is the convolution of incoming messages as shown
in [7]:

V_ct = ⊛_{p=1, p≠t}^{dc} U_pc.    (3)

This convolution represents a significant computational challenge
in implementing nonbinary LDPC decoders.

III. STOCHASTIC DECODING
A message in the SPA for LDPC codes over GF(q) is a
vector containing the probabilities of each of the q possible
symbols. Stochastic decoding uses streams of symbols chosen
from GF(q) to represent these messages; the number of occurrences of a symbol in a stream divided by the total number
of symbols observed in the stream gives the probability of that
symbol. The advantage of utilizing such a method for message
passing lies in the simple circuitry required to manipulate the
stochastic streams to reflect likelihood changes as presented
in Section III-D. Stochastic decoding of binary LDPC codes
results in simple hardware structures. The reader is referred
to [8], [9], [10], [11], [12] for details on binary stochastic
decoding algorithms and their implementation.
Similar notation to the SPA is used when describing the
stochastic decoding message updates, the difference being that
messages are serial stochastic streams instead of vectors; thus,
an index t is used to denote the location of a symbol within
a stream and the stream name is overlined, e.g. U_vp(t).
A. Node Equations
Winstead et al. [13] presented a stochastic decoding algorithm that uses streams of integers instead of the conventional
binary streams. In that work, an integer stream encodes the
probabilities of the states in a trellis, leading to a demonstration of trellis decoding of a (16,11) Hamming code and
a turbo product decoder built from the Hamming component
decoders. However, that work did not interpret the integers
as finite field symbols and did not utilize GF(q) arithmetic.
In this section we present the node equations for a stochastic
decoder for LDPC codes over GF(q). Taking the view that
the nonbinary streams are composed of finite field elements,
we present message update rules that are much simpler than
those derived from a straightforward application of the rules in
[13]. In particular, the trellis representation of the convolution
in the check node reduces to Galois field addition. Section
III-E demonstrates the performance of the stochastic algorithm
when decoding a (256,128)-symbol LDPC code over GF(16).
Variable Node: A stochastic variable node of degree dv
takes as input dv stochastic streams from permutation nodes in
addition to one generated based on channel likelihood values.
In [13], the output of a node is updated if its inputs satisfy
some constraint; otherwise, the output remains unchanged
from the previous iteration. To implement a variable node
constraint on an output message stream at time t, we copy
the input symbol to the output symbol if the input symbols
on all the other incoming edges are equal at time t. For a
stochastic variable node with output U vp and inputs V iv , we
propose the following update rule:

U_vp(t) = a, if V_iv(t) = a, ∀ i : i ≠ p;
U_vp(t) = U_vp(t−1), otherwise.    (4)
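The update rule (4) can be sketched in software as follows; the function and argument names are illustrative, not part of the paper's notation.

```python
# Sketch of the stochastic variable node update rule (4): the output
# toward edge p copies the common input symbol when the channel
# stream and all other edge inputs agree at time t, and otherwise
# repeats its previous output. Names here are illustrative.

def variable_node_update(channel_sym, edge_inputs, prev_outputs):
    outputs = []
    for p in range(len(edge_inputs)):
        # All incoming symbols except the one on edge p.
        others = [channel_sym] + [s for i, s in enumerate(edge_inputs) if i != p]
        if all(s == others[0] for s in others):
            outputs.append(others[0])        # all agree: pass the symbol on
        else:
            outputs.append(prev_outputs[p])  # otherwise hold previous output
    return outputs
```

For instance, with channel symbol 5 and edge inputs [5, 7], only the edge whose "other" inputs all equal 5 updates; the other edge keeps its previous output.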
Using equation (4) and assuming the inputs are independent,
the PMF of the output is:

P[U_vp(t) = c] = Π_i P[V_iv(t) = c]
  + (1 − Σ_{a∈GF(q)} Π_i P[V_iv(t) = a]) P[U_vp(t−1) = c].    (5)
As in [13], if the stochastic streams are assumed to be
stationary, then P[U_vp(t) = c] = P[U_vp(t−1) = c] and
the PMF of U_vp(t) becomes:

P[U_vp(t) = c] = Π_i P[V_iv(t) = c] / Σ_{a∈GF(q)} Π_i P[V_iv(t) = a].    (6)

Equation (6) is identical to the normalized output of a sum-product
variable node; therefore, equation (4) is a valid update
rule for the stochastic variable node.
Permutation Node: The function of the permutation node
is to remove multiplication by elements of H from the check
node constraint. In the sum-product algorithm this is achieved
by a cyclic shift of the message vector elements as in Section
II-B. Here, we demonstrate that multiplying the stochastic
stream from a variable to a check node by an element of H
accomplishes the same result. Assuming a permutation node
p which corresponds to h = α^i, the permutation node output
message in a SPA decoder is defined such that each element
in the message vector is given by:

U_pc[a] = U_vp[a·α^i],    ∀ a ∈ GF(q).

When, in a stochastic decoder, the permutation node multiplies
all elements of the input stream by h, the output PMF
becomes:

P[U_pc(t) = a] = P[U_vp(t) = a·α^i].

The SPA and stochastic output PMFs are identical, and since
the multiplicative group of GF(q) is cyclic and multiplication
is closed on GF(q), the stochastic permutation node operation
is equivalent to that of the SPA.
Similarly, it can be shown that for messages passed from
check to variable nodes, the inverse permutation node operation
is multiplication by h^{−1}. It should be noted that h ≠ 0,
since a value of 0 in H signifies the lack of a connection
between a variable and a check node; therefore, there are no
permutation nodes with a multiplier h = 0.
Check Node: When deriving the stochastic update message
for a check node, a degree-three node is considered and the
result is generalized to a check node of any degree. Let U_1c and
U_2c be the node inputs, which are assumed to be independent,
and V_cp its output. From equation (3), the output of such a
node when using the SPA is given as:

P[V_cp = z | U_1c, U_2c] = Σ_{x⊕y=z} P[U_1c = x] P[U_2c = y],    (7)

where ⊕ is GF(q) addition.
In the stochastic node, we define the output as the GF(q)
addition of the inputs, i.e. V_cp(t) = U_1c(t) ⊕ U_2c(t). The PMF of
the output is computed as:

P[V_cp(t) = z] = P[U_1c(t) ⊕ U_2c(t) = z]
             = Σ_{x⊕y=z} P[U_1c(t) = x] P[U_2c(t) = y].    (8)

The PMFs (7) and (8) are identical; therefore, GF(q) addition
is a valid update rule for a degree-3 stochastic check node.
Since the output of a check node can be computed recursively [7],
the previous conclusion can be generalized to a
check node of any degree, and the output messages for these
nodes are given as:

V_cp(t) = Σ_{i=1, i≠p}^{dc} U_ic(t),    (9)

where the summation is GF(q) addition.
It can be readily shown that the previous node equations
reduce to the binary ones presented in [10] for GF(2).
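For GF(2^p) in the polynomial representation, the GF(q) addition in (9) is bitwise XOR, so a check node update can be sketched as follows (names illustrative):

```python
# Sketch of the stochastic check node update (9) for GF(2^p): the
# outgoing symbol on edge p is the XOR (GF(2^p) addition in the
# polynomial representation) of the symbols on all other edges.
from functools import reduce
from operator import xor

def check_node_update(edge_inputs):
    total = reduce(xor, edge_inputs, 0)
    # XOR-ing the running total with one input removes that input from
    # the sum, yielding the all-but-one combination for each edge.
    return [total ^ s for s in edge_inputs]
```

Computing the full sum once and subtracting each input mirrors the recursive all-but-one structure noted in [7], at a cost linear in the node degree.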
B. Noise-Dependent Scaling and Edge-Memories
In binary stochastic decoding the switching activity can
become very low resulting in poor bit-error-rate performance.
This phenomenon is called latch-up and is caused by cycles
in the graph that cause the stochastic streams to become
correlated, invalidating the independent-stream assumption used
to derive equations (4) and (9). Two solutions were proposed
in [10]: noise-dependent scaling and edge memories. Both of
these methods are used to improve the performance of the
GF(q) decoder.
Noise-dependent scaling increases switching activity by
scaling down the channel likelihood values. For example, when
transmitting data using BPSK modulation over an AWGN
channel, the scaled likelihood of each received bit l′(i) is
calculated by:

l′(i) = [l(i)]^{2ασ_n²/Y},

where l(i) is the unscaled bit likelihood, σ_n² is the noise
variance, and the ratio α/Y is determined offline to yield the
best performance in the SNR range of interest. Accordingly, the
equation for computing the channel likelihood values becomes:

L[i(x)] = Π_{k=1}^{p} [l(i_k)]^{2ασ_n²/Y}.    (10)
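A minimal sketch of the scaling in equation (10), assuming per-bit likelihoods are supplied by the caller; the ratio α/Y = 0.5 mirrors the value used in the simulations, and the function and parameter names are illustrative.

```python
# Sketch of noise-dependent scaling, eq. (10): each bit likelihood is
# raised to the power 2*alpha*sigma_n^2 / Y before forming the symbol
# likelihood. alpha/Y = 0.5 matches the value used in the paper's
# figures; the names here are illustrative, not from the paper.
import math

def scaled_symbol_likelihood(bit_likelihoods, sigma_n2, alpha_over_Y=0.5):
    exponent = 2.0 * alpha_over_Y * sigma_n2
    # Exponents below 1 flatten the PMF, which raises switching
    # activity in the stochastic streams.
    return math.prod(l ** exponent for l in bit_likelihoods)
```

With σ_n² = 1 and α/Y = 0.5 the exponent is exactly 1 and the symbol likelihood reduces to the unscaled product of bit likelihoods.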
Edge memories (EM) are finite depth buffers inserted between variable nodes and permutation nodes and randomly
reorder symbols in the output streams of variable nodes; thus,
they break correlation between streams without affecting the
overall stream statistics. The EM contents are updated with
the variable node output when the node update condition is
satisfied, and remain intact otherwise. The output of the EM
is that of the variable node in the first case, or a randomly
selected symbol from its contents in the second. Due to the
memory's finite length, older symbols are discarded when new
ones are added.

TABLE I: The number of operations needed by the FFT-SPA,
Log-FFT-SPA, and stochastic decoders to compute a single
check node output message, including the permutation node
operations.

Algorithm        | Multiplication     | Addition                  | LUT
FFT-SPA [2]      | 2^p(dc² + 4dc)     | p·2^{p+1}·dc + 2^p        | 0
Log-FFT-SPA [2]  | 0                  | (p·2^{p+1} + 2^{p+2})·dc  | p·2^{p+1}·dc
Stoc.            | dc − 1             | dc − 1                    | 0
Stoc.-LUT        | 0                  | dc − 1                    | dc − 1
Figure 1b demonstrates the message passing mechanism and
the location of edge memories within a stochastic decoder.
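The edge-memory mechanism described above can be sketched as a fixed-depth buffer; depth 50 matches the EM length used in the simulations, while the class and method names are illustrative.

```python
# Sketch of an edge memory (EM): a fixed-depth buffer that stores the
# variable node output when the update condition holds and emits a
# randomly chosen stored symbol when it does not. Depth 50 matches the
# EM length in the paper's simulations; names are illustrative.
import random
from collections import deque

class EdgeMemory:
    def __init__(self, depth=50, init_symbols=(0,)):
        # The paper initializes EM contents from the scaled channel
        # likelihood PMF; here the caller supplies initial symbols.
        self.buffer = deque(init_symbols, maxlen=depth)

    def push_and_pass(self, symbol):
        # Update condition satisfied: store and forward the new symbol;
        # the oldest stored symbol drops out once the buffer is full.
        self.buffer.append(symbol)
        return symbol

    def resample(self):
        # Update condition not satisfied: emit a random stored symbol,
        # breaking stream correlation without changing the statistics.
        return random.choice(self.buffer)
```

Random reads leave the long-run symbol frequencies of the stream unchanged, which is why EMs break correlation without biasing the PMFs the streams encode.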
For complexity comparison, Table I provides the number
of operations needed to compute a single check node output
message in the FFT-SPA and Log-FFT-SPA algorithms as
presented in [2]. It should be noted that the operations for the
SPA are for real numbers and quantization will degrade the
decoder performance; while those for the stochastic decoder
are over a finite field GF(2p ).
C. Algorithm Description
At the beginning of the algorithm the edge memories are
initialized using scaled channel likelihood values as PMFs
for their content distribution. The following steps describe
the stochastic decoding algorithm for each decoding cycle.
1: Variable node messages are computed using equation (4), edge
memories are updated where appropriate, and messages are
sent from edge memories to permutation nodes.
2: Permutation nodes perform GF(q) multiplication on incoming
messages and send the results to check nodes.
3: Check node messages are computed as in equation (9) and are
sent to permutation nodes.
4: Permutation nodes perform GF(q) multiplication by the inverse
and send the resulting messages to variable nodes.
5: Each variable node contains counters C[a] corresponding to the
GF(q) elements. These counters are incremented based on incoming
messages and the channel message L(t). A variable node
belief is defined as arg max_a C[a].
6: Variable node beliefs are updated accordingly.
The streams are processed on a symbol-by-symbol basis,
one symbol each cycle (steps 1-5), until the algorithm converges (the variable node beliefs satisfy the check constraints)
or a maximum number of iterations is reached. As in the binary
algorithm presented in [10] the processing is not packetized.
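The cycle above can be sketched as a loop; the decoder methods named here are placeholders for the circuits this section describes, not an API the paper defines.

```python
# High-level sketch of the decoding loop (steps 1-5 per cycle, with
# the belief update of step 6 folded into update_beliefs). The
# `decoder` object and its method names are hypothetical placeholders.

def decode(decoder, max_cycles):
    for _ in range(max_cycles):
        decoder.update_variable_nodes()    # step 1: rule (4) + EM updates
        decoder.permute_forward()          # step 2: multiply by h
        decoder.update_check_nodes()       # step 3: GF(q) addition, eq. (9)
        decoder.permute_backward()         # step 4: multiply by h^-1
        decoder.update_beliefs()           # steps 5-6: counters C[a], argmax
        if decoder.beliefs_satisfy_checks():
            break                          # converged before DCmax
    return decoder.beliefs()
```

The early exit on convergence is what makes the average number of decoding cycles (Table II) much smaller than DCmax.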
D. Implementation
While the stochastic decoding algorithm is defined for any
finite field, the implementation presented in this section is
limited to GF(2^p), as these are the most utilized fields and they
yield the simplest implementation. The polynomial representation
of GF(2^p) is used when implementing the algorithm.
This choice greatly simplifies the circuitry needed to perform
GF(2p ) addition. All gate number estimates assume 2-input
logic gates in a tree configuration.
Fig. 2: GF(8) stochastic elements. (a) dv = 2 variable node,
(b) dc = 4 check node.
Variable Node: To implement the operation specified by
equation (4), a GF(2^p) equality check is needed. XNOR gates
and an AND gate are used to perform the check and provide
an enable (latch) signal to an edge memory, as shown in Fig. 2a.
To extend the circuit for a higher order field, more XNOR
gates are used and connected to a larger AND gate. This
accommodates the increase in the number of bits required to
represent each GF(2p ) symbol in the stochastic streams. For
higher degree nodes, the number of inputs to each XNOR
gate is increased. The total number of gates, without counters,
required by a variable node is:

[p(dv − 1) XNOR + (p − 1) AND] · dv.    (11)
Each variable node requires a maximum of 2^p counters
to track occurrences of each symbol and determine the node
belief. The size of the EMs associated with a variable node of
degree dv is dv·l·p bits, where l is the EM length.
Permutation Node: Permutation nodes can be implemented
using GF(2p ) multipliers. For a particular code, the symbols
arriving at a permutation node are always multiplied by the
same element of H. As a result, the multiplier can be designed
to multiply by a specific (constant) element of GF(2p ) instead
of a generic GF(2p ) multiplier, significantly reducing circuit
complexity. Alternatively, look-up tables (LUT) can be used
since their size would not be large. The multiplication by
inverse for messages passed in the other direction is implemented in a similar manner.
If LUTs are used to implement the multiplications, each node
requires two LUTs: one for multiplication by h and one for
multiplication by h^{−1}. Each LUT contains 2^p − 1
entries, each p bits wide.
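Since h is fixed per node, both tables can be precomputed offline; a sketch assuming GF(16) with primitive polynomial x^4 + x + 1 (a common choice — the paper does not fix one):

```python
# Sketch of a permutation node's constant-multiplier LUTs for GF(16).
# The primitive polynomial x^4 + x + 1 is an assumption; gf_mul is a
# straightforward carry-less multiply with reduction.

P = 4
PRIM_POLY = 0b10011  # x^4 + x + 1 (assumed)

def gf_mul(a, b):
    result = 0
    for bit in range(P):
        if (b >> bit) & 1:
            result ^= a << bit
    for deg in range(2 * P - 2, P - 1, -1):
        if (result >> deg) & 1:
            result ^= PRIM_POLY << (deg - P)
    return result

def permutation_luts(h):
    # Forward LUT: a -> h*a over the 2^p - 1 nonzero elements (the
    # zero symbol maps to zero and needs no entry). The inverse LUT is
    # the reversed mapping, valid because multiplication by a nonzero
    # h is a bijection and h != 0 by construction of H.
    forward = {a: gf_mul(h, a) for a in range(1, 2 ** P)}
    inverse = {v: a for a, v in forward.items()}
    return forward, inverse
```

Hard-wiring the constant h in this way is what lets the hardware multiplier collapse to a handful of XORs instead of a generic GF(2^p) multiplier.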
Check Node: The outgoing messages from check nodes are
GF(2p ) summations of incoming messages. Since the GF(2p )
symbols are represented using the polynomial form, this
operation can be realized utilizing XOR operations between
corresponding bit lines of messages. The circuit in Fig. 2b is
an example of a degree 4 check node in GF(8).
To implement a higher degree check node, the number of
inputs to each XOR gate is increased to account for the extra
incoming messages. Extending this circuit to higher order
fields can be done by adding more XOR gates. The total
number of gates required by a check node is:

[p(dc − 1) XOR] · dc.    (12)
Fig. 3: FER for a (256,128)-symbol (2,4)-regular LDPC code
over GF(16), plotted against Eb/N0 (dB) for the SPA decoder
and the stochastic decoder with DCmax = 10^6 and DCmax = 10^5.
EM length = 50, α/Y = 0.5.
TABLE II: Average number of decoding cycles.

SNR (dB)             | 2.0   | 2.5  | 3.0  | 3.5  | 4.0
DCavg (DCmax = 10^6) | 22599 | 8888 | 4243 | 2329 | 1433
DCavg (DCmax = 10^5) | 17958 | 8511 | 4209 | 2326 | 1433
Fig. 4: BER for a (256,128)-symbol (2,4)-regular LDPC code
over GF(16), plotted against Eb/N0 (dB) for the same decoders
as Fig. 3. EM length = 50, α/Y = 0.5.
E. Performance
Figures 3 and 4 demonstrate the performance of the stochastic
decoder compared to that of a SPA decoder when decoding
a (256,128)-symbol LDPC code over GF(16) [14], using
an AWGN channel, BPSK modulation, and random codewords.
The SPA decoder has a maximum of 1000 iterations, while the
stochastic decoder's maximum is 10^6 decoding cycles (DC).
The performance of the two decoders is very similar, and the
two decoders perform identically at higher SNR values. The
change in the slope of the error rate curve was also observed
in [14]. We note that the maximum number of decoding cycles
is much greater than the average number of decoding cycles,
as shown in Table II, with DCavg determining the decoder
throughput. Figures 3 and 4 demonstrate that, at higher SNRs,
DCmax can be reduced with a small performance loss.
It should be noted that the number of iterations in the
SPA decoder and decoding cycles in the stochastic decoder
are not directly comparable. SPA iterations involve complex
operations; for example, the node operations in the EMS
algorithm [15] involve sorting and iterating over incoming
message elements, thus requiring many clock cycles. In a
stochastic decoder, a decoding cycle is very simple and can
be completed within a single clock cycle. Also, due to the
nature of stochastic computation, the proposed implementation
lends itself to pipelining (due to the random order of the
messages, the feedback loop in the graph is broken, allowing
pipelining [12]), thus enabling clock rates faster than those
possible with the SPA.

IV. CONCLUSION
In this paper we presented a stochastic decoding algorithm
which we expect to enable practical high-throughput decoding
of LDPC codes over GF(2^p).

ACKNOWLEDGEMENT
The authors would like to thank Prof. D. Declercq from
ENSEA for helpful discussions.

REFERENCES
[1] M. Davey and D. MacKay, “Low-density parity check codes over
GF(q),” IEEE Commun. Lett., vol. 2, no. 6, pp. 165–167, 1998.
[2] H. Song and J. Cruz, “Reduced-complexity decoding of Q-ary LDPC
codes for magnetic recording,” IEEE Trans. Magn., vol. 39, no. 2, pp.
1081–1087, 2003.
[3] J. Chen, L. Wang, and Y. Li, “Performance comparison between nonbinary
LDPC codes and Reed-Solomon codes over noise bursts channels,”
in Proc. International Conference on Communications, Circuits and
Systems, vol. 1, 2005, pp. 1–4.
[4] I. Djordjevic and B. Vasic, “Nonbinary LDPC codes for optical communication systems,” IEEE Photonics Technology Letters, vol. 17, no. 10,
pp. 2224–2226, 2005.
[5] C. Spagnol, W. Marnane, and E. Popovici, “FPGA implementations
of LDPC over GF(2^m) decoders,” in Proc. IEEE Workshop on Signal
Processing Systems, 2007, pp. 273–278.
[6] D. MacKay and M. Davey, “Evaluation of Gallager codes for short block
length and high rate applications,” in In Codes, Systems and Graphical
Models. Springer-Verlag, 2000, pp. 113–130.
[7] D. Declercq and M. Fossorier, “Decoding algorithms for nonbinary
LDPC codes over GF(q),” IEEE Trans. Commun., vol. 55, no. 4, pp.
633–643, 2007.
[8] B. Gaines, Advances in Information Systems Science. Plenum, New
York, 1969, ch. 2, pp. 37–172.
[9] V. Gaudet and A. Rapley, “Iterative decoding using stochastic computation,” Electronics Letters, vol. 39, no. 3, pp. 299–301, Feb. 2003.
[10] S. Sharifi Tehrani, W. Gross, and S. Mannor, “Stochastic decoding of
LDPC codes,” IEEE Commun. Lett., vol. 10, no. 10, pp. 716–718, 2006.
[11] S. Sharifi Tehrani, S. Mannor, and W. J. Gross, “An area-efficient FPGAbased architecture for fully-parallel stochastic LDPC decoding,” in Proc.
IEEE Workshop on Signal Processing Systems, 17–19 Oct. 2007, pp.
255–260.
[12] ——, “Fully parallel stochastic LDPC decoders,” IEEE Trans. Signal
Process., vol. 56, no. 11, pp. 5692–5703, Nov. 2008.
[13] C. Winstead, V. Gaudet, A. Rapley, and C. Schlegel, “Stochastic iterative
decoders,” in Proc. International Symposium on Information Theory
ISIT, 2005, pp. 1116–1120.
[14] C. Poulliat, M. Fossorier, and D. Declercq, “Design of regular (2,
dc )-LDPC codes over GF(q) using their binary images,” IEEE Trans.
Commun., vol. 56, no. 10, pp. 1626–1635, October 2008.
[15] A. Voicila, F. Verdier, D. Declercq, M. Fossorier, and P. Urard, “Architecture
of a low-complexity non-binary LDPC decoder for high
order fields,” in Proc. International Symposium on Communications and
Information Technologies ISCIT ’07, 2007, pp. 1201–1206.