"AI Techniques for Turbo Decoding (Word)"

advertisement
Applying Artificial Intelligence Techniques
to Turbo Decoding
Bob Wall
EE 548
Apr. 30, 2004
Turbo coding techniques are currently a topic of great interest; to date, their performance on noisy channels
comes closest to Shannon's limit. However, there are some drawbacks to the use of turbo codes; one of the
most significant is the delay in decoding. This is due in part to the blocking and interleaving inherent in turbo coding, and in part to the iterative nature of the decoder. It is therefore of great interest to find
ways to implement a relatively simple, fast decoder. The field of Artificial Intelligence provides several
techniques that can be directly applied to this problem.
INTRODUCTION
Shannon’s Capacity Theorem, introduced by Claude Shannon in 1948, states that the maximum data transmission capacity (in bits/second) of a band-limited channel with additive white Gaussian noise (AWGN) is
given by
C = BW log2 (1 + S/N)
That is, the capacity grows linearly with the bandwidth of the channel and logarithmically with the signal-to-noise ratio.
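As a quick numerical illustration of this formula (the bandwidth and SNR values below are chosen arbitrarily), the capacity of a 1 MHz channel at an SNR of 10 dB works out to roughly 3.46 Mbit/s:

    import math

    def channel_capacity(bandwidth_hz, snr_linear):
        # Shannon capacity C = BW * log2(1 + S/N), in bits per second.
        return bandwidth_hz * math.log2(1.0 + snr_linear)

    snr_db = 10.0
    snr = 10 ** (snr_db / 10.0)          # convert dB to a linear ratio
    print(channel_capacity(1e6, snr))    # about 3.46 Mbit/s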
The work by Shannon and contemporaries like Hamming spawned the fields of information theory and
coding theory. There has been a great deal of research over the years on ways to encode information such
that a rate approaching Shannon’s limit can be reached. In practice, it is all but impossible to reach this
capacity, but there is great interest in finding data transmission techniques that will at least approach the
limit. This is of particular interest in power-limited or high-noise environments, such as space communications links and cellular networks.
Coding theory has produced a number of different codes in an effort to improve the error-free transmission
rate; these are typically divided into block codes and convolutional codes. Convolutional codes have an
advantage over block codes in that they can be made more powerful (i.e., able to detect and correct more
errors) without huge increases in the size and complexity of the decoder. However, convolutional codes
are sensitive to burst errors and have some other problems that limit their effectiveness. In an effort to
overcome these problems, concatenated codes were introduced. This involved feeding the output of one
coder (the “outer code”, such as a Reed-Solomon block coder) into another coder (the “inner code”, such as
a convolutional coder). This further improved coding gain (the reduction in the signal-to-noise ratio required to achieve a given bit error rate, or BER), but there was still significant opportunity for improvement.
In 1993, Berrou, Glavieux, and Thitimajshima proposed a variation of concatenated codes that introduced a
pseudo-random interleaver to reorder the input before passing it through the second coder ([1]). They
called these turbo codes, as an analogy to a turbo-charged engine, which uses feedback from the exhaust
system to enhance performance. The principles of turbo coding are described below.
The performance of turbo codes was a significant breakthrough in coding theory – experimental results
demonstrated that with appropriate choices of the component coders and interleaver, it was possible to get
within 0.5 dB of Shannon's limit in certain environments. These results spurred research into understanding
what characteristics of turbo codes caused them to perform so well. One breakthrough was achieved when
the turbo decoding process was interpreted in terms that were not familiar within the framework of coding
theory, but which were common in artificial intelligence – if the turbo decoder was viewed as a special type
of Bayesian network called a belief network, the application of a popular algorithm associated with these
networks directly yielded the turbo decoding algorithm. This discovery provided important insights into
turbo decoding. A summary of belief networks and how they are related to turbo decoding is presented
below.
Another construct common in artificial intelligence, the neural network, has also been applied to turbo decoding; a technique is described below for using neural nets alongside the turbo decoder to accurately predict the presence of errors in decoded data. This method is more reliable and less complex to
implement than a cyclic redundancy check in conjunction with the decoder.
Finally, a brief discussion is presented on the use of genetic algorithms to select the parameters for a turbo
coder given the characteristics of the transmission channel.
TURBO CODE OVERVIEW
The turbo coder as initially proposed by Berrou, et al., was a parallel concatenation of two recursive systematic convolutional (RSC) codes. The key contribution that distinguishes turbo codes from other concatenated codes is
the use of a pseudo-random interleaver, which reorders the data to be encoded before feeding it to the
second encoder. This is a simple block diagram of the encoder:
[Encoder block diagram: the input data d_k is transmitted as the systematic stream D_k and is fed directly to the first encoder RSC1, which produces parity C1,k, and through the interleaver to the second encoder RSC2, which produces parity C2,k.]
There are a number of variations on this theme – the coders do not need to be identical, and in fact one can
be a block coder and the other a convolutional coder. They can also be concatenated serially rather than in
parallel – that is, the output of one coder is used as the input of the other coder.
One explanation for the improvements obtained by turbo coding is that each of the encoders produces a "good" or high-weight code word for most inputs, but produces a "bad" or low-weight code word
for some inputs. The interleaver makes it very unlikely that both encoders will output a bad code word for
the same input, increasing the chance that the decoder will be able to extract the correct information.
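As a rough sketch of this structure, the following fragment implements a rate-1/3 parallel concatenation of two identical RSC encoders with a pseudo-random interleaver. The generator polynomials (7, 5 in octal, a memory-2 code) are a common textbook choice rather than the particular code used by Berrou, et al., and trellis termination is omitted for brevity.

    import random

    def rsc_encode(bits):
        # Recursive systematic convolutional encoder with octal generators (7, 5):
        # feedback polynomial 1 + D + D^2, feedforward polynomial 1 + D^2.
        # Only the parity stream is returned; the systematic stream is the input.
        s1 = s2 = 0                          # two-bit shift register
        parity = []
        for b in bits:
            fb = b ^ s1 ^ s2                 # feedback bit
            parity.append(fb ^ s2)           # feedforward taps 1 and D^2
            s1, s2 = fb, s1                  # shift the register
        return parity

    def turbo_encode(bits, interleaver):
        # Rate-1/3 parallel concatenation: systematic bits, parity from RSC1,
        # and parity from RSC2 operating on the interleaved data.
        parity1 = rsc_encode(bits)
        parity2 = rsc_encode([bits[i] for i in interleaver])
        return bits, parity1, parity2

    # Small pseudo-random interleaver and random data block, for illustration only.
    random.seed(0)
    N = 16
    interleaver = random.sample(range(N), N)
    data = [random.randint(0, 1) for _ in range(N)]
    systematic, parity1, parity2 = turbo_encode(data, interleaver)
    print(systematic, parity1, parity2)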
This is a block diagram of the corresponding decoder:
[Decoder block diagram: the received parity streams C1,k and C2,k and the received systematic data D_k feed Decoder 1 and Decoder 2, which exchange the reliability estimates L1 and L2 through an interleaver and deinterleaver; when the termination condition stops the iteration, a threshold detector on the (re-interleaved) soft output produces the output data.]
The use of a large block interleaver dramatically increases the state space of the encoded message, making
the creation of a deterministic optimal decoder impractical. Instead, it is necessary to use an iterative algorithm to perform the decoding. There are two typical approaches used – the Soft Output Viterbi Algorithm
(SOVA) ([2]) and the Maximum A Posteriori (MAP) algorithm, often called the BCJR algorithm after its
inventors ([3]). The BCJR algorithm is more complicated than the SOVA algorithm, but performs better.
However, its complexity can be reduced by implementing it in the log domain, which transforms multiplications into additions ([4]).
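The log-domain simplification rests on the Jacobian logarithm: products of probabilities become sums of their logarithms, while sums of probabilities become a max operation plus a small correction term. A minimal illustration of this operation (not the full Log-MAP decoder) is:

    import math

    def max_star(a, b):
        # Jacobian logarithm: log(exp(a) + exp(b)) computed without leaving the
        # log domain. This replaces sums of probabilities in the Log-MAP
        # algorithm; dropping the correction term gives the cheaper
        # Max-Log-MAP approximation.
        return max(a, b) + math.log1p(math.exp(-abs(a - b)))

    # Combining two log-domain branch metrics:
    print(max_star(-1.2, -3.4))   # exact log-sum, about -1.095
    print(max(-1.2, -3.4))        # Max-Log-MAP approximation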
In addition to a prediction of the output value associated with their inputs, these algorithms produce a confidence measure or reliability estimate associated with the output (the values L1 and L2 in the diagram).
That is, each decoder has “soft” inputs and outputs, which reflect the likelihood that the decoder has determined the correct value for the data.
The key to these algorithms is that they cannot directly determine the correct answer for a given input, so
they iterate until they converge on the most likely source data sequence that would produce the received
data. If the decoder does not converge, the errors in the received data exceed its ability to correct them.
Because code words are accumulated into blocks and interleaved while encoding, it is necessary to fill up at
least some large portion of that block in the receiver before decoding can begin. The subsequent iterations
of the decoder introduce additional delay. It is therefore of interest to find decoder implementations that
are simple, fast, and reasonably close to the optimal decoder.
THE TURBO DECODER AS A BAYESIAN BELIEF NETWORK
A Bayesian network is a common construct used in the field of artificial intelligence to represent the relationships between random variables. For instance, consider a simple network consisting of a number of source variables Ui, which have no dependencies on other variables, a hidden variable X, and a number of observation variables Yj. If there is a probability associated with each possible value of X based on the values of each of the Ui, and a probability associated with each possible value of Yj based on the value of X, this is called a Bayesian belief network. In 1998, McEliece, MacKay, and Cheng published a paper in which they analyzed parallel concatenated turbo codes (and other
recently developed codes) as belief networks ([5]).
Framing the turbo decoder as a Bayesian belief network allows the application of a number of well-known
techniques from the field of AI for analysis and for the design of solutions.
Turbo Decoding as Belief Propagation
Given a belief network, if the values of one or more variables are observed or measured, this set of variables can be considered as evidence. The fundamental probabilistic inference problem is to calculate the
updated or a posteriori probabilities of the other variables given this evidence. Obviously, if the number of
nodes in the network increases, the effort involved to compute these probabilities grows rapidly; in fact, the
solution in a general network has been shown to be NP-hard. Fortunately, there are a number of algorithms, including Judea Pearl's belief propagation algorithm, which can be applied if the network is in fact
a tree (i.e., there are no loops in the network). This algorithm can significantly simplify the inference problem and solve it in a distributed fashion ([6]).
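To make the message passing concrete, here is a small example (with arbitrary probability tables) of Pearl-style likelihood messages on a tree-structured network U -> X -> Y: observing Y, the evidence is propagated up the chain and combined with the prior to give the posterior on U.

    # Pearl-style message passing on a tiny chain-structured belief network
    #   U -> X -> Y, binary variables, CPTs chosen arbitrarily for illustration.

    P_U = [0.6, 0.4]                        # prior P(U)
    P_X_given_U = [[0.9, 0.1],              # P(X | U = 0)
                   [0.2, 0.8]]              # P(X | U = 1)
    P_Y_given_X = [[0.7, 0.3],              # P(Y | X = 0)
                   [0.1, 0.9]]              # P(Y | X = 1)

    y_obs = 1                               # the observed evidence

    # lambda message from Y to X: likelihood of the evidence for each value of X
    lam_X = [P_Y_given_X[x][y_obs] for x in (0, 1)]

    # lambda message from X to U: sum out X
    lam_U = [sum(P_X_given_U[u][x] * lam_X[x] for x in (0, 1)) for u in (0, 1)]

    # combine with the prior and normalize to obtain the posterior P(U | Y = y)
    unnorm = [P_U[u] * lam_U[u] for u in (0, 1)]
    posterior = [p / sum(unnorm) for p in unnorm]
    print(posterior)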
However, McEliece, et al., pointed out that experiments had demonstrated that Pearl's algorithm works "approximately" for some loopy networks, and they presented an interpretation of the turbo decoder as a belief network.
They then proved that if Pearl’s belief propagation algorithm is applied to this network, it yields an algorithm that is identical to the turbo decoding algorithm. They also showed that the use of belief propagation
on other coding schemes also generated their iterative decoding algorithms. This proved to be a much-cited
result in the error-correction and coding community, and initiated a large amount of research into the interpretation of codes as graphs and the subsequent decoding as a graph theory problem. Subsequent attention
has been given to the representation of decoders as Tanner graphs and more recently as factor graphs; this
is an active area of research.
NEURAL NETWORKS
Given the iterative nature of the turbo decoding process, there must be some stopping criteria that determine when the decoder has converged, or if the decoder will not converge (due to the presence of uncorrectable errors). One possibility is to incorporate an outer error detecting code, such as a cyclic redundancy
check (CRC) into the code. Another is to continue iterations until the measured variance of the estimates
produced by the decoders drops below a pre-set threshold. In 1996, Joachim Hagenauer, Elke Offer, and L.
Papke published a paper describing a scheme in which the cross entropy of the decoder outputs is tracked
to detect convergence ([7]). A threshold is still required to halt the iterations if they are not converging. In
2000, Michael E. Buckley and Stephen Wicker proposed a mechanism whereby a neural network can be
trained to monitor this cross entropy and to predict whether there are errors in the decoded data ([8]). Even
more importantly, a similar network can be trained to predict whether the decoding process will produce
errors, and it can make this prediction early in the decoding process. This allows the network to be used
within an automatic repeat request (ARQ) protocol to quickly and accurately request the retransmission of
corrupted data frames.
The cross-entropy of the output of the two decoders is given by the following equation:
D = \sum_{k=1}^{N} \sum_{a=0}^{1} p(\hat{u}_k = a \mid Y) \log \frac{p(\hat{u}_k = a \mid Y)}{q(\hat{u}_k = a \mid Y)}
where the functions p and q are the estimates of the a posteriori probabilities output by each decoder at the
end of an iteration and û_k is the estimate of the data bit u_k, for each of the N data bits. Hagenauer, et al.
showed that using a threshold on this entropy measurement, as compared to using a preset limit on the iteration count, had a minimal impact on bit-error rate (BER) while greatly reducing the number of iterations.
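A direct translation of this stopping rule into code might look like the following sketch, where the two lists hold each decoder's a posteriori probabilities P(u_k = 1 | Y) at the end of an iteration; the threshold and iteration limit are illustrative values, not those used by Hagenauer, et al.

    import math

    def cross_entropy(p_post, q_post, eps=1e-12):
        # Cross-entropy D between the a posteriori bit probabilities produced
        # by the two component decoders; p_post and q_post hold P(u_k = 1 | Y)
        # for each of the N data bits.
        d = 0.0
        for p1, q1 in zip(p_post, q_post):
            for p, q in ((1.0 - p1, 1.0 - q1), (p1, q1)):   # a = 0, then a = 1
                d += p * math.log((p + eps) / (q + eps))
        return d

    def should_stop(p_post, q_post, iteration, threshold=1e-3, max_iterations=8):
        # Halt when the decoders have (nearly) agreed, or give up after a
        # preset number of iterations if they are not converging.
        return cross_entropy(p_post, q_post) < threshold or iteration >= max_iterations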
Buckley and Wicker noticed that there was also a relationship between the cross entropy and errors in the
decoded data. They determined that by using the cross entropy after several iterations of the decoder as
input to a neural network, they could very accurately predict the presence of decoder errors. They proposed
the use of two different neural networks tied to the decoder; the first is the “Future Error Detecting Network” (FEDN), which processes the outputs from the first d iterations of the decoder and predicts whether
the decoding should be halted prematurely and a request for retransmission of the frame should be made,
and the second network is the “Decoder Error Detecting Network” (DEDN), which uses the output from
every iteration of the decoder and predicts the correctness of the resulting output. Each of these networks is
a simple feed-forward neural network with a single hidden layer.
Buckley and Wicker described the process by which the best network topologies were chosen and how the
networks were trained. They then presented a detailed analysis of the performance of neural net-assisted
turbo decoders using different stopping conditions and different parameters for the allowed false error rate
and undetected frame error rate. They concluded that these decoders could be used in conjunction with a
hybrid-ARQ protocol in lieu of using a CRC to detect errors – the decoder is less complex, retransmission
requests can be generated sooner, and the encoded bit stream requires less overhead. Their simulations
predict that the reliability should be comparable to a CRC-based decoder at all signal-to-noise ratios of interest.
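The paper's training procedure and exact topologies are not reproduced here, but the basic structure of such a predictor, a single-hidden-layer feed-forward network whose inputs are the per-iteration cross-entropy values and whose output is an estimated probability that the decoded frame is in error, can be sketched as follows (the weights are random placeholders and would in practice be learned from labelled decoder traces):

    import numpy as np

    rng = np.random.default_rng(0)

    class ErrorPredictor:
        # Minimal single-hidden-layer feed-forward network in the spirit of the
        # DEDN/FEDN described by Buckley and Wicker: inputs are cross-entropy
        # values from the decoder iterations, the output is an estimated
        # probability that the decoded frame contains errors.
        def __init__(self, n_inputs, n_hidden=8):
            self.W1 = rng.normal(scale=0.5, size=(n_hidden, n_inputs))
            self.b1 = np.zeros(n_hidden)
            self.w2 = rng.normal(scale=0.5, size=n_hidden)
            self.b2 = 0.0

        def predict(self, cross_entropies):
            h = np.tanh(self.W1 @ cross_entropies + self.b1)   # hidden layer
            z = self.w2 @ h + self.b2
            return 1.0 / (1.0 + np.exp(-z))                    # sigmoid output

    # FEDN-style use: look only at the first d iterations and decide whether to
    # abandon decoding and request retransmission of the frame.
    d = 3
    fedn = ErrorPredictor(n_inputs=d)
    print(fedn.predict(np.array([5.2, 1.9, 1.4])))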
Other Neural Network Coding Approaches
There was a significant amount of work which preceded the paper by Buckley and Wicker regarding the
use of neural networks in relation to error correction coding. Some of the earlier results were reported by
N. Wiberg ([9]) and Wang and Wicker ([10]).
Analog Decoding
One of the things that distinguished turbo decoders from their predecessors was the use of “soft” outputs
from the component decoders – that is, the requirement that each decoder generates a reliability estimate or
probability measure for its output, rather than a simple “hard” decision about whether each bit was a 0 or 1.
In 1998, Joachim Hagenauer proposed the use of analog decoders within the turbo decoder network, rather
than maintenance of the decision values in the digital domain and the use of iterative algorithms ([11]).
This allowed for a highly parallel decoder structure in which iteration was not required; the values of the
decoders feed back in a continuous analog network. Another significant advantage of this approach is that
the entire decoder can be implemented directly in a self-contained analog VLSI circuit. There has been
much subsequent research into analog VLSI decoder implementations.
GENETIC ALGORITHMS
There are a number of parameters of a turbo coder and decoder which can be adjusted and which will yield
different performance depending on the characteristics of the transmission channel. Ezequiel Bertone, Ismail Soto, and Roland Carrasco have recently been exploring the use of genetic algorithms to search for
turbo codes that will yield good performance on a given channel, subject to certain constraints on the modulation available.
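Their encoding of the problem and their fitness function are not described here; the following is only a generic genetic-algorithm skeleton for searching over a pair of hypothetical turbo-code parameters (interleaver length and component-code generator polynomials), with a stand-in fitness function in place of a real BER simulation of the channel.

    import random

    random.seed(1)

    # Hypothetical search space: interleaver length and the octal generator
    # polynomials of the component RSC encoders.
    INTERLEAVER_SIZES = [256, 512, 1024, 2048]
    GENERATORS = [(0o5, 0o7), (0o15, 0o13), (0o17, 0o15), (0o23, 0o35)]

    def simulate_ber(size, generators):
        # Stand-in for a BER simulation of the candidate code over the channel
        # of interest; a real search would run (or approximate) such a simulation.
        return 1.0 / (size * sum(generators)) + random.random() * 1e-6

    def fitness(individual):
        size, generators = individual
        return -simulate_ber(size, generators)      # higher fitness = lower BER

    def random_individual():
        return (random.choice(INTERLEAVER_SIZES), random.choice(GENERATORS))

    def crossover(a, b):
        return (a[0], b[1])                         # swap parameter groups

    def mutate(individual, rate=0.2):
        size, generators = individual
        if random.random() < rate:
            size = random.choice(INTERLEAVER_SIZES)
        if random.random() < rate:
            generators = random.choice(GENERATORS)
        return (size, generators)

    def evolve(generations=20, population_size=12):
        population = [random_individual() for _ in range(population_size)]
        for _ in range(generations):
            population.sort(key=fitness, reverse=True)
            parents = population[: population_size // 2]    # truncation selection
            children = [mutate(crossover(random.choice(parents),
                                         random.choice(parents)))
                        for _ in range(population_size - len(parents))]
            population = parents + children
        return max(population, key=fitness)

    print(evolve())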
TURBO CODING RESEARCH
There are a number of resources on turbo coding to be found on the Web. A good place to start is the Jet
Propulsion Lab’s Turbo Coding page, at http://www331.jpl.nasa.gov/public/JPLtcodes.html.
Robert McEliece, Ph.D., Professor of Electrical Engineering at the California Institute of Technology, leads
a group doing information theoretical research, including continued analysis of turbo codes. They have
proposed a general message passing algorithm, the Generalized Distributive Law, which generalizes a number of algorithms, including turbo decoding and Pearl's belief propagation algorithm ([12]).
Stephen Wicker, Ph.D., Professor and Associate Director in the School of Electrical and Computer Engineering at Cornell University (http://people.ece.cornell.edu/wicker/), is very active in coding theory research and particularly in turbo coding and in the applications of AI techniques to wireless network problems.
Roland A. Carrasco, Ph.D., Professor of Mobile Communications in the School of Computing and Information Technology at the University of Wolverhampton, UK (http://www.scit.wlv.ac.uk/~in8189), heads a
group whose interests include equalization and coding, including turbo codes. Some of their research involves the application of AI techniques, in particular neural networks and genetic algorithms, to coding
problems.
Joachim Hagenauer, Ph.D., Head of the Institute for Communications Engineering at TU Muenchen
(http://www.lnt.e-technik.tu-muenchen.de/mitarbeiter/hagenauer/hagenauer_e.html), is still actively involved in coding research, particularly in analog decoders and equalizers for turbo codes.
FUTURE ISSUES
Active research into turbo codes is ongoing and seems likely to continue. Their use in wireless communications networks is highly desirable due to their excellent performance in power-constrained and high-noise
environments, and this should drive considerable research. One of the biggest problems is latency inherent
in the decoder, due to the iterative nature of the decoding algorithm. It is important to minimize this latency if turbo codes are to be used for channels carrying data of a highly interactive nature, such as digital
voice.
It is likely that significant attention will be paid to the implementation of the decoder in analog VLSI circuits; this is probably the best option for decreasing the decoding latency inherent in turbo codes. However, a drawback is that the decoder parameters are fixed by the implementation; mechanisms by which the
circuit could be reconfigured or adjusted to adapt to changing channel characteristics could further improve
performance.
Given that the turbo decoding problem can be formulated as a probabilistic decision making algorithm, it is
possible that application of AI search techniques might yield additional improvements to the standard digital decoding process. In particular, the study of alternative representations of turbo codes and other codes
as Tanner graphs and as factor graphs might yield additional insights into how to improve these codes and
their decoders.
REFERENCES
[1] Berrou, C., A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding: Turbo
Codes,” in Proc. 1993 Int. Conf. Commun., Geneva, Switzerland, May 1993, pp. 1064-1070.
[2] J. Hagenauer and P. Hoeher, “A Viterbi Algorithm with Soft-Decision Outputs and Its Applications,”
Proceedings of IEEE GLOBECOM, November 1989, Dallas, TX, pp. 1680–1686.
[3] Bahl, L. R., J. Cocke, F. Jelinek, and J. Raviv, "Optimal Decoding of Linear Codes for Minimizing
Symbol Error Rate", IEEE Transactions on Information Theory, vol. 20. no. 2, March 1974, pp. 284287.
[4] Viterbi, A. J., "An Intuitive Justification and Simplified Implementation of the MAP Decoder for Convolutional Codes," IEEE Journal on Selected Areas in Communications, February 1998, pp. 260-264.
[5] McEliece, R., D. MacKay, and J. Cheng, "Turbo Decoding as an Instance of Pearl's 'Belief Propagation' Algorithm", IEEE Journal on Selected Areas in Communications, Vol. 16, No. 2, Feb. 1998, pp.
140-152.
[6] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan
Kaufmann, San Mateo, CA, 1988.
[7] Hagenauer, J., E. Offer, and L. Papke, “Iterative Decoding of Binary Block and Convolutional Codes,”
IEEE Transactions on Information Theory, vol. 42, March 1996, pp. 429–445.
[8] Buckley, M. and S. Wicker, “The Design and Performance of a Neural Network for Predicting Turbo
Decoding Error with Application to Hybrid ARQ Protocols,” IEEE Transactions on Communications,
Vol. 48, No. 4, April 2000, pp. 566-576.
[9] Wiberg, N., Approaches to Neural-Network Decoding of Error-Correcting Codes. Linköping Studies
in Science and Technology, thesis no. 425, Linköping University, Department of Electrical Engineering,
1994.
[10] Wang, X.-A. and S. Wicker, "An Artificial Neural Net Viterbi Decoder," IEEE Transactions on Communications, Vol. 44, Feb. 1996, pp. 165-170.
[11] Hagenauer, J., “Decoding of Binary Codes with Analog Networks,” Proc. 1998 IEEE Information
Theory Workshop, San Diego, CA, Feb. 8-11, 1998, pp. 13-14.
[12] Aji, S. and R. McEliece, “The Generalized Distributive Law,” IEEE Transactions on Information Theory, Vol. 46, No. 2, March 2000, pp. 325-343.