"AI Techniques for Turbo Decoding (Word)"

advertisement
Applying Artificial Intelligence Techniques
to Turbo Decoding
Bob Wall
EE 548
Apr. 30, 2004
Turbo coding techniques are currently a topic of great interest; to date, their performance on noisy channels
comes closest to Shannon's limit. However, there are some drawbacks to the use of turbo codes; one of the
most significant is the delay in decoding. This is due in part to the blocking and interleaving inherent in turbo coding, and in part to the iterative nature of the decoder. It is therefore of great interest to find
ways to implement a relatively simple, fast decoder. The field of Artificial Intelligence provides several
techniques that can be directly applied to this problem.
INTRODUCTION
Shannon’s Capacity Theorem, introduced by Claude Shannon in 1948, states that the maximum data transmission capacity (in bits/second) of a band-limited channel with additive white Gaussian noise (AWGN) is
given by
C = BW log2 (1 + S/N)
That is, the capacity grows linearly with the bandwidth of the channel and logarithmically with the signal-to-noise ratio.
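As a quick numerical illustration of this formula (the bandwidth and SNR values below are chosen arbitrarily), the capacity of a 1 MHz channel at an SNR of 10 dB works out to roughly 3.46 Mbit/s:

    import math

    def channel_capacity(bandwidth_hz, snr_linear):
        # Shannon capacity C = BW * log2(1 + S/N), in bits per second.
        return bandwidth_hz * math.log2(1.0 + snr_linear)

    snr_db = 10.0
    snr = 10 ** (snr_db / 10.0)          # convert dB to a linear ratio
    print(channel_capacity(1e6, snr))    # about 3.46 Mbit/s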
The work by Shannon and contemporaries like Hamming spawned the fields of information theory and
coding theory. There has been a great deal of research over the years on ways to encode information such
that a rate approaching Shannon’s limit can be reached. In practice, it is all but impossible to reach this
capacity, but there is great interest in finding data transmission techniques that will at least approach the
limit. This is of particular interest in power-limited or high-noise environments, such as space communications links and cellular networks.
Coding theory has produced a number of different codes in an effort to improve the error-free transmission
rate; these are typically divided into block codes and convolutional codes. Convolutional codes have an
advantage over block codes in that they can be made more powerful (i.e., able to detect and correct more
errors) without huge increases in the size and complexity of the decoder. However, convolutional codes
are sensitive to burst errors and have some other problems that limit their effectiveness. In an effort to
overcome these problems, concatenated codes were introduced. This involved feeding the output of one
coder (the “outer code”, such as a Reed-Solomon block coder) into another coder (the “inner code”, such as
a convolutional coder). This further improved coding gain (the reduction in the signal-to-noise ratio required to achieve a given bit error rate, or BER), but there was still significant opportunity for improvement.
In 1993, Berrou, Glavieux, and Thitimajshima proposed a variation of concatenated codes that introduced a
pseudo-random interleaver to reorder the input before passing it through the second coder ([1]). They
called these turbo codes, as an analogy to a turbo-charged engine, which uses feedback from the exhaust
system to enhance performance. The principles of turbo coding are described below.
The performance of turbo codes was a significant breakthrough in coding theory – experimental results
demonstrated that with appropriate choices of the component coders and interleaver, it was possible to get
within 0.5 dB of Shannon's limit in certain environments. These results spurred research into understanding
what characteristics of turbo codes caused them to perform so well. One breakthrough was achieved when
the turbo decoding process was interpreted in terms that were not familiar within the framework of coding
theory, but which were common in artificial intelligence – if the turbo decoder was viewed as a special type
of Bayesian network called a belief network, the application of a popular algorithm associated with these
networks directly yielded the turbo decoding algorithm. This discovery provided important insights into
turbo decoding. A summary of belief networks and how they are related to turbo decoding is presented
below.
Another construct common in artificial intelligence, the neural network, has also been applied to turbo decoding; a technique is described below for using neural nets alongside the turbo decoder to accurately predict the presence of errors in decoded data. This method is more reliable and less complex to
implement than a cyclic redundancy check in conjunction with the decoder.
Finally, a brief discussion is presented on the use of genetic algorithms to select the parameters for a turbo
coder given the characteristics of the transmission channel.
TURBO CODE OVERVIEW
The turbo coder as initially proposed by Berrou, et al., was a parallel concatenation of two recursive systematic convolutional (RSC) codes. The key contribution that distinguishes turbo codes from other concatenated codes is
the use of a pseudo-random interleaver, which reorders the data to be encoded before feeding it to the
second encoder. This is a simple block diagram of the encoder:
[Encoder block diagram: the input data d_k is transmitted as the systematic stream D_k and is fed directly to the first encoder RSC1, which produces parity C1,k, and through the interleaver to the second encoder RSC2, which produces parity C2,k.]
There are a number of variations on this theme – the coders do not need to be identical, and in fact one can
be a block coder and the other a convolutional coder. They can also be concatenated serially rather than in
parallel – that is, the output of one coder is used as the input of the other coder.
One explanation for the improvements obtained by turbo coding is that each of the encoders produces a "good" or high-weight code word for most inputs, but produces a "bad" or low-weight code word
for some inputs. The interleaver makes it very unlikely that both encoders will output a bad code word for
the same input, increasing the chance that the decoder will be able to extract the correct information.
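As a rough sketch of this structure, the following fragment implements a rate-1/3 parallel concatenation of two identical RSC encoders with a pseudo-random interleaver. The generator polynomials (7, 5 in octal, a memory-2 code) are a common textbook choice rather than the particular code used by Berrou, et al., and trellis termination is omitted for brevity.

    import random

    def rsc_encode(bits):
        # Recursive systematic convolutional encoder with octal generators (7, 5):
        # feedback polynomial 1 + D + D^2, feedforward polynomial 1 + D^2.
        # Only the parity stream is returned; the systematic stream is the input.
        s1 = s2 = 0                          # two-bit shift register
        parity = []
        for b in bits:
            fb = b ^ s1 ^ s2                 # feedback bit
            parity.append(fb ^ s2)           # feedforward taps 1 and D^2
            s1, s2 = fb, s1                  # shift the register
        return parity

    def turbo_encode(bits, interleaver):
        # Rate-1/3 parallel concatenation: systematic bits, parity from RSC1,
        # and parity from RSC2 operating on the interleaved data.
        parity1 = rsc_encode(bits)
        parity2 = rsc_encode([bits[i] for i in interleaver])
        return bits, parity1, parity2

    # Small pseudo-random interleaver and random data block, for illustration only.
    random.seed(0)
    N = 16
    interleaver = random.sample(range(N), N)
    data = [random.randint(0, 1) for _ in range(N)]
    systematic, parity1, parity2 = turbo_encode(data, interleaver)
    print(systematic, parity1, parity2)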
This is a block diagram of the corresponding decoder:
[Decoder block diagram: the received parity streams C1,k and C2,k and the received systematic data D_k feed Decoder 1 and Decoder 2, which exchange the reliability estimates L1 and L2 through an interleaver and deinterleaver; when the termination condition stops the iteration, a threshold detector on the (re-interleaved) soft output produces the output data.]
The use of a large block interleaver dramatically increases the state space of the encoded message, making
the creation of a deterministic optimal decoder impractical. Instead, it is necessary to use an iterative algorithm to perform the decoding. There are two typical approaches used – the Soft Output Viterbi Algorithm
(SOVA) ([2]) and the Maximum A Posteriori (MAP) algorithm, often called the BCJR algorithm after its
inventors ([3]). The BCJR algorithm is more complicated than the SOVA algorithm, but performs better.
However, its complexity can be reduced by implementing it in the log domain, which transforms multiplications into additions ([4]).
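The log-domain simplification rests on the Jacobian logarithm: products of probabilities become sums of their logarithms, while sums of probabilities become a max operation plus a small correction term. A minimal illustration of this operation (not the full Log-MAP decoder) is:

    import math

    def max_star(a, b):
        # Jacobian logarithm: log(exp(a) + exp(b)) computed without leaving the
        # log domain. This replaces sums of probabilities in the Log-MAP
        # algorithm; dropping the correction term gives the cheaper
        # Max-Log-MAP approximation.
        return max(a, b) + math.log1p(math.exp(-abs(a - b)))

    # Combining two log-domain branch metrics:
    print(max_star(-1.2, -3.4))   # exact log-sum, about -1.095
    print(max(-1.2, -3.4))        # Max-Log-MAP approximation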
In addition to a prediction of the output value associated with their inputs, these algorithms produce a confidence measure or reliability estimate associated with the output (the values L1 and L2 in the diagram).
That is, each decoder has “soft” inputs and outputs, which reflect the likelihood that the decoder has determined the correct value for the data.
The key to these algorithms is that they cannot directly determine the correct answer for a given input, so
they iterate until they converge on the most likely source data sequence that would produce the received
data. If the decoder does not converge, the errors in the received data exceed its ability to correct them.
Because code words are accumulated into blocks and interleaved while encoding, it is necessary to fill up at
least some large portion of that block in the receiver before decoding can begin. The subsequent iterations
of the decoder introduce additional delay. It is therefore of interest to find decoder implementations that
are simple, fast, and reasonably close to the optimal decoder.
THE TURBO DECODER AS A BAYESIAN BELIEF NETWORK
A Bayesian network is a common construct used in the field of artificial intelligence to represent the relationships between random variables. For instance, consider a simple network consisting of a number of source variables Ui, which have no dependencies on other variables, a hidden variable X, and a number of observation variables Yj. If there is a probability associated with each possible value of X based on the values of each of the Ui, and a probability associated with each possible value of Yj based on the value of X, this is called a Bayesian belief network. In 1998, McEliece, MacKay, and Cheng published a paper in which they analyzed parallel concatenated turbo codes (and other
recently developed codes) as belief networks ([5]).
Framing the turbo decoder as a Bayesian belief network allows the application of a number of well-known
techniques from the field of AI for analysis and for the design of solutions.
Turbo Decoding as Belief Propagation
Given a belief network, if the values of one or more variables are observed or measured, this set of variables can be considered as evidence. The fundamental probabilistic inference problem is to calculate the
updated or a posteriori probabilities of the other variables given this evidence. Obviously, if the number of
nodes in the network increases, the effort involved to compute these probabilities grows rapidly; in fact, the
solution in a general network has been shown to be NP-hard. Fortunately, there are a number of algorithms, including Judea Pearl's belief propagation algorithm, which can be applied if the network is in fact
a tree (i.e., there are no loops in the network). This algorithm can significantly simplify the inference problem and solve it in a distributed fashion ([6]).
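To make the message passing concrete, here is a small example (with arbitrary probability tables) of Pearl-style likelihood messages on a tree-structured network U -> X -> Y: observing Y, the evidence is propagated up the chain and combined with the prior to give the posterior on U.

    # Pearl-style message passing on a tiny chain-structured belief network
    #   U -> X -> Y, binary variables, CPTs chosen arbitrarily for illustration.

    P_U = [0.6, 0.4]                        # prior P(U)
    P_X_given_U = [[0.9, 0.1],              # P(X | U = 0)
                   [0.2, 0.8]]              # P(X | U = 1)
    P_Y_given_X = [[0.7, 0.3],              # P(Y | X = 0)
                   [0.1, 0.9]]              # P(Y | X = 1)

    y_obs = 1                               # the observed evidence

    # lambda message from Y to X: likelihood of the evidence for each value of X
    lam_X = [P_Y_given_X[x][y_obs] for x in (0, 1)]

    # lambda message from X to U: sum out X
    lam_U = [sum(P_X_given_U[u][x] * lam_X[x] for x in (0, 1)) for u in (0, 1)]

    # combine with the prior and normalize to obtain the posterior P(U | Y = y)
    unnorm = [P_U[u] * lam_U[u] for u in (0, 1)]
    posterior = [p / sum(unnorm) for p in unnorm]
    print(posterior)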
However, McEliece, et al., pointed out that experiments had demonstrated that Pearl's algorithm works "approximately" for some loopy networks, and they presented an interpretation of the turbo decoder as a belief network.
They then proved that if Pearl’s belief propagation algorithm is applied to this network, it yields an algorithm that is identical to the turbo decoding algorithm. They also showed that the use of belief propagation
on other coding schemes also generated their iterative decoding algorithms. This proved to be a much-cited
result in the error-correction and coding community, and initiated a large amount of research into the interpretation of codes as graphs and the subsequent decoding as a graph theory problem. Subsequent attention
has been given to the representation of decoders as Tanner graphs and more recently as factor graphs; this
is an active area of research.
NEURAL NETWORKS
Given the iterative nature of the turbo decoding process, there must be some stopping criteria that determine when the decoder has converged, or if the decoder will not converge (due to the presence of uncorrectable errors). One possibility is to incorporate an outer error detecting code, such as a cyclic redundancy
check (CRC) into the code. Another is to continue iterations until the measured variance of the estimates
produced by the decoders drops below a pre-set threshold. In 1996, Joachim Hagenauer, Elke Offer, and L.
Papke published a paper describing a scheme in which the cross entropy of the decoder outputs is tracked
to detect convergence ([7]). A threshold is still required to halt the iterations if they are not converging. In
2000, Michael E. Buckley and Stephen Wicker proposed a mechanism whereby a neural network can be
trained to monitor this cross entropy and to predict whether there are errors in the decoded data ([8]). Even
more importantly, a similar network can be trained to predict whether the decoding process will produce
errors, and it can make this prediction early in the decoding process. This allows the network to be used
within an automatic repeat request (ARQ) protocol to quickly and accurately request the retransmission of
corrupted data frames.
The cross-entropy of the output of the two decoders is given by the following equation:
D = \sum_{k=1}^{N} \sum_{a=0}^{1} p(\hat{u}_k = a \mid Y) \log \frac{p(\hat{u}_k = a \mid Y)}{q(\hat{u}_k = a \mid Y)}
where the functions p and q are the estimates of the a posteriori probabilities output by each decoder at the
end of an iteration and û_k is the estimate of the data bit u_k, for each of the N data bits. Hagenauer, et al.
showed that using a threshold on this entropy measurement, as compared to using a preset limit on the iteration count, had a minimal impact on bit-error rate (BER) while greatly reducing the number of iterations.
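A direct translation of this stopping rule into code might look like the following sketch, where the two lists hold each decoder's a posteriori probabilities P(u_k = 1 | Y) at the end of an iteration; the threshold and iteration limit are illustrative values, not those used by Hagenauer, et al.

    import math

    def cross_entropy(p_post, q_post, eps=1e-12):
        # Cross-entropy D between the a posteriori bit probabilities produced
        # by the two component decoders; p_post and q_post hold P(u_k = 1 | Y)
        # for each of the N data bits.
        d = 0.0
        for p1, q1 in zip(p_post, q_post):
            for p, q in ((1.0 - p1, 1.0 - q1), (p1, q1)):   # a = 0, then a = 1
                d += p * math.log((p + eps) / (q + eps))
        return d

    def should_stop(p_post, q_post, iteration, threshold=1e-3, max_iterations=8):
        # Halt when the decoders have (nearly) agreed, or give up after a
        # preset number of iterations if they are not converging.
        return cross_entropy(p_post, q_post) < threshold or iteration >= max_iterations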
Buckley and Wicker noticed that there was also a relationship between the cross entropy and errors in the
decoded data. They determined that by using the cross entropy after several iterations of the decoder as
input to a neural network, they could very accurately predict the presence of decoder errors. They proposed
the use of two different neural networks tied to the decoder; the first is the “Future Error Detecting Network” (FEDN), which processes the outputs from the first d iterations of the decoder and predicts whether
the decoding should be halted prematurely and a request for retransmission of the frame should be made,
and the second network is the “Decoder Error Detecting Network” (DEDN), which uses the output from
every iteration of the decoder and predicts the correctness of the resulting output. Each of these networks is
a simple feed-forward neural network with a single hidden layer.
Buckley and Wicker described the process by which the best network topologies were chosen and how the
networks were trained. They then presented a detailed analysis of the performance of neural net-assisted
turbo decoders using different stopping conditions and different parameters for the allowed false error rate
and undetected frame error rate. They concluded that these decoders could be used in conjunction with a
hybrid-ARQ protocol in lieu of using a CRC to detect errors – the decoder is less complex, retransmission
requests can be generated sooner, and the encoded bit stream requires less overhead. Their simulations
predict that the reliability should be comparable to a CRC-based decoder at all signal-to-noise ratios of interest.
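The paper's training procedure and exact topologies are not reproduced here, but the basic structure of such a predictor, a single-hidden-layer feed-forward network whose inputs are the per-iteration cross-entropy values and whose output is an estimated probability that the decoded frame is in error, can be sketched as follows (the weights are random placeholders and would in practice be learned from labelled decoder traces):

    import numpy as np

    rng = np.random.default_rng(0)

    class ErrorPredictor:
        # Minimal single-hidden-layer feed-forward network in the spirit of the
        # DEDN/FEDN described by Buckley and Wicker: inputs are cross-entropy
        # values from the decoder iterations, the output is an estimated
        # probability that the decoded frame contains errors.
        def __init__(self, n_inputs, n_hidden=8):
            self.W1 = rng.normal(scale=0.5, size=(n_hidden, n_inputs))
            self.b1 = np.zeros(n_hidden)
            self.w2 = rng.normal(scale=0.5, size=n_hidden)
            self.b2 = 0.0

        def predict(self, cross_entropies):
            h = np.tanh(self.W1 @ cross_entropies + self.b1)   # hidden layer
            z = self.w2 @ h + self.b2
            return 1.0 / (1.0 + np.exp(-z))                    # sigmoid output

    # FEDN-style use: look only at the first d iterations and decide whether to
    # abandon decoding and request retransmission of the frame.
    d = 3
    fedn = ErrorPredictor(n_inputs=d)
    print(fedn.predict(np.array([5.2, 1.9, 1.4])))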
Other Neural Network Coding Approaches
There was a significant amount of work which preceded the paper by Buckley and Wicker regarding the
use of neural networks in relation to error correction coding. Some of the earlier results were reported by
N. Wiberg ([9]) and Wang and Wicker ([10]).
Analog Decoding
One of the things that distinguished turbo decoders from their predecessors was the use of “soft” outputs
from the component decoders – that is, the requirement that each decoder generates a reliability estimate or
probability measure for its output, rather than a simple “hard” decision about whether each bit was a 0 or 1.
In 1998, Joachim Hagenauer proposed the use of analog decoders within the turbo decoder network, rather
than maintenance of the decision values in the digital domain and the use of iterative algorithms ([11]).
This allowed for a highly parallel decoder structure in which iteration was not required; the values of the
decoders feed back in a continuous analog network. Another significant advantage of this approach is that
the entire decoder can be implemented directly in a self-contained analog VLSI circuit. There has been
much subsequent research into analog VLSI decoder implementations.
GENETIC ALGORITHMS
There are a number of parameters of a turbo coder and decoder which can be adjusted and which will yield
different performance depending on the characteristics of the transmission channel. Ezequiel Bertone, Ismail Soto, and Roland Carrasco have recently been exploring the use of genetic algorithms to search for
turbo codes that will yield good performance on a given channel, subject to certain constraints on the modulation available.
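Their encoding of the problem and their fitness function are not described here; the following is only a generic genetic-algorithm skeleton for searching over a pair of hypothetical turbo-code parameters (interleaver length and component-code generator polynomials), with a stand-in fitness function in place of a real BER simulation of the channel.

    import random

    random.seed(1)

    # Hypothetical search space: interleaver length and the octal generator
    # polynomials of the component RSC encoders.
    INTERLEAVER_SIZES = [256, 512, 1024, 2048]
    GENERATORS = [(0o5, 0o7), (0o15, 0o13), (0o17, 0o15), (0o23, 0o35)]

    def simulate_ber(size, generators):
        # Stand-in for a BER simulation of the candidate code over the channel
        # of interest; a real search would run (or approximate) such a simulation.
        return 1.0 / (size * sum(generators)) + random.random() * 1e-6

    def fitness(individual):
        size, generators = individual
        return -simulate_ber(size, generators)      # higher fitness = lower BER

    def random_individual():
        return (random.choice(INTERLEAVER_SIZES), random.choice(GENERATORS))

    def crossover(a, b):
        return (a[0], b[1])                         # swap parameter groups

    def mutate(individual, rate=0.2):
        size, generators = individual
        if random.random() < rate:
            size = random.choice(INTERLEAVER_SIZES)
        if random.random() < rate:
            generators = random.choice(GENERATORS)
        return (size, generators)

    def evolve(generations=20, population_size=12):
        population = [random_individual() for _ in range(population_size)]
        for _ in range(generations):
            population.sort(key=fitness, reverse=True)
            parents = population[: population_size // 2]    # truncation selection
            children = [mutate(crossover(random.choice(parents),
                                         random.choice(parents)))
                        for _ in range(population_size - len(parents))]
            population = parents + children
        return max(population, key=fitness)

    print(evolve())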
TURBO CODING RESEARCH
There are a number of resources on turbo coding to be found on the Web. A good place to start is the Jet
Propulsion Lab’s Turbo Coding page, at http://www331.jpl.nasa.gov/public/JPLtcodes.html.
Robert McEliece, Ph.D., Professor of Electrical Engineering at the California Institute of Technology, leads
a group doing information theoretical research, including continued analysis of turbo codes. They have
proposed a general message passing algorithm, the Generalized Distributive Law, which generalizes a number of algorithms, including turbo decoding and Pearl's belief propagation algorithm ([12]).
Stephen Wicker, Ph.D., Professor and Associate Director in the School of Electrical and Computer Engineering at Cornell University (http://people.ece.cornell.edu/wicker/), is very active in coding theory research and particularly in turbo coding and in the applications of AI techniques to wireless network problems.
Roland A. Carrasco, Ph.D., Professor of Mobile Communications in the School of Computing and Information Technology at the University of Wolverhampton, UK (http://www.scit.wlv.ac.uk/~in8189), heads a
group whose interests include equalization and coding, including turbo codes. Some of their research involves the application of AI techniques, in particular neural networks and genetic algorithms, to coding
problems.
Joachim Hagenauer, Ph.D., Head of the Institute for Communications Engineering at TU Muenchen
(http://www.lnt.e-technik.tu-muenchen.de/mitarbeiter/hagenauer/hagenauer_e.html), is still actively involved in coding research, particularly in analog decoders and equalizers for turbo codes.
FUTURE ISSUES
Active research into turbo codes is ongoing and seems likely to continue. Their use in wireless communications networks is highly desirable due to their excellent performance in power-constrained and high-noise
environments, and this should drive considerable research. One of the biggest problems is latency inherent
in the decoder, due to the iterative nature of the decoding algorithm. It is important to minimize this latency if turbo codes are to be used for channels carrying data of a highly interactive nature, such as digital
voice.
It is likely that significant attention will be paid to the implementation of the decoder in analog VLSI circuits; this is probably the best option for decreasing the decoding latency inherent in turbo codes. However, a drawback is that the decoder parameters are fixed by the implementation; mechanisms by which the
circuit could be reconfigured or adjusted to adapt to changing channel characteristics could further improve
performance.
Given that the turbo decoding problem can be formulated as a probabilistic decision making algorithm, it is
possible that application of AI search techniques might yield additional improvements to the standard digital decoding process. In particular, the study of alternative representations of turbo codes and other codes
as Tanner graphs and as factor graphs might yield additional insights into how to improve these codes and
their decoders.
REFERENCES
[1] Berrou, C., A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding: Turbo
Codes,” in Proc. 1993 Int. Conf. Commun., Geneva, Switzerland, May 1993, pp. 1064-1070.
[2] J. Hagenauer and P. Hoeher, “A Viterbi Algorithm with Soft-Decision Outputs and Its Applications,”
Proceedings of IEEE GLOBECOM, November 1989, Dallas, TX, pp. 1680–1686.
[3] Bahl, L. R., J. Cocke, F. Jelinek, and J. Raviv, "Optimal Decoding of Linear Codes for Minimizing
Symbol Error Rate", IEEE Transactions on Information Theory, vol. 20. no. 2, March 1974, pp. 284287.
[4] Viterbi, A. J., "An Intuitive Justification and Simplified Implementation of the MAP Decoder for Convolutional Codes," IEEE Journal on Selected Areas in Communications, February 1998, pp. 260-264.
[5] McEliece, R., D. MacKay, and J. Cheng, "Turbo Decoding as an Instance of Pearl's 'Belief Propagation' Algorithm", IEEE Journal on Selected Areas in Communications, Vol. 16, No. 2, Feb. 1998, pp.
140-152.
[6] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan
Kaufmann, San Mateo, CA, 1988.
[7] Hagenauer, J., E. Offer, and L. Papke, “Iterative Decoding of Binary Block and Convolutional Codes,”
IEEE Transactions on Information Theory, vol. 42, March 1996, pp. 429–445.
[8] Buckley, M. and S. Wicker, “The Design and Performance of a Neural Network for Predicting Turbo
Decoding Error with Application to Hybrid ARQ Protocols,” IEEE Transactions on Communications,
Vol. 48, No. 4, April 2000, pp. 566-576.
[9] Wiberg, N., Approaches to Neural-Network Decoding of Error-Correcting Codes. Linköping Studies
in Science and Technology, thesis no. 425, Linköping University, Department of Electrical Engineering,
1994.
[10] Wang, X.-A. and S. Wicker, "An Artificial Neural Net Viterbi Decoder," IEEE Transactions on Communications, Vol. 44, Feb. 1996, pp. 165-170.
[11] Hagenauer, J., “Decoding of Binary Codes with Analog Networks,” Proc. 1998 IEEE Information
Theory Workshop, San Diego, CA, Feb. 8-11, 1998, pp. 13-14.
[12] Aji, S. and R. McEliece, “The Generalized Distributive Law,” IEEE Transactions on Information Theory, Vol. 46, No. 2, March 2000, pp. 325-343.