Adaptive Turbo Decoder with FSM Based Interleaver Address Generator

International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 3 – March 2015, ISSN: 2231-5381

Varsha Ramesh (II Year M.E. Student) and M. Thangamani (Assistant Professor)
ECE Department, Dhanalakshmi Srinivasan College of Engineering,
Coimbatore, Tamil Nadu, India (Anna University)
Abstract---The fast spread of wireless data communication systems and the ever-increasing demand for faster data rates require fast design, implementation and test of new wireless algorithms and architectures for data communications. The most popular communication decoder, the turbo decoder, requires an exponential increase in hardware complexity to achieve greater decoding accuracy. The interleaver is a critical component of a turbo decoder. In this work we first utilize the balanced scheduling scheme to avoid memory reading conflicts. Then, based on the statistical property of memory conflicts, the other critical parameters (access timing, power and area) are reduced by using an alternative interleaver address generator (IAG), the FSM based IAG. This IAG is an efficient and fast parallel interleaver architecture supporting both interleaving and deinterleaving modes. We also describe the analysis and implementation of a reduced-complexity decoding approach, the adaptive turbo decoding architecture (ATA). Our ATA design takes full advantage of algorithm parallelism and specialization. To achieve improved decoder performance, run-time dynamic reconfiguration is used in response to changing channel noise conditions. Implementation parameters for the decoder have been determined through simulation, and the decoder has been implemented on a Xilinx XC4036 device.
Keywords--- Forward Error Correction, Turbo decoder, Trellis tree diagram.

I. INTRODUCTION
The growth of high-performance wireless communication systems has increased drastically over the last few years. Due to rapid advancements and changes in radio communication systems, there is always a need for flexible and general-purpose solutions for processing the data [1]. The solution not only requires adapting to the variances within a particular standard but also needs to cover a range of standards to enable a true multimode environment.
To handle fast transitions between different standards, a fast and accurate platform is needed in both mobile devices and especially in base stations. Including symbol processing, one of the challenging areas is the provision of flexible subsystems for forward error correction (FEC). FEC subsystems can further be divided into two categories: channel coding/decoding and interleaving/deinterleaving. Among these, interleavers and deinterleavers appear to be more silicon-consuming due to the silicon cost of the permutation tables used in conventional approaches. Therefore, hardware reuse among different interleaver modules to support a multimode processing platform is of significance. This paper introduces a flexible and low-cost hardware interleaver architecture and an adaptive turbo architecture for the trellis tree structure, covering a range of interleavers admitted in various communication standards.
II. BACKGROUND
Error correcting codes [9] can be used to detect and correct data transmission errors in communication channels. The encoding is accomplished through the addition of redundant bits to transmitted information symbols. These redundant bits provide decoders with the capability to correct transmission errors. In convolutional coding, the encoded output of a transmitter (encoder) depends not only on the set of encoder inputs received within a particular time step, but also on the set of inputs received within a previous span of K-1 time units, where K is greater than 1. The parameter K is the constraint length of the code.
A convolutional encoder is characterized by the number of output bits per input bit (v), the number of input bits accepted at a time (b), and the constraint length (K), leading to the representation (v, b, K). A (2, 1, 3) convolutional encoder, for example, accepts one input bit per time step and generates two output bits. The two output bits depend on the present input and the previous two input bits. The constraint length K indicates the number of times each input bit has an effect on producing output bits. Larger constraint lengths, i.e. K = 9 or higher, are preferable since they allow for more accurate error correction.
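As a sketch, a (2, 1, 3) encoder of this kind can be modeled in a few lines of Python. The generator taps used here (the classic octal 7 and 5 polynomial pair) are an illustrative assumption, since the paper does not state them:

```python
def conv_encode_2_1_3(bits, g1=0b111, g2=0b101):
    """(2, 1, 3) convolutional encoder: v=2 outputs per b=1 input, K=3.

    The taps g1, g2 are the classic octal (7, 5) generator pair,
    assumed here for illustration; the paper does not specify them.
    """
    state = 0                              # the K-1 = 2 previous input bits
    out = []
    for bit in bits:
        reg = (bit << 2) | state           # newest bit in the MSB position
        o1 = bin(reg & g1).count("1") & 1  # parity of the g1 taps
        o2 = bin(reg & g2).count("1") & 1  # parity of the g2 taps
        out.append((o1, o2))
        state = (reg >> 1) & 0b11          # shift register: drop oldest bit
    return out
```

For the input b = (1, 0, 1, 1) this sketch emits the symbol sequence 11, 10, 00, 01, matching the usual textbook trellis for this generator pair.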
The operation of the encoder can be represented by a state diagram: nodes represent the present state of the shift register, while edges represent the output sequence and point to the next state of transition. The trellis diagram is a time-ordered mapping of encoder state, with each possible state represented by a point on the vertical axis. A node shows the present state of the shift register at a specific point in time, while edges represent the output sequence and point to the next state of transition. The upper branch leaving a node implies an input of 0, while the lower branch implies an input of 1.
The function of the decoder is to attempt to reconstruct the input sequence transmitted by the encoder by evaluating the received channel output. Values received by the decoder may differ from values sent by the encoder due to channel noise. The interaction between states represented by the trellis diagram is used by a decoder to determine the likely transmitted data sequence [10] as v-bit symbols are received. At each node, the cumulative cost, or path metric, of the path is determined. After a series of time steps known as the truncation length (TL), the lowest-cost (minimum-distance) path is determined, and it identifies the most likely transmitted symbol sequence. The value of the truncation length depends on the noise in the channel and has been empirically found to be 3-5 times the constraint length [9]. Each path in the trellis diagram is represented by a unique set of inputs; for example, the lowest-cost path may correspond to the input sequence b = (0110). The performance of a decoder is characterized by the number of decoded output bits that are in error, the Bit Error Rate (BER): the ratio of the number of bits in error to the total number of bits transmitted. For accurate communication fidelity it is desirable to achieve a low BER.
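The BER definition above is a simple ratio; a minimal helper makes it concrete:

```python
def bit_error_rate(sent, received):
    """BER: ratio of decoded bits in error to total bits transmitted."""
    if len(sent) != len(received):
        raise ValueError("sequences must have equal length")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)
```

For example, one flipped bit in a four-bit sequence gives a BER of 0.25.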
The most popular decoding approach for convolutional codes, the turbo algorithm [9], determines a minimum-distance path with regard to the Hamming distances computed for each received symbol. A limiting factor in turbo decoder implementations is the need to preserve candidate paths at all 2^(K-1) trellis states with each received symbol. This leads to exponential growth in the amount of computation performed and the amount of path storage retained as the constraint length K grows. The architecture of the turbo algorithm [9] is split into three parts: the branch metric generators (BMG), the add-compare-select (ACS) units, and the survivor memory unit. A BMG unit determines the Hamming distances between received and expected symbols. An ACS unit determines path costs and identifies lowest-cost paths. The survivor memory stores the lowest-cost bit-sequence paths based on decisions made by the ACS units.
III. RELATED WORK
An efficient parallel decoding approach called the segmented sliding window (SSW) approach was introduced by Zhongfeng. The idea is to divide a decoding frame into many sliding blocks and assign these sliding blocks to several segments. Each segment consists of consecutive sliding blocks, and adjacent segments overlap by exactly two sliding blocks. When the sliding window approach is employed for decoding each segment, the performance of the parallel decoder is expected to be almost the same as using the sliding window approach for the whole frame [9].
Maurizio [10] presented a flexible UMTS/WiMax turbo decoder architecture together with a parallel WiMax interleaver architecture. Compared to a single-mode parallel WiMax architecture, the proposed one exhibits a limited complexity overhead. Moreover, compared to a separated dual-mode UMTS/WiMax turbo decoder architecture, it achieves a 17.1% logic reduction and a 27.3% memory reduction. Besides this, in order to cope with the severe transmission environments typical of wireless systems, channel codes ought to be adopted.
The first dedicated approach that finds a conflict-free memory mapping for every type of code and every degree of parallelism in polynomial time is presented in [11]. The implementation of this highly efficient algorithm shows significant improvement in computational time compared to state-of-the-art approaches. This could enable the memory mapping algorithm to be embedded on chips and executed on the fly to support multiple block lengths and standards.
Vosoughi proposes a novel, highly scalable algorithm and architecture for on-the-fly parallel interleaved address generation in the UMTS/HSPA+ standard. The algorithm generates an interleaved memory address from an original input address without building the complete interleaving pattern or storing it. Ahmad Sghaier et al. [7] have described a look-up-table based method for address generation of the interleaver used in IEEE 802.11 WLAN. In [6] a special matrix-based architecture for a multimode WLAN block interleaver is presented.
Algebraic constructions are of particular interest because they admit analytical designs and simple, practical hardware implementations. Sun and Takeshita have shown that the class of quadratic permutation polynomials over integer rings provides excellent performance for turbo codes. An interleaving method was first introduced in this structure; the developed method is an on-the-fly IAG. To eliminate the disadvantages of the on-the-fly IAG, a new interleaver architecture, the FSM based IAG, is described in the following part.
Several reconfigurable implementations of turbo decoders have been reported. Although these systems are FPGA based, none of them use run-time reconfiguration to achieve performance improvement. Unlike our approach, one such implementation does not evaluate all trellis states in parallel, resulting in slower decoding operation. Racer, a constraint-length-14 turbo decoder, has also been described. The system uses 36 XC4010 FPGAs and seven processor cards and employs a novel approach to implementing survivor memory. Due to the use of a sizable number of FPGAs and significant inter-chip communication, the system area is large. Racer exhibits significant parallelism, although some add-compare-select hardware is multiplexed across multiple trellis states per received symbol. Candidate paths are stored in memory external to the FPGAs. Our ATA approach achieves a fully parallel implementation on a single, large FPGA that contains significantly less total logic than the board used in [16] for the same constraint length (K = 14).
In [5], a turbo decoder of constraint length 7 using four XC4028EX FPGAs is described. The decoder is partitioned so that 64 ACS units fit into two of the FPGAs, while the remaining two FPGAs house the survivor memory and its corresponding controller. The main issue with this approach involves data transfer between the FPGAs. Although [5] allowed for parallel trellis evaluation, only a limited data rate of 12 Kbps was achieved for a relatively small constraint length of 7. This reduced rate was primarily due to inter-chip data transfer overhead. No dynamic reconfiguration was performed.
IV. FINITE STATE MACHINE BASED INTERLEAVER ADDRESS GENERATOR
The interleaver is defined with a two-step permutation. The first permutation ensures that adjacent coded bits are mapped onto non-adjacent subcarriers; the second ensures that adjacent coded bits are mapped alternately onto less or more significant bits of the constellation, avoiding long runs of lowly reliable bits. Here d represents the number of columns of the block interleaver, typically chosen as 16; mk is the output after the first level of permutation, where k varies from 0 to Ncbps - 1; and s is a parameter defined as s = max{1, Ncpc/2}, where Ncpc is the number of coded bits per subcarrier.
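The two-step permutation paraphrased above follows the standard IEEE 802.11a/g block interleaver equations; a sketch with d = 16 columns (the equations themselves are taken from the standard, not from this paper) is:

```python
def interleave_indices(n_cbps, n_cpc):
    """Two-step 802.11a/g block interleaver permutation with d = 16 columns.

    Step 1 maps adjacent coded bits onto non-adjacent subcarriers;
    step 2 alternates them between more/less significant constellation
    bits, with s = max(1, n_cpc // 2).
    """
    d = 16
    s = max(1, n_cpc // 2)
    idx = []
    for k in range(n_cbps):
        m = (n_cbps // d) * (k % d) + k // d                      # first step
        j = s * (m // s) + (m + n_cbps - (d * m) // n_cbps) % s   # second step
        idx.append(j)
    return idx
```

Because both steps are bijections on 0..Ncbps-1, the output index list is always a permutation, which is what makes a table-free FSM address generator possible.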
A. Address Generator

Our proposed address generator block is described in the schematic diagram of Fig. 1. The bulk of the circuitry is used for generation of the write address. It contains three multiplexers (muxes): mux-1 and mux-2 implement the unequal increments required in 16-QAM and 64-QAM, whereas mux-3 routes the outputs received from mux-1 and mux-2 along with the equal increments of BPSK and QPSK. The select input of mux-1 is controlled by a T flip-flop named qam16_sel, whereas that of mux-2 is controlled by a mod-3 counter, qam64_sel. The two lines of mod_typ (modulation type) are used as the select input of mux-3. The 6-bit output from mux-3 acts as one input of the 9-bit adder after zero padding. The other input of the adder comes from the accumulator, which holds the previous address. After addition, the new address is written into the accumulator.
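A behavioural sketch of this datapath is given below. The mux selection and accumulator mirror Fig. 1, but the concrete increment values are placeholders invented for illustration; the paper does not list them:

```python
def write_addresses(n, mod_typ, inc_bpsk=1, inc16=(1, 2), inc64=(1, 2, 3)):
    """Behavioural sketch of the write-address generator of Fig. 1.

    mod_typ selects the increment source, mirroring mux-3; for 16-QAM a
    toggling flip-flop (qam16_sel) alternates two increments, and for
    64-QAM a mod-3 counter (qam64_sel) cycles through three.  The
    increment values here are hypothetical placeholders.
    """
    addr, out = 0, []
    t_ff, mod3 = 0, 0
    for _ in range(n):
        if mod_typ in ("bpsk", "qpsk"):      # equal increments
            inc = inc_bpsk
        elif mod_typ == "16qam":             # mux-1, toggled by the T-FF
            inc = inc16[t_ff]
            t_ff ^= 1
        else:                                # mux-2, mod-3 counter (64-QAM)
            inc = inc64[mod3]
            mod3 = (mod3 + 1) % 3
        addr = (addr + inc) % 512            # 9-bit adder + accumulator
        out.append(addr)
    return out
```

The point of the sketch is the structure: one accumulator, one adder, and a modulation-dependent increment stream, with no stored permutation table.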
The preset logic is a hierarchical FSM whose principal function is to generate the correct beginning addresses for all subsequent iterations. This block contains a 4-bit counter that keeps track of the end of states during the iteration. The FSM enters the first state (SF) with clr = 1.
Figure 1. Schematic diagram of address generator

Based on the value in mod_typ, the FSM makes a transition to one of the four possible next states (SMT0, SMT1, SMT2 or SMT3). Each state in this level represents one of the possible modulation schemes. The FSM thereafter makes transitions to the next states (e.g. S000, S001 and so on) based on the value in the accumulator. When the FSM at this level reaches the terminal value of that iteration, it makes a transition to a state (e.g. S000) in which it loads the accumulator with the initial value (e.g. preset = 1) of the next iteration. This continues till all of the interleaver addresses are generated for the selected mod_typ. If no changes take place in the value of mod_typ, the FSM will follow the same route of transitions and the same set of interleaver addresses will continually be generated. Any change in the mod_typ value causes the interleaver to follow an alternate path. In order to provide the address generator with an on-the-fly address computation feature, we have made the circuit respond to the clr input followed by the mod_typ inputs at any stage of the FSM. With clr = 1 it comes back to the SF state irrespective of its current position and thereafter transits to the desired states in response to a new value in mod_typ.

B. Interleaver Memory

The interleaver memory block comprises two memory modules (RAM-1 and RAM-2), three muxes and an inverter, as shown in Fig. 2. In block interleaving, while one memory block is being written the other one is read, and vice versa. Each memory module receives either the write address or the read address with the help of the mux connected to its address input lines (A) and the sel line. At the beginning, RAM-1 receives the read address and RAM-2 gets the write address, with the write enable (WE) signal of RAM-2 active. After a particular memory block is read/written up to the desired location, the status of the sel line changes and the operation is reversed. The mux at the output of the memory modules routes the interleaved data stream from the read memory block to the output.
Figure 2. Schematic view of Interleaver Memory block
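The read/write role swap between RAM-1 and RAM-2 is a classic ping-pong (double-buffer) arrangement; a minimal behavioural sketch, not the paper's RTL, is:

```python
class PingPongInterleaverMemory:
    """Double-buffered interleaver memory: while one RAM is written at
    interleaved addresses, the other is read out; the sel line (and its
    inverter, as in Fig. 2) then reverses the two roles."""

    def __init__(self, depth):
        self.ram = [[0] * depth, [0] * depth]
        self.sel = 0                      # 0: write RAM-0, read RAM-1

    def write(self, addr, data):
        self.ram[self.sel][addr] = data   # WE is active on the write bank

    def read(self, addr):
        return self.ram[1 - self.sel][addr]

    def swap(self):
        self.sel ^= 1                     # reverse the read/write roles
```

Writing one block at interleaved addresses and then reading it back in order (after `swap`) yields the interleaved stream with no stall between blocks.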
V. ADAPTIVE TURBO ALGORITHM
The adaptive turbo algorithm [4] is introduced with the goal of reducing the average computation and path storage required by the turbo decoding algorithm. Instead of computing and retaining all 2^(K-1) paths, only those paths which satisfy certain cost conditions are retained for each received symbol at each state node. Path retention is based on two criteria: first, a threshold T indicates that a path is retained if its path metric is less than dm + T, where dm is the minimum cost among all surviving paths in the previous trellis stage; second, the total number of survivor paths per trellis stage is limited to a fixed number, Nmax, which is pre-set prior to the start of communication.
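The two retention criteria can be sketched directly. The explicit sort used below is only a stand-in for selecting the Nmax lowest-cost paths; the hardware described later in Section V avoids a sorter:

```python
def prune_paths(paths, T, Nmax):
    """Adaptive-turbo survivor selection: keep a path only if its metric
    is below dm + T, where dm is the minimum metric of the previous
    stage, then cap the survivor count at Nmax.

    paths: list of (metric, state) pairs for one trellis stage.
    """
    dm = min(m for m, _ in paths)                     # best previous cost
    survivors = [(m, s) for m, s in paths if m < dm + T]  # threshold test
    survivors.sort(key=lambda p: p[0])                # lowest cost first
    return survivors[:Nmax]                           # cap at Nmax
```

With T = 2 and Nmax = 2, a stage holding metrics 0, 1 and 2 keeps only the two paths cheaper than dm + T = 2.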
The first criterion allows high-cost paths that do not represent the transmitted data to be eliminated from consideration early in the decoding process. Where many paths have similar cost, the second criterion restricts the number of paths to Nmax. A trellis diagram of the adaptive turbo algorithm for constraint length 3 is shown in Figure 3 with a threshold value T = 1.
Nmax has a similar effect on BER as T. If a small value of Nmax is chosen, paths which satisfy the threshold condition may be discarded, potentially leading to a large BER.
To demonstrate the benefit of the adaptive turbo algorithm, we have developed the interleaver and the adaptive algorithm together. This architecture takes advantage of parallelization, specialization of hardware for specific constraint lengths, and dynamic reconfiguration to adapt the decoder hardware to changing channel noise characteristics.
A. Description of the Architecture
Figure 3. Trellis diagram for a hard-decision adaptive
Turbo decoder with T = 1 and Nmax = 3
At each stage, the minimum cost (path metric) of the previous stage, dm, the threshold T, and the maximum number of survivors, Nmax, are used to prune the number of surviving paths. Initially, at t = 0, the decoder state is set to 00. Two branches flow out from state 00 to states 00 and 10 at t = 1, representing encoded transmissions of 0 and 1 respectively by the encoder. If the value received at t = 0 is 00, it is more likely that b = 0, v = 00 was transmitted than b = 1, v = 11, since both bits of the latter v would have had to be corrupted by noise. Since state 00 is the only state at t = 0, dm is the path metric of state 00, which is 0; as a result, dm + T is 1. At t = 1, the path leading to state 10 does not survive, because the current path metric of state 10 is greater than 1, the value of dm + T. As a result, only one branch, the branch leading to state 00, survives at t = 1. The new dm used at t = 2 is the minimum among the metrics of all surviving paths at t = 1; since only one path survives at t = 1, dm is the path metric of state 00, which is 0. Pruning can result in an increased BER, since the decision on the most likely path has to be taken from a reduced number of possible paths. When a large value of T is selected, the BER reduces and the average number of survivor paths increases; the increased decoding accuracy then comes at the expense of additional computation and a larger path storage memory. The value of T should be selected so that the BER stays within allowable limits while matching the resource capabilities of the hardware.
Nmax denotes the maximum number of survivor paths to be retained at any trellis stage.
The architecture of the implemented adaptive turbo decoder is shown in Figure 4 for an encoder with parameters (2, 1, 3). The branch metric generator determines the difference between the received v-bit value and the 2^v possible expected values; this difference is the Hamming distance between the values. A total of 2^v branch metrics are determined by the branch metric generator. For v = 2 these metrics are labeled b00, b01, b10 and b11.
At each trellis stage, the minimum surviving path metric among all path metrics of the preceding trellis stage, dm, is computed. New path metrics are compared to the sum dm + T to identify path metrics with excessive cost. As shown on the left of Figure 4, the path metric of each potential next-state path, di, is computed by the ACS unit. Comparators are then used to determine the fate of each path based on the threshold T: if the threshold condition dm + T is not satisfied by a path metric, the corresponding path is discarded.
Present and next state values for the trellis are stored in two column arrays, PRESENTSTATE and NEXTSTATE, of dimensions Nmax and 2Nmax respectively, as shown in Figure 4. There will be at most Nmax survivor paths at any stage; since each path is associated with a state, the number of present states will be Nmax. Each path can potentially create two child paths before pruning, as there are two possible branches for each present state based on a received 0 or 1 symbol. Entries in the NEXTSTATE array need not be in the same row as their respective source present states. In order to correlate the next-state paths with the next states located in the NEXTSTATE array, an array of size 2Nmax, called Path Identify, is used. For each next-state element, this array also indicates the corresponding row in the path storage (survivor) memory for the path.
Once the paths that meet the threshold conditions are determined, the lowest-cost Nmax paths are selected. To avoid the need for the sorting circuit described in [6] for the M-algorithm, a new path pruning approach is developed here. Sorting circuitry is eliminated by making feedback adjustments to the parameter T. If the number of paths surviving the threshold is less than Nmax, no sorting is required. For stages where the number of paths surviving the threshold condition is greater than Nmax, T is iteratively reduced by 2 for the current trellis stage until the number of paths surviving the threshold condition is equal to or less than Nmax.
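The feedback adjustment of T can be sketched as a small loop. The fallback when T is tightened all the way to zero (keeping only the minimum-cost paths) is an assumption added so the sketch always returns at least one survivor:

```python
def prune_without_sorting(paths, T, Nmax):
    """Sorting-free pruning via feedback on T: if more than Nmax paths
    pass the threshold dm + T, reduce T by 2 for this stage and retest
    until at most Nmax survive.

    paths: list of (metric, state) pairs for one trellis stage.
    """
    dm = min(m for m, _ in paths)
    while True:
        survivors = [(m, s) for m, s in paths if m < dm + T]
        if 0 < len(survivors) <= Nmax:
            return survivors
        if not survivors:                 # T over-tightened: keep dm paths
            return [(m, s) for m, s in paths if m == dm][:Nmax]
        T -= 2                            # threshold tightening replaces a sorter
```

Because each iteration only re-applies the same comparators with a smaller T, this maps onto hardware without any sorting network.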
Figure 4. Adaptive Turbo decoder architecture
VI. EXPERIMENTAL RESULTS
ModelSim version 6 software is used to model the entire system process, i.e. encoding, interleaving, transmitting the bits, receiving the bits, decoding and deinterleaving based on the mod_typ value. Figure 5 shows the combined waveform for the entire process. ModelSim SE 6.3f, an entry-level simulator, offers VHDL, Verilog, or mixed-language simulation. Model Technology's Single Kernel Simulation (SKS) technology enables transparent mixing of VHDL and Verilog in a design, and ModelSim's architecture allows platform-independent compilation along with the performance of native compiled code.
Figure 5. Combined output for the entire process
Figure 7. Timing Summary
Figure 8. Power Summary
Figure 6 shows the design summary, i.e. the total gate count, which determines the area required for the on-board implementation.
The speed of the FSM based turbo decoder with the adaptive turbo architecture is shown in Figure 7; the minimum period is the time taken to complete the entire process.
The estimated power is given in Figure 8. The entire process takes less power than the parallel on-the-fly IAG alternative considered here. Power is a major concern in designing any 3G/4G system, and low power consumption is an important advantage for equipment used in wireless communication, as it must run on battery power.
Figure 6. Design Summary
VII. PERFORMANCE COMPARISON
The FSM based approach and the ATA architecture provide a higher operating frequency and excellent FPGA resource utilization. Use of the FPGA's internal memory offers advantages such as reduced access time, less area required on the circuit board, and lower power consumption than external-memory based techniques. Table 1 below compares the parallel architecture of the turbo decoder with the on-the-fly interleaver against the adaptive turbo decoder with the FSM based IAG.
Parameter              Parallel architecture     Adaptive Turbo decoder
                       with on-the-fly IAG       with FSM based IAG
Total gate count       109,870                   810
Timing summary (ns)    14.220                    9.325
Power (mW)             3429                      169

Table 1. Comparison between the parallel architecture with on-the-fly IAG and the adaptive turbo decoder with FSM based IAG.
VIII. CONCLUSION
A novel FSM based technique with an adaptive turbo decoder for IEEE 802.11a and IEEE 802.11g based WLANs has been presented. In this work, the preset logic of the interleaver address generator is first changed to a finite state machine based address generator. Then, based on the statistical property of memory conflicts, the other critical parameters (access timing, power and area) are reduced by using the FSM based IAG as an alternative interleaver address generator. This IAG is an efficient unified parallel interleaver architecture supporting both interleaving and deinterleaving modes. The FSM based IAG hardware model of the interleaver is completely implemented on a Spartan-3 FPGA. The trellis tree diagram used in decoding was then modified by using the adaptive turbo decoding architecture. Assigning a threshold value at each trellis stage reduces the buffer size, so that area is reduced, and decoding is made more accurate by the adaptive architecture. A critical analysis of the implementation results of both approaches has been made to ease a system designer's decision regarding which technique to adopt.
REFERENCES

[1]. G. Wang, H. Shen, J. R. Cavallaro and A. Vosoughi, "Parallel interleaver design for a high throughput HSPA+/LTE multi-standard turbo decoder," IEEE Trans. Circuits Syst., vol. 61, no. 5, pp. 1376-1389, 2014.
[2]. Z. Wang, Z. Chi and K. K. Parhi, "Area-efficient high-speed decoding schemes for turbo decoders," IEEE Trans. VLSI Syst., vol. 10, no. 6, pp. 902-912, 2002.
[3]. C. Benkeser, A. Burg, T. Cupaiulo and Q. Huang, "Design and optimization of an HSDPA turbo decoder ASIC," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 98-106, 2009.
[4]. F. Speziali and J. Zory, "Scalable and area efficient concurrent interleaver for high throughput turbo-decoders," in Proc. Euromicro Symp. Digit. Syst. Des. (DSD), Aug. 2004, pp. 334-341.
[5]. A. Vosoughi, G. Wang, H. Shen, J. R. Cavallaro and Y. Guo, "Highly scalable on-the-fly interleaved address generation for UMTS/HSPA+ parallel turbo decoder," in Proc. IEEE Int. Conf. ASAP, Jun. 2013, pp. 356-362.
[6]. F. Chan and D. Haccoun, "Adaptive turbo decoding of convolutional codes over memoryless channels," IEEE Trans. Commun., pp. 1389-1400, Nov. 2001.
[7]. S. J. Simmons, "Breadth-first trellis decoding with adaptive effort," IEEE Trans. Commun., vol. 38, pp. 3-12, Jan. 1990.
[8]. S. Swaminathan, "An FPGA-based Adaptive Turbo Decoder," Master's thesis, University of Massachusetts Amherst, Dept. of Electrical and Computer Engineering, 2001.
[9]. M. Kivioja and J. B. Anderson, "M-algorithm decoding of channel convolutional codes," in Proc. Int. Conf. on Information Sciences and Systems, pp. 362-366, Mar. 1986.
[10]. Y. Sun and J. R. Cavallaro, "Efficient hardware implementation of a highly parallel 3GPP LTE/LTE-Advance turbo decoder," VLSI J. Integr., vol. 44, no. 4, pp. 305-315, 2011.
[11]. A. Nimbalker, T. Blankenship, B. Classon, T. Fuja and D. Costello, "Contention-free interleavers for high-throughput turbo decoding," IEEE Trans. Commun., vol. 56, no. 8, pp. 2701-2704, 2008.