International Journal of Engineering Trends and Technology (IJETT) – Volume 5 Number 7 - Nov 2013
Design of Delay Efficient Distributed Arithmetic
Based Split Radix FFT
Nisha Laguri#1, K. Anusudha*2
#1 M.Tech Student, Electronics, Department of Electronics Engineering, Pondicherry University, Puducherry, India
*2 Assistant Professor, Department of Electronics Engineering, Pondicherry University, Puducherry, India
Abstract— In this paper, a Split Radix FFT is designed without the use of a multiplier. All complex multiplications are done using the Distributed Arithmetic (DA) technique, and a parallel prefix adder is used for faster calculation. High-radix algorithms have been developed for efficient calculation of the FFT; they reduce the overall number of arithmetic operations in the FFT but increase the number of operations and the complexity of each butterfly. In the Split Radix FFT, a mixed-radix approach helps to achieve a low number of multiplications and additions. DA is a bit-serial computational operation that forms the inner (dot) product of a pair of vectors in a single direct step. The advantage of DA is its efficiency of mechanization. A method is also incorporated to overcome the overflow problem introduced by the DA method.
Keywords— Split Radix, Fast Fourier Transform (FFT), DA,
Parallel Prefix Adder
I. INTRODUCTION
Digital Signal Processing (DSP) kernels such as the Discrete Fourier Transform (DFT) are common in real-time applications. Since DFT computation requires a large number of arithmetic operations, Fast Fourier Transform (FFT) processors are required to meet real-time requirements. The FFT is a very common operation used in various signal processing units, with applications in a wide range of fields such as radar, image and speech processing. It is therefore important to have an architecture that performs the FFT quickly.
Different FFT algorithms, such as radix-4 and other high-radix algorithms, have been developed for efficient calculation of the FFT. These algorithms reduce the overall number of arithmetic operations in the FFT, but increase the number of operations and the complexity of each butterfly. Various implementations based on high-radix algorithms have been reported; among them, the radix-4 algorithm is very popular due to its lower complexity. The Split Radix FFT calculates the even-indexed outputs using the radix-2 algorithm and the odd-indexed outputs using the radix-4 algorithm. This mixed-radix approach achieves a lower number of multiplications and additions, and the resulting butterfly has a simple structure. Every butterfly has two main operations, complex multiplication and addition, and the complex multiplication determines the speed, hardware cost and power consumption.
ISSN: 2231-5381
There are usually three conventional approaches to complex multiplication: the Booth-Wallace multiplier, the CORDIC multiplier, and the CSD multiplier. Constant twiddle factors are not easy to handle in CSD arithmetic, which results in a large area cost. Distributed Arithmetic (DA) and Modulo Arithmetic are computation algorithms that perform multiplication with look-up-table based schemes. The most commonly encountered form of computation in digital signal processing is a sum of products, and it can be executed most efficiently by DA. The advantage of DA is its efficiency of mechanization. Since the twiddle factors of an N-point FFT are fixed, DA can be used to replace the complex multiplications in the FFT.
In summary, this paper addresses the implementation aspects of a DA based multiplier and the SRFFT. The rest of the paper is arranged as follows. Section II explains the Split Radix FFT algorithm and the corresponding butterfly diagram. Section III presents how a complex multiplication can be substituted with DA operations. Section IV gives the detailed architecture of the DA based complex multiplier. Section V explains the parallel prefix adder, and Sections VI and VII explain the Kogge-Stone adder and the Brent-Kung adder. Results are compared with other architectures in Section VIII, and the paper is concluded in Section IX.
Table I.
Number of real multiplications and additions to compute an N-point complex DFT

         Real multiplications              Real additions
N        Radix-2   Radix-4   Split Radix   Radix-2   Radix-4   Split Radix
16       24        20        20            152       148       148
32       88        –         68            408       –         388
64       264       208       196           1032      976       964

(–: radix-4 is not applicable for N = 32, which is not a power of 4.)
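As a cross-check of Table I, the entries can be regenerated from closed-form operation counts. The Python sketch below assumes the formulas usually quoted in the FFT literature for these algorithms (for example, $N\log_2 N - 3N + 4$ real multiplications and $3N\log_2 N - 3N + 4$ real additions for split radix, with trivial twiddle factors skipped); they are stated as assumptions and only checked against the table, not derived here. `real_op_counts` is a hypothetical helper name.

```python
from fractions import Fraction


def real_op_counts(n: int) -> dict[str, tuple[int, int]]:
    """(real multiplications, real additions) for an n-point complex DFT.

    Closed forms as commonly quoted in the FFT literature (assumed here, not
    derived). n must be a power of two.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    m = n.bit_length() - 1  # log2(n)
    counts = {
        "radix-2": (Fraction(3 * n * m, 2) - 5 * n + 8,
                    Fraction(7 * n * m, 2) - 5 * n + 8),
        "split-radix": (n * m - 3 * n + 4, 3 * n * m - 3 * n + 4),
    }
    if m % 2 == 0:  # radix-4 needs n to be a power of 4
        tail = -Fraction(43 * n, 12) + Fraction(16, 3)
        counts["radix-4"] = (Fraction(9 * n * m, 8) + tail,
                             Fraction(25 * n * m, 8) + tail)
    # Exact rational arithmetic, then convert the (integral) results to int.
    return {name: (int(mul), int(add)) for name, (mul, add) in counts.items()}


for n in (16, 32, 64):
    print(n, real_op_counts(n))
```

Exact fractions are used so that the radix-4 expression, which has non-integer intermediate terms, still yields the exact integer counts.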
II. SPLIT-RADIX FFT ALGORITHM
When the FFT is calculated with the radix-2 decimation-in-frequency method, the even-numbered and odd-numbered output points are computed independently. This leads to the possibility of using different computational methods for the different independent parts of the algorithm, which reduces the computational complexity. The split-radix algorithm uses this idea by combining the simplicity of the radix-2 algorithm with the lower computational complexity of the radix-4 algorithm, achieving a lower arithmetic operation count than either of them for DFTs of power-of-two size N (see Table I).
The split-radix method recursively expresses a DFT of length N in terms of one smaller DFT of length N/2 and two smaller DFTs of length N/4. The split-radix step is only applicable when N is a multiple of 4, but it can be combined with other FFT algorithms.
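This section briefly presents the SRFFT algorithm and its butterfly structure. The general equation of the Discrete Fourier Transform is

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where $W_N = e^{-j2\pi/N}$. The radix-2 algorithm calculates the even and odd components of $X(k)$ using decimation in frequency. In the split-radix algorithm, the even-indexed outputs are obtained with a radix-2 step and the odd-indexed outputs with a radix-4 step, given by the equations:

$$X(2k) = \sum_{n=0}^{N/2-1} \big[x(n) + x(n+\tfrac{N}{2})\big]\, W_{N/2}^{nk}, \qquad k = 0, 1, \ldots, \tfrac{N}{2}-1 \tag{2}$$

$$\begin{aligned}
X(4k+1) &= \sum_{n=0}^{N/4-1} \Big\{\big[x(n) - x(n+\tfrac{N}{2})\big] - j\big[x(n+\tfrac{N}{4}) - x(n+\tfrac{3N}{4})\big]\Big\}\, W_N^{n}\, W_{N/4}^{nk} \\
X(4k+3) &= \sum_{n=0}^{N/4-1} \Big\{\big[x(n) - x(n+\tfrac{N}{2})\big] + j\big[x(n+\tfrac{N}{4}) - x(n+\tfrac{3N}{4})\big]\Big\}\, W_N^{3n}\, W_{N/4}^{nk}
\end{aligned} \qquad k = 0, 1, \ldots, \tfrac{N}{4}-1 \tag{3}$$

Fig.2. SRFFT butterfly structure with outputs
The difference between part (1) and part (2) of the butterfly lies in the number of stages required to finish the butterfly operation. Compared with part (1), part (2) completes the operation in a single step and is also a symmetrical structure from a hardware point of view.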
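Equations (2) and (3) translate directly into a recursive program. The Python sketch below is a floating-point model of the decomposition only, not of the 8-point hardware; `split_radix_fft` is a hypothetical name, and a plain radix-2 butterfly handles length 2 because the split-radix step needs N to be a multiple of 4.

```python
import cmath


def split_radix_fft(x: list[complex]) -> list[complex]:
    """Decimation-in-frequency split-radix FFT following eqs (2) and (3)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    if n == 2:  # plain radix-2 butterfly
        return [x[0] + x[1], x[0] - x[1]]

    half, quarter = n // 2, n // 4
    # Eq (2): X(2k) is the length-N/2 DFT of x(n) + x(n + N/2).
    even = split_radix_fft([x[i] + x[i + half] for i in range(half)])

    # Eq (3): X(4k+1) and X(4k+3) are length-N/4 DFTs of twiddled differences.
    u, v = [], []
    for i in range(quarter):
        a = x[i] - x[i + half]
        b = x[i + quarter] - x[i + 3 * quarter]
        u.append((a - 1j * b) * cmath.exp(-2j * cmath.pi * i / n))      # times W_N^n
        v.append((a + 1j * b) * cmath.exp(-2j * cmath.pi * 3 * i / n))  # times W_N^{3n}
    odd1, odd3 = split_radix_fft(u), split_radix_fft(v)

    # Interleave the three sub-results into natural output order.
    out = [0j] * n
    out[0::2] = even  # X(2k)
    out[1::4] = odd1  # X(4k+1)
    out[3::4] = odd3  # X(4k+3)
    return out
```

For example, `split_radix_fft([1, 0, 0, 0, 0, 0, 0, 0])` returns eight values equal to 1, the flat spectrum of an impulse. Only one length-N/2 and two length-N/4 transforms are spawned per call, which is exactly the recursion described above.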
III. DISTRIBUTED ARITHMETIC METHOD FOR COMPLEX MULTIPLICATION
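Distributed Arithmetic can be used to implement a multiplication if either the multiplicand or the multiplier is fixed. The possible combinations of the fixed operand are stored in ROM, and the appropriate combination is added and shifted according to the bits of the other operand. For an input $X = X_R + jX_I$ multiplied by a fixed twiddle factor $W = W_R + jW_I$, the DA based complex multiplication can be summarized as

$$Z_R = W_R X_R - W_I X_I, \qquad Z_I = W_I X_R + W_R X_I \tag{4}$$

This shows that 4 real multiplications and 2 real additions are required to compute $Z_R$ and $Z_I$. However, each of these equations can be considered as one "multiply and accumulate" operation:

$$Y = \sum_{k=1}^{K} A_k X_k \tag{5}$$

Let $A_k$ be the fixed coefficients and $X_k$ the input words (here $K = 2$; for $Z_R$, $A_1 = W_R$ and $A_2 = -W_I$ with $X_1 = X_R$ and $X_2 = X_I$). If $X_k$ is an M-bit fractional number in 2's complement form, it can be expressed as

$$X_k = -b_{k0} + \sum_{m=1}^{M-1} b_{km}\, 2^{-m}, \qquad b_{km} \in \{0, 1\} \tag{6}$$

where $b_{k0}$ is the sign bit. Substituting (6) into (5) gives

$$Y = -\sum_{k=1}^{K} A_k b_{k0} + \sum_{m=1}^{M-1} \Big(\sum_{k=1}^{K} A_k b_{km}\Big) 2^{-m},$$

so each bracketed sum can take only $2^K$ values, which are precomputed and stored in ROM; with $K = 2$ this gives the four-word ROMs used in Section IV.
Fig.1. SRFFT butterfly structure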
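The bit-serial DA evaluation of (4) to (6) can be modeled behaviourally in Python. The sketch assumes 8-bit two's-complement integer words (the integer form of (6), scaled by $2^{M-1}$ and with bits numbered from the LSB), one 4-word ROM per output addressed by one bit of $X_R$ and one bit of $X_I$, and LSB-first processing as in Fig. 3. On the sign-bit cycle the ROM word is subtracted, which is what adding the inverted ROM word with a carry-in of 1 achieves in hardware. ROM words are kept as unbounded Python integers, so the overflow handling discussed in this paper is not modeled; all function names are hypothetical.

```python
M = 8  # word length in bits (assumed to match the 8-bit registers of Fig. 3)


def bits_lsb_first(x: int, m: int = M) -> list[int]:
    """Two's-complement bits of x, least significant bit first."""
    if not -(1 << (m - 1)) <= x < (1 << (m - 1)):
        raise ValueError(f"{x} does not fit in {m}-bit two's complement")
    u = x & ((1 << m) - 1)
    return [(u >> i) & 1 for i in range(m)]


def build_rom(a1: int, a2: int) -> list[int]:
    """4-word ROM: address (b2 << 1) | b1 holds b1*a1 + b2*a2."""
    return [0, a1, a2, a1 + a2]


def da_inner_product(rom: list[int], x1: int, x2: int, m: int = M) -> int:
    """a1*x1 + a2*x2 from one ROM lookup per bit position, LSB first."""
    b1, b2 = bits_lsb_first(x1, m), bits_lsb_first(x2, m)
    acc = 0
    for i in range(m):
        word = rom[(b2[i] << 1) | b1[i]]
        if i < m - 1:
            acc += word << i  # ordinary bit: weight +2^i
        else:
            acc -= word << i  # sign bit: weight -2^(m-1) (inverted word + carry-in in hardware)
    return acc


def da_complex_multiply(xr: int, xi: int, wr: int, wi: int, m: int = M) -> tuple[int, int]:
    """(xr + j*xi) * (wr + j*wi) with one DA ROM per output, following eq (4)."""
    rom_re = build_rom(wr, -wi)  # Z_R = wr*xr - wi*xi
    rom_im = build_rom(wi, wr)   # Z_I = wi*xr + wr*xi
    return da_inner_product(rom_re, xr, xi, m), da_inner_product(rom_im, xr, xi, m)


# W_8^1 = (1 - j)/sqrt(2) quantized to 91 - j91 at scale 2^7; x = 0.5 -> 64.
print(da_complex_multiply(64, 0, 91, -91))  # (5824, -5824) at scale 2^14
```

Because each output needs only two input bits per cycle, an M-bit product takes M ROM lookups and additions and no multiplier, which is the trade-off the architecture in Section IV exploits.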
IV. ARCHITECTURE OF DA BASED COMPLEX
MULTIPLIER
Fig.3. DA based complex multiplier
The detailed architecture of the DA based complex multiplier is shown in Fig. 3. The real and imaginary parts of the incoming word, $X_R$ and $X_I$, are stored in two 8-bit parallel-in serial-out registers, which are shifted from LSB to MSB. In each cycle, the output bits of these two registers are used as the address lines of the ROMs. The ROMs store the precalculated outcomes for both the real output $Z_R$ and the imaginary output $Z_I$; each ROM is of size 4×8 (four 8-bit words, addressed by one bit of each input). One input of the 2:1 MUX is fed directly from the ROM output and the other input is the inverted ROM output; the MUX input and output widths are also 8 bits. The select line of the MUX is the 'cin' signal, which remains '0' until the MSB arrives at the output. When 'cin' is 1, the MUX selects the inverted ROM output, which is added to the value stored in the partial product register (PPR). Since inverting a word and adding 1 subtracts it in two's complement, this step accounts for the negative weight of the sign bit. The PPR is an 8-bit parallel-in parallel-out register that also performs a 1-bit right shift. Finally, the output is taken from the left shift register.
V. PARALLEL PREFIX ADDER
Parallel prefix adders are flexible and are used to speed up binary addition. They are among the fastest adders and are used in high-performance arithmetic circuits in many industries. The construction of a parallel prefix adder involves three stages:
A. Pre-processing stage
B. Carry generation network
C. Post-processing
A. Pre-processing stage
In this stage, the generate and propagate signals are computed for each pair of input bits $A_i$ and $B_i$. The logic equations for these signals are given in equations (7) and (8):

$$G_i = A_i \cdot B_i \tag{7}$$
$$P_i = A_i \oplus B_i \tag{8}$$

B. Carry generation network
The signals from the first stage proceed to the next stage, which yields all the carry bits. This stage contains three main logic cells: the black cell, the gray cell and the buffer cell. A black cell computes both the group generate $G_{i:j}$ and the group propagate $P_{i:j}$ as defined in equations (9) and (10), whereas a gray cell computes only $G_{i:j}$. The prefix carry tree is the part that differentiates one parallel prefix adder from another.
In this stage, the carry corresponding to each bit is computed, with the operations executed in parallel: the carry computation is segmented into small pieces (groups of bits) that use group generate and propagate as intermediate signals, given by the logic equations (9) and (10):

$$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j} \tag{9}$$
$$P_{i:j} = P_{i:k} \cdot P_{k-1:j} \tag{10}$$

where $i \ge k > j$, $G_{i:i} = G_i$ and $P_{i:i} = P_i$.
Fig.4. Carry operator
The carry operator of Fig. 4 combines two adjacent groups as $(G, P) \circ (G', P') = (G + P \cdot G',\; P \cdot P')$, which is exactly equations (9) and (10).
C. Post-processing
To complete the addition, the carry bits produced by the second stage pass through the last stage, known as post-processing. This is the final step in computing the sum of the input bits. It is common to all parallel prefix adders, and the carry and sum bits are computed by the logic equations:

$$C_i = G_{i:0} + P_{i:0} \cdot C_{-1} \tag{11}$$
$$S_i = P_i \oplus C_{i-1} \tag{12}$$

where $C_{-1}$ is the carry-in.
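Equations (7) to (12) can be checked with a small bit-level Python model. Because the carry operator is associative, the group signals $(G_{i:0}, P_{i:0})$ are a prefix scan of the per-bit pairs $(G_i, P_i)$. The sketch below uses a serial scan, which behaves like a ripple-carry adder; parallel prefix adders compute the same prefixes by arranging the same operator into a tree. `carry_op` and `prefix_add` are hypothetical names, and inputs are assumed to be unsigned n-bit integers.

```python
def carry_op(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Carry operator: combine a more-significant group (g_hi, p_hi) with the
    adjacent less-significant group (g_lo, p_lo), as in eqs (9)-(10)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_add(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """Add two unsigned n-bit integers; return (n-bit sum, carry out)."""
    # Stage A, pre-processing: bitwise generate (7) and propagate (8).
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Stage B, carry generation: group signals (G_{i:0}, P_{i:0}) for every i,
    # computed here with a serial scan (ripple-carry behaviour).
    groups = [(g[0], p[0])]
    for i in range(1, n):
        groups.append(carry_op((g[i], p[i]), groups[-1]))

    # Stage C, post-processing: carries (11) and sum bits (12).
    carries = [G | (P & c_in) for G, P in groups]
    s = 0
    for i in range(n):
        c_prev = c_in if i == 0 else carries[i - 1]
        s |= (p[i] ^ c_prev) << i
    return s, carries[-1]


print(prefix_add(11, 6, 4))  # (1, 1): 11 + 6 = 17 = 1 + 16
```

The serial scan needs n − 1 operator levels; associativity is what allows the tree-shaped networks of the Kogge-Stone and Brent-Kung adders to reach the same carries in logarithmic depth.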
In the pre-processing stage, each bit position consumes the two inputs $A_i$ and $B_i$ and produces the generate signal $G_i$ and the propagate signal $P_i$. Following equations (7) and (8), the pre-processing block is realized as the circuit shown in Fig. 5, and one such block is placed at every input bit of the adder.
Fig.5. Complex logic cells inside the Prefix Carry Tree
VI. KOGGE-STONE PREFIX ADDER
The Kogge-Stone adder is a parallel prefix form of carry look-ahead adder. A parallel prefix adder can be represented as a parallel prefix graph consisting of carry operator nodes. The time required to generate the carry signals in this adder is O(log n). It is one of the fastest adder designs and is common in high-performance adders in many industries. The Kogge-Stone adder was developed by Peter M. Kogge and Harold S. Stone, who published it in 1973. Its advantages are minimum logic depth and bounded fan-out, but it has a large area. The block diagram of a 4-bit Kogge-Stone adder is shown in Fig. 6.
VII. BRENT-KUNG PREFIX ADDER
The Brent-Kung adder is also a parallel prefix adder. Parallel prefix adders are a special class of adders based on the use of generate and propagate signals. The simpler Brent-Kung adder was proposed to overcome the disadvantages of the Kogge-Stone adder: its cost and wiring complexity are greatly reduced, but its logic depth increases to $2\log_2 n - 1$. The block diagram of a 4-bit Brent-Kung adder is shown in Fig. 7. The proposed multiplier is coded in Verilog HDL and simulated in the Xilinx ISE simulator.
Fig.7. 4-bit Brent-Kung adder
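The two prefix networks can be compared with a small Python model that applies the carry operator of Fig. 4 level by level and counts operator cells (black and gray cells together) and levels. The sketch assumes n is a power of two and a carry-in of zero; `kogge_stone`, `brent_kung` and `add` are hypothetical names, and the counts describe the textbook form of each network, not the synthesized Verilog of this paper.

```python
def _op(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Carry operator: (g_hi + p_hi*g_lo, p_hi*p_lo)."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def kogge_stone(gp):
    """Prefixes (G_{i:0}, P_{i:0}); every node is updated at every level."""
    x, levels, cells, d = list(gp), 0, 0, 1
    while d < len(x):
        # Build the next level from the previous one (all cells in parallel).
        x = [x[i] if i < d else _op(x[i], x[i - d]) for i in range(len(x))]
        cells += len(x) - d
        levels += 1
        d *= 2
    return x, levels, cells


def brent_kung(gp):
    """Prefixes via an up-sweep tree followed by a down-sweep tree."""
    x, levels, cells, n = list(gp), 0, 0, len(gp)
    d = 1
    while d < n:  # up-sweep: groups of size 2d at positions 2d-1, 4d-1, ...
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = _op(x[i], x[i - d])
            cells += 1
        levels += 1
        d *= 2
    d = n // 4
    while d >= 1:  # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = _op(x[i], x[i - d])
            cells += 1
        levels += 1
        d //= 2
    return x, levels, cells


def add(a: int, b: int, n: int, prefix):
    """n-bit addition (carry-in 0): returns (sum, carry out, levels, cells)."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    groups, levels, cells = prefix(gp)
    carries = [g for g, _ in groups]
    s = gp[0][1]
    for i in range(1, n):
        s |= (gp[i][1] ^ carries[i - 1]) << i
    return s, carries[-1], levels, cells


print(add(9, 7, 4, kogge_stone))  # (0, 1, 2, 5)
print(add(9, 7, 4, brent_kung))   # (0, 1, 3, 4)
```

For n = 16 the model gives 4 levels and 49 cells for Kogge-Stone against 7 levels and 26 cells for Brent-Kung, which illustrates the depth-versus-area trade-off described above.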
VIII. RESULTS AND DISCUSSION
The design was synthesized using Xilinx ISE WebPACK version 13.2, and simulation was performed with the integrated test bench of the same version. All operations can be simulated at once; the behavioural simulation is done by executing the test bench file.
Table II.
Comparison between different Distributed Arithmetic based complex multipliers

Parameter                  Distributed     Distributed     Distributed
                           Arithmetic 1    Arithmetic 2    Arithmetic 3
No. of slices              24              49              40
No. of bonded IOBs         8               15              16
Delay (ns)                 10.516          16.405          17.305
Maximum frequency (MHz)    207.639         94.661          86.699
Power (W)                  0.00195         0.00195         0.00129

Fig.6. 4-bit Kogge-Stone adder
Table III.
Comparison between different butterfly structures

Parameter             Butterfly 1   Butterfly 2   Butterfly 3
No. of slices         153           136           32
No. of bonded IOBs    144           144           72
Delay (ns)            13.598        16.365        9.901
No. of DSP48A1s       7             6             4

Table IV.
Comparison of SRFFT with different adders

Parameters            SRFFT with Adder   SRFFT with Brent Kung Adder
No. of slices         81                 86
No. of bonded IOBs    20                 20
Delay (ns)            10.190             7.297
Bit width             8×8                8×8

IX. CONCLUSION
This paper describes the design of an 8-point Split Radix FFT. For the design of the SRFFT, a distributed arithmetic based complex multiplier is used. The proposed architecture avoids pre-scaling of the input data; instead, the pre-scaled data is stored in ROM. In this paper, a new overflow technique is proposed for the DA based complex multiplier. The DA based complex multiplier is proposed with a parallel prefix adder, namely the Brent-Kung adder, which shows lower delay and higher frequency compared with other adders. The simulation results also show that the Split Radix FFT with the prefix adder reduces delay (7.297 ns against 10.190 ns in Table IV), so the proposed design achieves high speed.