Optimization of the performance of FFT Processor using Vedic algorithm

International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number2- Dec 2014
B. VARUN KUMAR (1), Mrs. K. Maheswari (2)
(1) PG Student (M.Tech), Dept. of ECE, Gates Institute of Technology, Gooty.
(2) Associate Professor, Dept. of ECE, Gates Institute of Technology, Gooty.
Abstract — Over the past few years, there has been increasing emphasis on extending the services available on wired public telecommunications networks to wireless telecommunications users. Demand for wireless broadband multimedia communication systems (WBMCS) is anticipated within both the public and private sectors. To overcome the multipath-fading environment with low complexity and to achieve WBMCS, we adopt the orthogonal frequency-division multiplexing (OFDM) transmission scheme. OFDM is one application of a parallel data-transmission scheme, which makes complex equalizers unnecessary. The Fast Fourier Transform (FFT) processor is a key component of OFDM modulation. In this paper we propose the optimization of a 512-point FFT processor using a Vedic multiplier, which removes unnecessary multiplication steps and hence reduces propagation delay. The Vedic multiplier is based on a novel concept in which all partial products are generated concurrently with their addition. This parallelism in the generation and summation of partial products is obtained using Urdhva Tiryakbhyam (vertically and crosswise). The Vedic multiplier has the advantage that, as the number of bits increases, gate delay and area increase very slowly compared to other multipliers. An FFT processor employing the Vedic multiplier therefore reduces propagation delay, hardware area and power. Finally, the FFT processor using the Vedic multiplier is compared with one using the existing Booth multiplier.
Keywords— Orthogonal frequency division multiplexing, FFT Processor, Vedic mathematics.
I. INTRODUCTION
OFDM, essentially identical to coded OFDM (COFDM) and discrete multi-tone modulation (DMT), is a frequency-division multiplexing (FDM) scheme used as a digital multi-carrier modulation method. A large number of closely spaced orthogonal sub-carriers are used to carry the data. The data is split into several parallel data streams or channels, one for each sub-carrier. Each sub-carrier is modulated with a conventional modulation scheme (such as quadrature amplitude modulation or phase-shift keying) at a low symbol rate, maintaining total data rates similar to those of conventional single-carrier modulation schemes in the same bandwidth. OFDM has developed into a popular scheme for wideband digital communication, whether wireless or over copper wires, and is used in applications such as digital television and audio broadcasting, wireless networking and broadband Internet access.
The primary advantage of OFDM over single-carrier schemes is its ability to cope with severe channel conditions (for example, attenuation of high frequencies in a long copper wire, narrowband interference and frequency-selective fading due to multipath) without complex equalization filters. Channel equalization is simplified because OFDM may be viewed as using many slowly modulated narrowband signals rather than one rapidly modulated wideband signal. The low symbol rate makes the use of a guard interval between symbols affordable, making it possible to handle time-spreading and eliminate intersymbol interference (ISI). This mechanism also facilitates the design of Single Frequency Networks (SFNs), where several adjacent transmitters send the same signal simultaneously at the same frequency, because the signals from multiple distant transmitters may be combined constructively, rather than interfering as would typically occur in a traditional single-carrier system.
ISSN: 2231-5381
Figure 1 illustrates the difference between the conventional non-overlapping multicarrier technique and the overlapping multicarrier modulation technique. By using the overlapping multicarrier modulation technique, we save almost 50% of the bandwidth. To realize this technique, however, we need to reduce crosstalk among subcarriers (SCs), which means that we want orthogonality among the different modulated carriers.
http://www.ijcttjournal.org
The word "orthogonal" indicates that there is a precise mathematical relationship between the frequencies of the carriers in the system. In a conventional FDM system, the carriers are spaced apart in such a way that the signals can be received using conventional filters and demodulators. In such receivers, guard bands are introduced between the different carriers in the frequency domain, which results in a lowering of spectral efficiency.
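The orthogonality condition can be checked numerically. The following is a minimal sketch, assuming complex-exponential subcarriers sampled N = 64 times per symbol; the names `subcarrier` and `inner_product` are illustrative and not taken from the paper. Subcarriers whose frequencies differ by an integer number of cycles per symbol have a zero inner product, which is why their spectra may overlap without crosstalk.

```python
import cmath

def subcarrier(k, n_samples):
    """One symbol period of the k-th complex subcarrier (k cycles per symbol)."""
    return [cmath.exp(2j * cmath.pi * k * n / n_samples) for n in range(n_samples)]

def inner_product(u, v):
    """Normalized inner product of two sampled signals."""
    return sum(a * b.conjugate() for a, b in zip(u, v)) / len(u)

N = 64  # assumed number of samples per OFDM symbol
same = abs(inner_product(subcarrier(3, N), subcarrier(3, N)))  # a carrier with itself
diff = abs(inner_product(subcarrier(3, N), subcarrier(4, N)))  # two adjacent carriers
print(same, diff)  # close to 1 and close to 0, respectively
```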
The input to the DFT is a finite sequence of real or complex numbers, making the DFT ideal for processing data stored in computers. In particular, the DFT is widely used in signal processing and related fields to analyse the frequencies contained in a sampled signal, to solve partial differential equations, and to perform other operations such as convolutions or multiplying large integers. A key enabling factor for these applications is the fact that the DFT can be computed efficiently in practice using a Fast Fourier Transform (FFT) algorithm.
Figure 1 (A) spectrum of FDM showing guard bands (B) spectrum of OFDM showing overlapping subcarriers
Figure 2 Basic OFDM transmitter and receiver
In mathematics, the discrete Fourier Transform (DFT) is a specific kind of discrete transform, used in Fourier analysis. It transforms one function into another, which is called the frequency domain representation, or simply the DFT, of the original function (which is often a function in the time domain). The DFT requires an input function that is discrete and whose non-zero values have a limited (finite) duration. Such inputs are often created by sampling a continuous function, such as a person's voice. Unlike the discrete-time Fourier Transform (DTFT), it only evaluates enough frequency components to reconstruct the finite segment that was analyzed. Using the DFT implies that the finite segment that is analyzed is one period of an infinitely extended periodic signal; if this is not actually true, a window function has to be used to reduce the artifacts in the spectrum. For the same reason, the inverse DFT cannot reproduce the entire time domain, unless the input happens to be periodic (forever). It is therefore often said that the DFT is a transform for Fourier analysis of finite-domain discrete-time functions. The sinusoidal basis functions of the decomposition have the same properties.
FFT algorithms are so commonly used to compute DFTs that the term "FFT" is often used to mean "DFT" in informal settings. Formally, there is a clear distinction: "DFT" refers to a mathematical transformation or function, regardless of how it is computed, whereas "FFT" refers to a specific family of algorithms for computing DFTs.
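The distinction can be made concrete with a small sketch: `dft` below evaluates the DFT definition directly, while `fft` is one member of the FFT family (a textbook recursive radix-2 decimation-in-time form, chosen here for brevity; it is not the radix-2^5 architecture of this paper). Both compute the same transform and differ only in operation count.

```python
import cmath

def dft(x):
    """Direct evaluation of the DFT definition: O(N^2) operations."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 FFT: O(N log N) operations; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor W_n^k
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# Arbitrary 16-sample complex test sequence (illustrative data only).
samples = [complex(i % 5, -(i % 3)) for i in range(16)]
max_err = max(abs(a - b) for a, b in zip(dft(samples), fft(samples)))
print(max_err)  # the two results agree up to floating-point rounding
```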
II. MODIFIED FFT/IFFT PROCESSOR
Modified Radix-2^5 Algorithm
The 512-point FFT computation with the radix-2^k algorithm consists of nine arithmetic stages. The radix-2^k algorithm is developed using (k+1)-dimensional linear index mapping. The radix-2^5 algorithm can be expressed in various forms using the common factor algorithm. It is obtained by applying a 6-D linear index map to the time and frequency indices, each taken modulo N.
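As an illustration of what a 6-D linear index map does, the sketch below uses the decomposition commonly given for radix-2^5 with N = 512: five single-bit indices plus one 16-valued index for each of n and k. The exact coefficients are an assumption taken from the usual form of such maps, not recovered from this paper, and `time_index` and `freq_index` are illustrative names. The check confirms that each map visits every index from 0 to 511 exactly once.

```python
from itertools import product

N = 512

def time_index(n1, n2, n3, n4, n5, n6):
    """Assumed map: n = <N/2*n1 + N/4*n2 + N/8*n3 + N/16*n4 + N/32*n5 + n6>_N."""
    return (N // 2 * n1 + N // 4 * n2 + N // 8 * n3
            + N // 16 * n4 + N // 32 * n5 + n6) % N

def freq_index(k1, k2, k3, k4, k5, k6):
    """Assumed map: k = <k1 + 2*k2 + 4*k3 + 8*k4 + 16*k5 + 32*k6>_N."""
    return (k1 + 2 * k2 + 4 * k3 + 8 * k4 + 16 * k5 + 32 * k6) % N

# The first five indices are single bits; the sixth ranges over N/32 = 16 values.
ranges = [range(2)] * 5 + [range(N // 32)]
time_indices = sorted(time_index(*t) for t in product(*ranges))
freq_indices = sorted(freq_index(*t) for t in product(*ranges))
print(time_indices == list(range(N)), freq_indices == list(range(N)))
```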
The radix-2^5 algorithm is reformulated into two decomposition methods (Method 1 and Method 2), which are referred to as the modified radix-2^5 algorithm. The common factor algorithm is used to derive Method 1 and Method 2, which in turn are used to derive the radix-2^5 FFT algorithm.
In the two radix-2^5 decomposition methods, the twiddle factors are complex numbers. If the two decomposition methods are used separately, the number of twiddle factor multiplications tends to increase. This is because stage 2 of Method 2 has the twiddle factor W16, which has higher hardware complexity than W8 in Method 1, while stages seven and eight in Method 1 have the twiddle factors W8 and W32, which require higher hardware complexity than W16 in Method 2. The modified radix-2^5 algorithm has a similar complex multiplication pattern that repeats after every five stages, as tabulated below.
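One rough way to see why W16 is costlier than W8, and W32 costlier still, is to count the distinct real constants (other than 0 and 1) needed to multiply by every power of each twiddle factor. This count is only an illustrative proxy chosen for this sketch, not the hardware complexity measure used by the authors; `nontrivial_constants` is a hypothetical helper.

```python
import cmath

def nontrivial_constants(n):
    """Distinct real constants other than 0 and 1 appearing in W_n^k, k = 0..n-1."""
    consts = set()
    for k in range(n):
        w = cmath.exp(-2j * cmath.pi * k / n)  # W_n^k
        for part in (abs(w.real), abs(w.imag)):
            value = round(part, 9)  # merge values that differ only by rounding
            if value not in (0.0, 1.0):
                consts.add(value)
    return sorted(consts)

for n in (8, 16, 32):
    print(n, len(nontrivial_constants(n)))
# W8 needs 1 constant (about 0.7071), W16 needs 3, W32 needs 7.
```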
Table 1 Twiddle factors at different stages of 512-point FFT
Radix    Method 1    Method 2    Mixed
Method 1
Method 1 of the modified radix-2^5 algorithm is expressed as follows:
The number of twiddle factor multiplications can therefore be reduced by combining the two methods. The processing elements in the first five stages are realized by Method 1 and those in the last three stages by Method 2; this is referred to as the mixed method.
Method 2
Method 2 can be expressed as follows:
The CSD representation of an integer number is a signed and unique digit representation that contains no adjacent nonzero digits. Given an n-digit binary unsigned number X = {x_0, x_1, ..., x_{n-1}} expressed as
$$X = \sum_{i=0}^{n-1} x_i 2^i, \quad x_i \in \{0, 1\},$$
the (n+1)-digit CSD representation Y = {y_0, y_1, ..., y_n} of X is given by
$$X = \sum_{i=0}^{n} y_i 2^i, \quad y_i \in \{-1, 0, 1\}.$$
Figure 3 Block diagram of 512-point modified radix-2^5 FFT/IFFT processor
The algorithm uses a complex constant multiplier based on the common sub-expression sharing (CSS) technique, whose coefficients contain the fewest non-zero digits, rather than a programmable complex Booth multiplier. Hence, the area and power consumption of the complex multipliers can be reduced. In addition, the memory needed to store the twiddle factor LUT is only half the size required when using complex Booth multipliers. The Vedic multiplier is used for the calculation of the twiddle factors W8, W16, W32 and W512. The twiddle factor values are stored in a RAM and used as the multiplicand in the Vedic multiplier. Butterfly unit 1 (BU1) performs complex additions and subtractions of two input data: x[n] and x[n+N/2]. Butterfly unit 2 (BU2) includes the twiddle factor W4 multiplication, implemented using multiplexers. The block diagrams of BU1 and BU2 are shown in Figure 4.
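In software terms, the two butterfly units can be sketched as follows. This is a behavioural illustration only: the function names are hypothetical, and applying the W4 rotation to the second input before the add/subtract is an assumption about the data path. It relies on the fact that W4^1 = -j, so the multiplication reduces to swapping the real and imaginary parts with one sign change, which is why multiplexers suffice.

```python
def bu1(a, b):
    """Butterfly unit 1: sum and difference of x[n] and x[n + N/2]."""
    return a + b, a - b

def mul_by_minus_j(z):
    """Multiply by W4^1 = -j by swapping real and imaginary parts (no multiplier)."""
    return complex(z.imag, -z.real)

def bu2(a, b, rotate):
    """Butterfly unit 2: like BU1, but `rotate` selects, as a multiplexer would,
    whether the second input is first multiplied by -j (assumed placement)."""
    if rotate:
        b = mul_by_minus_j(b)
    return a + b, a - b

print(bu1(1 + 2j, 3 - 1j))   # ((4+1j), (-2+3j))
print(mul_by_minus_j(3 + 4j))  # (4-3j)
```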
Figure 4 Butterfly Units 1 & 2
The condition that all nonzero digits in a CSD number are separated by zeros implies that
$$y_i \, y_{i+1} = 0, \quad 0 \le i \le n-1.$$
From this property, the probability that a digit of an n-digit CSD number is nonzero can be derived. As n becomes large, this probability tends to 1/3, whereas it is 1/2 in ordinary binary code. Using this property, the number of additions/subtractions in multipliers is reduced to a minimum and, as a result, an overall speed-up can be achieved. The adoption of a ternary number system adds some flexibility to the CSD representation, since it allows the number of nonzero digits to be reduced. The conversion from a binary representation to the CSD representation is based on the following identity:
$$2^{i+j} - 2^{i} = 2^{i+j-1} + 2^{i+j-2} + \cdots + 2^{i}. \qquad (7)$$
This means that a string of 1s can be replaced by a 1, followed by 0s, followed by a 1̄ (where 1̄ denotes the digit -1). Isolated 1s are left unchanged, but isolated 0s are re-examined in such a way that, after applying Eq. (7), pairs of the type 11̄ are changed to 01. For example, the binary number (001010111)2 is equivalent in CSD representation to 0101̄01̄001̄; the encoding process is shown graphically in the figure below.
Figure 5 Conversion process from binary to CSD code
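The conversion can also be sketched in a few lines of code. The routine below uses the standard non-adjacent-form recurrence (examine the two lowest bits and emit +1 or -1 so that the next digit is forced to zero), which produces the same CSD digits as the string-replacement rule described above; it is an illustrative implementation, not the authors' encoder, and `to_csd` and `from_csd` are hypothetical names.

```python
def to_csd(x, width=None):
    """CSD digits of a non-negative integer, most significant digit first.
    Digits are in {-1, 0, 1} and no two adjacent digits are both nonzero."""
    digits = []  # built least significant digit first
    while x > 0:
        if x & 1:
            d = 2 - (x & 3)  # +1 when x mod 4 == 1, -1 when x mod 4 == 3
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    if width is not None:
        digits += [0] * (width - len(digits))  # pad with leading zeros
    return digits[::-1]

def from_csd(digits):
    """Evaluate a most-significant-first signed-digit list back to an integer."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value

print(to_csd(0b001010111, width=9))  # [0, 1, 0, -1, 0, -1, 0, 0, -1]
```

The printed digits match the example above: 87 = 128 - 32 - 8 - 1, which needs four nonzero digits instead of the five 1s in the binary form.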
III. VEDIC MATHEMATICS
Urdhva Tiryakbhyam Sutra is a general multiplication formula applicable to all cases of multiplication. It literally means "vertically and crosswise". It is based on a novel concept in which all partial products are generated concurrently with their addition. The parallelism in the generation of partial products and their summation is obtained using Urdhva Tiryakbhyam. The algorithm can be generalized to n x n bit numbers. Since the partial products and their sums are calculated in parallel, the multiplier is independent of the clock frequency of the processor: it requires the same amount of time to calculate the product regardless of the clock frequency. The net advantage is that it reduces the need for microprocessors to operate at increasingly high clock frequencies. While a higher clock frequency generally results in increased processing power, its disadvantage is that it also increases power dissipation, which results in higher device operating temperatures. By adopting the Vedic multiplier, microprocessor designers can circumvent these problems and avoid catastrophic device failures. The processing power of the multiplier can easily be increased by increasing the input and output data bus widths, since it has quite a regular structure. Owing to this regular structure, it can be laid out easily on a silicon chip. The multiplier has the advantage that, as the number of bits increases, gate delay and area increase very slowly compared to other multipliers. It is therefore time, area and power efficient. It has been demonstrated that this architecture is quite efficient in terms of silicon area/speed [10, 4]. Every multiplication operation is an embedded parallel 4x4 multiply module.
To illustrate the multiplication algorithm, let us consider the multiplication of two binary numbers a3a2a1a0 and b3b2b1b0. As the result of this multiplication would be more than four bits, we express it as ...r3r2r1r0. The line diagram for the multiplication of two 4-bit numbers is shown in Fig. 6, which maps the procedure into binary notation. For simplicity, each bit is represented by a circle. The least significant bit r0 is obtained by multiplying the least significant bits of the multiplicand and the multiplier. The process then follows the steps shown in Fig. 6.
Figure 6 Multiplication of 4X4 using Vedic Mathematics
Firstly, the least significant bits are multiplied, which gives the least significant bit of the product (vertical). Then, the LSB of the multiplicand is multiplied with the next higher bit of the multiplier and added to the product of the LSB of the multiplier and the next higher bit of the multiplicand (crosswise). The sum gives the second bit of the product, and the carry is added to the output of the next stage, which is obtained by the crosswise and vertical multiplication and addition of three bits of the two numbers from the least significant position. Next, all four bits are processed with crosswise multiplication and addition to give the sum and carry. The sum is the corresponding bit of the product, and the carry is again added to the next stage, the multiplication and addition of three bits excluding the LSB. The same operation continues until the multiplication of the two MSBs gives the MSB of the product. For example, if in some intermediate step we get 110, then 0 acts as the result bit (referred to as rn) and 11 as the carry (referred to as cn). It should be clearly noted that cn may be a multi-bit number. Using the fundamentals of Vedic multiplication, taking four bits at a time and using the 4-bit multiplier block as discussed, we can perform the multiplication. The outputs of the 4x4 bit multipliers are added accordingly to obtain the final product.
r0=a0b0;
c1r1=a1b0+a0b1;
c2r2=c1+a2b0+a1b1+a0b2;
c3r3=c2+a3b0+a2b1+a1b2+a0b3;
c4r4=c3+a3b1+a2b2+a1b3;
c5r5=c4+a3b2+a2b3;
c6r6=c5+a3b3
So we get c6r6r5r4r3r2r1r0 as the final product. This is the general mathematical formula applicable to all cases of multiplication.
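The column equations above translate directly into a short behavioural model. The sketch below evaluates r0 to r6 and the carries exactly as listed; it models the arithmetic only, not the parallel hardware. The function names are illustrative, and the way four 4x4 blocks are combined into an 8x8 multiplier in `urdhva_8x8` is one plausible reading of "added accordingly", stated here as an assumption.

```python
def urdhva_4x4(a, b):
    """Multiply two 4-bit numbers column by column (vertically and crosswise)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    product, carry = 0, 0
    for col in range(7):  # columns r0 .. r6
        # Cross products a_i * b_j with i + j == col, plus the incoming carry.
        column_sum = carry + sum(a_bits[i] & b_bits[col - i]
                                 for i in range(4) if 0 <= col - i < 4)
        product |= (column_sum & 1) << col  # result bit r_col
        carry = column_sum >> 1             # carry into the next column (may be multi-bit)
    return product | (carry << 7)           # c6 becomes the most significant bit

def urdhva_8x8(a, b):
    """Combine four 4x4 blocks into an 8x8 multiplier (assumed decomposition)."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    return (urdhva_4x4(a_lo, b_lo)
            + ((urdhva_4x4(a_hi, b_lo) + urdhva_4x4(a_lo, b_hi)) << 4)
            + (urdhva_4x4(a_hi, b_hi) << 8))

print(urdhva_4x4(13, 11))  # 143
```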
The hardware realization of a 4-bit multiplier is shown in Figure 7. This hardware design is very similar to that of the well-known array multiplier, where an array of adders is required to arrive at the final product. All the partial products are calculated in parallel, and the associated delay is mainly the time taken by the carry to propagate through the adders.
Figure 7 Hardware architecture of the Urdhva
Tiryakbhyam multiplier
IV. RESULTS AND CONCLUSIONS
For the implemented radix-2^5 512-point FFT processor we applied a 16-bit complex input that was stored in a LUT (look-up table) memory; the corresponding output can be viewed in the ModelSim window. For ease of visualisation, we applied a predetermined 512-point sinusoidal input, as shown in Fig. 8; its output consists of staggered impulses, as shown in Fig. 9.
Figure 8 Inputs to 512 point FFT processor
As we cannot visualise 512 inputs in a single window, the input sequence is displayed as an equivalent analog waveform, as shown in Fig. 8. In the same manner, the resulting output sequence is displayed as an analog waveform, as shown in Fig. 9.
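The expected shape of this result can be reproduced with a small floating-point model. This is a sketch under stated assumptions, not the simulated design: the tone frequency (8 cycles over the 512 samples) is chosen arbitrarily, the transform is evaluated directly from the DFT definition, and the twiddle factors are read from a precomputed table to mirror the lookup-table idea. A real sinusoid that fits an integer number of cycles yields exactly two impulses, at bins 8 and 512 - 8.

```python
import cmath
import math

N = 512
TONE_BIN = 8  # assumed: the sinusoid completes 8 cycles over the 512 samples

# Twiddle factors W_N^m, precomputed once in a lookup table.
twiddle = [cmath.exp(-2j * cmath.pi * m / N) for m in range(N)]

x = [math.sin(2 * math.pi * TONE_BIN * n / N) for n in range(N)]
X = [sum(x[n] * twiddle[(k * n) % N] for n in range(N)) for k in range(N)]

peaks = [k for k in range(N) if abs(X[k]) > 1.0]
print(peaks)  # [8, 504]
```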
# IOs: 67
# RAMs: 3
# Flip-flops: 3343
# Adders/Subtractors: 110
# XORs: 12
# Shift Registers: 951
Figure 9 Outputs of 512 point FFT processor
The Vedic multiplier does not use any storage unit for intermediate product values and thus ensures a substantial reduction in propagation delay. It is also found that the Vedic algorithm does not use any subtractor, thereby reducing the total area and computation cost. The RTL schematic of the implemented 512-point FFT processor is shown in Fig. 10.
Cell usage:
# BELS: 9308
# AND2: 243
# INV: 131
# IO Buffers: 66
# IBUF: 27
# OBUF: 39
This section presented the simulation and synthesis results of implementing the 512-point FFT processor using the Vedic algorithm. The system shows better performance when the Booth multiplier is replaced with the Vedic multiplier.
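Using the figures reported in Table 2, the relative improvements work out as follows. This is simple arithmetic on the reported values, not an additional measurement.

```latex
\[
\frac{93 - 81}{93} \approx 12.9\% \ \text{(power)}, \qquad
\frac{5.67 - 4.31}{5.67} \approx 24.0\% \ \text{(delay)}
\]
```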
Figure 10 RTL schematic of 512-point FFT processor
Table 2 Comparison of the conventional and Vedic algorithms
Parameter      Conventional algorithm    Vedic algorithm
Power          93 mW                     81 mW
Delay          5.67 ns                   4.31 ns
Temperature    30.4 °C                   27 °C
Table 3 Performance of the implemented FFT processor
Parameter      Using Vedic multiplier
Delay          4.31 ns
REFERENCES
1. Cortes A., Velez I., and Sevillano J. F., "Radix r^k FFTs: Matricial representation and SDC/SDF pipeline implementation," IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2824-2839, Jul. 2009.
2. Hartley R., "Subexpression sharing in filters using canonic signed digit multipliers," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 43, no. 10, pp. 677-688, Oct. 1996.
3. Huang S. and Chen S., "A green FFT processor with 2.5-GS/ for IEEE 802.15.3c (WPANs)," in Proc. Int. Conf. Green Circuits Syst. (ICGCS), 2010.
4. Cho T. and Lee H., "A High-Speed Low-Complexity Modified Radix-2^5 FFT Processor for High Rate WPAN Applications," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. PP, no. 99, Dec. 2011.
5. Ronisha Prakash A. and Kirubaveni S., "Performance Evaluation of FFT Processor Using Conventional and Vedic Algorithm," 2013 IEEE International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICECCN 2013).
6. Kunchigi V., Kulkarni L., and Kulkarni S., "High speed and area efficient Vedic multiplier," 2012 International Conference on Devices, Circuits and Systems (ICDCS), pp. 360-364, 15-16 March 2012.
7. Cho T., Lee H., Park K., and Park C., "A high-speed low-complexity modified radix-2^5 FFT processor for gigabit WPAN applications," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2011, pp. 1259-1262.
Authors Profiles
B. VARUN KUMAR is pursuing his Master's degree (M.Tech) in VLSI & Embedded System Design at Gates Institute of Technology, Gooty.
K. Maheswari is working as Associate Professor at Gates Institute of Technology, Gooty. Her areas of interest include mobile communication, wireless communication and cryptography.