International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number 2 – Dec 2014

Optimization of the performance of FFT Processor using Vedic algorithm

B. VARUN KUMAR¹, Mrs. K. Maheswari²
¹ PG Student (M.Tech), Dept. of ECE, Gates Institute of Technology, Gooty.
² Associate Professor, Dept. of ECE, Gates Institute of Technology, Gooty.

Abstract: Over the past few years, there has been increasing emphasis on extending the services available on wired public telecommunications networks to wireless telecommunications users. Demand for wireless broadband multimedia communication systems (WBMCS) is anticipated within both the public and private sectors. To overcome the multipath-fading environment with low complexity and to achieve WBMCS, we adopt the orthogonal frequency-division multiplexing (OFDM) transmission scheme. OFDM is one application of a parallel data transmission scheme, which makes complex equalizers unnecessary. The Fast Fourier Transform (FFT) processor is a key component of OFDM modulation. In this paper we propose the optimization of a 512-point FFT processor using a Vedic multiplier, which reduces unwanted multiplication steps and hence reduces propagation delay. The Vedic multiplier is based on a novel concept through which the generation of all partial products can be done concurrently with the addition of these partial products. The parallelism in the generation of partial products and their summation is obtained using Urdhva Tiryakbhyam (vertically and crosswise). The Vedic multiplier has the advantage that as the number of bits increases, gate delay and area increase very slowly compared to other multipliers. An FFT processor employing the Vedic multiplier therefore reduces propagation delay, hardware complexity in terms of area, and power. Finally, a comparison is made between the FFT processor using the Vedic multiplier and the one using the existing Booth multiplier.
Keywords: Orthogonal frequency division multiplexing, FFT Processor, Vedic mathematics.

ISSN: 2231-5381

I. INTRODUCTION

OFDM, essentially identical to coded OFDM (COFDM) and discrete multi-tone modulation (DMT), is a frequency-division multiplexing (FDM) scheme used as a digital multi-carrier modulation method. A large number of closely spaced orthogonal sub-carriers are used to carry data. The data is divided into several parallel data streams or channels, one for each sub-carrier. Each sub-carrier is modulated with a conventional modulation scheme (such as quadrature amplitude modulation or phase-shift keying) at a low symbol rate, maintaining total data rates similar to those of conventional single-carrier modulation schemes in the same bandwidth. OFDM has developed into a popular scheme for wideband digital communication, whether wireless or over copper wires, used in applications such as digital television and audio broadcasting, wireless networking and broadband internet access.

The primary advantage of OFDM over single-carrier schemes is its ability to cope with severe channel conditions (for example, attenuation of high frequencies in a long copper wire, narrowband interference and frequency-selective fading due to multipath) without complex equalization filters. Channel equalization is simplified because OFDM may be viewed as using many slowly modulated narrowband signals rather than one rapidly modulated wideband signal. The low symbol rate makes the use of a guard interval between symbols affordable, making it possible to handle time-spreading and overcome inter-symbol interference (ISI).
This mechanism also facilitates the design of Single Frequency Networks (SFNs), where several adjacent transmitters send the same signal simultaneously at the same frequency, because the signals from multiple distant transmitters may be combined constructively, rather than interfering as would typically occur in a traditional single-carrier system.

Figure 1 illustrates the difference between the conventional non-overlapping multicarrier technique and the overlapping multicarrier modulation technique. By using the overlapping multicarrier modulation technique, we save almost 50% of the bandwidth. To realize this technique, however, we need to reduce crosstalk among subcarriers (SCs), which means that we want orthogonality among the different modulated carriers. The word "orthogonal" indicates that there is a precise mathematical relationship between the frequencies of the carriers in the system. In a conventional FDM system, the carriers are spaced apart in such a way that the signals can be received using conventional filters and demodulators. In such receivers, guard bands are introduced between the different carriers in the frequency domain, which results in a lowering of spectrum efficiency.

http://www.ijcttjournal.org

The input to the DFT is a finite sequence of real or complex numbers, making the DFT ideal for processing information stored in computers. In particular, the DFT is widely used in signal processing and related fields to analyze the frequencies contained in a sampled signal, to solve partial differential equations, and to perform other operations such as convolutions or multiplying large integers.
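As a minimal sketch of what the DFT computes on a finite input sequence, the following Python fragment evaluates the standard definition X[k] = sum over n of x[n]·exp(-j2πkn/N) directly. The function name `dft` and the 8-sample cosine input are illustrative assumptions and are not part of the processor described in this paper.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_points = len(x)
    return [
        sum(
            x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
            for n in range(n_points)
        )
        for k in range(n_points)
    ]


# Illustrative input (assumption): one full cycle of a cosine over 8 samples.
samples = [math.cos(2 * math.pi * n / 8) for n in range(8)]
spectrum = dft(samples)

# The energy of a real cosine appears in bins 1 and N-1, each with magnitude N/2.
magnitudes = [round(abs(value), 6) for value in spectrum]
```

The direct form needs on the order of N² complex multiplications, which is why the cost of the multiplier matters for large N.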
A key enabling factor for these applications is the fact that the DFT can be computed efficiently in practice using a Fast Fourier Transform (FFT) algorithm.

Figure 1 (A) Spectrum of FDM showing guard bands (B) Spectrum of OFDM showing overlapping subcarriers

Figure 2 Basic OFDM transmitter and receiver

In mathematics, the discrete Fourier transform (DFT) is a specific kind of discrete transform, used in Fourier analysis. It transforms one function into another, which is called the frequency domain representation, or simply the DFT, of the original function (which is often a function in the time domain). However, the DFT requires an input function that is discrete and whose non-zero values have a limited (finite) duration. Such inputs are often created by sampling a continuous function, such as a person's voice. Unlike the discrete-time Fourier transform (DTFT), it only evaluates enough frequency components to reconstruct the finite segment that was analyzed. Using the DFT implies that the finite segment that is analyzed is one period of an infinitely extended periodic signal; if this is not actually true, a window function has to be used to reduce the artifacts in the spectrum. For the same reason, the inverse DFT cannot reproduce the entire time domain, unless the input happens to be periodic (forever). Therefore it is often said that the DFT is a transform for Fourier analysis of finite-domain discrete-time functions. The sinusoidal basis functions of the decomposition have the same properties.

FFT algorithms are so commonly employed to compute DFTs that the term "FFT" is often used to mean "DFT" in informal settings. Formally, there is a clear distinction: "DFT" refers to a mathematical transformation or function, regardless of how it is computed, whereas "FFT" refers to a specific family of algorithms for computing DFTs.

II.
MODIFIED FFT/IFFT PROCESSOR

Modified Radix-2^5 Algorithm

The 512-point FFT with the radix-2^k algorithm consists of nine arithmetic stages. The radix-2^k algorithm is derived using a (k+1)-dimensional linear index mapping. The radix-2^5 algorithm can be expressed in various forms using the common factor algorithm. The algorithm is derived as follows, applying a 6-D linear index map:

[Equation]

The radix-2^5 algorithm is reformulated into two decomposition methods (Method 1 and Method 2), which are referred to as the modified radix-2^5 algorithm. The common factor algorithm can be used to derive Method 1 and Method 2, which in turn are used to derive the radix-2^5 FFT algorithm. The common factor algorithm takes the form

[Equation]

Method 1

Method 1 of the modified radix-2^5 algorithm is expressed as follows:

[Equation]

Both radix-2^5 decomposition methods involve complex-valued twiddle factors. If the two decomposition methods are used separately, the number of twiddle factor multiplications tends to increase. This is because stage 2 in Method 2 has the twiddle factor W16, which has higher hardware complexity than W8 in Method 1, while stages 7 and 8 in Method 1 have the twiddle factors W8 and W32, which require higher hardware complexity than W16 in Method 2. The modified radix-2^5 algorithm has a similar complex multiplication pattern that repeats after every five stages, as tabulated in Table 1.

Table 1 Twiddle factors at different stages of 512-point FFT
Radix | Method 1 | Method 2 | Mixed

So the number of twiddle factor multiplications can be reduced by combining the two methods.
The processing elements in the first five stages are realized by Method 1 and those in the last three stages are realized by Method 2; this combination is referred to as the mixed method.

Method 2

Method 2 can be expressed as follows:

[Equation]

Figure 3 Block diagram of 512-point modified radix-2^5 FFT/IFFT processor

The design uses complex constant multipliers based on the canonic signed digit (CSD) representation, which contains the fewest nonzero digits, together with the common sub-expression sharing (CSS) technique, rather than a programmable complex Booth multiplier. Hence, the area and power consumption of the complex multipliers can be reduced. In addition, the memory needed to store the twiddle factor LUT is only half the size required when using complex Booth multipliers. The Vedic multiplier is used for the calculation of the twiddle factors W8, W16, W32 and W512. The twiddle factor values are stored in a RAM and used as the multiplicand in the Vedic multiplier. Butterfly unit 1 (BU1) performs complex additions and subtractions of two input data: x[n] and x[n+N/2]. Butterfly unit 2 (BU2) includes the twiddle factor W4 multiplication, implemented using multiplexers. The block diagrams of BU1 and BU2 are shown below.

The CSD representation of an integer is a signed, unique digit representation that contains no adjacent nonzero digits. Given an n-digit binary unsigned number X = {x_0, x_1, ..., x_{n-1}} expressed as

$$X = \sum_{i=0}^{n-1} x_i 2^i, \quad x_i \in \{0, 1\}$$

the (n+1)-digit CSD representation Y = {y_0, y_1, ..., y_n} of X is given by

$$\sum_{i=0}^{n} y_i 2^i = \sum_{i=0}^{n-1} x_i 2^i, \quad y_i \in \{-1, 0, 1\}$$

The condition that all nonzero digits in a CSD number are separated by zeros implies that

$$y_i \, y_{i+1} = 0, \quad 0 \le i \le n-1$$

From this property, the probability that a digit of an n-digit CSD number is nonzero is given by

[Equation]

As n becomes large, this probability tends to 1/3, whereas the corresponding probability is 1/2 in ordinary binary code.
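The two CSD properties stated here (no adjacent nonzero digits, and a nonzero-digit density near 1/3 against 1/2 for plain binary) can be checked with a short Python sketch. The helper names `to_csd` and `nonzero_density`, the conversion by inspecting the value modulo 4, and the choice of 12-bit inputs are assumptions made for illustration; they are not taken from the processor design.

```python
def to_csd(value, n_digits):
    """Return the CSD digits of a non-negative integer, most significant first.

    Each digit is -1, 0 or +1. `n_digits` is the (assumed) output width.
    """
    digits = []  # collected least significant digit first
    x = value
    while x != 0:
        if x & 1:
            # x mod 4 == 1 emits +1; x mod 4 == 3 emits -1 (start of a run of 1s)
            d = 2 - (x & 3)
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    if len(digits) > n_digits:
        raise ValueError("value does not fit in n_digits CSD digits")
    digits += [0] * (n_digits - len(digits))
    return digits[::-1]


def nonzero_density(n_bits):
    """Fraction of nonzero CSD digits over all n_bits-wide unsigned inputs."""
    total = sum(
        sum(1 for d in to_csd(v, n_bits + 1) if d != 0)
        for v in range(1 << n_bits)
    )
    return total / ((n_bits + 1) * (1 << n_bits))


# The value 87, i.e. (001010111)2, written with nine CSD digits.
example = to_csd(0b001010111, 9)
density_12 = nonzero_density(12)
```

For the value 87 the sketch returns the digits 0, 1, 0, -1, 0, -1, 0, 0, -1, and the measured density for 12-bit inputs lies close to 1/3, so roughly a third fewer add/subtract operations are needed than with the plain binary form.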
Using this property, the number of additions/subtractions in multipliers is reduced to a minimum and, as a result, an overall speed-up can be achieved. The adoption of a ternary number system adds some flexibility to the CSD representation, since it allows the number of nonzero digits to be reduced. The conversion from a binary representation to the CSD representation is generally based on the following identity:

$$2^{i+j-1} + 2^{i+j-2} + \cdots + 2^{i} = 2^{i+j} - 2^{i}$$

This means that a string of 1s may be replaced by a 1, followed by 0s, followed by a 1̄. Isolated 1s are left unchanged, but isolated 0s are re-examined in such a way that, after applying Eq. (7), pairs of the form 11̄ are changed to 01. For instance, the binary number (001010111)2 is equivalent in CSD representation to 0101̄01̄001̄; the encoding process is shown diagrammatically in Figure 5.

Figure 4 Butterfly Units 1 & 2

Figure 5 Conversion process from binary to CSD code

III. VEDIC MATHEMATICS

The Urdhva Tiryakbhyam Sutra is a general multiplication formula applicable to all cases of multiplication. It literally means "Vertically and crosswise". It is based on a novel concept through which the generation of all partial products can be done concurrently with the addition of these partial products. The parallelism in the generation of partial products and their summation is obtained using Urdhva Tiryakbhyam. The algorithm can be generalized for n x n bit numbers. Since the partial products and their sums are calculated in parallel, the multiplier is independent of the clock frequency of the processor. The multiplier therefore requires the same amount of time to calculate the product regardless of the clock frequency.
The net advantage is that it reduces the need for microprocessors to operate at increasingly high clock frequencies. While a higher clock frequency generally results in increased processing power, its disadvantage is that it also increases power dissipation, which results in higher device operating temperatures. By adopting the Vedic multiplier, microprocessor designers can easily circumvent these problems and avoid catastrophic device failures. The processing power of the multiplier can easily be increased by increasing the input and output data bus widths, since it has a quite regular structure. Owing to its regular structure, it can be easily laid out on a silicon chip. The multiplier has the advantage that as the number of bits increases, gate delay and area increase very slowly compared to other multipliers. Therefore it is time, area and power efficient. It has been demonstrated that this architecture is quite efficient in terms of silicon area/speed [10, 4].

Each multiplication operation is an embedded parallel 4x4 multiply module. To illustrate the multiplication algorithm, let us consider the multiplication of two binary numbers a3a2a1a0 and b3b2b1b0. As the result of this multiplication would be more than four bits, we express it as ...r3r2r1r0. The line diagram for the multiplication of two 4-bit numbers is shown in Fig. 3.4, which maps the same procedure into binary notation. For simplicity, each bit is represented by a circle. The least significant bit r0 is obtained by multiplying the least significant bits of the multiplicand and the multiplier. The process then follows the steps shown in Fig. 6.

Figure 6 Multiplication of 4x4 using Vedic Mathematics

Firstly, the least significant bits are multiplied, which gives the least significant bit of the product (vertical).
Then, the LSB of the multiplicand is multiplied with the next higher bit of the multiplier and added to the product of the LSB of the multiplier and the next higher bit of the multiplicand (crosswise). The sum gives the second bit of the product, and the carry is added to the output of the next stage, which is obtained by the crosswise and vertical multiplication and addition of three bits of the two numbers from the least significant position. Next, all four bits are processed with crosswise multiplication and addition to give the sum and carry. The sum is the corresponding bit of the product, and the carry is again added to the next stage: the multiplication and addition of three bits, excluding the LSB. The same operation continues until the multiplication of the two MSBs, which gives the MSB of the product. For example, if in some intermediate step we get 110, then 0 acts as the result bit (referred to as rn) and 11 as the carry (referred to as cn). It should be clearly noted that cn may be a multi-bit number.

Using the fundamentals of Vedic multiplication, taking four bits at a time and using the 4-bit multiplier block as discussed, we can perform the multiplication. The outputs of the 4x4 bit multipliers are added accordingly to obtain the final product.

r0 = a0b0
c1r1 = a1b0 + a0b1
c2r2 = c1 + a2b0 + a1b1 + a0b2
c3r3 = c2 + a3b0 + a2b1 + a1b2 + a0b3
c4r4 = c3 + a3b1 + a2b2 + a1b3
c5r5 = c4 + a3b2 + a2b3
c6r6 = c5 + a3b3

So we get c6r6r5r4r3r2r1r0 as the final product. This is the general mathematical formula applicable to all cases of multiplication. The hardware realization of a 4-bit multiplier is shown in Figure 7. This hardware design is very similar to that of the well-known array multiplier, where an array of adders is required to arrive at the final product.
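The column equations for r0 to r6 above can be sketched in Python as follows. This is a behavioural model only: the function name `vedic_multiply_4x4` is an illustrative assumption, and the loop evaluates the columns one after another, whereas the hardware forms all partial products in parallel.

```python
def vedic_multiply_4x4(a, b):
    """Multiply two 4-bit unsigned integers column by column.

    Column k sums every partial product a_i * b_j with i + j == k plus the
    carry from column k - 1, mirroring c_k r_k = c_(k-1) + sum(a_i * b_j).
    """
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be 4-bit unsigned values")
    a_bits = [(a >> i) & 1 for i in range(4)]  # a0..a3
    b_bits = [(b >> i) & 1 for i in range(4)]  # b0..b3

    product = 0
    carry = 0  # c_k, which may be wider than one bit
    for column in range(7):  # produces r0..r6
        total = carry + sum(
            a_bits[i] * b_bits[column - i]
            for i in range(4)
            if 0 <= column - i < 4
        )
        product |= (total & 1) << column  # r_k is the low bit of the column sum
        carry = total >> 1  # the remaining bits move to the next column
    product |= carry << 7  # c6 becomes the most significant bit
    return product


# Exhaustive comparison against ordinary integer multiplication.
all_match = all(
    vedic_multiply_4x4(a, b) == a * b for a in range(16) for b in range(16)
)
```

Keeping the carry as an integer rather than a single bit reflects the remark that cn may be a multi-bit number.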
All the partial products are calculated in parallel, and the associated delay is mainly the time taken by the carry to propagate through the adders.

Figure 7 Hardware architecture of the Urdhva Tiryakbhyam multiplier

IV. RESULTS AND CONCLUSIONS

For the implemented radix-2^5 512-point FFT processor, we applied a 16-bit complex input that was stored in a LUT (look-up table) memory; the corresponding output can be viewed in the ModelSim window. For ease of visualization, we applied a predetermined 512-point sinusoidal input signal, shown in Fig. 8; its output consists of samples of staggered impulses, shown in Fig. 9.

Figure 8 Inputs to 512 point FFT processor

As we cannot visualize 512 inputs in a single window, the input sequence is displayed as an equivalent analog waveform, as shown in Fig. 8. In the same manner, the resulting output sequence is displayed as an analog waveform, as shown in Fig. 9.

Figure 9 Outputs to 512 point FFT processor

The Vedic multiplier does not use any storage unit for storing intermediate product values and thus ensures a substantial reduction in the propagation delay. It is also found that the Vedic algorithm does not use any subtractor, thereby reducing the total area and computation cost. The RTL schematic of the implemented 512-point FFT processor is shown in Fig. 10.

IOs                     67
RAMs                    3
Flip-flops              3343
# Adders/Subtractors    110
# XORs                  12
# Shift Registers       951

Cell usage:
# BELS                  9308
# AND2                  243
# INV                   131
# IO Buffers            66
# IBUF                  27
# OBUF                  39

This section presents the simulation and synthesis results of implementing the 512-point FFT processor using the Vedic algorithm. It also shows better performance of the system when the Booth multiplier is replaced with the Vedic multiplier.
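The qualitative behaviour reported in this section, a 512-point sinusoidal input producing isolated impulses at the output, can be reproduced in software with the following Python sketch. It is not the processor described here: it uses a plain recursive radix-2 decimation-in-time FFT in floating point instead of the modified radix-2^5 architecture with 16-bit data, and the tone position `TONE_BIN = 8` is an assumed value chosen only for illustration.

```python
import cmath
import math


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of 2)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_n^k = exp(-2j*pi*k/n) applied to the odd branch.
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


N = 512
TONE_BIN = 8  # assumed number of sine cycles across the 512 samples

signal = [math.sin(2 * math.pi * TONE_BIN * n / N) for n in range(N)]
spectrum = fft_radix2(signal)

# Bins whose magnitude stands clearly above floating-point round-off.
peaks = [k for k, value in enumerate(spectrum) if abs(value) > 1.0]
```

With these assumptions the only significant bins are `TONE_BIN` and `N - TONE_BIN`, each with magnitude N/2, which corresponds to the impulse-like output waveform described for Fig. 9.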
Figure 10 RTL schematic of 512-point FFT processor

Table 2
Parameter      Conventional algorithm   Vedic algorithm
Power          93 mW                    81 mW
Delay          5.67 ns                  4.31 ns
Temperature    30.4 °C                  27 °C

Table 3 Performance of the Implemented FFT processor
Parameters     Using Vedic multiplier
Delay          4.31 ns

REFERENCES

1. A. Cortes, I. Velez, and J. F. Sevillano, "Radix r^k FFTs: Matricial representation and SDC/SDF pipeline implementation," IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2824–2839, Jul. 2009.
2. R. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 43, no. 10, pp. 677–688, Oct. 1996.
3. S. Huang and S. Chen, "A green FFT processor with 2.5-GS/ for IEEE 802.15.3c (WPANs)," in Proc. Int. Conf. Green Circuits Syst. (ICGCS), 2010.
4. T. Cho and H. Lee, "A High-Speed Low-Complexity Modified radix 2^5 FFT Processor for High Rate WPAN Applications," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. PP, no. 99, Dec. 2011.
5. A. Ronisha Prakash and S. Kirubaveni, "Performance Evaluation of FFT Processor Using Conventional and Vedic Algorithm," 2013 IEEE International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICECCN 2013).
6. V. Kunchigi, L. Kulkarni, and S. Kulkarni, "High speed and area efficient vedic multiplier," Devices, Circuits and Systems (ICDCS), 2012 International Conference on, pp. 360–364, 15–16 March 2012.
7. T. Cho, H. Lee, K. Park, and C. Park, "A high-speed low-complexity modified radix 2^5 FFT processor for gigabit WPAN applications," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2011, pp. 1259–1262.

Authors Profiles

B. VARUN KUMAR is pursuing his Master's degree (M.Tech) in VLSI & Embedded System Design at Gates Institute of Technology, Gooty.

K.
Maheswari is working as Associate Professor at Gates Institute of Technology, Gooty. Her areas of interest include mobile communication, wireless communication, and cryptography.