Technion - Israel institute of technology department of Electrical Engineering High speed digital systems laboratory High-Throughput FFT Student : Andrey Kuyel Supervised by Mony Orbach Spring 2011 Midterm Presentation (One semestrial project) Project goals • The project goal is to design and implement on FPGA device FFT that capable to deal with data transmitted at the rates up to 10Ms/sec*. • The design will be written on VHDL • The project has aspects of: signal processing and logic design and high rate data processing. *- 5Ms/sec for each of I and Q components . FFT - Theoretic overview The DFT (N- length vector) definition is: N 1 X k x nWN nk ;0 k N 1 n 0 The time-complexity of the DFT is: ,WNk 2 k exp i N N 2 The FFT algorithm (developed at first by J.W. Cooley and John Tukey at 1965) comes to reduce the time-complexity of DFT into N log N This algorithm called: "The Cooley–Tukey radix-2 FFT algorithm". It is one of the most common FFT algorithms. FFT radix-2 - Theoretic overview The Cooley–Tukey radix-2 FFT algorithm The idea of algorithm is to compute in two parts: 1. Calculation two series Y k Y k , Z k N 1 2 x nW n 0 even nk N 2 , Z k While: N 1 2 x nW n 0 odd nk N 2 : The first series is using all of the even components of the input vector - x 2n , n 0,1, 2... N 1 x 2n 1 , n 1, 2... 2 N 2 , and the second uses the odd components This calculation takes log2 N FFT radix-2 - Theoretic overview Calculation of the twiddles: WNk 2 k exp i N The FFT (N- length vector) definition is: N X k Y k Z k WN k ;0 k 1 ; 2 N N X k Y k Z k WN k ;0 k 1 ; 2 2 FFT radix-2 - Theoretic overview The FFT schematic Radix-2 diagram (for N=2) – called butterfly unit: 4 real multipliers and 2 add/sub 16*16 bit 2 (16bits)adders/substractors The butterfly unit is used as a sub-unit in any Radix-2 FFT-N-size unit. The "twiddle factor" is the sine/cosine imaginary/real factor: WNk 2 k exp i N The FFT (N=8) radix 2 data flow C language program for creating FFT data flow for N points FFT FFT core features FFT core will have the following features: • Real and imaginary Inputs: 8 bits width each (for each of N point). • Real and imaginary outputs: 16 bits width each, where 8 MSB bits for integer part and 8 LSB bits for fractional part (for each N point). • Drop-in module for Virtex-6 (xc6vlx240T) • Forward complex FFT • Transform sizes N = 16,32,(possibly 64 and 128 ) • Arithmetic type: Fixed-point • Truncation after the butterfly • Block RAM or/and Distributed RAM for data storage • Bit/digit reversed or natural output order • Input data at frequency 10 Gs/sec (total rate for real and image part of data ) Parallel N points radix 2 FFT – Block diagram clock Synchronization signals Controls signals nput data (real). N points each of 8 bits width Input data (imag). N points each of 8 bits width Output data (real). N points each of 16 bits width Parallel N points radix 2 FFT Input data (imag). N points each of 16 bits width The FFT (N=16) radix 2 data flow – pipeline stages separation FFT verification and performance measurement The FFT core will be tested on Virtex-6 FPGA (Ml605 based on xc6vlx240T) Test module Virtex-6 FPGA A2D@10Gs/sec* Data in Controls in Data out FFT core Output signals Controls out Stimulus Memory MEM DATA@10Gs/sec Currently implementation will be made on Virtex-6 FPGA Family (xc6vlx240T), the devise contains 768 DSP slices, distributed RAM 3,650 Kb and block RAM of the size 14,976 Kb. See attached specification table at the end of document. *- Primary verification will be done on the reference data from FPGA internal memory that will emulate data sampled at the rate 10Ms/sec Post synthesis report • Maximum operational frequency up to 350MHz (5.6 Gs/sec per component). • The design have: 8*4*4=128 real (16*16 bits) multipliers. Where one 16*16 bits multiplier take 280 6LUTs 128 (16 bits) adders/subtractors each of the takes 16 of 6LUTS Simulation and verification • C program for generating radix 2 data flow • Radix 2 FFT N points parallel C language model for verification VHDL implementation of design. Project Schedule Weeks Tasks 1-2 Laboratory administrative issues Done 3 DFT , FFT Background studies Done 4 Studding different FFT algorithms and implimintations Done 5 Studding hardware available for the project implementation Done 6 Specification presentation, defining interface signals. Done 7 Defining FFT core architecture. (Composing block diagram that take advantage of hardware that FFT core will be tested on ) Done 8 Defining FSM (controllers) of the design and implementing them on VHDL. (Testing their functionality) In progress 9 Writing VHDL code for data path adapted for Virtex6 architecture. (Virtex-6 DSP slices and memory available for the core) Partial Project Schedule (continued…) Weeks Tasks 10 Continuing task from week 9. Writing VHDL code for data path adapted for Virtex6 architecture. (Virtex-6 DSP slices and memory available for the core) In progress 11 Midterm presentation Done 12-15 Finishing VHDL implementation and beginning of verification 16-17 Writing final presentation and project book