A radix-8/4/2 FFT processor for OFDM systems
Jungmin Park

Project background
– OFDM is widely used for high-speed digital communication
– Real-time applications need a high-performance FFT processor
– Dedicated FFT processor: serves only one specific application
– Variable-length FFT processor: serves many applications

  Application   FFT length
  DVB-T/H       2K-8K
  DAB           256-2K
  xDSL          256-4K
  WLAN          64-128

Nicola E. L’Insalata, Sergio Saponara, Luca Fanucci, Pierangelo Terreni, “Automatic Synthesis of cost effective FFT/IFFT cores for VLSI OFDM Systems”

Architecture of conventional FFT processors
Pipeline architecture
– Pipeline process
– Butterfly unit and memory at every computation stage
– High throughput
– Large area with large FFT length
Parallel architecture
– Parallel process
– The worst case in hardware efficiency
– High throughput
Shared memory architecture
– The best case in hardware efficiency
– Low throughput
– High-radix FFT algorithm
[Figure: (a) Pipeline architecture: input, chain of radix-r butterflies with twiddle multipliers W, output. (b) Parallel architecture: array of butterfly units. (c) Shared memory architecture: shared memory banks 1 to r and one radix-r butterfly unit.]

Project objective and contents
Project objective
– Design of a high-performance, variable-length FFT processor for OFDM systems
Project contents
– Hardware efficiency and area
  • Shared memory architecture
  • Proposed twiddle factor generator
– High throughput
  • Pipelined radix-8 DIF FFT algorithm
– Variable FFT length
  • Mixed radix-8/4/2 DIF butterfly unit
  • Radix-8 only FFT: $8^n$ points (64, 512, 4K points)
  • Mixed radix-8/4 FFT: $4 \times 8^n$ points (256, 2K points) and $8^n$ points
  • Mixed radix-8/2 FFT: $2 \times 8^n$ points (128, 1K, 8K points) and $8^n$ points
– Memory assignment and addressing
  • Efficient memory assignment and addressing

Structure of proposed FFT processor
Memory
– 8 banks
Memory address generator and commutator
Pipelined radix-8/4/2 butterfly unit
Twiddle factor generator
– Dual port
Control unit
[Figure: block diagram with address generator, memory banks Bank0 to Bank7, Commutator1, Commutator2, radix-8/4/2 BU, twiddle factor generator and control unit.]

Operation of the proposed FFT processor (64-point data flow)
[Figure: 64-point data flow.]

The pipelined radix-8/4/2 DIF butterfly unit

  S0  S1  Mode
  0   0   4 parallel radix-2
  0   1   2 parallel radix-4
  1   0   Radix-8 without multiplication
  1   1   Radix-8 with multiplication

Application of proposed butterfly unit

  Points  # Computation stages  Stage 1  Stage 2  Stage 3  Stage 4  Stage 5
  64      2                     Radix-8  Radix-8
  128     3                     Radix-8  Radix-8  Radix-2
  256     3                     Radix-8  Radix-8  Radix-4
  512     3                     Radix-8  Radix-8  Radix-8
  1K      4                     Radix-8  Radix-8  Radix-8  Radix-2
  2K      4                     Radix-8  Radix-8  Radix-8  Radix-4
  4K      4                     Radix-8  Radix-8  Radix-8  Radix-8
  8K      5                     Radix-8  Radix-8  Radix-8  Radix-8  Radix-2

Twiddle factor generator
– Recursive feedback difference equation

$$\sin n\theta = 2\cos\theta\,\sin(n-1)\theta - \sin(n-2)\theta$$
$$\cos n\theta = 2\cos\theta\,\cos(n-1)\theta - \cos(n-2)\theta$$

– Error propagation problem: each quantized value carries an error of up to $\pm 2^{-(n+1)}$, so one step of

$$\sin m\theta = 2\cos\theta\,\sin(m-1)\theta - \sin(m-2)\theta$$

is actually computed as

$$2\left(\cos\theta \pm 2^{-(n+1)}\right)\left[\sin(m-1)\theta \pm 2^{-(n+1)}\right] - \left[\sin(m-2)\theta \pm 2^{-(n+1)}\right]$$
$$= 2\cos\theta\,\sin(m-1)\theta - \sin(m-2)\theta \pm 2^{-n}\left[\sin(m-1)\theta \pm \cos\theta \pm \tfrac{1}{2} \pm 2^{-(n+1)}\right]$$
$$\max\left|\,2^{-n}\left[\sin(m-1)\theta \pm \cos\theta \pm \tfrac{1}{2} \pm 2^{-(n+1)}\right]\right| < 2^{-(n-2)}$$
$$-\frac{3}{2^{n}} < \text{error} < \frac{3}{2^{n}}$$

– Proposed error correction using correction table

$$[z_2 z_1 z_0]_2 = \mathrm{unsigned}\left([x_2 x_1 x_0]_2 - [y_2 y_1 y_0]_2\right)$$
$$\mathrm{Correct\_value} = \mathrm{signed}\left(\mathrm{computed\_value} + [z_2 z_1 z_0]_2\right)$$

(where $[x_2 x_1 x_0]_2$ is the 3 LSBs of the correct value in the correction table and $[y_2 y_1 y_0]_2$ is the 3 LSBs of the computed value)

Structure of proposed twiddle factor generator
[Figure: (a) Sine function generator: LUT (correction table), LUT(sin θ_j), LUT(cos θ_j), registers (REG), error correction block, 16/17-bit datapath, output sin(nθ). (b) Cosine function generator: LUT (correction table), LUT(cos θ_j), registers (REG), error correction block, 23/24-bit datapath with constant 23'h200000, round-off to 16 bits, output cos(nθ).]

$$\sin n\theta = 2\cos\theta\,\sin(n-1)\theta - \sin(n-2)\theta$$
$$\cos n\theta = 2\cos\theta\,\cos(n-1)\theta - \cos(n-2)\theta$$

Implementation and verification (1)
VHDL modeling
How to verify and measure SQNR
[Figure: verification flow. MATLAB: random generator, 16-QAM modulation, ideal IFFT, 16-bit quantization. The quantized data feeds the HDL model of the proposed FFT processor (16-bit output) and the ideal FFT in MATLAB; the two outputs are compared.]

$$\mathrm{SQNR} = 10\log_{10}\left(\frac{(\mathrm{Re}(A))^2 + (\mathrm{Im}(A))^2}{(\mathrm{Re}(A)-\mathrm{Re}(B))^2 + (\mathrm{Im}(A)-\mathrm{Im}(B))^2}\right)$$

(where A is the value from MATLAB and B is the value from the proposed FFT)

Implementation and verification (2)
Simulation (64-point FFT)
[Figure: (a) Constellations. (b) Error and SQNR.]

  Points     64    128   256   512   1024  2048  4096  8192
  SQNR (dB)  66.9  63.2  60.3  57.7  55    51.9  48.1  45.3

Implementation and verification (3)
FPGA synthesis
– Xilinx ISE 12.4
– Xilinx Virtex-5

  Input data width (bit)      16
  Twiddle factor width (bit)  16
  LUTs                        4811
  Block RAMs                  22
  DSP48s                      57
  Critical path (ns)          10.339
  Max. freq. (MHz)            96.723

Conclusion
Design of high-performance and variable-length FFT processor
Shared memory architecture
– Simplicity of hardware
Proposed radix-8/4/2 DIF butterfly unit
– Every point FFT computation from 64 to 8192 points
Proposed twiddle factor generator
– 80% reduction
SQNR: 45.3 ~ 66.7 dB
OFDM standard (symbol duration)
– Proposed FFT processor for OFDM applications, such as 802.11a, 802.16a, DAB, DVB-T and so on

Thank you for listening
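
Sketch: stage plan per FFT length
The stage table under "Application of proposed butterfly unit" follows a simple rule: use as many radix-8 stages as fit, then finish with one radix-4 or radix-2 stage when the length is not a power of 8. The Python sketch below reproduces that table. The function name `stage_plan` is hypothetical and not taken from the slides.

```python
def stage_plan(n_points):
    """Return the radix of each computation stage for an n_points FFT.

    Hypothetical helper: radix-8 stages first, then one radix-4 or
    radix-2 stage if log2(n_points) is not a multiple of 3.
    """
    is_power_of_two = n_points > 0 and n_points & (n_points - 1) == 0
    if not is_power_of_two or not 64 <= n_points <= 8192:
        raise ValueError("supported lengths are powers of two from 64 to 8192")
    log2_n = n_points.bit_length() - 1
    radix8_stages, leftover_bits = divmod(log2_n, 3)
    plan = [8] * radix8_stages
    if leftover_bits:
        plan.append(1 << leftover_bits)  # 1 leftover bit: radix-2, 2 bits: radix-4
    return plan


for n in (64, 128, 256, 512, 1024, 2048, 4096, 8192):
    print(n, len(stage_plan(n)), stage_plan(n))
```

Each printed row gives the point count, the number of computation stages, and the radix of each stage in order.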
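
Sketch: functional model of the S0/S1 modes
The mode table of the pipelined radix-8/4/2 DIF butterfly unit can be illustrated with a behavioural model. This is only a sketch under assumptions the slides do not confirm: the eight inputs are grouped as consecutive blocks, outputs are in natural order, and "with multiplication" means one twiddle factor per output applied after the butterfly. The names `MODES`, `small_dft` and `butterfly_unit` are hypothetical.

```python
import cmath

# (S0, S1) -> (mode name, radix), copied from the mode table.
MODES = {
    (0, 0): ("4 parallel radix-2", 2),
    (0, 1): ("2 parallel radix-4", 4),
    (1, 0): ("radix-8 without multiplication", 8),
    (1, 1): ("radix-8 with multiplication", 8),
}


def small_dft(x):
    """Direct r-point DFT, standing in for one radix-r butterfly."""
    r = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / r) for n in range(r))
        for k in range(r)
    ]


def butterfly_unit(inputs, s0, s1, twiddles=None):
    """Process 8 samples according to the selected mode.

    Assumption: consecutive inputs form each radix-r group.
    """
    if len(inputs) != 8:
        raise ValueError("the unit takes exactly 8 samples")
    _, radix = MODES[(s0, s1)]
    out = []
    for start in range(0, 8, radix):
        out.extend(small_dft(inputs[start:start + radix]))
    if (s0, s1) == (1, 1):
        if twiddles is None or len(twiddles) != 8:
            raise ValueError("mode (1, 1) needs 8 twiddle factors")
        out = [v * w for v, w in zip(out, twiddles)]
    return out


print(butterfly_unit([1, 0, 0, 0, 0, 0, 0, 0], 1, 0))
```

Only the radix-8 mode with multiplication applies twiddle factors in this model, which matches the stage table: the radix-4 and radix-2 modes appear only as the final stage.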
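
Sketch: recursive sine generation with 3-LSB correction
The recursive difference equation and the correction-table idea from the twiddle factor generator slides can be tried out in integer arithmetic. The sketch below is not the author's VHDL. It assumes 15 fractional bits, ignores word-width overflow (Python integers are unbounded), builds the correction table from `math.sin`, and reads the 3-bit difference as a two's-complement value in -4..3. All names are hypothetical.

```python
import math


def signed3(z):
    """Interpret a 3-bit pattern as a two's-complement value in -4..3."""
    return z - 8 if z >= 4 else z


def sine_sequence(theta, count, frac_bits=15, correct=True):
    """Generate round(sin(n*theta) * 2**frac_bits) for n < count recursively.

    Returns (generated, reference). The reference is computed directly and
    also supplies the 3 LSBs stored in the correction table.
    """
    scale = 1 << frac_bits
    reference = [round(math.sin(n * theta) * scale) for n in range(count)]
    correction_table = [v & 7 for v in reference]  # 3 LSBs of the correct values
    two_cos = 2 * round(math.cos(theta) * scale)   # quantized 2*cos(theta)
    seq = reference[:2]                            # seeds sin(0) and sin(theta)
    for n in range(2, count):
        # sin(n*theta) = 2*cos(theta)*sin((n-1)*theta) - sin((n-2)*theta)
        product = (two_cos * seq[n - 1] + (scale >> 1)) >> frac_bits
        computed = product - seq[n - 2]
        if correct:
            z = (correction_table[n] - (computed & 7)) & 7
            computed += signed3(z)
        seq.append(computed)
    return seq, reference


theta = 2 * math.pi / 8192
fixed, ref = sine_sequence(theta, 8192, correct=True)
raw, _ = sine_sequence(theta, 8192, correct=False)
print("max error with correction:   ", max(abs(a - b) for a, b in zip(fixed, ref)))
print("max error without correction:", max(abs(a - b) for a, b in zip(raw, ref)))
```

With the correction applied at every step, the fed-back values are always the correctly rounded ones, so the per-step error stays within the ±3 LSB range that three stored bits can resolve. Without it, the quantization error is fed back and accumulates.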
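
Sketch: SQNR measurement
The SQNR formula from "Implementation and verification (1)" compares the ideal FFT output A with the processor output B. The sketch below assumes that signal and error powers are summed over all output bins before taking the ratio; the slides do not state this explicitly. The function name `sqnr_db` is hypothetical.

```python
import math


def sqnr_db(reference, measured):
    """SQNR in dB between reference outputs A and measured outputs B.

    Assumption: powers are accumulated over all bins before the ratio.
    """
    if len(reference) != len(measured):
        raise ValueError("both sequences must have the same length")
    signal = sum(a.real ** 2 + a.imag ** 2 for a in reference)
    noise = sum(
        (a.real - b.real) ** 2 + (a.imag - b.imag) ** 2
        for a, b in zip(reference, measured)
    )
    if noise == 0:
        return math.inf  # identical outputs: no quantization noise
    return 10 * math.log10(signal / noise)


ideal = [1 + 0j, 0 + 1j]
quantized = [1.1 + 0j, 0 + 1j]
print(round(sqnr_db(ideal, quantized), 2), "dB")
```

In the verification flow on the slides, `reference` would be the MATLAB ideal FFT output and `measured` the output of the HDL model.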