September 2004    doc.: IEEE 802.15-04-0467-00-003a

Project: IEEE P802.15 Working Group for Wireless Personal Area Networks (WPANs)
Submission Title: [Implementation of High Speed FFT processor for MB-OFDM System]
Date Submitted: [September 2004]
Revised: []
Source: [Sang-sung Choi, Sang-in Cho] Company [Electronics and Telecommunications Research Institute]
Address [161 Gajeong-dong, Yuseong-gu, Daejeon, 305-350 Korea]
Voice: [+82-42-860-6722], FAX: [+82-42-860-5199], E-mail [sschoi@etri.re.kr]
Re: [Technical contribution]
Abstract: [This presentation describes an implementation method for the IFFT/FFT processor of the MB-OFDM UWB system]
Purpose: [Technical contribution on implementing the IFFT/FFT processor proposed for the MB-OFDM UWB system]
Notice: This document has been prepared to assist the IEEE P802.15. It is offered as a basis for discussion and is not binding on the contributing individual or organization. The material in this document is subject to change in form and content after further study. The contributor reserves the right to add, amend or withdraw material contained herein.
Release: The contributor acknowledges and accepts that this contribution becomes the property of IEEE and may be made publicly available by P802.15.

Implementation of High Speed FFT processor for MB-OFDM System
Sang-Sung Choi (sschoi@etri.re.kr)
Sang-In Cho (sicho@etri.re.kr)
ETRI
www.etri.re.kr

Introduction

The MB-OFDM UWB proposal requires high-speed IFFT/FFT processors with 128-point computation.
The digital output of the IFFT processor is converted to an analog signal by the DAC and then passes through a sharp LPF to satisfy the transmit PSD mask.
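The need for a sharp LPF follows from the DAC output spectrum repeating at multiples of the DAC rate. The arithmetic can be sketched in Python as below. This is only an illustration: the function name, the tone spacing of 528/128 = 4.125 MHz, and the choice of ±63 as the outermost occupied tone are assumptions made for the example, not figures stated in this contribution.

```python
def image_gap_mhz(dac_rate_mhz, outer_tone_index, tone_spacing_mhz=528 / 128):
    """Gap between the top of the baseband signal and the bottom of the first DAC image.

    Hypothetical helper: assumes the signal occupies tones up to
    +/- outer_tone_index, spaced tone_spacing_mhz apart.
    """
    band_edge = outer_tone_index * tone_spacing_mhz   # highest occupied frequency
    first_image_lower_edge = dac_rate_mhz - band_edge  # mirrored copy around the DAC rate
    return first_image_lower_edge - band_edge


gap_528 = image_gap_mhz(528, 63)     # DAC at the 128-point IFFT rate
gap_1056 = image_gap_mhz(1056, 63)   # DAC at twice that rate
print(gap_528, gap_1056)
```

Under these assumptions the filter has only 8.25 MHz to roll off at a 528 MHz DAC rate, against 536.25 MHz at 1056 MHz.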
- Transmitter using 128-point IFFT processor (DAC speed: 528 MHz)
  [Figure: block diagram with S/P, 128-point IFFT, P/S, D/A and LPF; input is 128-point complex data; rate labels 528 Msps and 528 Msps]
- LPF shape & frequency spectrum of OFDM signal after DAC
  [Figure: desired filter shape vs. conventional filter shape; axis f [MHz] marked at 0, 528 and 1056; labels 8.25 and 12.375]

Introduction

The TX LPF largely determines whether the MB-OFDM UWB system meets the transmit PSD mask, but designing a TX LPF that satisfies the mask is not easy.
Two methods are considered for designing a TX LPF that satisfies the transmit PSD mask:
1) Fix the DAC sampling rate at 528 MHz and design a high-order TX LPF.
2) Increase the DAC sampling rate and reduce the order of the TX LPF.
Method 2) uses a 2x oversampling rate at the DAC to design the TX LPF.
- It reduces the order of the TX LPF.
- It has a performance advantage over method 1) (presented in doc. IEEE 802.15-03/275r0).
The two methods trade off against each other in power consumption, gate size, etc.
ETRI is developing a prototype UWB system using a 256-point IFFT processor (DAC speed: 1056 MHz).

Proposed IFFT Processor Approach

To make low-pass filtering of the 528 MHz baseband signal after the DAC easier, the OFDM spectra that repeat along the frequency axis must be spaced apart. This is accomplished by padding the IFFT input with 128 zeros.
- Transmitter using 256-point IFFT processor (DAC speed: 1056 MHz)
  [Figure: block diagram with S/P, 256-point IFFT, P/S, D/A and LPF; input is 128-point complex data + 128 zeros; rate labels 528 Msps and 1056 Msps]
- LPF shape & frequency spectrum of OFDM signal after DAC
  [Figure: desired filter shape vs. conventional filter shape; axis f [MHz] marked at 0 and 1056; labels 8.25 and 12.375]

IFFT/FFT Processor Specification

The input data of the FFT processor are QPSK-modulated 128-point complex data.
The input data of the IFFT processor are 256 points, consisting of the QPSK-modulated 128-point complex data and 128 zeros.
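The effect of the 128-point zero padding can be checked numerically. The Python sketch below uses a direct inverse DFT rather than the processor architecture of this contribution, and it assumes the 128 tones arrive in natural FFT order (bins 0..63 for tones 0..+63, bins 64..127 for tones -64..-1) with the zeros inserted between the two halves. The helper names are illustrative only.

```python
import cmath
import random


def idft(X):
    """Direct inverse DFT, O(N^2); adequate for N <= 256."""
    N = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
        for n in range(N)
    ]


def zero_pad_spectrum(X128):
    """Insert 128 zeros between the positive- and negative-frequency halves.

    Assumption: X128 is in natural FFT order (bins 0..63 = tones 0..+63,
    bins 64..127 = tones -64..-1).
    """
    assert len(X128) == 128
    return X128[:64] + [0j] * 128 + X128[64:]


random.seed(0)
qpsk = [complex(random.choice((-1, 1)), random.choice((-1, 1))) for _ in range(128)]

x128 = idft(qpsk)                     # reference symbol at the 528 Msps rate
x256 = idft(zero_pad_spectrum(qpsk))  # 2x oversampled symbol at the 1056 Msps rate

# Every second oversampled sample equals the reference sample scaled by 128/256.
max_err = max(abs(2 * x256[2 * m] - x128[m]) for m in range(128))
print(len(x256), max_err < 1e-9)
```

Decimating the 256-point output by 2 recovers the 128-point symbol up to a factor of 1/2, so the padding doubles the sample rate without altering the transmitted tones.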
- Input data of original IFFT processor (128-point QPSK data)
  [Figure: frequency-domain input, 128-point complex data; input index n = 0 … 127, tone index -64 … 63]
- Input data of proposed IFFT processor (128-point QPSK data + 128-point zeros)
  [Figure: frequency-domain input, 256-point complex data; input index n = 0 … 255, built from the 128 tones (-64 … 63) and 128 zeros]

Proposed transceiver for MB-OFDM UWB PHY proposal

[Figure: transceiver block diagram. Transmit-side blocks: 256-point IFFT (input: 4 samples/clock, output: 8 samples/clock), add ZP and guard interval, clipping level, DAC (1056 MHz), TX LO. Receive-side blocks: RX LO, ADC (528 MHz), remove ZP and guard interval, 128-point FFT (input: 4 samples/clock, output: 4 samples/clock). The two sides are joined by the UWB channel; the diagram marks the frequency-domain and time-domain sections.]

Characteristics of Multipliers

The multiplier is one of the most dominant elements in FFT/IFFT implementation.
- Standard 2's complement multiplier
  • (W-bit) x (W-bit) = (2W-1)-bit
  • Many DSP applications need only W-bit products
- Fixed-width multiplier
  • Quantization to W bits by eliminating the (W-1) least significant bits
  • Can reduce area by approximately 50%, but truncation error is introduced
  • Proper error compensation bias needed
- Canonic Signed Digit (CSD) multiplier
  • Constant coefficient
  • 33% fewer nonzero digits than 2's complement numbers
- Modified Booth multiplier
  • Variable coefficient
  • The number of partial products is reduced to W/2
  • These multipliers can achieve about 40% reduction in area and power consumption

The radix-2^4 structure of FFT processor

DFT:
\[ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn} \]
with the index decomposition
\[ n = \frac{N}{2} n_1 + \frac{N}{4} n_2 + \frac{N}{8} n_3 + \frac{N}{16} n_4 + n_5, \qquad k = k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 \]
so that
\[ X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5) = \sum_{n_5=0}^{\frac{N}{16}-1} \sum_{n_4=0}^{1} \sum_{n_3=0}^{1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\frac{N}{2} n_1 + \frac{N}{4} n_2 + \frac{N}{8} n_3 + \frac{N}{16} n_4 + n_5\right) W_N^{nk} \]

[Figure: Radix-2 structure. Butterfly (1) to Butterfly (5) with feedback delays of 16, 8, 4, 2 and 1, and twiddle-factor (TWF) multipliers between the stages.]
[Figure: Radix-2^4 structure. Butterfly (1) to Butterfly (5) with feedback delays of 16, 8, 4, 2 and 1; the multipliers are W32 and W16 (CSD multipliers), two -j multiplications and one TWF multiplier (Modified Booth multiplier).]

The structure of 256-point IFFT processor

[Figure: Input Data, S/P, eight parallel 32-point radix-2^4 FFT pipelines (P1 to P8), Bit Reverse, P/S, Output Data. The S/P output streams are labelled (124)…(8)(4)(0), (125)…(9)(5)(1), (126)…(10)(6)(2) and (127)…(11)(7)(3).]
- 32-point radix-2^4 FFT structure
- 8-level parallelism
- DIF (Decimation In Frequency), SDF (Single Delay Feedback)
- Fixed CSD & Modified Booth multipliers used

The structure of 256-point IFFT processor

[Figure: detailed pipeline diagram. Input streams …x(8)x(4)x(0), …x(9)x(5)x(1), …x(10)x(6)x(2) and …x(11)x(7)x(3) feed pipelines built from Butterfly (1) to Butterfly (5) with 16-, 8-, 4-, 2- and 1-sample delays, W32 and W16 multipliers, -j multipliers and TWF ROMs. Additional Butterfly (6) and Butterfly (7) units, intermediate streams v(·) and v`(·), and a Bit Reverse Unit are shown. Output streams include …X(48)X(16)X(32)X(0), …X(176)X(144)X(160)X(128), …X(112)X(80)X(96)X(64), …X(240)X(208)X(224)X(192), …X(49)X(17)X(33)X(1), …X(177)X(145)X(161)X(129), …X(113)X(81)X(97)X(65) and …X(241)X(209)X(225)X(193).]
- Butterfly units: 48
- -j multipliers: 22
- CSD multipliers: 16
- Modified Booth multipliers: 8

The structure of 256-point IFFT processor

[Figure: the same pipeline diagram redrawn with Butterfly (Type 1) and Butterfly (Type 2) units, which absorb the -j multiplications. The legend marks W32 and W16 as CSD multipliers and the TWF ROM multiplier as a Modified Booth multiplier. Input, intermediate and output stream labels are as in the previous figure.]

The structure of 128-point FFT processor

[Figure: Input Data, S/P, four parallel 32-point radix-2^4 FFT pipelines (P1 to P4), Bit Reverse, P/S, Output Data. The S/P output streams are labelled (124)…(8)(4)(0), (125)…(9)(5)(1), (126)…(10)(6)(2) and (127)…(11)(7)(3).]
- 32-point radix-2^4 FFT structure
- 4-level parallelism
- DIF (Decimation In Frequency), SDF (Single Delay Feedback)
- Fixed CSD & Modified Booth multipliers used

The structure of 128-point FFT processor

[Figure: detailed pipeline diagram. Input streams …x(8)x(4)x(0), …x(9)x(5)x(1), …x(10)x(6)x(2) and …x(11)x(7)x(3) feed four pipelines of Butterfly (Type 1) and Butterfly (Type 2) units with 16-, 8-, 4-, 2- and 1-sample delays, W16 multipliers (CSD) and TWF ROM multipliers (Modified Booth), intermediate streams v(·) and a Bit Reverse Unit. Output streams: …X(8)X(16)X(0), …X(72)X(80)X(64), …X(40)X(48)X(32) and …X(104)X(112)X(96).]
- Butterfly units: 24
- -j multipliers: 11
- CSD multipliers: 8
- Modified Booth multipliers: 4

Simulation result of 256-point IFFT processor

[Figure: simulation flow. 128-point QPSK modulated data + 128-point '0', 256-point IFFT (radix-2^4 DIF SDF, fixed point), decimation (2), MATLAB function FFT(128), SQNR calculation and constellation plots.]

Input bit resolution   Output bit resolution   Multiplier coefficient bits   SQNR
3                      20                      10                            52 dB
3                      11                      8                             30 dB

Simulation result of 128-point FFT processor

[Figure: simulation flow. 128-point QPSK modulated data + 128-point '0', MATLAB function IFFT(256), decimation (2), 128-point FFT (radix-2^4 DIF SDF, fixed point), SQNR calculation and constellation plots.]

Input bit resolution   Output bit resolution   Multiplier coefficient bits   SQNR
10                     20                      10                            52 dB
10                     12                      8                             30 dB

Summary of simulations

Processor   Points   Parallel level   SQNR (dB)   Gate count
IFFT        256      8                52          about 100k
IFFT        256      8                30          about 80k
FFT         128      4                52          about 50k
FFT         128      4                30          about 40k

Conclusion

- 256-point IFFT processing for easy low-pass filtering
- Parallel structure for high-speed signal processing
- IFFT/FFT processor
  • 32-point radix-2^4 DIF SDF structure
  • Small area, low power, high-speed operation
  • Canonic Signed Digit multiplier: constant coefficients
  • Modified Booth multiplier: variable coefficients

                                        IFFT processor   FFT processor
Point                                   256-point        128-point
Parallelism                             8                4
Number of input data (samples/clock)    4                4
Throughput (samples/clock)              8                4
Latency (except S/P, reverse unit)      32               32
Number of gates (30 dB SQNR)            About 80K        About 40K
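The constant-coefficient multipliers rely on canonic signed digit (CSD) recoding, which rewrites a coefficient with digits in {-1, 0, +1} so that no two adjacent digits are nonzero. A minimal Python sketch of that recoding follows. It is an illustration, not the authors' hardware: the function name is hypothetical, and quantizing cos(π/8), the real part of a W16 twiddle factor, to 9 fractional bits is an assumption chosen to resemble a 10-bit coefficient.

```python
import math


def to_csd(value):
    """Return the CSD digits of an integer, least significant first, each in {-1, 0, 1}."""
    digits = []
    while value != 0:
        if value % 2:
            # +1 when value = 1 (mod 4), -1 when value = 3 (mod 4);
            # either choice leaves a multiple of 4, so the next digit is 0.
            digit = 2 - (value % 4)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value //= 2
    return digits


FRACTION_BITS = 9  # assumption: 10-bit signed coefficient with 9 fractional bits
coeff = round(math.cos(math.pi / 8) * (1 << FRACTION_BITS))

csd = to_csd(coeff)
binary_nonzero = bin(coeff).count("1")
csd_nonzero = sum(1 for d in csd if d != 0)
print(coeff, csd, binary_nonzero, csd_nonzero)
```

For this coefficient (473) the plain binary form has 6 nonzero bits and the CSD form has 4, so a shift-and-add multiplier needs 4 terms instead of 6.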