VLSI Signal Processing Laboratory

A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology
- Alice Wang & Anantha Chandrakasan -
Seok-jae Lee
VLSI Signal Processing Lab., Korea University

Why FFT processor?
• FFT processors are used in wireless sensor networks. The FFT is used in target tracking, localization, and radar, where it analyzes phase differences from multiple sensors. An FFT processor for these applications requires a low-power design; chip speed is not critical.
• The FFT processor consists of multipliers, control logic, and SRAM.
• Low-power design methods such as variable bit precision and variable FFT length achieve further power savings.
• In particular, the multipliers, control logic, and SRAM are implemented with ‘SUBTHRESHOLD’ circuits, which dissipate extremely little energy.

Radix-2 Butterfly FFT architecture
Subthreshold circuits are used!!!

8-b and 16-b Scalable Baugh-Wooley Multiplier
To minimize switching in the LSB adders, the LSB inputs are gated. In 8-b precision mode, only the MSB parts of the two inputs are processed.

Minimum Energy Point Analysis (1)
As the supply voltage is lowered from a large value, the switching (dynamic) energy and the overall energy decrease. (VDD > VTH)

Minimum Energy Point Analysis (2)
Computation delay!!!
In the subthreshold region, the propagation delay increases exponentially, which increases the leakage energy. (VDD < VTH)

Minimum Energy Point Analysis (3)
Minimum energy point = Optimal operating point
(VDD, VTH) = (380mV, 480mV)
• Case 1: Processing speed is not important. The optimal operating point is the minimum energy point, and the circuit operates at the corresponding frequency.

Minimum Energy Point Analysis (4)
Optimal operating point contour
• Case 2: Processing speed is critical.
The required frequency constrains the VDD and VTH that achieve the maximum power saving. The optimum lies where a performance contour is tangent to an energy contour.

Minimum Energy Point for fixed VTH
• VTH is fixed at 450mV for the FFT processor implementation. The VDD that minimizes energy consumption is 400mV.
• The low-power FFT processor operates in the SUBTHRESHOLD region!!!

Subthreshold Inverter
• Case 1: Input is logical ‘0’. In the subthreshold region the leakage current (IOFF) is significant, so there is a minimum PMOS width, WP(min), needed to pull the output node up. Worst case: Fast NMOS & Slow PMOS (FS)
• Case 2: Input is logical ‘1’. The minimum-sized NMOS pulls the output node down to ‘0’, but a large PMOS produces a large leakage current (IOFF) compared to the drive current (ION) of the NMOS. So there is a maximum PMOS width, WP(max), beyond which the output node cannot be pulled down. Worst case: Slow NMOS & Fast PMOS (SF)

Operating Point for a Subthreshold Inverter
VDD = 195mV, WP = 5.4um (0.18um technology)

Subthreshold Standard Cell – XOR Case (1)
Conventional XOR gate scheme in the subthreshold region
In the A=1, B=0 case, the leakage current is large and ION/IOFF is small, so the output node cannot be fully pulled up.

Subthreshold Standard Cell – XOR Case (2)
A transmission-gate XOR in the subthreshold region
The devices are balanced: because two devices pull the output node high and two devices pull it low, ION/IOFF is not degraded!!!

Subthreshold Memory Design
• The FFT processor contains eight 128W X 16b RAM blocks and four 256W X 16b blocks.
=> The functionality of the conventional 6T SRAM in subthreshold is analyzed.
- Bitline capacitance, bitline leakage, speed, PVT variation, etc.
=> A hierarchical read-bitline is used in the data memory design and achieves an acceptable ION/IOFF in subthreshold.
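
The minimum energy point analysis for a fixed VTH, described earlier in this deck, can be illustrated with a rough sketch. It assumes an ideal subthreshold current model in which ION/IOFF = exp(VDD / (n·kT/q)) and the delay scales as VDD / ION, so leakage energy per operation grows as VDD drops. The constants `N_VT` and `LEAK_WEIGHT` are hypothetical values picked only so the curve has a visible minimum; they are not fitted to the processor in these slides.

```python
import math

N_VT = 0.039        # assumed n * kT/q in volts (n = 1.5, kT/q = 26 mV)
LEAK_WEIGHT = 2300  # hypothetical weight of leaking work relative to switching work

def relative_energy(vdd):
    """Energy per operation, normalized to the switched capacitance."""
    dynamic = vdd ** 2
    # Leakage energy = IOFF * VDD * delay. With delay ~ VDD / ION and
    # ION / IOFF = exp(VDD / N_VT), it reduces to the expression below.
    leakage = LEAK_WEIGHT * vdd ** 2 * math.exp(-vdd / N_VT)
    return dynamic + leakage

def minimum_energy_vdd_mv(vdd_mv_values):
    """Return the supply voltage (in mV) with the lowest modeled energy."""
    return min(vdd_mv_values, key=lambda mv: relative_energy(mv / 1000.0))

sweep_mv = range(150, 901, 10)  # 150 mV to 900 mV in 10 mV steps
best_mv = minimum_energy_vdd_mv(sweep_mv)
print(f"modeled minimum energy point: VDD = {best_mv} mV")
```

Above the minimum the dynamic term dominates, and below it the exponentially growing delay lets leakage dominate, which is the behavior described on the Minimum Energy Point Analysis slides.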

Subthreshold Write Access (1)
• NPD has to be large enough to… the voltage at LO does not rise above ΔVLO due to leakage of PPU and BL.
• Worst case: Slow NMOS and Fast PMOS (SF)

Subthreshold Write Access (2)
• Write ‘Low’ case: determines the NPS size needed to pull HI down to ΔVLO. Worst case: SF
• Write ‘High’ case: determines the maximum NPD and NPS sizes. The leakage currents of NPD and NPS form a voltage divider against the drive current of PPU, which is used to pull LO up to ΔVHI.

Sizing analysis on NPD
As VDD decreases, the cell size increases dramatically!!!
This is the optimal point, but this value cannot satisfy both the READ and WRITE conditions!!!

A Latch Based Write Scheme and its analysis
• C2MOS tristate inverters are a more robust design for subthreshold operation.
• The tristate latch memory cells are functional down to 215mV.

Subthreshold Read Access (1)
The conventional 128W single-ended scheme case
• During the precharge phase, Wpre is on and the read bitline (RBL) is charged to VDD.
• However, since the charge stored on the bitline leaks away through all of the pull-down devices, Wpre is sized to offset the maximum leakage current through the pull-down devices.

Subthreshold Read Access (2)
• In the worst case, M0 = 0 and M1~M127 = 1, the bitline leakage is maximized.
• In this case, when RBL should evaluate to ‘0’, ION << IOFF, so RBL fails to evaluate to ‘0’.

Subthreshold Read Access (3)
The tristate-based scheme case
• In the worst case, M0 = 0 and M1~M127 = 1, the tristate-based read access also suffers from bitline leakage effects.
• When RBL should evaluate to ‘0’, ION << IOFF, so RBL fails to evaluate to ‘0’.
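
The worst-case read failure above can be sketched numerically. The snippet assumes an ideal subthreshold device with ION/IOFF = exp(VDD / (n·kT/q)), identical cells, and no process variation; `N_VT`, `on_off_ratio`, and `read_margin` are hypothetical names and values introduced for illustration, not taken from the slides.

```python
import math

N_VT = 0.039  # assumed n * kT/q in volts (n = 1.5, kT/q = 26 mV)

def on_off_ratio(vdd):
    """ION / IOFF of one ideal subthreshold device at supply vdd (volts)."""
    return math.exp(vdd / N_VT)

def read_margin(vdd, cells_on_bitline):
    """Drive current of the one accessed cell divided by the total leakage
    of the other cells sharing the bitline (worst-case data pattern)."""
    leaking_cells = cells_on_bitline - 1
    return on_off_ratio(vdd) / leaking_cells

for vdd in (0.18, 0.25, 0.35):
    margin = read_margin(vdd, 128)
    status = "evaluates" if margin > 1 else "fails"
    print(f"VDD = {vdd:.2f} V, 128 cells: margin = {margin:.2f} ({status})")
```

A margin below 1 means the combined leakage of the 127 unaccessed cells exceeds the drive current of the accessed cell. Under these assumptions that happens at the lowest supply voltages; real devices with variation would be worse than this ideal estimate.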

Subthreshold Read Access (4)
Proposed hierarchical-read-bitline scheme case
The proposed SRAM scheme has some area and timing overhead but achieves extremely low energy dissipation.
Latency!!!
Need a decoder!!!
MUX with balanced circuit

Results – Energy Dissipation as a function of VDD
• The optimal operating point for minimal energy dissipation is at VDD = 350mV.
• In simulation, the optimal point is at VDD = 400mV.

Results – Energy of 8-b and 16-b Processing

Summary
Specification        Value
Technology           0.18um CMOS with six metal layers
Area                 2.6 X 2.1 mm²
FFT length           128, 256, 512, 1024
Bit precision        8-bit and 16-bit
Voltage supply       180~900mV
Clock frequency      164Hz ~ 6MHz
Power consumption    90nW (VDD = 180mV)
                     600nW (VDD = 350mV, frequency = 10kHz)
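
As a quick check on the summary figures, the energy per clock cycle follows from dividing power by clock frequency. The sketch below uses only the 350mV entry, where the table gives both values; the function name `energy_per_cycle` is a hypothetical helper.

```python
def energy_per_cycle(power_w, clock_hz):
    """Average energy dissipated in one clock cycle, in joules."""
    return power_w / clock_hz

# Summary table entry: 600 nW at VDD = 350 mV with a 10 kHz clock.
energy_j = energy_per_cycle(600e-9, 10e3)
print(f"{energy_j * 1e12:.1f} pJ per clock cycle")  # prints "60.0 pJ per clock cycle"
```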