VLSI Signal Processing Laboratory
A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology
-Alice Wang & Anantha Chandrakasan-
Seok-jae, Lee
VLSI Signal Processing Lab.
Korea University
Why FFT processor?
• FFT processors are used in wireless sensor networks.
- The FFT is used in target tracking, localization, and radar to analyze phase differences from multiple sensors.
- The FFT processor requires a low-power design; chip speed is not critical.
• The FFT processor consists of multipliers, control logic, and SRAM.
• Low-power design techniques such as variable bit precision and variable FFT length yield further power savings.
• In particular, the multipliers, control logic, and SRAM are implemented with subthreshold circuits, which dissipate extremely low energy.
Radix-2 Butterfly FFT architecture
Subthreshold circuits are used!!!
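
A minimal Python sketch of the radix-2 butterfly that this kind of architecture repeats. It assumes a decimation-in-time formulation (the slide does not say which variant the chip uses), and floating-point complex numbers stand in for the chip's 8-b/16-b fixed-point datapath. The names `butterfly` and `fft_radix2` are illustrative only.

```python
import cmath


def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """One radix-2 butterfly: one complex multiply, one add, one subtract."""
    t = w * b
    return a + t, a - t


def fft_radix2(x: list[complex]) -> list[complex]:
    """Recursive decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out
```

Each butterfly costs one complex multiplication, which is why the multiplier dominates the datapath and is the natural target for the precision scaling described on the next slide.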
8-b and 16-b Scalable Baugh-Wooley Multiplier
• To minimize switching in the LSB adders, the LSB inputs are gated.
• With 8-b precision, only the MSB parts of the two inputs are processed.
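
The Baugh-Wooley scheme can be sketched in Python as follows. This is a behavioral model of the standard algorithm (signed partial products rewritten so that every term is added), not the chip's gate-level design. The helper `multiply_8b_mode`, which keeps only the upper byte of each 16-b operand, is a hypothetical reading of "MSB parts of two inputs are processed".

```python
def baugh_wooley_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit two's-complement integers (n >= 2) with Baugh-Wooley partial products."""
    mask = (1 << n) - 1
    abit = [((a & mask) >> i) & 1 for i in range(n)]
    bbit = [((b & mask) >> i) & 1 for i in range(n)]
    acc = 0
    # Plain AND partial products of the magnitude bits.
    for i in range(n - 1):
        for j in range(n - 1):
            acc += (abit[i] & bbit[j]) << (i + j)
    # Product of the two sign bits.
    acc += (abit[n - 1] & bbit[n - 1]) << (2 * n - 2)
    # Negatively weighted terms are replaced by their complements.
    for k in range(n - 1):
        acc += (1 - (abit[n - 1] & bbit[k])) << (k + n - 1)
        acc += (1 - (abit[k] & bbit[n - 1])) << (k + n - 1)
    # Correction constants that make the complemented terms exact modulo 2^(2n).
    acc += (1 << n) + (1 << (2 * n - 1))
    acc &= (1 << (2 * n)) - 1
    # Reinterpret the 2n-bit result as a signed value.
    if acc >> (2 * n - 1):
        acc -= 1 << (2 * n)
    return acc


def multiply_8b_mode(a16: int, b16: int) -> int:
    """Hypothetical 8-b mode: drop the low byte of each 16-b operand, multiply the upper bytes."""
    return baugh_wooley_multiply(a16 >> 8, b16 >> 8, 8)
```

In hardware, the benefit of the 8-b mode comes from the gated LSB inputs no longer toggling; the model above only reproduces the arithmetic result.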
Minimum Energy Point Analysis(1)
- As the supply voltage is lowered from a large value, the switching (dynamic) energy and the overall energy both decrease (VDD > VTH).
Minimum Energy Point Analysis(2)
- In the subthreshold region (VDD < VTH), the propagation delay increases exponentially as VDD drops, so each computation takes longer and the leakage energy increases.
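
The trade-off between falling switching energy and rising leakage energy can be illustrated with a small sweep. The sketch below assumes an idealized model: switching energy proportional to VDD^2, and leakage energy equal to off-current times VDD times a delay that scales as VDD / ION, with ION exponential in VDD. The slope factor, the leakage weight `LEAK_RATIO`, and the sweep range are assumed values chosen for illustration, not figures from the slides.

```python
import math

N_SLOPE = 1.5        # subthreshold slope factor (assumed)
V_THERMAL = 0.026    # thermal voltage kT/q near room temperature, in volts
LEAK_RATIO = 2000.0  # weight of leakage relative to switching (assumed, dimensionless)


def energy_per_op(vdd: float) -> float:
    """Normalized energy per operation: switching term plus leakage term."""
    dynamic = vdd ** 2
    # Leakage energy = IOFF * VDD * delay, with delay ~ VDD / ION and
    # ION / IOFF = exp(VDD / (n * vT)) in the idealized subthreshold model.
    leakage = LEAK_RATIO * vdd ** 2 * math.exp(-vdd / (N_SLOPE * V_THERMAL))
    return dynamic + leakage


def minimum_energy_point(v_lo: float = 0.15, v_hi: float = 0.90, step: float = 0.001) -> float:
    """Sweep VDD over [v_lo, v_hi] and return the value with the lowest energy."""
    steps = int(round((v_hi - v_lo) / step))
    sweep = [v_lo + i * step for i in range(steps + 1)]
    return min(sweep, key=energy_per_op)


if __name__ == "__main__":
    vmin = minimum_energy_point()
    print(f"minimum-energy VDD in this toy model: {vmin:.3f} V")
```

With these assumed parameters the sweep finds an interior minimum: energy rises toward the low end of the range because of leakage and toward the high end because of switching. The exponential current model is only meaningful below VTH, where the leakage term matters.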
Minimum Energy Point Analysis(3)
Minimum energy point = optimal operating point: (VDD, VTH) = (380 mV, 480 mV)
• Case 1: Processing speed is not important.
- The optimal operating point is the minimum energy point.
- The circuit runs at the frequency that corresponds to that point.
Minimum Energy Point Analysis(4)
Optimal operating point contour
• Case 2: Processing speed is critical.
- The required frequency constrains VDD and VTH, and the maximum power saving is sought within that constraint.
- The optimal operating point is where the performance contour for the required frequency is tangent to an energy contour.
Minimum Energy Point for fixed VTH
• VTH is fixed at 450 mV for the FFT processor implementation.
- With VTH fixed, VDD = 400 mV minimizes energy consumption.
• The low-power FFT processor therefore operates in the subthreshold region.
Subthreshold Inverter
• Case 1: Input is logical '0' (the PMOS conducts ION, the NMOS leaks IOFF).
- In the subthreshold region the leakage current is significant, so a minimum PMOS width WP(min) exists below which the output node cannot be pulled up.
- Worst case: fast NMOS and slow PMOS (FS).
• Case 2: Input is logical '1' (the NMOS conducts ION, the PMOS leaks IOFF).
- A minimum-sized NMOS pulls the output node down to '0', but a large PMOS leaks a current that is large compared with the drive current of the NMOS, so a maximum PMOS width WP(max) exists above which the output node cannot be pulled down.
- Worst case: slow NMOS and fast PMOS (SF).
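
Both sizing limits come from the small on/off current ratio in subthreshold. As a rough illustration, the sketch below uses the ideal subthreshold current model I = I0 * exp((VGS - VTH) / (n * vT)), which gives ION/IOFF = exp(VDD / (n * vT)) for matched devices. The slope factor and the `vth_skew` parameter (a simple stand-in for FS/SF process corners) are assumptions, and drain-induced effects are ignored.

```python
import math

N_SLOPE = 1.5      # subthreshold slope factor (assumed)
V_THERMAL = 0.026  # thermal voltage kT/q near room temperature, in volts


def on_off_ratio(vdd: float, vth_skew: float = 0.0) -> float:
    """ION/IOFF for matched devices in the ideal subthreshold model.

    vth_skew raises the driving device's VTH and lowers the leaking
    device's VTH by the same amount, mimicking a skewed process corner.
    """
    return math.exp((vdd - 2.0 * vth_skew) / (N_SLOPE * V_THERMAL))


if __name__ == "__main__":
    for vdd in (0.18, 0.195, 0.30, 0.40):
        typical = on_off_ratio(vdd)
        skewed = on_off_ratio(vdd, vth_skew=0.03)
        print(f"VDD = {vdd:.3f} V: typical {typical:.3g}, skewed corner {skewed:.3g}")
```

The ratio shrinks exponentially as VDD falls, so the window between WP(min) and WP(max) narrows at low supply voltages.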
Operating Point for a Subthreshold Inverter
VDD = 195mV, WP = 5.4um (0.18um technology)
Subthreshold Standard Cell – XOR Case (1)
Conventional XOR gate in the subthreshold region
• When A = 1 and B = 0, the leakage current is large and ION/IOFF is small, so the output node cannot be fully pulled up.
Subthreshold Standard Cell – XOR Case (2)
Transmission-gate XOR in the subthreshold region
• The devices are balanced: two devices pull the output node high and two devices pull it low, so ION/IOFF is not degraded.
Subthreshold Memory Design
• The FFT processor contains eight 128 W x 16 b RAM blocks and four 256 W x 16 b RAM blocks.
=> The functionality of a conventional 6T SRAM is analyzed in subthreshold.
- Bitline capacitance, bitline leakage, speed, PVT variation, etc.
=> A hierarchical read bitline is used in the data memory and achieves an acceptable ION/IOFF in subthreshold.
Subthreshold Write Access (1)
• NPD has to be large enough that the voltage at LO does not rise above ΔVLO due to leakage from PPU and BL.
• Worst case: slow NMOS and fast PMOS (SF).
Subthreshold Write Access (2)
• Write 'Low' case:
=> Sets the size of NPS needed to pull HI down to ΔVLO. Worst case: SF.
• Write 'High' case:
=> Sets the maximum sizes of NPD and NPS. Their leakage currents form a voltage divider against the drive current of PPU, which must still pull LO up to ΔVHI.
Sizing analysis on NPD
• As VDD decreases, the cell size increases dramatically.
• The optimal point (marked in the figure) cannot satisfy both the READ and WRITE conditions.
A Latch-Based Write Scheme and Its Analysis
• A latch built from C2MOS tristate inverters is a more robust design for subthreshold operation.
• The tristate latch memory cell is functional down to 215 mV.
Subthreshold Read Access (1)
The conventional 128 W single-ended scheme
• During the precharge phase, Wpre is on and the read bitline (RBL) is charged to VDD.
• However, the charge stored on the bitline leaks away through all of the pull-down devices, so Wpre is sized to offset the maximum leakage current through them.
Subthreshold Read Access (2)
• In the worst case, M0 = 0 and M1 to M127 = 1, and the bitline leakage is maximized.
• In this case, when RBL should evaluate to '0', ION << IOFF and RBL fails to evaluate to '0'.
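
The worst case above pits the read current of one accessed cell against the summed leakage of every other cell on the bitline. A rough Python sketch of that comparison follows, again assuming the ideal subthreshold model with matched devices (ION/IOFF = exp(VDD / (n * vT)) per device) and ignoring process variation, which makes the real situation worse. The function names and the 16-cell segment length are hypothetical.

```python
import math

N_SLOPE = 1.5      # subthreshold slope factor (assumed)
V_THERMAL = 0.026  # thermal voltage kT/q near room temperature, in volts


def read_current_ratio(vdd: float, cells: int) -> float:
    """ION of the accessed cell divided by the summed IOFF of the other cells."""
    return math.exp(vdd / (N_SLOPE * V_THERMAL)) / (cells - 1)


def min_vdd_for_ratio(cells: int, ratio: float = 1.0) -> float:
    """Supply voltage at which the read current reaches `ratio` times the total leakage."""
    return N_SLOPE * V_THERMAL * math.log(ratio * (cells - 1))


if __name__ == "__main__":
    for cells in (128, 16):  # 16 is a hypothetical shorter bitline segment
        print(f"{cells:3d} cells: ratio at 180 mV = {read_current_ratio(0.18, cells):.2f}, "
              f"break-even VDD = {min_vdd_for_ratio(cells) * 1000:.0f} mV")
```

In this idealized model, shortening the bitline is the direct way to recover read margin at a given supply voltage, since the total leakage scales with the number of cells attached.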
Subthreshold Read Access (3)
The tristate-based scheme
• In the worst case, M0 = 0 and M1 to M127 = 1, and the tristate-based read access also suffers from bitline leakage.
• When RBL should evaluate to '0', ION << IOFF and RBL fails to evaluate to '0'.
Subthreshold Read Access (4)
Proposed hierarchical read-bitline scheme
• The proposed SRAM scheme has some area and timing overhead (added latency, and a decoder is needed) but achieves extremely low energy dissipation.
• The MUX uses a balanced circuit.
Results – Energy Dissipation as a function of VDD
• The optimal operating point for minimum energy dissipation is at VDD = 350 mV.
• The simulation result gave VDD = 400 mV.
Results – Energy of 8-b and 16-b Processing
Summary
Specification       Value
Technology          0.18 um CMOS with six metal layers
Area                2.6 x 2.1 mm^2
FFT length          128, 256, 512, 1024
Bit precision       8 b and 16 b
Voltage supply      180 to 900 mV
Clock frequency     164 Hz to 6 MHz
Power consumption   90 nW (VDD = 180 mV); 600 nW (VDD = 350 mV, frequency = 10 kHz)