Presentation 1 - Sandip - Atomic Scale Design Network (ASDN)

FEMTO-JOULE SWITCHING
Review of Low Energy
Approaches for the Nano Era
Jabulani Nyathi
Washington State University
Valeriu Beiu
Washington State University
Snorre Aunet
University of Oslo, Norway
With credits to
Joel Birnbaum (HP), Hugo De Man (IMEC/KUL), Kaushik Roy (Purdue), Mark
Lundstrom (Purdue), Vojin G. Oklobdzija (UCDavis), Takayasu Sakurai (University of
Tokyo), Tadahiro Kuroda (Keio University), Anantha Chandrakasan (MIT), Richard
Brown (Univ. of Utah), and ITRS Roadmap
Motivation
2
Where are we going?
Toward pervasive information systems
[Figure: penetration versus year, 1960 to 2000]
Figure labels: Mainframes; Minis; Micros; Open systems of clients and servers; Information appliances; Information utility
Figure labels: Batch computing and timesharing; Distributed computing; Networked personal computing; Cooperative computing
Utility: The ability, capacity or power to satisfy the needs or gratify the desires of the majority or of the human race as a whole (Oxford English Dictionary)
Appliance: a thing applied as a means to an end (Oxford English Dictionary)
3
How to get there? The very big
picture
[Figure: hardware and software design levels versus year, 1960 to 2010]
Figure labels: Services; Network; embedded C; RF; asp; opamp; System on; Filters; AD/DA; dspP; IC; ASIP; memory; IP; µP; µC; gate; RT-ops; FSM; ASIC; FPGA; Silicon Board; VHDL; OO; cC++; Software; Hardware; Design Software
4
How to get there? The very small
picture
[Figure: 10 nm scale MOSFETs, 1990 to 2016]
Figure labels: 1000 mA; ID(on); 10 mA; ID(off); 0.00001 mA; 10X increase per technology node; 1.2 nm
5
As the electrons vanish
Scaling of electronic devices
Vanishing electrons
[Figure: number of chip components (transistors per chip, 4M to 16G) and electrons per device versus year (1985 to 2020) and feature size (microns); curves labeled Historical Trend, SIA Roadmap, and CMOS; region labeled Classical Age]
Power cost of information transfer?
Information is a
physical entity
– Rolf Landauer, IBM
Therefore,
computation is a
physical process
$P = n \, k_B T \, \frac{d}{c} \, f^2$
P = power
k_B = Boltzmann constant
T = temperature
d = transmission distance
c = speed of light
f = operating frequency
n = number of parallel operations
6
Power
7
The trend: power, VDD, and
current
[Figure: power per chip [W], VDD current [A], and voltage [V] versus year, 1998 to 2014]
8
How should we deal with power and
speed?
Device level
Devices must have low threshold voltages and reduced parasitic capacitances or, better yet, be replaced by new devices.
Examples include fully and partially depleted silicon-on-insulator CMOS.
Novel nano devices (e.g., single electron transistors, molecular devices, spin transistors, etc.)
Gate level
Logic design styles that include
Standard CMOS
Domino logic
Differential logic families
Pseudo nMOS and many more
Threshold logic
Circuit level
Clock gating, current sensing, etc
Module level
Will inherit the gains achieved at device, circuit and gate levels and
manage these by employing innovative architectures (e.g., reduce
switching activity).
Chip level
9
Sources of power dissipation
Power has been a secondary design issue to
speed
Device miniaturization and voltage scaling
have led to:
Fast switching speeds,
High density designs,
High leakage currents and
ultimately increased power dissipation.
In deep sub-micron (i.e., nano) technologies, the conflicting
issues of high speed and low power are
becoming even more prominent.
10
Past techniques for power
reduction
Voltage/frequency scaling
Limited by technology.
Not possible below a certain feature-size.
Architectural adaptation
Shut off portions of core when not needed
Dynamic speculation control
Reconfigurable caches
Limitations:
Very few choices to make
Only dynamic power being saved
Has associated overhead
11
TransMeta Example
12
Expression for average power
Sufficient details of the currents drawn must
be studied to allow for a detailed power
analysis.
The average total power in digital CMOS
circuits can be described by:
$P_{total} = P_{dynamic} + P_{short\_circuit} + P_{static}$
The dynamic power component, and methods
to manage it, have seen a fair share of
analysis.
13
Power component expressions
Each component of the average power can be
analyzed further as follows:
$P_{dynamic} = \alpha \cdot V_{DD} \cdot V_{swing} \cdot C_L \cdot f_{CLK}$
where $V_{DD}$ is the supply voltage, $V_{swing}$ the
output/internal node voltage swing, $C_L$ the load
capacitance, $f_{CLK}$ the switching rate of the output,
and $\alpha$ the activity factor.
$P_{short\_circuit} = \alpha \cdot I_{sc\_ave} \cdot V_{swing}$
where $I_{sc\_ave}$ is the average short circuit current over a
period.
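As a rough illustration, the two expressions on this slide can be evaluated directly. The sketch below is only that: the function names and every numeric value (activity factor, capacitance, clock rate, short circuit current) are hypothetical and chosen for easy arithmetic, not taken from the presentation.

```python
def dynamic_power(alpha, v_dd, v_swing, c_load, f_clk):
    """P_dynamic = alpha * V_DD * V_swing * C_L * f_CLK, in watts."""
    return alpha * v_dd * v_swing * c_load * f_clk


def short_circuit_power(alpha, i_sc_ave, v_swing):
    """P_short_circuit = alpha * I_sc_ave * V_swing, in watts (as written on the slide)."""
    return alpha * i_sc_ave * v_swing


# Hypothetical operating point: 10 fF load, 1 GHz clock, full-rail swing at 1 V.
p_dyn = dynamic_power(alpha=0.1, v_dd=1.0, v_swing=1.0, c_load=10e-15, f_clk=1e9)
p_sc = short_circuit_power(alpha=0.1, i_sc_ave=1e-6, v_swing=1.0)

print(f"P_dynamic       = {p_dyn:.3e} W")
print(f"P_short_circuit = {p_sc:.3e} W")
```

Because the supply voltage enters the dynamic term twice when the swing is rail-to-rail, halving both V_DD and V_swing in this sketch cuts P_dynamic by a factor of four.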
14
The static power … becomes
important!
The third component of the average power
equation is:
$P_{static} = P_{sub\_leakage} + P_{DC}$
where $P_{sub\_leakage}$ is due to sub-threshold leakage and
$P_{DC}$ is due to DC current.
For nano-electronics it is expected that the
static component of power will be comparable
to the dynamic power dissipation.
Standby power ($P_{sub\_leakage}$), a component of
static power, will be the culprit due to scaling.
15
Example: Reducing dynamic
power
$P_{dynamic} = \alpha \, C_L \, V_{DD} \, V_{swing} \, f_{CLK}$
Reducing the active load:
•Minimize the circuits
•Use more efficient design
•Charge recycling
•More efficient layout
Technology scaling:
•The biggest win
•Thresholds should scale
•Leakage starts to bite
•Dynamic voltage scaling
Reduce switching activity:
•Conditional clock
•Conditional precharge
•Switching off inactive blocks
•Conditional execution
Run it slower:
•Use parallelism
•Fewer pipeline stages
•Use double-edge flip-flops
16
Is there an optimal design point?
17
Power dissipation and circuit
delay
Power:
$P = p_t \cdot f_{CLK} \cdot C_L \cdot V_{DD}^2 + I_0 \cdot 10^{-V_{th}/S} \cdot V_{DD}$
Delay:
$\text{Delay} = \frac{k \cdot Q}{I} = \frac{k \cdot C_L \cdot V_{DD}}{(V_{DD} - V_{th})^{\alpha}} \quad (\alpha = 1.3)$
[Figure: power (W) and delay (s) plotted against VDD and Vth, with points A and B marked]
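The trade-off behind this slide can be explored numerically. The sketch below assumes the slide's expressions read as a switching term plus a sub-threshold leakage term for power, and an alpha-power-law expression for delay. All default parameter values (p_t, f_CLK, C_L, I_0, S, k) are hypothetical placeholders, not figures from the presentation.

```python
def power(v_dd, v_th, p_t=0.1, f_clk=100e6, c_load=1e-12, i_0=1e-6, s=0.1):
    """P = p_t*f_CLK*C_L*V_DD^2 + I_0*10^(-V_th/S)*V_DD, in watts.

    s is the sub-threshold swing in volts per decade (assumed 100 mV/decade).
    """
    switching = p_t * f_clk * c_load * v_dd ** 2
    leakage = i_0 * 10 ** (-v_th / s) * v_dd
    return switching + leakage


def delay(v_dd, v_th, k=1.0, c_load=1e-12, alpha=1.3):
    """Delay = k*C_L*V_DD / (V_DD - V_th)^alpha; valid only above threshold."""
    if v_dd <= v_th:
        raise ValueError("this delay model requires V_DD > V_th")
    return k * c_load * v_dd / (v_dd - v_th) ** alpha


# Sweep the threshold voltage at a fixed 1 V supply.
for v_th in (0.2, 0.3, 0.4):
    print(f"Vth={v_th:.1f} V  power={power(1.0, v_th):.4e} W  "
          f"delay={delay(1.0, v_th):.4e} (units set by k)")
```

With these placeholder values, lowering Vth shortens the delay while raising the leakage term, which is the tension the slide's surfaces illustrate.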
18
Power-delay product, energy-delay
product
Lowest voltage, highest threshold: no optimum
Power-delay product is a misleading metric, as it
favors a processor that operates at lower frequency
Energy-delay is adequate, but energy-delay² should be
used instead
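A small numeric comparison shows why the choice of metric matters. The sketch below compares one hypothetical design at two supply voltages. It assumes energy per operation scales with VDD squared and delay scales with 1/VDD; both are simplifications introduced here for illustration, and the power and delay numbers are made up.

```python
import math


def metrics(power_w, delay_s):
    """Return the power-delay, energy-delay, and energy-delay^2 products."""
    energy_j = power_w * delay_s  # PDP is the energy per operation
    return {
        "PDP": energy_j,
        "EDP": energy_j * delay_s,
        "ED2P": energy_j * delay_s ** 2,
    }


# Hypothetical design at nominal supply: 1 W, 1 ns per operation.
fast = metrics(power_w=1.0, delay_s=1e-9)

# Same design at half the supply under the stated assumptions:
# energy drops to 1/4, delay doubles, so power is 0.25 nJ / 2 ns = 0.125 W.
slow = metrics(power_w=0.125, delay_s=2e-9)

same_ed2p = math.isclose(fast["ED2P"], slow["ED2P"], rel_tol=1e-9)

for name in ("PDP", "EDP", "ED2P"):
    print(f"{name}: fast={fast[name]:.3e}  slow={slow[name]:.3e}")
print("ED2P unchanged by voltage scaling:", same_ed2p)
```

Under these assumptions PDP and EDP both reward simply running slower, while energy-delay² stays constant, so it compares designs rather than operating points.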
19
Energy-delay²
20
Lowering VDD to achieve ultralow power
VDD should be lowered to the minimum level which ensures real-time operation.
Energy consumption is proportional to the square of VDD.
[Figure: normalized power versus normalized workload (0.0 to 1.0) for variable VDD and fixed VDD]
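The gap between the fixed and variable supply curves can be approximated with a toy model. The sketch below assumes that a fixed-VDD processor runs at full speed and then idles at zero power, and that under variable VDD the clock frequency and the supply voltage both scale linearly with the workload. The linear voltage-frequency relation and the optional v_min floor are assumptions made here, not statements from the slide.

```python
def fixed_vdd_power(workload):
    """Normalized average power at fixed VDD: busy for a fraction `workload` of the time."""
    return workload


def variable_vdd_power(workload, v_min=0.0):
    """Normalized power when frequency tracks the workload and VDD tracks frequency.

    v_min is a hypothetical floor on the normalized supply voltage.
    """
    v_dd = max(workload, v_min)
    return workload * v_dd ** 2  # power ~ f * VDD^2, with f normalized to the workload


for w in (0.25, 0.5, 0.75, 1.0):
    print(f"workload={w:.2f}  fixed={fixed_vdd_power(w):.3f}  "
          f"variable={variable_vdd_power(w):.3f}")
```

In this model a half-loaded processor draws one eighth of full power with a variable supply, compared with one half at a fixed supply.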
21
Aggressively lowering VDD + Vth
If VDD and Vth are dynamically scaled, the advantage is
obvious
22
The future: sub-threshold and
body bias?
23
A fresh look at leakage currents
Some device and circuit level techniques for leakage
current reduction are:
Dynamic threshold transistors (DTMOS)
Technique permits the body voltage to be switched with the gate voltage.
High threshold voltages in standby mode result in low leakage currents.
Low threshold voltage in active mode allows for higher current drive
(high speed).
Multi-threshold CMOS (MTCMOS)
A high threshold voltage device is placed in series with low threshold
MOS devices
Devices in the critical path are assigned low threshold voltages to allow
for high gate speeds
Devices that are not in the critical path are assigned high threshold
voltages to dissipate minimum leakage power in standby mode.
Digital sub-threshold voltage
Devices operate in sub-threshold region (Vgs < |Vth|)
Technique is suitable for ultra low power applications where speed is of
secondary importance
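The benefit of a high threshold voltage in standby can be estimated from the exponential dependence of sub-threshold leakage on Vth, using the 10^(-Vth/S) form that appears in the power expression earlier in this deck. The sketch below is illustrative only: the threshold values and the 100 mV/decade swing are assumed, not taken from the presentation.

```python
def leakage_ratio(v_th_low_mv, v_th_high_mv, swing_mv_per_decade=100):
    """Ratio of low-Vth to high-Vth sub-threshold leakage.

    Uses I_leak ~ I_0 * 10^(-V_th / S); I_0 cancels in the ratio.
    Voltages are given in millivolts.
    """
    decades = (v_th_high_mv - v_th_low_mv) / swing_mv_per_decade
    return 10 ** decades


# Hypothetical thresholds: 200 mV for active logic, 500 mV for a sleep device.
ratio = leakage_ratio(v_th_low_mv=200, v_th_high_mv=500)
print(f"Low-Vth device leaks about {ratio:.0f}x more than the high-Vth device")
```

Each additional S of threshold voltage buys one decade of leakage reduction, which is why a single high-Vth series device in MTCMOS can dominate standby current.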
24
Various DTMOS configurations
[Figure: DTMOS inverter configuration, with VDD, Vin, and Vout]
DTMOS:
Allows for control of the bulk terminal
Good for low voltage operation (VDD < 0.6V)
25
Basic MTCMOS architecture
Low-VTH circuit
(High leakage)
High-VTH circuit
(Low leakage)
Critical paths
Non-critical paths
26
MTCMOS circuits configuration
[Figure: MTCMOS circuit configurations. Labels: VDD; Vsleep; High Vt Device; Low Vt Devices or Logic; VGND; V_HIGH]
MTCMOS:
Low Vth in active mode
Power supply is disconnected through the high Vth
device in standby mode
Extra high Vth memory circuit needed if data
retention is necessary in standby mode
27
Digital sub-threshold circuits
Improved characteristics, including higher
gain, better noise margin, and better energy
efficiency
Ratioed logic (pseudo/true-NMOS) compared
to CMOS logic in terms of switching and
power
Pseudo NMOS:
Switches faster
Draws high currents (dc currents are dominant)
Dissipates more power
Both CMOS and pseudo-nMOS sub-threshold logic
are easy to design and more efficient as compared
to other known ultra-low power logic, such as
28
Ring oscillator configurations
Brown et al. have compared floating body and DTMOS inverters.
Body conditioning is expected to yield superior results.
Our ring oscillators use both conventional and adaptive body biasing.
29
Ring oscillators @ different
nodes (PDP)
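Node   | Style         | VDD (mV) | Wp (nm) | Wn (nm) | Delay (ns) | Current (nA) | Speed gain | Power (nW) | PDP (fJ) | EDP (fJ*ns) | (unlabeled)
250 nm | CMOS          | 450      | 3900    | 1500    | 296.90     | 286          | 1.00       | 7.64       | 2.26897  | -           | 26
250 nm | Pseudo nMOS   | 450      | 1250    | 1500    | 183.00     | 480          | 1.62       | 7.90       | 1.44672  | -           | 43
250 nm | Pseudo + Swap | 450      | 1500    | 1500    | 146.50     | 2800         | 2.03       | 36.91      | 5.40848  | -           | 252
180 nm | CMOS          | 450      | 3375    | 1080    | 176.70     | 270          | 1.00       | 4.30       | 0.76040  | -           | 24
180 nm | Pseudo nMOS   | 450      | 900     | 1080    | 75.50      | 688          | 2.34       | 4.67       | 0.35316  | -           | 62
180 nm | Pseudo + Swap | 450      | 1080    | 1080    | 62.40      | 3055         | 2.83       | 17.15      | 1.07058  | -           | 275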
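The speed gain and PDP figures on this slide can be cross-checked from the delay and power values. The sketch below assumes one reading of the slide: the rows with the 1500 nm transistor width belong to the 250 nm node and the rows with 1080 nm to the 180 nm node, and speed gain is measured against the CMOS oscillator at the same node. The EDP it prints is derived here as PDP times delay and is not a value taken from the slide.

```python
# (node, style, delay in ns, power in nW) as read from the slide.
ROWS = [
    ("250 nm", "CMOS", 296.90, 7.64),
    ("250 nm", "Pseudo nMOS", 183.00, 7.90),
    ("250 nm", "Pseudo + Swap", 146.50, 36.91),
    ("180 nm", "CMOS", 176.70, 4.30),
    ("180 nm", "Pseudo nMOS", 75.50, 4.67),
    ("180 nm", "Pseudo + Swap", 62.40, 17.15),
]


def derive(rows):
    """Compute speed gain, PDP (fJ), and EDP (fJ*ns) for each ring oscillator."""
    cmos_delay = {node: d for node, style, d, _ in rows if style == "CMOS"}
    results = []
    for node, style, delay_ns, power_nw in rows:
        pdp_fj = power_nw * delay_ns * 1e-3  # nW * ns = 1e-18 J = 1e-3 fJ
        results.append({
            "node": node,
            "style": style,
            "speed_gain": cmos_delay[node] / delay_ns,
            "pdp_fj": pdp_fj,
            "edp_fj_ns": pdp_fj * delay_ns,
        })
    return results


RESULTS = derive(ROWS)
for r in RESULTS:
    print(f"{r['node']:7s} {r['style']:14s} gain={r['speed_gain']:.2f} "
          f"PDP={r['pdp_fj']:.4f} fJ  EDP={r['edp_fj_ns']:.1f} fJ*ns")
```

The recomputed PDP values agree with the slide to about three decimal places, which suggests the listed power figures were rounded before the PDP column was produced.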
31
The best of
both worlds?
32
Effect of using different
circuit styles
33
How are logic design styles
affected?
LOGIC STYLE                      | Pdynamic: a | CL  | VDD | Vswing   | Pdynamic (relative) | Pshort_circuit (Isc*VDD) | PDC (IDC*VDD)    | Pleakage (Isc*e^(-vt/vT)*VDD)
Standard CMOS                    | a/2         | 3CL | VDD | VDD      | 1.5                 | 1X [0 if VDD≤Vtn+Vtp]    | 0                | 1
Domino                           | 2a          | CL  | VDD | VDD      | 2                   | 1X [0 if VDD≤Vtn+Vtp]    | 0                | 1
Pass Transistor                  | a/2         | CL  | VDD | VDD - Vt | 0.4                 | 0                        | 0                | 1
Differential (standard)          | 2a          | 2CL | VDD | VDD      | 4                   | 2X [0 if VDD≤Vtn+Vtp]    | 0                | 2
Differential w/ charge recycling | 2a          | 2CL | VDD | VDD/2    | 2                   | 2X [0 if VDD≤Vtn+Vtp]    | 0                | 2
Pseudo nMOS                      | a/2         | CL  | VDD | VDD - Vt | 0.4                 | 1X [0 if VDD≤Vtn]        | 1X [0 if VDD≤V]  | 1
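The relative dynamic power figures in this table appear to follow from multiplying each style's activity factor, load capacitance multiple, and voltage swing. The sketch below assumes that reading, normalizes everything to a * CL * VDD^2, and takes Vt to be 0.2 * VDD, a value chosen here because it reproduces the 0.4 entries. None of these choices is stated explicitly in the presentation.

```python
import math

VT_OVER_VDD = 0.2  # assumed threshold-to-supply ratio (hypothetical)

# style: (activity multiple of a, capacitance multiple of CL, swing as a fraction of VDD)
STYLES = {
    "Standard CMOS": (0.5, 3, 1.0),
    "Domino": (2, 1, 1.0),
    "Pass Transistor": (0.5, 1, 1 - VT_OVER_VDD),
    "Differential (standard)": (2, 2, 1.0),
    "Differential w/ charge recycling": (2, 2, 0.5),
    "Pseudo nMOS": (0.5, 1, 1 - VT_OVER_VDD),
}


def relative_dynamic_power(activity, cap, swing_fraction):
    """Dynamic power relative to a * CL * VDD^2 for one logic style."""
    return activity * cap * swing_fraction


RELATIVE = {name: relative_dynamic_power(*params) for name, params in STYLES.items()}
for name, value in RELATIVE.items():
    print(f"{name:34s} {value:.1f}")
```

Reduced-swing styles (pass transistor, pseudo nMOS, charge recycling) gain on the dynamic term, although the table's other columns show that pseudo nMOS pays for it with a DC current.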
34
Instead of conclusions … Where
is CL?
The interconnection dilemma
"The fault, dear Brutus, lies not in our gates, but in our wires."
(with apologies to W. Shakespeare and J. Caesar)
[Figure: cross-section of the interconnect stack, Metal 1 through Metal 7, above the silicon wafer]
35
36