Technology Shifts Independent of Technology

What Comes Next?
The Impact of the Nanoscale on Computing Systems

Seth Copen Goldstein
seth@cs.cmu.edu
Carnegie Mellon University

NSF 9/05
© 2001-5 Seth Copen Goldstein

[Figure: Ops/sec/$ (WordSize*ops/s/sysprice) versus year, 1880 to 2030, on a log scale from 1.E-06 to 1.E+11. Eras: Mechanical/Relays, Tubes/Transistor, CMOS, then "?". Annotated doubling times: every 7.5 years, every 2.3 years, every 1.0 years. Combination of Hans Moravec + Larry Roberts + Gordon Bell.]
Technology Shifts
Independent of Technology
• Size of Devices
⇒ Inches to Microns to Nanometers
• Type of Interconnect
⇒ Rods to Lithowires to Nanowires
• Method of Fabrication
⇒ Hammers to Light to Self-Assembly
• Largest Sustainable System
⇒ 10^1 to 10^8 to 10^12
• Reliability
⇒ Bad to Excellent to Unknown
From Gray Turing Award Lecture
As we scale down:
• Devices become
– more variable
– more faulty (defects & faults)
– numerous
• Fabrication becomes
– More expensive
– More constrained
• Design becomes
– More complicated
– More expensive
• Market pressures remain
[Figure: CMOS and nano devices. Labels: Gate, Source, Drain, 1 dopant atom. Image credits: IBM, MIT, HP, Intel, Delft.]
[Figure: feature sizes against a 1cm chip. Carbon nanotube transistor, ~2nm width; copper wires, predicted ~50nm pitch; nanowires, already 17nm pitch.]
Challenges arise from
- Small size: changes in physical process
- Many devices: increased complexity
Opportunities too!
[Figure: challenge panels. Fab Constraints (The Red Brick Wall); Design Costs (chip size versus designer productivity); Power; Parametric Variation; Verification (design $ breakdown); Efficiency (ALUs versus rest); Wire/Gate delay (gate 5ps, wire 20ps); Global Wires; Mask Costs.]

The Red Brick Wall
Legend: Solutions Exist / Solutions being pursued / No known solutions

YEAR | 1999 | 2002 | 2005 | 2008 | 2011 | 2014
TECHNOLOGY NODE | 180 nm | 130 nm | 100 nm | 70 nm | 50 nm | 35 nm
On-chip local frequency (MHz) | 1.25 | 2.10 | 3.50 | 6.00 | 10.00 | 13.50
Number of metal levels - Logic | 6-7 | 7-8 | 8-9 | 9 | 9-10 | 10
Number of optional levels | 0 | 2 | 2 | 3 | 4 | 4
Jmax (A/cm^2) - wire (at 105 °C) | 5.8 E5 | 9.6 E5 | 1.4 E6 | 2.1 E6 | 3.7 E6 | 4.6 E6
Local wiring pitch - DRAM non-contacted (nm) | 360 | 260 | 200 | 140 | 100 | 70
Local wiring pitch - Logic (nm) | 500 | 325 | 230 | 165 | 120 | 85
Local wiring AR - Logic (Cu) | 1.4 | 1.5 | 1.7 | 1.9 | 2.1 | 2.2 - 2.3
Cu local dishing (nm) | 18 | 14 | 11 | 9 | 7 | 5
Intermediate wiring pitch - Logic (nm) | 560 | 405 | 285 | 210 | 145 | 110
Intermediate wiring h/w AR - Logic (Cu DD via/line) | 2.0/2.1 | 2.2/2.1 | 2.4/2.2 | 2.5/2.3 | 2.7/2.4 | 2.9/2.5
Cu intermediate wiring dishing, 15 um wide wire (nm) | 64 | 51 | 41 | 30 | 22 | 17
Dielectric erosion, intermediate wiring, 50% density (nm) | 64 | 51 | 41 | 0 | 0 | 0
Global wiring pitch - Logic (nm) | 900 | 650 | 460 | 330 | 240 | 170
Global wiring h/w AR - Logic - Cu DD via/line | 2.2/2.4 | 2.5/2.7 | 2.7/2.8 | 2.8/2.9 | 2.9/3.0 | 3.0/3.1
Cu global wiring dishing, 15 um wide wire (nm) | 116 | 95 | 76 | 55 | 38 | 20
Contact aspect ratio - DRAM, stacked cap | 9.3 | 11.4 | 13 | 14.1 | 16.1 | 23.1
Conductor effective resistivity (uohm-cm) | 2.2 | 2.2 | 2.2 | 1.8 | < 1.8 | < 1.8
Barrier/cladding thickness (nm) | 17 | 13 | 10 | 0 | 0 | 0
Interlevel metal insulator effective dielectric constant (k) - Logic | 4.0 - 3.5 | 3.5 - 2.7 | 2.2 - 1.6 | 1.4 | <1.5 | <1.5
Size Matters
[Figure: feature sizes against a 1cm chip. 10nm gate, end-of-roadmap approx CMOS size; carbon nanotube transistor, ~2nm width; nanowires, already 17nm pitch. Image credits: CalTech, IBM, Delft, HP, Intel, MIT.]
A trillion devices/cm^2
… but many of them probably won't work
… and those that do won't be identical
Requires:
– Defect tolerance
– Higher level specification
– Universal substrate
– Asynchronous circuits
– Spatial Computing
[Figure: design complexity (Logic Ts (M)/Chip) and productivity (Logic Ts (K)/Staff Mon) by year, 1981 to 2009, log scale. Source: SEMATECH.]
[Figure: mask costs by SIA Roadmap Generation, 0.35 um to 0.13 um: Mask Cost / Wafer Level Exposure ($), Wafer Exposures/Mask, Affordable Total Cost / Wafer Level Exposure. Source: Karen Brown, NIST.]

Independent of Technology
[Figure: the challenge panels (Fab Constraints, Design Costs, Power, Parametric Variation, Verification, Wire/Gate delay, Global Wires, Efficiency, Mask Costs), shown Today and in the Future, overlaid with responses: simple/regular layout, defect tolerant; tolerate parametric variation; automatic verification; automatic translation, VHLS; simple, short, unidirectional interconnect; no interpretation; distributed control, asynchronous; simple parallel hw, mostly idle; reduce or eliminate mask costs.]

Manufacturing Paradigm Shift Required
• Reliable systems from reliable components
⇒ Reliable systems from unreliable components
• Functionality invested at time of manufacture
⇒ Functionality modified after manufacture
New manufacturing: Bottom-up assembly

Limited Patterns
• Top-Down
– Sub-wavelength lithography: OPC, RET, CPM, …
– Nanoimprint lithography
– DPN
• Bottom-Up
– Self-assembly
• Behavior remains same as features scale down
Expect increased variability
Changes in functionality
Restrictions on connectivity
[Image credits: TI; CalTech; Resnick, et al.; Whang, et al.; Nanoin]

Balance
• Nanoscale makes things harder
• Nanoscale makes things easier
• Challenge: Use devices to
– Ease restrictions
– Reduce complexity
– Reduce power
• How: change abstractions and tools
The Clock
• Design for worst case arrival
– Parametric variation
– Timing closure
– Power
• Asynchronous circuits
– No global controllers
– No global clock
– No timing closure
– Tolerant of parametric variation
[Figure: asynchronous pipeline stage. Datain, Logic, Reg, Dataout, with Req/Ack handshaking.]
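The Req/Ack handshaking on the slide above can be sketched in software. This is a minimal sketch, assuming a four-phase (return-to-zero) protocol with bundled data, which the slide does not specify; `channel_t`, `sender_step`, `receiver_step`, and `transfer` are hypothetical names chosen for illustration. Each side reacts only to the other side's wire, so no global clock appears anywhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-channel model; not taken from the slides. */
typedef struct {
    bool req;   /* raised by the sender when data is valid */
    bool ack;   /* raised by the receiver once data is latched */
    int  data;  /* bundled data wires */
} channel_t;

/* Sender: phase 1 (offer data, raise Req) or phase 3 (lower Req after Ack). */
void sender_step(channel_t *ch, const int *src, int n, int *sent)
{
    if (!ch->req && !ch->ack && *sent < n) {
        ch->data = src[*sent];
        ch->req = true;
        (*sent)++;
    } else if (ch->req && ch->ack) {
        ch->req = false;
    }
}

/* Receiver: phase 2 (latch data, raise Ack) or phase 4 (lower Ack after Req drops). */
void receiver_step(channel_t *ch, int *dst, int *received)
{
    if (ch->req && !ch->ack) {
        dst[*received] = ch->data;
        ch->ack = true;
        (*received)++;
    } else if (!ch->req && ch->ack) {
        ch->ack = false;
    }
}

/* Move n values across the channel; returns the number received. */
int transfer(const int *src, int *dst, int n)
{
    channel_t ch = { false, false, 0 };
    int sent = 0, received = 0;
    assert(n >= 0);
    while (received < n || ch.req || ch.ack) {
        sender_step(&ch, src, n, &sent);
        receiver_step(&ch, dst, &received);
    }
    return received;
}
```

The loop alternates the two sides only for convenience. Because every transition waits on the opposite wire, the same values arrive in the same order however the two step functions are interleaved, which is the sense in which the scheme tolerates timing variation.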
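[Figure: Reconfigurable Computing. General-Purpose versus Custom Hardware: a C program goes through the Compiler onto Logic Blocks and Routing Resources.]
int reverse(int x)
{
    int k, r = 0;
    for (k = 0; k < 64; k++) {
        r |= x & 1;
        x = x >> 1;
        r = r << 1;
    }
    return r;
}
int func(int *a, int *b)
{
    int j, sum = 0;
    for (j = 0; *a > 0; j++)
        sum += reverse(*b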
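The slide's listing is an illustration and is cut off mid-expression. As printed, `reverse` also loops 64 times over an `int` and shifts once more after the last bit is placed. A minimal runnable sketch of the same idea follows, assuming a 32-bit unsigned word; `reverse32` and `sum_reversed` are hypothetical names, and `sum_reversed` is only a stand-in for the truncated `func`, not a reconstruction of it.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit word: bit 0 becomes bit 31, and so on. */
uint32_t reverse32(uint32_t x)
{
    uint32_t r = 0;
    for (int k = 0; k < 32; k++) {
        r = (r << 1) | (x & 1u);  /* shift first, then place the next bit */
        x >>= 1;
    }
    return r;
}

/* Hypothetical caller: sum of the bit-reversed words in b[0..n-1].
   Unsigned arithmetic wraps modulo 2^32. */
uint32_t sum_reversed(const uint32_t *b, int n)
{
    uint32_t sum = 0;
    assert(n >= 0);
    for (int j = 0; j < n; j++)
        sum += reverse32(b[j]);
    return sum;
}
```

In a spatial implementation the loop disappears entirely: bit reversal is pure wiring, which is the kind of gap between general-purpose and custom hardware the figure points at.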
• Use more devices to
– Reduce power
– Support device scaling
– Support defect tolerance
Reconfigurable Rationale
• Reconfigurable architectures address roadblocks
– Yield: with defect tolerance
– Cost: single substrate eliminates NRE
– Manufacturability: crystalline architecture reduces fab complexity
– Power: $P \propto A^{(\sigma-3)/\sigma}$, where σ is algorithm dependent; typically 2 < σ < 3.
• However, must change computing approach
• Aside: Molecular Scale Electronics increases fabric density

Reconfigurability & DFT
• FPGA computing fabric
– Regular
– Periodic
– Fine-grained
– Homogeneous
• Programs ⇒ circuits
• Aids defect tolerance
[Figure: General-Purpose versus Custom Hardware, each followed by Place & Route.]
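One way to picture how a regular, homogeneous fabric aids defect tolerance: because every block is interchangeable, placement can simply skip blocks known to be bad. The sketch below assumes a per-chip defect map is available and ignores routing entirely; `place_around_defects` is a hypothetical helper, not a tool named in the talk.

```c
#include <assert.h>
#include <stdbool.h>

/* Assign each logical block to a distinct working physical block,
   skipping entries marked defective. placement[l] receives the physical
   index chosen for logical block l. Returns how many blocks were placed. */
int place_around_defects(const bool *defective, int n_physical,
                         int *placement, int n_logical)
{
    int next = 0, placed = 0;
    assert(n_physical >= 0 && n_logical >= 0);
    for (int l = 0; l < n_logical; l++) {
        while (next < n_physical && defective[next])
            next++;                 /* step over defective blocks */
        if (next == n_physical)
            break;                  /* fabric exhausted */
        placement[l] = next++;
        placed++;
    }
    return placed;
}
```

A return value smaller than `n_logical` means the fabric did not have enough working blocks. A real place-and-route flow would also have to route around defective wires, which this sketch leaves out.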
Design Pressure
[Figure: complexity (Logic Ts (M)/Chip) versus productivity (Logic Ts (K)/Staff Mon), '81 to '09, spanning 10 orders of magnitude. Source: SEMATECH.]
Design Crisis: By 2010, 1000 Man-years/chip
Mean time to chip: 46 weeks
Masks: Mask costs soar
[Figure: design flow. User Requirements, Spec, HW Design, Verification. Spec written in C: used to verify HW and check user reqts.]
Change in Spec, or bug in chip → must respin chip
Other issues:
• Yield
• Parametric variation
• Power
Performance: Ops/Clk * Clks/Sec
[Figure: performance chart. Source: Horowitz.]

[Figure: Phoenix. Compilers, Theory, Architecture; 1 Program, 10 Billion Gates.]

ISA has to go?
[Figure: SpecInt/MHz. Source: Horowitz.]
• Current ISA hides too much
– Good for
• forward compatibility
• human oriented assembly
• ad hoc additions
– Bad for
• removing constraints
• exploiting compiler
• verification
• What can replace ISA?
Breaking Abstractions
[Figure: abstraction stack with Tools alongside: Applications, Algorithms, Programming Languages, Intermediate Representations, ISA, Microarchitecture, Circuits, Devices, Fabrication. Additional label: FPGA.]

Spatial Computing
• Use available devices to:
– Map circuits in space, not time
– Reduce virtualization
– Decrease clock frequency
• Eliminate
– Global control
– Global structures
• Use different devices/architecture
– Hybrid approach: CMOS+MSE
– Hybrid approach: match task to devices
• Stochastic approaches
Spatial Computing: C → hardware
• Compile ALL of ANSI-C
– No pragmas or hardware directives needed
• Uses new intermediate representation
– Pegasus has precise semantics
– Correspondence between Pegasus and linear logic
• Produces asynch circuits
[Figure: C → CASH → circuits (CASH core). Example program: x = a & 7; ... y = x >> 2; shown as IR nodes & and >>, and as circuits &7 and >>2.]

Program      IR              Circuits
Operations   Nodes           Pipeline stages
Variables    Def-use edges   Channels (wires)

[Figure: Energy Efficiency [Operations/nJ], log scale 0.01 to 1000, for Microprocessors, General-purpose DSP, Asynchronous μP, FPGA, Dedicated hardware, and CASH circuits; annotated 1000x.]

Automatic Verification
• Support three levels of verification
– Model Checking
• Check for attributes of C program
• Verify specification
– Translation Verification
• Prove translation is equivalent to original C code
– Self-Certification
• Allow safe and secure downloading of hardware
• Linear-logic ⇔ IR correspondence
– Gives rise to typed-hardware
– Eliminates MANY design bugs early
– Prove useful runtime properties
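The Program / IR / Circuits correspondence from the C → hardware slide (operations become nodes, variables become def-use edges) can be illustrated with the slide's own two statements, `x = a & 7;` and `y = x >> 2;`. The sketch below is a toy dataflow graph, not the Pegasus representation itself; `node`, `node_kind`, `eval`, and `run_example` are hypothetical names.

```c
#include <assert.h>

typedef enum { NODE_INPUT, NODE_CONST, NODE_AND, NODE_SHR } node_kind;

typedef struct {
    node_kind kind;
    int value;     /* payload for NODE_INPUT and NODE_CONST */
    int lhs, rhs;  /* def-use edges: indices of the producing nodes */
} node;

/* Evaluate node i by pulling values along its def-use edges. */
int eval(const node *g, int i)
{
    const node *n = &g[i];
    switch (n->kind) {
    case NODE_INPUT:
    case NODE_CONST: return n->value;
    case NODE_AND:   return eval(g, n->lhs) & eval(g, n->rhs);
    case NODE_SHR:   return eval(g, n->lhs) >> eval(g, n->rhs);
    }
    assert(0 && "unknown node kind");
    return 0;
}

/* Graph for: x = a & 7; y = x >> 2; returns y. */
int run_example(int a)
{
    node g[] = {
        { NODE_INPUT, a, -1, -1 },  /* 0: a          */
        { NODE_CONST, 7, -1, -1 },  /* 1: 7          */
        { NODE_AND,   0,  0,  1 },  /* 2: x = a & 7  */
        { NODE_CONST, 2, -1, -1 },  /* 3: 2          */
        { NODE_SHR,   0,  2,  3 },  /* 4: y = x >> 2 */
    };
    return eval(g, 4);
}
```

The variable `x` never appears as storage here. It exists only as the edge from node 2 to node 4, which is what would become a wire in the circuit column of the table.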
Using Area to Reduce Power
• Power in CMOS has four components
• Dynamic power
– Dynamic switching
– Short-circuit
• Static power
– Subthreshold leakage
– Gate leakage
Thesis: Use more devices (A) to reduce F and in turn reduce P.
Early VLSI result: $A T^{\sigma} = \text{constant}$
Thus $T^{\sigma} \propto A^{-1}$, or $T \propto A^{-1/\sigma}$.
Since $T \propto F^{-1}$, holding the completion time fixed while area grows lets the clock slow by the same factor: $F \propto A^{-1/\sigma}$.
Joint work with Paul Beckett

Dynamic Switching Power
• $P_{dyn} = \alpha C V^2 F$
• $F \propto V$ [Chen97, Flynn99]
– If C per node remains the same
– If threshold voltage remains fixed
• $C \propto A$
• Using $F \propto A^{-1/\sigma}$ ⇒ $P_{dyn} \propto A \cdot A^{-3/\sigma}$ ⇒ $P_{dyn} \propto A^{(\sigma-3)/\sigma}$
• If σ < 3, power can be reduced by using more area!
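The steps of the dynamic-power argument can be written out as one chain. This is a sketch under the slides' stated proportionalities, with the added assumption that the activity factor $\alpha$ stays constant; the numeric example ($\sigma = 2$, area doubled) is chosen only for illustration.

```latex
\begin{align*}
P_{dyn} &= \alpha C V^{2} F \\
        &\propto A \cdot F^{2} \cdot F = A F^{3}
          && (C \propto A,\ V \propto F) \\
        &\propto A \cdot A^{-3/\sigma} = A^{(\sigma-3)/\sigma}
          && (F \propto A^{-1/\sigma})
\end{align*}

% Illustrative values (assumed): sigma = 2, area doubled.
\[
\frac{F(2A)}{F(A)} = 2^{-1/2} \approx 0.71,
\qquad
\frac{P_{dyn}(2A)}{P_{dyn}(A)} = 2^{(2-3)/2} = 2^{-1/2} \approx 0.71
\]
```

For $\sigma = 3$ the exponent is zero and extra area buys nothing, so the saving depends on the algorithm being parallel enough.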
What is σ
• σ is a measure of a circuit's inherent sequentialness
• Lower values of σ mean a circuit is more parallelizable
• Many important circuits have σ ≤ 2!
– FFT/DFT
– Adders
– Multipliers
– Sorting
– …

Subthreshold Power
• $I_{sub} = I_{S0}\left(1 - e^{-V_{DS}/V_t}\right) e^{(V_{GS} - V_{TH} - V_{OFF})/(n V_t)}$
• Worst case off current is when $V_{GS} = 0$, $V_{DS} = V_{DD}$, so
• $I_{OFF} \propto e^{-40 V_{TH}/n}$
• Change how we set $V_{TH}$: $V_{TH} = a - b V_{DD}$

n   a      b
2   0.29   0.075
3   0.325  0.11
4   0.37   0.16

• $I_{OFF} \propto e^{40 b V_{DD}/n}$, which can be approximated by $V_{DD}$
• $P_{sub} \propto A^{(\sigma-3)/\sigma}$
[Figure: $I_{sub}$ (1.E-06 to 1.E-03, log scale) versus $V_{DD}$ (0.2 to 1), curves labeled 2, 3, 4.]

Power/Area Tradeoff
• All components of power can be reduced by using more transistors such that: $P \propto A^{(\sigma-k)/\sigma}$, $k \ge 3$
• Constraints/Comments:
– $V_{DD}$ must scale with F
– Must set $V_{TH}$ properly
– This improves energy-delay!
– Algorithm must be parallel enough, i.e., $\sigma_{alg} < 3$

Summary
• Nanoscale imposes new constraints
– power, cost, defects, regularity, …
– regular, homogeneous architectures
• It's not about technology, but size
• Reconfigurable Computing is inevitable
• Harness scaling
– Use the massive numbers of devices available at the nanoscale
• Tools are key
– Make abstractions tool friendly
– Get human out of the loop
Continuing the Trend
• Reconfigurable fabrics
– Reduce manufacturing costs
– Improve time-to-market
– Improve defect tolerance
• Asynchronous circuits
– Reduce timing issues
– Aid defect tolerance
– Reduce power
• Very high-level synthesis
– Reduce design time
– Reduce verification time
• Spatial Computing
– Reduce power
– Reduce wire delay problem
Trade off complexity (and precision) at manufacturing time for complexity at compilation time.
[Figure: Complex fixed chip + Program; or (future?) Regular, tileable structures + Configuration + Program.]
Postscript

What is Nanotechnology?
• Fundamental Misunderstanding?
Nanotechnology ≈ 10^-9 meters
• Maybe true for nanomaterials?
• My personal view:
Nanotechnology ≈ 10^9 components
CS & Nano
• Computer science:
The science of controlling complexity through abstraction.
• Nanotechnology:
Technology for constructing and manipulating billions of nanoscale items.
Nanoscale regime ⇒ Billions of components
Aka: How do we deal with complexity?

Challenges
• Use CS to control processes, eliminating the need for precise molecular manufacturing yet yielding interesting and valuable products
• For example, Manage:
– Randomness/regularity of bottom-up assembly
– Build in defect-tolerance
– Complexity of manufacturing
• CS contributions to nanotechnology:
– Concurrency
– Interfaces
– Hierarchical assembly
– Distributed control
– …