What Comes Next? The Impact of the Nanoscale on Computing Systems
Seth Copen Goldstein
seth@cs.cmu.edu
Carnegie Mellon University
NSF 9/05 © 2001-5 Seth Copen Goldstein

Technology Shifts
[Figure: Ops/sec/$ (WordSize*ops/s/sysprice) versus year, 1880 to 2030, log scale from 1.E-06 to 1.E+11. Eras: Mechanical/Relays, Tubes/Transistor, CMOS, then "?". Ops/sec/$ doubles every 7.5 years, then every 2.3 years, then every 1.0 years. Combination of Hans Moravec + Larry Roberts + Gordon Bell. From Gray Turing Award Lecture.]

Independent of Technology
• Size of Devices ⇒ Inches to Microns to Nanometers
• Type of Interconnect ⇒ Rods to Lithowires to Nanowires
• Method of Fabrication ⇒ Hammers to Light to Self-Assembly
• Largest Sustainable System ⇒ 10^1 to 10^8 to 10^12
• Reliability ⇒ Bad to Excellent to Unknown

As we scale down:
• Devices become
– more variable
– more faulty (defects & faults)
– numerous
• Fabrication becomes
– more expensive
– more constrained
• Design becomes
– more complicated
– more expensive
• Market pressures remain
Requires:
– Defect tolerance
– Higher level specification
– Universal substrate
– Asynchronous circuits
– Spatial Computing
[Figure: transistor (Drain, Gate, Source) with 1 dopant atom; CMOS versus Nano devices. Image credits: IBM, MIT, HP.]

Size Matters
[Figure, with a 1cm scale marker: carbon nanotube transistor, ~2nm width; 10nm gate, end-of-roadmap approx CMOS size; copper wires, predicted ~50nm pitch; nanowires, already 17nm pitch. Image credits: IBM, Intel, Delft, CalTech, HP, MIT.]
• A trillion devices/cm^2
• … but many of them probably won't work
• … and those that do won't be identical

Independent of Technology
Challenges arise from
- Small size: changes in physical process
- Many devices: increased complexity
Opportunities too!
[Figure: challenge wheel with panels Fab Constraints, Mask Costs, Design Costs, Verification, Parametric Variation, Power, Efficiency, Wire/Gate delay, Global Wires.
Mask Costs panel: Mask Cost / Wafer Level Exposure ($), Affordable Total Cost / Wafer Level Exposure, and Wafer Exposures/Mask versus SIA Roadmap Generation (0.35 um, 0.25 um, 0.18 um, 0.13 um). Karen Brown, NIST.
Design Costs panel: chip size versus designer productivity, 1981 to 2009, log scale 10^4 to 10^10; Complexity in Logic Ts (M)/Chip and Productivity in Logic Ts (K)/Staff Mon, '81 to '09. SEMATECH.
Verification panel: Design $ breakdown, Verification versus Rest.
Efficiency panel: ALUs.
Wire/Gate delay panel: gate 5ps, wire 20ps.]

Fab Constraints: The Red Brick Wall
Legend: Solutions Exist / Solutions being pursued / No known solutions
Year: 1999 | 2002 | 2005 | 2008 | 2011 | 2014
Technology node: 180 nm | 130 nm | 100 nm | 70 nm | 50 nm | 35 nm
On-chip local frequency (GHz): 1.25 | 2.10 | 3.50 | 6.00 | 10.00 | 13.50
Number of metal levels - Logic: 6-7 | 7-8 | 8-9 | 9 | 9-10 | 10
Number of optional levels: 0 | 2 | 2 | 3 | 4 | 4
Jmax (A/cm^2) - wire (at 105 °C): 5.8 E5 | 9.6 E5 | 1.4 E6 | 2.1 E6 | 3.7 E6 | 4.6 E6
Local wiring pitch - DRAM non-contacted (nm): 360 | 260 | 200 | 140 | 100 | 70
Local wiring pitch - Logic (nm): 500 | 325 | 230 | 165 | 120 | 85
Local wiring AR - Logic (Cu): 1.4 | 1.5 | 1.7 | 1.9 | 2.1 | 2.2 - 2.3
Cu local dishing (nm): 18 | 14 | 11 | 9 | 7 | 5
Intermediate wiring pitch - Logic (nm): 560 | 405 | 285 | 210 | 145 | 110
Intermediate wiring h/w AR - Logic (Cu DD via/line): 2.0/2.1 | 2.2/2.1 | 2.4/2.2 | 2.5/2.3 | 2.7/2.4 | 2.9/2.5
Cu intermediate wiring dishing, 15 um wide wire (nm): 64 | 51 | 41 | 30 | 22 | 17
Dielectric erosion, intermediate wiring 50% density (nm): 64 | 51 | 41 | 0 | 0 | 0
Global wiring pitch - Logic (nm): 900 | 650 | 460 | 330 | 240 | 170
Global wiring h/w AR - Logic (Cu DD via/line): 2.2/2.4 | 2.5/2.7 | 2.7/2.8 | 2.8/2.9 | 2.9/3.0 | 3.0/3.1
Cu global wiring dishing, 15 um wide wire (nm): 116 | 95 | 76 | 55 | 38 | 20
Contact aspect ratio - DRAM, stacked cap: 9.3 | 11.4 | 13 | 14.1 | 16.1 | 23.1
Conductor effective resistivity (uohm-cm): 2.2 | 2.2 | 2.2 | 1.8 | < 1.8 | < 1.8
Barrier/cladding thickness (nm): 17 | 13 | 10 | 0 | 0 | 0
Interlevel metal insulator effective dielectric constant (k) - Logic: 4.0 - 3.5 | 3.5 - 2.7 | 2.2 - 1.6 | 1.4 | <1.5 | <1.5

Paradigm Shift Required
• Reliable systems from reliable components ⇒ Reliable systems from unreliable components
• Functionality invested at time of manufacture ⇒ Functionality modified after manufacture
• Behavior remains same as features scale down ⇒ Expect increased variability, changes in functionality, restrictions on connectivity

Manufacturing
• Top-Down (limited patterns)
– Sub-wavelength lithography: OPC, RET, CPM, …
– Nanoimprint lithography
– DPN
• Bottom-Up
– Self-assembly
[Image credits: TI; CalTech; Resnick, et al.; Whang, et al.; Nanoin…]

Balance
[Figure: the challenge wheel, Today versus Future, annotated with responses.
Fab Constraints: simple/regular layout, defect tolerant; new manufacturing: bottom-up assembly.
Parametric Variation: tolerate parametric variation.
Verification: automatic verification.
Wire/Gate delay: simple, short, unidirectional interconnect.
Mask Costs: reduce or eliminate mask costs.
Efficiency: no interpretation.
Global Wires: distributed control, asynchronous.
Power: simple parallel hw, mostly idle.
Design Costs: automatic translation, VHLS.]
• Nanoscale makes things harder
• Nanoscale makes things easier
• Challenge: Use devices to
– Ease restrictions
– Reduce complexity
– Reduce power
• How: change abstractions and tools

The Clock
• Design for worst case arrival
– Parametric variation
– Timing closure
– Power
• Asynchronous circuits
– No global controllers
– No global clock
– No timing closure
– Tolerant of parametric variation
• Use more devices to
– Reduce power
– Support device scaling
– Support defect tolerance
[Figure: handshaking pipeline stage. Data in, Logic, Reg, Data out, with Req and Ack signals on each side.]

Reconfigurable Computing
[Figure: one C program, two targets. General-Purpose: Compiler. Custom Hardware: Place & Route onto Logic Blocks and Routing Resources. The program text on the slide is cut off:]
    int reverse(int x) {
      int k,r=0;
      for (k=0; k<64; k++) {
        r |= x&1; x = x >> 1; r = r << 1;
      }
    }
    int func(int* a,int *b) {
      int j,sum=0;
      for (j=0; *a>0; j++)
        sum+=reverse(*b …

Reconfigurable Rationale
• Reconfigurable architectures address roadblocks
– Yield: defect tolerance
– Cost: single substrate eliminates NRE
– Manufacturability: crystalline architecture reduces fab complexity
– Power: $\mathrm{Power} \propto \mathrm{Area}^{(\sigma-3)/\sigma}$, where σ is algorithm dependent; typically 2 < σ < 3.
• However, must change computing approach
• Aside: Molecular Scale Electronics increases fabric density

Reconfigurability & DFT
• FPGA computing fabric
– Regular
– Periodic
– Fine-grained
– Homogeneous
• programs ⇒ circuits
• Aids defect tolerance
[Figure: General-Purpose versus Custom Hardware, Place & Route.]

Design Pressure
Design Crisis: By 2010, 1000 Man-years/chip
Mean time to chip: 46 weeks
Change in Spec, or bug in chip → must respin chip
Other issues:
• Yield
• Parametric variation
• Power
[Figure: design flow with stages User Requirements; Spec written in C: used to verify HW and check user reqts; HW Design Spec; Verification; Masks (mask costs soar). Charts: Complexity in Logic Ts (M)/Chip and Productivity in Logic Ts (K)/Staff Mon, '81 to '09. SEMATECH.]

Spanning 10-orders of Magnitude
[Figure: from 1 Program to 10 Billion Gates. Labels: Theory, Compilers, Architecture, Phoenix.]

Performance: Ops/Clk * Clks/Sec
[Figure: SpecInt/MHz over time. Horowitz.]

ISA has to go?
• Current ISA hides too much
– Good for
  • forward compatibility
  • human oriented assembly
  • ad hoc additions
– Bad for
  • removing constraints
  • exploiting compiler
  • verification
• What can replace ISA?
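The reverse/func fragment in the Reconfigurable Computing figure is cut off in this transcript, so it cannot be compiled as shown. A minimal runnable sketch of a bit-reversal routine in the same spirit follows. The 32-bit unsigned type, the explicit length parameter n, and the names reverse_bits and sum_reversed are assumptions made for illustration; they are not taken from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit word: bit 0 becomes bit 31, and so on.
 * (Hypothetical helper; width and name are assumptions, not the slide's code.) */
uint32_t reverse_bits(uint32_t x) {
    uint32_t r = 0;
    for (int k = 0; k < 32; k++) {
        r = (r << 1) | (x & 1u); /* shift first, then append the low bit of x */
        x >>= 1;
    }
    return r;
}

/* Sum the bit-reversed values of b[0..n-1]; the sum wraps modulo 2^32.
 * (Hypothetical helper; the slide's func() is truncated, so this is not it.) */
uint32_t sum_reversed(const uint32_t *b, size_t n) {
    assert(b != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t j = 0; j < n; j++) {
        sum += reverse_bits(b[j]);
    }
    return sum;
}
```

On a general-purpose processor the loop costs 32 iterations per word. In custom hardware a fixed bit permutation is only wiring, which is one reason a routine of this kind makes a convenient example for mapping programs to circuits.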
Breaking Abstractions
[Figure: the abstraction stack. Applications, Algorithms, Programming Languages, Intermediate Representations, ISA, Microarchitecture, Circuits, Devices, Fabrication; Tools alongside.]

Spatial Computing
• Use available devices to:
– Map circuits in space, not time
– Reduce virtualization
– Decrease clock frequency
• Eliminate
– Global control
– Global structures
• Use different devices/architecture
– Hybrid approach: CMOS+MSE
– Hybrid approach: match task to devices
• Stochastic approaches

Spatial Computing: C → hardware
• Compile ALL of ANSI-C
– No pragmas or hardware directives needed
• Uses new intermediate representation
– Pegasus has precise semantics
– Correspondence between Pegasus and linear logic
• Produces asynchronous circuits
[Figure: CASH translation of "x = a & 7; ... y = x >> 2;" from Program to IR to Circuits.
Program | IR | Circuits
Operations | Nodes | Pipeline stages
Variables | Def-use edges | Channels (wires)]
[Figure: Energy Efficiency [Operations/nJ], log scale from 0.01 to 1000, for Microprocessors, General-purpose DSP, Asynchronous μP, FPGA, Dedicated hardware, and CASH circuits (CASH core); annotation: 1000x.]

Automatic Verification
• Support three levels of verification
– Model Checking
  • Check for attributes of C program
  • Verify specification
– Translation Verification
  • Prove translation is equivalent to original C code
– Self-Certification
  • Allow safe and secure downloading of hardware
• Linear-logic ⇔ IR correspondence
– Gives rise to typed-hardware
– Eliminates MANY design bugs early
– Prove useful runtime properties

Using Area to Reduce Power
• Power in CMOS has four components
– Dynamic switching (dynamic power)
– Short-circuit (dynamic power)
– Subthreshold leakage (static power)
– Gate leakage (static power)
• Early VLSI result: $A T^{\sigma} = \text{constant}$
• Thus, $T^{\sigma} \propto A^{-1}$, or $T \propto A^{-1/\sigma}$
• Since $T \propto F^{-1}$, we get: $F \propto A^{-1/\sigma}$
Thesis: Use more devices (A) to reduce F and in turn reduce P.
Joint work with Paul Beckett

Dynamic Switching Power
• $P_{dyn} = \alpha C V^{2} F$
• $F \propto V$ [Chen97,Flynn99]
– If threshold voltage remains fixed
• $C \propto A$
– If C per node remains the same
• Using $F \propto A^{-1/\sigma}$ ⇒ $P_{dyn} \propto A \cdot A^{-3/\sigma}$ ⇒ $P_{dyn} \propto A^{(\sigma-3)/\sigma}$
• If σ < 3, power can be reduced by using more area!

What is σ
• σ is a measure of a circuit's inherent sequentialness
• Lower values of σ mean a circuit is more parallelizable
• Many important circuits have σ ≤ 2!
– FFT/DFT
– Adders
– Multipliers
– Sorting
– …

Subthreshold Power
• $I_{sub} = I_{S0} \left(1 - e^{-V_{DS}/V_t}\right) e^{(V_{GS} - V_{TH} - V_{OFF})/(n V_t)}$
• Worst case off current is when $V_{GS} = 0$, $V_{DS} = V_{DD}$, so $I_{OFF} \propto e^{-40 V_{TH}}$
• Change how we set $V_{TH}$: $V_{TH} = a - b V_{DD}$
• $I_{OFF} \propto e^{40 b V_{DD}}$, which can be approximated by $V_{DD}^{n}$
n | a | b
2 | 0.29 | 0.075
3 | 0.325 | 0.11
4 | 0.37 | 0.16
• $P_{sub} \propto A^{(\sigma-3)/\sigma}$
[Figure: $I_{sub}$ (log scale, 1.E-06 to 1.E-03) versus $V_{DD}$ (0.2 to 1), curves for n = 2, 3, 4.]

Power/Area Tradeoff
• All components of power can be reduced by using more transistors such that: $P \propto A^{(\sigma-k)/\sigma}$, $k \ge 3$
• Constraints/Comments:
– $V_{DD}$ must scale with F
– Must set $V_{TH}$ properly
– This improves energy-delay!
– Algorithm must be parallel enough, i.e., $\sigma_{alg} < 3$

Summary
• Nanoscale imposes new constraints
– power, cost, defects, regularity, …
– regular, homogeneous architectures
• It's not about technology, but size
• Reconfigurable Computing is inevitable
• Harness scaling
  Use the massive numbers of devices available at the nanoscale
• Tools are key
– Make abstractions tool friendly
– Get human out of the loop

Continuing the Trend
• Reconfigurable fabrics
– Reduce manufacturing costs
– Improve time-to-market
– Improve defect tolerance
• Asynchronous circuits
– Reduce timing issues
– Aid defect tolerance
– Reduce power
• Very high-level synthesis
– Reduce design time
– Reduce verification time
• Spatial Computing
– Reduce power
– Reduce wire delay problem
Trade off complexity (and precision) at manufacturing time for complexity at compilation time.
[Figure: Complex fixed chip + Program, versus Regular, tileable structures + Configuration; captioned "future?".]

Postscript

What is Nanotechnology?
• Fundamental Misunderstanding? Nanotechnology ≈ 10^-9 meters
• Maybe true for nanomaterials?
• My personal view: Nanotechnology ≈ 10^9 components

CS & Nano
• Computer science: The science of controlling complexity through abstraction.
• Nanotechnology: Technology for constructing and manipulating billions of nanoscale items.
• For example, Manage:
– Randomness/regularity of bottom-up assembly
– Build in defect-tolerance
– Complexity of manufacturing

Challenges
Nanoscale regime ⇒ Billions of components
• Use CS to control processes, eliminating the need for precise molecular manufacturing yet yielding interesting and valuable products
• CS contributions to nanotechnology:
– Concurrency
– Interfaces
– Hierarchical assembly
– Distributed control
– …
Aka: How do we deal with complexity?
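A worked instance of the scaling on the Using Area to Reduce Power and Dynamic Switching Power slides may help make the exponent concrete. The sketch below takes the slides' relations as given ($F \propto A^{-1/\sigma}$, $F \propto V$, $C \propto A$, $P_{dyn} = \alpha C V^{2} F$) and picks illustrative numbers that are assumptions, not figures from the talk: $\sigma = 2$ and a fourfold increase in area, with primes marking the scaled design.

```latex
\begin{align*}
\frac{F'}{F} &= \left(\frac{A'}{A}\right)^{-1/\sigma} = 4^{-1/2} = \frac{1}{2}
  && \text{(from } F \propto A^{-1/\sigma}\text{)} \\
\frac{V'}{V} &= \frac{F'}{F} = \frac{1}{2}
  && \text{(from } F \propto V\text{)} \\
\frac{C'}{C} &= \frac{A'}{A} = 4
  && \text{(from } C \propto A\text{)} \\
\frac{P'_{dyn}}{P_{dyn}}
  &= \frac{C'}{C}\left(\frac{V'}{V}\right)^{2}\frac{F'}{F}
   = 4 \cdot \frac{1}{4} \cdot \frac{1}{2} = \frac{1}{2}
  && \text{(from } P_{dyn} = \alpha C V^{2} F\text{)} \\
\left(\frac{A'}{A}\right)^{(\sigma-3)/\sigma} &= 4^{(2-3)/2} = 4^{-1/2} = \frac{1}{2}
  && \text{(closed form, same result)}
\end{align*}
```

Under these assumptions, quadrupling the area halves both the clock frequency and the dynamic power. With $\sigma = 3$ the exponent $(\sigma-3)/\sigma$ is zero and the extra area buys no power reduction, which matches the $\sigma_{alg} < 3$ constraint on the Power/Area Tradeoff slide.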