FEMTO-JOULE SWITCHING
Review of Low Energy Approaches for the Nano Era

Jabulani Nyathi, Washington State University
Valeriu Beiu, Washington State University
Snorre Aunet, University of Oslo, Norway

With credits to Joel Birnbaum (HP), Hugo De Man (IMEC/KUL), Kaushik Roy (Purdue), Mark Lundstrom (Purdue), Vojin G. Oklobdzija (UCDavis), Takayasu Sakurai (University of Tokyo), Tadahiro Kuroda (Keio University), Anantha Chandrakasan (MIT), Richard Brown (Univ. of Utah), and ITRS Roadmap

Motivation

Where are we going? Toward pervasive information systems
[Figure: penetration versus year, 1960 to 2000. Labels: cooperative computing, networked personal computing, distributed computing, batch computing and timesharing; information utility, information appliances, open systems of clients and servers, micros, minis, mainframes.]
• Utility: "The ability, capacity or power… to satisfy the needs or gratify the desires of the majority or of the human race as a whole" (Oxford English Dictionary)
• Appliance: "a thing applied as a means to an end" (Oxford English Dictionary)

How to get there? The very big picture
[Figure: design (software and hardware) versus year, 1960 to 2010. Labels include: services, network, embedded, RF, opamp, filters, AD/DA, DSP, ASIP, memory, IP, µP, µC, gate, RT-ops, FSM, ASIC, FPGA, silicon, board, VHDL, OO, C, C++.]

How to get there? The very small picture
[Figure: 10 nm scale MOSFETs. ID(on) and ID(off) versus year, 1990 to 2016; current axis marked 0.00001 mA, 10 mA, and 1000 mA; annotations: "10X increase per technology node", "1.2 nm".]

As the electrons vanish
[Figure, two panels. "Scaling of electronic devices": number of chip components (10^0 to 10^18) versus feature size in microns (10^2 down to 10^-3), with labels Classical Age, Historical Trend, SIA Roadmap, and CMOS, and year marks from 1970 to 2010. "Vanishing electrons": electrons per device (10^4 down to 10^-1) versus year (1985 to 2020), for 4M to 16G transistors per chip.]

Power cost of information transfer?
"Information is a physical entity" (Rolf Landauer, IBM)
Therefore, computation is a physical process:

$P = n \, k_B T \, \frac{d}{c} \, \nu^2$

where
• $P$ = power
• $k_B$ = Boltzmann constant
• $T$ = temperature
• $d$ = transmission distance
• $c$ = speed of light
• $\nu$ = operating frequency
• $n$ = number of parallel operations

Power

The trend: power, VDD, and current
[Figure: power per chip [W] (0 to 200), VDD current [A] (0 to 500), and voltage [V] (0 to 2.5) versus year, 1998 to 2014.]

How should we deal with power and speed?
• Device level
  - Devices must have low threshold voltages and reduced parasitic capacitances; better yet, use new devices
  - Examples include fully and partially depleted silicon-on-insulator CMOS
  - Novel nano devices (e.g., single electron transistors, molecular, spin transistor, etc.)
• Gate level
  - Logic design styles that include standard CMOS, domino logic, differential logic families, pseudo nMOS, and many more
  - Threshold logic
• Circuit level
  - Clock gating, current sensing, etc.
• Module level
  - Will inherit the gains achieved at the device, circuit, and gate levels and manage these by employing innovative architectures (e.g., reduce switching activity)
• Chip level

Sources of power dissipation
• Power has been a secondary design issue to speed
• Device miniaturization and voltage scaling have led to:
  - Fast switching speeds
  - High density designs
  - High leakage currents and ultimately increased power dissipation
• In deep sub-micron (i.e., nano) technologies, the conflicting requirements of high speed and low power are becoming even more prominent

Past techniques for power reduction
• Voltage/frequency scaling
  - Limited by technology
  - Not possible below a certain feature size
• Architectural adaptation
  - Shut off portions of the core when not needed
  - Dynamic speculation control
  - Reconfigurable caches
• Limitations:
  - Very few choices to make
  - Only dynamic power is saved
  - Has associated overhead

TransMeta Example

Expression for average power
The currents drawn must be studied in sufficient detail to allow a detailed power analysis.
The average total power in digital CMOS circuits can be described by:

$P_{total} = P_{dynamic} + P_{short\_circuit} + P_{static}$

The dynamic power component, and the methods to manage it, have seen a fair share of analysis.

Power component expressions
Each component of the average power can be analyzed further as follows:

$P_{dynamic} = \alpha \cdot V_{DD} \cdot V_{swing} \cdot C_L \cdot f_{CLK}$

with $V_{DD}$ the supply voltage, $V_{swing}$ the output/internal node voltage swing, $C_L$ the load capacitance, $f_{CLK}$ the switching rate of the output, and $\alpha$ the activity factor.

$P_{short\_circuit} = \alpha \cdot I_{sc\_ave} \cdot V_{swing}$

where $I_{sc\_ave}$ is the average short-circuit current over a period.

The static power … becomes important!
The third component of the average power equation is:

$P_{static} = P_{sub\_leakage} + P_{DC}$

where
• $P_{sub\_leakage}$ is due to sub-threshold leakage
• $P_{DC}$ is due to DC current
For nano-electronics, the static component of power is expected to be comparable to the dynamic power dissipation.
Standby power ($P_{sub\_leakage}$), a component of static power, will be the culprit due to scaling.

Example: Reducing dynamic power

$P_{dynamic} = \alpha \cdot C_L \cdot V_{DD} \cdot V_{swing} \cdot f_{CLK}$

Reducing the active load:
• Minimize the circuits
• Use more efficient design
• Charge recycling
• More efficient layout
Technology scaling:
• The highest win
• Thresholds should scale
• Leakage starts to bite
• Dynamic voltage scaling
Reduce switching activity:
• Conditional clock
• Conditional precharge
• Switching off inactive blocks
• Conditional execution
Run it slower:
• Use parallelism
• Fewer pipeline stages
• Use double-edge flip-flops

Is there an optimal design point?
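To make the question concrete, the component expressions above can be evaluated numerically. The following is a minimal sketch, not part of the original slides: the function names and the operating point (activity factor 0.1, 10 fF load, 1 GHz clock) are assumptions chosen only for illustration. It shows that scaling $V_{DD}$ and $V_{swing}$ together by one half cuts the dynamic component to one quarter.

```python
import math


def dynamic_power(alpha, c_load, vdd, v_swing, f_clk):
    """P_dynamic = alpha * VDD * V_swing * C_L * f_CLK, in watts."""
    return alpha * vdd * v_swing * c_load * f_clk


def short_circuit_power(alpha, i_sc_ave, v_swing):
    """P_short_circuit = alpha * I_sc_ave * V_swing, as written on the slide."""
    return alpha * i_sc_ave * v_swing


def static_power(p_sub_leakage, p_dc=0.0):
    """P_static = P_sub_leakage + P_DC."""
    return p_sub_leakage + p_dc


def total_power(p_dynamic, p_short_circuit, p_static):
    """P_total = P_dynamic + P_short_circuit + P_static."""
    return p_dynamic + p_short_circuit + p_static


# Hypothetical operating point (assumed values, not from the slides).
ALPHA = 0.1        # activity factor
C_LOAD = 10e-15    # load capacitance: 10 fF
F_CLK = 1e9        # switching rate: 1 GHz

full_swing = dynamic_power(ALPHA, C_LOAD, vdd=1.0, v_swing=1.0, f_clk=F_CLK)
half_swing = dynamic_power(ALPHA, C_LOAD, vdd=0.5, v_swing=0.5, f_clk=F_CLK)

print(f"P_dynamic at VDD = 1.0 V: {full_swing:.3e} W")
print(f"P_dynamic at VDD = 0.5 V: {half_swing:.3e} W")
print(f"ratio: {half_swing / full_swing:.2f}")
```

The quadratic dependence only holds while the swing tracks the supply; a reduced-swing style at fixed $V_{DD}$ would scale linearly instead.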
Power dissipation and circuit delay

Power:
$P = p_t \cdot f_{CLK} \cdot C_L \cdot V_{DD}^2 + I_0 \cdot 10^{-V_{th}/S} \cdot V_{DD}$

Delay:
$\text{Delay} = \frac{k \cdot Q}{I} = \frac{k \cdot C_L \cdot V_{DD}}{(V_{DD} - V_{th})^{\alpha}}$, with $\alpha = 1.3$

[Figure: surface plots of power (W, scale ×10^-4) and delay (s, scale ×10^-10) as functions of VDD and Vth, with operating points A and B marked.]

Power-delay product, energy-delay product
• Lowest voltage, highest threshold: no optimum
• Power-delay product is a misleading metric, as it favors a processor that operates at a lower frequency
• Energy-delay is adequate, but energy-delay² should be used instead

Energy-delay²

Lowering VDD to achieve ultralow power
• VDD should be lowered to the minimum level that still ensures real-time operation
• Energy consumption is proportional to the square of VDD
[Figure: normalized power versus normalized workload (both 0.0 to 1.0) for variable Vdd and fixed Vdd.]

Aggressively lowering VDD + Vth
If VDD and Vth are dynamically scaled, the advantage is obvious.

The future: sub-threshold and body bias?

A fresh look at leakage currents
Some device and circuit level techniques for leakage current reduction are:
• Dynamic threshold transistors (DTMOS)
  - The technique permits the body voltage to be switched with the gate voltage
  - High threshold voltages in standby mode result in low leakage currents
  - Low threshold voltages in active mode allow for higher current drive (high speed)
• Multi-threshold CMOS (MTCMOS)
  - A high threshold voltage device is placed in series with low threshold MOS devices
  - Devices in the critical path are assigned low threshold voltages to allow for high gate speeds
  - Devices that are not in the critical path are assigned high threshold voltages to dissipate minimum leakage power in standby mode
• Digital sub-threshold voltage
  - Devices operate in the sub-threshold region (Vgs < |Vth|)
  - The technique is suitable for ultra low power applications where speed is of secondary importance

Various DTMOS configurations
[Figure: DTMOS inverter configuration, with VDD, Vin, and Vout.]
DTMOS:
• Allows for control of the bulk terminal
• Good for low voltage operation (VDD < 0.6 V)

Basic MTCMOS architecture
• Critical paths: low-VTH circuit (high leakage)
• Non-critical paths: high-VTH circuit (low leakage)

MTCMOS circuits configuration
[Figure: two MTCMOS configurations in which low-Vt devices or logic connect to the supply through a high-Vt device controlled by Vsleep; node labels VDD, VGND, V_HIGH.]
MTCMOS:
• Low Vth in active mode
• The power supply is disconnected through the high Vth device in standby mode
• An extra high Vth memory circuit is needed if data retention is necessary in standby mode

Digital sub-threshold circuits
• Improved characteristics, including higher gain, better noise margin, and better energy efficiency
• Ratio-ed logic (pseudo/true-NMOS) compared to CMOS logic in terms of switching and power
• Pseudo NMOS:
  - Switches faster
  - Draws high currents (DC currents are dominant)
  - Dissipates more power
• Both CMOS and pseudo-nMOS sub-threshold logic are easy to design and more efficient than other known ultra-low power logic, such as …

Ring oscillator configurations
• Brown et al. have compared floating body and DTMOS inverters
• Body conditioning is expected to yield superior results
• Our ring oscillators use both conventional and adaptive body biasing

Ring oscillators @ different nodes (PDP)

Technology  Circuit        VDD (mV)  Wp (nm)  Wn (nm)  Delay (ns)  Current (nA)  Speed gain  Power (nW)  PDP (fJ)  EDP (fJ*ns)
250 nm      CMOS           450       3900     1500     296.90      286           1.00        7.64        2.26897   26
250 nm      Pseudo nMOS    450       1250     1500     183.00      480           1.62        7.90        1.44672   43
250 nm      Pseudo + Swap  450       1500     1500     146.50      2800          2.03        36.91       5.40848   252
180 nm      CMOS           450       3375     1080     176.70      270           1.00        4.30        0.76040   24
180 nm      Pseudo nMOS    450       900      1080     75.50       688           2.34        4.67        0.35316   62
180 nm      Pseudo + Swap  450       1080     1080     62.40       3055          2.83        17.15       1.07058   275

The best of both worlds?
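As a sanity check on the ring oscillator figures above, the sketch below recomputes speed gain and power-delay product from the delay and power values. It assumes that speed gain is the CMOS delay divided by the circuit's delay and that PDP is power times delay; the function and variable names are illustrative and do not come from the original slides.

```python
# (circuit, delay in ns, power in nW), copied from the ring oscillator table.
ROWS = {
    "250 nm": [
        ("CMOS", 296.90, 7.64),
        ("Pseudo nMOS", 183.00, 7.90),
        ("Pseudo + Swap", 146.50, 36.91),
    ],
    "180 nm": [
        ("CMOS", 176.70, 4.30),
        ("Pseudo nMOS", 75.50, 4.67),
        ("Pseudo + Swap", 62.40, 17.15),
    ],
}


def pdp_fj(delay_ns, power_nw):
    """Power-delay product in fJ (1 nW * 1 ns = 1e-18 J = 1e-3 fJ)."""
    return delay_ns * power_nw * 1e-3


def speed_gain(reference_delay_ns, delay_ns):
    """Speed-up relative to the reference (CMOS) delay."""
    return reference_delay_ns / delay_ns


def summarize(rows):
    """Return (circuit, speed gain, PDP in fJ), rounded to two decimals.

    The first row of each technology node is taken as the CMOS reference.
    """
    reference_delay = rows[0][1]
    return [
        (name,
         round(speed_gain(reference_delay, delay), 2),
         round(pdp_fj(delay, power), 2))
        for name, delay, power in rows
    ]


for node, rows in ROWS.items():
    for name, gain, pdp in summarize(rows):
        print(f"{node:7s} {name:14s} gain={gain:.2f}  PDP={pdp:.2f} fJ")
```

Rounded to two decimals, the recomputed values agree with the speed gain and PDP columns: pseudo nMOS is faster than CMOS at both nodes and has the lowest PDP, while the swapped variant buys further speed at a clearly higher PDP.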
Effect of using different circuit styles

How are logic design styles affected?

Logic style                       α    CL   VDD  Vswing  P_dynamic  P_short_circuit (Isc*VDD)  P_DC (IDC*VDD)    P_leakage (Isc*e^(-Vt/vT)*VDD)
Standard CMOS                     α/2  3CL  VDD  VDD     1.5        1X [0 if VDD≤Vtn+Vtp]      0                 1
Domino                            2α   CL   VDD  VDD     2          1X [0 if VDD≤Vtn+Vtp]      0                 1
Pass Transistor                   α/2  CL   VDD  VDD-Vt  0.4        0                          0                 1
Differential (standard)           2α   2CL  VDD  VDD     4          2X [0 if VDD≤Vtn+Vtp]      0                 2
Differential w/ charge recycling  2α   2CL  VDD  VDD/2   2          2X [0 if VDD≤Vtn+Vtp]      0                 2
Pseudo nMOS                       α/2  CL   VDD  VDD-Vt  0.4        1X [0 if VDD≤Vtn]          1X [0 if VDD≤V…]  1

Instead of conclusions …
Where is CL? The interconnection dilemma
"The fault, dear Brutus, lies not in our gates, but in our wires." (with apologies to W. Shakespeare and J. Caesar)
[Figure: chip cross-section showing metal layers Metal 1 through Metal 7 above the silicon wafer.]