VLSI Design
Power
Frank Sill Torres
Department of Electronic Engineering, Federal University of Minas Gerais, Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
franksill@ufmg.br
http://www.cpdee.ufmg.br/~frank/
Copyright Sill Torres, 2012

TRENDS

Trend: Performance
  [Figure: processor performance in MIPS (log scale, 0.01 to 1,000,000) versus year (1970 to 2020), with data points for 8080, 8086, 386, Pentium® proc, and Pentium® 4 proc; the 1 TIPS level is marked. Source: Moore, ISSCC 2003]

Trends – Power Dissipation
  [Figure: SoC Consumer Portable Power Trend. Source: ITRS, 2010 Update]

Trends – Power Density
  [Figure: power density trend with the reference levels "Hot Plate" and "Nuclear Reactor". Source: http://cpudb.stanford.edu/]

Problems of High Power Dissipation
  Continuously increasing performance demands
  Increasing power dissipation of technical devices
  Today: power dissipation is a main problem
  High power dissipation leads to:
    Reduced time of operation
    High cooling effort
    Higher weight (batteries)
    Increasing operational costs
    Reduced mobility
    Reduced reliability

Chip Power Density Distribution
  [Figure: power map and on-die temperature map]
  Power density is not uniformly distributed across the chip
  Silicon is not a good heat conductor
  Max junction temperature is determined by hot spots
  Impact on packaging and cooling

"The Internet is an Electricity Hog" (Badische Zeitung, 2003)
  Energy for the Internet in Germany in 2001: 6.8 billion kWh = 1.4 % of total energy consumption
    2.35 billion kWh for 17.3 million Internet PCs
    1.91 billion kWh for servers
    1.67 billion kWh for the network
    0.87 billion kWh for UPS (uninterruptible power supplies)
  Rate of growth (at the moment): 36 % per year
  Prognosis for 2010: 33 billion kWh
    > 6 % of total energy consumption
    > 3 medium-sized nuclear power plants
  World: 400 million PCs, 0.16 PW (P = peta = 10^15)

Dissipation in a Notebook
  [Figure: block diagram of a notebook]
  Peripherals: disk, display
  Processing: ASICs, programmable µPs or DSPs, memory
  Communication: WLAN, Ethernet
  Power supply: battery, DC-DC converter

Examples for Energy Dissipation
  [Figures: energy dissipation in a notebook; energy dissipation in a PDA]

Battery Capacity
  Generalized Moore's Law: "Intel beats Varta"
  Capacity of batteries: 2 % to 6 % increase per year (up to year 2000)
  Source: Timmermann, 2007

Current Progress
  [Figure: battery, 20 kg]
  Factor 4 in the last 10 years: still far too little

POWER CONSUMPTION IN CMOS

Metrics: Energy and Power
  Energy
    Measured in Joules or kWh
    "Measure of the ability of a system to do work or produce a change"
    "No activity is possible without energy."
  Power
    Measured in Watts or kW
    "Amount of energy required for a given unit of time."
  Average power
    Average amount of energy consumed per unit time
    Simplified to "power" in clear contexts
  Instantaneous power
    Energy consumed per unit time as the time unit goes to zero

Metrics: Energy and Power cont'd
  Instantaneous electrical power P(t)
    $P(t) = v(t) \cdot i(t)$
    v(t): potential difference (or voltage drop) across the component
    i(t): current through the component
  Electrical energy
    $E = P(t) \cdot t = v(t) \cdot i(t) \cdot t$
  Electrical energy in CMOS circuits
    Energy = Power * Delay
    Why?

Consumption in CMOS
  Water analogy:
    Voltage (Volt, V): water pressure (bar)
    Current (Ampere, A): water quantity per second (liter/s)
    Energy: amount of water
  [Figure: load capacitance CL charged and discharged between 0 and 1]
  Energy consumption is proportional to capacitive load!
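The link between instantaneous power, energy, and load capacitance can be checked numerically. The Python sketch below assumes a simple RC model of a gate charging its load; the on-resistance `r_on` and all component values are hypothetical and not taken from the slides. It integrates p(t) = v(t) * i(t) at the supply over one 0→1 output transition.

```python
import math

def supply_energy(c_load, v_dd, r_on, steps=20_000):
    """Integrate p(t) = v_dd * i(t) drawn from the supply while an assumed
    RC stage charges c_load from 0 V towards v_dd (midpoint rule)."""
    tau = r_on * c_load            # RC time constant of the assumed model
    t_end = 20.0 * tau             # long enough for the output to settle
    dt = t_end / steps
    energy = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = (v_dd / r_on) * math.exp(-t / tau)   # charging current i(t)
        energy += v_dd * i * dt                  # p(t) * dt
    return energy

if __name__ == "__main__":
    # Hypothetical values: 10 fF and 20 fF loads, 1 V supply, 1 kOhm driver.
    base = supply_energy(c_load=10e-15, v_dd=1.0, r_on=1e3)
    doubled = supply_energy(c_load=20e-15, v_dd=1.0, r_on=1e3)
    print(f"E(10 fF) = {base:.3e} J, E(20 fF) = {doubled:.3e} J")
```

Under these assumptions the integrated energy doubles when the load doubles and does not depend on `r_on`: a stronger driver only delivers the same energy in a shorter time, that is, with higher instantaneous power.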
Consumption in CMOS cont'd
  [Figure: water analogy as on the previous slide (voltage: water pressure, current: water quantity per second, energy: amount of water), with load capacitance CL switching between 0 and 1]
  Energy for a calculation is only consumed at a 0→1 transition at the output

Energy and Instantaneous Power
  INV1: high instantaneous power (bigger width), delay td1
  INV2: low instantaneous power, delay td2
  Both drive the same load CL
  Same energy (Cin ignored)
  INV1 is faster

Metrics: Energy and Power cont'd
  Power is the height of the curve
  Energy is the area under the curve
  [Figure: power in Watts versus time for Approach 1 and Approach 2]
  Energy = Power * time for calculation = Power * Delay

Metrics: Energy and Power cont'd
  Energy dissipation
    Determines battery life in hours
    Sets packaging limits
  Peak power
    Determines power and ground wiring designs
    Impacts signal noise margin and reliability analysis

Metrics: PDP and EDP
  Power-Delay Product
    Quality criterion
    Power P, delay tp
    $PDP = P \cdot t_p$ [J]
    P and tp have the same weight
    Two designs can have the same PDP, even if tp = 1 year
  Energy-Delay Product
    $EDP = PDP \cdot t_p = P \cdot t_p^2$
    Delay tp has a higher weight

Energy and Power
  Average power is directly proportional to energy
  In the following: power means average power

Where Does Power Go in CMOS?
  Dynamic power consumption
    Charging and discharging capacitors
  Short-circuit currents
    Short-circuit path between supply rails during switching
  Leakage
    Leaking diodes and transistors

Dynamic Power Consumption
  [Figure: CMOS inverter with VDD, Vin, Vout, and load CL]
  $f_{01} = \alpha \cdot f$
  $P_{dyn} = C_L \cdot V_{DD}^2 \cdot P_{01} \cdot f$
    P01: probability of a 0-to-1 switch of the output
    f: clock frequency
    α: activity
  Data dependent: a function of switching activity!
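The dynamic power formula above can be evaluated directly. The following Python sketch is only an illustration: the capacitance, activity, and frequency values are hypothetical, and the function name `dynamic_power` is not part of the original material.

```python
def dynamic_power(c_load, v_dd, p01, f_clk):
    """P_dyn = C_L * V_DD^2 * P_01 * f, as given on the slide."""
    return c_load * v_dd ** 2 * p01 * f_clk

if __name__ == "__main__":
    # Hypothetical values: 1 nF total switched capacitance, 1 GHz clock,
    # outputs switching 0->1 in 10 % of the cycles.
    p_nominal = dynamic_power(1e-9, 1.2, 0.1, 1e9)
    p_scaled = dynamic_power(1e-9, 0.6, 0.1, 1e9)
    print(f"P_dyn at 1.2 V: {p_nominal:.3f} W")
    print(f"P_dyn at 0.6 V: {p_scaled:.3f} W")
```

With these assumed numbers, halving VDD reduces the dynamic power to one quarter, while halving P01, f, or CL would only halve it.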
Dynamic Power Consumption
  Energy drawn from the supply for one 0→1 transition at the output:
  $E = \int_0^\infty I(t)\, V_{DD}\, dt = V_{DD} \int_0^\infty C_L \frac{dV}{dt}\, dt = C_L V_{DD} \int_0^{V_{DD}} dV = C_L V_{DD}^2$

Transition Probabilities for CMOS Cells
  Example: static 2-input NOR cell
  Truth table of the NOR2 cell:
    A  B  Out
    1  1  0
    1  0  0
    0  1  0
    0  0  1
  If A and B have the same input signal probability:
    $P_{A=1} = 1/2$, $P_{B=1} = 1/2$
  Then:
    $P_{Out=0} = 3/4$, $P_{Out=1} = 1/4$
    $P_{0\to1} = P_{Out=0} \cdot P_{Out=1} = 3/4 \cdot 1/4 = 3/16$
    $C_{eff} = P_{0\to1} \cdot C_L = 3/16 \cdot C_L$

Transition Probabilities cont'd
  A and B with different input signal probabilities:
    PA and PB: probability that the input is 1
    P1: probability that the output is 1
  Switching activity in CMOS circuits: $P_{01} = P_0 \cdot P_1$
  For a 2-input NOR: $P_1 = (1-P_A)(1-P_B)$
  Thus: $P_{01} = (1-P_1) \cdot P_1 = [1-(1-P_A)(1-P_B)] \cdot (1-P_A)(1-P_B)$
  $P_{01} = P_{out=0} \cdot P_{out=1}$ for 2-input cells:
    NOR   $(1 - (1 - P_A)(1 - P_B)) \cdot (1 - P_A)(1 - P_B)$
    OR    $(1 - P_A)(1 - P_B) \cdot (1 - (1 - P_A)(1 - P_B))$
    NAND  $P_A P_B \cdot (1 - P_A P_B)$
    AND   $(1 - P_A P_B) \cdot P_A P_B$
    XOR   $(1 - (P_A + P_B - 2 P_A P_B)) \cdot (P_A + P_B - 2 P_A P_B)$

Transition Probabilities cont'd
  [Figure: transition probability of the NOR2 cell as a function of the input probabilities. Source: Timmermann, 2007]
  Probability of input signals → high influence on P01

Short Circuit Power Consumption
  [Figure: CMOS inverter with VDD, Vin, Vout, CL, GND, short-circuit current Isc, and duration tsc]
  Finite slope of the input signal
  During switching: NMOS and PMOS transistors are both conducting for a short period of time (tsc)
  Direct current path between VDD and GND
  $P_{sc} = V_{DD} \cdot I_{sc} \cdot (P_{01} + P_{10})$

Leakage Power Consumption
  [Figure: inverter and transistor cross section (gate, source, drain, SiO2, channel length L) with the currents Igate and Isub]
  Most important leakage currents:
    Subthreshold leakage Isub
    Gate oxide leakage Igate
  $P_{leak} = I_{leak} \cdot V_{DD} \approx (I_{sub} + I_{gate}) \cdot V_{DD}$

Power Equations in CMOS
  $P = \alpha f C_L V_{DD}^2 + V_{DD} I_{peak} (P_{01} + P_{10}) + V_{DD} I_{leak}$
    First term: dynamic power (≈ 40 - 70 % today and decreasing relatively)
    Second term: short-circuit power (≈ 10 % today and decreasing absolutely)
    Third term: leakage power (≈ 20 - 50 % today and increasing)

LEAKAGE

Trends
  [Figure: transistor scaling roadmap with gate lengths of 50 nm, 35 nm, 30 nm, 20 nm, 10 nm, and 5 nm; technologies shown: SiGe S/D, strained silicon, high-k, metal gate, tri-gate, nanowire, III-V, carbon nanotube FET]

Trends cont'd
  [Figure: power dissipation [W] of a 100 mm² chip (0 to 1400 W) for the technologies 90 nm, 65 nm, 45 nm, 32 nm, 22 nm, and 16 nm, split into dynamic power dissipation and power dissipation by leakage currents. Source: S. Borkar (Intel), '05]

Recap: Transistor Geometries
  [Figure: transistor cross section with polysilicon gate, gate width W, gate length L, n+ source and drain, p-type body]
  SiO2 gate oxide (good insulator, ε_ox = 3.9)
  tox: thickness of the oxide layer
  Source: Rabaey, "Digital Integrated Circuits", 1995

Subthreshold Leakage
  Threshold voltage
    Transistor characteristic
    If the gate-source voltage Vgs is higher than Vth:
      Channel under the gate
      Current between drain and source
    If Vgs is lower than Vth (ideal):
      No current
  Subthreshold leakage Isub
    Leakage between drain and source when Vgs < Vth
    Based on:
      Short channels
      Diffusion
      Thermionic emission
  [Figure: transistors for Vgs > Vth and Vgs < Vth; Isub flows from drain to source by diffusion from high concentration to low concentration]

Subthreshold Leakage cont'd
  [Figure: log(drain current) versus gate voltage of an NMOS transistor, showing Isub at gate voltage 0, the threshold Vth, the shifted threshold Vth' of a short-channel device, and the region where the transistor is conducting. Source: Agarwal, 2007]

Drain Induced Barrier Lowering (DIBL)
  [Figure: potential between source and drain for Vgs > Vth and Vgs < Vth at a given Vds]
  Height of the curve = potential barrier
  Changed by the gate voltage
  Electrons have to overcome the potential barrier to enter the channel
  Ideal: the potential barrier is only controlled by the gate voltage

Drain Induced Barrier Lowering cont'd
  [Figure: potential between source and drain for a long-channel transistor (L > 2 µm) and a short-channel transistor (L < 180 nm); in the short-channel transistor the potential barrier is lowered]
  [Figure curves: Vds = Vth and Vds = VDD for each transistor]
  In short-channel transistors, the potential barrier is also affected by the drain voltage
  If Vds = VDD: transistors can start to conduct even if Vgs < Vth

Temperature dependence
  [Figure: normalized Isub/µm (0 to 20) versus temperature (0 to 120 °C) for 130 nm; labels: Isub at 25 °C, IOFF at 110 °C, 6x. Source: Chatterjee, Intel Labs]
  Based on thermionic emission: subthreshold leakage Isub increases with temperature

Gate Oxide Leakage
  Tunneling effect
    An electromagnetic wave strikes a barrier:
      Reflection + intrusion into the barrier
    If the thickness is small enough:
      The wave partially penetrates the barrier (electrons tunnel through the barrier)
  Gate oxide leakage Igate
    In nanometer transistors, where Tox < 2 nm:
      Electrons tunnel through the gate oxide
      Leakage current
  [Figure: potential energy versus position x across a barrier of thickness Tox; transistor cross section with gate, gate oxide, source, drain, and Igate]

Gate Oxide Thickness at 45 nm
  [Figure]

Gate Oxide Leakage cont'd
  Components of gate oxide leakage:
    Tunneling currents through the overlap regions (gate-source Igso, gate-drain Igdo)
    Tunneling currents into the channel (gate-drain Igcd, gate-source Igcs)
    Tunneling currents between gate and bulk (Igb)
  [Figure: transistor with gate, source, drain, and bulk, showing Igso, Igcs, Igcd, Igdo, and Igb]

Further Leakage Components
  Reverse-bias pn junction conduction Ipn
  Gate-induced drain leakage IGIDL
  Drain-source punchthrough IPT
  Hot carrier injection IHCI
  [Figure: transistor with gate, source, and drain, showing IHCI, IGIDL, IPT, and Ipn]

Leakage Dependencies
  Leakage depends on:
    Gate width (Isub, Igate)
    Gate length (Isub, Igate)
    Gate oxide thickness (Igate)
    Threshold voltage (Isub)
    Temperature (Isub)
    Input state (Igate)

LOW POWER TECHNIQUES

Lowering Dynamic Power
  Reducing VDD has a quadratic effect!
    Has a negative effect on performance, especially as VDD approaches 2 VT
  Lowering CL
    Improves performance as well
    Keep transistors at minimum size
  Reducing the switching activity, f01 = P01 * f
    A function of signal statistics and clock rate
    Impacted by logic and architecture design decisions

Power & Delay Dependence of Vth
  $P = p_t \cdot f_{CLK} \cdot C_L \cdot V_{DD}^2 + I_0 \cdot 10^{-V_{TH}/S} \cdot V_{DD} \cdot \frac{W}{W_0}$ (without gate leakage)
  $t_d = k \cdot \frac{Q}{I} = k' \cdot \frac{C_L \cdot V_{DD}}{(W/L)\,(V_{DD} - V_{TH})^{K}}$
  Source: Sakurai, '01 (Micro transductors '08, Low Leakage)

Transistor Sizing for Power Minimization
  [Figure: small W's give lower capacitance but need a higher voltage to keep performance; large W's give higher capacitance but allow a lower voltage. Source: Timmermann, 2007]
  Larger-sized devices: only useful when interconnects dominate
  Minimum-sized devices: usually optimal for low power

Logic Style and Power Consumption
  As the voltage decreases, the power-delay product improves
  The best logic style minimizes power-delay for a given delay constraint
  A new logic style can reduce power dissipation (if possible / available!)
  Source: Jan M. Rabaey

Logic Restructuring
  Logic restructuring: changing the topology of a logic network to reduce transitions
  AND: $P_{01} = P_0 \cdot P_1 = (1 - P_A P_B) \cdot P_A P_B$
  All inputs A, B, C, D have signal probability 0.5
  Chain implementation:
    W = A AND B: (1-0.25)*0.25 = 3/16
    X = W AND C: 7/64 = 0.109
    F = X AND D: 15/256
  Tree implementation:
    Y = A AND B: 3/16
    Z = C AND D: 3/16 = 0.188
    F = Y AND Z: 15/256
  The chain implementation has a lower overall switching activity than the tree implementation for random inputs
  BUT: this ignores glitching effects
  Source: Jan M. Rabaey

Input Ordering
  AND: $P_{01} = (1 - P_A P_B) \cdot P_A P_B$
  Signal probabilities: A = 0.5, B = 0.2, C = 0.1
  Ordering 1: X = A AND B, F = X AND C
    P01 at X: (1-0.5x0.2)*(0.5x0.2) = 0.09
  Ordering 2: X = B AND C, F = X AND A
    P01 at X: (1-0.2x0.1)*(0.2x0.1) = 0.0196
  Beneficial: postponing the introduction of signals with a high transition rate (signals with signal probability close to 0.5)
  Source: Jan M. Rabaey

Glitching
  [Figure: two-gate circuit with inputs A, B, C, internal node X, and output Z; waveforms of X and Z for the input change ABC = 101 → 000 under a unit delay model]
  Source: Jan M. Rabaey

Example 1: Chain of NAND Cells
  [Figure: chain of NAND cells with outputs out1 to out5 and following; output voltage V (Volt, 0 to 6) versus t (nsec, 0 to 3) for out1 to out8, with VDD / 2 marked]
  Source: Jan M. Rabaey

Example 2: Adder Circuit
  [Figure: adder with carry input Cin and sum outputs S0 to S15; sum output voltage (V) versus time (ps) for S0, S1, S2, S3, S4, S5, S10, and S15, with VDD / 2 marked]
  Source: Jan M. Rabaey

How to Cope with Glitching?
  [Figure: two arrangements of the blocks F1, F2, and F3 with the arrival times of their signals]
  Equalize the lengths of timing paths through the design
  Source: Jan M. Rabaey

Clock Gating
  Power is reduced by two mechanisms
    Clock net toggles less frequently, reducing feff
    Registers' internal clock buffering switches less often
  [Figure: local gating of a register (din, en, clk, dout); global gating with an FSM that drives the enables enF, enE, and enM for the execution unit and the memory control]
  Source: Jan M. Rabaey

Clock Gating Insertion
  Local clock gating: 3 methods
    Logic synthesizer finds and implements local gating opportunities
    RTL code explicitly specifies clock gating
    Clock gating cell explicitly instantiated in RTL
  Global clock gating: 2 methods
    RTL code explicitly specifies clock gating
    Clock gating cell explicitly instantiated in RTL
  Source: Jan M. Rabaey

Clock Gating VHDL Code
  Conventional RTL Code
    -- always clock the register
    process (clk)
    begin
      if rising_edge(clk) then      -- form the flip-flop
        if enable = '1' then
          q <= din;
        end if;
      end if;
    end process;
  Low Power Clock Gated RTL Code
    -- only clock the register when enable is true
    gclk <= enable and clk;         -- gate the clock
    process (gclk)
    begin
      if rising_edge(gclk) then     -- form the flip-flop
        q <= din;
      end if;
    end process;
  Instantiated Clock Gating Cell
    -- instantiate a clock gating cell from the target library
    I1: clkgx1 port map (en => enable, cp => clk, gclk_out => gclk);
    process (gclk)
    begin
      if rising_edge(gclk) then     -- form the flip-flop
        q <= din;
      end if;
    end process;
  Source: Jan M. Rabaey

Clock Gating: Example
  MPEG4 decoder
  [Figure: power in mW (0 to 25) of the blocks VDE, MIF, DSP/HIF, and DEU with and without clock gating; chip with 896 Kb SRAM]
  Without clock gating: 30.6 mW
  With clock gating: 8.5 mW
  90 % of the flip-flops are clock-gated
  70 % power reduction by clock gating
  Source: M. Ohashi, Matsushita, 2002

Data Gating
  Objective
    Reduce wasted operations => reduce feff
  Example
    Multiplier whose inputs change every cycle and whose output conditionally feeds an ALU
  Low Power Version
    Inputs are prevented from rippling through the multiplier if the multiplier output is not selected
  Source: Jan M. Rabaey

Data Gating Insertion
  Two insertion methods
    Logic synthesizer finds and implements data gating opportunities
    RTL code explicitly specifies data gating
      Some opportunities cannot be found by synthesizers
  Issues
    Extra logic in the data path slows timing
    Additional area due to gating cells
  Source: Jan M. Rabaey

Data Gating Verilog Code: Operand Isolation
  Conventional Code
    assign muxout = sel ? A : A*B;              // build mux
  Low Power Code
    // N is the operand width (hypothetical parameter, not given on the slide)
    assign multinA = A & {N{~sel}};             // build and cell: input held at 0 while the product is not selected
    assign multinB = B & {N{~sel}};             // build and cell
    assign muxout = sel ? A : multinA*multinB;
  [Figure: multiplier X with inputs A and B feeding a mux controlled by sel, without and with and cells at the multiplier inputs]
  Source: Jan M. Rabaey

Influence of Threshold Voltage Vth
  Threshold voltage Vth:
    Influence on subthreshold leakage Isub
    Influence on delay of logic cells
  [Figure: inverter (BPTM 65 nm); leakage Isub [nA] (0 to 160) and delay [ps] (30 to 55) versus NMOS threshold voltage Vth [V] (0.25 to 0.37)]

Influence of Gate Oxide Thickness Tox
  Gate oxide thickness Tox:
    Influence on gate oxide leakage Igate
    Influence on delay
  [Figure: inverter (BPTM 65 nm); leakage Igate [nA] (0 to 160) and delay [ps] (25 to 50) versus gate oxide thickness Tox [nm] (1.4 to 2.2)]

Recap: Data Paths
  Data propagate through different data paths between registers (flip-flops, FF)
  Paths mostly differ in propagation delay times
  The frequency of the clock signal (CLK) depends on the path with the longest delay: the critical path
  [Figure: flip-flops clocked by CLK with several paths between them]

Recap: Slack
  [Figure: gate G1 with inputs A and B drives gate G2, which has the second input C and output Y; timeline: all inputs of G1 arrived, delay of G1, G1 ready with evaluation, all inputs of G2 arrived; the interval between G1 being ready and all inputs of G2 having arrived is the slack for G1]

Dual-Vth / Dual-Tox
  Two different cell types:
    "LVT / LTO" cells
      Cells consist of "low-Vth" or "low-Tox" transistors
      Low threshold voltage or thin gate oxide layer
      For critical paths
      High leakage / short delay
    "HVT / HTO" cells
      Cells consist of "high-Vth" or "high-Tox" transistors
      High threshold voltage or thick gate oxide layer
      For uncritical paths
      Low leakage / long delay
  Leakage reduction at constant performance (no level converter necessary)

Performance at different Dual-Vth
  [Figure: normalized performance (0.0 to 1.0) for low Vth and high Vth at supply voltages VDD of 1.0 V, 0.9 V, 0.8 V, 0.7 V, and 0.6 V; measured at a NAND2 cell, BPTM 65 nm technology]

Leakage Isub at different Dual-Vth
  [Figure: subthreshold leakage [nA] (0 to 80) for low Vth and high Vth at supply voltages VDD of 1.0 V, 0.9 V, 0.8 V, 0.7 V, and 0.6 V; measured at a NAND2 cell, BPTM 65 nm technology]

Dual-Vth / Dual-Tox Example
  [Figure: netlist with LVT and/or LTO cells on the critical path and HVT and/or HTO cells elsewhere]

Stack Effect
  Transistor stack: at least two transistors of the same type (NMOS or PMOS) in series
  Based on the behavior of internal nodes: the more transistors are non-conducting (off), the lower the leakage
  [Figure: leakage Isub [nA] (0 to 10) versus number of transistors off in the stack (1 to 4). Source: K. Roy]

Sleep Transistors
  Idea: insertion of additional transistors between the logic block and the supply lines
  These transistors are connected to the SLEEP signal
  If the circuit has nothing to do, the SLEEP signal is active:
    Stack effect (additional off transistor in series with the others)
  If the sleep transistors are high-Vth: approach also called Multi-Threshold CMOS (MTCMOS)
  [Figure: low-Vth logic cells between virtual Vdd and virtual Vss, connected to Vdd and Vss through sleep transistors]
  Mostly, only one transistor is inserted
  Source: Kaijian Shi, Synopsys

Sleep Transistors: Realization
  Ring style sleep transistor implementation
  [Figure: global VDD with a VVDD1 domain and a VVDD2 domain]
  Sleep transistors are placed around each VVDD island
  Source: Kaijian Shi, Synopsys

Sleep Transistors: Realization cont'd
  Grid style sleep transistor implementation
  [Figure: global VDD grid with interleaved VVDD1 and VVDD2 rails]
  VDD network across the chip; VVDD networks in each gating domain
  Sleep transistors are placed in a grid connecting VDD and the VVDDs
  Source: Kaijian Shi, Synopsys

Sleep Transistors: Problems
  [Figure: CMOS gate / block connected to VDD through a high-Vth sleep transistor controlled by SLEEP, and the equivalent circuit with resistor R, current I, and block supply VDD - Vx]
  The sleep transistor can be modeled as a resistor R
  In active mode (cell is working):
    Current I through the sleep transistor
    Voltage drop Vx over the resistor: $V_x = R \cdot I$
    Output voltage reduced to VDD - Vx
  Current I is not a leakage current!
  I is a discharging current of the load capacitances
  Increased delay (of following blocks)

Stackforcing
  Simple method of using the stack effect
  Increasing the stack by splitting transistors
  Cin stays constant
  Only one technology is needed
  Area is (almost) the same
  Drive strength (drain-source current) is reduced: delay increases
  [Figure: inverter with transistor widths WP and WN, and stack-forced inverter in which each transistor is split into two series transistors of width WP/2 and WN/2]

Stackforcing cont'd
  [Figure: normalized delay versus normalized Isub, with the point "No Stackforcing" marked. Source: Narendra, et al., ISLPED01]

Input Vector Control (IVC)
  Leakage of a cell depends on the input vector
  [Figure: cell with inputs In1, In2, In3 and an NMOS stack of TN3, TN2, TN1 (TN3 at the top)]
    Input vector (In3 In2 In1)   Leakage [nA]   Transistors off in NMOS stack
    0 0 0                        0.1            TN3, TN2, TN1
    0 0 1                        0.2            TN3, TN2
    0 1 0                        0.2            TN3, TN1
    0 1 1                        1.9            TN3
    1 0 0                        0.2            TN2, TN1
    1 0 1                        1.3            TN2
    1 1 0                        1.2            TN1
    1 1 1                        9.4            -

Input Vector Control cont'd
  Every circuit has an input vector with minimum leakage
  Idea: if the design is in passive mode
    SLEEP signal becomes active
    Sleep vector is applied
  [Figure: a MUX controlled by SLEEP selects between the data and the sleep vector as input of the logic circuit]

Pin Reordering
  Example (BPTM, 65 nm technology)
  [Figure: NMOS stack T3 (at the drain), T2, T1 with inputs In3, In2, In1 and the currents Igdo, Igcd, Igso, Igcs]
    Input vector [In3,In2,In1]   T3     T2                       T1                       |Igate,stack|
    001                          Igdo   -                        Igcs, Igso, Igcd, Igdo   → 65.9 nA
    010                          Igdo   Igci, Igcs, Igdo, Igcd   -                        ↑ 42.8 nA
    100                          -      Igdo                     -                        ↓ 10.3 nA
    101                          -      Igdo                     Igcs, Igso, Igcd, Igdo   → 58.7 nA
    110                          -      -                        Igdo                     ↓ 7.6 nA
    011                          Igdo   Igci, Igso, Igdo, Igcd   Igcs, Igso, Igcd, Igdo   ↑ 116.0 nA
  Gate leakage in a stack depends on the input vector
  Logically equivalent input vectors (same number of '0's and '1's) → can result in different leakage
  If the input probability is known: reorder the pins so that the most probable state has minimum gate leakage

Delay and Power versus VDD
  [Figure: relative delay td and relative Pdyn versus supply voltage VDD (0.8 to 2.4)]
  Dynamic power (and leakage) can be traded for delay

Adaptive Dynamic Voltage/Frequency Scaling (DVS/DFS)
  Slow down the processor to fill idle time
  More delay → lower operating voltage
  [Figure: schedule alternating between active and idle phases at 3.3 V, compared with continuous operation at 2.4 V]
  Runtime scheduler determines the processor speed and selects the appropriate voltage
  Transition delay for frequency changes < 150 µs
  Potential to realize 10x energy savings
  E.g.: Intel SpeedStep, AMD PowerNow, Transmeta LongRun

DVS/DFS with Transmeta LongRun
  [Figure: % of max power consumption (0 to 100) versus frequency (300 to 1000 MHz), with a typical operating region and a peak performance region. Source: Transmeta]
    Frequency   Supply voltage
    300 MHz     0.80 V
    433 MHz     0.87 V
    533 MHz     0.95 V
    667 MHz     1.05 V
    800 MHz     1.15 V
    900 MHz     1.25 V
    1000 MHz    1.30 V

Multi-VDD
  Objective
    Reduce dynamic power by reducing the VDD² term
    Higher supply voltage used for speed-critical logic
    Lower supply voltage used for non-speed-critical logic
  Example
    Memory VDD = 1.2 V
    Logic VDD = 1.0 V
    Logic dynamic power savings = 30 %
  Source: Jan M. Rabaey

Multi-VDD Issues
  Partitioning
    Which blocks and modules should use which voltages?
    Physical and logical hierarchies should match as much as possible
  Voltages
    Voltages should be as low as possible to minimize C·VDD²·f
    Voltages must be high enough to meet timing specs
  Level shifters
    Needed (generally) to buffer signals crossing islands
    Added delays must be considered
  Physical design
    Multiple VDD rails must be considered during floorplanning
  Timing verification
    Timing verification must be performed for all corner cases across voltage islands
  Source: Jan M. Rabaey

Multi-VDD Flow
  Determine which blocks run at which Vdd
  Multi-voltage synthesis
  Determine floor plan
  Multi-voltage placement
  Clock tree synthesis
  Route
  Verify timing
  Source: Jan M. Rabaey

Power-oriented Programming
  [Figure: switched capacitance (nF, 0 to 14000) for bubble.c, heap.c, and quick.c, broken down into register file, pipeline registers, functional unit, and others. Source: Irwin, 2000]
  Algorithms can differ in power dissipation
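The voltage and frequency scaling trade-off from the DVS/DFS and Multi-VDD slides can be made concrete with a rough calculation. The Python sketch below assumes that dynamic power scales as f * VDD² with constant switched capacitance and activity, and it ignores leakage and short-circuit power. The operating points are those listed on the LongRun slide; the function names are hypothetical.

```python
# Operating points (frequency in MHz, supply voltage in V) as listed on the
# Transmeta LongRun slide.
LONGRUN_POINTS = [
    (300, 0.80), (433, 0.87), (533, 0.95), (667, 1.05),
    (800, 1.15), (900, 1.25), (1000, 1.30),
]

def relative_dynamic_power(f_mhz, v_dd, f_max=1000, v_max=1.30):
    """Dynamic power relative to the fastest point, assuming P ~ f * V_DD^2
    with constant switched capacitance and activity (leakage ignored)."""
    return (f_mhz * v_dd ** 2) / (f_max * v_max ** 2)

def multi_vdd_saving(v_high, v_low):
    """Fractional dynamic power saving when a block moves from v_high to
    v_low at unchanged frequency, capacitance, and activity."""
    return 1.0 - (v_low / v_high) ** 2

if __name__ == "__main__":
    for f_mhz, v_dd in LONGRUN_POINTS:
        share = 100.0 * relative_dynamic_power(f_mhz, v_dd)
        print(f"{f_mhz:5d} MHz @ {v_dd:.2f} V -> {share:5.1f} % of peak dynamic power")
    saving = 100.0 * multi_vdd_saving(1.2, 1.0)
    print(f"Logic at 1.0 V instead of 1.2 V saves {saving:.1f} %")
```

Under this simplified model, running logic at 1.0 V instead of 1.2 V saves about 30.6 % of its dynamic power, in line with the 30 % quoted in the Multi-VDD example. The percentages for the LongRun points are model estimates, not measured values.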