Power Consumption by Integrated Circuits
Lin Zhong
ELEC518, Spring 2011

Power consumption of processing
• Dynamic power

Busy power vs. delay vs. energy
• P_dyn = a∙C∙V_dd^2∙f
• t ∝ V_dd / (V_dd − V_T)
Source: Analysis and Design of Digital ICs, Hodges et al.

Core 2 Duo for example
• Intel® Core™2 Duo processor
  – T7800 at 2.6GHz
  – T7700 at 2.4GHz, available on Thinkpad T61p
  – 0.75-1.35V, 35 Watts
• Intel® Core™2 Duo Low Voltage
  – L7500 at 1.6GHz, available on Thinkpad X61
  – 0.75-1.3V, 17 Watts
• Intel® Core™2 Duo Ultra Low Voltage
  – U7500 at 1.06GHz, available on Dell D430
  – 0.75-0.975V, 10 Watts

Switching energy
• e = 1/2∙C∙V^2

Switching power
• P = b∙C∙V^2 = a∙C∙V^2∙f

Higher integration
• Selling the chipset (or solution or platform)
  – Intel Centrino
    • Centrino Duo includes Core 2 Duo processor, 9XX Express-series chipset, and Wi-Fi adapter
  – TI TCS2600 chipset

System-on-a-chip (SoC)
• TI OMAP

SiP: Multiple-chip product (MCP)
• 400MHz, 32MB (Source: Intel.com)
• Siemens SX66 PDA Phone
• Audiovox PPC6601KIT

SiP: Stacked-die approach
• Qualcomm 3G CDMA2000 chip
• Seven power regimes
• 100 clock regimes
Source: ISSCC 2004

Moore’s Law
[Figure labels: Exciting, Unknown, Known]

MOSFET at nanoscale
Source: Sunlin Chou, “Extending Moore’s Law in the Nanotechnology Era” (www.intel.com).

Given workload L and deadline T
• L measured by # of CPU cycles
• Clock speed f ≥ L/T
• Time to finish: t = L/f
• Energy to finish: P∙t = a∙C∙V^2∙f∙t = a∙C∙V^2∙L

Effect of lower clock speed (f)
• Power consumption: P = a∙C∙V^2∙f
• Energy consumption: E = P∙t = a∙C∙V^2∙f∙t = a∙C∙V^2∙L

Effect of lower supply voltage (V)
• Maximum clock speed: f = b∙V
• Power consumption: P = a∙C∙V^2∙f = k∙V^3 = x∙f^3
• Energy consumption: E = P∙t = a∙C∙V^2∙f∙t = a∙C∙V^2∙L

Given workload L and deadline T: single processor
• The processor can run at any frequency (voltage)
  – f = b∙V
• The processor can be completely off when work is done (zero power when idle)
• To minimize energy consumption, at which frequency should the processor run?
  – f ≥ L/T (in order to meet the deadline)
  – E = P∙t = a∙C∙V^2∙f∙t = a∙C∙V^2∙L
  – f = ????

[Figure: frequency f vs. time; f1 = L/T runs for T, f2 = L/(T/2) = 2∙f1 runs for T/2]

[Figure: power P vs. time; P1 = x∙f^3 over T, P2 = 2^3∙P1]

Given workload L and deadline T: M processors
• The workload can be divided without overhead: L = L1+L2+…+LM (L ≥ Li ≥ 0)
• To minimize energy consumption, at which frequency should processor i run?
  – fi = Li/T and V = u∙Li
  – Ei = a∙C∙V^2∙Li = w∙Li^3

Given workload L and deadline T: M processors
• The workload can be divided without overhead: L = L1+L2+…+LM (L ≥ Li ≥ 0)
• To minimize the TOTAL energy consumption, how should the workload be allocated?
  – E = E1+E2+…+EM = w∙L1^3+w∙L2^3+…+w∙LM^3
  – = w∙(L1^3+L2^3+…+LM^3)

From high school
• [(a+b)/2]^2 ≤ (a^2+b^2)/2
• Quadratic mean ≥ Arithmetic mean ≥ Geometric mean ≥ Harmonic mean

From high school (Contd.)
• [(a+b)/2]^3 ≤ (a^3+b^3)/2 (for a, b ≥ 0)
  – E = w∙(L1^3+L2^3+…+LM^3) ??? (L1+L2+…+LM)^3

From college: Convex (Concave)
• By definition of “convex”

Jensen’s Inequality (finite form)
• ϕ(x) is convex
  – ϕ(t∙x1+(1-t)∙x2) ≤ t∙ϕ(x1)+(1-t)∙ϕ(x2)
http://en.wikipedia.org/wiki/Jensen%27s_inequality#Proof_1_.28finite_form.29

• ai = 1/n
• ϕ(x) = x^2 (convex)
  – (x1^2+x2^2+…+xn^2)/n ≥ [(x1+x2+…+xn)/n]^2
• ϕ(x) = x^3 (convex for x ≥ 0)
  – E = w∙(L1^3+L2^3+…+LM^3) = w∙M∙(L1^3+L2^3+…+LM^3)/M
  – ≥ w∙M∙[(L1+L2+…+LM)/M]^3 = w∙L^3/M^2

More about Convexity
[Figure: Return vs. Cost]
Example                              Cost                Return
Workload distribution                Energy              Workload finished within T
Eating                               Price of apples     Pleasure from eating apples
Helicopter engine                    Price of engine     Engine thrust
Law of diminishing marginal returns  Cost of production  Increase in production

More about Convexity
[Figure: Return vs. Cost]
• Greedy optimization works
• Combine simpler/cheaper components

Check the assumptions
• Power consumption is zero when the processor is not active

Idle power (Static power)
[Equations: Pstatic as a function of temperature T and of supply voltage V_dd; the exact forms are not recoverable from the extracted text]
• When the IC is idle but not powered off, e.g. SRAM

Leakage power

Scaling down

Scaling down (Contd.)
• Quantum dynamics: individual molecules
  – High variation and likely defective
• Thermodynamics: gas
  – Uniform (central limit theorem)

Scaling: Not that simple (Contd.)
• Tunneling effect

[Figure: frequency f vs. time; f1 = L/T runs for T, f2 = L/(T/2) = 2∙f1 runs for T/2]

[Figure: power P vs. time; P1 = x∙f^3 over T]

[Figure: power P vs. time; P1 = x∙f^3 + Pstatic over T]

[Figure: power P vs. time; P1 = x∙f^3 + Pstatic over T, P2 = 2^3∙x∙f^3 + Pstatic]

Why is static power important?
Source: ITRS, 2009

Pentium II (Klamath) and III (Coppermine)
• Pentium II (Klamath): 7.5M transistors
• Pentium III (Coppermine): 28M transistors

Core 2 Duo (Conroe)
• Core 1, Core 2
• 64KB L1 cache, 4MB L2 cache, 291M transistors

Solutions to “never-enough” challenge
• 234M transistors
• 24M go to L2 cache
• 8 SPEs, each 20.9M transistors (167M transistors)
• Each has four 64KB SRAMs (12M transistors)
• SRAM takes 122M transistors (>50%)

Multiple power/clock domains
• Multimedia phone: NTT DoCoMo 3G FOMA 902i to be released with OMAP2420
• TI OMAP 2 architecture, ISSCC 2005

Given workload L and deadline T: single processor
• One processor can run at any frequency (voltage)
  – f = b∙V
• The processor can be completely off when work is done (zero power when idle)
  – Given Pstatic
  – Given energy overhead of shutting down the processor (Eoverhead)
• To minimize energy consumption, at which frequency should the processor run?

[Figure: power P vs. time; P1 = x∙f^3 + Pstatic over T, P2 = 2^3∙x∙f^3 + Pstatic]

Why is there overhead to power off a circuit?

Clock generator
• Resonant circuit + amplifier
[Figure labels: A (amplifier), Res (resonant circuit)]
• Resonant circuit (Oscillator)
  – Crystal oscillator (>2×10^9/yr)
    • ~10KHz to ~10MHz
    • Quartz, ceramics (low cost, low accuracy), surface acoustic wave (SAW) quartz crystal (expensive, accurate)
    • Real-time clocks
      – 32.768KHz (2^15 Hz), 4.194304MHz (2^22 Hz)
    • Application-specific
      – 4.9152MHz (4 x 1.2288MHz, CDMA baseband frequency)……

Oscillator (Contd.)
• LC/RLC circuit
• Ring oscillator
  – Application other than oscillator?
• Voltage-controlled oscillator (VCO)
  – Varicap: variable capacitance diode (tuning diode)
  – Phase-locked loop for high-speed clock (next slide)
  – Frequency scaling of IC for energy saving

Phase-locked loop (PLL)
• High-speed clock from a master oscillator
• Digital PLL
[Diagram: Master oscillator → Phase-frequency detector → (voltage) → VCO, with Frequency divider (N) in the feedback path]
• Clock generation, recovery, synchronization
  – Digital computing, RF communication

Given workload L and deadline T: single processor
• The processor can run at any frequency (voltage)
  – f = b∙V
• The processor can be completely off when work is done (zero power when idle)
• To minimize energy consumption, at which frequency should the processor run?
  – f ≥ L/T (in order to meet the deadline)
  – E = P∙t = a∙C∙V^2∙f∙t = a∙C∙V^2∙L
  – f = ????

Threshold voltage

V_dd scales slowly and V_th scales more slowly
• V_th is limited by the thermal voltage
• V_dd needs to stay considerably higher than V_th to curb leakage current
• Ends up breaking the scaling rules
  – Low channel mobility
Source: Plummer and Griffin, 2001 (Data from ITRS/NTRS)

Check the assumptions (Contd.)
• The workload can be divided without overhead: L = L1+L2+…+LM (L ≥ Li ≥ 0)
• Communication cost between processors!!!

Quadrotor vs. Helicopter
• De Bothezat Quadrotor, 1923
• A.R. Drone, 2010

Wire power consumption
• Inter-processor communication
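
Worked example: energy vs. frequency and workload split
The single-processor and M-processor questions posed in these slides can be explored numerically. The sketch below is a minimal Python illustration, assuming the slides' model (f = b∙V, so P = x∙f^3, and zero power once the processor is off). The function names are hypothetical and not from the lecture. The closed form in best_frequency is also not from the slides: it comes from setting dE/df = 0 for E(f) = (x∙f^3 + Pstatic)∙L/f + Eoverhead, and it holds only under those assumptions (the processor always shuts down after finishing, and communication between processors is free).

```python
def total_energy(x, f, L, p_static=0.0, e_overhead=0.0):
    """Energy to run L cycles at frequency f, then power off.

    Assumes P = x*f**3 (dynamic) + p_static while running, zero power
    after shutdown, and a fixed shutdown cost e_overhead (assumption).
    """
    t = L / f  # time to finish
    return (x * f**3 + p_static) * t + e_overhead


def best_frequency(x, L, T, p_static=0.0):
    """Frequency minimizing total_energy subject to f >= L/T.

    With no static power, slower is always better, so the answer is L/T.
    With static power, E(f) = x*f**2*L + p_static*L/f + const has an
    unconstrained minimum at f**3 = p_static / (2*x).
    """
    f_min = L / T
    if p_static <= 0.0:
        return f_min
    f_crit = (p_static / (2.0 * x)) ** (1.0 / 3.0)
    return max(f_min, f_crit)


def split_energy(w, loads):
    """Total energy E = w * sum(Li**3) when processor i runs Li cycles within T."""
    return w * sum(l**3 for l in loads)


if __name__ == "__main__":
    x, L, T = 1.0, 10.0, 10.0
    # No static power: doubling f quadruples the energy for the same workload.
    print(total_energy(x, 1.0, L), total_energy(x, 2.0, L))
    # With static power, the slowest feasible frequency may no longer be best.
    print(best_frequency(x, L, T, p_static=16.0))
    # Even split across M = 2 processors vs. putting all work on one.
    print(split_energy(1.0, [5.0, 5.0]), split_energy(1.0, [10.0, 0.0]))
```

With x = 1, L = 10, T = 10 and Pstatic = 16, the critical frequency is about 2, twice the minimum L/T = 1: once static power is counted, finishing early and shutting down can beat running as slowly as the deadline allows. The even split [5, 5] costs w∙L^3/M^2 = 250 units, while [10, 0] costs 1000, matching the bound from Jensen's inequality.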