Low Power Architecture and Implementation of Multicore Design

Khushboo Sheth, Kyungseok Kim, Fan Wang, Siddharth Dantu
Advisor: Dr. V Agrawal
ELEC6270 Low Power Design of Electronic Circuits Team Project
VLSI D&T Seminar, Nov. 8 2006

Project Objectives

• Design and verify a 16-bit ALU with synchronous clocked inputs and outputs.
• Study the low-voltage power and delay characteristics of the design.
• Redesign the ALU for minimum power and highest speed.

Components of Power Dissipation

• Dynamic power due to signal transitions.
    • Logic power (due to logic transitions).
    • Glitch power (due to glitches).
• Short-circuit power.
• Static leakage power (due to leakage currents).

Power Components in CMOS Circuit

[Figure: CMOS circuit between VDD and ground with input vi(t), output vo(t), on-resistance Ron, a large off-resistance R and load capacitance CL, marking the dynamic, short-circuit and leakage power paths. Annotation: Power = C V_DD^2.]

1-bit ALU Design

[Figure: block diagram with Reg A, Reg B, the 1-bit ALU core and Reg C.]

1-bit ALU Core Simulation Specification

Technology                       TSMC 0.25 um
Application voltage              2.5 V
NMOS Vth                         0.365 V
PMOS Vth                         -0.5625 V
Temperature                      90 °C
SPICE simulator                  Eldo ver. 6.3.1.1
Supply voltage sweep (6 points)  0, 0.5, 1.0, 1.5, 2.0, 2.5 V

1-bit ALU Core Timing (Vdd = 2.5 V)

[Figure: inputs A, B, CYIN, opcode[3:0] and CLK drive the combinational logic, followed by DFFs; outputs C, CY, Z and COMPOUT; labeled nets NX156 (C), NX60, NX16 and NX80. Timing waveforms for C, CY, Z and COMPOUT.]

Opcode   Operation
0000     a+b
0001     a-b
0010     equal
0011     not equal
0100     xor
0101     nor
0110     or
0111     and
1000     c <= a
1001     c <= b
1010     nand
others   all-zero output

Longest path in combinational logic: c <= a+b (opcode 0000)

1-bit ALU Core: Sweep Vdd from 2.5 V to 0 V

[Figure: analog-mode waveforms of output C (NX156) for Vdd = 2.5, 2.0, 1.5, 1.0, 0.5 and 0.0 V.]

1-bit ALU Core Logic Operation Voltage @200 MHz

• Supply voltage sweep near PMOS Vth = -0.5625 V (vs. NMOS Vth = 0.365 V).
• Sweep from Vsupply = 0.50 V to 1.00 V (linear increment 0.05 V, 11 points).
• Opcode 1000 (c <= a), analog domain:
    • Vsupply = 0.80 V: wrong operation.
    • Vsupply = 0.85 V: correct operation.

[Figure: input and output waveforms at Vsupply = 0.80 V and Vsupply = 0.85 V, with overshoot and ripples marked.]

1-bit ALU Average Power vs. Delay @200 MHz

[Figure: average power (uW) of the total 1-bit ALU block vs. the 1-bit ALU core, and 1-bit ALU core delay (ns), plotted against Vsupply (V) from 0 to 2.5 V, annotated Power = C V_DD^2. Data labels visible in the plot: 354.563, 2.2493, 179.9153, 1.4203, 82.8828, 31.0283, 0.5427, 0.4955, 0.7204, 0.4123.]

16-Bit ALU (Single Core) Design

[Figure: Input -> Register -> Combinational Logic (16-bit ALU) -> Register -> Output, clocked by CK, with switched capacitance Cref.]

Supply voltage                         = V_ref
Total capacitance switched per cycle   = C_ref
Clock frequency                        = f
Power consumption:                     P_ref = C_ref V_ref^2 f

16-Bit ALU Vectors

          a                  b                  Opcode        cyin
Vector1   1010101010101010   0001010101010101   0001 (sub)    0
Vector2   0101010101010101   1010101010101010   0011 (comp)   0
Vector3   0101010101010101   1010101010101010   0100 (xor)    0
Vector4   1111111111111111   0000000000000001   0000 (add)    0
Vector5   0110011001100110   0000000000000000   1010 (nand)   0
Vector6   0001011001101101   0101010010101010   0001 (sub)    0

* Vector4 activates the critical path, carryout = 1.

16-Bit ALU Simulation Result

Circuit information: 694 gates
Temperature: 27 °C
Clock frequency applied: 10 MHz
Vectors applied: 6 vectors
Simulation time: 700 ns
TSMC025 technology: Vthn = 0.365 V, Vthp = -0.562 V
Simulator: ELDO (SPICE)

Voltage (V)          2.5      1.25     0.85     0.625    0.45
Static Power (nW)    24.55    6.02     3.05     1.84     1.71
Average Power (uW)   391.16   62.62    26.66    14.67    3.56
Delay (ns)           2.83     7.14     18.88    73.21    Ckt failed

• The 16-bit ALU operates correctly at 2.5 V, 1.25 V, 0.85 V and 0.625 V for all 6 vectors.
• The circuit fails at 0.45 V (< Vth).

[Figure: simulated single vector pair.]

16-Bit ALU Power Savings and Delay Increase with Reference @2.5 Volts

Voltage (V)          2.5 (VDD, reference)   1.25 (VDD/2)   0.85 (VDD/3)   0.625 (VDD/4)
Average Power (uW)   391.16                 62.62          26.66          14.67
Power ratio          -                      P2.5/6.24      P2.5/14.67     P2.5/26.66
Power saving         -                      84%            93%            96%
Delay (ns)           2.83                   7.14           18.87          73.21
Delay ratio          -                      2.57*D2.5      6.67*D2.5      25.87*D2.5

16-Bit ALU Power Savings and Delay Increase with Reference @1.25 Volts

Voltage (V)          1.25 (VDD, reference)   0.85 (VDD/1.5)   0.625 (VDD/2)
Average Power (uW)   62.62                   26.66            14.67
Power ratio          -                       P1.25/2.35       P1.25/4.27
Power saving         -                       57%              77%
Delay (ns)           7.14                    18.87            73.21
Delay ratio          -                       2.63*D1.25       10.25*D1.25

Different Technology Impact on Power Saving

16-bit ALU simulation setup:
• Supply voltage: 2.5 V
• Simulation transient time: 700 ns
• 6 vectors
• Temperature: 27 °C

Technology              TSMC035     TSMC025
Gates after synthesis   734         694
Voltage                 2.5 V       2.5 V
Static Power            24.555 nW   24.550 nW
Average Power           381.60 uW   391.16 uW
Delay                   3.12 ns     2.83 ns

Temperature Influence on Power

Circuit information: 734 gates
Clock frequency applied: 10 MHz; Vdd = 2.5 V
Vectors applied: 6 vectors
Simulation time: 700 ns
TSMC035 technology

Temperature (°C)     0        27       60       90       120      900
Static Power (nW)    12.7     24.5     75.51    357.36   4803.3   3.38 mW
Average Power (uW)   404.23   381.60   378.15   367.48   363.15   70.43 W
Delay (ns)           2.58     3.12     3.18     3.53     3.91     Ckt fail

Multicore Design Methodology

• Lower the supply voltage.
    • This slows down circuit speed.
    • Use parallel computing to gain the speed back.
• Multi-core means placing two or more complete cores within a single module.
• The architecture is a "divide and conquer" strategy: by splitting the work between multiple execution cores, a multi-core design can perform more work within a given clock cycle.
• A power reduction of more than 60% is observed.

Source: http://www.eng.auburn.edu/~vagrawal/D&TSEMINAR_SPR06/SLIDES/Agrawal_DTSem06.ppt

Parallel Architecture

[Figure: Input -> input register clocked by CK at f -> four copies of the 16-bit ALU combinational logic (Copy 1 to Copy 4), each followed by a register clocked at f/4 by Ck0 to Ck3 -> 4-to-1 multiplexer with mux control -> Output at f.]

Control Signals, N = 4

[Figure: timing of CK, Phase 1 to Phase 4 and the mux control, which steps through 00, 01, 10, 11 and repeats.]

16-Bit ALU Multi-core Power Savings and Delay Increase with Reference @2.5 Volts

Circuit information: 2617 gates
Clock frequency applied: 10 MHz
Temperature: 27 °C
Vectors applied: 6 vectors
Simulation time: 700 ns
TSMC025 technology: Vthn = 0.365 V, Vthp = -0.562 V
Simulator: ELDO (SPICE)

Voltage (V)          2.5 (reference)   1.25 (VDD/2)   0.85 (VDD/3)   0.625 (VDD/4)   0.45
Static Power (nW)    96.35             23.56          11.94          7.21            6.37
Average Power (uW)   687.86            95.64          40.93          21.13           7.26
Power ratio          -                 P2.5/7.19      P2.5/16.8      P2.5/32.55      -
Power saving         -                 86%            94%            96.9%           -
Delay (ns)           0.11              0.57           1.52           30.70           Ckt failed
Delay ratio          -                 5.18*D2.5      13.8*D2.5      279.1*D2.5      -

16-Bit ALU Multicore Power Savings and Delay Increase with Reference @1.25 Volts

Voltage (V)          1.25 (VDD, reference)   0.85 (VDD/1.5)   0.625 (VDD/2)
Average Power (uW)   95.64                   40.93            21.13
Power ratio          -                       P1.25/2.33       P1.25/4.52
Power saving         -                       57%              78%
Delay (ns)           0.57                    1.52             30.7
Delay ratio          -                       2.67*D1.25       53.86*D1.25

Power and Delay Comparison: Reference Design @2.5 V vs. Multicore Design at Different Voltages

Voltage (V)          2.5 (VDD)   1.25 (VDD/2)   0.85 (VDD/3)   0.725 (VDD/3.5)   0.7 (VDD/3.6)   0.625 (VDD/4)
Design               Reference   Multicore      Multicore      Multicore         Multicore       Multicore
Average Power (uW)   391.16      95.64          40.93          25.6              22.35           21.14
Power ratio          -           P2.5/4.09      P2.5/9.56      P2.5/15.23        P2.5/17.5       P2.5/18.5
Power saving         -           76%            89.5%          93.45%            94.3%           94.6%
Delay (ns)           2.83        0.57           1.52           2.61              3.04            30.7
Delay ratio          -           D2.5/4.96      D2.5/1.86      D2.5/1.08         D2.5/0.93       D2.5/0.09

Summary

• The single-core ALU design gives more than 60% power savings at reduced voltage, but at the cost of performance.
• With a 2.5 V reference, power drops faster than the V^2 scaling of P = C V_DD^2 f predicts (for example P2.5/6.24 at VDD/2, against an ideal factor of 4).
• With a 1.25 V reference, the power drop is almost equal to the V^2 prediction (P1.25/4.27 at VDD/2).
• The multi-core design gains the speed back at reduced voltage and consumes less power.

References

• ELEC6270 Low Power Design of Electronic Circuits, class slides from Dr. Agrawal, Spring 06.
• Dr. Agrawal's presentation at the VLSI D&T seminar, "Multi-Core Parallelism for Low-Power Design".
• www.tomshardware.com
• N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts: Addison-Wesley, 2005.
• L. Shang and R. P. Dick, "Thermal crisis: challenges and potential solutions," IEEE Potentials, vol. 25, issue 5, 2006.
• International Technology Roadmap for Semiconductors. http://public.itrs.net
• Alokik Kanwal, "A review of Carbon Nanotube Field Effect Transistors," Version 2.0, 2003.
• K. K. Likharev, "Single Electron Devices and their applications," Proc. IEEE, vol. 87, no. 4, pp. 606-632, Apr. 1999.
• A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
• Alexander Wolfe, "Quad-core processor forecast," TechWeb.

Thank You!
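As a rough check of the summary's claim that the multicore design regains speed at reduced voltage while using less power, the sketch below picks the lowest-power multicore operating point whose delay does not exceed the 2.5 V single-core delay. The power and delay figures are copied from the comparison table in the slides; the function and variable names are hypothetical and only serve this illustration, and the ideal scaling assumes P = C V^2 f with C and f held fixed.

```python
# Multicore operating points copied from the comparison table in the slides:
# supply voltage (V) -> (average power in uW, delay in ns).
MULTICORE_POINTS = {
    1.25: (95.64, 0.57),
    0.85: (40.93, 1.52),
    0.725: (25.6, 2.61),
    0.7: (22.35, 3.04),
    0.625: (21.14, 30.7),
}

# Single-core reference design at 2.5 V, from the same table.
REF_POWER_UW = 391.16
REF_DELAY_NS = 2.83


def ideal_power_ratio(v_ref, v_low):
    """Power reduction factor predicted by P = C * V^2 * f (C and f fixed)."""
    return (v_ref / v_low) ** 2


def best_point_meeting_delay(points, max_delay_ns):
    """Return (voltage, power, delay) of the lowest-power point whose delay
    is at most max_delay_ns, or None if no point qualifies."""
    feasible = [(p, v, d) for v, (p, d) in points.items() if d <= max_delay_ns]
    if not feasible:
        return None
    power, voltage, delay = min(feasible)  # sorts by power first
    return voltage, power, delay


def power_saving_percent(ref_power, new_power):
    """Power saving of new_power relative to ref_power, in percent."""
    return 100.0 * (1.0 - new_power / ref_power)


if __name__ == "__main__":
    v, p, d = best_point_meeting_delay(MULTICORE_POINTS, REF_DELAY_NS)
    saving = power_saving_percent(REF_POWER_UW, p)
    print(f"Best multicore point within {REF_DELAY_NS} ns: {v} V, {p} uW, {d} ns")
    print(f"Power saving vs. single core at 2.5 V: {saving:.2f}%")
    print(f"Ideal V^2 factor for 2.5 V -> 1.25 V: {ideal_power_ratio(2.5, 1.25):.2f}")
```

With the table's numbers, the selected point is the 0.725 V multicore design, whose saving of roughly 93.5% is in line with the 93.45% listed in the comparison table. The 0.7 V and 0.625 V points use less power but are slower than the single-core reference, so they are excluded by the delay constraint.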