Power Saving at Architectural Level
Xiao Xing
March 7, 2005

Purpose of Power Saving in VLSI Circuits
• For portability: so that portable devices don't require batteries as large as a briefcase.
• For cooling: so that one does not have to resort to expensive cooling equipment that might cost more than the circuit being cooled.

Types of Power Consumption
• Dynamic power (the main type of power consumption)
• Short-circuit power
• Static power [1]
  – Leakage
  – Sub-threshold

Power Saving Schemes at Different Levels
• Transistor level [decreasing transistor and interconnect capacitances]
• Gate level [input ordering, tree vs. chain]
• Logic level [MCML (low voltage swing), domino (small device count)]
• Architectural level [parallelism, pipelining, etc.]: can save the most power for suitable applications [2]

Pipelining to Save Power
• $P_{Dynamic} = C \cdot f \cdot V_{DD}^{2} \cdot \alpha$ [3], where C is the switched capacitance, f the clock frequency, VDD the supply voltage, and Alpha the switching activity factor
• Decreasing VDD has the largest impact on dynamic power, because dynamic power scales with the square of VDD
• Decreasing VDD should also decrease leakage power
• Sub-threshold and short-circuit power dissipation could go up or down; they might increase because of the slightly increased device count (pipeline registers)
• Decreasing VDD will also slow down the circuit, but with pipelining and parallelism this loss of speed can be compensated

Pipeline Operation Illustrated

Idea behind Pipelining for Power Saving
• Pipelining utilizes parallelism to boost the throughput of the non-pipelined circuit
• The throughput boost can be traded away by decreasing VDD of the pipelined circuit (the pipelined circuit then has roughly the same throughput as the non-pipelined circuit)
• But the decreased VDD decreases dynamic power consumption

Pipelined Data Path for a RISC Micro-Processor
[Figure: pipelined data path for a RISC microprocessor. Blocks: register file, ALU. Signals: enables from the control unit; signal from the control unit indicating whether 2 writes are needed; 16- or 32-bit instructions; 7-bit op code; 3-bit addresses for read register 1, read register 2, and the destination register; 16-bit immediate value; 16-bit values in read registers 1 and 2; most and least significant 16 bits of the ALU output; 16- or 32-bit data written back. Pipeline registers: Instruction Fetch / Register Access, 16+7+3+3+3 = 32 flip-flops; Register Access / Execute, 16+16+7+3 = 42 flip-flops; Execute / Write-Back, 16+16+3+1 = 36 flip-flops.]

A 32-Bit Shift Register
• Actual circuit utilized to analyze pipelining as a viable power saving scheme
  – Not large scale, transparent to implement
  – 32 flip-flops, pipelined to 4 stages, requiring 3 extra flip-flops, with each extra flip-flop serving as the corresponding pipeline register
  – Power ratio is 10+ : 1 (possibly one of the better cases, almost trivializing the power drawn by the pipeline registers), so the power saved by decreasing VDD should substantially outweigh the extra power of the extra flip-flops
  – Power ratio comparable to that of a VLSI circuit with its necessary pipeline registers (the number of flip-flops required is generally proportional to the size of the VLSI circuit)
  – Parallel version, parallel + pipelined version
  – Layout of the flip-flop for power/area, simulation/estimation
  – Interested in the relative % of power saved (should be applicable to a bigger picture)

Architecture Analyzed
• Plain Shift-Register
  – 32 flip-flops
  – VDD at max (2.5 or 3V for CMOSP18)
  – Input rate = 1 bit input (processed) every 32 clock cycles
  – Clock period decreased to find the maximum operating frequency (by looking at waveform quality and voltage swing)
  – Throughput = input rate * frequency = (1 bit / 32 cycles) * (f cycles/second) = x bits/second, i.e. x = f/32

Architecture Analyzed
• Pipelined Shift-Reg
  – 35 flip-flops
  – Input rate = 1 bit input every 8 clock cycles
  – VDD and f initially the same as for the plain version, then dropped to achieve the same throughput
[Figure: one chain of 8 flip-flops, 1 pipeline flip-flop, 8 flip-flops, 1 pipeline flip-flop, 8 flip-flops, 1 pipeline flip-flop, 8 flip-flops]

Architecture Analyzed
• Parallel Shift-Reg
  – 64 flip-flops, 1 demux, 1 mux
  – Input rate = 2 bits input every 32 clock cycles
  – VDD and f initially the same as for the plain version, then dropped to achieve the same throughput
[Figure: de-mux, two parallel chains of 32 flip-flops each, mux]

Architecture Analyzed
• Pipelined + Parallel
  – 70 flip-flops, 1 mux, 1 demux
  – Input rate = 2 bits every 8 clock cycles
  – VDD and f initially the same as for the plain version, then dropped to achieve the same throughput
[Figure: two parallel chains, each of 8 flip-flops, 1 pipeline flip-flop, 8 flip-flops, 1 pipeline flip-flop, 8 flip-flops, 1 pipeline flip-flop, 8 flip-flops]

Summary
• The effectiveness of architectural approaches (pipelining, full parallelism, etc.) as viable power-saving schemes for digital ICs will be simulated on a smaller scale.
• The resulting relative percentage of power saved should be applicable on a grander scale.
• Pipelining an average VLSI circuit may need more than 10% of the hardware/power for the pipeline registers (flip-flops).

Time Table
• Feb 1 to March 1: Literature survey
• March 8 to March 12: Layout
• March 14 to March 18: Simulating serial and pipelined versions
• March 19 to March 23: Simulating parallel and the combo version
• March 24 to end of March: Preparing for the final presentation
• April 1 to April 15: Write up the final report

References
[1] Jan M. Rabaey, "Digital Integrated Circuits", 2nd Ed., Prentice Hall, 2003
[2] Jerry Frenkil, "A Multi-Level Approach to Low-Power IC Design", IEEE Spectrum, Vol. 35, No. 2, 1998
[3] Anantha P. Chandrakasan, "Low Power CMOS Digital Design", IEEE Journal of Solid State Circuits, pp. 473-484, 1992
[4] K.K. Parhi, "Low-Power Digital VLSI Approaches", chapter in Circuits and Systems in the Information Age, edited by Y. Huang and C. Wei, pp. 3-22, IEEE Press, June 1997 (ISCAS-97 Tutorial Book)

Aside
• Portability: if a portable device is very power hungry, then given the limited advances there have been (and will be) in battery capacity, it would need a very large battery to keep going and going.
• Cooling: Intel CPUs are getting hotter than they used to be, and an average household may be able to afford a CPU but not necessarily something as drastic as a vapor-cooling computer case.

Application Suitability for Pipelining-for-Power-Saving:
1) Power consumption of the VLSI circuit being pipelined must be >> the power consumption of the pipeline registers.
2) Large and complex data dependencies (these limit how well the circuit can be pipelined)
3) Huge discrepancy between the delays of the pipeline stages (e.g. 1 + 1 + 1000 clock cycles), since the slowest stage sets the clock rate
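As a rough companion to the "Pipelining to Save Power" and "Architecture Analyzed" slides, the sketch below applies the dynamic-power relation (P = C * f * VDD^2 * Alpha) to the four shift-register variants. It is a first-order illustration only, under assumptions that are not in the slides: C is taken as proportional to the flip-flop count, Alpha is held constant, mux/demux power is ignored, and `hypothetical_vdd_scale` is a made-up placeholder. The VDD each variant can actually tolerate is what the planned simulations are meant to find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    name: str
    flip_flops: int       # flip-flop count taken from the slides
    bits_per_window: int  # bits accepted per input window
    window_cycles: int    # clock cycles per input window

    def bits_per_cycle(self) -> float:
        return self.bits_per_window / self.window_cycles


# Counts and input rates as listed on the "Architecture Analyzed" slides.
ARCHITECTURES = [
    Architecture("plain", 32, 1, 32),
    Architecture("pipelined", 35, 1, 8),
    Architecture("parallel", 64, 2, 32),
    Architecture("pipelined+parallel", 70, 2, 8),
]


def frequency_scale(arch: Architecture, baseline: Architecture) -> float:
    """Clock scaling that gives `arch` the same bits/second as `baseline`."""
    return baseline.bits_per_cycle() / arch.bits_per_cycle()


def relative_dynamic_power(arch: Architecture, baseline: Architecture,
                           vdd_scale: float) -> float:
    """Dynamic power of `arch` relative to `baseline` at equal throughput.

    Assumptions (not from the slides): C scales with flip-flop count,
    Alpha is unchanged, and mux/demux power is ignored.
    """
    c_scale = arch.flip_flops / baseline.flip_flops
    return c_scale * frequency_scale(arch, baseline) * vdd_scale ** 2


if __name__ == "__main__":
    plain = ARCHITECTURES[0]
    hypothetical_vdd_scale = 0.6  # placeholder, NOT a measured value
    for arch in ARCHITECTURES:
        vdd = 1.0 if arch is plain else hypothetical_vdd_scale
        f_scale = frequency_scale(arch, plain)
        p_rel = relative_dynamic_power(arch, plain, vdd)
        print(f"{arch.name:20s} f x{f_scale:.3f}  P x{p_rel:.3f}")
```

The frequency scale factors follow directly from the input rates on the slides; everything that depends on the VDD scale is only as good as the placeholder value passed in.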