CSV881: Low-Power Design
Introduction to Low Power Design

Vishwani D. Agrawal
James J. Danaher Professor
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal

Copyright Agrawal, 2011
Lectures 1, 2: Introduction

Course Objectives

Low-power design is a current need in VLSI design.
Learn basic ideas, concepts, theory and methods.
Gain experience with techniques and tools.

Course Description

This course is designed for the MTech program in VLSI at IIT, Delhi. It is patterned after a one-semester graduate-level course offered at Auburn University. A set of 16 lectures that include classroom exercises provides an understanding of theoretical and practical aspects of power and energy in digital VLSI systems. The course fulfills a basic need of today’s industrial design environment. Specific topics include power components of digital CMOS circuits, power analysis, glitch elimination for reducing dynamic power, dual-threshold design for reduced static power, voltage and frequency scaling*, power management in memories* and microprocessors*, parallelism for power saving, battery management*, test power, ultra-low voltage (subthreshold) logic circuits*, low power technologies (domino CMOS, pass transistor logic)*, and adiabatic logic*.

* Not included in short course.

Outline

Lecture 1: Introduction (37)*
  Lecture 2: Examples
  Homework 1 (10 points)
Lecture 3: Power dissipation in CMOS circuits (46)
  Lecture 4: Power of a transition, static power
  Homework 2 (10 points)
Lecture 5: Gate-level power analysis (56)
  Lecture 6: Logic simulation, delay estimation
  Lecture 7: Transition density, probabilistic methods
  Homework 3 (10 points)
Lecture 8: Linear programming – a mathematical optimization technique (44)
  Lecture 9: Examples of LP and ILP optimization
  Homework 4 (10 points)
Lecture 10: Gate-level power optimization (59)
  Lecture 11: Glitch-free design for reduced dynamic power
  Lecture 12: Dual-threshold design for reduced leakage
Lecture 13: Multicore design for low power (23)
  Homework 5 (10 points)
Lecture 14: Test power (52)
  Lecture 15: Test power (continued)
  Lecture 16: SoC test scheduling
EXAM (50 points)

* Number of slides

Schedule

Oct 21, 2013   3:30-5:00 PM   Lectures 1 and 2
Oct 22, 2013   3:30-5:00 PM   Lectures 3 and 4
Oct 23, 2013   3:30-5:00 PM   Lectures 5 and 6
Oct 24, 2013   3:30-5:00 PM   Lectures 7 and 8
Oct 25, 2013   3:30-5:00 PM   Lectures 9 and 10
Oct 26, 2013   3:30-5:00 PM   Lectures 11 and 12
Oct 28, 2013   4:00-5:30 PM   Lectures 13 and 14
Oct 29, 2013   4:00-5:30 PM   Lectures 15 and 16
Oct 31, 2013                  EXAM

Power Consumption of VLSI Chips

Why is it a concern?

ISSCC, Feb. 2001, Keynote

“Ten years from now, microprocessors will run at 10GHz to 30GHz and be capable of processing 1 trillion operations per second – about the same number of calculations that the world's fastest supercomputer can perform now.

“Unfortunately, if nothing changes these chips will produce as much heat, for their proportional size, as a nuclear reactor. . . .”

Patrick P. Gelsinger, Senior Vice President, General Manager, Digital Enterprise Group, Intel Corp.

VLSI Chip Power Density

[Figure: power density (W/cm²), on a log scale from 1 to 10,000, versus year, 1970 to 2010, for Intel processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6, with reference levels marked for a hot plate, a nuclear reactor, a rocket nozzle and the Sun’s surface. Source: Intel]

SIA Roadmap for Processors (1999)

Year                      1999    2002    2005    2008    2011    2014
Feature size (nm)         180     130     100     70      50      35
Logic transistors/cm²     6.2M    18M     39M     84M     180M    390M
Clock (GHz)               1.25    2.1     3.5     6.0     10.0    16.9
Chip size (mm²)           340     430     520     620     750     900
Power supply (V)          1.8     1.5     1.2     0.9     0.6     0.5
High-perf. power (W)      90      130     160     170     175     183

Note: Untrue predictions.
Source: http://www.semichips.org

Recent Data

Source: http://www.eetimes.com/story/OEG20040123S0041

Low-Power Design

Design practices that reduce power consumption by at least one order of magnitude; in practice 50% reduction is often acceptable.
Low-power design methods:
  Algorithms and architectures
  High-level and software techniques
  Gate and circuit-level methods
  Test power

VLSI Building Blocks

Finite-state machine (FSM)
Bus
Flip-flops and shift registers
Memories
Datapath
Processors
Power grid
Clock distribution
Analog circuits
RF components

Specific Topics in Low-Power

Power dissipation in CMOS circuits
Device technology
Circuit and gate level methods
Logic synthesis
Dynamic power reduction techniques
Leakage power reduction
System level methods
Low-power CMOS technologies
Energy recovery methods
Microprocessors
Arithmetic circuits
Low power memory technology
Test power
Power estimation

Some Examples

State Encoding for a Counter

Two-bit binary counter:
  State sequence, 00 → 01 → 10 → 11 → 00
  Six bit transitions in four clock cycles
  6/4 = 1.5 transitions per clock
Two-bit Gray-code counter:
  State sequence, 00 → 01 → 11 → 10 → 00
  Four bit transitions in four clock cycles
  4/4 = 1.0 transition per clock
Gray-code counter is more power efficient.

G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers (now Springer), 1998.

Binary Counter: Original Encoding

Present state    Next state
a   b            A   B
0   0            0   1
0   1            1   0
1   0            1   1
1   1            0   0

A = a’b + ab’ = a xor b
B = a’b’ + ab’ = b’

[Figure: counter circuit with state outputs A and B, clock CK and clear CLR.]

Binary Counter: Gray Encoding

Present state    Next state
a   b            A   B
0   0            0   1
0   1            1   1
1   0            0   0
1   1            1   0

A = a’b + ab = b
B = a’b’ + a’b = a’

[Figure: counter circuit with state variables a and b, next-state outputs A and B, clock CK and clear CLR.]

Three-Bit Counters

Binary                      Gray-code
State   No. of toggles      State   No. of toggles
000     -                   000     -
001     1                   001     1
010     2                   011     1
011     1                   010     1
100     3                   110     1
101     1                   111     1
110     2                   101     1
111     1                   100     1
000     3                   000     1

Binary: av. transitions/clock = 1.75
Gray-code: av. transitions/clock = 1

N-Bit Counter: Toggles in Counting Cycle

Binary counter: $T(\text{binary}) = 2(2^N - 1)$
Gray-code counter: $T(\text{gray}) = 2^N$
$T(\text{gray})/T(\text{binary}) = 2^{N-1}/(2^N - 1) \to 0.5$

Bits   T(binary)   T(gray)   T(gray)/T(binary)
1      2           2         1.0
2      6           4         0.6667
3      14          8         0.5714
4      30          16        0.5333
5      62          32        0.5161
6      126         64        0.5079
∞      -           -         0.5000

FSM State Encoding

Transition probability based on PI statistics.

[Figure: one state transition graph drawn under two different assignments of the codes 00, 01 and 11 to its states; edges are labeled with transition probabilities.]

Expected number of state-bit transitions for the two encodings:
  2(0.3+0.4) + 1(0.1+0.1) = 1.6
  1(0.3+0.4+0.1) + 2(0.1) = 1.0
State encoding can be selected using a power-based cost function.

FSM: Clock-Gating

Moore machine: Outputs depend only on the state variables.
If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever a self-loop is to be executed.

[Figure: STG fragment with states Si, Sj and Sk and edge labels Xi/Zk, Xj/Zk and Xk/Zk; the Xk/Zk edge is the self-loop on Sk.]

Clock can be stopped when the (Xk, Sk) combination occurs.

Clock-Gating in Moore FSM

[Figure: block diagram with primary inputs (PI), combinational logic, flip-flops, primary outputs (PO), clock activation logic, a latch and the clock CK.]

L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.

Bus Encoding for Reduced Power

Example: Four-bit bus
  0000 → 1110 has three transitions.
  If the bits of the second pattern are inverted, then 0000 → 0001 will have only one transition.
Bit-inversion encoding for N-bit bus:

[Figure: number of bit transitions after inversion encoding (vertical axis, 0 to N) plotted against number of bit transitions (horizontal axis, 0 to N), with N/2 marked on both axes.]

Bus-Inversion Encoding Logic

[Figure: sent data passes through polarity decision logic and a bus register to the received data; a polarity bit is carried alongside the bus.]

M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.

Clock-Gating in Low-Power Flip-Flop

[Figure: clock-gated flip-flop circuit with input D, output Q and clock CK.]

Example: Benchmark S5378

TSMC025 CMOS technology
50 ns clock
1,000 random vectors

Clock      Number of      Number of      Power consumption in μW
           comb. gates    flip-flops     Comb. gates   Flip-flops   Total
Ungated    2,958          179            330           752          1,082
Gated      3,316          179            276           32           308

Reference: J. D. Alexander, Simulation Based Power Estimation for Digital CMOS Technologies, Master’s Thesis, Auburn University, December 2008, Section 3.8.

Example: Shift Register

[Figure: shift register built as a chain of D flip-flops from input D to Output, all clocked by CK.]

Reduced-Power Shift Register

[Figure: two parallel chains of D flip-flops fed from input D, both clocked by CK(f/2), combined by a multiplexer that drives Output.]

Flip-flops are operated at full voltage and half the clock frequency.

Power Consumption of Shift Register

$P = C' V_{DD}^2 f / n$

16-bit shift register, 2 μm CMOS

Deg. of parallelism   Freq (MHz)   Power (μW)
1                     33.0         1535
2                     16.5         887
4                     8.25         738

[Figure: normalized power (0.0 to 1.0) versus degree of parallelism, n = 1, 2, 4.]

C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Springer, 1997.

Books on Low-Power Design (1)

L. Benini and G. De Micheli, Dynamic Power Management: Design Techniques and CAD Tools, Boston: Springer, 1998.
T. D. Burd and R. A. Brodersen, Energy Efficient Microprocessor Design, Boston: Springer, 2002.
A. Chandrakasan and R. Brodersen, Low-Power Digital CMOS Design, Boston: Springer, 1995.
A. Chandrakasan and R. Brodersen, Low-Power CMOS Design, New York: IEEE Press, 1998.
J.-M. Chang and M. Pedram, Power Optimization and Synthesis at Behavioral and System Levels using Formal Methods, Boston: Springer, 1999.
D. Chinnery and K. Keutzer, Closing the Power Gap Between ASIC & Custom: Tools and Techniques for Low Power Design, Springer, 2007, ISBN 0387257632, 9780387257631.
M. S. Elrabaa, I. S. Abu-Khater and M. I. Elmasry, Advanced Low-Power Digital Circuit Techniques, Boston: Springer, 1997.

Books on Low-Power Design (2)

P. Girard, N. Nicolici and X. Wen, Power-Aware Testing and Test Strategies for Low Power Devices, Springer, 2010.
R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.
S. Iman and M. Pedram, Logic Synthesis for Low Power VLSI Designs, Boston: Springer, 1998.
M. Keating, D. Flynn, R. Aitken, A. Gibbons and K. Shi, Low Power Methodology Manual for System-on-Chip Design, 1st ed., corrected 2nd printing, 2007, 304 pp., ISBN 978-0-387-71818-7.
J. B. Kuo and J.-H. Lou, Low-Voltage CMOS VLSI Circuits, New York: Wiley-Interscience, 1999.
J. Monteiro and S. Devadas, Computer-Aided Design Techniques for Low Power Sequential Logic Circuits, Boston: Springer, 1997.
S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies, Boston: Springer, 2005.

Books on Low-Power Design (3)

W. Nebel and J. Mermet, Low Power Design in Deep Submicron Electronics, Boston: Springer, 1997.
N. Nicolici and B. M. Al-Hashimi, Power-Constrained Testing of VLSI Circuits, Boston: Springer, 2003.
V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic and N. Nedovic, Digital System Clocking: High Performance and Low-Power Aspects, Wiley-IEEE, 2005.
M. Pedram and J. M. Rabaey, Power Aware Design Methodologies, Boston: Springer, 2002.
C. Piguet, Low-Power Electronics Design, Boca Raton, Florida: CRC Press, 2005.
J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Boston: Springer, 1996.
S. Roundy, P. K. Wright and J. M. Rabaey, Energy Scavenging for Wireless Sensor Networks, Boston: Springer, 2003.

Books on Low-Power Design (4)

K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design, New York: Wiley-Interscience, 2000.
E. Sánchez-Sinencio and A. G. Andreou, Low-Voltage/Low-Power Integrated Circuits and Systems – Low-Voltage Mixed-Signal Circuits, New York: IEEE Press, 1999.
W. A. Serdijn, Low-Voltage Low-Power Analog Integrated Circuits, Boston: Springer, 1995.
S. Sheng and R. W. Brodersen, Low-Power Wireless Communications: A Wideband CDMA System Design, Boston: Springer, 1998.
V. George and J. M. Rabaey, Low-Energy FPGAs, Boston: Springer, 2001.
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
K.-S. Yeo and K. Roy, Low-Voltage Low-Power Subsystems, McGraw Hill, 2004.

Books Useful in Low-Power Design

A. Chandrakasan, W. J. Bowhill and F. Fox, Design of High-Performance Microprocessor Circuits, New York: IEEE Press, 2001.
R. C. Jaeger and T. N. Blalock, Microelectronic Circuit Design, Third Edition, McGraw-Hill, 2006.
S. M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits, New York: McGraw-Hill, 1996.
E. Larsson, Introduction to Advanced System-on-Chip Test Design and Optimization, Springer, 2005.
J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Second Edition, Upper Saddle River, New Jersey: Prentice-Hall, 2003.
J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, New York: IEEE Press, 2004.
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts: Addison-Wesley, 2005.

Problem: Bus Encoding

A 1-hot encoding is to be used for reducing the capacitive power consumption of an n-bit data bus. All n bits are assumed to be independent and random.
Derive a formula for the ratio of power consumptions on the encoded and the un-coded buses. Show that n ≥ 4 is essential for the 1-hot encoding to be beneficial.

Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, New York: Springer, 1995, pp. 224-225. [Hint: You should be able to solve this problem without the help of the reference.]

Solution: Bus Encoding

Un-coded bus: Two consecutive bits on a wire can be 00, 01, 10 and 11, each occurring with a probability 0.25. Considering only the 0→1 transition, which draws energy from the supply, the probability of a data pattern consuming $CV^2$ energy on a wire is 1/4. Therefore, the average per-pattern energy for all n wires of the bus is $CV^2 n/4$.

Encoded bus: The encoded bus contains $2^n$ wires. The 1-hot encoding ensures that whenever there is a change in the data pattern, exactly one wire will have a 0→1 transition, charging its capacitance and consuming $CV^2$ energy. There are $2^n$ possible data patterns, and exactly one of these will match the previous pattern and consume no energy. Thus, the per-pattern energy consumption of the bus is 0 with probability $2^{-n}$, and $CV^2$ with probability $1 - 2^{-n}$. The average per-pattern energy for the 1-hot encoded bus is $CV^2(1 - 2^{-n})$.

Solution: Bus Encoding (Cont.)

Power ratio = Encoded bus power / un-coded bus power = $4(1 - 2^{-n})/n \to 4/n$ for large n.

For the encoding to be beneficial, the above power ratio should be less than 1. That is, $4(1 - 2^{-n})/n < 1$, or $1 - 2^{-n} < n/4$. Since $1 - 2^{-n} < 1$, this holds whenever $n/4 \geq 1$, that is, n ≥ 4. The table below shows that the ratio exceeds 1 for n ≤ 3, so n ≥ 4 is also necessary.

The following table shows the 1-hot encoded bus power ratio as a function of bus width:

n     $4(1 - 2^{-n})/n$
1     2.0000
2     1.5000
3     1.1667
4     0.9375
8     0.4980
16    0.2500 ≈ 1/4
32    0.1250 ≈ 1/8
64    0.0625 ≈ 1/16
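
The power ratio derived in the solution can be checked numerically. The following is a minimal sketch in Python, assuming the same model as the solution (independent random bits, energy counted only on 0→1 transitions, equal capacitance C on every wire). The function names one_hot_power_ratio and smallest_beneficial_width are illustrative and do not come from the lecture. Exact rational arithmetic is used so that the comparison against 1 involves no rounding.

```python
from fractions import Fraction


def one_hot_power_ratio(n: int) -> Fraction:
    """Ratio of 1-hot encoded to un-coded average energy per pattern.

    Both energies are expressed in units of C*V^2, so the ratio is
    4 * (1 - 2**-n) / n, as in the solution above.
    """
    if n < 1:
        raise ValueError("bus width n must be at least 1")
    # Un-coded bus: n wires, each with probability 1/4 of a 0->1 transition.
    uncoded = Fraction(n, 4)
    # 1-hot bus: exactly one 0->1 transition unless the pattern repeats,
    # which happens with probability 2**-n.
    encoded = 1 - Fraction(1, 2 ** n)
    return encoded / uncoded


def smallest_beneficial_width(limit: int = 64) -> int:
    """Smallest bus width n (up to limit) whose power ratio is below 1."""
    for n in range(1, limit + 1):
        if one_hot_power_ratio(n) < 1:
            return n
    raise ValueError("no beneficial width found up to the given limit")


if __name__ == "__main__":
    # Tabulate the ratio for the bus widths used in the solution table.
    for n in (1, 2, 3, 4, 8, 16, 32, 64):
        print(f"n = {n:2d}   ratio = {float(one_hot_power_ratio(n)):.4f}")
    print("smallest beneficial n:", smallest_beneficial_width())
```

Note that the model counts only switching energy per wire; it does not account for the cost of the $2^n$ wires themselves or of the encoder and decoder logic, which the problem statement also leaves out.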