Power Reduction Techniques in the Processor Core
Low Power Design for SoCs, ASIC Tutorial: Processor Core
©M.J. Irwin, PSU, 1999

Power Usage Stats
[Figure: power breakdown of a 1995 5V notebook PC across the motherboard, hard disk, floppy disk, LCD/VGA, and power supply. From Roy, 1997]

Processor Power Budgets
[Figure: concentric pie charts splitting power among clock, datapath, memory, and I/O (pads). Inner circle: low-end embedded microprocessor; next circle: high-end CPU with on-chip cache; next circle: MPEG2 decoder ASIC; outer circle: ATM switch ASIC]

Basic Principles of Low Power Design
$$P = C_L V_{dd}^2 f + \frac{t_r + t_f}{2} V_{dd} I_{peak} f + V_{dd} I_{leakage}$$
(dynamic switching power + short-circuit power + leakage power)
• Reduce the switching (supply) voltage
  » quadratic effect -> dramatic savings
  » negative effect on performance
• Reduce capacitance
• Reduce switching frequency
• Reduce glitching
• Reduce leakage and static currents

Design Levels
Abstraction levels, from highest to lowest: algorithm, software/system, architecture, functional unit, gate, circuit.
• Power savings: most at the highest level, least at the lowest
• Analysis resources: least at the highest level, most at the lowest
• Analysis accuracy: worst at the highest level, best at the lowest

Circuit and Logic Gate Techniques

Transistor Sizing for Dynamic Power Reduction
• Use the smallest transistors that satisfy the delay constraints
  » slack time: the difference between the required time and the arrival time of a signal at a gate output
    – positive slack: size down
    – negative slack: size up
• Make gates that toggle more frequently smaller
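The sizing rule above can be tied to the dynamic term of the power equation from the Basic Principles slide. The sketch below is a minimal, hypothetical illustration: the `Gate` record, the 0.8 shrink factor, and the example timing and capacitance values are assumptions, capacitance is assumed to scale linearly with size, and timing is not re-analyzed after resizing (a real flow iterates with a timing engine).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gate:
    name: str
    arrival_ns: float   # arrival time of the signal at the gate output
    required_ns: float  # required time at the gate output
    cap_pf: float       # switched load capacitance (assumed to scale with size)
    activity: float     # expected output transitions per clock cycle


def slack_ns(g: Gate) -> float:
    """Slack = required time - arrival time at the gate output."""
    return g.required_ns - g.arrival_ns


def sizing_pass(g: Gate, shrink: float = 0.8) -> Gate:
    """One greedy pass: size down on positive slack, size up on negative slack.

    Assumption: capacitance scales linearly with size, and timing is NOT
    re-analyzed after resizing.
    """
    s = slack_ns(g)
    if s > 0:
        return replace(g, cap_pf=g.cap_pf * shrink)
    if s < 0:
        return replace(g, cap_pf=g.cap_pf / shrink)
    return g


def dynamic_power_w(gates, vdd: float, f_clk_hz: float) -> float:
    """Dynamic term C_L * Vdd^2 * f, with switching frequency f = activity * f_clk."""
    return sum(g.cap_pf * 1e-12 * vdd ** 2 * g.activity * f_clk_hz for g in gates)


# Hypothetical two-gate example: g1 has positive slack, g2 misses its required time.
gates = [Gate("g1", 2.0, 3.0, 0.10, 0.5), Gate("g2", 3.2, 3.0, 0.05, 0.1)]
sized = [sizing_pass(g) for g in gates]
print(dynamic_power_w(gates, 2.5, 100e6), dynamic_power_w(sized, 2.5, 100e6))
```

Because g1 toggles five times as often as g2, shrinking it saves more dynamic power than upsizing g2 costs, which is the reasoning behind making frequently toggling gates smaller.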
Equivalent Pin Ordering
• Logically equivalent pins may not have identical delay/power characteristics
[Figure: gate inputs A and B, output Out, and node capacitances Cout and Ci]
• To conserve power (and improve speed), connect the inputs so that the most active input is nearest the output
• Requires knowledge of the signal statistics

Gate Restructuring
• Logically equivalent gates may not have identical power/delay characteristics

Network Restructuring
• Logically equivalent gate networks may not have identical power/delay characteristics
[Figure: alternative technology mappings of F = ABCD compared by delay, area, and power]

Dual Supply Voltages
• Use two Vdd's (e.g., 2.5V and 1.5V)
  » use the higher supply for gates on the critical path
  » use the lower supply for gates off the critical path
• Reduces power without a performance loss
• Cons
  » slight area penalty
  » increased design time
  » level converters are needed to interconnect gates on different supplies (to avoid static currents)

Functional Unit Techniques

Latches and Flipflops
• Consume a lot of power because they are clocked every cycle
  » Clock energy (Ec): energy dissipated when the FF is clocked with stable data
  » Data energy (Ed): energy dissipated when the FF is clocked and the data has changed, so that the FF changes state
  » Typically the data rate (fd) is much lower than the clock rate (fc)
• Also impacts clock power, since a large portion of clock power is used to drive the sequential elements
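The Ec/Ed definitions above suggest a simple average-power model: clock cycles with stable data cost Ec, and cycles in which the FF changes state cost Ed. The sketch below assumes at most one state change per clock cycle; the function names and the energy and frequency values are hypothetical, chosen only to show why clocking dominates when fd is much lower than fc.

```python
def ff_power_w(e_clock_j: float, e_data_j: float, f_clk_hz: float, f_data_hz: float) -> float:
    """Average power of one FF.

    Clock cycles with stable data cost Ec; cycles in which the FF changes
    state cost Ed. Assumes at most one state change per clock cycle.
    """
    if not 0.0 <= f_data_hz <= f_clk_hz:
        raise ValueError("expected 0 <= f_data <= f_clk")
    return e_clock_j * (f_clk_hz - f_data_hz) + e_data_j * f_data_hz


def stable_data_share(e_clock_j, e_data_j, f_clk_hz, f_data_hz) -> float:
    """Fraction of FF power spent on clock cycles in which the data did not change."""
    total = ff_power_w(e_clock_j, e_data_j, f_clk_hz, f_data_hz)
    return e_clock_j * (f_clk_hz - f_data_hz) / total


# Hypothetical values: Ec = 20 fJ, Ed = 60 fJ, fc = 200 MHz, fd = fc / 10.
p = ff_power_w(20e-15, 60e-15, 200e6, 20e6)
share = stable_data_share(20e-15, 60e-15, 200e6, 20e6)
print(f"{p * 1e6:.2f} uW, {share:.0%} spent clocking stable data")
# prints: 4.80 uW, 75% spent clocking stable data
```

Even though a data transition costs three times as much as a stable-data cycle in this example, the stable-data cycles account for most of the power because they occur nine times as often.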
Power Consumption in Latches
[Figure: latch (D, Q, CLK, CLKB) and a plot of the percentage of latch power due to data versus clock as the latch data activity factor goes from 0 to 0.5. From Tiwari, 1998]

Some Typical CMOS FFs
[Figure: schematics of a static transmission-gate (TG) FF, a dynamic C2MOS FF, a dynamic precharged TSPC FF, and a dynamic non-precharged TSPC FF]

Relative Power Consumption
[Figure: relative power of the TGFF, GFF, C2MOS, PTSPC, NPTSPC, and RSLATCH flipflops versus latch data activity factor (0.05 to 0.45). From Svenson, 1996]

Some Low Power FFs
[Figure: schematics of the PowerPC 603 FF and the StrongARM SA110 FF]

PDP of Some Low Power FFs
[Figure: high, low, and average total power-delay product (PDPtot, fJ) of the K6 ETL, SA110 FF, mC2MOS, PowerPC, SDFF, and HLFF flipflops. From Stojanovic, 1998]

Self-Gating FF
• When the FF input is equal to its output, suppress internal clocking to conserve power
  » the gating function is derived within the FF
[Figure: self-gating FF schematic with internally gated clock phases Φ]
• Strict rules on when D can change with respect to CLK

Power of Self-Gated FF
[Figure: power dissipation of a self-gated FF (SG FF) and a regular FF (Reg FF) versus data switching rate fd/fc. From Reyes, 1996]

Double Edge Triggered FF
• Loads data at both the rising and falling clock edges
[Figure: DETFF schematic driven by CLK and CLKB]
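A behavioral sketch (not a transistor-level model; skew and duty cycle are ignored) makes the DETFF property concrete: because data are loaded on both edges, the same data stream needs half as many clock transitions as with a single-edge FF. The class and function names below are hypothetical.

```python
class SingleEdgeFF:
    """Behavioral FF that loads D only on the rising clock edge."""

    def __init__(self) -> None:
        self.q = 0
        self._clk = 0

    def tick(self, clk: int, d: int) -> int:
        if self._clk == 0 and clk == 1:  # rising edge
            self.q = d
        self._clk = clk
        return self.q


class DoubleEdgeFF(SingleEdgeFF):
    """Behavioral DETFF: loads D on both rising and falling clock edges."""

    def tick(self, clk: int, d: int) -> int:
        if clk != self._clk:  # any edge
            self.q = d
        self._clk = clk
        return self.q


def stream_through(ff, data, transitions_per_sample):
    """Clock a data stream through `ff`; return (captured values, clock transitions used)."""
    n = len(data) * transitions_per_sample
    clk = [(i + 1) % 2 for i in range(n)]  # 1, 0, 1, 0, ... starting from a low clock
    captured = []
    for i, level in enumerate(clk):
        q = ff.tick(level, data[i // transitions_per_sample])
        if (i + 1) % transitions_per_sample == 0:  # end of this sample's time slot
            captured.append(q)
    return captured, n


data = [1, 0, 1, 1, 0, 0, 1, 0]
set_out, set_edges = stream_through(SingleEdgeFF(), data, 2)  # one sample per clock period
det_out, det_edges = stream_through(DoubleEdgeFF(), data, 1)  # one sample per clock edge
print(set_out == data, det_out == data, set_edges, det_edges)
# prints: True True 16 8
```

Both FFs deliver the same eight samples, but the DETFF does it with 8 clock transitions instead of 16, which is why its clock can run at half the frequency for the same throughput.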
DETFF Pros and Cons
• Advantages
  » The clock frequency can be halved to achieve the same computational throughput: Pd = 0.84 Ps (Pd: DETFF power; Ps: single-edge FF power)
  » Also get a 2X power savings in the clock network
• Disadvantages
  » about 15% larger in transistor count
  » lower maximum operating frequency
  » strict requirements on clock skew
  » requires a strict 50% duty cycle
  » larger clock load

Arithmetic Components
• Many techniques for lowering the power consumption of arithmetic components
  » adders, ALUs
  » barrel shifters, multipliers, MACs
• PDP of different architectures
• Delay balancing to reduce glitching
• Precomputation
• Common case computation

PDP of Different Adders
[Figure: power-delay product of RCA, MCCA, CSkA, VSkA, CSlA, CLA, BKA, and ELMA adders at 8, 16, 32, 48, and 64 bits. From Nagendra, 1996]

Array Multiplier
[Figure: 4×4 array multiplier with inputs A0 to A3 and B0 to B3, cells M00 to M33, and outputs Y0 to Y7; the longest delay path is highlighted and labeled 2i+j+1]

Multiplier Cell Structure
• Each cell ANDs Ai with Bj and adds the result to the sum input and carry in with a full adder, producing the sum output and carry out
• Add delay elements to minimize glitching

Precomputation Logic
[Figure: precomputed inputs are held in register R1 and gated inputs in register R2, both feeding the combinational logic f(X); the precomputation logic g(X) drives the load disable of R2]
• Identify logical conditions at the inputs under which the output does not depend on some of the inputs
  » since those inputs don't affect the output, disable their transitions
  » trades area for power
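The precomputation architecture can be sketched behaviorally for an n-bit A > B comparator, assuming R1 holds only the MSBs and g(X) = An XOR Bn. The class name and the use of R2 load counts as a stand-in for switching activity are assumptions for illustration; this is not a gate-level implementation.

```python
from itertools import product


class PrecomputedComparator:
    """Behavioral sketch of precomputation applied to an n-bit A > B comparator.

    R1 holds the MSBs An and Bn; R2 holds the remaining n-1 bits of A and B.
    g(X) = An XOR Bn: when the MSBs differ, they alone decide A > B, so the
    load of R2 is disabled and the lower-bit logic sees no new transitions.
    """

    def __init__(self, n: int) -> None:
        self.msb = 1 << (n - 1)
        self.low_mask = self.msb - 1
        self.r2 = (0, 0)      # gated register contents (low bits of A and B)
        self.r2_loads = 0     # proxy for switching activity in the gated logic

    def step(self, a: int, b: int) -> bool:
        an, bn = bool(a & self.msb), bool(b & self.msb)
        if an != bn:          # g(X) = 1: disable the R2 load
            return an         # A > B exactly when An = 1 (and so Bn = 0)
        self.r2 = (a & self.low_mask, b & self.low_mask)
        self.r2_loads += 1
        return self.r2[0] > self.r2[1]


# Exhaustive check for n = 4 (every A, B pair applied once).
cmp4 = PrecomputedComparator(4)
results = [(cmp4.step(a, b), a > b) for a, b in product(range(16), repeat=2)]
print(all(got == want for got, want in results), cmp4.r2_loads, len(results))
# prints: True 128 256
```

With only the MSBs precomputed and uniformly distributed inputs, R2 is loaded in half of the cycles; how much power that saves depends on the gated logic and on the actual input statistics.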
Binary Comparator Example
[Figure: n-bit binary value comparator computing A>B; R1 holds the MSBs An and Bn, R2 holds the remaining bits down to A1 and B1, and the load of R2 is disabled unless An = Bn]
• Can achieve up to 75% power reduction with 3% area overhead and 1 to 5 additional gate delays in the worst-case path

Design Issues in Precomputation
• Design steps
  1. Select the precomputation architecture
  2. Determine the precomputed and gated inputs (R1 should be much smaller than R2)
  3. Find (a good implementation for) g(X)
  4. Evaluate the potential power savings based on input statistics (if the savings are not sufficient, go back to step 2 or 3 and try again)
• Also works for multiple-output functions, where g(X) is the product of gj(X) over all j

Common Case Computation
[Figure: the inputs feed the original circuit, a common-case (CC) detection circuit, and a CC execution circuit; a CCC controller receives the "common case detected" and "common case completed" signals and drives the sleep signals sleep1 to sleep3; the circuits share the outputs]

Activity of CCC Circuit Over Time
[Figure: activity of the original circuit, CC detection circuit, and CC execution circuit over time, with intervals tp, tc, and te]
• Several (possibly conflicting) factors are involved in choosing the CC circuit that leads to maximal energy and/or time savings
• Dependent on input data statistics

CCC Performance
Circuit    Area % increase   Cycles % decrease   Power % decrease
GCD        29.0              76.6                59.8
Poly       14.5              17.9                58.2
Test1      21.9              42.5                48.6
Linegen    23.5              43.3                39.7
Graphics   29.7              27.4                12.4
From Lakshminarayana, 1999

Control Unit Design
[Figure: controller built from combinational logic and state FFs, with inputs and outputs; example FSM with states 00, 01, and 11 and edges labeled 0/0, 0,1/1, and 1/X]
• There are n! different possible encodings for n states
• State encoding is one of the most important factors determining the area, speed, and power of the resulting control logic

Power State Encoding Heuristic
• Area driven -> try to reduce the distance in Boolean n-space between related states
• Power driven -> try to minimize the number of bit transitions in the state register
  » fewer transitions in the state register
  » fewer transitions propagated to the combinational logic
[Figure: the same FSM with each edge annotated with the probability that the transition occurs (0.1, 0.3, 0.1, 0.4, 0.1); the probabilities over all edges sum to unity]

Caveat
• The lowest E[M] (expected number of state-register bit transitions) may not be the lowest in power -> it could require more gates and/or signal transitions in the combinational logic
• Experiments show that the area and power dissipation of a state machine are correlated when the state encoding is varied

State Encoding Effects
[Figure: power (500 to 750) versus area (3300 to 4100) for different state encodings. From Yeap, 1997]

Practical Considerations
• Balance area and power by forcing the encoding of only the subset of states that span the high-probability edges
  » leave the assignment of the remaining states to the logic synthesis system for area optimization
  » fortunately, in practice, most state machines have this characteristic
• Unlike area encoding, power encoding requires knowledge of the probabilities of the state transitions and input signals

Architecture Techniques
Glitch Reduction by Pipelining
• Glitches depend on the logic depth of the circuit
• Nodes that are logically deeper are more prone to glitching
  » the arrival times of the gate inputs are more spread out due to delay imbalances
  » they are usually affected by more primary-input (PI) switching
• Reduce depth by adding pipeline registers

Typical RISC Datapath
• Five-stage pipeline (originally for performance, but it also helps with power)
[Figure: Fetch, Decode, Execute, Memory, and WriteBack stages with the PC, I$, instruction register, MAR, MDR, and D$]

Pipelined Multiplier
[Figure: the 4×4 array multiplier (inputs A0 to A3 and B0 to B3, cells M00 to M33, outputs Y0 to Y7) with pipeline registers clocked by CLK]

Signal Gating
• Mask unwanted switching activity from propagating
[Figure: a source signal passes through a gate or latch/FF driven by a control signal that suppresses it, producing the gated signal]
• Generating the control signals requires additional logic circuitry (more power)

Signal Gating, con't
• Signal gating saves power if the enable/disable frequency of the control signal is much lower than the frequency of the source signal (so that much of the signal activity is blocked)
• Savings are even greater if a group of source signals can share a control signal
• Good candidates: clock signals, address or data buses, and signals with high frequency or heavy glitching
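The break-even condition for signal gating can be checked by counting transitions. The sketch below assumes AND-style gating (the gated signal is forced to 0 while disabled, as with the forced-zero option rather than a holding latch) and charges every transition, including those of the control signal, the same energy; both are simplifying assumptions, and the function names and waveforms are hypothetical.

```python
def transitions(seq):
    """Number of 0->1 and 1->0 changes in a bit sequence."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


def gating_activity(sources, enable):
    """Compare switching with and without AND-style signal gating.

    `sources` is a list of bit sequences that share one control signal `enable`;
    each gated signal is source AND enable (forced to 0 while disabled).
    Returns (ungated transitions, gated transitions + control transitions).
    Assumes every transition costs the same energy.
    """
    ungated = sum(transitions(s) for s in sources)
    gated = sum(transitions([x & e for x, e in zip(s, enable)]) for s in sources)
    return ungated, gated + transitions(enable)


source = [0, 1] * 8              # toggles every cycle (high-frequency source)
enable = [1] * 4 + [0] * 12      # control changes once in 16 cycles
print(gating_activity([source], enable))          # one source per control
print(gating_activity([source, source], enable))  # two sources share the control
# prints: (15, 5)
#         (30, 9)
```

If the enable instead toggles as often as the source (for example `[1, 0] * 8`), the control activity cancels the benefit and the counts come out equal, matching the condition that the control must change much less often than the source.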
Guarded Evaluation
• Reduce switching activity by adding latches at the inputs of a block when its outputs are not used
[Figure: a multiplier with inputs A and B, a second input C, and a condition signal, shown without and with latches at the multiplier inputs]
• The latches preserve the previous input values to suppress activity
  – AND gates could also be used to mask one or both inputs to zero (forced zero); this is good if the zero-out condition changes infrequently compared to the data rate

Sleep Modes
• Software power control: power management
  » DOZE: most functional units are stopped except the on-chip cache memory (for cache coherency)
  » NAP: the cache is also turned off; a time-out or external interrupt resumes operation
  » SLEEP: the clock is off; an external interrupt resumes operation
• A deeper sleep mode saves more power but requires more latency to resume

PowerPC Sleep Modes
Mode                    66MHz    80MHz
No power mgmt           2.18W    2.54W
Dynamic power mgmt      1.89W    2.20W
DOZE                    307mW    366mW
NAP                     113mW    135mW
SLEEP                   89mW     105mW
SLEEP without PLL       18mW     19mW
SLEEP without clock     2mW      2mW
• 10 cycles to wake up from SLEEP
• 100µs to wake up from SLEEP+

Keeper Circuits
• A floating node (one not driven by any gate) can suffer charge decay, resulting in short-circuit currents
[Figure: a node whose driver is powered down, held by a weak keeper with a control input]
• Keeper circuits can hold such nodes, but they
  » slightly increase power dissipation
  » slightly increase delay
• Essential in circuits with sleep modes

A Low Power Processor Core Example
M•CORE Architecture
[Figure: M•CORE datapath with general-purpose, alternate, and control register files (32 bits x 16, 32 bits x 16, and 32 bits x 13), X and Y ports, immediate, sign extension, scale, barrel shifter/FF1, ALU/priority encoder/zero detect, branch adder, PC increment, instruction pipeline and decoder, and the address, data, writeback, and H/W accelerator buses]

M•CORE Power Distribution
[Figure: pie charts of M•CORE power: overall split among datapath, clock, and control, and a breakdown by register file, address/data bus, instruction register, barrel shifter, X mux, Y mux, address generation, and other]

Key References
Alidina, Precomputation-based sequential logic optimization for low power, IEEE Trans. on VLSI Systems, 2(4):426-436, 1994.
Hossain, Low power design using double edge triggered flipflop, IEEE Trans. on VLSI Systems, 2(2):261-265, 1994.
Lakshminarayana, et al., Common-Case Computation, Proc. of DAC, pp. 56-61, 1999.
Motorola, M•CORE Architecture microRISC Engine, MCORE 1/D, www.mot.com/SPS/MCORE/info_documentation.htm
Mutsunori, Low power design method using multiple supply voltages, Proc. of SLPED, pp. 36-41, 1997.
Rabaey, Digital Integrated Circuits, Prentice-Hall, 1996.
Reyes, Low Power FF Circuit and Method Thereof, Patent No. 5,498,988, 1996.
Roy, Power analysis and design at the system level, in Low Power Design in Deep Submicron Electronics, Nebel and Mermet, Eds., Kluwer, 1997.
Sakuta, Delay balanced multipliers for low power, Proc. of SLPE, pp. 36-37, 1995.
Scott, Designing the Low-Power M•CORE Architecture, Proc. Inter. Symp. Computer Architecture Power Driven Microarchitecture Workshop, June 1998.
Stojanovic, A unified approach in the analysis of latches and FFs for low power systems, Proc. of ISLPED, pp. 227-232, 1998.
Tiwari, Reducing power in high-performance microprocessors, Proc. of DAC, pp. 732-737, 1998.
Tiwari, Guarded evaluation, Proc. of ISLPD, pp. 221-226, 1995.
Yeap, CPU controller optimization for HDL logic synthesis, Proc. of CICC, pp. 127-130, 1997.
Yeap, Practical Low Power Digital VLSI Design, KAP, 1998.