Digital System Clocking: High-Performance and Low-Power Aspects Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic Chapter 6: Low-Energy System Issues Wiley-Interscience and IEEE Press, January 2003 Clocking Energy • System Issues • Increasing clock frequency • Low skew requirements • Deeper pipelining • Increased Clocking Energy • Focus on energy minimization • Methods for Energy Reduction • Supply voltage scaling • Alternate circuit topology • New clocking strategies Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 2 Switching Energy Clock is the highest activity signal Switching energy is dominant N Eswitching 01 (i ) Ci Vswing (i ) VDD i 1 - probability of a 0-1 transition Ci – total switched capacitance at node i Vswing(i) – voltage swing at node i VDD – supply voltage N – number of nodes Energy is best reduced by scaling down VDD, but this also means performance degradation Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 3 Supply Voltage Scaling • Supply voltage is the most dominant tuning knob for energy reduction • Not always possible due to performance requirements • Flip-flop speed scales favorably with Vdd compared to standard CMOS inverter • Improved slow-path requirements • Internal race immunity tracks scaling of an inverter • Same fast-path requirements Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 4 Pulsed Designs Scale the Best with VDD (a) (b) (Delay is normalized to FO4 at its respective VDD. FO4 as defined in this example increases with VDD performance degradation) Impact of Vdd on (a) delay, and (b) internal race immunity (0.25 mm, light load). (Markovic et al. 2001), Copyright © 2001 IEEE Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 5 Minimizing Switched Capacitance • Clocked capacitances are the most important to minimize due to their high switching activity • Reduce Csw by circuit sizing that takes into account both energy and performance • Topology dependent • Lower limit imposed by the noise requirements Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 6 Other Energy Reduction Techniques • Low-Swing Circuit Techniques • Reduced-swing Clk drivers • CSE redesign • N-only CSEs with Low-Vcc Clk • Clock Gating • Global • Local • Dual-Edge Triggering • Latch-mux • Pulsed-latch • Flip-flop Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 7 Other Energy Reduction Techniques • Low-Swing Circuit Techniques • Reduced-swing Clk drivers • CSE redesign • N-only CSEs with Low-Vcc Clk • Clock Gating • Global • Local • Dual-Edge Triggering • Latch-mux • Pulsed-latch • Flip-flop Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 8 Low-Swing Clocking, Option #1: Clock Driver Re-design VDD GND Clk Cp1 CPT CNT Cn1 Cp2 CPB CNB Cn2 CA VDD Vthp CPT CPB CNT CNB H-VDD CB Vthn GND 50% power reduction with half-swing clock (minus some penalty in clock drivers) Clock driver for half-swing clocking (Kojima at al. 1995), Copyright © 1995 IEEE Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 9 Low-Swing Clocking, Option #2: CSE Re-design PMOS does not fully turn-off Vwell > VDD VDD E(a),(b) ~VDD (VDD-Vth ) E(c) ~(VDD -Vth )2 Clk (VDD -Vth ) V DD-Low (a) VDD D Clk (V DD-Low ) n Clk (c) Clk (VDD -nVth ) (b) Q Q Clock drivers Reduced clock-swing flip-flop (Kawaguchi and Sakurai, 1998), Copyright © 1998 IEEE Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 10 Low-Swing Clocking, Option #3: N-only CSEs Clk D Clk Clk Q QM SM N1 N2 SS Clk N3 N4 Clk N-only clocked transistors, M-S Latch Example (N1 and N2 improve pull-up on SM) N-Only clocked M-S latch Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 11 Other Energy Reduction Techniques • Low-Swing Circuit Techniques • Reduced-swing Clk drivers • CSE redesign • N-only CSEs with Low-Vcc Clk • Clock Gating • Global • Local • Dual-Edge Triggering • Latch-mux • Pulsed-latch • Flip-flop Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 12 Clock Gating, Option #1: Global Clock Gating (a) (b) 0 In 1 S D Load Clk REG Time-mux (no gating!) In Q EN Clk D Q REG Global Clk Gating Used to save clocking energy when data activity is low (a) Nongated clock circuit, (b) gated clock circuit. (Kitahara et al. 1998), Copyright © 1998 IEEE Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 13 Clock Gating, Option #2: Local Clock Gating CP CP QM D Clk CP Q CP CP CP Pulse Generator CPI Clock Control P1 Data-Transition Look-Ahead CP CP Data-transition look-ahead latch (Nogawa and Ohtomo, 1998), Copyright © 1998 IEEE Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 14 Local Clock Gating, Another Example Clk * S Clk enabled only when D Q S' Q Q N * R R' N D D Clk Clk1 S' R' Q Q R S Conditional capture flip-flop (Kong et al. 2000), Copyright © 2000 IEEE Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 15 Other Energy Reduction Techniques • Low-Swing Circuit Techniques • Reduced-swing Clk drivers • CSE redesign • N-only CSEs with Low-Vcc Clk • Clock Gating • Global • Local • Dual-Edge Triggering • Latch-mux • Pulsed-latch • Flip-flop Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 16 Dual-Edge Triggering, Option #1: Latch-Mux D D Q 0 Clk C Q D Q C Q 1 S Q Used to save clocking energy regardless of data activity! Dual-edge-triggered latch-mux design Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 17 DET Latch-Mux: Circuit Example Clk D Clk Clk Clk Clk Clk Clk Clk Q Clk Clk Dual-edge-triggered latch-mux circuit (Llopis and Sachdev, 1996), Copyright © 1996 IEEE Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 18 Dual-Edge Triggering, Option #2: Pulsed-Latch D Pulse Gen D Q Q C Q Q C Pulse Gen Clk C Dual-edge-triggered pulsed-latch design Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 19 DET Pulsed-Latch: Circuit Examples Clk Clk2 Clk Clk Clk1 Clk1 Clk1 Clk Clk Clk2 Clk Clk1 Clk1 Clk1 Clk D D Clk2 Clk Q Q Clk2 Clk Clk2 (a) Clk2 Clk1 Clk Clk Clk1 Clk Clk (b) Pulsed-latch: (a) single-edge-triggered; (b) dual-edge-triggered Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 20 Dual-Edge Triggered Flip-Flop D D S C R Q Q Q Q CL Clk D S C R Dual-edge-triggered flip-flop design Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 21 DET Flip-Flop: Circuit Example 1st stage: PG Latch X 1st stage: PG Latch Y CL Clk Clk SX SY D D Clk1 Clk2 Clk Q Clk Clk Clk1 Clk Clk1 Clk2 DET symmetric pulse-generator flip-flop Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 22 Clock Distribution for DET CSEs s = Storage Element s/2L-1 s s/2L-1 PLL M/4L-1 storage elements H-tree clock distribution network Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 23 PClk (DET) / P Clk (SET) Clocking Power: SET vs. DET 1.2 Single edge better Double edge better 1.0 0.8 0.6 0.4 0.2 1 2 3 0.0 0 0.3 0.6 0.9 1.2 1.5 CClk-CSE,SE /CWIRE-L DET wins if the total clock load capacitance is below 2x the capacitance of a SET design Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 24 Glitch Tolerant Design • Are CSEs that are the best in terms of active power also favorable in terms of glitch power? • The opposite is true • So, let’s investigate how glitch power scales with data and glitching activity Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 25 Average Glitching Energy in CSEs Comparison of average glitching energy in CSEs (Markovic at al. 2001), Copyright © 2001 IEEE Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 26 Comparison of Eglitching and Eswitching Glitching energy as a percentage of switching energy in representative CSEs showing the greatest glitch sensitivity of the gated designs Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 27 Summary • Energy best reduced by VDD scaling • Penalty in performance • Reducing Clk swing only reduces EClk • Still penalty in performance • Clock gating • Reduces EClk at low-activity • No penalty in perf. if gating is outside crit-path • Dual-Edge Triggering • Reduces EClk ideally by 2x • Small or no performance degradation Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 28