Minimum Dynamic Power Design Using Variable Input Delay CMOS Logic

Vishwani D. Agrawal, Dept. of ECE, Auburn University, AL, USA
http://www.eng.auburn.edu/~vagrawal
Tezaswi Raja, Transmeta Corp., San Jose, CA, USA
Michael L. Bushnell, Dept. of ECE, Rutgers University, NJ, USA
Research Funded by: National Science Foundation
Jan 2005 Agrawal: Low Power Design

Talk Outline
Motivation
Background on Glitch Elimination Techniques
Problem Statement
New Variable Input Delay Logic
Transistor Level Design of Variable Input Delay Gate
Results
Physical Level Implementation
Conclusion and Future Work

What Are Glitches?
[Figure: glitch example; panels labeled Delay = 1 and Delay = 2]
Glitches occur due to differential (unbalanced) path delays.
Glitches are transients that are unnecessary for the correct functioning of the circuit.
Glitches waste power in CMOS circuits.

Prior Work
Delay Balancing for Glitch Elimination: balancing delays by adding buffers on select paths. Ref: Chandrakasan and Brodersen, and other books.
Hazard Filtering for Glitch Elimination: glitch suppression by increasing the inertial delay of gates. Ref: Agrawal et al., VLSI Design '97, '99, '03, '04.
Gate Sizing for Glitch Elimination: every gate is modeled as an equivalent inverter; the model is non-linear. Ref: Berkelaar et al., IEEE Trans. on Circuits and Systems '96.
Transistor Sizing for Area-Speed Optimization: size the width and length of every transistor to get the exact delay; the model is non-linear, and the large search space causes convergence problems. Ref: Fishburn et al., ICCAD '85.

Example: Why Were Buffers Necessary?
[Figure: example circuit of unit-delay gates; critical path delay = 3]
The delay unit is the smallest delay possible for a gate in a given technology.
The critical path is the longest delay path in the circuit; it determines the speed of the circuit.
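A toy model can make the glitch definition concrete. The following Python sketch is hypothetical (it is not from the talk): a single input x reaches a zero-delay XOR gate through two paths with delays of 1 and 2 time units. The XOR of a signal with itself is logically always 0, so every output transition is a glitch; counting transitions gives a rough proxy for wasted energy, since each transition of a CMOS node dissipates about 0.5 C V^2.

```python
# Toy discrete-time model (hypothetical, for illustration only): one input x
# reaches a zero-delay XOR gate through two paths with delays 1 and 2.


def delayed(wave, d):
    """Shift a waveform (one 0/1 sample per time unit) right by d units,
    holding the initial value."""
    return [wave[0]] * d + wave[:len(wave) - d]


def xor_wave(a, b):
    """Sample-by-sample XOR of two waveforms."""
    return [u ^ v for u, v in zip(a, b)]


def transitions(wave):
    """Number of output changes; each one costs roughly 0.5*C*V^2 in CMOS."""
    return sum(1 for u, v in zip(wave, wave[1:]) if u != v)


x = [0, 1, 1, 1, 1, 1]                              # x rises at t = 1
unbalanced = xor_wave(delayed(x, 1), delayed(x, 2))
balanced = xor_wave(delayed(x, 2), delayed(x, 2))

print(unbalanced, transitions(unbalanced))          # [0, 0, 1, 0, 0, 0] 2
print(balanced, transitions(balanced))              # [0, 0, 0, 0, 0, 0] 0
```

Equalizing the two path delays (the delay-balancing approach listed under prior work) removes the pulse, as the `balanced` case shows; the rest of the talk aims at the same effect without inserting buffers.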
[Figure: waveforms at the inputs and output of the first gate]
For glitch-free operation of the first gate: differential delay at inputs < inertial delay. OK.

[Figure: waveforms at the inputs and output of the second gate]
For glitch-free operation of the second gate: differential delay at inputs < inertial delay. OK (assuming equality does not produce a glitch).

[Figure: waveforms at the inputs and output of the third gate]
For glitch-free operation of the third gate: differential delay at inputs < inertial delay. Not true for gate 3.

[Figure: the example circuit with an added delay buffer]
For glitch-free operation with no increase in I/O delay, a delay buffer must be added.
The buffer is necessary for conventional gate design, because only the gate output delay is controllable.

Controllable Input Delay Gates
[Figure: the example circuit with controllable gate input delays and its waveforms]
Assume that gate input delays are controllable.
Glitches can then be suppressed without buffers.

Problem Statement
Find a glitch reduction technique such that:
All glitches in the circuit are eliminated.
No delay buffers are inserted in the circuit.
The circuit operates at the highest speed permitted by the device technology.
The technique is scalable to large circuits.
The circuits are realizable at the physical level of design.
Note: The objective is to minimize switching power. Hence, no attempt is made to reduce short-circuit and leakage power, which are an order of magnitude lower in present CMOS technologies; those components of power may be addressed in future research.

New Variable Input Delay Logic
I/O path delay through a gate = input delay + output delay.
Output delay: the propagation delay through a gate from the inputs to the output.
Input delay: extra delay that can be added on a single I/O path through the gate, controlled independently of the other input delays.
Variable Input Delay Logic: logic-level design of circuits using components with variable input and output delays along different I/O paths through the gate.

Delay Model for a New Gate
[Figure: gate 3 with inputs 1 and 2; the I/O path delays are d_{3,1} + d_3 and d_{3,2} + d_3]
Separate the output (inertial) delay and input delay variables:
$d_3$: output delay of the gate.
$d_{3,1}$: input delay of the gate along the path from input 1 to output 3.
Technology constraint: $0 \le d_{3,1}, d_{3,2} \le ub$.
The input delay difference has an upper bound, which we define as the Gate Input Differential Delay Upper Bound (ub).

Gate Input Differential Delay Upper Bound (ub)
ub is a measure of the maximum difference in delay between any two I/O paths through a gate that can be designed in a given CMOS technology.
Arbitrary input delays cannot be realized in practice because of technology limitations at the transistor and layout levels. The bound ub is the limit of flexibility that the technology allows the designer at those levels.
The following feasibility condition must be imposed while determining delays for glitch suppression: $0 \le d_{i,j} \le ub$.

New Linear Programs
We propose two new LPs for designing circuits, chosen according to the design specification.
Minimum dynamic power (MDP) LP: the circuit consumes the least possible power and operates at the highest possible speed for that power.
Delay specification (DS) LP: the circuit meets a given delay requirement, and does so by adding the smallest number of buffers.

New MDP LP Example
[Figure: example circuit with primary inputs 1 to 4, gate 5 (inputs 1, 2), gate 6 (inputs 2, 3) and gate 7 (inputs 5, 6, 4); the I/O path from input j through gate i is labeled d_{i,j} + d_i]
Gate inertial delay variables: $d_5, \dots, d_7$.
Gate input delay variables: $d_{i,j}$ for every path through gate i from input j.
Corresponding window variables: $t_5, \dots, t_7$ and $T_5, \dots, T_7$ (bounds on the earliest and latest transition times at each gate output).
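For fixed delay values, the window variables can be evaluated directly instead of optimized. The Python sketch below is a hypothetical helper, not code from the talk. It assumes all primary inputs switch at time 0, takes t_g as the earliest and T_g as the latest arrival over all input paths plus d_g, and flags every gate whose input arrival spread exceeds its inertial delay (the glitch-free condition from the example, with equality assumed not to glitch). It also rejects input delays outside the feasibility range 0 <= d_{i,j} <= ub. The three-gate netlist is an assumed example in the spirit of the buffer example: gate g3 sees input a directly and through the chain g1 -> g2.

```python
def arrival_windows(netlist, out_delay, in_delay, primary_inputs, ub):
    """Propagate [t, T] windows and report gates that could glitch.

    netlist:   {gate: [fanin nodes]}, listed in topological order
    out_delay: {gate: d_g}, the output (inertial) delay
    in_delay:  {(gate, fanin): d_{g,j}}, extra input delay (default 0)
    Returns ({node: (t, T)}, [gates whose input spread exceeds d_g]).
    """
    for (gate, fanin), d in in_delay.items():
        if not 0 <= d <= ub:
            raise ValueError(f"input delay d_{gate},{fanin} = {d} violates 0 <= d <= ub")
    window = {pi: (0, 0) for pi in primary_inputs}   # all inputs switch at t = 0
    glitchy = []
    for gate, fanins in netlist.items():
        early = [window[j][0] + in_delay.get((gate, j), 0) for j in fanins]
        late = [window[j][1] + in_delay.get((gate, j), 0) for j in fanins]
        if max(late) - min(early) > out_delay[gate]:  # equality assumed glitch-free
            glitchy.append(gate)
        window[gate] = (min(early) + out_delay[gate], max(late) + out_delay[gate])
    return window, glitchy


# Hypothetical circuit: g3 sees input a directly and through the chain g1 -> g2.
NETLIST = {"g1": ["a"], "g2": ["g1"], "g3": ["a", "g2"]}
UNIT = {"g1": 1, "g2": 1, "g3": 1}

win, bad = arrival_windows(NETLIST, UNIT, {}, ["a"], ub=10)
print(win["g3"], bad)   # (1, 3) ['g3']

# An input delay of 2 on the a -> g3 path suppresses the glitch without a
# buffer, and the latest output time (the critical delay) stays at 3.
win, bad = arrival_windows(NETLIST, UNIT, {("g3", "a"): 2}, ["a"], ub=10)
print(win["g3"], bad)   # (3, 3) []
```

With ub = 1, an input delay of 1 on the same path already satisfies the condition, because equality is assumed not to glitch; the window at g3 is then (2, 3).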
Inertial delay constraint for gate 5: $d_5 \ge 1$.
Input delay (feasibility) constraints for gate 5: $0 \le d_{5,1} \le ub$ and $0 \le d_{5,2} \le ub$.

Differential delay constraints for gate 5:
$T_5 \ge T_1 + d_{5,1} + d_5$;  $T_5 \ge T_2 + d_{5,2} + d_5$;
$t_5 \le t_1 + d_{5,1} + d_5$;  $t_5 \le t_2 + d_{5,2} + d_5$;
$d_5 \ge T_5 - t_5$.

I/O delay constraint for each primary output (PO) of the circuit: $T_7 \le \mathit{maxdelay}$.
maxdelay is the parameter that gives the delay of the critical path; it determines the speed of operation of the circuit.

Objective function: minimize $\mathit{maxdelay}$.
This gives the fastest possible circuit with minimum dynamic power, subject to the feasibility condition of the technology.

Solution Curves
[Figure: power versus maxdelay, comparing previous solutions with new MDP LP solutions for ub = 0, 5, 10, 15 and ∞; annotations mark the power consumed by buffers, the minimum dynamic power, and the fastest possible design in any technology]

Delay Specification LP
Used when the design must meet a given delay specification and the designer is willing to sacrifice some dynamic power by inserting buffers.
Modifications to the MDP LP:
Insert buffer variables at every fanout stem and branch and at the PIs (similar to the linear constraint set method of Raja et al.).
maxdelay is a given parameter: the maximum critical path delay allowed by the specification.
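Both LPs can be prototyped with an off-the-shelf solver. The sketch below assumes SciPy's `scipy.optimize.linprog` with the HiGHS method, not the AMPL model used in the talk's design flow, and it makes simplifying assumptions of its own: primary inputs switch at time 0 (t = T = 0), every gate has inertial delay of at least 1, and in DS mode a buffer variable sits on every fanin edge (a simplification of the stem, branch and PI placement described above), with the objective of minimizing total buffer delay. The function `solve_delay_lp` and the netlist are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def solve_delay_lp(netlist, primary_inputs, outputs, ub, maxdelay=None):
    """MDP LP when maxdelay is None; DS-style LP with a fixed maxdelay otherwise.

    netlist: {gate: [fanins]} in topological order; primary inputs switch at 0.
    Returns (optimal objective, {variable name: value}).
    """
    ds = maxdelay is not None
    names = []
    for g, fanins in netlist.items():
        names += [f"d_{g}", f"t_{g}", f"T_{g}"] + [f"d_{g},{j}" for j in fanins]
        if ds:
            names += [f"b_{g},{j}" for j in fanins]   # buffer delay on edge j -> g
    if not ds:
        names.append("maxdelay")
    idx = {n: k for k, n in enumerate(names)}
    rows, rhs = [], []

    def add_leq(coeffs, bound):
        """Append the constraint sum(coeff * var) <= bound."""
        row = np.zeros(len(names))
        for n, v in coeffs.items():
            row[idx[n]] += v
        rows.append(row)
        rhs.append(bound)

    for g, fanins in netlist.items():
        for j in fanins:
            path = {f"d_{g},{j}": 1.0, f"d_{g}": 1.0}
            if ds:
                path[f"b_{g},{j}"] = 1.0
            late = {f"T_{g}": -1.0, **path}                              # T_g >= T_j + path
            early = {f"t_{g}": 1.0, **{n: -v for n, v in path.items()}}  # t_g <= t_j + path
            if j not in primary_inputs:       # PI windows are fixed at t = T = 0
                late[f"T_{j}"] = 1.0
                early[f"t_{j}"] = -1.0
            add_leq(late, 0.0)
            add_leq(early, 0.0)
        add_leq({f"T_{g}": 1.0, f"t_{g}": -1.0, f"d_{g}": -1.0}, 0.0)   # d_g >= T_g - t_g
    for o in outputs:
        if ds:
            add_leq({f"T_{o}": 1.0}, maxdelay)                # T_o <= given maxdelay
        else:
            add_leq({f"T_{o}": 1.0, "maxdelay": -1.0}, 0.0)   # T_o <= maxdelay

    # Objective: minimize maxdelay (MDP) or the total buffer delay (DS).
    c = np.zeros(len(names))
    for n in names:
        if n == "maxdelay" or n.startswith("b_"):
            c[idx[n]] = 1.0

    def bound(n):
        if n.startswith("d_") and "," in n:
            return (0, ub)       # feasibility: 0 <= d_{g,j} <= ub
        if n.startswith("d_"):
            return (1, None)     # inertial delay of at least one unit
        return (0, None)

    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[bound(n) for n in names], method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun, dict(zip(names, res.x))


NETLIST = {"g1": ["a"], "g2": ["g1"], "g3": ["a", "g2"]}   # hypothetical circuit
for ub in (0, 1, 2):
    best, _ = solve_delay_lp(NETLIST, {"a"}, ["g3"], ub)
    print(f"MDP, ub={ub}: maxdelay = {best:.2f}")   # 4.00, then 3.00, then 3.00
for ub in (0, 1):
    cost, _ = solve_delay_lp(NETLIST, {"a"}, ["g3"], ub, maxdelay=3)
    print(f"DS, ub={ub}, maxdelay=3: total buffer delay = {cost:.2f}")
```

For this toy netlist the LP optima agree with a hand derivation: with ub = 0 the circuit must slow down to maxdelay = 4 (the inertial delay of g3 has to grow to 2), while any ub of 1 or more allows maxdelay = 3. With maxdelay fixed at 3, the DS variant needs one unit of buffer delay when ub = 0 and none when ub = 1.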
Delay Specification LP: Components of the LP
Gate constraints: unchanged.
Input delay (feasibility) constraints: unchanged for the same ub.
Differential delay constraints: unchanged.
maxdelay constraints: unchanged, but maxdelay is a given parameter.
Objective function: minimize $\sum_{j \in \text{buffers}} d_j$.

Solution Curves
[Figure: power versus maxdelay, comparing previous solutions, new MDP LP solutions and new DS LP solutions for ub = 0, 5, 10, 15 and ∞; annotations mark the power consumed by buffers, the minimum dynamic power, and the fastest possible design in any technology]

Transistor Level Implementation
[Figure: CMOS gate with on-resistance Ron driving routing capacitance Cr and fanout input capacitance Cin, with input delays d3,1 and d3,2]
Conventional CMOS gate design:
$\text{Delay} = R_{on}(C_r + C_{in})$
$\text{Energy} = \tfrac{1}{2}(C_r + C_{in})V^2$
where $C_r$ is the routing capacitance and $C_{in}$ is the input capacitance of the fanout.
The delay can be changed by changing either the resistance or the capacitance. The resistance does not affect the energy per transition.

Possible implementations of the variable input delay gate:
Capacitance manipulation method: the input capacitance offered by the respective transistor pair is varied.
Pass transistor added design: an extra transistor is added to increase the resistance and thereby the input delay. We propose the addition of either a single nMOS transistor or a CMOS pass transistor.
We describe the single nMOS transistor added design in detail here. The other two are documented in the thesis.

Single nMOSFET Added Design
[Figure: gate 3 with a series nMOS transistor of resistance Rs on input 1]
$d_{3,1} = R_{on}(C_r + C_{in}) + R_s C_{in}$
$d_{3,2} = R_{on}(C_r + C_{in})$
$\text{Energy} = \tfrac{1}{2}(C_r + C_{in})V^2$
$d_{3,1}$ = output delay + input delay.
The input delay is added by an nMOS transistor in series with the desired path.
The added resistance does not increase the energy per transition.

Effect of Input Slope
[Figure: series resistance Rs at a gate input]
Too large a ub cannot be realized in practice because of noise issues.
Increased resistance degrades the slope of a signal, and the CMOS gate that follows is used to regenerate the slope. The regenerative capability of a gate is limited, and this determines the practical value of ub. The slope allowed in a design depends on the noise specifications of the circuit.

Single nMOSFET Added Design
Advantages:
Almost completely independent control of input delays.
ub is very high compared to the capacitance manipulation method.
Very little overhead compared to a conventional buffer.
Can be integrated into full-custom as well as standard-cell place-and-route design flows.
Design issues:
An nMOSFET degrades the signal when passing logic 1. Hence, it increases the leakage of the transistors in the fanout stages; however, this happens only for certain input combinations.
Short-circuit current is a function of the ratio of input and output slopes. Since inserting resistance increases the input slope, short-circuit power might increase by a minor amount.

CMOS Pass Transistor Added Design
[Figure: gate 3 with a CMOS pass transistor of resistance Rs on input 1]
$d_{3,1} = R_{on}(C_r + C_{in}) + R_s C_{in}$
$d_{3,2} = R_{on}(C_r + C_{in})$
$\text{Energy} = \tfrac{1}{2}(C_r + C_{in})V^2$
$d_{3,1}$ = output delay + input delay.
The input delay is added by the input CMOS pass transistor in series with the desired path. This does not degrade the signal, because the two transistors together conduct both logic values well.

Technology Mapping
[Flowchart: required delay → look-up table for sizes → transistor sizes → error acceptable? If yes, the sizes are accepted; if no, the sensitivity of delay to each transistor size is used to pick a transistor, that transistor dimension is incremented, and the error is checked again]
Determine the sizes of the transistors in a gate for the given delay and the given load capacitance.
The first guess is given by the look-up table.
The second stage is sensitivity driven.
This reduces the complexity of the transistor size search.
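The delay and energy expressions above can be exercised numerically. The Python sketch below is hypothetical throughout: the parameter values (R_UNIT, C_R, C_IN, VDD) are made-up illustrative numbers, on-resistance is modeled to first order as inversely proportional to transistor width, and `size_series_nmos` only mirrors the shape of the technology-mapping flow (a coarse look-up table for the first guess, then small sensitivity-guided width steps until the delay error is within a tolerance), not its actual tables or step rules. It shows that the series transistor changes d_{3,1} while leaving d_{3,2} and the energy per transition untouched.

```python
# Hypothetical first-order parameters (assumptions, not values from the slides).
R_UNIT = 10e3     # on-resistance of a unit-width transistor (ohms)
C_R = 5e-15       # routing capacitance (F)
C_IN = 2e-15      # input capacitance of the fanout (F)
VDD = 2.5         # supply voltage (V)


def r_on(width):
    """First-order model: on-resistance is inversely proportional to width."""
    return R_UNIT / width


def path_delays(w_driver, w_series):
    """(d_{3,1}, d_{3,2}) for a gate whose input 1 has a series nMOS of width w_series."""
    output_delay = r_on(w_driver) * (C_R + C_IN)   # R_on (C_r + C_in)
    input_delay = r_on(w_series) * C_IN            # R_s C_in
    return output_delay + input_delay, output_delay


def energy_per_transition():
    """0.5 (C_r + C_in) V^2: no resistance appears, so R_s leaves it unchanged."""
    return 0.5 * (C_R + C_IN) * VDD ** 2


def size_series_nmos(target_input_delay, tol=0.02, step=0.01, max_iter=10_000):
    """Two-stage sizing: look-up-table first guess, then sensitivity-driven steps."""
    # Stage 1: coarse table of input delay versus width (hypothetical grid).
    table = [(w, r_on(w) * C_IN) for w in (0.5, 1.0, 2.0, 4.0, 8.0)]
    width = min(table, key=lambda entry: abs(entry[1] - target_input_delay))[0]
    # Stage 2: small relative width changes until the error is acceptable.
    for _ in range(max_iter):
        delay = r_on(width) * C_IN
        error = (delay - target_input_delay) / target_input_delay
        if abs(error) <= tol:
            return width
        sensitivity = -R_UNIT * C_IN / width ** 2   # d(delay)/d(width), negative here
        # Step the width in the direction that reduces the error.
        width *= (1 + step) if error * sensitivity < 0 else 1 / (1 + step)
    raise RuntimeError("sizing did not converge")


target = 15e-12                              # hypothetical target input delay: 15 ps
w_s = size_series_nmos(target)
d31, d32 = path_delays(w_driver=1.0, w_series=w_s)
print(f"W_s = {w_s:.2f} unit widths, d31 - d32 = {(d31 - d32) * 1e12:.1f} ps")
print(f"energy per transition = {energy_per_transition() * 1e15:.2f} fJ")
```

The multiplicative step of 1% is chosen smaller than the ±2% tolerance band, so a step cannot jump over the band and the loop settles after a few dozen iterations for this target.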
Results for Speed of Circuit Using MDP LP
[Figure: normalized maxdelay versus ub, one curve per benchmark circuit]
maxdelay is normalized to the length of the critical path when all gates have unit delay.
Each curve is a different benchmark circuit.
As ub increases, the circuit becomes faster.
The flexibility required for the fastest operation of a circuit is proportional to the size of the circuit.

Power Optimization Using MDP LP (ub = 10)
Circuit   No. of    maxdelay   Norm.    Original power    Optimized power
          vectors              delay    Avg.    Peak      Avg.    Peak
c432        56         71       4.17    1.0     1.0       0.65    0.55
c499        54         34       2.26    1.0     1.0       0.70    0.65
c880        78         45       1.50    1.0     1.0       0.48    0.45
c1355       87         67       2.05    1.0     1.0       0.47    0.36
c1908      144        173       4.32    1.0     1.0       0.54    0.44
c2670       82         35       1.09    1.0     1.0       0.68    0.56
c3540      200        347       7.38    1.0     1.0       0.53    0.43
c5315      157        542      11.06    1.0     1.0       0.53    0.44
c6288      141        124       1.87    1.0     1.0       0.22    0.18
c7552      158         50       1.16    1.0     1.0       0.28    0.26

Power Optimization Using DS LP (ub = 10)
Circuit   Norm.      Conventional gates        Variable input delay gates
          maxdelay   Avg.   Peak   Buffers     Avg.   Peak   Buffers
c432      1.0        0.72   0.67     95        0.69   0.66     61
          2.0        0.62   0.60     66        0.65   0.55      0
c499      1.0        0.91   0.87     48        0.86   0.84      0
          2.0        0.70   0.66      0        0.71   0.65      0
c880      1.0        0.68   0.54     62        0.58   0.45      1
          2.0        0.68   0.52     34        0.56   0.45      0
c1355     1.0        0.58   0.48    224        0.48   0.42     64
          2.0        0.57   0.48    192        0.44   0.39     32
c1908     1.0        0.69   0.59    219        0.56   0.46      5
          2.0        0.59   0.44     70        0.55   0.45      4
c2670     1.0        0.79   0.65    157        0.70   0.56      2
          2.0        0.71   0.58     35        0.69   0.57      0
c3540     1.0        0.64   0.44    239        0.57   0.46      3
          2.0        0.58   0.46    140        0.54   0.43      1
c5315     1.0        0.63   0.52    280        0.57   0.48     26
          2.0        0.60   0.45    171        0.55   0.46      4
c6288     1.0        0.40   0.36    294        0.91   0.87    584
          2.0        0.36   0.34    120        0.21   0.16      0
c7552     1.0        0.38   0.34    366        0.28   0.24      1
          2.0        0.36   0.32    111        0.27   0.24      0
Conventional-gate results are from Raja et al., VLSI Design '03.

Example Circuit
[Figure: example circuit with primary inputs 1 to 4 and gates 5, 6 and 7, shown as the unoptimized circuit, the buffer-optimized circuit and the nMOS-optimized circuit, with gate delays (d = 1 or d = 2) annotated]

Example Circuit: Spectre Results
[Figure: Spectre waveforms versus time for the unoptimized, buffer-optimized and nMOS-optimized circuits]

Physical Level Verification
[Flowchart: AMPL → delays → technology mapping → transistor sizes → cells created using Prolific → standard cell library → standard cell place and route → layout → extracted routing capacitance → routing acceptable? If no, the routing load is fed back; if yes, optimized layout → analog power simulations → energy consumption]

Layouts of c7552 (0.25 µm CMOS)
                    Un-optimized       Optimized (ub = 10)
Gate count          3827               3828
Transistor count    ≈ 40,000           ≈ 45,000
Critical delay      2.15 ns            2.15 ns
Area                710 × 710 µm²      760 × 760 µm² (1.14)

Instantaneous Power Savings
[Figure: instantaneous power waveforms]
Peak power savings = 68%.

Patents and Dissertations
Patents
V. D. Agrawal, "Low Power Circuits Through Hazard Pulse Suppression," U.S. Patent 5,983,007, November 1999.
T. Raja, V. D. Agrawal, and M. L. Bushnell, "Variable Input Delay CMOS Logic and Its Application to Low Power Design," to be submitted to USPTO through Rutgers Univ., May 2004.
Dissertations
T. Raja, "Minimum Dynamic Power Design of CMOS Circuits using a Reduced Constraint Set Linear Program," MS Thesis, Dept. of ECE, Rutgers University, May 2002.
T. Raja, "Minimum Dynamic Power CMOS Design with Variable Input Delay Logic," PhD Thesis, Dept. of ECE, Rutgers University, May 2004.
S. Uppalapati, "Low Power Design of Standard Cell Digital VLSI Circuits," MS Thesis, Dept. of ECE, Rutgers University, October 2004.

Papers
V. D. Agrawal, "Low-Power Design by Hazard Filtering," Proc. 10th Int. Conf. VLSI Design, Jan. 1997, pp. 193-197.
V. D. Agrawal, M. L. Bushnell, G. Parthasarathy, and R. Ramadoss, "Digital Circuit Design for Minimum Transient Energy and a Linear Programming Method," Proc. 12th Int. Conf. VLSI Design, Jan. 1999, pp. 434-439.
T. Raja, V. D. Agrawal, and M. L. Bushnell, "Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program," Proc. 16th Int. Conf. VLSI Design, Jan. 2003, pp. 527-532.
T. Raja, V. D. Agrawal, and M. L. Bushnell, "CMOS Circuit Design for Minimum Dynamic Power and Highest Speed," Proc. 17th Int. Conf. VLSI Design, Jan. 2004, pp. 1035-1040.
T. Raja, V. D. Agrawal, and M. L. Bushnell, "Variable Input Delay CMOS Logic for Low Power Design," Proc. 18th Int. Conf. VLSI Design, Jan. 2005, pp. 368-374.

Conclusion
Main idea: minimum dynamic power, high-speed circuits can be designed if gates with variable input delays are used.
The new design suppresses all glitches without any delay buffers.
It decreases power without loss of speed and with very little increase in area.
Developed a linear program solution to demonstrate the idea.
Developed a new gate design for transistor-level implementation.
Results have been verified by physical layout design of large circuits.
Results show average power savings of up to 58%.
The technique is easily scalable to large circuits.
Leakage power remains a concern; this is ongoing research.

ILP Optimization of Leakage by Dual-Threshold Devices
[Figure: 70 nm CMOS, 90 °C, SPICE evaluation]