Design of Variable Input Delay Gates for Low Dynamic Power Circuits

Tezaswi Raja, Transmeta Corp., Santa Clara, CA
Vishwani D. Agrawal, Dept. of ECE, Auburn University
Michael L. Bushnell, Dept. of ECE, Rutgers University
Research funded by: National Science Foundation

Sep 23, 2005 Tezaswi Raja: PATMOS Conf. Leuven.

Talk Outline
- Motivation
- Transistor Level Design of Variable Input Delay Gate
- Results
- References
- Conclusion and Future Work

Motivation: Variable Input Delay Gates
[Figure: three versions of an example circuit annotated with delay values]
- Unoptimized: produces glitches; wastes power.
- Buffer optimized: glitches removed; active power consumed in buffer; leakage paths added through buffer.
- Variable input delay gate: glitches removed; no extra leakage paths added.
Issues:
- Can we design such a gate?
- How much can the delays through IO paths differ by?

Problem Statement
Design a gate at the transistor level such that:
- The gate has different delays along different IO paths.
- The maximum achievable difference in delay between any two paths through the gate (ub) can be quantified.

Transistor Level Implementation
We propose three new implementations of the variable input delay gate:
- Capacitance manipulation method, where the input capacitance offered by the respective transistor pair is varied.
- Pass transistor added design, where an extra transistor is added to increase the resistance and thereby the input delay. We propose the addition of:
  - a single nMOS transistor
  - a CMOS pass transistor
We describe the pass transistor added design in detail here. The first design is documented in the paper.

Concept of Increasing Resistance
[Figure: gate model labeled with Ron, Cin, and Cr]
Delay = Ron (Cp + Cr + Cin)
Energy = 0.5 (Cr + Cin) V^2
- Need a CMOS gate with different delays along different IO paths.
- Note that the resistance of the path influences only the delay and not the energy consumed.
- Hence, adding more resistance can be the best way to add delay without wasting more energy.
- Solution: add another transistor in series with the path.

Single nMOSFET Added Design
[Figure: gate with a series resistance Rs on one input; labels Ron, Rs, Cr, Cin; path delays d3,1 and d3,2]
d3,1 = Ron (Cr + Cin) + Rs Cin   (output delay + input delay)
d3,2 = Ron (Cr + Cin)
Energy = 0.5 (Cr + Cin) V^2
- The input delay can be added by placing an nMOS transistor in series with the desired path.
- The added resistance does not increase the energy per transition.

Theoretical Calculation of ub
[Figure: logic 1 and logic 0 transmission through the series resistance Rs, showing the current Ids, the voltage drop Ids Rs, and the linear and cutoff regions]
Logic 1 transmission, for pMOS cutoff (pMOS threshold): 1 - λ - Ids Rs > Vdd - Vtp
Logic 0 transmission, for nMOS cutoff (nMOS threshold): Ids Rs < Vtn
- The constraints give upper bounds on Rs and λ.
- The upper bound on Rs determines the upper bound on ub.
- The calculation can be made specific to any technology.
- Note: nMOS conducts logic 0 well, but logic 1 is degraded (shown by λ).

Effect of Input Slope
- The theoretical ub cannot be realized in practice due to noise issues.
- Increased resistance degrades the slope of a signal, and we use the CMOS gate following it to regenerate the slope.
- The regenerative capability of a gate is limited, and this governs the practical value of ub.
- The slope allowed in a design depends on the noise specifications of the circuit.

Single nMOSFET Added Design
Advantages:
- Complete independent control of input delays.
- ub is very high compared to the capacitance manipulation method.
- Much lower overhead than a conventional buffer.
- Can be integrated into full-custom as well as standard-cell place-and-route design flows.
Design Issues:
- The nMOSFET degrades the signal when passing logic 1. Hence, it increases the leakage of the transistors in the fanout stages. However, this happens for certain input combinations only.
- Short-circuit current is a function of the ratio of input/output slopes. Since the inserted resistance degrades the input slope, it might increase short-circuit power by a minor amount.

CMOS Pass Transistor Added Design
[Figure: gate with a series resistance Rs on one input; labels Ron, Rs, Cr, Cin; path delays d3,1 and d3,2]
d3,1 = Ron (Cr + Cin) + Rs Cin   (output delay + input delay)
d3,2 = Ron (Cr + Cin)
Energy = 0.5 (Cr + Cin) V^2
- The input delay can be added by placing a CMOS pass transistor in series with the desired path.
- This does not degrade the signal, as the two transistors together conduct both logic values well.

Theoretical Calculation of ub
[Figure: logic 1 and logic 0 transmission through the series resistance Rs, showing the current Ids, the voltage drop Ids Rs, and the linear and cutoff regions]
Logic 1 transmission, for pMOS cutoff (pMOS threshold): 1 - Ids Rs > Vdd - Vtp
Logic 0 transmission, for nMOS cutoff (nMOS threshold): Ids Rs < Vtn
- The constraints give upper bounds on Rs and λ.
- The upper bound on Rs determines the upper bound on ub.
- The calculation can be made specific to any technology.
- Note that the resistance is the parallel combination of the resistances of the two transistors.

CMOS Pass Transistor Added Design
Advantages:
- No signal degradation for any logic value.
- No increase in leakage current in the fanout stage.
- All other advantages of the nMOSFET added design.
Design Issues:
- Two transistors are added instead of one.
- Effective resistance per unit length is lower due to the parallel combination of resistances.

Technology Mapping
[Flowchart: delay required -> look-up table for sizes -> transistor sizes -> error acceptable? If yes, done. If no, compute the sensitivity of each transistor size to delay, increment that transistor dimension, and check again.]
- Determine the sizes of the transistors in a gate for the given delay and given load capacitance.
- The first guess is given by the look-up table.
- The second stage is sensitivity driven.
- This reduces the complexity of the transistor search.
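The two-stage sizing flow on the Technology Mapping slide can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function `size_gate`, the toy delay model `toy_delay`, the dimension names `w_gate` and `l_pass`, and all numeric values are hypothetical. The sensitivity step is approximated here by trying a fixed increment on each dimension and keeping the one that brings the delay closest to the target.

```python
from typing import Callable, Dict, Optional


def size_gate(
    target_delay: float,
    initial_sizes: Dict[str, float],
    delay_model: Callable[[Dict[str, float]], float],
    tolerance: float = 0.01,
    step: float = 0.1,
    max_iters: int = 1000,
) -> Dict[str, float]:
    """Refine a look-up-table first guess until the delay error is acceptable."""
    sizes = dict(initial_sizes)  # stage 1: first guess (assumed to come from a table)
    for _ in range(max_iters):
        error = abs(delay_model(sizes) - target_delay)
        if error <= tolerance:
            break  # error acceptable: stop
        # Stage 2: probe how each dimension affects the delay error.
        best_name: Optional[str] = None
        best_error = error
        for name in sizes:
            trial = dict(sizes)
            trial[name] += step
            trial_error = abs(delay_model(trial) - target_delay)
            if trial_error < best_error:
                best_name, best_error = name, trial_error
        if best_name is None:
            break  # no single increment improves the error
        sizes[best_name] += step  # increment that transistor dimension
    return sizes


def toy_delay(sizes: Dict[str, float]) -> float:
    """Hypothetical model: a wider gate transistor is faster, a longer pass transistor is slower."""
    return 1.0 / sizes["w_gate"] + 0.5 * sizes["l_pass"]


first_guess = {"w_gate": 1.0, "l_pass": 1.0}
sized = size_gate(2.0, first_guess, toy_delay)
print(sized, toy_delay(sized))
```

In this toy setting only the pass-transistor length grows, because widening the gate transistor moves the delay away from the target. A real flow would replace `toy_delay` with circuit simulation or characterized cell data.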
Physical Level Verification
c7552               Un-optimized       Optimized (ub = 10)
Gate count          3827               3828
Transistor count    ≈ 40,000           ≈ 45,000
Critical delay      2.15 ns            2.15 ns
Area                710 x 710 um^2     760 x 760 um^2 (1.14x)

Instantaneous Power Savings
- Peak power savings = 68%

Average Energy Savings
- Average energy savings = 58%

Related Publications
Theses:
1. "Minimum Dynamic Power Design with Variable Input Delay Logic," PhD Thesis, Dept. of Elec. and Comp. Eng., Rutgers University, May 2004.
2. "Minimum Dynamic Power Design of CMOS Circuits using a Reduced Constraint Set Linear Program," MS Thesis, Dept. of Elec. and Comp. Eng., Rutgers University, May 2002.
Journal Papers:
1. T. Raja, V. D. Agrawal, and M. L. Bushnell, "Low Power CMOS Design for Minimum Power and Highest Speed using a New Gate Design," submitted to IEEE Transactions on VLSI (IEEE TVLSI), April 2005.
Conference Papers:
1. T. Raja, V. D. Agrawal, and M. L. Bushnell, "Design of Variable Input Delay Logic for Low Dynamic Power Circuits," Proc. of PATMOS Conf., Sep 2005.
2. T. Raja, V. D. Agrawal, and M. L. Bushnell, "Variable Input Delay Logic and its Application to Low Power Design," Proc. 18th Int'l. Conf. on VLSI Design, Jan 2005.
3. T. Raja, V. D. Agrawal, and M. L. Bushnell, "CMOS Design of Circuits for Minimum Power and Highest Speed," Proc. 17th Int'l. Conf. on VLSI Design, Jan 2004.
4. T. Raja, V. D. Agrawal, and M. L. Bushnell, "Minimum Dynamic Power Design of CMOS Circuits using a Reduced Constraint Set Linear Program," Proc. 16th Int'l. Conf. on VLSI Design, pp. 527-532, Jan 2003.

Conclusion
- Pass transistors (nMOS and CMOS) can be used as a delay element instead of a buffer.
- There are limitations to the size of the transmission gate used, based on:
  - input slope degradation
  - signal degradation when passing a high signal through nMOS
- A transmission gate can be used for delay as long as the delay does not exceed ub.
- Described the technique to calculate ub for a given technology.
- Described the algorithm for sizing of the three variable input delay gates for given delay requirements.
- Presented results on power savings using these new gates.
Future Work:
- Include leakage power in the analysis.
- Analyze results for more recent technologies.

Thank you

Design Issues and FAQ
Is this not similar to input re-ordering techniques?
- Input re-ordering can change only the rise or the fall delay, but not both.
- The capacitance manipulation method also cannot have completely independent control over both rise and fall delays, but input re-ordering has zero control.
- The ub obtained by input re-ordering is much smaller than what can be obtained by capacitance manipulation.

Design Issues and FAQ
Does this increase leakage power?
- Observed no increase for 0.25 um technology.
- Need to investigate for present technologies.
- Can be complemented with known leakage reduction techniques.
How big should the standard cell library be?
- For c7552 with 3827 gates, we needed 155 different standard cells generated by Prolific.
- Area can be further reduced if these cells are custom designed.

Transistor Overhead
[Figure: transistor overhead chart, with entries numbered as follows]
- 1, 4: nMOS added design (for maxdelay = 1 and 2)
- 2, 5: CMOS added design (for maxdelay = 1 and 2)
- 3, 6: buffer added design (for maxdelay = 1 and 2)
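
As a closing numerical check of the first-order model on the pass transistor slides, the sketch below evaluates d3,1 = Ron (Cr + Cin) + Rs Cin, d3,2 = Ron (Cr + Cin), and Energy = 0.5 (Cr + Cin) V^2 for a few series resistances. The function names and all component values (resistances, capacitances, supply voltage) are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
def path_delays(r_on, r_s, c_r, c_in):
    """Return (d31, d32) using the first-order RC expressions from the slides."""
    d32 = r_on * (c_r + c_in)   # path without added series resistance
    d31 = d32 + r_s * c_in      # output delay plus added input delay
    return d31, d32


def switching_energy(c_r, c_in, vdd):
    """Energy per transition; note that it does not depend on Rs."""
    return 0.5 * (c_r + c_in) * vdd ** 2


# Hypothetical values, for illustration only.
R_ON = 10e3    # ohms
C_R = 5e-15    # farads
C_IN = 2e-15   # farads
VDD = 2.5      # volts

for r_s in (0.0, 20e3, 40e3):
    d31, d32 = path_delays(R_ON, r_s, C_R, C_IN)
    energy = switching_energy(C_R, C_IN, VDD)
    print(f"Rs={r_s:.0f} ohm  d3,1={d31:.3e} s  d3,2={d32:.3e} s  E={energy:.3e} J")
```

In this model the delay difference d3,1 - d3,2 grows linearly with Rs while the energy stays fixed, which is the property the pass transistor designs rely on.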