Implementation of Constant Delay Model Implementation of Low

1 Implementation of Constant Delay Model Implementation of Low Power Constant Delay Model T. Leela Krishna 1, Dr.J. Selva Kumar 2. 1. SRM University, Kattankulathur, Tamil Nadu 603203, INDIA, e-mail: thota.leelakrishna@gmail.com , 2. SRM University, Kattankulathur, Tamil Nadu 603203, INDIA, e-mail: selvakumar.j@ktr.srmuniv.ac.in ABSTRACT This paper describes a unique logic style description named Constant Delay Logic (CDL) style, targeting at full custom implementation of high speed & low power applications. Pre evaluation of output before the arrival of inputs from the preceding stages is ready becomes an added advantage of CDL style. Besides adjusting the width of timing window, clock allocation & its distribution are considered as important design factors. Power consumption is significantly reduced, but the precharge propagation path delay affects the speed performances and limits the energy–delay product (EDP) improvements. Using 45-nm general purpose CMOS technology, CDL is evaluated for single cycle multi-staged circuit block. Simulation results unveil that the CDL achieves better performance and is more energy efficient than the other logic styles for the implementation on several circuits. Keywords: Constant delay logic model (CDL), contention, glitch, pre – evaluation. 1. INTRODUCTION Ever growing demand of Low Power energy efficient high performance very large scale integration (VLSI) can be addressed at different design levels mentioned as the architectural, algorithmic, circuit, layout, and the process technology level [2]. At the circuit design level, power saving can be done by means of selecting proper choice of a logic style for implementing arbitrary circuits. This is because all the important parameters governing power dissipation like switching capacitance in the entire circuit, transition activity, and short-circuit currents which are strongly influenced by the logic style selected. Application dependency, implementation methodology of the circuit, and the design technique used, various performance aspects become vital, disallowing the articulation of universal rules for optimal logic styles. With ever increasing clock frequency, energy efficient logic style which can be useful in the design of low power consuming circuits became a pre-requisite. The power dissipation characteristics of various existing logic styles are compared qualitatively and quantitatively by actual logic gate implementations & simulations are carried under realistic circuit arrangements and operating conditions [2]. However, the performance enhancement comes with several costs, including a reduced noise margin, a problem of chargesharing and higher power dissipation due to a higher data activity. Several variations of the dynamic domino logic, namely NP domino (NORA domino) [2], zipper domino [4], and data-driven dynamic logic (D3L) [5], [6], have been proposed but they are never widespread in the VLSI industry [7], [8]. A tremendous effort on research has been dedicated to explore new logic styles that go beyond dynamic domino & static logic’s. In specific, source-coupled logic (SCL) [12] demonstrated victor performances that are difficult to achieve by any other logic styles. However, it lacks due to its sufferings from high power dissipation due to a constant current draw, and its differential nature requires complementary signals. Pseudo-NMOS logic uses a single pull-up PMOS transistor, provides both high speed and less transistor count at the expense of high static power consumption as well as reduced output voltage swing. Output prediction logic (OPL) [11] has also shown superior performance in high-speed adders [6]. Yet, OPL requires the generation and distribution of multiphase clock signals with small timing separations and low skews, which are hard to achieve. Though numerous highspeed logic styles have been proposed, dynamic and CDL still remain the enthralling choices when performance is the primary concern. In recent years, a new way of logic operation named feed through logic (FTL) [12], has been proposed, which has demonstrated its high-performance potentiality. 2 Implementation of Constant Delay Model The rest of the paper is organised as follows. Section 2 gives an introduction to the most important existing static & dynamic logic styles and compares them qualitatively. Section 3 provides the importance of FTL & introduction of CDL. Section 4 renders the design of CDL & measurement of unwanted glitch voltage using first order approximation of Taylor’s series. Section 5 dissects the impact of timing window width on CDL & it’s comparison with other logic style’s using different logic expressions. Section 6 concludes with some final comments. A optimum robust operation of CDL is achieved as long as clock signals reaches earlier than the input signals. In this paper, we exhibit that: 1) CDL surpasses other logic styles as it has better energy efficiency which is well suited for high-performance digital blocks & 2) Robustness under extreme process and temperature variations (PVT). 2. LOGIC STYLES 3.1. FTL Logic The logic style employed in logic gates basically influences the speed, wiring complexity, power dissipation, and the size of a circuit. FTL [13] [Figure 2] in CMOS technology basic operation is as follows: Consider dynamic domino logic [Figure 1.], the critical path consists of NMOS logic transistors. In FTL, the part of the clock & logic transistor is replaced & the clock transistor itself is included in critical path. 3. EXPLOITATION OF CDL When CLK is high, the pre discharge period occurs and Out is pulled down to GND through M2. When CLK is low, M1 is on, M2 is off, and the gate moves into the evaluation period. Figure 2: FTL Figure 1: Schematic of Dynamic Domino Logic with Footer transistor A high performance capable feed through logic (FTL) came into existence. The first era of FTL discloses many defects which includes excessive power dissipation & reduced noise margin. To alleviate these problems proposed new high performance logic is Constant Delay Logic (CDL).CDL renders a local window technique & a self-reset circuit where robust logic operation is enabled with minimum power consumption besides maintaining speed advantage of FTL. The uniqueness of CDL from previously existing logic styles is delay which is not affected by the logic expression on firstorder approximation,. CDL does not need complementary signals & can be easily incorporated with static as well as dynamic domino logics. CDL is exempted to suffer from constant static power dissipation which usually occurs in Pseudo NMOS. Moreover, timing requirement of the clock in CDL is not as rigorous as OPL. If inputs (IN) are at logic “1,” Out enters into the contention mode where M1 and transistors in the NMOS pull-down network (PDN) are conducting current simultaneously. If PDN is off, then the output quickly rises to logic “1.” In this case, FTL’s critical path is always a single PMOS transistor. Contempting the performance advantage, FTL is affected from reduced noise margin, excess direct path current and output voltage which is nonzero nominally low. It is because of contention between M1 and NMOS PDN during the evaluation period. Furthermore, cascading multiple FTL stages together to perform complicated logic evaluations is not practical. Consider a chain of inverters implemented in FTL cascaded together and driven by the same clock, as shown in Figure 3. When CLK is low, M1 of every stage turns on, and the output of every stage begins to rise. This will result in false logic evaluations at even numbered (i.e., 2, 4, 6, etc.,) stages since initially there is no contention between M1 and NMOS PDN because all inputs to NMOS transistors are reset to logic “0” during the reset period. 3 Implementation of Constant Delay Model Figure 5: CDL Basic Buffer Figure 3: Inverter Chain with FTL In order to resolve the above problems, CDL is proposed with a schematic shown in Figure 4. Figure 6: Timing diagram 4. DESIGN CONSIDERATIONS 4.1 CDL Sizing: The sizing of INV1-3 and M3–M6 in Figure 5 is close to the minimum size which will not create any effect on area. Modification of the length of INV1-3 provides the required timing window duration based on requirement. Figure 4: CDL Block Diagram Timing block (TB) creates an adjustable window period in order to minimise the static power dissipation. Logic Block (LB) helps to eliminate the unwanted glitch and also makes cascading CDL feasible. A buffer implemented in CDL with schematics of TB and LB is shown in Figure 5 4.1.1 CDL Versus Pseudo-NMOS: Though pseudo-NMOS and CDL are ratioed circuits, they depend on the correct PMOS to NMOS strength ratio to perform correct logic operations[.3] PMOS transistor width is adjusted one-half the strength of the NMOS PDN as a tradeoff between noise margin and speed in pseudo-NMOS [1]. On the other hand, CDL always discharges X to GND when CLK is high, therefore, CDL can be optimized for low-to-high transition only. In order to provide more speed up, PMOS clock transistors in CDL can be upsized larger until the output glitch is maintained at an acceptable level. 4 Implementation of Constant Delay Model 4.2 Output Glitch Measurement: Figure 7 provides a simplified schematic of CDL during the contention mode, where both transistors P1 and N1 are on simultaneously and induce a glitch voltage Figure 7 again transistor N2 operates in the subthreshold region while P2 is working in the linear mode. Equating the two current equations yields 𝑊𝑛2 𝐼𝐷 = 𝐼𝑒 𝐿𝑛2 1 𝑉 − 𝑑𝑠𝑛2 𝑉𝑔𝑠𝑛2 −𝑉𝑡𝑛 (1−𝑒 𝑉𝑇 ) 𝜂𝑉𝑇 𝑊𝑝2 (𝑉𝑑𝑠𝑝2 ) = 𝜇𝑝 𝐶𝑜𝑥 [(𝑉𝑔𝑠𝑝2 − 𝑉𝑡𝑝 )𝑉𝑑𝑠𝑝2 − 𝐿𝑝2 2 Where,𝐼𝑡 ΔV1, which in turn generates another smaller glitch ΔV2. By design, ΔV1 should be small [<VT]. Hence, P1 operates in the saturation region while N1 is in the linear region. The current equation is given as = 𝜇𝑛 𝐶𝑜𝑥 𝐿𝑛1 [ (𝑉𝑔𝑠𝑛1− 𝑉𝑡𝑛 )𝑉𝑑𝑠𝑛1 − 2 = (𝑉𝐷𝐷 − ∆𝑉1 − 𝑉𝑡𝑝 )∆𝑉2 − (−𝑉𝑑𝑠𝑛2 ) ⁄𝑉 𝑇 √(𝑉𝐷𝐷 − △ 𝑉1 − 𝑉𝑡𝑝 ) 2 − 𝐴𝑒 (1) ] (△ 𝑉1 )2 𝑊𝑝1 (𝑉𝐷𝐷 − 𝑉𝑡𝑝 ) = (𝑉𝐷𝐷 − 𝑉𝑡𝑛 ) △ 𝑉𝑡 − (2) 4𝑊𝑛1 2 𝐴= △𝑉1 − 𝑉𝑡𝑛 𝜂𝑉𝑇 𝑊𝑝1 2 △ 𝑉𝑡 = (𝑉𝐷𝐷 − 𝑉𝑡𝑛 ) − √((𝑉𝐷𝐷 − 𝑉𝑡𝑛 )2 − (𝑉 − 𝑉𝑡𝑝 ) 2𝑊𝑛1 𝐷𝐷 (4) Assuming Vtp ≈ Vtn, (3) can be approximated as 2 𝑊𝑝1 (𝑉𝐷𝐷 − 𝑉𝑡𝑝 ) △ 𝑉1 ≈ 𝑉𝐷𝐷 − 𝑉𝑡𝑛 − (𝑉𝐷𝐷 − 𝑉𝑡𝑛 ) + 4𝑊𝑛1 (𝑉𝐷𝐷 − 𝑉𝑡𝑛 ) 4𝑊𝑛1 (5) △ 𝑉2 = (𝑉𝐷𝐷 − △ 𝑉1 − 𝑉𝑡𝑝 ) 𝐴𝑒 △𝑉1 − 𝑉𝑡𝑛 𝜂𝑉𝑇 ) 2(𝑉𝐷𝐷 − △ 𝑉1 − 𝑉𝑡𝑝 ) 𝐴𝑒 is found through a similar approach. Consider (11) △𝑉1 − 𝑉𝑡𝑛 𝜂𝑉𝑇 2(𝑉𝐷𝐷 − △𝑉1 − 𝑉𝑡𝑝 ) (12) (3) Equations (5) and (12) provide several first-order design for CDL. For a given ΔV1, designers can quickly estimate the required Wp1 to Wn1 ratio. Moreover, ΔV1 is linearly proportional to the shift of Vt and transistor width in the presence of process variations. 4.3 Power Consumption Data activity measures how frequent signals toggle and is defined as 𝐷𝑎𝑡𝑎 𝐴𝑐𝑡𝑖𝑣𝑖𝑡𝑦 = # 𝑁𝑜 𝑜𝑓 𝑠𝑖𝑔𝑛𝑎𝑙 𝑇𝑟𝑎𝑛𝑠𝑖𝑡𝑖𝑜𝑛𝑠 ΔV2 (10) Applying Taylor expansion ∆𝑉2 ≈ By Taylor expansion, it is approximated in first order as 𝑑 𝑑2 𝑑3 𝑑 − + ∙∙∙∙∙∙∙ ≈ 𝑁 + 2𝑁 8𝑁 3 16𝑁 5 2𝑁 (9) 2𝑊𝑛2 𝐼𝑡 4𝑊𝑛2 ≈ (𝑉 )2 𝑒 1.8 𝜇𝑝 𝐶𝑜𝑥 𝑊𝑝2 𝑊𝑝2 𝑡 − ((𝑉𝐷𝐷 − △ 𝑉1 − 𝑉𝑡𝑝 ) − ΔV1 can be found by solving the quadratic equation 𝑊𝑝1 (𝑉𝐷𝐷 −𝑉𝑡𝑝 ) (∆𝑉2 )2 (8) 2 since (𝑉𝐷𝐷 − △ 𝑉2 ) ≫ 𝑉𝑇 ≈0 2 ≈ (7) Solving △ 𝑉2 = (𝑉𝐷𝐷 − △ 𝑉1 − 𝑉𝑡𝑝 ) − where μp and μn are the hole and electron mobility of PMOS and NMOS transistors, respectively, Cox is the oxide capacitance, W and L are the transistor width and length, respectively, and Vgs and Vds are the transistor gate-to-source and drain-to-source voltages, respectively. Assuming same length devices are used, and μp ≈ 0.5 μn, rearranging (1) gives √𝑁 2 + 𝑑 = 𝑁 + , 𝑉𝑇 = 𝐾𝑇𝑞 ∆𝑉1 −∆𝑉𝑡𝑛 𝑊𝑛2 𝐼1 𝑒 𝜂𝑉𝑡 𝜇𝑝 𝐶𝑜𝑥 𝑊𝑝2 Where 𝑒 𝑊𝑝1 1 𝐼𝐷 = 𝜇𝑝 𝐶0𝑥 (𝑉 − 𝑉𝑡𝑝 ) 2 𝐿𝑝1 𝑔𝑠𝑝1 (𝑉𝑑𝑠𝑛1 ) 2 3𝑇𝑜𝑥 𝑊𝑑𝑚 (6) where μo is the zero bias mobility, η is the sub threshold swing coefficient, Wdm is the maximum depletion layer width, VT is the thermal voltage, K is the Boltzmann constant, T is the temperature in kelvin, and q is the electron charge. In the case of NMOS transistors, μo is simply μn. Rearranging (6) gives Figure 7: CDL during Contention Mode 𝑊𝑛1 = 𝜇0 𝐶0𝑥 (𝑉𝑇 )2 𝑒1.8 , 𝜂 = 1 + 2 # 𝑁𝑜.𝑜𝑓 𝑠𝑖𝑔𝑛𝑎𝑙𝑠 𝑋 # 𝑁𝑜.𝑜𝑓 𝑐𝑙𝑜𝑐𝑘 𝑐𝑦𝑐𝑙𝑒𝑠 (13) 5 Implementation of Constant Delay Model CDL is still an attractive choice in a high-performance fullcustom design because: 1) CDL is only intended to replace the critical path 2) Power management techniques such as clock gating [10], where the clock connection to idle module is turned off (gated), will significantly reduce CDL dynamic power consumption. The window duration (width) is defined as the 50% point of the falling edge of CLK to the 50% point of the rising edge of node Y. The delay is measured at the 50% switching point of either the CLK or data to the 50% switching point of the latest output. 5. PERFORMANCE ANALYSIS Delay (C-Q & D-Q),Power, Energy , Power Delay Product (PDP),Energy Delay Product (EDP) are measured & plotted for different logic styles in comparison. 5.1 Ripple Carry Adder The clock timing is designed in such a way that all the CDL gates except the first stage are operated in the D–Q mode. 5.2 32-bit Carry Lookahead Adder (CLA) The 32-bit CLA uses eight 4-bit FAs with dedicated circuitry to facilitate carry generation[9]. The energy-efficient FA used in this analysis utilizes pass transistor logic styles with only 24 transistors for sum generation [23]. For the carry generation, only the critical path is replaced with different logic style. In this case, the 4-bit critical carry generation path of CLA is G3:0 = G3 + P3 (G2 + P2(G1(G0))) (15) where G and P are the generate (A.B) and propagate (A⊕B) signals, respectively. CDL and CCD logic are implemented to reduce the number of fan-ins. When expression 15 is rearranged 𝐺1:0 = 𝐺1 + 𝑃1 𝐺0 , 𝑃3:2 = 𝑃3 𝑃2 𝐺3:2 = 𝐺3 + 𝑃3 𝐺2 , 𝐺3:0 = 𝐺3:2 (𝑃3:2 + 𝐺1:0 ) (16) Figure 9: 32 bit CLA Block Diagram Figure 8: RCA Block Diagram The main purpose of this 8-bit RCA is to demonstrate CDL performance advantage and to discuss the design considerations that should be taken into account when using CDL A more energy-efficient pass-transistor FA design [13] will be implemented in the subsequent analysis to provide a more realistic comparison. Only the timing-critical carry generation is replaced with dynamic and CDL, while noncritical sum computation remains static in all three RCAs. VI. PERFORMANCE ANALYSIS VI. PERFORMANCE ANALYSIS Figure 10: Graph between Power Vs Data Activity for different Logic Style 6 Implementation of Constant Delay Model Table 1.RCA Performance Comparison: Data Activity (%) Delay (ps) Power (μW) Energy (fJ) PDP (fJ) EDP (fJ.ps) 100 18.152 145.5 331.33 2642.2 Dynamic 50 18.16 18.168 72.76 330.33 1321.32 25 12.5 100 18.176 36.37 329.3 660.47 18.19 18.17 329.69 330.14 10.583 10.57 330.82 264.25 50 25 10.586 5.273 330.25 131.82 CDL 25 12.5 10.33 2.625 390.5 65.625 10.353 1.301 390.65 32.52 CDL 25 12.5 9.187 69.01 187.23 7047.4 6.753 10.078 137.62 210.96 Table 2. CLA 32 bit Performance Comparison: Data Activity (%) Delay (ps) Power (μW) Energy (fJ) PDP (fJ) EDP (fJ.ps) 100 16.046 127.3 450.72 3577.13 Dynamic 50 28.1 15.848 239.7 445.32 6735.57 25 12.5 100 15.835 4.607 444.96 129.45 15.75 0.916 442.57 25.73 10.38 135.9 211.54 2769.64 6. CONCLUSION Implementation of 8 bit RCA using CDL provides an area improvement of 41.7% when compared with dynamic logic style & an area improvement of 22.3% when compared with static logic. Performance analysis of 32-bit CLAs reveals that CDL is 35.2% faster than dynamic domino logic. PDP & EDP is measured for the simulation circuits where a drastic improvement is observed. REFERENCES 1. R. Zimmermann and W. Fichtner, “Low-power logic styles: CMOS versus pass-transistor logic,” IEEE J. Solid-State Circuits, vol. 32, no. 7, pp. 1079–1090, Jul. 1997. 2. N. Goncalves and H. De Man, “NORA: A racefree dynamic CMOS technique for pipelined logic structures,” IEEE J. Solid-State Circuits, vol. 18, no. 3, pp. 261–266, Jun. 1983. 3. C. Lee and E. Szeto, “Zipper CMOS,” IEEE Circuits Syst. Mag., vol. 2,no. 3, pp. 10–16, May 1986. 4. R. Rafati, S. Fakhraie, and K. Smith, “A 16-bit barrelshifter implemented in data-driven dynamic logic (D3L),” IEEE Trans. Circuits Syst.I, Reg. Papers, vol. 53, no. 10, pp. 2194–2202, Oct. 2006. 5. F. Frustaci, M. Lanuzza, P. Zicari, S. Perri, and P. Corsonello, “Lowpower split-path data-driven dynamic logic,” Circuits Dev. Syst. IET,vol. 3, no. 6, pp. 303–312, Dec. 2009. 6. N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. Reading, MA: Addison Wesley, Mar. 2010. 7. K. H. Chong, L. McMurchie, and C. Sechen, “A 64 b adder using selfcalibrating differential output prediction logic,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, San Francisco, CA, Feb. 2006, pp. 1745–1754. 8. S. Mathew, R. K. Krishnamurthy, M. A. Anders, R. Rios, K. R. Mistry, and K. Soumyanath, “Sub-500-ps 64-b ALUs in 0.18-μm SOI/bulk CMOS: Design and scaling 50 23.8 9.2 345.8 187.49 1406.42 trends,” IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 318–319, Nov. 2001. 9. S. Mathew, M. Anders, R. Krishnamurthy, and S. Borkar, “A 4 GHz 130 nm address generation unit with 32-bit sparse-tree adder core,” in VLSI Circuits Dig. Tech. Papers Symp., 2002, pp. 126–127. 10. S. K. Mathew, M. A. Anders, B. Bloechel, T. N. Krishnamurthy, and S. Borkar, “A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS,” IEEE J. Solid-State Circuits, vol. 40,no. 1, pp. 44–51, Jan. 2005. 11. I. Sutherland, R. F. Sproull, and D. Harris. (Feb. 1999). Logical Effort: Designing Fast CMOS Circuits [Online]. Available: http://amazon.com/o/ASIN/1558605576/ 12. S. Kiaei, S.-H. Chee, and D. Allstot, “CMOS sourcecoupled logic for mixed-mode VLSI,” in Proc. IEEE Int. Circuits Syst. Symp., New Orleans, LA, May 1990, pp. 1608–1611. 13. L. McMurchie, S. Kio, G. Yee, T. Thorp, and C. Sechen, “Output prediction logic: A high-performance CMOS design technique,” in Proc. Comput. Des. Int. Conf., Austin, TX, 2000, pp. 247–254. T. Leela Krishna received the B.Tech in Electronics & Communication Engineering from SVCET, Andhra Pradesh, India in 2010. He is currently pursuing the M.Tech. in VLSI Design from SRM .University. His current research interests include low-power highperformance digital CMOS circuits. Dr. J Selva Kumar, received the B.E in Electronics & Communication Engineering from MADRAS University, India in 1999. He received his M.Tech from Anna University, 2003 & Ph.D. degree from SRM University, 2013 respectively. His current research interests are Low power & Reconfigurable VLSI architecture Design.

Implementation of Constant Delay Model Implementation of Low

Related documents

Products

Support

Implementation of Constant Delay Model Implementation of Low

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib