IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 25, NO. 4, AUGUST 1990

Correspondence

CMOS Tapered Buffer

N. C. LI, MEMBER, IEEE, GENE L. HAVILAND, MEMBER, IEEE, AND A. A. TUSZYNSKI, SENIOR MEMBER, IEEE

Abstract - Jaeger's buffer comprises a string of tapered inverters. Each inverter is modeled by a capacitor and a conductor. We split the capacitor into inherent and load components ($C_x$ and $C_y$), and show that the value of the optimal taper depends on the $C_x/C_y$ ratio: the best taper exceeds Jaeger's 2.72 slope, but only moderately.

Manuscript received August 23, 1989; revised March 7, 1990.
N. C. Li is with the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115.
G. L. Haviland is with the Solid-State Division, Naval Ocean System Center, San Diego, CA.
A. A. Tuszynski is with the Department of Electrical and Computer Engineering, San Diego State University, San Diego, CA 92182.
IEEE Log Number 9036486.
0018-9200/90/0800-1005$01.00 © 1990 IEEE

I. BACKGROUND

The need for buffers at chip-crossing boundaries of MOS IC's has been highlighted by Weste and Eshraghian [1] as well as Mead and Conway [2]. The wherewithal for the design of such buffers has been scrutinized by Lin and Linholm [3], Jaeger [4], Veendrick [5], Hedenstierna and Jeppson [6], Nemes [7], and Kanuma [8]. Several topics, which bear upon approximations employed in buffer design, have been discussed by Greenbaum [9], as well as Arnout and De Man [10]. Improvements attainable by recourse to BiCMOS have been examined by Rosseel and Dutton [11], as well as De Los Santos and Hoefflinger [12].

A severe mismatch between off-chip loads and on-chip logic devices prevails in high-density CMOS circuits. In the interest of speed and power, MOS transistors are laid out to minimal geometries and $W/L$ ratios close to 1. With gate oxides of about 250 Å, the on-chip capacitance of logic devices amounts to several tens of femtofarads against an off-chip load capacitance of 50 pF or more. Thus, a speed degradation factor of three orders of magnitude would result if the loads were connected directly to logic-level transistors. Naturally then, guided by past practice, one inserts a tapered buffer between the logic devices and the load.

II. DESIGN OF THE TAPERED BUFFER

We begin with the Jaeger version of the Lin-Linholm approach, and then proceed to the split-capacitor modification developed by us. In Jaeger's model, each stage of the buffer is represented by one conductor and one capacitor. We use one conductor but two capacitors. The thrust of our discussion is directed at the optimization of the dynamic response of the buffer.

Jaeger's buffer and its model are shown in Figs. 1 and 2, respectively. There are $n$ stages, numbered 0 to $n-1$. The logic-level capacitance is $C_i$, the logic-level conductance is $g$, and the logic-level time constant is $\tau_i = C_i/g$. The taper is $\beta$, i.e., the $W/L$ ratio of stage #$(k+1)$ is $\beta$ times that of stage #$k$:

$$(W/L)_{k+1} = \beta\,(W/L)_k. \tag{1}$$

Fig. 1. Circuit configuration explored by Jaeger.

Fig. 2. Model implied in Jaeger's paper.

The conductance, capacitance, and time constant of stage #$k$ are

$$g_k = \beta^{k} g, \qquad C_k = \beta^{k+1} C_i, \qquad \tau_k = C_k/g_k = \beta\,\tau_i. \tag{2}$$

The overall time constant of the buffer ($\tau_o$) is assumed to be equal to the sum of the time constants of the individual stages:

$$\tau_o = \sum_{k=0}^{n-1} \tau_k = n\beta\tau_i. \tag{3}$$

The load capacitance at the output stage ($C_L$) is

$$C_L = \beta^{n} C_i. \tag{4}$$

The number of stages of the buffer can, therefore, be written as

$$n = \frac{\ln(C_L/C_i)}{\ln\beta}. \tag{5}$$

Substitution of (5) into (3) yields

$$\tau_o = \frac{\beta}{\ln\beta}\,\ln(C_L/C_i)\,\tau_i \tag{6}$$

which leads to

$$\beta\,(\text{optimum}) = e = 2.72. \tag{7}$$

Thus one arrives at an overall delay of

$$\tau_o = e\,[\ln(C_L/C_i)]\,\tau_i \tag{8}$$

and a buffer insertion penalty factor

$$B_o = e\,\ln(C_L/C_i). \tag{9}$$

Transition from femtofarad logic to picofarad loads incurs the still surprisingly high penalty factor of almost twenty.

III. SPLIT-CAPACITOR SOLUTION

We adopt the equivalent circuit and the summation of time constants used by Jaeger, but we split the capacitor into two parts: an inherent output capacitance $C_x$ and an incidental load capacitance $C_y$ (Fig. 3). The logic-level value of $C_x + C_y$ is $C_i$. The load capacitance of the last stage is $C_L$. To be included in $C_x$ is an equivalent short-circuit current capacitance, whose maximum value is given by (10) [equation missing from the extracted text], where $I_p$ is the peak short-circuit current of the inverter, while $\tau_r$ and $\tau_f$ stand for rise and fall times, respectively. See [5] for background to (10).

The new definitions read as follows. The logic-level time constant is $\tau_i = (C_x + C_y)/g$, and the time constant of stage #$k$ is

$$\tau_k = \frac{\beta^{k} C_x + \beta^{k+1} C_y}{\beta^{k} g} \tag{11}$$

$$\phantom{\tau_k} = \frac{C_x + C_y + (\beta - 1)\,C_y}{g} \tag{12}$$

$$\phantom{\tau_k} = [1 + (\beta - 1)\alpha]\,\tau_i \tag{13}$$

where

$$\alpha = \frac{C_y}{C_x + C_y}. \tag{14}$$

The total delay through the buffer is

$$\tau_o = n\,\tau_k \tag{15}$$

where

$$n = \frac{\ln(C_L/C_y)}{\ln\beta}. \tag{16}$$

Substituting now (13) and (16) into (15), we get

$$\tau_o = \tau_i\,\ln(C_L/C_y)\,\frac{1 + (\beta - 1)\alpha}{\ln\beta}. \tag{17}$$

Finally, differentiating (17) with respect to $\beta$, and invoking (14), we arrive at

$$\beta\,[\ln(\beta) - 1] = \frac{C_x}{C_y}. \tag{18}$$

Optimum $\beta$ is now seen to depend on the relative magnitudes of $C_x$ and $C_y$ (Table I and Fig. 4). As was to be expected, if $C_x$ is negligibly small compared to $C_y$, then the optimum slope reduces to $e = 2.72$, in correspondence to Jaeger's solution. Conversely, if $C_x$ is much larger than $C_y$, then $\beta$ may exceed 2.72 by a considerable margin (Table II). Typically, $\beta$ is moderately larger than 2.72.

TABLE I
TAPER AS A FUNCTION OF $C_x/C_y$

  C_x/C_y    beta
  0          2.72
  0.1        2.82
  0.2        2.91
  0.3        3.00
  0.4        3.09
  0.5        3.19
  0.6        3.27
  0.8        3.43
  1.0        3.59
  3.0        4.97

TABLE II
EQUATION (15) AND SPICE SIMULATION RESULTS

  number of stages n    taper beta    B_o per (15)    B_o per SPICE*
  4                     7.45          13.3            13.8
  5                     4.99          12.2            12.6
  6                     3.82          12.1            11.9
  7                     3.15          12.6            12.2
  8                     2.73          13.0            12.4

*For $C_x$ = 38.9 fF, $C_L$ = 50 pF, and MOSIS 1.2-μm CMOS SPICE parameters.

Fig. 4. Taper versus $C_x/C_y$: (a) $\beta$ = 2.72, (b) $\beta$ = 3.10, (c) $\beta$ = 3.59, and (d) $\beta$ = 4.32.

IV. THE FAN-OUT DECISION

In general layout work, of special interest are nodes with low to moderate fan-out. Faced with a fan-out of $k$, do we or don't we use a buffer? Where this question arises, reference is made to a single-stage buffer, scaled as shown in Fig. 5(b). That buffer is to be compared with the straight inverter in Fig. 5(a). Retaining the technique of linear addition of time constants, we write the overall delay in Fig. 5(b) as

$$\tau_b = \frac{C_x + \beta C_y}{g} + \frac{\beta C_x + k C_y}{\beta g}. \tag{19}$$

Scrutinizing (19) for best $\beta$, one arrives at

$$\beta = \sqrt{k} \tag{20}$$

in confirmation of the uniform taper approach. The total delay is

$$\tau_b(\min) = 2\,(C_x + \sqrt{k}\,C_y)/g \tag{21}$$

which is to be compared with the "no buffer" delay

$$\tau' = (C_x + k C_y)/g. \tag{22}$$

The former is smaller than the latter when

$$2\,(C_x + \sqrt{k}\,C_y) < C_x + k C_y \tag{23}$$

that is, when

$$k - 2\sqrt{k} > \frac{C_x}{C_y}. \tag{24}$$

If $C_x$ is very small compared to $C_y$, then the critical value of the fan-out is

$$k_{cr}(0) = 4. \tag{25}$$

Otherwise

$$k_{cr} = 2 + 2\sqrt{1 + C_x/C_y} + C_x/C_y. \tag{26}$$

Equation (26) and Table III reveal that the answer to the buffer question depends on both $C_L/C_y$ and $C_x/C_y$. As a rule, a buffer should be used only when $C_L/C_y$ is larger than four.

TABLE III
CRITICAL FAN-OUT $k_{cr}$ AND SLOPE $\beta$ VERSUS $C_x/C_y$

  C_x/C_y    k_cr    beta
  0          4       2
  0.25       4.49    2.1
  0.50       4.95    2.2
  0.75       5.39    2.3
  1.0        5.83    2.4
  2.0        6.46    2.5
  3.0        9.0     3.0

Fig. 5. The fan-out decision: (a) direct hookup and (b) buffered connection, with $C_L = k C_y$.
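The two closed-form results above lend themselves to a quick numerical check. The following is a minimal sketch, not the authors' code: it solves the optimum-taper condition (18), beta*(ln(beta) - 1) = C_x/C_y, by bisection, and evaluates the critical fan-out (26), k_cr = 2 + 2*sqrt(1 + C_x/C_y) + C_x/C_y, together with the inequality (24) and Jaeger's penalty factor (9). The function names (`optimum_taper`, `critical_fanout`, `buffer_helps`, `jaeger_penalty`) and the bisection tolerance are assumptions introduced here for illustration only.

```python
import math


def optimum_taper(ratio, tol=1e-10):
    """Solve beta*(ln(beta) - 1) = ratio for beta >= e (condition (18)).

    `ratio` stands for C_x/C_y. The left-hand side is increasing for
    beta > 1 and equals 0 at beta = e, so bisection on [e, hi] suffices.
    """
    if ratio < 0:
        raise ValueError("C_x/C_y must be non-negative")

    def residual(beta):
        return beta * (math.log(beta) - 1.0) - ratio

    lo, hi = math.e, math.e + 1.0
    while residual(hi) < 0:      # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_fanout(ratio):
    """Critical fan-out k_cr from (26); `ratio` stands for C_x/C_y."""
    return 2.0 + 2.0 * math.sqrt(1.0 + ratio) + ratio


def buffer_helps(k, ratio):
    """Inequality (24): a single-stage buffer wins when k - 2*sqrt(k) > C_x/C_y."""
    return k - 2.0 * math.sqrt(k) > ratio


def jaeger_penalty(cap_ratio):
    """Jaeger's penalty factor (9): e * ln(C_L / logic-level capacitance)."""
    return math.e * math.log(cap_ratio)


if __name__ == "__main__":
    for r in (0.0, 0.1, 0.5, 1.0, 3.0):
        print(f"C_x/C_y = {r:4.2f}  beta_opt = {optimum_taper(r):.3f}  "
              f"k_cr = {critical_fanout(r):.2f}")
    print(f"Jaeger penalty for a 1000:1 capacitance ratio: {jaeger_penalty(1000.0):.2f}")
```

Under these assumptions the computed tapers agree with Table I to within about 0.01. Evaluating (26) at C_x/C_y = 2.0 gives roughly 7.46, whereas Table III lists 6.46 for that row; the sketch simply reports what the formula yields.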
V. CONCLUSION

The split-capacitor model leads to the conclusion that the taper is a function of $C_x/C_y$ and, therefore, a matter of technology, i.e., it depends on feature size, gate-oxide thickness, junction capacitances, etc. For any particular load capacitance, there exists a best taper and a corresponding best number of stages, but the delay penalty depends only weakly on the taper of the buffer. At on-chip distribution points, buffers are justified only where fan-out exceeds a factor of 4. However, the chip-crossing penalty of MOS implementations is severe even in the best case; therein lies a strong argument in favor of BiCMOS I/O.

REFERENCES

[1] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A System Perspective. Reading, MA: Addison-Wesley, 1985.
[2] C. Mead and L. Conway, Introduction to VLSI Systems. Reading, MA: Addison-Wesley, 1980.
[3] H. C. Lin and L. W. Linholm, "An optimized output stage for MOS integrated circuits," IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
[4] R. C. Jaeger, "Comments on 'An optimized output stage for MOS integrated circuits'," IEEE J. Solid-State Circuits, vol. SC-10, no. 3, pp. 185-186, June 1975.
[5] H. J. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," IEEE J. Solid-State Circuits, vol. SC-19, no. 4, pp. 468-474, Aug. 1984.
[6] N. Hedenstierna and K. O. Jeppson, "CMOS circuit speed and buffer optimization," IEEE Trans. Computer-Aided Design, vol. CAD-6, no. 2, pp. 276-281, Mar. 1987.
[7] M. Nemes, "Driving large capacitances in MOS LSI systems," IEEE J. Solid-State Circuits, vol. SC-19, no. 1, pp. 159-161, Feb. 1984.
[8] A. Kanuma, "CMOS circuit optimization," Solid-State Electron., vol. 26, pp. 47-58, 1983.
[9] J. R. Greenbaum, "Digital-IC models for computer-aided design," Electronics, pp. 121-125, Dec. 6, 1973; also pp. 107-112, Dec. 20, 1973.
[10] G. Arnout and H. J.
De Man, "The use of threshold functions and Boolean-controlled network elements for macromodeling of LSI circuits," IEEE J. Solid-State Circuits, vol. SC-13, no. 3, pp. 326-332, June 1978.
[11] G. P. Rosseel and R. W. Dutton, "Influence of device parameters on the switching speed of BiCMOS buffers," IEEE J. Solid-State Circuits, vol. 24, no. 1, pp. 90-99, Feb. 1989.
[12] H. L. De Los Santos and B. Hoefflinger, "Optimization and scaling of CMOS-bipolar drivers for VLSI interconnects," IEEE Trans. Electron Devices, vol. ED-33, no. 11, pp. 1722-1730, Nov. 1986.

A Novel CMOS Implementation of Double-Edge-Triggered Flip-Flops

SHIH-LIEN LU AND MILOS ERCEGOVAC, MEMBER, IEEE

Abstract - A CMOS implementation of a D-type double-edge-triggered flip-flop (DET-FF) is presented. A DET-FF changes its state at both the positive and the negative clock edge transitions. It has advantages with respect to both system speed and power dissipation. The design presented requires little overhead in circuit complexity. This CMOS D-type DET-FF is capable of operating at more than 50 MHz, which gives an equivalent system frequency of 100 MHz.

Manuscript received November 14, 1989; revised December 18, 1989.
S.-L. Lu is with the Department of Computer Science, University of California, Los Angeles, CA 90024 and with MOSIS, Marina del Rey, CA 90292-6695.
M. Ercegovac is with the Department of Computer Science, University of California, Los Angeles, CA 90024.
IEEE Log Number 9036484.

I. INTRODUCTION

Conventional single-edge-triggered flip-flops (SET-FF's) change states at the time when the clock signal goes from 0 to 1 or at the time when the clock goes from 1 to 0. The former are called positive-edge-triggered flip-flops (PET-FF's) or rising-edge-triggered flip-flops (RET-FF's) and the latter are called negative-edge-triggered flip-flops (NET-FF's) or trailing-edge-triggered flip-flops (TET-FF's). The advantage of edge triggering is that the setup time for data input is independent of the clock pulse width. This makes system design simpler. It is also less sensitive to noise. However, these flip-flops respond only once per clock pulse cycle. Energy and time are wasted.

Unger proposed in [1] a class of flip-flops (FF's) that will respond to both the positive and the negative edge of the clock pulse. These double-edge-triggered flip-flops (DET-FF's) have two major advantages. First, power dissipation is reduced. With the conventional SET-FF's, one of the two clock transitions accomplishes nothing. However, this transition may cause changes in the output of some logic elements internal to the FF's. In addition, extra energy is wasted to charge or discharge the capacitive load of the global clock line in a system using SET-FF's. This is particularly true in CMOS, where static power dissipation is small and dynamic power dissipation is the main contributor to energy dissipation. Second, the speed of the system is accelerated. With both edges able to cause state transition, some redundant logic can be eliminated. Moreover, the clock period will be shortened because there is no need to wait for the clock signal to toggle up and down.

The main disadvantage of DET-FF's has been the substantial increase in the number of components required to build such FF's. In most cases, more than double the logic count is expected. This paper proposes a novel design in CMOS which will implement static DET-FF's with relatively little increase in components. It is based on the single-phase CMOS register proposed by Lu in [2]. An implementation of a D-type DET-FF uses only 26 MOS devices in comparison with a typical static CMOS D-type flip-flop which requires 16 MOS devices.

Another disadvantage of DET-FF's is the extra delay caused by the extra gates needed to implement them by parallel decomposition. The presented CMOS implementation introduces little delay. It satisfies the speed requirement of the modern digital system. This D-FF is clocked at 50 MHz. Simulation performed with parameters obtained from a MOS Implementation System (MOSIS) [3] 2-μm CMOS/bulk process endorses the proposed implementation.

II. CIRCUIT DESIGN OF A D-TYPE DET-FF

A D-type DET-FF consists of two cross-coupled latches with input gating devices and some simple pass-transistor logic. A circuit diagram is illustrated in Fig. 1. Its operation principle is similar to the one used by Mead and Wawrzynek [4]. The two cross-coupled latches are enabled/disabled by the clock signal. When the clock is low, latch 1 is disabled and latch 2 is enabled. With clock high, latch 1 is enabled and latch 2 disabled. A disabled latch 1 has both its output and the complement set to high (Vdd). A disabled latch 2 has both its output and the complement set to low (GND).

During the rising edge of the clock signal, latch 1 is being enabled. Depending on the D input value, either transistor M7 or M8 is conducting just before M9 switches off. Either output Q1 or its complement will remain charged to high (Vdd) while the other is discharged to low (GND). The set value will stay unchanged throughout the half of the clock period while it is high. Similarly, on the trailing edge of the clock signal latch 2 is being enabled. According to the value of the D input, either Q2 or its complement will remain low (GND) while the other will be set to high. The output value remains stable for the duration of low clock signal. Thus, this DET-FF is a static flip-flop. It consumes no static power.

Table I gives the logic required to obtain the final output value. We observed that when the clock is low, both Q1 and its complement are high. The final value is the value of Q2. When clock is high, both Q2 and its complement are low. The final value is the value of Q1. Pass-transistor logic, shown in Fig. 1(c), is used to implement the logic function.
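The latch behavior described above can be sketched as a purely behavioral model. This is an illustration under stated assumptions, not the transistor-level circuit of Fig. 1: latch 1 captures D on the rising clock edge and is parked with both outputs high while the clock is low; latch 2 captures D on the falling edge and is parked with both outputs low while the clock is high; the final output selects Q1 when the clock is high and Q2 when it is low. The class name `DetFlipFlop`, the helper `simulate`, the polarity choice Q1 = D and Q2 = D, and the initial stored value of 0 are all assumptions made here.

```python
class DetFlipFlop:
    """Behavioral sketch of a D-type double-edge-triggered flip-flop."""

    def __init__(self):
        self.clk = 0
        # Clock starts low: latch 1 is disabled (both outputs high),
        # latch 2 is enabled and is assumed to hold 0 initially.
        self.q1, self.q1_bar = 1, 1
        self.q2, self.q2_bar = 0, 1

    def step(self, clk, d):
        """Apply one (clk, d) sample and return the flip-flop output."""
        if clk and not self.clk:
            # Rising edge: latch 1 captures D, latch 2 is parked low.
            self.q1, self.q1_bar = d, 1 - d
            self.q2, self.q2_bar = 0, 0
        elif self.clk and not clk:
            # Falling edge: latch 2 captures D, latch 1 is parked high.
            self.q2, self.q2_bar = d, 1 - d
            self.q1, self.q1_bar = 1, 1
        self.clk = clk
        # Output selection: Q1 while the clock is high, Q2 while it is low.
        return self.q1 if clk else self.q2


def simulate(samples):
    """Run a fresh flip-flop over a list of (clk, d) pairs."""
    ff = DetFlipFlop()
    return [ff.step(clk, d) for clk, d in samples]


if __name__ == "__main__":
    stimulus = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 1)]
    print(simulate(stimulus))
```

In this model the output changes only on clock transitions: a change of D while the clock is steady is ignored until the next edge, whichever direction that edge takes.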
0018-9200/90/0800-1008$01.00 © 1990 IEEE