CMOS tapered buffer - Solid-State Circuits, IEEE Journal of

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 25, NO. 4, AUGUST 1990, p. 1005
Correspondence
CMOS Tapered Buffer
N. C. LI, MEMBER, IEEE, GENE L. HAVILAND, MEMBER, IEEE, AND A. A. TUSZYNSKI, SENIOR MEMBER, IEEE
Abstract - Jaeger's buffer comprises a string of tapered inverters.
Each inverter is modeled by a capacitor and a conductor. We split the
capacitor into inherent and load components ($C_x$ and $C_y$), and show
that the value of the optimal taper depends on the $C_x/C_y$ ratio: the best
taper exceeds Jaeger's 2.72 slope, but only moderately.
I. BACKGROUND
The need for buffers at chip-crossing boundaries of MOS IC's
has been highlighted by Weste and Eshraghian [1] as well as
Mead and Conway [2]. The wherewithal for the design of such
buffers has been scrutinized by Lin and Linholm [3], Jaeger [4],
Veendrick [5], Hedenstierna and Jeppson [6], Nemes [7], and
Kanuma [8]. Several topics, which bear upon approximations
employed in buffer design, have been discussed by Greenbaum
[9], as well as Arnout and De Man [10]. Improvements attainable by recourse to BiCMOS have been examined by Rosseel
and Dutton [11], as well as De Los Santos and Hoefflinger [12].
A severe mismatch between off-chip loads and on-chip logic
devices prevails in high-density CMOS circuits. In the interest of
speed and power considerations, MOS transistors are laid out to
minimal geometries and $W/L$ ratios close to 1. With gate
oxides of about 250 Å, the on-chip capacitance of logic devices
amounts to several tens of femtofarads against an off-chip load
capacitance of 50 pF or more. Thus, a speed degradation factor
of three orders of magnitude would result if the loads were
connected directly to logic-level transistors. Naturally then,
guided by past practice, one inserts a tapered buffer between
the logic devices and the load.
II. DESIGN OF THE TAPERED BUFFER
We begin with the Jaeger version of the Lin-Linholm approach, and then proceed to the split-capacitor modification
developed by us. In Jaeger's model, each stage of the buffer is
represented by one conductor and one capacitor. We use one
conductor but two capacitors. The thrust of our discussion is
directed at the optimization of the dynamic response of the
buffer.
Manuscript received August 23, 1989; revised March 7, 1990.
N. C. Li is with the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115.
G. L. Haviland is with the Solid-State Division, Naval Ocean Systems Center, San Diego, CA.
A. A. Tuszynski is with the Department of Electrical and Computer Engineering, San Diego State University, San Diego, CA 92182.
IEEE Log Number 9036486.
0018-9200/90/0800-1005$01.00 © 1990 IEEE

Fig. 1. Circuit configuration explored by Jaeger.
Fig. 2. Model implied in Jaeger's paper.

Jaeger's buffer and its model are shown in Figs. 1 and 2,
respectively. There are $n$ stages, numbered 0 to $n-1$. The
logic-level capacitance is $C_0$, the logic-level conductance is $g$,
and the logic-level time constant is $\tau_0 = C_0/g$. The taper is $\beta$, i.e.,
the $W/L$ ratio of stage #$(k+1)$ is $\beta$ times larger than that of
stage #$k$:

$$(W/L)_{k+1} = \beta\,(W/L)_k. \tag{1}$$

The conductance, capacitance, and time constant of stage #$k$
are

$$g_k = \beta^k g, \qquad C_k = \beta^{k+1} C_0, \qquad \tau_k = C_k/g_k = \beta\,\tau_0. \tag{2}$$

The overall time constant of the buffer ($\tau_D$) is assumed to be
equal to the sum of the time constants of the individual stages:

$$\tau_D = \sum_{k=0}^{n-1} \tau_k = n\beta\tau_0. \tag{3}$$

The load capacitance at the output stage ($C_L$) is

$$C_L = \beta^n C_0. \tag{4}$$

The number of stages of the buffer can, therefore, be written as

$$n = \frac{\ln(C_L/C_0)}{\ln\beta}. \tag{5}$$

Substitution of (5) into (3) yields

$$\tau_D = \tau_0 \ln(C_L/C_0)\,\frac{\beta}{\ln\beta} \tag{6}$$

which leads to

$$\beta\ (\text{optimum}) = e = 2.72. \tag{7}$$

Thus one arrives at an overall delay of

$$\tau_D = e\,[\ln(C_L/C_0)]\,\tau_0 \tag{8}$$

and a buffer insertion penalty factor

$$B_D = e \ln(C_L/C_0). \tag{9}$$

Transition from femtofarad logic to picofarad loads incurs the
still surprisingly high penalty factor of almost twenty.
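As a numerical check of the stage count, the optimum taper, and the penalty factor in Jaeger's model, the following Python sketch evaluates them for one example. The capacitance values are assumptions: 38.9 fF is borrowed from the Table II footnote and taken here as the logic-level capacitance, with a 50-pF load. The function name `jaeger_buffer` is illustrative only.

```python
import math

def jaeger_buffer(c_load, c_logic):
    """Jaeger's single-capacitor optimum: taper, stage count, penalty factor."""
    beta_opt = math.e                                  # optimum taper, eq. (7)
    ratio = c_load / c_logic
    n_stages = math.log(ratio) / math.log(beta_opt)    # eq. (5), left unrounded
    penalty = math.e * math.log(ratio)                 # penalty factor, eq. (9)
    return beta_opt, n_stages, penalty

# Assumed example values: 50 pF load, 38.9 fF logic-level capacitance.
beta, n, b = jaeger_buffer(c_load=50e-12, c_logic=38.9e-15)
print(f"beta = {beta:.2f}, n = {n:.2f}, penalty = {b:.1f}")
```

Under these assumed values the penalty factor falls between 19 and 20, in line with the "almost twenty" quoted above. A physical buffer needs an integer number of stages, so the unrounded stage count is only a starting point.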
TABLE I
TAPER AS A FUNCTION OF $C_x/C_y$

$C_x/C_y$   0     0.1   0.2   0.3   0.4   0.5   0.6   0.8   1.0   3.0
$\beta$     2.72  2.82  2.91  3.00  3.09  3.19  3.27  3.43  3.59  4.97

III. SPLIT-CAPACITOR SOLUTION

We adopt the equivalent circuit and the summation of time
constants used by Jaeger, but we split the capacitor into two
parts: an inherent output capacitance $C_x$ and an incidental load
capacitance $C_y$ (Fig. 3). The logic-level value of $C_x + C_y$ is $C_0$.
The load capacitance of the last stage is $C_L$. To be included in
$C_x$ is an equivalent short-circuit current capacitance, whose
maximum value is

$$\ldots \tag{10}$$

where $I_p$ is the peak short-circuit current of the inverter, while
$\tau_r$ and $\tau_f$ stand for rise and fall times, respectively. See [5] for
background to (10).

The new definitions read as follows. The logic-level time
constant is

$$\tau_0 = (C_x + C_y)/g \tag{11}$$

and the time constant of stage #$k$ is

$$\tau_k = \frac{\beta^k C_x + \beta^{k+1} C_y}{\beta^k g} = \frac{C_x + C_y + (\beta - 1)C_y}{g} \tag{12}$$

$$\tau_k = [1 + (\beta - 1)\alpha]\,\tau_0 \tag{13}$$

where

$$\alpha = \frac{C_y}{C_x + C_y}. \tag{14}$$

The total delay through the buffer is

$$\tau_D = n\tau_k \tag{15}$$

where

$$n = \frac{\ln(C_L/C_y)}{\ln\beta}. \tag{16}$$

Substituting now (13) and (16) into (15), we get

$$\tau_D = \tau_0 \ln(C_L/C_y)\,\frac{1 + (\beta - 1)\alpha}{\ln\beta}. \tag{17}$$

Finally, differentiating (17) with respect to $\beta$, and invoking (14),
we arrive at

$$\beta[\ln(\beta) - 1] = \frac{C_x}{C_y}. \tag{18}$$

Optimum $\beta$ is now seen to depend on the relative magnitudes
of $C_x$ and $C_y$ (Table I and Fig. 4). As was to be expected, if $C_x$
is negligibly small compared to $C_y$, then the optimum slope
reduces to $e = 2.72$, in correspondence to Jaeger's solution.
Conversely, if $C_x$ is much larger than $C_y$, then $\beta$ may exceed
2.72 by a considerable margin (Table II). Typically, $\beta$ is moderately larger than 2.72.

IV. THE FAN-OUT DECISION

In general layout work, of special interest are nodes with low
to moderate fan-out. Faced with a fan-out of $k$, do we or don't
we use a buffer? Obviously enough, where this question arises,
reference is made to a single-stage buffer, scaled as shown in
Fig. 5(b). That buffer is to be compared with the straight
inverter in Fig. 5(a).

Retaining the technique of linear addition of time constants,
we write the overall delay in Fig. 5(b) as

$$\tau_D = \frac{C_x + \beta C_y}{g} + \frac{\beta C_x + k C_y}{\beta g}. \tag{19}$$

Scrutinizing (19) for best $\beta$, one arrives at

$$\beta = \sqrt{k} \tag{20}$$

in confirmation of the uniform taper approach. The total delay
is

$$\tau_D(\min) = 2(C_x + \sqrt{k}\,C_y)/g \tag{21}$$

which is to be compared with the "no buffer" delay

$$\tau_D' = (C_x + kC_y)/g. \tag{22}$$

The former is smaller than the latter when

$$2(C_x + \sqrt{k}\,C_y) < C_x + kC_y \tag{23}$$

that is when

$$k - 2\sqrt{k} > \frac{C_x}{C_y}. \tag{24}$$

If $C_x$ is very small compared to $C_y$, then the critical value of the
fan-out is

$$k_{cr}(0) = 4. \tag{25}$$

Otherwise

$$k_{cr} = 2 + 2\sqrt{1 + C_x/C_y} + C_x/C_y. \tag{26}$$

Fig. 4. Taper versus $C_x/C_y$: (a) $\beta$ = 2.72, (b) $\beta$ = 3.10, (c) $\beta$ = 3.59, and (d) $\beta$ = 4.32.

TABLE II
EQUATION (15) AND SPICE SIMULATION RESULTS

number of stages $n$    4      5      6      7      8
taper $\beta$           7.45   4.99   3.82   3.15   2.73
$B_D$ per (15)          13.3   12.2   12.1   12.6   13.0
$B_D$ per SPICE*        13.8   12.6   11.9   12.2   12.4

*For $C_0$ = 38.9 fF, $C_L$ = 50 pF, and MOSIS 1.2-μm CMOS SPICE parameters.

TABLE III
SLOPE VERSUS $C_x/C_y$

$C_x/C_y$   $k_{cr}$   $\beta$
0           4          2
0.25        4.49       2.1
0.50        4.95       2.2
0.75        5.39       2.3
1.0         5.83       2.4
2.0         6.46       2.5
3.0         9.0        3.0
Equation (26) and Table III reveal that the answer to the buffer
question depends on both $C_L/C_y$ and $C_x/C_y$. As a rule, a
buffer should be used only when $C_L/C_y$ is larger than four.
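The fan-out criterion can be checked numerically. The Python sketch below evaluates the critical fan-out of (26) and the inequality (24) for the capacitance ratios listed in Table III; the function names are illustrative and not taken from the paper.

```python
import math

def critical_fanout(cx_over_cy):
    """Critical fan-out k_cr from eq. (26)."""
    r = cx_over_cy
    return 2.0 + 2.0 * math.sqrt(1.0 + r) + r

def buffer_pays_off(k, cx_over_cy):
    """True when the buffered delay (21) is below the direct delay (22), i.e. (24) holds."""
    return k - 2.0 * math.sqrt(k) > cx_over_cy

for r in (0.0, 0.25, 0.50, 0.75, 1.0, 2.0, 3.0):
    k_cr = critical_fanout(r)
    # With a single-stage buffer the best scale factor is sqrt(k), eq. (20).
    print(f"Cx/Cy = {r:4.2f}  k_cr = {k_cr:5.2f}  sqrt(k_cr) = {math.sqrt(k_cr):4.2f}")
```

The computed values agree with Table III to the printed precision, with one exception: for $C_x/C_y = 2.0$, (26) evaluates to about 7.46 against the tabulated 6.46.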
Fig. 5. The fan-out decision: (a) direct hookup and (b) buffered connection. The load is $C_L = kC_y$.
V. CONCLUSION
The split-capacitor model leads to the conclusion that the
taper is a function of $C_x/C_y$ and, therefore, a matter of
technology, i.e., it depends on feature size, gate-oxide thickness,
junction capacitances, etc. For any particular load capacitance,
there exists a best taper and a corresponding best number of
stages, but the law relating the delay penalty to the taper of the
buffer is not very strong. At on-chip distribution points, buffers
are justified only where fan-out exceeds a factor of 4. However,
the chip-crossing penalty of MOS implementations is severe
even in the best case; therein lies a strong argument in favor of
BiCMOS I/O.
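Both the existence of a best taper and the weak dependence of the delay on it can be illustrated numerically. The Python sketch below solves (18) by bisection and evaluates the taper-dependent factor of (17). The function names and the choice of bisection are assumptions made for illustration; the paper does not say how its table entries were computed.

```python
import math

def optimum_taper(cx_over_cy, tol=1e-10):
    """Solve beta*(ln(beta) - 1) = Cx/Cy, eq. (18), by bisection."""
    def f(b):
        return b * (math.log(b) - 1.0) - cx_over_cy
    lo = hi = math.e                 # f(e) = -Cx/Cy <= 0, and f rises for beta > 1
    while f(hi) < 0.0:               # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delay_shape(beta, cx_over_cy):
    """Taper-dependent factor of eq. (17): [1 + (beta - 1)*alpha] / ln(beta)."""
    alpha = 1.0 / (1.0 + cx_over_cy)         # eq. (14)
    return (1.0 + (beta - 1.0) * alpha) / math.log(beta)

for r in (0.0, 0.1, 1.0, 3.0):
    print(f"Cx/Cy = {r:3.1f}  optimum beta = {optimum_taper(r):.2f}")

# Flatness of the delay around the optimum for Cx/Cy = 1.
best = optimum_taper(1.0)
for beta in (math.e, best, 5.0):
    print(f"beta = {beta:.2f}  delay factor = {delay_shape(beta, 1.0):.3f}")
```

For $C_x/C_y = 1$, moving the taper from the optimum to either 2.72 or 5 changes the delay factor by only a few percent, which is the sense in which the dependence on taper is weak.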
REFERENCES
[1] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective. Reading, MA: Addison-Wesley, 1985.
[2] C. Mead and L. Conway, Introduction to VLSI Systems. Reading, MA: Addison-Wesley, 1980.
[3] H. C. Lin and L. W. Linholm, “An optimized output stage for MOS integrated circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
[4] R. C. Jaeger, “Comments on ‘An optimized output stage for MOS integrated circuits’,” IEEE J. Solid-State Circuits, vol. SC-10, no. 3, pp. 185-186, June 1975.
[5] H. J. Veendrick, “Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits,” IEEE J. Solid-State Circuits, vol. SC-19, no. 4, pp. 468-474, Aug. 1984.
[6] N. Hedenstierna and K. O. Jeppson, “CMOS circuit speed and buffer optimization,” IEEE Trans. Computer-Aided Design, vol. CAD-6, no. 2, pp. 276-281, Mar. 1987.
[7] M. Nemes, “Driving large capacitances in MOS LSI systems,” IEEE J. Solid-State Circuits, vol. SC-19, no. 1, pp. 159-161, Feb. 1984.
[8] A. Kanuma, “CMOS circuit optimization,” Solid-State Electron., vol. 26, pp. 47-58, 1983.
[9] J. R. Greenbaum, “Digital-IC models for computer-aided design,” Electronics, pp. 121-125, Dec. 6, 1973; also pp. 107-112, Dec. 20, 1973.
[10] G. Arnout and H. J. De Man, “The use of threshold functions and Boolean-controlled network elements for macromodeling of LSI circuits,” IEEE J. Solid-State Circuits, vol. SC-13, no. 3, pp. 326-332, June 1978.
[11] G. P. Rosseel and R. W. Dutton, “Influence of device parameters on the switching speed of BiCMOS buffers,” IEEE J. Solid-State Circuits, vol. 24, no. 1, pp. 90-99, Feb. 1989.
[12] H. L. De Los Santos and B. Hoefflinger, “Optimization and scaling of CMOS-bipolar drivers for VLSI interconnects,” IEEE Trans. Electron Devices, vol. ED-33, no. 11, pp. 1722-1730, Nov. 1986.
A Novel CMOS Implementation of Double-Edge-Triggered Flip-Flops
SHIH-LIEN LU AND MILOS ERCEGOVAC, MEMBER, IEEE

Abstract - A CMOS implementation of a D-type double-edge-triggered flip-flop (DET-FF) is presented. A DET-FF changes its state at
both the positive and the negative clock edge transitions. It has advantages with respect to both system speed and power dissipation. The
design presented requires little overhead in circuit complexity. This
CMOS D-type DET-FF is capable of operating at more than 50 MHz,
which gives an equivalent system frequency of 100 MHz.

Manuscript received November 14, 1989; revised December 18, 1989.
S.-L. Lu is with the Department of Computer Science, University of California, Los Angeles, CA 90024 and with MOSIS, Marina del Rey, CA 90292-6695.
M. Ercegovac is with the Department of Computer Science, University of California, Los Angeles, CA 90024.
IEEE Log Number 9036484.

I. INTRODUCTION
Conventional single-edge-triggered flip-flops (SET-FF's)
change states at the time when the clock signal goes from 0 to 1
or at the time when the clock goes from 1 to 0. The former are
called positive-edge-triggered flip-flops (PET-FF's) or rising-edge-triggered flip-flops (RET-FF's) and the latter are called
negative-edge-triggered flip-flops (NET-FF's) or trailing-edge-triggered flip-flops (TET-FF's). The advantage of edge triggering is that the setup time for data input is independent of the
clock pulse width. This makes system design simpler. It is also
less sensitive to noise. However, these flip-flops respond only
once per clock pulse cycle. Energy and time are wasted. Unger
proposed in [1] a class of flip-flops (FF's) that will respond to
both the positive and the negative edge of the clock pulse. These
double-edge-triggered flip-flops (DET-FF's) have two major advantages. First, power dissipation is reduced. With the conventional SET-FF's, one of the two clock transitions accomplishes
nothing. However, this transition may cause changes in the
output of some logic elements internal to the FF's. In addition,
extra energy is wasted to charge or discharge the capacitive load
of the global clock line in a system using SET-FF's. This is
particularly true in CMOS where static power dissipation is
small and the dynamic power dissipation is the main contributor
of energy dissipation. Second, the speed of the system is accelerated. With both edges able to cause state transition, some
redundant logic can be eliminated. Moreover, the clock period
will be shortened because there is no need to wait for the clock
signal to toggle up and down.
The main disadvantage of DET-FF's has been the substantial
increase in the number of components required to build such
FF's. In most cases, more than double the logic count is
expected. This paper proposes a novel design in CMOS which
will implement static DET-FF's with relatively little increase in
components. It is based on the single-phase CMOS register
proposed by Lu in [2]. An implementation of a D-type DET-FF
uses only 26 MOS devices in comparison with a typical static
CMOS D-type flip-flop which requires 16 MOS devices. Another disadvantage of DET-FF's is in the extra delays caused by
the extra gates needed to implement it by parallel decomposition. The presented CMOS implementation introduces little
delay. It satisfies the speed requirement of the modern digital
system. This D-FF is clocked at 50 MHz. Simulation performed
with parameters obtained from a MOS Implementation System
(MOSIS) [3] 2-μm CMOS/bulk process endorses the proposed
implementation.

II. CIRCUIT DESIGN OF A D-TYPE DET-FF
A D-type DET-FF consists of two cross-coupled latches with
input gating devices and some simple pass-transistor logic. A
circuit diagram is illustrated in Fig. 1. Its operation principle is
similar to the one used by Mead and Wawrzynek [4]. The two
cross-coupled latches are enabled/disabled by the clock signal.
When the clock is low, latch 1 is disabled and latch 2 is enabled.
With clock high, latch 1 is enabled and latch 2 disabled. A
disabled latch 1 has both its output and the complement set
to high ($V_{dd}$). A disabled latch 2 has both its output and the
complement set to low (GND). During the rising edge of the
clock signal, latch 1 is being enabled. Depending on the D input
value, either transistor M7 or M8 is conducting just before M9
switches off. Either output Q1 or its complement will remain
charged to high ($V_{dd}$) while the other is discharged to low
(GND). The set value will stay unchanged throughout the half of
the clock period while it is high. Similarly, on the trailing edge
of the clock signal latch 2 is being enabled. According to the
value of the D input, either Q2 or its complement will remain
low (GND) while the other will be set to high. The output value
remains stable for the duration of low clock signal. Thus, this
DET-FF is a static flip-flop. It consumes no static power. Table
I gives the logic required to obtain the final output value. We
observed that when the clock is low, both Q1 and its complement are high. The final value is the value of Q2. When clock is
high, both Q2 and its complement are low. The final value is the
value of Q1. Pass-transistor logic, shown in Fig. 1(c), is used to
implement the logic function.
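The selection rule described above (the output follows Q1 while the clock is high and Q2 while it is low, with D sampled on both edges) can be sketched as a purely behavioral Python model. This is an illustrative abstraction under stated assumptions, not the transistor-level circuit: the class name `DetFlipFlop` and its interface are hypothetical, and the forced levels of a disabled latch (both outputs high or both low) are not modeled.

```python
class DetFlipFlop:
    """Behavioral model of a D-type DET-FF: D is sampled on every clock edge."""

    def __init__(self, clk=0, q=0):
        self.clk = clk
        self.q1 = q      # value held by latch 1 (meaningful while clk is high)
        self.q2 = q      # value held by latch 2 (meaningful while clk is low)

    def step(self, clk, d):
        """Apply new clock and data levels; return the flip-flop output."""
        if clk != self.clk:          # a clock edge occurred
            if clk == 1:
                self.q1 = d          # rising edge: latch 1 is enabled and captures D
            else:
                self.q2 = d          # falling edge: latch 2 is enabled and captures D
            self.clk = clk
        # Output selection: Q1 while the clock is high, Q2 while it is low.
        return self.q1 if self.clk else self.q2

ff = DetFlipFlop()
stimulus = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]   # (clk, d) pairs
outputs = [ff.step(clk, d) for clk, d in stimulus]
print(outputs)
```

In this stimulus the data changes made while the clock level is steady (second and fourth entries) do not reach the output; only the values present at a rising or falling edge are captured.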
0018-9200/90/0800-1008$01.00 © 1990 IEEE