OCIECE Winter 2002
High-Speed Low-Power VLSI

New Single-Clock CMOS Latches and Flipflops with Improved Speed and Power Savings [2]
&
A New True-Single-Phase-Clocked Double-Edge-Triggered Flip-Flop for Low-Power VLSI Designs [7]

Discussion States
• Introduction.
• Motivation for TSPC and DET flip-flops.
• New techniques for high-speed TSPC and single-clocked flip-flops and latches.
• A new technique for a TSPC dual-edge-clocked flip-flop.
• A new approach for comparative power-consumption analysis of single-edge-triggered / dual-edge-triggered flip-flops.
• Results and comparisons.
• Conclusions.
• Questions.

Static vs. Dynamic Latches
• A static latch:
  • A cross-coupled inverter pair forms a bistable element. Its two stable states hold binary data for as long as the supply voltage is present [10].
  • One or more clock signals make the path between the input and the output of the bistable element transparent or non-transparent.
[Figure: latch symbol (D, Q, Clk)]
• A dynamic latch:
  • Charge is stored temporarily on the parasitic capacitances of the circuit and is refreshed periodically. The stored charge holds the binary data [10].
  • A clock signal makes the path between the input and the output of the circuit transparent, partially transparent, or non-transparent.

Static, Semi-static and Dynamic Flip-Flops
• Static (non-precharged) flip-flop: a cascaded pair of static latches clocked in a complementary style.
• Semistatic flip-flop: a static latch cascaded with a dynamic latch, clocked in a complementary style.
• Fully dynamic flip-flop: a cascaded pair of dynamic latches clocked in a complementary style.
• Differential flip-flop: a cascaded pair of differential latches (static, dynamic, or a mix) clocked in a complementary style. It operates on differential inputs and outputs.
[Figure: flip-flop symbol (D, Q, Clk)]
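The phrase "a cascaded pair of latches clocked in a complementary style" can be illustrated with a small behavioral model. This is only a sketch: the class names `Latch` and `MasterSlaveFlipFlop` are hypothetical, and the model ignores delays, setup time, and charge leakage, so it does not distinguish static from dynamic storage.

```python
class Latch:
    """Level-sensitive D latch: transparent while clk equals its active level."""

    def __init__(self, transparent_when):
        self.transparent_when = transparent_when  # clock level (0 or 1) that opens the latch
        self.q = 0                                # stored bit

    def update(self, d, clk):
        if clk == self.transparent_when:
            self.q = d        # transparent: output follows input
        return self.q         # otherwise the stored value is held


class MasterSlaveFlipFlop:
    """Two latches clocked on complementary levels form a positive-edge flip-flop."""

    def __init__(self):
        self.master = Latch(transparent_when=0)  # open while clk is low
        self.slave = Latch(transparent_when=1)   # open while clk is high

    def update(self, d, clk):
        m = self.master.update(d, clk)
        return self.slave.update(m, clk)


def simulate(ff, samples):
    """Apply a sequence of (d, clk) pairs and collect the Q output."""
    return [ff.update(d, clk) for d, clk in samples]


# (d, clk) samples: d changes while clk is high in the third and sixth samples
samples = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1), (1, 1), (1, 0), (1, 1)]
trace = simulate(MasterSlaveFlipFlop(), samples)
print(trace)
```

In this model a change on D while the clock is high does not reach Q, because the master is non-transparent in that phase; Q only takes the value that D had just before the rising edge.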
Power Consumption in the Clocking System of a Synchronous Pipelined Circuit
[Figure: clock generator (Ckext in, Ckint out) driving the flip-flops (F/F) of a pipeline with combinational logic blocks CL1 to CLn; labeled capacitances Cj, Cgw, Clw.1 to Clw.n, Cg.1 to Cg.n]
• The total power consumption of the clocking system (Pck) can be expressed as:

$$P_{ck} = P_{ckgen} + P_{gw} + P_{lw} + P_{g} + P_{reg} = \left(1 + \frac{1}{k} + \frac{1}{k^{2}} + \dots + \frac{1}{k^{n}}\right) V_s^{2}\, f \left[C_j + C_{gw} + C_{lw} + C_g\right] + P_{reg} \qquad \text{(eqn 1)}$$

where
  Pgw : power consumption of the global wiring capacitances
  Plw : power consumption of the local wiring capacitances
  Pg : power consumption due to the total clocked gate capacitance
  Preg : power consumption in the flip-flops
  k : tapering factor of the clock generator (clock buffer)
  Cg : total gate capacitance of the clocked transistors of the clocking system
  Cj : junction capacitance of the source-drain regions at the output node of the clock generator

Power Consumption in the Clocking System of a Synchronous Pipelined Circuit

$$P_{ck} = 1.11\, V_s^{2}\, f \left[C_j + C_{gw} + C_{lw} + m\, C_{gf}\right] + m\, P_{gf} \qquad \text{(eqn 2)}$$

where
  m : total number of flip-flops in the clocking system
  Cgf : total clocked gate capacitance per flip-flop
  Pgf : power consumption of one flip-flop at frequency f

Therefore, the power consumption of the clocking system can be minimized by:
1. Reducing Cgw and Clw, e.g. by using True Single Phase Clocking (TSPC) techniques.
2. Reducing the number of clocked transistors per flip-flop (the Cgf term), e.g. by using Single-Transistor-Clocked (STC) flip-flops.
3. Utilizing both edges of the clock, e.g. Dual Edge Triggered (DET) flip-flops.
4. Using minimum transistor sizes and careful layout design to reduce Pgf.

TSPC Basic Stages
[Figure: the four basic TSPC stages PP, PN, SP, and SN, each clocked by φ]
• Basic (non-differential) TSPC edge-triggered flip-flops are:
  • Precharged version (dynamic): PP + SP + PN + SN
  • Non-precharged version (static): SP + SP + SN + SN
Low-power bottlenecks:
1. Clocked transistors, with activity α = 1.
2. Precharged nodes, with activity α = 0.5.
Speed bottleneck: P-blocks are needed to provide complementary inputs to the n-blocks (delay of 2 stages + inverter).

Precharged Single-Stage TSPC Full Latch
(PN + SN) + (PP + SP + INV)
Advantages:
1. Total latching of both high and low inputs.
2. Isolated output node (infinite impedance).
3. The precharge state does not show on the output at the start of evaluation.
4. No need for a stable input.
• NMOS transistor sizing is important so as not to load the precharged node.
[Figures: PN, SN, PP, and SP stages; (PN + SN) + FL(P) + INV; (PN/SN) + FL(P) + INV]

Non-precharged Single-Stage TSPC Full Latch
• N-latch to become PSN.
• Charge sharing compensation.
• Only 3 clocked transistors in the PSLT.
• Essential need for a stable input.
[Figures: (SN + SN) + (SP + SP + INV); (PSN + SN) + FL(P) + INV; PSLT(N) + FL(P) + INV]

Non-Differential Semistatic Flip-Flops
• Conflict-free full latch.
• Transistors in the dotted box should be kept at minimum size to minimize the load.
[Figures: PN + SN + SFL(P) + INV; simplified version]

CVSL and Single-Transistor-Clocked TSPC Dynamic and Static Differential Latches
• Availability of complementary outputs.
• STC charge sharing problems:
  • Cannot be recovered in DSTC1.
  • Can be recovered in SSTC1.
[Figure: n-latch and p-latch versions with differential In and Out, clocked by φ]
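The charge sharing problem named above follows from charge conservation: when a clocked transistor connects a charged storage node to a discharged internal node, the two settle at a common, lower voltage. The sketch below only illustrates that relation; the function name and all capacitance and voltage values are hypothetical and are not taken from the slides.

```python
def shared_voltage(c_store, v_store, c_internal, v_internal):
    """Common voltage after two capacitors are connected (total charge is conserved)."""
    total_charge = c_store * v_store + c_internal * v_internal
    return total_charge / (c_store + c_internal)


# Hypothetical illustration values (not from the source)
VDD = 5.0           # supply voltage in volts
C_OUT = 20e-15      # capacitance of the storage (output) node in farads
C_INT = 10e-15      # capacitance of the internal node that gets connected

v_after = shared_voltage(C_OUT, VDD, C_INT, 0.0)
droop = VDD - v_after

print(f"output node after charge sharing: {v_after:.2f} V (droop {droop:.2f} V)")
```

With these example values the stored high level loses one third of its voltage. The slide's point is that a dynamic latch such as DSTC1 has no means of recovering from such a loss, while the static SSTC1 does.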
[Figure: latch schematics DSTC1(P), DSTC1(N), SSTC1(P), SSTC1(N)]

High-Speed Differential Flip-Flops
• The speed bottleneck can be removed using the following principle:
• “The master does not have to be a full-latch and it is enough to have only one isolated output state as long as it is identical to the nontransparent input state of the slave.” [2]
[Figures: DSTC1(N) dynamic flip-flop, (SP + SP) + DSTC1(N); SSTC1(N) semistatic flip-flop, (SP + SP) + SSTC1(N)]

A High-Speed Dynamic Differential Flip-Flop (cont’d)
• The P-block is too fast, so the n-transistors need to be minimized to give enough setup time.
• The SP stages are merged to remove more P-transistors.
• A correspondence showed an input-glitch-related problem.
• Capacitive coupling (x-D). Charge sharing (bj-D) leads to a poor logic-zero output level.
[Figures: positive edge-triggered dynamic flip-flop, (SP + SP) + DSTC1(N); DSTC2(P) followed by DSTC1(N), with internal nodes x and bj]

A High-Speed Dynamic Differential Flip-Flop (cont’d)
• Whenever a pipeline ends with a DSTC2(P), a termination stage is needed to fully latch the output [2].
[Figure: DSTC1(N) followed by DSTC2(P); only used for p-termination of a pipeline]

A High-Speed Fully Static Differential Flip-Flop
• The DSTC2 p-latch is converted to SSTC2(P):
• A minimum-size inverter and two minimum-size n-transistors are added to keep the low output from floating high.
[Figure: fully static flip-flop built from SSTC2(P) and SSTC1(N)]

Performance Comparison [2]

Conventional Dual Edge Triggered (DET) Flip-Flops
[Figures: Fig 1, Fig 2, Fig 3: schematics of conventional DET flip-flops with inputs D and φ and outputs Q]

FEATURES OF DIFFERENTIAL EDGE-TRIGGERED FLIP-FLOPS
F/Fs    s/d   f     Nt      Nc   Pcg.n   Pw.n   Pg.n
SET     s     1     9/11    4    >1      >1     >1
Fig 1   d     1/2   18/20   8    >1      1      >1
Fig 2   d     1/2   20/22   6    >1      1      >1
Fig 3   d     1/2   12/14   4    1       1      1
Fig 4   d     1/2   16/18   4    1       1      1

A New TSPC Dual Edge Triggered (DET) Flip-Flop
• The TSPC DET flip-flop leads to less Pw.
• Fewer clocked transistors, hence less Cgf.
• Fewer speed-bottleneck devices.
• The absence of PP/PN stages leads to less activity on the internal nodes.
[Figure: Fig 4: schematic of the new TSPC DET flip-flop]

Dual Edge Triggered (DET) Flip-Flop Speed Calculations
[Figure: two DET flip-flops with combinational logic CL between them, and a timing diagram of Clock, Data, and Q marking the setup and clock-to-output intervals]

SPEED/POWER PERFORMANCE OF EACH EDGE-TRIGGERED FLIP-FLOP
F/Fs    tcq     tsu     t       Cgf      Pgf     Cff (fF)   Cff/Cgf
SET     0.36n   0.34n   0.70n   6.88f    32.46   25.97      3.77
Fig 1   0.43n   0.34n   0.77n   13.76f   26.02   41.63      3.03
Fig 2   0.70n   0.30n   1.00n   10.32f   29.92   47.86      4.64
Fig 3   0.53n   0.33n   0.86n   6.88f    125.4   200.6      29.16
Fig 4   0.66n   0.43n   1.09n   6.88f    23.38   37.41      5.44

• Setup time: tsu = max(tsu1, tsu2)
• Clock-to-output delay: tcq = max(tcq1, tcq2)
• Speed figure of the DET flip-flop: t = tsu + tcq
• Max. toggle frequency of the DET flip-flop (@ 25 MHz) = 1 / (2 × speed figure) = 500 MHz

Power Consumption in the Clocking System Associated with SET/DET Flip-Flops
• The power consumption per flip-flop at frequency f is

$$P_{gf} = C_{ff}\, V_{dd}^{2}\, f \qquad \text{(eqn 3)}$$

where Cff is the equivalent capacitance of one flip-flop. Substituting into eqn 2 and assuming Vs = Vdd:

$$P_{ck} = 1.11\, V_s^{2}\, f \left[C_j + C_{gw} + C_{lw} + m\, C_{gf}\right] + m\, C_{ff}\, V_{dd}^{2}\, f = V_{dd}^{2}\, f \left(C_{clk} + C_{reg}\right) = V_{dd}^{2}\, f\, C_{tot}$$

• A pipelined FIR macro (number of flip-flops = 2616) was implemented, and a comparative analysis showed a power reduction of 36% compared to the SET flip-flop implementation.

A TYPICAL POWER CONSUMPTION EXAMPLE OF DIFFERENT CLOCKING SYSTEMS
F/Fs    f (MHz)   Cg    Cclk        Creg        Ctot     Pck (W)   Pck.n (norm.)   P.t (fJ)
SET     50        18    37.7        67.9        105.6p   0.132     1.00            35.3
Fig 1   25        36    57.7        108.9       166.6p   0.104     0.79            30.6
Fig 2   25        27    47.7        125.9       172.9p   0.108     0.82            41.3
Fig 3   25        18    37.7        524.9       562.6p   0.352     2.67            115.7
Fig 4   25        1     (missing)   (missing)   135.6p   0.085     0.64            35.4

A New Approach for Power Consumption Analysis of SET/DET Flip-Flops [5]
[Figure: SET and DET flip-flop structures with nodes D, X, Y, Z, Q and Clk]
In the absence of glitches, the power consumption of the data nodes is

$$P_D = \frac{1}{2}\, f_D\, V_{DD}^{2} \sum_j C_j\, \alpha_j$$

where α_j is the transition probability at node j and C_j is the node capacitance. The power consumption of the clock nodes is

$$P_{ck} = f_{ck}\, V_{DD}^{2}\, C_k$$

For equal data rate:

$$P_{SET} = f\, V_{DD}^{2}\, C_{k,SET} + \frac{1}{2}\, f\, V_{DD}^{2} \sum_j \alpha_{j,SET}\, C_{j,SET}$$

$$P_{DET} = \frac{1}{2}\, f\, V_{DD}^{2}\, C_{k,DET} + \frac{1}{2}\, f\, V_{DD}^{2} \sum_j \alpha_{j,DET}\, C_{j,DET}$$

Expressed in terms of the input transition probability α_D:

$$P_{SET} = f\, V_{DD}^{2}\, C_{k,SET} + \alpha_D\, \frac{1}{2}\, f\, V_{DD}^{2}\, C_{\alpha,SET}$$

$$P_{DET} = \frac{1}{2}\, f\, V_{DD}^{2}\, C_{k,DET} + \alpha_D\, \frac{1}{2}\, f\, V_{DD}^{2}\, C_{\alpha,DET}$$

$$\frac{P_{DET}}{P_{SET}} = \frac{C_{k,DET} + \alpha_D\, C_{\alpha,DET}}{2\, C_{k,SET} + \alpha_D\, C_{\alpha,SET}}$$

A New Approach for Power Consumption Analysis of SET/DET Flip-Flops [5]
Input glitch analysis:
[Figure: SET and DET flip-flop structures with nodes D, X, Y, Z, Q and Clk]

$$P_{glitch,SET} = f\, V_{DD}^{2}\, C_{k,SET} + \beta\, f\, V_{DD}^{2}\, C_{\beta,SET}$$

where β is the average number of 0 → 1 → 0 transitions between two active clock edges, and, per glitch transition,

$$C_{\beta,SET} = C_{D,SET} + \frac{1}{2}\, C_{X,SET}$$

$$P_{glitch,DET} = \frac{1}{2}\, f\, V_{DD}^{2}\, C_{k,DET} + \beta\, f\, V_{DD}^{2}\, C_{\beta,DET}$$

where, per glitch transition,

$$C_{\beta,DET} = C_{D,DET} + C_{X,DET}$$

$$\frac{P_{glitch,DET}}{P_{glitch,SET}} = \frac{C_{k,DET} + 2\beta\, C_{\beta,DET}}{2\, C_{k,SET} + 2\beta\, C_{\beta,SET}}$$

Conclusions [2],[7]
• DET flip-flops are more sensitive to signal glitches than SET flip-flops.
• In applications with a low transition probability of the input signals and reduced glitching, the power savings from using DET instead of SET flip-flops can be significant (up to 36% with the newly proposed TSPC flip-flop). The proposed DET flip-flop has a highly competitive power-delay product.
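The 36% figure quoted above can be cross-checked against the clocking-system table using Pck = Vdd² · f · Ctot. The sketch below makes two assumptions that the slides do not state: Vdd = 5 V, chosen because it reproduces the listed Pck values, and P.t taken as the per-flip-flop share of Pck (with m = 2616) multiplied by the speed figure t. The names `TABLE` and `clock_power` are hypothetical.

```python
VDD = 5.0            # volts; ASSUMPTION inferred from the table, not stated in the slides
M_FLIPFLOPS = 2616   # number of flip-flops in the FIR macro (from the slides)

# name: (clock frequency in Hz, Ctot in farads, speed figure t in seconds), from the tables
TABLE = {
    "SET":   (50e6, 105.6e-12, 0.70e-9),
    "Fig 1": (25e6, 166.6e-12, 0.77e-9),
    "Fig 2": (25e6, 172.9e-12, 1.00e-9),
    "Fig 3": (25e6, 562.6e-12, 0.86e-9),
    "Fig 4": (25e6, 135.6e-12, 1.09e-9),
}


def clock_power(f_hz, c_tot, vdd=VDD):
    """Pck = Vdd^2 * f * Ctot, i.e. eqn 2 with eqn 3 substituted and Vs = Vdd."""
    return vdd ** 2 * f_hz * c_tot


pck = {name: clock_power(f, c) for name, (f, c, _t) in TABLE.items()}
pck_norm = {name: p / pck["SET"] for name, p in pck.items()}

# ASSUMPTION: P.t is the per-flip-flop share of Pck times the speed figure, in fJ
energy_fj = {name: pck[name] / M_FLIPFLOPS * t * 1e15
             for name, (_f, _c, t) in TABLE.items()}

saving_fig4 = 1.0 - pck_norm["Fig 4"]

for name in TABLE:
    print(f"{name:6s} Pck = {pck[name]:.3f} W  norm = {pck_norm[name]:.2f}  "
          f"P.t = {energy_fj[name]:.1f} fJ")
print(f"Fig 4 saving relative to SET: {saving_fig4:.0%}")
```

Under these assumptions the computed values agree with the table to within rounding, and the Fig 4 design comes out roughly 36% below the SET implementation even though its Ctot is larger, because the DET clock runs at half the frequency.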
[12]
• With the trend toward higher clock frequencies, it will become increasingly difficult to control both edges of the clock in the clock distribution system.
• Flip-flops based on SSTC1, SSTC2, DSTC1, and DSTC2 showed delay improvements by factors of 2.2 and 2.4 in the semistatic and fully static styles, respectively.
• The PDP was reduced by factors of 3.4 and 6.5 (for α = 0.25) in the semistatic and fully static styles, respectively.

Conclusions [2],[7]
• In the newly proposed high-speed flip-flops, the logic-related transistors are purely n-type in both n-latches and p-latches, which gives this approach its speed advantage in flip-flop design.

References
[1] J. Yuan and C. Svensson, “High-Speed CMOS Circuit Technique,” IEEE J. Solid-State Circuits, vol. 24, no. 1, pp. 62-70, Feb. 1989.
[2] J. Yuan and C. Svensson, “New Single-Clock CMOS Latches and Flipflops with Improved Speed and Power Savings,” IEEE J. Solid-State Circuits, vol. 32, no. 1, pp. 62-69, Jan. 1997.
[3] W. M. Chung and M. Sachdev, “A Comparative Analysis of Dual Edge Triggered Flip-Flops,”
[4] R. P. Llopis and M. Sachdev, “Low Power, Testable Dual Edge Triggered Flip-Flops,” International Symposium on Low Power Electronics and Design, 1996, pp. 341-345.
[5] A. G. M. Strollo, E. Napoli, and C. Cimino, “Analysis of Power Dissipation in Double Edge-Triggered Flip-Flops,” IEEE Trans. on VLSI Systems, vol. 8, no. 5, Oct. 2000.
[6] S. M. M. Mishra, S. S. Rofail, and K. S. Yeo, “Design of High Performance Double Edge-Triggered Flip-Flops,” IEE Proc. Circuits Devices Syst., vol. 147, no. 5, Oct. 2000.
[7] J. S. Wang, “A New True-Single-Phase-Clocked Double-Edge-Triggered Flip-Flop for Low-Power VLSI Designs,” IEEE Int’l Symposium on Circuits and Systems, pp. 1896-1899, June 9-12, 1997.
[8] G. M. Blair, Comments on “New Single-Clock CMOS Latches and Flip-flops with Improved Speed and Power Savings,” IEEE J. Solid-State Circuits, vol. 32, no. 10, pp. 1610-1611, Oct. 1997.
[9] V. Stojanovic and V. G. Oklobdzija, “Comparative Analysis of Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems,” IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 536-548, Apr. 1999.
[10] M. Afghahi and J. Yuan, “Double-edged-triggered D-Flip-Flops for high speed CMOS circuits,” IEEE J. Solid-State Circuits, vol. SC-26, no. 8, pp. 1168-1170, Aug. 1991.
[11] Jan M. Rabaey, “Digital Integrated Circuits: A Design Perspective,” 1996.

Questions?

The Bistability Principle [10]
• The width of the trigger pulse needs to be larger than the total propagation delay around the circuit loop. [10]
[Figure: transfer characteristics of the cross-coupled inverter pair, axes Vi1 = Vo2 and Vi2 = Vo1]