Lecture 10: Latch and Flip-Flop Design

Slides originally from: Vladimir Stojanovic
Computer Systems Laboratory
Stanford University
horowitz@stanford.edu
5/7/2001 EE371

Outline
• Recent interest in latches and flip-flops
• Timing and power metrics
• Design and optimization tradeoffs
• Master-slave vs. pulse-triggered latch
• Representative designs
• Comparison

Recent Interest in Flip-Flops
• Trends in high-performance systems
  - Higher clock frequency
  - More transistors on chip
• Consequences
  - Increased flip-flop overhead relative to cycle time
    • Cycle time 10-20 FO4 delays, flop overhead 2-4 FO4
  - Difficult to control both edges of the clock
  - Higher impact of clock skew
  - Higher crosstalk and substrate coupling
  - Higher power consumption
    • expensive packages and cooling systems
    • limit in performance
  - Clock burns up to 40%, flops up to 20% of total power

Requirements in the Flip-Flop Design
• Small Clk-Output delay, narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism)
  - Typical flip-flop load in a 0.18µm CMOS ranges from 50fF to over 200fF, with typical values of 100-150fF in critical paths (28FO4s or even higher)
• Integration of logic into the flop
• Multiplexed or clock scan
• Crosstalk insensitivity - dynamic/high impedance nodes are affected

Flip-Flop Delay
• The sum of setup time and Clk-output delay is the only true measure of performance with respect to system speed
• T = T_{Clk-Q} + T_{Logic} + T_{setup} + T_{skew}
[Figure: a flip-flop (D, Q, Clk) driving a logic block into a second flip-flop; the timing diagram marks TClk-Q, TLogic, and TSetup]

Delay vs.
Setup/Hold Times
[Figure: Clk-Output delay [ps] vs. Data-Clk [ps]; the plot marks the setup time, the hold time, and the minimum Data-Output point]

Timing parameters, details
[Figure: D-Q and Clk-Q delay (Time [ps]) vs. D - Clk delay [ps]; the plot marks the stable Clk-Q region, the unstable Clk-Q region, the failure region, the minimum D-Q point, and the optimum setup time U]
• The best point to pick on the delay curve is minimum D-Q

Design & optimization tradeoffs
• Opposite goals
  - Minimal total power consumption
  - Minimal delay
• Power-delay tradeoff
• Minimize the power-delay product (PDPtot)
[Figure: three plots of PDPtot [fJ], against Total Power [uW], Delay [ps], and Width [um]]

Types of Flip-Flops
[Figure: block diagrams of a Master-Slave Latch (latches L1 and L2) and a Pulse-Triggered Latch (latch L), each with Data and Clk inputs]

Master-Slave Latches
• Positive setup times
• Two clock phases:
  - distributed globally
  - generated locally
• Small penalty in delay for incorporating MUX
• Some circuit tricks needed to reduce the overall delay

T-G Master-Slave Latch
• PowerPC 603 (Gerosa, JSSC 12/94)
[Figure: schematic with input D, clocks Clk and Clkb, and output Q]

T-G Master-Slave Latch
• Low power feedback
• Unbuffered input
  - input capacitance depends on the phase of the clock
  - overshoot and undershoot with long routes
  - wire length must be restricted at the input
• Clock load is high
• Low power
• Small Clk-output delay, but positive setup
• Easily embedded scan or mux

C2MOS MS Latches
[Figure: schematic with input D, clocks Clk, Ck, and Ckb, and output Q]
• Low power feedback
• Locally generated second phase
• Poor driving capability
• Robustness to clock slope

Single-Transistor-Clocked MS latches
[Figure: DSTC and SSTC schematics]
• Yuan and Svensson, JSSC Jan. '97
• Ratioed DCVS and SRPL based designs
• Relatively small clock load
• Very sensitive to input glitching
• Capacitive coupling and charge sharing related speed and power problems

Pulse-Triggered Latches
• First stage is a pulse generator
  - generates a pulse (glitch) on a rising edge of the clock
• Second stage is a latch
  - captures the pulse generated in the first stage
• Pulse generation results in a negative setup time
• Frequently exhibit a soft edge property
• Must check for hold time violations
Note: power is always consumed in the clocked pulse generator

Hybrid Latch Flip-Flop
• AMD K-6, Partovi, ISSCC'96
[Figure: HLFF schematic with inputs D and Clk]

HLFF Operation
• 1-0 and 0-1 transitions at the input with 0ps setup time

Hybrid Latch Flip-Flop
Skew absorption
Partovi et al, ISSCC'96

Hybrid Latch Flip-Flop
• Flip-flop features:
  - single phase clock
  - edge triggered, on one clock edge
• Latch features: soft clock edge property
  - brief transparency, equal to 3 inverter delays
  - negative setup time
  - allows slack passing
  - absorbs skew
• Hold time is comparable to HLFF delay
  - minimum delay between flip-flops must be controlled
• Fully static
• Possible to incorporate logic

Semi-Dynamic Flip-Flop (SDFF)
• Sun UltraSparc III, Klass, VLSI Circuits'98
[Figure: SDFF schematic with inputs D and Clk]
• Soft edge conditioned by data since first stage is precharged - cross-coupled latch is added for robustness
• Small penalty for adding logic
• Latch has one transistor less in stack - faster than HLFF, but 1-1 glitch exists

Sense-amplifier-based flip-flop
Matsui et al. 1994. DEC Alpha 21264, StrongARM 110
• First stage is a sense amplifier
• On the rising clock edge, a monotonic S_b or R_b triggers the S-R latch
• Cross-coupled NAND is the speed bottleneck
• Big power savings in reduced swing designs
• Nice interface to/from domino logic

Modified Sense Amplifier-Based Flip-Flop
• The first stage is an unchanged sense amplifier
• Second stage is sized to provide maximum switching speed
• Driver transistors are large
• Keeper transistors are small and disengaged during transitions
Nikolic & Stojanovic, ISSCC '99

Modified Sense Amplifier-Based Flip-Flop
• Delay of each of the outputs is independent of the load on the other output
• Delay of Q and Qb is symmetrical, as opposed to the NAND based design
• Convenient for dual rail logic, and driving strength for standard CMOS is effectively doubled
• SAFF presents a small clock load, small setup time, and all the advantages of the original design
• Possible tradeoff between speed and robustness to crosstalk

K-6 Dual-Rail ETL
• Self-reset property
  - increases dynamic power
  - drives domino logic
• Precharge increases speed
• Very fast but burns a lot of power
• Small clock load
[Figure: schematic with inputs D and Clk]

Flip-Flop Performance Comparison
[Figure: test bench with Data and Clock driving the flip-flop under test; capacitive loads labeled 200fF, 200fF, and 50fF]
• Total power consumed
  - internal power
  - data power
  - clock power
• Delay is minimum D-Q (Clk-Q + setup time)
• Measured for four cases
  - no activity (0000… and 1111…)
  - maximum activity (0101010..)
  - average activity (random sequence)

Delay comparison
[Figure: two bar charts of Delay [ps]; one covers K6, SA-F/F, StrongArm, SSTC, and DSTC, the other SDFF, HLFF, PowerPC, and mC2MOS]
• Pulsed design brings the fastest structures

Overall performance
[Figure: two bar charts of PDPtot [fJ] at Activity=0.5 (equal transition probability); designs shown: SA-F/F, HLFF, SDFF, PowerPC, mC2MOS, K6, SSTC, DSTC, Strong Arm110]
• Real signals have the activity between 0 and 0.5 (g)
• Precharged hybrid structures are the fastest but their power consumption strongly depends on the probability of "ones"
• More "ones" above the g point

Conventional Clk-Q vs. minimum D-Q
[Figure: two scatter plots of Total power [uW], one against Delay [ps] and one against Clk-Q delay [ps]; designs plotted: HLFF, SDFF, SSTC, DSTC, PowerPC, Strong Arm FF, SA-F/F, mC2MOS latch, K6 ETL; groups labeled "Pulsed designs" and "MS designs"]
• Hidden positive setup time
• Degradation of total delay

Comparison of Clock power consumption
[Figure: bar chart of local clock power consumption [µW] for DSTC MS latch, SSTC MS latch, K6 ETL, StrongArm FF, SA-F/F, mC2MOS, PowerPC MS latch, SDFF, HLFF]

Design goals
• Apply
  - Small clock load
  - Short direct path
  - Reduced node swing
  - Low-power feedback
  - Pulsed design
  - Optimization of both Master and Slave latch
• Avoid
  - Positive setup time
  - Sensitivity to clock slope and skew
  - Dynamic (floating) nodes
  - Dynamic Master latch
Conduct Power*Delay optimizations at constant frequency - this really optimizes the Energy*Delay product
Take into account all sources of power dissipation
ALWAYS use Clk-Q + setup time for max delay

General characteristics
• 60ps = FO4 delay in 0.2um technology
• Minimum gate width 1.6um
Nominal conditions

Table 1: General characteristics

Design        # of         Total transistor  Internal    Clock       Data        Total       Delay  PDPtot
              transistors  width [um]        power [uW]  power [uW]  power [uW]  power [uW]  [ps]   [fJ]
PowerPC 603   16           147               36          46          5           87          266    23
HLFF          20           162               106         18          3           127         199    25
SDFF          23           167               158         27          2           187         187    35
mC2MOS        24           170               94          15          6           115         292    34
SA-F/F        19           214               97          18          3           118         272    32
StrongArm FF  20           215               101         18          3           122         275    34
K6 ETL        37           246               250         15          5           270         200    54
SSTC          16           147               94          22          4           120         592    71
DSTC          10           136               132         22          4           158         629    99
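
The columns of Table 1 are related, and the relations can be checked numerically. The sketch below is not part of the original slides. It assumes that total power is the sum of the internal, clock, and data components, and that PDPtot is total power times delay (1 uW x 1 ps = 10^-3 fJ). It also expresses each delay in FO4 units using the 60ps figure quoted above. The names `TABLE_1`, `pdp_fj`, and `summarize` are illustrative only.

```python
# Rows transcribed from Table 1:
# (design, internal [uW], clock [uW], data [uW], total [uW], delay [ps], PDPtot [fJ])
TABLE_1 = [
    ("PowerPC 603", 36, 46, 5, 87, 266, 23),
    ("HLFF", 106, 18, 3, 127, 199, 25),
    ("SDFF", 158, 27, 2, 187, 187, 35),
    ("mC2MOS", 94, 15, 6, 115, 292, 34),
    ("SA-F/F", 97, 18, 3, 118, 272, 32),
    ("StrongArm FF", 101, 18, 3, 122, 275, 34),
    ("K6 ETL", 250, 15, 5, 270, 200, 54),
    ("SSTC", 94, 22, 4, 120, 592, 71),
    ("DSTC", 132, 22, 4, 158, 629, 99),
]

FO4_PS = 60.0  # FO4 delay quoted on the slide for this technology


def pdp_fj(total_power_uw, delay_ps):
    """Power-delay product in fJ (assumed definition: total power x delay)."""
    # 1 uW * 1 ps = 1e-18 J = 1e-3 fJ
    return total_power_uw * delay_ps * 1e-3


def summarize(table=TABLE_1):
    """Recompute the derived quantities for every row of the table."""
    rows = []
    for name, internal, clock, data, total, delay, pdp_listed in table:
        rows.append({
            "design": name,
            "power_sum_uw": internal + clock + data,  # should equal total
            "total_uw": total,
            "pdp_fj": pdp_fj(total, delay),           # should round to listed
            "pdp_listed_fj": pdp_listed,
            "delay_fo4": delay / FO4_PS,
        })
    return rows


if __name__ == "__main__":
    for r in summarize():
        print(f"{r['design']:13s} PDP = {r['pdp_fj']:5.1f} fJ "
              f"(listed {r['pdp_listed_fj']}), "
              f"delay = {r['delay_fo4']:.1f} FO4")
```

Under these assumptions, the three power components of every row add up to the listed total, and the recomputed product agrees with the listed PDPtot to within rounding.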