Digital Integrated Circuits
A Design Perspective
Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolić
Timing Issues
January 2003
EE141 © Digital Integrated Circuits 2nd

Synchronous Timing
[Figure: In -> R1 -> Cin -> Combinational Logic -> Cout -> R2 -> Out, with R1 and R2 both clocked by CLK]

Timing Definitions
[Figure: timing definitions]

Latch Parameters
• tc-q is the clock-to-output delay (the maximum propagation delay of the register); td-q is the data-to-output delay.
• Data D must be stable to be properly registered in the latch (no unintended changes while the latch is transparent).
• An intended change must arrive at least tsu before the latch closes.
• Delays can differ for rising and falling data transitions.
[Figure: latch symbol (D, Q, Clk) and waveforms for Clk, D and Q marking T, PWm, tsu, thold, tc-q and td-q]

Register Parameters
• Data must be stable before the rising edge of the clock and held long enough to be processed by the register.
• Delays can differ for rising and falling data transitions.
[Figure: register symbol (D, Q, Clk) and waveforms for Clk, D and Q marking T, tsu, thold and tc-q]

Clock Uncertainties
Sources of clock uncertainty:
1 Clock generation
2 Devices
3 Interconnect
4 Power supply
5 Temperature
6 Capacitive load
7 Coupling to adjacent lines

Clock Nonidealities
• Clock skew (constant delay): spatial variation in temporally equivalent clock edges; deterministic + random, tSK
• Clock jitter (random variations): temporal variations in consecutive edges of the clock signal; modulation + random noise
  Cycle-to-cycle (short-term): tJS
  Long-term: tJL
• Variation of the pulse width: important for level-sensitive clocking

Clock Skew and Jitter
[Figure: two Clk waveforms, one marking the skew tSK and one marking the jitter tJS]
• Both skew and jitter affect the effective cycle time.
• Only skew affects the race margin.

Clock Skew
[Figure: number of registers versus Clk delay; the earliest occurrence of the Clk edge is at Nominal − δ/2 and the latest at Nominal + δ/2; the insertion delay and the maximum Clk skew δ are marked]

Positive and Negative Skew
[Figure: (a) Positive skew and (b) Negative skew; In -> R1 -> Combinational Logic -> R2 -> Combinational Logic -> R3 -> ..., with CLK reaching the registers as tCLK1, tCLK2, tCLK3 through delay elements; in (a) the clock is routed in the direction of the data, in (b) against it]

Positive Skew
[Figure: CLK1 and CLK2 waveforms with edges 1 to 4; the intervals TCLK, TCLK + δ, δ and δ + th are marked]
The launching register clock edge arrives before the receiving register clock edge.

Negative Skew
[Figure: CLK1 and CLK2 waveforms with edges 1 to 4; the intervals TCLK, TCLK + δ and δ are marked]
The receiving register clock edge arrives before the launching register clock edge.

Timing Constraints
[Figure: In -> R1 (clock tCLK1) -> Combinational Logic -> R2 (clock tCLK2); the registers are characterized by tc-q, tc-q,cd, tsu and thold, the logic by tlogic and tlogic,cd]
Minimum clock cycle time: $T - \delta = t_{c-q} + t_{su} + t_{logic}$
• Here cd stands for contamination (minimum) delay, both in the register propagation time and in the combinational logic delay.
• The worst case is when the receiving edge arrives early (negative skew); δ here is the magnitude of that skew, so a negative clock skew reduces the clock frequency.

Timing Constraints
[Figure: same register-to-register path as on the previous slide]
Hold time constraint: $t_{(c-q,cd)} + t_{(logic,cd)} > t_{hold} + \delta$
• The worst case is when the receiving edge arrives late (positive skew).
• A race between data and clock is more likely for a positive clock skew.

Impact of Jitter
[Figure: CLK with period TCLK and edge uncertainty from −tjitter to +tjitter; In -> REGS -> Combinational Logic, fed back to REGS; the registers are characterized by tc-q, tc-q,cd, tsu, thold and tjitter, the logic by tlogic and tlogic,cd]
Since jitter is a random delay, it increases the minimum clock period and increases the likelihood of a race between clock and data.
Longest Logic Path in Edge-Triggered Systems
[Figure: Clk waveform with period T; the latest point of launching and the earliest arrival of the next cycle are separated from their nominal edges by TJI + δ; TClk-Q, TLM and TSU are marked between them]
TLM is the maximum logic delay.

Clock Constraints in Edge-Triggered Systems
If the launching edge is late and the receiving edge is early, the data will not be too late if:
$T_{c-q} + T_{LM} + T_{SU} < T - T_{JI,1} - T_{JI,2} - \delta$
The minimum cycle time is determined by the maximum delays through the logic:
$T_{c-q} + T_{LM} + T_{SU} + \delta + 2T_{JI} < T$
Skew can be either positive or negative.

Shortest Path
The shortest path affects feedback connections, which typically have a negative clock skew.
[Figure: Clk waveform showing the earliest point of launching, followed by TClk-Q and TLm; at the receiving Clk the hold time TH follows the nominal clock edge, and data must not arrive before the end of TH]
TLm is the minimum logic delay.

Clock Constraints in Edge-Triggered Systems
If the launching edge is early and the receiving edge is late:
$T_{c-q} + T_{Lm} - T_{JI,1} > T_H + T_{JI,2} + \delta$
Minimum logic delay:
$T_{c-q} + T_{Lm} > T_H + 2T_{JI} + \delta$
For clock skew only we had:
$t_{(c-q,cd)} + t_{(logic,cd)} > t_{hold} + \delta$

How to counter Clock Skew?
[Figure: registers (REG) between In and Out connected by logic, with the clock distribution routed so that some paths see negative skew and others positive skew]
Data and Clock Routing

Flip-Flop-Based Timing
Logic propagation must finish before the next rising edge of the clock.
[Figure: flip-flop followed by logic; the clock cycle (φ = 0, φ = 1) is divided into skew, flip-flop delay (TSU, TClk-Q) and logic delay]
Representation after M. Horowitz, VLSI Circuits 1996.
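The long-path and short-path constraints for edge-triggered systems given above can be checked numerically. The following is a minimal sketch, assuming all delays are expressed in the same unit (picoseconds here), that `skew` is the worst-case skew magnitude δ and that `jitter` is the per-edge bound TJI. The function names and the example numbers are hypothetical and are not taken from the slides.

```python
def min_clock_period(t_cq, t_logic_max, t_su, skew=0, jitter=0):
    """Lower bound on T from  Tc-q + TLM + TSU + skew + 2*TJI < T.

    skew   -- worst-case amount by which the receiving edge arrives early
    jitter -- jitter bound of a single clock edge (TJI)
    """
    return t_cq + t_logic_max + t_su + skew + 2 * jitter


def hold_margin(t_cq_cd, t_logic_min, t_hold, skew=0, jitter=0):
    """Margin of  Tc-q + TLm > TH + 2*TJI + skew.

    A positive result means the shortest path is slow enough;
    zero or a negative result means a race is possible.
    """
    return (t_cq_cd + t_logic_min) - (t_hold + skew + 2 * jitter)


# Hypothetical numbers, in picoseconds.
period_bound = min_clock_period(t_cq=100, t_logic_max=800, t_su=50,
                                skew=50, jitter=25)
margin = hold_margin(t_cq_cd=60, t_logic_min=40, t_hold=30,
                     skew=50, jitter=25)

print("clock period must exceed", period_bound, "ps")
print("hold margin:", margin, "ps")
```

Setting `jitter=0` reduces both functions to the skew-only constraints. In this example the hold margin is negative, so the short path would need extra delay even though the cycle-time bound can always be met by slowing the clock.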

Flip-Flops and Dynamic Logic
[Figure: two clock-cycle diagrams (φ = 0, φ = 1) marking TSU, TClk-Q and the logic delay, with the Precharge and Evaluate phases of the dynamic logic]
• In dynamic logic gates, logic propagation must finish before the falling edge of the clock.
• The dual relation holds for the PUN controlled by inverted clocks.
• Flip-flops are used only with static logic.

Latch timing
• When data arrives at a transparent latch, the delay is tD-Q: the latch is a 'soft' barrier.
• When data arrives at a closed latch, the delay is tClk-Q: the data has to be 're-launched'.
[Figure: latch symbol (D, Q, Clk)]

Single-Phase Clock with Latches
[Figure: latch followed by logic; Clk waveform with period P and pulse width PW, with the skews Tskl and Tskt marked at the leading and trailing edges]

Latch-Based Design
• The L1 latch is transparent when φ = 0.
• The L2 latch is transparent when φ = 1.
[Figure: L1 Latch -> Logic -> L2 Latch -> Logic]

Slack-borrowing
[Figure: In -> L1 (CLK1) -> a -> CLB_A (tpd,A) -> b -> L2 (CLK2) -> c -> CLB_B (tpd,B) -> d -> L1 (CLK1) -> e; CLK1 and CLK2 waveforms over TCLK showing when a, b, c, d and e become valid, separated by tpd,A, tDQ, tpd,B and tDQ]
Slack is passed to the next stage, shortening the clock period requirement.

Clock Distribution
• The H-tree balances the clock skew.
• The clock is distributed in a tree-like fashion.
[Figure: H-tree driven from CLK]

More realistic H-tree
[Figure: H-tree, from [Restle98]]

The Grid Clock Distribution
• Does not require RC-matching
• Large power dissipation
• Easier to satisfy the metal density requirement in fabrication
• Good thermal distribution
[Figure: clock grid driven from all four sides by drivers fed from GCLK]

Example: DEC Alpha 21164
• Clock frequency: 300 MHz
• 9.3 million transistors
• Total clock load: 3.75 nF
• Power in clock distribution network: 20 W (out of 50)
• Uses two-level clock distribution:
  • Single 6-stage driver at center of chip
  • Secondary buffers drive left and right side clock grid in Metal3 and Metal4
• Total driver size: 58 cm!

21164 Clocking
tcycle = 3.3 ns, trise = 0.35 ns, tskew = 150 ps
[Figure: clock waveform; location of the clock driver on the die, showing pre-driver and final drivers]
• 2-phase single-wire clock, distributed globally
• 2 distributed driver channels
  • Reduced RC delay/skew
  • Improved thermal distribution
  • 3.75 nF clock load
  • 58 cm final driver width
• Local inverters for latching
• Conditional clocks in caches to reduce power
• More complex race checking
• Device variation affects symmetry

Clock Drivers
[Figure: clock drivers]

Clock Skew in Alpha Processor
[Figure: clock skew]

EV6 (Alpha 21264) Clocking
600 MHz, 0.35 micron CMOS
tcycle = 1.67 ns, trise = 0.35 ns, tskew = 50 ps
[Figure: global clock waveform; PLL]
• 2-phase, with multiple conditional buffered clocks
  • 2.8 nF clock load
  • 40 cm final driver width
• Local clocks can be gated "off" to save power
• Reduced load/skew
• Reduced thermal issues
• Multiple clocks complicate race checking

21264 Clocking
[Figure: 21264 clocking]

EV6 Clock Results
[Figure: two plots; GCLK Skew (at Vdd/2 crossings), scale 5 to 50 ps, and GCLK Rise Times (20% to 80%, extrapolated to 0% to 100%), scale 300 to 345 ps]

EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains
[Figure: SYSCLK -> PLL -> GCLK (CPU Core); DLLs derive NCLK (Mem Ctrl), L2L_CLK (L2 Cache) and L2R_CLK (L2 Cache)]
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks

Self-timed and Asynchronous Design
Functions of the clock in synchronous design:
1) Acts as completion signal
2) Ensures the correct ordering of events
Truly asynchronous design:
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in logic
Self-timed design:
1) Completion ensured by completion signal
2) Ordering imposed by handshaking protocol

Synchronous Pipelined Datapath
[Figure: In -> R1 -> Logic Block #1 (tpd1) -> R2 -> Logic Block #2 (tpd2) -> R3 -> Logic Block #3 (tpd3) -> R4, all registers clocked by CLK, register delay tpd,reg]
Make sure that the clock period T is larger than the maximum delay:
$T > \max(t_{pd1}, t_{pd2}, t_{pd3}) + t_{pd,reg}$
Problems:
• Clock skew and jitter
• Strong clock currents, which induce noise due to package inductance
• Power dissipation
• Uneven stage delay could be used to support faster processing

Self-Timed Pipelined Datapath
[Figure: In -> R1 -> F1 (tpF1) -> R2 -> F2 (tpF2) -> R3 -> F3 (tpF3) -> Out; each stage has Start and Done signals connected to a handshake (HS) block, and neighboring HS blocks exchange Req and Ack]
Self-timed logic requires a completion signal.

Completion Signal Generation
The completion signal can be generated by:
• Replica delay
• Dual-rail coding
[Figure: Using Delay Element (e.g. in memories); In -> LOGIC NETWORK -> Out, in parallel with Start -> DELAY MODULE (critical path replica) -> Done]

Completion Signal Generation
Completion signal generation by dual-rail coding requires redundancy in the data representation. Two bits, B0 and B1, represent a single bit value B.
[Table: encoding of the value B with B0 and B1]
Using Redundant Signal Encoding

Completion Signal in DCVSL
[Figure: DCVSL gate with two precharged outputs B0 and B1, each driven from VDD through a Start-controlled device above a pull-down network (PDN) with inputs In1, In2 and their complements; Done is derived from B0 and B1]
Generation of a completion signal in DCVSL

Self-Timed Adder
[Figure: (a) Differential carry generation: Start-controlled carry chains C0 to C4 using the signals P0 to P3, G0 to G3 and K0 to K3; (b) Completion signal: Done derived from all true and complementary carries C1 to C4 under control of Start]
The Done signal is generated after all carry signals are stable.

Completion Signal Using Current Sensing
[Figure: Start -> Input Register -> Inputs -> Static CMOS Logic -> Output; a Current Sensor on GNDsense produces A, a Min Delay Generator produces B, and A and B are combined in a NOR gate to produce Done; waveforms mark tdelay, toverlap, tMDG, tpd-NOR and the point where the output is valid]
The current sensor outputs a low value when no current flows through the logic and a high value when the logic is switching.

Hand-Shaking Protocol
Two Phase Handshake
• The sender cannot change its data once it sends the request signal, which finishes its active cycle.
• The receiver reads the data and produces the acknowledge signal; this starts a new cycle, and the sender can process new data.
• Req and Ack events can be signaled on both high-to-low and low-to-high transitions.

Event Logic: The Muller-C Element
[Figure: (a) Schematic: C element with inputs A and B and output F]
(b) Truth table
A  B  Fn+1
0  0  0
0  1  Fn
1  0  Fn
1  1  1
Implementations of the Muller-C element
[Figure: (a) Logic, with an S-R element; (b) Majority function; (c) Dynamic]

2-phase Handshake Protocol
[Figure: Sender logic and Receiver logic connected by Data; the handshake logic uses a C element to generate Req from Data Ready and Ack from Data Accepted]
• Initially Req, Ack and Data Ready are 0.
• With Data Ready = 1, Req goes high and the data is transmitted.
• Once this is finished, Ack goes high and control is passed to the sender.

Example: Self-timed FIFO
[Figure: In -> R1 -> R2 -> R3 -> Out; each register has En and Done signals and is controlled by a C element; handshake signals Reqi, Acki, Req0 and Acko]
• Data is transferred on positive and negative transitions of En.
• Done is a delayed En signal.
• Examine the operation of the FIFO by plotting signals.

2-Phase Protocol
[Figure: 2-phase protocol]

Example From [Horowitz]
[Figures: example continued over four slides]

4-Phase Handshake Protocol
• Used to initialize Muller C-elements in a fixed state
• Also known as RTZ (return-to-zero)
• Slower, but unambiguous

4-Phase Handshake Protocol
Implementation using Muller-C elements
• Initially Ack = 0, Req = 0, S = 0, Data ready = 0.
• Once Data ready = 1 -> Req = 1 -> S = 1, and the data is transmitted.
• At the end of transmission, Data ready = 0 -> Req = 0, and S waits for Ack; once the receiver sets Ack = 1 -> S = 0, and the system waits for a new Data ready.

Self-Resetting Logic
[Figure: Precharged Logic Blocks L1, L2 and L3 in series, each with its own completion detection]
A logic block is precharged as soon as the successor block finishes its operation and no longer needs the old data.
Post-charge logic
[Figure: gate with inputs A, B and C, internal node int and output out, precharged from VDD]
A = 1 -> int = 0 -> out = 1 -> precharge. At this stage A must be low to avoid conflict.

Clock-Delayed Domino
[Figure: domino stage with Pulldown Network, input D1 and output Q1 (also D2), clocked by CLK1 and producing CLK2 (to next stage) through a transmission gate tied to VDD and GND]
• No global clock
• The clock from one stage drives the next one
• Transmission gate always switched on
• High-speed operation

Asynchronous-Synchronous Interface
[Figure: asynchronous system (fin) -> Synchronization -> synchronous system (fCLK)]

Synchronizers and Arbiters
• Arbiter: circuit to decide which of 2 events occurred first
• Synchronizer: arbiter with the clock as one of the inputs
• Problem: the circuit HAS to make a decision in limited time; which decision it makes is not important
• Caveat: it is impossible to ensure correct operation
• But we can decrease the error probability at the expense of delay

A Simple Synchronizer
[Figure: input D sampled through a CLK-controlled switch onto node int, held by cross-coupled inverters I1 and I2 with a CLK-controlled feedback switch; output Q]
• Data is sampled on the rising edge of the clock.
• The latch will eventually resolve the signal value, but ... this might take infinite time!
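How quickly the probability of an unresolved synchronizer output falls with waiting time can be estimated with a few lines of code. The sketch below assumes the common single-pole metastability model: the probability that a sample starts in the undefined region is (VIH − VIL)/Vswing × tr/Tsignal, and it decays as exp(−T/τ) during the waiting time T. This formula is not spelled out in the text here and is an assumption of the sketch, as are the function and parameter names. The numeric values are those of the worked example in this section.

```python
import math


def sync_error_rate(t_wait, t_clk, t_signal, t_rise, tau, v_undefined, v_swing):
    """Expected synchronization errors per second after waiting t_wait.

    Assumed single-pole model:
        P_init    = (v_undefined / v_swing) * (t_rise / t_signal)
        N_sync(T) = P_init * exp(-T / tau) / t_clk
    Times are in seconds, voltages in volts.
    """
    p_init = (v_undefined / v_swing) * (t_rise / t_signal)
    return p_init * math.exp(-t_wait / tau) / t_clk


def mean_time_to_failure(error_rate):
    """MTF is the reciprocal of the error rate (in seconds)."""
    return 1.0 / error_rate


params = dict(t_clk=10e-9, t_signal=50e-9, t_rise=1e-9,
              tau=310e-12, v_undefined=1.0, v_swing=5.0)

rate_no_wait = sync_error_rate(t_wait=0.0, **params)
rate_one_period = sync_error_rate(t_wait=10e-9, **params)

print(f"errors/s with no waiting:  {rate_no_wait:.3g}")
print(f"errors/s after one period: {rate_one_period:.3g}")
print(f"MTF after one period:      {mean_time_to_failure(rate_one_period):.3g} s")
```

With these values the error rate falls from about 4 × 10^5 per second with no waiting to roughly 3.9 × 10^-9 per second after one 10 ns period, which is the exponential dependence on waiting time that makes a synchronizer usable in practice.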

Synchronizer: Output Trajectories
Single-pole model for a flip-flop
[Figure: Vout (0.0 to 2.0) versus time (0 to 300 ps); after an initial increase, the trajectories diverge exponentially with time constant τ]
Transient response of the FF around the metastable point VMS

Synchronizer: Output Trajectories
• An error occurs if the signal is still undefined after the waiting period T.
• A signal is undefined if it falls into the interval between VIL and VIH, which happens if the initial value v(0) satisfies
  $V_{MS} - (V_{MS} - V_{IL})\,e^{-T/\tau} \le v(0) \le V_{MS} + (V_{MS} - V_{IL})\,e^{-T/\tau}$
• In other words, an error occurs when the initial value v(0) is close to VMS, within an interval of length $2\,(V_{MS} - V_{IL})\,e^{-T/\tau}$.
• Increasing the waiting time T decreases the probability of error exponentially.
[Figure: Vout (0.0 to 2.0) versus time (0 to 300 ps)]

Mean Time to Failure
• Probability that a changing signal is undefined at the beginning of the sampling time, for a signal duration time Tsignal: [Equation]
• This probability falls exponentially with the time T used to decide the signal value.
• Number of synchronization errors per second: [Equation]
• After time T the number of synchronization errors drops to: [Equation: Nsync(T)]
• T is the synchronization time.

Example
Synchronization time and clock period: $T_\phi$ = 10 nsec = T
Average period of input signal transition: Tsignal = 50 nsec
Rise time: tr = 1 nsec
Time constant: τ = 310 psec
Undefined region: VIH − VIL = 1 V (VDD = 5 V)
Probability of undefined signal: [Equation]
Mean time to fail, MTF(T) = 1 / Nsync(T), with and without synchronizer for this clock frequency:
Number of errors: Nsync(T) = 3.9 × 10^-9 errors/sec
MTF with synchronizer: MTF(T) = 2.6 × 10^8 sec = 8.3 years
MTF without synchronizer: MTF(0) = 2.5 μsec

Influence of Noise
[Figure: initial distribution p(v), uniform around VM, on an axis marking 0, VIL and VIH]
• One would hope that noise may throw the system out of the undefined range; however, low-amplitude noise does not influence synchronization behavior.
• Sometimes it helps, sometimes it hurts.

Typical Synchronizers
[Figure: two synchronizer circuits, one using a 2-phase clocking circuit (φ1, φ2) and one using a delay line from φ1]

Cascaded Synchronizers Increase MTF
[Figure: In -> Sync -> O1 -> Sync -> O2 -> Sync -> Out, all clocked by φ]
• Increased MTF is obtained at the expense of larger latency.
• For instance, cascading synchronizers exponentially reduces the probability of synchronization failure.

Arbiters
[Figure: (a) Schematic symbol: Arbiter with inputs Req1, Req2 and outputs Ack1, Ack2; (b) Implementation: cross-coupled gates with internal nodes A and B; (c) Timing diagram: Req1, Req2, A, B and Ack1 versus t, showing the metastable interval and the VT gap]
An arbiter decides which of two events occurred first, so a synchronizer is a special case of an arbiter that decides whether the signal came before the clock or not.

PLL-Based Synchronization
[Figure: Chip 1 and Chip 2, each a digital system exchanging Data; a Crystal Oscillator supplies the reference clock fcrystal < 200 MHz; on each chip a PLL with Divider and Clock Buffer generates fsystem = N x fcrystal]
The lower-frequency reference clock is increased to the desired frequency range by the PLL.

PLL Block Diagram
[Figure: Reference clock -> Phase detector (Up, Down) -> Charge pump -> Loop filter -> vcont -> VCO -> Local clock; the local clock returns to the phase detector through Divide by N (div); System Clock]
• Based on the up and down signals, the charge pump either increases or decreases the voltage vcont that controls the voltage-controlled oscillator (VCO).
• The VCO delivers a precise system clock, and the frequency divider divides it to match the reference clock.
• The phase detector issues up and down signals based on the phase difference.
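One common way to produce the up and down signals described above is a three-state phase-frequency detector, in which a rising edge on one input sets UP, a rising edge on the other sets DN, and both flags are cleared as soon as both are set. The following is a minimal behavioral sketch, assuming ideal instantaneous reset and treating input A as the reference clock and B as the divided local clock. That assignment and the function names are assumptions made for illustration.

```python
def pfd_step(state, edge):
    """Advance a three-state phase-frequency detector by one rising edge.

    state -- tuple (up, dn) of the two flip-flop outputs
    edge  -- "A" for a rising edge on input A, "B" for input B
    Returns the new (up, dn) tuple.
    """
    up, dn = state
    if edge == "A":
        up = 1
    elif edge == "B":
        dn = 1
    else:
        raise ValueError("edge must be 'A' or 'B'")
    if up and dn:
        # Both flip-flops set: the reset path clears them immediately.
        up, dn = 0, 0
    return up, dn


def run_pfd(edges, state=(0, 0)):
    """Apply a sequence of rising edges and return the list of states."""
    states = []
    for edge in edges:
        state = pfd_step(state, edge)
        states.append(state)
    return states


# A leads B in every cycle: UP is asserted between each A edge and the
# following B edge, and DN is never left asserted.
print(run_pfd("ABABAB"))

# A is faster than B (two A edges per B edge): UP stays asserted most
# of the time, which would raise the control voltage.
print(run_pfd("AABAAB"))
```

The model ignores the finite reset pulse of a real detector, so it only shows which of the two outputs is active, not the exact pulse widths.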

Phase Detector
[Figure: phase detector output before filtering; transfer characteristic]

Phase-Frequency Detector
[Figure: (a) schematic: two resettable D flip-flops (Rst, D, Q) clocked by A and B, producing UP and DN; (b) state transition diagram with the states (UP = 0, DN = 1), (UP = 0, DN = 0) and (UP = 1, DN = 0); an A edge moves toward UP = 1 and a B edge moves toward DN = 1; (c) timing waveforms of A, B, UP and DN]

Phase-Frequency Detector
Response to frequency: B has a lower frequency than A
[Figure: waveforms of A, B, UP and DN]

PFD Phase Transfer Characteristic
[Figure: average (UP − DN), up to VDD, versus phase error (deg) from −2π to 2π]
Note that a phase shift by a whole multiple of the clock period cannot be detected.

Charge Pump
[Figure: UP-controlled switch to VDD and DN-controlled switch to ground, with the common node connected to the VCO control input]
The UP signal increases the output voltage and the DN signal decreases it.

PLL Simulation
[Figure: PLL simulation waveforms]

Example of PLL-generated clock
[Figure: PLL-generated clock]

Clock Generation using DLLs
Delay-Locked Loop (DLL; based on a voltage-controlled delay line, no frequency conversion)
[Figure: fREF -> Phase Det (U, D) -> Charge Pump -> Filter -> VCDL -> fO]
Phase-Locked Loop (VCO-based)
[Figure: fREF -> PD (U, D) -> CP -> Filter -> VCO -> fO, with ÷N in the feedback path]

Delay Locked Loop
[Figure: delay-locked loop]

DLL-Based Clock Distribution
[Figure: GLOBAL CLK distributed to several digital circuits, each through its own VCDL controlled by a local Phase Detector and CP/LF]