ECE260B – CSE241A Winter 2005
Timing Analysis and Correction
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
ECE 260B – CSE 241A Timing Analysis & Correction 1 http://vlsicad.ucsd.edu

Timing Analysis
- Testing
- Simulation
  - Device modeling (BSIM)
  - Transistor-level time domain analysis (SPICE)
  - Frequency domain interconnect analysis (AWE, PRIMA)
- Static timing analysis
  - Transistor-level (PathMill)
  - Gate-level (PrimeTime)

Sequential Machine
[Figure: three combinational logic blocks separated by registers, each register driven by clk]
- State is stored in registers (flip-flops or latches)
- Combinational logic computes next-state and outputs from present-state and inputs
Courtesy K. Keutzer et al. UCB

Why Clocks?
- Clocks provide the means to synchronize
  - By allowing events to happen at known timing boundaries, we can sequence these events
- Greatly simplifies building of state machines
  - No need to worry about variable delay through combinational logic (CL)
  - All signals delayed until clock edge (clock imposes the worst case delay)
[Figure: FSM (comb logic with a feedback register) and dataflow (register, comb logic, register)]
Courtesy K. Yang, UCLA

Clock Cycle Time
- Cycle time is determined by the delay through the CL
  - Signal must arrive before the latching edge
  - If too late, it waits until the next cycle
    - Synchronization and sequential order become incorrect
- Constraint: Tcycle > Tprop_delay_through_CL + Toverhead
- Example: 3.0 GHz Pentium-4, Tcycle = 333 ps
- Can change circuit architecture to obtain smaller Tcycle
Courtesy K. Yang, UCLA

Pipelining
- For dataflow:
  - Instead of a long critical path, split the critical path into chunks
  - Insert registers to store intermediate results
  - This allows 2 waves of data to coexist within the CL
- Can we extend this ad infinitum?
  - Overhead eventually limits the pipelining
    - E.g., 1.5 to 2 gate delays for latch or FF
  - Granularity limits as well
    - Minimum time quantum: delay of a gate
- Unpipelined (register, CL A+B with delay tpd, register): Tcycle > tpd + Toverhead
- Pipelined (register, CL A with delay tpd1, register, CL B with delay tpd2, register): Tcycle > max(tpd1, tpd2) + Toverhead
Courtesy K. Yang, UCLA

Intel MPU FO4 INV Delays Per Clock Period
[Chart: number of FO4 inverter delays per clock period (0 to 120) versus year (1982 to 2004) for the 386, 486, DX2, DX4, Pentium, Pentium MMX, Pentium Pro, Pentium II, Celeron, Pentium III, and Pentium 4]
- FO4 INV = inverter driving 4 identical inverters (no interconnect)
- Half of frequency improvement has been from reduced logic stages, i.e., pipelining

Let’s Revisit Cycle Time and Path Delay
- Cycle time (T) cannot be smaller than longest path delay (Tmax)
- Longest (critical) path delay is a function of:
  - Total gate, wire delays
  - Logic levels
[Figure: flip-flops Q1 and Q2, clocked by Tclock1 and Tclock2, joined by a critical path of ~5 logic levels; the data/clock timing diagram marks Tmax and the cycle time T]
Courtesy K. Keutzer et al. UCB

Cycle Time - Setup Time
- For FFs to correctly capture data, data must be stable for:
  - Setup time (Tsetup) before clock arrives
[Figure: the same Q1-to-Q2 critical path (~5 logic levels); the timing diagram marks Tmax, Tsetup, and T]
Courtesy K. Keutzer et al.
UCB

Cycle Time – Clock Skew
- If the clock network has unbalanced delay, the result is clock skew
- Cycle time is also a function of clock skew (Tskew)
[Figure: the Q1-to-Q2 critical path (~5 logic levels) with clock skew between Tclock1 and Tclock2; the timing diagram marks Tmax, Tsetup, Tskew, and T]
Courtesy K. Keutzer et al. UCB

Cycle Time – Flip-Flop Delay (Clock to Q)
- Cycle time is also a function of propagation delay of FF (Tclk-to-Q or Tc2q)
- Tc2q: time from arrival of clock signal until change at FF output
[Figure: the Q1-to-Q2 critical path (~5 logic levels); the timing diagram marks clock-to-Q, Tmax, Tsetup, Tskew, and T]
Courtesy K. Keutzer et al. UCB

Min Path Delay - Hold Time
- For FFs to correctly latch data, data must be stable during:
  - Hold time (Thold) after clock arrives
- Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew)
[Figure: Q1-to-Q2 short path (~3 logic levels); the timing diagram marks Tmin, Thold, and Tskew]
Courtesy K. Keutzer et al. UCB

Setup, Hold, Cycle Times
[Figure: example of a single phase clock, marking the cycle time, the set-up time (D stable before clock), the hold time (D stable after clock), and the interval when the signal may change]
Courtesy K. Keutzer et al. UCB

Timing Constraints for Edge-Triggered FFs
[Figure: flip-flop feeding comb logic, with Tcycle marked on the clock]
- Max(Tpd) < Tcycle - Tsetup - Tc2q - Tskew
  - Otherwise the delay is too long for data to be captured
- Min(Tpd) > Thold - Tc2q + Tskew
  - Otherwise the delay is too short and data can race through, skipping a state
Courtesy K. Yang, UCLA

Example of Tpdmax Violation
- Suppose there is skew between the registers in a dataflow (regA after regB)
- “i” gets its input values from regA at transition in Ck’
- CL output “o” arrives after Ck transition due to skew
- To correct this problem, can increase cycle time
[Figure: regA drives i into the comb logic, whose output o feeds regB; clocks Ck and Ck’ are offset by Tskew; o changes Tpdmax after i and is too late]
Courtesy K. Yang, UCLA

Example of Tpdmin Violation: Race Through
- Suppose clock skew causes regA to be clocked before regB
- “i” passes through the CL with little delay (tpdmin)
- “o” arrives before the rising Ck’ causes the data to be latched
- Cannot be fixed by changing frequency: you have a rock instead of a chip
[Figure: regA drives i into the comb logic, whose output o feeds regB; clocks Ck and Ck’ are offset by Tskew; o changes Tpdmin after i and is too early]
Courtesy K. Yang, UCLA

Summary: Timing Constraints
- Synchronous design = combinational logic + sequential elements
- For each flip-flop:
  - Tmax + Tsetup < Tcycle - Tskew
  - Tmin > Thold + Tskew
- Tmax: longest data propagation path delay
- Tmin: shortest data propagation path delay
[Figure: two FFs (D, Q, CLK) joined by combinational logic; CLK and DATA waveforms mark Tcycle, Tsetup, and Thold]

Clock Identification
- Partition the design
  - Clock network
  - Data paths with timing constraints
- Clock definition
  - Derived clock
- Clock groups
  - Timing constraints exist between clocks with a common divisor frequency
- Clock delay (skew) calculation
[Figure: two FFs joined by combinational logic, with clocks CLK1 to CLK4 and a /8 divider]

Timing Graph
- Data paths with timing constraints
  - Starting from primary inputs/FF outputs
  - Ending at primary outputs/FF inputs
- Represented by a labeled directed graph G = <V,E>
  - Timing node V ~ pin/primary input/output
  - Timing edge E ~ gate/wire delay (Timing arc ~ gate delay)
[Figure: example circuit with signals A, B, C, U, V, X, Y, Z, F and the corresponding timing graph with node and edge delay labels]
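The per-flip-flop constraints from the summary slide (Tmax + Tsetup < Tcycle - Tskew and Tmin > Thold + Tskew) can be turned into a small numeric check. The following is only a sketch: the function name `timing_margins`, its parameter names, and the picosecond values in the usage note are illustrative assumptions, not part of the course material.

```python
def timing_margins(t_max, t_min, t_cycle, t_setup, t_hold, t_skew):
    """Return (setup_margin, hold_margin) for one flip-flop.

    Setup constraint: t_max + t_setup < t_cycle - t_skew
    Hold constraint:  t_min > t_hold + t_skew
    A constraint is met when its margin is strictly positive.
    All arguments must use the same time unit.
    """
    setup_margin = (t_cycle - t_skew) - (t_max + t_setup)
    hold_margin = t_min - (t_hold + t_skew)
    return setup_margin, hold_margin


def violations(setup_margin, hold_margin):
    """List which constraints fail (margin not strictly positive)."""
    failed = []
    if setup_margin <= 0:
        failed.append("setup")
    if hold_margin <= 0:
        failed.append("hold")
    return failed
```

With made-up values in picoseconds, `timing_margins(800, 120, 1000, 50, 30, 100)` gives a setup margin of 50 and a hold margin of -10. The setup check passes, while the short path races through. Lengthening the cycle would not help, because the hold margin does not depend on Tcycle.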
Courtesy K. Keutzer et al. UCB

Characterization
- Static analysis = vector-less worst case analysis
  - Graph based path propagation
  - No logic values considered
- Pre-characterized look-up tables for gate delays
  - Min/max/rise/fall
- Characterized interconnect delays
  - On-the-fly delay calculation
  - SDF (standard delay format) annotation
[Figure: gate with inputs X, Y and output Z, and its timing arcs X to Z and Y to Z labeled 2]

Compute Longest Path (Kirkpatrick 1966, IBM JRD)
[Figure: the example timing graph (nodes A, B, C, U, V, X, Y, Z, F with delay labels) extended with a super-source Origin]
Compute longest path in a DAG G = <V,E,delay,Origin>
// delay is set of labels, Origin is the super-source of the DAG
Forward-prop(W) {
  for each vertex v in W
    for each edge <v,w> from v
      Final-delay(w) = max(Final-delay(w), Final-delay(v) + delay(<v,w>) + delay(w))
      if all incoming edges of w have been traversed, add w to W
}
Longest-path(G) {
  Forward-prop(Origin)
}
- Dynamic programming
- How to exclude a set of paths?
Courtesy K. Keutzer et al.
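The forward propagation above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function name `longest_path`, the dictionary-based graph encoding, and the small example graph in the usage note are assumptions made here, and the sketch assumes every node is reachable from the super-source Origin.

```python
from collections import defaultdict, deque


def longest_path(node_delay, edge_delay, origin):
    """Latest arrival (final delay) at every node of a DAG.

    node_delay: {node: delay label of the node}
    edge_delay: {(v, w): delay label of edge <v, w>}
    origin:     super-source; assumed to reach every node
    """
    successors = defaultdict(list)
    pending = defaultdict(int)  # incoming edges not yet traversed
    for (v, w), d in edge_delay.items():
        successors[v].append((w, d))
        pending[w] += 1

    final = {v: float("-inf") for v in node_delay}
    final[origin] = node_delay[origin]

    ready = deque([origin])  # nodes whose final delay is settled
    while ready:
        v = ready.popleft()
        for w, d in successors[v]:
            # Accumulate the settled delay of v, then edge and node delay.
            final[w] = max(final[w], final[v] + d + node_delay[w])
            pending[w] -= 1
            if pending[w] == 0:  # all incoming edges traversed
                ready.append(w)
    return final
```

For a made-up graph with node delays `{"Origin": 0, "A": 1, "B": 2, "X": 2, "Y": 2, "F": 0}` and edge delays `{("Origin", "A"): 0, ("Origin", "B"): 0, ("A", "X"): 3, ("B", "X"): 1, ("B", "Y"): 2, ("X", "F"): 2, ("Y", "F"): 1}`, the final delay at F is 8, reached through A and X. Each edge is visited once, so the run time is linear in the size of the graph.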
UCB

Timing Analysis Terminology
- Actual arrival time (AAT): forward propagation
- Required arrival time (RAT): backward propagation
- Slack = RAT - AAT
  - A measure of how much timing margin exists at each node
  - Slack < 0: timing violation
  - Can optimize a particular branch
  - Can trade slack for power, area, robustness
[Figure: clocked circuit with the critical path highlighted]

Static Timing Analysis Flow
- Read in
  - Design (LEF/DEF)
  - Timing library (.lib)
  - Timing constraints (GCF)
  - Delay annotation (SDF)
- Set up constraints
  - Annotated delays
  - IO path constraints
  - Single cycle setup/hold checks
  - Timing exceptions
    - False paths
    - Multi-cycle paths
    - Max delay constraints
    - Min delay constraints
- Construct timing graph
  - Partition clock domain (form path groups)
  - Ideal/propagated clock
  - Case analysis
  - Levelization
- AAT propagation
- Timing report
  - End points with violations
  - Path enumeration

Timing Exceptions
- False paths: topologically connected but logically impossible to enable
- To enable a path
  - Logically: non-controlling values (e.g., 0 for OR gates, 1 for AND gates) at side inputs
  - Temporally: earlier signal transitions at side inputs
[Figure: clocked circuit example]

False Path Representation
- Abstracted graph
- set_false_path -from {…} -through {…} … -through {…} -to {…}
[Figure: abstracted graphs built from "from", "through", and "to" nodes]

False Path Identification
- Tagged timing analysis
  - Arrival times with the same tag are compared to find worst case
  - False path filtered
[Figure: clocked circuit with nodes a, b, c, d annotated with arrival times and tags (arr: 1, tag: 0; arr: 2, tag: 2; arr: 3, tag: 3), next to the abstracted graph from a, through b, through c, to d with tags 2 and 3]

Handling Latch-Based Designs
- Latch: level-enabled sequential element
- Transparent signal propagation
- Time borrowing
  - Path delay of previous stage - Tborrow
  - Path delay of current stage + Tborrow
[Figure: latch (D, Q, CLK) between two combinational logic blocks; CLK and DATA waveforms mark the transparent phase and Tborrow]

Counting Process Variation
- Off-chip variation: two paths on a chip cannot use two different operating conditions (i.e., corners) at the same time for setup or hold analysis
  - Setup: Launchclock_latepath (max) + data_latepath (max) < captureclock_earlypath (max) + clock_period - setup
  - Hold: Launchclock_earlypath (min) + data_earlypath (min) > captureclock_latepath (min) + hold
- On-chip variation: the software calculates the delay for one path based on maximum operating condition while calculating the delay for another path based on minimum operating condition for setup or hold checks
- Statistical static timing analysis (SSTA)
  - Continuous pdf (probability density functions)
  - Or discrete corners

Clock Re-convergence Pessimism Removal
- Common part of two clock propagation paths cannot have two different path delays at the same time
- Need to compute clock propagation delay from the branch point
[Figure: launch FF and capture FF joined by combinational logic; the launch clock path and data path use max delays, the capture clock path uses min delay, and the common part of the clock network is marked]

Outline
- Timing Analysis
  - Timing Requirements
  - Static Timing Analysis
- Timing Correction

Timing Correction
- Driven by STA
  - “Incremental performance analysis backplane”
- Two goals
  - Fix logic design rule violations
  - Fix timing problems
DAC-2002, Physical Chip Implementation

Logic Design Rules
- Constraints of
  - Fanout
  - Slew rate
  - Load cap
- Reduce timing look-up table extrapolation error
- Control signal integrity
  - Transition degradation
  - Crosstalk noise
  - Supply voltage drop
  - Device reliability
- Approaches
  - Resizing
  - Buffering
  - Cloning (copying cells)

Timing Correction Approaches
- Re-synthesis
  - Local synthesis transforms
- Timing-driven placement
  - Critical net weighting
- Timing-driven routing
  - Net ordering
  - Buffering
  - Topology optimization
- Post-route optimization (IPO)
  - Re-routing
  - Re-timing and useful clock skew
  - Sizing
  - Buffering
DAC-2002, Physical Chip Implementation

Local Synthesis Transforms
- Resize cells
- Move critical signals forward
- Buffer or clone to reduce load on critical nets
- Decompose large cells
- Swap connections on commutative pins or among equivalent nets
- Pad early paths
- Area recovery
DAC-2002, Physical Chip Implementation

Transform Example
[Figure: double inverter removal reduces the path delay from 4 to 2]
DAC-2002, Physical Chip Implementation

Resizing
[Figure: choosing among cell sizes A, B, and C for a gate with inputs a, b and output d driving loads e (0.2) and f (0.3); plot of delay d (0 to 0.05) versus load (0 to 1) for each size, with example values 0.035 and 0.026]
DAC-2002, Physical Chip Implementation

Cloning
[Figure: gate with inputs a, b whose output d drives loads e, f, g, h (0.2 each); cloning the gate splits the loads between two copies; plot of delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, C]
DAC-2002, Physical Chip Implementation

Buffering
[Figure: gate with inputs a, b whose output d drives loads e, f, g, h (0.2 each); a buffer B with input load 0.1 is inserted to isolate part of the load; plot of delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, C]
DAC-2002, Physical Chip Implementation

Redesign Fan-in Tree
[Figure: inputs with Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0 feed a tree of unit-delay gates producing e; the original tree gives Arr(e)=6, the redesigned tree gives Arr(e)=5]
DAC-2002, Physical Chip Implementation

Redesign Fan-out Tree
[Figure: two fan-out tree alternatives with per-stage delays; Longest Path = 4 in one and Longest Path = 5 in the other; slowdown of buffer due to load]
DAC-2002, Physical Chip Implementation

Decomposition
[Figure: decomposition of a large cell]
DAC-2002, Physical Chip Implementation

Swap Commutative Pins
- Simple sorting on arrival times and delay works
[Figure: gate with inputs a, b, c annotated with arrival times and pin delays, before and after swapping the pin connections]
DAC-2002, Physical Chip Implementation

Logic Restructuring 1
- Nodes in critical section that fan out outside of critical section are duplicated
[Figure: network with signals a, b, c, d, e and nodes f, h; c and d are late input signals; the critical section is replaced by a collapsed node]
Slides courtesy of Keutzer

Logic Restructuring 2
- Place timing-critical nodes closer to output
  - Make them pass through fewer gates
- After collapse, a divisor is selected such that substituting k into f places critical signal c and d closer to output
[Figure: collapse critical section into a collapsed node f, then re-extract factor k (the divisor) so that c and d are close to output]
Slides courtesy of Keutzer

Summary of Local Synthesis Transforms
- Variety of methods for delay optimization
  - No single technique dominates
- The one with more tricks wins? No!
  - Technology dependent (for gate delay)
    - Differ with cell libraries
  - Methodology dependent (for wire delay)
    - Need to predict placement and routing result
- Uncertainty!
  - Pros: large potential improvement
  - Cons: less predictable, more expensive

Summary of Local Synthesis Transforms
- Work smoothly in a physical synthesis flow
  - Tight integration with placement and routing
- Need a good framework for evaluating and processing different transforms
  - Accurate, fast timing engine with incremental analysis capability
    - Don’t want to re-analyze the timing of the whole design for each local transform
  - Simultaneous min and max delay analysis
    - How does fixing the setup violation affect the existing hold checks?

Timing Correction Approaches
- Re-Synthesis
  - Local Transformation
- Timing-Driven Placement
- Timing-Driven Routing
- Post-Route Optimization (IPO)
  - Re-Routing
  - Re-Timing and Useful Clock Skew
  - Sizing
  - Buffering

Reducing Crosstalk Effect
- Shielding
  - Effective for short range capacitive coupling
  - Not for long range inductive coupling
- Net ordering (wire swizzling)
- Gate sizing
  - A strong driver is less sensitive to crosstalk
  - But more likely to project crosstalk to its neighbors
- Buffering
  - Partition interconnects
  - Mutual canceling

Re-Timing
- How would you meet the 10 ns clock cycle time?
[Figure: three FFs (D, Q) on a common clock with combinational delays 6, 4, 2, 4, 4 around them; Cycle = 10]
- Re-order sequential elements and combinational logic
  - Did you see a problem here?
  - Need to predict placement and routing
[Figure: the same circuit before and after moving a FF across combinational logic; Cycle = 10]

Useful Clock Skew
- Equivalent to re-timing
- Clock tree re-construction
  - Insert delay cells
  - Snaking
  - Add dummy capacitive load
[Figure: the re-timing example circuit with +2 of delay added on one FF clock input; Cycle = 10]

Driving Large Capacitances: Inverter As Buffer
[Figure: min-size inverter (size 1, input capacitance Cin) driving a buffer of size U, which drives the load CL]
- Total propagation delay = tp(inv) + tp(buffer)
- Minimize tp = U * tp0 + (X/U) * tp0
  - tp0 = delay of min-size inverter with single min-size inverter as fanout load
  - CL = X * Cin
- Uopt = sqrt(X); tp,opt = 2 * tp0 * sqrt(X)
- Use only if combined delay is less than unbuffered case
Slide courtesy of Mary Jane Irwin, PSU

Delay Reduction With Cascaded Buffers
[Figure: chain of N inverters sized 1, u, u^2, ..., u^(N-1) from in (Cin) to out (CL), with CL = x * Cin = u^N * Cin]
- Cascade of buffers with increasing sizes (u = tapering factor) can reduce delay
- If load is driven by a large transistor (which is driven by a smaller transistor) then its turn-on time dominates overall delay
- Each buffer charges the input capacitance of the next buffer in the chain and speeds up charging, reducing total delay
- Cascaded buffers are useful when Rint < Rtr
Slide courtesy of Mary Jane Irwin, PSU

tp as Function of U and X
[Plot: u/ln(u) (0 to 60) versus u (1.0 to 7.0), with curves for x = 10, 100, 1000, and 10,000]
- Total line delay as function of driver size, load capacitance
- Question: Derive the optimum (min-delay) value of U.
Slide courtesy of Mary Jane Irwin, PSU

Reducing RC Delay With Repeaters
- RC delay is quadratic in length, so length must be reduced
- Observation: 2^2 = 4 and 1 + 1 = 2, but 1^2 + 1^2 = 2
[Figure: driver and receiver joined by a line of L = 2 units, without and with a repeater]
- Repeater = strong driver (usually inverter or pair of inverters for non-inversion) that is placed along a long RC line to “break up” the line and reduce delay

Repeaters vs. Cascaded Buffers
- Repeaters are used to drive long RC lines
  - Breaking up the quadratic dependence of delay on line length is the goal
  - Typically sized identically
- Cascaded buffers are used to drive large capacitive loads, where there is no parasitic resistance
  - We put all buffers at the beginning of the load
  - This would be pointless for a long RC wire since the wire RC delay would be unaffected and would dominate the total delay
- Optimum buffering for a uniform long interconnect
  - Cascaded buffers at source and sink
  - Identically sized and spaced repeaters in between

Buffering a Tree for Timing Optimization
- Van Ginneken’s dynamic programming
  - Bottom-up traversal
  - Evaluate each sub-tree by a triple <delay, cap, cost>
  - Filter out sub-optimal solutions
- Limitations
  - Buffer insertion locations (explored by edge segmenting)
  - Buffer insertion constraints (e.g., legal buffer locations)
  - Routing detour
  - Delay calculation accuracy (wire delay, slew rate, etc.)
[Figure: routing tree with <delay, cap, cost> triples attached to sub-trees]

Buffering a Tree for Load Cap Constraints
- Greedy for a single line
  - Bottom-up traversal
  - Insert a buffer when load cap reaches limit
- Greedy for a fixed routing tree
  - Bottom-up traversal
  - For each edge, greedy insertion
  - For each node, buffer the branch with the largest cap
- NP-hard for simultaneous buffering and routing construction
  - Example: C1 < U, C2 < U, C3 < U, C4 < U, but C1 + C2 + C3 + C4 > U
[Figure: node with four branches of capacitance C1, C2, C3, C4]

Timing-Driven Routing Tree Construction
[Figure: routing tree with labels S and T]
- Minimum wirelength (Steiner Minimum Tree)
  - Given a set of terminals S
  - Find an additional set of points A such that a spanning tree T over S ∪ A has minimum wirelength
- May not be timing optimum
  - Some sinks are more timing critical than others
  - Some sinks have larger capacitive load
  - Buffers?
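The greedy rule for a single line under a load cap constraint (bottom-up traversal, insert a buffer when the load cap reaches the limit) can be sketched as below. The segment-based wire model, the function name `greedy_buffer_line`, and all parameter names are assumptions made for illustration. The sketch also assumes that one segment plus a buffer input never exceeds the limit on its own.

```python
def greedy_buffer_line(segment_caps, sink_cap, buffer_in_cap, cap_limit):
    """Greedy buffer insertion on a single line under a load cap limit.

    segment_caps:  wire capacitance of each segment, listed from the
                   sink end toward the driver (bottom-up order)
    sink_cap:      input capacitance of the sink
    buffer_in_cap: input capacitance of one buffer
    cap_limit:     maximum load any driver or buffer may see

    Returns (buffer_positions, driver_load). A position i means a
    buffer sits at the sink-side end of segment i.
    """
    load = sink_cap
    positions = []
    for i, seg_cap in enumerate(segment_caps):
        if load + seg_cap > cap_limit:
            # Adding this segment would exceed the limit, so isolate the
            # downstream load behind a buffer and start accumulating again.
            positions.append(i)
            load = buffer_in_cap
        load += seg_cap
    return positions, load
```

For example, `greedy_buffer_line([3, 3, 3, 3], 2, 1, 8)` returns `([2], 7)`: the first two segments plus the sink add up to exactly 8, so a single buffer is inserted before the third segment and the driver ends up seeing a load of 7.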

Timing-Driven Routing Tree Construction
- Minimum wirelength (Steiner Minimum Tree)
- Shortest Path Tree
- AHHK Tree
  - Cost(q) = k * path_length(p) + edge_length(p, q)
  - k = 0: minimum wirelength
  - k = 1: shortest path
- Heuristics with sink timing criticality weights
[Figure: routing trees with labels S and T]

Timing-Driven Routing Tree Construction
- Simultaneous routing tree construction and buffer insertion
  - Dynamic programming
  - Buffer station (legal buffer locations)
  - Routing blockage
  - P-Tree
- Clustering (C-Tree)
  - Timing criticality
  - Geometric distance
  - Signal polarity
  - Try AHHK with different k

Timing-Driven Routing Tree Topology Optimization
- Chicken-egg dilemma (delay vs. routing)
- Iterative greedy improvement (Q-Tree)
  - Delta Elmore delay
[Figure: routing tree with labels S and T and a buffer location]
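The AHHK cost rule from the routing tree construction slide, Cost(q) = k * path_length(p) + edge_length(p, q), can be sketched as a Prim-style tree growth. This is a simplified illustration under stated assumptions: terminals are points in the plane, edge length is Manhattan distance, no Steiner points are added, and the names `ahhk_tree`, `manhattan`, and `wirelength` are made up for this sketch.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def ahhk_tree(points, source, k):
    """Grow a spanning tree from source using the AHHK cost.

    At each step, attach the outside point q to the tree point p that
    minimizes k * path_length(p) + manhattan(p, q).
    Returns (parent, path_length) dictionaries keyed by point.
    """
    parent = {source: None}
    path_length = {source: 0}
    while len(parent) < len(points):
        best = None
        for p in parent:                 # points already in the tree
            for q in points:
                if q in parent:
                    continue
                cost = k * path_length[p] + manhattan(p, q)
                if best is None or cost < best[0]:
                    best = (cost, p, q)
        _, p, q = best
        parent[q] = p
        path_length[q] = path_length[p] + manhattan(p, q)
    return parent, path_length


def wirelength(parent):
    """Total length of all tree edges."""
    return sum(manhattan(q, p) for q, p in parent.items() if p is not None)
```

For the terminals `[(0, 0), (2, 0), (2, 3), (0, 4)]` with source `(0, 0)`, k = 0 yields a tree of wirelength 8 in which `(0, 4)` sits at path length 8 from the source, even though its direct distance is 4. With k = 1 every terminal is reached at its direct Manhattan distance, at the price of more wire. Intermediate values of k trade the two objectives off, which is why the slides suggest trying AHHK with different k.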