EE 201A (Starting 2005, called EE 201B) Modeling and Optimization for VLSI Layout Instructor: Lei He Email: LHE@ee.ucla.edu Chapter 7 Interconnect Delay 7.1 Elmore Delay 7.2 High-order model and moment matching 7.3 Stage delay calculation Basic Circuit Analysis Techniques Network structures & state • Output response Input waveform & zero-states Natural response vN(t) (zero-input response) Forced response vF(t) (zero-state response) For linear circuits: v(t ) vN (t ) vF (t ) • Basic waveforms – Step input – Pulse input – Impulse Input • Use simple input waveforms to understand the impact of network design Basic Input Waveforms 1/ T 1 0 -T/2 T/2 unit step function u(t)= 0 t 0 1 t 0 PT (t ) 1 T T T u ( t ) u ( t ) 2 2 By definition u (t ) or δ(t) unit impulse function pulse function of width T t ( x ) dx du (t ) dt (t ) : PT (t ) when T 0 (t ) 0 for t 0 singular for t 0 s.t. for any 0 (t )dt 1 Step Response vs. Impulse Response • Definitions: – (unit) step input u(t) – (unit) impulse input (t) (Input Waveform) • Relationship (t ) (unit) step response g(t) (unit) impulse response h(t) (Output Waveform) du (t ) dt t u (t ) ( x)dx • Elmore delay 0 0 TD g ' (t )t dt h(t )t dt dg (t ) dt h(t ) t g (t ) h( x)dx 0 Analysis of Simple RC Circuit R i (t ) v (t ) vT (t ) d (Cv(t )) dv (t ) i (t ) C dt dt dv (t ) RC v (t ) vT (t ) dt state variable Input waveform i(t) R vT(t) ± C v(t) first-order linear differential equation with constant coefficients Analysis of Simple RC Circuit zero-input response: (natural response) dv(t ) v(t ) 0 dt 1 dv(t) 1 t v N (t) Ke RC v(t) dt RC RC dv(t ) RC v(t ) v0u (t ) step-input response: dt t vF (t ) v0u(t ) v(t ) Ke RC v0u(t ) match initial state: output response for step-input: v(0) 0 K v0u (t ) 0 v(t ) v0 (1 e t v0 RC )u (t ) v0u(t) v0(1-eRC/T)u(t) Delays of Simple RC Circuit • v(t) = v0(1 - e-t/RC) under step input v0u(t) • v(t)=0.9v0 t = 2.3RC v(t)=0.5v0 t = 0.7RC • Commonly used metric TD = RC (Elmore delay to be defined later) Lumped Capacitance Delay Model • R = driver resistance • C = total interconnect capacitance + loading capacitance • Sink Delay: td = R·C N3 Rd N2 driver N • 50% delay under step input = 0.7RC • Valid when driver resistance >> interconnect resistance • All sinks have equal delay Cload Lumped RC Delay Model t D R d Cload R d Cint Cg R d C0 L Cg • Minimize delay minimize wire length N3 Rd N2 driver N Cload Delay of Distributed RC Lines Vout(t) Laplace Transform Vout(s) VIN R VOUT VIN R VOUT C C VOUT 1 s cosh sRC e x e x cosh( x) 2 x2 x4 1 ...... 2! 4! Vout ( s) 1.0 0.5 DISTRIBUTED LUMPED 1.0RC 2.0RC time Step response of distributed and lumped RC networks. A potential step is applied at VIN, and the resulting VOUT is plotted. The time delays between commonly used reference points in the output potential is also tabulated. Delay of Distributed RC Lines (cont’d) Output potential range Time elapsed (Distributed RC Network) Time elapsed (Lumped RC Network) 0 to 90% 1.0 RC 2.3 RC 10% to 90% (rise time) 0.9 RC 2.2 RC 0 to 63% 0.5 RC 1.0 RC 0 to 50% (delay) 0.4 RC 0.7 RC 0 to 10% 0.1 RC 0.1 RC Distributed Interconnect Models • Distributed RC circuit model – L,T or P circuits • Distributed RCL circuit model • Tree of transmission lines Distributed RC Circuit Models Distributed RLC Circuit Model (without mutual inductance) Z0 Z0 Delays of Complex Circuits under Unit Step Input • Circuits with monotonic response 1 0.5 T50% v(t) t TR • Easy to define delay & rise/fall time • Commonly used definitions – Delay T50% = time to reach half-value, v(T50%) = 0.5Vdd – Rise/fall time TR = 1/v’(T50%) where v’(t): rate of change of v(t) w.r.t. t – Or rise time = time from 10% to 90% of final value • Problem: lack of general analytical formula for T50% & TR! Delays of Complex Circuits under Unit Step Input (cont’d) • Circuits with non-monotonic response t • Much more difficult to define delay & rise/fall time Elmore Delay for Monotonic Responses v(t) 1 • Assumptions: – Unit step input – Monotone output response 0.5 v’(t) t T50% • Basic idea: use of mean of v’(t) to approximate median of v’(t) v(t ) : output response (monotone) v' (t ) : rate of change of v(t ) median of v’(t) (T50%) TD tv' (t )dt 0 mean of v'(t) t Elmore Delay for Monotonic Responses • T50%: median of v’(t), since T5 0% 0 v' (t )dt T5 0% v' (t )dt half of final value of v(t ) (by def.) • Elmore delay TD = mean of v’(t) TD 0 v' (t )tdt Why Elmore Delay? • Elmore delay is easier to compute analytically in most cases – Elmore’s insight [Elmore, J. App. Phy 1948] – Verified later on by many other researchers, e.g. • Elmore delay for RC trees [Penfield-Rubinstein, DAC’81] • Elmore delay for RC networks with ramp input [Chan, TCAS’86] • ..... • For RC trees: [Krauter-Tatuianu-Willis-Pileggi, DAC’95] T50% TD • Note: Elmore delay is not 50% value delay in general! Elmore Delay for RC Trees h(t) = impulse response • Definition – h(t) = impulse response – TD = mean of h(t) = H(t) = step response h(t) t dt 0 • Interpretation – H(t) = output response (step process) median TD tv' (t )dt 0 of v’(t) mean of v'(t) – h(t) = rate of change of H(t) (T50%) – T50%= median of h(t) – Elmore delay approximates the median of h(t) by the mean of h(t) Elmore Delay of a RC Tree [Rubinstein-Penfield-Horowitz, T-CAD’83] Lemma: when a step input is applied to a RC tree vi (t ) is monotonic in t for every node i in tree Proof: v'i (t ) 0 at every node i (v' i (t ) hi (t )) impulse response hi (t ) 0 at every node i Apply impulse func. at t=0: Let hmin (t ) be the min. voltage of any node at t hmin (0) 0 Assume that hmin (t0 ) 0 Then, t1 t0 imin s.t. h'min (t1 ) 0 Let node imin achieve hmin (t1 ) at t1 Then, the current from any node i to imin is 0 at t1 Since hi (t1 ) hmin (t1 ) & i connects imin via resistors current iimin i Since all currents i imin charge the capacitor at imin h'min (t1 ) 0 contradict ion! Elmore Delay in a RC Tree (cont’d) Pi : path from input to node i ; si :subtree rooted at i R jk : resistance of common path Si path resistance Rii Pj Pk from input to j & k TDi Rki Ck Rjk k k dvi (t ) dt 1 vi (t ) The voltage drop on Pi Rk (current to all cap' s in Si ) The current to cap. of node i Ci kPi (current t o cap k ) (common path res. between Pi and Pk ) k Ck k dvk (t ) dv (t ) Rki Rki Ck k dt dt k 0 0 TDi v'i (t )t dt vi (t ) t |0 vi (t )dt T lim [vi (T ) T vi (t )dt ] lim (vi (T ) 1) T (1 vi (t )) dt T 0 i input Theorem : Elmore delay to node i Proof : j T 0 Elmore Delay in a RC Tree (cont’d) • • (1 vi (T )) T 0 We shall show later on that Tlim i.e. 1-vi(T) goes to 0 at a much faster rate than 1/T when T Let t f i (t ) [1 vi ( x)]dx 0 dv ( x) f i (t ) Rki Ck k dx 0 dx k t vi(t) area t f i (t ) [1 vi ( x)]dx 0 1 Rki Ck v k (t ) k Rki Ck Rki Ck [1 vk (t )] k f i () Rki Ck k (1) 0 k TDi lim (1 vi (T ))T [1 vi (t )]dt T 0 f i () Rki Ck k t Some Definitions For Signal Bound Computation Let Tp Rkk Ck k TRi ( Rki2 Ck ) Rii Recall TDi Rki Ck k k Then, TRi TDi Tp (since Rkk Rki & Rii Rki ) Signal Bounds in RC Trees • Theorem Lower bounds 0 vi (t ) 1 1 t0 TDi t 0 (non - trivial when t TDi TRi ) t TRi TDi Tp (T p TRi ) e Tp e t Tp Upper bounds TDi t 1 Tp t T p TRi t0 v i (t ) 1 TRi Tp (TDi TRi ) e TRi e t TRi t TDi TRi Delay Bounds in RC Trees Lower bounds : t TDi Tp 1 vi (t ) t TDi TRi TRi ln TRi Tp 1 vi (t ) when vi (t ) 1 TRi when vi (t ) 1 TDi Tp Upper bounds : TDi t TRi 1 vi (t ) TDi t Tp TRi Tp ln Tp 1 vi (t ) Tp Computation of Elmore Delay & Delay Bounds in RC Trees • Let C(Tk) be total capacitance of subtree rooted at k • Elmore delay TDi R k pi k C (Tk ) upper bound : Tp Rk C (Tk ) k lower bound : TRi Rki2 k Ck Rii * all three formula can be computed in linear tim e recursivel y in a bottom - up fashion Comments on Elmore Delay Model • Advantages – Simple closed-form expression • Useful for interconnect optimization – Upper bound of 50% delay [Gupta et al., DAC’95, TCAD’97] • Actual delay asymptotically approaches Elmore delay as input signal rise time increases – High fidelity [Boese et al., ICCD’93],[Cong-He, TODAES’96] • Good solutions under Elmore delay are good solutions under actual (SPICE) delay Comments on Elmore Delay Model • Disadvantages – Low accuracy, especially poor for slope computation – Inherently cannot handle inductance effect • Elmore delay is first moment of impulse response • Need higher order moments Chapter 7.2 Higher-order Delay Model Time Moments of Impulse Response h(t) • Definition of moments L h(t ) H ( s) 0 0 H ( s) h(t )e st dt 1 i h(t ) ( st ) dt i 0 i! 1 i ( s) h(t ) t i dt 0 i 0 i! 1 i i-th moment mi (1) h(t ) t i dt 0 i! • Note that m1 = Elmore delay when h(t) is monotone voltage response of impulse input Pade Approximation • H(s) can be modeled by Pade approximation of type (p/q): Hˆ p ,q ( s ) – • where q < p << N bp s p b1s b0 aq s q a1s 1 Or modeled by q-th Pade approximation (q << N): q kj ˆ ˆ H q ( s ) H q1,q ( s ) j 1 s p j • Formulate 2q constraints by matching 2q moments to compute ki’s & pi’s General Moment Matching Technique • Basic idea: match the moments m-(2q-r), …, m-1, m0, m1, …, mr-1 Hˆ ( s ) kq k1 k2 s p1 s p2 s pq mˆ 0 mˆ 1s mˆ r 1s r 1 ( s r ) • When r = 2q-1: (i) initial condition matches, i.e. hˆ(0 ) h(0 ), or lim sHˆ ( s ) lim sH ( s ) s s or ( mˆ 1 m1 ) (ii) ˆ k mk for k 0,1,,2q 2 m moments of Hˆ ( s) Compute Residues & Poles ki ki pi ki s pi 1 s pi pi s j ( ) j 0 pi k1 k 2 k q h(0) m1 kq k1 k 2 ( ) m0 p1 p2 pq EQ1 kq k1 k 2 ( 2 2 2 ) m1 p1 p2 pq kq k1 k2 ( 2 q 1 2 q 1 2 q 1 ) m2 q 2 p1 p2 pq ( lim sHˆ ( s )) s initial condition match first 2q-1 moments Basic Steps for Moment Matching Step 1: Compute 2q moments m-1, m0, m1, …, m(2q-2) of H(s) Step 2: Solve 2q non-linear equations of EQ1 to get p1 , p2 ,, pq : poles k1 , k2 ,, kq : residues Step 3: Get approximate waveform pq t p1t p2 t ˆ h(t ) k1e k2e kq e Step 4: Increase q and repeat 1-4, if necessary, for better accuracy Components of Moment Matching Model • Moment computation – Iterative DC analysis on transformed equivalent DC circuit – Recursive computation based on tree traversal – Incremental moment computation • Moment matching methods – Asymptotic Waveform Evaluation (AWE) [Pillage-Rohrer, TCAD’90] – 2-pole method [Horowitz, 1984] [Gao-Zhou, ISCAS’93]... • Moment calculation will be provided as an OPTIONAL reading Chapter 7 Interconnect Delay 7.1 Elmore Delay 7.2 High-order model and moment matching 7.3 Stage delay calculation Stage Delay Source A Interconnect B Load C TAC TAB ( I , L) TBC ( L) TAB (0, L) : Gate delay with out interconne ct TAB ( I , L) : Gate delay with interconne ct TBC ( L) : Propagatio n delay Transition delay TInterconnect TAB ( I , L) TAB (0, L) TBC ( L) Modeling of Capacitive Load • First-order approximation: the driver sees the total capacitance of wires and sinks • Problem: Ignore shielding effect of resistance pessimistic approximation as driving point admittance • Transform interconnect circuit into a p-model [O’Brian-Savarino, ICCAD’89] – Problem: cannot be easily used with most device models • Compute effective capacitance from p-model [Qian-Pullela-Pileggi, TCAD’94] P-Model [O’Brian-Savarino, ICCAD’89] • Moment matching again! • Consider the first three moments of driving point admittance (moments of response current caused by an applied unit impulse) • Current in the downstream of node k I k ( s) C sV (s) jTk j j Yk ( s ) I k ( s ) / Vin ( s ) Yk ( s ) C sH jTk j j ( s) ( i ) i 1 C s ( 1 m s m s ) C m j j j s jTk (1) j ( 2) 2 j i 1 jTk Driving-Point Admittance Approximations • Driving-point admittance = Sum of voltage momentweighted subtree capacitance yk( i ) ( i 1) C m j j jTk Yk ( s ) yk( i ) s i i 1 • Approximation of the driving point admittance at the driver Y (s ) General RC Tree: lumped RC elements, distributed RC lines Driving-Point Admittance Approximations • First order approximation: y(1) = sum of subtree capacitance (1) Y ( s) C y (1) • Second order approximation: yk(2) = sum of subtree capacitance weighted by Elmore delay Y ( 2) ( s) R y ( 2) /( y (1) ) 2 C y (1) Third Order Approximation: P Model Y ( 3) ( s ) R C2 y (1) C1 C2 y ( 2) RC12 y (3) R 2C13 C1 ( y ( 2) ) 2 C1 (3) y C2 y (1) C1 y ( 2) R 2 C1 Current Moment Computation • Similar to the voltage moment computation • Iterative tree traversal: – O(n) run-time, O(n) storage • Bottom-up tree traversal: – O(n) run-time – Can achieve O(k) storage if we impose order of traversal, k = max degree of a node – O’Brian and Savarino used bottom-up tree traversal Bottom-Up Moment Computation • Maintain transfer function Hv~w(s) for sink w in subtree Tv, and moment-weighted capacitance of subtree: m for j 0 p, C for j 0 p 1 j w • j Tv m As we merge subtrees, compute new transfer function Hu~v(s) and weighted capacitance recursively: j 1 CTvj 1 mvj 1Cv mvj 1 q CTqv , q 0 mvj ( Rv CTvj 1 Lv CTvj 2 ) for j 1 p • u p u Rv , Lv , Cv mvp v New transfer function for node w CTpu 1 mvp CTpv 1 CTpv 1 H u ~ w ( s) H u ~ v ( s) H v ~ w ( s) j m mvj q mwq for j 0 p j w • q 0 New moment-weighted capacitance of Tu: CTuj j C Tv for j 0 p 1 vchild ( u ) w mwp mwp Current Moment Computation Rule #1 yU(1) y D(1) C YU (s) YD (s) C yU( 2 ) y D( 2 ) yU(3) y D(3) Current Moment Computation Rule #2 YU (s) R YD (s) yU(1) y D(1) yU( 2) y D( 2) R( y D(1) ) 2 yU(3) y D(3) 2 R( y D(1) )( y D( 2) ) R 2 ( y D(1) )3 Current Moment Computation Rule #3 YU (s) R YD (s) C yU(1) y D(1) C ( 2) U y y ( 2) D 1 2 (1) 2 (1) R ( y D ) C ( y D ) C 3 yU(3) y D(3) R 2( y D(1) )( y D( 2) ) C ( y D( 2) ) 2 2 (1) 2 3 (1) 3 4 (1) 2 R ( y D ) C ( y D ) C ( y D ) C 3 3 15 2 Current Moment Computation Rule #4 (Merging of Sub-trees) B YD1 ( s) YU (s) (1) yU(1) y Di i 1 B B Branches ( 2) yU( 2 ) y Di i 1 YDB (s) B ( 3) yU( 3) y Di i 1 Example: Uniform Distributed RC Segment Purely capacitive TAB/TAB(0) Wide metal (distributive) Wide metal (p) Wide metal (lumped RC) Narrow metal (p) Narrow metal (distributive) Narrow metal (lumped RC) Cload/Cmax Why Effective Capacitance Model? • The p-model is incompatible with existing empirical device models – Mapping of 4D empirical data is not practical from a storage or run-time point of view • Convert from a p-model to an effective capacitance model for compatibility • Equate the average current in the p-load and the Ceff load R in C2 C1 in C eff Equating Average Currents Ip in IC R in C1 C2 1 tD tD 0 1 Ip (t )dt tD tD 0 I C (t )dt Ip ( s ) Yp ( s )Vout ( s ) I C ( s ) sCeff Vout ( s) tD = time taken to reach 50% point, not 50% point of input to 50% point of output C eff Waveform Approximation for Vout(t) • Quadratic from initial voltage (Vi = VDD for falling waveform) to 20% point, linear to the 50% point Vi ct 2 Vout (t ) a b(t t x ) 0 t tx tx t tD • Voltage waveform and first derivative are continuous at tx a Vi ct x2 b 2ct x Average Currents in Capacitors tD 1 tx I C (t ) C ( 2 ct ) dt C ( 2 ct ) dt eff eff x tx t D 0 2Ceff c t x tD [t D tx ] 2 2C2 c t x tx I C2 (t ) [t D ] tD 2 • Average current of C1 is not quite as simple: – Current due to quadratic current in C2 – Current due to linear current in C2 Average Currents in C1 • Average current for (0,tx) in C2 t x 2 2C1 c t x RC1 2 RC1t x ( RC1 ) 1 e I C1 (t ) tx 2 • Average current for (tx,tD) in C2 t x ( t D t x ) C1 c RC1 RC1 2 2 RC1t x 2( RC1 ) 1 e 1 e I C1 (t ) tD tx ( t D t x ) RC1 RC1 2ct x C1 1 1 e t D t x • Average current for (0,tD) in C2 ( t D t x ) t D 2C1 c t x2 RC1 RC1 2 t x t D t x RC1 ( RC1 ) e I C1 (t ) e tD 2 Computation of Effective Capacitance • Equating average currents Ceff t D 2 ( t D t x ) RC1 ( RC1 ) RC1 RC1 C2 C1 1 e e tx tx t D 2 t x (t D 2 ) • Problem: tD and tx are not known a priori • Solution: iterative computation – Set the load capacitance equal to total capacitance – Use table-lookup or K-factor equations to obtain tD and tx t D t d in / 2 tx tD / 2 – Equate average currents and calculate effective capacitance – Set load capacitance equal to effective capacitance and iterate Stage Delay Computation • Calculate output waveform at gate – Using Ceff model to model interconnect • Use the output waveform at gate as the input waveform for interconnect tree load • Apply interconnect reduced-order modeling technique to obtain output waveform at receiver pins