Outline Introduction to CMOS VLSI Design Lecture 7B: Logical Effort Introduction Delay in a Logic Gate Multistage Logic Networks Choosing the Best Number of Stages Example Summary Lecture by Jay Brockman University of Notre Dame Fall 2008 Modified by Peter Kogge Fall 2009 Based on lecture slides by David Harris, Harvey Mudd College http://www.cmosvlsi.com/coursematerials.html 6: Logical Effort Slide 1 Review Assume gate G1 is driving some # of other gates G2 – Fanout = number of such gates being driven Delay of a gate G1 = parasitic delay + effort delay Parasitic Delay = delay if gate G1 is driving 0 load – Function of diffusion capacitance in gate – Delay seen when G1 drives no other circuits Effort Delay: delay due to capacitance of circuits driven by G; function of – the number of gates of type G2 being driven – the input capacitance presented by a G2 gate CMOS VLSI Design 5: Logical Effort CMOS VLSI Design Slide 2 Normalized Delay Normalized Delay: gate delay relative to: – “fanout of 1” inverter (drives one inverter) – with no parasitic capacitance – of value 3RC • R = eqvt resistance of unit nmos transistor in saturation • C = eqvt capacitance of gate of unit nmos transistor CMOS VLSI Design Normalized Parasitic Delay Inverter has 3 units of diffusion capacitance – 2 from pmos – 1 from nmos Parasitic delay of inverter is τ =3RC Normalized Parasitic Delay of a gate is p/3RC – p is parasitic delay from Elmore model To convert to psec, multiply by parasitic delay of an inverter in chosen technology CMOS VLSI Design Normailzed Effort Delay h = fanout or electrical effort = property of circuit – = # equivalent G1 gates being driven by G1 – = Cout/Cin where • Cout = total load capacitance presented by G2 inputs • Cin = input capacitance G1 presents on its sources – When gates G2 (those being driven) are type G1, then • h = # of copies g = logical effort to drive a gate of type G2 – effort required to drive one gate vs perfect inverter – how many eqvt invertors one G2 gate looks like – = (Input cap of G2 gate)/(Input cap of inverter) – = (Input cap of G2 gate)/3 Inverter has logical effort = 1 CMOS VLSI Design Summary Rise Time Input Cap per load # of loads Common Gates Gate type Number of inputs 1 Inverter R Fall Time is similar Cdiff hCin # of G2s Logical effort of type G2 delay = RCdiff + hRCin NOR 2 XOR, XNOR 4/3 5/3 6/3 (n+2)/3 5/3 7/3 9/3 (2n+1)/3 2 2 2 2 4, 4 6, 12, 6 8, 16, 16, 8 2 3 4 n 2 3 4 n 1 NOR pG1 + h*gG2 n Tristate / mux XOR, XNOR 2 2 3 4 n 4 6 8 2n 4 6 8 3RC CMOS VLSI Design g=Logical Effort Number of inputs 1 NAND -------------------------- = 4 Gate type Inverter relative-delay = RCdiff + hRCin 3 1 NAND Tristate / mux 2 CMOS VLSI Design p=Normalized Parasitic n-input NAND Gates parasitic delay = (N2/2 + 5/2N)RC SLOW!!!!!! CMOS VLSI Design MultiStage Logic Networks Relative Input Capacitance (based on gate design & transistor size) gi = logical effort to drive a gate of type i = input cap/cap of inverter hi = fanout of gates of type i = load cap/input cap Question: if delay thru one gate is p + hg, can we write delay thru multistage as some P+HG? CMOS VLSI Design Scaling Transistors What if all transistors in gate G got wider by k? – Denote as gate “G(k)” Parasitic delay of G(k): delay of unloaded gate – Diffusion capacitance increases by k – Resistance decreases by k – Result: No change Effort delay: ratio of load cap to input cap – If drive same # of G(k) as before, no change – If drive same # of G(1) as before, decrease by 1/k Result: fanout to type G(1) gates increases by k CMOS VLSI Design Overall Delay delay = ∑delay(i) – where delay(i) = delay thru I’th “stage” of logic delay(i) = pi + hi * gi – pi function only of gate type at stage i – gi function only of gate type at stage i • input cap/cap of inverter – hi depends on gates at stage i+1 • total load on gate i/input cap of gate I delay = ∑(pi + hi * gi ) = ∑(pi ) + ∑(hi * gi ) Can we write ∑(hi * gi ) as some H*G? CMOS VLSI Design Paths that Branch Definitions Path Logical Effort G = ∏g Branch point No! Consider paths that branch: Individual terms 15 g1 = g2 = 1 (inverters) h1 = (15 +15) / 5 = 6 5 h2 = 90 / 15 = 6 15 Path Terms G = Πgi = 1x1 = 1 H = Cout/Cin = 90 / 5 = 18 GH = 18 F = g1h1g2h2 = 36 = 2GH != GH i Path Electrical Effort H= Cout-path F =∏ Path Effort 10 x g2 = 5/3 h2 = y/x g1 = 1 h1 = x/10 Cin-path f i = ∏ gi hi y g3 = 4/3 h3 = z/y z g4 = 1 h4 = 20/z 20 90 90 Question: Can we write F = GH? 5: Logical Effort CMOS VLSI Design Slide 13 Branching Effort Path Delay Path Effort Delay Con path Note: Now we compute the Path Effort – F = GBH CMOS VLSI Design ∏h i Slide 14 D = ∑ d i = DF + P Path Parasitic Delay Con path + Coff path B = ∏ bi 5: Logical Effort CMOS VLSI Design Multistage Delays Introduce Branching Effort – Accounts for branching between stages in path b= 5: Logical Effort P = ∑ pi DF = ∑ f i = BH Slide 15 5: Logical Effort CMOS VLSI Design Slide 16 Designing Fast Circuits Gate Sizes How wide should the gates be for least delay? D = ∑ d i = DF + P fˆ = gh = g CCoutin Delay is smallest when each stage bears same effort 1 fˆ = gi hi = F N ⇒ Cini = Thus minimum delay of N stage path is Working backward, apply capacitance transformation to find input capacitance of each gate given load it drives. Check work by verifying input cap spec is met. 1 D = NF N + P This is a key result of logical effort – To find fastest possible delay – Doesn’t require calculating gate sizes 5: Logical Effort CMOS VLSI Design Slide 17 5: Logical Effort Example: 3-stage path A 8 5: Logical Effort x CMOS VLSI Design y 45 y B 45 Slide 19 Slide 18 x x A x CMOS VLSI Design Example: 3-stage path Select gate sizes x and y for least delay from A to B x gi Couti fˆ 8 x y 45 y Logical Effort Electrical Effort Branching Effort Path Effort Best Stage Effort Parasitic Delay Delay 5: Logical Effort B 45 G= H= B= F= fˆ = P= D= CMOS VLSI Design Slide 20 Example: 3-stage path x x A 8 x 45 B 45 Logical Effort Electrical Effort Branching Effort Path Effort Best Stage Effort Parasitic Delay Delay 5: Logical Effort Work backward for sizes y= x= y y G = (4/3)*(5/3)*(5/3) = 100/27 H = 45/8 B=3*2=6 F = GBH = 125 fˆ = 3 F = 5 x 8 P=2+3+2=7 D = 3*5 + 7 = 22 = 4.4 FO4 CMOS VLSI Design Slide 21 45 P: 4 N: 6 CMOS VLSI Design P: 12 N: 3 B 5: Logical Effort x y 45 y CMOS VLSI Design B 45 Slide 22 Choosing Best # of Stages Goal: estimate delay & choose transistor sizes Many different topologies (combinations of gate types) that implement same function We know in general – NANDs better than NORs – Gates with fewer inputs better than more inputs Typical shortcut: estimate delay by # of stages – Assuming constant “gate delay” – and thus shorter paths are faster THIS IS NOT ALWAYS TRUE! – Adding inverters at end with increasing sizes can speed up circuit, esp. when high load Work backward for sizes y = 45 * (5/3) / 5 = 15 x = (15*2) * (5/3) / 5 = 10 5: Logical Effort x A Example: 3-stage path A P: 4 N: 4 Example: 3-stage path 45 Slide 23 CMOS VLSI Design Best Number of Stages Example (p. 178) (p. 178) How many stages should a path use? – Minimizing number of stages is not always fastest Example: drive 64-bit datapath with unit inverter InitialDriver 1 1 1 How many stages should a path use? – Minimizing number of stages is not always fastest Example: drive 64-bit datapath with unit inverter 1 InitialDriver D = 1 D = NF1/N + P = N(64)1/N + N 1 1 1 8 4 2.8 16 8 23 DatapathLoad N: f: D: 5: Logical Effort 64 1 64 2 64 3 CMOS VLSI Design 64 DatapathLoad N: f: D: 4 Slide 25 5: Logical Effort General Derivation D = NF + ∑ pi + ( N − n1 ) pinv 1 N Logic Block: n1Stages Path Effort F N - n1 ExtraInverters N total stages with (N-n1) Inverters • do not change logical effort • do add parasitic delay CMOS VLSI Design 3 4 15 64 4 2.8 15.3 Fastest CMOS VLSI Design Slide 26 pinv + ρ (1 − ln ρ ) = 0 has no closed-form solution Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e) Again, – these ρ values are best logical effort per stage ^ = log – when you have N ρ F stages pinv + ρ (1 − ln ρ ) = 0 5: Logical Effort 2 8 18 64 For pinv = 1, solve numerically for ρ = 3.59 i =1 1 1 1 ∂D = − F N ln F N + F N + pinv = 0 ∂N 1 Define best stage effort ρ = F N 1 64 65 64 Best Stage Effort Consider adding inverters to end of n1 stage path – How many give least delay? n1 64 Slide 27 5: Logical Effort CMOS VLSI Design Slide 28 Sensitivity Analysis 1st Example, Revisited D(N) /D(N) How sensitive is delay to using exactly the best 1.6 1.51 number of stages? 1.4 Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded automotive processor. Help Ben design the decoder for a register file. A[3:0] A[3:0] 1.26 1.2 32 bits 1.15 1.0 0.5 0.7 1.0 N/ N 1.4 2.0 = actual N vs optimal N 2.4 < ρ < 6 gives delay within 15% of optimal – We can be sloppy! – I like ρ = 4 5: Logical Effort CMOS VLSI Design Slide 29 What Does This Mean? 16 word register file – There are 16 separate row lines – Branching factor of 16 at end Each word is 32 bits wide & each bit presents load of 3 unit-sized transistors – The load on each row line is 32*3 True and complementary address inputs A[3:0] – Any address input needed for only 8 row lines Each input may drive 10 unit-sized transistors – Total input capacitance from 1st stage gates on inputs = 10 5: Logical Effort CMOS VLSI Design Slide 31 Decoder specifications: – 16 word register file – Each word is 32 bits wide – Each bit presents load of 3 unit-sized transistors – True and complementary address inputs A[3:0] – Each input may drive 10 unit-sized transistors Ben needs to decide: – How many stages to use? – How large should each gate be? – How fast can decoder operate? 5: Logical Effort 16 Register File CMOS VLSI Design Slide 30 Number of Stages Decoder effort is mainly electrical and branching Electrical Effort: H= Branching Effort: B= If we neglect logical effort (assume G = 1) Path Effort: F= Number of Stages: 5: Logical Effort N= CMOS VLSI Design Slide 32 16 words 0.0 4:16 Decoder (ρ =2.4) (ρ=6) Number of Stages 3 Stage Gate Sizes & Delay Decoder effort is mainly electrical and branching Electrical Effort: H = (32*3) / 10 = 9.6 Branching Effort: B=8 If we neglect logical effort (assume G = 1) Path Effort: F = GBH = 76.8 Number of Stages: Logical Effort: Path Effort: Stage Effort: Path Delay: Gate sizes: A[3] A[3] 10 A[2] A[2] 10 10 10 A[1] A[1] 10 10 G = 1 * 6/3 * 1 = 2 F = GBH = 154 fˆ = F 1/ 3 = 5.36 D = 3 fˆ + 1 + 4 + 1 = 22.1 z = 96*1/5.36 = 18 y = 18*2/5.36 = 6.7 A[0] A[0] 10 10 N = log4F = 3.1 y z word[0] 96 units of wordline capacitance Try a 3-stage design 5: Logical Effort y CMOS VLSI Design Slide 33 Comparison 5: Logical Effort z word[15] CMOS VLSI Design Slide 34 Review of Definitions Compare many alternatives with a spreadsheet Term Stage Path number of stages 1 N G = ∏ gi Design N G P D NAND4-INV 2 2 5 29.8 logical effort g NAND2-NOR2 2 20/9 4 30.1 electrical effort h= Cout Cin H= Con-path + Coff-path Con-path B = ∏ bi Cout-path Cin-path INV-NAND4-INV 3 2 6 22.1 branching effort b= NAND4-INV-INV-INV 4 2 7 21.1 effort f = gh F = GBH NAND2-NOR2-INV-INV 4 20/9 6 20.5 NAND2-INV-NAND2-INV 4 16/9 6 19.7 effort delay f DF = ∑ f i INV-NAND2-INV-NAND2-INV 5 16/9 7 20.4 parasitic delay p P = ∑ pi NAND2-INV-NAND2-INV-INV-INV 6 16/9 8 21.6 delay d= f +p 5: Logical Effort CMOS VLSI Design Slide 35 5: Logical Effort CMOS VLSI Design D = ∑ d i = DF + P Slide 36 Method of Logical Effort 1) 2) 3) 4) 5) Compute path effort Estimate best number of stages Sketch path with N stages Estimate least delay Determine best stage effort 6) Find gate sizes 5: Logical Effort F = GBH N = log 4 F 1 D = NF N + P 1 fˆ = F N Cini = CMOS VLSI Design gi Couti fˆ Slide 37 Summary Logical effort is useful for thinking of delay in circuits – Numeric logical effort characterizes gates – NANDs are faster than NORs in CMOS – Paths are fastest when effort delays are ~4 – Path delay is weakly sensitive to stages, sizes – But using fewer stages doesn’t mean faster paths – Delay of path is about log4F FO4 inverter delays – Inverters and NAND2 best for driving large caps Provides language for discussing fast circuits – But requires practice to master 5: Logical Effort CMOS VLSI Design Slide 39 Limits of Logical Effort Chicken and egg problem – Need path to compute G – But don’t know number of stages without G Simplistic delay model – Neglects input rise time effects Interconnect – Iteration required in designs with wire Maximum speed only – Not minimum area/power for constrained delay 5: Logical Effort CMOS VLSI Design Slide 38