CMPEN 411 VLSI Digital Circuits, Spring 2012
Lecture 11: Designing for Speed
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp12 CMPEN 411 L11 S.1

Review: CMOS Inverter: Dynamic
[Figure: inverter pull-down equivalent circuit with Vin = VDD; the on-resistance Rn discharges CL at Vout.]
- $t_{pHL} = f(R_n, C_L)$
- $t_{pHL} = 0.69\, R_{eqn} C_L$
- $t_{pHL} = 0.69 \cdot \frac{3}{4} \cdot \frac{C_L V_{DD}}{I_{DSATn}} \approx \frac{0.52\, C_L}{(W/L)_n\, k'_n\, V_{DSATn}}$
  (the last step assumes VDD >> VTn + VDSATn/2)

Review: Designing Inverters for Performance
- Reduce CL
  - internal diffusion capacitance of the gate itself
  - interconnect capacitance
  - fanout
- Increase the W/L ratio of the transistor
  - the most powerful and effective performance optimization tool in the hands of the designer
  - watch out for self-loading!
- Increase VDD
  - only minimal improvement in performance, at the cost of increased energy dissipation
- Slope engineering: keep signal rise and fall times smaller than or equal to the gate propagation delays, and approximately equal to each other
  - good for performance
  - good for power consumption

Switch Delay Model
[Figure: switch-level RC models. Each transistor is replaced by a switch in series with its equivalent resistance Req (Rp for PMOS, Rn for NMOS). Shown for the 2-input NAND and the 2-input NOR, each with load CL and internal node capacitance Cint.]

Input Pattern Effects on Delay
[Figure: 2-input NAND switch model with Rp, Rn, CL and Cint.]
- Delay is dependent on the pattern of inputs
- Low to high transition
  - both inputs go low: delay is ____________
  - one input goes low: delay is ____________
- High to low transition
  - both inputs go high: delay is ____________
- Adding transistors in series (without sizing) slows down the circuit

High to Low Transition (VTC Curve)
2-input NAND with 0.5 µm/0.25 µm NMOS and 0.75 µm/0.25 µm PMOS, F = !(A & B)
[Figure: VTC of the gate, Vout versus Vin, with curves for A,B: 0→1; B=1, A: 0→1; and A=1, B: 0→1; annotated "weaker PUN". Circuit inset: M2 (input A) stacked on M1 (input B), with Cint at the node between them; VGS1 = VB and VGS2 = VA − VDS1.]
- The threshold voltage of M2 can be higher than that of M1 because of the body effect (γ):
  - $V_{Tn1} = V_{Tn0}$
  - $V_{Tn2} = V_{Tn0} + \gamma\left(\sqrt{|2\phi_F| + V_{int}} - \sqrt{|2\phi_F|}\right)$
  - since VSB of M2 is not zero due to the presence of Cint

Low to High Transition (Delay Curve)
2-input NAND with 0.5 µm/0.25 µm NMOS and 0.75 µm/0.25 µm PMOS, CL = 10 fF
[Figure: output voltage (V, −0.5 to 3) versus time (psec, 0 to 400) for A=B=1→0; A=1, B=1→0; and A=1→0, B=1.]

  Input Data Pattern    Delay (psec)
  A = B = 0→1           69
  A = 1, B = 0→1        62
  A = 0→1, B = 1        50
  A = B = 1→0           35
  A = 1, B = 1→0        76
  A = 1→0, B = 1        57

Low to High Transition (Delay Curve), case by case
Same gate and delay table as above, F = !(A & B), with M2 (input A) on top of M1 (input B) and Cint between them. The cases follow the table rows in order.
- Case 1 (A = B = 0→1, 69 psec): have to discharge both CL and Cint (really depends on the state of Cint; assumed charged up here)
- Case 2 (A = 1, B = 0→1, 62 psec): have to discharge both CL and Cint
- Case 3 (A = 0→1, B = 1, 50 psec): have to discharge only CL
- Case 4 (A = B = 1→0, 35 psec): no Cint to charge, and both pfets are on, so the pull-up is strong
- Case 5 (A = 1, B = 1→0, 76 psec): have to charge both CL and Cint through one pfet
- Case 6 (A = 1→0, B = 1, 57 psec): have to charge only CL, but through one pfet

Transistor Sizing
Assuming Rp = Rn
[Figure: 2-input NAND and NOR switch models annotated with relative transistor widths. NAND: parallel PMOS at 1, series NMOS at 2. NOR: series PMOS at 2, parallel NMOS at 1.]

Transistor Sizing a Complex CMOS Gate
OUT = !(D + A • (B + C))
[Figure, built up over three slides: schematic of the gate with inputs A, B, C, D. In the sized version each PMOS carries two width labels (4 and 12 for B and C; 2 and 6 for A and D) and the NMOS widths are A = 2, B = 2, C = 2, D = 1.]

Fan-In Considerations
[Figure: 4-input NAND with inputs A (closest to the output) through D (closest to ground), load CL, and internal node capacitances C3, C2, C1 from top to bottom.]
- Distributed RC model (Elmore delay):
  $t_{pHL} = 0.69\, R_{eqn}(C_1 + 2C_2 + 3C_3 + 4C_L)$
- Propagation delay deteriorates rapidly as a function of fan-in: quadratically in the worst case.
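The Elmore expression for the 4-input stack extends to any fan-in, which makes the quadratic growth easy to see numerically. The sketch below is only an illustration under simplifying assumptions that are not in the slides: every NMOS has the same on-resistance, every internal node has the same capacitance, and the component values are made up. The function name `nand_tphl` is hypothetical.

```python
def nand_tphl(fan_in, r_eqn, c_int, c_load):
    """Elmore estimate of tpHL (seconds) for a fan_in-input NAND pull-down chain.

    Assumptions (illustrative only): all NMOS share the on-resistance r_eqn
    and all internal nodes share the capacitance c_int.
    """
    # Internal node k (k = 1 sits just above the bottom fet) discharges
    # through k series fets, so it contributes k * r_eqn * c_int.
    internal = sum(k * c_int for k in range(1, fan_in))
    # The load discharges through all fan_in fets in series.
    return 0.69 * r_eqn * (internal + fan_in * c_load)


if __name__ == "__main__":
    R_EQN = 10e3     # ohms, made-up value
    C_INT = 2e-15    # farads, made-up value
    C_LOAD = 10e-15  # farads, made-up value
    for n in (1, 2, 4, 8, 16):
        t = nand_tphl(n, R_EQN, C_INT, C_LOAD)
        print(f"fan-in {n:2d}: tpHL ~ {t * 1e12:.0f} ps")
```

With fan_in = 4 the sum reduces to C1 + 2C2 + 3C3 + 4CL for equal internal capacitances, matching the slide. The internal-node term grows as N(N − 1)/2, which is where the quadratic dependence comes from.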
tp as a Function of Fan-In
[Figure: tp (psec, 0 to 1250) versus fan-in (2 to 16), with curves tpHL, tpLH and tp. tpHL is a quadratic function of fan-in; tpLH is a linear function of fan-in.]
- Gates with a fan-in greater than 4 should be avoided.

Fast Complex Gates: Design Technique 1
[Figure: pull-down chain of N series fets, M1 (input In1) at the bottom up to MN (input InN) at the output, with internal node capacitances C1, C2, C3 and load CL.]
- Transistor sizing
  - as long as fan-out capacitance dominates
- The pull-down chain is like a distributed RC line, so should all fets be of the same size?
  - No, use progressive sizing: M1 > M2 > M3 > … > MN
  - The fet closest to the output should be the smallest.
  - Can reduce delay by more than 20%; the gains decrease as technology shrinks

Fast Complex Gates: Design Technique 2
- Input re-ordering
  - When not all inputs arrive at the same time, should the latest-arriving signal drive the top or the bottom fet?
[Figure, left: critical path In1 (0→1) drives M1, the bottom fet; In2 = 1 on M2 and In3 = 1 on M3. CL, C2 and C1 are all charged, so the delay is determined by the time to discharge CL, C1 and C2.]
[Figure, right: critical path In1 (0→1) drives M3, the fet closest to the output; In2 = 1 on M2 and In3 = 1 on M1. CL is charged but C2 and C1 are already discharged, so the delay is determined by the time to discharge CL.]
- The latest-arriving signal should drive the fet closest to the output.

Sizing and Input Ordering Effects
[Figure: 4-input NAND with CL = 100 fF and internal nodes C3, C2, C1. Each PMOS (A through D) has width 3. Each NMOS carries two width labels, 4/4, 4/5, 4/6 and 4/7 for A, B, C and D, consistent with uniform versus progressive sizing.]
Progressive sizing in the pull-down chain gives up to a 23% improvement.
Input ordering saves 5% (critical path A: 23%; critical path D: 17%).

Fast Complex Gates: Design Technique 3
- Alternative logic structures: which is the fastest?
  F = ABCDEFGH
[Figure: alternative gate-level implementations of F.]

Fast Complex Gates: Design Technique 4
- Isolating fan-in from fan-out using buffer insertion
[Figure: a gate driving CL directly, and the same gate driving CL through an inserted buffer.]

Fast Complex Gates: Design Technique 5
- Logical Effort
  - First proposed by Ivan Sutherland and Bob Sproull in 1991: “Logical Effort: Designing for Speed on the Back of an Envelope”, IEEE Advanced Research in VLSI, 1991
  - Both authors are vice presidents and fellows at Sun Microsystems
- Gain-based synthesis based on logical effort
  - Implemented in IBM’s logic synthesis tool BooleDozer
  - Also adopted by Magma’s logic synthesis tool

Introduction of Logical Effort
- Logical Effort is a method to answer these questions:
  - A very simple model of delay
  - Back-of-the-envelope calculations and tractable optimization
- Who needs to learn about logical effort?
  - Circuit designers
  - EDA tool developers

Application of Logical Effort
- Alternative logic structures: which is the fastest?
  F = ABCDEFGH

Next Lecture: Logical Effort
Reading: Textbook pp. 252-257
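Looking back at Design Technique 2, the input re-ordering argument can be checked with the same Elmore model used for fan-in. The sketch below is a rough illustration, not the simulation behind the percentages quoted in the slides. It assumes a 3-fet pull-down stack with equal on-resistances, and the helper name `elmore_discharge` and all component values are made up for the example.

```python
def elmore_discharge(resistances, charged_caps):
    """Elmore estimate (seconds) of the time to discharge a series pull-down stack.

    resistances[i]  : on-resistance of fet i, counted from ground upward.
    charged_caps[i] : capacitance at the node just above fet i that still
                      holds charge (use 0.0 for a node already discharged).
    """
    delay = 0.0
    r_path = 0.0
    for r, c in zip(resistances, charged_caps):
        r_path += r          # total resistance from this node down to ground
        delay += r_path * c  # each charged node sees the resistance below it
    return 0.69 * delay


if __name__ == "__main__":
    R = [10e3, 10e3, 10e3]                # ohms, made-up values for M1, M2, M3
    C1, C2, CL = 2e-15, 2e-15, 10e-15     # farads, made-up values

    # Late input on M1 (bottom fet): C1, C2 and CL are all still charged.
    late_bottom = elmore_discharge(R, [C1, C2, CL])
    # Late input on M3 (closest to the output): C1 and C2 discharged early.
    late_top = elmore_discharge(R, [0.0, 0.0, CL])

    print(f"late input at bottom: {late_bottom * 1e12:.1f} ps")
    print(f"late input at top:    {late_top * 1e12:.1f} ps")
```

In this model the saving equals the internal-node terms (C1 + 2C2 here), so the benefit of re-ordering shrinks as CL grows relative to the internal capacitances.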