Performance of CMOS Circuits Instructed by Shmuel Wimer Eng. School, Bar-Ilan University Credits: David Harris Harvey Mudd College (Some material copied/taken/adapted from Harris’ lecture notes) Dec 2010 Performance of CMOS Circuits 1 Outline Gate and Diffusion Capacitance RC Delay Models Power and Energy Dynamic Power Static Power Low Power Design Dec 2010 Performance of CMOS Circuits 2 MOSFET Capacitance gate to source gate to drain gate to substrate Dec 2010 Performance of CMOS Circuits 3 Any two conductors separated by an insulator have capacitance Gate to channel capacitor is very important – Creates channel charge necessary for operation Source and drain have capacitance to body – Across reverse-biased diodes – Called diffusion capacitance because it is associated with source/drain diffusion Dec 2010 Performance of CMOS Circuits 4 Gate Capacitance Approximate channel as connected to source Cgs = eoxWL/tox = CoxWL = CpermicronW Cpermicron is typically about 2 fF/mm polysilicon gate W tox n+ L n+ SiO2 gate oxide (good insulator, eox = 3.9e0) p-type body Dec 2010 Performance of CMOS Circuits 5 Accumulation occurs when Vg is negative (for P material). Holes are induced under the oxide. Cgate = CoxA where Cox = eSiO2eo/tox Dec 2010 Performance of CMOS Circuits 6 Depletion occurs when Vg is near zero but < Vtn. Here the Cgate is given by CoxA in series with depletion layer capacitance Cdep Dec 2010 Performance of CMOS Circuits 7 Inversion occurs when Vg is positive and > Vtn (for P material). A model for inversion in comprised of Cox A connecting from gateto-channel and Cdep connecting from channel-to-substrate. Dec 2010 Performance of CMOS Circuits 8 Normalized gate capacitance versus Gate voltage Vgs. High freq behavior is due to the distributed resistance of channel Dec 2010 Performance of CMOS Circuits 9 Normalized Experimental MOS Gate Capacitance Measurements vs Vds, Vgs For Vds = 0, the total gate capacitance Cox A splits equally to the drain and source of the transistor. Dec 2010 Performance of CMOS Circuits 10 For Vds > 0, the gate capacitance tilts more toward the source and becomes roughly 2/3 CoxA to the source and 0 to the drain for high Vds. Dec 2010 Performance of CMOS Circuits 11 Higher Vgs – Vt forces this tilting to occur later, since the device is linear up to Vgs – Vt = Vds. Dec 2010 Performance of CMOS Circuits 12 MOS Transistor Gate Capacitance Model Gate capacitance has different components in different modes, but total remains constant. Dec 2010 Performance of CMOS Circuits 13 Gate capacitance has different components in different modes, but total remains constant. Dec 2010 Performance of CMOS Circuits 14 Diffusion Capacitance Csb, Cdb Undesirable, called parasitic capacitance Capacitance depends on area and perimeter – Use small diffusion nodes – Comparable to Cg for contacted diff – ½ Cg for uncontacted – Varies with process Dec 2010 Performance of CMOS Circuits 15 Diffusion Capacitance (Cont’d) worst best Dec 2010 Performance of CMOS Circuits 16 Effective Resistance Shockley models have limited value – Not accurate enough for modern transistors – Too complicated for much hand analysis Simplification: treat transistor as resistor – Replace Ids(Vds, Vgs) with effective resistance R • Ids = Vds/R – R averaged across switching of digital gate Too inaccurate to predict current at any given time – But good enough to predict RC delay Dec 2010 Performance of CMOS Circuits 17 RC Delay Model Use equivalent circuits for MOS transistors – Ideal switch + capacitance and ON resistance – Unit nMOS has resistance R, capacitance C – Unit pMOS has resistance 2R, capacitance C Capacitance proportional to width Resistance inversely proportional to width d g 2R/k g g kC kC d k s s Dec 2010 kC kC R/k d k s s Performance of CMOS Circuits kC g kC d 18 RC Values Capacitance – C = Cg = Cs = Cd = 2 fF/mm of gate width – Values similar across many processes Resistance – R 6 KW*mm in 0.6um process – Improves with shorter channel lengths Unit transistors – May refer to minimum contacted device (4/2 l) – Or maybe 1 mm wide device – Doesn’t matter as long as you are consistent Dec 2010 Performance of CMOS Circuits 19 Inverter Delay Estimate Estimate the delay of a fanout-of-1 inverter 2C R A 2 Y 2 1 1 2C 2C 2C Y R C R C C C Dec 2010 2C Performance of CMOS Circuits C d = 6RC 20 Transient Response DC analysis tells us Vout if Vin is constant Transient analysis tells us Vout(t) if Vin(t) changes – Requires solving differential equations Input is usually considered to be a step or ramp – From 0 to VDD or vice versa Dec 2010 Performance of CMOS Circuits 21 Inverter Step Response Ex: find step response of inverter driving load cap Vin (t ) u (t t0 )VDD Vout (t t0 ) VDD Vin(t) dVout (t ) I dsn (t ) dt Cload 0 2 I dsn (t ) V V DD t 2 VDD Vt Vout (t ) 2 Dec 2010 V out(t) Cload Idsn(t) t t0 Vout VDD Vt V (t ) V V V out out DD t Performance of CMOS Circuits 22 Inverter Step Response Ex: find step response of inverter driving load cap Vin(t) Vin (t) V out(t) Cload Idsn(t) Vout(t) t0 Dec 2010 t Performance of CMOS Circuits 23 Delay Definitions rising delay falling delay low to high propagation delay high to low propagation delay Dec 2010 Performance of CMOS Circuits 24 Dec 2010 Performance of CMOS Circuits 25 Delay Definitions (Cont’d) tpdr: rising propagation delay – Maximum time from input crossing 50% to rising output crossing 50% tpdf: falling propagation delay – Maximum time from input crossing 50% to falling output crossing 50% tpd: average propagation delay – tpd = (tpdr + tpdf)/2 tr: rise time – From output crossing 0.2 VDD to 0.8 VDD Dec 2010 Performance of CMOS Circuits 26 Delay Definitions (Cont’d) tf: fall time – From output crossing 0.8 VDD to 0.2 VDD tcdr: rising contamination delay – Minimum time from input crossing 50% to rising output crossing 50% tcdf: falling contamination delay – Minimum time from input crossing 50% to falling output crossing 50% tcd: average contamination delay – tcd = (tcdr + tcdf)/2 Dec 2010 Performance of CMOS Circuits 27 Simulated Inverter Delay Solving differential equations by hand is too hard SPICE simulator solves the equations numerically – Uses more accurate I-V models too! But simulations take time to write 2. 0 1. 5 1. 0 t pdr = 83ps (V) Vin t pdf = 66ps Vout 0. 5 0. 0 0. 0 200p 400p 600p 800 p 1n t(s) Dec 2010 Performance of CMOS Circuits 28 Delay Estimation We would like to be able to easily estimate delay – Not as accurate as simulation – But easier to ask “What if?” The step response usually looks like a 1st order RC response with a decaying exponential. Use RC delay models to estimate delay – C = total capacitance on output node – Use effective resistance R – So that tpd = RC Characterize transistors by finding their effective R – Depends on average current as gate switches Dec 2010 Performance of CMOS Circuits 29 RC Delay Model Use equivalent circuits for MOS transistors – Ideal switch + capacitance and ON resistance – Unit nMOS has resistance R, capacitance C – Unit pMOS has resistance 2R, capacitance C Capacitance proportional to width Resistance inversely proportional to width d g 2R/k g g kC kC d k s s Dec 2010 kC kC R/k d k s s Performance of CMOS Circuits kC g kC d 30 Example: 3-input NAND Sketch a 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to a unit inverter (R). 2 2 2 all N devices must In worst case of P only one device is 3 opened. 3 be opened. 3 Dec 2010 Performance of CMOS Circuits 31 3-input NAND Caps Annotate the 3-input NAND gate with gate and diffusion capacitance. 2C 2C 2 2C 2C 2 2C 2C 3C 3C 3C Dec 2010 2C 2C 2 2C 3 3 3 3C 3C 3C 3C Performance of CMOS Circuits 32 3-input NAND Caps (Cont’d) Annotate the 3-input NAND gate with gate and diffusion capacitance. 2 2 3 5C 5C 5C Dec 2010 2 3 3 Performance of CMOS Circuits 9C 3C 3C 33 Elmore Delay ON transistors look like resistors Pullup or pulldown network modeled as RC ladder Elmore delay of RC ladder t pd Ri to sourceCi nodes i R1C1 R1 R2 C2 ... R1 R2 ... RN C N R1 Dec 2010 R2 R3 C1 C2 RN C3 Performance of CMOS Circuits CN 34 For a step input Vin, the delay at any node can be estimated with the Elmore delay equation tDi = Cj Rk For example, the Elmore delay at node 7 is give by: R1 ( C1 + C2 + C3 + C4 + C5+ C6+ C7 + C8 ) + R6 ( C6+ C7+ C8 )+ R7 ( C7 + C8) Dec 2010 Performance of CMOS Circuits 35 Dec 2010 Performance of CMOS Circuits 36 Dec 2010 Performance of CMOS Circuits 37 Dec 2010 Performance of CMOS Circuits 38 Example: 2-input NAND Estimate rising and falling propagation delays of a 2input NAND driving h identical gates. 2 2 6C A 2 B 2x R Dec 2010 Y (6+4h)C Y 4hC h copies 2C t pdr 6 4h RC Performance of CMOS Circuits 39 Example: 2-input NAND Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates. 2 h copies x R/2 Dec 2010 R/2 2C Y (6+4h)C 2 A 2 B 2x 6C Y 4hC 2C t pdf 2C R2 6 4h C R2 R2 7 4h RC Performance of CMOS Circuits 40 Delay Components Delay has two parts – Parasitic delay • 6 or 7 RC • Independent of load – Effort delay • 4h RC • Proportional to load capacitance Dec 2010 Performance of CMOS Circuits 41 Contamination Delay Best-case (contamination) delay can be substantially less than propagation delay. Ex: If both inputs fall simultaneously 2 2 6C A 2 B 2x R R Y (6+4h)C Dec 2010 Y 4hC 2C tcdr 3 2h RC Performance of CMOS Circuits 42 Diffusion Capacitance we assumed contacted diffusion on every s / d Good layout minimizes diffusion area Ex: NAND3 layout shares one diffusion contact – Reduces output capacitance by 2C – Merged uncontacted diffusion might help too 2C Shared Contacted Diffusion 2C Isolated Contacted Diffusion Merged Uncontacted Diffusion 2 2 2 3 3 3 7C 3C 3C 3C 3C 3C Dec 2010 Performance of CMOS Circuits 43 Layout Comparison Layout representation by stick diagram. What CKT? Which layout is better? VDD A VDD B Y GND Dec 2010 A B Y GND Performance of CMOS Circuits 44 Power and Energy Power is drawn from a voltage source attached to the VDD pin(s) of a chip. Instantaneous Power: P(t ) iDD (t )VDD T 0 0 E P(t )dt iDD (t )VDDdt Energy: Average Power: Dec 2010 T T E 1 Pavg iDD (t )VDD dt T T 0 Performance of CMOS Circuits 45 Dynamic Power Dynamic power is required to charge and discharge load capacitances when transistors switch One cycle involves a rising and falling output On rising output, charge Q = CVDD is required On falling output, charge is dumped to GND This repeats Tfsw times over an interval of T VDD iDD (t) f sw Dec 2010 C Performance of CMOS Circuits 46 T Pdynamic E 1 iDD (t )VDD dt T T0 T VDD VDD 2 i ( t ) dt Tf CV CV DD sw DD DD f sw T 0 T VDD iDD (t) f sw Dec 2010 C Performance of CMOS Circuits 47 Activity Factor Suppose the system clock frequency = f Let fsw = af, where a = activity factor – If the signal is a clock, a = 1 – If the signal switches once per cycle, a = ½ – Static gates: • Depends on design, but typically a = 0.1 – Dynamic gates: • Switch either 0 or 2 times per cycle, a = ½ Dynamic power: Dec 2010 Pdynamic aCVDD2 f Performance of CMOS Circuits 48 Short Circuit Current When transistors switch, both nMOS and pMOS networks may be momentarily ON at once Leads to a blip of “short circuit” current. < 10% of dynamic power if rise/fall times are comparable for input and output Dec 2010 Performance of CMOS Circuits 49 Power Dissipation Sources Ptotal = Pdynamic + Pstatic Dynamic power: Pdynamic = Pswitching + Pshortcircuit – Switching load capacitances – Short-circuit current Static power: Pstatic = (Isub + Igate + Ijunct + Icontention)VDD – Sub-threshold leakage – Gate leakage – Junction leakage – Contention current Dec 2010 Performance of CMOS Circuits 50 Dynamic Power Example 1 billion transistor chip – 50M logic transistors • Average width: 12 l • Activity factor = 0.1 – 950M memory transistors • Average width: 4 l • Activity factor = 0.02 – 1.0 V 65 nm process – C = 1 fF/mm (gate) + 0.8 fF/mm (diffusion) Estimate dynamic power consumption @ 1 GHz. Neglect wire capacitance and short-circuit current. Dec 2010 Performance of CMOS Circuits 51 Power Estimate Ex (Cont’d) Clogic 50 106 12l 0.025m m / l 1.8 fF / m m 27nF Cmem 950 106 4l 0.025m m / l 1.8 fF / m m 171nF Pdynamic 0.1Clogic 0.02Cmem 1.0 1.0GHz 6.1W 2 Dec 2010 Performance of CMOS Circuits 52 Dynamic Power Reduction Pswitching aCVDD2 f Try to minimize: – Activity factor – Capacitance – Supply voltage – Frequency Dec 2010 Performance of CMOS Circuits 53 Activity Factor Estimation Let Pi = Prob(node i = 1) ai = Pi *(1- Pi) Completely random data has P = 0.5 and a = 0.25 Data is often not completely random – e.g. upper bits of 64-bit words representing bank account balances are usually 0 Data propagating through ANDs and ORs has lower activity factor – Depends on design, but typically a ≈ 0.1 Dec 2010 Performance of CMOS Circuits 54 Switching Probability Dec 2010 Performance of CMOS Circuits 55 Example A 4-input AND is built out of two levels of gates Estimate the activity factor at each node if the inputs have P = 0.5 Dec 2010 Performance of CMOS Circuits 56 Clock Gating The best way to reduce the activity is to turn off the clock to registers in unused blocks – Saves clock activity (a = 1) – Eliminates all switching activity in the block – Requires determining if block will be used Dec 2010 Performance of CMOS Circuits 57 Capacitance Gate capacitance – Fewer stages of logic – Small gate sizes Wire capacitance – Good floorplanning to keep communicating blocks close to each other – Drive long wires with inverters or buffers rather than complex gates Dec 2010 Performance of CMOS Circuits 58 Voltage / Frequency Run each block at the lowest possible voltage and frequency that meets performance requirements Voltage Domains – Provide separate supplies to different blocks – Level converters required when crossing from low to high VDD domains Dynamic Voltage Scaling – Adjust VDD and f according to workload Dec 2010 Performance of CMOS Circuits 59 Static Power Static power is consumed even when chip is quiescent – Ratioed circuits burn power in fight between ON transistors. Occurs when output is low (0). – Leakage draws power from nominally OFF devices Dec 2010 Performance of CMOS Circuits 60 Static Power Example Revisit power estimation for 1 billion transistor chip Estimate static power consumption – Subthreshold leakage • Normal Vt: 100 nA/mm • High Vt: 10 nA/mm • High Vt used in all memories and in 95% of logic gates – Gate leakage 5 nA/mm – Junction leakage negligible Dec 2010 Performance of CMOS Circuits 61 Solution Wnormal-Vt 50 106 12l 0.025m m / l 0.05 0.75 106 m m Whigh-Vt 50 106 12l 0.95 950 106 4l 0.025m m / l 109.25 106 m m I sub Wnormal-Vt 100 nA/m m+Whigh-Vt 10 nA/m m / 2 584 mA I gate Wnormal-Vt Whigh-Vt 5 nA/m m / 2 275 mA Pstatic 584 mA 275 mA1.0 V 859 mW Dec 2010 Performance of CMOS Circuits 62 Leakage Control Leakage and delay trade off – Aim for low leakage in sleep and low delay in active mode To reduce leakage: – Increase Vt: multiple Vt • Use low Vt only in critical circuits – Increase Vs: stack effect • Input vector control in sleep – Decrease Vb • Reverse body bias in sleep • Or forward body bias in active mode Dec 2010 Performance of CMOS Circuits 63 Gate Leakage Extremely strong function of tox and Vgs – Negligible for older processes – Approaches subthreshold leakage at 65 nm and below in some processes An order of magnitude less for pMOS than nMOS Control leakage in the process using tox > 10.5 Å – High-k gate dielectrics help – Some processes provide multiple tox • e.g. thicker oxide for 3.3 V I/O transistors Control leakage in circuits by limiting VDD Dec 2010 Performance of CMOS Circuits 64 Power Gating Turn OFF power to blocks when they are idle to save leakage – Use virtual VDD (VDDV) – Gate outputs to prevent invalid logic levels to next block Voltage drop across sleep transistor degrades performance during normal operation – Size the transistor wide enough to minimize impact Switching wide sleep transistor costs dynamic power – Only justified when circuit sleeps long enough Dec 2010 Performance of CMOS Circuits 65 Low Power Design Reduce dynamic power a: clock gating, sleep mode – C: small transistors (esp. on clock), short wires – VDD: lowest suitable voltage – f: lowest suitable frequency Reduce static power – Selectively use ratioed circuits – Selectively use low Vt devices – Leakage reduction: stacked devices, body bias, low temperature Dec 2010 Performance of CMOS Circuits 66