3. Preformance of CMOS Circuits

advertisement
Performance of CMOS
Circuits
Instructed by Shmuel Wimer
Eng. School, Bar-Ilan University
Credits: David Harris
Harvey Mudd College
(Some material copied/taken/adapted from
Harris’ lecture notes)
Dec 2010
Performance of CMOS Circuits
1
Outline
 Gate and Diffusion Capacitance
 RC Delay Models
 Power and Energy
 Dynamic Power
 Static Power
 Low Power Design
Dec 2010
Performance of CMOS Circuits
2
MOSFET Capacitance
gate to
source
gate to
drain
gate to
substrate
Dec 2010
Performance of CMOS Circuits
3
 Any two conductors separated by an insulator have
capacitance
 Gate to channel capacitor is very important
– Creates channel charge necessary for operation
 Source and drain have capacitance to body
– Across reverse-biased diodes
– Called diffusion capacitance because it is
associated with source/drain diffusion
Dec 2010
Performance of CMOS Circuits
4
Gate Capacitance
 Approximate channel as connected to source
 Cgs = eoxWL/tox = CoxWL = CpermicronW
 Cpermicron is typically about 2 fF/mm
polysilicon
gate
W
tox
n+
L
n+
SiO2 gate oxide
(good insulator, eox = 3.9e0)
p-type body
Dec 2010
Performance of CMOS Circuits
5
Accumulation occurs
when Vg is negative (for
P material). Holes are
induced under the oxide.
Cgate = CoxA where Cox =
eSiO2eo/tox
Dec 2010
Performance of CMOS Circuits
6
Depletion occurs
when Vg is near zero
but < Vtn. Here the
Cgate is given by CoxA
in series with
depletion layer
capacitance Cdep
Dec 2010
Performance of CMOS Circuits
7
Inversion occurs when
Vg is positive and >
Vtn (for P material). A
model for inversion in
comprised of Cox A
connecting from gateto-channel and Cdep
connecting from
channel-to-substrate.
Dec 2010
Performance of CMOS Circuits
8
Normalized gate
capacitance versus
Gate voltage Vgs.
High freq behavior
is due to the
distributed
resistance of
channel
Dec 2010
Performance of CMOS Circuits
9
Normalized Experimental MOS Gate Capacitance
Measurements vs Vds, Vgs
For Vds = 0, the total gate capacitance Cox A splits
equally to the drain and source of the transistor.
Dec 2010
Performance of CMOS Circuits
10
For Vds > 0, the gate capacitance tilts more toward
the source and becomes roughly 2/3 CoxA to the
source and 0 to the drain for high Vds.
Dec 2010
Performance of CMOS Circuits
11
Higher Vgs – Vt forces this tilting to occur later,
since the device is linear up to Vgs – Vt = Vds.
Dec 2010
Performance of CMOS Circuits
12
MOS Transistor Gate Capacitance Model
Gate capacitance has different components in different modes, but total
remains constant.
Dec 2010
Performance of CMOS Circuits
13
Gate capacitance has different components in different modes, but total
remains constant.
Dec 2010
Performance of CMOS Circuits
14
Diffusion Capacitance
 Csb, Cdb
 Undesirable, called parasitic capacitance
 Capacitance depends on area and perimeter
– Use small diffusion nodes
– Comparable to Cg
for contacted diff
– ½ Cg for uncontacted
– Varies with process
Dec 2010
Performance of CMOS Circuits
15
Diffusion Capacitance (Cont’d)
worst
best
Dec 2010
Performance of CMOS Circuits
16
Effective Resistance
 Shockley models have limited value
– Not accurate enough for modern transistors
– Too complicated for much hand analysis
 Simplification: treat transistor as resistor
– Replace Ids(Vds, Vgs) with effective resistance R
• Ids = Vds/R
– R averaged across switching of digital gate
 Too inaccurate to predict current at any given time
– But good enough to predict RC delay
Dec 2010
Performance of CMOS Circuits
17
RC Delay Model
 Use equivalent circuits for MOS transistors
– Ideal switch + capacitance and ON resistance
– Unit nMOS has resistance R, capacitance C
– Unit pMOS has resistance 2R, capacitance C
 Capacitance proportional to width
 Resistance inversely proportional to width
d
g
2R/k
g
g
kC
kC
d
k
s
s
Dec 2010
kC
kC
R/k
d
k
s
s
Performance of CMOS Circuits
kC
g
kC
d
18
RC Values
 Capacitance
– C = Cg = Cs = Cd = 2 fF/mm of gate width
– Values similar across many processes
 Resistance
– R  6 KW*mm in 0.6um process
– Improves with shorter channel lengths
 Unit transistors
– May refer to minimum contacted device (4/2 l)
– Or maybe 1 mm wide device
– Doesn’t matter as long as you are consistent
Dec 2010
Performance of CMOS Circuits
19
Inverter Delay Estimate
 Estimate the delay of a fanout-of-1 inverter
2C
R
A
2 Y
2
1
1
2C
2C
2C
Y
R
C
R
C
C
C
Dec 2010
2C
Performance of CMOS Circuits
C
d = 6RC
20
Transient Response
 DC analysis tells us Vout if Vin is constant
 Transient analysis tells us Vout(t) if Vin(t) changes
– Requires solving differential equations
 Input is usually considered to be a step or ramp
– From 0 to VDD or vice versa
Dec 2010
Performance of CMOS Circuits
21
Inverter Step Response
 Ex: find step response of inverter driving load cap
Vin (t )  u (t  t0 )VDD
Vout (t  t0 )  VDD
Vin(t)
dVout (t )
I dsn (t )

dt
Cload

0


2

I dsn (t )  
V

V


DD
t
2

  VDD  Vt  Vout (t )
2
 
Dec 2010
V out(t)
Cload
Idsn(t)
t  t0
Vout  VDD  Vt
V (t ) V  V  V
 out
out
DD
t

Performance of CMOS Circuits
22
Inverter Step Response
 Ex: find step response of inverter driving load cap
Vin(t)
Vin (t)
V out(t)
Cload
Idsn(t)
Vout(t)
t0
Dec 2010
t
Performance of CMOS Circuits
23
Delay Definitions
rising delay
falling delay
low to high
propagation
delay
high to low
propagation
delay
Dec 2010
Performance of CMOS Circuits
24
Dec 2010
Performance of CMOS Circuits
25
Delay Definitions (Cont’d)
 tpdr: rising propagation delay
– Maximum time from input crossing 50% to rising
output crossing 50%
 tpdf: falling propagation delay
– Maximum time from input crossing 50% to falling
output crossing 50%
 tpd: average propagation delay
– tpd = (tpdr + tpdf)/2
 tr: rise time
– From output crossing 0.2 VDD to 0.8 VDD
Dec 2010
Performance of CMOS Circuits
26
Delay Definitions (Cont’d)
 tf: fall time
– From output crossing 0.8 VDD to 0.2 VDD
 tcdr: rising contamination delay
– Minimum time from input crossing 50% to rising
output crossing 50%
 tcdf: falling contamination delay
– Minimum time from input crossing 50% to falling
output crossing 50%
 tcd: average contamination delay
– tcd = (tcdr + tcdf)/2
Dec 2010
Performance of CMOS Circuits
27
Simulated Inverter Delay
 Solving differential equations by hand is too hard
 SPICE simulator solves the equations numerically
– Uses more accurate I-V models too!
 But simulations take time to write
2. 0
1. 5
1. 0
t pdr = 83ps
(V)
Vin
t pdf = 66ps
Vout
0. 5
0. 0
0. 0
200p
400p
600p
800 p
1n
t(s)
Dec 2010
Performance of CMOS Circuits
28
Delay Estimation
 We would like to be able to easily estimate delay
– Not as accurate as simulation
– But easier to ask “What if?”
 The step response usually looks like a 1st order RC
response with a decaying exponential.
 Use RC delay models to estimate delay
– C = total capacitance on output node
– Use effective resistance R
– So that tpd = RC
 Characterize transistors by finding their effective R
– Depends on average current as gate switches
Dec 2010
Performance of CMOS Circuits
29
RC Delay Model
 Use equivalent circuits for MOS transistors
– Ideal switch + capacitance and ON resistance
– Unit nMOS has resistance R, capacitance C
– Unit pMOS has resistance 2R, capacitance C
 Capacitance proportional to width
 Resistance inversely proportional to width
d
g
2R/k
g
g
kC
kC
d
k
s
s
Dec 2010
kC
kC
R/k
d
k
s
s
Performance of CMOS Circuits
kC
g
kC
d
30
Example: 3-input NAND
 Sketch a 3-input NAND with transistor widths
chosen to achieve effective rise and fall resistances
equal to a unit inverter (R).
2
2
2
all N devices must
In worst case of P
only one device is
3
opened.
3
be opened.
3
Dec 2010
Performance of CMOS Circuits
31
3-input NAND Caps
 Annotate the 3-input NAND gate with gate and
diffusion capacitance.
2C
2C
2
2C
2C
2
2C
2C
3C
3C
3C
Dec 2010
2C
2C
2
2C
3
3
3
3C
3C
3C
3C
Performance of CMOS Circuits
32
3-input NAND Caps (Cont’d)
 Annotate the 3-input NAND gate with gate and
diffusion capacitance.
2
2
3
5C
5C
5C
Dec 2010
2
3
3
Performance of CMOS Circuits
9C
3C
3C
33
Elmore Delay
 ON transistors look like resistors
 Pullup or pulldown network modeled as RC ladder
 Elmore delay of RC ladder

t pd 
Ri to  sourceCi
nodes i
 R1C1   R1  R2  C2  ...   R1  R2  ...  RN  C N
R1
Dec 2010
R2
R3
C1
C2
RN
C3
Performance of CMOS Circuits
CN
34
For a step input Vin, the delay at any node can be estimated with the
Elmore delay equation tDi =  Cj  Rk
For example, the Elmore delay at node 7 is give by:
R1 ( C1 + C2 + C3 + C4 + C5+ C6+ C7 + C8 ) +
R6 ( C6+ C7+ C8 )+
R7 ( C7 + C8)
Dec 2010
Performance of CMOS Circuits
35
Dec 2010
Performance of CMOS Circuits
36
Dec 2010
Performance of CMOS Circuits
37
Dec 2010
Performance of CMOS Circuits
38
Example: 2-input NAND
 Estimate rising and falling propagation delays of a 2input NAND driving h identical gates.
2
2
6C
A
2
B
2x
R
Dec 2010
Y
(6+4h)C
Y
4hC
h copies
2C
t pdr   6  4h  RC
Performance of CMOS Circuits
39
Example: 2-input NAND
 Estimate rising and falling propagation delays of a
2-input NAND driving h identical gates.
2
h copies
x
R/2
Dec 2010
R/2
2C
Y
(6+4h)C
2
A
2
B
2x
6C
Y
4hC
2C
t pdf   2C   R2    6  4h  C   R2  R2 
  7  4h  RC
Performance of CMOS Circuits
40
Delay Components
 Delay has two parts
– Parasitic delay
• 6 or 7 RC
• Independent of load
– Effort delay
• 4h RC
• Proportional to load capacitance
Dec 2010
Performance of CMOS Circuits
41
Contamination Delay
 Best-case (contamination) delay can be substantially
less than propagation delay.
 Ex: If both inputs fall simultaneously
2
2
6C
A
2
B
2x
R R
Y
(6+4h)C
Dec 2010
Y
4hC
2C
tcdr   3  2h  RC
Performance of CMOS Circuits
42
Diffusion Capacitance
 we assumed contacted diffusion on every s / d
 Good layout minimizes diffusion area
 Ex: NAND3 layout shares one diffusion contact
– Reduces output capacitance by 2C
– Merged uncontacted diffusion might help too
2C
Shared
Contacted
Diffusion
2C
Isolated
Contacted
Diffusion
Merged
Uncontacted
Diffusion
2
2
2
3
3
3
7C
3C
3C
3C 3C 3C
Dec 2010
Performance of CMOS Circuits
43
Layout Comparison
 Layout representation by stick diagram. What CKT?
 Which layout is better?
VDD
A
VDD
B
Y
GND
Dec 2010
A
B
Y
GND
Performance of CMOS Circuits
44
Power and Energy
 Power is drawn from a voltage source attached to
the VDD pin(s) of a chip.
 Instantaneous Power: P(t )  iDD (t )VDD
T
0
0
E   P(t )dt   iDD (t )VDDdt
 Energy:
 Average Power:
Dec 2010
T
T
E 1
Pavg    iDD (t )VDD dt
T T 0
Performance of CMOS Circuits
45
Dynamic Power
 Dynamic power is required to charge and discharge
load capacitances when transistors switch
 One cycle involves a rising and falling output
 On rising output, charge Q = CVDD is required
 On falling output, charge is dumped to GND
 This repeats Tfsw times over an interval of T
VDD
iDD (t)
f sw
Dec 2010
C
Performance of CMOS Circuits
46
T
Pdynamic
E 1
   iDD (t )VDD dt 
T T0
T
VDD
VDD
2
i
(
t
)
dt

Tf
CV

CV


DD
sw
DD
DD f sw

T 0
T
VDD
iDD (t)
f sw
Dec 2010
C
Performance of CMOS Circuits
47
Activity Factor
 Suppose the system clock frequency = f
 Let fsw = af, where a = activity factor
– If the signal is a clock, a = 1
– If the signal switches once per cycle, a = ½
– Static gates:
• Depends on design, but typically a = 0.1
– Dynamic gates:
• Switch either 0 or 2 times per cycle, a = ½
 Dynamic power:
Dec 2010
Pdynamic  aCVDD2 f
Performance of CMOS Circuits
48
Short Circuit Current
 When transistors switch, both nMOS and pMOS
networks may be momentarily ON at once
 Leads to a blip of “short circuit” current.
 < 10% of dynamic power if rise/fall times are
comparable for input and output
Dec 2010
Performance of CMOS Circuits
49
Power Dissipation Sources
 Ptotal = Pdynamic + Pstatic
 Dynamic power: Pdynamic = Pswitching + Pshortcircuit
– Switching load capacitances
– Short-circuit current
 Static power: Pstatic = (Isub + Igate + Ijunct + Icontention)VDD
– Sub-threshold leakage
– Gate leakage
– Junction leakage
– Contention current
Dec 2010
Performance of CMOS Circuits
50
Dynamic Power Example
 1 billion transistor chip
– 50M logic transistors
• Average width: 12 l
• Activity factor = 0.1
– 950M memory transistors
• Average width: 4 l
• Activity factor = 0.02
– 1.0 V 65 nm process
– C = 1 fF/mm (gate) + 0.8 fF/mm (diffusion)
 Estimate dynamic power consumption @ 1 GHz.
Neglect wire capacitance and short-circuit current.
Dec 2010
Performance of CMOS Circuits
51
Power Estimate Ex (Cont’d)
Clogic   50 106  12l  0.025m m / l 1.8 fF / m m   27nF
Cmem   950  106   4l  0.025m m / l 1.8 fF / m m   171nF
Pdynamic  0.1Clogic  0.02Cmem  1.0  1.0GHz   6.1W
2
Dec 2010
Performance of CMOS Circuits
52
Dynamic Power Reduction

Pswitching  aCVDD2 f
 Try to minimize:
– Activity factor
– Capacitance
– Supply voltage
– Frequency
Dec 2010
Performance of CMOS Circuits
53
Activity Factor Estimation




Let Pi = Prob(node i = 1)
ai = Pi *(1- Pi)
Completely random data has P = 0.5 and a = 0.25
Data is often not completely random
– e.g. upper bits of 64-bit words representing bank
account balances are usually 0
 Data propagating through ANDs and ORs has lower
activity factor
– Depends on design, but typically a ≈ 0.1
Dec 2010
Performance of CMOS Circuits
54
Switching Probability
Dec 2010
Performance of CMOS Circuits
55
Example
 A 4-input AND is built out of two levels of gates
 Estimate the activity factor at each node if the inputs
have P = 0.5
Dec 2010
Performance of CMOS Circuits
56
Clock Gating
 The best way to reduce the activity is to turn
off the clock to registers in unused blocks
– Saves clock activity (a = 1)
– Eliminates all switching activity in the block
– Requires determining if block will be used
Dec 2010
Performance of CMOS Circuits
57
Capacitance
 Gate capacitance
– Fewer stages of logic
– Small gate sizes
 Wire capacitance
– Good floorplanning to keep communicating
blocks close to each other
– Drive long wires with inverters or buffers rather
than complex gates
Dec 2010
Performance of CMOS Circuits
58
Voltage / Frequency
 Run each block at the lowest possible
voltage and frequency that meets
performance requirements
 Voltage Domains
– Provide separate supplies to different blocks
– Level converters required when crossing
from low to high VDD domains
 Dynamic Voltage Scaling
– Adjust VDD and f according to
workload
Dec 2010
Performance of CMOS Circuits
59
Static Power
 Static power is consumed even when chip is
quiescent
– Ratioed circuits burn power in fight between ON
transistors. Occurs when output is low (0).
– Leakage draws power from nominally OFF
devices
Dec 2010
Performance of CMOS Circuits
60
Static Power Example
 Revisit power estimation for 1 billion transistor chip
 Estimate static power consumption
– Subthreshold leakage
• Normal Vt:
100 nA/mm
• High Vt:
10 nA/mm
• High Vt used in all memories and in 95% of
logic gates
– Gate leakage
5 nA/mm
– Junction leakage
negligible
Dec 2010
Performance of CMOS Circuits
61
Solution
Wnormal-Vt   50 106  12l  0.025m m / l  0.05   0.75 106 m m
Whigh-Vt   50 106  12l  0.95   950 106   4l    0.025m m / l   109.25 106 m m
I sub  Wnormal-Vt 100 nA/m m+Whigh-Vt 10 nA/m m  / 2  584 mA


I gate   Wnormal-Vt  Whigh-Vt  5 nA/m m  / 2  275 mA


Pstatic  584 mA  275 mA1.0 V  859 mW
Dec 2010
Performance of CMOS Circuits
62
Leakage Control
 Leakage and delay trade off
– Aim for low leakage in sleep and low delay in
active mode
 To reduce leakage:
– Increase Vt: multiple Vt
• Use low Vt only in critical circuits
– Increase Vs: stack effect
• Input vector control in sleep
– Decrease Vb
• Reverse body bias in sleep
• Or forward body bias in active mode
Dec 2010
Performance of CMOS Circuits
63
Gate Leakage
 Extremely strong function of tox and Vgs
– Negligible for older processes
– Approaches subthreshold leakage at 65 nm and
below in some processes
 An order of magnitude less for pMOS than nMOS
 Control leakage in the process using tox > 10.5 Å
– High-k gate dielectrics help
– Some processes provide multiple tox
• e.g. thicker oxide for 3.3 V I/O transistors
 Control leakage in circuits by limiting VDD
Dec 2010
Performance of CMOS Circuits
64
Power Gating
 Turn OFF power to blocks when they are idle to
save leakage
– Use virtual VDD (VDDV)
– Gate outputs to prevent
invalid logic levels to next block
 Voltage drop across sleep transistor degrades
performance during normal operation
– Size the transistor wide enough to minimize
impact
 Switching wide sleep transistor costs dynamic power
– Only justified when circuit sleeps long enough
Dec 2010
Performance of CMOS Circuits
65
Low Power Design
 Reduce dynamic power
 a: clock gating, sleep mode
– C: small transistors (esp. on clock), short wires
– VDD: lowest suitable voltage
– f: lowest suitable frequency
 Reduce static power
– Selectively use ratioed circuits
– Selectively use low Vt devices
– Leakage reduction:
stacked devices, body bias, low temperature
Dec 2010
Performance of CMOS Circuits
66
Download