EE5324-Adders

EE 5324 – VLSI Design II
Part II: Adders
Kia Bazargan
University of Minnesota
Spring 2006
EE 5324 - VLSI Design II - © Kia Bazargan
68
References and Copyright
• Textbooks referenced
 [WE92] N. H. E. Weste, K. Eshraghian
“Principles of CMOS VLSI Design: A System Perspective”
Addison-Wesley, 2nd Ed., 1992.
 [Rab96] J. M. Rabaey
“Digital Integrated Circuits: A Design Perspective”
Prentice Hall, 1996.
 [Par00] B. Parhami
“Computer Arithmetic: Algorithms and Hardware Designs”
Oxford University Press, 2000.
References and Copyright (cont.)
• Slides used
 [©Hauck] © Scott A. Hauck, 1996-2000;
G. Borriello, C. Ebeling, S. Burns, 1995,
University of Washington
 [©Prentice Hall] © Prentice Hall 1995, © UCB 1996
Slides for [Rab96]
http://bwrc.eecs.berkeley.edu/Classes/IcBook/instructors.html
 [©Oxford U Press] © Oxford University Press,
New York, 2000
Slides for [Par00]
With permission from the author
http://www.ece.ucsb.edu/Faculty/Parhami/files_n_docs.htm
Outline
• One-bit adder, basic ripple-carry adder
• Carry-Lookahead adders (CLA)
• Manchester carry chain
• Carry bypass
• Carry select adder
• Brent-Kung adder
Why Adders?
• Addition: a fundamental operation
 Basic block of most arithmetic operations
 Address calculation
• Faster, faster and faster
• How?
 Architectural level optimization
 Gate-level optimization
 Speed/area trade-off
Adding Two One-bit Operands
• One-bit Half Adder:
Sum = A ⊕ B
Cout = A.B
A B | Sum Cout
0 0 |  0   0
0 1 |  1   0
1 0 |  1   0
1 1 |  0   1
[Figure: HA block with inputs A, B and outputs Sum, Cout]
• One-bit Full Adder:
Sum = A ⊕ B ⊕ Cin
Cout = A.B + B.Cin + A.Cin
Cin A B | Sum Cout
 0  0 0 |  0   0
 0  0 1 |  1   0
 0  1 0 |  1   0
 0  1 1 |  0   1
 1  0 0 |  1   0
 1  0 1 |  0   1
 1  1 0 |  0   1
 1  1 1 |  1   1
[Figure: FA block with inputs A, B, Cin and outputs Sum, Cout]
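The two cells can be sketched behaviorally in Python. This is only a bit-level model of the equations above, not a circuit description; the function names half_adder and full_adder are illustrative choices, not names from the slides.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (Sum, Cout) for two one-bit operands: Sum = A xor B, Cout = A.B."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (Sum, Cout): Sum = A xor B xor Cin, Cout = A.B + B.Cin + A.Cin."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


if __name__ == "__main__":
    # Print the full-adder truth table in the same column order as the slide.
    print("Cin A B | Sum Cout")
    for cin in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                s, cout = full_adder(a, b, cin)
                print(f"  {cin} {a} {b} |  {s}    {cout}")
```

A quick sanity check on any row: A + B + Cin must equal Sum + 2*Cout.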
N-Bit Ripple-Carry Adder: Series of FA Cells
• To add two n-bit numbers
[Figure: chain of n FA cells; cell i takes Ai, Bi and Ci and produces Si and Ci+1; C0 enters at bit 0 and Cn leaves at bit n-1]
• Note: adder delay = Tc * n
• Tc = (Cin:Cout delay) of one FA cell
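A minimal behavioral sketch of the ripple-carry chain, assuming operands are passed as Python integers and n is the word width. The names ripple_carry_add and full_adder are illustrative. The loop makes the linear delay visible: each iteration needs the carry produced by the previous one.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One FA cell: returns (Sum, Cout)."""
    return a ^ b ^ cin, (a & b) | (cin & (a | b))


def ripple_carry_add(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers with a chain of n FA cells.

    Returns (sum as an n-bit integer, carry out Cn).
    """
    carry = c0
    total = 0
    for i in range(n):  # bit 0 first: the carry ripples toward the MSB
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, ripple_carry_add(0b0011, 0b0101, 4) returns (0b1000, 0).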
4-bit Ripple Carry Addition: Example
A=0011
B=0101
C0=0
[Figure: four FA cells in a chain, bit 3 on the left and bit 0 on the right; carry and sum values shown at each time step]
T | C4 C3 C2 C1 | S
0 |  0  0  0  0 | 0000
1 |  0  0  0  1 | 0110
2 |  0  0  1  1 | 0100
3 |  0  1  1  1 | 0000
4 |  0  1  1  1 | 1000
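The settling behavior in this example can be reproduced with a small unit-delay simulation. The sketch assumes every FA cell has a delay of one time step and that all carries and sums start at 0 at T=0; both are modeling assumptions, and settle_trace is an illustrative name.

```python
def settle_trace(a: int, b: int, n: int, steps: int) -> list[str]:
    """Return the sum word (as a bit string) seen at T=0, 1, ..., steps.

    Each FA cell recomputes its outputs from the carry value its
    neighbor held one time step earlier (unit-delay model).
    """
    carries = [0] * (n + 1)        # carries[0] is C0, held at 0
    trace = ["0" * n]              # assumed all-zero state at T=0
    for _ in range(steps):
        nxt = carries[:]
        s = 0
        for i in range(n):
            ai, bi, ci = (a >> i) & 1, (b >> i) & 1, carries[i]
            s |= (ai ^ bi ^ ci) << i
            nxt[i + 1] = (ai & bi) | (ci & (ai | bi))
        carries = nxt
        trace.append(format(s, f"0{n}b"))
    return trace
```

With A=0011 and B=0101, settle_trace(0b0011, 0b0101, 4, 4) gives 0000, 0110, 0100, 0000, 1000: the sum is only correct once the carry has rippled through all four cells.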
One-bit Full Adder Implementation
• Direct gate implementation
Sum = A ⊕ B ⊕ Cin
Cout = A.B + B.Cin + A.Cin
= A.B + Cin.(A+B)
[Figure: gate-level schematics for Sum and Cout]
32 Transistors Used
[WE92] p516
One-Bit Full Adder: Share Logic
• An observation
 Almost always, Sum = NOT Cout
 The only exceptions are inputs 000 and 111
Sum = A.B.Cin + (A+B+Cin).Cout’
 The A.B.Cin term includes 111
 The (A+B+Cin) factor excludes 000
Cin A B | Sum Cout
 0  0 0 |  0   0
 0  0 1 |  1   0
 0  1 0 |  1   0
 0  1 1 |  0   1
 1  0 0 |  1   0
 1  0 1 |  0   1
 1  1 0 |  0   1
 1  1 1 |  1   1
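The shared-logic form of Sum can be checked exhaustively. The sketch below assumes the Cout factor is the complemented carry (Cout’), which is what makes the identity hold; the names cout_fn, sum_shared and exceptions are illustrative.

```python
from itertools import product


def cout_fn(a: int, b: int, cin: int) -> int:
    """Cout = A.B + Cin.(A+B)."""
    return (a & b) | (cin & (a | b))


def sum_shared(a: int, b: int, cin: int) -> int:
    """Sum = A.B.Cin + (A+B+Cin).Cout', reusing the carry output."""
    not_cout = 1 - cout_fn(a, b, cin)
    return (a & b & cin) | ((a | b | cin) & not_cout)


# Input rows (Cin, A, B) where Sum differs from NOT Cout.
exceptions = [
    (cin, a, b)
    for cin, a, b in product((0, 1), repeat=3)
    if (a ^ b ^ cin) != 1 - cout_fn(a, b, cin)
]
```

Here exceptions comes out as [(0, 0, 0), (1, 1, 1)], the two rows handled by the extra terms.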
One-Bit Full Adder: Transistor Implementation
Cout = A.B + C.(A+B)
Sum = A.B.C + (A+B+C).Cout’
[Figure: transistor-level schematic of the carry stage and the sum stage, inputs A, B, C, outputs Cout and Sum]
28 Transistors
[WE92] p517
[Rab96] p390
– Use inverters to get Cout and Sum
– C transistors close to output
– Cout delay: 2 inverting stages (1-stage possible?)
– Sum delay: 3 inverting stages (not an issue, though)
One-Bit Full Adder: Inverted Inputs
• An observation
 Invert inputs => outputs invert
[Figure: an FA cell with all inputs and outputs inverted is equivalent to a plain FA cell]
• Exploit this property:
 Get rid of the inverter on the carry critical path
Cin A B | Sum Cout
 0  0 0 |  0   0
 0  0 1 |  1   0
 0  1 0 |  1   0
 0  1 1 |  0   1
 1  0 0 |  1   0
 1  0 1 |  0   1
 1  1 0 |  0   1
 1  1 1 |  1   1
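The inverting property is easy to confirm by brute force. The sketch below is a behavioral check only; full_adder and has_inverting_property are illustrative names.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (Sum, Cout) of a one-bit full adder."""
    return a ^ b ^ cin, (a & b) | (cin & (a | b))


def has_inverting_property() -> bool:
    """True if complementing all inputs complements both outputs, for every row."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        s_inv, cout_inv = full_adder(1 - a, 1 - b, 1 - cin)
        if (s_inv, cout_inv) != (1 - s, 1 - cout):
            return False
    return True
```

In the truth table this shows up as a symmetry: row 000 mirrors row 111, row 001 mirrors row 110, and so on.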
Ripple Carry Adder: Inverting Property
[Figure: 4-bit ripple-carry adder built from FA’ cells, with inputs A3..A0 and B3..B0, carries C0..C4 and sums S3..S0]
 FA’ is similar to FA, but with no inverters on the outputs
 Much faster (1-stage)
 Disadvantage: not regular data path
Summary: Ripple-Carry Adder
• Basic ripple carry: AND-OR gates
 Area: 32 transistors (per bit position)
 Delay: 2 stages of inverting logic (per bit position)
• Direct CMOS logic, share Cout’
 Area: 28 transistors
 Delay: 2 stages
• Use “inverting” property
 Area: 27 (odd bits:26, even bits:28)
 Delay: ~1 stage
• So far: transistor/logic manipulation
• Is that all we can do?!!
Outline
• One-bit adder, basic ripple-carry adder
• Carry-Lookahead adders (CLA)
• Manchester carry chain
• Carry bypass
• Carry select adder
• Brent-Kung adder
Carry-Lookahead Adder: Idea
• New look: carry propagation
• Idea:
 Try to “predict” Ck earlier than Tc*k
 Instead of passing through k stages, compute
Ck separately using 1-stage CMOS logic
• Carry propagation: an example
[Figure: worked carry-propagation example by bit position, with rows Carry, A, B and Sum]
Carry-Lookahead Adder (CLA): One Bit
• What happens to the propagating carry in bit position k?
A B Cin | Cout
0 0  -  | 0 (kill)
0 1  C  | C (propagate)
1 0  C  | C (propagate)
1 1  -  | 1 (generate)
p = A+B (or A ⊕ B)
g = A.B
[Figure: carry gate with kill, 0-propagate, 1-propagate and generate paths]
[Rab96] p391
CLA: Propagation Equations
• If C4=1, then either:
 g3: generated at bit pos 3
 g2.p3: generated at bit pos 2, propagated through 3
 g1.p2.p3: generated at bit pos 1, propagated through 2,3
 g0.p1.p2.p3: generated at bit pos 0, propagated through 1,2,3
 Cin.p0.p1.p2.p3: input carry, propagated through 0,1,2,3
• C4 = g3 + g2.p3 + g1.p2.p3 + g0.p1.p2.p3 + Cin.p0.p1.p2.p3
 Implement C4 as a one-stage CMOS logic => delay=1 (or is it?)
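The flattened lookahead terms can be written out directly and compared against ordinary addition. This is a sketch assuming p = A+B (the OR form) and 4-bit operands passed as integers; lookahead_carries is an illustrative name, and the C1..C3 expressions follow the same pattern as the C4 equation above.

```python
def lookahead_carries(a: int, b: int, cin: int) -> list[int]:
    """Return [C1, C2, C3, C4] for 4-bit a and b from flattened p/g terms."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # generate
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]  # propagate
    c1 = g[0] | (cin & p[0])
    c2 = g[1] | (g[0] & p[1]) | (cin & p[0] & p[1])
    c3 = (g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2])
          | (cin & p[0] & p[1] & p[2]))
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3])
          | (cin & p[0] & p[1] & p[2] & p[3]))
    return [c1, c2, c3, c4]
```

Each carry depends only on the p, g and Cin inputs, never on another carry, which is the point of the lookahead. The price is that the widest AND term grows with the bit position.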
CLA: Static Logic Implementation
[Figure: static CMOS gate computing C4 from Cin, p0..p3 and g0..g3]
[Rab96] p405
[©Hauck]
(slide 86)
CLA: Dynamic Logic Implementation
• Dynamic gate implementation:
 C4 = g3 + p3.(g2 + p2.(g1 + p1.(g0 + p0.Cin)))
 6 transistors in series
[Figure: clocked dynamic gate whose pull-down network is built from p0..p3, g0..g3 and Cin, with output C4]
[WE92] p529
[©Hauck]
CLA: Dynamic Logic Implementation
• Can we reuse logic?
 Can we get C1, C2 and C3 from the same circuit?
 No! C1, C2 and C3 may be floating (not precharged)
 Charge sharing problem
[Figure: the dynamic C4 gate with its internal nodes tapped as C1, C2 and C3]
[©Hauck]
CLA: Dynamic Logic Implementation
[Figure: four separate clocked dynamic gates, one each for C1, C2, C3 and C4, built from Cin, p0..p3 and g0..g3]
[WE92] p529
CLA: Basic Block (4 Bits) Architecture
• Block of 4-bit p, g, Cout
[Figure: four p,g cells fed by A3..A0 and B3..B0 drive a carry generator that takes C0 and p0..p3, g0..g3 and produces C1..C4; the block outputs sums S3..S0]
CLA: N-Bit Architecture
• Put it all together:
[Figure: 4-bit groups of p,g cells (A3..A0, B3..B0 and A7..A4, B7..B4), each with its own Carry Generator; the generators are chained C0, C4, C8; outputs S7..S0]
CLA: 12-Bit Example
A = 1101 1001 1010
B = 0111 0110 1101
C0 = 0
[Figure: three 4-bit groups of p,g cells (bits 11-8, 7-4, 3-0), each with its own Carry Generator, chained through C4 and C8 to C12]
T | C12 S11-S8 | C8 S7-S4 | C4 S3-S0
0 |  0   0000  | 0  0000  | 0  0000
2 |  1   0100  | 0  1111  | 1  0111
3 |  1   0100  | 1  0000  | 1  0111
4 |  1   0101  | 1  0000  | 1  0111
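The block-to-block timing of this example can be sketched as follows. The model is an assumption, not the circuit: each 4-bit block is treated as producing its carry and sums in one step from whatever carry-in it currently sees (integer addition stands in for the lookahead logic), and the carry takes one further step to reach the next block. The names block_add and cla_trace are illustrative.

```python
def block_add(a4: int, b4: int, cin: int) -> tuple[int, int]:
    """Return (carry out, 4-bit sum) of one lookahead block (modeled with +)."""
    total = a4 + b4 + cin
    return total >> 4, total & 0xF


def cla_trace(a: int, b: int, blocks: int = 3) -> list[list[tuple[int, int]]]:
    """Return per-step block outputs, least significant block first.

    Row 0 is the first step with valid block outputs (T=2 on the slide);
    each later row lets the block carries advance by one block.
    """
    cins = [0] * blocks
    rows = []
    for _ in range(blocks):
        results = [
            block_add((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF, cins[k])
            for k in range(blocks)
        ]
        rows.append(results)
        # Carry out of block k becomes carry in of block k+1 on the next step.
        cins = [0] + [cout for cout, _ in results[:-1]]
    return rows
```

For A = 0xD9A and B = 0x76D the last row is (1, 0111), (1, 0000), (1, 0101), that is C12 = 1 and S = 0101 0000 0111. The number of steps grows with the number of blocks, so the delay is still linear in the word width.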
Summary: Carry Lookahead Adder
• CLA compared to ripple-carry adder:
 Faster (“4 times”?),
but delay still linear (w.r.t. # of bits)
 Larger area
o P, G signal generation
o Carry generation circuits
o Carry generation ckt for each bit position (no re-use)
• Limitation: cannot go beyond 4 bits of look-ahead
 Large p,g fan-out slows down carry generation
• Next: Manchester carry chains
 Tries to reuse logic by pre-charging each carry position
Outline
• One-bit adder, basic ripple-carry adder
• Carry-Lookahead adders (CLA)
• Manchester carry chain
• Carry bypass
• Carry select adder
• Brent-Kung adder
Recap: Carry Look-Ahead
• Charge sharing problem
[Figure: the dynamic C4 gate (p0..p3, g0..g3, Cin) with internal nodes tapped as C1?, C2? and C3?]
Manchester Carry Chain: First Shot
• Improvement over CLA:
 Precharge internal nodes to avoid charge-sharing problem
[Figure: Manchester carry chain from Cin to C4; precharged nodes C1..C4 are linked by pass transistors p0..p3 and have pull-downs g0..g3]
• Fastest way to do small adders
– 6 transistors on the critical path
[©Hauck]
Manchester Carry Chain: Sizing
[Figure: the chain modeled as an RC ladder: discharge transistor MC and pass transistors M0..M4, resistances R1..R6, node capacitances C1..C6, output Out]
$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$
[Figure: two plots against the sizing factor k (1 to 3): Speed (normalized by 0.69RC) and Area (in minimum size devices)]
(“k” is the sizing factor)
[© Prentice Hall]
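The delay formula can be evaluated numerically. The sketch assumes the caller supplies the resistance and capacitance of each stage in consistent units; chain_delay is an illustrative name, and the equal-R, equal-C case is only a convenient example, not a sizing from the slides.

```python
def chain_delay(resistances: list[float], capacitances: list[float]) -> float:
    """Evaluate t_p = 0.69 * sum_i C_i * (R_1 + ... + R_i).

    resistances[i] and capacitances[i] belong to stage i+1 of the chain.
    """
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r            # all resistance between the driver and this node
        total += c * upstream_r
    return 0.69 * total
```

With N identical stages (R = C = 1) the sum is N(N+1)/2, so four stages give 0.69 * 10 = 6.9 and eight stages give 0.69 * 36 = 24.84. The quadratic growth is the reason long chains need buffering.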
Manchester Carry Chain: An Improvement
• Problem: Cin arrives late => move it closer to the output
 Use bypass logic:
[Figure: Manchester chain with p0..p3 and g0..g3 driving C4, plus a separate series path of Cin, p0, p1, p2, p3 connected at the C4 output]
[©Hauck]
Manchester Carry Chain: the Improvement
• Direct implementation
[Figure: chain p0,g0 .. p3,g3 from Cin through C1, C2, C3 to C4]
• Carry bypass circuitry
[Figure: the chain with an added series path of Cin, p0, p1, p2, p3 driving C4]
• Advantages of the carry bypass circuitry
– Only 5 series transistors
– Less capacitance in internal nodes
– Cin close to the output
[©Hauck]
Manchester Carry Chain: Summary
• Compared to CLA:
 Smaller area
o Pre-charge internal nodes
o Reuse logic for intermediate carry signals
 Cin close to the output
• Carry chain can be any length
 Series propagate is slow (O(n^2) delay) => buffer every 4 bits
• Compact adder: good for up to 16 bits
• Using carries to compute sum slows down MCC
– Use two carry chains: one for sum, one for
carry propagation
[©Hauck]
Outline
• One-bit adder, basic ripple-carry adder
• Carry-Lookahead adders (CLA)
• Manchester carry chain
• Carry bypass
• Carry select adder
• Brent-Kung adder
Carry Bypass Adder: Idea
• The “bypass” idea is general
 Not just for Manchester carry chain
 The local carry chain could be “ripple carry adder”
• Structure
– Could be static, dynamic, pass transistor
– Carry and sum paths shown in different colors
– Bypass logic determines: “pass” or “kill/generate”?
[Figure: bypass block for bits i to i+k: Setup, Local Carry Chain and Sum stages, with a bypass path from Ci to Ci+k+1]
Carry Bypass Adder: Cell Examples
• Static implementation, using ripple carry adder
[Figure: local carry chain of four FA cells with a bypass multiplexer at the carry output]
• Dynamic, Manchester (mux=wire!)
[Figure: Manchester chain p0..p3, g0..g3 from Cin to C4, with a parallel path gated by p0.p1.p2.p3]
[Rab96] p398
Carry Bypass Adder: Cell Examples (cont.)
• Static (pass transistor logic), Manchester
T1=(p0.p1.p2).p3
T2=p3
T3=p0.p1.p2.p3
[Figure: pass-transistor Manchester chain from C0 to C4 using p0..p2, g0..g3 and the control signals T1, T2, T3]
[WE92] p531
Carry Bypass Adder: the Structure and Timing
• Timing (Critical path shown in different color):
1-Setup
2-Local carry generate/kill, MUX select line ready
3-C0-C16 carry propagate (if applicable)
[Figure: four bypass blocks (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15), each with Setup, Local Carry Chain and Sum stages, chained from C0]
[Rab96] p.399
Carry Bypass Adder: Timing of a Sub-block
• For an intermediate stage, after setup:
 If in pass mode
o Local carry vector computes intermediate carries (possibly incorrectly)
o At the same time, mux selection is set to pass
o When the input carry arrives, intermediate carries might be recomputed
o Meanwhile, the input carry is sent to Cout
 If not in pass mode (assume bit 10 generates)
o Local carry vector computes intermediate carries (bits 10, 11 correct)
o At the same time, mux selection is set to local
o Meanwhile, the output carry is sent to Cout correctly
o When the input carry arrives, intermediate carries C8 and C9 (S8, S9, S10) will be recomputed correctly
[Figure: the Bit 8-11 block (Setup, Local Carry Chain, Sum), shown once for each of the two cases]
Carry Bypass Adder: Timing
Delay = tsetup + max { tselect , 4 x tFA } + 3 x tmux_pass + 3 x tFA + tsum
[Figure: four bypass blocks (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15), each with Setup, Local Carry Chain and Sum stages, chained from C0]
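The delay expression can be parameterized to try other block sizes. The generalization below is an assumption: it reads the slide's constants as "ripple through the first block, pass the remaining blocks' muxes, ripple through all but one bit of the last block", which reproduces the 16-bit, 4-bit-block formula exactly. The function name, parameter names and unit default delays are illustrative.

```python
def bypass_delay(n_bits: int = 16, block: int = 4, t_setup: float = 1.0,
                 t_select: float = 1.0, t_fa: float = 1.0,
                 t_mux: float = 1.0, t_sum: float = 1.0) -> float:
    """Worst-case delay of a carry-bypass adder with equal-size blocks."""
    if n_bits % block != 0:
        raise ValueError("n_bits must be a multiple of block")
    n_blocks = n_bits // block
    return (t_setup
            + max(t_select, block * t_fa)   # first block: local carry or select
            + (n_blocks - 1) * t_mux        # carry passes the remaining muxes
            + (block - 1) * t_fa            # ripple inside the last block
            + t_sum)
```

With all delays set to 1, bypass_delay() gives 1 + 4 + 3 + 3 + 1 = 12, and bypass_delay(32, 4) gives 16: doubling the width adds only the four extra mux delays.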
Carry Bypass Adder: Pros and Cons
• Speed:
 Faster than ripple adder
 Still linear!
• Area overhead:
 Mux (setup?)
 Not worth it for small adders (N<8)
 10-20% for large adders
[Figure: propagation delay versus number of bits for the ripple adder and the bypass adder; the curves cross at about 4..8 bits]
[Rab96] p.399
Outline
• One-bit adder, basic ripple-carry adder
• Carry-Lookahead adders (CLA)
• Manchester carry chain
• Carry bypass
• Carry select adder
• Brent-Kung adder
Carry Select Adder: the Idea
• Similar to bypass
 Instead of “waiting” for the input carry, “precompute” the carry output
 Compute Ci+k for both cases Ci=0 and Ci=1
 When Ci arrives, select the appropriate result
 Sum computed in one step after the intermediate carry signals are ready
[Figure: k-bit carry-select block: Setup (p,g), 0-Carry propagation, 1-Carry propagation, multiplexers selected by Ci that output the carry vector and Ci+k, then Sum Generation]
[Rab96] p.400
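A behavioral sketch of the select idea, assuming k-bit blocks whose two speculative carry chains are plain ripple chains. The names ripple and carry_select_add are illustrative. In hardware both chains run in parallel; here they are simply computed one after the other.

```python
def ripple(a: int, b: int, width: int, cin: int) -> tuple[int, int]:
    """Ripple-carry add of two width-bit values; returns (sum, carry out)."""
    carry, total = cin, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai | bi))
    return total, carry


def carry_select_add(a: int, b: int, n: int, k: int, c0: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers using k-bit carry-select blocks."""
    result, carry = 0, c0
    for lo in range(0, n, k):
        width = min(k, n - lo)
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        sum0, cout0 = ripple(a_blk, b_blk, width, 0)   # assumes Ci = 0
        sum1, cout1 = ripple(a_blk, b_blk, width, 1)   # assumes Ci = 1
        # The real block carry-in acts only as a multiplexer select.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << lo
    return result, carry
```

Only the final selection depends on the incoming carry, so each block adds one multiplexer delay to the carry path instead of a full ripple.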
Linear Carry Select Adder: Structure
[Figure: four carry-select blocks (Bits 0-3, Bits 4-7, Bits 8-11, Bits 12-15), each with Setup, a 0-Carry chain, a 1-Carry chain, a multiplexer and a Sum stage; the multiplexers are chained C0, C4, C8, C12, C16]
[Rab96] p.401
Linear Carry Select Adder: Timing
[Figure: four carry-select blocks (Bits 0-3, Bits 4-7, Bits 8-11, Bits 12-15), each with Setup, 0-Carry, 1-Carry, multiplexer and Sum, chained C0, C4, C8, C12, C16, with the critical path marked]
Delay = 3 + 1 + 1 + 1 + 1 = 7 (16 bits)
[Rab96] p.401
Square Root Carry Select Adder: the Idea
• Later stages have to wait for the multiplexers in
the earlier stages
• Why not give them bigger chunks of data to
compute?
 Balances the delay paths
 Sub-linear delay (we will see why)
Square Root Carry Select Adder: the Structure
• Assuming the following delays:
 Setup=1, carry propagate=1/bit, mux=1
Block      | Width | Carry chains ready at
Bits 0-1   |   2   | 3
Bits 2-4   |   3   | 4
Bits 5-8   |   4   | 5
Bits 9-13  |   5   | 6
Bits 14-19 |   6   | 7
[Figure: five carry-select blocks of growing width with multiplexers chained C0, C2, C5, C9, C14, C19]
Delay from all paths = 8 (20 bits)
[Rab96] p.402
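The balancing argument can be checked with a small timing model built on the stated unit delays (Setup=1, carry propagate=1/bit, mux=1). The sketch assumes the block carry-in C0 is available at t=0 and that a multiplexer fires as soon as both its select and its data are ready; select_timing is an illustrative name.

```python
def select_timing(widths: list[int], t_setup: int = 1, t_bit: int = 1,
                  t_mux: int = 1) -> tuple[list[int], int]:
    """Return (times at which each block's two carry chains are ready,
    time at which the final carry out is ready)."""
    carry_in_ready = 0                      # C0 assumed available at t=0
    chain_ready = []
    for w in widths:
        local = t_setup + w * t_bit         # both speculative chains finish here
        chain_ready.append(local)
        # The mux needs both the local results and the incoming carry.
        carry_in_ready = max(local, carry_in_ready) + t_mux
    return chain_ready, carry_in_ready
```

For widths [2, 3, 4, 5, 6] this gives chain times [3, 4, 5, 6, 7] and a final carry at t=8: every block finishes exactly when its carry-in arrives, so no path waits on another.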
Square Root Carry Select Adder: Delay
• Assume
 N-bit adder
 P stages (delay directly depends on P)
 First stage computes M bits
$N = M + (M+1) + (M+2) + \cdots + (M+P-1)$
$\;\;\; = MP + \frac{P(P-1)}{2}$
$\;\;\; = \frac{P^2}{2} + P\left(M - \frac{1}{2}\right)$
• For M<<N (e.g. N=64, M=2)
 The first term dominates => $N \approx P^2/2$ => $P \approx \sqrt{2N}$
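The stage count can also be found by direct summation and compared with the square-root estimate. The sketch assumes each stage is one bit wider than the previous one, as in the derivation; stages_needed is an illustrative name.

```python
import math


def stages_needed(n_bits: int, first_width: int) -> int:
    """Smallest P such that M + (M+1) + ... + (M+P-1) >= N."""
    p, covered = 0, 0
    while covered < n_bits:
        covered += first_width + p   # stage p (0-based) is M + p bits wide
        p += 1
    return p


def sqrt_estimate(n_bits: int) -> float:
    """The approximation P ~ sqrt(2N) from the dominant P^2/2 term."""
    return math.sqrt(2 * n_bits)
```

For N=64 and M=2 the exact count is 10 stages, against an estimate of about 11.3; for N=20 and M=2 it is 5 stages, matching the 20-bit structure. The estimate is on the high side because it drops the P(M - 1/2) term.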
Carry Select Adder: Trade-offs
• Area overhead:
 An additional carry path and a multiplexer (not the whole adder)
 About 30% more than a ripple-carry
• Delay
 Sub-linear (we can beat that too!)
[Figure: delay (0.0 to 40.0) versus number of bits (0 to 60) for the ripple adder, linear select and square root select adders]
[© Prentice Hall]
Outline
• One-bit adder, basic ripple-carry adder
• Carry-Lookahead adders (CLA)
• Manchester carry chain
• Carry bypass
• Carry select adder
• Brent-Kung adder
Binary Carry-Lookahead or Brent-Kung Adder
• Idea: use binary tree for carry propagation
 logarithmic delay
[Figure: combining A0..A7 into F with a linear chain, tp ~ N, versus a binary tree, tp ~ log2(N)]
[© Prentice Hall]
Brent-Kung Adder
• Basic component
[Figure: concatenation cell taking (gleft, pleft) from the MSB side and (gright, pright) from the LSB side, and producing (g, p)]
Concatenation
g = gleft + pleft • gright
p = pleft • pright
[©Hauck]
Brent-Kung Adder: Structure
• Define (Gi, Pi)
 generate and propagate for bit positions 0 through i
gi = Ai.Bi
pi = Ai ⊕ Bi
(G0, P0) = (g0, p0)
for i>0: (Gi, Pi) = (gi, pi) • (Gi-1, Pi-1)
= (gi, pi) • (gi-1, pi-1) • . . . • (g0, p0)
• Key to Brent-Kung adder: use a tree structure to perform concatenations
[Figure: binary tree over bit positions 7..0: groups 7-6, 5-4, 3-2, 1-0, then 7-4 and 3-0, then 7-0]
C5 ? No! Doesn’t know about C0-3 yet!
[©Hauck]
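A behavioral sketch of the tree, assuming the word width is a power of two and using pi = Ai xor Bi. The up-sweep builds the group terms (1-0, 3-0, 7-0, ...) and the down-sweep fills in the remaining prefixes, which answers the "C5?" question above. The names combine, brent_kung_prefix and brent_kung_add are illustrative.

```python
def combine(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int]:
    """Concatenation: g = gleft + pleft.gright, p = pleft.pright."""
    gl, pl = left
    gr, pr = right
    return gl | (pl & gr), pl & pr


def brent_kung_prefix(gp: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return [(G0, P0), ..., (Gn-1, Pn-1)] for a power-of-two length list."""
    n = len(gp)
    x = list(gp)
    d = 1
    while d < n:                              # up-sweep: 1-0, 3-0, 7-0, ...
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d *= 2
    d //= 2
    while d >= 1:                             # down-sweep: remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d //= 2
    return x


def brent_kung_add(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers; returns (n-bit sum, carry out)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two in this sketch")
    gp = [(((a >> i) & 1) & ((b >> i) & 1), ((a >> i) & 1) ^ ((b >> i) & 1))
          for i in range(n)]
    prefix = brent_kung_prefix(gp)
    carries = [c0] + [g | (p & c0) for g, p in prefix]   # carries[i] enters bit i
    total = 0
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i
    return total, carries[n]
```

Both sweeps take about log2(n) levels, so the carry delay grows logarithmically with the word width rather than linearly.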
Brent-Kung: the Complete Tree
[Figure: complete Brent-Kung tree for 8 bits, with inputs (g0,p0) .. (g7,p7) and outputs C0 .. C7]
tadd ~ log2 (N)
[© Prentice Hall]
Brent-Kung: Timing
[Figure: 16-bit Brent-Kung network with inputs x15..x0 at the top, outputs s15..s0 at the bottom, and six levels (1 to 6)]
[Par00] p.102
[©Oxford U Press]
Brent-Kung Adder: Summary
• Area
 On average, twice as large as ripple adder
 Layout of the cells is very compact
• Delay
 Logarithmic time
 Once carry signals are ready, sum bits are derived in constant time
 Good for wide adders
Comparing Adder Designs
[Figure: two plots against number of bits (0 to 20): Area (mm2) and tp(sec), each with curves for the static, mirror, manchester, bypass, select and Brent-Kung adders]
[© Prentice Hall]
Combining Different Adders
[Par00] p.103
[©Oxford U Press]
Combining Different Adders
• Two-level carry skip adder
 Delay = 8 cycles
 Number of bits: 30
[Figure: two-level carry-skip adder with blocks F down to A (bF..bA) and second-level skip logic S2, from Cin at t=0 to Cout at t=8; blocks are annotated {Tproduce, Tassimilate}: bF {8, 1}, bE {7, 2}, bD {6, 3}, bC {5, 4}, bB {4, 5}, bA {3, 8}]
[Par00] p.113
[©Oxford U Press]
(slide 126)
Combining Different Adders
[Figure: 64 Bit Adder. RA(63:24) and RB(63:24) feed a 40 Bit Carry Select Adder on the MSB side; RA(23:0) and RB(23:0) feed a 24 Bit Differential Carry Lookahead Adder on the LSB side, which supplies cout23. Outputs EA(23:0) and EA(63:24) go to the Data Cache, TLB and Compare blocks, with real_add(40:0) and hit/miss/data signals]
© Dan Stasiak, IBM Rochester, 2001
Combining Different Adders
[Figure: layout showing the 40 Bit Adder Section, labeled EA(24:63), and the 24 Bit Adder Section, labeled EA(0:23) & EA_L(0:23)]
© Dan Stasiak, IBM Rochester, 2001
Combining Different Adders
(Should appear before slide 126, the two-level carry skip adder)
• Ripple+skip adder: delay=8. Max adder width?
 Assume: p,g, ripple, skip signal, skipping: 1 unit delay
 Carry signals
o Pass mode: ready at time x through skip logic => limit # blocks
o Local gen mode: blocks can process y bits and still have time to deliver locally generated carry by time x for the next block.
 Sum signals
o If in local generation mode, y is OK
o If in pass mode, y not OK for left bits (e.g., bE receives cin at x=5, can process at most z=3 bits to meet the delay bound of 8 on the sum bits)
[Figure: single-level carry-skip adder with blocks bA through bG and skip logic S between Cin and Cout, annotated with block widths and carry arrival times]
[Par00] p.112
[©Oxford U Press]
CLA Static Logic: Trimmed Down
(Should appear before slide 86, CLA: Static Logic Implementation)
[Figure: the static CLA gate reduced to the C1 case, with inputs Cin, p0 and g0]
[Rab96] p405
[©Hauck]