VLSI Arithmetic Adders & Multipliers

VLSI Arithmetic
Adders & Multipliers
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Introduction
• Digital computer arithmetic belongs to computer architecture, but it is also an aspect of logic design.
• The objective of computer arithmetic is to develop algorithms that use the available hardware in the most efficient way.
• Ultimately, speed, power, and chip area are the most often used measures, which creates a strong link between the algorithms and the technology of implementation.
Oklobdzija 2004
Computer Arithmetic
Basic Operations
• Addition
• Multiplication
• Multiply-Add
• Division
• Evaluation of Functions
• Multi-Media
Addition of Binary Numbers
Full Adder. The full adder is the fundamental building block of most arithmetic circuits.
[Figure: full-adder block with inputs ai, bi, Cin and outputs si, Cout]
The sum and carry outputs are described as:
$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$
$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i \bar{c}_i + a_i b_i c_i = a_i b_i + a_i c_i + b_i c_i$
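The sum and carry expressions can be checked exhaustively against ordinary integer addition. The following is a minimal sketch; the function names `full_adder` and `check_full_adder` are illustrative and not part of the lecture.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one-bit inputs, using the sum-of-products forms."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    # Sum: exactly the four minterms with an odd number of ones.
    s = (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)
    # Carry: majority function of the three inputs.
    cout = (a & b) | (a & c) | (b & c)
    return s, cout

def check_full_adder() -> bool:
    """Compare the logic equations with a + b + c for all eight input combinations."""
    return all(
        full_adder(a, b, c) == ((a + b + c) % 2, (a + b + c) // 2)
        for a, b, c in product((0, 1), repeat=3)
    )

print(check_full_adder())
```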
Full-adder truth table:

Inputs        Outputs
ci  ai  bi    si  ci+1
0   0   0     0   0
0   0   1     1   0      Propagate
0   1   0     1   0      Propagate
0   1   1     0   1      Generate
1   0   0     1   0
1   0   1     0   1      Propagate
1   1   0     0   1      Propagate
1   1   1     1   1      Generate
Full-Adder Implementation
Full-adder operation is defined by the equations:
$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$
$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$
Carry-propagate: $p_i = a_i \oplus b_i$
Carry-generate: $g_i = a_i \cdot b_i$
A one-bit adder could be implemented as shown.
[Figure: one-bit adder with inputs ai, bi, cin and outputs si, cout]
High-Speed Addition
$c_{i+1} = g_i + p_i c_i$
$g_i = a_i \cdot b_i$
$p_i = a_i \oplus b_i$
$s_i = p_i \oplus c_i$
[Figure: one-bit adder with a multiplexer (inputs 0 and 1, select s) in the carry path from cin to cout]
A one-bit adder could be implemented more efficiently this way, because a MUX is faster.
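A small sketch comparing the gate form of the carry, g + p·c, with a multiplexer form. It assumes the multiplexer passes cin when p = 1 and g otherwise; the slide's figure is not recoverable, so this select convention and the function names are assumptions.

```python
from itertools import product

def pg_adder_bit(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit adder in propagate/generate form."""
    p = a ^ b                 # carry-propagate
    g = a & b                 # carry-generate
    s = p ^ cin               # s_i = p_i xor c_i
    cout = g | (p & cin)      # c_{i+1} = g_i + p_i c_i
    return s, cout

def mux_adder_bit(a: int, b: int, cin: int) -> tuple[int, int]:
    """Same function with the carry produced by a 2:1 multiplexer (assumed wiring)."""
    p = a ^ b
    g = a & b
    s = p ^ cin
    cout = cin if p else g    # p selects between the incoming carry and g
    return s, cout

def forms_agree() -> bool:
    return all(
        pg_adder_bit(a, b, c) == mux_adder_bit(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )

print(forms_agree())
```

The two forms agree because p and g are never 1 at the same time, so the OR in g + p·c reduces to a selection.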
The Ripple-Carry Adder
[Figure: four full adders (FA) in a chain; stage i takes Ai, Bi and Ci,i and produces Si and Co,i, with Co,0 = Ci,1]
Worst-case delay is linear in the number of bits:
$t_d = O(N)$
$t_{adder} \approx (N-1)\,t_{carry} + t_{sum}$
Goal: make the fastest possible carry-path circuit.
From Rabaey
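As an illustration of the ripple structure and the linear delay estimate, here is a minimal behavioral sketch. The helper names (`to_bits`, `from_bits`, `ripple_carry_add`, `rca_delay`) are illustrative, and the delay function simply evaluates the expression above for assumed values of t_carry and t_sum.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Split x into n bits, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))

def ripple_carry_add(a_bits: list[int], b_bits: list[int], cin: int = 0):
    """Chain of full adders: each stage waits for the carry of the previous one."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry

def rca_delay(n: int, t_carry: float, t_sum: float) -> float:
    """Worst-case delay estimate: (N - 1) carry delays plus one sum delay."""
    return (n - 1) * t_carry + t_sum

s_bits, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s_bits), cout)
print(rca_delay(32, 1.0, 1.0))
```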
Inversion Property
[Figure: a full adder (inputs A, B, Ci; outputs S, Co) and the same cell with all inputs and outputs inverted]
$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$
$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$
From Rabaey
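The inversion property can be confirmed by enumeration. This is a minimal sketch with illustrative names; it checks that inverting all three inputs inverts both outputs.

```python
from itertools import product

def fa(a: int, b: int, c: int) -> tuple[int, int]:
    """Full adder: returns (S, Co)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def inversion_property_holds() -> bool:
    """Check S(~A,~B,~Ci) == ~S(A,B,Ci) and Co(~A,~B,~Ci) == ~Co(A,B,Ci)."""
    for a, b, c in product((0, 1), repeat=3):
        s, co = fa(a, b, c)
        s_inv, co_inv = fa(1 - a, 1 - b, 1 - c)
        if s_inv != 1 - s or co_inv != 1 - co:
            return False
    return True

print(inversion_property_holds())
```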
Minimize Critical Path by Reducing Inverting Stages
[Figure: four-bit ripple-carry chain of FA’ cells, alternating even and odd cells]
Exploit the inversion property.
Note: need 2 different types of cells.
From Rabaey
Ripple Carry Adder
Carry chain of an RCA implemented using a multiplexer from the standard cell library:
[Figure: three adjacent bit slices (i, i+1, i+2) with inputs a, b and outputs s; the critical path runs along the carry chain from cin to cout]
Oklobdzija, ISCAS’88
Manchester Carry-Chain
Realization of the Carry Path
• Simple and very popular scheme for implementing the carry signal path
[Figure: Manchester carry chain from carry in to carry out; each stage has a generate device, a propagate device, and a predischarge & kill device]
Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers:
A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.
Manchester Carry Chain (CMOS)
• Implement P with pass-transistors
• Implement G with pull-up, kill (delete) with pull-down
• Use dynamic logic to reduce the complexity and speed up the circuit
[Figure: dynamic Manchester carry chain with carry input Ci,0, propagate signals P0 to P4, and generate signals G0 to G4]
Kilburn, et al, IEE Proc, 1959.
Pass-Transistor Realization in DPL
[Figure: DPL pass-transistor full adder. Sum path: XOR/XNOR, multiplexer, and buffer producing S from A, B, C. Carry path: AND/NAND and OR/NOR, multiplexer, and buffer producing CO.]
Carry-Skip Adder
MacSorley, Proc IRE 1/61
Lehman, Burla, IRE Trans on Comp, 12/61
Carry-Skip Adder
[Figure: two 4-bit chains of full adders with carries Ci,0 through Co,3; the second adds a bypass multiplexer controlled by BP = P0 P1 P2 P3]
From Rabaey
Idea: if (P0 and P1 and P2 and P3) = 1, then Co,3 = Ci,0; else “kill” or “generate”.
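The bypass idea can be sketched behaviorally: each block ripples internally, and a multiplexer forwards the block's carry-in when every bit propagates. The block width, function names, and bit ordering (LSB first) are assumptions made for illustration.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One block, LSB first. Returns (sum_bits, cout, bypass)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    sum_bits, carry = [], cin
    for pi, gi in zip(p, g):
        sum_bits.append(pi ^ carry)
        carry = gi | (pi & carry)        # ripple inside the block
    bypass = int(all(p))                 # BP = P0 P1 P2 P3
    cout = cin if bypass else carry      # multiplexer: skip or rippled carry
    return sum_bits, cout, bypass

def carry_skip_add(a, b, n=16, k=4, cin=0):
    """n-bit addition from n/k blocks (n is assumed to be a multiple of k)."""
    result, carry = 0, cin
    for lo in range(0, n, k):
        a_bits = [(a >> (lo + i)) & 1 for i in range(k)]
        b_bits = [(b >> (lo + i)) & 1 for i in range(k)]
        s_bits, carry, _ = carry_skip_block(a_bits, b_bits, carry)
        for i, s in enumerate(s_bits):
            result |= s << (lo + i)
    return result, carry

print(carry_skip_add(0x0FFF, 0x0001))
```

In this behavioral model the bypass never changes the result, since a block with all bits propagating passes its carry-in anyway; in hardware its purpose is to shorten the path.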
Carry-Skip Adder:
N-bits, k-bits/group, r=N/k groups
[Figure: N-bit carry-skip adder made of r groups of k bits; each group ripples internally and produces a group propagate Pj (AND) and a group generate Gj, and OR gates combine the skip and ripple paths from Cin to Cout]
Critical path delay = 2(k-1) + (N/k - 2)
Carry-Skip Adder
[Figure: propagation delay tp versus number of bits N for the ripple adder and the bypass adder; the curves cross at N of about 4..8]
$t_d = 2(k-1)\,t_{RCA} + \left(\frac{N}{k} - 2\right) t_{SKIP}$
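A short sketch of how the group size k trades ripple delay against skip delay. It assumes the critical path is a ripple through the first and last groups plus a skip over the r - 2 groups in between, with r = N/k, and unit values for t_RCA and t_SKIP; the candidate group sizes are arbitrary.

```python
def cska_delay(n: int, k: int, t_rca: float = 1, t_skip: float = 1) -> float:
    """Worst-case delay of an n-bit carry-skip adder with uniform k-bit groups."""
    groups = n // k                       # r = N/k (n is assumed to be a multiple of k)
    return 2 * (k - 1) * t_rca + (groups - 2) * t_skip

def best_group_size(n: int, candidates=(2, 4, 8, 16)) -> int:
    """Pick the candidate group size with the smallest modelled delay."""
    return min(candidates, key=lambda k: cska_delay(n, k))

for k in (2, 4, 8, 16):
    print(k, cska_delay(32, k))
print(best_group_size(32))
```

Under these assumptions a 32-bit adder is fastest with 4-bit groups, at 12 unit delays.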
Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
[Figure: carry signal path of the variable block adder from Cin to Cout; bit slices (ai, bi, Si) are grouped into blocks 0 to m with group signals G0..Gm and P0..Pm, and the carry either ripples within a block or skips over it]
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Block sizes: 1, 3, 4, 5, 6, 5, 4, 3, 1 (Δ = 9)
Any-point-to-any-point delay = 9 Δ, as compared to 12 Δ for CSKA.
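The 9 Δ and 12 Δ figures can be reproduced with a simple unit-delay model. The model is an assumption made for illustration: rippling across one bit costs 1 Δ, skipping a block costs 1 Δ, and the worst carry starts at the lowest bit of one block and ends at the highest bit of another. It is not the delay model of the original paper.

```python
def worst_case_carry_delay(blocks: list[int]) -> int:
    """Longest carry path, in unit delays, for a list of block sizes."""
    worst = max(size - 1 for size in blocks)      # carry stays inside one block
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            ripple_out = blocks[i] - 1            # from the LSB of block i to its carry-out
            skips = j - i - 1                     # blocks bypassed in between
            ripple_in = blocks[j] - 1             # into block j, up to its MSB
            worst = max(worst, ripple_out + skips + ripple_in)
    return worst

vba = [1, 3, 4, 5, 6, 5, 4, 3, 1]   # variable block sizes, 32 bits in total
cska = [4] * 8                      # uniform 4-bit groups, 32 bits in total

print(sum(vba), worst_case_carry_delay(vba))
print(sum(cska), worst_case_carry_delay(cska))
```

Small blocks at both ends and large blocks in the middle keep every ripple-skip-ripple path at or below 9 Δ in this model.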
Carry-chain block size determination for a
32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Delay Calculation for Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
[Figure: 4-bit block with carry input Ci,0, propagate signals P0 to P3, generate signals G0 to G3, block propagate BP, and carry output Co,3]
Delay model: [figure]
Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Variable Group Length
$t_d = c_1 + c_2 \sqrt{N + c_3}$
Oklobdzija, Barnes, Arith’85
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Variable Block Lengths
• No closed-form solution for delay
• It is a dynamic programming problem
Delay Comparison: Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
[Figure: delay (0 to 16) versus adder size N (4 to 60) for VBA, CLA, and multi-level VBA]
VLSI Arithmetic
Lecture 4
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Review
Lecture 3
Delay Comparison: Variable Block Adder
[Figure: delay (0 to 16) versus adder size N (4 to 60) for VBA (square-root dependency), CLA (log dependency), and multi-level VBA]
Circuit Issues
• Adder speed cannot be estimated based on:
– logic gates in the critical path
– number of transistors in the path
– logic levels in the path
• Estimating adder speed is much more complex, and many of the “fast” schemes may be misleading.
Fan-Out Dependency
Fan-In Dependency
This looks like “Logical Effort” (1985).
Delay Comparison: Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Carry-Lookahead Adder
(Weinberger and Smith, 1958)
ARITH-13: Presenting Achievement Award to Arnold Weinberger of IBM (who
invented CLA adder in 1958)
Ref: A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”,
National Bureau of Standards, Circ. 591, p.3-12, 1958.
CLA Definitions: One-bit adder
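$c_{i+1} = g_i + p_i c_i$
$g_i = a_i \cdot b_i$
$p_i = a_i \oplus b_i$
$s_i = p_i \oplus c_i$
[Figure: one-bit adder with inputs ai, bi, cin, outputs si, cout, and a multiplexer (inputs 0 and 1, select s) in the carry path]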
CLA Definitions: 4-bit Adder
[Figure: four cascaded one-bit cells for bits i to i+3, each producing g and p, with carries Ci to Ci+4]
$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$
$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i)$
$\quad = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$
Carry-Lookahead Adder: 4 bits
[Figure: four cascaded one-bit cells for bits i to i+3, each producing g and p, with carries Ci to Ci+4]
$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i)$
$\quad = g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$
$c_{i+4} = g_{i+3} + p_{i+3} c_{i+3} = g_{i+3} + p_{i+3}(g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i)$
$\quad = \underbrace{g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i}_{G_j} + \underbrace{p_{i+3} p_{i+2} p_{i+1} p_i}_{P_j}\, c_i$
Carry-Lookahead Adder
$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$
$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$
$c_{4(j+1)} = G_j + P_j c_{4j}$
[Figure: 4-bit P, G group with inputs ai..ai+3, bi..bi+3 and the group carry-in, producing the internal carries C4j+1, C4j+2, C4j+3, the group signals Gj and Pj, and C4(j+1)]
One gate delay Δ to calculate p, g.
One Δ to calculate P and two for G.
Three gate delays to calculate C4(j+1).
Compare that to 8 Δ in RCA!
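The group equations can be exercised with a behavioral 4-bit lookahead block. This is a minimal sketch with an illustrative function name; every carry is written as a flat expression of g, p, and the block carry-in, which is the point of lookahead.

```python
def cla_4bit(a: int, b: int, cin: int):
    """Add two 4-bit integers. Returns (sum, G, P, cout) for the group."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]
    # Carries into bits 0..3, each expanded so that no carry depends on another.
    c = [
        cin,
        g[0] | (p[0] & cin),
        g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin),
        g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin),
    ]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    cout = G | (P & cin)                  # group carry-out: G_j + P_j * carry-in
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, G, P, cout

print(cla_4bit(0b1010, 0b0101, 1))
```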
Carry-Lookahead Adder
(Weinberger and Smith)
$G^*_j = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i$
$P^*_j = P_{i+3} P_{i+2} P_{i+1} P_i$
$c_{4(j+1)} = G^*_k + P^*_k c_{4j}$
[Figure: second-level lookahead block combining (Gj, Pj) to (Gj+3, Pj+3) into G*, P* and producing C4j+1, C4j+2, C4j+3, and C4(j+1) from C4j]
Additional two gate delays.
C16 will take a total of 5 Δ vs. 32 Δ for RCA!
32-bit Carry Lookahead Adder
[Figure: 32-bit carry-lookahead adder with inputs ai, bi and carries Cin, C4, C8, C12, C16, C20, C24, C28, Cout, built from:
• individual adders generating gi, pi, and sum Si
• carry-lookahead blocks of 4 bits generating Gi, Pi, and Cin for the adders
• carry-lookahead super-blocks of 4-bit blocks generating G*i, P*i, and Cin for the 4-bit blocks
• a group producing the final carry Cout and C16]
Critical path delay = Δ (for gi, pi) + 2x2 Δ (for G, P) + 3x2 Δ (for Cin) + 1 XOR-Δ (for Sum) = approx. 12 Δ of delay
Carry-Lookahead Adder
(Weinberger and Smith: original derivation, 1958 )
Carry-Lookahead Adder
(Weinberger and Smith: original derivation )
Carry-Lookahead Adder (Weinberger and Smith)
Please notice the similarity with Parallel-Prefix Adders!
Motorola: CLA Implementation Example
A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”, Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.
Critical path in Motorola's 64-bit CLA
[Figure: 64-bit CLA built from PG blocks and a carry block, with bit signals P0/G0 to P63/G63, group signals such as P3:0/G3:0, P15:0/G15:0, P47:0/G47:0, and P63:0/G63:0, and carries C0 to C64; annotated times are 1.05, 1.7, 2.0, 2.35, 2.7, 3.75, and 4.8 nS]
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63
Motorola's 64-bit CLA: conventional PG Block
[Figure: conventional PG block; the carry ripples locally, with 5 transistors in the path]
No better situation here! Basically, this is MCC performance with Carry-Skip.
One should not expect any better results than VBA.
Motorola's 64-bit CLA: Modified PG Block
Intermediate propagate signals Pi:0 are generated to speed up C3; the critical path still resembles MCC.
Motorola's 64-bit CLA
[Figure: Motorola's 64-bit CLA with annotated times 1.8, 2.2, 2.9, 3.2, 3.55, and 3.9 nS]
[Figure: Motorola's 64-bit CLA (PG blocks and carry block) with all annotated times, from 1.05 nS to 4.8 nS]
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63
Delay Optimized CLA
B. Lee, V. G. Oklobdzija
Journal of VLSI Signal Processing, Vol.3, No.4, October 1991
Delay Optimized CLA: Lee-Oklobdzija ‘91
(a.) fixed groups and levels
(b.) variable-sized groups, fixed levels
(c.) variable-sized groups and fixed levels
(d.) variable-sized groups and levels
Two-Levels of Logic Implementation of the Carry Block
Two-Levels of Logic Implementation of the Carry-Lookahead Block
Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)
Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)
Delay Optimized CLA: Lee-Oklobdzija ‘91
Delay: Three-level BCLA
Delay: Two-level BCLA
Delay Optimized CLA: Lee-Oklobdzija ‘91
(a.) 2-level BCLA: Δ = 8.5nS
(b.) 3-level BCLA: Δ = 8.9nS
Ling’s Adder
Huey Ling, “High-Speed Binary Adder”
IBM Journal of Research and Development, Vol.25, No.3, 1981.
Used in: IBM 3033, IBM S370/168, Amdahl V6, HP etc.
Ling’s Derivations
Define:
$C_{i+1} = g_i + p_i C_i$
$H_{i+1} = C_{i+1} + C_i$
with $g_i = a_i b_i$.
$g_i$ implies $C_{i+1}$, which implies $H_{i+1}$; thus $g_i = g_i H_{i+1}$.

ai  bi  pi  gi  ti
0   0   0   0   0
0   1   1   0   1
1   0   1   0   1
1   1   0   1   1

Because $p_i g_i = 0$:
$p_i C_{i+1} = p_i g_i + p_i p_i C_i = p_i C_i$
$p_i C_i = p_i C_i + p_i C_{i+1} = p_i H_{i+1}$
Therefore:
$C_{i+1} = g_i + p_i C_i = g_i H_{i+1} + p_i C_i = g_i H_{i+1} + p_i H_{i+1} = t_i H_{i+1}$
[Figure: one-bit cell with inputs ai, bi, ci and outputs si, ci+1]
Ling’s Derivations
From:
$H_{i+1} = C_{i+1} + C_i$ and $C_{i+1} = g_i + p_i C_i$
it follows that:
$H_{i+1} = g_i + p_i C_i + C_i = g_i + C_i$
and, because $C_i = t_{i-1} H_i$:
$H_{i+1} = g_i + t_{i-1} H_i$ (fundamental expansion)
$C_{i+1} = t_i H_{i+1}$
Now we need to derive the sum equation.
Ling Adder
Variation of CLA:
$p_i = a_i \oplus b_i$
$g_i = a_i \cdot b_i$
$C_{i+1} = g_i + p_i C_i$
$S_i = p_i \oplus C_i$
Ling’s equations:
$t_i = a_i + b_i$
$g_i = a_i \cdot b_i$
$H_{i+1} = g_i + t_{i-1} H_i$
$S_i = t_i \oplus H_{i+1} + g_i t_{i-1} H_i$
Ling, IBM J. Res. Dev, 5/81
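Ling's recurrence and sum equation can be checked against ordinary addition with a bit-serial sketch. The boundary convention used here (t_{-1} = 1 and H_0 = c_in, so that C_0 = c_in) is an assumption chosen to start the recurrence; the function name is illustrative.

```python
def ling_add(a: int, b: int, n: int = 8, cin: int = 0):
    """n-bit addition through Ling's pseudo-carry H. Returns (sum, carry_out)."""
    t_prev, h = 1, cin                 # assumed convention: t_{-1} = 1, H_0 = c_in
    result = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t = ai & bi, ai | bi
        h_next = g | (t_prev & h)                  # H_{i+1} = g_i + t_{i-1} H_i
        s = (t ^ h_next) | (g & t_prev & h)        # S_i = t_i xor H_{i+1} + g_i t_{i-1} H_i
        result |= s << i
        t_prev, h = t, h_next
    return result, t_prev & h                      # C_n = t_{n-1} H_n

print(ling_add(200, 100))
```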
Ling Adder
Variation of CLA:
$C_{i+1} = g_i + g_i C_i + p_i C_i = g_i + (g_i + p_i) C_i$
$C_{i+1} = g_i + t_i C_i$
Ling’s equation:
$H_{i+1} = g_i + t_{i-1} H_i$
Ling uses a different transfer function. Four of those functions have the desired properties (Ling’s is one of them);
see: Doran, IEEE Trans on Comp. Vol 37, No.9 Sept. 1988.
Ling Adder
Conventional (fan-in of 5):
$C_4 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0 + t_3 t_2 t_1 t_0 C_{in}$
Ling (fan-in of 4):
$H_4 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 + t_2 t_1 t_0 t_{-1} C_{in}$
$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0 + t_2 t_1 t_0 C_{in}$
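A brute-force check that the simplified H4 expression, combined with C4 = t3·H4, gives the same carry as the conventional expansion. The function names are illustrative; the check covers all 4-bit operand pairs and both carry-in values.

```python
from itertools import product

def c4_conventional(g, t, cin):
    """Conventional carry: widest product term has 5 inputs."""
    return (g[3] | (t[3] & g[2]) | (t[3] & t[2] & g[1])
            | (t[3] & t[2] & t[1] & g[0])
            | (t[3] & t[2] & t[1] & t[0] & cin))

def h4_ling(g, t, cin):
    """Ling pseudo-carry: widest product term has 4 inputs."""
    return (g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])
            | (t[2] & t[1] & t[0] & cin))

def ling_matches_conventional() -> bool:
    for a, b in product(range(16), repeat=2):
        g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
        t = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]
        for cin in (0, 1):
            true_c4 = (a + b + cin) >> 4
            if c4_conventional(g, t, cin) != true_c4:
                return False
            if (t[3] & h4_ling(g, t, cin)) != true_c4:   # C4 = t3 * H4
                return False
    return True

print(ling_matches_conventional())
```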
Advantages of Ling’s Adder
• Uniform loading in fan-in and fan-out
• H16 contains 8 terms, as compared to G16, which contains 15.
• H16 can be implemented with one level of logic (in ECL), while G16 cannot.
(Ling’s adder takes full advantage of wired-OR, of special importance when ECL technology is used.)
VLSI Arithmetic
Lecture 5
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Review
Lecture 4
Advantages of Ling’s Adder
• Uniform loading in fan-in and fan-out
• H16 contains 8 terms, as compared to G16, which contains 15.
• H16 can be implemented with one level of logic (in ECL), while G16 cannot (with 8-way wired-OR).
(Ling’s adder takes full advantage of wired-OR, of special importance when ECL technology is used; his IBM limitation was a fan-in of 4 and a wired-OR of 8.)
Ling: Weinberger Notes
Advantage of Ling’s Adder
• 32-bit adder used in: IBM 3033, IBM S370/Model 168, Amdahl V6.
• Implements 32-bit addition in 3 levels of logic
• Implements 32-bit AGEN: B+Index+Disp in 4 levels of logic (rather than 6)
• 5 levels of logic for the 64-bit adder used in the HP processor
Implementation of Ling’s Adder in CMOS
(S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ‘96)
S. Naffziger, ISSCC’96
[Figures: circuit and block diagrams of the 64-b Ling adder]
$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$
$C_{i+1} = t_i H_{i+1}$
$C_{16} = p_{15} H_{16} = p_{15}(g_{15} + g_{11} + t_{11} g_7 + t_{11} t_7 g_0)$
Ling Adder Critical Path
Ling Adder: Circuits
[Figure: clocked (CK) dynamic circuits of the Ling adder: generation of G3, G4 and P2, P4 from inputs A0 to A3 and B0 to B3, the long-carry signals (LC, LCH, LCL), and the sum outputs SumH and SumL selected from C0H, C0L, C1H, C1L]
LCS4 – Critical G Path
[Figure: critical G path from in1 through a 4b stage ((k,p) or (g,p), producing P4, G3, G4), a 12b stage (C15), a 32b stage (C31, C47), and a 16b stage producing S48 to S63]
LCS4 – Logical Effort Delay
Prefix-4 Ling/Conditional-Sum (Dynamic - Long Carry Path)

Stage           Branch   LE     Parasitic
dg3# (dg3)      4.0      0.98   2.97
g4 (NAND2)      2.0      1.11   1.84
C15# (GG4)      1.0      1.01   1.80
C15 (INV)       1.0      1.00   1.00
C47# (LC)       3.0      1.03   3.32
C47 (INV)       1.0      1.00   1.00
C47#b (INV)     1.0      1.00   1.00
C47b (INV)      1.0      1.00   1.00
S63# (SUM)      16.0     0.86   1.36
S63 (INV)       1.0      1.00   1.00

Total Branch: 3.84E+02
Total LE: 9.73E-01
Path Effort: 3.74E+02
fo,opt: 1.81
Effort Delay: 66 ps
Parasitic Delay: 70 ps
Total Delay: 136 ps (7.2 FO4)
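The summary figures of the logical-effort table follow from the per-stage values. The sketch below recomputes them, assuming a path electrical effort of 1 (an assumption; it is the value that reproduces the slide's path effort) and the usual rule that the optimal stage effort is the N-th root of the path effort.

```python
import math

# (stage, branching effort, logical effort), copied from the slide's table
stages = [
    ("dg3#", 4.0, 0.98), ("g4", 2.0, 1.11), ("C15#", 1.0, 1.01),
    ("C15", 1.0, 1.00), ("C47#", 3.0, 1.03), ("C47", 1.0, 1.00),
    ("C47#b", 1.0, 1.00), ("C47b", 1.0, 1.00), ("S63#", 16.0, 0.86),
    ("S63", 1.0, 1.00),
]

def path_summary(stages, electrical_effort: float = 1.0):
    """Return (total branch, total LE, path effort, optimal stage effort)."""
    branch = math.prod(b for _, b, _ in stages)
    logical = math.prod(le for _, _, le in stages)
    path_effort = branch * logical * electrical_effort
    f_opt = path_effort ** (1.0 / len(stages))    # N-th root over the N stages
    return branch, logical, path_effort, f_opt

branch, logical, path_effort, f_opt = path_summary(stages)
print(f"{branch:.2E} {logical:.2E} {path_effort:.2E} {f_opt:.2f}")
```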
Results:
• 0.5u Technology
• Speed: 0.930 nS
• Nominal process, 80C, V=3.3V
See: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ‘ 96
Prefix Adders
and
Parallel Prefix Adders
from: Ercegovac-Lang
Prefix Adders
The following recurrence operation is defined:
$(g, p) \circ (g', p') = (g + p g',\; p p')$
such that:
$(G_i, P_i) = (g_0, p_0)$ for $i = 0$
$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$
$c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$
$c_1 = g_0 + p_0 c_{in}$, with $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$
This operation is associative, but not commutative.
It can also span a range of bits (overlapping and adjacent).
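A minimal sketch of the prefix operator and the serial recurrence it defines. The names `op` and `prefix_carries` are illustrative; the carry-in is folded in through the pair (g_{-1}, p_{-1}) = (c_in, c_in), as in the definition.

```python
from itertools import product

def op(left, right):
    """(g, p) o (g', p') = (g + p g', p p'); `left` is the more significant pair."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)

def prefix_carries(a: int, b: int, n: int, cin: int = 0) -> list[int]:
    """Return [c_1, ..., c_n], computed serially with the o operator."""
    pairs = [(((a >> i) & 1) & ((b >> i) & 1), ((a >> i) & 1) ^ ((b >> i) & 1))
             for i in range(n)]
    acc = (cin, cin)             # (g_{-1}, p_{-1}) = (c_in, c_in)
    carries = []
    for pair in pairs:
        acc = op(pair, acc)      # (G_i, P_i) = (g_i, p_i) o (G_{i-1}, P_{i-1})
        carries.append(acc[0])   # c_{i+1} = G_i
    return carries

def is_associative() -> bool:
    values = list(product((0, 1), repeat=2))
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x in values for y in values for z in values)

print(prefix_carries(0b1011, 0b0110, 4))
print(is_associative(), op((1, 0), (0, 0)) == op((0, 0), (1, 0)))
```

Associativity is what lets the serial loop be regrouped into a tree, which is the basis of the parallel-prefix adders.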
from: Ercegovac-Lang
Parallel Prefix Adders: variety of possibilities
from: Ercegovac-Lang
Pyramid Adder:
M. Lehman, “A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic
Units”, IFIP Congress, Munich, Germany, 1962.
Hybrid BK-KS Adder
Parallel Prefix Adders: S. Knowles 1999
operation is associative: h>i≥j≥k
operation is idempotent: h>i≥j≥k
produces carry: cin=0
Parallel Prefix Adders: Ladner-Fischer
Exploits associativity, but not idempotency.
Produces minimal logical depth.
Parallel Prefix Adders: Ladner-Fischer
(16,8,4,2,1)
Two wires at each level. Uniform fan-in of two.
Large fan-out (of 16; n/2) and large capacitive loading, combined with long wires in the last stages.
Parallel Prefix Adders: Kogge-Stone
Exploits idempotency to limit the fan-out to 1.
Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer.
Buffers are needed in both cases: K-S, L-F.
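A behavioral sketch of a Kogge-Stone prefix network, showing the log2(n) levels in which every bit position combines with the position 2^level to its right. Folding the carry-in into bit 0 and the function name are illustrative choices, not taken from the slides.

```python
def kogge_stone_carries(a: int, b: int, n: int, cin: int = 0):
    """Return ([c_1, ..., c_n], number of prefix levels)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    g[0] |= p[0] & cin                   # fold c_in into the bit-0 generate
    levels, d = 0, 1
    while d < n:
        g_new, p_new = g[:], p[:]
        for i in range(d, n):            # each node reads positions i and i - d
            g_new[i] = g[i] | (p[i] & g[i - d])
            p_new[i] = p[i] & p[i - d]
        g, p = g_new, p_new
        d *= 2
        levels += 1
    return g, levels                     # g[i] is the carry c_{i+1}

carries, levels = kogge_stone_carries(0xFFFF, 0x0001, 16)
print(carries, levels)
```

Every position has its own node at every level, which is where the large wire count comes from.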
Kogge-Stone Adder
Parallel Prefix Adders: Brent-Kung
• Sets the fan-out to one
• Avoids the explosion of wires (as in K-S)
• Makes no sense in CMOS:
– the fan-out = 1 limit is arbitrary and extreme
– much of the capacitive load is due to wire (anyway)
• It is more efficient to insert buffers in L-F than to use the B-K scheme
Brent-Kung Adder
Parallel Prefix Adders: Han-Carlson
• Is a hybrid synthesis of L-F and K-S
• Trades an increase in logic depth for a reduction in fan-out:
– effectively a higher-radix variant of K-S.
– others do it similarly by serializing the prefix computation at the higher fan-out nodes.
• Others similarly trade logical depth for a reduction of fan-out and wire.
Parallel Prefix Adders: variety of possibilities
from: Knowles
[Figure: family of prefix graphs bounded by L-F and K-S at the ends]
Parallel Prefix Adders: variety of possibilities
Knowles 1999
The following rules are used:
• Lateral wires at the jth level span 2^j bits
• Lateral fan-out at the jth level is a power of 2 up to 2^j
• Lateral fan-out at the jth level cannot exceed that at the (j+1)th level.
Parallel Prefix Adders: variety of possibilities
Knowles 1999
• The number of minimal-depth graphs of this type is given in: [table]
• At 4 bits there are only K-S and L-F; afterwards there are several new possibilities.
Parallel Prefix Adders: variety of possibilities
Knowles 1999
Example of a new 32-bit adder [4,4,2,2,1]
Parallel Prefix Adders: variety of possibilities
Knowles 1999
• Delay is given in terms of FO4 inverter delay, worst case (the nominal case is 40-50% faster)
• K-S is the fastest
• K-S adders are wire limited (requiring 80% more area)
• The difference between the examined schemes is less than 15%
Parallel Prefix Adders: variety of possibilities
Knowles 1999
Conclusion
• Irregular, hybrid schemes are possible
• The speed-up of 15% is achieved at the cost of large wiring, hence area and power
• Circuits close in speed to K-S are available at significantly lower wiring cost
VLSI Arithmetic
Lecture 6
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Review
Lecture 5
Two Parallel Prefix Adder Structures
Kogge-Stone:
• log(bits) carry stages
• Extra wiring
Han-Carlson:
• log(bits) + 1 carry stages
• Reduced wiring and gates
[Figure: 16-bit prefix graphs of both adders, with inputs (G1,P1) to (G4,P4) shown and carries C1 to C15 and Cout]
Possibilities for Further Research
• The logical depth is important (Knowles was right)
• The fan-out is less important than fan-in (Knowles was wrong):
– It is possible to examine a variety of topologies with restricted and varied fan-in.
• Driving strength and Logical Effort rules were overlooked, or at least neglected:
– It is possible to create a number of topologies taking LE rules into account.
– It is further possible to combine the rules with compound domino implementation, taking advantage of two different rules governing “dynamic” and “static”.
• It is still possible to produce a better adder!
Other Types of Adders
Conditional Sum Adder
J. Sklansky, “Conditional-Sum Addition
Logic”, IRE Transactions on Electronic
Computers, EC-9, p.226-231, 1960.
Conditional Sum Adder
[Figures: conditional-sum adder diagrams]
from: Ercegovac-Lang
Carry-Select Adder
O. J. Bedrij, “Carry-Select Adder”, IRE
Transactions on Electronic Computers, June
1962, p.340-34
Carry-Select Sum Adder
from: Ercegovac-Lang
Carry-Select Adder
Addition is performed under both assumptions, Cin = 0 and Cin = 1, and the correct result is selected.
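The select mechanism can be sketched behaviorally: each block computes two results in advance, and the real incoming carry only drives a multiplexer. The block width k, the word length n, and the function name are assumptions for illustration.

```python
def carry_select_add(a: int, b: int, n: int = 16, k: int = 4, cin: int = 0):
    """n-bit addition from k-bit blocks (n is assumed to be a multiple of k)."""
    mask = (1 << k) - 1
    result, carry = 0, cin
    for lo in range(0, n, k):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        sum0 = xa + xb          # block result assuming carry-in = 0
        sum1 = xa + xb + 1      # block result assuming carry-in = 1
        chosen = sum1 if carry else sum0     # multiplexer driven by the real carry
        result |= (chosen & mask) << lo
        carry = chosen >> k                  # selected carry-out feeds the next block
    return result, carry

print(carry_select_add(0xABCD, 0x1234))
```

In hardware the two block sums are computed in parallel, so the carry path through each block reduces to one multiplexer delay.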
Carry Select Adder:
combining two 32-b VBAs in select mode
Delay =VBA32+ MUX
Carry-Select Adder
O.J. Bedrij, IBM Poughkeepsie, 1962