Arithmetic Building Blocks

Digital Integrated Circuits
A Design Perspective
Jan M. Rabaey
Anantha Chandrakasan
Borivoje Nikolic
Arithmetic Circuits
January, 2003
EE141 © Digital Integrated Circuits, 2nd
A Generic Digital Processor
Figure: block diagram of a generic digital processor with four blocks: Input-Output, Memory, Control, Datapath.
Building Blocks for Digital Architectures
Arithmetic unit
- Bit-sliced datapath (adder, multiplier, shifter, comparator, etc.)
Memory
- RAM, ROM, Buffers, Shift registers
Control
- Finite state machine (PLA, random logic)
- Counters
Interconnect
- Switches
- Arbiters
- Bus
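An Intel Microprocessor
Figure: integer execution unit with a 1000 um scale marker. Labeled elements: inputs a and b, 9-1 and 5-1 multiplexers, CARRYGEN, SUMGEN + LU, SUMSEL, a 2-1 multiplexer, REG, outputs sum and sumb (to cache); further labels g64, node1, ck1, s0, s1.
LU: Logical Unit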
Itanium has 6 integer execution units like this
Bit-Sliced Design
Figure: four bit slices (Bit 0 to Bit 3) between Data-In and Data-Out, each containing a register, adder, shifter and multiplexer, with shared Control lines.
Tile identical processing elements
Bit-Sliced Datapath
Figure: 64 bit slices (bit slice 0 to bit slice 63). Data enters from register files / cache / bypass and passes through multiplexers, shifter, adder stages 1 to 3 (separated by wiring and loopback buses) and sum select, then goes to register files / cache.
Itanium Integer Datapath
Fetzer, Orton, ISSCC’02
Adders
Full-Adder
Figure: full-adder block with inputs A, B, Cin and outputs Sum, Cout.
The Binary Adder
Figure: full-adder block with inputs A, B, Cin and outputs Sum, Cout.
$S = A \oplus B \oplus C_i = A\bar{B}\bar{C_i} + \bar{A}B\bar{C_i} + \bar{A}\bar{B}C_i + ABC_i$
$C_o = AB + BC_i + AC_i$
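The two full-adder equations can be checked exhaustively. The following is a minimal Python sketch, not part of the original slides; the function names `full_adder` and `sum_minterms` are chosen here for illustration.

```python
from itertools import product

def full_adder(a, b, ci):
    """Return (s, co) for one-bit inputs a, b and carry-in ci."""
    s = a ^ b ^ ci                       # S = A xor B xor Ci
    co = (a & b) | (b & ci) | (a & ci)   # Co = AB + BCi + ACi
    return s, co

def sum_minterms(a, b, ci):
    """Sum written as the four minterms of the sum-of-products form."""
    na, nb, nc = 1 - a, 1 - b, 1 - ci
    return (a & nb & nc) | (na & b & nc) | (na & nb & ci) | (a & b & ci)

for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    assert s == sum_minterms(a, b, ci)   # both forms of S agree
    assert 2 * co + s == a + b + ci      # (co, s) is the 2-bit count of ones
```

The last assertion states the arithmetic meaning of the cell: Co and S together encode the number of inputs that are 1.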
Express Sum and Carry as a function of P, G, D
Define three new variables that depend ONLY on A and B:
Generate (G) = $AB$
Propagate (P) = $A \oplus B$
Delete (D) = $\bar{A}\,\bar{B}$
$C_o(G, P) = G + P C_i$
$S(G, P) = P \oplus C_i$
Expressions for S and Co can also be derived from D and P.
Note: an alternate definition, Propagate (P) = $A + B$, is sometimes used.
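A short Python sketch, added here for illustration, shows how the carry and sum follow from these signals using the standard identities Co = G + P·Ci and S = (A xor B) xor Ci. The helper names `gpd` and `carry_and_sum` are assumptions of this sketch. It also checks that the alternate definition P = A + B yields the same carry.

```python
from itertools import product

def gpd(a, b):
    """Generate, propagate (XOR form) and delete for one bit position."""
    g = a & b
    p = a ^ b
    d = (1 - a) & (1 - b)
    return g, p, d

def carry_and_sum(a, b, ci, alternate_p=False):
    """Compute (co, s) from G and P; optionally use P = A + B for the carry."""
    g, p, _ = gpd(a, b)
    p_carry = (a | b) if alternate_p else p
    co = g | (p_carry & ci)
    s = p ^ ci  # the sum always needs the XOR form of propagate
    return co, s

for a, b, ci in product((0, 1), repeat=3):
    g, p, d = gpd(a, b)
    assert g + p + d == 1                      # exactly one of G, P, D is set
    assert carry_and_sum(a, b, ci) == carry_and_sum(a, b, ci, alternate_p=True)
    co, s = carry_and_sum(a, b, ci)
    assert 2 * co + s == a + b + ci
```

The alternate definition only differs from the XOR form when A = B = 1, and in that case G = 1 already forces the carry, which is why both give the same Co.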
The Ripple-Carry Adder
Worst-case delay is linear in the number of bits:
$t_d = O(N)$
$t_{adder} = (N-1)\,t_{carry} + t_{sum}$
Goal: make the carry path as fast as possible.
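As a behavioral illustration, the sketch below models an N-bit ripple-carry adder in Python and evaluates the delay expression. Bit lists are LSB first. The unit values for t_carry and t_sum, and the helper names, are assumptions made here, not figures from the slides.

```python
def to_bits(x, n):
    """Integer to list of n bits, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(b << i for i, b in enumerate(bits))

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Chain full adders; each stage waits for the previous carry."""
    s_bits, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ carry)
        carry = (a & b) | (b & carry) | (a & carry)
    return s_bits, carry

def ripple_delay(n, t_carry=1.0, t_sum=1.0):
    """Worst case: t_adder = (N - 1) * t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum

s, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s) + (c_out << 4), ripple_delay(4))
```

With 4-bit operands 11 and 6, the sum bits hold 1 and the carry-out supplies the weight-16 term, so the reconstructed result is 17.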
Complementary Static CMOS Full Adder
28 Transistors
Inversion Property
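Figure: two full-adder (FA) blocks with inputs A, B, Ci and outputs S, Co; inverting all inputs of a full adder inverts its outputs.
$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C_i})$
$\bar{C_o}(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C_i})$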
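The inversion property is easy to confirm by enumeration. This Python sketch is an illustration added here; `full_adder` and `holds_inversion_property` are names assumed for the example.

```python
from itertools import product

def full_adder(a, b, ci):
    """Return (s, co) for one-bit inputs."""
    return a ^ b ^ ci, (a & b) | (b & ci) | (a & ci)

def holds_inversion_property():
    """Check that inverting all inputs inverts both outputs."""
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - ci)
        if (1 - s, 1 - co) != (s_inv, co_inv):
            return False
    return True

print(holds_inversion_property())
```

Because the property holds for every input combination, alternate stages of a ripple chain can work on inverted signals, which removes the need for an inverter in each carry stage.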
Minimize Critical Path by Reducing Inverting Stages
Exploit Inversion Property
A Better Structure: The Mirror Adder
Figure: transistor-level schematic of the mirror adder with inputs A, B, Ci and outputs Co, S. Branches are labeled Kill, "0"-Propagate, "1"-Propagate and Generate.
24 transistors
Mirror Adder
Stick Diagram
Figure: stick diagram of the mirror adder cell between the VDD and GND rails, with inputs A, B, Ci and outputs Co, S.
The Mirror Adder
• The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carry-generation circuitry.
• When laying out the cell, the most critical issue is minimizing the capacitance at node Co. Reducing the diffusion capacitances is particularly important.
• The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell.
• The transistors connected to Ci are placed closest to the output.
• Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be minimal size.
Transmission Gate Full Adder
Figure: transmission-gate full adder schematic with three parts: Setup (producing P from A and B), Sum Generation (S from P and Ci), and Carry Generation (Co from P, A and Ci).
Manchester Carry Chain
Figure: two Manchester carry gate schematics connecting Ci to Co, controlled by the signals Pi, Gi and Di.
Manchester Carry Chain
Figure: four-bit Manchester carry chain with propagate signals P0 to P3, generate signals G0 to G3, carry input Ci,0 and carry nodes C0 to C3.
Manchester Carry Chain
Stick Diagram
Figure: stick diagram with a Propagate/Generate row and an Inverter/Sum row between the VDD and GND rails; signals Pi, Gi, Pi+1, Gi+1 and carries Ci-1, Ci, Ci+1.
Carry-Bypass Adder
Also called Carry-Skip.
Figure: a chain of four full adders (FA) with carry input Ci,0 and carries Co,0 to Co,3, shown without and with a bypass multiplexer at the output.
$BP = P_0 P_1 P_2 P_3$
Idea: if $P_0 P_1 P_2 P_3 = 1$, then $C_{o,3} = C_{i,0}$; otherwise the block "kills" or "generates" the carry.
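The bypass idea can be sketched behaviorally in Python. The function names below are assumptions of this example, and bit lists are LSB first. The check confirms that selecting the carry-in whenever all propagate signals are 1 gives the same block carry-out as rippling through every stage.

```python
from itertools import product

def ripple_carry_out(a_bits, b_bits, c_in):
    """Carry out of a block, computed stage by stage."""
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | ((a ^ b) & carry)
    return carry

def bypass_carry_out(a_bits, b_bits, c_in):
    """If every stage propagates (BP = 1), forward c_in directly."""
    bp = all(a ^ b for a, b in zip(a_bits, b_bits))
    if bp:
        return c_in  # the multiplexer selects the bypass path
    return ripple_carry_out(a_bits, b_bits, c_in)

def bypass_matches_ripple(width=4):
    """Exhaustively compare both computations for one block."""
    for bits in product((0, 1), repeat=2 * width + 1):
        a, b, c = bits[:width], bits[width:2 * width], bits[-1]
        if bypass_carry_out(a, b, c) != ripple_carry_out(a, b, c):
            return False
    return True

print(bypass_matches_ripple())
```

When BP = 0, at least one stage kills or generates, so the block carry-out does not depend on the incoming carry; this is why only the all-propagate case needs the shortcut.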
Carry-Bypass Adder (cont.)
$t_{adder} = t_{setup} + M\,t_{carry} + (N/M - 1)\,t_{bypass} + (M-1)\,t_{carry} + t_{sum}$
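To get a feel for this expression, the Python sketch below evaluates it next to the ripple-carry delay $(N-1)t_{carry} + t_{sum}$. It assumes that N is the total word length and M the number of bits per bypass block, and it uses unit values for every delay term; these values are illustrative assumptions, not measured data.

```python
def ripple_delay(n, t_carry=1.0, t_sum=1.0):
    """t_adder = (N - 1) * t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum

def bypass_delay(n, m, t_setup=1.0, t_carry=1.0, t_bypass=1.0, t_sum=1.0):
    """Carry-bypass delay for N bits split into blocks of M bits."""
    if n % m != 0:
        raise ValueError("n must be a multiple of m")
    return (t_setup + m * t_carry + (n // m - 1) * t_bypass
            + (m - 1) * t_carry + t_sum)

for n in (4, 8, 16, 32, 64):
    print(n, ripple_delay(n), bypass_delay(n, 4))
```

Under these unit-delay assumptions the bypass adder is slower for short words, because of its fixed overhead, and faster for long ones, since its delay grows with N/M rather than N.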
Carry Ripple versus Carry Bypass
Figure: propagation delay tp versus number of bits N for the ripple adder and the bypass adder; the crossover is marked at N = 4..8.
Carry-Select Adder
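Figure: one carry-select block. The Setup stage produces P and G; two carry-propagation chains run in parallel, one assuming carry-in "0" and one assuming carry-in "1"; a multiplexer controlled by Co,k-1 selects the carry vector and produces Co,k+3; Sum Generation follows.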
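A behavioral Python sketch of the carry-select scheme follows. It is an illustration with assumed helper names and a default block size of 4: each block computes its result for both possible carry-in values, and the real incoming carry only drives the selection.

```python
def to_bits(x, n):
    """Integer to list of n bits, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(b << i for i, b in enumerate(bits))

def ripple(a_bits, b_bits, c_in):
    """Plain ripple-carry addition of one block."""
    s_bits, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return s_bits, carry

def carry_select_add(a_bits, b_bits, c_in=0, block=4):
    """Add block by block, precomputing both carry-in cases."""
    s_bits, carry = [], c_in
    for k in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[k:k + block], b_bits[k:k + block]
        s0, c0 = ripple(a_blk, b_blk, 0)   # "0" carry propagation
        s1, c1 = ripple(a_blk, b_blk, 1)   # "1" carry propagation
        s_blk, carry = (s1, c1) if carry else (s0, c0)  # multiplexer
        s_bits.extend(s_blk)
    return s_bits, carry

s, c = carry_select_add(to_bits(200, 8), to_bits(100, 8))
print(from_bits(s) + (c << 8))
```

In hardware the two chains of every block run at the same time, so the only serial path after the first block is the chain of multiplexers.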
Carry Select Adder: Critical Path
Linear Carry Select
Figure: 16-bit linear carry-select adder with four 4-bit blocks (bits 0-3, 4-7, 8-11, 12-15). Each block has a Setup stage, a "0" carry chain, a "1" carry chain, a multiplexer and Sum Generation; the first multiplexer is driven by Ci,0 and the outputs are S0-3, S4-7, S8-11 and S12-15. Annotated unit delays: setup (1), carry chains (5), multiplexers (6), (7), (8), (9), final sum (10).
Square Root Carry Select
Figure: 20-bit square-root carry-select adder with blocks of increasing size (bits 0-1, 2-4, 5-8, 9-13, 14-19). Each block has a Setup stage, a "0" carry chain, a "1" carry chain, a multiplexer and Sum Generation; the first multiplexer is driven by Ci,0 and the outputs are S0-1, S2-4, S5-8, S9-13 and S14-19. Annotated unit delays: setup (1), carry chains (3), (4), (5), (6), (7), multiplexers (4), (5), (6), (7), (8), final sum (9).
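The stage numbers annotated in the linear and square-root carry-select figures, (10) at S12-15 and (9) at S14-19, can be reproduced with a simple unit-delay model. The Python sketch below is an illustration under stated assumptions: one time unit each for setup, each carry stage, each multiplexer and the sum; the function name is hypothetical.

```python
def carry_select_delay(block_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Unit-delay model of a carry-select adder.

    A block's two carry chains are ready at t_setup + size * t_carry.
    Its multiplexer fires once both the chains and the previous
    multiplexer output are available.
    """
    mux_out = 0
    for size in block_sizes:
        chains_ready = t_setup + size * t_carry
        mux_out = max(chains_ready, mux_out) + t_mux
    return mux_out + t_sum

print(carry_select_delay([4, 4, 4, 4]))     # linear, 16 bits
print(carry_select_delay([2, 3, 4, 5, 6]))  # square-root, 20 bits
```

With equal blocks, the later blocks finish their carry chains early and then wait for the multiplexer chain. Growing the block size by one bit per stage makes each chain finish just as the select signal arrives, so 20 bits take fewer stages than 16 bits in the linear arrangement.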
Adder Delays - Comparison
LookAhead - Basic Idea
C o k = f A k B k Co k – 1  = Gk + P kCo k – 1
Look-Ahead: Topology
Expanding Lookahead equations:
C o k = Gk + Pk Gk – 1 + Pk – 1Co k – 2 
All the way:
C o k = Gk + Pk  Gk – 1 + P k – 1  + P1 G0 + P0 Ci 0  
Logarithmic Look-Ahead Adder
Figure: two ways to compute F from inputs A0 to A7: a linear chain with $t_p \sim N$, and a tree with $t_p \sim \log_2(N)$.
Carry Lookahead Trees
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
$\quad = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$
Can continue building the tree hierarchically.
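The grouping into signals such as $G_{2:1}$ and $P_{2:1}$ corresponds to combining (G, P) pairs with an associative operator. As an illustration of building the tree hierarchically, here is a hedged Python sketch of a radix-2 Kogge-Stone style prefix adder; the operator name `dot` and the function name are assumptions of this example, and the code models behavior only, not the circuit.

```python
def dot(hi, lo):
    """Combine two adjacent groups: (G, P) = (G_hi + P_hi G_lo, P_hi P_lo)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def kogge_stone_add(a, b, n=16, c_in=0):
    """Add two n-bit integers using a log2(n)-level prefix computation."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    gp = list(zip(g, p))
    gp[0] = (g[0] | (p[0] & c_in), p[0])  # fold the carry-in into bit 0
    dist = 1
    while dist < n:  # each level doubles the span of every group
        gp = [dot(gp[i], gp[i - dist]) if i >= dist else gp[i]
              for i in range(n)]
        dist *= 2
    carries = [c_in] + [gp[i][0] for i in range(n)]  # carry into each bit
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[n]

s, c_out = kogge_stone_add(40000, 30000)
print(s + (c_out << 16))
```

After the last level, the group generate of bits i down to 0 equals the carry out of bit i, so all sixteen carries are available after four levels instead of sixteen ripple stages.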
Tree Adders
16-bit radix-2 Kogge-Stone tree
Figure: prefix tree with inputs (A0, B0) to (A15, B15) and outputs S0 to S15.
Tree Adders
16-bit radix-4 Kogge-Stone Tree
Figure: prefix tree with inputs (a0, b0) to (a15, b15) and outputs S0 to S15.
Sparse Trees
16-bit radix-2 sparse tree with sparseness of 2
Figure: prefix tree with inputs (a0, b0) to (a15, b15) and outputs S0 to S15.
Tree Adders
Brent-Kung Tree
Figure: prefix tree with inputs (A0, B0) to (A15, B15) and outputs S0 to S15.
Example: Domino Adder
Figure: two domino gates clocked by Clk, with inputs ai and bi. Propagate: $P_i = a_i + b_i$. Generate: $G_i = a_i b_i$.
Example: Domino Adder
Figure: two domino gates clocked by Clk_k that merge adjacent groups. Propagate: $P_{i:i-2k+1} = P_{i:i-k+1} P_{i-k:i-2k+1}$. Generate: $G_{i:i-2k+1} = G_{i:i-k+1} + P_{i:i-k+1} G_{i-k:i-2k+1}$.
Example: Domino Sum
Multipliers
The Binary Multiplication
$Z = X \cdot Y = \sum_{k=0}^{M+N-1} Z_k 2^k = \left( \sum_{i=0}^{M-1} X_i 2^i \right) \left( \sum_{j=0}^{N-1} Y_j 2^j \right) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} X_i Y_j 2^{i+j}$
with
$X = \sum_{i=0}^{M-1} X_i 2^i$
$Y = \sum_{j=0}^{N-1} Y_j 2^j$
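The double sum says that the product is the sum of all one-bit partial products $X_i Y_j$, each weighted by $2^{i+j}$. A minimal Python sketch, with a function name assumed for this example, makes that concrete:

```python
def multiply_partial_products(x, y, m, n):
    """Multiply an m-bit x by an n-bit y by summing X_i * Y_j * 2^(i+j)."""
    z = 0
    for i in range(m):
        for j in range(n):
            x_i = (x >> i) & 1
            y_j = (y >> j) & 1
            z += (x_i & y_j) << (i + j)  # AND gate output, shifted into place
    return z

print(multiply_partial_products(13, 11, 4, 4))
```

Each `x_i & y_j` term corresponds to one AND gate; a hardware multiplier differs mainly in how it adds these M·N bits together.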
The Array Multiplier
The MxN Array Multiplier: Critical Path
Figure: array of full adders (FA) and half adders (HA) with two marked paths, Critical Path 1 and Critical Path 2.
Carry-Save Multiplier
Figure: rows of half adders (HA) and full adders (FA) followed by a Vector Merging Adder.
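The carry-save principle can be sketched at word level in Python. This is an illustration with assumed function names, not a model of the exact cell array in the figure: each row reduces three operands to a sum vector and a carry vector without propagating carries, and a single final addition plays the role of the vector-merging adder.

```python
def carry_save_add(x, y, z):
    """3:2 compression: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z                             # bitwise sum
    c = ((x & y) | (y & z) | (x & z)) << 1    # bitwise carries, one place up
    return s, c

def carry_save_multiply(x, y, n):
    """Multiply x by an n-bit y, keeping the running total in carry-save form."""
    sum_vec, carry_vec = 0, 0
    for j in range(n):
        partial = (x << j) if (y >> j) & 1 else 0
        sum_vec, carry_vec = carry_save_add(sum_vec, carry_vec, partial)
    return sum_vec + carry_vec  # vector-merging adder

print(carry_save_multiply(13, 11, 4))
```

Since no row waits for a carry to ripple sideways, the only carry-propagating addition is the last one, and that adder can use any of the fast structures covered earlier.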
Multiplier Floorplan
Figure: multiplier floorplan with inputs X0 to X3 and Y0 to Y3 and outputs Z0 to Z7, built from HA multiplier cells, FA multiplier cells and vector-merging cells, each with carry (C) and sum (S) outputs.
X and Y signals are broadcast through the complete array.
Wallace-Tree Multiplier
Figure: two full-adder (FA) networks that reduce the inputs y0 to y5 to a carry output C and a sum output S, with carries Ci-1 entering and Ci leaving each full adder.
Multipliers: Summary
• Optimization goals differ from those of the binary adder
• Once Again: Identify Critical Path
• Other possible techniques
- Logarithmic versus Linear (Wallace Tree Mult)
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
Shifters
The Binary Shifter
Figure: bit-slice i of a one-bit shifter. The control signals Right, nop and Left connect the inputs Ai and Ai-1 to the outputs Bi and Bi-1.
The Barrel Shifter
Figure: barrel shifter array with data wires for inputs A0 to A3 and outputs B0 to B3, crossed by control wires Sh0 to Sh3.
Area Dominated by Wiring
4x4 barrel shifter
Figure: layout of a 4x4 barrel shifter with inputs A0 to A3, control lines Sh0 to Sh3 and an output buffer.
$Width_{barrel} \approx 2\,p_m\,M$
Logarithmic Shifter
Figure: logarithmic shifter with three cascaded stages controlled by the signal pairs Sh1, Sh2 and Sh4, inputs A0 to A3 and outputs B0 to B3.
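A logarithmic shifter decomposes the shift amount into its binary digits: the stages shift by 1, 2 and 4 positions, and each stage is either applied or bypassed. The Python sketch below is a behavioral illustration; the right-shift direction, the 8-bit default width and the function name are assumptions of this example.

```python
def logarithmic_shift_right(value, shift, width=8):
    """Shift right by 0..width-1 using stages of 1, 2, 4, ... positions."""
    if not 0 <= shift < width:
        raise ValueError("shift must be in range(width)")
    value &= (1 << width) - 1   # keep only the low `width` bits
    stage = 1
    while stage < width:
        if shift & stage:       # control bit Sh1, Sh2, Sh4, ...
            value >>= stage
        stage *= 2
    return value

print([logarithmic_shift_right(0b10110000, k) for k in range(8)])
```

A shift range of 0 to 7 needs only three stages, compared with one row per shift amount in a barrel shifter, at the cost of passing the data through the stages in series.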
0-7 bit Logarithmic Shifter
Figure: logarithmic shifter with inputs A0 to A3 and outputs Out0 to Out3.