C411L20Multiplier

advertisement
CMPEN 411
VLSI Digital Circuits
Spring 2009
Lecture 20: Multiplier Design
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003
J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp09 CMPEN 411 L20 S.1
Review: Basic Building Blocks

Datapath

Execution units
- Adder, multiplier, divider, shifter, etc.



Control


Finite state machines (PLA, ROM, random logic)
Interconnect


Register file and pipeline registers
Multiplexers, decoders
Switches, arbiters, buses
Memory

Caches (SRAMs), TLBs, DRAMs, buffers
Sp09 CMPEN 411 L20 S.2
The Binary Multiplication
1 0 1 0 1 0
1 0 1 1
x
Multiplicand
Multiplier
1 0 1 0 1 0
1 0 1 0 1 0
0 0 0 0 0 0
+
Partial products
1 0 1 0 1 0
1 1 1 0 0 1 1 1 0
Sp09 CMPEN 411 L20 S.3
Result
Multiply Operation

Multiplication is just a a lot of additions
N
multiplicand
multiplier
partial
product
array
N
can be formed in parallel
double precision product
2N
Sp09 CMPEN 411 L20 S.4
Multiplication Approaches

Right shift and add

Partial product array rows are accumulated from top to bottom
on an N-bit adder
- After each addition, right shift (by one bit) the accumulated partial
product to align it with the next row to add


Time for N bits
Tserial_mult = O(N Tadder) = O(N2) for a RCA
Making it faster

Use a faster adder

Use higher radix (e.g., base 4) multiplication – O(N/2 Tadder)
- Use multiplier recoding to simplify multiple formation (booth)


Sp09
Form the partial product array in parallel and add it in parallel
Making it smaller (i.e., slower)

Use serial-parallel mult

Use an array multiplier
- Very regular structure with only short wires to nearest neighbor
cells. Thus, very simple and efficient layout in VLSI Can be easily
CMPEN 411 L20 S.5
and efficiently pipelined
Serial-parallel multiplier structure
Sp09 CMPEN 411 L20 S.6
The Array Multiplier
X3
Z7
X2
X1
X0
Y1 Z0
X3
X2
X1
X0
HA
FA
FA
HA
X3
X2
X1
X0
FA
FA
FA
HA
X3
X2
X1
X0
FA
FA
FA
HA
Z6
Sp09 CMPEN 411 L20 S.7
Z5
Z4
Y3
Z3
Y2
Z2
Z1
Y0
Booth multiplier

Encoding scheme to reduce number of stages in
multiplication.

Performs two bits of multiplication at once—requires half
the stages.

Each stage is slightly more complex than simple
multiplier, but adder/subtracter is almost as small/fast as
adder.
Sp09 CMPEN 411 L20 S.11
Booth encoding

Two’s-complement form of multiplier:
y = -2nyn + 2n-1yn-1 + 2n-2yn-2 + ... (first bit is the sign bit)
(example, y=18=010010 y= -18 = 101110 )


Rewrite using 2a = 2a+1 - 2a:


y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...
Consider first two terms: by looking at three bits of y, we
can determine whether to add x, 2x to partial product.
Sp09 CMPEN 411 L20 S.12
Booth actions


y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...
Consider first two terms: by looking at three bits of y, we
can determine whether to add x, 2x to partial product.
yi yi-1 yi-2 increment
000
0
001
x
010
x
011
2x
100
-2x
101
-x
110
-x
111
0
Sp09 CMPEN 411 L20 S.13
Booth example

x = 1001 (910), y = 0111 (710).

P0 = 00000000

y3y2y1=011 y1y0y-1=11(0)

y1y0y-1 = 110, P1 = P0 - (1001) = 11110111

x shift left for 2 bits to be 100100

y3y2y1 = 011, P2 = P1 (10*100100) =
11110111+01001000 = 001111111 (6310)

An array multiplier needs N addtions, booth multiplier
needs only N/2 additions
Sp09 CMPEN 411 L20 S.14
Review: A 64-bit Adder/Subtractor
add/subt


Ripple Carry Adder (RCA)
built out of 64 FAs
Subtraction – complement
all subtrahend bits (xor
gates) and set the low
order carry-in
RCA

advantage: simple logic,
so small (low cost)

disadvantage: slow (O(N)
for N bits) and lots of
glitching (so lots of energy
consumption)
Sp09 CMPEN 411 L20 S.15
A0
1-bit
FA
C1
S0
A1
1-bit
FA
C2
S1
A2
1-bit
FA
C3
S2
B0
B1
B2
...

C0=Cin
C63
A63
B63
1-bit
FA
S63
C64=Cout
Booth structure
Sp09 CMPEN 411 L20 S.16
Wallace-Tree Multiplier
Partial products
6
5
4
3
First stage
2
1
0
6
5
4
(a)
5
4
1
0 Bit position
2
1
0
Final adder
3
FA
2
1
0
6
5
4
3
HA
(c)
Sp09 CMPEN 411 L20 S.17
2
(b)
Second stage
6
3
(d)
Wallace-Tree Multiplier
Full adder = (3,2) compressor
Sp09 CMPEN 411 L20 S.18
Making it Faster: Tree Multiplier Structure
0 D
Q (‘ier)
0 D
0 D
0 D (‘icand)
partial
product
array
reduction
tree
fast carry
propagate
adder (CPA)
Sp09 CMPEN 411 L20 S.19
mux
+
reduction
tree (log N)
+
CPA (log N)
P (product)
interconnect
multiple
forming
circuits
(4,2) Counter

Built out of two (3,2) counters (just FA’s!)


all of the inputs (4 external plus one internal) have
the same weight (i.e., are in the same bit position)
the internal carry output is fed to the next higher
weight position (indicated by the
)
(3,2)
(3,2)
Sp09 CMPEN 411 L20 S.20
Note: Two carry outs - one
“internal” and one “external”
Tiling (4,2) Counters

(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
Reduces columns four high to columns only two high


Tiles with neighboring (4,2) counters
Internal carry in at same “level” (i.e., bit position weight) as the
internal carry out
Sp09 CMPEN 411 L20 S.22
4x4 Partial Product Array Reduction

Fast 4x4 multiplication using (4,2) counters
 How would you
lay it out?
multiplicand
multiplier
multiplicand
multiplier
partial
product
array
reduced pp
array (to
CPA)
double
precision
product
Sp09 CMPEN 411 L20 S.24
five (4,2) counters
5-bit CPA
8-bit product
8x8 Partial Product Array Reduction
Wallace tree
multiplier

‘icand
‘ier
partial
product
array
two rows of
nine (4,2)
counters
reduced
partial
product
array
one row of
thirteen
(4,2)
counters
to a 13-bit fast CPA
Sp09 CMPEN 411 L20 S.25
An 8x8 Multiplier Layout

How should it be laid out?
multiplicand
multiplier
nine (4,2) counters
nine (4,2) counters
thirteen (4,2) counters
13-bit CPA
Sp09 CMPEN 411 L20 S.26
Multipliers —Summary
• Optimization Goals Different Vs Binary Adder
• Once Again: Identify Critical Path
• Other possible techniques
- Logarithmic versus Linear (Wallace Tree Mult)
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
Sp09 CMPEN 411 L20 S.32
Next Lecture and Reminders

Next lecture

Shifters, decoders, and multiplexers
- Reading assignment – Rabaey, et al, 11.5-11.6
Sp09 CMPEN 411 L20 S.33
Download