EE 357 Lecture 1

EE 209
Multiplication Techniques
Digital System Design
• End our semester revisiting our key concept:
– In digital systems, algorithms can be
implemented in hardware, software, or a
combination of both
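[Figure: block diagram of a digital system. Sensors and analog inputs pass through analog-to-digital conversion (ADC) into digital processing; results leave as digital outputs or through digital-to-analog conversion (DAC) as analog outputs. The digital processing block has digital inputs, outputs, clock, and reset, and contains custom logic (ASIC or FPGA) and microprocessors (software executing on hardware) joined by interconnect.]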
Preparing for the Project
• You will be designing a hardware engine for
something "difficult/slow" to do in software
• Then integrate that hardware engine with a
processor core
MULTIPLICATION TECHNIQUES
• Array Multiplier (Combinational)
• Add and Shift Method (Sequential)
Unsigned Multiplication Review
• Same rules as decimal multiplication
• Multiply each bit of Q by M shifting as you go
• An m-bit * n-bit mult. produces an m+n bit result
(i.e. n-bit * n-bit produces 2*n bit result)
• Notice each partial product is a shifted copy of M or 0 (zero)
      1010     M (Multiplicand)
    * 1011     Q (Multiplier)
    ------
      1010     PP (Partial Products)
     1010_
    0000__
 + 1010___
 ---------
  01101110     P (Product)
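The paper-and-pencil procedure above can be sketched in C. This is only an illustration under stated assumptions: the function name `multiply_4x4` is hypothetical, and the 4-bit operand width is chosen to match the 1010 * 1011 example.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two unsigned 4-bit values by summing shifted partial products.
   Hypothetical helper; the 4-bit width mirrors the slide example. */
uint8_t multiply_4x4(uint8_t m, uint8_t q)
{
    assert(m < 16 && q < 16);              /* operands must fit in 4 bits */
    uint8_t product = 0;                   /* a 4-bit * 4-bit product fits in 8 bits */
    for (int i = 0; i < 4; i++) {
        /* Partial product i is a copy of M if Q[i] is 1, otherwise 0. */
        uint8_t pp = ((q >> i) & 1u) ? m : 0;
        /* Shift the partial product into place and accumulate it. */
        product = (uint8_t)(product + (pp << i));
    }
    return product;
}
```

Calling `multiply_4x4(0xA, 0xB)` reproduces the example product 01101110.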
Multiplication Techniques
• A multiplier unit can be
– Purely Combinational: Each partial product is
produced in parallel and fed into an array of
adders to generate the product
– Sequential and Combinational: Produce and add 1
partial product at a time (per cycle)
Combinational Multiplier
• Partial Product (PPi) Generation
– Multiply Q[i] * M
• if Q[i]=0 => PPi = 0
• if Q[i]=1 => PPi = M
– AND gates can be used to generate each partial
product
[Figure: four AND gates, one per bit of M, each with Q[i] as its other input. If Q[i]=0 the outputs are 0 0 0 0; if Q[i]=1 the outputs are M[3] M[2] M[1] M[0].]
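The AND-gate view of partial product generation can be modeled bit by bit. The sketch below assumes 4-bit operands, and the name `partial_product` is made up for illustration; each loop iteration stands in for one 2-input AND gate.

```c
#include <assert.h>
#include <stdint.h>

/* Generate partial product i by ANDing every bit of M with Q[i].
   Hypothetical helper; assumes 4-bit M and Q. */
uint8_t partial_product(uint8_t m, uint8_t q, int i)
{
    assert(m < 16 && q < 16 && i >= 0 && i < 4);
    uint8_t qi = (q >> i) & 1u;            /* the single multiplier bit Q[i] */
    uint8_t pp = 0;
    for (int bit = 0; bit < 4; bit++) {
        uint8_t m_bit = (m >> bit) & 1u;
        /* One AND gate: output is M[bit] when Q[i]=1, else 0. */
        pp |= (uint8_t)((m_bit & qi) << bit);
    }
    return pp;                             /* unshifted; the adder array supplies the shift */
}
```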
Combinational Multiplier
• Partial Products must be added together
• Combinational multipliers suffer from long
propagation delay through the adders
– propagation delay is proportional to the number
of partial products (i.e. number of bits of input)
and the width of each adder
Adder Propagation Delay
• Example: 1111 + 0001 on a 4-bit ripple-carry adder built from full adders (FA), with Ci = 0 into the least-significant FA
• The carry ripples one FA at a time from the least-significant bit toward the most-significant bit
• The sum outputs S[3:0] step through 1110, 1100, 1000, and finally 0000 (with Co = 1) as the carry moves up the chain
[Figure: chain of four full adders (inputs X, Y, Ci; outputs S, Co) shown at successive moments of the carry ripple.]
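A ripple-carry adder can be sketched in C as a loop over full adders, where each stage must wait for the carry of the stage before it. The function names here are hypothetical, and the model computes only the final settled values, not the intermediate glitches shown in the figures.

```c
#include <assert.h>
#include <stdint.h>

/* One full adder: sum and carry-out from x, y, and carry-in (each 0 or 1). */
void full_adder(unsigned x, unsigned y, unsigned ci, unsigned *s, unsigned *co)
{
    *s  = x ^ y ^ ci;
    *co = (x & y) | (x & ci) | (y & ci);
}

/* 4-bit ripple-carry add. Returns the 4-bit sum and stores the final carry
   in *cout. Hypothetical helper for illustration. */
uint8_t ripple_add4(uint8_t x, uint8_t y, unsigned *cout)
{
    assert(x < 16 && y < 16 && cout != 0);
    unsigned carry = 0;                    /* Ci of the least-significant FA is 0 */
    uint8_t sum = 0;
    for (int i = 0; i < 4; i++) {
        unsigned s, co;
        /* Stage i cannot finish until the carry from stage i-1 is known. */
        full_adder((x >> i) & 1u, (y >> i) & 1u, carry, &s, &co);
        sum |= (uint8_t)(s << i);
        carry = co;
    }
    *cout = carry;
    return sum;
}
```

For 1111 + 0001 the carry generated at bit 0 travels through every stage, which is the worst case for this structure.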
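Critical Path
• Critical Path = Longest possible delay path
• Assume tsum = 5 ns, tcarry = 4 ns
• Carry outputs (Co) become valid at 4 ns, 8 ns, 12 ns, and 16 ns, moving from the least-significant FA to the most-significant FA
• Sum outputs (S) become valid at 5 ns, 9 ns, 13 ns, and 17 ns
• The critical path follows the carry chain and ends at the most-significant sum bit (17 ns)
[Figure: chain of four full adders annotated with these arrival times and the critical path.]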
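The arrival times in this example follow a simple pattern that can be written down directly. The sketch below assumes all X, Y, and the initial Ci inputs are valid at time 0, that tsum is the delay from any FA input to S, and that tcarry is the delay from any FA input to Co; the function names are hypothetical.

```c
#include <assert.h>

/* Time (ns) at which the carry-out of stage i (0 = LSB) becomes valid. */
int carry_ready_ns(int i, int tcarry)
{
    assert(i >= 0);
    return (i + 1) * tcarry;               /* one tcarry per stage in the chain */
}

/* Time (ns) at which the sum of stage i becomes valid: it waits for the
   carry-in from stage i-1, then needs tsum more. */
int sum_ready_ns(int i, int tsum, int tcarry)
{
    assert(i >= 0);
    return i * tcarry + tsum;
}

/* Critical path of an n-bit ripple-carry adder: the later of the last sum
   and the last carry-out. */
int critical_path_ns(int n, int tsum, int tcarry)
{
    assert(n >= 1);
    int last_sum = sum_ready_ns(n - 1, tsum, tcarry);
    int last_carry = carry_ready_ns(n - 1, tcarry);
    return last_sum > last_carry ? last_sum : last_carry;
}
```

With tsum = 5 and tcarry = 4, a 4-bit adder gives 17 ns, matching the figure.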
Combinational Multiplier
[Figures: array multiplier schematic, built up over several slides.]
Critical Paths
[Figure: array multiplier with two longest paths marked, Critical Path 1 and Critical Path 2.]
Combinational Multiplier Analysis
• Large Area due to (n-1) m-bit adders
– n-1 because the first adder adds the first two
partial products and then each adder afterwards
adds one more partial product
• Propagation delay is in two dimensions
– proportional to m+n
Sequential Multiplier
• Use 1 adder to add a single partial product per
clock cycle keeping a running sum
Add and Shift Method
• Sequential algorithm
• n-bit * n-bit multiply
• Adds 1 partial product per clock
• Shift running sum 1-bit right each clock
• Three n-bit Registers, 1 Adder
• At start:
– M = Multiplicand
– Q = Multiplier
– A = Answer => initialized to 0
• After completion
– A and Q concatenate to form 2n-bit answer
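Add and Shift Hardware
[Figure: add-and-shift datapath for 1010 (M) * 1011 (Q). An adder with Cin = 0 takes A and M as inputs; its Cout feeds the 1-bit C register and its sum feeds A. C, A, and Q form a right-shifting chain, with M = 1010 and Q initialized to 1011.]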
Add and Shift Algorithm
• C=0, A=0
• Repeat the following n-times
– If Q[0] = 0, A = A+0
Else if Q[0] = 1, A= A+M
– Shift right 1-bit (0→C→A→Q)
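The algorithm above can be sketched in C with one variable per register. This is a software model under assumptions, not the hardware itself: the width is fixed at n = 4 to match the examples, and the name `add_shift_multiply` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add-and-shift multiply of two unsigned 4-bit values using C, A, and Q.
   Hypothetical helper; n = 4 to match the slide examples. */
uint8_t add_shift_multiply(uint8_t m, uint8_t q)
{
    assert(m < 16 && q < 16);
    unsigned c = 0, a = 0;                 /* C=0, A=0 */
    unsigned qreg = q;                     /* Q = multiplier */
    for (int step = 0; step < 4; step++) { /* repeat n times */
        if (qreg & 1u) {                   /* Q[0] = 1: A = A + M */
            unsigned sum = a + m;          /* may need 5 bits */
            c = (sum >> 4) & 1u;           /* carry-out goes to C */
            a = sum & 0xFu;
        }                                  /* Q[0] = 0: A = A + 0, nothing changes */
        /* Shift right 1 bit: 0 -> C -> A -> Q */
        qreg = ((a & 1u) << 3) | (qreg >> 1);
        a = (c << 3) | (a >> 1);
        c = 0;
    }
    return (uint8_t)((a << 4) | qreg);     /* A and Q concatenate to form the product */
}
```

The C register matters when A + M overflows 4 bits, as in the third add of 1101 * 0101.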
Add and Shift Multiplication
• Example: 1010 (M) * 1011 (Q) = 01101110 (Ans)

C  A     Q     Operation
0  0000  1011  Initial values (M = 1010)
0  1010  1011  Add
0  0101  0101  Shift (1st bit of product now in Q)
0  1111  0101  Add
0  0111  1010  Shift (2nd bit of product now in Q)
0  0111  1010  No Add
0  0011  1101  Shift (3rd bit of product now in Q)
0  1101  1101  Add
0  0110  1110  Shift (final product)

• Finished: A,Q = 0110 1110 = 110 (decimal)
1101 * 0101 Example

C  A     Q     Description
0  0000  0101  Initial values (M = 1101)
0  1101  0101  A=A+M
0  0110  1010  Shift Right C,A,Q
0  0110  1010  A=A+0
0  0011  0101  Shift Right C,A,Q
1  0000  0101  A=A+M
0  1000  0010  Shift Right C,A,Q
0  1000  0010  A=A+0
0  0100  0001  Shift Right C,A,Q
Sequential Multiplier Analysis
• Pros:
– Smaller Area due to the use of only 1 adder
• Cons:
– Slow to execute (2 cycles per bit of the multiplier)
Let's Practice our Design Skills
• Break design into control and datapath
– This is the datapath
– 1 Adder
– 2-to-1 mux
– 2 shift registers (A/Q)
– 1 normal reg (M)
– 1 FF w/ Enable (C)
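[Figure: the add-and-shift datapath with registers C, A, Q, and M, the adder (Cin = 0, Cout into C), and the mux, shown with M = 1010 and Q = 1011.]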
State Machine Control
• From our high level datapath we can arrive at
a high-level state diagram
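[State diagram, summarized]
• WAIT (entered on reset / power on): C=A=0, M=Min, Q=Qin, CNT=0
• ADD: if q[0] = 0, C,A = A+0; if q[0] = 1, C,A = A+M
• SHIFT: C→A→Q, CNT=CNT+1
• DONE: DONE=1
• Transitions are labeled with START (leaving WAIT and DONE) and MAXCNT (leaving SHIFT)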
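The state diagram can be exercised with a small cycle-by-cycle model in C. Several details below are assumptions rather than facts from the slides: WAIT moves to ADD when START is 1, SHIFT moves to DONE once the count reaches n = 4 and otherwise returns to ADD, and DONE returns to WAIT when START is 0. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ST_WAIT, ST_ADD, ST_SHIFT, ST_DONE } state_t;

typedef struct {
    state_t state;
    unsigned c, a, q, m, cnt;              /* datapath registers and counter */
} mult_fsm_t;

/* Advance the model by one clock edge (transition polarities are assumed). */
void fsm_clock(mult_fsm_t *f, int start, unsigned min, unsigned qin)
{
    assert(f != 0);
    switch (f->state) {
    case ST_WAIT:                          /* C=A=0, M=Min, Q=Qin, CNT=0 */
        f->c = 0; f->a = 0; f->cnt = 0;
        f->m = min & 0xFu; f->q = qin & 0xFu;
        if (start) f->state = ST_ADD;
        break;
    case ST_ADD: {                         /* C,A = A+M or A+0 based on q[0] */
        unsigned sum = f->a + ((f->q & 1u) ? f->m : 0u);
        f->c = (sum >> 4) & 1u;
        f->a = sum & 0xFu;
        f->state = ST_SHIFT;
        break;
    }
    case ST_SHIFT:                         /* 0 -> C -> A -> Q, CNT=CNT+1 */
        f->q = ((f->a & 1u) << 3) | (f->q >> 1);
        f->a = (f->c << 3) | (f->a >> 1);
        f->c = 0;
        f->cnt++;
        f->state = (f->cnt == 4) ? ST_DONE : ST_ADD;
        break;
    case ST_DONE:                          /* DONE=1; wait for START to drop */
        if (!start) f->state = ST_WAIT;
        break;
    }
}

/* Run one multiply; returns the 8-bit product and counts ADD/SHIFT cycles. */
uint8_t fsm_multiply(unsigned m, unsigned q, int *cycles)
{
    assert(cycles != 0);
    mult_fsm_t f = { ST_WAIT, 0, 0, 0, 0, 0 };
    fsm_clock(&f, 1, m, q);                /* WAIT: load registers, see START */
    *cycles = 0;
    while (f.state != ST_DONE) {
        fsm_clock(&f, 1, m, q);
        (*cycles)++;
    }
    return (uint8_t)((f.a << 4) | f.q);
}
```

Under these assumptions a 4-bit multiply spends 8 cycles in ADD and SHIFT, which is the 2 cycles per multiplier bit noted in the analysis.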
Refining our Design
• But now we need to refine our design to
actual components, specific control bits, etc.
Sample Shift Register
• Shift registers come in many flavors; we'll look at one component we can use from Xilinx: SR4CLED
• 4-bit Bi-directional Shift
Register
– ACLR: asynchronous reset
– LD: Load/data enable
• Allows usage as register w/
enable
– CE: Must be 1 to shift, 0
means hold
– Left: 0 = Right, 1 = Left
– DSL and DSR
• Data to shift in from left or right
[Figure: SR4CLED symbol, a 4-bit bidirectional shift register with inputs DSR, D3, D2, D1, D0, DSL, LD, ACLR, CE, CLK, and LEFT, and outputs Q3, Q2, Q1, Q0.]

Xilinx: SR4CLED function table (Q* = next value of Q)

ACLR  CLK  LD  CE  LEFT  Q*[3:0]      (case)
1     X    X   X   X     0000         Reset
0     0,1  X   X   X     Q[3:0]
0     ↑    1   X   X     D[0:3]       Load
0     ↑    0   0   X     Q[3:0]       Hold
0     ↑    0   1   0     DSR,Q[3:1]   Right
0     ↑    0   1   1     Q[2:0],DSL   Left
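The function table translates almost line for line into a next-state function. The sketch below models only the table as shown in these slides, using the slide's signal names; it is not a vendor-supplied model, and the function name `sr4_next` and the `clk_edge` flag are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Next value of Q[3:0] according to the function table.
   clk_edge is 1 on a rising clock edge and 0 otherwise (an assumption of
   this model). Hypothetical helper, not a vendor primitive. */
uint8_t sr4_next(uint8_t q, int aclr, int clk_edge, int ld, int ce, int left,
                 uint8_t d, int dsr, int dsl)
{
    assert(q < 16 && d < 16);
    if (aclr) return 0;                    /* Reset: asynchronous, ignores the clock */
    if (!clk_edge) return q;               /* no rising edge: Q is unchanged */
    if (ld) return d;                      /* Load: takes priority over CE */
    if (!ce) return q;                     /* Hold */
    if (left)                              /* Left: Q[2:0],DSL */
        return (uint8_t)(((q << 1) & 0xFu) | (unsigned)(dsl & 1));
    return (uint8_t)(((unsigned)(dsr & 1) << 3) | (q >> 1));  /* Right: DSR,Q[3:1] */
}
```

Checking the cases in priority order (ACLR, then LD, then CE, then LEFT) mirrors the don't-care entries in the table.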
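Shift Registers
[Timing diagram: waveforms for CLK, ACLR, LD, CE, LEFT, DSR, DSL, D[3:0], and Q[3:0] across the operation sequence Hold, Hold, Load, Right, Hold, Right, Left, Left. Bus values shown: 1011, 1111, 0000, 1011, 0101, 1010, 0100, 1001.]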
Complete the DataPath
• Assume you build the state machine and produce 4 signals that tell us which state we are in:
– Qwait
– Qadd
– Qsh
– Qdone
[Figure: the datapath to complete, followed by the completed version. Components: two shift registers (A and Q), a D flip-flop with enable and clear (C), a register with enable (M), a 4-bit adder (inputs A[3:0] and B[3:0], C0 = 0, outputs S[3:0] and C4), a 2-to-1 mux choosing between 0000 and M to form PP[3:0], and a counter (CNTR) producing MAXCNT. The completed version wires Qin[3:0], Min[3:0], S[3:0], A[0], Q[0], and the state signals QWait, QAdd, and QSh to the data and control inputs (LD, CE, EN, CLR).]
PROJECT
Square Root Finder
• Find the square root of x without using the sqrt function
• Pick a number, square it, and see if it is equal to x
• Use a binary search to narrow down the value you pick to square
[Figure: number line from 0 to +∞ illustrating a binary search for Sqrt(5), with the points 0, 1.25, 1.87, 2.5, 4, and 5 marked.]
Square Root Finder
• Given a 16-bit integer, X, in the range [0-64,000],
find the square root
– Note: The square root of 0-64,000 lies in the range
[0-256]
– We start our low and high bounds at 0 and 256, pick
the midpoint, square it and see how it compares to x
– We then adjust either the low or high guess
Algorithm Pseudocode
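// Integer square root by binary search.
// x is 16 bits; the return value fits in 8 bits.
// Named isqrt so it does not clash with the C library's sqrt.
int isqrt(int x)
{
  // hi starts at 256, which needs 9 bits; lo and guess fit in 8 bits.
  // Invariant: lo*lo <= x < hi*hi (holds for 1 <= x <= 64,000).
  int hi=256, lo=1;
  while(hi-lo > 1){
    int guess = (hi+lo)/2;           // midpoint of the current range
    if(guess*guess > x) hi = guess;  // guess too big: lower the ceiling
    else lo = guess;                 // guess fits: raise the floor
  }
  return lo;                         // floor of the square root
}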
Practice The Algorithm
X=38            X=64
Lo    Hi        Lo    Hi
1     256       1     256
1     128       1     128
1     64        1     64
1     32        1     32
1     16        1     16
1     8         8     16
4     8         8     12
6     8         8     10
6     7         8     9
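The practice tables can be regenerated by recording lo and hi after every iteration of the binary search. The sketch below is a variant written for that purpose; the name `isqrt_trace`, the history arrays, and the `MAX_STEPS` bound are all assumptions added for illustration.

```c
#include <assert.h>

#define MAX_STEPS 16   /* assumed bound; the search over [1, 256] needs far fewer rows */

/* Binary-search integer square root that also records each (lo, hi) pair,
   starting with the initial bounds. Hypothetical helper. */
int isqrt_trace(int x, int lo_hist[MAX_STEPS], int hi_hist[MAX_STEPS], int *rows)
{
    assert(x >= 1 && x <= 64000 && rows != 0);
    int hi = 256, lo = 1, n = 0;
    lo_hist[n] = lo; hi_hist[n] = hi; n++;     /* row 0: starting bounds */
    while (hi - lo > 1) {
        int guess = (hi + lo) / 2;             /* midpoint of the current range */
        if (guess * guess > x) hi = guess;     /* too big: move hi down */
        else lo = guess;                       /* fits: move lo up */
        assert(n < MAX_STEPS);
        lo_hist[n] = lo; hi_hist[n] = hi; n++; /* one table row per iteration */
    }
    *rows = n;
    return lo;
}
```

Running it for x = 38 and x = 64 yields the rows listed in the tables, ending at lo = 6 and lo = 8 respectively.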
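Sqrt Datapath
[Figure: datapath to complete, with hi and lo registers (loaded by Load_Hi and Load_Lo), an adder (+), a divide-by-2 (÷2), a squaring unit (^2), a > comparison involving x, and an = 1 check that produces Done. Two boxes marked ? are left to be filled in.]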