ELECT 90X Programmable Logic Circuits: Multipliers
Dr. Eng. Amr T. Abdel-Hamid

Slides based on slides prepared by:
• B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
• I. Koren, Computer Arithmetic Algorithms, 2nd Edition, A.K. Peters, Natick, MA, 2002.

Fall 2009

Shift/Add Multiplication Algorithms

Notation for our discussion of multiplication algorithms:
  a    Multiplicand       a_{k-1} a_{k-2} ... a_1 a_0
  x    Multiplier         x_{k-1} x_{k-2} ... x_1 x_0
  p    Product (a × x)    p_{2k-1} p_{2k-2} ... p_3 p_2 p_1 p_0

Initially, we assume unsigned operands.

[Figure: Multiplication of two 4-bit unsigned binary numbers in dot notation. The multiplicand a and the multiplier x give the partial products x_0 a 2^0, x_1 a 2^1, x_2 a 2^2, and x_3 a 2^3, which form the partial-products bit-matrix whose sum is the product p.]

Multiplication Recurrence

Preferred: multiplication with right shifts (top-to-bottom accumulation)
  $p^{(j+1)} = \left(p^{(j)} + x_j a 2^k\right) 2^{-1}$   (add, then shift right)
  with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} 2^{-k}$

Multiplication with left shifts (bottom-to-top accumulation)
  $p^{(j+1)} = 2 p^{(j)} + x_{k-j-1} a$   (shift left, then add)
  with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} 2^{k}$

Examples of Basic Multiplication
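Both recurrences can be checked with a small Python sketch before tracing the examples by hand. It assumes unsigned operands given as Python ints of at most k bits; the function names are illustrative. Because Python integers are unbounded, the bits that a right shift moves into the lower half of the double-width product are simply kept: each intermediate value is $p^{(j)} = a\,(x \bmod 2^j)\,2^{k-j}$, so every shift is exact.

```python
def multiply_right_shift(a: int, x: int, k: int) -> int:
    """Unsigned multiply via p(j+1) = (p(j) + x_j * a * 2**k) / 2, with p(0) = 0."""
    p = 0
    for j in range(k):
        xj = (x >> j) & 1              # multiplier bits are used LSB first
        p = (p + xj * (a << k)) >> 1   # add x_j a 2^k, then shift right (the sum is even)
    return p                           # p(k) = a * x


def multiply_left_shift(a: int, x: int, k: int) -> int:
    """Unsigned multiply via p(j+1) = 2 p(j) + x_(k-j-1) * a, with p(0) = 0."""
    p = 0
    for j in range(k):
        bit = (x >> (k - j - 1)) & 1   # multiplier bits are used MSB first
        p = 2 * p + bit * a            # shift left, then add
    return p


# Worked example from the slides: a = 1010 (10), x = 1011 (11)
print(multiply_right_shift(0b1010, 0b1011, 4), multiply_left_shift(0b1010, 0b1011, 4))
print(format(multiply_right_shift(0b1010, 0b1011, 4), "08b"))
```

The first print shows 110 twice and the second shows 01101110, the final p(4) in both worked examples.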
Right-shift algorithm
============================
a           1 0 1 0
x           1 0 1 1
============================
p(0)        0 0 0 0
+x0a        1 0 1 0
----------------------------
2p(1)     0 1 0 1 0
p(1)        0 1 0 1 0
+x1a        1 0 1 0
----------------------------
2p(2)     0 1 1 1 1 0
p(2)        0 1 1 1 1 0
+x2a        0 0 0 0
----------------------------
2p(3)     0 0 1 1 1 1 0
p(3)        0 0 1 1 1 1 0
+x3a        1 0 1 0
----------------------------
2p(4)     0 1 1 0 1 1 1 0
p(4)        0 1 1 0 1 1 1 0
============================

Left-shift algorithm
============================
a                 1 0 1 0
x                 1 0 1 1
============================
p(0)              0 0 0 0
2p(0)           0 0 0 0 0
+x3a              1 0 1 0
----------------------------
p(1)            0 1 0 1 0
2p(1)         0 1 0 1 0 0
+x2a              0 0 0 0
----------------------------
p(2)          0 1 0 1 0 0
2p(2)       0 1 0 1 0 0 0
+x1a              1 0 1 0
----------------------------
p(3)        0 1 1 0 0 1 0
2p(3)     0 1 1 0 0 1 0 0
+x0a              1 0 1 0
----------------------------
p(4)      0 1 1 0 1 1 1 0
============================
Examples of sequential multiplication with right and left shifts.

Basic Hardware Multipliers
[Figure: Hardware realization of the sequential multiplication algorithm with additions and right shifts. The multiplier register x shifts right each cycle; its bit x_j controls a mux that selects 0 or the multiplicand a. A k-bit adder adds x_j a to the upper half of the double-width partial product p^(j), and the adder output, with its carry-out c_out, is shifted right back into the partial-product register.]

Multiplication of Signed Numbers
The first example shows sequential multiplication of 2's-complement numbers with right shifts (positive multiplier).
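For 2's-complement operands the right-shift recurrence still applies, provided each shift is arithmetic (sign-extending); when the multiplier is negative, its sign bit has weight $-2^{k-1}$, so the last step subtracts rather than adds. A minimal Python sketch of both worked cases that follow, assuming the operands are Python ints within the signed k-bit range (the function name is illustrative):

```python
def signed_right_shift_multiply(a: int, x: int, k: int) -> int:
    """Two's-complement k-bit multiply with right shifts.

    a and x are signed values in [-2**(k-1), 2**(k-1)). The sign bit x_(k-1)
    has weight -2**(k-1), so the last step subtracts x_(k-1) * a instead of adding it.
    """
    x_bits = x & ((1 << k) - 1)           # k-bit two's-complement pattern of x
    p = 0
    for j in range(k):
        xj = (x_bits >> j) & 1
        term = xj * (a << k)              # x_j a 2^k (negative when a < 0)
        if j == k - 1:
            term = -term                  # last step (sign bit): subtract rather than add
        p = (p + term) >> 1               # Python's >> on ints is an arithmetic shift,
                                          # so sign extension happens automatically
    return p


print(signed_right_shift_multiply(-10, 11, 5))    # -110
print(signed_right_shift_multiply(-10, -11, 5))   # 110
print(format(-110 & 0x3FF, "010b"))               # 1110010010, the 10-bit p(5) pattern
```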
Negative multiplicand, positive multiplier: no change, other than looking out for proper sign extension.
================================
a           1 0 1 1 0
x           0 1 0 1 1
================================
p(0)        0 0 0 0 0
+x0a        1 0 1 1 0
--------------------------------
2p(1)     1 1 0 1 1 0
p(1)        1 1 0 1 1 0
+x1a        1 0 1 1 0
--------------------------------
2p(2)     1 1 0 0 0 1 0
p(2)        1 1 0 0 0 1 0
+x2a        0 0 0 0 0
--------------------------------
2p(3)     1 1 1 0 0 0 1 0
p(3)        1 1 1 0 0 0 1 0
+x3a        1 0 1 1 0
--------------------------------
2p(4)     1 1 0 0 1 0 0 1 0
p(4)        1 1 0 0 1 0 0 1 0
+x4a        0 0 0 0 0
--------------------------------
2p(5)     1 1 1 0 0 1 0 0 1 0
p(5)        1 1 1 0 0 1 0 0 1 0
================================

The Case of a Negative Multiplier
Sequential multiplication of 2's-complement numbers with right shifts (negative multiplier).
Negative multiplicand, negative multiplier: in the last step (the sign bit), subtract rather than add.
================================
a           1 0 1 1 0
x           1 0 1 0 1
================================
p(0)        0 0 0 0 0
+x0a        1 0 1 1 0
--------------------------------
2p(1)     1 1 0 1 1 0
p(1)        1 1 0 1 1 0
+x1a        0 0 0 0 0
--------------------------------
2p(2)     1 1 1 0 1 1 0
p(2)        1 1 1 0 1 1 0
+x2a        1 0 1 1 0
--------------------------------
2p(3)     1 1 0 0 1 1 1 0
p(3)        1 1 0 0 1 1 1 0
+x3a        0 0 0 0 0
--------------------------------
2p(4)     1 1 1 0 0 1 1 1 0
p(4)        1 1 1 0 0 1 1 1 0
+(-x4a)     0 1 0 1 0
--------------------------------
2p(5)     0 0 0 1 1 0 1 1 1 0
p(5)        0 0 0 1 1 0 1 1 1 0
================================

Booth's Encoding
Recall the grade-school trick for multiplying by 9:
• Multiply by 10 (easy: just shift the digits left)
• Subtract once
E.g., 123454 × 9 = 123454 × (10 - 1) = 1234540 - 123454 = 1111086
This converts the addition of six partial products into one shift and one subtraction.
Booth's algorithm applies the same principle, except that there is no '9' in binary, just '1' and '0'. So it's actually easier!
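The grade-school trick carries over directly to binary: a run of 1s from bit i to bit j in the multiplier costs one subtraction and one addition of shifted multiplicands, because $2^j + \cdots + 2^i = 2^{j+1} - 2^i$. A small Python sketch of that idea (illustrative names; the multiplier x is assumed unsigned):

```python
def times_nine(m: int) -> int:
    """Grade-school trick: m * 9 = m * 10 - m (one digit shift and one subtraction)."""
    return m * 10 - m


def multiply_by_runs(m: int, x: int) -> int:
    """Multiply m by unsigned x with one subtraction and one addition per run of 1s in x.

    A run covering bits i..j contributes 2**j + ... + 2**i = 2**(j+1) - 2**i.
    """
    total, i = 0, 0
    while x >> i:                              # stop once no 1 bits remain at or above bit i
        if (x >> i) & 1:
            start = i                          # first 1 of the run: subtract m * 2**start
            while (x >> i) & 1:
                i += 1                         # skip the 1s in the middle of the run
            total += (m << i) - (m << start)   # after the last 1: add m * 2**i
        else:
            i += 1
    return total


print(times_nine(123454))              # 1111086
print(multiply_by_runs(13, 0b0110))    # 78, since 0110 = 8 - 2
```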
Booth's Encoding
Search for a run of '1' bits in the multiplier.
• E.g., '0110' has a run of two '1' bits in the middle.
• Multiplying by '0110' (6 in decimal) is equivalent to multiplying by 8 and subtracting 2 times the multiplicand, since 6 × m = (8 - 2) × m = 8m - 2m.
Hence, iterate right to left and:
• Subtract the multiplicand from the product at the first '1' of a run
• Add the multiplicand to the product after the last '1' of the run
• Do neither for the '1' bits in the middle of the run

Booth's Algorithm
Current bit   Bit to the right   Explanation             Example       Operation
1             0                  Begins run of 1s        00001111000   Subtract
1             1                  Middle of run of 1s     00001111000   Nothing
0             1                  End of a run of 1s      00001111000   Add
0             0                  Middle of a run of 0s   00001111000   Nothing

Booth's Encoding
Really just a new way to encode numbers:
• Normally, positions are weighted as 2^n
• With Booth, each position also carries a sign
• Can be extended to multiple bits

   0   1   1   0     Binary
  +1   0  -1   0     1-bit Booth
     +2      -2      2-bit Booth

Booth's Recoding
Radix-2 Booth's recoding
--------------------------------------------------
x_i  x_{i-1}  y_i   Explanation
--------------------------------------------------
 0      0       0   No string of 1s in sight
 0      1       1   End of string of 1s in x
 1      0      -1   Beginning of string of 1s in x
 1      1       0   Continuation of string of 1s in x
--------------------------------------------------

Example:
      1  0  0  1  1  1  0  1  1  0  1  0  1  1  1  0    Operand x
 (1) -1  0  1  0  0 -1  1  0 -1  1 -1  1  0  0 -1  0    Recoded version y
The parenthesized leading 1 is needed only when x is read as an unsigned number.

Justification: $2^j + 2^{j-1} + \cdots + 2^{i+1} + 2^i = 2^{j+1} - 2^i$

Example Multiplication with Booth's Recoding
The next example shows sequential multiplication of 2's-complement numbers with right shifts by means of Booth's recoding.
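The radix-2 recoding rule ($y_i = x_{i-1} - x_i$, with $x_{-1} = 0$) and the sequential multiplication it drives can be sketched in Python as follows. The function names are illustrative, and a and x are assumed to be k-bit 2's-complement values given as Python ints; the last call reproduces the recoded version y of the 16-bit example.

```python
def booth_recode(x_bits: int, k: int) -> list:
    """Radix-2 Booth recoding, y_i = x_(i-1) - x_i with x_(-1) = 0; digits LSB first.

    The digits' weighted sum is x read as a k-bit two's-complement number.
    """
    digits, prev = [], 0
    for i in range(k):
        xi = (x_bits >> i) & 1
        digits.append(prev - xi)   # -1 begins a run of 1s, +1 ends one, 0 otherwise
        prev = xi
    return digits


def booth_multiply(a: int, x: int, k: int) -> int:
    """Right-shift sequential multiplication driven by the Booth digits of x.

    Each step adds a, subtracts a, or adds nothing. No special last step is
    needed: y_(k-1) already carries the negative weight of the sign bit.
    """
    p = 0
    for yj in booth_recode(x & ((1 << k) - 1), k):
        p = (p + yj * (a << k)) >> 1   # arithmetic shift keeps the sign extension
    return p


print(booth_recode(0b10101, 5)[::-1])               # [-1, 1, -1, 1, -1]
print(booth_multiply(-10, -11, 5))                  # 110
print(booth_recode(0b1001110110101110, 16)[::-1])   # recoded version y, MSB first
```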
Multiplier y (Booth-recoded): -1 1 -1 1 -1 (for y_i = -1, the row added is -a = 01010)
================================
a           1 0 1 1 0
x           1 0 1 0 1
================================
p(0)        0 0 0 0 0
+y0a        0 1 0 1 0
--------------------------------
2p(1)     0 0 1 0 1 0
p(1)        0 0 1 0 1 0
+y1a        1 0 1 1 0
--------------------------------
2p(2)     1 1 1 0 1 1 0
p(2)        1 1 1 0 1 1 0
+y2a        0 1 0 1 0
--------------------------------
2p(3)     0 0 0 1 1 1 1 0
p(3)        0 0 0 1 1 1 1 0
+y3a        1 0 1 1 0
--------------------------------
2p(4)     1 1 1 0 0 1 1 1 0
p(4)        1 1 1 0 0 1 1 1 0
+y4a        0 1 0 1 0
--------------------------------
2p(5)     0 0 0 1 1 0 1 1 1 0
p(5)        0 0 0 1 1 0 1 1 1 0
================================

Radix-4 Multiplication in Dot Notation
[Figure: Radix-4, or two-bit-at-a-time, multiplication in dot notation. The four radix-2 partial products x_0 a 2^0 through x_3 a 2^3 are replaced by two partial products, (x_1 x_0)_two a 4^0 and (x_3 x_2)_two a 4^1, which sum to the product p.]
The number of cycles is halved, but now the "difficult" multiple 3a must be dealt with.

A Possible Design for a Radix-4 Multiplier
[Figure: The multiple generation part of a radix-4 multiplier with precomputation of 3a. The multiplier shifts 2 bits per cycle; bits x_{i+1} x_i = 00, 01, 10, 11 make a mux select 0, a, 2a, or 3a, respectively, for the adder.]
• 3a is precomputed via shift-and-add (3a = 2a + a)
• k/2 + 1 cycles, rather than k/2
• One extra cycle is not too bad, but we would like to avoid it if possible
• Solving this problem for radix 4 may also help when dealing with even higher radices

Example Radix-4 Multiplication Using 3a
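A minimal Python sketch of radix-4 multiplication with the precomputed multiple 3a, reproducing the example that follows (a = 0110, x = 1110). It assumes unsigned operands and an even k; the function name is illustrative.

```python
def radix4_multiply_3a(a: int, x: int, k: int) -> int:
    """Unsigned radix-4 right-shift multiplication with a precomputed 3a (k even).

    Each cycle retires two multiplier bits, so k/2 cycles follow the extra one
    spent forming 3a = 2a + a.
    """
    three_a = (a << 1) + a                   # precomputed once via shift-and-add
    multiples = (0, a, a << 1, three_a)      # mux inputs selected by (x_(i+1) x_i)_two
    p = 0
    for i in range(0, k, 2):
        digit = (x >> i) & 0b11
        p = (p + (multiples[digit] << k)) >> 2   # add, then shift right two positions
    return p


print(radix4_multiply_3a(0b0110, 0b1110, 4))                  # 84
print(format(radix4_multiply_3a(0b0110, 0b1110, 4), "08b"))   # 01010100
```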
================================
a                 0 1 1 0
3a            0 1 0 0 1 0
x                 1 1 1 0
================================
p(0)              0 0 0 0
+(x1x0)two a  0 0 1 1 0 0
--------------------------------
4p(1)         0 0 1 1 0 0
p(1)              0 0 1 1 0 0
+(x3x2)two a  0 1 0 0 1 0
--------------------------------
4p(2)         0 1 0 1 0 1 0 0
p(2)              0 1 0 1 0 1 0 0
================================
Example of radix-4 multiplication using the 3a multiple.

Modified Booth's Recoding
Radix-4 Booth's recoding yielding (z_{k/2} ... z_1 z_0)_four
------------------------------------------------------------------------
x_{i+1} x_i x_{i-1}   y_{i+1} y_i   z_{i/2}   Explanation
------------------------------------------------------------------------
   0     0     0         0     0       0      No string of 1s in sight
   0     0     1         0     1       1      End of string of 1s
   0     1     0         1    -1       1      Isolated 1
   0     1     1         1     0       2      End of string of 1s
   1     0     0        -1     0      -2      Beginning of string of 1s
   1     0     1        -1     1      -1      End a string, begin new one
   1     1     0         0    -1      -1      Beginning of string of 1s
   1     1     1         0     0       0      Continuation of string of 1s
------------------------------------------------------------------------

Example:
      1  0  0  1  1  1  0  1  1  0  1  0  1  1  1  0    Operand x
 (1) -1  0  1  0  0 -1  1  0 -1  1 -1  1  0  0 -1  0    Recoded version y
 (1)   -2     2    -1     2    -1    -1     0    -2    Radix-4 version z

Example Multiplication via Modified Booth's Recoding
================================
a             0 1 1 0
x             1 0 1 0
z              -1  -2    Radix-4
================================
p(0)      0 0 0 0 0 0
+z0a      1 1 0 1 0 0
--------------------------------
4p(1)     1 1 0 1 0 0
p(1)      1 1 1 1 0 1 0 0
+z1a      1 1 1 0 1 0
--------------------------------
4p(2)     1 1 0 1 1 1 0 0
p(2)      1 1 0 1 1 1 0 0
================================
Example of radix-4 multiplication with modified Booth's recoding of the 2's-complement multiplier.

Multiple Generation with Radix-4 Booth's Recoding
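Multiple generation under modified Booth recoding can be sketched in Python: the digits follow the table above, $z_{i/2} = -2x_{i+1} + x_i + x_{i-1}$, and each digit only ever needs 0, ±a, or ±2a. The Boolean definitions of neg, two, and non0 below are one possible derivation from that table and are an assumption, since the slides give only the signal names; all function names are illustrative, and k is assumed even.

```python
def booth_radix4_digits(x_bits: int, k: int) -> list:
    """Modified Booth recoding (k even): z = -2 x_(i+1) + x_i + x_(i-1), x_(-1) = 0."""
    def bit(n: int) -> int:
        return (x_bits >> n) & 1 if n >= 0 else 0
    return [-2 * bit(i + 1) + bit(i) + bit(i - 1) for i in range(0, k, 2)]


def recoding_signals(x_ip1: int, x_i: int, x_im1: int):
    """One possible derivation of (neg, two, non0) from x_(i+1) x_i x_(i-1)."""
    neg = x_ip1                                  # patterns 1xx give digits <= 0
    non0 = int(not (x_ip1 == x_i == x_im1))      # 000 and 111 give digit 0
    two = int(x_ip1 != x_i and x_i == x_im1)     # 011 -> +2, 100 -> -2
    return neg, two, non0


def select_multiple(a: int, neg: int, two: int, non0: int) -> int:
    """Mux enabled by non0 picks a or 2a; neg acts as the add/subtract control."""
    magnitude = (2 * a if two else a) if non0 else 0
    return -magnitude if neg else magnitude


def booth_radix4_multiply(a: int, x: int, k: int) -> int:
    """Two's-complement radix-4 multiply (k even); no 3a multiple is needed."""
    x_bits = x & ((1 << k) - 1)
    p, prev = 0, 0                               # prev holds x_(i-1), initially 0
    for i in range(0, k, 2):
        x_i, x_ip1 = (x_bits >> i) & 1, (x_bits >> (i + 1)) & 1
        multiple = select_multiple(a, *recoding_signals(x_ip1, x_i, prev))
        p = (p + (multiple << k)) >> 2           # add z a 2^k, shift right two positions
        prev = x_ip1
    return p


print(booth_radix4_digits(0b1010, 4)[::-1])      # [-1, -2]
print(booth_radix4_multiply(0b0110, -6, 4))      # -36
```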
[Figure: The multiple generation part of a radix-4 multiplier based on Booth's recoding. The multiplier register, with an extra bit initialized to 0 at its right end, shifts 2 bits per cycle. Recoding logic examines x_{i+1} x_i x_{i-1} and produces three signals: non0 enables a mux, two makes it select 2a rather than a (the multiplicand is extended to k + 1 bits by sign extension, not with 0s), and neg drives the add/subtract control, so that z_{i/2} a reaches the adder input.]

Operation counts for three examples:
  Count = 4 vs. 8: speed improvement
  Count = 7 vs. 9: no speed improvement
  Count = 16 vs. 8: speed worsened
On average, there is no improvement in speed.

Yet Another Design for Radix-4 Multiplication
[Figure: Radix-4 multiplication with two carry-save adders. Multiplier bit x_{i+1} makes a mux select 2a or 0, and bit x_i makes another mux select a or 0; two CSAs add these to the old cumulative partial product to form the new cumulative partial product, and a 2-bit adder with a carry FF sends 2 bits per cycle to the lower half of the partial product.]

Radix-8 and Radix-16 Multipliers
[Figure: Radix-16 multiplication with the upper half of the cumulative partial product in carry-save form. Multiplier bits x_{i+3}, x_{i+2}, x_{i+1}, and x_i make four muxes select 8a, 4a, 2a, and a (or 0); four CSAs combine these with the sum and carry of the partial product (upper half); the multiplier shifts right 4 bits per cycle, and a 4-bit adder with a carry FF sends 4 bits per cycle to the lower half of the partial product.]

A Spectrum of Multiplier Design Choices
[Figure: High-radix multipliers as intermediate between sequential radix-2 and full-tree multipliers. Basic binary: next multiple, adder, partial product. High-radix or partial tree: several multiples, small CSA tree, adder, partial product. Full tree: all multiples, full CSA tree, adder. Moving toward the full tree speeds up the design; moving toward basic binary economizes.]

Multibeat Multipliers
[Figure: Conceptual view of a twin-beat multiplier. (a) Sequential machine with FFs: inputs and the present state feed next-state logic, whose next-state excitation is captured by state flip-flops. (b) Sequential machine with latches and 2-phase clock: two banks of next-state logic and state latches, clocked by PH1 and PH2.]
Timing within one clock cycle: the FF contents begin changing, and only later does the change become visible at the FF output.
Observation: half of the clock cycle goes to waste.

Twin-Beat and Three-Beat Multipliers
[Figure: Twin-beat multiplier with radix-8 Booth's recoding. Twin multiplier registers feed two pipelined radix-8 Booth recoder-and-selector units, each working from the multiples a and 3a; two CSAs accumulate the partial product as sum and carry, and a 6-bit adder with a carry FF delivers 6 bits per cycle to the lower half of the partial product.]

Full-Tree Multipliers
[Figure: General structure of a full-tree multiplier. Multiple-forming circuits, driven by the multiplier and the multiplicand a, feed a partial-products reduction tree (multi-operand addition tree). The tree's redundant result goes to a redundant-to-binary converter, which produces the higher-order product bits; some lower-order product bits are generated directly.]

Full-Tree versus Partial-Tree Multiplier
[Figure: Schematic diagrams for full-tree and partial-tree multipliers. Full tree: all partial products enter a large tree of carry-save adders (log depth), followed by an adder that yields the product. Partial tree: several partial products at a time enter a small tree of carry-save adders (log depth), followed by an adder that yields the product.]

Variations in Full-Tree Multiplier Design
Designs are distinguished by variations in three elements:
1. Multiple-forming circuits
2. Partial-products reduction tree (multi-operand addition tree)
3. Redundant-to-binary converter

Example of Variations in CSA Tree Design
[Figure: Two different binary 4 × 4 tree multipliers, in dot notation. Both start from a partial-products bit-matrix with column heights 1, 2, 3, 4, 3, 2, 1.
  Wallace tree: 5 FAs + 3 HAs + 4-bit adder
  Dadda tree: 4 FAs + 2 HAs + 6-bit adder]

Binary Tree of 4-to-2 Reduction Modules
[Figure: Tree multiplier with a more regular structure based on 4-to-2 reduction modules arranged as a binary tree. Each 4-to-2 reduction module is implemented with two levels of (3; 2)-counters, i.e., two CSAs.]
Due to its recursive structure, a binary tree is more regular than a 3-to-2 reduction tree when laid out in VLSI.

Array Multipliers
[Figure: A basic array multiplier uses a one-sided CSA tree and a ripple-carry adder: the partial products x_0 a through x_4 a are added in by a chain of CSAs, one per level.]
[Figure: Details of a 5 × 5 array multiplier using FA blocks. Each cell receives a bit product a_i x_j; product bits p_0 through p_4 emerge along the side of the array, and p_5 through p_9 come from the ripple-carry adder.]

Array Multiplier Built of Modified Full-Adder Cells
[Figure: Design of a 5 × 5 array multiplier with two additive inputs and full-adder blocks that include AND gates.]
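The difference between the one-sided CSA chain of an array multiplier and a CSA tree shows up in the number of CSA levels before the final adder. A word-level Python sketch (illustrative names; unsigned operands, k ≥ 2) treats each CSA as a row of (3; 2)-counters applied bitwise. Rows are grouped word-wise, as in a Wallace-style tree; bit-level details such as the exact FA/HA counts of the Wallace and Dadda designs are not modeled.

```python
def csa(x: int, y: int, z: int):
    """Carry-save adder: a row of (3; 2)-counters applied bitwise, x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def partial_products(a: int, x: int, k: int) -> list:
    """The k rows x_j a 2^j of the unsigned partial-products bit-matrix."""
    return [(a << j) if (x >> j) & 1 else 0 for j in range(k)]


def reduce_array(rows: list):
    """One-sided CSA chain (array multiplier): each CSA folds in one more row."""
    s, c, levels = rows[0], rows[1], 0
    for row in rows[2:]:
        s, c = csa(s, c, row)
        levels += 1
    return s + c, levels               # final carry-propagate (ripple-carry) adder


def reduce_tree(rows: list):
    """CSA tree: each level turns every group of three rows into two."""
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(*rows[i:i + 3]))
        rows = nxt + rows[full:]       # 0-2 leftover rows wait for the next level
        levels += 1
    return sum(rows), levels           # final carry-propagate adder


pp = partial_products(200, 123, 8)
print(reduce_array(pp), reduce_tree(pp))   # (24600, 6) (24600, 4)
```

For 8 partial products the chain needs 6 CSA levels and the tree needs 4, which is the motivation for the logarithmic-depth trees above.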
Pipelined Array Multipliers
• With latches after every FA level, the maximum throughput is achieved
• Latches may be inserted after every h FA levels for an intermediate design
• Example: 3-stage pipeline
[Figure: Pipelined 5 × 5 array multiplier using latched FA blocks. The small shaded boxes are latches. Legend: latched FA with AND gate; FA; latch. Inputs a_4 ... a_0 and x_0 ... x_4; outputs p_9 ... p_0.]

Bit-Serial Multipliers
Bit-serial adder (LSB first):
[Figure: the operand streams ... x_2 x_1 x_0 and ... y_2 y_1 y_0 enter an FA; its carry-out is held in an FF and fed back as the next carry-in; the sum stream ... s_2 s_1 s_0 leaves.]
Bit-serial multiplier:
[Figure: the streams ... a_2 a_1 a_0 and ... x_2 x_1 x_0 enter a box marked "?" that produces ... p_2 p_1 p_0.]
(Must follow the k-bit inputs with k 0s; alternatively, view the product as being only k bits wide.)
What goes inside the box to make a bit-serial multiplier? Can the circuit be designed to support a high clock rate?

Semisystolic Serial-Parallel Multiplier
[Figure: Semi-systolic circuit for 4 × 4 multiplication in 8 clock cycles. The multiplicand a_3 a_2 a_1 a_0 enters in parallel; the multiplier x_0 x_1 x_2 x_3 enters serially, LSB-first; a row of FAs with sum and carry connections produces the product serially.]
This is called "semisystolic" because it has a large signal fan-out of k (k-way broadcasting) and a long wire spanning all k positions.

Systolic Retiming as a Design Tool
A semisystolic circuit can be converted to a systolic circuit via retiming, which involves advancing and retarding signals by means of delay removal and delay insertion in such a way that the relative timings of various parts are unaffected.
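Retiming can be stated compactly in the formulation of Leiserson and Saxe (a standard model, not taken from the slides): give each node v a lag r(v); an edge u → v that carried w delays then carries w + r(v) - r(u), and the retiming is legal only if no edge count becomes negative. The sketch applies that rule to a hypothetical instance of the CL/CR example; the node names and delay values are chosen for illustration only.

```python
def retime(delays: dict, lag: dict) -> dict:
    """Apply a retiming to a circuit graph.

    delays maps an edge (u, v) to its number of delay elements w(u, v);
    lag maps a node to its retiming value r(node) (missing nodes get 0).
    Uses the Leiserson-Saxe rule w_r(u, v) = w(u, v) + r(v) - r(u).
    """
    retimed = {}
    for (u, v), w in delays.items():
        w_r = w + lag.get(v, 0) - lag.get(u, 0)
        if w_r < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} would need {w_r} delays")
        retimed[(u, v)] = w_r
    return retimed


# Hypothetical instance: delays e, f on the inputs of CL and g, h on its outputs.
# r(CL) = d moves d delays from CL's outputs to its inputs.
e, f, g, h, d = 1, 1, 2, 1, 1
circuit = {("in1", "CL"): e, ("in2", "CL"): f, ("CL", "out1"): g, ("CL", "out2"): h}
new = retime(circuit, {"CL": d})
print(new)                                                  # e + d, f + d, g - d, h - d
print(new[("in1", "CL")] + new[("CL", "out1")] == e + g)    # path latency unchanged: True
```

Any path through CL keeps its total delay, which is why the relative timings of the rest of the circuit are unaffected; choosing d larger than min(g, h) would raise the error instead.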
[Figure: Example of retiming by delaying the inputs to CL and advancing the outputs from CL by d units. A cut separates subcircuits CL and CR. The original delays e and f on the inputs of CL become e + d and f + d, and the original delays g and h on its outputs become g - d and h - d.]

A First Attempt at Retiming
[Figure: A retimed version of our semisystolic multiplier. The original semisystolic circuit (multiplicand a_3 ... a_0 in parallel, multiplier x_0 ... x_3 serial in, LSB-first, product serial out) is shown together with the retimed circuit, in which cuts 1, 2, and 3 are marked.]

Deriving a Fully Systolic Multiplier
[Figure: A retimed version of our semisystolic multiplier. The semisystolic circuit is shown together with a fully systolic version that has the same parallel multiplicand input, serial LSB-first multiplier input, and serial product output.]
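The serial-parallel multiplier of these last slides can be checked with a behavioral Python model of its semisystolic version. The cell structure is an assumption consistent with the slides, not a transcription of the figure: cell i stores a_i, ANDs it with the broadcast multiplier bit x_t, and adds the sum that cell i + 1 produced in the previous cycle plus its own stored carry; cell 0's sum leaves as product bit p_t. Following the multiplier's k bits with k zeros, as the bit-serial slide requires, yields the 2k-bit product in 2k cycles (8 cycles for 4 × 4).

```python
def semisystolic_multiply(a: int, x: int, k: int) -> int:
    """Cycle-by-cycle model of a k-cell serial-parallel (semisystolic) multiplier."""
    a_bits = [(a >> i) & 1 for i in range(k)]   # multiplicand, parallel in
    sum_ff = [0] * k                            # sum FF of each cell (passed to cell i-1)
    carry_ff = [0] * k                          # carry FF of each cell (fed back to itself)
    product = 0
    for t in range(2 * k):
        xt = (x >> t) & 1 if t < k else 0       # serial multiplier bit, then k zeros
        new_sum, new_carry = [0] * k, [0] * k
        for i in range(k):
            sum_in = sum_ff[i + 1] if i + 1 < k else 0
            total = (a_bits[i] & xt) + sum_in + carry_ff[i]   # full adder
            new_sum[i], new_carry[i] = total & 1, total >> 1
        sum_ff, carry_ff = new_sum, new_carry   # clock edge: all FFs update together
        product |= sum_ff[0] << t               # product bit p_t leaves serially
    return product


print(semisystolic_multiply(0b1010, 0b1011, 4))   # 110, after 8 clock cycles
```

The broadcast of x_t to every cell in the same cycle is exactly the k-way fan-out that makes this version semisystolic; the retimed versions remove it without changing the computed product.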