International Journal of Engineering Trends and Technology (IJETT) – Volume 13 Number 7 – Jul 2014

An Efficient Multi-Mode Multiplier Design

Swati Joshi#1, Dr. Neelam Rup Prakash#2
#1 ME Research Scholar, EC Department, PEC University of Technology
#2 Supervisor, EC Department, PEC University of Technology

Abstract- This work combines the radix-4 modified Booth multiplier, which is known to provide higher speed than other multipliers, with a multi-precision control structure in an effort to improve performance. The multi-precision technique allows for flexible architectural solutions, where the variation in operand bit width can be used to decrease power dissipation and to increase the throughput of multiplications. The proposed multiplier can work at different levels of precision: one N-bit, one N/2-bit, two N/2-bit, one N/4-bit, two N/4-bit, three N/4-bit, or four N/4-bit operations (where N is equal to 16). This gives the designer the opportunity to build a system that can adapt to changing modes, such as low-power, high-throughput, or high-precision operation. The design is implemented in VHDL and simulated using the Cadence Incisive simulator. Synthesis of the design is carried out using the Cadence RTL Compiler.

Keywords— Modified Booth, multi-mode, precision, radix-4

I. INTRODUCTION

In today’s world of ever-increasing computational demands, complex mathematical operations play a key role in determining system performance. Multipliers used in DSP and multimedia applications require flexible processing ability, low power consumption and high performance, so their architecture is modified to meet all of these requirements. Recent research at the micro-architecture level aims at developing data path components that are capable of performing computations with variable operand size [2].
When choosing a multiplier for a digital system, the bit width of the multiplier must be at least as wide as the largest operand of the applications that are to run on that system [10]. Several studies of operand bit widths in integer applications on general-purpose microprocessors have shown that in more than 50% of the instructions both operands are narrower than the full bit width of the multiplier (henceforth called narrow-width operations) [5]. The bit width of the multiplier is therefore often much larger than that of its operands, which leads to excessive power dissipation and long delay [7-9]. This could partially be remedied by having several multipliers, each with a specific bit width, and using the multiplier with the smallest bit width that is large enough for the current multiplication. However, using several multipliers with different bit widths is not an efficient solution, because the scheme has several drawbacks [10-14]:

- The total area of the multipliers increases, since several multiplier units are used.
- Inactive multipliers add power overhead through static power dissipation.
- The use of several multipliers increases the fan-out of the signals that drive the multiplier inputs. Higher fan-out means longer delays and/or higher power dissipation.
- Multiplexers are needed to connect the active multiplier(s) to the result route. These multiplexers lie in the critical path, increasing total delay as well as power dissipation.

ISSN: 2231-5381

II. MODIFIED BOOTH MULTIPLIER

The original radix-4 Booth algorithm [1][15][16][18] is an efficient multiplication algorithm that reduces the number of partial products by a factor of two, which leads to a substantial reduction in power, delay and area. However, for the two’s-complement partial products to be added correctly, each partial-product row must be sign-extended to the width of the multiplier.
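The halving of the partial-product count and the need for sign extension can be illustrated with a small software model. The following is a minimal Python sketch, not the authors' VHDL design; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and operands are assumed to be two's-complement integers that fit in n bits.

```python
def booth_radix4_digits(multiplier: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier into n/2 digits in {-2..2}."""
    assert n % 2 == 0
    pattern = multiplier & ((1 << n) - 1)            # n-bit two's-complement pattern
    # bits[0] is the implicit bit x_(-1) = 0; bits[k + 1] holds x_k.
    bits = [0] + [(pattern >> k) & 1 for k in range(n)]
    digits = []
    for i in range(0, n, 2):                         # overlapping groups of 3 bits
        x_prev, x_cur, x_next = bits[i], bits[i + 1], bits[i + 2]
        digits.append(x_prev + x_cur - 2 * x_next)
    return digits


def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Sum the n/2 recoded partial-product rows, each sign-extended to 2n bits."""
    width = 2 * n
    mask = (1 << width) - 1
    acc = 0
    for row_index, digit in enumerate(booth_radix4_digits(multiplier, n)):
        row = (digit * multiplicand) << (2 * row_index)
        # Masking a signed Python int to 2n bits yields the sign-extended row.
        acc = (acc + (row & mask)) & mask
    # Interpret the 2n-bit sum as a signed value.
    return acc - (1 << width) if acc >> (width - 1) else acc


print(booth_radix4_digits(-7, 4))   # two digits for a 4-bit multiplier
print(booth_multiply(-7, 5, 4))
```

A 16-bit multiplier is recoded into 8 digits, so only 8 rows are summed instead of 16. In this model the sign extension is the `row & mask` step, which in hardware corresponds to repeating the sign bit of each row up to the full result width.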
Sign extension by repeating the MSB increases the loading on the logic gates that generate it and requires extra wiring, which can increase area, delay and power [6],[16]. To avoid sign-extending the rows of recoded partial products, the sign-extension prevention scheme presented in [4] is used. In this scheme all partial-product rows are first assumed to be negative, so the sign-extension bits of each row can be replaced by an equal number of constant 1’s. If a partial-product row turns out to be positive, a single 1 is added at the least significant position of its string of 1’s; the result is a string of 0’s plus a carry out of the top bit, which may be discarded. The large number of sign-extension bits in each partial product can therefore be replaced by an equal number of constant 1’s, as shown in Figure 1.1.

Figure 1.1 Booth encoded partial product with simplified sign extension

The bit that is added at the least significant position of the string of 1’s is determined by the circuit shown in Figure 1.2.

Figure 1.2 Circuit diagram for sign extension corrector bit

These constant bits can be taken out of the array by pre-computing their sum. The sequence shown in Figure 1.3 is the pre-computed sum of the constant 1’s in the MSB sign bits.

Figure 1.3 Pre-computed sum of constant 1's in the sign bits

In a typical radix-4 Booth-encoded multiplier design, each group of 3 bits is encoded into {-2, -1, 0, 1, 2}. Negative partial products must be two’s-complemented (i.e., inverted and incremented by 1). If the negative line is asserted, the partial product is inverted. The extra 1 can be added in the least significant column of the next row to avoid needing an additional adder. Concurrent parallel multiplications in a single multiplier require a regular partial-product array; for this we draw on the idea of [17], called the modified partial-product array. Here we pre-compute the impact on the two least significant positions of a row of recoded partial products of the insertion of a '1' during sign change. The pre-computation adds the LSB to the potential '1', and the sum is used as the new LSB for the row of recoded partial products (equation (1)). A potential carry from the pre-computation is inserted at the second least significant position (equation (2)).

[Equations (1) and (2): expressions not legible in the source]

http://www.ijettjournal.org

III. MULTI-MODE MULTIPLIER DESIGN

In a multiplication process, each bit of the multiplier is multiplied with the multiplicand to generate partial products, which are then summed to produce the final result. Assume that X and Y are two n-bit unsigned numbers, where X is the multiplicand and Y is the multiplier. They can be expressed as follows:

$$X = \sum_{i=0}^{n-1} x_i 2^i \tag{3}$$

$$Y = \sum_{j=0}^{n-1} y_j 2^j \tag{4}$$

Their product is then

$$P = X \cdot Y = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i y_j 2^{i+j} \tag{5}$$

Looking at the multiplication scheme shown in Figure 1.4 (in gray), we see that up to column S7 the result of the least significant 8 bits can be obtained just by adding the partial products in each column, but beyond column S7 unwanted partial products are added as well. If the values of these bits are zero, the result of the multiplication in the least significant 8 bits of the multiplier is obtained. Looking at the most significant 8 bits of the multiplier, the result of that multiplication is shown in yellow.
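The idea of obtaining narrow products by forcing unwanted partial products to zero can be sketched for the unsigned case of equations (3) and (4). The following Python model is a simplification under stated assumptions: it uses plain unsigned AND partial products rather than the Booth-recoded signed rows of the proposed design, and the names `partial_products` and `twin_multiply` are hypothetical.

```python
def partial_products(x: int, y: int, n: int):
    """Yield (i, j, bit) for every partial product x_i * y_j of an n x n array."""
    for i in range(n):
        for j in range(n):
            yield i, j, ((x >> i) & 1) & ((y >> j) & 1)


def twin_multiply(x: int, y: int, n: int = 16) -> tuple[int, int]:
    """Two independent n/2-bit unsigned products from one n-bit array."""
    half = n // 2
    total = 0
    for i, j, bit in partial_products(x, y, n):
        # Keep only the LSP x LSP and MSP x MSP quadrants; every partial
        # product that mixes the two halves is forced to zero.
        if (i < half) == (j < half):
            total += bit << (i + j)
    lsp = total & ((1 << n) - 1)    # columns S0 .. S(n-1)
    msp = total >> n                # columns S(n) .. S(2n-1)
    return lsp, msp


lsp, msp = twin_multiply(0xAB12, 0xCD34)
print(hex(lsp), hex(msp))
```

With the mixed quadrants zeroed, the lower half of the result word holds the product of the two low bytes and the upper half holds the product of the two high bytes. The signed Booth-encoded array additionally needs the sign-extension constants and carry blocking discussed in the text, which this sketch leaves out.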
We see that when performing an N/2-bit multiplication within an N-bit multiplier more than half of the logic is unutilized; similarly, when performing an N/4-bit multiplication within an N-bit multiplier more than three-fourths of the logic is unutilized. Some architectural modifications are needed so that the multiplier's capabilities can be utilized efficiently [10-14].

Figure 1.4 8x8 multiplication in the LSP and MSP of a 16x16 multiplier

The proposed multiplier has 7 modes, as listed in Table 1.1.

Table 1.1 Modes of operation of multi-mode multiplier

MODE   FUNCTION     CODE
M0     16x16        110
M1     Single 8x8   100
M2     Double 8x8   101
M3     Four 4x4     011
M4     Triple 4x4   010
M5     Double 4x4   001
M6     Single 4x4   000

A. MODE M0: 16x16 multiplication

Figure 1.5 Signed 16-bit multiplication using modified booth algorithm

Mode M0 is designed to perform 16-bit multiplication. Figure 1.5 shows the modified Booth encoded multiplication scheme for a 16x16 multiplier. The LSB bits of the partial-product rows are replaced with the pre-computed LSBs given in equation (1). The sign-extension prevention bits determined by the circuit in Figure 1.2 are added at the MSB position of the partial-product rows. The potential carry bits (A0-A7) determined by equation (2) are added. Finally, the pre-computed sum of constant sign-extension bits given in Figure 1.3 is added to obtain the final result.

B. MODE M1: Single 8x8 multiplication

Figure 1.6 Signed 8-bit multiplication using LSP of modified booth multiplier

The partial products denoted by P80, P81, P82 and P83 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that prevent sign extension in the low-precision 8-bit multiplication (Figure 1.6). Multiplexers are used for this selection. The LSB bits in the first four rows are replaced with bits computed according to equation (1). The pre-computed sum of constant sign-extension bits and the potential carry bits are replaced with the short pattern shown in yellow (Figure 1.6). The pattern of 1's and 0's for the normal 16-bit and 4-bit multiplications, shown in grey, cannot be used in low-precision mode. Multiplexers are used to select between the yellow and grey patterns. Partial products shown in white are set to zero. This is easily accomplished by using an AND gate with one input used as a control signal. The potential carry generated from the 16th column is set to zero so that it cannot propagate into the multiplication in the MSP and corrupt the result.

C. MODE M2: Double 8x8 multiplication

Figure 1.7 Two parallel 8x8 multiplication using modified booth algorithm

Partial products shown in white are set to zero. This is accomplished by using an AND gate with one input used as a control signal. The pre-computed sum of constant sign-extension bits and the potential carry bits are replaced with two short patterns shown in yellow and blue in the last row (Figure 1.7). The yellow pattern is used for the lower (LSP) multiplication and the blue pattern for the higher (MSP) multiplication. The partial products denoted by P80, P81, P82 and P83 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that prevent sign extension in the low-precision 8-bit multiplication; similarly, the partial products denoted by P164, P165, P166 and P167 are replaced with partial products that prevent sign extension in the most significant 8 bits of the multiplier. For the MSP 8-bit multiplication, the LSB of the multiplicand is Y8 and the MSB is Y15. Partial product bits P84, P85, P86, P87 and the LSB bits in the first four rows are replaced with the new LSBs determined by equation (1) for the multiplications in the most significant and least significant 8 bits. For correct operation, the input to the Booth encoder for the first row of the MSP multiplication must be set to zero instead of using X_{n/2-1} as input (Figure 1.8).

Figure 1.8 Modified Booth encoding for two parallel 8x8 multiplier

The potential carry generated from the 16th column is set to zero so that it cannot propagate into the multiplication in the MSP and corrupt the result.

D. MODE M3: Four 4x4 parallel multiplication

Figure 1.9 Four parallel 4x4 multiplication

Partial products shown in white (Figure 1.9) are set to zero; this is accomplished by using an AND gate with one input used as a control signal. The pre-computed sum of constant sign-extension bits and the potential carry bits are replaced with four short patterns shown in yellow, blue, brown and violet (Figure 1.9). The pattern of 1's and 0's for the normal 16-bit and 8-bit multiplications, shown in grey, cannot be used in this mode. Multiplexers make this selection, with the mode signal as the control signal. The partial products denoted by P40, P41, P82, P83, P124, P125, P166 and P167 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that prevent sign extension in the 4-bit multiplications. Multiplexers are used for this selection; depending on the mode of operation, they select the appropriate signal as input to the reduction tree. For the 4-bit multiplication shown in blue, the MSB of the multiplicand is Y7 and the LSB is Y4; for the one shown in brown, the MSB is Y11 and the LSB is Y8; for the one shown in violet, the MSB is Y15 and the LSB is Y12. For correct operation, the input to the Booth encoder for the second 4-bit multiplication must be set to zero instead of using X_{n/4-1}; similarly, for the third and fourth 4-bit multiplications zero is used instead of X_{n/2-1} and X_{n-5}. The partial products denoted by P42, P43, P84, P85, P126, P127 and the LSB bits in the first two rows are replaced with the new LSBs determined by equation (1). The potential carries generated from the 8th, 16th and 24th columns are set to zero so that they cannot propagate into the next, more significant multiplication and corrupt the result.

E. MODE M4: Three 4x4 Multiplications

Figure 1.10 Three parallel 4x4 multiplication

Partial products shown in white (Figure 1.10) are set to zero; this is accomplished by using an AND gate with one input used as a control signal. The pre-computed sum of constant sign-extension bits and the potential carry bits are replaced with three short patterns shown in yellow, blue and brown (Figure 1.10). The partial products denoted by P40, P41, P82, P83, P124 and P125 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that prevent sign extension in the 4-bit multiplications. Multiplexers are used for this selection; depending on the mode of operation, they select the appropriate signal as input to the reduction tree. For correct operation, the input to the Booth encoder for the second 4-bit multiplication must be set to zero instead of using X_{n/4-1}; similarly, for the third 4-bit multiplication zero is used instead of X_{n/2-1}. The partial products denoted by P42, P43, P84, P85 and the LSB bits in the first two rows are replaced with the new LSBs determined by equation (1).

F. MODE M5: Two 4x4 Multiplications

Figure 1.11 Two parallel 4x4 multiplications

Partial products shown in white (Figure 1.11) are set to zero; this is accomplished by using an AND gate with one input used as a control signal. The pre-computed sum of constant sign-extension bits and the potential carry bits are replaced with the short patterns shown in yellow and blue (Figure 1.11). The partial products denoted by P40, P41, P82 and P83 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that prevent sign extension in the 4-bit multiplications. For correct operation, the input to the Booth encoder for the second 4-bit multiplication must be set to zero instead of using X_{n/4-1}. The partial products denoted by P42, P43 and the LSB bits in the first two rows are replaced with the new LSBs determined by equation (1). The potential carries generated from the 8th and 16th columns are set to zero so that they cannot propagate into the next, more significant multiplication and corrupt the result.

G. MODE M6: Single 4x4 Multiplication

Figure 1.12 Single 4x4 multiplication

Partial products shown in white (Figure 1.12) are set to zero. The partial products denoted by P40 and P41 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that prevent sign extension in the 4-bit multiplication. The LSB bits in the first two rows are replaced with the new LSBs determined by equation (1). The pre-computed sum of constant sign-extension bits and the potential carry bits are replaced with the pattern shown in yellow (Figure 1.12). The potential carry generated from the 8th column is set to zero.

In the proposed multi-mode multiplier design, the 16-bit multiplier register is connected to four 4-bit registers. The contents of the 16-bit multiplier register are transferred to these four 4-bit registers depending on the mode select signal.
The bits that take part in the multiplication are transferred to the 4-bit registers and the other bits are set to zero. This set of four 4-bit registers is then used for Booth encoding. Since a block of three zero bits is Booth-encoded as zero, the unwanted partial products are encoded as zero; this removes the need to force unwanted partial products to zero separately and also reduces switching activity.

IV. PARTIAL PRODUCT ADDITION

Wider multiplications require a large number of adders to perform the partial-product addition. The choice of adder is very important in order to obtain short delays in the different modes of multiplication. This multiplication scheme reduces the number of adders by using special adders that are capable of adding five, six or seven bits. These adders are called compressors [3]. The use of compressors shortens the vertical critical paths and makes the multiplier faster than a conventional design that uses half adders and full adders.

The proposed multiplier design has different modes of operation. After analyzing all modes, the partial products that have a higher probability of being zero than the others across all multiplier modes were identified. Instead of using one higher-order compressor to add all the partial products in a single column of the partial-product array, two lower-order compressors are used: the partial products with a high probability of being zero are connected to one lower-order compressor and the remaining ones to the other. This results in power savings.

V. RESULTS AND DISCUSSION

In this work a 16-bit multi-mode modified Booth multiplier is designed in VHDL. The proposed multiplier can be operated in 7 different modes. Its functionality is verified by feeding the multiplier with random input vectors and checking the results.
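A random-vector check of this kind needs expected values for every mode. The following is a minimal Python sketch of a software reference model, not the authors' test bench. The mode codes come from Table 1.1; the names `MODES`, `to_signed`, `reference_product` and `random_vectors` are hypothetical. Two assumptions are made that the text does not state explicitly: operands are treated as signed two's-complement lanes, and the product of lane k is assumed to occupy the result bits starting at column 2·w·k for lane width w.

```python
import random

# Mode code from Table 1.1 -> (lane width in bits, number of active lanes).
MODES = {
    0b110: (16, 1),  # M0: 16x16
    0b100: (8, 1),   # M1: single 8x8
    0b101: (8, 2),   # M2: double 8x8
    0b011: (4, 4),   # M3: four 4x4
    0b010: (4, 3),   # M4: triple 4x4
    0b001: (4, 2),   # M5: double 4x4
    0b000: (4, 1),   # M6: single 4x4
}


def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def reference_product(x: int, y: int, mode: int) -> int:
    """Expected 32-bit result word for 16-bit operand words x and y."""
    width, lanes = MODES[mode]
    result = 0
    for k in range(lanes):
        a = to_signed(x >> (width * k), width)
        b = to_signed(y >> (width * k), width)
        # Assumed packing: lane k's 2*width-bit product starts at bit 2*width*k.
        result |= ((a * b) & ((1 << (2 * width)) - 1)) << (2 * width * k)
    return result


def random_vectors(count: int, seed: int = 0):
    """Yield (x, y, mode) stimulus triples; the seed makes a run repeatable."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.getrandbits(16), rng.getrandbits(16), rng.choice(list(MODES))


for x, y, mode in random_vectors(3):
    print(f"{x:04x} {y:04x} {mode:03b} -> {reference_product(x, y, mode):08x}")
```

Bits of the operand words that belong to inactive lanes are ignored by the model, which mirrors the masking of unused multiplier bits described at the end of Section III.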
Simulation of the proposed design is done using the Cadence Incisive simulator. The VHDL descriptions are synthesized using the Cadence RTL Compiler. The synthesized netlists are taken through place-and-route using the Cadence Encounter tool, parasitic RC data is extracted from the placed-and-routed netlists, and the switching power estimate for each mode is calculated by applying Value Change Dump (VCD) data from simulation of random input vectors. The results show a significant improvement in power when the proposed multiplier is operated in a lower-precision mode.

A. SIMULATION RESULTS

Figure 1.13 Waveform for 16x16-bit multiplication
Figure 1.14 Output waveform for single 8x8 multiplication
Figure 1.15 Output waveform for two parallel 8x8 multiplication
Figure 1.16 Output waveform for four parallel 4x4 multiplication
Figure 1.17 Output waveform for three parallel 4x4 multiplication
Figure 1.18 Output waveform for two parallel 4x4 multiplication
Figure 1.19 Output waveform for single 4x4 multiplication

B. SYNTHESIS RESULTS

Figure 1.20 Circuit diagram for proposed multiplier
Figure 1.21 Standard cell layout of proposed multiplier

C. Power Analysis

Table 1.2 shows the switching power estimate for each mode of the multiplier, calculated by applying Value Change Dump (VCD) data from simulation of random input vectors. The same test bench is applied in all modes of operation.

Table 1.2 Power dissipation in different modes of the multi-mode multiplier

Multiplication mode   Internal power (mW)   Switching power (mW)
M0                    0.528                 0.3284
M1                    0.2745                0.1708
M2                    0.4752                0.2955
M3                    0.4013                0.2496
M4                    0.2957                0.1839
M5                    0.1953                0.1215
M6                    0.1478                0.09195

D. Area

The proposed multiplier has an area overhead of about 45% over the conventional 16-bit modified Booth multiplier (9987 µm² versus 6894 µm², Table 1.3). This overhead is due to the extra circuitry used for controlling the different modes of the multiplier.

Table 1.3 Area of different booth multiplier designs

Design                                                  Area (µm²)
Conventional 16-bit modified booth multiplier           6894
Conventional 8-bit modified booth multiplier            3842
Conventional 4-bit modified booth multiplier            1045
Proposed 16-bit multi-mode modified booth multiplier    9987

E. Delay

The proposed multiplier has a total delay of 11.5 ns.

VI. CONCLUSION

This work analysed existing multiplier design methodologies and, based on them, proposed a 16-bit multi-mode modified Booth multiplier. The proposed multiplier can efficiently perform one 16-bit, one 8-bit, two 8-bit, one 4-bit, two 4-bit, three 4-bit or four 4-bit multiplications in parallel, thus providing architectural solutions in which the variation in operand bit width is harnessed to decrease power dissipation and to increase the throughput of multiplications. The results show a significant reduction in total switching activity, and hence in power dissipation, in the lower-precision modes.

Currently a lot of research is being done on reconfigurable architectures, where the architecture can be adapted to the applications being executed. Various approaches that reduce the power consumption of a multiplier by eliminating spurious computation according to the dynamic range of the input operands are being developed. The proposed design and a dynamic range detection technique can be combined to design a configurable multiplier (CBM) that supports multi-precision operation.

VII. REFERENCES

[1] Booth, A., 1951. A signed binary multiplication technique. Quarterly Journal of Mechanics and Applied Mathematics, Vol. 4, Issue 2.
[2] Brooks, D. & Martonosi, M., 1999.
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance. IEEE Computer Society, 5th International Symposium on High Performance Computer Architecture, pp. 13-22.
[3] Dandapat, A., Ghosal, S., Sarkar, P. & Mukhopadhyay, D., 2010. A 1.2ns 16×16-Bit Binary Multiplier Using High Speed Compressors. World Academy of Science, Engineering and Technology, Vol. 4, Issue 3, pp. 556-61.
[4] Fadavi-Ardekani, J., 1993. M x N Booth Encoded Multiplier Generator Using Optimized Wallace Trees. IEEE Transactions on Very Large Scale Integration Systems, Vol. 1, Issue 2, pp. 120-25.
[5] Koc, C.K., 1996. RSA Hardware Implementation. RSA Laboratories, RSA Data Security, Inc.
[6] Lin, Hsin-Lie, Chang, Robert C. & Chan, M., 2004. Design of a Novel Radix-4 Booth Multiplier. IEEE Asia-Pacific Conference on Circuits and Systems, pp. 837-840.
[7] Parhami, B., 2000. Computer Arithmetic: Algorithms and Hardware Designs. 2nd ed. Oxford University Press.
[8] Sakthi, S.S. & Kayalvizhi, N., 2011. Power Aware and High Speed Reconfigurable Modified Booth Multiplier. IEEE Recent Advances in Intelligent Computational Systems. Trivandrum, pp. 352-356.
[9] Shun, Z., Pfander, O.A., Pfleiderer, H.-J. & Bermak, A., 11-14 Dec. 2007. A VLSI Architecture for a Run-time Multi-precision Reconfigurable Booth Multiplier. 14th IEEE International Conference on Electronics, Circuits and Systems. Marrakech, pp. 975-978.
[10] Själander, M., 2006. Efficient Reconfigurable Multipliers Based on the Twin-Precision Technique. Thesis. Chalmers University of Technology.
[11] Själander, M., Eriksson, H. & Larsson-Edefors, P., 2004. An Efficient Twin-Precision Multiplier. IEEE International Conference on Computer Design. San Jose, United States of America, pp. 507-510.
[12] Själander, M. & Larsson-Edefors, P., 2009. Multiplication Acceleration Through Twin Precision. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 17, Issue 9, pp. 1233-1246.
[13] Själander, M. & Larsson-Edefors, P., 31 August - 3 September 2008. High-Speed and Low-Power Multipliers Using the Baugh-Wooley Algorithm and HPM Reduction Tree. IEEE International Conference on Electronics, Circuits and Systems. St. Julians, Malta.
[14] Själander, M. & Larsson-Edefors, P., April 18-19, 2005. A Power-Efficient and Versatile Modified-Booth Multiplier. In Swedish System-on-Chip Conference. Tammsvik, Sweden.
[15] Swee, K.L.S. & Hiung, L.H., 2012. Performance Comparison Review of Radix-Based Multiplier Designs. International Conference on Intelligent and Advanced Systems, pp. 836-841.
[16] Weste, N.H.E., 1998. Principles of CMOS VLSI Design: A Systems Perspective. 2nd ed. Addison-Wesley.
[17] Yeh, W.-C. & Jen, C.-W., July 2000. High-Speed Booth Encoded Parallel Multiplier Design. IEEE Transactions on Computers, Vol. 49, Issue 7.
[18] Yeo, Kiat-Seng & Roy, Kaushik, 2009. Low-Voltage, Low-Power VLSI Subsystems. Tata McGraw-Hill ed.