International Journal of Engineering Trends and Technology (IJETT) – Volume 13 Number 7 – Jul 2014
An Efficient Multi-Mode Multiplier Design
Swati Joshi#1, Dr. Neelam Rup Prakash#2
#1 ME Research Scholar, EC Department, PEC University of Technology
#2 Supervisor, EC Department, PEC University of Technology
Abstract- This work combines the radix-4 modified Booth multiplier, which is known to provide higher speed than other multipliers, with a multi-precision control structure in an effort to improve performance. The multi-precision technique allows for flexible architectural solutions in which the variation in operand bit width is used to decrease power dissipation and to increase the throughput of multiplications. The proposed multiplier can operate at different levels of precision: one N-bit, one N/2-bit, two N/2-bit, one N/4-bit, two N/4-bit, three N/4-bit or four N/4-bit operations (where N is equal to 16). This gives the designer the opportunity to design a system that can adapt to changing modes, such as low-power, high-throughput or high-precision operation. The design is implemented in VHDL and simulated using the Cadence Incisive simulator. Synthesis of the design is carried out using the Cadence RTL Compiler.
Keywords— Modified booth, multi-mode, precision, Radix-4
I. INTRODUCTION
In today's world of ever-increasing computational demands, complex mathematical operations play a key role in determining system performance. Multipliers used in DSP and multimedia applications require flexible processing ability, low power consumption and high performance, and their architectures are modified to meet these requirements. Recent research at the micro-architecture level aims at developing data-path components that are capable of performing computations with variable operand sizes [2].
When choosing a multiplier for a digital system, the bit width of the multiplier must be at least as wide as the largest operand of the applications that are to run on that system [10]. Several studies of operand bit widths in integer applications on general-purpose microprocessors have shown that in more than 50% of the instructions both operands are less than or equal to the bit width of a multiplier (henceforth called narrow-width operations) [5]. The bit width of the multiplier is therefore often much larger than that of its operands, which leads to excessive power dissipation and long delay [7-9]. This could partially be remedied by having several multipliers, each with a specific bit width, and using the multiplier with the smallest bit width that is large enough for the current multiplication. However, using several multipliers with different bit widths is not an efficient solution, because the scheme has several drawbacks [10-14]:
• The total area increases, since several multiplier units are used.
• Inactive multipliers add power overhead through static power dissipation.
• Using several multipliers increases the fan-out of the signals that drive the multiplier inputs. Higher fan-out means longer delays and/or higher power dissipation.
• Multiplexers are needed to connect the active multiplier(s) to the result route. These multiplexers lie in the critical path, increasing total delay as well as power dissipation.
ISSN: 2231-5381
II. MODIFIED BOOTH MULTIPLIER
The original radix-4 Booth algorithm [1][15][16][18] is an efficient multiplication algorithm that reduces the number of partial products by a factor of two, which leads to substantial reductions in power, delay and area. However, for the two's-complement partial products to be added correctly, each partial-product row must be sign-extended to the width of the multiplier. Sign extension by repeating the MSB increases the loading on the logic gate that generates it and requires extra wiring, which can increase area, delay and power [6],[16]. To avoid sign-extending the rows of recoded partial products, the sign-extension prevention scheme presented in [4] has been used. In this scheme all partial-product rows are initially assumed to be negative, so the sign-extension bits of each row can be replaced by an equal number of constant 1's. If a partial-product row turns out to be positive, a single 1 is added at the least significant position of the string of 1's; the result is a string of 0's plus a carry out of the top bit, which may be discarded. The sign-extension bits of each partial product are thus replaced by an equal number of constant 1's, as shown in Figure 1.1.
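The sign-extension prevention scheme can be checked numerically. The following Python sketch is illustrative only and is not taken from the paper: the function names, the 4-bit row width and the 16-bit array width are assumptions chosen for the example. It replaces the sign extension of each row with a constant string of 1's plus a corrector bit and confirms that the sum of the rows is unchanged modulo 2^16.

```python
def split_row(value, k, shift, width):
    """Split a k-bit two's-complement row into variable bits and constant 1's.

    Both returned values are already aligned to the row's column offset `shift`.
    """
    pattern = value & ((1 << k) - 1)              # k-bit two's-complement pattern
    sign = pattern >> (k - 1)                     # 1 if the row is negative
    low = pattern & ((1 << (k - 1)) - 1)          # bits below the sign position
    corrector = (1 - sign) << (k - 1)             # single 1 added when the row is positive
    ones = (1 << width) - (1 << (shift + k - 1))  # constant 1's from sign position to MSB
    return (low + corrector) << shift, ones


def sum_with_sign_extension(rows, width):
    """Reference: add fully sign-extended rows (Python ints extend automatically)."""
    return sum(value << shift for shift, value in rows) & ((1 << width) - 1)


def sum_without_sign_extension(rows, k, width):
    """Add the rows using constant 1's; the constants are summed once, up front."""
    parts = [split_row(value, k, shift, width) for shift, value in rows]
    constant = sum(ones for _, ones in parts)     # data independent, can be pre-computed
    variable = sum(bits for bits, _ in parts)
    return (variable + constant) & ((1 << width) - 1)


rows = [(0, -5), (2, 3), (4, -8), (6, 7)]         # (shift, signed 4-bit row value)
assert sum_with_sign_extension(rows, 16) == sum_without_sign_extension(rows, 4, 16)
```

Because `constant` does not depend on the operand values, it corresponds to the kind of pre-computed sum that can be taken out of the partial-product array.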
http://www.ijettjournal.org
III. MULTI-MODE MULTIPLIER DESIGN
In a multiplication process each bit of the multiplier is multiplied with the multiplicand, generating partial products, and the partial products are then summed to generate the final result. Assume that X and Y are two n-bit unsigned numbers, where X is the multiplicand and Y is the multiplier. They can be expressed as follows:
$$X = \sum_{i=0}^{n-1} x_i 2^i \qquad (3)$$
$$Y = \sum_{j=0}^{n-1} y_j 2^j \qquad (4)$$
Figure 1.1 Booth-encoded partial products with simplified sign extension
The bit that is added at the least significant position of the string of 1's is determined by the circuit shown in Figure 1.2.
Figure 1.2 Circuit diagram for sign-extension corrector bit
These constant bits can be taken out of the array by pre-computing their sum. The sequence shown in Figure 1.3 is the pre-computed sum of the constant 1's in the MSB sign bits.
Figure 1.3 Pre-computed sum of constant 1's in the sign bits
In a typical radix-4 Booth-encoded multiplier design, each group of 3 bits is encoded into a digit from {-2, -1, 0, 1, 2}. Negative partial products must be two's-complemented (i.e., inverted and incremented by 1). If the negative line is asserted, the partial product is inverted. The extra 1 can be added in the least significant column of the next row, which avoids the need for an additional adder.
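The recoding step can be illustrated with a short behavioural model. The Python sketch below is not the paper's VHDL; the function names are hypothetical and the multiplier is assumed to be an even-width two's-complement number. Each digit is formed from an overlapping group of three bits, with the bit below the LSB taken as 0.

```python
def booth_radix4_digits(multiplier, n_bits):
    """Recode an n_bits-wide two's-complement multiplier into radix-4 Booth digits.

    Digit i is derived from bits (2i+1, 2i, 2i-1); the bit below the LSB is 0.
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    bits = multiplier & ((1 << n_bits) - 1)
    digits, below = [], 0
    for i in range(0, n_bits, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        digits.append(-2 * high + low + below)    # value in {-2, -1, 0, 1, 2}
        below = high
    return digits


def booth_multiply(multiplicand, multiplier, n_bits):
    """Sum one partial product per Booth digit (half as many rows as bits)."""
    return sum((digit * multiplicand) << (2 * i)
               for i, digit in enumerate(booth_radix4_digits(multiplier, n_bits)))


print(booth_radix4_digits(6, 4))   # [-2, 2], since -2 + 2*4 = 6
print(booth_multiply(-7, 6, 4))    # -42
```

A 16-bit multiplier therefore produces eight partial-product rows instead of sixteen.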
For concurrent parallel multiplications in a single multiplier a regular partial-product array is needed. For this we draw on the idea of [17], called the modified partial-product array. Here we pre-compute the impact that inserting a '1' during sign change has on the two least significant positions of a row of recoded partial products. The pre-computation adds the LSB to the potential '1'; the sum is used as the new LSB of the row of recoded partial products, and a potential carry from the pre-computation is inserted at the second least significant position.
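The pre-computation described above can be verified exhaustively for a single row. In the Python sketch below, the closed-form expressions for the new LSB and the carry are derived for this example from the textual description; they are an assumption, not a transcription of the paper's equations. All function names are hypothetical; `y0` stands for the LSB of the operand that forms the partial products, and `x_hi`, `x_mid`, `x_lo` for the three bits applied to the Booth encoder of the row.

```python
from itertools import product


def booth_controls(x_hi, x_mid, x_lo):
    """Return (one, two, neg) select signals for one Booth triplet."""
    digit = -2 * x_hi + x_mid + x_lo
    return int(abs(digit) == 1), int(abs(digit) == 2), int(digit < 0)


def add_one_at_lsb(y0, x_hi, x_mid, x_lo):
    """Reference: form the row LSB, then add the two's-complement '1'."""
    one, _two, neg = booth_controls(x_hi, x_mid, x_lo)
    row_lsb = (one & y0) ^ neg        # a 2x row shifts in a 0, which neg inverts
    total = row_lsb + neg
    return total & 1, total >> 1      # (new LSB, carry into the next position)


def precomputed(y0, x_hi, x_mid, x_lo):
    """Closed-form new LSB and carry, derived for this sketch."""
    def nor(a, b):
        return 1 - (a | b)
    new_lsb = y0 & (x_mid ^ x_lo)
    carry = x_hi & (nor(x_lo, x_mid) | nor(y0, x_mid) | nor(y0, x_lo))
    return new_lsb, carry


# All 16 input combinations agree with the reference addition.
for y0, x_hi, x_mid, x_lo in product((0, 1), repeat=4):
    assert add_one_at_lsb(y0, x_hi, x_mid, x_lo) == precomputed(y0, x_hi, x_mid, x_lo)
```

Since the new LSB and the carry depend only on four input bits, they can be generated in parallel with the rest of the row, which keeps the partial-product array regular.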
Following [17], with x_{2i+1}, x_{2i} and x_{2i-1} denoting the bits applied to the Booth encoder of row i and y_0 the LSB of the operand that is not Booth-encoded, the new LSB and the potential carry bit of row i are
$$p_{LSB,i} = y_0 \, (x_{2i-1} \oplus x_{2i}) \qquad (1)$$
$$a_i = x_{2i+1} \left( \overline{x_{2i-1} + x_{2i}} + \overline{y_0 + x_{2i}} + \overline{y_0 + x_{2i-1}} \right) \qquad (2)$$
The product of the unsigned numbers X and Y is
$$P = X \cdot Y = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i y_j 2^{i+j} \qquad (5)$$
In the multiplication scheme shown in gray in Figure 1.4, the least significant 8 bits of the result, up to column S7, are obtained simply by adding the partial products in each column. Beyond column S7, however, unwanted partial products are added. If these bits are zero, the array produces the result of an 8-bit multiplication in the least significant 8 bits of the multiplier. The multiplication performed in the most significant 8 bits of the multiplier is shown in yellow. When an N/2-bit multiplication is performed within an N-bit multiplier, more than half of the logic is unutilized; similarly, when an N/4-bit multiplication is performed within an N-bit multiplier, more than three fourths of the logic is unutilized. Architectural modifications are therefore needed so that the capabilities of the multiplier can be utilized efficiently [10-14].
Figure 1.4 8x8 multiplication in the LSP and MSP of a 16x16 multiplier
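The idea of running two 8x8 multiplications in the LSP and MSP of a 16x16 array can be sketched with a simplified model. The Python code below assumes an unsigned AND-array multiplier rather than the signed Booth array used in this work, and the function name and operand values are hypothetical. It shows that forcing the cross partial products to zero leaves two independent products in the lower and upper halves of the result.

```python
def array_multiply(x, y, n=16, split=False):
    """Unsigned AND-array multiplier model.

    With split=True, partial products that pair a bit from the lower half of
    one operand with a bit from the upper half of the other are forced to zero.
    """
    half = n // 2
    total = 0
    for i in range(n):
        for j in range(n):
            if split and (i < half) != (j < half):
                continue                          # unwanted cross partial product
            total += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
    return total


a_lo, a_hi, b_lo, b_hi = 0x5A, 0xC3, 0x17, 0xF0
x = (a_hi << 8) | a_lo
y = (b_hi << 8) | b_lo

result = array_multiply(x, y, split=True)
assert result & 0xFFFF == a_lo * b_lo             # 8x8 product in the LSP
assert result >> 16 == a_hi * b_hi                # 8x8 product in the MSP
assert array_multiply(x, y) == x * y              # full 16x16 mode is unaffected
```

In the signed Booth array the same partitioning additionally requires the per-partition sign-extension patterns, new LSBs and carry blocking described for each mode.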
There are 7 modes in the proposed multiplier, as listed in Table 1.1:
Table 1.1 Modes of operation of the multi-mode multiplier
MODE    FUNCTION      CODE
M0      16x16         110
M1      Single 8x8    100
M2      Double 8x8    101
M3      Four 4x4      011
M4      Triple 4x4    010
M5      Double 4x4    001
M6      Single 4x4    000
A. MODE M0: 16x16 multiplication
• The partial products denoted by P80, P81, P82 and P83 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that are used to prevent sign extension in the low-precision 8-bit multiplication (Figure 1.6). Multiplexers are used for this selection.
• The LSB bits in the first four rows are replaced with bits computed according to equation (1).
• The pre-computed sum of the constant sign-extension bits and potential carry bits is replaced with the short pattern shown in yellow (Figure 1.6). The pattern of 1's and 0's for the normal 16-bit and 4-bit multiplications, shown in grey, cannot be used in low-precision mode. Multiplexers are used to select between the yellow and grey patterns.
• Partial products shown in white are set to zero. This is easily accomplished by using an AND gate with one input used as a control signal.
• The potential carry generated from the 16th column is set to zero so that it cannot propagate into the multiplication in the MSP and corrupt the result.
C. MODE M2: Double 8x8 multiplication
Figure 1.5 Signed 16-bit multiplication using modified Booth algorithm
Mode M0 is designed to perform 16-bit multiplication. Figure 1.5 shows the modified Booth-encoded multiplication scheme for the 16x16 multiplier.
• The LSB bits of the partial-product rows are replaced with the pre-computed LSB given in equation (1). Sign-extension prevention bits determined by the circuit in Figure 1.2 are added at the MSB position of the partial-product rows.
• The potential carry bits (A0-A7) determined by equation (2), which are needed for the multiplication, are added.
• The pre-computed sum of the constant sign-extension bits, as given in Figure 1.3, is added to get the final result.
B. MODE M1: Single 8x8 multiplication
Figure 1.6 Signed 8-bit multiplication using LSP of modified Booth multiplier
Figure 1.7 Two parallel 8x8 multiplications using modified Booth algorithm
• Partial products shown in white are set to zero. This is accomplished by using an AND gate with one input used as a control signal.
• The pre-computed sum of the constant sign-extension bits and potential carry bits is replaced with two short patterns, shown in yellow and blue in the last row (Figure 1.7). The yellow pattern is for the lower-precision multiplier and the blue pattern is for the higher-precision multiplier.
• The partial products denoted by P80, P81, P82 and P83 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that are used to prevent sign extension in the low-precision 8-bit multiplication; similarly, the partial products denoted by P164, P165, P166 and P167 are replaced with partial products that are used to prevent sign extension in the most significant 8 bits of the multiplier. For the MSP 8-bit multiplication, the LSB of the multiplicand is Y8 and the MSB is Y15. Partial-product bits P84, P85, P86 and P87 and the LSB bits in the first four rows are replaced with the new LSB determined by equation (1) for the multiplications in the most significant 8 bits and the least significant 8 bits. For correct operation, the input to the Booth encoder for the first row of the MSP multiplication is set to zero instead of using X_{n/2-1} as input (Figure 1.8).
• The partial products denoted by P42, P43, P84, P85, P126 and P127 and the LSB bits in the first two rows are replaced with the new LSB determined by equation (1).
• The potential carries generated from the 8th, 16th and 24th columns are set to zero so that they cannot propagate into the multiplication in the MSP and corrupt the result.
E. MODE M4: Three 4x4 Multiplications
Figure 1.8 Modified Booth encoding for two parallel 8x8 multiplications
• The potential carry generated from the 16th column is set to zero so that it cannot propagate into the multiplication in the MSP and corrupt the result.
D. MODE M3: Four 4x4 parallel multiplications
Figure 1.9 Four parallel 4x4 multiplications
• Partial products shown in white (Figure 1.9) are set to zero. This is accomplished by using an AND gate with one input used as a control signal.
• The pre-computed sum of the constant sign-extension bits and potential carry bits is replaced with four short patterns, shown in yellow, blue, brown and violet (Figure 1.9). The pattern of 1's and 0's for the normal 16-bit and 8-bit multiplications, shown in grey, cannot be used in this mode. Multiplexers, with the mode signal as control signal, are used for this selection.
• The partial products denoted by P40, P41, P82, P83, P124, P125, P166 and P167 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that are used to prevent sign extension in the 4-bit multiplications. Multiplexers are used for this selection; depending on the mode of operation, they select the appropriate signal as input to the reduction tree.
• For the 4-bit multiplication shown in blue, the MSB of the multiplicand is Y7 and the LSB is Y4; for the 4-bit multiplication shown in brown, the MSB of the multiplicand is Y11 and the LSB is Y8; for the 4-bit multiplication shown in violet, the MSB of the multiplicand is Y15 and the LSB is Y12.
• For correct operation, the input to the Booth encoder for the second 4-bit multiplication is set to zero instead of using X_{n/4-1} as input; similarly, for the third and fourth 4-bit multiplications, zeros are used instead of X_{n/2-1} and X_{n-5} as inputs to the Booth encoder.
Figure 1.10 Three parallel 4x4 multiplications
• Partial products shown in white (Figure 1.10) are set to zero. This is accomplished by using an AND gate with one input used as a control signal.
• The pre-computed sum of the constant sign-extension bits and potential carry bits is replaced with three short patterns, shown in yellow, blue and brown (Figure 1.10).
• The partial products denoted by P40, P41, P82, P83, P124 and P125 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that are used to prevent sign extension in the 4-bit multiplications. Multiplexers are used for this selection; depending on the mode of operation, they select the appropriate signal as input to the reduction tree.
• For correct operation, the input to the Booth encoder for the second 4-bit multiplication is set to zero instead of using X_{n/4-1} as input; similarly, for the third 4-bit multiplication, zero is used instead of X_{n/2-1} as input to the Booth encoder.
• The partial products denoted by P42, P43, P84 and P85 and the LSB bits in the first two rows are replaced with the new LSB determined by equation (1).
F. MODE M5: Two 4x4 Multiplications
Figure 1.11 Two parallel 4x4 multiplications
• Partial products shown in white (Figure 1.11) are set to zero. This is accomplished by using an AND gate with one input used as a control signal.
• The pre-computed sum of the constant sign-extension bits and potential carry bits is replaced with the short patterns shown in yellow and blue (Figure 1.11).
• The partial products denoted by P40, P41, P82 and P83 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that are used to prevent sign extension in the 4-bit multiplications.
• For correct operation, the input to the Booth encoder for the second 4-bit multiplication is set to zero instead of using X_{n/4-1} as input.
• The partial products denoted by P42 and P43 and the LSB bits in the first two rows are replaced with the new LSB determined by equation (1).
• The potential carries generated from the 8th and 16th columns are set to zero so that they cannot propagate into the multiplication in the MSP and corrupt the result.
G. MODE M6: Single 4x4 Multiplication
Figure 1.12 Single 4x4 multiplication
• Partial products shown in white (Figure 1.12) are set to zero.
• The partial products denoted by P40 and P41 during normal 16-bit multiplication (Figure 1.5) are replaced with partial products that are used to prevent sign extension in the 4-bit multiplication. The LSB bits in the first two rows are replaced with the new LSB determined by equation (1).
• The pre-computed sum of the constant sign-extension bits and potential carry bits is replaced with the pattern shown in yellow (Figure 1.12). The potential carry generated from the 8th column is set to zero.
In the proposed multi-mode multiplier design, the 16-bit multiplier register is connected to four 4-bit registers. The contents of the 16-bit multiplier register are transferred to these four 4-bit registers depending on the mode-select signal. Bits that take part in the multiplication are transferred to the 4-bit registers and the other bits are set to zero. This set of four 4-bit registers is then used for Booth encoding. Since a block of three zero bits is encoded as zero in Booth encoding, the unwanted partial products are encoded to zero, which reduces the task of forcing unwanted products to zero and also reduces switching activity.
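The effect of the mode-dependent operand registers on Booth encoding can be sketched as follows. The mode names, functions and codes come from Table 1.1; everything else in this Python model is an assumption made for illustration, in particular the placement of the active partitions in the least significant bits, the function name `booth_rows`, and the behavioural (not structural) style. The encoder input below each partition boundary is forced to zero, as described for the individual modes.

```python
# Mode codes and partition widths follow Table 1.1; partitions are assumed
# to occupy the least significant bits of the 16-bit multiplier register.
MODES = {
    "M0": ("110", [16]),
    "M1": ("100", [8]),
    "M2": ("101", [8, 8]),
    "M3": ("011", [4, 4, 4, 4]),
    "M4": ("010", [4, 4, 4]),
    "M5": ("001", [4, 4]),
    "M6": ("000", [4]),
}


def booth_rows(multiplier, mode, n_bits=16):
    """Return the n_bits/2 radix-4 Booth digits seen by the array in `mode`."""
    _code, lanes = MODES[mode]
    active = sum(lanes)
    bits = multiplier & ((1 << active) - 1)       # bits not in use are set to zero
    regions = lanes + ([n_bits - active] if active < n_bits else [])
    digits, start = [], 0
    for width in regions:
        below = 0                                 # encoder input at a boundary is zero
        for i in range(start, start + width, 2):
            low = (bits >> i) & 1
            high = (bits >> (i + 1)) & 1
            digits.append(-2 * high + low + below)
            below = high
        start += width
    return digits


print(booth_rows(0xFFFF, "M6"))   # [-1, 0, 0, 0, 0, 0, 0, 0]
print(booth_rows(0x0037, "M5"))   # [-1, 2, -1, 1, 0, 0, 0, 0]
```

In mode M6 only the first two rows can be non-zero, so the remaining six rows of the array see constant zero digits and do not toggle.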
IV. PARTIAL PRODUCT ADDITION
Larger multiplications require a large number of adders to perform the partial-product addition. The choice of adder is very important in order to obtain short delays for the different modes of multiplication. This multiplication scheme reduces the number of adders by using special adders that are capable of adding five, six or seven bits. These adders are called compressors [3]. The use of compressors shortens the vertical critical paths and makes the multiplier faster than a conventional design that uses half adders and full adders.
The proposed multiplier design has different modes of operation. After analyzing all modes, the partial products whose probability of being zero is high compared with the others across all multiplier modes are identified. Instead of using one higher-order compressor to add all the partial products in a single column of the partial-product array, two lower-order compressors are used: the partial products whose probability of being zero is high are connected to one lower-order compressor, and the others are connected to the other lower-order compressor in the same column. This results in power saving.
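The column-splitting idea can be sketched behaviourally. The Python model below is a simplification and an assumption: a compressor is modelled only as the count of its 1 inputs, ignoring carry-in and carry-out wiring, and the function names, column contents and choice of likely-zero positions are hypothetical.

```python
def compressor(bits):
    """Behavioural model of an m-input compressor: the count of its 1 inputs."""
    return sum(bits)


def column_sum_split(column, likely_zero):
    """Add one column with two smaller compressors instead of one large one.

    `likely_zero` holds the indices of the partial-product bits that are zero
    in most modes; they feed their own compressor.
    """
    quiet = [bit for i, bit in enumerate(column) if i in likely_zero]
    busy = [bit for i, bit in enumerate(column) if i not in likely_zero]
    return compressor(quiet), compressor(busy)


column = [1, 0, 1, 1, 0, 0, 0]        # seven partial-product bits in one column
quiet_count, busy_count = column_sum_split(column, likely_zero={4, 5, 6})
assert quiet_count + busy_count == compressor(column)   # same column total
assert quiet_count == 0               # all-zero inputs: this compressor does not toggle
```

Grouping the bits this way does not change the column total; the saving comes from one of the two compressors keeping constant inputs in the lower-precision modes.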
V. RESULTS AND DISCUSSION
In this work a 16-bit multi-mode modified Booth multiplier is designed in VHDL. The proposed multiplier can be operated in 7 different modes. Its functionality is verified by feeding the multiplier with random input vectors and checking the results. The proposed design is simulated using the Cadence Incisive simulator, and the VHDL descriptions are synthesized using the Cadence RTL Compiler (RC). The synthesized netlists are taken through place-and-route using the Cadence Encounter tool. RC data is extracted from the placed-and-routed netlists, and the switching power for each mode is estimated by applying Value Change Dump (VCD) data from the simulation of random input vectors. The results show a significant improvement in power when the proposed multiplier is operated in the lower-precision modes.
A. SIMULATION RESULTS
Figure 1.13 Waveform for 16x16-bit multiplication
Figure 1.14 Output waveform for single 8x8 multiplication
Figure 1.15 Output waveform for two parallel 8x8 multiplications
Figure 1.16 Output waveform for four parallel 4x4 multiplications
Figure 1.17 Output waveform for three parallel 4x4 multiplications
Figure 1.19 Output waveform for single 4x4 multiplication
B. SYNTHESIS RESULTS
Figure 1.20 Circuit diagram for proposed multiplier
Figure 1.21 Standard cell layout of proposed multiplier
C. Power Analysis
Table 1.2 shows the power estimates for each mode of the multiplier, calculated by applying Value Change Dump (VCD) data from the simulation of random input vectors. The same test bench is applied for all modes of operation.
Figure 1.18 Output waveform for two parallel 4x4 multiplications
Table 1.2 Power dissipation in different modes of the multi-mode multiplier
Multiplication mode    Internal power (mW)    Switching power (mW)
M0                     0.528                  0.3284
M1                     0.2745                 0.1708
M2                     0.4752                 0.2955
M3                     0.4013                 0.2496
M4                     0.2957                 0.1839
M5                     0.1953                 0.1215
M6                     0.1478                 0.09195
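The relative savings implied by Table 1.2 can be recomputed directly from the tabulated values. The Python sketch below only performs this arithmetic; the dictionary layout and the function name `switching_saving` are hypothetical, and the figures are copied from the table rather than measured again.

```python
# (internal power, switching power) in mW, copied from Table 1.2
POWER_MW = {
    "M0": (0.528, 0.3284),
    "M1": (0.2745, 0.1708),
    "M2": (0.4752, 0.2955),
    "M3": (0.4013, 0.2496),
    "M4": (0.2957, 0.1839),
    "M5": (0.1953, 0.1215),
    "M6": (0.1478, 0.09195),
}


def switching_saving(mode, reference="M0"):
    """Fractional reduction in switching power relative to the reference mode."""
    return 1.0 - POWER_MW[mode][1] / POWER_MW[reference][1]


for mode in POWER_MW:
    print(f"{mode}: {100 * switching_saving(mode):.1f}% lower switching power than M0")
```

For the single 4x4 mode (M6), for instance, the switching power is roughly 72% lower than in the full 16x16 mode (M0).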
D. Area
The proposed multiplier has a 44% area overhead compared with the conventional 16-bit modified Booth multiplier. This overhead is due to the extra circuitry used for controlling the different modes of the multiplier.
Table 1.3 Area of different Booth multiplier designs
Design                                                   Area (µm²)
Conventional 16-bit modified Booth multiplier            6894
Conventional 8-bit modified Booth multiplier             3842
Conventional 4-bit modified Booth multiplier             1045
Proposed 16-bit multi-mode modified Booth multiplier     9987
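The area figures in Table 1.3 can be compared with a few lines of arithmetic. The Python sketch below uses hypothetical key names and only the tabulated values. It recomputes the overhead of the proposed design over the conventional 16-bit multiplier, which comes out at about 44.9%, in line with the overhead quoted above, and it compares the proposed design with the alternative of instantiating separate 16-bit, 8-bit and 4-bit multipliers.

```python
# Areas in square micrometres, copied from Table 1.3
AREA_UM2 = {
    "conventional_16": 6894,
    "conventional_8": 3842,
    "conventional_4": 1045,
    "proposed_16": 9987,
}

# Overhead of the multi-mode design relative to a plain 16-bit multiplier
overhead = AREA_UM2["proposed_16"] / AREA_UM2["conventional_16"] - 1.0

# Area needed if one multiplier per bit width were instantiated instead
separate = (AREA_UM2["conventional_16"]
            + AREA_UM2["conventional_8"]
            + AREA_UM2["conventional_4"])

print(f"overhead vs conventional 16-bit: {100 * overhead:.1f}%")
print(f"separate multipliers: {separate} um^2, proposed: {AREA_UM2['proposed_16']} um^2")
```

On these numbers the single multi-mode multiplier is smaller than one instance each of the three fixed-width multipliers.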
E. Delay
The proposed multiplier has a total delay of 11.5 ns.
VI. CONCLUSION
This work analysed existing multiplier design methodologies and, based on them, proposed a 16-bit multi-mode modified Booth multiplier. The proposed multiplier can efficiently perform one 16-bit, one 8-bit, two 8-bit, one 4-bit, two 4-bit, three 4-bit or four 4-bit multiplications in parallel, thus providing architectural solutions in which the variation in operand bit width is harnessed to decrease power dissipation and to increase the throughput of multiplications. The results show a significant reduction in total switching activity and hence lower power dissipation.
Currently a lot of research is being done on reconfigurable architectures, where the architecture can be adapted to the applications that are being executed. Various approaches that reduce the power consumption of a multiplier by eliminating spurious computation according to the dynamic range of the input operands are being developed. The proposed design and a dynamic-range detection technique can be combined to design a configurable multiplier (CBM) that supports multi-precision operation.
VII. REFERENCES
[1] Booth, A., 1951. A signed binary multiplication technique. Quarterly Journal of Mechanics and Applied Mathematics, Vol. 4, Issue 2.
[2] Brooks, D. & Martonosi, M., 1999. Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance. IEEE Computer Society, 5th International Symposium on High Performance Computer Architecture, pp. 13-22.
[3] Dandapat, A., Ghosal, S., Sarkar, P. & Mukhopadhyay, D., 2010. A 1.2-ns 16×16-Bit Binary Multiplier Using High Speed Compressors. World Academy of Science, Engineering and Technology, Vol. 4, Issue 3, pp. 556-61.
[4] Fadavi-Ardekani, J., 1993. M x N Booth Encoded Multiplier Generator Using Optimized Wallace Trees. IEEE Transactions on Very Large Scale Integration Systems, Vol. 1, Issue 2, pp. 120-25.
[5] Koc, C.K., 1996. RSA Hardware Implementation. RSA Laboratories, RSA Data Security, Inc.
[6] Lin, Hsin-Lie, Chang, Robert C. & Chan, M., 2004. Design of a Novel Radix-4 Booth Multiplier. IEEE Asia-Pacific Conference on Circuits and Systems, pp. 837-840.
[7] Parhami, B., 2000. Computer Arithmetic: Algorithms and Hardware Designs. 2nd ed. Oxford University Press.
[8] Sakthi, S.S. & Kayalvizhi, N., 2011. Power Aware and High Speed Reconfigurable Modified Booth Multiplier. IEEE Recent Advances in Intelligent Computational Systems, Trivandrum, pp. 352-356.
[9] Shun, Z., Pfander, O.A., Pfleiderer, H.-J. & Bermak, A., 2007. A VLSI Architecture for a Run-time Multi-precision Reconfigurable Booth Multiplier. 14th IEEE International Conference on Electronics, Circuits and Systems, Marrakech, 11-14 Dec. 2007, pp. 975-978.
[10] Själander, M., 2006. Efficient Reconfigurable Multipliers Based on the Twin-Precision Technique. Thesis. Chalmers University of Technology.
[11] Själander, M., Eriksson, H. & Larsson-Edefors, P., 2004. An Efficient Twin-Precision Multiplier. IEEE International Conference on Computer Design, San Jose, United States of America, pp. 507-510.
[12] Själander, M. & Larsson-Edefors, P., 2009. Multiplication Acceleration Through Twin Precision. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 17, Issue 9, pp. 1233-1246.
[13] Själander, M. & Larsson-Edefors, P., 2008. High-Speed and Low-Power Multipliers Using the Baugh-Wooley Algorithm and HPM Reduction Tree. IEEE International Conference on Electronics, Circuits and Systems, St. Julians, Malta, 31 August - 3 September 2008.
[14] Själander, M. & Larsson-Edefors, P., 2005. A Power-Efficient and Versatile Modified-Booth Multiplier. Swedish System-on-Chip Conference, Tammsvik, Sweden, April 18-19, 2005.
[15] Swee, K.L.S. & Hiung, L.H., 2012. Performance Comparison Review of Radix-Based Multiplier Designs. International Conference on Intelligent and Advanced Systems, pp. 836-841.
[16] Weste, N.H.E., 1998. Principles of CMOS VLSI Design: A Systems Perspective. 2nd ed. Addison-Wesley.
[17] Yeh, W.-C. & Jen, C.-W., 2000. High-Speed Booth Encoded Parallel Multiplier Design. IEEE Transactions on Computers, Vol. 49, Issue 7, July 2000.
[18] Yeo, Kiat-Seng & Roy, Kaushik, 2009. Low-Voltage, Low-Power VLSI Subsystems. Tata McGraw-Hill ed.