International Journal of Engineering Trends and Technology (IJETT) – Volume... Kolla Sruthi K.Raju

advertisement
International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014
A 6X6 Multiplier Based on MCM Method
Kolla Sruthi 1 K.Raju 2
1
PG Student (M.Tech), Dept. Of ECE, KKR & KSR Institute of Technology & Sciences, Guntur
2
Associate Professor, Dept. Of ECE, KKR & KSR Institute of Technology & Sciences, Guntur
Abstract— The digit-based recoding technique does not exploit the sharing of common partial products, which
allows great reductions in the number of operations and, consequently, in area and power dissipation of the MCM
design at the gate level. Hence, the fundamental optimization problem, called the MCM problem, is defined as
finding the minimum number of addition and subtraction operations that implement the constant multiplications.
Note that, in bit-parallel design of constant multiplications, shifts can be realized using only wires in hardware
without representing any area cost. The main idea of digit serial technique is to find the terms which are common
between different constants and decreasing the number of repeated operations. There are some algorithms in the
literature which deal with CSE and in most of them there are three main steps involved: Identifying the presence of
multiple patterns in the input matrix of coefficients, selecting one pattern for elimination and eliminating all
occurrences of the selected pattern. This process is iteratively repeated until there are no more multiple patterns
present. The run time and quality of the solution are important metrics in these algorithms.
Keywords— Galois Field, multiplier, Multiple Constant Multiplications (MCM), Polynomial, Finite field.
I. INTRODUCTION
Finite impulse response (FIR) filters are of great
importance in digital signal processing (DSP) systems
since their characteristics in linear-phase and feedforward implementations make them very useful for
building stable high-performance filters. Although
architectures have similar complexity in hardware, the
transposed form is generally preferred because of its
higher performance and power efficiency [1]. The
multiplier block of the digital FIR filter in its transposed
form, where the multiplication of filter coefficients with
the filter input is realized, has significant impact on the
complexity and performance of the design because a
large number of constant multiplications are required.
This is generally known as the multiple constant
multiplications (MCM) operation and is also a central
operation and performance bottleneck in many other
DSP systems such as fast Fourier transforms, discrete
cosine transforms (DCTs), and error-correcting codes.
Although area-, delay-, and power-efficient
multiplier architectures, such as Wallace and modified
Booth multipliers, have been proposed, the full
flexibility of a multiplier is not necessary for the
constant multiplications, since filter coefficients are
ISSN: 2231-5381
fixed and determined beforehand by the DSP algorithms.
Hence, the multiplication of filter coefficients with the
input data is generally implemented under a shift adds
architecture, where each constant multiplication is
realized using addition/subtraction and shift operations
in an MCM operation.
The digit-based recoding technique does not
exploit the sharing of common partial products, which
allows great reductions in the number of operations and,
consequently, in area and power dissipation of the MCM
design at the gate level. Hence, the fundamental
optimization problem, called the MCM problem, is
defined as finding the minimum number of addition and
subtraction operations that implement the constant
multiplications. Note that, in bit-parallel design of
constant multiplications, shifts can be realized using
only wires in hardware without representing any area
cost.
Common subexpression elimination (CSE) algorithm
In computer science, common subexpression elimination
(CSE) is a compiler optimization that searches for
instances of identical expressions (i.e., they all evaluate
to the same value), and analyses whether it is
worthwhile replacing them with a single variable
holding the computed value.
http://www.ijettjournal.org
Page 27
International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014
Graph-Based (GB) Techniques
Graph dependence algorithms try to reduce the number
of adders to implement the multipliers and CSE is a
technique that searches for instances of identical subexpressions and analyses whether it is worthwhile
replacing them with a single variable holding the
computed value. An efficient solution of these problems
can yield significant implementation area, power
consumption, and time. The disadvantages in existing
system are multiple constant multiplication problem and
The optimization of the multiplications can lead to
important improvements in various design parameters
like area or power consumption
The main idea of digit serial technique proposed
in this paper is to find the terms which are common
between different constants and decreasing the number
of repeated operations. There are some algorithms in the
literature which deal with CSE and in most of them there
are three main steps involved:
 Identify the presence of multiple patterns in the
input matrix of coefficients.
 Select one pattern for elimination.
 Eliminate all occurrences of the selected pattern.
This process is iteratively repeated until there are no
more multiple patterns present. The run time and quality
of the solution are important metrics in these algorithms.
Multipliers usually have large area and power and
multiplication is expensive in hardware. In MCM the
values of the constants are known beforehand. Hence,
multiplication can be implemented by sequence of
additions and shifts. Suppose we want to compute 23*X.
the binary representation of 21 is 10101. So, instead of
multiplication we can compute 21X as below:
21*X = (10101)2*X = X + X<<2 + X<<4
In this case, the complexity of the
implementation is directly related to the number of nonzero digits in the constant representation. There are also
some techniques such as signed digit representations by
which the number of non-zero digits can be reduced.
Among all approaches which are available for reducing
the number of non-zero digits, Canonical Signed Digit
(CSD) has the least number of non-zero digits.
ISSN: 2231-5381
The goal of this project is to implement some previous
works related to CSE [1,2,3] and apply some
improvements to these algorithms or devise a new
algorithm and implement it to perform such an
optimization and compare to related works.
By this representation, common sub-expression
elimination (CSE) is to find the common patterns in
binary representation of the constants. Consider the
following example:
21*X = (10101)2*X = X + X<<2 + X<<4
13*X = (1101)*X = X + X<<2 + X<<3
To implement these operations, we need four add
operations and four shift operations. However, if we
extract the common digit pattern “101” then we need
three shift operations and three add operations.
F0 = (101)2*X = X + X<<2
F1 = 21*X = (10101)2*X = F0 + X<<4
F2 = 13*X = (1101)*X = F0 + X<<3
II. MATHEMATICAL MODELLING OF MCM
For the shift-adds implementation of constant
multiplications, a straightforward method, generally
known as digit based recoding [6], initially defines the
constants in binary. Then, for each “1” in the binary
representation of the constant, according to its bit
position, it shifts the variable and adds up the shifted
variables to obtain the result. As a simple example,
consider the constant multiplications 29x and 43x. Their
decompositions in binary are listed as follows:
which requires six addition operations as illustrated in
Fig. 1(a).
However, the digit-based recoding technique
does not exploit the sharing of common partial products,
which allows great reductions in the number of
operations and, consequently, in area and power
dissipation of the MCM design at the gate level. Hence,
the fundamental optimization problem, called the MCM
problem, is defined as finding the minimum number of
addition and subtraction operations that implement the
http://www.ijettjournal.org
Page 28
International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014
constant multiplications. Note that, in bit-parallel design
of constant multiplications, shifts can be realized using
only wires in hardware without representing any area
cost.
Fig. 1. Shift-adds implementations of 29x and 43x. (a)
Without partial product sharing [6] and with partial
product sharing. (b) Exact CSE algorithm . (c) Exact GB
algorithm
The algorithms designed for the MCM problem
can be categorized in two classes: common
subexpression elimination (CSE) algorithms and graphbased (GB) techniques. The CSE algorithms initially
extract all possible subexpression from the
representations of the constants when they are defined
under binary, canonical signed digit (CSD) [7], or
minimal signed digit (MSD) [8]. Then, they find the
“best” subexpression, generally the most common, to be
shared among the constant multiplications. The GB
methods are not limited to any particular number
representation and consider a larger number of
alternative implementations of a constant, yielding better
solutions than the CSE algorithms..
Returning to our example in Fig. 1, the exact
CSE algorithm gives a solution with four operations by
finding the most common partial products 3x = (11)binx
and 5x = (101)binx when constants are defined under
binary, as illustrated in Fig. 2(b). On the other hand, the
exact GB algorithm [2] finds a solution with the
minimum number of operations by sharing the common
partial product 7x in both multiplications, as shown in
Fig. 1(c). Note that the partial product 7x = (111)binx
cannot be extracted from the binary representation of
43x in the exact CSE algorithm [9].
ISSN: 2231-5381
However, all these algorithms assume that the
input data x is processed in parallel. On the other hand,
in digit-serial arithmetic, the data words are divided into
digit sets, consisting of d bits, that are processed one at a
time [3]. Since digit serial operators occupy less area
and are independent of the data word length, digit-serial
architectures offer alternative low complexity designs
when compared to bit-parallel architectures. However,
the shifts require the use of D flip-flops, as opposed to
the bit-parallel MCM design where they are free in
terms of hardware. Hence, the high-level algorithms
should take into account the sharing of shift operations
as well as the sharing of addition/subtraction operations
in digit-serial MCM design. Furthermore, finding the
minimum number of operations realizing an MCM
operation does not always yield an MCM design with
optimal area at the gate level [4]. Hence, the high-level
algorithms should consider the implementation cost of
each digit-serial operation at the gate level.
Since there are still instances which the exact
CSE algorithm cannot handle, we describe the
approximate GB algorithm [6] that iteratively finds the
“best” partial product which leads to the optimal area in
digit-serial MCM design at the gate level. This paper
also introduces a computer-aided design (CAD) tool
called SAFIR which generates the hardware descriptions
of digit-serial MCM operations and FIR filters based on
a design architecture and implements these circuits using
a commercial logic synthesis tool. In SAFIR, the digitserial constant multiplications can be implemented under
the shiftadds architecture, and also can be designed
using generic digit serial constant multipliers
Canonical Number Representation
In computing canonical-signed-digit (CSD) is a
number system for encoding a floating-point value in a
two's complement representation. This encoding
contains 33% fewer non-zeros than the two's
complement form, leading to efficient implementations
of add/subtract networks in hardwired Digital signal
processing. The representation uses a sequence of one
or more of the symbols, -1, 0, +1 (alternatively -, 0 or +)
with each position possibly representing the addition or
subtraction of a power of 2. For instance 23 is
represented as +0-00-, which expands to +2^5 -2^3 -2^0
or 32 - 8 - 1 = 23
http://www.ijettjournal.org
Page 29
International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014
CSD is obtained by transforming every
sequence of zero followed by ones (011...1) into +
followed by zeros and the least significant bit by (+0....0-).
Any n digit number in CSD has at most (n+1)/2
nonzero digits and, on average, the number of nonzero
digits is reduced by 33% when compared to binary. The
MSD representation [8] is obtained by dropping the first
property of the CSD representation. Thus, a constant
may have several representations under MSD, including
its CSD representation, but all with a minimum number
of nonzero digits. Consider the constant 23 defined in
six bits. Its binary representation 010111 includes four
nonzero digits. It is represented as 101001 in CSD, and
both 101001 and 011001 denote 23 in MSD using three
nonzero digits (where 1 stands for −1).
Digit-Serial Arithmetic
In digit-serial designs, the input data is divided
into d bits and processed serially by applying each d-bit
data in parallel. The special cases, called bit-serial and
bit-parallel, occur when the digit size d is equal to 1 and
equal to input data wordlength, respectively. The digitserial computation plays a key role when the bit-serial
implementations cannot meet the delay requirements and
the bit-parallel designs require excessive hardware. Thus,
an optimal tradeoff between area and delay can be
explored by changing the digit size d.
The basic digit-serial arithmetic operations can be found.
The digit-serial addition, subtraction, and left shift
operations are depicted in Fig. 2 using a digit size d
equal to 2, where the bits with index 0 and 1 denote the
least significant bit (LSB) and the most significant bit
(MSB), respectively. Notice from Fig. 4(a) that a digitserial addition operation, in general, requires d full
adders (FAs) and one D flip-flop. The subtraction
operation [Fig. 2(b)] is implemented using 2’s
complement, requiring the initialization of the D flipflop with 1 and additional d inverter gates with respect
to the digitserial addition operation. In a left shift
ISSN: 2231-5381
operation [Fig. 4(c) and (d)], the number of required D
flip-flops is equal to the shift amount and is realized in d
layers (one for each bit). The input–output
correspondence and the number of flip-flops cascaded
serially for each input at each layer of the digit-serial left
shift operation are given in (3) and (4), respectively,
where i ranges from 0 to d−1 and ls denotes the amount
of left shift
Fig. 2. Digit-serial design of shift-adds implementation
of 29x and 43x
Digit Serial Method- Partial Terms
In the pre-processing phase, the constants to be
multiplied by a variable are converted to positive, and
then made odd by successive divisions by 2. The
resulting constants are stored without repetition in the
target set T. Thus, T includes the minimum number of
necessary constants to be implemented. The part of the
algorithm where the implementations of the target
constants and partial terms are found is as follows.
1) Take an element from T, ti , find its representation( s)
under the given number representation, and store it(them)
in a set called S. Form an empty set Oi , associated with
ti , that will include the inputs and the amount of left
shifts at the inputs of all addition/subtraction operations
which generate ti .
2) For each representation of ti in the set S.
a) Compute all nonsymmetric partial term pairs that
cover the representation of ti .
b) In each pair, make each partial term positive and odd,
and determine its amount of left shift.
c) Add each pair to the set Oi with the amount of left
shifts of partial terms.
d) Add each partial term to T, if it does not represent the
input that the constants are multiplied with, i.e.,
denoted by 1, and is not in T.
http://www.ijettjournal.org
Page 30
International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014
3) Repeat Step 1 until all elements of T are considered.
Observe that the target set T only includes the target
constants to be implemented in the beginning of the
iterative loop, and in later iterations it is augmented with
the partial terms that are required for the implementation
of target constants.
All possible implementations of an element in
the target set ti are found by decomposing the nonzero
digits in the representation of, ti , into two partial terms.
As an example, consider 25 as a target constant defined
under MSD, which has two representations 011001 and
101001. All possible implementations of 25 are given in
Fig. 3. Observe from Fig. 3 that the last implementations
of 25 on both representations, i.e., 1 + 3 3, are identical,
therefore one of them can be eliminated. Also, the
duplications of implementations, such as 1 4 + 9 = 9 + 1
4, are not listed in Fig. 6. After the partial terms required
for the implementation of 25 under MSD, i.e., 3, 7, 9, 17,
and 33, are found, they are added to the target set T
without repetition and their implementations are
determined in a similar way.
Construction of the Boolean Network
After all possible implementations of target
constants and partial terms are found, they are
represented in a network that includes only AND and
OR gates. Its properties are given as follows.
Fig. 3. Possible implementations of 25 under MSD
representation
1) The primary input of the network is the input variable
to be multiplied with the constants.
2) An AND gate in the network represents an addition/
subtraction operation and has two inputs.
3) An OR gate in the network represents a target
constant or a partial term and combines all its possible
implementations.
4) The outputs of the network are the
The Boolean network is constructed as follows.
1) Take an element from T, ti .
ISSN: 2231-5381
2) For each pair in Oi , generate a two-input AND gate.
The inputs of the AND gate are the elements of the pair,
i.e., 1, denoting the input that the constants are
multiplied with, or the outputs of OR gates representing
target constants or partial terms in the network.
3) Generate an OR gate associated with ti , where its
inputs are the outputs of the AND gates determined in
Step 2.
4) If ti is a target constant, make the output of the
corresponding OR gate an output of the network.
5) Repeat Step 1 until all elements in T are considered.
Fig. 4. Network constructed for the target constant 25
under MSD.
Numerical example:
Constant =29 (11101)
Constant =43(101011)
Input =5
Theoretical
29*5=145
43*5=215
Shift and add:
(11101)*5=
5<<4 + 5<<3 + 5<<2 + 5
=> 80 + 40 + 20 + 5
=> 145
(101011)*5
5<<5 + 5<<3 + 5<<1 + 5
=>160+40+10+5
=>215
http://www.ijettjournal.org
Page 31
International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014
Fig. 5. Partial product sharing
Fig. 9. RTL Schematic for Exact GB Method
Fig. 6. GB Algorithm
III.
RESULTS & CONCLUSIONS
The Simulation results for the designed error
control circuit are shown in this section.
Fig. 7. Simulation result for digit serial method
Fig. 10. Comparison for Exact GB and DSM
Architectures
Compared with the existing systolic structures
for bit-parallel realization of multiplication over GF,
the proposed one is found to involve less area, shorter
critical-path and lower latency. From ASIC and FPGA
synthesis results we find that the proposed design
involves significantly less ADP and PDP than the
existing designs. Moreover, our proposed design can be
extended to further reduce the latency.
REFERENCES
Fig. 8. Simulation Result for Exact GB Method
ISSN: 2231-5381
1. S. Fenn, M.G. Parker,M. Benaissa, and D.
Taylor, “Bit-serial multiplication in GF using
all-one polynomials,” IEE Proc. Com. Digit.
Tech., vol. 144, no. 6, pp. 391–393, 1997.
2. K.-Y. Chang, D. Hong, and H.-S. Cho, “Low
complexity bit-parallel multiplier for GF defined
by all-one polynomials using redundant
http://www.ijettjournal.org
Page 32
International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014
3.
4.
5.
6.
7.
representation,” IEEE Trans. Computers, vol.
54, no. 12, pp. 1628–1629, Dec. 2005.
C.-Y. Lee, E.-H. Lu, and J.-Y. Lee, “Bit-parallel
systolic multipliers for GF fields defined by allone and equally spaced polynomials,” IEEE
Trans. Computers, vol. 50, no. 6, pp. 385–393,
May 2001
K. K. Parhi, VLSI Digital Signal Processing
Systems: Design and Implementation New
York: Wiley, 1999.
P. K. Meher, “Systolic and non-systolic
scalable modular designs of finite field
multipliers for Reed-Solomon Codec,” IEEE
Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 17, no. 6, pp. 747–757, Jun. 2009
M. Ciet, J. J. Quisquater, and F. Sica, “A secure
family of composite finite fields suitable for fast
implementation of elliptic curve cryptography,”
in Proc. Int. Conf. Cryptol. India, 2001, pp.
108–116.
H. Wu, “Bit-parallel polynomial basis multiplier
for new classes of finite fields,” IEEE Trans.
Computers, vol. 57, no. 8, pp. 1023–1031, Aug.
2008.
Authors Profile:
KOLLA SRUTHI is pursuing her
Master degree M.Tech in VERY
LARGE SCALE INTEGRATION
(VLSI) SYSTEMS in KKR & KSR
Institute of Technology & Science.
She has completed her B.Tech in ECE in RVR&JC
College of engineering.
K.RAJU is working as Associate
Professor in KKR & KSR Institute of
Technology & Science. He has
completed his Master Degree and doing
his PhD. His research area is image
processing.
ISSN: 2231-5381
http://www.ijettjournal.org
Page 33
Download