International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014 A 6X6 Multiplier Based on MCM Method Kolla Sruthi 1 K.Raju 2 1 PG Student (M.Tech), Dept. Of ECE, KKR & KSR Institute of Technology & Sciences, Guntur 2 Associate Professor, Dept. Of ECE, KKR & KSR Institute of Technology & Sciences, Guntur Abstract— The digit-based recoding technique does not exploit the sharing of common partial products, which allows great reductions in the number of operations and, consequently, in area and power dissipation of the MCM design at the gate level. Hence, the fundamental optimization problem, called the MCM problem, is defined as finding the minimum number of addition and subtraction operations that implement the constant multiplications. Note that, in bit-parallel design of constant multiplications, shifts can be realized using only wires in hardware without representing any area cost. The main idea of digit serial technique is to find the terms which are common between different constants and decreasing the number of repeated operations. There are some algorithms in the literature which deal with CSE and in most of them there are three main steps involved: Identifying the presence of multiple patterns in the input matrix of coefficients, selecting one pattern for elimination and eliminating all occurrences of the selected pattern. This process is iteratively repeated until there are no more multiple patterns present. The run time and quality of the solution are important metrics in these algorithms. Keywords— Galois Field, multiplier, Multiple Constant Multiplications (MCM), Polynomial, Finite field. I. INTRODUCTION Finite impulse response (FIR) filters are of great importance in digital signal processing (DSP) systems since their characteristics in linear-phase and feedforward implementations make them very useful for building stable high-performance filters. Although architectures have similar complexity in hardware, the transposed form is generally preferred because of its higher performance and power efficiency [1]. The multiplier block of the digital FIR filter in its transposed form, where the multiplication of filter coefficients with the filter input is realized, has significant impact on the complexity and performance of the design because a large number of constant multiplications are required. This is generally known as the multiple constant multiplications (MCM) operation and is also a central operation and performance bottleneck in many other DSP systems such as fast Fourier transforms, discrete cosine transforms (DCTs), and error-correcting codes. Although area-, delay-, and power-efficient multiplier architectures, such as Wallace and modified Booth multipliers, have been proposed, the full flexibility of a multiplier is not necessary for the constant multiplications, since filter coefficients are ISSN: 2231-5381 fixed and determined beforehand by the DSP algorithms. Hence, the multiplication of filter coefficients with the input data is generally implemented under a shift adds architecture, where each constant multiplication is realized using addition/subtraction and shift operations in an MCM operation. The digit-based recoding technique does not exploit the sharing of common partial products, which allows great reductions in the number of operations and, consequently, in area and power dissipation of the MCM design at the gate level. Hence, the fundamental optimization problem, called the MCM problem, is defined as finding the minimum number of addition and subtraction operations that implement the constant multiplications. Note that, in bit-parallel design of constant multiplications, shifts can be realized using only wires in hardware without representing any area cost. Common subexpression elimination (CSE) algorithm In computer science, common subexpression elimination (CSE) is a compiler optimization that searches for instances of identical expressions (i.e., they all evaluate to the same value), and analyses whether it is worthwhile replacing them with a single variable holding the computed value. http://www.ijettjournal.org Page 27 International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014 Graph-Based (GB) Techniques Graph dependence algorithms try to reduce the number of adders to implement the multipliers and CSE is a technique that searches for instances of identical subexpressions and analyses whether it is worthwhile replacing them with a single variable holding the computed value. An efficient solution of these problems can yield significant implementation area, power consumption, and time. The disadvantages in existing system are multiple constant multiplication problem and The optimization of the multiplications can lead to important improvements in various design parameters like area or power consumption The main idea of digit serial technique proposed in this paper is to find the terms which are common between different constants and decreasing the number of repeated operations. There are some algorithms in the literature which deal with CSE and in most of them there are three main steps involved: Identify the presence of multiple patterns in the input matrix of coefficients. Select one pattern for elimination. Eliminate all occurrences of the selected pattern. This process is iteratively repeated until there are no more multiple patterns present. The run time and quality of the solution are important metrics in these algorithms. Multipliers usually have large area and power and multiplication is expensive in hardware. In MCM the values of the constants are known beforehand. Hence, multiplication can be implemented by sequence of additions and shifts. Suppose we want to compute 23*X. the binary representation of 21 is 10101. So, instead of multiplication we can compute 21X as below: 21*X = (10101)2*X = X + X<<2 + X<<4 In this case, the complexity of the implementation is directly related to the number of nonzero digits in the constant representation. There are also some techniques such as signed digit representations by which the number of non-zero digits can be reduced. Among all approaches which are available for reducing the number of non-zero digits, Canonical Signed Digit (CSD) has the least number of non-zero digits. ISSN: 2231-5381 The goal of this project is to implement some previous works related to CSE [1,2,3] and apply some improvements to these algorithms or devise a new algorithm and implement it to perform such an optimization and compare to related works. By this representation, common sub-expression elimination (CSE) is to find the common patterns in binary representation of the constants. Consider the following example: 21*X = (10101)2*X = X + X<<2 + X<<4 13*X = (1101)*X = X + X<<2 + X<<3 To implement these operations, we need four add operations and four shift operations. However, if we extract the common digit pattern “101” then we need three shift operations and three add operations. F0 = (101)2*X = X + X<<2 F1 = 21*X = (10101)2*X = F0 + X<<4 F2 = 13*X = (1101)*X = F0 + X<<3 II. MATHEMATICAL MODELLING OF MCM For the shift-adds implementation of constant multiplications, a straightforward method, generally known as digit based recoding [6], initially defines the constants in binary. Then, for each “1” in the binary representation of the constant, according to its bit position, it shifts the variable and adds up the shifted variables to obtain the result. As a simple example, consider the constant multiplications 29x and 43x. Their decompositions in binary are listed as follows: which requires six addition operations as illustrated in Fig. 1(a). However, the digit-based recoding technique does not exploit the sharing of common partial products, which allows great reductions in the number of operations and, consequently, in area and power dissipation of the MCM design at the gate level. Hence, the fundamental optimization problem, called the MCM problem, is defined as finding the minimum number of addition and subtraction operations that implement the http://www.ijettjournal.org Page 28 International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014 constant multiplications. Note that, in bit-parallel design of constant multiplications, shifts can be realized using only wires in hardware without representing any area cost. Fig. 1. Shift-adds implementations of 29x and 43x. (a) Without partial product sharing [6] and with partial product sharing. (b) Exact CSE algorithm . (c) Exact GB algorithm The algorithms designed for the MCM problem can be categorized in two classes: common subexpression elimination (CSE) algorithms and graphbased (GB) techniques. The CSE algorithms initially extract all possible subexpression from the representations of the constants when they are defined under binary, canonical signed digit (CSD) [7], or minimal signed digit (MSD) [8]. Then, they find the “best” subexpression, generally the most common, to be shared among the constant multiplications. The GB methods are not limited to any particular number representation and consider a larger number of alternative implementations of a constant, yielding better solutions than the CSE algorithms.. Returning to our example in Fig. 1, the exact CSE algorithm gives a solution with four operations by finding the most common partial products 3x = (11)binx and 5x = (101)binx when constants are defined under binary, as illustrated in Fig. 2(b). On the other hand, the exact GB algorithm [2] finds a solution with the minimum number of operations by sharing the common partial product 7x in both multiplications, as shown in Fig. 1(c). Note that the partial product 7x = (111)binx cannot be extracted from the binary representation of 43x in the exact CSE algorithm [9]. ISSN: 2231-5381 However, all these algorithms assume that the input data x is processed in parallel. On the other hand, in digit-serial arithmetic, the data words are divided into digit sets, consisting of d bits, that are processed one at a time [3]. Since digit serial operators occupy less area and are independent of the data word length, digit-serial architectures offer alternative low complexity designs when compared to bit-parallel architectures. However, the shifts require the use of D flip-flops, as opposed to the bit-parallel MCM design where they are free in terms of hardware. Hence, the high-level algorithms should take into account the sharing of shift operations as well as the sharing of addition/subtraction operations in digit-serial MCM design. Furthermore, finding the minimum number of operations realizing an MCM operation does not always yield an MCM design with optimal area at the gate level [4]. Hence, the high-level algorithms should consider the implementation cost of each digit-serial operation at the gate level. Since there are still instances which the exact CSE algorithm cannot handle, we describe the approximate GB algorithm [6] that iteratively finds the “best” partial product which leads to the optimal area in digit-serial MCM design at the gate level. This paper also introduces a computer-aided design (CAD) tool called SAFIR which generates the hardware descriptions of digit-serial MCM operations and FIR filters based on a design architecture and implements these circuits using a commercial logic synthesis tool. In SAFIR, the digitserial constant multiplications can be implemented under the shiftadds architecture, and also can be designed using generic digit serial constant multipliers Canonical Number Representation In computing canonical-signed-digit (CSD) is a number system for encoding a floating-point value in a two's complement representation. This encoding contains 33% fewer non-zeros than the two's complement form, leading to efficient implementations of add/subtract networks in hardwired Digital signal processing. The representation uses a sequence of one or more of the symbols, -1, 0, +1 (alternatively -, 0 or +) with each position possibly representing the addition or subtraction of a power of 2. For instance 23 is represented as +0-00-, which expands to +2^5 -2^3 -2^0 or 32 - 8 - 1 = 23 http://www.ijettjournal.org Page 29 International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014 CSD is obtained by transforming every sequence of zero followed by ones (011...1) into + followed by zeros and the least significant bit by (+0....0-). Any n digit number in CSD has at most (n+1)/2 nonzero digits and, on average, the number of nonzero digits is reduced by 33% when compared to binary. The MSD representation [8] is obtained by dropping the first property of the CSD representation. Thus, a constant may have several representations under MSD, including its CSD representation, but all with a minimum number of nonzero digits. Consider the constant 23 defined in six bits. Its binary representation 010111 includes four nonzero digits. It is represented as 101001 in CSD, and both 101001 and 011001 denote 23 in MSD using three nonzero digits (where 1 stands for −1). Digit-Serial Arithmetic In digit-serial designs, the input data is divided into d bits and processed serially by applying each d-bit data in parallel. The special cases, called bit-serial and bit-parallel, occur when the digit size d is equal to 1 and equal to input data wordlength, respectively. The digitserial computation plays a key role when the bit-serial implementations cannot meet the delay requirements and the bit-parallel designs require excessive hardware. Thus, an optimal tradeoff between area and delay can be explored by changing the digit size d. The basic digit-serial arithmetic operations can be found. The digit-serial addition, subtraction, and left shift operations are depicted in Fig. 2 using a digit size d equal to 2, where the bits with index 0 and 1 denote the least significant bit (LSB) and the most significant bit (MSB), respectively. Notice from Fig. 4(a) that a digitserial addition operation, in general, requires d full adders (FAs) and one D flip-flop. The subtraction operation [Fig. 2(b)] is implemented using 2’s complement, requiring the initialization of the D flipflop with 1 and additional d inverter gates with respect to the digitserial addition operation. In a left shift ISSN: 2231-5381 operation [Fig. 4(c) and (d)], the number of required D flip-flops is equal to the shift amount and is realized in d layers (one for each bit). The input–output correspondence and the number of flip-flops cascaded serially for each input at each layer of the digit-serial left shift operation are given in (3) and (4), respectively, where i ranges from 0 to d−1 and ls denotes the amount of left shift Fig. 2. Digit-serial design of shift-adds implementation of 29x and 43x Digit Serial Method- Partial Terms In the pre-processing phase, the constants to be multiplied by a variable are converted to positive, and then made odd by successive divisions by 2. The resulting constants are stored without repetition in the target set T. Thus, T includes the minimum number of necessary constants to be implemented. The part of the algorithm where the implementations of the target constants and partial terms are found is as follows. 1) Take an element from T, ti , find its representation( s) under the given number representation, and store it(them) in a set called S. Form an empty set Oi , associated with ti , that will include the inputs and the amount of left shifts at the inputs of all addition/subtraction operations which generate ti . 2) For each representation of ti in the set S. a) Compute all nonsymmetric partial term pairs that cover the representation of ti . b) In each pair, make each partial term positive and odd, and determine its amount of left shift. c) Add each pair to the set Oi with the amount of left shifts of partial terms. d) Add each partial term to T, if it does not represent the input that the constants are multiplied with, i.e., denoted by 1, and is not in T. http://www.ijettjournal.org Page 30 International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014 3) Repeat Step 1 until all elements of T are considered. Observe that the target set T only includes the target constants to be implemented in the beginning of the iterative loop, and in later iterations it is augmented with the partial terms that are required for the implementation of target constants. All possible implementations of an element in the target set ti are found by decomposing the nonzero digits in the representation of, ti , into two partial terms. As an example, consider 25 as a target constant defined under MSD, which has two representations 011001 and 101001. All possible implementations of 25 are given in Fig. 3. Observe from Fig. 3 that the last implementations of 25 on both representations, i.e., 1 + 3 3, are identical, therefore one of them can be eliminated. Also, the duplications of implementations, such as 1 4 + 9 = 9 + 1 4, are not listed in Fig. 6. After the partial terms required for the implementation of 25 under MSD, i.e., 3, 7, 9, 17, and 33, are found, they are added to the target set T without repetition and their implementations are determined in a similar way. Construction of the Boolean Network After all possible implementations of target constants and partial terms are found, they are represented in a network that includes only AND and OR gates. Its properties are given as follows. Fig. 3. Possible implementations of 25 under MSD representation 1) The primary input of the network is the input variable to be multiplied with the constants. 2) An AND gate in the network represents an addition/ subtraction operation and has two inputs. 3) An OR gate in the network represents a target constant or a partial term and combines all its possible implementations. 4) The outputs of the network are the The Boolean network is constructed as follows. 1) Take an element from T, ti . ISSN: 2231-5381 2) For each pair in Oi , generate a two-input AND gate. The inputs of the AND gate are the elements of the pair, i.e., 1, denoting the input that the constants are multiplied with, or the outputs of OR gates representing target constants or partial terms in the network. 3) Generate an OR gate associated with ti , where its inputs are the outputs of the AND gates determined in Step 2. 4) If ti is a target constant, make the output of the corresponding OR gate an output of the network. 5) Repeat Step 1 until all elements in T are considered. Fig. 4. Network constructed for the target constant 25 under MSD. Numerical example: Constant =29 (11101) Constant =43(101011) Input =5 Theoretical 29*5=145 43*5=215 Shift and add: (11101)*5= 5<<4 + 5<<3 + 5<<2 + 5 => 80 + 40 + 20 + 5 => 145 (101011)*5 5<<5 + 5<<3 + 5<<1 + 5 =>160+40+10+5 =>215 http://www.ijettjournal.org Page 31 International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014 Fig. 5. Partial product sharing Fig. 9. RTL Schematic for Exact GB Method Fig. 6. GB Algorithm III. RESULTS & CONCLUSIONS The Simulation results for the designed error control circuit are shown in this section. Fig. 7. Simulation result for digit serial method Fig. 10. Comparison for Exact GB and DSM Architectures Compared with the existing systolic structures for bit-parallel realization of multiplication over GF, the proposed one is found to involve less area, shorter critical-path and lower latency. From ASIC and FPGA synthesis results we find that the proposed design involves significantly less ADP and PDP than the existing designs. Moreover, our proposed design can be extended to further reduce the latency. REFERENCES Fig. 8. Simulation Result for Exact GB Method ISSN: 2231-5381 1. S. Fenn, M.G. Parker,M. Benaissa, and D. Taylor, “Bit-serial multiplication in GF using all-one polynomials,” IEE Proc. Com. Digit. Tech., vol. 144, no. 6, pp. 391–393, 1997. 2. K.-Y. Chang, D. Hong, and H.-S. Cho, “Low complexity bit-parallel multiplier for GF defined by all-one polynomials using redundant http://www.ijettjournal.org Page 32 International Journal of Engineering Trends and Technology (IJETT) – Volume 18 Number1- Dec 2014 3. 4. 5. 6. 7. representation,” IEEE Trans. Computers, vol. 54, no. 12, pp. 1628–1629, Dec. 2005. C.-Y. Lee, E.-H. Lu, and J.-Y. Lee, “Bit-parallel systolic multipliers for GF fields defined by allone and equally spaced polynomials,” IEEE Trans. Computers, vol. 50, no. 6, pp. 385–393, May 2001 K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation New York: Wiley, 1999. P. K. Meher, “Systolic and non-systolic scalable modular designs of finite field multipliers for Reed-Solomon Codec,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 6, pp. 747–757, Jun. 2009 M. Ciet, J. J. Quisquater, and F. Sica, “A secure family of composite finite fields suitable for fast implementation of elliptic curve cryptography,” in Proc. Int. Conf. Cryptol. India, 2001, pp. 108–116. H. Wu, “Bit-parallel polynomial basis multiplier for new classes of finite fields,” IEEE Trans. Computers, vol. 57, no. 8, pp. 1023–1031, Aug. 2008. Authors Profile: KOLLA SRUTHI is pursuing her Master degree M.Tech in VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS in KKR & KSR Institute of Technology & Science. She has completed her B.Tech in ECE in RVR&JC College of engineering. K.RAJU is working as Associate Professor in KKR & KSR Institute of Technology & Science. He has completed his Master Degree and doing his PhD. His research area is image processing. ISSN: 2231-5381 http://www.ijettjournal.org Page 33