Proc. of Int. Conf. on Recent Trends in Signal Processing, Image Processing and VLSI, ICrtSIV Design and Implementation of Binary Signed Vedic Multiplier for Real Time Digital Signal Processing Applications Sunil Kumar K B1 and Ganesh V N2 1 MITE/M.Tech 3rd Semester, Moodabidare, India Email: sunilkumarkb7@gmail.com 2 MITE /Assistant Professor, Department of ECE, Moodabidare, India Email: ganesh.vnaik@gmail.com Abstract— In any digital signal processors (DSP’s) the speed of the multipliers plays a very important role. The precision of the multipliers also plays very important role along with its speed. For many high performance systems such as Digital Signal Processors, Microprocessors etc Multipliers are the key components. Since the Floating point multipliers can provide required precision but they are relatively slow and tend to consume more silicon area compared to fixed point (Q-format) multipliers. Since the Multipliers are the slowest component in any system, the system’s performance is determined by multiplier’s performance. In this paper we propose a fast fixed point signed multiplication using Urdhava Tiryakbhyam method of Vedic Mathematics. Index Terms— Fractional fixed point, Intellectual Property, Q-format, Vedic Mathematics, Urdhava Tiryakbhyam. I. INTRODUCTION Vedic Mathematics derived from the “Vedas” the ancient Indian scriptures which means the source of Knowledge. Vedas covers all the forms of mathematic computations, be it algebra, geometry or trigonometry. The significant feature of Vedic Mathematics is, its algorithms are designed in the way our mind naturally works. This makes the fastest and easiest way to perform any mathematical calculations mentally. Vedic Mathematics is created around 1500 BC and was rediscovered by Sri Bharthi Krishna Tirthaji (1884-1960) around 1911 to 1918. He was a well known philosopher, Sanskrit scholar and a mathematician [1]. He classified and organized the entire Vedic Mathematics into 16 formulae. And these 16 formulae are also called as sutras. These 16 formulae forms the backbone of Vedic mathematics. A lot of research is been carried out these years to implement these algorithms on digital processors. Because of symmetry and coherence in these algorithms it consumes less area and can have a regular silicon layout [2,3]. Basically signal processing algorithms are implemented using high level languages like C or Matlab using floating point number representations. The algorithms used for mapping architecture using floating point representation is more expensive as it consumes a more hardware. Hence Fixed point number representation which is a good option to implement the architecture at silicon level. Therefore we focus our work for multiplication operation using optimized hardware modules which the most used operation in signal processing applications such as Image processing systems, Optical signal processing, Fourier transforms etc. DOI: 03.AETS.2014.5.206 © Association of Computer Electronics and Electrical Engineers, 2014 Considering the 16 bit Q15 format and 32 bit Q31 format fixed point representations which are best suited for implementation on processors for digital signal processing applications with the required precision. The advantage of Q-format over floating point multipliers is in Q-format, fraction multiplication is carried out using integer multipliers which consumes very less silicon area and very fast. In this paper we propose the Urdhava Tiryakbhyam method of vedic mathematics for the implementation of Q-format [6] high speed multiplier. And we also implement multipliers using normal booth algorithm [7] and Xilinx parallel multiplier Intellectual Property and compare the speed or maximum frequency of these multipliers. II. FIXED-POINT ARITHMETIC A N-bit number fixed point number [6] can be interpreted as either an fractional or a integer number. Fixed point integer is difficult to use in the processors due to its possible overflow problem. For e.g. In a 16-bit processor the dynamic range for signed integers is from - 215 to 215-1 i.e. 32768 to 32767. If we multiply 600 with 800, the result is 48000 which is an overflow. To overcome this situation we use fractional fixed point representation also called as Q-format representation. A. Fixed Point Multiplication In general any Q-format representation is denoted by Qm,n, where m is the number of bits to represent integer, n is number of bits to represent fractional part and the total number of bits is given by N = m+n+1 for signed numbers. For e.g. Q 4,11 format signifies a total of 16 bits are required to represent fractional number in which 4 bits are reserved for the integer part and 11 bits for the fractional part and 1 bit indicates the Sign. Special cases of Q-format consist of zero bits to indicate the integer part. Q0,15 (Q15) and Q0,31 (Q31) are the two such formats for 16 bit and 32 bit representations respectively. Since there is no integer part the fractional number lie between the range 1 and -1. An N-bit number in Qm,n format is represented as follows [6]. an+m an+m-1… … … an.an-1 … … a1 a0 Here the ‘.’ an.an-1 represents the fixed point and the value of [1] is given by, (an+m 2N-1+ an+m-1 2 N-2… … …+ a2 22+a1 2+a0)2-n If we want to convert a fractional number in the range of the desired Qm,n format we should multiply it with 2n. The resultant value is rounded off or truncated to the nearest integer. Therefore a small amount of precision loss is involved which reduces as the number of bits representing the fractional part increases. We prefer rounding technique, since its error bias in positive and negative direction is same [6]. Therefore rounded value will be more precise. In general a Qm,n format has a resolution of 2-n and its dynamic range lies between -2 m to 2m - 2-n. Therefore as the number of fractional bits increases the resolution increases and as the number of integer bits increases the dynamic range increases. The resolution of Q15 format is 2-15. B. Q - Format Multiplication When two Q15 numbers are multiplied their product will be 32 long as illustrated in the Fig. 1. The product consists of a extended sign bit or redundant bit. Since the memory of product to be stored should be a Q15 number we shift the product by one bit to left and the MSB 16 bits (including redundant sign bit) is stored in the memory. Fig. 1. Illustrates the multiplication of the two Q15 format numbers. Therefore, with the Q-format representation, multiplications of the two fractional numbers are carried out using the integer multiplication. Integer multiplications are faster compared to floating point multipliers and also consume less area which is the major advantage of the Q-format representation. III. URDHAVA TIRYAKBHYAM METHOD Urdhava Tiryakbhyam [2] is a Sanskrit word which means crosswire and vertical in English. This method is a general multiplication formula and can be applied to all the cases of multiplication. In this method concurrently all the partial products are generated. Fig. 2 demonstrates a 4 x 4 binary multiplication using Urdhava Tiryakbhyam method. 8 This multiplier based is on the Urdhava Tiryakbhyam algorithm of the vedic mathematics is independent of the speed of the processor because their sums and partial products are calculated in parallel. And also by increasing the data bus widths of input and output since they are regular in structure we can increase the Figure 1. Product of two Q15 format numbers in Q15 format itself Figure 2. Multiplication of two 4-bit numbers in urdhava tiryakbhyam method processing power. Due to its regular structure, it consumes less area [3] and can be layout into a silicon chip easily. The area and gate delay increases very slowly with the increase in the number of input bits as compared to other multipliers. Hence Urdhava Tiryakbhyam multiplier is space, power and time efficient. The line diagram in fig. 2 illustrates multiplying of the two 4-bit binary numbers a3a2a1a0 and b3b2b1b0. Initially in step1 of Fig. 2, LSB of the multiplier is multiplied with LSB of the multiplicand (vertically multiplied). The result we get will be the LSB of the product. In step2 the next bit of the multiplier is multiplied with the least significant bit of the multiplicand and the least significant bit of the multiplier is multiplied with the next bit of the multiplicand (crosswire multiplied). The two partial products are added and the least significant bit of the sum is the next bit of the product and remaining bits are carried forward to the next step. For example, in one of the step, if we get the 1010 as the resultt, then 0 will act as the resultant bit (referred as fn ) and 101 as the carry bits (referred as Cn ). Hence Cn may be a multi-bit number. The sums and partial products for every step can be parallel calculated. The significant feature is, the switching instances of a processor increases with the increase in the operating frequency of the processor. This results in the higher device operating temperatures due to the more power dissipation and also consumption in the form of heat. Scalability is the another advantage of this multiplier. For each step in fig. 2 the corresponding expressions are as follows: f0 = a0b0. (1) c1f1 = a1b0 + a0b1. (2) c2f2 = c1 + a2b0 + a1b1 + a0b2. (3) 9 c3f3 = c2 + a3b0 + a2b1 + a1b2 + a0b3. (4) c4f4 = c3 + a3b1 + a2b2 + a1b3. (5) c5f5 = c4 + a3b2 + a2b3. (6) c6f6 = c5 + a3b3. (7) With c6f6f5f4f3f2f1f0 being the final result [5]. Hence this mathematical formula is generally applicable for all cases of multiplication. For the multiplication of two 8-bit numbers using 4-bit multiplier we proceed as follows. Consider XHXL and YHYL as the two 8-bit numbers, where XH and YH corresponds to the MSB 4 bits, XL and YL are the LSB 4 bits of an 8-bit number. When these two numbers are multiplied according to Vedic algorithm (vertically and crosswire) method, we get, XH XL YH YL ______________________ (XH * XL) + (XH * YL + YH* XL) + (XL * BL) Thus we need two adders to add the partial products, four 4-bit multipliers and an 4-bit intermediate carry is generated. The product of 4 x 4 bit multiplier is 8 bits long, the 4 MSB bits in each step is carried to the next step and 4 LSB bits corresponds to the final product..This process is continued for 3 steps for this case. Similarly, a 16 bit multiplier has two 16 bit adders and four 8 x 8 multipliers with 8 bit carry. Hence the modularity of this multiplier is very high and leads to the scalability and regularity of the multiplier layout. IV. ARCHITECTURE Our design of Q-format signed multiplier includes Urdhava Tiryakbhyam integer multiplier [4] with certain modifications. Since concurrently all the partial products are computed the operating speed of this multiplier is very fast compared to other multipliers. The product of the 16-bit Q15 multiplier is 16 bits long which is also a Q15 number. Now, if the most significant bit of input is 1 then the number is a negative number. Hence before proceeding with the multiplication the 2’s complement of the number is taken. Since the most significant bit signifies the sign of the number it is neglected and a ‘0’ is placed in most significant bit during multiplication. A 16-bit Q15 format multiplier is as shown in Fig. 3, the resulting product is 32 bits long, which consists of four 8 x 8 Urdhava multipliers. But the product of a Q15 number must also be a Q15 number which should be of 16 bits long. Hence 1 bit is left shifted to remove the redundant sign bit in the 32 bit product. And only the MSB 16 bits of the product are considered which forms final product. If the output is ‘1’ then the 16 bit final result is converted to its 2’s complement format indicating that the product is negative. To determine the sign of the result an xor operation is performed on input sign bits. Fig. 3. Q15 format multiplier’s Architecture 10 V. IMPLEMENTATION AND RESULTS The proposed Urdhava Tiryakbhyam Q-format multiplier is designed using structural form of coding and VHDL. The 4 x 4 Urdhava Tiryakbhyam integer multiplier which is made up of two 2x2 multiplier blocks which forms the basic block of Q15 multiplier. The clock is used to completely synchronize the design. Further, normal booth’s multiplier and Xilinx parallel multiplier Intellectual Property (IP) generated by Xilinx Core Generator which is optimized for speed with no pipelining stage were also implemented and compared with this multiplier. A. Simulation Results The design was simulated using Xilinx ISE 14.2 version. For the multiplication of Q15 format shown in Fig.4, Input1 = -0.75 = 1010 0000 0000 0000 Input2 = -0.25 = 1100 0000 0000 0000 Output = 0.1875 = 0001 1000 0000 0000 Fig. 4. Simulation results of Q15 multiplier TABLE I: C OMPARISON OF 16 BIT Q15-FORMAT M ULTIPLIERS Maximum Frequency (in MHz) 6 – input Slice LUT Usage Urdhava Multiplier 236.18 Xilinx Parallel Multiplier IP 168.77 434/19200 Normal Booth Multiplier 128.35 718/19200 662/19200 Factor which Urdhava Multiplier is faster 1.40 times 1.84 imes The speed of Urdhava Qformat multiplier is faster than the booth multiplier by a factor of 1.84. For a Q15 format multiplier, the speed is improved by a factor of 1.40 compared to Xilinx parallel multiplier IP and booth multiplier respectively. VI. CONCLUSION This paper proposed a architecture of fast multiplier for multiplications of signed Q-format numbers using Urdhava Tiryakbhyam algorithm of Vedic mathematics. The proposed multiplier speeds up the multiplication operation compared to other multipliers which is the basic hardware block as the Q-format representation, which is widely used in Digital Signal Processors. They are faster than the booth multipliers. It is also shown that the Urdhava Tiryakbhyam Q-format multiplier is faster than Xilinx parallel multiplier Intellectual Property (IP) using slice LUT’s and also using DSP48E blocks which are meant specifically for digital signal processing operations like multiply-accumulate, multiply-add etc. Therefore the Q-format multiplier based on the Urdhava Tiryakbhyam algorithm is best suited for faster multiplications required in the digital signal processing applications. 11 REFERENCES [1] Jagadguru Swami Sri BharatiKrisnaTirthaji Maharaja, “VedicMathematics: Sixteen Simple Mathematical Formulae fromthe Veda,” MotilalBanarasidas Publishers, Delhi, 2009, pp. 5-45. [2] H. Thapliyal and M. B. Shrinivas and H. Arbania, “Design and Analysisof a VLSI Based High Performance Low Power Parallel Square Architecture,” Int. Conf. Algo.Math.Comp. Sc., Las Vegas, June 2005, pp. 72-76. [3] HimanshuThapliyal and M. B. Srinivas,“An efficient method of elliptic curve encryption using Ancient Indian Vedic Mathematics,”48th IEEE International Midwest Symposium on Circuits and Systems, 2005, vol. 1, pp. 826828. [4] M. Pradhan and R. Panda, “Design and Implementation of Vedic Multiplier,” A.M.S.E Journal, Computer Science and Statistics, Francevol. 15, July 2010, pp. 1-19. [5] Harpreet Singh Dhillon ,AbhijitMitra, “A Reduced-Bit Multiplication Algorithm for Digital Arithmetics” International Journal of Computational and Mathematical Sciences, Spring 2008, pp.64-69. [6] Sen-Maw Kuo and Woon-SengGan, “Digital Signal Processor, architectures, Implementations and applications,” Pearson PrenticeHall, 2005, pp. 253-323 [7] A.D. Booth, “A Signed Binary Multiplication Technique”, Qrt. J. Mech. App. Math., vol. 4, 1951, pp. 236–240. 12