International Journal of Engineering Trends and Technology (IJETT) – Volume 31 Number 3 – January 2016
ISSN: 2231-5381, http://www.ijettjournal.org

FPGA Based Antilog Computation Unit with Novel Shifter

Swapna Kalyani P#1, Lakshmi Priyanka S*2, Santhosh S Kiran K#3, Murali Krishna P*4
# M.Tech Scholar, VLSI ES, KIETW, India
* Asst. Professor, Dept. of ECE, KIETW, India

Abstract- Technology demands continual improvements in area, speed and power. Modern FPGAs are well suited to implementing complex applications and provide optimized designs at minimal cost when compared to ASICs. This project presents an efficient architecture for fixed point binary antilogarithmic computation that uses a piecewise linear approximation method to generate approximation coefficients that work for both positive and negative real numbers. A Logarithmic Number System (LNS) based implementation is less complex, has a small gate count and a high operational speed. In the proposed design, a novel shifter performs the required number of shifts in either the left or the right direction and generates the integer and fractional parts separately. IP cores are utilized to serve various tasks. Error analysis shows that the proposed design achieves percentage errors of 0.07 and 0.34 for positive and negative numbers, respectively. The design is implemented on a Xilinx Virtex-5 xc5vfx70t device and a maximum operational frequency of 139.548 MHz is achieved.

Keywords- FPGAs, ASICs, LNS, Antilogarithm, Piecewise linear, IP Cores

I. INTRODUCTION
Present day embedded applications such as signal, image and video processing demand implementations with small area and high accuracy. Implementing complex arithmetic functions such as power, square root and division in VHDL using a floating-point number format consumes considerable area and power and is slow [1-3], whereas a fixed point format, which has a simple datapath, is well suited to such applications because of its fractional accuracy.
Fixed point datapath circuits are also fast and consume less area and power [2-3]. LNS is best suited for such implementations [3-6]. FPGAs are preferred over ASICs because of their shorter design time and time to market, which makes them cost-effective. The latest FPGAs have built-in components such as multipliers, adders, memories, communication and networking devices, mathematical models, etc. With a fixed-point number format, implementing complex arithmetic functions becomes easy using the logarithmic number system.

To perform simple arithmetic operations, the input numbers are first converted into their log equivalents. The result is then converted back to its original form using antilogarithmic conversion.

Fig. 1 Arithmetic Computations using LNS (input data to output data)

Arithmetic simplicity is achieved at the cost of a conversion overhead, which is very small for many embedded applications. Antilog approximation without any hardware implementation is presented in [7-8]. A 16-bit CMOS based antilog converter architecture that works only for positive numbers is discussed in [9]. A 32-bit antilog converter that uses fewer regions for approximation was presented in [10]. In [11], the architecture generates the antilog of a given number as a single 32-bit output that includes both the integer part and the fraction part, with an error percentage of 0.16 for positive and 0.8 for negative numbers.

The proposed design uses piecewise linear approximation, with the approximation coefficients calculated by a curve fitting method. The architecture for generating the antilog of the fraction part is quite similar to that of [11], but it uses a new shifter design that generates two separate outputs, one for the integer part and one for the fraction part, in contrast with the barrel shifter, which generates a single mixed output for integer and fraction.
As a result of the separate outputs, more bits are available to represent each part; hence, the error is reduced by more than 50%. The design is implemented on a Xilinx Virtex-5 xc5vfx70t device. The architecture uses off-the-shelf components such as multipliers and adders.

II. APPROXIMATION APPROACH
A fixed point number format is used to represent the antilogarithmic approximation coefficients. A piecewise linear approximation method is used to approximate the coefficients.

A. Fixed point Number Format
An architecture for fixed point arithmetic is less complex than one for floating point. Hence, it occupies less area and consumes less power. The proposed architecture uses a Q1.4.16 format as shown in Fig. 2. In that format, '1', '4' and '16' denote the number of bits allotted for the sign, the integer part and the fraction part, respectively.

Fig. 2 Fixed point number format (X[20]: sign bit; X[19:16]: 4-bit integer part, weights 2^3 to 2^0; X[15:0]: 16-bit fraction part, weights 2^-1 to 2^-16)

B. Computation of Approximation Coefficients
Piecewise linear approximation is well suited to an area efficient implementation. Let X be a 21-bit binary number (X[20:0]) with a Q notation of 1.4.16 in fixed point format, in the range $2^{-16} \le X < 2^{4}$. Any number X can be expressed as an integer part plus a fraction part (e.g. 8.56 = 8 + 0.56). Let the integer part be denoted by k and the fraction part by f. If the MSB X[20] = 0, the input number is positive; otherwise it is negative. Based on the fixed-point number format, the antilogarithmic value is computed as given in (1):

$$\mathrm{Antilog}(X) = 2^{X} = 2^{k} \cdot 2^{f} \qquad (1)$$

Fig. 3 Architecture for Antilog Approximation Unit (blocks: SEL MUX(U), SEL MUX(L), FPA, SHIFTER)

The fractional data (f) is approximated in the range 0 ≤ f < 1. The k and f values depend on the sign bit of the given number. For a positive number, the sign bit is 0 and the k and f values remain unchanged. For a negative number, the sign bit is 1 and the fractional exponent −f goes out of this range.
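As a worked illustration of the out-of-range issue, consider X = −2.28, so that k = 2 and f = 0.28. The input value is the one used in the results of this paper; the rewriting is only the standard exponent identity, and the decimal values are rounded:

```latex
\begin{align*}
2^{-2.28} &= 2^{-2} \cdot 2^{-0.28} && \text{fractional exponent } -0.28 \notin [0, 1) \\
          &= 2^{-3} \cdot 2^{0.72}  && \text{fractional exponent } 0.72 \in [0, 1) \\
          &\approx 0.125 \times 1.64718 \approx 0.20590
\end{align*}
```

Moving one factor of 2 from the fractional term to the integer term brings the fractional exponent back into the range over which the curve is approximated.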
Hence, k is decremented by 1 and f is subtracted from 1 as shown in (2), where the integer exponent of a negative input changes from −k to −(k+1):

$$2^{X} = 2^{-(k+f)} = 2^{-(k+1)} \cdot 2^{(1-f)} \qquad (2)$$

The antilog of a given input number X can be calculated theoretically from (1). An architecture for computing it requires an approximation method whose output is as close as possible to the theoretical value. The range 0 ≤ f < 1 is divided into regions and, within each region, the curve is replaced by a straight line. Hence, the approximation is termed piecewise linear. It is used to approximate the f vs $2^{f}$ curve:

$$2^{f} = m_i \cdot f + c_i \qquad (3)$$

where 0 ≤ i ≤ 7 indexes the eight piecewise linear regions, and $m_i$ and $c_i$ are the approximation coefficients to be computed for all values of i. Hence, eight sets of m and c values ($m_0$ & $c_0$ to $m_7$ & $c_7$) are obtained. These coefficients may be generated in many ways. In this design, the curve fitting toolbox (cftool) in MATLAB is used to generate the approximation coefficients. Therefore, the obtained values are very accurate. Each set of m (Q1.7) and c (Q1.10) values is combined into a 19-bit word and stored in one of the eight locations of a 19 x 8 ROM. The ROM content is shown in Table I.

TABLE I: ROM CONTENT
Location   ROM Address   m           c
0          000           010111000   1111111111
1          001           011001010   1111110110
2          010           011011100   1111100100
3          011           011110000   1111000110
4          100           100011100   1101011111
5          101           100110110   1100010010
6          110           100110110   1100010010
7          111           101010010   1010101111

These m and c values are then converted into their hexadecimal equivalents, written to a ".coe" file and loaded into the ROM IP Core, which supplies them to the fixed point multiplier and the fixed point adder for the computation of the fractional part. The architecture of the FPA unit is shown in Fig. 4.

Fig. 4 Fractional Part Approximation Unit (blocks: ROM, FP Mul, FP Add; input X[15:0]; output FPA_OUT)

III. ARCHITECTURE FOR ANTILOG UNIT
Fig. 3 shows the complete architecture of the antilogarithmic computation unit. Multiplexers supply the selected values of the integer part and the fraction part to the succeeding blocks, depending on the sign bit. In Fig. 3, Sel Mux (U) and Sel Mux (L) select 'k' (X[19:16]) and 'f' (X[15:0]), respectively, if X[20] = 0, and 'k-1' and '1-f' if X[20] = 1. The output of Sel Mux (U) is given to the shifter and that of Sel Mux (L) to the fractional part approximation unit (FPA). The output of the FPA is then fed to the shifter, which generates two outputs, one for the integer part and one for the fraction part.

A. Shifter
The output of the FPA block contains the final result, but with the integer and fraction parts mixed; they are difficult to distinguish because the binary point cannot be denoted in the output. This task is handled by the shifter block. In the proposed design, the shifter block generates two separate outputs, integer and fraction. The 16 possible integer values (0 to 15) are made ready at the inputs of mux chain 1, which selects the corresponding integer value depending on the output of Sel Mux (U). The selected integer is then fed to a leading '1' detector, which discards the unwanted bits in the most significant positions and appends the necessary number of zeroes in the least significant positions to convert it into a 17-bit value. This value is applied as one input to an EX-OR gate, whose second input is the FPA output.

The result of the FPA is also appended with 15 zeroes to make it 32 bits wide. This 32-bit representation considerably improves the fractional part accuracy compared with that in [11]. Inputs int 0 to int 15 of mux 1 and Shr 0 to Shr 15 of mux 2 are 17-bit and 32-bit signals generated by appropriate left and right shifting of the FPA output, respectively. The routing of these signals is listed in Table II.

TABLE II: SHIFTER ROUTING DATA
Selection Lines   Integer Mux                      Fraction Mux
K[3:0]            X[20]=1             X[20]=0      X[20]=1   X[20]=0
0000              00000000000000000   int 0        shr 1     fp2
0001              00000000000000000   int 1        shr 2     fp2
0010              00000000000000000   int 2        shr 3     fp2
0011              00000000000000000   int 3        shr 4     fp2
0100              00000000000000000   int 4        shr 5     fp2
0101              00000000000000000   int 5        shr 6     fp2
0110              00000000000000000   int 6        shr 7     fp2
0111              00000000000000000   int 7        shr 8     fp2
1000              00000000000000000   int 8        shr 9     fp2
1001              00000000000000000   int 9        shr 10    fp2
1010              00000000000000000   int 10       shr 11    fp2
1011              00000000000000000   int 11       shr 12    fp2
1100              00000000000000000   int 12       shr 13    fp2
1101              00000000000000000   int 13       shr 14    fp2
1110              00000000000000000   int 14       shr 15    fp2
1111              00000000000000000   int 15       F>>15     fp2

Fig. 5 Architecture for Proposed Shifter Block (blocks: MUX 1 with inputs Int0 to Int15, MUX 2 with inputs Shr0 to Shr15, INT MUX, F MUX, Leading 1's Detector, EXOR, CONCAT BLOCK; select lines K3 K2 K1 K0; outputs Int_out and F_out)

IV. IMPLEMENTATION RESULTS
Fig. 6 shows the technology schematic of the antilog approximation unit. It uses very few FPGA resources. Fig. 7 and Fig. 8 show the simulation results, with the obtained values in binary form, for positive and negative numbers, respectively. Table III shows the data corresponding to both positive and negative numbers. It is evident that the difference between the expected and obtained results is very small. The expected values are calculated from [12-13].
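The datapath described in Section III can also be sketched as a small floating-point reference model, which may help when cross-checking results of the kind reported for this design. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the line coefficients are taken from the segment endpoints of the curve rather than from cftool or the ROM, and clamping the segment index is an assumed way of handling a negative input with f = 0. It is not bit-accurate.

```python
SEGMENTS = 8     # eight linear regions, 0 <= i <= 7
FRAC_BITS = 16   # Q1.4.16 has 16 fraction bits


def chord_coeffs(i):
    """Return (m_i, c_i) of the line through the endpoints of 2**f on
    segment i. Assumed fit; the paper derives its coefficients with cftool."""
    f_lo = i / SEGMENTS
    f_hi = (i + 1) / SEGMENTS
    m = (2.0 ** f_hi - 2.0 ** f_lo) / (f_hi - f_lo)
    c = 2.0 ** f_lo - m * f_lo
    return m, c


def encode_q1_4_16(x):
    """Hypothetical helper: encode |x| < 16 as a 21-bit sign-magnitude word."""
    magnitude = int(abs(x) * (1 << FRAC_BITS))  # truncate toward zero
    sign = 1 if x < 0 else 0
    return (sign << 20) | (magnitude & 0xFFFFF)


def antilog2_q1_4_16(word):
    """Approximate 2**X for a Q1.4.16 word; returns (integer, fraction)."""
    sign = (word >> 20) & 0x1
    k = (word >> 16) & 0xF
    f = (word & 0xFFFF) / (1 << FRAC_BITS)

    if sign:
        # Negative input: 2**-(k + f) == 2**-(k + 1) * 2**(1 - f)
        exponent, frac = -(k + 1), 1.0 - f
    else:
        exponent, frac = k, f

    # frac equals 1.0 when sign == 1 and f == 0, so clamp to the last segment.
    i = min(int(frac * SEGMENTS), SEGMENTS - 1)
    m, c = chord_coeffs(i)
    value = (m * frac + c) * 2.0 ** exponent

    integer_part = int(value)  # value is non-negative, so int() floors it
    return integer_part, value - integer_part


if __name__ == "__main__":
    for x in (2.28, -2.28):
        ip, fp = antilog2_q1_4_16(encode_q1_4_16(x))
        exact = 2.0 ** x
        error = abs(ip + fp - exact) / exact * 100
        print(f"x={x:+.2f} approx={ip + fp:.6f} exact={exact:.6f} error={error:.3f}%")
```

Returning the integer and fraction parts separately mirrors the two shifter outputs. Because the coefficients and arithmetic differ from the hardware, the printed errors are not expected to match the error figures of the FPGA design.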
Fig. 6 Technology Schematic of the proposed architecture

TABLE III: COMPARISON OF RESULTS
Decimal Number   Q1.4.16 notation           Antilog of input
                                            Expected        Obtained
+2.28            0 0010 0100011110101110    4.85677953758   4.8530883789
-2.28            1 0010 0100011110101110    0.20589775431   0.2051925659

Fig. 7 Simulation Result of Antilog block for a positive number input

Fig. 8 Simulation Result of Antilog block for a negative number input

This difference, expressed as an error percentage, is shown in Table IV. For both positive and negative inputs, the error of the proposed design is less than half that of the existing design.

TABLE IV: ERROR ANALYSIS
Input Number   Percentage of Error
               Existing   Proposed
+ve            0.16       0.07
-ve            0.8        0.342

Table V shows the resource utilization summary for the Xilinx Virtex-5 xc5vfx70t FPGA device. The architecture uses 0.66% of the LUTs available on the Virtex-5 device. It also uses two IP cores: one multiplier and one adder.

TABLE V: DEVICE UTILIZATION SUMMARY
Elements             Proposed Architecture
Slice LUTs           297/44800 (0.66%)
External IO Blocks   71/640 (11%)
IP Cores Used        Multiplier-1, Adder-1

V. CONCLUSIONS
An FPGA-based architecture for a binary antilog approximation unit is proposed. The design is implemented on a Xilinx Virtex-5 xc5vfx70t FPGA. The FPA unit uses a built-in adder and a multiplier to form the fixed-point datapath. The characteristic portion of the binary number shifts the mantissa using a novel shifter, which uses multiplexers and extra logic to generate the required outputs; these are the final results and are close approximations to the original value. The error can be further reduced by using a quadratic polynomial to approximate the antilog curve and by increasing the number of bits allocated to the fraction part in the Q notation of the input number.

REFERENCES
[1] J. R. Parker, Algorithms for Image Processing and Computer Vision, 2nd ed. Wiley Publishing Inc., 2011.
[2] J. H. Sohn, R. Woo, and H. J. Yoo, "A programmable vertex shader with fixed-point SIMD datapath for low power wireless applications," in Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Sarajevo, Bosnia-Herzegovina, 2004, pp. 107-114.
[3] J. G. Pandey, A. Karmakar, and C. G. S. Shekhar, "An FPGA-based fixed-point architecture for binary logarithmic computation," 2nd IEEE International Conference in Image Information Processing (ICIIP), Shimla, India, 09-12 Dec. 2013.
[4] H. Kim, B. G. Nam, J. H. Sohn, J. H. Woo, and H. J. Yoo, "A 231-MHz, 2.18-mW 32-bit logarithmic arithmetic unit for fixed-point 3-D graphics system," IEEE Journal of Solid-State Circuits, vol. 41, no. 11, pp. 2373-2381, 2006, DOI:10.1109/JSSC.2006.882887.
[5] H. Tian, T. Srikanthan, and K. V. Asari, "Automatic segmentation algorithm for the extraction of lumen region and boundary from endoscopic images," Medical and Biological Engineering and Computing, vol. 39, no. 1, pp. 8-14, 2001, DOI:10.1007/BF02345260.
[6] H. Kim, B. G. Nam, J. H. Sohn, J. H. Woo, and H. J. Yoo, "A 231-MHz, 2.18-mW 32-bit logarithmic arithmetic unit for fixed-point 3-D graphics system," IEEE Journal of Solid-State Circuits, vol. 41, no. 11, pp. 2373-2381, 2006, DOI:10.1109/JSSC.2006.882887.
[7] J. N. Mitchell, "Computer multiplication and division using binary logarithm," IRE Trans. Computer, vol. EC-11, pp. 512-517, 1962.
[8] M. Combet, H. Zonneveld, and L. Verbeek, "Computation of the base two logarithm of binary numbers," IEEE Transactions on Electronic Computers, vol. EC-14, no. 6, pp. 863-867, Dec. 1965, DOI:10.1109/PGEC.1965.264080.
[9] K. H. Abed and R. E. Siferd, "CMOS VLSI implementation of 16-Bit logarithm and anti-logarithm converters," in Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems, vol. 2, Lansing, MI, USA, 2000, pp. 776-779.
[10] K. H. Abed and R. E. Siferd, "VLSI implementation of a low-power antilogarithmic converter," IEEE Transactions on Computers, vol. 52, no. 9, pp. 1221-1228, 2003, DOI:10.1109/TC.2003.1228517.
[11] J. G. Pandey, A. Karmakar, C. Shekhar and S. Gurunarayanan, "An FPGA based Novel Architecture for fixed point binary antilogarithmic computation," International Conference on Electronic Systems, Signal Processing and Computing Technologies (ICESC), 2014.
[12] http://www.exploringbinary.com/binary-converter/
[13] Google Calculator [Online]