International Journal of Engineering Trends and Technology (IJETT) – Volume 31 Number 3- January 2016
FPGA Based Antilog Computation Unit with Novel Shifter
Swapna Kalyani P#1, Lakshmi Priyanka S*2, Santhosh S Kiran K#3, Murali Krishna P*4
# M.Tech Scholar, VLSI ES, KIETW, India
* Asst. Professor, Dept. of ECE, KIETW, India
Abstract- Technology demands improvements in area, speed and power day by day. Modern FPGAs are well suited to implementing complex applications and provide optimized designs at lower cost than ASICs. This work presents an efficient architecture for fixed-point binary antilogarithmic computation that uses a piecewise linear approximation method to generate the approximation coefficients and works for both positive and negative real numbers. A Logarithmic Number System (LNS) based implementation is less complex, has a small gate count and a high operational speed. In the proposed design, a novel shifter performs the required number of shifts in either the left or the right direction and generates the integer and fractional parts separately. IP cores are utilized to serve various tasks. Error analysis shows that the proposed design achieves percentage errors of 0.07 and 0.34 for positive and negative numbers, respectively. The design is implemented on a Xilinx Virtex-5 xc5vfx70t device, and a maximum operational frequency of 139.548 MHz is achieved.
Keywords— FPGAs, ASICs, LNS, Antilogarithm,
Piecewise linear, IP Cores
I. INTRODUCTION
Present-day embedded applications such as signal, image and video processing demand implementations with less area and high accuracy. Implementing complex arithmetic functions such as power, square root and division in VHDL using a floating-point number format consumes considerable area and power and is slow [1-3], whereas a fixed-point format, which has a simple datapath, is well suited to such applications because of its fractional accuracy. Fixed-point datapath circuits are also fast and consume less area and power [2-3]. LNS is best suited for such implementations [3-6]. FPGAs are preferred over ASICs because of their short design time and time to market, which makes them cost-effective. The latest FPGAs have built-in components such as multipliers, adders, memories, communication and networking devices, mathematical models, etc. With a fixed-point number format, implementing complex arithmetic functions becomes easy using the logarithmic number system.
ISSN: 2231-5381
To perform simple arithmetic operations, the
input numbers are first converted into their log
equivalents. The result is then converted back to its
original form using antilogarithmic conversion.
Fig. 1 Arithmetic Computations using LNS
Arithmetic simplicity can be achieved at the cost
of overhead for conversion, which is very small for
many embedded applications.
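As a rough illustration of the flow in Fig. 1, the following Python sketch multiplies, divides and takes a square root by operating on base-2 logarithms and converting back with an antilog. The function names are hypothetical, and the sketch uses floating-point `math.log2` in place of the fixed-point converters discussed in this paper.

```python
import math


def to_log(x: float) -> float:
    """Log conversion: positive real number -> base-2 logarithm."""
    return math.log2(x)


def antilog(y: float) -> float:
    """Antilog conversion: base-2 logarithm -> real number."""
    return 2.0 ** y


def lns_multiply(a: float, b: float) -> float:
    # Multiplication becomes an addition in the log domain.
    return antilog(to_log(a) + to_log(b))


def lns_divide(a: float, b: float) -> float:
    # Division becomes a subtraction in the log domain.
    return antilog(to_log(a) - to_log(b))


def lns_sqrt(a: float) -> float:
    # Square root becomes a halving of the logarithm.
    return antilog(to_log(a) / 2.0)


if __name__ == "__main__":
    print(lns_multiply(6.0, 7.0))
    print(lns_divide(10.0, 4.0))
    print(lns_sqrt(81.0))
```

The only non-trivial hardware steps in this flow are the two conversions, which is why the accuracy of the antilog unit determines the accuracy of the whole computation.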
References [7-8] present antilog approximation without any hardware implementation. A 16-bit CMOS-based antilog converter architecture that works only for positive numbers is discussed in [9]. A 32-bit antilog converter that uses fewer regions for approximation was presented in [10]. In [11], the architecture generates the antilog of a given number as a single 32-bit output that includes both the integer part and the fraction part, with an error percentage of 0.16 for positive and 0.8 for negative numbers.
The proposed design uses piecewise linear approximation, with the approximation coefficients calculated by a curve fitting method. The architecture for generating the antilog of the fraction part is similar to that of [11], but it uses a new shifter design that generates two separate outputs, one for the integer part and one for the fraction part, in contrast with the barrel shifter, which generates a mixed output for integer and fraction. Because the outputs are separate, more bits are available to represent each of them. Hence, the error is reduced by more than 50%. The design is implemented on a Xilinx Virtex-5 xc5vfx70t device. The architecture uses off-the-shelf components such as multipliers and adders.
II. APPROXIMATION APPROACH
A fixed-point number format is used to represent the antilogarithmic approximation coefficients, which are obtained by the piecewise linear approximation method.
A. Fixed point Number Format
The architecture for fixed-point arithmetic is less complex than that for floating point. Hence, it occupies less area and consumes less power. The proposed architecture uses a Q1.4.16 format, as shown in Fig. 2. In that format, '1', '4' and '16' denote the number of bits allotted for the sign, the integer part and the fraction part, respectively.
http://www.ijettjournal.org
Fig. 2 Fixed point number format: sign (1 bit), integer part (4 bits), fraction part (16 bits)
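A minimal Python sketch of the Q1.4.16 format may make the bit layout concrete. It assumes a sign-magnitude encoding (the sign bit in X[20], the magnitude in X[19:0]), which is consistent with the bit patterns for +2.28 and -2.28 reported in Table III; the function names and the round-to-nearest quantization are illustrative choices, not taken from the design.

```python
FRACTION_BITS = 16
INTEGER_BITS = 4
MAGNITUDE_BITS = INTEGER_BITS + FRACTION_BITS      # 20 magnitude bits
MAGNITUDE_MASK = (1 << MAGNITUDE_BITS) - 1


def encode_q1_4_16(value: float) -> int:
    """Pack a real number into a 21-bit sign-magnitude Q1.4.16 word."""
    sign = 1 if value < 0 else 0
    magnitude = round(abs(value) * (1 << FRACTION_BITS))
    if magnitude > MAGNITUDE_MASK:
        raise ValueError("magnitude does not fit in 4 integer bits")
    return (sign << MAGNITUDE_BITS) | magnitude


def split_fields(word: int) -> tuple[int, int, int]:
    """Return (sign, k, f): X[20], X[19:16] and X[15:0]."""
    sign = (word >> MAGNITUDE_BITS) & 1
    k = (word >> FRACTION_BITS) & ((1 << INTEGER_BITS) - 1)
    f = word & ((1 << FRACTION_BITS) - 1)
    return sign, k, f


def decode_q1_4_16(word: int) -> float:
    """Convert a 21-bit Q1.4.16 word back to a real number."""
    sign, k, f = split_fields(word)
    magnitude = k + f / (1 << FRACTION_BITS)
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    for x in (2.28, -2.28):
        word = encode_q1_4_16(x)
        print(x, format(word, "021b"), decode_q1_4_16(word))
```

With 16 fraction bits the quantization step is 2^-16, so a decoded value differs from the original by at most half of that step under round-to-nearest.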
B. Computation of Approximation Coefficients
Piecewise linear approximation is well suited to an area-efficient implementation. Let X be a 21-bit binary number (X[20:0]) with a Q notation of 1.4.16 in fixed-point format, in the range $2^{-16} \le X < 2^{4}$. Any number X can be expressed as the sum of an integer part and a fraction part (e.g. 8.56 = 8 + 0.56). Let the integer part be denoted by k and the fraction part by f. If the MSB X[20] = 0, the input number is positive; otherwise it is negative.
Based on the fixed-point number format, the
computation of antilogarithmic value is as given in
(1):
$$\mathrm{Antilog}(X) = 2^{X} = 2^{k} \cdot 2^{f} \tag{1}$$
Fig. 3 Architecture for Antilog Approximation Unit
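The fractional data (f) is approximated in the range 0 ≤ f < 1. The values of k and f used in the computation depend on the sign bit of the given number. For a positive number, the sign bit is 0, and k and f remain unchanged. For a negative number, the sign bit is 1 and the exponent is -(k + f), so the fractional exponent -f falls outside the range 0 ≤ f < 1. Hence, the integer exponent is decremented by 1 and f is subtracted from 1, as shown in (2).
$$2^{X} = 2^{-(k+f)} = 2^{-(k+1)} \cdot 2^{(1-f)} \tag{2}$$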
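The decomposition in (1) and the sign handling described above can be checked numerically. The sketch below is a floating-point model, not the hardware datapath; `split_exponent` is a hypothetical helper, and the treatment of a negative input with f = 0 (no adjustment, so that the fractional exponent stays below 1) is an assumption that the text does not spell out.

```python
import math


def split_exponent(x: float) -> tuple[int, float]:
    """Return (e, g) such that 2**x == 2**e * 2**g with 0 <= g < 1."""
    k = int(abs(x))            # integer part of the magnitude
    f = abs(x) - k             # fraction part of the magnitude, 0 <= f < 1
    if x >= 0:
        return k, f            # positive input: k and f are used unchanged
    if f == 0.0:
        return -k, 0.0         # assumed corner case: exact negative integer
    return -(k + 1), 1.0 - f   # negative input: decrement exponent, use 1 - f


def antilog(x: float) -> float:
    """Compute 2**x from the integer and fractional exponents."""
    e, g = split_exponent(x)
    return math.ldexp(2.0 ** g, e)   # (2**g) * (2**e)


if __name__ == "__main__":
    for x in (2.28, -2.28):
        print(x, split_exponent(x), antilog(x))
```

Keeping the fractional exponent in the range 0 ≤ g < 1 means that the same approximation of 2^g serves both signs, and only the direction and amount of the final shift differ.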
In theory, the antilog of a given input number X is calculated from (1). A hardware architecture for computing it requires an approximation method whose output is as close as possible to the theoretical value.
III. ARCHITECTURE FOR ANTILOG UNIT
Fig. 3 shows the complete architecture of the antilogarithmic computation unit. Multiplexers supply the selected values of the integer part and the fraction part to the succeeding blocks, depending on the sign bit. In Fig. 3, Sel Mux (U) and Sel Mux (L) select 'k' (X[19:16]) and 'f' (X[15:0]), respectively, if X[20] = 0, and 'k-1' and '1-f', respectively, if X[20] = 1. The output of Sel Mux (U) is given to the shifter and that of Sel Mux (L) to the fractional part approximation (FPA) unit. The output of the FPA is then fed to the shifter. This block generates two outputs, one for the integer part and one for the fraction part.
The f versus $2^{f}$ curve is approximated by a straight line in each region of f, as given in (3). Hence, the approximation is termed piecewise linear.
$$2^{f} = m_i \cdot f + c_i \tag{3}$$
where 0 ≤ i ≤ 7 indexes the eight piecewise linear regions, and $m_i$ and $c_i$ are the approximation coefficients to be computed for every value of i. Hence, eight sets of m and c values ($m_0$, $c_0$ to $m_7$, $c_7$) are obtained. These coefficients may be generated in many ways. In this design, the curve fitting toolbox (cftool) in MATLAB is used to generate the approximation coefficients, so the fitted values follow the curve closely. Each set of m (Q1.7) and c (Q1.10) values is combined into a 19-bit word and stored in one of the eight locations of a 19 x 8 ROM. The ROM content is shown in Table I.
TABLE I: ROM CONTENT
Location   ROM Address   Coefficient m   Coefficient c
0          000           010111000       1111111111
1          001           011001010       1111110110
2          010           011011100       1111100100
3          011           011110000       1111000110
4          100           100011100       1101011111
5          101           100110110       1100010010
6          110           100110110       1100010010
7          111           101010010       1010101111
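One way to obtain such coefficients without MATLAB is an ordinary least-squares line fit per region. The Python sketch below is a stand-in for the cftool step, not the authors' procedure: the eight equal-width regions of f, the sample count and the helper names are assumptions, and the resulting values are not expected to reproduce Table I bit for bit.

```python
REGIONS = 8      # eight piecewise linear regions, as in (3)
SAMPLES = 256    # sample points per region (arbitrary choice)


def fit_region(i: int) -> tuple[float, float]:
    """Least-squares fit of 2**f by m*f + c over region i of [0, 1)."""
    lo = i / REGIONS
    xs = [lo + (j + 0.5) / (REGIONS * SAMPLES) for j in range(SAMPLES)]
    ys = [2.0 ** x for x in xs]
    mean_x = sum(xs) / SAMPLES
    mean_y = sum(ys) / SAMPLES
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


COEFFS = [fit_region(i) for i in range(REGIONS)]


def approx_pow2(f: float) -> float:
    """Evaluate the piecewise linear approximation of 2**f for 0 <= f < 1."""
    i = min(int(f * REGIONS), REGIONS - 1)
    m, c = COEFFS[i]
    return m * f + c


def max_abs_error(steps: int = 4096) -> float:
    """Largest absolute deviation from 2**f on a uniform grid."""
    return max(abs(approx_pow2(n / steps) - 2.0 ** (n / steps))
               for n in range(steps))


if __name__ == "__main__":
    for i, (m, c) in enumerate(COEFFS):
        print(i, round(m, 5), round(c, 5))
    print("max abs error:", max_abs_error())
```

Before being stored in a ROM, each m and c would still have to be quantized to the chosen fixed-point widths, which adds a further error on top of the fitting error measured here.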
These m and c values are then converted into their hexadecimal equivalents, written to a “.coe” file and loaded into the ROM IP core, which supplies them to the fixed-point multiplier and the fixed-point adder for the computation of the fractional part. The architecture of the FPA unit is shown in Fig. 4.
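The ROM, multiplier and adder path can be modelled in integer arithmetic. The following Python sketch is behavioural only: it assumes that the region index is the top three bits of the 16-bit fraction, uses chord (endpoint) coefficients rather than the Table I values, and scales m by 2^8 and c by 2^10. These widths and all names are illustrative assumptions rather than the exact formats of the design.

```python
M_FRAC_BITS = 8     # assumed fraction bits of the stored slope m
C_FRAC_BITS = 10    # assumed fraction bits of the stored intercept c
F_BITS = 16         # fraction bits of the input f (X[15:0])


def build_rom() -> list[tuple[int, int]]:
    """Return eight (m, c) integer pairs, one per region of f."""
    rom = []
    for i in range(8):
        y0 = 2.0 ** (i / 8)
        y1 = 2.0 ** ((i + 1) / 8)
        m = (y1 - y0) * 8              # slope of the chord over the region
        c = y0 - m * (i / 8)           # intercept of the chord
        rom.append((round(m * (1 << M_FRAC_BITS)),
                    round(c * (1 << C_FRAC_BITS))))
    return rom


ROM = build_rom()


def fpa(f16: int) -> int:
    """Approximate 2**f; the result is scaled by 2**16."""
    if not 0 <= f16 < (1 << F_BITS):
        raise ValueError("f must be a 16-bit unsigned fraction")
    m_q, c_q = ROM[f16 >> (F_BITS - 3)]            # ROM lookup by region
    product = (m_q * f16) >> M_FRAC_BITS           # multiplier, realigned
    offset = c_q << (F_BITS - C_FRAC_BITS)         # intercept, realigned
    return product + offset                        # adder


if __name__ == "__main__":
    f16 = round(0.28 * (1 << F_BITS))
    print(fpa(f16) / (1 << F_BITS), 2.0 ** 0.28)
```

Both operands are brought to 16 fraction bits before the addition so that the sum can be read directly as a fixed-point value of 2^f.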
Fig. 4 Fractional Part Approximation Unit (input X[15:0]; blocks: ROM, FP Mul, FP Add; output FPA_OUT)

A. Shifter
The output of the FPA block contains the final result, but with the integer and fraction parts mixed. They are difficult to distinguish because the binary point cannot be denoted in the output. This task is handled by the shifter block. In the proposed design, the shifter block generates two outputs, integer and fraction, separately. Sixteen possible integer values (0 to 15) are made ready at the input of mux chain 1, which selects the corresponding integer value depending on the output of Sel Mux (U). The selected integer is then fed to a leading '1' detector so that the unwanted bits in the most significant positions may be neglected, and the necessary number of zeroes is appended in the least significant positions in order to convert it into a 17-bit value. This is applied as one input to an EX-OR gate, and the second input is the FPA output. The result of the FPA is again appended with 15 zeroes to make the result 32 bits wide. This 32-bit representation improves the fractional part accuracy considerably when compared with that in [11].
Inputs int 0 to int 15 of mux1 and Shr 0 to Shr 15 of mux2 are signals of size 17 bits and 32 bits, generated after appropriate left and right shifting of the FPA output, respectively. The routing of these signals is listed in Table II.

TABLE II: SHIFTER ROUTING DATA
Selection Lines   Integer Mux                       Fraction Mux
K[3:0]            X[20]=1             X[20]=0       X[20]=1   X[20]=0
0000              00000000000000000   int 0         shr 1     fp2
0001              00000000000000000   int 1         shr 2     fp2
0010              00000000000000000   int 2         shr 3     fp2
0011              00000000000000000   int 3         shr 4     fp2
0100              00000000000000000   int 4         shr 5     fp2
0101              00000000000000000   int 5         shr 6     fp2
0110              00000000000000000   int 6         shr 7     fp2
0111              00000000000000000   int 7         shr 8     fp2
1000              00000000000000000   int 8         shr 9     fp2
1001              00000000000000000   int 9         shr 10    fp2
1010              00000000000000000   int 10        shr 11    fp2
1011              00000000000000000   int 11        shr 12    fp2
1100              00000000000000000   int 12        shr 13    fp2
1101              00000000000000000   int 13        shr 14    fp2
1110              00000000000000000   int 14        shr 15    fp2
1111              00000000000000000   int 15        F>>15     fp2

Fig. 5 Architecture for Proposed Shifter Block (blocks: INT MUX, F MUX, leading 1's detector, EXOR, concat block; select lines K3 K2 K1 K0; inputs X[20:0] and FPA_OUT; outputs Int_out and F_out)
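The behaviour of the shifter can be sketched in Python as follows. This is a behavioural model under stated assumptions, not a description of the multiplexer network: the FPA output is taken as a value scaled by 2^16, the two outputs are named `int_out` and `f_out` after the signals in Fig. 5, and `f_out` is interpreted here as a 32-bit word with 31 fraction bits. In the usage example, the exact value of 2^f stands in for the FPA unit.

```python
def shifter(sign: int, k: int, fpa17: int) -> tuple[int, int]:
    """Split the antilog into separate integer and fraction outputs.

    sign  : X[20] (0 = positive input, 1 = negative input)
    k     : integer part X[19:16] of the input magnitude
    fpa17 : FPA output, 2**f (positive) or 2**(1-f) (negative), scaled by 2**16
    """
    if not 0 <= k <= 15:
        raise ValueError("k must fit in 4 bits")
    if sign == 0:
        shifted = fpa17 << k                  # 2**k * 2**f, 16 fraction bits
        int_out = shifted >> 16               # bits above the binary point
        f_out = (shifted & 0xFFFF) << 15      # fraction padded to 31 bits
    else:
        int_out = 0                           # a negative exponent gives < 1
        f_out = (fpa17 << 15) >> (k + 1)      # 2**-(k+1) * 2**(1-f)
    return int_out, f_out


def to_real(int_out: int, f_out: int) -> float:
    """Recombine the two outputs into one real number for checking."""
    return int_out + f_out / (1 << 31)


if __name__ == "__main__":
    pos = shifter(0, 2, round(2.0 ** 0.28 * 65536))   # X = +2.28
    neg = shifter(1, 2, round(2.0 ** 0.72 * 65536))   # X = -2.28
    print(pos, to_real(*pos))
    print(neg, to_real(*neg))
```

Because the integer and fraction parts leave on separate buses, the fraction keeps its full width regardless of how large the integer part is, which is the property the text credits for the accuracy gain over a single mixed output.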
Fig. 7 and Fig. 8 show the simulation results, with the obtained values in binary form, for positive and negative numbers, respectively.
IV. IMPLEMENTATION RESULTS
Fig. 6 shows the technology schematic of the antilog approximation unit. It uses very few FPGA resources. Table III shows the data corresponding to both positive and negative numbers. It is evident that the difference between the expected and the obtained results is very small. The expected values are calculated using [12-13].
TABLE III: COMPARISON OF RESULTS
Decimal Number   Q1.4.16 notation        Antilog of input (Expected)   Antilog of input (Obtained)
+2.28            000100100011110101110   4.85677953758                 4.8530883789
-2.28            100100100011110101110   0.20589775431                 0.2051925659
Fig. 7 Simulation Result of Antilog block for a
positive number input
Fig. 8 Simulation Result of Antilog block for a
negative number input.
This difference, expressed as an error percentage, is shown in Table IV. The values show that the proposed design has less than half the error of the existing design [11] for both positive and negative inputs.
TABLE IV: ERROR ANALYSIS
Input Number   Percentage of Error (Existing)   Percentage of Error (Proposed)
+ve            0.16                             0.07
-ve            0.8                              0.342

Fig. 6 Technology Schematic of the proposed architecture
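The error percentages can be reproduced from the expected and obtained values reported in Table III. The short Python check below assumes the usual definition of relative error, |expected - obtained| / expected x 100; the paper does not state its formula explicitly, and the names are illustrative.

```python
def percentage_error(expected: float, obtained: float) -> float:
    """Relative error of the obtained value, in percent."""
    return abs(expected - obtained) / abs(expected) * 100.0


# (expected, obtained) pairs as reported in Table III
RESULTS = {
    "+2.28": (4.85677953758, 4.8530883789),
    "-2.28": (0.20589775431, 0.2051925659),
}

if __name__ == "__main__":
    for label, (expected, obtained) in RESULTS.items():
        print(f"{label}: {percentage_error(expected, obtained):.3f} %")
```

Under this definition the two inputs give about 0.076 % and 0.342 %, in line with the proposed-design column of Table IV.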
Table V shows the resource utilization summary for the Xilinx Virtex-5 xc5vfx70t FPGA device. The summary shows that the architecture uses 0.66% of the available LUTs on the Virtex-5 device. It also uses two IP cores, one multiplier and one adder.
TABLE V: DEVICE UTILIZATION SUMMARY
Elements             Used (Proposed Architecture)
Slice LUTs           297/44800 (0.66%)
External IO Blocks   71/640 (11%)
IP Cores             Multiplier-1, Adder-1

V. CONCLUSIONS
An FPGA-based architecture for a binary antilog approximation unit is proposed. The design is implemented on a Xilinx Virtex-5 xc5vfx70t FPGA. The FPA unit utilizes a built-in adder and a multiplier to form the fixed-point datapath. The characteristic portion of the binary number shifts the mantissa using a novel shifter that uses multiplexers and extra logic to generate the required outputs, which are the final results and are close approximations to the original value. The error can be further reduced by using a quadratic polynomial to approximate the antilog curve and also by increasing the number of bits allocated for the fraction part in the Q notation of the input number.

REFERENCES
[1] J. R. Parker, Algorithms for Image Processing and Computer Vision, 2nd ed. Wiley Publishing Inc., 2011.
[2] J. H. Sohn, R. Woo, and H. J. Yoo, “A programmable vertex shader with fixed-point SIMD datapath for low power wireless applications,” in Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Sarajevo, Bosnia-Herzegovina, 2004, pp. 107–114.
[3] J. G. Pandey, A. Karmakar, and C. G. S. Shekhar, “An FPGA-based fixed-point architecture for binary logarithmic computation,” 2nd IEEE International Conference in Image Information Processing (ICIIP), Shimla, India, 09-12 Dec. 2013.
[4] H. Kim, B. G. Nam, J. H. Sohn, J. H. Woo, and H. J. Yoo, “A 231-MHz, 2.18-mW 32-bit logarithmic arithmetic unit for fixed-point 3-D graphics system,” IEEE Journal of Solid-State Circuits, vol. 41, no. 11, pp. 2373-2381, 2006, DOI:10.1109/JSSC.2006.882887.
[5] H. Tian, T. Srikanthan, and K. V. Asari, “Automatic segmentation algorithm for the extraction of lumen region and boundary from endoscopic images,” Medical and Biological Engineering and Computing, vol. 39, no. 1, pp. 8-14, 2001, DOI:10.1007/BF02345260.
[6] H. Kim, B. G. Nam, J. H. Sohn, J. H. Woo, and H. J. Yoo, “A 231-MHz, 2.18-mW 32-bit logarithmic arithmetic unit for fixed-point 3-D graphics system,” IEEE Journal of Solid-State Circuits, vol. 41, no. 11, pp. 2373-2381, 2006, DOI:10.1109/JSSC.2006.882887.
[7] J. N. Mitchell, “Computer multiplication and division using binary logarithm,” IRE Trans. Computer, vol. EC-11, pp. 512-517, 1962.
[8] M. Combet, H. Zonneveld, and L. Verbeek, “Computation of the base two logarithm of binary numbers,” IEEE Transactions on Electronic Computers, vol. EC-14, no. 6, pp. 863-867, Dec. 1965, DOI:10.1109/PGEC.1965.264080.
[9] K. H. Abed and R. E. Siferd, “CMOS VLSI implementation of 16-Bit logarithm and anti-logarithm converters,” in Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems, vol. 2, Lansing, MI, USA, 2000, pp. 776-779.
[10] K. H. Abed and R. E. Siferd, “VLSI implementation of a low-power antilogarithmic converter,” IEEE Transactions on Computers, vol. 52, no. 9, pp. 1221-1228, 2003, DOI:10.1109/TC.2003.1228517.
[11] J. G. Pandey, A. Karmakar, C. Shekhar and S. Gurunarayanan, “An FPGA based Novel Architecture for fixed point binary antilogarithmic computation,” International Conference on Electronic Systems, Signal Processing and Computing Technologies (ICESC), 2014.
[12] http://www.exploringbinary.com/binary-converter/
[13] Google Calculator [Online]