Da Based Fir Filter Using APC-Oms Technique

advertisement
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue6- June 2013
Da Based Fir Filter Using APC-Oms
Technique
K.Shareef Babu1, H.D.Praveena 2,K.Charan Kumar3
1
M.Tech(DECS), 3M.Tech(VLSI) Students, 3 Assistant Professor, ECE Department,
Sree Vidyanikethan Engineering College(Autonomous), A.Rangampet,Tirupati.
Abstract-In FPGA design the implementation fir filters
for DSP applications place an important role. The FPGA
area is mainly decided by the number of LUT’s
occupied. Hence for any design if the optimisation for
the area is carried out for LUT’s, then delay will also
reduce. To optimize filters using LUT’s for memory
based multiplications, four basic techniques are used
from which the combination of two techniques i.e., APC
and OMS gave better optimization results. Further if
Distributed Arithmetic (DA) technique is utilised for the
filter design approach. Then an efficient area
implementation can be achieved. In this paper L=2 to 8
bit width based filters are designed and synthesised
using Xilinx ISE 10.1i. Nearly 40% area improvement is
achieved for approximately same delay.
Keywords- Field Programmable Gate Array (FPGA),
Odd Multiple Storage(OMS), Anti –Symmetric Product
Coding(APC), Distributed Arithmetic (DA) and Look
Up Table(LUT).
I. INTRODUCTION
In the design of digital processors and application
specific systems digital operations are very important
[1]. The important class in digital systems arithmetic
circuits are arithmetic circuits. Now a day’s many
complex circuits, unthinkable have become easy with
the remarkable progress in very large scale integration
(VLSI) circuit technology [2]. In present day’s
semiconductor devices has become more prominent
usage in every field due to the rapid development of
increasing technology. The operation of the these
devices is very fast which consumes less power, less
area, reduces time of operation & become more
efficient with respect to the several factors such as
reliability, flexibility, scaling etc. therefore it leads to
significant growth & improvement of these devices
become cheaper [3].
The semiconductors have embedded memory
which results in dominating presence in the SOC’s
exceeding 90% of the total SOC. When compared to
logical components, the semiconductor memory
devices have high transistor packing density with
increasing fast rate. Apart from that, memory based
computing structures offers more other advantages
rather than multiply accumulate structures such as
greater potential for high throughput, low latency
implementation and less dynamic power consumption.
Fixed set of coefficients involved in the multiplication
ISSN: 2231-5381
for memory based computing is well suited for many
digital signal processing (DSP) The block diagram
shown below in fig. 1 is the conventional look up
table based multiplier.
Fig.1Conventional LUT based multiplier
In most of the DSP processors the memory based
computing structures are mainly concern about the
multiplier and accumulator structures. Reducing the
computational complexity for the complex
multiplication, operations are simplified with the
usage of LUTs that are used for the direct storage of
the complex computational values. Look-up-tables
provides better performance in terms of speed and
effective area utilization [4]. Using Odd Multiple
Storage(OMS) and Anti –Symmetric Product
Coding(APC) are the optimizing schemes of LUT
based FPGA design for multiply and accumulate
structures used for DSP cores[4,5,6]. To store the odd
multiples of the LUT design the OMS mechanism is
used and even multiples can be derived by shifting the
available odd multiples by using the barrel shifter.
APC is the mechanism used to reduce the required
number of LUT bit positions [5].
LUT optimization using the APC coding and OMS
methodology are the primary factors for LUT based
FIR filter is designed for DSP applications. The odd
integer representation is always used for input and
output address transformation. Previously it is
observed that, when an Anti-symmetric product
coding approach is combined with the Odd multiple
storage technique, the two’s complement operations
could be very much simplified since the odd integer
representation is always used for input and output
address transformation, and both cannot be combined
since the words generated are odd numbers.
http://www.ijettjournal.org
Page 2657
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue6- June 2013
II. APC-OMS DESIGN FOR LUT BASED
MULTIPLIER DESIGN
A discrete FIR filter consists of delay element, coefficient and adder blocks as shown in fig. 2.
In general LUT multiplier it has the input bit ‘X’ of
length ‘L’ and ‘AX’ as output bit, where A is the
constant depends on the LUT value. 2L words are
required for multiplying X of L-bit with constant.
With the increase in input size LUT size increases
exponentially.
LUT for input of word length L=4 requires 16
address lines to store the input bit sequence is shown
in below table 1.
TABLE1
LUT REPRESENTATION
Fig. 2 General Representation of FIR filter
For a discrete-time FIR filter, the output is a
weighted sum of the current and a finite number of
previous values of the input. The operation is
described by the following equation, which defines
the output sequence y[n] in terms of its input
sequence x[n] is shown in eq(1).
[ ]=
[ ]+
=∑
[ − 1] + ⋯ +
[ − ]
[ − ]
(1)
Here bi are the filter coefficients, also known as tap
weights, that make up the impulse response and N is
the filter order. An Nth-order filter has (N+1) terms
on the right-hand side. The x[n-i] in these terms are
commonly referred to as taps, based on the structure
of a tapped delay line that in many implementations
or block diagrams provides the delayed inputs to the
multiplication operations. One may speak of a 5th
order/6-tap filter, for instance.
Address
Product
word
Address
word,X
Product
word
0000
0
1000
8A
0001
A
1001
9A
0010
2A
1010
10A
0011
3A
1011
11A
0100
4A
1100
12A
0101
5A
1101
13A
0110
6A
1110
14A
0111
7A
1111
15A
word, X
TABLE 2
OMS BASED REDUCTION SCHEME FOR LUT MULTIPLIER
Address
word
Product
word
0001
A
0011
3A
0101
5A
0111
7A
1001
9A
1011
11A
1101
13A
1111
15A
III. LOOK UP TABLE (LUT)
The tables of multiplication are pre-calculated and
stored in memory. For fast accessing of values from
the memory, LUT’s are used for saving the
computation complexity. In digital logic, an n-bit
LUT can be implemented with a multiplexer whose
select lines are the inputs of LUT and inputs are
constants. An n-bit LUT can encode any n-input
Boolean function by modelling with truth tables[7].
LUT’s with 4-6 bits of input are the key component of
modem FPGAs and this is an efficient way of
encoding functions. General representation of LUT
for multiplication bits are shown in below fig. 3.
By using the OMS scheme only odd multiplies are
stored in the LUT and the even multiplies of the LUT
are derived by left shifting the odd multiplies by using
the barrel shifter scheme. By using the barrel shifter
we can produce the maximum (L-1) no. of left shifts
to produce the even multiples.
Fig.3 General Form for LUT multiplier
ISSN: 2231-5381
http://www.ijettjournal.org
Page 2658
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue6- June 2013
TABLE 3
APC WORDS FOR DIFFERENT INPUT WORDS FOR L=4
Input
X1
X21 X11 X01
001
010
100
011
110
101
111
Product
value
A
2XA
4XA
3A
2X3A
5A
7A
No.
of
shifts
0
1
2
0
1
0
0
Shifted
Stored
input,
X11
APC
word
d2d1d0
001
P0=A
000
011
P1=3A
001
101
111
P2=5A
P3=7A
010
011
Address
words. If two bits per word are accepted, then the
computational speed can be essentially improved. The
maximum speed can be achieved with the fully
pipelined word-parallel architecture. For maximum
speed, a separate ROM (with identical content) for
each bit vector xb[n] should be provided.Combined
approach of FIR filter using DA technique for L=8
using APC-OMS techniques with L=4 is shown in
Fig.5.
Fig.5 Combined approach of FIR filter using DA technique for L=8
using APC-OMS techniques with L=4.
Fig. 4 APC-OMS approach for L=4
The implementation of the proposed APC-OMS
combined LUT for memory based multiplier uses two
techniques, APC and OMS method shown in fig. 4.
This method is supposed to reduce the area to one
fourth. This multiplier uses four blocks [8] [5] [6] [7].
The address generation block converts our input to
address d0, d1, d2which is produced by combining
both the APC and OMS method. The 3-to-8 address
line decoder converts the address d0, d1, d2 to LUT
address from w1 to w7. The memory array is an LUT
and barrel shifter converts the LUT output to the
desired output.The control circuit is used to produce
the controls s0, s1 which is used in the proceeding
blocks [2] [7] [8] [11]. The control and reset circuit
can be designed as
S0=x0+(x1+x2’)
(2)
S1=(x0+x1)
(3)
Reset= x3 and x2’x1’
(4)
The barrel shifter will right shift circularly
according to the control values (s0 s1), using the basic
gates to produce the control elements reset, s0, s1.
From the barrel shifter, thus producing the address
(d0d1d2) to use in the next sections. The address
generator circuit consists of a barrel shifter and some
basic gates, which converts our input to an address
d0d1d2, which is obtained by combining both of our
methods anti-symmetric (APC) and odd multiple
storage (OMS).
V. RESULTS AND DISCUSSIONS
Fig 6 Simulated results for L=8
The fig. 6 shows the waveforms generated using
Xilinx ISE while performing combined approach FIR
filter using DA technique for L=8 using two L=4 LUT
design using combined APC approach. The detailed
description of the given inputs and the output
generated is given further.For 8-bit input operand X,
Data_in, Address, W, P are given with inputs of
8’h05, 8’h02, 8’h04, 8’h004, 8’h04 respectively at
421ns, the output Q is obtained as 8’h001.
In the Fig. 7 showing the comparison for different
lengths of binary words with number of 4 input
LUT’s with the combination of both APC-OMS and
DA techniques when compared to using only APCOMS technique. Using that combination technique
uses less number of LUT’s when compared with
APC-OMS technique which implies the reduction in
the area utilized by the FPGA.
IV. DISTRIBUTED ARITHMETIC BASED FIR
FILTER
A basic DA architecture, for a length Nth sum-ofproduct computation, accepts one bit from each of N
ISSN: 2231-5381
http://www.ijettjournal.org
Page 2659
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue6- June 2013
Number of 4-input
LUT’s
Fig. 9 Comparison of delay
Comparision for No.of LUT's
utilized
20
15
10
DA
5
0
APC-OMS
4 5 6bit 7 8
bit bit
bit bit
Length of binary word size
Fig.7 Comparison for number of LUT’s utilized
Number of Slices
In the fig. 8the comparison for different lengths of
binary words with number of slices with the
combination of both APC-OMS and DA techniques
when compared to using only APC-OMS technique is
shown. Using this combination technique, less
number of slices when compared with APC-OMS
technique which implies the diminution in the area
utilized by the FPGA.
10
Comparision for No.of Slices
utilized
5
DA
APC-OMS
0
4 5 6bit 7 8
bit bit
bit bit
Length of binary word size
Fig 8 Comparison for number of slices utilized
In the fig. 9 the comparison for different lengths of
binary words with delay in Nano seconds with the
combination of both APC-OMS and DA techniques
when compared to using only APC-OMS technique is
shown. Even though there will be slight increase in
delay but there is 40% decrease in area utilization
with the combination of both APC-OMS and DA
techniques when compared to using only APC-OMS
technique.
Fig.10 Spartan 3E results for L=8
The same code is implemented in Spartan 3E
FPGA kit with input as 8’h05 and getting the output
sequences in hexa decimal value as ‘01’shown in fig.
10.
V. CONCLUSION
The LUT based multipliers can be used to
implement the constant multiplication for DSP
applications. The full advantages of proposed LUT
based design can be derived if the LUTs are
implemented as NAND or NOR read-only memories.
The OMS–APC-based LUTs can be used for higher
input sizes with different forms like parallel and
pipelined addition schemes for suitable area–delay
trade-offs. Finite impulse response plays an important
role in manyDigital Signal Processing applications. In
this method multiplier less FIR filter is implemented
using DA technique. This architecture provides an
efficient area implementation of FIR filter with less
latency, less area when compared with existing FIR
filters. L=4 to 8-bit width based filters are designed
and simulated using Xilinx ISE 10.1i. The
performance of the filter can be improved further by
pipelining all the input and partition tables for higher
input sizes.
REFERANCES
[1]
[2]
[3]
7.5
7
6.5
DA
6
4 bit
5 bit
6bit
7 bit
8 bit
Delay in nano seconds
Comparision of Delay
APC-OMS
[4]
Length of binary word size
ISSN: 2231-5381
[5]
K. K. Parhi, VLSI Digital Signal Processing Systems:
Design and Implementation. New York: Wiley, 1999.
HanhoLee, GeraldE. Sobelman, “FPGA-based digitserial CSD FIR filter for image signal format
conversion”,MicroelectronicsJournal33(5–6)(2002)
501– 508.
Narender Singh Pal, Harjit Pal Singh, R.K. Sarin,
SarbjeetSingh
“IMPLEMENTATION OF HIGH
SPEED FIR FILTER USING SERIAL AND
PARALLEL
DISTRIBUTED
ARITHMETIC
ALGORITHAM” International
Journal of Computer
Applications (0975 – 8887) Volume 25– No.7, July
2011.
P.K. Meher, “LUT Optimization for Memory-Based
Computation” IEEE TRANSCTIONS ON CIRCUITS
AND SYSTEMS—II: EXPRESS BRIEFS,
VOL. 57,
NO. 4, APRIL 2010.
International Technology Roadmap for Semiconductors.
[Online]. Available:http://public.itrs.net.
http://www.ijettjournal.org
Page 2660
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue6- June 2013
[6]
[7]
[8]
JiafengXie,Jianjun He, Guanzheng Tan, “FPGA
Realization of FIR filters for high-speed and mediumspeed by using modified distributed arithmetic
architectures”, Microelectronics journal 41(2010) 365370.
Valeria Garofalo, “Fixed-width multipliers for the
implementation of efficient digital FIR filters”,
Microelectronics Journal 39(12)(2008)1491–1498.
Eldho John, P. Dinesh Kumar “MODIFIED APC-OMS
COMBINED LUT FOR MEMORY BASED
COMPUTATION”, International Journal of Systems,
Algorithms & Applications Volume 2, Issue 3, March
2012, ISSN Online: 2277-2677.
ISSN: 2231-5381
http://www.ijettjournal.org
Page 2661
Download