International Journal of Engineering Trends and Technology (IJETT) - Volume 4 Issue 6 - June 2013

DA Based FIR Filter Using APC-OMS Technique

K. Shareef Babu1, H.D. Praveena2, K. Charan Kumar3
1 M.Tech (DECS), 3 M.Tech (VLSI) Students, 3 Assistant Professor, ECE Department, Sree Vidyanikethan Engineering College (Autonomous), A. Rangampet, Tirupati.

Abstract- In FPGA design, the implementation of FIR filters for DSP applications plays an important role. The FPGA area is mainly determined by the number of LUTs occupied. Hence, for any design, if the area is optimised in terms of LUTs, the delay will also reduce. To optimise filters that use LUTs for memory-based multiplication, four basic techniques are used, of which the combination of two, APC and OMS, gives better optimisation results. If the Distributed Arithmetic (DA) technique is also used in the filter design, an efficient area implementation can be achieved. In this paper, filters with bit widths L = 2 to 8 are designed and synthesised using Xilinx ISE 10.1i. Nearly 40% area improvement is achieved for approximately the same delay.

Keywords- Field Programmable Gate Array (FPGA), Odd Multiple Storage (OMS), Anti-Symmetric Product Coding (APC), Distributed Arithmetic (DA) and Look Up Table (LUT).

I. INTRODUCTION

In the design of digital processors and application-specific systems, digital operations are very important [1]. Arithmetic circuits are an important class of digital systems. Nowadays, many complex circuits that were once unthinkable have become easy to realise with the remarkable progress in very large scale integration (VLSI) circuit technology [2]. Semiconductor devices are now prominent in every field owing to the rapid development of the technology.
These devices operate very fast, consume less power and area, reduce operating time, and have become more efficient with respect to factors such as reliability, flexibility and scaling; this has led to significant growth and improvement, and the devices have become cheaper [3]. Embedded semiconductor memory has a dominating presence in SoCs, exceeding 90% of the total SoC. Compared with logic components, semiconductor memory devices have a high transistor packing density that is increasing at a fast rate. In addition, memory-based computing structures offer further advantages over multiply-accumulate structures, such as greater potential for high-throughput, low-latency implementation and lower dynamic power consumption. Memory-based computing is well suited to the many digital signal processing (DSP) applications that involve multiplication by a fixed set of coefficients.

The block diagram in Fig. 1 shows the conventional look-up-table-based multiplier.

Fig. 1 Conventional LUT based multiplier

In most DSP processors, memory-based computing structures mainly concern the multiplier and accumulator structures. To reduce the computational complexity, complex multiplication operations are simplified by using LUTs that directly store the precomputed values. Look-up tables provide better performance in terms of speed and effective area utilisation [4]. Odd Multiple Storage (OMS) and Anti-Symmetric Product Coding (APC) are optimisation schemes for LUT-based FPGA design of the multiply-and-accumulate structures used in DSP cores [4,5,6]. In the OMS mechanism, only the odd multiples are stored in the LUT, and the even multiples are derived by shifting the stored odd multiples with a barrel shifter. APC is the mechanism used to reduce the required number of LUT bit positions [5].
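The difference between the conventional LUT multiplier and the OMS scheme described above can be illustrated with a minimal software sketch. It is not the hardware design of this paper: the coefficient value A = 13, the function names, and the treatment of X = 0 as a reset to zero are assumptions made only for illustration.

```python
def build_conventional_lut(a, l_bits):
    """Conventional LUT: one product word A*X for every L-bit input X."""
    return [a * x for x in range(1 << l_bits)]


def build_oms_lut(a, l_bits):
    """OMS LUT: only the odd multiples A, 3A, 5A, ... are stored."""
    return [a * x for x in range(1, 1 << l_bits, 2)]


def oms_multiply(oms_lut, x):
    """Return A*X from the odd-multiple table plus a left shift."""
    if x == 0:
        return 0  # assumption: X = 0 is handled as a reset to zero
    shifts = 0
    while x % 2 == 0:  # strip trailing zeros to reach the odd part of X
        x >>= 1
        shifts += 1
    odd_word = oms_lut[x >> 1]  # the odd value 2k+1 is stored at index k
    return odd_word << shifts   # the left shift models the barrel shifter


A = 13  # hypothetical coefficient, chosen only for the example
L = 4
conventional = build_conventional_lut(A, L)
oms = build_oms_lut(A, L)
products = [oms_multiply(oms, x) for x in range(1 << L)]
```

In this sketch the OMS table holds half as many words as the conventional table, and every product is still recovered with at most L-1 left shifts.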
LUT optimisation using APC coding and the OMS methodology is the primary basis on which the LUT-based FIR filter is designed for DSP applications. It was previously observed that, when the Anti-Symmetric Product Coding approach is combined with the Odd Multiple Storage technique, the two's complement operations can be greatly simplified, since an odd integer representation is always used for the input and output address transformation. The two were earlier considered impossible to combine, since the words generated are odd numbers.

II. APC-OMS DESIGN FOR LUT BASED MULTIPLIER DESIGN

A discrete FIR filter consists of delay elements, coefficients and adder blocks, as shown in Fig. 2.

Fig. 2 General Representation of FIR filter

For a discrete-time FIR filter, the output is a weighted sum of the current and a finite number of previous values of the input. Equation (1) defines the output sequence y[n] in terms of the input sequence x[n]:

y[n] = b_0 x[n] + b_1 x[n-1] + \cdots + b_N x[n-N] = \sum_{i=0}^{N} b_i x[n-i]    (1)

Here b_i are the filter coefficients, also known as tap weights, that make up the impulse response, and N is the filter order. An Nth-order filter has (N+1) terms on the right-hand side. The x[n-i] in these terms are commonly referred to as taps, after the structure of a tapped delay line, which in many implementations or block diagrams provides the delayed inputs to the multiplication operations. One may speak of a 5th-order/6-tap filter, for instance.

A general LUT multiplier takes an L-bit input word X as its address and outputs the product word AX, where A is the constant whose multiples are stored in the LUT. Multiplying an L-bit X by the constant requires 2^L words, so the LUT size increases exponentially with the input size. For word length L = 4, the LUT holds 16 words, one per 4-bit address, as shown in Table 1.

TABLE 1 LUT REPRESENTATION

Address word, X   Product word   Address word, X   Product word
0000              0              1000              8A
0001              A              1001              9A
0010              2A             1010              10A
0011              3A             1011              11A
0100              4A             1100              12A
0101              5A             1101              13A
0110              6A             1110              14A
0111              7A             1111              15A

TABLE 2 OMS BASED REDUCTION SCHEME FOR LUT MULTIPLIER

Address word   Product word
0001           A
0011           3A
0101           5A
0111           7A
1001           9A
1011           11A
1101           13A
1111           15A

III. LOOK UP TABLE (LUT)

The multiplication tables are pre-calculated and stored in memory. LUTs give fast access to these values and so save computational complexity. In digital logic, an n-bit LUT can be implemented with a multiplexer whose select lines are the inputs of the LUT and whose data inputs are constants. An n-bit LUT can encode any n-input Boolean function by modelling it as a truth table [7]. LUTs with 4 to 6 input bits are the key component of modern FPGAs, and this is an efficient way of encoding functions. The general representation of an LUT used for multiplication is shown in Fig. 3.

Fig. 3 General Form for LUT multiplier

With the OMS scheme, only the odd multiples are stored in the LUT, and the even multiples are derived by left-shifting the odd multiples with a barrel shifter. The barrel shifter needs to produce at most (L-1) left shifts to generate the even multiples.

TABLE 3 APC WORDS FOR DIFFERENT INPUT WORDS FOR L=4

Input X' (x2' x1' x0')   Product value   No. of shifts   Shifted input, X''   Stored APC word   Address d2d1d0
001                      A               0               001                  P0 = A            000
010                      2 × A           1               001                  P0 = A            000
100                      4 × A           2               001                  P0 = A            000
011                      3A              0               011                  P1 = 3A           001
110                      2 × 3A          1               011                  P1 = 3A           001
101                      5A              0               101                  P2 = 5A           010
111                      7A              0               111                  P3 = 7A           011
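The address mapping of Table 3, and the way its four stored words can serve a full 4-bit multiplication, can be sketched in software as follows. The first function reproduces the rows of Table 3. The APC step in the second function is not spelled out in this paper, so it is an assumption: it uses the usual antisymmetric form AX = 8A ± (APC word), with the address X' taken as the three low bits of X when the MSB is 1 and as their two's complement otherwise. The function names and the reset handling of X = 0 are likewise hypothetical.

```python
def oms_address(x_prime):
    """Map a nonzero 3-bit X' to (shifts, shifted input X'', address d2d1d0)."""
    shifts = 0
    while x_prime % 2 == 0:  # remove trailing zeros until X'' is odd
        x_prime >>= 1
        shifts += 1
    return shifts, x_prime, x_prime >> 1  # odd value 2k+1 lives at address k


def apc_oms_multiply(a, x):
    """Return A*X for a 4-bit X using only the stored words A, 3A, 5A, 7A."""
    stored = [a, 3 * a, 5 * a, 7 * a]  # P0..P3 of Table 3
    if x == 0:
        return 0  # assumption: X = 0 is handled by a reset
    msb = x >> 3
    low = x & 0b111
    # Assumed APC address: low bits if MSB = 1, else their two's complement.
    x_prime = low if msb else (-low) & 0b111
    if x_prime == 0:
        apc_word = 0  # occurs only for X = 8, where the product is exactly 8A
    else:
        shifts, _, address = oms_address(x_prime)
        apc_word = stored[address] << shifts  # shift restores even multiples
    centre = a << 3  # 8A, the midpoint of the antisymmetric product pairs
    return centre + apc_word if msb else centre - apc_word


table3_rows = {x: oms_address(x) for x in (1, 2, 4, 3, 6, 5, 7)}
```

Under these assumptions the sketch needs four stored words instead of the sixteen of Table 1, which is consistent with a reduction to one fourth.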
The implementation of the proposed APC-OMS combined LUT for the memory-based multiplier uses two techniques, the APC and OMS methods, as shown in Fig. 4. This method is expected to reduce the area to one fourth. The multiplier uses four blocks [8] [5] [6] [7]. The address generation block converts the input to the address d0, d1, d2, which is produced by combining both the APC and OMS methods. The 3-to-8 address line decoder converts the address d0, d1, d2 to the LUT address lines w1 to w7. The memory array is an LUT, and the barrel shifter converts the LUT output to the desired output. The control circuit produces the control signals s0 and s1, which are used in the subsequent blocks [2] [7] [8] [11]. The control and reset circuit can be designed as

S0 = x0 + (x1 + x2')     (2)
S1 = (x0 + x1)           (3)
Reset = x3 and x2'x1'    (4)

The control signals reset, s0 and s1 are produced using basic gates. The barrel shifter performs a circular right shift according to the control values (s0 s1), producing the address (d0 d1 d2) used in the following blocks. The address generator circuit thus consists of a barrel shifter and some basic gates, and converts the input to an address d0 d1 d2 obtained by combining both methods, anti-symmetric product coding (APC) and odd multiple storage (OMS).

Fig. 4 APC-OMS approach for L=4

IV. DISTRIBUTED ARITHMETIC BASED FIR FILTER

A basic DA architecture, for a length-N sum-of-products computation, accepts one bit from each of the N words. If two bits per word are accepted, the computational speed can be substantially improved. The maximum speed is achieved with a fully pipelined word-parallel architecture. For maximum speed, a separate ROM (with identical content) should be provided for each bit vector x_b[n]. The combined approach of the FIR filter using the DA technique for L=8, built from APC-OMS blocks with L=4, is shown in Fig. 5.

Fig. 5 Combined approach of FIR filter using DA technique for L=8 using APC-OMS techniques with L=4

V. RESULTS AND DISCUSSIONS

Fig. 6 shows the waveforms generated using Xilinx ISE for the combined-approach FIR filter using the DA technique for L=8, built from two L=4 LUT designs using the combined APC-OMS approach. For the 8-bit input operand, X, Data_in, Address, W and P are given the values 8'h05, 8'h02, 8'h04, 8'h004 and 8'h04 respectively at 421 ns, and the output Q is obtained as 8'h001.

Fig. 6 Simulated results for L=8

Fig. 7 compares, for different binary word lengths, the number of 4-input LUTs used by the combination of the APC-OMS and DA techniques with the number used by the APC-OMS technique alone. The combined technique uses fewer LUTs than the APC-OMS technique alone, which implies a reduction in the area utilised on the FPGA.

Fig. 7 Comparison for number of LUTs utilized (number of 4-input LUTs versus binary word length, 4 to 8 bits, for DA and APC-OMS)

Fig. 8 compares, for different binary word lengths, the number of slices used by the combination of the APC-OMS and DA techniques with the number used by the APC-OMS technique alone. The combined technique uses fewer slices than the APC-OMS technique alone, which again implies a reduction in the area utilised on the FPGA.

Fig. 8 Comparison for number of slices utilized (number of slices versus binary word length, 4 to 8 bits, for DA and APC-OMS)

Fig. 9 Comparison of delay

In Fig.
9, the delay in nanoseconds for different binary word lengths is compared between the combination of the APC-OMS and DA techniques and the APC-OMS technique alone. Although there is a slight increase in delay, the combination of the APC-OMS and DA techniques gives a 40% decrease in area utilisation compared with using only the APC-OMS technique.

The same code was implemented on a Spartan 3E FPGA kit with the input 8'h05, giving the output sequence '01' in hexadecimal, as shown in Fig. 10.

Fig. 10 Spartan 3E results for L=8

VI. CONCLUSION

LUT-based multipliers can be used to implement constant multiplication for DSP applications. The full advantages of the proposed LUT-based design can be derived if the LUTs are implemented as NAND or NOR read-only memories. The OMS-APC-based LUTs can be used for higher input sizes in different forms, such as parallel and pipelined addition schemes, for suitable area-delay trade-offs. Finite impulse response (FIR) filters play an important role in many digital signal processing applications. In this method, a multiplierless FIR filter is implemented using the DA technique. This architecture provides an area-efficient implementation of the FIR filter, with less latency and less area than existing FIR filters. Filters with bit widths L = 4 to 8 are designed and simulated using Xilinx ISE 10.1i. The performance of the filter can be improved further by pipelining all the inputs and partitioning the tables for higher input sizes.

REFERENCES

[1] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
[2] Hanho Lee, Gerald E. Sobelman, "FPGA-based digit-serial CSD FIR filter for image signal format conversion", Microelectronics Journal 33(5-6) (2002) 501-508.
[3] Narender Singh Pal, Harjit Pal Singh, R.K. Sarin, Sarbjeet Singh, "Implementation of High Speed FIR Filter Using Serial and Parallel Distributed Arithmetic Algorithm", International Journal of Computer Applications (0975-8887), Volume 25, No. 7, July 2011.
[4] P.K. Meher, "LUT Optimization for Memory-Based Computation", IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 57, no. 4, April 2010.
[5] International Technology Roadmap for Semiconductors. [Online]. Available: http://public.itrs.net.
[6] Jiafeng Xie, Jianjun He, Guanzheng Tan, "FPGA Realization of FIR filters for high-speed and medium-speed by using modified distributed arithmetic architectures", Microelectronics Journal 41 (2010) 365-370.
[7] Valeria Garofalo, "Fixed-width multipliers for the implementation of efficient digital FIR filters", Microelectronics Journal 39(12) (2008) 1491-1498.
[8] Eldho John, P. Dinesh Kumar, "Modified APC-OMS Combined LUT for Memory Based Computation", International Journal of Systems, Algorithms & Applications, Volume 2, Issue 3, March 2012, ISSN Online: 2277-2677.

ISSN: 2231-5381 http://www.ijettjournal.org