International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 2 – Oct 2014

Area Efficient Parallel FIR Digital Filter Structures of Even Length Based on Fast FIR Algorithm

G. Sreenivasulu1, M. Maddhu Sudhan Reddy2
1 M.Tech, Department of ECE, GPREC, KNL, A.P, India
2 Asst. Professor, M.Tech, MIETE, MIEE, Department of ECE, GPREC, KNL, A.P, India

Abstract- This paper introduces novel parallel FIR filter structures, based on fast finite-impulse response (FIR) algorithms (FFAs), that exploit symmetric coefficients to reduce hardware cost, under the condition that the number of taps is a multiple of two or three. The proposed parallel FIR structures exploit the inherent nature of symmetric coefficients, halving the number of multipliers required in each subfilter block that contains symmetric coefficients, at the cost of additional adders in the preprocessing and postprocessing blocks. Exchanging multipliers for adders is profitable because a multiplier occupies considerably more silicon area than an adder. In addition, the number of additional adders stays fixed: they appear only in the preprocessing and postprocessing blocks, not in the subfilter section. For example, for a two-parallel 24-tap FIR filter the proposed structure saves 6 multipliers at the expense of 2 adders; for a three-parallel 24-tap FIR filter it saves 8 multipliers at the expense of 7 adders; and for a four-parallel 24-tap FIR filter it saves 9 multipliers at the expense of 11 adders. The advantage of the proposed structures is that more multipliers are saved as the length of the FIR filter increases.

Keywords- Digital signal processing (DSP), fast finite-impulse response (FIR) algorithms (FFAs), parallel FIR, symmetric convolution, very large scale integration (VLSI).

I.
INTRODUCTION

Due to the explosive growth of multimedia applications, the demand for high-performance and low-power digital signal processing (DSP) keeps rising. Finite-impulse response (FIR) digital filters are among the most widely used fundamental building blocks in DSP systems, ranging from wireless communications to video and image processing. Some applications, such as video processing, need the FIR filter to operate at high frequencies, whereas others, such as the multiple-input multiple-output (MIMO) systems used in cellular wireless communication, require high throughput from a low-power circuit. Furthermore, when a narrow transition band is required, a much higher filter order is unavoidable. For example, a 576-tap digital filter is used in a video ghost canceller for broadcast television to reduce the effect of multipath signal echoes.

On the other hand, parallel processing and pipelining are two techniques used in DSP applications, and both can be exploited to reduce power consumption. Pipelining shortens the critical path by interleaving pipelining latches along the data path, at the price of increasing the number of latches and the system latency, whereas parallel processing increases the sampling rate by replicating hardware so that multiple inputs can be processed in parallel and multiple outputs are generated at the same time, at the expense of increased area. Both techniques can reduce the power consumption by lowering the supply voltage when the sampling speed is not increased. In this paper, parallel processing in the digital FIR filter is discussed. Because the hardware implementation cost grows linearly with the block size L, the parallel processing technique loses its advantage in practical implementation.

ISSN: 2231-5381
http://www.ijettjournal.org

In one category of methods, polyphase decomposition is mainly manipulated: small-sized parallel FIR filter structures are derived first, and larger block-sized ones are then constructed by cascading or iterating the small-sized parallel FIR filtering blocks. Fast FIR algorithms (FFAs) show that an L-parallel filter can be implemented using approximately (2L − 1) subfilter blocks, each of length N/L. FFA structures thus break the constraint that the hardware implementation cost of a parallel FIR filter increases linearly with the block size L, reducing the required number of multipliers from L × N to (2N − N/L). In the other category, fast linear convolution is used to develop the small-sized filtering structures, and a long convolution is then decomposed into several short convolutions; i.e., larger block-sized filtering structures can be constructed through iterations of the small-sized filtering structures. However, in both categories of methods, the symmetry of the coefficients has not yet been taken into consideration in the design of structures for symmetric convolutions, although it can lead to a significant saving in hardware cost.

In this paper, we provide new parallel FIR filter structures based on FFA, consisting of advantageous polyphase decompositions, which reduce the number of multiplications in the subfilter section by exploiting the inherent nature of the symmetric coefficients, compared with the existing FFA fast parallel FIR filter structures.

The traditional L-parallel FIR filter is expressed through polyphase decomposition as

$$\sum_{p=0}^{L-1} Y_p(z^L)\, z^{-p} = \sum_{q=0}^{L-1} X_q(z^L)\, z^{-q} \sum_{r=0}^{L-1} H_r(z^L)\, z^{-r} \tag{2}$$

where

$$X_q = \sum_{k=0}^{\infty} z^{-k}\, x(Lk+q), \qquad H_r = \sum_{k=0}^{N/L-1} z^{-k}\, h(Lk+r), \qquad Y_p = \sum_{k=0}^{\infty} z^{-k}\, y(Lk+p)$$

for p, q, r = 0, 1, 2, 3, ..., L − 1.

This paper is organized as follows. A brief introduction of FFAs is given in Section II. In Section III, the proposed parallel FIR filter structures are presented. Section IV investigates the complexity and comparisons.
In Section V, the hardware implementation and the experimental results are described. Section VI gives the conclusion.

II. FAST FIR ALGORITHM (FFA)

Consider an N-tap FIR filter, which can be expressed in the general form as

$$y(n) = \sum_{i=0}^{N-1} h(i)\, x(n-i), \qquad n = 0, 1, 2, \ldots, \infty \tag{1}$$

where x(n) is an infinite-length input sequence, h(i) are the FIR filter coefficients, and N is the length (number of taps) of the filter. The traditional L-parallel FIR filter can be derived using polyphase decomposition as in (2). From (2), the traditional L-parallel FIR filter requires $L^2$ FIR subfilter blocks of length N/L for implementation.

A. 2 × 2 FFA (L = 2)

According to (2), a two-parallel FIR filter can be expressed as

$$Y_0 + z^{-1} Y_1 = (H_0 + z^{-1} H_1)(X_0 + z^{-1} X_1) = H_0 X_0 + z^{-1}(H_0 X_1 + H_1 X_0) + z^{-2} H_1 X_1 \tag{3}$$

implying that

$$Y_0 = H_0 X_0 + z^{-2} H_1 X_1, \qquad Y_1 = H_0 X_1 + H_1 X_0 \tag{4}$$

Equation (4) shows the traditional two-parallel filter structure, which requires four length-N/2 FIR subfilter blocks and two postprocessing adders, for a total of 2N multipliers and 2N − 2 adders. However, (4) can be written as

$$Y_0 = H_0 X_0 + z^{-2} H_1 X_1, \qquad Y_1 = (H_0 + H_1)(X_0 + X_1) - H_0 X_0 - H_1 X_1 \tag{5}$$

The implementation of (5) requires three FIR subfilter blocks of length N/2, one preprocessing adder and three postprocessing adders, for a total of 3N/2 multipliers and 3(N/2 − 1) + 4 adders, which reduces the hardware cost of the traditional two-parallel filter in (4) by approximately one fourth. The two-parallel (L = 2) FIR filter implementation using the FFA obtained from (5) is shown in Fig. 1.

Fig. 1: Existing two-parallel FIR filter

III. PROPOSED FFA STRUCTURES FOR SYMMETRIC CONVOLUTIONS
To utilize the symmetry of coefficients, the main idea behind the proposed structures is to manipulate the polyphase decomposition so as to obtain as many subfilter blocks containing symmetric coefficients as possible. In each such block, half the multiplications can be reused to serve all the taps, just as a single FIR filter with symmetric coefficients requires only half as many multiplications as its length. Therefore, for an N-tap L-parallel FIR filter, the total number of saved multipliers is the number of subfilter blocks that contain symmetric coefficients times half the number of multiplications in a single subfilter block, N/(2L).

B. 3 × 3 FFA (L = 3)

By a similar approach, a three-parallel FIR filter using the existing FFA can be expressed as

$$\begin{aligned}
Y_0 &= H_0 X_0 - z^{-3} H_2 X_2 + z^{-3}\left[(H_1 + H_2)(X_1 + X_2) - H_1 X_1\right] \\
Y_1 &= \left[(H_0 + H_1)(X_0 + X_1) - H_1 X_1\right] - \left(H_0 X_0 - z^{-3} H_2 X_2\right) \\
Y_2 &= \left[(H_0 + H_1 + H_2)(X_0 + X_1 + X_2)\right] - \left[(H_0 + H_1)(X_0 + X_1) - H_1 X_1\right] - \left[(H_1 + H_2)(X_1 + X_2) - H_1 X_1\right]
\end{aligned} \tag{6}$$

The hardware implementation of (6) requires six length-N/3 FIR subfilter blocks, three preprocessing and seven postprocessing adders, for a total of 2N multipliers and 2N + 4 adders, which is approximately one third less than the hardware cost of the traditional three-parallel filter. The implementation obtained from (6) is shown in Fig. 2.

Fig. 2: Existing three-parallel FIR filter

A. 2 × 2 Proposed FFA (L = 2):

From (4), a two-parallel FIR filter can also be written as

$$\begin{aligned}
Y_0 &= \left\{\tfrac{1}{2}\left[(H_0 + H_1)(X_0 + X_1) + (H_0 - H_1)(X_0 - X_1)\right] - H_1 X_1\right\} + z^{-2} H_1 X_1 \\
Y_1 &= \tfrac{1}{2}\left[(H_0 + H_1)(X_0 + X_1) - (H_0 - H_1)(X_0 - X_1)\right]
\end{aligned} \tag{7}$$

When it comes to a set of even symmetric coefficients, (7) yields one more subfilter block containing symmetric coefficients than (5), the existing FFA parallel FIR filter. Fig. 3 shows the implementation of the proposed two-parallel FIR filter based on (7).
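The existing FFA forms (5) and (6) and the proposed two-parallel form (7) can be checked against direct convolution with a short script. The sketch below is only an illustration under stated assumptions: it uses plain Python lists rather than hardware arithmetic, the helper names (`conv`, `delay`, `ffa2_existing`, and so on) are hypothetical and do not come from the paper, the test signal is arbitrary, and a single block delay stands in for the $z^{-2}$ (L = 2) or $z^{-3}$ (L = 3) factor acting on the polyphase sequences. The lengths of `h` and `x` are assumed to be multiples of L.

```python
def conv(a, b):
    """Direct linear convolution of two lists (one FIR subfilter block)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def half(a):
    return [p / 2 for p in a]

def delay(v):
    """One block delay: z^-L at the sample rate of an L-parallel filter."""
    return [0] + v

def pad(v):
    """Append one zero so a product lines up with a delayed product."""
    return v + [0]

def interleave(branches):
    """Merge the polyphase outputs Y_0 .. Y_{L-1} into one sequence."""
    return [s for block in zip(*branches) for s in block]

def ffa2_existing(h, x):
    """Two-parallel FFA of (5): subfilters H0, H1 and H0+H1."""
    h0, h1, x0, x1 = h[0::2], h[1::2], x[0::2], x[1::2]
    a, b = conv(h0, x0), conv(h1, x1)
    c = conv(add(h0, h1), add(x0, x1))
    y0 = add(pad(a), delay(b))
    y1 = pad(sub(sub(c, a), b))
    return interleave([y0, y1])

def ffa2_proposed(h, x):
    """Two-parallel form of (7): subfilters H0+H1, H0-H1 and H1."""
    h0, h1, x0, x1 = h[0::2], h[1::2], x[0::2], x[1::2]
    s = conv(add(h0, h1), add(x0, x1))
    d = conv(sub(h0, h1), sub(x0, x1))
    b = conv(h1, x1)
    h0x0 = sub(half(add(s, d)), b)          # recovers H0*X0
    y0 = add(pad(h0x0), delay(b))
    y1 = pad(half(sub(s, d)))
    return interleave([y0, y1])

def ffa3_existing(h, x):
    """Three-parallel FFA of (6): six subfilter blocks."""
    h0, h1, h2 = h[0::3], h[1::3], h[2::3]
    x0, x1, x2 = x[0::3], x[1::3], x[2::3]
    p00, p11, p22 = conv(h0, x0), conv(h1, x1), conv(h2, x2)
    t = sub(conv(add(h0, h1), add(x0, x1)), p11)
    u = sub(conv(add(h1, h2), add(x1, x2)), p11)
    full = conv(add(add(h0, h1), h2), add(add(x0, x1), x2))
    w = sub(pad(p00), delay(p22))           # H0*X0 - z^-3 H2*X2
    y0 = add(w, delay(u))
    y1 = sub(pad(t), w)
    y2 = pad(sub(sub(full, t), u))
    return interleave([y0, y1, y2])

# Arbitrary 24-tap symmetric filter and 36-sample integer test input.
h = list(range(1, 13)) + list(range(12, 0, -1))
x = [(7 * n) % 11 - 5 for n in range(36)]
ref = conv(h, x)

checks = {
    "(5)": ffa2_existing(h, x)[:len(ref)] == ref,
    "(7)": ffa2_proposed(h, x)[:len(ref)] == ref,
    "(6)": ffa3_existing(h, x)[:len(ref)] == ref,
}
print(checks)
```

Integer test data is used so that the factor 1/2 in (7) divides exactly and the comparison with direct convolution can be an equality rather than a tolerance test.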
Fig. 3: Proposed two-parallel FIR filter

Example 1: Consider a 24-tap FIR filter with a set of symmetric coefficients applied to the proposed two-parallel FIR filter:

$$\{h(0), h(1), h(2), \ldots, h(23)\}$$

where h(0) = h(23), h(1) = h(22), h(2) = h(21), h(3) = h(20), h(4) = h(19), ..., h(11) = h(12). Then

$$H_0 \pm H_1 = \{h(0) \pm h(1),\ h(2) \pm h(3),\ h(4) \pm h(5),\ h(6) \pm h(7),\ \ldots,\ h(18) \pm h(19),\ h(20) \pm h(21),\ h(22) \pm h(23)\} \tag{8}$$

As can be seen from the example above, two of the three subfilter blocks of the proposed two-parallel FIR filter structure, $H_0 - H_1$ and $H_0 + H_1$, now have symmetric coefficients, as in (8), which means each of these subfilter blocks can be realized as in Fig. 4 with only half the number of multipliers. Each multiplier output serves two taps. Note that the transposed direct-form FIR filter is employed. Compared to the existing FFA two-parallel FIR filter structure, the proposed FFA structure leads to one more subfilter block that contains symmetric coefficients. However, this comes at the price of more adders in the preprocessing and postprocessing blocks; in this case, two additional adders are required for L = 2.

Fig. 4: Subfilter block implementation with symmetric coefficients

The proposed three-parallel FIR filter is expressed as

$$\begin{aligned}
Y_0 &= \tfrac{1}{2}\left[(H_0 + H_1)(X_0 + X_1) + (H_0 - H_1)(X_0 - X_1)\right] - H_1 X_1 \\
&\quad + z^{-3}\left\{(H_0 + H_1 + H_2)(X_0 + X_1 + X_2) - (H_0 + H_2)(X_0 + X_2) - \tfrac{1}{2}\left[(H_0 + H_1)(X_0 + X_1) - (H_0 - H_1)(X_0 - X_1)\right] - H_1 X_1\right\} \\
Y_1 &= \tfrac{1}{2}\left[(H_0 + H_1)(X_0 + X_1) - (H_0 - H_1)(X_0 - X_1)\right] \\
&\quad + z^{-3}\left\{\tfrac{1}{2}\left[(H_0 + H_2)(X_0 + X_2) + (H_0 - H_2)(X_0 - X_2)\right] - \tfrac{1}{2}\left[(H_0 + H_1)(X_0 + X_1) + (H_0 - H_1)(X_0 - X_1)\right] + H_1 X_1\right\} \\
Y_2 &= \tfrac{1}{2}\left[(H_0 + H_2)(X_0 + X_2) - (H_0 - H_2)(X_0 - X_2)\right] + H_1 X_1
\end{aligned} \tag{9}$$

Fig. 5: Proposed three-parallel FIR filter

Fig. 5 shows the implementation of the proposed three-parallel FIR filter.
B. 3 × 3 Proposed FFA (L = 3):

With a similar approach, from (6), a three-parallel FIR filter can also be written as (9). When the number of taps N of the symmetric coefficient set is a multiple of 3, the proposed three-parallel FIR filter structure presented in (9) enables four subfilter blocks with symmetric coefficients in total, whereas the existing FFA parallel FIR filter structure has only two out of six subfilter blocks. A comparison figure is shown in Fig. 6.

C. Proposed Cascading FFA

A small modification is adopted here for lower hardware consumption. As we can see, the proposed parallel FIR structure enables the reuse of multipliers in some of the subfilter blocks, but it also brings more adder cost in the preprocessing and postprocessing blocks. When the proposed FFA parallel FIR structures are cascaded for a larger parallel block factor L, the increase in adders can become larger. Therefore, rather than applying the proposed FFA FIR filter structure to all the decomposed subfilter blocks, the existing FFA structures, which have more compact operations in the preprocessing and postprocessing blocks, are employed for those subfilter blocks that contain no symmetric coefficients, whereas the proposed FIR filter structures are still applied to the remaining subfilter blocks with symmetric coefficients. An illustration of the proposed cascading process for a four-parallel FIR filter (L = 4) is shown in Fig. 7, and the realization is shown in Fig. 8.

Fig. 8: Proposed four-parallel FIR filter implementation

From Fig. 7, it is clear that the proposed four-parallel FIR structure yields three more subfilter blocks containing symmetric coefficients than the existing FFA one, which means 3N/8 multipliers can be saved for an N-tap FIR filter, at the price of 11 additional adders in the preprocessing and postprocessing blocks.
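The multiplier savings quoted for the two-, three- and four-parallel structures can be cross-checked by listing the subfilter blocks each structure uses and counting half the taps for every block whose coefficients are symmetric. The following sketch is a rough illustration, not the authors' tool: the function names are hypothetical, the 24-tap symmetric test filter is arbitrary, an antisymmetric block such as $H_0 - H_1$ is assumed to share multipliers in the same way as a symmetric one, and the four-parallel case follows the cascading rule described above (proposed split for symmetric blocks, existing FFA split otherwise).

```python
def add(a, b):
    return [p + q for p, q in zip(a, b)]

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def is_symmetric(c):
    """True for even-symmetric or antisymmetric (mirror-negated) taps."""
    r = c[::-1]
    return c == r or c == [-v for v in r]

def multipliers(blocks):
    """Half the taps for a symmetric block, all taps otherwise."""
    return sum(len(c) // 2 if is_symmetric(c) else len(c) for c in blocks)

def split2_existing(c):
    """Subfilter blocks of the existing two-parallel FFA, as in (5)."""
    c0, c1 = c[0::2], c[1::2]
    return [c0, c1, add(c0, c1)]

def split2_proposed(c):
    """Subfilter blocks of the proposed two-parallel form, as in (7)."""
    c0, c1 = c[0::2], c[1::2]
    return [add(c0, c1), sub(c0, c1), c1]

def split3_existing(c):
    """Subfilter blocks of the existing three-parallel FFA, as in (6)."""
    c0, c1, c2 = c[0::3], c[1::3], c[2::3]
    return [c0, c1, c2, add(c0, c1), add(c1, c2), add(add(c0, c1), c2)]

def split3_proposed(c):
    """Subfilter blocks of the proposed three-parallel form, as in (9)."""
    c0, c1, c2 = c[0::3], c[1::3], c[2::3]
    return [add(c0, c1), sub(c0, c1), c1,
            add(add(c0, c1), c2), add(c0, c2), sub(c0, c2)]

def cascade4_existing(h):
    """Four-parallel filter built from two existing 2 x 2 FFA stages."""
    return [b for s in split2_existing(h) for b in split2_existing(s)]

def cascade4_proposed(h):
    """Proposed split for symmetric blocks, existing split for the rest."""
    blocks = []
    for s in split2_proposed(h):
        blocks += split2_proposed(s) if is_symmetric(s) else split2_existing(s)
    return blocks

# Arbitrary 24-tap even-symmetric test filter: h(i) = h(23 - i).
first_half = [(i + 1) ** 2 for i in range(12)]
h = first_half + first_half[::-1]

counts = {
    2: (multipliers(split2_existing(h)), multipliers(split2_proposed(h))),
    3: (multipliers(split3_existing(h)), multipliers(split3_proposed(h))),
    4: (multipliers(cascade4_existing(h)), multipliers(cascade4_proposed(h))),
}
for L, (existing, proposed) in counts.items():
    print(f"L={L}: existing {existing}, proposed {proposed}, "
          f"saved {existing - proposed}")
```

For this test filter the counts come out as 30 and 24 for L = 2, 40 and 32 for L = 3, and 51 and 42 for L = 4, which agrees with the multiplier column of Table 1; the four-parallel saving of 9 equals 3N/8 for N = 24.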
By this cascading approach, parallel FIR filter structures with a larger block factor L can be realized.

IV. COMPLEXITY ANALYSIS AND COMPARISON

When an L-parallel FIR filter comes with a set of symmetric coefficients of length N, the number of required multipliers for the proposed parallel FIR filter structures is given by

$$M = \frac{N}{\prod_{i=1}^{r} L_i} \prod_{i=1}^{r} M_i - \frac{S\,N}{2 \prod_{i=1}^{r} L_i} \tag{10}$$

where $L_i$ is the small parallel block size of the ith FFA, r is the number of FFAs used, $M_i$ is the number of subfilter blocks resulting from the ith FFA, and S is the number of subfilter blocks containing symmetric coefficients. For example, for the proposed two-parallel 24-tap filter (r = 1, $L_1$ = 2, $M_1$ = 3, S = 2), (10) gives M = 36 − 12 = 24, in agreement with Table 1.

TABLE 1
COMPARISON OF THE PROPOSED AND THE EXISTING FFA STRUCTURES: NUMBER OF REQUIRED MULTIPLIERS (M.), REDUCED MULTIPLIERS (R.M.), NUMBER OF REQUIRED ADDERS IN SUBFILTER SECTION (SUB.), NUMBER OF REQUIRED ADDERS IN PRE/POST PROCESSING BLOCKS (PRE/POST.), AND THE NUMBER OF INCREASED ADDERS (I.A.)

L    Length    Structure    M.    R.M.    I.A.
2    24-tap    Existing     30
               Proposed     24    6       2
3    24-tap    Existing     40
               Proposed     32    8       7
4    24-tap    Existing     51
               Proposed     42    9       11

V. IMPLEMENTATION AND EXPERIMENTAL RESULT

The proposed FFA structures and the existing FFA structures are implemented in Verilog HDL with a filter length of 24 and a word length of 16 bits.

TABLE 2
COMPARISON OF DEVICE LOGIC UTILIZATION FOR THE TWO-PARALLEL EXISTING AND PROPOSED STRUCTURES

Logic utilization          Available    Existing    Proposed
Number of slices           4656         1822        1474
Number of slices           9312         51          58
Number of 4 input LUTs     9312         3450        2807
Number of bonded IOBs      232          328         296
Number of MULT18X18SI      20           20          20
Number of GCLKs            4            1           1

VI. CONCLUSION

In this paper, we have presented new parallel FIR filter structures, which are beneficial to symmetric convolutions when the number of taps is a multiple of 2 or 3. Multipliers are the major portion of the hardware consumption in a parallel FIR filter implementation.
The proposed new structure exploits the nature of even symmetric coefficients and saves a significant number of multipliers at the expense of additional adders. Since multipliers outweigh adders in hardware cost, it is profitable to exchange multipliers for adders. Moreover, the number of additional adders stays fixed when the length of the FIR filter becomes large, whereas the number of saved multipliers increases with the length of the FIR filter. Consequently, the longer the FIR filter, the more the proposed structures can save over the existing FFA structures with respect to hardware cost. Overall, in this paper, we have provided new parallel FIR structures consisting of advantageous polyphase decompositions that handle symmetric convolutions better than the existing FFA structures in terms of hardware consumption.

TABLE 2
COMPARISON OF DEVICE LOGIC UTILIZATION FOR THE TWO-PARALLEL EXISTING AND PROPOSED STRUCTURES

Logic utilization          Available    Existing    Proposed
Number of slices           4656         989         609
Number of slices           9312         28          28
Number of 4 input LUTs     9312         1883        1174
Number of bonded IOBs      232          298         238
Number of MULT18X18SI      20           20          20
Number of GCLKs            4            1           1

REFERENCES

[1] J. G. Chung and K. K. Parhi, “Frequency-spectrum-based low-area low-power parallel FIR filter design,” EURASIP J. Appl. Signal Process., vol. 2002, no. 9, pp. 444–453, 2002.
[2] Z.-J. Mou and P. Duhamel, “Short-length FIR filters and their use in fast nonrecursive filtering,” IEEE Trans. Signal Process., vol. 39, no. 6, pp. 1322–1332, Jun. 1991.
[3] C. Cheng and K. K. Parhi, “Hardware efficient fast parallel FIR filter structures based on iterated short convolution,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 8, pp. 1492–1500, Aug. 2004.
[4] C. Cheng and K. K.
Parhi, “Low-cost parallel FIR structures with 2-stage parallelism,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 2, pp. 280–290, Feb. 2007.
[5] N. Sankarayya, K. Roy, and D. Bhattacharya, “Algorithms for low power and high speed FIR filter realization using differential coefficients,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 44, no. 6, Jun. 1997.
[6] I.-S. Lin and S. K. Mitra, “Overlapped block digital filtering,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 43, no. 8, pp. 586–596, Aug. 1996.
[7] Y.-C. Tsao and K. Choi, “Area-efficient VLSI implementation for parallel linear-phase FIR digital filters of odd length based on fast FIR algorithm,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 59, no. 6, Jun. 2012.
[8] Y. Wang and K. Roy, “A new complexity reduction technique for multiplierless implementation of digital FIR filters,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 9, Sep. 2005.
[9] A. Matsuura, M. Yukishita, and A. Nagoya, “An efficient hierarchical clustering method for the multiple constant multiplication problem.”
[10] Y. Takahashi, K. Takahashi, and M. Yokoyama, “Synthesis of multiplierless FIR filter by efficient sharing of horizontal and vertical common subexpression elimination,” International Conference on Circuits and Systems.