Chapter 4 The Fast Fourier Transform

4.1 Introduction

The fundamental motivation for the FFT is the extremely high computational load of calculating the DTFS directly, as shown in section (3.7.1). By exploiting the symmetry and periodicity properties of the twiddle factor described in section (3.3), the number of required calculations is significantly reduced [1]. Many FFT algorithms exist today, but they are all derivatives of the work done by Cooley and Tukey in 1965 [7]. Their algorithm was so computationally efficient that it revolutionized digital signal processing [4]. The highest efficiency is achieved when the sample length is a power of two [1].

It is important to note that the FFT is mathematically equivalent to the DTFS, not an approximation of it. As a result, all of the properties and strengths of the DTFS hold for every FFT algorithm, as do most of its weaknesses. What the FFT improves is the computational requirement and, because far fewer arithmetic operations are performed, the accumulated quantization noise error [2].

4.2 FFT Improvements to the DTFS

4.2.1 Computational Load

The computational load of the DTFS discussed in section (3.7.1) is defined by equations (3.24) and (3.25). The high computational load of the DTFS is due to the $N^2$ terms in these equations. FFT algorithms significantly reduce the required computations. All FFT algorithms have a computational load of

\[ \text{Number of Adds} = C_1 N \log_2 N \tag{4.1} \]

\[ \text{Number of Multiplies} = C_2 N \log_2 N \tag{4.2} \]

where $C_1$ and $C_2$ are constants. The radix-2 Cooley-Tukey algorithm requires approximately $3N\log_2 N$ adds and $2N\log_2 N$ multiplies [2]. Figure (4.1) compares the computational load of this FFT algorithm with that of computing the DTFS directly.

Figure 4.1 Computational Loads of DTFS and FFT

As shown in figure (4.1), the advantage of the FFT over the DTFS grows rapidly as the sample length (N) increases. For example, consider a sample length (N) of 32,768.
Calculating the DTFS directly would require 8,590,000,129 computations, while calculating the FFT would require only 2,457,600 (that is, $3N\log_2 N + 2N\log_2 N = 5 \times 32{,}768 \times 15$). The ratio of the two counts is about 3,495 to 1. This means that if a computer needs 30 seconds to evaluate the FFT, it would need about 29.13 hours to evaluate the DTFS directly.

4.2.2 Reduced Quantization Noise Error

Quantization noise error has two contributing components. First, as discussed in section (3.7.2), the rounding-off process involved in quantizing a sampled signal introduces quantization error. Quantization of a sampled signal was discussed in section (2.6). This quantization error is magnified by the numerous multiplications required to calculate the DTFS. Since the FFT requires significantly fewer multiplications, the error due to quantization noise is also reduced.

The second component also arises from the multiplications, because each product must itself be rounded off. The product of two M-bit numbers is a 2M-bit number. To store the result as an M-bit value, the bottom M bits must be discarded. For example, if two 16-bit numbers are multiplied, the product is a 32-bit number [2], and 16 of those bits are lost when it is stored. Again, since the FFT requires fewer multiplications, this contribution to the error is also reduced.

4.3 Additional Weakness of the FFT

Since the input data must be reorganized in order to compute the FFT, all of the output coefficients must be computed; a single output coefficient cannot be computed on its own using the FFT. With the DTFS, however, each output coefficient can be computed one at a time. As a result, if only a few output coefficients are needed, computing them directly with the DTFS can require less computation than the FFT. Generally, though, all of the output coefficients are needed, and this weakness does not apply [2].

4.4 Radix-2 Decimation-in-Time (DIT) FFT Algorithm

A length-N DTFS can be split up into a series of lower-order DTFSs.
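The number of computations required to calculate the series of lower-order DTFSs is significantly smaller than that required for the original DTFS. Consider equations (3.1) and (3.2). Since they are basically the same operation, the same algorithm with a small modification can be used to generate either the set of X[k] (FFT) or the set of x[n] (inverse FFT) [1].

Assuming that N is an even number, the set x[n] in equation (3.1) can be divided into its even and odd indexes [1]. Quantitatively, this is described by equations (4.3) and (4.4):

\[ x_e[n] = x[2n], \quad 0 \le n \le N' - 1 \tag{4.3} \]

\[ x_o[n] = x[2n+1], \quad 0 \le n \le N' - 1 \tag{4.4} \]

where $N' = N/2$. The DTFS of each even and odd data set is defined by equations (4.5) and (4.6), each taken with fundamental frequency $\Omega_0'$:

\[ \mathrm{DTFS}\{x_e[n]\} = X_e[k], \quad \Omega_0' \tag{4.5} \]

\[ \mathrm{DTFS}\{x_o[n]\} = X_o[k], \quad \Omega_0' \tag{4.6} \]

where $\Omega_0' = 2\pi/N' = 4\pi/N = 2\Omega_0$. Now express equation (3.1) in terms of the even and odd data sets [1]:

\[ X[k] = \frac{1}{N}\sum_{n=0}^{N-1} x[n]\,e^{-j\Omega_0 k n} = \frac{1}{N}\sum_{n\ \mathrm{even}} x[n]\,e^{-j\Omega_0 k n} + \frac{1}{N}\sum_{n\ \mathrm{odd}} x[n]\,e^{-j\Omega_0 k n} \tag{4.7} \]

\[ = \frac{1}{N}\sum_{m=0}^{N'-1} x[2m]\,e^{-j\Omega_0 k(2m)} + \frac{1}{N}\sum_{m=0}^{N'-1} x[2m+1]\,e^{-j\Omega_0 k(2m+1)} \tag{4.8} \]

\[ = \frac{1}{N}\sum_{m=0}^{N'-1} x[2m]\,e^{-j\Omega_0 k(2m)} + \frac{1}{N}\sum_{m=0}^{N'-1} x[2m+1]\,e^{-j\Omega_0 k(2m)}\,e^{-j\Omega_0 k} \tag{4.9} \]

\[ = \frac{1}{N}\sum_{m=0}^{N'-1} x[2m]\,e^{-j\Omega_0 k(2m)} + \frac{e^{-j\Omega_0 k}}{N}\sum_{m=0}^{N'-1} x[2m+1]\,e^{-j\Omega_0 k(2m)} \tag{4.10} \]

Now by substituting $x_e[n]$, $x_o[n]$ and $\Omega_0' = 2\pi/N' = 4\pi/N = 2\Omega_0$ into equation (4.10), equation (4.11) is obtained:

\[ X[k] = \frac{1}{N}\sum_{m=0}^{N'-1} x_e[m]\,e^{-j\Omega_0' k m} + \frac{e^{-j\Omega_0 k}}{N}\sum_{m=0}^{N'-1} x_o[m]\,e^{-j\Omega_0' k m} \tag{4.11} \]

and by equations (3.1), (4.5) and (4.6) the following expression is obtained:

\[ X[k] = X_e[k] + e^{-j\Omega_0 k}\,X_o[k], \quad 0 \le k \le N - 1 \tag{4.12} \]

Strictly, if $X_e[k]$ and $X_o[k]$ are each normalized by $1/N'$ as in equation (3.1), the right-hand side of equation (4.12) carries an additional factor of 1/2. That factor is omitted here and in the equations that follow, which is equivalent to applying the overall $1/N$ scaling once at the end of the calculation.

Equation (4.12) implies that an N-point DTFS can be divided into its even- and odd-indexed N'-point DTFSs and evaluated. The original N-point DTFS is the sum of the even-indexed DTFS and the weighted, odd-indexed DTFS. If N is a power of two, then each even- and odd-indexed DTFS can be subdivided into its own even- and odd-indexed portions until an array of N one-point DTFSs exists [1].

The DTFS can be further simplified by using the periodicity of the DTFS.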
From equation (3.22), equations (4.13) and (4.14) are obtained [1]:

\[ X_e[k] = X_e[k + N'], \quad 0 \le k \le N' - 1 \tag{4.13} \]

\[ X_o[k] = X_o[k + N'], \quad 0 \le k \le N' - 1 \tag{4.14} \]

Also, from the inverse symmetry property in equation (3.10) it is gathered that

\[ e^{-j\Omega_0 k} = -e^{-j\Omega_0 \left(k + \frac{N}{2}\right)} \tag{4.15} \]

By applying equations (4.13), (4.14), and (4.15) to equation (4.12), the following are obtained [1]:

\[ X[k] = X_e[k] + e^{-j\Omega_0 k}\,X_o[k], \quad 0 \le k \le N' - 1 \tag{4.16} \]

\[ X[k + N'] = X_e[k] - e^{-j\Omega_0 k}\,X_o[k], \quad 0 \le k \le N' - 1 \tag{4.17} \]

Figure (4.2) displays the calculation of equations (4.16) and (4.17) for an 8-point FFT.

Figure 4.2 Eight-Point FFT [1]

Each 4-point DTFS block can further be divided into two 2-point DTFSs, as shown below in figure (4.3).

Figure 4.3 Expanded Four-Point FFT [1]

Finally, each 2-point DTFS can be divided into two 1-point DTFSs. This is shown in figure (4.4) below.

Figure 4.4 Expanded Two-Point FFT [1]

This is called a butterfly because of its appearance [1]. Since it is the first stage, the calculation in figure (4.3) is called a first-stage butterfly [9]. The calculations in the second column are called second-stage butterflies, and so on. Notice the absence of the complex exponential in figure (4.4). Equations (4.18) and (4.19) illustrate why the complex exponential is not present in figure (4.4): for a 2-point DTFS the exponential factors reduce to +1 and -1.

\[ X_{ee}[0] = x_{ee}[0]\,e^{0} + x_{ee}[1]\,e^{0} = x_{ee}[0] + x_{ee}[1] \tag{4.18} \]

\[ X_{ee}[1] = x_{ee}[0]\,e^{0} + x_{ee}[1]\,e^{-j\frac{2\pi}{2}(1)(1)} = x_{ee}[0] - x_{ee}[1] \tag{4.19} \]

The process of subdividing each of the blocks into even- and odd-indexed sets of data permutes the order of the DTFS input coefficients. After all of the subdivisions down to the 1-point DTFSs, the input coefficients end up in bit-reversed order. For example, after all of the subdivisions for the 8-point FFT shown above, the final order of the input coefficients is x[0], x[4], x[2], x[6], x[1], x[5], x[3], and lastly x[7]. Therefore, when using this algorithm, the input coefficients must first be bit-reversed before applying the calculations.
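The ordering listed above can be reproduced with a short Python sketch that reverses the bits of each index. It is only an illustration and is not the code of Appendix (D); the function name `bit_reverse_indices` is a hypothetical choice, and the sketch assumes the length is a power of two.

```python
def bit_reverse_indices(n):
    """Return the bit-reversed index order for a length-n input.

    Hypothetical helper for illustration; n is assumed to be a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1  # number of bits needed to index n samples
    order = []
    for i in range(n):
        r = 0
        for b in range(bits):
            if i & (1 << b):               # bit b of i is set ...
                r |= 1 << (bits - 1 - b)   # ... so set the mirrored bit of r
        order.append(r)
    return order


print(bit_reverse_indices(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

For N = 8 the printed list matches the order x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7] given in the text.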
The location of each input coefficient can be found by taking the binary representation of its index, reversing the order of the bits, and then converting back to decimal. For example, the binary representation of the index of x[1] is $001_2$. Bit reversing this binary value gives $100_2$, which is equal to $4_{10}$ in decimal. Therefore, x[1] is relocated to the original x[4] position [5].

The FFT algorithm used in this thesis is a radix-2 decimation-in-time (DIT) FFT algorithm that uses the scheme discussed above. It is an iterative algorithm in which the larger DTFS blocks are broken down into smaller blocks until the fundamental DTFSs are reached. After this process is completed, the algorithm computes the butterflies from the first stage to the last stage. The algorithm takes an input data set in its original order and rearranges the data into bit-reversed order for processing by the FFT. The code for the FFT algorithm used in this thesis is presented in Appendix (D).

4.5 Summary

The primary weakness of the DTFS is the computational burden it places on a system as N increases. In order to harness the power of the DTFS for signal processing of large data sets, it is necessary to circumvent the computational load required to calculate it. The work done by Cooley and Tukey in 1965 makes this possible [7]. The fast Fourier transform produces an output that is mathematically equivalent to the DTFS while drastically reducing the computational requirements. Implementing the FFT takes more initial development time than implementing the DTFS, but this cost is outweighed by the cumulative computation time saved each time the FFT is used in place of the DTFS.

Many FFT algorithms exist in addition to the algorithm used in this thesis and can be found in references [2] and [4]. The user may survey the available algorithms and choose the one that is best suited for the application at hand.
As will be discussed in section (5.2), the choice of the FFT algorithm and the choice of the window function are independent of each other. This allows a "superposition" approach to the development process of an FFT-based signal processing system. The next chapter discusses the use of window functions in order to minimize some of the weaknesses of the DTFS.
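As a closing illustration of the radix-2 decimation-in-time scheme of section 4.4, the following Python sketch reorders the input into bit-reversed order, applies the butterflies stage by stage, and compares the result with a direct evaluation of the DTFS. It is a minimal sketch under stated assumptions and is not the Appendix (D) code: the names `dtfs_direct` and `fft_dtfs` are hypothetical, the DTFS is assumed to follow the convention $X[k] = \frac{1}{N}\sum_n x[n]e^{-j2\pi kn/N}$, and the $1/N$ scaling is applied once at the end.

```python
import cmath


def dtfs_direct(x):
    """Direct DTFS (assumed convention): X[k] = (1/N) sum_n x[n] e^{-j 2 pi k n / N}."""
    n_len = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len) for n in range(n_len))
        / n_len
        for k in range(n_len)
    ]


def fft_dtfs(x):
    """Iterative radix-2 DIT evaluation of the DTFS (illustrative sketch).

    len(x) is assumed to be a power of two.
    """
    n_len = len(x)
    if n_len < 1 or n_len & (n_len - 1):
        raise ValueError("length must be a power of two")
    bits = n_len.bit_length() - 1

    # Step 1: copy the input into bit-reversed order.
    a = [0j] * n_len
    for i in range(n_len):
        r = 0
        for b in range(bits):
            if i & (1 << b):
                r |= 1 << (bits - 1 - b)
        a[r] = complex(x[i])

    # Step 2: butterflies, from 2-point blocks up to the full N-point block.
    size = 2
    while size <= n_len:
        half = size // 2
        for start in range(0, n_len, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                even = a[start + k]
                odd = w * a[start + k + half]
                a[start + k] = even + odd          # form of equation (4.16)
                a[start + k + half] = even - odd   # form of equation (4.17)
        size *= 2

    # Step 3: apply the overall 1/N scaling of the DTFS once.
    return [v / n_len for v in a]


samples = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
max_diff = max(abs(p - q) for p, q in zip(fft_dtfs(samples), dtfs_direct(samples)))
print(max_diff < 1e-9)  # True when the two methods agree to rounding error
```

In the first stage (`size == 2`) the only twiddle factor is 1, which corresponds to the absence of the complex exponential in the two-point butterfly. Any difference between the two outputs is limited to floating-point rounding.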