Fast Fourier Transform: Theory and Algorithms Lecture 8 Vladimir Stojanović 6.973 Communication System Design – Spring 2006 Massachusetts Institute of Technology Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Discrete Fourier Transform – A review Definition {Xk} is periodic Since {Xk} is sampled, {xn} must also be periodic From a physical point of view, both are repeated with period N Requires O(N2) operations 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 2 Fast Fourier Transform History Twiddle factor FFTs (non-coprime sub-lengths) 1805 Gauss Predates even Fourier’s work on transforms! 1903 Runge 1965 Cooley-Tukey 1984 Duhamel-Vetterli (split-radix FFT) FFTs w/o twiddle factors (coprime sub-lengths) 1960 Good’s mapping application of Chinese Remainder Theorem ~100 A.D. 1976 Rader – prime length FFT 1976 Winograd’s Fourier Transform (WFTA) 1977 Kolba and Parks (Prime Factor Algorithm – PFA) 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 3 Divide and conquer Divide and conquer always has less computations Suppose all Il sets have same number of elements N1 so, N=N1*N2, r=N2 Each inner-most sum takes N12 multiplications The outer sum will need N2 multiplications per output point N2*N for the whole sum (for all output points) Hence, total number of multiplications 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 4 Generalizations The inner-most sum has to represent a DFT Only possible if the subset (possibly permuted) All main classes of FFTs can be cast in the above form Both sums have same periodicity (Good’s mapping) No permutations (i.e. twiddle factors) All the subsets have same number of elements m=N/r Has the same periodicity as the initial sequence (m,r)=1 – i.e. m and r are coprime If not, then inner sum is one stap of radix-r FFT If r=3, subsets with N/2, N/4 and N/4 elements Split-radix algorithm 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 5 The cost of mapping The goal for divide and conquer Different types balance mapping with subproblem cost E.g. in radix-2 subproblems are trivial (only sum and differences) Mapping requires twiddle factors (large number of multiplies) E.g. in prime-factor algorithm Subproblems are DFTs with coprime lengths (costly) Mapping trivial (no arithmetic operations) 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 6 FFTs with twiddle factors Reintroduced by Cooley-Tukey ‘65 Start from general divide and conquer Keep periodicity compatible with periodicity of the input sequence Use decimation almost N1 DFTs of size N2 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 7 Cooley-Tukey FFT contd. can be taken mod N2 Yn1 ,k = Yn1 ,k2 1. N1 DFTs of length N2 2. N multplications with twiddle factors 3. N2 DFTs of length N1 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 8 Step 1: Evaluate N1 DFTs of length N2 Step 2: N multiplications with twiddle factors Step 3: Evaluate N2 DFTs of length N1 Vector xi mapped to matrix xn1,n2 (N1xN2) Compute N1 DFTs of length N2 on each row Point-to-point multiply with twiddle factors Compute N2 DFTs of length N1 on the columns 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 9 2-D view of Cooley-Tukey mapping N=15 (N1=3, N2=5) x0 x1............... x13 x14 x0 x3 x6 x9 x12 x1 x4 x7 x10 x13 x2 x5 x8 x11 x14 x12 x12 x9 x3 x0 x7 x4 x1 x9 x13 x10 x14 x11 x8 x3 x0 DFT-5 W DFT-3 DFT-3 DFT-5 x1 x14 x0 x13 x5 x5 x2 x9 DFT-3 x4 x5 x2 DFT-3 x6 jm x6 x4 DFT-3 x12 x11 DFT-5 x10 Figure by MIT OpenCourseWare. Cannot exchange the order of DFTs Because of twiddle multiply Different mapping for N1=5, N2=3 Not just transpose x0 x1 x2 x3 x4 x5 x10 x6 x11 x7 x12 x8 x13 x9 x14 Figure by MIT OpenCourseWare. 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 10 Radix 2 and radix 4 algorithms Lengths as powers of 2 or 4 are most popular Assume N=2n N1=2, N2=2n-1 (divides input sequence into even and odd samples – decimation in time – DIT) “Butterfly” (sum or difference followed or preceeded by a twiddle factor multiply) Xm and XN/2+m outputs of N/2 2-pt DFTs on outputs of 2, N/2-pt DFTs weighted with twiddle factors 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 11 DIT radix-2 implementations Several different ways Reorder the input data Input samples for inner DFTs in subsequent locations Results in bit-reversed input, in-order output DIT Selectively compute DFTs on evens and odds Perform in-place computation Output in bit-reversed order (X3 in position six (011->110)) X0 DFT DFT X1 N=4 X2 {x2i} N=2 DFT N=2 X3 X4 DFT DFT X5 N=4 w18 X6 {x2i+1} w2 X7 Division DFT of into even and N / 2 odd numbered sequences N=2 8 DFT w38 N=2 Multiplication by twiddle factors X0 X4 X1 X5 X2 X6 X3 X7 DFT of 2 Figure by MIT OpenCourseWare. Which type is this implementation? 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 12 Decimation in frequency (DIF) radix-2 implementation If reverse the role of N1 and N2, get DIF X2k1 = N1=N/2, N2=2 N / 2-1 n1=0 X2k1+1 = n k W N1/ 12(xn +xN / 2+n ), 1 1 N / 2-1 n k W N1/ 12 Wn1(xn -xN / 2+n ). 1 1 N n1=0 X0 DFT DFT X0 X1 N=2 N=4 X2 X2 DFT {x2k} X4 X3 N=2 X4 DFT X5 N=2 X6 DFT X7 N=2 w18 w2 8 X6 DFT X1 N=4 X3 {x2k+1} X5 w3 DFT of Multiplication 2 by twiddle factors 6.973 Communication System Design X7 8 DFT of N/2 Figure by MIT OpenCourseWare. Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 13 Duality DIT<->DIF X0 DFT DFT X1 N=4 X2 {x2i} N=2 DFT N=2 X3 X4 DFT DFT X5 N=4 w18 X6 {x2i+1} w2 X7 8 w38 DFT of Multiplication Division by twiddle into even and N / 2 factors odd numbered sequences N=2 DFT N=2 DFT of 2 X0 X0 X4 X1 X1 X2 X0 DFT DFT N=2 DFT N=2 X5 X3 X2 X4 DFT X6 X5 N=2 X3 X6 DFT X7 X7 N=2 N=4 X2 { } X4 x2k w1 X6 8 DFT w2 8 X1 N=4 X3 { } X5 x2k+1 w3 X7 8 DFT of Multiplication 2 by twiddle factors DFT of N/2 . are. Figure by MIT OpenCourseW Which one is DIT (DIF)? How can we get one from another? 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 14 Complexity of radix-2 FFTs DFT of length N replaced by two length-N/2 At the cost of N complex multiplications (twiddle) Iterate the scheme log2N-1 times And N complex additions (2pt DFTs) Obtain trivial transforms (length 2) of the length-N/2 DFTs Twiddle multiplies (WNi) Complex multiply – 3 real mult + 3 real add If i is multiple of N/4, no arithmetic operation required (why?) 4 butterflies (one general, 3 special cases) 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 15 Radix-4 N=4n, N1=4, N2=N/4 Cost of length-4 DFT 4 DFTs of length N/4 3N/4 twiddle multiplies N/4 DFTs of length 4 No multiplication Only 16 real additions x0 X0 DFT 8 DFT 2 X14 X1 x8 DFT 8 x15 X15 x0 x4 x8 DFT 4 DFT 4 x12 x15 DFT 4 DFT 4 DFT 4 Reduces the number of stages to log4N X0 X12 X1 X13 X2 X14 X3 X15 Figure by MIT OpenCourseWare. Radix-8 can reduce number of operations even more 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 16 Mixed-radix and Split-radix Mixed-radix Diferent radices in different stages Split-radix X0 DFT 8 DFT 2 R2 X14 X1 x8 DFT 8 Different radices in the same stage x0 Simultaneously on different parts of the transform Can achieve lowest number of adds and multiplies for length 2n inputs x15 X15 x0 x4 x8 X0 DFT 4 DFT 4 X12 X1 X13 X2 X14 X3 DFT 4 DFT 4 x12 DFT 4 x15 X15 X0 x0 x4 x8 x12 R4 DFT 2/4 DFT 8 DFT 4 DFT 4 X14 X1 X13 X3 S-R X15 Figure by MIT OpenCourseWare. 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 17 Split-radix (DIF SRFFT) Look at DIF radix-2 X2k1 don’t have twiddles X2k1 = N / 2-1 n1=0 X2k +1 = 1 nk W N1 / 12(xn +xN / 2+n ), 1 1 N / 2-1 n1=0 nk W N1/ 12 Wn1(xn -xN / 2+n ). 1 1 N X0 DFT DFT X0 X1 N=2 N=4 X2 X2 DFT {x2k} X4 X3 N=2 X4 DFT X5 N=2 X6 DFT X7 N=2 w1 8 w2 8 Even samples X2k in DIF should be computed separately from other samples With same algorithm (recursively) as the original sequence No general rule for odd samples DFT X1 N=4 X3 {x2k+1} X5 w83 DFT of Multiplication 2 by twiddle factors X6 Radix-4 is more efficient than radix-2 Higher radices are inefficient X2k = 1 N / 2-1 n1=0 X4k +1 = 1 N / 4-1 n1=0 nk W N1/ 14Wn1 N X[(xn -xN / 2+n ) 1 1 +j(xn1+N / 4-xn1+3N / 4)], X7 DFT of N/2 nk W N1/ 12(xn +xN / 2+n ), 1 1 X4k +3 = 1 N / 4-1 n1=0 W n1k1 3n 1 N / 4WN Figure by MIT OpenCourseWare. X[(xn +xN / 2+n ) 1 1 -j(xn1+N / 4-xn1+3N / 4)]. X0 X0 DFT 8 X4 X8 X12 DFT 2/4 DFT 4 DFT 4 X14 X1 X13 X3 X15 Figure by MIT OpenCourseWare. Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Split-radix (DIT SRFFT) Dual to DIF SRFFT Considers separately subsets {x2i}, {x4i+1} and {x4i+3} Redundancy in Xk, Xk+N/4, Xk+N/2, Xk+3N/4 computation 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 19 FFTs without twiddle factors Divide and conquer requirements N-long DFT computed from DFTs with lengths that are factors of N (allows the inner sum to be a DFT) Provided that subsets Il guarantee periodic xi When N factors into co-prime factors N=N1*N2 Starting from any xi form subset with compatible periodicity (the periodicity of the subset divides the periodicity of the set) {xi + N1n2 | n2 = 1,..., N 2 −1} or { xi + N2 n1 | n1 = 1,..., N1 −1} Both subsets have only one common point xi Allows a rearrangement of the input (periodic) vector into a matrix with a periodicity in both dimensions (rows and columns), both periodicities being comatible with the initial one 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 20 Good’s mapping FFTs without twiddle factors all based on the same mapping Turns original transform into a set of small DFTs with coprime lengths {xi + N1n2 | n2 = 1,..., N 2 −1} or {xi+ N2 n1 | n1 = 1,..., N1 −1} equivalent to 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Good's mapping 0 3 6 9 12 5 8 11 14 2 10 13 1 4 7 . are. Figure by MIT OpenCourseW This mapping is one-to-one if N1 and N2 are coprime All congruences modulo N1 obtained For a given congruence modulo N2 and vice versa 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 21 Just another arrangement of CRT Chinese Remainder Theorem (CRT) If we know the residue of some number k modulo two coprime numbers N1 and N2 k N k N It is possible to reconstruct k N N Let k N = k1 k N = k2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Then k N N = N1t1k2 + N 2t2 k1 N 1 1 1 N2 2 1 t1 N1 2 2 = 1 and t2 N 2 N1 = 1 t1 multiplicative inverse of N1 mod N2 t2 multiplicative inverse of N2 mod N1 t1, t2 always exist since N1, N2 coprime (N1,N2)=1 Good's mapping What are t1, t2 for N1=3, N2=5? CRT mapping Reversing N1 and N2 2 0 3 6 5 8 11 14 2 10 13 1 0 9 12 4 7 6 12 3 9 10 1 7 13 4 5 11 2 8 14 . are. Figure by MIT OpenCourseW Results in transposed mapping 6.973 Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 22 Impact on DFT Formulating the true multi-dimensional transform k N1 N 2 = N1t1k2 + N 2t2 k1 N N-1 Xk = Σ xi WNik, k = 0,..., N - 1, i=0 N1 - 1 N2 - 1 XN1t1k2 + N2r2k1 = Σ (n1N2 + N1n2)(N1t1k2+N2t2k1) N Σ xn1N2 + n2N1 W n1 = 0 n2 = 0 x3 x6 x9 x12 X9 DFT 5 X4 x0 N2 N W = WN1 W N2t2 N1 (N2t2)N 1 N1 =W N1 - 1 N2 - 1 XN1t1k2 + N2t2k1 = Σ = WN1 n k x5 n k Σ xn1N2 + n2N1 WN11 2 WN22 2 , DFT 3 X2 X8 X14 X11 x10 X5 n1 = 0 n 2 = 0 X'k1k2 = XN1t1k2 + N2t2k1 N1 - 1 N2 - 1 X'k1k2 = Σ Figure by MIT OpenCourseWare. x'n1.n2 = xn1N2 + n2N1 n k n k Σ x'n1n2 WN11 1 WN22 2 n1 = 0 n2 = 0 Figure by MIT OCW. True bidimensional transform! (no extra twiddle factors) 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 23 Using convolution to compute DFTs All sub DFTs are prime length Rader showed that prime-length DFTs can be computed as a result of cyclic convolution E.g. length 5 DFT Permute last two rows and columns Cyclic correlation (a convolution with a reversed sequence) This is a general result 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 24 Example Results in smallest number of multiplies 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 25 Prime Factor Algorithm X = F1 x F2T. Good's mapping x9 x3 CRT mapping x12 x6 X9 X4 DFT 5 X14 x0 x5 x10 DFT 3 X2 X5 X8 X11 Figure by MIT OpenCourseWare. Efficient small DFTs are a key to the feasibility of this algorithm 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 26 Winograd’s Fourier Transform Algorithm x10 x0 X5 x5 X11 X2 x3 X8 x6 x9 x12 X9 input additions N=3 X4 X14 input additions point wise output additions output additions multiplication N=5 N=5 N=3 Figure by MIT OpenCourseWare. B1xB2’ only involves additions D – diagonal (so point multiply) Winograd transform has many more additions than twiddle FFTs 6.973 Communication System Design Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 27