THEME ARTICLE

THE FFT: AN ALGORITHM THE WHOLE FAMILY CAN USE

The fast Fourier transform is one of the fundamental algorithm families in digital information processing. The author discusses its past, present, and future, along with its important role in our current digital revolution.

Daniel N. Rockmore, Dartmouth College

A paper by Cooley and Tukey described a recipe for computing Fourier coefficients of a time series that used many fewer machine operations than did the straightforward procedure ... What lies over the horizon in digital signal processing is anyone's guess, but I think it will surprise us all.
   – Bruce P. Bogert, IEEE Trans. Audio Electronics, AU-15, No. 2, 1967, p. 43.

These days, it is almost beyond belief that there was a time before digital technology. It seems almost everyone realizes that the data whizzing over the Internet, bustling through our modems, or crashing into our cell phones is ultimately just a sequence of 0's and 1's—a digital sequence—that magically makes the world the convenient, high-speed place it is today. Much of this magic is due to a family of algorithms that collectively go by the name of the fast Fourier transform. Indeed, the FFT is perhaps the most ubiquitous algorithm used today to analyze and manipulate digital or discrete data. My own research experience with various flavors of the FFT is evidence of its wide range of applicability: electroacoustic music and audio-signal processing, medical imaging, image processing, pattern recognition, computational chemistry, error-correcting codes, spectral methods for partial differential equations, and last but not least, mathematics. Of course, I could list many more applications, notably in radar and communications, but space and time restrict. E. Oran Brigham's book is an excellent place to start, especially pages two and three, which contain a (nonexhaustive) list of 77 applications!1

History

We can trace the FFT's first appearance, like so much of mathematics, back to Gauss.2 His interests were in certain astronomical calculations (a recurrent area of FFT application) that dealt with the interpolation of asteroidal orbits from a finite set of equally spaced observations. Undoubtedly, the prospect of a huge, laborious hand calculation provided good motivation to develop a fast algorithm. Fewer calculations also imply less opportunity for error and therefore lead to numerical stability. Gauss observed that he could break a Fourier series of bandwidth N = N1N2 into a computation of N2 subsampled discrete Fourier transforms of length N1, which are combined as N1 DFTs of length N2. (See the "Discrete Fourier transforms" sidebar for detailed information.) Gauss's algorithm was never published outside of his collected works.

Sidebar: Discrete Fourier transforms

The fast Fourier transform efficiently computes the discrete Fourier transform. Recall that the DFT of a complex input vector of length N, X = (X(0), ..., X(N − 1)), denoted \hat{X}, is another vector of length N given by the collection of sums

    \hat{X}(k) = \sum_{j=0}^{N-1} X(j) W_N^{jk},    (1)

where W_N = \exp(2\pi\sqrt{-1}/N). Equivalently, we can view this as the matrix-vector product F_N \cdot X, where F_N = ((W_N^{jk})) is the so-called Fourier matrix. The DFT is an invertible transform with inverse given by

    X(j) = \frac{1}{N} \sum_{k=0}^{N-1} \hat{X}(k) W_N^{-jk}.    (2)

Thus, if computed directly, the DFT would require N^2 operations. Instead, the FFT is an algorithm for computing the DFT in O(N log N) operations. Note that we can view the inverse as the DFT of the function (1/N)\hat{X}(-k), so that we can also use the FFT to invert the DFT.

One of the DFT's most useful properties is that it converts circular or cyclic convolution into pointwise multiplication, for example,

    \widehat{X * Y}(k) = \hat{X}(k)\,\hat{Y}(k),    (3)

where

    X * Y(j) = \sum_{l=0}^{N-1} X(l)\, Y(j - l)    (4)

(with the index j − l taken mod N). Consequently, the FFT gives an O(N log N) (instead of an N^2) algorithm for computing convolutions: First compute the DFTs of both X and Y, then compute the inverse DFT of the sequence obtained by multiplying \hat{X} and \hat{Y} pointwise.

In retrospect, the idea underlying the Cooley-Tukey FFT is quite simple. If N = N1N2, then we can turn the 1D equation (Equation 1) into a 2D equation with the change of variables

    j = j(a, b) = aN_1 + b,   0 ≤ a < N_2, 0 ≤ b < N_1,
    k = k(c, d) = cN_2 + d,   0 ≤ c < N_1, 0 ≤ d < N_2.    (5)

Using the fact that W_N^{m+n} = W_N^m W_N^n, it follows quickly from Equation 5 that we can rewrite Equation 1 as

    \hat{X}(c, d) = \sum_{b=0}^{N_1-1} W_N^{b(cN_2+d)} \sum_{a=0}^{N_2-1} X(a, b)\, W_{N_2}^{ad}.    (6)

The computation is now performed in two steps. First, compute for each b the inner sums (for all d)

    \tilde{X}(b, d) = \sum_{a=0}^{N_2-1} X(a, b)\, W_{N_2}^{ad},    (7)

which is now interpreted as a subsampled DFT of length N2. Even if computed directly, at most N1N2^2 arithmetic operations are required to compute all of the \tilde{X}(b, d). Finally, we compute the N1N2 outer sums of length N1,

    \hat{X}(c, d) = \sum_{b=0}^{N_1-1} W_N^{b(cN_2+d)}\, \tilde{X}(b, d),    (8)

which requires at most an additional N_1^2 N_2 operations. Thus, instead of (N1N2)^2 operations, this two-step approach uses at most (N1N2)(N1 + N2) operations. If we had more factors in Equation 6, then this approach would work even better, giving Cooley and Tukey's result. The main idea is that we have converted a 1D algorithm, in terms of indexing, into a 2D algorithm. Furthermore, this algorithm has the advantage of an in-place implementation, and when accomplished this way, it concludes with the data reorganized according to the well-known bit-reversal shuffle.

This "decimation in time" approach is one of a variety of FFT techniques. Also notable is the dual approach of "decimation in frequency" developed simultaneously by Gordon Sande, whose paper with W. Morven Gentleman also contains an interesting discussion of memory considerations as they relate to implementation issues.1 Charles Van Loan's book discusses some of the other variations and contains an extensive bibliography.2 Many of these algorithms rely on the ability to factor N. When N is prime, we can use a different idea in which the DFT is effectively reduced to a cyclic convolution instead.3

Sidebar references

1. W.M. Gentleman and G. Sande, "Fast Fourier Transforms—For Fun and Profit," Proc. Fall Joint Computer Conf. AFIPS, Vol. 29, Spartan, Washington, D.C., 1966, pp. 563–578.
2. C. Van Loan, Computational Frameworks for the Fast Fourier Transform, SIAM, Philadelphia, 1992.
3. C.M. Rader, "Discrete Fourier Transforms When the Number of Data Points is Prime," Proc. IEEE, Vol. 56, 1968, pp. 1107–1108.
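To make the sidebar's two-step decomposition concrete, here is a minimal Python sketch (my own illustration, not code from the article; the names dft_direct and dft_two_step are invented). It evaluates Equation 1 literally, carries out the splitting of Equations 5 through 8 for N = N1N2, checks both against a library FFT, and spot-checks the convolution property of Equations 3 and 4.

```python
import numpy as np

def dft_direct(x):
    """Equation 1: X^(k) = sum_j X(j) * W_N^(j*k), with W_N = exp(2*pi*i/N).

    Computed literally, this takes O(N^2) operations.
    """
    N = len(x)
    W = np.exp(2j * np.pi / N)
    return np.array([sum(x[j] * W ** (j * k) for j in range(N)) for k in range(N)])

def dft_two_step(x, N1, N2):
    """One Cooley-Tukey splitting step (Equations 5-8) for N = N1 * N2.

    Indexing (Equation 5): j = a*N1 + b and k = c*N2 + d, with
    0 <= a < N2, 0 <= b < N1, 0 <= c < N1, 0 <= d < N2.
    """
    N = N1 * N2
    W_N, W_N2 = np.exp(2j * np.pi / N), np.exp(2j * np.pi / N2)

    # Step 1 (Equation 7): for each b, a subsampled DFT of length N2.
    # Cost: at most N1 * N2^2 operations.
    Xt = np.array([[sum(x[a * N1 + b] * W_N2 ** (a * d) for a in range(N2))
                    for d in range(N2)]
                   for b in range(N1)])

    # Step 2 (Equation 8): N1*N2 outer sums of length N1, with twiddle factors.
    # Cost: at most N1^2 * N2 additional operations.
    Xhat = np.zeros(N, dtype=complex)
    for c in range(N1):
        for d in range(N2):
            Xhat[c * N2 + d] = sum(W_N ** (b * (c * N2 + d)) * Xt[b, d]
                                   for b in range(N1))
    return Xhat

rng = np.random.default_rng(0)
x = rng.standard_normal(12) + 1j * rng.standard_normal(12)

# Both agree with a library FFT (the sidebar's +i sign convention matches
# N * ifft rather than fft).
assert np.allclose(dft_direct(x), 12 * np.fft.ifft(x))
assert np.allclose(dft_two_step(x, N1=3, N2=4), 12 * np.fft.ifft(x))

# Convolution property (Equations 3 and 4): the DFT of a cyclic convolution
# is the pointwise product of the DFTs.
y = rng.standard_normal(12)
conv = np.array([sum(x[l] * y[(j - l) % 12] for l in range(12)) for j in range(12)])
assert np.allclose(dft_direct(conv), dft_direct(x) * dft_direct(y))
```

Applying the same splitting recursively to every factor of N is what yields the full O(N log N) Cooley-Tukey algorithm.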
The statistician Frank Yates published a less general but still important version of the FFT in 1932, which we can use to efficiently compute the Hadamard and Walsh transforms.3 Yates's "interaction algorithm" is a fast technique designed to compute the analysis of variance for a 2^n-factorial design and is described in almost any text on statistical design and analysis of experiments.

Another important predecessor is the work of G.C. Danielson and Cornelius Lanczos, performed in the service of x-ray crystallography, another area for applying FFT technology.4 Their "doubling trick" showed how to reduce a DFT on 2N points to two DFTs on N points using only N extra operations. Today, it's amusing to note their problem sizes and timings: "Adopting these improvements, the approximate times for Fourier analysis are 10 minutes for 8 coefficients, 25 minutes for 16 coefficients, 60 minutes for 32 coefficients, and 140 minutes for 64 coefficients."4 This indicates a running time of about 0.37 N log N minutes for an N-point DFT!

Despite these early discoveries of an FFT, it wasn't until James W. Cooley and John W. Tukey's article that the algorithm gained any notice. The story of their collaboration is an interesting one. Tukey arrived at the basic reduction while in a meeting of President Kennedy's Science Advisory Committee. Among the topics discussed were techniques for offshore detection of nuclear tests in the Soviet Union. Ratification of a proposed United States–Soviet Union nuclear test ban depended on the development of a method to detect the tests without actually visiting Soviet nuclear facilities. One idea was to analyze seismological time-series data obtained from offshore seismometers, the length and number of which would require fast algorithms to compute the DFT. Other possible applications to national security included the long-range acoustic detection of nuclear submarines.

Richard Garwin of IBM was another participant at this meeting, and when Tukey showed him the idea, he immediately saw a wide range of potential applicability and quickly set about getting the algorithm implemented. He was directed to Cooley, and, needing to hide the national security issues, told Cooley that he wanted the code for another problem of interest: the determination of the spin-orientation periodicities in a 3D crystal of He3. Cooley was involved with other projects, and he sat down to program the Cooley-Tukey FFT only after much prodding. In short order, he and Tukey prepared a paper which, for a mathematics or computer science paper, was published almost instantaneously (in six months).5 This publication, as well as Garwin's fervent proselytizing, did a lot to publicize the existence of this (apparently) new fast algorithm.6

The timing of the announcement was such that usage spread quickly. The roughly simultaneous development of analog-to-digital converters capable of producing digitized samples of a time-varying voltage at rates of 300,000 samples per second had already initiated something of a digital revolution. This development also provided scientists with heretofore unimagined quantities of digital data to analyze and manipulate (just as is the case today). The "standard" applications of the FFT as an analysis tool for waveforms or for solving PDEs generated tremendous interest in the algorithm a priori. Moreover, the ability to do this analysis quickly let scientists from new areas try the algorithm without having to invest too much time and energy.
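The Danielson-Lanczos doubling trick mentioned above is essentially the N1 = 2 case of the sidebar's splitting, applied recursively to the even- and odd-indexed samples. A minimal recursive sketch (my own illustration, assuming the length is a power of two and using the sidebar's W_N = exp(2π√−1/N) sign convention; the name fft_doubling is invented):

```python
import cmath

def fft_doubling(x):
    """Doubling trick: a DFT on 2N points from two DFTs on N points,
    applied recursively; assumes len(x) is a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_doubling(x[0::2])     # DFT of the even-indexed samples
    odd = fft_doubling(x[1::2])      # DFT of the odd-indexed samples
    W = cmath.exp(2j * cmath.pi / N)
    out = [0j] * N
    for k in range(N // 2):          # the ~N extra operations per doubling
        t = W ** k * odd[k]
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

# Quick check against the direct O(N^2) definition for N = 8.
xs = [complex(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
direct = [sum(xs[j] * cmath.exp(2j * cmath.pi * j * k / 8) for j in range(8))
          for k in range(8)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft_doubling(xs), direct))
```

Applying the trick log2 N times gives the familiar O(N log N) operation count.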
Its effect

It's difficult for me to overstate the FFT's importance. Much of its central place in digital signal and image processing is due to the fact that it made working in the frequency domain as computationally feasible as working in the temporal or spatial domain. By providing a fast algorithm for convolution, the FFT enabled fast large-integer and polynomial multiplication, as well as efficient matrix-vector multiplication for Toeplitz, circulant, and other kinds of structured matrices. More generally, it plays a key role in the most efficient sorts of filtering algorithms. Modifications of the FFT are one approach to fast algorithms for discrete cosine or sine transforms, as well as Chebyshev transforms. In particular, the discrete cosine transform is at the heart of MP3 encoding, which gives life to real-time audio streaming. Last but not least, it's also one of the few algorithms to make it into the movies—I can still recall the scene in No Way Out where the image-processing guru declares that he will need to "Fourier transform the image" to help Kevin Costner see the detail in a photograph!

Even beyond these direct technological applications, the FFT influenced the direction of academic research, too. The FFT was one of the first instances of a less-than-straightforward algorithm with a high payoff in efficiency used to compute something important. Furthermore, it raised the natural question, "Could an even faster algorithm be found for the DFT?" (the answer is no7), thereby raising awareness of and heightening interest in the subject of lower bounds and the analysis and development of efficient algorithms in general. With respect to Shmuel Winograd's lower-bound analysis, Cooley writes in the discussion of the 1968 Arden House Workshop on the FFT, "These are the beginnings, I believe, of a branch of computer science which will probably uncover and evaluate other algorithms for high speed computers."8

Ironically, the FFT's prominence might have slowed progress in other research areas. It provided scientists with a big analytic hammer, and, for many, the world suddenly looked as though it were full of nails—even if this wasn't always so. Researchers sometimes massaged problems that might have benefited from other, more appropriate techniques into a DFT framework, simply because the FFT was so efficient. One example that comes to mind is some of the early spectral-methods work to solve PDEs in spherical geometry. In this case, the spherical harmonics are a natural set of basis functions. Discretization for numerical solutions implies the computation of discrete Legendre transforms (as well as FFTs). Many of the early computational approaches tried instead to approximate these expansions completely in terms of Fourier series, rather than address the development of an efficient Legendre transform.

Even now there are still lessons to learn from the FFT's development. In this day and age, when any new technological idea seems fodder for Internet venture capitalists and patent lawyers, it is natural to ask, "Why didn't IBM patent the FFT?" Cooley explained that because Tukey wasn't an IBM employee, IBM worried that it might not be able to gain a patent. Consequently, IBM had a great interest in putting the algorithm in the public domain. The effect was that then nobody else could patent it either. This did not seem like such a great loss because, at the time, the prevailing attitude was that a company made money in hardware, not software.
In fact, the FFT was designed as a tool to analyze huge time series, in theory something only supercomputers tackled. So, by placing in the public domain an algorithm that would make time-series analysis feasible, more big companies might have an interest in buying supercomputers (like IBM mainframes) to do their work. Whether having the FFT in the public domain had the effect IBM hoped for is moot, but it certainly provided many scientists with problems to which they could apply the algorithm. The breadth of scientific interests represented at the Arden workshop (held only three years after the paper's publication) is truly impressive. In fact, the rapid pace of today's technological developments is in many ways a testament to the advantages of such open development. This is a cautionary tale in today's arena of proprietary research, and we can only wonder which of the many recent private technological discoveries might have prospered from a similar announcement.

The future FFT

As torrents of digital data continue to stream into our computers, it seems that the FFT will continue to play a prominent role in our analysis and understanding of this river of data. What follows is a brief discussion of future FFT challenges, as well as a few new directions of related research.

Even bigger FFTs

Astronomy continues to be a chief consumer of large-FFT technology. The needs of projects like MAP (the Microwave Anisotropy Probe) or LIGO (the Laser Interferometer Gravitational-Wave Observatory) require FFTs of several (even tens of) gigapoints. FFTs of this size do not fit in the main memory of most machines, and these so-called out-of-core FFTs are an active area of research.9 As computing technology evolves, undoubtedly, versions of the FFT will evolve to keep pace and take advantage of it. Different kinds of memory hierarchies and architectures present new challenges and opportunities.

Approximate and nonuniform FFTs

For a variety of applications (such as fast MRI), we need to compute DFTs for nonuniformly spaced grid points and frequencies. Multipole-based approaches efficiently compute these quantities in such a way that the running time increases by a factor of log(1/ε), where ε denotes the precision of the approximation.10 Algebraic approaches based on efficient polynomial evaluation are also possible.11

Group FFTs

The FFT might also be explained and interpreted using the language of group representation theory—working along these lines raises some interesting avenues for generalization. One approach is to view a 1D DFT of length N as computing the expansion of a function defined on C_N, the cyclic group of order N (the group of integers mod N), in terms of the basis of irreducible matrix elements of C_N, which are precisely the familiar sampled exponentials

    e_k(m) = \exp(2\pi\sqrt{-1}\,km/N).

The FFT is a highly efficient algorithm for computing the expansion in this basis. More generally, a function on any compact group (cyclic or not) has an expansion in terms of a basis of irreducible matrix elements (which generalize the exponentials from the point of view of group invariance). It's natural to wonder if efficient algorithms exist for performing this change of basis. For example, the problem of efficiently computing spherical harmonic expansions falls into this framework.
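To see the cyclic case of this viewpoint concretely: the sampled exponentials e_k form a basis for functions on C_N, and computing the expansion coefficients in that basis is, up to normalization and sign conventions, exactly a DFT. A small numerical illustration of that claim (my own sketch, not code from the article):

```python
import numpy as np

N = 8
k = np.arange(N)

# Irreducible matrix elements (characters) of the cyclic group C_N:
# e_k(m) = exp(2*pi*sqrt(-1)*k*m/N); row k holds e_k sampled at m = 0..N-1.
E = np.exp(2j * np.pi * np.outer(k, np.arange(N)) / N)

# A "function on C_N" is just a vector of N samples.
f = np.random.default_rng(1).standard_normal(N)

# Expansion coefficients in the basis {e_k}: inner products against the
# basis functions, normalized by N.
coeffs = np.conj(E) @ f / N

# The expansion reconstructs f, and the coefficients are a
# (conjugate-convention) DFT of f.
assert np.allclose(E.T @ coeffs, f)
assert np.allclose(coeffs, np.fft.fft(f) / N)
```

For a noncommutative group, the exponentials are replaced by the matrix entries of irreducible representations, and the question becomes whether this change of basis can be done in substantially fewer than the obvious |G|^2 operations.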
The first FFT for a noncommutative finite group seems to have been developed by Alan Willsky in the context of analyzing certain Markov processes.12 To date, fast algorithms exist for many classes of compact groups.11 Areas of application of this work include signal processing, data analysis, and robotics.13

Quantum FFTs

One of the first great triumphs of the quantum-computing model is Peter Shor's fast algorithm for integer factorization on a quantum computer.14 At the heart of Shor's algorithm is a subroutine that computes (on a quantum computer) the DFT of a binary vector representing an integer. The implementation of this transform as a sequence of one- and two-bit quantum gates, now called the quantum FFT, is effectively the Cooley-Tukey FFT realized as a particular factorization of the Fourier matrix into a product of matrices composed of certain tensor products of two-by-two unitary matrices, each of which is a so-called local unitary transform. Similarly, the quantum solution to the modified Deutsch-Jozsa problem uses the matrix factorization arising from Yates's algorithm.15 Extensions of these ideas to the more general group transforms mentioned earlier are currently being explored.

That's the FFT—both parent and child of the digital revolution, a computational technique at the nexus of the worlds of business and entertainment, national security and public communication. Although it's anyone's guess as to what lies over the next horizon in digital signal processing, the FFT will most likely be in the thick of it.

Acknowledgment

Special thanks to Jim Cooley, Shmuel Winograd, and Mark Taylor for helpful conversations. The Santa Fe Institute provided partial support and a very friendly and stimulating environment in which to write this paper. NSF Presidential Faculty Fellowship DMS-9553134 supported part of this work.

References

1. E.O. Brigham, The Fast Fourier Transform and Its Applications, Prentice Hall Signal Processing Series, Englewood Cliffs, N.J., 1988.
2. M.T. Heideman, D.H. Johnson, and C.S. Burrus, "Gauss and the History of the Fast Fourier Transform," Archive for History of Exact Sciences, Vol. 34, No. 3, 1985, pp. 265–277.
3. F. Yates, "The Design and Analysis of Factorial Experiments," Imperial Bureau of Soil Sciences Tech. Comm., Vol. 35, 1937.
4. G.C. Danielson and C. Lanczos, "Some Improvements in Practical Fourier Analysis and Their Application to X-Ray Scattering from Liquids," J. Franklin Inst., Vol. 233, Nos. 4 and 5, 1942, pp. 365–380 and 432–452.
5. J.W. Cooley and J.W. Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series," Mathematics of Computation, Vol. 19, Apr. 1965, pp. 297–301.
6. J.W. Cooley, "The Re-Discovery of the Fast Fourier Transform Algorithm," Mikrochimica Acta, Vol. 3, 1987, pp. 33–45.
7. S. Winograd, Arithmetic Complexity of Computations, CBMS-NSF Regional Conf. Series in Applied Mathematics, Vol. 33, SIAM, Philadelphia, 1980.
8. "Special Issue on Fast Fourier Transform and Its Application to Digital Filtering and Spectral Analysis," IEEE Trans. Audio Electronics, AU-15, No. 2, 1969.
9. T.H. Cormen and D.M. Nicol, "Performing Out-of-Core FFTs on Parallel Disk Systems," Parallel Computing, Vol. 24, No. 1, 1998, pp. 5–20.
10. A. Dutt and V. Rokhlin, "Fast Fourier Transforms for Nonequispaced Data," SIAM J. Scientific Computing, Vol. 14, No. 6, 1993, pp. 1368–1393; continued in Applied and Computational Harmonic Analysis, Vol. 2, No. 1, 1995, pp. 85–100.
11. D.K. Maslen and D.N. Rockmore, "Generalized FFTs—A Survey of Some Recent Results," Groups and Computation II, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 28, American Mathematical Society, Providence, R.I., 1997, pp. 183–237.
12. A.S. Willsky, "On the Algebraic Structure of Certain Partially Observable Finite-State Markov Processes," Information and Control, Vol. 38, 1978, pp. 179–212.
13. D.N. Rockmore, "Some Applications of Generalized FFTs (An Appendix with D. Healy)," Groups and Computation II, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 28, American Mathematical Society, Providence, R.I., 1997, pp. 329–369.
14. P.W. Shor, "Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer," SIAM J. Computing, Vol. 26, No. 5, 1997, pp. 1484–1509.
15. D. Simon, "On the Power of Quantum Computation," Proc. 35th Ann. Symp. Foundations of Computer Science, IEEE Computer Soc. Press, Los Alamitos, Calif., 1994, pp. 116–123.

Daniel N. Rockmore is an associate professor of mathematics and computer science at Dartmouth College, where he also serves as vice chair of the Department of Mathematics. His general research interests are in the theory and application of computational aspects of group representations, particularly to FFT generalizations. He received his BA and PhD in mathematics from Princeton University and Harvard University, respectively. In 1995, he was one of 15 scientists to receive a five-year NSF Presidential Faculty Fellowship from the White House. He is a member of the American Mathematical Society, the IEEE, and SIAM. Contact him at the Dept. of Mathematics, Bradley Hall, Dartmouth College, Hanover, NH 03755; rockmore@cs.dartmouth.edu; www.cs.dartmouth.edu/~rockmore.