The Fast Fourier Transform

advertisement
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Parallel Programming
in C with MPI and OpenMP
Michael J. Quinn
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Chapter 15
The Fast Fourier Transform
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Outline
Fourier analysis
 Discrete Fourier transform
 Fast Fourier transform
 Parallel implementation

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Discrete Fourier Transform
Many applications in science, engineering
 Examples
 Voice recognition
 Image processing
 Straightforward implementation: (n2)
 Fast Fourier transform: (n log n)

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Fourier Analysis


Fourier analysis: Represent continuous functions
by potentially infinite series of sine and cosine
functions
Discrete Fourier transform: Map a sequence over
time to another sequence over frequency
 Signal strength as a function of time 
 Fourier coefficients as a function of frequency
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
DFT Example (1/4)
16 data points representing signal strength over time
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
DFT Example (2/4)
DFT yields amplitudes and frequencies of sine/cosine functions
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
DFT Example (3/4)
Plot of four constituent sine/cosine functions and their sum
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
DFT Example (4/4)
Continuous function and original 16 samples.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
DFT of Speech Sample
“An
gorra
cats are furrier...”
Signal
Frequency
and amplitude
Figure courtesy Ron Cole and Yeshwant Muthusamy of the Oregon Graduate Institute
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Computing DFT

Matrix-vector product Fn x
x is input vector (signal samples)
 fi,j = nij for 0  i, j < n and n is

primitive nth root of unity
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Example 1
Compute DFT of vector (2, 3)
 2, the primitive square root of unity, is -1

 200  201  x0  1 1  2   5 
 10



















1

1


x
1

1
3

1
 2  1  
   
 2
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Example 2
Compute DFT of vector (1, 2, 4, 3)
 The primitive 4th root of unity is i

  40  40
 0
  4  41
 0
2


4
 4
 0  3
4
 4
 40
 42
 44
 46
 40  x0  1 1 1 1  1   10 
  
  

3
 4  x1  1 i  1  i  2    3  i 


6 




0 
 4  x2  1  1 1  1 4

  






9 

 4  x3  1  i  1 i  3    3  i 
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Fast Fourier Transform
An (n log n) algorithm to perform DFT
 Based on divide-and-conquer strategy
 Suppose we want to compute f(x)

f ( x) a0 a1 x a2 x 2  ...  an1 x n1

We define two new functions, f[0] and f[1]
f [ 0]  a0  a2 x  a4 x 2  ...  an 2 x n / 21
f [1]  a1  a3 x  a5 x 2  ...  an 1 x n / 21
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
FFT (continued)
Note: f(x) = f [0](x2) + x f [1](x2)
 Problem of evaluating f (x) at n values of 
reduces to
 Evaluating f [0](x) and f [1](x) at n/2 values
of 
 Performing f [0](x2) + x f [1](x2)
 Leads to recursive algorithm with time
complexity (n log n)

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Iterative Implementation
Preferable
Well-written iterative version performs
fewer index computations than recursive
version
 Iterative version evaluates key common
sub-expression only once
 Easier to derive parallel FFT algorithm
when sequential algorithm in iterative form

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Recursive  Iterative (1/3)
Recursive implementation of FFT
(10, -3-i, 0, -3+i)
fft(1,2,4,3)
(5, -3)
(1)
fft(1,4)
fft(1)
(4)
fft(4)
(2)
5+1(5)
(5, -1)
fft(2,3)
fft(2)
fft(3)
(3)
1+1(4)
1
3+i)
3)
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Recursive  Iterative (2/3)
Determining which computations are performed
for each function invocation
5+1(5)
-3+i(-1)
5-1(5)
-3-i(-1)
1+1(4)
1-1(4)
2+1(3)
2-1(3)
1
4
2
3
fft(1,2,4,3)
(5, -3)
(1)
fft(1,4)
fft(1)
5+1(5)
(5, -1)
-3+i(-1)
5-1(5)
-3-i(-1)
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
(4)
(2)
(3)
fft(2,3)
1+1(4)
1-1(4)
2+1(3)
2-1(3)
Recursive  Iterative (3/3)
fft(4)
fft(2)
fft(3)
1
4
2
3
(a)
(b)
Tracking
the flow of data values (input
vector at bottom)
10
-3 - i
0
-3 + i
5 + 1(5)
-3 + i(-1)
5 - 1(5)
-3 - i(-1)
5
1 + 1(4)
1
-3
5
1 - 1(4)
-1
2 + 1(3)
4
2
1
4
1
2
(c)
2 - 1(3)
3
2
3
4
3
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Parallel Program Design
Domain decomposition
 Associate primitive task with each
element of input vector a and
corresponding element of output vector y
 Add channels to handle communications
between tasks

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
FFT Task/Channel Graph
a[0]
y[0]
a[1]
y[1]
a[2]
y[2]
a[3]
y[3]
a[4]
y[4]
a[5]
y[5]
a[6]
y[6]
a[7]
y[7]
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Agglomeration and Mapping
Agglomerate primitive tasks associated with
contiguous elements of vector
 Map one agglomerated task to each process

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
After Agglomeration, Mapping
a[0]
a[1] a[2]
a[3]
a[4]
y[0]
y[1] y[2]
y[3] y[4]
a[5]
a[6]
a[7]
a[8]
a[9] a[10] a[11] a[12] a[13] a[14] a[15]
y[5] y[6] y[7] y[8]
y[9] y[10] y[11] y[12] y[13] y[14] y[15]
Input
Output
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Phases of Parallel FFT Algorithm



Phase 1: Processes permute a’s (all-to-all
communication)
Phase 2:
 First log n – log p iterations of FFT
 No message passing is required
Phase 3:
 Final log p iterations
 Processes organized as logical hypercube
 In each iteration every process swaps values
with partner across a hypercube dimension
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Complexity Analysis
Each process performs equal share of
computation: (n log n / p)
 All-to-all communication: (n log p / p)
 Sub-vector swaps during last log p
iterations: (n log p / p)

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Isoefficiency Analysis



Sequential time complexity: (n log n)
Parallel overhead: (n log p)
Isoefficiency relation:
n log n  C n log p  log n  C log p  n  pC
M(p )/ p  p / p  p
C

C
C 1
Scalability depends C, a function of the ratio
between computing speed and communication
speed.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Summary




Discrete Fourier transform used in many scientific
and engineering applications
Fast Fourier transform important because it
implements DFT in time (n log n)
Developed parallel implementation of FFT
Why isn’t scalability better?
 (n log n) sequential algorithm
 Parallel version requires all-to-all data
exchange
Download