Document

advertisement
289
IX. Basic Implementation Techniques
and Fast Algorithm
 9-A 快速演算法設計的原則
 Fast Algorithm Design
Goals: Saving Computational Time
Number of Additions
Number of Multiplications
Number of Time Cycles
Saving the Hardware Cost for Implementation
 9-B 對於簡單矩陣快速演算法的設計
如何簡化下面四個運算
(1) y[1] = ax[1] + 2ax[2]
 y (1)   a a   x (1) 

(2) 



 y (2)   a a   x(2) 
(3)  y (1)    a b   x (1) 
 y (2)   b a   x(2) 

 


(4)  y (1)   a b   x (1) 
 y (2)    c a   x(2) 

 


290
291
(4)
b  a   x (1) 
 y (1)   a b   x(1)   a a   x(1)   0



 y (2)   c a   x(2)   a a   x(2)   c  a
0   x(2) 

 

 

 
 z (1)   a a   x(1) 
 z (2)    a a   x(2) 

 


b  a   x(1) 
 z (3)   0
 z (4)   c  a
  x(2) 
0

 


(i) z(1) = a[x(1) + x(2)], z(2) = z(1)
(ii) z(3) = (b−a)x(2),
z(4) = (c−a)x(1)
(iii) y(1) = z(1) + z(3),
y(2) = z(2) + z(4)
292
問題思考:如何對 complex number multiplication 來做 implementation?
293
 9-C 一般化的簡化理論:
假設一個 MN sub-rectangular matrix S 可分解為 column vector 及
row vector 相乘
 a1  b1 b2
a 
S 2
 
a 
 M
bN 
 y1 
 x1 
y 
x 
 2 S 2 
 
 
 
 
y
 N
 xN 
 z  b1 x1  b2 x2  bN xN

 yn  an z
若 [a1, a2, …., aM]T 有 M0 個相異的 non-trivial values
(am  2k, am  2kah where m  h)
[b1, b2, …., bN] 有 N0 個相異的 non-trivial values
則 S 共需要 M0 + N0 個乘法
294
 z[1] 
 x[1]   a1  b1 b2
 z[2] 
 x[2]   a 

  S
 2



  
 z[ N ]
 x[ N ]  a 



  M
bN   x[1] 
 x[2] 




 x[ N ]


Step 1
za = b1x[1] + b2x[2] + …. + bNx[N]
Step 2
z[1] = a1 za , z[2] = a2 za , ………, z[N] = aM za
295
簡化理論的變型
 a1  b1 b2
a 
S 2
 
a 
 M
bN 
 S1
S1 也是一個 MN matrix
若 S1 有 P1 個值不等於 0, 則 S 的乘法量上限為 M0 + N0 + P1
 a1  b1 b2
a 
S 2
 
a 
 M
bN   c1   d1 d 2
c 
 2 
 
c 
 M
dN 
 S1
以此類推
思考: 對於如下的情形需要多少乘法
 y[1]   a
 y[2]  e


 y[3]   f
 y[4]  d

 
b
f
e
c
c
f
e
b
d   x[1] 
e   x[2]


f   x[3]
a   x[4]
296
297
 9-D Examples
N 1
DFT:
X  m   x  n e
j
2 m n
N
n 0
Without any simplification, the DFT
needs 4N2 real multiplications (x[n]
may be complex)
 3  3 DFT 可以用特殊方法簡化
1
1 
1
1 1/ 2 1/ 2  


1 1/ 2 1/ 2 
0
0 
0


j 0  3 / 2
3/2 
0

3
/
2

3
/
2


298
 5  5 DFT 的例子
1
1

1
1

1
1
a
b
b
a
1
b
a
a
b
1
b
a
a
b
1
a

b 
b 
a 
0 0
0 c

j 0 d
0 d

0 c
0
d
c
c
d
0
d
c
c
d
0
c 

d 
d 
c 
a  cos  2 / 5
b  cos  4 / 5
c  sin  2 / 5
d  sin  4 / 5
299
8-point DCT
 y[0] 0.7010
 y[1]   0.9808

 
 y[2]  0.9239
 y[3]  0.8315


 y[4]  0.7010

 
y
[5]

  0.5556
 y[6] 0.3827

 
y
[7]

  0.1951
0.7010
0.7010
0.7010
0.7010
0.8315
0.3827
0.5556 0.1951 0.1951
0.3827 0.9239 0.9239
0.1951 0.9808 0.5556 0.5556
0.7010 0.7010 0.7010 0.7010
0.9808 0.1951 0.8315 0.8315
0.9239
0.9239
0.3827 0.3827
0.5556
0.8315
0.9808
0.9808
0.7010   x[0]
0.5556 0.8315 0.9808  x[1] 


0.3827 0.3827 0.9239   x[2]
0.9808 0.1951 0.8315  x[3]
0.7010 0.7010 0.7010   x[4]


0.1951 0.9808 0.5556  x[5]
0.9239 0.9239 0.3827   x[6]


0.8315 0.5556 0.1951  x[7]
0.7010
觀察對稱性質之後,令
 z[0]  x[0] x[7]
 z[1]   x[1] x[6] 



 z[2]  x[2] x[5]
 z[3]   x[3] x[4]

 

 z[4]  x[0] x[7]
 z[5]  x[1] x[6] 



 z[6]  x[2] x[5]
 z[7]  x[3] x[4]

 

0.7010
Part 1:
 y[0]  0.7010 0.7010 0.7010 0.7010   z[0]
 y[2]  0.9239 0.3827 0.3827 0.9239   z[1] 




 y[4]  0.7010 0.7010 0.7010 0.7010   z[2]
 y[6] 0.3827 0.9239 0.9239 0.3827   z[3]

 


300
Part 2:  y[1]   0.9808 0.8315 0.5556 0.1951   z[4]
 y[3]  0.8315 0.1951 0.9808 0.5556  z[5]




 y[5] 0.5556 0.9808 0.1951 0.8315   z[6]
 y[7]  0.1951 0.5556 0.8315 0.9808  z[7]

 


301
[Ref] B. G. Lee, “A new algorithm for computing the discrete cosine transform,”
IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, pp. 1243-1245, Dec. 1984.
X. Fast Fourier Transform
 C. S. Burrus and T. W. Parks, “DFT / FFT and convolution algorithms”,
John Wiley and Sons, New York, 1985.
 R. E. Blahut, Fast Algorithm for Digital Signal Processing, Addison
Wesley Publishing Company.
N 1
X  m   x  n e
j
2 m n
N
n 0
N-point Fourier Transform: 運算量為 N2
FFT (with the Cooley Tukey algorithm): 運算量為 Nlog2N
要學到的概念:(1)快速演算法不是只有 Cooley Tukey algorithm
(2) 不是只有 N = 2k 有時候才有快速演算法
302
303
 10-A Other DFT Implementation Algorithms
(1) Cooley-Tukey algorithm (Butterfly form)
(2) Radix-4, 8, 16, …. Algorithms
(3) Prime Factor Algorithm
(4) Goertzel Algorithm
(5) Chirp Z transform (CZT)
(6) Winograd algorithm
304
Reference
 J. W. Cooley and J. W. Tukey, “An algorithm for the machine computation
of complex Fourier series,” Mathematics of Computation, vol. 19, pp. 297301, Apr. 1965. (Cooley-Tukey)
 C. S. Burrus, “Index Mappings for multidimensional formulation of the DFT
and convolution,” IEEE Trans. Acoustics, Speech, and Signal Processing,
vol. 25, pp. 1239-242, June 1977. (Prime factor)
 G. Goertzel, “An algorithm for the evaluation of finite trigonometric series,”
American Math. Monthly, vol. 65, pp. 34-35, Jan. 1958.
(Goertzel)
 C. R. Hewes, R. W. Broderson, and D. D. Buss, “Applications of CCD and
switched capacitor filter technology,” Proc. IEEE, vol. 67, no. 10, pp. 14031415, Oct. 1979.
(CZT)
 S. Winograd, “On computing the discrete Fourier transform,” Mathematics
of Computation, vol. 32, no. 141, pp. 179-199, Jan. 1978.
(Winograd)
 R. E. Blahut, Fast Algorithm for Digital Signal Processing, Reading, Mass.,
Addison-Wesley, 1985.
305
 10-B Cooley Tukey Algorithm
When N = 2k
N 1
X  m   x  n e
j
2 m n
N
n 0

N / 2 1
 x  2n  e
j
2 m (2 n )
N

N / 2 1
n 0

N / 2 1
 x  n e
n 0
1
j
 x  2n  1 e
j
2 m (2 n 1)
N
n 0
2 m n
N /2
e
j
2 m N / 2 1
N
 x n
n 0
j
2
2 m n
N /2
twiddle factors
x1[n] = x[2n], x2[n] = x[2n+1]
Therefore,
one N-point DFT = two (N/2)-point DFT + twiddle factors
306
8-point DFT
X[0]
x[0]
x[2]
X[1]
4-point DFT
x[4]
X[2]
x[6]
X[3]
1
x[1]
x[3]
w
4-point DFT
x[5]
w2
w3
x[7]
we
j
2
8
X[4]
-1
-1
X[5]
-1
X[6]
-1
w4  1
X[7]
w8  1
307
8-point DFT
x[0]
x[4]
X[0]
X[1]
-1
1
x[2]
x[6]
w2
-1
X[2]
-1
X[3]
-1
x[1]
1
x[5]
w
-1
x[3]
1
x[7]
w2
we
j
2
8
X[5]
-1
w2
-1
X[6]
-1
w3
-1
-1
X[4]
-1
-1
X[7]
308
 Number of real multiplications 的估算
2k-point DFT 一共有 k 個 stages
每個 stage 和下一個 stage 之間有 2k-1 個 twiddle factors
所以,一共有 2k-1(k-1) 個 twiddle factors
一般而言,每個 twiddle factor 需要 3 個 real multiplications
 2k-point DFT 需要
3
32k 1 (k  1)  N  log 2 N  1
2
個 real multiplications
Complexity of the N-point DFT: O(Nlog2N)
309
 8-point DFT 只需要 4 個 real multiplications (Why?)
 更精確的分析,使用 Cooley-Tukey algorithm 時,N-point DFT 需要
3
N log 2 N  5 N  8
2
(Why?)
個 real multiplications
310
 10-C Radix-4 Algorithm
限制: N = 4k
or N = 24k (此時 Cooley-Tukey algorithm 和 radix-4 algorithm 並用)
N 1
X  m   x  n e
j
2 m n
N
n 0

N /4 1
 x  4n  e
j
2 m n
N /4
e
j
2 m N /4 1
N
 x  4n  1
n 0
e
j
2 (2 m ) N /4 1
N
 x  4n  2  e
j
2 m n
N /4
n 0
j
n 0
2 m n
N /4
e
j
2 (3 m ) N /4 1
N
 x  4n  3
n 0
twiddle factors
One N-point DFT = four (N/4)-point DFTs + twiddle factors
j
2 m n
N /4
311
Note:
(1) radix-4 algorithm 最後可將 N = 4k-point DFT 拆解成 4-point DFTs 的組合
4-point DFTs 不需要任何的乘法
(2) 使用 radix-4 algorithm 時,N-point DFT 需要
9
43
16
N log 4 N  N 
4
12
3
個 real multiplications
312
 Number of real multiplications for the N-point DFT
N
乘法數 加法數
N
乘法數
N
乘法數
N
乘法數
1
0
0
12
8
26
104
44
160
2
0
4
13
52
27
114
45
170
3
2
12
14
32
28
64
48
92
4
0
16
15
40
30
80
52
208
5
10
34
16
20
32
72
54
228
6
4
36
18
32
33
160
56
156
7
16
72
20
40
35
150
60
160
8
4
52
21
62
36
64
63
256
9
16
72
22
80
39
182
64
204
10
20
88
24
28
40
100
66
284
11
46
156
25
148
42
124
70
300
313
N
乘法數
N
乘法數
N
乘法數
N
乘法數
72
164
128
560
256
1308
1024
7436
80
260
144
436
288
1160
1152
7088
81
480
160
680
312
1324
1260
7640
84
248
168
580
336
1412
2016
12728
88
412
180
680
360
1540
2048
16836
90
340
192
752
420
2080
2304
15868
96
280
204
976
504
2300
2520
16540
104
468
216
1020
512
3180
4032
29488
108
456
224
1016
560
3100
4096
37516
112
396
240
940
672
3496
4368
35828
120
380
252
1024
1008
5356
5040
36860
附錄十: Complexities 一覽表
 N-point DFT: O  N log2 N 
 N-point DCT, DST, DHT:
O  N log2 N 
 Two-dimensional (2-D) Nx  Ny-point DFT: O  ( N x N y )log 2 ( N x N y ) 
 Convolution of an M-point sequence and an N-point sequence:
O  (M  N  1)log2 (M  N  1) 
O N 
O M 
when M/N and N/M are not large,
when N >> M and M is a fixed constant.
when M >> N and N is a fixed constant.
314
315
 2-D Convolution of an (Mx  My )-point matrix and an (Nx  Ny )-point matrix:

O ( M x  N x  1)( M y  N y  1)log 2  ( M x  N x  1)( M y  N y  1) 
when MxMy/NxNy and NxNy/MxMy are not large,
OM xM y Nx N y 
when MxMy >> NxNy or NxNy >> MxMy
O Nx N y 
when MxMy >> NxNy or NxNy >> MxMy ,
and Mx, My are fixed constants.

316
Four important concepts that should be learned from fast algorithm design:
(1)
(2)
(3)
(4)
Download