Fast Convolution

\[
\begin{aligned}
y(n) &= a_0 x(n) + a_1 x(n-1) + \dots + a_{k-1} x(n-k+1) && [0] \\
y(n+1) &= a_0 x(n+1) + a_1 x(n) + \dots + a_{k-1} x(n-k+2) && [1] \\
y(n+2) &= a_0 x(n+2) + a_1 x(n+1) + \dots + a_{k-1} x(n-k+3) && [2] \\
&\;\;\vdots \\
y(n+k-1) &= a_0 x(n+k-1) + a_1 x(n+k-2) + \dots + a_{k-1} x(n) && [k-1]
\end{aligned}
\]
Evaluate [0] by itself? That takes $k$ multiplications.
Evaluating [0] through [k-1] directly requires $k^2$ multipliers.
Somewhat surprising: can do “better” with an FFT-based method, $O(k \log_2 k)$.
It comes with lots of hidden costs: wire layout, a large constant, numerical stability.
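As a minimal sketch of the direct method, the following Python evaluates equations [0] through [k-1] exactly as written and counts the multiplications. The function name `direct_outputs` and the convention that `x` is a list indexed from 0 (with `n >= k-1` so no index is negative) are assumptions made for illustration.

```python
def direct_outputs(a, x, n):
    """Evaluate y(n), y(n+1), ..., y(n+k-1) straight from equations [0]..[k-1].

    a holds the k coefficients a_0..a_{k-1}; x is a list of samples indexed
    from 0, with k-1 <= n and n+k-1 < len(x).
    Returns the k outputs and the number of multiplications performed.
    """
    k = len(a)
    multiplies = 0
    ys = []
    for m in range(n, n + k):      # one equation per output sample
        acc = 0
        for i in range(k):         # k products per equation
            acc += a[i] * x[m - i]
            multiplies += 1
        ys.append(acc)
    return ys, multiplies


ys, multiplies = direct_outputs([1, 2, 3], [1, 2, 3, 4, 5, 6], n=2)
```

With k = 3 coefficients the count comes out to 3 * 3 = 9 multiplications, matching the $k^2$ figure.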
FFT is a fast way of computing the DFT
frequency domain view of signals and systems
For understanding fast convolution we can IGNORE the frequency domain interpretation of the
DFT
Instead we will think about the DFT/FFT in terms of polynomials
\[ A(x) = \sum_{j=0}^{n-1} a_j x^j, \qquad B(x) = \sum_{j=0}^{n-1} b_j x^j \]
$a_j$ real or complex; degree bound $n$
degree $k$: the largest $k$ such that $a_k \neq 0$
Can add and multiply polynomials
Add: $A(x) + B(x) = C(x)$, where $c_j = a_j + b_j$
Given $A(x)$, $B(x)$ of degree bound $n$, their product $C(x)$ is the polynomial of degree bound $2n-1$ such that $C(x) = A(x) \cdot B(x)$ for all $x$.
Multiply terms, combine terms with equal powers
\[ A(x) = 6x^3 + 7x^2 - 10x + 9, \qquad B(x) = -2x^3 + 4x - 5 \]
\[
\begin{array}{rrrrrrr}
 & & & 6x^3 & +\,7x^2 & -\,10x & +\,9 \\
 & & & -2x^3 & & +\,4x & -\,5 \\ \hline
 & & & -30x^3 & -\,35x^2 & +\,50x & -\,45 \\
 & & 24x^4 & +\,28x^3 & -\,40x^2 & +\,36x & \\
-12x^6 & -\,14x^5 & +\,20x^4 & -\,18x^3 & & & \\ \hline
-12x^6 & -\,14x^5 & +\,44x^4 & -\,20x^3 & -\,75x^2 & +\,86x & -\,45
\end{array}
\]
Another way,
\[ C(x) = \sum_{j=0}^{2n-2} c_j x^j \quad \text{where} \quad c_j = \sum_{k=0}^{j} a_k b_{j-k} \qquad [1] \]
(taking $a_k = b_k = 0$ for $k \ge n$)
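The coefficient formula can be checked against the worked example. This is a small Python sketch; the name `poly_multiply` and the lowest-degree-first ordering of the coefficient lists are choices made here, not something the slides fix.

```python
def poly_multiply(a, b):
    """Schoolbook product of two coefficient vectors (lowest degree first).

    Every a_i * b_j contributes to c_{i+j}, which is the same sum as
    c_j = sum_k a_k * b_{j-k}; the cost is Theta(n^2) multiplications.
    """
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


A = [9, -10, 7, 6]      # 6x^3 + 7x^2 - 10x + 9
B = [-5, 4, 0, -2]      # -2x^3 + 4x - 5
C = poly_multiply(A, B)
```

Read from the highest degree down, `C` is the product obtained by hand: $-12x^6 - 14x^5 + 44x^4 - 20x^3 - 75x^2 + 86x - 45$.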
Relationship between polynomial multiplication and convolution.
\[
\begin{pmatrix} C(2n-2) \\ C(2n-3) \\ \vdots \\ C(1) \\ C(0) \end{pmatrix}
=
\begin{pmatrix}
a(n-1)b(n-1) \\
a(n-2)b(n-1) + a(n-1)b(n-2) \\
\vdots \\
a(0)b(1) + a(1)b(0) \\
a(0)b(0)
\end{pmatrix}
\]
Viewing $a(0), a(1), \dots, a(n-1)$ and $b(0), b(1), \dots, b(n-1)$ as coefficients of polynomials of degree bound $n$, the coefficient vector $C$ given by equation [1] is the convolution of $a$ and $b$, denoted by $a \otimes b$.
How to perform convolution/polynomial multiplication “faster” ?
Use “point-wise” representation for some very specific points
works both for software, hardware
Coefficient representation of
\[ A(x) = \sum_{j=0}^{n-1} a_j x^j \]
is just the vector $(a(0), a(1), \dots, a(n-1))$; we will view it as a vector and do matrix multiplication.
How to evaluate?
$\Theta(n^2)$ if plugged in.
$\Theta(n)$ using Horner’s rule:
\[ A(x_0) = a_0 + x_0(a_1 + x_0(a_2 + \dots + x_0(a_{n-2} + x_0 a_{n-1}) \dots )) \]
Adding: $\Theta(n)$
Multiplying: $\Theta(n^2)$ using equation [1]
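Horner's rule is short enough to write out. A minimal Python sketch, assuming the coefficients are stored lowest degree first (the function name `horner` is illustrative):

```python
def horner(coeffs, x0):
    """Evaluate A(x0) with one multiplication and one addition per coefficient.

    coeffs = [a_0, a_1, ..., a_{n-1}]; the loop starts from a_{n-1}, the
    innermost term of a_0 + x0*(a_1 + x0*(a_2 + ...)).
    """
    result = 0
    for c in reversed(coeffs):
        result = result * x0 + c
    return result


value = horner([9, -10, 7, 6], 2)   # 6*8 + 7*4 - 10*2 + 9
```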
Alternate representation
“Point-value representation” of $A(x)$ of degree bound $n$:
$\{(x_0,y_0),(x_1,y_1),\dots,(x_{n-1},y_{n-1})\}$ with distinct $x_k$ such that $y_k = A(x_k)$ for $k=0,1,\dots,n-1$
Many different representations [some are better than others]
Easy to derive a point-value representation from the coefficient form:
select $x_0,x_1,\dots,x_{n-1}$ and evaluate using Horner’s rule, $\Theta(n^2)$. Later $\Theta(n \log_2 n)$.
Getting the coefficients from a point-value representation: “interpolation”
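Theorem:
For any set $\{(x_0,y_0),(x_1,y_1),\dots,(x_{n-1},y_{n-1})\}$ of $n$ point-value pairs with distinct $x_k$, there is a unique polynomial $A(x)$ of degree bound $n$ such that $y_k = A(x_k)$ for $k=0,1,\dots,n-1$
Proof Idea:
\[
\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \\
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix}
=
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{pmatrix}
\]
The matrix on the left, denoted by $V(x_0,x_1,\dots,x_{n-1})$, has determinant
\[ \prod_{0 \le j < k \le n-1} (x_k - x_j) \]
which is nonzero when the $x_k$ are distinct; hence the matrix is invertible, implying the existence of a unique solution.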
Incidentally, this gives an algorithm for going from point-value representation to coefficient representation:
$\Theta(n^3)$ to solve $n$ equations in $n$ unknowns.
Faster approach is Lagrange's formula
\[ A(x) = \sum_{k=0}^{n-1} y_k \, \frac{\prod_{j \neq k} (x - x_j)}{\prod_{j \neq k} (x_k - x_j)} \]
can compute the coefficients in $\Theta(n^2)$
n-point evaluation and interpolation are well-defined inverse operations, taking $\Theta(n^2)$ time.
[very bad numerically]
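One way to reach the $\Theta(n^2)$ bound for interpolation is sketched below in Python. It is an illustration under stated assumptions, not the slides' own procedure: it first builds $\prod_j (x - x_j)$ once, then for each $k$ divides out $(x - x_k)$ by synthetic division. Exact `Fraction` arithmetic is used so the numerical issues noted above do not hide the logic; the name `lagrange_coefficients` is hypothetical.

```python
from fractions import Fraction


def lagrange_coefficients(points):
    """Coefficients (lowest degree first) of the unique polynomial of degree
    bound n through n (x, y) pairs with distinct x values."""
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    n = len(points)

    # master(x) = prod_j (x - x_j), built one factor at a time: Theta(n^2).
    master = [Fraction(1)]
    for xj in xs:
        nxt = [Fraction(0)] * (len(master) + 1)
        for i, c in enumerate(master):
            nxt[i + 1] += c          # c * x
            nxt[i] -= c * xj         # c * (-x_j)
        master = nxt

    coeffs = [Fraction(0)] * n
    for k in range(n):
        # q(x) = master(x) / (x - x_k) by synthetic division: Theta(n).
        q = [Fraction(0)] * n
        carry = Fraction(0)
        for i in range(n, 0, -1):
            carry = master[i] + carry * xs[k]
            q[i - 1] = carry
        # q(x_k) equals the denominator prod_{j != k} (x_k - x_j).
        denom = Fraction(0)
        for c in reversed(q):
            denom = denom * xs[k] + c
        scale = ys[k] / denom
        for i in range(n):
            coeffs[i] += scale * q[i]
    return coeffs


# Values of 6x^3 + 7x^2 - 10x + 9 at x = 0, 1, 2, 3.
recovered = lagrange_coefficients([(0, 9), (1, 12), (2, 65), (3, 204)])
```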
Point-value representation is very convenient for many operations.
Addition: $C(x) = A(x) + B(x)$
A: $\{(x_0,y_0),(x_1,y_1),\dots,(x_{n-1},y_{n-1})\}$
B: $\{(x_0,y_0'),(x_1,y_1'),\dots,(x_{n-1},y_{n-1}')\}$
Point-value representation of C is
$\{(x_0, y_0+y_0'),(x_1, y_1+y_1'),\dots,(x_{n-1}, y_{n-1}+y_{n-1}')\}$
$\Theta(n)$ [point-value representations over the same $n$ points]
Multiplication:
$C(x_k) = A(x_k) \cdot B(x_k)$
Problem: the degree bound of C is the sum of the degree bounds of A and B, so use extended point-value representations ($2n$ points each) for A and B
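A small Python sketch of multiplication in point-value form, assuming integer evaluation points 0, 1, ..., 2n-1 (any distinct points would do) and illustrative helper names. It multiplies the two example polynomials pointwise over an extended set of 2n points.

```python
def horner(coeffs, x0):
    """Evaluate a polynomial given lowest-degree-first coefficients."""
    result = 0
    for c in reversed(coeffs):
        result = result * x0 + c
    return result


def to_point_values(coeffs, xs):
    """Point-value representation of the polynomial over the points xs."""
    return [(x, horner(coeffs, x)) for x in xs]


def multiply_point_values(pa, pb):
    """Theta(n) pointwise product; both inputs must use the same points."""
    if [x for x, _ in pa] != [x for x, _ in pb]:
        raise ValueError("representations must share evaluation points")
    return [(x, ya * yb) for (x, ya), (_, yb) in zip(pa, pb)]


A = [9, -10, 7, 6]
B = [-5, 4, 0, -2]
xs = list(range(2 * len(A)))          # 2n points cover degree bound 2n-1
pc = multiply_point_values(to_point_values(A, xs), to_point_values(B, xs))
```

Each pair in `pc` lies on the product polynomial, so `pc` is a valid point-value representation of C.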
How to evaluate a polynomial given in point-value representation at a new point?
Best known approach is to convert to coefficient form first.
Fast multiplication of polynomials in coefficient form.
Can exploit the $\Theta(n)$ algorithm for polynomial multiplication in point-value form.
Hinges on being able to go from coefficient form to point-value form and then from point-value form back to coefficient form.
Already have $\Theta(n^2)$ for these problems; choose the evaluation points carefully and do both in $\Theta(n \log_2 n)$ time.
Use “complex roots of unity” as evaluation points to get the point-value representation (the DFT of the coefficient vector),
and use the inverse DFT to interpolate.
Small detail, degree bounds:
zero-pad the A and B coefficient vectors with n zeros each
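The whole pipeline (zero-pad, evaluate at roots of unity, multiply pointwise, interpolate) can be sketched end to end in Python. This is a minimal illustration under stated assumptions: integer coefficients, so the result can be rounded; a recursive transform written here for self-containment; and function names `fft` and `fft_multiply` chosen for this example.

```python
import cmath


def fft(a, invert=False):
    """Evaluate a at the n-th roots of unity (n a power of two).

    With invert=True the root e^(2*pi*i/n) is replaced by its inverse;
    the caller still has to divide by n to finish the inverse DFT.
    """
    n = len(a)
    if n == 1:
        return list(a)
    sign = -1 if invert else 1
    w_n = cmath.exp(sign * 2j * cmath.pi / n)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    y = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k]
        y[k] = even[k] + t
        y[k + n // 2] = even[k] - t
        w *= w_n
    return y


def fft_multiply(a, b):
    """Product of two non-empty integer coefficient vectors (lowest degree first)."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:             # pad to a power of two >= 2n-1
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    fc = [u * v for u, v in zip(fa, fb)]          # pointwise product
    c = fft(fc, invert=True)
    return [round((v / size).real) for v in c[:out_len]]


C = fft_multiply([9, -10, 7, 6], [-5, 4, 0, -2])
```

The rounding step is only safe while floating-point error stays well below 0.5, which is one face of the numerical-stability cost mentioned earlier.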
Graphical view
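Complex roots of unity
$\omega$ is a complex n-th root of unity if $\omega^n = 1$; there are exactly $n$ distinct n-th roots of unity:
\[ e^{2\pi i k / n} \quad \text{for } k = 0,1,\dots,n-1 \]
Interpretation: $e^{iu} = \cos u + i \sin u$
$\omega_n = e^{2\pi i / n}$ is the principal n-th root of unity.
Others are powers of $\omega_n$
$\{\omega_n^0, \omega_n^1, \dots, \omega_n^{n-1}\}$ is closed under multiplication: $\omega_n^j \omega_n^k = \omega_n^{(j+k) \bmod n}$
Multiplicative inverses exist: $\omega_n^{-1} = \omega_n^{n-1}$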
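These properties are easy to check numerically. A brief Python sketch using the standard `cmath` module; the choice n = 8 and the helper name `roots_of_unity` are arbitrary.

```python
import cmath


def roots_of_unity(n):
    """The n complex n-th roots of unity e^(2*pi*i*k/n), k = 0..n-1."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]


n = 8
w = roots_of_unity(n)

# Each root satisfies w^n = 1 (up to floating-point error).
all_are_roots = all(abs(z ** n - 1) < 1e-9 for z in w)
# Closure: w^3 * w^7 = w^((3 + 7) mod 8) = w^2.
closure_ok = abs(w[3] * w[7] - w[(3 + 7) % n]) < 1e-9
# Inverse of the principal root is w^(n-1).
inverse_ok = abs(w[1] * w[n - 1] - 1) < 1e-9
```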
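More properties:
Cancellation Lemma:
$\omega_{dn}^{dk} = \omega_n^k$
[LHS $= \left(e^{2\pi i/(dn)}\right)^{dk} = \left(e^{2\pi i/n}\right)^{k} = \omega_n^k$]
If n is even, $\omega_n^{n/2} = \omega_2 = -1$
Halving Lemma:
If $n>0$ is even, then the squares of the $n$ complex n-th roots of unity are the $n/2$ complex (n/2)-th roots of unity
$(\omega_n^k)^2 = \omega_n^{2k} = \omega_{n/2}^k$
$(\omega_n^{k+n/2})^2 = \omega_n^{2k+n} = \omega_n^{2k}\,\omega_n^{n} = \omega_n^{2k} = (\omega_n^k)^2$
i.e. $\omega_n^k$ and $\omega_n^{k+n/2}$ have the same square.
Key to divide and conquer
[recursive subproblems are only half as large]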
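Summation Lemma
for any $n \ge 1$ and $k > 0$ such that $k$ is not divisible by $n$
\[ \sum_{j=0}^{n-1} \left(\omega_n^k\right)^j = 0 \]
[Geometric series formula:
\[ \sum_{j=0}^{n-1} \left(\omega_n^k\right)^j = \frac{(\omega_n^k)^n - 1}{\omega_n^k - 1} = \frac{(\omega_n^n)^k - 1}{\omega_n^k - 1} = \frac{1^k - 1}{\omega_n^k - 1} = 0 \]
]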
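The cancellation, halving and summation lemmas can be spot-checked in floating point. This Python sketch uses n = 8 and d = 3 as arbitrary test values and hypothetical helper names `omega` and `close`; it is a sanity check, not a proof.

```python
import cmath


def omega(n, k=1):
    """omega_n^k = e^(2*pi*i*k/n)."""
    return cmath.exp(2j * cmath.pi * k / n)


def close(u, v, tol=1e-9):
    return abs(u - v) < tol


n, d = 8, 3

# Cancellation: omega_{dn}^{dk} = omega_n^k.
cancellation = all(close(omega(d * n, d * k), omega(n, k)) for k in range(n))

# Halving: squares of the n-th roots are the (n/2)-th roots, and
# omega_n^k and omega_n^(k + n/2) share the same square.
halving = all(
    close(omega(n, k) ** 2, omega(n // 2, k))
    and close(omega(n, k + n // 2) ** 2, omega(n, k) ** 2)
    for k in range(n // 2)
)

# Summation: the sum over j of (omega_n^k)^j vanishes when n does not divide k.
summation = all(
    close(sum(omega(n, k) ** j for j in range(n)), 0) for k in range(1, n)
)
# When n divides k every term is 1, so the sum is n instead.
sum_when_divisible = sum(omega(n, n) ** j for j in range(n))
```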
The DFT
\[ A(x) = \sum_{j=0}^{n-1} a_j x^j, \text{ given in coefficient form} \]
Evaluate $A(x)$ at $\omega_n^0, \omega_n^1, \dots, \omega_n^{n-1}$
[i.e. the n complex n-th roots of unity]
- assuming A has been appropriately zero-padded
- also assume n is a power of 2 [more zero padding if needed]
\[ Y_k = A(\omega_n^k) = \sum_{j=0}^{n-1} a_j \left(\omega_n^k\right)^j \]
$Y = [Y_0, Y_1, \dots, Y_{n-1}]$ is defined to be the DFT of $a$
will write as $Y = \mathrm{DFT}_n(a)$
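Taken literally, the definition gives a $\Theta(n^2)$ procedure: evaluate A at each of the n roots with Horner's rule. A minimal Python sketch follows (the name `naive_dft` is illustrative). Note that it uses $\omega_n = e^{+2\pi i/n}$ as in these notes; many signal-processing libraries use the opposite sign in the exponent, so their outputs are the complex conjugates for real input.

```python
import cmath


def naive_dft(a):
    """Y_k = A(omega_n^k) for k = 0..n-1, by Horner's rule at each root."""
    n = len(a)
    ys = []
    for k in range(n):
        x = cmath.exp(2j * cmath.pi * k / n)   # omega_n^k
        acc = 0
        for c in reversed(a):
            acc = acc * x + c
        ys.append(acc)
    return ys


Y = naive_dft([1, 2, 3, 4])
```

For this input the roots are 1, i, -1, -i, so Y is approximately [10, -2-2i, -2, -2+2i].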
The FFT takes advantage of the special properties of the complex roots of unity to compute $\mathrm{DFT}_n(a)$ in $O(n \log_2 n)$
Use divide and conquer
$A^{[0]}(x) = a_0 + a_2 x + a_4 x^2 + \dots + a_{n-2} x^{n/2-1}$
$A^{[1]}(x) = a_1 + a_3 x + a_5 x^2 + \dots + a_{n-1} x^{n/2-1}$
$A^{[0]}$: even-index coefficients of A [index lsb = 0]
$A^{[1]}$: odd-index coefficients of A [index lsb = 1]
$A(x) = A^{[0]}(x^2) + x A^{[1]}(x^2)$ ... [2]
Hence the problem of evaluating $A(x)$ at $\omega_n^0, \omega_n^1, \dots, \omega_n^{n-1}$ reduces to
evaluating $A^{[0]}(x)$ and $A^{[1]}(x)$ at $(\omega_n^0)^2, (\omega_n^1)^2, \dots, (\omega_n^{n-1})^2$ and then
combining according to [2]
Look carefully at the list of n points at which to evaluate
$A^{[0]}$ and $A^{[1]}$: $(\omega_n^0)^2, (\omega_n^1)^2, \dots, (\omega_n^{n-1})^2$
There are only $n/2$ distinct values here [Halving Lemma]
So to compute the n-point DFT, compute two $n/2$-point DFTs and combine the results in $\Theta(n)$ time
$T(n) = 2T(n/2) + \Theta(n) = \Theta(n \log_2 n)$
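Still need to perform interpolation at the complex roots of unity. $\mathrm{DFT}_n(a)$ is given in matrix form by
$Y = V_n a$
\[
\begin{pmatrix} Y_0 \\ Y_1 \\ \vdots \\ Y_{n-1} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \omega_n & \omega_n^2 & \omega_n^3 & \cdots & \omega_n^{n-1} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \omega_n^{3(n-1)} & \cdots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix}
\]
The $(k,j)$th entry of $V_n$ is $\omega_n^{kj}$ for $j,k = 0,1,\dots,n-1$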
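Inverse operation
$a = \mathrm{DFT}_n^{-1}(Y)$
Lemma: the $(j,k)$th entry of $V_n^{-1}$ is $\omega_n^{-kj}/n$
Proof: Look at the $(j,j')$ entry of $V_n^{-1} V_n$
\[
\left[V_n^{-1} V_n\right]_{jj'} = \sum_{k=0}^{n-1} \frac{\omega_n^{-kj}}{n}\,\omega_n^{kj'} = \frac{1}{n}\sum_{k=0}^{n-1} \omega_n^{k(j'-j)} =
\begin{cases} 1 & \text{if } j' = j \\ 0 & \text{otherwise} \end{cases}
\]
[Summation Lemma; a negative exponent $j'-j$ can be shifted by $n$ because $\omega_n^n = 1$]
\[ a_j = \frac{1}{n} \sum_{k=0}^{n-1} Y_k\, \omega_n^{-kj} \]
Same approach as the FFT:
$\omega_n$ replaced by $\omega_n^{-1}$ and the result divided by $n$.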
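The "same approach, inverse root, divide by n" recipe can be sketched in Python with one recursive routine that takes the sign of the exponent as a parameter. The names `transform`, `dft` and `inverse_dft` are illustrative, and the input length is assumed to be a power of two.

```python
import cmath


def transform(a, root_sign):
    """Divide-and-conquer evaluation at powers of e^(root_sign * 2*pi*i/n)."""
    n = len(a)
    if n == 1:
        return list(a)
    w_n = cmath.exp(root_sign * 2j * cmath.pi / n)
    even = transform(a[0::2], root_sign)
    odd = transform(a[1::2], root_sign)
    y = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k]
        y[k] = even[k] + t
        y[k + n // 2] = even[k] - t
        w *= w_n
    return y


def dft(a):
    return transform(a, +1)


def inverse_dft(y):
    """Interpolation: use omega_n^(-1) in place of omega_n, then divide by n."""
    n = len(y)
    return [v / n for v in transform(y, -1)]


a = [1, 2, 3, 4, 0, 0, 0, 0]
round_trip = inverse_dft(dft(a))
```

The round trip returns the original coefficients up to floating-point error.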
Efficient FFT implementation
common subexpression extraction
make iterative and not recursive
Rearrange the elements of a (bit reversal) so that adjacent pairs are the inputs of the 2-element DFTs at the bottom of the recursion
RECURSIVE-FFT(a)
    n <- length[a]
    if n = 1
        then return a
    ω_n <- e^(2πi/n)
    ω <- 1
    a^[0] <- (a_0, a_2, ..., a_{n-2})
    a^[1] <- (a_1, a_3, ..., a_{n-1})
    y^[0] <- RECURSIVE-FFT(a^[0])
    y^[1] <- RECURSIVE-FFT(a^[1])
    for k <- 0 to n/2 - 1
        do y_k <- y_k^[0] + ω y_k^[1]
           y_{k+n/2} <- y_k^[0] - ω y_k^[1]
           ω <- ω ω_n
    return y
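A direct Python transcription of the recursive procedure, offered as a sketch: variable names follow the pseudocode, the input length is assumed to be a power of two, and the product ω·y_k^[1] is computed once per iteration (the common-subexpression point mentioned earlier).

```python
import cmath


def recursive_fft(a):
    """DFT of a (length a power of two) with omega_n = e^(2*pi*i/n)."""
    n = len(a)
    if n == 1:
        return list(a)
    w_n = cmath.exp(2j * cmath.pi / n)
    w = 1
    y0 = recursive_fft(a[0::2])    # even-index coefficients
    y1 = recursive_fft(a[1::2])    # odd-index coefficients
    y = [0] * n
    for k in range(n // 2):
        t = w * y1[k]              # shared by both outputs of the butterfly
        y[k] = y0[k] + t
        y[k + n // 2] = y0[k] - t
        w *= w_n
    return y


Y = recursive_fft([1, 2, 3, 4])
```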
Parallel FFT Circuit
for s <- 1 to log_2 n
    do for k <- 0 to n-1 by 2^s
        do combine the two 2^(s-1)-element DFTs in
           A[k .. k+2^(s-1)-1] and A[k+2^(s-1) .. k+2^s-1]
           into one 2^s-element DFT in A[k .. k+2^s-1]
FFT-BASE(a)
    n <- length[a]
    for s <- 1 to log_2 n
        do m <- 2^s
           ω_m <- e^(2πi/m)
           for k <- 0 to n-1 by m
               do ω <- 1
                  for j <- 0 to m/2 - 1
                      do t <- ω A[k+j+m/2]
                         u <- A[k+j]
                         A[k+j] <- u + t
                         A[k+j+m/2] <- u - t
                         ω <- ω ω_m
ITERATIVE-FFT(a)
    BIT-REVERSE-COPY(a, A)
    n <- length[a]
    for s <- 1 to log_2 n
        do m <- 2^s
           ω_m <- e^(2πi/m)
           ω <- 1
           for j <- 0 to m/2 - 1
               do for k <- j to n-1 by m
                      do t <- ω A[k+m/2]
                         u <- A[k]
                         A[k] <- u + t
                         A[k+m/2] <- u - t
                  ω <- ω ω_m
    return A
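The iterative version can be sketched in Python as follows. The loop order matches ITERATIVE-FFT above (outer loop over j so each twiddle factor ω is computed once per stage); the bit-reversal helper is one simple way to implement BIT-REVERSE-COPY via string reversal and is an assumption of this sketch, as is the power-of-two input length.

```python
import cmath


def bit_reverse_copy(a):
    """Place a[i] at the index whose log2(n)-bit binary form is i reversed."""
    n = len(a)
    bits = n.bit_length() - 1
    A = [0] * n
    for i, v in enumerate(a):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        A[rev] = v
    return A


def iterative_fft(a):
    """In-place butterflies over log2(n) stages; n must be a power of two."""
    n = len(a)
    A = bit_reverse_copy(a)
    m = 2
    while m <= n:                          # stage s has m = 2^s
        w_m = cmath.exp(2j * cmath.pi / m)
        w = 1
        for j in range(m // 2):
            for k in range(j, n, m):       # same twiddle w for every block
                t = w * A[k + m // 2]
                u = A[k]
                A[k] = u + t
                A[k + m // 2] = u - t
            w *= w_m
        m *= 2
    return A


Y = iterative_fft([1, 2, 3, 4])
```

For the input [1, 2, 3, 4] the bit-reversed order is [1, 3, 2, 4], and two stages of butterflies produce approximately [10, -2-2i, -2, -2+2i].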
bit-reverse the inputs
log_2 n stages, n/2 butterflies in parallel per stage
Many other applications
1. Evaluate polynomial of degree bound n at n points in O(n log2n) time
2. Multiply large numbers
Variants: integer-valued coefficients
All arithmetic in Z_p [rather than C]
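The last two points can be combined in one sketch: multiplying large integers with a transform carried out entirely in Z_p. The specific choices below are assumptions of this example, not taken from the slides: the prime p = 998244353 (p - 1 = 119 * 2^23) with primitive root 3, decimal digits as coefficients, and the function names `ntt` and `multiply_integers`.

```python
P = 998244353      # prime; P - 1 = 119 * 2**23, so roots of unity of order 2^s exist
G = 3              # a primitive root modulo P


def ntt(a, invert=False):
    """Same divide and conquer as the FFT, with omega_n an element of Z_P.

    len(a) must be a power of two no larger than 2**23.
    """
    n = len(a)
    if n == 1:
        return list(a)
    w_n = pow(G, (P - 1) // n, P)          # element of order exactly n
    if invert:
        w_n = pow(w_n, P - 2, P)           # modular inverse (Fermat)
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    y = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P
        y[k] = (even[k] + t) % P
        y[k + n // 2] = (even[k] - t) % P
        w = w * w_n % P
    return y


def multiply_integers(x, y):
    """Product of two non-negative integers via convolution of their digits."""
    a = [int(ch) for ch in reversed(str(x))]
    b = [int(ch) for ch in reversed(str(y))]
    size = 1
    while size < len(a) + len(b):
        size *= 2
    fa = ntt(a + [0] * (size - len(a)))
    fb = ntt(b + [0] * (size - len(b)))
    fc = [u * v % P for u, v in zip(fa, fb)]
    inv_size = pow(size, P - 2, P)
    c = [v * inv_size % P for v in ntt(fc, invert=True)]
    # c holds the digit convolution; weighting by powers of 10 absorbs carries.
    return sum(digit * 10 ** i for i, digit in enumerate(c))


product = multiply_integers(123456789, 987654321)
```

Because all arithmetic is exact modulo P, there is no rounding step; the result is correct as long as every convolution coefficient stays below P.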
Practical Issues:
layout
cache unfriendly
numerical instability
Number of calculations
N × N convolution: $N^2$ multiplies
FFT-based: $3N\log_2 N$ complex multiplies and adds
[taking into account redundancy]
i.e. $12N\log_2 N$ real multiplies
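Taking the two counts above at face value, a few lines of Python show where the FFT-based count first drops below the direct count. This only compares the quoted multiply counts for power-of-two N; it ignores the layout, cache and stability costs listed under Practical Issues, and the function names are illustrative.

```python
import math


def direct_real_multiplies(n):
    return n * n


def fft_real_multiplies(n):
    return 12 * n * math.log2(n)


def crossover_power_of_two(limit=2 ** 20):
    """Smallest power-of-two N (up to limit) with 12*N*log2(N) < N^2."""
    n = 2
    while n <= limit:
        if fft_real_multiplies(n) < direct_real_multiplies(n):
            return n
        n *= 2
    return None


crossover = crossover_power_of_two()
```

At N = 64 the counts are 4608 versus 4096, still in favour of the direct method; at N = 128 they are 10752 versus 16384, so under these counts the FFT-based method wins from N = 128 on.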