DSP-CIS

Part-IV : Filter Banks & Subband Systems
Chapter-12 : Frequency Domain Filtering

Marc Moonen
Dept. E.E./ESAT-STADIUS, KU Leuven
marc.moonen@esat.kuleuven.be
www.esat.kuleuven.be/stadius/

Part-IV : Filter Banks & Subband Systems

Chapter-10 Filter Bank Preliminaries
Chapter-11 Filter Bank Design
Chapter-12 Frequency Domain Filtering
• Frequency Domain FIR Filter Realization
• Frequency Domain Adaptive Filtering
Chapter-13 Transmultiplexers
Chapter-14 Time-Frequency Analysis & Scaling

DSP-CIS 2015 / Part IV / Chapter-12: Frequency Domain Filtering p. 2

FIR Filter Realization

FIR filter realization = construct (realize) an LTI system (with delay elements, adders and multipliers), such that the I/O behavior is given by

$$y[k] = b_0 u[k] + b_1 u[k-1] + \dots + b_L u[k-L]$$

Several possibilities exist…
1. Direct form
2. Transposed direct form
3. Lattice realization (LPC lattice)
4. Lossless lattice realization
5. Frequency domain realization: see Part IV

p. 3

Frequency Domain FIR Filter Realization

A theorem from linear algebra is needed here:

• A 'circulant' matrix is a matrix where each row is obtained from the previous row using a right-shift (by 1 position); the rightmost element, which spills over, is circulated back to become the leftmost element:

$$\begin{bmatrix} a & d & c & b \\ b & a & d & c \\ c & b & a & d \\ d & c & b & a \end{bmatrix}$$

• The eigenvalue decomposition of a circulant matrix is always given as (4x4 example)

$$\begin{bmatrix} a & d & c & b \\ b & a & d & c \\ c & b & a & d \\ d & c & b & a \end{bmatrix} = F^{-1} \begin{bmatrix} A & 0 & 0 & 0 \\ 0 & B & 0 & 0 \\ 0 & 0 & C & 0 \\ 0 & 0 & 0 & D \end{bmatrix} F, \quad \text{with} \quad \begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix} = F \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}$$

with $F$ the DFT matrix. This means that the eigenvectors are equal to the column vectors of the IDFT matrix, and that the eigenvalues are obtained as the DFT of the first column of the circulant matrix (proof by Matlab).

p. 4
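The 'proof by Matlab' can be mimicked with a small numerical check. The sketch below is Python using only the standard library; the helper names `dft`, `circulant` and `max_eig_residual` are illustrative and not part of the course material. It builds a 4x4 circulant matrix, takes the DFT of its first column as candidate eigenvalues, and checks $C v_k = \lambda_k v_k$ for every column $v_k$ of the IDFT matrix. A naive O(N²) DFT is used only to keep the example dependency-free.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def circulant(first_col):
    """Circulant matrix with the given first column (each row is a right-shift)."""
    n_pts = len(first_col)
    return [[first_col[(i - j) % n_pts] for j in range(n_pts)] for i in range(n_pts)]


def max_eig_residual(first_col):
    """Largest entry of |C v_k - lambda_k v_k| over all k.

    v_k is the k-th column of the IDFT matrix, lambda_k the k-th DFT
    coefficient of the first column of C.
    """
    n_pts = len(first_col)
    c_mat = circulant(first_col)
    lam = dft(first_col)  # candidate eigenvalues
    worst = 0.0
    for k in range(n_pts):
        v = [cmath.exp(2j * cmath.pi * k * n / n_pts) / n_pts for n in range(n_pts)]
        for i in range(n_pts):
            cv_i = sum(c_mat[i][j] * v[j] for j in range(n_pts))
            worst = max(worst, abs(cv_i - lam[k] * v[i]))
    return worst


# Example with (a, b, c, d) = (1, 2, 3, 4); the residual is at rounding-error level.
residual = max_eig_residual([1.0, 2.0, 3.0, 4.0])
print(residual)
```

The first row of `circulant([a, b, c, d])` is `[a, d, c, b]`, matching the 4x4 example above.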
Frequency Domain FIR Filter Realization

FIR filter realization (example filter length = L+1 = 4, similar for other L)

$$y[k] = b_0 u[k] + b_1 u[k-1] + b_2 u[k-2] + b_3 u[k-3]$$

$$B(z) = b_0 + b_1 z^{-1} + b_2 z^{-2} + b_3 z^{-3}$$

Consider 'block processing', where a block of $L_B$ (= 'block length') output samples is computed at once, and consider $L_B = L+1$:

$$\begin{bmatrix} y[k] \\ y[k-1] \\ y[k-2] \\ y[k-3] \end{bmatrix} = \begin{bmatrix} b_0 & b_1 & b_2 & b_3 & 0 & 0 & 0 & 0 \\ 0 & b_0 & b_1 & b_2 & b_3 & 0 & 0 & 0 \\ 0 & 0 & b_0 & b_1 & b_2 & b_3 & 0 & 0 \\ 0 & 0 & 0 & b_0 & b_1 & b_2 & b_3 & 0 \end{bmatrix} \begin{bmatrix} u[k] \\ u[k-1] \\ u[k-2] \\ u[k-3] \\ u[k-4] \\ u[k-5] \\ u[k-6] \\ u[k-7] \end{bmatrix}$$

p. 5

Frequency Domain FIR Filter Realization

Now some matrix manipulation…

The 4x8 matrix of p.5 consists of the last 4 rows of the 8x8 circulant matrix with first column $[0 \; b_3 \; b_2 \; b_1 \; b_0 \; 0 \; 0 \; 0]^T$. With the theorem of p.4 this gives

$$\begin{bmatrix} y[k] \\ y[k-1] \\ y[k-2] \\ y[k-3] \end{bmatrix} = \begin{bmatrix} 0_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1} \, \mathrm{diag}\{B_0, B_1, \dots, B_7\} \, F \begin{bmatrix} u[k] \\ u[k-1] \\ \vdots \\ u[k-7] \end{bmatrix}$$

with $F$ the 8-point DFT matrix and the $B_i$ as given on p.7.

p. 6

Frequency Domain FIR Filter Realization

• This means that a block of $L_B = L+1$ output samples can be computed as follows (read previous formula from right to left):
– Compute the DFT of 2(L+1) input samples, i.e. the last (L+1) samples combined ('overlapped') with the previous (L+1) samples
– Perform component-wise multiplication with $B_0, \dots, B_7$ (= frequency domain representation of the FIR filter), where

$$\begin{bmatrix} B_0 \\ B_1 \\ B_2 \\ B_3 \\ B_4 \\ B_5 \\ B_6 \\ B_7 \end{bmatrix} = F \begin{bmatrix} 0 \\ b_3 \\ b_2 \\ b_1 \\ b_0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

– Compute the IDFT
– Throw away the 1st half of the result, select ('save') the 2nd half
• This is referred to as an 'overlap-save' procedure (and 'frequency domain filter realization' because of the DFT/IDFT)

p. 7

Frequency Domain FIR Filter Realization

• The overlap-save procedure is very efficient for large L:
– Computational complexity (with FFT/IFFT instead of DFT/IDFT) is 2.[α.2(L+1).log(2(L+1))] + 2(L+1) multiplications for (L+1) output samples, i.e. O(log(L)) per sample for large L
– Compare to the computational complexity for a direct form realization: (L+1) multiplications per output sample, i.e. O(L) per sample
• The overlap-save procedure introduces O(L) processing delay/latency (e.g. y[k-L] is only available sometime after time k)
• Conclusion: for large L, the complexity reduction is large, but the latency is also large
• Will derive 'intermediate' realizations, with a smaller latency at the expense of a smaller complexity reduction. This will be based on an Nth order polyphase decomposition of B(z)…

p. 8

Frequency Domain FIR Filter Realization

A compact derivation will rely on a result from filter bank theory (return to Chapter-10…)

[Figure: multirate structure. Input u[k] passes through a delay chain (1, z^-1, z^-2, z^-3) and 4-fold downsamplers into a 4x4 matrix T(z), followed by 4-fold upsamplers, a delay chain (z^-3, z^-2, z^-1, 1) and an adder; the output is labelled T(z)*u[k-3].]

(…and now let B(z) take the place of T(z))

This means that a filter (specified with an Nth order polyphase decomposition)

$$B(z) = p_0(z^4) + z^{-1} p_1(z^4) + z^{-2} p_2(z^4) + z^{-3} p_3(z^4)$$

can be realized in a multirate structure, based on a pseudo-circulant matrix

$$T(z) = \begin{bmatrix} p_0(z) & p_1(z) & p_2(z) & p_3(z) \\ z^{-1} p_3(z) & p_0(z) & p_1(z) & p_2(z) \\ z^{-1} p_2(z) & z^{-1} p_3(z) & p_0(z) & p_1(z) \\ z^{-1} p_1(z) & z^{-1} p_2(z) & z^{-1} p_3(z) & p_0(z) \end{bmatrix}$$

PS: formulas given for N=4, for conciseness (but without loss of generality)

p. 9

Frequency Domain FIR Filter Realization

Now some matrix manipulation… (compare to p.6)

As on p.6, the pseudo-circulant matrix can be written in terms of an 8x8 circulant matrix, now with first column $[0 \; p_3(z) \; p_2(z) \; p_1(z) \; p_0(z) \; 0 \; 0 \; 0]^T$:

$$T(z) = \begin{bmatrix} 0_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1} \, \mathrm{diag}\{P_0(z), \dots, P_7(z)\} \, F \begin{bmatrix} I_{4\times4} \\ z^{-1} I_{4\times4} \end{bmatrix}, \quad \begin{bmatrix} P_0(z) \\ \vdots \\ P_7(z) \end{bmatrix} = F \begin{bmatrix} 0 \\ p_3(z) \\ p_2(z) \\ p_1(z) \\ p_0(z) \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

p. 10

Frequency Domain FIR Filter Realization

• An (8-channel) filter bank representation of this is...

[Figure: input u[k], delay chain (1, z^-1, z^-2, z^-3) and 4-fold downsamplers, followed by E(z), H(z) and R(z), then 4-fold upsamplers, delay chain (z^-3, z^-2, z^-1, 1) and adder; output y[k].]

Analysis bank:

$$E(z) = F \begin{bmatrix} I_{4\times4} \\ z^{-1} I_{4\times4} \end{bmatrix}$$

Subband processing:

$$H(z) = \mathrm{diag}\{P_0(z), P_1(z), \dots, P_7(z)\}$$

Synthesis bank:

$$R(z) = \begin{bmatrix} 0_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1}$$

This is a 2N-channel filter bank, with N-fold downsampling.
The analysis FB is a 2N-channel uniform DFT filter bank (see Chapter 10).
The synthesis FB is matched to the analysis bank, for PR:

$$R(z) \, E(z) = z^{-1} I_{4\times4}$$

p. 11

Frequency Domain FIR Filter Realization

• This is again known as an 'overlap-save' realization:
– Analysis bank: performs a 2N-point DFT (FFT) of a block of (N=4) samples, together with the previous block of (N) samples (hence 'overlap')

$$E(z) = F \begin{bmatrix} I_{4\times4} \\ z^{-1} I_{4\times4} \end{bmatrix} \quad \text{(upper part: 'block', lower part: 'previous block')}$$

– Synthesis bank: performs a 2N-point IDFT (IFFT), throws away the 1st half of the result, saves the 2nd half (hence 'save')

$$R(z) = \begin{bmatrix} 0_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1} \quad \text{(left part: 'throw away', right part: 'save')}$$

– Subband processing corresponds to 'frequency domain' operation
• Complexity/latency? See p.15…

p. 12

Frequency Domain FIR Filter Realization

Derivation on p.10 can also be modified as follows…

The same circulant matrix can be combined with a zero-padded input block and a different selection at the output:

$$T(z) = \begin{bmatrix} z^{-1} I_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1} \, \mathrm{diag}\{P_0(z), \dots, P_7(z)\} \, F \begin{bmatrix} I_{4\times4} \\ 0_{4\times4} \end{bmatrix}$$

p. 13

Frequency Domain FIR Filter Realization

• This is known as an 'overlap-add' realization:
– Analysis bank: performs a 2N-point DFT (FFT) of a block of (N=4) samples, padded with N zero samples

$$E(z) = F \begin{bmatrix} I_{4\times4} \\ 0_{4\times4} \end{bmatrix} \quad \text{(upper part: 'block', lower part: 'zero padding')}$$

– Synthesis bank: performs a 2N-point IDFT (IFFT), adds the 2nd half of the result to the 1st half of the previous IDFT (hence 'add')

$$R(z) = \begin{bmatrix} z^{-1} I_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1} \quad \text{('overlap' and 'add')}$$

– Subband processing corresponds to 'frequency domain' operation

p. 14

Frequency Domain FIR Filter Realization

• Computational complexity is (with FFT/IFFT instead of DFT/IDFT, plus subband processing) 2.[α.2N.log(2N)] + 2(L+1) multiplications for N output samples, i.e. O(log(N)) + O(L/N) per sample.
For large N≈L this is O(log(L)), i.e. dominated by FFT/IFFT (cheap!)
For N<<L this is O(L), i.e. dominated by subband processing
• Processing delay/latency is O(N)
• Standard 'overlap-add' and 'overlap-save' (=p.7) realizations are obtained when 0th order polyphase components are used in the above derivation (N=L+1, i.e. each polyphase component has only 1 filter coefficient). For large L, this leads to a large complexity reduction, but also a large latency (=O(L))
• In the more general case, with higher-order polyphase components (N<L+1, i.e. each polyphase component has >1 filter coefficients), a smaller complexity reduction is achieved, but the latency is also smaller (=O(N)).

p. 15

Frequency Domain Adaptive Filtering

• A similar derivation can be made for LMS-based adaptive filtering with block processing ('Block-LMS'). The adaptive filter then consists of a filtering operation plus an adaptation operation, which corresponds to a correlation operation. Both operations can be performed cheaply in the frequency domain.
• Starting point is the LMS update equation

$$w_{k+1} = w_k + \mu \, x_k \, (d[k] - x_k^T w_k)$$

p. 16

Frequency Domain Adaptive Filtering

Consider block processing with so-called 'Block-LMS'
• Remember that LMS is a 'stochastic gradient' algorithm, where instantaneous estimates of the autocorrelation matrix and cross-correlation vector are used to compute a gradient (= steepest descent vector)
• Block-LMS uses averaged estimates, with averaging over a block of $L_B$ (= 'block length') samples, and hence an averaged gradient. The update formula is then

$$w_{n+1} = w_n + \mu \sum_{k=nL_B+1}^{nL_B+L_B} x_k \, (d[k] - x_k^T w_n)$$

where n is the block index
• Compared to LMS, Block-LMS does fewer updates (one per $L_B$ samples), but with (presumably) better gradient estimates. Overall, convergence could be faster or slower (= unpredictable).
• The important thing is that Block-LMS can be realized cheaply…

p. 17

Frequency Domain Adaptive Filtering

Will consider the case where block length $L_B$ = filter length = L+1.
The update formulas are then given as follows.

1) Compute a priori residuals (example $L_B$ = L+1 = 4, similar for other L)

$$\begin{aligned} \begin{bmatrix} e[4n+4] \\ e[4n+3] \\ e[4n+2] \\ e[4n+1] \end{bmatrix} &= \begin{bmatrix} d[4n+4] \\ d[4n+3] \\ d[4n+2] \\ d[4n+1] \end{bmatrix} - \begin{bmatrix} w[0] & w[1] & w[2] & w[3] & 0 & 0 & 0 & 0 \\ 0 & w[0] & w[1] & w[2] & w[3] & 0 & 0 & 0 \\ 0 & 0 & w[0] & w[1] & w[2] & w[3] & 0 & 0 \\ 0 & 0 & 0 & w[0] & w[1] & w[2] & w[3] & 0 \end{bmatrix} \begin{bmatrix} x[4n+4] \\ x[4n+3] \\ x[4n+2] \\ x[4n+1] \\ x[4n] \\ x[4n-1] \\ x[4n-2] \\ x[4n-3] \end{bmatrix} \\ &\overset{\text{as on p.6}}{=} \begin{bmatrix} d[4n+4] \\ d[4n+3] \\ d[4n+2] \\ d[4n+1] \end{bmatrix} - \begin{bmatrix} 0_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1} \, \mathrm{diag}\{W_0, W_1, \dots, W_7\} \, F \begin{bmatrix} x[4n+4] \\ x[4n+3] \\ \vdots \\ x[4n-3] \end{bmatrix} \end{aligned}$$

with $W_i = \dots$

= frequency domain filtering

p. 18

Frequency Domain Adaptive Filtering

Will consider the case where block length $L_B$ = filter length = L+1.
The update formulas are then given as follows.

2) Filter update (example $L_B$ = L+1 = 4, similar for other L)

$$\begin{aligned} \begin{bmatrix} w[0] \\ w[1] \\ w[2] \\ w[3] \end{bmatrix}_{n+1} &= \begin{bmatrix} w[0] \\ w[1] \\ w[2] \\ w[3] \end{bmatrix}_{n} + \mu \begin{bmatrix} x[4n+4] & x[4n+3] & x[4n+2] & x[4n+1] \\ x[4n+3] & x[4n+2] & x[4n+1] & x[4n] \\ x[4n+2] & x[4n+1] & x[4n] & x[4n-1] \\ x[4n+1] & x[4n] & x[4n-1] & x[4n-2] \end{bmatrix} \begin{bmatrix} e[4n+4] \\ e[4n+3] \\ e[4n+2] \\ e[4n+1] \end{bmatrix} \\ &= \begin{bmatrix} w[0] \\ w[1] \\ w[2] \\ w[3] \end{bmatrix}_{n} + \mu \begin{bmatrix} e[4n+4] & e[4n+3] & e[4n+2] & e[4n+1] & 0 & 0 & 0 & 0 \\ 0 & e[4n+4] & e[4n+3] & e[4n+2] & e[4n+1] & 0 & 0 & 0 \\ 0 & 0 & e[4n+4] & e[4n+3] & e[4n+2] & e[4n+1] & 0 & 0 \\ 0 & 0 & 0 & e[4n+4] & e[4n+3] & e[4n+2] & e[4n+1] & 0 \end{bmatrix} \begin{bmatrix} x[4n+4] \\ x[4n+3] \\ x[4n+2] \\ x[4n+1] \\ x[4n] \\ x[4n-1] \\ x[4n-2] \\ x[4n-3] \end{bmatrix} \\ &\overset{\text{as on p.6}}{=} \begin{bmatrix} w[0] \\ w[1] \\ w[2] \\ w[3] \end{bmatrix}_{n} + \mu \begin{bmatrix} 0_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1} \, \mathrm{diag}\{E_0, E_1, \dots, E_7\} \, F \begin{bmatrix} x[4n+4] \\ x[4n+3] \\ \vdots \\ x[4n-3] \end{bmatrix} \end{aligned}$$

with $E_i = \dots$

= frequency domain correlation

p. 19

Frequency Domain Adaptive Filtering

This is referred to as FDAF ('Frequency Domain Adaptive Filtering')
• FDAF is functionally equivalent to Block-LMS (but cheaper, see below)
• Convergence: instead of using one and the same stepsize for all 'frequency bins', frequency dependent stepsizes can be applied:
– In the update formula, $\mu$ is removed and $E_i$ is replaced by $\mu_i E_i$
– Stepsize $\mu_i$ is dependent on the energy in the i-th frequency bin
– Leads to increased convergence speed at only a small extra cost

p. 20

Frequency Domain Adaptive Filtering

• Complexity ≈ 5 (I)FFTs (size 2(L+1)) per block of L+1 output samples (check!)
Hence for large L, FDAF is very efficient/cheap, only O(log(L)) multiplications per output sample (compared to O(L) for (Block-)LMS).
Example: $L_B$ = L+1 = 1024, then cost(LMS) / cost(FDAF) ≈ 20 !
• Processing delay/latency is again O(L).
Example: $L_B$ = L+1 = 1024 and fs = 8000 Hz, then delay is 256 ms (= 2·1024/8000 s) !
In cases where this is objectionable (e.g. acoustic echo cancellation), 'intermediate' algorithms are needed, with smaller latency and smaller complexity reduction, based on a polyphase decomposition… (read on)

p. 21

Frequency Domain Adaptive Filtering

For large L, a block length of $L_B$ = L+1 may lead to a too large latency.
If an Nth order polyphase decomposition of the adaptive filter is considered (hence with $L_P + 1 = (L+1)/N$ coefficients per polyphase component), then a frequency domain adaptive filtering algorithm with block length $L_B = N$ can be derived as follows (where "N takes the place of L+1").

Example $L_B$ = N = 4, i.e. (as on p.9)

$$W(z) = p_0(z^4) + z^{-1} p_1(z^4) + z^{-2} p_2(z^4) + z^{-3} p_3(z^4)$$

with

$$p_0(z) = p_0^0 + p_0^1 z^{-1} + p_0^2 z^{-2} + \dots + p_0^{L_P} z^{-L_P}$$

$p_1(z)$ = etc.

p. 22

Frequency Domain Adaptive Filtering

(compare to p.18-19) The update formulas are given as follows.

1) Compute a priori residuals (example $L_B$ = N = 4, similar for other N)

$$\begin{aligned} \begin{bmatrix} e[4n+4] \\ e[4n+3] \\ e[4n+2] \\ e[4n+1] \end{bmatrix} &= \begin{bmatrix} d[4n+4] \\ d[4n+3] \\ d[4n+2] \\ d[4n+1] \end{bmatrix} - \sum_{i=0}^{L_P} \left\{ \begin{bmatrix} p_0^i & p_1^i & p_2^i & p_3^i & 0 & 0 & 0 & 0 \\ 0 & p_0^i & p_1^i & p_2^i & p_3^i & 0 & 0 & 0 \\ 0 & 0 & p_0^i & p_1^i & p_2^i & p_3^i & 0 & 0 \\ 0 & 0 & 0 & p_0^i & p_1^i & p_2^i & p_3^i & 0 \end{bmatrix} \begin{bmatrix} x[4(n-i)+4] \\ x[4(n-i)+3] \\ x[4(n-i)+2] \\ x[4(n-i)+1] \\ x[4(n-i)] \\ x[4(n-i)-1] \\ x[4(n-i)-2] \\ x[4(n-i)-3] \end{bmatrix} \right\} \\ &\overset{\text{as on p.6}}{=} \begin{bmatrix} d[4n+4] \\ d[4n+3] \\ d[4n+2] \\ d[4n+1] \end{bmatrix} - \begin{bmatrix} 0_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1} \left\{ \sum_{i=0}^{L_P} \mathrm{diag}\{P_0^i, P_1^i, \dots, P_7^i\} \, F \begin{bmatrix} x[4(n-i)+4] \\ x[4(n-i)+3] \\ \vdots \\ x[4(n-i)-3] \end{bmatrix} \right\} \end{aligned}$$

with $P_j^i = \dots$

p. 23

Frequency Domain Adaptive Filtering

(compare to p.18-19) The update formulas are given as follows.

2) Filter update (example $L_B$ = N = 4, similar for other N)

$$\begin{aligned} \begin{bmatrix} p_0(z^4) \\ p_1(z^4) \\ p_2(z^4) \\ p_3(z^4) \end{bmatrix}_{n+1} &= \begin{bmatrix} p_0(z^4) \\ p_1(z^4) \\ p_2(z^4) \\ p_3(z^4) \end{bmatrix}_{n} + \mu \left\{ \sum_{i=0}^{L_P} z^{-4i} \begin{bmatrix} x[4(n-i)+4] & x[4(n-i)+3] & x[4(n-i)+2] & x[4(n-i)+1] \\ x[4(n-i)+3] & x[4(n-i)+2] & x[4(n-i)+1] & x[4(n-i)] \\ x[4(n-i)+2] & x[4(n-i)+1] & x[4(n-i)] & x[4(n-i)-1] \\ x[4(n-i)+1] & x[4(n-i)] & x[4(n-i)-1] & x[4(n-i)-2] \end{bmatrix} \right\} \begin{bmatrix} e[4n+4] \\ e[4n+3] \\ e[4n+2] \\ e[4n+1] \end{bmatrix} \\ &= \begin{bmatrix} p_0(z^4) \\ p_1(z^4) \\ p_2(z^4) \\ p_3(z^4) \end{bmatrix}_{n} + \mu \begin{bmatrix} e[4n+4] & e[4n+3] & e[4n+2] & e[4n+1] & 0 & 0 & 0 & 0 \\ 0 & e[4n+4] & e[4n+3] & e[4n+2] & e[4n+1] & 0 & 0 & 0 \\ 0 & 0 & e[4n+4] & e[4n+3] & e[4n+2] & e[4n+1] & 0 & 0 \\ 0 & 0 & 0 & e[4n+4] & e[4n+3] & e[4n+2] & e[4n+1] & 0 \end{bmatrix} \left\{ \sum_{i=0}^{L_P} z^{-4i} \begin{bmatrix} x[4(n-i)+4] \\ x[4(n-i)+3] \\ x[4(n-i)+2] \\ x[4(n-i)+1] \\ x[4(n-i)] \\ x[4(n-i)-1] \\ x[4(n-i)-2] \\ x[4(n-i)-3] \end{bmatrix} \right\} \\ &\overset{\text{as on p.6}}{=} \begin{bmatrix} p_0(z^4) \\ p_1(z^4) \\ p_2(z^4) \\ p_3(z^4) \end{bmatrix}_{n} + \mu \begin{bmatrix} 0_{4\times4} & I_{4\times4} \end{bmatrix} F^{-1} \, \mathrm{diag}\{E_0, E_1, \dots, E_7\} \, F \left\{ \sum_{i=0}^{L_P} z^{-4i} \begin{bmatrix} x[4(n-i)+4] \\ x[4(n-i)+3] \\ \vdots \\ x[4(n-i)-3] \end{bmatrix} \right\} \end{aligned}$$

p. 24

Frequency Domain Adaptive Filtering

This is referred to as PB-FDAF ('Partitioned Block Frequency Domain Adaptive Filtering')
• PB-FDAF is functionally equivalent to Block-LMS
• Complexity ≈ 3+2($L_P$+1) (I)FFTs (size 2N) per block of N output samples (check!)
Example: L+1 = 1024, N = 128, then cost(LMS) / cost(PB-FDAF) ≈ 6 !
• Processing delay/latency is O(N).
Example: L+1 = 1024, N = 128 and fs = 8000 Hz, then delay is 32 ms (= 2·128/8000 s) (used in commercial acoustic echo cancellers)

p. 25
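As an illustration of the overlap-save procedure of p.7, here is a minimal Python sketch (standard library only). The function names `overlap_save` and `direct_form` are illustrative. Two assumptions differ from the slides: samples are stored in natural time order (oldest first), so the filter spectrum is the DFT of $[b_0 \dots b_L \; 0 \dots 0]$ rather than of the reversed vector of p.7, and a naive O(N²) DFT stands in for the FFT, so the sketch shows the structure, not the complexity gain.

```python
import cmath


def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unscaled inverse DFT (sign=+1)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spec):
    """Inverse DFT including the 1/N scaling."""
    return [v / len(spec) for v in dft(spec, sign=+1)]


def overlap_save(b, u):
    """Filter u with FIR coefficients b, block length LB = len(b) = L+1."""
    lb = len(b)
    b_freq = dft(list(b) + [0.0] * lb)      # frequency domain filter, size 2(L+1)
    prev = [0.0] * lb                       # previous block (zero initial state)
    y = []
    for start in range(0, len(u), lb):
        block = list(u[start:start + lb])
        block += [0.0] * (lb - len(block))  # zero-pad a final partial block
        spec = dft(prev + block)            # 'overlap': previous + current block
        out = idft([bf * s for bf, s in zip(b_freq, spec)])
        y.extend(v.real for v in out[lb:])  # throw away 1st half, 'save' 2nd half
        prev = block
    return y[:len(u)]


def direct_form(b, u):
    """Reference: direct form realization, zero initial state."""
    return [
        sum(b[j] * u[k - j] for j in range(len(b)) if k - j >= 0)
        for k in range(len(u))
    ]


b = [1.0, 2.0, 3.0, 4.0]                      # L+1 = 4 as in the example
u = [float((3 * k) % 7 - 3) for k in range(10)]
y_os = overlap_save(b, u)
y_df = direct_form(b, u)
print(max(abs(p - q) for p, q in zip(y_os, y_df)))
```

Both realizations give the same output samples up to rounding errors; only the second half of each IDFT result is free of circular wrap-around, which is why the first half is discarded.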
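The two FDAF steps of p.18-19 (frequency domain filtering for the a priori residuals, frequency domain correlation for the update) can be sketched as follows, for block length = filter length and a single stepsize $\mu$. This is a hedged illustration, not the course's reference implementation: the name `fdaf`, the test signal and the stepsize value are assumptions, samples are stored oldest first (so the relevant half of the correlation result is the first one), and a naive DFT replaces the FFT.

```python
import cmath
import random


def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unscaled inverse DFT (sign=+1)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spec):
    """Inverse DFT including the 1/N scaling."""
    return [v / len(spec) for v in dft(spec, sign=+1)]


def fdaf(x, d, n_taps, mu):
    """Block-LMS realized in the frequency domain (block length = n_taps)."""
    lb = n_taps
    w = [0.0] * lb                 # filter coefficients w[0..L]
    prev = [0.0] * lb              # previous input block
    errors = []
    for start in range(0, len(x) - lb + 1, lb):
        block = list(x[start:start + lb])
        x_freq = dft(prev + block)                 # 2(L+1)-point DFT of the input
        # 1) a priori residuals: overlap-save filtering with the current w
        w_freq = dft(w + [0.0] * lb)
        y = [v.real for v in idft([a * s for a, s in zip(w_freq, x_freq)])[lb:]]
        e = [d[start + i] - y[i] for i in range(lb)]
        # 2) filter update: correlation of input and residuals via the DFT
        e_freq = dft([0.0] * lb + e)
        corr = idft([ef * xf.conjugate() for ef, xf in zip(e_freq, x_freq)])[:lb]
        w = [wi + mu * g.real for wi, g in zip(w, corr)]
        errors.extend(e)
        prev = block
    return w, errors


# Hypothetical system identification example: d is x filtered by h (noise-free).
rng = random.Random(0)
h = [0.5, -0.3, 0.2, 0.1]
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(h[j] * x[k - j] for j in range(len(h)) if k - j >= 0) for k in range(len(x))]
w_hat, err = fdaf(x, d, n_taps=4, mu=0.05)
print(w_hat)
```

Using one stepsize per frequency bin, as described on p.20, would amount to scaling the product of `e_freq` and the conjugated `x_freq` bin by bin before the inverse DFT.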