Fast Algorithms for Discrete Wavelet Transform: Review and Implementation
by Dan Li, F2000

DWT and FWT: Significance
DWT
– Multi-resolution mode to access the information
– Extensively (and intensively) used in information processing
– Advantage over other transforms: e.g. in JPEG 2000, the DWT provides a 20-30% improvement in compression efficiency as opposed to the DCT.
FWT
– DWT: intensive computation and large memory requirement.
– The FWT makes the DWT practicable in real applications.
– Main factors controlling the speed of the DWT:
   – Filter length
   – Floating-point operation vs. integer operation

FWT Algorithm: An Outline
– Mallat Straightforward Filter Bank
– Polyphase Transversal Filters (based on FFT for fast filtering)
– Polyphase Short-length Filters (also known as “fast-running FIR algo”)
– Binomial QMF Filters
– Classical Lattice Filters
– CORDIC Lattice Filters
– Lifting Scheme and Integer WT
[Slide labels grouping these algorithms: “Regular” Structure, “Irregular” Structure]

FWT Algorithm: An Overview (I)
[Block diagrams. Panels: Mallat Filter Bank, Transversal Filters, Short-length Filters. Each panel maps the input x[n] through the filters H(z) and G(z) (polyphase components H0(z), H1(z), G0(z), G1(z)), unit delays z^-1 and downsamplers by 2 to the approximation outputs A1, A2, A3 and the detail outputs D1, D2, D3.]

FWT Algorithm: An Overview (II)
[Block diagrams. Panels: Binomial QMF, Classical Lattice, CORDIC Lattice, Lifting Ladder. Labels recoverable from the slide: binomial factors of the form (1+z^-1)^k (1-z^-1)^m; lattice rotation stages with cos/sin coefficients; CORDIC stages with power-of-two shifts and a scaling constant K'; lifting steps p1(z) ... pM(z) and q1(z) ... qM(z) with intermediate signals s(0) ... s(M) and d(0) ... d(M) and final scalings K1, K2; outputs A1 and D1.]

Comput. Complexity: A Comparison
Arithmetic Complexity (per input point & per decomposition cell)
[Two plots: # of mults and # of adds versus filter length L. Curves: Mallat FB, Tran. Filter (FFT), Tran. Filter (Short-length).]
Computational Structure Complexity (per filter coefficient)
[Plot: # of adders needed versus word length w (bits). Curves: Tran. Filter (FFT), Tran. Filter (Short-length), Binomial QMF filters, Classical Lattice, CORDIC Lattice.]

Comments, Implementations, etc.
– “Efficiency” is meant in the sense of arithmetic complexity and computational structure.
– Straightforward filter bank: classical, and used in many commercial software packages.
– Polyphase structure: more efficient than the direct filter bank. (Worthy of further exploration!)
– FFT-based filtering: efficient for medium or long filters.
– Fast-running FIR filter: good for short filters.
– Binomial QMF: reduces the number of multiplications at the expense of additional additions.
– Lattice: easier to implement, since each stage is relatively simple.
– CORDIC: most suitable for efficient VLSI implementation, since only additions and shifts are involved and the fewest adders are required.
– Lifting scheme: leads to the IWT, which is faster than the floating-point DWT and ideal for lossless coding/compression.

Implementation focused on the following:
– Fast filtering for short and long filters
– Various formats of polyphase structures
– Reformulation of the polyphase transversal filter to reduce interchannel communication
– Integer filter and IWT implementations

Simulations
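Example (illustrative only, not taken from the slides): the lifting-scheme comment says that lifting leads to an integer wavelet transform (IWT) suited to lossless coding. The slides do not say which wavelet or lifting steps were implemented, so the Python sketch below assumes the simplest case, an integer Haar pair (often called the S-transform), with one predict step and one update step. The function names lift_forward, lift_inverse, iwt and iiwt are hypothetical. The multi-level loop re-applies the step to the approximation only, in the manner of the Mallat cascade.

```python
def lift_forward(x):
    """One lifting level (assumed integer Haar / S-transform).

    Returns (approximation, detail) as lists of Python ints.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]          # lazy wavelet: polyphase split
    # Predict step: detail is the odd sample minus its even neighbour.
    d = [o - e for e, o in zip(even, odd)]
    # Update step: add floor(d / 2) so that s is the floored pair average.
    # The arithmetic right shift implements the floor for negative d too.
    s = [e + (di >> 1) for e, di in zip(even, d)]
    return s, d


def lift_inverse(s, d):
    """Undo lift_forward exactly by running the steps in reverse order."""
    even = [si - (di >> 1) for si, di in zip(s, d)]
    odd = [di + e for e, di in zip(even, d)]
    x = [0] * (2 * len(s))
    x[0::2] = even                        # merge the two polyphase branches
    x[1::2] = odd
    return x


def iwt(x, levels):
    """Multi-level transform: re-apply the step to the approximation only."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = lift_forward(approx)
        details.append(d)
    return approx, details


def iiwt(approx, details):
    """Inverse of iwt: rebuild from the coarsest level outwards."""
    for d in reversed(details):
        approx = lift_inverse(approx, d)
    return approx


if __name__ == "__main__":
    signal = [5, 7, 3, 1, 8, 8, 2, 9]
    a, ds = iwt(signal, levels=2)
    print(a, ds)
    print(iiwt(a, ds) == signal)
```

Each step uses only integer additions, subtractions and shifts, and the inverse subtracts exactly what the forward step added, so reconstruction is exact despite the rounding. Longer filters would need more lifting steps and some boundary handling, which this sketch leaves out.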