Discriminative Approach for Wavelet Denoising
Yacov Hel-Or and Doron Shaked
I.D.C.- Herzliya, HPL-Israel

Motivation – Image denoising
- Can we clean Lena?

Some reconstruction problems (Sapiro et al.)
Images of Venus taken by the Russian lander Venera-10 in 1975
- Can we "see" through the missing pixels?

Image Inpainting (Sapiro et al.)

Image De-mosaicing
- Can we reconstruct the color image?

Image De-blurring
- Can we sharpen Barbara?

• Inpainting
• De-blurring
• De-noising
• De-mosaicing
• All the above deal with degraded images.
• Their reconstruction requires solving an inverse problem.

Typical Degradation Sources
• Low illumination
• Optical distortions (geometric, blurring)
• Sensor distortion (quantization, sampling, sensor noise, spectral sensitivity, de-mosaicing)
• Atmospheric attenuation (haze, turbulence, …)

Reconstruction as an Inverse Problem
[Diagram: original image $x$ → distortion $H$ → added noise $n$ → measurements $y$ → reconstruction algorithm → $\hat{x}$]
  $y = Hx + n$
Years of extensive study. Thousands of research papers.

  $y = Hx + n \;\Rightarrow\; \hat{x} = H^{-1}(y - n)$
• Typically:
  – The distortion H is singular or ill-posed.
  – The noise n is unknown; only its statistical properties can be learnt.

Key point: Stat. Prior of Natural Images

The Image Prior
[Figure: the prior $P_x(x)$ plotted over image space, on a scale from 0 to 1.]

Bayesian Reconstruction (MAP)
• From amongst all possible solutions, choose the one that maximizes the a-posteriori probability $P_X(x|y)$.
[Figure: the prior $P_X(x)$, the measurements, the posterior $P(x|y)$ and the most probable solution, drawn in image space.]

So, are we set? Unfortunately not!
• The p.d.f. $P_x$ defines a prior dist. over natural images:
  – Defined over a huge dim. space (1E6 for a 1Kx1K grayscale image).
  – Sparsely sampled.
  – Known to be non Gaussian.
  – Complicated to model.
Example: 3D prior of 2x2 image neighborhoods, from Mumford & Huang, 2000.

Marginalization of Image Prior
• Observation 1: The Wavelet transform tends to decorrelate pixel dependencies of natural images.
  $x_W = W x$,  $P(x_W) \approx \prod_i P_i(x_W^i)$
[Diagram: $x$ → W.T. → $x_W$]

How Many Mapping Functions
• Observation 2: The statistics of natural images are homogeneous.
  $P_i(x_W^i) = P_{\mathrm{band}(i)}(x_W^i)$ (coefficients in the same band share the same statistics)

Wavelet Shrinkage Denoising
Donoho & Johnstone 94 (unitary case)
• Degradation model: $y = x + n$, with $H = I$ and $n \sim \mathcal{N}(0, \sigma^2)$
• The MAP estimator: $\hat{x}_W(y) = \arg\max_{x_W} P(x_W \mid y_W)$

• The MAP estimator gives:
  $\hat{x}_W(y) = \arg\max_{x_W} P(x_W \mid y_W)$
  $\phantom{\hat{x}_W(y)} = \arg\max_{x_W} P(y_W \mid x_W)\, P(x_W)$
  $\phantom{\hat{x}_W(y)} = \arg\min_{x_W} \{ -\log P(y_W \mid x_W) - \log P(x_W) \}$
  $\phantom{\hat{x}_W(y)} = \arg\min_{x_W} \sum_i \{ (x_W^i - y_W^i)^2 - 2\sigma^2 \log P_i(x_W^i) \}$

• The MAP estimator diagonalizes the system:
  $\hat{x}_W^i(y_W^i) = \arg\min_{x_W^i} \{ (x_W^i - y_W^i)^2 - 2\sigma^2 \log P_i(x_W^i) \}$
This leads to a very useful property: scalar mapping functions:
  $\hat{x}_W^i = M_i(y_W^i)$

Wavelet Shrinkage Pipe-line
  $\hat{x} = W^T M_W(W y)$
[Diagram: $y$ → transform $W$ → $y_W^i$ → mapping functions $M_i(y_W^i)$ (non-linear operation) → $x_W^i$ → inverse transform $W^T$ → $\hat{x}$]

How Many Mapping Functions?
• Due to the fact that $P_i(x_W^i) = P_{\mathrm{band}(i)}(x_W^i)$:
• N mapping functions are needed for N subbands.

Subband Decomposition
• Wavelet transform: $y_B = B y$ where $B = [B_1; \ldots; B_N]$ (the bands $B_k$ stacked vertically)
• Shrinkage: $\hat{x} = B^T M_B(B y) = \sum_k B_k^T M_k(B_k y)$

Wavelet Shrinkage Pipe-line
[Diagram: $y$ → wavelet transform (bands $B_1, \ldots, B_N$) → $y_B^i$ → shrinkage functions → $x_B^i$ → inverse transform ($B_1^T, \ldots, B_N^T$) → sum → $\hat{x} = \sum_k B_k^T M_k(B_k y)$]

Designing The Mapping Function
• The shape of the mapping function $M_j$ depends solely on $P_j$ and the noise variance $\sigma^2$.
[Diagram: model of the marginal p.d.f. of band j ($x_W^i$, $i \in B_j$) together with the noise variance $\sigma^2$ → MAP objective → $M_j(y_W)$]
  $\hat{x}_W^i(y_W^i) = \arg\min_{x_W^i} \{ (x_W^i - y_W^i)^2 - 2\sigma^2 \log P_j(x_W^i) \}$
• Commonly the $P_j$ are approximated by a GGD: $P(x_w) \sim e^{-|x_w/s|^p}$ for $p<1$ (from: Simoncelli 99)

[Figure: three panels titled Hard Thresholding, Soft Thresholding and Linear Wiener Filtering. MAP estimators for the GGD model with three different exponents. The noise is additive Gaussian, with variance one third that of the signal. From: Simoncelli 99.]

• Due to its simplicity Wavelet Shrinkage became extremely popular:
  – Thousands of applications.
  – Hundreds of related papers (984 citations of the D&J paper in Google Scholar).
• What about efficiency?
  – Denoising performance of the original Wavelet Shrinkage technique is far from the state-of-the-art results.
• Why?
  – Wavelet coefficients are not really independent.
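The unitary pipe-line $\hat{x} = W^T M_W(W y)$ can be sketched in a few lines. This is a minimal illustration and not the authors' code: the orthonormal DCT stands in for the unitary wavelet transform, one soft-threshold mapping is applied to every coefficient instead of one mapping per band, and the function names and the threshold `t` are assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Soft thresholding: shrink magnitudes by t, zeroing anything below t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def hard_threshold(v, t):
    """Hard thresholding: keep coefficients with |v| > t, zero the rest."""
    return np.where(np.abs(v) > t, v, 0.0)

def dct_matrix(n):
    """Orthonormal DCT-II matrix W (W @ W.T == I), used here as a stand-in unitary transform."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    m = np.arange(n)[None, :]   # sample index (columns)
    W = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    W[0, :] /= np.sqrt(2.0)     # rescale the DC row so that all rows have unit norm
    return W

def shrink_denoise(y, W, t):
    """x_hat = W^T M(W y), with the same scalar mapping M applied to every coefficient."""
    return W.T @ soft_threshold(W @ y, t)
```

With `t = 0` the mapping is the identity and `shrink_denoise` returns `y` unchanged, which is a quick check that the transform really is unitary.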
Recent Developments
• Since the original approach suggested by D&J, significant improvements were achieved:
[Diagram: Original Shrinkage branches into two directions.]
  Shrinkage in Over-complete Transforms
  • Overcomplete transform
  • Scalar MFs
  • Simple
  • Not considered state-of-the-art
  Joint (Local) Coefficient Modeling
  • Multivariate MFs
  • Complicated
  • Superior results

Joint (Local) Coefficient Modeling
[Timeline, 94 to 06: HMM (Crouse et al.); Sparsity (Mallat, Zhang); HMM (Fan-Xia); Joint Bayesian (Simoncelli); Context Modeling (Chang et al.); Joint Bayesian (Pizurica et al.); Context Modeling (Portilla et al.); Co-occurrence (Shan, Aviyente); Bivariate (Sendur, Selesnick); Adaptive Thresh. (Li, Orchard)]

Shrinkage in Over-complete Transforms
[Timeline, 94 to 06: Shrinkage (D.J.); Undecimated wavelet (Coifman, Donoho); Steerable (Simoncelli, Adelson); Ridgelets (Candes); Curvelets (Starck et al.); Ridgelets (Carre, Helbert); Ridgelets (Nezamoddini et al.); Contourlets (Matalon et al.); Contourlets (Do, Vetterli); K-SVD (Aharon, Elad)]

Over-Complete Shrinkage Denoising
• Over-complete transform: $y_B = B y$ where $B = [B_1; \ldots; B_N]$ (the bands $B_k$ stacked vertically)
• Shrinkage: $\hat{x} = (B^T B)^{-1} B^T M_B(B y) = (B^T B)^{-1} \sum_k B_k^T M_k(B_k y)$
• Mapping functions: naively borrowed from the unitary case.

What's wrong with existing MFs?
1. MAP criterion:
  – The solution is biased towards the most probable case.
2. Independence assumption:
  – In the overcomplete case, the wavelet coefficients are inherently dependent.
3. Minimization domain:
  – For the unitary case, MF optimality is expressed in the transform domain. This is incorrect in the overcomplete case.
4. White noise assumption:
  – Image noise is not necessarily white i.i.d.

Why are unitary based MFs being used?
The alternatives require handling:
• Non-marginal statistics.
• Multivariate minimization.
• Multivariate MFs.
• Non-white noise.

Suggested Approach:
• Maintain simplicity
  – Use scalar LUTs.
• Improve efficiency
  – Use over-complete transforms.
  – Design optimal MFs with respect to a given set of images.
  – Express optimality in the spatial domain.
  – Attain optimality with respect to MSE.
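The over-complete shrinkage formula $\hat{x} = (B^T B)^{-1} \sum_k B_k^T M_k(B_k y)$ can be made concrete on a small 1D signal. The sketch below is an assumption-laden illustration: each band $B_k$ is built as an explicit circulant matrix (an undecimated filter with periodic boundaries), the helper names are hypothetical, and the dense linear solve is only practical for tiny signals.

```python
import numpy as np

def circulant_filter(h, n):
    """n x n matrix applying circular convolution with filter h (one undecimated band B_k)."""
    B = np.zeros((n, n))
    for i in range(n):
        for j, hj in enumerate(h):
            B[i, (i - j) % n] += hj
    return B

def overcomplete_shrink(y, bands, mappings):
    """x_hat = (B^T B)^{-1} sum_k B_k^T M_k(B_k y), where B stacks the bands vertically."""
    gram = sum(Bk.T @ Bk for Bk in bands)                       # B^T B
    acc = sum(Bk.T @ Mk(Bk @ y) for Bk, Mk in zip(bands, mappings))
    return np.linalg.solve(gram, acc)
```

For example, with the undecimated Haar pair `circulant_filter([0.5, 0.5], n)` and `circulant_filter([0.5, -0.5], n)` and identity mappings, the routine reproduces `y`, since $(B^T B)^{-1} B^T$ inverts the transform. Replacing the mapping of the high-pass band with a shrinkage function gives a simple denoiser.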
Optimal Mapping Function:
• Traditional approach: Descriptive
[Diagram: modeling the wavelet p.d.f. ($x_W^i$, $i \in B_j$) → MAP objective → $M_j$]
• Suggested approach: Discriminative
[Diagram: examples $\{x^e\}$, $\{y^e\}$ → optimality criteria → $M_j$]

The Optimality Criteria
• Design the MFs with respect to a given set of examples: $\{x_i^e\}$ and $\{y_i^e\}$
  $\{M_k\} = \arg\min \sum_i \| x_i^e - (B^T B)^{-1} \sum_k B_k^T M_k(B_k y_i^e) \|^2$
• Critical problem: how to optimize the non-linear MFs.

The Spline Transform
• Let $x \in \mathbb{R}$ be a real value in a bounded interval $[a,b)$.
• We divide $[a,b)$ into M segments with boundaries $q = [q_0, q_1, \ldots, q_M]$, where $q_0 = a$ and $q_M = b$.
• W.l.o.g. assume $x \in [q_{j-1}, q_j)$.
• Define the residue $r(x) = (x - q_{j-1}) / (q_j - q_{j-1})$. Then
  $x = r(x)\, q_j + (1 - r(x))\, q_{j-1} = [0, \ldots, 0, 1 - r(x), r(x), 0, \ldots]\, q = S_q(x)\, q$

The Spline Transform – Cont.
• We define a vectorial extension: $x = S_q(x)\, q$, where the i-th row of $S_q(x)$ is $[0, \ldots, 1 - r(x_i), r(x_i), 0, \ldots]$.
• We call this the Spline Transform (SLT) of x.

The SLT Properties
• Substitution property: substituting the boundary vector q with a different vector p forms a piecewise linear mapping:
  $x' = S_q(x)\, p$
[Figure: a piecewise linear curve that maps the boundaries $q_0, \ldots, q_4$ on the x axis to the values $p_0, \ldots, p_4$ on the x' axis.]

Back to the MFs Design
• We approximate the non-linear $\{M_k\}$ with piecewise linear functions:
  $M_k(B_k y) = S_{q_k}(B_k y)\, p_k$
  $\{p_k\} = \arg\min \sum_i \| x_i^e - (B^T B)^{-1} \sum_k B_k^T S_{q_k}(B_k y_i^e)\, p_k \|^2$
• Finding $\{p_k\}$ is a standard LS problem with a closed form solution!

Designing the MFs
[Diagram: the noisy example $y^e$ passes through the undecimated wavelet bands $B_k$ (2D convolutions), the mappings $M_k(y; p_k)$, the transposed bands $B_k^T$, a summation and $(B^T B)^{-1}$, and is compared with the clean example $x^e$; the $p_k$ come from the closed form solution.]

Results
[Figure: training images and tested images.]

Simulation setup
• Transform used: Undecimated DCT
• Noise: Additive i.i.d. Gaussian
• Number of bins: 15
• Number of bands: 3x3 .. 10x10

Option 1: Transform domain – independent bands
  $p_k = \arg\min \sum_i \| B_k x_i^e - S_{q_k}(B_k y_i^e)\, p_k \|^2$
[Diagram: each band is fitted separately, comparing $B_k x^e$ with $M_k(B_k y^e; p_k)$ in the transform domain.]

Option 2: Spatial domain – independent bands
  $p_k = \arg\min \sum_i \| B_k^T B_k x_i^e - B_k^T S_{q_k}(B_k y_i^e)\, p_k \|^2$
[Diagram: each band is fitted separately, comparing the two sides after applying $B_k^T$.]

Option 3: Spatial domain – joint bands
  $\{p_k\} = \arg\min \sum_i \| x_i^e - (B^T B)^{-1} \sum_k B_k^T S_{q_k}(B_k y_i^e)\, p_k \|^2$
[Diagram: all bands are fitted jointly, comparing $x^e$ with the full reconstruction.]

[Figures: Option 1, Option 2 and Option 3 MFs for UDCT 8x8, (i,i) bands, i=1..4, σ=20.]

[Figure: PSNR of Method 1, Method 2 and Method 3 on barbara, boat, fingerprint, house, lena and peppers256. Comparing psnr results for 8x8 undecimated DCT, sigma=20.]

[Figures: denoising examples for 8x8 UDCT, σ=10 and σ=20.]

The Role of Quantization Bins
[Figure: PSNR versus number of bins for barbara, boat, fingerprint, house, lena and peppers256; 8x8 UDCT, σ=10.]

The Role of Transform Used
[Figure: PSNR for 3x3, 5x5, 7x7 and 9x9 DCT on barbara, boat, fingerprint, house, lena and peppers256; σ=10.]

The Role of Training Image
[Figure: PSNR on barbara, boat, fingerprint, house, lena and peppers256 for different training images.]

The Role of Noise Variance
[Figure: MFs for UDCT 8x8, (i,i) bands, i=2..6, for σ=5, σ=10, σ=15 and σ=20.]

The Role of Noise Variance
• Observation: the obtained MFs for different noise variances are similar up to scaling:
  $M_\sigma(v) = s\, M_{\sigma_0}(v/s)$ where $s = \sigma / \sigma_0$
[Figure: comparison between M20(v) and 0.5M10(2v) for basis [2:4]X[2:4].]

Comparison with BLS-GSM
[Figure: PSNR versus noise s.t.d. (1 to 25) in six panels: barbara, boat, fingerprint, house, lena, peppers.]
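The scaling observation on the noise-variance slide, $M_\sigma(v) = s\, M_{\sigma_0}(v/s)$ with $s = \sigma/\sigma_0$, is easy to apply to a piecewise linear MF stored as boundaries and values. The sketch below assumes that representation (`q0` for the boundaries, `p0` for the values at those boundaries); the function names are hypothetical and the relation is taken as stated on the slide.

```python
import numpy as np

def scale_mapping(q0, p0, sigma0, sigma):
    """Rescale a piecewise linear MF (boundaries q0, values p0) from sigma0 to sigma.

    Implements M_sigma(v) = s * M_sigma0(v / s) with s = sigma / sigma0,
    which amounts to multiplying both the boundaries and the values by s.
    """
    s = sigma / sigma0
    return s * np.asarray(q0, dtype=float), s * np.asarray(p0, dtype=float)

def apply_mapping(v, q, p):
    """Evaluate the piecewise linear MF at v (inputs outside [q[0], q[-1]] are clamped)."""
    return np.interp(v, q, p)
```

Under this relation, an MF trained once at $\sigma_0$ could be reused at other noise levels without retraining, at the cost of whatever error the approximate scaling introduces.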
Comparison with BLS-GSM
[Figure: PSNR versus noise s.t.d. (1 to 25), proposed method versus GSM method.]

Other Degradation Models

JPEG Artifact Removal
[Figures: JPEG artifact removal examples.]

Image Sharpening
[Figures: image sharpening examples.]

Conclusions
• New and simple scheme for over-complete transform based denoising.
• MFs are optimized in a discriminative manner.
• Linear formulation of non-linear minimization.
• Eliminating the need for modeling a complex statistical prior in a high-dim. space.
• Seamlessly applied to other degradation problems as long as scalar MFs are used for reconstruction.

Conclusions – cont.
• Extensions:
  – Filter-cascade based denoising.
  – Multivariate MFs (activity level).
  – Non-homogeneous noise characteristics.
• Open problems:
  – What is the best transform for a given image?
  – How to choose training images that form a faithful representation?

Thank You

[Figure: MSE for MF scaling from σ=10 to σ=20, plotted over scale x and scale y.]
[Figure: MSE for MF scaling from σ=15 to σ=20, plotted over scale x and scale y.]
[Figure: MSE for MF scaling from σ=25 to σ=20, plotted over scale x and scale y.]
[Figure: scale factor s versus sigma / 20, with $M_\sigma(v) = s\, M_{\sigma_0}(v/s)$ where $s = \sigma / \sigma_0$.]

Image Sharpening
[Figure: image sharpening example.]

Wavelet Shrinkage Pipe-line
[Diagram: $y$ → wavelet transform (bands $B_1, \ldots, B_N$) → $y_B^i$ → shrinkage functions → $x_B^i$ → inverse transform ($B_1^T, \ldots, B_N^T$) → sum → $(B^T B)^{-1}$ → $\hat{x}$]
  $\hat{x} = (B^T B)^{-1} \sum_k B_k^T M_k(B_k y)$

[Figures: Option 1, Option 2 and Option 3 MFs for UDCT 8x8, (i,i) bands, i=1..4, σ=20.]

[Figure: PSNR of Traditional versus Suggested on barbara, boat, fingerprint, house, lena and peppers256. Comparing psnr results for 8x8 undecimated DCT, sigma=20.]
[Figure: PSNR of Traditional versus Suggested on barbara, boat, fingerprint, house, lena and peppers256. Comparing psnr results for 8x8 undecimated DCT, sigma=10.]
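As a closing illustration, the Spline Transform and the closed-form least-squares fit from the MF design slides can be sketched as follows. This is a simplified sketch under stated assumptions, not the authors' implementation: it fits a single mapping directly on scalar pairs (in effect one band with an identity transform, closest in spirit to Option 1), and `slt_matrix` and `fit_mapping` are hypothetical names.

```python
import numpy as np

def slt_matrix(x, q):
    """Spline transform S_q(x): row i holds 1 - r(x_i) and r(x_i) in columns j-1 and j.

    For x inside [q[0], q[-1]] this gives S_q(x) @ q == x; values outside are clipped.
    """
    q = np.asarray(q, dtype=float)
    x = np.clip(np.asarray(x, dtype=float).ravel(), q[0], q[-1])
    M = len(q) - 1
    # Segment index j such that x lies in [q[j-1], q[j]); the right end point uses j = M.
    j = np.clip(np.searchsorted(q, x, side="right"), 1, M)
    r = (x - q[j - 1]) / (q[j] - q[j - 1])      # residue r(x)
    S = np.zeros((x.size, M + 1))
    rows = np.arange(x.size)
    S[rows, j - 1] = 1.0 - r
    S[rows, j] = r
    return S

def fit_mapping(y_noisy, x_clean, q):
    """Least-squares p minimizing ||x_clean - S_q(y_noisy) p||^2."""
    S = slt_matrix(y_noisy, q)
    target = np.asarray(x_clean, dtype=float).ravel()
    p, *_ = np.linalg.lstsq(S, target, rcond=None)
    return p

def apply_slt_mapping(y, q, p):
    """Piecewise linear mapping M(y) = S_q(y) p (the substitution property)."""
    return slt_matrix(y, q) @ p
```

Because $S_q(y)$ depends only on the noisy input and the fixed boundaries q, the objective is linear in p, which is what turns the search for a non-linear mapping function into an ordinary least-squares problem. The joint spatial-domain variant (Option 3) would stack the terms $(B^T B)^{-1} B_k^T S_{q_k}(B_k y^e)$ for all bands into one larger system of the same kind.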