Discriminative Approach for
Wavelet Denoising
Yacov Hel-Or and Doron Shaked
I.D.C.- Herzliya
HPL-Israel
Motivation – Image denoising
- Can we clean Lena?
Some reconstruction problems
Sapiro et al.
Images of Venus taken by the Russian lander Venera-10 in 1975
- Can we “see” through the missing pixels?
Image Inpainting
Sapiro et al.
Image De-mosaicing
- Can we reconstruct the color image?
Image De-blurring
Can we sharpen Barbara?
• Inpainting
• De-blurring
• De-noising
• De-mosaicing
• All the above deal with degraded images.
• Their reconstruction requires solving an
inverse problem
Typical Degradation Sources
Low Illumination
Optical distortions
(geometric, blurring)
Sensor distortion
(quantization, sampling,
sensor noise, spectral sensitivity,
de-mosaicing)
Atmospheric attenuation
(haze, turbulence, …)
Reconstruction as an Inverse Problem
[Diagram: original image x → distortion H → additive noise n → measurements y → reconstruction algorithm → estimate x̂]
$$y = Hx + n$$
Years of extensive study
Thousands of research papers
$$y = Hx + n \quad\Rightarrow\quad \hat{x} = H^{-1}(y - n)$$
• Typically:
– The distortion H is singular or ill-posed.
– The noise n is unknown, only its statistical
properties can be learnt.
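To make the two difficulties above concrete, here is a minimal sketch of what happens when the noise is simply ignored and H is inverted directly. The 1D step signal, the cyclic Gaussian blur, and the noise level are all assumptions chosen for illustration; none of them come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Hypothetical 1D "image": a step signal.
x = np.zeros(n)
x[20:40] = 1.0

# Assumed distortion H: cyclic Gaussian blur (row i is the kernel shifted by i).
dist = np.minimum(np.arange(n), n - np.arange(n))
kernel = np.exp(-0.5 * (dist / 2.0) ** 2)
kernel /= kernel.sum()
H = np.stack([np.roll(kernel, i) for i in range(n)])

# Measurements y = Hx + n with very weak noise.
noise = 1e-3 * rng.standard_normal(n)
y = H @ x + noise

# Naive reconstruction that ignores the noise: x_hat = H^{-1} y.
x_naive = np.linalg.solve(H, y)

cond_H = np.linalg.cond(H)
err_naive = np.abs(x_naive - x).max()
print(f"cond(H) = {cond_H:.2e}, max error of naive inverse = {err_naive:.2e}")
```

Because H is badly conditioned, even this tiny noise is amplified far beyond the signal range, which is why a prior on x is needed.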
Key point: Stat. Prior of Natural Images
The Image Prior
[Figure: the prior $P_X(x)$ plotted over image space, with values between 0 and 1]
Bayesian Reconstruction (MAP)
• From amongst all possible solutions, choose the one
that maximizes the a-posteriori probability PX(x|y)
[Figure: over image space, the prior $P_X(x)$, the measurements, and the posterior $P(x \mid y)$, whose peak marks the most probable solution]
So, are we set?
Unfortunately not!
• The p.d.f. Px defines a prior dist. over natural
images:
– Defined over a huge dim. space (1E6 for
1Kx1K grayscale image)
– Sparsely sampled.
– Known to be non Gaussian.
– Complicated to model.
Example: 3D prior of 2x2 image neighborhoods
from Mumford & Huang, 2000
Marginalization of Image Prior
• Observation 1: The wavelet transform (W.T.) tends to decorrelate pixel dependencies of natural images.
$$x_W = W x, \qquad P(x_W) \approx \prod_i P_i(x_W^i)$$
How Many Mapping Functions
• Observation 2: The statistics of natural images are homogeneous.
$$P_i(x_W^i) = P_{\mathrm{band}(i)}(x_W^i)$$
Coefficients in the same band share the same statistics.
Wavelet Shrinkage Denoising
Donoho & Johnstone 94 (unitary case)
• Degradation Model:
$$y = x + n, \qquad H = I, \quad n \sim \mathcal{N}(0, \sigma^2)$$
• The MAP estimator:
$$\hat{x}_W(y) = \arg\max_{x_W} P(x_W \mid y_W)$$
$$\hat{x}_W(y) = \arg\max_{x_W} P(y_W \mid x_W)\, P(x_W)$$
$$\hat{x}_W(y) = \arg\min_{x_W} \left\{ -\log P(y_W \mid x_W) - \log P(x_W) \right\}$$
$$\hat{x}_W(y) = \arg\min_{x_W} \sum_i \left\{ \frac{1}{2\sigma^2}\left(x_W^i - y_W^i\right)^2 - \log P_i(x_W^i) \right\}$$
• The MAP estimator diagonalizes the system:
$$\hat{x}_W^i(y_W^i) = \arg\min_{x_W^i} \left\{ \frac{1}{2\sigma^2}\left(x_W^i - y_W^i\right)^2 - \log P_i(x_W^i) \right\}$$
This leads to a very useful property:
Scalar mapping functions:
$$\hat{x}_W^i = M_i(y_W^i)$$
Wavelet Shrinkage Pipeline
$$\hat{x} = W^T M_W(W y)$$
[Diagram: y → transform W → $y_W^i$ → mapping functions $M_i(y_W^i)$ (non-linear operation) → $\hat{x}_W^i$ → inverse transform $W^T$ → $\hat{x}$]
How Many Mapping Functions?
• Due to the fact that
$$P_i(x_W^i) = P_{\mathrm{band}(i)}(x_W^i)$$
• N mapping functions are needed for N subbands.
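The unitary pipeline, transform, scalar mapping, inverse transform, can be sketched in a few lines. This is an illustration under stated assumptions: an orthonormal DCT stands in for the unitary wavelet transform W, a single soft threshold stands in for the per-band mapping functions, and the test signal and the 3σ threshold are hypothetical choices.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix W (W @ W.T == I), standing in for a unitary transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    W = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    W[0, :] /= np.sqrt(2.0)
    return W

def soft_threshold(c, t):
    """Scalar mapping function applied element-wise: shrink towards zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def unitary_shrinkage(y, W, t):
    """x_hat = W^T M(W y) with the same scalar mapping for every coefficient."""
    return W.T @ soft_threshold(W @ y, t)

rng = np.random.default_rng(0)
n, sigma = 128, 0.5
W = dct_matrix(n)

# Hypothetical clean signal that is sparse in the transform domain.
coeffs = np.zeros(n)
coeffs[[2, 5]] = [10.0, -6.0]
x = W.T @ coeffs
y = x + sigma * rng.standard_normal(n)

x_hat = unitary_shrinkage(y, W, t=3.0 * sigma)
err_noisy = np.sum((y - x) ** 2)
err_denoised = np.sum((x_hat - x) ** 2)
print(err_noisy, err_denoised)
```

With one mapping function per band, `t` would become a per-band quantity; a single threshold keeps the sketch short.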
Subband Decomposition
• Wavelet transform:
$$y_B = B y \quad\text{where}\quad B = \begin{bmatrix} B_1 \\ \vdots \\ B_N \end{bmatrix}$$
• Shrinkage:
$$\hat{x} = B^T M_B(B y) = \sum_k B_k^T M_k(B_k y)$$
Wavelet Shrinkage Pipeline
[Diagram: y → wavelet transform (band filters $B_k$) → $y_B^i$ → shrinkage functions → $\hat{x}_B^i$ → inverse transform ($B_k^T$) → sum (+) → $\hat{x}$]
$$\hat{x} = \sum_k B_k^T M_k(B_k y)$$
Designing The Mapping Function
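• The shape of the mapping function $M_j$ depends solely on $P_j$ and the noise variance $\sigma^2$.
[Diagram: $\{x_W^i\}_{i \in B_j}$ → modeling the marginal p.d.f. of band j → $P_j$ → MAP objective (given the noise variance) → $M_j(y_W)$]
$$\hat{x}_W^i(y_W^i) = \arg\min_{x_W^i} \left\{ \frac{1}{2\sigma^2}\left(x_W^i - y_W^i\right)^2 - \log P_j(x_W^i) \right\}$$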
• Commonly, the $P_j(x_W)$ are approximated by a generalized Gaussian distribution (GGD):
$$P(x_W) \propto e^{-|x_W / s|^p}$$
for p<1
from: Simoncelli 99
[Figure panels: Hard Thresholding, Soft Thresholding, Linear Wiener Filtering]
MAP estimators for GGD model with three different exponents. The noise is
additive Gaussian, with variance one third that of the signal.
from: Simoncelli 99
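The dependence of the mapping function on the GGD exponent can be explored numerically. The sketch below is only an illustration: it minimises the scalar MAP objective $(x - y)^2 / (2\sigma^2) + |x/s|^p$ by brute-force grid search, and the function name `map_mapping_function`, the grid range, and the parameter values are assumptions rather than anything taken from the slides.

```python
import numpy as np

def map_mapping_function(y, sigma, s, p, grid=None):
    """Grid-search MAP estimate for one coefficient under a GGD prior.

    Minimises (x - y)^2 / (2 sigma^2) + |x / s|^p over candidate values x.
    """
    if grid is None:
        grid = np.linspace(-10.0, 10.0, 4001)  # assumed search range
    y = np.atleast_1d(np.asarray(y, dtype=float))
    cost = (grid[None, :] - y[:, None]) ** 2 / (2.0 * sigma ** 2) \
        + np.abs(grid[None, :] / s) ** p
    return grid[np.argmin(cost, axis=1)]

y = np.linspace(-4.0, 4.0, 9)
for p in (0.5, 1.0, 2.0):
    print(p, np.round(map_mapping_function(y, sigma=1.0, s=1.0, p=p), 2))
```

Under this objective, p = 2 gives a linear (Wiener-like) map, p = 1 gives soft thresholding with threshold $\sigma^2 / s$, and p < 1 gives a discontinuous map that sets small inputs to zero, in the manner of hard thresholding.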
• Due to its simplicity Wavelet Shrinkage
became extremely popular:
– Thousands of applications.
– Hundreds of related papers (984 citations of D&J paper in
Google Scholar).
• What about efficiency?
– Denoising performance of the original Wavelet Shrinkage
technique is far from the state-of-the-art results.
• Why?
– Wavelet coefficients are not really independent.
Recent Developments
• Since the original approach suggested by D&J, significant improvements have been achieved:
Original Shrinkage
Over-complete
• Overcomplete transform
• Scalar MFs
• Simple
• Not considered state-of-the-art
Joint (Local) Coefficient
Modeling
• Multivariate MFs
• Complicated
• Superior results
Joint (Local) Coefficient Modeling
[Timeline figure with axis marks 94, 97, 2000, 03, 06]
• HMM: Crouse et al.
• Sparsity: Mallat, Zhang
• HMM: Fan-Xia
• Joint Bayesian: Simoncelli
• Context Modeling: Chang et al.
• Joint Bayesian: Pizurica et al.
• Context Modeling: Portilla et al.
• Co-occurrence: Shan, Aviyente
• Bivariate: Sendur, Selesnick
• Adaptive Thresh.: Li, Orchard
Shrinkage in Over-complete Transforms
[Timeline figure with axis marks 94, 97, 2000, 03, 06]
• Shrinkage: D.J.
• Undecimated wavelet: Coifman, Donoho
• Steerable: Simoncelli, Adelson
• Ridgelets: Candes
• Curvelets: Starck et al.
• Ridgelets: Carre, Helbert
• Ridgelets: Nezamoddini et al.
• Contourlets: Do, Vetterli
• Contourlets: Matalon et al.
• K-SVD: Aharon, Elad
Over-Complete Shrinkage Denoising
• Over-complete transform:
$$y_B = B y \quad\text{where}\quad B = \begin{bmatrix} B_1 \\ \vdots \\ B_N \end{bmatrix}$$
• Shrinkage:
$$\hat{x} = (B^T B)^{-1} B^T M_B(B y) = (B^T B)^{-1} \sum_k B_k^T M_k(B_k y)$$
• Mapping Functions: Naively borrowed from the
Unitary case.
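The over-complete reconstruction, including the $(B^T B)^{-1}$ correction, can be sketched on a small 1D example. Everything concrete here is an assumption made for illustration: the bands are cyclic filters built from a length-4 orthonormal DCT (a 1D stand-in for an undecimated transform), and the soft thresholds play the role of MFs borrowed from the unitary case.

```python
import numpy as np

def dct_matrix(m):
    """Orthonormal DCT-II matrix; its rows serve as the band filters."""
    k = np.arange(m)[:, None]
    i = np.arange(m)[None, :]
    W = np.sqrt(2.0 / m) * np.cos(np.pi * (2 * i + 1) * k / (2 * m))
    W[0, :] /= np.sqrt(2.0)
    return W

def circulant_filter_matrix(f, n):
    """n x n matrix whose i-th row is the zero-padded filter f cyclically shifted by i."""
    row = np.zeros(n)
    row[: len(f)] = f
    return np.stack([np.roll(row, i) for i in range(n)])

def overcomplete_shrinkage(y, bands, mappings):
    """x_hat = (B^T B)^{-1} sum_k B_k^T M_k(B_k y)."""
    BtB = sum(Bk.T @ Bk for Bk in bands)
    acc = sum(Bk.T @ Mk(Bk @ y) for Bk, Mk in zip(bands, mappings))
    return np.linalg.solve(BtB, acc)

def soft(t):
    """Return a soft-threshold mapping function with threshold t."""
    return lambda c: np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

n, m = 32, 4
bands = [circulant_filter_matrix(f, n) for f in dct_matrix(m)]

rng = np.random.default_rng(0)
y = rng.standard_normal(n)

# With identity mappings the pipeline reproduces its input exactly.
x_identity = overcomplete_shrinkage(y, bands, [lambda c: c] * m)

# Leave the DC band untouched and shrink the remaining bands (hypothetical thresholds).
x_hat = overcomplete_shrinkage(y, bands, [lambda c: c] + [soft(0.5)] * (m - 1))
```

For this particular filter bank $B^T B$ is a multiple of the identity, so the linear solve is trivial; it is kept general so other band filters can be dropped in.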
What’s wrong with existing MFs?
1. MAP criterion:
– Solution is biased towards the most probable case.
2. Independence assumption:
– In the overcomplete case, the wavelet coefficients are inherently dependent.
3. Minimization domain:
– For the unitary case, MF optimality is expressed in the transform domain. This is incorrect in the overcomplete case.
4. White noise assumption:
– Image noise is not necessarily white i.i.d.
Why are unitary-based MFs being used?
• Non-marginal statistics.
• Multivariate minimization.
• Multivariate MFs.
• Non-white noise.
Suggested Approach:
• Maintain simplicity
– Use scalar LUTs.
• Improve Efficiency
– Use Over-complete Transforms.
– Design optimal MFs with respect to a given
set of images.
– Express optimality in the spatial domain.
– Attain optimality with respect to MSE.
Optimal Mapping Function:
• Traditional approach: Descriptive
[Diagram: $\{x_W^i\}_{i \in B_j}$ → modeling wavelet p.d.f. → MAP objective → $M_j$]
• Suggested approach: Discriminative
[Diagram: examples $\{x^e\}$, $\{y^e\}$ → optimality criteria → $M_j$]
The Optimality Criterion
• Design the MFs with respect to a given set of examples: $\{x_i^e\}$ and $\{y_i^e\}$
$$\varepsilon = \sum_i \left\| x_i^e - (B^T B)^{-1} \sum_k B_k^T M_k(B_k y_i^e) \right\|^2$$
• Critical problem: How to optimize the non-linear MFs?
The Spline Transform
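• Let $x \in \mathbb{R}$ be a real value in a bounded interval [a,b).
• We divide [a,b) into M segments with boundaries $q = [q_0, q_1, \dots, q_M]$, where $q_0 = a$ and $q_M = b$.
• W.l.o.g. assume $x \in [q_{j-1}, q_j)$.
• Define the residue $r(x) = (x - q_{j-1}) / (q_j - q_{j-1})$.
• Then $x = r(x)\, q_j + (1 - r(x))\, q_{j-1}$, or in vector form:
$$x = [0, \dots, 0,\ 1 - r(x),\ r(x),\ 0, \dots]\, q = S_q(x)\, q$$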
The Spline Transform-Cont.
• We define a vectorial extension:
$$x = S_q(x)\, q$$
where the i-th row of $S_q(x)$ is
$$[\dots, 0,\ 1 - r(x_i),\ r(x_i),\ 0, \dots]$$
• We call this the Spline Transform (SLT) of x.
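A minimal sketch of how the SLT matrix might be built for a vector of samples follows. The helper name `slt_matrix` and the handling of values outside [a,b) (they are assigned to the nearest end segment and extrapolated linearly) are assumptions, not part of the slides.

```python
import numpy as np

def slt_matrix(x, q):
    """Return S_q(x): one row per sample of x, one column per boundary in q.

    Row i holds 1 - r(x_i) in column j-1 and r(x_i) in column j,
    where q[j-1] <= x_i < q[j].
    """
    x = np.asarray(x, dtype=float).ravel()
    q = np.asarray(q, dtype=float)
    # Segment index j for each sample; clipping maps out-of-range samples
    # to the first or last segment (assumed behaviour).
    j = np.clip(np.searchsorted(q, x, side="right"), 1, len(q) - 1)
    r = (x - q[j - 1]) / (q[j] - q[j - 1])
    S = np.zeros((x.size, q.size))
    rows = np.arange(x.size)
    S[rows, j - 1] = 1.0 - r
    S[rows, j] = r
    return S

q = np.linspace(-1.0, 1.0, 5)                 # boundaries q_0 .. q_4
x = np.array([-1.0, -0.3, 0.0, 0.45, 0.9])
S = slt_matrix(x, q)
print(np.allclose(S @ q, x))                  # S_q(x) q reproduces x
```

Each row has at most two non-zero entries and sums to one, so the matrix is very sparse for long signals; a dense array is used here only for readability.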
The SLT Properties
• Substitution property: Substituting the boundary vector q with a different vector p forms a piecewise linear mapping:
$$x' = S_q(x)\, p$$
[Figure: piecewise linear curve mapping x to x', with the boundaries $q_j$ on the horizontal axis and the values $p_j$ on the vertical axis]
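For x inside $[q_0, q_M]$, evaluating $S_q(x)\, p$ is the same as linear interpolation through the points $(q_j, p_j)$, so the substitution property can be tried out with `np.interp`. The boundary grid and the choice of p (a soft threshold sampled at the boundaries) are assumptions made for this illustration.

```python
import numpy as np

# Assumed uniform boundary vector and hypothetical target values:
# p samples a soft threshold (threshold 1) at the boundaries.
q = np.linspace(-4.0, 4.0, 9)
p = np.sign(q) * np.maximum(np.abs(q) - 1.0, 0.0)

def piecewise_linear_map(x, q, p):
    """Evaluate x' = S_q(x) p for x inside [q[0], q[-1]] via linear interpolation."""
    return np.interp(x, q, p)

x = np.array([-3.2, -0.4, 0.0, 1.5, 2.75])
x_mapped = piecewise_linear_map(x, q, p)   # piecewise linear mapping of x
x_same = piecewise_linear_map(x, q, q)     # substituting p = q returns x itself
```

Because the mapping is linear in p for fixed x and q, the shape of the curve can be tuned by solving for p, which is what makes the design problem tractable.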
Back to the MFs Design
• We approximate the non-linear $\{M_k\}$ with piecewise linear functions:
$$M_k(B_k y) = S_{q_k}(B_k y)\, p_k$$
$$\varepsilon = \sum_i \left\| x_i^e - (B^T B)^{-1} \sum_k B_k^T S_{q_k}(B_k y_i^e)\, p_k \right\|^2$$
• Finding $\{p_k\}$ is a standard LS problem with a closed form solution!
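The least-squares step can be sketched end to end on a toy 1D problem. This is an illustration under stated assumptions, not the authors' implementation: the bands are cyclic length-4 DCT filters, there is a single hypothetical training pair, the boundaries $q_k$ are placed uniformly over each band's range, and all helper names are made up for the example.

```python
import numpy as np

def dct_matrix(m):
    """Orthonormal DCT-II matrix; its rows serve as band filters."""
    k = np.arange(m)[:, None]
    i = np.arange(m)[None, :]
    W = np.sqrt(2.0 / m) * np.cos(np.pi * (2 * i + 1) * k / (2 * m))
    W[0, :] /= np.sqrt(2.0)
    return W

def circulant_filter_matrix(f, n):
    """n x n cyclic filtering matrix for filter f."""
    row = np.zeros(n)
    row[: len(f)] = f
    return np.stack([np.roll(row, i) for i in range(n)])

def slt_matrix(x, q):
    """S_q(x): row i holds 1 - r(x_i) and r(x_i) in the columns of its segment."""
    j = np.clip(np.searchsorted(q, x, side="right"), 1, len(q) - 1)
    r = (x - q[j - 1]) / (q[j] - q[j - 1])
    S = np.zeros((x.size, q.size))
    rows = np.arange(x.size)
    S[rows, j - 1] = 1.0 - r
    S[rows, j] = r
    return S

def design_matrix(y, bands, knots):
    """A with A @ concat(p_k) = (B^T B)^{-1} sum_k B_k^T S_{q_k}(B_k y) p_k."""
    BtB = sum(Bk.T @ Bk for Bk in bands)
    blocks = [Bk.T @ slt_matrix(Bk @ y, q) for Bk, q in zip(bands, knots)]
    return np.linalg.solve(BtB, np.hstack(blocks))

def fit_mapping_functions(x_clean, y_noisy, bands, knots):
    """Least-squares estimate of all p_k jointly (stacked into one vector)."""
    A = design_matrix(y_noisy, bands, knots)
    p, *_ = np.linalg.lstsq(A, x_clean, rcond=None)
    return p

rng = np.random.default_rng(0)
n, m, sigma, n_knots = 256, 4, 0.3, 15
bands = [circulant_filter_matrix(f, n) for f in dct_matrix(m)]

# Hypothetical training pair: piecewise-constant signal plus Gaussian noise.
x_train = np.repeat(rng.uniform(-1.0, 1.0, size=n // 16), 16)
y_train = x_train + sigma * rng.standard_normal(n)

# Uniform boundaries covering each band's coefficient range (an assumption).
knots = []
for Bk in bands:
    c = 1.01 * np.abs(Bk @ y_train).max()
    knots.append(np.linspace(-c, c, n_knots))

p = fit_mapping_functions(x_train, y_train, bands, knots)
A = design_matrix(y_train, bands, knots)
err_noisy = np.sum((y_train - x_train) ** 2)
err_fit = np.sum((A @ p - x_train) ** 2)
print(err_noisy, err_fit)
```

Setting $p_k = q_k$ reproduces the noisy input, so the fitted training error can never exceed the noisy error. With several training pairs, the design matrices and targets would simply be stacked row-wise before the solve.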
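Designing the MFs
[Diagram: noisy examples $\{y^e\}$ → undecimated wavelet transform $B_k$ (2D convolutions) → $y_{B_k}$ → $M_k(y; p_k)$ → $B_k^T$ → sum (+) → $(B^T B)^{-1}$ → compared with the clean examples $\{x^e\}$; the $\{p_k\}$ are given by the closed form solution]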
Results
Training Images
Tested Images
Simulation setup
• Transform used: Undecimated DCT
• Noise: Additive i.i.d. Gaussian
• Number of bins: 15
• Number of bands: 3x3 .. 10x10
Option 1: Transform domain – independent bands
[Diagram: the training pipeline, with the error measured per band in the transform domain between $B_k x^e$ and $M_k(B_k y^e; p_k)$]
$$\varepsilon_k = \sum_i \left\| B_k x_i^e - S_{q_k}(B_k y_i^e)\, p_k \right\|^2$$
Option 2: Spatial domain – independent bands
[Diagram: the training pipeline, with the error measured per band in the spatial domain, after applying $B_k^T$]
$$\varepsilon_k = \sum_i \left\| B_k^T B_k x_i^e - B_k^T S_{q_k}(B_k y_i^e)\, p_k \right\|^2$$
Option 3: Spatial domain – joint bands
[Diagram: the training pipeline, with the error measured in the spatial domain on the full reconstruction over all bands]
$$\varepsilon = \sum_i \left\| x_i^e - (B^T B)^{-1} \sum_k B_k^T S_{q_k}(B_k y_i^e)\, p_k \right\|^2$$
[Figure: MFs obtained with Option 1, Option 2, and Option 3 for UDCT 8x8 (i,i) bands, i=1..4, σ=20]
[Bar chart: psnr (axis 27.5 to 33) of Method 1, Method 2, and Method 3 on barbara, boat, fingerprint, house, lena, peppers256]
Comparing psnr results for 8x8 undecimated DCT, sigma=20.
8x8 UDCT, σ=10
8x8 UDCT, σ=20
8x8 UDCT, σ=10
The Role of Quantization Bins
[Plot: psnr (axis 30.5 to 35.5) versus number of bins (5 to 35) for barbara, boat, fingerprint, house, lena, peppers256; 8x8 UDCT, σ=10]
The Role of Transform Used
[Bar chart: psnr (axis 32 to 36) for 3x3, 5x5, 7x7, and 9x9 DCT on barbara, boat, fingerprint, house, lena, peppers256; σ=10]
The Role of Training Image
[Bar chart: psnr (axis 27 to 36) on barbara, boat, fingerprint, house, lena, peppers256]
The Role of Noise Variance
[Figure panels: σ=5, σ=10, σ=15, σ=20]
MFs for UDCT 8x8 (i,i) bands, i=2..6.
The role of noise variance
• Observation: The obtained MFs for different noise variances are similar up to scaling:
$$M_\sigma(v) = s\, M_{\sigma_0}\!\left(\frac{v}{s}\right) \quad\text{where}\quad s = \frac{\sigma}{\sigma_0}$$
Comparison between M20(v) and 0.5M10(2v) for basis [2:4]X[2:4]
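For a piecewise linear MF stored as boundary and value vectors, the scaling rule $M_\sigma(v) = s\, M_{\sigma_0}(v/s)$ with $s = \sigma / \sigma_0$ amounts to multiplying both vectors by s. The sketch below checks this on a hypothetical MF (a soft threshold at $3\sigma_0$); the helper name and all numeric values are assumptions made for illustration.

```python
import numpy as np

def rescale_mapping_function(q0, p0, sigma0, sigma):
    """Rescale a piecewise linear MF designed for sigma0 to noise level sigma.

    M_sigma(v) = s * M_sigma0(v / s) with s = sigma / sigma0, which for a
    piecewise linear MF is the same as scaling boundaries and values by s.
    """
    s = sigma / sigma0
    return s * q0, s * p0

sigma0, sigma = 10.0, 20.0
q0 = np.linspace(-100.0, 100.0, 201)
# Hypothetical MF for sigma0: soft threshold at 3 * sigma0.
p0 = np.sign(q0) * np.maximum(np.abs(q0) - 3.0 * sigma0, 0.0)

q, p = rescale_mapping_function(q0, p0, sigma0, sigma)

v = np.linspace(-150.0, 150.0, 7)
s = sigma / sigma0
direct = np.interp(v, q, p)                   # rescaled MF evaluated at v
via_formula = s * np.interp(v / s, q0, p0)    # s * M_sigma0(v / s)
```

If the rule holds for learned MFs as the observation suggests, one trained set of MFs could be reused across noise levels without retraining.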
Comparison with BLS-GSM
[Figure: six panels (barbara, boat, fingerprint, house, lena, peppers), each plotting PSNR (axis 30 to 50) against noise s.t.d. (1, 2, 5, 10, 15, 20, 25)]
Comparison with BLS-GSM
[Plot: PSNR (axis 28 to 50) versus noise s.t.d. (1, 2, 5, 10, 15, 20, 25) for the proposed method and the GSM method]
Other Degradation Models
JPEG Artifact Removal
JPEG Artifact Removal
Image Sharpening
Image Sharpening
Conclusions
• New and simple scheme for over-complete
transform based denoising.
• MFs are optimized in a discriminative manner.
• Linear formulation of non-linear minimization.
• Eliminating the need for modeling complex
statistical prior in high-dim. space.
• Seamlessly applied to other degradation
problems as long as scalar MFs are used for
reconstruction.
Conclusions – cont.
• Extensions:
– Filter-cascade based denoising.
– Multivariate MFs (activity level).
– Non-homogeneous noise characteristics.
• Open problems:
– What is the best transform for a given image?
– How to choose training images that form faithful
representation?
Thank You
MSE for MF scaling from σ=10 to σ=20
[Heat map: MSE over scale x and scale y, each axis from 0.5 to 2.5]
MSE for MF scaling from σ=15 to σ=20
[Heat map: MSE over scale x and scale y, each axis from 0.5 to 2.5]
MSE for MF scaling from σ=25 to σ=20
[Heat map: MSE over scale x and scale y, each axis from 0.5 to 2.5]
$$M_\sigma(v) = s\, M_{\sigma_0}\!\left(\frac{v}{s}\right) \quad\text{where}\quad s = \frac{\sigma}{\sigma_0}$$
[Plot: S versus sigma / 20, both axes from 0 to 1.5]
Image Sharpening
Wavelet Shrinkage Pipeline
[Diagram: y → wavelet transform (band filters $B_k$) → $y_B^i$ → shrinkage functions → $\hat{x}_B^i$ → inverse transform ($B_k^T$) → sum (+) → $(B^T B)^{-1}$ → $\hat{x}$]
$$\hat{x} = (B^T B)^{-1} \sum_k B_k^T M_k(B_k y)$$
Option 1: MFs for UDCT 8x8 (i,i) bands, i=1..4, σ=20
Option 2: MFs for UDCT 8x8 (i,i) bands, i=1..4, σ=20
Option 3: MFs for UDCT 8x8 (i,i) bands, i=1..4, σ=20
[Bar chart: psnr (axis 27.5 to 33) of the Traditional and Suggested methods on barbara, boat, fingerprint, house, lena, peppers256]
Comparing psnr results for 8x8 undecimated DCT, sigma=20.
[Bar chart: psnr (axis 32 to 36) of the Traditional and Suggested methods on barbara, boat, fingerprint, house, lena, peppers256]
Comparing psnr results for 8x8 undecimated DCT, sigma=10.