Denoising by Wavelets



What is Denoising?
Denoising refers to the manipulation of wavelet coefficients for noise reduction.
Coefficient values that do not exceed a carefully selected threshold level are replaced by zero, and an inverse transform of the modified coefficients recovers the denoised signal.
Denoising by thresholding of wavelet coefficients is therefore a nonlinear (local) operation.
Noise Reduction in the Wavelet and Fourier Domains





Comments
Denoising is a unique feature of signal decomposition by wavelets.
It is different from the noise reduction used in spectral manipulation and filtering.
Denoising manipulates wavelet coefficients, which reflect the time/space behavior of a given signal.
Denoising is an important step in signal processing: it is used as one of the steps in lossy data compression and in numerous noise-reduction schemes in wavelet analysis.
Denoising by Wavelets




For denoising we apply a thresholding approach to the wavelet coefficients.
This must be done with judiciously chosen threshold levels.
Ideally, each coefficient would need a unique threshold level attributed to its noise content.
In the absence of information about the true signal, this is neither feasible nor necessary, since coefficients are somewhat correlated both within and across decomposition levels (a secondary feature of the wavelet transform).
True Signal Recovery



Thresholding modifies the empirical coefficients (the coefficients of the measured noisy signal) in an attempt to reconstruct a replica of the true signal.
Reconstruction of the signal aims at a 'best' estimate of the true (noise-free) signal.
The 'best estimate' is defined in accordance with the particular criterion chosen for threshold selection.
Thresholding


Mathematically, thresholding of the coefficients can be described by a transformation of the wavelet coefficients.
The transform matrix is a diagonal matrix with elements 0 or 1: a zero element forces the corresponding coefficient, at or below the given threshold, to zero, while a one retains the coefficient unchanged.

Λ = diag(λ1, λ2, …, λN) with λi ∈ {0, 1}, i = 1, …, N.
Hard or Soft Thresholding
Hard thresholding. Only wavelet coefficients with absolute values at or below the threshold level are affected: they are replaced by zero, while the others are kept unchanged.
Soft thresholding. Coefficients above the threshold level are also modified: they are reduced by the threshold size.
Donoho refers to soft thresholding as 'shrinkage', since it can be proven that the reduction in coefficient amplitudes under soft thresholding also reduces the signal level, hence a 'shrinkage'.

Hard and Soft Thresholding



Mathematically, hard and soft thresholding are described as
Hard threshold:
wm = w if |w| ≥ th
wm = 0 if |w| < th
Soft threshold:
wm = sign(w)(|w| − th) if |w| ≥ th
wm = 0 if |w| < th
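These rules translate directly into code. A minimal NumPy sketch (the function names are mine, not from the slides):

```python
import numpy as np

def hard_threshold(w, th):
    """Keep coefficients with |w| >= th unchanged; replace the rest by zero."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) >= th, w, 0.0)

def soft_threshold(w, th):
    """Zero coefficients with |w| < th; shrink the rest toward zero by th."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - th, 0.0)

w = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
print(hard_threshold(w, 1.0))  # surviving coefficients are unchanged
print(soft_threshold(w, 1.0))  # surviving coefficients are reduced by th
```

Note that at |w| = th, hard thresholding keeps the coefficient while soft thresholding maps it to zero, consistent with sign(w)(|w| − th) = 0 there.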
Global and Local Thresholding

Thresholding can be done globally or locally, i.e.

a single threshold level is applied across all scales,
or it can be scale-dependent, where each scale is treated separately.
It can also be 'zonal', in which the given function is divided into several segments (zones) and a different threshold level is applied in each segment.
Additive Noise Model and
Nonparametric Estimation Problem

Additive Noise Model. Additive noise is superimposed on the data as follows:
f(t) = s(t) + n(t)
n(t) is a random variable assumed to be white Gaussian, N(0, σ²); s(t) is the signal, not necessarily a random variable.
The original signal can be described by a given basis in which the coefficients of the expansion are unknown: se(t) = ∑ αi φi(t).
se(t) is the estimate of the true signal s(t). Note that the estimate is described by a set of spanning functions φi(t), chosen to minimize the L2 error of the approximation,
||s(t) − se(t)||².
As such, denoising is treated as a nonparametric estimation problem.
Properties of Wavelets Utilized in
Denoising


Sparse Representation.
Wavelet expansion of the class of functions that exhibit a sufficient degree of regularity and smoothness results in a pattern of coefficient behavior that can often be categorized into two classes:
1) a few large-amplitude coefficients, and
2) a large number of small coefficients.
This property allows compaction for compression and efficient feature extraction.
Wavelet Properties and Denoising


Decorrelation. Wavelets are referred to as decorrelators: wavelet expansion of a given signal yields coefficients that often exhibit a lower degree of correlation among themselves than the components of the signal do.
Orthogonality. Intuitively, under a standard DWT of a signal, this can be explained by the orthogonality of the expansion and the corresponding basis functions.
i.i.d. Assumption

Under certain assumptions, the coefficients in the highest-frequency band can be considered statistically independent and identically distributed.
Examples of Signal Compaction and Decorrelation in the Coefficient Domain
[Figure: high-frequency-band (#36) signals and their wavelet coefficients for cycle #6 (no knock), cycle #1 (hard knock), and cycle #8 (mild knock).]
Signal Decorrelation in the Coefficient Domain
[Figure: coefficients normalized to the coefficient L2 norm after denoising stages 1 and 2, for no-knock cycle #11 and hard-knock cycle #7.]
Why Decorrelation by Wavelets

Coefficients carry signal information in subspaces spanned by the basis functions of each subspace.
Such bases can be orthogonal (uncorrelated) to each other; therefore the coefficients tend to be uncorrelated with each other.
Segmentation of the signal by wavelets introduces decorrelation in the coefficient domain.
White Noise and Wavelet Expansion

No wavelet function can model the noise components of a given signal (there is no waveform match for white noise).
White noise has a spectral distribution that spreads across all frequencies; there is no match (correlation) between a given wavelet and white noise.
As such, an expansion of the noise component of the signal results in small wavelet coefficients distributed across all the details.
Search for Noise in Small Coefficients

x(t) = s(t) + n(t)
An expansion of the white-noise component of the signal results in small wavelet coefficients distributed across all the details.
We therefore search for the white noise n(t) among the small DWT coefficients, which often reside in the details.
White Noise and High-Frequency Details

At the highest-frequency band d1, the number of coefficients is largest under the DWT and similar decomposition architectures.
As such, a large portion of the energy of the noise components of a signal resides in the coefficients of the high-frequency detail d1.
The basis functions at band d1 are short, and decorrelation is high at this level (white noise).
White Noise Model and Statistically i.i.d. Coefficients

The decorrelation property of the wavelet transform at the coefficient level can be examined in terms of the statistical properties of the wavelet coefficients.
At one extreme, the coefficients may be approximated as a realization of a purely random stochastic process with i.i.d. (independent and identically distributed) random variables.
Under this assumption, every coefficient is considered statistically independent of the neighboring coefficients, both intra-scale (within the same scale) and inter-scale (across scales).
White Noise Model and Multiblock Denoising

In practice, however, there often exists a certain degree of interdependence among the coefficients, and we need to consider correlated-coefficient noise models (such as Markov models).
In other noise-estimation models, blocks of coefficients, instead of single coefficients, are treated as statistically independent.
In Matlab, multi-block denoising at each level is considered.
Main Task

The main task in denoising by wavelets:

Identification of the underlying statistical distribution of the coefficients attributed to noise.
For the signal, no structural assumption is made, since in general it is assumed to be unknown. However, if we have additional information about the signal, we can use it to improve the estimation results.
Main Task

The denoising problem is treated as a statistical estimation problem.
The task is followed by evaluation of the variance and standard deviation of the noise-model statistics, which are used as metrics for thresholding.
A priori distributions may be imposed on the coefficients using a Bayesian estimation approach, after which denoising is treated as a parametric estimation problem.
Alternative Models for Noise Reduction

Basic Considerations

Additive Noise Model. The basic modeling structure utilizes the additive noise model stated earlier:
x(i) = s(i) + n(i), i = 1, 2, …, N
N is the signal length, x(i) are the measurements, s(i) are the true signal values (unknown), and n(i) are the noise components (unknown).
n(i) is assumed to be white Gaussian with zero mean; its standard deviation is to be estimated.
Additive Noise Model and Linearity of the Wavelet Transform

Under an orthogonal decomposition and the additive noise model, the linearity of the wavelet transform ensures that the statistical distributions of the measurements and the white noise remain unchanged in the coefficient domain.
Under an orthogonal decomposition, each coefficient decomposes into a component attributed to the true signal values and a component attributed to the signal noise, as follows:
cj = uj + dj, j = 1, 2, …, n
Orthogonal vs. Biorthogonal

In vector form:
C = U + D
C, U and D are the vector representations of the empirical wavelet coefficients, the true coefficient values (unknown), and the noise content of the coefficients, respectively.
Note that the 'additive noise model' at the coefficient level, preserving the statistical properties of the signal and noise as stated above, is valid under an orthogonal transformation, where the Parseval relationship holds.
It is not valid under a biorthogonal transform: there, white noise at the signal level exhibits itself as colored noise, since the coefficients are no longer i.i.d. but are correlated with each other.
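This invariance can be checked numerically: for any orthogonal W, the coefficient-domain covariance of white noise is σ²WWᵀ = σ²I. A small NumPy demonstration, with a random orthogonal matrix standing in for an orthogonal DWT matrix (an illustrative sketch, not an actual wavelet transform):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Random orthogonal matrix from a QR decomposition, standing in for the DWT matrix W
W, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Parseval: energy is preserved exactly under the orthogonal transform
x = rng.standard_normal(n)
print(np.isclose(np.linalg.norm(W @ x), np.linalg.norm(x)))  # True

# Covariance of transformed white noise: sigma^2 * W @ W.T = sigma^2 * I,
# so white noise stays white (uncorrelated, equal variance) in the coefficient domain
sigma = 0.5
print(np.allclose(sigma**2 * W @ W.T, sigma**2 * np.eye(n)))  # True
```

A biorthogonal transform matrix would not satisfy WWᵀ = I, so the second check fails there and the transformed noise is correlated (colored), as stated above.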
Principal Considerations
1. Assumption of Zero-Mean Gaussian.
Under the additive noise model and the i.i.d. assumption for the wavelet coefficients, we consider a zero-mean Gaussian distribution in the coefficient domain.
Mean-centering of the data can always be done to ensure the zero-mean Gaussian assumption used above.
Main Considerations

Preservation of Smoothness.
It can be proved that under soft thresholding, the smoothness property of the original signal remains unchanged with high probability, under a variety of smoothness measures (such as Hölder or Sobolev measures).
Smoothness may be defined by requiring the integral of the squared m-th derivative of a given function to be finite.
This property, together with the structural correlation of wavelet coefficients at consecutive scales, is used in the wavelet-based zero-tree compression algorithm.
Main Considerations

Shrinkage. Under soft thresholding (a nonlinear operation at the coefficient level), it can be shown that
|xid| ≤ |xi|, where xid is the denoised signal component, i.e. denoising reduces all the coefficients and produces shrinkage at the signal level as well.
Denoising Problem

The denoising problem is mainly the estimation of the standard deviation and the threshold level.
The basic problem in noise reduction under Gaussian white noise centers on estimating the standard deviation σ of the Gaussian noise.
It is then used to determine a suitable threshold.
Alternative Considerations

White Noise Model: Global (Universal) Thresholding.
Assume the coefficients at the highest-frequency detail give a good estimate of the noise content.
A white noise model is superimposed on the coefficients at the highest-frequency detail level d1.
An estimate of the standard deviation at the d1 level is then used to arrive at a suitable threshold level for coefficient thresholding at all levels.
This is a global thresholding, applied to all detail coefficients.
Level-Dependent Thresholds

Nonwhite (Colored) Noise Model.
Under this model, a white noise model is still imposed on the detail coefficients, but the threshold levels are level (scale) dependent.
A Gaussian white noise model is imposed on the detail coefficients, with the standard deviation and threshold level estimated at each level separately.
Comments on the Estimation Problem: Near Optimality under Other Optimality Criteria

Wavelet denoising (WaveShrink) utilizes a nonparametric function-estimation approach to noise thresholding.
It has been shown that, statistically, denoising is asymptotically near optimal over a wide range of optimality criteria and for a large class of functions found in scientific and engineering applications (see the reference by Donoho).
Inaccuracy of Assuming a Gaussian Distribution N(0, 1): Result Evaluations

The assumption of a Gaussian distribution at the d1 level may not always be valid.
The distribution of the coefficients at d1 often exhibits long tails compared with the standard Gaussian (a peaky distribution). This can also be observed in the case of sparsely distributed large-amplitude coefficients, or outliers.
Under such conditions, the application of global thresholding may need to be revised, and the thresholding results examined in light of the actual data analysis and denoising performance.
Inaccuracy of Assuming a Gaussian Distribution

Fig. 2: Peaky, Gaussian-like pdf of the coefficients with long tails.
Signal Estimation and Threshold Selection Rules

Apply statistical estimation theory to the probability distribution of the wavelet coefficients.
Use criteria for estimating the statistical parameters and selecting the threshold levels.
A loss function, referred to as the 'risk function', is defined first.
For the loss function we often use the L2 norm of the error, i.e. the variance of the estimation error: the difference between the estimated value and the actual unknown value.
Risk (Loss Function)

We use the expected value of the error as the loss function, since we are dealing with a noisy signal, which is a random variable and is therefore described in terms of expected values.
R(X, X′) = E||X − X′||²
Minimization of the risk function results in an estimate of the variance of the coefficients.
Risk (Loss) Function

X is the actual (true) value of the signal (or coefficients) to be estimated, and X′ is an estimate of X.
Since the noise component is assumed to be zero-mean Gaussian, the difference is a measure of the error under the additive noise model and the given risk function.
It is a measure of the energy of the noise, i.e. ∑[n(k)]².
Thus the optimization procedure defined above attempts to reduce the energy of the signal X by an amount equal to the energy of the noise, thereby compensating for the noise in the L2-norm sense.
Minimization of the Risk Function at the Coefficient Level

Under an orthogonal decomposition, minimization of the risk function at the signal level can equivalently be defined at the coefficient level:
R(X^, X) = E||X^ − X||² = E||W⁻¹(C^ − C)||² = E||C^ − C||²
C^ is the estimate of the true coefficient values. We have used the additive noise model and the wavelet transform in matrix form, C = WX, as described below:
X = S + σn,
C = WX, X = W⁻¹C
Accordingly, minimizing the risk function at the coefficient level is equivalent to estimating the true value of the signal.
Use of the Minimax Rule

One 'best' estimate is obtained using the minimax rule indicated below:
Minimax R(X^, X) = inf sup R(X^, X)
Under the minimax rule the worst-case condition is considered, i.e.
sup R(X^, X)
The objective is to minimize the risk under the worst-case condition (i.e. obtain min max R).
Global/Universal Thresholding Rule

Under the assumption of i.i.d. wavelet coefficients and Gaussian white noise, one can show that, under soft thresholding, the actual risk is within a log(n) factor of the ideal risk, where the error is minimal (on average).
This results in the following threshold value, referred to as the universal threshold, which minimizes the maximum risk defined above:
Th = σ√(2 log n), σ = MAD/0.6745
MAD is the median absolute deviation of the finest-level coefficients:
MAD = median({|d(J−1, k)| : k = 0, 1, …, 2^(J−1) − 1})
Ref: Donoho, D.L., "De-noising by Soft-Thresholding", IEEE Trans. on Information Theory, Vol. 41, No. 3, May 1995, pp. 613-627.
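As a sketch, the universal threshold can be computed from the finest-detail coefficients as follows (the function name is mine; pure noise is used in the example only to show that the robust σ estimate recovers the true noise level):

```python
import numpy as np

def universal_threshold(d1):
    """Universal (fixed-form) threshold Th = sigma*sqrt(2 log n),
    with sigma estimated robustly as median(|d1|)/0.6745
    from the finest-level detail coefficients d1."""
    d1 = np.asarray(d1, dtype=float)
    sigma = np.median(np.abs(d1)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(d1.size))

# For pure N(0, 0.3^2) noise, Th should land near 0.3*sqrt(2*log 256) ~ 1.0
rng = np.random.default_rng(1)
th = universal_threshold(0.3 * rng.standard_normal(256))
```

The median-based σ estimate is what makes the rule robust: a few large signal-carrying coefficients in d1 barely move the median, whereas they would inflate a sample standard deviation.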
Universal Thresholding Rule
The underlying basis for the above threshold rule is the assumption of i.i.d. random variables X1, …, Xn with distribution N(0, 1).
Under this assumption, the maximum absolute value satisfies
P{max(1≤i≤n) |Xi| > √(2 log n)} → 0 as n → ∞.
Note that Xi here refers to noise.
Universal Thresholding Rule

Therefore, under universal thresholding applied to the wavelet coefficients, with high probability every sample of the wavelet transform (i.e. every coefficient) at which the underlying function is exactly zero will be estimated as zero.
Universal Thresholding Rule in WP

When applied to wavelet packets, the universal threshold rule is adjusted to the decomposition length, which is n log(n). The threshold is then
Th = σ√(2 log(n log n)).
Level-Dependent Thresholding

In level-dependent thresholding, the thresholds are rescaled at each level, using a new estimate of the standard deviation of the wavelet coefficients at that level.
We assume a white noise model and a Gaussian distribution for the coefficients at each level.
This is referred to as 'mln' (multilevel noise model) in the Matlab toolbox. The threshold is determined as follows:
Th(j) = σj√(2 log nj), σj = MADj/0.6745
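A sketch of this 'mln'-style rule, applied to a list of detail-coefficient arrays from some decomposition (the helper name is mine, and the example uses synthetic detail bands rather than an actual DWT):

```python
import numpy as np

def level_thresholds(details):
    """Level-dependent thresholds Th_j = sigma_j*sqrt(2 log n_j), with
    sigma_j = MAD_j/0.6745 estimated separately at each detail level."""
    ths = []
    for d in details:
        d = np.asarray(d, dtype=float)
        sigma_j = np.median(np.abs(d)) / 0.6745
        ths.append(sigma_j * np.sqrt(2.0 * np.log(d.size)))
    return ths

rng = np.random.default_rng(0)
# Two synthetic detail bands with different noise levels:
details = [0.1 * rng.standard_normal(64), 0.5 * rng.standard_normal(128)]
ths = level_thresholds(details)  # the noisier band gets the larger threshold
```

This is the point of level dependence: colored noise concentrates different energy at different scales, so each band gets a threshold matched to its own σj.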
Stein's Unbiased Risk Estimator (SURE)

A criterion referred to as Stein's Unbiased Risk Estimator, used in SureShrink, applies statistical estimation theory to derive an unbiased estimate of the loss function.
Suppose X1, …, Xs are independent N(μi, 1), i = 1, …, s, random variables. The problem is to estimate the mean vector μ = (μ1, …, μs) with minimum L2 risk.
Stein showed that the L2 loss can be estimated unbiasedly using any estimator μ^ that can be written as
μ^(X) = X + g(X),
where the function g = (g1, …, gs) is weakly differentiable.
SURE Estimator

Under the SURE criterion, the following is used as an estimate of the loss function:
SURE(th; x) = s − 2 · #{i : |xi| ≤ th} + ∑i min(|xi|, th)²
where #B denotes the cardinality of a set B.
It can be shown that SURE(th; x) is an unbiased estimate of the L2 risk, i.e.
E_μ ||μ^_th(X) − μ||² = E_μ SURE(th; X).
The threshold level is chosen at the minimum of the SURE loss function:
Th_s = arg min_th SURE(th; x)
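A direct sketch of this selection rule, evaluating SURE at each candidate threshold t = |xi| and taking the minimizer (the function name is mine; the coefficients are assumed already rescaled to unit noise variance):

```python
import numpy as np

def sure_threshold(x):
    """arg min over t of SURE(t; x) = s - 2*#{i: |x_i| <= t} + sum_i min(|x_i|, t)^2,
    with candidate thresholds t taken from the sorted |x_i|."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))
    s = a.size
    risks = [s - 2 * np.sum(a <= t) + np.sum(np.minimum(a, t) ** 2) for t in a]
    return a[int(np.argmin(risks))]

# A few large 'signal' coefficients plus unit-variance noise: the SURE
# threshold should fall well below the signal amplitudes, preserving them.
rng = np.random.default_rng(2)
x = np.concatenate([np.array([5.0, -6.0, 7.0]), rng.standard_normal(100)])
th = sure_threshold(x)
```

The minimization trades off the penalty for killing coefficients (the min(|xi|, t)² term grows with t) against the −2·#{|xi| ≤ t} reward, which is what keeps SURE conservative when the signal is not sparse.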
Other Thresholding Rules

Fixed-form thresholding is the same as universal thresholding:
Th = σ√(2 log n), σ = MAD/0.6745
Minimax refers to finding the minimum of the maximum mean-square error obtained for the worst function in a given set.
Rigorous SURE Denoising

Rigorous SURE (Stein's Unbiased Risk Estimate) is a threshold-based method whose threshold is selected by minimizing the SURE risk over the n samples of the signal (i.e. the coefficients).
Heuristic SURE

Heuristic SURE is a combination of fixed-form thresholding and rigorous SURE (for details refer to the Matlab Help desk).
Results of Denoising Applied to a CDMA Signal

At SNR = 3 dB, the MSE between the original signal and the noisy signal is 0.99.
The following table shows the MSE after denoising:

Rule                        Haar   Bior3.1   Db10   Coif5
Fixed form, white noise     0.55   0.64      0.46   0.46
RigSURE, white noise        0.36   0.41      0.27   0.27
HeurSURE, white noise       0.42   0.41      0.27   0.28
Minimax, white noise        0.46   0.46      0.34   0.33
Minimax, nonwhite noise     0.53   1.09      0.44   0.32
Observations on Denoising Applied to CDMA Signals

It was found that soft thresholding gives better performance (in terms of SNR) than hard thresholding in this project.
Since the noise model used in this project is WGN, selecting the correct noise type (white noise) also gives better results.
Db10 and Coif5 outperform Haar and Bior3.1 in denoising because they have a higher number of vanishing moments.
At SNR = −3 dB, Db10 and Coif5 with soft thresholding and the rigorous SURE threshold selection rule give very good denoising performance: the MSE is brought from 3.9 to approximately 0.7.
In general, fixed form and heuristic SURE are more aggressive in removing noise. Rigorous SURE and minimax are more conservative, and they give better results in this project because some details of the CDMA signal lie in the noise range.
Denoising in MATLAB

In Matlab, the command 'wden' is used for denoising:
sig = wden(s, tptr, sorh, scal, n, wav)
where:
s = signal, tptr = threshold selection rule, n = decomposition level, wav = wavelet name,
sorh = soft or hard thresholding: 's' or 'h',
scal = 'one': basic model (white noise with unscaled thresholds),
scal = 'sln': single noise-level estimate based on the first-level detail coefficients,
scal = 'mln': level-dependent noise thresholding, for nonwhite noise.
[For further details, please refer to the Matlab wavemenu toolbox.]
Artifacts at Points of Singularity and the Stationary Wavelet Transform

The Gaussian noise model for the coefficients does not fully agree with the peaky shape of the distribution at the d1 level.
Gibbs-type oscillations and artifacts are also observed at points of singularity, though not as prominent as classical Gibbs oscillations.
To correct such phenomena, the stationary wavelet transform (SWT) is used; it has been incorporated in the Matlab toolbox.
In the stationary wavelet transform, the DWT is applied to all circular shifts of the signal of length N; the coefficients are evaluated and threshold levels are determined for each shift.
An average of all N denoised signals is used as the final denoised signal.
The only limitation is that the signal length must be a multiple of 2^J.
Denoising using the SWT often gives more conservative noise reduction than standard soft thresholding using 'fixed form' or rigorous SURE.
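The shift-average scheme itself is easy to sketch around any single-shift denoiser (function names are mine; a real implementation would put a DWT / threshold / inverse-DWT step in place of the toy pointwise denoiser used here):

```python
import numpy as np

def cycle_spin(x, denoise):
    """Denoise every circular shift of x, undo the shift, and average all N results."""
    x = np.asarray(x, dtype=float)
    n = x.size
    out = np.zeros_like(x)
    for k in range(n):
        out += np.roll(denoise(np.roll(x, k)), -k)
    return out / n

# A toy pointwise denoiser is shift-invariant, so the averaging changes nothing here;
# a DWT-based denoiser is NOT shift-invariant, which is exactly why averaging over
# shifts suppresses the shift-dependent artifacts at singularities.
x = np.array([0.0, 1.0, 0.2, -0.8])
y = cycle_spin(x, lambda v: np.where(np.abs(v) > 0.5, v, 0.0))
```

The averaging over all N shifts is also what makes SWT denoising more conservative: an artifact produced at one shift is diluted by the N − 1 shifts that do not produce it.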
Illustrative Examples of Denoising
[Figure: original signal and denoising stages 1 and 2 for no-knock cycle #11 and hard-knock cycle #7.]
Spectrum of Original and Denoised Signal
[Figure: spectra (0-6000 Hz) of the original and denoised signals for knock cycle #8, no-knock cycle #11, and mild-knock cycle #14.]
Spectrum, Original and Denoised

Spectrum before denoising (green) and after denoising (blue).
2D Denoising, An Illustration
Hidden Markov Model for Denoising

Please refer to class notes posted
on site

















THE CLASSICAL APPROACH TO WAVELET THRESHOLDING
A wavelet-based linear approach, extending the simple spline smoothing estimation methods described by Wahba (1990), is the one suggested by Antoniadis (1996) and independently by Amato & Vuza (1997). Of non-threshold type, this method is appropriate for estimating relatively regular functions. Assuming that the smoothness index s of the function g to be recovered is known, the resulting estimator is obtained by estimating the scaling coefficients c_{j0,k} by their empirical counterparts ĉ_{j0,k} and by estimating the wavelet coefficients d_{jk} via the linear shrinkage
d̃_{jk} = d̂_{jk} / (1 + λ 2^{2js}),
where λ > 0 is a smoothing parameter. The parameter λ is chosen by cross-validation in Amato & Vuza (1997), while the choice of λ in Antoniadis (1996) is based on risk minimization and depends on a preliminary consistent estimator of the noise level σ. The above linear methods are not designed to handle spatially inhomogeneous functions with low regularity. For such functions one usually relies upon nonlinear thresholding or nonlinear shrinkage methods.
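The linear shrinkage rule above is a one-liner; a sketch (the function name is mine), showing that finer scales (larger j) are shrunk more heavily for a given smoothness index s:

```python
import numpy as np

def linear_shrink(d_hat, j, lam, s):
    """Linear shrinkage of level-j wavelet coefficients:
    d_tilde = d_hat / (1 + lam * 2^(2*j*s)).
    Larger j (finer scale) or larger lam gives heavier shrinkage."""
    return np.asarray(d_hat, dtype=float) / (1.0 + lam * 2.0 ** (2.0 * j * s))

coarse = float(linear_shrink(1.0, 2, 0.5, 1.0))  # mild shrinkage at a coarse level
fine = float(linear_shrink(1.0, 4, 0.5, 1.0))    # much heavier shrinkage at a fine level
```

Unlike hard or soft thresholding, every coefficient survives with a reduced amplitude, which is why this linear rule smooths regular functions well but blurs spatially inhomogeneous ones.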
Donoho & Johnstone (1994, 1995, 1998) and Donoho, Johnstone, Kerkyacharian & Picard (1995) proposed a nonlinear wavelet estimator of g based on reconstruction that keeps the empirical scaling coefficients ĉ_{j0,k} in (2) intact and makes a more judicious selection of the empirical wavelet coefficients d̂_{jk} in (3).














They suggested extracting the significant wavelet coefficients by thresholding, in which wavelet coefficients are set to zero if their absolute value is below a certain threshold level λ ≥ 0, whose choice is discussed in more detail in Section 4.1.
Under this scheme, thresholded wavelet coefficients are obtained using either the hard or the soft thresholding rule, given respectively by
δ^H_λ(d̂_{jk}) = 0 if |d̂_{jk}| ≤ λ, and d̂_{jk} if |d̂_{jk}| > λ   (4)
and
δ^S_λ(d̂_{jk}) = 0 if |d̂_{jk}| ≤ λ, d̂_{jk} − λ if d̂_{jk} > λ, and d̂_{jk} + λ if d̂_{jk} < −λ   (5)
Thresholding allows the data themselves to decide which wavelet coefficients are significant: hard thresholding (a discontinuous function) is a 'keep or kill' rule, while soft thresholding (a continuous function) is a 'shrink or kill' rule.