Denoising by Wavelets

What is Denoising

Denoising refers to the manipulation of wavelet coefficients for noise reduction. Coefficient values that do not exceed a carefully selected threshold level are replaced by zero, and an inverse transform of the modified coefficients then recovers the denoised signal. Denoising by thresholding of wavelet coefficients is therefore a nonlinear (local) operation.

Noise Reduction by Wavelets and in Fourier Domains

Comments

Denoising is a unique feature of signal decomposition by wavelets. It differs from noise reduction by spectral manipulation and filtering: denoising manipulates wavelet coefficients that reflect the time/space behavior of a given signal. Denoising is an important step in signal processing. It is used as one of the steps in lossy data compression and in numerous noise reduction schemes in wavelet analysis.

Denoising by Wavelets

For denoising we apply thresholding to the wavelet coefficients, using judiciously chosen threshold levels. Ideally, each coefficient would need its own threshold level matched to its noise content. In the absence of information about the true signal, this is neither feasible nor necessary, since coefficients are somewhat correlated both within and across decomposition levels (secondary features of the wavelet transform).

True Signal Recovery

Thresholding modifies the empirical coefficients (the coefficients of the measured noisy signal) in an attempt to reconstruct a replica of the true signal. The reconstruction aims at a ‘best’ estimate of the true (noise-free) signal, where ‘best estimate’ is defined by the particular criterion chosen for threshold selection.

Thresholding

Mathematically, thresholding of the coefficients can be described by a transformation of the wavelet coefficients. The transform matrix is a diagonal matrix with elements 0 or 1.
A zero element sets the corresponding coefficient, which is at or below the given threshold, to zero, while an element equal to one leaves the corresponding coefficient unchanged: $\Delta = \mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_N)$ with $\delta_i \in \{0, 1\}$, $i = 1, \ldots, N$.

Hard or Soft Thresholding

Hard thresholding. Only wavelet coefficients with absolute values at or below the threshold level are affected: they are replaced by zero, and all others are kept unchanged.

Soft thresholding. Coefficients above the threshold level are also modified: their magnitudes are reduced by the threshold value. Donoho refers to soft thresholding as ‘shrinkage’, since it can be proven that the reduction in coefficient amplitudes by soft thresholding also results in a reduction of the signal level, hence a ‘shrinkage’.

Hard and Soft Thresholding

Mathematically, hard and soft thresholding are described as

Hard threshold: $w_m = w$ if $|w| > th$, $w_m = 0$ if $|w| \le th$
Soft threshold: $w_m = \mathrm{sign}(w)(|w| - th)$ if $|w| > th$, $w_m = 0$ if $|w| \le th$

Global and Local Thresholding

Thresholding can be done globally, i.e. a single threshold level is applied across all scales, or it can be scale-dependent, where each scale is treated separately. It can also be ‘zonal’, in which the given function is divided into several segments (zones) and a different threshold level is applied in each segment.

Additive Noise Model and Nonparametric Estimation Problem

Additive noise model. Noise is superimposed on the data additively: $f(t) = s(t) + n(t)$. Here $n(t)$ is a random variable assumed to be white Gaussian, $N(0, \sigma^2)$, and $s(t)$ is a signal that is not necessarily a random variable. The original signal can be described in a given basis whose expansion coefficients are unknown: $s_e(t) = \sum_i \alpha_i \varphi_i(t)$, where $s_e(t)$ is the estimate of the true signal $s(t)$. Note that the estimate $s_e(t)$ is described by a set of spanning functions $\varphi_i(t)$, with coefficients chosen to minimize the L2 error of the approximation, $\|s(t) - s_e(t)\|_2$. As such, denoising is considered a nonparametric estimation problem.
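The hard and soft thresholding rules described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the implementation of any toolbox: the function names and the sample coefficients are hypothetical, and a coefficient exactly at the threshold is set to zero, following the verbal description of hard thresholding.

```python
def hard_threshold(coeffs, th):
    """Keep coefficients with |w| > th unchanged; set the rest to zero."""
    return [w if abs(w) > th else 0.0 for w in coeffs]


def soft_threshold(coeffs, th):
    """Zero coefficients with |w| <= th; shrink the rest toward zero by th."""
    out = []
    for w in coeffs:
        if abs(w) > th:
            sign = 1.0 if w > 0 else -1.0
            out.append(sign * (abs(w) - th))
        else:
            out.append(0.0)
    return out


coeffs = [3.0, -0.5, 1.0, -2.5, 0.25]   # hypothetical example coefficients
print(hard_threshold(coeffs, 1.0))       # [3.0, 0.0, 0.0, -2.5, 0.0]
print(soft_threshold(coeffs, 1.0))       # [2.0, 0.0, 0.0, -1.5, 0.0]
```

Both rules zero the same coefficients; they differ only in whether the surviving coefficients are kept as they are (hard) or reduced in magnitude by the threshold (soft).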
Properties of Wavelets Utilized in Denoising

Sparse representation. The wavelet expansion of a class of functions that exhibit a sufficient degree of regularity and smoothness results in a pattern of coefficients that can often be divided into two classes: 1) a few large-amplitude coefficients and 2) a large number of small coefficients. This property allows compaction for compression and efficient feature extraction.

Wavelet Properties and Denoising

Decorrelation. Wavelets are referred to as decorrelators because the wavelet expansion of a given signal yields coefficients that often exhibit a lower degree of correlation among themselves than the signal samples do.

Orthogonality. Intuitively, under a standard DWT of a signal, this can be explained by the orthogonality of the expansion and of the corresponding basis functions.

i.i.d. assumption. Under certain assumptions, the coefficients in the highest frequency band can be considered independent and identically distributed.

[Figure: Examples of signal compaction and decorrelation in the coefficient domain. Panels show the signal and its coefficients at the high frequency band for no knock (cycle #6), mild knock (cycle #8) and hard knock (cycle #1).]

[Figure: Signal decorrelation in the coefficient domain. Coefficients normalized to the coefficient L2 norm at denoising stages 1 and 2 (Den1, Den2) for no knock (#11) and hard knock (#7).]

Why Decorrelation by Wavelets

Coefficients carry signal information in subspaces that are spanned by the basis functions of the given subspace.
Such bases can be orthogonal (uncorrelated) to each other, so the coefficients tend to be uncorrelated with each other. Segmentation of the signal by wavelets introduces decorrelation in the coefficient domain.

White Noise and Wavelet Expansion

No wavelet function can model the noise component of a given signal (there is no match in waveform for white noise). White noise has a spectral distribution that spreads across all frequencies, and there is no match (correlation) between a given wavelet and white noise. As such, an expansion of the noise component of the signal results in small wavelet coefficients that are distributed across all the details.

Search for Noise in Small Coeffs

$S(t) = x(t) + n(t)$

Because the white noise component expands into small wavelet coefficients distributed across all the details, we search for the white noise $n(t)$ among the small DWT coefficients, which most often reside in the details.

White Noise and High Frequency Details

At the high frequency band d1, the number of coefficients is the largest under the DWT or other similar decomposition architectures. As such, a large portion of the energy of the noise component of a signal resides in the coefficients of the high frequency details d1. The basis functions at d1 are short, and there is high decorrelation at this level (white noise).

White Noise Model and Statistically i.i.d. Coeffs

The decorrelation property of the wavelet transform at the coefficient level can be examined in terms of the statistical properties of the wavelet coefficients. At one extreme, the coefficients may be approximated as a realization of a purely random stochastic process of i.i.d. (independent and identically distributed) random variables. Under this assumption, every coefficient is considered statistically independent of its neighboring coefficients both at the intra-scale (same scale) and inter-scale (across the scales) level.
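The idea that a regular signal contributes little to the finest details, while white noise spreads into them, can be illustrated with one level of an orthonormal Haar transform. This is only a sketch under stated assumptions: the Haar wavelet, the helper name `haar_step`, the piecewise-constant test signal and the noise level 0.1 are all chosen here for illustration and do not come from the notes.

```python
import math
import random


def haar_step(x):
    """One level of the orthonormal Haar DWT for an even-length list.

    Returns (approximation, detail) coefficient lists.
    """
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    r = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail


random.seed(0)                                     # reproducible illustration
signal = [4.0] * 8 + [2.0] * 8                     # hypothetical piecewise-constant signal
noise = [random.gauss(0.0, 0.1) for _ in signal]   # white Gaussian noise, sigma = 0.1
noisy = [s + n for s, n in zip(signal, noise)]

_, d_signal = haar_step(signal)
_, d_noise = haar_step(noise)
_, d_noisy = haar_step(noisy)

# The clean signal has zero detail coefficients here, so the details of the
# noisy signal equal the detail coefficients of the noise (up to rounding).
print(max(abs(d) for d in d_signal))
print(max(abs(a - b) for a, b in zip(d_noisy, d_noise)))
```

In this constructed case the d1 band of the noisy signal contains only noise, which is why the d1 coefficients are a natural place to estimate the noise level.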
White Noise Model and Multiblock Denoising

However, in practice there often exists a certain degree of interdependence among the coefficients, and we need to consider noise models with correlated coefficients (such as Markov models). In other noise models, blocks of coefficients rather than single coefficients are treated as statistically independent. In Matlab, multi-block denoising at each level is considered.

Main Task

The main task in denoising by wavelets is the identification of the underlying statistical distribution of the coefficients attributed to noise. For the signal, no structural assumption is made, since in general it is assumed to be unknown. However, if additional information about the signal is available, it can be used to improve the estimation results.

The denoising problem is treated as a statistical estimation problem. The task is followed by the evaluation of the variance and standard deviation (STD) of the noise model, which are used as metrics for thresholding. A priori distributions may be imposed on the coefficients using a Bayesian estimation approach, after which denoising is treated as a parametric estimation problem.

Alternative Models for Noise Reduction: Basic Considerations

Additive noise model. The basic modeling structure utilizes the additive noise model stated earlier: $x(i) = s(i) + n(i)$, $i = 1, 2, \ldots, N$, where $N$ is the signal length, $x(i)$ are the measurements, $s(i)$ are the true signal values (unknown) and $n(i)$ are the noise components (unknown). The noise $n(i)$ is assumed to be zero-mean white Gaussian, N(0,1) up to a scale factor; its standard deviation is to be estimated.

Additive Noise Model and Linearity of Wavelet Transform

Under an orthogonal decomposition and the additive noise model, linearity of the wavelet transform ensures that the statistical distribution of the measurements and of the white noise remains unchanged in the coefficient domain. Under an orthogonal decomposition, each coefficient decomposes into a component attributed to the true signal values $s(i)$ and a component attributed to the noise $n(i)$, as follows.
$c_j = u_j + d_j$, $j = 1, 2, \ldots, n$

Orthogonal vs Biorthogonal

In vector form, $C = U + D$, where $C$, $U$ and $D$ are the vectors of empirical wavelet coefficients, true coefficient values (unknown) and noise content of the coefficients, respectively. Note that the additive noise model at the coefficient level preserves the statistical properties of the signal and noise, as stated above, only under an orthogonal transformation, where the Parseval relationship holds. This does not carry over to a biorthogonal transform: there, white noise at the signal level shows up as colored noise, since the coefficients are no longer i.i.d. but correlated with each other.

Principal Considerations

Assumption of zero-mean Gaussian. Under the additive noise model and the i.i.d. assumption for the wavelet coefficients, we consider a zero-mean Gaussian distribution in the coefficient domain. Mean centering of the data can always be done to ensure the zero-mean assumption used above.

Main Considerations

Preservation of smoothness. It can be proved that under soft thresholding, the smoothness of the original signal remains unchanged with high probability under a variety of smoothness measures (such as Hölder or Sobolev smoothness measures). Smoothness may be defined, for example, by requiring the integral of the squared m-th derivative of a given function to be finite. This property, together with the structural correlation of wavelet coefficients at consecutive scales, is used in the wavelet-based zero-tree compression algorithm.

Shrinkage. Under soft thresholding (a nonlinear operation at the coefficient level), it can be shown that $|x_i^d| \le |x_i|$, where $x_i^d$ is a component of the denoised signal; i.e. denoising reduces all the coefficients and produces shrinkage at the signal level as well.
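The statement that white noise stays white under an orthogonal transform, but becomes colored under a biorthogonal one, can be made explicit with a short derivation. It is a sketch that assumes the matrix form of the transform, $C = WX$ with $X = S + \sigma n$ and $E[n n^{T}] = I$, which these notes use for the risk function.

```latex
\begin{aligned}
C &= WX = W(S + \sigma n) = WS + \sigma W n = U + D, \\
\operatorname{Cov}(D) &= \sigma^{2}\, W\, E[n n^{T}]\, W^{T} = \sigma^{2}\, W W^{T} = \sigma^{2} I
  && \text{(orthogonal } W:\ W^{T} = W^{-1}\text{)}, \\
\operatorname{Cov}(D) &= \sigma^{2}\, W W^{T} \neq \sigma^{2} I
  && \text{(biorthogonal } W:\ W W^{T} \neq I \text{ in general)}.
\end{aligned}
```

Since $D$ is a linear transform of a Gaussian vector it is itself Gaussian, so in the orthogonal case uncorrelated noise coefficients are also independent, which is what the i.i.d. assumption requires.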
Denoising Problem

The denoising problem is mainly the estimation of the STD and the threshold level. The basic problem in noise reduction under Gaussian white noise is centered on estimating the standard deviation of the Gaussian noise, which is then used to determine a suitable threshold.

Alternative Considerations

White noise model: global (universal) thresholding. Assume the coefficients at the highest frequency details give a good estimate of the noise content. A white noise model is imposed on the coefficients at the highest frequency detail level d1. An estimate of the standard deviation at the d1 level is then used to arrive at a suitable threshold level for coefficient thresholding at all levels. This approach is a global thresholding, applied to all detail coefficients.

Level Dependent Thresholds

Nonwhite (colored) noise model. Under this model, a white noise model is still imposed on the detail coefficients, but the threshold levels are level (scale) dependent: a Gaussian white noise model is imposed on the detail coefficients using a standard deviation and threshold level estimated at each level separately.

Comments on Estimation Problem, Near Optimality under other Optimality Criteria

Wavelet denoising (WaveShrink) utilizes a nonparametric function estimation approach for noise thresholding. It has been shown that, statistically, denoising is asymptotically near optimal over a wide range of optimality criteria and for a large class of functions found in scientific and engineering applications (see ref. by Donoho).

Inaccuracy of Assuming Gaussian Distribution N(0,1), Result Evaluations

The assumption of a Gaussian distribution at the d1 level may not always be valid. The distribution of the coefficients at d1 often exhibits a long tail compared with a standard Gaussian (a peaky distribution). This can also be observed in the case of sparsely distributed large-amplitude coefficients or outliers.
Under such conditions, the application of global thresholding may need to be revised, and the results of thresholding should be examined in light of actual data analysis and the performance of denoising.

[Fig. 2: Peaky Gaussian-like pdf of the coefficients with long tail ends.]

Signal Estimation and Threshold Selection Rules

Statistical estimation theory is applied to the probability distribution of the wavelet coefficients, with criteria for estimating statistical parameters and selecting threshold levels. A loss function, referred to as the ‘risk function’, is defined first. For the loss function we often use the L2 norm of the error, i.e. the variance of the estimation error, which is the difference between the estimated value and the actual unknown value.

Risk (Loss Function)

We use the expected value of the error as the loss function, since we are dealing with a noisy signal, which is a random variable and is therefore described in terms of expected values: $R(X, X') = E\|X - X'\|^2$. Minimization of the risk function results in an estimate of the variance of the coefficients.

Here $X$ is the actual (true) value of the signal (or coefficients) to be estimated and $X'$ is an estimate of $X$. Since the noise component is assumed to be zero-mean Gaussian, the difference is a measure of the error under the additive noise model and the given risk function. It is a measure of the energy of the noise, i.e. $\sum_k [n(k)]^2$. Thus the optimization procedure defined above attempts to reduce the energy of the signal $X$ by an amount equal to the energy of the noise, thereby compensating for the noise in the sense of L2 norms.

Minimization of the Risk Function at Coefficient Level

Under an orthogonal decomposition, minimization of the risk function at the signal level can equivalently be defined at the coefficient level: $R(\hat{X}, X) = E\|\hat{X} - X\|^2 = E\|W^{-1}(\hat{C} - C)\|^2 = E\|\hat{C} - C\|^2$, where $\hat{C}$ is the estimate of the true coefficient values.
We have used the additive noise model and the wavelet transform in matrix form: $X = S + \sigma n$, $C = WX$, $X = W^{-1}C$. Accordingly, minimizing the risk function at the coefficient level is equivalent to estimating the true value of the signal.

Use of Minimax Rule

One ‘best’ estimate is obtained using the minimax rule: $\min\max R(\hat{X}, X) = \inf_{\hat{X}} \sup_{X} R(\hat{X}, X)$. Under the minimax rule the worst-case condition, $\sup R(\hat{X}, X)$, is considered, and the objective is to minimize the risk under that worst case (i.e. obtain min max R).

Global/Universal Thresholding Rule

Under the i.i.d. assumption for the wavelet coefficients and Gaussian white noise, one can show that under soft thresholding the actual risk is within a $\log(n)$ factor of the ideal risk, where the error is minimal (on average). This results in the following threshold value, referred to as universal thresholding, which minimizes the maximum risk as defined above: $Th = \sigma\sqrt{2\log n}$, $\sigma = \mathrm{MAD}/0.6745$, where MAD is the ‘median absolute deviation’ of the finest-level coefficients, $\mathrm{median}(\{|d_{J-1,k}| : k = 0, 1, \ldots, 2^{J-1} - 1\})$.

Ref: Donoho D.L., “De-noising by soft-thresholding”, IEEE Trans. on Information Theory, Vol. 41, No. 3, May 1995, pp. 613-627.

Universal Thresholding Rule

The basis for the above threshold rule is the i.i.d. assumption for a set of random variables $X_1, \ldots, X_n$ with distribution $N(0, 1)$. Under this assumption, the following holds for the maximum absolute value of the coefficients: $P\{\max_{1 \le i \le n} |X_i| > \sqrt{2\log n}\} \to 0$ as $n \to \infty$. Note that $X_i$ refers to noise.

Therefore, under universal thresholding applied to wavelet coefficients, with high probability every sample in the wavelet transform (i.e. coefficient) in which the underlying function is exactly zero will be estimated as zero.

Universal Thresholding Rule in WP

The universal threshold estimation rule, when applied to wavelet packets, is to be adjusted to the length of the decomposition, which is $n\log(n)$.
Threshold is then $Th = \sigma\sqrt{2\log(n\log n)}$.

Level Dependent Thresholding

In level dependent thresholding, thresholds are rescaled at each level to arrive at a new estimate corresponding to the standard deviation of the wavelet coefficients at that level. We consider a white noise model and a Gaussian distribution for the coefficients at each level. This is referred to as ‘mln’ (multilevel noise model) in the Matlab toolbox. The threshold level is determined as $Th(j, n) = \sigma_j\sqrt{2\log n_j}$, $\sigma_j = \mathrm{MAD}_j/0.6745$.

Stein Unbiased Risk Estimator (SURE)

A criterion referred to as the Stein Unbiased Risk Estimator, abbreviated as SureShrink, utilizes statistical estimation theory in which an unbiased estimate of the loss function is derived. Suppose $X_1, \ldots, X_s$ are independent $N(\mu_i, 1)$, $i = 1, \ldots, s$, random variables. The problem is to estimate the mean vector $\mu = (\mu_1, \ldots, \mu_s)$ with minimum L2-risk. Stein states that the L2-loss can be estimated unbiasedly for any estimator $\hat{\mu}$ that can be written as $\hat{\mu}(X) = X + g(X)$, where the function $g = (g_1, \ldots, g_s)$ is weakly differentiable.

SURE Estimator

Under the SURE criterion, the following is considered as an estimate of the loss $\|\hat{\mu}_{th}(x) - \mu\|^2$:

$\mathrm{SURE}(th; x) = s - 2\,\#\{i : |x_i| \le th\} + \sum_{i=1}^{s} \min(|x_i|, th)^2$,

where $\hat{\mu}_{th}(x)$ is a fixed estimate of the mean of the coefficients and $\#B$ denotes the cardinality of a set $B$. It can be shown that $\mathrm{SURE}(th; X)$ is an unbiased estimate of the L2-risk, i.e. $E_\mu\|\hat{\mu}_{th}(X) - \mu\|^2 = E_\mu\,\mathrm{SURE}(th; X)$. The threshold level is chosen at the minimum of the SURE loss function: $Th_S = \arg\min_{th} \mathrm{SURE}(th; x)$.

Other Thresholding Rules

Fixed Form thresholding is the same as universal thresholding: $Th = \sigma\sqrt{2\log n}$, $\sigma = \mathrm{MAD}/0.6745$.

Minimax refers to finding the minimum of the maximum mean square error obtained for the worst function in a given set.

Rigorous SURE Denoising

Rigorous SURE (Stein’s Unbiased Risk Estimate) is a threshold-based method whose threshold depends on n, the number of samples in the signal (i.e. coefficients).

Heuristic SURE

Heuristic SURE is a combination of Fixed Form and Rigorous SURE (for details refer to the Matlab Helpdesk).

Results of Denoising Application on CDMA Signal

At SNR = 3 dB, the MSE between the original signal and the noisy signal is 0.99. The following table shows the MSE after denoising:

Threshold rule, noise model    Haar   Bior3.1   Db10   Coif5
Fixed form, white noise        0.55   0.64      0.46   0.46
RigSURE, white noise           0.36   0.41      0.27   0.27
HeurSURE, white noise          0.42   0.41      0.27   0.28
Minimax, white noise           0.46   0.46      0.34   0.33
Minimax, nonwhite noise        0.53   1.09      0.44   0.32

Observations on Denoising Applied on CDMA Signals

It was found that soft thresholding gives better performance (in terms of SNR) than hard thresholding in this project. Since the noise model used in this project is WGN, selecting the correct noise type (white noise) also gives better results. Db10 and Coif5 outperform Haar and Bior3.1 in denoising because they have a higher number of vanishing moments. At SNR = -3 dB, Db10 and Coif5 with soft thresholding and the rigorous SURE threshold selection rule give very good denoising performance: the MSE is brought from 3.9 to approximately 0.7. In general, Fixed Form and Heuristic SURE are more aggressive in removing noise. Rigorous SURE and Minimax are more conservative, and they give better results in this project because some details of the CDMA signal lie in the noise range.

Denoising in MATLAB

In Matlab, the command ‘wden’ is used for denoising: Sig=wden(s,tptr,sorh,scal,n,wav), where s = signal; tptr = threshold selection rule; sorh = soft or hard thresholding (‘s’ or ‘h’); n = decomposition level; wav = wavelet name. The noise scaling is set by scal: scal=‘one’ is the original model (white noise with unscaled noise); scal=‘sln’ uses a first estimate of the noise variance based on the 1st level details (this uses the basic model); scal=‘mln’ is for nonwhite noise, i.e. scale dependent noise thresholding.
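The fixed-form (universal) and SURE threshold rules described above can be sketched in plain Python. This is an illustrative sketch, not the code behind ‘wden’: the function names are hypothetical, the MAD is taken as the median of the absolute coefficient values as in the formula given in these notes, and the SURE search assumes the coefficients have already been divided by the noise standard deviation so that the noise has unit variance.

```python
import math
import statistics


def mad_sigma(detail):
    """Noise STD estimate: median of |coefficients| divided by 0.6745."""
    return statistics.median([abs(d) for d in detail]) / 0.6745


def universal_threshold(detail, n):
    """Fixed-form (universal) threshold sigma * sqrt(2 log n)."""
    return mad_sigma(detail) * math.sqrt(2.0 * math.log(n))


def sure(th, x):
    """SURE(th; x) = s - 2 #{i: |x_i| <= th} + sum_i min(|x_i|, th)^2.

    Assumes unit noise variance (rescale x by sigma beforehand).
    """
    s = len(x)
    count = sum(1 for xi in x if abs(xi) <= th)
    return s - 2 * count + sum(min(abs(xi), th) ** 2 for xi in x)


def sure_threshold(x):
    """Threshold minimizing SURE over the candidates {0} and {|x_i|}.

    Between two consecutive sorted |x_i| the count term is constant and the
    sum term grows with th, so the minimum lies at one of these candidates.
    """
    candidates = [0.0] + sorted(abs(xi) for xi in x)
    return min(candidates, key=lambda th: sure(th, x))


x = [0.1, -0.2, 0.3, 5.0]          # hypothetical unit-variance coefficients
print(sure_threshold(x))           # 0.3
```

In this toy example the SURE rule picks a threshold just large enough to remove the three small coefficients while keeping the single large one, whereas the universal threshold depends only on the noise estimate and the number of coefficients.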
[For further details, please refer to the Matlab wavemenu toolbox.]

Artifacts at Points of Singularity and Stationary Wavelet Transform

The Gaussian noise model for the coefficients does not fully agree with the peaky shape of the distribution at the d1 level. Gibbs-type oscillations and artifacts are also observed at points of singularity, though they are less prominent than true Gibbs oscillations. To correct such phenomena, the stationary wavelet transform is used and has been incorporated in the Matlab toolbox. In the stationary wavelet transform, the DWT is applied to all circular shifts of the signal of length N; for each shift the coefficients are evaluated and the threshold levels are determined. The average of all N denoised signals is used as the final denoised signal. The only limitation here is that the signal length must be a multiple of $2^J$. Denoising using the SWT often gives more conservative noise reduction than standard soft thresholding using ‘Fixed Form’ or RigSURE.

[Figure: Illustrative examples of denoising. Original signal, denoising stage 1 and denoising stage 2 for no knock (cycle #11) and hard knock (cycle #7).]

[Figure: Spectrum of original and denoised signals (frequency in Hz) for knock (#8), no knock (#11) and mild knock (#14). Spectrum before denoising (green) and after denoising (blue).]

[Figure: 2D denoising, an illustration.]

Hidden Markov Model for Denoising

Please refer to class notes posted on site.

THE CLASSICAL APPROACH TO WAVELET THRESHOLDING

A wavelet-based linear approach, a simple extension of the spline smoothing estimation methods described by Wahba (1990), is the one suggested by Antoniadis (1996) and independently by Amato & Vuza (1997).
Of non-threshold type, this method is appropriate for estimating relatively regular functions. Assuming that the smoothness index $s$ of the function $g$ to be recovered is known, the resulting estimator is obtained by estimating the scaling coefficients $c_{j_0 k}$ by their empirical counterparts $\hat{c}_{j_0 k}$ and by estimating the wavelet coefficients $d_{jk}$ via a linear shrinkage

$$\tilde{d}_{jk} = \frac{\hat{d}_{jk}}{1 + \lambda 2^{2js}},$$

where $\lambda > 0$ is a smoothing parameter. The parameter $\lambda$ is chosen by cross-validation in Amato & Vuza (1997), while the choice of $\lambda$ in Antoniadis (1996) is based on risk minimization and depends on a preliminary consistent estimator of the noise level $\sigma$.

The above linear methods are not designed to handle spatially inhomogeneous functions with low regularity. For such functions one usually relies upon nonlinear thresholding or nonlinear shrinkage methods. Donoho & Johnstone (1994, 1995, 1998) and Donoho, Johnstone, Kerkyacharian & Picard (1995) proposed a nonlinear wavelet estimator of $g$ based on reconstruction by keeping the empirical scaling coefficients $\hat{c}_{j_0 k}$ in (2) intact and from a more judicious selection of the empirical wavelet coefficients $\hat{d}_{jk}$ in (3). They suggested the extraction of the significant wavelet coefficients by thresholding, in which wavelet coefficients are set to zero if their absolute value is below a certain threshold level, $\lambda \ge 0$, whose choice we discuss in more detail in Section 4.1. Under this scheme we obtain thresholded wavelet coefficients using either the hard or soft thresholding rule, given respectively by

$$\delta^{H}_{\lambda}(\hat{d}_{jk}) = \begin{cases} 0 & \text{if } |\hat{d}_{jk}| \le \lambda \\ \hat{d}_{jk} & \text{if } |\hat{d}_{jk}| > \lambda \end{cases} \qquad (4)$$

and

$$\delta^{S}_{\lambda}(\hat{d}_{jk}) = \begin{cases} 0 & \text{if } |\hat{d}_{jk}| \le \lambda \\ \hat{d}_{jk} - \lambda & \text{if } \hat{d}_{jk} > \lambda \\ \hat{d}_{jk} + \lambda & \text{if } \hat{d}_{jk} < -\lambda. \end{cases} \qquad (5)$$

Thresholding allows the data itself to decide which wavelet coefficients are significant; hard thresholding (a discontinuous function) is a ‘keep’ or ‘kill’ rule, while soft thresholding (a continuous function) is a ‘shrink’ or ‘kill’ rule.
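For contrast with the ‘keep or kill’ and ‘shrink or kill’ rules, the linear shrinkage rule given earlier in this section can be sketched as follows. This is only an illustration: the function name and the numeric values of $\lambda$ and $s$ are hypothetical, and it assumes that a larger level index $j$ denotes a finer scale.

```python
def linear_shrink(d_hat, j, lam, s):
    """Linear shrinkage d_hat / (1 + lam * 2**(2*j*s)) for a level-j coefficient."""
    return d_hat / (1.0 + lam * 2.0 ** (2 * j * s))


# The same empirical coefficient value is attenuated more strongly at finer
# levels, regardless of its size; no coefficient is set exactly to zero.
for j in range(4):
    print(j, linear_shrink(1.0, j, lam=0.5, s=1.0))
```

Because the attenuation factor depends only on the level and not on the coefficient magnitude, this rule cannot single out a few large coefficients, which is consistent with the remark that linear methods are not suited to spatially inhomogeneous functions.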