Digital Audio Signal Processing Lecture-4: Noise Reduction Marc Moonen/Alexander Bertrand Dept. E.E./ESAT-STADIUS, KU Leuven marc.moonen@esat.kuleuven.be homes.esat.kuleuven.be/~moonen/ Overview • Spectral subtraction for single-micr. noise reduction – – – – Single-microphone noise reduction problem Spectral subtraction basics (=spectral filtering) Features: gain functions, implementation, musical noise,… Iterative Wiener filter based on speech signal model • Multi-channel Wiener filter for multi-micr. noise red. – Multi-microphone noise reduction problem – Multi-channel Wiener filter (=spectral+spatial filtering) • Kalman filter based noise reduction – Kalman filters – Kalman filters for noise reduction Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 2 Single-Microphone Noise Reduction Problem • Microphone signal is desired signal s[k] y[k ] s[k ] n[k ] desired signal noise contribution contribution y[k] desired signal estimate ? s [k ] noise signal(s) • Goal: Estimate s[k] based on y[k] • Applications: Speech enhancement in conferencing, handsfree telephony, hearing aids, … Digital audio restoration • Will consider speech applications: s[k] = speech signal Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 3 Spectral Subtraction Methods: Basics y[k ] s[k ] n[k ] • Signal chopped into `frames’ (e.g. 10..20msec), for each frame a frequency domain representation is (i-th frame) Yi ( ) Si ( ) Ni ( ) • However, speech signal is an on/off signal, hence some frames have speech +noise, i.e. Yi (w ) = Si (w )+ Ni (w ) framei Î {`speech + noise' frames} some frames have noise only, i.e. Yi ( ) • 0 Ni ( ) frame i {`noise - only' frames} A speech detection algorithm is needed to distinguish between these 2 types of frames (based on energy/dynamic range/statistical properties,…) Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 4 Spectral Subtraction Methods: Basics • Definition: () = average amplitude of noise spectrum ( ) E{ Ni ( ) } • Assumption: noise characteristics change slowly, hence estimate () by (long-time) averaging over (M) noise-only frames ˆ ( ) • 1 M Y ( ) i M noise-only frames Estimate clean speech spectrum Si() (for each frame), using corrupted speech spectrum Yi() (for each frame, i.e. short-time estimate) + estimated (): Sˆi () Gi ()Yi () based on `gain function’ Digital Audio Signal Processing Version 2014-2015 Gi ( ) f (Yi ( ), ˆ ( )) Lecture-4: Noise Reduction p. 5 Spectral Subtraction: Gain Functions Magnitude Subtraction Spectral Subtraction Wiener Estimation Maximum Likelihood Non-linear Estimation Ephraim-Malah = most frequently used in practice Digital Audio Signal Processing Version 2014-2015 see next slide Lecture-4: Noise Reduction p. 6 Spectral Subtraction: Gain Functions • Example 1: Ephraim-Malah Suppression Rule (EMSR) Gi ( ) with: • • 2 M [ ] SNR post ( ) SNR prio ( ) 1 SNR post SNR prio 1 SNR prio SNR prio .M SNR post 1 SNR prio ( 1 ) I ( ) I ( ) 0 1 2 2 2 Yi ( ) modified Bessel functions ˆ ( ) 2 2 Gi 1 ( )Yi 1 ( ) (1 - )max(SNR post - 1,0) ( ) 2 e 2 This corresponds to a MMSE (*) estimation of the speech spectral amplitude |Si()| based on observation Yi() ( estimate equal to E{ |Si()| | Yi() } ) assuming Gaussian a priori distributions for Si() and Ni() [Ephraim & Malah 1984]. Similar formula for MMSE log-spectral amplitude estimation [Ephraim & Malah 1985]. (*) minimum mean squared error Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 7 Spectral Subtraction: Gain Functions • Example 2: Magnitude Subtraction – Signal model: Yi ( ) Si ( ) N i ( ) Yi ( ) e j y ,i ( ) – Estimation of clean speech spectrum: Sˆi ( ) Y ( ) ˆ ( )e ˆ ( ) 1 Yi ( ) Yi ( ) j y ,i ( ) i Gi ( ) – PS: half-wave rectification Gi ( ) max( 0, Gi ( )) Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 8 Spectral Subtraction: Gain Functions • Example 3: Wiener Estimation 2 ˆ Si ( ) – Linear MMSE estimation: find linear filter Gi() to minimize MSE E Si ( ) Gi ( ).Yi ( ) – Solution: Gi ( ) ES i ( ).Yi ( ) Psy,i ( ) EYi ( ).Yi ( ) Pyy ,i ( ) <- cross-correlation in i-th frame <- auto-correlation in i-th frame Assume speech s[k] and noise n[k] are uncorrelated, then... Gi ( ) Pss,i ( ) Pyy ,i ( ) Pyy ,i ( ) Pnn,i ( ) Pyy ,i ( ) Yi ( ) ˆ ( ) 2 2 Yi ( ) 2 ˆ ( ) 2 1 2 Yi ( ) – PS: half-wave rectification Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 9 Spectral Subtraction: Implementation Y[n,i] y[k] Short-time analysis ˆ [ n, i ] Gain functions Sˆ [n, i ] Short-time synthesis sˆ[ k ] Short-time Fourier Transform (=uniform DFT-modulated analysis filter bank) K 1 Y [n, i ] h[k ] y[i.M k ]e j 2 kn / N = estimate for Y(n ) at time i (i-th frame) k 0 N=number of frequency bins (channels) n=0..N-1 M=downsampling factor K=frame length h[k] = length-K analysis window (=prototype filter) frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N=2M..3M) subband processing: Sˆ[ n, i ] G[ n, i ].Y [ n, i ] synthesis bank: matched to analysis bank (see DSP-CIS) Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 10 Spectral Subtraction: Musical Noise • Audio demo: car noise y[k ] magnitude subtraction sˆ[k ] • Artifact: musical noise What? Short-time estimates of |Yi()| fluctuate randomly in noise-only frames, resulting in random gains Gi() statistical analysis shows that broadband noise is transformed into signal composed of short-lived tones with randomly distributed frequencies (=musical noise) Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 11 Spectral Subtraction: Musical Noise • Solutions? - Magnitude averaging: replace Yi() in calculation of Gi() by a local average over frames Sˆi () G()Yi () average instantaneous - EMSR (p7) - augment Gi() with soft-decision VAD: Gi() P(H1 | Yi()). Gi() … probability that speech is present, given observation Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 12 Spectral Subtraction: Iterative Wiener Filter Example of signal model-based spectral subtraction… • Basic: Wiener filtering based spectral subtraction (p.9), with (improved) spectra estimation based on parametric models • Procedure: 1. Estimate parameters of a speech model from noisy signal y[k] 2. Using estimated speech parameters, perform noise reduction (e.g. Wiener estimation, p. 9) 3. Re-estimate parameters of speech model from the speech signal estimate 4. Iterate 2 & 3 Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 13 Spectral Subtraction: Iterative Wiener Filter pulse train voiced … gs … pitch period unvoiced u[k] all-pole filter 1 x M 1 m e j m white noise generator speech signal m 1 frequency domain: S ( ) gs M 1 m e j m U ( ) m 1 M time domain: s[k ] s[k m] g u[k ] m s m 1 α 1 M T Digital Audio Signal Processing = linear prediction parameters Version 2014-2015 Lecture-4: Noise Reduction p. 14 Spectral Subtraction: Iterative Wiener Filter For each frame (vector) y[m] (i=iteration nr.) 1. Estimate g s ,i and αi 1,i M ,i T 2. Construct Wiener Filter (p.9) Repeat until some error criterion is satisfied Pss ( ) G ( ) .. Pss ( ) Pnn ( ) with: • Pnn ( ) estimated during noise-only periods g s ,i • P ( ) ss 2 M 1 m , i e j m m 1 3. Filter speech frame y[m] Digital Audio Signal Processing Version 2014-2015 sˆ i [m] Lecture-4: Noise Reduction p. 15 Overview • Spectral subtraction for single-micr. noise reduction – – – – Single-microphone noise reduction problem Spectral subtraction basics (=spectral filtering) Features: gain functions, implementation, musical noise,… Iterative Wiener filter based on speech signal model • Multi-channel Wiener filter for multi-micr. noise red. – Multi-microphone noise reduction problem – Multi-channel Wiener filter (=spectral+spatial filtering) • Kalman filter based noise reduction – Kalman filters – Kalman filters for noise reduction Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 16 Multi-Microphone Noise Reduction Problem s[k ] speech source ? n[k ] noise source(s) (some) speech estimate microphone signals yi [k ] si [k ] ni [k ], i 1...M speech part Digital Audio Signal Processing s [k ] Version 2014-2015 noise part Lecture-4: Noise Reduction p. 17 Multi-Microphone Noise Reduction Problem Will estimate speech part in microphone 1 (*) (**) s[k ] ? s1[k ] n[k ] yi [k ] si [k ] ni [k ], i 1...M (*) Estimating s[k] is more difficult, would include dereverberation... (**) This is similar to single-microphone model (p.3), where additional microphones (m=2..M) help to get a better estimate Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 18 Multi-Microphone Noise Reduction Problem • Data model: Y( ) S( ) N( ) d( ).S ( ) N( ) Y1 ( ) H1 ( ) N1 ( ) Y ( ) H ( ) N ( ) 2 2 .S ( ) 2 Y ( ) H ( ) N ( ) M M M See Lecture-2 on multi-path propagation, with q left out for conciseness. Hm(ω) is complete transfer function from speech source position to m-the microphone Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 19 Multi-Channel Wiener Filter (MWF) • Data model: Y( ) d( ).S ( ) N( ) • Will use linear filters to obtain speech estimate (as in Lecture-2) M Sˆ1 ( ) Fm* ( ).Ym ( ) F H ( ).Y( ) m 1 • Wiener filter (=linear MMSE approach) 2 min F ( ) E{ S1 ( ) F ( ).Y( ) } H Note that (unlike in DSP-CIS) `desired response’ signal S1(w) is unknown here (!), hence solution will be `unusual’… Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 20 Multi-Channel Wiener Filter (MWF) • Wiener filter solution is (see DSP-CIS) 1 F( ) E{Y( ).Y ( )} E{Y( ).S1* ( )}. H autocorrelation ... crosscorrelation * (with E {S ( ). N 1 ( )} 0 ) E{Y( ).Y H ( )}1. E{Y( ).Y1* ( )} E{N( ).N1* ( )} compute during speech+noise periods compute during noise-only periods – All quantities can be computed ! – Special case of this is single-channel Wiener filter formula on p.9 Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 21 Multi-Channel Wiener Filter (MWF) • MWF combines spatial filtering (as in Lecture-2) with single-channel spectral filtering (as in single-channel noise reduction) if Y( ) Y1 ( ) Y ( ) 2 d( ) .S ( ) N( ) steering vector noise YM ( ) E{N ( ).N H ( )} Φ NN ( ) then… Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 22 Multi-Channel Wiener Filter (MWF) • …then it can be shown that 1 F( ) ( ) . Φ NN ( ).d( ) scalar • 1 Φ NN ( ).d ( ) represents a spatial filtering (*) Compare to superdirective & delay-and-sum beamforming (Lecture-2) – Delay-and-sum beamf. maximizes array gain in white noise field – Superdirective beamf. maximizes array gain in diffuse noise field – MWF maximizes array gain in unknown (!) noise field. MWF is operated without invoking any prior knowledge (steering vector/noise field) ! (the secret is in the voice activity detection… (explain)) (*) Note that spatial filtering can improve SNR, spectral filtering never improves SNR (at one frequency) Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 23 Multi-Channel Wiener Filter (MWF) • …then it can be shown that (continued) 1 F( ) ( ) . Φ NN ( ).d( ) scalar • ( ) represents an additional `spectral post-filter’ i.e. single-channel Wiener estimate (p.9), applied to output signal of spatial filter S ( ) .H1* ( ) 2 ( ) ... d H 1 ( ).Φ NN ( ).d( ) . S ( ) 1 2 (prove it!) Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 24 Multi-Channel Wiener Filter: Implementation • Implementation with short-time Fourier transform: see p.10 • Implementation with time-domain linear filtering: s[k ] y1[k ] s1[k ] y i [k ] s i [k ] n i [k ] y Ti [k ] yi [k ] n[k ] M s1[k ] y Ti [k ].fi i 1 Digital Audio Signal Processing yi [k 1] ... Version 2014-2015 yi [k L 1] filter coefficients Lecture-4: Noise Reduction p. 25 Multi-Channel Wiener Filter: Implementation • Implementation with time-domain linear filtering: 2 min f E{ s1[k ] y [k ].f } T f f1T f 2T ... f MT T y[k ] y1T [k ] y T2 [k ] ... y TM [k ] T Solution is… E{y[k ].y[k ] } .E{y[k ]. y [k ]} E{n[k ].n [k ]} T 1 T 1 f E{y[k ].y[k ] } .E{y[k ].s1[k ]} 1 1 compute during speech+noise periods compute during noise-only periods Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 26 Overview • Spectral subtraction for single-micr. noise reduction – – – – Single-microphone noise reduction problem Spectral subtraction basics (=spectral filtering) Features: gain functions, implementation, musical noise,… Iterative Wiener filter based on speech signal model • Multi-channel Wiener filter for multi-micr. noise red. – Multi-microphone noise reduction problem – Multi-channel Wiener filter (=spectral+spatial filtering) • Kalman filter based noise reduction – Kalman filters – Kalman filters for noise reduction Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 27 Kalman Filter 1/12 State space model of a time-varying discrete-time system ì ï x[k +1] = í = ï î y[k] process noise A[k].x[k]+ B[k].u[k]+ v[k] C[k].x[k]+ D[k].u[k]+ w[k] measurement noise with v[k] and w[k]: mutually uncorrelated, zero mean, white noises é v[k]] ù ú. éê v[k]H E{ê êë w[k] úû ë w[k]H é 0 ù ù ê V[k] ú úû} = W[k] úû êë 0 1 2 T 2 V[k] = V[k] .V[k] = Cholesky/square-root factorization Then: given A[k], B[k], C[k], D[k], V[k], W[k] and input/output-observations u[k],y[k], k=0,1,2,... then Kalman filter produces MMSE estimates of internal states x[k], k=0,1,... Digital Audio Signal Processing PS: will use shorthand notation here, i.e. Version 2014-2015 Lecture-4: Reduction xk, yk ,.. instead of x[k], Noise y[k],.. p. 28 Kalman Filter 2/12 Definition: x̂ k|l • `FILTERING’ = MMSE-estimate of xk using all available data up until time l = estimate x̂ k|k • `PREDICTION’ = estimate x̂ k|k-n , n > 0 • `SMOOTHING’ = estimate x̂ k|k+n, n > 0 Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 29 Kalman Filter 3/12 Initalization: =error covariance matrix ‘Conventional Kalman Filter’ operation @ time k (k=0,1,2,..) Given x̂ k|k-1 and corresponding error covariance matrix P k|k-1 , Compute x̂ k|k , P k|k and x̂k+1|k , Pk+1|k using uk, yk : Step 1: Measurement Update (produces ‘filtered’ estimate) (compare to standard RLS!) Step 2: Time Update (produces ‘1-step prediction’) Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 30 Kalman Filter 4/12 PS: ‘Standard RLS’ is a special case of ‘Conventional KF’ Digital Audio Signal Processing Version 2014-2015 Internal state vector is FIR filter coefficients vector, which is assumed to be time-invariant Lecture-4: Noise Reduction p. 31 Kalman Filter 5/12 State estimation @ time k corresponds to overdetermined set of linear equations, where vector of unknowns contains all previous state vectors : Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 32 Kalman Filter 6/12 Technical detail… State estimation @ time k corresponds to overdetermined set of linear equations, where vector of unknowns contains all previous state vectors : Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 33 Kalman Filter 7/12 State estimation @ time k corresponds to overdetermined set of linear equations, where vector of unknowns contains all previous state vectors : Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 34 Kalman Filter 8/12 Propagated from time k-1 to time k ‘Square-Root’ Kalman Filter ..hence requires only lower-right/lower part Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 35 Kalman Filter 9/12 Digital Audio Signal Processing Version 2014-2015 Propagated from time k-1 to time k ‘Square-Root’ Kalman Filter Lecture-4: Noise Reduction p. 36 Kalman Filter 10/12 Digital Audio Signal Processing Propagated from time k to time k+1 Propagated from time k-1 to time k ‘Square-Root’ Kalman Filter Version 2014-2015 Lecture-4: Noise Reduction p. 37 Kalman Filter 11/12 QRD- =QRDDigital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 38 Kalman Filter 12/12 can be derived from square-root KF equations is . : can be worked into measurement update eq. : can be worked into state update eq. [details omitted] Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 39 Kalman filter for Speech Enhancement • Assume AR model of speech and noise s[k ] n[k ] M m 1 N m 1 m m s[k m] g s u[k ] n[k m] g n w[k ] u[k], w[k] = zero mean, unit variance,white noise • Equivalent state-space model is… x[k 1] Ax[k ] v[k ] T c x[k ] y[k ] Digital Audio Signal Processing Version 2014-2015 y=microphone signal Lecture-4: Noise Reduction p. 40 Kalman filter for Speech Enhancement with: xT [k ] s[k M 1] s[k ] n[k N 1] n[k ] v[k ] G.u[k ] w[k ] T A s A 0 0 As 0 M Q G.GT N M 0 T g s ; C 0 0 1 0 0 1; G A n 0 1 0 M 1 0 0 0 ; An 0 0 1 1 N gTs 0 0 Digital Audio Signal Processing 0 0 0 1 1 1 0 N 1 g s ; gTn 0 0 Version 2014-2015 0 g n gn ; Lecture-4: Noise Reduction p. 41 Kalman filter for Speech Enhancement Iterative algorithm y[k] split signal in frames iterations estimate parameters gˆ s ,i ; ˆ m,i gˆ n ,i ; ˆm ,i Kalman Filter reconstruct signal sˆ i [m], nˆ i [m] sˆ[ k ] Disadvantages iterative approach: • complexity • delay Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 42 Kalman filter for Speech Enhancement Sequential algorithm iteration index sˆ[k 1 | k 1], time index (no iterations) D nˆ[k 1 | k 1] y[k ] ˆ m [k 1 | k 1], ˆm [k 1 | k 1], gˆ s [k 1 | k 1], gˆ n [k 1 | k 1] Digital Audio Signal Processing sˆ[k | k ], nˆ[k | k ] State Estimator: Kalman Filter sˆ, nˆ Parameters Estimator (Kalman Filter) D Version 2014-2015 ˆ m [k | k ], ˆm [k | k ], ˆ m , ˆm , gˆ s , gˆ n gˆ s [k | k ], gˆ n [k | k ] Lecture-4: Noise Reduction p. 43 CONCLUSIONS • Single-channel noise reduction – Basic system is spectral subtraction – Only spectral filtering, hence can only exploit differences in spectra between noise and speech signal: • noise reduction at expense of speech distortion • achievable noise reduction may be limited • Multi-channel noise reduction – Basic system is MWF, – Provides spectral + spatial filtering (links with beamforming!) • Iterative Wiener filter & Kalman filtering – Signal model based approach Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 44