Digital Audio Signal Processing
Lecture-3: Noise Reduction
Version 2015-2016

Marc Moonen
Dept. E.E./ESAT-STADIUS, KU Leuven
marc.moonen@esat.kuleuven.be
homes.esat.kuleuven.be/~moonen/

Overview (p. 2)
• Spectral subtraction for single-microphone noise reduction
  – Single-microphone noise reduction problem
  – Spectral subtraction basics (= spectral filtering)
  – Features: gain functions, implementation, musical noise, …
  – Iterative Wiener filter based on speech signal model
• Multi-channel Wiener filter for multi-microphone noise reduction
  – Multi-microphone noise reduction problem
  – Multi-channel Wiener filter (= spectral + spatial filtering)
• Kalman filter based noise reduction
  – Kalman filters
  – Kalman filters for noise reduction

Single-Microphone Noise Reduction Problem (p. 3)
• Microphone signal is
  \[ y[k] = s[k] + n[k] \]
  with $s[k]$ the desired signal contribution and $n[k]$ the noise contribution
  [Figure: desired signal and noise signal(s) reach the microphone; $y[k]$ is processed into a desired signal estimate $\hat{s}[k]$]
• Goal: estimate $s[k]$ based on $y[k]$
• Applications:
  – Speech enhancement in conferencing, hands-free telephony, hearing aids, …
  – Digital audio restoration
• Will consider speech applications: $s[k]$ = speech signal

Spectral Subtraction Methods: Basics (p. 4)
• Signal model: $y[k] = s[k] + n[k]$
• Signal chopped into `frames' (e.g. 10..20 msec); for each frame a frequency domain representation is
  \[ Y_i(\omega) = S_i(\omega) + N_i(\omega) \qquad (i\text{-th frame}) \]
• However, speech signal is an on/off signal, hence
  – some frames have speech + noise, i.e.
    \[ Y_i(\omega) = S_i(\omega) + N_i(\omega), \qquad \text{frame } i \in \{\text{`speech + noise' frames}\} \]
  – some frames have noise only, i.e.
    \[ Y_i(\omega) = 0 + N_i(\omega), \qquad \text{frame } i \in \{\text{`noise-only' frames}\} \]
• A speech detection algorithm is needed to distinguish between these 2 types of frames (based on energy, dynamic range, statistical properties, …)
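The framing and the speech/noise-only classification described on p. 4 can be sketched as follows. This is only a minimal illustration: the slides say that a speech detection algorithm is needed but do not prescribe one, so the energy threshold used here is an assumption, and `frame_signal`, `frame_energy`, `classify_frames` and the factor 4.0 are hypothetical names and values chosen for the example.

```python
import math
import random

def frame_signal(y, frame_len, hop):
    """Chop y into overlapping frames of frame_len samples, hop samples apart."""
    return [y[start:start + frame_len]
            for start in range(0, len(y) - frame_len + 1, hop)]

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)

def classify_frames(frames, factor=4.0):
    """Assumed detector: a frame is 'speech+noise' if its energy exceeds
    factor times the lowest frame energy (taken as the noise floor)."""
    energies = [frame_energy(f) for f in frames]
    noise_floor = min(energies)
    return ["speech+noise" if e > factor * noise_floor else "noise-only"
            for e in energies]

# Toy on/off signal: noise everywhere, a sinusoidal "speech" burst in the middle.
random.seed(0)
fs = 8000                                                # sampling rate in Hz
y = [0.05 * random.gauss(0.0, 1.0) for _ in range(fs)]   # 1 s of noise
for k in range(3000, 5000):
    y[k] += math.sin(2 * math.pi * 200 * k / fs)

frames = frame_signal(y, frame_len=160, hop=80)          # 20 msec frames, 50% overlap
labels = classify_frames(frames)
```

A practical detector would also use dynamic range or statistical properties, as the slide indicates; the pure energy rule is only the simplest member of that family.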
Spectral Subtraction Methods: Basics (p. 5)
• Definition: $\mu(\omega)$ = average amplitude of noise spectrum
  \[ \mu(\omega) = E\{ |N_i(\omega)| \} \]
• Assumption: noise characteristics change slowly, hence estimate $\mu(\omega)$ by (long-time) averaging over (M) noise-only frames
  \[ \hat{\mu}(\omega) = \frac{1}{M} \sum_{M \text{ noise-only frames}} |Y_i(\omega)| \]
• Estimate clean speech spectrum $S_i(\omega)$ (for each frame), using corrupted speech spectrum $Y_i(\omega)$ (for each frame, i.e. short-time estimate) + estimated $\mu(\omega)$:
  \[ \hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega) \]
  based on `gain function'
  \[ G_i(\omega) = f\big(Y_i(\omega), \hat{\mu}(\omega)\big) \]

Spectral Subtraction: Gain Functions (p. 6)
[Slide content (figure) not recovered in this text version]

Spectral Subtraction: Gain Functions (p. 7)
• Example 1: Ephraim-Malah Suppression Rule (EMSR)
  \[ G_i(\omega) = \frac{\sqrt{\pi}}{2} \sqrt{ \left( \frac{1}{SNR_{post}} \right) \left( \frac{SNR_{prio}}{1 + SNR_{prio}} \right) } \; M\!\left[ SNR_{post} \left( \frac{SNR_{prio}}{1 + SNR_{prio}} \right) \right] \]
  with:
  \[ M[\theta] = e^{-\theta/2} \left[ (1 + \theta)\, I_0\!\left(\frac{\theta}{2}\right) + \theta\, I_1\!\left(\frac{\theta}{2}\right) \right] \qquad (I_0, I_1 = \text{modified Bessel functions}) \]
  \[ SNR_{post}(\omega) = \frac{|Y_i(\omega)|^2}{\hat{\mu}(\omega)^2} \]
  \[ SNR_{prio}(\omega) = (1 - \alpha) \max(SNR_{post} - 1,\, 0) + \alpha\, \frac{|G_{i-1}(\omega)\, Y_{i-1}(\omega)|^2}{\hat{\mu}(\omega)^2} \]
• This corresponds to a MMSE (*) estimation of the speech spectral amplitude $|S_i(\omega)|$ based on observation $Y_i(\omega)$ (estimate equal to $E\{ |S_i(\omega)| \mid Y_i(\omega) \}$), assuming Gaussian a priori distributions for $S_i(\omega)$ and $N_i(\omega)$ [Ephraim & Malah 1984].
• Similar formula for MMSE log-spectral amplitude estimation [Ephraim & Malah 1985].
(*) minimum mean squared error

Spectral Subtraction: Gain Functions (p. 8)
• Example 2: Magnitude Subtraction
  – Signal model:
    \[ Y_i(\omega) = S_i(\omega) + N_i(\omega) = |Y_i(\omega)|\, e^{j \theta_{y,i}(\omega)} \]
  – Estimation of clean speech spectrum:
    \[ \hat{S}_i(\omega) = \big[\, |Y_i(\omega)| - \hat{\mu}(\omega) \,\big]\, e^{j \theta_{y,i}(\omega)} = \underbrace{\left[ 1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|} \right]}_{G_i(\omega)} Y_i(\omega) \]
  – PS: half-wave rectification
    \[ G_i(\omega) \leftarrow \max(0,\, G_i(\omega)) \]
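The noise amplitude estimate of p. 5 and the magnitude subtraction gain of p. 8 can be written out per frequency bin as below. This is a sketch under simple assumptions: spectra are plain Python lists of complex numbers (one entry per bin), and the function names `estimate_noise_amplitude`, `magnitude_subtraction_gain` and `enhance` are hypothetical, not taken from the slides.

```python
def estimate_noise_amplitude(noise_spectra):
    """mu_hat[n] = average of |Y_i[n]| over the M noise-only frames."""
    M = len(noise_spectra)
    n_bins = len(noise_spectra[0])
    return [sum(abs(Y[n]) for Y in noise_spectra) / M for n in range(n_bins)]

def magnitude_subtraction_gain(Y, mu_hat):
    """G[n] = max(0, 1 - mu_hat[n] / |Y[n]|), i.e. with half-wave rectification.
    A bin with |Y[n]| = 0 gets gain 0 to avoid a division by zero."""
    return [max(0.0, 1.0 - m / abs(Yn)) if abs(Yn) > 0 else 0.0
            for Yn, m in zip(Y, mu_hat)]

def enhance(Y, mu_hat):
    """S_hat[n] = G[n] * Y[n]: the noisy phase is kept, only the magnitude shrinks."""
    G = magnitude_subtraction_gain(Y, mu_hat)
    return [g * Yn for g, Yn in zip(G, Y)]

# Two noise-only frames with two bins each give the long-time average.
mu_example = estimate_noise_amplitude([[3 + 4j, 1j], [0j, 3j]])

# One noisy frame with three bins: attenuated, fully suppressed, and empty.
S_hat = enhance([3 + 4j, 1 + 0j, 0j], [1.0, 2.0, 1.0])
```

In the second bin the noise estimate exceeds the observed magnitude, so the unrectified gain would be negative; the rectification sets it to zero.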
Spectral Subtraction: Gain Functions (p. 9)
• Example 3: Wiener Estimation
  – Linear MMSE estimation: find linear filter $G_i(\omega)$, with $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$, to minimize MSE
    \[ E\{ |S_i(\omega) - G_i(\omega)\, Y_i(\omega)|^2 \} \]
  – Solution:
    \[ G_i(\omega) = \frac{E\{ S_i(\omega)\, Y_i^*(\omega) \}}{E\{ Y_i(\omega)\, Y_i^*(\omega) \}} = \frac{P_{sy,i}(\omega)}{P_{yy,i}(\omega)} \]
    (numerator: cross-correlation in i-th frame; denominator: auto-correlation in i-th frame)
    Assume speech $s[k]$ and noise $n[k]$ are uncorrelated, then...
    \[ G_i(\omega) = \frac{P_{ss,i}(\omega)}{P_{yy,i}(\omega)} = \frac{P_{yy,i}(\omega) - P_{nn,i}(\omega)}{P_{yy,i}(\omega)} \approx \frac{|Y_i(\omega)|^2 - \hat{\mu}(\omega)^2}{|Y_i(\omega)|^2} = 1 - \frac{\hat{\mu}(\omega)^2}{|Y_i(\omega)|^2} \]
  – PS: half-wave rectification

Spectral Subtraction: Implementation (p. 10)
[Block diagram: $y[k]$ → short-time analysis → $Y[n,i]$ → gain functions (using $\hat{\mu}[n,i]$) → $\hat{S}[n,i]$ → short-time synthesis → $\hat{s}[k]$]
• Short-time Fourier Transform (= uniform DFT-modulated analysis filter bank)
  \[ Y[n,i] = \sum_{k=0}^{K-1} h[k]\, y[iD - k]\, e^{-j 2\pi k n / N} \]
  = estimate for $Y(\omega_n)$ at time i (i-th frame)
  – N = number of frequency bins (channels), n = 0..N-1
  – D = downsampling factor
  – K = frame length
  – h[k] = length-K analysis window (= prototype filter)
• Frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N = 2D..3D)
• Subband processing: $\hat{S}[n,i] = G[n,i]\, Y[n,i]$
• Synthesis bank: matched to analysis bank (see DSP-CIS)

Spectral Subtraction: Musical Noise (p. 11)
• Audio demo: car noise, $y[k]$ → magnitude subtraction → $\hat{s}[k]$
• Artifact: musical noise
  What? Short-time estimates of $|Y_i(\omega)|$ fluctuate randomly in noise-only frames, resulting in random gains $G_i(\omega)$. Statistical analysis shows that broadband noise is transformed into a signal composed of short-lived tones with randomly distributed frequencies (= musical noise).

Spectral Subtraction: Musical Noise (p. 12)
Solutions?
- Magnitude averaging: replace $Y_i(\omega)$ in calculation of $G_i(\omega)$ by a local average over frames, so that in $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$ the gain is computed from the averaged spectrum and applied to the instantaneous spectrum
- EMSR (p. 7)
- Augment $G_i(\omega)$ with soft-decision VAD: $G_i(\omega) \leftarrow P(H_1 \mid Y_i(\omega)) \cdot G_i(\omega)$, where $P(H_1 \mid Y_i(\omega))$ is the probability that speech is present, given the observation

Spectral Subtraction: Iterative Wiener Filter (p. 13)
Example of signal model-based spectral subtraction…
• Basic: Wiener filtering based spectral subtraction (p. 9), with (improved) spectra estimation based on parametric models
• Procedure:
  1. Estimate parameters of a speech model from noisy signal $y[k]$
  2. Using estimated speech parameters, perform noise reduction (e.g. Wiener estimation, p. 9)
  3. Re-estimate parameters of speech model from the speech signal estimate
  4. Iterate 2 & 3

Spectral Subtraction: Iterative Wiener Filter (p. 14)
[Figure: speech production model. A pulse train (voiced, with its pitch period) or a white noise generator (unvoiced) provides $u[k]$, which is scaled by $g_s$ and passed through the all-pole filter $1 / (1 - \sum_{m=1}^{M} \alpha_m e^{-j\omega m})$ to give the speech signal]
• Frequency domain:
  \[ S(\omega) = \frac{g_s}{1 - \sum_{m=1}^{M} \alpha_m e^{-j \omega m}}\, U(\omega) \]
• Time domain:
  \[ s[k] = \sum_{m=1}^{M} \alpha_m\, s[k-m] + g_s\, u[k] \]
• $\boldsymbol{\alpha} = [\alpha_1 \; \cdots \; \alpha_M]^T$ = linear prediction parameters

Spectral Subtraction: Iterative Wiener Filter (p. 15)
For each frame (vector) $\mathbf{y}[m]$ (i = iteration nr.), repeat until some error criterion is satisfied:
  1. Estimate $g_{s,i}$ and $\boldsymbol{\alpha}_i = [\alpha_{1,i} \; \cdots \; \alpha_{M,i}]^T$
  2. Construct Wiener filter (p. 9)
     \[ G(\omega) = \frac{P_{ss}(\omega)}{P_{ss}(\omega) + P_{nn}(\omega)} \]
     with:
     • $P_{nn}(\omega)$ estimated during noise-only periods
     • \[ P_{ss}(\omega) = \frac{g_{s,i}^2}{\left| 1 - \sum_{m=1}^{M} \alpha_{m,i}\, e^{-j \omega m} \right|^2} \]
  3. Filter speech frame $\mathbf{y}[m]$ to obtain $\hat{\mathbf{s}}_i[m]$

Multi-Microphone Noise Reduction Problem (p. 17)
[Figure: speech source $s[k]$ and noise source(s) $n[k]$ reach M microphones; the microphone signals are processed into (some) speech estimate $\hat{s}[k]$]
• Microphone signals:
  \[ y_m[k] = s_m[k] + n_m[k], \qquad m = 1 \ldots M \]
  with $s_m[k]$ the speech part, $n_m[k]$ the noise part, and M = number of microphones

Multi-Microphone Noise Reduction Problem (p. 18)
• Will estimate speech part in microphone 1 (*) (**), i.e. $\hat{s}_1[k]$, from
  \[ y_m[k] = s_m[k] + n_m[k], \qquad m = 1 \ldots M \]
(*) Estimating $s[k]$ is more difficult, would include dereverberation...
(**) This is similar to the single-microphone model (p. 3), where additional microphones (m = 2..M) help to get a better estimate

Multi-Microphone Noise Reduction Problem (p. 19)
• Data model:
  \[ \mathbf{Y}(\omega) = \mathbf{S}(\omega) + \mathbf{N}(\omega) = \mathbf{d}(\omega)\, S(\omega) + \mathbf{N}(\omega) \]
  \[ \begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \begin{bmatrix} H_1(\omega) \\ H_2(\omega) \\ \vdots \\ H_M(\omega) \end{bmatrix} S(\omega) + \begin{bmatrix} N_1(\omega) \\ N_2(\omega) \\ \vdots \\ N_M(\omega) \end{bmatrix} \]
• See Lecture-2 on multi-path propagation, with $\theta$ left out for conciseness. $H_m(\omega)$ is the complete transfer function from the speech source position to the m-th microphone.
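The data model of p. 19 can be made concrete for a single frequency bin. The sketch below assumes, purely for illustration, that each $H_m(\omega)$ is a pure delay (the slides allow any transfer function); the delays, the value of $S(\omega)$, the noise values and the function names are all hypothetical. It also shows that the speech correlation matrix $E\{|S|^2\}\, \mathbf{d}\mathbf{d}^H$ implied by the model has rank one.

```python
import cmath

def steering_vector(omega, delays):
    """d(omega) = [H_1 ... H_M]^T, assuming H_m(omega) = exp(-j*omega*tau_m)."""
    return [cmath.exp(-1j * omega * tau) for tau in delays]

def microphone_spectra(d, S, N):
    """Y_m = H_m * S + N_m for m = 1..M (one frequency bin)."""
    return [h * S + n for h, n in zip(d, N)]

def speech_correlation(d, sigma_s2):
    """Speech correlation matrix sigma_s2 * d d^H implied by the data model."""
    return [[sigma_s2 * di * dj.conjugate() for dj in d] for di in d]

omega = 2 * cmath.pi * 1000.0        # 1 kHz, in rad/s
delays = [0.0, 1e-4, 2e-4]           # hypothetical propagation delays in seconds
d = steering_vector(omega, delays)

S = 2.0 + 1.0j                       # source spectrum in this bin
N = [0.1 + 0j, -0.2j, 0.05 + 0.05j]  # noise contribution at each microphone
Y = microphone_spectra(d, S, N)
S1 = d[0] * S                        # speech part in microphone 1

R = speech_correlation(d, 1.5)
minor = R[0][0] * R[1][1] - R[0][1] * R[1][0]   # vanishes for a rank-one matrix
```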
Multi-Channel Wiener Filter (MWF) (p. 20)
• Data model:
  \[ \mathbf{Y}(\omega) = \mathbf{d}(\omega)\, S(\omega) + \mathbf{N}(\omega) \]
• Will use linear filters to obtain speech estimate (as in Lecture-2)
  \[ \hat{S}_1(\omega) = \sum_{m=1}^{M} F_m^*(\omega)\, Y_m(\omega) = \mathbf{F}^H(\omega)\, \mathbf{Y}(\omega) \]
• Wiener filter (= linear MMSE approach)
  \[ \min_{\mathbf{F}(\omega)} E\{ |S_1(\omega) - \mathbf{F}^H(\omega)\, \mathbf{Y}(\omega)|^2 \} \]
  Note that (unlike in DSP-CIS) the `desired response' signal $S_1(\omega)$ is unknown here (!), hence the solution will be `unusual'…

Multi-Channel Wiener Filter (MWF) (p. 21)
• Wiener filter solution is (see DSP-CIS)
  \[ \mathbf{F}(\omega) = \big[ E\{ \mathbf{Y}(\omega)\, \mathbf{Y}^H(\omega) \} \big]^{-1} E\{ \mathbf{Y}(\omega)\, S_1^*(\omega) \} \]
  (autocorrelation matrix, crosscorrelation vector)
  \[ \phantom{\mathbf{F}(\omega)} = \ldots \qquad (\text{with } E\{ \mathbf{S}(\omega)\, N_1^*(\omega) \} = 0) \]
  \[ \phantom{\mathbf{F}(\omega)} = \big[ E\{ \mathbf{Y}(\omega)\, \mathbf{Y}^H(\omega) \} \big]^{-1} \big( E\{ \mathbf{Y}(\omega)\, Y_1^*(\omega) \} - E\{ \mathbf{N}(\omega)\, N_1^*(\omega) \} \big) \]
  where $E\{ \mathbf{Y}\mathbf{Y}^H \}$ and $E\{ \mathbf{Y} Y_1^* \}$ are computed during speech+noise periods and $E\{ \mathbf{N} N_1^* \}$ is computed during noise-only periods
  – All quantities can be computed!
  – Special case of this is the single-channel Wiener filter formula on p. 9
  – In practice, use alternative to `subtraction' operation (see literature)

Multi-Channel Wiener Filter (MWF) (p. 22)
• Note that… MWF combines spatial filtering (as in Lecture-2) with single-channel spectral filtering (as in single-channel noise reduction):
  if
  \[ \mathbf{Y}(\omega) = \begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \mathbf{d}(\omega)\, S(\omega) + \mathbf{N}(\omega) \qquad (\mathbf{d} = \text{steering vector}, \; \mathbf{N} = \text{noise}) \]
  \[ E\{ \mathbf{N}(\omega)\, \mathbf{N}^H(\omega) \} = \boldsymbol{\Phi}_{NN}(\omega) \]
  then…

Multi-Channel Wiener Filter (MWF) (p. 23)
…then it can be shown that
  \[ \mathbf{F}(\omega) = \alpha(\omega) \cdot \boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega) \qquad (\alpha(\omega) = \text{scalar}) \]
① $\boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$ represents a spatial filtering (*)
  Compare to superdirective & delay-and-sum beamforming (Lecture-2)
  – Delay-and-sum beamforming maximizes array gain in a white noise field
  – Superdirective beamforming maximizes array gain in a diffuse noise field
  – MWF maximizes array gain in an unknown (!) noise field. MWF is operated without invoking any prior knowledge (steering vector/noise field)! (the secret is in the voice activity detection… (explain))
(*) Note that spatial filtering can improve SNR, spectral filtering never improves SNR (at one frequency)

Multi-Channel Wiener Filter (MWF) (p. 24)
…then it can be shown that
  \[ \mathbf{F}(\omega) = \alpha(\omega) \cdot \boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega) \]
① $\boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$ represents a spatial filtering (*)
② $\alpha(\omega)$ represents an additional `spectral post-filter', i.e. single-channel Wiener estimate (p. 9), applied to the output signal of the spatial filter
  \[ \alpha(\omega) = \ldots = \frac{E\{ |S(\omega)|^2 \}\, H_1^*(\omega)}{\mathbf{d}^H(\omega)\, \boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega) \cdot E\{ |S(\omega)|^2 \} + 1} \qquad \text{(prove it!)} \]

Multi-Channel Wiener Filter: Implementation (p. 25)
• Implementation with short-time Fourier transform: see p. 10
• Implementation with time-domain linear filtering:
  [Figure: speech source $s[k]$ and noise $n[k]$ give microphone signals $y_1[k], \ldots$, which are filtered and summed into $\hat{s}_1[k]$]
  \[ y_m[k] = s_m[k] + n_m[k] \]
  \[ \mathbf{y}_m^T[k] = \big[\, y_m[k] \;\; y_m[k-1] \;\; \ldots \;\; y_m[k-L+1] \,\big] \]
  \[ \hat{s}_1[k] = \sum_{m=1}^{M} \mathbf{y}_m^T[k]\, \mathbf{f}_m \qquad (\mathbf{f}_m = \text{filter coefficients}) \]

Multi-Channel Wiener Filter: Implementation (p. 26)
• Implementation with time-domain linear filtering:
  \[ \min_{\mathbf{f}} E\{ |s_1[k] - \mathbf{y}^T[k]\, \mathbf{f}|^2 \} \]
  \[ \mathbf{f} = \big[\, \mathbf{f}_1^T \;\; \mathbf{f}_2^T \;\; \ldots \;\; \mathbf{f}_M^T \,\big]^T, \qquad \mathbf{y}[k] = \big[\, \mathbf{y}_1^T[k] \;\; \mathbf{y}_2^T[k] \;\; \ldots \;\; \mathbf{y}_M^T[k] \,\big]^T \]
  Solution is…
  \[ \mathbf{f} = E\{ \mathbf{y}[k]\, \mathbf{y}^T[k] \}^{-1} E\{ \mathbf{y}[k]\, s_1[k] \} = E\{ \mathbf{y}[k]\, \mathbf{y}^T[k] \}^{-1} \big( E\{ \mathbf{y}[k]\, y_1[k] \} - E\{ \mathbf{n}[k]\, n_1[k] \} \big) \]
  where $E\{ \mathbf{y}\mathbf{y}^T \}$ and $E\{ \mathbf{y}\, y_1 \}$ are computed during speech+noise periods and $E\{ \mathbf{n}\, n_1 \}$ is computed during noise-only periods

Overview (p. 27)
• Kalman filter based noise reduction
  – Kalman filters: see Lecture-6
  – Kalman filters for noise reduction

Kalman filter for Speech Enhancement (p. 28)
• Assume AR model of speech and noise
  \[ s[k] = \sum_{n=1}^{N_s} \alpha_n\, s[k-n] + g_s\, u[k] \]
  \[ n[k] = \sum_{n=1}^{N_n} \beta_n\, n[k-n] + g_n\, w[k] \]
  $u[k]$, $w[k]$ = zero mean, unit variance, white noise
• Equivalent state-space model is…
  \[ \mathbf{x}[k+1] = \mathbf{A}\, \mathbf{x}[k] + \mathbf{v}[k] \]
  \[ y[k] = \mathbf{c}^T \mathbf{x}[k] \qquad (y = \text{microphone signal}) \]

Kalman filter for Speech Enhancement (p. 29)
with:
  \[ \mathbf{v}[k] = \mathbf{G} \begin{bmatrix} u[k] \\ w[k] \end{bmatrix}, \qquad \mathbf{Q} = \mathbf{G}\, \mathbf{G}^T \]
  \[ \mathbf{A} = \begin{bmatrix} \mathbf{A}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_n \end{bmatrix}, \qquad \mathbf{G} = \begin{bmatrix} \mathbf{g}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{g}_n \end{bmatrix} \]
  \[ \mathbf{g}_s^T = [\, 0 \; \cdots \; 0 \;\; g_s \,], \qquad \mathbf{g}_n^T = [\, 0 \; \cdots \; 0 \;\; g_n \,] \]
  \[ \mathbf{c}^T = [\, 0 \; \cdots \; 0 \;\; 1 \;\; 0 \; \cdots \; 0 \;\; 1 \,] \]
• $s[k]$ and $n[k]$ are included in the state vector, hence can be estimated by a Kalman filter

Kalman filter for Speech Enhancement (p. 30)
Iterative algorithm (details omitted)
[Block diagram: $y[k]$ → split signal in frames → estimate parameters ($\hat{g}_{s,i}$, $\hat{\alpha}_{n,i}$; $\hat{g}_{n,i}$, $\hat{\beta}_{n,i}$) ⇄ Kalman filter ($\hat{\mathbf{s}}_i[m]$, $\hat{\mathbf{n}}_i[m]$), with iterations between the two blocks → reconstruct signal → $\hat{s}[k]$]
Disadvantages of the iterative approach:
• complexity
• delay

Kalman filter for Speech Enhancement (p. 31)
Sequential algorithm (details omitted)
[Block diagram: the iteration index is replaced by the time index (no iterations). A State Estimator (Kalman Filter) takes $y[k]$, the delayed state estimates $\hat{s}[k-1|k-1]$, $\hat{n}[k-1|k-1]$ and the delayed parameter estimates $\hat{\alpha}_n[k-1|k-1]$, $\hat{\beta}_n[k-1|k-1]$, $\hat{g}_s[k-1|k-1]$, $\hat{g}_n[k-1|k-1]$, and outputs $\hat{s}[k|k]$, $\hat{n}[k|k]$. A Parameters Estimator (Kalman Filter) takes $\hat{s}$, $\hat{n}$ and outputs $\hat{\alpha}_n[k|k]$, $\hat{\beta}_n[k|k]$, $\hat{g}_s[k|k]$, $\hat{g}_n[k|k]$. Both outputs are fed back through delay elements (D)]
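The state-space formulation of p. 28-29 can be tried out on the smallest possible case. The sketch below assumes first-order AR models ($N_s = N_n = 1$), so that the state is $[s[k], n[k]]^T$, $\mathbf{A} = \mathrm{diag}(a, b)$ and $\mathbf{c} = [1, 1]^T$; it also assumes the AR parameters are known, whereas the algorithms on p. 30-31 estimate them. The function name `kalman_speech_noise` and all numerical values are hypothetical.

```python
import random

def kalman_speech_noise(y, a, b, gs, gn):
    """Kalman filter for x[k] = [s[k], n[k]]^T with A = diag(a, b),
    Q = diag(gs^2, gn^2) and y[k] = s[k] + n[k] (no measurement noise)."""
    coef = [a, b]
    q = [gs * gs, gn * gn]
    x = [0.0, 0.0]
    # Start from the stationary variances of the two AR(1) processes.
    P = [[q[0] / (1 - a * a), 0.0], [0.0, q[1] / (1 - b * b)]]
    s_hat, n_hat = [], []
    for yk in y:
        # Time update: x_pred = A x, P_pred = A P A^T + Q
        xp = [coef[0] * x[0], coef[1] * x[1]]
        Pp = [[coef[i] * coef[j] * P[i][j] + (q[i] if i == j else 0.0)
               for j in range(2)] for i in range(2)]
        # Measurement update with c = [1, 1]^T
        Pc = [Pp[0][0] + Pp[0][1], Pp[1][0] + Pp[1][1]]   # P_pred c
        innov_var = Pc[0] + Pc[1]                         # c^T P_pred c
        K = [Pc[0] / innov_var, Pc[1] / innov_var]        # Kalman gain
        e = yk - (xp[0] + xp[1])                          # innovation
        x = [xp[0] + K[0] * e, xp[1] + K[1] * e]
        cP = [Pp[0][0] + Pp[1][0], Pp[0][1] + Pp[1][1]]   # c^T P_pred
        P = [[Pp[i][j] - K[i] * cP[j] for j in range(2)] for i in range(2)]
        s_hat.append(x[0])
        n_hat.append(x[1])
    return s_hat, n_hat

# Synthetic test signal: low-pass "speech" plus high-pass noise.
random.seed(1)
a, b, gs, gn = 0.95, -0.7, 1.0, 1.0
s, y = [], []
s_prev = n_prev = 0.0
for _ in range(2000):
    s_prev = a * s_prev + gs * random.gauss(0.0, 1.0)
    n_prev = b * n_prev + gn * random.gauss(0.0, 1.0)
    s.append(s_prev)
    y.append(s_prev + n_prev)

s_hat, n_hat = kalman_speech_noise(y, a, b, gs, gn)
mse_raw = sum((yk - sk) ** 2 for yk, sk in zip(y, s)) / len(s)
mse_kf = sum((sh - sk) ** 2 for sh, sk in zip(s_hat, s)) / len(s)
```

Because the measurement equation has no separate noise term, the two estimates always add up to the microphone sample, $\hat{s}[k|k] + \hat{n}[k|k] = y[k]$; the filter only decides how to split $y[k]$ between speech and noise.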
CONCLUSIONS (p. 32)
• Single-channel noise reduction
  – Basic system is spectral subtraction
  – Only spectral filtering, hence can only exploit differences in spectra between noise and speech signal:
    • noise reduction at expense of speech distortion
    • achievable noise reduction may be limited
• Multi-channel noise reduction
  – Basic system is MWF
  – Provides spectral + spatial filtering (links with beamforming!)
• Iterative Wiener filter & Kalman filtering
  – Signal model based approach
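The `spectral + spatial filtering' interpretation of the MWF (p. 23-24) can be checked numerically for one frequency bin. The sketch below assumes M = 2 microphones and hypothetical values for $\mathbf{d}(\omega)$, $E\{|S(\omega)|^2\}$ and $\boldsymbol{\Phi}_{NN}(\omega)$; it computes the filter once in the `subtraction' form of p. 21 and once as $\alpha(\omega)\, \boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$, and the two results should coincide. It is a numerical check, not a proof.

```python
def inv2(A):
    """Inverse of a 2x2 complex matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def matvec(A, v):
    """Product of a 2x2 matrix with a length-2 vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

# Hypothetical quantities for one frequency bin.
d = [1.0 + 0.0j, 0.6 - 0.8j]                 # steering vector [H_1, H_2]^T
sigma_s2 = 2.0                               # E{|S|^2}
Phi_nn = [[1.0 + 0.0j, 0.3 + 0.2j],
          [0.3 - 0.2j, 0.5 + 0.0j]]          # Hermitian, positive definite

# Phi_yy = E{|S|^2} d d^H + Phi_nn (speech and noise uncorrelated)
Phi_yy = [[sigma_s2 * d[i] * d[j].conjugate() + Phi_nn[i][j]
           for j in range(2)] for i in range(2)]

# Form of p. 21: F = Phi_yy^{-1} (E{Y Y_1^*} - E{N N_1^*}), i.e. first columns
rhs = [Phi_yy[i][0] - Phi_nn[i][0] for i in range(2)]
F_subtraction = matvec(inv2(Phi_yy), rhs)

# Form of p. 24: F = alpha * Phi_nn^{-1} d
w = matvec(inv2(Phi_nn), d)                            # spatial filter
dH_w = sum(di.conjugate() * wi for di, wi in zip(d, w))
alpha = sigma_s2 * d[0].conjugate() / (dH_w * sigma_s2 + 1)   # spectral post-filter
F_decomposed = [alpha * wi for wi in w]

max_diff = max(abs(f1 - f2) for f1, f2 in zip(F_subtraction, F_decomposed))
```

In this sketch the exact $\boldsymbol{\Phi}_{YY}$ and $\boldsymbol{\Phi}_{NN}$ are used; in practice both are estimated from speech+noise and noise-only periods, which is why p. 21 recommends an alternative to the plain subtraction.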