Audio and Speech Processing Lecture 6 : Single Channel Speech

Digital Audio Signal Processing Lecture-3 Noise Reduction Marc Moonen Dept. E.E./ESAT-STADIUS, KU Leuven marc.moonen@esat.kuleuven.be homes.esat.kuleuven.be/~moonen/ Overview • Spectral subtraction for single-micr. noise reduction – – – – Single-microphone noise reduction problem Spectral subtraction basics (=spectral filtering) Features: gain functions, implementation, musical noise,… Iterative Wiener filter based on speech signal model • Multi-channel Wiener filter for multi-micr. noise red. – Multi-microphone noise reduction problem – Multi-channel Wiener filter (=spectral+spatial filtering) • Kalman filter based noise reduction – Kalman filters – Kalman filters for noise reduction Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 2 Single-Microphone Noise Reduction Problem • Microphone signal is desired signal s[k] y[k ]  s[k ]  n[k ] desired signal noise contribution contribution y[k] desired signal estimate ?  s [k ] noise signal(s) • Goal: Estimate s[k] based on y[k] • Applications: Speech enhancement in conferencing, handsfree telephony, hearing aids, … Digital audio restoration • Will consider speech applications: s[k] = speech signal Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 3 Spectral Subtraction Methods: Basics y[k ]  s[k ]  n[k ] • Signal chopped into `frames’ (e.g. 10..20msec), for each frame a frequency domain representation is Yi ( )  Si ( )  Ni ( ) (i-th frame) • However, speech signal is an on/off signal, hence some frames have speech +noise, i.e. Yi (w ) = Si (w )+ Ni (w ) framei Î {`speech + noise' frames} some frames have noise only, i.e. Yi ( )  • 0  Ni ( ) frame i {`noise - only' frames} A speech detection algorithm is needed to distinguish between these 2 types of frames (based on energy/dynamic range/statistical properties,…) Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 4 Spectral Subtraction Methods: Basics • Definition: () = average amplitude of noise spectrum  ( )  E{ Ni ( ) } • Assumption: noise characteristics change slowly, hence estimate () by (long-time) averaging over (M) noise-only frames ˆ ( )  • 1 M  Y ( ) i M noise-only frames Estimate clean speech spectrum Si() (for each frame), using corrupted speech spectrum Yi() (for each frame, i.e. short-time estimate) + estimated (): Sî ()  Gi ()Yi () based on `gain function’ G ( )  f (Y ( ), ˆ ( )) i i Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 5 Spectral Subtraction: Gain Functions Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 6 Spectral Subtraction: Gain Functions • Example 1: Ephraim-Malah Suppression Rule (EMSR) Gi (w ) = 2 M[q ] with: • é öæ SNR ö æ SNR öù prio prio ÷÷çç ÷÷.M êSNR post çç ÷÷ú 1+ SNR 1+ SNR ê øè è prio ø prio øú ë û q é q q ù = e 2 ê(1+ q )I 0 ( ) + q I1 ( )ú ë 2 2 û - SNR post (w ) = SNR prio (w ) • æ 1 çç è SNR post p Yi (w ) 2 m̂ (w )2 modified Bessel functions G (w )Y (w ) = (1-a )max(SNR post -1,0) + a i-1 ⌢ i-12 m (w ) 2 This corresponds to a MMSE (*) estimation of the speech spectral amplitude |Si()| based on observation Yi() ( estimate equal to E{ |Si()| | Yi() } ) assuming Gaussian a priori distributions for Si() and Ni() [Ephraim & Malah 1984]. Similar formula for MMSE log-spectral amplitude estimation [Ephraim & Malah 1985]. (*) minimum mean squared error Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 7 Spectral Subtraction: Gain Functions • Example 2: Magnitude Subtraction – Signal model: Yi ( )   Si ( )  N i ( ) Yi ( ) e j y ,i ( ) – Estimation of clean speech spectrum: Sî ( )  Y ( )  ˆ ( )e    ˆ ( )  1  Yi ( ) Yi ( )     j y ,i ( ) i Gi ( ) – PS: half-wave rectification Gi ( )  max( 0, Gi ( )) Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 8 Spectral Subtraction: Gain Functions • Example 3: Wiener Estimation  – Linear MMSE estimation: Sî ( )   find linear filter Gi() to minimize MSE  E  S ( )  G ( ).Y ( )  i i i   – Solution: Gi ( )  ESi ( ).Yi ( ) Psy,i ( )  EYi ( ).Yi ( ) Pyy ,i ( ) 2       <- cross-correlation in i-th frame <- auto-correlation in i-th frame Assume speech s[k] and noise n[k] are uncorrelated, then... Gi ( )  Pss,i ( ) Pyy ,i ( )  Pyy ,i ( )  Pnn,i ( ) Pyy ,i ( ) Yi ( )  ˆ ( ) 2 2  Yi ( ) 2 ˆ ( ) 2  1 2 Yi ( ) – PS: half-wave rectification Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 9 Spectral Subtraction: Implementation Y[n,i] y[k] Short-time analysis ˆ [n, i ] Gain functions Sˆ [n, i ] Short-time synthesis sˆ[ k ]  Short-time Fourier Transform (=uniform DFT-modulated analysis filter bank) K-1 Y[n, i] = å h[k]y[i.D - k]e- j 2 p kn/N = estimate for Y( ) at time i (i-th frame) n k=0 N=number of frequency bins (channels) n=0..N-1 D=downsampling factor K=frame length h[k] = length-K analysis window (=prototype filter)  frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N=2D..3D)  subband processing: Sˆ[n, i ]  G[n, i ].Y [n, i ]  synthesis bank: matched to analysis bank (see DSP-CIS) Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 10 Spectral Subtraction: Musical Noise • Audio demo: car noise y[k ] magnitude subtraction sˆ[k ] • Artifact: musical noise What? Short-time estimates of |Yi()| fluctuate randomly in noise-only frames, resulting in random gains Gi()  statistical analysis shows that broadband noise is transformed into signal composed of short-lived tones with randomly distributed frequencies (=musical noise) Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 11 Spectral Subtraction: Musical Noise Solutions? - Magnitude averaging: replace Yi() in calculation of Gi() by a local average over frames Sî ()  G()Yi () average instantaneous - EMSR (p7) - augment Gi() with soft-decision VAD: Gi()  P(H1 | Yi()). Gi() … probability that speech is present, given observation Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 12 Spectral Subtraction: Iterative Wiener Filter Example of signal model-based spectral subtraction… • Basic: Wiener filtering based spectral subtraction (p.9), with (improved) spectra estimation based on parametric models • Procedure: 1. Estimate parameters of a speech model from noisy signal y[k] 2. Using estimated speech parameters, perform noise reduction (e.g. Wiener estimation, p. 9) 3. Re-estimate parameters of speech model from the speech signal estimate 4. Iterate 2 & 3 Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 13 Spectral Subtraction: Iterative Wiener Filter pulse train voiced … gs … pitch period unvoiced u[k] all-pole filter 1 x M 1    m e  j m white noise generator speech signal m 1 frequency domain: S ( )  gs M 1   m e  j m U ( ) m 1 M time domain: s[k ]    m s[k  m]  g s u[k ] m 1 α  1   M  T Digital Audio Signal Processing = linear prediction parameters Version 2015-2016 Lecture-3: Noise Reduction p. 14 Spectral Subtraction: Iterative Wiener Filter For each frame (vector) y[m] (i=iteration nr.) 1. Estimate g s ,i and αi  1,i   M ,i T 2. Construct Wiener Filter (p.9) Repeat until some error criterion is satisfied Pss ( ) G ( )  ..  Pss ( )  Pnn ( ) with: • Pnn ( ) estimated during noise-only periods g s ,i • P ( )  ss 2 M 1    m , i e  j m m 1 3. Filter speech frame y[m] Digital Audio Signal Processing Version 2015-2016 sˆ i [m] Lecture-3: Noise Reduction p. 15 Overview • Spectral subtraction for single-micr. noise reduction – – – – Single-microphone noise reduction problem Spectral subtraction basics (=spectral filtering) Features: gain functions, implementation, musical noise,… Iterative Wiener filter based on speech signal model • Multi-channel Wiener filter for multi-micr. noise red. – Multi-microphone noise reduction problem – Multi-channel Wiener filter (=spectral+spatial filtering) • Kalman filter based noise reduction – Kalman filters – Kalman filters for noise reduction Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 16 Multi-Microphone Noise Reduction Problem s[k ] speech source n[k ] microphone signals ym [k] = sm [k]+ nm [k], m =1...M noise source(s) speech part Digital Audio Signal Processing (some) speech estimate Version 2015-2016 noise part Lecture-3: Noise Reduction M= number of microphones ?  s [k ] p. 17 Multi-Microphone Noise Reduction Problem Will estimate speech part in microphone 1 (*) (**) s[k ] ?  s1[k ] n[k ] ym [k] = sm [k]+ nm [k], m =1...M (*) Estimating s[k] is more difficult, would include dereverberation... (**) This is similar to single-microphone model (p.3), where additional microphones (m=2..M) help to get a better estimate Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 18 Multi-Microphone Noise Reduction Problem • Data model: Y( )  S( )  N( )  d( ).S ( )  N( )  Y1 ( )   H1 ( )   N1 ( )   Y ( )   H ( )   N ( )   2  2 .S ( )   2                 YM ( )  H M ( )  N M ( ) See Lecture-2 on multi-path propagation, with q left out for conciseness. Hm(ω) is complete transfer function from speech source position to m-the microphone Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 19 Multi-Channel Wiener Filter (MWF) • Data model: Y( )  d( ).S ( )  N( ) • Will use linear filters to obtain speech estimate (as in Lecture-2) M Sˆ1 ( )   Fm* ( ).Ym ( )  F H ( ).Y( ) m 1 • Wiener filter (=linear MMSE approach) 2 min F ( ) E{ S1 ( )  F ( ).Y( ) } H Note that (unlike in DSP-CIS) `desired response’ signal S1(w) is unknown here (!), hence solution will be ùnusual’… Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 20 Multi-Channel Wiener Filter (MWF) • Wiener filter solution is (see DSP-CIS) 1 F ( )  E{Y ( ).Y H ( )} E{Y( ).S1* ( )}.   autocorrelation  ... crosscorrelation * (with E {S ( ). N 1 ( )}  0 )    E{Y( ).Y H ( )}1. E{Y( ).Y1* ( )}  E{N ( ). N1* ( )} compute during speech+noise periods compute during noise-only periods – All quantities can be computed ! – Special case of this is single-channel Wiener filter formula on p.9 – In practice, use alternative to ‘subtraction’ operation (see literature) Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 21 Multi-Channel Wiener Filter (MWF) • Note that… MWF combines spatial filtering (as in Lecture-2) with single-channel spectral filtering (as in single-channel noise reduction) if Y( )     Y1 ( )   Y ( )   2   d( ) .S ( )  N( )        steering vector noise   YM ( ) E{N ( ).N H ( )}  Φ NN ( ) then… Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 22 Multi-Channel Wiener Filter (MWF) …then it can be shown that 1 F( )   (  ) . Φ  NN ( ).d( ) scalar ① 1 Φ NN ( ).d ( ) represents a spatial filtering (*) Compare to superdirective & delay-and-sum beamforming (Lecture-2) – Delay-and-sum beamf. maximizes array gain in white noise field – Superdirective beamf. maximizes array gain in diffuse noise field – MWF maximizes array gain in unknown (!) noise field. MWF is operated without invoking any prior knowledge (steering vector/noise field) ! (the secret is in the voice activity detection… (explain)) (*) Note that spatial filtering can improve SNR, spectral filtering never improves SNR (at one frequency) Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 23 Multi-Channel Wiener Filter (MWF) …then it can be shown that 1 F( )   (  ) . Φ  NN ( ).d( ) scalar ① 1 Φ NN ( ).d ( ) represents a spatial filtering (*) ②  ( ) represents an additional `spectral post-filter’ i.e. single-channel Wiener estimate (p.9), applied to output signal of spatial filter S ( ) .H1* ( ) 2  ( )  ...  d H  1 ( ).Φ NN ( ).d( ) . S ( )  1 2 (prove it!) Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 24 Multi-Channel Wiener Filter: Implementation • Implementation with short-time Fourier transform: see p.10 • Implementation with time-domain linear filtering: s[k ] y1[k ]  s1[k ] ym [k] = sm [k]+ nm [k] n[k ] yTm [k] = éë ym [k] ym [k -1] ... ym [k - L +1] ùû ⌢ s1[k] = å yTm [k].fm filter coefficients M m=1 Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 25 Multi-Channel Wiener Filter: Implementation • Implementation with time-domain linear filtering: 2 min f E{ s1[k ]  y [k ].f } T  f  f1T  f 2T ... f MT  T  y[k ]  y1T [k ] y T2 [k ] ... y TM [k ] T Solution is…    E{y[k ].y[k ] } .E{y[k ]. y [k ]}  E{n[k ].n [k ]} T 1 T 1 f  E{y[k ].y[k ] } .E{y[k ].s1[k ]} 1 1 compute during speech+noise periods compute during noise-only periods Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 26 Overview • Spectral subtraction for single-micr. noise reduction – – – – Single-microphone noise reduction problem Spectral subtraction basics (=spectral filtering) Features: gain functions, implementation, musical noise,… Iterative Wiener filter based on speech signal model • Multi-channel Wiener filter for multi-micr. noise red. – Multi-microphone noise reduction problem – Multi-channel Wiener filter (=spectral+spatial filtering) • Kalman filter based noise reduction – Kalman filters : See Lecture-6 – Kalman filters for noise reduction Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 27 Kalman filter for Speech Enhancement • Assume AR model of speech and noise s[k] = Ns åa s[k - n]+ n gs u[k] n=1 n[k] = Nn å b n[k - n]+ n gn w[k] u[k], w[k] = zero mean, unit variance,white noise n=1 • Equivalent state-space model is… x[k  1]  Ax[k ]  v[k ]  T y [ k ]  c x[k ]  Digital Audio Signal Processing Version 2015-2016 y=microphone signal Lecture-3: Noise Reduction p. 28 with: v[k ]  G.u[k ] w[k ] T A s A 0 Q  G.GT N M 0  T    g s ; C  0  0 1 0  0 1; G   A n    0  gTs  0  0 Digital Audio Signal Processing   g s ; gTn  0  0 Version 2015-2016  gn ; Lecture-3: Noise Reduction 0 g n  s[k] and n[k] are included in state vector, hence can be estimated by Kalman Filter Kalman filter for Speech Enhancement p. 29 Kalman filter for Speech Enhancement Iterative algorithm (details omitted) iterations y[k] split signal in frames estimate parameters ĝs,i ; â n,i ĝn,i ; b̂n,i Kalman Filter ŝi [m] reconstruct signal sˆ[ k ] nˆ i [m] Disadvantages iterative approach: • complexity • delay Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 30 Kalman filter for Speech Enhancement Sequential algorithm (details omitted) iteration index time index (no iterations) D ŝ[k -1| k -1] nˆ[k  1 | k  1] y[k ] ân [k -1| k -1] b̂n [k -1| k -1] ĝs [k -1| k -1] gˆ n [k  1 | k  1] Digital Audio Signal Processing sˆ[k | k ], nˆ[k | k ] State Estimator (Kalman Filter) Parameters Estimator (Kalman Filter) D Version 2015-2016 sˆ, nˆ ân , b̂n gˆ s , gˆ n ân [k | k] b̂n [k | k] gˆ s [k | k ], gˆ n [k | k ] Lecture-3: Noise Reduction p. 31 CONCLUSIONS • Single-channel noise reduction – Basic system is spectral subtraction – Only spectral filtering, hence can only exploit differences in spectra between noise and speech signal: • noise reduction at expense of speech distortion • achievable noise reduction may be limited • Multi-channel noise reduction – Basic system is MWF, – Provides spectral + spatial filtering (links with beamforming!) • Iterative Wiener filter & Kalman filtering – Signal model based approach Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 32

Audio and Speech Processing Lecture 6 : Single Channel Speech

Related documents

Products

Support

Audio and Speech Processing Lecture 6 : Single Channel Speech

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib