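Speech Enhancement

Wiener Filtering

A linear estimate of the clean signal from the noisy signal, using the MMSE criterion.

$y_t$: clean speech ($K \times 1$)
$v_t$: noise ($K \times 1$)
$z_t$: noisy speech ($K \times 1$)

For additive noise:

$$z_t = y_t + v_t$$

$$\hat{y}_t = E\{y_t \mid z_t\} = a\,z_t = a\,(y_t + v_t)$$

Projection Theorem: the mean square error $E\{\|y_t - \hat{y}_t\|^2\}$ is minimum if $a$ is selected such that the error $y_t - \hat{y}_t = y_t - a\,(y_t + v_t)$ is orthogonal to the noisy signal, i.e.:

$$\big(y_t - a\,(y_t + v_t)\big) \perp (y_t + v_t) \;\Rightarrow\; E\{y_t\,(y_t + v_t)^H\} = E\{a\,(y_t + v_t)(y_t + v_t)^H\}$$

$H$: Hermitian transposition.

Assuming $v_t$ and $y_t$ to be zero-mean and uncorrelated, i.e.,

$$E\{y_t v_t^H\} = E\{v_t y_t^H\} = 0,$$

we have:

$$E\{y_t y_t^H\} = a\,E\{y_t y_t^H + v_t v_t^H\}$$

$$a = E\{y_t y_t^H\}\,\big(E\{y_t y_t^H + v_t v_t^H\}\big)^{-1}$$

Since $y$ and $v$ are zero mean:

$$E\{y_t y_t^H\} = \Sigma_{y_t}, \qquad E\{v_t v_t^H\} = \Sigma_{v_t}$$

$$a = \Sigma_{y_t}\,(\Sigma_{y_t} + \Sigma_{v_t})^{-1}$$

This is called the time-domain Wiener filter.

We are looking for a frequency-domain Wiener filter, called the non-causal Wiener filter, such that:

$$\hat{y}(t) = \int_{-\infty}^{\infty} h(t-\tau)\,z(\tau)\,d\tau = \int_{-\infty}^{\infty} z(t-\tau)\,h(\tau)\,d\tau$$

According to the projection theorem, for the error $E\{(y(t) - \hat{y}(t))^2\}$ to be minimum, the difference $\varepsilon(t) = y(t) - \hat{y}(t)$ has to be orthogonal to the noisy input:

$$E\{[y(t) - \hat{y}(t)]\,z(\xi)\} = 0 \quad \text{for all } \xi$$

or

$$E\Big\{\Big[y(t) - \int_{-\infty}^{\infty} z(t-\tau)\,h(\tau)\,d\tau\Big]\,z(\xi)\Big\} = 0 \;\Rightarrow\; E\{y(t)\,z(\xi)\} = E\Big\{\int_{-\infty}^{\infty} z(t-\tau)\,h(\tau)\,d\tau\; z(\xi)\Big\}$$

or

$$R_{yz}(t-\xi) = \int_{-\infty}^{\infty} R_{zz}(t-\xi-\tau)\,h(\tau)\,d\tau$$

With $u := t - \xi$:

$$R_{yz}(u) = \int_{-\infty}^{\infty} R_{zz}(u-\tau)\,h(\tau)\,d\tau$$

$$R_{yz} = R_{zz} * h \ \text{(convolution)} \;\Rightarrow\; S_{yz}(\omega) = S_{zz}(\omega)\,H(j\omega) \;\Rightarrow\; H(j\omega) = \frac{S_{yz}(\omega)}{S_{zz}(\omega)}$$

$S_{zz}$: spectrum of $z(t)$
$S_{yz}$: cross spectrum between $y(t)$ and $z(t)$

$S_{yz} = S_{yy}$, since $R_{yz} = E\{y(t)\,[y(\xi) + v(\xi)]\} = E\{y(t)\,y(\xi)\} + E\{y(t)\,v(\xi)\} = R_{yy}$ (the second term vanishes because $y$ and $v$ are uncorrelated).

$$S_{zz} = S_{yy} + S_{vv} \;\Rightarrow\; H(j\omega) = \frac{S_{yy}}{S_{yy} + S_{vv}} = \frac{S_{zz} - S_{vv}}{S_{zz}}$$

This is the popular form of the Wiener filter.

Spectral Subtraction

$$z_t = y_t + v_t \;\Rightarrow\; Z_t = Y_t + V_t$$

Noise power estimate, updated in noise-only frames (where $Z_t = V_t$):

$$|\hat{V}_t|^2 = \lambda_n\,|\hat{V}_{t,\mathrm{old}}|^2 + (1 - \lambda_n)\,|Z_t|^2$$

$$|\hat{Y}_t|^2 = |Z_t|^2 - |\hat{V}_t|^2$$

$$H_t = \left(\frac{|Z_t|^2 - |\hat{V}_t|^2}{|Z_t|^2}\right)^{1/2}$$

$$\hat{Y}_t = H_t \cdot Z_t = \big(|Z_t|^2 - |\hat{V}_t|^2\big)^{1/2}\,e^{\,j \angle Z_t}$$

Estimation criteria

1 - Maximum Likelihood (ML): $P(\text{Observations} \mid \text{Parameters})$
2 - Maximum a Posteriori (MAP): $P(\text{Parameters} \mid \text{Observations})$
3 - MMSE: $E\{\text{Parameters} \mid \text{Observations}\}$

With $z = \{z_t,\ t = 0, 1, \ldots, T-1\}$ and $y = \{y_t,\ t = 0, 1, \ldots, T-1\}$:

1 - ML: $P(z \mid y)$
2 - MAP: $P(y \mid z)$
3 - MMSE: $E\{y \mid z\}$

MAP Speech Enhancement

$y_t \in R^K$, $v_t \in R^K$, $z_t \in R^K$

$y = \{y_t,\ t = 0, 1, \ldots, T-1\}$
$v = \{v_t,\ t = 0, 1, \ldots, T-1\}$
$z = \{z_t,\ t = 0, 1, \ldots, T-1\}$
$s = \{s_t,\ t = 0, 1, \ldots, T-1\}$, $s_t \in \{1, \ldots, M\}$ (state sequence)
$m = \{m_t,\ t = 0, 1, \ldots, T-1\}$, $m_t \in \{1, \ldots, L\}$ (mixture sequence)

Weight sequence:

$$q(\beta, \gamma \mid y(k)) = \{q_t(\beta, \gamma \mid y(k)),\ t = 0, \ldots, T-1\}, \quad 1 \le \beta \le M,\ 1 \le \gamma \le L$$

$$\max_{y} \ln p_{yv}(y, z) = \max_{y} \ln \sum_{s} \sum_{m} p_{yv}(s, m, y, z)$$

$$\max_{y} \ln p_{yv}(y \mid z)$$

$y(k) = \{y_t(k),\ t = 0, 1, \ldots, T-1\}$, $y_t(k) \in R^K$ (the estimate at iteration $k$)

$$y(k+1) = \arg\max_{y} \sum_{s,m} p_{yv}(s, m \mid y(k))\,\ln p_{yv}(s, m, y \mid z)$$

$$\ln p_{yv}(y(k+1) \mid z) \ge \ln p_{yv}(y(k) \mid z)$$

$$y_t(k+1) = \Big[\sum_{\beta=1}^{M} \sum_{\gamma=1}^{L} q_t(\beta, \gamma \mid y(k))\,H_{\beta,\gamma}^{-1}\Big]^{-1} z_t, \qquad H_{\beta,\gamma} = \Sigma_{\beta,\gamma}\,(\Sigma_{\beta,\gamma} + \Sigma_v)^{-1}, \qquad 0 \le t \le T-1$$

$$\max_{s,m,y} \ln p_{yv}(s, m, y, z)$$

$$\max_{s,m,y} \ln p_{yv}(s, m, y \mid z)$$

$$\max_{s,m} \ln p_{yv}(s, m, y, z)$$

MMSE Speech Enhancement

We try to optimize the function:

$$\hat{g}(y_t) = E\{g(y_t) \mid z_0^t\}$$

$g(\cdot)$ is a function on $R^K$ and $z_0^t = \{z_0, \ldots, z_t\}$.

Indices: $\beta$, $\gamma$ are the speech state and mixture ($s_t$, $m_t$); $\alpha$, $\rho$ are the noise state and mixture ($n_t$, $p_t$).

$$\hat{g}(y_t) = \sum_{\beta=1}^{M} \sum_{\gamma=1}^{L} \sum_{\alpha=1}^{N} \sum_{\rho=1}^{P} W_t(\beta, \gamma, \alpha, \rho \mid z_0^t)\; E\{g(y_t) \mid z_t,\ s_t = \beta,\ m_t = \gamma,\ n_t = \alpha,\ p_t = \rho\}$$

$$W_t(\beta, \gamma, \alpha, \rho \mid z_0^t) = P(s_t = \beta,\ m_t = \gamma,\ n_t = \alpha,\ p_t = \rho \mid z_0^t) = \frac{G_t(\beta, \gamma, \alpha, \rho, z_0^t)}{\sum_{\beta=1}^{M} \sum_{\gamma=1}^{L} \sum_{\alpha=1}^{N} \sum_{\rho=1}^{P} G_t(\beta, \gamma, \alpha, \rho, z_0^t)}$$

$$G_0(\beta, \gamma, \alpha, \rho, z_0) = \pi_\beta\,\pi_\alpha\,c_{\gamma \mid \beta}\,c_{\rho \mid \alpha}\,b(z_0 \mid \beta, \gamma, \alpha, \rho)$$

$$G_t(\beta, \gamma, \alpha, \rho, z_0^t) = \sum_{s_0^{t-1}:\, s_t = \beta}\ \sum_{m_0^{t-1}:\, m_t = \gamma}\ \sum_{n_0^{t-1}:\, n_t = \alpha}\ \sum_{p_0^{t-1}:\, p_t = \rho}\ \prod_{\tau=0}^{t} a_{s_{\tau-1} s_\tau}\,c_{m_\tau \mid s_\tau}\,a_{n_{\tau-1} n_\tau}\,c_{p_\tau \mid n_\tau}\,b(z_\tau \mid s_\tau, m_\tau, n_\tau, p_\tau)$$

where the $\tau = 0$ transition terms are read as the initial probabilities $\pi_{s_0}$ and $\pi_{n_0}$.

$$b(z_t \mid s_t, m_t, n_t, p_t) = \frac{\exp\!\big(-\tfrac{1}{2}\,z_t^{Tr}\,(\Sigma_{s_t,m_t} + \Sigma_{n_t,p_t})^{-1}\,z_t\big)}{(2\pi)^{K/2}\,\det(\Sigma_{s_t,m_t} + \Sigma_{n_t,p_t})^{1/2}}$$

$$E\{g(y_t) \mid z_t, s_t, m_t, n_t, p_t\} = \int g(y_t)\,p_{yv}(y_t \mid z_t, s_t, m_t, n_t, p_t)\,dy_t \qquad \text{(Eqn 1)}$$

The computation of Eqn 1 is generally difficult. For some specific functions, Eqn 1 has been derived. For instance, when $g(\cdot)$ is defined to be

$$g_1(y_t) = Y_t(k), \quad k = 0, 1, \ldots, K-1,$$

where $Y_t(k)$ is the kth coefficient of the DFT of $y_t$, Eqn 1 is equivalent to the popular Wiener filter.

Recursive Formula For G:

$$G_t(\beta, \gamma, \alpha, \rho, z_0^t) = \sum_{\beta'=1}^{M} \sum_{\gamma'=1}^{L} \sum_{\alpha'=1}^{N} \sum_{\rho'=1}^{P} G_{t-1}(\beta', \gamma', \alpha', \rho', z_0^{t-1})\; a_{\beta'\beta}\,a_{\alpha'\alpha}\,c_{\gamma \mid \beta}\,c_{\rho \mid \alpha}\,b(z_t \mid \beta, \gamma, \alpha, \rho)$$

Automatic Noise Type Selection

Nonstationary State HMM

$$y_t = g_t + N_t, \qquad y_t \in R^{K \times 1}$$

$g_t$: deterministic function
$N_t$: stationary residual (assumed to be an iid zero-mean Gaussian source)

NS-HMM parameters: $\{\pi,\ a,\ c,\ \Sigma_{i,m},\ g_t^{(i,m)}\}$, $i = 1, 2, \ldots, M$, $m = 1, 2, \ldots, L$
$\Sigma_{i,m}$: covariance of $N_t$

Nonstationary-State HMM

For example, if the deterministic function is assumed to be polynomial,

$$y_t = \sum_{r=0}^{R} B_{i,m}(r)\,h_r(t - \tau_i) + N_t(\Sigma_{i,m}), \qquad i = 1, 2, \ldots, M, \quad m = 1, 2, \ldots, L$$

$\tau_i$: the starting time of the visit to the ith state
$h_r$: an rth order polynomial (usually orthogonal)

$$b_t(j, m, d) = \frac{1}{(2\pi)^{K/2}\,|\Sigma_{j,m}|^{1/2}} \exp\!\Big(-\tfrac{1}{2}\,\Big[y_t - \sum_{r=0}^{R} B_{j,m}(r)\,h_r(d)\Big]^{Tr} \Sigma_{j,m}^{-1}\,\Big[y_t - \sum_{r=0}^{R} B_{j,m}(r)\,h_r(d)\Big]\Big)$$

Segmentation Algorithm in NS-HMM

$s_0, s_1, \ldots, s_{T-1}$: state sequence
$y_0, y_1, \ldots, y_{T-1}$: observation sequence
$d_0, d_1, \ldots, d_{T-1}$: duration sequence

$$\delta_t(j, m, d) = \max_{s_0, s_1, \ldots, s_{t-1}} p(s_0, s_1, \ldots, s_t = j,\ m_t = m,\ d_t = d,\ y_0, y_1, \ldots, y_t)$$

$$\psi_t(j, m, d) = \arg\max_{(i, v, \tau)} p(s_0, s_1, \ldots, s_t = j,\ m_t = m,\ d_t = d,\ y_0, y_1, \ldots, y_t)$$

where $(i, v, \tau)$ is the state, mixture and duration at time $t-1$.

1 - Initialization:

$$\delta_0(j, m, 0) = \pi_j\,c_{m \mid j}\,b_0(j, m, 0), \qquad 1 \le m \le L,\ 1 \le j \le M$$

2 - Recursion for $d = 0$ (entering a new Markov state):

$$\delta_t(j, m, 0) = \max_{1 \le i \le M,\ i \ne j}\ \max_{1 \le v \le L}\ \max_{0 \le \tau \le t-1} \delta_{t-1}(i, v, \tau)\,a_{ij}\,c_{m \mid j}\,b_t(j, m, 0)$$

$$\psi_t(j, m, 0) = \arg\max_{(i, v, \tau)} \delta_{t-1}(i, v, \tau)\,a_{ij}$$

for $0 < t < T$, $1 \le j \le M$, $1 \le m \le L$

3 - Recursion step for $d > 0$ (self looping):

$$\delta_t(j, m, d) = \delta_{t-1}(j, m, d-1)\,a_{jj}\,c_{m \mid j}\,b_t(j, m, d)$$

$$\psi_t(j, m, d) = (j, m, d-1)$$

for $0 < t < T$, $1 \le j \le M$, $1 \le m \le L$, $0 < d \le t$ (assuming the mixture is not changed within a state)

4 - Termination:

$$p^* = \max_{1 \le i \le M}\ \max_{1 \le m \le L}\ \max_{0 \le d \le T-1} \delta_{T-1}(i, m, d)$$

$$(s^*_{T-1}, m^*_{T-1}, d^*_{T-1}) = \arg\max_{1 \le i \le M}\ \max_{1 \le m \le L}\ \max_{0 \le d \le T-1} \delta_{T-1}(i, m, d)$$

5 - Backtracking:

$$(s^*_t, m^*_t, d^*_t) = \psi_{t+1}(s^*_{t+1}, m^*_{t+1}, d^*_{t+1}) \qquad \text{for } t = T-2, T-3, \ldots, 0$$

Now we generalize the MMSE formulae for NS-HMM:

$$E\{g(y_t) \mid z_0^t\} = \sum_{\beta=1}^{M} \sum_{\gamma=1}^{L} \sum_{\alpha=1}^{N} \sum_{\rho=1}^{P} \sum_{d=1}^{t} W_t(\beta, \gamma, \alpha, \rho, d \mid z_0^t) \int g(y_t)\,f(y_t \mid z_t,\ s_t = \beta,\ m_t = \gamma,\ n_t = \alpha,\ p_t = \rho,\ d_t = d)\,dy_t$$

$$= \sum_{\beta, \gamma, \alpha, \rho, d} W_t(\beta, \gamma, \alpha, \rho, d \mid z_0^t)\; E\{g(y_t) \mid z_t,\ s_t = \beta,\ m_t = \gamma,\ n_t = \alpha,\ p_t = \rho,\ d_t = d\}$$

$$W_t(\beta, \gamma, \alpha, \rho, d \mid z_0^t) = \frac{G_t(\beta, \gamma, \alpha, \rho, d, z_0^t)}{\sum_{\beta} \sum_{\gamma} \sum_{\alpha} \sum_{\rho} \sum_{d} G_t(\beta, \gamma, \alpha, \rho, d, z_0^t)}$$

for $1 \le \beta \le M$, $1 \le \gamma \le L$, $1 \le \alpha \le N$, $1 \le \rho \le P$, $1 \le d \le t$

For the calculation of $E\{g \mid \ldots\}$, $g(y_t)$ has to be specified. It has been shown that for

$$g(y_t) = Y_t(k), \quad k = 0, \ldots, K-1$$

($Y_t(k)$: kth component of the DFT of $y_t$) the computation cost is lower than for other functions.

Linear estimation under the MMSE criterion shows that the conditional density of the kth component of $g$ is Gaussian.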
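i.e.,

$$E\{g_k \mid z_t, s_t, m_t, n_t, p_t, d_t\} = \int Y_t(k)\,f_{yv}(Y_t(k) \mid z_t, s_t, m_t, n_t, p_t, d_t)\,dY_t(k)$$

where $f_{yv}(\cdot)$ is Gaussian with mean

$$\tilde{H}_{s_t, m_t, n_t, p_t, d_t}(k)\,Z_t(k)$$

Here $Z_t(k)$ is the kth component of the DFT of $z_t$, and $\tilde{H}_{s_t, m_t, n_t, p_t, d_t}(k)$ is the kth component of the Wiener filter for the corresponding state, mixture and duration of speech and noise.

Recursive calculation of G, with duration constraints

Indices: $\beta$, $\gamma$ are the speech state and mixture; $\alpha$, $\rho$ are the noise state and mixture; primes mark the values at time $t-1$.

For entering a new state:

$$G_t(\beta, \gamma, \alpha, \rho, 0, z_0^t) = \sum_{\beta'=1}^{M} \sum_{\gamma'=1}^{L} \sum_{\alpha'=1}^{N} \sum_{\rho'=1}^{P} \sum_{d=0}^{t-1} G_{t-1}(\beta', \gamma', \alpha', \rho', d, z_0^{t-1})\; a_{\beta'\beta}\,a_{\alpha'\alpha}\,c_{\gamma \mid \beta}\,c_{\rho \mid \alpha}\,b(z_t \mid \beta, \gamma, \alpha, \rho, 0)$$

For staying in the old state:

$$G_t(\beta, \gamma, \alpha, \rho, d, z_0^t) = \sum_{\alpha'=1}^{N} \sum_{\rho'=1}^{P} G_{t-1}(\beta, \gamma, \alpha', \rho', d-1, z_0^{t-1})\; a_{\beta\beta}\,a_{\alpha'\alpha}\,c_{\gamma \mid \beta}\,c_{\rho \mid \alpha}\,b(z_t \mid \beta, \gamma, \alpha, \rho, d), \qquad 1 \le d \le t$$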
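As a recap of the Wiener filtering and Spectral Subtraction slides, the per-bin rules can be sketched in a few lines of Python. This is a minimal illustration, not the deck's implementation. The function names, the parameter `lam` (standing for the forgetting factor in the noise update), the `floor` argument and the clipping of negative power differences to zero are all assumptions added here to keep the sketch well defined.

```python
import cmath
import math


def wiener_gain(s_zz, s_vv, floor=0.0):
    """Per-bin Wiener gain H = (S_zz - S_vv) / S_zz.

    `floor` is an assumed safeguard: estimated spectra can give
    S_vv > S_zz, which would make the gain negative.
    """
    return [max((z - v) / z, floor) if z > 0.0 else floor
            for z, v in zip(s_zz, s_vv)]


def update_noise_psd(noise_psd_old, frame_spectrum, lam=0.9):
    """Recursive noise power update for a noise-only frame:
    |V|^2 = lam * |V_old|^2 + (1 - lam) * |Z|^2.
    """
    return [lam * v + (1.0 - lam) * abs(z) ** 2
            for v, z in zip(noise_psd_old, frame_spectrum)]


def spectral_subtraction(frame_spectrum, noise_psd):
    """Y_hat = sqrt(|Z|^2 - |V|^2) * exp(j * angle(Z)) per bin.

    Negative differences are clipped to zero (an assumption; the
    slides do not say how they are handled).
    """
    enhanced = []
    for z, v in zip(frame_spectrum, noise_psd):
        magnitude = math.sqrt(max(abs(z) ** 2 - v, 0.0))
        enhanced.append(cmath.rect(magnitude, cmath.phase(z)))
    return enhanced


if __name__ == "__main__":
    noisy_bins = [3 + 4j, 1 - 1j]          # toy DFT coefficients Z_t(k)
    noise_psd = [9.0, 0.5]                 # toy noise power |V_t(k)|^2
    print(spectral_subtraction(noisy_bins, noise_psd))
    print(wiener_gain([25.0, 2.0], noise_psd))
```

Both rules keep the noisy phase and change only the magnitude. In spectral subtraction the magnitude gain is the square root of the Wiener power ratio.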
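The recursion for G (without duration constraints) and the weights W from the MMSE Speech Enhancement slides can be illustrated with a heavily simplified sketch. The assumptions are: scalar frames (K = 1), one mixture per state (L = P = 1), and G rescaled to sum to one at every step to avoid underflow, which leaves the weights unchanged. All function and argument names are hypothetical. With g(y_t) chosen so that the conditional mean is a Wiener filter, the estimate becomes a weighted sum of per-state-pair Wiener gains.

```python
import math


def gaussian_pdf(z, var):
    """Zero-mean scalar Gaussian density, used as b(z_t | speech state, noise state)."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def mmse_enhance(z, pi_s, a_s, var_s, pi_n, a_n, var_n):
    """Weighted-Wiener MMSE estimate for scalar frames.

    z            : list of noisy samples z_t
    pi_s, a_s    : initial and transition probabilities of the speech HMM
    var_s        : speech variance per speech state
    pi_n, a_n    : initial and transition probabilities of the noise HMM
    var_n        : noise variance per noise state
    """
    M, N = len(pi_s), len(pi_n)
    g_prev = None
    y_hat = []
    for z_t in z:
        g = [[0.0] * N for _ in range(M)]
        for b in range(M):          # speech state at time t
            for a in range(N):      # noise state at time t
                if g_prev is None:
                    prior = pi_s[b] * pi_n[a]
                else:
                    # sum over the previous speech and noise states
                    prior = sum(g_prev[bp][ap] * a_s[bp][b] * a_n[ap][a]
                                for bp in range(M) for ap in range(N))
                g[b][a] = prior * gaussian_pdf(z_t, var_s[b] + var_n[a])
        total = sum(sum(row) for row in g)
        w = [[x / total for x in row] for row in g]   # weights W_t
        # weighted sum of scalar Wiener gains var_s / (var_s + var_n)
        y_hat.append(sum(w[b][a] * var_s[b] / (var_s[b] + var_n[a]) * z_t
                         for b in range(M) for a in range(N)))
        g_prev = w                  # rescaled G; the weights are scale-invariant
    return y_hat


if __name__ == "__main__":
    estimate = mmse_enhance(
        z=[1.0, 1.0, 1.0],
        pi_s=[0.5, 0.5], a_s=[[0.9, 0.1], [0.1, 0.9]], var_s=[0.5, 4.0],
        pi_n=[1.0], a_n=[[1.0]], var_n=[1.0],
    )
    print(estimate)
```

With a single speech state and a single noise state the weights are all one and the sketch reduces to a fixed Wiener gain, which is a convenient sanity check.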
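The five steps of the NS-HMM segmentation algorithm (initialization, entering a new state with d = 0, self-looping with d > 0, termination, backtracking) can be sketched as follows. The sketch makes several assumptions: scalar observations, a single mixture per state (so the mixture index and its weight c are dropped), monomial trend functions h_r(d) = d^r rather than orthogonal polynomials, and log-domain scores in place of products. All names are hypothetical.

```python
import math


def log_gauss(y, mean, var):
    """Log of a scalar Gaussian density."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def trend(coeffs, d):
    """Polynomial trend sum_r B(r) * h_r(d), with h_r(d) = d**r (assumed)."""
    return sum(b * d ** r for r, b in enumerate(coeffs))


def segment(y, log_pi, log_a, coeffs, var):
    """Return the best [(state, duration), ...] path for observations y."""
    M, T = len(log_pi), len(y)
    delta = [{} for _ in range(T)]   # delta[t][(j, d)]: best log score
    psi = [{} for _ in range(T)]     # psi[t][(j, d)]: best predecessor
    # 1 - initialization
    for j in range(M):
        delta[0][(j, 0)] = log_pi[j] + log_gauss(y[0], trend(coeffs[j], 0), var[j])
    for t in range(1, T):
        # 2 - d = 0: enter state j from the best (i, tau) with i != j
        for j in range(M):
            cands = [(score + log_a[i][j], (i, tau))
                     for (i, tau), score in delta[t - 1].items() if i != j]
            if cands:
                best, prev = max(cands)
                delta[t][(j, 0)] = best + log_gauss(y[t], trend(coeffs[j], 0), var[j])
                psi[t][(j, 0)] = prev
        # 3 - d > 0: self loop, the duration grows by one
        for (j, d_prev), score in delta[t - 1].items():
            d = d_prev + 1
            delta[t][(j, d)] = (score + log_a[j][j]
                                + log_gauss(y[t], trend(coeffs[j], d), var[j]))
            psi[t][(j, d)] = (j, d_prev)
    # 4 - termination
    node = max(delta[T - 1], key=delta[T - 1].get)
    # 5 - backtracking
    path = [node]
    for t in range(T - 1, 0, -1):
        node = psi[t][node]
        path.append(node)
    path.reverse()
    return path


if __name__ == "__main__":
    lp = math.log(0.5)
    obs = [0.0, 0.0, 0.0, 10.0, 11.0, 12.0]   # flat segment, then a ramp
    print(segment(obs, [lp, lp], [[lp, lp], [lp, lp]],
                  [[0.0], [10.0, 1.0]], [1.0, 1.0]))
```

In the toy run, state 0 models a constant at 0 and state 1 a ramp that starts at 10 when the state is entered, so the duration index d is what lets the second state track the rising samples.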