An online dereverberation algorithm for hearing aids with binaural cues preservation

Boaz Schwartz (1), Sharon Gannot (1), and Emanuël A.P. Habets (2)

(1) Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel
(2) International Audio Laboratories Erlangen, University of Erlangen-Nuremberg & Fraunhofer IIS, Erlangen, Germany

boazsh0@gmail.com, sharon.gannot@biu.ac.il, Emanuel.Habets@audiolabs-erlangen.de

Motivation

Previous work [1-3]:
✓ Significant dereverberation was achieved.
✓ The online algorithm was tested on moving speakers.
✗ Single-channel output, whereas a binaural output is required.
✗ The early speech signal was estimated, yielding a dry-sounding output.

In this work we present:
✓ An online algorithm for binaural dereverberation.
✓ Preservation of the desired part of the RIR.

Fig. 1: Left and right RIRs.

Statistical model

In the STFT domain, the speech signal in the t-th time frame and k-th frequency bin is modeled as
$$ s(t,k) \sim \mathcal{N}_C\{0,\, \phi_s(t,k)\}, \qquad \phi_s(t,k) = \phi_w(t) \cdot \left|1 - \mathbf{a}_t^T \mathbf{e}_k\right|^{-2}, $$
an LPC-based spectral model in which $\mathbf{a}_t$ holds the LPC coefficients and $\mathbf{e}_k$ the corresponding complex exponentials. Using the convolutive transfer function (CTF) model for the reverberation, the j-th input signal is
$$ z_j(t,k) \approx \sum_{l=0}^{L-1} h_j(l,k)\, s(t-l,k) + v_j(t,k), \qquad v_j(t,k) \sim \mathcal{N}_C\{0,\, \phi_{v_j}(t,k)\}, \quad j = 0,\ldots,J-1. $$
We use the state-vector representation (k omitted),
$$ z_j(t) = \mathbf{h}_j^T \mathbf{s}_t + v_j(t), \qquad \mathbf{s}_t^T \equiv [s(t-L+1), \ldots, s(t)], \qquad \mathbf{h}_j^T \equiv [h_j(L-1), \ldots, h_j(0)], $$
and for the multi-channel signal we use the matrix form
$$ \mathbf{z}_t \equiv [z_0(t), \ldots, z_{J-1}(t)]^T, \qquad \mathbf{v}_t \equiv [v_0(t), \ldots, v_{J-1}(t)]^T, \qquad \mathbf{z}_t = \mathbf{H}\,\mathbf{s}_t + \mathbf{v}_t, $$
with $\mathbf{H} = (\mathbf{h}_0, \ldots, \mathbf{h}_{J-1})^T$ comprising all J CTFs. In order to apply MMSE estimation, define the matrices
$$ \mathbf{G} = \begin{pmatrix} \phi_{v_0} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \phi_{v_{J-1}} \end{pmatrix}, \qquad \boldsymbol{\Phi} = \begin{pmatrix} \mathbf{0} & \mathbf{I}_{L-1} \\ 0 & \mathbf{0}^T \end{pmatrix}, \qquad \mathbf{F}_t = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \phi_x(t) \end{pmatrix}. $$

Binaural problem formulation

Define the following signal as the reference,
$$ x(t) = h_\ell(0)\, s(t), $$
where $\ell$ denotes the index of the reference microphone of the left device. We now re-write the state-vector representation as
$$ z_j(t) = \tilde{\mathbf{h}}_j^T \mathbf{x}_t + v_j(t), \qquad \tilde{\mathbf{h}}_j = h_\ell^{-1}(0) \cdot \mathbf{h}_j, $$
with $\mathbf{x}_t$ defined similarly to $\mathbf{s}_t$. The matrix form is now $\mathbf{z}_t = \tilde{\mathbf{H}}\,\mathbf{x}_t + \mathbf{v}_t$ with $\tilde{\mathbf{H}} = (\tilde{\mathbf{h}}_0, \ldots, \tilde{\mathbf{h}}_{J-1})^T$, and the state-space model of the desired source is
$$ \mathbf{x}_t = \boldsymbol{\Phi}\,\mathbf{x}_{t-1} + \mathbf{u}_t, \qquad \mathbf{u}_t \equiv [0, \ldots, 0, x(t)]^T. $$
The target signals are defined by
$$ y_j(t) = \left[\mathbf{W}\,\tilde{\mathbf{h}}_j\right]^T \mathbf{x}_t, $$
where $\mathbf{W}$ is a weighting matrix. An intuitive choice is
$$ \mathbf{W}_\alpha \equiv \mathrm{diag}\left\{e^{0}, e^{-\alpha}, \ldots, e^{-(L-1)\alpha}\right\}, $$
and the two extremes would be
$$ \mathbf{W}_\infty = \mathrm{diag}\{1, 0, \ldots, 0\}, \qquad \mathbf{W}_0 = \mathrm{diag}\{1, 1, \ldots, 1\}. $$
Finally, the binaural target signal is
$$ \mathbf{y}_B(t) = \big(\mathbf{W}\,[\tilde{\mathbf{h}}_\ell,\, \tilde{\mathbf{h}}_r]\big)^T \mathbf{x}_t, $$
where r is the index of the reference microphone on the right device.

Algorithm outline - RKEMD

Fig. 2: General scheme of the proposed algorithm.

E-step: Kalman filter

Predict:
$$ \hat{\mathbf{x}}_{t|t-1} = \boldsymbol{\Phi}\,\hat{\mathbf{x}}_{t-1|t-1}, \qquad \mathbf{P}_{t|t-1} = \boldsymbol{\Phi}\,\mathbf{P}_{t-1|t-1}\,\boldsymbol{\Phi}^T + \mathbf{F}_t. $$

Update:
$$ \mathbf{K}_t = \mathbf{P}_{t|t-1}\,\tilde{\mathbf{H}}_t^H \left[\tilde{\mathbf{H}}_t\, \mathbf{P}_{t|t-1}\, \tilde{\mathbf{H}}_t^H + \mathbf{G}\right]^{-1} $$
$$ \mathbf{e}_t = \mathbf{z}_t - \tilde{\mathbf{H}}_t\,\hat{\mathbf{x}}_{t|t-1} $$
$$ \hat{\mathbf{x}}_{t|t} = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_t\,\mathbf{e}_t $$
$$ \mathbf{P}_{t|t} = \left[\mathbf{I}_L - \mathbf{K}_t\,\tilde{\mathbf{H}}_t\right]\mathbf{P}_{t|t-1} $$

Statistics and M-step

Sufficient statistics:
$$ \mathbf{R}_{xx}^{(t)} = \beta\,\mathbf{R}_{xx}^{(t-1)} + (1-\beta)\cdot\left(\hat{\mathbf{x}}_{t|t}\,\hat{\mathbf{x}}_{t|t}^H + \mathbf{P}_{t|t}\right) $$
$$ \mathbf{r}_{xz_j}^{(t)} = \beta\,\mathbf{r}_{xz_j}^{(t-1)} + (1-\beta)\cdot\hat{\mathbf{x}}_{t|t}\, z_j^*(t) $$
$$ r_{z_j z_j}^{(t)} = \beta\, r_{z_j z_j}^{(t-1)} + (1-\beta)\cdot |z_j(t)|^2 $$

Parameters:
• $\tilde{\mathbf{h}}_j^{(t)}$ ← linear fit of $\mathbf{x}_t$ and $z_j(t)$ (least-squares).
• $\phi_{v_j}(t)$ ← residual of the linear fit.

Output signal:
$$ \mathbf{y}_B(t) = \big(\mathbf{W}\,[\tilde{\mathbf{h}}_\ell^{(t)},\, \tilde{\mathbf{h}}_r^{(t)}]\big)^T\, \hat{\mathbf{x}}_{t|t} $$
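To make the weighting concrete, here is a minimal NumPy sketch of $\mathbf{W}_\alpha$ and its two extremes; the helper name ctf_weighting is ours (not from the poster), and the CTF taps are assumed ordered $h(0), \ldots, h(L-1)$:

```python
import numpy as np

def ctf_weighting(alpha: float, L: int) -> np.ndarray:
    """W_alpha = diag{e^0, e^-alpha, ..., e^-(L-1)alpha} over the CTF taps.

    alpha = 0 keeps every tap (W_0 = I, the full reverberant reference);
    alpha = np.inf keeps only the direct path (W_inf = diag{1, 0, ..., 0}).
    """
    if np.isinf(alpha):
        w = np.zeros(L)
        w[0] = 1.0  # direct path only
    else:
        w = np.exp(-alpha * np.arange(L))  # exponentially decaying tap weights
    return np.diag(w)
```

Intermediate values of α (the experiments below examine 0.1, 0.35, 0.7, and ∞) retain the early reflections while attenuating the reverberant tail, trading dryness against naturalness.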
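As an illustration only, the following sketch assembles one recursion of the above scheme for a single frequency band. It is written under several assumptions: the speech PSD $\phi_x(t)$ is taken as given (in the poster it follows from the LPC model), the least-squares M-step is spelled out from the normal equations implied by the sufficient statistics, and all function and field names (rkemd_step, the state dict) are our own, not the authors':

```python
import numpy as np

def rkemd_step(z_t, state, alpha=0.35, beta=0.98, ell=0, r=1):
    """One time-frame recursion in a single frequency band (k omitted).

    z_t   : (J,) complex STFT observations of the current frame.
    state : dict with x_hat (L,), P (L,L), H (J,L) relative CTFs
            (taps stored reversed, h(0) last, matching the state vector),
            G (J,J) diagonal noise covariance, phi_x (speech PSD, assumed
            given), and running statistics Rxx (L,L), rxz (L,J), rzz (J,).
    ell, r: indices of the left/right reference microphones.
    Returns the binaural output sample (y_left, y_right).
    """
    H, G = state["H"], state["G"]
    L, J = state["x_hat"].shape[0], z_t.shape[0]

    # --- E-step: Kalman filter ---
    Phi = np.eye(L, k=1)                       # shift matrix: drops the oldest sample
    F = np.zeros((L, L)); F[-1, -1] = state["phi_x"]
    x_pred = Phi @ state["x_hat"]              # x_{t|t-1}
    P_pred = Phi @ state["P"] @ Phi.T + F      # P_{t|t-1}

    S = H @ P_pred @ H.conj().T + G            # innovation covariance
    K = P_pred @ H.conj().T @ np.linalg.inv(S)
    x_hat = x_pred + K @ (z_t - H @ x_pred)    # x_{t|t}
    P = (np.eye(L) - K @ H) @ P_pred           # P_{t|t}

    # --- Recursive sufficient statistics ---
    state["Rxx"] = beta * state["Rxx"] + (1 - beta) * (np.outer(x_hat, x_hat.conj()) + P)
    state["rxz"] = beta * state["rxz"] + (1 - beta) * np.outer(x_hat, z_t.conj())
    state["rzz"] = beta * state["rzz"] + (1 - beta) * np.abs(z_t) ** 2

    # --- M-step: least-squares refit of the relative CTFs and noise PSDs ---
    # Normal equations of z_j(t) = h_j^T x_t + v_j(t): h_j = conj(Rxx^{-1} r_xz_j).
    H = np.linalg.solve(state["Rxx"], state["rxz"]).conj().T   # (J, L)
    for j in range(J):
        resid = state["rzz"][j] - np.real(H[j] @ state["rxz"][:, j])
        G[j, j] = max(resid, 1e-12)            # residual power as noise PSD
    state.update(H=H, x_hat=x_hat, P=P)

    # --- Binaural output: exponentially weighted early part of the RIR ---
    # Taps are stored newest-last, so reverse the weights to align e^0 with h(0).
    w = np.exp(-alpha * np.arange(L))[::-1]
    return (w * H[ell]) @ x_hat, (w * H[r]) @ x_hat
```

In practice the recursion would run independently per frequency bin, with the CTF estimates and noise PSDs carried over from frame to frame via the state dict.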
Experimental Study

Experiments took place in the Speech and Acoustic Lab at Bar-Ilan University:
• Starkey hearing aids mounted on a B&K HATS.
• RIRs were recorded.
• 3 different reverberation times, 3 positions, 5 angles.

Fig. 3: Experimental setup: hearing aid (left), lab setup (right).
Fig. 4: ITD and ILD distributions for the different reverberation levels (rows) and different values of α (columns). These plots relate to the farthest speaker (speaker 3 in the setup above).
Fig. 5: WSNR improvement (dB, w.r.t. the direct speech) for α = 0.1, 0.35, 0.7, ∞, plotted against the DRR range (dB): [-10,-6], [-6,-4], [-4,-2.3], [-2.3,0.5], [0.5,4].

Conclusions

• Speech quality improves as α increases in the range [0, 0.7]: more reverberation is removed.
• Above 0.7 there is a slight degradation due to estimation errors; this degradation is more pronounced in the subjective evaluation.
• As α increases, the signal becomes more directional, as can be deduced from the more concentrated cue scattering.

References

[1] B. Schwartz, S. Gannot, and E. A. P. Habets, “Online speech dereverberation using Kalman filter and EM algorithm,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 2, Feb. 2015.
[2] B. Schwartz, S. Gannot, and E. A. P. Habets, “Multi-microphone speech dereverberation using expectation-maximization and Kalman smoothing,” in Proc. European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, Sept. 2013.
[3] B. Schwartz, S. Gannot, and E. A. P. Habets, “LPC-based speech dereverberation using Kalman-EM algorithm,” in Proc. International Workshop on Acoustic Echo and Noise Control (IWAENC), Antibes – Juan-les-Pins, France, Sept. 2014.