APPENDIX

In real-world applications, noise-reduction algorithms generally apply suppression gain functions to the mixture envelopes of speech and noise. A gain value of 0 yields the least residual noise but the most speech distortion, whereas a gain value of 1 yields the least speech distortion but the most residual noise. A tradeoff between residual noise and speech distortion is therefore needed, and it is usually achieved by deriving the suppression gain functions from mathematically optimized criteria. These criteria mostly aim to minimize speech distortion while keeping residual noise below a threshold. As a result, real-world noise-reduction algorithms introduce some degree of speech distortion while minimizing the effects of noise on speech intelligibility. Achieving this goal also requires an accurate noise-estimation algorithm, because the suppression gain functions can reduce noise without introducing unnecessary speech distortion only if the noise is estimated well.

This Appendix only briefly describes the Wiener-filtering algorithm and the logMMSE algorithm used in the present study, as the subspace and spectral-subtractive noise-reduction methods have been described in earlier studies (e.g., Loizou et al., 2005; Yang and Fu, 2005). Both noise-suppression gain functions rely on one or both of two SNR estimators, namely the a priori SNR and the a posteriori SNR, both of which in turn depend on the noise spectrum estimate.

1) Estimation of a priori SNR, a posteriori SNR and gain function

The concept of the a priori SNR was introduced to achieve the best trade-off between speech distortion and residual noise.
The a priori SNR $\xi_k$ is defined as the ratio of the clean-speech power spectrum to the noise power spectrum. Because the clean-speech power spectrum is not accessible, $\xi_k$ has to be estimated from the noisy-speech power spectrum. The a posteriori SNR $\gamma_k$ is defined as the ratio of the noisy-speech power spectrum to the noise power spectrum. In practice, the a priori SNR is estimated using the recursive decision-directed method (Ephraim and Malah, 1984), which combines the estimated clean-speech power spectrum in the previous speech frame with the a posteriori SNR in the current frame. The gain function $g_k$ is defined as the ratio of the estimated clean-speech power spectrum to the noisy-speech power spectrum. For the Wiener-filtering algorithm, $g_k$ can be expressed in terms of the a priori SNR $\xi_k$ as:

$$g_k = \frac{\xi_k}{\xi_k + 1};$$

and for the logMMSE algorithm, $g_k$ can be expressed as:

$$g_k = \frac{\xi_k}{\xi_k + 1} \exp\left\{ \frac{1}{2} \int_{v_k}^{\infty} \frac{e^{-t}}{t}\, dt \right\}, \qquad v_k = \frac{\xi_k}{\xi_k + 1}\, \gamma_k .$$

After the gain function is estimated, it is straightforward to compute the estimated clean-speech power spectrum.

2) Noise power spectrum estimation

The smoothed power spectrum of the noisy speech is first computed using a first-order recursive equation involving the short-time power spectrum of the noisy speech and a smoothing constant. Next, a nonlinear rule is used to track the minimum of the noisy-speech power spectrum by continuously averaging past spectral values. Then, speech presence in each frame and frequency bin is determined by comparing the ratio between the noisy-speech power spectrum and its local minimum to a frequency-dependent threshold. If this ratio is greater than the threshold, the bin is taken as speech-present; otherwise, it is taken as speech-absent. This processing is based on the principle that the power spectrum of the noisy speech will be nearly equal to its local minimum when speech is absent.
Hence, the smaller the ratio, the higher the probability that the bin belongs to a noise-only region, and vice versa. The speech-presence probability is updated using a first-order recursion that implicitly exploits the correlation of speech presence between adjacent frames. The speech-presence probability estimate is then used to compute a time-frequency-dependent smoothing factor. Finally, the noise power spectrum estimate is updated using this smoothing factor.
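As an illustration of the quantities defined in section 1), the following is a minimal Python sketch, not the implementation used in the study. The function names are hypothetical, the decision-directed weighting constant alpha = 0.98 is an assumed typical value that this Appendix does not state, and the exponential integral in the logMMSE gain is evaluated with a standard series and continued-fraction approximation chosen here only to keep the example self-contained.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exp_integral_e1(x, max_iter=200, eps=1e-12):
    """E1(x) = integral from x to infinity of exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{n>=1} (-x)^n / (n * n!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for n in range(1, max_iter + 1):
            term *= -x / n  # term now equals (-x)^n / n!
            total -= term / n
            if abs(term / n) < eps:
                break
        return total
    # Continued fraction evaluated with the modified Lentz method (x > 1).
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x)


def decision_directed_xi(prev_clean_power, noise_power, gamma, alpha=0.98):
    """A priori SNR from the previous frame's clean-speech power estimate and
    the current a posteriori SNR. alpha = 0.98 is an assumed value."""
    return (alpha * prev_clean_power / noise_power
            + (1.0 - alpha) * max(gamma - 1.0, 0.0))


def wiener_gain(xi):
    """Wiener gain: g = xi / (xi + 1)."""
    return xi / (xi + 1.0)


def logmmse_gain(xi, gamma):
    """logMMSE gain: g = xi/(xi+1) * exp(0.5 * E1(v)), v = xi/(xi+1) * gamma."""
    ratio = xi / (xi + 1.0)
    v = max(ratio * gamma, 1e-10)  # floor added here to keep E1 finite
    return ratio * math.exp(0.5 * exp_integral_e1(v))
```

Because the exponential factor is never smaller than 1, the logMMSE gain in this sketch is never smaller than the Wiener gain for the same a priori SNR, so it attenuates slightly less.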
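The noise-tracking steps in section 2) can be sketched per frame as follows. This is an illustrative sketch under stated assumptions rather than the study's code: the Appendix gives neither the constants nor the exact form of the nonlinear minimum-tracking rule, so the values of eta, gamma, beta, alpha_p and alpha_d, the particular tracking rule, the small positive floor, and the names `new_state` and `update_noise_estimate` are all assumptions introduced for illustration.

```python
def new_state(first_frame_power):
    """Initialise the tracker from the first frame's power spectrum."""
    n_bins = len(first_frame_power)
    return {
        "smoothed": list(first_frame_power),   # smoothed noisy-speech power
        "minimum": list(first_frame_power),    # tracked local minimum
        "presence_prob": [0.0] * n_bins,       # speech-presence probability
        "noise": list(first_frame_power),      # noise power estimate
    }


def update_noise_estimate(state, noisy_power, thresholds,
                          eta=0.7, gamma=0.998, beta=0.8,
                          alpha_p=0.2, alpha_d=0.85):
    """Advance the tracker by one frame. All lists are per frequency bin.
    Every constant here is an assumed value, not taken from the Appendix."""
    for k, y2 in enumerate(noisy_power):
        p_prev = state["smoothed"][k]
        # Step 1: first-order recursive smoothing of the noisy-speech power.
        p = eta * p_prev + (1.0 - eta) * y2
        # Step 2: nonlinear minimum tracking (one possible rule, assumed).
        p_min_prev = state["minimum"][k]
        if p_min_prev < p:
            p_min = (gamma * p_min_prev
                     + (1.0 - gamma) / (1.0 - beta) * (p - beta * p_prev))
        else:
            p_min = p
        p_min = max(p_min, 1e-12)  # guard added here against division by zero
        # Step 3: speech-presence decision against a per-bin threshold.
        speech_present = 1.0 if p / p_min > thresholds[k] else 0.0
        # Step 4: first-order recursion on the speech-presence probability.
        prob = alpha_p * state["presence_prob"][k] + (1.0 - alpha_p) * speech_present
        # Step 5: time-frequency-dependent smoothing factor.
        alpha_s = alpha_d + (1.0 - alpha_d) * prob
        # Step 6: update the noise estimate; a high presence probability
        # pushes alpha_s toward 1, so the estimate changes very little.
        state["noise"][k] = alpha_s * state["noise"][k] + (1.0 - alpha_s) * y2
        state["smoothed"][k] = p
        state["minimum"][k] = p_min
        state["presence_prob"][k] = prob
    return state["noise"]
```

In use, `new_state` would be called once with the first frame's power spectrum, and `update_noise_estimate` once per subsequent frame with that frame's power spectrum and one threshold per bin. With these assumed constants, a sudden loud frame raises the speech-presence probability and moves the noise estimate far less than it would in a speech-absent frame.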
