International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 5, May 2013 ISSN 2319 - 4847 Review on Speech Enhancement using Signal Subspace method Nandini Garg1, JyotiGupta2 1&2 MMEC Mullana (Ambala),Haryana,INDIA ABSTRACT In speech communication, quality and intelligibility of speech is of utmost importance for ease and accuracy of information exchange. The speech processing systems used to communicate or store speech are usually designed for a noise free environment but in a real-world environment, the presence of background interference in the form of additive background and channel noise drastically degrades the performance of these systems, causing inaccurate information exchange and listener fatigue. Speech enhancement algorithms attempt to improve the performance of communication systems when their input or output signals are corrupted by noise. Speech Enhancement in general has three major objectives: (a) To improve the perceptual aspects such as quality and intelligibility of the processed speech i.e to make it sound better or clearer to the human listener; (b) to improve the robustness of the speech coders which tend to be severely affected by presence of noise; and (c) to increase the accuracy of speech recognition systems operating in less than ideal locations. In this paper various speech enhancement techniques have been discussed. Keywords:speech communication,Speech enhancement,background interference 1. INTRODUCTION Speech is most natural form of human communication. The perception of speech signal is usually measured in terms of its quality and intelligibility. The quality is a subjective measures that indicates the pleasantness or naturalness of the perceived speech. Intelligibility is an objective measure which predicts the percentage of words that can be correctly identified by listeners. Enhancement means the improvement in the value or quality of something. When applied to speech, this simply means the improvement in intelligibility and/or quality of a degraded speech signal by using signal processing tools. By speech enhancement, it refers not only to noise reduction but also to dereverberation and separation of independent signals [21]. This is a very difficult problem for two reasons. First, the nature and characteristics of the noise signals can change dramatically in time and between applications. It is also difficult to find algorithms that really work in different practical environments. Second, the performance measure can also be defined differently for each application. Two criteria are often used to measure the performance: quality and intelligibility. It is very hard to satisfy both at the same time. Several techniques have been proposed for this purpose .The basic method for speech enhancement is Spectral Subtraction approach .It is very simple method and easy to implement. Other methods for SpeechEenhancement are Iterative Wiener filtering, Kalman filtering, Linear Predictive coding (LPC) analysis, Signal Subspace method [1]–[5]. The performances of these techniques depend on the quality and intelligibility of the processed speech signal. The improvement in the speech signal-to-noise ratio (SNR) is the target of most techniques [4].Enhancement techniques may be classified as single channel and dual channel or multi-channel enhancement techniques. Single channel enhancement techniques apply to situations in which only one acquisition channel is available. The spectral subtraction method is a well-known single channel noise reduction technique [5]. The conventional power spectral subtraction method substantially reduces the noise levels in the noisy speech. But it introduces an annoying distortion in the speech signal called musical noise. So the different approaches of spectral subtraction ( spectral subtraction with over subtraction,non-linear spectral subtraction, Multiband spectral subtraction etc.) are used to remove the musical noise and improve the quality and intelligibility of the signal. Evaluation of spectral subtractive algorithms revealed that these algorithms improve speech quality and not affect much more on intelligibility of speech signal. So the other method called Signal Subspace method [6] is proposed that allows better and more suppression of the noise. The aim of this method is to improve the quality, while minimising any loss in intelligibility. This is a review paper and its objective is to provide an overview of the verity of speech enhancement algorithms that has been propose to improve the speech quality and intelligibility. 2. SPEECH ENHANCEMENT METHODS Volume 2, Issue 5, May 2013 Page 215 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 5, May 2013 ISSN 2319 - 4847 The main objective of speech enhancement technique is to improve the quality and minimize the loss in intelligibility of the signal and listener fatigue. The basic overview is shown in figure 1: Figure1: Basic overview of speech enhancement system [21]. There are various speech enhancement methods proposed for noise reduction and to improve the noise quality and intelligibility. Speech enhancement can be done in both time domain and transform domain [3]. Time domain techniques uses Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, linear predictive coefficients, Kalman Filtering, Hidden Markov Model etc. Transform domain techniques are techniques in which transformation is first performed on noisy speech before filtering followed by the corresponding inverse transformation in order to restore the original signal[3]. The main advantage of performing the noise filtering or reduction process in the transform domain lies in the relative ease of distinguishing and removing noise from speech. 2.1SPECTRAL SUBTRACTION METHOD The very basic method for speech enhancement is Spectral Subtraction method. In this section, we provide a brief overview on conventional spectral subtraction [3]. Spectral subtraction assumes that a signal is composed of two additive components. The noisy speech can be expressed as (1) Where is time, represents the uncorrupted speech signal, represents the additive noise signal and is the corrupted speech signal available for processing. The observed signal is divided into overlapping frames using the application of a window function and implemented in the short-time Fourier transform (STFT) magnitude domain. In the frequency domain this can be represented as (2) The power spectrum of noisy speech can be estimated as: (3) Where is the statistical average values of 2 2 S ( ) Y ( ) E ( D ( ) ) during non-speech stage. So the enhanced speech amplitude is 1/ 2 2 [ Y ( ) n ( )]1 / 2 (4) Combined with the phase of the noise-corrupted signal to re-synthesize the signal S ( ) S ( ) e j arg Y ( ) (5) The inverse short-time Fourier transform is performed to transform the signals into time domain. Conventional spectral subtraction algorithm estimating noisy energy during no speech stage, however, it can’t update noise during speech stage. Also the method requires a VAD that might not work very well under low SNR. Volume 2, Issue 5, May 2013 Page 216 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 5, May 2013 ISSN 2319 - 4847 Spectral Subtraction may be fairly straight-forward to implement, and although reducing the noise significantly, it has some severe drawbacks [1]. This includes; (a) impossibility to use for non-stationary noise,(b)Dependence on VAD(Voice Activity Detector) accuracy,(c) Musical noise due to imperfect noise estimation. With the passage of time spectral subtraction has undergone many modifications [21]. The new methods are (spectral subtraction with over subtraction, non-linear spectral subtraction, Multiband spectral subtraction etc.). The original spectral subtraction method are modified as subtracting an over estimate of the noise power spectrum and preventing the resultant spectrum from going below a preset minimum spectral floor value. This modification leads minimizing the perception of narrow spectral excursions and thus lowers the musical noise effect. Evaluation of spectral subtractive algorithms revealed that these algorithms improve speech quality and not affect much more on intelligibility of speech signal. So other methods are used for speech enhancement. Figure 2: Spectral Subtraction Process Flow Diagram[21]. 2.2 TIME DOMAIN METHODS 2.2.1 WIENER FILTERING The Wiener filter for speech enhancement was suggested as an improvement to spectral subtraction by Lim and Oppenheim in December 1979. The Wiener filter is a popular technique [4] that has been used in many signal enhancement methods. The basic principle of the Wiener filter is to obtain an estimate of the clean signal from that corrupted by additive noise. This estimate is obtained by minimizing the Mean Square Error (MSE) between the desired signal s(n) and the estimated signal ˆs(n) . The frequency domain solution to this optimization problem gives the following filter transfer function: (6) Where and are the power spectral densities of the clean and the noise signals, respectively. This formula can be derived considering the signal s and the noise v as uncorrelated and stationary signals. The SNR is defined by [13]: SNR Ps ( ) (7) P s ( ) This definition can be incorporated to the Wiener filter equation as follows (8) The drawback [4] of the Wiener filter is the fixed frequency response at all frequencies and the requirement to estimate the power spectral density of the clean signal and noise prior to filtering. 2.2.2 KALMAN FILTERING Volume 2, Issue 5, May 2013 Page 217 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 5, May 2013 ISSN 2319 - 4847 The Kalman filter can be seen as a generalization of the Wiener filter.This model consists of a slowly varying AR model [7]. The AR model and the excitation model fit nicely into the Kalman filtering framework, fully exploiting the capability of the Kalman filter to process non-stationary signals in an LMMSE optimum manner. The AR-model coefficients are estimated by a decision directed type Power Spectral Subtraction method followed by an LPC analysis. For the robust estimation of the rapidly time-varying excitation model in the presence of noise, MultiPulse Linear Predictive Coding (MPLPC) based method is used. We can say that the Kalman filter combines all the available data measured, plus the knowledge of the system and the measurement devices, to produce an estimation of the desired variables in such a manner that the error is statistically minimized. One of the most fundamental differences between the Wiener filter and the Kalman filter is the ability of the latter to accommodate non-stationary signals. 2.2.3LINEAR PREDICTIVE CODING Linear predictive coding (LPC) is a tool used, mainly, in the audio signal and speech processing to represent the spectral envelop of a speech digital signal in a compressed way (using the information of linear prediction model) [14]. This technique is one of the most powerful to analyze the speech, and one of the most useful methods for encoding with good quality at low rate. LPC starts with the assumption that the speech signal is produced by a buzz at the end of a tube, adding, sometimes, hissing and popping sounds. This model is a good approximation to the reality. The glottis produces the buzz, which is characterized by his intensity (loudness) and frequency (pitch). The vocal tract generates a tube which is characterized by his resonances, called formants. The lips, tongue and throat generate the hisses and pops sounds. LPC analyzes the speech signal using the formants, removing their effect from the speech signal and estimating the intensity and frequency of the remaining speech signal buzz. The removing formants process is called inverse filtering and the remaining signal after the subtraction is called residue. The numbers which describe the frequency and intensity of the buzz, the formants and the residue signal can be stored or transmitted. The basic problem of the LPC system is to determine the formants from the original signal. The solution is to express each sample as a linear combination of previous samples. This equation is called linear predictor. The coefficients of the equation (the prediction coefficients) characterize the formants, so we use the LPC system to estimate these coefficients [14]. 2.3 TRANSFORM DOMAIN METHOD 2.3.1 DFT BASED (STSA METHODS) These are most popular as they have less computational complexity and easy implementation. They use short time DFT (STDFT) and have been intensively investigated; also known as spectral processing methods [6]. They are based on the fact that human speech perception is not sensitive to spectral phase but the clean spectral amplitude must be properly extracted from the noisy speech to have acceptable quality speech at output and hence they are called short time spectral amplitude (STSA) based methods. 2.3.2 SIGNAL SUBSPACE METHOD The approach involves the use of a signal dependant transform to decompose a noisy signal into two separate subspaces, the signal plus noise subspace, and the noise-only subspace. The transform employed to perform this operation is the Karhuenen-Loeve transform (KLT) [6]. This theory assumes that speech can only span the signal plus noise subspace, for simplicity called the signal subspace, while noise can span the entire Euclidean space. Only the signal subspace is used when estimating the clean signal. The KLT components which represent the noise only subspace are nulled, while the components which represent the noisy signal are modified by a gain function. The enhanced signal is determined from the inverse KLT of the altered components. The aim here is to improve the quality, while minimising any loss in intelligibility. Figure 3: Block diagram of Subspace speech enhancement system [6] The enhanced speech, produced by the signal subspace with adaptive noise estimation (SSANE) algorithm, is of a good, natural-sounding quality and contains no audible noise [6]. Volume 2, Issue 5, May 2013 Page 218 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 5, May 2013 ISSN 2319 - 4847 However, this algorithm can only update the noise estimate when speech is absent, and suffers degradation in performance in many different noise types. 3. EXPERIMENTAL RESULTS AND PERFORMANCES Boll [15] performed intelligibility and quality measure tests using the Diagnostic rhyme Test(DRT).Result indicated that spectral subtraction did not decrease speech intelligibility but improve speech quality particularly in the area of pleasantness and inconspicuousness of the background noise. Paurav Goel et. al. [16] discussed in this paper various techniques to reduce the musical noise from the noisy speech signal. This musical noise can be reduced to a certain limit by using the spectral subtraction techniques using modulation domain and geometric approach. When both subjective and objective test were performed on the modulation approach then we get the improved speech quality. Liuyang Gao et. al. [17] proposed that this paper presents and analyses a new speech enhancement algorithm based on improved spectral subtraction. Improved spectral subtraction algorithm accurately estimates the noise according to that the amplitude spectral of narrowband white Gaussian noise obeys Rayleigh distribution, based on that all noise can be changed into AWGN. This algorithm also adopts a new speech activity detection technology based on frequency band variance to detect signal activity. The emulational analyses indicates that the algorithm in this paper is better suit for speech enhancement by removing the noise in comparison to standard spectral subtraction. Anuradha R fukane [18] et. al. describes different approaches of spectral subtraction method for enhancing the speech signal from the noisy environments. The authors say that the clean signal’s quality is degraded by the additive background noise. Among all the available methods the spectral subtraction algorithm is historically one of the first algorithm proposed for background noise reduction. In this paper the author present the review of basic spectral subtraction algorithm such as spectral subtraction with over subtraction, non linear spectral subtraction and MMSE spectral subtraction based on the perceptual properties that minimizes the limitations of the basic methods. Y. Ephraim et. Al[ 19] describes subspace filtering produces much less musical noise than spectral subtraction does. Also, for improved speech recognition accuracy in noisy environments, SVD-based speech enhancement turned out to be highly competitive with spectral subtraction.Overall, the MV estimator—including its generalization to the TDC estimator and the SDC estimator proved to give the best results. Kris Hermus et al [20] found that KLT-based speech enhancement is to be preferred over FFT-based (i.e., spectral subtraction) algorithms, even though the latter operates at a (much) lower computational load. K. Ramalakshmi[1] proposed algorithm for signal subspace speech enhancement is implemented and tested using speech file sampled at frequency of 8 KH at 16 bits rate. The speech wave file is converted in to 16 bits ASCII values. The raw values are applied to Karhunen loeve transform to separate the speech and noise signal. The sample input value with speech and noise and sampled output signal shown in figure below: Figure 4: Clean signal[1] Figure5: Clean and Noise speech signal[1] Volume 2, Issue 5, May 2013 Page 219 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 5, May 2013 ISSN 2319 - 4847 Figure6: Enhanced signal[1] 4. PERFORMANCE EVALUATION As an objective measure, segmental signal-to-noise ratio (SNRseg) and weighted spectral slope (WSS) are used in the evaluation [1]. The weighted spectral slope measure is calculated, using the formula 35 d wss ( j ) k sp ( k k ) wa ( k ) S (k ) S ( k ) k 1 2 (10) Where k and are related to overall sound pressure level of the original and enhanced utterances, and is a parameter which can be varied to increase overall performance. Signal-to-noise ratio is used for evaluation of the Quality of random signal transmission[1]. Signal-to-noise ratio in decibels can be calculated, using the formula. (11) Where x (n) and y (n) are speech signals in discrete time. Table 3: Signal to Noise ratio calculations [1] In performance evaluation we calculate the SNR and Weighted spectral slope are calculated by above formulae.SNR calculates the quality of the random signal[1]. Weighted spectral slope improve the overall performance .In table, shown above, both SVD based signal subspace and spectral subtraction noise reduction schemes were tested and compared in enhancing speech signals , which have been degraded by computer generated additive white Gaussian noise at different SNR. 5. CONCLUSION In this paper different speech enhancement techniques have discussed. Spectral subtraction is the basic technique for speech enhancement but it has some severe drawbacks like impossibility to use for non-stationary noise, Dependence on VAD (Voice Activity Detector) accuracy, Musical noise due to imperfect noise estimation. The major Drawback of Wiener filter is the fixed frequency response at all frequencies and the requirement to estimate the power spectral density of the clean signal and noise prior to filtering. One of the most fundamental differences between the Wiener filter and the Kalman filter is the ability of the latter to accommodate non-stationary signals. The signal subspace approach is used for robust and accurate noise estimation in non-stationary environment. Signal subspace speech enhancement has proven to be a powerful and very flexible tool, both for increasing the speech intelligibility in speech communication applications and improving the accuracy of automatic speech recognisers in additive noise environment. 6. REFERENCES [1.] K.Ramalakshmi “Assistant Professor,’ speech enhancement with signal subspace filter based on perceptual post filtering” Dept of CSE Sri Ramakrishna Institute of Technology, Coimbatore ,Vol. 2 Issue 1 January 2013. [2.] Ekaterina Verteletskaya, Boris Simak “ Noise Reduction Based on Modified Spectral Subtraction Method” IAENG International Journal of Computer Science, 38:1, IJCS_38_1_10, 10 February 2011. [3.] Lu-ying SUI, Xiong-wei ZHANG, Jian-jun HUANG, Bin ZHOU“An Improved Spectral Subtraction Speech Enhancement Algorithm under Non-stationary Noise” Institute of Command Automation, PLAUST Nanjing, China,IEEE,2011. [4.] M. A. Abd El-Fattah, M. I. Dessouky, S. M. Diab and F. E. Abd El-samie “ Speech Enhancement using an Adaptive Wiener Filtering approach” Department of Electronics and Electrical communications,Menoufia University ,Menouf, Egypt, Progress In Electromagnetics Research M, Vol. 4, 167–184, 2008. Volume 2, Issue 5, May 2013 Page 220 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 5, May 2013 ISSN 2319 - 4847 [5.] Radu M. Udrea1, Dragoş N. Vizireanu2, Ionuţ Pirnog3 “A Perceptual Approach for Noise Reduction using Nonlinear Spectral Subtraction” sep 26-28,2007. [6.] Barry Commins “Signal Subspace Speech Enhancement with Adaptive Noise Estimation”National University of Ireland, Galway, September 2005. [7.] Chunjian li and Søren Vang Andersen “Integrating Kalman Filtering and Multi-Pulse Coding for Speech Enhancement with a Non-Stationary model of the Speech signal” Department of Communication Technology, Aalborg University, DK-9220 Aalborg Ø, Denmark, 0-7803-8622-1/04/$20.00 ©2004 IEEE 2300 [8.] Chi-Chou Kao andYen-Tai Lai “An Efficient Speech Enhancement method using kalman Filter and Spectral Subtraction” department of information technology and electrical engineering,national pingtung institute of commerce, pingtung, taiwan, national cheng kung university, tainan, taiwan, the 2004 IEEE asia-pacific conference on circuits and systems, december 6-9,2004 [9.] Kris Hermus, PatrickWambacq, and Hugo Van hamme “A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition” Department of Electrical Engineering - ESAT, Katholieke Universiteit Leuven, 3001 Leuven-Heverlee, BelgiumReceived 24 October 2005; Revised 7 March 2006; Accepted 30 April 2006. [10.] Angel de la Torre, Jose C. Segura, Carmen Benitez, Javier Ramirez,Luz Garcia and Antonio J. Rubio “Speech Recognition Under Noise Conditions: Compensation Methods” ,University of Granada ,Spain, Source: Robust Speech Recognition and Understanding, Book edited by: Michael Grimm and Kristian Kroschel, ISBN 987-390213-08-0, pp.460, I-Tech, Vienna, Austria, June 2007. [11.] Yi Hu, Philipos C. Loizou “Subjective comparison and evaluation of speech enhancement algorithms” Department of Electrical Engineering, The University of Texas at Dallas,Richardson, TX 75083-0688, USA Received 2 March 2006; received in revised form 10 July 2006; accepted 18 December 2006.E-mail address: loizou@utdallas.edu (P.C.Loizou). [12.] Jianfen Maa,b,1, Philipos C. Loizou “SNR loss: A new objective measure for predicting the intelligibility of noisesuppressed speech” Taiyuan University of Technology, Shanxi 030024, China, Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75083-0688, United States ,Received 2 December 2009; received in revised form 19 October 2010; accepted 24 October 2010. [13.] Gibak Kim, Yang Lu, Yi Hu, and Philipos C. Loizou“An algorithm that improves speech intelligibility in noise for normal-hearing listeners” Department of Electrical Engineering, University of Texas at Dallas, Richardson, Texas 75080_Received 30 October 2008; revised 27 March 2009; accepted 1 July 2009. [14.] Mostafa Hydari “Speech Signals Enhancement Using LPC Analysisbased on Inverse Fourier Methods” Department of Computer Engineering, Faculty of Engineering Noshirvani,Institute of Technology P.O. Box: 844, Babol, Iran,Contemporary Engineering Sciences, Vol. 2, 2009. [15.] Anuradha R. Fukane, Shashikant L. Sahare“Different Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments”International Journal of Scientific & Engineering Research, Volume 2, Issue 5, May-2011 1 ISSN 2229-5518. [16.] Paruav Goel and Anil Garg “Review of spectral subtraction Technique for speech enhancement” IJECT Vol 2 December 2011. [17.] Liuyang Gao and Yunfei “Speech enhancement algorithm based on improved spectral suntraction “IEEE 2009. [18.] Anuradha R. Fukane, “Different Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments” International Journal of Scientific & Engineering Research, Volume 2, Issue 5, May-2011. [19.] Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251–266, 1995 [20.] Kris Hermus, PatrickWambacq, and Hugo Van hamme, “A Review of Signal Subspace Speech Enhancement and It’s Application to Noise Robust Speech Recognition” Hindawi Publishing Corporation, EURASIP Journal on Advances in Signal Processing Volume 2007, Article ID 45821, doi:10.1155/2007/45821 [21.] Ganga Prasad, Surender “A Review of Different Approaches of Spectral Subtraction Algorithms for Speech Enhancement”Department of Electronics, Madhav Institute of Technology & Science Gwalior, M.P. – 474005 Volume 2, Issue 5, May 2013 Page 221