
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 5, May 2013
ISSN 2319 - 4847
Review on Speech Enhancement using Signal Subspace method
Nandini Garg1, Jyoti Gupta2
1&2 MMEC Mullana (Ambala), Haryana, INDIA
ABSTRACT
In speech communication, the quality and intelligibility of speech are of utmost importance for ease and accuracy of information exchange. The speech processing systems used to communicate or store speech are usually designed for a noise-free environment, but in a real-world environment the presence of background interference, in the form of additive background and channel noise, drastically degrades the performance of these systems, causing inaccurate information exchange and listener fatigue. Speech enhancement algorithms attempt to improve the performance of communication systems when their input or output signals are corrupted by noise. Speech enhancement in general has three major objectives: (a) to improve perceptual aspects such as the quality and intelligibility of the processed speech, i.e., to make it sound better or clearer to the human listener; (b) to improve the robustness of speech coders, which tend to be severely affected by the presence of noise; and (c) to increase the accuracy of speech recognition systems operating in less than ideal locations. In this paper, various speech enhancement techniques are discussed.
Keywords: speech communication, speech enhancement, background interference
1. INTRODUCTION
Speech is the most natural form of human communication. The perception of a speech signal is usually measured in terms of its quality and intelligibility. Quality is a subjective measure that indicates the pleasantness or naturalness of the perceived speech. Intelligibility is an objective measure that predicts the percentage of words that listeners can correctly identify. Enhancement means an improvement in the value or quality of something. Applied to speech, it means improving the intelligibility and/or quality of a degraded speech signal using signal processing tools. Speech enhancement refers not only to noise reduction but also to dereverberation and separation of independent signals [21]. This is a very difficult problem for two reasons. First, the nature and characteristics of the noise signals can change dramatically over time and between applications, so it is difficult to find algorithms that work well in different practical environments. Second, the performance measure can be defined differently for each application. Two criteria are often used to measure performance, quality and intelligibility, and it is very hard to satisfy both at the same time.
Several techniques have been proposed for this purpose. The basic method for speech enhancement is the spectral subtraction approach, which is simple and easy to implement. Other methods for speech enhancement include iterative Wiener filtering, Kalman filtering, linear predictive coding (LPC) analysis and the signal subspace method [1]–[5]. These techniques are assessed by the quality and intelligibility of the processed speech signal. Improvement in the speech signal-to-noise ratio (SNR) is the target of most techniques [4]. Enhancement techniques may be classified as single-channel, dual-channel or multi-channel. Single-channel enhancement techniques apply to situations in which only one acquisition channel is available. Spectral subtraction is a well-known single-channel noise reduction technique [5]. Conventional power spectral subtraction substantially reduces the noise level in the noisy speech, but it introduces an annoying distortion in the speech signal called musical noise. Modified approaches (spectral subtraction with over-subtraction, non-linear spectral subtraction, multiband spectral subtraction, etc.) are therefore used to reduce the musical noise and improve the quality and intelligibility of the signal. Evaluation of spectral subtractive algorithms revealed that they improve speech quality but have little effect on the intelligibility of the speech signal. The signal subspace method [6] was therefore proposed to allow better suppression of the noise. Its aim is to improve quality while minimising any loss in intelligibility. This is a review paper, and its objective is to provide an overview of the variety of speech enhancement algorithms that have been proposed to improve speech quality and intelligibility.
2. SPEECH ENHANCEMENT METHODS
The main objective of a speech enhancement technique is to improve the quality of the signal while minimizing the loss in intelligibility and reducing listener fatigue. The basic overview is shown in Figure 1:
Figure 1: Basic overview of speech enhancement system [21].
Various speech enhancement methods have been proposed to reduce noise and to improve speech quality and intelligibility. Speech enhancement can be done in both the time domain and the transform domain [3]. Time domain techniques use Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, linear predictive coefficients, Kalman filtering, Hidden Markov Models, etc.
Transform domain techniques first apply a transformation to the noisy speech, then filter it, and finally apply the corresponding inverse transformation in order to restore the original signal [3]. The main advantage of performing the noise filtering or reduction in the transform domain lies in the relative ease of distinguishing and removing noise from speech.
2.1 SPECTRAL SUBTRACTION METHOD
The most basic method for speech enhancement is spectral subtraction. This section gives a brief overview of conventional spectral subtraction [3]. Spectral subtraction assumes that a signal is composed of two additive components. The noisy speech can be expressed as
$$ y(n) = s(n) + d(n) \tag{1} $$
where $n$ is the time index, $s(n)$ represents the uncorrupted speech signal, $d(n)$ represents the additive noise signal and $y(n)$ is the corrupted speech signal available for processing. The observed signal $y(n)$ is divided into overlapping frames by applying a window function, and the processing is carried out in the short-time Fourier transform (STFT) magnitude domain. In the frequency domain this can be represented as
$$ Y(\omega) = S(\omega) + D(\omega) \tag{2} $$
The power spectrum of the clean speech can be estimated from that of the noisy speech as
$$ |\hat{S}(\omega)|^2 = |Y(\omega)|^2 - E\left(|D(\omega)|^2\right) \tag{3} $$
where $E(|D(\omega)|^2)$ is the statistical average of $|D(\omega)|^2$ during the non-speech stage. The enhanced speech amplitude is therefore
$$ |\hat{S}(\omega)| = \left[\, |Y(\omega)|^2 - E\left(|D(\omega)|^2\right) \right]^{1/2} \tag{4} $$
which is combined with the phase of the noise-corrupted signal to re-synthesize the signal:
$$ \hat{S}(\omega) = |\hat{S}(\omega)| \, e^{j \arg Y(\omega)} \tag{5} $$
The inverse short-time Fourier transform is then performed to return the signal to the time domain. The conventional spectral subtraction algorithm estimates the noise energy during the non-speech stage; however, it cannot update the noise estimate during the speech stage. The method also requires a voice activity detector (VAD), which might not work well at low SNR.
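The processing chain of Eqs. (1) to (5) can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of [3]: the function name, the frame length, the hop size and the assumption that the first `noise_frames` frames contain noise only (used here in place of a VAD) are all choices made for this example, and the input is assumed to be at least one frame long.

```python
import numpy as np

def spectral_subtraction(y, frame_len=256, hop=128, noise_frames=5):
    """Basic power spectral subtraction following Eqs. (1)-(5).

    Assumption for this sketch: the first `noise_frames` frames are
    noise-only, which stands in for a voice activity detector.
    """
    y = np.asarray(y, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    # Windowed, overlapping frames and their short-time spectra Y(w).
    frames = np.stack([y[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    Y = np.fft.rfft(frames, axis=1)
    # E(|D(w)|^2): average noise power over the assumed non-speech frames.
    noise_power = np.mean(np.abs(Y[:noise_frames]) ** 2, axis=0)
    # Eqs. (3)-(4): subtract the noise power, clipping negative values to zero.
    clean_power = np.maximum(np.abs(Y) ** 2 - noise_power, 0.0)
    # Eq. (5): estimated magnitude combined with the phase of the noisy signal.
    S_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(Y))
    # Inverse STFT by overlap-add (Hann windows at 50% overlap sum to about 1).
    out = np.zeros(len(y))
    for i, frame in enumerate(np.fft.irfft(S_hat, n=frame_len, axis=1)):
        out[i * hop:i * hop + frame_len] += frame
    return out
```

Clipping the subtracted power at zero leaves isolated spectral peaks in some frames, which is one way to see where the musical noise discussed in this section comes from.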
Spectral subtraction is fairly straightforward to implement and reduces the noise significantly, but it has some severe drawbacks [1]: (a) it cannot be used for non-stationary noise, (b) it depends on the accuracy of the voice activity detector (VAD), and (c) it produces musical noise due to imperfect noise estimation.
Over time, spectral subtraction has undergone many modifications [21], including spectral subtraction with over-subtraction, non-linear spectral subtraction and multiband spectral subtraction. The original method is modified by subtracting an overestimate of the noise power spectrum and preventing the resultant spectrum from going below a preset minimum spectral floor value. This modification minimizes the perception of narrow spectral excursions and thus lowers the musical noise effect.
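The over-subtraction rule described above can be written as a small function applied to each frame's power spectrum. This is a sketch under stated assumptions: the names `alpha` (over-subtraction factor) and `beta` (spectral floor factor) and their default values are illustrative and are not taken from [21].

```python
import numpy as np

def oversubtract(noisy_power, noise_power, alpha=4.0, beta=0.01):
    """Spectral subtraction with over-subtraction and a spectral floor.

    alpha and beta are hypothetical illustrative values.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # Subtract an overestimate (alpha > 1) of the noise power spectrum.
    subtracted = noisy_power - alpha * noise_power
    # Preset minimum: a small fraction of the noise power in each bin.
    floor = beta * noise_power
    # Keep the subtracted value only where it stays above the floor.
    return np.where(subtracted > floor, subtracted, floor)
```

For a bin with noisy power 10 and noise power 1, the defaults give 10 - 4 = 6; for a bin with noisy power 1 the subtraction would go negative, so the floor value 0.01 is used instead.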
Evaluation of spectral subtractive algorithms revealed that they improve speech quality but have little effect on the intelligibility of the speech signal. Other methods are therefore also used for speech enhancement.
Figure 2: Spectral Subtraction Process Flow Diagram [21].
2.2 TIME DOMAIN METHODS
2.2.1 WIENER FILTERING
The Wiener filter for speech enhancement was suggested as an improvement to spectral subtraction by Lim and Oppenheim in December 1979. The Wiener filter is a popular technique [4] that has been used in many signal enhancement methods. Its basic principle is to obtain an estimate of the clean signal from the signal corrupted by additive noise. This estimate is obtained by minimizing the Mean Square Error (MSE) between the desired signal $s(n)$ and the estimated signal $\hat{s}(n)$. The frequency domain solution to this optimization problem gives the following filter transfer function:
$$ H(\omega) = \frac{P_s(\omega)}{P_s(\omega) + P_v(\omega)} \tag{6} $$
where $P_s(\omega)$ and $P_v(\omega)$ are the power spectral densities of the clean and the noise signals, respectively. This formula can be derived by considering the signal $s$ and the noise $v$ as uncorrelated and stationary signals. The SNR is defined by [13]:
$$ SNR = \frac{P_s(\omega)}{P_v(\omega)} \tag{7} $$
This definition can be incorporated into the Wiener filter equation as follows:
$$ H(\omega) = \left[ 1 + \frac{1}{SNR} \right]^{-1} \tag{8} $$
The drawback [4] of the Wiener filter is its fixed frequency response at all frequencies and the requirement to estimate the power spectral densities of the clean signal and the noise prior to filtering.
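A short numerical check makes the link between the power-spectrum form and the SNR form of the Wiener gain concrete. The function names below are chosen for this example only; the SNR form is evaluated as SNR / (1 + SNR), which is algebraically the same as [1 + 1/SNR]^(-1) but avoids a division by zero when the SNR is zero.

```python
import numpy as np

def wiener_gain(p_s, p_v):
    """Wiener gain from the clean and noise power spectral densities."""
    p_s = np.asarray(p_s, dtype=float)
    p_v = np.asarray(p_v, dtype=float)
    return p_s / (p_s + p_v)

def wiener_gain_from_snr(snr):
    """Same gain expressed through SNR = P_s / P_v (linear scale, not dB)."""
    snr = np.asarray(snr, dtype=float)
    # Equivalent to 1 / (1 + 1/SNR), written so that SNR = 0 gives gain 0.
    return snr / (1.0 + snr)

# Three frequency bins with different clean-to-noise power ratios.
p_s = np.array([3.0, 1.0, 0.0])
p_v = np.array([1.0, 1.0, 1.0])
gains = wiener_gain(p_s, p_v)            # 0.75, 0.5 and 0.0
gains_snr = wiener_gain_from_snr(p_s / p_v)
```

The gain approaches 1 in bins where speech dominates and 0 in bins where noise dominates, so each bin is attenuated according to its own SNR.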
2.2.2 KALMAN FILTERING
The Kalman filter can be seen as a generalization of the Wiener filter. The speech model used in [7] consists of a slowly varying AR model together with an excitation model. The AR model and the excitation model fit nicely into the Kalman filtering framework, fully exploiting the capability of the Kalman filter to process non-stationary signals in an LMMSE-optimal manner.
The AR model coefficients are estimated by a decision-directed type of power spectral subtraction followed by an LPC analysis. For robust estimation of the rapidly time-varying excitation model in the presence of noise, a Multi-Pulse Linear Predictive Coding (MPLPC) based method is used. In short, the Kalman filter combines all the available measured data with knowledge of the system and the measurement devices to produce an estimate of the desired variables in such a manner that the error is statistically minimized. One of the most fundamental differences between the Wiener filter and the Kalman filter is the ability of the latter to accommodate non-stationary signals.
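The predict and update recursion behind this description can be illustrated with a deliberately simplified case. The sketch below assumes a first-order AR speech model with known coefficient `a`, known excitation variance `q` and known noise variance `r`; it is far simpler than the AR plus MPLPC scheme of [7], and all names and parameters are hypothetical.

```python
import numpy as np

def kalman_ar1(y, a, q, r, p0=1.0):
    """Scalar Kalman filter for s(n) = a*s(n-1) + w(n), y(n) = s(n) + v(n).

    q = var(w) and r = var(v) are assumed known in this sketch.
    """
    s_est, p = 0.0, p0
    out = np.empty(len(y))
    for n, obs in enumerate(y):
        # Predict the state and its error variance from the AR model.
        s_pred = a * s_est
        p_pred = a * a * p + q
        # Update with the new noisy observation.
        k = p_pred / (p_pred + r)          # Kalman gain
        s_est = s_pred + k * (obs - s_pred)
        p = (1.0 - k) * p_pred
        out[n] = s_est
    return out
```

Because `a`, `q` and `r` enter the recursion at every sample, they could be changed from frame to frame, which is how the filter accommodates non-stationary signals.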
2.2.3 LINEAR PREDICTIVE CODING
Linear predictive coding (LPC) is a tool used mainly in audio signal and speech processing to represent the spectral envelope of a digital speech signal in compressed form, using the information of a linear prediction model [14]. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate.
LPC starts from the assumption that the speech signal is produced by a buzz at the end of a tube, with occasional added hissing and popping sounds. This model is a good approximation of reality. The glottis produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract forms the tube, which is characterized by its resonances, called formants. The lips, tongue and throat generate the hisses and pops. LPC analyzes the speech signal by estimating the formants, removing their effect from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the signal remaining after the subtraction is called the residue. The numbers that describe the frequency and intensity of the buzz, the formants and the residue signal can be stored or transmitted.
The basic problem of the LPC system is to determine the formants from the original signal. The solution is to express each sample as a linear combination of previous samples. This equation is called the linear predictor. The coefficients of the equation (the prediction coefficients) characterize the formants, so the LPC system is used to estimate these coefficients [14].
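The linear predictor and the inverse filtering step can be sketched as follows. The example uses the autocorrelation method and solves the resulting normal equations directly with NumPy; this is one common way to estimate the prediction coefficients and is not necessarily the procedure used in [14]. The function names are chosen for this illustration.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Prediction coefficients a[k] such that x(n) ~ sum_k a[k] * x(n-k-1)."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def lpc_residual(x, a):
    """Inverse filtering: subtract the linear prediction from each sample."""
    x = np.asarray(x, dtype=float)
    pred = np.zeros(len(x))
    for k, ak in enumerate(a, start=1):
        pred[k:] += ak * x[:-k]
    return x - pred
```

In practice the analysis would be run frame by frame on windowed speech, with the coefficients and the residue stored or transmitted for each frame.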
2.3 TRANSFORM DOMAIN METHOD
2.3.1 DFT BASED (STSA METHODS)
These are the most popular methods, as they have low computational complexity and are easy to implement. They use the short-time DFT (STDFT), have been intensively investigated, and are also known as spectral processing methods [6]. They are based on the fact that human speech perception is not sensitive to spectral phase, whereas the clean spectral amplitude must be properly extracted from the noisy speech to obtain acceptable speech quality at the output; hence they are called short-time spectral amplitude (STSA) based methods.
2.3.2 SIGNAL SUBSPACE METHOD
The approach uses a signal-dependent transform to decompose a noisy signal into two separate subspaces: the signal-plus-noise subspace and the noise-only subspace. The transform employed for this operation is the Karhunen-Loève transform (KLT) [6].
The theory assumes that speech can only span the signal-plus-noise subspace, called the signal subspace for simplicity, while noise can span the entire Euclidean space. Only the signal subspace is used when estimating the clean signal. The KLT components that represent the noise-only subspace are nulled, while the components that represent the noisy signal are modified by a gain function. The enhanced signal is obtained from the inverse KLT of the altered components. The aim is to improve quality while minimising any loss in intelligibility.
Figure 3: Block diagram of Subspace speech enhancement system [6]
The enhanced speech, produced by the signal subspace with adaptive noise estimation (SSANE) algorithm, is of a good,
natural-sounding quality and contains no audible noise [6].
However, this algorithm can only update the noise estimate when speech is absent, and its performance degrades under many different noise types.
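The subspace decomposition described in this section can be sketched with an eigendecomposition of the sample covariance matrix. This is a minimal illustration under stated assumptions, not the SSANE algorithm of [6]: the noise is assumed white with a known variance `noise_var`, the signal is cut into non-overlapping vectors of length `dim`, and a Wiener-type gain with a tuning factor `mu` is used as the gain function. All of these names and choices are hypothetical.

```python
import numpy as np

def subspace_enhance(y, noise_var, dim=20, mu=1.0):
    """KLT-based subspace enhancement assuming white noise of known variance."""
    y = np.asarray(y, dtype=float)
    n_vec = len(y) // dim
    Y = y[:n_vec * dim].reshape(n_vec, dim)      # rows are signal vectors
    R_y = Y.T @ Y / n_vec                        # sample covariance (zero mean assumed)
    eigvals, U = np.linalg.eigh(R_y)             # KLT basis: eigenvectors of R_y
    # Eigenvalues of the clean signal; non-positive ones mark the noise-only subspace.
    clean_eig = np.maximum(eigvals - noise_var, 0.0)
    # Gain per KLT component: zero in the noise subspace, Wiener-type elsewhere.
    gain = clean_eig / (clean_eig + mu * noise_var)
    H = U @ np.diag(gain) @ U.T                  # KLT, gain, inverse KLT
    return (Y @ H.T).reshape(-1)
```

Larger values of `mu` suppress more residual noise at the cost of more signal distortion, which reflects the quality versus intelligibility trade-off mentioned above.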
3. EXPERIMENTAL RESULTS AND PERFORMANCES
Boll [15] performed intelligibility and quality measurement tests using the Diagnostic Rhyme Test (DRT). The results indicated that spectral subtraction did not decrease speech intelligibility but improved speech quality, particularly in terms of the pleasantness and inconspicuousness of the background noise.
Paurav Goel et al. [16] discussed various techniques to reduce the musical noise in noisy speech signals. This musical noise can be reduced to a certain extent by spectral subtraction techniques that use the modulation domain and a geometric approach. When both subjective and objective tests were performed on the modulation approach, improved speech quality was obtained.
Liuyang Gao et al. [17] presented and analysed a new speech enhancement algorithm based on improved spectral subtraction. The improved algorithm estimates the noise accurately using the fact that the amplitude spectrum of narrowband white Gaussian noise obeys a Rayleigh distribution, on the basis that all noise can be converted into AWGN. The algorithm also adopts a new speech activity detection technique based on frequency band variance to detect signal activity. The simulation analysis indicates that the algorithm is better suited to speech enhancement than standard spectral subtraction.
Anuradha R. Fukane et al. [18] describe different approaches to spectral subtraction for enhancing speech signals in noisy environments. The authors state that the quality of the clean signal is degraded by additive background noise. Among all the available methods, spectral subtraction is historically one of the first algorithms proposed for background noise reduction. The authors review the basic spectral subtraction algorithm and variants such as spectral subtraction with over-subtraction, non-linear spectral subtraction and MMSE spectral subtraction based on perceptual properties, which reduce the limitations of the basic method.
Y. Ephraim et al. [19] report that subspace filtering produces much less musical noise than spectral subtraction. Also, for improved speech recognition accuracy in noisy environments, SVD-based speech enhancement turned out to be highly competitive with spectral subtraction. Overall, the MV estimator, including its generalizations to the TDC estimator and the SDC estimator, proved to give the best results.
Kris Hermus et al. [20] found that KLT-based speech enhancement is to be preferred over FFT-based (i.e., spectral subtraction) algorithms, even though the latter operate at a (much) lower computational load.
K. Ramalakshmi [1] proposed an algorithm for signal subspace speech enhancement, which was implemented and tested using a speech file sampled at 8 kHz with 16 bits per sample. The speech wave file is converted into 16-bit ASCII values. The raw values are passed through the Karhunen-Loève transform to separate the speech and noise signals. The sample input with speech and noise and the sampled output signal are shown in the figures below:
Figure 4: Clean signal [1]
Figure 5: Clean and Noise speech signal [1]
Figure 6: Enhanced signal [1]
4. PERFORMANCE EVALUATION
As objective measures, the segmental signal-to-noise ratio (SNRseg) and the weighted spectral slope (WSS) are used in the evaluation [1]. The weighted spectral slope measure is calculated using the formula
$$ d_{WSS}(j) = K_{spl}\,(K - \hat{K}) + \sum_{k=1}^{35} w_a(k)\,\bigl( S(k) - \hat{S}(k) \bigr)^2 \tag{10} $$
where $K$ and $\hat{K}$ are related to the overall sound pressure level of the original and enhanced utterances, and $K_{spl}$ is a parameter which can be varied to increase overall performance.
Signal-to-noise ratio is used to evaluate the quality of random signal transmission [1]. The signal-to-noise ratio in decibels can be calculated using the formula
(11)
where x(n) and y(n) are speech signals in discrete time.
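As an illustration of how such measures are typically computed, the sketch below uses the usual definition of SNR in decibels and a frame-averaged segmental SNR. It is written under assumptions that the text does not state: x(n) is taken to be the clean reference signal, y(n) the enhanced signal, and the per-frame values are limited to the range of -10 dB to 35 dB, a commonly used convention. It is not necessarily the exact formula used in [1].

```python
import numpy as np

def snr_db(x, y, eps=1e-12):
    """Global SNR in dB; x is assumed clean, y the processed signal."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    noise = x - y
    # eps guards against division by zero and log of zero.
    return 10.0 * np.log10((np.sum(x ** 2) + eps) / (np.sum(noise ** 2) + eps))

def segmental_snr_db(x, y, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clipped to [lo, hi] dB (assumed limits)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_frames = len(x) // frame_len
    vals = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        vals.append(np.clip(snr_db(x[seg], y[seg]), lo, hi))
    return float(np.mean(vals))
```

Averaging per-frame values gives quiet segments the same weight as loud ones, which is why the segmental measure can differ noticeably from the global SNR on real speech.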
Table 3: Signal to Noise ratio calculations [1]
In the performance evaluation, the SNR and the weighted spectral slope are calculated using the above formulae. The SNR measures the quality of the random signal [1], and the parameter of the weighted spectral slope measure can be varied to improve overall performance. In the table above, both the SVD-based signal subspace and the spectral subtraction noise reduction schemes were tested and compared on speech signals degraded by computer-generated additive white Gaussian noise at different SNRs.
5. CONCLUSION
In this paper, different speech enhancement techniques have been discussed. Spectral subtraction is the basic technique for speech enhancement, but it has some severe drawbacks: it cannot be used for non-stationary noise, it depends on the accuracy of the voice activity detector (VAD), and it produces musical noise due to imperfect noise estimation. The major drawback of the Wiener filter is its fixed frequency response at all frequencies and the requirement to estimate the power spectral densities of the clean signal and the noise prior to filtering. One of the most fundamental differences between the Wiener filter and the Kalman filter is the ability of the latter to accommodate non-stationary signals. The signal subspace approach is used for robust and accurate noise estimation in non-stationary environments. Signal subspace speech enhancement has proven to be a powerful and very flexible tool, both for increasing speech intelligibility in speech communication applications and for improving the accuracy of automatic speech recognisers in additive noise environments.
6. REFERENCES
[1.] K. Ramalakshmi (Assistant Professor, Dept. of CSE, Sri Ramakrishna Institute of Technology, Coimbatore), “Speech enhancement with signal subspace filter based on perceptual post filtering,” Vol. 2, Issue 1, January 2013.
[2.] Ekaterina Verteletskaya, Boris Simak, “Noise Reduction Based on Modified Spectral Subtraction Method,” IAENG International Journal of Computer Science, 38:1, IJCS_38_1_10, 10 February 2011.
[3.] Lu-ying SUI, Xiong-wei ZHANG, Jian-jun HUANG, Bin ZHOU, “An Improved Spectral Subtraction Speech Enhancement Algorithm under Non-stationary Noise,” Institute of Command Automation, PLAUST, Nanjing, China, IEEE, 2011.
[4.] M. A. Abd El-Fattah, M. I. Dessouky, S. M. Diab and F. E. Abd El-samie, “Speech Enhancement using an Adaptive Wiener Filtering approach,” Department of Electronics and Electrical Communications, Menoufia University, Menouf, Egypt, Progress In Electromagnetics Research M, Vol. 4, 167–184, 2008.
[5.] Radu M. Udrea, Dragoş N. Vizireanu, Ionuţ Pirnog, “A Perceptual Approach for Noise Reduction using Nonlinear Spectral Subtraction,” Sep 26-28, 2007.
[6.] Barry Commins, “Signal Subspace Speech Enhancement with Adaptive Noise Estimation,” National University of Ireland, Galway, September 2005.
[7.] Chunjian Li and Søren Vang Andersen, “Integrating Kalman Filtering and Multi-Pulse Coding for Speech Enhancement with a Non-Stationary model of the Speech signal,” Department of Communication Technology, Aalborg University, DK-9220 Aalborg Ø, Denmark, 0-7803-8622-1/04/$20.00 ©2004 IEEE 2300.
[8.] Chi-Chou Kao and Yen-Tai Lai, “An Efficient Speech Enhancement method using Kalman Filter and Spectral Subtraction,” Department of Information Technology and Electrical Engineering, National Pingtung Institute of Commerce, Pingtung, Taiwan, National Cheng Kung University, Tainan, Taiwan, The 2004 IEEE Asia-Pacific Conference on Circuits and Systems, December 6-9, 2004.
[9.] Kris Hermus, Patrick Wambacq, and Hugo Van hamme, “A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition,” Department of Electrical Engineering - ESAT, Katholieke Universiteit Leuven, 3001 Leuven-Heverlee, Belgium, Received 24 October 2005; Revised 7 March 2006; Accepted 30 April 2006.
[10.] Angel de la Torre, Jose C. Segura, Carmen Benitez, Javier Ramirez, Luz Garcia and Antonio J. Rubio, “Speech Recognition Under Noise Conditions: Compensation Methods,” University of Granada, Spain, Source: Robust Speech Recognition and Understanding, Book edited by: Michael Grimm and Kristian Kroschel, ISBN 987-390213-08-0, pp. 460, I-Tech, Vienna, Austria, June 2007.
[11.] Yi Hu, Philipos C. Loizou, “Subjective comparison and evaluation of speech enhancement algorithms,” Department of Electrical Engineering, The University of Texas at Dallas, Richardson, TX 75083-0688, USA, Received 2 March 2006; received in revised form 10 July 2006; accepted 18 December 2006. E-mail address: loizou@utdallas.edu (P. C. Loizou).
[12.] Jianfen Ma, Philipos C. Loizou, “SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech,” Taiyuan University of Technology, Shanxi 030024, China, Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75083-0688, United States, Received 2 December 2009; received in revised form 19 October 2010; accepted 24 October 2010.
[13.] Gibak Kim, Yang Lu, Yi Hu, and Philipos C. Loizou, “An algorithm that improves speech intelligibility in noise for normal-hearing listeners,” Department of Electrical Engineering, University of Texas at Dallas, Richardson, Texas 75080, Received 30 October 2008; revised 27 March 2009; accepted 1 July 2009.
[14.] Mostafa Hydari, “Speech Signals Enhancement Using LPC Analysis based on Inverse Fourier Methods,” Department of Computer Engineering, Faculty of Engineering, Noshirvani Institute of Technology, P.O. Box: 844, Babol, Iran, Contemporary Engineering Sciences, Vol. 2, 2009.
[15.] Anuradha R. Fukane, Shashikant L. Sahare, “Different Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments,” International Journal of Scientific & Engineering Research, Volume 2, Issue 5, May 2011, ISSN 2229-5518.
[16.] Paurav Goel and Anil Garg, “Review of spectral subtraction Technique for speech enhancement,” IJECT Vol 2, December 2011.
[17.] Liuyang Gao and Yunfei, “Speech enhancement algorithm based on improved spectral subtraction,” IEEE 2009.
[18.] Anuradha R. Fukane, “Different Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments,” International Journal of Scientific & Engineering Research, Volume 2, Issue 5, May 2011.
[19.] Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251–266, 1995.
[20.] Kris Hermus, Patrick Wambacq, and Hugo Van hamme, “A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition,” Hindawi Publishing Corporation, EURASIP Journal on Advances in Signal Processing, Volume 2007, Article ID 45821, doi:10.1155/2007/45821.
[21.] Ganga Prasad, Surender, “A Review of Different Approaches of Spectral Subtraction Algorithms for Speech Enhancement,” Department of Electronics, Madhav Institute of Technology & Science, Gwalior, M.P. – 474005.