Speech Processing ECE 5525
Spectral Subtraction Algorithm and Optimization
Wanfeng Zou
7/3/2014

Abstract

Language is the most important, direct, effective, and convenient means of information exchange. With the rapid development of science and technology in recent years, people are no longer satisfied with exchanging information with a computer through the keyboard and mouse, and hope to control the computer by speech instead. Speech signal processing technology arose to meet this need. It is an emerging discipline and also an interdisciplinary one that covers a very wide range of fields. Some speech signal processing systems are now embedded in intelligent systems, but they work only in a quiet environment. In practice, speech acquisition is inevitably subject to various kinds of noise interference. Noise not only reduces speech intelligibility and voice quality; it also lowers processing accuracy and can even prevent the system from working properly.

This paper discusses the principles and methods of speech enhancement. It mainly introduces one method for speech enhancement, the spectral subtraction algorithm, together with an improved version of it. The basic method can effectively suppress stationary additive noise. The improved algorithm reduces the "musical noise" that the basic method produces and noticeably improves the signal-to-noise ratio of the speech.

Keywords: speech signal processing; speech enhancement; spectral subtraction algorithm; improved algorithm

Summary

A speech enhancement algorithm based on spectral subtraction was developed to reduce the disturbance of noise in speech communications. The algorithm uses a Gaussian statistical model to revise the noise spectrum estimate. It then uses a simple method to compute the presence probability of speech in each frequency bin to enhance the speech signal.
Experimental Tools

MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications.

Experimental objects

First, we use the Windows recorder software to record a clean speech signal in WAV format. Next, using MATLAB, we add a sine-wave noise signal (amplitude 0.5, frequency 1000 Hz) to the clean speech to obtain a new audio file.

Code:
clc, clear
% wavread/wavwrite are used here; recent MATLAB releases provide audioread/audiowrite instead
[x, fs, bits] = wavread('11.wav');   % clean speech, sampling rate, bits per sample
N = size(x, 1);                      % number of samples
x1 = x(1:N, 1);                      % first channel
fn = 1000;                           % noise frequency in Hz
t = 1:length(x1);                    % sample index
x2 = 0.5 * sin(2*pi*fn/fs*t);        % sine-wave noise, amplitude 0.5
y = x1 + x2';                        % noisy speech
wavwrite(y, fs, '12.wav');

General spectral subtraction algorithm

Among the many speech enhancement methods, spectral subtraction is one of the most popular because it is easy to implement and requires little computation. Spectral subtraction came into use in the 1980s and became an effective speech enhancement algorithm. Basic spectral subtraction assumes that the signals are stationary and that the noise is additive. It also assumes that the speech signal and the noise are uncorrelated. Under these assumptions the noisy speech signal can be expressed as:

$$y(t) = s(t) + n(t)$$

where $y(t)$ is the noisy speech signal, $s(t)$ is the clean speech signal, and $n(t)$ is the noise signal. With $Y(\omega)$, $S(\omega)$ and $N(\omega)$ representing the Fourier transforms of $y(t)$, $s(t)$ and $n(t)$, the following relationships hold:

$$Y(\omega) = S(\omega) + N(\omega)$$

$$|Y(\omega)|^2 = |S(\omega)|^2 + |N(\omega)|^2 + 2\,\mathrm{Re}\left[S(\omega) N^*(\omega)\right]$$

$$E\left[|Y(\omega)|^2\right] = E\left[|S(\omega)|^2\right] + E\left[|N(\omega)|^2\right] + 2E\left\{\mathrm{Re}\left[S(\omega) N^*(\omega)\right]\right\}$$

Because $s(t)$ and $n(t)$ are independent, $S(\omega)$ and $N(\omega)$ are also independent, so the cross term vanishes on average:

$$E\left\{\mathrm{Re}\left[S(\omega) N^*(\omega)\right]\right\} = 0$$

and

$$|Y(\omega)|^2 = |S(\omega)|^2 + |N(\omega)|^2$$

The "silent frames" before the speech begins can be used to estimate the noise. The original speech can then be estimated, frame by frame, from:

$$|\hat{S}_w(\omega)|^2 = |Y_w(\omega)|^2 - \overline{|N_w(\omega)|^2}$$

where the subscript $w$ marks the spectrum of a windowed frame, $\hat{S}_w(\omega)$ represents the estimated value, and $\overline{|N_w(\omega)|^2}$ is the average power spectrum of the speech-free signal. If the right-hand side is negative, it is either set to 0 or its sign is changed to positive, because a power spectrum cannot be negative.
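As a minimal sketch of this subtraction rule for a single frame, the MATLAB lines below build a synthetic test case instead of using the recorded files. The frame length of 64 samples, the two test tones, and the names d, Pn, Ps and s_hat are assumptions made for illustration only. The sketch also assumes that the phase of the noisy spectrum is reused when returning to the time domain, since the subtraction itself only yields a magnitude.

```matlab
n = 64;                           % frame length (assumed for this sketch)
k = (0:n-1)';                     % sample index within the frame
s = cos(2*pi*4*k/n);              % stand-in for clean speech: tone in bin 4
d = 0.5*sin(2*pi*10*k/n);         % stationary additive noise: tone in bin 10
y = s + d;                        % noisy frame, y(t) = s(t) + n(t)

Pn = abs(fft(d)).^2;              % noise power spectrum from a speech-free frame
Y  = fft(y);                      % spectrum of the noisy frame
Ps = abs(Y).^2 - Pn;              % power spectral subtraction
Ps = max(Ps, 0);                  % negative values set to 0 (abs would flip the sign instead)

% Back to the time domain: estimated magnitude combined with the noisy phase
s_hat = real(ifft(sqrt(Ps) .* exp(1i*angle(Y))));

err = max(abs(s_hat - s));        % largest deviation from the clean tone
fprintf('max reconstruction error: %g\n', err);
```

Because the noise tone in this sketch is perfectly stationary, the noise estimate matches the noise in the frame and the clean tone is recovered up to rounding error. Real noise fluctuates from frame to frame, so the result on recorded speech is less clean.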
So we can obtain the estimate of the original speech magnitude:

$$|\hat{S}_w(\omega)| = \left[\,|Y_w(\omega)|^2 - \overline{|N_w(\omega)|^2}\,\right]^{1/2}$$

Code:
clc, clear
[x, fs, bits] = wavread('12.wav');   % noisy speech
x = x(:, 1);                         % first channel
n = 350;                             % frame length in samples
magN = abs(fft(x(1:n)));             % noise magnitude from the first (silent) frame
numFrames = floor(length(x) / n);    % process complete frames only
x2 = zeros(numFrames * n, 1);        % enhanced output
for i = 0:numFrames-1
    idx = (1:n) + n*i;               % samples of the current frame
    X1 = fft(x(idx));
    S = abs(X1).^2 - magN.^2;        % power spectral subtraction
    S1 = abs(S).^0.5;                % flip negative values, return to magnitude
    % the subtraction yields a magnitude only, so the noisy phase is reused
    x2(idx) = real(ifft(S1 .* exp(1i * angle(X1))));
end
plot(x2);
sound(x2, fs, bits);
wavwrite(x2, fs, '13.wav');

Fig 13.wav

Improved spectral subtraction algorithm

In practice, the noise spectrum follows a Gaussian distribution:

$$P_n(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}$$

where $m$ is the mean of $x$ and $\sigma$ is the standard deviation. Therefore, after noise removal by basic spectral subtraction, some residual components with larger power remain and appear as random spikes in the spectrum. After the inverse Fourier transform, these residuals form a new, rhythmically fluctuating noise in the enhanced speech (musical noise), and this kind of noise cannot be removed by spectral subtraction itself.

To minimize this secondary pollution of the speech by musical noise, spectral subtraction can be improved. In noisy speech, the speech energy is generally concentrated in certain frequencies or frequency bands, while the noise energy is often distributed over the entire frequency range. Therefore, in frames of higher amplitude, subtracting a multiple $\beta\,\overline{|N_w(\omega)|^2}$ of the noise estimate makes the speech power spectrum stand out.

A second improvement modifies how the power spectrum is processed: the exponents $2$ and $1/2$ are replaced by $\alpha$ and $1/\alpha$. Combining these two improvements, the enhanced form of spectral subtraction can be expressed as:

$$|\hat{S}_w(\omega)|^{\alpha} = |Y_w(\omega)|^{\alpha} - \beta\,\overline{|N_w(\omega)|^{\alpha}}$$

When $\alpha = 2$ and $\beta = 1$, this is general spectral subtraction. Here $\alpha$ is the spectral subtraction correction factor; changing its value can further enhance the signal-to-noise ratio. $\beta$ is the spectral subtraction noise coefficient; its role is to reduce the noise power spectrum, and adjusting it serves to reduce noise and highlight the speech spectrum.
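The effect of the two parameters can be illustrated on one synthetic frame. The sketch below is not the listing used in the experiment: the helper name gss, the frame length, the test tones, and the choice of setting negative differences to 0 are all assumptions made for illustration. The noise in the processed frame is made slightly stronger than the noise estimate, to mimic the frame-to-frame fluctuation that leaves residual spikes.

```matlab
% Generalized spectral subtraction on magnitudes (gss is a hypothetical helper name):
% |S|^alpha = |Y|^alpha - beta*|N|^alpha, with negative differences set to 0
gss = @(magY, magN, alpha, beta) max(magY.^alpha - beta*magN.^alpha, 0).^(1/alpha);

n = 64;                            % frame length (assumed for this sketch)
k = (0:n-1)';
s     = cos(2*pi*4*k/n);           % stand-in for speech: tone in bin 4
d_est = 0.5*sin(2*pi*10*k/n);      % noise observed in the silent frame (bin 10)
d     = 0.6*sin(2*pi*10*k/n);      % noise in the current frame, slightly stronger

magY = abs(fft(s + d));            % magnitude spectrum of the noisy frame
magN = abs(fft(d_est));            % noise magnitude estimate

basic = gss(magY, magN, 2, 1);     % alpha = 2, beta = 1: general spectral subtraction
over  = gss(magY, magN, 2, 2);     % beta = 2: subtract twice the noise estimate

% MATLAB indices are 1-based: bin 4 is index 5, bin 10 is index 11
fprintf('speech bin: basic %.2f, over %.2f\n', basic(5),  over(5));
fprintf('noise bin:  basic %.2f, over %.2f\n', basic(11), over(11));
```

With $\beta = 1$ a residual peak remains in the noise bin because the frame contains more noise than the estimate; with $\beta = 2$ that peak is removed while the speech bin is left unchanged. In real speech the two overlap in frequency, so a larger $\beta$ also removes part of the speech.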
Code:
clc, clear
[x, fs, bits] = wavread('12.wav');   % noisy speech
x = x(:, 1);                         % first channel
n = 350;                             % frame length in samples
a = 0.4;                             % exponent applied to the magnitudes before subtraction
b = 0.5;                             % the difference is raised to the power 1/b
magN = abs(fft(x(1:n)));             % noise magnitude from the first (silent) frame
numFrames = floor(length(x) / n);    % process complete frames only
x2 = zeros(numFrames * n, 1);        % enhanced output
for i = 0:numFrames-1
    idx = (1:n) + n*i;               % samples of the current frame
    X1 = fft(x(idx));
    S = abs(X1).^a - magN.^a;        % subtraction in the a-th power domain
    S1 = abs(S).^(1/b);              % flip negative values, apply the outer exponent
    % the subtraction yields a magnitude only, so the noisy phase is reused
    s1 = real(ifft(S1 .* exp(1i * angle(X1))));
    m = 300 * abs(mean(s1));         % amplitude threshold for this frame
    peaks = abs(s1) > m;
    s1(peaks) = s1(peaks) / 4;       % attenuate samples above the threshold
    x2(idx) = s1;
end
plot(x2);
sound(x2, fs, bits);
wavwrite(x2, fs, '14.wav');

Fig 14.wav

Conclusion

Comparing Fig 13.wav and Fig 14.wav, we can clearly see that the speech waveform is significantly improved, and we can also hear less musical noise. However, eliminating musical noise inevitably attenuates the speech as well. Many experiments show that adjusting the exponent $\alpha$ further enhances the signal-to-noise ratio and that changing the noise coefficient $\beta$ serves to reduce noise and highlight the speech spectrum. Values of $\alpha$ and $\beta$ that are too large, however, cause speech distortion. The results show that the improved algorithm eliminates musical noise more effectively and improves the signal-to-noise ratio without significantly impairing speech intelligibility.