Speech processing
ECE 5525
Spectral subtraction algorithm and its optimization
Wanfeng Zou
7/3/2014
Abstract
Language is the most important, direct, effective, and convenient means of exchanging
information. With the rapid development of science and technology in recent years,
people are no longer satisfied with the existing ways of interacting with computers;
they hope to do away with the keyboard and mouse and control the computer by speech.
Speech signal processing technology emerged to meet this need. It is an emerging
discipline and also an interdisciplinary one that draws on many fields and covers a
very wide range of topics.
Some speech signal processing systems are now embedded in intelligent systems, but
they work well only in quiet environments. In practice, speech acquisition is
inevitably subject to various kinds of noise interference. Noise not only reduces
speech intelligibility and quality, it also degrades the accuracy of speech processing
and can even prevent the system from working properly. This paper discusses the
principles and methods of speech enhancement, focusing on one method, the spectral
subtraction algorithm, and an improved version of it. The basic method can effectively
suppress stationary additive noise. The improved algorithm reduces the "musical noise"
produced by the basic method and noticeably improves the signal-to-noise ratio.
Keywords: speech signal processing; speech enhancement; spectral subtraction algorithm;
improved algorithm
Summary
A speech enhancement algorithm was developed based on spectral subtraction to
reduce the disturbances of noise on speech communications. The algorithm uses a
Gaussian statistical model to revise the noise spectrum estimate for the speech
enhancement. The algorithm then uses a simple method to compute the presence
probability of speech in each frequency bin to enhance the speech signal.
Experimental Tools
MATLAB is a high-level language and interactive environment for numerical
computation, visualization, and programming. Using MATLAB, you can analyze data,
develop algorithms, and create models and applications.
Experimental objects
First we use the Windows Sound Recorder to record a clean speech signal in WAV format.
Next, using MATLAB, we add a sine-wave noise signal (amplitude 0.5, frequency 1000 Hz)
to the clean speech to produce a new audio file.
Code:
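clc; clear;
[x, fs] = audioread('11.wav');   % clean speech (audioread replaces wavread, removed from recent MATLAB)
x1 = x(:, 1);                    % use the first channel only
fn = 1000;                       % noise frequency in Hz
t = (1:length(x1))';             % sample indices as a column vector
x2 = 0.5 * sin(2*pi*fn/fs*t);    % sine-wave noise, amplitude 0.5
y = x1 + x2;                     % noisy speech
audiowrite('12.wav', y, fs);     % samples outside [-1, 1] are clipped in a 16-bit WAV file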
General Spectral subtraction algorithm
Among the many speech enhancement methods, spectral subtraction is one of the most
popular because it is easy to implement and computationally inexpensive. It came into
use in the 1980s and has become an effective speech enhancement algorithm.
Basic spectral subtraction assumes that the speech signal is stationary and that the
noise is additive and uncorrelated with the speech. Under these assumptions the noisy
speech signal can be expressed as:
$$y(t) = s(t) + n(t)$$
Here y(t) is the noisy speech signal, s(t) is the clean speech signal, and n(t) is the
noise signal. Let Y(w), S(w), and N(w) denote the Fourier transforms of y(t), s(t), and
n(t). Then the following relationships hold:
$$Y(w) = S(w) + N(w)$$
$$|Y(w)|^2 = |S(w)|^2 + |N(w)|^2 + 2\,\mathrm{Re}[S(w)N^*(w)]$$
$$E[|Y(w)|^2] = E[|S(w)|^2] + E[|N(w)|^2] + 2E\{\mathrm{Re}[S(w)N^*(w)]\}$$
Because s(t) and n(t) are independent, S(w) and N(w) are also independent. Assuming the
noise has zero mean, the cross term vanishes on average:
$$E\{\mathrm{Re}[S(w)N^*(w)]\} = 0 \quad\text{and}\quad |Y(w)|^2 = |S(w)|^2 + |N(w)|^2$$
The "silent frames" before the speech begins can be used to estimate the noise.
The original speech can then be estimated with the formula:
$$|\hat{S}_w(w)|^2 = |Y_w(w)|^2 - |N_w(w)|^2$$
Here $\hat{S}_w(w)$ denotes the estimated speech spectrum, and $|N_w(w)|^2$ is the
noise power spectrum averaged over the speech-free frames.
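The averaging over speech-free frames can be sketched as follows. This is a minimal illustration, not the code used in this report: the signal is synthetic, and the sampling rate `fs`, the frame length `n`, and the number of silent frames `K` are assumed values chosen for the example.

```matlab
% Sketch: average the noise power spectrum over the first K frames,
% which are assumed to contain noise only (all values are hypothetical).
fs = 8000;                          % assumed sampling rate in Hz
n  = 350;                           % frame length in samples
K  = 5;                             % assumed number of leading silent frames
t  = (1:K*n)';                      % sample indices of the silent segment
noise = 0.5 * sin(2*pi*1000/fs*t);  % 1000 Hz sine noise, amplitude 0.5

frames   = reshape(noise, n, K);    % one frame per column
P        = abs(fft(frames)).^2;     % power spectrum of each frame (fft acts on columns)
noisePow = mean(P, 2);              % average over the K frames, n-by-1 estimate
```

Averaging several frames gives a smoother estimate than a single frame, at the cost of assuming that the noise does not change during those frames.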
If the subtraction gives a negative result, it is set to 0 or its sign is changed to
positive, because a power spectrum cannot be negative.
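Both ways of handling negative values take one line each. The sketch below uses made-up numbers (the vector `D` is hypothetical). The listings in this report apply `abs`, which corresponds to the sign-change option.

```matlab
% Sketch: two ways to handle negative values after power subtraction.
D = [4; -1; 0.25; -9];      % hypothetical values of |Y|^2 - |N|^2
halfWave = max(D, 0);       % negative results are set to 0
fullWave = abs(D);          % negative results have their sign changed
```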
So the estimate of the original speech spectrum is:
$$|\hat{S}_w(w)| = \left[\,|Y_w(w)|^2 - |N_w(w)|^2\,\right]^{1/2}$$
Code:
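clc; clear;
[x, fs] = audioread('12.wav');      % noisy speech
x = x(:, 1);                        % use the first channel only
n = 350;                            % frame length in samples
y = x(1:n);                         % first frame, assumed to contain noise only
magY = abs(fft(y));                 % noise magnitude spectrum estimate
numFrames = floor(length(x) / n);   % process whole frames only
b = [];
for i = 0:numFrames-1
    x1 = x(1+n*i : n+n*i);          % current frame (no overlap, no window)
    magX = abs(fft(x1));            % noisy magnitude spectrum
    S = magX.^2 - magY.^2;          % power spectral subtraction
    S1 = abs(S).^0.5;               % sign-flip negative values, then take the square root
    s1 = real(ifft(S1));            % inverse transform of the magnitude only (phase is not reused)
    b = [b s1'];                    % append the enhanced frame
end
x2 = b';
plot(x2);
sound(x2, fs);
audiowrite('13.wav', x2, fs);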
Fig 13.wav
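The listing for 13.wav inverts only the magnitude spectrum of each frame. A common alternative, which is an assumption here and not part of the experiment in this report, is to reattach the phase of the noisy frame before the inverse transform. A minimal sketch on one synthetic frame follows; the 200 Hz tone standing in for speech, the 8 kHz sampling rate, and all variable names are illustrative.

```matlab
% Sketch: spectral subtraction on one frame, keeping the noisy phase.
% All signals are synthetic; this is not the experiment from the report.
fs = 8000;  n = 350;                      % assumed sampling rate and frame length
t  = (1:n)';
clean = 0.3 * sin(2*pi*200/fs*t);         % stand-in for a speech frame
noise = 0.5 * sin(2*pi*1000/fs*t);        % 1000 Hz sine noise
noisy = clean + noise;

magN = abs(fft(noise));                   % noise magnitude (known here, estimated in practice)
X    = fft(noisy);
P    = max(abs(X).^2 - magN.^2, 0);       % power subtraction, negatives set to 0
S    = sqrt(P) .* exp(1i * angle(X));     % attach the phase of the noisy frame
enhanced = real(ifft(S));                 % back to the time domain

errBefore = mean((noisy    - clean).^2);  % mean squared error before enhancement
errAfter  = mean((enhanced - clean).^2);  % mean squared error after enhancement
```

Comparing `errBefore` with `errAfter` gives a quick check of whether the subtraction moved the frame closer to the clean signal.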
Improved spectral subtraction algorithm
In practice the noise spectrum follows a Gaussian distribution:
$$P_n(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}$$
where m is the mean of x and $\sigma$ is the standard deviation.
Therefore, after noise removal by basic spectral subtraction, some residual components
with larger power remain and appear as randomly located spikes in the spectrum. After
the inverse Fourier transform, these spikes form a new, rhythmically fluctuating noise
in the enhanced speech (musical noise), and basic spectral subtraction cannot remove it.
To minimize this secondary corruption of the speech by musical noise, spectral
subtraction can be improved. In noisy speech, the speech energy is generally
concentrated in certain frequencies or frequency bands, while the noise energy is
usually spread over the entire frequency range. Therefore, more noise can be removed in
the frames with higher amplitude: subtracting $\beta |N_w(w)|^2$ makes the speech power
spectrum stand out.
In addition, a second improvement modifies how the power spectrum is processed: the
exponents $2$ and $\frac{1}{2}$ are replaced by $\alpha$ and $\frac{1}{\alpha}$.
Combining these two improvements, the enhanced form of spectral subtraction can be
expressed as:
$$|\hat{S}_w(w)|^{\alpha} = |Y_w(w)|^{\alpha} - \beta\,|N_w(w)|^{\alpha}$$
When $\alpha = 2$ and $\beta = 1$, this reduces to general spectral subtraction.
$\alpha$ is the spectral subtraction correction factor; changing its value further
improves the signal-to-noise ratio. $\beta$ is the spectral subtraction noise
coefficient; its role is to reduce the noise power spectrum, and adjusting it reduces
noise and highlights the speech spectrum.
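The combined rule, written with an exponent `alpha` and a noise coefficient `beta` (names assumed here for the two parameters), can be sketched as a single anonymous function. The sample magnitudes are made up, and negative differences are set to 0 in this sketch, whereas the listings in this report use `abs`.

```matlab
% Sketch of the generalized rule |S|^alpha = |Y|^alpha - beta*|N|^alpha.
% The function handle gss and the sample values are illustrative only.
gss = @(magY, magN, alpha, beta) ...
    max(magY.^alpha - beta * magN.^alpha, 0).^(1/alpha);

magY = [5; 3; 1];                  % hypothetical noisy magnitudes
magN = [3; 3; 2];                  % hypothetical noise magnitudes

basic  = gss(magY, magN, 2, 1);    % alpha = 2, beta = 1: general spectral subtraction
strong = gss(magY, magN, 2, 2);    % beta = 2 subtracts twice the noise power
```

With these numbers, raising `beta` from 1 to 2 lowers the first output bin from 4 to about 2.65, which illustrates how a larger noise coefficient also removes more of the speech.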
Code:
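clc; clear;
[x, fs] = audioread('12.wav');      % noisy speech
x = x(:, 1);                        % use the first channel only
n = 350;                            % frame length in samples
y = x(1:n);                         % first frame, assumed to contain noise only
magY = abs(fft(y));                 % noise magnitude spectrum estimate
a = 0.4;                            % exponent applied to both magnitude spectra
b = 0.5;                            % the difference is raised to the power 1/b
numFrames = floor(length(x) / n);   % process whole frames only
b1 = [];
for i = 0:numFrames-1
    x1 = x(1+n*i : n+n*i);          % current frame (no overlap, no window)
    magX = abs(fft(x1));            % noisy magnitude spectrum
    S = magX.^a - magY.^a;          % subtraction in the a-power domain (no noise scaling applied)
    S1 = abs(S).^(1/b);             % sign-flip negative values, then raise to 1/b
    s1 = real(ifft(S1));            % inverse transform of the magnitude only (phase is not reused)
    m = mean(s1) * 300;             % amplitude threshold for this frame
    for j = 1:n
        if abs(s1(j)) > m
            s1(j) = s1(j) / 4;      % attenuate samples above the threshold
        end
    end
    b1 = [b1 s1'];                  % append the enhanced frame
end
x2 = b1';
plot(x2);
sound(x2, fs);
audiowrite('14.wav', x2, fs);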
Fig 14.wav
Conclusion
Comparing Fig 13.wav and Fig 14.wav, the speech waveform is clearly improved, and less
musical noise is audible. However, removing musical noise inevitably attenuates the
speech as well. Repeated experiments show that adjusting $\alpha$ further improves the
signal-to-noise ratio, and adjusting the coefficient $\beta$ reduces noise and
highlights the speech spectrum. However, values of $\alpha$ and $\beta$ that are too
large distort the speech. The results show that the algorithm removes musical noise more
effectively and improves the signal-to-noise ratio without significantly impairing
speech intelligibility.
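One way to put a number on the signal-to-noise ratio is to compare a processed signal with the clean reference. The sketch below is an illustration under assumptions, not the measurement used in this report: the helper `snrdb` is a hypothetical name, the signals are synthetic, and both inputs must be time-aligned and of equal length.

```matlab
% Sketch: SNR in dB of a test signal against a clean reference.
snrdb = @(clean, test) 10 * log10(sum(clean.^2) / sum((test - clean).^2));

fs = 8000;  t = (1:fs)';                     % one second at an assumed 8 kHz
clean = 0.3 * sin(2*pi*200/fs*t);            % stand-in for clean speech
noisy = clean + 0.5 * sin(2*pi*1000/fs*t);   % sine noise as in the experiment
snrNoisy = snrdb(clean, noisy);              % 10*log10(0.09/0.25), about -4.4 dB
```

Applying the same function to the enhanced output and to the noisy input gives the SNR gain of an enhancement step.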