Bathiya Senevirathna
The Effect of Noise on Automatic Speech Segmentation Algorithms
1. Introduction
We first familiarized ourselves with using a numerical analysis environment to analyze sound files. The
software we used was MATLAB developed by MathWorks. We used MATLAB’s built-in ‘wavread’
function to first import sound file data into a numerical array.
By using this command, we were introduced to the concept of sampling frequency. Sound in the physical
world is analog, and in order to analyze it by computer it is necessary to convert this
continuous signal into discrete, digital data. The sampling frequency is the rate at which
samples of the analog signal are taken to form digital data points. This value is usually given in
samples per unit time and is very important to this speech analysis research: a signal sampled at a
given rate can only represent frequencies up to half that rate, so the sampling rate essentially sets
the quality of the data available to work with.
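To make the idea of sampling concrete, here is a minimal Python sketch (not the MATLAB code used in this project). It evaluates a pure tone only at the discrete instants t = n / fs. The function name `sample_tone` and the 440 Hz test tone are illustrative assumptions, not part of the original work.

```python
import math

def sample_tone(freq_hz, fs_hz, duration_s):
    """Return samples of sin(2*pi*f*t) taken at the instants t = n / fs."""
    n_samples = int(round(fs_hz * duration_s))
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

fs = 10_000                           # sampling frequency in samples per second (10 kHz)
tone = sample_tone(440.0, fs, 0.01)   # 10 ms of a hypothetical 440 Hz tone

print(len(tone))   # number of discrete data points = fs * duration
print(fs / 2)      # highest frequency representable at this sampling rate
```

Doubling `fs` doubles both the number of data points for the same duration and the highest frequency that can be represented.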
After some experimenting with recording and playback of sounds with different sampling rates, we set
about analyzing actual speech.
2. The Waveform
Figure 2.1 below shows an energy plot of the original sound I recorded (“She sells sea shells on the sea
shore”) at a sampling rate of 10 kHz for 8 seconds. As can be seen, each spoken word stands out quite
prominently when looking at its energy. We then set about automatically identifying words
using an algorithm based on the energy levels of the sound data.
Figure 2.1 – Plot of original sound ("She sells sea shells on a sea shore", 8 s @ 10 kHz; amplitude vs. time in seconds)
First we set an unvoiced threshold, an energy level below which we assume the sound to represent
silence. Then we set a word boundary threshold, the shortest length of time an unvoiced segment must
last to count as a boundary between words. After some testing and experimenting, the most coherent
results came from an energy threshold of 0.05 and a word boundary threshold (referred to below as the
word length) of 0.1 s.
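The two thresholds can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB implementation used here: a sample is treated as unvoiced when its absolute amplitude is below the energy threshold, and an unvoiced run separates two words only if it lasts at least the word boundary threshold. The function name `segment_words` and the toy signal are hypothetical.

```python
def segment_words(samples, fs_hz, energy_threshold=0.05, min_gap_s=0.1):
    """Return (start, end) sample index pairs for each detected word.

    Assumption: 'energy' is approximated by absolute amplitude, and an
    unvoiced run shorter than min_gap_s is absorbed into the current word.
    """
    min_gap = int(round(min_gap_s * fs_hz))  # shortest silence counted as a boundary
    segments = []
    start = None         # first sample of the word being tracked
    last_voiced = None   # most recent sample at or above the threshold
    for i, x in enumerate(samples):
        if abs(x) >= energy_threshold:
            if start is None:
                start = i
            elif i - last_voiced - 1 >= min_gap:
                # The silence was long enough: close the previous word.
                segments.append((start, last_voiced + 1))
                start = i
            last_voiced = i
    if start is not None:
        segments.append((start, last_voiced + 1))
    return segments

# Toy signal at 100 samples per second: a word with a short internal dip,
# a long pause, then a second word.
fs = 100
signal = [0.0] * 20 + [0.5] * 30 + [0.0] * 5 + [0.5] * 10 + [0.0] * 20 + [0.3] * 15 + [0.0] * 10
words = segment_words(signal, fs)
print(len(words), words)
```

In this sketch the 5-sample dip is shorter than the 0.1 s boundary threshold, so it does not split the first word. The same merging keeps the zero crossings of a real waveform from breaking a word apart.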
Figure 2.2 below shows my segmented speech with the red trace representing unvoiced speech.
Figure 2.2 – Plot of segmented speech ("Labeled Speech"; traces: Unvoiced, Voiced; amplitude vs. time in seconds)
The thresholds I used actually fractured the word “shells” into two segments, so nine words were found instead of eight.
3. Adding Noise
The automatic speech segmentation algorithm worked quite well, so we then decided to investigate what
effect adding white Gaussian noise to the speech would have. The first step was to generate an array of
random numbers with as many elements as there are samples in my speech file (8 s × 10 kHz = 80,000 samples).
We then standardized this data.
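A minimal Python sketch of this step is shown below. It assumes that "standardized" means shifting and scaling the random numbers to zero mean and unit standard deviation; the report does not spell this out, and the function name and the fixed seed are illustrative only.

```python
import math
import random

def standardized_gaussian_noise(n_samples, seed=0):
    """Draw Gaussian random numbers, then rescale to mean 0 and std 1 (assumed meaning)."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mean = math.fsum(raw) / n_samples
    std = math.sqrt(math.fsum((x - mean) ** 2 for x in raw) / n_samples)
    return [(x - mean) / std for x in raw]

n = 8 * 10_000   # 8 s at 10 kHz = 80,000 samples
noise = standardized_gaussian_noise(n)
print(len(noise))
```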
One parameter that we changed before adding the noise to the speech file was its scale, the ratio of the
amplitudes of my speech and the noise. I first started with a scale of 5, i.e. my speech amplitude was 5
times the size of the noise amplitude. As expected, my original thresholds still worked fine. In fact, after
I increased the energy threshold to 0.055, the word “shells” was correctly segmented as one word. This
did not happen when there was no noise. Next I reduced the scale to 4. It became apparent from the
graph that I had to increase the energy threshold to help identify words.
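The scaling step might look like the following Python sketch. It assumes "amplitude" means the peak absolute amplitude of each signal, which the report does not specify. `add_scaled_noise` and `ratio_in_db` are hypothetical helper names.

```python
import math

def add_scaled_noise(speech, noise, scale):
    """Add noise whose peak amplitude is 1/scale of the speech peak (assumed definition)."""
    speech_peak = max(abs(x) for x in speech)
    noise_peak = max(abs(x) for x in noise)
    gain = speech_peak / (scale * noise_peak)
    return [s + gain * n for s, n in zip(speech, noise)]

def ratio_in_db(scale):
    """Express an amplitude ratio in decibels."""
    return 20 * math.log10(scale)

# Tiny stand-in arrays; the real signals have 80,000 samples each.
speech = [0.0, 1.0, -0.5]
noise = [2.0, -1.0, 0.5]
noisy = add_scaled_noise(speech, noise, scale=4)
print(noisy)
print(ratio_in_db(4))
```

Read as an amplitude ratio, a scale of 5 corresponds to about 14 dB, 4 to about 12 dB, 2 to about 6 dB, and 1 to 0 dB.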
Figure 3.1 – Plot of noisy speech (1/4 scaled noise) ("Labeled Speech"; traces: Unvoiced, Voiced; amplitude vs. time in seconds)
My new parameters of 0.07 (+0.02) for energy and 0.15 s (+0.05 s) for word length were still enough to
identify the correct words. When the scale was reduced to 2, the algorithm had a harder time
distinguishing words from noise. I had to increase my energy threshold to 0.12 (+0.07) but kept the
same word length as with the scale of 4. It worked well, but the word “sea” was lost in the noise, which
was expected because of the soft “s” sound. This seems to be the emerging pattern as the noise gets
louder: the softer sounds are lost first. Since the words in this particular recorded speech have so many
soft sounds (“She sells sea shells on the sea shore”), the success of this automatic speech segmentation
algorithm falls very quickly.
Figure 3.2 – Plot of noisy speech (1/2 scaled noise) ("Labeled Speech"; traces: Unvoiced, Voiced; amplitude vs. time in seconds)
Finally, with a 1:1 noise:speech scale, it was very difficult to find any words at all automatically.