EMM3026: Digital Audio Signal Processing
EMM 3026
Digital Audio Signal Processing
Trimester 2 2011/2012
Laboratory
Experiment DAS-1
Representing Speech in Time, Frequency
and Parametric Domain
MYA
Page 1
2/16/2016
1. Introduction
For either speech analysis or speech synthesis, it is important to have a clear representation of
speech. Three types of speech representation are commonly in use:
1. time representation
2. spectral representation
3. parametric representation
Marking scheme:
Items             Maximum Marks
Results           60
Discussion        20
Conclusions       10
Report writing    10
Total             100

1.1 Objectives and Apparatus
For this experiment, all you will need is a computer with MATLAB installed. The sound files will
be provided for you. The objective of this lab is to let you illustrate the three representations
mentioned above using MATLAB. Throughout this lab sheet, the commands that you will need to
use in MATLAB are written in a Courier-style font. For this lab, you will be required to submit a
lab report in a week's time (see the last section for a guide to writing your report). The points that
must be included in your report are marked throughout this lab sheet with the symbol "R - ".
Two signals are made available for analysis during this lab - one vocal signal and one musical.
They are both available online:
- vocal (MCF_12345.wav)
- musical (cla8.wav)
Download both into the following folder: C:\MATLABR11\work - this is the work folder of
MATLAB, the folder in which all the files you are currently working on should be stored. This
folder is in the pre-defined "path" of MATLAB.
1.2 Getting Started
Now start MATLAB by clicking on the MATLAB icon on the desktop. The first thing to do is to
read the files into a MATLAB array. In MATLAB, arrays are defined automatically as the need
arises, and some functions combine the open-file, read-file and close-file operations that are
often quite tedious in other languages. The command in MATLAB is fairly simple:
a = wavread('MCF_12345.wav');
or
a = wavread('cla8.wav');
This command performs the following steps:
1. checks that a file named 'MCF_12345.wav' exists in the path;
2. opens the file;
3. checks the amount of data in the file;
4. allocates an array with sufficient memory and names it "a";
5. reads the data from the file and stores it in the array;
6. closes the file.
To hear the data played as a sound file, type:
wavplay(a);
Now try to play the file by double-clicking on it. You will probably notice that it sounds different.
That is because the .wav file header contains information about the rate at which the data was
sampled. When playing the file, the software reads this sampling frequency and uses it to play the
sound correctly. The previous command, however, ignored that information. To read it, we have
to use a variant form of the wavread and wavplay commands, as follows:
[a,fs]=wavread('D:\Lectures\EMM3026\MCF_12345.wav');
wavplay(a, fs);
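In recent MATLAB releases wavread and wavplay are no longer available; audioread and sound are the usual replacements. The following is a minimal sketch of the same read-and-play step under that assumption. The file name test_tone.wav and the 440 Hz test tone are made up for illustration, so that the example runs without the lab files.

```matlab
% Sketch only: assumes a release that provides audioread/audiowrite.
% 'test_tone.wav' is a hypothetical file created here for the example.
fs_out = 8000;                          % sampling frequency in Hz (arbitrary choice)
t = (0:fs_out-1)' / fs_out;             % one second of time stamps
tone = 0.5 * sin(2*pi*440*t);           % 440 Hz test tone, amplitude 0.5
audiowrite('test_tone.wav', tone, fs_out);

% Read the samples AND the sampling frequency stored in the file header.
[a, fs] = audioread('test_tone.wav');

% Play at the rate read from the header; sound(a) alone would use a default rate.
% sound(a, fs);                         % uncomment to listen
disp(fs);                               % sampling frequency read from the header
```

With the lab files, only the audioread and sound lines are needed, with the file name replaced by the one you downloaded.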
R - Assignment 1: Check the sampling frequency of both samples. Identify whether this is CD-quality sound.
To know the amount of data in the file, we can use several methods:
a) To check the size of the original file:
wavread('D:\Lectures\EMM3026\MCF_12345.wav', 'size')
b) To check the length of the array:
length(a)
c) To check the dimension of all the variables defined in the current MATLAB session:
whos
R - Assignment 2: Calculate the exact duration of both samples.
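The duration follows from the number of samples and the sampling frequency: duration = number of samples / sampling frequency. A short sketch of the arithmetic, with made-up values for N and fs (they are not the values of the lab files):

```matlab
% Illustrative values only; replace them with size(a, 1) and the fs read from your file.
fs = 16000;             % sampling frequency in Hz (assumed)
N  = 40000;             % number of samples per channel (assumed)
duration = N / fs;      % duration in seconds
fprintf('Duration: %.4f s\n', duration);
```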
Please note that, in MATLAB, if a line does not end with a semicolon (";"), the result of the
operation is displayed in the command window. This can be quite annoying when a lot of data is
processed (e.g., imagine what would happen if we used the wavread command without the
semicolon at the end). Also, we can store the result in a variable (as in the command "a =
wavread('MCF_12345.wav');") or discard it (as in the command "length(a)").
To see the content of a variable, we just type its name, and hit return, for example:
a
Can you make any sense out of all those numbers? That is why we need an efficient way to
represent this data. Three such ways are going to be discussed next.
2. Time representation
The most straightforward representation is just to plot the data as is:
plot(a)
A figure window is automatically created, and the data is plotted. The appearance of the plot can
be improved by adding gridlines:
grid
The figure window can be resized in the usual way, and the plot will be automatically rescaled.
You can copy this plot with the command Edit → Copy Figure and then paste it into a Word
document. You can also save it as a TIFF file by pasting it into the freeware IrfanView and then
clicking on File → Save As.
R - Assignment 3: Include the two complete time plots in your report. Analyse which part of the
plot corresponds to which word/sound in the sample.
It is possible to zoom in and out by selecting the magnifying glass symbol in the toolbar and right-
or left-clicking on the part to be enlarged. However, we would rather zoom the time axis only,
while leaving the intensity axis unaffected. This can be done by using the following command:
axis([2001 4000 -0.05 0.05])
This changes the display of the data by restricting the horizontal axis to values between 2001 and
4000 (that means, samples number 2001 to 4000) and the vertical axis between -0.05 and +0.05.
You can listen to the corresponding samples by using the following command:
wavplay(a(2001:4000), fs);
In MATLAB notation, a(2001:4000) is an array composed of the values of array a between
indices 2001 and 4000. This can even be assigned to another array with the following command:
b = a(2001:4000);
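Sample indices can be converted to seconds by dividing by the sampling frequency, which makes it easier to relate the plot to what you hear. A minimal sketch, using a synthetic signal in place of the lab recordings (the signal and the value of fs are assumptions):

```matlab
fs = 8000;                               % assumed sampling frequency in Hz
a  = 0.05 * sin(2*pi*200*(0:7999)'/fs);  % synthetic stand-in for the recorded signal
first = 2001;  last = 4000;              % sample range used in the text
b = a(first:last);                       % the selected samples
t = (first-1:last-1)' / fs;              % time of each selected sample (sample 1 is at t = 0)
figure;
plot(t, b);
grid on;
xlabel('Time (s)');
ylabel('Amplitude');
```

With these assumed values, samples 2001 to 4000 cover the interval from 0.25 s to just under 0.5 s.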
R - Assignment 4: Try to plot each word/note separately, and include the results in your report.
Distinguish between silence, unvoiced speech and voiced speech. Relate your answers to the
periodicity of voiced and unvoiced speech, and of music.
Since at least part of the signal is periodic, a lot can probably be gained from frequency analysis
using the Fourier transform. This is examined in the next part of this lab.
3. Spectral representation
To calculate the Fourier transform of a signal using the Fast Fourier Transform (FFT), the
command is very simple indeed:
F = fft(a);
To calculate the FFT of part of a signal, we can either store the partial signal
b = a(2001:4000);
F = fft(b);
or calculate the FFT directly:
F = fft(a(2001:4000));
(Please note that MATLAB is case-sensitive for variable names. Older releases accepted function
names in any case on some platforms, but current releases are case-sensitive for function names
too, so type them in lower case as shown.)
We should not forget that the result of an FFT is an array of complex numbers, as can be checked
using the whos command. The phase and amplitude information can be obtained as follows:
P = angle(F);
A = abs(F);
In Digital Audio Signal Processing, we are usually not interested in the phase information, only in
the amplitude - or, more precisely, the logarithm of the amplitude. Again, we can put things
together in a very compact way:
A = log(1 + abs(fft(a(2001:4000))));
After plotting the result with plot(A), we can again select only the part of the plot that we are
interested in by using the axis command.
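Each element of the FFT corresponds to a frequency: for an N-point FFT of a signal sampled at fs, element k holds the component at (k-1)*fs/N Hz. A sketch with a synthetic 1000 Hz tone shows how to build the frequency axis and locate the strongest component (the tone, N and fs are assumptions for illustration):

```matlab
fs = 8000;                              % assumed sampling frequency in Hz
N  = 2000;                              % number of samples analysed
x  = sin(2*pi*1000*(0:N-1)'/fs);        % synthetic 1000 Hz tone
A  = log(1 + abs(fft(x)));              % log-amplitude spectrum, as in the text
f  = (0:N-1)' * fs / N;                 % frequency of each FFT element in Hz
half = 1:N/2;                           % elements from 0 Hz up to just below fs/2
[~, k] = max(A(half));                  % index of the strongest component
peak_hz = f(k);                         % frequency of that component
figure;
plot(f(half), A(half));
grid on;
xlabel('Frequency (Hz)');
ylabel('log(1 + amplitude)');
```

Plotting against f instead of the element index lets you read the positions of the spikes directly in hertz.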
R - Assignment 5: Can you explain why we take the logarithm of 1 plus the amplitude, and not
the logarithm of the amplitude itself?
R - Assignment 6: Plot the amplitude of the FFT of the periodic parts of each signal.
- What do you remember about the symmetry properties of the FT?
- Explain the different spikes that appear.
- Plot also the FFT of the complete signal. What do you observe when you compare the plots
of the different sub-parts of the signal, and the plot of the complete signal?
As we noticed, the frequency content is different for the different vowels/notes. In this case, we had
to select the sub-parts of the signal manually. To come to an automated system for speech
recognition, we will have the computer calculate the FFT at regular intervals during the signal, and
by taking a limited length of the signal into account. For example, we might calculate the FFT
every 20 samples, taking 100 samples into account.
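With a window of D samples moved in steps of S samples, the number of complete windows that fit in a signal of L samples is floor((L - D)/S) + 1, and consecutive windows share D - S samples. A quick check of this arithmetic, with an assumed signal length:

```matlab
L = 1000;                        % assumed signal length in samples
D = 100;                         % window length in samples
S = 20;                          % step between windows in samples
starts = 1:S:L-D+1;              % first sample of every complete window
n_frames = floor((L - D)/S) + 1; % closed-form count of those windows
overlap = D - S;                 % samples shared by consecutive windows
fprintf('%d windows, %d samples of overlap\n', n_frames, overlap);
```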
The question is then how to represent all this data. The usual answer is a spectrogram: the FFTs
are stored one next to the other, with time along the horizontal axis, frequency along the vertical
axis, and intensities representing the amplitude of the FFT, white being zero amplitude and black
maximum amplitude.
The following code does just that:
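L = length(a);   % number of samples in the signal
D = 100;         % window length in samples (must be even)
S = 20;          % step between successive windows, in samples
bl = [];         % spectrogram: one column per window
for x = 1:S:L-D+1
    % log-amplitude spectrum of the D samples starting at sample x
    b = log10(1 + abs(fft(a(x:x+D-1))));
    % keep only the first half: the second half mirrors it for a real signal
    bl = [bl, b(1:D/2)];
end
figure;
mx = max(bl(:));
% scale to [0, 1]: white is zero amplitude, black is maximum amplitude
imshow((mx - flipud(bl)) / mx);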
Please note that we have to scale the output between 0 and 1 to use the function imshow, which
shows the array as an image.
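imshow belongs to the Image Processing Toolbox. If it is not installed, imagesc can display the same array and also lets you label the axes in seconds and hertz. The sketch below is one possible variant, not a reference solution for this lab; the rising-pitch test signal and the value of fs are assumptions.

```matlab
fs = 8000;                                   % assumed sampling frequency in Hz
n  = (0:fs-1)';                              % one second of sample indices
a  = sin(2*pi*(500 + 1000*n/fs).*n/fs);      % synthetic tone whose pitch rises
L = length(a);  D = 100;  S = 20;            % window length and step, as in the text
starts = 1:S:L-D+1;                          % first sample of each window
bl = zeros(D/2, numel(starts));              % preallocate: one column per window
for i = 1:numel(starts)
    seg = a(starts(i):starts(i)+D-1);        % D samples of the signal
    b = log10(1 + abs(fft(seg)));            % log-amplitude spectrum
    bl(:, i) = b(1:D/2);                     % keep the non-redundant half
end
t = (starts - 1 + D/2) / fs;                 % centre time of each window in seconds
f = (0:D/2-1) * fs / D;                      % frequency of each row in Hz
figure;
imagesc(t, f, bl);
axis xy;                                     % low frequencies at the bottom
colormap(flipud(gray));                      % white = small, black = large
xlabel('Time (s)');
ylabel('Frequency (Hz)');
```

Preallocating bl avoids growing the array inside the loop, which matters for long recordings.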
R - Assignment 7: Include the spectrogram for at least 5 different values of the window length D,
and evaluate the influence of this window length. State the advantages and disadvantages of a
large window size.
4. Parametric representation
Linear Predictive Coding (LPC) is one example of a parametric representation of the signal. A new
sample is predicted from a linear combination of the previous N samples. The N coefficients are
calculated so as to minimize the prediction error. This is done for a limited window D in the signal,
and every S samples. Hence, every S samples, we have N parameters to describe the signal.
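To make the idea concrete, the sketch below estimates the predictor coefficients for one window by solving the normal equations built from the window's autocorrelation. This is one common way of minimising the prediction error (the autocorrelation method), written out here for illustration only; the test signal is an assumption. The sign convention is chosen to match the output of the lpc function, [1, a(2), ..., a(N+1)], where the predicted sample is -a(2)*s(n-1) - ... - a(N+1)*s(n-N).

```matlab
s = 0.9 .^ (0:199)';                 % assumed test signal: each sample is 0.9 times the previous one
N = 2;                               % prediction order
r = zeros(N+1, 1);                   % autocorrelation at lags 0..N
for k = 0:N
    r(k+1) = sum(s(1:end-k) .* s(1+k:end));
end
R = toeplitz(r(1:N));                % N-by-N autocorrelation matrix
c = R \ r(2:N+1);                    % weights applied to the previous N samples
coeffs = [1; -c];                    % same sign convention as the output of lpc
pred = c(1)*s(2:end-1) + c(2)*s(1:end-2);   % predict s(3:end) from the two previous samples
err = s(3:end) - pred;               % prediction error
fprintf('c = [%.4f %.4f], max error = %.2e\n', c(1), c(2), max(abs(err)));
```

For this test signal the first weight comes out very close to 0.9 and the second close to 0, as expected for a signal in which each sample depends only on the one before it.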
The whole procedure for LPC is combined in MATLAB in a single function: lpc. Hence, the code
to calculate the LPC coefficients and plot them is quite similar to the code used for generating the
spectrogram:
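L = length(a);   % number of samples in the signal
D = 100;         % window length in samples
S = 100;         % step between successive windows, in samples
N = 20;          % prediction order (lpc returns N+1 coefficients, the first being 1)
bl = [];         % one column of coefficients per window
for x = 1:S:L-D+1
    % LPC coefficients of the D samples starting at sample x
    b = lpc(a(x:x+D-1), N);
    bl = [bl, b(:)];
end
figure;
mx = max(abs(bl(:)));
% map [-mx, mx] to [0, 1]: zero is gray, positive is white, negative is black
imshow(flipud(bl)/(2 * mx) + 0.5);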
This time, the zero coefficients are plotted as gray, the positive ones as white and the negative ones
as black.
R - Assignment 8: Try different values of D, S and N, and describe the results.
R - Assignment 9: Put the three representations used in this lab - time, frequency and parametric
- on the same time scale on top of one another, and compare the plots.
- Explain whether the transition between silent/unvoiced/voiced speech and between the
different notes is more easily discernible in one representation than in the others.
- Describe the differences between the different vowels/notes in each of the representations.
- Compare the different unvoiced consonants in each of the representations.
5. Lab Report
Reports should be written in MS Word and should include 5 major sections:
1. a header page, with your name and ID number, lab number and title, the date of experiment
and the date the report is submitted;
2. an introduction, where you state the objectives of the experiment and how you will
accomplish them;
3. the results, where you state all your procedures and present your m-files, figures and plots.
Present the code neatly in boxes using a different font; if the code is too long, you may put it
in an appendix, labelled properly so that you can refer back to it. Plots can be included by
using Edit → Copy Figure in the menu and pasting them into MS Photo Editor for further
manipulation, or directly into your Word document; for windows, screen captures can be
used;
4. a discussion, where you explain how the results were generated, compare the results, identify
any discrepancies and explain why they occurred, and comment on the underlying principles
that are illustrated by the results;
5. a conclusion, where you present a brief summary of the experiment. Did you successfully
accomplish what you set out to do? If not, why? An external reader should be able to
understand the whole experiment just by reading the introduction and the conclusion.
Your report should look neat and professional with each section clearly labeled. Additional marks
will be given to reports that are well written and presented. You will be required to submit a
hardcopy version of your report and a softcopy version, which can be sent to me through email.
Copying lab reports is considered a serious offense! As a warning, the softcopy version will be
used to detect any copying using a special "computer-aided copy detection" software. Students who
are caught (both the copier and the person who allowed his/her work to be copied) will receive only
50% of the allocated marks. Note that a complete database of previous students’ reports is also
available for copy detection. Plagiarizing those reports will result in you getting a zero for your
report.