9.5 Review Questions

ISRA UNIVERSITY
FACULTY OF ENGINEERING, SCIENCE & TECHNOLOGY
Lab - 09
-1-
Lab Experiment No. 9
Name: _____________________________________________________ Roll No: ______________
Score:_________________ Signature:__________________________________ Date:___________
Introduction to Speech Processing & Using Mic in MATLAB
PERFORMANCE OBJECTIVE:
After the successful completion of this lab, students will be able to:
- Record audio signals using a microphone.
- Save and load them as avi or wave files.
- Modify them and play them back.
- Record and play audio signals using Simulink.
- Merge different audio and video files into one file.
LAB REQUIREMENTS:
- PC with Windows XP/2007 operating system.
- MATLAB 2007, 2009, 2011 or later with Signal Processing Toolbox.
DISCUSSION:
9.3.1 Speech Signal Processing
Speech signal processing refers to the acquisition, manipulation, storage, transfer and output of vocal
utterances by a computer. The main applications are the recognition, synthesis and compression of
human speech.
Speech recognition (also called voice recognition) focuses on capturing the human voice as a digital
sound wave and converting it into a computer-readable format. Speech synthesis is the reverse
process of speech recognition. Advances in this area improve the computer's usability for the visually
impaired. Speech compression is important in the telecommunications area for increasing the amount
of information which can be transferred, stored, or heard, for a given set of time and space constraints.
9.3.2 Analysis and Synthesis of Speech
A speech signal is usually represented in digital format, which is a sequence of binary bits. For storage
and transmission applications, it is desirable to compress a signal by representing it with as few bits as
possible, while maintaining its perceptual quality. In narrowband digital speech compression, speech
signals are sampled at a rate of 8000 samples per second. Typically, each sample is represented by 8
bits. This corresponds to a bit rate of 64 kbits per second. Further compression is possible at the cost of
quality. Most of the current low bit rate speech coders are based on the principle of linear predictive
speech coding.
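The bit-rate figure quoted above follows directly from the sampling rate and the sample size. The short sketch below reproduces that arithmetic; the variable names and the one-minute storage estimate are illustrative additions, not part of the original handout.

```matlab
% Sketch: bit rate of uncompressed narrowband speech (values from the text above).
Fs            = 8000;                 % sampling rate (samples per second)
bitsPerSample = 8;                    % bits used for each sample

bitRate        = Fs * bitsPerSample;  % bits per second
bytesPerMinute = bitRate * 60 / 8;    % storage needed for one minute (illustrative)

fprintf('Bit rate: %d kbits per second\n', bitRate/1000);
fprintf('One minute of speech: %d bytes\n', bytesPerMinute);
```

Halving either the sampling rate or the number of bits per sample halves the bit rate, which is why lower-rate coders have to trade quality for compression.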
9.3.3 MATLAB Functions
wavread
Read Microsoft WAVE (.wav) sound file
Syntax
y = wavread('filename')
[y,Fs,bits] = wavread('filename')
Description
wavread supports multichannel data, with up to 32 bits per sample, and supports reading 24- and 32-bit .wav files.
y = wavread('filename') loads a WAVE file specified by the string filename, returning the sampled data
in y. The .wav extension is appended if no extension is given. Amplitude values are in the range [-1,+1].
[y,Fs,bits] = wavread('filename') returns the sample rate (Fs) in Hertz and the number of bits per
sample (bits) used to encode the data in the file.
wavwrite
Write a Microsoft WAVE (.wav) sound file
Syntax
wavwrite(y,'filename')
wavwrite(y,Fs,'filename')
wavwrite(y,Fs,N,'filename')
Description
wavwrite writes data to 8-, 16-, 24-, and 32-bit .wav files.
wavwrite(y,'filename') writes the data stored in the variable y to a WAVE file called filename. The
data has a sample rate of 8000 Hz and is assumed to be 16-bit. Each column of the data represents a
separate channel. Therefore, stereo data should be specified as a matrix with two columns. Amplitude
values outside the range [-1,+1] are clipped prior to writing.
wavwrite(y,Fs,'filename') writes the data stored in the variable y to a WAVE file called filename. The
data has a sample rate of Fs Hz and is assumed to be 16-bit. Amplitude values outside the range
[-1,+1] are clipped prior to writing.
wavwrite(y,Fs,N,'filename') writes the data stored in the variable y to a WAVE file called filename. The
data has a sample rate of Fs Hz and is N-bit, where N is 8, 16, 24, or 32. For N < 32, amplitude values
outside the range [-1,+1] are clipped.
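Newer MATLAB releases no longer ship wavread and wavwrite; audioread and audiowrite take their place. The sketch below is a minimal write-then-read round trip using those newer functions, assuming a release that provides them. The synthetic 440 Hz tone and the file name lab09_tone.wav are hypothetical stand-ins for a real recording.

```matlab
% Sketch: write a test tone to a 16-bit WAVE file and read it back.
Fs = 8000;                          % sample rate (Hz)
t  = (0:Fs-1)'/Fs;                  % one second of sample times (column vector)
y  = 0.5*sin(2*pi*440*t);           % mono tone, amplitude well inside [-1,+1]

fname = fullfile(tempdir, 'lab09_tone.wav');    % hypothetical file name
audiowrite(fname, y, Fs, 'BitsPerSample', 16);  % file name comes first here

[y2, Fs2] = audioread(fname);       % samples returned as doubles in [-1,+1]
info      = audioinfo(fname);       % file properties, including bits per sample

maxErr = max(abs(y - y2));          % error introduced by 16-bit quantization
fprintf('Fs = %d Hz, %d bits per sample\n', Fs2, info.BitsPerSample);

delete(fname);                      % remove the temporary file
```

Note the argument order: wavwrite takes the data first and the file name last, while audiowrite takes the file name first.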
wavrecord
Record sound using a PC-based audio input device.
Syntax
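y = wavrecord(n,Fs)
y = wavrecord(n,Fs,ch)
y = wavrecord(...,'dtype')
Description
y = wavrecord(n,Fs) records n samples of an audio signal, sampled at a rate of Fs Hz (samples per
second). The default value for Fs is 11025 Hz.
y = wavrecord(n,Fs,ch) uses ch input channels, where ch is 1 (mono, the default) or 2 (stereo).
y = wavrecord(...,'dtype') returns the samples in the data type given by the string 'dtype', for
example 'double' or 'int16', as used in the examples of this lab.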
Remarks
Standard sampling rates for PC-based audio hardware are 8000, 11025, 22050, and 44100 samples per
second. Stereo signals are returned as two-column matrices. The first column of a stereo audio matrix
corresponds to the left input channel, while the second column corresponds to the right input channel.
Example 9.1
Record 5 seconds of 16-bit audio sampled at 11025 Hz. Play back the recorded sound using wavplay.
Speak into your audio device (or produce your audio signal) while the wavrecord command runs.
Fs = 11025;
y = wavrecord(5*Fs,Fs,'int16');
wavplay(y,Fs);
wavplay
Play recorded sound on a PC-based audio output device
Syntax
wavplay(y,Fs)
wavplay(...,'mode')
Description
wavplay(y,Fs) plays the audio signal stored in the vector y on a PC-based audio output device. You
specify the audio signal sampling rate with the integer Fs in samples per second. The default value for
Fs is 11025 Hz (samples per second). wavplay supports only 1- or 2-channel (mono or stereo) audio
signals.
wavplay(...,'mode') specifies how wavplay interacts with the command line, according to the string
'mode'. The string 'mode' can be
- 'async' (default value): You have immediate access to the command line as soon as the sound
  begins to play on the audio output device (a non-blocking device call).
- 'sync': You don't have access to the command line until the sound has finished playing (a
  blocking device call).
The audio signal y can be one of four data types. The number of bits used to quantize and play back
each sample depends on the data type.
Table 9.1: Data types for wavplay
Remarks
You can play your signal in stereo if y is a two-column matrix.
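Because the number of bits used to quantize each sample depends on the data type, it helps to see what that number does to signal quality. The sketch below is an illustration only and does not call wavplay: it quantizes a synthetic tone with a simple uniform rounding quantizer (an assumed model, not necessarily what the audio hardware does) and compares the signal-to-noise ratio at 8 and 16 bits.

```matlab
% Sketch: effect of the number of quantization bits on a test tone.
Fs = 8000;                          % sampling rate (Hz), chosen for illustration
t  = (0:Fs-1)'/Fs;                  % one second of sample times
x  = 0.9*sin(2*pi*440*t);           % synthetic tone standing in for speech

bitsList = [8 16];                  % sample sizes to compare
snrDb    = zeros(size(bitsList));   % measured signal-to-noise ratios (dB)

for k = 1:numel(bitsList)
    N    = bitsList(k);
    step = 2/(2^N);                 % quantizer step size over the range [-1,+1]
    xq   = step*round(x/step);      % round each sample to the nearest level
    err  = x - xq;                  % quantization error
    snrDb(k) = 10*log10(sum(x.^2)/sum(err.^2));
    fprintf('%2d bits: SNR = %.1f dB\n', N, snrDb(k));
end
```

Each extra bit improves the signal-to-noise ratio by roughly 6 dB, so the 16-bit result should come out far higher than the 8-bit one.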
Example 9.2 Obtain a speech signal from microphone and compute its FFT.
Solution Consider the following code:
% An example showing how to obtain a speech signal from microphone
% and compute its Fourier Transform (FFT)
Fs = 10000;       % Sampling frequency (Hz)
Nseconds = 1;     % Length of speech signal (s)
fprintf('say a word immediately after hitting enter: ');
input('');
% Get time-domain speech signal from microphone
y = wavrecord(Nseconds*Fs, Fs, 'double');
% Plot time-domain signal
subplot(2,1,1);
t = (0:Nseconds*Fs-1)/Fs;   % Time axis in seconds
plot(t,y);
xlabel('Time (s)');
ylabel('Amplitude');
% Compute FFT
x = fft(y);
% Keep the response up to Fs/2 (for a real signal, the range Fs/2 to Fs mirrors it)
x = x(1:floor(Nseconds*Fs/2));
% Plot magnitude vs. frequency
subplot(2,1,2);
m = abs(x);
f = (0:length(x)-1)*(Fs/2)/length(x);
plot(f,m);
xlabel('Frequency (Hz)');
ylabel('Magnitude');
The output of Example 9.2 is shown in Figure 9.1.
Figure 9.1: Output of Example 9.2.
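The frequency axis used in Example 9.2 can be checked without a microphone by feeding the same FFT steps a signal whose content is known. The sketch below does that with a synthetic two-tone signal; the tone frequencies (1000 Hz and 2500 Hz) and amplitudes are assumptions chosen for illustration.

```matlab
% Sketch: verify the FFT frequency axis of Example 9.2 with a known signal.
Fs       = 10000;                   % sampling frequency (Hz), as in Example 9.2
Nseconds = 1;                       % signal length (s)
N        = Nseconds*Fs;             % total number of samples
t        = (0:N-1)'/Fs;             % time axis (s)

% Known test signal: strong 1000 Hz tone plus a weaker 2500 Hz tone
y = sin(2*pi*1000*t) + 0.5*sin(2*pi*2500*t);

X = fft(y);                         % discrete Fourier transform
X = X(1:floor(N/2));                % keep the half below Fs/2
m = abs(X);                         % magnitude spectrum
f = (0:length(X)-1)'*Fs/N;          % frequency of each bin (Hz)

[~, idx] = max(m);                  % locate the strongest component
peakHz   = f(idx);
fprintf('Strongest component at %.0f Hz\n', peakHz);
```

If the strongest component is reported at the frequency of the stronger tone, the axis scaling is correct and the same code can be trusted on recorded speech.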
9.3.4 Speech Processing Using Simulink
The Signal Processing Blockset provides several blocks that can be used for speech processing. Of
these, the Sources and Sinks blocks are the ones used here for recording, saving, and playing audio
signals.
Signal processing sources
The Signal Processing Sources library contains the following blocks for acquiring an audio signal, as
shown in Figure 9.2:
- From Audio Device
- Signal From Workspace
- From Multimedia File
- From Wave File
[Figure: block icons for From Audio Device, Signal From Workspace, From Multimedia File
(speech_dft.avi; audio 22050 Hz, 16 bit, mono), and From Wave File (speech_dft.wav; 22050 Hz,
1 channel, 16 bit)]
Figure 9.2: Signal processing speech sources.
Signal processing sinks
The sinks related to audio signals, shown in the figure after this list, are:
- Signal To Workspace
- To Multimedia File
- To Wave File
- To Audio Device
[Figure: block icons for Signal To Workspace (yout), To Multimedia File (output.avi, audio input),
To Wave File (audio.wav), and To Audio Device]
Figure 9.2: Signal processing speech sinks
Example 9.3 Recording Audio Files using Simulink
Construct the Simulink model shown in Figure 9.3 to obtain a speech signal from the microphone and
save it as a multimedia file as well as a wav file. Check the created files in your current directory,
and play them to listen to the audio.
[Figure: From Audio Device block feeding a To Multimedia File block (recorded.avi, audio input) and a
To Wave File block (recorded.wav)]
Figure 9.3: Audio recording and saving as files on disk
Example 9.4 Playing Audio Files using Simulink
Convert a wav file to an avi file and play it on your speakers using the Simulink model of Figure 9.4.
[Figure: From Wave File block (recorded.wav; 44100 Hz, 2 channels, 16 bit) feeding a To Multimedia
File block (recorded.avi, audio input) and a To Audio Device block]
Figure 9.4: Playing audio files.
9.4 Exercises
Exercise 9.1
Write MATLAB code to obtain a speech signal from the microphone (duration 3 seconds) and compute
its Fourier transform (FFT). Add different forms of noise to the recorded speech, apply filters to
remove the noise, and then play it back. Also comment on the performance.
Please attach a hard copy of your code along with the handout for grading.
Exercise 9.2
Use the From Multimedia File block to import a video stream into a Simulink model. Also use the From
Wave File block to import an audio stream into the model. Write this audio and video to a single file
using the To Multimedia File block. Play the multimedia file using a media player. The original video
should now have an audio component.
9.5 Review Questions
1. What are the different arguments required for speech recording?
______________________________________________________________________
______________________________________________________________________
2. What is the difference between ‘sync’ and ‘async’ modes?
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
3. How can the Signal From Workspace block be used in Simulink for speech processing?
______________________________________________________________________
______________________________________________________________________
4. How did you calculate the value of ‘samples per output frame’ in the From Wave File block
parameters in Exercise 9.2?
______________________________________________________________________
______________________________________________________________________
Designed By: Engr. Irshad Rahim Memon
Department: Computer Science