ISRA UNIVERSITY
FACULTY OF ENGINEERING, SCIENCE & TECHNOLOGY
Lab - 09

Lab Experiment No. 9

Name: _____________________________________________________ Roll No: ______________
Score: _________________ Signature: __________________________________ Date: ___________

Introduction to Speech Processing & Using Mic in MATLAB

PERFORMANCE OBJECTIVE:
After the successful completion of this lab, students will be able to:
- Record audio signals using a microphone.
- Save and load them as AVI or WAVE files.
- Modify them and play them back.
- Record and play audio signals using Simulink.
- Merge different audio and video files into one file.

LAB REQUIREMENTS:
- PC with Windows XP or a later Windows operating system.
- MATLAB 2007, 2009, 2011 or later with the Signal Processing Toolbox.

DISCUSSION:

9.3.1 Speech Signal Processing

Speech signal processing refers to the acquisition, manipulation, storage, transfer and output of vocal utterances by a computer. The main applications are the recognition, synthesis and compression of human speech.

Speech recognition (also called voice recognition) focuses on capturing the human voice as a digital sound wave and converting it into a computer-readable format.

Speech synthesis is the reverse process: it converts computer-readable data, such as text, into audible speech. Advances in this area improve the computer's usability for the visually impaired.

Speech compression is important in telecommunications because it increases the amount of information that can be transferred, stored, or heard within given time and space constraints.

9.3.2 Analysis and Synthesis of Speech

A speech signal is usually represented in digital format, that is, as a sequence of binary bits. For storage and transmission applications, it is desirable to compress a signal by representing it with as few bits as possible while maintaining its perceptual quality. In narrowband digital speech compression, speech signals are sampled at a rate of 8000 samples per second. Typically, each sample is represented by 8 bits.
This corresponds to a bit rate of 64 kbit/s (8000 samples per second x 8 bits per sample). Further compression is possible at the cost of quality. Most current low-bit-rate speech coders are based on the principle of linear predictive speech coding.

Designed By: Engr. Irshad Rahim Memon
Department: Computer Science

9.3.3 MATLAB functions

wavread
Read Microsoft WAVE (.wav) sound file

Syntax
    y = wavread('filename')
    [y,Fs,bits] = wavread('filename')

Description
wavread supports multichannel data with up to 32 bits per sample, including 24-bit and 32-bit .wav files.

y = wavread('filename') loads a WAVE file specified by the string filename, returning the sampled data in y. The .wav extension is appended if no extension is given. Amplitude values are in the range [-1,+1].

[y,Fs,bits] = wavread('filename') also returns the sample rate (Fs) in hertz and the number of bits per sample (bits) used to encode the data in the file.

wavwrite
Write a Microsoft WAVE (.wav) sound file

Syntax
    wavwrite(y,'filename')
    wavwrite(y,Fs,'filename')
    wavwrite(y,Fs,N,'filename')

Description
wavwrite writes data to 8-, 16-, 24-, and 32-bit .wav files.

wavwrite(y,'filename') writes the data stored in the variable y to a WAVE file called filename. The data has a sample rate of 8000 Hz and is assumed to be 16-bit. Each column of the data represents a separate channel, so stereo data should be specified as a matrix with two columns. Amplitude values outside the range [-1,+1] are clipped prior to writing.

wavwrite(y,Fs,'filename') writes the data stored in the variable y to a WAVE file called filename. The data has a sample rate of Fs Hz and is assumed to be 16-bit. Amplitude values outside the range [-1,+1] are clipped prior to writing.

wavwrite(y,Fs,N,'filename') writes the data stored in the variable y to a WAVE file called filename. The data has a sample rate of Fs Hz and is N-bit, where N is 8, 16, 24, or 32.
For N < 32, amplitude values outside the range [-1,+1] are clipped.

wavrecord
Record sound using a PC-based audio input device.

Syntax
    y = wavrecord(n,Fs)

Description
y = wavrecord(n,Fs) records n samples of an audio signal, sampled at a rate of Fs Hz (samples per second). The default value for Fs is 11025 Hz.

Remarks
Standard sampling rates for PC-based audio hardware are 8000, 11025, 22050, and 44100 samples per second.
Stereo signals are returned as two-column matrices. The first column of a stereo audio matrix corresponds to the left input channel, while the second column corresponds to the right input channel.

Example 9.1
Record 5 seconds of 16-bit audio sampled at 11025 Hz. Play back the recorded sound using wavplay. Speak into your audio device (or produce your audio signal) while the wavrecord command runs.

    Fs = 11025;                        % sampling rate in Hz
    y = wavrecord(5*Fs,Fs,'int16');    % 5 seconds of samples; third argument sets the data type
    wavplay(y,Fs);                     % play back the recording

wavplay
Play recorded sound on a PC-based audio output device

Syntax
    wavplay(y,Fs)
    wavplay(...,'mode')

Description
wavplay(y,Fs) plays the audio signal stored in the vector y on a PC-based audio output device. You specify the audio signal sampling rate with the integer Fs in samples per second. The default value for Fs is 11025 Hz (samples per second). wavplay supports only 1- or 2-channel (mono or stereo) audio signals.

wavplay(...,'mode') specifies how wavplay interacts with the command line, according to the string 'mode'. The string 'mode' can be:
    'async' (default value): You have immediate access to the command line as soon as the sound begins to play on the audio output device (a non-blocking device call).
    'sync': You don't have access to the command line until the sound has finished playing (a blocking device call).

The audio signal y can be one of four data types. The number of bits used to quantize and play back each sample depends on the data type.
Table 9.1: Data types for wavplay

Remarks
You can play your signal in stereo if y is a two-column matrix.

Example 9.2
Obtain a speech signal from the microphone and compute its FFT.

Solution
Consider the following code:

    % An example showing how to obtain a speech signal from microphone
    % and compute its Fourier Transform (FFT)
    Fs = 10000;       % Sampling frequency (Hz)
    Nseconds = 1;     % Length of speech signal (seconds)
    fprintf('say a word immediately after hitting enter: ');
    input('');
    % Get time-domain speech signal from microphone
    y = wavrecord(Nseconds*Fs, Fs, 'double');
    % Plot time-domain signal
    subplot(2,1,1);
    t = (0:Nseconds*Fs-1)/Fs;     % time axis in seconds
    plot(t,y);
    xlabel('time');
    % Compute FFT
    x = fft(y);
    % Get response until Fs/2 (for frequency from Fs/2 to Fs, response is repeated)
    x = x(1:floor(Nseconds*Fs/2));
    % Plot magnitude vs. frequency
    subplot(2,1,2);
    m = abs(x);
    f = (0:length(x)-1)*(Fs/2)/length(x);
    plot(f,m);
    xlabel('Frequency (Hz)');
    ylabel('Magnitude');

The output of Example 9.2 is shown in Figure 9.1.

Figure 9.1: Output of Example 9.2.

9.3.4 Speech processing using Simulink

The Signal Processing Blockset contains several blocks that can be used for speech processing. Of these, the sources and sinks blocks are the ones we use for recording, saving and playing audio signals.
Signal processing sources
The Signal Processing Sources library contains the following blocks, from which we can acquire an audio signal, as shown in Figure 9.2(a):
- From Audio Device
- Signal From Workspace
- From Multimedia File
- From Wave File

Figure 9.2(a): Signal processing speech sources. (The figure shows the four blocks listed above; From Multimedia File reads speech_dft.avi, 22050 Hz, 16 bit, mono, and From Wave File reads speech_dft.wav, 22050 Hz, 1 channel, 16 bit.)

Signal processing sinks
The sinks related to audio signals are as follows, as shown in Figure 9.2(b):
- Signal To Workspace
- To Multimedia File
- To Wave File
- To Audio Device

Figure 9.2(b): Signal processing speech sinks. (The figure shows the four blocks listed above; Signal To Workspace writes to yout, To Multimedia File writes output.avi, and To Wave File writes audio.wav.)

Example 9.3 Recording Audio Files using Simulink
Construct the Simulink model shown in Figure 9.3 to obtain a speech signal from the microphone and save it as a multimedia file as well as a wav file. Check that the files have been created in your current directory, and play them to listen to the audio.

Figure 9.3: Audio recording and saving as files on disk. (A From Audio Device block feeds a To Multimedia File block writing recorded.avi and a To Wave File block writing recorded.wav.)

Example 9.4 Playing Audio Files using Simulink
Convert a wav file to an avi file and play the file on your speakers by using the Simulink model of Figure 9.4.

Figure 9.4: Playing audio files. (A From Wave File block reading recorded.wav, 44100 Hz, 2 channels, 16 bit, feeds a To Multimedia File block writing recorded.avi and a To Audio Device block.)

9.4 Exercises

Exercise 9.1
Write MATLAB code to obtain a speech signal from the microphone (duration 3 seconds) and compute its Fourier Transform (FFT). Add different forms of noise to the recorded speech, apply filters to remove the noise, and then play it back. Also comment on the performance.
Please attach a hard copy of your code along with the handout for grading.

Exercise 9.2
Use the From Multimedia File block to import a video stream into a Simulink model. Also use the From Wave File block to import an audio stream into the model. Write this audio and video to a single file using the To Multimedia File block. Play the multimedia file using a media player. The original video file should now have an audio component.

9.5 Review Questions

1. What are the different arguments required for speech recording?
______________________________________________________________________
______________________________________________________________________

2. What is the difference between 'sync' and 'async' modes?
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

3. How can the Signal From Workspace block be used in Simulink for speech processing?
______________________________________________________________________
______________________________________________________________________

4. How did you calculate the value of 'samples per output frame' in the From Wave File block parameters in Exercise 9.2?
______________________________________________________________________
______________________________________________________________________
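
Supplementary sketch for Example 9.2
The frequency-axis scaling used in Example 9.2 can be checked without a microphone by running the same FFT steps on a synthetic tone. The code below is only a sketch under stated assumptions: the 440 Hz test tone and the names f0, N, half, kmax and fpeak are hypothetical and do not come from the handout. The commented-out audiorecorder lines show one possible way to capture y in MATLAB releases that no longer provide wavrecord; they are an assumption about your setup, not part of the original example.

```matlab
% Sketch: single-sided FFT magnitude of a test tone (no microphone needed).
Fs = 10000;               % sampling frequency in Hz, as in Example 9.2
Nseconds = 1;             % signal duration in seconds
N = Nseconds*Fs;          % total number of samples
t = (0:N-1)'/Fs;          % time axis in seconds (column vector)
f0 = 440;                 % hypothetical test-tone frequency in Hz
y = sin(2*pi*f0*t);       % synthetic stand-in for the recorded speech

% Possible microphone capture on newer releases (assumption, left disabled):
%   rec = audiorecorder(Fs, 16, 1);   % 16-bit, mono
%   recordblocking(rec, Nseconds);    % blocks until recording finishes
%   y = getaudiodata(rec);            % samples as a double column vector

X = fft(y);               % complex spectrum, N bins covering 0 .. Fs
half = floor(N/2);        % number of bins below Fs/2
m = abs(X(1:half));       % magnitude of the non-repeated half
f = (0:half-1)'*Fs/N;     % frequency of each kept bin in Hz

[~, kmax] = max(m);       % index of the strongest bin
fpeak = f(kmax);          % frequency of the strongest bin in Hz
```

Because the tone lasts a whole number of periods and the bin spacing Fs/N is 1 Hz here, the strongest bin sits exactly at the tone frequency. Real speech spreads its energy over many bins, so the peak is only a rough indicator in that case.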