SPEECH ANALYSIS
ChenYan Li, University of Oslo. The project was carried out in the DSB lab at IFI.

Several people speak into a microphone in turn. Their voices are recorded, converted to a digital signal, and saved in Matlab as sampled waveforms. Features of each voice are found by detecting the pitch, and the gender of the speaker is decided from these features.

There are two ways to analyze the speech signal: time-domain analysis and frequency-domain analysis.

Time domain. Acoustic parameters are measured as they vary over the duration of the utterance; the relevant parameters are the short-time energy and the short-time autocorrelation:

E_i = Σ_{n=N1}^{N2} s(n)^2        R_i(k) = Σ_{n=N1}^{N2-k} s(n) s(n+k)

where N1 and N2 are the frame boundaries and i is the frame index. The basic idea of short-time processing is that short segments of the speech signal are isolated and processed as if they were segments of a sustained sound with fixed properties. These short segments are called frames. Here the frame is shifted through the signal with a frame size of 30 ms, and subsequent frames overlap by 50%.

The autocorrelation estimates the fundamental frequency directly from the waveform: for a section of the signal, it shows how well the waveform correlates with itself at a range of different delays. The pitch period is estimated by detecting the lag of the peak amplitude in this function.

Frequency domain. A short-duration spectrum of the speech signal is produced with the Fast Fourier Transform; the logarithm of its absolute value is taken, and the inverse FFT of this log spectrum gives the cepstrum. To obtain an estimate of the fundamental frequency from the cepstrum, I look for a peak in the quefrency region corresponding to typical speech fundamental frequencies.

For each frame I calculate the short-time energy E_i and the short-time autocorrelation; the normalised autocorrelation maximum also serves as a voicing parameter.
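The time-domain procedure above (30 ms frames with 50% overlap, short-time energy, and an autocorrelation peak pick between plausible pitch lags) can be sketched as follows. This is a minimal NumPy illustration, not the code used in the project; the function names and the 50–500 Hz search range are assumptions.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=30, overlap=0.5):
    """Cut x into 30 ms frames with 50% overlap, as on the poster."""
    n = int(fs * frame_ms / 1000)
    hop = int(n * (1 - overlap))
    return [x[i:i + n] for i in range(0, len(x) - n + 1, hop)]

def frame_pitch(frame, fs, fmin=50.0, fmax=500.0):
    """Return (energy, voicing, f0) for one frame.

    energy  -- short-time energy E_i
    voicing -- normalised autocorrelation maximum (voicing parameter)
    f0      -- pitch estimate fs / k, where k is the lag of that maximum
    """
    frame = frame - frame.mean()
    energy = float(np.sum(frame ** 2))
    if energy == 0.0:
        return 0.0, 0.0, 0.0
    # Autocorrelation for non-negative lags, normalised so ac[0] == 1.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]
    kmin = int(fs / fmax)                      # shortest plausible period
    kmax = min(int(fs / fmin), len(ac) - 1)    # longest plausible period
    k = kmin + int(np.argmax(ac[kmin:kmax]))   # lag of the peak
    return energy, float(ac[k]), fs / k
```

A frame's f0 estimate would then be kept only when both energy and voicing exceed their thresholds.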
Where the signal is voiced, the autocorrelation coefficient is large at a time lag corresponding to one period of the glottal vibration. Where E_i is greater than an energy threshold and the autocorrelation is greater than the voicing threshold, I determine Fx for that frame from the time lag k at which the autocorrelation maximum occurs.

The upper two figures show time-domain analysis of the words "one, two, three, four..." spoken by two male speakers. The first panel shows the waveform of the speech signal; the second shows the fundamental frequency for those frames where both energy and autocorrelation are above their respective thresholds; the third shows the short-time energy; and the fourth shows the autocorrelation. In the left figure the Fx trace contains a lot of noise, and adjusting the voicing threshold did not improve it, so I tried the other approach, the frequency-domain method, which gives the better plot seen in the lower left figure. After discarding the unqualified frames, the remaining Fx traces are plotted in one figure (upper right) as a function of time. The difference in pitch frequency between male and female speakers is then easy to see: the pitch frequency of the males is around 100 Hz and the pitch frequency of the females is around 250 Hz.

This system can detect the pitch frequency and decide the gender of the speaker, but it cannot distinguish falsetto from the true voice and will then make a wrong decision. The time-domain method decides the gender of the speakers well, but some of its frequency results contain noise. The frequency-domain method removes some of that noise, but it is more useful for high pitch frequencies. In future work, the two methods could be combined to reach a better decision about the gender of the speakers.
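As a companion sketch, the frequency-domain estimator (FFT, log magnitude, inverse FFT, peak pick over typical speech quefrencies) and a toy gender decision can be written as below. The 175 Hz threshold is only an illustrative midpoint between the roughly 100 Hz male and 250 Hz female averages reported above; the function names and FFT size are likewise assumptions, not the project's code.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=50.0, fmax=500.0, nfft=1024):
    """Cepstral F0 estimate: FFT -> log|.| -> inverse FFT -> peak pick."""
    frame = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(frame, nfft)
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # avoid log(0)
    cepstrum = np.fft.irfft(log_mag, nfft)       # real cepstrum
    qmin = int(fs / fmax)                        # quefrency bounds, in samples
    qmax = min(int(fs / fmin), nfft // 2)
    q = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    return fs / q

def classify_gender(f0_track, threshold=175.0):
    """Toy rule: median F0 below the threshold -> 'male', else 'female'."""
    return "male" if np.median(f0_track) < threshold else "female"
```

On a voiced frame, the cepstral peak sits at a quefrency of one pitch period, so a periodic signal with an 80-sample period at 8 kHz yields an estimate near 100 Hz.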