Analysis of Acoustic Cues for Identifying the Consonant /ð/ in Continuous Speech

by Ying Alisa Cao

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 13, 2002

Copyright © 2002 Ying Alisa Cao. All Rights Reserved.

The author hereby grants to M.I.T. permission to reproduce and distribute publicly paper and electronic copies of this thesis and to grant others the right to do so.

Author: Department of Electrical Engineering and Computer Science, June 13, 2002
Certified by: Kenneth N. Stevens, Professor, Research Laboratory for Electronics, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Abstract

This project aims to advance automatic recognition of continuous speech by recognizing individual phonemes in speech using acoustic cues unique to each phoneme. This project focuses on studying the acoustic characteristics of one of the most prevalent phonemes of English, the fricative consonant /ð/, as in the words the, this, and those. Since previous research has shown that /ð/ can assimilate to its preceding phoneme, characteristics of /ð/ and its close-sounding phonemes, /n/, /d/, and /v/, are studied in the preceding context of a nasal consonant, a stop consonant, and a vowel, respectively.
By examining characteristics of /ð/ and its counterparts in phrases such as win those and win nose, the goal is to find the context-based and invariant cues for identifying /ð/. Spectrum analysis tools are used to extract important acoustic information such as the formant frequencies and their changes over time, the energy distribution over frequencies, and the duration of utterances. In the context of a nasal consonant, F2 and the change in F2 over a fixed interval of time (DeltaF2) are found to be the best cues: /ð/ has lower F2 and higher DeltaF2 than /n/. In the context of a stop consonant, F2 and the amplitude difference in the noise burst between the high and low frequency ranges are the best cues: /ð/ has lower F2 and a more negative amplitude difference. In the context of vowels, F3 and DeltaF3 are found to be the best cues: /ð/ generally has higher F3 and DeltaF3 than /v/, although they are not as reliable as the cues of the other two contexts. The formant frequencies are greatly influenced by the speaker's gender and the succeeding vowel, and they vary among speakers of the same gender. Thus, the more contextual and speaker information the identification criteria are based on, the more accurate the identification of /ð/ is likely to be. This correlation suggests that the human auditory system is also likely to rely on contextual information for the accurate processing of continuous speech.

Thesis Supervisor: Kenneth N. Stevens, Sc.D.
Title: Clarence J. Lebel Professor of Electrical Engineering, Research Laboratory for Electronics

Acknowledgments

My heartfelt gratitude goes to Professor Kenneth Stevens, for giving me this valuable learning experience to engage in research and scientific writing, for personally guiding me through the entire process, for his patience and understanding, and for his care and concern for my well-being.
I would like to thank my friends at MIT who have given me many warm and joyous memories and who have helped me immensely in all aspects of my life in the past five years. I also want to thank my parents who have worked hard and sacrificed much to provide me with the best education and much more.

Lastly, even though this thesis hardly measures up to all that I have received from them, I would like to dedicate this thesis to my grandparents. They, too, have sacrificed much to give me the best opportunities, and they gave me a wonderful childhood from which I draw faith and confidence. Their sincerity, work ethic, simple living, dedication toward family, and service to society have been and will always be my most cherished inspiration in life.

Table of Contents

I: Introduction
  I.1 Motivation
  I.2 Background Information
  I.3 Preliminary Work
  I.4 Research Objectives
II: Methodology
  II.1 Overview
  II.2 Database
  II.3 Parameters Analyzed
    II.3.1 Second and Third Formant Frequencies at the Onset of the Succeeding Vowel (F2 and F3)
    II.3.2 Change of f2 and f3 Over a Period of 50 ms (DeltaF2 and DeltaF3)
    II.3.3 Amplitude Difference (Amp(High-Mid)) and Duration of Burst of /ð/ and /d/
III: Results and Analysis
  III.1 Context of Nasal Consonant
    III.1.1 Second vs. Third Formant Frequencies (F2 vs. F3)
    III.1.2 Second Formant at the Onset of the Succeeding Vowel (F2)
    III.1.3 Movement of f2 Over a Period of 50 ms (DeltaF2)
    III.1.4 F2 and DeltaF2
  III.2 Context of Stop Consonant
    III.2.1 Second vs. Third Formant Frequencies (F2 vs. F3)
    III.2.2 The Second Formant Frequency at the Onset of the Succeeding Vowel (F2)
    III.2.3 Amplitude Difference in the Burst (Amp(High-Mid))
    III.2.4 Burst Duration
    III.2.5 F2 and Amp(High-Mid)
  III.3 Context of Vowel
    III.3.1 Second vs. Third Formant Frequencies (F2 vs. F3)
    III.3.2 Third Formant Frequency at the Onset of the Succeeding Vowel (F3)
    III.3.3 F3 and DeltaF3
IV: Conclusion and Future Work
  IV.1 Conclusion
  IV.2 Future Work
V. References
VI. Appendices
  Appendix A: Sentences Used in the Study
  Appendix B: Additional Results
    Appendix B.1 Context of Nasal Consonant
    Appendix B.2 Context of Vowel

List of Figures

Figure 1-1: a) An example spectrogram, b) An example output of formant tracks extracted by xkl. Note that this thesis denotes the second and third formant frequencies with lowercase "f", as "f2" and "f3", and denotes the second and third formant frequencies at the onset of the succeeding vowel with uppercase "F", as "F2" and "F3."
Figure 1-2: An idealized spectrogram and the important acoustic features measured in the preliminary study.
Figure 1-3: Three commonly observed patterns in the second and third formant frequencies of /ð/ and /v/.
Figure 1-4: Example of a combination (F3 and -F3) that separates /ð/ and /v/. Notice that a line representing some differentiation criteria can be drawn and would separate all utterances of /ð/ from those of /v/.
Figure 2-1: A male speaker's waveform of the utterance "V" in "TV" (Sentence #12). The changes in the waveform at around 224 ms indicate the onset of the vowel.
Figure 2-2: An example spectrum showing the value of F3 auto-picked by xkl (the dotted vertical line).
Figure 2-3: Spectra of the burst of a) /ð/ vs. b) /d/, of a female speaker. Notice that a prominent peak at around 4.7 kHz is seen in the spectrum of /d/ but not in that of /ð/.
Figure 2-4: Sample waveforms of a) /ð/ and b) /d/. In each waveform, the two vertical lines represent the points of waveform change, and the time in between the lines is the duration of the burst.
Figure 3-1: The mid-sagittal view of the vocal tract for /ð/, /n/, and /d/ [5]. The back cavity is more restricted for /ð/, a difference that leads to a lower F2 for /ð/ than for /n/ and /d/.
Figure 3-2: The average and standard deviations of F2 of /ð/ vs. /n/ in nasal context of female speakers, given the succeeding vowel /o/. Notice that the F2 ranges of /ð/ and /n/ do not overlap.
Figure 3-3: The average and standard deviations of F2 of /ð/ vs. /n/ in nasal context of male speakers, given the succeeding vowel /o/. Notice the slight overlap between the two F2 ranges.
Figure 3-4: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of female speakers, regardless of the succeeding vowel. Notice that it is impossible to draw a line such that /ð/ points would be on one side and /n/ points on the other.
Figure 3-5: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of male speakers, regardless of the succeeding vowel. Notice that it is impossible to draw a line such that /ð/ points would be on one side and /n/ points on the other.
Figure 3-6: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of female speakers, given succeeding vowel /i/.
Figure 3-7: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of individual female speakers, a) F#1, b) F#2, given succeeding vowel /i/. /ð/ and /n/ are more distinctly separated for each individual speaker than for both speakers (Figure 3-6).
Figure 3-8: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of female speakers, given succeeding vowel /o/. Notice that /ð/ and /n/ occupy different ranges of F2 but not of DeltaF2.
Figure 3-9: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of individual female speakers, a) F#1, b) F#2, given succeeding vowel /o/. /ð/ and /n/ are more distinctly separated for each individual speaker than for both speakers (Figure 3-8).
Figure 3-10: The average and standard deviations of F2 of /ð/ vs. /d/ in stop context of female speakers, given the succeeding vowel /o/. Notice that the F2 of /ð/ is significantly lower than that of /d/.
Figure 3-11: The average and standard deviations of F2 of /ð/ vs. /d/ in stop context of male speakers, given the succeeding vowel /o/. Male speakers show a trend similar to that of female speakers (Figure 3-10).
Figure 3-12: F2 and amplitude difference of /ð/ vs. /d/ in stop context of female speakers. Notice that /ð/ and /d/ are clearly separated by both F2 and amplitude difference.
Figure 3-13: F2 and amplitude difference of /ð/ vs. /d/ in stop context of male speakers. Unlike in the similar plot for female speakers (Figure 3-12), the regions of /ð/ and /d/ overlap slightly for the male speakers.
Figure 3-14: F2 and amplitude difference of /ð/ vs. /d/ in stop context of individual female speakers a) F#1, and b) F#2. Notice that regions of /ð/ and /d/ are clearly and consistently separated in both speakers.
Figure 3-15: F2 and amplitude difference of /ð/ vs. /d/ in stop context of individual male speakers a) M#1, b) M#2. /ð/ and /d/ are more distinctly separated for each individual speaker than for both speakers (Figure 3-13).
Figure 3-16: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) female speakers, b) male speakers, given succeeding vowel /i/.
Figure 3-17: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a male speaker a) M#1, b) M#2 given succeeding vowel /i/. /ð/ and /v/ are more distinctly separated for each individual speaker than for both speakers (Figure 3-16b).
Figure 3-18: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a female speaker a) F#1, b) F#2 given succeeding vowel /i/. /ð/ and /v/ remain difficult to separate even when the speakers are analyzed individually.
Figure 3-19: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a male speaker a) M#1, b) M#2 in two pairs of sentences succeeded by similar vowels (/o/ and //). Notice the large variations in both F3 and DeltaF3 of /ð/ and /v/.
Figure 3-20: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a male speaker (M#1) a) in sentences #3 and #11 (succeeded by vowel /c/) b) in sentences #5 and #16 (succeeded by vowel /D/). Notice the significant differences in F3 and DeltaF3 of both /ð/ and /v/ between the two plots.
Figure 3-21: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a male speaker (M#2) a) in sentences #3 and #11 (succeeded by vowel /c/) b) in sentences #5 and #16 (succeeded by vowel /;/). Notice that decreases in F3 and DeltaF3 of both /ð/ and /v/ similar to those in Figure 3-20b are seen in graph b).
Figure A-1: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of male speakers, given succeeding vowel /i/. Notice that the regions of /ð/ and /n/ overlap.
Figure A-2: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of individual male speakers a) M#1, b) M#2, given succeeding vowel /i/. Notice that /ð/ and /n/ are less separable for these male speakers than they are for female speakers (Figure 3-7).
Figure A-3: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of male speakers, given succeeding vowel /o/.
Figure A-4: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of individual male speakers a) M#1, b) M#2, given succeeding vowel /o/. Notice that /ð/ and /n/ become more separable (mostly by F2) for each individual speaker than for both speakers (Figure A-3), but the separation is not as distinct as in similar plots for the female speakers (Figure 3-9).
Figure A-5: The average and standard deviation of F3 of /ð/ vs. /v/ in vowel context of a) female speakers, b) male speakers, regardless of the succeeding vowel.
Figure A-6: The average and standard deviation of F3 of /ð/ vs. /v/ in vowel context of a) female speakers, b) male speakers, given succeeding vowel /i/.
Figure A-7: The average and standard deviation of F3 of /ð/ vs. /v/ in vowel context, given succeeding vowel /i/, of an individual male speaker (M#1). Notice that the overlap is significantly smaller for a given speaker than for both male speakers (Figure A-6b).
Figure A-8: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) male, b) female speakers in two pairs of sentences succeeded by similar vowels (/3/ and /r/). Notice the wide ranges of F3 and DeltaF3 for both /ð/ and /v/.
Figure A-9: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a female speaker a) F#1, b) F#2 in two pairs of sentences succeeded by similar vowels (/3/ and /0/). Notice the large variations in both F3 and DeltaF3 of /ð/ and /v/.
Figure A-10: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a female speaker (F#1) a) in sentences #3 and #11 (succeeded by I1/) b) in sentences #5 and #16 (succeeded by /o/).
Figure A-11: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a female speaker (F#2) a) in sentences #3 and #11 (succeeded by vowel //) b) in sentences #5 and #16 (succeeded by vowel /3/). Notice the differences in F3 and especially in DeltaF3 between the two plots, similar to those seen in the other speakers (Figures A-10, 3-20, and 3-21).
Figure A-12: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) female speakers, b) male speakers, given succeeding vowel /o/.
Figure A-13: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of individual female speakers a) F#1, b) F#2, given succeeding vowel /o/. Notice that /ð/ and /v/ are mixed in the same region.
Figure A-14: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of individual male speakers a) M#1, b) M#2, given succeeding vowel /o/. Notice that /ð/ has higher F3 and lower DeltaF3 than /v/; the same general pattern is seen in Figure A-13, despite exceptions in individual enunciations.
Figure A-15: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) female, b) male speakers, given succeeding vowel /æ/.
Figure A-16: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of individual speakers a) F#1, b) F#2, c) M#1, and d) M#2, given succeeding vowel /æ/. Notice that only c) shows clear distinction between /ð/ and /v/. The average values of /ð/ and /v/ of each plot, however, consistently show that /ð/ has higher F3 and lower DeltaF3 than /v/.
Figure A-17: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) female, b) male speakers given succeeding vowel /ɪ/. Notice that /ð/ tends to have higher F3 than /v/, but their DeltaF3 values tend to be in the same range.
Figure A-18: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of individual speakers a) F#1, b) F#2, c) M#1, d) M#2, given succeeding vowel /ɪ/. Neither parameter is very effective at separating /ð/ from /v/. In general, F3 is higher in /ð/. The average DeltaF3 of /ð/ seems slightly higher than that of /v/, which is an exception to the trend of DeltaF3 observed for the rest of the vowels.

List of Tables

Table 2-1: The contexts, close-sounding consonants, succeeding vowel, and phrases studied in the project. The underlined segments are the parts of the phrases analyzed in detail.
Table 2-2: Number of utterances analyzed in the study for each context.
Table 3-1: F2 and F3 of /ð/ vs. /n/ in the nasal context, listed by speakers and vowels. Notice that the corresponding F2 of /ð/ is less than that of /n/ for all cases, but F3 does not show such consistency (the asterisks mark cases of inconsistency). Thus F2 is further analyzed.
Table 3-2: Average F2 values for /ð/ vs. /n/ in the nasal context, listed by gender and succeeding vowel. Notice that F2 varies according to gender and the succeeding vowel, but F2 of /ð/ is always lower than the corresponding F2 of /n/.
Table 3-3: Average DeltaF2 of /ð/ vs. /n/ in nasal context, listed by speaker and succeeding vowel. The average DeltaF2 is positive for utterances succeeded by /o/ and negative for those succeeded by /i/. DeltaF2 of /ð/ is greater than that of /n/ in all cases.
Table 3-4: Average F2 values for /ð/ vs. /d/ in stop context, listed by gender. Notice that F2 is higher in /d/ than in /ð/. Note that for /ð/ and /d/ only succeeding vowel /o/ is examined.
Table 3-5: Amplitude difference between the high and mid frequency ranges of /ð/ vs. /d/ in stop context, listed by speaker and enunciation. Notice that the amplitude difference of /ð/ is usually negative and is less than that of /d/ in almost all cases.
Table 3-6: Burst duration of /ð/ vs. /d/ in stop context, listed by each speaker. Notice that no consistent difference is seen between the burst duration of /ð/ and /d/ across all speakers.
Table 3-7: Average F2 and amplitude difference between high and mid frequency ranges of /ð/ vs. /d/ in stop context, listed by gender. Notice that F2 is substantially lower in /ð/ than in /d/ and Amp(High-Mid) is negative for /ð/ but positive for /d/.

Chapter I: Introduction

I.1 Motivation

Progress in the intense study of acoustics and speech recognition over the past century has led to automatic speech recognition software such as SUMMIT, which is used in the MIT-developed Jupiter weather information system. Jupiter, in particular, has achieved a response accuracy of about 90% for novice users and over 98% for experienced users [1]. Such software, however, has a limited domain of acceptable queries, set by the limited number of recognized words; Jupiter can recognize only about 2000 words. This limited scope of recognition results from the algorithm of speech recognition units such as SUMMIT, which recognize words in the queries by matching key segments of the speech signals against a pre-stored database of phonemes [1]. This kind of search-and-match algorithm makes real-time recognition of continuous speech impractical, in terms of both the size limitations of the pre-stored library of vocabulary and the computation power currently available to run such an algorithm.
In light of such limitations, this study seeks to advance automatic speech recognition not by reducing the limitations of the search-and-match algorithm described above, but by finding sets of acoustic cues that can unambiguously identify individual phonemes in continuous speech. This study examines the spectral characteristics of phonemes and aims to incorporate these characteristics into algorithms that computers can use to achieve automatic phoneme recognition. In the process, the goal is to gain further understanding of how the human perceptual system processes, identifies, and differentiates between similar-sounding phonemes.

In addition to building better recognition tools that can potentially identify every phoneme in continuous speech, this research can also contribute to the development of computer-based methods for transforming spoken speech into its written form. Such a function would be highly valuable in a number of applications, such as taking notes for deaf and hard-of-hearing individuals.

I.2 Background Information

This project focuses on finding the acoustic characteristics of /ð/ for two main reasons. First, /ð/ is one of the most common phonemes in English [2]. Because /ð/ is found in common function words such as they, them, those, then, and the, it is the 7th most frequent consonant in spoken English and the most frequent consonant in word-initial position. Because function words can be important for extracting the meaning of sentences, it is important to be able to recognize intended /ð/ in natural speech. Second, because /ð/ is usually found in the same position, at the beginning of words, it is easier to study.

/ð/ is a voiced fricative with weak noise, and is produced by air turbulence created when air from the lungs is forced through the vocal tract constriction formed by the tongue and the teeth [3]. /ð/ is generally unstressed, as in see the ball, but it can also be stressed, as in see that ball.
As mentioned, /ð/ is often found at the beginning of words, but it can also be found in the middle of words, such as bother, father, and weather, and at the end of a few words, such as seethe, bathe, and teethe [4].

Some recent research on consonants in the varied contexts that occur in normal speech has shown that the acoustic features of phonemes vary according to the identity of adjacent phonemes [3], [4]. Therefore, acoustic cues for identifying /ð/ could be context-dependent, meaning that the auditory system may identify /ð/ based on one of several different sets of cues, depending on the perceived context. At the same time, given that our auditory system can recognize intended /ð/ in various contexts as /ð/, it is possible that there exists a set of invariant acoustic cues for speech perception: characteristics that are common to /ð/ in all contexts. Since our purpose is to find such invariant and context-dependent cues, characteristics of /ð/ are analyzed in various contexts.

Spectral characterization of /ð/ preceded by nasal consonants has already been studied for /n/ [4]. The research found that /ð/ assimilates and becomes nasalized when preceded by /n/, i.e., the entire consonant region in the spectrogram of /ð/ shows characteristics like those found in /n/. At the same time, acoustic evidence suggests that contextually-nasalized /ð/ retains its dental place of articulation [4]. This evidence is based on the second formant frequency (f2) in the following vowel. F2 is considerably lower at the release of contextually-nasalized /ð/ than at the release of a true /n/. Furthermore, listeners can generally tell the difference between natural tokens of win nose and win those, even when /ð/ is completely nasalized [4]. And when synthetic stimuli were constructed in which signals differed only in F2 near the nasal consonant region of win nose and win those, listeners systematically reported hearing the latter more often when F2 was low at the release of the nasal consonant [4].
These results are consistent with the claims in the literature that, despite contextual assimilation, listeners can recognize the intended phoneme [3]. Finding the acoustic cues that help listeners to recognize all of the contextually modified /ð/ as the same phoneme is the objective of this research. To achieve this objective, this study first analyzes characteristics of /ð/ in isolated enunciation of vowel-consonant(fricative)-vowel (VCV) combinations by comparing the spectrogram of /ð/ to that of its two closest phonemes, /v/ and /z/. It then proceeds to examine /ð/ in a variety of contexts in sentence material.

I.3 Preliminary Work

Preliminary characterization has been done for the spectrograms of the fricatives /ð/, /z/, and /v/ that occur in isolated vowel-fricative-vowel (VCV) combinations containing the vowels /a/, /x/, /e/, /i/, and /u/. The fricative /z/ has visible high frequency content in its spectrogram that is not found in the spectrograms of /ð/ and /v/. Thus, /z/ can easily be differentiated from the other two fricatives, and efforts are subsequently focused on finding the more subtle spectral differences between /ð/ and /v/. To do so, a program called xkl is used to extract frequency information from the original spectrogram (see Figure 1-1). An idealized spectrogram illustrating the features of /ð/ and /v/ that are measured and compared is displayed in Figure 1-2.

[Figure: panels a) and b), frequency (kHz) against time (ms)]
Figure 1-1: a) An example spectrogram, b) An example output of formant tracks extracted by xkl. Note that this thesis denotes the second and third formant frequencies with lowercase "f", as "f2" and "f3", and denotes the second and third formant frequencies at the onset of the succeeding vowel with uppercase "F", as "F2" and "F3."
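The notation in the caption of Figure 1-1 can be illustrated with a short Python sketch. It is not the xkl procedure used in this thesis; it only assumes that a formant track (lowercase f2) is available as sampled times and frequencies, reads F2 as the track value at the vowel onset, and takes DeltaF2 as the change over the 50 ms interval listed for Section II.3.2. The function names, the linear interpolation, the sample values, and the sign convention (later value minus onset value) are all assumptions made for illustration, and the sign may be the reverse of the one used in the thesis.

```python
from bisect import bisect_left


def interpolate(times_ms, values_hz, t_ms):
    """Linearly interpolate a sampled formant track at time t_ms.

    times_ms must be sorted in increasing order (hypothetical helper).
    """
    if not times_ms[0] <= t_ms <= times_ms[-1]:
        raise ValueError("time lies outside the formant track")
    i = bisect_left(times_ms, t_ms)
    if times_ms[i] == t_ms:
        return values_hz[i]
    t0, t1 = times_ms[i - 1], times_ms[i]
    v0, v1 = values_hz[i - 1], values_hz[i]
    return v0 + (v1 - v0) * (t_ms - t0) / (t1 - t0)


def onset_and_delta(times_ms, track_hz, onset_ms, interval_ms=50.0):
    """Return (value at vowel onset, change over interval_ms).

    Assumed sign convention: track(onset + interval) - track(onset).
    """
    onset_value = interpolate(times_ms, track_hz, onset_ms)
    later_value = interpolate(times_ms, track_hz, onset_ms + interval_ms)
    return onset_value, later_value - onset_value


# Made-up f2 track sampled every 10 ms; these are not measurements from the thesis.
times_ms = [200.0 + 10.0 * k for k in range(9)]   # 200 ms .. 280 ms
f2_track_hz = [1500.0 + 50.0 * k for k in range(9)]  # 1500 Hz .. 1900 Hz

onset_f2, delta_f2 = onset_and_delta(times_ms, f2_track_hz, onset_ms=225.0)
```

The same function applies unchanged to an f3 track to obtain F3 and DeltaF3 under the same assumptions.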
Figure 1-2: An idealized spectrogram and the important acoustic features measured in the preliminary study.

The parameters measured include +F1, +F2, +F3, -F1, -F2, -F3, +Slope of F2, +Slope of F3, -Slope of F2, -Slope of F3, and fricative duration. The most useful parameter for differentiating /ð/ and /v/ turns out to be +F3. F3 values of /ð/ are greater than those of /v/ in all of the VCV enunciations studied. The rest of the parameters can also be useful in differentiating between /ð/ and /v/, especially when the context is known. Three common patterns of movement of F2 and F3 are observed, as shown in Figure 1-3.

Figure 1-3: Three commonly observed patterns in the second and third formant frequencies of /ð/ and /v/.

Furthermore, using combinations of parameters, /ð/ and /v/ can be distinguished in a number of utterances, but not in all. Figure 1-4 shows an example of a combination of characteristics that is successful in distinguishing /ð/ and /v/.

Figure 1-4: Example of a combination (F3 and -F3) that separates /ð/ and /v/. Notice that a line representing some differentiation criterion can be drawn that would separate all utterances of /ð/ from those of /v/.

1.4 Research Objectives

The objective of the project is to apply the methods and results of the analysis of the spectral characteristics of /ð/ and /v/ in the simpler VCV enunciations toward finding cues that can identify all intended /ð/'s in various contexts in continuous speech. More specifically, the research intends to accomplish the following:

1. Identify the invariant and context-based differences between the spectrograms and spectra of /ð/ and its close-sounding phonemes.
2. Gain insight into how the human perceptual system processes and identifies the contextually-varying /ð/ and other phonemes during cognitive processing of continuous speech.

Chapter II: Methodology

II.1 Overview

This project examines the invariant and context-based characteristics of /ð/. A set of eighteen sentences is designed such that /ð/ and its close-sounding phonemes appear between the same preceding and succeeding contexts. These sentences are listed in Appendix A. The three contexts studied are nasal, stop, and vowel, and the corresponding close-sounding phonemes of /ð/ are /n/, /d/, and /v/, respectively. Based on the findings of the preliminary work, characteristics useful in identifying /ð/, such as F2 and F3, are measured and analyzed for the new database of sentences. Additional parameters of the burst of /ð/ and /d/ in the context of stop consonants are also measured and analyzed.

II.2 Database

Eighteen sentences containing /ð/ and its close-sounding phonemes, /n/, /d/, and /v/, are constructed to form the database for this project (see Appendix A). Phrases in the original sentences that contain the consonants of interest are included in Table 2-1. The sequences of preceding context, consonant (/ð/; or /n/, /d/, and /v/), and succeeding vowel that are analyzed in detail are underlined in Table 2-1. Because the succeeding vowel may influence the characteristics of the consonants, the succeeding vowels of each pair of phrases are chosen to be the same, as seen in the table.

Type of preceding context   Preceding phoneme   Close-sounding consonant (/ð/ vs. ...)   Succeeding vowel   Phrases
Nasal                       /n/                 /n/                                      /o/                1. win those / win nose
Nasal                       /n/                 /n/                                      /i/                2. win these / win niece
Stop                        /t/                 /d/                                      /o/                1. put those / put dough
Vowel                       /i/                 /v/                                      /i/                1. guarantee these / guard TV's
Vowel                       /i/                 /v/                                      /ɪ/                2. see this / see Victorian
Vowel                       /u/                 /v/                                      /ə/                3. to the racial / two veracious
Vowel                       /ai/                /v/                                      /æ/                4. dye that / dye vat
Vowel                       /a/                 /v/                                      /o/                5. via those / via votes
Vowel                       /ei/                /v/                                      /ɛ/                6. may then / may vend

Table 2-1: The contexts, close-sounding consonants, succeeding vowels, and phrases studied in the project. The underlined segments are the parts of the phrases analyzed in detail.

To find the spectral characteristics of /ð/ that are common to both genders, the sentences are spoken by both male and female speakers. Because there may be variations among speakers and enunciations, each sentence is spoken by two male and two female speakers and is repeated three times by each speaker. For each context, the database provides the number of consonant pairs listed in Table 2-2.

Type of preceding context   Number of pairs of phrases   Number of repetitions   Total number of utterances analyzed
Nasal                       2                            11                      22
Stop                        1                            11                      11
Vowel                       6                            11                      66

Table 2-2: Number of utterances analyzed in the study for each context.

The actual number of data points analyzed may be smaller than the numbers listed here, because some formant frequencies are impossible to determine in some enunciations. Also, 12 repetitions (4 speakers and 3 repetitions per speaker) were planned, but due to problems during recording, one set of utterances by one of the female speakers was not used for the study. So, only 11 repetitions of each pair of sentences were available for analysis.

II.3 Parameters Analyzed

II.3.1 Second and Third Formant Frequencies at the Onset of the Succeeding Vowel (F2 and F3)

The onset time of the succeeding vowel is determined by combining three pieces of information: changes in the enunciation's sound wave, the formant tracks extracted by xkl, and the spectrogram. Because vowels, unlike consonants, are formed with no obstruction in the vocal tract, significantly more sound energy is concentrated at the lower formant frequencies of vowels than in the case of consonants. High amplitude of sound energy is represented by dark bands on the spectrogram and by bold points in the formant tracks extracted by xkl.
Thus, the abrupt appearance of dark bands in the spectrogram and of bold formant-frequency points in the formant tracks gives the approximate onset time of the succeeding vowel.

A more exact onset time is determined by looking for abrupt changes in the waveform. Because the vocal tract has different shapes for vowels and consonants, the change as a speaker switches from a consonant to a vowel is reflected in the waveform, as shown in Figure 2-1. In the example given below, the point of abrupt change (the appearance of the first negative peak) at around 224 ms marks the onset of the vowel; this value would correspond to the times of abrupt appearance of dark bands and bold points in the spectrogram and formant tracks, respectively.

Figure 2-1: A male speaker's waveform of the utterance "V" in "TV" (Sentence #12), plotted against time (ms). The changes in the waveform at around 224 ms indicate the onset of the vowel.

Because the interest is to study the pattern of change of the second and third formant frequencies over time, a narrow time window that gives detailed time-domain information is used. In this case, a 6.0 millisecond Hamming window is chosen. Since the amplitudes of sound energy at the higher formant frequencies tend to be low for vowels, which makes the determination of F3 more problematic, the pre-emphasis parameter in xkl is set to 100 to raise the amplitude of the F3 prominence.

The formant tracks extracted by xkl (the right-hand panel of Figure 1-1) show abrupt jumps that are uncharacteristic of formant frequencies, due to the measuring algorithm used in xkl. To get a more accurate measurement of F2 and F3, the formant frequencies given by the 6.0 ms Hamming window are averaged over 15 milliseconds, from 7 milliseconds before the onset time to 7 milliseconds after. An example of the averaged spectrum is shown in Figure 2-2.
The auto-pick option is turned on to let xkl find the most accurate values for the formant frequencies, which correspond to the peak frequencies (x-coordinates) seen in the spectrum.

Figure 2-2: An example spectrum (averaged DFT spectrum, 6.0 ms window, 276 to 290 ms; amplitude in dB against frequency in kHz) showing the auto-picked value of F3 (the dotted vertical line) by xkl.

II.3.2 Change of f2 and f3 Over a Period of 50 ms (DeltaF2 and DeltaF3)

The same parameter settings of a 6.0 ms Hamming window, pre-emphasis of 100, and 15 ms time-averaging are used to read the f2 and f3 values at 50 ms after the onset. The midpoint of the time average is the onset time plus 50 ms. DeltaF2 and DeltaF3 are the f2 and f3 values at the later time minus those at the earlier time, respectively.

II.3.3 Amplitude Difference (Amp(High-Mid)) and Duration of Burst of /ð/ and /d/

The burst spectrum of /d/ is expected to contain more energy in the higher frequency range than that of /ð/, and /d/ is expected to have a longer duration than /ð/. Thus the amplitude and duration of the two consonant bursts are measured in the hope of finding characteristics that would separate the two phonemes. Example spectra of bursts of /ð/ and /d/ that illustrate the difference are shown in Figure 2-3.

Figure 2-3: Spectra (amplitude in dB against frequency in kHz) of the burst of a) /ð/ vs. b) /d/, of a female speaker. Notice that a prominent peak at around 4.7 kHz is seen in the spectrum of /d/ but not in that of /ð/.

To measure and quantify this spectral difference, the amplitudes of the peaks in the high and mid frequency ranges are measured, and the difference is taken. This difference is named the amplitude difference, and will be denoted as Amp(High-Mid) from here on.
Due to the differences in formant frequencies between the genders, the Amp(High-Mid) value is defined slightly differently for each gender; the cutoff frequencies defining the ranges are higher for female speakers than for male speakers. More specifically:

1. For female speakers, Amp(High-Mid) equals the amplitude difference between the highest peak at frequencies higher than 4000 Hz and the highest peak (excluding the F1 peak) at under 3000 Hz.
2. For male speakers, Amp(High-Mid) equals the amplitude difference between the highest peak at frequencies higher than 3500 Hz and the highest peak (excluding the F1 peak) at under 2500 Hz.

As done for the succeeding vowels, the spectra of /ð/ and /d/ are also generated by a 6.0 ms Hamming window, averaged over 15 ms, and the formant frequencies for the spectra are auto-picked by xkl. The midpoint of the time average is the onset of /ð/ or /d/, which is easily determined by looking for abrupt changes in the waveform (see Figure 2-4). The pre-emphasis is set at 0, because /ð/ and /d/ both have enough energy distributed in the higher frequencies for f3 to be determined unambiguously. Figure 2-4 also illustrates how the durations of /ð/ and /d/ are determined.

Figure 2-4: Sample waveforms of a) /ð/ and b) /d/, plotted against time (ms). In each waveform, the two vertical lines represent the points of waveform change, and the time in between the lines is the duration of the burst.

In each waveform, the former and latter points of change indicate the beginning and the end of the burst of /ð/ or /d/. The time difference between the two points is the duration of the burst.
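The gender-specific definition of Amp(High-Mid) above can be sketched in a few lines. This is only an illustration, not the procedure used with xkl: the peak list format, the function name `amp_high_mid`, the way the F1 peak is excluded (by a frequency tolerance), and the example peaks are all assumptions introduced here.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (frequency in Hz, amplitude in dB)

# Cutoffs from the text: (lower edge of the high range, upper edge of the mid range), in Hz.
CUTOFFS_HZ = {"female": (4000.0, 3000.0), "male": (3500.0, 2500.0)}


def amp_high_mid(peaks: List[Peak], gender: str, f1_hz: float,
                 f1_tolerance_hz: float = 50.0) -> float:
    """Highest peak above the high cutoff minus the highest non-F1 peak below the mid cutoff.

    The F1 peak is skipped by frequency tolerance; this is an assumed convention.
    """
    high_cut, mid_cut = CUTOFFS_HZ[gender]
    high = [amp for freq, amp in peaks if freq > high_cut]
    mid = [amp for freq, amp in peaks
           if freq < mid_cut and abs(freq - f1_hz) > f1_tolerance_hz]
    if not high or not mid:
        raise ValueError("need at least one peak in each frequency range")
    return max(high) - max(mid)


# Hypothetical burst peaks (not measured data), loosely shaped like a /d/ burst
# with a prominent high-frequency peak.
example_peaks = [(500.0, 55.0), (1800.0, 40.0), (2700.0, 35.0), (4700.0, 46.0)]
example_value = amp_high_mid(example_peaks, "female", f1_hz=500.0)
```

With these made-up peaks the high-range peak is 46 dB and the strongest non-F1 peak below 3000 Hz is 40 dB, so the difference is positive, the sign the text expects for /d/.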
Chapter III: Results and Analysis

The second and third formant frequencies at the onset of the succeeding vowel (F2 and F3) and their change over time (DeltaF2 and DeltaF3) were measured and analyzed for all three contexts. The duration and the amplitude differences of the burst of /ð/ and /d/ were also examined in the context of a stop consonant. The resulting measurements for the three contexts are presented in the tables and figures below. Pairs of parameters that show consistent differences between /ð/ and its close-sounding phonemes are graphed as (x, y) pairs for the purpose of finding criteria that would separate the consonants.

III.1 Context of Nasal Consonant

III.1.1 Second vs. Third Formant Frequencies (F2 vs. F3)

F2 and F3, the second and third formants at the onset of the succeeding vowel, were measured for /ð/ and /n/. Table 3-1 shows the average value for each speaker, given a particular consonant and succeeding vowel (F#1 denotes female speaker #1, M#1 denotes male speaker #1, and so on).

Parameter Vowel Consonant /o/ // /n/ F2 /n/ /o/ /n/ F3 /i/ /n/ F2 of Individual Speaker (Hz) M#2 F#2 M#J F#1 1386 1654 1606 1260 2394 1952 1312 1533 2363 2552 2773 3072 3009 3087 2037 2394 2793 2856 2825 2993 2016 2058 2667 2741 2694* 2667* 1974 2032 2342* 2258* 2426 2436

Table 3-1: F2 and F3 of /ð/ vs. /n/ in the nasal context, listed by speakers and vowels. Notice that the corresponding F2 of /ð/ is less than that of /n/ for all cases, but F3 does not show such consistency (the asterisks mark cases of inconsistency). Thus F2 is further analyzed.

Notice that in Table 3-1, given the same speaker and vowel, the F2 of /ð/ is less than that of /n/ for all cases. This difference holds not only for the average values but for each enunciation as well. F3, however, does not show the same consistency. The average F3 of /ð/ is less than that of /n/ for only 6 out of 8 of the cases (see Table 3-1; not true for the two sets of F3 marked with "*").
The relationship between F3 of /ð/ and /n/ is even less consistent in each enunciation; only about half of the enunciations have a higher F3 for /n/ than for /ð/. Therefore, F2 is determined to be a better parameter for identifying /ð/ in the nasal context and is thus further studied.

III.1.2 Second Formant at the Onset of the Succeeding Vowel (F2)

F2 of /ð/ and /n/ is further analyzed by taking the average of F2 for all of the utterances produced within each gender group. See Table 3-2 below for results.

                    F2 of Female Speakers (Hz)   F2 of Male Speakers (Hz)
Succeeding vowel    /ð/        /n/               /ð/        /n/
/o/                 1660       1993              1360       1423
/i/                 2200       2473              1995       2045

Table 3-2: Average F2 values for /ð/ vs. /n/ in the nasal context, listed by gender and succeeding vowel. Notice that F2 varies according to gender and the succeeding vowel, but F2 of /ð/ is always lower than the corresponding F2 of /n/.

Table 3-2 indicates that corresponding F2's differ significantly between the genders, with the difference ranging from around 200 to 570 Hz. This gender difference is expected because male speakers have longer vocal tracts, which produce lower formant frequencies, according to the perturbation properties of resonators [6].

Table 3-2 indicates that the succeeding vowel also significantly influences F2. F2 of both /ð/ and /n/ when succeeded by /o/ is consistently around 600 Hz lower than that of the consonants when succeeded by /i/. The correlation of F2 with the succeeding vowel is expected, because different vowels result from different vocal tract shapes, which in turn resonate at different formant frequencies, according to the perturbation properties of resonators [6]. The vowel /i/ is expected to have a high F2 because /i/ has a fronted and high tongue body position [6]. On the other hand, /o/ has a back tongue position and thus should have a lower F2 [6]. The measured F2 agrees with these expectations, and thus helps ensure the validity of the rest of the analysis.
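The gender and vowel differences quoted above follow directly from the averages in Table 3-2, and the arithmetic can be checked with a short script. This is a sketch only: the dictionary layout and the labels ("dh" for /ð/) are conventions assumed here, and the values are the Table 3-2 averages as read from the table.

```python
# Average F2 (Hz) from Table 3-2, keyed by (gender, succeeding vowel, consonant).
# "dh" stands for the consonant in "those"; the key layout is an assumption of this sketch.
F2_HZ = {
    ("female", "o", "dh"): 1660, ("female", "o", "n"): 1993,
    ("female", "i", "dh"): 2200, ("female", "i", "n"): 2473,
    ("male", "o", "dh"): 1360, ("male", "o", "n"): 1423,
    ("male", "i", "dh"): 1995, ("male", "i", "n"): 2045,
}

VOWELS = ("o", "i")
CONSONANTS = ("dh", "n")
GENDERS = ("female", "male")

# Female minus male F2 for each (vowel, consonant) pair.
gender_gap = {
    (v, c): F2_HZ[("female", v, c)] - F2_HZ[("male", v, c)]
    for v in VOWELS for c in CONSONANTS
}

# F2 before /i/ minus F2 before /o/ for each (gender, consonant) pair.
vowel_gap = {
    (g, c): F2_HZ[(g, "i", c)] - F2_HZ[(g, "o", c)]
    for g in GENDERS for c in CONSONANTS
}

# F2 of the fricative is below F2 of /n/ in every gender and vowel condition.
dh_below_n = all(
    F2_HZ[(g, v, "dh")] < F2_HZ[(g, v, "n")] for g in GENDERS for v in VOWELS
)

print("gender gaps (Hz):", sorted(gender_gap.values()))
print("vowel gaps (Hz):", sorted(vowel_gap.values()))
```

The gender gaps span 205 to 570 Hz, and the vowel gaps lie between 480 and 635 Hz, consistent with the "around 200 to 570 Hz" and "around 600 Hz" figures in the text.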
The differences in F2 of the same consonant when succeeded by different vowels (Table 3-2) suggest that knowledge of the identity of the succeeding vowel could be important in the correct identification of /ð/. More specifically, if the succeeding vowel is not taken into consideration, the average F2 of /ð/ for female speakers would be the average of 1660 and 2200 Hz (Table 3-2), or 1930 Hz, and the average F2 of /n/ would be the average of 1993 and 2473 Hz (Table 3-2), or 2233 Hz. Given these average F2 values of /ð/ and /n/, a reasonable cutoff F2 to differentiate /ð/ and /n/ could be the midpoint, or 2082 Hz, i.e. consonants with a higher F2 would be identified as /n/, and consonants with a lower F2 would be identified as /ð/. If this were the case, for /ð/ and /n/ succeeded by /o/, the vast majority of both /ð/ and /n/ would be determined to be /ð/, since more than half of the /n/ would have F2 less than 2082 Hz. To avoid this type of misclassification, differentiation criteria for the consonants would be much more accurate if they are set with regard to a particular succeeding vowel.

Table 3-2 also shows that for both genders and vowels, F2 of /n/ is consistently greater than that of the corresponding /ð/. This observation is reasonable based on the difference in the vocal tract shapes of the two consonants. Figure 3-1 shows that /ð/ has more constriction in the back cavity than /n/ (and /d/). The shape of the back cavity is known to have the strongest influence on the value of F2: the more constriction, the lower F2 [6]. The data obtained are consistent with findings in the literature and thus strengthen the validity of the results.

Figure 3-1: The mid-sagittal view of the vocal tract for /ð/, /n/, and /d/ [5]. The back cavity is more constricted for /ð/, a difference that leads to a lower F2 for /ð/ than for /n/ and /d/.

The average and standard deviation of F2 for the vowel /o/ are calculated and graphed in Figures 3-2 and 3-3.
The height of each bar represents the average F2, and the extension above and below the bar represents one standard deviation above and below the average.

Figure 3-2: The average and standard deviations of F2 of /ð/ vs. /n/ in nasal context of female speakers, given the succeeding vowel /o/. Notice that the F2 ranges of /ð/ and /n/ do not overlap.

Figure 3-3: The average and standard deviations of F2 of /ð/ vs. /n/ in nasal context of male speakers, given the succeeding vowel /o/. Notice the slight overlap between the two F2 ranges.

For female speakers, the ranges of F2 for /ð/ and /n/ are relatively distinct. Most of their F2 values fall between 1590-1662 Hz for /ð/, as opposed to 1886-2371 Hz for /n/. For male speakers, however, F2 of /n/ and /ð/ are less distinct and their ranges overlap (Figure 3-3). The ranges of the majority of F2 values of male speakers are 1249-1398 Hz and 1301-1545 Hz, for /ð/ and /n/ respectively.

The lack of overlap of F2 ranges in female speakers (Figure 3-2) suggests that substantial differences exist between the F2's of the two consonants, and thus it is possible to set a criterion that would separate most utterances of /ð/ from /n/ on the basis of F2 alone. In this particular case, the cutoff F2 that would best differentiate between the two consonants is somewhere between the lower end of the /n/ range (1886 Hz) and the higher end of the /ð/ range (1662 Hz). If the criterion is set halfway between the two values for simplicity, or at 1774 Hz, it would be 1.47 standard deviations away from the mean F2 of /n/ and 4.11 standard deviations from the mean F2 of /ð/. Therefore, about 93% of /n/ and virtually 100% of /ð/ would be identified correctly, assuming Gaussian distributions. With fine adjustments, the cutoff could be set so that it would be as many standard deviations away from the average of both consonants as possible.
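The standard-deviation distances and identification rates above can be reproduced with a short calculation. The sketch below assumes that the quoted F2 ranges for female speakers are the mean plus and minus one standard deviation (as the bars in Figure 3-2 are described) and that F2 is Gaussian, as the text does; the function names are introduced here for illustration. The helper `equal_z_cutoff` is one possible reading of a cutoff that is "as many standard deviations away from the average of both consonants as possible", not a procedure stated in the thesis.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_sd_from_range(low: float, high: float) -> tuple:
    """Treat a quoted range as mean - 1 SD to mean + 1 SD (an assumption)."""
    return (low + high) / 2.0, (high - low) / 2.0


def cutoff_report(dh_range, n_range, cutoff_hz: float) -> dict:
    """Distance of the cutoff from each mean in SD units, and Gaussian hit rates."""
    dh_mean, dh_sd = mean_sd_from_range(*dh_range)
    n_mean, n_sd = mean_sd_from_range(*n_range)
    z_dh = (cutoff_hz - dh_mean) / dh_sd  # fricative lies below the cutoff
    z_n = (n_mean - cutoff_hz) / n_sd     # /n/ lies above the cutoff
    return {
        "z_dh": z_dh,
        "z_n": z_n,
        "p_dh_correct": normal_cdf(z_dh),
        "p_n_correct": normal_cdf(z_n),
    }


def equal_z_cutoff(dh_range, n_range) -> float:
    """Cutoff that is equally many SDs from both means."""
    dh_mean, dh_sd = mean_sd_from_range(*dh_range)
    n_mean, n_sd = mean_sd_from_range(*n_range)
    return (dh_mean * n_sd + n_mean * dh_sd) / (dh_sd + n_sd)


# Female speakers, succeeding vowel /o/: ranges quoted in the text (Hz).
DH_RANGE = (1590.0, 1662.0)
N_RANGE = (1886.0, 2371.0)

midpoint_cutoff = (DH_RANGE[1] + N_RANGE[0]) / 2.0  # 1774 Hz
report = cutoff_report(DH_RANGE, N_RANGE, midpoint_cutoff)
balanced_cutoff = equal_z_cutoff(DH_RANGE, N_RANGE)
```

Under these assumptions the 1774 Hz cutoff sits roughly 4.1 standard deviations above the mean of the fricative and roughly 1.5 below the mean of /n/, which gives an /n/ identification rate near 93%; small differences from the figures in the text come from rounding of the quoted ranges.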
Such adjustment would further increase the accuracy of identifying the intended consonants.

The criterion for differentiating /ð/ and /n/ based on F2 is harder to set for the given male speakers. But since the two consonants are articulated similarly by the two genders, the apparent greater difficulty in differentiating /ð/ and /n/ in the male speakers is most likely speaker-specific, due to the particular speakers studied. More specifically, F2 data for each utterance by the two male speakers reveal substantial variation of F2 between the two speakers: F2 of one speaker is significantly and consistently higher than that of the other. For each speaker, however, F2 of /ð/ and /n/ show differences similar to those observed in the female speakers. Thus, the more acoustic characteristics are known about a particular speaker, the more accurate the identification of /ð/ is likely to be for that speaker.

III.1.3 Movement of f2 Over a Period of 50 ms (DeltaF2)

The second formant frequency at 50 ms after the onset of the vowel is measured by the methods described in Chapter II. The average frequency differences between the two times for a given speaker and succeeding vowel are tabulated in Table 3-3.

                                Average DeltaF2 of Individual Speaker (Hz)
Succeeding vowel   Consonant    F#1      F#2      M#1      M#2
/o/                /ð/          -205     -94      -178     -147
/o/                /n/          -409     -188     -179     -221
/i/                /ð/          315      410      221      111
/i/                /n/          205      241      179      74

Table 3-3: Average DeltaF2 of /ð/ vs. /n/ in nasal context, listed by speaker and succeeding vowel. The average DeltaF2 is negative for utterances succeeded by /o/ and positive for those succeeded by /i/. DeltaF2 of /ð/ is greater than that of /n/ in all cases.

Notice that the average DeltaF2 is negative for all speakers when the succeeding vowel is /o/ and is positive when the succeeding vowel is /i/. This observation is also consistent with the characteristics of F2 of /o/ and /i/. As mentioned earlier, the tongue is raised and fronted when producing /i/.
These actions cause a widening of the back cavity of the vocal tract and are reflected by a rising F2. The tongue is moved toward a back position when producing /o/, and the movement results in a lowering of F2, as observed in the data.

III.1.4 F2 and DeltaF2

Since F2 and DeltaF2 are, for the most part, consistently different for /ð/ and /n/, they are chosen to be graphed as x- and y-coordinates, i.e. (F2, DeltaF2). Figures 3-4 and 3-5 show F2 and DeltaF2 of /ð/ vs. /n/, regardless of the succeeding vowel, for the female and male speakers, respectively. Notice that /ð/ and /n/ are mixed together in both figures. It is impossible to draw a line, representing a certain differentiation criterion, such that points of /ð/ would be on one side and points of /n/ on the other.

The F2 and DeltaF2 of /ð/ and /n/ of the female speakers, when plotted for a particular succeeding vowel (/i/) as in Figure 3-6, show better separation between /ð/ and /n/ than in the case of mixed vowels in Figure 3-4. The separation becomes even clearer in Figure 3-7, when the plots of F2 and DeltaF2 are further narrowed down to each speaker. Notice that for both female speakers, /ð/ has lower F2 and higher DeltaF2 than /n/. The same trend is also observed for both female speakers in /ð/ and /n/ succeeded by /o/ in Figures 3-8 and 3-9.

After normalization by vowel and speaker, points for /ð/ and /n/ can be separated onto different sides of a positively sloped line. Such a line would be the graphical representation of an algorithm that differentiates intended /ð/ from /n/, and it would be specified by assigning the appropriate coefficients to F2 and DeltaF2. Figures 3-7 and 3-9 show that /ð/ and /n/ are indeed distinct enough that it is possible for an algorithm based on F2 and DeltaF2 values to correctly distinguish between the two consonants.
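A positively sloped separating line in the (F2, DeltaF2) plane can be written as a simple decision rule. The sketch below is illustrative only: the class and function names are introduced here, and the slope and anchor point are hypothetical placeholders, not coefficients fitted in this study. In practice the line would be chosen per succeeding vowel and per speaker, as the text argues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineRule:
    """Decision line DeltaF2 = slope * F2 + intercept in the (F2, DeltaF2) plane."""
    slope: float      # Hz of DeltaF2 per Hz of F2; positive, as described in the text
    intercept: float  # DeltaF2 (Hz) where the line crosses F2 = 0

    def classify(self, f2_hz: float, delta_f2_hz: float) -> str:
        # Lower F2 and higher DeltaF2 put a point above the line, on the fricative side.
        boundary = self.slope * f2_hz + self.intercept
        return "dh" if delta_f2_hz > boundary else "n"


def line_through(point_hz: tuple, slope: float) -> LineRule:
    """Build a rule whose line passes through a chosen (F2, DeltaF2) point."""
    f2_0, delta_0 = point_hz
    return LineRule(slope=slope, intercept=delta_0 - slope * f2_0)


# Hypothetical rule for one vowel condition: slope and anchor point are placeholders.
rule = line_through((2336.5, 260.0), slope=0.4)

# Illustrative points loosely based on averages reported for the vowel /i/
# (not individual measurements from the study).
label_low_f2 = rule.classify(2200.0, 315.0)   # lower F2, higher DeltaF2
label_high_f2 = rule.classify(2473.0, 205.0)  # higher F2, lower DeltaF2
```

With these placeholder values the first point falls above the line and is labeled "dh", while the second falls below it and is labeled "n". Choosing the slope and intercept from data, for example by fitting a linear discriminant per speaker and vowel, is left open here.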
Similar plots of F2 and DeltaF2 of /ð/ and /n/ for a particular vowel for each of the two male speakers are included in Appendix B (Figures A-2 and A-4). Notice that each of the four graphs shows more separation of the /ð/ and /n/ points than the corresponding plot in which the two male speakers' data are mixed (Figures A-1 and A-3). Again, the clearer separation of /ð/ and /n/ in the plots for each individual speaker shows that, in addition to knowledge of the succeeding vowel, knowledge of the particular speaker also greatly assists in separating /ð/ from its close-sounding phoneme.

The extent of separation between /ð/ and /n/ in the four plots of Figures A-2 and A-4 for male speakers, however, is not as clear and consistent as for the females (Figures 3-7 and 3-9). Both Figure A-2a and A-4b show the same kind of separation as seen in the female speakers but with less distinction, whereas Figures A-2b and A-4a have overlapping /ð/ and /n/ points. Despite the lack of more convincing separation in the male speakers, the clusters of /ð/ and /n/ points are located in similar positions relative to each other. This similarity is expected because speakers of both genders articulate /ð/ and /n/ by shaping their vocal tracts in similar ways, thus resulting in the same relationship in the values of F2 and DeltaF2 between /ð/ and /n/. The lack of more convincing separation for the male speakers is most likely due to the limited number of utterances that were available for this study. Had more speakers of both genders been asked to repeat each utterance more times, the pattern of relatively lower F2 and higher DeltaF2 of /ð/ would probably be more apparent, and the separation between the two consonants would be more distinct.

Figure 3-4: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of female speakers, regardless of the succeeding vowel.
Notice that it is impossible to draw a line such that /ð/ points would be on one side and /n/ points on the other.

Figure 3-5: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of male speakers, regardless of the succeeding vowel. Notice that it is impossible to draw a line such that /ð/ points would be on one side and /n/ points on the other.

Figure 3-6: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of female speakers, given succeeding vowel /i/.

Figure 3-7: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of individual female speakers, a) F#1, b) F#2, given succeeding vowel /i/. /ð/ and /n/ are more distinctly separated for each individual speaker than for both speakers (Figure 3-6).

Figure 3-8: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of female speakers, given succeeding vowel /o/. Notice that /ð/ and /n/ occupy different ranges of F2 but not of DeltaF2.

Figure 3-9: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of individual female speakers, a) F#1, b) F#2, given succeeding vowel /o/. /ð/ and /n/ are more distinctly separated for each individual speaker than for both speakers (Figure 3-8).

III.2 Context of Stop Consonant

III.2.1 Second vs. Third Formant Frequencies (F2 vs. F3)

Data on F2 and F3 similar to those in Table 3-1 were collected for phrases that compare /ð/ and /d/. As in the case of the nasal context, F2 of /ð/ is consistently lower than that of /d/, whereas F3 of /ð/ and /d/ does not show a consistent pattern. Therefore, F2 is further analyzed.
III.2.2 The Second Formant Frequency at the Onset of the Succeeding Vowel (F2)

The second formant frequency of /ð/ and /d/ is further analyzed by taking the average of F2 for all utterances produced by the female and male speakers separately. The results are summarized in Table 3-4. Note that for /ð/ and /d/ only the succeeding vowel /o/ is examined.

             F2 of Female Speakers (Hz)   F2 of Male Speakers (Hz)
/ð/          1695                         1367
/d/          2079                         1650

Table 3-4: Average F2 values for /ð/ vs. /d/ in stop context, listed by gender. Notice that F2 is higher in /d/ than in /ð/.

Again, F2 varies significantly between the genders, on the order of about 300 Hz. F2 of /ð/ is lower than that of /d/, because like /n/, /d/ also has less constriction in the back cavity than /ð/ (see Figure 3-1). The average and standard deviation of F2 of female and male speakers are shown in Figures 3-10 and 3-11. As in Figure 3-2, the height of each bar represents the average F2, whereas the extension above and below the bar represents one standard deviation away from the average.

Figure 3-10: The average and standard deviations of F2 of /ð/ vs. /d/ in stop context of female speakers, given the succeeding vowel /o/. Notice that the F2 of /ð/ is significantly lower than that of /d/.

Figure 3-11: The average and standard deviations of F2 of /ð/ vs. /d/ in stop context of male speakers, given the succeeding vowel /o/. Male speakers show a similar trend to that of female speakers (Figure 3-10).

For female speakers, the ranges of F2 of /ð/ and /d/ are relatively distinct (Figure 3-10). Most of the F2 values of /ð/ fall between 1660-1729 Hz, as opposed to 2012-2146 Hz for /d/. For male speakers, F2 of /ð/ and /d/ are relatively distinct too (Figure 3-11).
Most F2 values of male speakers are in the range of 1329-1465 Hz and 1552-1747 Hz, for /ð/ and /d/ respectively. Because the ranges do not overlap, as in the case of /ð/ and /n/ in the nasal context, it is possible to set up a cutoff of F2 such that /ð/ and /d/ would be correctly identified by their lower and higher F2 values, respectively.

III.2.3 Amplitude Difference in the Burst (Amp(High-Mid))

As described in Chapter II, the energy distribution in the burst spectrum can potentially be used to distinguish intended /ð/ and /d/; the amplitude differences in the burst of /ð/ and /d/ are tabulated in Table 3-5.

                        Amplitude Difference (Amp(High-Mid)) (dB)
Speaker   Consonant     Enunciation #1   Enunciation #2   Enunciation #3   Average of the Enunciations
F#1       /ð/           N/a              -11.1            -14.5            -12.8
F#1       /d/           N/a              6.7              5.8              6.25
F#2       /ð/           -18.1            -12.5            -8.1             -12.9
F#2       /d/           17.3             14               6.3              12.6
M#1       /ð/           -6.8             -8.1             -3.5             -6.1
M#1       /d/           -7.8             -1.2             -4.5             -4.5
M#2       /ð/           1.3              -1.3             -6.2             -2.1
M#2       /d/           1                5.5              10.4             5.6

Table 3-5: Amplitude difference between the high and mid frequency ranges of /ð/ vs. /d/ in stop context, listed by speaker and enunciation. Notice that the amplitude difference of /ð/ is usually negative and is less than that of /d/ in almost all cases.

The negative amplitude difference that is observed in most enunciations of /ð/ is reasonable and expected because /ð/ has more energy in the lower than in the higher frequency range [6]. The stop consonant /d/ is expected to have a positive amplitude difference, because alveolar stop consonants are known to have a significant amount of energy in the high frequency range [6]. Such positive amplitude differences are consistently observed in all speakers except male speaker #1 (M#1). Although M#1's amplitude differences of /d/ are consistently negative, his average value is not as negative as that of /ð/.
Thus even M#1 shows the expected relative difference between /ð/ and /d/ on average, and his negative values for /d/ are most likely a speaker-specific characteristic of M#1 and not the result of experimental error. If more enunciations were recorded, M#1 would likely continue to show a negative amplitude difference for /d/, but M#1 and other speakers are expected to have a more negative amplitude difference for /ð/ than for /d/.

III.2.4 Burst Duration

The durations of the bursts of /d/ and /ð/ are measured from the original sound waves, and the results are tabulated in Table 3-6.

              Burst Duration of Individual Speaker (ms)
Consonant     F#1      F#2      M#1      M#2
/ð/           15.5     11.8     16       16.1
/d/           13.1     21.4     10.3     16.3

Table 3-6: Burst duration of /ð/ vs. /d/ in stop context, listed by each speaker. Notice that no consistent difference is seen between the burst durations of /ð/ and /d/ across all speakers.

The durations of /ð/ and /d/ are similar for some speakers (F#1 and M#2) in both genders. For speaker F#2, however, the duration of /d/ is much longer than that of /ð/, whereas the opposite is true for speaker M#1. This inconsistency suggests that burst duration is not a useful characteristic for distinguishing the two consonants. The inconsistent duration is most likely due to speakers' differences in articulating the consonants. Some speakers pronounce /d/ more deliberately than /ð/, leading to a longer burst duration for /d/, whereas other speakers pronounce /ð/ more deliberately than, or similarly to, /d/, leading to a shorter or similar burst duration for /d/, respectively.

III.2.5 F2 and Amp(High-Mid)

The average F2 and average amplitude difference of each gender are listed in Table 3-7.

Parameter              Consonant   Female Speakers   Male Speakers
F2 (Hz)                /ð/         1690              1337
F2 (Hz)                /d/         2090              1650
Amp(High-Mid) (dB)     /ð/         -12.9             -4.1
Amp(High-Mid) (dB)     /d/         3.4               0.55

Table 3-7: Average F2 and amplitude difference between high and mid frequency ranges of /ð/ vs. /d/ in stop context, listed by gender.
Notice that F2 is substantially lower in /ð/ than in /d/, and Amp(High-Mid) is negative for /ð/ but positive for /d/.

Since F2 and the amplitude difference show consistent differences between /ð/ and /d/, they are graphed as x- and y-coordinates, i.e. (F2, Amp(High-Mid)), in Figures 3-12 to 3-15.

Figure 3-12: F2 and amplitude difference of /ð/ vs. /d/ in stop context of female speakers. Notice that /ð/ and /d/ are clearly separated by both F2 and amplitude difference.

Figure 3-13: F2 and amplitude difference of /ð/ vs. /d/ in stop context of male speakers. Unlike in the similar plot for female speakers (Figure 3-12), the regions of /ð/ and /d/ overlap slightly for the male speakers.

Figure 3-14: F2 and amplitude difference of /ð/ vs. /d/ in stop context of individual female speakers a) F#1, b) F#2. Notice that the regions of /ð/ and /d/ are clearly and consistently separated in both speakers.

Figure 3-15: F2 and amplitude difference of /ð/ vs. /d/ in stop context of individual male speakers a) M#1, b) M#2. /ð/ and /d/ are more distinctly separated for each individual speaker than for both speakers together (Figure 3-13).

As was the case for the separation of /ð/ and /n/ in the nasal context, the more information is known about the context and speaker, the more distinct the characteristics of intended utterances of /ð/ and /d/ become. For example, when the two male speakers are plotted together (Figure 3-13), the regions of /ð/ and /d/ overlap. When the speakers are plotted individually (Figure 3-15), however, the regions of /ð/ and /d/ become separable by a positively sloped line, because /ð/ generally has lower F2 and amplitude difference than /d/.
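The effect of speaker information on separability can be illustrated with the gender-average F2 values of Table 3-7. The sketch below is only an illustration, not the procedure used in this study: the label "dh" (standing for /ð/), the function names, and the choice of a midpoint cutoff are all assumptions made for the example. It checks whether a single F2 cutoff separates the two consonants when the genders are pooled and when they are treated separately.

```python
# Gender-average F2 values (Hz) in the stop context, copied from Table 3-7.
# "dh" is a hypothetical label standing for the fricative studied here.
AVERAGE_F2_HZ = {
    "female": {"dh": 1690.0, "d": 2090.0},
    "male": {"dh": 1337.0, "d": 1650.0},
}


def separable_by_cutoff(dh_values, d_values):
    """Return True if one F2 cutoff puts every "dh" value below every /d/ value."""
    return max(dh_values) < min(d_values)


def midpoint_cutoff(dh_values, d_values):
    """Place the cutoff halfway between the highest "dh" and the lowest /d/ value.

    The midpoint rule is an assumption made for this sketch.
    """
    if not separable_by_cutoff(dh_values, d_values):
        raise ValueError("no single cutoff separates the two groups")
    return (max(dh_values) + min(d_values)) / 2.0


def pooled(consonant):
    """Collect one consonant's average F2 across both genders."""
    return [AVERAGE_F2_HZ[gender][consonant] for gender in AVERAGE_F2_HZ]
```

With these averages the pooled check fails, because the male /d/ average (1650 Hz) lies below the female /ð/ average (1690 Hz), whereas each gender taken alone admits a cutoff (1890 Hz for the female averages and 1493.5 Hz for the male averages under the midpoint assumption).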
In this particular study, the two female speakers had similar values of F2 and amplitude difference, and thus graphing them together or individually made little difference in separating the consonants. The similar values are most likely due to the similar acoustic characteristics of the two speakers. Had more utterances from more speakers, both male and female, been studied, the separation of /ð/ and /d/ would most likely follow the same trend as observed in the nasal context: the more information is known about the preceding and succeeding context and the speaker, the more consistently the characteristics of /ð/ differ from those of /d/. Therefore, if more speakers and utterances are surveyed, more accurate ranges of F2 and amplitude difference can be established for each gender population, leading to better cutoff values of F2 and amplitude difference and thus to more accurate identification of intended utterances of /ð/. More importantly, the more is known about the acoustic characteristics of a particular speaker, the more finely the cutoffs for F2 and amplitude difference can be tuned to identify intended /ð/ in the continuous speech of that speaker.

III.3 Context of Vowel

III.3.1 Second vs. Third Formant Frequencies (F2 vs. F3)

As with the previous two contexts, both F2 and F3 were measured initially. Unlike in the other two contexts, however, it is F3 rather than F2 that maintains a consistent difference between /ð/ and /v/. Out of the 66 utterances preceded by a vowel in the database (see Table 2-2), all of the F3 values of /ð/ are greater than those of /v/; F2 was not nearly as consistent between /ð/ and /v/. Therefore, the rest of the study focuses on F3.
III.3.2 Third Formant Frequency at the Onset of the Succeeding Vowel (F3)

Similar to the normalization of vowel and speaker done in the previous two contexts, the average and standard deviation of F3 are graphed first regardless of the succeeding vowel, then for a given succeeding vowel (/i/), and finally for a given succeeding vowel (/i/) and a particular speaker (M#1) (see Figures A-5 to A-7 in Appendix B). A similar pattern is observed in the vowel context: the ranges of F3 of /ð/ and /v/ overlap less and less as more contextual factors are taken into account. Given the same succeeding vowel and speaker, the ranges of /ð/ and /v/ did not overlap at all. Therefore, a cutoff F3 can be specified such that the vast majority of intended /ð/ and /v/ would have F3 above and below the cutoff, respectively.

III.3.3 F3 and DeltaF3

In general, the separation of /ð/ vs. /v/ is not as definite and consistent as that of /ð/ vs. /n/ and /ð/ vs. /d/ in the previous two contexts. For example, in the plots of each speaker for vowel /i/, shown below in Figures 3-17 and 3-18, /ð/ and /v/ of each of the male speakers clearly have different ranges of F3 and DeltaF3 and thus are separable; /ð/ has higher F3 but lower DeltaF3 than /v/. The same parameters of /ð/ and /v/ from the same utterances made by the female speakers, however, overlap significantly (Figure 3-18). Likewise, F3 and DeltaF3 values of /ð/ and /v/ obtained in phrases succeeded by vowel /ɪ/ tend to cluster within the same ranges (Figure A-18 in Appendix B). All speakers' /ð/ seem to have higher F3 than /v/, but their DeltaF3 values overlap significantly in F#1 and M#2 (Figures A-18a and A-18d). The overlap is even more evident in utterances with succeeding vowel /æ/, shown in Figure A-16. For three of the four speakers (F#1, F#2, and M#2), points of /ð/ and /v/ are mixed within the same region.
The average F3 and average DeltaF3 of each speaker, however, do show consistency in that /ð/ has higher average F3 but lower average DeltaF3 than /v/ for all speakers. This difference agrees with the pattern in /ð/ and /v/ of M#1 (Figure A-16c), who is the only speaker whose /ð/ and /v/ show a clear distinction. Therefore, despite a number of exceptions in individual enunciations, the average F3 and DeltaF3 are different enough for /ð/ and /v/ that they are useful parameters for distinguishing the two consonants. Furthermore, the average F3 is higher in /ð/ than in /v/, the same observation made during the primary work discussed in Chapter I. This consistency helps validate the results obtained in this study of the characteristics of /ð/ and /v/ in the context of vowels.

Data from all three contexts have shown that the succeeding vowel substantially influences formant frequencies. Thus, utterances of /ð/ and /v/ succeeded by similar vowels and spoken by the same speaker are expected to have similar acoustic characteristics. Figure 3-19, however, shows that F3 and DeltaF3 of the same consonant in similar contexts by the same speaker can actually be quite different. Figure 3-19 includes F3 and DeltaF3 of two pairs of sentences that are succeeded by the similar vowels /ɛ/ and /ə/. In one of the two pairs, with succeeding vowel /ɛ/, "may then" (sentence #11) and "may vend" (sentence #3) are compared and contrasted. In the other pair, with succeeding vowel /ə/, "to the racial" (sentence #16) and "two veracious" (sentence #5) are compared and contrasted. The difference is especially noticeable for the male speakers in Figure 3-20. Notice that the DeltaF3 of /ð/'s from similar contexts (sentences #11 and #16) differ by almost 900 Hz, while the DeltaF3 of /v/'s from similar contexts differ by around 500 Hz. Likewise, in Figure 3-21, both F3 and DeltaF3 of /v/'s from similar contexts (sentences #3 and #5) differ by around 400 Hz.
At the same time, Figures 3-20 and 3-21, in which the two pairs of sentences are graphed individually, indicate that within the same sentence pair, /ð/ and /v/ show separation similar to and consistent with that seen in the sentences with other vowels. Therefore, aside from the succeeding vowel, there may still be other subtle contextual factors that influence the phonemes of interest, such as /ð/ and /v/. Again, the more contextual information is known and taken into consideration, the more accurate the recognition would be. In this particular case, sentences #3 and #11 both have the consonant /n/ after the succeeding vowel /ɛ/, whereas sentences #5 and #16 have the consonant /r/ after the vowel /ə/. The consonant /r/ is known to strongly influence its preceding vowel. In this case, /ə/ is altered by /r/ and thus shows F3 and DeltaF3 values that are uncharacteristic of a natural /ə/. The goal is to determine whether the sentence contains a /ð/ or a /v/ based on the consonant's effect on the succeeding vowel, but the vowel /ə/ in this case is affected not only by the preceding consonant but by the following /r/ as well. The effect of /r/ most likely caused the decreases in F3 and DeltaF3 observed in the two sets of sentences in Figures 3-20 and 3-21.

In general, F3 is not as reliable for identifying /ð/ in the vowel context as F2 is in the stop and nasal contexts. F3 is impossible to determine in some cases given the tools used. Furthermore, as seen in the plots, F3 has more exceptions to the expected general trend than F2 does in the nasal and stop contexts. For example, DeltaF3 is higher in /ð/ for succeeding vowel /ɪ/, which is the opposite of all the other vowels. Despite some inconsistency in F3, however, the F2 values of /ð/ and /v/ are even less consistent. Thus F3 and DeltaF3 remain the most useful characteristics found thus far for differentiating between /ð/ and /v/. F3 is found to be higher in /ð/ than in /v/, while DeltaF3 is found to be lower in /ð/ than in /v/ in most cases.
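Since DeltaF3 is described only as the change in a formant over a fixed interval of time, one possible way of computing it from a sampled formant track is sketched below. The 20 ms default interval, the sign convention (later value minus onset value), the use of linear interpolation, and all function names are assumptions made for illustration; they are not the measurement procedure of this study.

```python
def interpolate(times_ms, values_hz, t_ms):
    """Linearly interpolate a formant track at time t_ms.

    times_ms must be strictly increasing and the same length as values_hz.
    """
    if not times_ms[0] <= t_ms <= times_ms[-1]:
        raise ValueError("requested time lies outside the formant track")
    for i in range(len(times_ms) - 1):
        t0, t1 = times_ms[i], times_ms[i + 1]
        if t0 <= t_ms <= t1:
            v0, v1 = values_hz[i], values_hz[i + 1]
            return v0 + (v1 - v0) * (t_ms - t0) / (t1 - t0)
    # Only reached for a single-sample track queried at its own time.
    return float(values_hz[0])


def delta_formant(times_ms, values_hz, onset_ms, interval_ms=20.0):
    """Change in a formant over a fixed interval starting at the vowel onset.

    Assumed convention: value at (onset + interval) minus value at onset.
    """
    later = interpolate(times_ms, values_hz, onset_ms + interval_ms)
    at_onset = interpolate(times_ms, values_hz, onset_ms)
    return later - at_onset
```

For example, a hypothetical F3 track that rises by 50 Hz every 10 ms yields a DeltaF3 of 100 Hz over a 20 ms interval under these assumptions.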
Figure 3-16: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) female speakers, b) male speakers, given succeeding vowel /i/.

Figure 3-17: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a male speaker a) M#1, b) M#2, given succeeding vowel /i/. /ð/ and /v/ are more distinctly separated for each individual speaker than for both speakers together (Figure 3-16b).

Figure 3-18: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a female speaker a) F#1, b) F#2, given succeeding vowel /i/. /ð/ and /v/ remain difficult to separate even when the speakers are analyzed individually.

Figure 3-19: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a male speaker a) M#1, b) M#2 in two pairs of sentences succeeded by similar vowels (/ə/ and /ɛ/). Notice the large variations in both F3 and DeltaF3 of /ð/ and /v/.

Figure 3-20: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a male speaker (M#1) a) in sentences #3 and #11 (succeeded by vowel /ɛ/), b) in sentences #5 and #16 (succeeded by vowel /ə/). Notice the significant differences in F3 and DeltaF3 of both /ð/ and /v/ between the two plots.

Figure 3-21: F3 and DeltaF3 of /ð/ vs.
/v/ in vowel context of a male speaker (M#2) a) in sentences #3 and #11 (succeeded by vowel /ɛ/), b) in sentences #5 and #16 (succeeded by vowel /ə/). Notice that similar decreases in F3 and DeltaF3 of both /ð/ and /v/ are seen in graph b) as in Figure 3-20b.

Chapter IV: Conclusion and Future Work

IV.1 Conclusion

Substantial Variations In Formant Frequencies

Much variation in the formant frequencies of /ð/ and its close-sounding consonants is observed, and this variation is mainly caused by three factors. First, gender affects formant frequencies: corresponding frequencies are lower in male speakers than in female speakers by hundreds of hertz. This difference is due to male speakers' longer vocal tracts, which resonate at lower formant frequencies [6]. Second, the succeeding vowel substantially affects formant frequencies: certain vowels generally have higher formant frequencies than other vowels. This difference is caused by differences in the shaping of the vocal tract and in the movements of the articulators when producing different succeeding vowels. Third, within each gender, the acoustic characteristics of individual speakers affect formant frequencies. Such speaker effects on formant frequencies are generally smaller than those of gender and succeeding vowel. Varying speaker-specific acoustic characteristics are mostly due to small variations among speakers in the length of the vocal tract, the width of the palatal vault, the arrangement of the teeth, etc.

Consistent Differences Between /ð/ And Its Close-sounding Phonemes

Despite variations between genders and among speakers, the same utterance repeated by the same speaker generally results in similar formant frequencies. Some general trends that are observed are:

1. In the context of nasal consonant, /ð/ almost always assimilates to the nasalization of a preceding /n/. F2 of /ð/ is almost always less than that of /n/.
F2 alone can distinguish /ð/ from /n/, but the combination of F2 and DeltaF2 separates the two consonants better. /ð/ has lower F2 and higher DeltaF2 than /n/.
2. In the context of stop consonant, /ð/ is almost always produced as a consonant with an abrupt release, rather than as a fricative. F2 of /ð/ is almost always less than that of /d/. The high-frequency amplitude in the burst is lower in /ð/ than in /d/. The difference between the high-frequency amplitude and the mid-frequency amplitude (in dB) is usually negative for /ð/ but positive for /d/. F2 alone can distinguish /ð/ from /d/, but the combination of F2 and amplitude difference distinguishes the consonants more reliably.
3. In the context of vowels, F3 of /ð/ is higher than that of /v/. DeltaF3 of /v/ is higher than that of /ð/ in most cases. /ð/ and /v/ are harder to distinguish than /ð/ and /n/ or /ð/ and /d/.

Algorithm For Identifying /ð/ Harder To Specify

Due to the influences of context and speaker, the identification criteria for parameters such as F2 and F3 are mostly context- and speaker-based. Invariant cues have been harder to specify for a given speaker regardless of context, and even harder for the population of an entire gender. The separation of /ð/ and its close-sounding phonemes in the two-dimensional plots suggests that contextual criteria based on the two parameters of the plots can be set for a given speaker and can identify the intended /ð/ with good accuracy. To set better criteria, studying a larger database of utterances and speakers would help by providing the recognizer with phonemic characteristics that are more representative of the population to compare with measurements made on the speech signals. Gathering additional acoustic measurements for individual speakers, however, would be even more useful for improving speech recognition accuracy.
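The three context-dependent trends listed above could be encoded as a small rule table, as in the following sketch. It is a hypothetical illustration rather than the recognizer envisioned in this study: the label "dh" (standing for /ð/), the parameter names, the "ambiguous" outcome, and the requirement that the caller supply speaker-specific cutoffs are all assumptions.

```python
# Direction of each cue for "dh" (the fricative studied here) relative to its
# counterpart, following the three trends summarized in the list above.
# +1: "dh" tends to have the HIGHER value; -1: "dh" tends to have the LOWER value.
CUES = {
    "nasal": {"counterpart": "n", "cues": {"F2": -1, "DeltaF2": +1}},
    "stop": {"counterpart": "d", "cues": {"F2": -1, "AmpHighMid": -1}},
    "vowel": {"counterpart": "v", "cues": {"F3": +1, "DeltaF3": -1}},
}


def votes_for_dh(context, measurements, cutoffs):
    """Count the cues that fall on the "dh" side of their (hypothetical) cutoff."""
    votes = 0
    for name, direction in CUES[context]["cues"].items():
        if direction * (measurements[name] - cutoffs[name]) > 0:
            votes += 1
    return votes


def identify(context, measurements, cutoffs):
    """Label an utterance only when both cues of the context agree."""
    total = len(CUES[context]["cues"])
    votes = votes_for_dh(context, measurements, cutoffs)
    if votes == total:
        return "dh"
    if votes == 0:
        return CUES[context]["counterpart"]
    return "ambiguous"
```

With hypothetical stop-context cutoffs of 1500 Hz for F2 and 0 dB for the amplitude difference, a token with low F2 and a negative amplitude difference is labeled "dh", while a token with high F2 but a negative amplitude difference (as M#1 tends to produce for /d/) is reported as ambiguous instead of being forced into either class.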
Humans' Natural Speech Recognition May Rely Heavily On Contextual Information

Given that most cues found useful for identifying /ð/ are context-based and that they become more effective with more information about the speaker, it is likely that human ears and brains also rely heavily on contextual information during speech recognition. This conclusion corresponds to the general observations that it is easier to recognize words in context than in isolation and that it is easier to understand familiar speakers than unfamiliar ones. This speculation on humans' heavy reliance on contextual cues in natural speech processing, however, does not exclude the possible existence of invariant cues.

IV.2 Future Work

Study Larger Database

The current database of sentences and speakers shows promising distinctions between the acoustic characteristics of /ð/ and its close-sounding consonants. However, some exceptions are observed, which would most likely become statistically insignificant if a similar analysis were done on a larger database. Even for the utterances that have consistently shown the expected differences between the consonants, a larger database would provide more representative values for the acoustic parameters and help establish more useful cues for distinguishing between /ð/ and its close-sounding phonemes. Thus, studying more speakers, sentences, and repetitions would lead to more accurate identification of intended utterances of /ð/ in continuous speech.

Search For Additional Cues

The more independent parameters are found that show differences between /ð/ and its close-sounding phonemes, the more likely it is that the combination of such parameters would lead to better recognition of /ð/. Additional parameters that can potentially be useful include those in the spectra of /n/ vs. /ð/ in the nasal context and those in the spectra of /ð/ vs. /v/ in the vowel context.
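One way in which several parameters could be combined into a single decision is a linear discriminant, which weights each parameter and compares the weighted sum with a threshold. The sketch below implements Fisher's two-class linear discriminant for two parameters in plain Python. It is only meant to show the mechanics: the function names are hypothetical, and a meaningful fit would need far more samples than this study provides.

```python
def mean(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the scatter matrix of points about mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_line(class0, class1):
    """Return (w, threshold) with w = Sw^-1 (m1 - m0).

    A point p is assigned to class1 when score(w, p) > threshold, where the
    threshold is the score of the midpoint between the two class means.
    """
    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # within-class scatter [[a, b], [b, c]]
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    threshold = w[0] * (m0[0] + m1[0]) / 2.0 + w[1] * (m0[1] + m1[1]) / 2.0
    return w, threshold


def score(w, point):
    """Weighted sum of the two parameters of a point."""
    return w[0] * point[0] + w[1] * point[1]
```

As a usage example, the four gender-average (F2, Amp(High-Mid)) points of Table 3-7 can be passed in as two classes of two points each; they end up on the correct sides of the resulting threshold, although four averaged points are of course no substitute for the larger sample that a real analysis would require.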
Normalize Speakers

Data have consistently shown that acoustic characteristics differ between speakers, and such differences make it difficult for non-speaker-based cues to identify /ð/ correctly. Thus, some calibration of characteristics between speakers would be essential for accurate recognition. Since average F3 is correlated with the length of the vocal tract of a particular speaker, F3 can be measured and averaged over time to normalize the differences in speakers' vocal tracts. The possibility of incorrect recognition due to speaker differences can be further reduced by pre-recording and pre-analyzing each speaker's common vowels, if the situation permits. This pre-stored, speaker-specific information would help the recognizer take speaker-specific characteristics into consideration, and thus interpret the measurements made on the speech signal more accurately.

Develop Recognition Algorithm Using Discriminant Analysis

Given a number of parameters that have different values for /ð/ and its close-sounding phonemes, discriminant analysis can be used to assign coefficients to the various parameters based on their relative usefulness in identifying /ð/. For example, in the 2-D plots of F2 and DeltaF2, discriminant analysis can be used to determine the coefficients of F2 and DeltaF2 that specify the line best separating /ð/ and /n/ in the plot. For this study, discriminant analysis is not practical because only two parameters are found helpful for each context and the sample of utterances is statistically too small for the analysis. But given a larger sample, from which hopefully more useful cues would be found, discriminant analysis can be applied to sets of cues to help optimize the recognition algorithm.

Test On Continuous Speech

Finally, the algorithm should be tested and fine-tuned on utterances of /ð/ in natural sentences and, eventually, in continuous speech.

V. References

[1] Spoken Language Systems, http://www.sls.lcs.mit.edu/sls/whatwedo/applications/jupiter.html, http://www.sls.lcs.mit.edu/sls/whatwedo/architecture.html#SUMMIT, 1998.
[2] Denes, P. B. On the Statistics of Spoken English. The Journal of the Acoustical Society of America, volume 88, number 6, p 894, 897, 1963.
[3] Denes, P. B. and Pinson, E. N. The Speech Chain. New York: W.H. Freeman and Company, 1993.
[4] Manuel, S. Y. Speakers Nasalize /ð/ After /n/, but Listeners Still Hear /ð/. Journal of Phonetics, 23, 453-476, 1995.
[5] Perkell, J. S. and Klatt, D. H., editors. Invariance and Variability in Speech Processes. Hillsdale: Lawrence Erlbaum Associates, 1986.
[6] Stevens, K. N. Acoustic Phonetics. Cambridge: MIT Press, 2000.

VI. Appendices

Appendix A: Sentences Used In The Study

In the context of nasal consonant:
1. Every kid wants to win nose of Rudolf. (Sentence #1)
   Every kid wants to win those orange dolls. (Sentence #17)
2. She tried to win Niece Wendy's toys. (Sentence #9)
   She tried to win these Wendy's toys. (Sentence #20)

In the context of stop consonant:
1. Put those in the refrigerator. (Sentence #18)
   Put dough in the refrigerator. (Sentence #8)

In the context of vowels:
1. Try to guarantee these prices. (Sentence #7)
   Try to guard TV's Prices. (Sentence #12)
2. Point to the racial statements. (Sentence #16)
   Repeat two veracious statements. (Sentence #5)
3. See this era of classical paintings. (Sentence #13)
   See Victorian classical paintings. (Sentence #15)
4. Do not dye that indigenous curtain. (Sentence #14)
   Put the dye vat in Disney's corner. (Sentence #6)
5. They are chosen via those. (Sentence #10)
   They are chosen via votes. (Sentence #4)
6. She may vend Finny's Kiosk. (Sentence #3)
   She may then find the kiosk. (Sentence #11)

Appendix B: Additional Results

Appendix B.1 Context of Nasal Consonant

Succeeding vowel /i/, male speakers.

Figure A-1: F2 and DeltaF2 of /ð/ vs.
/n/ in nasal context of male speakers, given succeeding vowel /i/. Notice that the regions of /ð/ and /n/ overlap.

Figure A-2: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of individual male speakers a) M#1, b) M#2, given succeeding vowel /i/. Notice that /ð/ and /n/ are less separable for these male speakers than they are for the female speakers (Figure 3-7).

Succeeding vowel /o/, male speakers.

Figure A-3: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of male speakers, given succeeding vowel /o/.

Figure A-4: F2 and DeltaF2 of /ð/ vs. /n/ in nasal context of individual male speakers a) M#1, b) M#2, given succeeding vowel /o/. Notice that /ð/ and /n/ become more separable (mostly by F2) for each individual speaker than for both speakers together (Figure A-3), but the separation is not as distinct as in similar plots for the female speakers (Figure 3-9).

Appendix B.2 Context of Vowel

Figure A-5: The average and standard deviation of F3 of /ð/ vs. /v/ in vowel context of a) female speakers, b) male speakers, regardless of the succeeding vowel.

Figure A-6: The average and standard deviation of F3 of /ð/ vs. /v/ in vowel context of a) female speakers, b) male speakers, given succeeding vowel /i/.

Figure A-7: The average and standard deviation of F3 of /ð/ vs. /v/ in vowel context, given succeeding vowel /i/, of an individual male speaker (M#1). Notice that the overlap is significantly smaller for a given speaker than for both male speakers together (Figure A-6b).
/ð/ vs. /v/, given succeeding vowels /ɛ/ and /ə/ (two pairs of sentences: #3 and #11, and #5 and #16)

Figure A-8: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) male, b) female speakers in two pairs of sentences succeeded by similar vowels (/ə/ and /ɛ/). Notice the wide ranges of F3 and DeltaF3 for both /ð/ and /v/.

Figure A-9: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a female speaker a) F#1, b) F#2 in two pairs of sentences succeeded by similar vowels (/ə/ and /ɛ/). Notice the large variations in both F3 and DeltaF3 of /ð/ and /v/.

Figure A-10: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a female speaker (F#1) a) in sentences #3 and #11 (succeeded by vowel /ɛ/), b) in sentences #5 and #16 (succeeded by vowel /ə/).

Figure A-11: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a female speaker (F#2) a) in sentences #3 and #11 (succeeded by vowel /ɛ/), b) in sentences #5 and #16 (succeeded by vowel /ə/). Notice the differences in F3 and especially in DeltaF3 between the two plots, similar to those seen in the other speakers (Figures A-10, 3-20, and 3-21).

/ð/ vs. /v/, given succeeding vowel /o/ (sentences #10 and #4)

Figure A-12: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) female speakers, b) male speakers, given succeeding vowel /o/.

Figure A-13: F3 and DeltaF3 of /ð/ vs.
/v/ in vowel context of individual female speakers a) F#1, b) F#2, given succeeding vowel /o/. Notice that /ð/ and /v/ are mixed in the same region.

Figure A-14: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of individual male speakers a) M#1, b) M#2, given succeeding vowel /o/. Notice that /ð/ has higher F3 and lower DeltaF3 than /v/; the same general pattern is seen in Figure A-13, despite its exceptions in individual enunciations.

/ð/ vs. /v/, given succeeding vowel /æ/ (sentences #14 and #6)

Figure A-15: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) female, b) male speakers, given succeeding vowel /æ/.

Figure A-16: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of individual speakers a) F#1, b) F#2, c) M#1, and d) M#2, given succeeding vowel /æ/. Notice that only c) shows a clear distinction between /ð/ and /v/. The average values of /ð/ and /v/ in each plot, however, consistently show that /ð/ has higher F3 and lower DeltaF3 than /v/.

/ð/ vs. /v/, given succeeding vowel /ɪ/ (sentences #13 and #15)

Figure A-17: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of a) female, b) male speakers, given succeeding vowel /ɪ/. Notice that /ð/ tends to have higher F3 than /v/, but their DeltaF3 values tend to be in the same range.
Figure A-18: F3 and DeltaF3 of /ð/ vs. /v/ in vowel context of individual speakers a) F#1, b) F#2, c) M#1, d) M#2, given succeeding vowel /ɪ/. Neither parameter is very effective at separating /ð/ from /v/. In general, F3 is higher in /ð/. The average DeltaF3 of /ð/ seems slightly higher than that of /v/, which is an exception to the trend of DeltaF3 observed for the rest of the vowels.
