Lecture 20: Speech Perception

What we know

Vowel Perception

Vowel perception is dependent on being able to hear formants. What are formants? They are concentrations of energy in frequency bands resulting from resonance in the vocal tract. "It has been found that, in general, the first two or three formants (F1, F2 and F3) are sufficient for the perceptual identification and differentiation of vowels." Let's look at this figure.

Diphthong Perception

What is a diphthong? It is a combination of two vowels. There is a distinct place where one vowel goes into transition toward the other, and this transition can be seen over time in a spectrogram as a shift in the formants. The formant transitions seen in a diphthong represent changes in resonance that occur as the tongue moves from one position to the next. So it is these formant transitions that give us the acoustic cues allowing us to perceive the diphthong. Look at Figure 8.24.

Consonant Perception

What does the book say about their perception? The process is more complex than for vowels because the various consonants are dependent on the vowels for their correct perception. To say that another way, a consonant by itself carries little of the acoustic information necessary for its perception. It needs to be colored by a vowel to have meaning. If we record stop consonants and then splice out the vowel, the result is a sound that is not perceived as a stop. So "stop consonant perception is dependent on rapidly changing formant transitions from the consonant to the vowel in a consonant-vowel (CV) context." The key word here is context. Consonant perception is highly dependent on the vowel context in which the consonant is uttered. The perception of a consonant is dependent on the formant transition and the resulting acoustic signal. However, if the acoustic signal resulting from only the formant transition is presented to a listener, they will not hear just the isolated consonant.
If we erase the vowel portion of a recorded CV, the perception of the consonant sound also goes away, and what remains has no meaning and does not sound like a speech sound. To say this another way, even when the acoustic information for a consonant sound is preserved, once the vowel is removed the consonant is no longer heard as a consonant sound. We also know that consonant perception remains the same from one CV context to another even though the acoustic information from the transitions changes.

What is a Suprasegmental?

The definition is on page 325. Read the section starting on page 219. Why are suprasegmentals important to perception? The book says they are prosodic features. They are elements of speech that occur at the same time as the individual phonemes and can be reflected in changes to the acoustic signal resulting from the speech sound production. As the book says, "… they are not confined to phonetic segments and, instead, are overlaid or superimposed on syllables, words, phrases and sentences." In a real sense, they are flavoring or coloring or adding meaning. All of this is achieved through intonation, stress and timing. All three can change the meaning of an utterance, so they are a critical component of speech perception.

How we, as listeners, perceive suprasegmentals is related to the acoustic changes in the waveform of the utterance. Those changes may be in the form of reduced intensity, increased intensity, changes in the fundamental frequency, or temporal (timing) changes.

So what is intonation? It is the perception we get from alterations in the fundamental frequency during speech production. Recall that in perception we don't talk about frequency but rather pitch, so what we perceive is pitch, or changes in pitch, during speech. We can use intonation for several purposes, including:
Differences in meaning?
Making a statement a question?
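To make the link between fundamental frequency and intonation concrete, here is a minimal Python sketch. It is an illustration only, not a method from the book: the sample rate, the pure tones standing in for voiced speech, the 75 to 400 Hz search range, and the names make_tone, estimate_f0 and f0_contour are all assumptions made for this example. It estimates the fundamental frequency (F0) of each short frame with a simple autocorrelation peak search, then compares a falling contour (statement-like) with a rising one (question-like).

```python
import math

SAMPLE_RATE = 8000  # samples per second; assumed value for this sketch


def make_tone(freq_hz, duration_s=0.04):
    """Synthesize a pure tone standing in for one voiced frame of speech."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def estimate_f0(frame, fmin=75.0, fmax=400.0):
    """Estimate F0 (Hz) from the lag with the largest autocorrelation."""
    min_lag = int(SAMPLE_RATE / fmax)  # shortest period considered
    max_lag = int(SAMPLE_RATE / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag


def f0_contour(frequencies_hz):
    """Return the estimated F0 of each synthetic frame, in order."""
    return [estimate_f0(make_tone(f)) for f in frequencies_hz]


statement_like = f0_contour([180, 150, 120])  # F0 falls toward the end
question_like = f0_contour([120, 150, 180])   # F0 rises toward the end

for label, contour in (("statement-like", statement_like),
                       ("question-like", question_like)):
    direction = "rising" if contour[-1] > contour[0] else "falling"
    print(label, [round(f) for f in contour], direction)
```

Real speech is far messier than a pure tone, so practical pitch trackers add voicing detection and smoothing. The point here is only that a rising or falling F0 contour is something we can measure in the waveform, and that listeners hear it as a change in pitch.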
Stress relates back to loudness or softness, duration and fundamental frequency. As an example, a string of phonemes can represent a noun or a verb through changes in stress:
OBject (noun)
obJECT (verb)

Finally, quality can be conveyed or perceived through changes in timing cues, that is, changes in duration, either relative or absolute.

Issues in Speech Perception

There are three: invariance, linearity and segmentation. In one sense they are issues, but in another sense they are problems that need to be explained as we try to understand how we perceive speech. They relate to how we, as listeners, recognize spoken utterances from the acoustic information present in the waveform resulting from that utterance.

"The principle of acoustic-phonetic invariance states that corresponding to each phoneme (speech sound) is a distinct set of acoustic features, so that each time a given phoneme is produced, the same acoustic cues are identifiable in the speech signal, regardless of context."

"The linearity principle proposes that in a spoken word, a specific sound corresponds to each phoneme, with units of sound corresponding to phonemes being discrete and ordered in a particular sequence."

"The segmentation principle asserts that the speech signal can be divided (and recombined) into acoustically independent units that correspond to specific phonemes."

What these principles are trying to say is that there appears to be (or should be) a one-to-one correspondence between each phoneme and the acoustic signal present when it is produced. If that were so, then speech perception would simply be a matter of correct pattern recognition.

There is a but, and it is a big but. Over the past fifty years, research has established that in normal conversational speech there is no evidence of invariance, linearity or segmentation. Specifically, research into speech perception has found the following, all of which makes the case for invariance, linearity and segmentation suspect:

1. The acoustic cues in speech outnumber the phonemes in words.
2. The acoustic properties of a given phoneme vary in different ordering contexts (CV, VC, CVC).
3. At a particular point in the speech stream, there is overlapping information present about the acoustic properties of a specific phoneme as well as the phonemes that precede and follow it.
4. Studies that track the articulators have shown that during speech production the shape or configuration of the vocal tract and the oral cavity is influenced by the phonemes that precede as well as those that follow the phoneme being produced.
5. The temporal boundaries between utterances of actual phonemes are not consistent. They may vary by several milliseconds.
6. The phenomenon of coarticulation, which is the simultaneous movement of two articulators, shows that there is a clear lack of segmentation in the speech signal.

We also know that the case for invariance, linearity and segmentation is suspect because there are often large acoustic differences in utterances of the same phoneme by the same person, and even larger differences between individual talkers.

We do know that lower forms of animals communicate with each other and with us, yet speech communication is a characteristic that is, for the most part, limited to mankind. It has been speculated that somehow, through either a divine or an evolutionary process, mankind has come to have some specialized neural mechanism that is unique and suited to the process of speech perception.

Let's skip over now and take a brief look at the Theories of Speech Perception.