The Perception of Speech

Speech
•Speech is for rapid communication
•Speech is composed of units of sound called phonemes
–examples of phonemes: /b/ in bat, /p/ in pat

Acoustic Properties of Speech
•Speech can be characterized by a spectrogram
•Spectrogram reveals differences between phonemes

Perceiving Speech
•So perceiving (interpreting) speech sounds is simply a matter of matching the spectrotemporal properties (the shape of the spectrogram) of the incoming sound waves to the appropriate phoneme, right?
•Then specific phonemes must correspond to specific spectrograms, a property called acoustic-phonetic invariance
•Acoustic-phonetic invariance says that each phoneme should match one and only one pattern in the spectrogram
–This is not the case! For example, /d/ produces a different pattern depending on the vowel that follows it
•Clearly perception and understanding of speech sounds is more elaborate than simply interpreting an internal spectrogram
•The phrase “Peter buttered the burnt toast” has five /t/ phonemes, but there are not five identical sweeps in the spectrogram
•Segmentation is the perception of silence between words
–Often illusory: the phrase “I owe you a Yo-Yo” has no silence in it!
•So how do you perceive speech? Some of the “strategies”:
1. reduce the data
2. use context clues
3. use vision

Categorical Perception
•Categorical Perception is a phenomenon in which the brain assigns a stimulus to one category or another but never to an intermediate category
•For example, /ba/ and /pa/ differ in their voice onset time
–/ba/ is formed by stopping the flow of air from the lungs and then releasing it; voicing begins about 10 milliseconds after the release (this delay is called voice onset time)
–/pa/ is similar except that voice onset time is about 50 ms
•Voice onset time can range from zero to >50 ms. For example, you could synthesize a sound with a voice onset time of 30 ms, but listeners will hear either /ba/ or /pa/, never something in between
•Another example: [figure]

Categorical Perception is Part of Learning a Language
•Babies can discriminate /ba/ from /pa/ and can discriminate these from sounds with intermediate voice onset times!
•By 10 to 12 months, babies (learning English) stop discriminating intermediate voice onset times
•Once category boundaries are learned it is impossible to unlearn them
–non-native speakers of any language often cannot hear certain phonemes the way native speakers do
–as a consequence they will always have at least some slight accent

Perception (of all types) Makes Use of Context
•The stream of information contained in speech is usually ambiguous and incomplete
•Your brain makes a “best guess” based on the circumstances
•Consider the following example, in which a cough replaces a speech sound:
–“… shoe”
–“The __eel fell off the car”
•Listeners report hearing the “appropriate” phoneme during the cough

Much of Speech Perception isn’t Auditory!
•Why rely on only one sensory system when there is information in two!?
•The brain seamlessly integrates any information it is given; this is called cross-modal integration

Cross-modal Integration
•Speech perception involves the synthesis of vision and hearing
•The McGurk effect demonstrates the critical role of vision in speech perception: visual and auditory information are combined to enhance speech perception

Next Time: Vision
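Supplementary sketch: categorical perception of voice onset time

The categorical perception slides say that a continuous range of voice onset times is heard as only two categories, /ba/ or /pa/. A minimal way to picture this is a rule that snaps every stimulus to the nearest prototype. The prototype values (10 ms and 50 ms) come from the slides; the nearest-prototype rule, the function name `perceive`, and the tie-breaking at the midpoint are assumptions made for illustration only, not a model of how the brain actually draws the boundary.

```python
# Prototype voice onset times (VOT) in milliseconds, taken from the slides.
PROTOTYPE_VOT_MS = {"/ba/": 10.0, "/pa/": 50.0}


def perceive(vot_ms: float) -> str:
    """Return the category whose prototype VOT is closest to the stimulus.

    Assumption of this sketch: a stimulus exactly halfway between the two
    prototypes goes to /ba/, simply because /ba/ is listed first. Real
    listeners place the boundary wherever their language has taught them to.
    """
    return min(PROTOTYPE_VOT_MS, key=lambda label: abs(PROTOTYPE_VOT_MS[label] - vot_ms))


if __name__ == "__main__":
    # Sweep the continuum: the input changes smoothly, the percept does not.
    for vot in range(0, 61, 10):
        print(f"{vot:2d} ms -> {perceive(vot)}")
```

However finely the voice onset time is varied, the output is always one of the two labels and never an in-between sound, which is the defining feature of categorical perception.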