Chapter 12: Speech Perception

Animals use sound to communicate in many ways:
- Bird calls
- Whale calls
- Baboon shrieks
- Vervet calls
- Grasshoppers rubbing their legs
These kinds of communication differ from language in the structure of the signals.

Speech perception is a broad category:
- Understanding what is said (linguistic information)
- Understanding "paralinguistic" information: the speaker's identity and the speaker's affective state
- Speech processing ≠ linguistic processing

Vocal tract
- Includes the larynx, throat, tongue, teeth, and lips.
- Vocal cords = vocal folds.
- Male vocal cords are about 60% larger than female vocal cords in humans.
- The size of the vocal cords is not the sole cue to a speaker's sex; children's voices can also be discriminated.

Physical disturbances in the air ≠ phonemes
- Many different sounds are lumped together into a single phoneme: another case of separating the physical from the psychological.
- Humans normally speak at about 12 phonemes per second but can comprehend speech at up to about 50 phonemes per second.

Spectrograms
- A voice spectrogram changes with age.
- Spectrograms can be taken of all sorts of sounds.

Neural analysis of speech sounds
- One phoneme can have distinct sound spectrograms.
- Distinct sound spectrograms can be metamers for a phoneme.

Brain mechanisms of speech perception
http://www.molbio.princeton.edu/courses/mb427/2000/projects/0008/messedupbrainmain.html
- Single-cell recordings in monkeys show neurons sensitive to:
  1. The time elapsing between lip movements and the start of sound production
  2. The acoustic context of a sound
  3. The rate of sound frequency changes
- Human studies have been based on neuroimaging (fMRI and PET).
- A1 is not a linguistic center, merely an auditory center; it does not respond preferentially to speech over other sounds.
- Speech processing is a grab bag of kinds of processing, e.g. linguistic, emotional, and speaker identity.

Wernicke's aphasia
- Subjects can hear sounds but lose the ability to comprehend speech, though they can still produce (clearly disturbed) speech themselves.

Other brain regions involved in speech processing
- The right temporal lobe is involved in emotion, speaker sex, and speaker identity (phonagnosia: loss of the ability to recognize familiar voices).
- The right temporal lobe is less involved in linguistic analysis.
- Right prefrontal cortex and parts of the limbic system respond to emotion.
- Both hemispheres are active for human vocalizations such as laughing or humming.
- Some motor areas for speech are active during speech perception.

A "what" and "where" pathway in speech processing?
- One pathway is anterior (forward) and ventral (below); the other is posterior (backward) and dorsal (above).
- It is not yet clear what these pathways do.

Understanding speech: Aftereffects
- The tilt aftereffect and motion aftereffect are due to "fatigue" (adaptation) of specific neurons.
- Eimas & Corbit (1973) performed a linguistic version: take an ambiguous phoneme, e.g. between /t/ and /d/, and listen to /d/ over and over; the ambiguity disappears, and the token tends to be heard as the unadapted category.

Understanding speech: Context effects
- In vision, surrounding objects affect the interpretation of size, color, and brightness; in other words, context influences perception.
- In speech, context influences perception as well; we noted this earlier with /di/ and /du/.
- Semantic context can influence perception (examples: song lyrics).
- Speed of utterance influences phonetic interpretation: a syllable may sound like /ba/ when the preceding words are spoken slowly, but like /pa/ when they are spoken quickly (see the sketch below).
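The rate effect just described can be illustrated with a small toy model. This is a sketch for illustration only, not a model from the lecture: it assumes the /ba/ vs /pa/ distinction is carried by voice onset time (VOT) and that the category boundary shifts with the rate of the preceding speech; the specific numbers are invented.

    # Toy sketch (illustration only): a rate-dependent category boundary
    # for the /ba/ vs /pa/ distinction. Assumes the cue is voice onset
    # time (VOT) and that the boundary shifts with the speaking rate of
    # the preceding context. All numbers are made up.

    def classify_stop(vot_ms, context_rate_syll_per_s):
        """Return '/ba/' or '/pa/' for a syllable with the given VOT (ms),
        heard after context speech at the given rate (syllables/second)."""
        baseline_rate = 4.0        # assumed "normal" speaking rate
        baseline_boundary = 25.0   # assumed VOT boundary (ms) at that rate
        shift_per_syll = 2.0       # assumed boundary shift (ms) per syll/s

        # Faster context speech -> listeners expect shorter VOTs, so the
        # boundary moves down and an intermediate VOT is heard as /pa/.
        boundary = baseline_boundary - shift_per_syll * (context_rate_syll_per_s - baseline_rate)
        return "/pa/" if vot_ms >= boundary else "/ba/"

    # The same 22 ms VOT flips category with the context rate:
    print(classify_stop(22.0, context_rate_syll_per_s=3.0))  # slow context -> /ba/
    print(classify_stop(22.0, context_rate_syll_per_s=6.0))  # fast context -> /pa/

The point is only that the same physical token can map onto different phonemes depending on context, which is exactly what the /ba/ vs /pa/ example above describes.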
The cadence of a sentence can also influence the interpretation of its last word (Ladefoged & Broadbent, 1957).

Understanding speech: Visual effects
- McGurk effect: movies of speakers influence the syllables heard.
- Auditory /ba/ + visual (lip) /ga/ is heard as /da/.
- Auditory "tought" + visual "hole" is heard as "towel".
- The McGurk effect is reduced with face inversion.

Emotions of talking heads
- Pair a movie of a facial emotion with a voice carrying an emotion.
- When face and voice agree, most subjects correctly identify the emotion.
- When face and voice conflict, the facial expression provides the perceived emotion.
- The McGurk effect and the talking-heads effect make sense, since combining face and voice lets humans function more reliably in noisy environments (a toy cue-combination sketch appears at the end of this section).
- Infants 18-20 weeks old can match voices to faces.
- Humans can match movies of speakers with the voices of those speakers.

Monkeys and preferential looking
- Ghazanfar & Logothetis (2003) showed monkeys two silent movies of vocalizing monkeys at the same time and played a vocalization that matched one of the movies.
- All 20 monkeys looked at the monkey face that matched the sound.

More neuroimaging of speech perception
- Subjects watched the faces of silent speakers.
- MT (aka V5) was active, consistent with motion processing.
- A1 and additional language centers were also active.

Perceived sound boundaries in words are illusory; "mondegreens" are misheard words or song lyrics.
Pauses indicate times at which to switch speakers.
Disfluencies (repetitions, false starts, and filler interjections) help by parsing the sentence, giving the listener time to process, and hinting at new information.
Language-based learning impairment: a specifically linguistic, rather than acoustic, impairment.

Fun illusion (nothing to do with class): http://www.ritsumei.ac.jp/~akitaoka/indexe.html
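The audiovisual effects above (McGurk fusion, and faces dominating a conflicting vocal emotion) are often described as the listener weighting two sources of evidence. Below is a minimal sketch of that idea with invented likelihoods and weights; it is an illustration, not a model from the lecture or the cited papers.

    import math

    # Toy sketch: combine auditory and visual evidence about a syllable as
    # a weighted sum of log-likelihoods. With /ba/-like audio and /ga/-like
    # lips, an intermediate category such as /da/ can win (a McGurk-style
    # fusion). All probabilities and weights are invented for illustration.

    CATEGORIES = ["/ba/", "/da/", "/ga/"]

    AUDIO_BA = {"/ba/": 0.70, "/da/": 0.25, "/ga/": 0.05}   # audio sounds /ba/-like
    VIDEO_GA = {"/ba/": 0.05, "/da/": 0.35, "/ga/": 0.60}   # lips look /ga/-like

    def fuse(audio, video, audio_weight=1.0, video_weight=1.0):
        """Return the category with the highest weighted combined evidence.
        The weights stand in for cue reliability; e.g. a noisy room would
        lower audio_weight and make vision count for more."""
        scores = {c: audio_weight * math.log(audio[c]) + video_weight * math.log(video[c])
                  for c in CATEGORIES}
        return max(scores, key=scores.get)

    print(fuse(AUDIO_BA, VIDEO_GA))                    # conflicting cues -> /da/
    print(fuse(AUDIO_BA, VIDEO_GA, video_weight=0.0))  # audio alone      -> /ba/

In this framing the McGurk percept and the face-dominates-emotion result reflect the same computation: when cues disagree, the answer is pulled toward whichever source is treated as more reliable, which is also why audiovisual integration helps in noisy environments.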