Chapter 12

Speech Perception
Animals use sound to communicate
in many ways
Bird calls
Whale calls
Baboon shrieks
Vervet calls
Grasshopper rubbing legs
These kinds of communication differ from
language in the structure of the signals.
Speech perception is a broad
Understanding what is said (linguistic information)
Understanding “paralinguistic information”
Speaker’s identity
 Speaker’s affective state
Speech processing ≠ linguistic processing.
Vocal tract
Includes larynx, throat, tongue, teeth, and lips.
Vocal cords = vocal folds
Male vocal cords are 60% larger than female vocal
cords in humans.
Vocal cord size is not the sole cue to sex
of speaker. Children’s voices can be
Physical disturbances in air ≠
Many different sounds
are lumped together in
every single phoneme.
Another case of
separating the physical
from the psychological.
Humans normally speak at about 12 phonemes per second.
Humans can comprehend speech at up to about 50
phonemes per second.
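To make the two rates concrete, here is a minimal sketch that converts them into the average time available per phoneme. The function name and the assumption that phonemes are evenly spaced in time are illustrative only; real phoneme durations vary widely.

```python
# Rates quoted in the text, in phonemes per second.
NORMAL_RATE = 12
MAX_COMPREHENSIBLE_RATE = 50


def mean_phoneme_duration_ms(phonemes_per_second: float) -> float:
    """Average time per phoneme in milliseconds.

    Assumes evenly spaced phonemes, which is a simplification.
    """
    if phonemes_per_second <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / phonemes_per_second


normal_ms = mean_phoneme_duration_ms(NORMAL_RATE)
fastest_ms = mean_phoneme_duration_ms(MAX_COMPREHENSIBLE_RATE)
speedup = MAX_COMPREHENSIBLE_RATE / NORMAL_RATE

print(f"Normal speech: {normal_ms:.1f} ms per phoneme")
print(f"Fastest comprehensible speech: {fastest_ms:.1f} ms per phoneme")
print(f"Speed-up factor: {speedup:.2f}")
```

Under these assumptions, normal speech leaves roughly 83 ms per phoneme, while the fastest comprehensible speech leaves about 20 ms, a bit more than a fourfold speed-up.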
Voice spectrogram changes with age.
Spectrograms can be taken of all sorts of sounds.
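A spectrogram shows how the energy at each frequency changes over time. The sketch below is one simple way to compute one (a short-time Fourier transform with a Hann window) using only the Python standard library. The sample rate, frame size, hop size, and the two-tone test signal are hypothetical values chosen for illustration, not parameters from the lecture.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz (hypothetical)
FRAME_SIZE = 64     # samples per analysis frame (hypothetical)
HOP = 32            # samples between frame starts (hypothetical)


def spectrogram(samples, frame_size, hop):
    """Return a list of frames; each frame holds one magnitude per
    frequency bin from 0 Hz up to half the sample rate."""
    # Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Direct discrete Fourier transform for bin k.
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


def tone(freq_hz, n_samples):
    """Pure sine tone at the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


def peak_frequency_hz(magnitudes):
    """Frequency of the strongest bin in one frame."""
    k = max(range(len(magnitudes)), key=lambda i: magnitudes[i])
    return k * SAMPLE_RATE / FRAME_SIZE


# Synthetic "sound": a 500 Hz tone followed by a 1500 Hz tone.
signal = tone(500, 400) + tone(1500, 400)
frames = spectrogram(signal, FRAME_SIZE, HOP)

first_peak = peak_frequency_hz(frames[0])
last_peak = peak_frequency_hz(frames[-1])
print(len(frames), first_peak, last_peak)
```

In this toy signal the strongest frequency moves from the 500 Hz bin in the first frame to the 1500 Hz bin in the last. Real speech spectrograms show several such energy bands shifting at once.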
Neural analysis of speech sounds
One phoneme can have distinct sound spectrograms.
Distinct sound spectrograms can be metamers for a
Brain mechanisms of speech
Single-cell recordings in monkeys show neurons that are
sensitive to:
1. Time elapsed between lip movements and the start
of sound production
2. Acoustic context of sound
3. Rate of sound frequency changes
Human studies
Human studies have been based on neuroimaging
(fMRI and PET).
A1 is not a linguistic center; merely an auditory center.
It does not respond preferentially to speech, rather than
Speech processing is a grab bag of kinds of processing,
e.g. linguistic, emotional, and speaker identity.
Wernicke’s aphasia
Subjects can hear sounds.
Subjects lose ability to comprehend speech,
though they can produce (clearly disturbed)
speech themselves.
Other brain regions involved in
speech processing
Right temporal hemisphere is involved in
emotion, speaker sex, and identity.
Right temporal hemisphere is less involved in
linguistic analysis.
Right prefrontal cortex and parts of the limbic
system respond to emotion.
Both hemispheres active in human vocalizations,
such as laughing or humming.
Some motor areas for speech are active during
speech perception.
A “what” and “where” pathway in
speech processing?
One pathway is anterior (forward) and ventral (below).
The other pathway is posterior (backward) and
dorsal (above).
Not clear what these pathways do.
Understanding speech: Aftereffects
Tilt aftereffect and motion aftereffect are due to “fatigue” of
specific neurons.
Eimas & Corbit (1973) performed a linguistic version.
Take ambiguous phonemes, e.g. between /t/ and /d/.
Listen to /d/ over and over, then the ambiguity disappears.
Understanding speech:
Context effects
In vision, surrounding objects affect interpretation of size,
color, brightness. In other words, context influences perception.
In speech, context influences perception. We noted this
earlier with /di/ and /du/.
Semantic context can influence perception.
Speed of utterance influences phonetic interpretation.
Examples of song lyrics.
A syllable may sound like /ba/ when preceding words are
spoken slowly, but like /pa/ when preceding words are
spoken quickly.
Cadence of a sentence can influence interpretation of
the last word (Ladefoged & Broadbent, 1957).
Understanding speech:
visual effects
McGurk Effect
Movies of speakers influence syllables heard.
Vocal /ba/ + lip /ga/ = /da/
Vocal “tough” + lip “hole” = “towel”.
McGurk effect reduced with face inversion
Emotions of talking heads
Movie of facial emotion + voice with an
When face and voice agree, most subjects
correctly identify the emotion.
When face and voice conflict, facial expression
provided the emotion.
McGurk effect + talking heads effect make
sense, since they enable humans to function more
reliably in noisy environments.
Infants 18-20 weeks old can match voice and
Humans can match movies of speakers with
voices of speakers.
Monkeys and preferential looking
Ghazanfar & Logothetis (2003).
Showed monkeys two silent movies of monkeys
vocalizing at the same time.
Played a vocalization that matched one of the
silent movies.
All 20 monkeys looked at the monkey face that
matched the sound.
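To see how unlikely this result would be by chance, here is a rough calculation. It assumes, purely for illustration, that a monkey looking at random would pick each of the two faces with probability 1/2 and that the monkeys behave independently. This is not a description of the statistical analysis used in the study.

```python
from fractions import Fraction

N_MONKEYS = 20                       # number of monkeys, from the text
CHANCE_PER_MONKEY = Fraction(1, 2)   # assumption: random choice between two faces

# Probability that every monkey picks the matching face by chance alone,
# assuming independent choices.
p_all_match = CHANCE_PER_MONKEY ** N_MONKEYS

print(f"Exact probability: {p_all_match}")
print(f"Approximate probability: {float(p_all_match):.2e}")
```

Under these assumptions the probability is 1 in 1,048,576, so random looking is a very poor explanation of the result.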
More neuroimaging of speech
Subjects watched faces of silent speakers.
MT (aka V5) was active for motion processing.
A1 and additional language centers were also active.
Perceived sound boundaries in words are illusory.
Pauses indicate times at which to switch speakers.
Disfluency: repetitions, false starts, and useless
Help by parsing the sentence, giving the subject time to process, and
hinting at new information.
Language-based learning impairment: A
specifically linguistic, rather than acoustic