Chapter 12
Speech Perception
Animals use sound to communicate
in many ways






Bird calls
Whale calls
Baboon shrieks
Vervet calls
Grasshoppers rubbing their legs
These kinds of communication differ from
language in the structure of the signals.
Speech perception is a broad
category


Understanding what is said (linguistic
information)
Understanding “paralinguistic information”
Speaker’s identity
 Speaker’s affective state


Speech processing ≠ linguistic processing.
Vocal tract




Includes larynx, throat, tongue, teeth, and lips.
Vocal cords = vocal folds
Male vocal cords are about 60% larger than female vocal cords in humans.
The size of the vocal cords is not the sole cue to the sex of a speaker: children's voices can also be discriminated.
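A back-of-the-envelope check, not from the chapter: if fundamental frequency scales roughly with the inverse of vocal-fold length (a simplification), folds that are 60% longer predict a speaking pitch near typical adult male values. The female pitch value below is an assumed starting point.

female_f0 = 210.0    # Hz; assumed typical adult female speaking pitch
size_ratio = 1.6     # male folds ~60% larger, per the slide

estimated_male_f0 = female_f0 / size_ratio
print(f"Estimated male F0: {estimated_male_f0:.0f} Hz")   # ~131 Hz, close to typical male values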
Physical disturbances in air ≠
phonemes

Many different sounds
are lumped together into
a single phoneme.

Another case of
separating the physical
from the psychological.
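A minimal sketch of this many-to-one lumping, with made-up numbers: physically different tokens along a voice-onset-time (VOT) continuum all map onto one of two phoneme labels.

def classify_vot(vot_ms, boundary_ms=30.0):
    """Map a continuous voice-onset-time value (ms) to a discrete phoneme label."""
    return "/b/" if vot_ms < boundary_ms else "/p/"

for vot in [5, 15, 25, 35, 45, 60]:        # physically distinct stimuli
    print(vot, "ms ->", classify_vot(vot))  # only two percepts come out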

Humans normally speak at about 12 phonemes per
second.

Humans can comprehend speech at up to about 50
phonemes per second.
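For scale, a rough conversion of those rates into words per minute, assuming about four phonemes per English word (the per-word figure is an assumption, not from the chapter):

phonemes_per_word = 4   # assumed average for English

for label, rate in [("normal speech", 12), ("maximum comprehension", 50)]:
    words_per_minute = rate / phonemes_per_word * 60
    print(f"{label}: {rate} phonemes/s ≈ {words_per_minute:.0f} words/min")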

Voice spectrogram changes with age.

Spectrograms can be taken of all sorts of sounds.
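A short sketch of how a spectrogram is computed, using a synthetic rising tone in place of a recording (any sound works); the scipy parameters here are illustrative choices, not the chapter's.

import numpy as np
from scipy.signal import spectrogram

fs = 16_000                                        # sample rate, Hz
t = np.arange(0, 1.0, 1 / fs)
sound = np.sin(2 * np.pi * (200 + 300 * t) * t)    # rising test tone standing in for speech

freqs, times, power = spectrogram(sound, fs=fs, nperseg=512)
print(power.shape)                                 # rows = frequency bins, columns = time slices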
Neural analysis of speech sounds

One phoneme can correspond to several distinct sound spectrograms.
Distinct sound spectrograms can thus be metamers for a phoneme: physically different, perceptually the same.
Brain mechanisms of speech
perception
http://www.molbio.princeton.edu/courses/mb427/2000/projects/0008/messedupbrainmain.html
Brain mechanisms of speech
perception
Single-cell recordings in monkeys show neurons sensitive to:
1. Time elapsed between lip movements and the start of sound production
2. Acoustic context of sound
3. Rate of sound frequency changes
Human studies

Human studies have been based on neuroimaging
(fMRI and PET).

A1 is not a linguistic center, merely an auditory center: it does not respond preferentially to speech over other sounds.

Speech processing is a grab bag of kinds of processing,
e.g. linguistic, emotional, and speaker identity.
Wernicke’s aphasia


Subjects can hear sounds.
Subjects lose the ability to comprehend speech, though they can still produce (clearly disturbed) speech themselves.
Other brain regions involved in
speech processing

The right temporal lobe is involved in processing emotion, speaker sex, and speaker identity.

Phonagnosia: the inability to recognize familiar voices.

The right temporal lobe is less involved in linguistic analysis.

The right prefrontal cortex and parts of the limbic system respond to emotion.
Other brain regions involved in
speech processing

Both hemispheres are active for human vocalizations such as laughing or humming.

Some motor areas for speech are active during
speech perception.
A “what” and “where” pathway in
speech processing?

One pathway is anterior (forward) and ventral (below).

The other pathway is posterior (backward) and
dorsal (above).

Not clear what these pathways do.
Understanding speech: Aftereffects

Tilt aftereffect and motion aftereffect due to “fatigue” of
specific neurons.

Eimas & Corbit (1973) performed a linguistic version.

Take ambiguous phonemes, e.g. between /t/ and /d/.

Listen to /d/ over and over, and the ambiguity disappears: the ambiguous sound is now heard as /t/ (a toy sketch follows).
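A toy model of that adaptation effect, not Eimas & Corbit's analysis: categorization along a /d/–/t/ continuum as a logistic psychometric function whose boundary shifts toward the /d/ end after adaptation, so the once-ambiguous stimulus is reported as /t/. All numbers are invented.

import math

def p_hear_t(stimulus, boundary, slope=1.5):
    """Probability of reporting /t/ on a 0 (clear /d/) to 10 (clear /t/) continuum."""
    return 1 / (1 + math.exp(-slope * (stimulus - boundary)))

ambiguous = 5.0
print("before adaptation:", round(p_hear_t(ambiguous, boundary=5.0), 2))    # ~0.50, ambiguous
print("after /d/ adaptation:", round(p_hear_t(ambiguous, boundary=4.0), 2))  # > 0.50, heard as /t/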
Understanding speech:
Context effects

In vision, surrounding objects affect interpretation of size,
color, brightness. In other words, context influences
perception.

In speech, context influences perception. We noted this
earlier with /di/ and /du/.
Understanding speech:
Context effects

Semantic context can influence perception (e.g., misheard song lyrics).

Speed of utterance influences phonetic interpretation. A syllable may sound like /ba/ when the preceding words are spoken slowly, but like /pa/ when the preceding words are spoken quickly (a sketch of this rate effect follows below).

Cadence of a sentence can influence interpretation of the last word (Ladefoged & Broadbent, 1957).
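A toy sketch of the rate effect above, with made-up boundary values: the same voice-onset time is judged relative to the speaking rate of the context, so one physical syllable flips between /ba/ and /pa/.

def classify_syllable(vot_ms, syllables_per_sec):
    # Faster context -> shorter expected VOTs, so the /ba/–/pa/ boundary shrinks.
    boundary_ms = 40.0 - 3.0 * syllables_per_sec   # invented linear rule
    return "/pa/" if vot_ms > boundary_ms else "/ba/"

same_vot = 25.0   # ms; physically identical syllable in both contexts
print("slow context:", classify_syllable(same_vot, syllables_per_sec=3))   # /ba/
print("fast context:", classify_syllable(same_vot, syllables_per_sec=6))   # /pa/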
Understanding speech:
visual effects

McGurk Effect
Movies of speakers influence the syllables heard (see the fusion sketch below).
Vocal /ba/ + lip /ga/ = /da/
Vocal "tought" + lip "hole" = "towel".


McGurk effect reduced with face inversion
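A rough sketch of audiovisual fusion in the spirit of the fuzzy logical model of perception, with invented support values (not the chapter's model): each modality gives graded support for each syllable, the supports are multiplied and renormalized, and the winning percept (/da/) differs from both inputs.

audio_support  = {"/ba/": 0.60, "/da/": 0.30, "/ga/": 0.10}   # the soundtrack says /ba/
visual_support = {"/ba/": 0.05, "/da/": 0.40, "/ga/": 0.55}   # the lips say /ga/

fused = {s: audio_support[s] * visual_support[s] for s in audio_support}
total = sum(fused.values())
fused = {s: round(v / total, 2) for s, v in fused.items()}

print(max(fused, key=fused.get), fused)   # /da/ wins, as in the McGurk effect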
Emotions of talking heads



Movie of facial emotion + voice with an
emotion
When face and voice agree, most subjects correctly identify the emotion.
When face and voice conflict, the facial expression tends to determine the perceived emotion.

The McGurk effect and the talking-heads effect make sense, since combining face and voice enables humans to function more reliably in noisy environments.

Infants 18-20 weeks old can match voice and
face.

Humans can match movies of speakers with
voices of speakers.
Monkeys and preferential looking

Ghazanfar & Logothetis (2003).

Showed monkeys two silent movies of monkeys
vocalizing at the same time.
Played a vocalization that matched one of the
silent movies.
All 20 monkeys looked at the monkey face that
matched the sound.


More neuroimaging of speech
perception

Subjects watched faces of silent speakers.

MT (aka V5) was active for motion processing.

A1 and additional language centers were also
active.

Perceived sound boundaries in speech are illusory; the acoustic signal itself is largely continuous.

“Mondegreens”: misheard words or song lyrics.

Pauses indicate times at which to switch speakers.

Disfluencies: repetitions, false starts, and filler interjections.

These can help by parsing the sentence, giving the listener time to process, and hinting at new information.

Language-based learning impairment: a specifically linguistic, rather than acoustic, impairment.
Fun illusion
(nothing to do with class):

http://www.ritsumei.ac.jp/~akitaoka/indexe.html