The Perception of Speech

Speech
•Speech is for rapid communication
•Speech is composed of units of sound called
phonemes
–examples of phonemes: /b/ in bat, /p/ in pat
Acoustic Properties of Speech
•Speech can be characterized by a spectrogram
•Spectrogram reveals differences between phonemes
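To make the idea of a spectrogram concrete, here is a minimal sketch: cut the sound into short overlapping frames and measure how much energy each frame contains at each frequency. Everything specific here is an assumption for illustration, not material from the lecture: the function names, the 8 kHz sample rate, the frame sizes, and the toy signal (a 500 Hz tone followed by a 1500 Hz tone) standing in for real speech.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this toy example)
FRAME_LEN = 128      # samples per analysis frame (assumed)
HOP = 64             # step between frame starts, giving 50% overlap (assumed)


def spectrogram(samples, frame_len, hop):
    """Return one list of DFT magnitudes (bins 0..frame_len//2) per frame."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)))
            for n, s in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Direct (slow) discrete Fourier transform for frequency bin k.
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def peak_frequency(magnitudes, sample_rate, frame_len):
    """Frequency in Hz of the strongest bin in one frame."""
    k = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return k * sample_rate / frame_len


def tone(freq_hz, n_samples):
    """A pure sine tone, used here as a stand-in for a speech sound."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


signal = tone(500, 400) + tone(1500, 400)
spec = spectrogram(signal, FRAME_LEN, HOP)
peaks = [peak_frequency(frame, SAMPLE_RATE, FRAME_LEN) for frame in spec]
print(peaks)
```

Each row of `spec` is one time slice of the spectrogram. In this toy signal the strongest frequency is 500 Hz in the first frame and 1500 Hz in the last; real speech shows several such energy bands changing over time.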
Perceiving Speech
• So perceiving (interpreting) speech sounds
is simply a matter of matching the
spectrotemporal properties (the shape of the
spectrogram) of the incoming sound waves
to the appropriate phoneme
• right?…
• Then specific phonemes must correspond to
specific spectrograms - a property called
acoustic-phonetic invariance
Perceiving Speech
•Acoustic-phonetic invariance says that each phoneme
should match one and only one pattern in the
spectrogram
–This is not the case! For example, the pattern for /d/ changes
depending on the vowel that follows it
•Clearly perception and understanding of speech
sounds is more elaborate than simply interpreting an
internal spectrogram
Perceiving Speech
•The phrase “Peter buttered the burnt toast” has
five /t/ phonemes, but there are not five identical sweeps
in the spectrogram
Perceiving Speech
•Segmentation is the perception of silence
between words
•Often illusory
Perceiving Speech
•The phrase “I owe you a Yo-Yo” has no silence in it!
Perceiving Speech
• So how do you perceive speech?
Some of the “strategies”:
1. reduce the data
2. use context clues
3. use vision
Categorical Perception Sifts through the Incoming Sound
•Categorical perception is a phenomenon in
which the brain assigns a stimulus to one or
another category but never to an
intermediate category
Categorical Perception
•For example, /ba/ and /pa/ differ in their
voice onset time: the delay between releasing
the air and the start of voicing
–/ba/ is formed by stopping the flow of air from the
lungs and releasing it, with voicing starting about
10 ms after the release
–/pa/ is similar except that voice onset time is about
50 ms
Categorical Perception
•Voice onset time can range from zero to more than
50 ms. For example, you could synthesize a
sound with a voice onset time of 30 ms, but...
•Listeners will hear either /ba/ or /pa/ but
never something in between
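The all-or-nothing labeling described above can be sketched as a simple threshold on voice onset time. This is only an illustration: the 25 ms boundary is an assumed value (the slides give only about 10 ms for /ba/ and about 50 ms for /pa/), and `perceive` is a hypothetical name, not a model of what the brain actually computes.

```python
# Assumed category boundary in milliseconds; chosen for illustration only.
BOUNDARY_MS = 25.0


def perceive(vot_ms, boundary_ms=BOUNDARY_MS):
    """Label a sound by its voice onset time; there is no in-between label."""
    return "/ba/" if vot_ms < boundary_ms else "/pa/"


# A synthetic continuum of voice onset times from 0 to 60 ms.
continuum = list(range(0, 61, 10))
labels = [perceive(vot) for vot in continuum]

for vot, label in zip(continuum, labels):
    print(f"{vot:2d} ms -> {label}")
```

The input changes smoothly in 10 ms steps, but the output flips once, at the boundary. A 30 ms sound is not heard as a blend; under this assumed boundary it is simply labeled /pa/.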
Categorical Perception is Part of Learning a Language
• Babies can discriminate /ba/ from /pa/ and
can discriminate these from phonemes with
intermediate voice onset times!
• By 10 to 12 months, babies (learning
English) stop discriminating intermediate
voice onset times
Categorical Perception is Part of Learning a Language
• Once category boundaries are learned it is
impossible to unlearn them
– non-native speakers often cannot hear
certain phonemes
– as a consequence they will always have at least
a slight accent
Categorical Perception
•Another example:
Perception (of all types) Makes Use of Context
• The stream of information contained in
speech is usually ambiguous and incomplete
• Your brain makes a “best guess” based on
the circumstances
Perception (of all types) Makes Use of Context
• Consider the following example, in which a cough
replaces the missing sound:
“The [cough]eel fell off the shoe”.
“The [cough]eel fell off the car”.
• Listeners report hearing the “appropriate”
phoneme during the cough
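The "best guess" idea can be sketched as picking the candidate word that fits the rest of the sentence. This is a toy illustration under stated assumptions: the candidate words, the association table, and the `restore` function are all hypothetical, and real listeners draw on far richer knowledge than a lookup table.

```python
# Hypothetical table: words ending in "-eel" and contexts they fit.
CONTEXT_ASSOCIATIONS = {
    "heel": {"shoe", "foot", "boot"},
    "wheel": {"car", "axle", "bicycle"},
    "peel": {"orange", "banana"},
}


def restore(sentence):
    """Guess the masked '-eel' word from the other words in the sentence.

    Returns the candidate sharing the most context words with the
    sentence, or None when nothing in the sentence favors any candidate.
    """
    words = {w.lower().strip('.,"') for w in sentence.split()}
    scores = {
        candidate: len(context & words)
        for candidate, context in CONTEXT_ASSOCIATIONS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(restore("The *eel fell off the shoe."))
print(restore("The *eel fell off the car."))
```

The same masked sound is resolved differently in the two sentences, purely because of the final word. With no helpful context, this sketch returns `None` rather than guessing.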
Much of Speech Perception isn’t Auditory!
•Why rely on only one sensory system when
there is information in two?!
•The brain seamlessly integrates any
information it is given - this is called cross-modal integration
Cross-modal Integration
•Speech perception involves the synthesis of
vision and hearing
•The McGurk effect demonstrates the critical
role of vision in speech perception
Cross-modal Integration
•The McGurk Effect
Next Time:
• Vision