The Perception of Speech

Speech
•Speech is for rapid communication
•Speech is composed of units of sound called
phonemes
–examples of phonemes: /b/ in bat, /p/ in pat
Acoustic Properties of Speech
•Speech can be characterized by a spectrogram
•Spectrogram reveals differences between phonemes
•The differences are in the formants and the formant
transitions
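A minimal sketch of how a spectrogram could be computed, assuming a synthetic 1000 Hz tone in place of recorded speech. The function name `spectrogram`, the frame length, the hop size, and the sample rate are illustrative choices, not values from the lecture. Each row of the result is one slice of time; each column is the energy at one frequency.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a magnitude spectrogram: one list of frequency-bin magnitudes per time frame."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Taper the frame with a Hann window to reduce spectral leakage.
        windowed = [
            x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
            for n, x in enumerate(frame)
        ]
        # Naive discrete Fourier transform, non-negative frequencies only.
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Assumed test signal: a pure 1000 Hz tone sampled at 8000 Hz (not real speech).
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = spectrogram(signal)
# Find the strongest frequency bin in the first time frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(f"{len(spec)} frames, strongest energy near {peak_hz:.0f} Hz")
```

In real speech the strongest bands (the formants) shift from frame to frame, and those shifts are the formant transitions.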
Perceiving Speech
• So perceiving (interpreting) speech sounds is simply
a matter of matching the spectrotemporal properties
(the shape of the spectrogram) of the incoming sound
waves to the appropriate phoneme
• right?…
• Then specific phonemes must correspond to specific
spectrograms - a property called acoustic-phonetic
invariance
•Acoustic-phonetic invariance says that each phoneme
should match one and only one pattern in the
spectrogram
–This is not the case! For example, the pattern for /d/ changes
depending on the vowel that follows it
•Clearly, perceiving and understanding speech
sounds is more elaborate than simply interpreting an
internal spectrogram
•The phrase “Peter buttered the burnt toast” has five
/t/ phonemes, but there are not five identical sweeps in
the spectrogram
•Segmentation is the perception of silence
between words
•The silence is often illusory
•The phrase “I owe you a Yo-Yo” has no silence in it!
• So how do you perceive speech?
Some of the “strategies”:
1. reduce the data
2. use context clues
3. use vision
Categorical Perception
•Categorical Perception is a phenomenon in
which the brain assigns a stimulus into one or
another category but never into an intermediate
category
•For example, /ba/ and /pa/ differ in voice onset time,
the delay between releasing the lips and the start of voicing
–/ba/ is formed by stopping the flow of air from the
lungs and releasing it; voicing begins about 10 milliseconds
after the release
–/pa/ is similar except that voice onset time is about 50
ms
•Voice onset time can range from zero to more than 50
ms. For example, you could synthesize a
sound with a voice onset time of 30 ms, but...
•Listeners will hear either /ba/ or /pa/ but never
something in between
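The all-or-none labeling described above can be sketched as a simple threshold on voice onset time. This is only an illustration: the function name `categorize_vot` and the 25 ms boundary are assumptions, since the slides give just the two anchor values (about 10 ms for /ba/ and about 50 ms for /pa/).

```python
def categorize_vot(vot_ms, boundary_ms=25.0):
    """Return the syllable a listener would report for a given voice onset time.

    boundary_ms is an assumed category boundary, not a value from the lecture.
    """
    # Every stimulus lands in exactly one category; there is no "in between" label.
    return "/ba/" if vot_ms < boundary_ms else "/pa/"


# Sweep the continuum in 10 ms steps, as a synthesized stimulus series might.
for vot in range(0, 61, 10):
    print(f"VOT {vot:2d} ms -> heard as {categorize_vot(vot)}")
```

The input varies continuously, but the output only ever takes one of two values, which is the defining feature of categorical perception.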
Categorical Perception is Part of
Learning a Language
• Babies can discriminate /ba/ from /pa/ and
can discriminate these from phonemes with
intermediate voice onset times!
• By 10 to 12 months, babies (learning
English) stop discriminating intermediate
voice onset times
• Once category boundaries are learned it is
impossible to unlearn them
– non-native speakers of any language often cannot
hear certain phonemes the way native speakers
do
– as a consequence they will always have at least
some slight accent
Perception (of all types) Makes
Use of Context
• The stream of information contained in
speech is usually ambiguous and incomplete
• Your brain makes a “best guess” based on
the circumstances
• Consider the following example, in which a cough
replaces the first sound of a word:
“The [cough]eel fell off the shoe”.
“The [cough]eel fell off the car”.
• Listeners report hearing the “appropriate” phoneme
during the cough
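A toy sketch of this "best guess" from context. The lookup table and the function name `restore_word` are hypothetical, and the pairings (shoe with heel, car with wheel) are the presumed "appropriate" completions rather than results stated on the slides.

```python
# Hypothetical table: the word a listener would expect given the sentence's final word.
CONTEXT_TO_WORD = {"shoe": "heel", "car": "wheel"}


def restore_word(fragment, context_word):
    """Guess the full word from the audible fragment and a later context word."""
    word = CONTEXT_TO_WORD.get(context_word)
    # Accept the guess only if it is consistent with the part that was actually heard.
    if word is not None and word.endswith(fragment):
        return word
    return None


for context in ("shoe", "car"):
    print(f"The [cough]eel fell off the {context} -> {restore_word('eel', context)}")
```

The same audible fragment yields different words because the missing sound is filled in from context, not from the sound wave.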
Much of Speech Perception isn’t
Auditory!
•Why rely on only one sensory system when
there is information in two!?
•The brain seamlessly integrates any
information it is given; this is called cross-modal integration
Cross-modal Integration
•Speech perception involves the synthesis of
vision and hearing
•The McGurk effect demonstrates the critical
role of vision in speech perception
•The McGurk effect demonstrates that
visual and auditory information are combined
to enhance speech perception
Next Time: Vision