The Perception of Speech

Speech
•Speech is for rapid communication
•Speech is composed of units of sound called
phonemes
–examples of phonemes: /b/ in bat, /p/ in pat
Acoustic Properties of Speech
•Speech can be characterized by a spectrogram (a plot
of sound frequency against time)
Acoustic Properties of Speech
•A spectrogram reveals differences between phonemes
•The differences are in the formants and the formant
transitions
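As a rough illustration of how a spectrogram is built, the sketch below slices a signal into short overlapping frames and measures the energy at each frequency in every frame. Everything in it is an assumption made for illustration: the function names `spectrogram` and `peak_frequency`, the frame length, and the synthetic two-tone test signal (standing in for speech) do not come from the lecture.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame lists the magnitude per frequency bin."""
    # A Hann window tapers each frame to reduce leakage between frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive discrete Fourier transform over the non-negative frequencies.
        spectrum = []
        for k in range(frame_len // 2 + 1):
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            spectrum.append(abs(total))
        frames.append(spectrum)
    return frames


def peak_frequency(spectrum, sample_rate, frame_len):
    """Frequency in Hz of the strongest bin in one frame."""
    peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / frame_len


def tone(freq_hz, n_samples, sample_rate):
    """Pure sine tone, used here as a stand-in for a speech sound."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


# Hypothetical test signal: a 1000 Hz tone followed by a 2000 Hz tone.
SAMPLE_RATE = 8000
FRAME_LEN = 64
signal = tone(1000, 256, SAMPLE_RATE) + tone(2000, 256, SAMPLE_RATE)
frames = spectrogram(signal, frame_len=FRAME_LEN, hop=32)
first_peak = peak_frequency(frames[0], SAMPLE_RATE, FRAME_LEN)
last_peak = peak_frequency(frames[-1], SAMPLE_RATE, FRAME_LEN)
```

The change in the strongest frequency from the first frame to the last is a crude analogue of a formant transition: the energy moves from one frequency band to another over time.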
Perceiving Speech
• So perceiving (interpreting) speech sounds is simply
a matter of matching the spectrotemporal properties
(the shape of the spectrogram) of the incoming sound
waves to the appropriate phoneme
• right?…
• If so, each phoneme must correspond to one specific
spectrogram pattern, a property called acoustic-phonetic
invariance
Perceiving Speech
•Acoustic-phonetic invariance says that each phoneme
should match one and only one pattern in the
spectrogram
–This is not the case! For example, /d/ produces different
formant transitions when followed by different vowels
•Clearly, perceiving and understanding speech
sounds is more elaborate than simply interpreting an
internal spectrogram
Perceiving Speech
•The phrase “Peter buttered the burnt toast” has five
/t/ phonemes, but the spectrogram does not show five
identical sweeps
Perceiving Speech
•Segmentation is the perception of silence
between words
•This silence is often illusory
Perceiving Speech
•The phrase “I owe you a Yo-Yo” has no silence in it!
Perceiving Speech
• So how do you perceive speech?
Some of the “strategies”:
1. reduce the data
2. use context clues
3. use vision
Categorical Perception
•Categorical perception is a phenomenon in
which the brain assigns a stimulus to one
category or another but never to an intermediate
category
Categorical Perception
•For example, /ba/ and /pa/ differ in their
voice onset time, the delay between releasing the
lips and the start of voicing
–/ba/ is formed by stopping the flow of air from the
lungs and releasing it; voicing begins about 10
milliseconds after the release
–/pa/ is similar except that voice onset time is about 50
ms
Categorical Perception
•Voice onset time can range from zero to more than 50
ms. For example, you could synthesize a
sound with a voice onset time of 30 ms, but...
•Listeners will hear either /ba/ or /pa/ but never
something in between
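The all-or-none labeling described above can be sketched as a simple threshold on voice onset time. This is only a toy model under stated assumptions: the boundary value `BOUNDARY_MS` and the function names are hypothetical and are not given in the lecture, and real listeners are noisier near the boundary than a hard threshold suggests.

```python
# Hypothetical category boundary in milliseconds (an illustrative value,
# not taken from the lecture).
BOUNDARY_MS = 25.0


def perceived_syllable(vot_ms, boundary_ms=BOUNDARY_MS):
    """Map a continuous voice onset time onto one of two discrete labels."""
    return "/ba/" if vot_ms < boundary_ms else "/pa/"


def discriminable(vot_a_ms, vot_b_ms):
    """In this all-or-none sketch, two sounds differ only if their labels differ."""
    return perceived_syllable(vot_a_ms) != perceived_syllable(vot_b_ms)


# A synthetic continuum of voice onset times, in milliseconds.
continuum = [0, 10, 20, 30, 40, 50]
labels = [perceived_syllable(vot) for vot in continuum]
```

In this sketch a 10 ms step within a category (10 ms versus 20 ms) produces the same label, while the same step across the boundary (20 ms versus 30 ms) produces different labels. The 30 ms sound is assigned to one category rather than to anything in between.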
Categorical Perception is Part of
Learning a Language
• Babies can discriminate /ba/ from /pa/ and
can discriminate these from phonemes with
intermediate voice onset times!
• By 10 to 12 months, babies (learning
English) stop discriminating intermediate
voice onset times
Categorical Perception is Part of
Learning a Language
• Once category boundaries are learned it is
impossible to unlearn them
– non-native speakers of any language often cannot
hear certain phonemes the way native speakers
do
– as a consequence they will always have at least
some slight accent
Perception (of all types) Makes
Use of Context
• The stream of information contained in
speech is usually ambiguous and incomplete
• Your brain makes a “best guess” based on
the circumstances
Perception (of all types) Makes
Use of Context
• Consider the following example, in which a cough
replaces the first sound of a word:
“The [cough]eel fell off the shoe.”
“The [cough]eel fell off the car.”
• Listeners report hearing the “appropriate” phoneme
during the cough (for example, “heel” when the sentence
ends in “shoe” and “wheel” when it ends in “car”)
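One way to picture this “best guess” is as a choice among candidate words, scored by how well each fits the surrounding context. The sketch below is a toy illustration only: the `ASSOCIATION` table, its numbers, and the function name `restore_word` are all made up for this example and do not describe how the brain actually computes the guess.

```python
# Hypothetical association strengths between a candidate word and the final
# context word. The numbers are invented purely for illustration.
ASSOCIATION = {
    ("heel", "shoe"): 0.9,
    ("wheel", "shoe"): 0.1,
    ("heel", "car"): 0.1,
    ("wheel", "car"): 0.9,
}


def restore_word(candidates, context_word):
    """Pick the candidate that fits the context best (the 'best guess')."""
    return max(candidates,
               key=lambda word: ASSOCIATION.get((word, context_word), 0.0))


# Candidate completions of "_eel"; "peel" has no listed association here.
CANDIDATES = ["heel", "wheel", "peel"]
for context in ("shoe", "car"):
    print(context, "->", restore_word(CANDIDATES, context))
```

The same ambiguous input, “_eel”, resolves to a different word depending only on the context word that follows it.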
Much of Speech Perception isn’t
Auditory!
•Why rely on only one sensory system when
there is information in two?
•The brain seamlessly integrates any
information it is given; this is called cross-modal integration
Cross-modal Integration
•Speech perception involves the synthesis of
vision and hearing
•The McGurk effect demonstrates the critical
role of vision in speech perception
Cross-modal Integration
•The McGurk Effect (video demonstration)
Cross-modal Integration
•The McGurk effect suggests that visual
and auditory information are combined to
enhance speech perception under normal
circumstances
•When visual and auditory information are
incongruous, the resulting perception is
unpredictable and often wrong
Next Time: Taste, Smell, Touch,
Balance