The Perception of Speech

Speech
•Speech is for rapid communication
•Speech is composed of units of sound
called phonemes
–examples of phonemes: /b/ in bat, /p/ in
pat
Acoustic Properties of Speech
•Speech can be characterized by a
spectrogram
Acoustic Properties of Speech
•Spectrogram reveals differences between
phonemes
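A spectrogram can be sketched as a series of short-time Fourier transforms: the sound is cut into overlapping frames and the energy at each frequency is measured per frame. The sketch below is a minimal illustration using only the Python standard library. The frame length, hop size, sample rate, and the two-tone test signal are all assumptions made for the example, not values from the lecture.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame holds one magnitude per frequency bin."""
    # Hann window tapers each frame to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Plain discrete Fourier transform, keeping non-negative frequencies only.
        mags = []
        for k in range(frame_len // 2 + 1):
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            mags.append(abs(total))
        frames.append(mags)
    return frames

# Hypothetical test signal: a tone that jumps from 500 Hz to 1500 Hz halfway through.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * (500 if t < 256 else 1500) * t / SAMPLE_RATE)
          for t in range(512)]

spec = spectrogram(signal)
bin_hz = SAMPLE_RATE / 64  # width of one frequency bin for frame_len=64
# Strongest frequency in each frame: the "shape" a spectrogram plot would show.
peak_hz = [max(range(len(f)), key=f.__getitem__) * bin_hz for f in spec]
print(peak_hz)
```

Plotting time on the horizontal axis, frequency on the vertical axis, and magnitude as darkness gives the familiar spectrogram picture; real speech shows several bands of energy (formants) rather than a single tone.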
Perceiving Speech
• So perceiving (interpreting) speech sounds is
simply a matter of matching the
spectrotemporal properties (the shape of the
spectrogram) of the incoming sound waves to
the appropriate phoneme
• right?…
• Then specific phonemes must correspond to
specific spectrograms - a property called
acoustic-phonetic invariance
Perceiving Speech
•Acoustic-phonetic invariance says that
phonemes should match one and only one
pattern in the spectrogram
–This is not the case! For example /d/ followed by
different vowels:
•Clearly perception and understanding of
speech sounds is more elaborate than simply
interpreting an internal spectrogram
Perceiving Speech
•The phrase “Peter buttered the burnt toast”
has five /t/ phonemes, but there are not five
identical sweeps in the spectrogram
Perceiving Speech
•Segmentation is the perception of silence
between words
•Often illusory
Perceiving Speech
•The phrase “I owe you a Yo-Yo” has no silence
in it!
Perceiving Speech
• So how do you perceive speech?
Some of the “strategies”:
1. reduce the data
2. use context clues
3. use vision
Categorical Perception
•Categorical Perception is a phenomenon
in which the brain assigns a stimulus into
one or another category but never into an
intermediate category
Categorical Perception
•For example, /ba/ and /pa/ differ in voice
onset time: the delay between releasing the
lips and the start of vocal-fold vibration
–in /ba/ the flow of air from the lungs is
stopped at the lips and voicing starts about
10 milliseconds after the release
–/pa/ is similar except that voice onset time is
about 50 ms
Categorical Perception
•Voice onset time can range from zero to
>50 ms. For example, you could
synthesize a sound with a voice onset
time of 30 ms but...
•Listeners will hear either /ba/ or /pa/ but
never something in between
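The all-or-none labeling described above can be sketched as a steep identification curve over voice onset time. This is only an illustrative model: the 30 ms boundary is assumed to lie midway between the roughly 10 ms and 50 ms values quoted for /ba/ and /pa/, and the slope is a made-up number chosen to make the transition abrupt.

```python
import math

# Assumed values for illustration only (not measurements from the lecture).
BOUNDARY_MS = 30.0   # hypothetical category boundary
SLOPE_PER_MS = 0.5   # hypothetical steepness of the transition

def prob_pa(vot_ms):
    """Probability of reporting /pa/ for a given voice onset time (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-SLOPE_PER_MS * (vot_ms - BOUNDARY_MS)))

def perceive(vot_ms):
    """Force every stimulus into one of the two categories; there is no 'in between'."""
    return "/pa/" if prob_pa(vot_ms) >= 0.5 else "/ba/"

for vot in range(0, 61, 10):
    print(f"{vot:2d} ms -> {perceive(vot)}  (P(/pa/) = {prob_pa(vot):.2f})")
```

The stimulus changes continuously, but the reported label flips from /ba/ to /pa/ over a narrow range around the boundary. A stimulus exactly at the assumed boundary is a coin flip in this model; the code breaks the tie toward /pa/ arbitrarily.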
Categorical Perception is Part of
Learning a Language
• Babies can discriminate /ba/ from /pa/
and can discriminate these from
sounds with intermediate voice onset
times!
• By 10 to 12 months, babies (learning
English) stop discriminating
intermediate voice onset times
Categorical Perception is Part of
Learning a Language
• Once category boundaries are learned it
is very difficult to unlearn them
– non-native speakers of any language often
cannot hear certain phonemes the way
native speakers do
– as a consequence they usually keep at
least a slight accent
Categorical Perception
•Another example:
Perception (of all types) Makes
Use of Context
• The stream of information contained in
speech is usually ambiguous and
incomplete
• Your brain makes a “best guess” based
on the circumstances
Perception (of all types) Makes
Use of Context
• Consider the following example, in which a
cough replaces the missing sound:
“The __eel fell off the shoe.”
“The __eel fell off the car.”
• Listeners report hearing the “appropriate”
phoneme during the cough
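The "best guess" idea can be sketched as choosing, among words that fit the sounds actually heard, the one that fits the context best. Everything in this sketch is hypothetical: the three-word lexicon, the association scores, and the function name `restore` are invented for illustration and are not a model from the lecture.

```python
# Hypothetical toy lexicon of words that could fill in "__eel".
LEXICON = ("heel", "wheel", "peel")

# Made-up scores for how well each word fits a later context word.
ASSOCIATION = {
    ("heel", "shoe"): 0.90, ("wheel", "shoe"): 0.05, ("peel", "shoe"): 0.05,
    ("heel", "car"): 0.05, ("wheel", "car"): 0.90, ("peel", "car"): 0.05,
}

def restore(masked_word, context_word):
    """Return the lexicon word that matches the heard ending and best fits the context."""
    heard = masked_word.lstrip("_")  # the part not covered by the cough
    candidates = [w for w in LEXICON if w.endswith(heard)]
    return max(candidates, key=lambda w: ASSOCIATION.get((w, context_word), 0.0))

print(restore("__eel", "shoe"))
print(restore("__eel", "car"))
```

The acoustic input is identical in both calls; only the context word differs, and that alone changes which word is selected.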
Much of Speech Perception isn’t
Auditory!
•Why rely on only one sensory system
when there is information in two?
•The brain seamlessly integrates any
information it is given - this is called
cross-modal integration
Cross-modal Integration
•Speech perception involves the
synthesis of vision and hearing
•The McGurk effect demonstrates the
critical role of vision in speech perception
Cross-modal Integration
•The McGurk Effect
Cross-modal Integration
•The McGurk Effect demonstrates that
visual and auditory information are
combined to enhance speech
perception
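One simple way to picture cross-modal integration is to multiply the evidence from each sense for every candidate syllable and renormalize. The sketch below applies that idea to the classic McGurk pairing, in which a spoken /ba/ dubbed onto a face saying /ga/ is commonly heard as /da/. The multiplication rule is an assumed toy model, and all the numbers are invented for illustration.

```python
def integrate(auditory, visual):
    """Combine per-syllable evidence from two senses by multiplying and renormalizing."""
    combined = {s: auditory[s] * visual[s] for s in auditory}
    total = sum(combined.values())
    return {s: p / total for s, p in combined.items()}

# Made-up evidence values (each set sums to 1).
heard = {"/ba/": 0.60, "/da/": 0.35, "/ga/": 0.05}  # audio track says /ba/
seen = {"/ba/": 0.02, "/da/": 0.45, "/ga/": 0.53}   # lips never close, so /ba/ is unlikely

fused = integrate(heard, seen)
best = max(fused, key=fused.get)
print(best, {s: round(p, 2) for s, p in fused.items()})
```

With these assumed numbers, hearing alone favors /ba/ and vision alone favors /ga/, but the combined estimate favors /da/, the syllable that is reasonably compatible with both sources.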
Next Time: Vision