The Perception of Speech

Speech
•Speech is for rapid communication
•Speech is composed of units of sound called
phonemes
–examples of phonemes: /b/ in bat, /p/ in pat
Acoustic Properties of Speech
•Speech can be characterized by a spectrogram
•Spectrogram reveals differences between phonemes
•The differences are in the formants and the formant
transitions
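As a rough illustration of what a spectrogram measures, the sketch below computes one with a plain short-time Fourier transform using only the Python standard library. The signal is a synthetic tone gliding from 500 Hz to 1500 Hz, a toy stand-in for a formant transition rather than real speech. The sample rate, frame length, hop size, and function names are all illustrative assumptions.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz (assumed for this toy example)
FRAME_LEN = 128      # samples per analysis frame (assumed)
HOP = 64             # samples between frame starts (assumed)

def spectrogram(signal, frame_len=FRAME_LEN, hop=HOP):
    """Return one list of magnitudes (one per frequency bin) for each frame."""
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Direct discrete Fourier transform of the windowed frame.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peak_frequency(mags, sample_rate=SAMPLE_RATE, frame_len=FRAME_LEN):
    """Frequency (Hz) of the strongest bin in one frame."""
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * sample_rate / frame_len

# Synthetic glide from 500 Hz to 1500 Hz over 0.1 s.
n_samples = int(SAMPLE_RATE * 0.1)
phase = 0.0
glide = []
for i in range(n_samples):
    freq = 500 + 1000 * i / n_samples
    phase += 2 * math.pi * freq / SAMPLE_RATE
    glide.append(math.sin(phase))

spec = spectrogram(glide)
track = [peak_frequency(frame) for frame in spec]
print(track)  # peak frequency per frame, in Hz
```

The peak frequency climbs from frame to frame, which is the kind of sweep that a formant transition traces across a spectrogram.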
Perceiving Speech
• So perceiving (interpreting) speech sounds is simply
a matter of matching the spectrotemporal properties
(the shape of the spectrogram) of the incoming sound
waves to the appropriate phoneme
• right?…
• Then specific phonemes must correspond to specific
spectrograms - a property called acoustic-phonetic
invariance
•Acoustic-phonetic invariance says that each phoneme
should match one and only one pattern in the
spectrogram
–This is not the case! For example, the pattern for /d/ changes
depending on the vowel that follows it
•Clearly perception and understanding of speech
sounds is more elaborate than simply interpreting an
internal spectrogram
•The phrase “Peter buttered the burnt toast” has five
/t/ phonemes, but the spectrogram does not show five
identical sweeps
•The Segmentation Problem
•Segmentation is the perception of silence
between words
•Often illusory
•The phrase “I owe you a Yo-Yo” has no silence in it!
Spoken Input
• The Segmentation Problem:
– The stream of acoustic input is not physically segmented into discrete
phonemes, words, phrases, etc.
– Silent gaps don’t always indicate (aren’t perceived as) interruptions in
speech
– Continuous speech stream is sometimes perceived as having gaps
Perceiving Speech
• So how do you perceive speech?
Some of the “strategies”:
1. reduce the data
2. use context clues
3. use vision
Categorical Perception
•Categorical Perception is a phenomenon in
which the brain assigns a stimulus to one
category or another but never to an intermediate
category
•For example, /ba/ and /pa/ differ mainly in their
voice onset time: the delay between releasing the
stopped air and the start of vocal fold vibration
–/ba/ is formed by stopping the flow of air from the
lungs and releasing it; voicing begins about 10
milliseconds after the release
–/pa/ is similar except that voice onset time is about 50
ms
•Voice onset time can range from zero to >50
ms. For example, you could synthesize a
sound with a voice onset time of 30 ms but...
•English speakers will hear either /ba/ or /pa/
but never something in between
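One way to picture this is a steep labeling function over voice onset time. The sketch below is only an illustration: the 25 ms boundary and the slope are assumed values chosen to fall between the 10 ms and 50 ms examples above, not measurements, and `prob_pa`, `perceived`, and `heard_as_different` are hypothetical names.

```python
import math

BOUNDARY_MS = 25.0  # assumed category boundary, for illustration only
SLOPE = 0.5         # assumed steepness of the labeling curve (per ms)

def prob_pa(vot_ms):
    """Probability of labeling a stimulus /pa/ (logistic identification curve)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))

def perceived(vot_ms):
    """Every stimulus gets one of two labels; there is no in-between category."""
    return "/pa/" if prob_pa(vot_ms) >= 0.5 else "/ba/"

def heard_as_different(vot_a, vot_b):
    """Two stimuli sound different only if they fall in different categories."""
    return perceived(vot_a) != perceived(vot_b)

for vot in range(0, 61, 10):
    print(vot, "ms ->", perceived(vot))
```

Under these assumptions, two stimuli that are 10 ms apart are told apart only when they straddle the boundary, even though the physical difference is the same everywhere along the continuum.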
Categorical Perception is Part of
Learning a Language
• Babies can discriminate /ba/ from /pa/ and
can discriminate these from phonemes with
intermediate voice onset times!
• By 10 to 12 months, babies (learning
English) stop discriminating intermediate
voice onset times
• Once category boundaries are learned it is
impossible to unlearn them
– non-native speakers of any language often cannot
hear certain phonemes the way native speakers
do
– as a consequence they will always have at least
some slight accent
Perception (of all types) Makes
Use of Context
• The stream of information contained in
speech is usually ambiguous and incomplete
• Your brain makes a “best guess” based on
the circumstances
• Consider the following example, in which a cough replaces the sound marked “__”:
– “The __eel fell off the shoe.”
– “The __eel fell off the car.”
• Listeners report hearing the “appropriate” phoneme
during the cough (“heel” with shoe, “wheel” with car)
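The “best guess” idea can be sketched as choosing the candidate word that best fits the surrounding context. This is a toy illustration, not a model of the brain: the candidate list, the plausibility scores in `CONTEXT_FIT`, and the function name `restore_word` are all made up for the example.

```python
# Hypothetical plausibility scores: how well each candidate fits each context word.
CONTEXT_FIT = {
    ("heel", "shoe"): 0.90, ("wheel", "shoe"): 0.05, ("peel", "shoe"): 0.05,
    ("heel", "car"): 0.05, ("wheel", "car"): 0.90, ("peel", "car"): 0.05,
}

def restore_word(candidates, context_word):
    """Pick the candidate with the highest (assumed) fit to the context."""
    return max(candidates,
               key=lambda word: CONTEXT_FIT.get((word, context_word), 0.0))

# Words consistent with the ambiguous input "__eel".
CANDIDATES = ["heel", "wheel", "peel"]

for context in ("shoe", "car"):
    print(context, "->", restore_word(CANDIDATES, context))
```

The acoustic input is identical in both sentences; only the context word changes which candidate wins.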
Much of Speech Perception isn’t
Auditory!
•Why rely on only one sensory system when
there is information in two?!
•The brain seamlessly integrates any
information it is given; this is called cross-modal integration
Cross-modal Integration
•Speech perception involves the synthesis of
vision and hearing
•The McGurk effect demonstrates the critical
role of vision in speech perception
•The McGurk Effect - suggests that visual
and auditory information are combined to
enhance speech perception under normal
circumstances
•When visual and auditory information are
incongruous the resulting perception is
unpredictable and often wrong
Auditory Scene Analysis
• Sounds don’t happen in isolation, they
happen in streams of changing frequencies
• How does the system group related auditory
events into streams and keep different
streams separate?
• Solving this problem is called Auditory Scene
Analysis
• One important principle is proximity in pitch,
time, or spatial location
• Effect of timing proximity: [figure: slow vs. fast presentation, with two alternative percepts labeled “Do you hear this?” and “Or this?”]
• Effect of pitch proximity: [figure: far vs. close pitch spacing, with two alternative percepts labeled “Do you hear this?” and “Or this?”]
• Effect of proximity:
– auditory system groups together events that
happen close together in time and frequency
– This enables us to perceive meaningful streams of
information when they are mixed with distraction
– Interestingly, the brain can disentangle mixed
streams only under certain circumstances
• E.g. “The picket fence illusion”: gaps of silence
dramatically distort perception of a sentence, while bursts
of noise do not
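The proximity principle described above can be sketched with a toy grouping rule: link every tone to its nearest neighbor in a scaled time and pitch space, and treat each connected group as one stream. This is a hypothetical heuristic for illustration, not the auditory system's actual algorithm. The scale factors `time_scale_ms` and `pitch_scale`, the millisecond and semitone units, and the function names are assumptions.

```python
import math

def group_streams(tones, time_scale_ms=100.0, pitch_scale=2.0):
    """tones: list of (onset_ms, pitch_semitones). Returns a stream label per tone."""
    n = len(tones)
    parent = list(range(n))  # union-find forest over tone indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def distance(a, b):
        # Proximity in time and pitch, each divided by an assumed scale.
        return math.hypot((a[0] - b[0]) / time_scale_ms,
                          (a[1] - b[1]) / pitch_scale)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            continue
        nearest = min(others, key=lambda j: distance(tones[i], tones[j]))
        parent[find(i)] = find(nearest)  # merge the tone with its nearest neighbor

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]

def alternating(gap_ms, pitch_gap, count=8):
    """Tones alternating between two pitches, one every gap_ms milliseconds."""
    return [(i * gap_ms, pitch_gap * (i % 2)) for i in range(count)]

print("slow, far  :", group_streams(alternating(400, 7)))
print("fast, far  :", group_streams(alternating(100, 7)))
print("fast, close:", group_streams(alternating(100, 1)))
```

With these assumed scales, the fast sequence with widely spaced pitches splits into two streams (one per pitch), while the slow sequence and the closely spaced sequence each stay together as a single stream.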
Next Time: Taste, Smell, Touch,
Balance