Lecture 20

Speech Perception
What we know
Vowel Perception
Vowel perception is dependent on being able to hear formants.
What are formants?
Concentrations of acoustic energy in frequency bands, resulting from resonance in the
vocal tract.
"It has been found that, in general, the first two or three formants (F1, F2 and F3) are
sufficient for the perceptual identification and differentiation of vowels."
Let's look at this figure.
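To make the idea concrete, here is a minimal Python sketch that labels a vowel by finding the closest reference (F1, F2) pair. The reference values are approximate, rounded averages of the kind reported for adult male speakers and are assumptions for illustration only. The names VOWEL_FORMANTS and nearest_vowel are hypothetical. Distance in raw Hz is a simplification and is not a model of how the ear weights frequency.

```python
import math

# Approximate (F1, F2) values in Hz for adult male speakers.
# Rounded, illustrative assumptions; not measurements from this course.
VOWEL_FORMANTS = {
    "i (heed)": (270, 2290),
    "ae (had)": (660, 1720),
    "a (hod)": (730, 1090),
    "u (who'd)": (300, 870),
}


def nearest_vowel(f1, f2):
    """Return the reference vowel whose (F1, F2) pair is closest to the input."""
    return min(
        VOWEL_FORMANTS,
        key=lambda vowel: math.dist((f1, f2), VOWEL_FORMANTS[vowel]),
    )


# Two made-up measurements that fall near different reference vowels.
print(nearest_vowel(290, 2200))
print(nearest_vowel(700, 1150))
```

The point of the sketch is only that vowels sit in different regions of the F1/F2 space, so two numbers already separate them fairly well.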
Diphthong Perception
What is a diphthong?
It is a combination of two vowels produced within a single syllable.
There is a distinct point where the first vowel goes into transition toward the second,
and this transition can be seen over time in a spectrogram as a shift in the formants.
The transition of formants seen in a diphthong represents changes in resonance that occur
as the tongue moves from one position to the next.
So it is these formant transitions that give us the acoustic cues allowing us to perceive
the diphthong.
Look at Figure 8.24
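The formant shift described above can be pictured as a track that glides from one vowel target to another. The sketch below is a simplified illustration: the onset and offset values are hypothetical targets in Hz, the function name formant_track is made up, and real transitions are not perfectly linear.

```python
def formant_track(start, end, steps):
    """Linearly interpolate an (F1, F2) pair from start to end.

    `steps` is the number of frames and must be at least 2.
    """
    track = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first frame, 1.0 at the last
        f1 = start[0] + t * (end[0] - start[0])
        f2 = start[1] + t * (end[1] - start[1])
        track.append((round(f1), round(f2)))
    return track


# Hypothetical targets in Hz for a diphthong like the one in "eye".
onset = (730, 1090)   # open starting vowel: high F1, low F2
offset = (390, 1990)  # higher, fronter ending vowel: low F1, high F2

for f1, f2 in formant_track(onset, offset, 5):
    print(f1, f2)
```

Reading down the printed frames, F1 falls while F2 rises, which is the kind of movement a spectrogram of a diphthong shows over time.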
Consonant Perception
What does the book say about their perception?
The process is more complex than for vowels because the various consonants are
dependent on the vowels for their correct perception.
To say that another way, a consonant by itself carries little of the acoustic information
necessary for its perception.
It needs to be colored by a vowel to have meaning.
If we record stop consonants and then splice out the vowel, the result is a sound that is
not perceived as a stop.
So "stop consonant perception is dependent on rapidly changing formant transitions from
the consonant to the vowel in a consonant-vowel (CV) context."
The key word here is context.
Consonant perception is highly dependent on the vowel context in which the consonant is
uttered.
The perception of a consonant is dependent on the formant transition and the resulting
acoustic signal.
However, if the acoustic signal resulting from only the formant transition is presented to
a listener, they will not hear just the isolated consonant.
If we erase the vowel portion of a recorded CV, the perception of the consonant sound
also goes away, and what remains has no meaning and does not sound like a speech sound.
To say this another way, even when the acoustic information for a consonant sound is
preserved, once the vowel is removed the consonant is no longer perceived as a consonant.
We also know that consonant perception remains the same from one CV context to
another even though the acoustic information from the transitions changes.
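A small Python sketch can show why this is a puzzle. Suppose we tried to recognize one consonant with a single fixed acoustic template, such as "F2 rises into the vowel". The F2 values below are hypothetical numbers chosen for illustration, and the names TOKENS, transition_direction and matches_template are made up.

```python
def transition_direction(onset_hz, target_hz):
    """Describe an F2 transition by whether it rises or falls into the vowel."""
    return "rising" if target_hz > onset_hz else "falling"


# Hypothetical F2 values in Hz (transition onset, vowel target) for the same
# consonant heard before two different vowels. Illustrative numbers only.
TOKENS = {
    "d + front vowel": (2200, 2600),
    "d + back vowel": (1200, 700),
}


def matches_template(token, template_direction):
    """Check whether a token's F2 transition fits one fixed template."""
    onset, target = TOKENS[token]
    return transition_direction(onset, target) == template_direction


for token in TOKENS:
    print(token, matches_template(token, "rising"))
```

Under these assumed values no single direction fits both tokens, yet a listener would report the same consonant in both. The acoustic pattern changes with the vowel while the percept stays the same.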
What is a Suprasegmental? Definition is on page 325
Read the section starting on page 219
Why is it important to perception?
The book says they are prosodic features.
They are elements of speech that occur at the same time as the individual phonemes and
can be reflected in changes to the acoustic signal resulting from the speech sound
production.
As the book says, "… they are not confined to phonetic segments and, instead, are overlaid
or superimposed on syllables, words, phrases and sentences."
In a real sense, they flavor or color the utterance and add meaning to it.
All of this is achieved through:
Intonation
Stress
Timing
All three can change the meaning of an utterance, so they are a critical component of
speech perception.
How we, as listeners, perceive suprasegmentals is related to the acoustic changes in the
waveform of the utterance.
Those changes may take the form of:
Reduced intensity
Increased intensity
Changes in the fundamental frequency
Temporal (timing) changes
So what is intonation?
It is the perception we get from alterations in the fundamental frequency during speech
production.
Recall that in perception we talk about pitch rather than frequency, so intonation is the
perception of pitch, or of changes in pitch, during speech.
We can use intonation for several purposes, including:
Signaling differences in meaning
Turning a statement into a question
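One way to picture the statement versus question difference is to look at the direction of the F0 contour at the end of an utterance. In English a final rise often signals a yes/no question and a final fall a statement, although this is a tendency and not a rule. The sketch below fits a straight line to a short F0 contour. The contour values, the 50 ms frame step and the 20 Hz-per-second threshold are all made-up assumptions, as are the function names.

```python
def f0_slope(f0_values, frame_step_s=0.05):
    """Least-squares slope (Hz per second) of an evenly sampled F0 contour."""
    n = len(f0_values)
    times = [i * frame_step_s for i in range(n)]
    mean_t = sum(times) / n
    mean_f = sum(f0_values) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0_values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def final_contour(f0_values, threshold_hz_per_s=20.0):
    """Label a contour as rising, falling or level from its fitted slope."""
    slope = f0_slope(f0_values)
    if slope > threshold_hz_per_s:
        return "rising"
    if slope < -threshold_hz_per_s:
        return "falling"
    return "level"


# Hypothetical F0 values in Hz over the last 200 ms of two utterances.
statement_end = [130, 125, 118, 110, 100]
question_end = [120, 130, 145, 165, 190]

print(final_contour(statement_end))
print(final_contour(question_end))
```

The listener does not measure slopes, of course. What is perceived is falling or rising pitch, which is the perceptual side of these F0 changes.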
Stress relates back to:
Loudness or softness
Duration
Fundamental frequency
For example, the same string of phonemes can represent a noun or a verb through changes
in stress.
OBject (noun)
obJECT (verb)
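The three cues for stress can be combined in a very simple way: let each cue "vote" for the syllable where it is greatest. The sketch below does this for the two readings of "object". The intensity, duration and F0 numbers are hypothetical, and the Syllable class and stressed_syllable function are made up for illustration. Real listeners weight the cues in more complicated ways.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    label: str
    intensity_db: float
    duration_ms: float
    f0_hz: float


def stressed_syllable(syllables):
    """Return the label of the syllable that wins the most cues."""
    votes = {s.label: 0 for s in syllables}
    for cue in ("intensity_db", "duration_ms", "f0_hz"):
        winner = max(syllables, key=lambda s: getattr(s, cue))
        votes[winner.label] += 1
    return max(votes, key=votes.get)


# Hypothetical measurements for the noun and the verb reading of "object".
noun = [Syllable("OB", 72, 220, 140), Syllable("ject", 66, 180, 115)]
verb = [Syllable("ob", 65, 120, 115), Syllable("JECT", 73, 260, 145)]

print(stressed_syllable(noun))
print(stressed_syllable(verb))
```

The phonemes are the same in both readings. Only the pattern of intensity, duration and F0 across the two syllables differs.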
Finally, quality can be conveyed and perceived through changes in timing cues.
These are changes in duration, either:
Relative or
Absolute
Issues in Speech Perception
There are three, and they are:
Invariance
Linearity
Segmentation
In one sense they are issues, but in another sense they are problems that need to be
explained as we try to understand how we perceive speech.
They relate to how we, as listeners, recognize spoken utterances from the acoustic
information that is present in the waveform resulting from that utterance.
"The principle of acoustic-phonetic invariance states that corresponding to each
phoneme (speech sound) is a distinct set of acoustic features, so that each time a given
phoneme is produced, the same acoustic cues are identifiable in the speech signal,
regardless of context."
"The linearity principle proposes that in a spoken word, a specific sound corresponds to
each phoneme, with units of sound corresponding to phonemes being discrete and
ordered in a particular sequence."
"The segmentation principle asserts that the speech signal can be divided (and
recombined) into acoustically independent units that correspond to specific phonemes."
What these principles are saying is that there appears to be (or should be) a one-to-one
correspondence between each phoneme and the acoustic signal present when it is
produced.
If that were so then speech perception would simply be a matter of correct or proper
pattern recognition.
There is a but
And it is a big but.
Over the past fifty years, research has established that in normal conversational
speech there is no proof or evidence of invariance, linearity or segmentation.
Specifically, research into speech perception has found the following, which makes the
case for invariance, linearity and segmentation suspect:
1. The acoustic cues in speech outnumber the phonemes in words.
2. The acoustic properties of a given phoneme vary across different ordering contexts
(CV, VC, CVC).
3. At any particular point in the speech stream there is overlapping information present
about the acoustic properties of a specific phoneme as well as the phonemes that precede
and follow it.
4. Studies that track the articulators have shown that during speech production the
shape or configuration of the vocal tract and the oral cavity is influenced by the
phonemes that precede as well as those that follow the phoneme being produced.
5. The temporal boundaries between utterances of actual phonemes are not
consistent. They may vary by several milliseconds.
6. The phenomenon of coarticulation, which is the simultaneous movement of two
articulators, shows that there is a clear lack of segmentation in the speech signal.
We also know that the evidence for invariance, linearity and segmentation is suspect
because there are often large acoustic differences in utterances of the same phoneme by
the same person, and even larger differences between individual talkers.
We do know that lower forms of animals communicate with each other and with us, yet
speech communication is a characteristic that is, for the most part, limited to mankind.
It has been speculated that somehow, through either a divine or an evolutionary process,
mankind has come to have some specialized neural mechanism that is unique and suited
to the process of speech perception.
Let's skip ahead now and take a brief look at the theories of speech perception.