Speech Perception Lecture

Acoustics
• Acoustics = physics of sound
• Sound = moving air particles
• Frequency of motion is measured in Hz
(= hertz = cycles/sec)
• Complex sounds = many different frequencies
simultaneously
– lowest frequency = fundamental frequency (F0)
• determines pitch
– other higher frequencies = harmonics = overtones
• determine timbre
• The voice is a complex sound
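A minimal sketch of a complex sound as a sum of sine waves, assuming the harmonics sit at whole-number multiples of F0. The 100 Hz F0, the amplitudes, and the function names are illustrative assumptions, not values from the lecture.

```python
import math

def harmonic_frequencies(f0_hz, n_components):
    """Return F0 followed by its overtones (whole-number multiples of F0)."""
    return [f0_hz * k for k in range(1, n_components + 1)]

def complex_wave(t_sec, f0_hz, amplitudes):
    """Sample a complex sound at time t_sec: one sine wave per harmonic.

    amplitudes[0] scales F0, amplitudes[1] scales 2 * F0, and so on.
    Changing f0_hz changes the pitch; changing the amplitudes changes the timbre.
    """
    return sum(
        a * math.sin(2 * math.pi * f0_hz * k * t_sec)
        for k, a in enumerate(amplitudes, start=1)
    )

f0 = 100.0                      # hypothetical F0 in Hz
period = 1.0 / f0               # the whole wave repeats once per F0 cycle
freqs = harmonic_frequencies(f0, 4)
sample_now = complex_wave(0.0013, f0, [1.0, 0.5, 0.25])
sample_one_period_later = complex_wave(0.0013 + period, f0, [1.0, 0.5, 0.25])
```

Because every component completes a whole number of cycles in one F0 period, the two samples match, which is why F0 sets the perceived pitch of the whole sound.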
09/01/10
Psyc / Ling / Comm 525 Fall 2010
Some Different Ways to Depict Sound
Acoustics of Speech
• Fundamental Frequency (F0)
– basic pitch of voice
– rate at which whole vocal cords vibrate
• Plus harmonics (= overtones)
– other higher frequencies in voice
– faster rates at which parts of vocal cords & other structures vibrate
• Resonance (= sympathetic vibration)
– rest of vocal tract enhances some frequencies & inhibits others
– which freqs are enhanced or inhibited depends on vocal tract shape
– which depends on positions of articulators
– Produces formants
• enhanced frequency bands
• usually 3-4 formants in speech: F1, F2, F3 (& sometimes F4)
Speech & Hearing Frequencies
• Human hearing
– 20 - 20,000 Hz
– Most sensitive at 500 - 5,000 Hz
• Human voice fundamental frequency
– Average for men = 80 - 200 Hz
– Average for women = up to 400 Hz
• Telephone:
– Cuts off at ~3000 Hz
– Crucial information for identifying some sounds lost
(fricatives)
From Carroll (2004), The psychology of language, 4th Ed.
English Spelling
A Dreadful Language
I take it you already know
Of tough and bough and cough and dough.
Others may stumble, but not you,
On hiccough, thorough, touch, and through;
Well done! And now you wish perhaps
To learn of less familiar traps?
Beware of heard, a dreadful word
That looks like beard and sounds like bird.
And dead: it's said like bed, not bead
For goodness sake, don't call it "deed".
Watch out for meat and great and threat
(They rhyme with suite and straight and debt).
A moth is not a moth in mother,
Nor both in bother, nor broth in brother.
And here is not a match for there,
Nor dear and fear for bear and pear.
And then there's dose and rose and lose
Just look them up - and goose and choose,
And cork and work and word and sword,
And do and go and thwart and cart.
Come, come I've hardly made a start.
A dreadful language? Man alive
I mastered it when I was five!
International Phonetic Alphabet (IPA)
• 1 sound = 1 symbol
• Symbols for all speech sounds in all languages
• Phonetic writing makes pronunciation completely
unambiguous
– Some languages have writing systems that are close to
phonetic (Korean, Italian)
– Some other languages have writing systems that indicate
less about pronunciation (Mandarin?)
(“Standard” American)
From Carroll (2004), The psychology of language, 4th Ed.
From Carroll (2004), The psychology of language, 4th Ed.
Coarticulation
• Each sound partially shaped by sounds
before & after it
– keel vs kill vs cool
– /kil/ vs /kɪl/ vs /kul/ (IPA characters)
– place of articulation and rounding on the k
differ a lot
– so, different versions of “the same sound” in
different contexts
– and from different speakers
• This is what allows us to talk so fast
Coarticulation Across Languages
• How different can different versions of a
sound be & still be heard as “the same
sound”?
– Different for different languages
– A back rounded k and a front unrounded k sound
like “the same sound” to English speakers
• but that same difference is enough to make them
sound like 2 different sounds in some other languages
Phonemes
• In English, a difference in voicing makes 2 sounds “different sounds”
– pill vs bill
– /pɪl/ vs /bɪl/
– p = voiceless
– b = voiced
• Can find many other minimal pairs of English words where the only
difference is whether or not one sound is voiced
– rip vs rib
– bat vs bad
– tip vs dip
– cap vs cab
– back vs bag
• Therefore, voicing is a distinctive feature in English
– and 2 sounds that differ only in voicing are different phonemes
– phoneme = sound that can signal a meaning difference
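The minimal-pair test above can be sketched in code. This is only an illustration: the mini-lexicon, its rough ASCII transcriptions, and all function names are hypothetical, and the search assumes words are already segmented into phonemes.

```python
# Hypothetical mini-lexicon: word -> phoneme sequence (ASCII stand-ins for IPA)
LEXICON = {
    "pill": ("p", "I", "l"),
    "bill": ("b", "I", "l"),
    "rip": ("r", "I", "p"),
    "rib": ("r", "I", "b"),
    "tip": ("t", "I", "p"),
    "dip": ("d", "I", "p"),
    "back": ("b", "ae", "k"),
    "bag": ("b", "ae", "g"),
}

# Voiceless/voiced partners that share place and manner of articulation
VOICING_PARTNERS = {frozenset(pair) for pair in [("p", "b"), ("t", "d"), ("k", "g")]}

def differing_positions(a, b):
    """Return the positions where two equal-length sequences differ, else None."""
    if len(a) != len(b):
        return None
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def voicing_minimal_pairs(lexicon):
    """Find word pairs whose only difference is the voicing of one sound."""
    pairs = []
    words = sorted(lexicon)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            diffs = differing_positions(lexicon[w1], lexicon[w2])
            if diffs is None or len(diffs) != 1:
                continue  # not a minimal pair
            pos = diffs[0]
            if frozenset((lexicon[w1][pos], lexicon[w2][pos])) in VOICING_PARTNERS:
                pairs.append((w1, w2))
    return pairs

found = voicing_minimal_pairs(LEXICON)
```

Pairs such as dip/rip are also minimal pairs, but they are skipped here because d and r differ in more than voicing.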
Phonemes vs Allophones
• There’s another difference between pill and bill in English
– The p in pill is aspirated, but the b in bill is not
• /pʰɪl/ vs /bɪl/
• aspiration = air puff when stop consonant is released
• But, there are no minimal pairs of English words that differ
only in whether or not one sound is aspirated
– So, aspiration is a non-distinctive feature in English
– 2 sounds that differ only in aspiration are allophones of the
same phoneme
– allophones = different versions of the “same sound”
• But in Korean, it’s the opposite of English
– aspiration is phonemic
– voicing is allophonic
Another Cross-Linguistic Example
• In English, there is a minimal pair rip and lip
– & many other pairs that differ in just r vs l
– so r and l are different phonemes in English
• In Japanese, there are no minimal pairs that differ only in r vs l
– Instead, there’s a single phoneme that’s somewhere between
the English r and l
– and it has different pronunciations in different contexts
• sometimes it sounds more like English r
• and sometimes like English l
• r and l are both allophones of a single phoneme
• Makes it very difficult for Japanese speakers to hear the
difference in English
– Japanese speakers have learned to categorize all the allophones
as “the same sound”
Distinctive Features Across Languages
• There are many kinds of differences between
speech sounds
– Some are important (= distinctive) & some are not
– Which is which varies across languages
• So, have to learn which are the important ones for
your language
• For English consonants, the distinctive features are:
– Voicing (Voice Onset Time)
– Place of articulation
– Manner of articulation
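One way to picture "which differences are distinctive varies across languages" is as a feature comparison. The sketch below is a simplification: the feature sets, especially the Korean one, only encode the lecture's claims (aspiration phonemic, voicing allophonic) plus the assumption that place and manner are distinctive in both languages. All names are hypothetical.

```python
from typing import NamedTuple

class Consonant(NamedTuple):
    voiced: bool
    place: str
    manner: str
    aspirated: bool

# Standard phonetic descriptions of three bilabial stops
P_ASPIRATED = Consonant(voiced=False, place="bilabial", manner="stop", aspirated=True)   # p in "pill"
P_PLAIN = Consonant(voiced=False, place="bilabial", manner="stop", aspirated=False)      # p in "spill"
B = Consonant(voiced=True, place="bilabial", manner="stop", aspirated=False)             # b in "bill"

# Assumed, simplified sets of distinctive features per language
ENGLISH_DISTINCTIVE = ("voiced", "place", "manner")
KOREAN_DISTINCTIVE = ("aspirated", "place", "manner")

def signals_contrast(a, b, distinctive_features):
    """True if the two sounds differ on at least one distinctive feature,
    i.e. listeners of that language would treat them as different phonemes."""
    return any(getattr(a, f) != getattr(b, f) for f in distinctive_features)
```

Under these assumptions, the aspirated and plain p count as allophones in English but as different phonemes in Korean, and the reverse holds for plain p vs b.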
Speech Perception is Hard!
• Coarticulation
– allows us to talk fast
– which leads to lack of invariance in acoustic signal
Variability in Vowel Production
From Kuhl, et al. (2004), Nat Rev Neurosci
Speech Perception is Hard!
• Coarticulation
– allows us to talk fast
– which leads to lack of invariance
– a series of musical notes changing as fast as speech
sounds do would sound like a blur
– we would not be able to perceive individual notes
– yet we have the impression that we hear each speech
sound
• This has led some researchers to propose that:
– speech perception requires a hard-wired uniquely human
ability that evolved specifically for speech
• What sort of evidence would support this idea?
Evidence about special status of
speech perception
• Categorical Perception
– Inability to hear differences between members of a category
– where category = phoneme
– e.g., variants of /p/ with different VOTs
– Together with ability to hear differences of the same size when
the 2 sounds are members of different categories
– e.g., /p/ vs /b/
• Adults can easily hear only the differences that are important in
their language
– e.g., English speakers easily hear difference between /r/ & /l/
• i.e., they sound like "different sounds"
– while Japanese speakers find it very hard to hear the same difference
• i.e., they sound like "the same sound"
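A rough sketch of why equal-sized VOT differences can be heard so differently. It assumes labeling follows a steep S-shaped (logistic) curve along the VOT continuum; the 25 msec boundary and the slope are hypothetical values, and the difference in labeling probability is used only as a crude proxy for how discriminable two sounds are.

```python
import math

BOUNDARY_MS = 25.0   # hypothetical /b/-/p/ category boundary
SLOPE_PER_MS = 0.5   # hypothetical steepness of the labeling curve

def prob_labeled_p(vot_ms):
    """Probability that a syllable with this VOT is labeled /p/ (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-SLOPE_PER_MS * (vot_ms - BOUNDARY_MS)))

def labeling_difference(vot_a_ms, vot_b_ms):
    """Crude discriminability proxy: how differently the two sounds are labeled."""
    return abs(prob_labeled_p(vot_a_ms) - prob_labeled_p(vot_b_ms))

# Two pairs with the same 20 msec physical difference
within_category = labeling_difference(40.0, 60.0)   # both would be called /p/
across_category = labeling_difference(15.0, 35.0)   # straddles the boundary
```

With these assumed values the within-category pair is labeled almost identically, while the across-category pair is labeled almost completely differently, which is the pattern categorical perception describes.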
Categorical Perception
• Categorical perception is strongest for voicing &
place of articulation for consonants
– Weaker effect for vowels called a “magnet effect”
• Adults show categorical perception for the
differences that are distinctive in their
language
– So, it depends on learning
– How early is it learned?
From Carroll (2004), The psychology of language, 4th Ed.
From Carroll (2004), The psychology of language, 4th Ed.
From Carroll (2004), The psychology of language, 4th Ed.
Testing Infant Speech Perception
• Use a habituation paradigm to test perception
– Infants suck on a pacifier with a transducer in it
– Measure how hard & how often they suck
– Whenever something interesting happens, they suck more
• Play synthetic speech syllables that vary on some feature
– e.g., VOT
– Keep playing same syllable over & over until they're bored with it
and their sucking rate decreases (= habituation)
– Then change the syllable
– If sucking rate goes up, they must have heard the change
– If rate does not go up, either they couldn't hear the change, or
it wasn’t interesting enough
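The logic of the habituation paradigm can be sketched as two checks on sucking rates. Everything numeric here is a hypothetical assumption (the 80% habituation criterion, the 10% recovery threshold, the 2-minute window, the sample rates); actual studies set their own criteria.

```python
def mean(values):
    return sum(values) / len(values)

def is_habituated(rates_per_min, window=2, criterion=0.8):
    """True once the mean of the last `window` rates falls below
    `criterion` times the peak rate seen so far."""
    if len(rates_per_min) < window + 1:
        return False
    return mean(rates_per_min[-window:]) < criterion * max(rates_per_min)

def rate_recovered(pre_change, post_change, window=2, min_increase=1.1):
    """True if sucking right after the change exceeds the habituated
    baseline by at least the factor `min_increase`."""
    baseline = mean(pre_change[-window:])
    return mean(post_change[:window]) > min_increase * baseline

# Hypothetical sucks per minute while the same syllable repeats
same_syllable = [30, 45, 50, 42, 33, 30]
after_category_change = [48, 46]     # e.g., syllable switched across the boundary
after_within_change = [31, 29]       # e.g., same-size change within the category
```

As the slide notes, a recovery shows the change was heard, but a lack of recovery is ambiguous: the infant may not have heard the change, or may not have found it interesting.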
Categorical Perception in Infants
• For VOT
– Play a clear pa over and over
– If then change to one with a different VOT, but that adults would
call ba
• English-hearing infants will speed up sucking rate
• Therefore, they hear the difference
– If instead change to one with a VOT that’s just as different from
the first one, but it’s one adults would still call pa
• Infants don’t speed up
• Therefore, they didn’t hear the change (or it’s not interesting)
• Suggests infants cannot discriminate between different versions
of pa, but can discriminate between pa and ba
– Just like English-speaking adults
– So, English-hearing infants already have categorical perception
From Eimas et al. (1971), Science
Infant Speech Perception Across
Languages
• Infants easily hear many differences that adults don’t
– they start out able to hear differences that are not important in the
language spoken around them
• Japanese-hearing infants start out being able to hear the difference between r
and l just as well as English-hearing infants
• but by ~1 year old, they no longer hear that difference
• All children start out able to hear (most of) the differences that are
important in any human language
– But over their 1st year, they lose the ability to hear differences that are not
important in the language they’re hearing
– the speech perception system gets tuned to hear only the differences that
are important for the language being learned
• Why by 1 year?
• Maybe because that’s when they start to say words? (Werker)
Video segment from PBS series The Mind (1989)
Are there limits to the differences
infants can hear?
• Yes: Lasky et al. (1975)
– Voicing is distinctive for stop consonants in English, Spanish, & Thai
– But the boundary between voiced & voiceless is at different VOT values
[Figure: VOT axis from -60 to +60 msec, marking the voiced/voiceless boundaries for Thai, Spanish, & English]
• The Thai & English boundary values are common to many languages
• The Spanish one is unusual
– Spanish-hearing infants less than 1 year old
• hear the difference between pairs of sounds that straddle both the Thai &
English category boundaries
• but not ones that straddle the Spanish boundary
• So, infants hear most, but not all, differences used in any language
Categorical Perception, cont’d
• The same synthesized stimuli can be perceived as speech or not
– Play formant transition to one ear (sounds like a chirp)
– and steady-state part to other ear (sounds like vowel)
• If tell people it’s speech, they integrate 2 ears & hear it as speech
– but if don't tell them, they don't hear it as sounding like speech
• When they do hear it as speech, get categorical perception
– but not when they don’t hear it as speech
• CP effects much stronger for consonants than for vowels
• What seems to be critical is:
– a short rapidly changing sound (e.g., consonant)
– followed by a longer slower-changing sound (e.g., vowel)
– where both heard as part of a single input
Is categorical perception unique to humans?
(i.e., Is it evidence that speech perception is special?)
• No
– Many other animals show results
like human infants in habituation
paradigms
– They can discriminate between
sounds that humans would call
different phonemes
– and cannot discriminate
between sounds that humans
would call the same phoneme
• So, human speech takes
advantage of properties of
auditory system
– by generally using the differences
that are easy to hear to signal
important contrasts in the
language
What GOOD is categorical
perception???
• Categorical Perception = a failure to discriminate speech
sounds any better than you can identify them
• How can it be desirable to lose the ability to hear differences???
– Speech is hugely variable
• coarticulation
• different speech rates
• different speakers with different voices & accents
• ...
– The auditory system learns to attend to the differences that are
important and to ignore the ones that are not
– Lets us tune out a lot of irrelevant variability
• Can adults re-learn to hear differences they’ve learned to ignore?
– Yes, but it requires a particular kind of training
McGurk Effect
Visual cues in speech perception
• Conflicting acoustic and visual cues can lead to blended
perception of sound
– If there’s a sound in the language that’s
• close enough to the acoustic signal
• & fits with the visual cues
More on Visual Context Effects
(Gilbert, Lansing, & Garnsey, in prep)
• Participants heard either /ba/ or /ga/ (50-50)
• Task = Did you hear /ba/? (50-50)
• Syllables embedded in several levels of noise
as well as in quiet
• Simultaneous visual cue
– Static rectangle
– Static smiling face
– Chewing face (irrelevant motion)
– Speaking face (relevant motion)
Accuracy
[Figure: d' sensitivity (y-axis, 0.0 - 3.5) by visual cue type / presentation condition (x-axis: Rect/AR, Smile/ASF, Chew/ADF, Speak/AV, VO), plotted separately for Quiet, 0 dB SNR, -9 dB SNR, & -18 dB SNR]
- Informative facial motion completely compensates for noise
- Other facial cues have no effect on accuracy
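For reference, d' for a yes/no task like "Did you hear /ba/?" is the difference between the z-transformed hit rate and false-alarm rate. The sketch below is a generic illustration, not the study's analysis code. The log-linear correction (adding 0.5 to each count) is an assumption made here to avoid rates of exactly 0 or 1, and the trial counts in the usage example are made up.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    hit         = answered "yes" when /ba/ was played
    false alarm = answered "yes" when /ga/ was played
    """
    # Log-linear correction (an assumption): keeps rates away from 0 and 1,
    # where the z-transform would be infinite
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts out of 50 /ba/ trials and 50 /ga/ trials
guessing = d_prime(hits=25, misses=25, false_alarms=25, correct_rejections=25)
accurate = d_prime(hits=45, misses=5, false_alarms=5, correct_rejections=45)
```

A d' of 0 means the listener cannot tell /ba/ from /ga/ at all; larger values mean better discrimination regardless of any bias toward answering "yes".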
Event-Related Brain Potentials (ERPs)
N100 component
[Figure: ERP waveforms for the Speak, Chew, & Smile conditions, with the N100 marked]
- Earlier & smaller when speech easy to identify
- Irrelevant face motion speeds up N100 just as much as relevant motion
- But doesn’t reduce its amplitude
- Maybe potentially relevant face motion serves an alerting function?
Phoneme Restoration
• Replace one phoneme in an utterance with noise
– If the phoneme is predictable from context, people
“hear” the missing sound (e.g., legi*lature)
– If tell them a sound has been replaced, they’re not
accurate at identifying which sound it is
– Warren & Warren (1970)
• Stimuli (acoustically identical except for last word)
– It was found that the *eel was on the orange.
– It was found that the *eel was on the axle.
– It was found that the *eel was on the shoe.
– It was found that the *eel was on the table.
• People believed they had heard the phoneme that
made sense given the final word
– Final word comes after *eel, so it can’t have influenced what they heard at the moment *eel was played