Speech Perception Lecture

Acoustics
• Acoustics = physics of sound
• Sound = moving air particles
• Frequency of motion is measured in Hz (= hertz = cycles/sec)
• Complex sounds = consist of many different frequencies simultaneously
  – lowest frequency = fundamental frequency (F0)
    • determines pitch
  – other higher frequencies = harmonics = overtones
    • determine timbre
• The voice is a complex sound

09/01/10 Psyc / Ling / Comm 525 Fall 2010

Some Different Ways to Depict Sound

Acoustics of Speech
• Fundamental Frequency (F0)
  – basic pitch of voice
  – rate at which whole vocal cords vibrate
• Plus harmonics (= overtones)
  – other higher frequencies in voice
  – faster rates at which parts of vocal cords & other structures vibrate
• Resonance (= sympathetic vibration)
  – rest of vocal tract enhances some frequencies & inhibits others
  – freqs that are enhanced or inhibited depend on vocal tract shape
  – which depends on positions of articulators
  – Produces formants
    • enhanced frequency bands
    • usually 3-4 formants in speech: F1, F2, F3 (& F4)

Speech & Hearing Frequencies
• Human hearing
  – 20 - 20,000 Hz
  – Most sensitive at 500 - 5,000 Hz
• Human voice fundamental frequency
  – Average for men = 80 - 200 Hz
  – Average for women = up to 400 Hz
• Telephone:
  – Cuts off at ~3000 Hz
  – Crucial information for identifying some sounds (fricatives) is lost

From Carroll (2004), The psychology of language, 4th Ed.

English Spelling

A Dreadful Language
I take it you already know
Of tough and bough and cough and dough.
Others may stumble, but not you,
On hiccough, thorough, touch, and through;
Well done! And now you wish perhaps
To learn of less familiar traps?
Beware of heard, a dreadful word
That looks like beard and sounds like bird.
And dead: it's said like bed, not bead
For goodness sake, don't call it "deed".
Watch out for meat and great and threat
(They rhyme with suite and straight and debt).
A moth is not a moth in mother,
Nor both in bother, nor broth in brother.
And here is not a match for there,
Nor dear and fear for bear and pear.
And then there's dose and rose and lose
Just look them up - and goose and choose,
And cork and work and word and sword,
And do and go and thwart and cart.
Come, come I've hardly made a start.
A dreadful language? Man alive
I mastered it when I was five!

International Phonetic Alphabet (IPA)
• 1 sound = 1 symbol
• Symbols for all speech sounds in all languages
• Phonetic writing makes pronunciation completely unambiguous
  – Some languages have writing systems that are close to phonetic (Korean, Italian)
  – Some other languages have writing systems that indicate less about pronunciation (Mandarin?)

(“Standard” American) From Carroll (2004), The psychology of language, 4th Ed.

From Carroll (2004), The psychology of language, 4th Ed.

Coarticulation
• Each sound partially shaped by sounds before & after it
  – keel vs kill vs cool
  – /kil/ vs /kɪl/ vs /kul/ (IPA characters)
  – place of articulation and rounding on the k differ a lot
  – so, different versions of “the same sound” in different contexts
  – and from different speakers
• This is what allows us to talk so fast

Coarticulation Across Languages
• How different can different versions of a sound be & still be heard as “the same sound”?
  – Different for different languages
  – A back rounded k and a front unrounded k sound like “the same sound” to English speakers
    • but that same difference is enough to make them sound like 2 different sounds in some other languages

Phonemes
• In English, a difference in voicing makes 2 sounds “different sounds”
  – pill /pɪl/ vs bill /bɪl/
  – p = voiceless
  – b = voiced
• Can find many other minimal pairs of English words where the only difference is whether or not one sound is voiced
  – rip vs rib
  – bat vs bad
  – tip vs dip
  – cap vs cab
  – back vs bag
• Therefore, voicing is a distinctive feature in English
  – and 2 sounds that differ only in voicing are different phonemes
  – phoneme = sound that can signal a meaning difference

Phonemes vs Allophones
• There’s another difference between pill and bill in English
  – The p in pill is aspirated, but the b in bill is not
    • /pʰɪl/ vs /bɪl/
    • aspiration = air puff when stop consonant is released
• But, there are no minimal pairs of English words that differ only in whether or not one sound is aspirated
  – So, aspiration is a non-distinctive feature in English
  – 2 sounds that differ only in aspiration are allophones of the same phoneme
  – allophones = different versions of the “same sound”
• But in Korean, it’s the opposite of English
  – aspiration is phonemic
  – voicing is allophonic

Another Cross-Linguistic Example
• In English, there is a minimal pair rip and lip
  – & many other pairs that differ in just r vs l
  – so r and l are different phonemes in English
• In Japanese, there are no minimal pairs that differ only in r vs l
  – Instead, there’s a single phoneme that’s somewhere between the English r and l
  – and it has different pronunciations in different contexts
    • sometimes it sounds more like English r
    • and sometimes like English l
    • r and l are both allophones of a single phoneme
• Makes it very difficult for Japanese speakers to hear the difference in English
  – Japanese speakers have learned to categorize all the allophones as “the same sound”

Distinctive Features Across Languages
• There are many kinds of differences between speech sounds
  – Some are important (= distinctive) & some are not
  – Which is which varies across languages
    • So, have to learn which are the important ones for your language
• For English consonants, the distinctive features are:
  – Voicing (Voice Onset Time)
  – Place of articulation
  – Manner of articulation

Speech Perception is Hard!
• Coarticulation
  – allows us to talk fast
  – which leads to lack of invariance in acoustic signal

Variability in Vowel Production
From Kuhl et al. (2004), Nat Rev Neurosci

Speech Perception is Hard!
• Coarticulation
  – allows us to talk fast
  – which leads to lack of invariance
  – a series of musical notes changing as fast as speech sounds do would sound like a blur
  – we would not be able to perceive individual notes
  – yet we have the impression that we hear each speech sound
• This has led some researchers to propose that:
  – speech perception requires a hard-wired uniquely human ability that evolved specifically for speech
• What sort of evidence would support this idea?

Evidence about special status of speech perception
• Categorical Perception
  – Inability to hear differences between members of a category
  – where category = phoneme
  – e.g., variants of /p/ with different VOTs
  – Together with ability to hear differences of the same size when the 2 sounds are members of different categories
  – e.g., /p/ vs /b/
• Adults can easily hear only the differences that are important in their language
  – e.g., English speakers easily hear difference between /r/ & /l/
    • i.e., they sound like “different sounds”
  – while Japanese speakers find it very hard to hear the same difference
    • i.e., they sound like “the same sound”

Categorical Perception
• Categorical perception is strongest for voicing & place of articulation for consonants
  – Weaker effect for vowels called a “magnet effect”
• Adults show categorical perception for the differences that are distinctive in their language
  – So, it depends on learning
  – How early is it learned?

From Carroll (2004), The psychology of language, 4th Ed.

From Carroll (2004), The psychology of language, 4th Ed.

From Carroll (2004), The psychology of language, 4th Ed.
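
The pattern described in the Categorical Perception slides (poor discrimination within a category, good discrimination across the boundary) can be illustrated with a small sketch. Everything numeric here is an assumption for illustration only: the logistic labeling curve, the 25 msec boundary, the slope, and the function names `p_voiceless` and `predicted_abx` are hypothetical. The rule 0.5 + 0.5(p1 - p2)^2 is one classic way to predict ABX discrimination from labeling alone; it is not taken from the slides.

```python
import math

BOUNDARY_MS = 25.0   # assumed /b/-/p/ boundary on the VOT continuum (illustrative)
SLOPE_PER_MS = 0.5   # assumed steepness of the labeling curve (illustrative)


def p_voiceless(vot_ms: float) -> float:
    """Probability of labeling a syllable as /pa/ (hypothetical logistic curve)."""
    return 1.0 / (1.0 + math.exp(-SLOPE_PER_MS * (vot_ms - BOUNDARY_MS)))


def predicted_abx(vot_a: float, vot_b: float) -> float:
    """Predicted proportion correct in ABX if listeners rely only on labels.

    0.5 is chance; the prediction rises as the two stimuli get different labels.
    """
    diff = p_voiceless(vot_a) - p_voiceless(vot_b)
    return 0.5 + 0.5 * diff ** 2


if __name__ == "__main__":
    # Two pairs with the same 20 msec physical difference.
    print("across boundary (10 vs 30 msec):", round(predicted_abx(10, 30), 3))
    print("within /pa/     (40 vs 60 msec):", round(predicted_abx(40, 60), 3))
```

Under these assumed values, the pair that straddles the boundary is predicted to be discriminated well, while the equally spaced pair inside the /pa/ category stays near chance, which is the signature of categorical perception.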

Testing Infant Speech Perception
• Use a habituation paradigm to test perception
  – Infants suck on a pacifier with a transducer in it
  – Measure how hard & how often they suck
  – Whenever something interesting happens, they suck more
• Play synthetic speech syllables that vary on some feature
  – e.g., VOT
  – Keep playing same syllable over & over until they're bored with it and their sucking rate decreases (= habituation)
  – Then change the syllable
  – If sucking rate goes up, they must have heard the change
  – If rate does not go up, either they couldn't hear the change, or it wasn’t interesting enough

Categorical Perception in Infants
• For VOT
  – Play a clear pa over and over
  – If then change to a syllable with a different VOT that adults would call ba
    • English-hearing infants will speed up sucking rate
    • Therefore, they hear the difference
  – If instead change to one with a VOT that’s just as different from the first one, but it’s one adults would still call pa
    • Infants don’t speed up
    • Therefore, they didn’t hear the change (or it’s not interesting)
• Suggests infants cannot discriminate between different versions of pa, but can discriminate between pa and ba
  – Just like English-speaking adults
  – So, English-hearing infants already have categorical perception

From Eimas et al. (1971), Science

Infant Speech Perception Across Languages
• Infants easily hear many differences that adults don’t
  – they start out able to hear differences that are not important in the language spoken around them
    • Japanese-hearing infants start out being able to hear the difference between r and l just as well as English-hearing infants
    • but by ~1 year old, they no longer hear that difference
• All children start out able to hear (most of) the differences that are important in any human language
  – But over their 1st year, they lose the ability to hear differences that are not important in the language they’re hearing
  – the speech perception system gets tuned to hear only the differences that are important for the language being learned
• Why by 1 year?
  – Maybe because that’s when they start to say words? (Werker)

Video segment from PBS series The Mind (1989)

Are there limits to the differences infants can hear?
• Yes: Lasky et al. (1975)
  – Voicing is distinctive for stop consonants in English, Spanish, & Thai
  – But the boundary between voiced & voiceless is at different VOT values
[Figure: VOT axis (msec) from -60 to +60, marking the Thai, Spanish, and English boundaries]
    • The Thai & English boundary values are common to many languages
    • The Spanish one is unusual
  – Spanish-hearing infants less than 1 year old
    • hear the difference between pairs of sounds that straddle both the Thai & English category boundaries
    • but not ones that straddle the Spanish boundary
• So, infants hear most, but not all, differences used in any language

Categorical Perception, cont’d
• The same synthesized stimuli can be perceived as speech or not
  – Play formant transition to one ear (sounds like a chirp)
  – and steady-state part to other ear (sounds like vowel)
• If you tell people it’s speech, they integrate the 2 ears & hear it as speech
  – but if you don't tell them, they don't hear it as sounding like speech
• When they do hear it as speech, get categorical perception
  – but not when they don’t hear it as speech
• CP effects much stronger for consonants than for vowels
• What seems to be critical is:
  – a short rapidly changing sound (e.g., consonant)
  – followed by a longer slower-changing sound (e.g., vowel)
  – where both are heard as part of a single input

Is categorical perception unique to humans? (i.e., Is it evidence that speech perception is special?)
• No
  – Many other animals show results like human infants in habituation paradigms
  – They can discriminate between sounds that humans would call different phonemes
  – and cannot discriminate between sounds that humans would call the same phoneme
• So, human speech takes advantage of properties of auditory system
  – by generally using the differences that are easy to hear to signal important contrasts in the language

What GOOD is categorical perception???
• Categorical Perception = a failure to discriminate speech sounds any better than you can identify them
• How can it be desirable to lose the ability to hear differences???
  – Speech is hugely variable
    • coarticulation
    • different speech rates
    • different speakers with different voices & accents
    • ...
  – The auditory system learns to attend to the differences that are important and to ignore the ones that are not
  – Lets us tune out a lot of irrelevant variability
• Can adults re-learn to hear differences they’ve learned to ignore?
  – Yes, but it requires a particular kind of training

McGurk Effect
Visual cues in speech perception
• Conflicting acoustic and visual cues can lead to blended perception of sound
  – If there’s a sound in the language that’s
    • close enough to the acoustic signal
    • & fits with the visual cues

More on Visual Context Effects (Gilbert, Lansing, & Garnsey, in prep)
• Participants heard either /ba/ or /ga/ (50-50)
• Task = Did you hear /ba/? (50-50)
• Syllables embedded in several levels of noise as well as in quiet
• Simultaneous visual cue
  – Static rectangle
  – Static smiling face
  – Chewing face (irrelevant motion)
  – Speaking face (relevant motion)

Accuracy
[Figure: d' sensitivity (0.0 - 3.5) by visual cue type (Rect, Smile, Chew, Speak) and presentation condition (AR, ASF, ADF, AV, VO), at four noise levels (Quiet, 0 dB SNR, -9 dB SNR, -18 dB SNR)]
• Informative facial motion completely compensates for noise
• Other facial cues have no effect on accuracy

Event-Related Brain Potentials (ERPs)
N100 component
[Figure: N100 for the Speak, Chew, and Smile conditions]
• Earlier & smaller when speech easy to identify
• Irrelevant face motion speeds up N100 just as much as relevant motion
  – But doesn’t reduce its amplitude
• Maybe potentially relevant face motion serves an alerting function?

Phoneme Restoration
• Replace one phoneme in an utterance with noise
  – If the phoneme is predictable from context, people “hear” the missing sound (e.g., legi*lature)
  – If you tell them a sound has been replaced, they’re not accurate at identifying which sound it is
  – Warren & Warren (1970)
• Stimuli (acoustically identical except for last word)
  – It was found that the *eel was on the orange.
  – It was found that the *eel was on the axle.
  – It was found that the *eel was on the shoe.
  – It was found that the *eel was on the table.
• People believed they had heard the phoneme that made sense given the final word
  – Final word can’t have influenced what they heard at *eel
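
The Accuracy figure in the visual-context study earlier in this section reports d' sensitivity. As a minimal sketch of how d' is conventionally computed for a yes/no task such as "Did you hear /ba/?", the code below assumes the standard equal-variance signal detection formula d' = z(hit rate) - z(false-alarm rate). The function name `d_prime` and the example rates are hypothetical and are not data from the study.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Equal-variance signal detection sensitivity: z(H) - z(FA).

    Rates must be strictly between 0 and 1; rates of exactly 0 or 1 need a
    correction before the z-transform, which this sketch does not apply.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


if __name__ == "__main__":
    # Hypothetical rates: "yes" on 90% of /ba/ trials and 20% of /ga/ trials.
    print(round(d_prime(0.90, 0.20), 2))
    # Guessing (equal hit and false-alarm rates) gives zero sensitivity.
    print(d_prime(0.50, 0.50))
```

Because d' separates sensitivity from response bias, it is a natural accuracy measure when listeners answer the same yes/no question across noise levels and visual cue types.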
