Continuous acoustic detail affects spoken word recognition: implications for cognition, development and language disorders.
Bob McMurray, University of Iowa, Dept. of Psychology
Collaborators: Richard Aslin, Michael Tanenhaus, David Gow, J. Bruce Tomblin, Joe Toscano, Cheyenne Munson, Dana Subik, Julie Markant

Why speech and word recognition?
1) Interface between perception and cognition: basic categories, meaning; continuous input -> discrete representations.
2) Meaningful stimuli are almost always temporal: music, visual scenes (across saccades), language.
3) We understand the cognitive processes (word recognition), the perceptual processes (speech perception), and the ecology of the input (phonetics).
4) Speech is important: disordered language.

Divisions, Divisions…
The fields divide the problem: Linguistics (phonetics; phonology, the lexicon), Psychology (speech perception; word recognition, sentence processing), Speech & Hearing (speech/language pathology), spanning perception (& action), language, and cognition.
Divisions are useful for framing research and focusing questions. But divisions between domains of study can become implicit models of cognitive processing.

Divisions in spoken language understanding
Speech perception: categorization of acoustic input into sublexical units (acoustic signal -> sublexical units).
Word recognition: identification of the target word from active sublexical units (sublexical units -> lexicon).

Divisions yield processes
Speech perception: pattern recognition, normalization processes, stream segregation.
Word recognition: competition, activation, constraint satisfaction.

Processes yield models
Speech perception: extract invariant phonemes and features; discard continuous variation.
Word recognition: identify a single referent; ignore competitors.
Each stage reduces continuous variance.
The Variance Reduction Model (VRM)
Acoustic input -> (remove variance) -> phonemes (etc.) -> (remove variance) -> words.
Understanding speech is a process of progressively extracting invariant, discrete representations from variable, continuous input. On this view, continuous speech cues play a minimal role in word recognition (and probably wouldn't be helpful anyway).

Temporal integration
The VRM might apply if speech were static. Consider identifying the /u/ in "goon":
Signal: low F1 and F2, high F3.
Noise (to be reduced): initially F2 decreasing; later F2 increasing; presence of an anti-formant.
But the dynamic properties make this more difficult: the early material is gone (maybe in short-term memory?), and the later material hasn't happened yet.
Under variance utilization, the same "noise" is signal': the initial F2 fall cues the prior /g/; the later F2 rise and the anti-formant cue the upcoming /n/.

Goals
1) Replace the Variance Reduction Model with the Variance Utilization Model (VUM).
2) Show that normal lexical activation processes can serve as variance utilization mechanisms.
3) Speculatively (and not so speculatively) examine the consequences for: temporal integration / short-term memory; development; non-normal development.

Outline
1) Review: origins of the VRM; spoken word recognition.
2) Empirical test.
3) The VUM: lexical locus, temporal integration, SLI proposal.
4) Developmental consequences: empirical tests, computational model, CI proposal.

Online spoken word recognition
• Information arrives sequentially.
• Fundamental problem: at early points in time, the signal is temporarily ambiguous.
Example: on hearing "ba…", candidates such as basic, bakery, barrier, bait, barricade, and baby are all consistent with the input; later-arriving information disambiguates the word.

Current models of spoken word recognition
• Immediacy: hypotheses are formed from the earliest moments of input.
• Activation-based: lexical candidates (words) receive activation to the degree they match the input.
• Parallel processing: multiple items are active in parallel.
• Competition: items compete with each other for recognition.
As the input "b… u… tt… e… r" unfolds over time, butter wins out over competitors such as beach, bump, and putter (and unrelated items like dog).

These processes have been well defined for a phonemic representation of the input. But there is considerably less ambiguity if we consider subphonemic information. Bonus: processing dynamics may solve problems in speech perception.

Example: subphonemic effects of motor processes, i.e., coarticulation. Any action reflects future actions as it unfolds. Articulation (lips, tongue…) reflects current, future, and past events, so subtle subphonemic variation in speech reflects temporal organization. Sensitivity to these perceptual details might yield earlier disambiguation, and lexical activation could retain these perceptual details.

Review: these processes have largely been ignored because of a history of evidence that perceptual variability gets discarded. Example: categorical perception.

Categorical Perception
• Sharp identification of tokens on a continuum (e.g., % /pa/ responses as a function of VOT).
• Discrimination is poor within a phonetic category and good across the boundary.
On this view, subphonemic variation in VOT is discarded in favor of a discrete symbol (the phoneme).
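The sharp identification function that defines categorical perception is conventionally modeled as a logistic over the continuum, with the category boundary at the 50% crossover. A minimal sketch, using the ~17 ms boundary reported later for Experiment 1; the slope value is a hypothetical illustration, not a fitted parameter:

```python
import math

def logistic_id(vot, boundary=17.25, slope=0.5):
    """Probability of a /p/ response at a given VOT (ms), modeled as a
    logistic function whose 50% crossover sits at the category boundary."""
    return 1.0 / (1.0 + math.exp(-slope * (vot - boundary)))

# A steep slope reproduces the "categorical" look: near-0 %/p/ at
# VOT = 0 ms, near-100% at VOT = 40 ms, crossing 0.5 at the boundary.
curve = {vot: logistic_id(vot) for vot in range(0, 41, 5)}
```

The within-category flatness of such a curve is exactly what the eye-movement data below are designed to probe: identification can look categorical even if underlying activation is gradient.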
Categorical Perception
Evidence against the strong form of categorical perception comes from psychophysical-type tasks:
• Discrimination tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977).
• Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982).
• Goodness ratings: Miller (1997); Massaro & Cohen (1983).

Variance Reduction Model
CP enabled a fundamental independence of speech perception and spoken word recognition. Evidence against CP was seen as still supporting the VRM (an auditory vs. phonological processing mode). Critical prediction: continuous variation in the signal should not affect word recognition.

Experiment 1 (McMurray, Aslin & Tanenhaus, 2002)
Does within-category acoustic detail systematically affect higher-level language? Is there a gradient effect of subphonemic detail on lexical activation?
A gradient relationship would yield systematic effects of subphonemic information on lexical activation. If this gradiency is useful for temporal integration, it must be preserved over time. We therefore need a design sensitive to both acoustic detail and the detailed temporal dynamics of lexical activation.

Acoustic detail
Use a speech continuum: more steps yield a better picture of the acoustic mapping. KlattWorks: generate synthetic continua from natural speech. 9-step VOT continua (0-40 ms), 6 pairs of words: beach/peach, bump/pump, bale/pale, bomb/palm, bear/pear, butter/putter. Fillers: lamp, leg, ladder, leaf, lock, lip; shark, shell, ship, shirt, shoe, sheep.

Temporal dynamics
How do we tap online recognition? With an online task: eye movements. Subjects hear spoken language and manipulate objects in a visual world. The visual world includes a set of objects with interesting linguistic properties (e.g., a beach, a peach, and some unrelated items). Eye movements to each object are monitored throughout the task (Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995).

Why use eye movements and the visual world paradigm?
• Relatively natural task.
• Eye movements are generated very fast (within 200 ms of the first bit of information).
• Eye movements are time-locked to speech.
• Subjects aren't aware of their eye movements.
• Fixation probability maps onto lexical activation.

Task
A moment to view the items, then the spoken word (e.g., "bear"); the subject clicks on the named picture. Repeat 1080 times.

Identification results
High agreement across subjects and items for the category boundary: by subject, 17.25 +/- 1.33 ms; by item, 17.24 +/- 1.24 ms.

Eye-movement analysis
Fixations are aggregated across trials into fixation proportions over time (target = bear, competitor = pear, unrelated = lamp, ship), starting ~200 ms after word onset.

Eye-movement results
At both continuum endpoints (VOT = 0, response "bear"; VOT = 40, response "pear"), there are more looks to the competitor than to unrelated items.
Given that the subject heard bear and clicked on "bear", how often was the subject looking at the "pear"? A categorical pattern would show identical competitor fixations for all within-category VOTs; a gradient pattern would show competitor fixations increasing as VOT approaches the boundary.
Competitor fixations broken down by VOT step (0-40 ms in 5 ms steps) show a long-lasting gradient effect, seen throughout the timecourse of processing.
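The gradient claim above is, at bottom, a monotone relationship between VOT and competitor fixations, testable as a linear trend over the area under the fixation curve. A minimal sketch of that analysis; the fixation proportions here are hypothetical illustration values, not the experiment's data:

```python
def linear_trend(x, y):
    """Least-squares slope of y on x. A positive slope over /b/ responses
    means more competitor looks as VOT approaches the category boundary."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

# Hypothetical area-under-the-curve competitor fixations for /b/ responses:
# looks to "pear" after clicking "bear", per VOT step (ms).
vots = [0, 5, 10, 15]
comp_fix = [0.030, 0.038, 0.047, 0.055]
slope = linear_trend(vots, comp_fix)  # positive slope = gradient effect
```

In the actual analysis the same trend test is run separately for /b/ and /p/ responses, and again restricted to unambiguous stimuli, which is what the p-values on the next slides report.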
Eye-movement results
Area under the curve: clear effects of VOT on competitor looks (B: p = .017; P: p < .001), with significant linear trends (B: p = .023; P: p = .002). Restricting to unambiguous stimuli only: clear effects of VOT (B: p = .014; P: p = .001) and linear trends (B: p = .009; P: p = .007). Looks to the competitor increase as VOT approaches the category boundary.

Summary
Subphonemic acoustic differences in VOT have a gradient effect on lexical activation:
• Gradient effect of VOT on looks to the competitor.
• The effect holds even for unambiguous stimuli.
• It seems to be long-lasting.
Consistent with a growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).

Lexical sensitivity
The basic effect has been extended to other phonetic cues, suggesting a general property of word recognition:
• Voicing (b/p)¹; laterality (l/r), manner (b/w), place (d/g)¹
• Vowels (i/I, …)²
• Natural speech (VOT)³
• Metalinguistic tasks³
¹ McMurray, Clayards, Tanenhaus & Aslin (2004); ² McMurray & Toscano (in prep); ³ McMurray, Aslin, Tanenhaus, Spivey and Subik (submitted)
(Figure: with natural-speech VOT, competitor fixations to B rise toward the category boundary for both B and P responses.)
The Variance Utilization Model
1) Word recognition is systematically sensitive to subphonemic acoustic detail.
2) Acoustic detail is represented as gradations in activation across the lexicon.
3) Normal word recognition processes do the work of:
• Maintaining detail
• Sharpening categories
• Anticipating upcoming material
• Resolving prior ambiguity

As the input "b… u… m… p…" unfolds over the lexicon (bump, pump, dump, bun, bumper, bomb):
• Gradations in phonetic cues (e.g., b/p, b/d) are preserved as relative lexical activation.
• Non-phonemic distinctions are preserved (e.g., vowel length: Gow & Gordon, 1995; Salverda, Dahan & McQueen, 2003).
• Material is only retained until it is no longer needed (e.g., n/m information is eventually lost). Words are a conveniently sized unit.
• No need for explicit short-term memory: lexical activation persists over time.
• Lexical competition: perceptual warping (a la CP) results from natural competition processes.
The Variance Utilization Model and current models of spoken word recognition
• Immediacy: phonetic cues are not simultaneous; activation retains early cues.
• Activation-based: graded response to graded input.
• Parallel processing: preserves alternative interpretations until confident; anticipatory activation for future possibilities.
• Competition: non-linear transformation of perceptual space.

Can lexical activation help integrate continuous acoustic cues over time?
• Regressive ambiguity resolution.
• Anticipation of upcoming material.

Experiment 2: Regressive ambiguity resolution
How long are gradient effects of within-category detail maintained? Can subphonemic variation play a role in ambiguity resolution? How is information at multiple levels integrated?

Misperception
What if the initial portion of a stimulus was misperceived? If the competitor is still active, it is easy to activate it the rest of the way; if the competitor is completely inactive, the system will "garden-path". P(misperception) scales with distance from the boundary, so gradient activation allows the system to hedge its bets.

Example: /beIrəkeId/ vs. /peIrəkit/ (barricade vs. parakeet). Given an ambiguous initial b/p, a categorical lexicon commits early; gradient sensitivity keeps both parakeet and barricade partially active until the disambiguating material ("…rəkit") arrives.

Methods (McMurray, Tanenhaus & Aslin, in prep): 10 pairs of voiced/voiceless items (b/p and d/t).
Voiced / Voiceless (phonemes of overlap)
Bumpercar / Pumpernickel (6)
Barricade / Parakeet (5)
Bassinet / Passenger (5)
Blanket / Plankton (5)
Beachball / Peachpit (4)
Billboard / Pillbox (4)
Drain Pipes / Train Tracks (4)
Dreadlocks / Treadmill (4)
Delaware / Telephone (4)
Delicatessen / Television (4)

Eye-movement results
Barricade -> "parricade" and parakeet -> "barakeet": faster activation of the target as VOT nears the lexical endpoint, even within the non-word range. The effect of VOT is reduced over time as lexical information takes over.

Experiment 2b
Are the results driven by the presence of the visual competitor, or is this a natural process of lexical activation? (Look, Ma, no parakeet!)
Results: the effect is found even without a visual competitor. Regressive ambiguity resolution is a general property of lexical processes.

Experiment 2 conclusions
• Gradient effect of within-category variation without minimal pairs.
• The gradient effect is long-lasting: mean point of disambiguation (POD) = 240 ms.
• The effect is not driven by visual context.
• Regressive ambiguity resolution: subphonemic gradations are maintained until more information arrives; they are not maintained after the POD; and they can improve (or hinder) recovery from a garden path.
Can lexical activation help integrate continuous acoustic cues over time?
• Regressive ambiguity resolution (Experiment 2).
• Anticipation of upcoming material.

Progressive expectation formation
Can within-category detail be used to predict future acoustic/phonetic events? Yes: phonological regularities create systematic within-category variation that predicts future events (Gow & McMurray, in press).

Experiment 3: Anticipation
Word-final coronal consonants (n, t, d) assimilate the place of the following segment: "maroong goose" vs. "maroon duck". Place assimilation yields ambiguous segments that anticipate upcoming material: as "m… a… r… oo… ng…" unfolds, the assimilated nasal favors goose/goat over duck.

Methods
Subjects hear "select the maroon duck", "select the maroon goose", "select the maroong goose", or "select the maroong duck" (the last is unlicensed, since assimilation should only occur before a velar). Prediction: faster eye movements to "goose" after assimilated consonants.

Results
Looks to "goose" as a function of time (from the onset of "goose" plus oculomotor delay) show an anticipatory effect of assimilation on looks to the non-coronal. Looks to "duck" show an inhibitory effect on looks to the coronal (p = .024).

Summary
Sensitivity to subphonemic detail:
• Increases priors on likely upcoming events.
• Decreases priors on unlikely upcoming events.
• An active temporal integration process.
Occasionally assimilation creates ambiguity; resolving it ("mudg drinker") parallels Experiment 2. The progressive effect is delayed ~200 ms by lexical competition, supporting a lexical locus.
Adult summary
Lexical activation is exquisitely sensitive to within-category detail. This sensitivity is useful for integrating material over time:
• Regressive ambiguity resolution.
• Progressive facilitation.
This underpins a potentially lexical role in speech perception.

Consequences for language disorders
Word recognition is not separable from speech perception. Specific Language Impairment (SLI) involves deficits in:
• Speech perception: less categorical perception (some debate: Thibodeaux & Sussman, 1979; Coady, Kluender & Evans, in press; Manis et al., 1997; Serniclaes et al., 2004; Van Alphen et al., 2004).
• Word recognition: slower recognition (Montgomery, 2002; Dollaghan, 1998).
Could word recognition deficits account for apparent perceptual deficits?

The Variance Utilization Model and categorical perception
Categorical perception: stimuli in the same category become closer in perceptual space (e.g., Goldstone, 2001).
Lexical competition: the most active lexical candidate inhibits alternatives, becomes more active, and so becomes more similar to the prototype. It feeds back to alter phoneme representations (Magnuson, McMurray, Tanenhaus & Aslin, 2003), so two versions of the same word (category) become more similar.
Example: an input that is 80% /b/, 20% /p/ ("beach"/"peach") activates beach more strongly; competition pushes the balance to 90/10, and feedback makes the phoneme-level pattern [90 10] more similar to the prototype [100 0]. This is the critical step: perceptual space is warped. If competition is suppressed (e.g., by a low-familiarity word), there should be less CP and greater sensitivity to within-category detail.

Consequences for language disorders
The visual world paradigm is an ideal test:
• Simple task, usable with many populations.
• No meta-linguistic knowledge required.
• Used to examine: lexical activation (Allopenna et al., 1998), lexical competition (Dahan et al., 2001), within-category sensitivity (McMurray et al., 2002).

Consequences for language disorders: proposed research program (with J. Bruce Tomblin, V. Samelson, and S. Lee)
Population: SLI and normal adolescents, 16-17 y.o., from the Iowa Longitudinal Study (Tomblin et al.).
Step 1: Word familiarity (~200 words).
Step 2: Basic word recognition (stimuli: beaker, beetle, speaker, etc.).
Step 3: Frequency effects: familiar words more active than unfamiliar.
Step 4: Gradiency (sensitivity to VOT), predicted to be suppressed for familiar words (high competition).
Step 5: How do we buttress lexical activation?

Consequences of the VUM
Word recognition is sensitive to perceptual detail (temporal integration). Word recognition supports perceptual processes (hypothesis: related to SLI). Continuous variability is NOT discarded during recognition. Does this change how we think about development?

Development
Historically, work in speech perception has been linked to development, so sensitivity to subphonemic detail must revise our view of development. Infants face additional problems: no lexicon is available to clean up noisy input (they must rely on acoustic regularities), and they must extract a phonology from the series of utterances.

Sensitivity to subphonemic detail in infants?
For 30 years, virtually all attempts to address this question have yielded categorical discrimination (e.g., Eimas, Siqueland, Jusczyk & Vigorito, 1971). Exception: Miller & Eimas (1996), but only at extreme VOTs and only when habituated to a nonprototypical token.
Nonetheless, infants possess abilities that would require within-category sensitivity:
• Infants can use allophonic differences at word boundaries for segmentation (Jusczyk, Hohne & Bauman, 1999; Hohne & Jusczyk, 1994).
• Infants can learn phonetic categories from distributional statistics (Maye, Werker & Gerken, 2002; Maye & Weiss, 2004).
Statistical category learning
Speech production causes clustering along contrastive phonetic dimensions, e.g., voicing / voice onset time: /b/ has VOT ~0 ms and /p/ ~40 ms. Within a category, VOT forms a Gaussian distribution; the result is a bimodal distribution over the continuum.

To statistically learn speech categories, infants must:
• Record frequencies of tokens at each value along a stimulus dimension.
• Extract categories (+voice vs. -voice) from the distribution.
This requires the ability to track specific VOTs. Known statistical learning abilities (Maye et al.) therefore predict within-category sensitivity and graded category structure. So why no demonstrations of such sensitivity?
• Habituation: discrimination, not identification; possible selective adaptation; possible attenuation of sensitivity.
• Synthetic speech: not ideal for infants.
• Single exemplar/continuum: not necessarily a category representation.
Experiment 4 reassesses the issue with improved methods.

HTPP: Head-Turn Preference Procedure (Jusczyk & Aslin, 1995)
Infants are exposed to a chunk of language: words in running speech, a stream of continuous speech (a la the statistical learning paradigm), or a word list. Memory for the exposed items (or abstractions over them) is then assessed by comparing listening time between consistent and inconsistent items.
Test trials: all lights off; the center light blinks to bring the infant's attention to center; then one of the side lights blinks. When the infant looks at the side light, he hears a word ("beach… beach… beach…") for as long as he keeps looking.

Methods
7.5-month-old infants were exposed to either 4 b-words or 4 p-words (80 repetitions total), so as to form a category of the exposed class of words.
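The statistical-learning prerequisite described above (tracking token frequencies along the VOT dimension, over a bimodal production distribution) can be sketched concretely. A minimal sketch; the two-Gaussian generator below is a hypothetical stand-in for real production data, with modes at 0 ms and 40 ms as on the slides:

```python
import random

def sample_vots(n, seed=1):
    """Sample VOTs (ms) from a hypothetical bimodal /b/-/p/ distribution:
    equal-weight Gaussians at ~0 ms and ~40 ms, sd 5 ms."""
    rng = random.Random(seed)
    return [rng.gauss(0.0 if rng.random() < 0.5 else 40.0, 5.0)
            for _ in range(n)]

def vot_histogram(vots, bin_ms=10):
    """Track token frequencies in VOT bins: the record a statistical
    learner would need in order to find the two modes."""
    hist = {}
    for v in vots:
        b = int(v // bin_ms) * bin_ms
        hist[b] = hist.get(b, 0) + 1
    return hist

hist = vot_histogram(sample_vots(2000))
```

The resulting histogram is heavy near 0 ms and 40 ms and nearly empty around 20 ms; extracting the two categories from exactly this kind of record is what the computational model later in the talk does.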
Exposure words: bomb/palm, bear/pear, bail/pail, beach/peach. Listening time was then measured for the original words (e.g., bear), competitors (pear), and tokens with VOT closer to the boundary (bear*, pear*). (McMurray & Aslin, 2005)

Methods
Stimuli were constructed by cross-splicing naturally produced tokens of each endpoint:
B: M = 3.6 ms VOT; P: M = 40.7 ms VOT; B*: M = 11.9 ms VOT; P*: M = 30.2 ms VOT.
B* and P* were judged /b/ or /p/ at least 90% consistently by adult listeners (B*: 97%; P*: 96%).

Novelty or familiarity?
Novelty/familiarity preference varies across infants and experiments, and we are only interested in the middle stimuli (B*, P*). Infants were therefore classified as novelty- or familiarity-preferring by their performance on the endpoints (novelty: 36 B-exposed, 21 P-exposed; familiarity: 16 B-exposed, 12 P-exposed). Within each group, will we see evidence for gradiency?
After being exposed to bear… beach… bail… bomb…, infants who show a novelty effect will look longer for pear than bear. The question is what happens in between (bear*): a categorical pattern groups bear* with bear; a gradient pattern puts its listening time in between.

Results
Novelty infants (B: 36, P: 21): target vs. target*, p < .001; competitor vs. target*, p = .017.
Familiarity infants (B: 16, P: 12): target vs. target*, p = .003; competitor vs. target*, p = .012.
Infants exposed to /p/ showed reliable graded differences across P, P*, and B in both the novelty (N = 21) and familiarity (N = 12) groups (ps = .009-.028).
Infants exposed to /b/ showed an attenuated pattern: B vs. B* was not significant (ps > .1), and only the B vs. P comparison was reliable (novelty N = 36: p < .001; familiarity N = 16: p = .06).

Experiment 4 conclusions
Contrary to all previous work, 7.5-month-old infants show gradient sensitivity to subphonemic detail: a clear effect for /p/, attenuated for /b/.
Reduced effect for /b/: a null effect, or something else? The actual result suggests the category boundary lies between bear and bear* (i.e., between 3.6 ms and 11.9 ms VOT), so there may be within-category sensitivity in a different range.

Experiment 5
Same design as Experiment 4, with VOTs shifted away from the hypothesized boundary.
Train: bomb, beach, bear, bale at -9.7 ms.
Test: B- (-9.7 ms: bomb, beach, bear, bale), B (3.6 ms: bomb*, beach*, bear*, bale*), and P (40.7 ms: palm, peach, pear, pail).
Results: familiarity infants (N = 34): B- vs. B, p = .01; B vs. P, p = .05. Novelty infants (N = 25): B- vs. B, p = .002; B vs. P, p = .02.

Experiment 5 conclusions
• Within-category sensitivity in /b/ as well as /p/.
• A shifted category boundary in /b/: not consistent with the adult boundary (or prior infant work).
• Graded structure supports statistical learning.
Will an implementation of this model allow us to understand the developmental mechanism?

Computational model: distributional learning
1) Model the distribution of tokens as a mixture of Gaussian distributions over a phonetic dimension (e.g., VOT).
2) After receiving an input, the Gaussian with the highest posterior probability is the "category".
3) Each Gaussian has three parameters: mixture weight (likelihood), mean, and standard deviation.

Statistical category learning in the model
1) Start with a set of randomly selected Gaussians.
2) After each input, adjust each parameter to find the best description of the input.
3) Start with more Gaussians than necessary: the model doesn't innately know how many categories there are, and the weights of unneeded categories should go to 0.
Training: the Lisker & Abramson (1964) distribution of VOTs.
Without competition, the model was not successful with large K (it succeeded with K = 2, but what if we were learning Hindi?).
Mechanism #1: Competition (winner-take-all). Without competition, runs ended with too many categories (>4) and categories were rarely all in the right place; with competition, 95% of runs converged to 2 categories, with the categories in the right place in most runs. Competition is required.
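The winner-take-all competition mechanism above can be sketched in a few lines. This is a simplified illustration, not the talk's actual model: it fixes the category standard deviations and updates only the winning mean online (with equal sigmas, the nearest mean is the maximum-likelihood winner), and the bimodal input data are hypothetical samples at 0 ms and 40 ms:

```python
import random

def learn_categories(vots, k=5, lr=0.05, seed=2):
    """Winner-take-all sketch of distributional category learning:
    start with more categories (means) than needed; on each token only
    the most probable category updates, so spare categories are starved
    of data and end up winning (almost) nothing."""
    rng = random.Random(seed)
    mus = [rng.uniform(-10.0, 50.0) for _ in range(k)]
    counts = [0] * k
    for v in vots:
        # Competition: the nearest mean "wins" the token outright.
        win = min(range(k), key=lambda i: abs(v - mus[i]))
        counts[win] += 1
        mus[win] += lr * (v - mus[win])  # move only the winner toward the token
    return mus, counts

# Hypothetical bimodal VOT input (/b/ ~0 ms, /p/ ~40 ms, sd 5 ms).
rng = random.Random(0)
data = [rng.gauss(0.0 if rng.random() < 0.5 else 40.0, 5.0) for _ in range(2000)]
mus, counts = learn_categories(data)
used = [m for m, c in zip(mus, counts) if c > len(data) * 0.05]
```

After training, only the categories near the two modes retain substantial counts, which is the sense in which competition lets the model start with too many categories and prune down to the right number.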
Validated with a neural network. What about the nature of the initial state?
Classic view (e.g., Werker & Tees, 1984): infants start with many small (nonnative) categories and lose distinctions that are not used in the native language. Small (nonnative) categories => large native categories: combining small categories is easy. The reverse, dividing large (overgeneralized) categories into smaller native ones, is hard.
Mechanism #2: combining small categories is easier than dividing large ones. Possibly related to adult non-native speech perception findings.

Question: reduced auditory acuity in cochlear implant (CI) users yields a larger region in which stimuli are not discriminable, i.e., larger initial categories. Is that a problem for learning? Assess non-native discrimination in CI users: small categories would suggest auditory acuity is not so bad; large categories would suggest different learning mechanisms. (with J. Bruce Tomblin & B. Barker)

Infant summary
Infants show graded sensitivity to subphonemic detail:
• Supports the variance utilization model.
• Variance is used for statistical learning.
The model suggests aspects of the developmental mechanism: competition, and the starting state (large vs. small categories).
Remaining questions:
• The unexpected VOT boundary may require a 2AFC task (anticipatory eye-movement methods).
• The role of initial category size in learning (possible CI application).

Conclusions
Infants and adults are sensitive to subphonemic detail. Continuous detail is not discarded by perception / word recognition: variance utilization, not variance reduction. Normal spoken-word-recognition mechanisms yield (1) temporal integration and (2) perceptual warping. Infant sensitivity allows long-term phonology learning and potentially reveals the developmental mechanism. Competition processes are (1) potentially responsible for CP (a locus of SLI?) and (2) essential for learning.
Conclusions
Spoken language is defined by change. But the information to cope with it is in the signal, if lexical processes don't discard it. Within-category acoustic variation is signal, not noise.

[Apparatus: IR head-tracker emitters, head-tracker camera, monitor, two eye cameras; eyetracker and subject computers connected via Ethernet.]

Misperception: additional results
Identification functions for barricade-parricade and barakeet-parakeet show significant target (word) responses even at the extreme VOTs, and graded effects of VOT on correct response rate.

Phonetic "garden path"
The garden-path effect is the difference between looks to each target (b vs. p) at the same VOT (e.g., barricade vs. parakeet at VOT = 0 and VOT = 35). The garden-path effect is gradient in VOT for both target and competitor fixations (target: p < .0001; competitor: p < .0001).

Assimilation: additional results
"runm picks" vs. "runm takes": when /p/ is heard, the bilabial feature can be assumed to come from assimilation (not an underlying /m/); when /t/ is heard, the bilabial feature is likely to come from an underlying /m/.

Exp 3 & 4 conclusions
Within-category detail is used in recovering from assimilation: temporal integration.
• Anticipate upcoming material.
• Bias activations based on context.
As in Exp 2, within-category detail is retained to resolve ambiguity.
Phonological variation is a source of information.
Subjects hear "select the mud gear", "select the mudg gear", or "select the mudg drinker".
"Mudg gear" is initially ambiguous, with a late bias towards "mud". "Mudg drinker" is also ambiguous, with a late bias towards "mug" (the /g/ has to come from somewhere). In the same stimuli/experiment there is also a progressive effect: looks to the non-coronal (gear) are faster following an assimilated consonant.

Feedback
Ganong (1980): lexical information biases perception of ambiguous phonemes (doot/toot, duke/tuke). Phoneme restoration: Warren (1970); Samuel (1997). Lexical feedback from words to phonemes: McClelland & Elman (1986); Magnuson, McMurray, Tanenhaus & Aslin (2003).

Scales of temporal integration in word recognition
• A word: an ordered series of articulations. Build abstract representations; form expectations about future events; fast (online) processing.
• A phonology: abstract across utterances; expectations about possible future events; slow (developmental) processing.

Sparseness
Overgeneralization: large σ; costly (lose distinctiveness).
Undergeneralization: small σ; not as costly (maintain distinctiveness).
To increase the likelihood of successful learning, err on the side of caution: start with small σ. Across 39,900 model runs, P(success) for both 2- and 3-category models falls as the starting σ grows.

Sparseness coefficient: the percentage of the space not strongly mapped to any category (unmapped space). Tracked over training epochs for starting σ ranges of .5-1, 3-11, 12-17, and 20-40: small and intermediate starting σ's leave much of the space unmapped early in training, while large starting σ's do not.

Model conclusions
To avoid overgeneralization, it is better to start with small estimates for σ. Small or even medium starting σ's lead to sparse category structure during infancy: much of phonetic space is unmapped. Sparse categories permit temporal integration similar to Exp 2: retain ambiguity (and partial representations) until more input is available.

AEM paradigm
Examining the sparseness/completeness of categories needs a two-alternative task: anticipatory eye movements (McMurray & Aslin, 2005). Infants are trained to make anticipatory eye movements in response to an auditory or visual stimulus; the method is also useful with color, shape, spatial or frequency, and face stimuli. Post-training, generalization can be assessed with respect to both targets.

Experiment 6: anticipatory eye movements
Train: Bear0 -> left; Pail35 -> right.
Test: Bear0, Bear5, Bear10, Bear15; Pear40, Pear35, Pear30, Pear25 (the same naturally produced tokens as in Exps 4 & 5).
Expected results: sparse categories should leave unmapped space between the "bear" and "pail" regions, around the adult boundary.
Results: training tokens 67% correct; 9/16 infants better than chance.
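The sparseness coefficient used in the model analysis above (the fraction of the phonetic dimension not strongly mapped to any category) can be sketched directly for a Gaussian-category model. A minimal sketch; the evaluation range, grid step, and response threshold are illustrative assumptions, not the model's actual settings:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Gaussian density: a category's response strength at point x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def sparseness(mus, sigmas, lo=-20.0, hi=60.0, step=0.5, thresh=0.01):
    """Sparseness coefficient: fraction of the VOT axis where no
    category responds above threshold (unmapped perceptual space)."""
    n = int((hi - lo) / step)
    pts = [lo + i * step for i in range(n)]
    unmapped = sum(1 for x in pts
                   if max(gauss_pdf(x, m, s) for m, s in zip(mus, sigmas)) < thresh)
    return unmapped / n

# Two categories at 0 ms and 40 ms: small sigmas leave most of the
# axis unmapped; large sigmas cover it entirely.
s_small = sparseness([0.0, 40.0], [2.0, 2.0])
s_large = sparseness([0.0, 40.0], [20.0, 20.0])
```

This reproduces the qualitative point of the slides: small starting σ's yield sparse (largely unmapped) category structure, while large σ's overgeneralize and map everything.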