Aug 10

Outline
• Spoken word recognition
  – Evidence for top-down feedback
    • TRACE theory
    • Cohort theory
    • Windmann Presentation
  – But isn’t word recognition automatic?
• Differences between spoken & written word recognition

Evidence for Top-Down influence on speech perception
• Phoneme Restoration Effect (Warren, 1970)
• Lexical bias in categorical perception task, e.g. dype vs. type (Connine & Clifton, 1987)
• Errors made by close shadowers (Marslen-Wilson, 1973)

What kinds of Top-Down knowledge can we use for Speech Perception?
• Lexical
• Syntactic and Semantic
  – Right-context comes too late, but Left-context might be useful IF our syntactic and semantic processing keeps pace with speech perception.
    • The driver turned the *eel.
    • She saw his/him duck.

Marslen-Wilson (1973)
Speech Shadowing Task
  – While listening to continuous speech, repeat it back as rapidly as possible.
    • For isolated words or nonsense syllables, RT is about 150–250 ms.
    • For continuous prose, shadowing latency is about 500–1500 ms.
  – Why different? Maybe because of syntactic and semantic processing for sentences, which requires larger units of processing (e.g. phrase or clause).
    • If so, people shadowing at very short latencies should make errors that ignore the syntactic and semantic constraints of the sentence.
    • Only “distant shadowers” will make errors that respect syntactic and semantic constraints

Marslen-Wilson (1973)
• Ran 65 participants in shadowing task and measured average latency.
  – 7 participants were “close shadowers”, with latencies under 350 ms.
  – Remaining participants had latencies of 500–800 ms.
• Test passage presented over headphones to 7 close & 7 distant shadowers
  – 300 words @ 160 words/min.
  – average syllable = 200 ms
  – Original passage & shadowing performance recorded on separate tracks of tape recorder.
  – 4 closest shadowers had 254–287 ms latencies & made 1.7%–6.6% errors

Marslen-Wilson (1973)
• Were close shadowers comprehending the input more superficially than distant shadowers?
• No.
  – Memory test on 600 word passage showed no reliable correlation between shadowing latency & memory score
• But this could reflect additional processes that lag behind shadowing performance.
  – Do close shadowers make different types of errors in their shadowing performance itself?

Marslen-Wilson (1973)
• There were 111 constructive errors, in which participants added a real word or changed a word into another real word.
  – All but 3 were grammatically & semantically appropriate.
  – No qualitative difference between close & distant shadowers; sometimes they made the exact same error:
    • It was beginning to be light enough so THAT I could see.
  – Especially for close shadowers, constructive errors tended to occur at very short latencies, perhaps relying more on predictive top-down cues than bottom-up information.

Marslen-Wilson (1973) Summary
• Syntactic and semantic information (higher order structure) was available to both close and distant shadowers
  – When shadowers made errors, the errors were syntactically and semantically well-formed
• Language Comprehension is Incremental, so DTC cannot be correct
• Syntactic and Semantic processing keep pace with speech perception (within a syllable or so)
  – Potential source of top-down cues to guide speech perception & spoken word recognition.

Connine & Clifton (1987)
• Lexical Bias effect is enhanced by sentential context.
  – At her birthday, she received a valuable *ift.
  – * is ambiguous between /g/ & /t/
• Such top-down effects are clearly consistent with interactive (though underspecified) models like TRACE.
• How would an autonomous (modular) account of speech perception handle this finding?

TRACE (McClelland & Elman, 1986)
• At each level, individual nodes (corresponding to features, phonemes, or words) compete for activation.
• Facilitatory activation from bottom-up & top-down sources
• Inhibition from bottom-up, top-down, & lateral sources
• Recognition occurs when network settles into stable state with a clear winner.
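
The competition-and-settling idea on the TRACE slide can be illustrated with a toy loop. This is a minimal sketch, not the McClelland & Elman implementation: the function name `settle`, the candidate words, the support values, and all parameter values are assumptions made up for illustration. It models only bottom-up excitation and lateral (within-level) inhibition at the word level; top-down feedback and the feature and phoneme levels are left out.

```python
def settle(bottom_up, excite=0.1, inhibit=0.2, decay=0.1, cycles=50):
    """Toy word-level competition; returns the activation of every word per cycle.

    bottom_up maps each candidate word to its bottom-up support in [0, 1]
    (hypothetical values, not taken from any experiment).
    """
    floor, ceiling = -0.2, 1.0
    act = {word: 0.0 for word in bottom_up}
    history = [dict(act)]
    for _ in range(cycles):
        new_act = {}
        for word, a in act.items():
            # Lateral inhibition: summed positive activation of all competitors.
            rivals = sum(max(v, 0.0) for other, v in act.items() if other != word)
            net = excite * bottom_up[word] - inhibit * rivals
            # Positive net input pushes activation toward the ceiling,
            # negative net input pushes it toward the floor.
            delta = net * (ceiling - a) if net > 0 else net * (a - floor)
            # Decay pulls activation back toward the resting level of 0.
            new_act[word] = min(ceiling, max(floor, a + delta - decay * a))
        act = new_act
        history.append(dict(act))
    return history


# Assumed bottom-up support for four candidates after hearing "bald".
support = {"bald": 1.0, "bold": 0.75, "ball": 0.5, "bolt": 0.25}
history = settle(support)
final = history[-1]
winner = max(final, key=final.get)
print(winner, {w: round(a, 3) for w, a in final.items()})
```

Plotting `history` against cycle number would show the pattern the slide describes: all candidates rise at first, then the best-supported word pulls ahead while its competitors are suppressed.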
Word-level competition in TRACE
• “bald”
[Figure: word activation (y-axis) plotted against processing cycles (time, x-axis)]

Visual Trace Example

Cohort (Marslen-Wilson)
Theory combines initial autonomous stage with secondary interactive stage.
• Word-initial cohort formed solely on the basis of bottom-up acoustic input
  – All cohort members are actual words
  – Lexical access of candidates
• Words in the cohort are removed on the basis of…
  – Inconsistency with further acoustic input
  – Inconsistency with context
  – Word recognition = only one candidate remains

Cohort Example: stand

Use Gating to find Recognition Pt
Gating study from Zwitserlood (1989)
  – People heard successively longer fragments of critical words
  – In 3 kinds of context
    • Carrier phrase: The next word is kapitein.
    • Neutral context: They mourned the loss of their kapitein.
    • Biasing context: With dampened spirits the men stood around the grave. They mourned the loss of their kapitein.
  – Guessed what the word was
  – Recognition point = Point in word where everyone identifies it as the critical word
    • Often earlier than uniqueness point
    • How much earlier typically depends on degree of contextual constraint
  – Get to see what competitors are produced before recognition point

Zwitserlood (1989)
Evidence for parallel lexical activation of cohort members
• Present participants with /kæpt/, which is ambiguous between captain and captive
  – Experiments were conducted in Dutch, so modified here slightly to work in English
• Then present a word related to either of those continuations, like ship and guard
• Both “ship” & “guard” recognized fast, compared with unrelated control word. Indicates access to semantics for both cohort members
  – True even in biasing sentence context, so top-down context did not prevent lexical access of cohort candidates!
  – Example of semantic priming

Priming paradigm
Name (or make LDT to) red stimulus (i.e., target).
prime word: CAT
target word: CAT
Repetition Priming: Faster to name/LDT target after same-word prime than after any other kind of prime.
Semantic Priming: CAT is faster after related prime (DOG) compared to unrelated prime (DOT)

Implications of Cohort
• Special role for word-onset
• Recognition point can precede end of word
• An infelicitous word might not be accurately recognized
  – I mailed the letter w/o a STEAK.
• Can account for most top-down effects
  – But not word-initial phoneme restoration

TRACE vs. Cohort
• Cohort focuses specifically on the word level, whereas TRACE models feature and letter/phoneme identification as well.
  – A later, connectionist version of Cohort incorporates speech perception & addresses shortcomings of original cohort model (Gaskell & Marslen-Wilson, 1997)
• Both theories allow for top-down effects on spoken word recognition
• TRACE is fully interactive; Cohort has an initial autonomous stage
• Cohort depends upon clear phonological input at word onset for activation of the cohort.
  – TRACE allows for graded activation based on shared features
    • “ba…” activates “papa” as well as /b/ words.
  – TRACE allows for activation of rhyming words
    • “ball” partially activates “fall” and “call”

Windmann Presentation
• Sandy & Joanne

Take-Home Points
• Speech perception is fast and many aspects of it seem to be automatic and feed-forward.
• Yet when bottom-up input is ambiguous, noisy, or conflicted, top-down knowledge can influence final percept, and perhaps the initial percept.
• Sentence-level Syntactic and Semantic Processing keeps pace with speech perception, lagging by no more than a syllable or two.
  – Unit of syntactic analysis during comprehension is word, not sentence; build parse tree incrementally.

Is lexical access Automatic & Modular?
Automatic Processes
  – Fast
  – Do not require attention
  – Feed-forward (can’t be guided, controlled, or stopped midstream)
  – Not subject to top-down feedback (informational encapsulation)

Priming paradigm
Name (or make LDT to) red stimulus (i.e., target).
prime word: CAT
target word: CAT
Repetition Priming: Faster to name/LDT target after same-word prime than after any other kind of prime.
Subliminal Priming: Even if prime is presented too quickly for conscious awareness

Stroop Effect
Name font color
RED GREEN BLUE YELLOW GREEN
What happens if you have to name word?

Stroop Effect
• When font color conflicts with word itself, we are slower and less accurate to name the font color.
  – Recognition of word interferes with naming color of letters.
• No such interference from font color if task is to name the word.
• Word recognition is fast & feed-forward; we can’t stop recognizing the word, even when doing so is detrimental to task performance.

Is lexical access sensitive to top-down context?
• Maybe not.
• Zwitserlood (1989) found that cohort members were activated, even if they were inconsistent with the semantic context.
  – Context did have an effect, but it was after the initial bottom-up activation of cohort members.

A Puzzle
• Lexical Access seems like an automatic, feed-forward, bottom-up process.
• Speech perception seems quite sensitive to top-down context effects.
• Can both of these be true?
• Is lexical access really more interactive than it appears?
• Is speech perception really more bottom-up than it appears?

Word Recognition Across Modalities
Production
Spoken vs. Written

Lexical Access in Language Production
Levels of Processing
• Concept selection
• Word selection
• Phonological & phonetic encoding
• Construction of motor plan
• Articulation
ο Is this bottom-up or top-down processing?
ο Describe the Stroop effect in terms of these levels of processing.
ο Describe Ashcroft’s deficit in terms of these levels of processing.
Differences between spoken and written word recognition
• For relatively short words, letters in a written word are processed in parallel
  – Eye movement data
  – Word superiority effect
  – Letter-Search Task
• Spoken word unfolds across time
  – Can recognize some words before they are completely pronounced.

Eye Tracking

Word Superiority Effect (Cattell, 1886; Reicher, 1969)
Present stimulus for brief (near threshold) interval on T-scope. Is the (final) letter a D or a K?
It is easier to recognize a letter when it is in a word, compared to a non-word or in isolation.
[Figure: example displays for the non-word (OWRK), isolated letter (K), and word (WORK) conditions, each followed by a mask (***)]
Is the word easier, due to guessing?

Visual Trace Example
Equal bottom-up support for R & K, but R wins due to top-down support from word level.

How many instances of the letter “t” in the first sentence? Which letter “t”s do people miss?

Implications:
• Word Superiority effect
• Letter Search Task
  – Do we recognize a word by recognizing each of the letters?
  – Does word recognition facilitate letter recognition?
  – What is the role of top-down and bottom-up processing in these tasks?

Letter Recognition in Words
• Just like for phoneme perception in spoken words, there is a great deal of evidence that word & letter perception are intertwined in visual word recognition.
• We may recognize the word faster than we can recognize each of the letters, providing the opportunity for top-down processing from word to letter.

A Psycholinguistic Hoax
Aoccdrnig to rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a total mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. amzanig huh?
• Can we take this at face value? Is the order of intermediate letters really irrelevant? Do the number and identity of intermediate letters matter?
• How do we notice typos such as transposed letters?
• How do we realize we’re reading novel words?
• How do we distinguish “skates” from “steaks”?

Tasks for studying Word Recognition
Words in Isolation
• Naming
• LDT
Words in Context
• Eye-tracking during reading
• Priming (often cross-modal)

Some Basic Findings about Word Recognition
• Frequency influences RT in naming and LDT, and gaze duration in eye-tracking
• LDT is slow for word-like non-words
• Priming (Repetition, Semantic, etc.)
  – Subliminal priming demonstrates that WR doesn’t require attention
• High-level context effects???
  – Faster to recognize word in congruent context?
  – Slower in incongruent context?

Experimental Design
Balota et al. (2004)
• Factorial designs
  – Very common
  – Many important findings
  – Limitations
• (large-scale) Regression studies
  – Increasingly popular in word recognition lit

Experimental Design
• Factorial
  – Item factors manipulated categorically
    • E.g. frequency or contextual bias split into high and low conditions
  – ANOVA
    • Main effects, interactions
  – If there is a main effect (e.g. of frequency on naming latency), it suggests that that factor (frequency) impacts lexical access

Example Experiment: Factorial LDT
• Hypothesis: High frequency words will be recognized faster than low frequency words.
• Null Hypothesis: No effect of frequency on word recognition
• Dependent Measure: time to say “yes”, measured from onset of visually presented word.
• Participants: 24 college students who are native speakers, with normal vision and no reading problems.

Stimuli
• Critical Trials: 2 levels of Frequency
  – 20 high frequency words, ranging from 75 to 300 tokens per million words
    • 5-8 letters in length
  – 20 low frequency words, ranging from 1 to 15 tokens per million words.
    • 5-8 letters in length
• Filler Trials
  – 20 words
    • 5-8 letters in length
  – 60 nonwords
    • 5-8 letters in length
    • All are pronounceable and word-like

Analysis of Variance
• For each participant, measure average latency on high frequency trials & average latency on low frequency trials.
• Is there a main effect of frequency?
• F ratio = variance between conditions / variance within conditions
• p = probability of obtaining an F ratio at least this large if the null hypothesis were true, given the degrees of freedom in your study

Participant   High   Low
1             560    590
2             603    721
3             491    496
4             621    621
5             555    631
6             602    674
7             597    711
8             533    698

2 by 2 Factorial Design
• Hypothesis: Frequency effect is larger for long words than for short words
• Stimuli (4 critical conditions, 2 factors)
  – Short, High freq words
  – Short, Low freq words
  – Long, High freq words
  – Long, Low freq words
• Predicting an interaction between our 2 factors

Limitations of Factorial Designs
• Hard to manipulate one factor while holding all other variables constant
  – E.g., length, regularity, imageability, and age of acquisition are all correlated with frequency
    • If we don’t control for imageability, it could be confounded with frequency. If so, our “frequency effect” might really be an imageability effect.
• Words are not randomly selected
  – Though this assumption is implicit in ANOVA
  – Researchers may be using intuitions to select subsets of words that are recognized fast/slow due to variability on dimensions not intentionally manipulated.

Limitations of Factorial Designs
• Unwanted list-context effects
  – Related to non-random sampling
  – Experimental stimuli may lead participants to expect certain types of words
• Categorizing continuous variables decreases statistical power (sensitivity)
• More informative to know how much a factor influences word recognition rather than simply that the factor has an impact

Balota et al. Regression Study Goals
• What is the best way to measure frequency?
• What is the independent contribution of theoretically interesting predictor variables?
  – how much variance can each explain?
• Does importance of predictor variable differ for naming and lexical decision?
• Does it differ for younger (mean age = 20) and older (74) adults?
  – 50+ more years of practice
  – Cognitive declines in late adulthood

Balota et al. Stimuli
• Critical Stimuli: All monosyllabic, monomorphemic words from million-word, balanced corpus (Kucera & Francis, 1967).
  – 2,428 words with high accuracy in analyses
  – Each word coded for various types of frequency, length, and many other variables.
• LDT version has an equal number of nonwords created by changing 1-3 letters of real words

Naming RT was not very predictive of LDT RT

Young RT predicts Old RT fairly well

Interim Summary
• For a given word, RT in naming is not a very good predictor of RT in LDT
  – Suggests that some predictor variables contribute more to naming, and some contribute more to LDT
• For a given word, RT by young adults is a pretty good predictor of RT by older adults, regardless of task
  – Older adults are slower, but performance may be influenced by same predictor variables as young adults

Mean and SD: Young adults were faster and less variable.
Overall, predictor variables accounted for about 50% of variance in young, 40% of variance in old.

Frequency explains more variance in LDT than in Naming
Balota et al. favor the Zeno corpus (17 million words), which does pretty well.
Other large-scale studies have found that spoken-language corpora do better than text corpora (all of these are text).
The most common measure (1 million words) does poorly

Regression Analysis
• Surface Predictors
  – Phonetic features of onset phoneme coded as 1 (present) or 0 (absent)
    • Bilabial, dental, fricative, voiced…
    • Important for naming, because of voice key sensitivity
• Lexical Predictors
• Semantic Predictors

Regression Analysis
• Surface Predictors
• Lexical Predictors
  – Length in letters (2-8)
  – Neighborhood size (# of other words that differ only by 1 letter)
  – Objective frequency (Zeno norms)
  – Subjective frequency (Balota familiarity ratings)
  – Consistency (4 types of spelling-sound correspondence)
• Semantic Predictors

Regression Analysis
• Surface Predictors [step 1]
• Lexical Predictors [step 2]
• Semantic Predictors [step 3]
  – Nelson’s set size: # of associates in free association task
  – Imageability rating
  – Wordnet connectivity (Miller)
Many of the predictor variables are related to one another (e.g., longer words have smaller neighborhoods), so analysis must partial out shared variance.
I will focus on the by-items RT analysis (ignoring Accuracy & subject-level analyses)

Regression Analysis
• Phonological onset matters a lot for Naming, especially for young adults.
• Length also matters more for Naming
• Frequency matters more for LDT
• Interaction: cost of length is higher for low frequency words

Implications of Length Effects
• Coltheart et al. (2001) predict a length by lexicality interaction, with non-words showing a greater effect of length.
• They also predict the length by frequency interaction observed by Balota et al.
• Motivated by the Dual Route Model of Visual Word Recognition

Dual Route Model
Two pathways for lexical access:
• ‘lexical’ route: proceeds directly from orthography to lexicon
  – Available to well-known words
  – Preferred for irregularly spelled words
  – May dominate in languages with little orthographic (spelling-sound) consistency
• ‘sublexical’ route: graphemic form converted to phonological representation BEFORE lexical access (each grapheme is assigned a pronunciation by mapping to a phoneme)

Some semantic effects (esp. in LDT) remain after partialling out phonological & lexical effects
Meaning probably plays a stronger role in LDT than in Naming

Summary of Balota et al.
• Large-scale regression study replicated many effects established by factorial studies
  – PLUS…power to detect many small effects, such as the influence of imageability on naming RT
  – while overcoming limitations associated with small item sets
• Clarified unique contributions & interactions of specific variables
• Allowed careful examination of
  – task differences between naming & LDT
  – age differences between younger & older adults

Bilingual Word Recognition

Assumptions about the Lexicon
• Storehouse of knowledge about all the words you know
• Organized phonologically
  – Word-initial cohort together
• Distinct from Semantic Memory
ο Are bilinguals any different from monolinguals?

How is bilingual memory organized?
General agreement on the separation of lexicon(s) and semantic memory.
  – Dog and chien access same semantic network, because both prime cat in French-English bilinguals.
Whether there is a distinct lexicon for each language is controversial

Why study the bilingual lexicon?
• Not really a special case, worldwide
• Mapping between words and meanings
• Mapping between phonology and words/meanings
  – Are words from multiple languages in the word-initial cohort?
  – If not, can we also limit by topic domain, e.g., no physics words in history class?

Two Opposite Hypotheses
1. Bilinguals have 2 distinct lexicons; tri-linguals have 3, & so on.
2. Everyone has a single lexicon
  – How do we keep the languages straight?

Possible Evidence for Separate Lexicons
• Lack of repetition priming across languages: chien doesn’t prime dog like dog primes dog.
  – But couch doesn’t prime sofa like sofa primes sofa either

Possible Evidence for Separate Lexicons
• Release from PI (proactive interference)
  – In a single language
    • It is difficult to recall an item that occurs late on a list when it is preceded by a lot of similar items. The earlier items cause proactive (as opposed to retroactive) interference.
      apple, pear, peach, orange, pineapple…
    • As the list increases in length, likelihood of remembering a late-occurring item decreases, unless it is from a new semantic category
      apple, pear, peach, orange, fireman…
  – The same release from PI occurs w/a language change
      pear, peach, orange, pineapple, manzana…

Possible Evidence for Combined Lexicon
• In comprehension, word-initial cohort includes candidates from both languages
• In production, code-switching mid-sentence
  – Just use the best word, regardless of language?
  – But only if your listener knows both languages too!
• Lexical access vs. lexical selection

What about Cog-Neuro evidence?
• Patterns of aphasia in bilingual and multi-lingual speakers
• Pre-operative brain stimulation
• Imaging (PET, fMRI) and ERP studies

Patterns of Recovery in Aphasia
Fabbro (1999)
• 40% L1 and L2 recover in parallel
• 32% L1 > L2
• 28% L2 > L1

Pre-op electrical stimulation (Ojemann)

Imaging Dutch-French-English trilinguals in Belgium (Vingerhoets et al., 2003)
• Picture naming, word fluency and paragraph comprehension tasks
• All tasks revealed “predominantly overlapping” regions for the 3 languages
• L2s show activation in more areas and “more extensive recruitment” of areas activated by L1 (Dutch)
Word Fluency Task: Covertly generate as many words as possible beginning with a specified letter.
Lexical Access in Speaking
There is currently enthusiasm for single-store models or partially-overlapping lexicons.
When preparing an utterance for output, how then does the bilingual activate only words from a single language?
• automatic, parallel access [concept – words]
• deliberate selection mechanism [best word]

Some insights from Bilinguals
• More evidence for separation of semantic memory & lexicon
• More evidence for automaticity of lexical access in both comprehension & production
• Distinction between lexical access & lexical selection
  – So we may activate physics words in history class, and then filter them out

Brain potential and functional MRI evidence for how to handle two languages with one brain
Rodriguez-Fornells et al