Word recognition

Aug 10 Outline
• Spoken word recognition
– Evidence for top-down feedback
• TRACE theory
• Cohort theory
• Windmann Presentation
– But isn’t word recognition automatic?
• Differences between spoken & written word
Evidence for Top-Down influence
on speech perception
• Phoneme Restoration Effect (Warren, 1970)
• Lexical bias in categorical perception task,
e.g. dype vs. type (Connine & Clifton, 1987)
• Errors made by close shadowers
(Marslen-Wilson, 1973)
What kinds of Top-Down
knowledge can we use for Speech Perception?
• Lexical
• Syntactic and Semantic
– Right-context comes too late, but Left-context
might be useful IF our syntactic and semantic
processing keeps pace with speech
• The driver turned the *eel.
• She saw his/him duck.
Marslen-Wilson (1973)
Speech Shadowing Task
– While listening to continuous speech, repeat it back
as rapidly as possible.
• For isolated words or nonsense syllables, RT is about 150 –
250 ms.
• For continuous prose, shadowing latency is about 500 –
1500 ms.
– Why different? Maybe because of syntactic and
semantic processing for sentences, which requires
larger units of processing (e.g. phrase or clause).
• If so, people shadowing at very short latencies should make
errors that ignore syntactic and semantic constraints of the passage.
• Only “distant shadowers” will make errors that respect
syntactic and semantic constraints
Marslen-Wilson (1973)
• Ran 65 participants in shadowing task and
measured average latency.
– 7 participants were “close shadowers” < 350 ms.
– Remaining participants had latencies of 500–800 ms.
• Test passage presented over headphones to 7
close & 7 distant shadowers
– 300 words @ 160 words/min.
– average syllable = 200 ms
– Original passage & shadowing performance recorded
on separate tracks of tape recorder.
– 4 closest shadowers had 254-287 ms latencies &
made 1.7% - 6.6% errors
Marslen-Wilson (1973)
• Were close shadowers comprehending the input
more superficially than distant shadowers?
• No.
– Memory test on 600 word passage showed no
reliable correlation between shadowing latency &
memory score
• But this could reflect additional processes that lag behind
shadowing performance.
– Do close shadowers make different types of errors in
their shadowing performance itself?
Marslen-Wilson (1973)
• There were 111 constructive errors, in which
participants added a real word or changed a
word into another real word.
– All but 3 were grammatical & semantically appropriate.
– No qualitative difference between close & distant
shadowers; sometimes they made the exact same error.
• It was beginning to be light enough so THAT I could see.
– Especially for close shadowers, constructive errors
tended to occur at very short latencies, perhaps
relying more on predictive top-down cues than
bottom-up information.
Marslen-Wilson (1973)
• Syntactic and semantic information (higher order
structure) was available to both close and distant shadowers.
– When shadowers made errors, they were
syntactically and semantically well-formed
• Language Comprehension is Incremental—DTC
cannot be correct
• Syntactic and Semantic processing keep pace
with speech perception (within a syllable or so)
– Potential source of top-down cues to guide speech
perception & spoken word recognition.
Connine & Clifton (1987)
• Lexical Bias effect is enhanced by sentential context
– At her birthday, she received a valuable *ift.
– * is ambiguous between /g/ & /t/
• Such top-down effects are clearly consistent with
interactive (though underspecified) models like TRACE.
• How would an autonomous (modular) account of
speech perception handle this finding?
TRACE (McClelland & Elman, 1986)
• At each level, individual nodes (corresponding to features, phonemes, or words) compete for activation
• Facilitatory activation from bottom-up & top-down sources
• Inhibition from bottom-up, top-down, & lateral sources
• Recognition occurs when network settles into stable state with a clear winner
Word-level competition in TRACE
• “bald”
Cycles (time)
Visual Trace
Cohort (Marslen-Wilson)
Theory combines initial autonomous stage with
secondary interactive stage.
• Word-initial cohort formed solely on the basis of
bottom-up acoustic input
– All cohort members are actual words
– Lexical access of candidates
• Words in the cohort are removed on the basis of:
– Inconsistency with further acoustic input
– Inconsistency with context
– Word recognition = only one candidate remains
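The narrowing process described above can be sketched in a few lines. This is only an illustration, not Marslen-Wilson's implementation: it assumes a tiny made-up lexicon, uses spelling as a stand-in for phonemes, and models only removal by further acoustic input (not removal by context). The names `LEXICON`, `cohort`, and `uniqueness_point` are hypothetical.

```python
# Toy lexicon (hypothetical); letters stand in for phonemes.
LEXICON = ["stand", "stack", "stamp", "steam", "string", "captain", "captive"]


def cohort(heard_so_far, lexicon=LEXICON):
    """Return every word still consistent with the input heard so far."""
    return [word for word in lexicon if word.startswith(heard_so_far)]


def uniqueness_point(word, lexicon=LEXICON):
    """Number of segments needed before `word` is the only candidate left.

    Returns None if the word never becomes the sole cohort member.
    """
    for n in range(1, len(word) + 1):
        if cohort(word[:n], lexicon) == [word]:
            return n
    return None


# Watch the cohort shrink as more of "stand" is heard.
for n in range(1, len("stand") + 1):
    print("stand"[:n], cohort("stand"[:n]))
```

With this lexicon, "stand" becomes the sole candidate after four segments, before the word is complete. Context-based removal could be added as a second filter on the list that `cohort` returns.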
Cohort Example: stand
Use Gating to find Recognition Pt
Gating study from Zwitserlood (1989)
– People heard successively longer fragments of critical words
– In 3 kinds of context
• Carrier phrase: The next word is kapitein.
• Neutral context: They mourned the loss of their kapitein.
• Biasing context: With dampened spirits the men stood around the grave. They mourned the loss of their kapitein.
– Guessed what the word was
– Recognition point = Point in word where everyone identifies it as the
critical word
• Often earlier than uniqueness point
• How much earlier typically depends on degree of contextual bias
– Get to see what competitors are produced before recognition point
Zwitserlood (1989)
Evidence for parallel lexical activation of cohort
• Present participants with /kaept/, which is ambiguous
between captain and captive
– Experiments were conducted in Dutch, so modified here slightly
to work in English
• Then present a word related to either of those
continuations – like ship and guard
• Both “ship” & “guard” recognized fast, compared with
unrelated control word. Indicates access to semantics for
both cohort members
– True, even in biasing sentence context, so top-down context did
not prevent lexical access of cohort candidates!
– Example of semantic priming
Priming paradigm
Name (or make LDT to) red stimulus (i.e., the target)
prime word: CAT
target word: CAT
Repetition Priming: Faster to name/LDT target after
same-word prime than after any other kind of prime.
Semantic Priming: CAT is faster after related prime
(DOG) compared to unrelated prime (DOT)
Implications of Cohort
• Special role for word-onset
• Recognition point can precede end of word
• An infelicitous word might not be
accurately recognized
– I mailed the letter w/o a STEAK.
• Can account for most top-down effects
– But not word-initial phoneme restoration
TRACE vs. Cohort
• Cohort focuses specifically on word level, whereas TRACE
models feature and letter/phoneme identification as well.
– A later, connectionist version of Cohort incorporates speech
perception & addresses shortcomings of original cohort model
(Gaskell & Marslen-Wilson, 1997)
• Both theories allow for top-down effects on spoken word recognition
• TRACE is fully interactive; Cohort has an initial
autonomous stage
• Cohort depends upon clear phonological input at word
onset, for activation of cohort.
– TRACE allows for graded activation based on shared features
• “ba…” activates “papa” as well as /b/ words.
– TRACE allows for activation of rhyming words
• “ball” partially activates “fall” and “call”
Windmann Presentation
• Sandy & Joanne
Take-Home Points
• Speech perception is fast and many aspects of it
seem to be automatic and feed-forward.
• Yet when bottom-up input is ambiguous, noisy,
or conflicted, top-down knowledge can influence
final percept, and perhaps the initial percept.
• Sentence-level Syntactic and Semantic
Processing keeps pace with speech perception,
lagging by no more than a syllable or two.
– Unit of syntactic analysis during comprehension is
word, not sentence; build parse tree incrementally.
Is lexical access Automatic &
Automatic Processes
– Fast
– Do not require attention
– Feed-forward (can’t be guided, controlled,
or stopped midstream)
– Not subject to top-down feedback
(informational encapsulation)
Priming paradigm
Name (or make LDT to) red stimulus (i.e., the target)
prime word: CAT
target word: CAT
Repetition Priming: Faster to name/LDT target after
same-word prime than after any other kind of prime.
Subliminal Priming: Even if prime is presented too
quickly for conscious awareness
Stroop Effect
Name font color
What happens if you have to name word?
Stroop Effect
• When font color conflicts with word itself, we are
slower and less accurate to name the font color.
– Recognition of word interferes with naming color of font.
• No such interference from font color if task is to
name the word.
• Word recognition is fast & feed-forward; we can’t
stop recognizing the word, even when doing so
is detrimental to task performance.
Is lexical access sensitive to top-down context?
• Maybe not.
• Zwitserlood (1989) found that cohort
members were activated, even if they
were inconsistent with the semantic context
– Context did have an effect, but it was after the
initial bottom-up activation of cohort members.
A Puzzle
• Lexical Access seems like an automatic, feed-forward, bottom-up process.
• Speech perception seems quite sensitive to top-down context effects.
• Can both of these be true?
• Is lexical access really more interactive than it appears?
• Is speech perception really more bottom-up than
it appears?
Word Recognition Across
Spoken vs. Written
Lexical Access in Language Production
Levels of Processing
• Concept selection
• Word selection
• Phonological & phonetic encoding
• Construction of motor plan
• Articulation
• Is this bottom-up or top-down processing?
• Describe the Stroop effect in terms of these levels of processing.
• Describe Ashcroft’s deficit in terms of these levels of processing.
Differences between spoken and
written word recognition
• For relatively short words, letters in a
written word are processed in parallel
– Eye movement data
– Word superiority effect
– Letter-Search Task
• Spoken word unfolds across time
– Can recognize some words before they are
completely pronounced.
Eye Tracking
Word Superiority Effect
(Cattell, 1886; Reicher, 1969)
Present stimulus for brief (near threshold)
interval on T-scope. Is the (final) letter a D
or a K?
It is easier to recognize a letter when it is in a word, compared to a non-word or in isolation.
Is the word easier, due to guessing?
Visual Trace
Equal bottom-up
support for R & K,
but R wins due to
top-down support
from word level.
How many instances of the
letter “t” in the first sentence?
Which letter “t”s do people miss?
• Word Superiority effect
• Letter Search Task
– Do we recognize a word by recognizing each
of the letters?
– Does word recognition facilitate letter recognition?
– What is the role of top-down and bottom-up
processing in these tasks?
Letter Recognition in Words
• Just like for phoneme perception in
spoken words, there is a great deal of
evidence that word & letter perception are
intertwined in visual word recognition.
• We may recognize the word faster than we
can recognize each of the letters,
providing the opportunity for top-down
processing from word to letter.
A Psycholinguistic Hoax
Aoccdrnig to rscheearch at Cmabrigde Uinervtisy, it deosn't
mttaer in waht oredr the ltteers in a wrod are, the olny
iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae.
The rset can be a total mses and you can sitll raed it wouthit a
porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey
lteter by istlef, but the wrod as a wlohe.
amzanig huh?
•Can we take this at face value? Is the order of intermediate
letters really irrelevant? Do the number and identity of
intermediate letters matter?
•How do we notice typos such as transposed letters?
•How do we realize we’re reading novel words?
•How do we distinguish “skates” from “steaks”?
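One way to probe the hoax's claim is to take it literally. The sketch below encodes a word as its first letter, an unordered bag of interior letters, and its last letter. This encoding is an assumption made purely for illustration (the name `edge_plus_bag` is hypothetical); it is not a model proposed in the lecture.

```python
def edge_plus_bag(word):
    """Encode a word as (first letter, sorted interior letters, last letter).

    This is the representation the hoax implies: interior order is discarded.
    """
    if len(word) < 3:
        return (word, "", "")
    return (word[0], "".join(sorted(word[1:-1])), word[-1])


# A scrambled word maps to the same code as its source word...
print(edge_plus_bag("Cmabrigde") == edge_plus_bag("Cambridge"))
# ...but so do two different real words, which readers never confuse.
print(edge_plus_bag("skates") == edge_plus_bag("steaks"))
```

Both comparisons come out equal, so a reader who truly ignored interior letter order could not tell "skates" from "steaks". Interior order must therefore be encoded at least partially.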
Tasks for studying Word Recognition
Words in Isolation
• Naming
Words in Context
• Eye-tracking during reading
• Priming (often cross-modal)
Some Basic Findings about
Word Recognition
• Frequency influences RT in naming and LDT,
and gaze duration in eye-tracking
• LDT slow for word-like non-words
• Priming (Repetition, Semantic, etc.)
– Subliminal priming demonstrates that WR doesn’t
require attention
• High-level context effects???
– Faster to recognize word in congruent context?
– Slower in incongruent context?
Experimental Design
Balota et al. (2004)
• Factorial designs
– Very common
– Many important findings
– Limitations
• (large-scale) Regression studies
– Increasingly popular in word recognition lit
Experimental Design
• Factorial
– Item factors manipulated categorically
• E.g. frequency or contextual bias split into high and
low conditions
• Main effects, interactions
– If there is a main effect (e.g. of frequency on
naming latency), it suggests that that factor
(frequency) impacts lexical access
Example Experiment: Factorial LDT
• Hypothesis: High frequency words will be
recognized faster than low frequency words.
• Null Hypothesis: No effect of frequency on word recognition time.
• Dependent Measure: time to say “yes”,
measured from onset of visually presented word.
• Participants: 24 college students who are native
speakers, with normal vision and no reading disabilities.
• Critical Trials: 2 levels of Frequency
– 20 high frequency words, ranging from 75 to 300
tokens per million words
• 5-8 letters in length
– 20 low frequency words, ranging from 1 to 15 tokens
per million words.
• 5-8 letters in length
• Filler Trials
– 20 words
• 5-8 letters in length
– 60 nonwords
• 5-8 letters in length
• All are pronounceable and word-like
Analysis of Variance
• For each participant,
measure average latency
on high frequency trials &
average latency on low
frequency trials.
• Is there a main effect of frequency?
• F ratio = variance between conditions / variance within conditions
• p = probability of obtaining an F at least this large if the null hypothesis is true, given the degrees of freedom in your design
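The F ratio can be computed by hand for a small data set. This is a sketch under simplifying assumptions: the latencies are invented, and the two conditions are treated as independent groups, which is simpler than the within-participant analysis the example experiment would actually call for. The function name `one_way_f` is hypothetical.

```python
from statistics import mean


def one_way_f(*groups):
    """F = (between-condition mean square) / (within-condition mean square)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability of the condition means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own condition mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Invented per-participant mean latencies (ms), for illustration only.
high_freq = [500, 520, 540]
low_freq = [600, 620, 640]
print(one_way_f(high_freq, low_freq))
```

A large F means the difference between conditions is large relative to the spread within conditions. Turning F into a p value requires the F distribution with (df_between, df_within) degrees of freedom, which this sketch leaves out.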
2 by 2 Factorial Design
• Hypothesis: Frequency effect is larger for
long words than for short words
• Stimuli (4 critical conditions, 2 factors)
– Short, High freq words
– Short, Low freq words
– Long, High freq words
– Long, Low freq words
• Predicting an interaction between our 2 factors
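The predicted interaction amounts to a difference of differences across the four cells. The cell means below are invented for illustration, and the function names are hypothetical.

```python
# Invented mean latencies (ms) for the 4 critical conditions.
cell_means = {
    ("short", "high"): 520,
    ("short", "low"): 560,
    ("long", "high"): 540,
    ("long", "low"): 640,
}


def frequency_effect(length, means):
    """Low-frequency minus high-frequency latency at one word length."""
    return means[(length, "low")] - means[(length, "high")]


def interaction_contrast(means):
    """How much larger the frequency effect is for long than for short words."""
    return frequency_effect("long", means) - frequency_effect("short", means)


print(frequency_effect("short", cell_means))  # 40
print(frequency_effect("long", cell_means))   # 100
print(interaction_contrast(cell_means))       # 60
```

A contrast near zero would mean the two factors combine additively. The hypothesis predicts a positive contrast, and whether it is reliable is what the interaction term in the ANOVA tests.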
Limitations of Factorial Designs
• Hard to manipulate one factor while holding all
other variables constant
– E.g., length, regularity, imageability, and age of
acquisition are all correlated with frequency
• If we don’t control for imageability, it could be confounded with
frequency. If so, our “frequency effect” might really be an
imageability effect.
• Words are not randomly selected
– Though this assumption is implicit in ANOVA
– Researchers may be using intuitions to select subsets
of words that are recognized fast/slow due to
variability on dimensions not intentionally manipulated
Limitations of Factorial Designs
• Unwanted list-context effects
– Related to non-random sampling
– Experimental stimuli may lead participants to
expect certain types of words
• Categorizing continuous variables
decreases statistical power (sensitivity)
• More informative to know how much a
factor influences word recognition rather
than simply that the factor has an impact
Balota et al. Regression Study
• What is the best way to measure frequency?
• What is the independent contribution of
theoretically interesting predictor variables?
– how much variance can each explain?
• Does importance of predictor variable differ for
naming and lexical decision?
• Does it differ for younger (mean age = 20) and
older (74) adults?
– 50+ more years of practice
– Cognitive declines in late adulthood
Balota et al. Stimuli
• Critical Stimuli: All monosyllabic,
monomorphemic words from million-word,
balanced corpus (Kucera & Francis, 1967).
– 2,428 words with high accuracy in analyses
– Each word coded for various types of frequency,
length, and many other variables.
• LDT version has an equal number of nonwords
created by changing 1-3 letters of real words
Naming RT was not very predictive of LDT RT
Young RT predicts Old RT fairly well
Interim Summary
• For a given word, RT in naming is not a very
good predictor of RT in LDT
– Suggests that some predictor variables contribute
more to naming, and some contribute more to LDT
• For a given word, RT by young adults is a pretty
good predictor of RT by older adults, regardless
of tasks
– Older adults are slower, but performance may be
influenced by same predictor variables as young adults
Young adults were faster and less variable.
Overall, predictor variables accounted for
about 50% of variance in young, 40% of
variance in old.
Frequency explains more variance in LDT than in naming.
They like the Zeno (17 million words) corpus, which does pretty well. Other studies have found that spoken-language corpora do better than text corpora (all these are text corpora).
Most common measure does poorly (1 million words).
Regression Analysis
• Surface Predictors
– Phonetic features of onset phoneme coded as
1 (present) or 0 (absent)
• Bilabial, dental, fricative, voiced…
• Important for naming, because of voice key
• Lexical Predictors
• Semantic Predictors
Regression Analysis
• Surface Predictors
• Lexical Predictors
– Length in letters (2-8)
– Neighborhood size (# of other words that differ only
by 1 letter)
– Objective frequency (Zeno norms)
– Subjective frequency (Balota familiarity ratings)
– Consistency (4 types of spelling-sound consistency)
• Semantic Predictors
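Neighborhood size, as defined above, is easy to compute for a given word list. The sketch below assumes neighbors are same-length words that differ by one substituted letter and uses a made-up lexicon; the function name `neighborhood_size` is hypothetical.

```python
def neighborhood_size(word, lexicon):
    """Count other words of the same length that differ in exactly 1 letter."""
    count = 0
    for other in lexicon:
        if other != word and len(other) == len(word):
            mismatches = sum(a != b for a, b in zip(word, other))
            if mismatches == 1:
                count += 1
    return count


# Made-up lexicon for illustration.
lexicon = {"cat", "bat", "cot", "car", "cut", "dog", "cast"}
print(neighborhood_size("cat", lexicon))  # 4: bat, cot, car, cut
print(neighborhood_size("dog", lexicon))  # 0
```

Longer words tend to have fewer such neighbors, which is one reason length and neighborhood size have to be separated statistically.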
Regression Analysis
• Surface Predictors [step 1]
• Lexical Predictors [step 2]
• Semantic Predictors [step 3]
– Nelson’s set size: # of associates in free association
– Imageability rating
– Wordnet connectivity (Miller)
Many of predictor variables are related to one
another (e.g., longer words have smaller
neighborhoods), so analysis must partial out
shared variance.
I will focus on the by-items RT analysis (ignoring
Accuracy & subject-level analyses)
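Partialling out shared variance can be illustrated with just two predictors. The sketch below uses the standard two-predictor formula for R² and reports how much variance the second predictor explains beyond the first, in the spirit of entering predictors in steps. The data are invented toy values, and the function names are hypothetical; this is not the analysis code from Balota et al.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared_two_predictors(y, x1, x2):
    """R^2 for y regressed on x1 and x2 together (requires |r12| < 1)."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


def unique_r_squared(y, x1, x2):
    """Variance in y explained by x2 over and above x1 (step 2 minus step 1)."""
    return r_squared_two_predictors(y, x1, x2) - pearson(y, x1) ** 2


# Toy, centered values: y is exactly x1 + x2, and x1 and x2 are uncorrelated.
x1 = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]
y = [2, 0, 0, -2]
print(r_squared_two_predictors(y, x1, x2))
print(unique_r_squared(y, x1, x2))
```

When the predictors are correlated, the unique contribution of the second falls below its simple squared correlation with y, which is why each step of the analysis reports variance beyond the earlier steps.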
Regression Analysis
Phono onset matters a lot for naming.
Length also matters more for naming.
Freq matters more for LDT.
Cost of length higher for low
frequency words
Implications of Length Effects
• Coltheart et al. (2001) predict a length by
lexicality interaction, with non-words
showing a greater effect of length.
• They also predict the length by frequency
interaction observed by Balota et al.
• Motivated by the Dual Route Model of
Visual Word Recognition
Dual Route Model
Two pathways for lexical access:
• ‘lexical’ route: proceeds directly from
orthography to lexicon
– Available to well-known words
– Preferred for irregularly spelled words
– May dominate in LGs with little ortho-graphemic
• ‘sublexical’ route: graphemic form converted
to phonological representation BEFORE
lexical access (each grapheme is assigned a
pronunciation by mapping to a phoneme)
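The two pathways can be caricatured as a dictionary lookup and a rule-based conversion. This sketch makes strong simplifying assumptions: a two-word lexicon, one phoneme per letter, and the sublexical route used only as a fallback, whereas implemented dual-route models such as Coltheart et al. (2001) run both routes in parallel. All names and the rule table are hypothetical.

```python
# Stored pronunciations for known words (lexical route).
LEXICON = {"mint": "mɪnt", "pint": "paɪnt"}

# One-letter grapheme-to-phoneme rules (sublexical route), toy subset.
GPC_RULES = {"m": "m", "p": "p", "i": "ɪ", "n": "n", "t": "t"}


def lexical_route(letters):
    """Look the whole letter string up in the lexicon; None if unknown."""
    return LEXICON.get(letters)


def sublexical_route(letters):
    """Assemble a pronunciation by converting each grapheme to a phoneme."""
    return "".join(GPC_RULES[ch] for ch in letters)


def name_aloud(letters):
    """Use the stored pronunciation if there is one, else assemble one."""
    stored = lexical_route(letters)
    return stored if stored is not None else sublexical_route(letters)


print(name_aloud("pint"))        # irregular word: needs the lexical route
print(sublexical_route("pint"))  # rules alone regularize it to rhyme with mint
print(name_aloud("nint"))        # nonword: only the sublexical route applies
```

The sketch shows why irregular words favor the lexical route, since rules alone mispronounce them. Nonwords can only be named through letter-by-letter conversion, which is where a strong length effect would be expected.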
Some semantic effects (esp LDT) after partialling out phono & lexical effects
Meaning probably plays a stronger role in the LDT compared with naming.
Summary of Balota et al.
• Large-scale regression study replicated many
effects established by factorial studies
– PLUS…power to detect many small effects, such as
the influence of imageability on naming RT
– while over-coming limitations associated with small
item sets
• Clarified unique contributions & interactions of
specific variables
• Allowed careful examination of
– task differences between naming & LDT
– age differences between younger & older adults
Bilingual Word Recognition
Assumptions about the Lexicon
• Storehouse of knowledge about all the
words you know
• Organized phonologically
– Word-initial cohort together
• Distinct from Semantic Memory
• Are bilinguals any different from monolinguals?
How is bilingual memory organized?
General agreement on the separation of
lexicon(s) and semantic memory.
– Dog and chien access same semantic
network, because both prime cat in French-English bilinguals.
Whether there is a distinct lexicon for each
language is controversial
Why study the bilingual lexicon?
• Not really a special case, worldwide
• Mapping between words and meanings
• Mapping between phonology and
– Are words from multiple languages in word-initial cohort?
– If not, can we also limit by topic domain—e.g.,
no physics words in history class?
Two Opposite Hypotheses
1. Bilinguals have 2 distinct
lexicons; tri-linguals have 3, &
so on.
2. Everyone has a single lexicon
– How do we keep languages straight?
Possible Evidence for
Separate Lexicons
• Lack of repetition priming across
languages: chien doesn’t prime dog like
dog primes dog.
– But couch doesn’t prime sofa like sofa primes
sofa either
Possible Evidence for
Separate Lexicons
• Release from PI
– In a single language
• It is difficult to recall an item that occurs late on a list when it
is preceded by a lot of similar items. The earlier items cause
proactive (as opposed to retroactive) interference.
apple, pear, peach, orange, pineapple…
As the list increases in length, likelihood of remembering a
late-occurring item decreases, unless it is from a new
semantic category
apple, pear, peach, orange, fireman…
– The same release from PI occurs w/a language change
pear, peach, orange, pineapple, manzana…
Possible Evidence for Combined Lexicon
• In comprehension, word-initial cohort
includes candidates from both languages
• In production, code-switching mid-sentence
– Just use the best word, regardless of language
– But only if your listener knows both
languages too!
• Lexical access vs. lexical selection
What about Cog-Neuro evidence?
• Patterns of aphasia in bilingual and
multi-lingual speakers
• Pre-operative brain stimulation
• Imaging (PET, fMRI) and ERP studies
Patterns of Recovery in Aphasia
Fabbro (1999)
• 40% L1 and L2 recover in parallel
• 32% L1 > L2
• 28% L2 > L1
Imaging Dutch-French-English trilinguals in Belgium (Vingerhoets et al., 2003)
•Picture naming, word fluency and paragraph comprehension tasks
•All tasks revealed “overlapping” regions for the 3 languages
•L2’s show activation in more areas and “more extensive recruitment” of areas activated by L1
Word Fluency Task: Covertly generate as many words
as possible beginning with a specified letter.
Lexical Access in Speaking
There is currently enthusiasm for single-store
models or partially-overlapping lexicons.
When preparing an utterance for output, how then
does the bilingual activate only words from a
single language?
*automatic, parallel access [concept – words]
*deliberate selection mechanism [best word]
Some insights from Bilinguals
• More evidence for separation of semantic
memory & lexicon
• More evidence for automaticity of lexical access
in both comprehension & production
• Distinction between lexical access & lexical selection
– So we may activate physics words in history class,
and then filter them out
Brain potential and functional
MRI evidence for how to
handle two languages with
one brain
Rodriguez-Fornells et al