Morphological Decomposition in Word Recognition

Decomposition to the Root:
MEG Studies of Morphologically
Complex Words
Alec Marantz
Olla Solomyak, Ehren Reilly
NYU Depts. of Linguistics and Psychology
KIT/NYU MEG Joint Research Lab
Decomposition to the root
(why the morphologist cares about lexical access)
• Claim associated now with Distributed Morphology:
– all “lexical categories” decompose at least to a root and a
category-determining affix
– all relations between words or morphemes (e.g., blocking
relations) are computed at the syntactic level of terminal
nodes. Thus a single item (e.g., undecomposed irregular
past tense “gave”) cannot “compete” with a complex
structure (e.g., [give [pst]]) (Embick & Marantz 2008)
– the grammar itself demands full decomposition to the root
– the existence of “whole word” routes to lexical access or
processing would necessitate a different grammatical
system for processing language as opposed to, say,
computing grammaticality
• Tracking the “ami” in “amiable,” then, is one step along the
way toward understanding how the root “cat” functions inside
“cat”
[adj √ami [adj -able]]
[n √cat [n ø]]
(Overly) Simplified Models of Lexical Access:
Pinker’s Words and Rules
• Full storage model: all complex words (walked,
taught) stored and accessed as wholes
– only surface frequency effects predicted
– Reaction Time (RT) correlates with the surface frequency
of a complex word
• Full decomposition model: no complex words stored
and accessed as wholes
– only stem frequency effects predicted
– RT correlates with the frequency of the stem of a complex
word, not the frequency of the word as a whole (surface
frequency)
• Dual Route Model (Pinker’s): irregular complex
forms (taught) are stored and accessed as wholes;
regular complex forms (walked) are not:
– surface frequency effects for irregulars (and high
frequency regulars) = RT to taught correlates with freq of
taught, not teach
– stem frequency but no surface frequency effects on access
for regulars = RT to walked correlates with freq of walk,
not walked
• There are Stem Frequency effects in access for
complex words
– RT to walked does correlate with freq of walk
• These effects are not attributable to postaccess decomposition
• But, surface frequency effects in lexical access
are found in a wide variety of cases, including
completely regular morphology (e.g., for most
inflected words in Finnish)
E.g.:
• Surface frequency effects even for transparent
productive regular morphology like -less and for
the same words that yield base frequency effects
– surface frequency effects when surface frequency is varied
and base frequency is held constant
– base frequency effects when base frequency is varied and
surface frequency is held constant
Additional Problems for Pinker-style Dual Route
Model
• The representation of irregular derived or
inflected forms must be complex
– from the grammatical point of view, gave is as
complex as walked
• no further affixation: *the gaving, *the walkeding
(note: Pinker’s appeal to irregular plurals inside
compounds highlights his incorrect prediction here –
mice eater, but *micey (mousey))
• alternations with do support: Did he walk/*walked, Did
he give/*gave
– from the psycho- and neurolinguistic point of view,
irregulars contain the stem in the same way that
regulars do
• taught-teach identity priming in long-lag priming (only
identity (“morphological”) relations - not semantic nor
phonological - survive in long distance priming)
• and for M350 brain response (e.g., Stockall & Marantz
2006)
– taught-teach M350 (~N400) priming equivalent to identity
priming, although RT priming is reduced
Whole Word “Representations” for Regulars, if
Surface Frequency effects imply whole word
representations
(in some sense)
• Surface frequency effects on access are seen for a variety of
completely regular derivations and inflections, implying whole
word representations, in some sense
• Obligatory decomposition:
– surface frequency effects could be tied to decomposition (the more
you’ve decomposed a particular letter/sound sequence into stem and
affix, the faster you are at it) and/or
– recombination (the more often you’ve put together a particular stem
and affix, the faster you are at it)
– in either case, against Pinker’s dual route model, such effects imply
representation of whole word as complex structure, regardless of
regularity
• walked may be “stored” as a complex form with a certain
frequency in the same way that a saying like, And now for
something completely different, is
• That is, any surface frequency effect may be connected to
long-term effects of having computed a complex form and
thus imply a “representation” of the complex form, no matter
how regular
• This “usage-based” account of frequency effects holds no
immediate implications for the grammar of morphologically
complex words, nor for the issue of whether all complex
words are recognized via decomposition (and recomposition)
[Figure: levels of representation in lexical access, illustrated with “unreal”]
• Encyclopedia: stored info about encountered items (outside language system)
– “And now for something completely different”, [un[real]], White House
• Lemma (lexical entry)
– un (“not”), REAL, UN+REAL (??)
• Modality specific access lexicon (visual word form)
– un, real, unreal (??)
• Form code (letters)
– u n r e a l
Interactive dual route models and obligatory decomposition models differ on
the possible presence of complex word forms in modality specific access
lexicons, and perhaps on whether derived forms have “lexical entries”
Differences Between Realistic Dual Route Model and
Realistic Full Decomposition Model
• Both models require a (modality specific)
word form “lexicon”
– for full decomposition model, this lexicon holds
only forms of morphemes
– for dual route model, this lexicon holds some
morphologically complex forms
• Dual Route but not Full Decomposition model
allows whole word lexical entries and word
form entries for morphologically complex
forms
Stages of Lexical Access:
which computations in a Full Decomposition
Model affect RT?
• I. Decomposition (affix-stripping): no general
effect on RT
– Taft: cost-free
– Literature: no evidence that ease or difficulty in
affix stripping generally correlates with change in
RT
– MEG studies (to be discussed): brain activity
correlated with decomposition does not correlate
with RT (more brain work associated with
decomposition does not yield longer RTs)
• II. Lemma access: frequency of “lemma” (stem)
correlates with RT
– Lemma (stem) access is modulated by frequency and by
priming
– Morphological family size of a stem and number of related
senses (polysemy) have been shown to modulate brain
activity associated with lemma access at the same brain
time/place (the “M350”) as stem frequency
– However, the relationship between an affix and a stem for
a morphologically complex word has not been shown to
affect the same brain response
• III. Recomposition: surface frequency statistics
correlate with RT because of their role in
determining the ease of recomposition of stem and
affixes
– So, whole word “representations” (in the sense of
“Encyclopedia” storage or simply in the sense of
repeatedly used neural pathways) are accessed via
decomposition and recomposition, where the surface
frequency properties of these representations exert a late
influence on lexical access
Sequential processing of words
Pylkkänen and Marantz, 2003, Trends in Cognitive Sciences
Latency of M350 is sensitive to lexical factors such as lexical
frequency and repetition, and so reflects the stage of lexical access:
• Frequency (Embick, Hackl, Schaeffer, Kelepir, Marantz,
Cognitive Brain Research, 2001)
• Repetition (Pylkkänen, Stringfellow, Flagg, Marantz,
Biomag2000 Proceedings, 2000)
Full Decomposition Model Related to MEG
response components
• M100 (“Type I” Tarkiainen et al.) response from
primary visual areas
– visual feature analysis
• M130 (“Type II”) response from occipital-temporal
junction
– abstract letter string analysis
• M170 (“visual word form area”) response from
fusiform area
– affix stripping and functional morpheme identification
– visual word form recognition
Regions of interest derived from peak activity in grand averaged data
across subjects
• M350 (early “N400m”) response from
temporal lobe, with possible (likely)
contribution from inferior frontal cortex
– lemma activation
• Post-M350 N400m response from temporal
lobe (and other regions)
– recombination of stem and affix, contact with
Encyclopedic knowledge, integration into context
Statistical Connections between
Stem and Affix
• J. Hay proposes that the transition probability
of the affix given the stem (so, from stem to
affix) should correlate with ease of
decomposition: the higher this probability,
the harder the decomposition and the more
“affix dominant” a complex word is
• The transition probability of the stem given the affix
(from affix to stem), on the other hand, could reflect
the ease of recomposition.
– Note that for all but the most frequent regular English past
tense verbs, the probability of the stem given the past
tense suffix is vanishingly small.
– If RT that seems to correlate with surface frequency is
actually correlating with the transition probability from
affix to stem, this could explain why regular formations in
English do not show surface frequency effects unless the
frequencies are very high.
Transition Probabilities & Affix
dominance
• Stem to suffix: (tokens of “merely”) / (tokens of words containing “mere”)
– transition probability from stem to suffix
correlates with ratio of a suffixed word’s
frequency to frequency of words with the
same stem, which is essentially
equivalent to “affix dominance”
• Suffix to stem: (tokens of “merely”) / (tokens of words with -ly)
– transition probability from suffix to stem
correlates with ratio of a suffixed word’s
frequency to the frequency of words
with the same suffix
hypothetical example:
matched for stem frequency (9), difference in surface dominant
(mere(ly)) or stem dominant (sane(ly))
• mere(ly): 4 tokens of “mere”, 5 tokens of “merely”
• sane(ly): 8 tokens of “sane”, 1 token of “sanely”
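The two ratios can be worked through on the hypothetical mere/sane counts above. The following is a minimal sketch, not the procedure used in the studies: the function names are invented for illustration, and matching stems with `startswith` and suffixes with `endswith` is a simplifying assumption that only works for this toy corpus.

```python
from collections import Counter

# Hypothetical toy corpus from the slide: 9 tokens per stem.
tokens = ["mere"] * 4 + ["merely"] * 5 + ["sane"] * 8 + ["sanely"] * 1
counts = Counter(tokens)

def stem_to_affix(stem, affix):
    """Tokens of stem+affix over all tokens containing the stem (assumed: prefix match)."""
    stem_total = sum(n for word, n in counts.items() if word.startswith(stem))
    return counts[stem + affix] / stem_total

def affix_to_stem(stem, affix):
    """Tokens of stem+affix over all tokens ending in the affix (assumed: suffix match)."""
    affix_total = sum(n for word, n in counts.items() if word.endswith(affix))
    return counts[stem + affix] / affix_total

for stem in ("mere", "sane"):
    print(stem, stem_to_affix(stem, "ly"), affix_to_stem(stem, "ly"))
```

In this toy corpus “merely” comes out affix dominant (5 of 9 tokens of the stem carry -ly) and “sanely” stem dominant (1 of 9), even though the two stems are matched at 9 tokens.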
Effect of “Dominance” on Lexical Access:
view from interactive dual route model
• Hay: affix dominance leads to difficulty in
parsing/decomposition, thus reliance on whole-word
recognition and suppression of decomposition in
favor of whole-word route
• So, words with high affix dominance should not be
recognized via decomposition and should show only
surface frequency effects
Taft (2004): “Morphological Decomposition and the Reverse Base Frequency Effect”
Obligatory decomposition makes predictions similar to those of the
interactive Dual Route model for RT in lexical decision
• Base frequency effects…
• RT to complex word correlates with freq of stem
• …reflect accessing the stem of morphologically
complex forms whereas
• Surface frequency effects…
• RT to complex word correlates with freq of complex word
• …reflect the stage of checking the recombination of
stem and stripped affix for existence and/or well-formedness.
How can we distinguish these accounts
of RT differences?
• With brain evidence for the various stages of
lexical access leading up to the RT
– Interactive dual route models: no base frequency
effects at lexical access for affix-dominant words
– Full decomposition: base frequency effects across
affix- and stem-dominant words at lexical access
followed by surface frequency effects in RT
associated with recombination
Reilly, Badecker & Marantz 2006 (Mental Lexicon):
Experiment: parallel behavioral and MEG
processing measures
• Lexical Manipulation (Baayen, Dijkstra & Schreuder, 1997,
JML)
– Lemma/stem frequency (CELEX database)
– Stem vs. affix dominance
Stem Frequency    Stem Dominant = low surface freq    Affix Dominant = high surface freq
High              desk – desks                        crop – crops
Mid               deck – decks                        cliff – cliffs
Low               chef – chefs                        chord – chords
Stimuli: 3 Lexical Categories
fully productive morphology
• Nouns: singular/plural
– bone
– bones
• Verbs:
stem/progressive
– chop
– chopping
• Adjectives: adjective/-ly adverb
– clear
– clearly
Experiment: behavioral measures
• Reliable effect of stem frequency in RT in lexical
decision
[Figure: Response Time (ms) by Stem Frequency (High, Medium, Low)]
Experiment: behavioral measures
• Interacting effects on RT of affixation (base vs.
affixed) and dominance (base-dominant vs. affix-dominant)
[Figure: Response Time (ms) by Affixation (Unaffixed, Affixed) for Base-Dominant and Affix-Dominant words]
This is a surface frequency effect for completely regular morphology.
Same words, both base and surface frequency effects, undermining Pinker theory
M350 sensors chosen subject by subject
Analysis of M350 peak latency
(brain index of lexical access)
• Reliable effect of Stem frequency for
unaffixed words and for affixed words
[Figure: M350 Peak Latency by Stem Frequency (High, Medium, Low); panels: Unaffixed Words, Affixed Words]
Analysis of M350 peak latency
• No effect of Dominance (base-dominant vs. affix-dominant), that is, no effect of surface frequency, on M350 peak
latency. This goes against the prediction of interactive dual route theory
[Figure: M350 Peak Latency (ms) for Affix Dominant and Base Dominant Affixed Words]
Analysis of M350 peak latency
• No interaction between Dominance (base-dominant vs. affix-dominant) and Affixation (base vs. affixed) on M350 peak latency
[Figure: two panels, Behavioral RT and M350 Peak Latency (ms), each by Affixation (Unaffixed, Affixed) for Base-Dominant and Affix-Dominant words]
Analysis of M350 peak latency
• Evidence that early stages of access for affixed words are
based on full parsing: Stem frequency affects
M350/lexical access while whole word frequency affects
post-access (recombination) stage of word recognition.
[Figure: M350 Peak Latency and Residual RT for Base-Dominant and Affix-Dominant Affixed Words; axis from 0 to 800]
But what about evidence for parsing and
recombination?
RMS Correlations Across Subjects
• For some set of sensors, calculate at each time point
in each experimental “epoch” the root mean square
(RMS) = the square root of the mean of the squares
of the values at each sensor (after normalization of
values)
• So, for each subject, for each item, an RMS “wave”
can be provided for the correlational analysis
• At each time point, the RMS value for each stimulus
is correlated with a stimulus variable
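The RMS and correlation steps just described can be sketched in a few lines. This is an illustration under stated assumptions, not the lab's analysis code: the data layout (one list of per-sensor time series per stimulus), the function names, and the tiny data set are all made up, and a plain Pearson correlation is assumed because the slides do not name the statistic.

```python
import math

def rms_wave(epoch):
    """epoch: list of per-sensor time series (assumed already normalized).
    Returns one RMS value per time point, computed across sensors."""
    n_sensors = len(epoch)
    n_times = len(epoch[0])
    return [
        math.sqrt(sum(epoch[s][t] ** 2 for s in range(n_sensors)) / n_sensors)
        for t in range(n_times)
    ]

def pearson(xs, ys):
    """Plain Pearson correlation (assumed statistic)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_wave(epochs, stimulus_values):
    """At each time point, correlate RMS across stimuli with a stimulus variable."""
    waves = [rms_wave(epoch) for epoch in epochs]
    n_times = len(waves[0])
    return [pearson([w[t] for w in waves], stimulus_values) for t in range(n_times)]

# Made-up data: 3 stimuli, 2 sensors, 2 time points.
epochs = [
    [[3.0, 1.0], [4.0, 1.0]],
    [[6.0, 2.0], [8.0, 2.0]],
    [[9.0, 1.0], [12.0, 1.0]],
]
stimulus_values = [1.0, 2.0, 3.0]
corr = correlation_wave(epochs, stimulus_values)
print(corr)
```

The result is one correlation value per time point, which is what gets inspected around the M170 and M350 latencies.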
Grand Average All Stimuli All Subjects (11)
M170 sensors chosen on the basis of field
pattern, subject by subject
M170 Correlation with Dominance:
Significant “parsing” effect
The higher the transition probability from stem to affix, the higher the
M170 amplitude – for affix-dominant words
Recombination Effect?:
Correlation with Conditional Probability of Stem, Given Affix, for
Affixed Words at 450ms, after the M350
Summary of Dominance Exp
• Base and Surface Freq RT effects for the same words again argue
against simplistic (Pinker) Dual Route theory
• Affix dominance effect at M170 for high affix dominant words
argues against Hay’s interactive Dual Route theory, where
such words should be accessed via the whole word route, as
does the lack of M350 latency effects for these words
• M350 latency effects for stem frequency but not surface
frequency (and not affix dominance), followed by an effect of
transition probability from affix to stem post M350, argue
that recombination dominates the RT effect for surface frequency
of affixed words
Evidence for an orthographic word
form lexicon
• Frequency of stem relative to full affixed form – affix
dominance – correlates with M170 amplitude;
implies access to some kind of stem representation
• Zweig & Pylkkänen (2008) show M170 effect of
decomposition in the contrast between farmer
(complex) and winter (simple), where the contrast
implies access to a representation of farm at the
M170 (wint lacks a representation)
Zweig & Pylkkänen (2008, LCP)
Bimorphemic: farmer, Monomorphemic Orth: winter
Modality-Specific Access Lexicon?
• Pulvermüller in a number of studies has found early
(~150ms) word frequency effects in evoked brain
responses in the posterior brain regions
• These are found for monomorphemic words, and the
effects seem limited to shorter words
• These could be explained by higher order n-gram
frequencies - by the frequencies of letter strings, i.e.,
by features of word form representations that do not
make contact with the (semantic) lexicon
Modality-Specific Access Lexicon?
• “Parsing” at the M170 requires access to word forms (or to
high-n n-grams)
• Dominance effects at the M170 suggest frequency
information associated with word-forms
– dominance reflects the conditional probability of the affix
given the stem, where notion of “stem” implies form
representation of the stem
• Difference between visual word form representation and
lexical entry?
– heteronyms like “wind” (“moving air” vs. “twist”)
– visual word form frequency is not the same as lexical
frequency
– “wind” has one word form frequency but two lexical
frequencies, one for each meaning
Lexical access in early stages of visual word
processing: A single-trial correlational MEG study of
heteronym recognition
Marantz & Solomyak (2008, Brain & Language)
• All (20) monomorphemic heteronyms (meeting other
criteria) of English
• If M170 marks access to visual word form
representations, but not lexical entries, then only
form frequency variables associated with
heteronyms should correlate with M170 brain
activity
• If M170 marks lexical access, relative frequency of
the 2 pronunciations of heteronyms should correlate
with activity
Regions of interest derived from peak activity in grand averaged data
across subjects
Visual Word Form Area
Left Hemisphere — Ventral View
• The white point represents the peak of the Visual Word Form Area, as
identified by Cohen et al. (2002)
• The yellow line outlines the region of peak M170 activation in an average of
9 subjects’ brain activity.
Mean Activity in LH M170 Region for 9 Subjects
(Dotted line shows average across subjects)
Grand averaged activation over time from M170 and M350 ROIs
Only the form property (~bigram frequency) showed significant correlation
with brain activity in the M170 ROI while only the semantic property (ratio
of frequency of meanings) showed a significant correlation in the M350
ROI
A Monte Carlo procedure was used to test for significance in the face of
multiple comparisons (across time points)
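The slides do not say which Monte Carlo procedure was used. One common way to handle multiple comparisons across time points is a max-statistic permutation test, sketched below as an assumption rather than as the study's method; all names and the simulated data are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def max_abs_corr(waves, values):
    """Largest absolute correlation over all time points."""
    n_times = len(waves[0])
    return max(abs(pearson([w[t] for w in waves], values)) for t in range(n_times))

def monte_carlo_p(waves, values, n_perm=1000, seed=0):
    """Shuffle the stimulus variable and compare the observed maximum
    against the permutation distribution of maxima."""
    rng = random.Random(seed)
    observed = max_abs_corr(waves, values)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_abs_corr(waves, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Simulated data: 12 stimuli, 5 time points, with an effect planted at time point 2.
data_rng = random.Random(1)
values = [float(i) for i in range(12)]
waves = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in values]
for wave, value in zip(waves, values):
    wave[2] += value
p_value = monte_carlo_p(waves, values)
print(p_value)
```

Because each permutation contributes only its maximum across time points, a single p-value covers the whole time window, whichever time point drives the effect.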
Evidence so Far
• Decomposition even for “affix-dominant”
words
– evidence at M170 that high transition probability
between stem and affix makes affix-stripping
harder
– evidence post-M350 and at RT that surface
frequency makes recomposition easier
• Evidence for “visual word form lexicon”
accessed at M170
– transition probability effects at M170 depend on
frequencies over word form representations
– complexity effects at M170 (Zweig & Pylkkänen)
depend on wint vs. farm word form contrast:
farm is a word form but wint (in winter) isn’t
• Evidence that word form effects involve word
forms, not lexical entries
– open bi-gram frequency (representational form
for word forms) correlates with activity at M170
– but frequency ratio for heteronyms doesn’t
correlate with activity at M170
– but does correlate with activity at M350
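For concreteness, here is one way an open bigram measure could be computed. It is a sketch under assumptions: the slides do not define the measure, so the gap limit (`max_gap`), the token-weighted counting, and the two-word lexicon with its counts are all hypothetical choices for illustration.

```python
from collections import Counter

def open_bigrams(word, max_gap=2):
    """Ordered letter pairs with at most max_gap letters between them (assumed definition)."""
    pairs = []
    for i in range(len(word)):
        for j in range(i + 1, min(i + max_gap + 2, len(word))):
            pairs.append(word[i] + word[j])
    return pairs

# Hypothetical mini-lexicon with made-up token counts.
lexicon = {"wind": 10, "wine": 5}

# Token-weighted open bigram counts over the lexicon.
bigram_counts = Counter()
for word, freq in lexicon.items():
    for pair in open_bigrams(word):
        bigram_counts[pair] += freq

def mean_open_bigram_frequency(word):
    """Average count of a word's open bigrams: a form-level, not lexical, frequency."""
    pairs = open_bigrams(word)
    return sum(bigram_counts[p] for p in pairs) / len(pairs)

print(open_bigrams("wind"))
print(mean_open_bigram_frequency("wind"))
```

A measure of this kind depends only on the letter string, so both readings of a heteronym like “wind” get the same value, in contrast to the frequency ratio of the two meanings.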
What about the status of bound stems?
Can MEG help settle a disputed linguistic issue?
• Bound stem: durable
– same root in duration
– predicts durability
• Unique stem: amiable
– no other uses of root
– but, predicts amiability
tracking the -able in amiable
• If words like durable with a recurring root and
amiable with a unique root are nevertheless
parsed and computed in the same way as workable,
which has a free root, then
– M170 “parsing” effects should be visible for these
“opaque” words, since effects are strong for affix-dominant words
– M350 effects should be observed for stem
frequency for bound stems
Crucial contrasts:
• To show effect of affix processing, need to show
correlation with, e.g., affix frequency that is not
equally explained by the positional frequency of the
letters at end of the affixed word
– distinguish “able” as affix from “a-b-l-e”
• To show effect of the “parsing variable,” the transition
probability of affix given the stem, need to show
correlation with transition probability that is not
equally explained by the transition probability
between the last letters of the stem and letters of
the suffix.
Categories of Affixed Words for New
Experiment
• 1. Free Root-Affix
– taxable
• 2. Bound Root-Affix
– tolerable
• 3. Unique Root-Affix
– capable
• Morphological parsing taken from the English
Lexicon Project
Nine Affixes
(All derivational suffixes in English that yielded reasonable
number of examples for each category)
• able
• ary
• ant
• ity
• ate
• ic
• er
• al
• ion
The ROIs determined again from the grand average
across subjects.
Decomposition Effects at M170
• Positional letter string freq effects at M130
• Affix freq effects but no letter string effects at M170
• Morphological transition probability effects but no orthographic
transition probability effects at M170
– Multiple regression: first taking out the (non-significant) effect of
orthographic parsability leaves a significant effect of morphological
parsability at M170
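The logic of taking out the orthographic variable first can be sketched as a two-step regression. This is an assumed simplification, not the reported analysis: the numbers are invented, the names are hypothetical, and only the brain measure is residualized, whereas a full multiple regression would enter both predictors together.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def residuals(xs, ys):
    """What is left of ys after removing its linear relation to xs."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Invented per-item values: orthographic and morphological transition probability.
orth_tp = [0.1, 0.4, 0.2, 0.5, 0.3, 0.6]
morph_tp = [0.2, 0.1, 0.6, 0.3, 0.5, 0.4]
# Invented M170 activity that depends on the morphological variable only.
m170_activity = [1.0 + 2.0 * m for m in morph_tp]

# Step 1: take out the orthographic variable.
leftover = residuals(orth_tp, m170_activity)
# Step 2: ask whether the morphological variable explains what remains.
morph_slope, _ = fit_line(morph_tp, leftover)
print(morph_slope)
```

If the morphological slope stays clearly nonzero after step 1, the effect is not equally explained by the letter-level transition probability, which is the contrast the design requires.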
Summary
• At M130, form property of final letter frequency
correlates with activity
• At M170, affix frequency but not final letter
frequency correlates with activity for all groups,
including bound and unique root groups
• At M170 transition probability between stem and
affix, but not between last letters of stem and letters
of affix, correlates with activity for both free and
bound stems
• At M350, stem frequency effects for both free and
bound root stems
bound stems
• For transition probability results, bound stems
pattern with free stems
• For affix frequency results, all stems, including
unique bound stems, pattern alike
• Thus we find evidence for full decomposition
for free, bound, and unique stems
Conclusions
• All evidence massively disconfirms a Pinker-style dual
route theory in which some morphologically complex
words are recognized as undecomposed wholes
• Full Decomposition theories of lexical access are
completely consistent with (in fact predict) surface
frequency effects for morphologically complex words
• Surface frequency effects reflect statistics of
composition rather than the frequency of whole
word access
• MEG data confirm the existence of a visual
word form lexicon that enters into
morphological decomposition in the
recognition of complex words
• MEG confirms the morphologist’s claim that
decomposition extends to bound and unique
roots
Thanks to the audience and the colloquium
organizers!