Brain Waves and Button Presses: The Role for Experiments in Theoretical Linguistics
Alec Marantz, Utrecht, August 29, 2003
My title raises the issue of the relevance for so-called theoretical linguistics of
experiments in neuro and psycholinguistics. By theoretical linguistics, I mean the theory
of grammar, that is, the theory of the mental representations of language in the mind and
brains of speakers. I run an MEG laboratory at MIT, in which we use magnetic sensors
to monitor brain responses of subjects, and I’m often asked what the MEG studies might
have to do with my theoretical work in morphology and syntax.
Now, on the one hand, it might seem obvious that if theoretical linguistics is about the
grammars in the minds and brains of speakers, brain responses from these speakers
should be relevant to the enterprise. On the other hand, at least those of us
who grew up linguistically in the ’60s and ’70s were taught to respect the crucial
competence/performance distinction. [SLIDE] We study competence, but brain waves
would seem to weigh in more on the performance side. In particular, neuro and
psycholinguistic data would be data about language use rather than about linguistic
representations.
Well, what about the competence/performance distinction? Recall that Chomsky framed
this distinction to explain the difference between cognitive approaches to language and
the approaches of the structuralist linguists and the behaviorist psychologists.
Performance theories for the structuralists and behaviorists weren’t cognitive theories of
language use but accounts of the behavior itself (about corpora of utterances, for
example). So a structuralist grammar was supposed to be a compact reduction of a
corpus of sentences. That is, a “competence” theory isn’t a theory devised without
consideration of linguistic performance; performance remains the data relevant for deciding between
competing theories. Rather, competence theories, which are about internal representations, are
contrasted with performance theories, which are reductions of or models of behavior.
Although Chomsky recast the competence/performance distinction into the I-language/E-language contrast to avoid some confusions, the original distinction is still important, and
should be kept in mind when evaluating the neo-behaviorist theories of some
connectionists. A connectionist model of English past-tense formation, for example, is a
performance theory. The internal representations of the model aren’t representations of
knowledge of language – for example, knowledge of verbs or of tense – but rather are
representations geared to modeling performance, the performance of connecting stem and
past tense forms of verbs. To the extent that any theory of the knowledge of language is
involved in such models, it is embodied in the structure of the input and output nodes to
the model, and here the structure is built-in and presupposed, not part of the model at
issue.
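
To make concrete where the linguistic assumptions sit in such a model, here is a minimal sketch (my own toy illustration, not any published connectionist model) of a network trained to connect stems with their past-tense forms. Everything in it is invented for illustration: the verbs, the fixed letter-position encoding, the network size. The point is only that the encode and decode schemes for the input and output nodes are hand-built and presupposed, while the learned weights model nothing but the stem-to-past-tense performance.

```python
import numpy as np

# Toy sketch (no published model): a one-hidden-layer network mapping a hand-built
# encoding of a verb stem onto a hand-built encoding of its past-tense form.
rng = np.random.default_rng(0)
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
MAX_LEN = 8  # forms are padded to a fixed length

def encode(form: str) -> np.ndarray:
    """Presupposed input/output scheme: one-hot letters in fixed positions.
    Whatever 'knowledge of language' the model embodies lives here, not in the weights."""
    grid = np.zeros((MAX_LEN, len(ALPHABET)))
    for i, ch in enumerate(form[:MAX_LEN]):
        grid[i, ALPHABET.index(ch)] = 1.0
    return grid.ravel()

def decode(vec: np.ndarray) -> str:
    """Read letters back out of the fixed positional code; stop at 'empty' positions."""
    grid = vec.reshape(MAX_LEN, len(ALPHABET))
    letters = []
    for row in grid:
        if row.max() < 0.5:
            break
        letters.append(ALPHABET[row.argmax()])
    return "".join(letters)

# Invented training pairs; the network only models the stem -> past-tense performance.
pairs = [("walk", "walked"), ("jump", "jumped"), ("give", "gave"), ("sing", "sang")]
X = np.stack([encode(stem) for stem, _ in pairs])
Y = np.stack([encode(past) for _, past in pairs])

W1 = rng.normal(0.0, 0.1, (X.shape[1], 64))  # input -> hidden weights
W2 = rng.normal(0.0, 0.1, (64, Y.shape[1]))  # hidden -> output weights
for _ in range(2000):  # plain gradient descent on squared error
    hidden = np.tanh(X @ W1)
    err = hidden @ W2 - Y
    grad_W2 = hidden.T @ err
    grad_W1 = X.T @ ((err @ W2.T) * (1.0 - hidden ** 2))
    W1 -= 0.01 * grad_W1
    W2 -= 0.01 * grad_W2

print(decode(np.tanh(encode("walk") @ W1) @ W2))  # should reproduce "walked" after training
```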
[SLIDE] A well-trained graduate of my Department should tell you that, for linguistic
theory, data are data – there is no special status associated with judgments of
grammaticality or judgments of logical entailment. Every piece of data used by linguists
must be the product of the linguistic knowledge of a speaker interacting with various
“performance” systems. We test theories by making assumptions about these
performance systems and trying to abstract away from their effects on language use.
Despite a general understanding of the relationship between data and theory in
linguistics, there are still some lingering suspicions and misunderstandings. Let’s first
address some of the misunderstandings. I work on issues in lexical access, and
frequencies are an important factor in any behavioral response to words. [SLIDE] In
general, of course, the more frequent a word or piece of a word is, the faster a speaker’s
response to the word. I’ve been told by some that frequency effects are irrelevant to
competence theory. Or, alternatively, I’ve been told by others that frequencies must be
represented in the grammar to account for behavior. According to these critics, linguistic
theory is deficient in ignoring frequency in accounting for linguistic knowledge.
I hope it’s clear to this audience that nothing’s at stake here with respect to frequencies,
at least given our current knowledge of how frequency affects behavior. If we represent a
word’s frequency in the grammar by coding the frequency in the size of the
representation, would frequency be part of the linguistic representation? Clearly, the
important issue for linguistic representation is whether the frequency of the elements in
linguistic computation interacts with something else in an interesting way. If no rules,
constraints, or principles of grammar make direct reference to frequency, and if the effect
of frequency on computation is some function, linear or non-linear, of the frequency of
the pieces computed over, then the linguist is free to abstract from frequency in the theory
of linguistic representations and computations. And it’s not an issue whether or not
frequency is part of linguistic representations. On the other hand, since frequency effects
are so robust and consistent, experiments can exploit the frequency variable to probe
issues of representation – as I’ll illustrate later.
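
To give a concrete (and entirely fabricated) sense of how the frequency variable can probe representation, here is a sketch in which simulated lexical-decision reaction times are regressed on surface-form frequency versus stem frequency. The data, effect sizes, and noise levels are invented; the logic is that if reaction times to a form like “walked” track the frequency of the stem WALK summed across its forms better than the frequency of “walked” itself, that pattern argues for decomposition.

```python
import numpy as np

# Fabricated data, invented effect sizes: a sketch of using frequency to probe
# whether a form like "walked" is accessed via its stem.  The fake RTs are generated
# so that stem frequency, not surface frequency, drives the response.
rng = np.random.default_rng(1)
n = 200
log_stem_freq = rng.uniform(1.0, 5.0, n)                      # log frequency of the stem
log_surface_freq = log_stem_freq - rng.uniform(0.2, 1.5, n)   # each inflected form is rarer
rt = 700.0 - 40.0 * log_stem_freq + rng.normal(0.0, 25.0, n)  # ms; higher frequency, faster RT

def fit(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Least-squares slope of y on x and the correlation coefficient."""
    slope = np.polyfit(x, y, 1)[0]
    r = np.corrcoef(x, y)[0, 1]
    return slope, r

for label, x in [("surface frequency", log_surface_freq), ("stem frequency", log_stem_freq)]:
    slope, r = fit(x, rt)
    print(f"{label:>17}: {slope:6.1f} ms per log unit, r = {r:.2f}")
# On data like these, RT tracks stem frequency more tightly than surface frequency,
# the kind of contrast a real experiment would use to argue for decomposition.
```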
Other misunderstandings include the notion that since, in some theories of grammar,
grammaticality vs. ungrammaticality can be a categorical distinction, linguists treat
judgments of grammaticality as a categorical response. Whatever judgments of
grammaticality are, they are, like any behavioral data, only indirectly related to the
computational system of grammar in speakers and don’t represent special access to the
principles or representations of language. For example, a linguistic theory might claim
that the structure shown here for “gloriosity” is ungrammatical while the similar structure
here for “gloriousness” is grammatical. However, this doesn’t directly predict that
speakers will judge the sound or letter string “gloriosity” as bad and the string
“gloriousness” as good. Rather, predictions about speakers’ judgments must involve both
the theory of grammar and the theory of the task presented to the speaker. I take this
point as rather obvious, but it’s often missed in the literature, at least in the heat of battle.
For example, in a recent article, Mark Seidenberg and colleagues fault a linguistic
account of the contrast between “rats eater” and “mice eater” because the account of why
“rats eater” is ill-formed does not in itself predict that “mice eater” should be less
acceptable than “mouse eater.” That is, they take the categorical label “grammatical” for
“mice eater” to imply that speakers should judge this compound as being as good as any
other grammatical compound.
In general, we can clear away these conceptual impediments to a useful integration of
neuro and psycholinguistic data into theoretical linguistic research. However, someone
unfamiliar with the daily work of a linguist might raise a legitimate suspicion
concerning the relevance of a linguist’s grammar to predictions of behavior and brain
responses in experiments. Language relates sound and meaning. Linguistic grammars
are a formal account of this relation. Generative grammars instantiate the mapping
between sound and meaning via a particular computational mechanism. Suppose a
linguist’s grammar is accurate in that it describes a speaker’s knowledge of the
connection between sound and meaning. [SLIDE] Does that mean that the speaker
couldn’t also have other computational mechanisms to mediate between sound and
meaning, for example, certain performance strategies optimized for particular types of
language use? For example, couldn’t there be separate specialized systems for
comprehension and for production that derive sound/meaning connections independent of
the grammar?
This particular instantiation of a competence/performance distinction would be a rough
description of Tom Bever’s position. In particular, I think it would be fair to characterize
Bever’s position as one in which listeners use special strategies in sentence
comprehension that by-pass the grammar on the way to an interpretation, at least initially.
His slogan is, I believe, “Syntax Last.”
In the area of morphology – the area for which I’ll present some experimental data during
this talk – the notion that speakers can by-pass the grammar in language use most often
surfaces in the proposal that words or other chunks of structure can somehow be
“memorized” and thus that speakers can produce and comprehend these structures as
wholes, without composition or decomposition by the computational system of grammar.
To be honest, I have never seen a version of this proposal – that we memorize certain
complex chunks of structure and access them in production and comprehension as
wholes – that I find even coherent. To take a simple case,
what would it mean to say that we memorize “gave” as a whole, without always
decomposing or composing “gave” into or from a stem GIVE with a past tense
morpheme? Unless “gave” has both GIVE and past tense in it, we can’t account for the
blocking of *gived. As far as I can tell, people who claim that “gave” is memorized while
“walked” is not are simply claiming that the relationship between stem and past tense for
“gave” is different from that for “walked,” and this difference involves things we know
about the stem GIVE from experience with “gave.” Of course no one would disagree
with that kind of statement, but it doesn’t imply any kind of “dual route” to linguistic
representations.
Suppose we try to take seriously the claim that there are multiple routes to linguistic
representations such that the linguist’s grammar would not describe the computational
system used in language comprehension or production. Suppose that button-press
reaction time data and brain waves lead to a particular theory of linguistic computations
during language comprehension and production – say something like Bever’s strategies.
If psycholinguists did devise and empirically support such an alternate route to linguistic
representations, the linguist should be both shocked and confused. Shocked because
there would be some alternative computational system for generating linguistic
representations that the linguist hadn’t considered him/herself. How did the
psycholinguist come up with a system that would be descriptively adequate in the sense
of yielding linguistic representations that successfully connect sound and meaning but
using a novel computational mechanism? Confused, because the linguist him/herself has
to believe that s/he is using data from comprehension and production in designing and
evaluating his/her theory. What special data did the linguist have that the psycholinguist
ignored?
To argue for multiple computational systems, one would have to show that there are two
separable types of data such that one computational system would be responsible for one
and the other for the other. If one just squints at the issue, one might suppose that
comprehension and production could provide these dual data sets requiring separate
systems, although as far as I can tell, realistic investigations into this possibility yield the
conclusion that comprehension and production are largely computationally equivalent.
But there is no way to imagine that there’s a set of data associated with the linguist’s
grammar that would be distinct from data connected to computational accounts of
language production and comprehension. The linguist would be very worried if some
psycholinguistic theory successfully provided an account of the connection between
sound and meaning in language that involved computational mechanisms independent of
the linguist’s grammar. If a psycholinguist claimed to have developed such a theory, then
the linguist should take it on as an alternative hypothesis about the computational system
of language in general and put it in competition against his/her own theory. So, for
example, we can argue against Tom Bever’s account of computation during sentence
comprehension by arguing that the representations he claims listeners compute using
strategies do not in fact capture the information that listeners extract from linguistic input
– that is, the representations that the strategies yield are not representations that speakers
of a language actually have in their heads.
When we claim, then, that language involves a single generative engine – a single system
for linguistic computation necessarily involved in the analysis and/or generation of
linguistic representations of all sorts – this claim is meant to cover analysis and generation
in language production and comprehension, in fact, in any situation natural or
experimental in which speakers are required to access or produce language. If someone
were to argue for multiple computational systems, the first step would be to show that
the alternative computation is descriptively adequate, that is, that it actually describes
sound/meaning connections of the sort captured by the linguist’s grammar. If a
psycholinguist proposed such an alternative computational system, the linguist would
take it seriously as a competing theory of grammar. Then the psycholinguist would need
to show that in fact one can separate two sorts of data, where each sort should be
associated with a particular computational mechanism. [I believe that Tom Bever’s
group is the only group of psycholinguists that appreciates the challenges behind
proposing dual systems models of language knowledge and use.]
I have been arguing that linguists cannot treat psycho or neurolinguistic data as special in
any way, as subject to an account in terms of a computational system different from that
proposed by linguists. Every bit of data related to language potentially implicates the
sole computational system for language. Nevertheless, random data of any sort may not
be of particular interest. Why do I believe it’s important to have an MEG machine in my
linguistics department, and why do I applaud the Utrecht Linguistics Institute’s
integration of theoretical and experimental approaches to language?
It’s useful to separate the positive impact of experimental approaches to language on
linguistic theory into three levels: (1) a symbolic importance, a reminder of the
potential testability of competing analyses; (2) a constraint on linguistic theory from
what might be called the logical problem of language use; and (3) clarification of the
concrete mechanisms of language processing in the brain that allows straightforward
interpretation of brain and behavioral data.
I find the symbolic importance of neurolinguistics to be evoked by the following Wallace
Stevens poem. Take “Tennessee” to stand for the MIT linguistics department and the
“jar” to be the MEG machine in the Department.
I placed a jar in Tennessee,
And round it was, upon a hill.
It made the slovenly wilderness
Surround that hill.
The wilderness rose up to it,
And sprawled around, no longer wild.
The jar was round upon the ground
And tall and of a port in air.
It took dominion every where.
The jar was gray and bare.
It did not give of bird or bush,
Like nothing else in Tennessee.
Wallace Stevens
The presence of the MEG machine changes how one thinks about alternative theoretical
possibilities. When theoretical alternatives are being considered, the MEG machine
stands as a reminder that there always can be a fact of the matter about a choice between
theories. That is, to the extent that we want to take our theoretical machinery seriously,
the brain either does it this way or it doesn’t. We may not have any idea at the moment
about the data that might decide between the alternatives, but we can imagine finding a
brain response predicted by one vs. the other theory, at least in principle.
We’re used to constructing linguistic theories under the constraint of the logical problem
of language acquisition – the representational system of a language must be such that a
child could acquire it. Linguists are less accustomed to overtly relying on another
“logical problem” – linguistic computations must be such that they are computable by
speakers. I’m calling this a “logical problem” in a very loose sense – I don’t mean that
linguistic computations must be computable in a mathematical sense. Rather, I mean that
we can reach certain conclusions thinking about what’s necessary for language use
independent of particular experiments. For example, any theory that requires linguistic
computation always to operate with all the lexical items in a sentence is hard to reconcile
with the fact that people seem to start computation with the first words in a sentence and
develop representations to a large extent from first to last – left to right. The locality
domain in which linguistic computations connect sound and meaning, then, should be
much, much smaller than the sentence. Chomsky’s “phase”-based cyclicity passes this
logical test of language use, providing a locality domain that doesn’t obviously fly in the
face of what people seem to be doing.
In order to make predictions about button presses or brain waves in experiments, we need
to map linguistic computations across time and across brain space. Even to get started on
this enterprise, the theory must be practically mappable, i.e., mappable in practice. So,
contemplating doing experiments requires addressing the “logical problem of language
use” through an explicit, although partial, commitment to real-time operation of linguistic
computations.
Finally, the more we actually learn about linguistic computation in the brain, the more
straightforward and natural it becomes to interpret various kinds of behavior or brain data
in connection with linguistic issues. For example, what do we make of reaction time data
in lexical decision experiments? Suppose, for example, that hearing “walked” speeds up
one’s lexical decision to “walk” while hearing “gave” doesn’t speed up one’s lexical
decision to “give.” Without detailed knowledge of how the brain does lexical decision,
one might be tempted to interpret this result as indicating that “walk” and “walked” are
lexically related while “gave” and “give” are not. When we map out lexical access in
time and in brain space, however, we can understand the various priming and inhibitory
reactions that might yield no button press RT priming in the case of gave/give, despite the
fact that processing “gave” involves computations with the same stem as appears in
“give.”
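
A sketch of that last point, with invented millisecond values rather than real measurements: if the button press reflects the sum of a facilitatory effect from the shared stem and an inhibitory effect from competition between the forms, the two can cancel for gave/give even though the stem is shared.

```python
# Invented millisecond values, only to illustrate the arithmetic: the button press
# reflects the sum of facilitatory and inhibitory effects, so a shared stem can be
# hidden behind a null net priming effect.

def net_priming(stem_facilitation_ms: float, form_competition_ms: float) -> float:
    """Net speed-up of the lexical decision to the target (positive = faster)."""
    return stem_facilitation_ms - form_competition_ms

# walked -> walk: the shared stem facilitates, and the regular form barely competes.
print("walked -> walk:", net_priming(stem_facilitation_ms=40.0, form_competition_ms=5.0), "ms")

# gave -> give: the same stem facilitation, but the irregular form competes strongly
# enough with the target to cancel it out at the button press.
print("gave   -> give:", net_priming(stem_facilitation_ms=40.0, form_competition_ms=40.0), "ms")
```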
[GO TO POWERPOINT SLIDES]