A Preliminary Probabilistic Model of Language Processing

Hadjar Homaei
Michael Mozer
James Martin
Department of Computer Science and Institute of Cognitive Science
University of Colorado at Boulder
Boulder, CO 80309
hadjar.homaei@colorado.edu, mozer@colorado.edu, martin@colorado.edu
ABSTRACT
The time course of language processing is a central issue in psycholinguistics. The timing of measures such as eye movements, ERPs, and button presses is used to adjudicate between major classes of theories, and a number of computational models attempt to explicitly capture the dynamics of interacting information sources. The time course of processing is therefore a question of considerable interest.
Despite decades of research on language, we are far from answering these questions. Having only recently begun to appreciate the importance of the statistical properties of the language environment, we now face an unfortunate lack of sufficient data, particularly on the typical exposure of children in everyday life at various stages of development. Although we have acquired vast knowledge about tasks such as priming and lexical decision, we have relatively little information about the major language abilities of comprehension and production.
Keywords
Language processing, Sequential processing, Probabilistic models
1. INTRODUCTION
Symbolic, rule-based models of human language understanding have long been the preference of linguists, mainly because they are straightforward and able to handle the recursive nature of language processing.

Because of their structured knowledge, these models were usually unable to learn; they were simply our knowledge of language, hard-coded for the hardware to process. Interestingly, this closely matches the popularity of the Chomskyan school of linguistics and the belief that the main engines of language processing are innate in the human brain, with not much left to learn after all.
Over the past three decades, however, there has been rising interest in connectionist models of language processing. At first, these models were straightforward implementations of the same symbolic models using simple neural-network processing units. However, as the representational and learning abilities of connectionist theories and systems improved, connectionist models have taken on a noticeably different character from symbolic methods.
2. Issues in Sentence Processing
The term “sentence processing” refers to the series of analyses that leads to understanding sentences while reading. Some classic empirical and theoretical issues in sentence processing are ambiguity resolution, garden path effects, top-down vs. bottom-up parsing, and modularity. But in order to understand how the human brain is able to learn to solve these problems, we have to understand what information is available in the language learner’s environment, what abilities the language learner acquires, what behaviors the learner shows in using those abilities, and, finally, whether a system can learn those abilities in an appropriate environment and what properties such a system must have to show the same behaviors as human learners.
3. Approaches to Sentence Processing
The traditional approach to language research has focused largely
on the task of parsing; that is, constructing a labeled hierarchical
representation of the structure of a sentence, usually reflecting the
grammatical rules used to generate it. Symbolic models excel at
manipulating such representations, but they struggle to
incorporate meaning into sentence processing. It is not easy to
capture slight variations in meaning with variables and values.
One particularly problematic aspect of language is the way that small aspects of word meaning can have a major influence on the correct syntactic interpretation.
Most sentence processing models are designed to address one of
four major language tasks: parsing, comprehension, word
prediction, or production. Because the main goal of our research is word prediction, our brief review focuses on the word prediction literature.
3.1 Connectionist Models
Our focus here is on models that try to explain the semantic and
syntactic issues involved in processing multi-word utterances.
Connectionist models in this area use a wide range of approaches to solve this problem, from localist implementations of symbolic systems, to systems of interacting localist units, to distributed representations with multi-layer learning rules, to recurrent learning systems.
Some of the best connectionist models of sentence processing are those that perform word prediction. Word prediction is a surprisingly useful ability. It can serve as the basis for a language model, which computes the likelihood of a particular utterance occurring in the language, and this likelihood is used in most speech recognition systems to resolve ambiguity. Accurate word prediction is in fact sufficient to define a language model and therefore also reflects knowledge of the grammar of the language. For this reason, word prediction models are sometimes called parsers as well, although we usually reserve that term for a system that can explicitly reveal the syntactic structure of a sentence.
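To make the link between word prediction and language modeling concrete, the sketch below (ours, not drawn from any of the models cited here) scores an utterance by chaining the conditional probability of each word given its history; `predict_next` is a hypothetical stand-in for any word-prediction model, whether an n-gram model or a recurrent network.

```python
import math

def utterance_log_probability(words, predict_next):
    """predict_next(history) returns a dict mapping candidate next words to probabilities."""
    log_p = 0.0
    for i, word in enumerate(words):
        dist = predict_next(words[:i])             # distribution over the next word
        log_p += math.log(dist.get(word, 1e-12))   # small floor for unseen words
    return log_p

# e.g. a trivial prediction model that ignores its history (a unigram model):
unigram = {"the": 0.3, "pitcher": 0.1, "threw": 0.1, "ball": 0.1, "a": 0.2}
print(utterance_log_probability(["the", "pitcher", "threw", "a", "ball"],
                                lambda history: unigram))
```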
The best-known connectionist prediction models are Elman’s models (Elman, 1990, 1991, 1993), which use simple recurrent networks, also known as Elman networks.
Elman (1990) used simple recurrent networks (SRNs) to do letter prediction on a letter sequence of concatenated words. He also showed that this model could be used to detect word boundaries by identifying locations of high entropy in the sequence, where prediction becomes difficult. Later, Elman extended this model to do word prediction in a simple language. The word representations produced in the network’s hidden layer can be clustered to yield a syntactic and semantic classification of words. This shows that much of the knowledge required for parsing and comprehension can be extracted from a word prediction system.
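A minimal sketch of the kind of simple recurrent network used in this work, written in Python with NumPy, is given below; the layer sizes, initialization, and one-hot word coding are our own illustrative assumptions, and training (backpropagating the prediction error) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 20, 10

W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # context -> hidden
W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_sequence(word_ids):
    """Return, for each position, a distribution over the next word."""
    h = np.zeros(hidden_size)             # context layer: copy of the previous hidden state
    predictions = []
    for w in word_ids:
        x = np.zeros(vocab_size)
        x[w] = 1.0                         # one-hot coding of the current word
        h = np.tanh(W_xh @ x + W_hh @ h)   # hidden state mixes input with prior context
        predictions.append(softmax(W_hy @ h))
    return predictions
```

The hidden states produced by `predict_sequence` are also the representations that can be clustered into syntactic and semantic word classes, as described above.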
Elman (1991) extended the word prediction model, originally based on a simple language, to process sentences that could include multiple embedded clauses. The major aim of this work was to show that connectionist models are able to learn and represent complex, hierarchical structure. This is a very important aspect of natural languages that every language processing model must take into account if it is to deal with natural language. As Elman puts it, “The important result of this work is to suggest that the sensitivity to context which is characteristic of many connectionist models, and which is built-in to the architecture of SRNs, does not preclude the ability to capture generalizations which are at a high level of abstraction”.

One major and very interesting finding of this work was that these networks could learn highly complex sentences only if they first started by learning simpler structures. This idea was explored further in Elman (1993), which showed that the networks can also learn complex structures if their memory spans are restricted at the beginning and allowed to expand gradually.
Most other connectionist word prediction models are more or less based on Elman’s networks. Chater and Conkey (1992) compared Elman’s SRN training method to a more complex alternative, backpropagation through time (Rumelhart et al., 1986), which extends the propagation of error derivatives back to the beginning of the sentence. Naturally, they found that backpropagation through time, although a slower and far less “biologically plausible” method, can produce better results than Elman’s SRN.
Christiansen (1994) tested the ability of SRNs to learn simple languages exhibiting three types of recursion: counting recursion, center-embedding, and cross-dependencies, the last of which context-free grammars cannot account for. However, the results of these experiments were quite poor compared to statistical bigram models; in some cases the networks even performed worse than unigram models. In a subsequent experiment, Christiansen extended the language used by Elman (1991) to include prepositional phrases, left-recursive genitives, conjunction of noun phrases, and sentential complements. Generally, the networks performed better on these extended languages and showed behaviors that reproduce human comprehension performance on similar sentences. In more recent work, Christiansen and Chater (1999) extended these preliminary results and provided more detailed comparisons with human performance.
Finally, Tabor, Juliano, and Tanenhaus (1997) ran a number of experiments comparing human and SRN reading times on sentences that included structural ambiguities. Although they used a simple recurrent prediction network in these studies, relative reading times were derived using an interesting dynamical-systems analysis. Essentially, the hidden representations produced in the network’s hidden layer at different stages of processing are plotted in a high-dimensional space and treated as physical masses exerting a gravitational force. To determine the network’s reading time for each word, the network’s hidden representation is placed in this space and allowed to drift among the attractor masses until it reaches a stable state. The settling time is used as a proxy for reading time.
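As an illustration of this settling-time idea, the sketch below (a loose reconstruction, not Tabor et al.’s actual dynamics) lets a hidden-state point drift under a gravitation-like attraction toward stored “masses” and counts the update steps until it stops moving; that count stands in for reading time. All constants are illustrative.

```python
import numpy as np

def settling_time(point, attractors, masses, step=0.01, tol=1e-3, max_iters=10000):
    """Return the number of update steps before the point stops moving."""
    p = np.array(point, dtype=float)
    for t in range(max_iters):
        force = np.zeros_like(p)
        for a, m in zip(attractors, masses):
            d = np.asarray(a, dtype=float) - p
            dist = np.linalg.norm(d) + 1e-6
            force += m * d / dist**3        # gravitation-like pull toward each mass
        move = step * force
        p += move
        if np.linalg.norm(move) < tol:      # settled into a stable state
            return t
    return max_iters

# e.g. a hidden state settling between two attractor masses:
steps = settling_time([0.5, 0.4], attractors=[[0.0, 0.0], [1.0, 1.0]], masses=[1.0, 2.0])
```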
3.2 Probabilistic Models
Probabilistic methods provide new approaches to the problem of sentence processing and also provide means to investigate fundamental cognitive science questions of how humans structure, process, and acquire language. In such models, language comprehension and production involve probabilistic inference, and acquisition involves choosing the best model, based on innate constraints together with linguistic and other perceptual input. One advantage of probabilistic models over connectionist models is that they can account for the learning and processing of language while still maintaining the sophistication of symbolic models. The recent growth of theoretical developments and of online corpora such as PropBank, WordBank, and FrameNet has enabled large statistical and probabilistic models to be tested, revealing probabilistic constraints in processing and suggesting important links with probabilistic theories of categorization and ambiguity resolution in perception [2]. We give a short review of probabilistic models as we justify why we are using them in Section 4.1.
4. Our Approach
The current plan of our research focuses on three main areas of language processing: production, prediction, and comprehension, and tries to find a plausible story about how different constraints (syntax, semantics, and pragmatics) are combined and interact in a probabilistic system to do so. Ultimately, the model should be judged by its ability to explain human behavior. Although some progress has been made, many technical questions remain to be answered before the model is complete and ready for evaluation.
4.1 A Probabilistic Model
There is sufficient evidence to believe that human language processing has an underlying probabilistic structure, in particular evidence for the role of frequency and probability in language comprehension. Evidence gathered throughout the second half of the 20th century showed that high-frequency words are accessed more quickly, more easily, and with less input signal than low-frequency words (see Chater and Manning [2] for a review).
Although this evidence encourages us to use probabilistic methods to model language understanding, and we know that many kinds of knowledge must interact probabilistically in the process of building an interpretation of a sentence, we still do not know very much about how this probabilistic process takes place: how different aspects of linguistic knowledge are represented, how these probabilities are combined, how some interpretations are favored over others and selected, and what the relationship is between probability and behavioral measures like reading time.
A detailed understanding of the time course of the use of different types of knowledge, and of the roles that memory limitations, interference, and locality play in sentence processing, is key to characterizing the architecture of the human sentence processing mechanism. A thorough account of sentence processing needs to combine these results into one comprehensive model. Unfortunately, most existing results do not tell us enough about our earlier question: how can we understand the role of probability in representing linguistic knowledge, combining evidence, and selecting interpretations?
Constraint-based (sometimes called constraint-based lexicalist) approaches to modeling the probabilistic aspects of language processing address some of these questions about the role of probability. The main idea of constraint-based models is that they maintain and process all interpretations of a partially read or ambiguous sentence in parallel, and the choice among these competing interpretations is made by integrating a large number of constraints over a wide range of types of knowledge. There have been a number of computational implementations of the constraint-based framework, mainly neural network models that take various frequency-based and contextual features as input and combine these features using activation to converge on one particular interpretation as the winner (Burgess and Lund, 1994; Kim et al., 2002; Spivey-Knowlton, 1996; Pearlmutter, Daugherty, MacDonald, & Seidenberg, 1994) [1].
The most completely implemented of these models, and the one that makes the clearest claims about the integration of probabilistic constraints and how probabilities can predict processing time, is Spivey and colleagues’ competition-integration model (Spivey-Knowlton, 1996; McRae et al., 1998) [1]. This model uses a normalized recurrence algorithm for modeling constraint integration. The main problem with these constraint-satisfaction models is deciding on their structure: how structured interpretations should be built probabilistically, how structural knowledge plays a role, how we set probabilities for structure, and how we combine constraints based on that structure.
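The sketch below is a hedged, simplified rendering of this style of constraint integration (inspired by, but not a faithful implementation of, the normalized recurrence algorithm): each constraint’s support over the competing interpretations is normalized, the supports are combined with weights, the result feeds back to the constraints, and the number of cycles until one interpretation dominates serves as a processing-time analogue. Weights, feedback rule, and stopping criterion are all illustrative choices.

```python
import numpy as np

def integrate(constraints, weights, threshold=0.95, max_cycles=100):
    """constraints: list of support vectors (one per constraint) over the interpretations."""
    C = [np.asarray(c, dtype=float) for c in constraints]
    w = np.asarray(weights, dtype=float)
    interp = None
    for cycle in range(1, max_cycles + 1):
        C = [c / c.sum() for c in C]                   # normalize each constraint
        interp = sum(wi * ci for wi, ci in zip(w, C))  # weighted integration
        interp = interp / interp.sum()
        if interp.max() >= threshold:                  # one interpretation wins
            return interp, cycle
        C = [ci * interp for ci in C]                  # feedback: competition between readings
    return interp, max_cycles

# e.g. two constraints favoring different readings of an ambiguity:
probs, cycles = integrate([[0.8, 0.2], [0.4, 0.6]], weights=[0.5, 0.5])
```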
There are, in fact, other probabilistic models that focus on exactly these questions of structure. For example, Jurafsky (1996) and Crocker and Brants (2000) both propose sentence processing models based on probabilistic grammars, which by their nature provide a principled foundation for probabilistic structure. Jurafsky’s model is a probabilistic parser that keeps multiple parses of an ambiguous sentence, ranking each possible parse tree by its probability. The probability of an interpretation is computed by multiplying two probabilities: the stochastic context-free grammar (SCFG) “prefix” probability of the portion of the sentence seen so far, and the “valence” (syntactic/semantic subcategorization) probability for each verb [1].
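In code, this ranking step reduces to multiplying the two probabilities and sorting. The sketch below assumes both numbers are already available (the values are made up, not Jurafsky’s), since computing a true SCFG prefix probability requires a probabilistic parser and is beyond this sketch.

```python
def rank_interpretations(candidates):
    """Each candidate: {'parse': ..., 'prefix_prob': ..., 'valence_prob': ...}."""
    scored = [(c["prefix_prob"] * c["valence_prob"], c["parse"]) for c in candidates]
    return sorted(scored, reverse=True)   # most probable interpretation first

# Illustrative (invented) numbers for a main-verb vs. reduced-relative ambiguity:
candidates = [
    {"parse": "main-verb reading",        "prefix_prob": 3e-5, "valence_prob": 0.67},
    {"parse": "reduced-relative reading", "prefix_prob": 4e-6, "valence_prob": 0.12},
]
best_prob, best_parse = rank_interpretations(candidates)[0]
```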
However, in spite of their probabilistic nature, neither of these two models, Jurafsky (1996) nor Crocker and Brants (2000), modeled the probabilistic relation between individual words, often known as word transition probabilities or word bigram probabilities, which is central to an important class of behavioral studies. In recent work, McDonald et al. (2001) studied the effect of this probability on reading time by running eye-tracking experiments and recording eye fixations at the points where subjects read verb-noun pairs embedded in ordinary sentences. Each verb-noun pair had either a high or a low transition probability:
High probability: One way to avoid confusion is to make the changes during vacation.
Low probability: One way to avoid discovery is to make the changes during vacation.

Other aspects of the sentence pairs, such as their length, the frequency of the noun in the corpus, the neutral context, and sentence plausibility, were all matched.
McDonald et al. (2001) showed that the duration of subjects’
average initial fixation on the noun was shorter if the noun was in
a high-transition-probability verb-noun pair.
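The transition probability manipulated in that study can be estimated directly from corpus counts, as in the sketch below; the tiny “corpus” is invented purely to illustrate the computation P(noun | verb) = count(verb, noun) / count(verb).

```python
from collections import Counter

corpus = ["avoid confusion", "avoid confusion", "avoid delay", "avoid discovery"]
pair_counts = Counter(tuple(bigram.split()) for bigram in corpus)
verb_counts = Counter(verb for verb, _ in pair_counts.elements())

def transition_probability(verb, noun):
    """Maximum-likelihood estimate of P(noun | verb)."""
    return pair_counts[(verb, noun)] / verb_counts[verb]

print(transition_probability("avoid", "confusion"))  # higher-probability pair
print(transition_probability("avoid", "discovery"))  # lower-probability pair
```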
Our goal in this paper is to attempt to build a model that meets these principles. The fundamental insight of our model is the use of graphical models (specifically, dynamic Bayes nets) to model the probabilistic nature of human sentence processing. The advantage of using Bayes nets is that they can represent the causal relationships between different probabilistic knowledge sources, how those sources can be combined, and what we know about the independence of the probabilities involved.

In our Bayesian model of sentence processing, we construct dynamic Bayes nets incrementally as a sentence is being processed (or produced). Each Bayes net integrates lexical, syntactic, and semantic knowledge in an on-line manner. Our proposal is thus that humans combine structure and evidence probabilistically, computing and incrementally recomputing the probability of each interpretation of an utterance as it is processed.
The model is on-line and incremental; it assigns structure word by word as the sentence is read, revising that structure as new information comes into the parser. Like most sentence processing models, our model is sensitive to various constraints, including syntactic structure, thematic biases, and lexical structure. Our Bayesian model is also probabilistic, incrementally computing the probability of each interpretation conditioned on the input words so far and on lexical, syntactic, and semantic constraints and knowledge. The most preferred interpretation at any time is the one with the highest probability.

The fundamental insight of our Bayesian model is to build multiple interpretations of the input in parallel, compute the probability of each interpretation, and choose the interpretation with the maximum probability. Furthermore, this probability plays a role in reading time: words or structures that are unexpected (low probability) take longer to read.
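The following sketch shows, in simplified form, the bookkeeping this implies: a distribution over candidate interpretations is updated word by word, and each word’s surprisal under the current distribution serves as a reading-time proxy. The `likelihood` function here is only a placeholder for the full Bayes-net computation described in Section 4.4.

```python
import math

def process_sentence(words, interpretations, prior, likelihood):
    """likelihood(word, position, interpretation) -> P(word | position, interpretation)."""
    posterior = dict(prior)
    reading_times = []
    for i, w in enumerate(words):
        # P(w | words so far) under the current mixture of interpretations
        p_word = sum(posterior[s] * likelihood(w, i, s) for s in interpretations)
        reading_times.append(-math.log(max(p_word, 1e-12)))       # surprisal as a reading-time proxy
        # Bayesian update of each interpretation's probability
        posterior = {s: posterior[s] * likelihood(w, i, s) / max(p_word, 1e-12)
                     for s in interpretations}
    best = max(posterior, key=posterior.get)
    return best, posterior, reading_times
```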
4.2 Grammar
Rather than studying language understanding in the abstract, we create a micro-world and interpret language in terms of actions in that micro-world.
S -> A V B | A W | B is X by A | A threw P with I
V -> eats | throws | addresses
X -> eaten | thrown | addressed
W -> V | runs
A -> Student(s) | Professor | Secretary | Hadjar | Mike | Pitcher
B -> Pizza | Aush
C -> Student(s) | Letter
I -> Hand | Bat
P -> Party | Ball

(This is a simplified grammar; the complete grammar is not shown here.)
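As a sketch of how such a grammar can be written down and used to generate training sentences, the Python fragment below encodes the rules above as a dictionary and samples expansions uniformly; rule probabilities, the full grammar, and the treatment of “Student(s)” as a single plural form are simplifications on our part.

```python
import random

GRAMMAR = {
    "S": [["A", "V", "B"], ["A", "W"], ["B", "is", "X", "by", "A"],
          ["A", "threw", "P", "with", "I"]],
    "V": [["eats"], ["throws"], ["addresses"]],
    "X": [["eaten"], ["thrown"], ["addressed"]],
    "W": [["V"], ["runs"]],
    "A": [["Students"], ["Professor"], ["Secretary"], ["Hadjar"], ["Mike"], ["Pitcher"]],
    "B": [["Pizza"], ["Aush"]],
    "C": [["Students"], ["Letter"]],
    "I": [["Hand"], ["Bat"]],
    "P": [["Party"], ["Ball"]],
}

def generate(symbol="S"):
    if symbol not in GRAMMAR:                   # terminal word
        return [symbol]
    expansion = random.choice(GRAMMAR[symbol])  # uniform choice over rules
    return [word for s in expansion for word in generate(s)]

print(" ".join(generate()))   # e.g. "Mike eats Aush"
```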
4.3 Task
The basic issues on which we will assess our model are pragmatics and two types of ambiguity resolution.

We expect our model to be able to capture the pragmatics of the world. For example, in the micro-world used for testing the model, Mike likes Aush and does not eat Pizza, while Hadjar likes Pizza more than Aush. So when the utterance given to the model is “Mike eats”, the probability distribution over the next word must show generally higher probabilities for words that represent food (Aush and Pizza) and must also show a higher probability for the next word to be Aush rather than Pizza.

(1) Hadjar eats … (Pizza).
    Mike eats … (Aush).
The first type of ambiguity that we address concerns the case role types in a sentence, which depend on the verb. For example, the verb “throw” in different senses can accept different case roles. Consider the following examples from our grammar:

(2) The pitcher threw a ball with a bat.
    Hadjar threw a party.

In the first sentence “threw” means “tossed” and can accept an instrument case role, while “threw” in the second sentence means “cast” and cannot take an instrument.
Another task that we expect our model to accomplish is capturing higher-order dependencies, where the system has to decide what object follows based not only on the verb or the agent, but on both. Consider the following examples, also from our grammar:

(3) The professor addresses … (students).
    The secretary addresses … (the letter).

In neither of these sentences can the verb “addresses” alone specify what object might come next, nor can the agent “professor” or “secretary” alone. It is the combination of agent and action that specifies the object.
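A tiny numerical illustration of this point follows: with counts like those below (invented for illustration), the distribution over the object is flat when conditioned on the verb alone, but becomes deterministic when conditioned on the agent and verb jointly.

```python
from collections import Counter

# (agent, verb, object) observations, invented for illustration
observations = [
    ("professor", "addresses", "students"), ("professor", "addresses", "students"),
    ("secretary", "addresses", "letter"),   ("secretary", "addresses", "letter"),
]
counts = Counter(observations)

def p_object(given):
    """Distribution over objects, conditioned on any subset of {agent, verb}."""
    matching = Counter()
    for (agent, verb, obj), n in counts.items():
        if all(given.get(k, v) == v for k, v in (("agent", agent), ("verb", verb))):
            matching[obj] += n
    total = sum(matching.values())
    return {obj: n / total for obj, n in matching.items()}

print(p_object({"verb": "addresses"}))                        # {'students': 0.5, 'letter': 0.5}
print(p_object({"agent": "professor", "verb": "addresses"}))  # {'students': 1.0}
```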
4.4 Architecture
Our Bayesian model essentially builds multiple interpretations of the input, each associated with a probability, and chooses the interpretation with the maximum probability. Assuming that words or structures that are less probable take more time to read, this probability can later be used to model reading time.
Suppose that we are given an input sequence of words W = {w1, w2, ...} and a set of potential interpretations I = {s1, s2, ..., sn} for sentence S. Our final task is to find the most probable interpretation s* given the input sequence W. This computation can be stated by the following formula:

s* = argmax_s P(s | W)
This equation tells us that we would be able to find the best interpretation for a sequence of words, and consequently the best sequence of words to express a meaning, if we knew how to compute P(s | W). To capture this probability, instead of counting over all word sequences and all interpretations, which is impractical, we decompose our knowledge of the model into a number of components (Figure 1). These components fall into three categories: semantics, syntax, and context. Each of these categories contains components that convey different parts of the knowledge in the model.
Figure 1. Generative Model
Semantics consists of Se, the random variable for the ultimate semantics of the world (i.e., the full interpretation); CR, a set of random variables, each representing one specific type of case role; and WM, which stands for word meanings and is also a set of random variables, one per CR.

Syntax consists of Sy, the random variable for the general syntactic structure, and F, which represents the form, aspect, and tense of each abstract concept that fills each role.

Context is represented by C, which is simply a counter of where we are in the sentence.

The dashed line around CR, WM, and F indicates that this part of the Bayesian network is replicated a fixed number of times; within the replication it is still naive Bayes, and it is not part of the dynamic portion of the network.
The knowledge in the model consists of the following distributions:

• What we want: P(Se | w1, w2, ...)
• P(wi | Ci, Se, Sy, {CR, WM, F}): the temporal constraint tying each observed word to the context and to the semantic and syntactic variables.
• P(Ci | Ci-1): the context-update function. In a trivial, localist way, this update function can simply be a counter, such that the context state increments by 1 at each time step.
• P(Sy | Se): the probability of the syntactic structure given the semantics. We could decide to make syntax independent of semantics if we wanted to.
• P({F} | Se): the probability of the form/tense/aspect of the filler of each role. We separate this from syntax to keep the syntax node encoding only the structure of the parse tree, so that in the future we might replace the syntax node with some version of an incremental statistical parser.
• P({CR} | Se): the probability of each case role participating in the sentence.
• P({WM} | Se): the probability distribution over word meanings filling each case role.
• P(Se): the prior over semantics.
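To make the factorization explicit, the sketch below multiplies these components together for one complete assignment of the hidden variables; every conditional table is a placeholder nested dictionary, since the real model learns these distributions from the corpus and performs inference over the Bayes net rather than explicit enumeration.

```python
def joint_probability(words, se, sy, cr, wm, f, cpds):
    """P(Se, Sy, CR, WM, F, w1..wn) for one complete assignment of the hidden variables."""
    p = cpds["Se"][se]                  # prior over semantics
    p *= cpds["Sy|Se"][se][sy]          # syntactic structure given semantics
    p *= cpds["CR|Se"][se][cr]          # which case roles participate
    p *= cpds["WM|Se"][se][wm]          # word meanings filling those roles
    p *= cpds["F|Se"][se][f]            # form / tense / aspect of each filler
    for i, word in enumerate(words):
        # The context Ci is a simple counter, so P(Ci | Ci-1) is deterministic and
        # contributes nothing beyond using the position i as the context value.
        p *= cpds["w|C,Se,Sy,CR,WM,F"][(i, se, sy, cr, wm, f)].get(word, 0.0)
    return p
```

Recognition, P(Se | w1, w2, ...), then amounts to summing this joint over the remaining hidden variables for each value of Se and normalizing.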
The generative process first involves selecting the semantics of the utterance, represented by the random variable Se. Given the semantics, the syntactic structure must be determined, denoted by the random variable Sy. Se, CR, WM, F, and Sy jointly determine each of the words in the utterance. Each {CR, WM, F}i set, as depicted in Figure 1, is associated with one case role. For example, {CR, WM, F}patient jointly determine the surface form that might fill the role of “patient” in the sentence. The contribution of C, Se, and Sy is to find the appropriate place to put this surface form in the sentence.
To encode the sequential structure of a sentence, we imagine a context representation that evolves over time, denoted c1, c2, ..., cn, where time serves as the index. The context tells the model where it is in the production of a sentence and could represent something like which constituent of the sentence is currently being processed. The context is much like the context representation in an Elman or Jordan net: it specifies where we are in the sequence. Although the evolution of the context depends only on the previous context in the generative model, when the model is used for recognition the individual words can modulate the context representation. For example, if ci = a, and from state a transitions can be made to either state b or c, the inferred value of ci+1 will depend not only on ci but also on wi+1.
The generative model above can be used for recognition, and the formal statement of recognition is to estimate P(Se | w1, w2, ...), the probability of a semantic representation given the sequence of words.
Pragmatics comes from priors over semantics, and syntax
conditioned on semantics. Consider the verb eat. The patient must
be a physical object, and so we need to be able to represent the
semantics of "X eat Y", where X is any animate agent, and Y is
any physical object. Pragmatics comes in when we impose
additional constraints on the eating, e.g., Mike eats eggs and salad
but not pizza. That can be expressed in terms of priors over the
semantic representations.
5. Results
To test our model, we first generated a training corpus based on the predefined grammar. Each record in the training set contains the semantic meaning of the sentence (Se), the meaning of each word (WM), the case roles (CR), and also the word surface form and aspect (F). So essentially the only hidden information is the syntactic structure of the sentence.
The reason we made all semantic information visible is that, during language learning, most semantic information is accessible to the language learner through perception (and through instruction in the case of thematic roles and different word meanings with the same surface form). Syntax, however, is not explicitly accessible to language learners. Infants learn to speak without having any idea of the existence of a grammar, yet they can capture some aspects of grammar through perception, such as how to make a singular noun plural. That is why we made the form and aspect information (F) visible to the learning algorithm.
We trained our system on this corpus and tested it on the tasks described in Section 4.3. First, however, we examined how the system incrementally updates its interpretation of a sentence. The graph in Figure 2a shows how the distribution over semantic values in the semantics node changes as new words are given to the model. Points 1, 2, and 3 in this graph represent the semantic values for “Mike eats Aush”, “Hadjar eats Aush”, and “Jeff eats Aush”.

This graph reveals both the semantics and the pragmatics of the world. By giving high probability to these three semantic values, the model understands that Aush can only be eaten, and only by people (when testing this sample, professor, student, and other animate objects were not part of the corpus). It also shows that if Aush is going to be eaten, it is probably the case that it is eaten by Mike, because Mike likes Aush more than Hadjar and Jeff do.
5.1 Word prediction, Pragmatics
To test our model on word prediction we used example (1) from Section 4.3:

Mike eats …
Hadjar eats …

We trained the model on a corpus with priors over semantics such that “Mike eats Aush” is more probable than “Mike eats Pizza”, and “Hadjar eats Pizza” is more probable than “Hadjar eats Aush”, although all four sentences are semantically well formed. We gave the model the first two words of each sentence and then computed the probability of the next word given the two previous words. The results are shown in Figures 2a and 2b; point 4 in the graph shows W4, which is “Aush”, and point 5 is W5, “Pizza”. The graph shows that the model’s output clearly reflects what was favored in the corpus.
5.2 Case role assignment
To test our model on case role assignment we again use example (2) from Section 4.3:

The pitcher threw a ball with a bat.
Hadjar threw a party.

Recall that the meaning of the verb “threw” in the first sentence is “tossed”, which can easily have an instrument associated with it, while in the second sentence it means “cast”, which does not usually take an instrument among its arguments. So we test our model by feeding each of these sentences to the model word by word and monitoring the changes in the distributions at the CR nodes.

As shown in Figure 3, our belief in the presence of the instrument case role (point 4) changes after the word in parentheses is given to the model. Green and yellow bars respectively show the belief in each CR before and after the new word is fed to the model. Points 1 through 4 represent the CRs for Action, Agent, Patient, and Instrument. It is clear that after the model has seen “Hadjar threw a party”, it has disambiguated the meaning of “threw” and no longer believes that an instrument will be needed. On the other hand, when the model reads “The pitcher threw the ball” it still expects the sentence to contain an instrument role, and as it receives the next word, “with”, this belief rises even further.
Figure 2. Word prediction
5.3 Higher order dependencies
Our example for higher-order dependencies was example (3):

The professor addressed students.
The secretary addressed the letter.

To test our model on this phenomenon, we gave these two sentences to the model incrementally and monitored the changes in WMpatient, which gives us a prediction of the word meaning that will fill the patient case role.

Figure 4 shows the conditional probability distribution over WMpatient values after the model has seen parts of each sentence. Dark and light brown bars respectively show the probabilities before and after the model has seen the word in parentheses. Note that for simplicity we show probabilities for only four WMpatient values; they refer to “Professor”, “Secretary”, “Student”, and “Letter”. Based on these results, after the model has seen “the professor” or “the secretary”, there is no preference between “student” and “letter” to fill the patient role. However, after the model receives the next word, “addressed”, it clearly selects “student” to fill the patient role for the first sentence and “letter” for the second. [This is the example that Jeff Elman gave us to work on for higher-order dependencies, but we do not see why a model able to handle tasks 1 and 2 would be unable to handle this one.]
Figure 3. Case roles

Figure 4. Higher-order dependencies
6. ACKNOWLEDGMENTS
Our thanks to Jeff Elman.
7. REFERENCES
[1] Narayanan, S., & Jurafsky, D. A Bayesian model of human sentence processing. In preparation, 2005.
[2] Chater, N., & Manning, C. Probabilistic models of language processing and acquisition. Trends in Cognitive Sciences, 10(7), July 2007.
[3] Grodner, D., & Gibson, E. Consequences of the serial nature of linguistic input for sentential complexity. Cognitive Science, 29, 2005, 261-291.
[4] Elman, J. Finding structure in time. Cognitive Science, 14, 1990, 179-211.
[5] Griffiths, T., & Tenenbaum, J. Two proposals for causal grammars. 2005.
[6] Jurafsky, D., & Martin, J. Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall, 2000.
[7] Hale, J. The information conveyed by words in sentences. Journal of Psycholinguistic Research, 32(2), March 2003.
[8] Rohde, D. A Connectionist Model of Sentence Comprehension and Production. PhD thesis, School of Computer Science, Carnegie Mellon University and the Center for the Neural Basis of Cognition, 2002.