Sentence Processing:
Multiple constraints in action
Redundancy simplifies computation
• We've seen how redundant cues of many kinds impact on word recognition
• Today we will look at how such cues help us to decipher
information in sentences so well
• Why do we need redundant sources of information?
– In information processing, redundancy removes ambiguity: as
in the error-checking bits in computer communication
– Uncertainty is decreased (probability of correct interpretation
is increased) whenever something you know narrows down the
range of what you don't know
– Any regularity is by definition informational = increases
probability of correctly predicting what you don't know (this is
what it means to have a regularity)
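The claim that knowing something narrows down what you don't know can be made concrete with entropy. The sketch below is only an illustration: the word pairs are invented toy data, and the names `entropy` and `conditional_entropy` are hypothetical helpers, not part of the lecture. It compares the uncertainty about the next word when nothing is known with the uncertainty once the previous word is known.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as a Counter of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

def conditional_entropy(pairs):
    """Average uncertainty about the next word once the previous word is known."""
    prev_counts = Counter(prev for prev, _ in pairs)
    h = 0.0
    for prev, n in prev_counts.items():
        followers = Counter(nxt for p, nxt in pairs if p == prev)
        h += (n / len(pairs)) * entropy(followers)
    return h

# Invented toy data: (previous word, next word) pairs.
PAIRS = [("the", "dog"), ("the", "cat"), ("the", "dog"), ("a", "cat"),
         ("ran", "to"), ("ran", "away"), ("to", "the"), ("to", "a")]

h_next = entropy(Counter(nxt for _, nxt in PAIRS))  # uncertainty with no cue
h_given_prev = conditional_entropy(PAIRS)           # uncertainty given the cue

print(f"H(next) = {h_next:.2f} bits, H(next | previous) = {h_given_prev:.2f} bits")
```

In this toy data the previous word is a regularity in exactly the sense above: conditioning on it lowers the entropy of the next word.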
Pattern and information
“Any aggregate of events or objects (e.g., a sequence of
phonemes, a painting, or a frog, or a culture) shall be said to
contain ‘redundancy’ or ‘pattern’ if the aggregate can be divided in
any way by a ‘slash mark’, such that an observer perceiving only
what is on one side of the slash mark can guess, with better than
random success, what is on the other side of the slash mark. We
may say that what is on one side of the slash contains information
or has meaning about what is on the other side. Or, in engineer’s
language, the aggregate contains ‘redundancy’. Or, again, from the
point of view of a cybernetic observer, the information available
on one side of the slash will restrain (i.e., reduce the probability of)
wrong guessing.”
Gregory Bateson
Steps to an Ecology of Mind
P. 104
It’s good to be on top!
• Sentences benefit from being at the high end of the linguistic
• Constraints from many levels help disambiguate sentences:
syntactic, semantic, prosodic, pragmatic & ‘probabilistic’
– They are all ‘probabilistic’ but the last category emphasizes
that even if you know nothing at all about syntax, semantics,
and pragmatics, language is still not random: some words
appear more often than others, and some words are more
likely to follow certain words than others.
– Recall Tom Landauer’s claim: 45% of sentences can be reconstructed from their words by maximizing co-occurrence
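As a rough illustration of what "reconstructing a sentence from its words by maximizing co-occurrence" could mean, the sketch below scores every ordering of a bag of words by how often its adjacent word pairs occurred in a tiny invented corpus, and keeps the best-scoring order. This is an assumption-laden toy, not Landauer's actual method: the corpus, the bigram scoring, and the names `score` and `reconstruct` are all hypothetical.

```python
from collections import Counter
from itertools import permutations

# Invented toy corpus standing in for a speaker's language experience.
CORPUS = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the dog ate the bone",
    "a dog chased a ball",
]

# Count how often each word immediately follows another (with boundary markers).
BIGRAMS = Counter()
for sentence in CORPUS:
    words = ["<s>"] + sentence.split() + ["</s>"]
    BIGRAMS.update(zip(words, words[1:]))

def score(order):
    """Sum of corpus counts over all adjacent word pairs in this ordering."""
    words = ["<s>"] + list(order) + ["</s>"]
    return sum(BIGRAMS[pair] for pair in zip(words, words[1:]))

def reconstruct(bag):
    """Brute-force search for the ordering with the highest co-occurrence score."""
    # sorted() makes tie-breaking deterministic; fine for short sentences only.
    return max(sorted(set(permutations(bag))), key=score)

print(" ".join(reconstruct(["mouse", "the", "chased", "dog", "the"])))
```

The target order never appears in the toy corpus as a whole sentence; it is recovered purely from which words tend to follow which. Brute-force search over permutations is only workable for very short sentences.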
Order in probability
• If language were entirely random, then all words would be
equally likely, and we already know that:
a.) Some words are much more frequent than others
b.) The language system is sensitive to word probabilities: HF words are recognized more quickly than LF words
c.) Some words have a very narrow range of words that can
follow them
e.g., articles are almost always followed by adjectives or
nouns; never by verbs, articles, or prepositions
d.) Syntactical constraints specify that words have an unequal probability of appearing in certain places
(e.g., subject usually before verb before object, or heads in
leftmost position in VP and NP)
- But this is much enhanced by semantics: Compare ‘the
N Ved into the X’ to ‘The train pulled into the X’
e.) Syntactical constraints specify that clauses have an unequal probability of appearing in certain places
(e.g., 'if', 'in order to', 'because', 'since', 'whenever', etc. all
require closing clauses)
The psychological reality of syntax
• Experimental evidence shows that words within a
single clause are read more quickly than between-clause words
– Readers are especially slowed when they reach an
ambiguous section (garden path sentences like ‘The
old man the boats’: we expect to discover what the
old man did)
– If you stop sentences midway and ask people to
recall the sentence they just heard, they tend
to report by clause boundaries
• Other evidence also shows that clauses have psychological
reality
– If you play a click in the middle of auditory presentation of a
sentence, subjects are much better at saying afterwards where
it was if it comes between clauses than if it occurred within a
clause
– Moreover, they tend to report it as occurring at a clausal
boundary far more often than it actually had occurred there.
– This effect remains even if you remove accompanying
disambiguating information such as pauses and changes in
intonation that usually accompany clausal boundaries: i.e. in
a robotic monotone voice clicks at clause boundaries were
still better recalled than clicks within clauses.
– You get it even with (fake) subliminal clicks
Memory or language?
• One question about these studies is whether it was a memory or
language perception effect
– If you ask people to indicate by pointing to a written sentence
rather than by speaking, the effect was reduced, suggesting a
memory effect
• That is: suggesting that the effect may be due to chunking
– But it is nevertheless still present, suggesting a language
perception effect.
• Moreover…we can always m’u the distinction between
language and memory
– Following Chomsky, there may be a deep relation between
how syntax chunks words into roles and clauses, and how we
• Readers show pauses between clauses, and an effect of how many
chunks there are to integrate (more = longer pauses)
– Following Fauconnier, there may be a deep relation between
how we carve up and chunk experience, and how that chunking
is mapped onto our language system, so that memory chunking
is linguistic (or reflects the same psychological constraints)
– Even single words show amazing memory properties (Terry
Deacon) - and then, so do narratives/stories/myths
• Language is in part a technology for memory
Order in probability
• If language were entirely random, then all words would
be equally likely, and we already know that:
f.) Pragmatic rules limit what can come next to being related
to what has already been communicated or what is
currently happening.
Pragmatics: Grice’s Maxims
• There are a great many complex pragmatic rules, but the
best known are the Gricean Maxims:
i.) The Maxim of Quality: Speakers should tell the
truth as they know it, or explicitly acknowledge their
uncertainty about the truth if they are aware of it
ii.) The Maxim of Manner: Speakers should strive to be
clear, succinct, and unambiguous
iii.) The Maxim of Quantity: Speakers should say all
that is necessary or required, but no more than that
iv.) The Maxim of Relation: Speakers should say only
what is relevant
Order in probability
• If language were entirely random, then all words would be
equally likely, and we already know that:
g.) Semantics contributes to disambiguating uncertainty
– The sentence 'The witness examined by the lawyer was
useless' is read more slowly than the sentence 'The
evidence examined by the lawyer was useless', even
though both sentences have identical structure and
phoneme count
– The reason is that witnesses are active beings who can
examine, but evidence isn’t active and so cannot
examine; it can only be examined
– The phrase 'The witness examined' is ambiguous at
that point in a way that 'The evidence examined' is not.
– We say the second sentence is semantically constrained
in a way that the first is not.
Order in probability
• If language were entirely random, then all words would be
equally likely, and we already know that:
h.) We can (and we do) use prosody, stress, and pauses to
disambiguate sentences such as these when they are
– e.g., speakers tend to automatically lengthen the final
vowel in a word just before a clause boundary and
also insert a pause
Context matters
• Each one of these things takes away a
little bit of uncertainty in interpreting sentences
– This can be shown experimentally: increased context
makes words more likely to be recognized under conditions
of decreased exposure to that word or increased
noisiness, and makes those words more likely to be
– Grosjean: words in context are recognized within 175-200
ms of their onset (half their length); words out of
context need over 100 ms more (average = 300 ms)
• Cohort model applies to sentence processing too
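One way to picture how a cohort-style account could explain earlier recognition in context is sketched below. It is a deliberately simplified toy: letters stand in for phonemes, the lexicon is invented, and sentence context is modeled as a hard filter on candidates (a hypothetical simplification of what would really be graded, probabilistic constraints). The names `cohort` and `recognition_point` are illustrative only.

```python
# Invented toy lexicon; letters stand in for phonemes.
LEXICON = ["station", "stadium", "stable", "statue", "stamp", "street", "tunnel"]

# Hypothetical subset judged plausible after 'The train pulled into the ...'.
IN_CONTEXT = ["station", "street", "tunnel"]

def cohort(prefix, candidates):
    """Candidates still compatible with the input heard so far."""
    return [w for w in candidates if w.startswith(prefix)]

def recognition_point(word, candidates):
    """Number of segments needed before `word` is the only candidate left."""
    for i in range(1, len(word) + 1):
        if cohort(word[:i], candidates) == [word]:
            return i
    return None  # never becomes unique within this candidate set

print(recognition_point("station", LEXICON))     # no context: full lexicon competes
print(recognition_point("station", IN_CONTEXT))  # context has pruned the cohort
```

With the full lexicon, "station" stays ambiguous until "stati"; once context has removed the implausible competitors, "sta" is enough. This mirrors the direction of the context effect described above without attempting to reproduce the millisecond figures.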