Statistical NLP
Winter 2009
Lecture 12: Computational Psycholinguistics
Roger Levy
NLP techniques, human parsing
• Our “parsing” so far has been about Treebank parsing
• Now for a bit about human parsing!
• Techniques from NLP are still the foundation
• We’ll focus on rational models of human sentence
processing
[rational = using all available information to make inferences]
• incremental inference: understanding of and response to a
partial utterance
Incrementality and Rationality
• Online sentence comprehension is hard
• But lots of information sources can be usefully brought to
bear to help with the task
• Therefore, it would be rational for people to use all the
information available, whenever possible
• This is what incrementality is
• We have lots of evidence that people do this often
“Put the apple on the towel in the box.”
(Tanenhaus et al., 1995)
Anatomy of ye olde garden-path sentence
The horse raced past the barn fell.
• It’s weird
• People fail to understand it most of the time
• People are more likely to misunderstand it than to
understand it properly
• “What’s a barn fell?”
• The horse that raced past the barn fell
• The horse raced past the barn and fell
• Today I’m going to talk about three outstanding puzzles
involving garden-path sentences
Garden paths: What we do understand
• We have decent models of why this sentence fails to be
understood
• Incremental probabilistic parsing with beam search
(Jurafsky, 1996)
• Surprisal (Hale, 2001; Levy, 2008): the disambiguating
word fell is extremely low probability → an alarm signal
that says “this doesn’t make sense” to the parser
• These models are based on rational use of evidential
information (data-driven probabilistic inference)
• Also compatible with gradations in garden-path difficulty
(Garnsey et al., 1997; McRae et al., 1998)
Hale, 2001; Levy, 2008; Smith & Levy, 2008: surprisal
• Let the difficulty of a word be its surprisal given its
context:
difficulty(wᵢ) ∝ surprisal(wᵢ) = −log₂ P(wᵢ | w₁…wᵢ₋₁)
• Captures the expectation intuition: the more we expect
an event, the easier it is to process
• Many probabilistic formalisms, including probabilistic
context-free grammars, can give us word surprisals
a man arrived yesterday
0.3  S → S CC S
0.7  S → NP VP
0.35 NP → DT NN
0.15 VP → VBD ADVP
0.4  ADVP → RB
...
[Parse tree of the sentence, with rule probabilities 0.7, 0.35, 0.15, 0.3, 0.03, 0.02, 0.4, and 0.07 at its nodes]
Total probability: 0.7 × 0.35 × 0.15 × 0.3 × 0.03 × 0.02 × 0.4 × 0.07 ≈ 1.85 × 10⁻⁷
Algorithms by Jelinek & Lafferty (1991) and Stolcke
(1995) give us P(wᵢ | context) from a PCFG (a brute-force sketch follows below)
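To make this concrete, here is a minimal sketch of computing word surprisals from a PCFG. The grammar, and all names in the code, are invented for illustration; it uses brute-force enumeration of prefix probabilities over a finite toy language, rather than the Jelinek & Lafferty or Stolcke algorithms, which handle recursive grammars.

```python
import math
from itertools import product

# Toy PCFG (hypothetical, finite language so brute force terminates):
# nonterminal -> list of (right-hand side, probability)
GRAMMAR = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("a", "man"), 0.5), (("a", "dog"), 0.5)],
    "VP": [(("arrived",), 0.6), (("arrived", "yesterday"), 0.4)],
}

def expand(symbol):
    """Yield (words, probability) for every string derivable from `symbol`."""
    if symbol not in GRAMMAR:                       # terminal symbol
        yield (symbol,), 1.0
        return
    for rhs, p in GRAMMAR[symbol]:
        # Expand each RHS symbol, then combine every choice of expansions.
        for parts in product(*(list(expand(s)) for s in rhs)):
            words = tuple(w for ws, _ in parts for w in ws)
            prob = p
            for _, q in parts:
                prob *= q
            yield words, prob

def prefix_prob(prefix):
    """Total probability of derivations whose yield begins with `prefix`."""
    return sum(p for words, p in expand("S") if words[:len(prefix)] == prefix)

def surprisal(sent, i):
    """Surprisal of sent[i] given sent[:i], in bits."""
    return -math.log2(prefix_prob(tuple(sent[:i + 1])) /
                      prefix_prob(tuple(sent[:i])))

sent = ["a", "man", "arrived", "yesterday"]
for i, w in enumerate(sent):
    print(f"{w}: {surprisal(sent, i):.2f} bits")
```

Under this toy grammar, a man arrived carries no surprisal at arrived (both VP expansions begin with it), while yesterday costs log₂(0.5/0.2) ≈ 1.32 bits.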
Surprisal and garden paths: theory
• Revisiting the horse raced past the barn fell
• After the horse raced past the barn, assume two parses: the main-verb analysis and the reduced-relative (RC) analysis
• Jurafsky (1996) estimated the probability ratio of these
parses as 82:1
• The surprisal differential of fell in the reduced versus
unreduced conditions should thus be log₂ 83 ≈ 6.4 bits (worked out below)
*(assuming independence between RC reduction and main verb)
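Spelling out the arithmetic behind the 6.4 bits (my reconstruction from the ratio above, under the stated independence assumption): in the reduced condition the reduced-relative parse carries only 1/(82+1) of the posterior mass when fell arrives, while in the unreduced condition it carries essentially all of it, so

$$\Delta\text{surprisal} = -\log_2\!\left[\tfrac{1}{83}\,P(\textit{fell}\mid\text{RC parse})\right] + \log_2 P(\textit{fell}\mid\text{RC parse}) = \log_2 83 \approx 6.4\ \text{bits}$$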
Surprisal and garden paths: practice
• An unlexicalized PCFG (estimated from the Brown corpus) gets the right
monotonicity of surprisals at the disambiguating word fell
[Figure: surprisals at fell in the reduced vs. unreduced conditions. Aside: the absolute values are way too high, but that’s because the grammar is crude. The between-condition difference is the key comparison; it is small, but in the right direction.]
Garden Paths: What we don’t understand so well
• How do people arrive at the misinterpretations they come
up with?
• What factors make them more or less likely to come up
with such a misinterpretation?
Outstanding puzzle: length effects
• Try to read this:
• Tom heard the gossip about the neighbors wasn’t true.
• Compare it with this:
• Tom heard the gossip wasn’t true.
• Likewise:
• While the man hunted the deer that was brown and graceful ran into
the woods.
• While the man hunted the deer ran into the woods.
• The longer the ambiguous region, the harder it is to
recover (Frazier & Rayner, 1987; Tabor & Hutchins, 2004)
• Also problematic for rational models: effects of irrelevant
information
Memory constraints in human parsing
• Sentence meaning is structured
• The number of logically possible analyses for a sentence
is at least exponential in sentence length
• So we must be entertaining some limited subset of
analyses at all times*
*“Dynamic programming”, you say? Ask later.
Dynamic programming
• Exact probabilistic inference with context-free grammars
can be done efficiently in O(n³) time (the inside algorithm; sketch below)
• But…
• This inference requires strict probabilistic locality
• And human parsing is linear, that is, O(n), anyway
• Here, we’ll explore an approach from the machine-learning literature: the particle filter
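Before moving on, here is a minimal sketch of that O(n³) dynamic program: the inside algorithm over spans of a Chomsky-normal-form PCFG. The grammar is a made-up toy, not one from the lecture.

```python
from collections import defaultdict

# Binary rules (parent, left, right) -> prob; lexical rules (tag, word) -> prob.
# Toy grammar invented for illustration.
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
LEXICAL = {("NP", "horses"): 0.5, ("NP", "oats"): 0.5, ("V", "eat"): 1.0}

def inside_prob(words):
    """Exact P(words) under the PCFG via dynamic programming over spans."""
    n = len(words)
    # chart[i][j][A] = P(A derives words[i:j])
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                       # width-1 spans
        for (tag, word), p in LEXICAL.items():
            if word == w:
                chart[i][i + 1][tag] += p
    for width in range(2, n + 1):                       # wider spans
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                   # split point: O(n^3) total
                for (a, b, c), p in BINARY.items():
                    chart[i][j][a] += p * chart[i][k][b] * chart[k][j][c]
    return chart[0][n]["S"]

print(inside_prob("horses eat oats".split()))           # 0.5 * 1.0 * 0.5 = 0.25
```

Note how each chart cell sums over all ways of building its span; this summing over shared subanalyses is exactly the strict probabilistic locality the bullet above refers to.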
The particle filter: general picture
• Sequential Monte Carlo for incremental observations
• Let xᵢ be the observed data and zᵢ the unobserved states
• For parsing: the xᵢ are words, the zᵢ are structural analyses
• Suppose that after n−1 observations we have the
distribution over interpretations P(zₙ₋₁ | x₁…ₙ₋₁)
• After obtaining the next word xₙ, represent the next
distribution P(zₙ | x₁…ₙ) inductively:
P(zₙ | x₁…ₙ) ∝ P(xₙ | zₙ) · Σ over zₙ₋₁ of P(zₙ | zₙ₋₁) P(zₙ₋₁ | x₁…ₙ₋₁)
• Representing P(zᵢ | x₁…ᵢ) by samples makes it a Monte
Carlo method (sketch below)
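A minimal bootstrap-particle-filter sketch of this inductive update. To stay self-contained it tracks a toy two-state HMM rather than parse structures, and the transition and emission tables are invented for illustration; the propagate/weight/resample loop is the recursion above.

```python
import random

TRANS = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}   # P(z_n | z_{n-1})
EMIT  = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.2, "y": 0.8}}   # P(x_n | z_n)

def sample(dist):
    """Draw one value from a {value: probability} distribution."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value

def particle_filter(observations, n_particles=1000):
    # Uniform initial belief over the two states.
    particles = [random.choice(["A", "B"]) for _ in range(n_particles)]
    for x in observations:
        # 1. Advance each particle through the transition model.
        particles = [sample(TRANS[z]) for z in particles]
        # 2. Weight each particle by the likelihood of the new observation.
        weights = [EMIT[z][x] for z in particles]
        # 3. Resample in proportion to the weights.
        particles = random.choices(particles, weights=weights, k=n_particles)
    return particles

final = particle_filter(["x", "x", "y"])
print("P(z_3 = A | x_1..3) ≈", final.count("A") / len(final))
```

The particle counts approximate the posterior P(zₙ | x₁…ₙ); with parses as states, step 1 extends each particle’s analysis with the new word.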
Particle filter with probabilistic grammars
S → NP VP (1.0)
NP → N (0.8)
NP → N RRC (0.2)
VP → V Adv (1.0)
RRC → Part Adv (1.0)
N → horses (1.0)
Adv → quickly (1.0)
V → broke (0.3) | tired (0.3) | raced (0.4)
Part → raced (0.1) | broken (0.5) | tired (0.4)
[Figure: particles as incremental parses of horses raced quickly tired — the main-verb analysis (NP → N, VP → V Adv) and the reduced-relative analysis (NP → N RRC, RRC → Part Adv), built word by word with the rule probabilities above.]
Returning to the puzzle
A-S Tom heard the gossip wasn’t true.
A-L Tom heard the gossip about the neighbors wasn’t true.
U-S Tom heard that the gossip wasn’t true.
U-L Tom heard that the gossip about the neighbors wasn’t
true.
• Previous empirical finding: ambiguity induces difficulty…
• …but so does the length of the ambiguous region
• Our linking hypothesis: the proportion of parse failures
at the disambiguating region should be monotonically
related to the difficulty of the sentence (a toy illustration follows below)
(Frazier & Rayner, 1982; Tabor & Hutchins, 2004)
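As a loose illustration of why region length matters under this hypothesis (a toy simulation of my own, not the lecture’s actual model): if every word in the ambiguous region fits both analyses equally well, resampling acts as pure drift on the particle counts, so the longer the region, the more chances the ultimately correct analysis has to vanish from the particle set.

```python
import random

def failure_rate(n_particles=20, region_len=3, rc_prior=0.1, runs=20000):
    """Fraction of runs in which no particle carries the correct (RC) parse."""
    fails = 0
    for _ in range(runs):
        # Each particle commits to an analysis when the ambiguity opens.
        rc = sum(random.random() < rc_prior for _ in range(n_particles))
        # Ambiguous words fit both parses equally, so resampling is pure
        # drift: each new particle copies a uniformly chosen old one.
        for _ in range(region_len):
            rc = sum(random.random() < rc / n_particles
                     for _ in range(n_particles))
        fails += (rc == 0)   # correct analysis extinct at disambiguation
    return fails / runs

for length in (1, 3, 6):
    print(f"region length {length}: failure rate ≈ {failure_rate(region_len=length):.3f}")
```

The failure rate rises monotonically with region length, matching the qualitative pattern the linking hypothesis predicts.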
Model Results
• Ambiguity matters…
• …but the length of the ambiguous region also matters!
[Figures: model results; human results (offline rating study)]
Rational comprehension’s other successes
• Global disambiguation preferences (Jurafsky, 1996)
The women discussed the dogs on the beach
(where does on the beach attach?)
• Basic garden-path sentences (Hale, 2001)
The horse raced past the barn fell
• Garden-path gradience (Narayanan & Jurafsky, 2002)
The crook arrested by the detective was guilty
• Predictability in unambiguous contexts (Levy, 2008)
The children went outside to… play / chat
• Grounding in optimality/rational analysis (Norris, 2006;
Smith & Levy, 2008)
Behavioral correlates (Tabor et al., 2004)
[Figure: difficulty at the embedded verb — tossed harder than thrown]
• Also, Konieczny (2006, 2007) found compatible results in stops-making-sense and visual-world paradigms
• These results are problematic for theories requiring global
contextual consistency (Frazier, 1987; Gibson, 1991, 1998; Jurafsky, 1996;
Hale, 2001, 2006)