Probabilistic and Lexicalized Parsing
COMS 4705 – Fall 2004
Probabilistic CFGs
• The probabilistic model
– Assigning probabilities to parse trees
• Uses: disambiguation, language modeling for ASR, faster parsing
• Getting the probabilities for the model
• Parsing with probabilities
– Slight modification to dynamic programming
approach
– Task: find max probability tree for an input string
Probability Model
• Attach probabilities to grammar rules
• Expansions for a given non-terminal sum to 1
VP -> V          .55
VP -> V NP       .40
VP -> V NP NP    .05
– Read this as P(Specific rule | LHS)
– “What’s the probability that VP will expand to V,
given that we have a VP?”
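A minimal sketch of how such a rule table might be stored, using the probabilities from this slide; the dictionary layout and names are illustrative, not a fixed format:

    # Hypothetical PCFG fragment: (LHS, RHS) -> P(rule | LHS).
    pcfg = {
        ("VP", ("V",)):            0.55,
        ("VP", ("V", "NP")):       0.40,
        ("VP", ("V", "NP", "NP")): 0.05,
    }

    # Expansions of a given non-terminal must sum to 1.
    assert abs(sum(p for (lhs, _), p in pcfg.items() if lhs == "VP") - 1.0) < 1e-9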
Probability of a Derivation
• A derivation (tree) consists of the set of
grammar rules that are in the parse tree
• The probability of a tree is just the product of
the probabilities of the rules in the derivation
• Note the independence assumption – why don’t
we use conditional probabilities?
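A minimal sketch of that product, assuming trees are nested (label, children) tuples, the pcfg dict sketched above, and a hypothetical lexicon_prob dict for A -> word rules:

    def tree_prob(tree, pcfg, lexicon_prob):
        """Probability of a parse tree: the product of its rule probabilities."""
        label, children = tree
        if isinstance(children, str):             # preterminal rule A -> word
            return lexicon_prob.get((label, children), 0.0)
        rhs = tuple(child[0] for child in children)
        p = pcfg.get((label, rhs), 0.0)           # P(label -> rhs | label)
        for child in children:
            p *= tree_prob(child, pcfg, lexicon_prob)
        return p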
Probability of a Sentence
• Probability of a word sequence (sentence) is
probability of its tree in unambiguous case
• Sum of probabilities of possible trees in
ambiguous case
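In symbols, writing Trees(W) for the set of parses the grammar assigns to a word string W:

    P(W) = \sum_{T \in \mathrm{Trees}(W)} P(T)

In the unambiguous case the sum has a single term, so P(W) = P(T).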
Getting the Probabilities
• From an annotated database
– E.g. the Penn Treebank
– To get the probability for a particular VP rule just
count all the times the rule is used and divide by the
number of VPs overall
– What if you have no treebank (e.g. for a ‘new’
language)?
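A count-and-divide (maximum likelihood) sketch of that estimate, assuming the treebank has been flattened into a list of (lhs, rhs) rule uses; names are illustrative:

    from collections import Counter

    def estimate_pcfg(rule_uses):
        """MLE: P(lhs -> rhs | lhs) = count(lhs -> rhs) / count(lhs).
        rule_uses: list of (lhs, rhs) pairs, one per rule occurrence in the treebank."""
        rule_uses = list(rule_uses)
        rule_counts = Counter(rule_uses)                    # rhs must be a tuple (hashable)
        lhs_counts = Counter(lhs for lhs, _ in rule_uses)
        return {(lhs, rhs): c / lhs_counts[lhs]
                for (lhs, rhs), c in rule_counts.items()}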
Assumptions
• We have a grammar to parse with
• We have a large robust dictionary with parts of
speech
• We have a parser
• Given all that… we can parse probabilistically
Typical Approach
• Bottom-up (CYK) dynamic programming
approach
• Assign probabilities to constituents as they are
completed and placed in the table
• Use the max probability for each constituent
going up the tree
How do we fill in the DP table?
• Say we’re at the final step of a parse: finding an S spanning words 0 to j via the rule
– S -> NP VP, where the NP spans 0 to i and the VP spans i to j
• The probability of the S is
P(S -> NP VP) * P(NP) * P(VP)
• P(NP) and P(VP) (highlighted on the original slide) are already known, since we’re parsing bottom-up; we don’t need to recalculate the probabilities of constituents lower in the tree.
Using the Maxima
• P(NP) is known
• But what if there are multiple NPs for the span
of text in question (0 to i)?
• Take the max (Why?)
• Does not mean that other kinds of constituents
for the same span are ignored (i.e. they might
be in the solution)
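A minimal probabilistic CYK sketch along these lines, assuming the grammar is in Chomsky Normal Form and is given as dictionaries of binary-rule and lexical probabilities; the function name and layout are illustrative:

    from collections import defaultdict

    def pcky(words, lexicon, rules):
        """Viterbi CYK: table[(i, j)][A] = max probability of an A over words[i:j].
        lexicon: {(A, word): prob};  rules: {(A, (B, C)): prob}."""
        n = len(words)
        table = defaultdict(dict)
        for i, w in enumerate(words):                       # base case: A -> w
            for (A, word), p in lexicon.items():
                if word == w and p > table[(i, i + 1)].get(A, 0.0):
                    table[(i, i + 1)][A] = p
        for span in range(2, n + 1):                        # recursive case: A -> B C
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):                   # split point
                    for (A, (B, C)), p in rules.items():
                        if B in table[(i, k)] and C in table[(k, j)]:
                            prob = p * table[(i, k)][B] * table[(k, j)][C]
                            if prob > table[(i, j)].get(A, 0.0):
                                table[(i, j)][A] = prob     # keep only the max per (span, A)
        return table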
CYK Parsing: John called Mary from Denver
• S -> NP VP
• VP -> V NP
• NP -> NP PP
• VP -> VP PP
• PP -> P NP
• NP -> John, Mary, Denver
• V -> called
• P -> from
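For illustration, the pcky sketch above can be run on this toy grammar; the probabilities below are invented (chosen only so that each left-hand side sums to 1), not taken from any treebank:

    rules = {
        ("S",  ("NP", "VP")): 1.0,
        ("VP", ("V", "NP")):  0.6,
        ("VP", ("VP", "PP")): 0.4,
        ("NP", ("NP", "PP")): 0.2,
        ("PP", ("P", "NP")):  1.0,
    }
    lexicon = {
        ("NP", "John"): 0.3, ("NP", "Mary"): 0.3, ("NP", "Denver"): 0.2,
        ("V", "called"): 1.0, ("P", "from"): 1.0,
    }
    chart = pcky("John called Mary from Denver".split(), lexicon, rules)
    print(chart[(0, 5)]["S"])   # probability of the best S spanning the whole sentence

With these made-up numbers the VP attachment of "from Denver" happens to score higher than the NP attachment; different rule probabilities could flip the choice.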
Example
[Parse tree: the PP "from Denver" attached to the VP, i.e. (S (NP John) (VP (VP (V called) (NP Mary)) (PP (P from) (NP Denver))))]
Example
[Parse tree: the PP "from Denver" attached to the NP, i.e. (S (NP John) (VP (V called) (NP (NP Mary) (PP (P from) (NP Denver)))))]
Example
[Figure: the input words "John called Mary from Denver" laid out along the bottom of an empty CYK chart]
Base Case: Aw
NP
P
NP
V
NP
Denver
from
Mary
called
John
7/15/2016
COMS 4705 – Fall 2004
15
Recursive Cases: ABC
NP
P
NP
X
V
NP
called
Denver
from
Mary
John
7/15/2016
COMS 4705 – Fall 2004
16
NP
P
VP
NP
X
V
Mary
NP
called
Denver
from
John
7/15/2016
COMS 4705 – Fall 2004
17
NP
X
P
VP
NP
from
X
V
Mary
NP
called
Denver
John
7/15/2016
COMS 4705 – Fall 2004
18
PP
NP
X
P
Denver
VP
NP
from
X
V
Mary
NP
called
John
7/15/2016
COMS 4705 – Fall 2004
19
S
NP
PP
NP
X
P
Denver
VP
NP
from
V
Mary
called
John
7/15/2016
COMS 4705 – Fall 2004
20
PP
NP
Denver
X
X
P
S
VP
NP
from
X
V
Mary
NP
called
John
7/15/2016
COMS 4705 – Fall 2004
21
NP
X
S
VP
NP
X
V
Mary
NP
called
PP
NP
P
Denver
from
John
7/15/2016
COMS 4705 – Fall 2004
22
NP
PP
NP
Denver
X
X
X
P
S
VP
NP
from
X
V
Mary
NP
called
John
7/15/2016
COMS 4705 – Fall 2004
23
VP
NP
PP
NP
X
X
X
P
Denver
S
VP
NP
from
X
V
Mary
NP
called
John
7/15/2016
COMS 4705 – Fall 2004
24
VP
NP
PP
NP
X
X
X
P
Denver
S
VP
NP
from
X
V
Mary
NP
called
John
7/15/2016
COMS 4705 – Fall 2004
25
NP
PP
NP
X
VP1
VP2
X
X
P
Denver
S
VP
NP
from
X
V
Mary
NP
called
John
7/15/2016
COMS 4705 – Fall 2004
26
S
NP
PP
NP
X
VP1
VP2
X
X
P
Denver
S
VP
NP
from
X
V
Mary
NP
called
John
7/15/2016
COMS 4705 – Fall 2004
27
S
VP
NP
PP
NP
X
X
X
P
Denver
S
VP
NP
from
X
V
Mary
NP
called
John
7/15/2016
COMS 4705 – Fall 2004
28
Problems with PCFGs
• The probability model we’re using is just based
on the rules in the derivation…
– Doesn’t use the words in any real way – e.g. PP
attachment often depends on the verb, its object,
and the preposition (I ate pickles with a fork. I ate
pickles with relish.)
– Doesn’t take into account where in the derivation a
rule is used – e.g. pronouns more often subjects than
objects (She hates Mary. Mary hates her.)
Solution
• Add lexical dependencies to the scheme…
– Add the predilections of particular words into the
probabilities in the derivation
– I.e. Condition the rule probabilities on the actual
words
Heads
• Make use of notion of the head of a phrase, e.g.
– The head of an NP is its noun
– The head of a VP is its verb
– The head of a PP is its preposition
• Phrasal heads
– Can ‘take the place of’ whole phrases, in some sense
– Define most important characteristics of the phrase
– Phrases are generally identified by their heads
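A minimal sketch of head rules of this flavor; the preference table is invented for illustration (real systems use fuller tables, e.g. Collins-style head rules):

    # Hypothetical head-preference table: for each phrase type, which daughter
    # category supplies the head, in priority order.
    HEAD_RULES = {
        "NP": ["NN", "NNS", "NNP", "NP"],   # the head of an NP is its noun
        "VP": ["VBD", "VBZ", "VB", "VP"],   # the head of a VP is its verb
        "PP": ["IN"],                       # the head of a PP is its preposition
    }

    def head_daughter(label, daughter_labels):
        """Return the index of the daughter whose category the head rules prefer."""
        for preferred in HEAD_RULES.get(label, []):
            if preferred in daughter_labels:
                return daughter_labels.index(preferred)
        return 0                            # fallback: leftmost daughter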
Example (correct parse)
[Figure: lexicalized parse tree for the "dumped sacks into..." example, with the PP attached to the VP headed by dumped]
Example (wrong)
[Figure: the incorrect parse of the same sentence, with the PP attached to the NP headed by sacks]
How?
• We started with rule probabilities
– VP -> V NP PP
P(rule|VP)
• E.g., count of this rule divided by the number of VPs in a
treebank
• Now we want lexicalized probabilities
– VP(dumped) -> V(dumped) NP(sacks) PP(into)
– P(r | VP ^ dumped is the verb ^ sacks is the head of the NP ^ into is the head of the PP)
– Not likely to have significant counts in any treebank
Declare Independence
• So, exploit independence assumption and
collect the statistics you can…
• Focus on capturing two things
– Verb subcategorization
• Particular verbs have affinities for particular VPs
– Objects have affinities for their predicates (mostly
their mothers and grandmothers)
• Some objects fit better with some predicates than others
Verb Subcategorization
• Condition particular VP rules on their head… so
r: VP -> V NP PP      with probability P(r | VP)
becomes
P(r | VP ^ dumped)
• What’s the count?
– The number of times this rule is used with dumped as its head, divided by the total number of VPs headed by dumped
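A count-and-divide sketch of that estimate, assuming (lhs, head, rhs) triples have been extracted from a treebank; names are illustrative:

    from collections import Counter

    def subcat_probs(rule_uses):
        """MLE: P(lhs -> rhs | lhs, head), e.g. P(VP -> V NP PP | VP, dumped).
        rule_uses: list of (lhs, head_word, rhs) triples from treebank trees."""
        rule_uses = list(rule_uses)
        rule_counts = Counter(rule_uses)
        head_counts = Counter((lhs, head) for lhs, head, _ in rule_uses)
        return {(lhs, head, rhs): c / head_counts[(lhs, head)]
                for (lhs, head, rhs), c in rule_counts.items()}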
Preferences
• Subcat captures the affinity between VP heads
(verbs) and the VP rules they go with.
• What about the affinity between VP heads and
the heads of the other daughters of the VP?
Example (correct parse)
[Figure: the correct parse again, with PP(into) attached to the VP headed by dumped]
Example (wrong)
[Figure: the incorrect parse again, with PP(into) attached to the NP headed by sacks]
Preferences
• The issue here is the attachment of the PP
• So the affinities we care about are the ones
between dumped and into vs. sacks and into.
– Count the times dumped is the head of a constituent that has a PP daughter with into as its head, and normalize (alternatively, P(into | PP, dumped is the mother’s head))
– Vs. the times sacks is the head of a constituent that has a PP daughter with into as its head (or, P(into | PP, sacks is the mother’s head))
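The same count-and-normalize idea, sketched for these attachment affinities; it assumes (mother_head, pp_head) pairs have been collected from every constituent with a PP daughter, and the names are illustrative:

    from collections import Counter

    def pp_affinity(pairs):
        """MLE: P(pp_head | PP, mother_head), e.g. P(into | PP, dumped) vs. P(into | PP, sacks).
        pairs: list of (mother_head, pp_head) for each PP daughter in the treebank."""
        pairs = list(pairs)
        pair_counts = Counter(pairs)
        mother_counts = Counter(m for m, _ in pairs)
        return {(m, h): c / mother_counts[m] for (m, h), c in pair_counts.items()}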
Another Example
• Consider the VPs
– Ate spaghetti with gusto
– Ate spaghetti with marinara
• The affinity of gusto for ate is much greater than its affinity for spaghetti
• On the other hand, the affinity of marinara for spaghetti is much greater than its affinity for ate
But not a Head Probability Relationship
• Note the relationship here is more distant and
doesn’t involve a headword since gusto and
marinara aren’t the heads of the PPs (Hindle &
Rooth ’91)
[Figure: two lexicalized trees: VP(ate) with PP(with) attached to the VP for "ate spaghetti with gusto", and NP(spaghetti) with PP(with) attached to the NP for "ate spaghetti with marinara"]
Next Time
• Midterm
– Covers everything assigned and all lectures up
through today
– Short answers (2-3 sentences), exercises, longer
answers (1 para)
– Closed book, calculators allowed but no laptops