Parsing More Efficiently and Accurately CS 4705

Review
• Top-Down vs. Bottom-Up Parsers
• Left-corner table provides more efficient lookahead
• Left recursion solutions
• Structural ambiguity…solutions?
Issues for Better Parsing
• Efficiency
• Error handling
• Control strategies
• Agreement and subcategorization
Inefficient ReParsing of Subtrees
Dynamic Programming
• Create table of solutions to sub-problems (e.g.
subtrees) as parse proceeds
• Look up subtrees for each constituent rather than
re-parsing
• Since all parses implicitly stored, all available for
later disambiguation
• Examples: Cocke-Younger-Kasami (CYK) (1960s),
Graham-Harrison-Ruzzo (GHR) (1980) and
Earley (1970) algorithms
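To make the table-of-sub-solutions idea concrete, here is a minimal CYK recognizer sketch in Python. It assumes a small, hypothetical grammar in Chomsky normal form (not the grammar used later in these slides), and the names `BINARY`, `LEXICAL` and `cyk_table` are illustrative only.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption for this sketch).
BINARY = {            # (B, C) -> set of A such that A --> B C
    ("V", "NP"): {"S", "VP"},
    ("Det", "N"): {"NP"},
}
LEXICAL = {           # word -> set of parts of speech
    "book": {"V", "N"},
    "that": {"Det"},
    "flight": {"N"},
}

def cyk_table(words):
    """Return table[(i, j)] = set of non-terminals spanning words[i:j]."""
    n = len(words)
    table = {}
    for i, word in enumerate(words):
        table[(i, i + 1)] = set(LEXICAL.get(word, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                # Look up the already-solved sub-spans instead of re-parsing them.
                for b, c in product(table[(i, k)], table[(k, j)]):
                    cell |= BINARY.get((b, c), set())
            table[(i, j)] = cell
    return table

table = cyk_table("book that flight".split())
print("S" in table[(0, 3)])
```

Each cell is computed once from smaller cells, so a subtree such as the NP over "that flight" is never parsed twice.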
Earley’s Algorithm
• Uses dynamic programming to do parallel top-down search in (worst case) O(N^3) time
• First, a left-to-right pass fills out a chart with N+1 entries
(N: the number of words in the input)
– Think of chart entries as sitting between words in the
input string keeping track of states of the parse at these
positions
– For each word position, chart contains set of states
representing all partial parse trees generated to date.
E.g. chart[0] contains all partial parse trees generated at
the beginning of the sentence
• Chart entries represent three types of constituents:
– predicted constituents (top-down predictions)
– in-progress constituents (we’re in the midst of …)
– completed constituents (we’ve found …)
• Progress in parse represented by Dotted Rules
– Position of • indicates type of constituent
– 0 Book 1 that 2 flight 3
S --> • VP, [0,0] (predicting VP)
NP --> Det • Nom, [1,2] (finding NP)
VP --> V NP •, [0,3] (found VP)
– [x,y] tells us where the state begins (x) and where the
dot lies (y) wrt the input
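One possible way to represent a dotted rule in code is sketched below. The `State` class and its field names are assumptions made for illustration; the slides do not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class State:
    lhs: str                 # left-hand side of the rule
    rhs: Tuple[str, ...]     # right-hand side symbols
    dot: int                 # number of rhs symbols already recognized
    start: int               # x in [x,y]: where the constituent begins
    end: int                 # y in [x,y]: where the dot lies in the input

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self):
        return None if self.is_complete() else self.rhs[self.dot]

    def kind(self) -> str:
        if self.is_complete():
            return "completed"
        return "predicted" if self.dot == 0 else "in-progress"

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} --> {' '.join(symbols)}, [{self.start},{self.end}]"

for state in (
    State("S", ("VP",), 0, 0, 0),
    State("NP", ("Det", "Nom"), 1, 1, 2),
    State("VP", ("V", "NP"), 2, 0, 3),
):
    print(state, "->", state.kind())
```

The three example states are the ones shown above for "Book that flight"; the position of the dot alone determines whether a state is predicted, in progress, or completed.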
0 Book 1 that 2 flight 3
S --> • VP, [0,0]
– First 0 means S constituent begins at the start of the
input
– Second 0 means the dot is here too
– So, this is a top-down prediction
NP --> Det • Nom, [1,2]
– The NP begins at position 1
– The dot is at position 2
– So, Det has been successfully parsed
– Nom predicted next
VP --> V NP •, [0,3]
– Successful VP parse of entire input
– Graphical representation
Successful Parse
• Final answer is found by looking at last entry in
chart
• If entry resembles S --> α • [0,N] then input
parsed successfully
• But … note that chart will also contain a record of
all possible parses of input string, given the
grammar -- not just the successful one(s)
– Why is this useful?
Parsing Procedure for the Earley Algorithm
• Move through each set of states in order, applying
one of three operators to each state:
– predictor: add top-down predictions to the chart
– scanner: read input and add corresponding state to chart
– completer: move dot to right when new constituent
found
• Results (new states) added to current or next set of
states in chart
• No backtracking and no states removed: keep
complete history of parse
– Why is this useful?
Predictor
• Intuition: new states represent top-down
expectations
• Applied when non part-of-speech non-terminals
are to the right of a dot
S --> • VP [0,0]
• Adds new states to end of current chart
– One new state for each expansion of the non-terminal in
the grammar
VP --> • V [0,0]
VP --> • V NP [0,0]
Scanner
• New states for predicted part of speech.
• Applicable when part of speech is to the right of a
dot
VP --> • V NP [0,0] ‘Book…’
• Looks at current word in input
• If match, adds state(s) to next chart
VP --> V • NP [0,1]
• I.e., we’ve found a piece of this constituent!
Completer
• Intuition: we’ve found a constituent, so tell
everyone waiting for this
• Applied when dot has reached right end of rule
NP --> Det Nom • [1,3]
• Find all states w/dot at 1 and expecting an NP
VP --> V • NP [0,1]
• Adds new (completed) state(s) to current chart
VP --> V NP • [0,3]
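The three operators can be put together in a compact recognizer. The following is a sketch under stated assumptions, not a definitive implementation: states are plain tuples `(lhs, rhs, dot, start)` whose end position is the chart entry they sit in, the dummy start symbol is named `GAMMA`, words are lower-cased before lookup, and the function names are hypothetical. The grammar is the fragment of English used in these slides.

```python
GRAMMAR = {   # non part-of-speech non-terminals
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["N", "Nom"], ["Nom", "PP"]],
    "VP": [["V"], ["V", "NP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {   # part of speech -> words (lower-cased for this sketch)
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}

def earley_chart(words):
    # A state is (lhs, rhs, dot, start); its end is the chart index it sits in.
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, position):
        if state not in chart[position]:    # never add duplicates
            chart[position].append(state)

    add(("GAMMA", ("S",), 0, 0), 0)          # dummy start state
    for i in range(len(words) + 1):
        for state in chart[i]:               # chart[i] may grow while we iterate
            lhs, rhs, dot, start = state
            if dot < len(rhs) and rhs[dot] in GRAMMAR:        # predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], tuple(expansion), 0, i), i)
            elif dot < len(rhs):                               # scanner
                if i < len(words) and words[i] in LEXICON.get(rhs[dot], ()):
                    add((rhs[dot], (words[i],), 1, i), i + 1)
            else:                                              # completer
                for w_lhs, w_rhs, w_dot, w_start in chart[start]:
                    if w_dot < len(w_rhs) and w_rhs[w_dot] == lhs:
                        add((w_lhs, w_rhs, w_dot + 1, w_start), i)
    return chart

def recognizes(words):
    chart = earley_chart([w.lower() for w in words])
    return ("GAMMA", ("S",), 1, 0) in chart[-1]

def show(chart):
    for i, states in enumerate(chart):
        print(f"Chart[{i}]")
        for lhs, rhs, dot, start in states:
            symbols = list(rhs[:dot]) + ["•"] + list(rhs[dot:])
            print(f"  {lhs} --> {' '.join(symbols)}  [{start},{i}]")

show(earley_chart(["book", "that", "flight"]))
print(recognizes("Book that flight".split()))
```

Nothing is ever removed from the chart and no state is added twice, which is what lets the left-recursive rule Nom --> Nom PP be predicted without looping.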
Book that flight (Chart [0])
• Seed chart with top-down predictions for S from
grammar
γ --> • S            [0,0]  Dummy start state
S --> • NP VP        [0,0]  Predictor
S --> • Aux NP VP    [0,0]  Predictor
S --> • VP           [0,0]  Predictor
NP --> • Det Nom     [0,0]  Predictor
NP --> • PropN       [0,0]  Predictor
VP --> • V           [0,0]  Predictor
VP --> • V NP        [0,0]  Predictor
CFG for Fragment of English
S --> NP VP
S --> Aux NP VP
S --> VP
NP --> Det Nom
NP --> PropN
Nom --> N
Nom --> N Nom
Nom --> Nom PP
VP --> V
VP --> V NP
PP --> Prep NP
Det --> that | this | a
N --> book | flight | meal | money
V --> book | include | prefer
Aux --> does
Prep --> from | to | on
PropN --> Houston | TWA
• When dummy start state is processed, it’s passed
to Predictor, which produces states representing
every possible expansion of S, and adds these and
every expansion of the left corners of these trees
to bottom of Chart[0]
• When VP --> • V, [0,0] is reached, Scanner is called,
which consults first word of input, Book, and adds
first state to Chart[1], V --> book •, [0,1]
• Note: When VP --> • V NP, [0,0] is reached in
Chart[0], Scanner does not need to add V -->
book •, [0,1] again to Chart[1]
Chart[1]
V --> book •         [0,1]  Scanner
VP --> V •           [0,1]  Completer
VP --> V • NP        [0,1]  Completer
S --> VP •           [0,1]  Completer
NP --> • Det Nom     [1,1]  Predictor
NP --> • PropN       [1,1]  Predictor
• V --> book • is passed to Completer, which finds 2
states in Chart[0] whose left corner is V and adds
them to Chart[1], moving dots to right
• When VP --> V • is itself processed by the
Completer, S --> VP • is added to Chart[1] since
VP is a left corner of S
• Last 2 rules in Chart[1] are added by Predictor
when VP --> V • NP is processed
• And so on….
How do we retrieve the parses at the end?
• Augment the Completer to add ptr to prior states it
advances as a field in the current state
– I.e. what state did we advance here?
– Read the ptrs back from the final state
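A hedged sketch of the back-pointer idea follows. It assumes a cut-down version of the slide grammar, stores the pointers in a separate dictionary (`pointers`, a hypothetical name) rather than as a field of the state, and keeps only the first derivation found for each state, so it returns one tree rather than all parses.

```python
GRAMMAR = {   # reduced grammar, an assumption to keep the sketch short
    "S": [("VP",)],
    "VP": [("V",), ("V", "NP")],
    "NP": [("Det", "Nom")],
    "Nom": [("N",)],
}
LEXICON = {"V": {"book"}, "Det": {"that"}, "N": {"book", "flight"}}

def earley_parse(words):
    chart = [[] for _ in range(len(words) + 1)]
    pointers = {}   # (state, end) -> tuple of (child_state, child_end) keys

    def add(state, end, children):
        if state not in chart[end]:
            chart[end].append(state)
            pointers[(state, end)] = children   # keep the first derivation only

    add(("GAMMA", ("S",), 0, 0), 0, ())
    for i in range(len(words) + 1):
        for state in chart[i]:
            lhs, rhs, dot, start = state
            if dot < len(rhs) and rhs[dot] in GRAMMAR:          # predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], expansion, 0, i), i, ())
            elif dot < len(rhs):                                 # scanner
                if i < len(words) and words[i] in LEXICON.get(rhs[dot], ()):
                    add((rhs[dot], (words[i],), 1, i), i + 1, ())
            else:                                                # completer
                for waiting in chart[start]:
                    w_lhs, w_rhs, w_dot, w_start = waiting
                    if w_dot < len(w_rhs) and w_rhs[w_dot] == lhs:
                        advanced = (w_lhs, w_rhs, w_dot + 1, w_start)
                        # Record which completed state caused this advance.
                        children = pointers[(waiting, start)] + ((state, i),)
                        add(advanced, i, children)

    final = (("GAMMA", ("S",), 1, 0), len(words))
    if final not in pointers:
        return None

    def build(key):
        (lhs, rhs, _, _), _ = key
        kids = pointers[key]
        # Scanned states have no children: the right-hand side is the word.
        return (lhs, rhs[0]) if not kids else (lhs, *[build(k) for k in kids])

    return build(pointers[final][0])   # skip the dummy GAMMA node

print(earley_parse(["book", "that", "flight"]))
```

To recover every parse of an ambiguous input, each state would need a list of alternative pointer tuples instead of a single one.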
Error Handling
• What happens when we look at the contents of the
last table column and don't find an S --> α • rule?
– Is it a total loss? No...
– Chart contains every constituent and combination of
constituents possible for the input given the grammar
• Also useful for partial parsing or shallow parsing
used in information extraction
Alternative Control Strategies
• Change Earley top-down strategy to bottom-up or
...
• Change to best-first strategy based on the
probabilities of constituents
– Compute and store probabilities of constituents in the
chart as you parse
– Then instead of expanding states in fixed order, allow
probabilities to control order of expansion
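One way to let probabilities control the order of expansion is to keep pending states on a priority queue (an agenda). The sketch below shows only that ordering mechanism; the `Agenda` class is a hypothetical helper and the probabilities are made-up values for illustration, not estimates from any corpus.

```python
import heapq
import itertools

class Agenda:
    """Priority queue that always yields the most probable pending state."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker keeps insertion order

    def push(self, probability, state):
        # heapq is a min-heap, so store the negated probability.
        heapq.heappush(self._heap, (-probability, next(self._counter), state))

    def pop(self):
        neg_probability, _, state = heapq.heappop(self._heap)
        return -neg_probability, state

    def __bool__(self):
        return bool(self._heap)

agenda = Agenda()
# Hypothetical, made-up probabilities for three predicted states.
agenda.push(0.10, "S --> • Aux NP VP, [0,0]")
agenda.push(0.80, "S --> • NP VP, [0,0]")
agenda.push(0.05, "S --> • VP, [0,0]")

order = []
while agenda:
    probability, state = agenda.pop()
    order.append(state)
    print(f"{probability:.2f}  {state}")
```

In a full best-first parser, the predictor, scanner and completer would push their new states onto such an agenda instead of appending them to the end of a chart entry.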
But there are still problems…
• Several things CFGs don’t handle elegantly:
– Agreement (A cat sleeps. Cats sleep.)
S --> NP VP
NP --> Det Nom
But these rules overgenerate, allowing, e.g., *A
cat sleep…
– Subcategorization (Cats dream. Cats eat
cantaloupe.)
VP --> V
VP --> V NP
But these also allow *Cats dream cantaloupe.
• We need to constrain the grammar rules to
enforce e.g. number agreement and
subcategorization differences
CFG Solution
• Encode constraints into the non-terminals
– Noun/verb agreement
S --> SgS
S --> PlS
SgS --> SgNP SgVP
SgNP --> SgDet SgNom
– Verb subcat:
IntransVP --> IntransV
TransVP --> TransV NP
• But this means huge proliferation of rules…
• An alternative:
– View terminals and non-terminals as complex objects
with associated features, which take on different values
– Write grammar rules whose application is constrained
by tests on these features, e.g.
S --> NP VP (only if the NP and VP agree in number)
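The proliferation caused by encoding constraints in the non-terminals can be seen by generating the specialized rules mechanically. This is an illustrative sketch only: the two base rules come from the slides, while the feature inventory and the prefix naming scheme (`Sg3NP` and so on) are assumptions.

```python
from itertools import product

BASE_RULES = [("S", ("NP", "VP")), ("NP", ("Det", "Nom"))]
# Assumed feature inventory: 2 numbers x 3 persons.
FEATURES = {"number": ["Sg", "Pl"], "person": ["1", "2", "3"]}

def specialize(rules, features):
    """Copy every rule once per combination of feature values."""
    specialized = []
    for combo in product(*features.values()):
        prefix = "".join(combo)
        for lhs, rhs in rules:
            specialized.append((prefix + lhs, tuple(prefix + sym for sym in rhs)))
    return specialized

rules = specialize(BASE_RULES, FEATURES)
print(len(BASE_RULES), "base rules become", len(rules))
for lhs, rhs in rules[:2]:
    print(lhs, "-->", " ".join(rhs))
```

Every additional feature multiplies the rule count again, which is the motivation for moving the constraints out of the category names and into feature tests.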
Feature Structures
• Sets of feature-value pairs where:
– Features are atomic symbols
– Values are atomic symbols or feature structures
– Illustrated by attribute-value matrix
[ Feature_1  Value_1 ]
[ Feature_2  Value_2 ]
[ ...        ...     ]
[ Feature_n  Value_n ]
• Number feature
[ Num  SG ]
• Number-person features
[ Num   SG ]
[ Pers  3  ]
• Number-person-category features (3sgNP)
[ Cat   NP ]
[ Num   SG ]
[ Pers  3  ]
Features, Unification and Grammars
• How do we incorporate feature structures into our
grammars?
– Assume that constituents are objects which have
feature-structures associated with them
– Associate sets of unification constraints with grammar
rules
– Constraints must be satisfied for rule to be satisfied
• To enforce subject/verb number agreement
S --> NP VP
<NP NUM> = <VP NUM>
Agreement in English
• We need to add PERS to our subj/verb agreement
constraint
This cat likes kibble.
S --> NP VP
<NP AGR> = <VP AGR>
Do these cats like kibble?
S --> Aux NP VP
<Aux AGR> = <NP AGR>
• Det/Nom agreement can be handled similarly
These cats
This cat
NP --> Det Nom
<Det AGR> = <Nom AGR>
<NP AGR> = <Nom AGR>
• And so on …
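The agreement constraints above can be sketched with feature structures represented as nested Python dicts. This is a simplified illustration: `unify`, `build_np` and `s_rule_applies` are hypothetical helper names, the lexical entries are assumptions, and reentrancy (shared values between paths) is not modelled.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts or atomic strings).

    Returns the merged structure, or None if the two are incompatible.
    Simplification: no reentrancy (shared values) is modelled here.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for feature, value in b.items():
            if feature in merged:
                result = unify(merged[feature], value)
                if result is None:
                    return None
                merged[feature] = result
            else:
                merged[feature] = value
        return merged
    return a if a == b else None

def build_np(det, nom):
    """NP --> Det Nom with <Det AGR> = <Nom AGR> and <NP AGR> = <Nom AGR>."""
    agr = unify(det.get("AGR", {}), nom.get("AGR", {}))
    return None if agr is None else {"CAT": "NP", "AGR": agr}

def s_rule_applies(np, vp):
    """S --> NP VP with <NP AGR> = <VP AGR>."""
    return unify(np.get("AGR", {}), vp.get("AGR", {})) is not None

# Assumed lexical entries for the examples on this slide.
this = {"CAT": "Det", "AGR": {"NUM": "SG"}}
these = {"CAT": "Det", "AGR": {"NUM": "PL"}}
cat = {"CAT": "Nom", "AGR": {"NUM": "SG", "PERS": "3"}}
cats = {"CAT": "Nom", "AGR": {"NUM": "PL", "PERS": "3"}}
likes = {"CAT": "VP", "AGR": {"NUM": "SG", "PERS": "3"}}

print(build_np(these, cats))                        # agreement succeeds
print(build_np(this, cats))                         # *this cats: unification fails
print(s_rule_applies(build_np(this, cat), likes))   # This cat likes kibble.
```

Because Det is underspecified for PERS, unification fills it in from Nom, so a single NP rule covers every agreement combination.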
Verb Subcategorization
• Recall: Different verbs take different types of
argument
– Solution: SUBCAT feature, or subcategorization frames
e.g. Bill wants George to eat.
[ ORTH  want
  CAT   V
  HEAD  [ SUBCAT  < [ CAT NP ], [ CAT VP, HEAD [ VFORM INF ] ] > ] ]
• But there are many phrasal types and so many
types of subcategorization frames, e.g.
– believe
– believe [VPrep in] [NP ghosts]
– believe [NP my mother]
– believe [Sfin that I will pass this test]
– believe [Swh what I see] ...
• Verbs also subcategorize for subject as well as
object types ([Swh What she wanted] seemed clear.)
• And other p.o.s. can be seen as subcategorizing for
various arguments, such as prepositions, nouns
and adjectives (It was clear [Sfin that she was
exhausted])
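A subcategorization lexicon can be sketched as a mapping from each verb to the complement sequences it allows. The frames below are assumptions read off the examples in these slides (dream, eat, want, believe); the category labels are plain strings and `licensed` is a hypothetical helper.

```python
# Each verb maps to the complement frames it allows (assumed from the examples).
SUBCAT = {
    "dream": [()],                        # Cats dream.
    "eat": [("NP",)],                     # Cats eat cantaloupe.
    "want": [("NP", "VPinf")],            # Bill wants George to eat.
    "believe": [(), ("VPrep", "NP"), ("NP",), ("Sfin",), ("Swh",)],
}

def licensed(verb, complements):
    """True if the verb has a frame matching this sequence of complements."""
    return tuple(complements) in SUBCAT.get(verb, [])

print(licensed("dream", []))              # Cats dream.
print(licensed("dream", ["NP"]))          # *Cats dream cantaloupe.
print(licensed("want", ["NP", "VPinf"]))  # Bill wants George to eat.
```

A parser would consult such frames when applying VP rules, so that a verb only combines with complements listed in one of its frames.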
Summing Up
• Ambiguity, left-recursion, and repeated re-parsing
of subtrees present major problems for parsers
• Solutions:
– Combine top-down predictions with bottom-up lookahead
and use dynamic programming, e.g. the Earley algorithm
– Feature structures and subcategorization frames help
constrain parses but increase parsing complexity
• Next time: Read Ch 12