Basic Parsing with Context-Free Grammars (CS 4705)

Analyzing Linguistic Units
• Morphological parsing:
– analyze words into morphemes and affixes
– rule-based, FSAs, FSTs
• Ngrams for Language Modeling
• POS Tagging
• Syntactic parsing:
– identify constituents and their relationships
– to see if a sentence is grammatical
– to assign an abstract representation of meaning
2
Syntactic Parsing
• Declarative formalisms like CFGs, FSAs define
the legal strings of a language -- but only tell you
‘this is a legal string of the language X’
• Parsing algorithms specify how to recognize the
strings of a language and assign each string one
(or more) syntactic analyses
• Parsing useful for grammar checking, semantic
analysis, MT, QA, information extraction, speech
recognition…almost every task in NLP…but…
3
Parsing as a Form of Search
• Searching FSAs
– Finding the right path through the automaton
– Search space defined by structure of FSA
• Searching CFGs
– Finding the right parse tree among all possible parse
trees
– Search space defined by the grammar
• Constraints provided by the input sentence and the
automaton or grammar
4
CFG for Fragment of English
S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nom
NP -> PropN
Nom -> N Nom
Nom -> N
Nom -> Nom PP
VP -> V NP
VP -> V
PP -> Prep NP
N -> book | flight | meal | money
V -> book | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA
Det -> that | this | a
5
Parse Tree for ‘Book that flight’ for Prior CFG
[S [VP [V Book] [NP [Det that] [Nom [N flight]]]]]
6
Rule Expansion
S -> NP VP
S -> Aux NP VP
S -> VP (1)
NP -> Det Nom (3)
NP -> PropN
Nom -> N Nom
Nom -> N (4)
Nom -> Nom PP
VP -> V NP (2)
VP -> V
PP -> Prep NP
N -> book | flight | meal | money
V -> book | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA
Det -> that | this | a
7
Top-Down Parser
• Builds from the root S node to the leaves
• Assuming we build all trees in parallel:
– Find all trees with root S (or all rules w/ lhs S)
– Next expand all constituents in these trees/rules
– Continue until leaves are POS
– Candidate trees failing to match POS of input string are
rejected (e.g. Book that flight matches only one
subtree)
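The parallel expansion above can be sketched in a few lines. This is only an illustration under assumptions: the names `GRAMMAR` and `expand_ply` are hypothetical, each candidate tree is represented just by its frontier (the left-to-right sequence of leaves), and POS categories are left unexpanded.

```python
# Non-terminal rules of the toy grammar from the slides.
# POS categories (Det, N, V, ...) have no entry, so they are never expanded here.
GRAMMAR = {
    "S":   [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP":  [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP":  [("V", "NP"), ("V",)],
    "PP":  [("Prep", "NP")],
}


def expand_ply(forms):
    """Expand the leftmost non-POS symbol of every frontier with every matching rule."""
    next_ply = []
    for form in forms:
        for i, symbol in enumerate(form):
            if symbol in GRAMMAR:
                for rhs in GRAMMAR[symbol]:
                    next_ply.append(form[:i] + rhs + form[i + 1:])
                break
        else:
            # Only POS categories left: nothing more to expand.
            next_ply.append(form)
    return next_ply


ply1 = [("S",)]
ply2 = expand_ply(ply1)
ply3 = expand_ply(ply2)
for form in ply3:
    print(" ".join(form))
```

Each call produces one ply of the search space. A full parser would also reject frontiers whose POS sequence cannot match the input, which this sketch leaves out.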
8
Top-Down Search Space for CFG (expanding
only leftmost leaves)
[Figure: search space in three plies]
Ply 1: S
Ply 2: S -> NP VP; S -> Aux NP VP; S -> VP
Ply 3: NP expanded as Det Nom or PropN (under both S -> NP VP and S -> Aux NP VP); VP expanded as V NP or V
9
Bottom-Up Parsing
• Parser begins with the words of the input and builds up
trees, applying grammar rules whose rhs match the nodes built so far
– Book that flight
– Two initial POS assignments: N Det N and V Det N
– ‘Book’ is ambiguous (2 POS appear in the grammar)
– Parse continues until an S root node is reached or no
further node expansion is possible
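A minimal bottom-up recognizer along these lines might look as follows. It is a sketch, not the parser from the course: the names `RULES`, `LEXICON` and `bottom_up_recognize` are hypothetical, and it searches exhaustively over sequences of categories instead of building explicit trees.

```python
from itertools import product

# Toy grammar from the slides as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)),
    ("Nom", ("N", "Nom")), ("Nom", ("N",)), ("Nom", ("Nom", "PP")),
    ("VP", ("V", "NP")), ("VP", ("V",)),
    ("PP", ("Prep", "NP")),
]
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a"},
}


def bottom_up_recognize(words):
    """Return True if the words can be reduced to a single S."""
    words = [w.lower() for w in words]
    # One start state per combination of POS tags ('book' yields two).
    tag_choices = [
        sorted(pos for pos, vocab in LEXICON.items() if w in vocab)
        for w in words
    ]
    agenda = [tuple(tags) for tags in product(*tag_choices)]
    seen = set(agenda)
    while agenda:
        form = agenda.pop()
        if form == ("S",):
            return True
        # Replace any span that matches a rule's rhs by the rule's lhs.
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:
                    reduced = form[:i] + (lhs,) + form[i + n:]
                    if reduced not in seen:
                        seen.add(reduced)
                        agenda.append(reduced)
    return False


print(bottom_up_recognize("Book that flight".split()))
```

The search stops when an S spanning the whole input is found or when no further reduction is possible, mirroring the last bullet above.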
10
Two Candidates: One Successful Parse
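[Figure: two candidate bottom-up analyses of ‘Book that flight’]
Successful: [S [VP [V Book] [NP [Det that] [Nom [N flight]]]]]
Failed: [VP [V Book]] [NP [Det that] [Nom [N flight]]] (the grammar has no rule S -> VP NP)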
11
What’s right/wrong with….
• Top-Down parsers – they never explore illegal
parses (e.g. ones that can’t form an S) -- but waste
time on trees that can never match the input
• Bottom-Up parsers – they never explore trees
inconsistent with input -- but waste time exploring
illegal parses (with no S root)
• For both: find a control strategy -- how to explore
the search space efficiently?
– Pursue all parses in parallel, or backtrack, or …?
– Which rule to apply next?
– Which node to expand next?
12
A Possible Top-Down Parsing Strategy
• Depth-first search:
– Agenda of search states: expand search space
incrementally, exploring most recently generated state
(tree) each time
– When you reach a state (tree) inconsistent with input,
backtrack to most recent unexplored state (tree)
• Which node to expand?
– Leftmost or rightmost
• Which grammar rule to use?
– Order in the grammar? How?
13
Top-Down, Depth-First, Left-Right Strategy
• Initialize agenda with ‘S’ tree and ptr to first word
(cur)
• Loop: Until successful parse or empty agenda
– Apply next applicable grammar rule to leftmost
unexpanded node (n) of current tree (t) on agenda and
push resulting tree (t’) onto agenda
• If n is a POS category and matches the POS of cur,
push new tree (t’’) onto agenda
• Else pop t’ from agenda
– Final agenda contains history of successful parse
• Does this flight include a meal?
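The strategy above can be sketched as a small agenda-based recognizer. Several things here are assumptions, not part of the slides: the names `GRAMMAR`, `LEXICON` and `top_down_parse` are hypothetical, a search state stores only the unexpanded symbols plus the rules applied so far instead of a full tree, and a state is pruned when it has more pending symbols than remaining words. That pruning is valid only because no rule in this grammar is empty, and it is what keeps the left-recursive rule Nom -> Nom PP from looping.

```python
# Toy grammar from the slides; list order stands in for rule order.
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
    "PP":  [["Prep", "NP"]],
}
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a"},
}


def top_down_parse(words, start="S"):
    """Return the rules used in the first successful parse, or None."""
    words = [w.lower() for w in words]
    # State: (unexpanded symbols left to right, index of cur, rules applied so far)
    agenda = [((start,), 0, ())]
    while agenda:
        symbols, cur, history = agenda.pop()  # most recent state first: depth-first
        if not symbols:
            if cur == len(words):
                return list(history)
            continue
        # Assumed pruning: every pending symbol needs at least one word.
        if len(symbols) > len(words) - cur:
            continue
        first, rest = symbols[0], symbols[1:]
        if first in LEXICON:
            # POS category: it must match the POS of the current word.
            if words[cur] in LEXICON[first]:
                step = f"{first} -> {words[cur]}"
                agenda.append((rest, cur + 1, history + (step,)))
        else:
            # Push in reverse so the first rule in the grammar is tried first.
            for rhs in reversed(GRAMMAR[first]):
                step = f"{first} -> {' '.join(rhs)}"
                agenda.append((tuple(rhs) + rest, cur, history + (step,)))
    return None


for step in top_down_parse("Does this flight include a meal".split()):
    print(step)
```

Popping from the end of the list gives the depth-first behaviour; failed states are simply dropped, so the next pop is the backtrack to the most recent unexplored state.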
14
Fig 10.7
15
Left Corners: Top-Down Parsing with
Bottom-Up Filtering
• We saw: Top-Down, depth-first, L2R parsing
– Expands non-terminals along the tree’s left edge down
to leftmost leaf of tree
– Moves on to expand down to next leftmost leaf…
– Note: In a successful parse, the current input word will be
the first word in the derivation of the node the parser is
currently processing
– So….look ahead to left-corner of the tree
• B is a left-corner of A if A =*=> Bα
• Build table with left-corners of all non-terminals in
grammar and consult before applying rule
16
Left Corners
17
Left-Corner Table for CFG
Category    Left Corners
S           Det, PropN, Aux, V
NP          Det, PropN
Nom         N
VP          V
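A table like this can be derived mechanically from the grammar. The sketch below is one way to do it, assuming a grammar with no empty rules; the names `GRAMMAR`, `POS` and `left_corner_table` are hypothetical. It takes the first symbol of each rhs, closes that relation transitively, and keeps only POS categories, as the table above does.

```python
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
    "PP":  [["Prep", "NP"]],
}
POS = {"N", "V", "Aux", "Prep", "PropN", "Det"}


def left_corner_table(grammar, pos_tags):
    """Map each non-terminal to the POS categories that can start it."""
    # Direct left corners: the first symbol of every rhs.
    table = {lhs: {rhs[0] for rhs in rules} for lhs, rules in grammar.items()}
    changed = True
    while changed:  # transitive closure of the left-corner relation
        changed = False
        for corners in table.values():
            for corner in list(corners):
                new = table.get(corner, set()) - corners
                if new:
                    corners |= new
                    changed = True
    return {lhs: corners & pos_tags for lhs, corners in table.items()}


for category, corners in left_corner_table(GRAMMAR, POS).items():
    print(category, sorted(corners))
```

Before applying a rule for category A, a top-down parser can check whether the POS of the current word is in A's entry and skip the rule if it is not.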
18
Left Recursion vs. Right Recursion
• Depth-first search will never terminate if grammar
is left recursive (e.g. NP --> NP PP)
(A =*=> αAβ, α =*=> ε)
19
• Solutions:
– Rewrite the grammar (automatically?) to a weakly
equivalent one which is not left-recursive
e.g. The man {on the hill with the telescope…}
NP --> NP PP (wanted: Nom plus a sequence of PPs)
NP --> Nom PP
NP --> Nom
Nom --> Det N
…becomes…
NP --> Nom NP’
Nom --> Det N
NP’ --> PP NP’ (wanted: a sequence of PPs)
NP’ --> ε
• Not so obvious what these rules mean…
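For immediate left recursion the rewrite can be automated with the standard textbook transformation: A --> A α | β becomes A --> β A’ and A’ --> α A’ | ε. The sketch below assumes that transformation; the function name and the list-of-lists rule format are hypothetical, and the empty list stands for ε. Its output for the NP rules is slightly more verbose than the hand-simplified version above (it keeps Nom PP NP’ as well), but it generates the same strings.

```python
def remove_immediate_left_recursion(lhs, rules):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    new = lhs + "'"
    # alpha parts: what follows the recursive occurrence of lhs
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == lhs]
    # beta parts: rules that do not start with lhs
    others = [rhs for rhs in rules if not rhs or rhs[0] != lhs]
    if not recursive:
        return {lhs: rules}
    return {
        lhs: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],  # [] is epsilon
    }


np_rules = [["NP", "PP"], ["Nom", "PP"], ["Nom"]]
for lhs, rules in remove_immediate_left_recursion("NP", np_rules).items():
    for rhs in rules:
        print(lhs, "-->", " ".join(rhs) or "ε")
```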
20
– Harder to detect and eliminate non-immediate left
recursion
– NP --> Nom PP
– Nom --> NP
– Fix depth of search explicitly
– Rule ordering: non-recursive rules first
• NP --> Det Nom
• NP --> NP PP
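Detecting left recursion, including the non-immediate kind, amounts to asking whether a category can reach itself through a chain of leftmost rhs symbols. A minimal sketch, assuming no empty rules (with ε-rules the leftmost-symbol test would not be enough); the function name is hypothetical:

```python
def left_recursive_nonterminals(grammar):
    """Return the non-terminals that can derive themselves as leftmost symbol."""
    # reach[A] = symbols that can appear leftmost in something derived from A
    reach = {a: {rhs[0] for rhs in rules if rhs} for a, rules in grammar.items()}
    changed = True
    while changed:
        changed = False
        for a in reach:
            for b in list(reach[a]):
                extra = reach.get(b, set()) - reach[a]
                if extra:
                    reach[a] |= extra
                    changed = True
    return {a for a in reach if a in reach[a]}


indirect = {
    "NP":  [["Nom", "PP"]],
    "Nom": [["NP"]],
    "PP":  [["Prep", "NP"]],
}
print(sorted(left_recursive_nonterminals(indirect)))
```

On the two-rule example above, both NP and Nom are reported, even though neither rule is left-recursive on its own.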
21
An Exercise: The city hall parking lot in town
• NP --> NP NP PP
• NP --> Det Nom
• NP --> Adj Nom
• NP --> Nom Nom
• Nom --> NP Nom
• Nom --> N
• PP --> Prep NP
• N --> city | hall | lot | town
• Adj --> parking
• Prep --> to | for | in
22
Another Problem: Structural ambiguity
• Multiple legal structures
– Attachment (e.g. I saw a man on a hill with a telescope)
– Coordination (e.g. younger cats and dogs)
– NP bracketing (e.g. Spanish language teachers)
23
NP vs. VP Attachment
24
• Solution?
– Return all possible parses and disambiguate using
“other methods”
25
Summing Up
• Parsing is a search problem which may be
implemented with many control strategies
– Top-Down or Bottom-Up approaches each have
problems
• Combining the two solves some but not all issues
– Left recursion
– Syntactic ambiguity
• Next time: Making use of statistical information
about syntactic constituents
– Read Ch 12
26