PARSING
Analyzing Linguistic Units
Morphology
• Task: analyze words into morphemes
• Formal mechanism: context-dependency rules, FST composition
• Resulting representation: morphological structure

Phonology
• Task: analyze words into phonemes
• Formal mechanism: context-dependency rules, FST composition
• Resulting representation: phonemic structure

Syntax
• Task: analyze sentences for syntactic relations between words
• Formal mechanism: grammars (CFGs), PDAs; top-down, bottom-up, Earley, and CKY parsing
• Resulting representation: parse tree, derivation tree
• Why should we parse a sentence?
  – to detect relations among words
  – to normalize surface syntactic variations
  – invaluable for a number of NLP applications
Some Concepts
Grammar: A generative device that prescribes a set of valid strings.
Parser: A device that uncovers the sequence of grammar rules that
might have generated the input sentence.
• Input: grammar, sentence
• Output: parse tree, derivation tree
Recognizer: A device that returns "yes" if the input string could be generated by the grammar.
• Input: grammar, sentence
• Output: boolean
Searching for a Parse
A grammar plus a rewrite procedure encodes:
• all strings generated by the grammar, L(G)
• all parse trees for each string s it generates: T(G) = ∪_s T_s(G)
Given an input sentence I, its set of parse trees is T_I(G).
Parsing is searching for T_I(G) ⊆ T(G).
Ideally, the parser finds the appropriate parse for the sentence.
CFG for Fragment of English

S → NP VP
S → Aux NP VP
S → VP
NP → Det Nom
NP → PropN
Nom → N
Nom → N Nom
Nom → Nom PP
VP → V
VP → V NP
PP → Prep NP
Det → that | this | a
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Prep → from | to | on
PropN → Houston | TWA

Example parse of "Book that flight":
  (S (VP (V Book) (NP (Det that) (Nom (N flight)))))
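One convenient way to hold such a fragment in code is a mapping from each non-terminal to its right-hand sides. The sketch below (plain Python; the encoding is mine, not from the slides) spells out the grammar above.

```python
# A minimal encoding of the toy CFG above; the representation is illustrative.
GRAMMAR = {
    "S":     [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":    [["Det", "Nom"], ["PropN"]],
    "Nom":   [["N"], ["N", "Nom"], ["Nom", "PP"]],
    "VP":    [["V"], ["V", "NP"]],
    "PP":    [["Prep", "NP"]],
    "Det":   [["that"], ["this"], ["a"]],
    "N":     [["book"], ["flight"], ["meal"], ["money"]],
    "V":     [["book"], ["include"], ["prefer"]],
    "Aux":   [["does"]],
    "Prep":  [["from"], ["to"], ["on"]],
    "PropN": [["Houston"], ["TWA"]],
}

def expansions(nonterminal):
    """All right-hand sides for a non-terminal, e.g. expansions('VP')."""
    return GRAMMAR.get(nonterminal, [])

print(expansions("VP"))   # [['V'], ['V', 'NP']]
```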
Top-down vs. Bottom-up Parsing

Top-down (recursive descent parser)
• Starts from: S (the goal)
• Algorithm: (a) pick a non-terminal; (b) pick rules from the grammar to expand the non-terminal (possibly pursuing alternatives in parallel)
• Termination: success when the leaves of a tree match the input; failure when there are no more non-terminals to expand in any of the trees
• Pro: goal-driven, starts with S
• Con: constructs trees that may not match the input

Bottom-up (shift-reduce parser)
• Starts from: the words (the input)
• Algorithm: (a) match a sequence of input symbols with the RHS of some rule; (b) replace the sequence by the LHS of the matching rule
• Termination: success when S is reached; failure when no more rewrites are possible
• Pro: constrained by the input string
• Con: constructs constituents that may not lead to the goal S
• Control strategy -- how to explore search space?
• Pursuing all parses in parallel or backtrack or …?
• Which rule to apply next?
• Which node to expand next?
• Look at how top-down and bottom-up parsing work on the board for "Book that flight"
Top-down, Depth-First, Left-to-Right Parser
Systematic, incremental expansion of the search space.
• In contrast to a parallel parser
Start State: (• S, 0)
End State: (•, n), where n is the length of the input to be parsed
Next State Rules
• (• w_j+1 β, j) ⇒ (• β, j+1)
• (• B β, j) ⇒ (• γ β, j) if B → γ (note: B is the left-most non-terminal)
Agenda: a data structure that keeps track of the states to be expanded.
Depth-first expansion results if the Agenda is a stack.
(Fig 10.7)
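The state rules above translate almost directly into code. Below is a minimal top-down, depth-first, left-to-right recognizer; the tiny grammar, the tuple-based states, and the stack-as-agenda are illustrative choices, and it returns only yes/no rather than a parse tree.

```python
# A minimal top-down, depth-first, left-to-right recognizer following the
# (• beta, j) states above. Grammar and names are an illustrative sketch.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"]],
    "Nom": [["N"]],
    "VP": [["V"], ["V", "NP"]],
    "Det": [["that"]],
    "N": [["flight"]],
    "V": [["book"]],
}

def recognize(words):
    # Each agenda entry is (remaining symbols to derive, input position j).
    agenda = [(("S",), 0)]          # start state (• S, 0)
    while agenda:
        symbols, j = agenda.pop()   # stack -> depth-first expansion
        if not symbols:
            if j == len(words):     # end state (•, n)
                return True
            continue
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:        # expand the left-most non-terminal
            for rhs in GRAMMAR[first]:
                agenda.append((tuple(rhs) + rest, j))
        elif j < len(words) and words[j] == first:
            agenda.append((rest, j + 1))   # match a terminal against the input
    return False

print(recognize(["book", "that", "flight"]))   # True
```

Because the agenda is a stack, the most recently created state is expanded first, which is exactly the depth-first behaviour described above.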
Left Corners
• Can we help top-down parsers with some bottom-up information?
  – Unnecessary states are created if there are many B → γ rules.
  – If, after successive expansions, B ⇒* w δ and w does not match the input, then the series of expansions is wasted.
• The leftmost symbol derivable from B needs to match the input.
  – Look ahead to the left-corner of the tree.
• B is a left-corner of A if A ⇒* B γ.
• Build a table with the left-corners of all non-terminals in the grammar and consult it before applying a rule.
• At a given point in state expansion (• B β, j), pick the rule B → C γ only if a left-corner of C matches the input w_j+1.

Category: Left Corners
  S:   Det, PropN, Aux, V
  NP:  Det, PropN
  Nom: N
  VP:  V
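A left-corner table can be computed once from the grammar by taking the first symbol of every right-hand side and then closing under "a left corner of a left corner is a left corner". The sketch below is illustrative (grammar and function names are mine); note that the slide's table keeps only the pre-terminal corners.

```python
# A sketch of computing the left-corner table for a small CFG (illustrative).
# B is a left-corner of A if A =>* B gamma, i.e. B can start an A.
from collections import defaultdict

GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["N", "Nom"], ["Nom", "PP"]],
    "VP":  [["V"], ["V", "NP"]],
    "PP":  [["Prep", "NP"]],
}

def left_corners(grammar):
    # Direct left corners: the first symbol of each right-hand side ...
    table = defaultdict(set)
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            table[lhs].add(rhs[0])
    # ... then take the transitive closure.
    changed = True
    while changed:
        changed = False
        for lhs in grammar:
            for corner in list(table[lhs]):
                new = table.get(corner, set()) - table[lhs]
                if new:
                    table[lhs] |= new
                    changed = True
    return table

for cat, corners in left_corners(GRAMMAR).items():
    print(cat, sorted(corners))
# e.g. S ['Aux', 'Det', 'NP', 'PropN', 'V', 'VP']
# (the slide's table lists only the pre-terminal corners Det, PropN, Aux, V)
```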
Limitation of Top-down Parsing: Left Recursion
Depth-first search will never terminate if the grammar is left-recursive (e.g. NP → NP PP):
a grammar is left-recursive if some non-terminal A satisfies A ⇒* A β, so the parser keeps producing states of the form (• A β', j) without ever consuming input.
Solutions:
• Rewrite the grammar to a weakly equivalent one which is not left-recursive, e.g.
    NP → NP PP                NP → Nom NP'
    NP → Nom PP    becomes    NP' → PP NP'
    NP → Nom                  NP' → ε
  – This may make rules unnatural.
• Fix the depth of search explicitly.
Other book-keeping needed in top-down parsing:
• Memoization for reusing previously parsed substrings
• Packed representation for parse ambiguity
Dynamic Programming for Parsing
Memoization:
• Create a table of solutions to sub-problems (e.g. subtrees) as the parse proceeds
• Look up subtrees for each constituent rather than re-parsing
• Since all parses are implicitly stored, all are available for later disambiguation
Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR) (1980), and Earley (1970) algorithms
Earley parser: O(n^3) parser
• Top-down parser with bottom-up information
• State: [i, A → α • β, j]
  – j is the position in the string that has been parsed so far
  – i is the position in the string where A begins
• Top-down prediction: S ⇒* w_1 … w_i A γ
• Bottom-up completion: α ⇒* w_i+1 … w_j (the part before the dot derives the input seen so far)
Earley Parser
Data structure: an (n+1)-cell array called the Chart
• For each word position, the chart contains a set of states representing all partial parse trees generated to date.
  – E.g., chart[0] contains all partial parse trees generated at the beginning of the sentence.
Chart entries represent three types of constituents:
• predicted constituents (top-down predictions)
• in-progress constituents (we're in the midst of ...)
• completed constituents (we've found ...)
Progress in the parse is represented by dotted rules; the position of • indicates the type of constituent.

  0 Book 1 that 2 flight 3
  (0, S → • VP, 0)         (predicting VP)
  (1, NP → Det • Nom, 2)   (finding NP)
  (0, VP → V NP •, 3)      (found VP)
Earley Parser: Parse Success
The final answer is found by looking at the last entry in the chart:
• if an entry resembles (0, S → α •, n), the input was parsed successfully.
But note that the chart will also contain a record of all possible parses of the input string, given the grammar -- not just the successful one(s).
• Why is this useful?
Earley Parsing Steps
Start State: (0, S’ •S, 0)
End State: (0, S•, n) n is the input size
Next State Rules
•
Scanner: read input

•
Predictor: add top-down predictions

•
(i, A•wj+1b, j)  (i, Awj+1•b, j+1)
(i, A•Bb, j)  (j, B•g, j) if Bg (note B is left-most non-terminal)
Completer: move dot to right when new constituent found

(i, B•Ab, k) (k, Ag•, j)  (i, BA•b, j)
No backtracking and no states removed: keep complete history of parse
•
Why is this useful?
Earley Parser Steps
Scanner
• When does it apply: when a terminal is to the right of a dot, e.g. (0, VP → • V NP, 0)
• What chart cell is affected: new states are added to the next cell
• What goes in the cell: the same state with the dot moved over the terminal, e.g. (0, VP → V • NP, 1)

Predictor
• When does it apply: when a non-terminal is to the right of a dot, e.g. (0, S → • VP, 0)
• What chart cell is affected: new states are added to the current cell
• What goes in the cell: one new state for each expansion of the non-terminal in the grammar, e.g. (0, VP → • V, 0) and (0, VP → • V NP, 0)

Completer
• When does it apply: when the dot reaches the end of a rule, e.g. (1, NP → Det Nom •, 3)
• What chart cell is affected: new states are added to the current cell
• What goes in the cell: one state for each rule "waiting" for the constituent, e.g. (0, VP → V • NP, 1) advanced to (0, VP → V NP •, 3)
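Putting the three operations together gives a compact Earley recognizer. The sketch below is illustrative: the grammar is a small fragment, states are tuples (i, LHS, RHS, dot), and the back-pointers needed to recover parses (discussed later) are omitted.

```python
# A compact Earley recognizer following the Scanner/Predictor/Completer rules
# above. Grammar, sentence, and the state encoding are illustrative.
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["N", "Nom"]],
    "VP":  [["V"], ["V", "NP"]],
    "Det": [["that"]], "N": [["flight"]], "V": [["book"]],
    "Aux": [["does"]], "PropN": [["Houston"]],
}

def earley(words):
    n = len(words)
    # chart[j] holds states (i, lhs, rhs, dot): rhs[:dot] spans words[i:j].
    chart = [set() for _ in range(n + 1)]
    chart[0].add((0, "GAMMA", ("S",), 0))          # dummy start state
    for j in range(n + 1):
        agenda = list(chart[j])
        while agenda:
            i, lhs, rhs, dot = agenda.pop()
            if dot < len(rhs) and rhs[dot] in GRAMMAR:            # Predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    new = (j, rhs[dot], tuple(expansion), 0)
                    if new not in chart[j]:
                        chart[j].add(new)
                        agenda.append(new)
            elif dot < len(rhs):                                   # Scanner
                if j < n and words[j] == rhs[dot]:
                    chart[j + 1].add((i, lhs, rhs, dot + 1))
            else:                                                  # Completer
                for (k, wl, wr, wdot) in list(chart[i]):
                    if wdot < len(wr) and wr[wdot] == lhs:
                        new = (k, wl, wr, wdot + 1)
                        if new not in chart[j]:
                            chart[j].add(new)
                            agenda.append(new)
    return (0, "GAMMA", ("S",), 1) in chart[n]

print(earley(["book", "that", "flight"]))   # True
```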
Book that flight (Chart [0])
Seed chart with top-down predictions for S from grammar
  γ → • S             [0,0]   Dummy start state
  S → • NP VP         [0,0]   Predictor
  S → • Aux NP VP     [0,0]   Predictor
  S → • VP            [0,0]   Predictor
  NP → • Det Nom      [0,0]   Predictor
  NP → • PropN        [0,0]   Predictor
  VP → • V            [0,0]   Predictor
  VP → • V NP         [0,0]   Predictor
CFG for Fragment of English
S → NP VP
S → Aux NP VP
S → VP
NP → Det Nom
Nom → N
Nom → N Nom
NP → PropN
VP → V
VP → V NP
Det → that | this | a
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Prep → from | to | on
PropN → Houston | TWA
Nom → Nom PP
PP → Prep NP
Chart[1]
V book 
VP  V 
VP  V  NP
S  VP 
NP   Det Nom
NP   PropN
[0,1]
[0,1]
[0,1]
[0,1]
[1,1]
[1,1]
Scanner
Completer
Completer
Completer
Predictor
Predictor
V book  passed to Completer, which finds 2
states in Chart[0] whose left corner is V and
adds them to Chart[1], moving dots to right
Retrieving the parses
Augment the Completer to add a pointer to the prior states it advances, as a field in the current state:
• i.e., which states combined to arrive here?
• Read the pointers back from the final state.
What if the final cell does not have the final state? -- Error handling.
• Is it a total loss? No...
• The chart contains every constituent and combination of constituents possible for the input, given the grammar.
• Useful for partial parsing or shallow parsing, as used in information extraction.
Alternative Control Strategies
Change the Earley top-down strategy to bottom-up, or ...
Change to a best-first strategy based on the probabilities of constituents:
• Compute and store the probabilities of constituents in the chart as you parse
• Then, instead of expanding states in a fixed order, allow the probabilities to control the order of expansion
Probabilistic and Lexicalized Parsing
Probabilistic CFGs
Weighted CFGs:
• Attach weights to the rules of a CFG
• Compute the weights of derivations
• Use the weights to pick preferred parses
  – Utility: pruning and ordering the search space, disambiguation, language model for ASR
Parsing with weighted grammars (like weighted FAs):
• T* = arg max_T W(T, S)
Probabilistic CFGs are one form of weighted CFGs.
Probability Model
• Rule probability:
  – Attach probabilities to grammar rules
  – Expansions for a given non-terminal sum to 1
      R1: VP → V          0.55
      R2: VP → V NP       0.40
      R3: VP → V NP NP    0.05
  – Estimate the probabilities from annotated corpora, e.g. P(R1) = count(R1) / count(VP)
• Derivation probability:
  – Derivation T = {R1 … Rn}
  – Probability of a derivation: P(T) = ∏_i=1..n P(R_i)
  – Most likely parse: T* = arg max_T P(T)
  – Probability of a sentence: P(S) = Σ_T P(T, S)
      • Sum over all possible derivations of the sentence
• Note the independence assumption: a rule's probability does not change based on where in the derivation it is expanded.
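The counting and the product of rule probabilities look like this in code; the counts below are invented purely to mirror the VP example above.

```python
# A sketch of estimating PCFG rule probabilities from counts and scoring a
# derivation. The counts are made up, not from a treebank.
from collections import Counter
from math import prod

# Imagined treebank counts for VP expansions.
rule_counts = Counter({
    ("VP", ("V",)): 55,
    ("VP", ("V", "NP")): 40,
    ("VP", ("V", "NP", "NP")): 5,
})

lhs_counts = Counter()
for (lhs, _), c in rule_counts.items():
    lhs_counts[lhs] += c

def rule_prob(lhs, rhs):
    # P(rule) = count(rule) / count(LHS); expansions of each LHS sum to 1.
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

def derivation_prob(rules):
    # P(T) = product of the probabilities of the rules used in the derivation.
    return prod(rule_prob(lhs, rhs) for lhs, rhs in rules)

print(rule_prob("VP", ("V", "NP")))                              # 0.4
print(derivation_prob([("VP", ("V",)), ("VP", ("V", "NP"))]))    # 0.55 * 0.4 (= 0.22)
```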
Structural ambiguity
• S → NP VP
• VP → V NP
• NP → NP PP
• VP → VP PP
• PP → P NP
• NP → John | Mary | Denver
• V → called
• P → from

John called Mary from Denver

Two parses:
  (S (NP John) (VP (VP (V called) (NP Mary)) (PP (P from) (NP Denver))))     [PP attached to the VP]
  (S (NP John) (VP (V called) (NP (NP Mary) (PP (P from) (NP Denver)))))     [PP attached to the NP]
Cocke-Younger-Kasami Parser
Bottom-up parser with top-down filtering
Start State(s): (A, i, i+1) for each A → w_i+1
End State: (S, 0, n), where n is the input size
Next State Rules
• (B, i, k) (C, k, j) ⇒ (A, i, j) if A → BC
Example: John called Mary from Denver

Base case (A → w):
  (NP, 0, 1)  John
  (V, 1, 2)   called
  (NP, 2, 3)  Mary
  (P, 3, 4)   from
  (NP, 4, 5)  Denver

Recursive cases (A → BC), filling the chart cell by cell (cells that admit no constituent are marked X):
  (VP, 1, 3)  from (V, 1, 2) and (NP, 2, 3)
  (S, 0, 3)   from (NP, 0, 1) and (VP, 1, 3)
  (PP, 3, 5)  from (P, 3, 4) and (NP, 4, 5)
  (NP, 2, 5)  from (NP, 2, 3) and (PP, 3, 5)
  (VP, 1, 5)  built two ways: from (VP, 1, 3) and (PP, 3, 5), and from (V, 1, 2) and (NP, 2, 5)
  (S, 0, 5)   from (NP, 0, 1) and (VP, 1, 5); the two VP analyses correspond to the two parses of the ambiguous sentence
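The chart construction just walked through fits in a few lines. The recognizer below follows the (B, i, k) (C, k, j) ⇒ (A, i, j) rule with the ambiguity grammar from the earlier slide; the encoding is an illustrative sketch (the grammar happens to be in the binary-plus-lexical form CKY needs).

```python
# Illustrative CKY recognizer for the ambiguity example above.
from collections import defaultdict

LEXICAL = {                      # A -> w
    "John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
    "called": {"V"}, "from": {"P"},
}
BINARY = [                       # A -> B C
    ("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
    ("NP", "NP", "PP"), ("PP", "P", "NP"),
]

def cky(words):
    n = len(words)
    chart = defaultdict(set)     # chart[(i, j)] = categories spanning words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] = set(LEXICAL.get(w, ()))       # base case: (A, i, i+1)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                      # split point
                for a, b, c in BINARY:                     # (B,i,k)(C,k,j) => (A,i,j)
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(a)
    return chart

chart = cky("John called Mary from Denver".split())
print("S" in chart[(0, 5)])      # True: the sentence is recognized
print(sorted(chart[(1, 5)]))     # ['VP']  (found via two different splits)
```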
Probabilistic CKY
• Assign probabilities to constituents as they are completed and
placed in the table
• Computing the probability:
    P(A, i, j) = Σ_{A → BC, k} P(A → BC, i, k, j)
    P(A → BC, i, k, j) = P(B, i, k) × P(C, k, j) × P(A → BC)
• Since we are interested in the max P(S, 0, n):
  – use the max probability for each constituent
  – maintain back-pointers to recover the parse
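A sketch of the probabilistic version: the same chart, but each cell keeps the maximum probability per category plus a back-pointer so the best parse can be read back out. The rule probabilities below are invented for illustration.

```python
# Probabilistic CKY with max probabilities and back-pointers (illustrative).
from collections import defaultdict

LEXICAL = {"John": [("NP", 1.0)], "Mary": [("NP", 1.0)], "Denver": [("NP", 1.0)],
           "called": [("V", 1.0)], "from": [("P", 1.0)]}
BINARY = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 0.6), ("VP", "VP", "PP", 0.4),
          ("NP", "NP", "PP", 0.2), ("PP", "P", "NP", 1.0)]

def pcky(words):
    n = len(words)
    prob = defaultdict(dict)     # prob[(i, j)][A] = best probability of A over words[i:j]
    back = {}                    # back[(i, j, A)] = (k, B, C) used to recover the parse
    for i, w in enumerate(words):
        for a, p in LEXICAL.get(w, []):
            prob[(i, i + 1)][a] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, b, c, p_rule in BINARY:
                    if b in prob[(i, k)] and c in prob[(k, j)]:
                        # P(A -> B C, i, k, j) = P(B,i,k) * P(C,k,j) * P(A -> B C)
                        p = prob[(i, k)][b] * prob[(k, j)][c] * p_rule
                        if p > prob[(i, j)].get(a, 0.0):
                            prob[(i, j)][a] = p            # keep the max, not the sum
                            back[(i, j, a)] = (k, b, c)
    return prob, back

def tree(back, words, i, j, a):
    if j == i + 1:
        return (a, words[i])
    k, b, c = back[(i, j, a)]
    return (a, tree(back, words, i, k, b), tree(back, words, k, j, c))

words = "John called Mary from Denver".split()
prob, back = pcky(words)
print(prob[(0, 5)]["S"])             # probability of the best parse
print(tree(back, words, 0, 5, "S"))  # best parse, recovered from the back-pointers
```

With these made-up weights the VP-attachment parse wins; raising P(NP → NP PP) and lowering P(VP → VP PP) flips the preference, which is exactly how the probabilities disambiguate.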
Problems with PCFGs
The probability model we're using is based only on the rules in the derivation.
Lexical insensitivity:
• Doesn't use the words in any real way
• Structural disambiguation is lexically driven
  – PP attachment often depends on the verb, its object, and the preposition
  – I ate pickles with a fork.
  – I ate pickles with relish.
Context insensitivity of the derivation:
• Doesn't take into account where in the derivation a rule is used
  – Pronouns are more often subjects than objects
  – She hates Mary.
  – Mary hates her.
Solution: lexicalization
• Add lexical information to each rule
An example of lexical information: Heads
Make use of the notion of the head of a phrase:
• The head of an NP is its noun
• The head of a VP is its main verb
• The head of a PP is its preposition
Each LHS of a rule in the PCFG has a lexical item.
Each RHS non-terminal has a lexical item:
• one of the lexical items is shared with the LHS.
If R is the number of binary branching rules in the CFG, the lexicalized CFG has O(2 · |Σ| · |R|) such rules; unary rules contribute O(|Σ| · |R|).
(Figures: an example of the correct parse, the lexicalized rules written as an attribute grammar, and a less preferred parse.)
Computing Lexicalized Rule Probabilities
We started with rule probabilities:
• VP → V NP PP
  – P(rule | VP), e.g. the count of this rule divided by the number of VPs in a treebank
Now we want lexicalized probabilities:
• VP(dumped) → V(dumped) NP(sacks) PP(in)
• P(rule | VP ^ dumped is the verb ^ sacks is the head of the NP ^ in is the head of the PP)
• Not likely to have significant counts in any treebank
Another Example
Consider the VPs:
• ate spaghetti with gusto
• ate spaghetti with marinara
The dependency is not between mother and child:
  (VP(ate) (V ate) (NP spaghetti) (PP(with) with gusto))                      [PP attached to the VP]
  (VP(ate) (V ate) (NP(spaghetti) (NP spaghetti) (PP(with) with marinara)))   [PP attached to the NP]
Log-linear models for Parsing
• Why restrict the conditioning to the elements of a rule?
  – Use even larger context
  – Word sequence, word types, sub-tree context, etc.
• In general, compute P(y | x), where the features f_i(x, y) test properties of the context and λ_i is the weight of feature i:

  P(y | x) = exp( Σ_i λ_i f_i(x, y) ) / Σ_{y' ∈ Y} exp( Σ_i λ_i f_i(x, y') )

• Use these as scores in the CKY algorithm to find the best-scoring parse.
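As a concrete (and entirely made-up) illustration of the formula, the snippet below scores two candidate attachments for a sentence with a handful of hand-set features and weights, then normalizes the scores into P(y | x).

```python
# A minimal sketch of the log-linear (maximum-entropy) score above:
# P(y | x) = exp(sum_i lambda_i * f_i(x, y)) / sum_y' exp(sum_i lambda_i * f_i(x, y')).
# Features, weights, and candidate parses are all invented for illustration.
import math

def feature_vector(x, y):
    # f_i(x, y): arbitrary tests on the (sentence, parse) pair.
    return {
        "pp_attaches_to_vp": 1.0 if y == "vp-attach" else 0.0,
        "pp_attaches_to_np": 1.0 if y == "np-attach" else 0.0,
        "verb_is_called":    1.0 if "called" in x else 0.0,
    }

WEIGHTS = {"pp_attaches_to_vp": 0.8, "pp_attaches_to_np": 0.3, "verb_is_called": 0.1}

def score(x, y):
    # sum_i lambda_i * f_i(x, y)
    return sum(WEIGHTS.get(name, 0.0) * v for name, v in feature_vector(x, y).items())

def p_y_given_x(x, y, candidates):
    z = sum(math.exp(score(x, y2)) for y2 in candidates)   # normalization over Y
    return math.exp(score(x, y)) / z

x = "John called Mary from Denver"
candidates = ["vp-attach", "np-attach"]
for y in candidates:
    print(y, round(p_y_given_x(x, y, candidates), 3))   # such scores can rank parses in CKY
```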
Supertagging: Almost parsing
Example sentence: Poachers now control the underground trade
(Figure: each word is paired with its candidate supertags, i.e. elementary trees encoding the word's local syntactic context, such as NP trees for "poachers" and "trade", adverbial trees for "now", and several S/VP trees for "control"; choosing the correct supertag for each word resolves most of the parsing ambiguity, hence "almost parsing".)
Summary
Parsing context-free grammars:
• Top-down and bottom-up parsers
• Mixed approaches (CKY, Earley parsers)
Preferences over parses using probabilities:
• Parsing with the PCFG and probabilistic CKY algorithms
Enriching the probability model:
• Lexicalization
• Log-linear models for parsing