Syntax 2: Parsing Strategy and Active Chart Parsing
John Barnden
School of Computer Science, University of Birmingham
Natural Language Processing 1
2010/11 Semester 2
Towards More Variety in How Parsing Proceeds
Top-Down and Bottom-Up Parsing Strategies
• Like Prolog in general, DCGs work by goal-driven search: trying to satisfy the left-hand side (goal) parts of rules by going through the components of the right-hand side as sub-goals.
• S → NP VP: if we’re trying to find the word-list (or an initial sub-list) to be a sentence (S), we use this rule to try to find an initial sub-list to be an NP and then try to find the rest to be (or start with) a VP.
If it doesn’t work out, we try other rules for S.
To find an initial sub-list to be an NP, the system looks for rules with LHS = NP.
• This causes the parsing to use a top-down strategy: at a given point we’re guided by a hypothesis that we might have some higher-level structure (e.g., S), and to prove that hypothesis we look for lower-level structures (e.g., NP, VP). We “bottom out” at lexical forms.
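To make this concrete, here is a minimal sketch of a goal-driven (top-down) recogniser in Python. It is not the DCG machinery itself; the toy grammar, lexicon and function names are invented for illustration:

    # Top-down: to prove a goal category, try each rule for it and prove
    # the rule body left to right; bottom out at lexical categories.
    GRAMMAR = {"S": [["NP", "VP"]], "NP": [["Det", "Noun"]], "VP": [["Verb"]]}
    LEXICON = {"the": "Det", "dog": "Noun", "yawned": "Verb"}

    def parse(cat, words, i):
        """Yield every j such that words[i:j] can be a `cat`."""
        if cat in LEXICON.values():                       # lexical: bottom out
            if i < len(words) and LEXICON.get(words[i]) == cat:
                yield i + 1
            return
        for body in GRAMMAR.get(cat, []):                 # try each rule for the goal
            ends = [i]
            for sub_goal in body:                         # prove sub-goals in sequence
                ends = [k for j in ends for k in parse(sub_goal, words, j)]
            yield from ends

    print(list(parse("S", ["the", "dog", "yawned"], 0)))  # [3]: the whole list is an S

Failure of one rule simply falls through to the next alternative for the same goal, which is the backtracking behaviour described above.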
Bottom-Up Parsing Strategy
• A bottom-up (or data-driven) parsing strategy starts with the lexical forms and tries to combine them into syntactic categories.
• It can help to think of the rules written in reverse (with body on left and head on right):
NP VP → S
Verb PP → VP
Det Noun → NP
Det Adj Noun → NP
• So, a Det followed by a Noun can be combined into an NP.
• When some of the body of a rule has been satisfied, we’re ready for the next component to arise.
As components of rule bodies are satisfied, we march through the appropriate rules. We may be marching through more than one rule at once (conceptually), and indeed at different rates through different instantiations of a given rule.
When we’ve swallowed the Det, we’re part-way through the bodies of the last two rules above.
When an NP has been found, we’re part-way through the top rule above, waiting for a VP to arise.
• A rule whose body we’re marching through may not work out, so we have to switch attention to another (see the sketch below).
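A rough sketch of this “marching” in Python (the grammar, and the representation of a march as a rule plus a dot position, are invented for illustration):

    # Each march is (head, body, dot): body[:dot] has been satisfied so far.
    RULES = [("NP", ["Det", "Noun"]), ("NP", ["Det", "Adj", "Noun"])]

    def advance(marches, cat):
        """Start/advance marches on seeing `cat`; return survivors and completions."""
        moved = [(h, b, 1) for h, b in RULES if b[0] == cat]           # new marches
        moved += [(h, b, d + 1) for h, b, d in marches if b[d] == cat]
        completed = [h for h, b, d in moved if d == len(b)]
        return [(h, b, d) for h, b, d in moved if d < len(b)], completed

    marches = []
    for cat in ["Det", "Noun"]:            # lexical categories of "the dog"
        marches, done = advance(marches, cat)
        print(cat, "completes:", done)
    # After Det we are part-way through BOTH NP rules; the Noun then completes
    # the first.  (A full parser would feed completed constituents back in as
    # categories in their own right.)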
Choice Points and Policies Towards Them
• A choice-point is where there is, e.g.:
In top-down parsing:
– more than one grammar rule to try for an expected syntactic category;
– more than one lexical choice to try for an expected lexical category (NB: even for a fixed lexical form, there may be more than one possible set of grammatical-category values).
In bottom-up parsing:
– more than one lexical category or set of GC values to try for the current lexical form;
– more than one rule-body “march” to progress.
• Deterministic parsing: at any choice-point, we commit to one choice.
• So we usually need non-deterministic parsing: the ability to explore various choices at a choice-point.
• Note: a choice-point is “open” if there remain alternatives to explore; otherwise it’s “closed”.
Choice Points and Policies, contd
• In non-deterministic parsing, there are two broad policies one can adopt: depth-first or breadth-first.
• Depth-first:
• Intuition: you pursue any line of investigation as far as it will go; when you meet an obstacle, you step back as little as possible and then press forward again.
• More technically: when you need an alternative to explore, pick one at the newest (most recently found) open choice-point.
This implies that when you create a new choice-point, you immediately start exploring an alternative at it, not at any previously existing one.
It also implies that when you’ve fully explored the consequences of the alternatives at a choice-point C, the only open choice-points are on a path from the start to C, and that you go back to the latest such choice-point (“chronological backtracking”).
• Some detail: each new choice-point goes on top of a stack. You always do explorations from the top of the stack, removing the top item when it’s closed.
Choice Points and Policies, contd
• Breadth-first:
• Intuition: you explore paths conceptually in parallel, pushing them all ahead a step before extending any of them further. (A step = until the next choice-point is met.)
• More technically: when you need an alternative to explore, pick one at the oldest (least recently found) open choice-point.
• This implies that when you generate a new choice-point, you don’t explore an alternative at it unless there are no pre-existing ones that are open.
• Some detail: each new choice-point goes at the back of a queue. You always do explorations from the front of the queue, removing the front item when it’s closed.
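The stack/queue contrast is the whole difference in a sketch like the following (the choice-point tree here is an invented stand-in for real parsing choices):

    from collections import deque

    # Newest-first (stack) gives depth-first; oldest-first (queue) gives
    # breadth-first.  Children of a node stand for new choice-points.
    CHILDREN = {"start": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}

    def explore(depth_first):
        pending, visited = deque(["start"]), []
        while pending:
            node = pending.pop() if depth_first else pending.popleft()
            visited.append(node)
            pending.extend(CHILDREN.get(node, []))   # open new choice-points
        return visited

    print(explore(True))    # ['start', 'b', 'b1', 'a', 'a2', 'a1']: depth-first
    print(explore(False))   # ['start', 'a', 'b', 'a1', 'a2', 'b1']: breadth-first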
Three Quasi-Independent Dimensions of Strategy
• Top-down versus bottom-up (or some mix or compromise)
• Breadth-first versus depth-first (or some mix or compromise or some other strategy)
and, concerning order of progressing through the given lex-form sequence:
• Left-to-right or right-to-left or middle-out or ...
– The basically middle-out idea of finding constituents wherever you can and then extending out from them to cover more of the sentence is called “island parsing”, and is especially useful for getting robustness in the face of speaker errors or general “noise” in the channel.
– Similarly, “chunking” seeks to find particular sorts of constituent, e.g. NPs, wherever
they seem to be.
• We’ll stick to left-to-right (or right-to-left for a language that is written in that direction).
– One motivation: there is some evidence that people try to structure the sentence as
it comes in, not just when it’s completely in, so it’s reasonable to think that language
has evolved in such a way that left-to-right parsing is a reasonable thing to do.
• But that still leaves a wide space of possibilities subtended by the first two dimensions.
• The active chart parsing technique allows very free choice within that space.
Another Issue: Waste in Structure Reinvestigation
• Active chart parsing will also deal with another issue.
• Because of backtracking, depth-first parsers may revisit the same sequence of words many times, in different lines of investigation, and, each time, repeat some or all of the work previously done.
• Each time, a syntax subtree may be built: but if the line of investigation has to be abandoned (backtracked over), then that subtree is thrown away, only to be rebuilt next time round.
In the DCG case, the grammar-rule applications that don’t work out construct pieces of tree, and then those pieces are simply lost when backtracking happens.
Waste in Structure Reinvestigation, contd
• For instance, consider the following grammar and lexicon, which include an ability to deal with questions:
Grammar:
S → is NP PRED
NP → Det Noun
NP → Det Noun PP
PRED → Adj
PRED → PP
PP → Prep NP
Lexicon:
Adj → red
Det → the
Noun → block
Noun → box
Prep → in
• Consider the sentence “Is the block in the box red?”
• This is ultimately parsed by taking the NP in S → is NP PRED to be “the block in the box” via the rule NP → Det Noun PP, and the PRED to be the Adj “red”.
However, a first attempt could use NP → Det Noun to consume “the block” and then at some point try to consume “in the box red” by means of PRED → PP.
This will consume “in the box” as a PP, but then we get failure because we can’t consume the “red”. So we have to backtrack to use the other NP rule, NP → Det Noun PP.
This will consume “in the box” as a PP once again.
• Each of those consumptions of that subsequence creates the same PP subtree.
Well-Formed Substring Table (WFSST)
• We can store in a WFSST every constituent (such as an NP) found at any point: the subtree for it, the grammar rule used, and the lex-form subsequence it involves.
• Then, when for instance we want to see whether there’s a particular sort of constituent starting at a particular point in the sentence via a particular grammar rule, we can check whether there’s a WFSST entry involving that rule and a subsequence that matches the sentence portion starting at that point, instead of applying the rule again.
For interest: The WFSST idea is just a special case of the idea of a “memo”
function/procedure/method/predicate .... This idea is implemented in some programming
languages (including some versions of Prolog). A memo routine is one that records the result of
any call, so that if the routine is called again with the same parameter values, the result can
simply be retrieved rather than recomputed.
We get the WFSST idea if we regard each grammar rule as a routine whose output is a syntax
subtree and whose input is the relevant lex-form subsequence.
Exercise: consider the usefulness of memo functions in the natural, doubly-recursive
formulation of the Fibonacci function.
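For comparison, a sketch of the exercise in Python, using functools.lru_cache as the memo:

    from functools import lru_cache

    # Without the memo, fib(n) recomputes fib(k) for small k exponentially
    # often; with it, each value is computed once and then looked up -- the
    # same saving a WFSST gives when a parser would otherwise rebuild an
    # identical subtree over the same subsequence.
    @lru_cache(maxsize=None)
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(100))   # immediate with the memo; hopeless without it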
• But WFSSTs leave various problems unsolved, such as which rule to try next.
• Active chart parsers in essence involve WFSSTs but also address the need to be
flexible in the top-down/bottom-up and depth-first/breadth-first space.
Active Chart Parsing
• The chart in ACP is a data structure that stores
• complete (i.e., “well-formed”) constituents, as in a WFSST
• partial constituents
• It is active because it has an associated process that leads to actions such as trying to complete a partial constituent.
• The chart consists of vertices (points between the lex-forms) and edges joining them, as in:
1 the 2 dog 3 yawned 4 with 5 dignity 6
Inactive Edges in the Chart
• Inactive edges represent complete constituents (e.g., a whole NP) and consist of
• Start Vertex, End Vertex: e.g. 1, 3
• Label: e.g. NP
• Structure tree: e.g. np(det(the), noun(dog))
• Needed constituent list: always = []
• Such information would be attached to an edge drawn from vertex 1 to vertex 3 over the chart: 1 the 2 dog 3 yawned 4 with 5 dignity 6
Active Edges in the Chart
• Active edges represent incomplete constituents (e.g., part of an NP) and consist of
• Start Vertex, End Vertex: e.g. 1, 2 or 1, 3
• Label: e.g. NP
• Structure tree: e.g. np(det(the), ...) for the 1,2 edge, or np(det(the), noun(dog), ...) for the 1,3 edge
• Needed constituent list: e.g. [noun] or [noun, PP] for the 1,2 edge, or [PP] for the 1,3 edge
• Such information would be attached to edges drawn from vertex 1 to vertex 2 and from vertex 1 to vertex 3 (the 1,3 one being different from and coexisting with the inactive one on the previous slide) over the chart: 1 the 2 dog 3 yawned 4 with 5 dignity 6
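One plausible rendering of this edge record in Python (a sketch only; the class and field names are invented, and trees are nested tuples):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Edge:
        start: int          # start vertex
        end: int            # end vertex
        label: str          # e.g. "NP"
        tree: tuple         # (partial) structure tree
        needed: tuple = ()  # constituents still needed, in order

        def inactive(self):
            return not self.needed   # inactive iff nothing more is needed

    complete = Edge(1, 3, "NP", ("np", ("det", "the"), ("noun", "dog")))
    partial = Edge(1, 2, "NP", ("np", ("det", "the")), ("Noun", "PP"))
    print(complete.inactive(), partial.inactive())   # True False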
Aspects of ACP Process
• Initialization, i.e., creation of initial edges
– with a bottom-up sub-aspect (from the sentence) and a top-down sub-aspect
– NB: we’ll assume that grammar rules don’t contain lexical forms: such cases can always
be avoided by introducing a special lexical category containing one word
• Edge spawning from individual new edges added, via the grammar rules
– with bottom-up and top-down sub-aspects
• Edge combination by the Fundamental Rule of ACP: a “sideways” operation
• Agenda for management of edge combination and hence of parsing progress
• Termination, e.g., when the agenda is empty or there’s an inactive edge covering the whole sentence and labelled with the “distinguished symbol”, i.e., S.
• No Deletion of edges – they may be multiply used.
Initialization: Bottom-Up sub-aspect
• Create an inactive edge between each pair of adjacent vertices,
labelled with the lexical category (e.g. Noun) of the word there:
– or several edges if the word is in several categories (e.g. dog below)
(the following shows the edge info for just two of the edges)
1 the 2 dog 3 yawned 4 with 5 dignity 6
Edge from 1 to 2: Label: Det; Tree: det(the); Needed: []
Edge from 3 to 4: Label: Verb; Tree: verb(yawned); Needed: []
Initialization: Top-Down sub-aspect
• Create an active edge in the form of a loop at the start vertex, labelled with S, for each rule of form S → ...
Suppose these rules are:
S → NP VP
S → VP
Then two loops are added at vertex 1 of the chart (1 the 2 dog 3 yawned ...):
Loop edge: Start, End: 1,1; Label: S; Tree: s(); Needed: [NP,VP]
Loop edge: Start, End: 1,1; Label: S; Tree: s(); Needed: [VP]
Edge Spawning: Bottom-Up sub-aspect
• Whenever an inactive edge is added, look for grammar rules whose RHS starts with the edge’s label E.
For each such rule L → E ...rest..., add a new active loop at the edge’s start vertex, labelled L and with Needed = [E ...rest...]
Inactive edge just added (by bottom-up initialization or by edge-combination): Start, End: i,j; Label: E; Tree: e(.......); Needed: []
Active edge spawned (loop at the start vertex i): Start, End: i,i; Label: L; Tree: l(); Needed: [E ...rest...]
Edge Spawning: Top-Down sub-aspect
• Whenever an active edge is added, and the first item on its Needed list is a non-lexical syntactic category L, look for grammar rules whose LHS is L.
For each such rule L → ...RHS..., add a new active loop at the edge’s end vertex, labelled L and with Needed = [...RHS...]
Active edge just added (loop added by top-down initialization or by top-down spawning; or non-loop added by edge-combination): Start, End: i,j; Label: ...; Tree: .......; Needed: [L, ...]
Active edge spawned (loop at the end vertex j): Start, End: j,j; Label: L; Tree: l(); Needed: [...RHS...]
Edge Combination: The “Fundamental Rule”
• When an inactive edge can extend an active edge just before it, build a new edge. The new edge may be active or inactive, depending on the old active edge.
Over the chart 1 the 2 dog 3 yawned 4 with 5 dignity 6:
The active edge: Start, End: 1,2; Label: NP; Tree: np(det(the), ...); Needed: [noun, ...rest or nothing...]
The inactive edge: Start, End: 2,3; Label: Noun; Tree: noun(dog); Needed: []
Edge Combination, contd
• Then a new edge is added:
The original active edge: Start, End: 1,2; Label: NP; Tree: np(det(the), ...); Needed: [noun, ...rest...]
The original inactive edge: Start, End: 2,3; Label: Noun; Tree: noun(dog); Needed: []
The NEW inactive or active edge: Start, End: 1,3; Label: NP; Tree: np(det(the), noun(dog), ...); Needed: [...rest...]
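A sketch of the Fundamental Rule over such edge records (the same invented Edge type as in the earlier sketch; note that nothing here deletes the two old edges):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Edge:
        start: int
        end: int
        label: str
        tree: tuple
        needed: tuple = ()

    def combine(active, inactive):
        """Fundamental Rule: return the new edge, or None if the pair doesn't fit."""
        if inactive.needed or not active.needed:
            return None                        # need one active, one inactive edge
        if active.end != inactive.start or active.needed[0] != inactive.label:
            return None                        # must be adjacent, and wanted next
        return Edge(active.start, inactive.end, active.label,
                    active.tree + (inactive.tree,), active.needed[1:])

    a = Edge(1, 2, "NP", ("np", ("det", "the")), ("Noun",))
    i = Edge(2, 3, "Noun", ("noun", "dog"))
    print(combine(a, i))   # NP edge from 1 to 3 with empty Needed: now inactive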
ACP Agenda
• All possible consecutive pairs of active and inactive edges are placed on the
agenda. (Consecutive: active edge’s end = inactive edge’s start. Includes loop case.) So
new pairs usually go onto agenda when either an active edge or an inactive edge is created.
• The order in which the pairs are kept on the agenda, or the order in which items are selected from the agenda for processing, determines where we are on
– the depth-first / breadth-first / ... dimension
– the top-down / bottom-up dimension
NB: at some particular points in the space, not all edge-spawning mechanisms are needed.
• The edge-combination rule (Fundamental Rule) implicitly reflects a policy of
stepping left-to-right through rules, so there’s implicitly an orientation to
proceeding left-to-right through the sentence ... But still, priority could be given
to pairs on the agenda that are further to the right in the sentence in some sense.
• Termination: e.g. when the agenda is empty or there’s an inactive edge covering the whole sentence and labelled with some “distinguished symbol”, usually S.
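Putting the pieces together, here is a compact, purely bottom-up recogniser sketch in Python (invented toy grammar and lexicon; edges are tuples and trees are omitted to keep it short; a FIFO agenda is used, but swapping in a stack would move it along the depth-first/breadth-first dimension):

    from collections import deque

    GRAMMAR = [("S", ("NP", "VP")), ("NP", ("Det", "Noun")), ("VP", ("Verb",))]
    LEXICON = {"the": "Det", "dog": "Noun", "yawned": "Verb"}

    def recognises(words):
        chart, agenda = set(), deque()

        def add(edge):                       # no deletion: edges only accumulate
            if edge not in chart:
                chart.add(edge)
                agenda.append(edge)

        for i, w in enumerate(words):        # bottom-up initialization
            add((i, i + 1, LEXICON[w], ()))

        while agenda:                        # process until the agenda is empty
            start, end, label, needed = agenda.popleft()
            if not needed:                   # inactive edge:
                for head, body in GRAMMAR:   # bottom-up spawning of active loops
                    if body[0] == label:
                        add((start, start, head, body))
                for s, e, l, n in list(chart):          # Fundamental Rule
                    if n and e == start and n[0] == label:
                        add((s, end, l, n[1:]))
            else:                            # active edge: Fundamental Rule again
                for s, e, l, n in list(chart):
                    if not n and s == end and needed[0] == l:
                        add((start, e, label, needed[1:]))

        return (0, len(words), "S", ()) in chart   # inactive S over the whole input

    print(recognises(["the", "dog", "yawned"]))    # True

Termination here is simply “agenda empty”; the final membership test implements the distinguished-symbol check described above.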
Final Remarks on ACP
• Advantages:
– Appears to save space (compared to a straightforward breadth-first parser) because a given collection of edges forming a syntax subtree can be part of many alternative parses at the same time.
– Appears to save time (compared to a straightforward top-down depth-first parser) because it avoids repeatedly re-creating subtrees.
• Note:
– To handle alternative grammatical-feature values for words and bigger constituents we may need multiple edges for the same word or constituent
– though we can get by in some cases with unspecified values or with value-ranges
– but then we need to spawn copies of structures if such value info becomes more constrained, e.g. reduced to a specific value.
– This isn’t a specific disadvantage of ACP: analogous measures are needed in other parsers to keep track of all the possibilities.
• Toy ACP program and explanatory notes – linked from module site.