Syntax 2: Parsing Strategy and Active Chart Parsing
John Barnden
School of Computer Science, University of Birmingham
Natural Language Processing, 2010/11 Semester 2

Towards More Variety in How Parsing Proceeds

Top-Down and Bottom-Up Parsing Strategies
• Like Prolog in general, DCGs work by goal-driven search: trying to satisfy the left-hand side (goal) parts of rules, by going through the components of the right-hand side as sub-goals.
• S → NP VP: if we're trying to find the word-list (or an initial sub-list) to be a sentence (S), we use this rule to try to find an initial sub-list to be an NP and then try to find the rest to be (or start with) a VP. If it doesn't work out we try other rules for S. To find an initial sub-list to be an NP, the system looks for rules with LHS = NP.
• This causes the parsing to use a top-down strategy: at a given point we're guided by a hypothesis that we might have some higher-level structure (e.g., S), and to prove that hypothesis we look for lower-level structures (e.g., NP, VP). We "bottom out" at lexical forms.

Bottom-Up Parsing Strategy
• A bottom-up (or data-driven) parsing strategy starts with the lexical forms and tries to combine them into syntactic categories.
• It can help to think of the rules written in reverse (with body on left and head on right):
    NP VP → S
    Verb PP → VP
    Det Noun → NP
    Det Adj Noun → NP
• So, a Det followed by a Noun can be combined into an NP.
• When some of the body of a rule has been satisfied, we're ready for the next component to arise. As components of rule bodies are satisfied, we march through the appropriate rules. We may be marching through more than one rule at once (conceptually), and indeed at different rates through different instantiations of a given rule. When we've swallowed the Det, we're part-way through the bodies of the last two rules above. When an NP has been found, we're part-way through the top rule above, waiting for a VP to arise.
• A rule whose body we're marching through may not work out, so we have to switch attention to another.

Choice Points and Policies Towards Them
• A choice-point is where there is, e.g.:
  In top-down parsing:
  – more than one grammar rule to try for an expected syntactic category;
  – more than one lexical choice to try for an expected lexical category (NB: even for a fixed lexical form, there may be more than one possible set of grammatical-category values).
  In bottom-up parsing:
  – more than one lexical category or set of GC values to try for the current lexical form;
  – more than one rule-body "march" to progress.
• Deterministic parsing: at any choice-point, we commit to one choice.
• So we usually need non-deterministic parsing: the ability to explore various choices at a choice-point.
• Note: a choice-point is "open" if there remain alternatives to explore, otherwise it is "closed".

Choice Points and Policies, contd
• In non-deterministic parsing, there are two broad policies one can adopt: depth-first or breadth-first.
• Depth-first:
  – Intuition: you pursue any line of investigation as far as it will go; when you meet an obstacle, you step back as little as possible and then press forward again.
  – More technically: when you need an alternative to explore, pick one at the newest (most recently found) open choice-point. This implies that when you create a new choice-point, you immediately start exploring an alternative at it, not any previously existing one.
  – It also implies that when you've fully explored the consequences of the alternatives at a choice-point C, the only open choice-points are on a path from the start to C, and that you go back to the latest such choice-point. ("Chronological backtracking")
  – Some detail: each new choice-point goes on top of a stack. You always do explorations from the top of the stack, removing the top item when it's closed. (See the sketch below.)
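To make the stack discipline concrete, here is a minimal sketch of depth-first exploration (our own illustration, not from the slides; `expand` and `is_goal` are hypothetical problem-specific hooks):

```python
# Depth-first exploration with an explicit stack of open choice-points.
# Each stack entry is the list of alternatives still untried at that
# choice-point; an empty list means the choice-point is closed.
def depth_first(start, expand, is_goal):
    stack = [[start]]                      # the initial "choice" of where to begin
    while stack:
        alternatives = stack[-1]           # newest open choice-point (top of stack)
        if not alternatives:
            stack.pop()                    # closed: chronological backtracking
            continue
        state = alternatives.pop()         # commit to one alternative
        if is_goal(state):
            return state
        stack.append(list(expand(state)))  # new choice-point goes on top

# Toy usage: grow binary strings (up to length 4) until one contains "101".
found = depth_first("", lambda s: [s + "0", s + "1"] if len(s) < 4 else [],
                    lambda s: "101" in s)
```

Replacing the stack with a queue (take from the front, add at the back) turns the same loop into the breadth-first policy described next.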
Choice Points and Policies, contd
• Breadth-first:
  – Intuition: you explore paths conceptually in parallel, pushing them all ahead a step before extending any of them yet further. (A step = till the next choice-point is met.)
  – More technically: when you need an alternative to explore, pick one at the oldest (least recently found) open choice-point.
  – This implies that when you generate a new choice-point, you don't explore an alternative at it unless there are no pre-existing ones that are open.
  – Some detail: each new choice-point goes at the back of a queue. You always do explorations from the front of the queue, removing the front item when it's closed.

Three Quasi-Independent Dimensions of Strategy
• Top-down versus Bottom-up (or some mix or compromise)
• Breadth-first versus Depth-first (or some mix or compromise or some other strategy)
and, concerning the order of progressing through the given lex-form sequence:
• Left-to-right or Right-to-left or Middle-out or ...
  – The basically middle-out idea of finding constituents wherever you can and then extending out from them to cover more of the sentence is called "island parsing" and is especially useful for getting robustness in the face of speaker errors or general "noise" in the channel.
  – Similarly, "chunking" seeks to find particular sorts of constituent, e.g. NPs, wherever they seem to be.
• We'll stick to left-to-right (or right-to-left for a language that is written in that direction).
  – One motivation: there is some evidence that people try to structure the sentence as it comes in, not just when it's completely in, so it's reasonable to think that language has evolved in such a way that left-to-right parsing is a reasonable thing to do.
• But that still leaves a wide space of possibilities subtended by the first two dimensions.
• The active chart parsing technique allows very free choice within that space.

Another Issue: Waste in Structure Reinvestigation
• Active chart parsing will also deal with another issue.
• Because of backtracking, depth-first parsers may revisit the same sequence of words many times, in different lines of investigation, and, each time, repeat some or all of the work previously done.
• Each time, a syntax subtree may be built: but if the line of investigation has to be abandoned (backtracked over), then that subtree is thrown away, only to be rebuilt next time round. In the DCG case, the grammar-rule applications that don't work out construct pieces of tree, and then those pieces are simply lost when backtracking happens.

Waste in Structure Reinvestigation, contd
• For instance, consider the following grammar and lexicon, which include an ability to deal with questions:
    Grammar:                    Lexicon:
    S → is NP PRED              Adj: red
    PRED → Adj                  Det: the
    PRED → PP                   Noun: block
    NP → Det Noun               Noun: box
    NP → Det Noun PP            Prep: in
    PP → Prep NP
• Consider the sentence "Is the block in the box red?"
• This is ultimately parsed by taking the NP in S → is NP PRED to be "the block in the box" via the rule NP → Det Noun PP, and the PRED to be the Adj "red".
• However, a first attempt could use NP → Det Noun to consume "the block" and then at some point try to consume "in the box red" by means of PRED → PP. This will consume "in the box" as a PP but then we get failure because we can't consume the "red". So we have to backtrack to use the other NP rule, NP → Det Noun PP. This will consume "in the box" as a PP once again.
• Each of those consumptions of that subsequence creates the same PP subtree. (The sketch below writes this grammar down as data, for use in later sketches.)
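This toy grammar and lexicon can be written down as plain data. A sketch (the Python encoding is our own; anticipating a later slide's assumption that rules don't contain lexical forms, "is" is moved into the lexicon under a hypothetical special category Aux):

```python
# The question grammar as (head, body) pairs, and the lexicon as a
# word -> categories map. "is" gets the special category Aux so that
# no rule body contains a lexical form.
GRAMMAR = [
    ("S",    ["Aux", "NP", "PRED"]),
    ("PRED", ["Adj"]),
    ("PRED", ["PP"]),
    ("NP",   ["Det", "Noun"]),
    ("NP",   ["Det", "Noun", "PP"]),
    ("PP",   ["Prep", "NP"]),
]
LEXICON = {
    "is": ["Aux"], "red": ["Adj"], "the": ["Det"],
    "block": ["Noun"], "box": ["Noun"], "in": ["Prep"],
}
```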
Well-Formed Substring Table (WFSST)
• Can store in a WFSST every constituent (such as an NP) found at any point: the subtree for it, the grammar rule used, and the lex-form subsequence it involves.
• Then, when for instance we want to see whether there's a particular sort of constituent starting at a particular point in the sentence via a particular grammar rule, we can see whether there's a WFSST entry involving that rule and a subsequence that equals a sentence portion starting at that point, instead of using the rule.
• For interest: the WFSST idea is just a special case of the idea of a "memo" function/procedure/method/predicate. This idea is implemented in some programming languages (including some versions of Prolog). A memo routine is one that records the result of any call, so that if the routine is called again with the same parameter values, the result can simply be retrieved rather than recomputed. We get the WFSST idea if we regard each grammar rule as a routine whose output is a syntax subtree and whose input is the relevant lex-form subsequence. Exercise: consider the usefulness of memo functions in the natural, doubly-recursive formulation of the Fibonacci function.
• But WFSSTs leave various problems unsolved, such as which rule to try next.
• Active chart parsers in essence involve WFSSTs but also address the need to be flexible in the top-down/bottom-up and depth-first/breadth-first space.

Active Chart Parsing
• The chart in ACP is a data structure that stores
  – complete (i.e., "well-formed") constituents, as in a WFSST
  – partial constituents.
• It is active because it has an associated process that leads to actions such as trying to complete a partial constituent.
• The chart consists of vertices (points between the lex-forms) and edges joining them, as in:
    1 the 2 dog 3 yawned 4 with 5 dignity 6

Inactive Edges in the Chart
• Inactive edges represent complete constituents (e.g., a whole NP) and consist of
  – Start Vertex, End Vertex: e.g. 1, 3
  – Label: e.g. NP
  – Structure tree: e.g. np(det(the), noun(dog))
  – Needed constituent list: always = []
• Such information would be attached to an edge from vertex 1 to vertex 3, spanning "the dog" in the chart above.

Active Edges in the Chart
• Active edges represent incomplete constituents (e.g., part of an NP) and consist of
  – Start Vertex, End Vertex: e.g. 1, 2 (or 1, 3)
  – Label: e.g. NP
  – Structure tree: e.g. np(det(the), ...) or np(det(the), noun(dog), ...)
  – Needed constituent list: e.g. [noun] or [noun, PP] for the first tree, or [PP] for the second.
• Such information would be attached to edges spanning "the" and "the dog" respectively (the 1,3 one being different from, and coexisting with, the inactive edge on the previous slide). (The sketch below turns these edge records into a data structure.)
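These edge records translate directly into a small data structure. A minimal sketch (field names follow the slides; the nested-tuple encoding of trees is our own choice):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    start: int          # start vertex
    end: int            # end vertex
    label: str          # e.g. "NP"
    tree: tuple         # structure so far, e.g. ("np", ("det", "the"))
    needed: tuple = ()  # constituents still needed; () means the edge is inactive

    @property
    def inactive(self) -> bool:
        return not self.needed

# The inactive and active edges from the slides, over "the dog":
complete_np = Edge(1, 3, "NP", ("np", ("det", "the"), ("noun", "dog")))
partial_np  = Edge(1, 2, "NP", ("np", ("det", "the")), needed=("Noun", "PP"))
```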
Aspects of ACP Process
• Initialization, i.e. creation of initial edges
  – with a bottom-up sub-aspect (from the sentence) and a top-down sub-aspect
  – NB: we'll assume that grammar rules don't contain lexical forms: such cases can always be avoided by introducing a special lexical category containing just that one word.
• Edge spawning from individual new edges added, via the grammar rules
  – with bottom-up and top-down sub-aspects.
• Edge combination by the Fundamental Rule of ACP: a "sideways" operation.
• Agenda for management of edge combination and hence of parsing progress.
• Termination, e.g. when the agenda is empty or there's an inactive edge covering the whole sentence and labelled with the "distinguished symbol", i.e. S.
• No deletion of edges – they may be multiply used.

Initialization: Bottom-Up sub-aspect
• Create an inactive edge between each pair of adjacent vertices, labelled with the lexical category (e.g. Noun) of the word there:
  – or several edges if the word is in several categories (e.g. "dog" below).
• For "1 the 2 dog 3 yawned 4 with 5 dignity 6", two of these edges carry the information:
    Start, End: 1,2   Label: Det    Tree: det(the)       Needed: []
    Start, End: 3,4   Label: Verb   Tree: verb(yawned)   Needed: []

Initialization: Top-Down sub-aspect
• Create an active edge in the form of a loop at the start vertex, labelled with S, for each rule of form S → ... Suppose these rules are S → NP VP and S → VP. Then the loops at vertex 1 are:
    Start, End: 1,1   Label: S   Tree: s()   Needed: [NP, VP]
    Start, End: 1,1   Label: S   Tree: s()   Needed: [VP]

Edge Spawning: Bottom-Up sub-aspect
• Whenever an inactive edge is added (by bottom-up initialization or by edge combination), look for grammar rules whose RHS starts with the edge's label E. For each such rule L → E ...rest..., add a new active loop at the edge's start vertex, labelled L and with Needed = [E, ...rest...]:
    Inactive edge just added:   Start, End: i,j   Label: E   Tree: e(.......)   Needed: []
    Active edge spawned:        Start, End: i,i   Label: L   Tree: l()          Needed: [E, ...rest...]

Edge Spawning: Top-Down sub-aspect
• Whenever an active edge is added (a loop added by top-down initialization or by top-down spawning, or a non-loop added by edge combination), and the first item on its Needed list is a non-lexical syntactic category L, look for grammar rules whose LHS is L. For each such rule L → ...RHS..., add a new active loop at the edge's end vertex, labelled L and with Needed = [...RHS...]:
    Active edge just added:   Start, End: i,j   Label: ...   Tree: .......   Needed: [L, ...]
    Active edge spawned:      Start, End: j,j   Label: L     Tree: l()       Needed: [...RHS...]

Edge Combination: The "Fundamental Rule"
• When an inactive edge can extend an active edge just before it, build a new edge. The new edge may be active or inactive, depending on the old active edge.
• For example, over "1 the 2 dog 3 yawned 4 ...":
    The active edge:     Start, End: 1,2   Label: NP     Tree: np(det(the), ...)   Needed: [noun, ...rest or nothing...]
    The inactive edge:   Start, End: 2,3   Label: Noun   Tree: noun(dog)           Needed: []

Edge Combination, contd
• Then a new edge is added, which is inactive or active depending on whether anything remains in Needed:
    The NEW edge:   Start, End: 1,3   Label: NP   Tree: np(det(the), noun(dog), ...)   Needed: [...rest...]
• The original active edge and the original inactive edge both remain in the chart. (See the sketch below for this combination step in code.)
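In code, the Fundamental Rule is a single guarded combination step. A sketch (our own, reusing the hypothetical Edge class and the partial_np example edge from the earlier sketch):

```python
def fundamental_rule(active, inactive):
    """Extend an active edge with a consecutive inactive edge that
    supplies the next constituent on the active edge's Needed list."""
    if active.inactive or not inactive.inactive:
        return None                              # need one edge of each kind
    if active.end != inactive.start:
        return None                              # must be consecutive
    if active.needed[0] != inactive.label:
        return None                              # must supply the next need
    return Edge(active.start, inactive.end, active.label,
                active.tree + (inactive.tree,),  # graft the subtree on
                active.needed[1:])               # one fewer needed item

# The slides' example: the partial NP over "the" plus a Noun edge over "dog".
noun = Edge(2, 3, "Noun", ("noun", "dog"))
new  = fundamental_rule(partial_np, noun)
# -> Edge(1, 3, "NP", ("np", ("det", "the"), ("noun", "dog")), needed=("PP",))
```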
ACP Agenda
• All possible consecutive pairs of active and inactive edges are placed on the agenda. (Consecutive: the active edge's end = the inactive edge's start. This includes the loop case.) So new pairs usually go onto the agenda when either an active edge or an inactive edge is created.
• The order in which the pairs are kept on the agenda, or the order in which items are selected from the agenda for processing, defines where we are on
  – the depth-first/breadth-first/... dimension
  – the top-down/bottom-up dimension.
  NB: at some particular points in the space, not all edge-spawning mechanisms are needed.
• The edge-combination rule (Fundamental Rule) implicitly reflects a policy of stepping left-to-right through rules, so there's implicitly an orientation towards proceeding left-to-right through the sentence ... But still, priority could be given to pairs on the agenda that are further to the right in the sentence in some sense.
• Termination: e.g. when the agenda is empty or there's an inactive edge covering the whole sentence and labelled with some "distinguished symbol", usually S.

Final Remarks on ACP
• Advantages:
  – Appears to save space (compared to a straightforward breadth-first parser) because a given collection of edges forming a syntax subtree can be part of many alternative parses at the same time.
  – Appears to save time (compared to a straightforward top-down depth-first parser) because of the avoidance of the activity of multiply re-creating subtrees.
• Note:
  – To handle alternative grammatical-feature values for words and bigger constituents we may need multiple edges for the same word or constituent, though we can get by in some cases with unspecified values or with value-ranges; but then we need to spawn copies of structures if such value info becomes more constrained, e.g. reduced to a specific value.
  – This isn't a specific disadvantage of ACP: analogous measures are needed in other parsers to keep track of all the possibilities.
• Toy ACP program and explanatory notes: linked from module site. (A separate, much smaller illustrative sketch follows below.)
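To tie the sketches above together, here is one possible assembly into a tiny chart parser (our own illustration, quite distinct from the toy program linked from the module site). For brevity it is purely bottom-up, omits top-down initialization and spawning, and puts single edges rather than edge pairs on the agenda; popping from the end of the agenda list gives it a depth-first flavour. It reuses the hypothetical Edge class and fundamental_rule from the earlier sketches:

```python
def parse(words, grammar, lexicon, top="S"):
    """Minimal bottom-up active chart parser built on the sketches above."""
    chart, agenda = set(), []

    def add(edge):
        if edge not in chart:        # edges are never deleted or duplicated
            chart.add(edge)
            agenda.append(edge)

    # Bottom-up initialization: an inactive edge per lexical reading.
    for i, w in enumerate(words, start=1):
        for cat in lexicon[w]:
            add(Edge(i, i + 1, cat, (cat.lower(), w)))

    while agenda:                    # a stack, hence depth-first in flavour
        edge = agenda.pop()
        if edge.inactive:
            # Bottom-up spawning: a fresh active loop at the edge's start
            # vertex for each rule whose body starts with this edge's label.
            for head, body in grammar:
                if body[0] == edge.label:
                    add(Edge(edge.start, edge.start, head,
                             (head.lower(),), tuple(body)))
            candidates = [(a, edge) for a in chart if not a.inactive]
        else:
            candidates = [(edge, e) for e in chart if e.inactive]
        for active, inactive in candidates:       # the Fundamental Rule
            new = fundamental_rule(active, inactive)
            if new:
                add(new)

    return [e.tree for e in chart if e.inactive and e.label == top
            and e.start == 1 and e.end == len(words) + 1]

# Toy usage with a three-rule grammar:
rules = [("S", ["NP", "VP"]), ("NP", ["Det", "Noun"]), ("VP", ["Verb"])]
lex = {"the": ["Det"], "dog": ["Noun"], "yawned": ["Verb"]}
print(parse("the dog yawned".split(), rules, lex))
# [('s', ('np', ('det', 'the'), ('noun', 'dog')), ('vp', ('verb', 'yawned')))]
```

Changing how `add` queues edges and how the agenda is popped moves the parser around the depth-first/breadth-first space, exactly the flexibility the slides attribute to ACP.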