chaptr06

advertisement
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
%
Extracts from the book "Natural Language Processing in LISP"
%
published by Addison Wesley
%
Copyright (c) 1989, Gerald Gazdar & Christopher Mellish.
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
%
%
%
%
%
The fundamental rule of chart parsing
We can easily implement a chart parser in LISP by using a global
variable chart to store the active and inactive edges of the chart
(the complete code is in lib chart). Each edge we will represent by a
list of the following form:
(<start> <finish> <label> <found> <tofind>)
where:
<start>
starts
<finish>
<label>
with
<found>
found
<tofind>
(an integer) is the position in the chart where the edge
(an integer) is the position where the edge ends
(a category) is the type of phrase that the edge is involved
(a list) is the list of constituents that have already been
(a list) is the list of constituents that remain to be found
For a chart recognizer, the found and tofind lists need only be lists
of categories. Thus the following edge represents an edge that is
trying to find an NP starting at position 0 (the beginning of the
string). So far, a Det has already been found between the start and
finish points. However, before a complete NP can be found, an Adj and
an N must be found (starting at the current <finish> point):
start finish label
found
tofind
( 0
((Det))
((Adj) (N)) )
3
(NP)
We will make use of the following functions to access the components
of an edge:
(defun start (edge)
(nth 0 edge))
(defun finish (edge)
(nth 1 edge))
(defun label (edge)
(nth 2 edge))
(defun found (edge)
(nth 3 edge))
(defun tofind (edge)
(nth 4 edge))
For a chart parser, we will require that the edges remember more about
what they have already found. So we will require the <found> list to
be a list of parse trees. We will come back to this case shortly, and
will initially consider the recognizer case. In this case, a first
attempt to formulate the fundamental rule of chart parsing looks like
the following:
(dolist (active_edge chart)
(if (not (null (tofind active_edge)))
(dolist (inactive_edge chart)
(if (and
(null (tofind inactive_edge))
(equal (start inactive_edge) (finish active_edge))
(equal (label inactive_edge) (car (tofind active_edge))))
(setq chart
(cons
(list
(start active_edge)
(finish inactive_edge)
(label active_edge)
(append (found active_edge)
(list (label inactive_edge)))
(cdr (tofind active_edge)))))))))
That is, we need to look for pairs of entries in the chart, such that
the first (the active edge) ends where the second (the inactive edge)
starts, the second has no tofind categories and the label of the
second is the same as the first <tofind> category of the first.
This piece of code for applying the fundamental rule in all possible
ways, although suggestive, is unfortunately not quite what we need.
First of all, the additions to the chart are taking place during two
nested dolist iterations. In each of these loops the LISP system is
keeping track of how far through the chart it has searched so far.
When we add new items the chart changes and LISP will almost certainly
get confused, so that we will either check pairs of edges several
times or fail to check some combinations. A second problem is that we
have no control over which new edges are added when - it just depends
on the order in which dolist happens to find things. Finally, we can
make the checking of the fundamental rule more efficient by doing it
only when we add a new edge. So instead of having to check for all
combinations of edges in the database, we simply check, whenever an
edge is added to the chart, whether there is something that combines
with that edge. Here now is the core of our chart parser program, the
function which adds a new edge to the chart, checking for applications
of the fundamental rule:
(defun add_edge (edge)
(setq chart (cons edge chart))
(if (null (tofind edge))
; added edge is inactive
(progn
(dolist (chartedge chart)
(if (not (null (tofind chartedge)))
; look for active edge
(check_and_combine chartedge edge)))
(inactive_edge_function edge))
(progn ; otherwise added edge is active
(dolist (chartedge chart)
(if (null (tofind chartedge))
; look for inactive edge
(check_and_combine edge chartedge)))
(active_edge_function edge))))
The actual testing for the application of the fundamental rule is now
done by the function check_and_combine. check_and_combine tries to
combine an active and inactive edge using the fundamental rule. When
it wishes to add a new edge to the chart, instead of changing the
chart directly it calls agenda_add to put the new edge on the genda of
additions to be made:
(defun check_and_combine (active_edge inactive_edge)
(if
(and
(equal (start inactive_edge) (finish active_edge))
(equal (label inactive_edge) (car (tofind active_edge))))
(agenda_add
(list
; new edge
(start active_edge)
(finish inactive_edge)
(label active_edge)
(append (found active_edge)
(list
(tree (label inactive_edge) (found inactive_edge))))
(cdr (tofind active_edge))))))
As well as calling check_and_combine, add_edge also calls functions
inactive_edge_function and active_edge_function according to whether
the edge added is inactive or active. We will appropriately define
these functions later to obtain bottom-up or top-down parsing as
required. Finally, we have now produced the core of a parser by
having the <found> elements of edges be lists of parse trees. The
function tree constructs a parse tree for a phrase, given the category
of the phrase and the list of the parse trees of the constituents.
Initialization
We have now covered most of the content of the top-level function
chart_parse for the parser. chart_parse expects a goal (a category)
and a string (list of words). It returns all the possible parse trees
in a list. Before going into its main loop, chart_parse must perform
initialization and for this it calls the function initialize_chart.
When the chart is run using a bottom-up strategy, this simply produces
inactive edges for all the words in the string:
(defun initialize_chart (goal string)
(do* (
(vertex 0 (+ vertex 1))
(remaining string (cdr remaining))
(word (car string) (car remaining)))
((null word))
(agenda_add
(list
;; (start finish label found tofind)
vertex (+ 1 vertex) word nil nil))))
Note that for simplicity we have entered the words rather than their
categories. Given that we are representing lexical entries as normal
grammar rules, as we will see that normal bottom-up parsing will fill
in the categories for us. When all applications of the edge-adding
rules have been made in the main loop, chart_parse then looks for
complete parses by looking for edges of the form:
(0 L GOAL TREES ())
where L is the number of words in the string and GOAL is the name of
the category that we are interested in (e.g. (S)). In any edge of
this form, TREES will be the sequence of parse trees for the immediate
constituents of the goal category (e.g. ((NP Dr Chan) (VP (V
employed)(NP nurses)))). These can easily be combined with the GOAL
category (e.g. (S)) to produce a parse tree of the string.
(defun chart_parse (goal string)
(setq agenda nil)
(setq chart nil)
(initialize_chart goal string)
...main loop...
(let ((parses ()))
(dolist (edge chart parses)
(if (and
(equal (start edge) 0)
(equal (finish edge) (length string)) ; end of string
(equal (label edge) goal)
; recognizes goal
(null (tofind edge)))
; edge complete
(setq parses
(cons
(tree goal (found edge)) ; parse tree
parses))))))
Rule invocation
Top-down and bottom-up styles of parsing are implemented in our
program by the functions active_edge_function and
inactive_edge_function, which are called whenever an active, or
inactive, edge is added to the chart. Any parsing strategy will need
to refer to the rules of the grammar, and we will assume that these
are available in the global variable rules, as in Chapter 4. We will
not, for the present, deal with general feature specifications, but
will confine ourselves to monadic categories like (S). Here are the
functions for bottom-up parsing:
(defun inactive_edge_function (edge)
(dolist (rule rules)
(if (equal (label edge) (cadr rule)) ; the first daughter in the
rhs
(agenda_add
(list
(start edge) (start edge) (car rule) nil (cdr rule))))))
(defun active_edge_function (edge) t)
Search strategy
The main loop of the parser is entirely concerned with manipulating
the agenda:
(do
((edge (car agenda) (car agenda)))
((null agenda))
(setq agenda (cdr agenda))
(add_edge edge))
agenda, a global variable, holds the list of edges waiting to be added
to the chart. This list holds the edges in priority order, because
the loop deals with the edges in the order they appear in the list.
In general, of course, add_edge will cause new edges to be added to
the agenda, so that agenda may grow between iterations.
So far, however, our parser remains fairly neutral about the search
strategy to be adopted. However, by providing appropriate definitions
for the remaining functions we can force it to work in a number of
different ways. As we have seen, the search strategy is determined by
how new edges are to be added to the agenda. The following definition
of agenda_add causes new edges to be added at the beginning (i.e. they
are given highest priority) and this leads to a kind of depth-first
search.
(defun agenda_add (edge)
(setq agenda (cons edge agenda)))
Housekeeping
We can readily deal with the necessary housekeeping by minor additions
to the code already shown. The following will ensure that agenda_add
checks for duplicate edges:
(defun agenda_add (edge)
(if (or
(already_in edge agenda)
; left recursion check
(already_in edge chart))
nil
; do not add to agenda
(setq agenda (cons edge agenda)))) ; add to front of agenda
(defun already_in (edge edgelist)
(member edge edgelist :test #'equal))
For the already_in check we use member, but using equal (rather than
eql) as the relevant equality test, because eql will not succeed if it
is provided with two different lists with the same elements.
The function add_edge already calls tree so as to put trees rather
than categories in the found list. This function to build a tree from
a category and a list of subtrees is as follows:
(defun tree (cat subtrees)
(if (consp cat)
(cons (car cat) subtrees)
cat))
Alternative rule invocation strategies
The functions initialize_chart, active_edge_function and
inactive_edge_function are easily redefined for top-down parsing.
Top-down parsing has to be started by the addition of initial active
edges for all rules which can expand the goal category. The function
add_rules_to_expand adds active edges at a given vertex corresponding
to all possible rules with a given category on the LHS. Note that in
bottom-up parsing, no special actions are performed when active edges
are added. For top-down parsing, this situation is reversed:
(defun inactive_edge_function (edge) t)
(defun active_edge_function (edge)
(add_rules_to_expand (car (tofind edge)) (finish edge)))
(defun add_rules_to_expand (goal vertex)
(dolist (rule rules)
(if (equal goal (car rule)) ; the lhs of the rule
(agenda_add
(list vertex vertex (car rule) nil (cdr rule))))))
(defun initialize_chart (goal string)
(do* (
(vertex 0 (+ vertex 1))
(remaining string (cdr remaining))
(word (car string) (car remaining)))
((null word))
(agenda_add
(list
;; (start finish
label found tofind)
vertex (+ 1 vertex) word nil
nil
)))
(add_rules_to_expand goal 0))
Download