PARSER FOR PORTABLE NL INTERFACES USING

From: AAAI-86 Proceedings. Copyright ©1986, AAAI (www.aaai.org). All rights reserved.
A PARSER
FOR PORTABLE
NL INTERFACES
USING GRAPH-UNIFICATION-BASED
GRAMMARS
Kent Wittenburg
and Computer
Microelectronics
may
Abstract
This
paper
presents
of a parser
the reasoning
for the Lingo
the selection
and
that
uses
by
a best-first
flexibility
language
interfaces
on natural
in the selection
a syntactically
allowing
grammars
exhaustive
control
for these
for
best-first
in particular
enumeration
development.
at
of the parsing
algorithm
based grammar,
using
a number
a
other
on an agenda.
processing
of
advantages
representation
of
managed
language
tuning
It
applications
parsing
of
this
languages
advantages
that
advantages
for
particular
choice
for
are outlined,
acrue
to
graph-
as well
this
approach
based
greater
enhance
this
short
greater
design
relatively
to new applications,
sophistication
required
but
also,
for interfaces
of the future.
choice,
the
next
step
for grammar
representation
and
itself.
The representation
language
unification-based
formalism,
(1984) and Shieber
(1984).
languages
have
had
a
tremendous
computational
and
linguistics,
for
constrained
domains,
offer a modularity
in design
that
in the long run.
Not only does the
applications
first
run
highly
transportability
the
to knowledge-based
Given
the
in
grammars
payoffs
possible
formalism
grammar
in
systems
modularity
it makes
domains
while at the same time allowing
of the search
space
during
grammar
Efficiency
unification-based
structure
natural
offer
syntactically
will achieve
design
graph-unification-based
representation
language,
using
Combinatory
Categorial
Grammars,
and adopting
a one-to-many
mapping
from syntactic
bracketings
to semantic
representations
in
certain
cases.
The algorithm
chosen is a variant
of chart parsing
offers
Corporation
unsophisticated
behind
project
MCC.
The major factors
were the choices of having
Technology
was
choose
to
a
an approach
to the
chosen was a graph-
in particular,
one based on Karttunen
Graph-unification-based
representation
impact
in fact
have
in
been
the
field
of
incorporated
in
one form or another
into a number
of influential
linguistic
theories
(see Shieber
1986 for discussion).
Among
the many advantages
of
as
a
by
requiring
graph-unification-based
formalism
no special
training
for
are
(i)
grammar
it
is
writers
easy
with
to
use,
linguistics
virtue
of its use of an agenda
as a control
structure.
We also
mention
two
useful
refinements
to the basic
best-first
chart
parsing
algorithm
that
have
been
implemented
in the Lingo
backgrounds;
(ii) it is a language
separable
from any particular
machine-dependent
implementation
and
thus
amenable
to
optimizations
at many levels; (iii) it avoids the typical
explosion
of
project.
ad
hoc
procedural
declarative
variety
1. Introduction
In designing
the
first
used
a portable
crucial
in the
Existing
there
NL
those
to
(NL)
can
(Martin,
a semantically
interface,
require
it be syntactically
of the
one
based,
along
general
Appelt,
and
Pereira
based
grammar
lines:
grammar
1983),
particular
of
versus
to
grammatical
other
things
to generate
Our
based.
these
the
choice
Categorial
Though
this
for
an
untested
approach
to
(iii)
order
partially
grammar
domain.
offer the
must
The
advantage
for
new
each
and
be rewritten
syntactically
domain
and
Robustness
in
of parsing
systems
which,
along
to separately;
though
initial
design
entwined
with
One
of the
language
grammar.
the
able
to avoid
achieving
first
rehacking
a greater
syntactic
in the
the
level
variations
is a component
grammar
of generality
of
the
decisions
was
in the
to go with
It was felt that,
although
come
MCC
for
Lingo
a general,
grammar,
and
the
being
flexible,
(v)
it is order
same
a purely
accommodating
free,
grammar
a
which
rules
can
be
input.
and
extendable
dealing
base,
was
was
1982,
applications
particularly
Combinatory
Steedman
1985).
to date,
we felt
promising
to empty
it
lexical
relative
in
the
it handles
English
extraction
phenomena
clauses,
etc.)
efficiently
and elegantly
offers
rewrite
rules
a method
free
word
(iv)
it
ambiguity
it is particularly
suitable
designs;
and (vi) it is able
rule
grammar
or complicated
(ii) it accounts
for more of English
approaches
without
special
rules
formalism;
with
the
Steedman
language
grammar
resorting
passing
schemes;
than
alternative
to
and
of
accounting
order
with
suggests
and
heuristic
for left-to-right,
to accomplish
feature
coordination
or ad hoc
for
free
a simple
and
word
easily
natural
techniques
for
rule
preferencing;
(v)
incremental
all this with
processing
a very small
to the alternatives.
syntactically-based
with semantics
generally,
must be attended
robustness
is obviously
not precluded
by this
choice,
it does not
the grammar
itself.
interfaces
the
from scratch,
in general,
for each new
based grammars,
on the other hand,
of being
sophistication
and
(Ades
respects:
(i)
relative
(wh-questions,
operations;
coverage
the
very
that
approach
following
suffer
syntactic
is
as to analyze.
in natural
generally
of
means
Grammar
withough
patchiness
in
it
theories;
as well
domain,
e.g., Plume
(Hayes,
Andersen,
amd Safier
1985).
The
semantically
based
grammars
offer customization
of the entire
robustness
of parsing
along
domain
system
for the
domain;
However,
they
sensitive
lines
is an advantage
usually
cited.
from
(iv)
of
grammar
or semantically
be classified
use a syntactically
TEAM
use
language
is whether
systems
that
e.g.,
that
that
interface
are those
English,
natural
decisions
system
of
among
used
operations
language;
free
since
project
it
is
not
Given
the
the
subject
three
of this
design
decisions
paper,
parser
for NL interface
representation
language
just
namely,
applications
have just
we
mentioned,
the
selection
using the
mentioned.
we now come
and
design
grammars
and
to
of a
the
on natural
syntactically-based
semantically-based
grammars
NATURAL
LANGUAGE
/
1053
2. Charts
First
let
criteria,
us
ask
independently
following
of
what
the
desiderata
we
choices
should
should
expect
related
to
speak
from
the
the
parser
The
grammar.
for themselves:
structure
makes
it
possible
to develop
sophisticated
grammar
development
that the state of a parse can be examined
at any point.
the
tools
so
agendas
is
Another
-
presence
point
its relation
A formally
sound
ensure termination,
l
basis for the
completeness,
algorithm
in order
and tractability.
to
for
debugging
The
l
facilities
ability
domains
systems
The
l
development
grammar
inspection
and
to tune
particular
right-left,
or
language
input.
grammars
potential
to
into
integrate
for
and
for
purposes
to any
heuristic
parsing
middle-out
in
in
this
the
area
effect
paths
with
if necessary.
algorithm
able
the
respect
VS.
Partial
and
an
of
and
can
be
attention
can
be
time,
a property
at any
agenda
processing
of
it is hard
any
heuristics
to suspend
to control
natural
completely
agenda,
possibility
Thus
with
is
as appropriate
of the chart
of being
left-right,
processing
respect
on
as long
with
proceeds
In fact,
can be designed.
any mix of these
orders;
the
later
chart
the control
ordering
arbitrary
promising
general
of
of
can be achieved
produces
processing
contextual
the
version
the
control
about
of whether
Enumeration
operations
to allow
which
less
factors
by
directed
applied
making
some
designs
agenda
designed
in particular
efficient
semantics
preferencing
these
steps.
of parsing
such
that
prototypes
can be developed.
factors
maximize
that
worth
to questions
controlled
Modes
l
of an inspectable
on
resuming
such
to imagine
a more
decisions.
tuning.
A
l
design
credit
that
individual
principle
allows
schemes
performance
to
maximum
place
chart
for
adaptation
to
flexibility
and
formal
future
adaptation
to
soundness,
the
extolled
by
Kay
most
and
others
and
pieces
of evidence
One
of the
their
have
widespread
adoption.
Charts,
figured
prominently
in important
persuasive
from
the computer
natural
language
science
systems,
e.g., Slocum
not
reason
obvious
some variant
of
of charts have
be repeated
here.
in favor
e.g.,
Earley
is that
(1970),
in
e.g.,
Kaplan
(1973),
Bear
and
(1981), Ford,
Bresnan,
and Kaplan
Patil
(1981), Shieber
(1985), and in
(1981).
through
whatever
reason,
in
published
most
designs,
have
despite
written
that
of the
possibility
order
in
driven
chart
applied
However,
with
fact
for
existing
a Wizard-of-Oz
operations
agenda
to
the
(1973)
to control
ways.
other
control
Kaplan
the
Research
hand,
seems
to
into
have
concerns
(e.g.,
Ford,
by the need to develop
the
needs
of NL interface
Retaining
shortly
the
without
maximum
having
grammars
of ‘the
agenda
are (i) as a means
development,
an agenda
The advantages
of having
a parser
go well beyond
to scrap
is a major
relative
structure.
of integrating
flexibility
ease
Other
an agenda
matters
of
for
or radically
advantage.
of adding
possible
semantics
and
later
alter
We will
meta-level
uses of an
contextual
And,
to
complete
our
list
/ ENGINEERING
It seems
safe
to say that
no
added to the parser,
and Martin,
are
interface
Our
choices
relating
approach
to
add
matter
Church,
the
representation
language
what
and
and
further
weight
to this argument
It is a commonly
acknowledged
enumeration.
unification
of
mostly
because
graphs
tends
to be expensive
unification
is destructive
of the
the
against
fact that
computationally,
graphs on which
it
1984). The expense
is particularly
is called (see, e.g., Karttunen
evident
in chart parsing
because,
by definition,
one needs to retain
edge
graphs
create
in
new
their
edge
original
graphs
state
that
before
reflect
unification
the
result
originals.
Optimizations
of the unification
been suggested
as a means for coping with
1985;
Karttunen
and
obviously
welcome,
calls
unification
to
As
While
place
to cut
in
detailed
interpretations
is that
rule
firings
that
which
place.*
(1985),
extraction
is the
identical
lead
to
optimizations
are
the
complete
a
consequence
certain
potential
for
Another
through
identical
Grammars
also
is inappropriate.
and
content.
paths
the
as
the
is to minimize
Avoiding
Categorial
multiple
unifying
line.
in parsing
have
are
such
costs
enumeration
grammars
as well
of
algorithm
itself have
this problem
(Pereira
Combinatory
Wittenburg
these
there
first
in this
handling
in
this
of
in
for
the
step
exhaustive
mechanism
coordination
paths,
1985).
properties
that
is
Kay
another
is a first
Certain
suggest
the
solution.
of
the
kinds
of
ambiguous
way
to say
search space of
The obvious
method
to use in such a domain
avoids enumeration
of all
since there is no obvious
reason to do so. This general plan
*
It is relevant
has been
1054
Patil
application.
search
of
and
language
improvements
others.
Church,
exhaustive
enumeration
in the face of such data
lead to satisfactory
performance
for a natural
by
among
Martin,
Patil added many,
will probably
not
unification
(1980),
experiment,
coverage
for certain
kinds
a corpus
collected
product.
optimizations
factors
into the scoring;
(ii) as a framework
for credit assignment
schemes
to
automate
adaptation
to
individual
performance
situations;
and
(iii)
as the
control
mechanism
for
parallel
processing
of parsing
steps, a natural
use of agendas
pointed
out
Kay
of extensive
each
enumeration
to a parser
example
an agenda
flexible
on
also
systems.
code or existing
see an
and
by psycholinguistic
1982), not specifically
efficiency.
adjustments
breadth-first
(1980)
of using
maximally
is in fact the key ingredient.
as a control
mechanism
for
increased
exhaustive
Kay
parsers,
been
mainly
driven
Bresnan,
and Kaplan
efficient
seems
work
the
enumeration
agenda
there
chart
parsing
in the
attention
in the more
in chart parsing
For
to be an association
of chart parsing
grammars
found 958 parses for the sentence
In as much as allocating
is u tough job I would like to have the total costs related
to
grammar
despite
the
popularity
of
there has been relatively
little
quarters
to the role of agendas
based
amounts
of syntactic
ambiguity
For example,
working
with
(1981)
costs
exhaustive
However,
literature,
theoretical
syntactically
admit staggering
of constructions.
of charts is
or something
very similar,
theoretical
work on parsing
perspective,
research,
Karttunen
(1979), Thompson
(1982), Martin,
Church,
and
applied
will
most
Enumeration
While
many
chart-parsing
algorithms
use control
schemes
that
exhaustively
enumerate
the search space, there are a number
of
reasons why exhaustive
enumeration
is unsuitable
for NL interface
applications
given
the choices
mentioned.
The
most
obvious
situations.
to begin in constructing
a parser
is with
parsing
(Kay 1980).
The many advantages
been
2.1. Exhaustive
incorporating
automate
A design
that does not preclude
parallel
processing
schemes.
l
For
in
assignment
to report,
using
in parsing
(at least
based
structure
on unpublished
sharing
performance.
temporarily)
shelved
work
by David
Wroblewski,
not
yield
the
expected
may
In fact, unification
with structure
in the Lingo
project.
that
dramatic
sharing
has
been
with
embraced
in the
theoretical
formal
devices
utilized
the
Grammar.
It
strategies
which
has
been
context;
has no reason
There
is
one
that
so
as
interpretation.
by Church
in
Among
parse,
mapping
in the
avoiding
one
and
Marcus,
assume
grammar
that
that
are
and
For
more
if we
generating
all
reason we can
than
the
the
of
reached
from
syntactic
bracketings
case of extraposition
from
once we
the same
mentioned
by
one
only
member
one
l
there
Despite
all these
to
noun
have
one
phrase
interpretation
or
that
against
there
are
a grammar
of
happens
have
such
for
l
under
toggled
such
between
arbitrary
points
controlled
chart
it need
not
involve
The
to be explored.
during
be
It is very
circumstances.
exhaustive
in between,
parsing
Thus
and
The
is the
offers
this
sort
we can
that
important
to
so
that
ask
On
The
can
be
as well
as
for.
Agenda
first
algorithm
(Earley
ordering
schemes
associated
for exhaustive
string
during
itself
is ill-suited
have
been
to
get
some
partial
(Slocum
for
anything
along
with
with
chart
parsing,
efforts
Given
distinguish,
then,
suggestions
in the
with
our
what
with
other
breadth-
in such
breadth-first
motivation
to
be
for heuristically
some
form
for
there
a way
proposed
ordering
of breadth-first
as
ordering
keeping
rule
search
should
from
other
in Nilsson
of
all,
sense
has,
of
Nilsson
under
of merging
i.e.,
certain
nodes
alternate
the
paths
are
in the
through
of little
search
the
importance;
algorithm
is not
of the
algorithm,
optimality
the
of the total
number
of
graph
that are expanded,
a
nodes
will
in the
have a
on efficiency.
actions
(1980).
appearing
on a chart
can be looked
at
to take.**
We
chart
parsing,
then,
graph-searching
procedure
now
turn
3. The Best-First
to an overview
allow
a
presented
of the
algorithm
Algorithm
but
that
could
purpose
here is not to present
the best-first
chart
parsing
Readers
may consult
the original
sources or
in depth.
(1986b) for detail in this respect.
What I wish to do is
certain
features
of the algorithm
that help achieve
the
outlined
The
parser
reflection
Nilsson,
two
above.
of
and
enumeration
this
importance
project
Raphael
1968)
has
of
been
the
dubbed
A*
in its design.
previously
published
Astro,
search
The
chart
which
algorithm
algorithm
parsing
is a
(Hart,
we use is
algorithms
respects.
or unary
We assume a grammar
whose rules have only
daughters.
Such a simplification
is a consequence
in
binary
of the
**
but
to
for
the
than
Intuitively,
continue
first
the
itself.
been
just
parse,
of
These
characteristics
of
simplification
of the general
a collection
What
is of
could return
best
effect
solution
hand,
bearing
less general
(e.g., Robinson
1982, Heidorn
1982,
Slocum
1984).
maximum
benefit
for our purposes
is a design that
the
graph
can be irrevocable,
database
of
to the
other
possible
goals
respect
to some
mechanism
by
even restricted
breadth-first
We
of the alternatives.
about
as
set of edges
algorithm
Wittenburg
highlight
is designed
enumeration,
grammars
basically
additional
is
literature
achieved
exhaustive
at partitioning
enumeration
1984).
grammar
with
such a control
but
firings to a minimum,
however,
is less desirable
than
some
of parses
1970),
enumeration
of the
parsing.
Although
or
of flexibility.
Order
Earley
in
scheme
implicit
lengths
graph
the
The
My
2.2. Enumeration
as a standard
graph).
search algorithm;
the
as the closed nodes in a best-first
other
data structure
we need is one to keep track of
open nodes,
which
can be viewed
as an agenda
of
exhaustive
enumeration,
best
is of
backtracking.
itself
the
relative
strong
development
a parser
minimal
search
is
a
enumeration
Assuming
grammar
chart
control
i.e., a measure
complete
search
fails in an application
(whether
will probably
arise in which all
an event
Best-first
is
choice
those
to
using
mode.
in general
attractive
search.
commutative
the
in which,
exhaustive
development
when parsing
circumstances
&ill
using
is
thus
thus
admissability
strong factor.
An example
attachment
arguments
a more
scheme.
to an and/or
system
search
within
that undesirable
rule interactions
can be eliminated.
Also it is
important
to take note of the performance
characteristics
of the
parser
l
analysis
more
path.
that
graph
(1980);
l
arguments
space
be
is a depth-first,
of charts
search graph appropriately;
thus, there may be no need
to check whether
newly added nodes have already
been
generated
in the search graph.
one.
The
is different
syntactic
by a different
deems
to continue
assuming
some
backtracking,
can be represented
conditions,
would
yield the full set of attachments
of view while
“low” attachment
would
mode,
anticipated
are
to
l
set.
there will be times
“hard”
or “soft”),
the search
may
of prepositional
semantic
in an application
enumeration
in
We
processing
order
a strength
control
of heuristic
search
The
for
interpretation
no reason
have reached
the first
solution
in these cases
are reachable
of the first
a semantic
we have
assigned
“high”
attachment
the semantic
point
yield
reach
above.
path,
that
be a treatment
say,
from
to
then
interpretations
interpretations
would
able
one path,
paths
reach
reason
set
are
than
semantic
breadth-first
since
require
(as opposed
Fleck
are advanced
to
but
“best-first”
The
l
Rich et al. (1986) g ive an account
of how such ideas can
be extended
to a variety
of syntactic
constructions
and to the
semantic
representation
itself.
Given this picture,
we again have a
case of a search domain
in which it is inappropriate
to generate
all
through
do not
perspective
phrases.
paths.
subsequent
alternatives
design,
they
semantic
in work
Hindle,
arguments
if
parses
course a well-known
paradigm
in the A.I. literature.
A number
of
interesting
observations
can be made about chart parsing
from the
exhaustive
Let us
by the
with
respect
to
previously
surfaced
various
the
backtracking
a given
a so-called
admitted
(1983),
(1986a)
fire
other
heuristic
one can.
be ambiguous
general
idea has
Pereira
enumerate
it necessary.
to control
to a successful
analyses
one-to-many
interpretations
semantic
forward
use
in order
to
syntactic
to
This
dealing
Categorial
should
appropriate
motivation
for
be mentioned.
(1980),
a
one
grammars
further
should
In Wittenburg
(1983).
most
as one proceeds
individual
designed
are
literature
Combinatory
that
these
to fire all the rules
enumeration
certain
with
rules
as long
using
suggested
in processing
grammar
linguistics
in
exercised;
which
viewed
closed
open
have
not
as closed
yet
nodes
discussed
here,
successors
of that
nodes
nodes
an edge
edge
in a heuristic
are options
been
must
have
The
exercised.
be understood
is placed
search
which
on the
are options
generated
observation
in light
chart
algorithm
been
that
of the fact
concurrent
with
during
chart
that
the
that
edges
in the
have
the search
can
be
algorithm
expansion
of all
on the agenda.
NATURAL
LANGUAGE
!
1055
grammars
being
inherent
used
restriction
parsing
algorithms
right
hand
sides
technique
the
to
MCC
chart
generalize
arbitrary
Lingo
to
Earley
(1970).
the
algorithm
already
binary
general
Most
that
using
grammars
length
by
by
Categorial
project,
parsing.
for complicating
Combinatory
Grammars;
an
Lingo
project;
published
chart
have rules with
the dotted
rule
and
there
are no penalties,
However,
in this
the effect
not
there
way
of dotted
The
the
The
second
to
formal
operations
point
previously
about
this
published
such
as
chart
ones
parsing
is that
is in
transformations
algorithm
it
or
does
with
not
register
maintence
aspects
upon
follows
the basic
permit
setting
based
on
heuristic
function
to
Developing
trial, and
the
error,
and
the
start
as a basis
of Nilsson
for the
order
*agenda*
2. Initialize
*chart*.
3. LOOP:
if *agenda*
as
main
loop;
of the
the
best
action
5. If *best-edge*
from
the
and apply
edge added
remove
it
the
terminating
conditions,
is non-nil,
the set
of M on the
generate
Install
the members
a heuristic
ordering
A4 of
edge
successors
generate
and
also
step,
in chart
which
see
if
a major
achieves
formalisms
appears
initialization.
between checking to
We make
a rule
call
can be
in step
is
6 of
chart,
leading
while actually
applying
highest
ranked
action
to
The
to augmentations
a rule
on the
on the
agenda
call is done only when invoking
agenda.
Given
this distinction,
only,
the
we
can make make use of an optimized
test function
for checking
edge
leaving
actual
unification
with
its
graphs
for rule application,
Since
the latter
destructive
effects
to applying
a rule
call.
operation
is invoked
much less often than the generate
successors
step, there
are
unification-based
themselves
for
major
savings
grammars,
the
test
in
at
function.
unification
costs.
For graph
least
three
options
present
First
is
Shieber’s
notion
of
restricted
unification
(Shieber
1985).
Second,
Karttunen
(1986)
has suggested
a scheme where the destructive
effects of unification
can be undone.
Last,
a more
“porous”,
but
more
efficient
check-graphs
function
could be used that is nondestructive
of its
The last of these options
is the one used in the
graph arguments.
1056
node
heuristic
expansion.
some
general
factors
that
are a
function.
any rule
to the
call is the span
chart
if the
rule
fires
from
the
In general,
be favored.***
content
in the
a role.
lexical
for designing
that
word
others
heuristics
tend
corpora.
and
take
will
in a rule
advantage
be
of
statistically
domains
of discourse.
to become
experience
Automated
gathering
involved
that
senses
in certain
will
through
and
edges
Differentiating
among the
sense assignments
is one
heuristics
some
than
such
make
/ ENGINEERING
mention
parsing
here
algorithm.
much
more
refined
with particular
grammars,
techniques
for statistical
heuristic
refinement
are
always
a
used
the generate
to optimize
the
grammar
in this
basic
for
way
successors
also
allows
step.
The
for further
partitioning
heuristic
of
control
actions.
4.1. Avoiding
Many
chart
grammars
of two useful refinements
to the
The first
introduces
a technique
for
unary
rules
more
restrictive,
thus
**a**
Also we
useless unary
rule applications.
to partition
the rules of the grammar
that is
environments
avoiding
ultimately
review a technique
of the parser’s
a critical
likely
and actually
applying
a rule to a set of edges.
operation
is a part of generating
the successors
of a new
on the
given
a
4. Refinements
chart
Generating
Successors
that
The critical
feature
of this algorithm
efficiency
advantage
for graph-unification-based
succeed
checking
a
algorithms
of
scheme.
3.2.
loop,
of
all
choice
its
*agenda*,
making
the main
the
in scoring
should
sophisticated
We
distinction
as with
is
possibility.
*best-edge*
in the
factor
likely
lexicons,
information
exit
7. Go LOOP.
found
once they
that are more likely to lead to success should
higher
intrinsic
scores,
and play
a role in
****
a rule call.
fact
Naturally,
and
satisfies
successors.
following
we mention
to be added
linguistic
more
the action,
Set *bestto the *chart*,
if any;
algorithm,
search,
a heuristic
edge
method
successfully.
6. If
Here
spans
The
failure.
*agenda*,
successfully
4.
successors
call should
also play
scores of ambiguous
from the *agenda*,
edge* to the new
else, NIL.
to apply
because
in performance,
The rules
be given
l
it
(1980:64).
exit with
our algorithm
downgrade
successfully.
This
span
can be computed
spans of the daughter
edges in the rule call.
the
4. Select
the
An important
l
l
is empty,
given
right set of heuristics
is the product
of intuition,
and depends
upon the particulars
of the grammar
domain.
scoring
1. Initialize
6 fail
in step
graph
for designing
of the system.
suffices
organization
in step
iteration
choice
a slight
3.3. The heuristic function
Critical
to the success of this
longer
3.1. The Main Loop
The following
procedure
except
generated
are chosen
for backtracking.
In
was done by Kaplan
(1973), nor is it designed
our experience,
these simplifications
allow more flexibility
in some
of the agenda
calls
the best
using
rules
remainder
of the grammar
consists
of unary rules,
effect of altering
the complex
categories
in some
respect
if rule
it seems
is no
when
achieved
in the categories
of such grammars.
The
rules that
are needed
are a small,
fixed number
of
combination
rules,
operating
over
these
complex
categories.
which have
way.
of
introduced
motivation
fact
only
very
in
due
useless
parsing
make
unary rule proposals
algorithms
developed
for
context-free
use of relations
in the grammar
which can prune
collecting
this sort of
(1980) mentions
the total search space.
reachability
information
Calls to unary
actions.
Kay
into tables
which
help control
parsing
rule productions
are an obvious
candidate
to
from
the search
space
for Combinatory
It is a general
fact about unary rules that
attempt
to
Categorial
prune
Grammars.
*et
Note
unary
that
this
scheme
by itself
will tend
to favor
binary
rule
applications
over
recover
from
rule applications.
****
Among
ungrammatical
the
input.
rules
may
These,
be
some
in general,
whose
will receive
function
a lower
is
to
priority.
*****
agenda
This
augmentation
items
involving
also
unary
helps
rule calls.
to establish
a heuristic
See Wittenburg
scoring
(1986b)
technique
for details.
for
they
tend
side
of
to be overly
the
usefulness
call
succeed
often
side.
effect
a
out
rule
an
now
typically,
by
to combine
edges
like
this
of
with
in chart
operations
more
under
consideration
be taken
the
itself
right-handthe
grammar
ultimate
Even when a unary
edge to the chart,
additional
to be unable
subsequent
must
i.e.,
measure
application.
superfluous
of making
edge
poor
in adding
turns
Adding
new
promiscuous,
is
of a unary
can
edge
rule
edges
Thus
rule
that
on either
parsing
has
expensive
the
since
unary
rules.
we
follows:
A relation
call
node
exists
some
branching
exhaustive
sisters
shown.
the
that
has
been
e&ended
sister
A is an extended
sister
node
(Y that
dominator
dominator
of B. In the
of A are
shown
of node
of A
as all
and
B if and
only
if there
p where
cr is a non-
@ is a non-branching
derivation
B’s that
tree
can
stand
below,
extended
in the
relation
/\/\
P
I
=
Q
I
=
P
I
=
I
B
I
A
I
B
different
weights
the checking
to partitions
of subsets
as a whole.
of rules
that
are less
likely
or perhaps
whose
checking
is more expensive
than
other
subsets.
Also, it is possible
to define a single check for an entire
grammar
partition
consideration
that
in one fell swoop
of any of those
rules
can eliminate
the further
for the edge sets in question.
this
to be useful in this
It is defined
as
of node
is a sister
exhaustive
found
relation.
assigning
in all generate
What
is
successors
procedures
involving
any adjacent
edges.
called for then is the computation
of some grammar
relation
that
can be used to further
restrict
the conditions
for applications
of
regard
by
we can postpone
5. Evaluation
Since
the
critical
makeup
role
difficult
and further
in
of the
a parser
to evaluate
when
chart
heuristic
based
and different
some of the
grammars
arguments
enumeration
order
application,
language,
Combinatory
heuristic
itself
plays
graph
search,
of the design
non-heuristically
use different
than the
advanced
would
interface
function
on
the efficiency
compared
to other
parsing
systems
that
study
a
it
is
in the general
case
based
designs.
For
representation
languages
ones used
here in
lose
such
force.
in the Lingo project,
favor
of a best-first
However,
a graph-unification-based
Categorial
Grammars,
given
an
NL
representation
a many-to-one
and
mapping
from syntactic
bracketings
to semantic
representations,
it
is hard to imagine
that any radically
different
alternative
could
beat the parsing
program
we have outlined
here on the grounds
of
efficiency,
indicates
first
clarity,
and flexibility.
Experience
in the Lingo
an overwhelming
performance
improvement
with
parsing
design
breadth-first
when
design
compared
that
to a non-selective,
in
figured
our
original
and
refinement
project
a best-
bottom-up,
bootstrapping
operation.
The
extended
precompute
sister
a grammar
the left-hand-sides
must
of
(1986b),
that
unary
successors
match
hand-side
is used
table
of each
unary-rule-call
edge
relation
one
unary
rule.
a consequence
holds
rule.
of this
following
the
extended
is that
the
sisters
some
unary
4.2. Mets
An
rule
calls
Agenda
additional
which
involve
for
sisters
out in
for
adjacent
of the leftWittenburg
generate
successors
step now has to consider
as successors
of a given edge
those unary rule calls which apply
to the edge directly,
those
We
condition
is now that
of the extended
As pointed
move
way.
An additional
of a new edge
at least
that
in the
not
but
the edge as an extended
just
also
sister.
Items
augmentation
that
we have
implemented
involves
adding
Current
research
methods
as
changes
involving
the
tuning
of grammars
of the
research
inspectable
includes
buffering
focused
approximations
the
rules
in
in
the
grammar,
It is also possible
that generates
these basic agenda
these rule calls.
This augmentation
iteration
of checking
the
edges
into n steps corresponding
rules into n cells.
weeding
to define
out
a meta
all
rules
agenda
item
items,
i.e., that itself
is a way of breaking
with
respect
to a partitioning
to the whole
could
lead
[i, P, E’],
cell of the
new
edge
added.
where
i is a heuristic
grammar
and
its
Exercising
rules,
adjacent
rule
calls
added
edges
an agenda
the edges in E’ with respect
only.
Any rules which then
to
the
score
item
at
effect
that
in the
items
of this new
will have the form
P is a partition
assignment,
E’ is a triple
and
the
of this
time
form
that
represents
the
new
involves
edge
the
was
checking
to the grammar
rules of partition
pass these further
tests will result
agenda
of
a
type
that
we
to
of
We are using
particular
corpora
of heuristic
further
design
a set of tools
which
take
of the chart.
designing
credit
for the
advantage
Longer-term
assignment
to specific performance
situations.
of incorporating
some
form
of
agenda
structure
on local
areas
so that
attention
of the
chart.
can be
Initial
that
such
a factor
would
improve
the
which tends to be best when considered
in
than globally.
Also, left-to-right
buffering
cascading
an
of real-time
work
reported
design
important
feeding
step
into
semantic
and
towards
any future
parsing.
also
Rich
made
offered
Wroblewski
including
graph
unification
also
wish
and
advice
the
Lingo
extensive
valuable
David
work,
interface
on here
I gratefully
contributions
has also
the coding
and
to the
the
parser
to thank
involved
many
others
at MCC
acknowledge
the contribution
of
In particular,
team to this work.
comments
Lauri
on parsing
on
to the research
earlier
drafts
of
itself
this
and
paper.
made significant
contributions
to this
of structure-sharing
algorithms
for
design
and
and
coding
grammar
Karttunen
of the window
development
for
system
environment.
his long-standing
support
matters.
P
in
References
assumed
previously.
1. Ades,
Words.
The advantages
of incorporating
are that it is possible
to heuristically
a
besides
the author.
other
members
of
Elaine
Adding
this sort of agenda
item
has the
generate
successors
step, we produce
agenda
type at the first pass.
These new agenda items
consideration
Acknowledgements
The
grammar
for
components,
generates
apart the
of the set of grammar
a
control
structure
the possibility
of
indicates
experience
reliability
of heuristics,
a local fashion
rather
against
all
as
agenda.
within
the
incrementally
pragmatic
fail the test.
evaluation
well
schemes
to automatically
adapt
We also count
the possibility
a new type
of agenda
item.
In the basic
best-first
algorithm,
agenda
items are made up of rule calls over a set of
edges.
These agenda
items are generated
by checking
these edges
which
involves
scoring
A.
and
M.
Linguistics
Steedman.
1982.
and Philosophy
On
the
Order
of
4: 517-558.
this new type of agenda
item
order the iteration
over the
NATURAL
LANGUAGE
/
1057
I
2. Bear,
J.,
Phrase
and
L.
Karttunen.
Structure
Parser.
1979.
Texas
PSG:
A
Linguistic
17. Nilsson,
Simple
Forum
15:
Palo
N.
Alto,
1980.
Ca.:
Principles
of Artificial
Intelligence.
Tioga.
l-46.
18. Pereira,
3. Church,
K.
1980.
On Memory
MIT
Processing.
Language
[Available
Club.]
through
4. Earley,
J.
the
in Natural
Dissertation.
University
Linguistics
Indiana
An
1970.
Algorithm.
Limitations
Doctoral
Efficient
Communications
Context-Free
of the ACM
F.
note
Parsing
and
for the Heuristic
Paths.
IEEE
7. Hayes, P.,
Caseframe
B. Raphael.
Determination
Transactions
1968.
E.,
J.
of the
Computational
Barnett,
K.
of Minimum
on SSC
23rd
Meeting
Linguistics,
J.
8. Heidorn,
G.
Computed
Proceedings
Wittenburg,
and
22. Shieber.
of the Association
Experience
for
S. 1984.
for Linguistic
pp. 362-366.
an
Easily
Metric
for Ranking
Alternative
Parses.
of the 20th Meeting
of the Association
Computational
Linguistics,
23. Shieber,
R.
193-241.
A
1973.
Rustin
(ed.),
pp. 82-84.
New York:
10. Karttunen,
General
Natural
Processor.
24. Shieber,
pp.
Algorithmics.
L. 1986.
HUG:
for unification-based
International
A development
grammars.
and CSLI,
Stanford
ms.,
Association
for
Computational
SRI
M.
Sharing
Meeting
Linguistics,
Algorithm
1980.
Schemata
in Syntactic
Processing.
Xerox
Center,
tech report no. CSL-80-12.
J.
M., D, Hindle,
about Talking
21st
Annual
Computational
15. Martin,
P.,
Transportability
Interface
and M. Fleck.
about Trees.
Meeting
Linguistics,
D.
of
pp.
and
Data
Palo
Alto
In Proceedings
16. Martin,
W.,
K.
Church,
of
a
Preliminary
Analysis
Algorithm:
Theoretical
and
tech
report
Association
for
pp. 129-136.
Appelt,
and
F.
Pereira.
1983.
and Generality
in a Natural-Language
System.
of IJCAI,
/ ENGINEERING
pp.574-581.
and
R.
Patil.
Breadth-First
Experimental
no. MIT/LCS/TR-261.
An Introduction
In
of the Association
for
to Unification-Based
Chicago:
University
of
forthcoming.
1984.
METAL:
The
Paper
presented
LRC
Machine
at the ISSCO
Translation,
April
2-6, 1984,
[Working
paper
no. LRC-84-2,
Research
26. Steedman,
M.
the
Grammar
Center,
University
1985.
Dependency
of Dutch
and
of
Texas
at
and Coordination
in
Language
English.
Chart
H.
1981.
in PSG.
In Proceedings
27. Thompson,
Schemata
Meeting
1981.
Parsing
Results.
Parsing
and
of the
Association
for
19th
Rule
Annual
Computational
pp. 167-172.
28. Wittenburg,
Categorial
tech
the
of
presented
1983. D-Theory:
In Proceedings
of
the
Parsing
Formalisms.
.]
Society
14. Marcus,
Talking
to Extend
pp. 145-152.
Grammar.
Press,
Linguistics,
Structures
Research
Meeting
Language
of Coling84,
Linguistics.
61:523-568.
133-136.
13. Kay,
to
Linguistics
University.
L., and M. Kay. 1985. Structure
In Proceedings
of the 23rd
Trees.
12. Karttunen,
with Binary
23rd
Translation
System.
Tutorial
on Machine
Lugano,
Switzerland.
environment
Unpublished
of a Computer
Restriction
Linguistics,
S. 1986.
25. Slocum,
Austin
11. Karttunen,
for
In
Processing,
L. 1984. Features
and Values.
Proceedings
28-33. Association
for Computational
pp.
of Coling,
Linguistics.
of the
Syntactic
25:27-37.
Design
Using
of the
Approaches
Language
Grammar
for Complex-Feature-Based
Computational
In
for
The
S. 1985.
Chicago
9. Kaplan,
A
of the ACM
Information.
Proceedings
Association
for Computational
Algorithms
with
DIAGRAM:
Cost
4:100-107.
pp. 153-160.
1982.
1982.
Communications
Proceedings
1058
Analysis.
Corporation.
Dialogues.
A Formal
P. Andersen,
and S. Safier. 1985.
Semantic
Parsing
and
Syntactic
Generality.
In
Proceedings
MIT
Language
SRI International.
G. Whittemore.
1986.
Ambiguity
and Procrastination
in NL Interfaces.
Technical
report
no.
HI-073-86,
Microelectronics
and
Computer
Technology
21. Robinson,
P., N. Nilsson,
the
Center,
MIT.
6. Hart,
R.
for Natural
13:94-102.
20. Rich,
Basis
Logic
275, A.I.
19. Pereira,
F. 1985. A Structure-Sharing
Representation
for
Unification-Based
Grammar
Formalisms.
In
Proceedings
of the 23rd Meeting
of the Association
for
Computational
Linguistics,
pp. 137-144.
5. Ford,
M.,
J. Bresnan,
and
R. Kaplan.
1982.
A
Competence-Based
Theory
of Syntactic
Closure.
In
The
Mental
Representation
of
J.
Bresnan
(ed.),
Cambridge,
Grammatical
Relations,
pp.
727-796.
Mass.:
1983.
Technical
K. 1985.
Grammars
at
the
Some Properties
of Combinatory
of Relevance
to Parsing.
Paper
Annual
of America,
report
K. 1986a.
To appear
Anaphora.
Volume
20: Discontinuous
30. Wittenburg,
[MCC
tech
K.
1986b.
Technical
Search.
Microelectronics
Corporation.
of
the
27-30,
Linguistics
Seattle.
[MCC
no. HI-012-86.1
29. Wittenburg,
Academic.
Meeting
December
Extraposition
report
and
from
in Syntax
and
Constituencies.
NP
as
Semantics,
New York:
no. HI-118-85.1
Parsing
as
Heuristic
Graph
HI-075-86,
no.
report
Computer
Technology