Anaphora Resolution for Question Answering
by
Luciano Castagnola
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degrees of
Bachelor of Science in Computer Science and Engineering
and
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2002
© Massachusetts Institute of Technology 2002. All rights reserved.
Author: Department of Electrical Engineering and Computer Science, May 24, 2002

Certified by: Boris Katz, Principal Research Scientist, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students
Anaphora Resolution for Question Answering
by
Luciano Castagnola
Submitted to the Department of Electrical Engineering and Computer Science
on May 24, 2002, in partial fulfillment of the
requirements for the degrees of
Bachelor of Science in Computer Science and Engineering
and
Master of Engineering in Electrical Engineering and Computer Science
Abstract
Anaphora is a major phenomenon of natural language, and anaphora resolution is one
of the important problems in Natural Language Understanding. In order to analyze
text for content, it is important to understand what pronouns (and other referring
expressions) refer to. This is important in the context of Question Answering, where
questions and information sources are analyzed for content in order to provide precise
answers, unlike keyword searches. This thesis describes BRANQA, an anaphora resolution tool built for the purpose of improving the performance of Question Answering
systems. It resolves pronoun references via the use of syntactic analysis and high
precision heuristic rules. BRANQA serves as an infrastructure for the experimentation with different resolution strategies and will enable evaluation of the benefits of
anaphora resolution for Question Answering. We evaluated BRANQA's performance
and found it to be comparable to that of other systems in the literature.
Thesis Supervisor: Boris Katz
Title: Principal Research Scientist
Acknowledgments
I am grateful to Sue Felshin and Greg Marton for valuable comments in the preparation of this document. I thank Ali Ibrahim and Greg for their help during the
development of the system.
I thank Boris Katz for his support and patience throughout these years.
I thank Patrick Winston for his constant encouragement and extraordinary generosity.
Contents

1 Introduction
  1.1 What is anaphora?
  1.2 Question Answering
  1.3 Resolving Pronouns for Question Answering
  1.4 Outline

2 Anaphora Resolution
  2.1 Overview of Pronominal Anaphora Resolution
      2.1.1 Two Stages
      2.1.2 Constraints
      2.1.3 Preferences
      2.1.4 Computational Strategies
  2.2 Government and Binding Theory
  2.3 Prior Work
      2.3.1 RAP
      2.3.2 CogNIAC

3 System Architecture
  3.1 Link Parser
      3.1.1 Link Grammar
      3.1.2 Constituent Structure
  3.2 Noun Phrase Categorization
      3.2.1 Heads and Subject
      3.2.2 Valid References
  3.3 Named Entity Recognition
  3.4 Coreference Module
      3.4.1 Pleonastic pronoun detector
      3.4.2 Syntactic filter
      3.4.3 Resolution procedure

4 Evaluation
  4.1 MUC-7 Coreference Task Corpus
  4.2 Test Procedure
  4.3 Results
  4.4 Effect on Question Answering

5 Future Work
  5.1 Improvements
      5.1.1 Quoted Speech
      5.1.2 Named Entity Module
      5.1.3 Syntactic Filter
  5.2 Future research projects
      5.2.1 Statistics as a proxy for world knowledge
      5.2.2 Alternative resolution procedures
      5.2.3 Integration with other systems

6 Contributions
List of Figures

2-1 Binding Theory Examples
2-2 Examples of disjoint reference
2-3 RAP's pleonastic pronoun detector
3-1 Overall Architecture
3-2 Link Parser Output Example
3-3 Problems assigning constituent structure to conjunctions
3-4 Noun Phrase Table
3-5 Resolution rules in BRANQA
List of Tables

4.1 Test Results by Rule
4.2 Test Results by Pronoun
Chapter 1
Introduction
In this chapter I present the motivation behind the development of BRANQA,¹ an
anaphora resolution tool.
1.1
What is anaphora?
Anaphora is reference to entities mentioned previously in the discourse. The referring
expression is called an anaphor, and the entity to which it refers, or binds, is its referent
or antecedent. Anaphora resolution is the process of finding an anaphor's antecedent.
Example:
The car is falling apart, but it still works.
Here "it" is the anaphor and "The car" is the antecedent. This is an example
of pronominal anaphora, or anaphora where the anaphor is a pronoun.
It is the
most common type of anaphora, and will be the focus of this thesis. Other kinds of
anaphora are definite noun phrase anaphora and one-anaphora:
President George Bush signed (...) The president...
If you don't like the coat, you can choose another one.
In the first sentence "The president" is the anaphor, and "President George Bush"
is the antecedent. In the second, "one" is the anaphor and "the coat" the antecedent.
When the anaphor is in the same sentence as the antecedent, it is called an
intrasentential anaphor; otherwise it is an intersentential anaphor.
¹ BRANQA: BRANQA Resolves ANaphors for Question Answering
1.2
Question Answering
The InfoLab Group at MIT's AI Lab has developed systems that attempt to solve
the problem of information access. The belief that natural language is the easiest way
for humans to request information has led the group to work on question answering
systems. The START (SynTactic Analysis using Reversible Transformations) [17, 18]
system provides multimedia access using natural language. It has been available to
answer questions on the World Wide Web² since December 1993. Since it came online,
it has answered millions of questions for hundreds of thousands of people all over the
world, providing users with knowledge regarding geography, presidents of the U.S.,
movies, and many other areas.
The START System strives to deliver "just the right information" in response to a
query. Unlike Web search engines, START does not reply with long lists of documents
that might contain an answer to our question; it provides the actual answer we are
looking for. This comes in the form of a short information segment (e.g., an English
sentence, a graph, a picture), rather than an entire document. START has been very
successful in its interaction with users, but its domain of knowledge is fairly limited
and expanding its knowledge base requires human effort. It works extremely well
within the domains it handles, but any question outside its knowledge base will get
a reply from START saying it does not know how to answer it.
In response to this problem, the InfoLab Group began to work on systems with
less stringent requirements with respect to both returning correct answers and delivering "just the right information". These systems lie somewhere along the spectrum
between information retrieval engines like Altavista or Google³ at the one end and
natural language systems like START at the other. They are linguistically informed
search engines, which attempt to use natural language tools to aid the retrieval of
information in order to return a smaller amount of irrelevant information than traditional search engines.
One of these systems, Sapere [22], indexes relations between words to allow it
² http://www.ai.mit.edu/projects/infolab
³ http://www.altavista.com, http://www.google.com
to search for information in a smart way. By storing relations like Subject-Verb-Object, it can distinguish between cases that the simple "bag of words" approach
would confuse. For example, in response to the question "When did France attack
England?" Sapere will not return the sentences "England and France attacked China
in 1857", "England attacked France", or "France was attacked by England", since
the crucial relation France-attack-England is missing in all of them. The "bag of
words" approach treats documents as sets of keyword counts, and would thus consider
"England attacked France" to be equivalent to "France attacked England".
1.3
Resolving Pronouns for Question Answering
Underlying the motivation for this project is a desire to improve the performance of
the InfoLab Group's question answering systems. START, Sapere, and future group
projects can benefit from the use of a pronominal anaphora resolution tool.
As mentioned above, Sapere indexes relations as part of its linguistically informed
information retrieval approach. The analysis, indexing and retrieving are all done at
the sentence level and this makes the resolution of anaphors very important. Without
resolving what a pronoun refers to in a sentence, relations involving that pronoun are
not useful for retrieval. After reading "The first seven attempts to climb Mount
Everest were unsuccessful. Edmund Hillary climbed it in 1953..." we cannot answer
"Who climbed Mount Everest for the first time?" unless we find that "Mount Everest"
is an antecedent for "it".
Adding a pronominal anaphora resolution module to Sapere should increase the
number of questions it can answer about a given corpus; resolving pronouns should
increase its recall⁴ by raising the number of useful relations that are indexed.
START also stands to benefit from the availability of an anaphora resolution
module. One of the features of START that is not currently being used is the ability
to keep track of threads of conversation with different users. Enabling this feature
⁴ Recall is the ratio of correct answers found to correct answers in the corpus. Precision is the
ratio of correct answers to answers given (correct and incorrect).
could allow more interesting interaction with the users, turning sessions into dialogues
rather than series of disconnected question/answer pairs. In this mode of operation,
pronominal anaphora resolution would become very important, since it would allow
users to refer to entities introduced in previous sentences much more naturally.
Currently START handles the simplest cases of pronominal anaphora; namely, if
there is only one possible antecedent for a pronoun that passes the gender and number
agreement test, START will resolve it, but otherwise it will ask the user to clarify.
This is the most conservative approach to anaphora resolution (so long as gender and
number of entities are identified correctly, no mistakes will be made), but this leads to
unnatural conversation in many cases where the pronoun could be resolved with high
confidence. We believe that a good pronominal anaphora resolution tool would lead
to improved user interaction, an important goal of the START system.
More traditional approaches to information retrieval also stand to gain from
anaphora resolution [29].
It can even help systems based on the "bag of words"
scheme, where pronouns should raise the counts of their antecedents. Thus, future
group projects in this direction could also profit from this technology.
This thesis presents the design and evaluation of BRANQA, a system motivated
by the benefits that question answering could reap from anaphora resolution.
1.4
Outline
The rest of the document is organized as follows:
* Chapter 2 introduces the background work on which the system is based.
* Chapter 3 describes the system's architecture and how it works.
* Chapter 4 presents an evaluation of the system.
* Chapter 5 lists improvements to be made in the near future together with
research projects suggested by this work.
" Chapter 6 summarizes the contributions made in developing this thesis.
Chapter 2
Anaphora Resolution
In this chapter I present the ground on which this thesis rests.
2.1
Overview of Pronominal Anaphora Resolution
When encountering a pronoun in the text, how can one tell what it refers to? The
literature shows a wide variety of approaches to solving this problem. Mitkov [27]
provides an excellent overview of the state of the art in anaphora resolution, parts of
which I summarize briefly in this section.
2.1.1
Two Stages
The process of finding the expression to which a pronoun refers can be split into two
tasks: finding a set of plausible referents, and picking a "best" element from the set.
The first task is complicated by the many different types of reference that pronouns
take part in. Pronouns can refer to noun phrases, verb phrases, clauses, sentences or
even whole paragraphs. For example, in "Mary ran ten miles yesterday. She liked
it very much", "she" refers to the noun phrase "Mary", and "it" refers to the verb
phrase "ran ten miles yesterday". Pronouns can also lack referents. This is the case
with pleonastic (alternatively non-referential or semantically empty) pronouns, as in
"It is raining" or "It seems John is unhappy."
Additionally, the referent can be mentioned before or after the pronoun. If the
referent is mentioned first, the usual situation, the referent is the antecedent of an
anaphor. If the pronoun is seen first, the kind of reference is called cataphora and
the pronoun is a cataphor. An example of a cataphoric relation would be "When he
woke up, John was drenched in sweat."
An ideal system for the resolution of pronouns would have to handle all of these
kinds of reference, which would involve search over all possible referents (noun phrases,
verb phrases, etc.) both before and after the pronoun.
In practice, the scope of
systems is usually reduced to the detection of pleonastic pronouns and the resolution
of anaphors with noun phrase antecedents, both because of the complexity of handling
the general case, and because these are the most common uses of pronouns. In a
system with this focus, the first step in resolving a pronoun is to determine whether
it is pleonastic, and if not, to identify all noun phrases occurring before it as possible
antecedents.
Once the set of potential antecedents is determined, a number of "resolution factors" are used to track down the correct antecedent.
Factors used frequently in
the resolution process include gender and number agreement, syntactic binding restrictions, semantic consistency, syntactic parallelism, semantic parallelism, salience,
proximity and others. These factors can be divided into constraints (or eliminating
factors), which must hold, and preferences, which are used to rank candidates.
2.1.2
Constraints
Constraints control what an anaphor can refer to. They are conditions that always
need to hold for reference to be valid, and can thus be used to remove implausible
candidates from the list of possible antecedents.
Examples of constraints are gender and number agreement. Anaphors and their
antecedents must always agree in number and gender.¹
Some constraints are given
by syntactic theories like Government and Binding Theory, which specifies binding
¹ Note: Collective nouns like "government" and "team" can be referred to by "they", and plural
nouns like "data" can be referred to by "it". The definition of number is complicated in these cases.
restrictions (see Section 2.2).
Other constraints are given by semantics. Although it is beyond current natural
language technology to understand open domain texts, statistics can be used as a
proxy for semantic knowledge.
In the two examples below, the frequency of co-
occurrence of words could be used to disambiguate the anaphors:
Joe removed the diskette from the computer_i and disconnected it_i.
Joe removed the diskette_i from the computer and copied it_i.
Ge, Hale and Charniak [11] present a successful statistical approach to the resolution of pronouns consisting of a probabilistic model trained on a small subset of the
Penn Treebank corpus.
2.1.3
Preferences
Preferences, as opposed to constraints, are not obligatory conditions and therefore do
not always hold. They are criteria that can be used to rank the possible antecedents.
Among preferences, Mitkov lists syntactic parallelism, semantic parallelism and centering.
Syntactic parallelism gives preference to noun phrases with the same syntactic
function as the anaphor. For example:
The programmer successfully combined Prolog_i with C, but he had
combined it_i with Pascal last time.
The programmer successfully combined Prolog with C_j, but he had
combined Pascal with it_j last time.
Similarly, semantic parallelism says that noun phrases which have the same semantic role as the anaphor are favoured.
Vincent gave the diskette to Sody_i. Kim also gave him_i a letter.
Vincent_i gave the diskette to Sody. Kim got a letter from him_i too.
Syntactic and semantic criteria are not always sufficient to choose among a set
of candidates. These criteria are usually used as filters to eliminate unsuitable candidates, and after that the most salient element among the remaining noun phrases
is selected. This most salient element is referred to as the focus [31] or center [12].
Mitkov uses the following example to illustrate this concept:
Jenny put the cup on the plate and broke it.
Here the meaning of "it" is ambiguous; its antecedent could be "the cup" or "the
plate". However, context can help disambiguate the reference:
Jenny went window shopping yesterday and spotted a
nice cup. She wanted to buy it, but she had no money
with her. The following day, she went to the shop and
bought the coveted cup. However, once back home and
in her kitchen, she put the cup on the plate and broke it.
Now "the cup" is the most salient entity and is the center of attention throughout
the paragraph; it is preferred over "the plate" as an antecedent for "it". This example
illustrates the important role of tracking down the center/focus in anaphora resolution. After "filtering" unsuitable candidates, the final choice is made by determining
which of the candidates seems to be the center. Various methods have been proposed
for center/focus tracking [5, 9, 25, 32, 36].
2.1.4
Computational Strategies
The traditional approach to anaphora resolution is to eliminate unlikely candidates
until a minimal set of plausible candidates is obtained, and then make use of preferences to choose a candidate. Other approaches compute the most likely candidate
on the basis of statistical or "AI" techniques (Mitkov mentions uncertainty-reasoning
methods as an example of these techniques). In these "alternative" systems the concept of constraint might disappear, and all resolution factors might be considered
preferences whose weights get updated through "AI" techniques. Mitkov [26] compares a traditional and an "alternative" approach using the same set of anaphora
resolution factors.
2.2
Government and Binding Theory
Government and Binding Theory is a version of Chomsky's theory of universal grammar named after his Lectures on Government and Binding [7]. One of its components,
Binding Theory, explains the behavior of intrasentential anaphora. The theory explains
when an anaphor can bind to a noun phrase based on their relative positions
in syntactic structure. The details of the theory are complicated, but the important
point for this thesis is that syntax alone can place hard constraints on anaphora, and
this can be used to help us pick antecedents for anaphors by eliminating syntactically
disallowed candidates. For an introductory treatment of Binding Theory see
Haegeman [13].
Figure 2-1 shows examples of reference determined valid or invalid by Binding
Theory on the basis of syntactic structure.

John_i hit him_*i/j.
John_i hit himself_i/*j.
Lucie_i said [CP that [IP Lili_j hurt herself_j/*i]].
Lucie_i said [CP that [IP Lili_j hurt her_i/k/*j]].
Poirot_i believes [NP John_j's description of himself_j/*i].
Poirot_i believes [NP any description of himself_i/*j].
('*' denotes ungrammatical co-indexings)

Figure 2-1: Binding Theory Examples
2.3
Prior Work
BRANQA is largely based on two prior systems: Lappin and Leass' RAP [19], and
Baldwin's CogNIAC [4]. This section presents some of the ideas taken from them.
2.3.1
RAP
RAP (Resolution of Anaphora Procedure) is an algorithm for identifying the noun
phrase antecedents of third person pronouns and lexical anaphors (reflexive and reciprocal pronouns). The algorithm applies to the syntactic representations generated
by McCord's Slot Grammar parser [23], and relies on salience measures derived from
syntactic structure and a simple dynamic model of attentional state. In a blind test
on computer manual text containing 360 pronoun occurrences the system identified
the correct antecedent for 86% of these pronoun occurrences.
RAP contains the following main components:
" An intrasentential syntactic filter for ruling out anaphoric dependence of a pronoun on a noun phrase based on syntactic binding constraints.
" A morphological filter for ruling out anaphoric dependence of a pronoun on a
noun phrase due to non-agreement of person, number or gender features.
* A procedure for identifying pleonastic (semantically empty) pronouns.
* An anaphor binding algorithm for identifying the possible antecedent of a lexical
anaphor within the same sentence.
" A procedure for assigning values to several salience parameters (grammatical
role, parallelism of grammatical roles, frequency of mention, proximity, and
sentence recency) for a noun phrase. This procedure employs a grammatical role
hierarchy according to which the evaluation rules assign higher salience weights
to (i) subject over non-subject noun phrases, (ii) direct objects over indirect
objects, (iii) arguments of a verb over adjuncts and objects of prepositional
phrase adjuncts of the verb, and (iv) head nouns over complements of head
nouns.
* A procedure for identifying anaphorically linked noun phrases as an equivalence
class for which a global salience value is computed as the sum of the salience
values of its elements.
" A decision procedure for selecting the preferred element of a set of antecedent
candidates for a pronoun.
BRANQA's syntactic filter and pleonastic pronoun detector were modeled after
the ones in RAP, which I describe below.
Intrasentential Syntactic Filter
RAP's syntactic filter was developed for English Slot Grammar, a kind of dependency-based grammar [24]. Dependency syntax avoids the use of phrase structure or categories; instead it marks syntactic dependencies between the words of a sentence.
These are represented by arcs with arrows: X→Y. We say that Y depends on X, or
that X governs Y. X is called the (syntactic) governor of Y and Y is called the (syntactic) dependent of X. The head of a phrase P is a component of P which governs
all other components of P. An argument of X is a necessary dependent of X (e.g., the
direct object for a transitive verb) and an adjunct of X is an optional dependent of
X (e.g., an adjective modifying a noun).
The filter consists of conditions for non-coreference of a noun phrase and a pronoun
within the same sentence. The following terminology is used to state these conditions:
" A phrase P is in the argument domain of a phrase N iff P and N are both
arguments of the same head.
" P is in the adjunct domain of N iff N is an argument of a head H, P is the object
of a preposition PREP, and PREP is an adjunct of H.
" P is in the NP domain of N iff N is the determiner of a noun
argument of
adjunct of
Q, or
(i) P is an
(ii) P is the object of a preposition PREP and PREP is an
Q.
" A phrase P is contained in a phrase
adjunct of
Q and
Q,
Q
iff (i) P is either an argument or an
Q, or
in Q.
i.e., P is immediately contained in
contained in some phrase R, and R is contained
(ii) P is immediately
Given these definitions, the syntactic filter says that a pronoun P is non-coreferential
with a (non-reflexive or non-reciprocal) noun phrase N if any of the following hold:
1. P is in the argument domain of N.
2. P is in the adjunct domain of N.
3. P is an argument of a head H, N is not a pronoun, and N is contained in H.
4. P is in the NP domain of N.
5. P is a determiner of a noun Q, and N is contained in Q.
1. She_i likes her_j.
   John_i seems to want to see him_j.
2. She_i sat near her_j.
3. He_i believes that the man_j is amusing.
4. John_i's portrait of him_j is interesting.
5. His_i portrait of John_j is interesting.
   His_i description of the portrait by John_j is interesting.

Figure 2-2: Examples of disjoint reference
Figure 2-2 shows examples of disjoint reference signalled by these conditions.
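To make these conditions concrete, the following is a minimal sketch (in Java, the language BRANQA is written in) of how conditions 1 and 3 could be checked over a dependency structure. The Node class, its fields, and the traversal are hypothetical stand-ins of my own, not RAP's implementation, which operates on English Slot Grammar analyses.

import java.util.ArrayList;
import java.util.List;

// A minimal sketch (not RAP's or BRANQA's actual code) of how the non-coreference
// conditions above might be checked over a dependency structure. The Node class and
// its fields are hypothetical stand-ins; only conditions 1 and 3 are shown.
class Node {
    String word;
    Node head;                          // syntactic governor (null for the root)
    boolean isArgument;                 // argument (true) or adjunct (false) of its head
    boolean isPronoun;
    List<Node> dependents = new ArrayList<>();
}

class SlotGrammarStyleFilter {

    // Condition 1: P is in the argument domain of N
    // (P and N are both arguments of the same head).
    static boolean inArgumentDomain(Node p, Node n) {
        return p.isArgument && n.isArgument && p.head != null && p.head == n.head;
    }

    // Condition 3: P is an argument of a head H, N is not a pronoun,
    // and N is contained in H.
    static boolean condition3(Node p, Node n) {
        return p.isArgument && p.head != null && !n.isPronoun && contains(p.head, n);
    }

    // "Contained in": the target is reachable from the container by dependent links.
    static boolean contains(Node container, Node target) {
        for (Node d : container.dependents) {
            if (d == target || contains(d, target)) {
                return true;
            }
        }
        return false;
    }

    // The filter rules out coreference when any condition holds; conditions 2, 4
    // and 5 would be added analogously.
    static boolean nonCoreferential(Node pronoun, Node np) {
        return inArgumentDomain(pronoun, np) || condition3(pronoun, np);
    }

    public static void main(String[] args) {
        // "She likes her": both "She" and "her" are arguments of "likes" (condition 1).
        Node likes = new Node();
        Node she = new Node();
        she.head = likes; she.isArgument = true; she.word = "She";
        Node her = new Node();
        her.head = likes; her.isArgument = true; her.isPronoun = true; her.word = "her";
        likes.dependents.add(she);
        likes.dependents.add(her);
        System.out.println(nonCoreferential(her, she)); // true
    }
}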
Pleonastic pronoun detector
RAP attempts to identify non-referential uses of it to improve resolution performance.
It defines a class of modal adjectives (ModalAdj) containing words like "necessary",
"easy" and "advisable", together with their morphological negations, as well as comparative and superlative forms. It also defines a class of cognitive verbs (CogV) like
"recommend", "think" and "believe". When it is present in one of the constructions
in Figure 2-3 it is considered pleonastic. Syntactic variants of these constructions (It
is not/may be ModalAdj..., Wouldn't it be ModalAdj..., etc) are also recognized.
It is ModalAdj that S
It is ModalAdj (for NP) to VP
It is CogV-past-tense that S
It seems/appears/means/follows (that) S
NP makes/finds it ModalAdj (for NP) to VP
It is time to VP
It is thanks to NP that S
Figure 2-3: RAP's pleonastic pronoun detector
2.3.2
CogNIAC
CogNIAC is a pronoun resolution system giving more importance to precision than
to recall. The system resolves a subset of anaphors that do not require general world
knowledge or sophisticated linguistic processing for successful resolution. CogNIAC
does this by being very sensitive to ambiguity, and only resolving pronouns when very
high confidence rules have been satisfied.
CogNIAC, like RAP, first eliminates candidate phrases that are not compatible
with the anaphor's gender and number or that are ruled out on syntactic grounds
(the syntactic constraints used are not mentioned in the paper). The system then
evaluates a set of heuristic rules to choose an antecedent, or in the case that no rules
are triggered, to leave it unresolved.
The six core rules of CogNIAC are (in order of application):
1. Unique in Discourse: If there is a single possible antecedent i in the preceding
portion of the entire discourse, then pick i as the antecedent.
2. Reflexive: Pick nearest possible antecedent in preceding portion of current sentence if the anaphor is a reflexive pronoun.
3. Unique in Current + Prior: If there is a single possible antecedent i in the prior
sentence and the preceding portion of the current sentence, then pick i as the
antecedent.
4. Possessive Pro: If the anaphor is a possessive pronoun and there is a single
exact string match i of the possessive in the prior sentence, then pick i as the
antecedent.
5. Unique Current Sentence: If there is a single possible antecedent in the preceding portion of the current sentence, then pick i as the antecedent.
6. Unique Subject/ Subject Pronoun: If the subject of the prior sentence contains
a single possible antecedent i, and the anaphor is the subject of the current
sentence, then pick i as the antecedent.
In the first experiment reported, CogNIAC was tested on 298 third person singular pronouns in narrative texts about two same-gender people (chosen to maximize
ambiguity). It achieved a precision of 92% and recall of 64%.
In a second experiment, CogNIAC was tested on the articles used in the MUC-6
coreference task [30]. The system underwent some changes in preparation for MUC-6,
both because CogNIAC was now being used as part of a larger system, and because
the domain of the MUC-6 documents was different from that of the narrative texts. Rule 4 was
eliminated because it did not seem appropriate for the domain. Additions were made
to process quoted speech in a limited fashion (the specific additions were not presented
in the paper). A rule was added to search back for a unique antecedent through the
text looking backwards at progressively larger portions of the text. A new pattern was
added which selected the subject of the immediately surrounding clause. A pleonastic
it detector was also implemented.
After these changes, CogNIAC achieved 73% precision and 75% recall on fifteen
MUC-6 documents containing 114 pronoun occurrences.
Chapter 3
System Architecture
As mentioned in Chapter 2, the resolution of anaphoric expressions proceeds in two
stages: the identification of a set of plausible antecedents, followed by the selection
of the most likely candidate. The overall architecture of the system has two main
components, each one dealing with one part of the problem.
[Figure 3-1 (block diagram): the Link Parser is accessed through a Link Parser Interface; a Named Entity Module and Noun Phrase Categorization build up the Noun Phrase Table; the Coreference Module comprises the Pleonastic Pronoun detector, the Syntactic Filter, and the Resolution Procedure.]

Figure 3-1: Overall Architecture
The noun phrase categorization tool identifies noun phrases in the input and
determines relevant properties of them (e.g., gender and number). The coreference
module then uses these properties of noun phrases to select an antecedent for the
anaphor.
The two components interact with the external Link Parser through a
wrapper that communicates with the parser and attempts to correct some of its
deficiencies.
One of the goals in mind while designing the system was to provide a testbed
for research in anaphora resolution and its application to question answering. Thus,
although the system concentrates on the resolution of pronominal anaphors, the infrastructure is there for experimentation with coreference in general. The resolution
procedure does not depend on the parser output representation, nor does it depend
directly on the linguistic resources used for noun phrase categorization. This modularity allows for easy experimentation with individual parts of the system, for example, evaluating different resolution strategies or different methods for noun phrase
categorization.
The system was written in Java, except for parts of the wrapper for the Link
Parser which were written in C. The Link Parser is the only external dependency
but the design philosophy allows for easy connection to other systems (e.g., a better
Named Entity module).
The following sections describe the system's components in more detail.
3.1
Link Parser
After a sentence is submitted to the system, the system parses it. For this task it
uses the Link Parser developed at Carnegie Mellon University.¹
The Link Parser is written in generic C code, and runs on any platform with a C
compiler. An application program interface (API) makes it easy to incorporate the
parser into other applications.
The parser has a dictionary of about 60,000 word forms. It has coverage of a
¹ http://www.link.cs.cmu.edu
wide variety of syntactic constructions, including many rare and idiomatic ones. The
parser is robust; it is able to skip over portions of the sentence that it cannot understand, and assign some structure to the rest of the sentence. It is able to handle
unknown vocabulary, and make intelligent guesses from context and spelling about
the syntactic categories of unknown words. It has knowledge of capitalization, numerical expressions, and a variety of punctuation symbols. When several interpretations
of a sentence are possible the parser allows access to all of them, sorted by a measure
of how good the parse is (for example, when the parse is not complete, it includes in
this measure the number of words that had to be skipped).

+-SFsi+---Paf--+--THi--+-Cet+-Ss-+---I---+--Os-+
|     |        |       |    |    |       |     |
it seemed.v likely.a that.c he would.v kiss.v Mary

Figure 3-2: Link Parser Output Example
3.1.1
Link Grammar
The parser is based on a formal grammatical system called a link grammar [33, 34].
A link grammar has no concept of constituents or categories (e.g., noun phrase, verb
phrase).
It contains a set of words (the terminal symbols of the grammar) each of
which has a linking requirement. The parser connects the words with links so as to
satisfy their linking requirements and the requirement of planarity (that links do not
cross each other), which is a property that holds for most sentences of most natural
languages [24]. The linking requirements for the words are specified in a dictionary.
They determine what types of links can be used to connect a word to others.
The link grammar for English contains more than 100 types of links, each of
which specifies a different kind of relation between words. Some of these link types
are very useful for the task of pronominal anaphora resolution.
For example, an
SF link connecting "it" to a verb indicates that this is a non-referential use of "it".
Figure 3-2 shows a sample linkage.
Its focus on dependency relations among words makes the Link Parser great for
the task of extracting relations, and the ongoing JLink project at the InfoLab Group
is working on that problem. The extracted relations can then be used in our question
answering systems (such as Sapere) as well as in new versions of BRANQA by building
on the work of Dagan and Itai [8] (see Chapter 5). Thus, work using the Link Parser
has the possibility of helping the InfoLab Group beyond the direct results of this
thesis, especially through the identification of problems and possible solutions.

(a) Top-ranked parse:

(S (NP Former guests)
   (VP include
       (NP (NP (NP John)
               (NP Paul))
           (NP (NP George)
               (NP (NP Ringo)
                   and
                   (NP Steve)))))

(b) Correct parse (worst rank):

(S (NP Former guests)
   (VP include
       (NP (NP John)
           (NP Paul)
           (NP George)
           (NP Ringo)
           and
           (NP Steve))))

Figure 3-3: Problems assigning constituent structure to conjunctions
3.1.2
Constituent Structure
Although link grammars have no concept of constituents, the Link Parser has (since
version 4.0) a phrase-parser: a system which takes a linkage (the usual link grammar
representation of a sentence, showing links connecting pairs of words) and derives
a constituent or phrase-structure representation, showing conventional phrase categories such as noun phrase (NP), verb phrase (VP), prepositional phrase (PP), clause
(S), and so on. This allows us to identify the noun phrases in the text, the first step
towards resolution of pronominal anaphors.
The interface to the Link Parser takes the NPs identified by the parser and attempts to fix some common problems in the parser's handling of conjunctions. Conjunctions with many disjuncts are almost always parsed incorrectly in the top-ranking
linkage returned by the parser. This happens because one of the components in the
cost vector used to sort the linkages is the sum of link lengths, and a flatter structure
will have longer links than one with more embedding of phrases. Thus, a sentence
like "Former guests include John, Paul, George, Ringo and Steve" has the constituent
structure in Figure 3-3(a) assigned to the top-ranking parse whereas the correct constituent structure is that of the worst-ranked linkage.
The interface to the Link Parser identifies instances of lists of items, like the one
in the example, and corrects the constituent structure within the conjunction.

Jimi bought a new guitar. He broke it on stage.

NP | sent. | text         | head(s) | subject | he    | she   | it    | they  | ref.
 1 |   1   | Jimi         | Jimi    | true    | true  | false | false | false |  1
 2 |   1   | a new guitar | guitar  | false   | false | false | true  | false |  2
 3 |   2   | He           | He      | true    | true  | false | false | false |  1
 4 |   2   | it           | it      | false   | false | false | true  | false |  2
 5 |   2   | stage        | stage   | false   | false | false | true  | false |  5

Figure 3-4: Noun Phrase Table
By using better knowledge of named entities, BRANQA is also able to correct
some parsing errors when identifying noun phrases. The Link Parser fails to parse
the second sentence in the example below, since "Son" is not recognized as a name.
Mr. Son sang a song. (...) Son was happy.
Our Named Entity recognition module identifies "Son" as a name after having
seen "Mr. Son", enabling us to mark the noun phrase.
3.2
Noun Phrase Categorization
The next step after identifying the noun phrases is to determine their values for
relevant features to be used by the resolution engine. These properties include the
principal noun or head (in the case of conjunctions, the heads of all disjuncts are
listed), whether it is a subject or not, and whether each of "he", "she", "it" and
"they" can refer to it. The noun phrase categorization module builds up a table of
noun phrases that have been observed in the document. Noun phrases in the table
have a reference field used to mark anaphoric reference to a previously seen noun
phrase. The coreference module is the one in charge of filling that column of the
table. An example can be seen in Figure 3-4.
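To make the table's contents concrete, here is a rough sketch of how one row of the Noun Phrase Table could be represented. The class and field names are illustrative inventions, not BRANQA's actual source.

import java.util.EnumSet;
import java.util.List;

// A rough sketch of one row of the Noun Phrase Table (Figure 3-4). The class and
// field names are hypothetical, not BRANQA's actual ones.
class NounPhraseEntry {
    enum Pronoun { HE, SHE, IT, THEY }

    final int id;                       // row number in the table
    final int sentence;                 // index of the containing sentence
    final String text;                  // surface text, e.g. "a new guitar"
    final List<String> heads;           // one head per disjunct for conjunctions
    final boolean subject;              // linked to a verb by a subject link type?
    final EnumSet<Pronoun> validRefs;   // pronouns that may refer to this phrase
    Integer reference;                  // antecedent row; null marks a pleonastic pronoun

    NounPhraseEntry(int id, int sentence, String text, List<String> heads,
                    boolean subject, EnumSet<Pronoun> validRefs) {
        this.id = id;
        this.sentence = sentence;
        this.text = text;
        this.heads = heads;
        this.subject = subject;
        this.validRefs = validRefs;
        this.reference = id;            // a new phrase initially refers to itself
    }

    public static void main(String[] args) {
        // Row 2 of Figure 3-4: "a new guitar", head "guitar", only "it" can refer to it.
        NounPhraseEntry guitar = new NounPhraseEntry(
            2, 1, "a new guitar", List.of("guitar"), false, EnumSet.of(Pronoun.IT));
        System.out.println(guitar.text + " -> valid refs " + guitar.validRefs);
    }
}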
3.2.1
Heads and Subject
Finding the head of a noun phrase is accomplished through examination of the link
grammar representation of the sentence. The main noun of a simple noun phrase
is the only word with a link crossing the boundaries of the phrase.
In the case
of conjunctions several words can link outside of the phrase; the heads of all noun
phrases that comprise the conjunction are then listed. Occasionally this happens
in noun phrases that are not conjunctions, but a small list of rules helps select the
correct word in most of these cases.
The value for the subject column is obtained directly from the link grammar parse.
If the phrase is a subject, its head will be linked to a verb using one of the subject
link types (S, SF, SX, SI, SFI or SXI). In passive constructions the surface subject
will be linked to the verb through one of these link types, so care must be taken to
not mark it as a subject. This is done by checking for a Pv link, which marks the use
of a passive verb.
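The following simplified sketch illustrates this test; the Link class is a hypothetical stand-in for what the Link Parser wrapper exposes, and subject-inversion constructions are ignored for brevity.

import java.util.List;
import java.util.Set;

// A simplified sketch (not BRANQA's code) of the subject test described above:
// the head is a subject if it carries one of the subject link types and the verb it
// attaches to is not passive (no Pv link).
class Link {
    final String type;      // e.g. "Ss", "SFsi", "Pv"
    final int left, right;  // word indices of the two ends of the link
    Link(String type, int left, int right) {
        this.type = type; this.left = left; this.right = right;
    }
}

class SubjectTest {
    private static final Set<String> SUBJECT_TYPES =
        Set.of("S", "SF", "SX", "SI", "SFI", "SXI");

    static boolean isSubject(int headIndex, List<Link> linkage) {
        for (Link l : linkage) {
            if (l.left == headIndex && hasSubjectType(l.type)) {
                final int verb = l.right;
                // In a passive construction the verb carries a Pv link; skip those.
                boolean passive = linkage.stream().anyMatch(
                    x -> x.type.startsWith("Pv") && (x.left == verb || x.right == verb));
                if (!passive) {
                    return true;
                }
            }
        }
        return false;
    }

    private static boolean hasSubjectType(String linkType) {
        // Link types carry subscripts ("Ss", "SFsi", ...), so compare by prefix.
        return SUBJECT_TYPES.stream().anyMatch(linkType::startsWith);
    }

    public static void main(String[] args) {
        // "he would kiss Mary": Ss connects "he" (index 0) to "would" (index 1).
        List<Link> linkage = List.of(
            new Link("Ss", 0, 1), new Link("I", 1, 2), new Link("Os", 2, 3));
        System.out.println(isSubject(0, linkage)); // true
    }
}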
3.2.2
Valid References
The most important task performed by the noun categorization module is determining
which pronouns can refer to the noun phrases.
The main resources used for this task are the Named Entity module, Wordnet [10],
and a list of male and female common nouns. The nouns in Wordnet were split into
three lists according to whether they always, sometimes, or never indicate a person.
This was done by checking whether the word was a hyponym of the synset² "person,
individual, someone, somebody..." in all, some, or none of its senses. Many of the
² A synset is a set of synonyms; it is the basic element in the Wordnet hierarchy of meanings. Synset A is a hyponym of synset B if A "IS-A" or "IS-A-KIND-OF" B.
words that were in the sometimes-a-person list were then moved to either the never-a-person or always-a-person lists if they were assigned to that list because of senses
which are very infrequent uses of the word. A similar procedure was used to generate
a list of collective nouns like "team" by looking for hyponyms of "group, grouping".
Proper names are handled by the Named Entity module. For the rest of the noun
phrases, the module looks at the head of the noun phrase and checks for number and
gender by using the aforementioned lists, Wordnet's list of irregular plurals and some
pluralization rules.
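As a toy illustration of the hyponym test (not the actual Wordnet interface), the check can be thought of as a walk up a hypernym map; the map contents below are made-up examples.

import java.util.List;
import java.util.Map;

// A toy version of the "is this noun a person?" test described above. A real
// implementation walks WordNet's hypernym hierarchy; here a small hand-built map
// stands in for it, so both the data and the method names are purely illustrative.
class PersonCheck {
    // word/synset -> its direct hypernyms
    static final Map<String, List<String>> HYPERNYMS = Map.of(
        "emperor", List.of("ruler"),
        "ruler", List.of("person"),
        "team", List.of("group"),
        "diskette", List.of("artifact"));

    static boolean isHyponymOf(String word, String ancestor) {
        if (word.equals(ancestor)) {
            return true;
        }
        for (String parent : HYPERNYMS.getOrDefault(word, List.of())) {
            if (isHyponymOf(parent, ancestor)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isHyponymOf("emperor", "person"));  // true
        System.out.println(isHyponymOf("team", "group"));      // true (collective noun)
        System.out.println(isHyponymOf("diskette", "person")); // false
    }
}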
3.3
Named Entity Recognition
A very simple Named Entity recognition module was built to assist in the identification of noun phrases and the determination of valid references. The module knows
about male and female first names, countries, and US states. It also uses heuristics
to recognize unknown names.
For example, it identifies as company names those sequences of capitalized words
ending in an element of a set containing "Company", "Co", "Inc" and other words
that indicate the entity is a company. Once it has seen the full name ending in one of
these words, it will recognize subsequences of the words in the name as coreferent with
the full name (e.g. after seeing "Lockheed Martin Corp." it will identify "Lockheed",
a word it doesn't know, as a company).
A similar treatment is given to names of
people, which are identified if they contain known first names or personal titles like
"Dr." , "Ms." or "Capt.".
When a capitalized sequence of words is a subsequence of more than one previously
identified named entity, the module will not mark it as coreferent with any of them,
and will set its valid references to be the union of the valid references for the matching
named entities. For example, after seeing "Janis Joplin", "Joplin" would be resolved
to "Janis Joplin" and identified as a female name. If after that "Peter Joplin" is
mentioned in the same article, future mentions of "Joplin" will be left unresolved,
but they will be identified as persons. If the module had not seen any of the full
names, "Joplin" would not be marked as a person, as it could be referring to a
company or a place.
Since articles tend to use full names when a company or person is first mentioned,
this strategy gives good performance without requiring a large list of company names
or last names. However, a good list of companies would certainly help, especially in
the case of household names, since these often show up without a "Co.", "Inc." or
any other indication that it is a company.
The scope of person names is limited to the document where they are found,
i.e., the module forgets the names of people when the system starts working on a
new document. Company names, on the other hand, are not forgotten; after seeing
"Lockheed Martin Corp." in one article, "Lockheed" will be identified as a company
in all subsequent articles.
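A minimal sketch of the company-name heuristic might look as follows; the designator list, class names, and subsequence test are illustrative only, not BRANQA's actual module.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Illustrative sketch of the company-name heuristic described above: remember full
// names ending in a corporate designator, then recognise capitalized subsequences of
// a remembered name as the same company.
class CompanyNameRecognizer {
    private static final Set<String> DESIGNATORS =
        Set.of("Company", "Co", "Co.", "Corp", "Corp.", "Inc", "Inc.", "Ltd", "Ltd.");

    // Full company names seen so far, kept across documents as described above.
    private final List<List<String>> knownCompanies = new ArrayList<>();

    // Call on each capitalized word sequence found in the text.
    void observe(List<String> capitalizedWords) {
        String last = capitalizedWords.get(capitalizedWords.size() - 1);
        if (DESIGNATORS.contains(last)) {
            knownCompanies.add(new ArrayList<>(capitalizedWords));
        }
    }

    // True if the sequence is a subsequence of the words of a known company name.
    boolean looksLikeKnownCompany(List<String> capitalizedWords) {
        for (List<String> company : knownCompanies) {
            if (isSubsequence(capitalizedWords, company)) {
                return true;
            }
        }
        return false;
    }

    private static boolean isSubsequence(List<String> small, List<String> big) {
        int i = 0;
        for (String word : big) {
            if (i < small.size() && small.get(i).equals(word)) {
                i++;
            }
        }
        return i == small.size();
    }

    public static void main(String[] args) {
        CompanyNameRecognizer r = new CompanyNameRecognizer();
        r.observe(Arrays.asList("Lockheed", "Martin", "Corp."));
        System.out.println(r.looksLikeKnownCompany(Arrays.asList("Lockheed"))); // true
    }
}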
3.4
Coreference Module
Once noun phrases have been identified, it is the turn of the coreference module to
find what they refer to. Noun phrases are resolved left to right, filling the reference
column of the Noun Phrase Table. Possible values for this column are null, unresolved
or a reference to a noun phrase in the table.
When two noun phrases are identified as coreferent, each gets its set of valid
references reduced to the intersection of the two sets. For example in "Kublai Khan,
first Emperor of the Yuan Dynasty", the noun phrases before and after the comma
will be marked coreferent. Initially the system doesn't know that "Kublai Khan"
refers to a man, or even a person, so it will allow reference by "he", "she" and "it".
But when coreference is found with "first Emperor of the Yuan Dynasty", "Kublai
Khan" will get its set of valid references reduced to only "he", since BRANQA knows
that "Emperor" refers to a male person.
The coreference module is comprised of three components: a pleonastic pronoun
detector, a syntactic filter, and the resolution procedure. The pleonastic pronoun
detector identifies non-referential instances of it and the syntactic filter uses binding
constraints to eliminate syntactically disallowed candidates. The resolution procedure
is the component that determines the coreference relations. The following subsections
describe these components in more detail.
3.4.1
Pleonastic pronoun detector
It is important to detect pleonastic instances of it (as the one starting this sentence),
in order to avoid assigning referents to pronouns that are non-referential. The Link
Parser detects some of these instances and uses the link type SF to signal them.
However, there are several cases which the Link Parser does not notice. The pleonastic
pronoun detector supplements the parser with a set of rules for detection. These are
based on the rules in [19] presented in Chapter 2, together with some rules added to
handle uncovered cases (e.g., "It's been a long time ...").
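For illustration, a surface-pattern detector in the spirit of the Figure 2-3 constructions could be sketched as below. The word lists and regular expressions are simplified examples of my own and are not BRANQA's actual rules.

import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only: a surface-pattern pleonastic-"it" detector along the lines
// of the Figure 2-3 constructions. The word lists are sample entries standing in for
// RAP's ModalAdj and CogV classes.
public class PleonasticItDetector {
    private static final String MODAL_ADJ =
        "(necessary|possible|certain|likely|important|good|useful|advisable|"
        + "convenient|sufficient|economical|easy|desirable|difficult|legal)";
    private static final String COG_V_ED =
        "(recommended|thought|believed|known|anticipated|assumed|expected)";

    private static final List<Pattern> PATTERNS = List.of(
        // It is ModalAdj that S / It is ModalAdj (for NP) to VP
        Pattern.compile("\\bit (is|was|may be|is not) " + MODAL_ADJ + " (that|to|for)\\b",
                        Pattern.CASE_INSENSITIVE),
        // It is CogV-past-tense that S
        Pattern.compile("\\bit (is|was) " + COG_V_ED + " that\\b", Pattern.CASE_INSENSITIVE),
        // It seems/appears/means/follows (that) S
        Pattern.compile("\\bit (seems|appears|means|follows)\\b", Pattern.CASE_INSENSITIVE),
        // NP makes/finds it ModalAdj (for NP) to VP
        Pattern.compile("\\b(makes|finds) it " + MODAL_ADJ + "\\b", Pattern.CASE_INSENSITIVE),
        // It is time to VP / It is thanks to NP that S
        Pattern.compile("\\bit (is|was) (time to|thanks to)\\b", Pattern.CASE_INSENSITIVE),
        // An added case like the one mentioned above: "It's been a long time ..."
        Pattern.compile("\\bit'?s been\\b", Pattern.CASE_INSENSITIVE));

    // Returns true if the sentence matches one of the pleonastic constructions.
    public static boolean isPleonastic(String sentence) {
        return PATTERNS.stream().anyMatch(p -> p.matcher(sentence).find());
    }

    public static void main(String[] args) {
        System.out.println(isPleonastic("It is necessary to finish early."));  // true
        System.out.println(isPleonastic("It broke when Jenny dropped it."));   // false
    }
}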
3.4.2
Syntactic filter
The syntactic filter is used to rule out reference to noun phrases on the basis of
intrasentential binding constraints. Chapter 2 mentioned Government and Binding
Theory, and the fact that it can be used to constrain the search for antecedents.
However, it is not easy to obtain from a link grammar the syntactic structure that
we need to make direct use of binding constraints. In theory, we should be able to
construct the necessary categories from the link grammar representation (and the
Link Parser already helps by providing some constituent structure), but in practice
the mapping from a non-categorial grammar to one based on phrase-structure is not
so easy.
A better match to our system is the set of binding constraints for English Slot
Grammar [23] presented in [19, 20] (explained here in Chapter 2). Slot Grammar
belongs to the set of dependency grammars, and these are very similar to link grammars. Quoting Sleator [34]: "In a dependency grammar, a grammatical sentence is
endowed with dependency structure, which is very similar to a linkage. This structure, as defined by Mel'čuk [24], consists of a set of planar directed arcs among the
words that form a tree. Each word (except the root word) has an arc to exactly one
other word, and no arc may pass over the root word. In a linkage (as opposed to a
dependency structure) the links are labeled, undirected, and may form cycles, and
there is no notion of a root word."
While we cannot directly apply the algorithms in [19, 20] to the linkages obtained
from the Link Parser, it is possible to extract some of the information that would
be present in English Slot Grammar by adding direction to the links. The current
implementation does this for a few link types covering many important cases. Future
work includes improving the syntactic filter to detect more cases of syntactically
invalid reference.
3.4.3
Resolution procedure
The resolution procedure is the core of the coreference module. It uses the Named Entity module, the Noun Phrase Table and the other two components of the coreference
module to make decisions regarding coreference relations.
It is not directly dependent on the representation of the parse trees, and this modularity allows experiments with the resolution strategies and the linguistic resources
to be carried out independently of each other.
The current resolution procedure concentrates on pronominal anaphora, but it also
resolves some simple cases of coreference between noun phrases, namely coreference
between named entities, coreference with appositional phrases, and coreference with
modifiers of a named entity.
Named Entities
The Named Entity Module is used to mark coreference between named entities. When
it identifies a noun phrase as matching one of the previously seen named entities, the
coreference module marks the two expressions as coreferent in the Noun Phrase Table.
Appositional Phrases
Appositional phrases are typically used to provide an alternative description or name
for an entity. The module recognizes appositions by checking for noun phrases of the
form: (NP <token>+ , (NP <token>+) ,). For example:
(NP Luca Prodan, (NP the great singer, ...))
(NP the great singer, (NP Luca Prodan, ...))
Here the appositional phrases are marked coreferent with the first noun phrase.
A common use of appositions which does not indicate coreference is in the names
of places (e.g., "Cambridge, Massachusetts"). The system checks for this possibility
using a list of countries and U.S. states, which manages to cover the most common
cases in U.S. newspaper articles. In the near future I plan to improve coverage by
using a larger list of names, including well known U.S. and foreign cities.
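As a rough illustration, the apposition pattern can be matched against a flat bracketing with a regular expression; a real implementation walks the constituent tree, so the regex and class name below are only meant to make the pattern concrete.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the apposition pattern (NP <token>+ , (NP <token>+) ,) applied to a flat
// bracketing like the examples above. Illustrative only.
class AppositionMatcher {
    private static final Pattern APPOSITION =
        Pattern.compile("\\(NP ([^(),]+), \\(NP ([^(),]+),");

    public static void main(String[] args) {
        Matcher m = APPOSITION.matcher("(NP Luca Prodan, (NP the great singer, ...))");
        if (m.find()) {
            // The two noun phrases are marked coreferent.
            System.out.println(m.group(1).trim() + " <-> " + m.group(2).trim());
        }
    }
}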
Modifiers of Named Entities
The case of modifiers of named entities is similar to that of appositions. For phrases of
the form: (NP (NP <token>+) <named-entity>), the embedded modifier is marked
as coreferent with the whole phrase. An example of this kind of construction is:
(NP (NP famous singer) Luca Prodan) ...
Pronominal Anaphors
We finally get to the reason behind all previously described components, which is resolving pronominal anaphors. The other three cases of coreference marked are there
to allow the resolution procedure to work correctly when resolving pronouns. The resolution strategy used belongs to the traditional approach to anaphora resolution, i.e.,
discounting unlikely candidates and then making use of heuristics to pick a referent
from the remaining set of plausible candidates.
The system eliminates from consideration all noun phrases that do not pass the
morphological and syntactic filters. The morphological filter eliminates all phrases
that do not agree in gender and number with the pronoun. This is done by checking
the valid reference columns in the Noun Phrase Table (e.g., themselves cannot refer
to a noun phrase whose value in the they column is false).

1. Unique in Discourse
2. Reflexive
3. Unique in Current + Prior
4. Unique in Current
5. Search Back (until > 1 candidates)
6. Unique Current Subject / Subject Pron
7. Unique Prior Subject / Subject Pron

Figure 3-5: Resolution rules in BRANQA
The syntactic filter further removes from consideration all those noun phrases that
are ruled out by binding constraints. Here the system also makes use of coreference
links marked between noun phrases to apply the constraints. For example:
Peter_i made fun of John Smith. John beat him_i up.
In the second sentence, binding constraints forbid him from referring to John.
Since John will be marked coreferent with John Smith, this also eliminates John Smith
from consideration, leaving a single possible antecedent, Peter.
The system then uses heuristics to pick an antecedent from the remaining noun
phrases. The heuristics used are taken from the CogNIAC system, described in Section 2.3.2. The core rules of the system are used, together with a rule to search back
for a unique antecedent when no possible antecedents are found (Search Back) and a
rule that looks for a unique antecedent in the subject of the current sentence (Unique
Current Subj). Rule 4, Possessive Pro, was excluded since it was eliminated when
preparing CogNIAC for MUC-6. The rules used by BRANQA are listed in order of
evaluation in Figure 3-5.
In evaluating the rules, when checking for a "single possible antecedent" we count
possible entities, not possible expressions; that is, if there is more than one possible
antecedent but all possible antecedents refer to the same entity, the rule is allowed to
trigger. This is one of the reasons why resolving coreference between non-pronominal
noun phrases helps with pronoun resolution.
Before trying to apply any of the rules, the pleonastic pronoun detector is used
to check if this is an instance of non-referential it, in which case it is assigned null
reference. The rules are then evaluated in order, and if none of them trigger, the
pronoun is marked unresolved in the Noun Phrase Table.
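The overall flow can be sketched as a simple rule cascade. The classes below are simplified stand-ins of my own (only the Unique in Discourse rule is shown), but they illustrate the two properties discussed above: rules fire in a fixed order, and a rule triggers on a unique entity rather than a unique expression.

import java.util.ArrayList;
import java.util.List;

// Sketch of the resolution flow in this section: filter candidates, then try the
// Figure 3-5 rules in order and stop at the first one that picks an antecedent.
// All names here are simplified stand-ins, not BRANQA's actual code.
class Candidate {
    final int entityId;     // id of the entity the phrase refers to
    final String text;
    Candidate(int entityId, String text) { this.entityId = entityId; this.text = text; }
}

interface Rule {
    // Returns the chosen antecedent, or null if the rule does not trigger.
    Candidate pick(List<Candidate> candidates);
}

class Resolver {
    // "Unique in Discourse": triggers only when all candidates denote one entity.
    static final Rule UNIQUE_IN_DISCOURSE = candidates -> {
        long entities = candidates.stream().map(c -> c.entityId).distinct().count();
        return entities == 1 ? candidates.get(0) : null;
    };

    private final List<Rule> rules;     // in the order of Figure 3-5
    Resolver(List<Rule> rules) { this.rules = rules; }

    // Returns the antecedent, or null if no rule triggers (pronoun left unresolved).
    Candidate resolve(List<Candidate> afterFilters) {
        for (Rule r : rules) {
            Candidate answer = r.pick(afterFilters);
            if (answer != null) {
                return answer;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = new ArrayList<>();
        candidates.add(new Candidate(1, "Peter"));
        candidates.add(new Candidate(1, "Peter"));  // same entity mentioned twice
        Candidate answer = new Resolver(List.of(UNIQUE_IN_DISCOURSE)).resolve(candidates);
        System.out.println(answer.text); // Peter
    }
}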
Chapter 4
Evaluation
In this chapter I evaluate the performance of BRANQA on a test corpus. I explain
the experimental procedure and present the results.
4.1
MUC-7 Coreference Task Corpus
Evaluation was performed on the newspaper articles used in the MUC-7 Coreference Task [15]. These articles have been annotated for coreference using SGML tags,
allowing one to procedurally check the correctness of BRANQA's decisions. Coreference relations are tagged between markables: nouns, noun phrases and pronouns.
Pronouns include both personal and demonstrative pronouns, and with respect to
personal pronouns, all grammatical cases, including the possessive. Dates ("January
23"), currency expressions ("$1.2 billion"), and percentages ("17%") are considered
noun phrases.
Coreference relations are marked only between pairs of elements both of which are
markables. This means that in those cases where the antecedent is a clause rather
than a markable the relation will not be annotated.
Referring expressions and their antecedents are marked as follows:
<COREF ID="100">Lawson Mardon Group Ltd.</COREF> said <COREF ID="101"
TYPE="IDENT" REF="100">it</COREF> ...
All markables have a unique ID within the document, which is used to refer to
them through the REF attribute of COREF tags.
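For illustration, the key could be read into an ID-to-antecedent map along the following lines; a real reader would use an SGML parser and handle nested markables, so the regular expression here is only a sketch.

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of reading MUC-7 style COREF annotations into an ID -> REF map,
// which an evaluation module could use to check a resolver's answers against the key.
class CorefKeyReader {
    private static final Pattern COREF = Pattern.compile(
        "<COREF\\s+ID=\"(\\d+)\"(?:[^>]*?REF=\"(\\d+)\")?[^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Maps each markable ID to the ID it refers to (null when it starts a chain).
    static Map<String, String> readKey(String annotatedText) {
        Map<String, String> key = new HashMap<>();
        Matcher m = COREF.matcher(annotatedText);
        while (m.find()) {
            key.put(m.group(1), m.group(2)); // group(2) is null when there is no REF
        }
        return key;
    }

    public static void main(String[] args) {
        String text = "<COREF ID=\"100\">Lawson Mardon Group Ltd.</COREF> said "
            + "<COREF ID=\"101\" TYPE=\"IDENT\" REF=\"100\">it</COREF> ...";
        System.out.println(readKey(text)); // prints {100=null, 101=100} (order may vary)
    }
}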
Two sets of articles were available for testing, one specified "dry-run" and the
other one "formal", used in different stages of the MUC-7 evaluation. The "dry-run"
set was used for this evaluation, saving the other set for future experiments. This set
consists of thirty New York Times articles, most of them regarding airplane crashes.
4.2
Test Procedure
The calls to BRANQA's pronoun resolution procedure were instrumented so that it
would send its answers through an evaluation module, which checked them against
the key.
In evaluating the system, errors were not chained; that is, answers were corrected,
if possible, before proceeding to resolve the next pronoun. After resolving a pronoun,
the evaluation procedure recorded the answer and checked it against the key. If it
was wrong, it attempted to find a noun phrase in the Noun Phrase Table that would
match the one in the key. This was not always possible for two reasons: sometimes
the pronoun was not marked on the key because it had no markable antecedent, and
sometimes parser errors caused BRANQA not to identify the marked noun phrase.
In both cases the pronoun was marked unresolved in the Noun Phrase Table before
going on to the next pronoun. If the pronoun was not marked for lack of a markable
antecedent, the evaluation module considered an answer of unresolved as correct.
4.3
Results
Table 4.1 shows BRANQA's precision and recall characteristics on 336 third person
pronouns in the test corpus, broken down by rule. The resolution of a pronoun to null
(for the pleonastic case) was considered correct if the pronoun had no antecedent in
the key (which could happen either if the pronoun was actually pleonastic, or if it had
an antecedent that was not markable). I checked the eight cases marked pleonastic
in the test and they were all in fact pleonastic.
Rule                 | Contribution to Recall | Precision
Pleonastic           |  2% (8/336)            | 100% (8/8)
Unique in Disc.      |  6% (20/336)           | 100% (20/20)
Reflexive            |  1% (4/336)            |  80% (4/5)
Unique Cur+Prior     | 14% (47/336)           |  85% (47/55)
Unique Cur           | 17% (58/336)           |  95% (58/61)
Search Back          |  0% (0/336)            | never used
Subject Cur          |  3% (10/336)           |  83% (10/12)
Subject Prev         |  6% (21/336)           |  84% (21/25)
Unresolved (correct) |  3% (10/336)           | 100% (10/10)
Total                | 53% (177/336)          |  91% (177/195)

Table 4.1: Test Results by Rule
The "Unresolved (correct)" line of the table shows the number of pronouns that
were left unresolved but had no antecedent in the key, and were thus considered
correct for the purpose of computing precision and recall statistics.
Table 4.2 shows the results broken down by pronoun.
The precision/recall characteristics of BRANQA are comparable to those of
CogNIAC. In the first experiment on narrative texts CogNIAC achieved 92% precision for 64% recall, and in the second test, on MUC-6 documents, it yielded 73%
precision for a recall of 75%. Especially in the second case, CogNIAC's recall is considerably
higher than that of BRANQA, but this came at a significant cost in precision.
Of the 18 incorrect resolutions, six occurred in cases where no antecedent was
marked in the key (this includes pleonastic pronouns, but also cases where the
antecedent was not markable, e.g., "they" referring to two people mentioned in
separate sentences). Three incorrect resolutions can be attributed to misclassification
of a word according to gender and number, another three to parser errors, and the
remaining six to failures of the resolution rules.
Of the unresolved cases, several could have been resolved with the existing rules
if not for misclassification of words, failure to eliminate candidates by the syntactic
filter, and parser errors leading to faulty identification of noun phrases. A detailed
case-by-case analysis of the 140 unresolved pronouns was not carried out.
Pronoun      Correct   Wrong   Unresolved   Precision   Recall
he               37       0        24          100%       61%
she              11       1         7           92%       58%
it               40       7        30           85%       52%
they             14       5        13           74%       44%
him               2       0         4          100%       33%
his              36       0        13          100%       73%
himself           0       1         0            0%        0%
her              11       1         4           92%       69%
hers              0       0         0            -         -
herself           0       0         0            -         -
themselves        2       0         0          100%      100%
its              12       1        23           92%       33%
itself            3       0         0          100%      100%
them              1       0        11          100%        8%
their             9       2        11           82%       41%
theirs            0       0         0            -         -
Total           178      18       140           91%       53%

Table 4.2: Test Results by Pronoun
4.4
Effect on Question Answering
I chose to use the rules from Baldwin's system because it was developed with a focus
on high-precision coreference, and I believe that high precision is more important than
high recall for question answering. I think it is better not to give an answer
than to give the wrong answer. On the other hand, the best way not to make mistakes
is to never attempt to resolve pronouns. The purpose of a resolution tool is to raise
the recall characteristics of the systems using it, and thus a balance must be struck
between precision and recall.
More important than precision/recall values for resolution on a test corpus are
the effects that the tool has on the precision/recall characteristics of the systems
using it. Lacking a test set for Sapere, I could not evaluate BRANQA's effect on it.
However, I did look at some articles from the World Book Encyclopedia that Sapere
indexed, in order to get an idea of the potential benefits for question answering.
I ran BRANQA on the articles, and then evaluated its results manually, since no
coreference annotations had been made.
Taking the article on Afghanistan as an example, we find that out of 42 third-person
pronoun occurrences, the system resolved 27 correctly, resolved 3 incorrectly, and left
12 unresolved (all 12 had a markable antecedent).
An example of useful resolution for question answering is that of "he" and "his" to
"Abdur Rahman" in "After he died in 1901, his policies were continued by his son,
Habibullah Khan."
Since this is the only article in the Encyclopedia mentioning
Abdur Rahman, the resolution of "he" adds information that we did not have before.
As for the three incorrect resolutions, it seems unlikely that they would cause Sapere
to return wrong answers to questions people would actually ask. The mistakes, and
the reasons behind them, are as follows:
* "They" resolved to "their communities" instead of "Mullahs" in "They interpret
Islamic law and educate the young" (failure to recognize "Mullahs" as plural).
* "It" resolved to "the game" in "In the game, dozens of horsemen try to grab a
headless calf and carry it across a goal" (a bug in the syntactic filter eliminating
"a headless calf" from the set of possible antecedents)
* "His" resolved to "The British" instead of "Abdur Rahman Khan" in "The
British agreed to recognize his authority over the country's internal affairs"
(misclassification of "The British" as a person's name).
More formal testing is necessary, but from what I have seen so far I am led to
believe that BRANQA would improve Sapere's performance if used before indexing
relations.
Chapter 5
Future Work
In this chapter I present a number of ways in which the system will be improved in
the near future, and possible research projects that are suggested by this thesis.
5.1
Improvements
5.1.1
Quoted Speech
Several of the pronouns left unresolved in the evaluation could have been assigned an
antecedent if better machinery had been added to handle quoted speech. Quotations
are very common in newspaper articles, and it seems plausible to construct a module
that accurately keeps track of who the quoted speaker is. This can then be
used to add binding constraints for the pronouns in quotations. I expect this should
improve the performance of the system, at least for the domain of newspaper articles.
5.1.2
Named Entity Module
The named entity module developed for this system is very simple and leaves ample
room for improvement. A better named entity tagger is currently being developed
by the InfoLab Group, and I plan to integrate it into BRANQA when it becomes
available.
5.1.3
Syntactic Filter
It was previously mentioned that the current implementation of the syntactic filter
does not cover all cases that are ruled out by the binding constraints in [19, 20]. Only
a few of English Link Grammar's link types are currently used in our filter. Several
failures to resolve a pronoun in the test corpus were due to the syntactic filter failing
to establish disjoint reference. I plan to extend the coverage of the filter and to correct
some of the mistakes it currently makes.
5.2
Future research projects
5.2.1
Statistics as a proxy for world knowledge
Baldwin suggests that CogNIAC achieves good performance with a simple set of rules
because it works on those pronoun occurrences which do not need world knowledge
to be resolved. However, there are many cases which do need world knowledge for
adequate resolution.
BRANQA's current world knowledge is limited to classification of words according
to whether they denote people or groups, and small lists of people's names, personal
titles, country names and U.S. states. It doesn't know, for example, that days do not
own vessels, and this led to one of the incorrect resolutions in our test corpus, where
"their" in "their vessel" was resolved to "the past two days" because this phrase
passed the morphological and syntactic filters and triggered Rule 6, Unique Subject /
Subject Pron.
While it is extremely difficult to add large amounts of knowledge to rule out invalid
antecedents on semantic grounds, it is possible to extract statistics from a large
corpus to help disambiguate references. In the example above, we could have used the
fact that "Coonan and 10 other crew members" is a noun phrase referring to people
(something the system can currently determine) and statistics showing that people
co-occur with "vessel" in a possessive relation more often than "days" (infinitely more
often, in this case).
Dagan and Itai [8] follow this approach, using frequency of co-occurrence in
subject-verb-object relations to help resolve the pronoun "it". They show that
their system correctly handles many anaphors that BRANQA's rules would leave
unresolved. For example:
They knew full well that the companies held tax money_i aside for collection
later on the basis that the government said it_j would collect it_i.
By using the JLink relation extraction system currently in development by the
InfoLab Group, we can extend their approach to use other relations like modification
and possession, hopefully improving performance.
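As a rough sketch of the idea, candidate antecedents could be ranked by how often
their semantic class co-occurs with the head noun in the relevant relation. The counts
and interfaces below are hypothetical; JLink's actual output will differ in detail.

    # Illustrative sketch: rank antecedent candidates for "their" in "their vessel"
    # by corpus co-occurrence counts (hypothetical counts and interfaces).
    from collections import Counter

    # (relation, semantic class of candidate, head noun) -> corpus frequency,
    # e.g. gathered from parsed text by a relation extractor such as JLink.
    cooccurrence = Counter({
        ("possessive", "person", "vessel"): 17,
        ("possessive", "time",   "vessel"): 0,
    })

    def score(candidate_class, relation, head):
        return cooccurrence[(relation, candidate_class, head)]

    candidates = [("Coonan and 10 other crew members", "person"),
                  ("the past two days", "time")]
    best = max(candidates, key=lambda c: score(c[1], "possessive", "vessel"))
    print(best[0])   # -> Coonan and 10 other crew members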
5.2.2
Alternative resolution procedures
The literature shows a wide variety of approaches and methods developed for the
resolution of anaphora. Having laid down infrastructure that identifies noun phrases
and their relevant properties, we are now in a position to experiment with different
resolution procedures. This could potentially lead to a set of resolution
systems with different precision and recall characteristics from which developers could
choose according to their preferences. BRANQA was designed with a bias towards
high precision, but for some applications a higher recall tool might be desirable.
5.2.3
Integration with other systems
The most important future project will be the integration of BRANQA with the rest
of the group's systems. Our goal in this thesis was not to extend the state of the art in
anaphora resolution, but to build a useful tool that improves our question answering
performance. I evaluated the performance of the system on a test corpus, but the
crucial evaluation that remains to be done is testing how much it helps the other
systems developed by the group.
Integration with systems like Sapere, which work by first indexing a corpus, should
be straightforward. No changes need to be made to the original system if we simply
replace occurrences of pronouns in the corpora with the referents found by BRANQA.
This will allow for initial experiments to be performed without much work once we
have adequate test sets for our question answering systems.
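A first cut at such a substitution pass might look like the following Python sketch;
the "resolutions" mapping from pronoun character spans to referent strings is a
hypothetical interface to BRANQA's output.

    # Sketch of a substitution pass over a corpus document (hypothetical interface:
    # "resolutions" maps pronoun character spans to referent strings, or None).
    def substitute_pronouns(text, resolutions):
        # Replace from right to left so that earlier character offsets stay valid.
        for (start, end), referent in sorted(resolutions.items(), reverse=True):
            if referent is not None:
                text = text[:start] + referent + text[end:]
        return text

    text = "After he died in 1901, his policies were continued by his son."
    resolutions = {(6, 8): "Abdur Rahman", (23, 26): "Abdur Rahman's"}
    print(substitute_pronouns(text, resolutions))
    # After Abdur Rahman died in 1901, Abdur Rahman's policies were continued by his son.

Whether a possessive pronoun should be rewritten with a trailing 's, as in this toy
example, is one of the details to be settled during integration.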
If the systems using our output need to take into account the fact that our resolution
is not perfect, we could instead mark noun phrases with SGML coreference tags, as is
done in the MUC-7 evaluation. This would allow the users of our system to have access
to both the original pronoun and our resolution, leaving it up to them to decide what
to do with our output.
Integration with START will be more involved, as will be the evaluation of improvements to performance.
We envision START carrying out dialogues with its
users, using anaphora resolution to allow for more natural conversations. This will
require changes on the START side to allow this shift from a disconnected series of
questions and answers to actual dialogues.
Chapter 6
Contributions
This thesis contributes to research at the InfoLab Group by:
* Providing an overview of the relevant literature in anaphora resolution.
* Showing an independent replication of the resolution strategy used in the
CogNIAC system, achieving comparable results.
* Presenting the design and implementation of an anaphora resolution tool that,
in informal evaluation, appears to be helpful for improving the performance of
question answering systems.
* Building useful infrastructure that can be reused in future research projects: an
interface to the Link Parser attempting to fix some of its deficiencies, a noun
phrase categorization tool, a simple named entity module, and an architecture
that allows for easy experimentation with anaphora resolution methods.
I hope this work will bear fruit by improving the performance of our systems and
by motivating new projects that make use of anaphora resolution and coreference in
general. I believe it should at least provide a starting point for the development of
better systems that tackle question answering using coreference.
Bibliography
[1] James Allen. Natural Language Understanding. The Benjamin/Cummings Publishing Company Inc., Redwood City, California, second edition, 1995.
[2] B. Amit. Evaluation of coreferences and coreference resolution systems. In
Proceedings of the First Language Resource and Evaluation Conference, May 1998.
[3] Carl Lee Baker. English Syntax. MIT Press, Cambridge, Massachusetts, second
edition, 1995.
[4] Breck Baldwin. CogNIAC: High precision coreference with limited knowledge
and linguistic resources. In Proceedings of the ACL Workshop on Operational
Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts, pages
38-45, 1997.
[5] Susan E. Brennan, Marilyn W. Friedman, and Carl Pollard. A centering approach
to pronouns. In ACL Proceedings, 25th Annual Meeting, pages 155-162, 1987.
[6] Donna K. Byron and Joel R. Tetreault. A flexible architecture for reference
resolution. In Proceedings of the Ninth Conference of the European Chapter of
the Association for Computational Linguistics, 1999.
[7] Noam Chomsky. Lectures on Government and Binding. Foris Publications, 1981.
[8] Ido Dagan and Alon Itai. A statistical filter for resolving pronoun references. In
Y.A. Feldman and A. Bruckstein, editors, Artificial Intelligence and Computer
Vision, pages 125-135. Elsevier, 1991.
[9] Deborah A. Dahl and Catherine N. Ball. Reference resolution in PUNDIT. Technical Report CAIT-SLS-9004, Paoli: Center for Advanced Information Technology,
March 1990.
[10] Christiane Fellbaum, editor. Wordnet: An Electronic Lexical Database. MIT
Press, Cambridge, Massachusetts, 1998.
[11] N. Ge, J. Hale, and E. Charniak. A statistical approach to anaphora resolution.
In Proceedings of the Sixth Workshop on Very Large Corpora, pages 161-171,
1998.
[12] Barbara J. Grosz, Aravind K. Joshi, and Scott Weinstein. Providing a unified
account of definite noun phrases in discourse. In Proceedings of the 21st Annual
Meeting of the Association for Computational Linguistics, pages 44-50, 1983.
[13] Liliane Haegeman. Introduction to Government and Binding Theory. Blackwell,
1991.
[14] Irene Roswitha Heim. The Semantics of Definite and Indefinite Noun Phrases.
Doctor of Philosophy, University of Massachusetts, 1982.
[15] Lynette Hirschman and Nancy Chinchor. Coreference task definition v3.0. In
Proceedings of the Seventh Message Understanding Conference, July 1997.
[16] Jerry R. Hobbs. Pronoun resolution. Technical Report 76-1,
Department of Computer Science, City College, City University of New York,
1976.
[17] Boris Katz. Using English for indexing and retrieving. In P.H. Winston and S.A.
Shellard, editors, Artificial Intelligence at MIT: Expanding Frontiers, volume 1.
MIT Press, 1990.
[18] Boris Katz. Annotating the World Wide Web using natural language. In Proceedings of the 5th RIAO Conference on Computer Assisted Information Searching
on the Internet, 1997.
[19] Shalom Lappin and Herbert J. Leass. An algorithm for pronominal anaphora
resolution. Computational Linguistics, 20(4):535-561, 1994.
[20] Shalom Lappin and Michael McCord. A syntactic filter on pronominal anaphora
in slot grammar. In Proceedings of the 28th Annual Meeting of the Association
for Computational Linguistics, pages 135-142, 1990.
[21] Geoffrey Leech and Roger Garside. Running a grammar factory: the production
of syntactically analysed corpora or 'treebanks'. In Stig Johansson and
Anna-Brita Stenstrom, editors, English Computer Corpora: Selected Papers and
Bibliography. Mouton de Gruyter, 1991.
[22] Jimmy J. Lin. Indexing and Retrieving Natural Language Using Ternary Expressions. Master of Engineering, Massachusetts Institute of Technology, 2001.
[23] Michael McCord, Arendse Bernth, Shalom Lappin, and Wlodek Zadrozny. Natural language processing within a slot grammar framework. International Journal
of Artificial Intelligence Tools, 1(2):229-277, 1992.
[24] Igor A. Mel'čuk. Dependency Syntax: Theory and Practice. State University of
New York Press, 1988.
[25] Ruslan Mitkov. A new approach for tracking center. In Proceedings of the
International Conference "New Methods in Language Processing", 1994.
[26] Ruslan Mitkov. Factors in anaphora resolution: they are not the only things that
matter. In Proceedings of the ACL Workshop on Operational Factors in Practical,
Robust Anaphora Resolution for Unrestricted Texts, pages 14-21, 1997.
[27] Ruslan Mitkov. Anaphora resolution: The state of the art. Working paper (Based
on the COLING'98/ACL'98 tutorial on anaphora resolution), 1999.
[28] Ruslan Mitkov, Richard Evans, Constantin Orasan, Catalina Barbu, Lisa Jones,
and Violeta Sotirova. Coreference and anaphora: developing annotating tools,
annotated resources and annotation strategies. In Proceedings of the Discourse
Anaphora and Anaphora Resolution Colloquium (DARC'2000), pages 49-58,
2000.
[29] Thomas S. Morton. Using coreference in question answering. In Proceedings of
the 8"h Text REtrieval Conference (TREC-8), 1999.
[30] MUC-6 Program Committee. Coreference task definition v2.3. In Proceedings of
the Sixth Message Understanding Conference, November 1995.
[31] Candace Lee Sidner. Towards a computational theory of definite anaphora comprehension in English discourse. Technical Report AITR-537, MIT AI Lab, 1979.
[32] Candace Lee Sidner. Focusing in the comprehension of definite anaphora. In
Barbara Grosz, Karen Sparck Jones, and Bonnie Lynn Webber, editors, Readings
in Natural Language Processing. Morgan Kaufmann, 1986.
[33] Daniel Sleator and Davy Temperley. Parsing English with a link grammar. Technical Report CMU-CS-91-196, Carnegie Mellon University, October 1991.
[34] Daniel Sleator and Davy Temperley. Parsing English with a link grammar. In
Third International Workshop on Parsing Technologies, August 1993.
[35] Joel R. Tetreault. Analysis of syntax-based pronoun resolution methods. In
Proceedings of the Association for Computational Linguistics, 1999.
[36] Marilyn A. Walker, Masayo Iida, and Sharon Cote. Japanese discourse and the
process of centering. Technical Report IRCS Report No. 92-14, The Institute for
Research in Cognitive Science, University of Pennsylvania, 1992.
[37] Patrick H. Winston. Artificial Intelligence. Addison-Wesley, Reading, Massachusetts, third edition, 1992.