Presentation - Language Technologies Research Centre

Anaphora Resolution
Sobha Lalitha Devi
AU-KBC Research Centre
MIT Campus of Anna University
Chennai-44
sobha@au-kbc.org
Contents
Introduction to Anaphora and Anaphora Resolution
Types of Anaphora
Process of Anaphora Resolution
Tools
Applications
References
Introduction
What is
Anaphora
Antecedent
Anaphora Resolution
1. Sabeer Bhatia arrived at Los Angeles International
Airport at 6 p.m. on September 23, 1998. His flight
from Bangalore had taken 22hrs and he was
starving.
[RD, NOV 2000]
Etymology of Anaphora
ANA- Back, Upstream, Back upstream
Phora- Act of Carrying
Anaphora - Act of Carrying Back
What is Anaphora
Anaphora, in discourse, is a device for making an
abbreviated reference (containing fewer bits of
disambiguating information, rather than being
lexically or phonetically shorter) to some entity (or
entities) in the expectation that the receiver of the
discourse will be able to disabbreviate the reference
and, thereby, determine the identity of the entity.
(Hirst 1981)
Cataphora

When the anaphor precedes the
antecedent

Because she was going to the
departmental store, Mary was asked
to pick up the vegetables.
Relevance from the Linguistics
point of view



Binding Theory is one of the major results of
the principles and parameters approach
developed in Chomsky (1981) and is one of
the mainstays of generative linguistics.
The Binding Theory deals with the relations
between nominal expressions and possible
antecedents.
It attempts to provide a structural account of
the complementarity of distribution between
pronouns, reflexives and R-expressions.
Dichotomy Between Linguistics and
NLP
The Binding Theory (and its various formulations)
deals only with intra-sentential anaphora,
a very small subset of the anaphoric phenomena
that practical NLP systems are interested in
resolving.
A much larger set of anaphoric phenomena involves
resolving pronouns inter-sententially.
This problem is dealt with by Discourse
Representation Theory and more specifically by
Centering Theory (Grosz et al., 1995).
Types of Anaphors
The Prime Minister is yet to arrive and he is
expected at the central hall at any time. [The
Times of India, Feb 2001]
This book is about Anaphora Resolution. The
book is designed to help beginners in the
field and its author hopes that it will be
useful.
John screamed, as did Mary.
Pronominal anaphora
Vajpayee hits back forcefully when he told the
opposition today “sometimes we fall prey to
the media and sometimes you do.” [Indian
Express 2001]
Possessive
Priyanka eats only chicken sandwiches
before going to take any exam; nothing else
goes down her gullet that day. [Indian Express, 13
March 2001]
Reflexive Pronoun
Finally, Danian heaved himself up and lay on a waiting
stretcher.
Demonstrative Pronoun
John had lots of packing to do before he shifted his
house. This was something he never liked….
Relative Pronoun
Stumper Sameer Dige, who made his test debut, failed
to show fast reflexes when it mattered.
Pleonastic It
Cognitive
a. It is believed that…..
b. It appears that…..
Modal Adjectives
c. It is dangerous……
d. It is important…..
Temporal
e. It is five o’clock
f. It is winter
Weather verbs
g. It is raining
h. It is snowing
Distance
i. How far is it to Chennai?
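The categories above lend themselves to simple pattern matching. What follows is a minimal sketch, assuming a handful of hand-written regular expressions and tiny illustrative word lists. The names `PLEONASTIC_PATTERNS` and `is_pleonastic` are hypothetical and are not taken from any of the systems discussed in these slides.

```python
import re

# Illustrative word lists only (assumption); a real system would need far larger ones.
COGNITIVE = r"(?:believed|thought|known|assumed|expected)"
MODAL_ADJ = r"(?:dangerous|important|necessary|possible|likely)"
WEATHER = r"(?:raining|snowing|thundering)"
TEMPORAL = r"(?:\w+\s+o['’]clock|winter|summer|spring|autumn)"

PLEONASTIC_PATTERNS = [
    re.compile(rf"\bit\s+is\s+{COGNITIVE}\s+that\b", re.I),  # It is believed that ...
    re.compile(r"\bit\s+(?:appears|seems)\s+that\b", re.I),  # It appears that ...
    re.compile(rf"\bit\s+is\s+{MODAL_ADJ}\b", re.I),         # It is dangerous ...
    re.compile(rf"\bit\s+is\s+{TEMPORAL}\b", re.I),          # It is five o'clock
    re.compile(rf"\bit\s+is\s+{WEATHER}\b", re.I),           # It is raining
    re.compile(r"\bhow\s+far\s+is\s+it\b", re.I),            # How far is it to ...
]


def is_pleonastic(sentence: str) -> bool:
    """Return True if the sentence matches one of the pleonastic-it patterns."""
    return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)
```

A pronoun flagged this way would simply be skipped by the resolver, since it has no antecedent to find.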
Non-anaphoric uses of pronouns
He that plants thorns must never expect to gather
roses.
He who dares wins.
Deictic
He seems remarkably bright for a child of his age.
Noun Phrase Anaphora
Definite descriptions and Proper names
Roy Keane has warned Manchester United he may
snub their pay deal. United’s skipper is even
hinting that unless the future Old Trafford Package
meets his demands, he could quit the club in June
2000. Irishman Keane, 27, still has 17 months to
run on his current 23,000 pound a week contract
and wants to commit himself to United for life. Alex
Ferguson’s No 1 player confirmed: “If it’s not the
contract I want, I won’t sign”.
Coreference
Computational Linguists from many
different countries attended the tutorial.
The participants found it hard to cope
with the speed of the presentation,
nevertheless they managed to take
extensive notes.
What is Anaphora Resolution

Anaphora resolution is the process of finding the
antecedent of an anaphor.
Anaphor: the reference that points back to a previous
item.
Antecedent: the entity to which the anaphor
refers.
Different Approaches In Anaphora
Resolution

Rule Based

Statistical Based
Lappin and Leass (1994) Anaphora
Resolution Algorithm
The Lappin and Leass (1994) anaphora resolution algorithm uses
salience weights in determining the antecedent of a pronominal.
It requires as input a fully parsed sentence structure and
uses the parse hierarchy to identify the subject, object, etc.
The algorithm uses syntactic criteria to rule out noun
phrases that cannot possibly corefer with the pronoun.
The antecedent is then chosen according to a ranking based
on salience weights.
The Syntactic Filter
A pronoun P is non-coreferential with a (non-reflexive or non-reciprocal) noun phrase N if any of the following conditions
hold:
1. P and N have incompatible agreement features.
2. P is in the argument domain of N.
3. P is in the adjunct domain of N.
4. P is an argument of a head H, N is not a pronoun, and N is contained in H.
5. P is in the NP domain of N.
6. P is a determiner of a noun Q, and N is contained in Q.
Examples
Condition 1:
The woman said that he is funny.
Condition 2:
She likes her. John seems to want to see him.
Condition 3:
She sat near her.
Condition 4:
He believes that the man is amusing.
This is the man he said John wrote about.
Condition 5:
John’s portrait of him is interesting.
Salience Factors and Weights
Salience factor types with initial weights
Factor type                                          Initial weight
Sentence recency                                     100
Subject emphasis                                     80
Existential emphasis                                 70
Accusative emphasis                                  50
Indirect object and oblique complement emphasis      40
Head noun emphasis                                   80
Non-adverbial emphasis                               50
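The filter-then-rank idea can be sketched in a few lines. This is a minimal sketch under stated assumptions: the salience factors of each candidate are supplied by hand rather than derived from a parse, only the agreement check of the syntactic filter is implemented, and the weights of the applicable factors are simply summed. The names `Candidate`, `salience` and `resolve` are hypothetical.

```python
from dataclasses import dataclass, field

# Initial weights from the table above.
WEIGHTS = {
    "sentence_recency": 100,
    "subject": 80,
    "existential": 70,
    "accusative": 50,
    "indirect_or_oblique": 40,
    "head_noun": 80,
    "non_adverbial": 50,
}


@dataclass
class Candidate:
    text: str
    gender: str  # "m", "f" or "n" (assumed encoding)
    number: str  # "sg" or "pl" (assumed encoding)
    factors: set = field(default_factory=set)  # keys of WEIGHTS that apply


def salience(candidate: Candidate) -> int:
    """Sum the initial weights of every factor that applies to the candidate."""
    return sum(WEIGHTS[f] for f in candidate.factors)


def resolve(pronoun_gender: str, pronoun_number: str, candidates: list):
    """Drop candidates that disagree with the pronoun, then pick the most salient."""
    agreeing = [
        c for c in candidates
        if c.gender == pronoun_gender and c.number == pronoun_number
    ]
    return max(agreeing, key=salience, default=None)
```

For example, a subject NP in the current sentence that is a head noun and not inside an adverbial would score 100 + 80 + 80 + 50 = 310 under this summation.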
Kennedy 1996
The linguistic analysis for anaphora resolution includes:
The output of a part-of-speech tagger,
augmented with syntactic function annotations for each
input token (using LINGSOFT).
A set of patterns is used for identifying:
NP chunks, with the position of each NP in the text
Nominal sequences in two subordinate syntactic
environments:
a. in an adverbial adjunct
b. in an NP (i.e. containment in a prepositional
or clausal complement of a noun, or
containment in a relative clause)
Expletive “it”
Anaphora Resolution

Uses the Lappin and Leass algorithm
SENT-S: 100 iff in the current sentence
CNTX-S: 50 iff in the current context
SUBJ-S: 80 iff GFUN = subject
EXST-S: 70 iff in an existential construction
POSS-S: 65 iff GFUN = possessive
ACC-S: 50 iff GFUN = direct object
DAT-S: 40 iff GFUN = indirect object
OBLQ-S: 30 iff the complement of a preposition
HEAD-S: 80 iff EMBED = NIL
ARG-S: 50 iff ADJUNCT = NIL
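The "iff" conditions above map directly onto a feature test per candidate. Below is a minimal sketch, assuming each NP is described by a small dictionary with hypothetical keys (`gfun`, `existential`, `embedded`, `adjunct`) and that the applicable values are added together. The function name `kb_salience` is also an assumption.

```python
# Weights keyed by grammatical function (GFUN), taken from the list above.
GFUN_WEIGHTS = {
    "subject": 80,          # SUBJ-S
    "possessive": 65,       # POSS-S
    "direct_object": 50,    # ACC-S
    "indirect_object": 40,  # DAT-S
    "oblique": 30,          # OBLQ-S (complement of a preposition)
}


def kb_salience(np: dict, same_sentence: bool, same_context: bool) -> int:
    """Add up the salience values whose conditions hold for this NP."""
    score = 0
    if same_sentence:
        score += 100  # SENT-S
    if same_context:
        score += 50   # CNTX-S
    score += GFUN_WEIGHTS.get(np.get("gfun"), 0)
    if np.get("existential"):
        score += 70   # EXST-S
    if not np.get("embedded"):
        score += 80   # HEAD-S: EMBED = NIL
    if not np.get("adjunct"):
        score += 50   # ARG-S: ADJUNCT = NIL
    return score
```

Because every value comes from tagger-level annotations, no full parse is needed to compute the score.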
Mitkov 1997

No Parsing of the Input Sentence

Boosting indicators

First Noun Phrases: A score of +1 is assigned to the first
NP in a sentence.

Indicating Verbs: A score of +1 is assigned to those NPs
immediately following a verb which is a member of a
predefined set (including verbs such as discuss, present,
illustrate, identify, summarise, examine, describe, define,
show, check, develop, review, …).
MARS Cont….


Lexical Reiteration: A score of +2 is
assigned to those NPs repeated twice or
more in the paragraph in which the
pronoun appears, a score of +1 is
assigned to those NPs repeated once in
that paragraph.
Section Heading Preference: A score of
+1 is assigned to those NPs that also
occur in the heading of the section in
which the pronoun appears.
Boosting indicators contd..
Collocation Match: A score of +2 is assigned to those
NPs that have an identical collocation pattern to the
pronoun.

Immediate Reference: A score of +2 is assigned to those
NPs appearing in constructions of the form
“… (You) V1 NP … con (you) V2 it (con (you) V3 it)”, where
con ∈ {and, or, before, after, …}.

Sequential Instructions: A score of +2 is applied to NPs in
the NP1 position of constructions of the form “To V1 NP1
V2 NP2. (Sentence). To V3 it, V4 NP4”, where the noun phrase
NP1 is the likely antecedent of the anaphor it.

Term Preference: A score of +1 is applied to those NPs
identified as representing terms in the genre of the text.

Impeding indicators

Indefiniteness: Indefinite NPs are
assigned a score of -1.

Prepositional Noun Phrases: NPs
appearing in prepositional phrases are
assigned a score of -1.
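The indicators add up to a single score per candidate NP. The following is a minimal sketch, assuming the indicator values are precomputed flags on each NP. The pattern-based indicators (collocation match, immediate reference, sequential instructions) are left out for brevity, and the names `NP`, `indicator_score` and `best_antecedent` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NP:
    text: str
    first_in_sentence: bool = False
    follows_indicating_verb: bool = False
    repetitions: int = 0  # how many times the NP is repeated in the paragraph
    in_section_heading: bool = False
    is_term: bool = False
    indefinite: bool = False
    in_prepositional_phrase: bool = False


def indicator_score(np: NP) -> int:
    """Sum the boosting and impeding indicators that apply to this NP."""
    score = 0
    if np.first_in_sentence:
        score += 1  # First Noun Phrases
    if np.follows_indicating_verb:
        score += 1  # Indicating Verbs
    if np.repetitions >= 2:
        score += 2  # Lexical Reiteration, repeated twice or more
    elif np.repetitions == 1:
        score += 1  # Lexical Reiteration, repeated once
    if np.in_section_heading:
        score += 1  # Section Heading Preference
    if np.is_term:
        score += 1  # Term Preference
    if np.indefinite:
        score -= 1  # Indefiniteness
    if np.in_prepositional_phrase:
        score -= 1  # Prepositional Noun Phrases
    return score


def best_antecedent(nps: list):
    """Pick the NP with the highest aggregate score (ties go to the first)."""
    return max(nps, key=indicator_score, default=None)
```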
“Vasisth” a Rule Based Anaphora
Resolution System
1. mo:han(i) avanRe(i) kuttiye kantu.
   mohan he-poss child-acc see-pst
   (Mohan saw his child.)
2. mo:han(i) avanRe(i) kuttiye kantu ennu kRisnan paRannu.
   mohan he-poss child-acc see-pst compl krishnan say-pst
   (Krishnan said that Mohan saw his child.)
3. *mo:han(i) avane(i) aticcu.
   mohan he-acc beat-pst
   (Mohan beat him.)
4. mo:han avane(i) aticcu ennu kRisnan(i) paRannu.
   mohan he-acc beat-pst compl krishnan say-pst
   (Krishnan said that Mohan beat him.)
The Algorithm for Intra-sentential
Anaphora
A pronoun P is coreferential with an NP iff the following
conditions hold:
a. P and NP have compatible person, number and gender (PNG) features.
b. P does not precede NP.
c. If P is possessive, then NP is the subject of
the clause which contains P.
d. If P is non-possessive, then NP is the subject
of the immediate clause which does not
contain P.
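Conditions (a) to (d) can be read as a single predicate over a pronoun and a candidate NP. The sketch below rests on several assumptions that the slides do not state: precedence is taken as linear token order, each clause has an id, and the "immediate clause which does not contain P" is modelled as the parent of the clause containing P. The names `Mention` and `is_coreferential` are hypothetical, and the positions in the test data follow the English translations rather than the Malayalam word order.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    position: int  # linear token index (assumption)
    clause: int    # id of the clause containing the mention (assumption)
    png: tuple     # (person, number, gender)
    is_subject: bool = False
    possessive: bool = False


def is_coreferential(p: Mention, np: Mention, parent_of: dict) -> bool:
    """Apply conditions (a)-(d); parent_of maps a clause id to its enclosing clause id."""
    if p.png != np.png:
        return False  # (a) incompatible PNG features
    if p.position < np.position:
        return False  # (b) P must not precede NP
    if not np.is_subject:
        return False  # (c) and (d) both require a subject NP
    if p.possessive:
        return np.clause == p.clause  # (c) subject of the clause containing P
    # (d) subject of the immediate clause that does not contain P
    return np.clause == parent_of.get(p.clause)
```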




Vasisth is a multilingual anaphora resolution system
Rule based
With minimum parsing
Exploits the morphology of Indian languages
“VASISTH” Using Salience Measure
for Indian Languages
No in-depth parsing
Exploits the rich morphology of the language
The analysis depends on the salience
weight of each candidate NP for
antecedent-hood of an anaphor, chosen from a
list of probable candidates.
The salience weight assignment
a) The current sentence gets a score of 50 and it reduces by 10
for each preceding sentence till it reaches the fifth sentence.
The system considers five sentences for identifying the
antecedent.
b) The current clause gets a score of 75 if the pronoun present
in the clause is a possessive pronoun; if it is a non-possessive pronoun, it gets a score of zero.
c) The immediate clause gets a score of 70 in the case of a
possessive pronoun and a score of 75 for non-possessive pronouns.
d) For a non-immediate clause, a possessive pronoun gets a
score of 30 and a non-possessive pronoun gets a score of 65.
e) The analysis showed that the subject could be the
most probable antecedent for the pronoun. The
case markings the subject of a sentence could
take are nominative and dative.
A nominative NP, a dative NP, or a possessive NP
with a nominative/dative head could become the
subject of a sentence.
f) The direct object of a sentence could be identified by the
case markings and all the case markings other than the
subject are considered for object. The next most probable
NP for antecedent-hood is the direct object and hence it
gets a score of 40.
g) The third NP in a clause, which is not identified as the
subject or object, is considered as the indirect object and
gets a low score of 30.
Salience factor weights for Indian Languages
Salience Factors                    Weights
Current sentence                    50, reduced by 10 for each preceding sentence up to the 5th sentence
Possessive
  Current clause                    75
  Immediate clause                  70
  Non-immediate clause              30
Non-Possessive
  Current clause                    0
  Immediate clause                  75
  Non-immediate clause              65
Possessive and Non-Possessive
  N.Nom                             80
  N.Poss                            50
  N.Dat                             50
  N.Acc, Loc, Instr…                40
  N.others (3rd NP)                 30
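The table can be turned into a small scoring function. This is a minimal sketch, assuming that the sentence, clause and case weights are summed for each candidate NP (the slides do not spell out how they are combined) and that NPs outside the five-sentence window contribute nothing. The function and dictionary names are hypothetical.

```python
# Clause weights depend on whether the pronoun is possessive (True) or not (False).
CLAUSE_WEIGHTS = {
    True: {"current": 75, "immediate": 70, "non_immediate": 30},
    False: {"current": 0, "immediate": 75, "non_immediate": 65},
}

# Case-marking weights; any case not listed falls back to the "others (3rd NP)" weight.
CASE_WEIGHTS = {"nom": 80, "poss": 50, "dat": 50, "acc": 40, "loc": 40, "instr": 40}
OTHER_NP_WEIGHT = 30


def sentence_weight(distance: int) -> int:
    """50 for the current sentence (distance 0), reduced by 10 per preceding sentence."""
    if not 0 <= distance <= 4:
        return 0  # outside the five-sentence window (assumption)
    return 50 - 10 * distance


def vasisth_salience(distance: int, clause: str, case: str, possessive_pronoun: bool) -> int:
    """Sum the sentence, clause and case weights for one candidate NP."""
    return (
        sentence_weight(distance)
        + CLAUSE_WEIGHTS[possessive_pronoun][clause]
        + CASE_WEIGHTS.get(case, OTHER_NP_WEIGHT)
    )
```

Under this summation, a nominative NP in the current clause of the current sentence scores 50 + 75 + 80 = 205 for a possessive pronoun.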
How it works
The salience weight is assigned to an NP in the following
way:
Identify the pronoun.
Consider the four sentences above the sentence containing
the pronoun.
Consider all the NPs preceding the pronoun (this is
the general rule).
Here we also take some NPs which follow the
pronoun, since Tamil, like all Indian languages, has
relatively free word order.
Assign salience weights.
The NP which gets the maximum salience weight
and agrees in PNG (person, number, gender) with the
anaphor is considered the antecedent of the anaphor.
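The selection step can be sketched as a filter followed by an argmax. This is a minimal sketch, assuming each candidate already carries a precomputed salience weight and that mentions are plain dictionaries with hypothetical keys `sentence`, `png` and `weight`. The function name `pick_antecedent` is also an assumption.

```python
def pick_antecedent(pronoun: dict, candidates: list, window: int = 4):
    """Return the highest-weighted candidate that agrees in PNG with the pronoun.

    Only candidates in the pronoun's sentence or in the `window` sentences
    above it are considered. Candidates in the same sentence are kept whether
    they precede or follow the pronoun, reflecting free word order.
    """
    eligible = [
        c for c in candidates
        if 0 <= pronoun["sentence"] - c["sentence"] <= window
        and c["png"] == pronoun["png"]
    ]
    return max(eligible, key=lambda c: c["weight"], default=None)
```

Returning `None` when nothing is eligible leaves the pronoun unresolved rather than forcing a poor match.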
Tools
GATE
Java-RAP (pronouns)
GUITAR (Poesio & Kabadjov, 2004; Kabadjov, 2007)
BART (Versley et al., 2008)
Where is it required?
Machine Translation
Information Extraction
Summarization
And in almost all NLU applications
References
Massimo Poesio, slides: “Anaphora resolution for Practical task”
Ruslan Mitkov: “MARS a Knowledge Poor anaphora resolution system”
Thank You