Anaphora Resolution

Sobha Lalitha Devi
AU-KBC Research Centre
MIT Campus of Anna University
Chennai-44
sobha@au-kbc.org

Contents
  Introduction to Anaphora and Anaphora Resolution
  Types of Anaphora
  Process of Anaphora Resolution
  Tools
  Applications
  References

Introduction
  What is Anaphora
  Antecedent
  Anaphora Resolution

  1. Sabeer Bhatia arrived at Los Angeles International Airport at 6 p.m. on September 23, 1998. His flight from Bangalore had taken 22hrs and he was starving. [RD, NOV 2000]

Etymology of Anaphora
  Ana-: back, upstream, back upstream
  Phora: act of carrying
  Anaphora: act of carrying back

What is Anaphora
  Anaphora, in discourse, is a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to disabbreviate the reference and, thereby, determine the identity of the entity. (Hirst 1981)

Cataphora
  When the anaphor precedes the antecedent:
  Because she was going to the departmental store, Mary was asked to pick up the vegetables.

Relevance from the Linguistics Point of View
  Binding Theory is one of the major results of the principles and parameters approach developed in Chomsky (1981) and is one of the mainstays of generative linguistics.
  The Binding Theory deals with the relations between nominal expressions and possible antecedents.
  It attempts to provide a structural account of the complementarity of distribution between pronouns, reflexives and R-expressions.

Dichotomy Between Linguistics and NLP
  The Binding Theory (and its various formulations) deals only with intra-sentential anaphora, a very small subset of the anaphoric phenomena that practical NLP systems are interested in resolving.
  A much larger set of anaphoric phenomena involves the resolution of pronouns inter-sententially.
  This problem is dealt with by Discourse Representation Theory and more specifically by Centering Theory (Grosz et al., 1995).

Types of Anaphora
  The Prime Minister is yet to arrive and he is expected at the central hall at any time. [The Times of India, Feb 2001]
  This book is about Anaphora Resolution. The book is designed to help beginners in the field and its author hopes that it will be useful.
  John screamed, as did Mary.

Pronominal Anaphora
  Vajpayee hits back forcefully when he told the opposition today “sometimes we fall prey to the media and sometimes you do.” [Indian Express 2001]

  Possessive
    Priyanka eats only chicken sandwiches before going to take any exam; nothing else goes down her gullet that day. [Indian Express, 13 March 2001]

  Reflexive Pronoun
    Finally, Danian heaved himself up and lay on a waiting stretcher.

  Demonstrative Pronoun
    John had lots of packing to do before he shifted his house. This was something he never liked…

  Relative Pronoun
    Stumper Sameer Dige, who made his test debut, failed to show fast reflexes when it mattered.

Pleonastic It
  Cognitive verbs
    a. It is believed that…
    b. It appears that…
  Modal adjectives
    c. It is dangerous…
    d. It is important…
  Temporal
    e. It is five o’clock
    f. It is winter
  Weather verbs
    g. It is raining
    h. It is snowing
  Distance
    i. How far is it to Chennai?

Non-anaphoric Uses of Pronouns
  He that plants thorns must never expect to gather roses.
  He who dares wins.

  Deictic
    He seems remarkably bright for a child of his age.

Noun Phrase Anaphora
  Definite descriptions and proper names
    Roy Keane has warned Manchester United he may snub their pay deal. United’s skipper is even hinting that unless the future Old Trafford Package meets his demands, he could quit the club in June 2000. Irishman Keane, 27, still has 17 months to run on his current 23,000 pound a week contract and wants to commit himself to United for life. Alex Ferguson’s No 1 player confirmed: “If it’s not the contract I want, I won’t sign”.
Coreference
  Computational Linguists from many different countries attended the tutorial. The participants found it hard to cope with the speed of the presentation, nevertheless they managed to take extensive notes.

What is Anaphora Resolution
  Anaphora resolution is the process of finding the antecedent of an anaphor.
  Anaphor: the reference that points back to a previous item.
  Antecedent: the entity to which the anaphor refers.

Different Approaches in Anaphora Resolution
  Rule based
  Statistical

Lappin and Leass (1994) Anaphora Resolution Algorithm
  The Lappin and Leass (1994) anaphora resolution algorithm uses salience weights to determine the antecedent of a pronominal.
  It requires a fully parsed sentence structure as input and uses the syntactic hierarchy to identify the subject, object, etc.
  The algorithm first uses syntactic criteria to rule out noun phrases that cannot possibly corefer with the pronoun.
  The antecedent is then chosen from the remaining noun phrases according to a ranking based on salience weights.

The Syntactic Filter
  A pronoun P is non-coreferential with a (non-reflexive or non-reciprocal) noun phrase N if any of the following conditions hold:
    1. P and N have incompatible agreement features.
    2. P is in the argument domain of N.
    3. P is in the adjunct domain of N.
    4. P is an argument of a head H, N is not a pronoun, and N is contained in H.
    5. P is in the NP domain of N.
    6. P is a determiner of a noun Q, and N is contained in Q.

Examples
  Condition 1: The woman said that he is funny.
  Condition 2: She likes her.
               John seems to want to see him.
  Condition 3: She sat near her.
  Condition 4: He believes that the man is amusing.
               This is the man he said John wrote about.
  Condition 5: John’s portrait of him is interesting.
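The filter-then-rank flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it implements Condition 1 (agreement) alone, because Conditions 2 to 6 need a full parse; the Mention class, its feature fields and the salience numbers in the example are hypothetical and are not taken from Lappin and Leass.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    """Hypothetical container for a pronoun or a candidate noun phrase."""
    text: str
    person: int
    number: str               # "sg" or "pl"
    gender: Optional[str]     # "m", "f", "n", or None when unknown
    salience: int = 0         # illustrative value, assumed to be computed elsewhere


def agrees(pronoun: Mention, np: Mention) -> bool:
    """Condition 1: person, number and gender must be compatible."""
    if pronoun.person != np.person or pronoun.number != np.number:
        return False
    if pronoun.gender is None or np.gender is None:
        return True           # unknown gender is treated as compatible (assumption)
    return pronoun.gender == np.gender


def resolve(pronoun: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Filter out incompatible NPs, then pick the most salient survivor."""
    survivors = [c for c in candidates if agrees(pronoun, c)]
    if not survivors:
        return None
    return max(survivors, key=lambda c: c.salience)


# "The woman said that he is funny.": "the woman" is ruled out for "he".
he = Mention("he", 3, "sg", "m")
woman = Mention("the woman", 3, "sg", "f", salience=180)   # made-up salience
man = Mention("the man", 3, "sg", "m", salience=100)       # made-up salience

no_antecedent = resolve(he, [woman])
chosen = resolve(he, [woman, man])
```

Here `no_antecedent` is `None` because the only candidate fails the agreement check, while `chosen` is "the man" even though "the woman" carries the higher (made-up) salience: the filter always applies before the ranking.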
Salience Factors and Weights
  Salience factor types with initial weights

  Factor type                                        Initial weight
  Sentence recency                                   100
  Subject emphasis                                   80
  Existential emphasis                               70
  Accusative emphasis                                50
  Indirect object and oblique complement emphasis    40
  Head noun emphasis                                 80
  Non-adverbial emphasis                             50

Kennedy 1996
  The linguistic analysis for anaphora resolution includes:
    The output of a part-of-speech tagger, augmented with syntactic function annotations for each input token (using LINGSOFT).
    A set of patterns used for identifying:
      NP chunking, with the position of the NP in the text
      Nominal sequencing in two subordinate syntactic environments:
        a. in an adverbial adjunct
        b. in an NP (i.e. containment in a prepositional or clausal complement of a noun, or containment in a relative clause)
      Expletive “it”

Anaphora Resolution
  Uses the Lappin and Leass algorithm
    SENT-S: 100 iff in the current sentence
    CNTX-S: 50 iff in the current context
    SUBJ-S: 80 iff GFUN = subject
    EXST-S: 70 iff in an existential construction
    POSS-S: 65 iff GFUN = possessive
    ACC-S: 50 iff GFUN = direct object
    DAT-S: 40 iff GFUN = indirect object
    OBLQ-S: 30 iff the complement of a preposition
    HEAD-S: 80 iff EMBED = NIL
    ARG-S: 50 iff ADJUNCT = NIL

Mitkov 1997
  No parsing of the input sentence

Boosting indicators
  First Noun Phrases: A score of +1 is assigned to the first NP in a sentence.
  Indicating Verbs: A score of +1 is assigned to those NPs immediately following a verb which is a member of a predefined set (including verbs such as discuss, present, illustrate, identify, summarise, examine, describe, define, show, check, develop, review, …).

MARS (contd.)
  Lexical Reiteration: A score of +2 is assigned to those NPs repeated twice or more in the paragraph in which the pronoun appears; a score of +1 is assigned to those NPs repeated once in that paragraph.
  Section Heading Preference: A score of +1 is assigned to those NPs that also occur in the heading of the section in which the pronoun appears.

Boosting indicators (contd.)
  Collocation Match: A score of +2 is assigned to those NPs that have an identical collocation pattern to the pronoun.
  Immediate Reference: A score of +2 is assigned to those NPs appearing in constructions of the form “… (You) V1 NP … con (you) V2 it (con (you) V3 it)”, where con ∈ {and/or/before/after…}.
  Sequential Instructions: A score of +2 is assigned to NPs in the NP1 position of constructions of the form “To V1 NP1, V2 NP2. (Sentence). To V3 it, V4 NP4”, where the noun phrase NP1 is the likely antecedent of the anaphor it.
  Term Preference: A score of +1 is assigned to those NPs identified as representing terms in the genre of the text.

Impeding indicators
  Indefiniteness: Indefinite NPs are assigned a score of -1.
  Prepositional Noun Phrases: NPs appearing in prepositional phrases are assigned a score of -1.

“Vasisth”: a Rule Based Anaphora Resolution System
  1. mo:han(i) avanRe(i) kuttiye kantu.
     mohan he-poss child-acc see-pst
     (Mohan saw his child.)
  2. mo:han(i) avanRe(i) kuttiye kantu ennu kRisnan paRannu.
     mohan he-poss child-acc see-pst compl krishnan say-pst
     (Krishnan said that Mohan saw his child.)
  3. *mo:han(i) avane(i) aticcu.
     mohan he-acc beat-pst
     (Mohan beat him.)
  4. mo:han avane(i) aticcu ennu kRisnan(i) paRannu.
     mohan he-acc beat-pst compl krishnan say-pst
     (Krishnan said that Mohan beat him.)

The Algorithm for Intra-sentential Anaphora
  A pronoun P is coreferential with an NP iff the following conditions hold:
    a. P and NP have compatible person, number and gender (PNG) features.
    b. P does not precede NP.
    c. If P is possessive, then NP is the subject of the clause which contains P.
    d. If P is non-possessive, then NP is the subject of the immediate clause which does not contain P.
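Conditions a to d can be written as a single predicate. The sketch below is hypothetical and is not the Vasisth implementation: the NP and Pronoun classes, the clause identifiers and the `parent` map (embedded clause to embedding clause) are assumptions made for illustration. It reads "the immediate clause which does not contain P" as the clause that directly embeds P's own clause, treats "compatible" PNG as plain equality, and, for example 4, takes linear positions from the English gloss rather than from the Malayalam word order.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

PNG = Tuple[int, str, str]    # (person, number, gender)


@dataclass
class NP:
    text: str
    png: PNG
    position: int             # linear position in the sentence
    clause: int               # id of the clause the NP belongs to
    is_subject: bool


@dataclass
class Pronoun:
    text: str
    png: PNG
    position: int
    clause: int
    possessive: bool


def coreferential(p: Pronoun, np: NP, parent: Dict[int, int]) -> bool:
    """Apply conditions a-d; `parent` maps a clause id to its embedding clause."""
    if p.png != np.png:                       # (a) PNG agreement (equality assumed)
        return False
    if p.position < np.position:              # (b) P must not precede NP
        return False
    if not np.is_subject:                     # (c) and (d) both require a subject
        return False
    if p.possessive:
        return np.clause == p.clause          # (c) subject of P's own clause
    return parent.get(p.clause) == np.clause  # (d) subject of the embedding clause


M3SG: PNG = (3, "sg", "m")

# Example 1: "mo:han avanRe kuttiye kantu" (Mohan saw his child.)
mohan1 = NP("mo:han", M3SG, 0, 0, True)
avanRe = Pronoun("avanRe", M3SG, 1, 0, True)
ex1 = coreferential(avanRe, mohan1, {})

# Example 3: "*mo:han avane aticcu" (Mohan beat him.)
avane3 = Pronoun("avane", M3SG, 1, 0, False)
ex3 = coreferential(avane3, mohan1, {})

# Example 4, positions from the gloss "Krishnan said that Mohan beat him."
krishnan = NP("kRisnan", M3SG, 0, 0, True)    # matrix clause 0
mohan4 = NP("mo:han", M3SG, 1, 1, True)       # embedded clause 1
avane4 = Pronoun("avane", M3SG, 2, 1, False)
ex4_krishnan = coreferential(avane4, krishnan, {1: 0})
ex4_mohan = coreferential(avane4, mohan4, {1: 0})
```

Under these assumptions the predicate reproduces the indexing in the examples: the possessive in example 1 is bound by the subject of its own clause, the non-possessive in example 3 is not, and in example 4 the pronoun picks the subject of the embedding clause rather than Mohan.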
Vasisth is a multilingual Anaphora Resolution system
  Rule based
  With minimum parsing
  Exploits the morphology of Indian languages

“VASISTH”: Using Salience Measure for Indian Languages
  No in-depth parsing
  Exploits the rich morphology of the language
  The analysis depends on the salience weight of each candidate NP for the antecedent-hood of an anaphor, chosen from a list of probable candidates.

The salience weight assignment
  a) The current sentence gets a score of 50, and the score reduces by 10 for each preceding sentence until the fifth sentence is reached. The system considers five sentences for identifying the antecedent.
  b) The current clause gets a score of 75 if the pronoun in the clause is a possessive pronoun, and a score of zero if it is a non-possessive pronoun.
  c) The immediate clause gets a score of 70 for a possessive pronoun and a score of 75 for a non-possessive pronoun.
  d) A non-immediate clause gets a score of 30 for a possessive pronoun and a score of 65 for a non-possessive pronoun.
  e) The analysis showed that the subject is the most probable antecedent for the pronoun. The case markings the subject of a sentence can take are nominative and dative. A nominative NP, a dative NP, and a possessive NP with a nominative/dative head can become the subject of a sentence.
  f) The direct object of a sentence can be identified by its case marking; all case markings other than those of the subject are considered for the object. The next most probable NP for antecedent-hood is the direct object, and hence it gets a score of 40.
  g) The third NP in a clause, which is not identified as the subject or object, is considered the indirect object and gets a low score of 30.
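The weight assignment in a) to g) can be sketched as a small scoring function. This is a hypothetical illustration, not the Vasisth code: the dictionary layout and the field names are invented here, adding the three weights together is an assumption (the slides only say that the NP with the maximum salience weight wins), and the subject weights (80 for nominative, 50 for dative and possessive NPs) are taken from the weights table in this deck rather than from the list above.

```python
from typing import Dict, List, Optional, Tuple

# Clause weights, keyed by "is the pronoun possessive?" (items b, c, d).
CLAUSE_WEIGHTS: Dict[bool, Dict[str, int]] = {
    True: {"current": 75, "immediate": 70, "non_immediate": 30},
    False: {"current": 0, "immediate": 75, "non_immediate": 65},
}

# Case-based weights (items e, f, g, plus the subject values from the table).
CASE_WEIGHTS: Dict[str, int] = {
    "nom": 80, "poss": 50, "dat": 50,
    "acc": 40, "loc": 40, "instr": 40,
    "other": 30,               # third NP in the clause
}


def sentence_weight(distance: int) -> int:
    """Item a: 50 for the current sentence, minus 10 per preceding sentence."""
    if not 0 <= distance <= 4:  # only five sentences are considered
        return 0
    return 50 - 10 * distance


def salience(candidate: dict, pronoun_is_possessive: bool) -> int:
    """Sum of sentence, clause and case weights (summing is an assumption)."""
    return (sentence_weight(candidate["sentence_distance"])
            + CLAUSE_WEIGHTS[pronoun_is_possessive][candidate["clause"]]
            + CASE_WEIGHTS[candidate["case"]])


def pick_antecedent(candidates: List[dict], pronoun_png: Tuple[int, str, str],
                    pronoun_is_possessive: bool) -> Optional[dict]:
    """Keep PNG-agreeing candidates and return the one with maximum salience."""
    agreeing = [c for c in candidates if c["png"] == pronoun_png]
    if not agreeing:
        return None
    return max(agreeing, key=lambda c: salience(c, pronoun_is_possessive))


# Candidates for the non-possessive pronoun in
# "Krishnan said that Mohan beat him."
png = (3, "sg", "m")
mohan = {"text": "mo:han", "png": png, "sentence_distance": 0,
         "clause": "current", "case": "nom"}
krishnan = {"text": "kRisnan", "png": png, "sentence_distance": 0,
            "clause": "immediate", "case": "nom"}

best = pick_antecedent([mohan, krishnan], png, pronoun_is_possessive=False)
```

With these numbers Mohan scores 50 + 0 + 80 = 130 and Krishnan scores 50 + 75 + 80 = 205, so `best` is the Krishnan candidate. The zero weight for the current clause is what keeps a non-possessive pronoun away from its own clause-mate subject.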
Salience Factor Weights for Indian Languages

  Salience factors                    Weights
  Current sentence                    50, reduced by 10 for each preceding sentence up to the 5th sentence
  Possessive
    Current clause                    75
    Immediate clause                  70
    Non-immediate clause              30
  Non-Possessive
    Current clause                    0
    Immediate clause                  75
    Non-immediate clause              65
  Possessive and Non-Possessive
    N.Nom                             80
    N.Poss                            50
    N.Dat                             50
    N.Acc, Loc, Instr…                40
    N.others (3rd NP)                 30

How it works
  The salience weight is assigned to an NP in the following way:
    Identify the pronoun.
    Consider the four sentences above the sentence containing the pronoun.
    Consider all the NPs preceding the pronoun (this is the general rule). Here we also take some NPs which follow the pronoun, since Tamil, like all Indian languages, has relatively free word order.
    Assign salience weights.
    The NP which gets the maximum salience weight and agrees in person, number and gender (PNG) with the anaphor is considered the antecedent of the anaphor.

Tools
  GATE
  Java-RAP (pronouns)
  GUITAR (Poesio & Kabadjov, 2004; Kabadjov, 2007)
  BART (Versley et al., 2008)

Where is it required?
  Machine Translation
  Information Extraction
  Summarization
  And in almost all NLU applications

References
  Massimo Poesio, slides: “Anaphora resolution for Practical task”
  Ruslan Mitkov: “MARS a Knowledge Poor anaphora resolution system”

Thank You