From: AAAI-99 Proceedings. Copyright © 1999, AAAI (www.aaai.org). All rights reserved.

Applying Genetic Algorithms to Pronoun Resolution

Donna K. Byron and James F. Allen
University of Rochester Department of Computer Science
P.O. Box 270226, Rochester NY 14627, U.S.A.
dbyron/james@cs.rochester.edu

Introduction

Many pronoun resolution algorithms work by calculating the most salient candidate antecedent. However, many factors affect salience, for example being the syntactic subject or the most frequently mentioned item, and these factors must be combined into an aggregate salience score. One technique is to assign each factor a weight representing how much that factor contributes to overall salience; the candidate antecedent that accumulates the most weight is selected. Previous authors assigned weights heuristically (cf. Mitkov 1998). By using a genetic algorithm to select the weights, our program outperforms baseline techniques and can be customized for each language domain. [1]

General outline of the algorithm

For this study, each salience factor was implemented as an independent module. The modules developed at this time were inspired by a number of previous studies:

- Increase salience of candidate selected by Hobbs’ naive algorithm (Hobbs 1986) [2]
- Decrease salience of quoted speech (Kameyama 1998)
- Decrease salience of indefinite NPs (Mitkov 1998)
- Increase salience of first NP in sentence (Mitkov 1998)
- Decrease if in relative clause (Kennedy & Boguraev 1996)
- Decrease if in prepositional phrase (Mitkov 1998)
- Increase salience of subjects
- Increase salience of most recent candidate

Input to the program is a weight for each module and the vector of candidate antecedents. The weights are generated by the genetic algorithm, using random numbers for the first generation and standard mutate, crossover, and replicate operations for subsequent generations. Each individual’s fitness is the percentage of pronouns resolved correctly.
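The weighted combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate representation (dictionaries with `id`, `is_subject`, `in_pp`, and `distance` fields), the three example modules, the convention that a module returns 1.0 when its factor applies, and the use of negative weights for "decrease" factors are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Candidate = Dict[str, Any]
# Assumed interface: a module looks at one candidate in the context of all
# candidates and returns 1.0 if its salience factor applies, else 0.0.
Module = Callable[[Candidate, Sequence[Candidate]], float]


def is_subject(cand: Candidate, all_cands: Sequence[Candidate]) -> float:
    return 1.0 if cand["is_subject"] else 0.0


def in_prepositional_phrase(cand: Candidate, all_cands: Sequence[Candidate]) -> float:
    return 1.0 if cand["in_pp"] else 0.0


def most_recent(cand: Candidate, all_cands: Sequence[Candidate]) -> float:
    # "distance" is a hypothetical field: how far back the candidate was mentioned.
    nearest = min(c["distance"] for c in all_cands)
    return 1.0 if cand["distance"] == nearest else 0.0


def resolve(candidates: Sequence[Candidate],
            modules: Sequence[Module],
            weights: Sequence[float]) -> Candidate:
    """Select the candidate that accumulates the most weight."""
    def salience(cand: Candidate) -> float:
        return sum(w * m(cand, candidates) for m, w in zip(modules, weights))
    return max(candidates, key=salience)


def accuracy(examples: Sequence[Tuple[List[Candidate], str]],
             modules: Sequence[Module],
             weights: Sequence[float]) -> float:
    """Fraction of pronouns resolved to the annotated antecedent (the fitness)."""
    hits = sum(resolve(cands, modules, weights)["id"] == gold
               for cands, gold in examples)
    return hits / len(examples)


# Toy data, invented for illustration only.
candidates = [
    {"id": "the senator", "is_subject": True, "in_pp": False, "distance": 2},
    {"id": "the committee", "is_subject": False, "in_pp": True, "distance": 1},
]
modules = [is_subject, in_prepositional_phrase, most_recent]
weights = [2.0, -1.0, 1.0]  # one weight per module; negative lowers salience
print(resolve(candidates, modules, weights)["id"])  # prints "the senator"
```

With these weights the subject accumulates 2.0 while the more recent candidate nets 0.0 (recency bonus cancelled by the prepositional-phrase penalty), so the subject is chosen. A different weight vector can reverse the decision, which is why the choice of weights matters.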
The initial population size is fifteen; after each generation the five fittest individuals are allowed to reproduce, and the search halts after twenty generations.

This material is based on work supported by USAF/Rome Labs contract F30602-95-1-0025, ONR grant N00014-95-1-1088, and Columbia Univ. grant OPG:1307.

[1] A more detailed version of this paper is available as URCS-TR 713, from http://www.cs.rochester.edu/trs/ai-trs.html
[2] Hobbs’ algorithm was slightly modified to allow for the syntactic structure of Treebank trees (see Ge, Hale, & Charniak 1998).

Experimental Results

Our evaluation corpus is 3900 sentences of Treebank text (Marcus, Santorini, & Marcinkiewicz 1993) for which antecedents of definite pronouns were annotated (Ge, Hale, & Charniak 1998). 70% of the corpus was used to train the genetic algorithm; the remaining 30% was the test corpus. Table 1 shows pronoun resolution accuracy for our three experiments. The ‘most-recent-candidate’ module on its own correctly resolved only 47% of pronouns. Hobbs’ algorithm, which uses syntactic structure, raised accuracy to 67.8% and performed best of all the modules when run in isolation. The genetic algorithm correctly resolved 69.1%, a slight improvement over Hobbs’ algorithm.

    Most-recent       47%
    Modified Hobbs    67.8%
    Genetic           69.1%

Table 1: Pronoun resolution accuracy on the test corpus

Using the same evaluation corpus, Ge et al. (1998) developed a probabilistic model that resolved 84.2% of singular, third-person pronouns correctly. Two powerful predictors from their study, mention counts and selectional restrictions, were not included in our system. We plan to integrate those factors as well as additional salience modules and calculations of non-coreference in future experiments.
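The experimental protocol (weights evolved on 70% of the corpus with a population of fifteen, five survivors per generation, and twenty generations, then scored on the remaining 30%) might be sketched as follows. Only those numbers come from the text. The contiguous split, the initial weight range, single-point crossover, Gaussian mutation of one weight per child, and carrying the five parents over unchanged are assumptions, since the paper says only that "standard" operators were used. The `fitness` argument is a hypothetical callable that scores a weight vector, for example accuracy on the training portion.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Weights = List[float]


def split_corpus(examples: Sequence[T],
                 train_percent: int = 70) -> Tuple[List[T], List[T]]:
    """Contiguous train/test split (assumption: the split method is not stated)."""
    cut = len(examples) * train_percent // 100
    return list(examples[:cut]), list(examples[cut:])


def evolve(fitness: Callable[[Weights], float],
           n_weights: int,
           rng: random.Random,
           pop_size: int = 15,
           n_parents: int = 5,
           generations: int = 20) -> Weights:
    """Search for a weight vector that maximises `fitness` (needs n_weights >= 2)."""
    # First generation: random weights (the range [-1, 1] is an assumption).
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Only the fittest individuals are allowed to reproduce.
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        # Replicate: parents survive unchanged into the next generation.
        population = [p[:] for p in parents]
        while len(population) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: prefix of one parent, suffix of the other.
            cut = rng.randrange(1, n_weights)
            child = a[:cut] + b[cut:]
            # Mutate: perturb one randomly chosen weight.
            i = rng.randrange(n_weights)
            child[i] += rng.gauss(0.0, 0.1)
            population.append(child)
    return max(population, key=fitness)
```

In use, `fitness` would be evaluated only on the training portion returned by `split_corpus`, and the weight vector returned by `evolve` would then be scored once on the held-out portion to obtain a figure comparable to Table 1.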
We also plan to use a more sophisticated method of combining salience weights into an overall score, using one of the many techniques available in the machine learning literature.

References

Ge, N.; Hale, J.; and Charniak, E. 1998. A statistical approach to anaphora resolution. In Proceedings of the Sixth Workshop on Very Large Corpora.

Hobbs, J. 1986. Resolving pronoun reference. In Readings in Natural Language Processing. Morgan Kaufmann.

Kameyama, M. 1998. Intrasentential centering: A case study. In Walker, M.; Joshi, A.; and Prince, E., eds., Centering Theory in Discourse, 89–112. Clarendon, Oxford.

Kennedy, C., and Boguraev, B. 1996. Anaphora in a wider context: Tracking discourse referents. In ECAI-96.

Marcus, M.; Santorini, B.; and Marcinkiewicz, M. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330.

Mitkov, R. 1998. Robust pronoun resolution with limited knowledge. In Proceedings of ACL ’98, 869–875.