Point and Paste Question Answering

Sanda M. Harabagiu (sanda@cs.utdallas.edu)
Finley Lacatusu (finley@student.utdallas.edu)
Paul Morarescu (paulm@student.utdallas.edu)
Department of Computer Science
University of Texas at Dallas
Richardson, Texas 75083-0688

Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

The informative answer to a question sometimes needs to be formatted as a summary of a document or set of documents. This scenario requires a novel Question Answering architecture that relies on a combination of topical relations mined from WordNet and textual information available from documents related to the question's topic. This paper presents a methodology for pointing out the information to be included in a summary by using extraction templates. The paper also describes how the summary is generated using a small set of pasting operators.

Introduction

Automatic textual Question Answering (QA) draws substantial interest both from research laboratories and from the commercial world because it provides a promising solution to the information overload we currently face. Instead of browsing through long lists of documents whenever we need an answer to a question, we would rather read a short text snippet containing the answer we are looking for.

One interesting aspect of QA is uncovered if we take into account the observation that not all topics are created equal. In a given on-line text collection, some topics may be covered by many documents whereas other topics are only briefly mentioned. When a question asks about one of the more popular topics, summarization techniques need to replace the standard paragraph retrieval techniques used in current QA systems. If a document covers one of the topics at length, any of its paragraphs is relevant to a question asking about that topic. Instead of selecting one of the paragraphs based on some keyword-based metric, we argue that using summarization techniques will produce more informative answers. If several documents cover different aspects of the same topic, the answer should be generated by multi-document summarization techniques similar to those presented in (Radev and McKeown 1998).

In this paper we revisit the problem of QA by focusing on questions that are answered by a summary. We thus study the changes imposed on the three typical processes of a QA system: (1) question processing; (2) document processing; and (3) answer formulation. Moreover, the techniques we employ for generating summaries as answers to questions are different from those developed for query-based summarization. At the core of our method of generating topic-related summaries is the lexico-semantic knowledge that we discern for each topic in part. To be able to identify textual information that is relevant to a question, we rely on pattern matching of linguistic rules against free text. Similar to the Information Extraction (IE) task developed for the Message Understanding Conferences (MUC), if a topic template is known, we can rely on linguistic rules that recognize the topic information. However, when a question is asked, no template is provided. Instead, we generate an ad-hoc template based on topical relations that are mined from the WordNet lexico-semantic data.
When the topic is covered by a single document, we generate a summary using sentences that are extracted and compressed based on the lexico-semantic and cohesive information brought forward by the WordNet topical relations. If the topic is covered by multiple documents, the summary is generated by merging topic templates and using some of the possible relations between templates. This form of fusing information is highly dependent on the combination of topic patterns mined from WordNet and the lexico-semantic relations discovered in texts.

The remainder of this paper is organized as follows. Section 2 presents the topical relations that can be mined from WordNet, whereas Section 3 presents the techniques for generating point-and-paste answers. Section 4 summarizes the conclusions.

Topical Relations

Lexical chains are sequences of semantically related words found in a document or across documents. They have been used extensively in computational linguistics to study discourse, coherence, inference, implicatures, malapropisms, the automatic creation of hypertext links, and more (Morris and Hirst 1991), (Hirst and St-Onge 1998), (Green 1999), (Harabagiu and Moldovan 1998). Current lexical chainers take advantage only of the WordNet relations between synsets, totally ignoring the glosses. A far better lexical chaining technology can be developed based on Extended WordNet, since the interconnectivity of synsets will largely increase via the gloss concepts. Consider the following text:

    The mine has revitalised diamond production.

One cannot explain the cohesion and the intention of this simple text by using only the existing lexico-semantic relations like those encoded in WordNet (Fellbaum 1998). Typical relations encoded in lexico-semantic knowledge bases like WordNet comprise hypernymy, or IS-A relations, meronymy, or PART-OF relations, and entailment or causation relations. None of these relations can be established between any of the words of the above text. However, it is obvious that this short text is both cohesive, as its words have more than syntactic relations joining them, and coherent, as it is logical.

Relations that account for the cohesion of the above text can be mined from the WordNet glosses that define the encoded concepts. In WordNet each concept is represented through a synonym set of words sharing the same part-of-speech. Such synonym sets are known as synsets. The words used to define the synset {mine} are related to the words used to define the synset {diamond}. Furthermore, these two synsets are related to other synsets containing words such as ring, bracelet or gem and precious. Similarly, production, market and mine are related to the names of big diamond concerns like De Beers. All these relations converge toward the topic of "diamond market", bringing forward all the various concepts that define the topic. This is why we call them topical relations. A good source for topical relations is WordNet, namely the glosses defining each synset.
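The gloss-mining idea can be illustrated with a minimal sketch. The paper relies on Extended WordNet, where gloss words are disambiguated to synsets; as a rough stand-in, the sketch below uses NLTK's plain WordNet interface, and its part-of-speech filter and stop-word list are our assumptions rather than the authors' filter for idiomatic expressions and general concepts.

```python
# Rough illustration: mine candidate topical relations from a synset's
# gloss. NLTK's WordNet stands in for the Extended WordNet of the paper
# (whose gloss words are sense-disambiguated); the POS filter and the
# stop-word list are illustrative assumptions.
from nltk import pos_tag, word_tokenize
from nltk.corpus import stopwords, wordnet as wn

STOP = set(stopwords.words('english'))

def gloss_concepts(synset):
    """Content words (nouns, verbs, adjectives, adverbs) of the gloss,
    taken as candidate topical relations of the synset."""
    tagged = pos_tag(word_tokenize(synset.definition()))
    return [w.lower() for w, tag in tagged
            if tag[:1] in ('N', 'V', 'J', 'R') and w.lower() not in STOP]

# The gem sense of diamond (sense numbering varies across WordNet
# versions) yields concepts such as cut, polished, valued, precious, gem.
diamond = wn.synset('diamond.n.01')
print(gloss_concepts(diamond))
```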
Topical relations are pointers that link a synset to other related synsets that may occur in a discourse. The topic changes from one discourse to another, and the same concept may be used in different contexts. When the context changes, the concepts used change as well. For example, the diamond market can be discussed from a financial, a geological, or a gem-jewelry point of view. In each case, the concepts used would be quite different. This pluralism of topics for the same concept creates confusion and difficulty in identifying related concepts. The sources for the topical relations are illustrated in Figure 1.

[Figure 1: Topical relations are extracted from several sources.]

1. The first place where we look is the gloss of each concept. Since the gloss concepts are used to define a synset, they clearly are related to that synset. From the gloss definition we extract all the nouns, verbs, adjectives and adverbs, less some idiomatic expressions and general concepts that are not specific to that synset. For example, the gloss of the concept diamond is (a transparent piece of diamond that has been cut and polished and is valued as a precious gem).

2. Each concept Ci,j in the gloss of synset Si points to a synset Sj that has its own gloss definition. The concepts Cj,k of that gloss may also be relevant to the original synset Si. Although this mechanism can be extended further, we stop at this level. For the case of the gloss of diamond, we shall have pointers to the glosses of gem, cut, polish, valued and precious.

3. A third source is the hypernym SHi and its gloss with concepts CHi,l. For example, the synset {jewelry, jewellery} is one of the hypernyms of diamond, having its own gloss. Other relations such as causation, entailment and meronymy are treated in the same way as hypernymy.

4. Yet another source consists of the glosses in which synset Si itself is used. Those synsets, denoted in Figure 1 as Sm, are likely to be related to Si, since Si is part of their definitions.

This scheme of constructing topical relations connects a synset to many other related synsets, some from the same part-of-speech hierarchy, but many from other part-of-speech hierarchies. The increased connectivity between hierarchies is an important addition to WordNet. Moreover, the creation of topical relations is a partial solution for the derivational morphology problem that is quite severe in WordNet. In Table 1 we show a few examples of morphologically related words that appear in synsets and their glosses, which will be linked by the new topical relations.

    Gloss                                              Morphological relation   Synset
    (the sound of laughing:v#1)                        noun - verb              laughter:n#1
    (bearing an immediate:a#2 relation)                adverb - adjective       immediately:r#3
    (take out insurance:n#3 for)                       verb - noun              insure:v#4
    (...characteristic of or befitting a parent:n#1)   adjective - noun         parental:a#2

Table 1: Examples of new morphological relations revealed by the topical relations.

Paths between synsets via topical relations

Figure 2 shows five synsets S1 to S5, each with its topical relation list represented as a vertical line. In the list of synset Si there are relations ri,j pointing to Sj. It is thus possible to establish connections between synsets via topical relations. We propose to develop software that automatically provides connecting paths between any two synsets Si and Sj up to a certain distance, as shown in Table 2.

[Figure 2: Five synsets and the paths that can be established between them via topical relations.]

    Name      Path
    V-path    Si - Sj
    W-path    Si - Sk - Sj
    VW-path   Si - Sk - Sl - Sj
    WW-path   Si - Sk - Sl - Sm - Sj

Table 2: Types of topically related paths.

The meaning of these paths is that the concepts along a path have in common that they may occur together in a discourse; thus they are topically related. These paths should not be confused with the semantic distance or semantic similarity paths that can also be established on knowledge bases by using different techniques.

The names of the paths were inspired by their shapes in Figure 2. The V-path links Si and Sj directly, either via ri,j or rj,i; that is, either Si has Sj in its list of topical relations or vice versa. The W-path allows one intermediate synset between Si and Sj. Similarly, the VW-path allows two intermediate synsets and the WW-path allows three intermediate synsets. We limit the length of a path to five synsets. For each type of path, there are several possible connections, depending on the direction of the connection between two adjacent synsets (i.e. via ri,j or rj,i). These connections are easily established with known search methods.
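As an illustration, here is a minimal sketch of such a search, under two simplifying assumptions of ours: a synset's topical-relation list is approximated by the synsets of its gloss words and of its hypernyms' gloss words (sources 1 and 3 above; the reverse links of source 4 are omitted), and gloss words are looked up without disambiguation. The function names are ours, not the paper's.

```python
# Sketch of the V/W/VW/WW path search (at most five synsets). The
# topical-relation list of a synset is approximated by the synsets of
# its gloss words and its hypernyms' gloss words (sources 1 and 3);
# the reverse links of source 4 are omitted for brevity.
from nltk.corpus import wordnet as wn

PATH_NAMES = {2: 'V-path', 3: 'W-path', 4: 'VW-path', 5: 'WW-path'}

def topical_neighbors(synset):
    """Synsets reachable from `synset` by one topical relation."""
    glosses = [synset.definition()] + [h.definition()
                                       for h in synset.hypernyms()]
    neighbors = set()
    for gloss in glosses:
        for word in gloss.replace(';', ' ').split():
            neighbors.update(wn.synsets(word.strip('(),."\'').lower()))
    neighbors.discard(synset)
    return neighbors

def topical_path(src, dst, max_synsets=5):
    """Breadth-first search for a path [src, ..., dst] of at most
    max_synsets synsets; edges are followed forward only here.
    Exhaustive search is expensive; the paper avoids it by storing
    all paths in hash tables (see the sketch below)."""
    frontier = [[src]]
    for _ in range(max_synsets - 1):
        next_frontier = []
        for path in frontier:
            for nxt in topical_neighbors(path[-1]):
                if nxt == dst:
                    return path + [nxt]
                if nxt not in path:
                    next_frontier.append(path + [nxt])
        frontier = next_frontier
    return None
```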
When searching for all topical relations of a given concept, we may assume that we try to establish whether there are topical relations between that concept and any other concept encoded in WordNet. To speed up the search, we rely on hash tables of all possible paths between any synset and any other synset from WordNet. In this way, at processing time, retrieving all the concepts that characterize a topic is a fast and robust process.

Examples

Suppose one wants to find out whether or not there are any topical relations between two words. The system will try to establish topical paths between any combination of the senses of the two words. (We denote a synset by one of its elements, followed by the part-of-speech of the synset and by the sense number of that word. For parts-of-speech, we denote nouns as n, verbs as v, adjectives as a and adverbs as r, following the same notation as WordNet.) Here are some examples:

Example 1: Is there any topical relation between diamond and polish?
Answer: there is a V-path: diamond:n#1 → polish:v#1
Explanation: polish:v#1 is part of the definition of the synset diamond:n#1, thus it is on its topical relations list.

Example 2: Is there any topical relation between ring and diamond?
Answer: there is a W-path: diamond:n#1 → gem:n#1 ← ring:n#1
Explanation: gem:n#1 is found in the definition of diamond:n#1 and is also found in the definition of ring:n#1.

Example 3: Is there a topical connection between diamond and gold?
Answer: there is a VW-path: diamond:n#1 → gem:n#1 ← ring:n#1 → gold:n#1
Explanation: diamond:n#1 has gem:n#1 in its definition, which in turn is in the definition of ring:n#1, which is defined by gold:n#1.

Example 4: Is there a possible topical relation between diamond and mine?
Answer: there is a WW-path: diamond:n#1 → gem:n#1 ← ring:n#4 → gold:n#1 ← mine:n#1
Explanation: gold:n#1 is in the definition of mine:n#1, and the rest is as before.
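Continuing the earlier sketch, such word-level queries can be answered by trying every sense combination of the two words. The paper precomputes every path into hash tables; here a Python dict that memoizes synset-pair lookups stands in for those tables, which is our simplification. `topical_path` and `PATH_NAMES` come from the sketch above.

```python
# Sketch: answer word-level queries like Examples 1-4 by trying every
# sense combination of the two words. The dict memoizing synset pairs
# stands in for the paper's precomputed hash tables of all paths;
# topical_path() and PATH_NAMES are from the earlier sketch.
from itertools import product
from nltk.corpus import wordnet as wn

_path_table = {}  # (src_name, dst_name) -> path (list of synsets) or None

def word_topical_relation(word1, word2):
    """First topical path found between any sense of word1 and any
    sense of word2, labeled with its Table 2 name."""
    for s1, s2 in product(wn.synsets(word1), wn.synsets(word2)):
        key = (s1.name(), s2.name())
        if key not in _path_table:
            _path_table[key] = topical_path(s1, s2)
        path = _path_table[key]
        if path is not None:
            return PATH_NAMES[len(path)], [s.name() for s in path]
    return None

# word_topical_relation('diamond', 'polish') should surface a V-path,
# as in Example 1 (modulo WordNet version and sense numbering).
```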
Answering Questions with Summaries

There are questions whose answers cannot be directly mined from large text collections because the information they request is spread across full documents or across several documents. Instead of returning all those documents, the answer should summarize the textual information contained in them. To generate summaries as responses to a question, the architecture of the typical QA system needs to be modified. Figure 3 illustrates the new QA architecture, capable of answering questions of the type "What is the situation with the diamond market?".

[Figure 3: Architecture of QA systems that answer a question with a summary: question processing, document processing, and a multi-document summarization system that produces the summary.]

Instead of capturing the semantics of the question by classifying it against the question stem (e.g. what, who or how), the question processing module identifies the topic indicated by the question. Topic identification is performed by first extracting the context words from the question and then mining their associated topical relations from WordNet. Topical paths that display a high degree of redundancy are grouped as separate topics and are used to process the documents. Instead of performing paragraph retrieval, as is done in current fact-based QA systems, document processing involves a topical categorization of the collection and returns either single documents that are characteristic of the topic or sets of such documents. Finally, a summarization system generates the answer to the question in the form of a summary of predefined size or compression rate.
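The topic identification step can be given as a hedged sketch: extract the content words of the question, expand each through its one-step topical relations, and keep the concepts that recur across several expansions. The content-word filter, the redundancy threshold, and the helper `gloss_concepts` (from the first sketch) are all illustrative assumptions, not values reported in the paper.

```python
# Sketch: identify the topic of a question as the concepts that recur
# across the topical relations of its context words. The length-based
# content-word filter and the redundancy threshold are illustrative
# assumptions; gloss_concepts() is from the first sketch.
from collections import Counter
from nltk.corpus import wordnet as wn

def question_topic(question, min_redundancy=2):
    context_words = [w.strip('?,.').lower() for w in question.split()
                     if len(w.strip('?,.')) > 3]   # crude content-word filter
    counts = Counter()
    for word in context_words:
        for synset in wn.synsets(word):
            counts.update(set(gloss_concepts(synset)))
    # Concepts reached from several expansions form the topic signature.
    return sorted(c for c, n in counts.items() if n >= min_redundancy)

# question_topic("What is the situation with the diamond market?")
# groups the concepts shared by the expansions of 'situation',
# 'diamond' and 'market' into one topic signature.
```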
For the purpose of summarization, we have trained a summarization system on the corpus of documents and corresponding abstracts written by four different human assessors for the DUC-2001 evaluations. Our system, called GISTexter, generates both single-document and multi-document summaries. Two assumptions stay at the core of our techniques: (1) when generating the summary of a single document, we should extract the same information a human would consider when writing an abstract of the same document; (2) when generating a multi-document summary, we should capture the textual information shared across the document set. To this end, for the multi-document summaries we have applied the CICERO Information Extraction (IE) system to generate extraction templates for those topics that were already encoded in the multi-domain knowledge of CICERO. (CICERO is an ARDA-sponsored on-going project that studies the effects of incorporating world knowledge into IE systems; it is being developed at Language Computer Corporation.) For the topics that were not covered by CICERO we had a back-up solution of gisting information by combining cohesion and coherence indicators for sentence extraction. By definition, gisting is an activity in which the information taken into account is less than the full information content available, as reported in (Resnik 1997). In our back-up solution, getting the gist of the summary information is performed by assuming that the more cohesive the gisted information is, the more essential it is to the multi-document summary. In addition, coherence cue phrases were considered for information extraction. At the core of getting the gist of the summary stay the topical relations extracted from WordNet.

The architecture of GISTexter is shown in Figure 4. Input to the system is either a single document or a set of documents.

[Figure 4: Architecture of GISTexter: an input article or article set feeds a single-document summarizer based on sentence extraction, trained on a corpus of human-written abstracts, and a multi-document summarization component.]

When a summary for a single document is sought, in the first stage key sentences are extracted, as in most current summarizers. A sentence extraction function is learned, based on features of the single-document decomposition that analyzes the features of human-written abstracts for single documents. To further filter out unnecessary information, the extracted sentences are then compressed. In the last stage a summary reduction is performed, to trim the whole summary to the length of 100 words.

If multi-document summaries are sought, two situations arise:

Case 1: the topic of the document set has been encoded in the multi-domain knowledge of CICERO, therefore the IE system extracts templates filled with information relevant to the topic of the document set. To fill extraction templates, CICERO performs a sequence of operations, implemented as cascaded finite-state automata. The texts are tokenized and the named entities are semantically disambiguated, then phrasal parses are generated and combined into noun and verb groups typical of a given topic of interest. Coreference between entities is resolved and the mentions of the events of interest are identified. Before merging information referring to the same event, coreference between entities related to the same event is resolved. The textual information that fills the templates is used for content planning and generation of the multi-document summary.

Case 2: the topic is new and unknown. In this case the human-written abstracts are decomposed to analyze the cohesive and coherence cues that determine the extraction of textual information for the multi-document summary. For the analysis of the cohesive information, we use the topical relations mined from WordNet as well as the semantic classes of the named entities provided by CICERO. The coherence information is indicated by cue phrases such as because, therefore or but, and also by very dense topical relations between the words of a text fragment.

Single-Document Summarization

To generate 100-word summaries from single textual documents, we have implemented a sequence of three operations:

1. Sentence Extraction - identifying sentences in the original document that contain information essential for the summary;

2. Sentence Compression - reducing the extracted sentences by filtering out all non-important information while preserving grammatical quality;

3. Summary Reduction - trimming the resulting summary to the required limit of 100 words per document.

The sentence extraction is based on the topical relations, whereas the sentence compression combines the topical relations with syntactic dependencies. The summary reduction simply retains only the first 100 words of the summary.
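These three operations lend themselves to a compact sketch, with deliberately simple stand-ins of our own: topic-concept density replaces the extraction function the paper learns from decomposed human abstracts, and dropping parenthesized asides replaces the syntactic-dependency-based compression.

```python
# Sketch of the three single-document operations. Topic-concept density
# stands in for the learned extraction function; dropping parenthesized
# asides stands in for the syntactic-dependency-based compression.
import re

def extract_sentences(sentences, topic_concepts, k=5):
    """1. Sentence Extraction: keep the k sentences densest in topic concepts."""
    def density(s):
        words = s.lower().split()
        return sum(w.strip('.,') in topic_concepts for w in words) / max(len(words), 1)
    best = sorted(sentences, key=density, reverse=True)[:k]
    return [s for s in sentences if s in best]   # preserve document order

def compress_sentence(sentence):
    """2. Sentence Compression (placeholder): drop parenthesized asides."""
    return re.sub(r'\s*\([^)]*\)', '', sentence)

def reduce_summary(sentences, limit=100):
    """3. Summary Reduction: retain only the first `limit` words."""
    return ' '.join(' '.join(sentences).split()[:limit])

def summarize(sentences, topic_concepts):
    extracted = extract_sentences(sentences, topic_concepts)
    return reduce_summary([compress_sentence(s) for s in extracted])
```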
Multi-Document Summarization

The generation of multi-document summaries is a two-tiered process. When the topic is one of the 20 topics encoded in CICERO, we generate the summary as a sequence of three operations:

1. Template Extraction - filling relevant information into the slots of the predefined templates corresponding to each topic;

2. Application of Summary Operators - after the common features of multiple templates are identified and the distinct features are marked up, summary operators are applied to link information from different templates. Often this results in the synthesis of new information. For example, a generalization may be formed from two independent facts. Alternatively, since almost all templates have a TIME and a LOCATION slot, contrasts can be highlighted by showing how events change along time or across different locations. The operators we encoded in GISTexter are listed in Table 3; they were defined and first introduced in (Radev and McKeown 1998);

3. Ordering of Templates and Linguistic Generation - whenever an operator is applied, a new template is generated by merging information from the arguments of the operator. All such resulting templates are scored higher than the extracted templates. Moreover, depending on the slots the operators merged, the scores have different values. In the end, information from templates with higher importance appears with priority in the resulting summary. We use the topical relations to realize only those sentence fragments that are identified by extraction patterns.

    Name                      Definition
    Change of Perspective     The same event described differently
    Contradiction             Conflicting information about the same event
    Addition                  Additional information about an event is brought forward
    Refinement                More general information about an event is refined later
    Superset/Generalization   Combine two incomplete templates
    Trend                     Several templates have common fillers for the same slot

Table 3: Summary Operators.

When the topic is not encoded in CICERO, we need to implement two additional steps:

1. Automatic Generation of the Extraction Template - in which topical relations are used to find redundant information, which is then used as the slots of the template;

2. Automatic Acquisition of Extraction Patterns - in which linguistic rules are learned that are capable of identifying information relevant to the topic in text and of extracting it by filling the slots of the template. Usually, such textual information represents only about 10% of a document.

For generating extraction templates we collect all the sentences that contain at least two concepts related to the topic. We then categorize all these concepts semantically and select the most common classes observed in the corpus. To these classes we add Named Entity classes (e.g. PERSON, LOCATION, DATE and ORGANIZATION). We consider each of these two types of classes as slots of the template. When the extraction template is ready, extraction patterns are learned by applying the bootstrapping techniques described in (Riloff and Jones 1999). If there is any document from which we cannot extract at least one template, we eliminate the slot corresponding to the least popular semantic class and continue until we have at least one template per document.
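The slot back-off loop is simple enough to sketch. Here `extract_templates` (the application of the learned extraction patterns to one document) is assumed rather than shown, and slot popularity is measured by the corpus frequency of each semantic class, both of which are our assumptions.

```python
# Sketch of the slot back-off: drop the least popular semantic class
# until every document fills at least one template. extract_templates
# (the learned patterns applied to one document) is assumed, not shown.
def fit_template_schema(documents, slots, class_frequency, extract_templates):
    """slots: semantic classes observed in the corpus plus named-entity
    classes (PERSON, LOCATION, DATE, ORGANIZATION);
    class_frequency: corpus counts that rank slot popularity."""
    slots = sorted(slots, key=lambda c: class_frequency[c], reverse=True)
    while slots:
        if all(extract_templates(doc, slots) for doc in documents):
            return slots              # every document yields >= 1 template
        slots.pop()                   # eliminate the least popular class
    raise ValueError("no slot set covers every document")
```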
The extraction techniques have the role of pointing to where the information that should be included in the summary exists. But we also need to paste this information into a coherent and cohesive summary. This form of generation was first introduced in (Jing and McKeown 2000). Instead of using knowledge represented in the form of templates alone, cut-and-paste operators provide a way of generating summaries by transforming sentences that were selected from the original documents because they were matched by extraction patterns. The cut-and-paste operators are listed in Table 4.

    Name                           Definition
    Sentence reduction             Remove extraneous phrases from a sentence
    Sentence combination           Merge text from several sentences
    Paraphrasing                   Replace phrases with their paraphrases
    Generalization/Specification   Replace phrases with more general/specific phrases
    Reordering                     Change the order of sentences

Table 4: Cut-and-Paste Operators.

Conclusions

In this paper we described a new paradigm for answering natural language questions. Whenever a question relates to a topic that is discussed extensively in a document or set of documents, the answer produced is a summary of those texts. The generation of the summary relies on mining topical relations from the WordNet knowledge base. In this way, answers are mined both from the texts and from the lexical knowledge base encoded in WordNet.

Acknowledgments

This research was partly funded by ARDA's AQUAINT program.

References

C. Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, 1998.

S. J. Green. Building Hypertext Links by Computing Semantic Similarity. IEEE Transactions on Knowledge and Data Engineering, pages 713-730, Sept/Oct 1999.

S. M. Harabagiu and D. I. Moldovan. Knowledge Processing on an Extended WordNet. In WordNet: An Electronic Lexical Database, C. Fellbaum, editor, pages 379-406. MIT Press, 1998.

S. Harabagiu, G. A. Miller and D. Moldovan. WordNet 2 - A Morphologically and Semantically Enhanced Resource. In Proceedings of SIGLEX-99, pages 1-8, University of Maryland, June 1999.

S. Harabagiu and S. Maiorano. Finding Answers in Large Collections of Texts: Paragraph Indexing + Abductive Inference. In AAAI Fall Symposium on Question Answering Systems, pages 63-71, November 1999.

S. Harabagiu, M. Pasca and S. Maiorano. Experiments with Open-Domain Textual Question Answering. In Proceedings of the 18th International Conference on Computational Linguistics (COLING-2000), pages 292-298, Saarbrücken, Germany, August 2000.

S. Harabagiu, D. Moldovan, M. Pasca, R. Mihalcea, M. Surdeanu, R. Bunescu, R. Girju, V. Rus and P. Morarescu. The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL-2001), pages 274-281, Toulouse, France, 2001.

G. Hirst and D. St-Onge. Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms. In WordNet: An Electronic Lexical Database, C. Fellbaum, editor, pages 305-332. MIT Press, 1998.

E. Hovy, L. Gerber, U. Hermjakob, C.-Y. Lin and D. Ravichandran. Toward Semantic-Based Answer Pinpointing. In Proceedings of the Human Language Technology Conference (HLT-2001), San Diego, California, 2001.

H. Jing and K. McKeown. Cut and Paste Based Text Summarization. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2000), pages 178-185, 2000.

W. Lehnert. The Process of Question Answering: A Computer Simulation of Cognition. Lawrence Erlbaum, 1978.

G. A. Miller. WordNet: A Lexical Database. Communications of the ACM, 38(11):39-41, 1995.

J. Morris and G. Hirst. Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text. Computational Linguistics, 17(1):21-48, 1991.

D. R. Radev and K. R. McKeown. Generating Natural Language Summaries from Multiple On-Line Sources. Computational Linguistics, 24(3):469-500, 1998.

P. Resnik. Evaluating Multilingual Gisting of Web Pages. In AAAI Spring Symposium on Natural Language Processing for the World Wide Web, 1997.

E. Riloff and R. Jones. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 474-479, 1999.