ABSTRACTS SELECTED FOR PRESENTATIONS AT AQUAINT 18-MONTH WORKSHOP
JUNE 2003
SAN DIEGO, CA

Blair-Goldensohn, Sasha J., Kathleen R. McKeown, and Andrew H. Schlaikjer. “Creating Paragraph-Long Answers to Definitional Questions.” Columbia University.
Ciany, Gary, Anita Kulman, Patrick Schone and Carol Van Ess-Dykema. “Insights into Multilingual and Multimedia Question Answering.” Dragon Development/U.S. Department of Defense.
Croft, Bruce and Stephen Cronen-Townsend. “Predicting Question Quality.” University of Massachusetts Amherst.
Feldman, Jerome. “Dynamic and Probabilistic Inference.” ICSI/Berkeley.
Gish, Herb. “Answer Spotting: Finding Answers in Conversational Speech.” BBN Technologies.
Hacioglu, Kadri, Sameer Pradhan, Valerie Krugler, Steven Bethard, Ashley Thornton, Wayne Ward, Dan Jurafsky, and James Martin. “Improved Semantic Role Parsing.” University of Colorado, Boulder.
Israel, David. “Natural Language Querying of the Semantic Web.” SRI International.
Kantor, Paul. “Data Fusion for Advanced Question-Answering.” University at Albany/Rutgers University.
Korelsky, Tanya. “Ontology-based Multi-modal User Interface for Question Answering in MOQA.” CoGenTex, Inc.
Nyberg, Eric. “Towards Light Semantic Processing for Question Answering.” CMU.
Ogden, W., J. McDonald, R. Zacharski, R. Chadwick. “Evaluating the habitability of Q&A with user-generated tasks.” NMSU.
Snow, Rion L. “Automatic Construction of Semantic Hierarchies.” HNC.
Starr, Barbara. “CNS Knowledge Base.” SAIC.
Weischedel, Ralph, Jinxi Xu, Ana Licuanan. “A Hybrid Approach to Answering Biographical/Definitional Questions.” BBN Technologies.
Yan, Rong, Alexander Hauptmann and Rong Jin. “Negative Pseudo-Relevance Feedback in Content-based Video Retrieval.” CMU.

Creating Paragraph-Long Answers to Definitional Questions
Sasha J. Blair-Goldensohn, Kathleen R. McKeown, and Andrew H. Schlaikjer
Department of Computer Science
Columbia University

Questions such as "What is X?" can sometimes be answered with a short phrase; often, however, the optimal answer is longer and includes information of different types such as more general related terms, background information, and historical examples. We will present DefScriber, a fully implemented component of our question-answering system that combines knowledge-based and statistical methods in forming multi-sentence answers to open-ended definitional questions of the form "What is X?". DefScriber analyzes texts from multiple sources, matching text fragments to a set of definitional predicates proposed as the knowledge-based side of our approach. On the statistical side, we use clustering techniques to detect similar sentences across multiple sources, and lexical cohesion measures to re-order the sentences in a fluent, natural definition. We will present results of a recent human evaluation of definitions generated by DefScriber from Internet documents.

Top-down techniques in DefScriber are based on key elements of definitions as identified in the literature and in our own empirical study of definitions. One such element is information on the term's category (Genus) and/or important properties (Species). For instance, category, or Genus, information about the term "Hajj" is given in the sentence "The Hajj is a type of ritual." DefScriber specifically searches for sentences that convey these definitional information types in building a definitional description.
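The abstract does not say how the definitional predicates are realized; purely as an illustrative sketch, a surface-pattern matcher for Genus-style sentences might look like the following (the patterns, function name, and example data are hypothetical, not DefScriber's actual predicate inventory):

```python
import re

# Hypothetical Genus patterns of the kind a definitional predicate might
# target ("X is a type of Y", "X is a Y", ...).
GENUS_PATTERNS = [
    r"\b{term}\b\s+is\s+a\s+(?:type|kind|form)\s+of\s+(\w+)",
    r"\b{term}\b\s+is\s+(?:a|an)\s+(\w+)",
]

def find_genus_sentences(term, sentences):
    """Return (sentence, category) pairs where a Genus pattern fires."""
    hits = []
    for sent in sentences:
        for pat in GENUS_PATTERNS:
            m = re.search(pat.format(term=re.escape(term)), sent, re.IGNORECASE)
            if m:
                hits.append((sent, m.group(1)))
                break
    return hits

sentences = [
    "The Hajj is a type of ritual performed by Muslims.",
    "Millions of pilgrims travel to Mecca each year.",
]
print(find_genus_sentences("Hajj", sentences))
# [('The Hajj is a type of ritual performed by Muslims.', 'ritual')]
```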
Since relevant information for a given definition may not be entirely modeled by predicates, we complement our top-down approach with data-driven techniques adapted from work in multi-document summarization. These techniques take advantage of redundancy on the web to identify good definitional sentences. Using centroid-based metrics and clustering, DefScriber finds similarities in documents that focus on a given term and includes them in the response. These techniques allow us to include core information in the definition even when we don't have a specific predicate to model its semantic type. Lexical cohesion measurements allow us to reorder the selected sentences to maximize the readability of the definition as a whole.

Insights into Multilingual and Multimedia Question Answering
Gary Ciany*, Anita Kulman‡, Patrick Schone‡, Carol Van Ess-Dykema‡
(*Dragon Development, ‡U.S. Department of Defense)

Until recently, question answering systems have focused on extracting answers to factoid-style questions from a single document contained in a collection of authored English newswire texts. However, from the point of view of an Intelligence Community user, such systems are far too limited to successfully respond to the range of data types and question diversity that occur in an analyst's everyday experience. These challenges include processing multi-agency data collections comprising multilingual, multi-genre, and multimedia documents; allowing questions whose answers can only be found by merging information from multiple documents or multiple languages; and questions that make reference to individual files in addition to the data collection at large. Our system is being developed with these challenges in mind. Our prototype system is being designed to answer questions with English or Spanish queries and data, where the data collections are derived from TREC Newswire, reference and errorful transcripts from CallHome and Switchboard telephone conversations, and other data sources.

In our presentation, we will provide a number of insights that we have gained as we have proceeded in our system development. We present these insights with the goal of motivating the AQUAINT community to develop systems that respond to these user requirements, as well as developing modules that can be integrated into our and other AQUAINT systems. In particular, we will share our observations in response to the following technical issues:

How do TREC-style questions differ from those that might be presented by an intelligence analyst? For example, an analyst might ask the question "Who is speaking?" This kind of question typically cannot be answered by a TREC-style QA system and requires integrating metadata into the knowledge sources accessed by the system.

How does question answering on newswire data differ from that on human transcripts of conversational speech? For example, if parsing is needed to answer the question, what degradation does one experience when tools that were developed for newswire are applied to transcripts?

If one further needs to process automatic, errorful transcripts of conversational speech, how does this affect the development of the overall system? For example, if the error rate of a transcript becomes too high, will a QA system that works well for newswire have any value on the errorful data?

What changes are needed to extend a QA system from one that processes English to one that processes Spanish?
Are these same changes adequate for extending to another language like Arabic? Do the question structures change? Do the tools need to be modified?

What issues might one experience when the multilingual corpora are combined into a conglomerated corpus? For example, an analyst may need to process documents in more than one language, none of which have been partitioned by language beforehand.

Predicting Question Quality
Bruce Croft and Stephen Cronen-Townsend
Department of Computer Science
University of Massachusetts Amherst

We develop a method for predicting the quality of passage retrieval in question answering systems. Since high-quality passages form the basis for accurate answer extraction, our method would naturally extend to prediction of an entire system's effectiveness at extracting a correct answer for each given question. Such predictions of question performance may lead to ways of automatically improving questions or guiding users in improving them. Building on previous work on predicting the performance of queries for document retrieval, we compute the clarity score for questions using passage-based collections. We show that this score is correlated with average precision in a TREC-9 based system, break down the correlation by question type, and discuss example questions. We also study a more general set of queries extracted from a Web log to help make the case for the general usefulness of performance prediction based on question clarity scores. Clarity scores may also help predict when it will be effective to expand a question with related terms. Preliminary results with an approach that calculates clarity improvements after expansion show that it may be possible to improve answer passage retrieval.

Dynamic and Probabilistic Inference
Jerome Feldman
ICSI – Berkeley

For advanced question answering, we need to compute information that is not explicitly in the text but can be inferred from it. A core task of the Quasi effort has been the development of advanced inference algorithms. Our previous work showed how separate dynamic and probabilistic methods can yield information on both the possible causes and potential consequences of an event. The new result is that we now have a unified methodology for both modes of inference, and this appears to significantly advance the state of the art.

Answer Spotting: Finding Answers in Conversational Speech
Herb Gish
BBN Technologies

Conversational speech corpora represent a unique and vitally important source of information for analysts in accomplishing their mission. In our presentation we will emphasize how an analyst may query and explore a conversational speech corpus using the Speech Navigator, our demo Answer Spotting system. A speech corpus, before processing, has little structure, and answers to questions that may be contained in the corpus can only be obtained by tedious and unguided listening to files. Previously we have shown that an analyst, by providing a small amount of transcribed and annotated data to the system, can create a structure in a speech corpus that makes it amenable to queries concerning the domains of interest to the analyst. In our latest work we have shown that significant structure can be incorporated into a speech corpus without the need for any transcriptions.
In this regime of no transcriptions, the analyst has either given the system preferences for certain types of conversations, based on examples, or has allowed the system to self-organize the speech corpus and is querying the system in an exploratory, data-mining mode. In our presentation we will briefly describe the underlying speech technologies that we have developed, but our primary emphasis will be on the exploitation of these technologies by the analyst through the use of our Speech Navigator. The Speech Navigator, through the use of audio and visual tools, facilitates the analyst's querying of the corpus of interest. We will demonstrate how an analyst might use these tools.

Improved Semantic Role Parsing
Kadri Hacioglu, Sameer Pradhan, Valerie Krugler, Steven Bethard, Ashley Thornton, Wayne Ward, Dan Jurafsky, and James Martin
University of Colorado, Boulder

One of our core technologies is our semantic role parser, which annotates input sentences with the roles played by constituents relative to target verbs. We use a set of 22 Thematic Roles such as Agent, Manner, Theme, Reason, etc. We report on the performance of three systems developed for semantic annotation of text.

1) Our baseline system was developed from the design of [Gildea & Jurafsky 2002]. This system first uses the Charniak syntactic parser [Charniak 2001] to identify syntactic constituents, and then labels each constituent with a Thematic Role (including NULL). The system estimates posterior probabilities of role assignments for the constituents given sets of features and combines the estimates to assign the final role labels. Using training and test sets from the PropBank corpus [Kingsbury & Palmer 2002], this initial system achieved a performance of 57% precision and 70% recall. After a number of improvements to this system (such as clustering target verb classes), precision and recall rose to 69% / 74%.

2) We then developed a new classifier based on Support Vector Machines. As in the baseline system, a syntactic parse is generated by the Charniak parser, and each constituent is classified. In this system, the role classification is done by an SVM. Using the same training and test sets as the previous system, the SVM classifier achieves precision and recall of 77% / 82%. This is by far the highest performance ever reported for this semantic parsing task.

3) We also experimented with a different semantic parsing algorithm based on SVMs that treats the problem as a chunking task. Rather than use a syntactic parser to identify constituents that are then classified, the SVM is used to both segment and label the semantic roles. A chunk is the sequence of words that fills a semantic role. This work extends previous work [Kudo and Matsumoto, 2000] which used SVMs to do syntactic chunking. We have not yet evaluated this system on the PropBank corpus, but initial evaluations on the FrameNet corpus are very encouraging. The potential advantages of this system are that it is very efficient and does not require a separate syntactic parser.

References:
Eugene Charniak. 2001. Immediate-head parsing for language models. In Proceedings of the 39th Annual Conference of the Association for Computational Linguistics (ACL-01), Toulouse, France.
Daniel Gildea and Daniel Jurafsky. 2002. Automatic labelling of semantic roles. Computational Linguistics, 28(3): 245-288.
Paul Kingsbury and Martha Palmer. 2002. From TreeBank to PropBank.
In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas, Canary Islands, Spain.
Taku Kudo and Yuji Matsumoto. 2000. Use of support vector learning for chunk identification. In Proceedings of the 4th Conference on Very Large Corpora, pages 142-144.

Natural Language Querying of the Semantic Web
David Israel
SRI International

ASCS (the Agent Semantic Communications Service) is a search engine for the Semantic Web. Developed by Teknowledge, Inc., it searches the entire Web and indexes all pages encoded in DAML+OIL, a markup language resulting from the DAML (DARPA Agent Markup Language) and other research programs. ASCS allows an interlocutor to make precise queries for information expressed in any of those pages. ASCS also supports certain kinds of simple inference; for instance, queries can be broadened or relaxed. Although it provides a graphical user interface, ASCS can also be used directly by web-based agents to support semantic search and ontology translation. There is, however, a significant barrier to the use of ASCS: you have to be familiar with logic and DAML+OIL to use it.

In collaboration with Teknowledge, we have integrated ASCS into Quark, our AQUAINT system, so that ASCS can be interrogated by posing natural-language queries. Quark employs a human-language parser, Gemini, to translate English queries into a logical form. This form is phrased as a conjecture to SNARK, an automatic theorem prover. In the light of the knowledge in its application-domain theory, SNARK transforms the query and decomposes it into subqueries, which are themselves further decomposed into sub-subqueries, and so on. If an appropriate combination of these subqueries can be answered, the proof is complete. By means of an answer-extraction mechanism, SNARK will deduce or compute an answer to the original query from answers to the solved subqueries. SNARK has a procedural-attachment mechanism, which enables us to link symbols from its theory to external procedures, including web-based knowledge sources such as ASCS. The effect of this is to allow information possessed by the linked source to be provided to SNARK on demand to answer a subquery, while the proof is still in progress, just as if that information were part of SNARK's theory.

We have experimented with using ASCS to query the CIA World Factbook, since much of the Factbook has been translated into DAML and made available on the Web. For example, suppose we have the query "Find the capital of an Islamic country that borders Afghanistan." This is translated into a conjecture, which posits the existence of the capital of such a country. The conjecture is submitted to SNARK's inference procedure. It is transformed and decomposed into subqueries that involve symbols such as "borders", "religion" and "capital". For each of these symbols, we have introduced procedural attachments to ASCS. Thus, from a subquery

  borders(afghanistan, ?country)

(What is a country that borders Afghanistan?) ASCS returns as one answer

  borders(afghanistan, pakistan),

which tells us that Pakistan is a country that borders Afghanistan. Similar queries to ASCS tell us that the religion of Pakistan is principally Muslim (which SNARK knows implies Islamic), and that the capital of Pakistan is Islamabad. This is the answer passed back to the interlocutor by the answer-extraction mechanism.
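As a rough, self-contained sketch of this control flow only, the decomposition can be mimicked with toy lookup tables standing in for the ASCS procedural attachments; the data and function names below are invented for illustration and are not SNARK or ASCS code:

```python
# Toy stand-ins for facts ASCS would supply from the DAML Factbook.
BORDERS = {"afghanistan": ["pakistan", "tajikistan", "iran"]}
RELIGION = {"pakistan": "Muslim", "tajikistan": "Muslim", "iran": "Muslim"}
CAPITAL = {"pakistan": "Islamabad", "tajikistan": "Dushanbe", "iran": "Tehran"}

def borders(country):      # stands in for the subquery borders(afghanistan, ?country)
    return BORDERS.get(country, [])

def is_islamic(country):   # "religion includes Muslim" is taken to imply Islamic
    return RELIGION.get(country) == "Muslim"

def capital(country):      # stands in for the subquery capital(?country, ?city)
    return CAPITAL.get(country)

def answer_query(anchor="afghanistan"):
    """Yield capitals of Islamic countries bordering the anchor country."""
    for neighbor in borders(anchor):
        if is_islamic(neighbor):
            yield neighbor, capital(neighbor)

for country, city in answer_query():
    print(f"{city} is the capital of {country}")
# Islamabad is the capital of pakistan
# Dushanbe is the capital of tajikistan
# ...
```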
Note that the use of SNARK allows us to perform inferences that may go beyond what ASCS could do itself; thus the query refers to Islamic countries, but the Factbook prefers to speak of countries whose religions include Muslim. Further answers to the question can be obtained by asking "Are there any others?", which reactivates SNARK to produce alternative proofs. Thus, we can get another answer to the same question, Dushanbe, which is the capital of Tajikistan, another Islamic country that borders Afghanistan. By repeated probing we can get all the cities known to the Factbook and other sources that satisfy the condition.

In addition to inference and natural-language querying, by integrating ASCS into Quark we obtain the ability to cooperate with other external knowledge sources in finding and presenting answers. For example, we can construct three-dimensional terrain visualizations (from satellite imagery via TerraVision) and display specialized maps (via NIMA's Geospatial Engine or Generic Mapping Tools). We can invoke the Alexandria Digital Library Gazetteer (about 6 million pages of geographic data), the TextPro information-extraction engine, and other sources to search for knowledge not available through ASCS, and combine it with ASCS information in answering a query. For instance, when Quark answers the query "Find a cave within 50 miles of an airport that is south of the capital of Afghanistan," the Factbook (via ASCS) provides the capital, the ADL Gazetteer finds the airport and cave, and another source computes the distances.

Data Fusion for Advanced Question-Answering
Paul Kantor
HITIQA Project
University at Albany/Rutgers University

Data fusion enters advanced question-answering in many ways. Our project has so far focused on its use in enriching the set of documents from which answers are to be synthesized or extracted. It is thus an application of data fusion in Information Retrieval. We have worked with three systems: Lemur, Smart and InQuery. At present, two of them, Smart and InQuery, are included in the HITIQA system. We have found that it is not generally possible to build good linear fusion rules that will work across a broad range of topics. However, we have found that fusion rules tuned to a specific topic almost always produce improvement, and sometimes significant improvement, over the best of the individual systems. We believe that in the target situation, where analysts deal with incoming streams of information, this "topic-specific" or "localized" data fusion is entirely appropriate. We find, by exploratory analysis of the data, that there is substantial potential for improving our performance even further through the use of nonlinear rules such as logical fusion rules, Support Vector Machines, and nonlinear classifiers of other types. We are also investigating whether the type of fusion rule can be correlated, a priori or even a posteriori, with features of the topic itself.

Ontology-based Multi-modal User Interface for Question Answering in MOQA
Tanya Korelsky
CoGenTex, Inc.

This report focuses on work by CoGenTex, Inc. under the project "Meaning-Oriented Question Answering with Ontological Semantics" (MOQA, in collaboration with NMSU CRL and UMBC ILIT). Since the start of the project in August 2002, we have designed and implemented an innovative web-based multi-modal question-answering interface to MOQA's fact repository.
The first version of MOQA concentrates on answering questions about people, organizations, contacts and travel. The presentation of the search results includes tables, maps, a time line, and graphs representing social networks and various types of contacts, in addition to textual summaries. All these modes are interconnected by hyperlinks, providing the analyst with comprehensive support for intuitive browsing and follow-up search. One of the distinguishing features of the interface is the use of natural language generation for textual summaries and for query validation. Currently, the analyzed queries are automatically paraphrased back to the user using a disambiguated form of language, with terms from the fact repository and resolved references, informing the user of the system's interpretation of the query. Such paraphrases are a crucial feature of ontology-based query clarification and repair dialogs. Textual summaries of search results complement graphics and tables by highlighting salient facts and events and clustering data in meaningful ways. Since the summaries are in hypertext, they enable data exploration via "drill-down". The MOQA question-answering interface is currently fully functional. It is designed to be portable between subject domains with minimal effort. The implementation supports portability by using XML-based technology and domain-independent generic text plans.

Towards Light Semantic Processing for Question Answering
Eric Nyberg
Carnegie Mellon University

This presentation focuses on a lightweight knowledge-based reasoning framework which is currently being implemented for the JAVELIN QA system. The question is mapped into a logical predicate representation, which is grounded on the lexical labels provided by the parser. Passages which are judged relevant to the question are also parsed into the same logical representation. These two representations are matched using a flexible unification strategy which assigns a match score for partial matches between representations. The passages with the best match are selected as answer candidates. At the level of individual terms (atoms), unification is based on the output of a similarity function. The similarity function can be based on semantic similarity (e.g., calculated by searching in WordNet) or on statistical models of term similarity trained on large corpora. The predicate representation and unification algorithm are implemented separately from the similarity metric, so that different metrics can be compared empirically.

Evaluating the Habitability of Q&A with User-Generated Tasks
W. Ogden, J. McDonald, R. Zacharski, P. Bernick & R. Chadwick
Computing Research Laboratory
New Mexico State University

Ultimately, the goal of evaluating question answering (Q&A) systems, and information retrieval (IR) systems in general, is to discover ways to improve the technology and to make the systems more useful. Traditionally, however, researchers have sought to develop methodologies and metrics that are better suited to comparing systems than to identifying ways of improving them. These methodologies have proven to be extremely unproductive for evaluating the usability of interactive information-retrieval systems with "real" users. In this talk we discuss the application of 'comparison-based' evaluation methodologies to the evaluation of interactive Q&A and search interfaces and show how using controlled tasks to represent information needs may be one source of the problems with these methods (e.g.,
determining user motives, judging answer completeness). For example, we have discovered that real information needs are often different from the needs expressed in the original question. We also discuss why comparison-based methods do not provide the intended control necessary to compare systems. We have begun to evaluate Q&A systems and search interfaces using 'user-generated' information needs and will discuss how this approach more directly addresses the goal of improving the habitability of Q&A systems. We will describe how this methodology has been used in our initial evaluation of Language Computer Corporation's web Q&A system. Furthermore, we will discuss how to evaluate the advantages and disadvantages of a natural language interface and how to identify ways to improve these interfaces so users will be more productive and satisfied.

Automatic Construction of Semantic Hierarchies
Rion L. Snow
HNC Software

In the past, construction of semantic hierarchies (for example, WordNet) has been performed manually, requiring great expenditure of human effort and language expertise. We present a completely automated technique for creating hierarchies of nested word groupings according to semantic content. This domain- and language-independent operation requires no prior knowledge about the vocabulary or grammar of the language, and thus we can create both general and domain-specific semantic hierarchies in many languages directly from untagged corpora. We will present multiple examples of automatically generated semantic hierarchies using the AQUAINT newswire corpus, and of more domain-specific hierarchies using the CNS corpus.

CNS Knowledge Base
Barbara Starr
SAIC

A recently initiated by-product of the SAIC AQUA program is a sharable ontology and knowledge base for the AQUAINT program. SAIC is spearheading a federation of developers to develop an ontology and populate knowledge bases by extraction from textual data provided by the Center for Nonproliferation Studies (CNS). Federation participants are Stanford KSL, Xerox PARC, Battelle, IBM, and other interested parties. The intent of this federation is to make the CNS ontology available to all AQUAINT participants.

The initial source of terms for the CNS ontology is the CNS Verity TopicSet, which is organized as a hierarchy compatible with the Verity search engine, not a formal ontology. Below the upper categorical levels the relations are word associations, phrase decompositions, exemplary instances, and aliases. Topic sets include treaties, nonproliferation organizations, dual-use materials, nuclear facilities and component technologies, chemical and biological weapons, etc.

Beginning with a base of existing ontologies, including the HPKB-Upper-level-kernel-latest, Y2-PQ, SAIC-Merged and World-Fact-Book, we are augmenting these with CNS topic set elements not previously included to create the CNS ontology. We have also drawn from other sources, such as the Terrorist Knowledge Base (TKB), developed under DARPA's High-Performance Knowledge Base (HPKB) program and continued in support of their Rapid Knowledge Formation (RKF) program. The Teknowledge LGPL WMD ontology is also under consideration. Three knowledge base segments are to be maintained in KIF, resident on the Stanford/KSL Ontolingua Server. (A move towards DAML for information storage is also under consideration, as this would enable use of the JTP DAML reasoner where possible; restrictions due to the lack of expressivity in DAML apply.)
Three different extractors will be used to populate the possibly overlapping segments. The first extraction is provided by the NMSU/UMBC/Onyx MOCA system and then processed by the SAIC mapper/translator to produce a KIF ontology. KSL is developing the second extractor, which works on formatted and semi-formatted data. Third, the shared ontology is being made available to IBM, who will in turn be performing information extraction of relations over the CNS data. The final product will be an expanded ontology, containing knowledge on terrorist groups and acts and on non-proliferation issues, which would be of value to researchers involved with such concerns, such as ARDA's Novel Intelligence from Massive Data (NIMD) program and DARPA's TKB and Total Information Awareness (TIA) programs, as well as AQUAINT participants. Browsing of the CNS Ontology will be available during the demonstrations.

A Hybrid Approach to Answering Biographical/Definitional Questions
Ralph Weischedel, Jinxi Xu, Ana Licuanan
BBN Technologies

This paper focuses on our approach to generating extended answers to biographical/definitional questions. The approach combines the following components: information retrieval, to judge the relevance of source passages/sentences; information extraction, e.g., to find all the mentions of a person, to find relations appropriate for social network analysis, to find organizational positions and titles, etc.; linguistic analyses to find important descriptions, e.g., from appositive constructions, copula ("be", "become") clauses, and relative clauses where the target entity is the focus; and summarization (compression) of the information. Each of the components can and will be improved. This paper will show the contribution of each component separately by example answers, and the result of the hybrid of these technologies. Though there is no established evaluation metric yet for this class of questions, we will also report on the contribution of each component using the BLEU scorer, previously used in machine translation evaluations, as well as our subjective evaluation.

Negative Pseudo-Relevance Feedback in Content-based Video Retrieval
Rong Yan, Alexander Hauptmann and Rong Jin
Informedia Project
Carnegie Mellon University
Pittsburgh, PA 15213

Video information retrieval requires a system to find a visual answer to a question which may be represented simultaneously in different ways through a text description, audio, still images and/or video sequences. We present a novel approach that uses pseudo-relevance feedback from retrieved answers that are NOT similar to the query items, without requiring further user feedback. We provide insight into this approach using a statistical model and suggest a score combination scheme via posterior probability estimation. An evaluation on the 2002 TREC Video Track queries shows that this technique can improve video retrieval performance on a real collection. Negative pseudo-relevance feedback shows great promise for very difficult multimedia retrieval tasks, especially when combined with other different retrieval algorithms.
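The abstract does not give the details of the score combination; purely as a generic sketch of the negative pseudo-relevance-feedback idea (cosine similarities and an ad hoc linear penalty in place of the posterior-probability scheme described above, with invented parameter names), a re-ranking step might look like this:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rerank_with_negative_prf(query_vec, shot_vecs, n_negative=50, alpha=0.3):
    """Generic negative pseudo-relevance feedback over feature vectors.

    Bottom-ranked shots from an initial similarity ranking are treated as
    pseudo-negative examples; each shot's final score is its query similarity
    penalized by its similarity to the negative centroid. The weighting and
    parameters are illustrative, not the Informedia system's actual scheme.
    """
    initial = [cosine(query_vec, v) for v in shot_vecs]
    order = np.argsort(initial)                      # ascending: worst first
    negatives = [shot_vecs[i] for i in order[:n_negative]]
    neg_centroid = np.mean(negatives, axis=0)
    rescored = [s - alpha * cosine(neg_centroid, v)
                for s, v in zip(initial, shot_vecs)]
    return list(np.argsort(rescored)[::-1])          # best first

# Toy usage with random vectors standing in for video-shot features.
rng = np.random.default_rng(0)
shots = [rng.normal(size=16) for _ in range(200)]
query = rng.normal(size=16)
print(rerank_with_negative_prf(query, shots)[:5])
```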