From: AAAI-80 Proceedings. Copyright © 1980, AAAI (www.aaai.org). All rights reserved.

FAILURES IN NATURAL LANGUAGE SYSTEMS: APPLICATIONS TO DATA BASE QUERY SYSTEMS

Eric Mays
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104

ABSTRACT

A significant class of failures in interactions with data base query systems is attributable to misconceptions or incomplete knowledge regarding the domain of discourse on the part of the user. This paper describes several types of user failures, namely, intensional failures of presumptions. These failures are distinguished from extensional failures of presumptions since they are dependent on the structure rather than the contents of the data base. A knowledge representation has been developed for the recognition of intensional failures that are due to the assumption of non-existent relationships between entities. Several other intensional failures which depend on more sophisticated knowledge representations are also discussed. Appropriate forms of corrective behavior are outlined which would enable the user to formulate queries directed to the solution of his/her particular task and compatible with the knowledge organization.

I INTRODUCTION

An important aspect of natural language interaction with intelligent systems is the ability to deal constructively with failure. Failures can be viewed as being of two types. One can be ascribed to a lack of syntactic, semantic, or pragmatic coverage by the system. This will be termed system failure, and manifests itself in the inability of the system to assign an interpretation to the user's input. Recent work has been done in responding to these types of failures; see, for example, Weischedel and Black [8] and Kwasny and Sondheimer [3]. A second class of failures may be termed user failures. A user failure arises when his/her beliefs about the domain of discourse diverge from those of the system.**

To avoid confusion, a clear distinction should be made between failures and errors. An error occurs when the system's response to an input is incorrect. Errors generally manifest themselves as incorrect resolution of ambiguities in word sense or modifier placement. These errors would usually be detected by the user when presented with a paraphrase that differs in a meaningful way from the original input [6]. More serious errors result from incorrect coding of domain knowledge, and are often undetectable by the user.

This paper concerns itself with the recognition and correction of user failures in natural language data base query systems -- in particular, failures that arise from the user's beliefs about the structure, rather than the content, of the data base. The data base model that has been implemented for the recognition and correction of simple user failures about the data base structure is presented. Several other failures which depend on more sophisticated knowledge representation are also discussed.

*This work is partially supported by a grant from the National Science Foundation, NSF-MCS 79-08401.

**Some user beliefs regarding the domain of discourse are implicitly encoded in questions posed to the system. The beliefs held by the system are explicit in its knowledge representation, either procedurally or declaratively.

II PRESUPPOSITION AND PRESUMPTION

The linguistic notion of presupposition provides a formal basis for the inference of a significant class of user beliefs. There is a less restrictive notion, presumption, which allows the inference of a larger class of user beliefs, namely, that knowledge which the user must assume when posing a question. A presupposition is a proposition that is entailed by all the direct answers of a question.*** A presumption is either a presupposition or it is a proposition that is entailed by all but one of the direct answers of a question [2]. Hence, a presupposition is a stronger version of presumption, and a presupposition is a presumption by definition.
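The two definitions can be illustrated with a short sketch. It assumes, purely for illustration, that each direct answer of a question is represented by the set of propositions it entails; the function names, the dictionary encoding, and the sample answers are hypothetical and are not part of the implemented system.

```python
from typing import Dict, Set

# Hypothetical encoding: direct answer -> propositions that answer entails.
Entailments = Dict[str, Set[str]]


def is_presupposition(proposition: str, entailments: Entailments) -> bool:
    """True when every direct answer entails the proposition."""
    return all(proposition in props for props in entailments.values())


def is_presumption(proposition: str, entailments: Entailments) -> bool:
    """True when at most one direct answer fails to entail the proposition.

    Every presupposition therefore also counts as a presumption.
    """
    misses = sum(1 for props in entailments.values() if proposition not in props)
    return misses <= 1


# "Which faculty members teach CSE110?" with assumed direct answers.
which_faculty: Entailments = {
    "John": {"Faculty members teach CSE110."},
    "Sue": {"Faculty members teach CSE110."},
    "no one": set(),
}

# "When does John take CSE110?" with assumed direct answers.
when_john: Entailments = {
    "in the fall": {"John takes CSE110."},
    "in the spring": {"John takes CSE110."},
}
```

With these assumed answer sets, "Faculty members teach CSE110." is a presumption but not a presupposition of the first question, because the answer "no one" does not entail it, while "John takes CSE110." is a presupposition of the second.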
For example, question (1a) has several direct answers, such as "John", "Sue", etc., and, of course, "no one". Proposition (1b) is entailed by all the direct answers to (1a) except the last one, i.e., "no one". Therefore, (1b) is a presumption of (1a). Proposition (1d) is a presupposition of (1c), since it is entailed by all of the question's direct answers.

(1a) Which faculty members teach CSE110?
(1b) Faculty members teach CSE110.
(1c) When does John take CSE110?
(1d) John takes CSE110.

***The complete definition of presupposition includes the condition that the negation of a (question, direct answer) pair entails the presupposition.

Presumptions can be classified on the basis of what is asserted -- i.e., an "intensional" statement about the structure of the data base or an "extensional" statement about its contents. Thus an extensional failure of a presumption occurs based on the current contents of the data base, while an intensional failure occurs based on the structure or organization. For example, question (2a) presumes propositions (2b), (2c), and (2d). Presumption (2b) is subject to intensional failure if the data base does not allow for the relation "teach" to hold between "faculty" and "course" entities. An extensional failure of presumption (2b) would occur if the data base did not contain any "faculty member" that "teaches" a "course". Also note that the truth of (2b) is a pre-condition for the truth of (2c).

(2a) Which faculty members teach CSE110?
(2b) Faculty members teach courses.
(2c) Faculty members teach CSE110.
(2d) CSE110 is a course.

Although a presumption which fails intensionally will of necessity fail extensionally, it is important to differentiate between them, since an intensional failure that occurs will occur consistently for a given data base structure, whereas extensional failure is a transitory function of the current contents of the data base. This is not meant to imply that a data base structure is not subject to change. However, such a change usually represents a fundamental modification of the organization of the enterprise that is modelled. One can observe that structural modifications occur over long periods of time (many months to years, for example), while the data base contents are subject to change over relatively shorter periods of time (hourly, daily, or monthly, for example).

Kaplan [2] has investigated the computation and correction of extensional failures of presumptions. The approach taken there involves accessing the contents of the data base to determine if a presumption has a non-empty extension. The remainder of this paper discusses several ways a presumption might be subject to intensional failure. These inferences are made from the structural information of the data base.

III DATA BASE MODEL

In order to recognize failures of presumptions concerning the structure of the data base, it is necessary to use a robust data model. The discussion here will assume a data base model similar to that proposed by Lee and Gerritsen [4], which incorporates the generalization dimension developed by Smith and Smith [7] into the entity-relationship model [1]. Basically, entities participate in relationships along two orthogonal dimensions, aggregation (among dissimilar entities) and generalization (among similar entities), as well as having attributes that assume values. As an example of this type of structure, consider the data base model fragment for a typical university in figure 1. Entity sets are designated by ovals, aggregation relationships by diamonds, and generalization relationships by edges from the super-entity set to the sub-entity set. The parallel arcs denote mutual exclusion. Mutual exclusion is used to infer the difference between "men that are also faculty" (a possibly non-empty set) and "men that are also women" (an empty set by definition), for example, given figure 1.
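A data model of this kind might be sketched as follows. Since the figure itself is not available here, the hierarchy below is an assumption pieced together from the examples in the text, and the class and method names (`Schema`, `ancestors`, `may_overlap`) are hypothetical. The sketch only shows how a mutual exclusion declared between two entity sets carries over to their sub-entity sets.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple


@dataclass
class Schema:
    # Generalization edges: sub-entity set -> its super-entity sets.
    supers: Dict[str, Set[str]] = field(default_factory=dict)
    # Unordered pairs of mutually exclusive entity sets.
    exclusive: Set[FrozenSet[str]] = field(default_factory=set)
    # Aggregation relationships as (relation, entity set, entity set).
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)

    def ancestors(self, entity: str) -> Set[str]:
        """Return the entity set together with all of its super-entity sets."""
        seen: Set[str] = set()
        stack = [entity]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.supers.get(current, ()))
        return seen

    def may_overlap(self, a: str, b: str) -> bool:
        """False when the structure alone forces the intersection to be empty."""
        return not any(
            frozenset((x, y)) in self.exclusive
            for x in self.ancestors(a)
            for y in self.ancestors(b)
        )


# Assumed university fragment (not a transcription of figure 1).
university = Schema(
    supers={
        "men": {"people"}, "women": {"people"},
        "teachers": {"people"}, "students": {"people"},
        "faculty": {"teachers"},
        "grads": {"students"}, "undergrads": {"students"},
    },
    exclusive={frozenset(("men", "women")), frozenset(("faculty", "students"))},
    relations={
        ("teach", "faculty", "courses"),
        ("take", "students", "courses"),
        ("advise", "faculty", "students"),
    },
)
```

Under these assumptions, `university.may_overlap("men", "faculty")` is true, while `university.may_overlap("men", "women")` and `university.may_overlap("faculty", "grads")` are false, the last because "grads" inherits the exclusion declared on "students".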
The distinction between a possibly non-empty set and a necessarily empty one can be made by prohibiting the traversal of a path in the data model that includes two entity sets which are mutually exclusive. Furthermore, the path in the generalization dimension is restricted to "upward" traversals followed by "downward" traversals. An upward (downward) traversal is from a sub-entity (super-entity) set to a super-entity (sub-entity) set. This restriction is made to prevent over-specialization of an entity set when traversing downward edges. The set of inferences that can be made in the presence of this restriction is not overly constrained, since any two entity sets that have a common intersection (sub-entity set) will also have a common union (super-entity set).*

IV INTENSIONAL FAILURES

A. Non-existent Relationships

The most basic intensional failure that can occur is the presumption of a non-existent relationship between entity sets. In the university data base model fragment given above, such a failure occurs in the question "Which faculty take courses?". This question presumes that a "take" relationship could exist between "faculty" and "courses" entities. Since no such relationship can be established, that presumption has failed intensionally. Recognizing the failure is only part of the problem -- it is also useful to provide the user with related intensional knowledge. Given a relation R, entities X and Y, and a failed presumption (R X Y), salient intensional knowledge can be found by abstracting on R, X, or Y to create a new relation. For example, consider the following exchange:

Q: "Which faculty take courses?"
A: "I don't believe that faculty can take courses. Faculty teach courses. Students take courses."

A similar failure occurs in the presumption of a non-existent attribute of an entity set. For example, "What is the cost of all courses taught by teaching assistants?" incorrectly presumes that, in this data base, "cost" is an attribute of "courses".

B. Inapplicable Functions

Intensional failures may also occur when attempting to apply a function on a domain. The question "What is the average grade in CSE110?" will cause no processing problems provided grades are assigned over the real numbers. But if grades ranged from A to F, then the system should inform the user that averages cannot be computed over character data. (Note that the clever system designer might trap this case and make numerical assignments to the letter grades.) A more significant aspect is the notion of a function being meaningful over a particular domain. That is, certain operations, even though they might be applicable, may not be meaningful. An example would be "average social security number". The user who requested such a computation does not really understand what the data is supposed to represent. In such cases a short explanation regarding the function of the data would be appropriate. To achieve this type of behavior, of course, the data base model must be augmented to include type and functional information.

C. Higher Order Failures

The mutual exclusion operator allows the detection of a failure when the question specifies a restriction of an entity set by any two of its mutually exclusive sub-entity sets. For example, "Which teachers that advise students take courses?" presumes that there could be some "teachers" that are both "faculty" and "students". Since this situation could never arise, given the structure in figure 1, it should be communicated to the user as an intensional failure.

If an exhaustiveness operator is incorporated as well, unnecessary restrictions of an entity set by disjunction of all of its exhaustive sub-entity sets can be detected. Although this would not constitute a failure, it does indicate that there is some misconception regarding the structure of the data base on the part of the user. If the sub-entity sets were known to be exhaustive by the user, there would be no reason to make the restriction.
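The recognition and correction of a failed presumption (R X Y), as described under Non-existent Relationships, might be sketched as follows. The relation triples are assumptions standing in for the aggregation relationships of figure 1, the function names are hypothetical, and the generalization hierarchy is ignored to keep the sketch short; this is not the implemented system.

```python
from typing import List, Optional, Set, Tuple

Relation = Tuple[str, str, str]  # (R, X, Y)

# Assumed aggregation relationships for the university fragment.
RELATIONS: Set[Relation] = {
    ("teach", "faculty", "courses"),
    ("take", "students", "courses"),
    ("advise", "faculty", "students"),
}


def related_knowledge(r: str, x: str, y: str,
                      relations: Set[Relation]) -> List[Relation]:
    """Abstract on R, then X, then Y, keeping the other two components fixed."""
    on_r = sorted(t for t in relations if t[1] == x and t[2] == y and t[0] != r)
    on_x = sorted(t for t in relations if t[0] == r and t[2] == y and t[1] != x)
    on_y = sorted(t for t in relations if t[0] == r and t[1] == x and t[2] != y)
    return on_r + on_x + on_y


def respond(r: str, x: str, y: str,
            relations: Set[Relation] = RELATIONS) -> Optional[str]:
    """Return a corrective response, or None if (R X Y) exists in the structure."""
    if (r, x, y) in relations:
        return None  # the presumption does not fail intensionally
    parts = [f"I don't believe that {x} can {r} {y}."]
    for rel, subj, obj in related_knowledge(r, x, y, relations):
        parts.append(f"{subj.capitalize()} {rel} {obj}.")
    return " ".join(parts)
```

For the question "Which faculty take courses?", `respond("take", "faculty", "courses")` abstracts on R to find "teach" and on X to find "students", giving a response in the style of the exchange shown in the text.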
As an example, the addition of the fact that "grads" and "undergrads" were exhaustive sub-entity sets of "students" would cause this misconception to arise in the question "Which students are either grads or undergrads?". The following behavior would be desired in these cases:

Q: "Which teachers that advise students take courses?"
A: "Faculty advise students. Students take courses. I don't believe that a teacher can be both a faculty member and a student."

*See [5] for a more detailed description.

D. Data Currency

Some failures depend on the currency of the data. One such example occurs in a naval data base about ships, subs, and aircraft. The question "What is the position of the Kitty Hawk?" presumes that timely data is maintained. Actually, positions of friendly vessels are current, while those of enemy ships might be hopelessly out of date. In this case, the failures would be extensional since the last update of the attribute must be checked for currency. It may be the case that some data is current while other data is not. However, the update processing time lag from actual event occurrence to capture in the data base might be sufficiently long that such presumptions might be subject to intensional failure. Thus the user could be made aware that current data was never available.

V CONCLUSION

In this paper we have discussed several kinds of failures of presumptions that depend on knowledge about the structure or organization of the data base. It is important to distinguish between structure and content, since there is a significant difference in the rate at which they change. When responding to intensional failures of presumptions, simply pointing out the failure is in most cases inadequate. The user must also be informed with regard to related knowledge about the structure of the data base in order to formulate queries directed at solving his/her particular problem. The technique described for recognizing intensional failures that are due to the presumption of non-existent relationships between entities and attributes of entities has been implemented. Further work is aimed at developing knowledge representations for temporal and functional information. We hope to eventually develop a general account of user failures in natural language query systems.

ACKNOWLEDGEMENTS

I would like to thank Peter Buneman, Aravind Joshi, Kathy McKeown, and Bonnie Webber for their valuable comments on an earlier draft of this paper.

REFERENCES

[1] Chen, P.P.S., "The Entity-Relationship Model -- Towards a Unified View of Data", ACM Transactions on Database Systems, Vol. 1, No. 1, March 1976.

[2] Kaplan, S.J., Cooperative Responses From a Portable Natural Language Data Base Query System, Ph.D. Dissertation, Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA., 1979.

[3] Kwasny, S.C., and Sondheimer, N.K., "Ungrammaticality and Extra-Grammaticality in Natural Language Understanding Systems", Proceedings of the Conference of the Association for Computational Linguistics, La Jolla, CA., August 1979.

[4] Lee, R.M. and Gerritsen, R., "A Hybrid Representation for Database Semantics", Working Paper 78-01-01, Decision Sciences Department, University of Pennsylvania, 1978.

[5] Mays, E., "Correcting Misconceptions About Data Base Structure", Proceedings of the Conference of the Canadian Society for Computational Studies of Intelligence, Victoria, British Columbia, Canada, May 1980.

[6] McKeown, K., "Paraphrasing Using Given and New Information in a Question-Answer System", Proceedings of the Conference of the Association for Computational Linguistics, La Jolla, CA., August 1979.

[7] Smith, J.M. and Smith, D.C.P., "Database Abstractions: Aggregation and Generalization", ACM Transactions on Database Systems, Vol. 2, No. 2, June 1977.

[8] Weischedel, R.M., and Black, J., "Responding Intelligently to Unparseable Sentences", American Journal of Computational Linguistics, Vol. 6, No. 2, April-June 1980.