From: AAAI-80 Proceedings. Copyright © 1980, AAAI (www.aaai.org). All rights reserved.
FAILURES IN NATURAL LANGUAGE SYSTEMS:
APPLICATIONS TO DATA BASE QUERY SYSTEMS
Eric Mays
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104
ABSTRACT
A significant class of failures in interactions with data base query
systems is attributable to misconceptions or incomplete knowledge
regarding the domain of discourse on the part of the user. This paper
describes several types of user failures, namely, intensional failures
of presumptions. These failures are distinguished from extensional
failures of presumptions since they are dependent on the structure
rather than the contents of the data base. A knowledge representation
has been developed for the recognition of intensional failures that are
due to the assumption of non-existent relationships between entities.
Several other intensional failures which depend on more sophisticated
knowledge representations are also discussed. Appropriate forms of
corrective behavior are outlined which would enable the user to
formulate queries directed to the solution of his/her particular task
and compatible with the knowledge organization.
I INTRODUCTION

An important aspect of natural language interaction with intelligent
systems is the ability to deal constructively with failure. Failures
can be viewed as being of two types. One can be ascribed to a lack of
syntactic, semantic, or pragmatic coverage by the system. This will be
termed system failure, and manifests itself in the inability of the
system to assign an interpretation to the user's input. Recent work has
been done in responding to these types of failures; see, for example,
Weischedel and Black [8], and Kwasny and Sondheimer [3]. A second class
of failures may be termed user failures. A user failure arises when the
user's beliefs about the domain of discourse diverge from those of the
system.**

To avoid confusion, a clear distinction should be made between failures
and errors. An error occurs when the system's response to an input is
incorrect. Errors generally manifest themselves as incorrect resolution
of ambiguities in word sense or modifier placement. These errors would
usually be detected by the user when presented with a paraphrase that
differs in a meaningful way from the original input [6]. More serious
errors result from incorrect coding of domain knowledge, and are often
undetectable by the user.

This paper concerns itself with the recognition and correction of user
failures in natural language data base query systems -- in particular,
failures that arise from the user's beliefs about the structure, rather
than the content, of the data base. The data base model that has been
implemented for the recognition and correction of simple user failures
about the data base structure is presented. Several other failures
which depend on more sophisticated knowledge representation are also
discussed.

*This work is partially supported by a grant from the National Science
Foundation, NSF-W 79-08401.
**Some user beliefs regarding the domain of discourse are implicitly
encoded in questions posed to the system. The beliefs held by the
system are explicit in its knowledge representation, either
procedurally or declaratively.

II PRESUPPOSITION AND PRESUMPTION
The linguistic notion of presupposition provides a formal basis for the
inference of a significant class of user beliefs. There is a less
restrictive notion, presumption, which allows the inference of a larger
class of user beliefs, namely, that knowledge which the user must
assume when posing a question.
A presupposition is a proposition that is entailed by all the direct
answers of a question.*** A presumption is either a presupposition or
it is a proposition that is entailed by all but one of the direct
answers of a question [2]. Hence, a presupposition is a stronger
version of presumption, and a presupposition is a presumption by
definition. For example, question (1a) has several direct answers such
as "John", "Sue", etc., and, of course, "no one". Proposition (1b) is
entailed by all the direct answers to (1a) except the last one, i.e.,
"no one". Therefore, (1b) is a presumption of (1a). Proposition (1d) is
a presupposition of (1c), since it is entailed by all of the question's
direct answers.

***The complete definition of presupposition includes the condition
that the negation of a question, direct answer pair entails the
presupposition.
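The two definitions can be made concrete with a small sketch. It assumes, purely for illustration, that each direct answer is represented by the set of propositions it entails. The answer sets, the proposition strings, and the function names are hypothetical and are not taken from the system described in this paper.

```python
def count_non_entailing(proposition, direct_answers):
    """Count the direct answers that do NOT entail the proposition."""
    return sum(1 for entailed in direct_answers.values()
               if proposition not in entailed)


def is_presupposition(proposition, direct_answers):
    """A presupposition is entailed by all the direct answers."""
    return count_non_entailing(proposition, direct_answers) == 0


def is_presumption(proposition, direct_answers):
    """A presumption is entailed by all, or all but one, direct answers."""
    return count_non_entailing(proposition, direct_answers) <= 1


# "Which faculty members teach CSE110?" (toy answer set, an assumption)
ANSWERS_WHICH = {
    "John": {"John teaches CSE110", "Faculty members teach CSE110"},
    "Sue": {"Sue teaches CSE110", "Faculty members teach CSE110"},
    "no one": set(),
}

# "When does John take CSE110?" (toy answer set, an assumption)
ANSWERS_WHEN = {
    "in the fall": {"John takes CSE110"},
    "in the spring": {"John takes CSE110"},
}

print(is_presumption("Faculty members teach CSE110", ANSWERS_WHICH))     # True
print(is_presupposition("Faculty members teach CSE110", ANSWERS_WHICH))  # False
print(is_presupposition("John takes CSE110", ANSWERS_WHEN))              # True
```

Under these assumptions, "Faculty members teach CSE110" is a presumption but not a presupposition of the first question, because the answer "no one" does not entail it, while "John takes CSE110" is a presupposition (and hence also a presumption) of the second.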
(1a) Which faculty members teach CSE110?
(1b) Faculty members teach CSE110.
(1c) When does John take CSE110?
(1d) John takes CSE110.

Presumptions can be classified on the basis of what is asserted --
i.e., an "intensional" statement about the structure of the data base
or an "extensional" statement about its contents. Thus an extensional
failure of a presumption occurs based on the current contents of the
data base, while an intensional failure occurs based on the structure
or organization. For example, question (2a) presumes propositions (2b),
(2c), and (2d). Presumption (2b) is subject to intensional failure if
the data base does not allow for the relation "teach" to hold between
"faculty" and "course" entities. An extensional failure of presumption
(2b) would occur if the data base did not contain any "faculty member"
that "teaches" a "course". Also note that the truth of (2b) is a
pre-condition for the truth of (2c).

(2a) Which faculty members teach CSE110?
(2b) Faculty members teach courses.
(2c) Faculty members teach CSE110.
(2d) CSE110 is a course.

Although a presumption which fails intensionally will of necessity fail
extensionally, it is important to differentiate between them, since an
intensional failure that occurs will occur consistently for a given
data base structure, whereas extensional failure is a transitory
function of the current contents of the data base. This is not meant to
imply that a data base structure is not subject to change. However,
such a change usually represents a fundamental modification of the
organization of the enterprise that is modelled. One can observe that
structural modifications occur over long periods of time (many months
to years, for example), while the data base contents are subject to
change over relatively shorter periods of time (hourly, daily, or
monthly, for example).

Kaplan [2] has investigated the computation and correction of
extensional failures of presumptions. The approach taken there involves
accessing the contents of the data base to determine if a presumption
has a non-empty extension. The remainder of this paper discusses
several ways a presumption might be subject to intensional failure.
These inferences are made from the structural information of the data
base.
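The difference between an intensional and an extensional failure of a presumption can be sketched as follows. The sketch assumes a hypothetical representation in which the structure is a set of permitted (entity set, relation, entity set) triples and the contents are the instances currently stored for each triple. `SCHEMA`, `check_presumption`, and the sample data are illustrative names, not part of the implementation discussed in this paper.

```python
# Structure: which relations may hold between which entity sets (assumed).
SCHEMA = {
    ("faculty", "teach", "courses"),
    ("students", "take", "courses"),
}


def check_presumption(triple, schema, contents):
    """Classify a presumed relationship against structure, then contents."""
    if triple not in schema:
        # The structure rules the relationship out, whatever is stored.
        return "intensional failure"
    if not contents.get(triple):
        # The structure allows it, but no instance is currently stored.
        return "extensional failure"
    return "holds"


teach = ("faculty", "teach", "courses")
take = ("faculty", "take", "courses")

today = {teach: {("Smith", "CSE110")}}  # hypothetical current contents
empty = {}                              # the same structure with no data

print(check_presumption(teach, SCHEMA, today))  # holds
print(check_presumption(teach, SCHEMA, empty))  # extensional failure
print(check_presumption(take, SCHEMA, today))   # intensional failure
print(check_presumption(take, SCHEMA, empty))   # intensional failure
```

The last two calls give the same result for both sets of contents: the intensional failure depends only on the structure, whereas the extensional result changes as the contents change.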
III DATA BASE MODEL
In order to recognize failures of presumptions concerning the structure
of the data base, it is necessary to use a robust data model. The
discussion here will assume a data base model similar to that proposed
by Lee and Gerritsen [4], which incorporates the generalization
dimension developed by Smith and Smith [7] into the entity-relationship
model [1]. Basically, entities participate in relationships along two
orthogonal dimensions, aggregation (among dissimilar entities) and
generalization (among similar entities), as well as having attributes
that assume values. As an example of this type of structure consider
the data base model fragment for a typical university in figure 1.
Entity sets are designated by ovals, aggregation relationships by
diamonds, and generalization relationships by edges from the
super-entity set to the sub-entity set. The parallel arcs denote mutual
exclusion.
Mutual exclusion is used to infer the difference between "men that are
also faculty" (a possibly non-empty set) and "men that are also women"
(an empty set by definition), for example, given figure 1. This
distinction can be made by prohibiting the traversal of a path in the
data model that includes two entity sets which are mutually exclusive.
Furthermore, the path in the generalization dimension is restricted to
"upward" traversals followed by "downward" traversals. An upward
(downward) traversal is from a sub-entity (super-entity) set to a
super-entity (sub-entity) set. This restriction is made to prevent
over-specialization of an entity set when traversing downward edges.
The set of inferences that can be made in the presence of this
restriction is not overly constrained, since any two entity sets that
have a common intersection (sub-entity set) will also have a common
union (super-entity set).*
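The path restriction can be illustrated with a minimal sketch. Figure 1 is not reproduced here, so the hierarchy and the exclusion constraints below are a guessed fragment, and every name is hypothetical. As a further simplification, each entity set is given at most one super-entity set, which the model itself does not require.

```python
from itertools import combinations

# Hypothetical generalization edges: sub-entity set -> super-entity set.
SUPER = {
    "men": "people",
    "women": "people",
    "faculty": "people",
    "students": "people",
    "grads": "students",
    "undergrads": "students",
}

# Hypothetical mutual-exclusion constraints (the "parallel arcs").
EXCLUSIVE = {
    frozenset(("men", "women")),
    frozenset(("faculty", "students")),
    frozenset(("grads", "undergrads")),
}


def upward_chain(entity):
    """The entity set followed by its successive super-entity sets."""
    chain = [entity]
    while chain[-1] in SUPER:
        chain.append(SUPER[chain[-1]])
    return chain


def up_then_down_path(start, goal):
    """A path that only climbs from start and then only descends to goal."""
    up, down = upward_chain(start), upward_chain(goal)
    for i, node in enumerate(up):
        if node in down:
            j = down.index(node)
            return up[: i + 1] + down[:j][::-1]
    return None


def may_overlap(a, b):
    """True if a permitted path links a and b (no exclusive pair on it)."""
    path = up_then_down_path(a, b)
    if path is None:
        return False
    return not any(frozenset(pair) in EXCLUSIVE
                   for pair in combinations(path, 2))


print(up_then_down_path("men", "faculty"))  # ['men', 'people', 'faculty']
print(may_overlap("men", "faculty"))        # True
print(may_overlap("men", "women"))          # False
print(may_overlap("faculty", "grads"))      # False
```

In this sketch the path from "faculty" to "grads" passes through "students", so the exclusion between "faculty" and "students" is enough to rule it out without stating a separate constraint for "grads".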
IV INTENSIONAL FAILURES

A. Non-existent Relationships

The most basic intensional failure that can occur is the presumption of
a non-existent relationship between entity sets. In the university data
base model fragment given above, such a failure occurs in the question
"Which faculty take courses?". This question presumes that a "take"
relationship could exist between "faculty" and "courses" entities.
Since no such relationship can be established, that presumption has
failed intensionally. Recognizing the failure is only part of the
problem -- it is also useful to provide the user with related
intensional knowledge. Given a relation R, entities X and Y, and a
failed presumption (R X Y), salient intensional knowledge can be found
by abstracting on R, X, or Y to create a new relation. For example,
consider the following exchange:

Q: "Which faculty take courses?"
A: "I don't believe that faculty can take courses.
    Faculty teach courses.
    Students take courses."

A similar failure occurs in the presumption of a non-existent attribute
of an entity set. For example, "What is the cost of all courses taught
by teaching assistants?" incorrectly presumes that in this data base,
"cost" is an attribute of "courses".

B. Inapplicable Functions

Intensional failures may also occur when attempting to apply a function
on a domain. The question, "What is the average grade in CSE110?", will
cause no processing problems provided grades are assigned over the real
numbers. But if grades ranged from A to F, then the system should
inform the user that averages can not be performed on character data.
(Note that the clever system designer might trap this case and make
numerical assignments to the letter grades.) A more significant aspect
is the notion of a function being meaningful over a particular domain.
That is, certain operations, even though they might be applicable, may
not be meaningful. An example would be "average social security
number". The user who requested such a computation does not really
understand what the data is supposed to represent. In such cases a
short explanation regarding the function of the data would be
appropriate. To achieve this type of behavior, of course, the data base
model must be augmented to include type and functional information.

C. Higher Order Failures
The mutual exclusion operator allows the detection of a failure when
the question specifies a restriction of an entity set by any two of its
mutually exclusive sub-entity sets. For example, "Which teachers that
advise students take courses?" presumes that there could be some
"teachers" that are both "faculty" and "students". Since this situation
could never arise, given the structure in figure 1, it should be
communicated to the user as an intensional failure. If an
exhaustiveness operator is incorporated as well, unnecessary
restrictions of an entity set by disjunction of all of its exhaustive
sub-entity sets can be detected. Although this would not constitute a
failure, it does indicate that there is some misconception regarding
the structure of the data base on the part of the user. If the
sub-entity sets were known to be exhaustive by the user, there would be
no reason to make the restriction. As an example, the addition of the
fact that "grads" and "undergrads" were exhaustive sub-entity sets of
"students" would cause this misconception to arise in the question
"Which students are either grads or undergrads?". The following
behavior would be desired in these cases:

Q: "Which teachers that advise students take courses?"
A: "Faculty advise students.
    Students take courses.
    I don't believe that a teacher can be both a faculty member and a
    student."
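The relationship-level checks of this section (a presumed relationship that does not exist, a restriction by mutually exclusive sub-entity sets, and an unnecessary restriction by exhaustive sub-entity sets) can be sketched together. The structure below is a hypothetical fragment standing in for figure 1, and the function names and the wording of the replies are assumptions, not the implemented system. Abstraction on Y is omitted for brevity, and the sketch does not verify that the required entity sets are sub-entity sets of the restricted one.

```python
from itertools import combinations

# Hypothetical structure: (subject entity set, relation, object entity set).
RELATIONSHIPS = [
    ("faculty", "teach", "courses"),
    ("faculty", "advise", "students"),
    ("students", "take", "courses"),
]
EXCLUSIVE = {frozenset(("faculty", "students"))}
EXHAUSTIVE = {"students": {"grads", "undergrads"}}


def sentence(x, r, y):
    return f"{x.capitalize()} {r} {y}."


def check_relationship(x, r, y):
    """Return [] if (R X Y) exists, otherwise a corrective response."""
    if (x, r, y) in RELATIONSHIPS:
        return []
    reply = [f"I don't believe that {x} can {r} {y}."]
    # Abstract on R: which relations do hold between X and Y?
    reply += [sentence(*t) for t in RELATIONSHIPS if t[0] == x and t[2] == y]
    # Abstract on X: which entity sets do stand in relation R to Y?
    reply += [sentence(*t) for t in RELATIONSHIPS if t[1] == r and t[2] == y]
    return reply


def check_restrictions(entity, restrictions):
    """restrictions: (relation, object) pairs that restrict `entity`."""
    reply, required = [], []
    for r, y in restrictions:
        for x, r2, y2 in RELATIONSHIPS:
            if (r2, y2) == (r, y):
                reply.append(sentence(x, r, y))
                required.append(x)
    for a, b in combinations(required, 2):
        if frozenset((a, b)) in EXCLUSIVE:
            reply.append(f"I don't believe that a member of {entity} "
                         f"can be both in {a} and in {b}.")
    return reply


def is_unnecessary_restriction(entity, alternatives):
    """True if the disjunction lists all exhaustive sub-entity sets."""
    return EXHAUSTIVE.get(entity) == set(alternatives)


for line in check_relationship("faculty", "take", "courses"):
    print(line)
for line in check_restrictions("teachers",
                               [("advise", "students"), ("take", "courses")]):
    print(line)
print(is_unnecessary_restriction("students", ["grads", "undergrads"]))  # True
```

For "Which faculty take courses?" the first loop prints the denial followed by "Faculty teach courses." and "Students take courses.", the two relations obtained by abstracting on R and on X.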
*See [5] for a more detailed description.
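Returning to inapplicable functions: the distinction between a function that cannot be applied to a domain and one that can be applied but is not meaningful can be sketched by attaching type and functional information to attributes. The attribute table, the "role" labels, and `check_function` are hypothetical, chosen only to illustrate the kind of augmentation the data base model would need.

```python
# Hypothetical type and functional information attached to attributes.
ATTRIBUTES = {
    "grade": {"type": "character", "role": "measurement"},
    "salary": {"type": "numeric", "role": "measurement"},
    "social security number": {"type": "numeric", "role": "identifier"},
}

# Functions that require numeric data (an assumption of this sketch).
NUMERIC_FUNCTIONS = {"average", "sum"}


def check_function(function, attribute):
    """Report whether a function is applicable and meaningful."""
    info = ATTRIBUTES[attribute]
    if function in NUMERIC_FUNCTIONS:
        if info["type"] != "numeric":
            return (f"inapplicable: {function} cannot be computed on "
                    f"{info['type']} data")
        if info["role"] == "identifier":
            return (f"not meaningful: {attribute} identifies an entity "
                    f"and is not a quantity")
    return "ok"


print(check_function("average", "grade"))
print(check_function("average", "social security number"))
print(check_function("average", "salary"))  # ok
```

Here letter grades are rejected on type alone, while the social security number passes the type check and is rejected only because of its assumed role as an identifier.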
D. Data Currency
Some failures depend on the currency of the data. One such example
occurs in a naval data base about ships, subs, and aircraft. The
question "What is the position of the Kitty Hawk?" presumes that timely
data is maintained. Actually, positions of friendly vessels are
current, while those of enemy ships might be hopelessly out of date. In
this case, the failures would be extensional since the last update of
the attribute must be checked for currency. It may be the case that
some data is current while other data is not. However, the update
processing time lag from actual event occurrence to capture in the data
base might be sufficiently long that such presumptions might be subject
to intensional failure. Thus the user could be made aware that current
data was never available.

V CONCLUSION

In this paper we have discussed several kinds of failures of
presumptions that depend on knowledge about the structure or
organization of the data base. It is important to distinguish between
structure and content, since there is a significant difference in the
rate at which they change. When responding to intensional failures of
presumptions, simply pointing out the failure is in most cases
inadequate. The user must also be informed with regard to related
knowledge about the structure of the data base in order to formulate
queries directed at solving his/her particular problem. The technique
described for recognizing intensional failures that are due to the
presumption of non-existent relationships between entities and
attributes of entities has been implemented. Further work is aimed at
developing knowledge representations for temporal and functional
information. We hope to eventually develop a general account of user
failures in natural language query systems.

ACKNOWLEDGEMENTS

I would like to thank Peter Buneman, Aravind Joshi, Kathy McKeown, and
Bonnie Webber for their valuable comments on an earlier draft of this
paper.

REFERENCES

[1] Chen, P.P.S., "The Entity-Relationship Model -- Towards a Unified
View of Data", ACM Transactions on Database Systems, Vol. 1, No. 1,
March 1976.

[2] Kaplan, S.J., Cooperative Responses From a Portable Natural
Language Data Base Query System, Ph.D. Dissertation, Computer and
Information Science Department, University of Pennsylvania,
Philadelphia, PA., 1979.

[3] Kwasny, S.C., and Sondheimer, N.K., "Ungrammaticality and
Extra-Grammaticality in Natural Language Understanding Systems",
Proceedings of the Conference of the Association for Computational
Linguistics, La Jolla, CA., August 1979.

[4] Lee, R.M. and Gerritsen, R., "A Hybrid Representation for Database
Semantics", Working Paper 78-01-01, Decision Sciences Department,
University of Pennsylvania, 1978.

[5] Mays, E., "Correcting Misconceptions About Data Base Structure",
Proceedings of the Conference of the Canadian Society for Computational
Studies of Intelligence, Victoria, British Columbia, Canada, May 1980.

[6] McKeown, K., "Paraphrasing Using Given and New Information in a
Question-Answer System", Proceedings of the Conference of the
Association for Computational Linguistics, La Jolla, CA., August 1979.

[7] Smith, J.M. and Smith, D.C.P., "Database Abstractions: Aggregation
and Generalization", ACM Transactions on Database Systems, Vol. 2, No.
2, June 1977.

[8] Weischedel, R.M., and Black, J., "Responding Intelligently to
Unparseable Sentences", American Journal of Computational Linguistics,
Vol. 6, No. 2, April-June 1980.