VERIFICATION ISSUES IN COMBINING INDEPENDENTLY CONSISTENT

advertisement
From: AAAI Technical Report WS-93-05. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.
VERIFICATION
ISSUES
IN COMBINING INDEPENDENTLY
CONSISTENT THEORIES
Windy Gambetta
ArtificialIntelligenceLaboratory,
Schoolof Computer
ScienceandEngineering,
University of NewSouthWales, POBox1, Kensington,NSW,Australia, 2033
windy@cs.unsw.oz.au
Abstract
This paper addresses verification issues in
combining knowledge bases or theories from
different sources. Tile knowledgemaybe acquired
using direct knowledgeacquisition or inductive
learning methods. Keeping the consistency of
knowledge becomes all important issue when
integrating two or moreknowledgebases. Several
independentlyconsistent knowledgebases maygive
an inconsistent system whenthey are combined.
Theinconsistencies whichcould be syntactical or
semantic are described in this paper and some
suggestionsto handle themare given. This process
can be used to refine knowledgebases, in terms of
validity and completeness,and to obtain a larger
integrated system.
1. Introduction
Expert systems have been widely used in various
areas. Theyhave becomethe most successful area
of Artificial Intelligence. Real systemswith large
knowledge bases t are not uucommon, so
correctness is important. The careless design and
implementation of knowledge acquired using
various techniquescan give problemsranging from
minorto major [1]. Becauseof this, verification
and validation of knowledgebases has becomean
active researcharea
Manyverification techniques have been developed
to reveal anomalies but are only suitable for
verifying small to mediumknowledge bases,
systems with several hundreds rules 2. For larger
knowledgebases, these techniques do not give a
goodperformancebecause their thne complexityis
exponential especially when checking global
consistency [2, 3]. As an exmnple, COVER
[3]
needs 45 minutes to check of a knowledgebase
with 150 rules and 3.5 hours for 550 rules. One
alternative is to use distributed verification [4] but
the complexityappears to remaiuexponential.
2. Related Researchin Integrating
Knowledge Bases
The integration process is not widely studied.
Mostknowledgebases are built individually rather
than resulting from integration. Theyare basedon
either direct knowledgeacquisition, inductive
algorithms or case based reasoning; and perhaps
using more.thmlone expert. A few approaches to
integration havebeenstudied.
Brazdil and Torgo[5] conductedan experimentto
developan integrated knowledgebase. Theydivided
a set of cases into several subsets and the applied
file inductive algorithms, ID3 and AQ11
to generate
several individual theories. Thetheories weretested
to measure the "quality" of each rule (i.e. the
numberof test cases successfullyclassified). Based
on this quality, an integrated theory wasdeveloped
that consists of the best rules from the sub
knowledgebases. It is clear that this approachis
onlyapplicable if tile knowledge
bases to integrate
havethe s,’une set of attributes and suitable test
casesare available.
Ngwenyamaand Bryson in [6] have a similar
approach to Brazdil and Torgo. Despite using
knowledgebases from cases, their knowledgewhich
camefrom different experts still has to have the
sameattributes. Their propositional systems are
written in tabular form.
Another paper concerning integrated knowledge
bases is by Eggen et al [7]. A project called
ACKnowledge
which is developing a knowledge
engineering workbenchincludes an integration
method. The methodis based on Mun’ay’smodel
1 Knowledge base and dleory in this paper have die same
meaning
2The knowledge bases dealt
based.
This paper deals with the verification problemof
combining knowledge bases where several
knowledgebases whichhave been verified will not
always give a consistent system whenintegrated.
Conflicting views about someconcepts mayoccur
because the expertise may differ, or more
commonlythe knowledge bases are far from
complete; the knowledgebases mayperform well
but mayuse conceptsdifferently.
with in this paper are rule
65
[8] whereone knowledgebase is addedrule by rule
to another knowledgebase so that a rule is only
addedif it does not conflict with the knowledge
basebeingaddedto. If it is not tile case,tile user is
askedto resolve the conflict. The problemin this
methodis howto choosetile main knowledgebase
and howto order the rules. Ill a real situation, if
we want to join two knowledgebase it is very
difficult to measurewhichone is moreimportant.
The methodwill work regardless, but the effort
required to verily the systemwill dependon tile
choce of main knowledgebase and tile order in
whichrules are added.
Unlike the aforementionedproposals, Baral, Kraus,
Minker, and Subrahmanianin [9, 10] combined
theories in first order predicate calculus. This
approachis tileoretical and has somedifferences in
howinconsistency problems are viewedfrom file
knowledgebased systemview point.
Blackboardapproaches[11] also can be viewedin
terms of knowledgebase integration. Instead of
trying to develop one integrated system, systems
work together to solve the problem.Theywork by
changingcommon
data using the blackboard.
Van Someren et al [12] suggest integrating
paradigmsof knowledge
acquisition. In their paper,
paradigms to develop knowledge bases are
mentionedand analysed. Basedon the result, they
argue that an integrated systemcan be developed
midwill be better than those developedusing one of
the paradigms.
3. Verification
of Knowledge Bases
Verification of knowledgebases deal with the
processof evaluatingthe internal consistencyof the
knowledge. Verification normally assumes that
terms used in the system represent mutually
exclusive concepts which have no other relation
with other conceptsexcept tile explicit relations
represented by rules. It is only concernedwith
syntactical consistency. It meansthat any term /
symbolused to represent particular conceptdiffers
from any other term / symbol. In combining
knowledgebases the issue of semanticconsistency
arises, i.e. the relationship amongconceptsother
than those whichare representedas rules.
3.1. Syntactical Consistency
Manyreferences about anomalies in knowledge
base[4, 13-17] identify different types of problem,
but these can be reduced to four classes redundancy, circularity,
contradiction, and
deficiency.
66
a. A rule is redundantif it can be derived using
inference proceduresfromsomeother rules.
b. Circularity is defined as the problem of the
existenceof a chain of dependency
of rules that cau
lead to an infinite loop.
c. Contradictionoccurswhenan illegal situation, a
conflicting conclusion happens because of the
firing of somerules. Anexampleof conflicting
conclusions is two complementaryconclusions
fromone set of permissibleinput.
d. Deficiencymeansthat the knowledgebase gives
no meaningfulconclusion, and does not reach final
hypothesisfor a permissibleinput.
3.2. Semantic Consistency
The semantic view of knowledgebase focusses on
relationships which are probably not covered by
rules. Someobviousrelationships are :
a. Synonym
Twodifferent terms are synonyms
if they have tile
samemeaning.For instance, student and scholar.
Even here there may be a more complex
relationship if scholarin one knowledge
base refers
to both post graduates and undergraduateswhile in
¯ tile other studentrefers to undergraduates.
b. Antonym
Twodifferent terms are antonymsof each other if
they refer to conflicting concepts. For instance,
from the Garvan ES-1 expert system [18],
TSH_HIGH
is an antonym of TSH_LOW.
Raffler
than two booleanvariables these conceptswouldbe
better represented by ordinal values for a single
variable, TSH.
c. Positive Constraint
Twoterms or moreare in a positive constraint if
they always both apply. For instance if A and B
are in positive constraint, it meansthat every time
A happensso does B.
d. NegativeConstraint
A negative constraing is where a set of term can
not happentogether. If someof themhappenthen
the rest have to be false. For instance the
relationship between MALE
and PREGNANT.
If
MALE
is true then PREGNANT
has to be false,
and vice versa.
Semantic ’constraints representing semantic
consistency could be declared in the knowledgebase
using rules or meta rules. Theimplicit assumption
taken in designingthe knowledgebase is that tile
systemis closed: constraints exist onlyif explicitly
defined. The introduction of semantic consistency
affects verification and the size of tile verification
prooblem space, e.g. synonymous and
antonymous terms relate to redundancy and
circularity.
consistency is the motivationbehindthis strategy.
Theintegration processis focussedon those p,’u’ts
of the knowledge
that mayinteract.
4. Proposal for Verification in the
Integration
Process
The process of verification in integrating
knowledgebases can be divided into a uumberof
steps, such as
1. selecting parts of knowledge
to merge,
2. integratingthese parts,
3. checkingfor anomaliesin the result and then
4. merging the remaining knowledge with the
result of step 3.
In this paper, weassumea knowledgebase is a set
of rules in Hornclause formwith facts/findings as
conditions and hypotheses as conclusions. Some
conclusions are final hypotheses if they never
appear as conditions in a rule. It is clear that a
knowledge
base can be representedas a set of rules
which map facts into final hypotheses, e.g.
Hi ~ (fi), Hi is a final hypothesis,and (fj) is
of facts which conclude Hi. Colombhas shown
that all propositionalexpert systemscan be reduced
to table form[19].
4.1. Selecting part of the knowledge to
merge
Thereare a numberof strategies whichcould be use
to select whichknowledgeto merge.
a. 1Ex.haustive
Integration
All of knowledgeis put together and consideredin
the verification process. In this strategy we assume
that all part of the systems are affected by the
integration process so that we have to check a
wholeintegrated system. This simple strategy will
take a long time because we have to check all of
knowledge.
b. Feed-forwardIntegration
In this strategy one knowledgebase is chosenas
die backgroundknowledgewhile rules from other
knowledgebases are addedone by one as input to
tile mainknowledge.Althoughthe consistency of
the main knowledgebase can be kept through the
feed-forwardintegration process by checkingthe
effect of the incomingrule on all parts of the main
knowledge,
the difficulty of die processwill depend
on the the choice of the mainknowledgebase and
the order in whichrules are added. Thesechoices
are not clear cut.
c. Common-term
Integration
The minimisation of the knowledge to be
consideredin integrating and checkingits internal
67
Common
term integration, the methodwe propose
in this paper, tries to minimisedie integration
task. In this regard it should be moreconsistent
than feed forwardor rule at a time integration where
task minimisationdependson difficult choices of
knowledgebase mid rule ordering. Co~mnon
term
integration minimises the task because it only
considersthe relevant parts of the knowledge
bases.
Only in the worst case does the commontenn
integration’ methodneed to consider the whole
knowledge base. This methods also ensures
completenessbecauseall of the relevant knowledge
is checkedtogether. The verification process will
consider all the relations amongthe common
terms
at the sametime.
There are two ways to find the commonterms:
syntactical and semantic. In the syntactical
approacha term X is selected as a common
term if
the twoknowledgebases contain Xas either fact or
final hypothesis. Here, the same term from any
knowledgebase is assumedto represent the s,’une
concept. On the other hand, in the semantic
approachuse of the sameterms does not guarantee
that they refer to the sameconcepts. Twodifferent
terms could represent tile sameconceptor one term
from different knowledgebases could point to
different concepts.In this case, an oracle or user is
neededto give informationaboutthe relation.
Usingdie common
terms identified, we can find the
intersecting knowledgewhich can be groupedinto
tlu’eeclasses:
a. Intersection betweenfact sets whichideutify the
sameconcept,
b. Intersection between final hypotheses which
identify die sameconcept,
c. Intersection betweenfacts and final hypotheses
where the conclusion derived from one knowledge
baseis input for the other.
4.2. Integrating the selected knowledge
parts
The intersections in the knowledge
are collected for
integration. They are described in terms of
relationship betweenfinal hypotheses and facts
whereeach final hypothesishas a set of supporting
facts. In addition to these relations, somesemantic
relations mayalso apply. All of the selected part
are then combinedusing these relations to create
further relationships betweenfacts and hypotheses
whereappropriate. The tenns with the s,’une name
but have different meaningsshould be renamed,
while those with different namesbut havefile same
meaningobviouslyneedto be treated as tile same.
antonymreiation. Redundancywould also be the
correct analysis if there wasa subsumption
relation
betweenthe hypotheses. The oracle wouldhave to
be involvedin resolving such problems.
4.3. Checking for anomalies in the
result
Havingidentified tile intersection of the knowledge
bases and drawnrelations betweenthem the next
step is to checkthe consistency of this knowledge
when combined. The main differences between
checkinganomalies in single knowledgebases and
integrated systems is the semantic aspects. It is
not sufficient to consider only syntactic aspects
unlessweare sure that the termsused,are consistent
in that each term is used uniquely to represent
different concept.
c. Conflict
Like redundancy problems, checking conflicting
rules involves semanticrelations. Becauseof this,
somerules can contain this anomalyif riley depend
on conflicting facts whichbreak someconstraints.
Joining two inference paths from different
knowledge
bases reflecting different viewsmaygive
suchconflicts.
Besidescheckingfor anomalies,it is desirable to
search for hidden relations in the integrated
knowledgesince an oracle is unlikely to know
about both knowledge bases. Checking for
anomalieswe include refining tile knowledgebase
usingthese relations.
a. Deficiency
The problemof deficiency is unlikely to occur in
an integrated knowledgebase if it is built from
deficient-free knowledgebases. Assmning
that tile
necessaryrelations havebeenfoundtile integrated
system will have no deficiency unless some
careless changes are made when refining the
system. To avoid this situation, any change made
should not reduce tile domaincoverage of the
system.
b. Redundancy
Tworedundancy-free knowledge bases may give
someredundant rules if they are combined.In a
single knowledgebase, redundancydetection can be
performedonly for rules with tile s,’une hypotheses
but for an integrated knowledgebase we also have
to consider rules with different hypothesesbecause
eventhoughthey are apparentlydifferent there may
be a semanticlink betweenthem.
Somesituations which would be classified as
inconsistency in a single knowledge base
verification are actually redundancyproblemsin an
integrated system. As an example consider two
rulesX(--- (al, 2 ..... a k) and ~Y (at, a ........ at ,).
If (al,a2 ..... ak) =(at,a,, ..... at,), then in single
systemthese rules can be classified as conflicting
because X is not file stone as Y, but if they are
from two knowledge bases there would be
redundancyif there was a synonym
relation between
the hypotheses or inconsistency if there are
For a set of rules conflicts canbe classified in thrcc
ways.Firstly, a set of rules with the s,’une final
hypothesesconflict if somefacts for the hypotheses
conflict. Secondly,a set of rules with conflicting
hypothesesconflict if the fact set of any rule is
subset of those of other rules in that set. Thirdly,
contradictionmayoccur whensets of rules havetile
samefact set but with different hypotheses. As
mentionedbefore, these hypotheses mayrepresent
the sameconcept.If it is the case then no conflict
exists but redundancy.
d. Circularity
For each common
term each fact in tile fact sel is
tested by substituting it with the fact set from
whichtile fact to be replaced can be concluded.If
the result contains terms whichincludes its header
then a circular path is detected. As an example,
A~-.(B,C,D) and B(---(A,D,X),will
give a
combinedrule A <--- (A, D, X, C). This can happen
if some facts from one knowledgebase become
final hypothesesfor the other knowledgebase, and
vice versa.
4.4. Merging the remaining knowledge
with the integrated part
The last step is mergingthe integrated knowledge
with all of tile non-selected knowledge.To avoid
conflicts whichmayoccur becauseof terms used as
intermediate hypotheses, names have to be kept
unique. This can be performedsimply by including
in the namesomeidentification of tile knowledge
base it come. For instance, an intermediate
hypothesis A from the knowledgebase 1 may be
renamedinto A1.
5. Discussion
In this paper someaspects of verification in
integrating knowledgeare addressed. The verification processhas to be extendedif weare dealing
with an integrated knowledge base which is
68
5
produced from several smaller knowledgebases
whichare independentlyconsistent. The syntactic
and semanticanomaliesare the result of different
views of the domain taken by the knowledge
engineer (and expert) whendesigning the system.
Asreported by Shaw[20] different experts maynot
agree with each other as they mayuse different
terminology for the same concept or the same
terminology for different concepts,. Even one
single expert maycreate this problemwhendealing
with the sameconceptsat different times. Because
of this problem, the verification of integrated
knowledge bases can not be fully automatic.
Semantic relationships amongterms used must be
found to enable anomalies in the system to be
discovered.
These relations can be obtained manually or
possibly semi-automatically. In a fully manual
methodfile oracle or user will be asked to supply
these relations in terms rules or meta rules. The
verification process will use this information to
check the consistency. On the other hand, in
computer-aidedmodethe identification of these
relations could be guided by an inductive logic
programwhichwill try to find the relations and ask
the user to verify the relation it suggests. Bythis
interaction, anomalies can be removedand some
hidden concepts can be uncovered [21, 22].
Inductive logic programmingwouldappear to be
feasible for this task particularly if the type of
relations to be discoveredare all fully identified as
backgroundknowledge.However,at this stage the
utility of ILPfor this task is an hypothesis.
concepts. Another consideration to take into
accountis defining somecharacteristics of systems
to integrate.
6. Acknowledgment
I would lille to thank mysupervisor, Mr. Paul
Compton,for his helpful comments.
References
1. Geismann,J.R. and R.D. Schultz, Verification
and Validation of Expert Systems, AI Expert,
1988,3(2): p. 26 - 33.
2. Ginsberg, A., A NewApproach to Checking
Knowledge Bases for Inconsistency
and
Redundancy.In Proceedings of The Third Annual
Expert Systems in GovernmentConference, 1987,
Washington DC.
3. Preece, A.D., R. Shinghal, and A. Batarekh,
Verifying Expert Systems ¯ A Logical Framework
and a Practical Tool, Expert Systems With
Applications, 1992, 5: p. 421- 436.
4. Zlatareva, N.P, Distributed Verification : ANew
Formal Approachfor Verifying Knowledge-Based
Systems. In Proceedingsof The WorldCongresson
Expert Systems, 1991, Orlando, Florida, USA:
PergamonPress.
In addition to joining knowledge bases, the
integration processcan also be used for verifying a
large knowledgebase. The systemis divided into
partitions using someclustering methodso that
related rules are grouped.Thepartitions created are
considered as knowledge bases, verified
independently and then the integration process
applied. A good partition algorithm will then
minimisethe knowledgeparts to be considered in
integration becauseeachpartition will be relatively
independent.This is a divide-and-conquerapproach
where a problem is divided into sub problems
whichare easier to handle. The partitions will be
easier to verify than the whole knowledgebase
becauseof their size. This conjectureremainsto be
proven.
5. Brazdil, P.B. and L. Torgo, Knowledge
Acquisition via Knowledge
Integration. In Current
Trends in KnowledgeAcquisition, B. Wielinga, et
al., editor, 1990, lOS: Amsterdmn.
p. 90 - 104.
6. Ngwenyama,O.K. and N. Bryson, A Formal
MethodFor Analysing and Integrating Tile Rule
Sets of Multiple Experts, Information Systems,
1992, 17(1): p. 1 - 16.
7. Eggen, J., A.M. Lundteigen, and M. Mehus,
Integration
of Knowledge from Different
Knowledge
Acquisition Tools. In Current Trendsin
Knowledge
Acquisition, B. Wielinga,et al., editor,
1990, IOS: Amsterdmn,p. 123 - 142.
Although it looks promising, the idea of
integration should be tested using someactual
knowledgebases. The next step in our research is
to implement and test this idea against real
knowledge
bases. Acrucial issue will be the use of
Inductivelogic progranuningtechniquesto e~flmnce
the process of uncovering relations among
69
8. Murray,.K.S. and B.W. Porter, Developing a
tool for knowledgeintegration: Initial Results. In
Proceedings of The Third KnowledgeAcquisition
for Knowledge-based Systems Workshop, 1988,
Banff, Canada.
6
9. Baral, C., S. Kraus, and J. Minker,Combining
Multiple KnowledgeBases, IEEETransactions on
KnowledgeAnd Data Engineering, 1991, 3(2):
208 - 220.
10. Baral, C., et al., Combining
Knowledge
Bases
Consisting of First-Order Theories, Computational
Intelligence, 1992,8(1): p. 45 - 71.
11. Nii, H.P, BlackboardSystems:The Blackboard
Modelof Problem Solving and The Evolution of
Blackboard Architectures (Part One), The AI
Magazine,1986, Summer1986: p. 38 - 53.
12. Van Someren, M.W., L.L. Zheng, and P.
Wilfried, Cases, modelsor compiledknowledge?
- a
comparativeanalysis and proposedintegration. In
Current Trends in KnowledgeAcquisition, B.
Wielinga,et al., editor. 1990, ItS: Amsterdanl.p.
339- 355.
Eight National Conference on Artificial
Intelligence, 1990, AAAIPress.
20. Shaw, M.L.G, Validation in A Knowledge
Acquisition Systems with Multiple Experts. In
Proceedings of The International Conference on
Fifth Generation ComputerSystems, 1988, Tokyo:
ICOT.
21. Muggleton, S, Duce, An Oracle Based
Approachto ConstructiveInduction. In Proceedings
of International Joint Conferenceon Artificial
Intelligence.
1987, Milan, Italy: Morgan
Kaufinann.
22. Muggleton, S. ,editor, Inductive Logic
Programming,1992, AcademicPress: London.
13. Yu, X. and G. Biswas, CHECKER:An
Efficient
Algorithm for Knowledge Base
Verification.
In Proceedings of The Third
International Conference on Industrial and
EngineeringApplicationsof Artificial Intelligence
and Expert Systems, IEA/AIE-90,1990.
14. Preece, A.D., R. Shinghal, and A. Batarekh,
Principles and Practice In Verifying Rule-based
Systems, The Knowledge Engineering Review,
1992, 7(2): p. 115- 141.
15. Chang, C.L., J.B. Combs, and R.A.
Stachowitz, A Report on The Expert Systems
Validation Associate (EVA),Expert Systems With
Applications, 1990, 1(1): p. 217 - 230.
16. Harrison, P.R. and P.A. Ratcliffe, Towards
Standards for the validation of Expert Systems,
Expert Systems With Applications, 1991, 2: p.
251- 258.
17. Nazareth,D.L, Issues in Theverification of
Knowledge
In Rule-basedSystems, International
Journalof Man-Machine
Studies, 1989, 30: p. 255
- 271.
18. Horn, K.A., et al.,An Expert SystemFor The
Interpretation of ThyroidIn A Clinical Laboratory,
The Australian ComputerJournal, 1985, 17(1):
7-11.
19. Colomb, R.M. and C.Y. Chung, Very Fast
Decision Table Execution of A propositional
Expert System. In Proceedings of AAAI-90The
7O
Download