From: AAAI Technical Report WS-93-05. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved. VERIFICATION ISSUES IN COMBINING INDEPENDENTLY CONSISTENT THEORIES Windy Gambetta ArtificialIntelligenceLaboratory, Schoolof Computer ScienceandEngineering, University of NewSouthWales, POBox1, Kensington,NSW,Australia, 2033 windy@cs.unsw.oz.au Abstract This paper addresses verification issues in combining knowledge bases or theories from different sources. Tile knowledgemaybe acquired using direct knowledgeacquisition or inductive learning methods. Keeping the consistency of knowledge becomes all important issue when integrating two or moreknowledgebases. Several independentlyconsistent knowledgebases maygive an inconsistent system whenthey are combined. Theinconsistencies whichcould be syntactical or semantic are described in this paper and some suggestionsto handle themare given. This process can be used to refine knowledgebases, in terms of validity and completeness,and to obtain a larger integrated system. 1. Introduction Expert systems have been widely used in various areas. Theyhave becomethe most successful area of Artificial Intelligence. Real systemswith large knowledge bases t are not uucommon, so correctness is important. The careless design and implementation of knowledge acquired using various techniquescan give problemsranging from minorto major [1]. Becauseof this, verification and validation of knowledgebases has becomean active researcharea Manyverification techniques have been developed to reveal anomalies but are only suitable for verifying small to mediumknowledge bases, systems with several hundreds rules 2. For larger knowledgebases, these techniques do not give a goodperformancebecause their thne complexityis exponential especially when checking global consistency [2, 3]. As an exmnple, COVER [3] needs 45 minutes to check of a knowledgebase with 150 rules and 3.5 hours for 550 rules. One alternative is to use distributed verification [4] but the complexityappears to remaiuexponential. 2. Related Researchin Integrating Knowledge Bases The integration process is not widely studied. Mostknowledgebases are built individually rather than resulting from integration. Theyare basedon either direct knowledgeacquisition, inductive algorithms or case based reasoning; and perhaps using more.thmlone expert. A few approaches to integration havebeenstudied. Brazdil and Torgo[5] conductedan experimentto developan integrated knowledgebase. Theydivided a set of cases into several subsets and the applied file inductive algorithms, ID3 and AQ11 to generate several individual theories. Thetheories weretested to measure the "quality" of each rule (i.e. the numberof test cases successfullyclassified). Based on this quality, an integrated theory wasdeveloped that consists of the best rules from the sub knowledgebases. It is clear that this approachis onlyapplicable if tile knowledge bases to integrate havethe s,’une set of attributes and suitable test casesare available. Ngwenyamaand Bryson in [6] have a similar approach to Brazdil and Torgo. Despite using knowledgebases from cases, their knowledgewhich camefrom different experts still has to have the sameattributes. Their propositional systems are written in tabular form. Another paper concerning integrated knowledge bases is by Eggen et al [7]. A project called ACKnowledge which is developing a knowledge engineering workbenchincludes an integration method. The methodis based on Mun’ay’smodel 1 Knowledge base and dleory in this paper have die same meaning 2The knowledge bases dealt based. This paper deals with the verification problemof combining knowledge bases where several knowledgebases whichhave been verified will not always give a consistent system whenintegrated. Conflicting views about someconcepts mayoccur because the expertise may differ, or more commonlythe knowledge bases are far from complete; the knowledgebases mayperform well but mayuse conceptsdifferently. with in this paper are rule 65 [8] whereone knowledgebase is addedrule by rule to another knowledgebase so that a rule is only addedif it does not conflict with the knowledge basebeingaddedto. If it is not tile case,tile user is askedto resolve the conflict. The problemin this methodis howto choosetile main knowledgebase and howto order the rules. Ill a real situation, if we want to join two knowledgebase it is very difficult to measurewhichone is moreimportant. The methodwill work regardless, but the effort required to verily the systemwill dependon tile choce of main knowledgebase and tile order in whichrules are added. Unlike the aforementionedproposals, Baral, Kraus, Minker, and Subrahmanianin [9, 10] combined theories in first order predicate calculus. This approachis tileoretical and has somedifferences in howinconsistency problems are viewedfrom file knowledgebased systemview point. Blackboardapproaches[11] also can be viewedin terms of knowledgebase integration. Instead of trying to develop one integrated system, systems work together to solve the problem.Theywork by changingcommon data using the blackboard. Van Someren et al [12] suggest integrating paradigmsof knowledge acquisition. In their paper, paradigms to develop knowledge bases are mentionedand analysed. Basedon the result, they argue that an integrated systemcan be developed midwill be better than those developedusing one of the paradigms. 3. Verification of Knowledge Bases Verification of knowledgebases deal with the processof evaluatingthe internal consistencyof the knowledge. Verification normally assumes that terms used in the system represent mutually exclusive concepts which have no other relation with other conceptsexcept tile explicit relations represented by rules. It is only concernedwith syntactical consistency. It meansthat any term / symbolused to represent particular conceptdiffers from any other term / symbol. In combining knowledgebases the issue of semanticconsistency arises, i.e. the relationship amongconceptsother than those whichare representedas rules. 3.1. Syntactical Consistency Manyreferences about anomalies in knowledge base[4, 13-17] identify different types of problem, but these can be reduced to four classes redundancy, circularity, contradiction, and deficiency. 66 a. A rule is redundantif it can be derived using inference proceduresfromsomeother rules. b. Circularity is defined as the problem of the existenceof a chain of dependency of rules that cau lead to an infinite loop. c. Contradictionoccurswhenan illegal situation, a conflicting conclusion happens because of the firing of somerules. Anexampleof conflicting conclusions is two complementaryconclusions fromone set of permissibleinput. d. Deficiencymeansthat the knowledgebase gives no meaningfulconclusion, and does not reach final hypothesisfor a permissibleinput. 3.2. Semantic Consistency The semantic view of knowledgebase focusses on relationships which are probably not covered by rules. Someobviousrelationships are : a. Synonym Twodifferent terms are synonyms if they have tile samemeaning.For instance, student and scholar. Even here there may be a more complex relationship if scholarin one knowledge base refers to both post graduates and undergraduateswhile in ¯ tile other studentrefers to undergraduates. b. Antonym Twodifferent terms are antonymsof each other if they refer to conflicting concepts. For instance, from the Garvan ES-1 expert system [18], TSH_HIGH is an antonym of TSH_LOW. Raffler than two booleanvariables these conceptswouldbe better represented by ordinal values for a single variable, TSH. c. Positive Constraint Twoterms or moreare in a positive constraint if they always both apply. For instance if A and B are in positive constraint, it meansthat every time A happensso does B. d. NegativeConstraint A negative constraing is where a set of term can not happentogether. If someof themhappenthen the rest have to be false. For instance the relationship between MALE and PREGNANT. If MALE is true then PREGNANT has to be false, and vice versa. Semantic ’constraints representing semantic consistency could be declared in the knowledgebase using rules or meta rules. Theimplicit assumption taken in designingthe knowledgebase is that tile systemis closed: constraints exist onlyif explicitly defined. The introduction of semantic consistency affects verification and the size of tile verification prooblem space, e.g. synonymous and antonymous terms relate to redundancy and circularity. consistency is the motivationbehindthis strategy. Theintegration processis focussedon those p,’u’ts of the knowledge that mayinteract. 4. Proposal for Verification in the Integration Process The process of verification in integrating knowledgebases can be divided into a uumberof steps, such as 1. selecting parts of knowledge to merge, 2. integratingthese parts, 3. checkingfor anomaliesin the result and then 4. merging the remaining knowledge with the result of step 3. In this paper, weassumea knowledgebase is a set of rules in Hornclause formwith facts/findings as conditions and hypotheses as conclusions. Some conclusions are final hypotheses if they never appear as conditions in a rule. It is clear that a knowledge base can be representedas a set of rules which map facts into final hypotheses, e.g. Hi ~ (fi), Hi is a final hypothesis,and (fj) is of facts which conclude Hi. Colombhas shown that all propositionalexpert systemscan be reduced to table form[19]. 4.1. Selecting part of the knowledge to merge Thereare a numberof strategies whichcould be use to select whichknowledgeto merge. a. 1Ex.haustive Integration All of knowledgeis put together and consideredin the verification process. In this strategy we assume that all part of the systems are affected by the integration process so that we have to check a wholeintegrated system. This simple strategy will take a long time because we have to check all of knowledge. b. Feed-forwardIntegration In this strategy one knowledgebase is chosenas die backgroundknowledgewhile rules from other knowledgebases are addedone by one as input to tile mainknowledge.Althoughthe consistency of the main knowledgebase can be kept through the feed-forwardintegration process by checkingthe effect of the incomingrule on all parts of the main knowledge, the difficulty of die processwill depend on the the choice of the mainknowledgebase and the order in whichrules are added. Thesechoices are not clear cut. c. Common-term Integration The minimisation of the knowledge to be consideredin integrating and checkingits internal 67 Common term integration, the methodwe propose in this paper, tries to minimisedie integration task. In this regard it should be moreconsistent than feed forwardor rule at a time integration where task minimisationdependson difficult choices of knowledgebase mid rule ordering. Co~mnon term integration minimises the task because it only considersthe relevant parts of the knowledge bases. Only in the worst case does the commontenn integration’ methodneed to consider the whole knowledge base. This methods also ensures completenessbecauseall of the relevant knowledge is checkedtogether. The verification process will consider all the relations amongthe common terms at the sametime. There are two ways to find the commonterms: syntactical and semantic. In the syntactical approacha term X is selected as a common term if the twoknowledgebases contain Xas either fact or final hypothesis. Here, the same term from any knowledgebase is assumedto represent the s,’une concept. On the other hand, in the semantic approachuse of the sameterms does not guarantee that they refer to the sameconcepts. Twodifferent terms could represent tile sameconceptor one term from different knowledgebases could point to different concepts.In this case, an oracle or user is neededto give informationaboutthe relation. Usingdie common terms identified, we can find the intersecting knowledgewhich can be groupedinto tlu’eeclasses: a. Intersection betweenfact sets whichideutify the sameconcept, b. Intersection between final hypotheses which identify die sameconcept, c. Intersection betweenfacts and final hypotheses where the conclusion derived from one knowledge baseis input for the other. 4.2. Integrating the selected knowledge parts The intersections in the knowledge are collected for integration. They are described in terms of relationship betweenfinal hypotheses and facts whereeach final hypothesishas a set of supporting facts. In addition to these relations, somesemantic relations mayalso apply. All of the selected part are then combinedusing these relations to create further relationships betweenfacts and hypotheses whereappropriate. The tenns with the s,’une name but have different meaningsshould be renamed, while those with different namesbut havefile same meaningobviouslyneedto be treated as tile same. antonymreiation. Redundancywould also be the correct analysis if there wasa subsumption relation betweenthe hypotheses. The oracle wouldhave to be involvedin resolving such problems. 4.3. Checking for anomalies in the result Havingidentified tile intersection of the knowledge bases and drawnrelations betweenthem the next step is to checkthe consistency of this knowledge when combined. The main differences between checkinganomalies in single knowledgebases and integrated systems is the semantic aspects. It is not sufficient to consider only syntactic aspects unlessweare sure that the termsused,are consistent in that each term is used uniquely to represent different concept. c. Conflict Like redundancy problems, checking conflicting rules involves semanticrelations. Becauseof this, somerules can contain this anomalyif riley depend on conflicting facts whichbreak someconstraints. Joining two inference paths from different knowledge bases reflecting different viewsmaygive suchconflicts. Besidescheckingfor anomalies,it is desirable to search for hidden relations in the integrated knowledgesince an oracle is unlikely to know about both knowledge bases. Checking for anomalieswe include refining tile knowledgebase usingthese relations. a. Deficiency The problemof deficiency is unlikely to occur in an integrated knowledgebase if it is built from deficient-free knowledgebases. Assmning that tile necessaryrelations havebeenfoundtile integrated system will have no deficiency unless some careless changes are made when refining the system. To avoid this situation, any change made should not reduce tile domaincoverage of the system. b. Redundancy Tworedundancy-free knowledge bases may give someredundant rules if they are combined.In a single knowledgebase, redundancydetection can be performedonly for rules with tile s,’une hypotheses but for an integrated knowledgebase we also have to consider rules with different hypothesesbecause eventhoughthey are apparentlydifferent there may be a semanticlink betweenthem. Somesituations which would be classified as inconsistency in a single knowledge base verification are actually redundancyproblemsin an integrated system. As an example consider two rulesX(--- (al, 2 ..... a k) and ~Y (at, a ........ at ,). If (al,a2 ..... ak) =(at,a,, ..... at,), then in single systemthese rules can be classified as conflicting because X is not file stone as Y, but if they are from two knowledge bases there would be redundancyif there was a synonym relation between the hypotheses or inconsistency if there are For a set of rules conflicts canbe classified in thrcc ways.Firstly, a set of rules with the s,’une final hypothesesconflict if somefacts for the hypotheses conflict. Secondly,a set of rules with conflicting hypothesesconflict if the fact set of any rule is subset of those of other rules in that set. Thirdly, contradictionmayoccur whensets of rules havetile samefact set but with different hypotheses. As mentionedbefore, these hypotheses mayrepresent the sameconcept.If it is the case then no conflict exists but redundancy. d. Circularity For each common term each fact in tile fact sel is tested by substituting it with the fact set from whichtile fact to be replaced can be concluded.If the result contains terms whichincludes its header then a circular path is detected. As an example, A~-.(B,C,D) and B(---(A,D,X),will give a combinedrule A <--- (A, D, X, C). This can happen if some facts from one knowledgebase become final hypothesesfor the other knowledgebase, and vice versa. 4.4. Merging the remaining knowledge with the integrated part The last step is mergingthe integrated knowledge with all of tile non-selected knowledge.To avoid conflicts whichmayoccur becauseof terms used as intermediate hypotheses, names have to be kept unique. This can be performedsimply by including in the namesomeidentification of tile knowledge base it come. For instance, an intermediate hypothesis A from the knowledgebase 1 may be renamedinto A1. 5. Discussion In this paper someaspects of verification in integrating knowledgeare addressed. The verification processhas to be extendedif weare dealing with an integrated knowledge base which is 68 5 produced from several smaller knowledgebases whichare independentlyconsistent. The syntactic and semanticanomaliesare the result of different views of the domain taken by the knowledge engineer (and expert) whendesigning the system. Asreported by Shaw[20] different experts maynot agree with each other as they mayuse different terminology for the same concept or the same terminology for different concepts,. Even one single expert maycreate this problemwhendealing with the sameconceptsat different times. Because of this problem, the verification of integrated knowledge bases can not be fully automatic. Semantic relationships amongterms used must be found to enable anomalies in the system to be discovered. These relations can be obtained manually or possibly semi-automatically. In a fully manual methodfile oracle or user will be asked to supply these relations in terms rules or meta rules. The verification process will use this information to check the consistency. On the other hand, in computer-aidedmodethe identification of these relations could be guided by an inductive logic programwhichwill try to find the relations and ask the user to verify the relation it suggests. Bythis interaction, anomalies can be removedand some hidden concepts can be uncovered [21, 22]. Inductive logic programmingwouldappear to be feasible for this task particularly if the type of relations to be discoveredare all fully identified as backgroundknowledge.However,at this stage the utility of ILPfor this task is an hypothesis. concepts. Another consideration to take into accountis defining somecharacteristics of systems to integrate. 6. Acknowledgment I would lille to thank mysupervisor, Mr. Paul Compton,for his helpful comments. References 1. Geismann,J.R. and R.D. Schultz, Verification and Validation of Expert Systems, AI Expert, 1988,3(2): p. 26 - 33. 2. Ginsberg, A., A NewApproach to Checking Knowledge Bases for Inconsistency and Redundancy.In Proceedings of The Third Annual Expert Systems in GovernmentConference, 1987, Washington DC. 3. Preece, A.D., R. Shinghal, and A. Batarekh, Verifying Expert Systems ¯ A Logical Framework and a Practical Tool, Expert Systems With Applications, 1992, 5: p. 421- 436. 4. Zlatareva, N.P, Distributed Verification : ANew Formal Approachfor Verifying Knowledge-Based Systems. In Proceedingsof The WorldCongresson Expert Systems, 1991, Orlando, Florida, USA: PergamonPress. In addition to joining knowledge bases, the integration processcan also be used for verifying a large knowledgebase. The systemis divided into partitions using someclustering methodso that related rules are grouped.Thepartitions created are considered as knowledge bases, verified independently and then the integration process applied. A good partition algorithm will then minimisethe knowledgeparts to be considered in integration becauseeachpartition will be relatively independent.This is a divide-and-conquerapproach where a problem is divided into sub problems whichare easier to handle. The partitions will be easier to verify than the whole knowledgebase becauseof their size. This conjectureremainsto be proven. 5. Brazdil, P.B. and L. Torgo, Knowledge Acquisition via Knowledge Integration. In Current Trends in KnowledgeAcquisition, B. Wielinga, et al., editor, 1990, lOS: Amsterdmn. p. 90 - 104. 6. Ngwenyama,O.K. and N. Bryson, A Formal MethodFor Analysing and Integrating Tile Rule Sets of Multiple Experts, Information Systems, 1992, 17(1): p. 1 - 16. 7. Eggen, J., A.M. Lundteigen, and M. Mehus, Integration of Knowledge from Different Knowledge Acquisition Tools. In Current Trendsin Knowledge Acquisition, B. Wielinga,et al., editor, 1990, IOS: Amsterdmn,p. 123 - 142. Although it looks promising, the idea of integration should be tested using someactual knowledgebases. The next step in our research is to implement and test this idea against real knowledge bases. Acrucial issue will be the use of Inductivelogic progranuningtechniquesto e~flmnce the process of uncovering relations among 69 8. Murray,.K.S. and B.W. Porter, Developing a tool for knowledgeintegration: Initial Results. In Proceedings of The Third KnowledgeAcquisition for Knowledge-based Systems Workshop, 1988, Banff, Canada. 6 9. Baral, C., S. Kraus, and J. Minker,Combining Multiple KnowledgeBases, IEEETransactions on KnowledgeAnd Data Engineering, 1991, 3(2): 208 - 220. 10. Baral, C., et al., Combining Knowledge Bases Consisting of First-Order Theories, Computational Intelligence, 1992,8(1): p. 45 - 71. 11. Nii, H.P, BlackboardSystems:The Blackboard Modelof Problem Solving and The Evolution of Blackboard Architectures (Part One), The AI Magazine,1986, Summer1986: p. 38 - 53. 12. Van Someren, M.W., L.L. Zheng, and P. Wilfried, Cases, modelsor compiledknowledge? - a comparativeanalysis and proposedintegration. In Current Trends in KnowledgeAcquisition, B. Wielinga,et al., editor. 1990, ItS: Amsterdanl.p. 339- 355. Eight National Conference on Artificial Intelligence, 1990, AAAIPress. 20. Shaw, M.L.G, Validation in A Knowledge Acquisition Systems with Multiple Experts. In Proceedings of The International Conference on Fifth Generation ComputerSystems, 1988, Tokyo: ICOT. 21. Muggleton, S, Duce, An Oracle Based Approachto ConstructiveInduction. In Proceedings of International Joint Conferenceon Artificial Intelligence. 1987, Milan, Italy: Morgan Kaufinann. 22. Muggleton, S. ,editor, Inductive Logic Programming,1992, AcademicPress: London. 13. Yu, X. and G. Biswas, CHECKER:An Efficient Algorithm for Knowledge Base Verification. In Proceedings of The Third International Conference on Industrial and EngineeringApplicationsof Artificial Intelligence and Expert Systems, IEA/AIE-90,1990. 14. Preece, A.D., R. Shinghal, and A. Batarekh, Principles and Practice In Verifying Rule-based Systems, The Knowledge Engineering Review, 1992, 7(2): p. 115- 141. 15. Chang, C.L., J.B. Combs, and R.A. Stachowitz, A Report on The Expert Systems Validation Associate (EVA),Expert Systems With Applications, 1990, 1(1): p. 217 - 230. 16. Harrison, P.R. and P.A. Ratcliffe, Towards Standards for the validation of Expert Systems, Expert Systems With Applications, 1991, 2: p. 251- 258. 17. Nazareth,D.L, Issues in Theverification of Knowledge In Rule-basedSystems, International Journalof Man-Machine Studies, 1989, 30: p. 255 - 271. 18. Horn, K.A., et al.,An Expert SystemFor The Interpretation of ThyroidIn A Clinical Laboratory, The Australian ComputerJournal, 1985, 17(1): 7-11. 19. Colomb, R.M. and C.Y. Chung, Very Fast Decision Table Execution of A propositional Expert System. In Proceedings of AAAI-90The 7O